
Keep calm and cache: The early days of

Our story about the small role we played in the early stages of New Zealand's response to COVID-19.

Now that New Zealand has moved into Alert Level 1 and has no active COVID-19 cases, we thought it would be a good time to share a story about the role SiteHost played in supporting the early stages of New Zealand's response to COVID-19.

On the 17th of March, a customer provisioned an interesting website on our Cloud Container platform. Anyone familiar with Cloud Containers will know they are 100% automated, and with tens of thousands of websites on the platform, it's not unusual for websites of some significance to appear on our servers without any prior warning. In this case, like many businesses, we were busy moving the team to work from home, so seeing this particular domain appear caught our attention!

You very likely visited the site in the last couple of months. It was a great resource, designed and built by the talented teams at Voyage and Clemenger BBDO. But just in case you were tramping and completely missed lockdown, here is a quick overview: the website played a huge part in uniting our team of five million kiwis against COVID-19, and it was the source of truth on how New Zealanders should operate and conduct themselves at the various stages of lockdown.

At this point it was not clear to us just how much traffic this website was going to receive. One thing that was immediately apparent, however, was that having our specialist team monitor the website 24x7 would be essential to its success. As part of this work we proactively made sure it was running on the latest generation of AMD hardware, leaving plenty of headroom should we need to scale quickly, which, as you'll see, came in quite handy.

By the 18th of March, events around the world were making it clear that our local authorities would have to mount a significant response. We began receiving positive feedback from the public about our involvement in hosting the site, and a large number of people reached out to offer support and advice. Something I have come to realise over the years (and need to remind myself of from time to time) is that things are always more complicated than they seem. So we wanted to offer a glimpse behind the curtain at the work that went on at SiteHost supporting the website that week.

The website took on national importance when our Prime Minister, Jacinda Ardern, directed the nation to it on the 19th. This caused a sudden spike in traffic and the first of two outages, at around 1pm that day.

The team responded straight away, resolving the issue within a few minutes by allocating more resources and tuning processes to use them effectively. This incident made it immediately clear that these were not ordinary times, nor was this an ordinary engagement, and that a different set of solutions would be required.

While we had allocated some additional resources, the delay before autoscaling brings new capacity online, combined with the sheer size of the traffic spikes, meant that carrying on without aggressive server-side caching was simply not going to end well. Limited control over the website's domain name and DNS also ruled out a CDN in the short term, since putting a CDN in front of a site normally means pointing its DNS records at the CDN. A project plan for implementing server-side caching was put together on the afternoon of the 19th.
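To see why autoscaling alone falls short, consider a toy model in Python. Every number in it is made up for illustration and none comes from the incident: demand jumps the instant the nation is directed to the site, extra capacity only arrives after a delay, and every request above capacity in the meantime fails.

```python
# Toy model with hypothetical numbers: not measurements from the actual incident.

def dropped_requests(demand, base_capacity, scaled_capacity, scale_delay):
    """Count requests that exceed capacity across a series of time steps.

    demand: requests arriving in each time step
    base_capacity: requests per step the site can serve before scaling
    scaled_capacity: requests per step once new resources are online
    scale_delay: steps between the first overload and extra capacity being ready
    """
    dropped = 0
    spike_start = None
    for t, load in enumerate(demand):
        if spike_start is None and load > base_capacity:
            spike_start = t  # scaling is triggered by the first overloaded step
        ready = spike_start is not None and t >= spike_start + scale_delay
        capacity = scaled_capacity if ready else base_capacity
        dropped += max(0, load - capacity)  # anything above capacity fails
    return dropped


if __name__ == "__main__":
    demand = [100, 100, 3000, 3000, 3000, 3000, 500]
    print(dropped_requests(demand, 500, 3000, scale_delay=2))  # 5000
    print(dropped_requests(demand, 500, 3000, scale_delay=0))  # 0
```

Even when the scaled capacity is large enough, the requests that arrive during the scaling delay are lost. Caching attacks the problem from the other side by shrinking how much work each request needs, so the existing capacity goes much further.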

Unfortunately, no feedback on the caching plan was received on Friday the 20th, and like the rest of New Zealand we went into the weekend thinking that we would likely hear more about COVID-19 on Monday.

On the morning of Saturday the 21st of March, one of our team members let us know that the Prime Minister would be addressing the nation at midday. We had no prior knowledge of this address, which was the first announcement of the Alert Level system, nor that the PM would be directing viewers to the website.

We're a small, nimble team, so a number of us tuned in, and senior engineers monitored the website's traffic as the address went out. When the PM told the nation to visit the website, it instantly locked up, and we immediately scaled up resources to cope. This time the site was down for approximately ten minutes while we scaled it up 600% and re-tuned the services to use those resources. After we resolved a couple of bottlenecks in real time (during which a few tweets about error pages and some #hugops messages were posted), the site settled down.

Sending hug-ops to the covid19 website folk.
— mikeforbes! (@mikeforbes), March 20, 2020

Focus immediately shifted to caching. The project plan was approved on a conference call, and two members of our team volunteered to give up most of their Saturday (and some of their Sunday). Working from home, one in Auckland and the other in New Plymouth, they gathered the technical requirements around what we could safely cache and for how long. They then implemented the caching, tested that it worked as expected and rolled the changes out to production, all while the website was under heavy load from the huge amount of public interest. They rose to the challenge without being asked and kept the site online throughout.
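The post doesn't list the rules the team agreed on. As a rough illustration of what deciding "what we could safely cache and for how long" typically involves, here is a hypothetical cacheability check in Python. The excluded paths, the cookie rule, the cacheable status codes and the 60-second TTL are all assumptions, not the site's actual policy.

```python
from dataclasses import dataclass, field

# Hypothetical policy values for illustration only.
NEVER_CACHE_PREFIXES = ("/admin", "/api/")
DEFAULT_TTL_SECONDS = 60


@dataclass
class Request:
    method: str
    path: str
    cookies: dict = field(default_factory=dict)


@dataclass
class Response:
    status: int
    headers: dict = field(default_factory=dict)  # assumes canonical header names


def cache_ttl(req: Request, resp: Response) -> int:
    """Return how many seconds a response may be shared from cache (0 = never)."""
    if req.method not in ("GET", "HEAD"):
        return 0  # form posts and other writes always reach the application
    if req.path.startswith(NEVER_CACHE_PREFIXES):
        return 0  # hypothetical admin/API areas are never cached
    if req.cookies:
        return 0  # a cookie may mean a personalised page
    if "Set-Cookie" in resp.headers:
        return 0  # never share a response that sets per-user state
    cache_control = resp.headers.get("Cache-Control", "").lower()
    if "no-store" in cache_control or "private" in cache_control:
        return 0  # respect the application's own opt-out
    if resp.status not in (200, 301):
        return 0  # don't pin transient errors such as a 502 in the cache
    return DEFAULT_TTL_SECONDS
```

Erring towards "don't cache" keeps the risk low: a miss only costs some CPU, whereas caching the wrong response could show one visitor's page to everyone.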

The caching solution we settled on used Nginx, as it was already part of the Cloud Container stack and we knew it would perform well without adding complexity. Once the caching was in place the site didn't skip a beat: despite some significant spikes in network traffic, the website never used more than two physical CPU cores.

[Graph: sustained load showing a huge drop, with load times down by more than 50%, when our caching implementation went live.]
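Why does short-lived caching keep CPU usage so low? Below is a minimal Python sketch of the general idea (a "micro-cache" with a TTL in front of an expensive render step). It is an illustration under assumptions, not SiteHost's Nginx configuration. In Nginx the equivalent behaviour comes from directives such as `proxy_cache_valid` or `fastcgi_cache_valid`, but the post doesn't say which were used. The `render` function, the 10-second TTL and the traffic figures are all hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class MicroCache:
    """Serve each path from memory for `ttl` seconds before re-rendering it."""

    def __init__(self, render: Callable[[str], str], ttl: float,
                 clock: Callable[[], float] = time.monotonic):
        self.render = render          # expensive work, e.g. the application
        self.ttl = ttl
        self.clock = clock            # injectable so the demo can fake time
        self.store: Dict[str, Tuple[float, str]] = {}  # path -> (expires_at, body)
        self.origin_hits = 0          # how often the application actually ran

    def get(self, path: str) -> str:
        now = self.clock()
        entry = self.store.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]           # fresh copy: the application isn't touched
        self.origin_hits += 1         # miss or expired: render once and store it
        body = self.render(path)
        self.store[path] = (now + self.ttl, body)
        return body


if __name__ == "__main__":
    fake_now = [0.0]
    cache = MicroCache(lambda p: f"<html>{p}</html>", ttl=10,
                       clock=lambda: fake_now[0])
    for second in range(60):          # 100 requests a second for one minute
        fake_now[0] = float(second)
        for _ in range(100):
            cache.get("/")
    print(cache.origin_hits)          # 6 renders for 6,000 requests
```

However large the audience, the application renders each page at most once per TTL window, so backend load stays roughly flat while the cache absorbs the spike. This single-threaded sketch ignores concurrent misses; in Nginx, `proxy_cache_lock` is the directive that collapses simultaneous misses for the same key into one upstream request.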

When the infrastructure was ready, the website was moved to a standard government platform as planned. It was humbling to be the first port of call for a scalable and nimble service in those early days, especially when time was of the essence in helping millions of kiwis fight COVID-19.

The caching solution worked so well that we also used it to make sure the website for Fiji's fight against COVID-19 performed optimally. We have since evolved it into a standard feature that our Cloud Container customers can benefit from, so we can respond even faster when customers get sudden bursts of traffic.

If you’re an existing Cloud Container customer or you have a website that needs a performance boost please feel free to get in touch for early access to our Simple Cache feature.

We take our responsibility to all of our customers, here in NZ and abroad, very seriously and are always looking for ways to improve. On review, we identified a few key lessons from this project that we think are worth sharing:

  • Being able to spin up resources quickly and easily in the cloud is great, but getting advice on server tuning, caching and CDNs can be crucial, and is best done early on.

  • Keeping communication channels open about events that might place demands on any part of your infrastructure helps you get the best results from a campaign with the least downtime.

  • Autoscaling is a great tool, but does not solve all scaling problems.

  • Caching can enable your website to handle more traffic, provide a faster experience for your customers and be cheaper all at the same time.

  • Be kind. Stay home if you're sick. Wash your hands.

As a 100% kiwi-owned business committed to the long-term success of all our customers, we were honoured to be called on to do our part, and I am extremely proud of our team, who rose to the challenge. Thanks for reading.