A deep dive into Autotune

One of CrowdHandler's exclusive features is Autotune. 

When you have a lot of traffic waiting to enter your site, this nifty tool regulates access by sampling the performance of your pages and finding the right ingress rate: the rate at which people can move out of the waiting room and onto your domain.

We thought we'd take a deep dive into Autotune to explain why we added the feature, how it works and how you can make the most of it.

On-sales are stressful!

The origins of CrowdHandler lie in a digital agency that specialised in building and maintaining ecommerce websites for large entertainment brands.

The design of CrowdHandler, and features such as Autotune, is based on extensive experience of managing ticketing on-sales.

Before Autotune, we found we needed to put together an ‘incident room’ for every major on-sale. This meant a team of three or four people, each monitoring different performance metrics (using monitoring services, and tools like New Relic) and co-ordinating actions across multiple screens.

It was stressful.

That’s why, when we built the CrowdHandler product, we wanted the dashboard to show everything you needed to know in a single view. We wanted a single, non-technical person to be able to access all the metrics and understand what was happening at a glance. That meant getting performance information onto the same screen that shows you queue activity. Once we were collecting that information, an Autotune feature was the next obvious step.

So how does Autotune work?

One of the main reasons people use a waiting room is to protect their website infrastructure from crashing during an on-sale. And a key way to tell whether a website is healthy – or about to crash – is to measure the average page load times for users.

Page load time is a strong metric for server health because the harder your servers are working, the longer they take to generate those pages.

Of course, slow page load times also have a direct impact on user experience. After all, as well as making sales, you also need to make sure users trust that the process is going well. 

You can think of Autotune as a continual load test, using live traffic to ascertain the right rate to send users from your waiting room to your website. Based on the feedback it is getting, Autotune will constantly adjust the rate to ensure the optimum number of users are being let in. This prevents crashes, but it also prevents queues from getting longer than they need to.

A deeper dive

Sampling page performance is an out-of-the-box feature of CrowdHandler – in fact, it's incorporated into the same code that checks if the user should be granted access to the page – so it doesn’t require additional integration work. It allows us to follow the user out of the queue and onto your website, sample the performance of the pages they are loading, and log those average page load times against different URLs.

(Logging the times against URLs also means we can zero in on specific pages – so, if you notice the same pages regularly appearing at the top of your list, it's worth your developers having a look at them, as they might be targets for optimisation.)
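To make the idea of logging times against URLs concrete, here is a minimal sketch of how sampled load times could be averaged per URL and ranked slowest first. It is an illustration only, not CrowdHandler's code: the function name `slowest_urls`, the sample data and the `top_n` parameter are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def slowest_urls(samples, top_n=3):
    """Return the top_n (url, average_seconds) pairs, slowest first.

    `samples` is an iterable of (url, seconds) pairs. The name and the
    data shape are hypothetical, chosen only for this illustration.
    """
    by_url = defaultdict(list)
    for url, seconds in samples:
        by_url[url].append(seconds)
    averages = {url: mean(times) for url, times in by_url.items()}
    # Sort by average load time, largest first, and keep the top_n entries.
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)[:top_n]


# Made-up samples: (URL path, load time in seconds).
samples = [
    ("/checkout", 4.0),
    ("/checkout", 6.0),
    ("/events", 0.5),
    ("/events", 1.5),
    ("/", 0.25),
]
ranking = slowest_urls(samples, top_n=2)
print(ranking)
```

In this made-up data, the checkout page comes out on top, which is the kind of page your developers might want to look at first.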

To be specific, the metric we work with is Time To First Byte. Whilst the SEO and UX community often look at other page load metrics – which can be affected by things like internet speed, or the time a browser takes to render a page – Time To First Byte is essentially measuring how long it's taking your server to generate the page. So, it's the most reliable indicator of the performance of your servers.

How does Autotune work out how many users to send through?

You can configure acceptable performance metrics for your site. We'll give you some suggested defaults, but you can configure what counts as a slow page, and how many slow pages are acceptable. (The suggested defaults are five seconds for the page load time, and 2% as an acceptable percentage of slow pages – but you may want to tweak this if you know you have legitimate pages that always load slowly, for example during the checkout process.)
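As a rough illustration of how those two settings fit together, the sketch below works out the percentage of slow pages in a window of samples and compares it with the acceptable percentage. The function names are hypothetical, and treating a page as slow only when it takes strictly longer than the threshold is an assumption; the defaults mirror the suggested five seconds and 2%.

```python
def slow_page_percentage(load_times, slow_threshold_seconds=5.0):
    """Percentage of samples slower than the threshold (0.0 for no samples).

    Assumption: a page counts as slow only if it takes strictly longer
    than the threshold.
    """
    if not load_times:
        return 0.0
    slow = sum(1 for t in load_times if t > slow_threshold_seconds)
    return 100.0 * slow / len(load_times)


def within_target(load_times, slow_threshold_seconds=5.0, max_slow_percent=2.0):
    """True if the share of slow pages is at or below the acceptable level."""
    return slow_page_percentage(load_times, slow_threshold_seconds) <= max_slow_percent


# Made-up window of 100 samples: 97 fast pages and 3 slow ones.
window = [0.4] * 97 + [6.2, 7.5, 5.1]
print(slow_page_percentage(window))  # 3.0
print(within_target(window))         # False
```

If you know your checkout pages legitimately take longer, raising `slow_threshold_seconds` or `max_slow_percent` in a sketch like this corresponds to the tweak described above.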

Autotune will use these metrics as a target, to control the ingress rate.

However, the mechanism is not as simple as “if the percentage of slow pages is under the threshold, let everyone in”. If CrowdHandler did this, it would quickly cause a crash as everyone floods the site. Instead, Autotune starts conservatively, monitoring the percentage of slow pages, and responding accordingly.

Using the coefficients that our algorithm has learned from the many real world on-sales we've facilitated, Autotune can judge how quickly you're heading towards the slow-page threshold you've specified, and moderate the rate of users coming through accordingly. It’s like having your foot on an accelerator pedal as you judge the road ahead – if you’re approaching the limit fast, Autotune will slow the ingress rate down. If you're way under the acceptable threshold you’ve specified, it will push harder to let more people through.

(By the way: if you think this mechanism sounds familiar, you're probably right. The PID (proportional-integral-derivative) control loop has been used by engineers since the 1920s. It's the same kind of algorithm your car uses when you set it to cruise control.)
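For readers who haven't met a PID loop before, here is a minimal textbook-style sketch applied to an ingress rate. It is not Autotune's implementation: the class name, the gains (`kp`, `ki`, `kd`), the base rate and the rate limits are invented for illustration, whereas the real coefficients are learned from live on-sales as described above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PIDRateController:
    """Textbook PID loop steering an ingress rate (users per minute).

    Every number here is a hypothetical placeholder, not a CrowdHandler value.
    """
    target_slow_percent: float = 2.0  # acceptable percentage of slow pages
    base_rate: float = 100.0          # rate when the error terms are all zero
    kp: float = 10.0                  # proportional gain (assumed)
    ki: float = 1.0                   # integral gain (assumed)
    kd: float = 5.0                   # derivative gain (assumed)
    min_rate: float = 10.0
    max_rate: float = 1000.0
    _integral: float = 0.0
    _previous_error: Optional[float] = None

    def next_rate(self, measured_slow_percent: float, dt: float = 1.0) -> float:
        # Positive error: we are under the threshold, so there is headroom.
        error = self.target_slow_percent - measured_slow_percent
        self._integral += error * dt
        # The derivative term reacts to how fast we are approaching the limit.
        if self._previous_error is None:
            derivative = 0.0
        else:
            derivative = (error - self._previous_error) / dt
        self._previous_error = error
        rate = (self.base_rate + self.kp * error
                + self.ki * self._integral + self.kd * derivative)
        # Keep the rate inside sensible bounds.
        return max(self.min_rate, min(self.max_rate, rate))


controller = PIDRateController()
first = controller.next_rate(0.0)   # no slow pages: push harder
second = controller.next_rate(4.0)  # slow pages jumped past the target: back off
print(first, second)                # 122.0 60.0
```

The derivative term is what gives the "judging the road ahead" behaviour: in the second step the slow-page percentage has risen sharply, so the rate is cut by more than the proportional term alone would suggest.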

What does Autotune look like in practice?

In practice, you'll see a cycle.

During quiet periods with very little traffic, Autotune will maintain a rate with enough headroom to deal with business as usual, and no queues will kick in. When demand starts to exceed the rate, it samples performance and tracks demand, opening up the rate to keep the queue moving at the optimal speed. If the website slows down, it reduces the rate to maintain healthy response times. Toward the end of the on-sale, as demand ebbs away, Autotune returns the rate to its business-as-usual level.

We’ve found that this cycle reflects exactly how an experienced person who is skilled at using CrowdHandler reacts during an on-sale. However, Autotune responds much more quickly. Humans tend to fret over their decisions, either taking risks or being over-cautious, and this tends to result in queues lasting much longer than necessary, or worse – crashed sites. 

And, importantly – unlike a human – Autotune won't panic if a site becomes unresponsive for a moment. This kind of performance blip happens more often than you think: a server decides to run a maintenance operation, or someone runs a sales report in the middle of a busy on-sale. If it happens, Autotune will recognise the slowdown much more quickly – and recover more quickly too, opening up the rate way before a spooked human operator would.

Start using the Autotune feature today

Autotune rate management is exclusive to CrowdHandler and, nowadays, we would not recommend operating a major on-sale without it. That's why we've made it a core part of the CrowdHandler product.

Look for the Autotune feature on your CrowdHandler dashboard.
