The problem with “One In, One Out”

At CrowdHandler, it’s often the first question we are asked: “Do you support One In, One Out?” And the answer is, “Yes, we do... but, for most users, we don’t recommend it.”
We know from experience that simply setting the right ingress rate is a more efficient way to manage traffic than One In, One Out, and the best solution for the vast majority of our clients.
Unfortunately, many website owners hear “yes, we support One In, One Out” but don’t hear the second part of our advice. For them, the concept of One In, One Out just seems so logical that they find it hard to envisage traffic being managed any other way. So they go ahead, check the One In, One Out setting and go live... only to find queues building up and users complaining about the wait.
So what’s wrong with One In, One Out?
The problem is that the simplicity of the One In, One Out formula as most of us understand it does not match the reality of website traffic and user journeys. The One In, One Out threshold can be both irrelevant and problematic.
Irrelevant? Well, say you have a website with a capacity of 1000 users and you are letting in 100 users a minute. You know the average user journey takes ten minutes, so, going from a cold start with 0 users, you would expect to reach capacity in ten minutes. After that, if One In, One Out worked the way that many assume it does, the equation would be a simple 1000/10 = 100 users per minute: you would expect the first batch of 100 users to check out in minute 10, at which point you allow the next 100 users in, and so on.
Those who paid attention in algebra class may have noticed that, by now, the 1000 capacity variable has cancelled itself out. You could just set the ingress rate to 100 and it would have exactly the same effect.
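Here is that cancellation as a worked equation. The symbols are introduced only for illustration: C is the One In, One Out threshold, T is the time each user holds a place, and r is the ingress rate. The model assumes every user holds a place for exactly T minutes.

```latex
\text{throughput}_{\text{OIOO}} = \frac{C}{T}
  = \frac{1000\ \text{users}}{10\ \text{min}}
  = 100\ \text{users/min} = r
```

Capacity was reached after T minutes at rate r, so C = rT, and C/T reduces to r. The threshold adds nothing that setting the rate directly would not.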

However, One In, One Out isn’t just irrelevant in this scenario. It’s actively problematic. That’s because the simple One In, One Out equation also ignores two crucial factors: session timeouts and checkout capacity.
Problem one: Session timeouts
OK, so the average user journey might take ten minutes. But quite a few users will never check out at all – and if they do, you may not know (unless you set up checkout busting). So, rather than letting user sessions linger and potentially impact capacity forever, we time them out.
CrowdHandler's default session timeout is 15 minutes, meaning every user's place will be held for 15 minutes after their last activity. This allows users to take a comfort break, answer an urgent email, or make a quick price comparison on another site during their visit, and return to the queue without having to start all over again.
So, accounting for that 15 minutes, the actual average user journey is 25 minutes. Going back to our equation, 1000/25 means the effective rate is just 40 in and out per minute if we are enforcing One In, One Out at 1000 users. Using our original calculation, we would expect a queue of 3000 people to be empty after 30 minutes, with a median wait of 15 minutes. But once the session timeout is taken into account, the first 1000 users leave the queue in the first 10 minutes, and then nobody else gets in until minute 25, when the first sessions expire. From then on, the remaining 2000 users are let in as sessions free up, in 10-minute bursts of 100 per minute, one burst every 25 minutes. That averages 40 per minute and takes 50 minutes. So the entire queue will actually take one hour to empty, with a median wait of 30 minutes. Twice as long.
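The sketch below is a minimal minute-by-minute simulation that checks these numbers. It is a simplified model under stated assumptions, not CrowdHandler’s queue logic:

- `simulate_queue` is a hypothetical helper.
- Every admitted user holds a place for exactly `session_minutes` (journey time plus timeout). Nobody checks out early or has their session destroyed on checkout.
- A user admitted during a given minute counts as having waited until the end of that minute.

```python
from collections import deque
from statistics import median_low
from typing import Optional


def simulate_queue(queue_size: int, rate: int,
                   capacity: Optional[int] = None,
                   session_minutes: Optional[int] = None) -> tuple[int, int]:
    """Drain a waiting room minute by minute (hypothetical model).

    rate            -- users admitted per minute (the ingress rate)
    capacity        -- One In, One Out threshold; None means rate only
    session_minutes -- minutes each admitted user holds a place
                       (journey time + session timeout); assumes nobody
                       checks out early
    Returns (minutes until the queue is empty, median wait in minutes).
    """
    if queue_size <= 0 or rate <= 0:
        raise ValueError("queue_size and rate must be positive")
    if capacity is not None and (capacity <= 0 or session_minutes is None):
        raise ValueError("a positive capacity needs session_minutes")
    waiting = queue_size
    active = deque()  # expiry minute of each held place, oldest first
    waits = []
    minute = 0
    while waiting > 0:
        # Free the places whose sessions have expired by this minute.
        while active and active[0] <= minute:
            active.popleft()
        allowed = rate
        if capacity is not None:
            allowed = min(rate, capacity - len(active))
        admitted = min(allowed, waiting)
        for _ in range(admitted):
            if capacity is not None:
                active.append(minute + session_minutes)
            waits.append(minute + 1)  # waited until the end of this minute
        waiting -= admitted
        minute += 1
    # median_low picks the lower of the two middle values for an even count
    return minute, median_low(waits)


print(simulate_queue(3000, rate=100))  # (30, 15)
print(simulate_queue(3000, rate=100, capacity=1000, session_minutes=10))  # (30, 15)
print(simulate_queue(3000, rate=100, capacity=1000, session_minutes=25))  # (60, 30)
```

The second call is the naive ten-minute model, where a threshold of 1000 behaves exactly like a rate of 100. The third call adds the 15-minute timeout, which doubles both the time to empty the queue and the median wait.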
“Ah!” you think. “But I won’t let the first 1000 users in at 100 per minute. I’ll maximise the rate and rely solely on One In, One Out to let all 1000 in during minute one, maximising my capacity.” Well, that’s a bad idea for two reasons, and the first is this: under One In, One Out, in our model scenario, the remaining 2000 users in the queue will not move at all for 25 minutes, until the first sessions start to expire. Those users will see a queue position that isn’t moving and a terrible wait-time estimate: exactly the kind of user experience you are trying to avoid. Furthermore, unless you carefully account for session timeout in your One In, One Out threshold, you will actually have fewer checkouts over the long run, on top of the stubborn queue.
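If the whole threshold is filled at once, the queue only moves when an entire batch of sessions expires. The sketch below models this. It assumes, as in the scenario above, that every place is held for exactly 25 minutes and nobody checks out early. `burst_schedule` is a hypothetical helper.

```python
def burst_schedule(queue_size: int, capacity: int,
                   session_minutes: int) -> list[tuple[int, int]]:
    """(minute, users admitted) when the ingress rate is unlimited and only
    the One In, One Out threshold holds users back (hypothetical model).

    Assumes every admitted user holds a place for exactly session_minutes,
    so places free up in one block each time the previous batch expires.
    """
    if queue_size <= 0 or capacity <= 0 or session_minutes <= 0:
        raise ValueError("all arguments must be positive")
    schedule = []
    minute = 0
    remaining = queue_size
    while remaining > 0:
        batch = min(capacity, remaining)
        schedule.append((minute, batch))
        remaining -= batch
        # The next places open only when the batch just admitted expires.
        minute += session_minutes
    return schedule


print(burst_schedule(3000, capacity=1000, session_minutes=25))
# [(0, 1000), (25, 1000), (50, 1000)]
```

Between minute 0 and minute 25, every remaining queue position stays frozen. That frozen queue is what produces the poor wait-time estimates.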

Problem two: Checkout capacity
This problem comes about because of another misunderstanding: basing One In, One Out settings on load test numbers.
Load tests tend to ramp up the traffic gradually (just like setting a rate in CrowdHandler!) and often model day-to-day traffic, which assumes that more visitors are browsing than buying. But during a busy product launch or drop, the ratio of checkouts to visitors will be much higher than in normal traffic. You should presume that most users you put in at the start of a user journey will check out at the earliest opportunity.
In many ways, it doesn’t matter how many concurrent users you think your site can handle. What matters is how many checkouts or transactions it can handle per minute. While many sites can handle 1000 concurrent users, very few can process 1000 checkouts per minute. Basing One In, One Out settings on concurrent-user metrics will not account for the spike in actual transactions that occurs during a drop or launch. If you let 1000 users in during minute one, the site is more likely to crash under the surge of checkouts that will arrive in minute 10 of our model scenario.
A useful analogy is a physical store with 1000-person capacity but only 10 registers. While the store can comfortably hold 1000 customers, it can only efficiently process 10 orders per minute. Letting in 1000 customers at once will just create long queues of frustrated people, with no progress being made as transactions back up. The same applies online, except that the risk is not just of long queues, but of crashed servers.
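The register analogy can be sketched as a simple backlog model, under these assumptions:

- `checkout_backlog` is a hypothetical helper.
- Checkouts are processed first come, first served at a fixed number per minute.
- The arrival figures are illustrative. Ten checkouts per minute mirrors the ten registers.

```python
def checkout_backlog(arrivals: list[int], checkouts_per_minute: int) -> list[int]:
    """Users still waiting to complete checkout at the end of each minute.

    arrivals[i] is how many users reach checkout during minute i; once the
    list runs out, nobody new arrives and the backlog simply drains.
    """
    if checkouts_per_minute <= 0:
        raise ValueError("checkouts_per_minute must be positive")
    backlog = 0
    history = []
    minute = 0
    while minute < len(arrivals) or backlog > 0:
        if minute < len(arrivals):
            backlog += arrivals[minute]
        backlog -= min(backlog, checkouts_per_minute)  # serve up to capacity
        history.append(backlog)
        minute += 1
    return history


surge = checkout_backlog([1000], checkouts_per_minute=10)
print(len(surge), surge[0])  # 100 990
steady = checkout_backlog([10] * 100, checkouts_per_minute=10)
print(len(steady), max(steady))  # 100 0
```

Both runs serve the same 1000 customers in 100 minutes. In the surge, though, the last customer waits the full 100 minutes. When arrivals are matched to checkout capacity, nobody waits at all.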
How to get it right
One way to mitigate the issues with One In, One Out would be to do some complicated calculations during setup. These might involve things like increasing the One In, One Out capacity, reducing the session timeout settings, calculating session-to-checkout ratios, ensuring “Destroy session on checkout” is switched on, and working out real user journey times.
In fact, we got so sad watching customers sabotage their on-sales with bad One In, One Out settings that we added a checker which attempts these calculations and shows them recommendations.

For some customers – those for whom simultaneous sessions really are a critical application metric; who are confident they understand all of the outputs of their load tests, and have a clear understanding of the optimal settings for session timeouts and their impact on the One In, One Out threshold – these calculations may make sense. If you are that customer, we salute you.
But, for most customers, the answer is much simpler: forget complicated calculations based on theoretical limits and unreliable load tests, and focus on the metrics that genuinely impact user experience. Success comes from understanding your proven peak throughput, not hypothetical concurrent users. 
For most sites, that means adjusting the ingress rate to a number a little higher than your maximum sustainable checkout rate. (A good starting point is to divide the maximum number of orders you’ve done in a successful one-hour period by 60, then set the rate just a little higher than that.) This will allow a good throughflow of visitors to your site.
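Here is the rule of thumb in brackets above, written as a small function. `suggested_ingress_rate` is a hypothetical name. The 10% headroom is an assumed value for “a little higher”, not a CrowdHandler recommendation. Integer ceiling division keeps the result from falling below the checkout rate.

```python
def suggested_ingress_rate(best_hour_orders: int, headroom_percent: int = 10) -> int:
    """Starting ingress rate (users per minute) from your best hour of orders.

    Divides the orders by 60 to get a sustainable checkout rate per minute,
    then adds an assumed headroom percentage, rounding up.
    """
    if best_hour_orders <= 0:
        raise ValueError("best_hour_orders must be positive")
    numerator = best_hour_orders * (100 + headroom_percent)
    denominator = 60 * 100
    return -(-numerator // denominator)  # ceiling division on integers


print(suggested_ingress_rate(1800))  # 33
print(suggested_ingress_rate(1000))  # 19
```

For example, a best hour of 1800 orders is 30 checkouts per minute, which gives a starting rate of 33 with the assumed 10% headroom.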
Start conservatively, then monitor the performance closely. If the queue is getting long but the site is maintaining full health, you can always increase the rate; CrowdHandler's Autotune feature can help you here.
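One way to make “increase the rate if the queue is long but the site is healthy” concrete is a manual tuning rule like the one below. This is a hypothetical sketch, not how Autotune works. The 20-minute wait threshold and the 10% step are assumed values. Deciding whether the site is healthy is left to your own monitoring.

```python
def next_ingress_rate(current_rate: int, queue_length: int, site_healthy: bool,
                      max_wait_minutes: int = 20, step_percent: int = 10) -> int:
    """Hypothetical manual tuning rule (not CrowdHandler's Autotune).

    Raises the rate by step_percent (at least 1) only when the site is
    healthy and the estimated wait exceeds max_wait_minutes; otherwise
    keeps the current rate.
    """
    if current_rate <= 0:
        raise ValueError("current_rate must be positive")
    estimated_wait = queue_length / current_rate  # minutes at the current rate
    if site_healthy and estimated_wait > max_wait_minutes:
        return current_rate + max(1, current_rate * step_percent // 100)
    return current_rate


print(next_ingress_rate(30, queue_length=1200, site_healthy=True))   # 33
print(next_ingress_rate(30, queue_length=1200, site_healthy=False))  # 30
```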
In summary: focus on proven performance
As a queue strategy it may sound logical, but a simplistic understanding of One In, One Out fails to account for real-world factors in execution. It often results in issues like long wait times, frustrated customers, and failed checkouts, because calculations are based on hypotheticals: unreliable or unrealistic metrics.
So, although we support it, our advice is: don’t set a One In, One Out threshold without understanding exactly what it means, and focus on your checkout rates over concurrent user metrics. By focusing on proven performance, and the reality of your site limits, you will improve the user experience for everyone.