Oliver Palmer

Most of your A/B tests will fail (and that's OK)

The vast majority of experiments don't do anything at all, so make sure you plan, budget and manage expectations accordingly

Illustration of many arrows missing a target and one hitting it

People tend to have unrealistic expectations about A/B testing. Vendors and agencies alike would have you believe that experimentation platforms are magical money machines that turn button colours and copy tweaks into double-digit conversion rate increases.

One highly resourced program I know of built its business case on a strike rate of 50%. Every second experiment they ran, they proposed, would generate a revenue uplift! That's complete and utter madness.

A meta-analysis by Stefan Thomke and Sourobh Ghosh of 20,000 anonymised experiments run by more than a thousand Optimizely customers gives a more realistic view. Just 10% of experiments produced a statistically significant uplift on their primary metric. Airbnb, Booking.com and Google all report a similar rate of success:

At Google and Bing, only about 10% to 20% of experiments generate positive results. At Microsoft as a whole, one-third prove effective, one-third have neutral results, and one-third have negative results. All this goes to show that companies need to kiss a lot of frogs (that is, perform a massive number of experiments) to find a prince.
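To make the expectation-setting concrete, here is a rough back-of-envelope sketch in Python. Only the ~10% win rate comes from the research above; the 40-experiments-a-year figure is purely illustrative, and the simple binomial model assumes independent experiments.

```python
# Back-of-envelope expectation-setting, assuming (illustratively) 40 experiments
# a year and the ~10% win rate reported in the meta-analysis above.
from math import comb

n, p = 40, 0.10  # experiments per year (assumption), win rate per experiment

expected_winners = n * p  # ~4 winning experiments a year

def prob_winners(k: int) -> float:
    """Probability of exactly k winners under a simple binomial model."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Chance of a 'bad year' with zero or one winners, despite a healthy programme
p_bad_year = prob_winners(0) + prob_winners(1)

print(f"Expected winners per year: {expected_winners:.1f}")
print(f"Chance of 0 or 1 winners:  {p_bad_year:.0%}")  # roughly 8%
```

Even on those generous assumptions, you'd expect about four winners a year, and there's still a meaningful chance of a year with barely any. Budget and report accordingly.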

The vast majority of experiments will not move your tracked metrics one way or another. Many optimisation programs struggle to acknowledge this fact. They plod along in obscurity, wondering why they’re not getting the same results as ‘everyone else’. They sheepishly sweep their ‘failures’ under the rug, which ensures that they will never learn from them, stagnating at level 1.

The only failed test is one that doesn’t teach you anything

I don’t like to talk about ‘failed’ tests at all. A properly designed experiment will be successful no matter how the results turn out: negative, positive or simply flat.

While your ‘success’ rate can happily hover at 10% (and it absolutely will sometimes be lower), you need to ensure that your 'learning rate' is at 100%. A great way to do this is to ensure that your experiments are seeking to validate a hypothesis born out of user research, data or other insights.

Many new optimisation programs haven’t yet learned this lesson. Instead, they come up with interesting ideas and then see if they work. They're using experimentation as a platform to prove how clever they are. I know, I did it myself for a while. Eventually, you discover that it’s like throwing shit at a wall. But probably less productive.

These sorts of experiments tend to lead to a dead end:

“I thought this was a good idea but unfortunately our customers issued a collective shrug.” There’s no clear path forward. Dead end, game over, you lose.

Learning and iteration are essential for success

Creating a safe space for failure, while putting an emphasis on learning, is critical for an experimentation program to reach any level of maturity or success.

Moving out of the jackpot mindset and toward intentional, iterative learning is a critical leap that many programs never make.

Fail, fail, fail, succeed. But most of all, make sure your experiments are teaching you something and they won't be failures at all.