AB split test calculator

[Calculator table. Columns: Version, Include, Trials, Successes, Approx. probability of being best, 95% chance conversion rate between. Rows: versions A, B, C and D.]


The hard part of split test analysis is sorting out

  • the signal of which version performs best from
  • the noise of random variation.

This calculator can tell you, given the data you have, how likely it is that A is better than B.

The usual approach to analysing split tests is to use a z-test or similar. There are a few problems with this approach. The requirements are restrictive, and the output can be difficult to interpret correctly: if its assumptions are satisfied, a z-test tells you the probability of getting results at least as extreme as the ones you actually got if A were the same as B (the probability of the data under the ‘null hypothesis’). If the result is very improbable under the null hypothesis, you can reasonably conclude that A is different to B.
Clearly, this is a couple of steps removed from the business question of whether A beats B.
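
To make the z-test output concrete, here is a minimal sketch of a pooled two-proportion z-test using only the Python standard library. The function name and the example counts are made up for illustration; this calculator does not use this test.

```python
import math

def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided pooled z-test for a difference between two conversion rates."""
    rate_a = successes_a / trials_a
    rate_b = successes_b / trials_b
    # Under the null hypothesis A and B share one rate, so pool the data.
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (rate_b - rate_a) / std_err
    # Probability of a result at least this extreme if A and B were the same.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

# Hypothetical data: 20/1000 conversions for A, 30/1000 for B.
z, p_value = two_proportion_z_test(20, 1000, 30, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

Note that the p value describes the data under the assumption that A equals B. It is not the probability that B is better than A.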

A Bayesian approach can take these steps for you and tell you the simple probability that A beats B, given the data you have. Often this lets you draw meaningful inferences even where conversion rates and sample sizes are low.
Sometimes it allows you to compare final conversion rates where previously you might have been stuck looking at click-through in pursuit of statistical significance.

The plots show the probability distribution of conversion rates, given the data. The probabilities of being the most successful version, displayed in the table, are based on a random sample of several thousand points within the distribution. For experiments that are close, you will notice the probabilities vary a little each time you re-calculate.
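
The sampling idea can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the calculator's own code (which uses jStat): it assumes the beta posterior with a uniform prior described under "Assumptions and the maths", and the function name, the number of draws and the example counts are all made up.

```python
import random

def probability_of_being_best(results, draws=20000, seed=None):
    """Estimate each version's probability of having the highest conversion rate.

    `results` maps a version name to (trials, successes). Each version's
    posterior is taken to be Beta(successes + 1, failures + 1), which is
    what a uniform prior gives.
    """
    rng = random.Random(seed)
    wins = {name: 0 for name in results}
    for _ in range(draws):
        # Draw one plausible conversion rate per version from its posterior.
        sampled = {
            name: rng.betavariate(successes + 1, trials - successes + 1)
            for name, (trials, successes) in results.items()
        }
        # Credit the version whose sampled rate is highest in this draw.
        wins[max(sampled, key=sampled.get)] += 1
    return {name: count / draws for name, count in wins.items()}

# Hypothetical data: (trials, successes) per version.
example = {"A": (1000, 20), "B": (1000, 30)}
print(probability_of_being_best(example))
```

Because the estimate comes from random draws, running it twice without a fixed seed gives slightly different numbers, which is the variation described above.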

The calculations depend on a few assumptions. In particular it is assumed that each trial has equal probability of success, so if something else changed during your experiment, it may throw out the results (also a problem for z and G tests, of course).

Request for feedback: This is new – so please check it out and let me know what you think, and where the bugs are.

Reading the graph

  • The horizontal axis is conversion rate expressed as a percentage.
  • The area under the curve between any two points on the horizontal axis represents the probability that the conversion rate lies between those points.
  • The vertical axis is scaled so that the total area under each curve equals 1, which is what lets area represent probability.
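
As a rough illustration of reading area as probability, the sketch below integrates a beta density numerically. It assumes the Beta(successes + 1, failures + 1) posterior described under "Assumptions and the maths", works with rates as fractions rather than percentages, and uses made-up function names and counts.

```python
import math

def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    # Work in logs so large parameter values do not overflow.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def area_between(lower, upper, a, b, steps=10000):
    """Midpoint-rule estimate of the area under the curve from lower to upper."""
    width = (upper - lower) / steps
    return sum(beta_pdf(lower + (i + 0.5) * width, a, b) for i in range(steps)) * width

# Hypothetical data: 20 successes in 1000 trials gives Beta(21, 981).
print(area_between(0.0, 1.0, 21, 981))    # whole curve: close to 1
print(area_between(0.01, 0.03, 21, 981))  # chance the rate lies between 1% and 3%
```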

The spread of the curve represents how precisely the experiment has measured the conversion rate.

The more the curves overlap, the less clearly your experiment has separated the probable conversion rates of the versions.

  • If the means are reasonably well separated but the curves are wide, you need more trials.
  • If the means are very close together and the curves are getting quite steep, there probably isn’t much difference between A and B in terms of conversion rate.

You will see that as your number of trials and conversions increases, the sharpness, and hopefully the separation, of the peaks increases. What you are aiming for is a clear signal of well separated peaks.

Why use it?

A Bayesian approach to analysis of AB tests has many important advantages compared to approaches for estimating statistical significance.

You can obtain a very clear picture of the probable spread of conversion rates, even if you have a limited number of trials. You decide what level of confidence you need.

The probabilities you look at in this graph are the ones most directly relevant to the business question: which has the best conversion rate. There is no null hypothesis involved.

The traditional approach to analysis of split tests is to calculate some measure of the statistical significance of the result such as a p value. This has a few issues:

  • It can be difficult to reach significance in measuring your final conversion rate, unless you have a lot of traffic and a reasonable conversion rate.
  • The most commonly used formulas for calculating p values assume that you set your number of samples in advance of the test. If you constantly check for significance and stop the test when you find it, you are at risk of considering a result more significant than it is. See how not to run an AB test.
  • The p value tells you the probability of getting a result at least as extreme as the one you got, assuming the null hypothesis that the conversion rates for A and B are the same. This is several steps away from telling you which of A or B is better.
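
The second point, about checking repeatedly, can be illustrated with a small simulation. This is a sketch under made-up settings (400 experiments, 20 checks, 100 trials per version between checks, a true rate of 5% for both versions), not a definitive analysis. Both versions are identical, so every "significant" result is a false positive.

```python
import math
import random

def z_score(successes_a, trials_a, successes_b, trials_b):
    """Pooled two-proportion z statistic (0.0 if there is no variation yet)."""
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    if std_err == 0:
        return 0.0
    return (successes_b / trials_b - successes_a / trials_a) / std_err

def false_positive_rates(experiments=400, peeks=20, batch=100, rate=0.05, seed=0):
    """Return (rate when stopping at the first significant check, rate at the end only)."""
    rng = random.Random(seed)
    crossed_any = 0
    crossed_final = 0
    for _experiment in range(experiments):
        succ_a = succ_b = trials = 0
        crossed = False
        z = 0.0
        for _peek in range(peeks):
            # Both versions convert at the same true rate.
            succ_a += sum(rng.random() < rate for _ in range(batch))
            succ_b += sum(rng.random() < rate for _ in range(batch))
            trials += batch
            z = z_score(succ_a, trials, succ_b, trials)
            if abs(z) > 1.96:  # nominal 5% two-sided threshold
                crossed = True
        crossed_any += crossed
        crossed_final += abs(z) > 1.96
    return crossed_any / experiments, crossed_final / experiments

peeking, fixed = false_positive_rates()
print(f"stop at first significant check: {peeking:.1%}, check once at the end: {fixed:.1%}")
```

Since the final check is one of the checks, the peeking rate can never be lower than the fixed rate. In practice it tends to be several times higher.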

Assumptions and the maths

The calculation assumes that you are measuring a variable that has only two values: success and failure, and that the assumptions of a binomial distribution apply.

The posterior probability is a beta distribution.

A uniform prior probability is assumed.
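
For reference, these three statements fit together in the standard beta-binomial way. The symbols here are introduced for illustration only: n is the number of trials, s the number of successes, and θ the unknown conversion rate.

```latex
\begin{aligned}
\text{prior:} \quad & p(\theta) = \mathrm{Beta}(\theta \mid 1, 1) = 1, \qquad 0 \le \theta \le 1 \\
\text{likelihood:} \quad & p(s \mid \theta, n) = \binom{n}{s}\, \theta^{s} (1-\theta)^{n-s} \\
\text{posterior:} \quad & p(\theta \mid s, n) \propto \theta^{s} (1-\theta)^{n-s}
  \;\Longrightarrow\; \theta \mid s, n \sim \mathrm{Beta}(s+1,\; n-s+1)
\end{aligned}
```

So, for example, 20 successes in 1000 trials would give a Beta(21, 981) posterior.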

Technology

The distribution is calculated and plotted using the jStat JavaScript statistical library.

Other References

Next steps

Looking at analysis where the variable in question is not binary, for example spend-per-customer or time-on-site.

6 comments to AB split test graphical Bayesian calculator

  • Justin

    This is really great. Thanks for posting. I was curious if you had any thoughts on the best way to split test when revenue per view is the defining metric? A higher success rate is one part of the story, but when you have revenue differences among the treatments it adds another level of complexity. Would a simple comparison of means be sufficient?

    • Justin

      Thanks Justin. This is a really interesting question – the calculator here only works for yes/no type variables so far, and we can expect the distribution of revenue per visit to be different.

      I think you need to be careful with a simple comparison of the means, especially if your sample is not very large. Without some mathematical analysis it’s very hard to know how much of the difference is likely to be random variation.

      I’m going to need a bit of time to work up an answer. In the interim if anyone has one please post.

  • Justin

    Thanks! Looking forward to your answer.

  • JT

    Would you be willing to share the math behind your calculations in the tool? Additionally, I’m interested in your thoughts on Rev/Visit as a metric and how to calculate a probability of success.

    Great post and tool though.

    • Justin

      Good questions: very happy to share. I’ve sent an email, and restored the links to sites showing the maths. (I accidentally knocked these off the post in an edit a while ago, so very glad you asked this question!)

  • Kartik

    This is very interesting. I recently studied Bayesian methods at Carnegie and it feels great to see them practically implemented. Could you share the calculation behind it?
