The hard part of split test analysis is sorting out
- the signal of which version performs best from
- the noise of random variation.
This calculator can tell you, given the data you have, how likely it is that A is better than B.
The usual approach to analysing split tests is to use a z-test or similar. There are a few problems with this approach: the requirements are restrictive, and the output can be difficult to interpret correctly. If its assumptions are satisfied, a z-test tells you the probability of getting results at least as extreme as the ones you actually got, if A were the same as B (the probability of the data under the ‘null hypothesis’). If that probability is very low, you can safely say A is different from B.
Clearly, this is a couple of steps removed from the business question of whether A beats B.
A Bayesian approach can take these steps for you and tell you the simple probability that A beats B, given the data you have. Often this enables you to draw meaningful inferences even where conversion rates and sample sizes are low.
Sometimes this allows you to compare final conversion rates directly, where once you may have been stuck looking at click-through rates in pursuit of statistical significance.
The plots show the probability distribution of conversion rates, given the data. The probabilities of being the most successful version, displayed in the table, are based on a random sample of several thousand points within the distribution. For experiments that are close, you will notice the probabilities vary a little each time you re-calculate.
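The sampling approach described above can be sketched in a few lines of Python. This is an illustrative sketch, not the calculator’s actual code: it assumes a uniform prior, under which the posterior for each version’s conversion rate is Beta(successes + 1, failures + 1), and the function name and sample count are my own choices.

```python
import random

def prob_a_beats_b(successes_a, trials_a, successes_b, trials_b, n_samples=20000):
    """Monte Carlo estimate of P(conversion rate of A > conversion rate of B).

    With a uniform prior, each version's posterior conversion rate is
    Beta(successes + 1, failures + 1). Draw from both posteriors and
    count how often A's draw comes out on top.
    """
    wins = 0
    for _ in range(n_samples):
        rate_a = random.betavariate(successes_a + 1, trials_a - successes_a + 1)
        rate_b = random.betavariate(successes_b + 1, trials_b - successes_b + 1)
        if rate_a > rate_b:
            wins += 1
    return wins / n_samples
```

Because the answer is based on random draws, a close experiment will give slightly different probabilities on each run – exactly the re-calculation wobble mentioned above.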
The calculations depend on a few assumptions. In particular it is assumed that each trial has equal probability of success, so if something else changed during your experiment, it may throw out the results (also a problem for z and G tests, of course).
Request for feedback: This is new – so please check it out and let me know what you think, and where the bugs are.
Reading the graph
- The horizontal axis is conversion rate expressed as a percentage.
- The area under the curve between any two points on the horizontal axis represents the probability that the conversion rate lies between those points.
- The vertical axis shows a scale that makes the whole area under each curve integrate to 1 – so that the area represents probability.
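To make “the area represents probability” concrete, here is one way to estimate the probability that the conversion rate lies between two points, by numerically integrating the posterior density. Again this is a sketch under the uniform-prior assumption (posterior Beta(successes + 1, failures + 1)); the function names are illustrative.

```python
import math

def beta_pdf(x, a, b):
    # Beta(a, b) density at x, normalised via log-gamma for numerical stability.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

def prob_rate_between(successes, trials, lo, hi, steps=10000):
    """P(lo < conversion rate < hi): the area under the posterior curve
    between lo and hi, estimated by trapezoidal integration.

    lo and hi must lie strictly inside (0, 1)."""
    a, b = successes + 1, trials - successes + 1
    h = (hi - lo) / steps
    total = 0.5 * (beta_pdf(lo, a, b) + beta_pdf(hi, a, b))
    total += sum(beta_pdf(lo + i * h, a, b) for i in range(1, steps))
    return total * h
```

Integrating over nearly the whole axis gives a value close to 1, as the vertical scale described above requires.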
The spread of the curve represents how precisely the experiment has measured the conversion rate.
The more the areas under the curves overlap, the less clearly your experiment has separated the probable conversion rates of the versions.
- If the means are reasonably well separated but the curves are wide, you need more trials.
- If the means are very close together and the curves are getting quite steep, there probably isn’t much difference between A and B in terms of conversion rate.
You will see that as your number of trials and conversions increases, the sharpness, and hopefully the separation, of the peaks increases. What you are aiming for is a clear signal: well-separated peaks.
Why use it?
A Bayesian approach to analysis of AB tests has many important advantages compared to approaches for estimating statistical significance.
You can obtain a very clear picture of the probable spread of conversion rates, even if you have a limited number of trials. You decide what level of confidence you need.
The probabilities shown in this graph are the ones most directly relevant to the business question: which version has the best conversion rate. There is no null hypothesis involved.
The traditional approach to analysis of split tests is to calculate some measure of the statistical significance of the result such as a p value. This has a few issues:
- It can be difficult to reach significance when measuring your final conversion rate, unless you have a lot of traffic and a reasonable conversion rate.
- The most commonly used formulas for calculating p values assume that you set your number of samples in advance of the test. If you repeatedly check for significance and stop the test as soon as you find it, you risk treating a result as more significant than it is. See how not to run an AB test.
- The p value tells you the probability of getting a result as extreme as the one you got, assuming the null hypothesis – that the conversion rates for A and B are the same. This is several steps away from telling you which of A or B is better.
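For concreteness, the traditional calculation being contrasted here looks roughly like the two-proportion z-test below. This is a generic sketch, not any particular tool’s implementation, and it quietly assumes the sample size was fixed in advance – the very assumption the second point above warns about.

```python
import math

def two_proportion_z_test(successes_a, trials_a, successes_b, trials_b):
    """Two-sided p value for the null hypothesis that A and B have
    the same conversion rate.

    Returns the probability of a difference at least this extreme
    if A and B really converted at the same (pooled) rate.
    """
    p_a = successes_a / trials_a
    p_b = successes_b / trials_b
    pooled = (successes_a + successes_b) / (trials_a + trials_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / trials_a + 1 / trials_b))
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal variable.
    return math.erfc(abs(z) / math.sqrt(2))
```

Note that the number it returns is a statement about the data under the null hypothesis, not the probability that A beats B.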
Assumptions and the maths
The calculation assumes that you are measuring a variable that has only two values – success and failure – and that the assumptions of a binomial distribution apply: independent trials, each with the same probability of success.
A uniform prior probability distribution is assumed for each conversion rate.
- Bayesian A/B testing with theory and code – The Technical
- Random inequalities V: beta distributions – John D. Cook
- Book: Bayesian Statistics: An Introduction – Peter M Lee (avail Amazon UK) – an approachable introduction and the first dead-tree book I’ve been compelled to buy for a while!
- Christopher Lee’s Lectures on Vimeo – a great introduction
Looking at analysis where the variable in question is not binary – for example, spend-per-customer or time-on-site.