Running controlled experiments to make product decisions with evidence, not opinion
A/B testing is the closest thing product has to a scientific experiment. You take two versions of something - a headline, a flow, a feature, a price - show each to a random half of your users, and measure which one performs better. No opinions, no HiPPOs (the highest-paid person's opinion), no “I think users prefer…” - just data 🔬
You define a hypothesis (“changing the CTA from ‘Sign up’ to ‘Start for free’ will increase conversion”), split your traffic randomly between the control (A) and the variant (B), run the experiment until you have statistical significance, and read the result.

The random split is what makes it powerful. Because users are randomly assigned, any difference in outcome between A and B can be attributed to the change you made - not to differences in the users themselves.
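In practice the random split is usually a deterministic hash of the user id, so each user's assignment is stable across sessions and devices. A minimal sketch in Python (the function name, experiment key, and bucket logic here are illustrative, not a specific tool's API):

```python
import hashlib
from collections import Counter

def assign_variant(user_id: str, experiment: str, variants=("A", "B")) -> str:
    """Deterministically bucket a user by hashing user id + experiment name.

    The same user always sees the same variant, and different experiments
    get independent, effectively random splits.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return variants[int(digest, 16) % len(variants)]

# Sticky assignment: the same user never flips between variants.
assert assign_variant("user-42", "cta-copy") == assign_variant("user-42", "cta-copy")

# Over many users the split comes out close to 50/50.
counts = Counter(assign_variant(f"user-{i}", "cta-copy") for i in range(10_000))
```

Hashing rather than coin-flipping matters: a user who gets variant B on Monday must still see B on Wednesday, or your measurements get contaminated.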
A clear hypothesis - “Changing X will cause Y because Z.” The because matters. If you don’t have a reason, you’re guessing, and a result that confirms a guess doesn’t teach you anything.

A single metric - Pick one primary metric before you run the test. Teams that go fishing through ten metrics after the fact will always find something that looks significant. That’s not learning, that’s confirmation bias wearing a lab coat 😅

Enough traffic - Statistical significance requires sample size, and for typical conversion rates that means hundreds or thousands of users per variant, not dozens. A “significant” result from a handful of users per variant is noise, not signal. Use a sample size calculator before you start.

Enough time - Run tests for at least one full business cycle (usually one to two weeks) to account for day-of-week effects. Stopping early when you see a positive result is a common way to get false positives.
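The sample size calculation itself is a back-of-the-envelope formula, not magic. A sketch using the standard two-proportion normal approximation (function name and defaults are illustrative; real calculators apply the same math with refinements):

```python
import math
from statistics import NormalDist

def sample_size_per_variant(p_base: float, mde: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Users needed per variant to detect an absolute lift of `mde`
    over a baseline conversion rate `p_base`, using a two-sided
    z-test for two proportions (normal approximation)."""
    p_var = p_base + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # ~0.84 for 80% power
    variance = p_base * (1 - p_base) + p_var * (1 - p_var)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)

# Detecting a 2-point lift on a 10% baseline takes a few thousand
# users per variant - usually far more than teams expect.
n = sample_size_per_variant(p_base=0.10, mde=0.02)
```

Note how the required sample grows as the effect you want to detect shrinks: halving the minimum detectable effect roughly quadruples the users you need, which is why small products struggle to test small changes.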
A/B testing tells you what works better. It doesn’t tell you why. A variant that wins can still leave you puzzled about the underlying reason - which limits how much you can generalise the learning.

It also requires sufficient traffic. If your product has a few hundred monthly active users, most A/B tests will never reach significance in a reasonable timeframe. Pretotyping or usability testing will serve you better at that scale.
The real value of A/B testing isn’t any individual result - it’s what it does to decision-making culture. When the team knows that opinions get tested rather than debated, meetings get shorter and egos matter less 🙌

Lesson learned: the most important A/B test culture habit is publishing the results of tests that showed no difference or went the wrong way. That’s where the real learning is - and where most teams go quiet.