A/B Significance
Test whether the difference between two conversion rates is statistically significant using a two‑proportion z‑test. Enter visitors and conversions for variants A and B, along with the alpha (significance) threshold.
What is A/B significance?
Significance tests estimate the likelihood that an observed difference is due to chance rather than a real effect. A two‑proportion z‑test compares the conversion rates of two variants.
How to use this calculator
Provide visitors and conversions for A and B, and choose an alpha (commonly 5%). The tool returns conversion rates, lifts, p‑value, and significance.
Why it matters
Significance guards against false positives from random noise, helping you ship changes that truly improve outcomes.
Assumptions & limitations
Assumes independent samples and sample sizes large enough for the normal approximation to hold. For rare events or small samples, consider exact tests (e.g., Fisher's exact test) or Bayesian approaches.
Formula and interpretation
Let CRA = conversionsA ÷ visitorsA and CRB = conversionsB ÷ visitorsB. The pooled rate is p = (conversionsA + conversionsB) ÷ (visitorsA + visitorsB). The standard error is SE = √( p(1 − p) (1/visitorsA + 1/visitorsB) ). The test statistic is z = (CRB − CRA) ÷ SE. The two‑tailed p‑value is the probability of observing a difference at least as extreme as the one measured under the null hypothesis (no true difference).
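The formula maps directly to code. Here is a minimal Python sketch using only the standard library (the function name and signature are illustrative, not the calculator's actual implementation):

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, vis_a, conv_b, vis_b):
    """Two-tailed two-proportion z-test on conversion rates."""
    cr_a = conv_a / vis_a
    cr_b = conv_b / vis_b
    p = (conv_a + conv_b) / (vis_a + vis_b)           # pooled rate
    se = sqrt(p * (1 - p) * (1 / vis_a + 1 / vis_b))  # standard error
    z = (cr_b - cr_a) / se
    # Two-tailed p-value from the standard normal CDF, Phi(x) = 0.5*(1+erf(x/sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value
```

Using `erf` avoids any third-party dependency; in practice a statistics library (e.g., SciPy or statsmodels) provides equivalent, better-tested routines.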
Sample size, power, and MDE
Statistical power is the probability of detecting a true effect. Larger samples increase power and reduce the minimum detectable effect (MDE). Define a practical lift you care about, pick α (e.g., 5%) and desired power (e.g., 80%), then estimate the required sample size. Avoid underpowered tests that frequently yield inconclusive results.
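The sizing step can be sketched with the classic normal-approximation formula, again using only Python's standard library (the helper name and the example rates are illustrative assumptions):

```python
from statistics import NormalDist

def sample_size_per_variant(base_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant for a two-tailed
    two-proportion z-test, via the normal-approximation formula."""
    p1 = base_rate
    p2 = base_rate + mde_abs
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return int(n) + 1  # round up to whole visitors

# Detecting a +0.5 pp absolute lift from a 2.5% baseline
# at alpha = 5% and 80% power needs roughly 17,000 visitors per variant.
n = sample_size_per_variant(0.025, 0.005)
```

Note how the required sample grows with the square of a shrinking MDE: halving the detectable lift roughly quadruples the traffic needed.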
One‑tailed vs two‑tailed tests
Two‑tailed tests detect differences in either direction and are the safe default when a change could help or hurt. One‑tailed tests can be used when you only care about improvements in one direction, but the direction must be pre‑registered and not changed mid‑test to avoid bias.
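A small illustration of how the two choices differ on the same data: a z statistic near 1.8 (a made-up value, not from the calculator) clears the 5% bar one‑tailed but not two‑tailed:

```python
from math import sqrt, erf

def normal_sf(z):
    """Upper-tail probability of the standard normal distribution."""
    return 1 - 0.5 * (1 + erf(z / sqrt(2)))

z = 1.8                        # hypothetical test statistic
p_one = normal_sf(z)           # one-tailed: only "B beats A" counts
p_two = 2 * normal_sf(abs(z))  # two-tailed: a difference either way counts
# p_one is about 0.036 (< 0.05); p_two is about 0.072 (> 0.05)
```

This is exactly why switching tails after seeing the data is biased: it lets a borderline result be re-labeled as significant.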
Peeking and multiple comparisons
Repeatedly checking significance before the planned sample size is reached (“peeking”) inflates the false‑positive rate. Use fixed horizons or sequential methods designed for interim looks. When testing many variants or metrics, apply a correction such as Bonferroni (controls the family‑wise error rate) or Benjamini‑Hochberg (controls the false discovery rate).
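Both corrections are simple to apply to a list of p‑values. A plain-Python sketch of each (for illustration; a statistics library's implementations are preferable in production):

```python
def bonferroni(p_values, alpha=0.05):
    """Reject only p-values below alpha / m (family-wise error control)."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

def benjamini_hochberg(p_values, q=0.05):
    """Benjamini-Hochberg step-up procedure (false discovery rate control):
    sort p-values, find the largest rank k with p_(k) <= k*q/m, and reject
    the k smallest p-values."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            k_max = rank
    reject = [False] * m
    for rank, i in enumerate(order, start=1):
        if rank <= k_max:
            reject[i] = True
    return reject
```

Bonferroni is stricter (fewer rejections, fewer false positives); Benjamini‑Hochberg trades a controlled share of false discoveries for more power across many tests.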
Step‑by‑step example
With A = 500 conversions / 20,000 visitors (2.50%) and B = 600 / 20,000 (3.00%), the absolute lift is +0.50 percentage points and the relative lift is +20%. The z‑test gives z ≈ 3.06 and a two‑tailed p ≈ 0.002; since p < 0.05, the result is statistically significant.
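The example can be reproduced end to end in a few lines of standard-library Python (a sketch, not the calculator's actual code):

```python
from math import sqrt, erf

vis_a, conv_a = 20_000, 500
vis_b, conv_b = 20_000, 600

cr_a, cr_b = conv_a / vis_a, conv_b / vis_b     # 2.50% and 3.00%
abs_lift = cr_b - cr_a                          # +0.005, i.e. +0.50 pp
rel_lift = abs_lift / cr_a                      # +0.20, i.e. +20%

p = (conv_a + conv_b) / (vis_a + vis_b)         # pooled rate
se = sqrt(p * (1 - p) * (1 / vis_a + 1 / vis_b))
z = abs_lift / se
p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-tailed
```

Running this yields z of roughly 3.06 and a p‑value of roughly 0.002, below the 0.05 threshold.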
Best practices
- Define success metrics and stopping rules before launching tests.
- Segment by traffic source and device to check consistency.
- Run long enough to cover typical weekly seasonality.
- Focus on practical significance (impact, cost) alongside p‑values.
- Validate winners with follow‑up experiments or rollout monitoring.