Introduction to the Chi-Squared Algorithm in Shogun's A/B Testing Feature

Why A/B Testing is Impactful for Merchants

A/B Testing empowers merchants to make decisions based on concrete data rather than assumptions or intuition. By testing different variants of their Shogun pages (such as headlines, product images, call-to-action buttons, and layout), merchants gain valuable insights into what resonates best with their shoppers. It’s low-risk because the merchant controls how large a testing population is selected.

A/B Testing encourages merchants to iteratively improve their websites over time. Small tweaks can significantly improve conversion rates, ultimately boosting revenue.

Why Statistical Validity of Outcomes Matters

Ensuring statistical validity in A/B Testing outcomes is critical for discerning merchants. Trusting Shogun’s A/B Testing analytics enables merchants to distinguish meaningful performance enhancements from random irregularities. By validating outcomes, merchants can make informed decisions, feel confident they are publishing the best version of their Shogun pages, and drive sustained growth in the competitive e-commerce landscape.

Why Chi-Squared for Shogun's A/B Testing Feature?

A Chi-Squared test is particularly well-suited for testing where outcomes are categorical in nature, such as clicks or conversions. It is also robust and doesn't require assumptions about the underlying distribution of data. Finally, it is a widely accepted and established method in statistics, providing merchants with confidence in the reliability of the results generated by Shogun's A/B Testing feature.

By utilizing the Chi-Squared test in Shogun's A/B Testing feature, merchants can make data-driven decisions with confidence, ensuring that the observed differences between variants are not merely due to chance but are statistically significant.

What is Chi-Squared?

Chi-Squared (χ²) is a statistical method used to determine the significance of differences between observed and expected frequencies in categorical data. In the context of Shogun’s A/B testing feature, it helps to ascertain whether the differences observed between two variants (A and B) of a webpage, advertisement, or any other element are statistically significant or simply due to chance.

How Does Chi-Squared Work?

The Chi-Squared algorithm in A/B testing essentially compares the observed distribution of outcomes (such as clicks, conversions, or any other desired action) between the control (A) and variant (B) groups with the expected distribution if there were no difference between the groups. It follows these steps:

Formulation of Hypotheses:
- Null Hypothesis: The variant(s) have no impact on the target metric (the changes and the test metric are independent of each other)
- Alternative Hypothesis: The variant(s) have an impact on the target metric
Data Collection and Tabulation:
- Collect data on the outcomes of interest from both the control and variant.
- Tabulate this data into a contingency table showing the observed outcome frequencies for each group (e.g., the number of sessions that converted and did not convert for both the control and variant).
Calculation of Expected Frequencies:
- Calculate the expected frequencies for each cell in the contingency table under the assumption of no difference between the groups. This is typically done by applying the overall probability of each outcome to the total number of observations in each group.
Calculation of Chi-Squared Statistic:
- Compute the Chi-Squared statistic by comparing each cell's observed and expected frequencies in the contingency table. The formula for this is: $\chi^2 = \sum \frac{(O - E)^2}{E}$ where:
  - $O$ = Observed frequency
  - $E$ = Expected frequency
  - $\sum$ = Summation over all cells
Determination of Significance:
- Compare the calculated Chi-Squared statistic to a critical value from the Chi-Squared distribution with the appropriate degrees of freedom, which depend on the number of variants.
- If the calculated Chi-Squared value exceeds the critical value, the null hypothesis is rejected, indicating there is some impact based on the changes you made in your variant(s).
- Calculate the p-value, the probability of seeing a difference at least this large if the null hypothesis were true, and look for a value of 0.05 (5%) or lower. This threshold corresponds to the 95% confidence level: a difference this large would be unlikely to appear by chance alone.

Shogun, of course, does this for you.
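
For readers who want to follow the arithmetic, the steps above can be sketched in a few lines of Python. This is a minimal illustration, not Shogun's actual implementation: it assumes a single control and a single variant (a 2x2 table, one degree of freedom), applies no continuity correction, and the function name and the example counts are made up for the illustration.

```python
import math

def chi_squared_2x2(control_conversions, control_sessions,
                    variant_conversions, variant_sessions):
    """Return (chi2, p_value) for a control-vs-variant conversion table.

    Hypothetical helper for illustration; assumes every expected count is positive.
    """
    # Contingency table: rows are groups, columns are converted / not converted.
    observed = [
        [control_conversions, control_sessions - control_conversions],
        [variant_conversions, variant_sessions - variant_conversions],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count if group and outcome were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected

    # With one degree of freedom, the chi-squared tail probability
    # equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Made-up example: 100 of 1,000 control sessions convert vs. 130 of 1,000 variant sessions.
chi2, p_value = chi_squared_2x2(100, 1000, 130, 1000)
print(f"chi2={chi2:.3f}, p={p_value:.4f}, significant={p_value <= 0.05}")
```

In this made-up example the statistic exceeds 3.841, the critical value for one degree of freedom at the 0.05 level, so the null hypothesis would be rejected.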

How is a winner determined?

Chi-Squared tells us whether the difference we observed is likely due to chance; however, it does not take directionality into consideration. Thus, we can have confidence that the difference is statistically significant, but that difference could be negative or positive.

A winner is determined by examining each variant's conversion rate and looking for the variant with the greatest improvement over the control (default variant).
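
As a rough sketch of that comparison, the snippet below picks the variant whose conversion rate improves most over the control. The function name, the input shape, and the choice to return no winner when nothing beats the control are assumptions made for this illustration, not a description of Shogun's code.

```python
def pick_winner(control, variants):
    """Pick the variant with the greatest conversion-rate improvement.

    Hypothetical helper: `control` is a (conversions, sessions) tuple and
    `variants` maps a variant name to a (conversions, sessions) tuple.
    Returns (name, improvement), or (None, 0.0) if no variant beats the control.
    """
    control_rate = control[0] / control[1]
    best_name, best_improvement = None, 0.0
    for name, (conversions, sessions) in variants.items():
        # Improvement is the difference in conversion rate versus the control.
        improvement = conversions / sessions - control_rate
        if improvement > best_improvement:
            best_name, best_improvement = name, improvement
    return best_name, best_improvement

# Made-up counts for illustration.
name, improvement = pick_winner((100, 1000), {"B": (130, 1000), "C": (90, 1000)})
print(name, round(improvement, 3))
```

Because every variant is compared against the same control rate, ranking by absolute difference and ranking by relative lift select the same variant.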

How do I determine what state my experiment is in?

While Chi-Squared tells us whether any differences we observe are statistically significant, it does not indicate a state or tell you how to interpret the results. Shogun uses the above statistical information to provide a status and an estimated time to significance, which give additional clarity on the experiment's state.

What are the different statuses?

No data:
- We did not observe any traffic to the page after the experiment was published. You will need to direct traffic to this page.
Insufficient data:
- We have observed a minimal amount of traffic to this page since the experiment was published but not enough to draw any conclusions. You should continue to direct traffic to this page.
Inconclusive:
- We have observed enough data (reached the desired sample size); however, the results are too close to reach the desired significance level within a reasonable amount of time. You should end the experiment and try something new.
Directional:
- We have observed some data. While the results haven't reached the desired significance level of 0.05, they have reached 0.25 or lower. Given more time, the experiment is likely to reach the desired significance level.
Not significant:
- We have observed some data, but the results have yet to reach the directional significance level of 0.25 or lower.
Significant:
- We have reached the desired significance level of 0.05 (95% confidence).
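
The statuses above can be read as a small decision procedure. The sketch below is one possible arrangement and is not taken from Shogun: the order of the checks, the `min_sessions` cutoff for "Insufficient data", and the function and parameter names are all assumptions. Only the 0.05 and 0.25 thresholds come from the descriptions above.

```python
def experiment_status(sessions, required_sessions, p_value, min_sessions=100):
    """Map experiment measurements to a status label.

    Hypothetical helper: `sessions` is the traffic observed so far,
    `required_sessions` is the desired sample size, and `min_sessions`
    is an assumed cutoff for "a minimal amount of traffic".
    """
    if sessions == 0:
        return "No data"
    if sessions < min_sessions:
        return "Insufficient data"
    if p_value <= 0.05:
        return "Significant"
    if sessions >= required_sessions:
        # Desired sample size reached without reaching significance.
        return "Inconclusive"
    if p_value <= 0.25:
        return "Directional"
    return "Not significant"

print(experiment_status(sessions=500, required_sessions=2504, p_value=0.2))
```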

How is time to significance calculated?

Time to significance is estimated from two quantities: the required sample size and the sample collection rate.

Calculate required sample size per variant:
- Calculate the number of samples (sessions) required to reach a 95% confidence level based on how traffic is distributed between variants.
- By default, we assume a conversion rate of 10% with a desired effect size of a 25% lift. As the experiment matures and we start to observe a conversion rate on the control (default variant), that observed rate is used as the baseline instead.
Calculate sample rate:
- Calculate the number of samples collected in the lowest performing variant since the experiment started and divide by the number of minutes the experiment has been running for.
Calculate time:
- Divide the required sample size by the sample rate to get the number of minutes it is estimated to take to reach the desired significance level and effect size.
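
The three steps above can be sketched as follows. The exact sample-size formula Shogun uses is not stated here, so this example substitutes a common two-proportion approximation and assumes an even traffic split and 80% statistical power. The function names, the power value, and the example traffic numbers are assumptions for illustration only.

```python
import math
from statistics import NormalDist

def required_sample_size(baseline_rate=0.10, relative_lift=0.25,
                         alpha=0.05, power=0.80):
    """Approximate sessions needed per variant (two-sided test, even split).

    Hypothetical helper; `power` is an assumption not given in the article.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

def minutes_to_significance(required, collected, elapsed_minutes):
    """Divide the required sample size by the observed sample rate."""
    if collected == 0:
        return math.inf  # no traffic yet, so no estimate is possible
    sample_rate = collected / elapsed_minutes  # sessions per minute
    return required / sample_rate

required = required_sample_size()  # defaults: 10% baseline, 25% lift
# Made-up traffic: 500 sessions collected over 1,000 minutes.
print(required, minutes_to_significance(required, collected=500, elapsed_minutes=1000))
```

A lower baseline conversion rate or a smaller desired lift increases the required sample size, which is why the estimate can shift once the observed control rate replaces the 10% default.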