A $2\sigma$-test is a quick “back of the envelope” technique for evaluating whether the outcome of an experiment is sufficient evidence to reject a null hypothesis.

<aside> <img src="/icons/fleur-de-lis_purple.svg" alt="/icons/fleur-de-lis_purple.svg" width="40px" />

Definition ($2\sigma$-rule for Rejection Criteria)

Assumptions

  1. The null hypothesis $H_0$, alternative hypothesis $H_1$, and test statistic $X$ are defined, but you are seeking a good first guess for a rejection criterion.
  2. Your probability model for $X$ is approximately Gaussian when the null hypothesis is true.
  3. The mean, $\mu = E_{H_0}(X)$, and standard deviation, $\sigma =\mathrm{SD}_{H_0}(X)$, are known for your test statistic when the null hypothesis $H_0$ is true.

Then the $2\sigma$-rule for constructing a rejection region is

$$ \mathcal{R} = \{X < \mu - 2 \sigma\} \, \cup \, \{X > \mu + 2\sigma\}. $$

</aside>
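The rejection region in the definition can be sketched as a small Python function. This is only an illustration: the function name `two_sigma_reject` and the coin-flip numbers are hypothetical, chosen so that the null model is $\text{Binomial}(100, 0.5)$ with $\mu = 50$ and $\sigma = 5$.

```python
import math


def two_sigma_reject(x, mu, sigma):
    """Return True if x lies in the 2-sigma rejection region.

    mu and sigma are the mean and standard deviation of the test
    statistic when the null hypothesis is true.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Reject when x is more than two standard deviations from mu.
    return x < mu - 2 * sigma or x > mu + 2 * sigma


# Hypothetical example: 100 flips of a coin that is fair under H0.
n, p = 100, 0.5
mu = n * p                          # 50.0
sigma = math.sqrt(n * p * (1 - p))  # 5.0

print(two_sigma_reject(62, mu, sigma))  # True: 62 > 60
print(two_sigma_reject(55, mu, sigma))  # False: 40 <= 55 <= 60
```

Because the inequalities in the definition are strict, an outcome exactly on the boundary (here $X = 40$ or $X = 60$) is not rejected.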

Example: Pfizer COVID Vaccine.

Frequently Asked Questions

What are some circumstances when this rule does not work?

The $2\sigma$-rule most often leads to errors in your statistical decisions when your probability model is skewed, or when the right and left tails of the distribution look substantially different from those of a Gaussian distribution. Skewness can happen when a binomial distribution has a very low or very high success probability. The tails of a HyperGeometric distribution can look non-Gaussian if the total population size $N$ is not large compared to the size of the target population $K$.
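To see the effect of skewness concretely, the sketch below computes the exact probability that a binomial test statistic lands in each half of the $2\sigma$ rejection region. The helper names and the parameter choices ($n = 100$ with $p = 0.02$ versus $p = 0.5$) are assumptions made for illustration only.

```python
import math


def binom_pmf(k, n, p):
    """Exact Binomial(n, p) probability of k successes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def two_sigma_tail_probs(n, p):
    """Exact probabilities of the lower and upper 2-sigma tails
    when X ~ Binomial(n, p)."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    lo, hi = mu - 2 * sigma, mu + 2 * sigma
    lower = sum(binom_pmf(k, n, p) for k in range(n + 1) if k < lo)
    upper = sum(binom_pmf(k, n, p) for k in range(n + 1) if k > hi)
    return lower, upper


# Hypothetical skewed case: very low success probability.
skew_lower, skew_upper = two_sigma_tail_probs(100, 0.02)
# Hypothetical symmetric case for comparison.
sym_lower, sym_upper = two_sigma_tail_probs(100, 0.5)

print("p = 0.02:", skew_lower, skew_upper)
print("p = 0.50:", sym_lower, sym_upper)
```

With $p = 0.02$ the lower cutoff $\mu - 2\sigma = 2 - 2.8$ is negative, so the lower tail is empty and all of the rejection probability sits in the upper tail. With $p = 0.5$ the two tails carry equal probability, as they would for a Gaussian.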

What is the “binomial heuristic” and why do we use it?

For the HyperGeometric distribution, the formula for the standard deviation is a bit more complicated than it is for the Binomial distribution. Luckily, when the total population size is large enough, the two distributions look similar.

<aside> <img src="/icons/fleur-de-lis_purple.svg" alt="/icons/fleur-de-lis_purple.svg" width="40px" />

Definition (Binomial Heuristic for the Hypergeometric Distribution)

Suppose that $X \sim \text{HyperGeom}(N,K,n)$. Then the true mean and standard deviation can be written

$$ \mu = n \tilde{p}, \text{ and } \sigma = \sqrt{\frac{N-n}{N-1} \, n \,\tilde{p} \,(1 - \tilde{p})}, \quad \text{where} \quad \tilde{p} := K/N. $$

We can think of $\tilde{p}$ as the proportion of the total population that is in the focal group.

The binomial heuristic for the mean and standard deviation is

$$ \mu = n \tilde{p} \text{ and } \tilde{\sigma} = \sqrt{n \tilde p (1 - \tilde p)}. $$

</aside>

Notice that $\tilde \sigma$ is just $\sigma$ with the pre-factor $(N-n)/(N-1)$ under the square root dropped. We say that the HyperGeometric distribution is approximately Binomial when $N > 10n$. When this inequality holds, notice that

$$ \frac{N - n}{N - 1} = \frac{1 - \frac{n}{N}}{1 - \frac{1}{N}} \approx 1 - \frac{n}{N} > 1 - \frac{1}{10} = 0.9. $$

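So, when the Hypergeometric is approximately Binomial, the pre-factor lies between $0.9$ and $1$. Since $\sigma = \sqrt{(N-n)/(N-1)} \, \tilde \sigma$, the true $\sigma$ is between $\sqrt{0.9} \, \tilde \sigma \approx 0.95 \, \tilde \sigma$ and $\tilde \sigma.$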
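As a rough numerical check, the sketch below evaluates both formulas from the definition for one hypothetical case, $N = 1000$, $K = 300$, $n = 50$, which satisfies $N > 10n$. The function names are illustrative and not taken from any library.

```python
import math


def hypergeom_sd(N, K, n):
    """True standard deviation of HyperGeom(N, K, n)."""
    p = K / N
    return math.sqrt((N - n) / (N - 1) * n * p * (1 - p))


def binomial_heuristic_sd(N, K, n):
    """Binomial heuristic: same formula without the pre-factor."""
    p = K / N
    return math.sqrt(n * p * (1 - p))


# Hypothetical numbers chosen so that N > 10 n.
N, K, n = 1000, 300, 50
sigma = hypergeom_sd(N, K, n)
sigma_tilde = binomial_heuristic_sd(N, K, n)
ratio = sigma / sigma_tilde  # equals sqrt((N - n) / (N - 1))

print(round(sigma, 3), round(sigma_tilde, 3), round(ratio, 3))
```

The ratio $\sigma / \tilde\sigma$ is the square root of the pre-factor, so for any case with $N > 10n$ it stays above $\sqrt{0.9} \approx 0.95$.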

How do I know how well my $2\sigma$-test is working?

This is why we have the concepts of significance and power. The significance is the probability of rejecting the null hypothesis when it is true. Power is the probability of rejecting the null hypothesis when a specific version of the alternative hypothesis is true. If $X$ is exactly Gaussian under $H_0$, the $2\sigma$-rule has a significance of about $0.046$.
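Both quantities can be computed directly when the test statistic is exactly Gaussian. The sketch below is a minimal illustration under two assumptions that are not part of the definition above: the standard deviation is the same under $H_0$ and $H_1$, and the specific alternative shifts the mean by $3\sigma$. The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


def rejection_probability(true_mean, mu0, sigma):
    """P(X in the 2-sigma rejection region built from mu0 and sigma)
    when X ~ Normal(true_mean, sigma^2).

    Assumption: sigma is the same under the null and the alternative.
    """
    lower = normal_cdf((mu0 - 2 * sigma - true_mean) / sigma)
    upper = 1 - normal_cdf((mu0 + 2 * sigma - true_mean) / sigma)
    return lower + upper


# Significance: the null hypothesis is true, so true_mean equals mu0.
significance = rejection_probability(true_mean=0.0, mu0=0.0, sigma=1.0)

# Power against one hypothetical alternative: mean shifted by 3 sigma.
power = rejection_probability(true_mean=3.0, mu0=0.0, sigma=1.0)

print(round(significance, 4))  # 0.0455
print(round(power, 4))         # 0.8413
```

When the model is only approximately Gaussian, the true significance can differ from $0.0455$, which is one reason to compute it from the exact distribution when that is feasible.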