The Bonferroni Correction
Last updated on 2025-09-23
Estimated time: 45 minutes
Overview
Questions
- What is one way to control the family-wise error rate?
Objectives
- Apply the Bonferroni procedure to identify the true positives.
The Bonferroni Correction
Now that we have learned about the Family Wise Error Rate (FWER), we describe what we can actually do to control it. In practice, we want to choose a procedure that guarantees the FWER is smaller than a predetermined value such as 0.05. We can keep it general and instead of 0.05, use \(\alpha\) in our derivations.
Since we are now describing what we do in practice, we no longer have the advantage of knowing the truth. Instead, we pose a procedure and try to estimate the FWER. Let’s consider the naive procedure: “reject all the hypotheses with p-value < 0.01”. For illustrative purposes we will assume all the tests are independent (in the case of testing diets this is a safe assumption; in the case of genes it is not so safe, since some groups of genes act together). Let \(p_1,\dots,p_{10000}\) be the p-values we get from each test. These are independent random variables, so:
\[ \begin{aligned} \mbox{Pr}(\mbox{at least one rejection}) &= 1 -\mbox{Pr}(\mbox{no rejections}) \\ &= 1 - \prod_{i=1}^{10000} \mbox{Pr}(p_i>0.01) \\ &= 1-0.99^{10000} \approx 1 \end{aligned} \]
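A quick way to evaluate this probability is to type the formula into R:
R
1 - 0.99^10000  ## so close to 1 that double precision returns exactly 1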
Or if you want to use simulations:
R
B <- 10000  ## number of Monte Carlo replications
## in each replication, is the smallest of 10,000 null p-values below 0.01?
minpval <- replicate(B, min(runif(10000, 0, 1)) < 0.01)
mean(minpval)  ## proportion of replications with at least one rejection
OUTPUT
[1] 1
So our FWER is 1! This is not what we were hoping for. If we wanted it to be lower than \(\alpha=0.05\), we failed miserably.
So what do we do to make the probability of a mistake lower than \(\alpha\)? Using the derivation above, we can change the procedure by selecting a cutoff more stringent than the 0.01 used previously, so that the probability of at least one mistake is 5%. Namely, by noting that:
\(\mbox{Pr}(\mbox{at least one rejection}) = 1-(1-k)^{10000}\)
and solving for \(k\), we get \(1-(1-k)^{10000}=0.05 \implies k = 1-0.95^{1/10000} \approx 5.13 \times 10^{-6}\)
This now gives a specific example of a procedure; this one is actually called Sidak’s procedure. Specifically, we define a set of instructions, such as “reject all the null hypotheses for which the p-value is less than \(5.13 \times 10^{-6}\)”. Then, knowing the p-values are random variables, we use statistical theory to compute how many mistakes, on average, we are expected to make if we follow this procedure. More precisely, we compute bounds on these rates; that is, we show that they are smaller than some predetermined value. There is a preference in the life sciences to err on the side of being conservative.
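As a sanity check, we can compute Sidak’s cutoff directly in R (for \(\alpha = 0.05\) and the \(m = 10{,}000\) tests used above):
R
alpha <- 0.05
m <- 10000
1 - (1 - alpha)^(1/m)  ## Sidak cutoff: approximately 5.13e-06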
A problem with Sidak’s procedure is that it assumes the tests are independent. It therefore only controls FWER when this assumption holds. The Bonferroni correction is more general in that it controls FWER even if the tests are not independent. As with Sidak’s procedure we start by noting that:
\(FWER = \mbox{Pr}(V>0) \leq \mbox{Pr}(V>0 \mid \mbox{all nulls are true})\)
or, using the notation introduced previously:
\(\mbox{Pr}(V>0) \leq \mbox{Pr}(V>0 \mid m_1=0)\)
The Bonferroni procedure sets \(k=\alpha/m\) since we can show that:
\[ \begin{aligned} \mbox{Pr}(V>0 \,\mid \, m_1=0) &= \mbox{Pr}\left( \min_i \{p_i\} \leq \frac{\alpha}{m} \mid m_1=0 \right)\\ &\leq \sum_{i=1}^m \mbox{Pr}\left(p_i \leq \frac{\alpha}{m} \right)\\ &= m \frac{\alpha}{m}=\alpha \end{aligned} \]
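For independent tests we can also compute the Bonferroni FWER exactly and confirm that it indeed stays below \(\alpha\); here is a quick check in R with \(m = 10{,}000\):
R
alpha <- 0.05
m <- 10000
1 - (1 - alpha/m)^m  ## exact FWER under independence: about 0.0488, below alpha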
Controlling the FWER at 0.05 is a very conservative approach. Using the p-values computed in the previous section…
R
set.seed(1)
population <- unlist(read.csv(file = "./data/femaleControlsPopulation.csv"))
m <- 10000   ## total number of tests (diets)
N <- 12      ## sample size per group
p0 <- 0.90   ## 10% of diets work, 90% don't
m0 <- m * p0 ## number of true null hypotheses
m1 <- m - m0 ## number of false null hypotheses
nullHypothesis <- c(rep(TRUE, m0), rep(FALSE, m1))
delta <- 3   ## effect size for the diets that work
pvals <- sapply(1:m, function(i){
  control <- sample(population, N)
  treatment <- sample(population, N)
  if (!nullHypothesis[i]) treatment <- treatment + delta
  t.test(treatment, control)$p.value
})
…we note that only:
R
sum(pvals < 0.05/10000)
OUTPUT
[1] 0
are called significant after applying the Bonferroni procedure, despite having 1,000 diets that work.
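To see the cost of this conservative choice, we can count, among the 1,000 diets that truly work, how many fall below the uncorrected 0.05 cutoff versus the Bonferroni cutoff. This sketch reuses the pvals, nullHypothesis, and m objects defined in the simulation above; the exact counts will depend on the random seed:
R
## rejections among the truly effective diets at each cutoff
sum(pvals[!nullHypothesis] < 0.05)      ## uncorrected cutoff
sum(pvals[!nullHypothesis] < 0.05 / m)  ## Bonferroni cutoff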
Exercises
The following exercises should help you understand the concept of an error controlling procedure. You can think of it as defining a set of instructions, such as “reject all the null hypotheses for which p-values < 0.0001” or “reject the null hypotheses for the 10 features with smallest p-values”. Then, knowing the p-values are random variables, we use statistical theory to compute how many mistakes, on average, we will make if we follow this procedure. More precisely, we commonly find bounds on these rates, meaning that we show that they are smaller than some predetermined value. As described in the text, we can compute different error rates. The FWER tells us the probability of having at least one false positive. The FDR is the expected proportion of rejected null hypotheses that are actually true.
Note 1: the FWER and FDR are not procedures, but error rates. We will review procedures here and use Monte Carlo simulations to estimate their error rates.
Note 2: We sometimes use the colloquial term “pick genes that” meaning “reject the null hypothesis for genes that”.
Exercise 1
We have learned about the family-wise error rate (FWER). This is the probability of incorrectly rejecting the null at least once. Using the notation introduced above, this probability is written like this: \(\mbox{Pr}(V>0)\).

What we want to do in practice is choose a procedure that guarantees this probability is smaller than a predetermined value such as 0.05. Here we keep it general and, instead of 0.05, we use \(\alpha\).

We have already learned that the procedure “pick all the genes with p-value < 0.05” fails miserably, as we have seen that \(\mbox{Pr}(V>0) \approx 1\). So what else can we do?

The Bonferroni procedure assumes we have computed p-values for each test and asks what constant \(k\) we should pick so that the procedure “pick all genes with p-value less than \(k\)” has \(\mbox{Pr}(V>0) = 0.05\). Furthermore, we typically want to be conservative rather than lenient, so we accept a procedure that has \(\mbox{Pr}(V>0) \leq 0.05\).

So the first result we rely on is that this probability is largest when all the null hypotheses are true:

\(\mbox{Pr}(V>0) \leq \mbox{Pr}(V>0 \mid \mbox{all nulls are true})\)

or:

\(\mbox{Pr}(V>0) \leq \mbox{Pr}(V>0 \mid m_1=0)\)

We showed that if the tests are independent then:

\(\mbox{Pr}(V>0 \mid m_1=0) = 1-(1-k)^m\)

and we pick \(k\) so that \(1-(1-k)^m = \alpha \implies k = 1-(1-\alpha)^{1/m}\).

Now this requires the tests to be independent. The Bonferroni procedure does not make this assumption and, as we previously saw, sets \(k=\alpha/m\), showing that with this choice of \(k\) the procedure results in \(\mbox{Pr}(V>0) \leq \alpha\).

In R define
alphas <- seq(0,0.25,0.01)
Make a plot of \(\alpha/m\) and \(1-(1-\alpha)^{1/m}\) for various values of \(m > 1\).
alphas <- seq(0, 0.25, 0.01)  ## range of alpha values
m <- 10000  ## try other values of m > 1 as well
plot(alphas/m, 1 - (1 - alphas)^(1/m), xlab = "Bonferroni cutoff",
     ylab = "Sidak cutoff", main = "p-value cutoffs")
abline(0, 1)  ## y = x reference line
Exercise 2
Which procedure is more conservative, Bonferroni’s or Sidak’s?
A) They are the same.
B) Bonferroni’s.
C) Depends on m.
D) Sidak’s.
Bonferroni’s procedure is more conservative (choice B). Here “conservative” refers to strictness: the p-value cutoff for significance is lower in Bonferroni’s procedure, so it rejects fewer hypotheses.
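We can also compare the two cutoffs numerically; for example, with \(m = 10{,}000\) tests:
R
m <- 10000
0.05 / m              ## Bonferroni cutoff: 5e-06
1 - (1 - 0.05)^(1/m)  ## Sidak cutoff: about 5.13e-06, slightly larger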
Exercise 3
To simulate the p-value results of, say 8,793 t-tests for which the
null is true, we don’t actually have to generate the original data. We
can generate p-values for a uniform distribution like this:
pvals <- runif(8793,0,1)
. Using what we have learned,
set the cutoff using the Bonferroni correction and report back the FWER.
Set the seed at 1 and run 10,000 simulation.
set.seed(1)
m <- 8793  ## number of tests in each simulation
bonf_res <- replicate(10000, {
  pvals <- runif(m, 0, 1)   ## null p-values are uniform
  sum(pvals < 0.05 / m)     ## number of Bonferroni rejections
})
mean(bonf_res > 0)  ## estimated FWER
Exercise 4
Using the same seed, repeat the previous exercise, but for Sidak’s cutoff.
set.seed(1)
m <- 8793  ## number of tests in each simulation
sidak_res <- replicate(10000, {
  pvals <- runif(m, 0, 1)
  sum(pvals < 1 - (1 - 0.05)^(1/m))  ## number of Sidak rejections
})
mean(sidak_res > 0)  ## estimated FWER
Key Points
- The Bonferroni technique controls the FWER by dividing a predetermined alpha level (e.g. \(\alpha = 0.05\)) by the number of inferential tests performed and using the result as the p-value cutoff.
- The Bonferroni correction is very strict and conservative.