Proof of Bessel's Correction

Bessel's correction is the division of the sample variance by $N - 1$ rather than $N$. I walk the reader through a quick proof that this correction results in an unbiased estimator of the population variance.

Let $X = \{ X_1, X_2, \dots, X_N \}$ be a random sample of $N$ i.i.d. random variables. Let $\bar{X}$ denote the sample mean,

$$
\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n. \tag{1}
$$

When computing the sample variance $s^2$, students are told to divide by $N - 1$ rather than $N$:

$$
s^2 = \frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2. \tag{2}
$$

When first learning about this fact, I was shown computer simulations but no mathematical proof of why this must hold. The goal of this post is to provide a quick proof of why this correction makes sense.
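As a sketch of the kind of simulation mentioned above (the sample size, distribution, and trial count here are arbitrary choices, not from the original post): averaging the divide-by-$N$ estimator over many samples underestimates the true variance, while dividing by $N - 1$ does not.

```python
import numpy as np

# Monte Carlo sketch: draw many samples of size N from a population with
# known variance, then compare the naive estimator (divide by N) against
# the corrected one (divide by N - 1).
rng = np.random.default_rng(0)
N, trials = 5, 100_000
sigma2 = 4.0  # population variance of N(0, 2^2)

samples = rng.normal(loc=0.0, scale=2.0, size=(trials, N))
naive = samples.var(axis=1, ddof=0).mean()      # divides by N
corrected = samples.var(axis=1, ddof=1).mean()  # divides by N - 1

print(f"population variance: {sigma2}")
print(f"naive estimator:     {naive:.3f}")      # systematically too small
print(f"corrected estimator: {corrected:.3f}")  # close to 4.0
```

NumPy's `ddof` ("delta degrees of freedom") argument controls exactly this choice of denominator: `ddof=0` divides by $N$, `ddof=1` applies Bessel's correction.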

The proof outline is straightforward: we need to show that the estimator in Equation $4$ (below) is biased, and that we can correct this bias by dividing by $N - 1$ rather than $N$. For an estimator to be unbiased, the expectation of that estimator must equal the population parameter. In our case, if the sample variance is $s^2$ and the population variance is $\sigma^2$, we want

$$
\mathbb{E}[s^2] = \sigma^2. \tag{3}
$$

Let’s begin.

Proof

Let’s prove that the following estimator for the population variance is biased:

$$
s^2 = \frac{1}{N} \sum_{n=1}^{N} (X_n - \bar{X})^2. \tag{4}
$$

First, let’s take the expectation of this estimator and manipulate it:

$$
\begin{aligned} \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} (X_n - \bar{X})^2\right] &= \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} (X_n^2 - 2 X_n \bar{X} + \bar{X}^2) \right] \\ &= \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2 - 2 \bar{X} \frac{1}{N} \sum_{n=1}^{N} X_n + \frac{1}{N} \sum_{n=1}^{N} \bar{X}^2 \right] \\ &\stackrel{\star}{=} \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2 \right] - \mathbb{E}\left[2 \bar{X}^2\right] + \mathbb{E}\left[\bar{X}^2 \right] \\ &= \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2\right] - \mathbb{E}\left[\bar{X}^2 \right] \\ &\stackrel{\dagger}{=} \mathbb{E}\left[ X_n^2 \right] - \mathbb{E} \left[ \bar{X}^2 \right]. \end{aligned} \tag{5}
$$

Note that step $\star$ holds because

$$
\sum_{n=1}^{N} X_n = N \bar{X}, \tag{6}
$$

while step $\dagger$ holds because the data are i.i.d., i.e.

$$
\mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2 \right] = \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}\left[ X_n^2 \right] = \mathbb{E}\left[ X_n^2 \right]. \tag{7}
$$

Now note that since the $X_n$ are i.i.d., all $X_n \in X$ have the same variance. Furthermore, recall that for any random variable $Y$,

$$
\begin{aligned} \mathbb{V}[Y] &= \mathbb{E}[Y^2] - \mathbb{E}[Y]^2, \\ &\Downarrow \\ \mathbb{E}[Y^2] &= \mathbb{V}[Y] + \mathbb{E}[Y]^2. \end{aligned} \tag{8}
$$

So we can write

$$
\begin{aligned} \mathbb{E}\left[ X_n^2 \right] &= \mathbb{V}[X_n] + \mathbb{E}[X_n]^2 \\ &= \sigma^2 + \mu^2, \\\\ \mathbb{E} \left[ \bar{X}^2 \right] &= \mathbb{V}[\bar{X}] + \mathbb{E}[\bar{X}]^2 \\ &\stackrel{\star}{=} \frac{\sigma^2}{N} + \mu^2. \end{aligned} \tag{9}
$$

Step $\star$ holds because

$$
\begin{aligned} \mathbb{V}[\bar{X}] &= \mathbb{V}\left[\frac{1}{N} \sum_{n=1}^{N} X_n \right] \\ &\stackrel{\textsf{iid}}{=} \frac{1}{N^2} \sum_{n=1}^{N} \mathbb{V}[X_n] \\ &= \frac{1}{N^2} \sum_{n=1}^{N} \sigma^2 \\ &= \frac{\sigma^2}{N}. \end{aligned} \tag{10}
$$
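Equation 10 is also easy to check numerically. A small sketch (the sample size and distribution below are arbitrary illustrative choices): the empirical variance of many sample means should land near $\sigma^2 / N$.

```python
import numpy as np

# Numerical check of V[X_bar] = sigma^2 / N: compute the sample mean of
# many independent samples of size N, then look at the spread of those means.
rng = np.random.default_rng(1)
N, trials = 10, 200_000
sigma2 = 9.0  # population variance of N(0, 3^2)

means = rng.normal(0.0, 3.0, size=(trials, N)).mean(axis=1)
empirical = means.var()
print(f"empirical V[X_bar]: {empirical:.4f}")  # close to sigma2 / N = 0.9
```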

Finally, let’s put everything together:

$$
\begin{aligned} \mathbb{E}[s^2] &= \sigma^2 + \mu^2 - \left(\frac{\sigma^2}{N} + \mu^2\right) \\ &= \sigma^2 \left(1 - \frac{1}{N} \right). \end{aligned} \tag{11}
$$

What we have shown is that our estimator is off by a constant factor, $\left(1 - \frac{1}{N} \right) = \left( \frac{N-1}{N} \right)$. If we want an unbiased estimator, we should multiply both sides of Equation $11$ by the inverse of this constant:

$$
\mathbb{E}\left[\left(\frac{N}{N-1}\right) s^2\right] = \mathbb{E}\left[\frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2\right] = \sigma^2. \tag{12}
$$

And this new estimator is exactly what we set out to find: Bessel's correction results in an unbiased estimator of the population variance.
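As a closing sanity check, a short simulation (with an assumed setup of standard normal data and an arbitrary $N$) confirming Equation 12: rescaling the divide-by-$N$ estimator by $N / (N - 1)$ recovers the population variance on average.

```python
import numpy as np

# Verify Equation 12 numerically: the biased estimator averages to
# sigma^2 * (1 - 1/N), and rescaling by N / (N - 1) removes the bias.
rng = np.random.default_rng(2)
N, trials = 4, 200_000
sigma2 = 1.0  # variance of the standard normal

samples = rng.normal(0.0, 1.0, size=(trials, N))
biased = samples.var(axis=1, ddof=0)   # divides by N
rescaled = (N / (N - 1)) * biased      # Bessel's correction

print(f"biased mean:   {biased.mean():.4f}")    # close to 0.75 for N = 4
print(f"rescaled mean: {rescaled.mean():.4f}")  # close to 1.0
```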