Let $X = \{X_1, X_2, \dots, X_N\}$ be a random sample of $N$ i.i.d. random variables. Let $\bar{X}$ denote the sample mean,
$$
\bar{X} = \frac{1}{N} \sum_{n=1}^{N} X_n. \tag{1}
$$
When computing the sample variance $s^2$, students are told to divide by $N-1$ rather than $N$:
$$
s^2 = \frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2. \tag{2}
$$
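To make the two divisors concrete, here is a minimal sketch in Python using only the standard library. The data values are hypothetical, chosen so the arithmetic is easy to follow, and the variable names are my own.

```python
import statistics

# Hypothetical sample, chosen only so the arithmetic is easy to check by hand.
sample = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(sample)

mean = sum(sample) / n                              # sample mean (Equation 1)
sum_sq_dev = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations

var_divide_by_n = sum_sq_dev / n                    # divides by N
var_divide_by_n_minus_1 = sum_sq_dev / (n - 1)      # divides by N - 1 (Equation 2)

# The standard library exposes both conventions:
# pvariance divides by N, variance divides by N - 1.
assert abs(var_divide_by_n - statistics.pvariance(sample)) < 1e-12
assert abs(var_divide_by_n_minus_1 - statistics.variance(sample)) < 1e-12

print(mean, var_divide_by_n, var_divide_by_n_minus_1)
```

For this sample the mean is 5, the sum of squared deviations is 32, and the two estimates are 32/8 = 4 and 32/7 ≈ 4.57. The gap between them shrinks as the sample grows.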
When first learning about this fact, I was shown computer simulations but no mathematical proof of why this must hold. The goal of this post is to provide a quick proof of why this correction makes sense.
The proof outline is straightforward: we need to show that the estimator in Equation 4 (below) is biased, and that we can correct this bias by dividing by $N-1$ rather than $N$. For an estimator to be unbiased, the expectation of that estimator must equal the population parameter. In our case, if the sample variance is $s^2$ and the population variance is $\sigma^2$, we want
$$
\mathbb{E}[s^2] = \sigma^2. \tag{3}
$$
Let’s begin.
Proof
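Let’s prove that the following estimator for the population variance, which divides by $N$ rather than $N-1$, is biased. Until the final step of the proof, $s^2$ denotes this uncorrected estimator:
$$
s^2 = \frac{1}{N} \sum_{n=1}^{N} (X_n - \bar{X})^2. \tag{4}
$$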
First, let’s take the expectation of this estimator and manipulate it:
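$$
\begin{aligned}
\mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} (X_n - \bar{X})^2\right]
&= \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} \left(X_n^2 - 2 X_n \bar{X} + \bar{X}^2\right)\right] \\
&= \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2 - 2 \bar{X} \frac{1}{N} \sum_{n=1}^{N} X_n + \frac{1}{N} \sum_{n=1}^{N} \bar{X}^2\right] \\
&\stackrel{\star}{=} \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2\right] - \mathbb{E}\left[2 \bar{X}^2\right] + \mathbb{E}\left[\bar{X}^2\right] \\
&= \mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2\right] - \mathbb{E}\left[\bar{X}^2\right] \\
&\stackrel{\dagger}{=} \mathbb{E}[X_n^2] - \mathbb{E}[\bar{X}^2].
\end{aligned} \tag{5}
$$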
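Note that step $\star$ holds because
$$
\sum_{n=1}^{N} X_n = N \bar{X}, \tag{6}
$$
while step $\dagger$ holds by linearity of expectation and because the data are identically distributed, i.e.
$$
\mathbb{E}\left[\frac{1}{N} \sum_{n=1}^{N} X_n^2\right] = \frac{1}{N} \sum_{n=1}^{N} \mathbb{E}[X_n^2] = \mathbb{E}[X_n^2]. \tag{7}
$$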
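Every step of Equation 5 before $\dagger$ is pure algebra, so the identity inside the expectations holds for any single data set, not just on average. Here is a small sketch that checks this numerically. The data are hypothetical draws from a Gaussian, and the seed and parameters are arbitrary choices of mine.

```python
import random

rng = random.Random(0)  # fixed seed, an arbitrary choice for reproducibility
xs = [rng.gauss(3.0, 2.0) for _ in range(1000)]  # hypothetical data
n = len(xs)

x_bar = sum(xs) / n

# Left side: (1/N) * sum of (X_n - X_bar)^2
lhs = sum((x - x_bar) ** 2 for x in xs) / n

# Right side: (1/N) * sum of X_n^2, minus X_bar^2
rhs = sum(x * x for x in xs) / n - x_bar ** 2

# The two sides agree up to floating-point rounding, for any data set.
assert abs(lhs - rhs) < 1e-9
print(lhs, rhs)
```

Only step $\dagger$ uses a property of the distribution. Everything before it would hold for any list of numbers.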
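Now note that since the $X_n$ are i.i.d., every $X_n \in X$ has the same mean $\mu$ and the same variance $\sigma^2$. Furthermore, recall that for any random variable $Y$,
$$
\begin{aligned}
\mathbb{V}[Y] &= \mathbb{E}[Y^2] - \mathbb{E}[Y]^2, \\
&\Downarrow \\
\mathbb{E}[Y^2] &= \mathbb{V}[Y] + \mathbb{E}[Y]^2.
\end{aligned} \tag{8}
$$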
So we can write
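$$
\begin{aligned}
\mathbb{E}[X_n^2] &= \mathbb{V}[X_n] + \mathbb{E}[X_n]^2 = \sigma^2 + \mu^2, \\
\mathbb{E}[\bar{X}^2] &= \mathbb{V}[\bar{X}] + \mathbb{E}[\bar{X}]^2 \stackrel{\star}{=} \frac{\sigma^2}{N} + \mu^2.
\end{aligned} \tag{9}
$$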
Step ⋆ holds because
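$$
\mathbb{V}[\bar{X}] = \mathbb{V}\left[\frac{1}{N} \sum_{n=1}^{N} X_n\right] \stackrel{\text{iid}}{=} \frac{1}{N^2} \sum_{n=1}^{N} \mathbb{V}[X_n] = \frac{1}{N^2} \sum_{n=1}^{N} \sigma^2 = \frac{\sigma^2}{N}. \tag{10}
$$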
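The claim that the sample mean has variance $\sigma^2 / N$ is easy to check empirically. The sketch below assumes a Gaussian population. The parameter values, seed, and number of trials are illustrative choices, not anything the proof depends on.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed, arbitrary
mu, sigma, n, trials = 0.0, 2.0, 10, 50_000  # all illustrative values

# Draw many samples of size n and record each sample mean.
sample_means = [
    sum(rng.gauss(mu, sigma) for _ in range(n)) / n
    for _ in range(trials)
]

empirical = statistics.pvariance(sample_means)  # observed spread of the sample means
predicted = sigma ** 2 / n                      # sigma^2 / N from Equation 10

print(empirical, predicted)
```

With these settings the predicted value is 0.4, and the empirical value should land close to it, up to Monte Carlo noise.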
Finally, let’s put everything together:
$$
\mathbb{E}[s^2] = \sigma^2 + \mu^2 - \left(\frac{\sigma^2}{N} + \mu^2\right) = \sigma^2 \left(1 - \frac{1}{N}\right). \tag{11}
$$
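As a quick sanity check of Equation 11, consider the smallest interesting case, $N = 2$, worked out directly. Here $X_1 - \bar{X} = (X_1 - X_2)/2$ and $X_2 - \bar{X} = -(X_1 - X_2)/2$, so the divide-by-$N$ estimator collapses to a single term. Since $X_1$ and $X_2$ are independent with the same mean, $\mathbb{E}[X_1 - X_2] = 0$ and $\mathbb{V}[X_1 - X_2] = 2\sigma^2$, which gives

$$
\frac{1}{2} \sum_{n=1}^{2} (X_n - \bar{X})^2 = \frac{(X_1 - X_2)^2}{4},
\qquad
\mathbb{E}\left[\frac{(X_1 - X_2)^2}{4}\right] = \frac{\mathbb{V}[X_1 - X_2]}{4} = \frac{2\sigma^2}{4} = \sigma^2 \left(1 - \frac{1}{2}\right).
$$

This matches Equation 11 with $N = 2$: on average, the uncorrected estimator recovers only half of the population variance.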
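What we have shown is that our estimator is off by a constant factor, $\left(1 - \frac{1}{N}\right) = \left(\frac{N-1}{N}\right)$. If we want an unbiased estimator, we should multiply both sides of Equation 11 by the inverse of this constant:
$$
\mathbb{E}\left[\left(\frac{N}{N-1}\right) s^2\right] = \mathbb{E}\left[\frac{1}{N-1} \sum_{n=1}^{N} (X_n - \bar{X})^2\right] = \sigma^2. \tag{12}
$$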
This new estimator is exactly the sample variance from Equation 2, which is what we wanted to show. Bessel’s correction results in an unbiased estimator for the population variance.
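For readers who like to see the result numerically as well, here is a minimal simulation sketch. It assumes a zero-mean Gaussian population with $\sigma = 2$. The function name `mean_of_estimators`, the seed, and the trial count are my own illustrative choices. It averages both estimators over many samples and compares them with $\sigma^2 (1 - 1/N)$ and $\sigma^2$.

```python
import random

def mean_of_estimators(n, trials, sigma=2.0, seed=0):
    """Average the divide-by-N and divide-by-(N-1) estimators over many samples."""
    rng = random.Random(seed)
    total_uncorrected = 0.0
    total_corrected = 0.0
    for _ in range(trials):
        xs = [rng.gauss(0.0, sigma) for _ in range(n)]
        x_bar = sum(xs) / n
        sum_sq_dev = sum((x - x_bar) ** 2 for x in xs)
        total_uncorrected += sum_sq_dev / n       # divides by N
        total_corrected += sum_sq_dev / (n - 1)   # divides by N - 1
    return total_uncorrected / trials, total_corrected / trials

sigma = 2.0
for n in (2, 5, 10):
    uncorrected, corrected = mean_of_estimators(n, trials=40_000, sigma=sigma)
    predicted_uncorrected = sigma ** 2 * (1 - 1 / n)  # sigma^2 * (1 - 1/N)
    print(n, uncorrected, predicted_uncorrected, corrected, sigma ** 2)
```

Up to Monte Carlo noise, the uncorrected average should track $\sigma^2 (1 - 1/N)$ for each $N$, while the corrected average should stay near $\sigma^2 = 4$ regardless of sample size.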