I derive the mean and variance of the OLS estimator, as well as an unbiased estimator of the error variance σ2. I then show that the OLS estimator is normally distributed if we assume the error terms are normally distributed.
Published
26 August 2021
As introduced in my previous posts on ordinary least squares (OLS), the linear regression model has the form
yn=β0+β1xn,1+⋯+βPxn,P+εn.(1)
To perform tasks such as hypothesis testing for a given estimated coefficient β^p, we need to pin down the sampling distribution of the OLS estimator β^=[β^1,…,β^P]⊤. To do this, we need to make some assumptions. We can then use those assumptions to derive some basic properties of β^.
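Before deriving anything, it may help to make this setup concrete. Below is a minimal simulation sketch in Python with NumPy; the dimensions, coefficients, and noise level are arbitrary illustrative choices of mine, and I omit the intercept β0 for simplicity.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 100, 3                        # sample size and number of predictors
beta = np.array([2.0, -1.0, 0.5])    # true coefficients (intercept omitted)
sigma = 0.8                          # standard deviation of the noise

X = rng.normal(size=(N, P))          # design matrix
eps = rng.normal(0, sigma, size=N)   # error terms
y = X @ beta + eps                   # Equation 1 in matrix form

# The OLS estimate, computed by solving the normal equations X'X b = X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                      # close to beta
```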
I’ll start this post by working through the standard OLS assumptions. I’ll then show how these assumptions imply some established properties of the OLS estimator β^. Finally, I’ll show how if we assume our error terms are normally distributed, we can pin down the distribution of β^ exactly.
Standard OLS assumptions
The standard assumptions of OLS are:
1. Linearity
2. Strict exogeneity
3. No multicollinearity
4. Spherical errors
5. Normality (optional)
Assumptions 1 and 3 are not terribly interesting here. Assumption 1 is just Equation 1; it means that we have correctly specified our model. Assumption 3 is that our design matrix X is full rank; this property is not relevant for this post, but I have another post on the topic for the curious.
Assumptions 2 and 4 are more interesting here. Assumption 2, strict exogeneity, is that the expectation of the error term is zero:
E[εn∣X]=0,n∈{1,…,N}.(2)
An exogenous variable is a variable that is not determined by other variables or parameters in the model. Here is a nice example of why Equation 2 captures this intuition.
Assumption 4 can be broken into two assumptions. The first is homoskedasticity, meaning that our error terms have a constant variance σ2:
V[εn∣X]=σ2,n∈{1,…,N}.(3)
The second is that our error terms are uncorrelated:
E[εnεm∣X]=0,n,m∈{1,…,N},n≠m.(4)
Taken together, these two sub-assumptions are typically stated as just spherical errors, since we can formalize both at once as
V[ε∣X]=σ2IN.(5)
Finally, assumption 5 is that our error terms are normally distributed. This assumption is not required for OLS theory, but some sort of distributional assumption about the noise is required for hypothesis testing in OLS. As we will see, the normality assumption will imply that the OLS estimator β^ is normally distributed.
With these properties in mind, let’s prove some important facts about the OLS estimator β^.
OLS estimator is unbiased
First, let’s prove that β^ is unbiased, i.e. that
E[β^∣X]=β.(6)
Equivalently, we just need to show that
E[β^−β∣X]=0.(7)
The term in the expectation, β^−β, is sometimes called the sampling error, and we can write it in terms of the predictors and noise terms:

β^−β = (X⊤X)−1X⊤y − β    (⋆)
     = (X⊤X)−1X⊤(Xβ+ε) − β    (†)
     = β + (X⊤X)−1X⊤ε − β
     = (X⊤X)−1X⊤ε.(8)
Step ⋆ is the normal equation, and step † is the matrix form of our linear assumption, y=Xβ+ε. Since we assume that X is non-random, we can pull it out of the expectation, and we’re done:
E[β^−β∣X]=(X⊤X)−1X⊤E[ε∣X]=0.(9)
As we can see, we require strict exogeneity to prove that β^ is unbiased.
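As a sanity check on Equation 9, here is a small Monte Carlo sketch, reusing the hypothetical simulation setup from the introduction: we hold X fixed (i.e. we condition on X), redraw the errors many times, and compare the average of β^ across replications to the true β.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma = 100, 3, 0.8
beta = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(N, P))          # held fixed across replications

n_reps = 10_000
estimates = np.empty((n_reps, P))
for i in range(n_reps):
    eps = rng.normal(0, sigma, size=N)
    y = X @ beta + eps
    estimates[i] = np.linalg.solve(X.T @ X, X.T @ y)

# If beta_hat is unbiased, the Monte Carlo average should be close to beta.
print(estimates.mean(axis=0))
```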
Variance of the OLS estimator
This proof is from (Hayashi, 2000). The variance of the OLS estimator is

V[β^∣X] = V[β^−β∣X]    (⋆)
        = V[Aε∣X], where A≜(X⊤X)−1X⊤    (†)
        = AV[ε∣X]A⊤    (‡)
        = σ2AA⊤    (∗)
        = σ2(X⊤X)−1X⊤X(X⊤X)−1
        = σ2(X⊤X)−1.(10)
Step ⋆ is because the true value β is non-random; step † is just applying Equation 8 from above; step ‡ is because A is non-random; and step ∗ is assumption 4, spherical errors (Equation 5).
As we can see, the basic idea of the proof is to write β^ in terms of the random variables ε, since this is the quantity with constant variance σ2.
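We can check Equation 10 numerically as well. The sketch below, again using the same hypothetical setup, compares the sample covariance of many draws of β^ to σ2(X⊤X)−1:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma = 100, 3, 0.8
beta = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(N, P))

# Draw many estimates under fixed X.
eps = rng.normal(0, sigma, size=(10_000, N))
Y = X @ beta + eps                                   # shape (10_000, N)
estimates = np.linalg.solve(X.T @ X, X.T @ Y.T).T    # shape (10_000, P)

empirical = np.cov(estimates, rowvar=False)
theoretical = sigma**2 * np.linalg.inv(X.T @ X)      # Equation 10
print(np.abs(empirical - theoretical).max())         # near zero
```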
Unbiased variance estimator
This section is not strictly necessary for understanding the sampling distribution of β^, but it’s a useful property of the finite sample distribution, e.g. it shows up when computing t-statistics for OLS. This proof is also from (Hayashi, 2000), but I’ve organized and expanded it to be more explicit.
An unbiased estimator of the variance σ2 is
s2=e⊤e/(N−P),(11)
where e is the vector of residuals, i.e. en≜yn−β^⊤xn. To prove that s2 is unbiased, it suffices to show that
E[s2∣X]=σ2
⇕
E[e⊤e∣X]=(N−P)σ2.(12)
We will prove this in three steps. First, we will show that the residuals e can be written as a linear function of the error terms ε,

e=Mε,M≜IN−X(X⊤X)−1X⊤.(13)

To see this, substitute the normal equation and then the linear model into the definition of the residuals:

e=y−Xβ^=y−X(X⊤X)−1X⊤y=My=M(Xβ+ε)=Mε,(14)

where the last step holds because MX=X−X(X⊤X)−1X⊤X=0. The matrix M, sometimes called the annihilator matrix, is symmetric (M⊤=M) and idempotent (MM=M).

Second, we will use these two properties of M to compute the expectation of e⊤e:

E[e⊤e∣X]=E[ε⊤M⊤Mε∣X]=E[ε⊤Mε∣X]=∑n∑m mnm E[εnεm∣X]=σ2∑n mnn=σ2 tr(M),(15)

where mnm is the (n,m)-th element of M, and where spherical errors (Equations 3 and 4) zero out all off-diagonal terms.

Third, we will compute the trace of M using the cyclic property of the trace, tr(AB)=tr(BA):

tr(M)=tr(IN)−tr(X(X⊤X)−1X⊤)=N−tr((X⊤X)−1X⊤X)=N−tr(IP)=N−P.(16)

Putting the three steps together, we have

E[e⊤e∣X]=σ2 tr(M)=(N−P)σ2,(17)

which is what we wanted to show.
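Again, a quick simulation sketch, using the same hypothetical setup as before, is consistent with this result: the average of s2 over many replications is close to σ2, while the naive estimator e⊤e/N is biased downward.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma = 100, 3, 0.8
beta = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(N, P))

s2_draws, naive_draws = [], []
for _ in range(10_000):
    y = X @ beta + rng.normal(0, sigma, size=N)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)    # residual vector
    s2_draws.append(e @ e / (N - P))                 # Equation 11
    naive_draws.append(e @ e / N)                    # biased alternative

print(np.mean(s2_draws), np.mean(naive_draws), sigma**2)
```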
OLS estimator is normally distributed

If we make assumption 5, that the error terms are normally distributed, then β^ is also normally distributed. To see this, note that assumptions 2 and 4 already specify the mean and variance of ε. If we assume normality in the errors, then clearly
ε∣X∼N(0,σ2IN),(25)
since the normal distribution is fully specified by its mean and variance. Since the random variable ε does not depend on X, clearly the marginal distribution is also normal,
ε∼N(0,σ2IN).(26)
Finally, note that Equation 8 means we can write the sampling error as a linear function of the errors:
β^−β=(X⊤X)−1X⊤ε.(27)
Given X, the matrix (X⊤X)−1X⊤ is fixed, so ε↦(X⊤X)−1X⊤ε is a linear function, and a linear function of a normal random vector is still normally distributed. This means that β^−β is normally distributed. We know the mean of β^−β from Equation 9, and we know the variance from Equation 10. Therefore we have:
β^−β∼N(0,σ2(X⊤X)−1).(28)
Using basic properties of the normal distribution, we can immediately derive the distribution of the OLS estimator:
β^∼N(β,σ2(X⊤X)−1).(29)
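Here is a sketch that checks Equation 29 empirically, once more under the same hypothetical setup: if we standardize one coefficient by its theoretical standard deviation across many replications, the resulting z-scores should behave like draws from a standard normal.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma = 100, 3, 0.8
beta = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(N, P))
cov = sigma**2 * np.linalg.inv(X.T @ X)   # covariance from Equation 29

z = np.empty(10_000)
for i in range(10_000):
    y = X @ beta + rng.normal(0, sigma, size=N)
    beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
    z[i] = (beta_hat[0] - beta[0]) / np.sqrt(cov[0, 0])

# Empirical quantiles should match those of N(0, 1): roughly -1.96, 0, 1.96.
print(np.quantile(z, [0.025, 0.5, 0.975]))
```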
In summary, we have derived a standard result for the OLS estimator when assuming normally distributed errors.
Conclusion
OLS makes a few important assumptions (assumptions 1–4), which mathematically imply some basic properties of the OLS estimator β^. For example, the unbiasedness of β^ is due to strict exogeneity (assumption 2). However, without assuming a distribution on the noise (assumption 5), we cannot pin down a sampling distribution on β^. If we assume normally distributed errors, then β^ is itself normally distributed. Knowing this distribution is useful in analyzing the results of linear models, such as when performing hypothesis testing for a given estimated parameter β^p.
Acknowledgements
I thank Mattia Mariantoni for pointing out a typo in an earlier version of this post.
Hayashi, F. (2000). Econometrics. Princeton University Press. Section 1, pp. 60–69.