I discuss and prove the Gauss–Markov theorem, which states that under certain conditions, the least squares estimator is the minimum-variance linear unbiased estimator of the model parameters.
Published
08 February 2022
Informally, the Gauss–Markov theorem states that, under certain conditions, the ordinary least squares (OLS) estimator is the best linear unbiased estimator we can use. This is a powerful claim. Formally, the theorem states the following:
Gauss–Markov theorem. In a linear regression with response vector $y$ and design matrix $X$, the least squares estimator $\hat{\beta} \triangleq (X^{\top} X)^{-1} X^{\top} y$ is the minimum-variance linear unbiased estimator of the model parameter $\beta$, under the ordinary least squares assumptions.
Here, we can see that “best” is defined as both minimum variance and unbiased, and that the regularity conditions are the assumptions of OLS. There is no guarantee that a nonlinear method, for example, will not be better for our data by some other metric, but if we want to use an unbiased linear model and if the OLS assumptions hold for our data, then we should just use OLS.
An estimator that is optimal in this way is sometimes referred to as “BLUE”, for best linear unbiased estimator. The Gauss–Markov theorem could be stated even more succinctly as: “Under the OLS assumptions, the OLS estimator is BLUE.”
Obviously, if the OLS assumptions do not hold, then the OLS estimator is not necessarily BLUE. If our data have heteroscedasticity, for example, then a least squares regression fit to our data will not necessarily be optimal as defined above.
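To make the claim concrete, here is a minimal NumPy simulation sketch. The competing estimator below, a weighted least squares estimator with arbitrary positive weights, is just a hypothetical, illustrative choice of a second linear unbiased estimator; the data are synthetic with i.i.d. errors, so the OLS assumptions hold:

```python
import numpy as np

rng = np.random.default_rng(0)
N, P, sigma = 100, 3, 1.0
X = rng.normal(size=(N, P))
beta = np.array([2.0, -1.0, 0.5])

# OLS estimator: beta_hat = (X^T X)^{-1} X^T y.
A = np.linalg.solve(X.T @ X, X.T)

# A competing linear unbiased estimator: weighted least squares with
# arbitrary positive weights, even though the true errors are i.i.d.
W = np.diag(rng.uniform(0.1, 10.0, size=N))
C = np.linalg.solve(X.T @ W @ X, X.T @ W)

ols_draws, alt_draws = [], []
for _ in range(10_000):
    y = X @ beta + sigma * rng.normal(size=N)
    ols_draws.append(A @ y)
    alt_draws.append(C @ y)

# Both estimators are unbiased...
print(np.mean(ols_draws, axis=0))  # approx. [2, -1, 0.5]
print(np.mean(alt_draws, axis=0))  # approx. [2, -1, 0.5]
# ...but OLS has the smaller sampling variance, coordinate by coordinate.
print(np.var(ols_draws, axis=0) <= np.var(alt_draws, axis=0))  # expect [True True True]
```

Both estimators recover $\beta$ on average, but the OLS estimates are less spread out, which is exactly what the theorem promises.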
Proof
This proof follows Greene (2003). Consider a second linear, unbiased estimator $\hat{\beta}_0$:
$$\hat{\beta}_0 = C y. \tag{1}$$
Here, $C$ is a $P \times N$ matrix, and therefore $\hat{\beta}_0$ is linear in that the predictions $\hat{y}$ are linear functions of the response, since
$$\hat{y} = X \hat{\beta}_0 = X C y. \tag{2}$$
A linear function of a linear function is still a linear function, so $XCy$ is simply a linear transformation of our response vector $y$ into the space spanned by the columns of $XC$. Thus, this estimator adheres to the first assumption of OLS, linearity.
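As a quick sanity check, we can pick an arbitrary (purely hypothetical) matrix $C$ and verify that the map $y \mapsto XCy$ satisfies superposition, which is what linearity means here:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 50, 3
X = rng.normal(size=(N, P))

# An arbitrary (hypothetical) P x N matrix C defines a linear estimator Cy.
C = rng.normal(size=(P, N))

y1, y2 = rng.normal(size=N), rng.normal(size=N)
a, b = 2.0, -3.0

# Predictions are linear in the response: XC(a*y1 + b*y2) = a*XCy1 + b*XCy2.
lhs = X @ C @ (a * y1 + b * y2)
rhs = a * (X @ C @ y1) + b * (X @ C @ y2)
print(np.allclose(lhs, rhs))  # True
```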
We assumed this new estimator $\hat{\beta}_0$, like the OLS estimator $\hat{\beta}$, is unbiased. (Although we proved, not assumed, that the original OLS estimator is unbiased.) This implies

$$\beta = \mathbb{E}[\hat{\beta}_0 \mid X] = \mathbb{E}[C y \mid X] = \mathbb{E}[C (X \beta + \varepsilon) \mid X] = C X \beta + C\, \mathbb{E}[\varepsilon \mid X] = C X \beta, \tag{3}$$

and since this must hold for every $\beta$,

$$C X = I. \tag{4}$$

In other words, Equation $4$ holds precisely because we assume that $\hat{\beta}_0$ is unbiased. Note that $\mathbb{E}[\varepsilon \mid X] = 0$ is the second assumption of OLS, strict exogeneity.
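For instance, the hypothetical weighted least squares estimator from the sketch above is linear and unbiased, and we can check numerically that its $C$ matrix satisfies Equation $4$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, P = 100, 3
X = rng.normal(size=(N, P))

# Hypothetical unbiased linear estimator: weighted least squares,
# C = (X^T W X)^{-1} X^T W for arbitrary positive weights W.
W = np.diag(rng.uniform(0.1, 10.0, size=N))
C = np.linalg.solve(X.T @ W @ X, X.T @ W)

# Unbiasedness (given E[eps | X] = 0) is equivalent to CX = I.
print(np.allclose(C @ X, np.eye(P)))  # True
```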
Next, consider the conditional variance of $\hat{\beta}_0$. Using the spherical errors assumption, $\mathbb{V}[y \mid X] = \mathbb{V}[\varepsilon \mid X] = \sigma^2 I$, we have

$$\mathbb{V}[\hat{\beta}_0 \mid X] = \mathbb{V}[C y \mid X] = C\, \mathbb{V}[y \mid X]\, C^{\top} = \sigma^2 C C^{\top}. \tag{5}$$

Now define

$$D \triangleq C - A, \qquad A \triangleq (X^{\top} X)^{-1} X^{\top}, \tag{6}$$

so that $\hat{\beta} = A y$ and $C = D + A$. Then

$$\mathbb{V}[\hat{\beta}_0 \mid X] = \sigma^2 (D + A)(D + A)^{\top} \tag{7}$$

$$= \sigma^2 \left( D D^{\top} + D A^{\top} + A D^{\top} + A A^{\top} \right). \tag{8}$$

The cross terms $D A^{\top}$ and $A D^{\top}$ are zero (shown below), and $\sigma^2 A A^{\top} = \sigma^2 (X^{\top} X)^{-1} = \mathbb{V}[\hat{\beta} \mid X]$, so

$$\mathbb{V}[\hat{\beta}_0 \mid X] = \sigma^2 D D^{\top} + \mathbb{V}[\hat{\beta} \mid X] \succeq \mathbb{V}[\hat{\beta} \mid X].$$

The inequality follows because $D D^{\top}$ is a positive semi-definite matrix. So in words, the conditional variance of this new estimator $\hat{\beta}_0$ is greater than or equal to the conditional variance of $\hat{\beta}$. This proves that $\hat{\beta}$ is the minimum-variance linear unbiased estimator, or BLUE.
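We can check this ordering numerically. In the sketch below (with the same hypothetical weighted least squares $C$ as above), the gap between the two conditional covariance matrices equals $\sigma^2 D D^{\top}$ and has no negative eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P, sigma2 = 100, 3, 1.0
X = rng.normal(size=(N, P))

A = np.linalg.solve(X.T @ X, X.T)          # OLS: beta_hat = A y
W = np.diag(rng.uniform(0.1, 10.0, size=N))
C = np.linalg.solve(X.T @ W @ X, X.T @ W)  # hypothetical alternative: beta_hat_0 = C y
D = C - A

var_ols = sigma2 * np.linalg.inv(X.T @ X)  # sigma^2 (X^T X)^{-1}
var_alt = sigma2 * C @ C.T                 # sigma^2 C C^T

# The gap equals sigma^2 D D^T and is positive semi-definite.
gap = var_alt - var_ols
print(np.allclose(gap, sigma2 * D @ D.T))         # True
print(np.all(np.linalg.eigvalsh(gap) >= -1e-10))  # True (no negative eigenvalues)
```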
So why are the cross terms in Equation 8 zero? Because
$$D X = (C - A) X = C X - A X = C X - (X^{\top} X)^{-1} X^{\top} X = I - I = 0. \tag{9}$$
Note that $C X = I$ by the assumption that $\hat{\beta}_0$ is unbiased (Equation $4$). Clearly, if $D X = 0$, then $X^{\top} D^{\top} = 0^{\top}$, where both $0$ and $0^{\top}$ are $P \times P$ matrices of all zeros. Therefore, each of the cross terms can be written as
$$\begin{aligned} D A^{\top} &= D X (X^{\top} X)^{-1} = 0, \\ A D^{\top} &= (X^{\top} X)^{-1} X^{\top} D^{\top} = 0. \end{aligned} \tag{10}$$
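Numerically, with the same hypothetical unbiased $C$ as in the earlier sketches, both $DX$ and the cross terms vanish up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(4)
N, P = 100, 3
X = rng.normal(size=(N, P))

A = np.linalg.solve(X.T @ X, X.T)          # OLS matrix
W = np.diag(rng.uniform(0.1, 10.0, size=N))
C = np.linalg.solve(X.T @ W @ X, X.T @ W)  # hypothetical unbiased alternative
D = C - A

# DX = CX - AX = I - I = 0, so both cross terms are zero as well.
print(np.allclose(D @ X, 0))    # True
print(np.allclose(D @ A.T, 0))  # True
print(np.allclose(A @ D.T, 0))  # True
```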
Notice that the existence of the inverse of $X^{\top} X$ implicitly assumes that $X$ is full rank. This is another OLS assumption, no multicollinearity. Thus, the Gauss–Markov theorem holds when we adhere to the four assumptions of OLS: linearity, no multicollinearity, strict exogeneity, and spherical errors. If we make these four assumptions, then $\hat{\beta}$ is BLUE, the best (minimum-variance) linear unbiased estimator.
Greene, W. H. (2003). Econometric analysis. Pearson Education India.