Proof of the Cramér–Rao Lower Bound

The Cramér–Rao lower bound allows us to derive uniformly minimum–variance unbiased estimators by finding unbiased estimators that achieve this bound. I derive the main result.

Given a statistical model $X \sim \mathbb{P}_{\theta}$ with a fixed true parameter $\theta$, the Cramér–Rao lower bound (CRLB) provides a lower bound on the variance of an estimator $T(X)$. The CRLB is useful because if an unbiased estimator achieves the CRLB, it must be a uniformly minimum–variance unbiased estimator: it is unbiased by construction and has minimum variance by the CRLB. A precise statement of the scalar case of the CRLB is

CRLB: Let $X = (X_1, \dots, X_N) \in \mathbb{R}^N$ be a random vector with joint density $f(X; \theta)$ where $\theta \in \Theta \subseteq \mathbb{R}$. Let $T(X)$ be a possibly biased estimator of $\theta$. Assume the Fisher information is always defined and that the operations of integration with respect to $X$ and differentiation with respect to $\theta$ can be interchanged. Then

$$
\mathbb{V}[T(X)] \geq \frac{\left( \frac{d}{d \theta} \mathbb{E}[T(X)] \right)^2}{\mathbb{E} \left[ \left( \frac{d}{d\theta} \log f(X; \theta) \right)^2 \right]} \equiv \text{CRLB}(\theta). \tag{1}
$$

The denominator of the CRLB is the Fisher information. If the estimator is unbiased, then the numerator is one, since $\mathbb{E}[T(X)] = \theta$ and $\frac{d}{d\theta} \theta = 1$.
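As a quick illustration of the bound (an added example, not part of the original statement), consider $N$ i.i.d. observations $X_n \sim \mathcal{N}(\theta, \sigma^2)$ with known variance $\sigma^2$. The Fisher information of the sample is $N / \sigma^2$, and the sample mean $\bar{X}$ is unbiased with variance $\sigma^2 / N$, so it achieves the CRLB:

$$
\mathbb{V}[\bar{X}] = \frac{\sigma^2}{N} = \frac{1}{N / \sigma^2} = \text{CRLB}(\theta).
$$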

To prove the CRLB, let $W$ and $Y$ be two random variables. We place no restriction on $\mathbb{E}[W]$, but we assume $\mathbb{E}[Y] = 0$. A property of covariance is

$$
(\text{Cov}[W, Y])^2 \leq \mathbb{V}[W] \, \mathbb{V}[Y]. \tag{2}
$$

This can be derived by applying Cauchy–Schwarz to random variables (see the Appendix). Now set $W$ and $Y$ to

$$
W = T(X), \qquad Y = \frac{\partial}{\partial \theta} \log f(X; \theta). \tag{3}
$$
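The random variable $Y$ is the score. For completeness, a short derivation using the interchange assumption shows that the score has zero mean:

$$
\mathbb{E}\left[ \frac{\partial}{\partial \theta} \log f(X; \theta) \right] = \int \frac{\partial}{\partial \theta} \log f(X; \theta) \, f(X; \theta) \, \text{d}\mu(X) = \int \frac{\partial}{\partial \theta} f(X; \theta) \, \text{d}\mu(X) = \frac{\partial}{\partial \theta} \underbrace{\int f(X; \theta) \, \text{d}\mu(X)}_{=\,1} = 0.
$$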

Since the expectation of the score is zero, $\mathbb{E}[Y] = 0$ as desired. Then Equation $2$ can be rewritten as

$$
\begin{aligned}
\mathbb{V}[T(X)] \, \mathbb{V}\left[ \frac{\partial}{\partial \theta} \log f(X; \theta) \right] &\geq \left( \text{Cov}\left[ T(X), \frac{\partial}{\partial \theta} \log f(X; \theta) \right] \right)^2
\\
\mathbb{V}[T(X)] &\geq \frac{\left( \text{Cov}\left[ T(X), \frac{\partial}{\partial \theta} \log f(X; \theta) \right] \right)^2}{\mathbb{V}\left[ \frac{\partial}{\partial \theta} \log f(X; \theta) \right]}.
\end{aligned} \tag{4}
$$

The numerator is our desired quantity. Ignoring the square, we have

$$
\begin{aligned}
&\text{Cov}\left[ T(X), \frac{\partial}{\partial \theta} \log f(X; \theta) \right]
\\
&= \mathbb{E}\left[ T(X) \frac{\partial}{\partial \theta} \log f(X; \theta) \right] - \overbrace{\mathbb{E}[T(X)] \, \mathbb{E}\left[ \frac{\partial}{\partial \theta} \log f(X; \theta) \right]}^{=\,0}
\\
&= \int T(X) \frac{\partial}{\partial \theta} \log f(X; \theta) \, f(X; \theta) \, \text{d}\mu(X)
\\
&\stackrel{\star}{=} \int T(X) \frac{\partial}{\partial \theta} f(X; \theta) \, \text{d}\mu(X)
\\
&\stackrel{\dagger}{=} \frac{\partial}{\partial \theta} \int T(X) f(X; \theta) \, \text{d}\mu(X)
\\
&= \frac{\partial}{\partial \theta} \mathbb{E}[T(X)].
\end{aligned} \tag{5}
$$

In step $\star$, we use the fact that if $g(x) = \log h(x)$, then $g^{\prime}(x) = h^{\prime}(x) / h(x)$. In step $\dagger$, we use our assumption that we can interchange integration and differentiation.

The denominator is the desired quantity because

$$
\begin{aligned}
\mathbb{V}\left[ \frac{\partial}{\partial \theta} \log f(X; \theta) \right] &= \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(X; \theta) \right)^2 \right] - \mathbb{E}\left[ \frac{\partial}{\partial \theta} \log f(X; \theta) \right]^2
\\
&= \mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(X; \theta) \right)^2 \right].
\end{aligned} \tag{6}
$$
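As a concrete instance of Equation $6$ (an added illustration), take a single observation $X \sim \text{Bernoulli}(\theta)$ with $f(X; \theta) = \theta^X (1 - \theta)^{1 - X}$. Then

$$
\frac{\partial}{\partial \theta} \log f(X; \theta) = \frac{X}{\theta} - \frac{1 - X}{1 - \theta} = \frac{X - \theta}{\theta (1 - \theta)}, \qquad \mathbb{E}\left[ \left( \frac{X - \theta}{\theta (1 - \theta)} \right)^2 \right] = \frac{\mathbb{V}[X]}{\theta^2 (1 - \theta)^2} = \frac{1}{\theta (1 - \theta)},
$$

which is the Fisher information of a single Bernoulli trial.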

Putting these results together in Equation $4$, we have

$$
\mathbb{V}[T(X)] \geq \frac{\left( \frac{\partial}{\partial \theta} \mathbb{E}[T(X)] \right)^2}{\mathbb{E}\left[ \left( \frac{\partial}{\partial \theta} \log f(X; \theta) \right)^2 \right]} \tag{7}
$$

as desired.
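As an optional numerical sanity check (a minimal sketch, not part of the derivation; the model, constants, and seed below are arbitrary choices for illustration), we can simulate the variance of the sample mean for i.i.d. Gaussian data and compare it with the CRLB $\sigma^2 / N$:

```python
import numpy as np

# Illustrative check of the CRLB for estimating the mean of N(theta, sigma^2) data.
# For this model the Fisher information of N samples is N / sigma^2, so the CRLB
# for an unbiased estimator such as the sample mean is sigma^2 / N.
rng = np.random.default_rng(0)
theta, sigma, N, n_trials = 2.0, 1.5, 50, 200_000

# Monte Carlo estimate of the variance of the sample mean across many datasets.
samples = rng.normal(theta, sigma, size=(n_trials, N))
sample_means = samples.mean(axis=1)
empirical_var = sample_means.var()

crlb = sigma**2 / N
print(f"empirical variance of sample mean: {empirical_var:.5f}")
print(f"CRLB (sigma^2 / N):                {crlb:.5f}")
```

The two printed numbers should agree up to Monte Carlo error, reflecting that the sample mean attains the bound for this model.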


Appendix

1. Covariance inequality

The Cauchy–Schwarz inequality for vectors $\mathbf{u}$ and $\mathbf{v}$ with an inner product $\langle \mathbf{u}, \mathbf{v} \rangle$ is

$$
| \langle \mathbf{u}, \mathbf{v} \rangle |^2 \leq \langle \mathbf{u}, \mathbf{u} \rangle \cdot \langle \mathbf{v}, \mathbf{v} \rangle. \tag{A1.1}
$$

Now note that for real-valued, square-integrable random variables $W$ and $Y$, the expected value of their product is itself an inner product:

$$
\langle W, Y \rangle = \mathbb{E}[WY]. \tag{A1.2}
$$
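To see why (an added aside), note that this map is symmetric, linear in each argument, and nonnegative on the diagonal,

$$
\mathbb{E}[WY] = \mathbb{E}[YW], \qquad \mathbb{E}[(aW_1 + bW_2) Y] = a \, \mathbb{E}[W_1 Y] + b \, \mathbb{E}[W_2 Y], \qquad \mathbb{E}[W^2] \geq 0,
$$

with $\mathbb{E}[W^2] = 0$ only if $W = 0$ almost surely (so, strictly speaking, we identify random variables that are equal almost surely).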

Now let $\mathbb{E}[W] = \omega$ and $\mathbb{E}[Y] = \gamma$. Then if we apply the definition of covariance and Cauchy–Schwarz to this inner product, we have

$$
\begin{aligned}
|\text{Cov}[W, Y]|^2 &\triangleq |\mathbb{E}[(W - \omega)(Y - \gamma)]|^2
\\
&= |\langle W - \omega, Y - \gamma \rangle|^2
\\
&\leq \langle W - \omega, W - \omega \rangle \cdot \langle Y - \gamma, Y - \gamma \rangle
\\
&= \mathbb{E}[(W - \omega)^2] \, \mathbb{E}[(Y - \gamma)^2]
\\
&= \mathbb{V}[W] \, \mathbb{V}[Y]
\end{aligned} \tag{A1.3}
$$

as desired.