Given a statistical model $X \sim \mathbb{P}_{\theta}$ with a fixed true parameter $\theta$, the Cramér–Rao lower bound (CRLB) provides a lower bound on the variance of an estimator $T(X)$. The CRLB is useful because if an unbiased estimator achieves it, that estimator must be a uniformly minimum-variance unbiased estimator (UMVUE): it is unbiased by construction and has minimum variance by the CRLB. A precise statement of the scalar case of the CRLB is:
CRLB: Let $X = (X_1, \dots, X_N) \in \mathbb{R}^N$ be a random vector with joint density $f(X; \theta)$, where $\theta \in \Theta \subseteq \mathbb{R}$. Let $T(X)$ be a (possibly biased) estimator of $\theta$. Assume the Fisher information is always defined and that the operations of integration with respect to $X$ and differentiation with respect to $\theta$ can be interchanged. Then
$$
\mathbb{V}[T(X)] \geq \frac{\left( \frac{d}{d \theta} \mathbb{E}[T(X)] \right)^2}{\mathbb{E} \left[\left(\frac{d}{d\theta} \log f(X; \theta) \right)^2\right]} \equiv \text{CRLB}(\theta). \tag{1}
$$
The denominator of the CRLB is the Fisher information. If the estimator is unbiased, then the numerator is one, since $\mathbb{E}[T(X)] = \theta$ and thus $\frac{d}{d\theta} \mathbb{E}[T(X)] = 1$.
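For intuition, consider a standard example (not part of the formal statement above): $N$ i.i.d. observations $X_i \sim \mathcal{N}(\theta, \sigma^2)$ with known variance $\sigma^2$. Then

$$
\log f(X; \theta) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^N (X_i - \theta)^2,
\qquad
\frac{d}{d\theta} \log f(X; \theta) = \frac{1}{\sigma^2} \sum_{i=1}^N (X_i - \theta),
$$

so the Fisher information is $\mathbb{E}\big[\big(\frac{1}{\sigma^2}\sum_{i}(X_i - \theta)\big)^2\big] = N/\sigma^2$ and $\text{CRLB}(\theta) = \sigma^2 / N$. The sample mean $\bar{X} = \frac{1}{N}\sum_i X_i$ is unbiased with variance exactly $\sigma^2/N$, so it attains the bound and is therefore a UMVUE of $\theta$.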
To prove the CRLB, let $W$ and $Y$ be two random variables. In general, neither $\mathbb{E}[W]$ nor $\mathbb{E}[Y]$ need be zero, but for now assume $\mathbb{E}[Y] = 0$. A property of covariance is
$$
(\text{Cov}[W, Y])^2 \leq \mathbb{V}[W] \mathbb{V}[Y]. \tag{2}
$$
This can be derived by applying Cauchy–Schwarz to random variables (see the Appendix). Now set $W$ and $Y$ to
$$
W = T(X), \qquad Y = \frac{\partial}{\partial \theta} \log f(X; \theta). \tag{3}
$$
We know that the expectation of the score is zero. Therefore, $\mathbb{E}[Y] = 0$ as desired. Then Equation $2$ can be rewritten as
$$
\begin{aligned}
\mathbb{V}[T(X)] \, \mathbb{V}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]
&\geq \left(\text{Cov}\left[T(X), \frac{\partial}{\partial \theta} \log f(X; \theta)\right]\right)^2
\\
\mathbb{V}[T(X)]
&\geq \frac{\left(\text{Cov}\left[T(X), \frac{\partial}{\partial \theta} \log f(X; \theta)\right]\right)^2}{\mathbb{V}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]}.
\end{aligned} \tag{4}
$$
The numerator is our desired quantity. Ignoring the square, we have
$$
\begin{aligned}
&\text{Cov}\left[T(X), \frac{\partial}{\partial \theta} \log f(X; \theta)\right]
\\
&= \mathbb{E}\left[T(X)\frac{\partial}{\partial \theta} \log f(X; \theta) \right] - \overbrace{\mathbb{E}[T(X)] \, \mathbb{E}\left[\frac{\partial}{\partial \theta} \log f(X; \theta) \right]}^{=\,0}
\\
&= \int T(X) \frac{\partial}{\partial \theta} \log f(X; \theta) \, f(X; \theta) \, \text{d}\mu(X)
\\
&\stackrel{\star}{=} \int T(X) \frac{\partial}{\partial \theta} f(X; \theta) \, \text{d}\mu(X)
\\
&\stackrel{\dagger}{=} \frac{\partial}{\partial \theta} \int T(X) f(X; \theta) \, \text{d}\mu(X)
\\
&= \frac{\partial}{\partial \theta} \mathbb{E}[T(X)].
\end{aligned} \tag{5}
$$
In step $\star$, we use the fact that if $g(x) = \log h(x)$, then $g^{\prime}(x) = h^{\prime}(x) / h(x)$. In step $\dagger$, we use our assumption that we can interchange integration and differentiation.
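The overbrace in Equation $5$ (and the claim before Equation $4$) relies on the score having zero expectation. For completeness, here is the standard one-line argument, which uses the same log-derivative identity and interchange assumption:

$$
\mathbb{E}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]
= \int \frac{\partial}{\partial \theta} f(X; \theta) \, \text{d}\mu(X)
= \frac{\partial}{\partial \theta} \int f(X; \theta) \, \text{d}\mu(X)
= \frac{\partial}{\partial \theta} 1
= 0.
$$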
The denominator is the desired quantity because, again using the fact that the score has zero mean,
$$
\begin{aligned}
\mathbb{V}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]
&= \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right] - \mathbb{E}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]^2
\\
&= \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right].
\end{aligned} \tag{6}
$$
Putting these results together in Equation $4$, we have
$$
\mathbb{V}[T(X)] \geq \frac{\left( \frac{\partial}{\partial \theta} \mathbb{E}[T(X)] \right)^2}{\mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right]} \tag{7}
$$
as desired.
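As a quick numerical sanity check of the bound, here is a minimal simulation of the Gaussian-mean example from above (the use of NumPy and the particular parameter values are illustrative choices, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)

theta, sigma, N = 2.0, 1.5, 50   # true mean, known std. dev., sample size
n_trials = 100_000               # number of simulated datasets

# T(X) = sample mean, an unbiased estimator of theta.
samples = rng.normal(theta, sigma, size=(n_trials, N))
estimates = samples.mean(axis=1)

# Fisher information is N / sigma^2, so CRLB(theta) = sigma^2 / N.
crlb = sigma**2 / N

print(f"empirical variance of T(X): {estimates.var():.5f}")
print(f"CRLB(theta):                {crlb:.5f}")
```

The two printed numbers should agree up to Monte Carlo error, since the sample mean attains the bound in this model.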
Appendix
1. Covariance inequality
The Cauchy–Schwarz inequality for vectors $\mathbf{u}$ and $\mathbf{v}$ with an inner product $\langle \mathbf{u}, \mathbf{v} \rangle$ is
$$
| \langle \mathbf{u}, \mathbf{v} \rangle |^2 \leq \langle \mathbf{u}, \mathbf{u} \rangle \cdot \langle \mathbf{v}, \mathbf{v} \rangle. \tag{A1.1}
$$
Now note that for real-valued random variables $W$ and $Y$, the expected value of their product is itself an inner product:
$$
\langle W, Y \rangle = \mathbb{E}[WY]. \tag{A1.2}
$$
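To see why this defines an inner product (on square-integrable random variables, identifying variables that are equal almost surely, a detail not spelled out here), note that it is symmetric, linear in each argument, and nonnegative on the diagonal:

$$
\mathbb{E}[WY] = \mathbb{E}[YW], \qquad
\mathbb{E}[(aW_1 + bW_2)Y] = a\,\mathbb{E}[W_1 Y] + b\,\mathbb{E}[W_2 Y], \qquad
\mathbb{E}[W^2] \geq 0,
$$

with $\mathbb{E}[W^2] = 0$ only if $W = 0$ almost surely.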
Now let $\mathbb{E}[W] = \omega$ and $\mathbb{E}[Y] = \gamma$. Then if we apply the definition of covariance and Cauchy–Schwarz to this inner product, we have
$$
\begin{aligned}
|\text{Cov}[W, Y]|^2
&\triangleq |\mathbb{E}[(W - \omega)(Y - \gamma)]|^2
\\
&= |\langle W - \omega, Y - \gamma \rangle|^2
\\
&\leq \langle W - \omega, W - \omega \rangle \cdot \langle Y - \gamma, Y - \gamma \rangle
\\
&= \mathbb{E}[(W - \omega)^2] \, \mathbb{E}[(Y - \gamma)^2]
\\
&= \mathbb{V}[W] \, \mathbb{V}[Y]
\end{aligned} \tag{A1.3}
$$
as desired.