Given a statistical model $X \sim \mathbb{P}_{\theta}$ with a fixed true parameter $\theta$, the Cramér–Rao lower bound (CRLB) provides a lower bound on the variance of an estimator $T(X)$. The CRLB is useful because if an unbiased estimator achieves it, that estimator must be a uniformly minimum-variance unbiased estimator (UMVUE): it is unbiased by construction and has minimum variance by the CRLB. A precise statement of the scalar case of the CRLB is:
CRLB: Let $X = (X_1, \dots, X_N) \in \mathbb{R}^N$ be a random vector with joint density $f(X; \theta)$, where $\theta \in \Theta \subseteq \mathbb{R}$. Let $T(X)$ be a (possibly biased) estimator of $\theta$. Assume the Fisher information is always defined and that the operations of integration with respect to $X$ and differentiation with respect to $\theta$ can be interchanged. Then
$$
\mathbb{V}[T(X)] \geq \frac{\left( \frac{d}{d \theta} \mathbb{E}[T(X)] \right)^2}{\mathbb{E} \left[\left(\frac{d}{d\theta} \log f(X; \theta) \right)^2\right]} \equiv \text{CRLB}(\theta). \tag{1}
$$
The denominator of the CRLB is the Fisher information. If the estimator is unbiased, then the numerator is one, since $\mathbb{E}[T(X)] = \theta$ and thus $\frac{d}{d\theta} \mathbb{E}[T(X)] = 1$.
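For intuition, consider a standard example (not part of the formal statement above): $N$ i.i.d. observations $X_i \sim \mathcal{N}(\theta, \sigma^2)$ with known variance $\sigma^2$. Then

$$
\log f(X; \theta) = -\frac{N}{2} \log(2\pi\sigma^2) - \frac{1}{2\sigma^2} \sum_{i=1}^N (X_i - \theta)^2,
\qquad
\frac{d}{d\theta} \log f(X; \theta) = \frac{1}{\sigma^2} \sum_{i=1}^N (X_i - \theta),
$$

so the Fisher information is $\mathbb{E}\big[\big(\frac{1}{\sigma^2}\sum_{i}(X_i - \theta)\big)^2\big] = N/\sigma^2$ and $\text{CRLB}(\theta) = \sigma^2 / N$. The sample mean $\bar{X} = \frac{1}{N}\sum_i X_i$ is unbiased with variance exactly $\sigma^2/N$, so it attains the bound and is therefore a UMVUE of $\theta$.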
To prove the CRLB, let $W$ and $Y$ be two random variables. In general, neither $\mathbb{E}[W]$ nor $\mathbb{E}[Y]$ need be zero, but for now assume $\mathbb{E}[Y] = 0$. A property of covariance is
$$
(\text{Cov}[W, Y])^2 \leq \mathbb{V}[W] \mathbb{V}[Y]. \tag{2}
$$
This can be derived by applying Cauchy–Schwarz to random variables (see the Appendix). Now set $W$ and $Y$ to
$$
W = T(X), \qquad Y = \frac{\partial}{\partial \theta} \log f(X; \theta). \tag{3}
$$
We know that the expectation of the score is zero. Therefore, $\mathbb{E}[Y] = 0$ as desired. Then Equation $2$ can be rewritten as
$$
\begin{aligned}
\mathbb{V}[T(X)] \, \mathbb{V}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]
&\geq \left(\text{Cov}\left[T(X), \frac{\partial}{\partial \theta} \log f(X; \theta)\right]\right)^2
\\
\mathbb{V}[T(X)]
&\geq \frac{\left(\text{Cov}\left[T(X), \frac{\partial}{\partial \theta} \log f(X; \theta)\right]\right)^2}{\mathbb{V}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]}.
\end{aligned} \tag{4}
$$
The numerator is our desired quantity. Ignoring the square, we have
$$
\begin{aligned}
&\text{Cov}\left[T(X), \frac{\partial}{\partial \theta} \log f(X; \theta)\right]
\\
&= \mathbb{E}\left[T(X)\frac{\partial}{\partial \theta} \log f(X; \theta) \right] - \overbrace{\mathbb{E}[T(X)] \, \mathbb{E}\left[\frac{\partial}{\partial \theta} \log f(X; \theta) \right]}^{=\,0}
\\
&= \int T(X) \frac{\partial}{\partial \theta} \log f(X; \theta) \, f(X; \theta) \, \text{d}\mu(X)
\\
&\stackrel{\star}{=} \int T(X) \frac{\partial}{\partial \theta} f(X; \theta) \, \text{d}\mu(X)
\\
&\stackrel{\dagger}{=} \frac{\partial}{\partial \theta} \int T(X) f(X; \theta) \, \text{d}\mu(X)
\\
&= \frac{\partial}{\partial \theta} \mathbb{E}[T(X)].
\end{aligned} \tag{5}
$$
In step $\star$, we use the fact that if $g(x) = \log h(x)$, then $g^{\prime}(x) = h^{\prime}(x) / h(x)$. In step $\dagger$, we use our assumption that we can interchange integration and differentiation.
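The overbrace in Equation $5$ (and the claim before Equation $4$) relies on the score having zero expectation. For completeness, here is the standard one-line argument, which uses the same log-derivative identity and interchange assumption:

$$
\mathbb{E}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]
= \int \frac{\partial}{\partial \theta} f(X; \theta) \, \text{d}\mu(X)
= \frac{\partial}{\partial \theta} \int f(X; \theta) \, \text{d}\mu(X)
= \frac{\partial}{\partial \theta} 1
= 0.
$$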
The denominator is the desired quantity because, again using the fact that the score has zero mean,
$$
\begin{aligned}
\mathbb{V}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]
&= \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right] - \mathbb{E}\left[\frac{\partial}{\partial \theta} \log f(X; \theta)\right]^2
\\
&= \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right].
\end{aligned} \tag{6}
$$
Putting these results together in Equation $4$, we have
$$
\mathbb{V}[T(X)] \geq \frac{\left( \frac{\partial}{\partial \theta} \mathbb{E}[T(X)] \right)^2}{\mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^2\right]} \tag{7}
$$
as desired.
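As a quick numerical sanity check of the bound, here is a minimal simulation of the Gaussian-mean example from above (the use of NumPy and the particular parameter values are illustrative choices, not part of the derivation):

```python
import numpy as np

rng = np.random.default_rng(0)

theta, sigma, N = 2.0, 1.5, 50   # true mean, known std. dev., sample size
n_trials = 100_000               # number of simulated datasets

# T(X) = sample mean, an unbiased estimator of theta.
samples = rng.normal(theta, sigma, size=(n_trials, N))
estimates = samples.mean(axis=1)

# Fisher information is N / sigma^2, so CRLB(theta) = sigma^2 / N.
crlb = sigma**2 / N

print(f"empirical variance of T(X): {estimates.var():.5f}")
print(f"CRLB(theta):                {crlb:.5f}")
```

The two printed numbers should agree up to Monte Carlo error, since the sample mean attains the bound in this model.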
Appendix
1. Covariance inequality
The Cauchy–Schwarz inequality for vectors $\mathbf{u}$ and $\mathbf{v}$ with an inner product $\langle \mathbf{u}, \mathbf{v} \rangle$ is
$$
| \langle \mathbf{u}, \mathbf{v} \rangle |^2 \leq \langle \mathbf{u}, \mathbf{u} \rangle \cdot \langle \mathbf{v}, \mathbf{v} \rangle. \tag{A1.1}
$$
Now note that for real-valued random variables $W$ and $Y$, the expected value of their product is itself an inner product:
$$
\langle W, Y \rangle = \mathbb{E}[WY]. \tag{A1.2}
$$
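To see why this defines an inner product (on square-integrable random variables, identifying variables that are equal almost surely, a detail not spelled out here), note that it is symmetric, linear in each argument, and nonnegative on the diagonal:

$$
\mathbb{E}[WY] = \mathbb{E}[YW], \qquad
\mathbb{E}[(aW_1 + bW_2)Y] = a\,\mathbb{E}[W_1 Y] + b\,\mathbb{E}[W_2 Y], \qquad
\mathbb{E}[W^2] \geq 0,
$$

with $\mathbb{E}[W^2] = 0$ only if $W = 0$ almost surely.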
Now let $\mathbb{E}[W] = \omega$ and $\mathbb{E}[Y] = \gamma$. Then if we apply the definition of covariance and Cauchy–Schwarz to this inner product, we have
$$
\begin{aligned}
|\text{Cov}[W, Y]|^2
&\triangleq |\mathbb{E}[(W - \omega)(Y - \gamma)]|^2
\\
&= |\langle W - \omega, Y - \gamma \rangle|^2
\\
&\leq \langle W - \omega, W - \omega \rangle \cdot \langle Y - \gamma, Y - \gamma \rangle
\\
&= \mathbb{E}[(W - \omega)^2] \, \mathbb{E}[(Y - \gamma)^2]
\\
&= \mathbb{V}[W] \, \mathbb{V}[Y]
\end{aligned} \tag{A1.3}
$$
as desired.