De Moivre–Laplace Theorem

I work through a standard proof of the de Moivre–Laplace theorem, which is the earliest version of the central limit theorem.

As I understand it, the de Moivre–Laplace theorem is the earliest version of the central limit theorem (CLT). In his book The Doctrine of Chances (De Moivre, 1738), Abraham de Moivre proved that the probability mass function of the binomial distribution asymptotically approximates the probability density function of a particular normal distribution as the parameter $n$ grows arbitrarily large. Today, we know that the CLT generalizes this result, and we might say this is a special case of the CLT for the binomial distribution.

To introduce notation, we say that $X_n$ is a binomial random variable with parameters $n$ and $p$ if

$$
\mathbb{P}(X_n = k) = {n \choose k} p^k q^{n-k}, \qquad p \in [0, 1], \;\; q := 1-p, \;\; n \in \mathbb{N}. \tag{1}
$$

Typically, we view $X_n$ as the sum of $n$ Bernoulli random variables, each with parameter $p$. Intuitively, if we flip $n$ coins, each with bias $p$, Equation 1 gives the probability of $k$ successes.
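Throughout, I'll sanity-check steps numerically. Here is a minimal check (assuming SciPy is installed; the parameter values are arbitrary) that Equation 1 matches SciPy's built-in binomial PMF:

```python
import math

from scipy.stats import binom

n, p = 20, 0.3
q = 1 - p

for k in [0, 5, 10, 20]:
    # Equation 1, computed directly from the definition.
    manual = math.comb(n, k) * p**k * q ** (n - k)
    # Compare against SciPy's binomial PMF.
    assert math.isclose(manual, binom.pmf(k, n, p))
```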

This is clearly related to the CLT, which loosely states that the properly normalized sum of independent random variables asymptotically approaches the normal distribution. If we let $Y_i$ denote these Bernoulli random variables, we can express this idea as

$$
\overbrace{\vphantom{\Big|} Y_1 + Y_2 + \dots + Y_n}^{X_n} \;\simeq\; \mathcal{N}\!\left(np, npq\right), \tag{2}
$$

where $\simeq$ denotes asymptotic equivalence as $n \rightarrow \infty$.

This is probably the most intuitive form of the CLT, because if we simply plot the probability mass function (PMF) of the binomial distribution for increasing values of $n$, we get a discrete distribution that looks a lot like the normal distribution even for relatively small $n$ (Figure 1). In contrast, the CLT feels much less obvious if I claim (correctly) that the properly normalized sum of skew normal random variables is also normally distributed!

Figure 1. Binomial distributions for increasing $n$ and fixed $p = 1/2$. The red line is the probability density function of the normal distribution with mean $np$ and variance $npq$.
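Here is a rough sketch of how a figure like Figure 1 could be generated, assuming Matplotlib and SciPy (this is my own illustrative code, not necessarily how the figure was made):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import binom, norm

p = 0.5
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
for ax, n in zip(axes, [5, 20, 100]):
    ks = np.arange(n + 1)
    ax.bar(ks, binom.pmf(ks, n, p), color="gray")  # Binomial PMF.
    # Normal PDF with matching mean np and variance npq.
    xs = np.linspace(0, n, 500)
    ax.plot(xs, norm.pdf(xs, n * p, np.sqrt(n * p * (1 - p))), color="red")
    ax.set_title(f"n = {n}")
plt.tight_layout()
plt.show()
```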

A modern version of de Moivre’s proof is tedious, but it’s not actually that hard to follow. This post is simply my notes on that proof. To start, let’s rewrite the binomial coefficient without factorials using Stirling’s approximation:

$$
n! \simeq \sqrt{2\pi n} \left(\frac{n}{e}\right)^n. \tag{3}
$$
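Stirling's approximation is easy to check numerically. In this quick sketch, the relative error shrinks as $n$ grows:

```python
import math

# Relative error of Stirling's approximation, which vanishes as n grows.
for n in [1, 5, 10, 50, 100]:
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    exact = math.factorial(n)
    print(n, abs(stirling - exact) / exact)
```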

As a historical aside, note that while Stirling is credited with this approximation, it was actually de Moivre who discovered an early version of it while working on these ideas. So de Moivre has been robbed twice, once for this approximation and once for the normal distribution sometimes being called the “Gaussian” rather than the “de Moivrian”. Anyway, using Stirling’s approximation, we can rewrite the binomial coefficient as

$$
\begin{aligned}
{n \choose k} &\simeq \sqrt{\frac{2\pi n}{(2\pi k)(2\pi (n-k))}} \left(\frac{n}{e}\right)^n \left(\frac{k}{e}\right)^{-k} \left(\frac{n-k}{e}\right)^{k-n} \\
&= \sqrt{\frac{n}{2\pi k (n-k)}} \left(\frac{n^n}{k^k (n-k)^{n-k}}\right).
\end{aligned} \tag{4}
$$
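We can check Equation 4 numerically. In this sketch, I compute the approximation in log space to avoid overflow (the particular $n$ and $k$ are arbitrary):

```python
import math

# Equation 4's approximation to the binomial coefficient, computed in log
# space to avoid overflow, versus the exact value.
n, k = 100, 40
log_term = n * math.log(n) - k * math.log(k) - (n - k) * math.log(n - k)
approx = math.sqrt(n / (2 * math.pi * k * (n - k))) * math.exp(log_term)
print(approx, math.comb(n, k))  # Agree to roughly 1% here.
```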

If we multiply this term by the “raw probabilities” $p^k q^{n-k}$ and group the terms raised to the powers $k$ and $n-k$, we get:

$$
\begin{aligned}
{n \choose k} p^k q^{n-k} &\simeq \sqrt{\frac{n}{2\pi k (n-k)}} \left(\frac{n^n}{k^k (n-k)^{n-k}}\right) p^k q^{n-k} \\
&= \sqrt{\frac{n}{2\pi k (n-k)}} \left( \frac{np}{k} \right)^k \left( \frac{nq}{n-k} \right)^{n-k}.
\end{aligned} \tag{5}
$$

My understanding of the motivation for the next two steps is that we want to “push” $n$ into the denominator, which is often convenient in asymptotics because such terms vanish as $n$ gets larger. Let’s tackle the normalizing term (the square root) and the probabilities separately.

First, the square root. Note that by the law of large numbers, as $n$ gets very large, $k/n$ gets arbitrarily close to the true probability of success $p$. So let’s rewrite the square root in terms of $k/n$ and then replace $k/n$ with $p$:

$$
\sqrt{\frac{n}{2\pi k (n-k)}} = \sqrt{\frac{1}{2\pi \frac{k}{n} n \left(1 - \frac{k}{n}\right)}} \simeq \frac{1}{\sqrt{2\pi npq}}. \tag{6}
$$

If you’re already familiar with the normal distribution, this term should look suspiciously like its normalizing constant!
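Here is a quick numerical check of Equation 6, where I simply set $k$ to the nearest integer to $np$ (an illustrative choice):

```python
import math

p = 0.3
q = 1 - p
for n in [10, 100, 1000, 10_000]:
    k = round(n * p)  # k/n concentrates near p by the law of large numbers.
    lhs = math.sqrt(n / (2 * math.pi * k * (n - k)))
    rhs = 1 / math.sqrt(2 * math.pi * n * p * q)
    print(n, lhs, rhs)  # The two columns converge.
```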

Second, the probabilities. The next step is a fairly standard trick, which is to convert a product into a sum by taking the exp-log of the product. Looking only at the terms raised to $k$ and $n-k$ in Equation 5, we get:

$$
\begin{aligned}
\left( \frac{np}{k} \right)^k \left( \frac{nq}{n-k} \right)^{n-k} &= \exp \left\{ \log \left( \frac{np}{k} \right)^k + \log \left( \frac{nq}{n-k} \right)^{n-k} \right\} \\
&= \exp \left\{ - k \log \left( \frac{k}{np} \right) + (k-n) \log \left( \frac{n-k}{nq} \right) \right\}.
\end{aligned} \tag{7}
$$

The next trick is to express $k$ in terms of a standardized binomial random variable $z$. Notice that $X_n$ is the sum of $n$ independent Bernoulli random variables. By the linearity of expectation and the additivity of variance under independence, we have:

$$
\begin{aligned}
\mathbb{E}[X_n] &= \sum_{i=1}^n \mathbb{E}[Y_i] = np, \\
\mathbb{V}[X_n] &= \sum_{i=1}^n \mathbb{V}[Y_i] = npq.
\end{aligned} \tag{8}
$$

Since the mean of $X_n$ is $np$ and its variance is $npq$, a standardized binomial random variable is

$$
z := \frac{k - \mathbb{E}[X_n]}{\sqrt{\mathbb{V}[X_n]}} = \frac{k - np}{\sqrt{npq}}. \tag{9}
$$

Solving this for $k$, we get

$$
k = np + z \sqrt{npq}. \tag{10}
$$
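As a quick empirical check of this standardization (assuming NumPy; the parameters are arbitrary), standardized binomial draws should have mean near zero and standard deviation near one:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 1000, 0.3
q = 1 - p

# Standardize binomial draws as in Equation 9; the result should have
# mean near 0 and standard deviation near 1.
samples = rng.binomial(n, p, size=100_000)
z = (samples - n * p) / np.sqrt(n * p * q)
print(z.mean(), z.std())
```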

Substituting this definition of $k$ into Equation 7 (the point here is to express $k$ in terms of $n$, the term we want to pay attention to as it increases), we get:

$$
\begin{aligned}
&\exp \left\{ - k \log \left( \frac{k}{np} \right) + (k-n) \log \left( \frac{n-k}{nq} \right) \right\} \\
&= \exp \left\{ - k \log \left( \frac{np + z \sqrt{npq}}{np} \right) + (k-n) \log \left( \frac{n - np - z \sqrt{npq}}{nq} \right) \right\} \\
&= \exp \left\{ - k \log \left( 1 + z \sqrt{\frac{q}{np}} \right) + (k-n) \log \left( 1 - z \sqrt{\frac{p}{nq}} \right) \right\}.
\end{aligned} \tag{11}
$$

In my mind, the final step is the least obvious, but it’s lovely when you see it. Recall that the Maclaurin series of $\log(1+x)$ is

$$
\log(1+x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots, \qquad |x| < 1. \tag{12}
$$
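If you want to convince yourself numerically rather than by hand, here is a tiny sketch comparing a truncated series to the true value:

```python
import math

# Compare a truncated Maclaurin series for log(1 + x) to the true value.
x = 0.1
partial = sum((-1) ** (j + 1) * x**j / j for j in range(1, 6))
print(partial, math.log(1 + x))  # Agree to several decimal places.
```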

The series itself is a fairly standard result, and it’s worth deriving yourself if you’ve never done it. Note that both definitions of $x$ below shrink to zero as $n$ grows, so the expansion is valid for large $n$. Anyway, we can plug in these two definitions of $x$,

$$
x := z \sqrt{\frac{q}{np}}, \qquad x := -z \sqrt{\frac{p}{nq}}, \tag{13}
$$

into Equation 12 above, and use that to expand the logs in Equation 11 into infinite sums. Why are we doing this? The key idea is that nearly every term in each sum will be a fraction with $n$ in the denominator. So as $n$ grows larger, these terms become arbitrarily small. In the limit, they vanish. All that will be left is the normal distribution’s kernel, $\exp\{-0.5 z^2\}$. Let’s do this.

First, let’s just look at one of the log terms. We can write the left one as:

$$
\begin{aligned}
&-k \log \left( 1 + z \sqrt{\frac{q}{np}} \right) \\
&= -(np + z\sqrt{npq})\left[z \left(\frac{q}{np}\right)^{1/2} - \frac{1}{2} z^2 \frac{q}{np} + \frac{1}{3} z^3 \left(\frac{q}{np}\right)^{3/2} - \dots \right].
\end{aligned} \tag{14}
$$

The key thing to see is that for most terms in the sum, even after we multiply by $n$ or $\sqrt{n}$, we still have $n$ in the denominator. And these terms vanish, since for any constant $c$, the ratio $c/n$ goes to zero as $n \rightarrow \infty$. So multiplying out the terms in Equation 14, we get

$$
\begin{aligned}
&\left[ - z\sqrt{npq} + \frac{1}{2} z^2 q - \frac{1}{3} z^3 \frac{q^{3/2}}{\sqrt{np}} + \dots \right] + \left[ - z^2 q + \frac{1}{2} z^3 \frac{q^{3/2}}{\sqrt{np}} - \frac{1}{3} z^4 \frac{q^2}{np} + \dots \right] \\
&\simeq -z\sqrt{npq} + \frac{1}{2} z^2 q - z^2 q \\
&= -z\sqrt{npq} - \frac{1}{2} z^2 q.
\end{aligned} \tag{15}
$$

That’s the basic idea. If we do the expansion for the other term in Equation 11, we’ll see that it’s asymptotically

$$
\begin{aligned}
(k-n) \log \left( 1 - z \sqrt{\frac{p}{nq}} \right) &= z\sqrt{npq} + \frac{1}{2} z^2 p - z^2 p + \dots \\
&\simeq z\sqrt{npq} - \frac{1}{2} z^2 p.
\end{aligned} \tag{16}
$$

Putting these two terms together, we see that the exponential term is asymptotically:

$$
\begin{aligned}
&\exp \left\{ - k \log \left( 1 + z \sqrt{\frac{q}{np}} \right) + (k-n) \log \left( 1 - z \sqrt{\frac{p}{nq}} \right) \right\} \\
&\simeq \exp\left\{ -z\sqrt{npq} - \frac{1}{2} z^2 q + z\sqrt{npq} - \frac{1}{2} z^2 p \right\} \\
&= \exp\left\{ - \frac{1}{2} z^2 p - \frac{1}{2} z^2 q \right\} \\
&= \exp\left\{ - \frac{1}{2} z^2 (p + q) \right\} \\
&= \exp\left\{ - \frac{1}{2} z^2 \right\}.
\end{aligned} \tag{17}
$$
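We can verify this limit numerically. In this sketch, I hold $z$ fixed and let $n$ grow; the exponent in Equation 17 should approach $-z^2/2$ (the parameter values are arbitrary):

```python
import math

# For fixed z, the exponent should approach -z^2 / 2 as n grows.
p, z = 0.3, 1.5
q = 1 - p

for n in [10**2, 10**4, 10**6]:
    k = n * p + z * math.sqrt(n * p * q)  # Equation 10; k need not be an integer here.
    exponent = (-k * math.log(1 + z * math.sqrt(q / (n * p)))
                + (k - n) * math.log(1 - z * math.sqrt(p / (n * q))))
    print(n, exponent, -0.5 * z**2)
```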

The surviving factor $\exp\{-0.5 z^2\}$ is exactly the normal distribution’s kernel! Putting this together with the normalizing term in Equation 6 and then using the definition of the standardized variable $z$ in Equation 9, we get:

$$
{n \choose k} p^k q^{n-k} \;\simeq\; \frac{1}{\sqrt{2\pi npq}} \exp\left\{ - \frac{1}{2} \left( \frac{k - np}{\sqrt{npq}} \right)^2 \right\}. \tag{18}
$$

And we’re done! This is quite elegant, because we have expressed this asymptotic distribution in terms of the mean and variance of $X_n$.
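As a final sanity check (assuming SciPy; the values are arbitrary), Equation 18 closely matches the exact binomial PMF for large $n$:

```python
import math

from scipy.stats import binom

n, p = 1000, 0.3
q = 1 - p

k = 310  # A value near the mean, np = 300.
z = (k - n * p) / math.sqrt(n * p * q)
approx = math.exp(-0.5 * z**2) / math.sqrt(2 * math.pi * n * p * q)
print(approx, binom.pmf(k, n, p))  # Close for large n.
```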

This is remarkable! I still remember the first time I saw this derived and realized precisely why the normal distribution was so pervasive. The normal distribution is everywhere because if you take a bunch of random noise and smash it together, the result is most likely normally distributed!

Figure 2. (Left) A histogram of $Y_i$, where each sample is skew normally distributed. (Right) A histogram of $X_n$, where $X_n = Y_1 + \dots + Y_n$ is approximately normally distributed.

Note that the more general CLT does not require that the random variables in the sum be Bernoulli distributed. For example, if $X_n$ is the sum of $n$ independent skew normal random variables, $X_n$ itself is still approximately normally distributed for large $n$! See Figure 2 for a numerical experiment demonstrating this. The de Moivre–Laplace theorem was the first hint that this more general result, the central limit theorem, was actually true.
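Here is a sketch of the kind of numerical experiment behind Figure 2, using SciPy's skew normal distribution (my parameter choices are arbitrary, and the original figure's settings may differ):

```python
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import skewnorm

rng = np.random.default_rng(0)
a = 5            # Skewness parameter, an arbitrary choice.
n = 100          # Number of terms in each sum.
trials = 10_000  # Number of sums to draw.

# Each row is one realization of (Y_1, ..., Y_n); each X_n is a row sum.
samples = skewnorm.rvs(a, size=(trials, n), random_state=rng)
sums = samples.sum(axis=1)

fig, axes = plt.subplots(1, 2, figsize=(9, 3))
axes[0].hist(samples[:, 0], bins=50)  # Skewed individual draws.
axes[1].hist(sums, bins=50)           # Their sums look approximately normal.
plt.show()
```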

  1. De Moivre, A. (1738). The doctrine of chances: or, A method of calculating the probabilities of events in play. Woodfall.