De Moivre–Laplace Theorem
I work through a standard proof of the de Moivre–Laplace theorem, which is the earliest version of the central limit theorem.
As I understand it, the de Moivre–Laplace theorem is the earliest version of the central limit theorem (CLT). In his book The Doctrine of Chances (De Moivre, 1738), Abraham de Moivre proved that the probability mass function of the binomial distribution asymptotically approximates the probability density function of a particular normal distribution as its parameter $n$ grows arbitrarily large. Today, we know that the CLT generalizes this result, and we might say this is a special case of the CLT for the binomial distribution.
To introduce notation, we say that $X$ is a binomial random variable with parameters $n$ and $p$ if

$$
\mathbb{P}(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}, \qquad k \in \{0, 1, \dots, n\}. \tag{1}
$$
Typically, we view $X$ as the sum of $n$ independent Bernoulli random variables, each with parameter $p$. Intuitively, if we flip $n$ coins each with bias $p$, Equation (1) gives the probability of $k$ successes.
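As a quick sanity check (my own addition, not part of the original derivation), here is a small Python snippet, assuming NumPy and SciPy are available, that compares Equation (1) against the empirical frequency of $k$ successes in simulated coin flips:

```python
import numpy as np
from scipy.stats import binom

# Compare the binomial PMF in Equation (1) to the empirical frequency of
# exactly k successes across many simulated runs of n biased coin flips.
n, p, k = 20, 0.3, 6
rng = np.random.default_rng(0)
flips = rng.random((100_000, n)) < p         # 100k runs of n flips each
empirical = np.mean(flips.sum(axis=1) == k)  # fraction with exactly k successes

print(binom.pmf(k, n, p))  # exact PMF, roughly 0.192
print(empirical)           # should be close to the exact value
```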
This is clearly related to the CLT, which loosely states that the properly normalized sum of independent random variables asymptotically approaches the normal distribution. If we let $B_1, B_2, \dots, B_n$ denote these Bernoulli random variables, we can express this idea as

$$
\mathbb{P}\left(\sum_{i=1}^{n} B_i = k\right) = \binom{n}{k} p^k (1 - p)^{n - k} \simeq \frac{1}{\sqrt{2 \pi n p (1 - p)}} \exp\left(-\frac{(k - np)^2}{2 n p (1 - p)}\right), \tag{2}
$$

where $\simeq$ denotes asymptotic equivalence as $n \rightarrow \infty$.
This is probably the most intuitive form of the CLT, because if we simply plot the probability mass function (PMF) of the binomial distribution for increasing values of $n$, we get a discrete distribution that looks a lot like the normal distribution even for relatively small $n$ (Figure 1). In contrast, the CLT feels much less obvious if I claim (correctly) that the properly normalized sum of skew normal random variables is also normally distributed!
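Here is a sketch of the kind of comparison behind Figure 1 (my reconstruction, not the original figure code): the Binomial($n$, $p$) PMF hugs the density of $\mathcal{N}(np, npq)$, and the gap shrinks as $n$ grows.

```python
import numpy as np
from scipy.stats import binom, norm

# Compare the Binomial(n, p) PMF to the normal density with matching
# mean (np) and variance (npq) on the integer grid k = 0, ..., n.
p = 0.3
for n in [10, 50, 250]:
    k = np.arange(n + 1)
    pmf = binom.pmf(k, n, p)
    pdf = norm.pdf(k, loc=n * p, scale=np.sqrt(n * p * (1 - p)))
    print(n, np.max(np.abs(pmf - pdf)))  # worst-case gap shrinks with n
```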

A modern version of de Moivre’s proof is tedious, but it’s not actually that hard to follow. This post is simply my notes on that proof. To start, let’s rewrite the binomial coefficient without the factorial using Stirling’s approximation:

$$
n! \simeq \sqrt{2 \pi n} \left(\frac{n}{e}\right)^n.
$$
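(A quick aside of mine: it’s easy to convince yourself of Stirling’s approximation numerically, since the ratio of the approximation to the true factorial tends to one.)

```python
import math

# Stirling's approximation: n! ≈ sqrt(2*pi*n) * (n/e)**n.
for n in [5, 10, 50, 100]:
    stirling = math.sqrt(2 * math.pi * n) * (n / math.e) ** n
    print(n, stirling / math.factorial(n))  # ratio approaches 1 from below
```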
As a historical aside, note that while Stirling is credited with this approximation, it was actually de Moivre who discovered an early version of it while working on these ideas. So de Moivre has been robbed twice, once for this approximation and once for the normal distribution sometimes being called the “Gaussian” rather than the “de Moivrian”. Anyway, using Stirling’s approximation, we can rewrite the binomial coefficient as

$$
\binom{n}{k} = \frac{n!}{k! (n - k)!} \simeq \frac{\sqrt{2 \pi n} \left(\frac{n}{e}\right)^n}{\sqrt{2 \pi k} \left(\frac{k}{e}\right)^k \sqrt{2 \pi (n - k)} \left(\frac{n - k}{e}\right)^{n - k}} = \sqrt{\frac{n}{2 \pi k (n - k)}} \, \frac{n^n}{k^k (n - k)^{n - k}}.
$$
If we multiply this term by the “raw probabilities” $p^k q^{n-k}$, where $q = 1 - p$, and group the terms raised to the powers $k$ and $n - k$, we get:

$$
\binom{n}{k} p^k q^{n - k} \simeq \sqrt{\frac{n}{2 \pi k (n - k)}} \left(\frac{np}{k}\right)^k \left(\frac{nq}{n - k}\right)^{n - k}. \tag{3}
$$
My understanding of the motivation for the next two steps is that we want to “push” $n$ into the denominator, which is often nice in asymptotics because it makes terms vanish as $n$ gets larger. Let’s tackle the normalizing term (the square root) and the probabilities separately.
First, the square root. Note that by the law of large numbers, as $n$ gets very large, $k/n$ arbitrarily approaches the true probability of success $p$. So let’s rewrite the square root in terms of $k/n$ and then write $k/n$ in terms of $p$:

$$
\sqrt{\frac{n}{2 \pi k (n - k)}} = \frac{1}{\sqrt{2 \pi n \frac{k}{n} \left(1 - \frac{k}{n}\right)}} \simeq \frac{1}{\sqrt{2 \pi n p q}}. \tag{4}
$$
If you were already familiar with the normal distribution, this term should look suspiciously like the normalizing constant!
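To see Equation (4) numerically (my own check, not from the original post), take $k$ one standard deviation above the mean and watch the ratio of the two sides approach one:

```python
import numpy as np

# The square-root term in Equation (3) approaches 1/sqrt(2*pi*n*p*q)
# for k within a typical fluctuation (order sqrt(n)) of np.
p = 0.3
q = 1 - p
for n in [10**2, 10**4, 10**6]:
    k = int(n * p + np.sqrt(n * p * q))  # one standard deviation above np
    lhs = np.sqrt(n / (2 * np.pi * k * (n - k)))
    rhs = 1 / np.sqrt(2 * np.pi * n * p * q)
    print(n, lhs / rhs)  # ratio approaches 1 as n grows
```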
Second, the probabilities. The next step is a fairly standard trick, which is to convert a product into a sum by taking the exp-log of the product. Looking only at the terms raised to $k$ and $n - k$ in Equation (3), we get:

$$
\left(\frac{np}{k}\right)^k \left(\frac{nq}{n - k}\right)^{n - k} = \exp\left(-k \ln \frac{k}{np} - (n - k) \ln \frac{n - k}{nq}\right). \tag{5}
$$
The next trick is to express $k$ in terms of a standardized binomial random variable. Notice that $X$ is the sum of $n$ independent Bernoulli random variables $B_i$. By the linearity of expectation and the linearity of variance under independence, we have:

$$
\mathbb{E}[X] = \sum_{i=1}^{n} \mathbb{E}[B_i] = np, \qquad \mathbb{V}[X] = \sum_{i=1}^{n} \mathbb{V}[B_i] = npq.
$$
Since the mean of $X$ is $np$ and its variance is $npq$, a standardized binomial random variable is

$$
x = \frac{k - np}{\sqrt{npq}}. \tag{6}
$$
And we can write $k$ (and therefore $n - k$) in terms of $x$ as

$$
k = np + x \sqrt{npq}, \qquad n - k = nq - x \sqrt{npq}.
$$
Putting this definition of $k$ into Equation (5) above (the point here is to express the exponent in terms of $x$ and $n$, since $n$ is the term we want to pay attention to as it increases), we get:

$$
\exp\left(-\left(np + x \sqrt{npq}\right) \ln\left(1 + x \sqrt{\frac{q}{np}}\right) - \left(nq - x \sqrt{npq}\right) \ln\left(1 - x \sqrt{\frac{p}{nq}}\right)\right). \tag{7}
$$
In my mind, the final step is the least obvious, but it’s lovely when you see it. Recall that the Maclaurin series of $\ln(1 + z)$ is

$$
\ln(1 + z) = z - \frac{z^2}{2} + \frac{z^3}{3} - \frac{z^4}{4} + \cdots
$$
This is a fairly standard result, and it’s worth just writing it out yourself if you’ve never done it. Anyway, we can plug in these two definitions of $z$,

$$
z = x \sqrt{\frac{q}{np}} \qquad \text{and} \qquad z = -x \sqrt{\frac{p}{nq}},
$$

into the series above, and use that to expand the logs in Equation (7) into infinite sums. Why are we doing this? The key idea is that nearly every term in each sum will be a fraction with $n$ in the denominator. So as $n$ grows larger, these terms will become arbitrarily small. In the limit, they vanish. All that will be left is the normal distribution’s kernel, $\exp(-x^2 / 2)$. Let’s do this.
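(If you’d rather not expand the series by hand, SymPy can confirm it; this aside is my own, assuming sympy is installed.)

```python
import sympy as sp

# Maclaurin series of log(1 + z) around z = 0.
z = sp.symbols('z')
print(sp.series(sp.log(1 + z), z, 0, 5))
# Prints: z - z**2/2 + z**3/3 - z**4/4 + O(z**5)
```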
First, let’s just look at one of the log terms. We can write the left one as:

$$
\ln\left(1 + x \sqrt{\frac{q}{np}}\right) = x \sqrt{\frac{q}{np}} - \frac{x^2 q}{2 n p} + \frac{x^3}{3} \left(\frac{q}{np}\right)^{3/2} - \cdots
$$
The key thing to see is that for most terms in the sum, after we multiply them by $np$ or $x \sqrt{npq}$, we still have $n$ in the denominator. And these terms vanish, since for any constant $c$, the ratio $c / \sqrt{n}$ goes to zero as $n \rightarrow \infty$. So multiplying out the terms in Equation (7), we get

$$
-\left(np + x \sqrt{npq}\right) \ln\left(1 + x \sqrt{\frac{q}{np}}\right) = -x \sqrt{npq} - \frac{x^2 q}{2} + O\left(\frac{1}{\sqrt{n}}\right).
$$
That’s the basic idea. If we do the expansion for the other term in Equation (7), we’ll see that it’s equal to:

$$
-\left(nq - x \sqrt{npq}\right) \ln\left(1 - x \sqrt{\frac{p}{nq}}\right) = x \sqrt{npq} - \frac{x^2 p}{2} + O\left(\frac{1}{\sqrt{n}}\right).
$$
Putting these two terms together and using $p + q = 1$, we can see that the exponent is equal to:

$$
-x \sqrt{npq} - \frac{x^2 q}{2} + x \sqrt{npq} - \frac{x^2 p}{2} + O\left(\frac{1}{\sqrt{n}}\right) = -\frac{x^2}{2} + O\left(\frac{1}{\sqrt{n}}\right) \;\longrightarrow\; -\frac{x^2}{2}.
$$
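We can also confirm this limit numerically rather than symbolically (my own check): plug $k = np + x\sqrt{npq}$ into the exponent of Equation (5) and watch it converge to $-x^2/2$ for a fixed $x$.

```python
import numpy as np

# The exponent -k*log(k/(n*p)) - (n-k)*log((n-k)/(n*q)) should converge
# to -x**2/2 as n grows, for fixed x.
p, x = 0.3, 1.5
q = 1 - p
for n in [10**2, 10**4, 10**6, 10**8]:
    k = n * p + x * np.sqrt(n * p * q)
    exponent = -k * np.log(k / (n * p)) - (n - k) * np.log((n - k) / (n * q))
    print(n, exponent)  # approaches -x**2/2 = -1.125
```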
This limit, $e^{-x^2/2}$, is the normal distribution’s kernel! Putting it together with the normalizing term in Equation (4) and then using the definition of the standardized variable in Equation (6), we get:

$$
\binom{n}{k} p^k q^{n - k} \simeq \frac{1}{\sqrt{2 \pi n p q}} \exp\left(-\frac{x^2}{2}\right) = \frac{1}{\sqrt{2 \pi n p q}} \exp\left(-\frac{(k - np)^2}{2 n p q}\right).
$$
And we’re done! This is quite elegant, because we have expressed this asymptotic distribution in terms of the mean $np$ and variance $npq$ of $X$.
This is remarkable! I still remember the first time I saw this derived and realized precisely why the normal distribution was so pervasive. The normal distribution is everywhere because if you take a bunch of independent random noise and smash it together, the result is approximately normally distributed!

Note that the more general CLT does not require that the random variables in the sum be Bernoulli distributed. For example, if $Y$ is the properly normalized sum of $n$ independent skew normal random variables, $Y$ is still asymptotically normally distributed! See Figure 2 for a numerical experiment demonstrating this. The de Moivre–Laplace theorem was the first hint that this more general result, the central limit theorem, was actually true.
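Here is a sketch of that experiment (my reconstruction; the skewness parameter and sample sizes are arbitrary choices, not necessarily the original figure’s):

```python
import numpy as np
from scipy.stats import norm, skewnorm

# Standardized sums of skew normal draws look normal, even though each
# individual summand is skewed.
rng = np.random.default_rng(0)
n, trials = 250, 20_000
samples = skewnorm.rvs(a=5, size=(trials, n), random_state=rng)
sums = samples.sum(axis=1)
z = (sums - sums.mean()) / sums.std()  # standardize the sums

# The empirical quantiles of z should nearly match standard normal ones.
qs = np.linspace(0.05, 0.95, 7)
print(np.quantile(z, qs).round(2))
print(norm.ppf(qs).round(2))
```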
- De Moivre, A. (1738). The doctrine of chances: or, A method of calculating the probabilities of events in play. Woodfall.