Approximating Stirling's Approximation

How did early mathematicians discover Stirling's approximation, a seemingly non-obvious relationship between factorials, exponents, π\pi, and ee? I provide some plausible reasoning.

I have always been slightly uncomfortable with Stirling’s approximation of the factorial:

n!2πn(ne)n.(1) n! \simeq \sqrt{2 \pi n} \left( \frac{n}{e} \right)^n. \tag{1}

I was uncomfortable because it just felt like magic. Sure, I could read a proof that shows it is true, but why does it work? How did Stirling come up with it? Could I have reasoned my way to this approximation on my own?

But after learning more about the history of the approximation (Dutka, 1991), I found this relationship a bit less mysterious. In fact, I think the history of this problem is a great example of street-fighting mathematics—that is, using basic mathematics, approximations, and guesswork to reason quantitatively, rather than using exact proofs.

So the goal of this post is not to prove that Stirling’s approximation is true. The goal of this post is to use basic mathematics to show it is plausible. In my mind, this was probably how early mathematicians such as Abraham de Moivre and James Stirling approached this problem. Long before they could prove that Equation 11 was true, they had to have somehow convinced themselves that it or a similar functional form were likely. So let’s try to retrace some early steps to gain the same conviction.

Perhaps the most fundamental insight needed is that it is often easier to deal with sums rather than products; and so rather than thinking about n!n!, let’s think about its log:

log ⁣(n!)=i=1nlogi.(2) \log\!\left(n!\right) = \sum_{i=1}^{n} \log i. \tag{2}

This looks a lot like a Riemann integral with a sub-interval width Δx=1\Delta x = 1, which is an approximation of the integral of log(x)\log (x) (Figure 11).

Figure 1. A Riemann integral with Δx=1\Delta x = 1 approximating the integral of log(x)\log(x).

And so we already have a spark of hope, because integrals sometimes have nice, closed forms. So our very first guess might be this:

i=1nlogi1nlog(x)dx.(3) \sum_{i=1}^{n} \log i \approx \int_1^n \log(x) dx. \tag{3}

Conveniently, we can solve this integral using integration by parts:

1nlog(x)dx=nlognn+1,(4) \int_1^n \log(x) dx = n \log n - n + 1, \tag{4}

and this gives us our first approximation:

log(n!)nlognn+1.(5) \log(n!) \approx n \log n - n + 1. \tag{5}

If we exponentiate both sides, we get something very promising:

n!g(n):=e(ne)n.(6) n! \approx g(n) := e \left( \frac{n}{e} \right)^n. \tag{6}

Furthermore, notice that since 252=62525^2 = 625, then

2π6.282.5e.(7) \sqrt{2\pi} \approx \sqrt{6.28} \approx 2.5 \approx e. \tag{7}

So with nothing more than elementary calculus and napkin math, we have an approximation that appears to be within a factor of n\sqrt{n}.

But of course, if we’re early mathematicians, it might not be obvious that this is promising because we don’t know Equation 11 a priori. What could we do? We could check if we’re onto something by computing both n!n! and g(n)g(n) for increasing values of nn. Luckily for us, we have computers, but since n!n! grows very, very quickly, it is hard to visualize. The last datum always dominates yy-axis. So instead, I’ve plotted the ratio between n!n! and g(n)g(n). This ratio should be easy to visualize since it does not grow exponentially (Figure 22).

Figure 2. The ratio between n!n! and g(n)g(n), where g(n)g(n) is defined in Equation 66.

Using Heron’s method for estimating square roots in our head, we can quickly check that this ratio grows roughly with n\sqrt{n}:

204.52,406.32,607.52,8092,(8) \begin{aligned} \sqrt{20} &\approx 4.5^2, \\ \sqrt{40} &\approx 6.3^2, \\ \sqrt{60} &\approx 7.5^2, \\ \sqrt{80} &\approx 9^2, \\ &\dots \end{aligned} \tag{8}

In other words, this figure is screaming at us that our approximation is missing a factor proportional to n\sqrt{n}. Of course it is. I very conveniently plotted the one ratio that would highlight this fact. But let’s pretend we didn’t know this. Could we have gotten to n\sqrt{n} another way?

In fact, we can derive n\sqrt{n} using just a little more calculus. Recall the trapezoidal rule for approximating integrals. We can approximate a definite integral with endpoints aa and bb as

abf(x)dxΔx2(f(x0)+2f(x1)+2f(x2)++2f(xn1)+f(xn)),(9) \int_a^b f(x) dx \approx \frac{\Delta x}{2}\left( f(x_0) + 2f(x_1) + 2f(x_2) + \dots + 2f(x_{n-1}) + f(x_n) \right), \tag{9}

where the points {xi}\{x_i\} partition the region with a constant sub-interval width Δx\Delta x. The basic idea is that we’re approximating each sub-interval of the Riemann integral with a trapezoid rather than a rectangle. This introduces many 0.5f(xi)0.5 f(x_i) terms, due to the triangular”tops”.

Figure 3. An approximation of the integral log(x)\log (x) using the Trapezoidal rule. Here, this is a Riemann integral with Δx=1\Delta x = 1 plus additional triangles to lower the approximation error.

Each intermediate 0.5f(xi)0.5 f(x_i) term has a “sibling” such that the pair sums to unity. The only two terms which don’t have a sibling are at the endpoints a:=x0a := x_0 and b:=xnb := x_n.

So a second key insight is to realize that the right-hand side of Equation 22 is essentially this integral approximation using the trapezoidal rule, except for an extra 0.5f(xi)0.5 f(x_i) for the two endpoints! In other words, we can refine our integral approximation as:

i=1nlogi12(log1+logn)0nlog(x)dx.(10) \sum_{i=1}^n \log i - \frac{1}{2}\left( \log 1 + \log n \right) \approx \int_0^n \log(x) dx. \tag{10}

Combining this with Equation 44, we get

logn!nlognn+1+12logn.(11) \log n! \approx n \log n - n + 1 + \frac{1}{2} \log n. \tag{11}

And exponentiating both sides, we can derive an approximation that is very close in spirit to Stirling’s approximation:

n!h(x):=en(ne)n.(12) n! \approx h(x) := e \sqrt{n} \left( \frac{n}{e} \right)^n. \tag{12}

Now we’re only off from the true approximation by a constant factor,

2πe0.92.(13) \frac{\sqrt{2 \pi}}{e} \approx 0.92. \tag{13}

This is not a function of nn, so if we plot the ratio n!/h(n)n! / h(n), we should expect the ratio to rapidly decay to this factor, which it does (Figure 44).

Figure 4. The ratio between n!n! and h(n)h(n), where h(n)h(n) is defined in Equation 1212.

So that’s it. With some basic calculus and plausible reasoning, we were able to derive an approximation of n!n! that is quite close to true approximation discovered by de Moivre and Stirling. Of course, the reasoning here does not constitute a proof, but in my imagination, this kind of thinking might have guided someone through a more rigorous process. In fact, my understanding is that Stirling is credited with this approximation precisely because he proved the missing factor 2π\sqrt{2 \pi}. So mathematicians knew some form of this approximation was plausible well before Stirling made it rigorous.

Perhaps the most interesting question now might be: why does the number π\pi come into an asymptotic approximation of nn factorial? This is an interesting question, but it is big enough to warrant a second post. Thankfully, I don’t need to write that, since Donald Knuth has already given an interesting talk on just this question: Why Pi?

  1. Dutka, J. (1991). The early history of the factorial function. Archive for History of Exact Sciences, 225–249.