A Poisson–Gamma Mixture Is Negative-Binomially Distributed

We can view the negative binomial distribution as a Poisson distribution with a gamma prior on the rate parameter. I work through this derivation in detail.

Consider a Poisson model for count data,

yPoisson(θ),θ0. y \sim \text{Poisson}(\theta), \qquad \theta \geq 0.

The parameter θ\theta can be interpreted as the rate of arrivals, and importantly, E[y]=Var(y)=θ\mathbb{E}[y] = \text{Var}(y) = \theta. An unfortunate property of this Poisson model is that it cannot model overdispersed data or data in which the variance is greater than the mean. This is because Poisson regression has one free parameter. However, if we place a gamma prior on θ\theta,

yPoisson(θ)θgamma(r,p1p), \begin{aligned} y &\sim \text{Poisson}(\theta) \\ \theta &\sim \text{gamma}\Big(r, \frac{p}{1-p}\Big), \end{aligned}

and then marginalize out θ\theta, we get a negative binomial (NB) distribution, which has the useful property that its variance can be greater than its mean. The derivation is

p(y)=0p(yθ)p(θ)dθ=0(θyeθy!)(1Γ(r)(p1p)rθr1eθ(1p)/p)dθ=(1p)rpry!Γ(r)0θr+y1eθ/pdθ=(1p)rpry!Γ(r)pr+yΓ(r+y)=Γ(r+y)Γ(r)y!py(1p)r=(r+y1)!(r1)!y!py(1p)r=(y+r1y)py(1p)r=NB(r,p). \begin{aligned} p(y) &= \int_{0}^{\infty} p(y \mid \theta) p(\theta) \text{d} \theta \\\\ &= \int_{0}^{\infty} \Big( \frac{\theta^{y} e^{-\theta}}{y!} \Big) \Big( \frac{1}{\Gamma(r) \big( \frac{p}{1-p} \big)^r} \theta^{r - 1} e^{-\theta (1-p)/p} \Big) \text{d} \theta \\\\ &= \frac{(1-p)^r p^{-r}}{y! \Gamma(r)} \int_{0}^{\infty} \theta^{r+y-1} e^{-\theta/p} \text{d} \theta \\\\ &\stackrel{\star}{=} \frac{(1-p)^r p^{-r}}{y! \Gamma(r)} p^{r+y} \Gamma(r+y) \\\\ &= \frac{\Gamma(r+y)}{\Gamma(r) y!} p^y (1-p)^r \\\\ &\stackrel{\dagger}{=} \frac{(r+y-1)!}{(r-1)!y!} p^y (1-p)^r \\\\ &= {y+r-1 \choose y} p^y (1-p)^r \\\\ &= \text{NB}(r, p). \end{aligned}

Step \star holds because of the following equality,

0xbeaxdx=Γ(b+1)ab+1. \int_{0}^{\infty} x^{b} e^{-ax} \text{d}x = \frac{\Gamma(b + 1)}{a^{b+1}}.

Wikipedia claims that this is part of the usefulness of the gamma function: integrals of expressions of the form f(x)eg(x)f(x) e^{-g(x)}, which model exponential decay, can be sometimes solved in closed form using the above equation.

Step \dagger uses the fact that Γ(x)=(x1)!\Gamma(x) = (x - 1)!.