A Python Implementation of the Multivariate t-distribution

I needed a fast and numerically stable Python implementation of the multivariate t-distribution. I wrote one based on SciPy's multivariate distributions module.

Update: My SciPy pull request has been merged into master.


Curiously enough, SciPy does not have an implementation of the multivariate t-distribution. I needed one, but after casting around on the internet, the only thing I found in Python was from this StackOverflow Q&A. I dislike trusting statistical software unless it is widely used or I understand the code, and so I decided to write the function myself. However, once I got underway, I realized that the multivariate t-distribution’s probability density function (PDF),

f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}, \nu) = \frac{\Gamma((\nu + p) / 2)}{\Gamma(\nu / 2) \nu^{p/2} \pi^{p/2} |\boldsymbol{\Sigma}|^{1/2}} \Big[ 1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \Big]^{-(\nu + p)/2} \tag{1}

looks a lot like the multivariate normal’s PDF,

f(\mathbf{x}; \boldsymbol{\mu}, \boldsymbol{\Sigma}) = \frac{1}{\sqrt{(2 \pi)^D \det(\boldsymbol{\Sigma})}} \exp\Big\{ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \Big\}. \tag{2}

This isn’t surprising, since the Student’s t-distribution’s relationship to the normal distribution is well-known. However, since I have already written about and re-implemented SciPy’s fast and numerically stable multivariate normal PDF, I thought it made sense to reuse some of the logic in that implementation. The basic idea is that if we perform an eigendecomposition of \boldsymbol{\Sigma}, which is positive definite, we can easily and stably compute the matrix inverse and log determinant:

\begin{aligned} \boldsymbol{\Sigma} &= \mathbf{Q} \boldsymbol{\Lambda} \mathbf{Q}^{\top} \\ &\Downarrow \\ \boldsymbol{\Sigma}^{-1} &= \mathbf{Q} \boldsymbol{\Lambda}^{-1} \mathbf{Q}^{\top} \\ &\Downarrow \\ \boldsymbol{\Sigma}^{-1} &= \mathbf{Q} \begin{bmatrix} 1/\lambda_1 & & \\ & \ddots & \\ & & 1/\lambda_D \end{bmatrix} \mathbf{Q}^{\top} \\ \log \det(\boldsymbol{\Sigma}) &= \sum_{d=1}^{D} \log \lambda_d. \end{aligned} \tag{3}

Above, we use the fact that the determinant of a D \times D matrix is the product of its eigenvalues. We compute the log PDF and exponentiate it for numerical stability.
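To see Equation 3 in action, here is a small numerical check of my own (not part of the original code) that the eigendecomposition-based inverse and log determinant agree with NumPy's direct routines:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
Sigma = A @ A.T + 4.0 * np.eye(4)  # symmetric positive definite

# Inverse and log determinant via the eigendecomposition (Equation 3).
vals, vecs = np.linalg.eigh(Sigma)
Sigma_inv  = vecs @ np.diag(1.0 / vals) @ vecs.T
logdet_eig = np.log(vals).sum()

# Reference value from NumPy's sign-and-log-determinant routine.
sign, logdet_ref = np.linalg.slogdet(Sigma)
```

Both quantities match to floating-point precision, and because we only ever take logs of the (positive) eigenvalues, nothing overflows even when the determinant itself would.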

A nice feature of this approach is that we can use SciPy’s tested implementation and infrastructure for the tricky bits: inverting \boldsymbol{\Sigma}, computing the log determinant, and computing a vectorized Mahalanobis distance. In particular, the original author of that module, Joris Vankerschaver, wrote a utility class for automatically handling \boldsymbol{\Sigma}.
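If I have the module right, the utility class in question is the underscore-prefixed `_PSD` helper in `scipy.stats._multivariate`. Since it is private API, it could change between releases, but the sketch below shows roughly what it provides: a whitening matrix for Mahalanobis distances and the log (pseudo-)determinant.

```python
import numpy as np
from scipy.stats._multivariate import _PSD  # private API; may change between releases

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
psd = _PSD(Sigma)

# psd.log_pdet is the log (pseudo-)determinant of Sigma.
# psd.U is a whitening matrix: the squared row norms of (dev @ psd.U)
# are the Mahalanobis distances dev^T Sigma^{-1} dev.
dev         = np.array([0.5, -1.0])
maha        = np.square(dev @ psd.U).sum()
maha_direct = dev @ np.linalg.solve(Sigma, dev)
```

This is exactly the pattern the multivariate normal code uses, which is why the t-distribution implementation can lean on it so heavily.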

See my GitHub for a fairly complete implementation of a multivariate t-distributed random variable object, which relies heavily on Vankerschaver’s implementation. For legibility, here is a simpler version of the PDF without any comments, sanity checks, or error handling:

import numpy as np
from   scipy.special import gammaln


def pdf(x, mean, shape, df):
    return np.exp(logpdf(x, mean, shape, df))


def logpdf(x, mean, shape, df):
    dim = mean.size

    vals, vecs = np.linalg.eigh(shape)
    logdet     = np.log(vals).sum()
    valsinv    = 1. / vals
    U          = vecs * np.sqrt(valsinv)
    dev        = x - mean
    maha       = np.square(np.dot(dev, U)).sum(axis=-1)

    t = 0.5 * (df + dim)
    A = gammaln(t)
    B = gammaln(0.5 * df)
    C = dim/2. * np.log(df * np.pi)
    D = 0.5 * logdet
    E = -t * np.log(1 + (1./df) * maha)

    return A - B - C - D + E
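One easy sanity check of Equation 1, which I'll add here: in one dimension it reduces to a location-scale Student's t, which SciPy already implements. The snippet below evaluates the density formula directly for p = 1 and compares against `scipy.stats.t`:

```python
import numpy as np
from scipy.special import gammaln
from scipy.stats import t as student_t

x, mu, sigma2, df = 1.3, 0.4, 2.0, 5.0

# Equation 1 with p = 1 and Sigma = sigma2 (a scalar).
maha = (x - mu)**2 / sigma2
logp = (gammaln(0.5 * (df + 1)) - gammaln(0.5 * df)
        - 0.5 * np.log(df * np.pi) - 0.5 * np.log(sigma2)
        - 0.5 * (df + 1) * np.log1p(maha / df))

# The same density as a location-scale univariate t.
logp_ref = student_t.logpdf(x, df, loc=mu, scale=np.sqrt(sigma2))
```

The two log densities agree, which gives some confidence that the constants in Equation 1 are right.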

I hope to put together a SciPy pull request (PR) for this code in the near future. However, I am not familiar with the workflow and typical duration of SciPy PRs and figured I should just post this now. I am also unsure of how to handle the cumulative distribution function, which has no closed-form solution.
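For what it is worth, the CDF can at least be approximated by Monte Carlo, using the standard construction of a multivariate t draw: x = \mu + z / \sqrt{w / \nu} with z \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma}) and w \sim \chi^2_{\nu}. The sketch below (my own, with hypothetical names, not a SciPy API) estimates P(X_1 \leq x_1, \dots, X_p \leq x_p):

```python
import numpy as np

def mvt_cdf_mc(x, mean, shape, df, n=200_000, seed=0):
    """Monte Carlo estimate of the multivariate t CDF at x."""
    rng = np.random.default_rng(seed)
    p = len(mean)
    # Standard construction: X = mean + Z / sqrt(W / df), where
    # Z ~ N(0, shape) and W ~ chi-squared with df degrees of freedom.
    z = rng.multivariate_normal(np.zeros(p), shape, size=n)
    w = rng.chisquare(df, size=n)
    samples = mean + z / np.sqrt(w / df)[:, None]
    # Fraction of samples that are componentwise <= x.
    return np.all(samples <= x, axis=1).mean()
```

As a quick check, in one dimension the estimate at the mean should be close to 0.5 by symmetry. The error shrinks like 1/\sqrt{n}, so this is fine for sanity checks but slow for high-accuracy tail probabilities.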


Acknowledgements

I thank Allen Downey for proposing an improvement to the above code.