A Python Implementation of the Multivariate t-distribution
I needed a fast and numerically stable Python implementation of the multivariate t-distribution. I wrote one based on SciPy's multivariate distributions module.
Update: My SciPy pull request has been merged into master.
Curiously enough, SciPy does not have an implementation of the multivariate t-distribution. I needed one, but after casting around on the internet, the only thing I found in Python was from this StackOverflow Q&A. I dislike trusting statistical software unless it is widely used or I understand the code, and so I decided to write the function myself. However, once I got underway, I realized that the multivariate t-distribution’s probability density function (PDF),

$$ f(\mathbf{x}) = \frac{\Gamma[(\nu + p)/2]}{\Gamma(\nu/2)\, \nu^{p/2}\, \pi^{p/2}\, |\boldsymbol{\Sigma}|^{1/2}} \left[ 1 + \frac{1}{\nu} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]^{-(\nu + p)/2}, $$

looks a lot like the multivariate normal’s PDF,

$$ f(\mathbf{x}) = (2\pi)^{-p/2}\, |\boldsymbol{\Sigma}|^{-1/2} \exp\!\left[ -\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^{\top} \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) \right]. $$
This isn’t surprising, since the Student’s t-distribution’s relationship to the normal distribution is well-known. And since I have already written about and re-implemented SciPy’s fast and numerically stable multivariate normal PDF, I thought it made sense to reuse some of the logic in that implementation. The basic idea is that if we perform an eigendecomposition of $\boldsymbol{\Sigma}$, which is positive definite, we can easily and stably compute the matrix inverse and log determinant:

$$ \boldsymbol{\Sigma} = U \Lambda U^{\top} \implies \boldsymbol{\Sigma}^{-1} = U \Lambda^{-1} U^{\top}, \qquad \log |\boldsymbol{\Sigma}| = \sum_{i=1}^{p} \log \lambda_i. $$
Above, we use the fact that the determinant of a matrix is the product of its eigenvalues. We compute the log PDF and exponentiate it for numerical stability.
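To make the eigendecomposition trick concrete, here is a small sketch (the matrix values are my own illustrative choices, not from the post) checking the two identities against NumPy’s direct routines:

```python
import numpy as np

# A small positive-definite shape matrix (illustrative values).
shape = np.array([[2.0, 0.3],
                  [0.3, 1.0]])

# Eigendecomposition: shape = vecs @ diag(vals) @ vecs.T
vals, vecs = np.linalg.eigh(shape)

# Log determinant: the sum of the log eigenvalues.
logdet = np.log(vals).sum()
assert np.isclose(logdet, np.linalg.slogdet(shape)[1])

# Inverse: invert the eigenvalues and recombine.
inv = (vecs / vals) @ vecs.T
assert np.allclose(inv, np.linalg.inv(shape))
```

Because `shape` is positive definite, the eigenvalues are strictly positive, so both the logarithm and the reciprocal are safe.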
A nice feature of this approach is that we can use SciPy’s tested implementation and infrastructure for the tricky bits: inverting $\boldsymbol{\Sigma}$, computing the log determinant, and computing a vectorized Mahalanobis distance. In particular, the original author of that module, Joris Vankerschaver, wrote a utility class for automatically handling $\boldsymbol{\Sigma}$.
See my GitHub for a fairly complete implementation of a multivariate t-distributed random variable object, which relies heavily on Vankerschaver’s implementation. For legibility, here is a simpler version of the PDF without any comments, sanity checks, or error handling:
import numpy as np
from scipy.special import gammaln

def pdf(x, mean, shape, df):
    return np.exp(logpdf(x, mean, shape, df))

def logpdf(x, mean, shape, df):
    dim = mean.size

    vals, vecs = np.linalg.eigh(shape)
    logdet = np.log(vals).sum()
    valsinv = np.array([1./v for v in vals])
    U = vecs * np.sqrt(valsinv)
    dev = x - mean
    maha = np.square(np.dot(dev, U)).sum(axis=-1)

    t = 0.5 * (df + dim)
    A = gammaln(t)
    B = gammaln(0.5 * df)
    C = dim/2. * np.log(df * np.pi)
    D = 0.5 * logdet
    E = -t * np.log(1 + (1./df) * maha)

    return A - B - C - D + E
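As a quick sanity check (my addition, not part of the original post): in one dimension, the multivariate t with shape $\sigma^2$ should reduce to SciPy’s univariate Student’s t with `loc=mean` and `scale=sigma`. The function is repeated here so the snippet runs on its own:

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

def logpdf(x, mean, shape, df):
    # Same function as above, repeated so this snippet is self-contained.
    dim = mean.size
    vals, vecs = np.linalg.eigh(shape)
    logdet = np.log(vals).sum()
    valsinv = np.array([1./v for v in vals])
    U = vecs * np.sqrt(valsinv)
    dev = x - mean
    maha = np.square(np.dot(dev, U)).sum(axis=-1)
    t = 0.5 * (df + dim)
    A = gammaln(t)
    B = gammaln(0.5 * df)
    C = dim/2. * np.log(df * np.pi)
    D = 0.5 * logdet
    E = -t * np.log(1 + (1./df) * maha)
    return A - B - C - D + E

# Illustrative 1-D inputs: location 0.2, shape (variance) 2.5, 4 degrees of freedom.
x = np.array([0.7])
mean = np.array([0.2])
shape = np.array([[2.5]])
df = 4.0

ours = logpdf(x, mean, shape, df)
ref = stats.t.logpdf(x[0], df, loc=mean[0], scale=np.sqrt(shape[0, 0]))
assert np.isclose(ours, ref)
```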
I hope to put together a SciPy pull request (PR) for this code in the near future. However, I am not familiar with the typical workflow and duration of SciPy PRs, so I figured I should just post this now. I am also unsure how to handle the cumulative distribution function, which has no closed-form solution.
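For what it’s worth, one pragmatic option (entirely my own sketch, not from the post) is Monte Carlo: a multivariate t variate can be generated as $\boldsymbol{\mu} + \mathbf{Z} / \sqrt{W/\nu}$, where $\mathbf{Z} \sim \mathcal{N}(\mathbf{0}, \boldsymbol{\Sigma})$ and $W \sim \chi^2_\nu$ are independent, so the CDF can at least be estimated by sampling:

```python
import numpy as np

def mvt_cdf_mc(x, mean, shape, df, n=200_000, seed=0):
    """Monte Carlo estimate of P(X_1 <= x_1, ..., X_p <= x_p) for a
    multivariate t variate X, via X = mean + Z / sqrt(W / df) with
    Z ~ N(0, shape) and W ~ chi^2_df independent."""
    rng = np.random.default_rng(seed)
    z = rng.multivariate_normal(np.zeros(mean.size), shape, size=n)
    w = rng.chisquare(df, size=n)
    samples = mean + z / np.sqrt(w / df)[:, None]
    return np.all(samples <= x, axis=1).mean()

# By symmetry, the 1-D CDF evaluated at the location should be near 0.5.
est = mvt_cdf_mc(np.array([0.0]), np.array([0.0]), np.array([[1.0]]), df=5.0)
```

This converges slowly ($O(n^{-1/2})$), so it is a fallback rather than a proper CDF implementation.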
Acknowledgements
I thank Allen Downey for proposing an improvement to the above code.