I learned very early the difference between knowing the name of something and knowing something.
Richard Feynman

The ELBO in Variational Inference
16 April 2021
I derive the evidence lower bound (ELBO) in variational inference and explore its relationship to the objective in expectation–maximization and the variational autoencoder.
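For reference, one standard way to write what the post derives: for a latent-variable model p(x, z) and any variational distribution q(z),

```latex
\log p(\mathbf{x})
  = \underbrace{\mathbb{E}_{q(\mathbf{z})}\!\left[ \log \frac{p(\mathbf{x}, \mathbf{z})}{q(\mathbf{z})} \right]}_{\text{ELBO}}
  + \mathrm{KL}\!\left( q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x}) \right).
```

Since the KL term is nonnegative, the first term lower-bounds the log evidence; this is the bound that both EM and the variational autoencoder optimize.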
Understanding Dirichlet–Multinomial Models
24 December 2020
The Dirichlet distribution is really a multivariate beta distribution. I discuss this connection and then derive the posterior, marginal likelihood, and posterior predictive distributions for Dirichlet–multinomial models.
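As a quick numerical illustration of both points (with made-up parameters): a two-component Dirichlet matches the corresponding beta density, and the Dirichlet posterior simply adds observed counts to the concentration parameters.

```python
import numpy as np
from scipy.stats import beta, dirichlet

# A two-component Dirichlet is a beta distribution.
a, b, x = 2.0, 5.0, 0.3
assert np.isclose(dirichlet.pdf([x, 1 - x], [a, b]), beta.pdf(x, a, b))

# Dirichlet-multinomial conjugacy: the posterior concentrations are the
# prior concentrations plus the observed category counts (made-up numbers).
alpha = np.array([1.0, 2.0, 3.0])
counts = np.array([10, 4, 7])
alpha_post = alpha + counts   # posterior is Dirichlet(alpha_post)
```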
Conjugate Analysis for the Multivariate Gaussian
18 November 2020
I work through Bayesian parameter estimation of the mean for the multivariate Gaussian.
From Entropy Search to Predictive Entropy Search
28 October 2020
In Bayesian optimization, a popular acquisition function is predictive entropy search, which is a clever reframing of another acquisition function, entropy search. I rederive the connection and explain why this reframing is useful.
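Schematically, the reframing rests on the symmetry of mutual information between the observation y at a candidate point x and the location x⋆ of the global maximum (my notation, not necessarily the post's):

```latex
\alpha(\mathbf{x})
  = H\!\left[ p(\mathbf{x}_{\star} \mid \mathcal{D}) \right]
    - \mathbb{E}_{p(y \mid \mathcal{D}, \mathbf{x})}\!\left[ H\!\left[ p(\mathbf{x}_{\star} \mid \mathcal{D} \cup \{(\mathbf{x}, y)\}) \right] \right]
  = H\!\left[ p(y \mid \mathcal{D}, \mathbf{x}) \right]
    - \mathbb{E}_{p(\mathbf{x}_{\star} \mid \mathcal{D})}\!\left[ H\!\left[ p(y \mid \mathcal{D}, \mathbf{x}, \mathbf{x}_{\star}) \right] \right].
```

The second expression only involves entropies of one-dimensional predictive distributions, which is what makes the reframing computationally attractive.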
A Unifying Review of EM for Gaussian Latent Factor Models
25 October 2020
The expectation–maximization (EM) updates for several Gaussian latent factor models (factor analysis, probabilistic principal component analysis, probabilistic canonical correlation analysis, and inter-battery factor analysis) are closely related. I explore these relationships in detail.
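Roughly, all of these models share the same linear-Gaussian generative skeleton and differ in the structure of the noise covariance and in how the observations are split into views (a simplified sketch, not the post's full notation):

```latex
\mathbf{z} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}),
\qquad
\mathbf{x} \mid \mathbf{z} \sim \mathcal{N}(\mathbf{W}\mathbf{z} + \boldsymbol{\mu}, \boldsymbol{\Psi}).
```

Factor analysis takes Ψ to be diagonal and probabilistic PCA takes Ψ = σ²I, which is why their EM updates look so similar.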
Implementing Bayesian Online Changepoint Detection
20 October 2020
I annotate my Python implementation of the framework in Adams and MacKay's 2007 paper, "Bayesian Online Changepoint Detection".
Bayesian Inference for Beta–Bernoulli Models
19 August 2020
I derive the posterior, marginal likelihood, and posterior predictive distributions for beta–Bernoulli models.
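A minimal sketch of the resulting updates, assuming a Beta(a, b) prior and made-up observations:

```python
import numpy as np
from scipy.stats import beta

a, b = 2.0, 2.0                      # prior hyperparameters (made up)
x = np.array([1, 0, 1, 1, 0, 1])     # Bernoulli observations (made up)
k, n = x.sum(), len(x)

# Posterior: Beta(a + k, b + n - k).
posterior = beta(a + k, b + n - k)

# Posterior predictive probability that the next observation is 1.
p_next = (a + k) / (a + b + n)
print(posterior.mean(), p_next)      # identical for the Bernoulli case
```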
Gaussian Process Dynamical Models
24 July 2020
Wang and Fleet's 2008 paper, "Gaussian Process Dynamical Models for Human Motion", introduces a Gaussian process latent variable model with Gaussian process latent dynamics. I discuss this paper in detail.
From Probabilistic PCA to the GPLVM
14 July 2020
A Gaussian process latent variable model (GPLVM) can be viewed as a generalization of probabilistic principal component analysis (PCA) in which the latent maps are Gaussian-process distributed. I discuss this relationship.
05 July 2020
The physics of Hamiltonian Monte Carlo, part 3: In the final post in this series, I discuss Hamiltonian Monte Carlo, building off previous discussions of the Euler–Lagrange equation and Hamiltonian dynamics.
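As a companion sketch (not code from the post), here is a minimal leapfrog-based HMC sampler for a standard Gaussian target; the step size and trajectory length are arbitrary choices.

```python
import numpy as np

def hmc(log_p, grad_log_p, x0, n_samples=1000, eps=0.1, n_leapfrog=20, seed=0):
    """Minimal Hamiltonian Monte Carlo with an identity mass matrix."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    samples = []
    for _ in range(n_samples):
        p = rng.normal(size=x.shape)              # resample momentum
        x_new, p_new = x.copy(), p.copy()
        # Leapfrog integration of Hamiltonian dynamics.
        p_new += 0.5 * eps * grad_log_p(x_new)
        for _ in range(n_leapfrog - 1):
            x_new += eps * p_new
            p_new += eps * grad_log_p(x_new)
        x_new += eps * p_new
        p_new += 0.5 * eps * grad_log_p(x_new)
        # Metropolis correction based on the change in total energy.
        h_old = -log_p(x) + 0.5 * p @ p
        h_new = -log_p(x_new) + 0.5 * p_new @ p_new
        if np.log(rng.uniform()) < h_old - h_new:
            x = x_new
        samples.append(x.copy())
    return np.array(samples)

# Standard Gaussian target: log p(x) = -0.5 * x @ x up to a constant.
samples = hmc(lambda x: -0.5 * x @ x, lambda x: -x, x0=np.zeros(2))
```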
Gaussian Processes with Multinomial Observations
03 July 2020
Linderman, Johnson, and Adams's 2015 paper, "Dependent multinomial models made easy: Stick-breaking with the Pólya-gamma augmentation", introduces a Gibbs sampler for Gaussian processes with multinomial observations. I discuss this model in detail.
Following Linderman, Johnson, and Adams's 2015 paper, "Dependent multinomial models made easy: Stick-breaking with the Pólya-gamma augmentation", I show that a multinomial density can be represented as a product of binomial densities.
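A quick numerical check of that identity with made-up counts and probabilities, peeling off one category at a time:

```python
import numpy as np
from scipy.stats import binom, multinomial

p = np.array([0.2, 0.3, 0.1, 0.4])   # category probabilities (made up)
x = np.array([3, 5, 1, 6])           # observed counts (made up)
N = x.sum()

lhs = multinomial.pmf(x, n=N, p=p)

# Product of binomials with stick-breaking probabilities.
rhs, remaining_n, remaining_p = 1.0, N, 1.0
for k in range(len(p) - 1):
    rhs *= binom.pmf(x[k], remaining_n, p[k] / remaining_p)
    remaining_n -= x[k]
    remaining_p -= p[k]

assert np.isclose(lhs, rhs)
```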
Lagrangian and Hamiltonian Mechanics
14 June 2020
The physics of Hamiltonian Monte Carlo, part 2: Building off the Euler–Lagrange equation, I discuss Lagrangian mechanics, the principle of stationary action, and Hamilton's equations.
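For reference, Hamilton's equations for a Hamiltonian H(q, p):

```latex
\dot{q} = \frac{\partial H}{\partial p},
\qquad
\dot{p} = -\frac{\partial H}{\partial q}.
```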
10 May 2020
The physics of Hamiltonian Monte Carlo, part 1: Lagrangian and Hamiltonian mechanics are based on the principle of stationary action, formalized by the calculus of variations and the Euler–Lagrange equation. I discuss this result.
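For reference, the Euler–Lagrange equation for a Lagrangian L(q, q̇, t):

```latex
\frac{d}{dt} \frac{\partial L}{\partial \dot{q}} - \frac{\partial L}{\partial q} = 0.
```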
Gibbs Sampling Is a Special Case of Metropolis–Hastings
23 February 2020
Gibbs sampling is a computationally convenient Bayesian inference algorithm that is a special case of the Metropolis–Hastings algorithm. I discuss Gibbs sampling in the broader context of Markov chain Monte Carlo methods.
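A minimal sketch for a toy target, a standard bivariate Gaussian with correlation ρ, where each full-conditional draw can be read as a Metropolis–Hastings proposal that is always accepted:

```python
import numpy as np

rho = 0.8                            # target correlation (made up)
rng = np.random.default_rng(0)
x1, x2, samples = 0.0, 0.0, []
for _ in range(5000):
    # Full conditionals of a standard bivariate Gaussian with correlation rho.
    x1 = rng.normal(rho * x2, np.sqrt(1 - rho**2))
    x2 = rng.normal(rho * x1, np.sqrt(1 - rho**2))
    samples.append((x1, x2))

print(np.corrcoef(np.array(samples).T))   # empirical correlation should be near rho
```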
04 February 2020
I discuss Bayesian linear regression, or classical linear regression with a prior on the parameters. Using a particular prior as an example, I provide intuition and detailed derivations for the full model.
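A minimal sketch of the posterior computation under one such prior, an isotropic Gaussian with precision α, with Gaussian noise of precision β (made-up data; the notation follows Bishop's, not necessarily the post's):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                       # design matrix (made up)
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + rng.normal(scale=0.3, size=50)

alpha, beta = 1.0, 1.0 / 0.3**2                    # prior and noise precisions

# Posterior over the weights is N(m, S) with
#   S^{-1} = alpha * I + beta * X^T X,   m = beta * S @ X^T y.
S = np.linalg.inv(alpha * np.eye(3) + beta * X.T @ X)
m = beta * S @ X.T @ y
print(m)                                           # should be close to w_true
```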
Comparing Kernel Ridge with Gaussian Process Regression
06 January 2020
The posterior mean from a Gaussian process regressor is related to the prediction of a kernel ridge regressor. I explore this connection in detail.
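A quick numerical check of the connection with made-up data and an RBF kernel: the GP posterior mean with noise variance σ² matches kernel ridge regression whose regularization strength is set to σ².

```python
import numpy as np
from sklearn.kernel_ridge import KernelRidge

def rbf(A, B, ell=1.0):
    sq = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * sq / ell**2)

rng = np.random.default_rng(0)
X, X_star = rng.normal(size=(30, 2)), rng.normal(size=(5, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
noise_var = 0.1

K, K_star = rbf(X, X), rbf(X_star, X)

# GP posterior mean: K(X*, X) (K(X, X) + sigma^2 I)^{-1} y.
gp_mean = K_star @ np.linalg.solve(K + noise_var * np.eye(30), y)

# Kernel ridge with regularization strength equal to the noise variance.
krr = KernelRidge(alpha=noise_var, kernel="precomputed").fit(K, y)
assert np.allclose(gp_mean, krr.predict(K_star))
```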
10 November 2019
For many latent variable models, maximizing the complete log likelihood is easier than maximizing the log likelihood. The expectation–maximization (EM) algorithm leverages this fact to construct and optimize a tight lower bound. I rederive EM.
02 November 2019
Many authors introduce Metropolis–Hastings through its acceptance criterion without explaining why this criterion allows us to sample from our target distribution. I provide a formal justification.
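The criterion in question: a proposal x′ drawn from q(x′ | x) is accepted with probability

```latex
A(x' \mid x) = \min\!\left( 1, \; \frac{\tilde{p}(x') \, q(x \mid x')}{\tilde{p}(x) \, q(x' \mid x)} \right),
```

where p̃ is the possibly unnormalized target density; the justification amounts to showing that this choice makes the chain satisfy detailed balance with respect to the target.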
20 September 2019
Bayesian inference for models with binomial likelihoods is hard, but in a 2013 paper, Nicholas Polson and his coauthors introduced a new method for fast Bayesian inference using Gibbs sampling. I discuss their main results in detail.
A Poisson–Gamma Mixture Is Negative-Binomially Distributed
16 September 2019
We can view the negative binomial distribution as a Poisson distribution with a gamma prior on the rate parameter. I work through this derivation in detail.
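A quick Monte Carlo check of the claim with made-up parameters, averaging the Poisson pmf over rates drawn from a gamma distribution and comparing against the negative binomial pmf:

```python
import numpy as np
from scipy.stats import gamma, nbinom, poisson

r, theta = 3.0, 2.0                       # gamma shape and scale (made up)
rng = np.random.default_rng(0)
rates = gamma.rvs(a=r, scale=theta, size=200_000, random_state=rng)

ks = np.arange(10)
mixture = np.array([poisson.pmf(k, rates).mean() for k in ks])
exact = nbinom.pmf(ks, n=r, p=1.0 / (1.0 + theta))
print(np.max(np.abs(mixture - exact)))    # should be small
```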
A Practical Implementation of Gaussian Process Regression
12 September 2019
I discuss Rasmussen and Williams's Algorithm 2.1 for an efficient implementation of Gaussian process regression.
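A sketch of that computation as I recall it from the book: a single Cholesky factorization gives the predictive mean, predictive variance, and log marginal likelihood (made-up argument names).

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def gp_regression(K, K_star, k_star_star, y, noise_var):
    """GP predictive mean, variance, and log marginal likelihood via a Cholesky factor."""
    n = len(y)
    L = cholesky(K + noise_var * np.eye(n), lower=True)
    alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)
    mean = K_star @ alpha
    v = solve_triangular(L, K_star.T, lower=True)
    var = k_star_star - np.sum(v**2, axis=0)
    log_ml = -0.5 * y @ alpha - np.sum(np.log(np.diag(L))) - 0.5 * n * np.log(2 * np.pi)
    return mean, var, log_ml
```

Solving triangular systems this way avoids explicitly inverting the kernel matrix, which is the main point of the algorithm.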
Sampling: Two Basic Algorithms
01 September 2019
Numerical sampling uses randomized algorithms to sample from and estimate properties of distributions. I explain two basic sampling algorithms, rejection sampling and importance sampling.
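A minimal sketch of the first of the two, rejection sampling: draw from a Beta(2, 5) target using a uniform proposal scaled by a constant M that dominates the target density (all choices here are made up):

```python
import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(0)
target = beta(2, 5)
M = 2.5                                   # must be at least the maximum of the target density on [0, 1]

samples = []
while len(samples) < 10_000:
    x = rng.uniform()                     # proposal: Uniform(0, 1)
    if rng.uniform() < target.pdf(x) / M: # accept with probability p(x) / (M * q(x))
        samples.append(x)

print(np.mean(samples), target.mean())    # empirical mean should be near 2/7
```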
Bayesian Online Changepoint Detection
13 August 2019
Adams and MacKay's 2007 paper, "Bayesian Online Changepoint Detection", introduces a modular Bayesian framework for online estimation of changes in the generative parameters of sequential data. I discuss this paper in detail.
Gaussian Process Regression with Code Snippets
27 June 2019
The definition of a Gaussian process is fairly abstract: it is an infinite collection of random variables, any finite number of which are jointly Gaussian. I work through this definition with an example and provide several complete code snippets.
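The finite-dimensional view translates directly into code: pick any finite set of inputs, build their covariance matrix with a kernel, and draw the function values jointly from a multivariate Gaussian (made-up kernel and inputs, not the post's snippets):

```python
import numpy as np

def rbf_kernel(xs, ell=1.0):
    d = xs[:, None] - xs[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

xs = np.linspace(-5, 5, 100)        # any finite set of inputs
K = rbf_kernel(xs)                  # their joint covariance under the GP prior
rng = np.random.default_rng(0)

# Each row is one draw of the function values at xs, jointly Gaussian by definition.
f = rng.multivariate_normal(np.zeros(len(xs)), K + 1e-8 * np.eye(len(xs)), size=3)
```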
08 May 2019
Laplace's method is used to approximate a distribution with a Gaussian. I explain the technique in general and work through an exercise by David MacKay.
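A small sketch of the idea on a gamma density with a made-up shape parameter: center a Gaussian at the mode and set its precision to the curvature of the log density there.

```python
import numpy as np
from scipy.stats import gamma, norm

a = 10.0                            # gamma shape, unit scale (made up)
mode = a - 1.0                      # argmax of the gamma density
# log p(x) = (a - 1) log x - x + const, so -d^2/dx^2 log p at the mode is (a - 1) / mode^2.
precision = (a - 1.0) / mode**2
approx = norm(loc=mode, scale=1.0 / np.sqrt(precision))

x = np.linspace(5, 15, 5)
print(gamma.pdf(x, a))
print(approx.pdf(x))                # rough agreement near the mode
```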
Bayesian Inference for the Gaussian
04 April 2019
I work through several cases of Bayesian parameter estimation of Gaussian models.
19 March 2019
Probability distributions that are members of the exponential family have mathematically convenient properties for Bayesian inference. I provide the general form, work through several examples, and discuss several important properties.
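For reference, the general form in question, with the Bernoulli as a one-line example (η is the natural parameter, T the sufficient statistic, A the log partition function):

```latex
p(x \mid \eta) = h(x) \exp\!\left( \eta^{\top} T(x) - A(\eta) \right),
\qquad
\mathrm{Bern}(x \mid \mu) = \exp\!\left( x \log \tfrac{\mu}{1 - \mu} + \log(1 - \mu) \right).
```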
Conjugacy in Bayesian Inference
16 March 2019
Conjugacy is an important property in exact Bayesian inference. I work through Bishop's example of a beta conjugate prior for the binomial distribution and explore why conjugacy is useful.
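The example in brief: with m successes in N trials and a Beta(a, b) prior on the success probability θ, the posterior is again a beta distribution, which is exactly what conjugacy means here:

```latex
p(\theta \mid m, N)
\;\propto\;
\underbrace{\theta^{m} (1 - \theta)^{N - m}}_{\text{binomial likelihood}}
\,
\underbrace{\theta^{a - 1} (1 - \theta)^{b - 1}}_{\text{Beta}(a, b)\ \text{prior}}
\;\propto\;
\mathrm{Beta}(\theta \mid a + m,\, b + N - m).
```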