I learned very early the difference between knowing the name of something and knowing something.

Richard Feynman

Machine learning

Scaling Factors for Hidden Markov Models

Inference for hidden Markov models (HMMs) is numerically unstable. A standard approach to resolving this instability is to use scaling factors. I discuss this idea in detail.
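
As a minimal sketch of the idea (assuming a discrete-state HMM with initial distribution pi, transition matrix A, and precomputed emission likelihoods B; this is illustrative, not the post's exact code), each forward message is normalized at every step, and the saved normalizers recover the log-likelihood without underflow:

    import numpy as np

    def forward_scaled(pi, A, B):
        """Forward algorithm with per-step scaling factors.

        pi : (K,) initial state distribution
        A  : (K, K) transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
        B  : (K, T) emission likelihoods, B[k, t] = p(x_t | z_t = k)
        """
        K, T = B.shape
        alpha = np.zeros((K, T))
        c = np.zeros(T)                      # scaling factors
        alpha[:, 0] = pi * B[:, 0]
        c[0] = alpha[:, 0].sum()
        alpha[:, 0] /= c[0]                  # keep the message normalized
        for t in range(1, T):
            alpha[:, t] = B[:, t] * (A.T @ alpha[:, t - 1])
            c[t] = alpha[:, t].sum()
            alpha[:, t] /= c[t]
        log_likelihood = np.log(c).sum()     # log p(x_{1:T}) without underflow
        return alpha, c, log_likelihood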

Conjugate Gradient Descent

Conjugate gradient descent (CGD) is an iterative algorithm for minimizing quadratic functions. CGD uses a kind of orthogonality (conjugacy) to efficiently search for the minimum. I present CGD by building it up from gradient descent.
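
A minimal sketch of the algorithm for a symmetric positive definite matrix A (my own toy implementation, not the post's):

    import numpy as np

    def conjugate_gradient(A, b, x0, tol=1e-10):
        """Minimize f(x) = 0.5 * x^T A x - b^T x (equivalently, solve Ax = b)
        for symmetric positive definite A using conjugate search directions."""
        x = x0.copy()
        r = b - A @ x                        # residual = negative gradient
        d = r.copy()                         # first direction is steepest descent
        for _ in range(len(b)):
            if np.linalg.norm(r) < tol:
                break
            alpha = (r @ r) / (d @ A @ d)    # exact line search along d
            x = x + alpha * d
            r_new = r - alpha * (A @ d)
            beta = (r_new @ r_new) / (r @ r)
            d = r_new + beta * d             # next direction is A-conjugate to the previous ones
            r = r_new
        return x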

The ELBO in Variational Inference

I derive the evidence lower bound (ELBO) in variational inference and explore its relationship to the objective in expectation–maximization and the variational autoencoder.
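
The key identity, in rough notation (q is any distribution over the latent variable z):

    \log p(\mathbf{x})
      = \underbrace{\mathbb{E}_{q(\mathbf{z})}\big[\log p(\mathbf{x}, \mathbf{z}) - \log q(\mathbf{z})\big]}_{\text{ELBO}}
      + \mathrm{KL}\big(q(\mathbf{z}) \,\|\, p(\mathbf{z} \mid \mathbf{x})\big)
      \;\geq\; \text{ELBO},

so maximizing the ELBO over q simultaneously tightens the bound and pushes q toward the true posterior.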

Inference for Hidden Markov Models

Expectation–maximization for hidden Markov models is called the Baum–Welch algorithm, and it relies on the forward–backward algorithm for efficient computation. I review HMMs and then present these algorithms in detail.

The Unscented Transform

The unscented transform, most commonly associated with the nonlinear Kalman filter, was proposed by Jeffrey Uhlmann to estimate a nonlinear transformation of a Gaussian. I illustrate the main idea.

From Entropy Search to Predictive Entropy Search

In Bayesian optimization, a popular acquisition function is predictive entropy search, which is a clever reframing of another acquisition function, entropy search. I rederive the connection and explain why this reframing is useful.

A Unifying Review of EM for Gaussian Latent Factor Models

The expectation–maximization (EM) updates for several Gaussian latent factor models (factor analysis, probabilistic principal component analysis, probabilistic canonical correlation analysis, and inter-battery factor analysis) are closely related. I explore these relationships in detail.

Implementing Bayesian Online Changepoint Detection

I annotate my Python implementation of the framework in Adams and MacKay's 2007 paper, "Bayesian Online Changepoint Detection".

Gaussian Process Dynamical Models

Wang, Fleet, and Hertzmann's 2008 paper, "Gaussian Process Dynamical Models for Human Motion", introduces a Gaussian process latent variable model with Gaussian process latent dynamics. I discuss this paper in detail.

From Probabilistic PCA to the GPLVM

A Gaussian process latent variable model (GPLVM) can be viewed as a generalization of probabilistic principal component analysis (PCA) in which the latent maps are Gaussian-process distributed. I discuss this relationship.

Gaussian Processes with Multinomial Observations

Linderman, Johnson, and Adams's 2015 paper, "Dependent multinomial models made easy: Stick-breaking with the Pólya-gamma augmentation", introduces a Gibbs sampler for Gaussian processes with multinomial observations. I discuss this model in detail.

A Stick-Breaking Representation of the Multinomial Distribution

Following Linderman, Johnson, and Adams's 2015 paper, "Dependent multinomial models made easy: Stick-breaking with the Pólya-gamma augmentation", I show that a multinomial density can be represented as a product of binomial densities.
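
The result, in rough notation (my own summary of the representation, with the tilde denoting the renormalized "remaining stick"):

    \mathrm{Mult}(\mathbf{x} \mid N, \mathbf{p})
      = \prod_{k=1}^{K-1} \mathrm{Binom}\big(x_k \mid N_k, \tilde{p}_k\big),
    \qquad
    N_k = N - \sum_{j < k} x_j,
    \qquad
    \tilde{p}_k = \frac{p_k}{1 - \sum_{j < k} p_j}.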

Comparing Kernel Ridge with Gaussian Process Regression

The posterior mean from a Gaussian process regressor is related to the prediction of a kernel ridge regressor. I explore this connection in detail.
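
At a glance, with kernel matrix K = k(X, X) and test vector k_* = k(X, x_*), the two predictions are (a hedged summary, ignoring mean functions):

    \text{kernel ridge:} \quad \hat{f}(x_*) = \mathbf{k}_*^\top (K + \lambda I)^{-1} \mathbf{y},
    \qquad
    \text{GP posterior mean:} \quad \mu(x_*) = \mathbf{k}_*^\top (K + \sigma^2 I)^{-1} \mathbf{y},

which coincide when the ridge penalty λ equals the GP's noise variance σ².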

Random Fourier Features

Rahimi and Recht's 2007 paper, "Random Features for Large-Scale Kernel Machines", introduces a framework for randomized, low-dimensional approximations of kernel functions. I discuss this paper in detail with a focus on random Fourier features.
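
A minimal sketch of the feature map for the RBF kernel k(x, y) = exp(-γ‖x − y‖²) (illustrative only; parameter names are mine):

    import numpy as np

    def rff_features(X, num_features, gamma=1.0):
        """Random Fourier features z(x) with z(x)^T z(y) ~= exp(-gamma * ||x - y||^2)."""
        n, d = X.shape
        W = np.sqrt(2 * gamma) * np.random.standard_normal((d, num_features))  # spectral samples
        b = np.random.uniform(0, 2 * np.pi, num_features)                      # random phases
        return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

The inner product of the features approximates the kernel, and the approximation improves as num_features grows.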

Implicit Lifting and the Kernel Trick

I disentangle what I call the "lifting trick" from the kernel trick as a way of clarifying what the kernel trick is and does.

Approximate Counting with Morris's Algorithm

Robert Morris's algorithm for counting large numbers using 8-bit registers is an early example of a sketch or data structure for efficiently processing a data stream. I introduce the algorithm and analyze its probabilistic behavior.
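
A minimal sketch of the counter (my own toy version, using an unbounded Python int rather than an 8-bit register):

    import random

    def morris_count(stream_length):
        """Approximate counting: store only an exponent c, not the count itself."""
        c = 0
        for _ in range(stream_length):
            if random.random() < 2.0 ** (-c):   # increment with probability 2^(-c)
                c += 1
        return 2 ** c - 1                       # unbiased estimator of the true count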

Pólya-Gamma Augmentation

Bayesian inference for models with binomial likelihoods is hard, but in a 2013 paper, Nicholas Polson and his coauthors introduced a new method for fast Bayesian inference using Gibbs sampling. I discuss their main results in detail.
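
The central identity, roughly stated (ω is Pólya-gamma distributed, ω ~ PG(b, 0), and κ = a − b/2; see the paper for the precise conditions):

    \frac{(e^{\psi})^{a}}{(1 + e^{\psi})^{b}}
      = 2^{-b} e^{\kappa \psi} \int_0^\infty e^{-\omega \psi^2 / 2} \, p(\omega) \, d\omega,

which makes the conditional for ψ Gaussian given ω, while the conditional for ω is again Pólya-gamma, enabling Gibbs sampling.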

A Practical Implementation of Gaussian Process Regression

I discuss Rasmussen and Williams's Algorithm 2.1 for an efficient implementation of Gaussian process regression.
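
A sketch along the lines of that algorithm (a Cholesky factorization replaces an explicit matrix inverse; my paraphrase, not a verbatim transcription):

    import numpy as np
    from scipy.linalg import cholesky, solve_triangular

    def gp_regression(K, y, k_star, k_star_star, noise_var):
        """Posterior mean and variance at one test point via a Cholesky factorization."""
        n = len(y)
        L = cholesky(K + noise_var * np.eye(n), lower=True)
        alpha = solve_triangular(L.T, solve_triangular(L, y, lower=True), lower=False)
        mean = k_star @ alpha                         # posterior mean
        v = solve_triangular(L, k_star, lower=True)
        var = k_star_star - v @ v                     # posterior variance
        log_marginal = -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * n * np.log(2 * np.pi)
        return mean, var, log_marginal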

Bayesian Online Changepoint Detection

Adams and MacKay's 2007 paper, "Bayesian Online Changepoint Detection", introduces a modular Bayesian framework for online estimation of changes in the generative parameters of sequential data. I discuss this paper in detail.

Gaussian Process Regression with Code Snippets

The definition of a Gaussian process is fairly abstract: it is an infinite collection of random variables, any finite number of which are jointly Gaussian. I work through this definition with an example and provide several complete code snippets.
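
For instance, a bare-bones version of the first step, drawing functions from a GP prior with an RBF kernel over a finite grid (illustrative, not one of the post's snippets):

    import numpy as np

    def rbf_kernel(X1, X2, length_scale=1.0):
        """Squared-exponential kernel matrix for 1-D inputs."""
        sq_dists = (X1[:, None] - X2[None, :]) ** 2
        return np.exp(-0.5 * sq_dists / length_scale ** 2)

    # Any finite set of inputs induces a jointly Gaussian vector of function values.
    X = np.linspace(-5, 5, 100)
    K = rbf_kernel(X, X) + 1e-8 * np.eye(len(X))   # jitter for numerical stability
    samples = np.random.multivariate_normal(np.zeros(len(X)), K, size=3)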

Modeling Repulsion with Determinantal Point Processes

Determinantal point processes are point processes characterized by the determinant of a positive semi-definite matrix, but what this means is not necessarily obvious. I explain how such a process can model repulsive systems.
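
The two-item case makes the repulsion concrete. With a marginal kernel K (my notation), the probability that items i and j co-occur is

    P(\{i, j\} \subseteq \mathbf{Y})
      = \det \begin{bmatrix} K_{ii} & K_{ij} \\ K_{ji} & K_{jj} \end{bmatrix}
      = K_{ii} K_{jj} - K_{ij}^2,

so the more similar two items are (the larger K_ij), the less likely they appear together.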

An Example of Probabilistic Machine Learning

Probabilistic machine learning is a useful framework for handling uncertainty and modeling generative processes. I explore this approach by comparing two models, one with and one without a clear probabilistic interpretation.

The Reparameterization Trick

A common explanation for the reparameterization trick with variational autoencoders is that we cannot backpropagate through a stochastic node. I provide a more formal justification.
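
A minimal sketch of the trick itself (NumPy stand-in for an autodiff framework; the names are mine):

    import numpy as np

    def reparameterize(mu, log_var):
        """Sample z ~ N(mu, sigma^2) as a deterministic, differentiable function
        of (mu, log_var); all randomness lives in the parameter-free noise eps."""
        eps = np.random.standard_normal(mu.shape)
        return mu + np.exp(0.5 * log_var) * eps    # gradients can flow through mu and log_var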

Why Backprop Goes Backward

Backpropagation is an algorithm that computes the gradient of a neural network's loss with respect to its parameters, but it may not be obvious why the algorithm uses a backward pass. The answer allows us to reconstruct backprop from first principles.

From Convolution to Neural Network

Most explanations of CNNs assume the reader understands the convolution operation and how it relates to image processing. I explore convolutions in detail and explain how they are implemented as layers in a neural network.
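
A bare-bones version of the core operation (a "valid" 2-D convolution, implemented as cross-correlation as in most CNN layers; illustrative only):

    import numpy as np

    def conv2d(image, kernel):
        """Slide the kernel over the image and take a dot product at each position."""
        H, W = image.shape
        kH, kW = kernel.shape
        out = np.zeros((H - kH + 1, W - kW + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kH, j:j + kW] * kernel)
        return out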