I learned very early the difference between knowing the name of something and knowing something.

Richard Feynman

Probability and statistics

Discrete-Time Martingales

The Kalman Filter

Brownian Motion

Simulating Geometric Brownian Motion

I work through a simple Python implementation of geometric Brownian motion and check it against the theoretical model.
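
A minimal sketch of such a simulation (illustrative parameters, not the post's full code), using NumPy and the exact lognormal update, with a check of the terminal mean and variance against the theoretical values:

import numpy as np

# Illustrative parameters: drift mu, volatility sigma, initial value s0, horizon T.
mu, sigma, s0, T = 0.05, 0.2, 1.0, 1.0
n_steps, n_paths = 250, 100_000
dt = T / n_steps

rng = np.random.default_rng(0)
z = rng.standard_normal((n_paths, n_steps))

# Exact update: S_{t+dt} = S_t * exp((mu - sigma^2 / 2) dt + sigma sqrt(dt) Z).
log_increments = (mu - 0.5 * sigma**2) * dt + sigma * np.sqrt(dt) * z
s_T = s0 * np.exp(log_increments.sum(axis=1))

# Compare simulated moments of S_T with the theoretical ones for GBM.
print(s_T.mean(), s0 * np.exp(mu * T))
print(s_T.var(), s0**2 * np.exp(2 * mu * T) * (np.exp(sigma**2 * T) - 1))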

Bienaymé's Identity

In probability theory, Bienaymé's identity is a formula for the variance of random variables which are themselves sums of random variables. I provide a little intuition for the identity and then prove it.
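
In its general form, the identity reads

\operatorname{Var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + \sum_{i \neq j} \operatorname{Cov}(X_i, X_j),

which reduces to the sum of the individual variances when the variables are pairwise uncorrelated.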

Lognormal Distribution

I derive some basic properties of the lognormal distribution.
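
For reference, if X \sim \mathcal{N}(\mu, \sigma^2) and Y = e^{X}, then Y is lognormal with

\mathbb{E}[Y] = \exp\left(\mu + \tfrac{\sigma^2}{2}\right), \qquad \operatorname{Var}(Y) = \left(e^{\sigma^2} - 1\right)\exp\left(2\mu + \sigma^2\right).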

High-Dimensional Variance

A useful view of a covariance matrix is that it is a natural generalization of variance to higher dimensions. I explore this idea.
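
Concretely, for a random vector \mathbf{x} with mean \boldsymbol{\mu}, the covariance matrix is

\boldsymbol{\Sigma} = \mathbb{E}\left[(\mathbf{x} - \boldsymbol{\mu})(\mathbf{x} - \boldsymbol{\mu})^{\top}\right],

whose diagonal entries are the variances of the individual components, mirroring the scalar definition \operatorname{Var}(X) = \mathbb{E}[(X - \mu)^2].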

Moving Averages

I discuss moving or rolling averages, which are algorithms to compute means over different subsets of sequential data.
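
A sketch of the simplest variant, a simple moving average over a fixed trailing window (the data and window size below are illustrative):

import numpy as np

def simple_moving_average(x, window):
    """Mean over each trailing window of `window` consecutive points."""
    x = np.asarray(x, dtype=float)
    cumsum = np.cumsum(np.insert(x, 0, 0.0))
    return (cumsum[window:] - cumsum[:-window]) / window

print(simple_moving_average([1, 2, 3, 4, 5], window=3))  # [2. 3. 4.]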

The Gauss–Markov Theorem

I discuss and prove the Gauss–Markov theorem, which states that under certain conditions, the least squares estimator is the minimum-variance linear unbiased estimator of the model parameters.
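
In symbols: for the linear model with zero-mean, uncorrelated, homoskedastic errors,

\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad \hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{y},

the least squares estimator \hat{\boldsymbol{\beta}} has the smallest variance among all linear unbiased estimators of \boldsymbol{\beta}.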

Standard Errors and Confidence Intervals

How do we know when a parameter estimate from a random sample is significant? I discuss the use of standard errors and confidence intervals to answer this question.
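
For the sample mean, for example, the standard error and the usual approximate 95% confidence interval are

\operatorname{SE}(\bar{x}) = \frac{s}{\sqrt{n}}, \qquad \bar{x} \pm 1.96 \cdot \operatorname{SE}(\bar{x}),

where s is the sample standard deviation and n the sample size.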

A Python Demonstration that Mutual Information Is Symmetric

I provide a numerical demonstration that the mutual information of two random variables (here, the observations and latent variables in a Gaussian mixture model) is symmetric.
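
The post's demonstration uses a Gaussian mixture model; the sketch below makes the same point with a small, made-up discrete joint distribution, computing the mutual information from the joint table and from its transpose and checking that the two values agree:

import numpy as np

# Made-up joint distribution p(x, y) over a 2 x 3 grid (sums to one).
p_xy = np.array([[0.10, 0.20, 0.15],
                 [0.25, 0.05, 0.25]])

def mutual_information(p_joint):
    """Mutual information (in bits) of the row and column variables of a joint table."""
    p_row = p_joint.sum(axis=1, keepdims=True)
    p_col = p_joint.sum(axis=0, keepdims=True)
    mask = p_joint > 0
    return np.sum(p_joint[mask] * np.log2((p_joint / (p_row * p_col))[mask]))

print(mutual_information(p_xy))    # I(X; Y)
print(mutual_information(p_xy.T))  # I(Y; X), the same value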

Proof that Mutual Information Is Symmetric

The mutual information (MI) of two random variables quantifies how much information (in bits or nats) is obtained about one random variable by observing the other. I discuss MI and show it is symmetric.
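
Symmetry is visible directly in the definition, since the summand treats x and y identically:

I(X; Y) = \sum_{x} \sum_{y} p(x, y) \log \frac{p(x, y)}{p(x)\, p(y)} = I(Y; X).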

Entropy of the Gaussian

I derive the entropy for the univariate and multivariate Gaussian distributions.
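
The results, in nats, are

H(X) = \frac{1}{2} \log\left(2 \pi e \sigma^2\right), \qquad H(\mathbf{X}) = \frac{1}{2} \log\left((2 \pi e)^{D} \lvert \boldsymbol{\Sigma} \rvert\right),

for a univariate Gaussian with variance \sigma^2 and a D-dimensional Gaussian with covariance \boldsymbol{\Sigma}, respectively.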

Understanding Moments

Why are a distribution's moments called "moments"? How does the equation for a moment capture the shape of a distribution? Why do we typically only study four moments? I explore these and other questions in detail.
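
The central object is the n-th central moment,

\mu_n = \mathbb{E}\left[\left(X - \mathbb{E}[X]\right)^{n}\right],

with the mean (the first raw moment) and the variance, skewness, and kurtosis (the second central moment and the standardized third and fourth moments) being the four usually studied.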

Asymptotic Normality of Maximum Likelihood Estimators

Under certain regularity conditions, maximum likelihood estimators are "asymptotically efficient", meaning that they achieve the Cramér–Rao lower bound in the limit. I discuss this result.
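
The headline result is that, under those regularity conditions,

\sqrt{n}\left(\hat{\theta}_{\text{MLE}} - \theta_0\right) \xrightarrow{d} \mathcal{N}\left(0, \; \mathcal{I}(\theta_0)^{-1}\right),

where \mathcal{I}(\theta_0) is the Fisher information of a single observation.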

Proof of the Cramér–Rao Lower Bound

The Cramér–Rao lower bound allows us to derive uniformly minimum-variance unbiased estimators by finding unbiased estimators that achieve this bound. I derive the main result.
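
For an unbiased estimator \hat{\theta} of \theta, the bound states

\operatorname{Var}(\hat{\theta}) \geq \frac{1}{\mathcal{I}(\theta)},

where \mathcal{I}(\theta) is the Fisher information of the sample.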

The Fisher Information

I document several properties of the Fisher information, which is the variance of the derivative of the log likelihood.
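
Two equivalent forms, the second holding under standard regularity conditions, are

\mathcal{I}(\theta) = \mathbb{E}\left[\left(\frac{\partial}{\partial \theta} \log f(X; \theta)\right)^{2}\right] = -\mathbb{E}\left[\frac{\partial^{2}}{\partial \theta^{2}} \log f(X; \theta)\right].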

Proof of the Rao–Blackwell Theorem

I walk the reader through a proof of the Rao–Blackwell Theorem.
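
In brief, the theorem says that conditioning any estimator on a sufficient statistic T can only reduce (never increase) its mean squared error:

\hat{\theta}^{\ast} = \mathbb{E}\left[\hat{\theta} \mid T\right] \quad \Longrightarrow \quad \mathbb{E}\left[\left(\hat{\theta}^{\ast} - \theta\right)^{2}\right] \leq \mathbb{E}\left[\left(\hat{\theta} - \theta\right)^{2}\right].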

Proof of the Law of Total Expectation

I discuss a straightforward proof of the law of total expectation with three standard assumptions.
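
The statement itself is compact:

\mathbb{E}[X] = \mathbb{E}\left[\mathbb{E}[X \mid Y]\right].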

Interpreting Expectations and Medians as Minimizers

I show how several properties of the distribution of a random variable—the expectation, conditional expectation, and median—can be viewed as solutions to optimization problems.
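
For example, the expectation minimizes expected squared error while the median minimizes expected absolute error:

\mathbb{E}[X] = \arg\min_{c} \, \mathbb{E}\left[(X - c)^{2}\right], \qquad \operatorname{median}(X) \in \arg\min_{c} \, \mathbb{E}\left[\lvert X - c \rvert\right].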

The Exponential Family

Probability distributions that are members of the exponential family have mathematically convenient properties for Bayesian inference. I provide the general form, work through several examples, and discuss several important properties.
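
The general form is

p(x \mid \boldsymbol{\theta}) = h(x) \exp\left(\boldsymbol{\eta}(\boldsymbol{\theta})^{\top} \mathbf{T}(x) - A(\boldsymbol{\theta})\right),

where \mathbf{T}(x) is the sufficient statistic, \boldsymbol{\eta} the natural parameter, A the log-partition function, and h the base measure.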

Random Noise and the Central Limit Theorem

Many probabilistic models assume random noise is Gaussian distributed. I explain at least part of the motivation for this, which is grounded in the Central Limit Theorem.
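
The relevant statement: for i.i.d. random variables X_1, \dots, X_n with mean \mu and finite variance \sigma^2,

\frac{\bar{X}_n - \mu}{\sigma / \sqrt{n}} \xrightarrow{d} \mathcal{N}(0, 1),

so an aggregate of many small, independent disturbances looks approximately Gaussian.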

The KL Divergence: From Information to Density Estimation

The KL divergence, also known as "relative entropy", is a commonly used measure for density estimation. I re-derive the relationships between probabilities, entropy, and relative entropy for quantifying similarity between distributions.
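
For discrete distributions P and Q, the divergence is

D_{\text{KL}}(P \,\|\, Q) = \sum_{x} p(x) \log \frac{p(x)}{q(x)},

which is non-negative, equals zero only when P = Q, and is not symmetric in its arguments.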

Proof of Bessel's Correction

Bessel's correction is the use of N - 1 rather than N in the denominator of the sample variance. I walk the reader through a quick proof that this correction results in an unbiased estimator of the population variance.
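
That is, the estimator and its expectation are

s^2 = \frac{1}{N - 1} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^{2}, \qquad \mathbb{E}\left[s^{2}\right] = \sigma^{2}.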