I learned very early the difference between knowing the name of something and knowing something.

Richard Feynman

Linear models

Principal Component Analysis

Principal component analysis (PCA) is a simple, fast, and elegant linear method for data analysis. I explore PCA in detail, first with pictures and intuition, then with linear algebra and detailed derivations, and finally with code.
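As a minimal sketch of the idea (not the post's implementation), PCA can be computed by eigendecomposing the sample covariance matrix of centered data; the function name `pca` and the toy data below are illustrative assumptions.

```python
import numpy as np

def pca(X, k):
    """Project the rows of X onto the top-k principal components.

    Minimal illustrative sketch: center the data, eigendecompose the
    sample covariance matrix, and keep the eigenvectors with the
    largest eigenvalues.
    """
    Xc = X - X.mean(axis=0)                 # center each feature
    cov = np.cov(Xc, rowvar=False)          # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric eigendecomposition
    order = np.argsort(eigvals)[::-1][:k]   # indices of the top-k eigenvalues
    components = eigvecs[:, order]          # principal directions
    return Xc @ components                  # projected data (scores)

# Toy usage on random data.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
Z = pca(X, k=2)
print(Z.shape)  # (100, 2)
```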

Weighted Least Squares

Weighted least squares (WLS) is a generalization of ordinary least squares in which each observation is assigned a weight, which scales the squared residual error. I discuss WLS and then derive its estimator in detail.
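For reference, with a diagonal weight matrix $W$ the WLS estimator (a standard result, stated here in assumed notation rather than quoted from the post) minimizes the weighted residual sum of squares and has the closed form

$$\hat{\beta}_{\text{WLS}} = \arg\min_{\beta} \sum_{i=1}^{n} w_i \left(y_i - x_i^\top \beta\right)^2 = (X^\top W X)^{-1} X^\top W y.$$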

Generalized Least Squares

I discuss generalized least squares (GLS), which extends ordinary least squares by allowing the errors to be heteroscedastic or correlated. I prove some basic properties of GLS, particularly that it is the best linear unbiased estimator, and work through a complete example.
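As a standard reference point (assuming an error covariance matrix $\Omega$, notation not taken from the post), the GLS estimator is

$$\hat{\beta}_{\text{GLS}} = (X^\top \Omega^{-1} X)^{-1} X^\top \Omega^{-1} y.$$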

The Gauss–Markov Theorem

I discuss and prove the Gauss–Markov theorem, which states that under certain conditions, the least squares estimator is the minimum-variance linear unbiased estimator of the model parameters.
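Concretely, "minimum-variance linear unbiased" means that for any other linear unbiased estimator $\tilde{\beta}$, the difference of covariance matrices is positive semidefinite (one standard way to state the conclusion, in assumed notation):

$$\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta}_{\text{OLS}}) \succeq 0.$$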

Breusch–Pagan Test for Heteroscedasticity

I discuss the Breusch–Pagan test, a simple hypothesis test for heteroscedasticity in linear models. I also implement the test in Python and demonstrate that it can detect heteroscedasticity in a toy example.
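A minimal sketch of one common form of the test (not necessarily the post's exact implementation): regress the squared OLS residuals on the predictors and use the Lagrange-multiplier statistic $n R^2$, which is asymptotically $\chi^2$ with degrees of freedom equal to the number of predictors. The function name `breusch_pagan` below is an illustrative assumption.

```python
import numpy as np
from scipy import stats

def breusch_pagan(X, y):
    """Lagrange-multiplier form of the Breusch-Pagan test (sketch).

    X is an (n, p) design matrix whose first column is the intercept.
    Returns the LM statistic and its asymptotic chi-squared p-value.
    """
    # OLS fit and residuals.
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta

    # Auxiliary regression of squared residuals on X.
    e2 = resid ** 2
    gamma, *_ = np.linalg.lstsq(X, e2, rcond=None)
    fitted = X @ gamma
    ss_res = np.sum((e2 - fitted) ** 2)
    ss_tot = np.sum((e2 - e2.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot

    lm = len(y) * r2            # LM statistic: n * R^2
    df = X.shape[1] - 1         # number of predictors, excluding intercept
    p_value = stats.chi2.sf(lm, df)
    return lm, p_value
```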

OLS with Heteroscedasticity

The ordinary least squares estimator is inefficient when the homoscedasticity assumption does not hold. I provide a simple example of a nonsensical t-statistic from data with heteroscedasticity and discuss why this happens in general.

Consistency of the OLS Estimator

A consistent estimator converges in probability to the true value. I discuss this idea in general and then prove that the ordinary least squares estimator is consistent.
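For reference, consistency means convergence in probability: for every $\varepsilon > 0$,

$$\lim_{n \to \infty} \Pr\!\left(\lVert \hat{\beta}_n - \beta \rVert > \varepsilon\right) = 0, \qquad \text{i.e.} \qquad \hat{\beta}_n \overset{p}{\to} \beta.$$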

Autoregressive Model

Autoregressive (AR) models represent random processes in which each observation is a linear function of some of its previous values, plus noise. I present the main ideas behind AR models, including when they are stationary and how to fit them with the Yule–Walker equations.
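As a pointer to the key identity (standard notation, not quoted from the post), the Yule–Walker equations relate the AR($p$) coefficients $\phi_1, \dots, \phi_p$ to the autocovariances $\gamma(k)$,

$$\gamma(k) = \sum_{i=1}^{p} \phi_i \, \gamma(k - i), \qquad k = 1, \dots, p,$$

which can be solved as a linear system for the $\phi_i$.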

Hypothesis Testing for OLS

When can we be confident in coefficients estimated with OLS? We typically use a t-statistic to quantify whether an inferred coefficient could plausibly have arisen by chance. I discuss hypothesis testing and t-statistics for OLS.
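The usual statistic for testing $H_0: \beta_j = 0$ (standard form, notation assumed here) is

$$t_j = \frac{\hat{\beta}_j}{\widehat{\operatorname{se}}(\hat{\beta}_j)},$$

which under the null hypothesis follows a $t$ distribution with $n - p$ degrees of freedom.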

Residual Sum of Squares in Terms of Pearson's Correlation

I re-derive a relationship between the residual sum of squares in simple linear regression and Pearson's correlation coefficient.
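The relationship in question (stated here in assumed notation) is that in simple linear regression the residual sum of squares equals the total sum of squares scaled by one minus the squared correlation:

$$\text{RSS} = (1 - r^2) \sum_{i=1}^{n} (y_i - \bar{y})^2.$$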

Sampling Distribution of the OLS Estimator

I derive the mean and variance of the OLS estimator, as well as an unbiased estimator of the OLS estimator's variance. I then show that the OLS estimator is normally distributed if we assume the error terms are normally distributed.
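For reference, under the standard assumptions the key results are (notation assumed)

$$\mathbb{E}[\hat{\beta}] = \beta, \qquad \operatorname{Var}(\hat{\beta}) = \sigma^2 (X^\top X)^{-1}, \qquad \hat{\sigma}^2 = \frac{\text{RSS}}{n - p},$$

where $\hat{\sigma}^2$ is an unbiased estimator of the error variance, and with normally distributed errors $\hat{\beta} \sim \mathcal{N}\!\left(\beta, \sigma^2 (X^\top X)^{-1}\right)$.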

Simple Linear Regression and Correlation

In simple linear regression, the slope parameter is a simple function of the correlation between the targets and predictors. I derive this result and discuss a few consequences.
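The result being derived is the standard identity (notation assumed)

$$\hat{\beta}_1 = r_{xy} \, \frac{s_y}{s_x},$$

where $r_{xy}$ is the sample correlation and $s_x, s_y$ are the sample standard deviations of the predictor and target.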

Coefficient of Determination

In ordinary least squares, the coefficient of determination quantifies the proportion of variation in the dependent variable that is explained by the model. However, this interpretation relies on a few assumptions which are worth understanding. I explore this metric and the assumptions in detail.
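For reference, the metric is defined as (standard definition)

$$R^2 = 1 - \frac{\text{RSS}}{\text{TSS}} = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}.$$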

Multicollinearity

Multicollinearity occurs when two or more predictors are linearly dependent or nearly so. This can undermine the interpretability of a linear model's estimated coefficients. I discuss this phenomenon in detail.
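One standard way to quantify the effect (not necessarily the post's framing) is through the variance of each estimated coefficient,

$$\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{(1 - R_j^2) \sum_i (x_{ij} - \bar{x}_j)^2},$$

where $R_j^2$ is the $R^2$ from regressing predictor $j$ on the other predictors; the factor $1 / (1 - R_j^2)$ is the variance inflation factor, which blows up as the predictors become collinear.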

Bayesian Linear Regression

I discuss Bayesian linear regression, or classical linear regression with a prior on the parameters. Using a particular prior as an example, I provide intuition and detailed derivations for the full model.
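As one concrete illustration (assuming an isotropic Gaussian prior $\beta \sim \mathcal{N}(0, \tau^2 I)$ and Gaussian noise with variance $\sigma^2$, which may differ from the post's choice), the posterior over $\beta$ is Gaussian with

$$\Sigma_{\text{post}} = \left(\sigma^{-2} X^\top X + \tau^{-2} I\right)^{-1}, \qquad \mu_{\text{post}} = \sigma^{-2} \, \Sigma_{\text{post}} X^\top y,$$

whose mean recovers ridge regression with penalty $\lambda = \sigma^2 / \tau^2$.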

Can Linear Models Overfit?

We know that regularization is important for linear models, but what does overfitting mean in this context? I discuss this question.

Ordinary Least Squares

I discuss ordinary least squares, or linear regression in which the optimal coefficients minimize the residual sum of squares. I also discuss various properties and interpretations of this classic model.
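For reference, the estimator solves the normal equations (standard notation):

$$\hat{\beta} = \arg\min_{\beta} \lVert y - X\beta \rVert_2^2 = (X^\top X)^{-1} X^\top y,$$

assuming $X^\top X$ is invertible.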