Implementing Bayesian Online Changepoint Detection
I annotate my Python implementation of the framework in Adams and MacKay's 2007 paper, "Bayesian Online Changepoint Detection".
Since I first wrote about Bayesian online changepoint detection (BOCD), I have received a number of emails asking about implementation details. This is understandable, since the code near the end is relatively brief. The goal of this post is to explain my Python implementation of BOCD. This models the simplest case, Gaussian data with fixed variance and changing mean, but I think it highlights the main points. I assume you are familiar with both BOCD and the notation used in either my post or the original paper (Adams & MacKay, 2007).
Posterior predictive
We want to recursively compute the following joint distribution:

$$
p(r_t, x_{1:t}) = \sum_{r_{t-1}} \underbrace{p(x_t \mid r_{t-1}, x_t^{(r)})}_{\text{predictive}} \; \underbrace{p(r_t \mid r_{t-1})}_{\text{changepoint prior}} \; \underbrace{p(r_{t-1}, x_{1:t-1})}_{\text{message}}.
$$

The code for these recursions is fairly straightforward, and I’ll tackle them in the next section. However, the term labeled “predictive” is tricky because it requires some knowledge of Bayesian inference. The big idea is that the predictive term,

$$
p(x_t \mid r_{t-1}, x_t^{(r)}),
$$

is just the posterior predictive when we hypothesize that the run length is $r_{t-1}$, i.e. when we condition only on the data $x_t^{(r)}$ associated with that run.
As mentioned, let’s work through the simplest case, a Gaussian with unknown mean and known, fixed variance. In other words, the model is

$$
\begin{aligned}
\mu &\sim \mathcal{N}(\mu_0, \sigma_0^2), \\
x_t &\sim \mathcal{N}(\mu, \sigma_x^2),
\end{aligned}
$$

where the prior mean $\mu_0$, the prior variance $\sigma_0^2$, and the observation variance $\sigma_x^2$ are all known hyperparameters. Because the Gaussian prior is conjugate to the Gaussian likelihood with known variance, the posterior and the posterior predictive are both Gaussian; see (Murphy, 2007) for the complete derivation. Writing precisions as $\lambda_0 = 1/\sigma_0^2$ and $\lambda_x = 1/\sigma_x^2$, the posterior predictive after observing $n$ data points is

$$
p(x_t \mid x^{(n)}) = \mathcal{N}\!\left(x_t \,\middle|\, \mu_n, \; \frac{1}{\lambda_n} + \frac{1}{\lambda_x}\right),
$$

with parameters

$$
\lambda_n = \lambda_0 + n \lambda_x,
\qquad
\mu_n = \frac{\mu_0 \lambda_0 + \lambda_x \sum_{i=1}^{n} x_i}{\lambda_n}.
$$

At each time $t$, then, we need one pair of parameters $(\mu_n, \lambda_n)$ for every possible run length hypothesis. To be clear, the subscripts above are the number of data points associated with a given run length hypothesis. For example, $\mu_2$ and $\lambda_2$ are the parameters of the posterior predictive under the hypothesis that the current run consists of the two most recent data points. So at each time step, we maintain a growing array of parameters, one entry per run length hypothesis, and extend it by one with each new observation. We first initialize our precision parameters as
prec_params = np.array([1/var0])
and then recursively update them as
new_prec_params = prec_params + (1/varx)
prec_params = np.append([1/var0], new_prec_params)
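To see what these updates produce, here is a tiny sketch that prints prec_params after each of the first few observations, using hypothetical values var0 = varx = 1 so that each update simply adds one:

import numpy as np

var0, varx = 1.0, 1.0  # hypothetical prior and observation variances
prec_params = np.array([1 / var0])
for t in range(1, 4):
    new_prec_params = prec_params + (1 / varx)
    prec_params = np.append([1 / var0], new_prec_params)
    print(t, prec_params)
# 1 [1. 2.]
# 2 [1. 2. 3.]
# 3 [1. 2. 3. 4.]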
This works by leveraging NumPy’s (Harris et al., 2020) element-wise vector addition. Now let’s look at the sequence of mean parameters over the first few time steps, writing in parentheses the data each parameter depends on:

$$
\begin{aligned}
t=1&: \quad \mu_0,\; \mu_1(x_1) \\
t=2&: \quad \mu_0,\; \mu_1(x_2),\; \mu_2(x_1, x_2) \\
t=3&: \quad \mu_0,\; \mu_1(x_3),\; \mu_2(x_2, x_3),\; \mu_3(x_1, x_2, x_3)
\end{aligned}
$$

One caveat: observe the data indices. For example, at time $t = 3$, the run-length-one parameter $\mu_1$ is a function of the most recent datum $x_3$, not $x_1$. Anyway, the updates for $\mu_n$ above can be rewritten recursively: multiplying through by $\lambda_n$ and comparing with the parameters for $n - 1$ data points gives

$$
\mu_n \lambda_n = \mu_{n-1} \lambda_{n-1} + x \lambda_x,
$$

where $x$ is the newest observation. This means we can write

$$
\mu_n = \frac{\mu_{n-1} \lambda_{n-1} + x \lambda_x}{\lambda_n}.
$$
So we can do a clever element-wise vector multiplication to generate the next set of parameters from the previous. First, we initialize our mean parameters as
mean_params = np.array([mean0])
and then recursively update them as
new_mean_params = (mean_params * prec_params[:-1] + (x / varx)) / new_prec_params
mean_params = np.append([mean0], new_mean_params)
where new_prec_params is defined in the snippet above. Notice the indexing: at the point where the mean update runs, mean_params still has length $t$, while prec_params has just been extended to length $t+1$. Its first $t$ elements, prec_params[:-1], are exactly the precisions paired with the old means, hence the element-wise product mean_params * prec_params[:-1]. We then add the new observation, normalized by its assumed variance (the x / varx term), and divide element-wise by new_prec_params, which also has length $t$.
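To make this bookkeeping concrete, here is a toy check (with made-up values, var0 = varx = 1) confirming that after the precision update, prec_params[:-1] lines up with the old precisions and all the array lengths match:

import numpy as np

var0, varx, mean0 = 1.0, 1.0, 0.0         # hypothetical hyperparameters
prec_params = np.array([1.0, 2.0, 3.0])   # pretend state after three observations
mean_params = np.array([mean0, 0.7, 0.4])
old_prec_params = prec_params.copy()

x = 1.2                                   # new observation
new_prec_params = prec_params + (1 / varx)
prec_params = np.append([1 / var0], new_prec_params)
print(np.allclose(prec_params[:-1], old_prec_params))  # True

new_mean_params = (mean_params * prec_params[:-1] + (x / varx)) / new_prec_params
mean_params = np.append([mean0], new_mean_params)
print(len(mean_params), len(prec_params))               # 4 4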
Finally, given our estimated mean and precision parameters, we can compute the posterior predictive probability of a new data point for each possible run length hypothesis at time $t$:
preds = np.empty(t)
for n in range(t):
    mean_n = mean_params[n]
    std_n = np.sqrt(1. / prec_params[n] + varx)
    preds[n] = norm(mean_n, std_n).pdf(x)
The above code is correct, but we can easily vectorize it:
post_means = mean_params[:t]
post_stds = np.sqrt(1. / prec_params[:t] + varx)
preds = norm(post_means, post_stds).pdf(x)
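For concreteness, here is one way the pieces of this section might be wrapped up in a small model class. This is a sketch under my own naming (the class name and constructor signature are assumptions), but the methods mirror the prediction and parameter-update snippets above and the interface used in the next section:

import numpy as np
from scipy.stats import norm

class GaussianUnknownMean:
    # Gaussian observations with known variance varx and unknown mean.
    # A sketch consolidating the snippets above; see the GitHub code for
    # the complete implementation.

    def __init__(self, mean0, var0, varx):
        self.mean0, self.var0, self.varx = mean0, var0, varx
        self.mean_params = np.array([mean0])
        self.prec_params = np.array([1 / var0])

    def pred_prob(self, t, x):
        # Posterior predictive density of x under each run length hypothesis.
        post_means = self.mean_params[:t]
        post_stds = np.sqrt(1. / self.prec_params[:t] + self.varx)
        return norm(post_means, post_stds).pdf(x)

    def log_pred_prob(self, t, x):
        # Log of the posterior predictive (for the log-space version below).
        post_means = self.mean_params[:t]
        post_stds = np.sqrt(1. / self.prec_params[:t] + self.varx)
        return norm(post_means, post_stds).logpdf(x)

    def update_params(self, t, x):
        # Extend the precisions, then the means, prepending the prior as
        # the new run-length-zero hypothesis.
        new_prec_params = self.prec_params + (1 / self.varx)
        self.prec_params = np.append([1 / self.var0], new_prec_params)
        new_mean_params = (self.mean_params * self.prec_params[:-1]
                           + (x / self.varx)) / new_prec_params
        self.mean_params = np.append([self.mean0], new_mean_params)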
Filtering recursions
Recursive estimation of the run length posterior is, at least in my mind, much easier to grok. We first initialize a $(T+1) \times (T+1)$ matrix R, whose row $t$ will hold the run length distribution after observing $t$ data points, along with an initial message:
R = np.zeros((T + 1, T + 1))
R[0, 0] = 1
message = np.array([1])
Then, for each observation, we need to compute the predictive probabilities and multiply them by our changepoint prior and the message. In this case, I assume a fixed changepoint prior, denoted hazard:
preds = model.pred_prob(t, x)
growth_probs = preds * message * (1 - hazard)
cp_prob = sum(preds * message * hazard)
new_joint = np.append(cp_prob, growth_probs)
Next, we normalize our new joint distribution by the evidence and store it in R:
R[t, :t+1] = new_joint
evidence = sum(new_joint)
R[t, :t+1] /= evidence
Finally, we update our sufficient statistics (or parameters) and pass the message:
model.update_params(t, x)
message = new_joint
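Assembled into a single loop, using the hypothetical GaussianUnknownMean sketch from the previous section and some synthetic data, the whole recursion might look like this:

import numpy as np

# Synthetic data with a mean shift halfway through (purely illustrative).
rng = np.random.default_rng(0)
data = np.append(rng.normal(0, 1, 50), rng.normal(5, 1, 50))

T = len(data)
hazard = 1 / 100                      # assumed constant changepoint probability
model = GaussianUnknownMean(mean0=0, var0=1, varx=1)

R = np.zeros((T + 1, T + 1))          # run length posterior, one row per time step
R[0, 0] = 1
message = np.array([1])

for t in range(1, T + 1):
    x = data[t - 1]
    preds = model.pred_prob(t, x)                   # predictive probabilities
    growth_probs = preds * message * (1 - hazard)   # run grows by one
    cp_prob = sum(preds * message * hazard)         # changepoint: run resets to zero
    new_joint = np.append(cp_prob, growth_probs)
    R[t, :t + 1] = new_joint / sum(new_joint)       # normalize by the evidence
    model.update_params(t, x)                       # update sufficient statistics
    message = new_joint                             # pass the (unnormalized) message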
Numerical stability
At this point, we have a working version of BOCD. However, the code is numerically unstable for big datasets because we are constantly multiplying small numbers, resulting in numerical underflow. The standard fix is to work in log space. This is straightforward: we just need to use norm.logpdf instead of norm.pdf when computing predictive probabilities, and we need to use the log-sum-exp trick when normalizing. Now our code looks like this:
# 3. Evaluate predictive probabilities.
log_pis = model.log_pred_prob(t, x)
# 4. Calculate growth probabilities.
log_growth_probs = log_pis + log_message + log_1mH
# 5. Calculate changepoint probabilities.
log_cp_prob = logsumexp(log_pis + log_message + log_H)
# 6. Calculate evidence
new_log_joint = np.append(log_cp_prob, log_growth_probs)
# 7. Determine run length distribution.
log_R[t, :t+1] = new_log_joint
log_R[t, :t+1] -= logsumexp(new_log_joint)
# 8. Update sufficient statistics.
model.update_params(t, x)
# Pass message.
log_message = new_log_joint
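For reference, the extra names in this log-space snippet would be defined along these lines; this is an assumption on my part, and the GitHub code is the authoritative version:

import numpy as np
from scipy.special import logsumexp   # SciPy's numerically stable log-sum-exp

hazard = 1 / 100                      # assumed constant hazard rate
log_H = np.log(hazard)
log_1mH = np.log(1 - hazard)

T = len(data)                         # data as in the earlier sketch
log_R = -np.inf * np.ones((T + 1, T + 1))
log_R[0, 0] = 0                       # log(1)
log_message = np.array([0.0])         # log(1)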
See the code on GitHub for a complete working example with all variables defined.
Conclusion
As we can see, BOCD is not a terribly complicated algorithm conceptually, but actually implementing it requires both knowledge of Bayesian inference and some tedious derivations. In my opinion, the sequential updates are the trickiest part, and I wouldn’t be surprised if there is an off-by-one error in my own logic or code; please email if you find a mistake. Hopefully going through these updates in painful detail underscores what’s happening in the algorithm.
- Adams, R. P., & MacKay, D. J. C. (2007). Bayesian online changepoint detection. ArXiv Preprint ArXiv:0710.3742.
- Murphy, K. P. (2007). Conjugate Bayesian analysis of the Gaussian distribution. Def, 1(2σ2), 16.
- Harris, C. R., Millman, K. J., van der Walt, S. J., Gommers, R., Virtanen, P., Cournapeau, D., Wieser, E., Taylor, J., Berg, S., Smith, N. J., & others. (2020). Array programming with NumPy. Nature, 585(7825), 357–362.