Summing Quadratic Forms

The sum of two expressions that are quadratic in $\mathbf{x}$ is a single quadratic form in $\mathbf{x}$. I work through this derivation in detail.

I originally used this mathematical trick in an appendix of another post. However, I’ve used this derivation so many times that I am re-posting it as a standalone post with better notation so that I can reference it as needed. I’m sure this derivation exists elsewhere, but I’ve never seen it, and knowledge of the result is assumed in many papers.

The sum of two quadratic forms in $\mathbf{x}$ can be written as a single quadratic form plus a constant term that is independent of $\mathbf{x}$. This is useful in many derivations in Bayesian inference because you often want to combine two Gaussian kernels. Thus, we are going to write out

$$
(\mathbf{x} - \boldsymbol{\mu}_A)^{\top} \boldsymbol{\Sigma}_A^{-1} (\mathbf{x} - \boldsymbol{\mu}_A) + (\mathbf{x} - \boldsymbol{\mu}_B)^{\top} \boldsymbol{\Sigma}_B^{-1} (\mathbf{x} - \boldsymbol{\mu}_B), \tag{1}
$$

as quadratic in $\mathbf{x}$ while dropping any terms that do not depend on $\mathbf{x}$. First, expand each quadratic term out, using the fact that $\boldsymbol{\Sigma}^{-1}$ is symmetric, so $\mathbf{x}^{\top} \boldsymbol{\Sigma}^{-1} \boldsymbol{\mu} = \boldsymbol{\mu}^{\top} \boldsymbol{\Sigma}^{-1} \mathbf{x}$:

$$
\begin{aligned}
(\mathbf{x} - \boldsymbol{\mu}_A)^{\top} \boldsymbol{\Sigma}_A^{-1} (\mathbf{x} - \boldsymbol{\mu}_A) &= \mathbf{x}^{\top} \boldsymbol{\Sigma}_A^{-1} \mathbf{x} - 2 \boldsymbol{\mu}_A^{\top} \boldsymbol{\Sigma}_A^{-1} \mathbf{x} + \boldsymbol{\mu}_A^{\top} \boldsymbol{\Sigma}_A^{-1} \boldsymbol{\mu}_A, \\
(\mathbf{x} - \boldsymbol{\mu}_B)^{\top} \boldsymbol{\Sigma}_B^{-1} (\mathbf{x} - \boldsymbol{\mu}_B) &= \mathbf{x}^{\top} \boldsymbol{\Sigma}_B^{-1} \mathbf{x} - 2 \boldsymbol{\mu}_B^{\top} \boldsymbol{\Sigma}_B^{-1} \mathbf{x} + \boldsymbol{\mu}_B^{\top} \boldsymbol{\Sigma}_B^{-1} \boldsymbol{\mu}_B.
\end{aligned} \tag{2}
$$
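
As a quick numerical sanity check of the first line of Eq. 2, here is a minimal NumPy sketch (the variable names are my own, chosen for this illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

# Random symmetric positive definite matrix standing in for Sigma_A.
A = rng.normal(size=(d, d))
Sigma_A = A @ A.T + d * np.eye(d)
mu_A = rng.normal(size=d)
x = rng.normal(size=d)

iA = np.linalg.inv(Sigma_A)

# Left: the quadratic form as written; right: its expansion from Eq. 2.
quadratic = (x - mu_A) @ iA @ (x - mu_A)
expanded = x @ iA @ x - 2 * mu_A @ iA @ x + mu_A @ iA @ mu_A
assert np.isclose(quadratic, expanded)
```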

If we combine like terms and factor out $\mathbf{x}$, we get

$$
\mathbf{x}^{\top} (\boldsymbol{\Sigma}_A^{-1} + \boldsymbol{\Sigma}_B^{-1}) \mathbf{x} - 2 (\boldsymbol{\mu}_A^{\top} \boldsymbol{\Sigma}_A^{-1} + \boldsymbol{\mu}_B^{\top} \boldsymbol{\Sigma}_B^{-1}) \mathbf{x} + (\boldsymbol{\mu}_A^{\top} \boldsymbol{\Sigma}_A^{-1} \boldsymbol{\mu}_A + \boldsymbol{\mu}_B^{\top} \boldsymbol{\Sigma}_B^{-1} \boldsymbol{\mu}_B), \tag{3}
$$

which is again quadratic in $\mathbf{x}$. If we set

$$
\begin{aligned}
\mathbf{V} &= \boldsymbol{\Sigma}_A^{-1} + \boldsymbol{\Sigma}_B^{-1}, \\
\mathbf{m} &= \boldsymbol{\Sigma}_A^{-1} \boldsymbol{\mu}_A + \boldsymbol{\Sigma}_B^{-1} \boldsymbol{\mu}_B, \\
R &= \boldsymbol{\mu}_A^{\top} \boldsymbol{\Sigma}_A^{-1} \boldsymbol{\mu}_A + \boldsymbol{\mu}_B^{\top} \boldsymbol{\Sigma}_B^{-1} \boldsymbol{\mu}_B,
\end{aligned} \tag{4}
$$

and apply the multivariate case of completing the square (note that $\mathbf{m}^{\top} = \boldsymbol{\mu}_A^{\top} \boldsymbol{\Sigma}_A^{-1} + \boldsymbol{\mu}_B^{\top} \boldsymbol{\Sigma}_B^{-1}$ by the symmetry above), then we can write Eq. 3 as

$$
\begin{aligned}
&\mathbf{x}^{\top} \mathbf{V} \mathbf{x} - 2 \mathbf{m}^{\top} \mathbf{x} + R \\
&= (\mathbf{x} - \mathbf{V}^{-1} \mathbf{m})^{\top} \mathbf{V} (\mathbf{x} - \mathbf{V}^{-1} \mathbf{m}) - \mathbf{m}^{\top} \mathbf{V}^{-1} \mathbf{m} + R.
\end{aligned} \tag{5}
$$
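
Continuing the sketch above, we can verify Eqs. 1 through 5 end to end: the sum of the two quadratic forms should equal the completed square plus the $\mathbf{x}$-independent remainder.

```python
# Add a second kernel (Sigma_B, mu_B) and check Eq. 1 against Eq. 5.
B = rng.normal(size=(d, d))
Sigma_B = B @ B.T + d * np.eye(d)
mu_B = rng.normal(size=d)
iB = np.linalg.inv(Sigma_B)

# Eq. 1: the sum of the two quadratic forms.
lhs = (x - mu_A) @ iA @ (x - mu_A) + (x - mu_B) @ iB @ (x - mu_B)

# Eq. 4: the definitions of V, m, and R.
V = iA + iB
m = iA @ mu_A + iB @ mu_B
R = mu_A @ iA @ mu_A + mu_B @ iB @ mu_B

# Eq. 5: the completed square plus the x-independent remainder.
V_inv_m = np.linalg.solve(V, m)
rhs = (x - V_inv_m) @ V @ (x - V_inv_m) - m @ V_inv_m + R
assert np.isclose(lhs, rhs)
```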

This is proportional to a Gaussian kernel with mean $\mathbf{V}^{-1} \mathbf{m}$ and covariance $\mathbf{V}^{-1}$ if we can ignore the remainder terms $\mathbf{m}^{\top} \mathbf{V}^{-1} \mathbf{m}$ and $R$, which do not depend on $\mathbf{x}$.
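
To see the Gaussian-kernel claim concretely: the product of the two Gaussian densities should differ from a Gaussian density with mean $\mathbf{V}^{-1} \mathbf{m}$ and covariance $\mathbf{V}^{-1}$ only by a multiplicative constant that does not depend on $\mathbf{x}$. Here is a sketch using SciPy, continuing the variables above:

```python
from scipy.stats import multivariate_normal as mvn

# The ratio of the product of the two pdfs to the combined Gaussian's pdf
# should be the same constant at every evaluation point.
xs = rng.normal(size=(5, d))
product = mvn.pdf(xs, mean=mu_A, cov=Sigma_A) * mvn.pdf(xs, mean=mu_B, cov=Sigma_B)
combined = mvn.pdf(xs, mean=V_inv_m, cov=np.linalg.inv(V))
ratio = product / combined
assert np.allclose(ratio, ratio[0])
```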