I originally used this mathematical trick in an appendix of another post. However, I’ve used this derivation so many times that I am re-posting it as a standalone post with better notation so that I can reference it as needed. I’m sure this derivation exists elsewhere, but I’ve never seen it, and knowledge of the result is assumed in many papers.
The sum of two quadratic forms in $x$ can be written as a single quadratic form plus a constant term that does not depend on $x$. This is useful in many Bayesian-inference derivations, where one often wants to combine two Gaussian kernels. Concretely, we are going to write
$$
(x - \mu_A)^{\top} \Sigma_A^{-1} (x - \mu_A) + (x - \mu_B)^{\top} \Sigma_B^{-1} (x - \mu_B) \tag{1}
$$
as a single quadratic in $x$, dropping any terms that do not depend on $x$. First, expand each quadratic form:
$$
\begin{aligned}
(x - \mu_A)^{\top} \Sigma_A^{-1} (x - \mu_A) &= x^{\top} \Sigma_A^{-1} x - 2 \mu_A^{\top} \Sigma_A^{-1} x + \mu_A^{\top} \Sigma_A^{-1} \mu_A, \\
(x - \mu_B)^{\top} \Sigma_B^{-1} (x - \mu_B) &= x^{\top} \Sigma_B^{-1} x - 2 \mu_B^{\top} \Sigma_B^{-1} x + \mu_B^{\top} \Sigma_B^{-1} \mu_B. \tag{2}
\end{aligned}
$$
If we collect like terms and factor, we get
$$
x^{\top} \left( \Sigma_A^{-1} + \Sigma_B^{-1} \right) x - 2 \left( \mu_A^{\top} \Sigma_A^{-1} + \mu_B^{\top} \Sigma_B^{-1} \right) x + \left( \mu_A^{\top} \Sigma_A^{-1} \mu_A + \mu_B^{\top} \Sigma_B^{-1} \mu_B \right), \tag{3}
$$
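As a quick numerical sanity check (not part of the original derivation), we can verify with NumPy that Eq. 3 equals Eq. 1 exactly, before any constants are dropped, for random positive definite $\Sigma_A$ and $\Sigma_B$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 3

def random_spd(d):
    # Random symmetric positive definite matrix: A A^T + d I.
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

Sigma_A_inv = np.linalg.inv(random_spd(d))
Sigma_B_inv = np.linalg.inv(random_spd(d))
mu_A = rng.standard_normal(d)
mu_B = rng.standard_normal(d)
x = rng.standard_normal(d)

# Eq. 1: sum of the two quadratic forms.
lhs = ((x - mu_A) @ Sigma_A_inv @ (x - mu_A)
       + (x - mu_B) @ Sigma_B_inv @ (x - mu_B))

# Eq. 3: a single quadratic in x.
rhs = (x @ (Sigma_A_inv + Sigma_B_inv) @ x
       - 2 * (mu_A @ Sigma_A_inv + mu_B @ Sigma_B_inv) @ x
       + mu_A @ Sigma_A_inv @ mu_A + mu_B @ Sigma_B_inv @ mu_B)

assert np.isclose(lhs, rhs)
```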
which is again quadratic in $x$. If we define
$$
\begin{aligned}
V &= \Sigma_A^{-1} + \Sigma_B^{-1}, \\
m &= \Sigma_A^{-1} \mu_A + \Sigma_B^{-1} \mu_B, \\
R &= \mu_A^{\top} \Sigma_A^{-1} \mu_A + \mu_B^{\top} \Sigma_B^{-1} \mu_B, \tag{4}
\end{aligned}
$$
and apply the multivariate case of completing the square, then we can write Eq. 3 as
$$
x^{\top} V x - 2 m^{\top} x + R = \left( x - V^{-1} m \right)^{\top} V \left( x - V^{-1} m \right) - m^{\top} V^{-1} m + R. \tag{5}
$$
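The completed-square identity in Eq. 5 can likewise be checked numerically for an arbitrary symmetric positive definite $V$ (a sketch, not from the original post):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

A = rng.standard_normal((d, d))
V = A @ A.T + d * np.eye(d)  # symmetric positive definite, so V^{-1} exists
m = rng.standard_normal(d)
R = rng.standard_normal()
x = rng.standard_normal(d)

V_inv_m = np.linalg.solve(V, m)  # V^{-1} m

# Left- and right-hand sides of Eq. 5.
lhs = x @ V @ x - 2 * m @ x + R
rhs = (x - V_inv_m) @ V @ (x - V_inv_m) - m @ V_inv_m + R

assert np.isclose(lhs, rhs)
```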
The right-hand side is proportional to a Gaussian kernel with mean $V^{-1} m$ and covariance $V^{-1}$, since the remaining terms $m^{\top} V^{-1} m$ and $R$ do not depend on $x$ and can be absorbed into the normalizing constant.
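To make the Gaussian-kernel claim concrete, the following NumPy sketch (my addition, with a hypothetical `kernel` helper for the unnormalized density) checks that the product of the two Gaussian kernels, divided by a Gaussian kernel with mean $V^{-1} m$ and precision $V$, is the same constant at every $x$:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 2

def random_spd(d):
    # Random symmetric positive definite matrix.
    A = rng.standard_normal((d, d))
    return A @ A.T + d * np.eye(d)

def kernel(x, mu, Sigma_inv):
    # Unnormalized Gaussian kernel exp(-0.5 (x - mu)^T Sigma^{-1} (x - mu)).
    return np.exp(-0.5 * (x - mu) @ Sigma_inv @ (x - mu))

Sigma_A_inv = np.linalg.inv(random_spd(d))
Sigma_B_inv = np.linalg.inv(random_spd(d))
mu_A, mu_B = rng.standard_normal(d), rng.standard_normal(d)

V = Sigma_A_inv + Sigma_B_inv                 # Eq. 4
m = Sigma_A_inv @ mu_A + Sigma_B_inv @ mu_B   # Eq. 4
mean = np.linalg.solve(V, m)                  # V^{-1} m

xs = rng.standard_normal((5, d))
ratios = [kernel(x, mu_A, Sigma_A_inv) * kernel(x, mu_B, Sigma_B_inv)
          / kernel(x, mean, V) for x in xs]

# The ratio is constant in x: the product of the two kernels is
# proportional to a Gaussian kernel with mean V^{-1} m, covariance V^{-1}.
assert np.allclose(ratios, ratios[0])
```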