Matrix Multiplication as the Sum of Outer Products

The transpose of a matrix times itself equals the sum of the outer products of the matrix's rows. I prove this identity.

In a previous post, I used the following identity. Let $\mathbf{A}$ be an $N \times K$ matrix, and let $\mathbf{a}_n$ denote the $n$-th row of $\mathbf{A}$, a $K$-dimensional row vector. Then the following holds:

$$
\mathbf{A}^{\top} \mathbf{A} = \sum_{n=1}^{N} \mathbf{a}_n^{\top} \mathbf{a}_n. \tag{1}
$$

Note that since $\mathbf{a}_n$ is a row vector, the operation $\mathbf{a}_n^{\top} \mathbf{a}_n$ is an outer product, not a dot product. Thus, the $K \times K$ matrix $\mathbf{A}^{\top} \mathbf{A}$ is the sum of $N$ outer products. Let's prove this.
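Before proving the identity, we can sanity-check it numerically. A minimal NumPy sketch with an arbitrary random matrix (the shapes and seed here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3
A = rng.normal(size=(N, K))  # an arbitrary N x K matrix

# Left-hand side of Eq. 1: A^T A, a K x K matrix.
lhs = A.T @ A

# Right-hand side: the sum of N outer products, one per row of A.
rhs = sum(np.outer(A[n], A[n]) for n in range(N))

assert np.allclose(lhs, rhs)
```

The check passes for any choice of `A`, since the identity holds exactly; `np.allclose` only guards against floating-point rounding.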

Let $\mathbf{0}$ denote a $K$-dimensional row vector of all zeros. We can write $\mathbf{A}$ as

$$
\mathbf{A} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{0} \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{0} \end{bmatrix} + \dots + \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ \vdots \\ \mathbf{a}_N \end{bmatrix} = \sum_{n=1}^{N} \begin{bmatrix} \mathbf{0} \\ \vdots \\ \mathbf{a}_n \\ \vdots \\ \mathbf{0} \end{bmatrix}. \tag{2}
$$
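This decomposition is easy to verify in code. A NumPy sketch that builds the $N$ zero-padded matrices and confirms they sum to $\mathbf{A}$ (again with an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 2
A = rng.normal(size=(N, K))

# Build the N matrices from Eq. 2: each is all zeros except row n, which is a_n.
terms = []
for n in range(N):
    B_n = np.zeros((N, K))
    B_n[n] = A[n]
    terms.append(B_n)

# Their sum recovers A exactly.
assert np.allclose(sum(terms), A)
```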

This is nothing deep. We are writing $\mathbf{A}$ as the sum of $N$ matrices, each of which has shape $N \times K$ and is all zeros except for the $n$-th row, which is $\mathbf{a}_n$. Therefore we can write $\mathbf{A}^{\top} \mathbf{A}$ as

$$
\mathbf{A}^{\top} \mathbf{A} = \mathbf{A}^{\top} \sum_{n=1}^{N} \begin{bmatrix} \mathbf{0} \\ \vdots \\ \mathbf{a}_n \\ \vdots \\ \mathbf{0} \end{bmatrix} = \sum_{n=1}^{N} \mathbf{A}^{\top} \begin{bmatrix} \mathbf{0} \\ \vdots \\ \mathbf{a}_n \\ \vdots \\ \mathbf{0} \end{bmatrix}, \tag{3}
$$

because matrix multiplication distributes over addition. It may not be obvious why this representation is helpful, so let's write it out explicitly:

$$
\mathbf{A}^{\top} \mathbf{A} = \sum_{n=1}^{N} \underbrace{\overbrace{\begin{bmatrix} a_{11} & \dots & a_{1n} & \dots & a_{1N} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ a_{K1} & \dots & a_{Kn} & \dots & a_{KN} \end{bmatrix}}^{\mathbf{A}^{\top}} \overbrace{\begin{bmatrix} 0 & \dots & 0 \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nK} \\ \vdots & \ddots & \vdots \\ 0 & \dots & 0 \end{bmatrix}}^{\mathbf{B}^{(n)}}}_{\mathbf{C}^{(n)}}. \tag{4}
$$

In Eq. 4, each term inside the sum, labeled $\mathbf{C}^{(n)}$, is a matrix multiplication between the $K \times N$ matrix $\mathbf{A}^{\top}$ and the $N \times K$ matrix $\mathbf{B}^{(n)}$. This results in a $K \times K$ matrix $\mathbf{C}^{(n)}$. Using the definition of matrix multiplication, where $\mathbf{C}^{(n)}_{ij} = \mathbf{A}^{\top}_{i,:} \mathbf{B}^{(n)}_{:,j}$ (the $i$-th row of $\mathbf{A}^{\top}$ times the $j$-th column of $\mathbf{B}^{(n)}$), and noting that only the $n$-th row of $\mathbf{B}^{(n)}$ is nonzero, we see that each $\mathbf{C}^{(n)}$ is

$$
\mathbf{C}^{(n)} = \begin{bmatrix} a_{1n} a_{n1} & \dots & a_{1n} a_{nK} \\ \vdots & \ddots & \vdots \\ a_{Kn} a_{n1} & \dots & a_{Kn} a_{nK} \end{bmatrix}. \tag{5}
$$

And Eq. 5 is just the definition of the outer product between the column vector $\mathbf{a}_n^{\top}$ and the row vector $\mathbf{a}_n$:

$$
\mathbf{C}^{(n)} = \mathbf{a}_n^{\top} \mathbf{a}_n. \tag{6}
$$
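We can confirm this term-by-term equality numerically as well. A NumPy sketch that builds each $\mathbf{B}^{(n)}$, forms $\mathbf{C}^{(n)} = \mathbf{A}^{\top} \mathbf{B}^{(n)}$, and compares it to the outer product of row $\mathbf{a}_n$ with itself:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 4, 3
A = rng.normal(size=(N, K))

for n in range(N):
    B_n = np.zeros((N, K))
    B_n[n] = A[n]              # B^(n): all zeros except row n
    C_n = A.T @ B_n            # the n-th K x K term from Eq. 4
    # Each term equals the outer product in Eq. 6.
    assert np.allclose(C_n, np.outer(A[n], A[n]))
```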

Thus, we have proven Eq. 1. It's easy to extend this result to $\mathbf{A} \mathbf{A}^{\top}$. Just sum over $K$ instead:

$$
\mathbf{A}\mathbf{A}^{\top} = \sum_{k=1}^{K} \mathbf{a}_k \mathbf{a}_k^{\top}. \tag{7}
$$

In this case, $\mathbf{a}_k$ denotes the $k$-th column of $\mathbf{A}$, an $N$-dimensional column vector, and the term $\mathbf{a}_k \mathbf{a}_k^{\top}$ is an outer product resulting in an $N \times N$ matrix.
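The column version can be checked the same way. A NumPy sketch comparing $\mathbf{A}\mathbf{A}^{\top}$ against the sum of $K$ outer products of the columns of $\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 5, 3
A = rng.normal(size=(N, K))

# Left-hand side of Eq. 7: A A^T, an N x N matrix.
lhs = A @ A.T

# Right-hand side: K outer products, one per column of A.
rhs = sum(np.outer(A[:, k], A[:, k]) for k in range(K))

assert np.allclose(lhs, rhs)
```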