Matrix Multiplication as the Sum of Outer Products

The transpose of a matrix times itself equals the sum of the outer products of the matrix's rows. I prove this identity.

In a previous post, I used the following identity. Let $\mathbf{A}$ be an $N \times K$ matrix, and let $\mathbf{a}_n$ denote the $n$-th row of $\mathbf{A}$, a $K$-dimensional row vector. Then the following holds:

$$
\mathbf{A}^{\top} \mathbf{A} = \sum_{n=1}^{N} \mathbf{a}_n^{\top} \mathbf{a}_n. \tag{1}
$$

Note that since $\mathbf{a}_n$ is a row vector, the operation $\mathbf{a}_n^{\top} \mathbf{a}_n$ is an outer product, not a dot product. Thus, the $K \times K$ matrix $\mathbf{A}^{\top} \mathbf{A}$ is the sum of $N$ outer products. Let's prove this.
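Before proving the identity, we can sanity-check it numerically. A minimal NumPy sketch with an arbitrary random matrix (the shapes and seed here are just for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 5, 3
A = rng.normal(size=(N, K))  # an arbitrary N x K matrix

# Left-hand side of Eq. 1: A^T A, a K x K matrix.
lhs = A.T @ A

# Right-hand side: the sum of N outer products, one per row of A.
rhs = sum(np.outer(A[n], A[n]) for n in range(N))

assert np.allclose(lhs, rhs)
```

The check passes for any choice of `A`, since the identity holds exactly; `np.allclose` only guards against floating-point rounding.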

Let $\mathbf{0}$ denote a $K$-dimensional row vector of all zeros. We can write $\mathbf{A}$ as

$$
\mathbf{A} = \begin{bmatrix} \mathbf{a}_1 \\ \mathbf{0} \\ \vdots \\ \mathbf{0} \end{bmatrix} + \begin{bmatrix} \mathbf{0} \\ \mathbf{a}_2 \\ \vdots \\ \mathbf{0} \end{bmatrix} + \dots + \begin{bmatrix} \mathbf{0} \\ \mathbf{0} \\ \vdots \\ \mathbf{a}_N \end{bmatrix} = \sum_{n=1}^{N} \begin{bmatrix} \mathbf{0} \\ \vdots \\ \mathbf{a}_n \\ \vdots \\ \mathbf{0} \end{bmatrix}. \tag{2}
$$
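This decomposition is easy to verify in code. A NumPy sketch that builds the $N$ zero-padded matrices and confirms they sum to $\mathbf{A}$ (again with an arbitrary random matrix):

```python
import numpy as np

rng = np.random.default_rng(1)
N, K = 4, 2
A = rng.normal(size=(N, K))

# Build the N matrices from Eq. 2: each is all zeros except row n, which is a_n.
terms = []
for n in range(N):
    B_n = np.zeros((N, K))
    B_n[n] = A[n]
    terms.append(B_n)

# Their sum recovers A exactly.
assert np.allclose(sum(terms), A)
```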

This is nothing deep. We are writing $\mathbf{A}$ as the sum of $N$ matrices, each of which has shape $N \times K$ and is all zeros except for the $n$-th row, which is $\mathbf{a}_n$. Therefore we can write $\mathbf{A}^{\top} \mathbf{A}$ as

$$
\mathbf{A}^{\top} \mathbf{A} = \mathbf{A}^{\top} \sum_{n=1}^{N} \begin{bmatrix} \mathbf{0} \\ \vdots \\ \mathbf{a}_n \\ \vdots \\ \mathbf{0} \end{bmatrix} = \sum_{n=1}^{N} \mathbf{A}^{\top} \begin{bmatrix} \mathbf{0} \\ \vdots \\ \mathbf{a}_n \\ \vdots \\ \mathbf{0} \end{bmatrix}, \tag{3}
$$

because matrix multiplication distributes over addition. It may not be obvious why this representation is helpful, so let's write it out explicitly:

$$
\mathbf{A}^{\top} \mathbf{A} = \sum_{n=1}^{N} \underbrace{\overbrace{\begin{bmatrix} a_{11} & \dots & a_{1n} & \dots & a_{1N} \\ \vdots & \ddots & \vdots & \ddots & \vdots \\ a_{K1} & \dots & a_{Kn} & \dots & a_{KN} \end{bmatrix}}^{\mathbf{A}^{\top}} \overbrace{\begin{bmatrix} 0 & \dots & 0 \\ \vdots & \ddots & \vdots \\ a_{n1} & \dots & a_{nK} \\ \vdots & \ddots & \vdots \\ 0 & \dots & 0 \end{bmatrix}}^{\mathbf{B}^{(n)}}}_{\mathbf{C}^{(n)}}. \tag{4}
$$

In Eq. 4, each term inside the sum, labeled $\mathbf{C}^{(n)}$, is a matrix multiplication between the $K \times N$ matrix $\mathbf{A}^{\top}$ and the $N \times K$ matrix $\mathbf{B}^{(n)}$. This results in a $K \times K$ matrix $\mathbf{C}^{(n)}$. Using the definition of matrix multiplication, where $\mathbf{C}^{(n)}_{ij} = \mathbf{A}^{\top}_{i,:} \mathbf{B}^{(n)}_{:,j}$ (the $i$-th row of $\mathbf{A}^{\top}$ times the $j$-th column of $\mathbf{B}^{(n)}$), and noting that only the $n$-th row of $\mathbf{B}^{(n)}$ is nonzero, we see that each $\mathbf{C}^{(n)}$ is

$$
\mathbf{C}^{(n)} = \begin{bmatrix} a_{1n} a_{n1} & \dots & a_{1n} a_{nK} \\ \vdots & \ddots & \vdots \\ a_{Kn} a_{n1} & \dots & a_{Kn} a_{nK} \end{bmatrix}. \tag{5}
$$

And Eq. 5 is just the definition of the outer product between the column vector $\mathbf{a}_n^{\top}$ and the row vector $\mathbf{a}_n$:

$$
\mathbf{C}^{(n)} = \mathbf{a}_n^{\top} \mathbf{a}_n. \tag{6}
$$
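We can confirm this term-by-term equality numerically as well. A NumPy sketch that builds each $\mathbf{B}^{(n)}$, forms $\mathbf{C}^{(n)} = \mathbf{A}^{\top} \mathbf{B}^{(n)}$, and compares it to the outer product of row $\mathbf{a}_n$ with itself:

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 4, 3
A = rng.normal(size=(N, K))

for n in range(N):
    B_n = np.zeros((N, K))
    B_n[n] = A[n]              # B^(n): all zeros except row n
    C_n = A.T @ B_n            # the n-th K x K term from Eq. 4
    # Each term equals the outer product in Eq. 6.
    assert np.allclose(C_n, np.outer(A[n], A[n]))
```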

Thus, we have proven Eq. 1. It's easy to extend this result to $\mathbf{A} \mathbf{A}^{\top}$. Just sum over $K$ instead:

$$
\mathbf{A}\mathbf{A}^{\top} = \sum_{k=1}^{K} \mathbf{a}_k \mathbf{a}_k^{\top}. \tag{7}
$$

In this case, $\mathbf{a}_k$ denotes the $k$-th column of $\mathbf{A}$, an $N$-dimensional column vector, and the term $\mathbf{a}_k \mathbf{a}_k^{\top}$ is an outer product resulting in an $N \times N$ matrix.
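The column version can be checked the same way. A NumPy sketch comparing $\mathbf{A}\mathbf{A}^{\top}$ against the sum of $K$ outer products of the columns of $\mathbf{A}$:

```python
import numpy as np

rng = np.random.default_rng(3)
N, K = 5, 3
A = rng.normal(size=(N, K))

# Left-hand side of Eq. 7: A A^T, an N x N matrix.
lhs = A @ A.T

# Right-hand side: K outer products, one per column of A.
rhs = sum(np.outer(A[:, k], A[:, k]) for k in range(K))

assert np.allclose(lhs, rhs)
```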