Proof of the Singular Value Decomposition

I walk the reader carefully through Gilbert Strang's existence proof of the singular value decomposition.

The existence claim for the singular value decomposition (SVD) is quite strong: “Every matrix is diagonal, provided one uses the proper bases for the domain and range spaces” (Trefethen & Bau III, 1997). MIT professor Gilbert Strang has a wonderful lecture on the SVD, and it includes an existence proof. The goal of this post is to force myself to walk through his proof carefully. For legibility, I break the proof into two sections: an overview and details. I hope this allows the reader to get the big picture of the proof while consulting details as needed.

Note that while the SVD holds for complex matrices, we restrict ourselves to real-valued matrices in this proof. Trefethen and Bau have a proof for the existence and uniqueness of the SVD for complex matrices, but I found Strang’s proof more instructive.

If you’re unfamiliar with the SVD, please see my previous post on a geometrical interpretation of it.

Proof

Consider an $m \times n$ matrix $A$ with rank $r$. The matrix $A^{\top} A$ is therefore symmetric and positive semi-definite (PSD) (Details, Section 1). This means the matrix is diagonalizable with an eigendecomposition of the form:

$$A^{\top} A = V \Lambda V^{\top} = \sum_{i=1}^{n} \lambda_i \textbf{v}_i \textbf{v}_i^{\top} = \sum_{i=1}^{n} (\sigma_i)^2 \textbf{v}_i \textbf{v}_i^{\top}$$

where $V$ is an orthogonal matrix whose columns are orthonormal eigenvectors of $A^{\top} A$ and where $r \leq n$ and $r = \text{rank}(A) = \text{rank}(A^{\top} A)$ (Details, Section 2). The second equality above, which writes $V \Lambda V^{\top}$ as a sum of rank-one matrices, holds because $\Lambda$ is diagonal.

We have defined a quantity $\sigma_i$ (the $i$-th singular value) as the square root of the $i$-th eigenvalue; we know we can take the square root of our eigenvalues because PSD matrices can be equivalently characterized as matrices with non-negative eigenvalues (Details, Section 3).
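
As a quick sanity check (a minimal NumPy sketch with an arbitrary random matrix; not part of Strang's argument), we can confirm numerically that the square roots of the eigenvalues of $A^{\top} A$ coincide with the singular values reported by a standard SVD routine:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # an arbitrary 5x3 matrix for illustration

# Eigendecomposition of the symmetric PSD matrix A^T A (eigh returns
# eigenvalues in ascending order, so reverse to get descending order).
eigvals, V = np.linalg.eigh(A.T @ A)
eigvals = eigvals[::-1]

# Singular values of A, in descending order.
sigma = np.linalg.svd(A, compute_uv=False)

print(np.allclose(np.sqrt(eigvals), sigma))  # True
```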

For the $i$-th eigenvector-eigenvalue pair, we have

$$A^{\top} A \textbf{v}_i = (\sigma_i)^2 \textbf{v}_i.$$

Next comes what is, at least in my mind, the critical step in the proof. For now, assume that we have a full-rank matrix ($\sigma_i > 0$ for all $i$). Define a new vector $\textbf{u}_i$ such that

$$\textbf{u}_i = \frac{A \textbf{v}_i}{\sigma_i}.$$

By construction, $\textbf{u}_i$ is a unit eigenvector of $AA^{\top}$ (Details, Section 4). Now let $V$ be an $n \times n$ matrix (because $A^{\top} A$ is $n \times n$) whose $i$-th column is $\textbf{v}_i$; let $U$ be an $m \times m$ matrix (because $A \textbf{v}_i$ is an $m$-vector) whose $i$-th column is $\textbf{u}_i$; and let $\Sigma$ be a diagonal matrix whose $i$-th element is $\sigma_i$. Then we can express the relationships we have so far in matrix form as:

$$\begin{aligned} U &= A V \Sigma^{-1} \\ U \Sigma &= A V \\ A &= U \Sigma V^{\top} \end{aligned}$$

where we use the fact that $V V^{\top} = I$ and that $\Sigma^{-1}$ is a diagonal matrix whose $i$-th diagonal entry is the reciprocal of $\sigma_i$. And we're done. Note that the first $r$ columns of $V$ are an orthonormal basis for the row space of $A$, while the first $r$ columns of $U$ are an orthonormal basis for the column space of $A$.
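
To make the construction concrete, here is a minimal NumPy sketch of the full-rank case (I use a square random matrix so that $\Sigma$ is invertible; the size and seed are arbitrary): build $V$ and $\Sigma$ from the eigendecomposition of $A^{\top} A$, set $U = A V \Sigma^{-1}$, and check that $A = U \Sigma V^{\top}$ with orthonormal columns in $U$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))  # a square, (almost surely) full-rank matrix

# Orthonormal eigenvectors of A^T A, with eigenvalues sorted in descending order.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(eigvals)

# u_i = A v_i / sigma_i, i.e. U = A V Sigma^{-1} (division broadcasts over columns).
U = (A @ V) / sigma

print(np.allclose(A, U @ np.diag(sigma) @ V.T))  # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)))           # True: U has orthonormal columns
```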

In the low-rank scenario, some $\sigma_i = 0$, and the construction above only produces $r$ columns of $U$. Provided the $\sigma_i$ are sorted in decreasing order, we can complete $U$ by appending additional orthonormal columns so that its columns span $\mathbb{R}^m$, and then add rows of $\textbf{0}$-vectors to $\Sigma$ so that the dimensions still match. (The same completion is needed whenever $m > n$, even at full rank.) See Figure 7 here.
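
As an illustration of this padding (a NumPy sketch with a made-up rank-2 matrix; `np.linalg.svd` returns the already-completed $U$ and $V$ by default), note how $\Sigma$ must be padded with zeros for the shapes to work out:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))  # a 5x4 matrix of rank 2

U, s, Vt = np.linalg.svd(A)        # full_matrices=True by default
print(U.shape, s.shape, Vt.shape)  # (5, 5) (4,) (4, 4)
print(np.round(s, 10))             # two non-zero singular values, the rest ~0

# Sigma must be 5x4: the singular values on the diagonal, padded with a zero row.
Sigma = np.zeros((5, 4))
Sigma[:4, :4] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```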

Details

1. Gram matrices as positive semi-definite

Gram matrices are PSD. Consider an arbitrary Gram matrix $G = M^{\top} M$. Then we have:

$$\begin{aligned} \textbf{x}^{\top} G \textbf{x} &= \textbf{x}^{\top} M^{\top} M \textbf{x} \\ &= (M \textbf{x})^{\top} M \textbf{x} \\ &= \lVert M \textbf{x} \rVert^2 \\ &\geq 0 \end{aligned}$$

If that last step is not obvious, let $\textbf{z} = M \textbf{x}$ and note that

$$\textbf{z}^{\top} \textbf{z} = \sum_{i=1}^{N} (z_i)^2.$$

In general, positive-definiteness is required for any operation to be an inner product.
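
As a quick numerical illustration of the claim above (a minimal NumPy sketch; the matrix $M$ and the test vectors are arbitrary), the quadratic form $\textbf{x}^{\top} G \textbf{x}$ comes out non-negative for every $\textbf{x}$ we try:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(6, 4))
G = M.T @ M  # a Gram matrix

# x^T G x = ||Mx||^2 >= 0; check against many random vectors x.
xs = rng.normal(size=(1000, 4))
quad_forms = np.einsum('ij,jk,ik->i', xs, G, xs)  # x^T G x for each row x of xs
print(np.all(quad_forms >= -1e-12))  # True, up to floating-point round-off
```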

2. $A$ and $A^{\top} A$ have the same rank

To show that $\text{rank}(A) = \text{rank}(A^{\top} A)$, it is sufficient to show that $A \textbf{x} = \textbf{0}$ and $A^{\top} A \textbf{x} = \textbf{0}$ have the same solutions, i.e. $A \textbf{x} = \textbf{0} \iff A^{\top} A \textbf{x} = \textbf{0}$. This is sufficient because rank is the maximal number of linearly independent columns, both matrices have $n$ columns, and the solutions $\textbf{x}$ of these equations record exactly which linear combinations of the columns equal zero. Recall that a set of vectors $\{\textbf{x}_1, \textbf{x}_2, \dots, \textbf{x}_k\}$ is linearly dependent if there exist scalars $a_1, a_2, \dots, a_k$, not all zero, such that

$$a_1 \textbf{x}_1 + a_2 \textbf{x}_2 + \dots + a_k \textbf{x}_k = \textbf{0}.$$

Now if $A \textbf{x} = \textbf{0}$, then $A^{\top} (A \textbf{x}) = A^{\top} \textbf{0} = \textbf{0}$.

Conversely, if $A^{\top} A \textbf{x} = \textbf{0}$, then

$$\begin{aligned} A^{\top} A \textbf{x} &= \textbf{0} \\ \textbf{x}^{\top} A^{\top} A \textbf{x} &= 0 \\ (A \textbf{x})^{\top} A \textbf{x} &= 0. \end{aligned}$$

Note that for any vector $\textbf{v}$, by definition of the inner product, $\textbf{v}^{\top} \textbf{v} = 0 \iff \textbf{v} = \textbf{0}$. Applying this to $\textbf{v} = A \textbf{x}$ gives $A \textbf{x} = \textbf{0}$, as desired.
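
Numerically, this claim is easy to spot-check (a NumPy sketch; the factor shapes and target ranks below are arbitrary): matrices built to have rank $r$ give the same rank for $A$ and $A^{\top} A$.

```python
import numpy as np

rng = np.random.default_rng(4)
for r in (1, 2, 3):
    # Build a 5x4 matrix whose rank is (almost surely) exactly r.
    A = rng.normal(size=(5, r)) @ rng.normal(size=(r, 4))
    print(r, np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))  # r, r, r
```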

3. The eigenvalues of PSD matrices are all non-negative

An equivalent characterization of a PSD matrix is that all its eigenvalues are non-negative. First, consider a real symmetric matrix $A$. Since it is real and symmetric, it has an eigendecomposition of the form:

$$A = Q \Lambda Q^{\top} = \sum_{n=1}^{N} \textbf{q}_n \lambda_n \textbf{q}_n^{\top}$$

And therefore:

$$\begin{aligned} \textbf{x}^{\top} A \textbf{x} &= \textbf{x}^{\top} \Big( \sum_{n=1}^{N} \textbf{q}_n \lambda_n \textbf{q}_n^{\top} \Big) \textbf{x} \\ &= \sum_{n=1}^{N} \lambda_n \textbf{x}^{\top} \textbf{q}_n \textbf{q}_n^{\top} \textbf{x} \\ &= \sum_{n=1}^{N} \lambda_n (\textbf{x}^{\top} \textbf{q}_n)^2 \end{aligned}$$

This final expression is greater than or equal to zero if all the eigenvalues $\lambda_n$ are non-negative. So if that is true, the matrix $A$ is PSD.

Next, consider a matrix $A$ that is PSD, so:

$$\textbf{x}^{\top} A \textbf{x} \geq 0$$

Now consider an arbitrary eigenvector $\textbf{v}_i$ of $A$, with eigenvalue $\lambda_i$. Since $A$ is PSD, we know:

$$\begin{aligned} \textbf{v}_i^{\top} A \textbf{v}_i &= \textbf{v}_i^{\top} \lambda_i \textbf{v}_i \\ &= \lambda_i \textbf{v}_i^{\top} \textbf{v}_i \\ &\geq 0 \end{aligned}$$

Since $\textbf{v}_i^{\top} \textbf{v}_i$ is strictly positive (an eigenvector is nonzero by definition), $\lambda_i$ must be non-negative for $\lambda_i \textbf{v}_i^{\top} \textbf{v}_i \geq 0$ to hold.
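
Here is a small numerical check of this claim (a NumPy sketch using a random Gram matrix as the PSD example; the dimensions are arbitrary): all computed eigenvalues are non-negative up to round-off.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 6))
G = M @ M.T  # a 4x4 PSD matrix (a Gram matrix built from the rows of M)

eigvals = np.linalg.eigvalsh(G)   # eigenvalues of a symmetric matrix
print(eigvals)
print(np.all(eigvals >= -1e-12))  # True: non-negative up to round-off
```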

4. $\textbf{u}_i$ is a unit eigenvector of $AA^{\top}$

To see that $\textbf{u}_i$ is an eigenvector of $AA^{\top}$, note that

$$\begin{aligned} AA^{\top} \textbf{u}_i &= AA^{\top} \Big( \frac{A \textbf{v}_i}{\sigma_i} \Big) \\ &= A A^{\top} A \textbf{v}_i \frac{1}{\sigma_i} \\ &= A (\sigma_i)^2 \textbf{v}_i \frac{1}{\sigma_i} \\ &= (\sigma_i)^2 \textbf{u}_i \end{aligned}$$

where the step from the second to the third line applies the definition of $\textbf{v}_i$ as an eigenvector of $A^{\top} A$, and the step from the third to the fourth line applies the definition of $\textbf{u}_i$.

To see that $\textbf{u}_i$ is a unit vector, note that

$$\begin{aligned} \textbf{u}_i^{\top} \textbf{u}_i &= \Big(\frac{A \textbf{v}_i}{\sigma_i} \Big)^{\top} \frac{A \textbf{v}_i}{\sigma_i} \\ &= \Big( \frac{\textbf{v}_i^{\top} A^{\top}}{\sigma_i} \Big) \frac{A \textbf{v}_i}{\sigma_i} \\ &= \frac{\textbf{v}_i^{\top} A^{\top} A \textbf{v}_i}{(\sigma_i)^2} \\ &= \frac{\textbf{v}_i^{\top} (\sigma_i)^2 \textbf{v}_i}{(\sigma_i)^2} \\ &= \textbf{v}_i^{\top} \textbf{v}_i \\ &= 1 \end{aligned}$$

where the last step holds because $\textbf{v}_i$ is itself a unit vector.
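
Both properties are easy to verify numerically (a NumPy sketch; the matrix is an arbitrary random example, and I pick the eigenpair with the largest eigenvalue so that $\sigma_i > 0$):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3))

# Take an eigenpair of A^T A with a non-zero eigenvalue.
eigvals, V = np.linalg.eigh(A.T @ A)
i = np.argmax(eigvals)
sigma_i, v_i = np.sqrt(eigvals[i]), V[:, i]
u_i = A @ v_i / sigma_i

# u_i is an eigenvector of A A^T with eigenvalue sigma_i^2, and it has unit length.
print(np.allclose(A @ A.T @ u_i, sigma_i**2 * u_i))  # True
print(np.isclose(np.linalg.norm(u_i), 1.0))          # True
```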

Conclusion

I found this proof instructive. Importantly, it makes clear where the relationship between singular values and eigenvalues comes from. The right singular vectors of $A$, the columns of $V$, are orthonormal eigenvectors of $A^{\top} A$. The left singular vectors of $A$, the columns of $U$, are orthonormal eigenvectors of $AA^{\top}$. And the non-zero singular values of $A$ are the square roots of the non-zero eigenvalues of both $A^{\top} A$ and $AA^{\top}$.

   

Acknowledgements

I thank James D., Carl A., and José M. for pointing out a few mistakes in earlier drafts.

1. Trefethen, L. N., & Bau III, D. (1997). Numerical linear algebra (Vol. 50). SIAM.