Proof of the Singular Value Decomposition

I walk the reader carefully through Gilbert Strang's existence proof of the singular value decomposition.

The existence claim for the singular value decomposition (SVD) is quite strong: “Every matrix is diagonal, provided one uses the proper bases for the domain and range spaces” (Trefethen & Bau III, 1997). MIT professor Gilbert Strang has a wonderful lecture on the SVD, and it includes an existence proof. The goal of this post is to force myself to walk through his proof carefully. For legibility, I break the proof into two sections: an overview and details. I hope this allows the reader to get the big picture of the proof while consulting details as needed.

Note that while the SVD holds for complex matrices, we restrict ourselves to real-valued matrices in this proof. Trefethen and Bau have a proof for the existence and uniqueness of the SVD for complex matrices, but I found Strang’s proof more instructive.

If you’re unfamiliar with the SVD, please see my previous post on a geometrical interpretation of it.

Proof

Consider an $m \times n$ matrix $A$ with rank $r$. The matrix $A^{\top} A$ is therefore symmetric and positive semi-definite (PSD) (Details, Section 1). This means the matrix is diagonalizable with an eigendecomposition of the form:

$$A^{\top} A = V \Lambda V^{\top} = \sum_{i=1}^{n} \lambda_i \textbf{v}_i \textbf{v}_i^{\top} = \sum_{i=1}^{n} (\sigma_i)^2 \textbf{v}_i \textbf{v}_i^{\top}$$

where $V$ is an orthogonal matrix whose columns are orthonormal eigenvectors of $A^{\top} A$ and where $r \leq n$ and $r = \text{rank}(A) = \text{rank}(A^{\top} A)$ (Details, Section 2). The second equality above, which writes $V \Lambda V^{\top}$ as a sum of rank-one matrices, holds because $\Lambda$ is diagonal.

We have defined a quantity $\sigma_i$ (the $i$-th singular value) as the square root of the $i$-th eigenvalue; we know we can take the square root of our eigenvalues because PSD matrices can be equivalently characterized as matrices with non-negative eigenvalues (Details, Section 3).
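
As a quick sanity check (a minimal NumPy sketch with an arbitrary random matrix; not part of Strang's argument), we can confirm numerically that the square roots of the eigenvalues of $A^{\top} A$ coincide with the singular values reported by a standard SVD routine:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 3))  # an arbitrary 5x3 matrix for illustration

# Eigendecomposition of the symmetric PSD matrix A^T A (eigh returns
# eigenvalues in ascending order, so reverse to get descending order).
eigvals, V = np.linalg.eigh(A.T @ A)
eigvals = eigvals[::-1]

# Singular values of A, in descending order.
sigma = np.linalg.svd(A, compute_uv=False)

print(np.allclose(np.sqrt(eigvals), sigma))  # True
```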

For the $i$-th eigenvector-eigenvalue pair, we have

$$A^{\top} A \textbf{v}_i = (\sigma_i)^2 \textbf{v}_i.$$

Next comes what is, at least in my mind, the critical step in the proof. For now, assume that we have a full-rank matrix ($\sigma_i > 0$ for all $i$). Define a new vector $\textbf{u}_i$ such that

$$\textbf{u}_i = \frac{A \textbf{v}_i}{\sigma_i}.$$

By construction, $\textbf{u}_i$ is a unit eigenvector of $AA^{\top}$ (Details, Section 4). Now let $V$ be an $n \times n$ matrix (because $A^{\top} A$ is $n \times n$) whose $i$-th column is $\textbf{v}_i$; let $U$ be an $m \times m$ matrix (because $A \textbf{v}_i$ is an $m$-vector) whose $i$-th column is $\textbf{u}_i$; and let $\Sigma$ be a diagonal matrix whose $i$-th element is $\sigma_i$. Then we can express the relationships we have so far in matrix form as:

$$\begin{aligned} U &= A V \Sigma^{-1} \\ U \Sigma &= A V \\ A &= U \Sigma V^{\top} \end{aligned}$$

where we use the fact that $V V^{\top} = I$ and that $\Sigma^{-1}$ is a diagonal matrix whose $i$-th diagonal entry is the reciprocal of $\sigma_i$. And we're done. Note that the first $r$ columns of $V$ are an orthonormal basis for the row space of $A$, while the first $r$ columns of $U$ are an orthonormal basis for the column space of $A$.
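
To make the construction concrete, here is a minimal NumPy sketch of the full-rank case (I use a square random matrix so that $\Sigma$ is invertible; the size and seed are arbitrary): build $V$ and $\Sigma$ from the eigendecomposition of $A^{\top} A$, set $U = A V \Sigma^{-1}$, and check that $A = U \Sigma V^{\top}$ with orthonormal columns in $U$.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(4, 4))  # a square, (almost surely) full-rank matrix

# Orthonormal eigenvectors of A^T A, with eigenvalues sorted in descending order.
eigvals, V = np.linalg.eigh(A.T @ A)
order = np.argsort(eigvals)[::-1]
eigvals, V = eigvals[order], V[:, order]
sigma = np.sqrt(eigvals)

# u_i = A v_i / sigma_i, i.e. U = A V Sigma^{-1} (division broadcasts over columns).
U = (A @ V) / sigma

print(np.allclose(A, U @ np.diag(sigma) @ V.T))  # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(4)))           # True: U has orthonormal columns
```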

In the low-rank scenario, some $\sigma_i = 0$, and the construction above only produces $r$ columns of $U$. Provided the $\sigma_i$ are sorted in decreasing order, we can complete $U$ by appending additional orthonormal columns so that its columns span $\mathbb{R}^m$, and then add rows of $\textbf{0}$-vectors to $\Sigma$ so that the dimensions still match. (The same completion is needed whenever $m > n$, even at full rank.) See Figure 7 here.
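
As an illustration of this padding (a NumPy sketch with a made-up rank-2 matrix; `np.linalg.svd` returns the already-completed $U$ and $V$ by default), note how $\Sigma$ must be padded with zeros for the shapes to work out:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(5, 2)) @ rng.normal(size=(2, 4))  # a 5x4 matrix of rank 2

U, s, Vt = np.linalg.svd(A)        # full_matrices=True by default
print(U.shape, s.shape, Vt.shape)  # (5, 5) (4,) (4, 4)
print(np.round(s, 10))             # two non-zero singular values, the rest ~0

# Sigma must be 5x4: the singular values on the diagonal, padded with a zero row.
Sigma = np.zeros((5, 4))
Sigma[:4, :4] = np.diag(s)
print(np.allclose(A, U @ Sigma @ Vt))  # True
```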

Details

1. Gram matrices as positive semi-definite

Gram matrices are PSD. Consider an arbitrary Gram matrix $G = M^{\top} M$. Then we have:

$$\begin{aligned} \textbf{x}^{\top} G \textbf{x} &= \textbf{x}^{\top} M^{\top} M \textbf{x} \\ &= (M \textbf{x})^{\top} M \textbf{x} \\ &= \lVert M \textbf{x} \rVert^2 \\ &\geq 0 \end{aligned}$$

If that last step is not obvious, let $\textbf{z} = M \textbf{x}$ and note that

$$\textbf{z}^{\top} \textbf{z} = \sum_{i=1}^{N} (z_i)^2.$$

In general, positive-definiteness is required for any operation to be an inner product.
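
As a quick numerical illustration of the claim above (a minimal NumPy sketch; the matrix $M$ and the test vectors are arbitrary), the quadratic form $\textbf{x}^{\top} G \textbf{x}$ comes out non-negative for every $\textbf{x}$ we try:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.normal(size=(6, 4))
G = M.T @ M  # a Gram matrix

# x^T G x = ||Mx||^2 >= 0; check against many random vectors x.
xs = rng.normal(size=(1000, 4))
quad_forms = np.einsum('ij,jk,ik->i', xs, G, xs)  # x^T G x for each row x of xs
print(np.all(quad_forms >= -1e-12))  # True, up to floating-point round-off
```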

2. $A$ and $A^{\top} A$ have the same rank

To show that $\text{rank}(A) = \text{rank}(A^{\top} A)$, it is sufficient to show that $A \textbf{x} = \textbf{0}$ and $A^{\top} A \textbf{x} = \textbf{0}$ have the same solutions, i.e. $A \textbf{x} = \textbf{0} \iff A^{\top} A \textbf{x} = \textbf{0}$. This is sufficient because rank is the maximal number of linearly independent columns, both matrices have $n$ columns, and the solutions $\textbf{x}$ of these equations record exactly which linear combinations of the columns equal zero. Recall that a set of vectors $\{\textbf{x}_1, \textbf{x}_2, \dots, \textbf{x}_k\}$ is linearly dependent if there exist scalars $a_1, a_2, \dots, a_k$, not all zero, such that

$$a_1 \textbf{x}_1 + a_2 \textbf{x}_2 + \dots + a_k \textbf{x}_k = \textbf{0}.$$

Now if $A \textbf{x} = \textbf{0}$, then $A^{\top} (A \textbf{x}) = A^{\top} \textbf{0} = \textbf{0}$.

Conversely, if $A^{\top} A \textbf{x} = \textbf{0}$, then

$$\begin{aligned} A^{\top} A \textbf{x} &= \textbf{0} \\ \textbf{x}^{\top} A^{\top} A \textbf{x} &= 0 \\ (A \textbf{x})^{\top} A \textbf{x} &= 0. \end{aligned}$$

Note that for any vector $\textbf{v}$, by definition of the inner product, $\textbf{v}^{\top} \textbf{v} = 0 \iff \textbf{v} = \textbf{0}$. Applying this to $\textbf{v} = A \textbf{x}$ gives $A \textbf{x} = \textbf{0}$, as desired.
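
Numerically, this claim is easy to spot-check (a NumPy sketch; the factor shapes and target ranks below are arbitrary): matrices built to have rank $r$ give the same rank for $A$ and $A^{\top} A$.

```python
import numpy as np

rng = np.random.default_rng(4)
for r in (1, 2, 3):
    # Build a 5x4 matrix whose rank is (almost surely) exactly r.
    A = rng.normal(size=(5, r)) @ rng.normal(size=(r, 4))
    print(r, np.linalg.matrix_rank(A), np.linalg.matrix_rank(A.T @ A))  # r, r, r
```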

3. The eigenvalues of PSD matrices are all non-negative

An equivalent characterization of a PSD matrix is that all its eigenvalues are non-negative. First, consider a real symmetric matrix $A$. Since it is real and symmetric, it has an eigendecomposition of the form:

$$A = Q \Lambda Q^{\top} = \sum_{n=1}^{N} \textbf{q}_n \lambda_n \textbf{q}_n^{\top}$$

And therefore:

$$\begin{aligned} \textbf{x}^{\top} A \textbf{x} &= \textbf{x}^{\top} \Big( \sum_{n=1}^{N} \textbf{q}_n \lambda_n \textbf{q}_n^{\top} \Big) \textbf{x} \\ &= \sum_{n=1}^{N} \lambda_n \textbf{x}^{\top} \textbf{q}_n \textbf{q}_n^{\top} \textbf{x} \\ &= \sum_{n=1}^{N} \lambda_n (\textbf{x}^{\top} \textbf{q}_n)^2 \end{aligned}$$

This final expression is greater than or equal to zero if all the eigenvalues $\lambda_n$ are non-negative. So if that is true, the matrix $A$ is PSD.

Next, consider a matrix $A$ that is PSD, so:

$$\textbf{x}^{\top} A \textbf{x} \geq 0$$

Now consider an arbitrary eigenvector $\textbf{v}_i$ of $A$, with eigenvalue $\lambda_i$. Since $A$ is PSD, we know:

$$\begin{aligned} \textbf{v}_i^{\top} A \textbf{v}_i &= \textbf{v}_i^{\top} \lambda_i \textbf{v}_i \\ &= \lambda_i \textbf{v}_i^{\top} \textbf{v}_i \\ &\geq 0 \end{aligned}$$

Since $\textbf{v}_i^{\top} \textbf{v}_i$ is strictly positive (an eigenvector is nonzero by definition), $\lambda_i$ must be non-negative for $\lambda_i \textbf{v}_i^{\top} \textbf{v}_i \geq 0$ to hold.
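
Here is a small numerical check of this claim (a NumPy sketch using a random Gram matrix as the PSD example; the dimensions are arbitrary): all computed eigenvalues are non-negative up to round-off.

```python
import numpy as np

rng = np.random.default_rng(5)
M = rng.normal(size=(4, 6))
G = M @ M.T  # a 4x4 PSD matrix (a Gram matrix built from the rows of M)

eigvals = np.linalg.eigvalsh(G)   # eigenvalues of a symmetric matrix
print(eigvals)
print(np.all(eigvals >= -1e-12))  # True: non-negative up to round-off
```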

4. $\textbf{u}_i$ is a unit eigenvector of $AA^{\top}$

To see that $\textbf{u}_i$ is an eigenvector of $AA^{\top}$, note that

$$\begin{aligned} AA^{\top} \textbf{u}_i &= AA^{\top} \Big( \frac{A \textbf{v}_i}{\sigma_i} \Big) \\ &= A A^{\top} A \textbf{v}_i \frac{1}{\sigma_i} \\ &= A (\sigma_i)^2 \textbf{v}_i \frac{1}{\sigma_i} \\ &= (\sigma_i)^2 \textbf{u}_i \end{aligned}$$

where the step from the second to the third line applies the definition of $\textbf{v}_i$ as an eigenvector of $A^{\top} A$, and the step from the third to the fourth line applies the definition of $\textbf{u}_i$.

To see that $\textbf{u}_i$ is a unit vector, note that

$$\begin{aligned} \textbf{u}_i^{\top} \textbf{u}_i &= \Big(\frac{A \textbf{v}_i}{\sigma_i} \Big)^{\top} \frac{A \textbf{v}_i}{\sigma_i} \\ &= \Big( \frac{\textbf{v}_i^{\top} A^{\top}}{\sigma_i} \Big) \frac{A \textbf{v}_i}{\sigma_i} \\ &= \frac{\textbf{v}_i^{\top} A^{\top} A \textbf{v}_i}{(\sigma_i)^2} \\ &= \frac{\textbf{v}_i^{\top} (\sigma_i)^2 \textbf{v}_i}{(\sigma_i)^2} \\ &= \textbf{v}_i^{\top} \textbf{v}_i \\ &= 1 \end{aligned}$$

where the last step holds because $\textbf{v}_i$ is itself a unit vector.
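
Both properties are easy to verify numerically (a NumPy sketch; the matrix is an arbitrary random example, and I pick the eigenpair with the largest eigenvalue so that $\sigma_i > 0$):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3))

# Take an eigenpair of A^T A with a non-zero eigenvalue.
eigvals, V = np.linalg.eigh(A.T @ A)
i = np.argmax(eigvals)
sigma_i, v_i = np.sqrt(eigvals[i]), V[:, i]
u_i = A @ v_i / sigma_i

# u_i is an eigenvector of A A^T with eigenvalue sigma_i^2, and it has unit length.
print(np.allclose(A @ A.T @ u_i, sigma_i**2 * u_i))  # True
print(np.isclose(np.linalg.norm(u_i), 1.0))          # True
```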

Conclusion

I found this proof instructive. Importantly, it makes clear where the relationship between singular values and eigenvalues comes from. The right singular vectors of $A$, the columns of $V$, are orthonormal eigenvectors of $A^{\top} A$. The left singular vectors of $A$, the columns of $U$, are orthonormal eigenvectors of $AA^{\top}$. And the non-zero singular values of $A$ are the square roots of the non-zero eigenvalues of both $A^{\top} A$ and $AA^{\top}$.

   

Acknowledgements

I thank James D., Carl A., and José M. for pointing out a few mistakes in earlier drafts.

1. Trefethen, L. N., & Bau III, D. (1997). Numerical linear algebra (Vol. 50). SIAM.