Proof of the Singular Value Decomposition
I walk the reader carefully through Gilbert Strang's existence proof of the singular value decomposition.
The existence claim for the singular value decomposition (SVD) is quite strong: “Every matrix is diagonal, provided one uses the proper bases for the domain and range spaces” (Trefethen & Bau III, 1997). MIT professor Gilbert Strang has a wonderful lecture on the SVD that includes an existence proof. The goal of this post is to force myself to walk through his proof carefully. For legibility, I break the proof into two sections: an overview and the details. I hope this allows the reader to get the big picture of the proof while consulting details as needed.
Note that while the SVD holds for complex matrices, we restrict ourselves to real-valued matrices in this proof. Trefethen and Bau have a proof for the existence and uniqueness of the SVD for complex matrices, but I found Strang’s proof more instructive.
If you’re unfamiliar with the SVD, please see my previous post on a geometrical interpretation of it.
Proof
Consider an $n \times p$ matrix $A$ with rank $r$. The matrix $A^{\top} A$ is therefore symmetric and positive semi-definite (PSD) (Details, Section 1). This means the matrix is diagonalizable with an eigendecomposition of the form:
$$
A^{\top} A = V \Lambda V^{\top} = \sum_{i=1}^{p} \sigma_i^2 v_i v_i^{\top},
$$

where $V$ is an orthonormal matrix whose columns are the eigenvectors $v_i$ of $A^{\top} A$, and where $\Lambda = \operatorname{diag}(\sigma_1^2, \dots, \sigma_p^2)$ and $\operatorname{rank}(A^{\top} A) = \operatorname{rank}(A) = r$ (Details, Section 2). The second equality above, which is a sum of $p$ rank-one matrices, holds because $\Lambda$ is diagonal.
We have defined a quantity $\sigma_i$ (the singular values) as the square root of the $i$-th eigenvalue of $A^{\top} A$; we know we can take the square root of our eigenvalues because PSD matrices can be equivalently characterized as matrices with non-negative eigenvalues (Details, Section 3).
For the $i$-th eigenvector-eigenvalue pair, we have

$$
A^{\top} A v_i = \sigma_i^2 v_i.
$$
Next comes what is, at least in my mind, the critical step in the proof. For now, assume that we have a full-rank matrix ($\sigma_i > 0$ for all $i$). Define a new vector $u_i$ such that

$$
u_i = \frac{A v_i}{\sigma_i}.
$$
By construction, $u_i$ is a unit eigenvector of $A A^{\top}$ (Details, Section 4). Now let $U$ be an $n \times p$ matrix (each $A v_i$ is an $n$-vector, and there are $p$ of them) where the $i$-th column is $u_i$; let $V$ be a $p \times p$ matrix (each $v_i$ is a $p$-vector) where the $i$-th column is $v_i$; and let $\Sigma$ be a $p \times p$ diagonal matrix whose $i$-th diagonal element is $\sigma_i$. Then we can express the relationships we have so far in matrix form as:
$$
U = A V \Sigma^{-1}
\;\Longrightarrow\;
A V = U \Sigma
\;\Longrightarrow\;
A = U \Sigma V^{\top},
$$

where we use the fact that $V^{\top} = V^{-1}$ (because $V$ is orthonormal) and that $\Sigma^{-1}$ is a diagonal matrix whose $i$-th value is the reciprocal of $\sigma_i$. And we’re done. Note that the first $r$ columns of $V$ are an orthonormal basis for the row space of $A$, while the first $r$ columns of $U$ are an orthonormal basis for the column space of $A$.
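To make the construction concrete, here is a minimal numerical sketch of the full-rank case, assuming NumPy and a random test matrix (the matrix, seed, and variable names are illustrative, not part of Strang's proof). It builds $V$ and $\Sigma$ from the eigendecomposition of $A^{\top} A$, forms $U = A V \Sigma^{-1}$, and checks that $A = U \Sigma V^{\top}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 5, 3
A = rng.normal(size=(n, p))              # a random matrix; full column rank with probability 1

# Eigendecomposition of the symmetric PSD matrix A^T A (eigenvalues returned in ascending order).
lam, V = np.linalg.eigh(A.T @ A)
idx = np.argsort(lam)[::-1]              # re-sort eigenpairs so the eigenvalues are decreasing
lam, V = lam[idx], V[:, idx]

sigma = np.sqrt(lam)                     # singular values: square roots of the eigenvalues
Sigma = np.diag(sigma)                   # p x p diagonal matrix of singular values
U = A @ V @ np.diag(1.0 / sigma)         # u_i = A v_i / sigma_i, i.e. U = A V Sigma^{-1}

print(np.allclose(A, U @ Sigma @ V.T))   # True: A = U Sigma V^T
print(np.allclose(U.T @ U, np.eye(p)))   # True: the columns u_i are orthonormal
```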
In the low-rank scenario, some $\sigma_i = 0$, so the corresponding $u_i = A v_i / \sigma_i$ are not defined. Provided the $\sigma_i$ are sorted in decreasing order, we can complete $U$ to an $n \times n$ orthonormal matrix by adding columns that span the orthogonal complement of the column space of $A$, and then add rows of zeros to $\Sigma$ so that it is $n \times p$.
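The completion step can also be sketched numerically. The snippet below (again assuming NumPy; the rank threshold and the QR-based completion are my own implementation choices, not part of the proof) builds a full SVD for a deliberately rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
B = rng.normal(size=(5, 2))
C = rng.normal(size=(2, 4))
A = B @ C                                      # 5 x 4 matrix with rank 2 by construction

n, p = A.shape
lam, V = np.linalg.eigh(A.T @ A)
idx = np.argsort(lam)[::-1]                    # sort eigenpairs by decreasing eigenvalue
lam, V = lam[idx], V[:, idx]

r = int(np.sum(lam > 1e-10 * lam[0]))          # numerical rank (here r = 2)
sigma = np.sqrt(lam[:r])                       # the nonzero singular values
U_r = A @ V[:, :r] / sigma                     # u_i = A v_i / sigma_i for i = 1, ..., r

# Complete U to an orthonormal n x n matrix using an orthonormal basis of the
# orthogonal complement of span{u_1, ..., u_r}, taken here from a full QR factorization.
Q, _ = np.linalg.qr(U_r, mode='complete')
U = np.hstack([U_r, Q[:, r:]])

# Pad Sigma with zeros so that it is n x p.
Sigma = np.zeros((n, p))
Sigma[:r, :r] = np.diag(sigma)

# Loose tolerance because the "zero" eigenvalues are only zero up to round-off.
print(np.allclose(A, U @ Sigma @ V.T, atol=1e-6))   # True
```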
Details
1. Gram matrices are positive semi-definite
Gram matrices are PSD. Consider an arbitrary Gram matrix $A^{\top} A$. Then, for any vector $x$, we have:

$$
x^{\top} \left( A^{\top} A \right) x = (A x)^{\top} (A x) \geq 0.
$$
If that last step is not obvious, let $y = A x$ and note that

$$
y^{\top} y = \sum_i y_i^2 \geq 0.
$$
In general, positive-definiteness, meaning $\langle y, y \rangle \geq 0$ with equality only when $y = \mathbf{0}$, is required for any operation to be an inner product.
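As a quick numerical sanity check (a sketch assuming NumPy and random data), the quadratic form $x^{\top} A^{\top} A x$ is non-negative for every $x$ we try:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(4, 3))
G = A.T @ A                                    # a Gram matrix

# The quadratic form x^T G x equals ||Ax||^2, so it should never be negative.
quad = [x @ G @ x for x in rng.normal(size=(1000, 3))]
print(min(quad) >= 0)                          # True
```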
2. $A$ and $A^{\top} A$ have the same rank
To show that $\operatorname{rank}(A) = \operatorname{rank}(A^{\top} A)$, it is sufficient to show that $A x = \mathbf{0}$ and $A^{\top} A x = \mathbf{0}$ have the same solutions, i.e. $\operatorname{null}(A) = \operatorname{null}(A^{\top} A)$. This makes sense because rank is just the maximal number of linearly independent columns, and a set of vectors $x_1, x_2, \dots, x_k$ are linearly dependent if there exist scalars $a_1, a_2, \dots, a_k$, not all zero, such that

$$
a_1 x_1 + a_2 x_2 + \dots + a_k x_k = \mathbf{0}.
$$

More formally, since $A$ and $A^{\top} A$ have the same number of columns, equal null spaces imply equal ranks by the rank-nullity theorem.
Now if $A x = \mathbf{0}$, then $A^{\top} A x = A^{\top} \mathbf{0} = \mathbf{0}$.
If $A^{\top} A x = \mathbf{0}$, then

$$
x^{\top} A^{\top} A x = 0
\;\Longrightarrow\;
(A x)^{\top} (A x) = 0
\;\Longrightarrow\;
A x = \mathbf{0}.
$$
Note that for any vector $y$, by the definition of the inner product, $y^{\top} y = 0$ implies $y = \mathbf{0}$.
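A small numerical illustration of this detail (a sketch assuming NumPy; `np.linalg.matrix_rank` is used only to check the claim):

```python
import numpy as np

rng = np.random.default_rng(3)
B = rng.normal(size=(6, 2))
C = rng.normal(size=(2, 4))
A = B @ C                                         # 6 x 4 matrix with rank 2 by construction

# A and A^T A have the same rank (a modest tolerance absorbs round-off in forming A^T A).
print(np.linalg.matrix_rank(A))                   # 2
print(np.linalg.matrix_rank(A.T @ A, tol=1e-10))  # 2
```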
3. The eigenvalues of PSD matrices are all non-negative
An equivalent characterization of a PSD matrix is that all its eigenvalues are non-negative. First, consider a real symmetric matrix $M$. Since it is real and symmetric, it has an eigendecomposition of the form:

$$
M = Q \Lambda Q^{\top}.
$$
And therefore, writing $y = Q^{\top} x$:

$$
x^{\top} M x = x^{\top} Q \Lambda Q^{\top} x = y^{\top} \Lambda y = \sum_i \lambda_i y_i^2.
$$
This final expression is greater than or equal to zero if all the eigenvalues are non-negative. So if that is true, the matrix is PSD.
Next, consider a matrix $M$ that is PSD, so:

$$
x^{\top} M x \geq 0 \quad \text{for all } x.
$$
Now consider an arbitrary eigenvector $v$ of $M$ with eigenvalue $\lambda$. Since $M$ is PSD, we know:

$$
0 \leq v^{\top} M v = v^{\top} (\lambda v) = \lambda \lVert v \rVert^2.
$$
Since $\lVert v \rVert^2$ is necessarily positive (eigenvectors are nonzero), $\lambda$ must be non-negative for the inequality to hold.
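For instance (a sketch assuming NumPy), the eigenvalues of a Gram matrix, which Section 1 showed is PSD, come out non-negative up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(5, 3))
eigvals = np.linalg.eigvalsh(A.T @ A)     # eigenvalues of the symmetric PSD matrix A^T A

print(eigvals)                            # all non-negative
print(np.all(eigvals >= -1e-12))          # True (tiny tolerance for round-off)
```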
4. $u_i$ is a unit eigenvector of $A A^{\top}$
To see that $u_i$ is an eigenvector of $A A^{\top}$, note that

$$
A A^{\top} u_i
= A A^{\top} \frac{A v_i}{\sigma_i}
= \frac{A \left( \sigma_i^2 v_i \right)}{\sigma_i}
= \sigma_i^2 u_i,
$$

where step 2-3 applies the definition of $v_i$ as an eigenvector of $A^{\top} A$ and step 3-4 applies the definition of $u_i$.
To see that $u_i$ is a unit vector, note that

$$
u_i^{\top} u_i
= \frac{(A v_i)^{\top} (A v_i)}{\sigma_i^2}
= \frac{v_i^{\top} \left( A^{\top} A v_i \right)}{\sigma_i^2}
= \frac{v_i^{\top} \left( \sigma_i^2 v_i \right)}{\sigma_i^2}
= v_i^{\top} v_i
= 1,
$$

where the last step holds because $v_i$ is a unit vector as well.
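Numerically (a sketch assuming NumPy and a random matrix with full column rank), each $u_i = A v_i / \sigma_i$ is indeed a unit eigenvector of $A A^{\top}$ with eigenvalue $\sigma_i^2$:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(5, 3))

lam, V = np.linalg.eigh(A.T @ A)
i = np.argmax(lam)                               # pick the eigenpair with the largest eigenvalue
v, sigma = V[:, i], np.sqrt(lam[i])
u = A @ v / sigma                                # the construction u_i = A v_i / sigma_i

print(np.isclose(u @ u, 1.0))                    # True: u is a unit vector
print(np.allclose(A @ A.T @ u, sigma**2 * u))    # True: u is an eigenvector of A A^T
```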
Conclusion
I found this proof instructive. Importantly, it makes clear where the relationship between singular values and eigenvalues comes from. The right singular vectors of $A$, the columns of $V$, are the orthonormal eigenvectors of $A^{\top} A$. The left singular vectors of $A$, the columns of $U$, are the orthonormal eigenvectors of $A A^{\top}$. And the non-zero singular values of $A$ are the square roots of the non-zero eigenvalues of both $A^{\top} A$ and $A A^{\top}$.
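This relationship is easy to check numerically (a sketch assuming NumPy; `np.linalg.svd` and `np.linalg.eigvalsh` are used only for comparison):

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.normal(size=(5, 3))

s = np.linalg.svd(A, compute_uv=False)                # singular values, in decreasing order
eig_AtA = np.sort(np.linalg.eigvalsh(A.T @ A))[::-1]  # eigenvalues of A^T A, decreasing
eig_AAt = np.sort(np.linalg.eigvalsh(A @ A.T))[::-1]  # eigenvalues of A A^T, decreasing

print(np.allclose(s, np.sqrt(eig_AtA)))               # True
print(np.allclose(s**2, eig_AAt[:s.size]))            # True: the non-zero eigenvalues agree
```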
Acknowledgements
I thank James D., Carl A., and José M. for pointing out a few mistakes in earlier drafts.
References

- Trefethen, L. N., & Bau III, D. (1997). Numerical linear algebra (Vol. 50). SIAM.