Singular Value Decompositions


Learning Objectives

Singular Value Decomposition

An m×n real matrix A has a singular value decomposition of the form

$$A = U \Sigma V^T$$

where U is an m×m orthogonal matrix, V is an n×n orthogonal matrix, and Σ is an m×n diagonal matrix. Specifically,

$$
\begin{aligned}
A A^T &= (U\Sigma V^T)(U\Sigma V^T)^T \\
&= (U\Sigma V^T)\left((V^T)^T \Sigma^T U^T\right) \\
&= U\Sigma (V^T V)\Sigma^T U^T \qquad (V \text{ is an orthogonal matrix, so } V^T = V^{-1} \text{ and } V^T V = I) \\
&= U(\Sigma\Sigma^T)U^T
\end{aligned}
$$

Since $U$ is also an orthogonal matrix, this expression has the form of a diagonalization ($B = XDX^{-1}$).

Hence, the columns of $U$ are the eigenvectors of $AA^T$, with the corresponding eigenvalues appearing as the diagonal entries of $\Sigma\Sigma^T$.

$$A^T A = (U\Sigma V^T)^T(U\Sigma V^T) = V(\Sigma^T\Sigma)V^T$$

Similar to above, the columns of $V$ are the eigenvectors of $A^T A$, with eigenvalues in the diagonal entries of $\Sigma^T\Sigma$.

$$
\Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_s \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix} \text{ when } m > n, \qquad \text{and} \qquad \Sigma = \begin{bmatrix} \sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \sigma_s & 0 & \cdots & 0 \end{bmatrix} \text{ when } m < n,
$$

where $s = \min(m,n)$ and $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_s \ge 0$ are the square roots of the eigenvalues of $A^T A$. These diagonal entries are called the singular values of $A$.
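We can check these eigenvector/eigenvalue relationships numerically. The following is a small sketch using NumPy; the matrix is arbitrary and purely illustrative:

```python
import numpy as np

# A hypothetical 4x3 matrix, used only for illustration
A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

U, sigma, Vt = np.linalg.svd(A)        # full SVD: U is 4x4, Vt is 3x3

# The squared singular values are the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)  # returned in ascending order
assert np.allclose(np.sort(sigma**2), eigvals)

# Each column of V is an eigenvector of A^T A with eigenvalue sigma_i^2
for i in range(3):
    v = Vt[i]                          # i-th right-singular vector
    assert np.allclose(A.T @ A @ v, sigma[i]**2 * v)
```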

Note that if $A^T x \neq 0$, then $AA^T$ and $A^TA$ share the same nonzero eigenvalues. Starting from

$$AA^T x = \lambda x$$

and left-multiplying both sides by $A^T$:

$$A^T A A^T x = A^T \lambda x \quad \Longrightarrow \quad A^TA\,(A^T x) = \lambda\,(A^T x),$$

so $A^T x$ is an eigenvector of $A^TA$ with the same eigenvalue $\lambda$.

Time Complexity

The time complexity of computing the SVD factorization of an arbitrary $m \times n$ matrix is $\alpha(m^2 n + n^3)$, where the constant $\alpha$ ranges from 4 to 10 (or more) depending on the algorithm.

In general, we can define the cost as:

$$\mathcal{O}(m^2 n + n^3)$$
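As a back-of-the-envelope illustration of this cost model (the helper function and the choice $\alpha = 4$ below are hypothetical, not a real API):

```python
# Rough operation-count model for the SVD cost, alpha * (m^2 n + n^3)
def svd_cost(m, n, alpha=4):
    return alpha * (m**2 * n + n**3)

# Doubling both dimensions multiplies the cost by 2^3 = 8 (cubic growth)
assert svd_cost(200, 100) / svd_cost(100, 50) == 8.0
```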

Reduced SVD

The SVD factorization of a non-square matrix A of size m×n can be represented in a reduced format:


The following figure depicts the reduced SVD factorization (in red) against the full SVD factorization (in gray).

In general, we will represent the reduced SVD as:

$$A = U_R \Sigma_R V_R^T$$

where $U_R$ is an $m \times s$ matrix, $V_R$ is an $n \times s$ matrix, $\Sigma_R$ is an $s \times s$ matrix, and $s = \min(m,n)$.
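In NumPy, the reduced factorization corresponds to `np.linalg.svd` with `full_matrices=False`. A short sketch with an arbitrary tall matrix:

```python
import numpy as np

m, n = 6, 4                            # tall matrix, so s = min(m, n) = n
rng = np.random.default_rng(0)
A = rng.random((m, n))

# Full SVD: U is m x m, Vt is n x n
U, sigma, Vt = np.linalg.svd(A)
assert U.shape == (m, m) and Vt.shape == (n, n)

# Reduced SVD: U_R is m x s, Sigma_R is s x s, V_R^T is s x n
UR, sigmaR, VRt = np.linalg.svd(A, full_matrices=False)
assert UR.shape == (m, n) and VRt.shape == (n, n)

# Both contain the same singular values and reconstruct A exactly
assert np.allclose(sigma, sigmaR)
assert np.allclose(A, UR @ np.diag(sigmaR) @ VRt)
```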

Example: Computing the SVD

We begin with the following non-square matrix, A

$$A = \begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}$$

and we will compute the reduced form of the SVD (where here s=3):

(1) Compute ATA:

$$A^T A = \begin{bmatrix} 174 & 158 & 106 \\ 158 & 197 & 134 \\ 106 & 134 & 127 \end{bmatrix}$$

(2) Compute the eigenvectors and eigenvalues of ATA:

$$\lambda_1 = 437.479, \quad \lambda_2 = 42.6444, \quad \lambda_3 = 17.8766,$$
$$v_1 = \begin{bmatrix} 0.585051 \\ 0.652648 \\ 0.481418 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0.710399 \\ -0.126068 \\ -0.692415 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0.391212 \\ -0.747098 \\ 0.537398 \end{bmatrix}$$

(3) Construct VR from the eigenvectors of ATA:

$$V_R = \begin{bmatrix} 0.585051 & 0.710399 & 0.391212 \\ 0.652648 & -0.126068 & -0.747098 \\ 0.481418 & -0.692415 & 0.537398 \end{bmatrix}.$$

(4) Construct ΣR from the square roots of the eigenvalues of ATA:

$$\Sigma_R = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}$$

(5) Find $U$ by solving $U\Sigma = AV$. For our reduced case, we can find $U_R = A V_R \Sigma_R^{-1}$. You could also find $U$ by computing the eigenvectors of $AA^T$.

$$
U_R = \underbrace{\begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} 0.585051 & 0.710399 & 0.391212 \\ 0.652648 & -0.126068 & -0.747098 \\ 0.481418 & -0.692415 & 0.537398 \end{bmatrix}}_{V_R}
\underbrace{\begin{bmatrix} 0.047810 & 0 & 0 \\ 0 & 0.153133 & 0 \\ 0 & 0 & 0.236515 \end{bmatrix}}_{\Sigma_R^{-1}}
$$
$$
U_R = \begin{bmatrix} 0.215371 & -0.030348 & 0.305490 \\ 0.519432 & 0.503779 & -0.419173 \\ 0.534262 & 0.311021 & 0.011730 \\ 0.438715 & -0.787878 & -0.431352 \\ 0.453759 & -0.166729 & 0.738082 \end{bmatrix}
$$

We obtain the following singular value decomposition for A:

$$
\underbrace{\begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}}_{A} =
\underbrace{\begin{bmatrix} 0.215371 & -0.030348 & 0.305490 \\ 0.519432 & 0.503779 & -0.419173 \\ 0.534262 & 0.311021 & 0.011730 \\ 0.438715 & -0.787878 & -0.431352 \\ 0.453759 & -0.166729 & 0.738082 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix} 0.585051 & 0.652648 & 0.481418 \\ 0.710399 & -0.126068 & -0.692415 \\ 0.391212 & -0.747098 & 0.537398 \end{bmatrix}}_{V^T}
$$

Recall that we computed the reduced SVD factorization here (i.e. $\Sigma$ is square and $U$ is non-square).
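We can verify this worked example numerically. Sign conventions for the singular vectors may differ between libraries, but the singular values and the reconstruction must agree:

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

UR, sigma, VRt = np.linalg.svd(A, full_matrices=False)

# Singular values match the hand computation (to rounding)
assert np.allclose(sigma, [20.916, 6.53027, 4.22807], atol=1e-2)

# The reduced factorization reconstructs A
assert np.allclose(A, UR @ np.diag(sigma) @ VRt)
```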

Rank, null space and range of a matrix

Suppose $A$ is an $m \times n$ matrix, where $m > n$ (without loss of generality):

$$
A = U\Sigma V^T = \begin{bmatrix} | & & | & & | \\ u_1 & \cdots & u_n & \cdots & u_m \\ | & & | & & | \end{bmatrix}
\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix}
\begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}
$$

We can re-write the above as:

$$
A = \begin{bmatrix} | & & | \\ u_1 & \cdots & u_n \\ | & & | \end{bmatrix}
\begin{bmatrix} \sigma_1 v_1^T \\ \vdots \\ \sigma_n v_n^T \end{bmatrix}
$$

Furthermore, the product of two matrices can be written as a sum of outer products:

$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_n u_n v_n^T$$

For a general rectangular matrix, we have:

$$A = \sum_{i=1}^{s} \sigma_i u_i v_i^T$$

where s=min(m,n).

If A has s non-zero singular values, the matrix is full rank, i.e. rank(A)=s.

If A has r non-zero singular values, and r<s, the matrix is rank deficient, i.e. rank(A)=r.

In other words, the rank of A equals the number of non-zero singular values which is the same as the number of non-zero diagonal elements in Σ.

Rounding errors may lead to small but non-zero singular values in a rank deficient matrix. Singular values that are smaller than a given tolerance are assumed to be numerically equivalent to zero, defining what is sometimes called the effective rank.
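A small sketch of this effective-rank idea (the matrix, noise level, and tolerance below are hypothetical):

```python
import numpy as np

# A rank-1 matrix plus tiny rounding-level noise
rng = np.random.default_rng(0)
A = np.outer([1., 2., 3.], [4., 5., 6.]) + 1e-12 * rng.standard_normal((3, 3))

sigma = np.linalg.svd(A, compute_uv=False)
tol = 1e-8

# All singular values are technically nonzero, but only one is significant
assert np.count_nonzero(sigma) == 3
assert np.sum(sigma > tol) == 1              # effective rank is 1
assert np.linalg.matrix_rank(A, tol=tol) == 1
```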

The right-singular vectors (columns of V) corresponding to vanishing singular values of A span the null space of A, i.e. null(A) = span{vr+1, vr+2, …, vn}.

The left-singular vectors (columns of U) corresponding to the non-zero singular values of A span the range of A, i.e. range(A) = span{u1, u2, …, ur}.
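A short NumPy sketch extracting these bases from the SVD (the matrix below is illustrative):

```python
import numpy as np

# An illustrative 3x3 matrix of rank 2 (third column = first + second)
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])

U, sigma, Vt = np.linalg.svd(A)
r = np.sum(sigma > 1e-10)              # numerical rank
assert r == 2

# Rows r..n-1 of Vt (i.e. trailing columns of V) span the null space
null_basis = Vt[r:].T
assert np.allclose(A @ null_basis, 0)

# The first r columns of U span the range: any A x lies in their span
x = np.array([1., 2., 3.])
y = A @ x
assert np.allclose(U[:, :r] @ (U[:, :r].T @ y), y)
```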

Example:

$$
A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} \sqrt{14} & 0 & 0 \\ 0 & \sqrt{14} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

The rank of A is 2.

The vectors $\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0 \end{bmatrix}^T$ and $\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \end{bmatrix}^T$ provide an orthonormal basis for the range of $A$.

The vector $\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$ provides an orthonormal basis for the null space of $A$.

(Moore-Penrose) Pseudoinverse

If the matrix $\Sigma$ is rank deficient, it cannot be inverted. We define instead the pseudoinverse:

$$(\Sigma^+)_{ii} = \begin{cases} \dfrac{1}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \sigma_i = 0 \end{cases}$$

For a general non-square matrix $A$ with known SVD ($A = U\Sigma V^T$), the pseudoinverse is defined as:

$$A^+ = V \Sigma^+ U^T$$

For example, if we consider an $m \times n$ full-rank matrix where $m > n$:

$$
A^+ = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix}
\begin{bmatrix} 1/\sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & 1/\sigma_n & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} | & & | & & | \\ u_1 & \cdots & u_n & \cdots & u_m \\ | & & | & & | \end{bmatrix}^T
$$
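A NumPy sketch of building the pseudoinverse from the SVD this way (the matrix is illustrative):

```python
import numpy as np

# An illustrative tall, full-rank matrix
A = np.array([[1., 0.],
              [0., 2.],
              [1., 1.]])

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the nonzero singular values to form Sigma+
sigma_plus = np.where(sigma > 1e-12, 1.0 / sigma, 0.0)
A_plus = Vt.T @ np.diag(sigma_plus) @ U.T   # A+ = V Sigma+ U^T

# Agrees with NumPy's pinv; for a full-rank tall matrix, A+ A = I
assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus @ A, np.eye(2))
```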

Euclidean norm of matrices

The induced 2-norm of a matrix $A$ can be obtained using the SVD of the matrix:

$$
\|A\|_2 = \max_{\|x\|=1} \|Ax\| = \max_{\|x\|=1} \|U\Sigma V^T x\| = \max_{\|x\|=1} \|\Sigma V^T x\| = \max_{\|V^T x\|=1} \|\Sigma V^T x\| = \max_{\|y\|=1} \|\Sigma y\|
$$

And hence,

$$\|A\|_2 = \sigma_1$$

In the above equations, all the norms $\|\cdot\|$ refer to the Euclidean ($p=2$) norm, and we used the fact that $U$ and $V$ are orthogonal matrices, hence $\|U\|_2 = \|V\|_2 = 1$.

Example:

We begin with the following non-square matrix A:

$$A = \begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}.$$

The matrix of singular values, Σ, computed from the SVD factorization is:

$$\Sigma = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}.$$

Consequently the 2-norm of A is

$$\|A\|_2 = 20.916.$$
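This can be checked numerically with NumPy:

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

sigma = np.linalg.svd(A, compute_uv=False)

# The induced 2-norm equals the largest singular value
assert np.isclose(np.linalg.norm(A, 2), sigma[0])
assert np.isclose(sigma[0], 20.916, atol=1e-2)
```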

Euclidean norm of the inverse of matrices

Following the same derivation as above, we can show that for a full rank n×n matrix we have:

$$\|A^{-1}\|_2 = \frac{1}{\sigma_n}$$

where σn is the smallest singular value.

For non-square matrices, we can use the definition of the pseudoinverse (regardless of the rank):

$$\|A^+\|_2 = \frac{1}{\sigma_r}$$

where $\sigma_r$ is the smallest non-zero singular value. Note that for a full-rank square matrix, we have $\|A^+\|_2 = \|A^{-1}\|_2$. An exception to the definition above is the zero matrix, in which case $\|A^+\|_2 = 0$.
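A numerical sketch (the rank-deficient matrix below is illustrative):

```python
import numpy as np

# An illustrative rank-deficient matrix (third row repeats the first)
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.]])

sigma = np.linalg.svd(A, compute_uv=False)
sigma_r = sigma[sigma > 1e-10][-1]     # smallest nonzero singular value

# ||A+||_2 = 1 / sigma_r
assert np.isclose(np.linalg.norm(np.linalg.pinv(A), 2), 1.0 / sigma_r)
```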

2-Norm Condition Number

The 2-norm condition number of a matrix A is given by the ratio of its largest singular value to its smallest singular value:

$$\text{cond}_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_{\max}}{\sigma_{\min}}.$$

If the matrix $A$ is rank deficient, i.e. $\text{rank}(A) < \min(m,n)$, then $\text{cond}_2(A) = \infty$.
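For example, with a diagonal matrix whose singular values can be read off directly:

```python
import numpy as np

# A diagonal matrix makes the singular values easy to read off
A = np.array([[3.0, 0.0],
              [0.0, 0.5]])

sigma = np.linalg.svd(A, compute_uv=False)

# cond_2(A) = sigma_max / sigma_min = 3 / 0.5 = 6
assert np.isclose(np.linalg.cond(A, 2), sigma[0] / sigma[-1])
assert np.isclose(np.linalg.cond(A, 2), 6.0)
```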

Low-rank Approximation

The best rank-$k$ approximation of an $m \times n$ matrix $A$, where $k < s = \min(m,n)$, with respect to some matrix norm $\|\cdot\|$, is the matrix $A_k$ that solves

$$\min_{A_k} \ \|A - A_k\| \quad \text{such that} \quad \text{rank}(A_k) \le k.$$

Under the induced 2-norm, the best rank-k approximation is given by the sum of the first k outer products of the left and right singular vectors scaled by the corresponding singular value (where, σ1σs):

$$A_k = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_k u_k v_k^T$$

Observe that the norm of the difference between the best approximation and the matrix under the induced 2-norm condition is the magnitude of the (k+1)th singular value of the matrix:

$$\|A - A_k\|_2 = \left\| \sum_{i=k+1}^{s} \sigma_i u_i v_i^T \right\|_2 = \sigma_{k+1}$$

Note that the best rank-k approximation to A can be stored efficiently by only storing the k singular values σ1,,σk, the k left singular vectors u1,,uk, and the k right singular vectors v1,,vk.
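A sketch of computing the truncated sum and checking the error formula (the random matrix is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))        # illustrative random matrix
k = 2

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular triplets
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k

# The 2-norm error is exactly the (k+1)-th singular value
assert np.isclose(np.linalg.norm(A - A_k, 2), sigma[k])
```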

The figure below shows best rank-$k$ approximations of an image (you can find the code snippet that generates these images in the IPython notebook):

SVD Summary

Review Questions

ChangeLog