Singular Value Decompositions


Learning Objectives

Singular Value Decomposition

An m×n real matrix A has a singular value decomposition of the form

$$A = U \Sigma V^T$$

where U is an m×m orthogonal matrix, V is an n×n orthogonal matrix, and Σ is an m×n diagonal matrix. Specifically,

$$
\begin{aligned}
A A^T &= (U\Sigma V^T)(U\Sigma V^T)^T \\
&= (U\Sigma V^T)\left((V^T)^T \Sigma^T U^T\right) \\
&= U\Sigma (V^T V)\Sigma^T U^T \qquad (V \text{ is an orthogonal matrix, so } V^T = V^{-1} \text{ and } V^T V = I) \\
&= U(\Sigma\Sigma^T)U^T
\end{aligned}
$$

Since $U$ is also an orthogonal matrix, this expression has the form of a diagonalization ($B = XDX^{-1}$).

Hence, the columns of $U$ are the eigenvectors of $AA^T$, with the corresponding eigenvalues appearing as the diagonal entries of $\Sigma\Sigma^T$.

$$A^T A = (U\Sigma V^T)^T(U\Sigma V^T) = V(\Sigma^T\Sigma)V^T$$

Similar to above, the columns of $V$ are the eigenvectors of $A^T A$, with eigenvalues in the diagonal entries of $\Sigma^T\Sigma$.

$$
\Sigma = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_s \\ 0 & \cdots & 0 \\ \vdots & & \vdots \\ 0 & \cdots & 0 \end{bmatrix} \text{ when } m > n, \qquad \text{and} \qquad \Sigma = \begin{bmatrix} \sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & \sigma_s & 0 & \cdots & 0 \end{bmatrix} \text{ when } m < n,
$$

where $s = \min(m,n)$ and $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_s \ge 0$ are the square roots of the eigenvalues of $A^T A$. These diagonal entries are called the singular values of $A$.
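We can check these eigenvector/eigenvalue relationships numerically. The following is a small sketch using NumPy; the matrix is arbitrary and purely illustrative:

```python
import numpy as np

# A hypothetical 4x3 matrix, used only for illustration
A = np.array([[1., 2., 0.],
              [0., 1., 3.],
              [2., 0., 1.],
              [1., 1., 1.]])

U, sigma, Vt = np.linalg.svd(A)        # full SVD: U is 4x4, Vt is 3x3

# The squared singular values are the eigenvalues of A^T A
eigvals = np.linalg.eigvalsh(A.T @ A)  # returned in ascending order
assert np.allclose(np.sort(sigma**2), eigvals)

# Each column of V is an eigenvector of A^T A with eigenvalue sigma_i^2
for i in range(3):
    v = Vt[i]                          # i-th right-singular vector
    assert np.allclose(A.T @ A @ v, sigma[i]**2 * v)
```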

Note that if $A^T x \neq 0$, then $AA^T$ and $A^TA$ share the same nonzero eigenvalues. Starting from

$$AA^T x = \lambda x$$

and left-multiplying both sides by $A^T$:

$$A^T A A^T x = A^T \lambda x \quad \Longrightarrow \quad A^TA\,(A^T x) = \lambda\,(A^T x),$$

so $A^T x$ is an eigenvector of $A^TA$ with the same eigenvalue $\lambda$.

Time Complexity

The time complexity of computing the SVD factorization of an arbitrary $m \times n$ matrix is $\alpha(m^2 n + n^3)$, where the constant $\alpha$ ranges from 4 to 10 (or more) depending on the algorithm.

In general, we can define the cost as:

$$\mathcal{O}(m^2 n + n^3)$$
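As a back-of-the-envelope illustration of this cost model (the helper function and the choice $\alpha = 4$ below are hypothetical, not a real API):

```python
# Rough operation-count model for the SVD cost, alpha * (m^2 n + n^3)
def svd_cost(m, n, alpha=4):
    return alpha * (m**2 * n + n**3)

# Doubling both dimensions multiplies the cost by 2^3 = 8 (cubic growth)
assert svd_cost(200, 100) / svd_cost(100, 50) == 8.0
```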

Reduced SVD

The SVD factorization of a non-square matrix A of size m×n can be represented in a reduced format:


The following figure depicts the reduced SVD factorization (in red) against the full SVD factorization (in gray).

In general, we will represent the reduced SVD as:

$$A = U_R \Sigma_R V_R^T$$

where $U_R$ is an $m \times s$ matrix, $V_R$ is an $n \times s$ matrix, $\Sigma_R$ is an $s \times s$ matrix, and $s = \min(m,n)$.
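In NumPy, the reduced factorization corresponds to `np.linalg.svd` with `full_matrices=False`. A short sketch with an arbitrary tall matrix:

```python
import numpy as np

m, n = 6, 4                            # tall matrix, so s = min(m, n) = n
rng = np.random.default_rng(0)
A = rng.random((m, n))

# Full SVD: U is m x m, Vt is n x n
U, sigma, Vt = np.linalg.svd(A)
assert U.shape == (m, m) and Vt.shape == (n, n)

# Reduced SVD: U_R is m x s, Sigma_R is s x s, V_R^T is s x n
UR, sigmaR, VRt = np.linalg.svd(A, full_matrices=False)
assert UR.shape == (m, n) and VRt.shape == (n, n)

# Both contain the same singular values and reconstruct A exactly
assert np.allclose(sigma, sigmaR)
assert np.allclose(A, UR @ np.diag(sigmaR) @ VRt)
```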

Example: Computing the SVD

We begin with the following non-square matrix, A

$$A = \begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}$$

and we will compute the reduced form of the SVD (where here s=3):

(1) Compute ATA:

$$A^T A = \begin{bmatrix} 174 & 158 & 106 \\ 158 & 197 & 134 \\ 106 & 134 & 127 \end{bmatrix}$$

(2) Compute the eigenvectors and eigenvalues of ATA:

$$\lambda_1 = 437.479, \quad \lambda_2 = 42.6444, \quad \lambda_3 = 17.8766,$$
$$v_1 = \begin{bmatrix} 0.585051 \\ 0.652648 \\ 0.481418 \end{bmatrix}, \quad v_2 = \begin{bmatrix} 0.710399 \\ -0.126068 \\ -0.692415 \end{bmatrix}, \quad v_3 = \begin{bmatrix} 0.391212 \\ -0.747098 \\ 0.537398 \end{bmatrix}$$

(3) Construct VR from the eigenvectors of ATA:

$$V_R = \begin{bmatrix} 0.585051 & 0.710399 & 0.391212 \\ 0.652648 & -0.126068 & -0.747098 \\ 0.481418 & -0.692415 & 0.537398 \end{bmatrix}.$$

(4) Construct ΣR from the square roots of the eigenvalues of ATA:

$$\Sigma_R = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}$$

(5) Find $U$ by solving $U\Sigma = AV$. For our reduced case, we can find $U_R = A V_R \Sigma_R^{-1}$. You could also find $U$ by computing the eigenvectors of $AA^T$.

$$
U_R = \underbrace{\begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} 0.585051 & 0.710399 & 0.391212 \\ 0.652648 & -0.126068 & -0.747098 \\ 0.481418 & -0.692415 & 0.537398 \end{bmatrix}}_{V_R}
\underbrace{\begin{bmatrix} 0.047810 & 0 & 0 \\ 0 & 0.153133 & 0 \\ 0 & 0 & 0.236515 \end{bmatrix}}_{\Sigma_R^{-1}}
$$
$$
U_R = \begin{bmatrix} 0.215371 & -0.030348 & 0.305490 \\ 0.519432 & 0.503779 & -0.419173 \\ 0.534262 & 0.311021 & 0.011730 \\ 0.438715 & -0.787878 & -0.431352 \\ 0.453759 & -0.166729 & 0.738082 \end{bmatrix}
$$

We obtain the following singular value decomposition for A:

$$
\underbrace{\begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}}_{A} =
\underbrace{\begin{bmatrix} 0.215371 & -0.030348 & 0.305490 \\ 0.519432 & 0.503779 & -0.419173 \\ 0.534262 & 0.311021 & 0.011730 \\ 0.438715 & -0.787878 & -0.431352 \\ 0.453759 & -0.166729 & 0.738082 \end{bmatrix}}_{U}
\underbrace{\begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}}_{\Sigma}
\underbrace{\begin{bmatrix} 0.585051 & 0.652648 & 0.481418 \\ 0.710399 & -0.126068 & -0.692415 \\ 0.391212 & -0.747098 & 0.537398 \end{bmatrix}}_{V^T}
$$

Recall that we computed the reduced SVD factorization here (i.e. $\Sigma$ is square and $U$ is non-square).
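We can verify this worked example numerically. Sign conventions for the singular vectors may differ between libraries, but the singular values and the reconstruction must agree:

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

UR, sigma, VRt = np.linalg.svd(A, full_matrices=False)

# Singular values match the hand computation (to rounding)
assert np.allclose(sigma, [20.916, 6.53027, 4.22807], atol=1e-2)

# The reduced factorization reconstructs A
assert np.allclose(A, UR @ np.diag(sigma) @ VRt)
```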

Rank, null space and range of a matrix

Suppose $A$ is an $m \times n$ matrix, where $m > n$ (without loss of generality):

$$
A = U\Sigma V^T = \begin{bmatrix} | & & | & & | \\ u_1 & \cdots & u_n & \cdots & u_m \\ | & & | & & | \end{bmatrix}
\begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & 0 & \end{bmatrix}
\begin{bmatrix} v_1^T \\ \vdots \\ v_n^T \end{bmatrix}
$$

We can re-write the above as:

$$
A = \begin{bmatrix} | & & | \\ u_1 & \cdots & u_n \\ | & & | \end{bmatrix}
\begin{bmatrix} \sigma_1 v_1^T \\ \vdots \\ \sigma_n v_n^T \end{bmatrix}
$$

Furthermore, the product of two matrices can be written as a sum of outer products:

$$A = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_n u_n v_n^T$$

For a general rectangular matrix, we have:

$$A = \sum_{i=1}^{s} \sigma_i u_i v_i^T$$

where s=min(m,n).

If A has s non-zero singular values, the matrix is full rank, i.e. rank(A)=s.

If A has r non-zero singular values, and r<s, the matrix is rank deficient, i.e. rank(A)=r.

In other words, the rank of A equals the number of non-zero singular values which is the same as the number of non-zero diagonal elements in Σ.

Rounding errors may lead to small but non-zero singular values in a rank deficient matrix. Singular values that are smaller than a given tolerance are assumed to be numerically equivalent to zero, defining what is sometimes called the effective rank.
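A small sketch of this effective-rank idea (the matrix, noise level, and tolerance below are hypothetical):

```python
import numpy as np

# A rank-1 matrix plus tiny rounding-level noise
rng = np.random.default_rng(0)
A = np.outer([1., 2., 3.], [4., 5., 6.]) + 1e-12 * rng.standard_normal((3, 3))

sigma = np.linalg.svd(A, compute_uv=False)
tol = 1e-8

# All singular values are technically nonzero, but only one is significant
assert np.count_nonzero(sigma) == 3
assert np.sum(sigma > tol) == 1              # effective rank is 1
assert np.linalg.matrix_rank(A, tol=tol) == 1
```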

The right-singular vectors (columns of V) corresponding to vanishing singular values of A span the null space of A, i.e. null(A) = span{vr+1, vr+2, …, vn}.

The left-singular vectors (columns of U) corresponding to the non-zero singular values of A span the range of A, i.e. range(A) = span{u1, u2, …, ur}.
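A short NumPy sketch extracting these bases from the SVD (the matrix below is illustrative):

```python
import numpy as np

# An illustrative 3x3 matrix of rank 2 (third column = first + second)
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])

U, sigma, Vt = np.linalg.svd(A)
r = np.sum(sigma > 1e-10)              # numerical rank
assert r == 2

# Rows r..n-1 of Vt (i.e. trailing columns of V) span the null space
null_basis = Vt[r:].T
assert np.allclose(A @ null_basis, 0)

# The first r columns of U span the range: any A x lies in their span
x = np.array([1., 2., 3.])
y = A @ x
assert np.allclose(U[:, :r] @ (U[:, :r].T @ y), y)
```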

Example:

$$
A = \begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0 \\ \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} \sqrt{14} & 0 & 0 \\ 0 & \sqrt{14} & 0 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}
$$

The rank of A is 2.

The vectors $\begin{bmatrix} \frac{1}{\sqrt{2}} & \frac{1}{\sqrt{2}} & 0 & 0 \end{bmatrix}^T$ and $\begin{bmatrix} \frac{1}{\sqrt{2}} & -\frac{1}{\sqrt{2}} & 0 & 0 \end{bmatrix}^T$ provide an orthonormal basis for the range of $A$.

The vector $\begin{bmatrix} 0 & 0 & 1 \end{bmatrix}^T$ provides an orthonormal basis for the null space of $A$.

(Moore-Penrose) Pseudoinverse

If the matrix $\Sigma$ is rank deficient, it cannot be inverted. We define instead the pseudoinverse:

$$(\Sigma^+)_{ii} = \begin{cases} \dfrac{1}{\sigma_i} & \sigma_i \neq 0 \\ 0 & \sigma_i = 0 \end{cases}$$

For a general non-square matrix $A$ with known SVD ($A = U\Sigma V^T$), the pseudoinverse is defined as:

$$A^+ = V \Sigma^+ U^T$$

For example, if we consider an $m \times n$ full-rank matrix where $m > n$:

$$
A^+ = \begin{bmatrix} | & & | \\ v_1 & \cdots & v_n \\ | & & | \end{bmatrix}
\begin{bmatrix} 1/\sigma_1 & & & 0 & \cdots & 0 \\ & \ddots & & \vdots & & \vdots \\ & & 1/\sigma_n & 0 & \cdots & 0 \end{bmatrix}
\begin{bmatrix} | & & | & & | \\ u_1 & \cdots & u_n & \cdots & u_m \\ | & & | & & | \end{bmatrix}^T
$$
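A NumPy sketch of building the pseudoinverse from the SVD this way (the matrix is illustrative):

```python
import numpy as np

# An illustrative tall, full-rank matrix
A = np.array([[1., 0.],
              [0., 2.],
              [1., 1.]])

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Invert only the nonzero singular values to form Sigma+
sigma_plus = np.where(sigma > 1e-12, 1.0 / sigma, 0.0)
A_plus = Vt.T @ np.diag(sigma_plus) @ U.T   # A+ = V Sigma+ U^T

# Agrees with NumPy's pinv; for a full-rank tall matrix, A+ A = I
assert np.allclose(A_plus, np.linalg.pinv(A))
assert np.allclose(A_plus @ A, np.eye(2))
```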

Euclidean norm of matrices

The induced 2-norm of a matrix $A$ can be obtained using the SVD of the matrix:

$$
\|A\|_2 = \max_{\|x\|=1} \|Ax\| = \max_{\|x\|=1} \|U\Sigma V^T x\| = \max_{\|x\|=1} \|\Sigma V^T x\| = \max_{\|V^T x\|=1} \|\Sigma V^T x\| = \max_{\|y\|=1} \|\Sigma y\|
$$

And hence,

$$\|A\|_2 = \sigma_1$$

In the above equations, all the norms $\|\cdot\|$ refer to the Euclidean ($p=2$) norm, and we used the fact that $U$ and $V$ are orthogonal matrices, hence $\|U\|_2 = \|V\|_2 = 1$.

Example:

We begin with the following non-square matrix A:

$$A = \begin{bmatrix} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \end{bmatrix}.$$

The matrix of singular values, Σ, computed from the SVD factorization is:

$$\Sigma = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53027 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}.$$

Consequently the 2-norm of A is

$$\|A\|_2 = 20.916.$$
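This can be checked numerically with NumPy:

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

sigma = np.linalg.svd(A, compute_uv=False)

# The induced 2-norm equals the largest singular value
assert np.isclose(np.linalg.norm(A, 2), sigma[0])
assert np.isclose(sigma[0], 20.916, atol=1e-2)
```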

Euclidean norm of the inverse of matrices

Following the same derivation as above, we can show that for a full rank n×n matrix we have:

$$\|A^{-1}\|_2 = \frac{1}{\sigma_n}$$

where σn is the smallest singular value.

For non-square matrices, we can use the definition of the pseudoinverse (regardless of the rank):

$$\|A^+\|_2 = \frac{1}{\sigma_r}$$

where $\sigma_r$ is the smallest non-zero singular value. Note that for a full-rank square matrix, we have $\|A^+\|_2 = \|A^{-1}\|_2$. An exception to the definition above is the zero matrix, in which case $\|A^+\|_2 = 0$.
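A numerical sketch (the rank-deficient matrix below is illustrative):

```python
import numpy as np

# An illustrative rank-deficient matrix (third row repeats the first)
A = np.array([[1., 2., 3.],
              [4., 5., 6.],
              [1., 2., 3.]])

sigma = np.linalg.svd(A, compute_uv=False)
sigma_r = sigma[sigma > 1e-10][-1]     # smallest nonzero singular value

# ||A+||_2 = 1 / sigma_r
assert np.isclose(np.linalg.norm(np.linalg.pinv(A), 2), 1.0 / sigma_r)
```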

2-Norm Condition Number

The 2-norm condition number of a matrix A is given by the ratio of its largest singular value to its smallest singular value:

$$\text{cond}_2(A) = \|A\|_2 \|A^{-1}\|_2 = \frac{\sigma_{\max}}{\sigma_{\min}}.$$

If the matrix $A$ is rank deficient, i.e. $\text{rank}(A) < \min(m,n)$, then $\text{cond}_2(A) = \infty$.
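For example, with a diagonal matrix whose singular values can be read off directly:

```python
import numpy as np

# A diagonal matrix makes the singular values easy to read off
A = np.array([[3.0, 0.0],
              [0.0, 0.5]])

sigma = np.linalg.svd(A, compute_uv=False)

# cond_2(A) = sigma_max / sigma_min = 3 / 0.5 = 6
assert np.isclose(np.linalg.cond(A, 2), sigma[0] / sigma[-1])
assert np.isclose(np.linalg.cond(A, 2), 6.0)
```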

Low-rank Approximation

The best rank-$k$ approximation of an $m \times n$ matrix $A$, where $k < s = \min(m,n)$, with respect to some matrix norm $\|\cdot\|$, is the matrix $A_k$ that solves

$$\min_{A_k} \ \|A - A_k\| \quad \text{such that} \quad \text{rank}(A_k) \le k.$$

Under the induced 2-norm, the best rank-k approximation is given by the sum of the first k outer products of the left and right singular vectors scaled by the corresponding singular value (where, σ1σs):

$$A_k = \sigma_1 u_1 v_1^T + \sigma_2 u_2 v_2^T + \dots + \sigma_k u_k v_k^T$$

Observe that the norm of the difference between the best approximation and the matrix under the induced 2-norm condition is the magnitude of the (k+1)th singular value of the matrix:

$$\|A - A_k\|_2 = \left\| \sum_{i=k+1}^{s} \sigma_i u_i v_i^T \right\|_2 = \sigma_{k+1}$$

Note that the best rank-k approximation to A can be stored efficiently by only storing the k singular values σ1,,σk, the k left singular vectors u1,,uk, and the k right singular vectors v1,,vk.
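A sketch of computing the truncated sum and checking the error formula (the random matrix is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((8, 6))        # illustrative random matrix
k = 2

U, sigma, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k largest singular triplets
A_k = U[:, :k] @ np.diag(sigma[:k]) @ Vt[:k, :]
assert np.linalg.matrix_rank(A_k) == k

# The 2-norm error is exactly the (k+1)-th singular value
assert np.isclose(np.linalg.norm(A - A_k, 2), sigma[k])
```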

The figure below shows best rank-$k$ approximations of an image (you can find the code snippet that generates these images in the IPython notebook):

SVD Summary

Review Questions

ChangeLog