# Singular Value Decompositions

## Learning Objectives

• Construct an SVD of a matrix
• Identify pieces of an SVD
• Use an SVD to solve a problem

## Singular Value Decomposition

An $$m \times n$$ real matrix $${\bf A}$$ has a singular value decomposition of the form

$${\bf A} = {\bf U \Sigma V}^T$$

where

• $${\bf U}$$ is an $$m \times m$$ orthogonal matrix whose columns are eigenvectors of $${\bf A} {\bf A}^T$$. The columns of $${\bf U}$$ are called the left singular vectors of $${\bf A}$$.
• $${\bf \Sigma}$$ is an $$m \times n$$ diagonal matrix of the form:

$${\bf \Sigma} = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_s \\ & {\bf 0} & \end{bmatrix} \text{ for } m \ge n, \qquad {\bf \Sigma} = \begin{bmatrix} \sigma_1 & & & \\ & \ddots & & {\bf 0} \\ & & \sigma_s & \end{bmatrix} \text{ for } m < n,$$

where $s = \min(m,n)$ and $$\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_s \ge 0$$ are the square roots of the eigenvalues of $${\bf A}^T {\bf A}$$. The diagonal entries are called the singular values of $${\bf A}$$.

Note that if $\mathbf{A}\mathbf{A}^T\mathbf{x} = \lambda\mathbf{x}$ with $\mathbf{A}^T\mathbf{x} \ne 0$, then $\mathbf{A}^T\mathbf{A}$ and $\mathbf{A}\mathbf{A}^T$ share the same non-zero eigenvalues:

$\mathbf{A}\mathbf{A}^T\mathbf{x} = \lambda\mathbf{x} \quad\Longrightarrow\quad \mathbf{A}^T\mathbf{A}\,(\mathbf{A}^T\mathbf{x}) = \lambda\,(\mathbf{A}^T\mathbf{x}) \quad \text{(left-multiply both sides by } \mathbf{A}^T\text{)}$

• $${\bf V}$$ is an $$n \times n$$ orthogonal matrix whose columns are eigenvectors of $${\bf A}^T {\bf A}$$. The columns of $${\bf V}$$ are called the right singular vectors of $${\bf A}$$.
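
As a quick check of these definitions, here is a minimal NumPy sketch (assuming a standard `numpy` installation; the matrix is the $5 \times 3$ example used later in this section):

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

# Full SVD: U is m x m, Vt is n x n, s holds the singular values
U, s, Vt = np.linalg.svd(A, full_matrices=True)

# Rebuild the m x n diagonal Sigma and verify A = U Sigma V^T
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)
assert np.allclose(A, U @ Sigma @ Vt)

# The squared singular values are the eigenvalues of A^T A
assert np.allclose(np.sort(s**2), np.linalg.eigvalsh(A.T @ A))
```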

## Time Complexity

The time complexity for computing the SVD factorization of an arbitrary $$m \times n$$ matrix is proportional to $m^2n + n^3$, where the constant of proportionality ranges from 4 to 10 (or more) depending on the algorithm.

In general, we can define the cost as:

$\text{cost} = O(m^2n + n^3)$
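
As a rough empirical illustration of this growth (timings are machine- and library-dependent; the sizes below are arbitrary), doubling $n$ with $m = 2n$ should increase the runtime by roughly a factor of 8:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
for n in (200, 400, 800):
    A = rng.standard_normal((2 * n, n))      # m = 2n rectangular matrix
    t0 = time.perf_counter()
    np.linalg.svd(A, full_matrices=False)
    print(n, time.perf_counter() - t0)       # expect roughly cubic growth in n
```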

## Reduced SVD

The SVD factorization of a non-square matrix $${\bf A}$$ of size $$m \times n$$ can be represented in a reduced format:

• For $$m \ge n$$: $${\bf U}$$ is $$m \times n$$, $${\bf \Sigma}$$ is $$n \times n$$, and $${\bf V}$$ is $$n \times n$$
• For $$m \le n$$: $${\bf U}$$ is $$m \times m$$, $${\bf \Sigma}$$ is $$m \times m$$, and $${\bf V}$$ is $$n \times m$$ (note if $${\bf V}$$ is $$n \times m$$, then $${\bf V}^T$$ is $$m \times n$$)

The following figure depicts the reduced SVD factorization (in red) against the full SVD factorizations (in gray).

In general, we will represent the reduced SVD as:

$${\bf A} = {\bf U}_R {\bf \Sigma}_R {\bf V}_R^T$$

where ${\bf U}_R$ is an $m \times s$ matrix, ${\bf V}_R$ is an $n \times s$ matrix, ${\bf \Sigma}_R$ is an $s \times s$ matrix, and $s = \min(m,n)$.
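
In NumPy, the reduced factors can be obtained directly by passing `full_matrices=False` (a random matrix is used here purely for illustration):

```python
import numpy as np

A = np.random.rand(5, 3)  # m = 5, n = 3, so s = min(m, n) = 3

# full_matrices=False returns U_R (m x s), the singular values, and V_R^T (s x n)
U_R, sigma, Vt_R = np.linalg.svd(A, full_matrices=False)
print(U_R.shape, sigma.shape, Vt_R.shape)  # (5, 3) (3,) (3, 3)

# A is recovered exactly from the reduced factors
assert np.allclose(A, U_R @ np.diag(sigma) @ Vt_R)
```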

## Example: Computing the SVD

We begin with the following non-square matrix, $${\bf A}$$

${\bf A} = \left[ \begin{array}{ccc} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \\ \end{array} \right]$

and we will compute the reduced form of the SVD (where here $s = 3$):

(1) Compute $${\bf A}^T {\bf A}$$:

${\bf A}^T {\bf A} = \left[ \begin{array}{ccc} 174 & 158 & 106 \\ 158 & 197 & 134 \\ 106 & 134 & 127 \\ \end{array} \right]$

(2) Compute the eigenvectors and eigenvalues of $${\bf A}^T {\bf A}$$, ordered so that $\lambda_1 \ge \lambda_2 \ge \lambda_3$:

$\lambda_1 \approx 437.48, \quad \lambda_2 \approx 42.67, \quad \lambda_3 \approx 17.88,$

with the corresponding (normalized) eigenvectors forming the columns of ${\bf V}_R$ below.

(3) Construct $${\bf V}_R$$ from the eigenvectors of $${\bf A}^T {\bf A}$$:

${\bf V}_R = \left[ \begin{array}{ccc} 0.585051 & -0.710399 & 0.391212 \\ 0.652648 & 0.126068 & -0.747098 \\ 0.481418 & 0.692415 & 0.537398 \\ \end{array} \right].$

(4) Construct $${\bf \Sigma}_R$$ from the square roots of the eigenvalues of $${\bf A}^T {\bf A}$$:

${\bf \Sigma}_R = \begin{bmatrix} 20.916 & 0 & 0 \\ 0 & 6.53207 & 0 \\ 0 & 0 & 4.22807 \end{bmatrix}$

(5) Find $${\bf U}$$ by solving $${\bf U}{\bf\Sigma} = {\bf A}{\bf V}$$. For our reduced case, we can find $${\bf U}_R = {\bf A}{\bf V}_R {\bf \Sigma}_R^{-1}$$. You could also find $${\bf U}$$ by computing the eigenvectors of $${\bf AA}^T$$.

We obtain the following singular value decomposition for $${\bf A}$$:

$\overbrace{\left[ \begin{array}{ccc} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \\ \end{array} \right]}^{A} = \overbrace{\left[ \begin{array}{ccc} 0.215371 & 0.030348 & 0.305490 \\ 0.519432 & -0.503779 & -0.419173 \\ 0.534262 & -0.311021 & 0.011730 \\ 0.438715 & 0.787878 & -0.431352\\ 0.453759 & 0.166729 & 0.738082\\ \end{array} \right]}^{U} \overbrace{\left[ \begin{array}{ccc} 20.916 & 0 & 0 \\ 0 & 6.53207 & 0 \\ 0 & 0 & 4.22807 \\ \end{array} \right]}^{\Sigma} \overbrace{\left[ \begin{array}{ccc} 0.585051 & 0.652648 & 0.481418 \\ -0.710399 & 0.126068 & 0.692415\\ 0.391212 & -0.747098 & 0.537398\\ \end{array} \right]}^{V^T}$

Recall that we computed the reduced SVD factorization (i.e. $${\bf \Sigma}$$ is square, $${\bf U}$$ is non-square) here.
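
The five steps above can be reproduced with a short NumPy sketch (using `numpy.linalg.eigh` for the symmetric matrix $\bf{A}^T\bf{A}$; note that eigenvector signs may differ from the factors printed above, since singular vectors are only unique up to sign):

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

# (1)-(2): eigenvalues/eigenvectors of A^T A (eigh returns ascending order)
eigvals, eigvecs = np.linalg.eigh(A.T @ A)

# (3)-(4): reorder to descending eigenvalues to build V_R and Sigma_R
order = np.argsort(eigvals)[::-1]
V_R = eigvecs[:, order]
Sigma_R = np.diag(np.sqrt(eigvals[order]))

# (5): U_R = A V_R Sigma_R^{-1}
U_R = A @ V_R @ np.linalg.inv(Sigma_R)

assert np.allclose(A, U_R @ Sigma_R @ V_R.T)
```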

## Rank, null space and range of a matrix

Suppose ${\bf A}$ is an $m \times n$ matrix where $m > n$ (without loss of generality):

${\bf A} = {\bf U \Sigma V}^T = \begin{bmatrix} {\bf u}_1 & \dots & {\bf u}_m \end{bmatrix} \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & {\bf 0} & \end{bmatrix} \begin{bmatrix} {\bf v}_1^T \\ \vdots \\ {\bf v}_n^T \end{bmatrix}$

We can re-write the above as:

${\bf A} = \begin{bmatrix} {\bf u}_1 & \dots & {\bf u}_n \end{bmatrix} \begin{bmatrix} \sigma_1 {\bf v}_1^T \\ \vdots \\ \sigma_n {\bf v}_n^T \end{bmatrix}$

Furthermore, the product of two matrices can be written as a sum of outer products:

${\bf A} = \sigma_1 {\bf u}_1 {\bf v}_1^T + \sigma_2 {\bf u}_2 {\bf v}_2^T + \dots + \sigma_n {\bf u}_n {\bf v}_n^T$

For a general rectangular matrix, we have:

${\bf A} = \sum_{i=1}^{s} \sigma_i {\bf u}_i {\bf v}_i^T$

where $s = \min(m,n)$.

If ${\bf A}$ has $s$ non-zero singular values, the matrix is full rank, i.e. $\text{rank}({\bf A}) = s$.

If ${\bf A}$ has $r$ non-zero singular values, where $r < s$, the matrix is rank deficient, i.e. $\text{rank}({\bf A}) = r$.

In other words, the rank of ${\bf A}$ equals the number of non-zero singular values which is the same as the number of non-zero diagonal elements in ${\bf \Sigma}$.

Rounding errors may lead to small but non-zero singular values in a rank deficient matrix. Singular values that are smaller than a given tolerance are assumed to be numerically equivalent to zero, defining what is sometimes called the effective rank.

The right-singular vectors (columns of ${\bf V}$) corresponding to vanishing singular values of ${\bf A}$ span the null space of ${\bf A}$, i.e. null(${\bf A}$) = span{${\bf v}_{r+1}$, ${\bf v}_{r+2}$, …, ${\bf v}_{n}$}.

The left-singular vectors (columns of ${\bf U}$) corresponding to the non-zero singular values of ${\bf A}$ span the range of ${\bf A}$, i.e. range(${\bf A}$) = span{${\bf u}_{1}$, ${\bf u}_{2}$, …, ${\bf u}_{r}$}.
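
These facts give a practical recipe for extracting the effective rank, range, and null space from a computed SVD. A minimal NumPy sketch (the $4 \times 3$ matrix below is a hypothetical rank-2 example, chosen only for illustration):

```python
import numpy as np

A = np.array([[1., -1., 0.],
              [1.,  1., 0.],
              [0.,  0., 0.],
              [0.,  0., 0.]])  # hypothetical rank-deficient matrix

U, s, Vt = np.linalg.svd(A)

tol = max(A.shape) * np.finfo(float).eps * s[0]  # a common tolerance choice
r = int(np.sum(s > tol))                         # effective rank

range_basis = U[:, :r]     # left singular vectors u_1, ..., u_r
null_basis  = Vt[r:, :].T  # right singular vectors v_{r+1}, ..., v_n
print(r, range_basis.shape, null_basis.shape)    # 2 (4, 2) (3, 1)
```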

#### Example:

Consider a $4 \times 3$ matrix ${\bf A}$ with exactly two non-zero singular values. The rank of ${\bf A}$ is 2.

The vectors $\left[ \begin{array}{c} \frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{array} \right]$ and $\left[ \begin{array}{c} -\frac{1}{\sqrt{2}} \\ \frac{1}{\sqrt{2}} \\ 0 \\ 0 \end{array} \right]$ provide an orthonormal basis for the range of ${\bf A}$.

The vector $\left[ \begin{array}{c} 0 \\ 0\\ 1 \end{array} \right]$ provides an orthonormal basis for the null space of ${\bf A}$.

## (Moore-Penrose) Pseudoinverse

If the matrix ${\bf \Sigma}$ is rank deficient, we cannot get its inverse. We define instead the pseudoinverse ${\bf \Sigma}^{+}$, which inverts only the non-zero singular values:

$({\bf \Sigma}^{+})_{ii} = \begin{cases} \frac{1}{\sigma_i} & \text{if } \sigma_i \ne 0 \\ 0 & \text{otherwise} \end{cases}$

For a general non-square matrix $${\bf A}$$ with known SVD ($${\bf A} = {\bf U\Sigma V}^T$$), the pseudoinverse is defined as:

$${\bf A}^{+} = {\bf V}{\bf \Sigma}^{+}{\bf U}^T$$

For example, if we consider an $m \times n$ full rank matrix where $m > n$:

${\bf \Sigma} = \begin{bmatrix} \sigma_1 & & \\ & \ddots & \\ & & \sigma_n \\ & {\bf 0} & \end{bmatrix} \quad \Longrightarrow \quad {\bf \Sigma}^{+} = \begin{bmatrix} \frac{1}{\sigma_1} & & & \\ & \ddots & & {\bf 0} \\ & & \frac{1}{\sigma_n} & \end{bmatrix}$
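
The definition translates directly into NumPy; the manual construction below should agree with the built-in `numpy.linalg.pinv` (the cutoff `1e-12` is an arbitrary illustrative tolerance):

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Sigma^+: invert only the non-zero singular values
s_plus = np.zeros_like(s)
nonzero = s > 1e-12
s_plus[nonzero] = 1.0 / s[nonzero]

# A^+ = V Sigma^+ U^T
A_plus = Vt.T @ np.diag(s_plus) @ U.T
assert np.allclose(A_plus, np.linalg.pinv(A))
```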

## Euclidean norm of matrices

The induced 2-norm of a matrix ${\bf A}$ can be obtained using the SVD of the matrix:

$\begin{aligned} \|{\bf A}\|_2 &= \max_{\|{\bf x}\|_2 = 1} \|{\bf A}{\bf x}\|_2 = \max_{\|{\bf x}\|_2 = 1} \|{\bf U \Sigma V}^T{\bf x}\|_2 = \max_{\|{\bf x}\|_2 = 1} \|{\bf \Sigma V}^T{\bf x}\|_2 \\ &= \max_{\|{\bf y}\|_2 = 1} \|{\bf \Sigma}{\bf y}\|_2 \qquad ({\bf y} = {\bf V}^T{\bf x}) \end{aligned}$

And hence,

$\|{\bf A}\|_2 = \max_{i} \sigma_i = \sigma_1,$

the largest singular value of ${\bf A}$.

In the above equations, all the notations for the norm $\| . \|$ refer to the $p=2$ Euclidean norm, and we used the fact that ${\bf U}$ and ${\bf V}$ are orthogonal matrices and hence $\|{\bf U}\|_2 = \|{\bf V}\|_2 = 1$.

#### Example:

We begin with the same non-square matrix ${\bf A}$ as in the SVD example above:

${\bf A} = \left[ \begin{array}{ccc} 3 & 2 & 3 \\ 8 & 8 & 2 \\ 8 & 7 & 4 \\ 1 & 8 & 7 \\ 6 & 4 & 7 \\ \end{array} \right]$

The matrix of singular values, $${\bf \Sigma}$$, computed from the SVD factorization is:

$\Sigma = \left[ \begin{array}{ccc} 20.916 & 0 & 0 \\ 0 & 6.53207 & 0 \\ 0 & 0 & 4.22807 \\ \end{array} \right].$

Consequently the 2-norm of $${\bf A}$$ is

$\|{\bf A}\|_2 = 20.916.$
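
This agrees with NumPy's induced 2-norm, which is itself computed from the singular values:

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

sigma = np.linalg.svd(A, compute_uv=False)  # singular values only
print(sigma[0])                             # 20.916..., the largest singular value
print(np.linalg.norm(A, 2))                 # the induced 2-norm agrees
```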

## Euclidean norm of the inverse of matrices

Following the same derivation as above, we can show that for a full rank $n \times n$ matrix we have:

$\|{\bf A}^{-1}\|_2 = \frac{1}{\sigma_n}$

where ${\sigma_n}$ is the smallest singular value.

For non-square matrices, we can use the definition of the pseudoinverse (regardless of the rank):

$\|{\bf A}^{+}\|_2 = \frac{1}{\sigma_r}$

where ${\sigma_r}$ is the smallest non-zero singular value. Note that for a full rank square matrix, we have $\| {\bf A}^{+} \|_2 = \| {\bf A}^{-1} \|_2$. An exception to the definition above is the zero matrix, in which case $\| {\bf A}^{+} \|_2 = 0$.
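
A quick numerical check of this identity (reusing the full rank $5 \times 3$ example, where the smallest non-zero singular value is $\sigma_3 \approx 4.22807$):

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

sigma = np.linalg.svd(A, compute_uv=False)
sigma_r = sigma[sigma > 1e-12][-1]  # smallest non-zero singular value

# ||A^+||_2 = 1 / sigma_r
assert np.isclose(np.linalg.norm(np.linalg.pinv(A), 2), 1.0 / sigma_r)
print(1.0 / sigma_r)  # approximately 1 / 4.22807
```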

## 2-Norm Condition Number

The 2-norm condition number of a matrix $${\bf A}$$ is given by the ratio of its largest singular value to its smallest singular value:

$\text{cond}_2({\bf A}) = \frac{\sigma_{\max}}{\sigma_{\min}}.$

If the matrix ${\bf A}$ is rank deficient, i.e. $\text{rank}({\bf A}) < \min(m,n)$, then $\text{cond}_2({\bf A}) = \infty$.
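
The ratio can be checked against NumPy's built-in condition number (a random square matrix is used for illustration):

```python
import numpy as np

A = np.random.rand(4, 4)
sigma = np.linalg.svd(A, compute_uv=False)

cond = sigma[0] / sigma[-1]  # sigma_max / sigma_min
assert np.isclose(cond, np.linalg.cond(A, 2))
print(cond)
```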

## Low-rank Approximation

The best rank-$k$ approximation for an $m \times n$ matrix ${\bf A}$, where $k < s = \min(m,n)$, for some matrix norm $\|.\|$, is one that minimizes the following problem:

$\begin{aligned} &\min_{{\bf A}_k} \ \|{\bf A} - {\bf A}_k\| \\ &\text{such that} \quad \text{rank}({\bf A}_k) \le k. \end{aligned}$

Under the induced $$2$$-norm, the best rank-$$k$$ approximation is given by the sum of the first $$k$$ outer products of the left and right singular vectors scaled by the corresponding singular value (where $$\sigma_1 \ge \dots \ge \sigma_s$$):

${\bf A}_k = \sigma_1 {\bf u}_1 {\bf v}_1^T + \dots + \sigma_k {\bf u}_k {\bf v}_k^T = \sum_{i=1}^{k} \sigma_i {\bf u}_i {\bf v}_i^T$

Observe that the norm of the difference between the best approximation and the matrix under the induced $$2$$-norm condition is the magnitude of the $$(k+1)^\text{th}$$ singular value of the matrix:

$\|{\bf A} - {\bf A}_k\|_2 = \left\| \sum_{i=k+1}^{s} \sigma_i {\bf u}_i {\bf v}_i^T \right\|_2 = \sigma_{k+1}$

Note that the best rank-${k}$ approximation to ${\bf A}$ can be stored efficiently by only storing the ${k}$ singular values ${\sigma_1,\dots,\sigma_k}$, the ${k}$ left singular vectors ${\bf u_1,\dots,\bf u_k}$, and the ${k}$ right singular vectors ${\bf v_1,\dots, \bf v_k}$.
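
A short NumPy sketch of the rank-$k$ truncation, verifying the error identity above on the $5 \times 3$ example matrix:

```python
import numpy as np

A = np.array([[3., 2., 3.],
              [8., 8., 2.],
              [8., 7., 4.],
              [1., 8., 7.],
              [6., 4., 7.]])

U, s, Vt = np.linalg.svd(A, full_matrices=False)

k = 1
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]  # sum of the first k outer products

# The 2-norm error equals the (k+1)-th singular value
assert np.isclose(np.linalg.norm(A - A_k, 2), s[k])
```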

The figure below shows the best rank-$$k$$ approximations of an image (you can find the code snippet that generates these images in the IPython notebook):