Eigenvalues and Eigenvectors

Learning Objectives

Compute eigenvalues and eigenvectors for various applications.
Use the Power Method to find an eigenvector.

Eigenvalues and Eigenvectors

An eigenvalue of an $n \times n$ matrix $A$ is a scalar $λ$ such that $A x = λ x$ for some non-zero vector $x$. The eigenvalue $λ$ can be any real or complex scalar (which we write $λ \in \mathbb{R}$ or $λ \in \mathbb{C}$). Eigenvalues can be complex even if all the entries of the matrix $A$ are real. In this case, the corresponding vector $x$ must have complex-valued components (which we write $x \in \mathbb{C}^{n}$).
The equation $A x = λ x$ is called the eigenvalue equation and any such non-zero vector $x$ is called an eigenvector of $A$ corresponding to $λ$.

The eigenvalue equation can be rearranged to $(A - λ I) x = 0$, and because $x$ is not zero this has solutions if and only if $λ$ is a solution of the characteristic equation:

$$\det(A - λ I) = 0.$$

The expression $p(λ) = \det(A - λ I)$ is called the characteristic polynomial and is a polynomial of degree $n$.

Although all eigenvalues can be found by solving the characteristic equation, there is no general, closed-form analytical solution for the roots of polynomials of degree $n \geq 5$, and this is not a good numerical approach for finding eigenvalues.
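As a small illustration of these definitions, the sketch below uses NumPy's `np.linalg.eig` to compute eigenpairs and then checks both the eigenvalue equation and the characteristic equation. The two matrices are arbitrary choices made here for illustration; the rotation matrix is included to show a real matrix with complex eigenvalues.

```python
import numpy as np

# Illustrative matrices (chosen here, not taken from the notes).
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])   # rotation by 90 degrees: real entries, complex eigenvalues

# np.linalg.eig returns the eigenvalues and a matrix whose columns are eigenvectors.
eigvals, eigvecs = np.linalg.eig(A)

# Residual of the eigenvalue equation A x = lambda x for each eigenpair.
residuals = [np.linalg.norm(A @ eigvecs[:, i] - eigvals[i] * eigvecs[:, i])
             for i in range(A.shape[0])]

# Each eigenvalue should also be a root of p(lambda) = det(A - lambda I).
char_poly_values = [np.linalg.det(A - lam * np.eye(2)) for lam in eigvals]

rot_eigvals, _ = np.linalg.eig(R)   # a complex-conjugate pair on the imaginary axis
print(eigvals, residuals, char_poly_values, rot_eigvals)
```

The residuals and characteristic polynomial values are only zero up to rounding error, which is why they are compared against a small tolerance rather than tested for exact equality.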

Unless otherwise specified, we write eigenvalues ordered by magnitude, so that

$$| λ_{1} | \geq | λ_{2} | \geq \dots \geq | λ_{n} |,$$

and we normalize eigenvectors, so that $‖ x ‖ = 1$.

Eigenvalues of a Shifted Matrix

Given a matrix $A$, for any constant scalar $σ$, we define the shifted matrix as $A - σ I$. If $λ$ is an eigenvalue of $A$ with eigenvector $x$, then $λ - σ$ is an eigenvalue of the shifted matrix with the same eigenvector. This can be derived by

$$\begin{aligned} (A - σ I) x &= A x - σ I x \\ &= λ x - σ x \\ &= (λ - σ) x. \end{aligned}$$

Eigenvalues of an Inverse

An invertible matrix cannot have an eigenvalue equal to zero. Furthermore, the eigenvalues of the inverse matrix are the reciprocals of the eigenvalues of the original matrix:

$$A x = λ x \implies A^{-1} A x = λ A^{-1} x \implies x = λ A^{-1} x \implies A^{-1} x = \frac{1}{λ} x.$$

Eigenvalues of a Shifted Inverse

Similarly, we can describe the eigenvalues for shifted inverse matrices as:

$$(A - σ I)^{-1} x = \frac{1}{λ - σ} x.$$

Note that the eigenvectors remain unchanged for shifted and/or inverted matrices.
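The three relationships above can be checked numerically. The following is a minimal sketch, assuming an illustrative $2 \times 2$ matrix and an illustrative shift `sigma` that is not an eigenvalue (both chosen here, not taken from the notes).

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])      # illustrative matrix with eigenvalues 2 and 5
sigma = 1.0                     # illustrative shift (not an eigenvalue of A)
I = np.eye(2)

# Eigenvalues of A, the shifted matrix, the inverse, and the shifted inverse.
lam = np.sort(np.linalg.eigvals(A))
lam_shifted = np.sort(np.linalg.eigvals(A - sigma * I))
lam_inverse = np.sort(np.linalg.eigvals(np.linalg.inv(A)))
lam_shifted_inverse = np.sort(np.linalg.eigvals(np.linalg.inv(A - sigma * I)))

# The eigenvector is unchanged: (A - sigma I)^{-1} x = x / (lambda - sigma).
w, V = np.linalg.eig(A)
x, l = V[:, 0], w[0]
lhs = np.linalg.solve(A - sigma * I, x)   # applies the shifted inverse without forming it
rhs = x / (l - sigma)
```

Sorting is used only so that the eigenvalue lists can be compared entry by entry; `np.linalg.eigvals` does not guarantee any particular ordering.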

Diagonalizability

An $n \times n$ matrix $A$ with $n$ linearly independent eigenvectors can be expressed in terms of its eigenvalues and eigenvectors. Collecting the eigenvectors $x_{1}, \dots, x_{n}$ as the columns of a matrix $X$ and the eigenvalues $λ_{1}, \dots, λ_{n}$ on the diagonal of a diagonal matrix $D$, the $n$ eigenvalue equations $A x_{i} = λ_{i} x_{i}$ can be written together as $A X = X D$.

The eigenvector matrix can be inverted to obtain the following similarity transformation of $A$:

$$A X = X D \iff A = X D X^{-1} \iff X^{-1} A X = D.$$

Multiplying the matrix $A$ by $X^{-1}$ on the left and $X$ on the right transforms it into a diagonal matrix; it has been "diagonalized".
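The similarity transformation can be verified numerically. This is a sketch under the assumption of an illustrative matrix with distinct eigenvalues (chosen here); the variable names are likewise chosen for this example.

```python
import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])      # illustrative matrix with distinct eigenvalues 2 and 5

eigvals, X = np.linalg.eig(A)   # columns of X are eigenvectors of A
D = np.diag(eigvals)            # eigenvalues on the diagonal, in matching order

# X^{-1} A X, computed by solving X Z = A X rather than inverting X explicitly.
D_from_similarity = np.linalg.solve(X, A @ X)

# A = X D X^{-1}, here with an explicit inverse for readability.
A_reconstructed = X @ D @ np.linalg.inv(X)
```

Solving a linear system instead of forming $X^{-1}$ is a common numerical habit; for a matrix this small either approach gives the same result up to rounding.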

Example: Matrix that is diagonalizable

An $n \times n$ matrix is diagonalizable if and only if it has $n$ linearly independent eigenvectors.
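As an illustration of this criterion (the matrix is chosen here for convenience and is not necessarily the example used in the original notes), consider

$$A = \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix}, \quad X = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}, \quad D = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix}.$$

The columns of $X$ are linearly independent eigenvectors of $A$ with eigenvalues $3$ and $1$ (left unnormalized to keep the arithmetic readable), and

$$X^{-1} A X = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 2 & 1 \\ 1 & 2 \end{bmatrix} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} = \frac{1}{2} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} 3 & 1 \\ 3 & -1 \end{bmatrix} = \begin{bmatrix} 3 & 0 \\ 0 & 1 \end{bmatrix} = D.$$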

Example: Matrix that is not diagonalizable

An $n \times n$ matrix $A$ that does not have $n$ linearly independent eigenvectors is not diagonalizable. For example, while it is true that

$$\overbrace{\begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}}^{A} \overbrace{\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}}^{X} = \overbrace{\begin{bmatrix} 1 & 1 \\ 0 & 0 \end{bmatrix}}^{X} \overbrace{\begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}}^{D},$$

the matrix $X$ does not have an inverse, so we cannot diagonalize $A$ by applying an inverse. In fact, for any non-singular matrix $P$, the product $P^{-1} A P$ is not diagonal.
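In floating-point arithmetic, one rough way to see this is to look at the condition number of the eigenvector matrix returned by `np.linalg.eig`: for a non-diagonalizable matrix it is enormous or infinite. The sketch below is a heuristic only, since rounding prevents an exact test of diagonalizability; the helper name `eigenvector_condition` and the second matrix are assumptions made for this example.

```python
import numpy as np

def eigenvector_condition(A):
    """Condition number of the eigenvector matrix returned by np.linalg.eig."""
    _, X = np.linalg.eig(A)
    return np.linalg.cond(X)

A_defective = np.array([[1.0, 1.0],
                        [0.0, 1.0]])   # the non-diagonalizable matrix from the text
A_symmetric = np.array([[2.0, 1.0],
                        [1.0, 2.0]])   # illustrative diagonalizable matrix

cond_defective = eigenvector_condition(A_defective)   # huge: columns are (nearly) parallel
cond_symmetric = eigenvector_condition(A_symmetric)   # small: columns are orthogonal
print(cond_defective, cond_symmetric)
```

A very large condition number indicates that the computed eigenvectors are numerically linearly dependent, so the eigenvector matrix cannot be inverted reliably.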

Expressing an Arbitrary Vector as a Linear Combination of Eigenvectors

If an $n \times n$ matrix $A$ is diagonalizable, then we can write an arbitrary vector as a linear combination of the eigenvectors of $A$. Let $u_{1}, u_{2}, \dots, u_{n}$ be $n$ linearly independent eigenvectors of $A$; then an arbitrary vector $x_{0}$ can be written:

$$x_{0} = α_{1} u_{1} + α_{2} u_{2} + \dots + α_{n} u_{n}.$$

If we apply the matrix $A$ to $x_{0}$:

$$\begin{aligned} A x_{0} &= α_{1} A u_{1} + α_{2} A u_{2} + \dots + α_{n} A u_{n} \\ &= α_{1} λ_{1} u_{1} + α_{2} λ_{2} u_{2} + \dots + α_{n} λ_{n} u_{n} \\ &= λ_{1} \left(α_{1} u_{1} + α_{2} \frac{λ_{2}}{λ_{1}} u_{2} + \dots + α_{n} \frac{λ_{n}}{λ_{1}} u_{n}\right). \end{aligned}$$

If we repeatedly apply $A$ we have

$$A^{k} x_{0} = λ_{1}^{k} \left(α_{1} u_{1} + α_{2} \left(\frac{λ_{2}}{λ_{1}}\right)^{k} u_{2} + \dots + α_{n} \left(\frac{λ_{n}}{λ_{1}}\right)^{k} u_{n}\right).$$

In the case where one eigenvalue has magnitude that is strictly greater than all the others, i.e.

$| λ_{1} | > | λ_{2} | \geq | λ_{3} | \geq \dots \geq | λ_{n} |$,

this implies

$$\lim_{k \to \infty} \frac{A^{k} x_{0}}{λ_{1}^{k}} = α_{1} u_{1}.$$
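This limit can be observed numerically. The sketch below assumes an illustrative symmetric matrix with known eigenpairs and illustrative coefficients `alpha1` and `alpha2` (all chosen here), builds $x_{0}$ from them, and compares $A^{k} x_{0} / λ_{1}^{k}$ with $α_{1} u_{1}$.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                   # illustrative matrix: lambda_1 = 3, lambda_2 = 1
u1 = np.array([1.0, 1.0]) / np.sqrt(2.0)     # eigenvector for lambda_1
u2 = np.array([1.0, -1.0]) / np.sqrt(2.0)    # eigenvector for lambda_2
alpha1, alpha2 = 0.5, 2.0                    # illustrative expansion coefficients
x0 = alpha1 * u1 + alpha2 * u2

k = 40
scaled = np.linalg.matrix_power(A, k) @ x0 / 3.0**k   # A^k x0 / lambda_1^k
error = np.linalg.norm(scaled - alpha1 * u1)          # remaining u_2 component, about alpha2 / 3^k
print(error)
```

Even though the initial vector is dominated by $u_{2}$ (since `alpha2` is larger than `alpha1`), the factor $(λ_{2} / λ_{1})^{k}$ removes that component after enough applications of $A$.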

This observation motivates the algorithm known as power iteration, which is the topic of the next section.

Power Iteration algorithm

For a matrix $A$, power iteration will find a scalar multiple of an eigenvector $u_{1}$ corresponding to the dominant eigenvalue (largest in magnitude) $λ_{1}$, provided that $| λ_{1} |$ is strictly greater than the magnitude of the other eigenvalues, i.e., $| λ_{1} | > | λ_{2} | \geq \dots \geq | λ_{n} |$.

Suppose

$$x_{0} = α_{1} u_{1} + α_{2} u_{2} + \dots + α_{n} u_{n}, \quad \text{with } α_{1} \neq 0.$$

From the previous section, the iterative sequence

$$x_{k} = A x_{k-1} \quad \text{for } k = 1, 2, 3, \dots$$

satisfies

$$x_{k} = A^{k} x_{0} \implies \lim_{k \to \infty} \frac{x_{k}}{λ_{1}^{k}} = α_{1} u_{1}.$$

Thus, for large $k$, $x_{k} \approx λ_{1}^{k} α_{1} u_{1}$. Unfortunately, this means that $‖ x_{k} ‖ \approx | λ_{1} |^{k} \cdot ‖ α_{1} u_{1} ‖$, which will be very large if $| λ_{1} | > 1$, or very small if $| λ_{1} | < 1$. For this reason, we use normalized power iteration.
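The growth of $‖ x_{k} ‖$ is easy to reproduce. The following sketch, which assumes an illustrative matrix with dominant eigenvalue 3 and an illustrative starting vector, runs the unnormalized iteration long enough for the iterate to overflow double precision.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])     # illustrative matrix with dominant eigenvalue 3
x = np.array([1.0, 0.0])       # illustrative initial guess

norms = {}
with np.errstate(over="ignore"):      # silence overflow warnings for this demonstration
    for k in range(1, 701):
        x = A @ x                     # no normalization
        if k in (10, 100, 700):
            norms[k] = np.linalg.norm(x)

print(norms)
```

The recorded norms grow roughly like $3^{k}$, and by $k = 700$ the entries exceed the largest representable double and become `inf`, at which point the direction information is lost.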

Normalized power iteration is given by the following. Let $x_{0}$ be a vector with unit norm, $‖ x_{0} ‖ = 1$ (any norm is fine), with $x_{0} = α_{1} u_{1} + α_{2} u_{2} + \dots + α_{n} u_{n}$ and $α_{1} \neq 0$.

Normalized power iteration is defined by the following iterative sequence for $k = 1, 2, 3, \dots$ :

$$\begin{aligned} y_{k} &= A x_{k-1}, \\ x_{k} &= \frac{y_{k}}{‖ y_{k} ‖}, \end{aligned}$$

where the norm $‖ \cdot ‖$ is identical to the norm used when we assumed $‖ x_{0} ‖ = 1$ .

It can be shown that this sequence satisfies

$$x_{k} = \frac{A^{k} x_{0}}{‖ A^{k} x_{0} ‖}.$$

This means that for large values of $k$, we have

$$x_{k} \approx \left(\frac{λ_{1}}{| λ_{1} |}\right)^{k} \cdot \frac{α_{1} u_{1}}{‖ α_{1} u_{1} ‖}.$$

The largest eigenvalue could be positive, negative, or a complex number. In each case we will have:

$$\begin{aligned} λ_{1} > 0 &\implies x_{k} \approx \frac{α_{1} u_{1}}{‖ α_{1} u_{1} ‖}, && \text{so } x_{k} \text{ converges;} \\ λ_{1} < 0 &\implies x_{k} \approx (-1)^{k} \frac{α_{1} u_{1}}{‖ α_{1} u_{1} ‖}, && \text{so in the limit } x_{k} \text{ alternates between } \pm \frac{α_{1} u_{1}}{‖ α_{1} u_{1} ‖}; \\ λ_{1} = r e^{i θ} &\implies x_{k} \approx e^{i k θ} \frac{α_{1} u_{1}}{‖ α_{1} u_{1} ‖}, && \text{so in the limit } x_{k} \text{ is a scalar multiple of } u_{1} \text{ with a coefficient that rotates around the unit circle.} \end{aligned}$$

Strictly speaking, normalized power iteration only converges to a single vector if $λ_{1} > 0$, but $x_{k}$ will be close to a scalar multiple of the eigenvector $u_{1}$ for large values of $k$, regardless of whether the dominant eigenvalue is positive, negative, or complex. So normalized power iteration will work for any value of $λ_{1}$, as long as it is strictly larger in magnitude than the other eigenvalues.
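The sign alternation for a negative dominant eigenvalue can be seen in a short experiment. This sketch assumes an illustrative diagonal matrix whose dominant eigenvalue is $-2$ with eigenvector $e_{1}$; the iteration count is also an arbitrary choice.

```python
import numpy as np

A = np.array([[-2.0, 0.0],
              [0.0, 1.0]])     # illustrative matrix: dominant eigenvalue -2, eigenvector e_1
x = np.array([1.0, 1.0])
x = x / np.linalg.norm(x)      # unit-norm initial guess with a component along e_1

iterates = []
for k in range(60):
    y = A @ x
    x = y / np.linalg.norm(y)
    iterates.append(x)

last, second_last = iterates[-1], iterates[-2]
print(second_last, last)
```

Consecutive iterates point in opposite directions, yet each one is (to rounding error) a scalar multiple of the dominant eigenvector, which is all that is needed to recover the eigenvalue afterwards.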

Power Iteration code

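The following code snippet performs normalized power iteration:

import numpy as np

def power_iter(A, x_0, p, max_iterations=100):
  # A: nxn matrix, x_0: initial guess, p: type of norm (passed to np.linalg.norm)
  # max_iterations: number of iterations to run (the default is an arbitrary choice)
  x_k = x_0/np.linalg.norm(x_0, p)    # normalize the initial guess
  for i in range(max_iterations):
    y_k = A @ x_k                     # apply the matrix
    x_k = y_k/np.linalg.norm(y_k, p)  # renormalize in the same norm
  return x_k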

Example: Two Steps of Power Iteration

We’ll use normalized power iteration (with the infinity norm) to approximate an eigenvector of the following matrix:

$$A = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix},$$

and the following initial guess:

$$x_{0} = \begin{bmatrix} -1 \\ 0 \end{bmatrix}.$$

First Iteration:

$$\begin{aligned} y_{1} &= A x_{0} = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \\ x_{1} &= \frac{y_{1}}{‖ y_{1} ‖_{\infty}} = y_{1} = \begin{bmatrix} -1 \\ 1 \end{bmatrix}. \end{aligned}$$

Second Iteration:

$$\begin{aligned} y_{2} &= A x_{1} = \begin{bmatrix} 1 & -2 \\ -1 & 1 \end{bmatrix} \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \begin{bmatrix} -3 \\ 2 \end{bmatrix}, \\ x_{2} &= \frac{y_{2}}{‖ y_{2} ‖_{\infty}} = \frac{1}{3} y_{2} = \begin{bmatrix} -1 \\ \frac{2}{3} \end{bmatrix} = \begin{bmatrix} -1 \\ 0.6666 \dots \end{bmatrix}. \end{aligned}$$

Even after only two iterations, we are getting close to a corresponding eigenvector:

$$u_{1} = \begin{bmatrix} -1 \\ \frac{1}{\sqrt{2}} \end{bmatrix} \approx \begin{bmatrix} -1 \\ 0.7071 \end{bmatrix}$$

with relative error about 4 percent when measured in the infinity norm.
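The two hand-computed steps can be reproduced in a few lines. This is a sketch that repeats the matrix, initial guess, and infinity norm from the example; the variable names are chosen here.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [-1.0, 1.0]])
x = np.array([-1.0, 0.0])               # initial guess x_0 from the example

for _ in range(2):
    y = A @ x
    x = y / np.linalg.norm(y, np.inf)   # infinity norm: largest absolute entry

u1 = np.array([-1.0, 1.0 / np.sqrt(2.0)])   # eigenvector scaled to unit infinity norm
rel_error = np.linalg.norm(x - u1, np.inf) / np.linalg.norm(u1, np.inf)
print(x, rel_error)
```

The computed `x` matches $x_{2}$ above, and `rel_error` is $1/\sqrt{2} - 2/3 \approx 0.04$, consistent with the stated 4 percent.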

Computing Eigenvalues from Eigenvectors

Power iteration allows us to find an approximate eigenvector corresponding to the largest eigenvalue in magnitude. How can we compute the actual eigenvalue from this? If $λ$ is an eigenvalue of $A$ with corresponding eigenvector $u$, then we can compute the value of $λ$ using the Rayleigh Quotient:

$$λ = \frac{u^{T} A u}{u^{T} u}.$$

Thus, one can compute an approximate eigenvalue using the approximate eigenvector found during power iteration.
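As a concrete case, the Rayleigh Quotient can be applied to the approximate eigenvector $x_{2}$ from the two-step example. The helper name `rayleigh_quotient` is chosen here, and the sketch assumes real-valued inputs.

```python
import numpy as np

def rayleigh_quotient(A, u):
    """Return u^T A u / u^T u for a real matrix A and a real non-zero vector u."""
    return (u @ A @ u) / (u @ u)

A = np.array([[1.0, -2.0],
              [-1.0, 1.0]])
x2 = np.array([-1.0, 2.0 / 3.0])     # approximate eigenvector after two iterations
estimate = rayleigh_quotient(A, x2)  # equals 31/13 in exact arithmetic
exact = 1.0 + np.sqrt(2.0)           # dominant root of (1 - lambda)^2 - 2 = 0
print(estimate, exact)
```

The estimate $31/13 \approx 2.385$ is already within about 0.03 of the dominant eigenvalue $1 + \sqrt{2} \approx 2.414$, even though only two iterations were performed.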

Power Iteration and Floating-Point Arithmetic

Recall that we made the assumption that the initial guess satisfies

$$x_{0} = α_{1} u_{1} + α_{2} u_{2} + \dots + α_{n} u_{n}, \quad \text{with } α_{1} \neq 0.$$

What happens if we choose an initial guess where $α_{1} = 0$? If we further assume that $| λ_{2} | > | λ_{3} | \geq | λ_{4} | \geq \dots \geq | λ_{n} |$, then in theory

$$A^{k} x_{0} = λ_{2}^{k} \left(α_{2} u_{2} + α_{3} \left(\frac{λ_{3}}{λ_{2}}\right)^{k} u_{3} + \dots + α_{n} \left(\frac{λ_{n}}{λ_{2}}\right)^{k} u_{n}\right),$$

and we would expect that

$$\lim_{k \to \infty} \frac{A^{k} x_{0}}{λ_{2}^{k}} = α_{2} u_{2}.$$

In practice, this does not happen. For one thing, choosing an initial guess such that $α_{1} = 0$ is extremely unlikely if we have no prior knowledge about the eigenvector $u_{1}$. Moreover, since power iteration is performed numerically, using finite precision arithmetic, rounding error is introduced in every iteration. This means that at every iteration $k$, including $k = 0$, we will instead have

$$A^{k} \hat{x}_{0} = λ_{1}^{k} \left(\hat{α}_{1} u_{1} + \hat{α}_{2} \left(\frac{λ_{2}}{λ_{1}}\right)^{k} u_{2} + \dots + \hat{α}_{n} \left(\frac{λ_{n}}{λ_{1}}\right)^{k} u_{n}\right),$$

where the $\hat{α}_{i}$ are the approximate expansion coefficients of the rounded result. Even if $α_{1} = 0$, the finite precision representation $\hat{x}_{0}$ will very likely have expansion coefficient $\hat{α}_{1} \neq 0$. Even in the case where rounding the initial guess does not introduce a non-zero $\hat{α}_{1}$, rounding after applying the matrix $A$ will almost certainly introduce a non-zero component in the dominant eigenvector after enough iterations. The probability of coming up with a starting guess $x_{0}$ such that $\hat{α}_{1} = 0$ for all iterations is very, very low, if not impossible.
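This effect can be reproduced with the matrix from the two-step example. The sketch below deliberately starts on the non-dominant eigenvector $u_{2} = [1, 1/\sqrt{2}]^{T}$, so that $α_{1} = 0$ in exact arithmetic, and tracks the Rayleigh Quotient; the iteration counts 3 and 100 are arbitrary choices made here.

```python
import numpy as np

A = np.array([[1.0, -2.0],
              [-1.0, 1.0]])
lam1 = 1.0 + np.sqrt(2.0)      # dominant eigenvalue
lam2 = 1.0 - np.sqrt(2.0)      # non-dominant eigenvalue

# u_2 can only be stored approximately, because 1/sqrt(2) is irrational.
x = np.array([1.0, 1.0 / np.sqrt(2.0)])

def rayleigh_quotient(A, u):
    return (u @ A @ u) / (u @ u)

estimates = {}
for k in range(1, 101):
    y = A @ x
    x = y / np.linalg.norm(y)
    if k in (3, 100):
        estimates[k] = rayleigh_quotient(A, x)

print(estimates)
```

After a few iterations the estimate is still close to $λ_{2}$, but the tiny rounding-induced component along $u_{1}$ is amplified by a factor of roughly $| λ_{1} / λ_{2} |$ per iteration, so after enough iterations the estimate moves to $λ_{1}$.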

Power Iteration without a Dominant Eigenvalue

Above, we assumed that one eigenvalue had magnitude strictly larger than all the others: $| λ_{1} | > | λ_{2} | \geq | λ_{3} | \geq \dots \geq | λ_{n} |$ . What happens if $| λ_{1} | = | λ_{2} |$ ?

If $λ_{1} = λ_{2} = λ \in \mathbb{R}$, then:

$$x_{k} = A^{k} x_{0} \approx α_{1} λ^{k} u_{1} + α_{2} λ^{k} u_{2} = λ^{k} (α_{1} u_{1} + α_{2} u_{2}),$$

hence

$$\lim_{k \to \infty} λ^{-k} A^{k} x_{0} = α_{1} u_{1} + α_{2} u_{2}.$$

The quantity $α_{1} u_{1} + α_{2} u_{2}$ is still an eigenvector corresponding to $λ$ , so power iteration will still approach a dominant eigenvector.

If the dominant eigenvalues have opposite sign, i.e., $λ_{1} = - λ_{2} = λ \in R$ , then

x_{k} = A^{k} x_{0} \approx α_{1} λ^{k} u_{1} + α_{2} (- λ)^{k} u_{2} = λ^{k} (α_{1} u_{1} + (- 1)^{k} α_{2} u_{2}) .

For large $k$ , we will have $λ^{- k} A x_{0} \approx α_{1} u_{1} + (- 1)^{k} α_{2} u_{2}$ , which although is a linear combination of two eigenvectors, is not itself an eigenvector of $A$ .

Finally, if the two dominant eigenvalues are a complex-conjugate pair $λ_{1} = λ = r e^{i θ}, λ_{2} = \bar{λ} = r e^{- i θ}$ , then $x_{k} = A^{k} x_{0} \approx α_{1} λ^{k} u_{1} + α_{2} \bar{λ}^{k} u_{2} = λ^{k} (α_{1} u_{1} + {(\frac{\bar{λ}}{λ})}^{k} α_{2} u_{2}) = λ^{k} (α_{1} u_{1} + α_{2} e^{- i 2 k θ} u_{2})$ .

For large $k$ , $λ^{- k} A^{k} x_{0}$ approximates a linear combination of two eigenvectors, but this linear combination will not itself be an eigenvector.
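The three cases in this section can be reproduced with tiny matrices. The sketch below assumes NumPy; the helper `power_iteration` and the three test matrices are hypothetical examples chosen so that the behavior is easy to verify by hand.

```python
import numpy as np

def power_iteration(A, x0, num_iters):
    """Return the normalized iterates x_0, x_1, ..., x_{num_iters} as a list."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    iterates = [x]
    for _ in range(num_iters):
        x = A @ x
        x = x / np.linalg.norm(x)
        iterates.append(x)
    return iterates

# Case 1: repeated dominant eigenvalue (3, 3, 1).
# The iterates converge to a vector in the eigenspace of 3.
A_repeated = np.diag([3.0, 3.0, 1.0])
x_repeated = power_iteration(A_repeated, [1.0, 2.0, 1.0], 100)[-1]

# Case 2: dominant eigenvalues +2 and -2.
# The iterates alternate between two directions.
A_opposite = np.array([[0.0, 2.0],
                       [2.0, 0.0]])
its_opposite = power_iteration(A_opposite, [1.0, 0.0], 4)

# Case 3: complex-conjugate pair +i and -i (rotation by 90 degrees).
# The iterates keep rotating and never settle.
A_rotation = np.array([[0.0, -1.0],
                       [1.0, 0.0]])
its_rotation = power_iteration(A_rotation, [1.0, 0.0], 4)

print(x_repeated)
print(its_opposite)
print(its_rotation)
```

Only the first case yields an eigenvector; in the other two, the sequence of normalized iterates is periodic (period 2 and period 4 for these particular matrices).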

Inverse Iteration

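To obtain an eigenvector corresponding to the eigenvalue of smallest magnitude, $λ_{n}$ , of a non-singular matrix, we can apply power iteration to $A^{- 1}$ . This works because the eigenvalues of $A^{- 1}$ are $1 / λ_{i}$ , with the same eigenvectors as $A$ , so the dominant eigenvalue of $A^{- 1}$ is $1 / λ_{n}$ . The following recurrence relationship describes the inverse iteration algorithm: $x_{k + 1} = \frac{A^{- 1} x_{k}}{‖ A^{- 1} x_{k} ‖}$ .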
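A minimal sketch of this recurrence, assuming NumPy, is shown below. The function name `inverse_iteration`, the example matrix, and the fixed iteration count are illustrative assumptions. The sketch solves a linear system at each step rather than forming $A^{- 1}$ explicitly.

```python
import numpy as np

def inverse_iteration(A, x0, num_iters=50):
    """Approximate an eigenvector for the eigenvalue of A with smallest magnitude."""
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(num_iters):
        # y = A^{-1} x, computed by solving A y = x
        y = np.linalg.solve(A, x)
        x = y / np.linalg.norm(y)
    return x

# Example matrix (an assumption); its eigenvalues are (5 - sqrt(5))/2 and (5 + sqrt(5))/2.
A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
x = inverse_iteration(A, [1.0, 1.0])
smallest = x @ A @ x   # eigenvalue estimate for the unit vector x
print(smallest, x)
```

In a more careful implementation one would typically factor $A$ once (for example with an LU factorization) and reuse the factors in every iteration, since the matrix does not change.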

Inverse Iteration with Shift

To obtain an eigenvector corresponding to the eigenvalue closest to some value $σ$ , we can apply power iteration to $(A - σ I)^{- 1}$ . The eigenvalues of this matrix are $1 / (λ_{i} - σ)$ , so its dominant eigenvalue corresponds to the $λ_{i}$ closest to $σ$ . The following recurrence relationship describes the shifted inverse iteration algorithm: $x_{k + 1} = \frac{(A - σ I)^{- 1} x_{k}}{‖ (A - σ I)^{- 1} x_{k} ‖}$ . Note that this is identical to inverse iteration if the shift is zero.
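The shifted variant can be sketched in the same way. This is an illustrative example assuming NumPy; the function name `shifted_inverse_iteration`, the matrix, and the shift value are hypothetical choices. The example matrix has eigenvalues $3 - \sqrt{3}$ , $3$ , and $3 + \sqrt{3}$ , so a shift of $2.9$ should pick out the eigenvalue $3$ .

```python
import numpy as np

def shifted_inverse_iteration(A, sigma, x0, num_iters=30):
    """Approximate an eigenvector for the eigenvalue of A closest to sigma."""
    n = A.shape[0]
    shifted = A - sigma * np.eye(n)   # A - sigma * I, fixed for all iterations
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    for _ in range(num_iters):
        # y = (A - sigma I)^{-1} x, computed by solving a linear system
        y = np.linalg.solve(shifted, x)
        x = y / np.linalg.norm(y)
    return x

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
sigma = 2.9
x = shifted_inverse_iteration(A, sigma, [1.0, 1.0, 1.0])
estimate = x @ A @ x   # eigenvalue estimate for the unit vector x
print(estimate, x)
```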

Rayleigh Quotient Iteration

The shift $σ$ can be updated at each iteration, based on the current estimate of the eigenvalue, in order to improve the convergence rate. Such an estimate can be found using the Rayleigh Quotient. Rayleigh Quotient Iteration is given by the following recurrence relation:

σ_{k} = \frac{x_{k}^{T} A x_{k}}{x_{k}^{T} x_{k}}

x_{k + 1} = \frac{(A - σ_{k} I)^{- 1} x_{k}}{‖ (A - σ_{k} I)^{- 1} x_{k} ‖} .
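A possible implementation of this recurrence is sketched below, assuming NumPy. The function name, the stopping tolerance `tol`, the residual-based stopping test, and the example matrix are assumptions added for illustration. Because the shifted matrix becomes nearly singular as $σ_{k}$ approaches an eigenvalue, the sketch stops once the residual is small and guards against an exactly singular system.

```python
import numpy as np

def rayleigh_quotient_iteration(A, x0, max_iters=20, tol=1e-12):
    """Return (sigma, x): an eigenvalue estimate and a unit-norm eigenvector estimate."""
    n = A.shape[0]
    x = np.asarray(x0, dtype=float)
    x = x / np.linalg.norm(x)
    sigma = x @ A @ x   # Rayleigh quotient; x^T x = 1 because x is normalized
    for _ in range(max_iters):
        # Stop once (sigma, x) is an eigenpair to within the tolerance.
        if np.linalg.norm(A @ x - sigma * x) < tol:
            break
        try:
            y = np.linalg.solve(A - sigma * np.eye(n), x)
        except np.linalg.LinAlgError:
            # The shifted matrix is exactly singular, so sigma is an eigenvalue;
            # stop and return the current estimates.
            break
        x = y / np.linalg.norm(y)
        sigma = x @ A @ x
    return sigma, x

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
sigma, x = rayleigh_quotient_iteration(A, [1.0, 1.0])
print(sigma, x)
```

Unlike inverse iteration with a fixed shift, the matrix $A - σ_{k} I$ changes at every step, so a factorization cannot be reused across iterations.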

Convergence properties

The convergence rate for power iteration is linear, and the recurrence relationship for the error between the current iterate and a dominant eigenvector is given by: $e_{k + 1} \approx \frac{| λ_{2} |}{| λ_{1} |} e_{k}$ .

The convergence rate for (shifted) inverse iteration is also linear, but now depends on the two eigenvalues closest to the shift $σ$ (standard inverse iteration corresponds to a shift $σ = 0$ ). The recurrence relationship for the errors is given by: $e_{k + 1} \approx \frac{| λ_{\text{closest}} - σ |}{| λ_{\text{second-closest}} - σ |} e_{k}$ .
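The linear rate for power iteration can be checked numerically. The sketch below assumes NumPy and uses a diagonal matrix with eigenvalues 4, 2, 1 (an illustrative choice), so that the dominant eigenvector is known exactly and the predicted error ratio is $| λ_{2} | / | λ_{1} | = 0.5$ .

```python
import numpy as np

A = np.diag([4.0, 2.0, 1.0])
u1 = np.array([1.0, 0.0, 0.0])      # exact dominant eigenvector
x = np.ones(3) / np.sqrt(3.0)       # starting guess

errors = []
for _ in range(20):
    x = A @ x
    x = x / np.linalg.norm(x)
    errors.append(np.linalg.norm(x - u1))   # e_k: distance to u1

# Ratios e_{k+1} / e_k of successive errors.
ratios = [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]
print(ratios[-1])
```

The successive ratios should approach $0.5$ for this matrix, which is what the recurrence above predicts.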

Orthogonal Matrices

A square matrix is called orthogonal if and only if its columns are mutually orthogonal and each has a norm of $1$ (such a set of vectors is formally known as an orthonormal set), i.e.: $c_{i}^{T} c_{j} = 0 \ \forall i \neq j, \ ‖ c_{i} ‖ = 1 \ \forall i ⟺ A \in O (n),$ or $⟨ c_{i}, c_{j} ⟩ = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j \end{cases} ⟺ A \in O (n),$ where $O (n)$ is the set of all $n \times n$ orthogonal matrices, called the orthogonal group, $c_{i}$ , $i = 1, \dots, n$ , are the columns of $A$ , and $⟨ \cdot, \cdot ⟩$ is the inner product operator.

Orthogonal matrices have many desirable properties:

$A^{T} \in O (n)$
$A^{T} A = A A^{T} = I ⟹ A^{- 1} = A^{T}$
$\det A = \pm 1$
$κ_{2} (A) = 1$
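These properties are easy to verify numerically for a specific orthogonal matrix. The following sketch assumes NumPy and uses a 2D rotation matrix as the example; the angle `theta` and the variable names are arbitrary choices made for illustration.

```python
import numpy as np

theta = 0.3   # arbitrary rotation angle in radians
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

identity = np.eye(2)
# Q^T Q = Q Q^T = I
is_orthogonal = np.allclose(Q.T @ Q, identity) and np.allclose(Q @ Q.T, identity)
# Q^{-1} = Q^T
inverse_is_transpose = np.allclose(np.linalg.inv(Q), Q.T)
# det Q = +1 or -1 (a rotation has determinant +1)
determinant = np.linalg.det(Q)
# 2-norm condition number equals 1
condition_number = np.linalg.cond(Q, 2)

print(is_orthogonal, inverse_is_transpose, determinant, condition_number)
```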

Gram-Schmidt

The algorithm to construct an orthogonal basis from a set of linearly independent vectors is called the Gram-Schmidt process. For a basis set $\{ x_{1}, x_{2}, \dots, x_{n} \}$ , we can form an orthogonal set $\{ v_{1}, v_{2}, \dots, v_{n} \}$ given by the following transformation: $\begin{aligned} v_{1} & = x_{1}, \\ v_{2} & = x_{2} - \frac{⟨ v_{1}, x_{2} ⟩}{‖ v_{1} ‖^{2}} v_{1}, \\ v_{3} & = x_{3} - \frac{⟨ v_{1}, x_{3} ⟩}{‖ v_{1} ‖^{2}} v_{1} - \frac{⟨ v_{2}, x_{3} ⟩}{‖ v_{2} ‖^{2}} v_{2}, \\ & \ \vdots \\ v_{n} & = x_{n} - \sum_{i = 1}^{n - 1} \frac{⟨ v_{i}, x_{n} ⟩}{‖ v_{i} ‖^{2}} v_{i}, \end{aligned}$ where $⟨ \cdot, \cdot ⟩$ is the inner product operator. Each of the vectors in the orthogonal set can be normalized independently to obtain an orthonormal basis.
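The transformation above translates almost directly into code. The sketch below assumes NumPy and stores the vectors $x_{i}$ as the columns of a matrix; the function name `gram_schmidt` and the example vectors are illustrative. It follows the formulas as written (the classical Gram-Schmidt process), which is simple but can lose orthogonality in floating point arithmetic for nearly dependent vectors.

```python
import numpy as np

def gram_schmidt(X):
    """Orthogonalize the columns of X, assumed to be linearly independent.

    Returns V whose columns v_1, ..., v_n are mutually orthogonal
    (not normalized), following the classical Gram-Schmidt formulas.
    """
    X = np.asarray(X, dtype=float)
    V = np.zeros_like(X)
    for j in range(X.shape[1]):
        v = X[:, j].copy()
        for i in range(j):
            # Subtract the projection of x_j onto the earlier vector v_i.
            v -= (V[:, i] @ X[:, j]) / (V[:, i] @ V[:, i]) * V[:, i]
        V[:, j] = v
    return V

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
V = gram_schmidt(X)

# Normalize each column independently to obtain an orthonormal basis.
Q = V / np.linalg.norm(V, axis=0)
print(V)
print(Q)
```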

Review Questions

See this review link

ChangeLog

2020-03-01 Peter Sentz: added text to include content from slides
2018-10-14 Erin Carrier ecarrie2@illinois.edu: removes orthogonal/GS sections
2018-01-14 Erin Carrier ecarrie2@illinois.edu: removes demo links
2017-11-10 Erin Carrier ecarrie2@illinois.edu: adds costs of methods
2017-10-26 Matthew West mwest@illinois.edu: rewrote eval/evec definitions
2017-10-25 Erin Carrier ecarrie2@illinois.edu: minor fixes, added review questions
2017-10-14 Arun Lakshmanan lakshma2@illinois.edu: first complete draft
2017-10-16 Luke Olson lukeo@illinois.edu: outline