Vectors, matrices and norms

Learning Objectives

Understanding matrix-vector multiplications
Special matrix types
How we can “measure” vectors
How we can “measure” matrices

Vector Spaces

A vector space is a set $V$ of vectors and a field $F$ (elements of $F$ are called scalars) with the following two operations:

Vector addition: $\forall \mathbf{v}, \mathbf{w} \in V$, $\mathbf{v} + \mathbf{w} \in V$
Scalar multiplication: $\forall \alpha \in F, \mathbf{v} \in V$, $\alpha \mathbf{v} \in V$

which satisfy the following conditions:

Associativity (vector): $\forall \mathbf{u}, \mathbf{v}, \mathbf{w} \in V$, $(\mathbf{u} + \mathbf{v}) + \mathbf{w} = \mathbf{u} + (\mathbf{v} + \mathbf{w})$
Zero vector: There exists a vector $\mathbf{0} \in V$ such that $\forall \mathbf{u} \in V$, $\mathbf{0} + \mathbf{u} = \mathbf{u}$
Additive inverse (negatives): For every $\mathbf{u} \in V$, there exists $-\mathbf{u} \in V$ such that $\mathbf{u} + (-\mathbf{u}) = \mathbf{0}$
Associativity (scalar): $\forall \alpha, \beta \in F, \mathbf{u} \in V$, $(\alpha \beta) \mathbf{u} = \alpha (\beta \mathbf{u})$
Distributivity: $\forall \alpha, \beta \in F, \mathbf{u} \in V$, $(\alpha + \beta) \mathbf{u} = \alpha \mathbf{u} + \beta \mathbf{u}$
Unitarity: $\forall \mathbf{u} \in V$, $1 \mathbf{u} = \mathbf{u}$

If there exists a set of vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n$ such that any vector $\mathbf{x} \in V$ can be written as a linear combination

$$\mathbf{x} = c_1 \mathbf{v}_1 + c_2 \mathbf{v}_2 + \dots + c_n \mathbf{v}_n$$

with uniquely determined scalars $c_1, \dots, c_n$, the set $\{\mathbf{v}_1, \dots, \mathbf{v}_n\}$ is called a basis for $V$. The size of the basis, $n$, is called the dimension of $V$.

The standard example of a vector space is $V = \mathbb{R}^n$ with $F = \mathbb{R}$. Vectors in $\mathbb{R}^n$ are written as an array of numbers:

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} x_1 & x_2 & \dots & x_n \end{bmatrix}^T$$

The dimension of $\mathbb{R}^n$ is $n$. The standard basis vectors of $\mathbb{R}^n$ are written as

$$\mathbf{e}_1 = \begin{bmatrix} 1 \\ 0 \\ \vdots \\ 0 \end{bmatrix}, \quad \mathbf{e}_2 = \begin{bmatrix} 0 \\ 1 \\ \vdots \\ 0 \end{bmatrix}, \quad \dots, \quad \mathbf{e}_n = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \end{bmatrix}.$$

A set of vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$ is called linearly independent if the equation $\alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \dots + \alpha_k \mathbf{v}_k = \mathbf{0}$ in the unknowns $\alpha_1, \dots, \alpha_k$ has only the trivial solution $\alpha_1 = \alpha_2 = \dots = \alpha_k = 0$. Otherwise the vectors are linearly dependent, and at least one of the vectors can be written as a linear combination of the other vectors in the set. A basis is always linearly independent.
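The definition of linear independence can be checked mechanically. The following is a minimal sketch, not part of the original notes: it assumes the vectors are given as lists of integers or fractions, and the helper names `rank` and `is_linearly_independent` are hypothetical. It uses Gaussian elimination with exact rational arithmetic. The equation $\alpha_1 \mathbf{v}_1 + \dots + \alpha_k \mathbf{v}_k = \mathbf{0}$ has only the trivial solution exactly when elimination finds $k$ pivots.

```python
from fractions import Fraction


def rank(vectors):
    """Rank of the matrix whose rows are the given vectors (exact arithmetic)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n_cols = len(rows[0]) if rows else 0
    pivots = 0
    for col in range(n_cols):
        # Look for a row at or below the pivot position with a nonzero entry.
        pivot = next(
            (r for r in range(pivots, len(rows)) if rows[r][col] != 0), None
        )
        if pivot is None:
            continue
        rows[pivots], rows[pivot] = rows[pivot], rows[pivots]
        # Eliminate this column from every row below the pivot row.
        for r in range(pivots + 1, len(rows)):
            factor = rows[r][col] / rows[pivots][col]
            rows[r] = [a - factor * b for a, b in zip(rows[r], rows[pivots])]
        pivots += 1
    return pivots


def is_linearly_independent(vectors):
    # k vectors are independent exactly when the rank equals k.
    return rank(vectors) == len(vectors)


independent = is_linearly_independent([[1, 0, 0], [0, 1, 0], [1, 1, 1]])
dependent = is_linearly_independent([[1, 2, 3], [2, 4, 6], [0, 1, 0]])
print(independent, dependent)
```

In the second set, the second vector is twice the first, so the set is dependent. With floating-point data an exact test like this is usually replaced by a tolerance-based one.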

Inner Product

Let $V$ be a real vector space. Then, an inner product is a function $\langle \cdot, \cdot \rangle : V \times V \to \mathbb{R}$ (i.e., it takes two vectors and returns a real number) which satisfies the following four properties, where $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$ and $\alpha, \beta \in \mathbb{R}$:

Positivity: $\langle \mathbf{u}, \mathbf{u} \rangle \geq 0$
Definiteness: $\langle \mathbf{u}, \mathbf{u} \rangle = 0$ if and only if $\mathbf{u} = \mathbf{0}$
Symmetry: $\langle \mathbf{u}, \mathbf{v} \rangle = \langle \mathbf{v}, \mathbf{u} \rangle$
Linearity: $\langle \alpha \mathbf{u} + \beta \mathbf{v}, \mathbf{w} \rangle = \alpha \langle \mathbf{u}, \mathbf{w} \rangle + \beta \langle \mathbf{v}, \mathbf{w} \rangle$

The inner product intuitively represents the similarity between two vectors. Two vectors $\mathbf{u}, \mathbf{v} \in V$ are said to be orthogonal if $\langle \mathbf{u}, \mathbf{v} \rangle = 0$.

The standard inner product on $\mathbb{R}^n$ is the dot product: $\langle \mathbf{x}, \mathbf{y} \rangle = \mathbf{x}^T \mathbf{y} = \sum_{i=1}^{n} x_i y_i$.
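As a small illustration of the dot product formula, here is a sketch in plain Python. The function name `dot` and the sample vectors are chosen for this example only and do not come from the notes. It computes $\sum_i x_i y_i$ directly and uses the result to test orthogonality.

```python
def dot(x, y):
    """Standard inner product on R^n: the sum of x_i * y_i."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


x = [1, 2, 3]
y = [4, -5, 6]
u = [1, 1]
v = [1, -1]

xy = dot(x, y)  # 1*4 + 2*(-5) + 3*6 = 12
uv = dot(u, v)  # 1*1 + 1*(-1) = 0, so u and v are orthogonal
print(xy, uv, dot(x, y) == dot(y, x))
```

The last printed value checks the symmetry property for this one pair of vectors.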

Linear Transformations and Matrices

A function $f : V \to W$ between two vector spaces $V$ and $W$ is called linear if

$f(\mathbf{u} + \mathbf{v}) = f(\mathbf{u}) + f(\mathbf{v})$, for any $\mathbf{u}, \mathbf{v} \in V$
$f(c \mathbf{v}) = c f(\mathbf{v})$, for all $\mathbf{v} \in V$ and all scalars $c$

Such a function $f$ is commonly called a linear transformation.
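The two defining properties can be spot-checked numerically. The sketch below is an illustration under stated assumptions: the maps `f` and `g` and the helper `looks_linear` are hypothetical examples, not taken from the notes. A check on one choice of $\mathbf{u}$, $\mathbf{v}$ and $c$ can expose a function that is not linear, but passing it does not prove linearity.

```python
def f(v):
    """A linear map from R^3 to R^2 (hypothetical example)."""
    x1, x2, x3 = v
    return [2 * x1 - x3, x1 + 3 * x2]


def g(v):
    """Not linear: the constant shift in the first entry breaks both properties."""
    x1, x2, x3 = v
    return [x1 + 1, x2 + x3]


def add(u, v):
    return [a + b for a, b in zip(u, v)]


def scale(c, v):
    return [c * a for a in v]


def looks_linear(func, u, v, c):
    """Check additivity and homogeneity for one choice of u, v and c."""
    additive = func(add(u, v)) == add(func(u), func(v))
    homogeneous = func(scale(c, v)) == scale(c, func(v))
    return additive and homogeneous


u, v, c = [1, 2, 3], [-4, 0, 7], 3
f_ok = looks_linear(f, u, v, c)
g_ok = looks_linear(g, u, v, c)
print(f_ok, g_ok)
```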

If $n$ and $m$ are the dimensions of $V$ and $W$, respectively, then $f$ can be represented as an $m \times n$ rectangular array or matrix

$$\mathbf{A} = \begin{bmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{bmatrix}.$$

The numbers in the matrix $\mathbf{A}$ are determined by the basis vectors for the spaces $V$ and $W$. To see how, we first review matrix-vector multiplication.

Matrix-vector multiplication

Let $\mathbf{A}$ be an $m \times n$ matrix of real numbers. We can also write $\mathbf{A} \in \mathbb{R}^{m \times n}$ as shorthand. If $\mathbf{x}$ is a vector in $\mathbb{R}^n$, then the matrix-vector product $\mathbf{A}\mathbf{x} = \mathbf{b}$ is a vector in $\mathbb{R}^m$ defined by:

$$b_i = \sum_{j=1}^{n} a_{ij} x_j \quad \text{for } i = 1, 2, \dots, m.$$

We can interpret matrix-vector multiplications in two ways. Throughout this online textbook reference, we will use the notation $\mathbf{a}_i$ to refer to the $i^{th}$ column of the matrix $\mathbf{A}$ and $\mathbf{a}_i^T$ to refer to the $i^{th}$ row of the matrix $\mathbf{A}$.

1) Writing a matrix-vector multiplication as inner products of the rows of $\mathbf{A}$ with $\mathbf{x}$:

$$\mathbf{A}\mathbf{x} = \begin{bmatrix} \mathbf{a}_1^T \cdot \mathbf{x} \\ \mathbf{a}_2^T \cdot \mathbf{x} \\ \vdots \\ \mathbf{a}_m^T \cdot \mathbf{x} \end{bmatrix}$$

2) Writing a matrix-vector multiplication as a linear combination of the columns of $\mathbf{A}$:

$$\mathbf{A}\mathbf{x} = x_1 \mathbf{a}_1 + x_2 \mathbf{a}_2 + \dots + x_n \mathbf{a}_n$$

It is this representation that allows us to express any linear transformation between finite-dimensional vector spaces with matrices.
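The two interpretations of the matrix-vector product can be compared side by side. The following sketch assumes a matrix stored as a list of rows. The function names `matvec_rows` and `matvec_columns` and the sample data are hypothetical. Both functions evaluate the same sum $b_i = \sum_j a_{ij} x_j$ and differ only in the order in which the terms are accumulated.

```python
def matvec_rows(A, x):
    """Row view: entry i of Ax is the inner product of row i of A with x."""
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in A]


def matvec_columns(A, x):
    """Column view: Ax is the sum of the columns of A weighted by the entries of x."""
    m, n = len(A), len(x)
    b = [0] * m
    for j in range(n):
        # Add x_j times column j of A to the running result.
        for i in range(m):
            b[i] += x[j] * A[i][j]
    return b


A = [[1, 2, 3],
     [4, 5, 6]]
x = [1, 0, -1]

b_rows = matvec_rows(A, x)
b_cols = matvec_columns(A, x)
print(b_rows, b_cols)
```

Here $\mathbf{A}\mathbf{x}$ equals the first column of $\mathbf{A}$ minus the third, which is what the column view computes explicitly.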

Matrix Representation of Linear Transformations

Let $\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n$ be the standard basis of $\mathbb{R}^n$. If we define the vector $\mathbf{z}_j = \mathbf{A}\mathbf{e}_j$, then using the interpretation of matrix-vector products as linear combinations of the columns of $\mathbf{A}$, we have that:

$$\mathbf{z}_j = \mathbf{A}\mathbf{e}_j = \mathbf{a}_j = \begin{bmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{bmatrix} = a_{1j} \hat{\mathbf{e}}_1 + a_{2j} \hat{\mathbf{e}}_2 + \dots + a_{mj} \hat{\mathbf{e}}_m$$

where we have written the standard basis of $\mathbb{R}^m$ as $\hat{\mathbf{e}}_1, \hat{\mathbf{e}}_2, \dots, \hat{\mathbf{e}}_m$.

In other words, if $\mathbf{z}_j = \mathbf{A}\mathbf{e}_j$ is written as a linear combination of the basis vectors of $\mathbb{R}^m$, the element $a_{ij}$ is the coefficient corresponding to $\hat{\mathbf{e}}_i$.
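This observation gives a recipe for building the matrix of a linear transformation: apply the transformation to each standard basis vector and store the result as a column. The sketch below assumes a hypothetical linear map `f` from $\mathbb{R}^3$ to $\mathbb{R}^2$. The helper names `standard_basis` and `matrix_of` are likewise illustrative and not from the notes.

```python
def f(v):
    """A linear map from R^3 to R^2 (hypothetical example)."""
    x1, x2, x3 = v
    return [2 * x1 - x3, x1 + 3 * x2]


def standard_basis(n):
    """Return e_1, ..., e_n as lists of length n."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]


def matrix_of(func, n):
    """Build A column by column: column j is func(e_j)."""
    columns = [func(e) for e in standard_basis(n)]
    m = len(columns[0])
    # Entry a_ij is the i-th coefficient of func(e_j).
    return [[columns[j][i] for j in range(n)] for i in range(m)]


A = matrix_of(f, 3)
x = [2, -1, 4]
Ax = [sum(a * xj for a, xj in zip(row, x)) for row in A]
print(A, Ax, f(x))
```

For this choice of `f`, multiplying the recovered matrix by `x` reproduces `f(x)`, as the column interpretation predicts.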

Example

Let’s try an example. Suppose that $V$ is a vector space with basis $\mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3$, and $W$ is a vector space with basis $\mathbf{w}_1, \mathbf{w}_2$. Then $V$ and $W$ have dimension 3 and 2, respectively. Thus any linear transformation $f : V \to W$ can be represented by a $2 \times 3$ matrix.
We can introduce column vector notation, so that vectors $\mathbf{v} = \alpha_1 \mathbf{v}_1 + \alpha_2 \mathbf{v}_2 + \alpha_3 \mathbf{v}_3$ and $\mathbf{w} = \beta_1 \mathbf{w}_1 + \beta_2 \mathbf{w}_2$ can be written as

$$\mathbf{v} = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{bmatrix}, \quad \mathbf{w} = \begin{bmatrix} \beta_1 \\ \beta_2 \end{bmatrix}.$$

We have not specified what the vector spaces $V$ and $W$ are, but it is fine to treat their elements like elements of $\mathbb{R}^3$ and $\mathbb{R}^2$.

Suppose that the following facts are known about the linear transformation $f$:

$f (\mathbf{v}_{1}) = \mathbf{w}_{1}$
$f (\mathbf{v}_{2}) = 5 \mathbf{w}_{1} - \mathbf{w}_{2}$
$f (\mathbf{v}_{3}) = 2 \mathbf{w}_{1} + 2 \mathbf{w}_{2}$

This is enough information to completely determine the matrix representation of $f$. The first equation tells us

\mathbf{w}_{1} = f (\mathbf{v}_{1}) \implies [\begin{matrix} 1 \\ 0 \end{matrix}] = [\begin{matrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \end{matrix}] [\begin{matrix} 1 \\ 0 \\ 0 \end{matrix}] = [\begin{matrix} a_{11} \\ a_{21} \end{matrix}] .

So we know $a_{11} = 1, a_{21} = 0$. The second equation tells us that

5 \mathbf{w}_{1} - \mathbf{w}_{2} = f (\mathbf{v}_{2}) \implies [\begin{matrix} 5 \\ - 1 \end{matrix}] = [\begin{matrix} 1 & a_{12} & a_{13} \\ 0 & a_{22} & a_{23} \end{matrix}] [\begin{matrix} 0 \\ 1 \\ 0 \end{matrix}] = [\begin{matrix} a_{12} \\ a_{22} \end{matrix}] .

So we know $a_{12} = 5, a_{22} = - 1$. Finally, the third equation tells us

2 \mathbf{w}_{1} + 2 \mathbf{w}_{2} = f (\mathbf{v}_{3}) \implies [\begin{matrix} 2 \\ 2 \end{matrix}] = [\begin{matrix} 1 & 5 & a_{13} \\ 0 & - 1 & a_{23} \end{matrix}] [\begin{matrix} 0 \\ 0 \\ 1 \end{matrix}] = [\begin{matrix} a_{13} \\ a_{23} \end{matrix}] .

Thus, $a_{13} = 2, a_{23} = 2$, and the linear transformation $f$ can be represented by the matrix:

[\begin{matrix} 1 & 5 & 2 \\ 0 & - 1 & 2 \end{matrix}] .

It is important to note that the matrix representation depends not only on $f$, but also on our choice of basis. If we chose different bases for the vector spaces $V$ and $W$, the matrix representation of $f$ would change as well.
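
The construction in this example can be sketched numerically. The snippet below is only an illustration, assuming NumPy and assuming all coordinates are taken with respect to the bases $\mathbf{v}_{1}, \mathbf{v}_{2}, \mathbf{v}_{3}$ and $\mathbf{w}_{1}, \mathbf{w}_{2}$; the variable names and the test vector `alpha` are chosen here and are not part of the notes.

```python
import numpy as np

# Coordinates (in the basis w1, w2) of the images of the basis vectors.
f_v1 = np.array([1.0, 0.0])    # f(v1) = w1
f_v2 = np.array([5.0, -1.0])   # f(v2) = 5 w1 - w2
f_v3 = np.array([2.0, 2.0])    # f(v3) = 2 w1 + 2 w2

# The j-th column of the matrix is the coordinate vector of f(v_j).
A = np.column_stack([f_v1, f_v2, f_v3])

# Apply f to v = 1*v1 + 2*v2 + 3*v3 (an arbitrary example vector).
alpha = np.array([1.0, 2.0, 3.0])
beta = A @ alpha               # coordinates of f(v) in the basis w1, w2

print(A)
print(beta)
```

Stacking the images of the basis vectors as columns is exactly what the three equations above do one column at a time.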

Special Matrices

Zero Matrices

The $m \times n$ zero matrix is denoted by $\mathbf{0}_{mn}$ and has all entries equal to zero. For example, the $3 \times 4$ zero matrix is

\mathbf{0}_{34} = [\begin{matrix} 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{matrix}] .

Identity Matrices

The $n \times n$ identity matrix is denoted by $\mathbf{I}_{n}$ and has all entries equal to zero except for the diagonal, which is all 1. For example, the $4 \times 4$ identity matrix is

\mathbf{I}_{4} = [\begin{matrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{matrix}] .

Diagonal Matrices

An $n \times n$ diagonal matrix has all entries equal to zero except for the diagonal entries. We typically use $\mathbf{D}$ for diagonal matrices. For example, $4 \times 4$ diagonal matrices have the form:

[\begin{matrix} d_{11} & 0 & 0 & 0 \\ 0 & d_{22} & 0 & 0 \\ 0 & 0 & d_{33} & 0 \\ 0 & 0 & 0 & d_{44} \end{matrix}] .

Triangular Matrices

A lower-triangular matrix is a square matrix that is entirely zero above the diagonal. We typically use $\mathbf{L}$ for lower-triangular matrices. For example, $4 \times 4$ lower-triangular matrices have the form:

\mathbf{L} = [\begin{matrix} ℓ_{11} & 0 & 0 & 0 \\ ℓ_{21} & ℓ_{22} & 0 & 0 \\ ℓ_{31} & ℓ_{32} & ℓ_{33} & 0 \\ ℓ_{41} & ℓ_{42} & ℓ_{43} & ℓ_{44} \end{matrix}] .

An upper-triangular matrix is a square matrix that is entirely zero below the diagonal. We typically use $\mathbf{U}$ for upper-triangular matrices. For example, $4 \times 4$ upper-triangular matrices have the form:

\mathbf{U} = [\begin{matrix} u_{11} & u_{12} & u_{13} & u_{14} \\ 0 & u_{22} & u_{23} & u_{24} \\ 0 & 0 & u_{33} & u_{34} \\ 0 & 0 & 0 & u_{44} \end{matrix}] .

Properties of triangular matrices:

An $n \times n$ triangular matrix has $n (n - 1) / 2$ entries that must be zero, and $n (n + 1) / 2$ entries that are allowed to be non-zero.
Zero matrices, identity matrices, and diagonal matrices are all both lower triangular and upper triangular.
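
As a rough illustration, the special matrices above can be built with NumPy and the entry counts for triangular matrices checked directly. This is a sketch that assumes NumPy; the matrix `M` and the diagonal values are arbitrary choices made here.

```python
import numpy as np

n = 4
Z = np.zeros((3, 4))                 # 3 x 4 zero matrix
I = np.eye(n)                        # 4 x 4 identity matrix
D = np.diag([1.0, 2.0, 3.0, 4.0])    # diagonal matrix with chosen diagonal entries

# A dense matrix with no zero entries, used to extract triangular parts.
M = np.arange(1.0, n * n + 1).reshape(n, n)
L = np.tril(M)                       # keeps the diagonal and everything below it
U = np.triu(M)                       # keeps the diagonal and everything above it

# Count the entries forced to zero and those allowed to be non-zero.
forced_zero = np.count_nonzero(L == 0)
free_entries = np.count_nonzero(L)
print(forced_zero, n * (n - 1) // 2)
print(free_entries, n * (n + 1) // 2)

# A diagonal matrix is both lower and upper triangular.
is_both = np.array_equal(np.tril(D), D) and np.array_equal(np.triu(D), D)
print(is_both)
```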

Permutation Matrices

A permutation matrix is a square matrix that is all zero, except for a single entry in each row and each column which is 1. We typically use $\mathbf{P}$ for permutation matrices. An example of a $4 \times 4$ permutation matrix is

\mathbf{P} = [\begin{matrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{matrix}] .

The properties of a permutation matrix are:

Exactly $n$ entries are non-zero.
Multiplying a vector with a permutation matrix permutes (rearranges) the order of the entries in the vector. For example, using $\mathbf{P}$ above and $\mathbf{x} = [1, 2, 3, 4]^{T}$, the product is $\mathbf{P} \mathbf{x} = [2, 4, 1, 3]^{T}$.
If $P_{i j} = 1$ then $(\mathbf{P} \mathbf{x})_{i} = x_{j}$.
The inverse of a permutation matrix is its transpose, so $\mathbf{P} \mathbf{P}^{T} = \mathbf{P}^{T} \mathbf{P} = \mathbf{I}$.
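
The properties listed above can be checked numerically for the example $\mathbf{P}$. The following is a small sketch assuming NumPy; the helper array `cols` is introduced here only for illustration.

```python
import numpy as np

P = np.array([[0, 1, 0, 0],
              [0, 0, 0, 1],
              [1, 0, 0, 0],
              [0, 0, 1, 0]])
x = np.array([1, 2, 3, 4])

# Multiplying by P rearranges the entries of x.
Px = P @ x
print(Px)   # [2 4 1 3]

# For each row i, the column j with P[i, j] == 1 gives (Px)[i] = x[j].
cols = np.argmax(P, axis=1)
print(x[cols])

# The transpose undoes the permutation.
print(P.T @ Px)
```

Because the transpose is the inverse, undoing a permutation never requires solving a linear system.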

Matrices in Block Form

A matrix in block form is a matrix partitioned into blocks. A block is simply a submatrix. For example, consider

\mathbf{M} = [\begin{matrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{matrix}]

where $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, and $\mathbf{D}$ are submatrices.

There are special matrices in block form as well. For instance, a block diagonal matrix is a block matrix whose off-diagonal blocks are zero matrices.
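
One way to experiment with block form is NumPy's `np.block`, which assembles a matrix from submatrices. The block sizes and entries below are arbitrary choices made for this sketch.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])           # 2 x 2 block
B = np.array([[5.0],
              [6.0]])                # 2 x 1 block
C = np.array([[7.0, 8.0]])           # 1 x 2 block
D = np.array([[9.0]])                # 1 x 1 block

# Assemble the 3 x 3 matrix M = [A B; C D] from its blocks.
M = np.block([[A, B],
              [C, D]])

# A block diagonal matrix: the off-diagonal blocks are zero matrices.
M_bd = np.block([[A, np.zeros((2, 1))],
                 [np.zeros((1, 2)), D]])

print(M)
print(M_bd)
```

Blocks in the same block row must have the same number of rows, and blocks in the same block column must have the same number of columns.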

Matrix Rank

The rank of a matrix is the number of linearly independent columns of the matrix. It can also be shown that the matrix has the same number of linearly independent rows. If $\mathbf{A}$ is an $m \times n$ matrix, then

$\text{rank} (\mathbf{A}) \leq \min (m, n)$.
If $\text{rank} (\mathbf{A}) = \min (m, n)$, then $\mathbf{A}$ is full rank. Otherwise, $\mathbf{A}$ is rank deficient.

A square $n \times n$ matrix $\mathbf{A}$ is invertible if there exists a square matrix $\mathbf{B}$ such that $\mathbf{A} \mathbf{B} = \mathbf{B} \mathbf{A} = \mathbf{I}$, where $\mathbf{I}$ is the $n \times n$ identity matrix. The matrix $\mathbf{B}$ is denoted by $\mathbf{A}^{- 1}$. A square matrix is invertible if and only if it has full rank. A square matrix that is not invertible is called a singular matrix.
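
To make the link between rank and invertibility concrete, here is a short sketch assuming NumPy. The two matrices are examples chosen here; note that `np.linalg.matrix_rank` estimates the rank numerically (from singular values and a tolerance), so it is a floating-point check rather than an exact one.

```python
import numpy as np

A_full = np.array([[2.0, 1.0],
                   [1.0, 3.0]])      # columns are linearly independent
A_def = np.array([[1.0, 2.0],
                  [2.0, 4.0]])       # second row is twice the first row

rank_full = np.linalg.matrix_rank(A_full)
rank_def = np.linalg.matrix_rank(A_def)
print(rank_full, rank_def)

# The full-rank square matrix has an inverse B with AB = BA = I.
A_inv = np.linalg.inv(A_full)
print(np.allclose(A_full @ A_inv, np.eye(2)))
print(np.allclose(A_inv @ A_full, np.eye(2)))
```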

Vector Norm

A vector norm is a function $‖ u ‖ : V \to \mathbb{R}_{0}^{+}$ (i.e., it takes a vector and returns a nonnegative real number) that satisfies the following properties, where $u, v \in V$ and $α \in \mathbb{R}$:

Positivity: $‖ u ‖ \geq 0$
Definiteness: $‖ u ‖ = 0$ if and only if $u = 0$
Homogeneity: $‖ α u ‖ = | α | ‖ u ‖$
Triangle inequality: $‖ u + v ‖ \leq ‖ u ‖ + ‖ v ‖$

A norm is a generalization of “absolute value” and measures the “magnitude” of the input vector.

The p-norm

The p-norm is defined as

$‖ w ‖_{p} = (\sum_{i = 1}^{N} | w_{i} |^{p})^{\frac{1}{p}}$ .

The definition is a valid norm when $p \geq 1$. If $0 < p < 1$ then it is not a valid norm because it violates the triangle inequality.
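
A small numerical counterexample may help here. The sketch below assumes NumPy; the helper `p_sum` and the vectors `u`, `v` are chosen for illustration. It evaluates the p-norm formula with $p = 1/2$ and shows that the triangle inequality fails, while it holds for $p = 2$.

```python
import numpy as np

def p_sum(w, p):
    """Evaluate (sum_i |w_i|^p)^(1/p); this is a norm only when p >= 1."""
    return np.sum(np.abs(w) ** p) ** (1.0 / p)

u = np.array([1.0, 0.0])
v = np.array([0.0, 1.0])

# With p = 1/2: the left side is (1 + 1)^2 = 4, the right side is 1 + 1 = 2.
lhs_half = p_sum(u + v, 0.5)
rhs_half = p_sum(u, 0.5) + p_sum(v, 0.5)
print(lhs_half, rhs_half, lhs_half <= rhs_half)

# With p = 2 the triangle inequality holds for the same vectors.
lhs_two = p_sum(u + v, 2)
rhs_two = p_sum(u, 2) + p_sum(v, 2)
print(lhs_two, rhs_two, lhs_two <= rhs_two)
```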

When $p = 2$ (2-norm), this is called the Euclidean norm and it corresponds to the length of the vector.

Vector Norm Examples

Consider the vector $w = [- 3, 5, 0, 1]$. We will show how to calculate the 1-, 2-, and $\infty$-norms of $w$.

For the 1-norm:

‖ w ‖_{1} = (\sum_{i = 1}^{N} | w_{i} |^{1})^{\frac{1}{1}}

‖ w ‖_{1} = \sum_{i = 1}^{N} | w_{i} |

‖ w ‖_{1} = | - 3 | + | 5 | + | 0 | + | 1 |

‖ w ‖_{1} = 3 + 5 + 0 + 1

‖ w ‖_{1} = 9

For the 2-norm:

‖ w ‖_{2} = (\sum_{i = 1}^{N} | w_{i} |^{2})^{\frac{1}{2}}

‖ w ‖_{2} = \sqrt{\sum_{i = 1}^{N} w_{i}^{2}}

‖ w ‖_{2} = \sqrt{(- 3)^{2} + (5)^{2} + (0)^{2} + (1)^{2}}

‖ w ‖_{2} = \sqrt{9 + 25 + 0 + 1}

‖ w ‖_{2} = \sqrt{35} \approx 5.92

For the $\infty$ -norm:

‖ w ‖_{\infty} = lim_{p \to \infty} (\sum_{i = 1}^{N} | w_{i} |^{p})^{\frac{1}{p}}

‖ w ‖_{\infty} = max_{i = 1, \dots, N} | w_{i} |

‖ w ‖_{\infty} = max (| - 3 |, | 5 |, | 0 |, | 1 |)

‖ w ‖_{\infty} = max (3, 5, 0, 1)

‖ w ‖_{\infty} = 5
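
The three hand calculations above can be cross-checked with `np.linalg.norm`, whose `ord` argument selects the norm. This is a quick sketch assuming NumPy.

```python
import numpy as np

w = np.array([-3.0, 5.0, 0.0, 1.0])

norm_1 = np.linalg.norm(w, 1)          # sum of absolute values
norm_2 = np.linalg.norm(w, 2)          # Euclidean length
norm_inf = np.linalg.norm(w, np.inf)   # largest absolute value

print(norm_1)     # 9.0
print(norm_2)     # approximately 5.92
print(norm_inf)   # 5.0
```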

Matrix Norm

A general matrix norm is a real valued function $‖ A ‖$ that satisfies the following properties:

Positivity: $‖ A ‖ \geq 0$
Definiteness: $‖ A ‖ = 0$ if and only if $A = 0$
Homogeneity: $‖ λ A ‖ = | λ | ‖ A ‖$ for all scalars $λ$
Triangle inequality: $‖ A + B ‖ \leq ‖ A ‖ + ‖ B ‖$

Induced (or operator) matrix norms are associated with a specific vector norm $‖ \cdot ‖$ and are defined as:

‖ A ‖ := max_{‖ x ‖ = 1} ‖ A x ‖ .

An induced matrix norm is a particular type of a general matrix norm. Induced matrix norms tell us the maximum amplification of the norm of any vector when multiplied by the matrix. Note that the definition above is equivalent to

‖ A ‖ = max_{x \neq 0} \frac{‖ A x ‖}{‖ x ‖} .

In addition to the properties above of general matrix norms, induced matrix norms also satisfy the submultiplicative conditions:

‖ A x ‖ \leq ‖ A ‖ ‖ x ‖

‖ A B ‖ \leq ‖ A ‖ ‖ B ‖
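
These inequalities can be spot-checked numerically. The sketch below assumes NumPy and uses the induced 2-norm with randomly generated matrices and vectors; the seed and sizes are arbitrary. Random sampling only illustrates the bounds and does not prove them.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

norm_A = np.linalg.norm(A, 2)   # induced 2-norm of A
norm_B = np.linalg.norm(B, 2)

# Amplification ||Ax|| / ||x|| for 1000 random vectors (columns of X).
X = rng.standard_normal((3, 1000))
ratios = np.linalg.norm(A @ X, axis=0) / np.linalg.norm(X, axis=0)
max_ratio = ratios.max()

# No sampled vector is amplified by more than ||A||.
print(max_ratio, norm_A)

# Submultiplicativity: ||AB|| <= ||A|| ||B||.
norm_AB = np.linalg.norm(A @ B, 2)
print(norm_AB, norm_A * norm_B)
```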

Frobenius norm

The Frobenius norm is the square root of the sum of the squares of every element of the matrix, which is equivalent to applying the vector $2$-norm to the flattened matrix,

‖ A ‖_{F} = \sqrt{\sum_{i, j} a_{i j}^{2}} .

The Frobenius norm is an example of a general matrix norm that is not an induced norm.
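
A brief numerical sketch, assuming NumPy, shows both points: the Frobenius norm matches the vector 2-norm of the flattened matrix, and it differs from an induced norm on the identity. Any induced norm gives $‖ I ‖ = 1$ directly from the definition, whereas the Frobenius norm of the $n \times n$ identity is $\sqrt{n}$. The matrix `A` is an arbitrary example.

```python
import numpy as np

A = np.array([[3.0, -2.0],
              [-1.0, 3.0]])

fro_builtin = np.linalg.norm(A, "fro")
fro_manual = np.sqrt(np.sum(A ** 2))        # square root of the sum of squares
fro_flat = np.linalg.norm(A.ravel(), 2)     # vector 2-norm of the flattened matrix
print(fro_builtin, fro_manual, fro_flat)

# Identity matrix: the induced 2-norm is 1, the Frobenius norm is sqrt(3).
I3 = np.eye(3)
fro_identity = np.linalg.norm(I3, "fro")
two_identity = np.linalg.norm(I3, 2)
print(fro_identity, two_identity)
```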

The matrix p-norm

The matrix p-norm is induced by the p-norm of a vector. It is $‖ A ‖_{p} := max_{‖ x ‖_{p} = 1} ‖ A x ‖_{p}$ . There are three special cases:

For the 1-norm, this reduces to the maximum absolute column sum of the matrix, i.e.,

‖ A ‖_{1} = max_{j} \sum_{i = 1}^{n} | a_{i j} | .

For the 2-norm, this reduces to the maximum singular value of the matrix.

‖ A ‖_{2} = max_{k} σ_{k}

For the $\infty$ -norm this reduces to the maximum absolute row sum of the matrix.

‖ A ‖_{\infty} = max_{i} \sum_{j = 1}^{n} | a_{i j} | .

Matrix Norm Examples

Now we will go through a few examples with a matrix $C$ , defined below.

C = [\begin{matrix} 3 & - 2 \\ - 1 & 3 \end{matrix}]

For the 1-norm:

‖ C ‖_{1} = max_{‖ x ‖_{1} = 1} ‖ C x ‖_{1}

‖ C ‖_{1} = max_{1 \leq j \leq 2} \sum_{i = 1}^{2} | C_{i j} |

‖ C ‖_{1} = max (| 3 | + | - 1 |, | - 2 | + | 3 |)

‖ C ‖_{1} = max (4, 5)

‖ C ‖_{1} = 5

For the 2-norm:

The singular values are the square roots of the eigenvalues of the matrix $C^{T} C$. You can also find the maximum singular value by calculating the Singular Value Decomposition of the matrix.

‖ C ‖_{2} = max_{‖ x ‖_{2} = 1} ‖ C x ‖_{2}

\det (C^{T} C - λ I) = 0

\det ([\begin{matrix} 3 & - 1 \\ - 2 & 3 \end{matrix}] [\begin{matrix} 3 & - 2 \\ - 1 & 3 \end{matrix}] - λ I) = 0

\det ([\begin{matrix} 9 + 1 & - 6 - 3 \\ - 6 - 3 & 4 + 9 \end{matrix}] - λ I) = 0

\det ([\begin{matrix} 10 - λ & - 9 \\ - 9 & 13 - λ \end{matrix}]) = 0

(10 - λ) (13 - λ) - 81 = 0

λ^{2} - 23 λ + 130 - 81 = 0

λ^{2} - 23 λ + 49 = 0

(λ - \frac{1}{2} (23 + 3 \sqrt{37})) (λ - \frac{1}{2} (23 - 3 \sqrt{37})) = 0

‖ C ‖_{2} = \sqrt{λ_{\max}} = \sqrt{\frac{1}{2} (23 + 3 \sqrt{37})} \approx 4.54

For the $\infty$ -norm:

‖ C ‖_{\infty} = max_{‖ x ‖_{\infty} = 1} ‖ C x ‖_{\infty}

‖ C ‖_{\infty} = max_{1 \leq i \leq 2} \sum_{j = 1}^{2} | C_{i j} |

‖ C ‖_{\infty} = max (| 3 | + | - 2 |, | - 1 | + | 3 |)

‖ C ‖_{\infty} = max (5, 4)

‖ C ‖_{\infty} = 5
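
The matrix norm examples above can be reproduced in a few lines. This sketch assumes NumPy; it computes each norm from the column-sum, row-sum, and singular-value formulas and compares the results with `np.linalg.norm`.

```python
import numpy as np

C = np.array([[3.0, -2.0],
              [-1.0, 3.0]])

# 1-norm: maximum absolute column sum.
one_norm = np.abs(C).sum(axis=0).max()

# Infinity-norm: maximum absolute row sum.
inf_norm = np.abs(C).sum(axis=1).max()

# 2-norm: largest singular value, which is also the square root of the
# largest eigenvalue of C^T C.
sigma = np.linalg.svd(C, compute_uv=False)
two_norm = sigma.max()
eig_max = np.linalg.eigvalsh(C.T @ C).max()

print(one_norm, np.linalg.norm(C, 1))
print(two_norm, np.sqrt(eig_max), np.linalg.norm(C, 2))
print(inf_norm, np.linalg.norm(C, np.inf))
```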

Review Questions

See this review link

ChangeLog

2020-04-27 Mariana Silva mfsilva@illinois.edu: updated notation and mat-vec section
2020-02-01 Peter Sentz: added more text from current slide deck
2018-3-14 Adam Stewart adamjs5@illinois.edu: clarifying definition of Frobenius norm
2017-11-10 Erin Carrier ecarrie2@illinois.edu: fixing index range for example, adds linear function definition
2017-10-29 Erin Carrier ecarrie2@illinois.edu: changing inner product notation, additional comments on induced vs general norms
2017-10-29 Erin Carrier ecarrie2@illinois.edu: adds review questions, completes vector space definition, rewords matrix norm section, minor other revisions
2017-10-29 Erin Carrier ecarrie2@illinois.edu: adds block form
2017-10-28 John Doherty jjdoher2@illinois.edu: first complete draft
2017-10-16 Matthew West mwest@illinois.edu: first complete draft