Consider a set of $m$ data points $\{(t_1, y_1), (t_2, y_2), \dots, (t_m, y_m)\}$. Suppose we want to find a straight line that best fits these data points; that is, we are looking for coefficients $x_0$ and $x_1$ such that

$$y_i = x_0 t_i + x_1, \quad i = 1, \dots, m.$$

In matrix form, the resulting linear system is

$$\begin{bmatrix} t_1 & 1 \\ t_2 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
However, this system has more equations ($m$) than unknowns (two), so there is usually no exact solution.
Generally, if we have a linear system

$$A x = b,$$

where $A$ is an $m \times n$ matrix with $m > n$, the system is overdetermined and in general has no exact solution. Therefore, an overdetermined system is better written as

$$A x \cong b.$$

For an overdetermined system $A x \cong b$, we look for the vector $x$ that minimizes the squared 2-norm of the residual $r = b - A x$:

$$\min_x \|b - A x\|_2^2.$$

This problem is called the linear least-squares problem, and the minimizer $x$ is called the least-squares solution.
It is important to understand that interpolation and least-squares data fitting, while somewhat similar, are fundamentally different in their goals. In both problems we have a set of data points $(t_i, y_i)$, $i = 1, \dots, m$, and a set of basis functions.

With interpolation, we are looking for the linear combination of basis functions such that the resulting function passes through each of the data points exactly. So, for $m$ unique data points we need $m$ linearly independent basis functions, and the resulting linear system is square.
In contrast, with least-squares data fitting we have a model, and we are trying to find the parameters of the model that best fit the data points. For example, with linear least squares we may have 300 noisy data points that we want to model as a quadratic function. Therefore, we are trying to represent our data as

$$y = x_0 + x_1 t + x_2 t^2,$$

where $x_0$, $x_1$, and $x_2$ are the unknown model parameters: the system has $m = 300$ equations but only $n = 3$ unknowns.
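As a concrete sketch of this setup (with hypothetical synthetic data), the corresponding $300 \times 3$ matrix $A$ can be built column by column:

```python
import numpy as np

# hypothetical synthetic data: m = 300 noisy samples of a quadratic trend
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 300)
y = 1.0 + 2.0 * t - 3.0 * t**2 + 0.1 * rng.standard_normal(300)

# design matrix with columns [1, t, t^2], so that A @ [x0, x1, x2] ~= y
A = np.column_stack((np.ones_like(t), t, t**2))
```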
Consider the least squares problem $A x \cong b$, where $A$ is an $m \times n$ real matrix with $m > n$. Define

$$\phi(x) = \|b - A x\|_2^2 = (b - A x)^T (b - A x) = b^T b - 2 x^T A^T b + x^T A^T A x.$$
To solve this unconstrained minimization problem, we need to satisfy the first-order necessary condition to get a stationary point:

$$\nabla \phi(x) = 0 \;\Longrightarrow\; -2 A^T b + 2 A^T A x = 0.$$

The resulting square ($n \times n$) linear system

$$A^T A x = A^T b$$

is called the system of normal equations. If the matrix $A$ is full rank, the least-squares solution is unique and given by:
$$x = (A^T A)^{-1} A^T b.$$

We can check the second-order sufficient condition of the minimization problem by evaluating the Hessian of $\phi$:

$$H = 2 A^T A.$$

Since the Hessian is symmetric and positive-definite (when $A$ has full rank), the least-squares solution $x$ is indeed a minimizer.
Although the least squares problem can be solved via the normal equations for full-rank matrices, forming $A^T A$ tends to worsen the conditioning of the problem. Specifically,

$$\mathrm{cond}(A^T A) = (\mathrm{cond}(A))^2.$$

Because of this, solving the least squares problem via the normal equations is often not a good choice, although it is simple to implement.
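A quick numerical check of this squaring effect, using a hypothetical poorly scaled matrix:

```python
import numpy as np
import numpy.linalg as la

# hypothetical tall matrix with poorly scaled columns
A = np.column_stack((np.ones(4), np.linspace(0, 1e-4, 4)))

# cond(A^T A) equals cond(A)^2, so the normal equations
# square the condition number of the original problem
print(la.cond(A) ** 2)
print(la.cond(A.T @ A))
```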
Since the system of normal equations has a square, symmetric positive-definite matrix, it can be solved using efficient methods such as Cholesky factorization. Note that the factorization itself has computational complexity $\mathcal{O}(n^3)$. However, the construction of the matrix $A^T A$ has complexity $\mathcal{O}(mn^2)$. In typical data fitting problems $m \gg n$, and hence the overall complexity of the normal equations method is $\mathcal{O}(mn^2)$.
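A minimal sketch of this approach, using hypothetical line-fit data and SciPy's Cholesky routines:

```python
import numpy as np
import scipy.linalg as sla

# hypothetical line-fit data: columns of A are [t, 1]
A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.2, 1.9, 1.0])

# form the normal equations A^T A x = A^T b and solve them
# via a Cholesky factorization of the SPD matrix A^T A
c, low = sla.cho_factor(A.T @ A)
x = sla.cho_solve((c, low), A.T @ b)
print(x)
```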
Another way to solve the least-squares problem $A x \cong b$ is to use the singular value decomposition $A = U \Sigma V^T$, where the squared norm of the residual becomes:

$$\|b - A x\|_2^2 = \|b - U \Sigma V^T x\|_2^2 \qquad (1)$$

$$\phantom{\|b - A x\|_2^2} = \|U^T b - \Sigma V^T x\|_2^2 \qquad (2)$$

We can go from (1) to (2) because multiplying a vector by an orthogonal matrix does not change its 2-norm. Now let

$$y = V^T x \quad \text{and} \quad z = U^T b,$$

then we are looking for

$$\min_y \|z - \Sigma y\|_2^2.$$

Note that

$$\|z - \Sigma y\|_2^2 = \sum_{i=1}^{n} (z_i - \sigma_i y_i)^2 + \sum_{i=n+1}^{m} z_i^2,$$

so we choose

$$y_i = \begin{cases} z_i / \sigma_i, & \sigma_i \neq 0, \\ 0, & \sigma_i = 0, \end{cases}$$

which will minimize $\|z - \Sigma y\|_2^2$. Finally, we compute

$$x = V y$$

to find the least-squares solution. The squared norm of the residual is then

$$\|b - A x\|_2^2 = \sum_{\substack{i \le n \\ \sigma_i = 0}} z_i^2 + \sum_{i=n+1}^{m} z_i^2,$$

where $z_i = u_i^T b$ and $u_i$ is the $i$-th column of $U$.
In closed form, we can express the least-squares solution as:

$$x = V \Sigma^{+} U^T b,$$

where $\Sigma^{+}$ is the pseudoinverse of $\Sigma$: an $n \times m$ diagonal matrix whose nonzero entries are the reciprocals $1/\sigma_i$ of the nonzero singular values. Or in reduced form:

$$x = V \Sigma_R^{+} U_R^T b.$$

Note: solving the least squares problem using a given reduced SVD has time complexity $\mathcal{O}(mn)$.
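For instance, with NumPy's reduced SVD (a sketch assuming $A$ has full rank, so every $\sigma_i \neq 0$):

```python
import numpy as np
import numpy.linalg as la

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.2, 1.9, 1.0])

# reduced SVD: U_R is m x n, s holds the n singular values
UR, s, Vt = la.svd(A, full_matrices=False)

# x = V Sigma_R^+ U_R^T b; dividing by s applies Sigma_R^+
x = Vt.T @ ((UR.T @ b) / s)
print(x)
```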
We've shown above how the SVD can be used to find the least-squares solution (the solution that minimizes the squared 2-norm of the residual) to the least squares problem $A x \cong b$. But what happens when $A$ is rank-deficient?
Assume in the SVD of $A$ that $r = \mathrm{rank}(A) < n$, i.e. $\sigma_{r+1} = \dots = \sigma_n = 0$. Then $y_i = z_i / \sigma_i$ is determined only for $i \le r$; the components $y_{r+1}, \dots, y_n$ do not affect the residual, so the least-squares solution is not unique. To pick a unique solution, we take the one of minimal 2-norm. Recall that

$$\|x\|_2 = \|V y\|_2 = \|y\|_2,$$

since $V$ is orthogonal. Setting $y_i = 0$ for $i > r$ therefore minimizes $\|x\|_2$, and we get

$$x = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i,$$

where $u_i$ and $v_i$ are the $i$-th columns of $U$ and $V$.
(For a more formal proof, check this video.)
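As a sketch (with a hypothetical rank-deficient matrix), np.linalg.pinv follows exactly this recipe, zeroing out (near-)zero singular values and returning the minimal-norm solution:

```python
import numpy as np
import numpy.linalg as la

# hypothetical rank-deficient matrix: the second column is twice the first
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])

# the pseudoinverse discards zero singular values, giving the
# least-squares solution of minimal 2-norm
x = la.pinv(A) @ b
print(x)
```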
Assume we have 3 data points, $\{(1, 1.2), (2, 1.9), (3, 1.0)\}$, and we want to find the line $y = x_0 t + x_1$ that best fits them. The code below solves this least-squares problem using the SVD:
```python
import numpy as np
import numpy.linalg as la

# data points (t_i, y_i): (1, 1.2), (2, 1.9), (3, 1.0)
A = np.array([[1, 1], [2, 1], [3, 1]])  # columns are [t, 1]
b = np.array([1.2, 1.9, 1])

# full SVD: A = U @ Sigma @ V^T (numpy returns V^T, so transpose it)
U, s, V = la.svd(A)
V = V.T

# solve Sigma y = U^T b, treating singular values below the
# threshold as zero (y_i = 0 for those components)
y = np.zeros(len(A[0]))
z = np.dot(U.T, b)
k = 0
threshold = 0.01
while k < len(A[0]) and s[k] > threshold:
    y[k] = z[k] / s[k]
    k += 1

# recover the least-squares solution x = V y
x = np.dot(V, y)
print("The function of the best line is: y = " + str(x[0]) + "x + " + str(x[1]))
```
The above linear least-squares problem is associated with an overdetermined linear system $A x \cong b$. The problem is called linear because the fitting function is linear in the unknown coefficients $x$. For example, we can use a polynomial

$$f(t, x) = x_0 + x_1 t + \dots + x_{n-1} t^{n-1}$$

to fit the data points $\{(t_i, y_i)\}$, $i = 1, \dots, m$, with $m > n$: the function is nonlinear in $t$, but it is linear in the coefficients $x_i$. If the fitting function is instead nonlinear in the components of $x$, for example $f(t, x) = x_0 e^{x_1 t}$, the resulting problem is a non-linear least-squares problem.
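A minimal sketch of solving such a problem numerically, assuming hypothetical data and the exponential model above, using scipy.optimize.least_squares:

```python
import numpy as np
from scipy.optimize import least_squares

# hypothetical data roughly following y = 2 e^{0.5 t}
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.1, 2.5, 3.2, 4.3, 5.5])

# residual of the nonlinear model f(t, x) = x0 * exp(x1 * t)
def residual(x):
    return x[0] * np.exp(x[1] * t) - y

# iteratively minimize the squared 2-norm of the residual,
# starting from an initial guess
result = least_squares(residual, x0=np.array([1.0, 1.0]))
print(result.x)
```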