Consider a set of $m$ data points $\{(t_1, y_1), (t_2, y_2), \dots, (t_m, y_m)\}$. Suppose we want to find a straight line that best fits these data points; that is, we are looking for coefficients $x_0$ and $x_1$ such that

$$y_i = x_0 t_i + x_1, \quad i = 1, \dots, m.$$

In matrix form, the resulting linear system is

$$\begin{bmatrix} t_1 & 1 \\ t_2 & 1 \\ \vdots & \vdots \\ t_m & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{bmatrix}$$
However, this system has more equations ($m$) than unknowns (two), so there is usually no exact solution.
Generally, if we have a linear system

$$A x = b,$$

where $A$ is an $m \times n$ matrix with $m > n$, the system is overdetermined and in general has no exact solution. Therefore, an overdetermined system is better written as

$$A x \cong b.$$

For an overdetermined system $A x \cong b$, we look for the vector $x$ that minimizes the squared 2-norm of the residual $r = b - A x$:

$$\min_x \|b - A x\|_2^2.$$

This problem is called the linear least-squares problem, and the minimizer $x$ is called the least-squares solution.
It is important to understand that interpolation and least-squares data fitting, while somewhat similar, are fundamentally different in their goals. In both problems we have a set of data points $(t_i, y_i)$, $i = 1, \dots, m$, and a set of basis functions.

With interpolation, we are looking for the linear combination of basis functions such that the resulting function passes through each of the data points exactly. So, for $m$ unique data points we need $m$ linearly independent basis functions, and the resulting linear system is square.
In contrast, with least-squares data fitting we have a model, and we are trying to find the parameters of the model that best fit the data points. For example, with linear least squares we may have 300 noisy data points that we want to model as a quadratic function. Therefore, we are trying to represent our data as

$$y = x_0 + x_1 t + x_2 t^2,$$

where $x_0$, $x_1$, and $x_2$ are the unknown model parameters: the system has $m = 300$ equations but only $n = 3$ unknowns.
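As a concrete sketch of this setup (with hypothetical synthetic data), the corresponding $300 \times 3$ matrix $A$ can be built column by column:

```python
import numpy as np

# hypothetical synthetic data: m = 300 noisy samples of a quadratic trend
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 300)
y = 1.0 + 2.0 * t - 3.0 * t**2 + 0.1 * rng.standard_normal(300)

# design matrix with columns [1, t, t^2], so that A @ [x0, x1, x2] ~= y
A = np.column_stack((np.ones_like(t), t, t**2))
```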
Consider the least squares problem $A x \cong b$, where $A$ is an $m \times n$ real matrix with $m > n$. Define

$$\phi(x) = \|b - A x\|_2^2 = (b - A x)^T (b - A x) = b^T b - 2 x^T A^T b + x^T A^T A x.$$
To solve this unconstrained minimization problem, we need to satisfy the first-order necessary condition to get a stationary point:

$$\nabla \phi(x) = 0 \;\Longrightarrow\; -2 A^T b + 2 A^T A x = 0.$$

The resulting square ($n \times n$) linear system

$$A^T A x = A^T b$$

is called the system of normal equations. If the matrix $A$ is full rank, the least-squares solution is unique and given by:
$$x = (A^T A)^{-1} A^T b.$$

We can check the second-order sufficient condition of the minimization problem by evaluating the Hessian of $\phi$:

$$H = 2 A^T A.$$

Since the Hessian is symmetric and positive-definite (when $A$ has full rank), the least-squares solution $x$ is indeed a minimizer.
Although the least squares problem can be solved via the normal equations for full-rank matrices, forming $A^T A$ tends to worsen the conditioning of the problem. Specifically,

$$\mathrm{cond}(A^T A) = (\mathrm{cond}(A))^2.$$

Because of this, solving the least squares problem via the normal equations is often not a good choice, although it is simple to implement.
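A quick numerical check of this squaring effect, using a hypothetical poorly scaled matrix:

```python
import numpy as np
import numpy.linalg as la

# hypothetical tall matrix with poorly scaled columns
A = np.column_stack((np.ones(4), np.linspace(0, 1e-4, 4)))

# cond(A^T A) equals cond(A)^2, so the normal equations
# square the condition number of the original problem
print(la.cond(A) ** 2)
print(la.cond(A.T @ A))
```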
Since the system of normal equations has a square, symmetric positive-definite matrix, it can be solved using efficient methods such as Cholesky factorization. Note that the factorization itself has computational complexity $\mathcal{O}(n^3)$. However, the construction of the matrix $A^T A$ has complexity $\mathcal{O}(mn^2)$. In typical data fitting problems $m \gg n$, and hence the overall complexity of the normal equations method is $\mathcal{O}(mn^2)$.
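A minimal sketch of this approach, using hypothetical line-fit data and SciPy's Cholesky routines:

```python
import numpy as np
import scipy.linalg as sla

# hypothetical line-fit data: columns of A are [t, 1]
A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.2, 1.9, 1.0])

# form the normal equations A^T A x = A^T b and solve them
# via a Cholesky factorization of the SPD matrix A^T A
c, low = sla.cho_factor(A.T @ A)
x = sla.cho_solve((c, low), A.T @ b)
print(x)
```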
Another way to solve the least-squares problem $A x \cong b$ is to use the singular value decomposition $A = U \Sigma V^T$, where the squared norm of the residual becomes:

$$\|b - A x\|_2^2 = \|b - U \Sigma V^T x\|_2^2 \qquad (1)$$

$$\phantom{\|b - A x\|_2^2} = \|U^T b - \Sigma V^T x\|_2^2 \qquad (2)$$

We can go from (1) to (2) because multiplying a vector by an orthogonal matrix does not change its 2-norm. Now let

$$y = V^T x \quad \text{and} \quad z = U^T b,$$

then we are looking for

$$\min_y \|z - \Sigma y\|_2^2.$$

Note that

$$\|z - \Sigma y\|_2^2 = \sum_{i=1}^{n} (z_i - \sigma_i y_i)^2 + \sum_{i=n+1}^{m} z_i^2,$$

so we choose

$$y_i = \begin{cases} z_i / \sigma_i, & \sigma_i \neq 0, \\ 0, & \sigma_i = 0, \end{cases}$$

which will minimize $\|z - \Sigma y\|_2^2$. Finally, we compute

$$x = V y$$

to find the least-squares solution. The squared norm of the residual is then

$$\|b - A x\|_2^2 = \sum_{\substack{i \le n \\ \sigma_i = 0}} z_i^2 + \sum_{i=n+1}^{m} z_i^2,$$

where $z_i = u_i^T b$ and $u_i$ is the $i$-th column of $U$.
In closed form, we can express the least-squares solution as:

$$x = V \Sigma^{+} U^T b,$$

where $\Sigma^{+}$ is the pseudoinverse of $\Sigma$: an $n \times m$ diagonal matrix whose nonzero entries are the reciprocals $1/\sigma_i$ of the nonzero singular values. Or in reduced form:

$$x = V \Sigma_R^{+} U_R^T b.$$

Note: solving the least squares problem using a given reduced SVD has time complexity $\mathcal{O}(mn)$.
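For instance, with NumPy's reduced SVD (a sketch assuming $A$ has full rank, so every $\sigma_i \neq 0$):

```python
import numpy as np
import numpy.linalg as la

A = np.array([[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]])
b = np.array([1.2, 1.9, 1.0])

# reduced SVD: U_R is m x n, s holds the n singular values
UR, s, Vt = la.svd(A, full_matrices=False)

# x = V Sigma_R^+ U_R^T b; dividing by s applies Sigma_R^+
x = Vt.T @ ((UR.T @ b) / s)
print(x)
```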
We've shown above how the SVD can be used to find the least-squares solution (the solution that minimizes the squared 2-norm of the residual) to the least squares problem $A x \cong b$. But what happens when $A$ is rank-deficient?
Assume in the SVD of $A$ that $r = \mathrm{rank}(A) < n$, i.e. $\sigma_{r+1} = \dots = \sigma_n = 0$. Then $y_i = z_i / \sigma_i$ is determined only for $i \le r$; the components $y_{r+1}, \dots, y_n$ do not affect the residual, so the least-squares solution is not unique. To pick a unique solution, we take the one of minimal 2-norm. Recall that

$$\|x\|_2 = \|V y\|_2 = \|y\|_2,$$

since $V$ is orthogonal. Setting $y_i = 0$ for $i > r$ therefore minimizes $\|x\|_2$, and we get

$$x = \sum_{i=1}^{r} \frac{u_i^T b}{\sigma_i} v_i,$$

where $u_i$ and $v_i$ are the $i$-th columns of $U$ and $V$.
(For a more formal proof, check this video.)
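As a sketch (with a hypothetical rank-deficient matrix), np.linalg.pinv follows exactly this recipe, zeroing out (near-)zero singular values and returning the minimal-norm solution:

```python
import numpy as np
import numpy.linalg as la

# hypothetical rank-deficient matrix: the second column is twice the first
A = np.array([[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]])
b = np.array([1.0, 2.0, 2.0])

# the pseudoinverse discards zero singular values, giving the
# least-squares solution of minimal 2-norm
x = la.pinv(A) @ b
print(x)
```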
Assume we have 3 data points, $\{(1, 1.2), (2, 1.9), (3, 1.0)\}$, and we want to find the line $y = x_0 t + x_1$ that best fits them. The code below solves this least-squares problem using the SVD:
```python
import numpy as np
import numpy.linalg as la

# data points (t_i, y_i): (1, 1.2), (2, 1.9), (3, 1.0)
A = np.array([[1, 1], [2, 1], [3, 1]])  # columns are [t, 1]
b = np.array([1.2, 1.9, 1])

# full SVD: A = U @ Sigma @ V^T (numpy returns V^T, so transpose it)
U, s, V = la.svd(A)
V = V.T

# solve Sigma y = U^T b, treating singular values below the
# threshold as zero (y_i = 0 for those components)
y = np.zeros(len(A[0]))
z = np.dot(U.T, b)
k = 0
threshold = 0.01
while k < len(A[0]) and s[k] > threshold:
    y[k] = z[k] / s[k]
    k += 1

# recover the least-squares solution x = V y
x = np.dot(V, y)
print("The function of the best line is: y = " + str(x[0]) + "x + " + str(x[1]))
```
The above linear least-squares problem is associated with an overdetermined linear system $A x \cong b$. The problem is called linear because the fitting function is linear in the unknown coefficients $x$. For example, we can use a polynomial

$$f(t, x) = x_0 + x_1 t + \dots + x_{n-1} t^{n-1}$$

to fit the data points $\{(t_i, y_i)\}$, $i = 1, \dots, m$, with $m > n$: the function is nonlinear in $t$, but it is linear in the coefficients $x_i$. If the fitting function is instead nonlinear in the components of $x$, for example $f(t, x) = x_0 e^{x_1 t}$, the resulting problem is a non-linear least-squares problem.
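A minimal sketch of solving such a problem numerically, assuming hypothetical data and the exponential model above, using scipy.optimize.least_squares:

```python
import numpy as np
from scipy.optimize import least_squares

# hypothetical data roughly following y = 2 e^{0.5 t}
t = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y = np.array([2.1, 2.5, 3.2, 4.3, 5.5])

# residual of the nonlinear model f(t, x) = x0 * exp(x1 * t)
def residual(x):
    return x[0] * np.exp(x[1] * t) - y

# iteratively minimize the squared 2-norm of the residual,
# starting from an initial guess
result = least_squares(residual, x0=np.array([1.0, 1.0]))
print(result.x)
```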