Consider a function \(f : \mathbb{R} \to \mathbb{R}\). The point \(x \in \mathbb{R}\) is called the root of \(f\) if \(f(x) = 0\).
Finding the values of \(x\) for which \(f(x) = 0\) is useful for many applications, but a more general task is to find the values of \(x\) for which \(f(x) = y\). The same techniques used to find the root of a function can be used to solve an equation by manipulating the function like so:
\[\tilde{f}(x) = f(x) - y = 0\]The new function \(\tilde{f}(x)\) has a root at the solution to the original equation \(f(x) = y\).
Given \(\boldsymbol{f} : \mathbb{R}^n \to \mathbb{R}^n\) we define the Jacobian matrix \({\bf J}_f\) as:
\[{\bf J}_f(\boldsymbol{x}) = \begin{bmatrix} \frac{\partial f_1}{\partial x_1} & \ldots & \frac{\partial f_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \frac{\partial f_n}{\partial x_1} & \ldots & \frac{\partial f_n}{\partial x_n} \end{bmatrix}\]Linear functions are trivial to solve, as are quadratic functions if you have the quadratic formula memorized. However, polynomials of higher degree and non-polynomial functions are much more difficult to solve. The simplest technique for solving these types of equations is to use an iterative root-finding technique. Instead of finding out where \(f(x) = 0\) directly, we will start with an initial guess and improve it over multiple steps until our residual \(f(x)\) is sufficiently small
We will try out the following techniques using the function:
\[f(x) = x^3 - x - 1\] 
 The bisection method is the simplest root-finding technique.
The algorithm for bisection is analogous to binary search:
With this algorithm we successively half the length of the interval known to contain the root each time. We can repeat this process until the length of the interval is less than the tolerance to which we want to know the root.
Conceptually bisection method uses 2 function evaluations at each iteration. However, at each step either one of \(a\) or \(b\) stays the same. So, at each iteration (after the first iteration), one of \(f(a)\) or \(f(b)\) was computed during the previous iteration. Therefore, bisection method requires only one new function evaluation per iteration. Depending on how costly the function is to evaluate, this can be a significant cost savings.
Bisection method has linear convergence, with a constant of 1/2.
The bisection method requires us to know a little about our function. Specifically, \(f(x)\) must be continuous and we must have an interval \([a, b]\) such that
\[\mathrm{sgn}(f(a)) = -\mathrm{sgn}(f(b)).\]Then, by the intermediate value theorem, we know that there must be a root in the interval \([a,b]\).
This restriction means that the bisection method cannot solve for the root of \(x^2\), as it never crosses the x-axis and becomes negative.
From the graph above, we can see that \(f(x)\) has a root somewhere between 1 and 2. It is difficult to tell exactly what the root is, but we can use the bisection method to approximate it. Specifically, we can set \(a = 1\) and \(b = 2\).
Since \(f(b)\) and \(f(c)\) are both positive, we will replace \(b\) with \(c\) and further narrow our interval.
Since \(f(a)\) and \(f(c)\) are both negative, we will replace \(a\) with \(c\) and further narrow our interval.
Note that as described above, we didn’t need to recalculate \(f(a)\) or \(f(b)\) as we had already calculated them during the previous iteration. Reusing these values can be a significant cost savings.
Since \(f(b)\) and \(f(c)\) are both positive, we will replace \(b\) with \(c\) and further narrow our interval.
…
When running the code for bisection method given below, the resulting approximate root determined is 1.324717957244502. With bisection, we can approximate the root to a desired tolerance (the value above is for the default tolerances).
The following Python code calls SciPy’s bisect method:
import scipy.optimize as opt
def f(x):
    return x**3 - x - 1
root = opt.bisect(f, a=1, b=2)
The Newton-Raphson Method (a.k.a. Newton’s Method) uses a Taylor series approximation of the function to find an approximate solution. Specifically, it takes the first 2 terms:
\[f(x_k + h) \approx f(x_k) + f'(x_k)h\]Starting with the Taylor series above, we can find the root of this new function like so:
\(f(x_k) + f'(x_k)h = 0\) \(h = - \frac{f(x_k)}{f'(x_k)}\)
This value of \(h\) can now be used to find a value of \(x\) closer to the root of \(f\):
\[x_{k+1} = x_k + h = x_k - \frac{f(x_k)}{f'(x_k)}\]Geometrically, \((x_{k+1}, 0)\) is the intersection of the x-axis and the tangent of the graph at \((x_k, f(x_k))\).
By repeatedly this procedure, we can get closer and closer to the actual root.
With Newton’s method, at each iteration we must evaluate both \(f(x)\) and \(f'(x)\).
Typically, Newton’s Method has quadratic convergence.
Although Newton’s Method converges quickly, the additional cost of evaluating the derivative makes each iteration slower to compute. Many functions are not easily differentiable, so Newton’s Method is not always possible. Even in cases when it is possible to evaluate the derivative, it may be quite costly.
Convergence only works well if you are already close to the root. Specifically, if started too far from the root Newton’s method may not converge at all.
We will need the following equations:
\[\begin{align*} f(x) &= x^3 - x - 1 \\ f'(x) &= 3x^2 - 1 \end{align*}\]From the graph above, we can see that the root is somewhere near \(x = 1\). We will use this as our starting position, \(x_0\).
\[\begin{align*} x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)} \\ &= 1 - \frac{f(1)}{f'(1)} \\ &= 1 - \frac{1^3 - 1 - 1}{3 \cdot 1^2 - 1} \\ &= 1 + \frac{1}{2} \\ &= 1.5 \end{align*}\]As you can see, Newton’s Method is already converging significantly faster than the Bisection Method.
…
When running the code for Newton’s method given below, the resulting approximate root determined is 1.324717957244746.
The following Python code calls SciPy’s newton method:
import scipy.optimize as opt
def f(x):
    return x**3 - x - 1
def fprime(x):
    return 3 * x**2 - 1
root = opt.newton(f, x0=1, fprime=fprime)
Like Newton’s Method, secant method uses the Taylor Series to find the solution. However, you may not always be able to take the derivative of a function. Secant method gets around this by approximating the derivative as:
\[f'(x_k) \approx \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}\]The steps involved in the Secant Method are identical to those of the Newton Method, with the derivative replaced by an approximation for the slope of the tangent.
Similar to bisection, although secant method conceptually requires 2 function evaluations per iteration, one of the function evaluations will have been computed in the previous iteration and can be reused. So, secant method requires 1 new function evaluation per iteration (after the first iteration).
Secant method has superlinear convergence.
More specifically, the rate of convergence \(r\) is:
\[r = \frac{1 + \sqrt{5}}{2} \approx 1.618\]This happens to be the golden ratio.
This technique has many of the same drawbacks as Newton’s Method, but does not require a derivative. It does not converge as quickly as Newton’s Method. It also requires two starting guesses near the root.
Let’s start with \(x_0 = 1\) and \(x_{-1} = 2\).
First, find an approximate for the derivative (slope):
\[\begin{align*} f'(x_0) &\approx \frac{f(x_0) - f(x_{-1})}{x_0 - x_{-1}} \\ &= \frac{f(1) - f(2)}{1 - 2} \\ &= \frac{(1^3 - 1 - 1) - (2^3 - 2 - 1)}{1 - 2} \\ &= \frac{(-1) - (5)}{1 - 2} \\ &= 6 \end{align*}\]Then, use this for Newton’s Method:
\[\begin{align*} x_1 &= x_0 - \frac{f(x_0)}{f'(x_0)} \\ &= 1 - \frac{f(1)}{f'(1)} \\ &= 1 - \frac{1^3 - 1 - 1}{6} \\ &= 1 + \frac{1}{6} \\ &= 1.1666666666666667 \end{align*}\]…
When running the code for secant method given below, the resulting approximate root determined is 1.324717957244753.
SciPy’s newton method serves double-duty. If given a function \(f\) and a
first derivative \(f'\), it will use Newton’s Method. If it is not given a
derivative, it will instead use the Secant Method to approximate it:
import scipy.optimize as opt
def f(x):
    return x**3 - x - 1
root = opt.newton(f, x0=1)
Similar to root-finding in 1 dimension, we can also perform root-finding for multiple equations in \(n\) dimensions. Mathematically, we are trying to solve \(\boldsymbol{f(x) = 0}\) for \(\boldsymbol{f} : \mathbb{R}^n \to \mathbb{R}^n\). In other words, \(\boldsymbol{f(x)}\) is now a vector-valued function
\[\boldsymbol{f(x)} = \begin{bmatrix} f_1(\boldsymbol{x}) \\ \vdots \\ f_n(\boldsymbol{x}) \end{bmatrix} = \begin{bmatrix} f_1(x_1, \ldots, x_n) \\ \vdots \\ f_n(x_1, \ldots, x_n) \end{bmatrix}\]If we are instead looking for the solution to \(\boldsymbol{f(x) = y}\), we can rework our function like so:
\[\boldsymbol{\tilde{f}(x)} = \boldsymbol{f(x)} - \boldsymbol{y} = \boldsymbol{0}\]We can think of each equation as a function that describes a surface. We are looking for vectors that describe the intersection of these surfaces.
The multi-dimensional equivalent of Newton’s Method involves approximating a function as:
\[\boldsymbol{f(x + s)} \approx \boldsymbol{f(x)} + {\bf J}_f(\boldsymbol{x})\boldsymbol{s}\]where \({\bf J}_f\) is the Jacobian matrix of \(\boldsymbol{f}\).
By setting this to \(\mathbf{0}\) and rearranging, we get:
\[\begin{align*} {\bf J}_f(\boldsymbol{x})\boldsymbol{s} &= -\boldsymbol{f(x)} \qquad \qquad (1) \\ \boldsymbol{s} &= - {\bf J}_f(\boldsymbol{x})^{-1} \boldsymbol{f(x)} \end{align*}\]Note that in practice we would not actually invert the Jacobian, but would instead solve the linear system in \((1)\) to determine the step.
Similar to the way we solved for \(x_{k+1}\) in 1 dimension, we can solve for:
\(\boldsymbol{x_{k+1}} = \boldsymbol{x_k} + \boldsymbol{s_k}\) where $\boldsymbol{s_k}$ is determined by solving the linear system \({\bf J}_f(\boldsymbol{x_k})\boldsymbol{s_k} = -\boldsymbol{f(x_k)}.\)
Just like in 1D, Newton’s Method only converges locally. It may also be expensive to compute \({\bf J}_f\) at each iteration and we must solve a linear system at each iteration.
Let’s find a root for:
\[\boldsymbol{f}(x, y) = \begin{bmatrix} x + 2y - 2 \\ x^2 + 4y^2 - 4 \end{bmatrix}\]The corresponding Jacobian and inverse Jacobian are:
\[{\bf J}_f(\boldsymbol{x}) = \begin{bmatrix} 1 & 2 \\ 2x & 8y \end{bmatrix}\] \[{\bf J}_f^{-1} = \frac{1}{x - 2y} \begin{bmatrix} -2y & \frac{1}{2} \\ \frac{x}{2} & - \frac{1}{4} \end{bmatrix}\]In this example, as the Jacobian is a \(2 \times 2\) matrix with a simple inverse, we work explicitly with the inverse, even though we would not explicitly compute the inverse for a real problem.
Let’s start at \(\boldsymbol{x_0} = \begin{bmatrix}1 \\ 1\end{bmatrix}\).
\[\begin{align*} \boldsymbol{x_1} &= \boldsymbol{x_0} - {\bf J}_f(\boldsymbol{x_0})^{-1} \boldsymbol{f(x_0)} \\ &= \begin{bmatrix}1 \\ 1\end{bmatrix} - \frac{1}{1 - 2}\begin{bmatrix}-2 & \frac{1}{2} \\ \frac{1}{2} & - \frac{1}{4}\end{bmatrix} \begin{bmatrix}1 \\ 1\end{bmatrix} \\ &= \begin{bmatrix}1 \\ 1\end{bmatrix} + \begin{bmatrix}-1.5 \\ 0.25\end{bmatrix} \\ &= \begin{bmatrix}-0.5 \\ 1.25\end{bmatrix} \end{align*}\]…
When running the code for Newton’s method given below, the resulting approximate root determined is \(\begin{bmatrix}-2.74060567 \cdot 10^{-16} & 1\end{bmatrix}^\top\).
import numpy as np
import scipy.optimize as opt
def f(xvec):
    x, y = xvec
    return np.array([
        x + 2*y - 2,
        x**2 + 4*y**2 - 4
    ])
def Jf(xvec):
    x, y = xvec
    return np.array([
        [1, 2],
        [2*x, 8*y]
    ])
sol = opt.root(f, x0=[1, 1], jac=Jf)
root = sol.x