Consider a function . The point is called the root of if .
Finding the values of for which is useful for many applications, but a more general task is to find the values of for which . The same techniques used to find the root of a function can be used to solve an equation by manipulating the function like so:
The new function has a root at the solution to the original equation .
Given we define the Jacobian matrix as:
Linear functions are trivial to solve, as are quadratic functions if you have the quadratic formula memorized. However, polynomials of higher degree and non-polynomial functions are much more difficult to solve. The simplest technique for solving these types of equations is to use an iterative root-finding technique.
We will try out the following techniques using the function:
The bisection method is the simplest root-finding technique.
The algorithm for bisection is analogous to binary search:
With this algorithm we successively half the length of the interval known to contain the root each time. We can repeat this process until the length of the interval is less than the tolerance to which we want to know the root.
Conceptually bisection method uses 2 function evaluations at each iteration. However, at each step either one of or stays the same. So, at each iteration (after the first iteration), one of or was computed during the previous iteration. Therefore, bisection method requires only one new function evaluation per iteration. Depending on how costly the function is to evaluate, this can be a significant cost savings.
Bisection method has linear convergence, with a constant of 1/2.
The bisection method requires us to know a little about our function. Specifically, must be continuous and we must have an interval such that
Then, by the intermediate value theorem, we know that there must be a root in the interval .
This restriction means that the bisection method cannot solve for the root of , as it never crosses the x-axis and becomes negative.
From the graph above, we can see that has a root somewhere between 1 and 2. It is difficult to tell exactly what the root is, but we can use the bisection method to approximate it. Specifically, we can set and .
Since and are both positive, we will replace with and further narrow our interval.
Since and are both negative, we will replace with and further narrow our interval.
Note that as described above, we didn’t need to recalculate or as we had already calculated them during the previous iteration. Reusing these values can be a significant cost savings.
Since and are both positive, we will replace with and further narrow our interval.
…
When running the code for bisection method given below, the resulting approximate root determined is 1.324717957244502. With bisection, we can approximate the root to a desired tolerance (the value above is for the default tolerances).
The following Python code calls SciPy’s bisect
method:
import scipy.optimize as opt
def f(x):
return x**3 - x - 1
root = opt.bisect(f, a=1, b=2)
The Newton-Raphson Method (a.k.a. Newton’s Method) uses a Taylor series approximation of the function to find an approximate solution. Specifically, it takes the first 2 terms:
Starting with the Taylor series above, we can find the root of this new function like so:
This value of can now be used to find a value of closer to the root of :
Geometrically, is the intersection of the x-axis and the tangent of the graph at .
By repeatedly this procedure, we can get closer and closer to the actual root.
With Newton’s method, at each iteration we must evaluate both and .
Typically, Newton’s Method has quadratic convergence.
Although Newton’s Method converges quickly, the additional cost of evaluating the derivative makes each iteration slower to compute. Many functions are not easily differentiable, so Newton’s Method is not always possible. Even in cases when it is possible to evaluate the derivative, it may be quite costly.
Convergence only works well if you are already close to the root. Specifically, if started too far from the root Newton’s method may not converge at all.
We will need the following equations:
From the graph above, we can see that the root is somewhere near . We will use this as our starting position, .
As you can see, Newton’s Method is already converging significantly faster than the Bisection Method.
…
When running the code for Newton’s method given below, the resulting approximate root determined is 1.324717957244746.
The following Python code calls SciPy’s newton
method:
import scipy.optimize as opt
def f(x):
return x**3 - x - 1
def fprime(x):
return 3 * x**2 - 1
root = opt.newton(f, x0=1, fprime=fprime)
Like Newton’s Method, secant method uses the Taylor Series to find the solution. However, you may not always be able to take the derivative of a function. Secant method gets around this by approximating the derivative as:
The steps involved in the Secant Method are identical to those of the Newton Method, with the derivative replaced by an approximation for the slope of the tangent.
Similar to bisection, although secant method conceptually requires 2 function evaluations per iteration, one of the function evaluations will have been computed in the previous iteration and can be reused. So, secant method requires 1 new function evaluation per iteration (after the first iteration).
Secant method has superlinear convergence.
More specifically, the rate of convergence is:
This happens to be the golden ratio.
This technique has many of the same drawbacks as Newton’s Method, but does not require a derivative. It does not converge as quickly as Newton’s Method. It also requires two starting guesses near the root.
Let’s start with and .
First, find an approximate for the derivative (slope):
Then, use this for Newton’s Method:
…
When running the code for secant method given below, the resulting approximate root determined is 1.324717957244753.
SciPy’s newton
method serves double-duty. If given a function and a
first derivative , it will use Newton’s Method. If it is not given a
derivative, it will instead use the Secant Method to approximate it:
import scipy.optimize as opt
def f(x):
return x**3 - x - 1
root = opt.newton(f, x0=1)
Similar to root-finding in 1 dimension, we can also perform root-finding for multiple equations in dimensions. Mathematically, we are trying to solve for . In other words, is now a vector-valued function
If we are instead looking for the solution to , we can rework our function like so:
We can think of each equation as a function that describes a surface. We are looking for vectors that describe the intersection of these surfaces.
The multi-dimensional equivalent of Newton’s Method involves approximating a function as:
where is the Jacobian matrix of .
By setting this to and rearranging, we get:
Note that in practice we would not actually invert the Jacobian, but would instead solve the linear system in to determine the step.
Similar to the way we solved for in 1 dimension, we can solve for:
where $\boldsymbol{s_k}$ is determined by solving the linear system
Just like in 1D, Newton’s Method only converges locally. It may also be expensive to compute at each iteration and we must solve a linear system at each iteration.
Let’s find a root for:
The corresponding Jacobian and inverse Jacobian are:
In this example, as the Jacobian is a matrix with a simple inverse, we work explicitly with the inverse, even though we would not explicitly compute the inverse for a real problem.
Let’s start at .
…
When running the code for Newton’s method given below, the resulting approximate root determined is .
import numpy as np
import scipy.optimize as opt
def f(xvec):
x, y = xvec
return np.array([
x + 2*y - 2,
x**2 + 4*y**2 - 4
])
def Jf(xvec):
x, y = xvec
return np.array([
[1, 2],
[2*x, 8*y]
])
sol = opt.root(f, x0=[1, 1], jac=Jf)
root = sol.x