Consider a function $f: S \to \mathbb{R}$, where $S \subseteq \mathbb{R}^n$.
For the rest of this topic we focus on finding the minimizer of a function, since the maximizer of $f$ can be found by minimizing $-f$.
Consider a domain $S \subseteq \mathbb{R}^n$ and a function $f: S \to \mathbb{R}$.
Local Minima: A point $x^* \in S$ is a local minimum of $f$ if $f(x^*) \leq f(x)$ for all feasible points $x$ in some neighborhood of $x^*$.
Global Minima: A point $x^* \in S$ is a global minimum of $f$ if $f(x^*) \leq f(x)$ for all $x \in S$.
Note that it is easier to find the local minimum than the global minimum. Given a function, finding whether a global minimum exists over the domain is in itself a non-trivial problem. Hence, we will limit ourselves to finding the local minima of the function.
In the case of 1-D optimization, we can tell whether a critical point $x^*$ (a point where $f'(x^*) = 0$) is a local minimum or a local maximum by examining the second derivative: if $f''(x^*) > 0$, then $x^*$ is a local minimum; if $f''(x^*) < 0$, then $x^*$ is a local maximum; if $f''(x^*) = 0$, the test is inconclusive.
Example:
Consider a function $f(x)$ whose critical points are $x = 1$ and $x = 3$.
The critical points are tabulated as:
| $x$ | $f'(x)$ | $f''(x)$ | Characteristic |
|---|---|---|---|
| 3 | 0 | $> 0$ | Local Minimum |
| 1 | 0 | $< 0$ | Local Maximum |
Looking at the table, we see that $x = 3$ is a local minimum of the function and $x = 1$ is a local maximum.
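The second-derivative test is easy to check numerically. The sketch below uses a hypothetical function, $f(x) = x^3 - 6x^2 + 9x$, chosen only because its critical points match the table above; it is not necessarily the example function from this section.

```python
# Minimal sketch of the 1-D second-derivative test on a hypothetical example.
def f_prime(x):
    return 3 * x**2 - 12 * x + 9   # derivative of f(x) = x**3 - 6*x**2 + 9*x

def f_double_prime(x):
    return 6 * x - 12

for x in (1.0, 3.0):
    assert abs(f_prime(x)) < 1e-12                       # critical point: f'(x) = 0
    kind = "local minimum" if f_double_prime(x) > 0 else "local maximum"
    print(f"x = {x}: {kind}")   # x = 1.0: local maximum, x = 3.0: local minimum
```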
As we saw in 1-D, on extending that concept to $n$ dimensions we classify a critical point $x^*$, i.e. a point where the gradient $\nabla f(x^*) = 0$, by examining the Hessian matrix at that point: if $H_f(x^*)$ is positive definite, $x^*$ is a local minimum; if it is negative definite, $x^*$ is a local maximum; if it is indefinite, $x^*$ is a saddle point.
Given $f: \mathbb{R}^n \to \mathbb{R}$, the gradient is the vector of first partial derivatives,
$$\nabla f(x) = \begin{bmatrix} \frac{\partial f}{\partial x_1} & \frac{\partial f}{\partial x_2} & \cdots & \frac{\partial f}{\partial x_n} \end{bmatrix}^T.$$
Given $f: \mathbb{R}^n \to \mathbb{R}$, the Hessian is the matrix of second partial derivatives, with entries
$$\left( H_f(x) \right)_{ij} = \frac{\partial^2 f}{\partial x_i \, \partial x_j}.$$
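To make the definitions concrete, here is a small worked example (the function below is illustrative, not taken from this text): for $f(x_1, x_2) = x_1^2 + 2x_2^2$,
$$\nabla f(x) = \begin{bmatrix} 2x_1 \\ 4x_2 \end{bmatrix}, \qquad H_f(x) = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}.$$
The only critical point is $(0, 0)$, and since $H_f$ is positive definite there, $(0, 0)$ is a local (indeed global) minimum.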
A function is unimodal on an interval if it has a unique minimizer on that interval, decreasing monotonically to the left of the minimizer and increasing monotonically to the right of it. More precisely, a 1-dimensional function $f$ is unimodal on $[a, b]$ if there is a unique $x^* \in [a, b]$ such that $f$ is strictly decreasing on $[a, x^*]$ and strictly increasing on $[x^*, b]$; this $x^*$ is the unique minimizer of $f$ on $[a, b]$.
Some examples of unimodal functions on an interval: $f(x) = x^2$ or $f(x) = |x|$ on any interval containing $0$, and $f(x) = \cos x$ on $[0, 2\pi]$.
To simplify the problem, we will assume that our objective function is unimodal, since this guarantees a unique solution to the minimization problem.
We know that in order to find a local minimum we need to find a root of the
derivative of the function. Inspired by Newton’s method for root finding, we
define our iterative scheme as follows:
$$x_{k+1} = x_k - \frac{f'(x_k)}{f''(x_k)}.$$
This is exactly Newton’s root-finding method applied to $f'$.
For Newton’s method for optimization
in 1-D, we evaluate both $f'(x_k)$ and $f''(x_k)$ at every iterate, so each iteration requires two derivative evaluations. Like Newton’s method for root finding, it converges quadratically when started sufficiently close to a local minimum at which $f''(x^*) \neq 0$.
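As an illustration, here is a minimal sketch of this iteration in Python; the helper name newton_1d, the argument names, the tolerance, and the example objective are assumptions made for the sketch, not part of the original text.

```python
def newton_1d(fprime, fdoubleprime, x0, tol=1e-8, max_iter=100):
    """Minimal sketch of 1-D Newton's method for optimization:
    Newton root finding applied to f', i.e. x_{k+1} = x_k - f'(x_k)/f''(x_k)."""
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fdoubleprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: f(x) = (x - 2)**2 + 1 has its minimum at x = 2.
x_star = newton_1d(lambda x: 2 * (x - 2), lambda x: 2.0, x0=0.0)
print(x_star)  # 2.0 (exact after one step, since f is quadratic)
```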
Inspired by bisection for root finding, we define an interval-reduction method for finding the minimum of a function. Just as bisection shrinks the interval so that it always contains the root, golden section search shrinks the interval so that it always contains the unique minimizer of our unimodal function.
The algorithm to find the minimum of a unimodal function $f$ on an interval $[a, b]$ works as follows. Our goal is to shrink the interval at every iteration while guaranteeing that it still contains the minimizer.
We select two interior points $x_1$ and $x_2$ with $a < x_1 < x_2 < b$ and evaluate $f(x_1)$ and $f(x_2)$. Because $f$ is unimodal, if $f(x_1) > f(x_2)$ the minimizer must lie in $[x_1, b]$, so we set $a = x_1$; otherwise it must lie in $[a, x_2]$, so we set $b = x_2$. We repeat this until the interval is sufficiently small.
The challenging part is to select $x_1$ and $x_2$ so that one of them can be reused as an interior point of the reduced interval, which means only one new function evaluation is needed per iteration. Choosing $x_1 = a + (1 - \tau)(b - a)$ and $x_2 = a + \tau (b - a)$ with $\tau^2 = 1 - \tau$, i.e. $\tau = \frac{\sqrt{5} - 1}{2} \approx 0.618$, achieves exactly this.
As the interval gets reduced by the fixed factor $\tau$ each time, the method is linearly convergent, with convergence rate $\tau \approx 0.618$. The number $\tau = \frac{\sqrt{5} - 1}{2}$ is the inverse of the golden ratio, which gives the method its name.
In golden section search, we do not need to evaluate any derivatives
of $f$, and after the first iteration each step requires only one new evaluation of $f$.
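Here is a minimal sketch of golden section search under the assumptions above; the function name golden_section_search, the tolerance, and the example objective are illustrative choices for this sketch.

```python
import math

def golden_section_search(f, a, b, tol=1e-6):
    """Minimal sketch of golden section search for a unimodal f on [a, b]."""
    tau = (math.sqrt(5) - 1) / 2          # ~0.618, inverse of the golden ratio
    x1 = a + (1 - tau) * (b - a)
    x2 = a + tau * (b - a)
    f1, f2 = f(x1), f(x2)
    while (b - a) > tol:
        if f1 > f2:                        # minimizer lies in [x1, b]
            a = x1
            x1, f1 = x2, f2                # reuse old x2 as the new x1
            x2 = a + tau * (b - a)
            f2 = f(x2)
        else:                              # minimizer lies in [a, x2]
            b = x2
            x2, f2 = x1, f1                # reuse old x1 as the new x2
            x1 = a + (1 - tau) * (b - a)
            f1 = f(x1)
    return (a + b) / 2

# Example: f(x) = (x - 1.5)**2 is unimodal on [0, 4]; its minimizer is 1.5.
print(golden_section_search(lambda x: (x - 1.5) ** 2, 0.0, 4.0))  # ~1.5
```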
The negative of the gradient of a differentiable function $f$, $-\nabla f(x)$, points in the direction in which $f$ decreases most rapidly at $x$, i.e. the direction of steepest descent.
We know the direction we need to move to approach the minimum, but we
still do not know how far to move in that direction. If $s_k = -\nabla f(x_k)$ is the steepest-descent direction at the current iterate $x_k$, we take the step $x_{k+1} = x_k + \alpha_k s_k$ for some step size $\alpha_k > 0$.
The next problem would be to find the step size $\alpha_k$. We do this with a line search: we (approximately) solve the 1-D minimization problem $\min_{\alpha} f(x_k + \alpha s_k)$ using any of the 1-D methods above, such as golden section search.
The steepest descent algorithm can be summed up in the following function:
import numpy.linalg as la
import scipy.optimize as opt
import numpy as np

def obj_func(alpha, x, s):
    # code for computing the objective function at (x + alpha*s)
    return f_of_x_plus_alpha_s

def gradient(x):
    # code for computing the gradient at x
    return grad_x

def steepest_descent(x_init):
    x_new = x_init
    x_prev = np.random.randn(x_init.shape[0])
    while la.norm(x_prev - x_new) > 1e-6:
        x_prev = x_new
        s = -gradient(x_prev)
        alpha = opt.minimize_scalar(obj_func, args=(x_prev, s)).x
        x_new = x_prev + alpha * s
    return x_new
The steepest descent method converges linearly.
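As a usage sketch, assuming the steepest_descent listing above (with its imports) has already been run, the two stubs might be filled in with a hypothetical quadratic objective like this:

```python
import numpy as np

# Hypothetical quadratic objective for illustration: f(x) = x1^2 + 5*x2^2,
# which has its minimum at (0, 0). Redefining the two stubs with concrete
# formulas is enough to run steepest_descent.
def obj_func(alpha, x, s):
    x_new = x + alpha * s
    return x_new[0]**2 + 5 * x_new[1]**2

def gradient(x):
    return np.array([2 * x[0], 10 * x[1]])

x_min = steepest_descent(np.array([3.0, 1.0]))
print(x_min)  # approximately [0, 0]
```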
Newton’s Method in $n$ Dimensions
Analogously to the 1-D case, at each iteration we solve for the step $s_k$ in the linear system $H_f(x_k)\, s_k = -\nabla f(x_k)$ and update $x_{k+1} = x_k + s_k$.
Newton’s method can be expressed as a Python function as follows:
import numpy as np
import numpy.linalg as la

def hessian(x):
    # Computes the Hessian matrix of the objective function at x
    return hessian_matrix_at_x

def gradient(x):
    # Computes the gradient vector of the objective function at x
    return gradient_vector_at_x

def newtons_method(x_init):
    x_new = x_init
    x_prev = np.random.randn(x_init.shape[0])
    while la.norm(x_prev - x_new) > 1e-6:
        x_prev = x_new
        s = -la.solve(hessian(x_prev), gradient(x_prev))
        x_new = x_prev + s
    return x_new
Like its 1-D counterpart, newtons_method converges quadratically when started sufficiently close to a local minimum, but each iteration requires evaluating the Hessian and solving an $n \times n$ linear system, making each iteration more expensive than a steepest descent iteration.
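A brief usage sketch for newtons_method, reusing the same hypothetical quadratic objective and again assuming the listing above has already been run:

```python
import numpy as np

# The same hypothetical quadratic objective: f(x) = x1^2 + 5*x2^2.
def gradient(x):
    return np.array([2 * x[0], 10 * x[1]])

def hessian(x):
    return np.array([[2.0, 0.0],
                     [0.0, 10.0]])

x_min = newtons_method(np.array([3.0, 1.0]))
print(x_min)  # [0, 0] after a single Newton step, since f is quadratic
```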