This post summarizes Newton's method, probably the most famous and widely used nonlinear optimization method, and gives implementation examples in R and Python. I plan to use it for parameter estimation in a multivariate analysis method, so I implemented it as a study exercise.
Nonlinear optimization is, as the name says, the optimization (minimization or maximization) of nonlinear functions. Examples of nonlinear functions are $y = x^2$ and $y = x \log_e x$. Taking $y = x \log_e x$ as an example, if you want the point that gives the minimum of this function (assuming you already know it is convex downward for $0 < x$), you differentiate with respect to $x$ and look for the point where the derivative vanishes.
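Writing that calculation out explicitly (this is exactly the equation the code later in this post solves):

$$ \frac{d}{dx}\left( x \log_e x \right) = \log_e x + 1 = 0 \quad \Longrightarrow \quad x = e^{-1} \fallingdotseq 0.368 $$

Newton's method is one way to solve such an equation numerically; here the closed-form answer happens to be known, which makes the example easy to check.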
Newton's method (also called the Newton-Raphson method) is the most famous, simple, and versatile method of nonlinear optimization (see the Wikipedia article Newton%27s_method). For the mathematical details, see the linked article "What is Newton's method? Approximate solution of equations by Newton's method"; if you remember your high school mathematics, you should be able to get the general idea. To implement it, two ingredients are required: a numerical derivative and the Newton update rule itself (written out just below).
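For a single-variable equation $f(x) = 0$, the update rule is the textbook iteration, here written with the step-size (damping) factor $\alpha$ that appears in the code below:

$$ x_{k+1} = x_k - \alpha \frac{f(x_k)}{f'(x_k)} $$

With $\alpha = 1$ this is the standard Newton-Raphson step; the examples below pass a small $\alpha$, presumably to keep the large initial guess $x_0 = 100$ from overshooting.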
Let's implement the numerical derivative first. In R:
# Forward-difference approximation of the derivative of f.name at x
numerical.differentiate = function(f.name, x, h = 1e-6){
  return((f.name(x + h) - f.name(x)) / h)
}
And in Python:
def numerical_diff(func, x, h = 1e-6):
    # Forward-difference approximation of the derivative of func at x
    return (func(x + h) - func(x)) / h
Something like that, I suppose. Before relying on it, it is worth sanity-checking the numerical derivative against an analytic one; a minimal check is sketched below.
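For example (this check is not in the original post; the function f and the evaluation point x0 are just illustrative choices), we can compare numerical_diff against the known derivative of $x \log_e x$, which is $\log_e x + 1$:

import math

def numerical_diff(func, x, h = 1e-6):
    # Same forward-difference helper as above
    return (func(x + h) - func(x)) / h

def f(x):
    # f(x) = x * log_e(x); its exact derivative is log_e(x) + 1
    return x * math.log(x)

x0 = 2.0
print(numerical_diff(f, x0))   # very close to the exact value below
print(math.log(x0) + 1)        # exact derivative: log_e(2) + 1 = 1.6931471805599454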
Using these helpers together with the iterative update of Newton's method, let's solve the equation $\log_e x + 1 = 0$. An example run in R:
newton.raphson(f.5, 100, alpha = 1e-3) # 0.3678798
and an example run in Python:
newton_m(f_1, 100, alpha = 1e-3) # 0.36787980890600075
In both cases, we can see that the result is close to the solution of the above equation, $1/e \fallingdotseq 1/2.718 = 0.3679176$.
The full (admittedly rough) code is below.
{r, newton_raphson.R}
# Forward-difference approximation of the derivative of f.name at x
numerical.differentiate = function(f.name, x, h = 1e-6){
  return((f.name(x + h) - f.name(x)) / h)
}

# Damped Newton-Raphson iteration for solving equa(x) = 0
newton.raphson = function(equa, ini.val, alpha = 1, tol = 1e-6){
  x = ini.val
  while(abs(equa(x)) > tol){
    # print(x)  # uncomment to watch the iterates
    grad = numerical.differentiate(equa, x)
    x = x - alpha * equa(x) / grad
  }
  return(x)
}

# Derivative of x * log(x): the minimizer satisfies log(x) + 1 = 0
f.5 = function(x) return(log(x) + 1)
newton.raphson(f.5, 100, alpha = 1e-3)  # 0.3678798
{python, newton_raphson.py}
import math

# Forward-difference approximation of the derivative of func at x
def numerical_diff(func, x, h = 1e-6):
    return (func(x + h) - func(x)) / h

# Damped Newton-Raphson iteration for solving func(x) = 0
def newton_m(func, ini, alpha = 1, tol = 1e-6):
    x = ini
    while abs(func(x)) > tol:
        grad = numerical_diff(func, x)
        x = x - alpha * func(x) / grad
    return x

# Derivative of x * log(x): the minimizer satisfies log(x) + 1 = 0
def f_1(x):
    return math.log(x) + 1

print(newton_m(f_1, 100, alpha = 1e-3))  # 0.36787980890600075
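For comparison (not part of the original code), SciPy provides a ready-made root finder, scipy.optimize.newton, which solves the same equation; a minimal sketch, assuming SciPy is installed and starting from a different initial value:

import math
from scipy import optimize

def f_1(x):
    return math.log(x) + 1

# An undamped Newton step from x0 = 100 would jump to a negative x, where
# log is undefined, which is presumably why the hand-rolled version above
# damps the step with alpha = 1e-3. Starting closer to the root avoids this.
root = optimize.newton(f_1, x0=0.5, fprime=lambda x: 1.0 / x)
print(root)  # approximately 0.36787944 = 1/e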
Note that for Newton's method to converge to the global optimum, the function being optimized must have a global optimum in the search interval $[a, b]$ and must be convex on that interval. The version I had written before was for the multivariable case and was harder to follow, but the single-variable version is simple and easy to understand. The accuracy of the result depends on the tol and h parameters, as illustrated below.
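As a quick illustration of the dependence on h (this experiment is not in the original post), we can compare the forward difference of $f(x) = \log_e x + 1$ at $x = 1$ with the exact derivative $f'(1) = 1$ for a few step sizes:

import math

def numerical_diff(func, x, h = 1e-6):
    return (func(x + h) - func(x)) / h

def f_1(x):
    return math.log(x) + 1

# Exact derivative of log(x) + 1 is 1/x, so f'(1) = 1.
for h in (1e-2, 1e-6, 1e-12):
    error = abs(numerical_diff(f_1, 1.0, h) - 1.0)
    print(h, error)
# A large h gives a large truncation error, while a very small h amplifies
# floating-point round-off; intermediate values around 1e-6 to 1e-8 tend to do best.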