The values obtained from experiments are usually discrete, but sometimes you want to estimate the values in between. For that, various interpolation methods and other forms of curve fitting (curve approximation) are used. These are popular topics for programming exercises, but in practice there are many powerful libraries, so it is usually more convenient to use an existing library than to write your own.
Here, the following three mathematical functions, f(x), g(x), and h(x), are prepared as a computer experiment.
import numpy as np
f = lambda x: 1/(1 + np.exp(-x))
g = lambda x: 1.0/(1.0+x**2)
h = lambda x: np.sin(x)
Suppose that these three functions describe the phenomena that are actually occurring. In reality, however, the observation points obtained in an experiment are discrete values such as the following.
x_observed = np.linspace(-10, 10, 11)
x_observed
array([-10., -8., -6., -4., -2., 0., 2., 4., 6., 8., 10.])
Now, let's plot what kind of values are observed in the experiment.
%matplotlib inline
import matplotlib.pyplot as plt
for func in [f, g, h]:
    y_observed = func(x_observed)
    plt.scatter(x_observed, y_observed)
    plt.grid()
    plt.show()
What kind of curve can you imagine from the observations above? Let's also plot the true curves, which are not observed in the experiment.
x_latent = np.linspace(-10, 10, 101)
x_latent
array([-10. , -9.8, -9.6, -9.4, -9.2, -9. , -8.8, -8.6, -8.4,
-8.2, -8. , -7.8, -7.6, -7.4, -7.2, -7. , -6.8, -6.6,
-6.4, -6.2, -6. , -5.8, -5.6, -5.4, -5.2, -5. , -4.8,
-4.6, -4.4, -4.2, -4. , -3.8, -3.6, -3.4, -3.2, -3. ,
-2.8, -2.6, -2.4, -2.2, -2. , -1.8, -1.6, -1.4, -1.2,
-1. , -0.8, -0.6, -0.4, -0.2, 0. , 0.2, 0.4, 0.6,
0.8, 1. , 1.2, 1.4, 1.6, 1.8, 2. , 2.2, 2.4,
2.6, 2.8, 3. , 3.2, 3.4, 3.6, 3.8, 4. , 4.2,
4.4, 4.6, 4.8, 5. , 5.2, 5.4, 5.6, 5.8, 6. ,
6.2, 6.4, 6.6, 6.8, 7. , 7.2, 7.4, 7.6, 7.8,
8. , 8.2, 8.4, 8.6, 8.8, 9. , 9.2, 9.4, 9.6,
9.8, 10. ])
fig_idx = 0
plt.figure(figsize=(12, 4))
for func in [f, g, h]:
    fig_idx += 1
    y_observed = func(x_observed)
    y_latent = func(x_latent)
    plt.subplot(1, 3, fig_idx)
    plt.scatter(x_observed, y_observed)
    plt.plot(x_latent, y_latent)
    plt.grid()
plt.show()
Now, the question is: how closely can we recover this "true picture" from the limited observed values?
SciPy provides a powerful module called interpolate that lets you use a variety of interpolation methods.
from scipy import interpolate
ip1 = ["Nearest point interpolation", lambda x, y: interpolate.interp1d(x, y, kind="nearest")]
ip2 = ["Linear interpolation", interpolate.interp1d]
ip3 = ["Lagrange interpolation", interpolate.lagrange]
ip4 = ["Center of gravity interpolation", interpolate.BarycentricInterpolator]
ip5 = ["Krogh interpolation", interpolate.KroghInterpolator]
ip6 = ["Second-order spline interpolation", lambda x, y: interpolate.interp1d(x, y, kind="quadratic")]
ip7 = ["3rd order spline interpolation", lambda x, y: interpolate.interp1d(x, y, kind="cubic")]
ip8 = ["Autumn interpolation", interpolate.Akima1DInterpolator]
ip9 = ["Piecewise third-order Hermite interpolation", interpolate.PchipInterpolator]
x_observed = np.linspace(-10, 10, 11)
x_observed
array([-10., -8., -6., -4., -2., 0., 2., 4., 6., 8., 10.])
y_observed = f(x_observed)
y_latent = f(x_latent)
for method_name, method in [ip1, ip2, ip3, ip4, ip5, ip6, ip7, ip8, ip9]:
    print(method_name)
    fitted_curve = method(x_observed, y_observed)
    plt.scatter(x_observed, y_observed, label="observed")
    plt.plot(x_latent, fitted_curve(x_latent), c="red", label="fitted")
    plt.plot(x_latent, y_latent, label="latent")
    plt.grid()
    plt.legend()
    plt.show()
Nearest-neighbor interpolation
Linear interpolation
Lagrange interpolation
Barycentric interpolation
Krogh interpolation
Quadratic spline interpolation
Cubic spline interpolation
Akima interpolation
Piecewise cubic Hermite interpolation (PCHIP)
Interpolate g(x) with the various interpolation methods.
Interpolate h(x) with the various interpolation methods.
Interpolate the following measurements with the various interpolation methods (a minimal sketch follows the data below).
x_observed = np.array([9, 28, 38, 58, 88, 98, 108, 118, 128, 138, 148, 158, 168, 178, 188, 198, 208, 218, 228, 238, 278, 288, 298])
y_observed = np.array([51, 80, 112, 294, 286, 110, 59, 70, 56, 70, 104, 59, 59, 72, 87, 99, 64, 60, 74, 151, 157, 57, 83])
x_latent = np.linspace(min(x_observed), max(x_observed), 100)
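As one possible starting point for these exercises, here is a minimal sketch for the measured data above; it assumes the ip1 ... ip9 pairs and the imports from the cells above, and the same pattern works for g(x) and h(x). The purely polynomial methods (Lagrange, Barycentric, Krogh) are numerically unstable on 23 widely spread points, so only the piecewise methods are used here.
for method_name, method in [ip1, ip2, ip6, ip7, ip8, ip9]:
    print(method_name)
    fitted_curve = method(x_observed, y_observed)  # build the interpolator from the observations
    plt.scatter(x_observed, y_observed, label="observed")
    plt.plot(x_latent, fitted_curve(x_latent), c="red", label="fitted")  # evaluate on the dense grid
    plt.grid()
    plt.legend()
    plt.show()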
NumPy's np.polyfit makes least-squares curve fitting easy. Restating the discrete observations obtained in the experiment:
x_observed = np.linspace(-10, 10, 11)
x_observed
array([-10., -8., -6., -4., -2., 0., 2., 4., 6., 8., 10.])
y_observed = f(x_observed)
y_observed
array([4.53978687e-05, 3.35350130e-04, 2.47262316e-03, 1.79862100e-02,
1.19202922e-01, 5.00000000e-01, 8.80797078e-01, 9.82013790e-01,
9.97527377e-01, 9.99664650e-01, 9.99954602e-01])
When linear regression is performed on the above x and y by the least squares method,
coefficients = np.polyfit(x_observed, y_observed, 1)
coefficients
array([0.06668944, 0.5 ])
The returned values correspond directly to the coefficients a and b of y = ax + b. Expressed with SymPy:
import sympy as sym
from sympy.plotting import plot
sym.init_printing(use_unicode=True)
%matplotlib inline
x, y = sym.symbols("x y")
# If you are using Google Colab, run this so that SymPy's TeX output is displayed
def custom_latex_printer(exp, **options):
    from google.colab.output._publish import javascript
    url = "https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.3/latest.js?config=default"
    javascript(url=url)
    return sym.printing.latex(exp, **options)
sym.init_printing(use_latex="mathjax", latex_printer=custom_latex_printer)
expr = 0
for index, coefficient in enumerate(coefficients):
    expr += coefficient * x ** (len(coefficients) - index - 1)
display(sym.Eq(y, expr))
This is the regression equation we obtained.
You can specify the degree of the regression with the last argument. For example, to fit a cubic polynomial:
coefficients = np.polyfit(x_observed, y_observed, 3)
expr = 0
for index, coefficient in enumerate(coefficients):
    expr += coefficient * x ** (len(coefficients) - index - 1)
display(sym.Eq(y, expr))
This shows the cubic regression we obtained.
To draw the resulting curve, we need many x values and their corresponding y values, which we obtain as follows:
fitted_curve = np.poly1d(np.polyfit(x_observed, y_observed, 3))(x_latent)
fitted_curve
array([ 0.04796474, 0.02805317, 0.00989629, -0.00654171, -0.02129666,
-0.03440435, -0.0459006 , -0.05582121, -0.064202 , -0.07107878,
-0.07648735, -0.08046353, -0.08304312, -0.08426194, -0.08415579,
-0.08276049, -0.08011184, -0.07624566, -0.07119776, -0.06500394,
-0.05770001, -0.04932179, -0.03990508, -0.02948569, -0.01809944,
-0.00578213, 0.00743042, 0.02150241, 0.03639803, 0.05208146,
0.0685169 , 0.08566854, 0.10350056, 0.12197717, 0.14106254,
0.16072086, 0.18091634, 0.20161315, 0.22277549, 0.24436755,
0.26635352, 0.28869759, 0.31136394, 0.33431678, 0.35752028,
0.38093865, 0.40453606, 0.42827671, 0.45212479, 0.47604449,
0.5 , 0.52395551, 0.54787521, 0.57172329, 0.59546394,
0.61906135, 0.64247972, 0.66568322, 0.68863606, 0.71130241,
0.73364648, 0.75563245, 0.77722451, 0.79838685, 0.81908366,
0.83927914, 0.85893746, 0.87802283, 0.89649944, 0.91433146,
0.9314831 , 0.94791854, 0.96360197, 0.97849759, 0.99256958,
1.00578213, 1.01809944, 1.02948569, 1.03990508, 1.04932179,
1.05770001, 1.06500394, 1.07119776, 1.07624566, 1.08011184,
1.08276049, 1.08415579, 1.08426194, 1.08304312, 1.08046353,
1.07648735, 1.07107878, 1.064202 , 1.05582121, 1.0459006 ,
1.03440435, 1.02129666, 1.00654171, 0.99010371, 0.97194683,
0.95203526])
plt.scatter(x_observed, f(x_observed), label="observed")
plt.plot(x_latent, fitted_curve, c="red", label="fitted")
plt.plot(x_latent, f(x_latent), label="latent")
plt.grid()
plt.legend()
<matplotlib.legend.Legend at 0x7fb673a3c630>
Approximate g(x) with a 5th-degree polynomial.
Approximate h(x) with a 9th-degree polynomial.
Approximate the following measured values with a 9th-degree polynomial (a minimal sketch follows the data below).
x_observed = np.array([9, 28, 38, 58, 88, 98, 108, 118, 128, 138, 148, 158, 168, 178, 188, 198, 208, 218, 228, 238, 278, 288, 298])
y_observed = np.array([51, 80, 112, 294, 286, 110, 59, 70, 56, 70, 104, 59, 59, 72, 87, 99, 64, 60, 74, 151, 157, 57, 83])
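As a minimal sketch for the last exercise (assuming np and plt from the cells above; the same pattern applies to g(x) and h(x) with their own observations):
# 9th-degree least-squares fit of the measurements above; NumPy may warn that
# the fit is poorly conditioned because the x values are large.
x_latent = np.linspace(min(x_observed), max(x_observed), 100)
fitted_curve = np.poly1d(np.polyfit(x_observed, y_observed, 9))(x_latent)
plt.scatter(x_observed, y_observed, label="observed")
plt.plot(x_latent, fitted_curve, c="red", label="fitted")
plt.grid()
plt.legend()
plt.show()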
One way to do curve fitting (curve approximation) in Python is to use scipy.optimize.curve_fit.
x_observed = [9, 28, 38, 58, 88, 98, 108, 118, 128, 138, 148, 158, 168, 178, 188, 198, 208, 218, 228, 238, 278, 288, 298]
y_observed = [51, 80, 112, 294, 286, 110, 59, 70, 56, 70, 104, 59, 59, 72, 87, 99, 64, 60, 74, 151, 157, 57, 83]
Let's convert them from Python lists to NumPy arrays.
import numpy as np
x_observed = np.array(x_observed)
y_observed = np.array(y_observed)
First, let's approximate the data with a linear function. Define the function func1; a and b are the values we want to find.
def func1(X, a, b):  # linear approximation
    Y = a + b * X
    return Y
Here is an example of using the func1 function.
func1(x_observed, 2, 2)
array([ 20, 58, 78, 118, 178, 198, 218, 238, 258, 278, 298, 318, 338,
358, 378, 398, 418, 438, 458, 478, 558, 578, 598])
Now, let's approximate it using scipy.optimize.curve_fit.
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func1,x_observed,y_observed) #popt is the optimal estimate, pcov is the covariance
popt
array([125.77023172, -0.1605313 ])
The popt obtained here holds the optimal parameter estimates. Let's compare them with the estimates obtained by curve fitting with np.polyfit.
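For example, the degree-1 fit below should return roughly the same two numbers, just in the opposite order, since np.polyfit lists coefficients from the highest power down (i.e. [slope, intercept]):
# degree-1 least-squares fit with np.polyfit; expect approximately [-0.1605, 125.77]
np.polyfit(x_observed, y_observed, 1)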
Next, let's approximate the data with a quadratic function. Define the function func2; a, b, and c are the values we want to find.
def func2(X, a, b, c):  # quadratic approximation
    Y = a + b * X + c * X ** 2
    return Y
Here is an example of using the func2 function.
func2(x_observed, 2, 2, 2)
array([ 182, 1626, 2966, 6846, 15666, 19406, 23546, 28086,
33026, 38366, 44106, 50246, 56786, 63726, 71066, 78806,
86946, 95486, 104426, 113766, 155126, 166466, 178206])
Now, let's approximate it using scipy.optimize.curve_fit.
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func2,x_observed,y_observed) #popt is the optimal estimate, pcov is the covariance
popt
array([ 1.39787684e+02, -4.07809516e-01, 7.97183226e-04])
The popt obtained here holds the optimal parameter estimates. Let's compare them with the estimates obtained by curve fitting with np.polyfit.
You can evaluate the approximating curve at the observation points using the optimal estimates as follows (a sketch for actually drawing the curve follows the output below).
func2(x_observed, 1.39787684e+02, -4.07809516e-01, 7.97183226e-04)
array([136.1819702 , 128.9940092 , 125.44205497, 118.81645644,
110.07383349, 107.47849913, 105.04260142, 102.76614035,
100.64911593, 98.69152815, 96.89337701, 95.25466253,
93.77538468, 92.45554348, 91.29513893, 90.29417102,
89.45263976, 88.77054514, 88.24788717, 87.88466585,
88.02614699, 88.46010889, 89.05350743])
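To actually draw the curve rather than only evaluate it at the observation points, one option (a minimal sketch, assuming np and plt from the cells above and the popt obtained for func2) is to evaluate the model on a dense grid:
# evaluate the fitted quadratic on a dense grid and plot it against the observations
x_latent = np.linspace(min(x_observed), max(x_observed), 100)
plt.scatter(x_observed, y_observed, label="observed")
plt.plot(x_latent, func2(x_latent, *popt), c="red", label="fitted")
plt.grid()
plt.legend()
plt.show()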
Similarly, let's approximate the data with a cubic function. Define the function func3; a, b, c, and d are the values we want to find.
def func3(X, a, b, c, d):  # cubic approximation
    Y = a + b * X + c * X ** 2 + d * X ** 3
    return Y
Now, let's approximate it using scipy.optimize.curve_fit.
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func3,x_observed,y_observed) #popt is the optimal estimate, pcov is the covariance
popt
array([ 7.84214107e+01, 1.88213104e+00, -1.74165777e-02, 3.89638087e-05])
The popt obtained here holds the optimal parameter estimates. Let's compare them with the estimates obtained by curve fitting with np.polyfit.
You can evaluate the approximating curve using the optimal estimates as follows.
func3(x_observed, 7.84214107e+01, 1.88213104e+00, -1.74165777e-02, 3.89638087e-05)
array([ 93.97825188, 118.32181643, 126.93087413, 136.59795028,
135.72770915, 132.27386543, 127.62777811, 122.02323006,
115.69400413, 108.87388316, 101.79665001, 94.69608754,
87.80597859, 81.36010602, 75.59225267, 70.73620141,
67.02573508, 64.69463654, 63.97668864, 65.10567422,
92.76660851, 106.63700433, 123.75703075])
So far we have written separate functions for linear, quadratic, and cubic expressions, but that is tedious, so let's write a general-purpose function that works for any degree. The degree of the approximation is determined automatically by the number of parameters.
import numpy as np

def func(X, *params):
    Y = np.zeros_like(X)
    for i, param in enumerate(params):
        Y = Y + np.array(param * X ** i)
    return Y
Execute the following two cells to confirm that the results match those of func2 and func3 above.
func(x_observed, 1.39787684e+02, -4.07809516e-01, 7.97183226e-04)
array([136.1819702 , 128.9940092 , 125.44205497, 118.81645644,
110.07383349, 107.47849913, 105.04260142, 102.76614035,
100.64911593, 98.69152815, 96.89337701, 95.25466253,
93.77538468, 92.45554348, 91.29513893, 90.29417102,
89.45263976, 88.77054514, 88.24788717, 87.88466585,
88.02614699, 88.46010889, 89.05350743])
func(x_observed, 7.84214107e+01, 1.88213104e+00, -1.74165777e-02, 3.89638087e-05)
array([ 93.97825188, 118.32181643, 126.93087413, 136.59795028,
135.72770915, 132.27386543, 127.62777811, 122.02323006,
115.69400413, 108.87388316, 101.79665001, 94.69608754,
87.80597859, 81.36010602, 75.59225267, 70.73620141,
67.02573508, 64.69463654, 63.97668864, 65.10567422,
92.76660851, 106.63700433, 123.75703075])
At this point, polynomial approximation can be performed for any degree. However, you need to supply the required number of initial values with p0= so that curve_fit knows how many parameters the model has.
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func,x_observed,y_observed, p0=[1, 1])
popt
array([125.77023172, -0.1605313 ])
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func,x_observed,y_observed, p0=[1, 1, 1])
popt
array([ 1.39787684e+02, -4.07809516e-01, 7.97183226e-04])
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func,x_observed,y_observed, p0=[1, 1, 1, 1])
popt
array([ 7.84214107e+01, 1.88213104e+00, -1.74165777e-02, 3.89638087e-05])
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func,x_observed,y_observed, p0=[1, 1, 1, 1, 1])
popt
array([-7.54495613e+01, 1.13179580e+01, -1.50591743e-01, 7.02970546e-04,
-1.07313869e-06])
Let's compare the results above with the estimates obtained by curve fitting with np.polyfit.
np.polyfit is convenient and easy to use if all you want is a polynomial fit, but it can only fit polynomials. If you want to fit other kinds of functions, it is worth understanding how to use scipy.optimize.curve_fit.
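For example, the sigmoid observations from f(x) at the beginning can be fitted with a logistic model instead of a polynomial. This is only an illustrative sketch; the model form and the parameter names a, b, c are assumptions, not something fixed by curve_fit itself.
import numpy as np
from scipy.optimize import curve_fit

# illustrative non-polynomial model: a / (1 + exp(-b * (X - c)))
def logistic(X, a, b, c):
    return a / (1.0 + np.exp(-b * (X - c)))

x_obs = np.linspace(-10, 10, 11)
y_obs = 1.0 / (1.0 + np.exp(-x_obs))  # same observations as f(x) above
popt, pcov = curve_fit(logistic, x_obs, y_obs, p0=[1.0, 1.0, 0.0])
popt  # should recover approximately a=1, b=1, c=0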
If you run "optimize?", you will find various other methods explained there as well, although the documentation is in English.
from scipy import optimize
optimize?
Of course it's fine to just Google, but with "?" you can jump straight to the documentation, and it also works offline, which is handy.
The bisection method and Newton's method are typical methods for finding a solution of the equation f(x) = 0. As an example, we will use this function:
import math

def f(x):
    return math.exp(x) - 3 * x
Or, equivalently, as a lambda:
import math
f = lambda x: math.exp(x) - 3 * x
SciPy, a library for scientific and technical computing, comes in handy here.
from scipy import optimize
The bisection method can be used when the function y = f(x) is continuous on the interval [a, b] and has exactly one root there. For example, if you know that there is a root in the interval [0, 1]:
optimize.bisect(f, 0, 1)
0.619061286737633
Oh easy.
Newton's method can be used when the function y = f(x) is continuous and monotonic, has no inflection point, and the derivative of f(x) is available. To find x such that f(x) = 0:
optimize.newton(f, 0)
0.6190612867359452
Oh easy.
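Note that the call above does not pass a derivative, in which case scipy.optimize.newton falls back to the secant method. Since the text says Newton's method uses the derivative of f(x), you can also supply it explicitly via fprime; here f'(x) = e^x - 3.
# true Newton iteration with the analytic derivative f'(x) = exp(x) - 3
optimize.newton(f, 0, fprime=lambda x: math.exp(x) - 3)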
You can use "?" to check parameter descriptions and usage examples, although they are in English.
optimize.bisect?
optimize.newton?