Perform least squares fitting with numpy.

I would like to perform a least squares fit of a response variable y explained by a single variable x, using numpy, Python's matrix operation library.

First, use numpy to generate noisy data from a cubic function.

Drawing the graph


#Module import
import numpy as np
import matplotlib.pyplot as plt

# Explanatory variable (1D)
x = np.arange(-3, 7, 0.5)
# Response variable: a cubic function of the explanatory variable
# with randomly generated coefficients
y = (10 * np.random.rand() + x * np.random.rand()
     + 2 * np.random.rand() * x**2 + x**3)

#drawing
plt.scatter(x,y)
plt.show()

(Figure: raw data graph)

In the least squares method, the squared L2 norm of the difference between the data and the predicted values is interpreted as the error, and we find the coefficients of the regression curve that minimize it. If the data we want to predict is $y$ and the regression curve is $p$, the error is

Error = \sum_{i=1}^{N}(y_i - p_i)^2

Minimizing this error is the goal of least squares regression.
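As a minimal sketch (with made-up toy numbers, not the data generated above), this squared error can be computed directly with numpy:

```python
import numpy as np

# Toy values for illustration
y = np.array([1.0, 2.0, 3.0, 4.0])   # observed values y_i
p = np.array([1.1, 1.9, 3.2, 3.8])   # predicted values p_i

# Error = sum over i of (y_i - p_i)^2
error = np.sum((y - p) ** 2)
print(error)  # approximately 0.1
```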

Also, $p_i$ is expressed as an $n$-th order polynomial as follows.

Linear expression\\
p_i = a_1 x_i + a_0 \\
Quadratic expression\\
p_i = a_2 x_i^2 + a_1 x_i + a_0 \\
Cubic expression\\
p_i = a_3 x_i^3 + a_2 x_i^2 + a_1 x_i + a_0 \\
n-th order expression\\
p_i = a_n x_i^n + \dots + a_2 x_i^2 + a_1 x_i + a_0\\
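For example, an $n$-th order polynomial can be evaluated from its coefficients with np.polyval, which expects the coefficients ordered from the highest degree down (the same ordering np.polyfit returns). The coefficients below are hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical coefficients (a_2, a_1, a_0) = (2, 1, 10), highest degree first
coef = [2, 1, 10]
x = np.array([0.0, 1.0, 2.0])

# p(x) = 2x^2 + x + 10
p = np.polyval(coef, x)
print(p)  # [10. 13. 20.]
```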

This time, I would like to find the coefficients $A = (a_n, \dots, a_1, a_0)$ of the fitted equation using numpy's polyfit function (note that polyfit returns the coefficients in order of decreasing degree). Find the coefficients of the regression equation with polyfit, then substitute them into the $n$-th order equation to obtain the regression equation; when that gets tedious, the poly1d function is also convenient.
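Under the hood, polyfit solves a linear least-squares problem. A sketch of the equivalent computation, using a Vandermonde matrix and np.linalg.lstsq on synthetic data (the seed and coefficients here are assumptions for illustration, not the article's data):

```python
import numpy as np

# Synthetic cubic data with noise
rng = np.random.default_rng(0)
x = np.arange(-3, 7, 0.5)
y = x**3 + 2 * x**2 + x + 10 + rng.normal(size=x.size)

deg = 3
X = np.vander(x, deg + 1)                      # columns: x^3, x^2, x, 1
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # solve min ||X a - y||^2

# Agrees with np.polyfit up to numerical precision
print(np.allclose(coef, np.polyfit(x, y, deg)))  # True
```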

Fitting at each order and drawing the results



# Linear expression
coef_1 = np.polyfit(x, y, 1)  # coefficients
y_pred_1 = coef_1[0]*x + coef_1[1]  # fitted values

# Quadratic expression
coef_2 = np.polyfit(x, y, 2)
y_pred_2 = coef_2[0]*x**2 + coef_2[1]*x + coef_2[2]

# Cubic expression
coef_3 = np.polyfit(x, y, 3)
y_pred_3 = np.poly1d(coef_3)(x)  # np.poly1d automatically applies the coefficients coef_3 to the polynomial

# drawing

plt.scatter(x, y, label="raw_data")  # original data
plt.plot(x, y_pred_1, label="d=1")   # linear
plt.plot(x, y_pred_2, label="d=2")   # quadratic
plt.plot(x, y_pred_3, label="d=3")   # cubic
plt.legend(loc="upper left")
plt.title("least square fitting")
plt.show()

(Figure: graph after fitting)

This time, the fit from the cubic expression looks good. The higher the order, the smaller the error tends to be, but be careful of overfitting, where the model depends only on the particular dataset at hand.
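To see this tendency numerically, one can compare the residual sum of squares at increasing degrees on synthetic data (a sketch; the seed and noise scale are arbitrary assumptions). The training error never increases as the degree grows, which is exactly why a small error alone cannot rule out overfitting:

```python
import numpy as np

# Synthetic cubic data with noise
rng = np.random.default_rng(1)
x = np.arange(-3, 7, 0.5)
y = x**3 + 2 * x**2 + x + 10 + rng.normal(scale=5.0, size=x.size)

# Residual sum of squares for each fitted degree
rss = {d: np.sum((y - np.poly1d(np.polyfit(x, y, d))(x)) ** 2)
       for d in (1, 2, 3, 6)}
for d, e in rss.items():
    print(f"degree {d}: RSS = {e:.1f}")
```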
