Continued from the previous post: University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1) https://github.com/legacyworld/sklearn-basic
The task is to create training data by adding noise $N(0,1) \times 0.1$ to $y = \sin(x)$ and to fit it with polynomial regression. The explanation is in lecture 3 (1), at around 56 minutes 40 seconds.
python:Homework_3.1.py
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import copy
from sklearn.preprocessing import PolynomialFeatures as PF
from sklearn import linear_model
from sklearn.metrics import mean_squared_error
#Number of training data
NUM_TR = 6
np.random.seed(0)
rng = np.random.RandomState(0)
#X-axis data for drawing
x_plot = np.linspace(0,10,100)
#Training data
tmp = copy.deepcopy(x_plot)
rng.shuffle(tmp)
x_tr = np.sort(tmp[:NUM_TR])
y_tr = np.sin(x_tr) + 0.1*np.random.randn(NUM_TR)
#Convert to Matrix
X_tr = x_tr.reshape(-1,1)
X_plot = x_plot.reshape(-1,1)
#Data for polynomials
#Degree
degree = 1
pf = PF(degree=degree)
X_poly = pf.fit_transform(X_tr)
X_plot_poly = pf.transform(X_plot)
model = linear_model.LinearRegression()
model.fit(X_poly,y_tr)
fig = plt.figure()
plt.scatter(x_tr,y_tr,label="training Samples")
plt.plot(x_plot,model.predict(X_plot_poly),label=f"degree = {degree}")
plt.legend()
plt.ylim(-2,2)
fig.savefig(f"{degree}.png")
#Data for polynomials
#All orders
fig = plt.figure()
plt.scatter(x_tr,y_tr,label="Training Samples")
for degree in range(1,NUM_TR):
    pf = PF(degree=degree)
    X_poly = pf.fit_transform(X_tr)
    X_plot_poly = pf.transform(X_plot)
    model = linear_model.LinearRegression()
    model.fit(X_poly,y_tr)
    plt.plot(x_plot,model.predict(X_plot_poly),label=f"degree {degree}")
    plt.legend()
    #Error on the training data
    mse = mean_squared_error(y_tr,model.predict(X_poly))
    print(f"degree = {degree} mse = {mse}")
plt.xlim(0,10)
plt.ylim(-2,2)
fig.savefig('all_degree.png')
Two sets of data are prepared: one for fitting the regression (x_tr) and one for drawing the graph (x_plot).
A plain assignment such as tmp = x_plot does not copy the actual data; both names refer to the same array.
In that case rng.shuffle(tmp) would shuffle x_plot in place as well, and the graph would be drawn strangely.
So I use deepcopy.
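The difference between an alias and a copy can be checked with a minimal sketch like the one below. The names `alias`, `x_plot_still_sorted` and `x_plot_sorted_after_alias` are introduced here only for illustration and are not part of the homework script.

```python
import copy
import numpy as np

x_plot = np.linspace(0, 10, 100)

# Plain assignment: both names refer to the same array object.
alias = x_plot
print(alias is x_plot)

# A copy owns its own data, so shuffling it leaves x_plot untouched.
tmp = copy.deepcopy(x_plot)  # x_plot.copy() would work equally well
rng = np.random.RandomState(0)
rng.shuffle(tmp)
x_plot_still_sorted = bool(np.all(np.diff(x_plot) > 0))
print(x_plot_still_sorted)

# Shuffling through the alias reorders x_plot itself.
rng.shuffle(alias)
x_plot_sorted_after_alias = bool(np.all(np.diff(x_plot) > 0))
print(x_plot_sorted_after_alias)
```

For a one-dimensional float array, `x_plot.copy()` and `copy.deepcopy(x_plot)` should give the same result; either one protects the drawing data from the in-place shuffle.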
The original data is 100 equally spaced points on the interval 0-10.
Only NUM_TR of them (6 in the course) are selected at random as the training data.
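As noise, a random number drawn from the standard normal distribution $N(0,1)$ and multiplied by 0.1 is added to sin(x_tr). Note that np.random.randn returns normally distributed values, which can be negative, not values between 0 and 1.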
Since the seeds are fixed at the beginning, the same result is obtained no matter how many times the script is executed.
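Both points about the noise can be checked with a short sketch. The large sample `big` and its separate seed are assumptions added purely for illustration; the homework script only draws NUM_TR values.

```python
import numpy as np

NUM_TR = 6

# Re-seeding with the same value reproduces exactly the same noise.
np.random.seed(0)
noise_a = 0.1 * np.random.randn(NUM_TR)
np.random.seed(0)
noise_b = 0.1 * np.random.randn(NUM_TR)
same = np.array_equal(noise_a, noise_b)
print(same)

# A large sample shows the shape of the noise: mean near 0,
# standard deviation near 0.1, and both signs occur.
big = 0.1 * np.random.RandomState(1).randn(100_000)
print(big.mean(), big.std(), big.min(), big.max())
```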
This is the prepared data
What differs from the previous posts is the PolynomialFeatures part. It prepares features such as $x, x^2, x^3, x^4$ according to the degree of the polynomial. For example, with degree = 3 it looks like this.
degree = 3
pf = PF(degree=degree)
X_poly = pf.fit_transform(X_tr)
print(f"degree = {degree}\nX_Tr = {X_tr}\nX_poly = {X_poly}")
The execution result is
degree = 3
X_Tr = [[0.2020202 ]
[2.62626263]
[5.55555556]
[7.57575758]
[8.68686869]
[9.39393939]]
X_poly = [[1.00000000e+00 2.02020202e-01 4.08121620e-02 8.24488122e-03]
[1.00000000e+00 2.62626263e+00 6.89725538e+00 1.81140040e+01]
[1.00000000e+00 5.55555556e+00 3.08641975e+01 1.71467764e+02]
[1.00000000e+00 7.57575758e+00 5.73921028e+01 4.34788658e+02]
[1.00000000e+00 8.68686869e+00 7.54616876e+01 6.55525771e+02]
[1.00000000e+00 9.39393939e+00 8.82460973e+01 8.28978490e+02]]
In the first row, the original training input is $x = 2.020202 \times 10^{-1}$ and $x^2 = 4.08 \times 10^{-2}$. The point is that $x^2$ and $x^3$ are treated as separate features.
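To make the column layout concrete, the same matrix can be built by hand with NumPy. This is only a sketch of what the output above contains (a leading column of ones for the constant term, followed by $x, x^2, x^3$), not how PolynomialFeatures is implemented; `X_poly_manual` is a name introduced here, and the inputs are the first three rounded values from the printed X_Tr.

```python
import numpy as np

# First three training inputs, copied (rounded) from the printed X_Tr.
x = np.array([0.2020202, 2.62626263, 5.55555556])
degree = 3

# Columns are x**0, x**1, ..., x**degree, in that order.
X_poly_manual = np.vander(x, degree + 1, increasing=True)
print(X_poly_manual)

# Each column is simply a power of the original feature.
for k in range(degree + 1):
    assert np.allclose(X_poly_manual[:, k], x ** k)
```

Linear regression then just assigns one coefficient to each column, which is why a model that is linear in the features can fit a curve in $x$.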
Next, let's fit a first-degree polynomial (a straight line). The result is this.
Finally, vary the degree from 1 to 5 and draw each fit on the same graph.
These are the errors (MSE on the training data) for each degree. The numbers did not match by degree = 5.
degree = 1 mse = 0.33075005001856256
degree = 2 mse = 0.3252271169458752
degree = 3 mse = 0.30290034474812344
degree = 4 mse = 0.010086018410257538
degree = 5 mse = 3.1604543144050787e-22
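The near-zero error at degree = 5 is expected: a degree-5 polynomial has six coefficients, as many as there are training points, so it can pass through every point. The sketch below illustrates this with NumPy only, using np.linalg.lstsq on a hand-built power matrix as a stand-in for LinearRegression. The inputs are the rounded x_tr values printed earlier, so the MSE values may differ slightly from the ones listed above.

```python
import numpy as np

# Rounded training inputs from the printed X_Tr (an approximation of x_tr).
x_tr = np.array([0.2020202, 2.62626263, 5.55555556,
                 7.57575758, 8.68686869, 9.39393939])
y_tr = np.sin(x_tr) + 0.1 * np.random.RandomState(0).randn(x_tr.size)

mses = {}
for degree in range(1, x_tr.size):
    # Columns x**0 ... x**degree, i.e. degree + 1 coefficients to fit.
    A = np.vander(x_tr, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(A, y_tr, rcond=None)
    mses[degree] = float(np.mean((A @ coef - y_tr) ** 2))
    print(f"degree = {degree} coefficients = {coef.size} mse = {mses[degree]}")
```

A training error of zero says nothing about how well the curve behaves between the points, which is why the degree-5 fit should be judged on the graph rather than by this number.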