Previous articles in this series:

- University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1)
- University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2)

Source code: https://github.com/legacyworld/sklearn-basic
The YouTube commentary is in the 4th video (1), around the 40-minute mark. The task: generate 30 training samples on $y = \cos(1.5\pi x)$ with noise of $N(0,1) \times 0.1$ added, then perform polynomial regression, fitting each degree in order from 1 to 20. Cross-validation enters the series from this task. The training data looks like this:
Source code
python:Homework_3.2.py
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import PolynomialFeatures as PF
from sklearn import linear_model
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score
DEGREE = 20
def true_f(x):
    return np.cos(1.5 * x * np.pi)
np.random.seed(0)
n_samples = 30
# x-axis data for plotting
x_plot = np.linspace(0, 1, 100)
# Training data
x_tr = np.sort(np.random.rand(n_samples))
y_tr = true_f(x_tr) + np.random.randn(n_samples) * 0.1
# Reshape to column vectors (matrices) for sklearn
X_tr = x_tr.reshape(-1, 1)
X_plot = x_plot.reshape(-1, 1)
for degree in range(1, DEGREE + 1):
    plt.scatter(x_tr, y_tr, label="Training Samples")
    plt.plot(x_plot, true_f(x_plot), label="True")
    plt.xlim(0, 1)
    plt.ylim(-2, 2)
    filename = f"{degree}.png"
    # Chain the polynomial feature expansion and the linear regression
    pf = PF(degree=degree, include_bias=False)
    linear_reg = linear_model.LinearRegression()
    steps = [("Polynomial_Features", pf), ("Linear_Regression", linear_reg)]
    pipeline = Pipeline(steps=steps)
    pipeline.fit(X_tr, y_tr)
    plt.plot(x_plot, pipeline.predict(X_plot), label="Model")
    # Training error: MSE on the data the model was fitted to
    y_predict = pipeline.predict(X_tr)
    mse = mean_squared_error(y_tr, y_predict)
    # Test error: 10-fold cross-validation (scores are negative MSE)
    scores = cross_val_score(pipeline, X_tr, y_tr, scoring="neg_mean_squared_error", cv=10)
    plt.title(f"Degree: {degree} TrainErr: {mse:.2e} TestErr: {-scores.mean():.2e}(+/- {scores.std():.2e})")
    plt.legend()
    plt.savefig(filename)
    plt.clf()
In the previous Task 3.1, I prepared $x, x^2, x^3$, and so on with PolynomialFeatures and then ran LinearRegression as a separate step, but I learned that the two can be done in one shot by using a Pipeline.
When I actually looked at the source code in the explanation video for Exercise 3.1, it also used a Pipeline.
There is nothing difficult about it: you just list the processing steps in `steps`.
steps = [("Polynomial_Features", pf), ("Linear_Regression", linear_reg)]
pipeline = Pipeline(steps=steps)
pipeline.fit(X_tr, y_tr)
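For comparison, here is a minimal sketch (my own illustration on made-up toy data, not the course code) of the Task 3.1 style manual two-step approach next to the Pipeline version; both produce identical predictions.

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

# Toy data (illustrative only)
X = np.linspace(0, 1, 10).reshape(-1, 1)
y = np.cos(1.5 * np.pi * X).ravel()

# Task 3.1 style: expand features explicitly, then fit on the expanded matrix
poly = PolynomialFeatures(degree=3, include_bias=False)
X_poly = poly.fit_transform(X)
manual = LinearRegression().fit(X_poly, y)

# Pipeline style: both steps wrapped into a single estimator
pipe = Pipeline([("poly", PolynomialFeatures(degree=3, include_bias=False)),
                 ("reg", LinearRegression())])
pipe.fit(X, y)

print(np.allclose(manual.predict(poly.transform(X)), pipe.predict(X)))  # True

A nice side effect is that the Pipeline can be handed as a single estimator to functions like cross_val_score, which then refit the whole chain on each fold.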
Apart from that, the difference from Task 3.1 is that cross-validation is added. It is this part of the program:
scores = cross_val_score(pipeline, X_tr, y_tr, scoring="neg_mean_squared_error", cv=10)
With `cv=10`, the data is split into 10 parts; one part at a time is held out as test data to evaluate the test error, while the model is trained on the remaining nine. Note that the `neg_mean_squared_error` scorer returns the negative MSE for each fold, which is why the title string flips the sign with `-scores.mean()`. A rough manual equivalent is sketched below.
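This sketch of what cross_val_score is doing internally is my own illustration; it assumes the unshuffled KFold splitter that scikit-learn uses by default for regression targets.

import numpy as np
from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline

rng = np.random.RandomState(0)
x = np.sort(rng.rand(30))
y = np.cos(1.5 * np.pi * x) + rng.randn(30) * 0.1
X = x.reshape(-1, 1)

pipe = Pipeline([("poly", PolynomialFeatures(degree=3, include_bias=False)),
                 ("reg", LinearRegression())])

fold_mse = []
for train_idx, test_idx in KFold(n_splits=10).split(X):
    pipe.fit(X[train_idx], y[train_idx])   # train on the other 9 folds
    pred = pipe.predict(X[test_idx])       # evaluate on the held-out fold
    fold_mse.append(mean_squared_error(y[test_idx], pred))

# cross_val_score with scoring="neg_mean_squared_error" returns -MSE per fold,
# so -scores.mean() in the script corresponds to this average:
print(np.mean(fold_mse))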
Basically, the model with the smaller test error is the better one.
Running the program creates 20 graph files, 1.png through 20.png.
- Minimum training error: degree 20
- Minimum test error: degree 3
From this we can see how harmful overfitting is: the degree-20 model fits the training data best, yet its cross-validated test error is far worse than that of the degree-3 model. A sketch that tabulates these errors follows.
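To see where those minima come from, here is a minimal sketch (my own addition, reusing the same data generation as the script above) that collects the training and cross-validated test errors over all 20 degrees:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import cross_val_score

np.random.seed(0)
x = np.sort(np.random.rand(30))
y = np.cos(1.5 * np.pi * x) + np.random.randn(30) * 0.1
X = x.reshape(-1, 1)

train_err, test_err = [], []
for degree in range(1, 21):
    pipe = Pipeline([("poly", PolynomialFeatures(degree=degree, include_bias=False)),
                     ("reg", LinearRegression())])
    pipe.fit(X, y)
    # Training error on the full training set
    train_err.append(mean_squared_error(y, pipe.predict(X)))
    # 10-fold cross-validated test error (sign flipped back to MSE)
    scores = cross_val_score(pipe, X, y, scoring="neg_mean_squared_error", cv=10)
    test_err.append(-scores.mean())

print("minimum training error at degree", np.argmin(train_err) + 1)
print("minimum test error at degree", np.argmin(test_err) + 1)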