○ The main points of this article
This article reproduces overfitting. Overfitting: the model can handle the training data, but it cannot handle unknown data; in other words, it has no ability to generalize.
How to check whether a model is overfitting
from sklearn.datasets import load_boston
from sklearn.svm import SVR
from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
%matplotlib inline
# Prepare the data: Boston house prices
data = load_boston()
X = data.data[:, [5,]] # Use only the number of rooms (RM) as the explanatory variable
y = data.target
# Split into training data and test data
train_X, test_X = X[:400], X[400:]
train_y, test_y = y[:400], y[400:]
# Train an SVR (support vector machine with a kernel method) with the chosen hyperparameters
model_s = SVR(C=1.0, kernel='rbf') # RBF kernel with regularization parameter C=1.0
model_s.fit(train_X, train_y)
# Predict on the training data
s_pred = model_s.predict(train_X)
# Predict on the test data (prediction for unknown data)
s_pred_t = model_s.predict(test_X)
# Plot the results
fig, ax = plt.subplots()
ax.scatter(train_X, train_y, color='red', marker='s', label='data')
ax.plot(train_X, s_pred, color='blue', label='svr_rbf curve(train)')
ax.plot(test_X, s_pred_t, color='orange', label='svr_rbf curve(test)')
ax.legend()
plt.show()
print("○ Mean square error and coefficient of determination of training data")
print(mean_squared_error(train_y, s_pred))
print(r2_score(train_y, s_pred))
print("○ Mean square error and coefficient of determination of test data")
print(mean_squared_error(test_y, s_pred_t))
print(r2_score(test_y, s_pred_t))
Result:
○ Mean squared error and coefficient of determination of the training data
30.330756428515905
0.6380880725968641
○ Mean squared error and coefficient of determination of the test data
69.32813164021485
-1.4534559402985217
The line for the training data (blue) is drawn fairly nicely, but the line for the test data (orange) is poor. The same thing is clear from the mean squared error and the coefficient of determination: the test scores are much worse than the training scores. This is overfitting.
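As a reference, the code above imports train_test_split, which splits the data randomly instead of slicing it by index. A minimal sketch of that approach, reusing X, y, and the same SVR settings (the test_size and random_state values here are arbitrary assumptions):

from sklearn.svm import SVR
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

# Randomly hold out 20% of the samples as the test set (assumed split ratio)
train_X, test_X, train_y, test_y = train_test_split(X, y, test_size=0.2, random_state=0)
model = SVR(C=1.0, kernel='rbf')
model.fit(train_X, train_y)
# Compare the test scores with the training scores to check for overfitting
print(mean_squared_error(test_y, model.predict(test_X)))
print(r2_score(test_y, model.predict(test_X)))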
There are various ways to prevent overfitting; I will explain them in more detail another time.
・Increase the amount of training data
・Perform cross-validation (a sketch of this and of hyperparameter adjustment follows this list)
・Adjust the hyperparameters (make the model simpler)
・Reduce the number of features
・Apply regularization
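As a concrete illustration of cross-validation and hyperparameter adjustment, here is a minimal sketch using scikit-learn's cross_val_score and GridSearchCV with the same SVR model; the candidate values for C and gamma are arbitrary assumptions, not tuned recommendations.

from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score, GridSearchCV

# 5-fold cross-validation of the current model (the default score for SVR is R^2)
scores = cross_val_score(SVR(C=1.0, kernel='rbf'), X, y, cv=5)
print(scores.mean())

# Grid search over an assumed set of candidate hyperparameters
param_grid = {'C': [0.1, 1.0, 10.0], 'gamma': ['scale', 0.1, 1.0]}
grid = GridSearchCV(SVR(kernel='rbf'), param_grid, cv=5)
grid.fit(train_X, train_y)
print(grid.best_params_)
print(grid.best_score_)

In SVR, a larger C fits the training data more closely, so lowering C strengthens the regularization and makes the model simpler, which also connects to the regularization item in the list above.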