When analyzing a small dataset, I used leave-one-out (LeaveOneOut) cross-validation to evaluate a model, so I am sharing the approach here.
Given n samples, LeaveOneOut cross-validation holds out one sample as test data and trains on the remaining n-1 samples. The held-out sample is then swapped, and this is repeated n times so that every sample is used as test data exactly once; the n errors are combined to evaluate the model. In terms of k-fold cross-validation, it is the case where k equals the number of samples n. It is typically used when the dataset is small, since the model has to be fitted n times.
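To make the splitting concrete, here is a minimal sketch on a toy array (the 4-sample array X is a made-up example, not data from this post). It prints the train/test indices produced by LeaveOneOut and checks that they match KFold with n_splits set to the number of samples.

```python
import numpy as np
from sklearn.model_selection import KFold, LeaveOneOut

# Hypothetical toy data: 4 samples, 1 feature.
X = np.arange(4).reshape(-1, 1)

# Collect (train indices, test indices) for every split.
loo_splits = [(tr.tolist(), te.tolist()) for tr, te in LeaveOneOut().split(X)]
kfold_splits = [(tr.tolist(), te.tolist()) for tr, te in KFold(n_splits=len(X)).split(X)]

# Each split holds out exactly one sample as test data.
for train_idx, test_idx in loo_splits:
    print("train:", train_idx, "test:", test_idx)

# Without shuffling, KFold with k = n yields the same splits as LeaveOneOut.
print(loo_splits == kfold_splits)
```

With n samples there are n splits, and each sample appears in the test position exactly once.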
Below, we evaluate a simple (single-variable) linear regression using the leave-one-out (LOO) method.
We assume a DataFrame in which the explanatory variable used for the simple regression is the column named by loo_column.
The DataFrame's mokuteki column is assumed to contain the objective variable.
The function trains n times while swapping the held-out sample, and finally calculates and returns the root mean squared error (RMSE).
statsmodels is used for the simple regression.
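For reference, the quantity computed by the code below can be written as follows. The notation is introduced here for illustration and is not from the original post: x_i and y_i are the explanatory and objective values of sample i, and the superscript (-i) marks coefficients fitted without sample i.

```latex
\hat{y}_{-i} = \hat{\beta}_0^{(-i)} + \hat{\beta}_1^{(-i)} x_i,
\qquad
\mathrm{RMSE}_{\mathrm{LOO}} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_{-i} - y_i \right)^2}
```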
loo.py
import numpy as np
from sklearn.model_selection import LeaveOneOut
from statsmodels import api as sm

loo_column = "setsumei"

def loo_rmse(df, loo_column):
    loo_X = df[loo_column]
    # Add the constant (intercept) term for the simple regression.
    loo_X = sm.add_constant(loo_X)
    # Objective variable
    loo_y = df["mokuteki"]
    loo = LeaveOneOut()
    loo.get_n_splits(loo_X)  # number of splits (equal to n); the value is not used below
    # List to store the squared errors
    se_list = list()
    # Loop over the splits, swapping the indexes used for train and test
    for train_index, test_index in loo.split(loo_X):
        X_train, X_test = loo_X.iloc[train_index], loo_X.iloc[test_index]
        y_train, y_test = loo_y.iloc[train_index], loo_y.iloc[test_index]
        # Fit the simple regression on the training data
        model = sm.OLS(y_train, X_train)
        result = model.fit()
        # Predict the held-out sample from the fitted parameters and get the error
        pred = result.params["const"] + result.params[loo_column] * X_test[loo_column].values[0]
        diff = pred - y_test.values[0]
        # Square the error and store it
        se_list.append(diff**2)
    # Average the squared errors, take the square root, and return it
    ar = np.array(se_list)
    rmse = np.sqrt(ar.mean())
    print("RMSE:", rmse)
    return rmse
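As a cross-check, the same leave-one-out RMSE can also be obtained with scikit-learn alone. The sketch below is an alternative to the statsmodels code above, not a copy of it: it swaps in LinearRegression (which fits an intercept by default) together with cross_val_score, and runs on synthetic data. The data, the random seed, and the name loo_rmse_sklearn are assumptions made for illustration; only the column names setsumei and mokuteki come from this post.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Hypothetical data: a noisy linear relation between the two columns.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 12)
df = pd.DataFrame({
    "setsumei": x,                                          # explanatory variable
    "mokuteki": 2.0 * x + 1.0 + rng.normal(0.0, 0.5, x.size),  # objective variable
})

# One score per held-out sample. With a single test sample per split,
# "neg_mean_squared_error" is the negated squared error of that sample.
scores = cross_val_score(
    LinearRegression(),
    df[["setsumei"]],
    df["mokuteki"],
    cv=LeaveOneOut(),
    scoring="neg_mean_squared_error",
)

# Average the squared errors over all splits, then take the square root.
loo_rmse_sklearn = float(np.sqrt(-scores.mean()))
print("RMSE:", loo_rmse_sklearn)
```

Averaging the squared errors before taking the square root matters here: taking the root of each split's error first would give the mean absolute error instead.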
http://scikit-learn.org/stable/modules/generated/sklearn.model_selection.LeaveOneOut.html
Thank you very much.