Grid search and cross-validation are each covered in plenty of places, but I could not find one that shows how to run cross-validation based on the result of a grid search, so that is what I will introduce here.
First, get some data for machine learning. scikit-learn ships with several datasets, so it is a good place to start for the time being. The details of the bundled datasets are summarized on [this site](http://pythondatascience.plavox.info/scikit-learn/).
Dataset
from sklearn import datasets

# Get the dataset
iris = datasets.load_iris()
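If you want a feel for the data first, you can peek at its shape and labels. A minimal sketch (not in the original post), using the standard attributes of the Bunch object returned by load_iris:

# Inspect the dataset: 150 samples, 4 features, 3 classes
print iris.data.shape       # (150, 4)
print iris.feature_names    # sepal/petal length and width
print iris.target_names     # ['setosa' 'versicolor' 'virginica']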
Next, run a grid search: set the candidate parameters and execute. One of the attractions of scikit-learn is how easily you can grid search. It sounds cool to call the parameters "hyperparameters" ♪
Grid search
# Set the parameters for the grid search
parameters = {
    'C': [1, 3, 5],
    'loss': ('hinge', 'squared_hinge')
}

# Run the grid search
clf = grid_search.GridSearchCV(svm.LinearSVC(), parameters)
clf.fit(iris.data, iris.target)

# Get the grid search result (optimal parameters)
GS_loss, GS_C = clf.best_params_.values()
print "Optimal parameters:{}".format(clf.best_params_)
The optimal parameters are assigned to `GS_loss` and `GS_C`, respectively. Before unpacking `best_params_.values()` like this, it is better to print `best_params_` once and check the order of its entries: a Python dict does not guarantee any particular order, and here it did not match the parameter order on the official site (sklearn.svm.LinearSVC)...
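A more robust way is to look each value up by key instead of relying on dict order; this sketch (my addition, not in the original post) is equivalent to the unpacking above but order-independent:

# Fetch each optimal parameter by name, so dict ordering no longer matters
GS_C = clf.best_params_['C']
GS_loss = clf.best_params_['loss']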
Finally, run cross-validation using the result of the grid search.
Cross-validation
# Run cross-validation with the optimal parameters
clf = svm.LinearSVC(loss=GS_loss, C=GS_C)
score = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)

# Display the cross-validation results
print "Accuracy (mean):{}".format(score.mean())
print "Accuracy (min):{}".format(score.min())
print "Accuracy (max):{}".format(score.max())
print "Accuracy (std):{}".format(score.std())
print "Accuracy (all):{}".format(score)
Full code
# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn import svm
from sklearn import grid_search
from sklearn import cross_validation

# main
if __name__ == "__main__":
    # Get the dataset
    iris = datasets.load_iris()

    # Set the parameters for the grid search
    parameters = {
        'C': [1, 3, 5],
        'loss': ('hinge', 'squared_hinge')
    }

    # Run the grid search
    clf = grid_search.GridSearchCV(svm.LinearSVC(), parameters)
    clf.fit(iris.data, iris.target)

    # Get the grid search result (optimal parameters)
    GS_loss, GS_C = clf.best_params_.values()
    print "Optimal parameters:{}".format(clf.best_params_)

    # Run cross-validation with the optimal parameters
    clf = svm.LinearSVC(loss=GS_loss, C=GS_C)
    score = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)

    # Display the cross-validation results
    print "Accuracy (mean):{}".format(score.mean())
    print "Accuracy (min):{}".format(score.min())
    print "Accuracy (max):{}".format(score.max())
    print "Accuracy (std):{}".format(score.std())
    print "Accuracy (all):{}".format(score)
Execution result
Optimal parameters:{'loss': 'squared_hinge', 'C': 1}
Accuracy (mean):0.966666666667
Accuracy (min):0.9
Accuracy (max):1.0
Accuracy (std):0.0421637021356
Accuracy (all):[ 1. 1. 0.93333333 0.9 1. ]
It was a bit anticlimactic that the grid search picked the same values as the LinearSVC() defaults, but at least I managed to cross-validate using the grid search result. I'm allergic to English, so learning this from the official site was a struggle.
References

[Dataset included with scikit-learn](http://pythondatascience.plavox.info/scikit-learn/)
[Official site (sklearn.svm.LinearSVC)](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html)
Introduction of the Python machine learning library scikit-learn