Grid search and cross-validation are each covered in plenty of places, but I could not find one that shows how to run cross-validation based on the result of a grid search, so that is what I will introduce here.
First, get some data for machine learning. scikit-learn ships with several datasets, so it is a good place to start for the time being. The details of the bundled datasets are summarized on [this site](http://pythondatascience.plavox.info/scikit-learn/).
Dataset
from sklearn import datasets

# Get the dataset
iris = datasets.load_iris()
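If you want a feel for the data first, you can peek at its shape and labels. A minimal sketch (not in the original post), using the standard attributes of the Bunch object returned by load_iris:

# Inspect the dataset: 150 samples, 4 features, 3 classes
print iris.data.shape       # (150, 4)
print iris.feature_names    # sepal/petal length and width
print iris.target_names     # ['setosa' 'versicolor' 'virginica']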
Next, run a grid search: set the candidate parameters and execute. One of the attractions of scikit-learn is how easily you can grid search. It sounds cool to call the parameters "hyperparameters" ♪
Grid search
# Set the parameters for the grid search
parameters = {
    'C': [1, 3, 5],
    'loss': ('hinge', 'squared_hinge')
}

# Run the grid search
clf = grid_search.GridSearchCV(svm.LinearSVC(), parameters)
clf.fit(iris.data, iris.target)

# Get the grid search result (optimal parameters)
GS_loss, GS_C = clf.best_params_.values()
print "Optimal parameters:{}".format(clf.best_params_)
The optimal parameters are assigned to `GS_loss` and `GS_C`, respectively. Before unpacking `best_params_.values()` like this, it is better to print `best_params_` once and check the order of its entries: a Python dict does not guarantee any particular order, and here it did not match the parameter order on the official site (sklearn.svm.LinearSVC)...
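A more robust way is to look each value up by key instead of relying on dict order; this sketch (my addition, not in the original post) is equivalent to the unpacking above but order-independent:

# Fetch each optimal parameter by name, so dict ordering no longer matters
GS_C = clf.best_params_['C']
GS_loss = clf.best_params_['loss']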
Finally, run cross-validation using the result of the grid search.
Cross-validation
# Run cross-validation with the optimal parameters
clf = svm.LinearSVC(loss=GS_loss, C=GS_C)
score = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)

# Display the cross-validation results
print "Accuracy (mean):{}".format(score.mean())
print "Accuracy (min):{}".format(score.min())
print "Accuracy (max):{}".format(score.max())
print "Accuracy (std):{}".format(score.std())
print "Accuracy (all):{}".format(score)
Full code
# -*- coding: utf-8 -*-
from sklearn import datasets
from sklearn import svm
from sklearn import grid_search
from sklearn import cross_validation

# main
if __name__ == "__main__":
    # Get the dataset
    iris = datasets.load_iris()

    # Set the parameters for the grid search
    parameters = {
        'C': [1, 3, 5],
        'loss': ('hinge', 'squared_hinge')
    }

    # Run the grid search
    clf = grid_search.GridSearchCV(svm.LinearSVC(), parameters)
    clf.fit(iris.data, iris.target)

    # Get the grid search result (optimal parameters)
    GS_loss, GS_C = clf.best_params_.values()
    print "Optimal parameters:{}".format(clf.best_params_)

    # Run cross-validation with the optimal parameters
    clf = svm.LinearSVC(loss=GS_loss, C=GS_C)
    score = cross_validation.cross_val_score(clf, iris.data, iris.target, cv=5)

    # Display the cross-validation results
    print "Accuracy (mean):{}".format(score.mean())
    print "Accuracy (min):{}".format(score.min())
    print "Accuracy (max):{}".format(score.max())
    print "Accuracy (std):{}".format(score.std())
    print "Accuracy (all):{}".format(score)
Execution result
Optimal parameters:{'loss': 'squared_hinge', 'C': 1}
Accuracy (mean):0.966666666667
Accuracy (min):0.9
Accuracy (max):1.0
Accuracy (std):0.0421637021356
Accuracy (all):[ 1. 1. 0.93333333 0.9 1. ]
It was a bit anticlimactic that the grid search picked the same values as the LinearSVC() defaults, but at least I managed to cross-validate using the grid search result. I'm allergic to English, so learning this from the official site was a struggle.
References

[Dataset included with scikit-learn](http://pythondatascience.plavox.info/scikit-learn/)
[Official site (sklearn.svm.LinearSVC)](https://scikit-learn.org/stable/modules/generated/sklearn.svm.LinearSVC.html)
Introduction of the Python machine learning library scikit-learn