In machine learning, you typically split the data into two parts, one for training and one for testing, and measure accuracy on the held-out part. KFold makes this kind of splitting easy. There are also StratifiedKFold and ShuffleSplit, but this time I will use KFold.
KFold divides the data into k folds. For example, if you split it into 10 folds, 9 are used for training and the remaining 1 for testing. The test is then repeated 10 times so that each of the 10 folds is used exactly once for testing. The default code looks like this:
KFold
from sklearn.model_selection import KFold
KFold(n_splits=3, shuffle=False, random_state=None)
The parameters of KFold are as follows.
n_splits: Specifies how many folds to split the data into. The default is 3. Cross-validation is repeated this many times.
shuffle: The default is False. Setting it to True shuffles the data before splitting, so each fold is drawn randomly from the dataset instead of being a block of consecutive rows.
random_state: Seeds the random number generator (it only takes effect when shuffle=True), so the same splits are produced every run.
For details, see the official documentation.
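To get a feel for these parameters, here is a minimal sketch (the ten-element array is dummy data made up for this illustration):
kfold_example
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(10)  #dummy data: ten samples, indices 0-9

#5 folds, shuffled reproducibly via random_state
kf = KFold(n_splits=5, shuffle=True, random_state=0)

#split() yields the train/test indices for each of the 5 rounds
for train_idx, test_idx in kf.split(X):
    print("train:", train_idx, "test:", test_idx)
Each sample lands in exactly one test fold, which is the rotation described above.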
cross_val_score is a convenient function: you give it a classifier, the data, and the labels, and it evaluates accuracy by cross-validation. Below is the default code.
cross_val_score
from sklearn.model_selection import cross_val_score
cross_val_score(estimator, X, y=None, groups=None, scoring=None, cv=None, n_jobs=1, verbose=0, fit_params=None, pre_dispatch='2*n_jobs')
Below is a description of the main parameters.
estimator: Specifies the classifier (or other estimator) to evaluate.
X: The feature data to fit.
y: The target labels to predict.
scoring: Specifies how to score. In addition to accuracy, there are average_precision, f1, and others; see the official documentation for the full list.
cv: Short for cross-validation. Specifies how to split the data; you can pass an integer (a number of folds) or a splitter object such as KFold.
For details, see the official documentation.
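As a minimal sketch of how these parameters fit together (the bundled iris dataset here is just a stand-in for your own data):
cross_val_score_example
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)  #stand-in feature data and labels

#estimator = KNeighborsClassifier, cv = 10-fold KFold, scoring = accuracy
kfold = KFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(KNeighborsClassifier(), X, y, cv=kfold, scoring="accuracy")
print(scores.mean())  #average accuracy over the 10 folds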
This is a way to apply several major machine learning methods in one shot. This time we will use DecisionTreeClassifier, KNeighborsClassifier, and SVC. First, here is the code. Note that we assume the data has already been split into training and test sets, so X_train and Y_train already exist.
machine_learning
from sklearn.model_selection import KFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

#Store machine learning models in a list
models = []
models.append(("KNC", KNeighborsClassifier()))
models.append(("DTC", DecisionTreeClassifier()))
models.append(("SVM", SVC()))

#Apply each classifier with 10-fold cross-validation
results = []
names = []
for name, model in models:
    #shuffle=True is needed for random_state to take effect
    kfold = KFold(n_splits=10, shuffle=True, random_state=42)
    result = cross_val_score(model, X_train, Y_train, cv=kfold, scoring="accuracy")
    names.append(name)
    results.append(result)

#Score display of applied classifiers
for i in range(len(names)):
    print(names[i], results[i].mean())
The result should look something like this:
KNC 0.88 DTC 0.91 SVM 0.79
The exact numbers depend on your data, but it should come out roughly like that.
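Since cross_val_score returns an array of ten scores per model, it can also be worth printing the standard deviation to see how stable each model is. A small variation on the display loop above:
score_spread
#Mean and spread of the 10 fold scores for each model
for i in range(len(names)):
    print(names[i], results[i].mean(), results[i].std())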