Hello. This is Hayashi @ Ienter.
In the previous blog post, we performed color reduction on an image using scikit-learn's k-means algorithm and OpenCV.
This time, we will use the handwritten digit data sample included with scikit-learn to run a quick performance check of several classifiers.
A sample of handwritten digit data is included in scikit-learn's datasets module, so let's load it.
The explanatory variable X is an array of image data for the digits 0 to 9, and the objective variable Y is the array of digit labels (0 to 9) corresponding to each image.
The first entry of X is an array of 64 numbers.
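The loading step above can be sketched as follows. This is a minimal example using scikit-learn's `load_digits`; the variable names `X` and `Y` match the text, but the exact code the author used is not shown in the original.

```python
# Load the handwritten digits bundled with scikit-learn.
from sklearn.datasets import load_digits

digits = load_digits()
X = digits.data    # shape (1797, 64): each row is a flattened 8x8 image
Y = digits.target  # shape (1797,): the digit label 0-9 for each image

print(X.shape, Y.shape)
print(X[0])  # the first image as an array of 64 numbers
```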
Each 64-element array is actually an 8x8 image, so let's reshape the arrays and display the first 20 entries. Each one is rendered as a grayscale pixel image.
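A sketch of that display step is shown below. It uses `digits.images`, which scikit-learn already provides in 8x8 form (equivalent to reshaping `digits.data`); the grid layout and output filename are my own choices, not from the original.

```python
# Show the first 20 digits as 8x8 grayscale images.
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs headless
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits

digits = load_digits()

fig, axes = plt.subplots(2, 10, figsize=(10, 2.5))
for ax, image, label in zip(axes.ravel(), digits.images[:20], digits.target[:20]):
    ax.imshow(image, cmap=plt.cm.gray_r)  # grayscale pixel image
    ax.set_title(label)
    ax.axis("off")
fig.savefig("digits_sample.png")
```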
This time, we will evaluate the accuracy of each classifier using K-fold cross-validation. K-fold cross-validation divides the sample set into K blocks, uses K-1 blocks as training data and the remaining block as test data, and repeats the evaluation while rotating the test block from the 1st through the Kth. scikit-learn provides a KFold class for cross-validation. Here, we prepare a KFold that divides the sample data into 10 parts.
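A minimal sketch of the 10-part KFold described above, using the current `sklearn.model_selection` module (older versions of scikit-learn exposed KFold from `sklearn.cross_validation`; the shuffle settings here are assumptions, not from the original):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import KFold

X = load_digits().data

# 10-fold split: 9 blocks for training, 1 for testing, rotating the test block.
kf = KFold(n_splits=10, shuffle=True, random_state=0)

n_folds = 0
for train_idx, test_idx in kf.split(X):
    n_folds += 1
print(n_folds, len(train_idx), len(test_idx))
```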
Check the performance of the following classifiers:

- LogisticRegression ([logistic regression](https://ja.wikipedia.org/wiki/ロジスティック回帰))
- GaussianNB ([naive Bayes](https://ja.wikipedia.org/wiki/ナイーブベイズ分類器))
- SVC ([support vector machine](https://ja.wikipedia.org/wiki/サポートベクターマシン))
- DecisionTreeClassifier ([decision tree](https://ja.wikipedia.org/wiki/決定木))
- RandomForestClassifier ([random forest](https://ja.wikipedia.org/wiki/ランダムフォレスト))
- AdaBoostClassifier (AdaBoost)
- KNeighborsClassifier ([k-nearest neighbors](https://ja.wikipedia.org/wiki/K近傍法))
For SVC, we check three kernel types: "rbf" (Gaussian kernel), "linear" (linear kernel), and "poly" (polynomial kernel).
Prepare an array whose elements are pairs of a classifier name and instance, as shown below.
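One way to build that array is sketched below. The classifier set and the three SVC kernels follow the text; the constructor arguments (e.g. `max_iter` for LogisticRegression, added so it converges on this dataset) are my assumptions, since the original code is not shown.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
from sklearn.neighbors import KNeighborsClassifier

# (name, instance) pairs for every classifier under test.
classifiers = [
    ("LogisticRegression",     LogisticRegression(max_iter=10000)),
    ("GaussianNB",             GaussianNB()),
    ("SVC-rbf",                SVC(kernel="rbf")),
    ("SVC-linear",             SVC(kernel="linear")),
    ("SVC-poly",               SVC(kernel="poly")),
    ("DecisionTreeClassifier", DecisionTreeClassifier()),
    ("RandomForestClassifier", RandomForestClassifier()),
    ("AdaBoostClassifier",     AdaBoostClassifier()),
    ("KNeighborsClassifier",   KNeighborsClassifier()),
]
```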
We evaluate each classifier on accuracy and analysis speed. For accuracy, the 10 prediction tests from the K-fold are scored with accuracy_score from sklearn.metrics and averaged. For analysis speed, we measure the time taken from learning (fit) to prediction (predict) and likewise take the average.
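The evaluation loop described above can be sketched like this. To keep the example self-contained it runs only two of the classifiers as a stand-in for the full list; the timing spans fit through predict, as in the text, but the exact measurement code is my assumption.

```python
import time
import numpy as np
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import KFold
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

digits = load_digits()
X, Y = digits.data, digits.target
kf = KFold(n_splits=10, shuffle=True, random_state=0)

# Two classifiers as a minimal stand-in for the full list in the article.
classifiers = [("GaussianNB", GaussianNB()),
               ("KNeighborsClassifier", KNeighborsClassifier())]

for name, clf in classifiers:
    scores, times = [], []
    for train_idx, test_idx in kf.split(X):
        start = time.time()
        clf.fit(X[train_idx], Y[train_idx])   # learning
        pred = clf.predict(X[test_idx])       # prediction
        times.append(time.time() - start)     # fit-to-predict time for this fold
        scores.append(accuracy_score(Y[test_idx], pred))
    print("%-22s accuracy=%.4f time=%.4fs" % (name, np.mean(scores), np.mean(times)))
```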
The following result was output.
The three SVC kernels (support vector machine) and KNeighborsClassifier (k-nearest neighbors) all produce good scores.
SVC-rbf has the highest accuracy, but its analysis takes some time. KNeighborsClassifier is second in accuracy, yet its analysis is about four times faster than SVC-rbf.
Weighing accuracy and speed together, KNeighborsClassifier is probably the best-performing classifier in this test.
That's all for this post!