Identify where the misclassification occurred to improve the accuracy of your data analysis results

That is the topic of this post.

Today we will use a confusion matrix to see where the misclassifications occur.



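from sklearn.datasets import load_iris
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Assumed setup (not shown in the original post): iris data with a held-out test split
X, Y = load_iris(return_X_y=True)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, random_state=0)

# Train a decision tree and predict the held-out labels
clf = DecisionTreeClassifier()
clf.fit(X_train, Y_train)
result = clf.predict(X_test)

# Rows are the true labels, columns are the predicted labels
cm = confusion_matrix(Y_test, result)
print(cm)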

With the iris dataset, plotting the confusion matrix gives a figure like the one below.

[Figure: confusion matrix for the iris dataset, taken from the official scikit-learn documentation]

The figure is a bit small and hard to read, but the y-axis shows the true label (the correct class) and the x-axis shows the predicted label (the class assigned by the machine learning model). Cells on the diagonal are correct predictions, and anything off the diagonal is a misclassification. In the figure above, the misclassifications appear in the middle row, on the right-hand side.

Once you know which classes are being confused, you may be able to improve accuracy by revisiting the data preprocessing or retuning the parameters of the machine learning model.
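To decide where to look first, it can help to list the off-diagonal cells of the confusion matrix explicitly. The helper below, `misclassified_pairs`, is a hypothetical name introduced here for illustration. It assumes the labels are integers `0..n-1` in the same order as `class_names`, and the sample labels are hand-made rather than real model output.

```python
import numpy as np
from sklearn.metrics import confusion_matrix


def misclassified_pairs(y_true, y_pred, class_names):
    """Return (true_class, predicted_class, count) for each non-zero
    off-diagonal cell, most frequent confusion first."""
    # Assumes integer labels 0..n-1, in the same order as class_names
    labels = list(range(len(class_names)))
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    pairs = []
    for i, j in zip(*np.nonzero(cm)):
        if i != j:  # skip the diagonal (correct predictions)
            pairs.append((class_names[i], class_names[j], int(cm[i, j])))
    return sorted(pairs, key=lambda p: -p[2])


# Hand-made labels for illustration only
names = ["setosa", "versicolor", "virginica"]
y_true = [0, 0, 1, 1, 1, 2, 2, 2]
y_pred = [0, 0, 1, 2, 2, 2, 2, 1]

pairs = misclassified_pairs(y_true, y_pred, names)
for true_name, pred_name, count in pairs:
    print(f"{true_name} predicted as {pred_name}: {count}")
```

With these toy labels, the most frequent confusion is versicolor predicted as virginica, so those two classes would be the first place to check features or preprocessing.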
