Once a model has been trained on training data, we need evaluation metrics to judge how good it is. This section touches on those metrics.
First, the confusion matrix summarizes the model's predictions on the test data. It is a table that counts how many predictions fall into each of four categories: True Positive, True Negative, False Positive, and False Negative.
"True / False" indicates whether the prediction was correct, and "Positive / Negative" indicates the predicted class. In other words, the four entries are:
(1) True positives: the number of cases predicted to be positive whose actual class was also positive
(2) True negatives: the number of cases predicted to be negative whose actual class was also negative
(3) False positives: the number of cases predicted to be positive whose actual class was negative
(4) False negatives: the number of cases predicted to be negative whose actual class was positive
True Positives and True Negatives are cases the model classified correctly; False Positives and False Negatives are cases it classified incorrectly.
Let's actually look at each component of the confusion matrix using the confusion_matrix function in the sklearn.metrics module.
The confusion_matrix function can be used as follows.
from sklearn.metrics import confusion_matrix
confmat = confusion_matrix(y_true, y_pred)
y_true holds the actual (correct) classes as an array, and y_pred holds the predicted classes as an array. The returned matrix is arranged in the same way as the confusion matrix described above.
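As a minimal sketch (the labels below are made up for illustration), the confusion matrix for a small set of binary labels can be computed like this:
from sklearn.metrics import confusion_matrix

# Hypothetical labels: 0 = negative class, 1 = positive class
y_true = [1, 0, 1, 1, 0, 1]   # actual classes
y_pred = [1, 0, 0, 1, 1, 1]   # predicted classes

confmat = confusion_matrix(y_true, y_pred)
print(confmat)
# With labels ordered [0, 1], rows are the true classes and columns the predicted classes:
# [[TN FP]    -> here [[1 1]
#  [FN TP]]           [1 3]]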
Once you have actually built a classification model, you need a clear criterion for judging whether it is better or worse than other classification models.
First, check the accuracy (correct answer rate). Accuracy is the proportion of all cases for which the prediction was correct (i.e. classified as TP or TN), and it can be calculated as follows: Accuracy = (TP + TN) / (TP + TN + FP + FN).
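As a rough sketch with made-up labels, the same value can be obtained with scikit-learn's accuracy_score:
from sklearn.metrics import accuracy_score

# Hypothetical labels: 5 of the 6 predictions are correct
y_true = [0, 0, 0, 1, 1, 1]
y_pred = [1, 0, 0, 1, 1, 1]

# accuracy = (TP + TN) / (TP + TN + FP + FN)
print("Accuracy: {:.3f}".format(accuracy_score(y_true, y_pred)))  # -> 0.833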
Precision is the proportion of the data predicted to be positive that is actually positive (how trustworthy a positive prediction is): Precision = TP / (TP + FP). Recall is the proportion of the actually positive data that the model manages to predict as positive (how few positives are missed): Recall = TP / (TP + FN).
The F value (F1 score) combines precision and recall as their harmonic mean: F1 = 2 * Precision * Recall / (Precision + Recall).
In practice, check not only the accuracy but also the F value, precision, and recall to see whether the model is really reliable.
Let's use the performance evaluation metrics implemented in scikit-learn.
# Precision, recall, F value
from sklearn.metrics import precision_score, recall_score, f1_score
# Store the data. scikit-learn treats label 1 as the positive class by default
y_true = [0,0,0,1,1,1]
y_pred = [1,0,0,1,1,1]
# Pass the correct labels as y_true and the predicted labels as y_pred
print("Precision: {:.3f}".format(precision_score(y_true, y_pred)))
print("Recall: {:.3f}".format(recall_score(y_true, y_pred)))
print("F1: {:.3f}".format(f1_score(y_true, y_pred)))
Precision and recall are in a trade-off relationship: if you try to increase recall, precision tends to decrease, and if you try to increase precision, recall tends to decrease.
For example, in a hospital screening, if the test is set so that many patients are judged positive, recall becomes higher but precision becomes lower.
Choose between recall, precision, and the F value depending on the problem and the data you are dealing with.
A PR curve is a graph that plots the data with recall on the horizontal axis and precision on the vertical axis.
Let me give an example. Suppose 10 patients have undergone cancer screening, the probability of cancer has been calculated for each, and based on that we declare each patient positive or negative.
In this case, precision is the proportion of patients declared positive in the screening who really have cancer, and recall is the proportion of patients who really have cancer who were declared positive.
The question here is: when the 10 patients are sorted in descending order of their probability of cancer, how many patients from the top should be declared positive?
Depending on how many people are declared positive, both recall and precision change.
If we compute precision and recall for each cutoff, first declaring only the top patient positive, then the top two, and so on, and plot all of those points, the resulting figure is the PR curve. Note that the shape of the PR curve changes depending on the prediction results. The plotting process is sketched below.
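The following is a minimal sketch of plotting a PR curve with scikit-learn's precision_recall_curve; the labels and predicted probabilities for the 10 patients are made up for illustration.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve

# Hypothetical screening result: 1 = has cancer, 0 = does not
y_true  = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
# Hypothetical predicted probability of cancer for each of the 10 patients
y_score = np.array([0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10])

# Each threshold corresponds to declaring the top-k patients positive
precision, recall, thresholds = precision_recall_curve(y_true, y_score)

plt.plot(recall, precision, marker="o")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("PR curve")
plt.show()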
From such a plot, it can be seen that recall and precision are in a trade-off relationship.
To think about how to make the best use of a model by locating it on the PR curve, let's first review the two axes.
High precision / low recall: there are few wasted positive predictions, but many true positives are missed. In other words, opportunities are lost.
Low precision / high recall: few true positives are missed, but there are many wasted positive predictions. In other words, the budget spent acting on the predictions is likely to be wasted.
Ideally both precision and recall would be high, but because of the trade-off, trying to raise one tends to lower the other.
However, there is a point on the PR curve where precision and recall are equal. This point is called the break-even point (BEP).
The break-even point is important in business because it keeps precision and recall in balance, which makes it possible to optimize costs and profits. We touched on the F value as an evaluation metric earlier; keep the break-even point in mind as a related concept.
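Using the same made-up data as in the PR-curve sketch above, the break-even point can be approximated as the sampled point where precision and recall are closest to each other:
import numpy as np
from sklearn.metrics import precision_recall_curve

# Same hypothetical scores as in the PR-curve sketch
y_true  = np.array([1, 1, 0, 1, 0, 1, 0, 0, 1, 0])
y_score = np.array([0.95, 0.90, 0.80, 0.75, 0.60, 0.55, 0.40, 0.30, 0.20, 0.10])

precision, recall, _ = precision_recall_curve(y_true, y_score)

# Approximate the break-even point as the point where precision and recall are closest
idx = np.argmin(np.abs(precision - recall))
print("BEP (approx.): precision={:.3f}, recall={:.3f}".format(precision[idx], recall[idx]))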
Let's evaluate models using the PR curve. When comparing models on the PR curve, the better the model, the further its BEP lies toward the upper right. This is because moving the BEP toward the upper right means that precision and recall are both higher at the same time.