A confusion matrix is a table that counts how many of a model's predictions matched the actual values and how many did not. It is generally used for binary classification.
For example, suppose you want to predict from a given image whether a person has cancer, and the actual values are: 98 out of 100 people do not have cancer (0) and 2 out of 100 do (1).
In this case, if the model simply predicts 0 for everyone, its accuracy is 98%. That looks like a good number, but is it really a good evaluation? Isn't missing the two people who actually have cancer a fatal mistake?
The confusion matrix makes it possible to evaluate a model properly even in cases like this.
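To see this concretely, here is a minimal sketch of the situation above; the arrays are made up to match the example:

```python
import numpy as np
from sklearn.metrics import accuracy_score

# 98 people without cancer (0) and 2 people with cancer (1)
y_true = np.array([0] * 98 + [1] * 2)
# A model that simply predicts "no cancer" for everyone
y_pred = np.zeros(100, dtype=int)

print(accuracy_score(y_true, y_pred))  # 0.98 -- high accuracy, yet both cancer cases are missed
```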
In general, the columns hold the model's prediction results and the rows hold the actual values, summarized into the 2 × 2 = 4 combinations shown in the table below.

- True: the prediction matched the actual value
- False: the prediction did not match the actual value
- Positive: judged to have the disease (= 1)
- Negative: judged not to have the disease (= 0)

| | Predicted positive | Predicted negative |
|---|---|---|
| Actually positive | True Positive (TP) | False Negative (FN) |
| Actually negative | False Positive (FP) | True Negative (TN) |
```python:matrix.py
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.metrics import confusion_matrix

# Create the confusion matrix
# y_true: the objective (target) variable of the evaluation data
# y_pred: the results predicted from X_test with the predict() function
cm = confusion_matrix(y_true=y_test, y_pred=y_pred)

# Convert the confusion matrix to a DataFrame
# sklearn orders the labels [0, 1], so rotate the matrix 180 degrees
# to put the positive class (1) in the top-left corner
df_cm = pd.DataFrame(np.rot90(cm, 2),
                     index=["actual_Positive", "actual_Negative"],
                     columns=["predict_Positive", "predict_Negative"])
print(df_cm)

# Visualize the confusion matrix as a heatmap
sns.heatmap(df_cm, annot=True, fmt="d", cmap="Blues")
plt.yticks(va="center")  # vertically center the y tick labels
plt.show()
```
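Incidentally, the rotation step can be avoided: `confusion_matrix` accepts a `labels` argument that controls the row/column order directly. A minimal alternative sketch, assuming the same `y_test` and `y_pred` as above:

```python
from sklearn.metrics import confusion_matrix

# Put the positive class (1) first so that the top-left cell is TP
cm = confusion_matrix(y_true=y_test, y_pred=y_pred, labels=[1, 0])
```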
From here, let us look at the evaluation metrics that measure the performance of a model based on the confusion matrix.
- Accuracy: how much of the whole data set is classified correctly. (TP + TN) / (TP + FP + FN + TN)
- Precision: of the results predicted to be positive (1), how many are actually positive. TP / (TP + FP)
- Recall (true positive rate): of the data that is actually positive (1), how much is correctly predicted to be positive. The higher this value, the better the performance and the fewer actual positives are missed. TP / (TP + FN)
- Specificity (true negative rate): of the data that is actually negative (0), how much is correctly predicted to be negative. The higher this value, the better the performance and the fewer false positives are made. TN / (FP + TN)
- False negative rate: of the data that is actually positive (1), how much is mistakenly predicted to be negative. The lower this value, the better the performance and the fewer actual positives are missed. FN / (TP + FN)
- False positive rate: of the data that is actually negative (0), how much is mistakenly predicted to be positive. The lower this value, the better the performance and the fewer false positives are made. FP / (FP + TN)
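For reference, here is a minimal sketch of computing these metrics from the four cells of the confusion matrix, reusing the made-up `y_true` and `y_pred` from the first sketch (sklearn also provides `accuracy_score`, `precision_score`, and `recall_score` for the first three):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Same made-up data as above: 98 actual negatives, 2 actual positives, all predicted 0
y_true = np.array([0] * 98 + [1] * 2)
y_pred = np.zeros(100, dtype=int)

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy = (tp + tn) / (tp + fp + fn + tn)            # 0.98
precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0  # undefined here: nothing is predicted positive
recall = tp / (tp + fn)                               # 0.0 -- both actual positives are missed
specificity = tn / (fp + tn)                          # 1.0 -- every actual negative is correct
fn_rate = fn / (tp + fn)                              # 1.0
fp_rate = fp / (fp + tn)                              # 0.0
print(accuracy, precision, recall, specificity, fn_rate, fp_rate)
```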
Going back to the cancer example, where the model predicts 0 for everyone, the confusion matrix looks like this:

| | Positive prediction results | Negative prediction results |
|---|---|---|
| Actual positive result | 0 | 2 |
| Actual negative result | 0 | 98 |
- Accuracy: (0 + 98) / 100 = 98%
- Specificity: 98 / 98 = 100% => every actual negative is classified correctly
- Recall: 0 / 2 = 0% => every actual positive, that is, both cancer patients, is missed

So the model looks good by accuracy alone but is useless for finding cancer, which is exactly what the confusion matrix makes visible.
To use a binary classification machine learning model in business, it is important to calculate these performance metrics, understand what each one measures, and choose the metric that suits your purpose.