This is a memo on drawing ROC curves from multiclass classification predictions that are saved in xlsx or csv format and read in with pandas. It also doubles as practice at writing articles for Qiita.
I won't go into the basics in this article. For background on the ROC curve itself, the article here is easy to understand.
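First, check the data format.
- GT: the ground-truth label, which takes the values 0, 1, 2.
- C1, C2, C3: the labels, where C1 = 0.0, C2 = 1.0, C3 = 2.0.
- M1, M2, M3: the models.
Column 0 holds GT, followed by the C1 to C3 predicted values of M1, then M2, then M3.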
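For reference, here is a minimal sketch of loading a table in this layout with pandas. The column names (`GT`, `M1_C1`, ...) and the inline CSV text are assumptions made up for illustration, since the actual file and its headers are not shown here.

```python
import io
import pandas as pd

# Hypothetical data in the layout described above:
# column 0 = ground truth, then C1..C3 scores for M1, M2 and M3.
csv_text = """GT,M1_C1,M1_C2,M1_C3,M2_C1,M2_C2,M2_C3,M3_C1,M3_C2,M3_C3
0,0.7,0.2,0.1,0.6,0.3,0.1,0.5,0.3,0.2
1,0.2,0.6,0.2,0.3,0.5,0.2,0.4,0.4,0.2
2,0.1,0.3,0.6,0.2,0.2,0.6,0.3,0.3,0.4
"""

# For a file on disk, use pd.read_csv(path) or pd.read_excel(path) instead.
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)           # (number of rows, 10 columns)
print(df["GT"].unique())  # the label values present in the data
```

The rest of the article only relies on column positions, so the header names themselves do not matter.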
A ROC curve is computed on binary 0/1 labels, so the ground-truth labels need to be binarized first:
from sklearn.preprocessing import label_binarize
y_test = label_binarize(df.iloc[:, 0], classes=[0,1,2])
This uses sklearn's label_binarize for the binary conversion. The output is a little long, so only the top part is shown. After the conversion, C1 = [1, 0, 0], C2 = [0, 1, 0], C3 = [0, 0, 1].
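To make the conversion concrete, here is a small sketch on made-up labels (the `gt` list is an assumption, not the real data):

```python
from sklearn.preprocessing import label_binarize

gt = [0, 1, 2, 1]  # made-up ground-truth labels
y_bin = label_binarize(gt, classes=[0, 1, 2])
print(y_bin)
# [[1 0 0]
#  [0 1 0]
#  [0 0 1]
#  [0 1 0]]
```

Each row has a single 1 in the column of its class, which is what lets each column be treated as its own binary problem later.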
After binarizing the labels, the predicted values of each model also need to be collected in the matching form, one [C1, C2, C3] row per sample.
M1_y_score = []
M2_y_score = []
M3_y_score = []
for i in range(len(df)):  # iloc is positional, so iterate over row positions
    M1_y_score.append([df.iloc[i, 1], df.iloc[i, 2], df.iloc[i, 3]])
    M2_y_score.append([df.iloc[i, 4], df.iloc[i, 5], df.iloc[i, 6]])
    M3_y_score.append([df.iloc[i, 7], df.iloc[i, 8], df.iloc[i, 9]])
M1_y_score = M1_y_score
M2_y_score = M2_y_score
M3_y_score = M3_y_score
The loop above goes through the rows and stores the predicted values of each model. At this point,
from sklearn.metrics import roc_auc_score
auc_m1 = roc_auc_score(y_test, M1_y_score, multi_class="ovo")
print(auc_m1)
you can already get a multiclass AUC with this call. When y_true holds multiclass labels, roc_auc_score raises an error unless the multi_class argument is set to either "ovo" or "ovr" (its default is "raise"). More details can be found in the sklearn documentation.
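Here is a small sketch of how `multi_class` behaves, on made-up labels and scores (all values below are assumptions for illustration). As far as I understand, when the labels are already binarized into an indicator matrix, as in this article, scikit-learn treats the input as a multilabel problem and averages the per-column AUCs, which should match the "ovr" macro average.

```python
import numpy as np
from sklearn.metrics import roc_auc_score
from sklearn.preprocessing import label_binarize

# Made-up labels and class probabilities (each row sums to 1).
y_true = np.array([0, 0, 1, 1, 2, 2])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
])

auc_ovo = roc_auc_score(y_true, y_score, multi_class="ovo")
auc_ovr = roc_auc_score(y_true, y_score, multi_class="ovr")

# Leaving multi_class at its default ("raise") fails for 1-D multiclass labels.
try:
    roc_auc_score(y_true, y_score)
    raised = False
except ValueError:
    raised = True

# Binarized labels, in the same form as y_test in this article.
y_bin = label_binarize(y_true, classes=[0, 1, 2])
auc_indicator = roc_auc_score(y_bin, y_score)

print(auc_ovo, auc_ovr, raised, auc_indicator)
```

Note that with 1-D multiclass labels the score rows have to sum to 1, so raw model outputs may need to be normalized first.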
The next part is where I stumbled.
M1_fpr = dict()
M1_tpr = dict()
M1_roc_auc = dict()
M2_fpr = dict()
M2_tpr = dict()
M2_roc_auc = dict()
M3_fpr = dict()
M3_tpr = dict()
M3_roc_auc = dict()
After creating the empty dictionaries to store the results,
n_classes = 3  # number of labels (C1, C2, C3)
from sklearn.metrics import roc_curve, auc
for i in range(n_classes):
    # ROC curve and AUC of class i against the rest, per model
    M1_fpr[i], M1_tpr[i], _ = roc_curve(y_test[:, i], M1_y_score[:, i])
    M1_roc_auc[i] = auc(M1_fpr[i], M1_tpr[i])
    M2_fpr[i], M2_tpr[i], _ = roc_curve(y_test[:, i], M2_y_score[:, i])
    M2_roc_auc[i] = auc(M2_fpr[i], M2_tpr[i])
    M3_fpr[i], M3_tpr[i], _ = roc_curve(y_test[:, i], M3_y_score[:, i])
    M3_roc_auc[i] = auc(M3_fpr[i], M3_tpr[i])
This loops over the labels and stores the fpr and tpr of each model. But it raised an error! The cause was that the predicted values were stored as plain Python lists, which cannot be indexed with [:, i] (TypeError: list indices must be integers or slices, not tuple). So, change the earlier code a little:
import numpy as np
M1_y_score = np.array(M1_y_score)
M2_y_score = np.array(M2_y_score)
M3_y_score = np.array(M3_y_score)
Once the scores are ndarrays, the [:, i] column slicing works and the results can be stored in the dictionaries. After that, coding along the lines of the official documentation to compute the macro average is enough!
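The difference between a list and an ndarray can be checked with a small sketch. The DataFrame below is made up for illustration (its column names are assumptions). It also shows that slicing the DataFrame directly gives the same array without a loop.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: GT plus three score columns for one model.
df = pd.DataFrame({
    "GT": [0, 1, 2],
    "M1_C1": [0.7, 0.2, 0.1],
    "M1_C2": [0.2, 0.6, 0.3],
    "M1_C3": [0.1, 0.2, 0.6],
})

# Row-by-row collection gives a list of lists.
scores_list = [[df.iloc[i, 1], df.iloc[i, 2], df.iloc[i, 3]] for i in range(len(df))]

try:
    scores_list[:, 0]  # lists do not accept a (rows, column) index
    list_error = None
except TypeError as exc:
    list_error = exc

scores_array = np.array(scores_list)
first_column = scores_array[:, 0]  # column slicing works on an ndarray

# Slicing the DataFrame directly yields the same array without a loop.
scores_direct = df.iloc[:, 1:4].to_numpy()

print(type(list_error).__name__, first_column)
print(np.array_equal(scores_array, scores_direct))
```

With the real data, the same slicing idea would apply to columns 1 to 3, 4 to 6, and 7 to 9 for M1, M2, and M3.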
M1_all_fpr = np.unique(np.concatenate([M1_fpr[i] for i in range(n_classes)]))
M2_all_fpr = np.unique(np.concatenate([M2_fpr[i] for i in range(n_classes)]))
M3_all_fpr = np.unique(np.concatenate([M3_fpr[i] for i in range(n_classes)]))
M1_mean_tpr = np.zeros_like(M1_all_fpr)
M2_mean_tpr = np.zeros_like(M2_all_fpr)
M3_mean_tpr = np.zeros_like(M3_all_fpr)
for i in range(n_classes):
    # interpolate each class's TPR onto the shared FPR grid and accumulate
    M1_mean_tpr += np.interp(M1_all_fpr, M1_fpr[i], M1_tpr[i])
    M2_mean_tpr += np.interp(M2_all_fpr, M2_fpr[i], M2_tpr[i])
    M3_mean_tpr += np.interp(M3_all_fpr, M3_fpr[i], M3_tpr[i])
M1_mean_tpr /= n_classes
M2_mean_tpr /= n_classes
M3_mean_tpr /= n_classes
M1_fpr["macro"] = M1_all_fpr
M1_tpr["macro"] = M1_mean_tpr
M1_roc_auc["macro"] = auc(M1_fpr["macro"], M1_tpr["macro"])
M2_fpr["macro"] = M2_all_fpr
M2_tpr["macro"] = M2_mean_tpr
M2_roc_auc["macro"] = auc(M2_fpr["macro"], M2_tpr["macro"])
M3_fpr["macro"] = M3_all_fpr
M3_tpr["macro"] = M3_mean_tpr
M3_roc_auc["macro"] = auc(M3_fpr["macro"], M3_tpr["macro"])
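The macro-average steps above are written out three times, once per model. As a sketch, they could be wrapped in a helper that is called once per model. The function name `macro_roc` and the sample data are my own assumptions, not part of sklearn.

```python
import numpy as np
from sklearn.metrics import auc, roc_curve
from sklearn.preprocessing import label_binarize


def macro_roc(y_bin, y_score):
    """Return (fpr_grid, mean_tpr, macro_auc) for one model.

    y_bin:   binarized labels, shape (n_samples, n_classes)
    y_score: predicted scores, same shape
    """
    n_classes = y_bin.shape[1]
    # Per-class ROC curves; keep only fpr and tpr, drop the thresholds.
    curves = [roc_curve(y_bin[:, i], y_score[:, i])[:2] for i in range(n_classes)]

    # Union of all FPR points, then average the interpolated TPRs on it.
    all_fpr = np.unique(np.concatenate([fpr for fpr, _ in curves]))
    mean_tpr = np.zeros_like(all_fpr)
    for fpr, tpr in curves:
        mean_tpr += np.interp(all_fpr, fpr, tpr)
    mean_tpr /= n_classes
    return all_fpr, mean_tpr, auc(all_fpr, mean_tpr)


# Made-up data just to exercise the function.
y_bin = label_binarize([0, 0, 1, 1, 2, 2], classes=[0, 1, 2])
y_score = np.array([
    [0.7, 0.2, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.3, 0.3, 0.4],
    [0.1, 0.2, 0.7],
    [0.3, 0.4, 0.3],
])
fpr, tpr, macro_auc = macro_roc(y_bin, y_score)
print(macro_auc)
```

Calling it as `macro_roc(y_test, M1_y_score)` and so on would then replace the repeated M1/M2/M3 lines.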
Once that is done, all that is left is to plot the curves with matplotlib.
import matplotlib.pyplot as plt
import seaborn as sns  # needed for the sns.* calls below
from matplotlib import cm
lw=1
colors = [cm.gist_ncar(190), cm.gist_ncar(30), cm.gist_ncar(10)]
sns.color_palette(colors)
sns.set_palette(colors, desat=1.0)
plt.figure(figsize=(6, 6))
plt.plot(M1_fpr["macro"], M1_tpr["macro"],
         label='M1',
         color=colors[0],
         linestyle='-',
         linewidth=2)
plt.plot(M2_fpr["macro"], M2_tpr["macro"],
         label='M2',
         color=colors[1],
         linestyle='-',
         linewidth=2)
plt.plot(M3_fpr["macro"], M3_tpr["macro"],
         label='M3',
         color=colors[2],
         linestyle='-',
         linewidth=2)
plt.plot([0, 1], [0, 1], 'k--', lw=lw)
plt.xlim([0.0, 1.0])
plt.ylim([0.0, 1.05])
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.legend(loc="lower right")
plt.show()
<img src="https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/784518/0a44adaf-50eb-9bd7-55bc-1a7a0adaf28b.jpeg" width="50%">
With that, I was able to draw ROC curves for multiclass classification using the macro average.
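As an optional variation, the AUC of each curve can be written into the legend. The sketch below uses made-up curves standing in for the `M1_fpr["macro"]`, `M1_tpr["macro"]` pairs, so the numbers it prints are not results from the real data.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.metrics import auc

# Made-up macro-average curves, one (fpr, tpr) pair per model.
curves = {
    "M1": (np.array([0.0, 0.1, 0.5, 1.0]), np.array([0.0, 0.6, 0.9, 1.0])),
    "M2": (np.array([0.0, 0.2, 0.6, 1.0]), np.array([0.0, 0.5, 0.8, 1.0])),
    "M3": (np.array([0.0, 0.3, 0.7, 1.0]), np.array([0.0, 0.4, 0.8, 1.0])),
}

fig, ax = plt.subplots(figsize=(6, 6))
for name, (fpr, tpr) in curves.items():
    ax.plot(fpr, tpr, linewidth=2, label=f"{name} (AUC = {auc(fpr, tpr):.3f})")
ax.plot([0, 1], [0, 1], "k--", linewidth=1)  # chance line, no legend entry
ax.set_xlim(0.0, 1.0)
ax.set_ylim(0.0, 1.05)
ax.set_xlabel("False Positive Rate")
ax.set_ylabel("True Positive Rate")
ax.legend(loc="lower right")

# Render to an in-memory buffer; pass a file name instead to write to disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")

labels = [text.get_text() for text in ax.get_legend().get_texts()]
print(labels)
```

Looping over a dictionary of curves also avoids repeating the plt.plot call for every model.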