Notes on the sites I referred to while reading "Deep Learning from Scratch" (by Yasuki Saito, published by O'Reilly Japan). ← Part 18
The network can now tell dogs from cats reasonably well, but the accuracy is still under 90%, so I will try the data augmentation described on page 245 of the book.
Data augmentation: the easy ways to expand the data seem to be

- flipping
- rotation
- translation
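Each of the three operations above can be done with plain NumPy slicing. A minimal sketch on a dummy array (not one of the dataset images):

```python
import numpy as np

# A dummy 80x80 RGB image, the same shape as the dataset images.
img = np.arange(80 * 80 * 3, dtype=np.uint8).reshape(80, 80, 3)

# Horizontal flip: reverse the column (width) axis.
flipped = img[:, ::-1, :]

# 90-degree rotation: np.rot90 rotates in the plane of the first two axes.
rotated = np.rot90(img)

# Translation: shift 5 pixels to the right, padding the vacated columns with zeros.
shifted = np.zeros_like(img)
shifted[:, 5:, :] = img[:, :-5, :]

print(flipped.shape, rotated.shape, shifted.shape)
```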
Looking at the Grad-CAM results earlier, the network seems to respond to the curled-up posture for cats and to the nose for dogs.
That suggests:
Couldn't we improve cat identification by adding rotated or flipped cat images? And couldn't we improve dog identification by adding dog images enlarged around the face?
So I would like to verify how adding augmented data strengthens learning.
The data used so far was prepared rather roughly, since it only had to confirm that the program worked; there are only 100 test items.
This time I want to set aside about 1,000 randomly chosen test items. The remaining training data merges the dog and cat images and shuffles them randomly. The augmented data is kept separate per class and per augmentation method, and merged into the training data at training time so that its effect can be verified. For verification, I check not only the overall accuracy but also the per-class accuracy for dogs and for cats, and use Grad-CAM to see which features the misclassified images respond to.
With this policy, I recreated the training data.
import random
import numpy as np

def rnd_list(motoarray, toridasi):
    # Build a list of integers from 0 to the number of items in the array,
    # shuffle it randomly, and return a list of the requested number of
    # indices plus a list of the remaining indices.
    kensuu, tate, yoko, channel = motoarray.shape
    moto = list(range(0, kensuu))
    random.shuffle(moto)
    sel = moto[0:toridasi]
    nokori = moto[toridasi:]
    return sel, nokori
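As a quick sanity check, `rnd_list` splits the index range into two disjoint lists. The function is reproduced here so the snippet runs on its own, with a dummy array standing in for the image data:

```python
import random
import numpy as np

def rnd_list(motoarray, toridasi):
    # Shuffle the indices 0..N-1 and split off the first `toridasi` of them.
    kensuu, tate, yoko, channel = motoarray.shape
    moto = list(range(0, kensuu))
    random.shuffle(moto)
    return moto[0:toridasi], moto[toridasi:]

data = np.zeros((10, 80, 80, 3), dtype=np.uint8)
sel, nokori = rnd_list(data, 3)
print(len(sel), len(nokori))                      # 3 7
print(sorted(sel + nokori) == list(range(10)))    # True: a disjoint split
```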
def bunkatu(motoarray, toridasi, lblA):
    # Split the np array's data into a list of the specified number of
    # items (for testing) and a list of the rest (for training).
    sel, nokori = rnd_list(motoarray, toridasi)
    tsl = []
    tsi = []
    trl = []
    tri = []
    for i in sel:
        tsl.append(lblA)
        tsi.append(motoarray[i])   # was dogimg[i]: use the passed-in array
    for i in nokori:
        trl.append(lblA)
        tri.append(motoarray[i])
    return tsl, tsi, trl, tri
def rnd_arry(tri, trl):
    # Shuffle an array of images and its array of labels together
    # and return them as lists.
    sel, nokori = rnd_list(tri, 0)
    wtri = []
    wtrl = []
    for i in nokori:
        wtri.append(tri[i])
        wtrl.append(trl[i])
    return wtri, wtrl
# Split into training and test portions and merge dogs and cats
bunkatusuu = 500
ctsl, ctsi, ctrl, ctri = bunkatu(catimg, bunkatusuu, 0)
dtsl, dtsi, dtrl, dtri = bunkatu(dogimg, bunkatusuu, 1)
tri=np.append(ctri, dtri, axis=0)
trl=np.append(ctrl, dtrl, axis=0)
tsi=np.append(ctsi, dtsi, axis=0)
tsl=np.append(ctsl, dtsl, axis=0)
# Shuffle randomly
wtri, wtrl = rnd_arry(tri, trl)
wtsi, wtsl = rnd_arry(tsi, tsl)
# Save
dataset = {}
dataset['test_label'] = np.array(wtsl, dtype=np.uint8)
dataset['test_img'] = np.array(wtsi, dtype=np.uint8)
dataset['train_label'] = np.array(wtrl, dtype=np.uint8)
dataset['train_img'] = np.array(wtri, dtype=np.uint8)
import pickle
save_file = '/content/drive/My Drive/Colab Notebooks/deep_learning/dataset/catdog.pkl'
with open(save_file, 'wb') as f:
    pickle.dump(dataset, f, -1)
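Loading the pickle back follows the same pattern in reverse. A self-contained round-trip demo with a tiny stand-in dataset (for the real data, point `save_file` at the catdog.pkl path used above):

```python
import pickle
import numpy as np

# Tiny stand-in dataset: 2 fake 80x80 RGB images and their labels.
dataset = {
    'train_img': np.zeros((2, 80, 80, 3), dtype=np.uint8),
    'train_label': np.array([0, 1], dtype=np.uint8),
}
save_file = 'catdog_demo.pkl'
with open(save_file, 'wb') as f:
    pickle.dump(dataset, f, -1)

with open(save_file, 'rb') as f:
    loaded = pickle.load(f)

x_train = loaded['train_img'] / 255.0   # normalize to [0, 1] for training
t_train = loaded['train_label']
print(x_train.shape, x_train.dtype)
```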
This gives 23,994 training items (11,997 dogs, 11,997 cats) and 1,000 test items (500 dogs, 500 cats).
Feeding this into the DeepConvNet built in Part 18:
Epoch 1/10
188/188 [==============================] - 373s 2s/step - loss: 0.7213 - accuracy: 0.5663
Epoch 2/10
188/188 [==============================] - 373s 2s/step - loss: 0.6378 - accuracy: 0.6290
Epoch 3/10
188/188 [==============================] - 373s 2s/step - loss: 0.5898 - accuracy: 0.6713
Epoch 4/10
188/188 [==============================] - 374s 2s/step - loss: 0.5682 - accuracy: 0.6904
Epoch 5/10
188/188 [==============================] - 373s 2s/step - loss: 0.5269 - accuracy: 0.7128
Epoch 6/10
188/188 [==============================] - 374s 2s/step - loss: 0.4972 - accuracy: 0.7300
Epoch 7/10
188/188 [==============================] - 372s 2s/step - loss: 0.4713 - accuracy: 0.7473
Epoch 8/10
188/188 [==============================] - 374s 2s/step - loss: 0.4446 - accuracy: 0.7617
Epoch 9/10
188/188 [==============================] - 373s 2s/step - loss: 0.4318 - accuracy: 0.7665
Epoch 10/10
188/188 [==============================] - 376s 2s/step - loss: 0.4149 - accuracy: 0.7755
32/32 - 4s - loss: 0.3811 - accuracy: 0.8420
The result is an accuracy of 84.2%.
predictions = model.predict(x_test)
# List the indices of the misclassified items
gohantei = []
kensuu, w = predictions.shape
for i in range(kensuu):
    predictions_array = predictions[i]
    predicted_label = np.argmax(predictions_array)
    true_label = t_test[i]
    if predicted_label != true_label:
        gohantei.append(i)
print(len(gohantei))
158
There were 158 misclassifications.
import matplotlib.pyplot as plt

def plot_image(i, predictions, t_label, img):
    class_names = ['cat', 'dog']
    predictions_array = predictions[i]
    img = img[i].reshape((80, 80, 3))
    true_label = t_label[i]
    plt.grid(False)
    plt.xticks([])
    plt.yticks([])
    plt.imshow(img, cmap=plt.cm.binary)
    predicted_label = np.argmax(predictions_array)
    if predicted_label == true_label:
        color = 'blue'
    else:
        color = 'red'
    plt.xlabel("{} {:2.0f}% ({})".format(class_names[predicted_label],
                                         100*np.max(predictions_array),
                                         class_names[true_label]),
               color=color)

num_cols = 10
num_rows = int(len(gohantei) / num_cols) + 1
plt.figure(figsize=(2*num_cols, 2.5*num_rows))
j = 0
for i in gohantei:
    if t_test[i] == 0:   # cat
        plt.subplot(num_rows, num_cols, j+1)
        plot_image(i, predictions, t_test, x_test)
        j += 1
plt.show()
print("Misclassified cats:", j)

plt.figure(figsize=(2*num_cols, 2.5*num_rows))
j = 0
for i in gohantei:
    if t_test[i] == 1:   # dog
        plt.subplot(num_rows, num_cols, j+1)
        plot_image(i, predictions, t_test, x_test)
        j += 1
plt.show()
print("Misclassified dogs:", j)
The breakdown was 109 misclassified cats and 49 misclassified dogs; cats are misjudged more than twice as often as dogs.
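The per-class breakdown above was counted while plotting, but the same counts can be had directly with NumPy boolean masks. A sketch with stand-in arrays (with the real model, use `predictions = model.predict(x_test)` and the `t_test` labels):

```python
import numpy as np

# Stand-in softmax outputs and true labels (0 = cat, 1 = dog).
predictions = np.array([[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.3, 0.7]])
t_test = np.array([0, 0, 1, 1])

predicted = np.argmax(predictions, axis=1)
wrong = predicted != t_test
cat_errors = int(np.sum(wrong & (t_test == 0)))   # cats judged as dogs
dog_errors = int(np.sum(wrong & (t_test == 1)))   # dogs judged as cats
print(cat_errors, dog_errors)
```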
Let's see whether cat misclassifications can be reduced by augmenting the cat data.
# Extract the indices of the cat images
catdatalist = []
kensuu = len(dataset['train_img'])
for i in range(kensuu):
    label = dataset['train_label'][i]
    if label == 0:
        catdatalist.append(i)
print(len(catdatalist))
11997
# Build a dataset of horizontally flipped cat images
trl = []
tri = []
lbl = 0
for i in catdatalist:
    img = dataset['train_img'][i]
    img = img[:, ::-1, :]   # flip left-right
    trl.append(lbl)
    tri.append(img)
catdataset = {}
catdataset['train_label'] = np.array(trl, dtype=np.uint8)
catdataset['train_img'] = np.array(tri, dtype=np.uint8)
tri = np.append(dataset['train_img'], catdataset['train_img'], axis=0)
trl = np.append(dataset['train_label'], catdataset['train_label'], axis=0)
x_train = tri / 255.0
t_train = trl
Train on the training data with the flipped cat images added.
model.fit(x_train, t_train, epochs=10, batch_size=128)
Epoch 1/10
282/282 [==============================] - 571s 2s/step - loss: 0.6604 - accuracy: 0.6783
Epoch 2/10
282/282 [==============================] - 569s 2s/step - loss: 0.5840 - accuracy: 0.7220
Epoch 3/10
282/282 [==============================] - 570s 2s/step - loss: 0.5407 - accuracy: 0.7511
Epoch 4/10
282/282 [==============================] - 572s 2s/step - loss: 0.5076 - accuracy: 0.7689
Epoch 5/10
282/282 [==============================] - 565s 2s/step - loss: 0.4808 - accuracy: 0.7860
Epoch 6/10
282/282 [==============================] - 566s 2s/step - loss: 0.4599 - accuracy: 0.7974
Epoch 7/10
282/282 [==============================] - 563s 2s/step - loss: 0.4337 - accuracy: 0.8115
Epoch 8/10
282/282 [==============================] - 565s 2s/step - loss: 0.4137 - accuracy: 0.8181
Epoch 9/10
282/282 [==============================] - 564s 2s/step - loss: 0.3966 - accuracy: 0.8256
Epoch 10/10
282/282 [==============================] - 565s 2s/step - loss: 0.3759 - accuracy: 0.8331
test_loss, test_acc = model.evaluate(x_test, t_test, verbose=2)
32/32 - 4s - loss: 0.3959 - accuracy: 0.8220
predictions = model.predict(x_test)
# List the indices of the misclassified items
gohantei = []
kensuu, w = predictions.shape
for i in range(kensuu):
    predictions_array = predictions[i]
    predicted_label = np.argmax(predictions_array)
    true_label = t_test[i]
    if predicted_label != true_label:
        gohantei.append(i)
print(len(gohantei))
178
(Plot the misclassified images again with the same plot_image code as before.)
Misclassified cats: 28. Misclassified dogs: 150.
Without the flipped data there were 158 mistakes: 109 cats and 49 dogs. The cat accuracy improved greatly, but the dog accuracy dropped by about as much, and the overall accuracy also fell.
Adding data does seem to strengthen learning, but does that mean the side effects are also large?
Cat images that changed from wrong to correct (the red marks are those that were wrong the second time).
Dog images that changed from correct to wrong (the yellow marks are those that were wrong the first time).
The image of a cat sitting like a dog is now judged correctly, but an image of a dog sitting normally is now judged as a "cat". Also, a large close-up of a face, which the first model presumably judged as "dog" from the nose alone, is now judged as "cat".
Hmm. Hard to say.
So what happens if we also train with flipped dog images added? I tried it.
In other words, the training data is doubled.
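Doubling the dataset amounts to appending a horizontally flipped copy of every training image, for both classes. A sketch with stand-in arrays (with the real data, use `dataset['train_img']` and `dataset['train_label']`):

```python
import numpy as np

# Stand-in batch: 6 fake images, labels 0 = cat, 1 = dog.
train_img = np.zeros((6, 80, 80, 3), dtype=np.uint8)
train_label = np.array([0, 0, 0, 1, 1, 1], dtype=np.uint8)

# In a batch of shape (N, H, W, C), axis 2 is the width axis.
flipped = train_img[:, :, ::-1, :]
x_all = np.append(train_img, flipped, axis=0) / 255.0
t_all = np.append(train_label, train_label, axis=0)
print(x_all.shape, t_all.shape)
```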
The result:
32/32 - 4s - loss: 0.2186 - accuracy: 0.9090
The accuracy improved to 90%. Of the 91 misclassifications, 48 were cats mistaken for dogs and 43 were dogs mistaken for cats.
Without the flipped data there were 158 mistakes: 109 cats and 49 dogs. The number of misclassified cats has been halved.
The images of cats sitting like dogs and the large face close-ups seem to have changed from wrong to correct.
From the above, can we say the following?

- Flipping the training data horizontally, even though it doubles the number of items, is usable for training and is effective.
- When classifying into two classes as here, having the same number of training items for dogs and cats gives less biased learning.
However, doubling the training data again makes Google Colab run out of RAM and crash. Since it seems difficult to increase the data any further, I will end the dog-and-cat discrimination here.
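One way around the RAM limit would be to generate the flipped copies per batch instead of materializing the doubled dataset in memory; Keras's ImageDataGenerator with horizontal_flip=True works on this principle. A minimal pure-NumPy sketch of the idea (a hypothetical helper, not code from the book):

```python
import numpy as np

def augmented_batches(images, labels, batch_size):
    """Yield shuffled batches in which each image is randomly flipped
    on the fly, so the flipped copies never all sit in RAM at once."""
    n = len(images)
    order = np.random.permutation(n)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        batch = images[idx].copy()
        flip = np.random.rand(len(idx)) < 0.5       # flip ~half the batch
        batch[flip] = batch[flip][:, :, ::-1, :]    # horizontal flip (width axis)
        yield batch, labels[idx]

# Stand-in data: 10 fake images with alternating labels.
imgs = np.zeros((10, 80, 80, 3), dtype=np.uint8)
lbls = np.arange(10, dtype=np.uint8) % 2
batches = list(augmented_batches(imgs, lbls, 4))
print(len(batches))   # 3 batches: sizes 4, 4, 2
```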
← Part 18