- This article is a record of my own machine learning and deep learning study.
- This time, I use Google Colaboratory to classify two classes of image data: judging from a photo of a Shiba Inu whether it shows "my dog (my pet)" or "a dog other than mine".
- I describe the places where I stumbled over various errors in as much detail as possible, so that anyone can reproduce the steps easily.
- This is my first article on Qiita, so if you notice anything that should be corrected, please let me know.
- This article is for people who are studying deep learning and want to try all kinds of analyses, but who lack knowledge and information, keep stumbling in and around Google Colaboratory, and cannot get their analysis to move forward as expected. (It is not an article for hardcore advanced users!)
- And if you like **Shiba Inu**, please read on!
- I started studying machine learning in 2018, and since 2019 I have been a working adult who studies mainly deep learning on weekends.
- I have no experience as an engineer and no opportunity to apply machine learning at work, but the more I study, the more I am drawn in by the deep appeal of this field, and in September 2019 I obtained the **JDLA Deep Learning for Engineer 2019 #2** certification.
- I want to gain experience with many kinds of deep learning analysis, and I am currently taking part in case analyses that use real data.
- At the end of March 2020 I will leave the civil service job I have held for 18 years, and from April I will start a new career as a data engineer.
- :closed_book: **"Deep Learning with Python and Keras"** by Francois Chollet, translated by Quipe Co., Ltd., translation supervised by Yusuke Sugomori, published by Mynavi Publishing Co., Ltd.
- The source code for this book is available on the author **Chollet's GitHub**.
- This analysis is based on **5.2-using-convnets-with-small-datasets.ipynb**. (As an aside, I attended a machine learning lecture by the translation supervisor, Mr. Sugomori, at a certain school in 2018.)
- You must have a **Google account**.
- If you do not have one, please refer to **this article** and create your own account in advance.
**Step 1: Collect photos for analysis, resize them, and upload them to Google Drive in zip format**
**Step 2: Create a work folder on Google Drive and decompress the data**
**Step 3: Copy the Shiba Inu photos into the train, validation, and test folders (30/20/10 per class)**
**Step 4: Build the model**
**Step 5: Training**
**Step 6: Results**
**Step 7: Data Augmentation**
**Step 8: Other adjustments (changing the image input size, etc.)**
First of all, let me introduce my dog before proceeding with the analysis procedure. As of January 2020, she is a 16-year-old female Shiba Inu named **Mirin**. My wife named her; since she is a Japanese dog, we gave her a Japanese-style name. **She is just so, so cute!** There are many cute pictures of Mirin in the analysis data zip file (mydog) that appears later. I hope you will take a look and enjoy her many expressions with me (yes, I am a doting dog parent).
- Collect copyright-free photos of Shiba Inu online. I gathered them from the sites listed below. When searching overseas sites, photos come up if you use "Shiba Inu" as the search term.
pixabay
unsplash
Gahag
Photo material Adachi
Free material dot com
- This site, which lets you search across multiple free stock photo sites at once, was also convenient:
O-DAN
- I collected Mirin's photos from my smartphone and digital camera folders. To keep her features from varying too much, I excluded her puppyhood and limited the shooting period to ages 12 to 16 (her current age).
- The photos vary widely in composition, so they are not usable as data as-is. I therefore cropped each image so that the dog occupies as large a proportion of the frame as possible. With photo retouching software (Photoshop Elements, GIMP, etc.), you can crop to a 1:1 aspect ratio without worrying about the image size and save the result as a jpg file.
- I batch-resized the cropped files to 320 x 320 pixels with software such as **Reduction**, and used **FlexRena84** to batch-rename the image files. (A scripted alternative is sketched just below.)
- In this way, I prepared 120 image files in total, 60 of Mirin and 60 of other Shiba Inu, stored them in folders named "mydog" and "otherdogs", and saved each folder as a zip file.
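If you would rather script the cropping and resizing instead of using GUI tools, here is a minimal sketch using Pillow. The folder names and the simple center-crop strategy are my own assumptions; manual cropping gives you more control over how much of the dog fills the frame.

```python
# A sketch of batch center-crop + resize with Pillow.
# Assumptions: source jpgs live in ./raw, output goes to ./resized,
# and a simple center crop to 1:1 is acceptable.
import os
from PIL import Image

src_dir, dst_dir = 'raw', 'resized'
os.makedirs(dst_dir, exist_ok=True)

jpgs = sorted(f for f in os.listdir(src_dir) if f.lower().endswith('.jpg'))
for i, fname in enumerate(jpgs):
    img = Image.open(os.path.join(src_dir, fname))
    # Center-crop to a square, then resize to 320x320
    side = min(img.size)
    left = (img.width - side) // 2
    top = (img.height - side) // 2
    img = img.crop((left, top, left + side, top + side))
    img = img.resize((320, 320))
    # Rename sequentially, e.g. mydog0.jpg, mydog1.jpg, ...
    img.save(os.path.join(dst_dir, 'mydog{}.jpg'.format(i)))
```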
Here are a few of the photos. **Photos of Mirin (mydog)**
**Photos of other Shiba Inu (otherdogs)**
- Create the folders for storing the data on Google Drive in advance.
- The figure shows my directory structure as an example (the work is actually done in the orange folders). If you use the same structure, the source code below will work as-is; the layout is also sketched below.
- Upload the two zip files **"mydog1.zip"** and **"otherdogs1.zip"** to Google Drive (into the **"original_data"** folder).
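For reference, this is the directory layout implied by the paths used in the code below (reconstructed from the code; everything above 02_mydog_or_otherdogs reflects my own setup):

```
My Drive/
└── Colab Notebooks/
    └── Self_Study/
        └── 02_mydog_or_otherdogs/
            ├── original_data/   <- upload mydog1.zip and otherdogs1.zip here
            └── use_data/        <- train/validation/test subfolders are created here
```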
From here on, actually start Google Colaboratory and work on Colab.
If you are using Colaboratory for the first time, please refer to **here**.
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Change the current directory to the working folder
%cd '/content/drive/My Drive/Colab Notebooks/Self_Study/02_mydog_or_otherdogs/original_data'

# Unzip mydog1.zip
!unzip "mydog1.zip"

# Unzip otherdogs1.zip
!unzip "otherdogs1.zip"

# Check the number of unzipped files
!ls ./mydog1 | wc -l
!ls ./otherdogs1 | wc -l
**Note: compress the data before uploading it to Google Drive, and decompress it there.** I tried uploading the uncompressed files directly to Google Drive, but it took a very long time. Since we are working on Colaboratory, the decompression uses a Linux command prefixed with "!". A message like the following appears and the files are decompressed.
# Load the required libraries
import os, shutil

# Change the current directory to the working folder
%cd '/content/drive/My Drive/Colab Notebooks/Self_Study/02_mydog_or_otherdogs'

# Set the file path of the original_data folder
original_dataset_dir = 'original_data'

# Set the paths of the two folders under original_data
original_mydog_dir = 'original_data/mydog'
original_otherdogs_dir = 'original_data/otherdogs'

# Set the file path of the use_data folder
base_dir = 'use_data'

# Set the paths of the three folders under use_data and create them
train_dir = os.path.join(base_dir, 'train')
os.mkdir(train_dir)
validation_dir = os.path.join(base_dir, 'validation')
os.mkdir(validation_dir)
test_dir = os.path.join(base_dir, 'test')
os.mkdir(test_dir)

# Set the paths of the 'mydog' and 'otherdogs' folders under train and create them
train_mydog_dir = os.path.join(train_dir, 'mydog')
os.mkdir(train_mydog_dir)
train_otherdogs_dir = os.path.join(train_dir, 'otherdogs')
os.mkdir(train_otherdogs_dir)

# Set the paths of the 'mydog' and 'otherdogs' folders under validation and create them
validation_mydog_dir = os.path.join(validation_dir, 'mydog')
os.mkdir(validation_mydog_dir)
validation_otherdogs_dir = os.path.join(validation_dir, 'otherdogs')
os.mkdir(validation_otherdogs_dir)

# Set the paths of the 'mydog' and 'otherdogs' folders under test and create them
test_mydog_dir = os.path.join(test_dir, 'mydog')
os.mkdir(test_mydog_dir)
test_otherdogs_dir = os.path.join(test_dir, 'otherdogs')
os.mkdir(test_otherdogs_dir)
# Copy the first 30 mydog files to train_mydog_dir
fnames = ['mydog{}.jpg'.format(i) for i in range(30)]
for fname in fnames:
    src = os.path.join(original_mydog_dir, fname)
    dst = os.path.join(train_mydog_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 20 mydog files to validation_mydog_dir
fnames = ['mydog{}.jpg'.format(i) for i in range(30, 50)]
for fname in fnames:
    src = os.path.join(original_mydog_dir, fname)
    dst = os.path.join(validation_mydog_dir, fname)
    shutil.copyfile(src, dst)

# Copy the last 10 mydog files to test_mydog_dir
fnames = ['mydog{}.jpg'.format(i) for i in range(50, 60)]
for fname in fnames:
    src = os.path.join(original_mydog_dir, fname)
    dst = os.path.join(test_mydog_dir, fname)
    shutil.copyfile(src, dst)

# Copy the first 30 otherdogs files to train_otherdogs_dir
fnames = ['otherdogs{}.jpg'.format(i) for i in range(30)]
for fname in fnames:
    src = os.path.join(original_otherdogs_dir, fname)
    dst = os.path.join(train_otherdogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy the next 20 otherdogs files to validation_otherdogs_dir
fnames = ['otherdogs{}.jpg'.format(i) for i in range(30, 50)]
for fname in fnames:
    src = os.path.join(original_otherdogs_dir, fname)
    dst = os.path.join(validation_otherdogs_dir, fname)
    shutil.copyfile(src, dst)

# Copy the last 10 otherdogs files to test_otherdogs_dir
fnames = ['otherdogs{}.jpg'.format(i) for i in range(50, 60)]
for fname in fnames:
    src = os.path.join(original_otherdogs_dir, fname)
    dst = os.path.join(test_otherdogs_dir, fname)
    shutil.copyfile(src, dst)
# Check the number of files stored in each folder
print('total training mydog images:', len(os.listdir(train_mydog_dir)))
print('total training otherdogs images:', len(os.listdir(train_otherdogs_dir)))
print('total validation mydog images:', len(os.listdir(validation_mydog_dir)))
print('total validation otherdogs images:', len(os.listdir(validation_otherdogs_dir)))
print('total test mydog images:', len(os.listdir(test_mydog_dir)))
print('total test otherdogs images:', len(os.listdir(test_otherdogs_dir)))
The number of files is displayed as follows.
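As an aside, the six copy loops above could be condensed into a single helper function. This is just a sketch of an equivalent refactoring, not part of the original code:

```python
# A sketch of an equivalent refactoring of the copy loops above.
# copy_range('mydog', original_mydog_dir, train_mydog_dir, 0, 30) copies
# mydog0.jpg .. mydog29.jpg; the splits below reproduce the 30/20/10 layout.
def copy_range(prefix, src_dir, dst_dir, start, stop):
    for i in range(start, stop):
        fname = '{}{}.jpg'.format(prefix, i)
        shutil.copyfile(os.path.join(src_dir, fname),
                        os.path.join(dst_dir, fname))

for prefix, src in [('mydog', original_mydog_dir),
                    ('otherdogs', original_otherdogs_dir)]:
    for split, (start, stop) in [('train', (0, 30)),
                                 ('validation', (30, 50)),
                                 ('test', (50, 60))]:
        copy_range(prefix, src, os.path.join(base_dir, split, prefix), start, stop)
```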
The original image files are 320 x 320 pixels, but this time they are read in at an input size of 150 x 150.
from keras import layers
from keras import models
model = models.Sequential()
model.add(layers.Conv2D(32, (3, 3), activation='relu',
                        input_shape=(150, 150, 3)))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(64, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Conv2D(128, (3, 3), activation='relu'))
model.add(layers.MaxPooling2D((2, 2)))
model.add(layers.Flatten())
model.add(layers.Dense(512, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.summary()
The model configuration is displayed like this.
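For reference, these are the output shapes and parameter counts that model.summary() should report, computed from the layer definitions above (the exact layer names Keras prints will differ):

```
Layer (type)                  Output Shape              Param #
================================================================
Conv2D (3x3, 32 filters)      (None, 148, 148, 32)      896
MaxPooling2D (2x2)            (None, 74, 74, 32)        0
Conv2D (3x3, 64 filters)      (None, 72, 72, 64)        18,496
MaxPooling2D (2x2)            (None, 36, 36, 64)        0
Conv2D (3x3, 128 filters)     (None, 34, 34, 128)       73,856
MaxPooling2D (2x2)            (None, 17, 17, 128)       0
Conv2D (3x3, 128 filters)     (None, 15, 15, 128)       147,584
MaxPooling2D (2x2)            (None, 7, 7, 128)         0
Flatten                       (None, 6272)              0
Dense (512 units)             (None, 512)               3,211,776
Dense (1 unit, sigmoid)       (None, 1)                 513
================================================================
Total params: 3,453,121
```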
from keras import optimizers
model.compile(loss='binary_crossentropy',
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=['acc'])
No data augmentation is applied at this stage (it comes later, in Step 7).
from keras.preprocessing.image import ImageDataGenerator

# Rescale all pixel values by 1/255
train_datagen = ImageDataGenerator(rescale=1./255)
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # Target directory
        train_dir,
        # Resize all images to 150x150
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=20,
        class_mode='binary')
history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,
      epochs=30,
      validation_data=validation_generator,
      validation_steps=50)
When training starts, a display like this appears and the computation proceeds. (It takes a while.)
model.save('mydog_or_otherdogs_01a.h5')
import matplotlib.pyplot as plt
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
The resulting graphs:
First, accuracy:
- Training accuracy reaches nearly 100% almost immediately.
- Validation accuracy stays around 70% to 85% and does not improve beyond that.
Unfortunately, the model has only learned features that hold for the training data; it cannot be said to have captured characteristics of my dog that generalize beyond it.
Next, loss:
- Training loss approaches 0 from early on.
- Validation loss does not decrease as training proceeds; instead it trends upward.
From the above, the model is overfitting the training data. Since the number of samples was small to begin with, this result is probably unavoidable.
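Incidentally, the Chollet notebook this article is based on counters this overfitting by combining data augmentation (Step 7 below) with a Dropout layer inserted just before the densely connected classifier. This article continues with the model already defined above, but for reference, a minimal sketch of that dropout variant:

```python
# A sketch of the same convnet with Dropout added before the classifier,
# as in Chollet's 5.2 notebook, to mitigate overfitting.
from keras import layers, models

model_do = models.Sequential()
model_do.add(layers.Conv2D(32, (3, 3), activation='relu',
                           input_shape=(150, 150, 3)))
model_do.add(layers.MaxPooling2D((2, 2)))
model_do.add(layers.Conv2D(64, (3, 3), activation='relu'))
model_do.add(layers.MaxPooling2D((2, 2)))
model_do.add(layers.Conv2D(128, (3, 3), activation='relu'))
model_do.add(layers.MaxPooling2D((2, 2)))
model_do.add(layers.Conv2D(128, (3, 3), activation='relu'))
model_do.add(layers.MaxPooling2D((2, 2)))
model_do.add(layers.Flatten())
model_do.add(layers.Dropout(0.5))  # randomly zero 50% of units during training
model_do.add(layers.Dense(512, activation='relu'))
model_do.add(layers.Dense(1, activation='sigmoid'))
```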
Let's apply the model trained above to the test data with the following code and check the classification accuracy. **Note: flow_from_directory finds no images unless the target folder contains the class subfolders (in this case, the two subfolders mydog and otherdogs).**
test_datagen = ImageDataGenerator(rescale=1./255)
test_generator = test_datagen.flow_from_directory(
        test_dir,
        target_size=(150, 150),
        batch_size=50,
        class_mode='binary')
test_loss, test_acc = model.evaluate_generator(test_generator, steps=50)
print('test loss:', test_loss)
print('test acc:', test_acc)
test loss: 2.7508722241719563
test acc: 0.7666666607062022
The accuracy on the test data is about 76%. More adjustment is needed.
Let's continue with the same model, apply data augmentation to the training data, and train it again. The ImageDataGenerator code for the augmentation is below.
# Augment the training data
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=40,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,)

# Do not augment the validation data
test_datagen = ImageDataGenerator(rescale=1./255)

train_generator = train_datagen.flow_from_directory(
        # Target directory
        train_dir,
        # Resize all images to 150x150
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')

validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(150, 150),
        batch_size=32,
        class_mode='binary')
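To get a feel for what these transformations actually do, you can display a few augmented variants of a single training image. A minimal sketch, assuming train_datagen and train_mydog_dir are still defined from the earlier cells:

```python
# A sketch: display 4 randomly augmented variants of one training image.
# Assumes train_datagen (the augmenting generator above) and
# train_mydog_dir ('use_data/train/mydog') are already defined.
import os
import matplotlib.pyplot as plt
from keras.preprocessing import image

fname = os.path.join(train_mydog_dir, os.listdir(train_mydog_dir)[0])
img = image.load_img(fname, target_size=(150, 150))
x = image.img_to_array(img)    # shape (150, 150, 3)
x = x.reshape((1,) + x.shape)  # shape (1, 150, 150, 3)

# flow() yields randomly transformed batches indefinitely, so break manually
i = 0
for batch in train_datagen.flow(x, batch_size=1):
    plt.figure(i)
    # array_to_img scales the rescaled (0-1) array back for display
    plt.imshow(image.array_to_img(batch[0]))
    i += 1
    if i % 4 == 0:
        break
plt.show()
```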
Train the model for 100 epochs.
history = model.fit_generator(
      train_generator,
      steps_per_epoch=100,
      epochs=100,
      validation_data=validation_generator,
      validation_steps=50)
Save the model after training.
model.save('mydog_or_otherdogs_01b.h5')
The resulting graph is as follows:
Let's apply the trained model to the test data and check the classification accuracy.
test loss: 2.480180886810025
test acc: 0.7499999996026357
Data augmentation is generally said to improve accuracy, but in this example the accuracy dropped slightly compared with the previous run. Since the number of samples is small to begin with, perhaps this is unavoidable.
So far, the input image size has been 150 x 150 pixels. What happens if we change the input size to the original 320 x 320 pixels? Using the model built so far, let's train it in two ways, ① without data augmentation and ② with data augmentation, and look at the results. A sketch of the required changes follows.
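Only two things change relative to the 150 x 150 pipeline: the model's input_shape and the generators' target_size. A minimal sketch of the 320 x 320 variant (everything else, including fit_generator, stays the same):

```python
# A sketch of the 320x320 variant: only input_shape and target_size
# change from the 150x150 pipeline above.
from keras import layers, models, optimizers

model_320 = models.Sequential()
model_320.add(layers.Conv2D(32, (3, 3), activation='relu',
                            input_shape=(320, 320, 3)))  # was (150, 150, 3)
model_320.add(layers.MaxPooling2D((2, 2)))
model_320.add(layers.Conv2D(64, (3, 3), activation='relu'))
model_320.add(layers.MaxPooling2D((2, 2)))
model_320.add(layers.Conv2D(128, (3, 3), activation='relu'))
model_320.add(layers.MaxPooling2D((2, 2)))
model_320.add(layers.Conv2D(128, (3, 3), activation='relu'))
model_320.add(layers.MaxPooling2D((2, 2)))
model_320.add(layers.Flatten())
model_320.add(layers.Dense(512, activation='relu'))
model_320.add(layers.Dense(1, activation='sigmoid'))
model_320.compile(loss='binary_crossentropy',
                  optimizer=optimizers.RMSprop(lr=1e-4),
                  metrics=['acc'])

# The generators read the images at the larger size
train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(320, 320),  # was (150, 150)
        batch_size=32,
        class_mode='binary')
validation_generator = test_datagen.flow_from_directory(
        validation_dir,
        target_size=(320, 320),
        batch_size=32,
        class_mode='binary')
```

Note that with 320 x 320 input the flattened feature map grows to 18 x 18 x 128, so the first Dense layer has roughly 21 million parameters and training takes noticeably longer.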
**\* Added 2020/1/4: the figures were incorrect, so they have been corrected and the comments revised accordingly.** The classification accuracy on the test data is as follows.
Result of ①
test loss: 1.6523902654647826 ~~(1.7524430536416669)~~
test acc: 0.75 ~~(0.8166666567325592)~~
Result of ②
test loss: 2.102495942115784 ~~(1.382319548305386)~~
test acc: 0.75 ~~(0.8666666634877522)~~
Compared with the 150-pixel input, the 320-pixel input made little difference in either loss or accuracy.
Next time, I would like to repeat the analysis with a larger number of image samples.