Introduction

When I started studying machine learning, I was interested in image recognition and wanted to try it, but I was not sure which approach to take. Aidemy offered a course called "Image recognition using CNN", so I took it and decided to use a CNN.

The first idea I had was classifying racehorses, because I like horse racing. Treating a racehorse as [a horse with a jockey riding it], I wanted to build something that would recognize the animal as a [horse] whether it was a racehorse with a rider or a horse without one.

The classes are horse and deer. The reason is that I also used the CIFAR-10 dataset while collecting images of horses, and CIFAR-10 happens to include deer. I wanted a second class that is a similar kind of animal, so it was a good fit.

Image collection
Processing of collected images
Image padding
Turn the image into learning / verification data
Model building and saving
Graphing the results
Test with another image

Image collection

The following was used for image collection.

Google Images Download
bing_image_downloader API

Search words "race horse", "Race horse", "Cheval de course": approximately 1400 images
Search word "deer": approximately 800 images

Dataset
CIFAR-10 horse: 1001 images
CIFAR-10 deer: 999 images

Processing of collected images

The dataset images are already processed, so only the images collected from the web need processing.

1. Delete duplicate images

Use an image hash (phash) to find duplicate images. I quoted and edited the code from "Detect folders with the same image in ImageHash".


from PIL import Image, ImageFile
import imagehash
import os

#Allow truncated (incomplete) image files to be loaded instead of raising an error
ImageFile.LOAD_TRUNCATED_IMAGES = True


#Return the difference between the phash values of two images
def image_hash(img, otherimg):
    #Compute the perceptual hash (phash) of each image
    hash = imagehash.phash ( Image.open ( img ) )
    other_hash = imagehash.phash ( Image.open ( otherimg ) )
    return hash - other_hash


#Compare image sizes: 0 if they are equal, 1 if the first image is smaller (None otherwise)
def minhash(img, otherimg):
    #size is a (width, height) tuple; tuples are compared element by element
    hash_size = Image.open ( img ).size
    otherhash_size = Image.open ( otherimg ).size
    if hash_size == otherhash_size:
        return 0
    if hash_size < otherhash_size:
        return 1


#Set this to the path of the directory that contains the image folders to check
default_dir = 'The path where you saved the directory containing the images'
#List the image folders inside that directory
img_dir = os.listdir ( default_dir )
#Path of the first image folder, which is the one that gets checked
img_dir_path = os.path.join ( default_dir, img_dir[0] )

#Get a list of image file names
img_list = os.listdir ( img_dir_path )
#Build the full path of every image in the folder
img_path = [os.path.join ( img_dir_path, i ) for i in img_list]
#Get the number of images in a folder
img_list_count = len ( img_list )

i = 0
delete_list = []

#Compare every pair of images in the folder with image_hash() and minhash()
while i < img_list_count:
    #Show progress
    print ( 'Running: ', str ( i + 1 ) + '/' + str ( img_list_count ) )
    #Start at i + 1 so an image is never compared with itself or with a pair already checked
    for j in range ( i + 1, img_list_count ):
        #A hash difference below 10 is treated as the same image
        if image_hash ( img_path[i], img_path[j] ) < 10:
            print ( img_path[i] + ' | vs | ' + img_path[j] )
            #Same size: store one of the pair in delete_list (unless its partner is already stored)
            if minhash ( img_path[i], img_path[j] ) == 0:
                if not img_path[j] in delete_list:
                    delete_list.append ( img_path[i] )
            #Different size: store the smaller image in delete_list
            if minhash ( img_path[i], img_path[j] ) == 1:
                delete_list.append ( img_path[i] )
    i += 1

#Display the image path you want to delete
print ( delete_list )


#To open each image marked for deletion in Explorer (Windows only)
# import subprocess
#
# def open_folder(path):
#     subprocess.run ( 'explorer {}'.format ( path ) )
#
# for i in range ( len ( delete_list ) ):
#     open_folder ( delete_list[i] )


#To go on and actually delete the files, uncomment the following
# for i in delete_list:
#     try:
#         os.remove( i )
#     except OSError :
#         pass
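
Subtracting two ImageHash objects gives the number of differing bits (the Hamming distance), and that is the value compared against the threshold of 10 above. The following is a minimal sketch of that idea on plain integers, together with a size comparison by pixel count. The names `hamming_distance`, `is_duplicate`, and `smaller_by_area` are hypothetical helpers for illustration, not part of the imagehash library, and the hash values are made up. The area comparison is included because `minhash()` compares `(width, height)` tuples, and Python compares tuples element by element, so a wide but short image can rank above one that has more pixels.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bits that differ between two hashes stored as integers."""
    return bin(hash_a ^ hash_b).count("1")


def is_duplicate(hash_a: int, hash_b: int, threshold: int = 10) -> bool:
    """Treat two images as duplicates when fewer than `threshold` bits differ."""
    return hamming_distance(hash_a, hash_b) < threshold


def smaller_by_area(size_a: tuple, size_b: tuple) -> int:
    """Return 0 if the first (width, height) has fewer pixels, 1 if the second
    does, and -1 if both cover the same number of pixels."""
    area_a = size_a[0] * size_a[1]
    area_b = size_b[0] * size_b[1]
    if area_a == area_b:
        return -1
    return 0 if area_a < area_b else 1


# Two made-up 64-bit hashes that differ in exactly 3 bits
h1 = 0xF0F0F0F0F0F0F0F0
h2 = h1 ^ 0b111
print(hamming_distance(h1, h2), is_duplicate(h1, h2))

# Tuple comparison ranks (100, 500) below (200, 10),
# although the second image has far fewer pixels
print((100, 500) < (200, 10), smaller_by_area((100, 500), (200, 10)))
```

Comparing by area is only one possible design choice; whether it is preferable depends on which copy of a duplicate you want to keep.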

References

Compare image similarity between ORB and Perceptual Hash using python

Calculate image similarity using Perceptual Hash

2. Delete unrelated images and convert images to RGB format

I manually deleted images that I thought couldn't be used for learning.

For the conversion to RGB format, I referred to "Preprocess images for machine learning".

The processed images are now ready.

horse folder: approximately 1400 images → 438 images in total
deer folder: approximately 800 images → 139 images in total

The CIFAR-10 images are used in addition to the images above.

Image padding

The 459 web-collected images in the horse folder and the 1138 images in the deer folder (web-collected plus CIFAR-10) are padded (augmented) with ImageDataGenerator.

You can train the model directly with fit_generator() and flow(), but this time the generator is used only for simple padding.

So the generated images are saved to the drive.

from keras.preprocessing.image import ImageDataGenerator
import os


datagen = ImageDataGenerator(rotation_range=20,  #Range of random rotation (unit: degrees)
                             width_shift_range=0.2,  #Random horizontal shift, as a fraction of the image width
                             height_shift_range=0.2,  #Random vertical shift, as a fraction of the image height
                             shear_range=0.2,  #Shear intensity. Larger values make the image look more slanted, squashed, or stretched
                             zoom_range=0.2,  #Random zoom. The image is scaled between 1 - zoom_range and 1 + zoom_range
                             horizontal_flip=True)  #Randomly flip horizontally

root_dir = './data/padding'  #Path to the directory holding the image folder to pad
targetsize = (128, 128)  #Size the images are resized to
save_dir = os.listdir(root_dir)  #Folder names; the first one names the output folder
save_path = os.path.join('./data/save', save_dir[0])  #Where to save the padded images
increase = len(os.listdir(os.path.join(root_dir, save_dir[0])))  #Number of images in the folder to pad
increase_count = 1  #Number of variants generated per image (increase x increase_count images are added)

#Create the destination directory if it does not exist
if not os.path.exists(save_path):
    os.makedirs(save_path)

#flow_from_directory() reads the folder to pad, applies the transformations, and saves the generated images
ffd = datagen.flow_from_directory(
    directory=root_dir,
    target_size=targetsize,
    color_mode='rgb',
    batch_size=increase,
    save_to_dir=save_path)

#Each call to next() generates and saves one batch of padded images
for _ in range(increase_count):
    next(ffd)

The horse folder and the deer folder now contain 2000 images each.
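
To make the fractional ImageDataGenerator settings more concrete, here is a small sketch that turns them into pixel and scale limits for a 128 x 128 image, and computes how many images one run adds. The function names `augmentation_ranges` and `images_generated` are hypothetical and only illustrate the arithmetic described in the comments of the padding script; they do not call Keras.

```python
def augmentation_ranges(width, height, width_shift, height_shift, zoom):
    """Translate fractional settings into concrete limits for one image size."""
    return {
        "max_shift_x_px": width * width_shift,    # largest horizontal shift in pixels
        "max_shift_y_px": height * height_shift,  # largest vertical shift in pixels
        "zoom_min": 1 - zoom,                     # lower bound of the zoom range
        "zoom_max": 1 + zoom,                     # upper bound of the zoom range
    }


def images_generated(images_in_folder, passes):
    """Each pass over the folder writes one augmented copy of every image."""
    return images_in_folder * passes


limits = augmentation_ranges(128, 128, 0.2, 0.2, 0.2)
for name, value in limits.items():
    print(name, value)

# With 459 source images and increase_count = 1, one run adds 459 images
print(images_generated(459, 1))
```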

References

Keras-Increase training images with Keras ImageDataGenerator
Modify Keras CNN to understand ImageDataGenerator
classifier_from_little_data_script_1.py
[How to pad learning images with Keras ImageDataGenerator](https://intellectual-curiosity.tokyo/2019/07/03/keras%E3%81%AEimagedatagenerator%E3%81%A7%E5%AD%A6%E7%BF%92%E7%94%A8%E7%94%BB%E5%83%8F%E3%82%92%E6%B0%B4%E5%A2%97%E3%81%97%E3%81%99%E3%82%8B%E6%96%B9%E6%B3%95/)
Image Preprocessing

import

The imports used up to "Graphing the results" are as follows.

#Code to run Keras on PlaidML
import plaidml.keras
plaidml.keras.install_backend()

from sklearn.model_selection import train_test_split
from keras.callbacks import ModelCheckpoint
from keras.layers import Conv2D, MaxPooling2D, Dense, Dropout, Flatten
from keras.models import Sequential
from keras.utils import np_utils
from keras import optimizers
from keras.preprocessing.image import img_to_array, load_img
import keras
import glob
import numpy as np
import matplotlib.pyplot as plt

Turn images into learning / verification data

Unify all image sizes
Convert the images to arrays
Split into 80% training data and 20% validation data

#Image directory path
root_dir = './data/'
#Image directory name
baka = ['horse', 'deer']

X = []  #List that stores the image data
y = []  #List that stores the labels (correct answers)

for label, img_title in enumerate(baka):
    file_dir = root_dir + img_title
    img_file = glob.glob(file_dir + '/*')
    for i in img_file:
        img = img_to_array(load_img(i, target_size=(128, 128)))
        X.append(img)
        y.append(label)

#Convert to NumPy arrays; X becomes a 4D array of shape (*, 128, 128, 3)
X = np.asarray(X)
y = np.asarray(y)

#Scale pixel values to the range 0 to 1
X = X.astype('float32') / 255.0
#Convert the labels to one-hot vectors
y = np_utils.to_categorical(y, 2)

#Divide the data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
xy = (X_train, X_test, y_train, y_test)
#Save in .npy format
np.save('./npy/train.npy', xy)
#Check the shape of the training images (rows (height), columns (width), channels (3)); this is the input_shape of the input layer
print('3D:', X_train.shape[1:])
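
The preprocessing above does three things: scales pixels to the range 0 to 1, one-hot encodes the labels, and splits the data 80/20. The sketch below reproduces that arithmetic in plain Python so the results are easy to inspect. The helper names `scale_pixels`, `one_hot`, and `split_counts` are hypothetical, and the rounding rule (the validation share is rounded up) is my understanding of how train_test_split behaves, not something stated in the article. Under that assumption the training counts come out as 3200, 3672, and 3641 for totals of 4000, 4590, and 4552 images, which matches the sample counts in the training logs in this article.

```python
def scale_pixels(pixels):
    """Map 8-bit pixel values (0 to 255) to floats in the range 0 to 1."""
    return [p / 255.0 for p in pixels]


def one_hot(labels, num_classes):
    """Turn integer labels into one-hot rows, e.g. 1 -> [0.0, 1.0] for 2 classes."""
    return [[1.0 if i == label else 0.0 for i in range(num_classes)]
            for label in labels]


def split_counts(n_samples, test_percent=20):
    """Return (train, validation) sizes, rounding the validation share up."""
    n_test = (n_samples * test_percent + 99) // 100  # integer ceiling division
    return n_samples - n_test, n_test


print(scale_pixels([0, 51, 255]))
print(one_hot([0, 1], 2))  # 0 = horse, 1 = deer, following the order of the folder list
for total in (4000, 4590, 4552):
    print(total, split_counts(total))
```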

Model building and saving

This part was trial and error. ModelCheckpoint is passed to the callbacks of model.fit() so that the model is checked at every epoch. In the end, the entire model from the epoch with the lowest val_loss remains in HDF5 format. The model and graphs shown here are the ones that gave the highest accuracy in the final test.

#Create an empty Sequential model to add the layers to
model = Sequential ()

#Input layer, hidden layers (activation function: relu)
model.add ( Conv2D ( 32, (3, 3), activation='relu', padding='same', input_shape=X_train.shape[1:] ) )
model.add ( MaxPooling2D ( pool_size=(2, 2) ) )

model.add ( Conv2D ( 32, (3, 3), activation='relu', padding='same' ) )
model.add ( MaxPooling2D ( pool_size=(2, 2) ) )

model.add ( Conv2D ( 64, (3, 3), activation='relu' ) )
model.add ( MaxPooling2D ( pool_size=(2, 2) ) )

model.add ( Conv2D ( 128, (3, 3), activation='relu' ) )
model.add ( MaxPooling2D ( pool_size=(2, 2) ) )

model.add ( Flatten () )
model.add ( Dense ( 512, activation='relu' ) )
model.add ( Dropout ( 0.5 ) )

#Output layer (2 classes) (activation function: softmax)
model.add ( Dense ( 2, activation='softmax' ) )

#Compile (learning rate: 1e-3, loss function: categorical_crossentropy, optimization algorithm: RMSprop, evaluation metric: accuracy)
rms = optimizers.RMSprop ( lr=1e-3 )
model.compile ( loss='categorical_crossentropy',
                optimizer=rms,
                metrics=['accuracy'] )

#Number of training epochs
epoch = 50
#Path to save the model
#Note: the f-string is evaluated right here, so the file name is fixed as model.50-.h5
fpath = f'./model/model.{epoch:02d}-.h5'
#Checkpoint that decides at each epoch whether to save the model
mc = ModelCheckpoint (
    filepath=fpath,
    monitor='val_loss',  #Metric to monitor
    verbose=1,
    save_best_only=True,  #Save only when val_loss improves, so the best model is not overwritten by a worse one
    save_weights_only=False,  #If False, save the entire model
    mode='min',  #The monitored value is val_loss, so lower is better
    period=1 )  #Epoch interval between checks

#Train the built model
history = model.fit (
    X_train,
    y_train,
    batch_size=64,
    epochs=epoch,
    callbacks=[mc],
    validation_data=(X_test, y_test) )
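
As a check on the architecture above, the following sketch traces how the 128 x 128 input shrinks through the four Conv2D + MaxPooling2D blocks and how many values reach the Dense layer after Flatten. It assumes the Keras defaults that the listing relies on (stride 1 convolutions, 'valid' padding unless 'same' is given, and pooling with stride equal to the pool size). The function names are hypothetical and no Keras code is executed.

```python
def conv_output(size: int, kernel: int = 3, padding: str = "valid") -> int:
    """Spatial size after a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1


def pool_output(size: int, pool: int = 2) -> int:
    """Spatial size after max pooling with stride equal to the pool size."""
    return size // pool


def trace_shapes(input_size, blocks):
    """Return (height, width, channels) after each Conv2D + MaxPooling2D pair."""
    shapes = []
    size = input_size
    for padding, filters in blocks:
        size = pool_output(conv_output(size, padding=padding))
        shapes.append((size, size, filters))
    return shapes


# (padding, filters) for the four convolution blocks of the model above
blocks = [("same", 32), ("same", 32), ("valid", 64), ("valid", 128)]
shapes = trace_shapes(128, blocks)
height, width, channels = shapes[-1]
flatten_units = height * width * channels

for shape in shapes:
    print(shape)
print("Flatten:", flatten_units)
```

The last feature map is 6 x 6 x 128, so Flatten passes 4608 values to the 512-unit Dense layer.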

Graphing the results

The values shown in the graph titles are not those of the saved model.h5 but those of the last epoch.

#score was not defined in the original listing; this assumes it came from model.evaluate()
#Evaluate the model from the last epoch on the validation data; returns [loss, accuracy]
score = model.evaluate(X_test, y_test, verbose=0)

#Visualization
fig = plt.figure(figsize=(18, 6))  #Create the figure

#Accuracy graph
plt.subplot(1, 2, 1)  #Left of the two side-by-side plots
plt.plot(history.history['acc'], label='acc', ls='-', marker='o')  #Training accuracy
plt.plot(history.history['val_acc'], label='val_acc', ls='-', marker='x')  #Validation accuracy
plt.title(f'Training and validation accuracy \n val_acc {score[1]:.4f}')  #Title
plt.xlabel('epoch')  #Horizontal axis
plt.ylabel('accuracy')  #Vertical axis
plt.legend(['acc', 'val_acc'])  #Legend
plt.grid(color='gray', alpha=0.2)  #Show grid

#Loss graph
plt.subplot(1, 2, 2)  #Right of the two side-by-side plots
plt.plot(
    history.history['loss'], label='loss', ls='-', marker='o')  #Training loss
plt.plot(history.history['val_loss'], label='val_loss', ls='-', marker='x')  #Validation loss
plt.title(f'Training and validation loss \n val_loss {score[0]:.4f}')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend(['loss', 'val_loss'])
plt.grid(color='gray', alpha=0.2)

#Save
plt.savefig('1.png')

plt.show()

Saved model

Epoch 15/50 ･･････ 3200/3200 [==============================] - 122s 38ms/step - loss: 0.1067 - acc: 0.9625 - val_loss: 0.1872 - val_acc: 0.9363

The graph is not stable, and the minimum val_loss was reached at epoch 15: val_loss: 0.1872 - val_acc: 0.9363
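
The behavior of save_best_only with mode='min' can be sketched in a few lines: a save happens only when val_loss drops below the best value seen so far, so the file left at the end comes from the epoch with the lowest val_loss. The function name `epochs_that_save` is hypothetical and the loss values are made up for illustration; they are not the values from this training run.

```python
def epochs_that_save(val_losses):
    """Return the 1-based epochs at which val_loss improved on the best so far."""
    best = float("inf")
    saved = []
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:  # strictly lower, as with mode='min'
            best = loss
            saved.append(epoch)
    return saved


# Made-up validation losses for six epochs
val_losses = [0.60, 0.45, 0.47, 0.30, 0.35, 0.33]
saved = epochs_that_save(val_losses)
print("epochs that trigger a save:", saved)
print("epoch kept at the end:", saved[-1])
```

With real training, the same list is available as history.history['val_loss'] after model.fit() returns.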

Possibly because the learning rate is high and the amount of data is small, overfitting occurs in the second half of training.

Improvements based on this are described in the "What I tried" sections near the end of this article. In the end, they did not work...

Test with another image

Prepare a completely different image and distinguish it with the saved model.

#Code to run Keras on PlaidML
import plaidml.keras
plaidml.keras.install_backend()

from keras.preprocessing.image import img_to_array, load_img
from keras.models import load_model
import numpy as np
import glob


#Model data path
hdf_path = './model/model.20-val_loss 0.2648 - val_acc 0.8793.h5'
#Model loading
model = load_model(hdf_path)

#Directory containing the images to test
img_path = './baka/'
#Get the paths of all test images
img_list = glob.glob(img_path + '*')
#target_size takes (height, width); load_img reads the image as RGB by default
size = (128, 128)

for index, i in enumerate(img_list):
    #Load the image resized to the target size and convert it to an array
    test_img = img_to_array(load_img(i, target_size=size))
    #Scale to the range 0 to 1
    test_img = test_img / 255
    #Add a batch dimension to make a 4D array
    test_img = test_img[np.newaxis, ...]
    #Predict
    pre = model.predict(test_img)
    if np.max(pre[0]) == pre[0, 0]:
        print(f'{img_list[index]} -> {pre}Horse')
    if np.max(pre[0]) == pre[0, 1]:
        print(f'{img_list[index]} -> {pre}Deer')

A high number on the left side of the array is a horse, and a high number on the right side is a deer.

deer1.jpg-> [[0.08649362 0.9135064]] is a deer
deer2.jpg-> [[5.096481e-06 9.999949e-01]] is a deer
deer3.jpg-> [[0.01137464 0.9886254]] is a deer
deer4.jpg-> [[0.04577665 0.9542234]] is a deer
deer5.jpg-> [[1.0562457e-07 9.9999988e-01]] is a deer
deer6.jpg-> [[0.10744881 0.89255124]] is a deer
deer7.jpg-> [[0.5856648 0.41433516]] is a horse
horse1.jpg-> [[0.00249346 0.99750656]] is a deer
horse10.jpg-> [[0.6968936 0.30310643]] is a horse
horse2.jpg-> [[0.90138936 0.09861059]] is a horse
horse3.jpg-> [[9.9987268e-01 1.2731158e-04]] is a horse
horse4.jpg-> [[9.9999964e-01 4.1403896e-07]] is a horse
horse5.jpg-> [[9.999294e-01 7.052123e-05]] is a horse
horse6.jpg-> [[9.9999738e-01 2.6105645e-06]] is a horse
horse7.jpg-> [[0.93193245 0.06806755]] is a horse
horse8.jpg-> [[0.01251398 0.987486]] is a deer
horse9.jpg-> [[0.00848716 0.99151284]] is a deer

The correct answer rate was 76.47%.
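
The accuracy figure can be reproduced from the listed outputs. The sketch below picks the class with the larger probability for each file and compares it with the class implied by the file name. Taking the true label from the file-name prefix is an assumption made for this illustration, and the helper names `predicted_label` and `true_label` are hypothetical.

```python
LABELS = ["horse", "deer"]  # index 0 = horse, index 1 = deer

# Softmax outputs copied from the test results above
results = {
    "deer1.jpg": [0.08649362, 0.9135064],
    "deer2.jpg": [5.096481e-06, 9.999949e-01],
    "deer3.jpg": [0.01137464, 0.9886254],
    "deer4.jpg": [0.04577665, 0.9542234],
    "deer5.jpg": [1.0562457e-07, 9.9999988e-01],
    "deer6.jpg": [0.10744881, 0.89255124],
    "deer7.jpg": [0.5856648, 0.41433516],
    "horse1.jpg": [0.00249346, 0.99750656],
    "horse10.jpg": [0.6968936, 0.30310643],
    "horse2.jpg": [0.90138936, 0.09861059],
    "horse3.jpg": [9.9987268e-01, 1.2731158e-04],
    "horse4.jpg": [9.9999964e-01, 4.1403896e-07],
    "horse5.jpg": [9.999294e-01, 7.052123e-05],
    "horse6.jpg": [9.9999738e-01, 2.6105645e-06],
    "horse7.jpg": [0.93193245, 0.06806755],
    "horse8.jpg": [0.01251398, 0.987486],
    "horse9.jpg": [0.00848716, 0.99151284],
}


def predicted_label(probs):
    """Return the label whose probability is largest."""
    return LABELS[probs.index(max(probs))]


def true_label(filename):
    """Assumption: the file-name prefix states the correct class."""
    return "deer" if filename.startswith("deer") else "horse"


wrong = [name for name, probs in results.items()
         if predicted_label(probs) != true_label(name)]
correct = len(results) - len(wrong)
accuracy = correct / len(results)

print("misclassified:", wrong)
print(f"{correct}/{len(results)} = {accuracy:.2%}")
```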

horse10.jpg was judged as [horse], but horse8.jpg and horse9.jpg were judged as [deer], even though recognizing racehorses was the main purpose.

I realized that I still have not studied enough to tell whether the cause is the dataset, the data size, or something else entirely.

Here are some of the things I tried.

What I tried part 1

Change dataset

Change the horse folder
Add 21 photos of racehorses taken from the front
Put only images padded from the 459 web-collected photos in the horse folder (the 1001 CIFAR-10 horse images are not used)
Pad the deer folder to match the horse folder (this still uses the 999 CIFAR-10 deer images)

horse folder → 2295 images
deer folder → 2295 images

I did not change the layers, but I lowered the learning rate to 1e-4.

Epoch 27/30 ･･････ 3672/3672 [==============================] - 139s 38ms/step - loss: 0.1167 - acc: 0.9570 - val_loss: 0.1760 - val_acc: 0.9227

The graph is not stable.

Test results

Incorrect
deer1.jpg-> [[0.5788138 0.42118627]] is a horse
deer5.jpg-> [[0.5183205 0.48167947]] is a horse
horse8.jpg-> [[0.0699899 0.93001]] is a deer

Correct
horse9.jpg-> [[0.612066 0.38793397]] is a horse
horse10.jpg-> [[0.7463752 0.2536248]] is a horse

The accuracy dropped to 70.59%.

What I tried part 2

This time I lowered the learning rate further to 1e-5 and set the batch size to 32. I haven't changed the layers. The graph has become stable. However, the correct answer rate of the test was 47.06%, which was considerably lower.

What I tried part 3

I tried many other things with the above dataset, but I didn't get the results I expected, so I changed the dataset again.

Change the horse folder
Use the 1001 CIFAR-10 horse images that were not used in part 1
Pad only the images collected from the web (459 images → 1275 images)
The deer folder is unchanged except for the number of images

horse folder → 2276 images
deer folder → 2276 images

I also reduced the number of layers.

#Create an empty Sequential model to add the layers to
model = Sequential ()

#Input layer, hidden layers (activation function: relu)
model.add ( Conv2D ( 32, (3, 3), activation='relu', padding='same', input_shape=X_train.shape[1:] ) )
model.add ( MaxPooling2D ( pool_size=(2, 2) ) )

model.add ( Conv2D ( 32, (3, 3), activation='relu', padding='same' ) )
model.add ( MaxPooling2D ( pool_size=(2, 2) ) )

model.add ( Conv2D ( 64, (3, 3), activation='relu' ) )
model.add ( MaxPooling2D ( pool_size=(2, 2) ) )

model.add ( Flatten () )
model.add ( Dense ( 64, activation='relu' ) )
model.add ( Dropout ( 0.5 ) )

#Output layer (2 classes) (activation function: softmax)
model.add ( Dense ( 2, activation='softmax' ) )

Compile settings: learning rate 1e-4, loss function categorical_crossentropy, optimization algorithm RMSprop, evaluation metric accuracy. Training settings: epochs = 20, batch_size = 32.

Epoch 18/20 3641/3641 [==============================] - 131s 36ms/step - loss: 0.2647 - acc: 0.8846 - val_loss: 0.2948 - val_acc: 0.8716

The graph became somewhat more stable, but the test accuracy was 64.71%.
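
For comparison across the runs, the reported percentages can be converted back into counts of correctly classified images. This sketch assumes that every run used the same 17 test images as the first test, which the article does not state explicitly; `correct_count` is a hypothetical helper name.

```python
TEST_IMAGES = 17  # assumption: the 17 files listed in the first test are reused in every run


def correct_count(accuracy_percent, total=TEST_IMAGES):
    """Recover the number of correct predictions from a rounded percentage."""
    return round(accuracy_percent / 100 * total)


runs = [
    ("first model", 76.47),
    ("part 1", 70.59),
    ("part 2", 47.06),
    ("part 3", 64.71),
]

counts = {}
for name, percent in runs:
    counts[name] = correct_count(percent)
    # Print the count and the percentage it maps back to as a consistency check
    print(f"{name}: {counts[name]}/{TEST_IMAGES} = {counts[name] / TEST_IMAGES:.2%}")
```

Under that assumption each reported percentage maps back to a whole number of images, so one extra mistake changes the accuracy by about 5.9 points on a test set this small.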

Consideration from the test image

I tried various things, including the sigmoid function, but deer1.jpg still has a high probability of being judged as a horse. Even more so, the frontal racehorse images horse8.jpg and horse9.jpg tend to be judged as deer. The data may be insufficient.

In conclusion

There are still various techniques for improving accuracy, such as learning rate decay, ensemble learning, transfer learning, and EfficientNet, but I will stop here for now and take on the challenge again later.

I didn't get the results I wanted, but I was able to try image recognition using CNN.

References

Classification into 2 types (classes) by Keras Which optimization method shows the best performance for learning CNN I actually made and released an AI that determines whether the chocolate I got on Valentine's Day is my favorite (2019) Operation method when handling arrays such as images Friends identification with TensorFlow + Keras-Part 2: Learning with simple CNN Deep learning basic knowledge for super beginners Easy to build CNN with Keras Machines that can't distinguish beautiful women are just machines: Dataset generation for machine learning by Python Creating a slope group discrimination AI by image recognition Create an AI that lets you choose Ayataka with image recognition Tweak hyperparameters with MNIST to see the loss / accuracy graph CNN on Keras Save the best model (How to use ModelCheckpoint) A story about using deep learning to create an actress discrimination program that looks just like you DNN implementation in Keras Load a model trained with KERAS and identify one image

Image recognition using CNN Horses and deer
