Hello. I have always been interested in image recognition, but although I had some vague knowledge, I had never actually written any code. This article is my first attempt, written with beginners in mind.
First, collect member images. There are image search services such as Bing and Yahoo, but in this article we collect images by scraping the results of a Google image search. You can get the original code from here. However, the original code can no longer download images (as of April 2020). A patch for this problem has been submitted as a pull request, so this time I use the patched code.
Example of how to use google_images_download.py
cd google_images_download/
python google_images_download.py -k "Asuka Saito"
python google_images_download.py -k "site:twitter.com Asuka Saito"
This program can only download up to 100 images at a time. If you want to collect more images, change the search word. Also, if you use a search word of the form "site:url", only the images from that URL are downloaded.
Reference: "google_images_download did not work, so here is a workaround"
Next, delete exact duplicates from the images collected in step 1. The algorithm is summarized in the figure below.
In step 1, the 8-bit x 3 (RGB) pixels are converted to 1-bit (black and white). The monochrome conversion uses an algorithm called Floyd-Steinberg dithering ([a](https://ja.wikipedia.org/wiki/%E3%83%95%E3%83%AD%E3%82%A4%E3%83%89-%E3%82%B9%E3%82%BF%E3%82%A4%E3%83%B3%E3%83%90%E3%83%BC%E3%82%B0%E3%83%BB%E3%83%87%E3%82%A3%E3%82%B6%E3%83%AA%E3%83%B3%E3%82%B0), [b](https://pillow.readthedocs.io/en/stable/reference/Image.html), c). The code for the monochrome conversion is shown below; I referred to the code here.
from PIL import Image
import numpy as np

def img2vec(filename):
    img = Image.open(filename)
    img = img.resize((200, 200), resample=Image.BILINEAR)  # Shrink
    img = img.convert('1')  # Binarize (Floyd-Steinberg dithering)
    # img.save(get_mono_filename(filename))  # If you want to check the image
    img = np.array(img)
    # Convert to an int-type matrix
    int_img = img.astype(int)
    return int_img
Then calculate the norm.
def calnorm(vec):
    norm = np.linalg.norm(vec)
    return norm
Next, sort the list by norm size so that images with the same norm end up next to each other. The sorted function with a key lets you sort on the values of any column.
def fig_norm_list_cal(list_fig):
    # Make a list that pairs each image with its norm
    # Example:
    # fig_norm_list = [[figA, normA], [figB, normB], ..., [figZ, normZ]]
    fig_norm_list = fig_norm_mat(list_fig)
    # Sort the list by norm size
    # http://sota1235.com/blog/2015/04/23/python_sort_twodime.html
    fig_norm_list = sorted(fig_norm_list, key=lambda x: x[1])
    # Convert to an np.array
    fig_norm_list = np.array(fig_norm_list)
    return fig_norm_list
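The helper fig_norm_mat used above is not listed in the article. A minimal sketch of what it presumably does, built from the img2vec and calnorm functions above (its name and exact behavior are assumptions based on how it is called), might look like this:

```python
def fig_norm_mat(list_fig):
    # Pair each image file name with the norm of its binarized pixel matrix
    fig_norm_list = []
    for fig in list_fig:
        vec = img2vec(fig)       # 200x200 binarized image as an int matrix
        norm = calnorm(vec)      # scalar norm of that matrix
        fig_norm_list.append([fig, norm])
    return fig_norm_list
```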
Go through the norm values one by one and list the files whose norm matches the next one; these are treated as duplicates to be deleted.
def del_fig_list_cal(fig_norm_list):
    # List of files to delete
    del_fig_list = []
    # The last row has no successor to compare with
    for i in range(fig_norm_list.shape[0] - 1):
        if (fig_norm_list[i, 1] == fig_norm_list[i+1, 1]):
            del_fig_list.append(str(fig_norm_list[i, 0]))
    return del_fig_list
Delete the files listed in the del_fig_list calculated above with os.remove(), as in the sketch below.
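The deletion step itself is not shown in the article; a minimal sketch, assuming del_fig_list holds the file paths returned by del_fig_list_cal, could be:

```python
import os

# Remove every file that was flagged as a duplicate
for del_fig in del_fig_list:
    os.remove(del_fig)
```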
After the duplicates are removed, the remaining images are renamed to serial numbers. This code also determines the extension at the same time.
# Rename the images to serial numbers
def fig_rename():
    i = 1
    j = 1
    k = 1
    list_fig = listfig()
    for pre_filename in list_fig:
        if ".jpg" in pre_filename:
            os.rename(pre_filename, path + "/fig_data/" + str(i) + '.jpg')
            i += 1
        elif ".png" in pre_filename:
            os.rename(pre_filename, path + "/fig_data/" + str(j) + '.png')
            j += 1
        elif ".jpeg" in pre_filename:
            os.rename(pre_filename, path + "/fig_data/" + str(k) + '.jpeg')
            k += 1
        else:
            pass
    return
Reference: a program that searches for identical images
An image may contain things other than a face, or it may contain several faces. These need to be shaped into face-only images that can be used for training. Here I used face detection with a Haar cascade classifier, which ships with OpenCV.
import cv2
import os
import glob

path = os.getcwd()

# Collect the image file names in a list
def listfig():
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = path + "/face_detect_test/" + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

# Determine the extension
def jud_ext(filename):
    if ".jpg" in filename:
        ext = '.jpg'
    elif ".png" in filename:
        ext = '.png'
    elif ".jpeg" in filename:
        ext = '.jpeg'
    else:
        ext = 'end'
    return ext

def main():
    face_cascade_path = '/home/usr/anaconda3/pkgs/libopencv-4.2.0-py36_2'\
                        '/share/opencv4/haarcascades/haarcascade_frontalface_default.xml'
    face_cascade = cv2.CascadeClassifier(face_cascade_path)
    # Build the image list
    fig_list = listfig()
    # Loop over the image list
    j = 1  # Number the source images
    k = 0  # Count images that are too small
    for fig_name in fig_list:
        # Identify the extension of fig_name
        ext = jud_ext(fig_name)
        # Stop if the extension is unknown
        if ext == 'end':
            break
        # Load the image
        src = cv2.imread(fig_name)
        src_gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(src_gray)
        # Cut out the detected faces
        i = 1  # In case there are two or more faces in one image
        for x, y, w, h in faces:
            # cv2.rectangle(src, (x, y), (x + w, y + h), (255, 0, 0), 2)
            # Crop only the face region from src
            face = src[y: y + h, x: x + w]
            # face_gray = src_gray[y: y + h, x: x + w]
            # Make sure each side is at least 64 pixels
            if face.shape[0] < 64:
                print('\ntoo small!!\n')
                print('\n{}\n'.format(fig_name))
                k += 1
                continue
            # Note: faces smaller than 64x64 pixels were skipped above and are not saved
            face = cv2.resize(face, (64, 64))
            # Save the face
            cv2.imwrite(path + '/face_detect_test/clip_face/' + 'face' + str(i) + str(j) + ext, face)
            i += 1
        j += 1
    print("too small fig is {}".format(k))

if __name__ == '__main__':
    main()
I will explain a partial excerpt of the code. First, use the following code to determine the extension of the image.
def jud_ext(filename):
    if ".jpg" in filename:
        ext = '.jpg'
    elif ".png" in filename:
        ext = '.png'
    elif ".jpeg" in filename:
        ext = '.jpeg'
    else:
        ext = 'end'
    return ext
The images I used were only in jpg, png, and jpeg format, so the code only judges those three. The extension is determined so that each cropped face can be saved with the same extension as its source image.

The following code does the actual cropping. The crop is resized to 64x64 to keep the data size down; faces whose sides are smaller than 64 pixels are skipped and not saved. However, as I noticed later, 64x64 images are extremely hard to judge by eye when sorting the face images manually, so I think saving them at 128x128 would be fine.
def main():
    face_cascade_path = '/home/usr/anaconda3/pkgs/libopencv-4.2.0-py36_2'\
                        '/share/opencv4/haarcascades/haarcascade_frontalface_default.xml'
    face_cascade = cv2.CascadeClassifier(face_cascade_path)
    # Build the image list
    fig_list = listfig()
    # Loop over the image list
    j = 1  # Number the source images
    k = 0  # Count images that are too small
    for fig_name in fig_list:
        # Identify the extension of fig_name
        ext = jud_ext(fig_name)
        # Stop if the extension is unknown
        if ext == 'end':
            break
        # Load the image
        src = cv2.imread(fig_name)
        src_gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(src_gray)
        # Cut out the detected faces
        i = 1  # In case there are two or more faces in one image
        for x, y, w, h in faces:
            # cv2.rectangle(src, (x, y), (x + w, y + h), (255, 0, 0), 2)
            # Crop only the face region from src
            face = src[y: y + h, x: x + w]
            # face_gray = src_gray[y: y + h, x: x + w]
            # Make sure each side is at least 64 pixels
            if face.shape[0] < 64:
                print('\ntoo small!!\n')
                print('\n{}\n'.format(fig_name))
                k += 1
                continue
            # Note: faces smaller than 64x64 pixels were skipped above and are not saved
            face = cv2.resize(face, (64, 64))
            # Save the face
            cv2.imwrite(path + '/face_detect_test/clip_face/' + 'face' + str(i) + str(j) + ext, face)
            i += 1
        j += 1
    print("too small fig is {}".format(k))
The face photos are split into 80% teacher (training) data and 20% test data. I used the following code.
import os
import random
import shutil
import glob

path = os.getcwd()
names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]

# Collect the image file names in a list
def listfig(tar_dir):
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = path + tar_dir + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

def main():
    for name in names:
        # Get the list of images in the directory
        face_list = listfig("/dataset_face_fig/" + name + "/")
        # Shuffle face_list
        random.shuffle(face_list)
        # Move the top 20% of face_list to the test directory
        for i in range(len(face_list)//5):
            shutil.move(str(face_list[i]), str(path + "/dataset_face_fig/test/" + name))

if __name__ == "__main__":
    main()
After splitting the image data, check that the data is correct. When Google image search returns a wrong hit, a completely unrelated image may have been saved. In my case, there were cases like the following:
- The search results for Asuka Saito included a photo of Seira Hayakawa's face.
- I could no longer tell Asuka Saito and Nanase Nishino apart.
- Gestalt collapse set in, making it hard to identify the faces at all.
As mentioned earlier, working with 64x64 images made this manual check a struggle.

After sorting the images, the next step is to augment the teacher data by processing the images. The source code is shown below.
import os
import glob
import traceback
import cv2
from scipy import ndimage

path = os.getcwd()

# Collect the image file names in a list
def listfig(tar_dir):
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = path + tar_dir + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

# Rotate the image
def fig_rot(img, ang):
    img_rot = ndimage.rotate(img, ang)
    img_rot = cv2.resize(img_rot, (64, 64))
    return img_rot

# Determine the extension
def jud_ext(filename):
    if ".jpg" in filename:
        ext = '.jpg'
    elif ".png" in filename:
        ext = '.png'
    elif ".jpeg" in filename:
        ext = '.jpeg'
    else:
        ext = 'end'
        try:
            raise Exception
        except:
            traceback.print_exc()
    return ext

def main():
    # Directories containing the teacher data to augment
    names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]
    for name in names:
        # Get the image list from the /dataset_face_fig/name/ directory
        train_fig_list = listfig("/dataset_face_fig/" + "/" + name + "/")
        i = 1  # Keep the image names from colliding
        for train_fig in train_fig_list:
            ext = jud_ext(train_fig)
            # Load the image
            img = cv2.imread(train_fig)
            # Rotation
            for ang in [-10, 0, 10]:
                img_rot = fig_rot(img, ang)
                # Save the rotated image
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + ext, img_rot)
                # Thresholding
                img_thr = cv2.threshold(img_rot, 100, 255, cv2.THRESH_TOZERO)[1]
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'thr' + ext, img_thr)
                # Blurring
                img_filter = cv2.GaussianBlur(img_rot, (5, 5), 0)
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'fil' + ext, img_filter)
            i += 1
    return

if __name__ == "__main__":
    main()
Each image is rotated by -10, 0, and +10 degrees, so one image becomes three. Thresholding and blurring are then applied to each rotated image, so one source image ends up as 3 x 3 = 9 images. Below is an excerpt of the core code.
def main():
    # Directories containing the teacher data to augment
    names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]
    for name in names:
        # Get the image list from the /dataset_face_fig/name/ directory
        train_fig_list = listfig("/dataset_face_fig/" + "/" + name + "/")
        i = 1  # Keep the image names from colliding
        for train_fig in train_fig_list:
            ext = jud_ext(train_fig)
            # Load the image
            img = cv2.imread(train_fig)
            # Rotation
            for ang in [-10, 0, 10]:
                img_rot = fig_rot(img, ang)
                # Save the rotated image
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + ext, img_rot)
                # Thresholding
                img_thr = cv2.threshold(img_rot, 100, 255, cv2.THRESH_TOZERO)[1]
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'thr' + ext, img_thr)
                # Blurring
                img_filter = cv2.GaussianBlur(img_rot, (5, 5), 0)
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'fil' + ext, img_filter)
            i += 1
    return
After this, you need to 1. label the image data and 2. train. From here on I use Google Colaboratory (Colab). Colab provides high-spec CPUs and GPUs, which makes the computation fast, and anyone with a Google account can use it.

First, run the following code to mount Google Drive; this gives you access to the data stored in Drive. Run a cell with Shift + Enter.
from google.colab import drive
drive.mount('/content/drive')
import tensorflow as tf
tf.test.gpu_device_name()
Next, we label the image data and train the model. The full source code is shown first.
# Do everything from labeling to training
# Debugger
%pdb off
import os
import glob
import traceback
import cv2
import numpy as np
from keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.utils.np_utils import to_categorical
from tqdm import tqdm
import matplotlib.pyplot as plt
import pickle
import datetime

path = os.getcwd()
print(path)
%cd /content/drive/My\ Drive/dataset_face_fig

# Collect the image file names in a list
def listfig(tar_dir):
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = tar_dir + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

# Decide which person an image belongs to
def jud_hum(filename):
    if "ikuta" in filename:
        ext = 0
    elif "saito_asuka" in filename:
        ext = 1
    elif "shiraishi" in filename:
        ext = 2
    else:
        ext = 'end'
        try:
            raise Exception
        except:
            traceback.print_exc()
    return ext

# Label the teacher data
def label_training(names):
    x_train = []
    y_train = []
    for name in tqdm(names):
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/train/" + name + "/" + "*")
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_train.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_train.append(kind_hum)
    print(y_train)
    print("y_train length")
    print(len(y_train))
    # Save x_train and y_train
    f = open('x_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_train, f)
    f = open('y_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_train, f)
    return x_train, y_train

# Label the test data
def label_test(names):
    x_test = []
    y_test = []
    for name in tqdm(names):
        # face_fig_list = listfig("/content/drive/My Drive/dataset_face_fig/" + name)
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/test/" + name + "/" + "*")
        # print("/content/drive/My Drive/dataset_face_fig/" + name)
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_test.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_test.append(kind_hum)
    # Save x_test and y_test
    f = open('x_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_test, f)
    f = open('y_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_test, f)
    print(y_test)
    print("y_test length")
    print(len(y_test))
    return x_test, y_test

def cnn(x_train, y_train, x_test, y_test):
    # Define the model
    model = Sequential()
    model.add(Conv2D(input_shape=(64, 64, 3), filters=32, kernel_size=(3, 3),
                     strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=(3, 3),
                     strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=(3, 3),
                     strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(256))
    model.add(Activation("sigmoid"))
    model.add(Dense(128))
    model.add(Activation('sigmoid'))
    model.add(Dense(3))
    model.add(Activation('softmax'))
    # Compile
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Train
    history = model.fit(x_train, y_train, batch_size=32,
                        epochs=50, verbose=1, validation_data=(x_test, y_test))
    # Evaluate and display generalization performance
    score = model.evaluate(x_test, y_test, batch_size=32, verbose=0)
    print('validation loss:{0[0]}\nvalidation accuracy:{0[1]}'.format(score))
    model.save("my_model.h5")
    type(history)
    return history

def learn_monitor_plot(history):
    # Plot acc and val_acc
    plt.plot(history.history["accuracy"], label="acc", ls="-", marker="o")
    plt.plot(history.history["val_accuracy"], label="val_acc", ls="-", marker="x")
    plt.ylabel("accuracy")
    plt.xlabel("epoch")
    plt.legend(loc="best")
    now = datetime.datetime.now()
    plt.savefig("learning_cpu_" + now.strftime('%Y%m%d_%H%M%S') + '.png')

def main():
    print("ok")
    names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]
    # Label the teacher data and the test data
    x_train, y_train = label_training(names)
    x_test, y_test = label_test(names)
    # Shape the image data
    x_train = np.array(x_train)
    x_test = np.array(x_test)
    # Convert the labels to one-hot vectors
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
    # Deep learning
    learn_history = cnn(x_train, y_train, x_test, y_test)
    # Plot the learning curve
    learn_monitor_plot(learn_history)

main()
%cd /content
The labeling part is excerpted below.
# Decide which person an image belongs to
def jud_hum(filename):
    if "ikuta" in filename:
        ext = 0
    elif "saito_asuka" in filename:
        ext = 1
    elif "shiraishi" in filename:
        ext = 2
    else:
        ext = 'end'
        try:
            raise Exception
        except:
            traceback.print_exc()
    return ext

# Label the teacher data
def label_training(names):
    x_train = []
    y_train = []
    for name in tqdm(names):
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/train/" + name + "/" + "*")
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_train.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_train.append(kind_hum)
    print(y_train)
    print("y_train length")
    print(len(y_train))
    # Save x_train and y_train
    f = open('x_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_train, f)
    f = open('y_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_train, f)
    return x_train, y_train

# Label the test data
def label_test(names):
    x_test = []
    y_test = []
    for name in tqdm(names):
        # face_fig_list = listfig("/content/drive/My Drive/dataset_face_fig/" + name)
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/test/" + name + "/" + "*")
        # print("/content/drive/My Drive/dataset_face_fig/" + name)
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_test.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_test.append(kind_hum)
    # Save x_test and y_test
    f = open('x_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_test, f)
    f = open('y_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_test, f)
    print(y_test)
    print("y_test length")
    print(len(y_test))
    return x_test, y_test
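For reference, jud_hum returns the integer labels 0, 1, and 2, which main() later converts into one-hot vectors with to_categorical. A quick check of that conversion (not in the original article) looks like this:

```python
from keras.utils.np_utils import to_categorical

print(to_categorical([0, 1, 2]))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```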
cv2.imread(image path) loads an image as a three-dimensional ndarray. For example, loading a 64x64-pixel image gives data of the following shape.
data = cv2.imread('filepath')
print(data.shape)
# (64, 64, 3)
cv2.split(ndarray) splits the data into three 2-D arrays; note that they come out in the order blue, green, red. cv2.merge([r, g, b]) reassembles them in RGB order, and the result is appended to x_train or x_test. When training on the GPU, reading the images took a long time, so I wrote code to save just the image data to Drive once.
# Save x_test and y_test
f = open('x_test.txt', 'wb')
#Dump list to f
pickle.dump(x_test, f)
f = open('y_test.txt', 'wb')
#Dump list to f
pickle.dump(y_test, f)
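The saved lists can then be read back in a later session instead of re-reading every image from Drive. A minimal sketch of that reload (not part of the original code) is:

```python
import pickle

# Reload the image and label lists saved above
with open('x_test.txt', 'rb') as f:
    x_test = pickle.load(f)
with open('y_test.txt', 'rb') as f:
    y_test = pickle.load(f)
```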
One epoch takes about 5 seconds on the CPU, but less than 1 second on the GPU. Please try it. The result is as follows.
The accuracy is about 80%.
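To actually classify a new face image with the saved model, something along the following lines should work. This is a minimal sketch that is not part of the original article; it assumes a 64x64 face crop (the file name is hypothetical) and the label order 0 = ikuta, 1 = saito_asuka, 2 = shiraishi defined by jud_hum.

```python
import cv2
import numpy as np
from keras.models import load_model

model = load_model("my_model.h5")
names = ["ikuta", "saito_asuka", "shiraishi"]

# Load a cropped face, convert BGR -> RGB as in training, and add a batch axis
img = cv2.imread("some_face.jpg")           # assumed to be a 64x64 face crop
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
pred = model.predict(np.array([img]))[0]
print(names[np.argmax(pred)], pred)
```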
This was my first time writing full-scale code and doing image recognition. In the future I would like to study and try other networks. Questions and comments are welcome. Thank you for reading this far.