Hello. I have always been interested in image recognition, but although I had some vague knowledge, I had never actually written any code. This article is my first attempt, written with beginners in mind.
First, collect member images. There are image search services such as Bing and Yahoo, but in this article we collect images by scraping the results of a Google image search. You can get the original code from here. However, the original code can no longer download images (as of April 2020). A patch for this problem has been submitted as a pull request, so this time I use the patched code.
Example of how to use google_images_download.py
cd google_images_download/
python google_images_download.py -k "Asuka Saito"
python google_images_download.py -k "site:twitter.com Asuka Saito"
This program can only download up to 100 images at a time. If you want to collect more images, change the search word. Also, if you use a search word of the form "site:url", only the images from that URL are downloaded.
Reference: "google_images_download did not work, so here is a workaround"
Next, delete exact duplicates from the images collected in step 1. The algorithm is summarized in the figure below.
In step 1, the 8-bit x 3 (RGB) pixels are converted to 1-bit (black and white). The monochrome conversion uses an algorithm called Floyd-Steinberg dithering ([a](https://ja.wikipedia.org/wiki/%E3%83%95%E3%83%AD%E3%82%A4%E3%83%89-%E3%82%B9%E3%82%BF%E3%82%A4%E3%83%B3%E3%83%90%E3%83%BC%E3%82%B0%E3%83%BB%E3%83%87%E3%82%A3%E3%82%B6%E3%83%AA%E3%83%B3%E3%82%B0), [b](https://pillow.readthedocs.io/en/stable/reference/Image.html), c). The code for the monochrome conversion is shown below; I referred to the code here.
from PIL import Image
import numpy as np

def img2vec(filename):
    img = Image.open(filename)
    img = img.resize((200, 200), resample=Image.BILINEAR)  # Shrink
    img = img.convert('1')  # Binarize (Floyd-Steinberg dithering)
    # img.save(get_mono_filename(filename))  # If you want to check the image
    img = np.array(img)
    # Convert to an int-type matrix
    int_img = img.astype(int)
    return int_img
Then calculate the norm.
def calnorm(vec):
    norm = np.linalg.norm(vec)
    return norm
Next, sort the list by norm size so that images with the same norm end up next to each other. The sorted function with a key lets you sort on the values of any column.
def fig_norm_list_cal(list_fig):
    # Make a list that pairs each image with its norm
    # Example:
    # fig_norm_list = [[figA, normA], [figB, normB], ..., [figZ, normZ]]
    fig_norm_list = fig_norm_mat(list_fig)
    # Sort the list by norm size
    # http://sota1235.com/blog/2015/04/23/python_sort_twodime.html
    fig_norm_list = sorted(fig_norm_list, key=lambda x: x[1])
    # Convert to an np.array
    fig_norm_list = np.array(fig_norm_list)
    return fig_norm_list
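The helper fig_norm_mat used above is not listed in the article. A minimal sketch of what it presumably does, built from the img2vec and calnorm functions above (its name and exact behavior are assumptions based on how it is called), might look like this:

```python
def fig_norm_mat(list_fig):
    # Pair each image file name with the norm of its binarized pixel matrix
    fig_norm_list = []
    for fig in list_fig:
        vec = img2vec(fig)       # 200x200 binarized image as an int matrix
        norm = calnorm(vec)      # scalar norm of that matrix
        fig_norm_list.append([fig, norm])
    return fig_norm_list
```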
Go through the norm values one by one and list the files whose norm matches the next one; these are treated as duplicates to be deleted.
def del_fig_list_cal(fig_norm_list):
    # List of files to delete
    del_fig_list = []
    # The last row has no successor to compare with
    for i in range(fig_norm_list.shape[0] - 1):
        if (fig_norm_list[i, 1] == fig_norm_list[i+1, 1]):
            del_fig_list.append(str(fig_norm_list[i, 0]))
    return del_fig_list
Delete the files listed in the del_fig_list calculated above with os.remove(), as in the sketch below.
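The deletion step itself is not shown in the article; a minimal sketch, assuming del_fig_list holds the file paths returned by del_fig_list_cal, could be:

```python
import os

# Remove every file that was flagged as a duplicate
for del_fig in del_fig_list:
    os.remove(del_fig)
```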
After the duplicates are removed, the remaining images are renamed to serial numbers. This code also determines the extension at the same time.
# Rename the images to serial numbers
def fig_rename():
    i = 1
    j = 1
    k = 1
    list_fig = listfig()
    for pre_filename in list_fig:
        if ".jpg" in pre_filename:
            os.rename(pre_filename, path + "/fig_data/" + str(i) + '.jpg')
            i += 1
        elif ".png" in pre_filename:
            os.rename(pre_filename, path + "/fig_data/" + str(j) + '.png')
            j += 1
        elif ".jpeg" in pre_filename:
            os.rename(pre_filename, path + "/fig_data/" + str(k) + '.jpeg')
            k += 1
        else:
            pass
    return
Reference: a program that searches for identical images
An image may contain things other than a face, or it may contain several faces. These need to be shaped into face-only images that can be used for training. Here I used face detection with a Haar cascade classifier, which ships with OpenCV.
import cv2
import os
import glob

path = os.getcwd()

# Collect the image file names in a list
def listfig():
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = path + "/face_detect_test/" + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

# Determine the extension
def jud_ext(filename):
    if ".jpg" in filename:
        ext = '.jpg'
    elif ".png" in filename:
        ext = '.png'
    elif ".jpeg" in filename:
        ext = '.jpeg'
    else:
        ext = 'end'
    return ext

def main():
    face_cascade_path = '/home/usr/anaconda3/pkgs/libopencv-4.2.0-py36_2'\
                        '/share/opencv4/haarcascades/haarcascade_frontalface_default.xml'
    face_cascade = cv2.CascadeClassifier(face_cascade_path)
    # Build the image list
    fig_list = listfig()
    # Loop over the image list
    j = 1  # Number the source images
    k = 0  # Count images that are too small
    for fig_name in fig_list:
        # Identify the extension of fig_name
        ext = jud_ext(fig_name)
        # Stop if the extension is unknown
        if ext == 'end':
            break
        # Load the image
        src = cv2.imread(fig_name)
        src_gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(src_gray)
        # Cut out the detected faces
        i = 1  # In case there are two or more faces in one image
        for x, y, w, h in faces:
            # cv2.rectangle(src, (x, y), (x + w, y + h), (255, 0, 0), 2)
            # Crop only the face region from src
            face = src[y: y + h, x: x + w]
            # face_gray = src_gray[y: y + h, x: x + w]
            # Make sure each side is at least 64 pixels
            if face.shape[0] < 64:
                print('\ntoo small!!\n')
                print('\n{}\n'.format(fig_name))
                k += 1
                continue
            # Note: faces smaller than 64x64 pixels were skipped above and are not saved
            face = cv2.resize(face, (64, 64))
            # Save the face
            cv2.imwrite(path + '/face_detect_test/clip_face/' + 'face' + str(i) + str(j) + ext, face)
            i += 1
        j += 1
    print("too small fig is {}".format(k))

if __name__ == '__main__':
    main()
I will explain a partial excerpt of the code. First, use the following code to determine the extension of the image.
def jud_ext(filename):
    if ".jpg" in filename:
        ext = '.jpg'
    elif ".png" in filename:
        ext = '.png'
    elif ".jpeg" in filename:
        ext = '.jpeg'
    else:
        ext = 'end'
    return ext
The images I used were only in jpg, png, and jpeg format, so the code only judges those three. The extension is determined so that each cropped face can be saved with the same extension as its source image.

The following code does the actual cropping. The crop is resized to 64x64 to keep the data size down; faces whose sides are smaller than 64 pixels are skipped and not saved. However, as I noticed later, 64x64 images are extremely hard to judge by eye when sorting the face images manually, so I think saving them at 128x128 would be fine.
def main():
    face_cascade_path = '/home/usr/anaconda3/pkgs/libopencv-4.2.0-py36_2'\
                        '/share/opencv4/haarcascades/haarcascade_frontalface_default.xml'
    face_cascade = cv2.CascadeClassifier(face_cascade_path)
    # Build the image list
    fig_list = listfig()
    # Loop over the image list
    j = 1  # Number the source images
    k = 0  # Count images that are too small
    for fig_name in fig_list:
        # Identify the extension of fig_name
        ext = jud_ext(fig_name)
        # Stop if the extension is unknown
        if ext == 'end':
            break
        # Load the image
        src = cv2.imread(fig_name)
        src_gray = cv2.cvtColor(src, cv2.COLOR_BGR2GRAY)
        faces = face_cascade.detectMultiScale(src_gray)
        # Cut out the detected faces
        i = 1  # In case there are two or more faces in one image
        for x, y, w, h in faces:
            # cv2.rectangle(src, (x, y), (x + w, y + h), (255, 0, 0), 2)
            # Crop only the face region from src
            face = src[y: y + h, x: x + w]
            # face_gray = src_gray[y: y + h, x: x + w]
            # Make sure each side is at least 64 pixels
            if face.shape[0] < 64:
                print('\ntoo small!!\n')
                print('\n{}\n'.format(fig_name))
                k += 1
                continue
            # Note: faces smaller than 64x64 pixels were skipped above and are not saved
            face = cv2.resize(face, (64, 64))
            # Save the face
            cv2.imwrite(path + '/face_detect_test/clip_face/' + 'face' + str(i) + str(j) + ext, face)
            i += 1
        j += 1
    print("too small fig is {}".format(k))
The face photos are split into 80% teacher (training) data and 20% test data. I used the following code.
import os
import random
import shutil
import glob

path = os.getcwd()
names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]

# Collect the image file names in a list
def listfig(tar_dir):
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = path + tar_dir + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

def main():
    for name in names:
        # Get the list of images in the directory
        face_list = listfig("/dataset_face_fig/" + name + "/")
        # Shuffle face_list
        random.shuffle(face_list)
        # Move the top 20% of face_list to the test directory
        for i in range(len(face_list)//5):
            shutil.move(str(face_list[i]), str(path + "/dataset_face_fig/test/" + name))

if __name__ == "__main__":
    main()
After splitting the image data, check that the data is correct. When Google image search returns a wrong hit, a completely unrelated image may have been saved. In my case, there were cases like the following:
- The search results for Asuka Saito included a photo of Seira Hayakawa's face.
- I could no longer tell Asuka Saito and Nanase Nishino apart.
- Gestalt collapse set in, making it hard to identify the faces at all.
As mentioned earlier, working with 64x64 images made this manual check a struggle.

After sorting the images, the next step is to augment the teacher data by processing the images. The source code is shown below.
import os
import glob
import traceback
import cv2
from scipy import ndimage

path = os.getcwd()

# Collect the image file names in a list
def listfig(tar_dir):
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = path + tar_dir + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

# Rotate the image
def fig_rot(img, ang):
    img_rot = ndimage.rotate(img, ang)
    img_rot = cv2.resize(img_rot, (64, 64))
    return img_rot

# Determine the extension
def jud_ext(filename):
    if ".jpg" in filename:
        ext = '.jpg'
    elif ".png" in filename:
        ext = '.png'
    elif ".jpeg" in filename:
        ext = '.jpeg'
    else:
        ext = 'end'
        try:
            raise Exception
        except:
            traceback.print_exc()
    return ext

def main():
    # Directories containing the teacher data to augment
    names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]
    for name in names:
        # Get the image list from the /dataset_face_fig/name/ directory
        train_fig_list = listfig("/dataset_face_fig/" + "/" + name + "/")
        i = 1  # Keep the image names from colliding
        for train_fig in train_fig_list:
            ext = jud_ext(train_fig)
            # Load the image
            img = cv2.imread(train_fig)
            # Rotation
            for ang in [-10, 0, 10]:
                img_rot = fig_rot(img, ang)
                # Save the rotated image
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + ext, img_rot)
                # Thresholding
                img_thr = cv2.threshold(img_rot, 100, 255, cv2.THRESH_TOZERO)[1]
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'thr' + ext, img_thr)
                # Blurring
                img_filter = cv2.GaussianBlur(img_rot, (5, 5), 0)
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'fil' + ext, img_filter)
            i += 1
    return

if __name__ == "__main__":
    main()
Each image is rotated by -10, 0, and +10 degrees, so one image becomes three. Thresholding and blurring are then applied to each rotated image, so one source image ends up as 3 x 3 = 9 images. Below is an excerpt of the core code.
def main():
    # Directories containing the teacher data to augment
    names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]
    for name in names:
        # Get the image list from the /dataset_face_fig/name/ directory
        train_fig_list = listfig("/dataset_face_fig/" + "/" + name + "/")
        i = 1  # Keep the image names from colliding
        for train_fig in train_fig_list:
            ext = jud_ext(train_fig)
            # Load the image
            img = cv2.imread(train_fig)
            # Rotation
            for ang in [-10, 0, 10]:
                img_rot = fig_rot(img, ang)
                # Save the rotated image
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + ext, img_rot)
                # Thresholding
                img_thr = cv2.threshold(img_rot, 100, 255, cv2.THRESH_TOZERO)[1]
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'thr' + ext, img_thr)
                # Blurring
                img_filter = cv2.GaussianBlur(img_rot, (5, 5), 0)
                cv2.imwrite(path + '/dataset_face_fig/train/' + name + "/" + str(i) + '_' + str(ang) + 'fil' + ext, img_filter)
            i += 1
    return
After this, you need to 1. label the image data and 2. train. From here on I use Google Colaboratory (Colab). Colab provides high-spec CPUs and GPUs, which makes the computation fast, and anyone with a Google account can use it.

First, run the following code to mount Google Drive; this gives you access to the data stored in Drive. Run a cell with Shift + Enter.
from google.colab import drive
drive.mount('/content/drive')
import tensorflow as tf
tf.test.gpu_device_name()
Next, we label the image data and train the model. The full source code is shown first.
# Do everything from labeling to training
# Debugger
%pdb off
import os
import glob
import traceback
import cv2
import numpy as np
from keras.layers import Activation, Conv2D, Dense, Flatten, MaxPooling2D
from keras.models import Sequential
from keras.utils.np_utils import to_categorical
from tqdm import tqdm
import matplotlib.pyplot as plt
import pickle
import datetime

path = os.getcwd()
print(path)
%cd /content/drive/My\ Drive/dataset_face_fig

# Collect the image file names in a list
def listfig(tar_dir):
    # Fetch files other than .py
    # https://qiita.com/AAAAisBraver/items/8d40d9c2d624ecee105d
    filename = tar_dir + "*[!.py]"
    list_fig = glob.glob(filename)
    return list_fig

# Decide which person an image belongs to
def jud_hum(filename):
    if "ikuta" in filename:
        ext = 0
    elif "saito_asuka" in filename:
        ext = 1
    elif "shiraishi" in filename:
        ext = 2
    else:
        ext = 'end'
        try:
            raise Exception
        except:
            traceback.print_exc()
    return ext

# Label the teacher data
def label_training(names):
    x_train = []
    y_train = []
    for name in tqdm(names):
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/train/" + name + "/" + "*")
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_train.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_train.append(kind_hum)
    print(y_train)
    print("y_train length")
    print(len(y_train))
    # Save x_train and y_train
    f = open('x_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_train, f)
    f = open('y_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_train, f)
    return x_train, y_train

# Label the test data
def label_test(names):
    x_test = []
    y_test = []
    for name in tqdm(names):
        # face_fig_list = listfig("/content/drive/My Drive/dataset_face_fig/" + name)
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/test/" + name + "/" + "*")
        # print("/content/drive/My Drive/dataset_face_fig/" + name)
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_test.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_test.append(kind_hum)
    # Save x_test and y_test
    f = open('x_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_test, f)
    f = open('y_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_test, f)
    print(y_test)
    print("y_test length")
    print(len(y_test))
    return x_test, y_test

def cnn(x_train, y_train, x_test, y_test):
    # Define the model
    model = Sequential()
    model.add(Conv2D(input_shape=(64, 64, 3), filters=32, kernel_size=(3, 3),
                     strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=(3, 3),
                     strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Conv2D(filters=32, kernel_size=(3, 3),
                     strides=(1, 1), padding="same"))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Flatten())
    model.add(Dense(256))
    model.add(Activation("sigmoid"))
    model.add(Dense(128))
    model.add(Activation('sigmoid'))
    model.add(Dense(3))
    model.add(Activation('softmax'))
    # Compile
    model.compile(optimizer='sgd',
                  loss='categorical_crossentropy',
                  metrics=['accuracy'])
    # Train
    history = model.fit(x_train, y_train, batch_size=32,
                        epochs=50, verbose=1, validation_data=(x_test, y_test))
    # Evaluate and display generalization performance
    score = model.evaluate(x_test, y_test, batch_size=32, verbose=0)
    print('validation loss:{0[0]}\nvalidation accuracy:{0[1]}'.format(score))
    model.save("my_model.h5")
    type(history)
    return history

def learn_monitor_plot(history):
    # Plot acc and val_acc
    plt.plot(history.history["accuracy"], label="acc", ls="-", marker="o")
    plt.plot(history.history["val_accuracy"], label="val_acc", ls="-", marker="x")
    plt.ylabel("accuracy")
    plt.xlabel("epoch")
    plt.legend(loc="best")
    now = datetime.datetime.now()
    plt.savefig("learning_cpu_" + now.strftime('%Y%m%d_%H%M%S') + '.png')

def main():
    print("ok")
    names = ["ikuta_face", "saito_asuka_face", "shiraishi_face"]
    # Label the teacher data and the test data
    x_train, y_train = label_training(names)
    x_test, y_test = label_test(names)
    # Shape the image data
    x_train = np.array(x_train)
    x_test = np.array(x_test)
    # Convert the labels to one-hot vectors
    y_train = to_categorical(y_train)
    y_test = to_categorical(y_test)
    # Deep learning
    learn_history = cnn(x_train, y_train, x_test, y_test)
    # Plot the learning curve
    learn_monitor_plot(learn_history)

main()
%cd /content
The labeling part is excerpted below.
# Decide which person an image belongs to
def jud_hum(filename):
    if "ikuta" in filename:
        ext = 0
    elif "saito_asuka" in filename:
        ext = 1
    elif "shiraishi" in filename:
        ext = 2
    else:
        ext = 'end'
        try:
            raise Exception
        except:
            traceback.print_exc()
    return ext

# Label the teacher data
def label_training(names):
    x_train = []
    y_train = []
    for name in tqdm(names):
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/train/" + name + "/" + "*")
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_train.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_train.append(kind_hum)
    print(y_train)
    print("y_train length")
    print(len(y_train))
    # Save x_train and y_train
    f = open('x_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_train, f)
    f = open('y_train.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_train, f)
    return x_train, y_train

# Label the test data
def label_test(names):
    x_test = []
    y_test = []
    for name in tqdm(names):
        # face_fig_list = listfig("/content/drive/My Drive/dataset_face_fig/" + name)
        face_fig_list = glob.glob("/content/drive/My Drive/dataset_face_fig/test/" + name + "/" + "*")
        # print("/content/drive/My Drive/dataset_face_fig/" + name)
        print(face_fig_list)
        for face_fig_filename in tqdm(face_fig_list):
            face_img = cv2.imread(face_fig_filename)
            b, g, r = cv2.split(face_img)
            face_img = cv2.merge([r, g, b])
            x_test.append(face_img)
            kind_hum = jud_hum(face_fig_filename)
            y_test.append(kind_hum)
    # Save x_test and y_test
    f = open('x_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(x_test, f)
    f = open('y_test.txt', 'wb')
    # Dump the list to f
    pickle.dump(y_test, f)
    print(y_test)
    print("y_test length")
    print(len(y_test))
    return x_test, y_test
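For reference, jud_hum returns the integer labels 0, 1, and 2, which main() later converts into one-hot vectors with to_categorical. A quick check of that conversion (not in the original article) looks like this:

```python
from keras.utils.np_utils import to_categorical

print(to_categorical([0, 1, 2]))
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
```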
cv2.imread(image path) loads an image as a three-dimensional ndarray. For example, loading a 64x64-pixel image gives data of the following shape.
data = cv2.imread('filepath')
print(data.shape)
# (64, 64, 3)
cv2.split(ndarray) splits the data into three 2-D arrays; note that they come out in the order blue, green, red. cv2.merge([r, g, b]) reassembles them in RGB order, and the result is appended to x_train or x_test. When training on the GPU, reading the images took a long time, so I wrote code to save just the image data to Drive once.
# Save x_test and y_test
f = open('x_test.txt', 'wb')
#Dump list to f
pickle.dump(x_test, f)
f = open('y_test.txt', 'wb')
#Dump list to f
pickle.dump(y_test, f)
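The saved lists can then be read back in a later session instead of re-reading every image from Drive. A minimal sketch of that reload (not part of the original code) is:

```python
import pickle

# Reload the image and label lists saved above
with open('x_test.txt', 'rb') as f:
    x_test = pickle.load(f)
with open('y_test.txt', 'rb') as f:
    y_test = pickle.load(f)
```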
One epoch takes about 5 seconds on the CPU, but less than 1 second on the GPU. Please try it. The result is as follows.
The accuracy is about 80%.
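To actually classify a new face image with the saved model, something along the following lines should work. This is a minimal sketch that is not part of the original article; it assumes a 64x64 face crop (the file name is hypothetical) and the label order 0 = ikuta, 1 = saito_asuka, 2 = shiraishi defined by jud_hum.

```python
import cv2
import numpy as np
from keras.models import load_model

model = load_model("my_model.h5")
names = ["ikuta", "saito_asuka", "shiraishi"]

# Load a cropped face, convert BGR -> RGB as in training, and add a batch axis
img = cv2.imread("some_face.jpg")           # assumed to be a 64x64 face crop
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
pred = model.predict(np.array([img]))[0]
print(names[np.argmax(pred)], pred)
```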
This was my first time writing full-scale code and doing image recognition. In the future I would like to study and try other networks. Questions and comments are welcome. Thank you for reading this far.