**~~The article has been revised (March 10, 2020)~~** **Corrected the article (May 4, 2020)**
This article summarizes how I collected images with the Flickr API and built an image recognition model based on VGG16.
By the way, I am a beginner in machine learning, so I may use some inappropriate expressions. If so, I would appreciate it if you could point them out.
The execution environment is as follows.
The URL I referred to is listed below. https://qiita.com/ayumiya/items/e1e87df54c41519be6b4
This time, I used the Flickr API to collect a large number of images. To do this, I registered with Flickr and issued an API key and secret key, referring to the following URL. http://ykubot.com/2017/11/05/flickr-api/
For the collection itself, I was able to download the images automatically by running the following Python program. Collecting 300 images took about 5 minutes in this case.
Below is the full text of the program.
collection.py
import os
import sys
import time
import traceback
from urllib.request import urlretrieve

import flickrapi
from retry import retry

flickr_api_key = "xxxx"  # Enter the API key issued to you
secret_key = "xxxx"      # Enter the secret key issued to you

keyword = sys.argv[1]  # search word passed on the command line, e.g. "child"

@retry()
def get_photos(url, filepath):
    urlretrieve(url, filepath)
    time.sleep(1)  # pause between downloads so as not to hammer the server

if __name__ == '__main__':
    flickr = flickrapi.FlickrAPI(flickr_api_key, secret_key, format='parsed-json')
    response = flickr.photos.search(
        text=keyword,      # the word you want to search for
        per_page=300,      # how many images you want to collect
        media='photos',
        sort='relevance',
        safe_search=1,
        extras='url_q,license'
    )
    photos = response['photos']

    try:
        # create the destination directory if it does not exist yet
        os.makedirs('./image-data/' + keyword, exist_ok=True)
        for photo in photos['photo']:
            url_q = photo['url_q']
            filepath = './image-data/' + keyword + '/' + photo['id'] + '.jpg'
            get_photos(url_q, filepath)
    except Exception:
        traceback.print_exc()
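Since the search word is taken from sys.argv[1], the script is run with the keyword as a command-line argument, once per category:

terminal
python collection.py child
python collection.py elder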
This time, I collected 300 images each for the search words "child" and "elder" in order to recognize children and old people. As I will explain later, the recognition rate for "elder" is presumably low because images of plants, young women, and so on were collected along with old people.
The rest of the article describes the image recognition program. I will first focus on the points where I stumbled and the points to be aware of, and then give the full program.
child.py
import os
import cv2
import numpy as np
import matplotlib.pyplot as plt
from keras.utils.np_utils import to_categorical
from keras.layers import Dense, Dropout, Flatten, Input
from keras.applications.vgg16 import VGG16
from keras.models import Model, Sequential
from keras import optimizers
child.py
model = Model(input=vgg16.input, output=top_model(vgg16.output))
Originally, my environment had TensorFlow installed and imported Keras through it, e.g. from tensorflow.keras.utils import to_categorical. If you proceed that way, defining the model with this Model() call raises the error
('Keyword argument not understood:', 'input')
It seems this can happen when the version of Keras is old, so I updated Keras itself, but the error persisted. Rewriting the imports to load standalone Keras directly solved it.
A summary of this error and of updating Keras itself: https://sugiyamayoshiaki.jp/%E3%80%90%E3%82%A8%E3%83%A9%E3%83%BC%E3%80%91typeerror-keyword-argument-not-understood-data_format/
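For reference, here is a minimal sketch of the fix. Note also that in recent versions of Keras (and tensorflow.keras) the Model constructor expects the keyword arguments inputs= and outputs=, so renaming the arguments is another way around this error:

child.py
# importing standalone Keras instead of tensorflow.keras resolved the error in my case
from keras.models import Model

# in newer Keras versions the accepted spelling is inputs=/outputs=
model = Model(inputs=vgg16.input, outputs=top_model(vgg16.output))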
The model used this time is VGG16, a pretrained convolutional neural network.
URL: https://arxiv.org/pdf/1409.1556.pdf / Authors: Karen Simonyan & Andrew Zisserman, Visual Geometry Group, Department of Engineering Science, University of Oxford
The model handled this time is configuration **D** in the paper's Table 1. It is a neural network with 13 convolutional layers using a **3×3 kernel** and 3 fully connected layers. This architecture was announced in 2015; before that, kernel sizes were very large, such as 9×9 or 11×11, so models had a high computational load yet a low discrimination rate. VGG16 solves this problem by reducing the kernel size to 3×3 and making the network deeper (13 convolutional layers). Another key point of this model is that you can use weights pretrained on the large-scale image dataset ImageNet, with which the network can classify images into 1000 categories. There are therefore two ways to use it.
One is to use the pretrained weights as they are and pick the closest match among the 1000 categories. The second is to reuse the model itself as a feature extractor and train a new classifier on top of it for your own classes (transfer learning). This time, I use the second method to distinguish between children and old people.
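For reference, here is a minimal sketch of the first approach, classifying one image directly into the 1000 ImageNet categories. The path sample.jpg is a placeholder for any image you want to classify:

predict.py
import numpy as np
from keras.applications.vgg16 import VGG16, preprocess_input, decode_predictions
from keras.preprocessing import image

# load VGG16 with the ImageNet weights and the original 1000-class top
model = VGG16(weights='imagenet')

# load and preprocess one image at the network's native 224x224 input size
img = image.load_img('sample.jpg', target_size=(224, 224))
x = preprocess_input(np.expand_dims(image.img_to_array(img), axis=0))

# print the three most probable of the 1000 categories
for _, label, score in decode_predictions(model.predict(x), top=3)[0]:
    print(label, score)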
Let's take a look at the contents of this model.
child.py
model.summary()
output
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_1 (InputLayer) (None, 224, 224, 3) 0
_________________________________________________________________
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
_________________________________________________________________
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
_________________________________________________________________
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
_________________________________________________________________
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
_________________________________________________________________
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
_________________________________________________________________
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
_________________________________________________________________
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
_________________________________________________________________
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
_________________________________________________________________
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
_________________________________________________________________
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
_________________________________________________________________
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
_________________________________________________________________
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
_________________________________________________________________
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
_________________________________________________________________
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
_________________________________________________________________
flatten (Flatten) (None, 25088) 0
_________________________________________________________________
fc1 (Dense) (None, 4096) 102764544
_________________________________________________________________
fc2 (Dense) (None, 4096) 16781312
_________________________________________________________________
predictions (Dense) (None, 1000) 4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
Looking at the **Output Shape** column, the input is (None, 224, 224, 3). This represents **(number of samples, height, width, number of channels)**, where 3 channels correspond to RGB for a color image. Following the column down, we can see that the number of channels increases from 64 to 128 to 256 to 512 as the image passes through the convolution blocks.
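As a sanity check on the Param # column: block1_conv1 has 3 × 3 (kernel) × 3 (input channels) × 64 (filters) + 64 (biases) = 1,792 parameters, which matches the table.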
child.py
# images will be resized to 50x50, so the input tensor is (50, 50, 3)
input_tensor = Input(shape=(50, 50, 3))
# load VGG16 without its 1000-class top, keeping the ImageNet weights
vgg16 = VGG16(include_top=False, weights='imagenet', input_tensor=input_tensor)

# new classifier head for the two classes (child / elder)
top_model = Sequential()
top_model.add(Flatten(input_shape=vgg16.output_shape[1:]))
top_model.add(Dense(256, activation='relu'))
top_model.add(Dropout(0.5))
top_model.add(Dense(2, activation='softmax'))
This head is added after the VGG16 layers so that the output is divided into two classes, children and old people; a sketch of joining and compiling the full model follows the link below.
https://qiita.com/MuAuan/items/86a56637a1ebf455e180
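The article does not show the training step at this point, so the following is a minimal sketch assuming the standard transfer-learning recipe for this kind of model: join the VGG16 base to the new head, freeze the pretrained convolution layers, and compile. The number of frozen layers, the optimizer settings, and X_train/y_train are assumptions and placeholders, not values from the original:

child.py
# join the VGG16 base and the two-class head into one model
model = Model(inputs=vgg16.input, outputs=top_model(vgg16.output))

# freeze the pretrained layers so mainly the new head is trained;
# freezing the first 15 layers is a common choice, not the article's stated one
for layer in model.layers[:15]:
    layer.trainable = False

model.compile(loss='categorical_crossentropy',
              optimizer=optimizers.SGD(lr=1e-4, momentum=0.9),
              metrics=['accuracy'])

# X_train / y_train are placeholders: the collected images resized to 50x50
# and one-hot labels made with to_categorical
# model.fit(X_train, y_train, batch_size=32, epochs=10)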
child.py
def child(img):
    # resize to the 50x50 input size the model was built with
    img = cv2.resize(img, (50, 50))
    # predict and take the class with the highest probability
    pred = np.argmax(model.predict(np.array([img])))
    if pred == 0:
        return 'This is a kid'
    else:
        return 'This is an old man'
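A minimal usage sketch, assuming a test image at a hypothetical path:

child.py
# read a test image with OpenCV (the path is a placeholder)
img = cv2.imread('./image-data/child/sample.jpg')
print(child(img))  # => 'This is a kid' or 'This is an old man'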
As a result, the accuracy was not very good. This is presumably because a flickr search for "elder" also returns photos of people who are not old, so the training did not go well. The photo below is an example.
When using an already trained model such as VGG16, it is effective to choose a dataset that makes good use of what the model has already learned.
I hope to make good use of it by standing firmly on the shoulders of those who came before.
The full program is stored below. https://github.com/Fumio-eisan/vgg16_child20200310