Hello, this is Magicchic, an Aidemy trainee. What image do you have of programming? It has been about two months since I first touched programming, and I still find it difficult. Even so, I'm happy whenever I manage to fix an error and get something to run. In this post, I'd like to share what happened when a beginner like me tried to build something that works for the first time using Python.
Recently, image recognition has been used widely, for example in face recognition in cameras and in detecting defective products at factories. There are also apps that tell you which celebrity you resemble, or that judge the type of animal from a photo. CNN (Convolutional Neural Network) is the technology behind this kind of advanced image recognition. A CNN is one of the learning methods AI uses for image analysis, and it can analyze even images that are partially hard to see.
It is a feedforward network whose structure includes two kinds of layers, a convolution layer and a pooling layer, combined with "weight sharing".
In other words, it is a neural network that, in addition to the usual multilayer structure, incorporates two specially designed kinds of hidden layers.
After the image to be analyzed is loaded into the input layer, a filter scans over the whole image in the convolution layer and extracts its features (gradients, unevenness, etc.). The extracted feature data is then sent to the pooling layer, where more condensed feature data is created.
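To make the two layers described above more concrete, here is a minimal NumPy sketch of one filter pass followed by max pooling. The toy image, the edge filter, and the function names are my own assumptions for illustration and are not part of Keras. (Like most CNN libraries, the "convolution" here does not flip the filter.)

```python
import numpy as np

def convolve2d_valid(image, kernel):
    """Slide the kernel over the image (no padding, stride 1) and sum the products."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool2d(feature_map, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    h = feature_map.shape[0] // size * size
    w = feature_map.shape[1] // size * size
    blocks = feature_map[:h, :w].reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

# Toy 6x6 "image": dark left half (0), bright right half (1)
image = np.zeros((6, 6))
image[:, 3:] = 1.0

# Hypothetical filter that responds to a dark-to-bright vertical edge
kernel = np.array([[-1.0, 1.0],
                   [-1.0, 1.0]])

features = convolve2d_valid(image, kernel)  # 5x5 map, non-zero only at the edge
pooled = max_pool2d(features, size=2)       # 2x2 condensed map

print(features)
print(pooled)
```

The feature map is non-zero only where the filter sits on the edge, and pooling keeps that response while shrinking the map. In a real CNN the filter values are learned during training rather than written by hand.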
Keras is the library that makes CNNs easy for anyone to use. If you want to build an image recognition program, building a CNN in Keras is a shortcut.
Keras is a high-level neural network library written in Python that can run on top of TensorFlow or Theano. It was developed with a focus on enabling fast experimentation: being able to go from idea to result as quickly as possible is key to doing good research. (From the official Keras documentation)
For this project, I chose car model classification as the theme. I know nothing about cars, and I'm always amazed by people who can name a model just by looking at it. I don't mean to compete with them, but I wanted to see whether machine learning could do the same. I picked three target models from Toyota's domestic luxury cars: Century, Crown, and Mark X.
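The general procedure is as follows: collect images, convert them to NumPy arrays and save them, augment the images, build and train the CNN, and evaluate it.
From here on, I follow these steps in order. First, I collected the images using icrawler.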
pip install icrawler
Run this in your terminal to install icrawler. Then write the following scripts in a text editor and run them, one per model.
from icrawler.builtin import BingImageCrawler
crawler = BingImageCrawler(storage={"root_dir": "toyotacentury"})
crawler.crawl(keyword="toyotacentury", max_num=100)
from icrawler.builtin import BingImageCrawler
crawler = BingImageCrawler(storage={"root_dir": "toyotacrown"})
crawler.crawl(keyword="toyotacrown", max_num=120)
from icrawler.builtin import BingImageCrawler
crawler = BingImageCrawler(storage={"root_dir": "toyotamarkx"})
crawler.crawl(keyword="toyotamarkx", max_num=120)
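After crawling, it can help to check how many images actually landed in each folder, since the crawler may return fewer than max_num. The following is a small sketch using only the standard library; the helper name count_images and the list of extensions are my own assumptions.

```python
import os

def count_images(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return how many files in `folder` end with one of the given extensions."""
    if not os.path.isdir(folder):
        return 0
    return sum(
        1 for name in os.listdir(folder)
        if name.lower().endswith(extensions)
    )

# Folder names match the root_dir values used in the crawler scripts
for folder in ["toyotacentury", "toyotacrown", "toyotamarkx"]:
    print(folder, count_images(folder))
```

Running it again after removing unusable images by hand gives the per-model counts to work with.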
After visually checking the results and deleting images that could not be used (irrelevant or unclear ones), 80 images remained for each model. Next, convert them to NumPy arrays and save them as an .npy file.
from PIL import Image
import os, glob
import numpy as np
import sklearn
from sklearn import model_selection

classes = ["toyotacentury", "toyotacrown", "toyotamarkx"]
num_classes = len(classes)
image_size = 100

# Load each image and convert it to a NumPy array
X = []
Y = []
for index, classlabel in enumerate(classes):
    photos_dir = "./" + classlabel
    files = glob.glob(photos_dir + "/*.jpg")
    for i, file in enumerate(files):
        if i >= 93: break  # use at most 93 images per class
        image = Image.open(file)
        image = image.convert("RGB")
        image = image.resize((image_size, image_size))
        data = np.asarray(image)
        X.append(data)
        Y.append(index)  # the label is the position of the class in `classes`

# Convert from list to numpy
X = np.array(X)
Y = np.array(Y)

# Divide the data for training and evaluation
X_train, X_test, y_train, y_test = model_selection.train_test_split(X, Y)

# The four arrays have different shapes, so store them in an object array
xy = np.empty(4, dtype=object)
for i, arr in enumerate((X_train, X_test, y_train, y_test)):
    xy[i] = arr
np.save("./toyotacar.npy", xy)
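The script above packs four arrays of different shapes into a single .npy file. As a sketch of why that matters when reading the file back, the following round trip uses small stand-in arrays (the shapes and the temporary file name are assumptions for illustration): NumPy stores such an object array with pickle, so loading it requires allow_pickle=True.

```python
import os
import tempfile
import numpy as np

# Small stand-ins for the real arrays (assumed: 6 training and 2 test images)
X_train = np.zeros((6, 100, 100, 3), dtype=np.uint8)
X_test = np.zeros((2, 100, 100, 3), dtype=np.uint8)
y_train = np.array([0, 1, 2, 0, 1, 2])
y_test = np.array([0, 1])

# Pack arrays of different shapes into one object array, element by element
packed = np.empty(4, dtype=object)
for i, arr in enumerate((X_train, X_test, y_train, y_test)):
    packed[i] = arr

path = os.path.join(tempfile.mkdtemp(), "demo.npy")
np.save(path, packed)

# Object arrays are stored with pickle, so loading needs allow_pickle=True
a, b, c, d = np.load(path, allow_pickle=True)
print(a.shape, b.shape, c.shape, d.shape)
```

With its default settings, train_test_split holds out 25% of the samples, so 240 images (80 per model) would give 180 for training and 60 for evaluation.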
The next step is data augmentation. Since the current number of images is probably not enough for training, I generate additional images from the ones in each target folder.
import os
import glob
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img

# Function that generates augmented images from one input image
def draw_images(generator, x, dir_name, index):
    save_name = 'extened-' + str(index)
    g = generator.flow(x, batch_size=1, save_to_dir=dir_name,
                       save_prefix=save_name, save_format='jpeg')
    # Specify how many images to generate from one input image (10 images this time);
    # each step of the generator writes one file to dir_name
    for i in range(10):
        batch = next(g)

# Output destination folder settings
output_dir = "toyotacenturyzou"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

# Collect the images to augment
images = glob.glob(os.path.join("toyotacentury", "*.jpg"))

# Define ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0,
                             shear_range=0,
                             height_shift_range=0,
                             zoom_range=0,
                             horizontal_flip=True,
                             fill_mode="nearest",
                             channel_shift_range=40)

# Image augmentation
for i in range(len(images)):
    img = load_img(images[i])
    img = img.resize((350, 300))
    x = img_to_array(img)
    x = np.expand_dims(x, axis=0)  # add the batch dimension expected by flow()
    draw_images(datagen, x, output_dir, i)
import os
import glob
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img

# Function that generates augmented images from one input image
def draw_images(generator, x, dir_name, index):
    save_name = 'extened-' + str(index)
    g = generator.flow(x, batch_size=1, save_to_dir=dir_name,
                       save_prefix=save_name, save_format='jpeg')
    # Specify how many images to generate from one input image (10 images this time);
    # each step of the generator writes one file to dir_name
    for i in range(10):
        batch = next(g)

# Output destination folder settings
output_dir = "toyotacrownzou"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

# Collect the images to augment
images = glob.glob(os.path.join("toyotacrown", "*.jpg"))

# Define ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0,
                             shear_range=0,
                             height_shift_range=0,
                             zoom_range=0,
                             horizontal_flip=True,
                             fill_mode="nearest",
                             channel_shift_range=40)

# Image augmentation
for i in range(len(images)):
    img = load_img(images[i])
    img = img.resize((350, 300))
    x = img_to_array(img)
    x = np.expand_dims(x, axis=0)  # add the batch dimension expected by flow()
    draw_images(datagen, x, output_dir, i)
import os
import glob
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img

# Function that generates augmented images from one input image
def draw_images(generator, x, dir_name, index):
    save_name = 'extened-' + str(index)
    g = generator.flow(x, batch_size=1, save_to_dir=dir_name,
                       save_prefix=save_name, save_format='jpeg')
    # Specify how many images to generate from one input image (10 images this time);
    # each step of the generator writes one file to dir_name
    for i in range(10):
        batch = next(g)

# Output destination folder settings
output_dir = "toyotamarkxzou"
if not os.path.exists(output_dir):
    os.mkdir(output_dir)

# Collect the images to augment
images = glob.glob(os.path.join("toyotamarkx", "*.jpg"))

# Define ImageDataGenerator
datagen = ImageDataGenerator(rotation_range=20,
                             width_shift_range=0,
                             shear_range=0,
                             height_shift_range=0,
                             zoom_range=0,
                             horizontal_flip=True,
                             fill_mode="nearest",
                             channel_shift_range=40)

# Image augmentation
for i in range(len(images)):
    img = load_img(images[i])
    img = img.resize((350, 300))
    x = img_to_array(img)
    x = np.expand_dims(x, axis=0)  # add the batch dimension expected by flow()
    draw_images(datagen, x, output_dir, i)
This step increased the number of images to 800 per model, which is 10 times the original 80.
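To give a feel for what two of the ImageDataGenerator options used above do to the pixels, here is a simplified NumPy sketch of a horizontal flip and a channel shift. It is only an approximation written for illustration: the function names are my own, and Keras's internal implementation (for example its clipping rule) may differ.

```python
import numpy as np

def horizontal_flip(image):
    """Mirror an (H, W, C) image left to right."""
    return image[:, ::-1, :]

def channel_shift(image, max_shift, rng):
    """Add one random offset in [-max_shift, max_shift] to all pixels, clipped to 0..255."""
    offset = rng.uniform(-max_shift, max_shift)
    shifted = image.astype(np.float32) + offset
    return np.clip(shifted, 0, 255)

rng = np.random.default_rng(0)

# Tiny 2x3 RGB stand-in for a photo
image = np.arange(2 * 3 * 3, dtype=np.float32).reshape(2, 3, 3)

flipped = horizontal_flip(image)                       # left and right columns swap
shifted = channel_shift(image, max_shift=40, rng=rng)  # brighter or darker overall

print(flipped[:, 0, :])
print(shifted.min(), shifted.max())
```

Because every generated image is a variation of an existing photo, augmentation adds variety in orientation and color but does not add new cars or new viewpoints.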
One thing to note first: in the main function, calling np.load with only the file name in the parentheses may raise an error. Passing the allow_pickle=True option should fix it. See the link below. (https://qiita.com/ytkj/items/ee6e1125476883923db8)
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.utils import np_utils
import keras
import numpy as np
from keras.optimizers import RMSprop

classes = ["toyotacentury", "toyotacrown", "toyotamarkx"]
num_classes = len(classes)
image_size = 100

# Definition of main function
def main():
    # Read the four arrays saved earlier (an object array, so allow_pickle=True is needed)
    X_train, X_test, y_train, y_test = np.load("./toyotacar.npy", allow_pickle=True)
    # Normalize data: scale pixel values to the range 0-1
    X_train = X_train.astype("float") / 256
    X_test = X_test.astype("float") / 256
    # Convert integer labels to one-hot vectors
    y_train = np_utils.to_categorical(y_train, num_classes)
    y_test = np_utils.to_categorical(y_test, num_classes)
    # Calling training and evaluation functions
    model = model_train(X_train, y_train)
    model_eval(model, X_test, y_test)

def model_train(X, y):
    model = Sequential()
    # First block: two convolutions, pooling, dropout
    model.add(Conv2D(32, (3, 3), padding='same', input_shape=X.shape[1:]))
    model.add(Activation('relu'))
    model.add(Conv2D(32, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # Second block: two convolutions, pooling, dropout
    model.add(Conv2D(64, (2, 2), padding='same'))
    model.add(Activation('relu'))
    model.add(Conv2D(64, (3, 3)))
    model.add(Activation('relu'))
    model.add(MaxPooling2D(pool_size=(3, 3)))
    model.add(Dropout(0.25))
    # Classifier: fully connected layers ending in one output per class
    model.add(Flatten())
    model.add(Dense(512))
    model.add(Activation('relu'))
    model.add(Dropout(0.5))
    model.add(Dense(num_classes))
    model.add(Activation('softmax'))
    # Optimization process
    # (newer Keras releases name the first argument learning_rate instead of lr)
    opt = keras.optimizers.RMSprop(lr=0.0001, decay=1e-6)
    # Try to reduce the error between the correct answer and the estimated value
    model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])
    model.fit(X, y, batch_size=20, epochs=75)
    # Save model
    model.save('./toyota_cnn.h5')
    return model

def model_eval(model, X, y):
    scores = model.evaluate(X, y, verbose=1)
    print('Test Loss: ', scores[0])
    print('Test Accuracy: ', scores[1])

if __name__ == "__main__":
    main()
You may get an AttributeError when you run it (maybe it was just me). The link below explains the cause in detail. (https://ja.stackoverflow.com/questions/48286/python%e3%81%a7attributeerror)
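As a side note on the model defined above, the following sketch works out by hand how the image size changes from layer to layer. It assumes the Keras defaults that the script relies on (stride 1 for Conv2D, 'valid' padding unless 'same' is given, and a pooling stride equal to the pool size); the helper names are my own.

```python
def conv_out(size, kernel, padding):
    """Output width/height of a stride-1 convolution."""
    return size if padding == "same" else size - kernel + 1

def pool_out(size, pool):
    """Output width/height of max pooling with stride equal to the pool size."""
    return size // pool

size = 100                            # image_size in the script
size = conv_out(size, 3, "same")      # Conv2D(32, (3,3), padding='same') -> 100
size = conv_out(size, 3, "valid")     # Conv2D(32, (3,3))                 -> 98
size = pool_out(size, 2)              # MaxPooling2D((2,2))               -> 49
size = conv_out(size, 2, "same")      # Conv2D(64, (2,2), padding='same') -> 49
size = conv_out(size, 3, "valid")     # Conv2D(64, (3,3))                 -> 47
size = pool_out(size, 3)              # MaxPooling2D((3,3))               -> 15
flattened = size * size * 64          # Flatten()                         -> 14400
dense_params = flattened * 512 + 512  # weights + biases of Dense(512)

print(size, flattened, dense_params)
```

Calling model.summary() in Keras should report the same shapes. The Dense(512) layer alone has about 7.4 million parameters, which is large compared with a few hundred training images.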
Running the training script produced the following result.
Test Loss: 2.74328875541687
Test Accuracy: 0.4833333194255829
In this run, the test accuracy is below 50%, which is not reliable (random guessing among three classes would give about 33%), and the loss is far from 0. To improve these numbers, I would like to try cropping the images, increasing the number of images, and changing the number of epochs.
With "cars" as the theme in particular, the overall shapes do not differ much between models, which makes them hard to tell apart. Once I have brought the accuracy as close to 1.0 and the loss as close to 0 as possible, I would like to define a function that receives an image, makes a judgment, and runs the prediction.
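The prediction function mentioned above could look roughly like the following. This is only a sketch under assumptions, not code I have run against the trained model: the function names are hypothetical, and `model` is assumed to be any object with a Keras-style predict method, such as the model saved as toyota_cnn.h5.

```python
import numpy as np

classes = ["toyotacentury", "toyotacrown", "toyotamarkx"]
image_size = 100

def load_image_as_array(path):
    """Open an image file and return a (100, 100, 3) uint8 array (needs Pillow)."""
    from PIL import Image  # imported lazily so the rest runs without Pillow
    image = Image.open(path).convert("RGB").resize((image_size, image_size))
    return np.asarray(image)

def predict_car(model, pixels):
    """Return (class name, probability) for one RGB image array."""
    x = pixels.astype("float") / 256   # same scaling as in training
    x = np.expand_dims(x, axis=0)      # add the batch dimension
    probs = np.asarray(model.predict(x))[0]
    best = int(np.argmax(probs))
    return classes[best], float(probs[best])
```

For example, after loading the model with keras.models.load_model('./toyota_cnn.h5'), calling predict_car(model, load_image_as_array("some_car.jpg")) would return the most likely model name and its probability, where some_car.jpg is a placeholder file name.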
(https://qiita.com/kenichiro-yamato/items/b64c70882473904600bf)
(https://qiita.com/kazama0119/items/ede4732d21fe00085eb6) (https://qiita.com/keimoriyama/items/846a3462a92c8c5661ff) (https://qiita.com/keimoriyama/items/7b09d7c1797fcee6a2b0) (https://udemy.benesse.co.jp/data-science/ai/convolution-neural-network.html) (https://dev.classmethod.jp/articles/introduction-keras-deeplearning/)