First of all, here are the articles I used as references. Thank you very much.
https://qiita.com/yottyann1221/items/a08300b572206075ee9f
https://qiita.com/tomo_20180402/items/e8c55bdca648f4877188
https://qiita.com/mainvoidllll/items/db991dc30d3ddced6250
https://newtechnologylifestyle.net/keras_imagedatagenerator/
The images shown in this post are free images from Pixabay. https://pixabay.com/ja/
I recently got started with machine learning, and after a quick first pass I have a rough idea of how the workflow goes.
This time, I decided to build a Sheltie (Shetland Sheepdog) judgment program using image recognition.
I installed Python 3 with Chocolatey on Windows 10. I then created a virtual environment with venv and installed the required libraries.
> mkdir -p e:\python\ml
> cd e:\python\ml
> python -m venv ml
> .\ml\Scripts\activate
(ml)> pip install requests
(ml)> pip install beautifulsoup4
# Without lxml, the following error occurs:
# bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
(ml)> pip install lxml
(ml)> pip install pillow
# With the latest numpy, np.load() raises the following error, so pin an older version
# ValueError: Object arrays cannot be loaded when allow_pickle=False
(ml)> pip install numpy==1.16.2
(ml)> pip install scikit-learn
(ml)> pip install tensorflow
(ml)> pip install keras
(ml)> pip install matplotlib
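As an alternative to pinning numpy, the `allow_pickle` error can usually be avoided in code. It appears because `np.save` on a tuple of differently shaped arrays stores a pickled object array, which newer numpy versions refuse to load by default. The following is a minimal sketch, not the method used in this article: the file name `data.npz` and the helpers `save_split` / `load_split` are hypothetical. It stores each array under its own key with `np.savez`, so no pickling is involved.

```python
import os
import tempfile

import numpy as np


def save_split(path, x_train, x_test, y_train, y_test):
    """Save the four arrays under separate keys in one .npz archive."""
    np.savez(path, x_train=x_train, x_test=x_test,
             y_train=y_train, y_test=y_test)


def load_split(path):
    """Load the four arrays back in the order they were saved."""
    with np.load(path) as data:
        return (data['x_train'], data['x_test'],
                data['y_train'], data['y_test'])


# Tiny stand-in arrays (hypothetical shapes, not the real image data)
x_train = np.zeros((4, 2, 2, 3), dtype='float32')
x_test = np.zeros((1, 2, 2, 3), dtype='float32')
y_train = np.eye(3, dtype='float32')[[0, 1, 2, 0]]
y_test = np.eye(3, dtype='float32')[[1]]

with tempfile.TemporaryDirectory() as tmp_dir:
    npz_path = os.path.join(tmp_dir, 'data.npz')
    save_split(npz_path, x_train, x_test, y_train, y_test)
    loaded = load_split(npz_path)

print([a.shape for a in loaded])
```

If the existing `data.npy` file is kept as is, passing `allow_pickle=True` to `np.load` is another option, provided the file comes from a trusted source.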
The directory layout is as follows.
e:\python\ml
├─ml
└─src
    ├─data
    │  ├─img
    │  │  ├─original    : Original images obtained by scraping
    │  │  │  ├─Sheltie
    │  │  │  ├─Corgi
    │  │  │  └─Border Collie
    │  │  ├─trimmed     : Images left after removing unnecessary ones
    │  │  │  ├─Sheltie
    │  │  │  ├─Corgi
    │  │  │  └─Border Collie
    │  │  ├─excluded    : Unnecessary images
    │  │  │  ├─Sheltie
    │  │  │  ├─Corgi
    │  │  │  └─Border Collie
    │  │  ├─extended    : Images augmented by processing the remaining images
    │  │  │  ├─Sheltie
    │  │  │  ├─Corgi
    │  │  │  └─Border Collie
    │  │  └─test        : Images for checking the operation of the AI
    │  ├─train_test_data
    │  │  └─data.npy    : Training / test data created from the extended images
    │  └─model          : Created model
    │     ├─Training_and_validation_accuracy.png
    │     ├─Training_and_validation_loss.png
    │     ├─model_predict.hdf5
    │     └─model_predict.json
    ├─img_scraper       : Image scraping
    │  ├─google_search.py
    │  └─main.py
    ├─img_trimmer       : Image removal
    │  └─main.py
    ├─img_duplicator    : Image augmentation
    │  └─main.py
    ├─train_test_data_generator : Training / test data creation
    │  └─main.py
    ├─model_generator   : Model generation
    │  └─main.py
    └─ai                : Sheltie judgment
       └─main.py
First, prepare the images. This time, I scraped 300 images per breed.
# -*- coding: utf-8 -*-
# img_scraper/google_search.py
import json
from urllib import parse

import requests
from bs4 import BeautifulSoup


class Google:
    def __init__(self):
        self.GOOGLE_SEARCH_URL = 'https://www.google.co.jp/search'
        self.session = requests.session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0'})

    def Search(self, keyword, type='text', maximum=1000):
        '''Google search'''
        print('Google', type.capitalize(), 'Search :', keyword)
        result, total = [], 0
        query = self.query_gen(keyword, type)
        while True:
            # Search
            html = self.session.get(next(query)).text
            links = self.get_links(html, type)

            # Add search results, stopping once `maximum` links are collected
            if not len(links):
                print('-> No more links')
                break
            elif len(links) > maximum - total:
                result += links[:maximum - total]
                break
            else:
                result += links
                total += len(links)

        print('-> Got', str(len(result)), 'links')
        return result

    def query_gen(self, keyword, type):
        '''Search query generator'''
        page = 0
        while True:
            if type == 'text':
                params = parse.urlencode({
                    'q': keyword,
                    'num': '100',
                    'filter': '0',
                    'start': str(page * 100)})
            elif type == 'image':
                params = parse.urlencode({
                    'q': keyword,
                    'tbm': 'isch',
                    'filter': '0',
                    'ijn': str(page)})
            yield self.GOOGLE_SEARCH_URL + '?' + params
            page += 1

    def get_links(self, html, type):
        '''Get links'''
        soup = BeautifulSoup(html, 'lxml')
        if type == 'text':
            elements = soup.select('.rc > .r > a')
            links = [e['href'] for e in elements]
        elif type == 'image':
            elements = soup.select('.rg_meta.notranslate')
            jsons = [json.loads(e.get_text()) for e in elements]
            links = [js['ou'] for js in jsons]
        return links
# -*- coding: utf-8 -*-
# img_scraper/main.py
import os

import requests

import google_search

# Search keywords
KEYWORDS = [u'Sheltie', u'Corgi', u'Border collie']
# Number of images to acquire
IMG_CNT = 300

g = google_search.Google()
for keyword in KEYWORDS:
    # Create the save destination
    img_dir = os.path.join('./../data/img/original', keyword)
    os.makedirs(img_dir, exist_ok=True)
    print(u'Destination: {}'.format(img_dir))
    # Image search
    img_urls = g.Search('{} filetype:jpg -Shiba inu-felt-Product-toy-Plush Doll-ornament-item-mascot-Book-Cover-movies-Cartoon-charm-jpg -Rakuten-An illustration-Sticker-mail order-LINE -stamp-silhouette-design-Mug-Breeder-Jimoty-Group-Flock-nail-crane-Freebie-Product'.format(keyword), type='image', maximum=IMG_CNT)
    # Save images
    total_cnt = 0
    for i, img_url in enumerate(img_urls):
        try:
            # Image path
            img_full_path = os.path.join(img_dir, (str(i) + '.jpg'))
            print('{}: {} -> {}'.format(str(i), img_url, img_full_path))
            res = requests.get(img_url, allow_redirects=False)
            if len(res.content) <= 0:
                print(u'{}:Skip because the content of the response is empty.'.format(str(i)))
                continue
            with open(img_full_path, 'wb') as f:
                f.write(res.content)
            total_cnt = total_cnt + 1
        except requests.exceptions.ConnectionError:
            continue
        except UnicodeEncodeError:
            continue
        except UnicodeError:
            continue
        except IsADirectoryError:
            continue
    print(u'{}:We have acquired {} images.'.format(keyword, total_cnt))
print(u'Saving is complete.')
> cd e:\python\ml\src\img_scraper
> python main.py
Sheltie:We have acquired 270 images.
Corgi:We have acquired 286 images.
Border collie:We have acquired 281 images.
Some of the images apparently could not be saved, so fewer than 300 were acquired for each breed.
Next, remove the images that cannot be used as training data. For example, some images show other dog breeds together or contain a lot of text, so I made a note of the file numbers of those images. After that, a Python script sorts the files into those used for training data and those to be excluded.
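One thing the scraper does not check is whether a response really contains an image. Because the request is made with `allow_redirects=False`, a redirect or error page with a non-empty body would still be written to a `.jpg` file. The following is a minimal sketch of a check that could run before saving; the helper names `looks_like_jpeg` and `should_save` are hypothetical and not part of the original script. It relies on the fact that JPEG data begins with the bytes `FF D8 FF`.

```python
# First three bytes of every JPEG file (start-of-image marker plus next marker prefix)
JPEG_MAGIC = b'\xff\xd8\xff'


def looks_like_jpeg(content):
    """Return True if the bytes start with the JPEG signature."""
    return content[:3] == JPEG_MAGIC


def should_save(status_code, content):
    """Save only successful responses whose body looks like a JPEG."""
    return status_code == 200 and looks_like_jpeg(content)


# Stand-in values instead of real HTTP responses
print(should_save(200, b'\xff\xd8\xff\xe0' + b'\x00' * 16))  # JPEG-like body
print(should_save(302, b'<html>moved</html>'))               # redirect page
print(should_save(200, b''))                                 # empty body
```

In the scraper this would correspond to testing the response's status code and content before opening the output file, which would also reduce the number of files that have to be excluded by hand later.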
# -*- coding: utf-8 -*-
import os, glob
import shutil
#Search keyword
KEYWORDS = [u'Sheltie', u'Corgi', u'Border collie']
targets = [
[19, 25, 34, 40, 41, 49, 54, 74, 77, 81, 86, 89, 91, 93, 102, 104, 108, 111, 118, 119, 124, 127, 130, 131, 132, 136, 152, 154, 158, 159, 161, 168, 169, 173, 178, 181, 183, 184, 192, 193, 198, 201, 202, 213, 218, 233, 235, 236, 238, 240, 243, 245, 246, 248, 249, 252, 255, 264, 265, 269, 271, 272, 280, 283, 284, 287, 294, 295],
[19, 29, 30, 40, 45, 67, 69, 73, 76, 78, 80, 100, 101, 105, 117, 120, 132, 136, 141, 143, 149, 151, 154, 158, 167, 170, 179, 180, 183, 186, 200, 201, 208, 213, 220, 224, 225, 228, 234, 235, 241, 243, 247, 248, 250, 253, 259, 262, 264, 266, 273, 277, 278, 279, 281, 282, 283, 285, 290, 293, 294, 298, 299],
[9, 21, 25, 37, 38, 40, 41, 42, 49, 52, 55, 61, 65, 66, 71, 72, 78, 80, 89, 93, 96, 103, 108, 110, 113, 114, 118, 122, 126, 127, 128, 145, 146, 152, 158, 160, 161, 164, 166, 167, 171, 174, 175, 176, 182, 183, 186, 187, 193, 194, 196, 197, 200, 202, 203, 206, 207, 222, 223, 224, 226, 228, 230, 232, 233, 234, 237, 238, 241, 243, 244, 257, 259, 260, 262, 263, 264, 265, 267, 268, 270, 273, 275, 276, 277, 278, 281, 282, 283, 287, 289, 292, 293, 295]
]
for idx, keyword in enumerate(KEYWORDS):
    total_count = 0
    # Source directory
    img_dir = os.path.join('./../data/img/original', keyword)
    # Destination directories
    os.makedirs(os.path.join('./../data/img/trimmed', keyword), exist_ok=True)
    os.makedirs(os.path.join('./../data/img/excluded', keyword), exist_ok=True)
    files = glob.glob(os.path.join(img_dir, '*.jpg'))
    for f in files:
        # The file name without its extension is the image number
        if int(os.path.splitext(os.path.basename(f))[0]) in targets[idx]:
            shutil.copy2(f, os.path.join('./../data/img/excluded', keyword))
            print(u'{} :It is an exclusion target.'.format(f))
        else:
            shutil.copy2(f, os.path.join('./../data/img/trimmed', keyword))
            print(u'{} :Used for training data.'.format(f))
            total_count = total_count + 1
    print(u'{}:The number of data that can be used is {}.'.format(keyword, total_count))
print(u'Saving is complete.')
> cd e:\python\ml\src\img_trimmer
> python main.py
Sheltie:The number of data that can be used is 202.
Corgi:The number of data that can be used is 223.
Border collie:The number of data that can be used is 187.
Only about 60 to 75% of the 300 images requested per breed can be used as training data.
Increase the amount of data by augmenting the images with Keras' ImageDataGenerator.
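The usable share can be checked directly from the counts printed above. This short sketch computes it two ways: relative to the images actually acquired and relative to the 300 requested per breed. The function name `usable_ratios` is made up for illustration.

```python
# Counts copied from the console output above
ACQUIRED = {'Sheltie': 270, 'Corgi': 286, 'Border collie': 281}
USABLE = {'Sheltie': 202, 'Corgi': 223, 'Border collie': 187}
REQUESTED = 300


def usable_ratios(acquired, usable, requested):
    """Return {breed: (share of acquired, share of requested)}."""
    return {name: (usable[name] / acquired[name], usable[name] / requested)
            for name in usable}


ratios = usable_ratios(ACQUIRED, USABLE, REQUESTED)
for name, (of_acquired, of_requested) in ratios.items():
    print('{}: {:.0%} of acquired, {:.0%} of requested'.format(
        name, of_acquired, of_requested))
```

Measured against the 300 requested images the share is roughly 62 to 74%, and measured against the acquired images it is roughly 67 to 78%.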
# -*- coding: utf-8 -*-
import os
import glob
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img

CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
# Image size
IMG_SIZE = 150
# Define the ImageDataGenerator (horizontal_flip is an on/off switch, so pass a boolean)
DATA_GENERATOR = ImageDataGenerator(horizontal_flip=True, zoom_range=0.1)

for idx, category in enumerate(CATEGORIES):
    # Source directory
    img_dir = os.path.join('./../data/img/trimmed', category)
    # Destination directory
    out_dir = os.path.join('./../data/img/extended', category)
    os.makedirs(out_dir, exist_ok=True)
    files = glob.glob(os.path.join(img_dir, '*.jpg'))
    for i, file in enumerate(files):
        img = load_img(file)
        img = img.resize((IMG_SIZE, IMG_SIZE))
        x = img_to_array(img)
        x = np.expand_dims(x, axis=0)
        g = DATA_GENERATOR.flow(x, batch_size=1, save_to_dir=out_dir, save_prefix='img', save_format='jpg')
        # Generate 10 augmented images per source image
        for _ in range(10):
            batch = next(g)
    print(u'{}:The number of files is {}.'.format(category, len(os.listdir(out_dir))))
> cd e:\python\ml\src\img_duplicator
> python main.py
Using TensorFlow backend.
Sheltie:The number of files is 1817.
Corgi:The number of files is 1983.
Border collie:The number of files is 1708.
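With 10 generated images per source image, one would expect 2,020, 2,230 and 1,870 files, yet the counts above are about 10% lower. One possible explanation, stated here as an assumption rather than something this article confirms, is that the generated file names end in a random number drawn from a limited range, so some files overwrite earlier ones. The sketch below assumes 10,000 possible suffixes (a hypothetical value) and computes the expected number of distinct file names.

```python
def expected_distinct(n_draws, n_slots):
    """Expected number of distinct values after n_draws uniform random
    draws from n_slots possible values."""
    return n_slots * (1.0 - (1.0 - 1.0 / n_slots) ** n_draws)


N_SLOTS = 10000        # assumed number of possible random suffixes
IMAGES_PER_SOURCE = 10  # matches the inner loop of the script
TRIMMED = {'Sheltie': 202, 'Corgi': 223, 'Border collie': 187}

estimates = {}
for name, count in TRIMMED.items():
    draws = count * IMAGES_PER_SOURCE
    estimates[name] = expected_distinct(draws, N_SLOTS)
    print('{}: {} generated, about {:.0f} distinct names expected'.format(
        name, draws, estimates[name]))
```

Under this assumption the estimates land close to the reported 1,817, 1,983 and 1,708 files. If collisions are indeed the cause, giving each source image its own `save_prefix` (for example one that includes the loop index) would be one way to avoid them.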
Now that a large number of images are ready, create the training data. Label each image, set aside about 20% of all the data as test data, and save the resulting training / test data.
# -*- coding: utf-8 -*-
from PIL import Image
import os, glob
import numpy as np
import random, math
from sklearn.model_selection import train_test_split
from keras.utils import np_utils
#The path of the root directory where the image is stored
IMG_ROOT_DIR = './../data/img/extended'
#category
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#density
DENSE_SIZE = len(CATEGORIES)
#Image size
IMG_SIZE = 150
#image data
X = []
#Category data
Y = []
#Teacher data
X_TRAIN = []
Y_TRAIN = []
#test data
X_TEST = []
Y_TEST = []
#Data storage destination
TRAIN_TEST_DATA = './../data/train_test_data/data.npy'
#Process by category
for idx, category in enumerate(CATEGORIES):
    # Image directory for each label
    image_dir = os.path.join(IMG_ROOT_DIR, category)
    files = glob.glob(os.path.join(image_dir, '*.jpg'))
    for f in files:
        # Resize each image and convert it to data
        img = Image.open(f)
        img = img.convert('RGB')
        img = img.resize((IMG_SIZE, IMG_SIZE))
        data = np.asarray(img)
        X.append(data)
        Y.append(idx)
X = np.array(X)
Y = np.array(Y)
#Normalization
X = X.astype('float32') /255
Y = np_utils.to_categorical(Y, DENSE_SIZE)
#Separate teacher data and test data
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, test_size=0.20)
#Save teacher / test data
np.save(TRAIN_TEST_DATA, (X_TRAIN, X_TEST, Y_TRAIN, Y_TEST))
print(u'Teacher / test data creation is complete.: {}'.format(TRAIN_TEST_DATA))
> cd e:\python\ml\src\train_test_data_generator
> python main.py
Using TensorFlow backend.
Teacher / test data creation is complete.: ./../data/train_test_data/data.npy
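To make the labeling and the 80/20 split more concrete, here is a small pure-Python sketch of the two ideas. It is an illustration only: `one_hot` mimics what `np_utils.to_categorical` produces for a single label, and `split_sizes` assumes the test count is rounded up, which is how I understand `train_test_split` to behave. Both function names are hypothetical. The total of 5,496 samples is taken from the training log later in the article (4,396 + 1,100).

```python
import math


def one_hot(label, num_classes):
    """Turn a class index into a one-hot list, e.g. 1 -> [0.0, 1.0, 0.0]."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test), rounding the test share up."""
    n_test = math.ceil(n_samples * test_size)
    return n_samples - n_test, n_test


CATEGORIES = ['Sheltie', 'Corgi', 'Border collie']
for idx, category in enumerate(CATEGORIES):
    print(category, one_hot(idx, len(CATEGORIES)))

print(split_sizes(5496, 0.20))
```

The index of the 1 in each label vector is the position of the breed in `CATEGORIES`, which is why the same list has to be used, in the same order, in every script.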
Now that the training / test data is ready, it's time to build the model. The trained model is saved at the end.
# -*- coding: utf-8 -*-
#Model building
from keras import layers, models
from keras import optimizers
import numpy as np
import matplotlib.pyplot as plt
import os
#category
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#density
DENSE_SIZE = len(CATEGORIES)
#Image size
IMG_SIZE = 150
INPUT_SHAPE = (IMG_SIZE, IMG_SIZE,3)
#Teacher data
X_TRAIN = []
Y_TRAIN = []
#test data
X_TEST = []
Y_TEST = []
#Data storage destination
TRAIN_TEST_DATA = './../data/train_test_data/data.npy'
#Model save destination
MODEL_ROOT_DIR = './../data/model/'
# -----Model building----- #
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=INPUT_SHAPE))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128,(3,3),activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128,(3,3),activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512,activation="relu"))
model.add(layers.Dense(DENSE_SIZE,activation="sigmoid"))
#Confirmation of model configuration
model.summary()
# ----- /Model building----- #
# -----Model compilation----- #
model.compile(loss="binary_crossentropy",
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=["acc"])
# ----- /Model compilation----- #
# -----Model learning----- #
#Read teacher data and test data
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = np.load(TRAIN_TEST_DATA)
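# Note: fit() returns a History object, so from here on `model` is the
# training history and the trained network is available as `model.model`
model = model.fit(X_TRAIN,
                  Y_TRAIN,
                  epochs=10,
                  batch_size=6,
                  validation_data=(X_TEST, Y_TEST))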
# ----- /Model learning----- #
# -----Learning result plot----- #
acc = model.history['acc']
val_acc = model.history['val_acc']
loss = model.history['loss']
val_loss = model.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.savefig(os.path.join(MODEL_ROOT_DIR, 'Training_and_validation_accuracy.png'))
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.savefig(os.path.join(MODEL_ROOT_DIR, 'Training_and_validation_loss.png'))
# ----- /Learning result plot----- #
# -----Save model----- #
#Save model
json_string = model.model.to_json()
open(os.path.join(MODEL_ROOT_DIR, 'model_predict.json'), 'w').write(json_string)
#Weight storage
model.model.save_weights(os.path.join(MODEL_ROOT_DIR, 'model_predict.hdf5'))
# ----- /Save model----- #
> cd e:\python\ml\src\model_generator
> python main.py
Using TensorFlow backend.
2019-11-15 17:02:03.400229: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
conv2d_1 (Conv2D) (None, 148, 148, 32) 896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 72, 72, 64) 18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 34, 34, 128) 73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128) 0
_________________________________________________________________
conv2d_4 (Conv2D) (None, 15, 15, 128) 147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128) 0
_________________________________________________________________
flatten_1 (Flatten) (None, 6272) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 6272) 0
_________________________________________________________________
dense_1 (Dense) (None, 512) 3211776
_________________________________________________________________
dense_2 (Dense) (None, 3) 1539
=================================================================
Total params: 3,454,147
Trainable params: 3,454,147
Non-trainable params: 0
_________________________________________________________________
Train on 4396 samples, validate on 1100 samples
Epoch 1/10
4396/4396 [==============================] - 103s 23ms/step - loss: 0.5434 - acc: 0.7298 - val_loss: 0.5780 - val_acc: 0.7067
Epoch 2/10
4396/4396 [==============================] - 109s 25ms/step - loss: 0.4457 - acc: 0.7989 - val_loss: 0.4288 - val_acc: 0.8024
Epoch 3/10
4396/4396 [==============================] - 103s 23ms/step - loss: 0.3874 - acc: 0.8318 - val_loss: 0.3992 - val_acc: 0.8170
Epoch 4/10
4396/4396 [==============================] - 106s 24ms/step - loss: 0.3483 - acc: 0.8469 - val_loss: 0.3476 - val_acc: 0.8476
Epoch 5/10
4396/4396 [==============================] - 107s 24ms/step - loss: 0.3029 - acc: 0.8717 - val_loss: 0.3085 - val_acc: 0.8603
Epoch 6/10
4396/4396 [==============================] - 109s 25ms/step - loss: 0.2580 - acc: 0.8947 - val_loss: 0.2918 - val_acc: 0.8736
Epoch 7/10
4396/4396 [==============================] - 107s 24ms/step - loss: 0.2182 - acc: 0.9084 - val_loss: 0.2481 - val_acc: 0.8970
Epoch 8/10
4396/4396 [==============================] - 113s 26ms/step - loss: 0.1855 - acc: 0.9217 - val_loss: 0.1920 - val_acc: 0.9209
Epoch 9/10
4396/4396 [==============================] - 120s 27ms/step - loss: 0.1548 - acc: 0.9394 - val_loss: 0.1775 - val_acc: 0.9345
Epoch 10/10
4396/4396 [==============================] - 114s 26ms/step - loss: 0.1243 - acc: 0.9530 - val_loss: 0.1738 - val_acc: 0.9412
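The shapes and parameter counts in the model summary above can be reproduced by hand, which is a useful check when trying to understand the network. The sketch below is a plain-Python recalculation, not part of the original scripts; the helper names are hypothetical. It assumes 3x3 convolutions without padding followed by 2x2 max pooling, as in the model definition.

```python
def conv2d_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(n_inputs, n_outputs):
    """Weights plus one bias per output unit."""
    return (n_inputs + 1) * n_outputs


def conv_pool_size(size, kernel=3, pool=2):
    """Spatial size after an unpadded convolution and a max pooling layer."""
    return (size - kernel + 1) // pool


size, channels = 150, 3   # input images are 150 x 150 RGB
total = 0
for filters in (32, 64, 128, 128):
    total += conv2d_params(3, channels, filters)
    size = conv_pool_size(size)
    channels = filters

flat = size * size * channels
total += dense_params(flat, 512) + dense_params(512, 3)

print('flattened size:', flat)
print('total params:', total)
```

The flattened size comes out as 6,272 and the total as 3,454,147, matching the summary. Almost all of the parameters sit in the first dense layer.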
Finally, it's time to write the Sheltie judgment program and prepare some images to check it.
# -*- coding: utf-8 -*-
from keras import models
from keras.models import model_from_json
from keras.preprocessing import image
import numpy as np
import sys
import os
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img
#Model save destination
MODEL_ROOT_DIR = './../data/model/'
MODEL_PATH = os.path.join(MODEL_ROOT_DIR, 'model_predict.json')
WEIGHT_PATH = os.path.join(MODEL_ROOT_DIR, 'model_predict.hdf5')
#category
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#Image size
IMG_SIZE = 150
INPUT_SHAPE = (IMG_SIZE, IMG_SIZE,3)
#Load the model
model = model_from_json(open(MODEL_PATH).read())
model.load_weights(WEIGHT_PATH)
#Read image from input argument
args = sys.argv
img = image.load_img(args[1], target_size=(IMG_SIZE, IMG_SIZE))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)
#Predict with a model
features = model.predict(x)
print(features)
if features[0, 0] == 1:
    print(u'Sheltie.')
else:
    for i in range(0, len(CATEGORIES)):
        if features[0, i] == 1:
            print(u"It doesn't seem to be sheltie. This is {}.".format(CATEGORIES[i]))
(ml)> cd e:\python\ml\src\ai
(ml)> python .\main.py '..\data\img\test\sheltie_00.jpg'
Using TensorFlow backend.
2019-11-15 17:58:44.863437: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[1. 0. 0.]]
Sheltie.
(ml)> python .\main.py '..\data\img\test\corgi_00.jpg'
Using TensorFlow backend.
2019-11-15 17:58:55.519838: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 1. 0.]]
It doesn't seem to be sheltie. This is Corgi.
(ml)> python .\main.py '..\data\img\test\bordercollie_00.jpg'
Using TensorFlow backend.
2019-11-15 17:59:06.457517: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 0. 1.]]
It doesn't seem to be sheltie. It's a border collie.
sheltie_00.jpg Sheltie.
cogy_00.jpg It doesn't seem to be sheltie. This is Corgi.
bordercolli_00.png It doesn't seem to be sheltie. It's a border collie.
All three images were judged correctly.
How will Fran, my dog who also appears in my icon, be judged?
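The judgment script prints a result only when one of the outputs is exactly 1, which works here because the outputs happen to be exactly 0 or 1. A more tolerant variant, sketched below under my own assumptions, picks the class with the highest score instead. The function `judge` is hypothetical, and the score lists are made-up stand-ins for one row of `model.predict(x)`.

```python
CATEGORIES = ['Sheltie', 'Corgi', 'Border collie']


def judge(scores, categories):
    """Return a message for the class with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    if best == 0:
        return 'Sheltie.'
    return "It doesn't seem to be sheltie. This is {}.".format(categories[best])


# Made-up scores, not real model output
print(judge([0.91, 0.05, 0.12], CATEGORIES))
print(judge([0.20, 0.10, 0.85], CATEGORIES))
```

Note also that the training data was divided by 255 while the judgment script passes raw pixel values to the model. That mismatch may be why the outputs saturate at exactly 0 and 1, so applying the same scaling at prediction time is probably worth trying.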
(ml)> python .\main.py '..\data\img\test\fran.jpg'
Using TensorFlow backend.
2019-11-15 17:59:28.118592: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 0. 1.]]
It doesn't seem to be sheltie. It's a border collie.
fran.png It doesn't seem to be sheltie. It's a border collie.
Too bad... Although my Fran is a Sheltie, she was judged as "It doesn't seem to be sheltie. It's a border collie." The reason is probably that there were few black Shelties in the training data. This really drove home the importance of training data.
I tried again with a different image.
(ml)> python .\main.py '..\data\img\test\fran_01.jpg'
Using TensorFlow backend.
2019-11-18 17:21:07.929836: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[1. 0. 0.]]
Sheltie.
This time, our Fran was judged to be a Sheltie.
- Add black Shelties to the training data and try again
- Incorporate the AI into a web app
- Understand CNNs