First of all, thank you to the authors of the articles I used as references.
The images posted here are free stock images.
I have just started with machine learning. After a quick tour of the basics, this seems to be how a project proceeds.
This time, I decided to build a program that uses image recognition to judge whether a dog is a Sheltie.
I installed Python 3 with Chocolatey on Windows 10, then created a virtual environment with venv and installed the required libraries.
> mkdir -p e:\python\ml
> cd e:\python\ml
> python -m venv ml
> .\ml\Scripts\activate
(ml)> pip install requests
(ml)> pip install beautifulsoup4
# Without lxml, BeautifulSoup raises the following error:
# bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
(ml)> pip install lxml
(ml)> pip install pillow
# With the latest numpy, np.load() raises the following error, so pin an older version
# ValueError: Object arrays cannot be loaded when allow_pickle=False
(ml)> pip install numpy==1.16.2
(ml)> pip install scikit-learn
(ml)> pip install tensorflow
(ml)> pip install keras
(ml)> pip install matplotlib
The directory layout looks like this.
├─data
│ ├─img
│ │ ├─original :Original image obtained by scraping
│ │ │ ├─ Sheltie
│ │ │ ├─ Corgi
│ │ │ └─ Border Collie
│ │ ├─trimmed :Image left after removing unnecessary images
│ │ │ ├─ Sheltie
│ │ │ ├─ Corgi
│ │ │ └─ Border Collie
│ │ ├─excluded :Unnecessary images
│ │ │ ├─ Sheltie
│ │ │ ├─ Corgi
│ │ │ └─ Border Collie
│ │ ├─extended :Images augmented by processing the remaining images
│ │ │ ├─ Sheltie
│ │ │ ├─ Corgi
│ │ │ └─ Border Collie
│ │ └─test :Image for checking the operation of AI
│ ├─train_test_data
│ │ └─data.npy :Training / test data created from the extended images
│ ├─model :Created model
│ │ ├─Training_and_validation_accuracy.png
│ │ ├─Training_and_validation_loss.png
│ │ ├─model_predict.hdf5
│ │ └─model_predict.json
├─img_scraper :Image scraping
│ ├─
│ └─
├─img_trimmer :Image removal
│ └─
├─img_duplicator :Image augmentation
│ └─
├─train_test_data_generator :Training / test data creation
│ └─
├─model_generator :Model generation
│ └─
└─ai :Sheltie judgment
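The data directories in this layout could be created up front with a short script. This is only a sketch: `make_data_dirs` is a hypothetical helper, the breed names are taken from the KEYWORDS list used by the scripts in this article, and the root path is whatever you pass in.

```python
from pathlib import Path

# Breed names follow the KEYWORDS list used by the scripts in this article
BREEDS = ['Sheltie', 'Corgi', 'Border collie']
# Per-breed image stages shown in the directory tree
STAGES = ['original', 'trimmed', 'excluded', 'extended']


def make_data_dirs(root):
    """Create the data directory tree under root (hypothetical helper)."""
    root = Path(root)
    targets = []
    for stage in STAGES:
        for breed in BREEDS:
            targets.append(root / 'img' / stage / breed)
    # Directories that are not split by breed
    targets += [root / 'img' / 'test', root / 'train_test_data', root / 'model']
    for path in targets:
        # parents=True also creates the intermediate directories
        path.mkdir(parents=True, exist_ok=True)
    return targets
```

For example, calling `make_data_dirs('./../data')` from one of the script directories would create the same paths that the listings below write to.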
First, collect the images. This time, I scraped up to 300 images per breed.
# -*- coding: utf-8 -*-
import json
from urllib import parse

import requests
from bs4 import BeautifulSoup


class Google:
    def __init__(self):
        # Assumption: base URL of Google search (not shown in this listing)
        self.GOOGLE_SEARCH_URL = 'https://www.google.com/search'
        self.session = requests.session()
        # Send a browser-like User-Agent with every request
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0'})

    def Search(self, keyword, type='text', maximum=1000):
        '''Google search'''
        print('Google', type.capitalize(), 'Search :', keyword)
        result, total = [], 0
        query = self.query_gen(keyword, type)
        while True:
            html = self.session.get(next(query)).text
            links = self.get_links(html, type)
            # Add search results until none are left or the maximum is reached
            if not len(links):
                print('-> No more links')
                break
            elif len(links) > maximum - total:
                result += links[:maximum - total]
                break
            else:
                result += links
                total += len(links)
        print('-> Got', str(len(result)), 'links')
        return result

    def query_gen(self, keyword, type):
        '''Search query generator'''
        page = 0
        while True:
            if type == 'text':
                params = parse.urlencode({
                    'q': keyword,
                    'num': '100',
                    'filter': '0',
                    'start': str(page * 100)})
            elif type == 'image':
                params = parse.urlencode({
                    'q': keyword,
                    'tbm': 'isch',
                    'filter': '0',
                    'ijn': str(page)})
            yield self.GOOGLE_SEARCH_URL + '?' + params
            page += 1

    def get_links(self, html, type):
        '''Get links'''
        soup = BeautifulSoup(html, 'lxml')
        if type == 'text':
            elements = soup.select('.rc > .r > a')
            links = [e['href'] for e in elements]
        elif type == 'image':
            # Each element holds a JSON blob; 'ou' is the original image URL
            elements = soup.select('.rg_meta.notranslate')
            jsons = [json.loads(e.get_text()) for e in elements]
            links = [js['ou'] for js in jsons]
        return links
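To see what the query generator produces, the image-search URLs can be built on their own. This is a minimal sketch: `image_query_urls` is a hypothetical helper, and the base URL is an assumption for illustration because the listing does not show it.

```python
from urllib import parse

# Assumption: base URL for illustration; the listing does not show it
SEARCH_URL = 'https://www.google.com/search'


def image_query_urls(keyword, pages):
    """Build the image-search URL for each page index (hypothetical helper)."""
    urls = []
    for page in range(pages):
        params = parse.urlencode({
            'q': keyword,
            'tbm': 'isch',      # image search
            'filter': '0',
            'ijn': str(page)})  # page index
        urls.append(SEARCH_URL + '?' + params)
    return urls


for url in image_query_urls('Border collie', 2):
    print(url)
```

Note that `urlencode` escapes the space in the keyword, so the query string stays valid even for multi-word breed names.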
# -*- coding: utf-8 -*-
import os
import requests
import google_search

# Search keywords
KEYWORDS = [u'Sheltie', u'Corgi', u'Border collie']
# Number of images to acquire per keyword
IMG_CNT = 300

g = google_search.Google()
for keyword in KEYWORDS:
    # Create the save destination
    img_dir = os.path.join('./../data/img/original', keyword)
    os.makedirs(img_dir, exist_ok=True)
    print(u'Destination: {}'.format(img_dir))
    # Image search (terms prefixed with "-" are excluded from the results)
    img_urls = g.Search('{} filetype:jpg -"Shiba inu" -felt -Product -toy -"Plush Doll" -ornament -item -mascot -Book -Cover -movies -Cartoon -charm -jpg -Rakuten -"An illustration" -Sticker -"mail order" -LINE -stamp -silhouette -design -Mug -Breeder -Jimoty -Group -Flock -nail -crane -Freebie -Product'.format(keyword), type='image', maximum=IMG_CNT)
    # Save the images
    total_cnt = 0
    for i, img_url in enumerate(img_urls):
        try:
            # Image path
            img_full_path = os.path.join(img_dir, (str(i) + '.jpg'))
            print('{}: {} -> {}'.format(str(i), img_url, img_full_path))
            res = requests.get(img_url, allow_redirects=False)
            if len(res.content) <= 0:
                print(u'{}: Skipped because the response body is empty.'.format(str(i)))
                continue
            with open(img_full_path, 'wb') as f:
                f.write(res.content)
                total_cnt = total_cnt + 1
        except (requests.exceptions.ConnectionError, UnicodeError, IsADirectoryError):
            # Skip images that cannot be downloaded or saved
            # (UnicodeError also covers UnicodeEncodeError)
            continue
    print(u'{}: acquired {} images.'.format(keyword, total_cnt))
print(u'Saving is complete.')
> cd e:\python\ml\src\img_scraper
> python
Sheltie: acquired 270 images.
Corgi: acquired 286 images.
Border collie: acquired 281 images.
Some of the images apparently could not be saved.
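The script saves whatever bytes the server returns, so a response that is not actually an image (for example an HTML error page) would still end up as a .jpg file. One way to catch this, sketched here as an assumption rather than something the scraper above does, is to check for the JPEG signature before writing. `looks_like_jpeg` is a hypothetical helper.

```python
# JPEG files begin with the start-of-image marker 0xFF 0xD8
JPEG_SIGNATURE = b'\xff\xd8'


def looks_like_jpeg(content):
    """Return True if the downloaded bytes start with the JPEG signature."""
    return content[:2] == JPEG_SIGNATURE


# Made-up sample payloads, for illustration only
samples = {
    'jpeg header': b'\xff\xd8\xff\xe0' + b'\x00' * 8,
    'html error page': b'<html><body>Not Found</body></html>',
    'empty response': b'',
}
for name, content in samples.items():
    print(name, '->', looks_like_jpeg(content))
```

A check like this only looks at the first two bytes, so it does not prove the file decodes correctly. It just filters out obviously wrong responses.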
Next, remove the images that cannot be used as training data, for example images that show other dog breeds as well or contain a lot of text. I checked the images by eye and noted the numbers of the unusable ones. A Python script then sorts the files into those used for training data and those to exclude.
# -*- coding: utf-8 -*-
import os, glob
import shutil
#Search keyword
KEYWORDS = [u'Sheltie', u'Corgi', u'Border collie']
targets = [
[19, 25, 34, 40, 41, 49, 54, 74, 77, 81, 86, 89, 91, 93, 102, 104, 108, 111, 118, 119, 124, 127, 130, 131, 132, 136, 152, 154, 158, 159, 161, 168, 169, 173, 178, 181, 183, 184, 192, 193, 198, 201, 202, 213, 218, 233, 235, 236, 238, 240, 243, 245, 246, 248, 249, 252, 255, 264, 265, 269, 271, 272, 280, 283, 284, 287, 294, 295],
[19, 29, 30, 40, 45, 67, 69, 73, 76, 78, 80, 100, 101, 105, 117, 120, 132, 136, 141, 143, 149, 151, 154, 158, 167, 170, 179, 180, 183, 186, 200, 201, 208, 213, 220, 224, 225, 228, 234, 235, 241, 243, 247, 248, 250, 253, 259, 262, 264, 266, 273, 277, 278, 279, 281, 282, 283, 285, 290, 293, 294, 298, 299],
[9, 21, 25, 37, 38, 40, 41, 42, 49, 52, 55, 61, 65, 66, 71, 72, 78, 80, 89, 93, 96, 103, 108, 110, 113, 114, 118, 122, 126, 127, 128, 145, 146, 152, 158, 160, 161, 164, 166, 167, 171, 174, 175, 176, 182, 183, 186, 187, 193, 194, 196, 197, 200, 202, 203, 206, 207, 222, 223, 224, 226, 228, 230, 232, 233, 234, 237, 238, 241, 243, 244, 257, 259, 260, 262, 263, 264, 265, 267, 268, 270, 273, 275, 276, 277, 278, 281, 282, 283, 287, 289, 292, 293, 295]
]
for idx, keyword in enumerate(KEYWORDS):
    total_count = 0
    # Copy from
    img_dir = os.path.join('./../data/img/original', keyword)
    # Copy to
    os.makedirs(os.path.join('./../data/img/trimmed', keyword), exist_ok=True)
    os.makedirs(os.path.join('./../data/img/excluded', keyword), exist_ok=True)
    files = glob.glob(os.path.join(img_dir, '*.jpg'))
    for f in files:
        # The file name (without extension) is the image number from scraping
        if int(os.path.splitext(os.path.basename(f))[0]) in targets[idx]:
            shutil.copy2(f, os.path.join('./../data/img/excluded', keyword))
            print(u'{}: excluded.'.format(f))
            continue
        shutil.copy2(f, os.path.join('./../data/img/trimmed', keyword))
        print(u'{}: used for training data.'.format(f))
        total_count = total_count + 1
    print(u'{}: {} images can be used.'.format(keyword, total_count))
print(u'Saving is complete.')
> cd e:\python\ml\src\img_trimmer
> python
Sheltie: 202 images can be used.
Corgi: 223 images can be used.
Border collie: 187 images can be used.
Only about 62 to 74% of the 300 images requested per breed can be used as training data.
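The sorting rule itself is simple: the numeric file name decides which side an image goes to. Below is a minimal sketch of that rule without any file copying. `split_by_index` is a hypothetical helper and the file names are made up for illustration.

```python
import os


def split_by_index(paths, excluded_numbers):
    """Split image paths into (usable, excluded) by their numeric file name."""
    excluded_numbers = set(excluded_numbers)  # set lookup is faster than a list
    usable, excluded = [], []
    for path in paths:
        # '12.jpg' -> 12
        number = int(os.path.splitext(os.path.basename(path))[0])
        if number in excluded_numbers:
            excluded.append(path)
        else:
            usable.append(path)
    return usable, excluded


paths = ['{}.jpg'.format(i) for i in range(6)]
usable, excluded = split_by_index(paths, [1, 4])
print(usable)    # ['0.jpg', '2.jpg', '3.jpg', '5.jpg']
print(excluded)  # ['1.jpg', '4.jpg']
```

Keeping the decision separate from the copying makes it easy to check the exclusion lists before touching any files.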
Increase the amount of data by generating processed copies of the images with Keras' ImageDataGenerator.
# -*- coding: utf-8 -*-
import os
import glob
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img

CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
# Image size
IMG_SIZE = 150
# Define the ImageDataGenerator
# horizontal_flip is a boolean flag: flip images horizontally at random
DATA_GENERATOR = ImageDataGenerator(horizontal_flip=True, zoom_range=0.1)

for idx, category in enumerate(CATEGORIES):
    # Copy from
    img_dir = os.path.join('./../data/img/trimmed', category)
    # Copy to
    out_dir = os.path.join('./../data/img/extended', category)
    os.makedirs(out_dir, exist_ok=True)
    files = glob.glob(os.path.join(img_dir, '*.jpg'))
    for file in files:
        img = load_img(file)
        img = img.resize((IMG_SIZE, IMG_SIZE))
        x = img_to_array(img)
        # Add a batch dimension: (height, width, channels) -> (1, height, width, channels)
        x = np.expand_dims(x, axis=0)
        g = DATA_GENERATOR.flow(x, batch_size=1, save_to_dir=out_dir, save_prefix='img', save_format='jpg')
        # Each call to next() writes one augmented image to out_dir
        for _ in range(10):
            batch = next(g)
    print(u'{}: the number of files is {}.'.format(category, len(os.listdir(out_dir))))
> cd e:\python\ml\src\img_duplicator
> python
Using TensorFlow backend.
Sheltie: the number of files is 1817.
Corgi: the number of files is 1983.
Border collie: the number of files is 1708.
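These file counts are lower than 10 times the number of usable images. For Sheltie, 202 × 10 = 2020 images were generated, but only 1817 files exist. One possible explanation is that the saved file names end in a random number drawn from a limited range, so some files overwrite each other. This is an assumption and has not been verified against the Keras version used here. The sketch below estimates how many distinct names would survive under that assumption. The range of 10,000 names is a guess and `expected_distinct` is a hypothetical helper.

```python
def expected_distinct(draws, name_space):
    """Expected number of distinct names after `draws` uniform random picks."""
    return name_space * (1 - (1 - 1 / name_space) ** draws)


# Assumption: 10,000 possible random suffixes per prefix (not confirmed for
# the Keras version used in this article)
NAME_SPACE = 10_000
# Usable images per breed (from the trimming step) and files actually saved
counts = {'Sheltie': (202, 1817), 'Corgi': (223, 1983), 'Border collie': (187, 1708)}

for breed, (usable, saved) in counts.items():
    generated = usable * 10  # 10 augmented images per source image
    estimate = expected_distinct(generated, NAME_SPACE)
    print('{}: generated {}, saved {}, estimate {:.0f}'.format(
        breed, generated, saved, estimate))
```

If the estimates land close to the saved counts, name collisions are a plausible cause. In that case a different `save_prefix` per source image would be one way to avoid the overwrites.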
Now that a large number of images is ready, create the training data. Each image is labeled with its category, about 20% of all the data is set aside as test data, and the resulting training / test data is saved.
# -*- coding: utf-8 -*-
from PIL import Image
import os, glob
import numpy as np
import random, math
from sklearn.model_selection import train_test_split
from keras.utils import np_utils
# Path of the root directory where the images are stored
IMG_ROOT_DIR = './../data/img/extended'
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
# Number of classes
DENSE_SIZE = len(CATEGORIES)
# Image size
IMG_SIZE = 150
# Image data
X = []
# Category data
Y = []
# Training data
X_TRAIN = []
Y_TRAIN = []
# Test data
X_TEST = []
Y_TEST = []
# Data storage destination
TRAIN_TEST_DATA = './../data/train_test_data/data.npy'

# Process by category
for idx, category in enumerate(CATEGORIES):
    # Image directory for each label
    image_dir = os.path.join(IMG_ROOT_DIR, category)
    files = glob.glob(os.path.join(image_dir, '*.jpg'))
    for f in files:
        # Resize each image and convert it to array data
        img = Image.open(f)
        img = img.convert('RGB')
        img = img.resize((IMG_SIZE, IMG_SIZE))
        data = np.asarray(img)
        X.append(data)
        # The label is the index of the category
        Y.append(idx)

X = np.array(X)
Y = np.array(Y)
# Scale pixel values to the range 0-1
X = X.astype('float32') / 255
# One-hot encode the labels
Y = np_utils.to_categorical(Y, DENSE_SIZE)
# Split into training data and test data
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, test_size=0.20)
# Save the training / test data
np.save(TRAIN_TEST_DATA, (X_TRAIN, X_TEST, Y_TRAIN, Y_TEST))
print(u'Training / test data creation is complete: {}'.format(TRAIN_TEST_DATA))
> cd e:\python\ml\src\train_test_data_generator
> python
Using TensorFlow backend.
Training / test data creation is complete: ./../data/train_test_data/data.npy
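The two key steps in this script are one-hot labeling and the 80/20 split. The sketch below shows both in plain Python, without numpy or scikit-learn. It is an illustration only: `one_hot` and `split_train_test` are hypothetical helpers, file names stand in for the image arrays, and the rounding of the test size is simplified compared with `train_test_split`.

```python
import random


def one_hot(label, num_classes):
    """Return a list with 1.0 at the label's index and 0.0 elsewhere."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]


def split_train_test(samples, test_size, seed=0):
    """Shuffle samples and hold out a test_size fraction for testing."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_size)
    return shuffled[n_test:], shuffled[:n_test]


CATEGORIES = ['Sheltie', 'Corgi', 'Border collie']
# (file name, label index) pairs standing in for the real image arrays
samples = [('img_{}.jpg'.format(i), i % 3) for i in range(10)]
labeled = [(name, one_hot(label, len(CATEGORIES))) for name, label in samples]
train, test = split_train_test(labeled, test_size=0.20)
print(len(train), len(test))  # 8 2
print(one_hot(1, 3))          # [0.0, 1.0, 0.0]
```

Shuffling before the split matters here because the images are collected category by category. Without it, the test set would contain only one breed.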
Now that the training / test data is ready, it's time to build the model. The trained model is saved at the end.
# -*- coding: utf-8 -*-
#Model building
from keras import layers, models
from keras import optimizers
import numpy as np
import matplotlib.pyplot as plt
import os
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#Image size
IMG_SIZE = 150
# Training data
X_TRAIN = []
Y_TRAIN = []
# Test data
X_TEST = []
Y_TEST = []
# Data storage destination
TRAIN_TEST_DATA = './../data/train_test_data/data.npy'
# Model save destination
MODEL_ROOT_DIR = './../data/model/'

# ----- Model building ----- #
model = models.Sequential()
# Confirm the model configuration
model.summary()
# ----- /Model building ----- #

# ----- Model compilation ----- #
# ----- /Model compilation ----- #

# ----- Model training ----- #
# Load the training data and test data (stored as a pickled tuple of arrays)
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = np.load(TRAIN_TEST_DATA, allow_pickle=True)
# fit() returns a History object; epochs=10 matches the training log
history = model.fit(X_TRAIN, Y_TRAIN,
                    epochs=10,
                    validation_data=(X_TEST, Y_TEST))
# ----- /Model training ----- #

# ----- Training result plot ----- #
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs = range(len(acc))
plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.savefig(os.path.join(MODEL_ROOT_DIR, 'Training_and_validation_accuracy.png'))
# Start a new figure so the loss curves are not drawn over the accuracy curves
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.savefig(os.path.join(MODEL_ROOT_DIR, 'Training_and_validation_loss.png'))
# ----- /Training result plot ----- #

# ----- Save model ----- #
# Save the model architecture as JSON
json_string = model.to_json()
with open(os.path.join(MODEL_ROOT_DIR, 'model_predict.json'), 'w') as f:
    f.write(json_string)
# Save the weights
model.save_weights(os.path.join(MODEL_ROOT_DIR, 'model_predict.hdf5'))
# ----- /Save model ----- #
> cd e:\python\ml\src\model_generator
> python
Using TensorFlow backend.
2019-11-15 17:02:03.400229: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 148, 148, 32)      896
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0
conv2d_2 (Conv2D)            (None, 72, 72, 64)        18496
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0
conv2d_3 (Conv2D)            (None, 34, 34, 128)       73856
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0
conv2d_4 (Conv2D)            (None, 15, 15, 128)       147584
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128)         0
flatten_1 (Flatten)          (None, 6272)              0
dropout_1 (Dropout)          (None, 6272)              0
dense_1 (Dense)              (None, 512)               3211776
dense_2 (Dense)              (None, 3)                 1539
=================================================================
Total params: 3,454,147
Trainable params: 3,454,147
Non-trainable params: 0
Train on 4396 samples, validate on 1100 samples
Epoch 1/10
4396/4396 [==============================] - 103s 23ms/step - loss: 0.5434 - acc: 0.7298 - val_loss: 0.5780 - val_acc: 0.7067
Epoch 2/10
4396/4396 [==============================] - 109s 25ms/step - loss: 0.4457 - acc: 0.7989 - val_loss: 0.4288 - val_acc: 0.8024
Epoch 3/10
4396/4396 [==============================] - 103s 23ms/step - loss: 0.3874 - acc: 0.8318 - val_loss: 0.3992 - val_acc: 0.8170
Epoch 4/10
4396/4396 [==============================] - 106s 24ms/step - loss: 0.3483 - acc: 0.8469 - val_loss: 0.3476 - val_acc: 0.8476
Epoch 5/10
4396/4396 [==============================] - 107s 24ms/step - loss: 0.3029 - acc: 0.8717 - val_loss: 0.3085 - val_acc: 0.8603
Epoch 6/10
4396/4396 [==============================] - 109s 25ms/step - loss: 0.2580 - acc: 0.8947 - val_loss: 0.2918 - val_acc: 0.8736
Epoch 7/10
4396/4396 [==============================] - 107s 24ms/step - loss: 0.2182 - acc: 0.9084 - val_loss: 0.2481 - val_acc: 0.8970
Epoch 8/10
4396/4396 [==============================] - 113s 26ms/step - loss: 0.1855 - acc: 0.9217 - val_loss: 0.1920 - val_acc: 0.9209
Epoch 9/10
4396/4396 [==============================] - 120s 27ms/step - loss: 0.1548 - acc: 0.9394 - val_loss: 0.1775 - val_acc: 0.9345
Epoch 10/10
4396/4396 [==============================] - 114s 26ms/step - loss: 0.1243 - acc: 0.9530 - val_loss: 0.1738 - val_acc: 0.9412
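The Param # column of the model summary can be checked by hand, which also helps in understanding what each layer holds. This is a minimal sketch: the layer sizes are read off the summary, the 3x3 kernel is inferred from the output shapes (150 -> 148) and is an assumption, and `conv2d_params` and `dense_params` are hypothetical helpers.

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights (kernel x kernel x in_channels per filter) plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


def dense_params(inputs, units):
    """One weight per input per unit, plus one bias per unit."""
    return (inputs + 1) * units


# Layer sizes read off the model summary; the 3x3 kernel is an assumption
# inferred from the output shapes
layer_params = [
    conv2d_params(3, 3, 32),         # conv2d_1 (RGB input, 3 channels)
    conv2d_params(3, 32, 64),        # conv2d_2
    conv2d_params(3, 64, 128),       # conv2d_3
    conv2d_params(3, 128, 128),      # conv2d_4
    dense_params(7 * 7 * 128, 512),  # dense_1 (input is the flattened 7x7x128)
    dense_params(512, 3),            # dense_2 (one output per breed)
]
print(layer_params)       # [896, 18496, 73856, 147584, 3211776, 1539]
print(sum(layer_params))  # 3454147
```

Pooling, flatten, and dropout layers have no trainable weights, which is why they show 0 in the summary. Almost all of the parameters sit in the first dense layer.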
Finally, it's time to write the Sheltie judgment program. Prepare some images to check it with.
# -*- coding: utf-8 -*-
from keras.models import model_from_json
from keras.preprocessing import image
import numpy as np
import sys
import os

# Model save destination
MODEL_ROOT_DIR = './../data/model/'
MODEL_PATH = os.path.join(MODEL_ROOT_DIR, 'model_predict.json')
WEIGHT_PATH = os.path.join(MODEL_ROOT_DIR, 'model_predict.hdf5')
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
# Image size
IMG_SIZE = 150

# Load the model architecture and its weights
with open(MODEL_PATH) as f:
    model = model_from_json(f.read())
model.load_weights(WEIGHT_PATH)

# Read the image given as the first command-line argument
args = sys.argv
img = image.load_img(args[1], target_size=(IMG_SIZE, IMG_SIZE))
x = image.img_to_array(img)
# Add a batch dimension because predict() expects a batch of images
x = np.expand_dims(x, axis=0)

# Predict with the model
features = model.predict(x)
print(features)
if features[0, 0] == 1:
    print(u'Sheltie.')
else:
    for i in range(0, len(CATEGORIES)):
        if features[0, i] == 1:
            print(u"It doesn't seem to be a sheltie. It is a {}.".format(CATEGORIES[i]))
(ml)> cd e:\python\ml\src\ai
(ml)> python .\ '..\data\img\test\sheltie_00.jpg'
Using TensorFlow backend.
2019-11-15 17:58:44.863437: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[1. 0. 0.]]
(ml)> python .\ '..\data\img\test\corgi_00.jpg'
Using TensorFlow backend.
2019-11-15 17:58:55.519838: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 1. 0.]]
It doesn't seem to be a sheltie. It is a Corgi.
(ml)> python .\ '..\data\img\test\bordercollie_00.jpg'
Using TensorFlow backend.
2019-11-15 17:59:06.457517: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 0. 1.]]
It doesn't seem to be a sheltie. It is a Border collie.
sheltie_00.jpg: Sheltie.
corgi_00.jpg: It doesn't seem to be a sheltie. It is a Corgi.
bordercollie_00.jpg: It doesn't seem to be a sheltie. It is a Border collie.
The program judged all three images correctly.
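The script compares each output with exactly 1. That works for the saturated outputs seen in these runs, but it would print nothing for a result such as [[0.2 0.7 0.1]]. A possible alternative is to pick the class with the highest score. This is a sketch under assumptions and not part of the program above: `judge` is a hypothetical helper and it assumes index 0 is Sheltie, as in CATEGORIES.

```python
def judge(probabilities, categories):
    """Pick the category with the highest score and build the message."""
    best = max(range(len(categories)), key=lambda i: probabilities[i])
    if best == 0:
        # Index 0 is assumed to be Sheltie, as in CATEGORIES
        return 'Sheltie.'
    return "It doesn't seem to be a sheltie. It is a {}.".format(categories[best])


CATEGORIES = ['Sheltie', 'Corgi', 'Border collie']
print(judge([1.0, 0.0, 0.0], CATEGORIES))
print(judge([0.2, 0.7, 0.1], CATEGORIES))
```

With this approach the program always gives an answer, so it may be worth also printing the score itself to show how confident the model is.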
How will our dog Fran, who also appears in my icon, be judged?
(ml)> python .\ '..\data\img\test\fran.jpg'
Using TensorFlow backend.
2019-11-15 17:59:28.118592: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 0. 1.]]
It doesn't seem to be a sheltie. It is a Border collie.
fran.jpg: It doesn't seem to be a sheltie. It is a Border collie.
Too bad... Although our Fran is a Sheltie, the program judged "It doesn't seem to be a sheltie. It is a Border collie." The likely reason is that there were few black Shelties in the training data. This made me keenly aware of how important the training data is.
I tried again with a different image.
(ml)> python .\ '..\data\img\test\fran_01.jpg'
Using TensorFlow backend.
2019-11-18 17:21:07.929836: I tensorflow/core/platform/] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[1. 0. 0.]]
This time our Fran was judged to be a Sheltie.
- Add black Shelties to the training data and try again
- Incorporate the AI into a web app
- Understand CNNs