Acknowledgments

First of all, I used the following articles as references. Thank you very much.

https://qiita.com/yottyann1221/items/a08300b572206075ee9f
https://qiita.com/tomo_20180402/items/e8c55bdca648f4877188
https://qiita.com/mainvoidllll/items/db991dc30d3ddced6250
https://newtechnologylifestyle.net/keras_imagedatagenerator/

The images posted in this article are free images from Pixabay. https://pixabay.com/ja/

Introduction

I have just started learning machine learning. After a quick look around, the workflow seems to go like this.

Prepare a large number of images
Create teacher / test data
Create a model
Evaluate with model

This time, I decided to build a program that uses image recognition to judge whether a dog is a sheltie.

Environment

I installed Python 3 with Chocolatey on Windows 10, then built a virtual environment with venv and installed the required libraries.

> mkdir -p e:\python\ml
> cd e:\python\ml
> python -m venv ml
> .\ml\Scripts\activate
(ml)> pip install requests
(ml)> pip install beautifulsoup4
# Without lxml, the following error occurs
# bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: lxml. Do you need to install a parser library?
(ml)> pip install lxml
(ml)> pip install pillow
# With the latest numpy, np.load() raises the following error, so pin an older version
# ValueError: Object arrays cannot be loaded when allow_pickle=False
(ml)> pip install numpy==1.16.2
(ml)> pip install scikit-learn
(ml)> pip install tensorflow
(ml)> pip install keras
(ml)> pip install matplotlib

Directory structure

I laid out the project like this.

e:\python\ml
├─ml
└─src
   ├─data
   │ ├─img
   │ │ ├─original              :Original images obtained by scraping
   │ │ │ ├─Sheltie
   │ │ │ ├─Corgi
   │ │ │ └─Border Collie
   │ │ ├─trimmed               :Images left after removing unusable ones
   │ │ │ ├─Sheltie
   │ │ │ ├─Corgi
   │ │ │ └─Border Collie
   │ │ ├─excluded              :Unusable images
   │ │ │ ├─Sheltie
   │ │ │ ├─Corgi
   │ │ │ └─Border Collie
   │ │ ├─extended              :Images augmented from the remaining images
   │ │ │ ├─Sheltie
   │ │ │ ├─Corgi
   │ │ │ └─Border Collie
   │ │ └─test                  :Images for checking the AI's behavior
   │ ├─train_test_data
   │ │ └─data.npy              :Teacher / test data created from the extended images
   │ └─model                   :Created model
   │   ├─Training_and_validation_accuracy.png
   │   ├─Training_and_validation_loss.png
   │   ├─model_predict.hdf5
   │   └─model_predict.json
   ├─img_scraper               :Image scraping
   │ ├─google_search.py
   │ └─main.py
   ├─img_trimmer               :Image removal
   │ └─main.py
   ├─img_duplicator            :Image padding
   │ └─main.py
   ├─train_test_data_generator :Teacher / test data creation
   │ └─main.py
   ├─model_generator           :Model generation
   │ └─main.py
   └─ai                        :Sheltie judgment
     └─main.py

Image scraping

First, prepare the images. This time, I scraped 300 images per breed.

img_scraper\google_search.py

# -*- coding: utf-8 -*-

import json
from urllib import parse
import requests
from bs4 import BeautifulSoup

class Google:
    def __init__(self):
        self.GOOGLE_SEARCH_URL = 'https://www.google.co.jp/search'
        self.session = requests.session()
        self.session.headers.update({
            'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0'})

    def Search(self, keyword, type='text', maximum=1000):
        '''Google search'''
        print('Google', type.capitalize(), 'Search :', keyword)
        result, total = [], 0
        query = self.query_gen(keyword, type)
        while True:
            #Search
            html = self.session.get(next(query)).text
            links = self.get_links(html, type)

            #Add search results
            if not len(links):
                print('-> No more links')
                break
            elif len(links) > maximum - total:
                result += links[:maximum - total]
                break
            else:
                result += links
                total += len(links)

        print('-> Got', len(result), 'links')
        return result

    def query_gen(self, keyword, type):
        '''Search query generator'''
        page = 0
        while True:
            if type == 'text':
                params = parse.urlencode({
                    'q': keyword,
                    'num': '100',
                    'filter': '0',
                    'start': str(page * 100)})
            elif type == 'image':
                params = parse.urlencode({
                    'q': keyword,
                    'tbm': 'isch',
                    'filter': '0',
                    'ijn': str(page)})

            yield self.GOOGLE_SEARCH_URL + '?' + params
            page += 1

    def get_links(self, html, type):
        '''Get link'''
        soup = BeautifulSoup(html, 'lxml')
        if type == 'text':
            elements = soup.select('.rc > .r > a')
            links = [e['href'] for e in elements]
        elif type == 'image':
            elements = soup.select('.rg_meta.notranslate')
            jsons = [json.loads(e.get_text()) for e in elements]
            links = [js['ou'] for js in jsons]
        return links

img_scraper\main.py

# -*- coding: utf-8 -*-

import os
import requests
import google_search

#Search keyword
KEYWORDS = [u'Sheltie', u'Corgi', u'Border collie']
#Number of images to acquire
IMG_CNT = 300

g = google_search.Google()
for keyword in KEYWORDS:
    #Create save destination
    img_dir = os.path.join('./../data/img/original', keyword)
    os.makedirs(img_dir, exist_ok=True)
    print(u'Destination: {}'.format(img_dir))
    #Image search
    img_urls = g.Search('{} filetype:jpg -"Shiba inu" -felt -Product -toy -"Plush Doll" -ornament -item -mascot -Book -Cover -movies -Cartoon -charm -jpg -Rakuten -"An illustration" -Sticker -"mail order" -LINE -stamp -silhouette -design -Mug -Breeder -Jimoty -Group -Flock -nail -crane -Freebie -Product'.format(keyword), type='image', maximum=IMG_CNT)
    #Save image
    total_cnt = 0
    for i,img_url in enumerate(img_urls):
        try:
            #Image path
            img_full_path = os.path.join(img_dir, (str(i) + '.jpg'))
            print('{}: {} -> {}'.format(str(i), img_url, img_full_path))
            re = requests.get(img_url, allow_redirects=False)
            if len(re.content) <= 0:
                print(u'{}:Skip because the content of the response is empty.'.format(str(i)))
                continue
            with open(img_full_path, 'wb') as f:
                f.write(re.content)
                total_cnt = total_cnt + 1
        except requests.exceptions.ConnectionError:
            continue
        except UnicodeEncodeError:
            continue
        except UnicodeError:
            continue
        except IsADirectoryError:
            continue
    print(u'{}:We have acquired {} images.'.format(keyword, total_cnt))

print(u'Saving is complete.')

> cd e:\python\ml\src\img_scraper
> python main.py
Sheltie:We have acquired 270 images.
Corgi:We have acquired 286 images.
Border collie:We have acquired 281 images.

It seems that some of the images could not be saved.
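
The counts above only tell us how many responses were written to disk. The script skips empty bodies but checks nothing else, so a saved file is not guaranteed to be a JPEG (an HTML error page with a non-empty body would be saved too). Below is a minimal sketch of a check that could run before saving, assuming it is enough to look at the JPEG start-of-image marker. `looks_like_jpeg` is a hypothetical helper, not part of the original script.

```python
# JPEG data starts with the SOI marker (FF D8), followed by the first byte (FF) of the next marker.
JPEG_SOI = b"\xff\xd8\xff"


def looks_like_jpeg(content: bytes) -> bool:
    """Return True if the bytes start with the JPEG start-of-image marker."""
    return content.startswith(JPEG_SOI)


if __name__ == "__main__":
    # A fake JPEG header and an HTML body, both made up for illustration
    print(looks_like_jpeg(b"\xff\xd8\xff\xe0" + b"\x00" * 16))
    print(looks_like_jpeg(b"<html>not found</html>"))
```

In img_scraper\main.py, such a check could sit next to the existing empty-content check, skipping the URL when it fails.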

Image screening

Remove images that cannot be used as teacher data, for example images in which other dog breeds also appear or that contain a lot of text. I noted down the file numbers of those images; a Python script then sorts the files into those used for teacher data and those excluded.

img_trimmer\main.py

# -*- coding: utf-8 -*-

import os, glob
import shutil

#Search keyword
KEYWORDS = [u'Sheltie', u'Corgi', u'Border collie']

targets = [
    [19, 25, 34, 40, 41, 49, 54, 74, 77, 81, 86, 89, 91, 93, 102, 104, 108, 111, 118, 119, 124, 127, 130, 131, 132, 136, 152, 154, 158, 159, 161, 168, 169, 173, 178, 181, 183, 184, 192, 193, 198, 201, 202, 213, 218, 233, 235, 236, 238, 240, 243, 245, 246, 248, 249, 252, 255, 264, 265, 269, 271, 272, 280, 283, 284, 287, 294, 295],
    [19, 29, 30, 40, 45, 67, 69, 73, 76, 78, 80, 100, 101, 105, 117, 120, 132, 136, 141, 143, 149, 151, 154, 158, 167, 170, 179, 180, 183, 186, 200, 201, 208, 213, 220, 224, 225, 228, 234, 235, 241, 243, 247, 248, 250, 253, 259, 262, 264, 266, 273, 277, 278, 279, 281, 282, 283, 285, 290, 293, 294, 298, 299],
    [9, 21, 25, 37, 38, 40, 41, 42, 49, 52, 55, 61, 65, 66, 71, 72, 78, 80, 89, 93, 96, 103, 108, 110, 113, 114, 118, 122, 126, 127, 128, 145, 146, 152, 158, 160, 161, 164, 166, 167, 171, 174, 175, 176, 182, 183, 186, 187, 193, 194, 196, 197, 200, 202, 203, 206, 207, 222, 223, 224, 226, 228, 230, 232, 233, 234, 237, 238, 241, 243, 244, 257, 259, 260, 262, 263, 264, 265, 267, 268, 270, 273, 275, 276, 277, 278, 281, 282, 283, 287, 289, 292, 293, 295]
    ]

for idx, keyword in enumerate(KEYWORDS):
    total_count = 0
    #Original
    img_dir = os.path.join('./../data/img/original', keyword)
    #Copy to
    os.makedirs(os.path.join('./../data/img/trimmed', keyword), exist_ok=True)
    os.makedirs(os.path.join('./../data/img/excluded', keyword), exist_ok=True)
    files = glob.glob(os.path.join(img_dir, '*.jpg'))
    for f in files:
        if int(os.path.splitext(os.path.basename(f))[0]) in targets[idx]:
            shutil.copy2(f, os.path.join('./../data/img/excluded', keyword))
            print(u'{} :It is an exclusion target.'.format(f))
        else:
            shutil.copy2(f, os.path.join('./../data/img/trimmed', keyword))
            print(u'{} :Used for teacher data.'.format(f))
            total_count = total_count + 1
    print(u'{}:The number of data that can be used is {}.'.format(keyword, total_count))


print(u'Saving is complete.')

> cd e:\python\ml\src\img_trimmer
> python main.py
Sheltie:The number of data that can be used is 202.
Corgi:The number of data that can be used is 223.
Border collie:The number of data that can be used is 187.

Only roughly 60 to 75% of the 300 images requested per breed ended up usable as teacher data.
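
The ratio depends on what it is measured against. The short calculation below uses only the counts printed in this article and compares the usable images with both the 300 requested and the number actually saved. `usable_ratios` is a name made up for this sketch.

```python
REQUESTED = 300  # IMG_CNT in img_scraper\main.py

# breed: (images saved by the scraper, images left after screening)
COUNTS = {
    "Sheltie": (270, 202),
    "Corgi": (286, 223),
    "Border collie": (281, 187),
}


def usable_ratios(saved, usable, requested=REQUESTED):
    """Return (share of requested images, share of saved images) that are usable."""
    return usable / requested, usable / saved


for breed, (saved, usable) in COUNTS.items():
    of_requested, of_saved = usable_ratios(saved, usable)
    print("{}: {:.1%} of requested, {:.1%} of saved".format(breed, of_requested, of_saved))
```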

Image replication

Increase the amount of data by duplicating and transforming the images with Keras's ImageDataGenerator.

img_duplicator\main.py

# -*- coding: utf-8 -*-

import os
import glob
import numpy as np
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img


CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#Image size
IMG_SIZE = 150

#Define ImageDataGenerator (horizontal_flip is a boolean switch: randomly flip images horizontally)
DATA_GENERATOR = ImageDataGenerator(horizontal_flip=True, zoom_range=0.1)

for idx, category in enumerate(CATEGORIES):
    #Original
    img_dir = os.path.join('./../data/img/trimmed', category)
    #Copy to
    out_dir = os.path.join('./../data/img/extended', category)
    os.makedirs(out_dir, exist_ok=True)

    files = glob.glob(os.path.join(img_dir, '*.jpg'))
    for file in files:
        img = load_img(file)
        img = img.resize((IMG_SIZE, IMG_SIZE))
        x = img_to_array(img)
        x = np.expand_dims(x, axis=0)
        g = DATA_GENERATOR.flow(x, batch_size=1, save_to_dir=out_dir, save_prefix='img', save_format='jpg')
        #Generate 10 processed images per source image (each one is saved to out_dir)
        for _ in range(10):
            next(g)
    print(u'{}:The number of files is {}.'.format(category, len(os.listdir(out_dir))))

> cd e:\python\ml\src\img_duplicator
> python main.py
Using TensorFlow backend.
Sheltie:The number of files is 1817.
Corgi:The number of files is 1983.
Border collie:The number of files is 1708.
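
Ten images are generated for each source image, so 202, 223, and 187 source images would give 2,020, 2,230, and 1,870 files, which is more than the counts printed above. One possible explanation is that the saved file names end in a random number drawn from a small range, so some files overwrite earlier ones. This is an assumption, not something confirmed from the source of the Keras version used here. The sketch estimates how many distinct names survive if every file gets one of 10,000 equally likely suffixes; the value 10,000 is a guess chosen for illustration.

```python
def expected_unique(draws, suffixes=10_000):
    """Expected number of distinct suffixes after `draws` uniform random picks.

    `suffixes` is an assumed size of the random range, not a confirmed Keras value.
    """
    return suffixes * (1 - (1 - 1 / suffixes) ** draws)


for breed, sources in (("Sheltie", 202), ("Corgi", 223), ("Border collie", 187)):
    draws = sources * 10  # 10 generated images per source image
    print("{}: {} files requested, about {:.0f} distinct names expected".format(
        breed, draws, expected_unique(draws)))
```

Under that assumption the estimates land close to the printed 1817, 1983, and 1708. If names really do collide, giving each source image its own save_prefix would be one way to avoid overwriting.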

Teacher data creation

Now that a large number of images is ready, I create the teacher data. Each image is labeled with its category, and about 20% of all the data is set aside as test data. The resulting teacher / test data is then saved.
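
To make the two steps concrete, here is a small plain-Python sketch of one-hot labeling and an 80/20 split. `one_hot` and `split_train_test` are illustrative helpers, not the library calls used in the script below. Rounding the test size up is how I understand train_test_split to behave, so treat that as an assumption.

```python
import math
import random

CATEGORIES = ["Sheltie", "Corgi", "Border collie"]


def one_hot(index, num_classes):
    """Encode a category index as a list with a single 1.0."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def split_train_test(samples, test_ratio=0.2, seed=0):
    """Shuffle the samples and split them into (train, test)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_test = math.ceil(len(shuffled) * test_ratio)  # assumed rounding: test size rounded up
    return shuffled[n_test:], shuffled[:n_test]


# Made-up file names, labeled in rotation just to have something to split
labeled = [("img_{}.jpg".format(i), one_hot(i % 3, len(CATEGORIES))) for i in range(10)]
train, test = split_train_test(labeled)
print(len(train), len(test))
```

With 5,496 samples this rounding gives 4,396 training and 1,100 test samples, which matches the sample counts in the training log later in this article.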

train_test_data_generator\main.py

# -*- coding: utf-8 -*-

from PIL import Image
import os, glob
import numpy as np
import random, math
from sklearn.model_selection import train_test_split
from keras.utils import np_utils

#The path of the root directory where the image is stored
IMG_ROOT_DIR = './../data/img/extended'
#category
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#Number of units in the output layer (one per category)
DENSE_SIZE = len(CATEGORIES)
#Image size
IMG_SIZE = 150
#image data
X = []
#Category data
Y = []
#Teacher data
X_TRAIN = []
Y_TRAIN = []
#test data
X_TEST = []
Y_TEST = []
#Data storage destination
TRAIN_TEST_DATA = './../data/train_test_data/data.npy'


#Process by category
for idx, category in enumerate(CATEGORIES):
    #Image directory for each label
    image_dir = os.path.join(IMG_ROOT_DIR, category)
    files = glob.glob(os.path.join(image_dir, '*.jpg'))
    for f in files:
        #Resize each image and convert it to data
        img = Image.open(f)
        img = img.convert('RGB')
        img = img.resize((IMG_SIZE, IMG_SIZE))
        data = np.asarray(img)
        X.append(data)
        Y.append(idx)

X = np.array(X)
Y = np.array(Y)

#Normalization
X = X.astype('float32') /255
Y = np_utils.to_categorical(Y, DENSE_SIZE)

#Separate teacher data and test data
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = train_test_split(X, Y, test_size=0.20)

#Save teacher / test data
np.save(TRAIN_TEST_DATA, (X_TRAIN, X_TEST, Y_TRAIN, Y_TEST))
print(u'Teacher / test data creation is complete.: {}'.format(TRAIN_TEST_DATA))

> cd e:\python\ml\src\train_test_data_generator
> python main.py
Using TensorFlow backend.
Teacher / test data creation is complete.: ./../data/train_test_data/data.npy
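
As far as I can tell, passing a tuple of differently shaped arrays to np.save stores them as a pickled object array, which is what leads to the allow_pickle error mentioned in the Environment section. A hedged alternative is np.savez, which stores each array under its own name in a single .npz file that np.load can read back without pickling. The tiny arrays, the key names, and the temporary path below are placeholders for illustration.

```python
import os
import tempfile

import numpy as np

# Tiny placeholder arrays standing in for the real image data and labels
x_train = np.zeros((4, 150, 150, 3), dtype="float32")
x_test = np.zeros((1, 150, 150, 3), dtype="float32")
y_train = np.eye(3, dtype="float32")[[0, 1, 2, 0]]  # one-hot rows for 4 samples
y_test = np.eye(3, dtype="float32")[[1]]            # one-hot row for 1 sample

# Save all four arrays by name into one .npz archive
path = os.path.join(tempfile.mkdtemp(), "data.npz")
np.savez(path, x_train=x_train, x_test=x_test, y_train=y_train, y_test=y_test)

# Load them back by name; no pickling is involved for plain numeric arrays
with np.load(path) as data:
    loaded_x_train = data["x_train"]
    loaded_y_test = data["y_test"]

print(loaded_x_train.shape, loaded_y_test.shape)
```

The scripts in this article keep the np.save format, so switching would also mean changing the np.load line in model_generator\main.py.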

Model building

Now that the teacher / test data is ready, it's time to build the model. The built model is then saved.

model_generator\main.py

# -*- coding: utf-8 -*-

#Model building

from keras import layers, models
from keras import optimizers
import numpy as np
import matplotlib.pyplot as plt
import os

#category
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#Number of units in the output layer (one per category)
DENSE_SIZE = len(CATEGORIES)
#Image size
IMG_SIZE = 150
INPUT_SHAPE = (IMG_SIZE, IMG_SIZE,3)
#Teacher data
X_TRAIN = []
Y_TRAIN = []
#test data
X_TEST = []
Y_TEST = []
#Data storage destination
TRAIN_TEST_DATA = './../data/train_test_data/data.npy'
#Model save destination
MODEL_ROOT_DIR = './../data/model/'


# -----Model building----- #
model = models.Sequential()
model.add(layers.Conv2D(32,(3,3),activation="relu",input_shape=INPUT_SHAPE))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(64,(3,3),activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128,(3,3),activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Conv2D(128,(3,3),activation="relu"))
model.add(layers.MaxPooling2D((2,2)))
model.add(layers.Flatten())
model.add(layers.Dropout(0.5))
model.add(layers.Dense(512,activation="relu"))
model.add(layers.Dense(DENSE_SIZE,activation="sigmoid"))

#Confirmation of model configuration
model.summary()
# ----- /Model building----- #

# -----Model compilation----- #
model.compile(loss="binary_crossentropy",
              optimizer=optimizers.RMSprop(lr=1e-4),
              metrics=["acc"])
# ----- /Model compilation----- #

# -----Model learning----- #
#Read teacher data and test data
X_TRAIN, X_TEST, Y_TRAIN, Y_TEST = np.load(TRAIN_TEST_DATA)
#fit() returns a History object; keep it separate so that `model` still refers to the model
history = model.fit(X_TRAIN,
                    Y_TRAIN,
                    epochs=10,
                    batch_size=6,
                    validation_data=(X_TEST, Y_TEST))
# ----- /Model learning----- #

# -----Learning result plot----- #
acc = history.history['acc']
val_acc = history.history['val_acc']
loss = history.history['loss']
val_loss = history.history['val_loss']

epochs = range(len(acc))

plt.plot(epochs, acc, 'bo', label='Training acc')
plt.plot(epochs, val_acc, 'b', label='Validation acc')
plt.title('Training and validation accuracy')
plt.legend()
plt.savefig(os.path.join(MODEL_ROOT_DIR, 'Training_and_validation_accuracy.png'))

plt.figure()

plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.savefig(os.path.join(MODEL_ROOT_DIR, 'Training_and_validation_loss.png'))
# ----- /Learning result plot----- #

# -----Save model----- #
#Save model
json_string = model.to_json()
with open(os.path.join(MODEL_ROOT_DIR, 'model_predict.json'), 'w') as f:
    f.write(json_string)

#Weight storage
model.save_weights(os.path.join(MODEL_ROOT_DIR, 'model_predict.hdf5'))
# ----- /Save model----- #

> cd e:\python\ml\src\model_generator
> python main.py
Using TensorFlow backend.
2019-11-15 17:02:03.400229: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 148, 148, 32)      896
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 74, 74, 32)        0
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 72, 72, 64)        18496
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_3 (Conv2D)            (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 17, 17, 128)       0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 15, 15, 128)       147584
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 7, 7, 128)         0
_________________________________________________________________
flatten_1 (Flatten)          (None, 6272)              0
_________________________________________________________________
dropout_1 (Dropout)          (None, 6272)              0
_________________________________________________________________
dense_1 (Dense)              (None, 512)               3211776
_________________________________________________________________
dense_2 (Dense)              (None, 3)                 1539
=================================================================
Total params: 3,454,147
Trainable params: 3,454,147
Non-trainable params: 0
_________________________________________________________________
Train on 4396 samples, validate on 1100 samples
Epoch 1/10
4396/4396 [==============================] - 103s 23ms/step - loss: 0.5434 - acc: 0.7298 - val_loss: 0.5780 - val_acc: 0.7067
Epoch 2/10
4396/4396 [==============================] - 109s 25ms/step - loss: 0.4457 - acc: 0.7989 - val_loss: 0.4288 - val_acc: 0.8024
Epoch 3/10
4396/4396 [==============================] - 103s 23ms/step - loss: 0.3874 - acc: 0.8318 - val_loss: 0.3992 - val_acc: 0.8170
Epoch 4/10
4396/4396 [==============================] - 106s 24ms/step - loss: 0.3483 - acc: 0.8469 - val_loss: 0.3476 - val_acc: 0.8476
Epoch 5/10
4396/4396 [==============================] - 107s 24ms/step - loss: 0.3029 - acc: 0.8717 - val_loss: 0.3085 - val_acc: 0.8603
Epoch 6/10
4396/4396 [==============================] - 109s 25ms/step - loss: 0.2580 - acc: 0.8947 - val_loss: 0.2918 - val_acc: 0.8736
Epoch 7/10
4396/4396 [==============================] - 107s 24ms/step - loss: 0.2182 - acc: 0.9084 - val_loss: 0.2481 - val_acc: 0.8970
Epoch 8/10
4396/4396 [==============================] - 113s 26ms/step - loss: 0.1855 - acc: 0.9217 - val_loss: 0.1920 - val_acc: 0.9209
Epoch 9/10
4396/4396 [==============================] - 120s 27ms/step - loss: 0.1548 - acc: 0.9394 - val_loss: 0.1775 - val_acc: 0.9345
Epoch 10/10
4396/4396 [==============================] - 114s 26ms/step - loss: 0.1243 - acc: 0.9530 - val_loss: 0.1738 - val_acc: 0.9412
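
The model above ends in a sigmoid layer trained with binary_crossentropy, which scores each of the three outputs independently. Since every image here belongs to exactly one breed, a commonly used alternative is a softmax output with categorical cross-entropy (in Keras, activation="softmax" and loss="categorical_crossentropy"). The log above reflects the sigmoid version; results for the alternative are not shown here. The plain-Python sketch below only illustrates what those two functions compute, and the logits are made-up numbers.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def categorical_crossentropy(one_hot_label, probabilities, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot_label, probabilities))


logits = [2.0, 0.5, -1.0]  # made-up scores for Sheltie, Corgi, Border collie
probs = softmax(logits)
print(probs, sum(probs))
print(categorical_crossentropy([1.0, 0.0, 0.0], probs))
```

Because the softmax probabilities sum to 1, raising the score of one breed necessarily lowers the others, which fits a task where exactly one label is correct.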

Sheltie judgment program creation

Finally, it's time to write the sheltie judgment program. I also prepared some images for checking it.

ai\main.py

# -*- coding: utf-8 -*-

from keras import models
from keras.models import model_from_json
from keras.preprocessing import image
import numpy as np
import sys
import os
from keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array, array_to_img

#Model save destination
MODEL_ROOT_DIR = './../data/model/'
MODEL_PATH = os.path.join(MODEL_ROOT_DIR, 'model_predict.json')
WEIGHT_PATH = os.path.join(MODEL_ROOT_DIR, 'model_predict.hdf5')
#category
CATEGORIES = [u'Sheltie', u'Corgi', u'Border collie']
#Image size
IMG_SIZE = 150
INPUT_SHAPE = (IMG_SIZE, IMG_SIZE,3)

#Load the model
model = model_from_json(open(MODEL_PATH).read())
model.load_weights(WEIGHT_PATH)

#Read image from input argument
args = sys.argv
img = image.load_img(args[1], target_size=(IMG_SIZE, IMG_SIZE))
x = image.img_to_array(img)
x = np.expand_dims(x, axis=0)

#Predict with a model
features = model.predict(x)

print(features)

if features[0, 0] == 1:
    print(u'Sheltie.')
else:
    for i in range(0, len(CATEGORIES)):
        if features[0, i] == 1:
            print(u"It doesn't seem to be sheltie. This is {}.".format(CATEGORIES[i]))

(ml)> cd e:\python\ml\src\ai
(ml)> python .\main.py '..\data\img\test\sheltie_00.jpg'
Using TensorFlow backend.
2019-11-15 17:58:44.863437: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[1. 0. 0.]]
Sheltie.
(ml)> python .\main.py '..\data\img\test\corgi_00.jpg'
Using TensorFlow backend.
2019-11-15 17:58:55.519838: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 1. 0.]]
It doesn't seem to be sheltie. This is Corgi.
(ml)> python .\main.py '..\data\img\test\bordercollie_00.jpg'
Using TensorFlow backend.
2019-11-15 17:59:06.457517: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 0. 1.]]
It doesn't seem to be sheltie. It's a border collie.

sheltie_00.jpg Sheltie.

cogy_00.jpg It doesn't seem to be sheltie. This is Corgi.

bordercolli_00.png It doesn't seem to be sheltie. It's a border collie.

All three were judged correctly.
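
The check `features[0, i] == 1` only prints something when the network outputs exactly 1.0 for a category. A more tolerant approach, sketched below under the assumption that the highest score should win, is to pick the category with the largest value. `pick_category` and `describe` are hypothetical helpers, and the score lists are made up. Note also that the training data were divided by 255 in train_test_data_generator\main.py, whereas ai\main.py passes raw pixel values to predict; scaling the input the same way would be consistent with training, though it would change the outputs shown above.

```python
CATEGORIES = ["Sheltie", "Corgi", "Border collie"]


def pick_category(scores, categories=CATEGORIES):
    """Return (category name, score) for the highest-scoring category."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return categories[best], scores[best]


def describe(scores):
    """Build the judgment message from one row of prediction scores."""
    name, score = pick_category(scores)
    if name == "Sheltie":
        return "Sheltie. (score {:.2f})".format(score)
    return "It doesn't seem to be sheltie. This is {}. (score {:.2f})".format(name, score)


# Made-up score rows standing in for model.predict(x)[0]
print(describe([0.91, 0.06, 0.03]))
print(describe([0.10, 0.15, 0.75]))
```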

Is Fran a sheltie?

How will Fran, who also appears in my icon, be judged?

(ml)> python .\main.py '..\data\img\test\fran.jpg'
Using TensorFlow backend.
2019-11-15 17:59:28.118592: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[0. 0. 1.]]
It doesn't seem to be sheltie. It's a border collie.

fran.png It doesn't seem to be sheltie. It's a border collie.

Too bad... Although my Fran is a sheltie, she was judged as "It doesn't seem to be sheltie. It's a border collie." The reason is probably that there were few black shelties in the teacher data. This made me keenly aware of the importance of teacher data.

Postscript

I tried again with a different image.

(ml)> python .\main.py '..\data\img\test\fran_01.jpg'
Using TensorFlow backend.
2019-11-18 17:21:07.929836: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
[[1. 0. 0.]]
Sheltie.

Our Fran was judged as a sheltie.

Outlook

- Add black shelties to the teacher data and try again
- Incorporate the AI into a web app
- Understand CNNs

A machine learning beginner tried to create a sheltie judgment AI in one day