Overview

Others on Qiita have already tried this, but partly for my own study I tried classifying guitar images with a CNN (ResNet). Here are some things from that process that may be helpful. (I have not tidied it up, so it is a little messy, but I will post the code as well.)

- Specific classification method
- About preprocessing
- About learning method
- About learning results
- Try and play
- Summary

About specific classification method

Guitar images are scraped, preprocessed, and augmented to increase their number. By fine-tuning ResNet, a type of CNN, on the augmented images, I try to train a classifier without spending too much on training cost.

About labels

I chose the following models, for which images seemed relatively easy to collect.

- Made by Fender
  - Stratocaster
  - Telecaster
  - Jazzmaster
  - Jaguar
  - Mustang (including similar models)
- Made by Gibson
  - Les Paul
  - SG
  - ES-335
  - Flying V
- Other
  - Various acoustic guitars

About preprocessing

The first step is to collect images. I collected them with iCrawler. Images are usually collected from Google image search, but as of March 12, 2020 the tool seems to be broken because of a specification change on the Google side, so this time I collected images from Bing.

`crawling.py`


import os

from icrawler.builtin import BingImageCrawler

searching_words = [
                    "Fender Stratocaster",
                    "Fender Telecaster",
                    "Fender Jazzmaster",
                    "Fender Jaguar",
                    "Fender Mustang",
                    "Gibson LesPaul",
                    "Gibson SG",
                    "Gibson FlyingV",
                    "Gibson ES-335",
                    "Acoustic guitar"
                ]
if __name__ == "__main__":
    for word in searching_words:
        if not os.path.isdir('./searched_image/' + word):
            os.makedirs('./searched_image/' + word)
        bing_crawler = BingImageCrawler(storage={ 'root_dir': './searched_image/' + word })
        bing_crawler.crawl(keyword=word, max_num=1000)

After collecting, I manually removed images that were unlikely to be usable (those that do not show the whole guitar body, those containing text, those in which hands or other objects appear, etc.). As a result, I was able to collect about 100 to 160 images for each label. (I specified max_num=1000 in the crawl method, but it only collected about 400 images.)

Next, the collected images are preprocessed. Each image is rotated in 45° steps and also flipped horizontally at every rotation. This increases the data 16-fold, to about 1600 to 2000 images for each label.
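The 16-fold figure can be checked with a short sketch. It only enumerates the rotate-and-flip combinations and does not touch any image; the names `variants` and `augmented_count` are made up for this illustration and do not appear in the scripts.

```python
from itertools import product

# Rotation angles used for augmentation: 0, 45, ..., 315 degrees (8 values)
angles = range(0, 360, 45)
# Each rotated image is kept as-is and also mirrored left-right
flips = (False, True)

# Every (angle, flipped) pair is one augmented variant of a source image
variants = list(product(angles, flips))


def augmented_count(n_source_images):
    """Number of augmented images produced from n_source_images originals."""
    return n_source_images * len(variants)


print(len(variants))         # 16
print(augmented_count(100))  # 1600
```

With 100 source images this gives 1600 images, which matches the lower end of the range mentioned above.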

`image_preprocessing.py`


import os
import glob

from PIL import Image
import numpy as np
from sklearn.model_selection import train_test_split 

#The size of the image to be compressed
image_size = 224
#Number of training data
traindata = 1000
#Number of test data
testdata = 300

#Input folder name
src_dir = './searched_image'
#Output folder name (created if it does not exist yet)
dst_dir = './input_guitar_data'
os.makedirs(dst_dir, exist_ok=True)

#Label name to identify
labels = [
                    "Fender Stratocaster",
                    "Fender Telecaster",
                    "Fender Jazzmaster",
                    "Fender Jaguar",
                    "Fender Mustang",
                    "Gibson LesPaul",
                    "Gibson SG",
                    "Gibson FlyingV",
                    "Gibson ES-335",
                    "Acoustic guitar"
                ]
#Loading images
for index, label in enumerate(labels):
    #Note: no trailing space in the pattern, otherwise nothing matches
    files = glob.glob("{}/{}/all/*.jpg".format(src_dir, label))
        
    #Image converted data
    X = []
    #label
    Y = []

    for file in files:
        #Open image
        img = Image.open(file)
        img = img.convert("RGB")
        
        #===================#Convert to square#===================#
        width, height = img.size
        #If it is vertically long, expand it horizontally
        if width < height:
            result = Image.new(img.mode,(height, height),(255, 255, 255))
            result.paste(img, ((height - width) // 2, 0))
        #If it is horizontally long, expand it vertically
        elif width > height:
            result = Image.new(img.mode,(width, width),(255, 255, 255))
            result.paste(img, (0, (width - height) // 2))
        else:
            result = img

        #Align image size to 224x224 (resize returns a new image, so reassign it)
        result = result.resize((image_size, image_size))

        data = np.asarray(result)
        X.append(data)
        Y.append(index)

        #===================#Data augmentation#===================#
        for angle in range(0, 360, 45):
            #rotation
            img_r = result.rotate(angle)
            data = np.asarray(img_r)
            X.append(data)
            Y.append(index)

            #Flip horizontally
            img_t = img_r.transpose(Image.FLIP_LEFT_RIGHT)
            data = np.asarray(img_t)
            X.append(data)
            Y.append(index)
    
    #Normalization(0~255->0~1)
    X = np.array(X,dtype='float32') / 255.0
    Y = np.array(Y)


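    #Split into training and test data (a hold-out split, not cross-validation)
    X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=testdata, train_size=traindata)
    #The four arrays have different shapes, so store them in an object array
    xy = np.array((X_train, X_test, y_train, y_test), dtype=object)
    np.save("{}/{}_{}.npy".format(dst_dir, label, index), xy)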

The preprocessed results are saved to one npy file per label.
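The "convert to square" step in the script pads the shorter side with white so that the guitar is not distorted when resized to 224x224. As a rough sketch of just the geometry (the helper name `square_canvas` is hypothetical and not part of the script), the canvas size and paste position can be computed like this:

```python
def square_canvas(width, height):
    """Return (side, (x, y)): the side length of the square canvas and
    the top-left position at which the original image is pasted."""
    side = max(width, height)
    # Integer division keeps the offsets in whole pixels, as in the script
    x = (side - width) // 2
    y = (side - height) // 2
    return side, (x, y)


print(square_canvas(300, 500))  # (500, (100, 0))  portrait: padded left and right
print(square_canvas(640, 480))  # (640, (0, 80))   landscape: padded top and bottom
print(square_canvas(224, 224))  # (224, (0, 0))    already square
```

Centering the image this way keeps the aspect ratio intact, at the cost of some blank border in the final 224x224 input.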

About learning method

This time I train with ResNet, a typical CNN architecture. My PC does not have an NVIDIA GPU, so training as-is would run on the CPU only and take a huge amount of time. I therefore ran the following code in a GPU environment on Google Colab. (How to use Colab, how to upload files, etc. are omitted.)

import gc

import keras
from keras.applications.resnet50 import ResNet50
from keras.models import Sequential, Model
from keras.layers import Conv2D, MaxPooling2D
from keras.layers import Activation, Dropout, Flatten, Dense, Input
from keras.callbacks import EarlyStopping 
from keras.utils import np_utils
from keras import optimizers

from sklearn.metrics import confusion_matrix

import numpy as np
import matplotlib.pyplot as plt

#Class label definition
classes = [
                    "Fender Stratocaster",
                    "Fender Telecaster",
                    "Fender Jazzmaster",
                    "Fender Jaguar",
                    "Fender Mustang",
                    "Gibson LesPaul",
                    "Gibson SG",
                    "Gibson FlyingV",
                    "Gibson ES-335",
                    "Acoustic guitar"
                ]
num_classes = len(classes)

#Image size to load
ScaleTo = 224

#Definition of main function
def main():
    #Reading training data
    src_dir = '/content/drive/My Drive/Machine learning/input_guitar_data'

    train_Xs = []
    test_Xs = []
    train_ys = []
    test_ys = []

    for index, class_name in enumerate(classes):
        file = "{}/{}_{}.npy".format(src_dir, class_name, index)
        #Load the arrays saved for this class
        train_X, test_X, train_y, test_y = np.load(file, allow_pickle=True)

        #Collect the per-class arrays in lists
        train_Xs.append(train_X)
        test_Xs.append(test_X)
        train_ys.append(train_y)
        test_ys.append(test_y)

    #Concatenate the per-class arrays into single arrays
    X_train = np.concatenate(train_Xs, 0)
    X_test = np.concatenate(test_Xs, 0)
    y_train = np.concatenate(train_ys, 0)
    y_test = np.concatenate(test_ys, 0)

    #Convert the labels to one-hot vectors
    y_train = np_utils.to_categorical(y_train, num_classes)
    y_test = np_utils.to_categorical(y_test, num_classes)


    #Generation of machine learning model
    model, history = model_train(X_train, y_train, X_test, y_test)
    model_eval(model, X_test, y_test)
    #Display learning history
    model_visualization(history)

def model_train(X_train, y_train, X_test, y_test):
    #Load ResNet50. The fully connected layers are not needed, so include_top=False
    input_tensor = Input(shape=(ScaleTo, ScaleTo, 3))
    resnet50 = ResNet50(include_top=False, weights='imagenet', input_tensor=input_tensor)

    #Creating a fully connected layer
    top_model = Sequential()
    top_model.add(Flatten(input_shape=resnet50.output_shape[1:]))
    top_model.add(Dense(256, activation='relu'))
    top_model.add(Dropout(0.5))
    top_model.add(Dense(num_classes, activation='softmax'))

    #Create a model by combining ResNet50 and a fully connected layer
    resnet50_model = Model(inputs=resnet50.input, outputs=top_model(resnet50.output))

    """
    #Freeze some of the ResNet50 weights (disabled here)
    for layer in resnet50_model.layers[:100]:
        layer.trainable = False
    """

    #Specify multi-class classification
    resnet50_model.compile(loss='categorical_crossentropy',
            optimizer=optimizers.SGD(lr=1e-3, momentum=0.9),
            metrics=['accuracy'])
    resnet50_model.summary()

    #Execution of learning
    early_stopping = EarlyStopping(monitor='val_loss', patience=0, verbose=1) 
    history = resnet50_model.fit(X_train, y_train,
                        batch_size=75,
                        epochs=25, validation_data=(X_test, y_test),
                        callbacks=[early_stopping])
    #Save model
    resnet50_model.save("/content/drive/My Drive/Machine learning/guitar_cnn_resnet50.h5")
    
    return resnet50_model, history

def model_eval(model, X_test, y_test):
    scores = model.evaluate(X_test, y_test, verbose=1)
    print("test Loss", scores[0])
    print("test Accuracy", scores[1])
    #Calculation of confusion matrix
    predict_classes = model.predict(X_test)
    predict_classes = np.argmax(predict_classes, 1)
    true_classes = np.argmax(y_test, 1)
    print(predict_classes)
    print(true_classes)
    cmx = confusion_matrix(true_classes, predict_classes)
    print(cmx)
    #Erase the model after inference
    del model
    keras.backend.clear_session()  #Release the memory held by the Keras session
    gc.collect()

def model_visualization(history):
    #Graph display of loss value
    plt.plot(history.history['loss'])
    plt.plot(history.history['val_loss'])
    plt.title('model loss')
    plt.ylabel('loss')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()

    #Graph display of accuracy
    #(the keys are 'acc'/'val_acc' in older Keras; from Keras 2.3 they are 'accuracy'/'val_accuracy')
    plt.plot(history.history['acc'])
    plt.plot(history.history['val_acc'])
    plt.title('model accuracy')
    plt.ylabel('accuracy')
    plt.xlabel('epoch')
    plt.legend(['train', 'test'], loc='upper left')
    plt.show()
    
if __name__ == "__main__":
    main()

This time, results such as val_acc were better when the weights were not frozen, so the weights of every layer are retrained as well. The code sets epochs=25, but in practice early stopping ended training at the 5th epoch.
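To illustrate why training can end well before the configured number of epochs, here is a simplified sketch of early stopping on the validation loss. It is not Keras's actual `EarlyStopping` implementation, only the general idea; the function name `stopping_epoch` and the loss values are hypothetical and are not taken from the real run.

```python
def stopping_epoch(val_losses, patience=0):
    """Return the 1-based epoch at which training stops.

    Training stops once val_loss has failed to improve for more than
    `patience` consecutive epochs; if that never happens, all epochs run.
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember the new best value and reset the counter
            best = loss
            waited = 0
        else:
            waited += 1
            if waited > patience:
                return epoch
    return len(val_losses)


# Hypothetical validation losses, not the values from the actual run
losses = [0.40, 0.21, 0.15, 0.11, 0.12, 0.10]
print(stopping_epoch(losses, patience=0))  # 5
print(stopping_epoch(losses, patience=1))  # 6
```

With patience=0, a single epoch without improvement is enough to stop, which is a fairly aggressive setting; a larger patience tolerates small fluctuations in val_loss.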

About learning results

The result is as follows.

test Loss 0.09369107168481061
test Accuracy 0.9744

Here is the confusion matrix as well (rows are the true classes and columns are the predicted classes, in the order of the classes list).

[[199   0   1   0   0   0   0   0   0   0]
 [  0 200   0   0   0   0   0   0   0   0]
 [  2   5 191   2   0   0   0   0   0   0]
 [  1   0  11 180   6   0   2   0   0   0]
 [  0   2   0   0 198   0   0   0   0   0]
 [  0   0   0   0   0 288   4   0   6   2]
 [  0   2   0   0   0   0 296   0   2   0]
 [  0   0   0   0   0   0   0 300   0   0]
 [  0   0   0   0   0   0   0   0 300   0]
 [  0   0   0   0   0   0   0   1   0 299]]
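As a small sketch of how to read this matrix, the per-class recall (the share of each true class that was predicted correctly) can be computed from its rows. The variable names `recall` and `worst` are introduced only for this example.

```python
classes = [
    "Fender Stratocaster", "Fender Telecaster", "Fender Jazzmaster",
    "Fender Jaguar", "Fender Mustang", "Gibson LesPaul", "Gibson SG",
    "Gibson FlyingV", "Gibson ES-335", "Acoustic guitar",
]

# Confusion matrix printed above (rows: true class, columns: predicted class)
cmx = [
    [199, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    [0, 200, 0, 0, 0, 0, 0, 0, 0, 0],
    [2, 5, 191, 2, 0, 0, 0, 0, 0, 0],
    [1, 0, 11, 180, 6, 0, 2, 0, 0, 0],
    [0, 2, 0, 0, 198, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 288, 4, 0, 6, 2],
    [0, 2, 0, 0, 0, 0, 296, 0, 2, 0],
    [0, 0, 0, 0, 0, 0, 0, 300, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 300, 0],
    [0, 0, 0, 0, 0, 0, 0, 1, 0, 299],
]

# Recall per class: correct predictions on the diagonal / samples in the row
recall = [row[i] / sum(row) for i, row in enumerate(cmx)]

for name, value in zip(classes, recall):
    print("{:<20} {:.3f}".format(name, value))

# Index of the class that is recognized least reliably
worst = min(range(len(classes)), key=lambda i: recall[i])
print("lowest recall:", classes[worst])  # Fender Jaguar
```

Read this way, Jaguar is the weakest class (180 of 200 correct), and most of its errors (11 images) are predicted as Jazzmaster.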

[Figures: model loss and model accuracy per epoch, train vs. test]

You can see that training has already progressed considerably by the end of the first epoch.

Try and play

I try inference with the saved model. This time I made it into a very rudimentary web application using Flask, which I was using for the first time.

`graphing.py`


import matplotlib.pyplot as plt
from PIL import Image
import numpy as np

def to_graph(image, labels, predicted):
    #=======#Plot and save#=======#
    fig = plt.figure(figsize=(10.24, 5.12))
    fig.subplots_adjust(left=0.2)

    #=======#Write a bar chart#=======#
    ax1 = fig.add_subplot(1,2,1)
    ax1.barh(labels, predicted, color='c', align="center")
    ax1.set_yticks(labels)#y-axis label
    ax1.set_xticks([])#Remove x-axis labels

    #Write numbers in bar charts
    for interval, value in zip(range(0,len(labels)), predicted):
        ax1.text(0.02, interval, value, ha='left', va='center')

    #=======#Insert the identified image#=======#
    ax2 = fig.add_subplot(1,2,2)
    ax2.imshow(image)
    ax2.axis('off')

    return fig

def expand_to_square(input_file):
    """Convert a rectangular image to a square
    input_file: File name to convert
    Return value: Converted image
    """
    img = Image.open(input_file)
    img = img.convert("RGB")
    
    width, height = img.size
    #If it is vertically long, expand it horizontally
    if width < height:
        result = Image.new(img.mode,(height, height),(255, 255, 255))
        result.paste(img, ((height - width) // 2, 0))
    #If it is horizontally long, expand it vertically
    elif width > height:
        result = Image.new(img.mode,(width, width),(255, 255, 255))
        result.paste(img, (0, (width - height) // 2))
    else:
        result = img
    
    return result

`predict_file.py`


import io
import gc

from flask import Flask, request, redirect, url_for
from flask import flash, render_template, make_response

from keras.models import Sequential, load_model
from keras.applications.resnet50 import decode_predictions
import keras

import numpy as np
from PIL import Image
from matplotlib.backends.backend_agg import FigureCanvasAgg

import graphing

classes = [
            "Fender Stratocaster",
            "Fender Telecaster",
            "Fender Jazzmaster",
            "Fender Jaguar",
            "Fender Mustang",
            "Gibson LesPaul",
            "Gibson SG",
            "Gibson FlyingV",
            "Gibson ES-335",
            "Acoustic guitar"
            ]
num_classes = len(classes)
image_size = 224
ALLOWED_EXTENSIONS = set(['png', 'jpg', 'gif'])


#Note: flash() stores its messages in the session, so app.secret_key must be set for it to work
app = Flask(__name__)

def allowed_file(filename):
    return '.' in filename and filename.rsplit('.',1)[1].lower() in ALLOWED_EXTENSIONS

@app.route('/', methods=['GET', 'POST'])
def upload_file():
    if request.method == 'POST':
        if 'file' not in request.files:
            flash('No file')
            return redirect(request.url)
        file = request.files['file']

        if file.filename == '':
            flash('No file')
            return redirect(request.url)

        if file and allowed_file(file.filename):
            virtual_output = io.BytesIO()
            file.save(virtual_output)
            filepath = virtual_output

            model = load_model('./cnn_model/guitar_cnn_resnet50.h5')

            #Convert image to square
            image = graphing.expand_to_square(filepath)
            image = image.convert('RGB')
            #Align image size to 224x224
            image = image.resize((image_size, image_size))
            #Change from image to numpy array and normalize
            data = np.asarray(image) / 255.0
            #Increase the dimensions of the array(3D->4 dimensions)
            data = np.expand_dims(data, axis=0)
            #Make inferences using the learned model
            result = model.predict(data)[0]
            
            #Draw the inference result and the inferred image as a graph
            fig = graphing.to_graph(image, classes, result)
            canvas = FigureCanvasAgg(fig)
            png_output = io.BytesIO()
            canvas.print_png(png_output)
            data = png_output.getvalue()

            response = make_response(data)
            response.headers['Content-Type'] = 'image/png'
            response.headers['Content-Length'] = len(data)

            #Erase the model after inference
            del model
            keras.backend.clear_session()
            gc.collect()

            return response
    return '''
    <!doctype html>
    <html>
        <head>
            <meta charset="UTF-8">
            <title>Let's upload the file and judge</title>
        </head>
        <body>
            <h1>Upload the file and judge!</h1>
            <form method = post enctype = multipart/form-data>
                <p><input type=file name=file>
                <input type=submit value=Upload>
            </form>
        </body>
    </html>
    '''
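The value returned by `model.predict(data)[0]` in the app is a list of ten probabilities, one per entry in `classes`. As a rough sketch of how such an output could be turned into a ranked list of labels, the helper below sorts the labels by probability. The function `top_predictions` and the probability values are hypothetical and are not part of the app.

```python
def top_predictions(labels, probabilities, k=3):
    """Return the k (label, probability) pairs with the highest probability."""
    ranked = sorted(zip(labels, probabilities), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


classes = [
    "Fender Stratocaster", "Fender Telecaster", "Fender Jazzmaster",
    "Fender Jaguar", "Fender Mustang", "Gibson LesPaul", "Gibson SG",
    "Gibson FlyingV", "Gibson ES-335", "Acoustic guitar",
]

# Hypothetical softmax output for one image (made-up values that sum to 1)
result = [0.01, 0.00, 0.70, 0.25, 0.02, 0.00, 0.01, 0.00, 0.00, 0.01]

for label, p in top_predictions(classes, result):
    print("{}: {:.0%}".format(label, p))
```

The app itself draws all ten probabilities as a bar chart instead; a ranked list like this is just a more compact alternative.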

By the way, if you repeat training and inference in Keras many times, data seems to accumulate in memory, so you apparently have to release it explicitly in the code. (The same applies on Colab.)

Reference: Fixed the problem that memory usage increases when learning repeatedly with keras

I have also published the source code of the web application I actually made: Guitar Classification Web App

Try and play

I actually tried it with my own instruments.

First, the Jazzmaster. [Figure: Jazzmaster classification result] It also responds to Jaguar, which shares many features with the Jazzmaster. However, other Jazzmaster images taken from the internet are sometimes judged as 99% Jazzmaster, so I would not say the classification accuracy is bad.

Next, the Stratocaster. [Figure: Stratocaster classification result] It was judged to be a Stratocaster almost with certainty. There seems to be no problem even if the image is slightly dark.

So what happens if the model is given a bass, which it was not trained on? I tried it with my Jazz Bass type. [Figure: Jazz Bass classification result] I can understand it being judged as a Mustang, but I am curious that the probability of SG is also high. Perhaps the horns look somewhat similar...?

Summary

This time, by fine-tuning ResNet, a type of CNN, I was able to build a classifier that is relatively easy to create yet highly accurate. However, with some machine learning methods such as CNNs, it is hard to explain why a given result was produced. Therefore, if I have time, I would like to try visualization methods such as Grad-CAM.

That's all.

A story of a deep learning beginner trying to classify guitars on CNN
