Image classification with self-made neural network by Keras and PyTorch

Introduction

PyTorch practice. The previous articles in this series are listed below.
--Implementation of simple regression analysis in Keras
--Wine classification by Keras
--Machine Sommelier by Keras
--Dataset preparation for PyTorch

--In the previous articles, I worked through regression and classification examples with Keras, and covered the outline and implementation of machine learning and deep learning.
--This time, I build a neural network that learns and classifies images I collected myself. (The backbone is selectable.)
--The deep learning frameworks are Keras and PyTorch, and I also compare the two.
--The program and the dataset are available at GitHub-moriitkys/MyOwnNN. (The execution environment is described in the Execution environment section at the bottom of the page.)

Problem setting

--For the dataset, I collected images of hook wrenches (62 images) and spanner wrenches (62 images) and augmented them for trial use as training and evaluation (validation) images (Figure 1-a, b). The task is tool classification.

fig1_a1_hook.png fig1_b1_spanner.png
fig1-a2-hook.gif fig1-b2-spanner.gif
Figure 1-a. Hook Wrench Figure 1-b. Spanner Wrench

--The input of the self-made NN (MyNet) is 28x28x3 and the output is 2 classes, so this is a classification problem. The network structure is detailed below.
--Training length is counted in epochs, the optimization function is SGD, and the loss function is categorical cross entropy.
--As test images (unseen images), I prepared 2 hook wrenches and 2 spanner wrenches that are not used for training or evaluation.
--The UI is the same as the one used in the previous article, Dataset preparation for PyTorch.
--As a bonus, I also tried classifying the logos of the two machine tool makers carried over from the previous articles.

Network structure

The self-made NN is called MyNet in this article. It is a network consisting of an input layer (28 * 28 * 3 nodes), an intermediate layer (200 nodes), and an output layer (2 outputs). This time the input covers all three RGB channels. Figure 2 is a conceptual diagram of the structure.

fig2_network.png
Figure 2. Conceptual diagram of MyNet

In the intermediate layer, ReLU is applied as the activation function, followed by Dropout. The output layer applies the softmax function as its activation and produces one output per class (2 classes).

A brief glossary about networks

fig2_NN_terms.png
Figure 3. Conceptual diagram of terms and learning in machine learning

・ **Neurons, nodes**: The part that receives an input signal and outputs something. As shown in Figure 3, each circle is called a neuron (node); it applies some function to the input signal and emits an output signal.

・ **Activation function**: ReLU, softmax. The function each neuron (node) applies to its input to produce its output. It corresponds to $f_{()}$ in the figure.

In the figure, the softmax function is shown as an example. Softmax is used in the final layer; the outputs for the classes sum to 1, so they can be regarded as class probabilities. Activation functions mimic the way neurons in the brain fire once their input exceeds a certain threshold. Making the activation function non-linear has improved accuracy in image recognition dramatically. Besides softmax there are ReLU, sigmoid, and others; they have to be chosen to suit the situation, and new ones keep appearing. * Non-linear means the function cannot be drawn as a single straight line; a linear function is one that can.
fig4_a.png fig4_b.png fig4_c.png
Figure 4-a. Softmax function Figure 4-b. ReLU Figure 4-c. Sigmoid function
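
To make the three functions in Figure 4 concrete, here is a minimal sketch in plain Python. It is not taken from the program on GitHub; the function names and the example scores are assumptions chosen for illustration.

```python
import math

def relu(x):
    """ReLU: pass positive values through, clamp negative values to zero."""
    return max(0.0, x)

def sigmoid(x):
    """Sigmoid: squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(scores):
    """Softmax: turn a list of raw scores into values that sum to 1."""
    # Subtracting the maximum avoids overflow in exp() and does not change the result.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw scores for the two classes (hook wrench, spanner wrench)
probs = softmax([2.0, 0.5])
print(probs, sum(probs))
print(relu(-1.5), relu(1.5), sigmoid(0.0))
```

The larger raw score gets the larger share, and the two softmax outputs add up to 1, which is why they can be read as class probabilities.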

・ **Loss function**: categorical_crossentropy. The loss value is the error between the value predicted by the neural network and the correct answer, and the function that computes this error is the loss function. As shown in the figure, it calculates the error from the output of the model and the correct label.

In the figure, cross entropy is shown as an example. Cross entropy is used in classification tasks. For the $n$-th sample it is defined as $$E=-\sum_{k=1}^{K} t_{nk} \log y_{nk}$$ where $n$ is the sample number, $K$ is the number of classes, $y_{nk}$ is the network output for class $k$ of the $n$-th sample, and $t_{nk}$ is the correct label for class $k$ of the $n$-th sample.
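
The cross entropy formula can be tried on a single sample with a few lines of plain Python. This is only a sketch: the function name, the small `eps` guard against log(0), and the example outputs are assumptions for illustration.

```python
import math

def cross_entropy(t, y, eps=1e-12):
    """E = -sum_k t_k * log(y_k) for one sample with one-hot label t."""
    # eps is an assumed safeguard so that log() never receives exactly 0.
    return -sum(t_k * math.log(y_k + eps) for t_k, y_k in zip(t, y))

t = [1.0, 0.0]                               # correct label: class 0
confident = cross_entropy(t, [0.9, 0.1])     # correct and confident
unsure = cross_entropy(t, [0.5, 0.5])        # undecided
wrong = cross_entropy(t, [0.1, 0.9])         # confident but wrong
print(confident, unsure, wrong)
```

The loss grows as the output assigned to the correct class shrinks, and for the undecided two-class output it equals log 2.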

・ **Optimization function**: SGD. The optimization function changes the weights so that the value of the loss function decreases. As shown in the figure, the gradient is calculated from the error and the weights, and the weights are adjusted accordingly.

"Optimization algorithm" is a more accurate term than "optimization function"; SGD is shown as the example in the figure. SGD (Stochastic Gradient Descent) repeatedly updates the weights little by little over mini-batches. The figure below is a conceptual animation of the SGD weight update, $w \leftarrow w - \epsilon \nabla E$, where $\epsilon$ is the learning rate. Other optimization algorithms include Adam and RMSprop. ![SGD.gif](https://qiita-image-store.s3.ap-northeast-1.amazonaws.com/0/282390/657e4792-2b07-2462-deab-c0bc7cb500ee.gif)
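
The mini-batch update can be sketched on a toy problem with one weight. Everything here is an assumption for illustration (the data, the learning rate, the batch size); it fits $y = w x$ to points generated with $w = 2$ and is unrelated to the image data in this article.

```python
import random

rng = random.Random(0)                      # fixed seed so the run is repeatable
xs = [0.1 * i for i in range(1, 21)]        # toy inputs 0.1 ... 2.0
ys = [2.0 * x for x in xs]                  # toy targets generated with w = 2

w = 0.0      # initial weight
lr = 0.1     # learning rate (epsilon in the update rule)

for step in range(200):
    batch = rng.sample(range(len(xs)), 4)   # pick a mini-batch of 4 samples
    # Gradient of the mean squared error (w*x - y)^2 over the mini-batch
    grad = sum(2.0 * (w * xs[i] - ys[i]) * xs[i] for i in batch) / len(batch)
    w -= lr * grad                          # w <- w - epsilon * gradient
print(w)
```

Each step uses only a few randomly chosen samples, which is the "stochastic" part; the weight still moves toward the value that generated the data.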

・ **Keras**: A high-level neural network library written in Python that can run on top of TensorFlow, CNTK, or Theano.

・ **PyTorch**: A Python-based scientific computing package targeted at two sets of audiences: a replacement for NumPy that uses the power of GPUs, and a deep learning research platform that provides maximum flexibility and speed.

Implementation in Keras

See GitHub for the entire program. The following is an excerpt of the MyNet part.

# Build a model
# (img_size, nb_classes and type_backbone are defined earlier in the full program)
from keras.applications.mobilenet import MobileNet
from keras.applications.resnet50 import ResNet50
from keras.layers.pooling import GlobalAveragePooling2D
from keras.layers.core import Dense, Dropout, Flatten
from keras.models import Model, load_model, Sequential
from keras.optimizers import Adam, RMSprop, SGD

base_model = Sequential()
top_model = Sequential()

INPUT_SHAPE = (img_size[0], img_size[1], 3)
neuron_total = 200  # intermediate-layer nodes (200, as in the Network structure section)

# The ResNet50 and Mobilenet branches are omitted from this excerpt
if type_backbone == "MyNet":
    # Fully connected intermediate layer on the flattened image (28*28*3 inputs)
    base_model.add(Dense(neuron_total, activation='relu',
                         input_shape=(INPUT_SHAPE[0]*INPUT_SHAPE[1]*INPUT_SHAPE[2],)))
    base_model.add(Dropout(0.5))
    # Output layer: one softmax output per class
    top_model.add(Dense(nb_classes, activation='softmax',
                  input_shape=base_model.output_shape[1:]))

# Concatenate base_model (backbone) with top model
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))

print("{} layers".format(len(model.layers)))

# Compile the model
model.compile(
    optimizer = SGD(lr=0.001),
    loss = 'categorical_crossentropy',
    metrics = ["accuracy"]
)

model.summary()

Implementation in PyTorch

See GitHub for the entire program. The following is an excerpt of the MyNet part.

import torch
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torchvision.models
from torchsummary import summary

# img_size, nb_classes, type_backbone and device are defined earlier in the full program
neuron_total = 200
INPUT_SHAPE = (img_size[0], img_size[1], 3)
print(INPUT_SHAPE)
print(nb_classes)

# Create my model
class MyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.l1 = nn.Linear(INPUT_SHAPE[0]*INPUT_SHAPE[1]*INPUT_SHAPE[2], neuron_total)  # input layer -> intermediate layer
        self.dropout1 = nn.Dropout(p=0.5)  # element-wise dropout for flat features
        self.l2 = nn.Linear(neuron_total, 2)  # intermediate layer -> output layer

    def forward(self, x):  # forward propagation
        # Flatten each image to a vector; -1 lets PyTorch infer the batch size
        x = x.view(-1, INPUT_SHAPE[0]*INPUT_SHAPE[1]*INPUT_SHAPE[2])
        x = F.relu(self.l1(x))  # ReLU in the intermediate layer, as in the Keras version
        x = self.dropout1(x)
        x = self.l2(x)  # raw class scores; no softmax is applied here
        return x

# Resnet and Mobilenet are defined elsewhere in the full program
if type_backbone == "ResNet50":
    model = Resnet()
elif type_backbone == "Mobilenet":
    model = Mobilenet()
elif type_backbone == "MyNet":
    model = MyNet()
model = model.to(device)
# Show the model
summary(model, (3, img_size[1], img_size[0]))  # channel, w, h

Comparison of the two in programming

Dataset preparation comparison

First, as mentioned in Dataset preparation for PyTorch, Keras takes the data as NumPy arrays, whereas PyTorch uses a DataLoader and tensors. That is the main difference.
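
As a rough illustration of that difference, the sketch below wraps a NumPy array of images in a PyTorch `TensorDataset` and `DataLoader`. It is not the dataset code from the previous article; the random data, the array shapes, and the batch size are assumptions made up for this example.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: 10 random RGB images of 28x28 in Keras-style (N, H, W, C) layout
images = np.random.rand(10, 28, 28, 3).astype(np.float32)
labels = np.random.randint(0, 2, size=10)

# Keras could take `images` and one-hot labels directly in model.fit.
# PyTorch wants tensors, usually in (N, C, H, W) layout, served by a DataLoader.
x = torch.from_numpy(images).permute(0, 3, 1, 2)
y = torch.from_numpy(labels).long()
loader = DataLoader(TensorDataset(x, y), batch_size=4, shuffle=True)

batch_x, batch_y = next(iter(loader))
print(batch_x.shape, batch_y.shape)
```

The DataLoader hands out one mini-batch at a time, so the training loop never has to slice the arrays itself.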

Model construction comparison

Next, on model construction: Keras infers the input size of each layer automatically when layers such as Dense are stacked, whereas PyTorch requires it to be stated explicitly. For example, adding some intermediate layers to the network in Figure 2 looks like this in Keras:

base_model = Sequential()
top_model = Sequential()
INPUT_SHAPE = (img_size[0], img_size[1], 3)
# Only the first Dense layer needs input_shape; the later ones infer it
base_model.add(Dense(neuron_total, activation='relu',
                         input_shape=(INPUT_SHAPE[0]*INPUT_SHAPE[1]*INPUT_SHAPE[2],)))
base_model.add(Dense(neuron_total, activation='relu'))
base_model.add(Dense(neuron_total, activation='relu'))
base_model.add(Dropout(0.5))
top_model.add(Dense(nb_classes, activation='softmax',
                  input_shape=base_model.output_shape[1:]))

# Concatenate base_model (backbone) with top model
model = Model(inputs=base_model.input, outputs=top_model(base_model.output))

In PyTorch:

class MyNet2(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(INPUT_SHAPE[0]*INPUT_SHAPE[1]*INPUT_SHAPE[2], neuron_total)  # input layer -> 1st intermediate layer
        self.fc2 = nn.Linear(neuron_total, int(neuron_total/2))  # 1st -> 2nd intermediate layer
        self.dropout1 = nn.Dropout(p=0.5)  # element-wise dropout for flat features
        self.fc3 = nn.Linear(int(neuron_total/2), 2)  # 2nd intermediate layer -> output layer

    def forward(self, x):  # forward propagation
        # Flatten each image to a vector; -1 lets PyTorch infer the batch size
        x = x.view(-1, INPUT_SHAPE[0]*INPUT_SHAPE[1]*INPUT_SHAPE[2])
        x = F.relu(self.fc1(x))  # without a non-linearity, fc1 and fc2 would collapse into one linear map
        x = F.relu(self.fc2(x))
        x = self.dropout1(x)
        x = self.fc3(x)
        return x

In PyTorch, each nn.Linear takes the number of nodes on both its input side and its output side.

Dropout comparison

I am not fully certain about this, but in Keras there should be no need to switch Dropout manually between training and evaluation; fit, evaluate, and predict handle it. In PyTorch, Dropout stays active until model.eval() is called, so before running inference on the test images the model is explicitly taken out of training mode:

param = torch.load(weights_folder_path + "/" + best_weights_path)
model.load_state_dict(param, strict=False)
model.eval()
# ~ Inference
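
The effect of the mode switch can be seen on a Dropout layer in isolation. This is a small sketch separate from MyNet; the tensor of ones and the seed are assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
drop = nn.Dropout(p=0.5)
x = torch.ones(1, 8)

drop.train()          # training mode: elements are zeroed at random
y_train = drop(x)     # survivors are scaled by 1 / (1 - p) = 2

drop.eval()           # evaluation mode: Dropout does nothing
y_eval = drop(x)

print(y_train)
print(y_eval)
```

Calling `model.eval()` on a whole model applies the same switch to every Dropout layer inside it, and `model.train()` switches them back.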

Comparison of model_summary (number of parameters)

As you can see, the number of parameters matched exactly.

fig_modelsummary.png
Figure 5. Comparison of model summaries: Keras (left) and PyTorch (right)
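
As a sanity check, the parameter count of a fully connected 28*28*3 → 200 → 2 network can be worked out by hand. This sketch assumes the layer sizes from the Network structure section; `dense_params` is a helper name introduced here for illustration, not part of either framework.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output node."""
    return n_in * n_out + n_out

n_input = 28 * 28 * 3     # flattened RGB image
hidden = 200              # intermediate-layer nodes
n_classes = 2

# Dropout has no trainable parameters, so only the two dense layers count.
total = dense_params(n_input, hidden) + dense_params(hidden, n_classes)
print(total)  # 471002
```

Both frameworks count a dense layer this way, so two summaries of the same architecture should agree.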

GPU usage comparison

A small point: Keras needs no code changes to use the GPU, but in PyTorch the tensors have to be moved to the GPU explicitly, by rewriting the commented-out line as follows.

#image, label = Variable(image), Variable(label)
image, label = Variable(image).cuda(), Variable(label).cuda()
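
The same move can also be written so that the code runs with or without a GPU. The sketch below is an alternative to the `.cuda()` call, not what the program on GitHub does; the tensor shapes are assumptions for illustration. Since PyTorch 0.4, plain tensors can be used without wrapping them in `Variable`.

```python
import torch

# Use the GPU when one is available, otherwise fall back to the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

image = torch.rand(4, 3, 28, 28)       # stand-in batch of images
label = torch.tensor([0, 1, 0, 1])     # stand-in class labels

# .to(device) replaces the hard-coded .cuda() call
image, label = image.to(device), label.to(device)
print(image.device, label.device)
```

The model has to be moved to the same device, for example with `model.to(device)`.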

Learning loop comparison

In Keras, calling model.fit runs the training and evaluation loop for the given number of epochs. In PyTorch, you write the loop yourself with a for statement, repeating for the number of epochs:

def train(epoch):
    ...  # training step (omitted)

def validation():
    ...  # evaluation step (omitted)

for epoch in range(1, total_epochs + 1):
    train(epoch)
    validation()
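
For reference, the omitted bodies could look roughly like the sketch below. It is a stand-in, not the code from GitHub: the tiny linear model, the random data, the learning rate, and the `history` list are all assumptions for illustration.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)                       # stand-in for MyNet
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Random stand-in data: 4 features per sample, 2 classes
x_train, y_train = torch.rand(16, 4), torch.randint(0, 2, (16,))
x_val, y_val = torch.rand(8, 4), torch.randint(0, 2, (8,))

def train(epoch):
    model.train()                             # enable training-time behaviour
    optimizer.zero_grad()                     # clear gradients from the last step
    loss = criterion(model(x_train), y_train)
    loss.backward()                           # backpropagation
    optimizer.step()                          # weight update
    return loss.item()

def validation():
    model.eval()                              # disable Dropout and similar layers
    with torch.no_grad():                     # no gradients needed for evaluation
        return criterion(model(x_val), y_val).item()

total_epochs = 5
history = []
for epoch in range(1, total_epochs + 1):
    history.append((train(epoch), validation()))
print(history)
```

In a real run, `train` would iterate over the mini-batches of a DataLoader instead of using one fixed batch.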

Output comparison

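Also, the PyTorch model above returns raw scores without a softmax, and losses such as nn.CrossEntropyLoss apply log_softmax internally, so the outputs do not sum to 1 the way class probabilities do. To get probabilities, apply softmax to the output yourself (or exponentiate the log_softmax values).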
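
The conversion is a single call. The sketch below uses made-up raw scores for two images; the numbers are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical raw model outputs for 2 images and 2 classes
logits = torch.tensor([[1.2, -0.3],
                       [0.1, 2.0]])

log_probs = F.log_softmax(logits, dim=1)   # log of the class probabilities
probs = F.softmax(logits, dim=1)           # class probabilities

print(log_probs.sum(dim=1))   # does not sum to 1
print(probs.sum(dim=1))       # sums to 1 for each image
print(torch.allclose(log_probs.exp(), probs))
```

Exponentiating the log_softmax output gives the same values as softmax, so either route works when probabilities are needed for display.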

Comparison of both during learning

First, when I checked the operating status of the PC with Task Manager, there were the following differences.

fig4_tm.png
Figure 6. Task Manager performance during training with Keras (left) and PyTorch (right) (per 10 epochs)

Memory usage was lower with PyTorch. Because Keras (in this program) holds the dataset in lists and NumPy arrays, it inevitably consumes more memory. GPU usage was also lower with PyTorch.

Next, I compare the training speed of the Keras and PyTorch networks. The table below summarizes the time [s] required to train each network for 40 epochs.

Network     Keras     PyTorch
ResNet      3520 s    3640 s
Mobilenet   1600 s    1760 s
MyNet       40 s      680 s

For Keras, model.fit was run with verbose = 1, and I read off the seconds it prints automatically. Computing from the time per step would be more accurate, but that is tedious, so these are approximate values. From the table above, PyTorch is slightly slower for ResNet and Mobilenet (about 3 to 4 seconds per epoch), and MyNet in particular is much slower. However, PyTorch seems more economical (?) in memory and GPU usage. I expected PyTorch to be faster, so my code may be at fault. Overall, my impression is that PyTorch saves resources at almost the same speed.
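
The per-epoch gap can be worked out directly from the table. The sketch below only redoes that arithmetic; the dictionary and variable names are made up for this example.

```python
# Total training time in seconds for 40 epochs: (Keras, PyTorch)
times = {
    "ResNet": (3520, 3640),
    "Mobilenet": (1600, 1760),
    "MyNet": (40, 680),
}
epochs = 40

# Extra seconds PyTorch needed per epoch compared with Keras
per_epoch_gap = {
    name: (torch_s - keras_s) / epochs
    for name, (keras_s, torch_s) in times.items()
}
for name, gap in per_epoch_gap.items():
    print(f"{name}: {gap:.1f} s per epoch")
```

This gives 3.0 s for ResNet, 4.0 s for Mobilenet, and 16.0 s for MyNet, so MyNet is where the two implementations differ most.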

Comparison of results

Inferred results of ResNet, Mobilenet, MyNet in Keras

The loss, accuracy, and inference results on the test images after training are summarized below. The learning curves are terrible, but the inference results are reasonable.

fig5_Keras_l_a.png
Figure 7. Loss and accuracy per epoch during training (Keras)
fig6_a.png
Figure 8-a. Inference results with ResNet50 (Keras)
fig6_b.png
Figure 8-b. Inference results with Mobilenet v1 (Keras)
fig6_c.png
Figure 8-c. Inference results with MyNet (Keras)

Inferred results of ResNet, Mobilenet, MyNet in PyTorch

The loss, accuracy, and inference results on the test images after training are summarized below. They are similar to the Keras results.

fig9_l_a.png
Figure 9. Loss and accuracy per epoch during training (PyTorch)
fig10_a.png
Figure 10-a. Inference results with ResNet50 (PyTorch)
fig10_b.png
Figure 10-b. Inference results with Mobilenet v1 (PyTorch)
fig10_c.png
Figure 10-c. Inference results with MyNet (PyTorch)

Based on the results of Keras and PyTorch

Both frameworks show the same tendency (as expected, since I set up the training to be almost identical).

With both Keras and PyTorch, ResNet and Mobilenet classify the test images correctly, but MyNet, an MNIST-level network, does not. However, judging from how the loss decreases, training does not seem to be going well even with ResNet and Mobilenet. I suspect the test images were classified correctly this time only because they resemble the training data. For classes as similar as hook wrenches and spanner wrenches, about 60 images per class seems too few. Moreover, I feel that MyNet could not classify them even with enough data.

Incidentally, the result of training MyNet with 500 intermediate-layer nodes for 100 epochs is as follows.

fig_mynet100.png
Figure 11. Result of training MyNet with 500 intermediate-layer nodes for 100 epochs

The validation loss does not decrease. This is probably a problem that a simple, shallow neural network cannot classify. It needs a different approach, such as adding layers or using a CNN (Convolutional Neural Network).

Bonus

Here I classify the logos of the two manufacturers that appeared in the previous Machine Sommelier article. The logos differ in shape, but can a simple neural network tell them apart? I try this with MyNet. For training and evaluation I used Makino Milling Co., Ltd. and Okuma logos collected online, and for testing I used logos I drew by hand.

makino_logo_test1.png Okuma_logo_test1.png
Figure 12-a. Handwritten Makino Milling logo Figure 12-b. Handwritten Okuma logo

The transition of Loss and Accuracy is as follows.

graph_loss.png graph_acc.png
Figure 13-a. Transition of loss with respect to epoch Figure 13-b. Transition of accuracy with respect to epoch

Training appears to go better than it did for the hook wrench and spanner wrench. The inference results look like this:

2figure.png 4figure.png
Figure 14-a. Inference result for the Makino Milling logo Figure 14-b. Inference result for the Okuma logo

These are classified very well. When the classes differ clearly in shape, as logos do, even a shallow neural network seems able to classify them.

Summary

--A hook wrench and a spanner wrench cannot be classified by a simple neural network.
--Corporate logos can be classified even by a network that is not deep.

Execution environment

  • Windows10
  • CPU:Core i7-7700HQ
  • Memory: 16GB
  • Graphic board: GTX1060 6GB
  • Storage: NVMe M.2 SSD 1TB
  • CUDA 9.0.176
  • cuDNN 7.0.5
  • If CUDA and cuDNN are not installed, the environment has to be set up first.
  • Keras==2.1.5
  • tensorflow-gpu==1.11.0
  • torch==1.1.0
  • scikit-learn==0.19.1
  • scipy==1.4.1

References

https://keras.io/ja/
https://pytorch.org/tutorials/beginner/blitz/tensor_tutorial.html
https://qiita.com/sheep96/items/0c2c8216d566f58882aa
https://rightcode.co.jp/blog/information-technology/pytorch-mnist-learning
https://water2litter.net/rum/post/pytorch_tutorial_classifier/
https://qiita.com/jyori112/items/aad5703c1537c0139edb
https://pystyle.info/pytorch-cnn-based-classification-model-with-fashion-mnist/
https://pytorch.org/docs/stable/torchvision/models.html
https://qiita.com/perrying/items/857df46bb6cdc3047bd8
https://qiita.com/sakaia/items/5e8375d82db197222669
https://discuss.pytorch.org/t/low-accuracy-when-loading-the-model-and-testing/44991/5
