Effects of image rotation, enlargement, and color changes on convolutional neural networks (CNN)

To understand the characteristics of convolutional neural networks (CNN) and of other machine learning methods (gradient boosting, multilayer perceptron), I examined how their classification performance changes when the images are rotated, enlarged, or recolored.

The images were generated with the method from "Automatically generate images of koala and bear. Can you distinguish between koalas and bears by silhouette?".
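For reference, a minimal stand-in generator is sketched below. It is not the actual code from that article: the ellipse "silhouettes" are placeholders of my own, and num_data = 200 is inferred from the accuracies reported later (they imply 400 images in total, 360 for training and 40 for testing). It only exists so the loading code below has files to read.

# Hypothetical stand-in for the image generator (NOT the original
# article's code): draws a crude dark ellipse as a placeholder
# silhouette and saves files under the names the loader below expects.
import os
from PIL import Image, ImageDraw

num_data = 200  # images per class (assumed; the accuracies below imply 400 total)
os.makedirs("koala_or_bear", exist_ok=True)

for name in ("koala", "bear"):
    for i in range(num_data):
        img = Image.new("RGB", (128, 128), (255, 255, 255))
        draw = ImageDraw.Draw(img)
        # Placeholder shape; the real article draws koala/bear silhouettes
        rx = 30 if name == "koala" else 40
        draw.ellipse((64 - rx, 34, 64 + rx, 94), fill=(0, 0, 0))
        img.save("koala_or_bear/{}_{}.jpg".format(name, i))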

Code

Reading the image data

from PIL import Image

# num_data: number of images per class (set when the images were generated)
koalas = []
for i in range(num_data):
    koala = Image.open("koala_or_bear/koala_{}.jpg".format(i))
    koalas.append(koala)

bears = []
for i in range(num_data):
    bear = Image.open("koala_or_bear/bear_{}.jpg".format(i))
    bears.append(bear)

Displaying the first few images

%matplotlib inline
import matplotlib.pyplot as plt

# Show the first 8 koalas and the first 8 bears in a 4x4 grid
fig = plt.figure(figsize=(10, 10))
for i in range(16):
    ax = fig.add_subplot(4, 4, i + 1)
    ax.axis('off')
    if i < 8:
        ax.set_title('koala_{}'.format(i))
        ax.imshow(koalas[i])
    else:
        ax.set_title('bear_{}'.format(i - 8))
        ax.imshow(bears[i - 8])
plt.show()

Explanatory variables / objective variable

import numpy as np

X = []  # explanatory variables (the images)
Y = []  # objective variable (0 = koala, 1 = bear)

def append_images(images, label):
    """Resize to 128x128 and split into RGB channels scaled to [0, 1]."""
    for img in images[:num_data]:
        resize_img = img.resize((128, 128))
        channels = [np.asarray(np.float32(c) / 255.0)
                    for c in resize_img.split()]
        X.append(np.asarray(channels))
        Y.append(label)

append_images(koalas, 0)
append_images(bears, 1)

X = np.array(X, dtype='float32')
Y = np.array(Y, dtype='int64')

Splitting into training and test sets

from sklearn import model_selection
X_train, X_test, Y_train, Y_test = model_selection.train_test_split(
    X, Y, test_size=0.1
)

Data type conversion for scikit-learn

# Flatten each (3, 128, 128) image into a single feature vector
d1, d2, d3, d4 = X_train.shape
X_train_a = X_train.reshape((d1, d2 * d3 * d4))
Y_train_onehot = np.identity(2)[Y_train]  # one-hot labels (not used by the classifiers below)
d1, d2, d3, d4 = X_test.shape
X_test_a = X_test.reshape((d1, d2 * d3 * d4))
Y_test_onehot = np.identity(2)[Y_test]

Data type conversion for PyTorch

import torch
from torch.utils.data import TensorDataset
from torch.utils.data import DataLoader

# Plain tensors are enough here; torch.autograd.Variable is deprecated
# and is a no-op wrapper in modern PyTorch.
X_train_t = torch.from_numpy(X_train).float()
Y_train_t = torch.from_numpy(Y_train).long()

X_test_t = torch.from_numpy(X_test).float()
Y_test_t = torch.from_numpy(Y_test).long()

train = TensorDataset(X_train_t, Y_train_t)
train_loader = DataLoader(train, batch_size=32, shuffle=True)

Gradient boosting

%%time
from sklearn.ensemble import GradientBoostingClassifier

classifier = GradientBoostingClassifier()
classifier.fit(X_train_a, Y_train)
print("Accuracy score (train): ", classifier.score(X_train_a, Y_train))
print("Accuracy score (test): ", classifier.score(X_test_a, Y_test))

Multilayer perceptron (one hidden layer)

%%time
from sklearn.neural_network import MLPClassifier

classifier = MLPClassifier(max_iter=10000, early_stopping=True)
classifier.fit(X_train_a, Y_train)
print("Accuracy score (train): ", classifier.score(X_train_a, Y_train))
print("Accuracy score (test): ", classifier.score(X_test_a, Y_test))

Multilayer perceptron (two hidden layers)

%%time
from sklearn.neural_network import MLPClassifier
classifier = MLPClassifier(max_iter=10000, early_stopping=True,
                           hidden_layer_sizes=(100, 100))
classifier.fit(X_train_a, Y_train)
print("Accuracy score (train): ", classifier.score(X_train_a, Y_train))
print("Accuracy score (test): ", classifier.score(X_test_a, Y_test))

Convolutional Neural Network (CNN)

Network definition

class CNN(torch.nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        # Input is 3 x 128 x 128; each 5x5 conv shrinks the sides by 4,
        # and each 2x2 max-pool halves them:
        # 128 -> conv1 -> 124 -> pool -> 62 -> conv2 -> 58 -> pool -> 29
        self.conv1 = torch.nn.Conv2d(3, 10, 5)
        self.conv2 = torch.nn.Conv2d(10, 20, 5)
        self.fc1 = torch.nn.Linear(20 * 29 * 29, 50)
        self.fc2 = torch.nn.Linear(50, 2)

    def forward(self, x):
        x = torch.nn.functional.relu(self.conv1(x))
        x = torch.nn.functional.max_pool2d(x, 2)
        x = torch.nn.functional.relu(self.conv2(x))
        x = torch.nn.functional.max_pool2d(x, 2)
        x = x.view(-1, 20 * 29 * 29)  # flatten to a vector
        x = torch.nn.functional.relu(self.fc1(x))
        # Output log-probabilities (paired with NLLLoss below)
        x = torch.nn.functional.log_softmax(self.fc2(x), 1)
        return x

Checking the network structure

from torchsummary import summary
model = CNN()
summary(model, X[0].shape)
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 10, 124, 124]             760
            Conv2d-2           [-1, 20, 58, 58]           5,020
            Linear-3                   [-1, 50]         841,050
            Linear-4                    [-1, 2]             102
================================================================
Total params: 846,932
Trainable params: 846,932
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.19
Forward/backward pass size (MB): 1.69
Params size (MB): 3.23
Estimated Total Size (MB): 5.11
----------------------------------------------------------------

Training function

def learn(model, criterion, optimizer, n_iteration):
    """Train for n_iteration epochs, appending each epoch's total loss
    to the global loss_history list."""
    for epoch in range(n_iteration):
        total_loss = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            y_pred = model(x)
            loss = criterion(y_pred, y)
            loss.backward()
            optimizer.step()
            total_loss += loss.item()

        if (epoch + 1) % 10 == 0:
            print(epoch + 1, total_loss)

        if total_loss == 0.0:  # converged: nothing left to learn
            break

        loss_history.append(total_loss)

    return model

Running the training

#%%time
model = CNN()
# The model outputs log-probabilities (log_softmax), so use NLLLoss;
# CrossEntropyLoss would apply log_softmax a second time.
criterion = torch.nn.NLLLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_history = []
model = learn(model, criterion, optimizer, 300)

Loss history display

# Total loss per epoch, on linear (top) and log (bottom) scales
ax = plt.subplot(2, 1, 1)
ax.plot(loss_history)
ax.grid()
ax = plt.subplot(2, 1, 2)
ax.plot(loss_history)
ax.set_yscale('log')
ax.grid()
plt.show()

Accuracy

Accuracy (training set)

with torch.no_grad():
    Y_pred = torch.max(model(X_train_t), 1)[1]
accuracy = (Y_train == Y_pred.numpy()).mean()
print(accuracy)

Accuracy (test set)

with torch.no_grad():
    Y_pred = torch.max(model(X_test_t), 1)[1]
accuracy = (Y_test == Y_pred.numpy()).mean()
print(accuracy)

Results

The first experiment is the simplest: telling plain koalas from plain bears.

Basic

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 1min 2s | 1.0 | 1.0 |
| Multilayer perceptron (one hidden layer) | 50.9 s | 1.0 | 1.0 |
| Multilayer perceptron (two hidden layers) | 35.1 s | 1.0 | 1.0 |
| Convolutional neural network (CNN) | - | 1.0 | 1.0 |

Every method classified perfectly.

Enlargement

Let's enlarge the koalas and bears by random factors, at the same time shifting them slightly vertically and horizontally.
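A sketch of one way such a random enlargement and shift could be done with PIL; the scale range and shift limits here are my own assumptions, not the article's actual values.

import random
from PIL import Image

def random_zoom_shift(img, scale_range=(1.0, 1.5), max_shift=10):
    """Scale the image by a random factor, then paste it back onto a
    same-sized white canvas with a small random vertical/horizontal offset."""
    w, h = img.size
    scale = random.uniform(*scale_range)
    resized = img.resize((int(w * scale), int(h * scale)))
    canvas = Image.new("RGB", (w, h), (255, 255, 255))
    # Center the scaled image, then jitter it a little
    dx = random.randint(-max_shift, max_shift) - (resized.width - w) // 2
    dy = random.randint(-max_shift, max_shift) - (resized.height - h) // 2
    canvas.paste(resized, (dx, dy))
    return canvas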

Enlargement

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 5min 11s | 1.0 | 0.975 |
| Multilayer perceptron (one hidden layer) | 1min 4s | 1.0 | 1.0 |
| Multilayer perceptron (two hidden layers) | 1min 4s | 0.977 | 1.0 |
| Convolutional neural network (CNN) | - | 1.0 | 1.0 |

Gradient boosting's accuracy dropped slightly. The convolutional neural network (CNN) remained perfect.

With obstacles

Now let's draw distracting objects in the background.
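One possible way to scatter such obstacles with ImageDraw, sketched here as an assumption; the real generator presumably draws them behind the animal, while this sketch simply draws on top of the given image.

import random
from PIL import ImageDraw

def add_obstacles(img, n_shapes=5):
    """Scatter a few random gray rectangle outlines around the image.
    Hypothetical: the article's actual obstacles may look different."""
    draw = ImageDraw.Draw(img)
    w, h = img.size
    for _ in range(n_shapes):
        x0, y0 = random.randint(0, w - 1), random.randint(0, h - 1)
        x1, y1 = x0 + random.randint(5, 30), y0 + random.randint(5, 30)
        gray = random.randint(100, 200)
        draw.rectangle((x0, y0, x1, y1), outline=(gray, gray, gray))
    return img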

With obstacles

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 1min 23s | 1.0 | 1.0 |
| Multilayer perceptron (one hidden layer) | 38.3 s | 1.0 | 1.0 |
| Multilayer perceptron (two hidden layers) | 1min 3s | 0.983 | 1.0 |
| Convolutional neural network (CNN) | - | 1.0 | 1.0 |

It seems to have almost no effect.

Coloring only the background

Let's make the background colorful.

Only the background colored

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 1min 30s | 1.0 | 1.0 |
| Multilayer perceptron (one hidden layer) | 43.9 s | 0.9916 | 1.0 |
| Multilayer perceptron (two hidden layers) | 41.6 s | 1.0 | 1.0 |
| Convolutional neural network (CNN) | - | 1.0 | 1.0 |

This also seems to have little effect.

Coloring only the animals

Now let's make the koalas and bears themselves colorful.
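If the animal pixels are identifiable, for instance because the silhouette is dark, recoloring only the animal might look like the sketch below. This is hypothetical: the actual generator presumably fills the silhouette with a random color as it draws it, and the threshold here is an assumption.

import random
import numpy as np
from PIL import Image

def recolor_animal(img, threshold=100):
    """Replace dark (near-black silhouette) pixels with one random color,
    leaving the background untouched. Hypothetical implementation."""
    a = np.array(img)
    mask = a.mean(axis=2) < threshold  # assume the animal is the dark region
    a[mask] = [random.randint(0, 255) for _ in range(3)]
    return Image.fromarray(a)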

Only the animals colored

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 4min 26s | 1.0 | 0.975 |
| Multilayer perceptron (one hidden layer) | 27.8 s | 0.494 | 0.55 |
| Multilayer perceptron (two hidden layers) | 1min 9s | 0.816 | 0.775 |
| Convolutional neural network (CNN) | - | 0.505 | 0.45 |

Prediction accuracy dropped considerably, and the convolutional neural network (CNN) fell to chance level. Huh? Surprisingly, the multilayer perceptron (two hidden layers) holds up fairly well, and gradient boosting barely deteriorated at all. Impressive.

Enlargement / coloring / obstacles

Enlargement, coloring, obstacles

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 7min 24s | 1.0 | 0.9 |
| Multilayer perceptron (one hidden layer) | 42.4 s | 0.6861 | 0.6 |
| Multilayer perceptron (two hidden layers) | 1min 50s | 0.925 | 0.75 |
| Convolutional neural network (CNN) | - | 0.5 | 0.5 |

The task has become harder, but gradient boosting still does quite well. The convolutional neural network (CNN) is completely lost.

Enlargement / coloring / animals in black / obstacles

Enlargement, coloring, animals in black, obstacles

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 6min 12s | 1.0 | 0.975 |
| Multilayer perceptron (one hidden layer) | 1min 1s | 0.9916 | 0.975 |
| Multilayer perceptron (two hidden layers) | 1min 12s | 1.0 | 1.0 |
| Convolutional neural network (CNN) | - | 1.0 | 1.0 |

Performance recovered for every method simply by unifying the koalas' and bears' color to black. Apparently the animals' color, not their shape, was what the models were keying on.

Rotation

The koalas spin, and so do the bears.
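With PIL, a random rotation is nearly a one-liner; here is a sketch (the white fill color for the exposed corners is an assumption):

import random

def random_rotate(img):
    """Rotate by a random angle, filling the exposed corners with white."""
    angle = random.uniform(0, 360)
    return img.rotate(angle, fillcolor=(255, 255, 255))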

Rotation

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 3min 10s | 1.0 | 0.925 |
| Multilayer perceptron (one hidden layer) | 27.4 s | 0.5 | 0.5 |
| Multilayer perceptron (two hidden layers) | 1min 20s | 0.994 | 1.0 |
| Convolutional neural network (CNN) | - | 1.0 | 1.0 |

The multilayer perceptron (one hidden layer) failed, but the other methods handled rotation well enough. Gradient boosting seems slightly weak to rotation.

Rotation / enlargement

Let's enlarge the images while rotating them.

Rotation, enlargement, grayscale

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 5min 28s | 1.0 | 0.775 |
| Multilayer perceptron (one hidden layer) | 1min 33s | 0.825 | 0.7 |
| Multilayer perceptron (two hidden layers) | 30.9 s | 0.65 | 0.675 |
| Convolutional neural network (CNN) | - | 0.505 | 0.45 |

Rotation alone is fine, and enlargement alone is fine, but combining the two seems to confuse every method. Even so, gradient boosting is hanging in there.

Rotation / enlargement / coloring / animals in black / obstacles

Rotation, enlargement, coloring, animals in black, obstacles

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 7min 6s | 1.0 | 0.6 |
| Multilayer perceptron (one hidden layer) | 29.5 s | 0.5194 | 0.325 |
| Multilayer perceptron (two hidden layers) | 33 s | 0.572 | 0.65 |
| Convolutional neural network (CNN) | - | 0.5194 | 0.325 |

Individually, each factor (rotation, enlargement, obstacles, coloring) is not much of a problem, but when the factors are combined, things get difficult.

All factors combined

All factors combined

| Method | Training time | Accuracy (training set) | Accuracy (test set) |
| --- | --- | --- | --- |
| Gradient boosting | 7min 55s | 1.0 | 0.45 |
| Multilayer perceptron (one hidden layer) | 31.8 s | 0.505 | 0.45 |
| Multilayer perceptron (two hidden layers) | 55.7 s | 0.6027 | 0.45 |
| Convolutional neural network (CNN) | - | 0.505 | 0.45 |

With every factor piled on at once, every method falls apart.

Summary

I started this to get a feel for convolutional neural networks (CNN), secretly hoping for a "CNN wins hands down!" result, but the conclusion turned out to be "Gradient boosting is amazing!" (ヽ´ω`)
