"Deep Learning from scratch" Self-study memo (Part 12) Deep learning

While reading "Deep Learning from scratch" (written by Yasuki Saito, published by O'Reilly Japan), I will make a note of the sites I referred to. Part 11 ←

8.1.1 To a deeper network

In P243, it is explained that the network definition is deep_convnet.py in the folder ch08, the training code is train_deepnet.py, and the learned weight parameter is deep_conv_net_params.pkl, but the weight parameter is read and the test data is read. The program to process is misclassified_mnist.py. This program can be executed by creating JupyterNote in the folder ch08. If you try to run it on the base directory, you will have to change the location and call of deep_convert.py and deep_conv_net_params.pkl.

Execution result of misclassified_mnist.py

The contents of the DeepConvNet class have only been explained so far, so I think it is necessary to check the contents again.

so,

This time, let's use this class to process Kaggle's dog and cat dataset.

Kaggle cat and dog dataset

Data content

Download from the following site https://www.microsoft.com/en-us/download/details.aspx?id=54765

Images are contained in the folders Cat and Dog under the folder PetImages. The file names are serial numbers from 0.jpg to 12499.jpg. This means that there are 12500 images for each dog and cat. The images are in color and are different in size. If you store one 0.jpg of this image in a NumPy array and look at the contents

(375, 500, 3)

It is stored as a three-dimensional array of (height, width, color). The color 0 is red, 1 is green, and 2 is blue. In Memo 6-2, it was converted to grayscale, but this time it is converted to 3 colors and 3 channels of data.

Align the size of the image and convert it to a format that can be processed

Arrange the image data in an array of channel 3, height 80, and width 80.

import os
import glob
from PIL import Image
import numpy as np 

dataset_dir = os.path.dirname(os.path.abspath('__file__'))+'/dataset'
catfiles = glob.glob(dataset_dir + '/Cat/*.jpg')
dogfiles = glob.glob(dataset_dir + '/Dog/*.jpg')

fsize = 80

for f in catfiles:
    try:
        lblA = 0
        pad_u, pad_d, pad_l, pad_r = 0,0,0,0 
        img = Image.open(f)
        w,h=img.size
        if w>h:
            imgr = img.resize((fsize, int(h*fsize/w)))
            wr,hr = imgr.size
            pad_u = int((fsize - hr)/2)
            pad_d = fsize - hr - pad_u
        else:
            imgr = img.resize((int(w*fsize/h),fsize))
            wr,hr = imgr.size
            pad_l = int((fsize - wr)/2)
            pad_r = fsize - wr - pad_l

        imgtr = np.array(imgr).transpose(2,0,1)
        imgA = np.pad(imgtr, [(0, 0),(pad_u,pad_d),(pad_l,pad_r)], 'constant')      
        imgA = imgA.tolist()
    except Exception as e:
        print(f+" : " + str(e))

Information of the image file read by PIL

img = Image.open(f)
print(img.format, img.size, img.mode)

JPEG (500, 375) RGB

The size of the file is (width, height). Converting this to a numpy array, it looks like this:

imgA = np.array(img)
print(imgA.size, imgA.shape)

562500 (375, 500, 3)

(Height, width, color). Based on this, resize the image to 80x80 and swap the dimensional axes to create an array with channel 3, height 80, and width 80.

However, when the above program continuously processes the files in the folder, this error occurs.

C:\Users\021133/dataset/Cat\10125.jpg : axes don't match array C:\Users\021133/dataset/Cat\10501.jpg : axes don't match array C:\Users\021133/dataset/Cat\1074.jpg : Python int too large to convert to C ssize_t C:\Users\021133/dataset/Cat\666.jpg : cannot identify image file

Apparently, there is no dimensional axis around transpose (2,0,1). 1074.jpg is also an error when processed with Memo 6-2. There seems to be an error when reading the file. 666.jpg is a 0 byte empty file.

If you look at 10125.jpg

img=Image.open(dataset_dir + '/10125.jpg')
print(img.format, img.size, img.mode)
imgA = np.array(img)
print(imgA.size, imgA.shape,imgA.ndim)

GIF (259, 346) P 89614 (346, 259) 2

As far as I can see in the viewer, the color image is displayed without any problem. The problem seems to be in P-pallet mode. So I modified the reading method.

img=Image.open(dataset_dir + '/10501.jpg').convert('RGB')
print(img.format, img.size, img.mode)
imgA = np.array(img)
print(imgA.size, imgA.shape,imgA.ndim)

None (400, 299) RGB 358800 (299, 400, 3) 3

Apparently this is fine.

Moved the problem file CAt 1074.jpg 5127.jpg 666.jpg Dog 11702.jpg 829.jpg 8366.jpg to another folder and increased the test data to 100 and the training data to 4000 to avoid memory errors. I decided to squeeze it.

import os
import glob
import numpy as np 

from PIL import Image

dataset_dir = os.path.dirname(os.path.abspath('__file__'))+'/dataset'
catfiles = glob.glob(dataset_dir + '/Cat/*.jpg')
dogfiles = glob.glob(dataset_dir + '/Dog/*.jpg')

fsize = 80

tsl = []
tsi = []
trl = []
tri = []
tst_count = 0
trn_count = 0

count = 0

for f in catfiles:
    try:
        lblA = 0
        img = Image.open(f).convert('RGB')
        w,h=img.size
        pad_u, pad_d, pad_l, pad_r = 0,0,0,0
        if w>h:
            imgr = img.resize((fsize, int(h*fsize/w)))
            wr,hr = imgr.size
            pad_u = int((fsize - hr)/2)
            pad_d = fsize - hr - pad_u
        else:
            imgr = img.resize((int(w*fsize/h),fsize))
            wr,hr = imgr.size
            pad_l = int((fsize - wr)/2)
            pad_r = fsize - wr - pad_l

        imgtr = np.array(imgr).transpose(2,0,1)
        imgA = np.pad(imgtr, [(0, 0),(pad_u,pad_d),(pad_l,pad_r)], 'constant')      
        imgA = imgA.tolist()
    except Exception as e:
        print(f+" : " + str(e))

    if count < 50:
        tsl.append(lblA)
        tsi.append(imgA)
        tst_count += 1
    elif count < 2050:
        trl.append(lblA)
        tri.append(imgA)
        trn_count += 1
    else:
        break

    count += 1
    
count = 0
        
for f in dogfiles:
    try:
        lblA = 1
        img = Image.open(f).convert('RGB')
        w,h=img.size
        pad_u, pad_d, pad_l, pad_r = 0,0,0,0
        if w>h:
            imgr = img.resize((fsize, int(h*fsize/w)))
            wr,hr = imgr.size
            pad_u = int((fsize - hr)/2)
            pad_d = fsize - hr - pad_u
        else:
            imgr = img.resize((int(w*fsize/h),fsize))
            wr,hr = imgr.size
            pad_l = int((fsize - wr)/2)
            pad_r = fsize - wr - pad_l

        imgtr = np.array(imgr).transpose(2,0,1)
        imgA = np.pad(imgtr, [(0, 0),(pad_u,pad_d),(pad_l,pad_r)], 'constant')      
        imgA = imgA.tolist()

    except Exception as e:
        print(f+" : " + str(e))
    
    if count < 50:
        tsl.append(lblA)
        tsi.append(imgA)
        tst_count += 1
    elif count < 2050:
        trl.append(lblA)
        tri.append(imgA)
        trn_count += 1
    else:
        break
        
    count += 1

dataset = {}
dataset['test_label']  = np.array(tsl, dtype=np.uint8)
dataset['test_img']    = np.array(tsi, dtype=np.uint8)
dataset['train_label'] = np.array(trl, dtype=np.uint8)
dataset['train_img']   = np.array(tri, dtype=np.uint8) 

import pickle

save_file = dataset_dir + '/catdog.pkl'    
with open(save_file, 'wb') as f:
    pickle.dump(dataset, f, -1)

dataset['test_img'].shape

(100, 3, 80, 80)

dataset['train_img'].shape

(4000, 3, 80, 80)

DeepConvNet class

The code for the DeepConvNet class is in deep_convnet.py in the folder ch08, but it seems that there is a part that matches the input of (1,28,28).

    def __init__(self, input_dim=(1, 28, 28),
                 conv_param_1 = {'filter_num':16, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_2 = {'filter_num':16, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_3 = {'filter_num':32, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_4 = {'filter_num':32, 'filter_size':3, 'pad':2, 'stride':1},
                 conv_param_5 = {'filter_num':64, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_6 = {'filter_num':64, 'filter_size':3, 'pad':1, 'stride':1},
                 hidden_size=50, output_size=10):

input_dim = (1, 28, 28) is natural, but this is not a problem as you can give it another value as a parameter. The problem is the size of the weight W7.

        #Weight initialization===========
        #How many connections each neuron in each layer has with the neurons in the presheaf (TODO):Calculate automatically)
        pre_node_nums = np.array([1*3*3, 16*3*3, 16*3*3, 32*3*3, 32*3*3, 64*3*3, 64*4*4, hidden_size])
        weight_init_scales = np.sqrt(2.0 / pre_node_nums)  #Recommended initial value when using ReLU

Here, you can specify the channel x height x width of the filter used in each layer, but this will happen when the input data becomes (3,80,80).

        pre_node_nums = np.array([3*3*3, 16*3*3, 16*3*3, 32*3*3, 32*3*3, 64*3*3, 64*10*10, hidden_size])

The size of the weight W7 was also like this,

        self.params['W7'] = weight_init_scales[6] * np.random.randn(64*4*4, hidden_size)

If you do not change this, the program will not work properly.

        self.params['W7'] = weight_init_scales[6] * np.random.randn(64*10*10, hidden_size)

The reason why it is 10x10 is that it is based on the height x width of the input data. The input data this time is 80x80, but if it passes through the pooling layer once in the middle, it will be half the size. In this class definition, it passes 3 times, so 80 → 40 → 20 → 10. The 4x4 of the original program is 28 → 14 → 7 → 4.

Learning process

import sys, os
import pickle
import numpy as np
from common.functions import *
from common.optimizer import *
from deep_convnet import DeepConvNet

def to_one_hot(label):
    t = np.zeros((label.size, 2))
    for i in range(label.size):
        t[i][label[i]] = 1
    return t

dataset_dir = os.path.dirname(os.path.abspath('__file__'))+'/dataset'

mnist_file = dataset_dir + '/catdog.pkl'
with open(mnist_file, 'rb') as f:
    dataset = pickle.load(f)
x_train = dataset['train_img']
t_train = to_one_hot(dataset['train_label'])
    
#Hyperparameters
iters_num = 30
train_size = x_train.shape[0]
batch_size = 12
learning_rate = 0.1

train_loss_list = []

network = DeepConvNet( input_dim=(3, 80, 80),
                 conv_param_1 = {'filter_num':16, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_2 = {'filter_num':16, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_3 = {'filter_num':32, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_4 = {'filter_num':32, 'filter_size':3, 'pad':2, 'stride':1},
                 conv_param_5 = {'filter_num':64, 'filter_size':3, 'pad':1, 'stride':1},
                 conv_param_6 = {'filter_num':64, 'filter_size':3, 'pad':1, 'stride':1},
                 hidden_size=50, output_size=2)

optimizer = Adam(lr=learning_rate)

for i in range(iters_num):
    #Get a mini batch
    batch_mask = np.random.choice(train_size, batch_size)
    x_batch = x_train[batch_mask]
    t_batch = t_train[batch_mask]
    
    #Gradient calculation
    grads = network.gradient(x_batch, t_batch) 
    optimizer.update(network.params, grads)
    
    #Record of learning progress
    loss = network.loss(x_batch, t_batch)
    train_loss_list.append(loss)

#Save the network object with pickle. Saved objects are used for inference processing
import pickle
save_file = dataset_dir + '/catdogA.pkl'
with open(save_file, 'wb') as f:
    pickle.dump(network, f, -1)

The program worked, but I couldn't actually learn it. Even if I try inference processing, the correct answer rate does not reach 50%. After all, isn't it wrong that the number of data to be trained in one batch is only 12? However, if you process it any further, a memory error will occur.

network.layers

[common.layers.Convolution at 0x3610030, common.layers.Relu at 0xbad8170, common.layers.Convolution at 0xbabf990, common.layers.Relu at 0xbabf870, common.layers.Pooling at 0xbabf950, common.layers.Convolution at 0xbabf430, common.layers.Relu at 0xbabf0f0, common.layers.Convolution at 0xbabf230, common.layers.Relu at 0xbabf570, common.layers.Pooling at 0xbabf130, common.layers.Convolution at 0xbabf4d0, common.layers.Relu at 0xbabf1f0, common.layers.Convolution at 0xbabf210, common.layers.Relu at 0xbabf190, common.layers.Pooling at 0xbabf9f0, common.layers.Affine at 0xbabf970, common.layers.Relu at 0xbabf270, common.layers.Dropout at 0xbabf9b0, common.layers.Affine at 0xbabf470, common.layers.Dropout at 0xbabf370]

print(x_batch.shape)
print(network.params['W1'].shape)
print(network.params['W2'].shape)
print(network.params['W3'].shape)
print(network.params['W4'].shape)
print(network.params['W5'].shape)
print(network.params['W6'].shape)
print(network.params['W7'].shape)
print(network.params['W8'].shape)

(12, 3, 80, 80) (16, 3, 3, 3) (16, 16, 3, 3) (32, 16, 3, 3) (32, 32, 3, 3) (64, 32, 3, 3) (64, 64, 3, 3) (6400, 50) (50, 2)

Inference processing

#Evaluation with test data
import numpy as np
import sys, os
import pickle
from deep_convnet import DeepConvNet

dataset_dir = os.path.dirname(os.path.abspath('__file__'))+'/dataset'

mnist_file = dataset_dir + '/catdog.pkl'
with open(mnist_file, 'rb') as f:
    dataset = pickle.load(f)
x_test = dataset['test_img']
t_test = dataset['test_label']

test_size = 10
test_mask = np.random.choice(100, test_size)
x = x_test[test_mask]
t = t_test[test_mask]
 
#network = DeepConvNet()
weight_file = dataset_dir + '/catdogA.pkl'
with open(weight_file, 'rb') as f:
    network = pickle.load(f)

y = network.predict(x)
accuracy_cnt = 0
for i in range(len(y)):
    p= np.argmax(y[i]) 
    if p == t[i]:
        accuracy_cnt += 1

print("Accuracy:" + str(float(accuracy_cnt) / len(x)))

Accuracy:0.5

import matplotlib.pyplot as plt
class_names = ['cat', 'dog']

def showImg(x):
    example = x.transpose(1,2,0)
    plt.figure()
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(example, cmap=plt.cm.binary)
    plt.show()
    return

for i in range(test_size):
    c = t[i]
    print("Correct answer" + str(c) + " " + class_names[c])
    p = np.argmax(y[i])
    v = y[p]
    print("Judgment" + str(p) + " " + class_names[p] + " " + str(v) )
    showImg(x[i])

The probability of the judgment result is a very small value. After all, it may not have been processed normally. However, it is difficult to repeat the test in an environment where a memory error occurs immediately ...

I looked at the contents of the weight

import numpy as np
import matplotlib.pyplot as plt
dataset_dir = os.path.dirname(os.path.abspath('__file__'))+'/dataset'

def filter_show(filters, nx=8, margin=3, scale=10):
    """
    c.f. https://gist.github.com/aidiary/07d530d5e08011832b12#file-draw_weight-py
    """
    FN, C, FH, FW = filters.shape
    ny = int(np.ceil(FN / nx))

    fig = plt.figure()
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
    
    for i in range(FN):
        ax = fig.add_subplot(ny, nx, i+1, xticks=[], yticks=[])
        ax.imshow(filters[i][0], cmap=plt.cm.binary, interpolation='nearest')

    plt.show()

#network = DeepConvNet()
weight_file = dataset_dir + '/catdogA.pkl'
with open(weight_file, 'rb') as f:
    network = pickle.load(f)

filter_show(network.params['W1'])

It doesn't look like there is a pattern.

Part 11 ←

Referenced site

Difference between P-mode and L-mode images of PIL