The moon and the soft-shelled turtle are both round, but the difference between them is so great that they cannot be compared. It's a proverb for two things that look alike yet are utterly different. https://dictionary.goo.ne.jp/word/%E6%9C%88%E3%81%A8%E9%BC%88/
Since there seems to be such a big difference, let's see whether deep learning can tell them apart by image recognition!
I'll also explain a little PyTorch along the way. (If I've gotten something wrong, please point it out. Thank you.)
The code is here. https://github.com/kyasby/Tuki-Suppon.git
"Moon and soft-shelled turtle"
It seems to be similar and different.
pytorch's "torch vision.datasets.ImageFolder」
I made it because there weren't many articles that used torchvision.datasets.ImageFolderof
pytorch, which corresponds to keras from_from_directry
.
If you put an image in the folder, it will be labeled automatically. Convenient.
pytorch "torch.utils.data.random_split」
Thanks to this, there is no need to separate train and test when putting photos in a folder.
From Google Images:
・67 images of soft-shelled turtles. I collected images in which the shell is seen from above. For example, an image like this. (Pii-san's soft-shelled turtle) http://photozou.jp/photo/show/235691/190390795
・70 images of the moon. I collected images of round moons and cropped them by hand so that a large circle fills the frame. For example, an image like this.
.
├── main.ipynb
└── pics
    ├── tuki
    │   ├── tuki1.png
    │   └── tuki2.png
    └── kame
        ├── kame1.png
        └── kame2.png
Since the images are divided into directories, use torchvision.datasets.ImageFolder to label each directory automatically.
import matplotlib.pyplot as plt
import numpy as np
import copy
import time
import os
from tqdm import tqdm
import torchvision.transforms as transforms
import torchvision.models as models
import torchvision
import torch.nn as nn
import torch
transform_dict = {
'train': transforms.Compose(
[transforms.Resize((256,256)),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
]),
'test': transforms.Compose(
[transforms.Resize((256,256)),
transforms.ToTensor(),
transforms.Normalize(mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225]),
])}
Create a preprocessing dictionary for train and test.
transforms.Compose builds a preprocessing pipeline; the transforms are applied in the order they are passed in.
This time:
transforms.Resize((256, 256))
→ Resizes the image to 256x256.
transforms.RandomHorizontalFlip()
→ Randomly flips the image horizontally.
transforms.ToTensor()
→ Converts a PIL image or numpy.ndarray ((height x width x channel), values 0 to 255) into a Tensor ((channel x height x width), values 0.0 to 1.0).
Note that while PIL and numpy order images as (height x width x channel), PyTorch uses (channel x height x width), which is apparently easier to process.
transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
→ Normalizes each RGB channel with the given mean and standard deviation (the ImageNet statistics).
Documentation: https://pytorch.org/docs/stable/torchvision/transforms.html
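As a quick sanity check (my own sketch, not from the original article), you can apply the train transform to a dummy PIL image and confirm the output shape and channel ordering:

from PIL import Image
# My addition: verify the (channel x height x width) ordering.
dummy = Image.fromarray(np.uint8(np.random.rand(300, 400, 3) * 255))  # H x W x C
out = transform_dict["train"](dummy)
print(out.shape)  # => torch.Size([3, 256, 256])  (channel x height x width)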
# ex.
# data_folder = "./pics"
# phase = "train"
data = torchvision.datasets.ImageFolder(root=data_folder, transform=transform_dict[phase])
Create a dataset from the above directory.
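Incidentally (a quick check I added, not in the original article), ImageFolder assigns integer labels to the subdirectories in alphabetical order, which you can inspect like this:

# My addition: the label mapping ImageFolder inferred from the folder names.
print(data.classes)       # => ['kame', 'tuki']
print(data.class_to_idx)  # => {'kame': 0, 'tuki': 1}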
# ex.
# train_ratio = 0.8
train_size = int(train_ratio * len(data))
# int() truncates to an integer.
val_size = len(data) - train_size
data_size = {"train": train_size, "val": val_size}
# => {"train": 112, "val": 28}
data_train, data_val = torch.utils.data.random_split(data, [train_size, val_size])
torch.utils.data.random_split(dataset, lengths)
splits the dataset randomly and without overlap.
dataset is, of course, the dataset to split, and lengths is a list of the sizes you want for each split.
I also stored the train and val sizes in a dictionary.
# ex.
# data_train => Subset(data, [4, 5, 1, 7])
# data_val   => Subset(data, [3, 8, 2, 6])
There are as many return values as entries in lengths. Each return value is a Subset holding the dataset and a list of index numbers.
(What is a Subset? See the sketch below.)
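A Subset is just a thin wrapper pairing the parent dataset with a list of indices; indexing the Subset looks the index up in the parent. A minimal sketch (my addition, with made-up indices):

from torch.utils.data import Subset
# My addition: sub[i] returns data[indices[i]].
sub = Subset(data, [4, 5, 1, 7])
print(len(sub))      # => 4
img, label = sub[0]  # the same sample as data[4]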
train_loader = torch.utils.data.DataLoader(data_train, batch_size=batch_size, shuffle=True)
val_loader = torch.utils.data.DataLoader(data_val, batch_size=batch_size, shuffle=False)
dataloaders = {"train":train_loader, "val":val_loader}
Create the data loaders. In PyTorch, you load data through DataLoaders like this. I put these in a dictionary as well.
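To see what a DataLoader yields (my own sketch; batch_size is whatever you defined above, e.g. 8):

# My addition: each iteration yields a batch of images and their integer labels.
images, labels = next(iter(train_loader))
print(images.shape)  # => torch.Size([8, 3, 256, 256])  (batch x channel x height x width)
print(labels)        # => e.g. tensor([0, 1, 1, 0, 1, 0, 0, 1])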
def imshow(img):
    img = img / 2 + 0.5  # roughly undo the normalization for display
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))  # (C, H, W) -> (H, W, C) for matplotlib
    plt.show()

# Fetch one batch of training data at random
dataiter = iter(dataloaders["train"])
images, labels = next(dataiter)

# Display the images
imshow(torchvision.utils.make_grid(images))
# Display the labels
print(' '.join('%5s' % data.classes[labels[j]] for j in range(len(labels))))
The code above displays something like this. I adapted it from here: https://qiita.com/kuto/items/0ff3ccb4e089d213871d
model = models.resnet18(pretrained=True)
for param in model.parameters():
print(param)
# => Parameter containing:
#tensor([[[[-1.0419e-02, -6.1356e-03, -1.8098e-03, ..., 5.6615e-02,
# 1.7083e-02, -1.2694e-02],
# ...
# -7.1195e-02, -6.6788e-02]]]], requires_grad=True)
The model is ResNet18. Passing pretrained=True as an argument gives you the trained model.
For transfer learning, the existing parameters are left as they are, without further training.
Weights shown with requires_grad=True get updated during training; to prevent them from being updated, set requires_grad as shown below.
model
# => ResNet(
# (conv1): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3), bias=False)
# (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# (relu): ReLU(inplace=True)
# (maxpool): MaxPool2d(kernel_size=3, stride=2, padding=1, dilation=1, ceil_mode=False)
# (layer1): Sequential(
# (0): BasicBlock(
# (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
# (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# (relu): ReLU(inplace=True)
# (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
# (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
# )
# ...
# (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
# (fc): Linear(in_features=512, out_features=1000, bias=True)
# )
The printout shows that the final layer is (fc), so:

for p in model.parameters():
    p.requires_grad = False
model.fc = nn.Linear(512, 2)

This pulls out all the parameters with model.parameters(), sets requires_grad = False, and then overwrites the final layer. (The freshly created nn.Linear defaults to requires_grad = True, so only it will be trained.)
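To confirm the freeze worked (a quick check I added, not in the original):

# My addition: after freezing, only the new fc layer still requires gradients.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # => ['fc.weight', 'fc.bias']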
model = model.cuda()  # If you don't have a GPU, you don't need this line.
lr = 1e-4
epoch = 40
optim = torch.optim.Adam(model.parameters(), lr=lr, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss().cuda()  # Without a GPU, the .cuda() is not needed.
If you want to use the GPU, you need to send the model to the GPU.
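As an aside (my own note, not in the original), a device-agnostic idiom avoids sprinkling .cuda() calls around:

# My addition: pick the device once and move the model and loss onto it.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)
criterion = nn.CrossEntropyLoss().to(device)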
The model setup is almost straight from the tutorial. https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
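Note that train_model below accepts a scheduler argument but never uses it. If you wanted the tutorial's learning-rate decay, you could pass something like the following (my own sketch, not used in this article) and also call scheduler.step() once per epoch inside the loop:

# My addition (unused here): the tutorial-style scheduler that decays
# the learning rate by 10x every 7 epochs.
from torch.optim import lr_scheduler
scheduler = lr_scheduler.StepLR(optim, step_size=7, gamma=0.1)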
def train_model(model, criterion, optimizer, scheduler=None, num_epochs=25):
    # Returns a bool: is a GPU available?
    use_gpu = torch.cuda.is_available()
    # Start time
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    # Dictionaries of lists for recording the progress.
    loss_dict = {"train": [], "val": []}
    acc_dict = {"train": [], "val": []}

    for epoch in tqdm(range(num_epochs)):
        if (epoch+1) % 5 == 0:  # Print the epoch once every five epochs.
            print('Epoch {}/{}'.format(epoch, num_epochs - 1))
            print('-' * 10)

        # Run train and val in each epoch.
        # The dictionaries pay off here: train and val can be handled in one loop.
        for phase in ['train', 'val']:
            if phase == 'train':
                model.train()  # Training mode. Dropout etc. are active.
            else:
                model.eval()   # Inference mode. No dropout.

            running_loss = 0.0
            running_corrects = 0

            # The dataset created by ImageFolder yields (image, label) pairs.
            for inputs, labels in dataloaders[phase]:
                # Not required if you don't use a GPU
                if use_gpu:
                    inputs = inputs.cuda()
                    labels = labels.cuda()

                # ~~~~~~~~~~~~~~ forward ~~~~~~~~~~~~~~~
                outputs = model(inputs)
                _, preds = torch.max(outputs.data, 1)
                # torch.max returns (values, indices).
                # e.g. torch.max(tensor([[0.8, 0.1]]), 1) => (tensor([0.8]), tensor([0]))
                # The second argument is the dimension along which to take
                # the maximum (row direction or column direction).
                loss = criterion(outputs, labels)

                if phase == 'train':
                    optimizer.zero_grad()
                    loss.backward()
                    optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels)
                # (preds == labels) is something like [True, True, False], but
                # since Python's True and False correspond to 1 and 0,
                # they can be added up with sum.

            # Divide by the number of samples to get the averages.
            # Storing the sample counts in a dictionary pays off here.
            epoch_loss = running_loss / data_size[phase]
            # Without a GPU, item() is unnecessary
            epoch_acc = running_corrects.item() / data_size[phase]
            # tensor.item() retrieves the Python value from a one-element tensor.
            # print(tensorA)        => tensor(112, device='cuda:0')
            # print(tensorA.item()) => 112

            # Record the progress in the lists.
            loss_dict[phase].append(epoch_loss)
            acc_dict[phase].append(epoch_acc)

            # With format, {:.nf} prints n digits after the decimal point,
            # just like in C.
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(phase, epoch_loss, epoch_acc))

            # deep copy the model
            # Save the weights whenever the validation accuracy improves.
            if phase == 'val' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
                # Without deepcopy, the copied data would change too, because
                # model.state_dict() keeps changing as training continues.
                # The difference between copy and deepcopy is explained well here:
                # https://www.headboost.jp/python-copy-deepcopy/

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(time_elapsed // 60, time_elapsed % 60))
    print('Best val acc: {:.4f}'.format(best_acc))

    # Load and return the best weights.
    model.load_state_dict(best_model_wts)
    return model, loss_dict, acc_dict
model_ft, loss, acc = train_model(model, criterion, optim, num_epochs=epoch)
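If you want to keep the fine-tuned weights for later (my own addition; the file name is arbitrary):

# My addition: save the best weights returned by train_model.
torch.save(model_ft.state_dict(), "tuki_suppon_resnet18.pth")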
# Extract loss and acc.
loss_train = loss["train"]
loss_val = loss["val"]
acc_train = acc["train"]
acc_val = acc["val"]
# Writing it like this creates a grid of rows x cols graphs.
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 5))
# 0th graph
axes[0].plot(range(epoch), loss_train, label="train")
axes[0].plot(range(epoch), loss_val, label="val")
axes[0].set_title("Loss")
axes[0].legend()  # Show the label of each plot
# 1st graph
axes[1].plot(range(epoch), acc_train, label="train")
axes[1].plot(range(epoch), acc_val, label="val")
axes[1].set_title("Accuracy")
axes[1].legend()
# Adjust so that the two graphs do not overlap
fig.tight_layout()
It looks like it starts to overfit around epoch 11 or 12.
Google Colaboratory is an easy way to use a GPU. https://colab.research.google.com/notebooks/welcome.ipynb?hl=ja
When using images in Colab, it is convenient to zip them and upload the archive. (Uploading them one by one is painful.) (Mounting Google Drive works too.) You can then unzip it as follows.
# Change /content/pics.zip to your own path.
!unzip /content/pics.zip -d /content/data > /dev/null 2>&1 &
Also, the "Copy path" item that appears when you right-click a file is convenient.
matplotlib: This time I drew a 1-row x 2-column figure, but for 2 rows x 2 columns, for example, you can create one as follows. You can also draw several plots on the same axes; here each graph overlays two series.
loss_train = loss["train"]
loss_val = loss["val"]
acc_train = acc["train"]
acc_val = acc["val"]
fig, axes = plt.subplots(nrows=2, ncols=2, figsize=(10,5))
axes[0,0].plot(range(epoch), loss_train, label = "train")
axes[0,0].plot(range(epoch), loss_val, label = "val")
axes[0,0].set_title("Loss")
axes[0,0].legend()
axes[0,1].plot(range(epoch), acc_train, c="red", label = "train")
axes[0,1].plot(range(epoch), acc_val, c="pink", label = "val")
axes[0,1].set_title("Train Loss")
axes[0,1].legend()
x = np.random.rand(100)
xx = np.random.rand(200)
axes[1,0].hist(xx, bins=25, label="xx")
axes[1,0].hist(x, bins=50, label="x")
axes[1,0].set_title("histgram")
y = np.random.randn(100)
z = np.random.randn(100)
axes[1,1].scatter(y, z, alpha=0.8, label="y,z")
axes[1,1].scatter(z, y, alpha=0.8, label="z,y")
axes[1,1].set_title("Scatter")
axes[1,1].legend()
fig.tight_layout()