In this article, I write down a basic implementation in PyTorch (partly as a memo for myself), using classification of CIFAR10 (a color image classification dataset) as the example.
The complete program is as follows.
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
ave = 0.5               # mean used for normalization
std = 0.5               # standard deviation used for normalization
batch_size_train = 256  # training batch size
batch_size_test = 16    # test batch size
val_ratio = 0.2         # ratio of validation data to the whole training data
epoch_num = 30          # number of training epochs
class Net(nn.Module):
    # definition of the network structure
    def __init__(self):
        super(Net, self).__init__()
        self.init_conv = nn.Conv2d(3, 16, 3, padding=1)
        self.conv1 = nn.ModuleList([nn.Conv2d(16, 16, 3, padding=1) for _ in range(3)])
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
        self.pool = nn.MaxPool2d(2, stride=2)
        self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
        self.output_fc = nn.Linear(32, 10)

    # forward computation
    def forward(self, x):
        x = F.relu(self.init_conv(x))
        for l, bn in zip(self.conv1, self.bn1):
            x = F.relu(bn(l(x)))
        x = self.pool(x)
        x = x.view(-1, 16*16*16)  # flatten
        for l in self.fc1:
            x = F.relu(l(x))
        x = self.output_fc(x)
        return x

def set_GPU():
    # GPU settings
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    return device
def load_data():
    # data loading
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((ave,), (std,))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

    # validation data split
    n_samples = len(train_set)
    val_size = int(n_samples * val_ratio)
    train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples - val_size), val_size])

    # DataLoader definition
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)

    return train_loader, test_loader, val_loader
def train():
    device = set_GPU()
    train_loader, test_loader, val_loader = load_data()

    model = Net()
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)

    min_loss = 999999999
    print("training start")
    for epoch in range(epoch_num):
        train_loss = 0.0
        val_loss = 0.0
        train_batches = 0
        val_batches = 0

        model.train()  # training mode
        for i, data in enumerate(train_loader):  # read one batch at a time
            inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]

            # reset gradients
            optimizer.zero_grad()

            outputs = model(inputs)            # forward computation
            loss = criterion(outputs, labels)  # loss computation
            loss.backward()                    # backward computation (gradients)
            optimizer.step()                   # parameter update

            # accumulate history
            train_loss += loss.item()
            train_batches += 1

        # validation loss computation
        model.eval()  # inference mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):  # read one batch at a time
                inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]
                outputs = model(inputs)            # forward computation
                loss = criterion(outputs, labels)  # loss computation

                # accumulate history
                val_loss += loss.item()
                val_batches += 1

        # history output
        print('epoch %d train_loss: %.10f' %
              (epoch + 1, train_loss/train_batches))
        print('epoch %d val_loss: %.10f' %
              (epoch + 1, val_loss/val_batches))
        with open("history.csv", 'a') as f:
            print(str(epoch+1) + ',' + str(train_loss/train_batches) + ',' + str(val_loss/val_batches), file=f)

        # save the best model
        if min_loss > val_loss/val_batches:
            min_loss = val_loss/val_batches
            PATH = "best.pth"
            torch.save(model.state_dict(), PATH)

        # dynamic adjustment of the learning rate
        scheduler.step(val_loss/val_batches)

    # save the model of the final epoch
    print("training finished")
    PATH = "lastepoch.pth"
    torch.save(model.state_dict(), PATH)

if __name__ == "__main__":
    train()
The libraries used in this article are as follows.
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
Define constants to be used later.
ave = 0.5               # mean used for normalization
std = 0.5               # standard deviation used for normalization
batch_size_train = 256  # training batch size
batch_size_test = 16    # test batch size
val_ratio = 0.2         # ratio of validation data to the whole training data
epoch_num = 30          # number of training epochs
If you want to use a GPU, you need to set up the device before everything else.
def set_GPU():
    # GPU settings
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    return device
The GPU is used through this device object. For example,
data.to(device)
model.to(device)
moves the data and the neural network model onto the GPU.
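As a minimal, self-contained sketch (the tensor, layer, and shapes here are made up purely for illustration), the usual pattern is:

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(8, 3, 32, 32)            # a dummy batch of images
layer = nn.Conv2d(3, 16, 3, padding=1)   # a stand-in model

x = x.to(device)          # move the data to the GPU (or keep it on the CPU)
layer = layer.to(device)  # move the model parameters to the same device

y = layer(x)              # both operands now live on the same device
print(y.shape)            # torch.Size([8, 16, 32, 32])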
Several standard datasets are available out of the box in torchvision. For example, CIFAR10 can be prepared as follows.
def load_data():
    # data loading
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((ave,), (std,))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

    # validation data split
    n_samples = len(train_set)
    val_size = int(n_samples * val_ratio)
    train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples - val_size), val_size])

    # DataLoader definition
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)

    return train_loader, test_loader, val_loader
Let me go through this in order. First, transform represents a series of preprocessing steps applied to the data.
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((ave,),(std,))])
In the example above, transforms.ToTensor() converts the data to a Tensor (PyTorch's data type), and transforms.Normalize((ave,), (std,)) then normalizes it with mean ave and standard deviation std. Compose chains the individual steps into a single transform. Several other transformations are available; see the documentation.
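To see what the transform does, here is a small sketch; the randomly generated PIL image is made up for illustration, and the point is that pixel values go from [0, 255] to roughly [-1, 1]:

import numpy as np
from PIL import Image
import torchvision.transforms as transforms

ave, std = 0.5, 0.5
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((ave,), (std,))])

# a dummy 32x32 RGB image, just for illustration
img = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))

x = transform(img)
print(x.shape)                          # torch.Size([3, 32, 32])
print(x.min().item(), x.max().item())   # roughly -1.0 ... 1.0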
Next, read the data of CIFAR10.
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
root specifies where the data is downloaded. The built-in datasets come already split into train and test parts, selected with the train option. Passing the transform defined above as an argument makes the dataset apply it to every sample.
I also want to split validation data off from the loaded training data. For that, torch.utils.data.random_split is used.
train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples-val_size), val_size])
It randomly splits the dataset into subsets whose sizes are given by the list in the second argument.
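A minimal sketch with a made-up TensorDataset; the generator argument is optional and only shown here to make the split reproducible:

import torch
from torch.utils.data import TensorDataset, random_split

# a dummy dataset of 100 samples, just for illustration
full_set = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))

val_size = int(len(full_set) * 0.2)
train_set, val_set = random_split(
    full_set, [len(full_set) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))  # fixed seed -> reproducible split

print(len(train_set), len(val_set))  # 80 20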
For training in PyTorch, DataLoader is very convenient. Create one as follows.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)
Pass the dataset as the first argument. batch_size sets the batch size, shuffle controls whether the data is shuffled, and num_workers sets the number of worker subprocesses used for loading. Under the hood a DataLoader is an iterable, so during training you simply fetch one batch at a time with a for loop.
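The iteration pattern looks like this (dummy dataset made up for illustration; num_workers=0 keeps the sketch single-process):

import torch
from torch.utils.data import TensorDataset, DataLoader

dummy_set = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))
loader = DataLoader(dummy_set, batch_size=32, shuffle=True, num_workers=0)

for inputs, labels in loader:          # one batch per iteration
    print(inputs.shape, labels.shape)  # torch.Size([32, 3, 32, 32]) torch.Size([32])
    break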
There are many situations where you want to use your own data. In this case, you should define your own dataset class as follows. (Reference: https://qiita.com/mathlive/items/2a512831878b8018db02)
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, data, label, transform=None):
        self.transform = transform
        self.data = data
        self.data_num = len(data)
        self.label = label

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label = self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label
At minimum, define __len__ (which returns the size of the dataset) and __getitem__ (which returns one sample) in the class. With this,
dataset = MyDataset(input data, teacher labels, transform=required transform)
you can create a dataset. As an aside, looking at MyDataset gives a feel for what PyTorch does internally: given an index idx, it returns the corresponding sample, and if a transform is specified it is applied to that sample first. You can implement all sorts of behavior by customizing __getitem__ (for example, applying two different transforms), but that is omitted here.
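For example, a minimal usage sketch with made-up tensor data, assuming the MyDataset class above is already defined:

import torch
from torch.utils.data import DataLoader

# dummy data: 100 samples of 3x32x32 images and integer class labels
data = torch.randn(100, 3, 32, 32)
label = torch.randint(0, 10, (100,))

dataset = MyDataset(data, label)                  # no transform in this sketch
loader = DataLoader(dataset, batch_size=16, shuffle=True)

x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([16, 3, 32, 32]) torch.Size([16])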
PyTorch makes it easy to define neural networks using classes. For the CIFAR10 classifier, for example:
class Net(nn.Module):
    # definition of the network structure
    def __init__(self):
        super(Net, self).__init__()
        self.init_conv = nn.Conv2d(3, 16, 3, padding=1)
        self.conv1 = nn.ModuleList([nn.Conv2d(16, 16, 3, padding=1) for _ in range(3)])
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
        self.pool = nn.MaxPool2d(2, stride=2)
        self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
        self.output_fc = nn.Linear(32, 10)

    # forward computation
    def forward(self, x):
        x = F.relu(self.init_conv(x))
        for l, bn in zip(self.conv1, self.bn1):
            x = F.relu(bn(l(x)))
        x = self.pool(x)
        x = x.view(-1, 16*16*16)  # flatten
        for l in self.fc1:
            x = F.relu(l(x))
        x = self.output_fc(x)
        return x
Let me explain the pieces.
A network is created by inheriting from nn.Module, and each layer is defined as a member in __init__.
def __init__(self):
    super(Net, self).__init__()
    self.init_conv = nn.Conv2d(3, 16, 3, padding=1)
    self.conv1 = nn.ModuleList([nn.Conv2d(16, 16, 3, padding=1) for _ in range(3)])
    self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
    self.pool = nn.MaxPool2d(2, stride=2)
    self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
    self.output_fc = nn.Linear(32, 10)
For example, nn.Conv2d takes the following arguments.
nn.Conv2d(number of input channels, number of output channels, kernel size, padding=padding size, stride=stride)
See the official documentation for details. (https://pytorch.org/docs/stable/nn.html)
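As a quick sketch of how these arguments affect shapes (dummy input made up for illustration):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)   # 3 -> 16 channels, 3x3 kernel, padding keeps H and W
pool = nn.MaxPool2d(2, stride=2)        # halves H and W

x = torch.randn(1, 3, 32, 32)           # (batch, channels, height, width)
print(conv(x).shape)                    # torch.Size([1, 16, 32, 32])
print(pool(conv(x)).shape)              # torch.Size([1, 16, 16, 16])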
Network layers can also be held as a list using nn.ModuleList.
self.conv1 = nn.ModuleList([nn.Conv2d(16,16,3,padding=1) for _ in range(3)])
This is especially useful when defining a large network with a repeating structure. Note that if you store layers in a plain Python list instead of nn.ModuleList, their parameters are not registered with the module and therefore are not updated during training. (Reference: https://qiita.com/perrying/items/857df46bb6cdc3047bd8) Be sure to use nn.ModuleList for this.
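A small sketch of the difference; the two toy modules here are made up purely for demonstration:

import torch.nn as nn

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 4) for _ in range(3)]  # not registered as parameters!

print(sum(p.numel() for p in WithModuleList().parameters()))  # 60 (3 x (4*4 + 4))
print(sum(p.numel() for p in WithPlainList().parameters()))   # 0 -> the optimizer would see nothing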
The forward computation is defined in forward.
def forward(self, x):
    x = F.relu(self.init_conv(x))
    for l, bn in zip(self.conv1, self.bn1):
        x = F.relu(bn(l(x)))
    x = self.pool(x)
    x = x.view(-1, 16*16*16)  # flatten
    for l in self.fc1:
        x = F.relu(l(x))
    x = self.output_fc(x)
    return x
The layers stored in the nn.ModuleList are pulled out with a for loop, which is another convenience. Partway through there is also
x = x.view(-1,16*16*16) # flatten
Here the image-shaped data is flattened into a one-dimensional vector per sample (the second argument is channels x image height x image width). The first argument is -1 so that this dimension is inferred automatically from the batch size.
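A small sketch of the flatten step (the dummy tensor is made up for illustration):

import torch

x = torch.randn(8, 16, 16, 16)   # (batch, channels, height, width) after pooling
x = x.view(-1, 16 * 16 * 16)     # -1 lets PyTorch infer the batch dimension
print(x.shape)                   # torch.Size([8, 4096])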
Loss functions live in torch.nn and optimizers in torch.optim; you simply instantiate and use them. Here CrossEntropyLoss is used as the classification loss and Adam as the optimizer.
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
To train efficiently, you may want to adjust (typically reduce) the learning rate dynamically. For that, use an lr_scheduler. For example,
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
define the scheduler like this. Then, after computing the loss on the validation data, call
scheduler.step(val_loss)
and if the loss does not improve for patience epochs, the learning rate is reduced automatically. This helps prevent training from stagnating.
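A minimal sketch of the pattern, with a stand-in model and a deliberately non-improving validation loss; the current learning rate can be read back through optimizer.param_groups:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)

for epoch in range(20):
    val_loss = 1.0            # pretend the validation loss has stopped improving
    scheduler.step(val_loss)  # once the loss stops improving for more than patience epochs, the LR drops
    print(epoch, optimizer.param_groups[0]['lr'])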
As with datasets, you may want to define your own loss function. This can be done either by inheriting from a PyTorch class or by writing a plain function. (Reference: https://kento1109.hatenablog.com/entry/2018/08/13/092939) For simple regression or classification tasks MSELoss or CrossEntropyLoss often works fine, but many machine learning papers get their performance from a carefully designed loss, so this is an important piece to know how to implement.
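As a sketch of the nn.Module style (this particular penalized cross entropy and the helper function are made up purely for illustration, not something from the original article):

import torch
import torch.nn as nn
import torch.nn.functional as F

# module style: cross entropy plus a confidence penalty (illustrative only)
class PenalizedCrossEntropy(nn.Module):
    def __init__(self, alpha=0.1):
        super().__init__()
        self.alpha = alpha

    def forward(self, outputs, targets):
        ce = F.cross_entropy(outputs, targets)
        probs = F.softmax(outputs, dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        return ce - self.alpha * entropy  # discourage over-confident predictions

# plain-function style also works, as long as everything stays in torch operations
def my_mse(pred, target):
    return ((pred - target) ** 2).mean()

criterion = PenalizedCrossEntropy()
loss = criterion(torch.randn(4, 10), torch.randint(0, 10, (4,)))
print(loss.item())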
With everything prepared, we can finally start training. First, apply each of the settings defined so far.
device = set_GPU()
train_loader, test_loader, val_loader = load_data()
model = Net()
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
The training loop is as follows; the comments describe each step.
min_loss = 999999999
print("training start")
for epoch in range(epoch_num):
    train_loss = 0.0
    val_loss = 0.0
    train_batches = 0
    val_batches = 0

    model.train()  # training mode
    for i, data in enumerate(train_loader):  # read one batch at a time
        inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]

        # reset gradients
        optimizer.zero_grad()

        outputs = model(inputs)            # forward computation
        loss = criterion(outputs, labels)  # loss computation
        loss.backward()                    # backward computation (gradients)
        optimizer.step()                   # parameter update

        # accumulate history
        train_loss += loss.item()
        train_batches += 1

    # validation loss computation
    model.eval()  # inference mode
    with torch.no_grad():
        for i, data in enumerate(val_loader):  # read one batch at a time
            inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]
            outputs = model(inputs)            # forward computation
            loss = criterion(outputs, labels)  # loss computation

            # accumulate history
            val_loss += loss.item()
            val_batches += 1

    # history output
    print('epoch %d train_loss: %.10f' %
          (epoch + 1, train_loss/train_batches))
    print('epoch %d val_loss: %.10f' %
          (epoch + 1, val_loss/val_batches))
    with open("history.csv", 'a') as f:
        print(str(epoch+1) + ',' + str(train_loss/train_batches) + ',' + str(val_loss/val_batches), file=f)

    # save the best model
    if min_loss > val_loss/val_batches:
        min_loss = val_loss/val_batches
        PATH = "best.pth"
        torch.save(model.state_dict(), PATH)

    # dynamic adjustment of the learning rate
    scheduler.step(val_loss/val_batches)

# save the model of the final epoch
print("training finished")
PATH = "lastepoch.pth"
torch.save(model.state_dict(), PATH)
In PyTorch, the loss computation and the backpropagation of the error must be written out explicitly (there are also libraries that wrap this up).
For testing the trained model, see the official tutorial (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). With this model, the per-class accuracy comes out at roughly 40 to 80% (the spread between classes is quite large...).
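A minimal evaluation sketch along those lines, assuming the Net class, load_data, and the best.pth file saved above (the structure follows this article rather than the tutorial exactly):

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
train_loader, test_loader, val_loader = load_data()

model = Net()
model.load_state_dict(torch.load("best.pth", map_location=device))
model.to(device)
model.eval()  # inference mode

correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)   # class with the highest score
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

print("test accuracy: %.3f" % (correct / total))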
Batch Normalization and Dropout behave differently during training and inference, so you need to switch modes. During training and inference, respectively, call
model.train()
model.eval()
In addition, there is torch.no_grad(), a context in which gradient information is not tracked. Since no backward pass is performed during validation, the gradients are not needed, and skipping them speeds up computation and saves memory.
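Put together, a typical inference helper might look like this (a sketch; the stand-in model and input are made up for illustration):

import torch
import torch.nn as nn

def predict(model, inputs):
    model.eval()                  # switch BatchNorm/Dropout to inference behavior
    with torch.no_grad():         # do not track gradients
        outputs = model(inputs)
    return outputs.argmax(dim=1)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in model
print(predict(model, torch.randn(4, 3, 32, 32)))  # tensor of 4 predicted class indices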
I ran out of steam partway through, so I may add more later. I hope this is useful as a reference.
References:
fukuit: https://qiita.com/fukuit/items/215ef75113d97560e599
perrying: https://qiita.com/perrying/items/857df46bb6cdc3047bd8
Official tutorial (CIFAR10 classifier): https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
Official documentation (transforms): https://pytorch.org/docs/stable/torchvision/transforms.html
Official documentation (datasets): https://pytorch.org/docs/stable/torchvision/datasets.html
Official documentation (data loading): https://pytorch.org/docs/stable/data.html
mathlive: https://qiita.com/mathlive/items/2a512831878b8018db02
kento1109: https://kento1109.hatenablog.com/entry/2018/08/13/092939
Official documentation (nn.ModuleList): https://pytorch.org/docs/stable/generated/torch.nn.ModuleList.html
Official documentation (how to adjust the learning rate): https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Official tutorial (saving and loading models): https://pytorch.org/tutorials/beginner/saving_loading_models.html
jyori112: https://qiita.com/jyori112/items/aad5703c1537c0139edb
PyTorch Forums (model.eval() vs torch.no_grad()): https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615
s0sem0y: https://www.hellocybernetics.tech/entry/2018/02/20/182906