Neural network with PyTorch

This article is based on the following official tutorial. Neural Networks -- PyTorch Tutorials 1.4.0 documentation

The general procedure for training a neural network is as follows.

1. Prepare data (training data / test data).
2. Define a neural network with trainable parameters. (Define the network)
3. Calculate the loss function when training data is input to the network. (Loss function)
4. Calculate the gradient of the loss function with respect to the network parameters. (Backward)
5. Update the parameters based on the gradient of the loss function. (Optimize)
6. Train by repeating steps 3 to 5 many times.

Build a neural network according to the procedure.

1. Data preparation

To train the neural network, you can use either data that ships with a package or data you have prepared yourself.

To use ready-made data, the torchvision package is convenient. It provides datasets commonly used in machine learning, such as MNIST and CIFAR10 (torchvision.datasets), general-purpose machine learning models (torchvision.models), and a module for data preprocessing (torchvision.transforms). See the official documentation for details -> torchvision

For training, wrap the data in a torch.utils.data.DataLoader. A DataLoader serves the data as mini-batches of a given batch size, each pairing the input data with its labels.

The preparation procedure is as follows.

(1) Prepare transforms to preprocess data.
(2) Instantiate the Dataset class with transforms as an argument to prepare the Dataset.
(3) Instantiate the DataLoader class with the Dataset as an argument to prepare the DataLoader.
(4) At training time, use the DataLoader to acquire training data and labels in batch-size chunks.
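
The Dataset and DataLoader steps can be tried without downloading anything. The following is a minimal sketch, assuming random tensors in place of real images: `TensorDataset` stands in for a torchvision dataset (which would additionally take the transforms from step (1) as an argument), and all sizes are arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Hypothetical stand-in data: 10 grayscale 32x32 "images" with labels 0..9.
images = torch.randn(10, 1, 32, 32)
labels = torch.arange(10)

# (2) A Dataset pairs each input with its label.
dataset = TensorDataset(images, labels)

# (3) A DataLoader wraps the Dataset and yields mini-batches.
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# (4) Iterating yields (inputs, labels) in batch-size chunks.
batch_shapes = [(x.shape, y.shape) for x, y in loader]
for x_shape, y_shape in batch_shapes:
    print(tuple(x_shape), tuple(y_shape))
```

With 10 samples and a batch size of 4, the last batch holds only the 2 remaining samples.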

2. Definition of neural network

Neural networks can be constructed using the torch.nn package. nn relies on automatic differentiation (autograd) to define models and differentiate them.

An nn.Module contains the layers of a neural network and a forward(input) method that returns the output. A new network is therefore defined by subclassing nn.Module.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120)  # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)

# ---Output---
#Net(
#  (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
#  (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
#  (fc1): Linear(in_features=576, out_features=120, bias=True)
#  (fc2): Linear(in_features=120, out_features=84, bias=True)
#  (fc3): Linear(in_features=84, out_features=10, bias=True)
#)

Define the layers held by the network in the __init__() method. Most commonly used layers, such as Linear and Conv2d, are defined in torch.nn. See the official documentation for details -> torch.nn

Similarly, operations such as relu and max_pool2d are defined in torch.nn.functional and can be called wherever they are needed. See the official documentation for details -> torch.nn.functional

Define the forward propagation of the network in the forward() method. It lists, in order, the layers and operations that the input x passes through before it is returned as the output.
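
The value 16 * 6 * 6 used for fc1 follows from how each layer in forward() changes the spatial size of a 32x32 input. The sketch below traces it in plain Python; the helper names `conv_out` and `pool_out` are hypothetical, and the formulas assume no padding or dilation, stride 1 for the convolutions, and pooling with a stride equal to its window.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution (no dilation)."""
    return (size + 2 * padding - kernel) // stride + 1


def pool_out(size, window):
    """Output size of max pooling whose stride equals its window."""
    return size // window


size = 32
size = pool_out(conv_out(size, 3), 2)  # conv1 + pool: 32 -> 30 -> 15
size = pool_out(conv_out(size, 3), 2)  # conv2 + pool: 15 -> 13 -> 6
flat_features = 16 * size * size       # conv2 outputs 16 channels
print(size, flat_features)             # 6 576
```

The result matches the in_features=576 reported for fc1 when the network is printed.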

It is not necessary to define backward(), the back propagation of the network. Once forward() is defined, autograd derives the back propagation automatically.
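
As a small illustration of this behavior, the sketch below writes only a forward computation on a toy tensor (not the network above) and lets autograd produce the gradient.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()   # forward computation only
y.backward()         # autograd derives dy/dx = 2x
print(x.grad)        # tensor([2., 4., 6.])
```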

Trainable parameters can be obtained with net.parameters(). Because the weights and biases are returned as separate entries, the resulting list has a length of (number of defined layers) $\times$ 2; with the 5 layers defined here, that is 10.

params = list(net.parameters())
print(len(params))
print(params[0].size())   # conv1's weight
print(params[1].size())   # conv1's bias
print(params[0][0,:,:,:]) # conv1's weights on the first dimension

# ---Output---
#10
#torch.Size([6, 1, 3, 3])
#torch.Size([6])
#tensor([[[-0.0146, -0.0219,  0.0491],
#         [-0.3047, -0.0137,  0.0954],
#         [-0.2612, -0.2972, -0.2798]]], grad_fn=<SliceBackward>)
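
To see which entry belongs to which layer, named_parameters() returns each parameter together with its name. The sketch below uses a hypothetical two-layer nn.Sequential model, chosen only to keep the listing short.

```python
import torch.nn as nn

# Hypothetical two-layer model, used only for illustration.
model = nn.Sequential(nn.Linear(4, 3), nn.Linear(3, 2))

names = [name for name, _ in model.named_parameters()]
n_params = sum(p.numel() for p in model.parameters())
print(names)      # ['0.weight', '0.bias', '1.weight', '1.bias']
print(n_params)   # 4*3 + 3 + 3*2 + 2 = 23
```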

Feed a random $32 \times 32$ input to this network.

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

# ---Output---
#tensor([[-0.0703,  0.0575, -0.0679, -0.1168, -0.1093,  0.0815, -0.0085,  0.0408,
#          0.1275,  0.0472]], grad_fn=<AddmmBackward>)

The random input passes through the layers, which still hold their initial parameters, and the result is returned as the output.

The zero_grad() method resets the gradients of all parameters. Because backward() accumulates gradients rather than overwriting them, run zero_grad() before backward() to avoid unintended parameter updates.

torch.nn assumes that a mini-batch is input. For example, nn.Conv2d expects a 4-dimensional Tensor ($\mathrm{nSamples} \times \mathrm{nChannels} \times \mathrm{Height} \times \mathrm{Width}$) as input.
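
When only a single sample is available, a batch dimension of size 1 can be added with unsqueeze(0). The following is a minimal sketch with a random tensor standing in for one image.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

sample = torch.randn(1, 32, 32)   # one image: nChannels x Height x Width
batch = sample.unsqueeze(0)       # add the batch dimension: 1 x 1 x 32 x 32
out = conv(batch)
print(batch.shape, out.shape)
```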

3. Loss function

Commonly used loss functions such as MSELoss() and CrossEntropyLoss() are provided in the nn package. Below, the MSE loss is calculated between the network output for a random input and a random target of the same size.

input = torch.randn(1, 1, 32, 32)
output = net(input)
target = torch.randn(10)    # a dummy target, for example
target = target.view(1,-1)  # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)

# ---Output---
#tensor(0.5322, grad_fn=<MseLossBackward>)
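
With its default settings, MSELoss() is the mean of the squared element-wise differences. The sketch below checks this by hand, using random tensors as stand-ins for the network output and the target.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
output = torch.randn(1, 10)   # stand-in for a network output
target = torch.randn(1, 10)   # dummy target of the same shape

loss = nn.MSELoss()(output, target)
manual = ((output - target) ** 2).mean()   # mean of squared differences
print(loss.item(), manual.item())
```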

Tracing the forward propagation so far gives the following computation graph.

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d 
      -> view -> linear -> relu -> linear -> relu -> linear 
      -> MSELoss 
      -> loss

This can be confirmed by following the grad_fn attribute backward from loss.

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # first input of Linear; shown as AccumulateGrad below

# ---Output---
#<MseLossBackward object at 0x7f5008a1c4e0>
#<AddmmBackward object at 0x7f5008a1c5c0>
#<AccumulateGrad object at 0x7f5008a1c4e0>

4. Gradient calculation

The gradient of the loss function is needed to update the parameters by error backpropagation. In PyTorch, calling loss.backward() on the loss computes the gradients automatically. Because gradients accumulate across calls to backward(), run net.zero_grad() in every training iteration to clear them.

net.zero_grad()     # zeroes the gradient buffers of all parameters
print("conv1.bias.grad before backward")
print(net.conv1.bias.grad)

loss.backward()
print("conv1.bias.grad after backward")
print(net.conv1.bias.grad)

# ---Output---
#conv1.bias.grad before backward
#tensor([0., 0., 0., 0., 0., 0.])
#conv1.bias.grad after backward
#tensor([ 0.0072, -0.0051, -0.0008, -0.0017,  0.0043, -0.0030])
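
The accumulation that zero_grad() guards against can be observed on a toy tensor. In the sketch below (not the network above), backward() is called twice without clearing in between, and the second gradient is added to the first.

```python
import torch

w = torch.tensor([1.0, 2.0], requires_grad=True)

(3 * w).sum().backward()
first = w.grad.clone()        # tensor([3., 3.])

(3 * w).sum().backward()      # no clearing in between
second = w.grad.clone()       # gradients were added: tensor([6., 6.])

w.grad.zero_()                # reset by hand; zero_grad() serves this purpose for a whole model
print(first, second, w.grad)
```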

5. Parameter update

Optimization algorithms for updating the parameters are provided in torch.optim. Here we use stochastic gradient descent (SGD), which is defined by the following update rule. See the official documentation for details -> torch.optim

weight -> weight - learning_rate * gradient

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output,target)
loss.backward()
optimizer.step()        # do the update
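
To connect the update rule with optimizer.step(), the sketch below applies the rule by hand to one tensor and lets optim.SGD (without momentum) update an identical copy. The toy loss and the values are assumptions chosen for illustration.

```python
import torch
import torch.optim as optim

lr = 0.1
w_manual = torch.tensor([1.0, -2.0], requires_grad=True)
w_optim = torch.tensor([1.0, -2.0], requires_grad=True)

# Same toy loss for both copies: sum of squares, so the gradient is 2 * w.
(w_manual ** 2).sum().backward()
(w_optim ** 2).sum().backward()

# Manual rule: weight = weight - learning_rate * gradient
with torch.no_grad():
    w_manual -= lr * w_manual.grad

# The same step through torch.optim
optim.SGD([w_optim], lr=lr).step()

print(w_manual, w_optim)
```

Both tensors end up at [0.8, -1.6], since each element moves by 0.1 times its gradient.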

6. Training

Network training is performed by repeating steps 3 to 5 above.
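
A minimal sketch of such a loop is shown below. It assumes a hypothetical toy problem (fitting y = 2x + 1 with a single nn.Linear layer) instead of an image classifier, so that it runs in a moment.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy data: y = 2x + 1 without noise.
x = torch.linspace(-1, 1, 20).unsqueeze(1)
y = 2 * x + 1

model = nn.Linear(1, 1)
criterion = nn.MSELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

losses = []
for step in range(200):
    optimizer.zero_grad()          # clear old gradients
    loss = criterion(model(x), y)  # loss function
    loss.backward()                # gradient calculation
    optimizer.step()               # parameter update
    losses.append(loss.item())

print(losses[0], losses[-1])
```

The recorded loss shrinks toward zero as the iterations are repeated.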

Implementation using CIFAR10

As an example, we train a neural network that classifies images using CIFAR10. This part is based on the following official tutorial. Training a Classifier -- PyTorch Tutorials 1.4.0 documentation

Data preparation

Load and normalize the CIFAR10 data provided in torchvision.datasets. The torchvision datasets yield PILImage objects with values in the range [0, 1]; here they are converted to Tensors and normalized to the range [-1, 1].

import torchvision
import torchvision.transforms as transforms

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Let's display the prepared data.

import matplotlib.pyplot as plt
import numpy as np

def imshow(img):
    img = img/2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1,2,0)))
    plt.show()
    
# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)

# show images
imshow(torchvision.utils.make_grid(images))

# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

[Output]

Network construction

Next, we build a network for classifying images.

import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

Definition of loss function and optimization method

Once the network is built, define the loss function and optimization method.

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
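
CrossEntropyLoss() takes the raw network outputs (logits) and integer class indices, and applies log-softmax internally. The sketch below reproduces its default behavior by hand on random stand-in values.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 10)            # raw outputs: batch of 4, 10 classes
labels = torch.tensor([3, 8, 8, 0])    # class indices, not one-hot vectors

loss = nn.CrossEntropyLoss()(logits, labels)

# By hand: log-softmax, then the mean negative log-probability
# of the correct class.
log_probs = F.log_softmax(logits, dim=1)
manual = -log_probs[torch.arange(4), labels].mean()
print(loss.item(), manual.item())
```

This is why the network's forward() returns the output of fc3 directly, without a softmax layer.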

Training

Once the network, loss function, and optimization method have been defined, training is started using the training data.

for epoch in range(2): # loop over the dataset multiple times
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        
        # zero the parameter gradients
        optimizer.zero_grad()
        
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step()
        
        # print statistics
        running_loss += loss.item()
        if i%2000==1999: # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss/2000))
            running_loss = 0.0
            
print('Finished Training')

# ---Output---
#[1,  2000] loss: 2.149
#[1,  4000] loss: 1.832
#[1,  6000] loss: 1.651
#[1,  8000] loss: 1.573
#[1, 10000] loss: 1.514
#[1, 12000] loss: 1.458
#[2,  2000] loss: 1.420
#[2,  4000] loss: 1.371
#[2,  6000] loss: 1.348
#[2,  8000] loss: 1.333
#[2, 10000] loss: 1.326
#[2, 12000] loss: 1.293
#Finished Training

Here, two epochs of training are run over the full training set (12,500 mini-batches of 4 images, i.e. 50,000 images per epoch). The loss decreases as training progresses, which shows that learning is proceeding. (Training has not converged yet, but we stop here and move on.)

Save model parameters

The parameters of the trained model can be saved with torch.save().

PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
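
The save and load round trip can be checked on a small model. The sketch below is an illustration under assumptions: a hypothetical nn.Linear model stands in for Net, and a temporary directory is used instead of './cifar_net.pth'.

```python
import os
import tempfile

import torch
import torch.nn as nn

model = nn.Linear(3, 2)   # hypothetical small model standing in for Net

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model.pth")
    torch.save(model.state_dict(), path)   # stores the parameters only

    restored = nn.Linear(3, 2)             # same architecture, fresh weights
    restored.load_state_dict(torch.load(path))

same = all(torch.equal(a, b)
           for a, b in zip(model.state_dict().values(),
                           restored.state_dict().values()))
print(same)   # True
```

Because state_dict() holds only the parameters, the model class must be instantiated again before loading.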

Apply to test data

Apply a trained network to the test data. First, check the contents of the test data.

dataiter = iter(testloader)
images, labels = next(dataiter)

imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))

[Output]

Then, load the saved network parameters. After that, input the test data into the loaded model and display the classification result.

net = Net()
net.load_state_dict(torch.load(PATH))
# ---Output---
# <All keys matched successfully>

outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))
# ---Output---
# Predicted:    cat  ship plane plane

The third image is misjudged as plane instead of ship, but the other three are correctly classified.

Let's calculate the accuracy over all 10,000 test images.

correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        
print('Accuracy of the network on the 10000 test images: %d %%' % (100*correct/total))
# ---Output---
# Accuracy of the network on the 10000 test images: 52 %
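
The core of this calculation is torch.max(outputs, 1), which returns the largest score in each row together with its index; the index is the predicted class. The sketch below runs it on hypothetical scores for 4 samples and 3 classes.

```python
import torch

# Hypothetical scores: 4 samples, 3 classes.
outputs = torch.tensor([[2.0, 1.0, 0.0],
                        [0.0, 3.0, 1.0],
                        [1.0, 0.0, 5.0],
                        [4.0, 0.0, 1.0]])
labels = torch.tensor([0, 1, 2, 1])

values, predicted = torch.max(outputs, 1)   # max score and its index per row
correct = (predicted == labels).sum().item()
accuracy = 100 * correct / labels.size(0)
print(predicted.tolist(), accuracy)   # [0, 1, 2, 0] 75.0
```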

The accuracy is 52%. This is well above the 10% expected from random guessing over 10 classes, but still low for an image classifier.

Next, compute the accuracy for each class.

class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))

with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs,1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
            
for i in range(10):
    print('Accuracy of %5s : %2d %%' % ( classes[i], 100*class_correct[i]/class_total[i]))

# ---Output---
# Accuracy of plane : 61 %
# Accuracy of   car : 61 %
# Accuracy of  bird : 52 %
# Accuracy of   cat : 26 %
# Accuracy of  deer : 34 %
# Accuracy of   dog : 51 %
# Accuracy of  frog : 67 %
# Accuracy of horse : 43 %
# Accuracy of  ship : 76 %
# Accuracy of truck : 50 %

From this, it can be seen that the network is poor at classifying cats but good at classifying ships.
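
The per-class counts can also be computed without an inner loop over a fixed batch size. The following is one possible alternative using torch.bincount, shown on hypothetical labels and predictions for 3 classes; it is a sketch, not the method used in the tutorial.

```python
import torch

labels = torch.tensor([0, 1, 2, 1, 0, 2])      # hypothetical ground truth
predicted = torch.tensor([0, 1, 1, 1, 2, 2])   # hypothetical predictions
num_classes = 3

hit = predicted == labels
class_total = torch.bincount(labels, minlength=num_classes)
class_correct = torch.bincount(labels[hit], minlength=num_classes)
per_class = (class_correct.float() / class_total.float()).tolist()
print(per_class)   # [0.5, 1.0, 0.5]
```

In a real evaluation loop, the two count tensors would be accumulated across batches before dividing.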

When using GPU

When training on a GPU, the CUDA device must be specified via device. First, check whether a GPU is available. If the code below prints cuda:0, the GPU is available.

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)

# ---Output---
# cuda:0

You can move the network and the data to the GPU with .to(device). During training, remember to move each batch of data to the GPU in every iteration.

net.to(device)
inputs, labels = data[0].to(device), data[1].to(device)
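
The following minimal sketch puts both moves together. It assumes a hypothetical nn.Linear model and a random batch, and falls back to the CPU when no GPU is present, so it runs on either kind of machine.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

model = nn.Linear(8, 2).to(device)      # parameters now live on `device`
inputs = torch.randn(4, 8).to(device)   # each batch must be moved as well
outputs = model(inputs)
print(outputs.device)
```

The model and its inputs must be on the same device; otherwise the forward pass raises an error.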

Summary

Finally, the steps above are combined into a single script.

# import packages -------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# prepare data ----------------------------------
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

# define a network ------------------------------
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()

# define loss function and optimizer -------------
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

# start training ---------------------------------
for epoch in range(2): # loop over the dataset multiple times
    
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        
        # zero the parameter gradients
        optimizer.zero_grad()
        
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step()
        
        # print statistics
        running_loss += loss.item()
        if i%2000==1999: # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss/2000))
            running_loss = 0.0
            
print('Finished Training')

# check on test data ----------------------------
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
        
print('Accuracy of the network on the 10000 test images: %d %%' % (100*correct/total))

Basics of PyTorch (2) -How to make a neural network-