I referred to the following official reference. Neural Networks -- PyTorch Tutorials 1.4.0 documentation
The general procedure for training a neural network is as follows.
**1. Prepare data (training data / test data).**
**2. Define a neural network with trainable parameters. (Define the network)**
**3. Compute the loss when training data is input to the network. (Loss function)**
**4. Compute the gradient of the loss with respect to the network parameters. (Backward)**
**5. Update the parameters based on the gradient of the loss. (Optimize)**
**6. Train by repeating steps 3 to 5 many times.**
Build a neural network according to the procedure.
The training data can be either data bundled with a package or data you prepare yourself.
For bundled data, the `torchvision` package is convenient.
It provides `torchvision.datasets`, which contains datasets often used in machine learning such as MNIST and CIFAR10, as well as `torchvision.models` for general-purpose machine learning models and `torchvision.transforms` for data preprocessing.
See the official documentation for details -> torchvision
For training, prepare a data container called `torch.utils.data.DataLoader`.
A `DataLoader` wraps a dataset of input data paired with labels and serves it in chunks of the specified batch size.
The preparation procedure is as follows.
**(1) Prepare `transforms` to preprocess the data.**
**(2) Instantiate the Dataset class with `transforms` as an argument to create a `Dataset`.**
**(3) Instantiate the DataLoader class with the `Dataset` as an argument to create a `DataLoader`.**
**(4) During training, use the `DataLoader` to fetch training data and labels in batch-size chunks.**
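Steps (2) to (4) can be sketched without downloading anything. The example below is only an illustration: it uses random tensors as stand-in data and `TensorDataset` in place of a torchvision dataset, so step (1) (the `transforms`) is skipped. The tensor sizes and labels are assumptions chosen for the demo.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data (assumption for illustration): 10 "images" of shape 1x8x8
features = torch.randn(10, 1, 8, 8)
targets = torch.arange(10) % 2  # dummy binary labels

# (2) wrap inputs and labels in a Dataset
dataset = TensorDataset(features, targets)
# (3) wrap the Dataset in a DataLoader with a batch size
loader = DataLoader(dataset, batch_size=4, shuffle=False)

# (4) iterate to receive (inputs, labels) in batch-size chunks
batch_shapes = []
for inputs, labels in loader:
    batch_shapes.append((tuple(inputs.shape), tuple(labels.shape)))

print(batch_shapes)
```

With 10 samples and a batch size of 4, the last batch holds only the 2 remaining samples.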
Neural networks can be constructed using the `torch.nn` package.
`nn` relies on automatic differentiation (`autograd`) to define models and differentiate them.
An `nn.Module` contains the layers of a neural network and a `forward(input)` method that returns the output.
Therefore, a new network is defined by subclassing `nn.Module`.
import torch
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        # 1 input image channel, 6 output channels, 3x3 square convolution
        # kernel
        self.conv1 = nn.Conv2d(1, 6, 3)
        self.conv2 = nn.Conv2d(6, 16, 3)
        # an affine operation: y = Wx + b
        self.fc1 = nn.Linear(16 * 6 * 6, 120) # 6*6 from image dimension
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        # Max pooling over a (2, 2) window
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        # If the size is a square you can only specify a single number
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)
        x = x.view(-1, self.num_flat_features(x))
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

    def num_flat_features(self, x):
        size = x.size()[1:] # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features

net = Net()
print(net)
# ---Output---
#Net(
# (conv1): Conv2d(1, 6, kernel_size=(3, 3), stride=(1, 1))
# (conv2): Conv2d(6, 16, kernel_size=(3, 3), stride=(1, 1))
# (fc1): Linear(in_features=576, out_features=120, bias=True)
# (fc2): Linear(in_features=120, out_features=84, bias=True)
# (fc3): Linear(in_features=84, out_features=10, bias=True)
#)
The layers held by the network are defined in the `__init__()` method.
Most commonly used layers, such as `Linear` and `Conv2d`, are defined in `torch.nn`.
See the official documentation for details -> torch.nn
Similarly, operations such as `relu` and `max_pool2d` are defined in `torch.nn.functional`.
They can be called wherever they are needed in the computation.
See the official documentation for details -> torch.nn.functional
The forward propagation of the network is defined in the `forward()` method.
It lists, in order, the layers and operations that the input `x` passes through to produce the output.
There is no need to define `backward()`, the backpropagation of the network.
Once `forward()` is defined, backpropagation is derived automatically by `autograd`.
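As a small illustration of what `autograd` provides, the sketch below defines only a forward computation on scalars and lets PyTorch derive the gradients. The values of `w`, `b`, and `x` are arbitrary numbers chosen for the example, not taken from the network above.

```python
import torch

# Scalar "parameters" that should receive gradients
w = torch.tensor(2.0, requires_grad=True)
b = torch.tensor(1.0, requires_grad=True)
x = torch.tensor(3.0)  # input, no gradient needed

# Forward pass only: y = (w * x + b)^2 = (2*3 + 1)^2 = 49
y = (w * x + b) ** 2

# No backward function is written by hand; autograd builds it
y.backward()

# dy/dw = 2 * (w*x + b) * x = 42, dy/db = 2 * (w*x + b) = 14
print(w.grad, b.grad)
```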
Trainable parameters can be obtained with `net.parameters()`.
Since weights and biases are returned as separate tensors, the resulting list has length (number of defined layers) $\times$ 2. Here, 5 layers give 10 entries.
params = list(net.parameters())
print(len(params))
print(params[0].size()) # conv1's weight
print(params[1].size()) # conv1's bias
print(params[0][0,:,:,:]) # conv1's weights on the first dimension
# ---Output---
#10
#torch.Size([6, 1, 3, 3])
#torch.Size([6])
#tensor([[[-0.0146, -0.0219, 0.0491],
# [-0.3047, -0.0137, 0.0954],
# [-0.2612, -0.2972, -0.2798]]], grad_fn=<SliceBackward>)
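The length of 10 and the tensor sizes can be cross-checked with plain arithmetic. The sketch below hard-codes the weight and bias shapes implied by the `Net` definition above (the shape table is written out by hand for illustration) and counts tensors and scalar parameters.

```python
# (name, weight shape, bias shape) for the five layers of Net
layer_shapes = [
    ("conv1", (6, 1, 3, 3), (6,)),
    ("conv2", (16, 6, 3, 3), (16,)),
    ("fc1", (120, 576), (120,)),
    ("fc2", (84, 120), (84,)),
    ("fc3", (10, 84), (10,)),
]

def numel(shape):
    """Number of scalar elements in a tensor of the given shape."""
    n = 1
    for s in shape:
        n *= s
    return n

# One weight tensor and one bias tensor per layer
num_tensors = 2 * len(layer_shapes)
# Total number of trainable scalars
total_params = sum(numel(w) + numel(b) for _, w, b in layer_shapes)

print(num_tensors, total_params)
```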
Feed some random $32 \times 32$ data into this network.
input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)
# ---Output---
#tensor([[-0.0703, 0.0575, -0.0679, -0.1168, -0.1093, 0.0815, -0.0085, 0.0408,
# 0.1275, 0.0472]], grad_fn=<AddmmBackward>)
The random input passes through the layers with their initial parameters and produces the output above.
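Why a $32 \times 32$ input matches `fc1` (which expects `16 * 6 * 6` features) can be traced with the usual output-size formulas. This is a minimal sketch assuming stride 1 and no padding for the convolutions and a pooling stride equal to the window size, which are the defaults used in the network above; the helper names are made up for the example.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, kernel):
    """Output size of max pooling whose stride equals the window size."""
    return size // kernel

size = 32
size = conv_out(size, 3)  # conv1: 32 -> 30
size = pool_out(size, 2)  # pool:  30 -> 15
size = conv_out(size, 3)  # conv2: 15 -> 13
size = pool_out(size, 2)  # pool:  13 -> 6

# conv2 has 16 output channels, so the flattened vector has 16 * 6 * 6 entries
flat_features = 16 * size * size
print(size, flat_features)
```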
The gradients of all parameters can be set to zero with the `zero_grad()` method. Because gradients accumulate, it is recommended to run `zero_grad()` before running `backward()` to avoid unintended parameter updates.
`torch.nn` assumes that its input is a mini-batch. For example, `nn.Conv2d` takes a 4-dimensional Tensor ($\mathrm{nSamples} \times \mathrm{nChannels} \times \mathrm{Height} \times \mathrm{Width}$) as input.
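When only a single sample is available, a batch dimension of size 1 can be added with `unsqueeze(0)`. The sketch below uses a random tensor as a stand-in for one single-channel $32 \times 32$ image.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 6, 3)

# One sample: nChannels x Height x Width
single = torch.randn(1, 32, 32)
# Add a leading batch dimension: nSamples x nChannels x Height x Width
batched = single.unsqueeze(0)

out = conv(batched)
print(batched.shape, out.shape)
```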
Commonly used loss functions such as `MSELoss()` and `CrossEntropyLoss()` are provided in the `nn` package.
Below, `MSELoss` is computed between the network output for a random input and a random target tensor of the same size.
input = torch.randn(1, 1, 32, 32)
output = net(input)
target = torch.randn(10) # a dummy target, for example
target = target.view(1,-1) # make it the same shape as output
criterion = nn.MSELoss()
loss = criterion(output, target)
print(loss)
# ---Output---
#tensor(0.5322, grad_fn=<MseLossBackward>)
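For reference, the mean squared error is the average of the squared element-wise differences. The plain-Python sketch below shows that arithmetic on made-up numbers; it mirrors the default mean reduction of `nn.MSELoss` and is not how PyTorch implements it internally.

```python
def mse(output, target):
    """Mean of squared differences between two equally long sequences."""
    assert len(output) == len(target)
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)

# Differences are 0, 2 and -2, so the loss is (0 + 4 + 4) / 3
example_loss = mse([1.0, 2.0, 3.0], [1.0, 0.0, 5.0])
print(example_loss)
```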
Tracing the forward computation up to this point gives the following graph:
input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss
Each step can be confirmed by following the `grad_fn` attribute backward from `loss`.
print(loss.grad_fn) # MSELoss
print(loss.grad_fn.next_functions[0][0]) # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0]) # first input of Linear (AccumulateGrad in the output below)
# ---Output---
#<MseLossBackward object at 0x7f5008a1c4e0>
#<AddmmBackward object at 0x7f5008a1c5c0>
#<AccumulateGrad object at 0x7f5008a1c4e0>
Updating the parameters by error backpropagation requires the gradient of the loss. In PyTorch, calling `loss.backward()` on the loss `loss` computes the gradients automatically.
Because gradients accumulate across calls, it is recommended to call `net.zero_grad()` in every training iteration to clear them.
net.zero_grad() # zeroes the gradient buffers of all parameters
print("conv1.bias.grad before backward")
print(net.conv1.bias.grad)
loss.backward()
print("conv1.bias.grad after backward")
print(net.conv1.bias.grad)
# ---Output---
#conv1.bias.grad before backward
#tensor([0., 0., 0., 0., 0., 0.])
#conv1.bias.grad after backward
#tensor([ 0.0072, -0.0051, -0.0008, -0.0017, 0.0043, -0.0030])
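The accumulation behaviour is easy to see on a single scalar. In the sketch below (values chosen arbitrarily), calling `backward()` twice without clearing adds the second gradient to the first, and zeroing the gradient restores the expected value.

```python
import torch

w = torch.tensor(1.0, requires_grad=True)

# d(3w)/dw = 3
(3 * w).backward()
first = w.grad.item()

# Second backward without zeroing: the new gradient is added to the old one
(3 * w).backward()
accumulated = w.grad.item()

# Clear the gradient, then compute it again
w.grad.zero_()
(3 * w).backward()
after_reset = w.grad.item()

print(first, accumulated, after_reset)
```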
Parameter-update (optimization) algorithms are available in `torch.optim`.
Here, we use stochastic gradient descent (SGD), defined by the following update rule.
weight = weight - learning_rate * gradient
See the official documentation for details -> torch.optim
import torch.optim as optim
# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)
# in your training loop:
optimizer.zero_grad() # zero the gradient buffers
output = net(input)
loss = criterion(output,target)
loss.backward()
optimizer.step() # do the update
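The SGD rule itself is a one-line update per parameter. The plain-Python sketch below applies it to two made-up weights; the function name `sgd_step` is hypothetical, and momentum and other options of `optim.SGD` are deliberately left out.

```python
def sgd_step(weights, grads, lr):
    """Apply weight = weight - learning_rate * gradient to each weight."""
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.5, -1.0]
grads = [2.0, -4.0]

# 0.5 - 0.1 * 2.0 and -1.0 - 0.1 * (-4.0)
updated = sgd_step(weights, grads, lr=0.1)
print(updated)
```

A positive gradient lowers the weight and a negative gradient raises it, so each step moves against the slope of the loss.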
Network training is performed by repeating steps 3 to 5 above.
As an example, we train a neural network that classifies images using CIFAR10. I referred to the official reference below. Training a Classifier -- PyTorch Tutorials 1.4.0 documentation
Load and normalize the CIFAR10 data provided in `torchvision.datasets`.
The torchvision datasets return PIL images. `ToTensor()` converts them to Tensors with values in the range [0, 1], and `Normalize` then rescales them to the range [-1, 1].
import torchvision
import torchvision.transforms as transforms
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
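The effect of `Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))` on a single pixel value can be checked by hand: each channel is mapped by (x - mean) / std. The sketch below is plain arithmetic on scalars, not the torchvision implementation, and `unnormalize` is a hypothetical helper showing the inverse mapping.

```python
def normalize(x, mean=0.5, std=0.5):
    """Per-channel normalization applied to one pixel value."""
    return (x - mean) / std

def unnormalize(y):
    """Inverse of normalize for mean = std = 0.5."""
    return y / 2 + 0.5

low = normalize(0.0)    # darkest pixel maps to the lower bound
high = normalize(1.0)   # brightest pixel maps to the upper bound
round_trip = unnormalize(normalize(0.25))

print(low, high, round_trip)
```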
Let's display the prepared data.
import matplotlib.pyplot as plt
import numpy as np
def imshow(img):
    img = img/2 + 0.5 # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1,2,0)))
    plt.show()

# get some random training images
dataiter = iter(trainloader)
images, labels = next(dataiter)
# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))
[Output]
Next, we build a network for classifying images.
import torch.nn as nn
import torch.nn.functional as F
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
Once the network is built, define the loss function and optimization method.
import torch.optim as optim
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
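For a single sample, the cross-entropy loss is the negative log of the softmax probability assigned to the correct class. The plain-Python sketch below computes that value for one sample; it ignores batching and the other options of `nn.CrossEntropyLoss`, and the function name and inputs are made up for the example.

```python
import math

def cross_entropy(logits, label):
    """-log(softmax(logits)[label]), computed in a numerically stable way."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]

# Ten equal scores: every class gets probability 1/10, so the loss is log(10)
uniform_loss = cross_entropy([0.0] * 10, 3)
# A high score on the correct class gives a small loss
confident_loss = cross_entropy([10.0, 0.0, 0.0], 0)

print(uniform_loss, confident_loss)
```

A network that has learned nothing about 10 classes therefore sits near a loss of about 2.30, which is a useful baseline when reading the training log.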
Once the network, loss function, and optimization method have been defined, training is started using the training data.
for epoch in range(2): # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999: # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss/2000))
            running_loss = 0.0
print('Finished Training')
# ---Output---
#[1, 2000] loss: 2.149
#[1, 4000] loss: 1.832
#[1, 6000] loss: 1.651
#[1, 8000] loss: 1.573
#[1, 10000] loss: 1.514
#[1, 12000] loss: 1.458
#[2, 2000] loss: 1.420
#[2, 4000] loss: 1.371
#[2, 6000] loss: 1.348
#[2, 8000] loss: 1.333
#[2, 10000] loss: 1.326
#[2, 12000] loss: 1.293
#Finished Training
Here, two passes (epochs) over the full training set are performed. With a batch size of 4, the 50,000 CIFAR10 training images form 12,500 mini-batches per epoch, and the running loss is printed every 2,000 mini-batches. The loss decreases as training proceeds, which shows that learning is progressing. (It seems that learning has not converged yet, but this time we will stop here and move on.)
The parameters of the trained model can be saved with `torch.save()`.
PATH = './cifar_net.pth'
torch.save(net.state_dict(), PATH)
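Saving a `state_dict` and restoring it into a fresh model of the same architecture should reproduce the parameters exactly. The sketch below checks this round trip on a tiny `nn.Linear` model, using an in-memory `io.BytesIO` buffer in place of a file path. Both the model and the buffer are stand-ins chosen for the example.

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Save the parameters to an in-memory buffer instead of a file
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)

# Load them into a new, independently initialized model
buffer.seek(0)
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

same = all(
    torch.equal(p, q)
    for p, q in zip(model.parameters(), restored.parameters())
)
print(same)
```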
Apply a trained network to the test data. First, check the contents of the test data.
dataiter = iter(testloader)
images, labels = next(dataiter)
imshow(torchvision.utils.make_grid(images))
print('GroundTruth: ', ' '.join('%5s' % classes[labels[j]] for j in range(4)))
[Output]
Then, load the saved network parameters. After that, feed the test data to the loaded model and display the classification results.
net = Net()
net.load_state_dict(torch.load(PATH))
# ---Output---
# <All keys matched successfully>
outputs = net(images)
_, predicted = torch.max(outputs, 1)
print('Predicted: ', ' '.join('%5s' % classes[predicted[j]] for j in range(4)))
# ---Output---
# Predicted: cat ship plane plane
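`torch.max(outputs, 1)` returns, for each row (sample), the largest score and the index at which it occurs. The index is used as the predicted class. A plain-Python equivalent on made-up scores looks like this; the function name `max_per_row` is hypothetical.

```python
def max_per_row(rows):
    """Return (largest value, its index) for every row, as two lists."""
    values, indices = [], []
    for row in rows:
        best = max(range(len(row)), key=row.__getitem__)
        indices.append(best)
        values.append(row[best])
    return values, indices

scores = [
    [0.1, 2.0, -1.0],  # highest score at index 1
    [3.0, 0.5, 0.2],   # highest score at index 0
]
values, predicted = max_per_row(scores)
print(values, predicted)
```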
The third image is misjudged as plane instead of ship, but the other three are correctly classified.
Let's calculate the accuracy over all 10,000 test images.
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100*correct/total))
# ---Output---
# Accuracy of the network on the 10000 test images: 52 %
The accuracy is 52%, which is not very high for an image classifier.
Next, compute the accuracy for each class.
class_correct = list(0. for i in range(10))
class_total = list(0. for i in range(10))
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        c = (predicted == labels).squeeze()
        for i in range(4):
            label = labels[i]
            class_correct[label] += c[i].item()
            class_total[label] += 1
for i in range(10):
    print('Accuracy of %5s : %2d %%' % (classes[i], 100*class_correct[i]/class_total[i]))
# ---Output---
# Accuracy of plane : 61 %
# Accuracy of car : 61 %
# Accuracy of bird : 52 %
# Accuracy of cat : 26 %
# Accuracy of deer : 34 %
# Accuracy of dog : 51 %
# Accuracy of frog : 67 %
# Accuracy of horse : 43 %
# Accuracy of ship : 76 %
# Accuracy of truck : 50 %
From this, we can see that the network is poor at classifying cats but good at classifying ships.
When training on a GPU, specify the CUDA device with `device`.
First, check whether a GPU is available. If the code below prints `cuda:0`, the GPU is available.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Assuming that we are on a CUDA machine, this should print a CUDA device:
print(device)
# ---Output---
# cuda:0
You can move the network and the data to the GPU with `.to(device)`.
During training, remember to move the data to the GPU in every iteration.
net.to(device)
inputs, labels = data[0].to(device), data[1].to(device)
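Put together, one device-aware training step might look like the sketch below. It is an illustration only: a tiny `nn.Linear` model and random tensors stand in for the CIFAR10 network and data, and the code falls back to the CPU when no GPU is present.

```python
import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Stand-in model (assumption for illustration), moved to the device once
model = nn.Linear(8, 2).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Stand-in mini-batch: 4 samples with 8 features and 2 possible classes
inputs = torch.randn(4, 8)
labels = torch.tensor([0, 1, 0, 1])

# Move each batch to the same device as the model before the forward pass
inputs, labels = inputs.to(device), labels.to(device)

optimizer.zero_grad()
loss = criterion(model(inputs), labels)
loss.backward()
optimizer.step()

print(device, loss.item())
```

The model is moved once before training, while every batch has to be moved inside the loop because the `DataLoader` yields CPU tensors.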
Finally, the above procedure is combined into a single script.
# import packages -------------------------------
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms
# prepare data ----------------------------------
transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)
testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
# define a network ------------------------------
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(6, 16, 5)
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)
        return x

net = Net()
# define loss function and optimizer -------------
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)
# start training ---------------------------------
for epoch in range(2): # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data
        # zero the parameter gradients
        optimizer.zero_grad()
        # forward + backward + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999: # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' % (epoch+1, i+1, running_loss/2000))
            running_loss = 0.0
print('Finished Training')
# check on test data ----------------------------
correct = 0
total = 0
with torch.no_grad():
    for data in testloader:
        images, labels = data
        outputs = net(images)
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()
print('Accuracy of the network on the 10000 test images: %d %%' % (100*correct/total))