In this article, I write down a basic implementation in PyTorch (partly as a memo for myself), using classification of CIFAR10 (a color image classification dataset) as the example.
The complete program is as follows.
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
ave = 0.5               # mean used for normalization
std = 0.5               # standard deviation used for normalization
batch_size_train = 256  # training batch size
batch_size_test = 16    # test batch size
val_ratio = 0.2         # ratio of validation data to the whole training data
epoch_num = 30          # number of training epochs
class Net(nn.Module):
    # definition of the network structure
    def __init__(self):
        super(Net, self).__init__()
        self.init_conv = nn.Conv2d(3, 16, 3, padding=1)
        self.conv1 = nn.ModuleList([nn.Conv2d(16, 16, 3, padding=1) for _ in range(3)])
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
        self.pool = nn.MaxPool2d(2, stride=2)
        self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
        self.output_fc = nn.Linear(32, 10)

    # forward computation
    def forward(self, x):
        x = F.relu(self.init_conv(x))
        for l, bn in zip(self.conv1, self.bn1):
            x = F.relu(bn(l(x)))
        x = self.pool(x)
        x = x.view(-1, 16*16*16)  # flatten
        for l in self.fc1:
            x = F.relu(l(x))
        x = self.output_fc(x)
        return x

def set_GPU():
    # GPU settings
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    return device
def load_data():
    # data loading
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((ave,), (std,))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

    # validation data split
    n_samples = len(train_set)
    val_size = int(n_samples * val_ratio)
    train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples - val_size), val_size])

    # DataLoader definition
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)

    return train_loader, test_loader, val_loader
def train():
    device = set_GPU()
    train_loader, test_loader, val_loader = load_data()

    model = Net()
    model.to(device)

    criterion = nn.CrossEntropyLoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)
    scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)

    min_loss = 999999999
    print("training start")
    for epoch in range(epoch_num):
        train_loss = 0.0
        val_loss = 0.0
        train_batches = 0
        val_batches = 0

        model.train()  # training mode
        for i, data in enumerate(train_loader):  # read one batch at a time
            inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]

            # reset gradients
            optimizer.zero_grad()

            outputs = model(inputs)            # forward computation
            loss = criterion(outputs, labels)  # loss computation
            loss.backward()                    # backward computation (gradients)
            optimizer.step()                   # parameter update

            # accumulate history
            train_loss += loss.item()
            train_batches += 1

        # validation loss computation
        model.eval()  # inference mode
        with torch.no_grad():
            for i, data in enumerate(val_loader):  # read one batch at a time
                inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]
                outputs = model(inputs)            # forward computation
                loss = criterion(outputs, labels)  # loss computation

                # accumulate history
                val_loss += loss.item()
                val_batches += 1

        # history output
        print('epoch %d train_loss: %.10f' %
              (epoch + 1, train_loss/train_batches))
        print('epoch %d val_loss: %.10f' %
              (epoch + 1, val_loss/val_batches))
        with open("history.csv", 'a') as f:
            print(str(epoch+1) + ',' + str(train_loss/train_batches) + ',' + str(val_loss/val_batches), file=f)

        # save the best model
        if min_loss > val_loss/val_batches:
            min_loss = val_loss/val_batches
            PATH = "best.pth"
            torch.save(model.state_dict(), PATH)

        # dynamic adjustment of the learning rate
        scheduler.step(val_loss/val_batches)

    # save the model of the final epoch
    print("training finished")
    PATH = "lastepoch.pth"
    torch.save(model.state_dict(), PATH)

if __name__ == "__main__":
    train()
The libraries used in this article are as follows.
import torch
import torchvision
import torchvision.transforms as transforms
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import numpy as np
Define constants to be used later.
ave = 0.5               # mean used for normalization
std = 0.5               # standard deviation used for normalization
batch_size_train = 256  # training batch size
batch_size_test = 16    # test batch size
val_ratio = 0.2         # ratio of validation data to the whole training data
epoch_num = 30          # number of training epochs
If you want to use a GPU, you need to set up the device before everything else.
def set_GPU():
    # GPU settings
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    print(device)
    return device
The GPU is used through this device object. For example,
data.to(device)
model.to(device)
moves the data and the neural network model onto the GPU.
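As a minimal, self-contained sketch (the tensor, layer, and shapes here are made up purely for illustration), the usual pattern is:

import torch
import torch.nn as nn

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

x = torch.randn(8, 3, 32, 32)            # a dummy batch of images
layer = nn.Conv2d(3, 16, 3, padding=1)   # a stand-in model

x = x.to(device)          # move the data to the GPU (or keep it on the CPU)
layer = layer.to(device)  # move the model parameters to the same device

y = layer(x)              # both operands now live on the same device
print(y.shape)            # torch.Size([8, 16, 32, 32])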
Several standard datasets are available out of the box in torchvision. For example, CIFAR10 can be prepared as follows.
def load_data():
    # data loading
    transform = transforms.Compose([transforms.ToTensor(), transforms.Normalize((ave,), (std,))])
    train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
    test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)

    # validation data split
    n_samples = len(train_set)
    val_size = int(n_samples * val_ratio)
    train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples - val_size), val_size])

    # DataLoader definition
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
    val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)

    return train_loader, test_loader, val_loader
Let me go through this in order. First, transform represents a series of preprocessing steps applied to the data.
transform = transforms.Compose([transforms.ToTensor(),transforms.Normalize((ave,),(std,))])
In the example above, transforms.ToTensor() converts the data to a Tensor (PyTorch's data type), and transforms.Normalize((ave,), (std,)) then normalizes it with mean ave and standard deviation std. Compose chains the individual steps into a single transform. Several other transformations are available; see the documentation.
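To see what the transform does, here is a small sketch; the randomly generated PIL image is made up for illustration, and the point is that pixel values go from [0, 255] to roughly [-1, 1]:

import numpy as np
from PIL import Image
import torchvision.transforms as transforms

ave, std = 0.5, 0.5
transform = transforms.Compose([transforms.ToTensor(),
                                transforms.Normalize((ave,), (std,))])

# a dummy 32x32 RGB image, just for illustration
img = Image.fromarray(np.random.randint(0, 256, (32, 32, 3), dtype=np.uint8))

x = transform(img)
print(x.shape)                          # torch.Size([3, 32, 32])
print(x.min().item(), x.max().item())   # roughly -1.0 ... 1.0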
Next, read the data of CIFAR10.
train_set = torchvision.datasets.CIFAR10(root='./data', train=True, download=True, transform=transform)
test_set = torchvision.datasets.CIFAR10(root='./data', train=False, download=True, transform=transform)
root specifies where the data is downloaded. The built-in datasets come already split into train and test parts, selected with the train option. Passing the transform defined above as an argument makes the dataset apply it to every sample.
I also want to split validation data off from the loaded training data. For that, torch.utils.data.random_split is used.
train_set, val_set = torch.utils.data.random_split(train_set, [(n_samples-val_size), val_size])
It randomly splits the dataset into subsets whose sizes are given by the list in the second argument.
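A minimal sketch with a made-up TensorDataset; the generator argument is optional and only shown here to make the split reproducible:

import torch
from torch.utils.data import TensorDataset, random_split

# a dummy dataset of 100 samples, just for illustration
full_set = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))

val_size = int(len(full_set) * 0.2)
train_set, val_set = random_split(
    full_set, [len(full_set) - val_size, val_size],
    generator=torch.Generator().manual_seed(0))  # fixed seed -> reproducible split

print(len(train_set), len(val_set))  # 80 20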
For training in PyTorch, DataLoader is very convenient. Create one as follows.
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size_train, shuffle=True, num_workers=2)
val_loader = torch.utils.data.DataLoader(val_set, batch_size=batch_size_train, shuffle=False, num_workers=2)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size_test, shuffle=False, num_workers=2)
Pass the dataset as the first argument. batch_size sets the batch size, shuffle controls whether the data is shuffled, and num_workers sets the number of worker subprocesses used for loading. Under the hood a DataLoader is an iterable, so during training you simply fetch one batch at a time with a for loop.
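The iteration pattern looks like this (dummy dataset made up for illustration; num_workers=0 keeps the sketch single-process):

import torch
from torch.utils.data import TensorDataset, DataLoader

dummy_set = TensorDataset(torch.randn(100, 3, 32, 32), torch.randint(0, 10, (100,)))
loader = DataLoader(dummy_set, batch_size=32, shuffle=True, num_workers=0)

for inputs, labels in loader:          # one batch per iteration
    print(inputs.shape, labels.shape)  # torch.Size([32, 3, 32, 32]) torch.Size([32])
    break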
There are many situations where you want to use your own data. In this case, you should define your own dataset class as follows. (Reference: https://qiita.com/mathlive/items/2a512831878b8018db02)
class MyDataset(torch.utils.data.Dataset):
    def __init__(self, data, label, transform=None):
        self.transform = transform
        self.data = data
        self.data_num = len(data)
        self.label = label

    def __len__(self):
        return self.data_num

    def __getitem__(self, idx):
        out_data = self.data[idx]
        out_label = self.label[idx]
        if self.transform:
            out_data = self.transform(out_data)
        return out_data, out_label
At minimum, define __len__ (which returns the size of the dataset) and __getitem__ (which returns one sample) in the class. With this,
dataset = MyDataset(input data, teacher labels, transform=required transform)
you can create a dataset. As an aside, looking at MyDataset gives a feel for what PyTorch does internally: given an index idx, it returns the corresponding sample, and if a transform is specified it is applied to that sample first. You can implement all sorts of behavior by customizing __getitem__ (for example, applying two different transforms), but that is omitted here.
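For example, a minimal usage sketch with made-up tensor data, assuming the MyDataset class above is already defined:

import torch
from torch.utils.data import DataLoader

# dummy data: 100 samples of 3x32x32 images and integer class labels
data = torch.randn(100, 3, 32, 32)
label = torch.randint(0, 10, (100,))

dataset = MyDataset(data, label)                  # no transform in this sketch
loader = DataLoader(dataset, batch_size=16, shuffle=True)

x, y = next(iter(loader))
print(x.shape, y.shape)  # torch.Size([16, 3, 32, 32]) torch.Size([16])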
PyTorch makes it easy to define neural networks using classes. For the CIFAR10 classifier, for example:
class Net(nn.Module):
    # definition of the network structure
    def __init__(self):
        super(Net, self).__init__()
        self.init_conv = nn.Conv2d(3, 16, 3, padding=1)
        self.conv1 = nn.ModuleList([nn.Conv2d(16, 16, 3, padding=1) for _ in range(3)])
        self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
        self.pool = nn.MaxPool2d(2, stride=2)
        self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
        self.output_fc = nn.Linear(32, 10)

    # forward computation
    def forward(self, x):
        x = F.relu(self.init_conv(x))
        for l, bn in zip(self.conv1, self.bn1):
            x = F.relu(bn(l(x)))
        x = self.pool(x)
        x = x.view(-1, 16*16*16)  # flatten
        for l in self.fc1:
            x = F.relu(l(x))
        x = self.output_fc(x)
        return x
Let me explain the pieces.
A network is created by inheriting from nn.Module, and each layer is defined as a member in __init__.
def __init__(self):
    super(Net, self).__init__()
    self.init_conv = nn.Conv2d(3, 16, 3, padding=1)
    self.conv1 = nn.ModuleList([nn.Conv2d(16, 16, 3, padding=1) for _ in range(3)])
    self.bn1 = nn.ModuleList([nn.BatchNorm2d(16) for _ in range(3)])
    self.pool = nn.MaxPool2d(2, stride=2)
    self.fc1 = nn.ModuleList([nn.Linear(16*16*16, 128), nn.Linear(128, 32)])
    self.output_fc = nn.Linear(32, 10)
For example, nn.Conv2d takes the following arguments.
nn.Conv2d(number of input channels, number of output channels, kernel size, padding=padding size, stride=stride)
See the official documentation for details. (https://pytorch.org/docs/stable/nn.html)
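As a quick sketch of how these arguments affect shapes (dummy input made up for illustration):

import torch
import torch.nn as nn

conv = nn.Conv2d(3, 16, 3, padding=1)   # 3 -> 16 channels, 3x3 kernel, padding keeps H and W
pool = nn.MaxPool2d(2, stride=2)        # halves H and W

x = torch.randn(1, 3, 32, 32)           # (batch, channels, height, width)
print(conv(x).shape)                    # torch.Size([1, 16, 32, 32])
print(pool(conv(x)).shape)              # torch.Size([1, 16, 16, 16])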
Network layers can also be held as a list using nn.ModuleList.
self.conv1 = nn.ModuleList([nn.Conv2d(16,16,3,padding=1) for _ in range(3)])
This is especially useful when defining a large network with a repeating structure. Note that if you store layers in a plain Python list instead of nn.ModuleList, their parameters are not registered with the module and therefore are not updated during training. (Reference: https://qiita.com/perrying/items/857df46bb6cdc3047bd8) Be sure to use nn.ModuleList for this.
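A small sketch of the difference; the two toy modules here are made up purely for demonstration:

import torch.nn as nn

class WithModuleList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = nn.ModuleList([nn.Linear(4, 4) for _ in range(3)])

class WithPlainList(nn.Module):
    def __init__(self):
        super().__init__()
        self.layers = [nn.Linear(4, 4) for _ in range(3)]  # not registered as parameters!

print(sum(p.numel() for p in WithModuleList().parameters()))  # 60 (3 x (4*4 + 4))
print(sum(p.numel() for p in WithPlainList().parameters()))   # 0 -> the optimizer would see nothing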
The forward computation is defined in forward.
def forward(self, x):
    x = F.relu(self.init_conv(x))
    for l, bn in zip(self.conv1, self.bn1):
        x = F.relu(bn(l(x)))
    x = self.pool(x)
    x = x.view(-1, 16*16*16)  # flatten
    for l in self.fc1:
        x = F.relu(l(x))
    x = self.output_fc(x)
    return x
The layers stored in the nn.ModuleList are pulled out with a for loop, which is another convenience. Partway through there is also
x = x.view(-1,16*16*16) # flatten
Here the image-shaped data is flattened into a one-dimensional vector per sample (the second argument is channels x image height x image width). The first argument is -1 so that this dimension is inferred automatically from the batch size.
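A small sketch of the flatten step (the dummy tensor is made up for illustration):

import torch

x = torch.randn(8, 16, 16, 16)   # (batch, channels, height, width) after pooling
x = x.view(-1, 16 * 16 * 16)     # -1 lets PyTorch infer the batch dimension
print(x.shape)                   # torch.Size([8, 4096])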
Loss functions live in torch.nn and optimizers in torch.optim; you simply instantiate and use them. Here CrossEntropyLoss is used as the classification loss and Adam as the optimizer.
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
To train efficiently, you may want to adjust (typically reduce) the learning rate dynamically. For that, use an lr_scheduler. For example,
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
define the scheduler like this. Then, after computing the loss on the validation data, call
scheduler.step(val_loss)
and if the loss does not improve for patience epochs, the learning rate is reduced automatically. This helps prevent training from stagnating.
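A minimal sketch of the pattern, with a stand-in model and a deliberately non-improving validation loss; the current learning rate can be read back through optimizer.param_groups:

import torch.nn as nn
import torch.optim as optim

model = nn.Linear(10, 2)  # stand-in model
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5)

for epoch in range(20):
    val_loss = 1.0            # pretend the validation loss has stopped improving
    scheduler.step(val_loss)  # once the loss stops improving for more than patience epochs, the LR drops
    print(epoch, optimizer.param_groups[0]['lr'])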
As with datasets, you may want to define your own loss function. This can be done either by inheriting from a PyTorch class or by writing a plain function. (Reference: https://kento1109.hatenablog.com/entry/2018/08/13/092939) For simple regression or classification tasks MSELoss or CrossEntropyLoss often works fine, but many machine learning papers get their performance from a carefully designed loss, so this is an important piece to know how to implement.
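As a sketch of the nn.Module style (this particular penalized cross entropy and the helper function are made up purely for illustration, not something from the original article):

import torch
import torch.nn as nn
import torch.nn.functional as F

# module style: cross entropy plus a confidence penalty (illustrative only)
class PenalizedCrossEntropy(nn.Module):
    def __init__(self, alpha=0.1):
        super().__init__()
        self.alpha = alpha

    def forward(self, outputs, targets):
        ce = F.cross_entropy(outputs, targets)
        probs = F.softmax(outputs, dim=1)
        entropy = -(probs * probs.clamp_min(1e-8).log()).sum(dim=1).mean()
        return ce - self.alpha * entropy  # discourage over-confident predictions

# plain-function style also works, as long as everything stays in torch operations
def my_mse(pred, target):
    return ((pred - target) ** 2).mean()

criterion = PenalizedCrossEntropy()
loss = criterion(torch.randn(4, 10), torch.randint(0, 10, (4,)))
print(loss.item())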
With everything prepared, we can finally start training. First, apply each of the settings defined so far.
device = set_GPU()
train_loader, test_loader, val_loader = load_data()
model = Net()
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, patience=5, verbose=True)
The training loop is as follows; the comments describe each step.
min_loss = 999999999
print("training start")
for epoch in range(epoch_num):
    train_loss = 0.0
    val_loss = 0.0
    train_batches = 0
    val_batches = 0

    model.train()  # training mode
    for i, data in enumerate(train_loader):  # read one batch at a time
        inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]

        # reset gradients
        optimizer.zero_grad()

        outputs = model(inputs)            # forward computation
        loss = criterion(outputs, labels)  # loss computation
        loss.backward()                    # backward computation (gradients)
        optimizer.step()                   # parameter update

        # accumulate history
        train_loss += loss.item()
        train_batches += 1

    # validation loss computation
    model.eval()  # inference mode
    with torch.no_grad():
        for i, data in enumerate(val_loader):  # read one batch at a time
            inputs, labels = data[0].to(device), data[1].to(device)  # data is a list of [inputs, labels]
            outputs = model(inputs)            # forward computation
            loss = criterion(outputs, labels)  # loss computation

            # accumulate history
            val_loss += loss.item()
            val_batches += 1

    # history output
    print('epoch %d train_loss: %.10f' %
          (epoch + 1, train_loss/train_batches))
    print('epoch %d val_loss: %.10f' %
          (epoch + 1, val_loss/val_batches))
    with open("history.csv", 'a') as f:
        print(str(epoch+1) + ',' + str(train_loss/train_batches) + ',' + str(val_loss/val_batches), file=f)

    # save the best model
    if min_loss > val_loss/val_batches:
        min_loss = val_loss/val_batches
        PATH = "best.pth"
        torch.save(model.state_dict(), PATH)

    # dynamic adjustment of the learning rate
    scheduler.step(val_loss/val_batches)

# save the model of the final epoch
print("training finished")
PATH = "lastepoch.pth"
torch.save(model.state_dict(), PATH)
In PyTorch, the loss computation and the backpropagation of the error must be written out explicitly (there are also libraries that wrap this up).
For testing the trained model, see the official tutorial (https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html). With this model, the per-class accuracy comes out at roughly 40 to 80% (the spread between classes is quite large...).
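A minimal evaluation sketch along those lines, assuming the Net class, load_data, and the best.pth file saved above (the structure follows this article rather than the tutorial exactly):

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
train_loader, test_loader, val_loader = load_data()

model = Net()
model.load_state_dict(torch.load("best.pth", map_location=device))
model.to(device)
model.eval()  # inference mode

correct = 0
total = 0
with torch.no_grad():
    for inputs, labels in test_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)
        predicted = outputs.argmax(dim=1)   # class with the highest score
        correct += (predicted == labels).sum().item()
        total += labels.size(0)

print("test accuracy: %.3f" % (correct / total))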
Batch Normalization and Dropout behave differently during training and inference, so you need to switch modes. During training and inference, respectively, call
model.train()
model.eval()
In addition, there is torch.no_grad(), a context in which gradient information is not tracked. Since no backward pass is performed during validation, the gradients are not needed, and skipping them speeds up computation and saves memory.
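Put together, a typical inference helper might look like this (a sketch; the stand-in model and input are made up for illustration):

import torch
import torch.nn as nn

def predict(model, inputs):
    model.eval()                  # switch BatchNorm/Dropout to inference behavior
    with torch.no_grad():         # do not track gradients
        outputs = model(inputs)
    return outputs.argmax(dim=1)

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))  # stand-in model
print(predict(model, torch.randn(4, 3, 32, 32)))  # tensor of 4 predicted class indices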
I ran out of steam partway through, so I may add more later. I hope this is useful as a reference.
References:
fukuit: https://qiita.com/fukuit/items/215ef75113d97560e599
perrying: https://qiita.com/perrying/items/857df46bb6cdc3047bd8
Official tutorial (CIFAR10 classifier): https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html
Official documentation (transforms): https://pytorch.org/docs/stable/torchvision/transforms.html
Official documentation (datasets): https://pytorch.org/docs/stable/torchvision/datasets.html
Official documentation (data loading): https://pytorch.org/docs/stable/data.html
mathlive: https://qiita.com/mathlive/items/2a512831878b8018db02
kento1109: https://kento1109.hatenablog.com/entry/2018/08/13/092939
Official documentation (nn.ModuleList): https://pytorch.org/docs/stable/generated/torch.nn.ModuleList.html
Official documentation (how to adjust the learning rate): https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
Official tutorial (saving and loading models): https://pytorch.org/tutorials/beginner/saving_loading_models.html
jyori112: https://qiita.com/jyori112/items/aad5703c1537c0139edb
PyTorch Forums (model.eval() vs torch.no_grad()): https://discuss.pytorch.org/t/model-eval-vs-with-torch-no-grad/19615
s0sem0y: https://www.hellocybernetics.tech/entry/2018/02/20/182906