I looked around at various sites to learn how to implement MNIST training with a CNN in PyTorch, but I could not find one that explains CNN + MNIST together, which I thought would make things difficult for a first-timer, so I decided to write this article.
In this article, I will focus on how to handle PyTorch and MNIST, and how to implement a CNN.
See this well-known article for a theoretical explanation of CNNs: https://qiita.com/icoxfog417/items/5fd55fad152231d706c2
This article assumes that you have read the article above or already have some knowledge of CNNs.
This article is aimed at:
・Those who understand Python
・Those who understand the theory but are not sure how to implement it
・Those who learned the outline in a university class but do not know how to actually implement it
Environment: Python 3.8.8, PyTorch 1.17.0
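If you want to check which versions you are running yourself, a quick check like the following (just a sketch) prints them:

```python
# Quick version check (a sketch; assumes a standard PyTorch/torchvision install)
import sys
import torch
import torchvision

print(sys.version)               # Python version
print(torch.__version__)         # PyTorch version
print(torchvision.__version__)   # torchvision version
```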
import.py
import matplotlib.pyplot as plt            # for visualizing images later
import torch                               # used later (e.g. torch.max)
import torchvision.transforms as transforms
from torch.utils.data import DataLoader
from torchvision.datasets import MNIST
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim                # optimizers such as SGD
MNIST is a research dataset for handwritten digit recognition; the name is an abbreviation of Mixed National Institute of Standards and Technology database. Each sample is a 28 × 28 pixel, 256-level grayscale image of a handwritten digit from 0 to 9, labeled with one of the ten classes 0–9. 60,000 such images are provided for training and 10,000 for testing. Because everyone uses the same dataset, it is easy to compare algorithms, and since the dataset is compact and training times are short, it has become established as a standard dataset for image recognition. (From Kotobank)
Simply put, it is a **collection of data prepared for research and learning**.
There are many ways to get MNIST, but you can download it easily with `torchvision`, a library in the PyTorch ecosystem. `torchvision` offers a variety of other datasets and useful modules, so it is worth remembering how to use it. At first, it is fine to think of it simply as **a library that provides data**.
Since we did `from torchvision.datasets import MNIST` at import time, the dataset can be created as shown below. The `root` argument selects the destination path for the dataset. Setting `train` to `True` selects the training data; setting it to `False` selects the test data. If `download` is set to `True`, the data will be downloaded; if `False`, it will not. If you are studying machine learning you will use this dataset many times, so downloading it is recommended.
torchvision.py
data = MNIST(root="./MNIST", train=True, download=True, transform=transforms.ToTensor())
Next, I will explain `DataLoader`. `DataLoader`, provided in `torch.utils.data`, is a class that makes it easy to split the data into mini-batches. The split here is not the split into training data and test data used for validation; it is what is called **mini-batch learning**. Simply put, instead of learning from all of the training data at once, the model learns from it **a few samples at a time**. There are several reasons for splitting the data, but for now it is enough to understand that processing all of the data in a single pass would be too heavy to compute.
Let's see how to use `DataLoader` below. The first argument selects which dataset to split (here, `data`). `batch_size` is set to 64 in this example, which means **each mini-batch contains 64 samples**. If `shuffle` is set to `True`, the data is split in random order; if `False`, it is split in order from the top of `data`.
dataloader.py
data_loader = DataLoader(data, batch_size=64, shuffle=True)
Using the method introduced above, I download MNIST as training data and test data and set `batch_size` to **4** for each.
dataload.py
train_data = MNIST(root="./MNIST", train=True, download=True, transform=transforms.ToTensor())
train_data_loader = DataLoader(train_data, batch_size=4, shuffle=True)
test_data = MNIST(root="./MNIST", train=False, download=True, transform=transforms.ToTensor())
test_data_loader = DataLoader(test_data, batch_size=4, shuffle=False)
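To get a feel for what the loader returns, here is a minimal peek at one mini-batch (a sketch; the shapes assume the `batch_size` of 4 and the loaders defined above):

```python
# Inspect one mini-batch from the training loader (assumes train_data_loader above)
images, labels = next(iter(train_data_loader))
print(images.shape)            # torch.Size([4, 1, 28, 28]): 4 images, 1 channel, 28x28 pixels
print(labels.shape)            # torch.Size([4]): one correct label per image
print(len(train_data_loader))  # 15000: number of mini-batches per epoch (60000 / 4)
```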
Finally, we get to the main part: building the network.
Before building the whole network, let's work out the sizes in the convolution part with a simple calculation. Since the input is MNIST, a **28x28** image is fed in. With a stride of 1, the output size of a convolution is (image size) - (filter size) + 1, so convolving with a 5x5 filter gives 28 - 5 + 1 = 24, turning the **28x28** image into **24x24**. Passing this through a **2x2** pooling layer halves the image size, giving **12x12**. A further convolution with a 3x3 filter gives 12 - 3 + 1 = 10, i.e. an image size of **10x10**. Finally, passing through a **2x2** pooling layer once more gives a final image size of **5x5**. All that remains is to multiply the number of pixels of this image (5x5 = 25) by the final number of output channels to get the number of inputs to the fully connected layer.
Please refer to the simple diagram of this series of steps. (Only the layers are shown; the activation functions etc. are omitted.)
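If you want to verify this arithmetic yourself, a minimal shape check like the following (a sketch using throwaway layers with the same settings as the network below) reproduces the numbers:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# dummy batch of one 1-channel 28x28 image
x = torch.zeros(1, 1, 28, 28)

# 5x5 convolution: 28 -> 24, then 2x2 pooling: 24 -> 12
x = F.max_pool2d(nn.Conv2d(1, 128, 5)(x), (2, 2))
print(x.shape)   # torch.Size([1, 128, 12, 12])

# 3x3 convolution: 12 -> 10, then 2x2 pooling: 10 -> 5
x = F.max_pool2d(nn.Conv2d(128, 256, 3)(x), (2, 2))
print(x.shape)   # torch.Size([1, 256, 5, 5]) -> 256 * 5 * 5 = 6400 inputs to the fully connected layer
```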
`__init__`
`Conv2d` takes the arguments `Conv2d(number of input channels, number of output channels, filter size)`. It is fine to think of the input/output channels here as playing a role similar to the units of an ordinary intermediate layer. (See the official documentation for the exact definition.) In other words, in the case of `self.conv1`, it means "128 outputs for 1 input". As mentioned in the article linked above, the filter takes a small region of the image and compresses (= convolves) it into one feature value. For `self.conv1`, **5** is given as the filter-size argument, so the filter (small region) is a 5x5 square.
`Linear` is the same as in an ordinary NN (neural network); you only specify the number of input and output units.
`forward`
`forward` describes the flow from the input layer to the output layer of the network, just as in an ordinary NN (neural network). The `max_pool2d` that appears here pools (roughly, compresses) the input values. This time both pooling layers use 2x2, so the input size is halved. Since the input is compressed and returned, writing `F.max_pool2d(F.relu(self.conv1(x)), (2,2))` produces an `x` that has been convolved and then further compressed. There is also the activation function `relu` in between, which I will not explain here; if you do not know it, it is enough to think of it as the processing required between layers. Finally, because this is a multi-class classification problem (three or more kinds of correct labels), the `log_softmax` function is used as the activation at the end.
network.py
import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super().__init__()
        # convolution layers: (input channels, output channels, filter size)
        self.conv1 = nn.Conv2d(1, 128, 5)     # 28x28 -> 24x24
        self.conv2 = nn.Conv2d(128, 256, 3)   # 12x12 -> 10x10
        # fully connected layers: 256 channels x 5x5 pixels -> 10 classes
        self.fc1 = nn.Linear(256 * 5 * 5, 150)
        self.fc2 = nn.Linear(150, 120)
        self.fc3 = nn.Linear(120, 10)

    def forward(self, x):
        # convolution -> ReLU -> 2x2 max pooling (halves the image size)
        x = F.max_pool2d(F.relu(self.conv1(x)), (2, 2))
        x = F.max_pool2d(F.relu(self.conv2(x)), (2, 2))
        # flatten to (batch size, 256 * 5 * 5) for the fully connected layers
        x = x.view(-1, 256 * 5 * 5)
        x = self.fc1(x)
        x = self.fc2(x)
        # log-probabilities for multi-class classification
        x = F.log_softmax(self.fc3(x), dim=1)
        return x

net = Net()
print(net)
-Execution result-
When executed, an outline of the created network is printed as shown below. Make sure that the items you specified are correct. Some items are set by default, so if you do not know a term you can ignore it; for example, `kernel_size` represents the filter size.
Net(
  (conv1): Conv2d(1, 128, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(128, 256, kernel_size=(3, 3), stride=(1, 1))
  (fc1): Linear(in_features=6400, out_features=150, bias=True)
  (fc2): Linear(in_features=150, out_features=120, bias=True)
  (fc3): Linear(in_features=120, out_features=10, bias=True)
)
This time we will use cross entropy as the loss function and SGD as the optimizer. Since the network already applies `log_softmax` at the end, we pass its output to `NLLLoss`, which together with `log_softmax` computes the cross-entropy loss.
optimizer.py
import torch.optim as optim

# the network outputs log-probabilities (log_softmax), so NLLLoss gives cross entropy
criterion = nn.NLLLoss()
optimizer = optim.SGD(net.parameters(), lr=0.01)
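As a sanity check (just a sketch, not part of the training code), `NLLLoss` applied to `log_softmax` outputs gives the same value as `CrossEntropyLoss` applied to the raw scores:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)           # dummy raw scores for 4 samples, 10 classes
targets = torch.tensor([3, 0, 7, 1])  # dummy correct labels

print(nn.NLLLoss()(F.log_softmax(logits, dim=1), targets))  # cross entropy via log_softmax + NLLLoss
print(nn.CrossEntropyLoss()(logits, targets))               # same value
```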
This time the number of epochs is set to 3, but you can change it to whatever value works best for you. When doing GPU computation with CUDA etc., training time becomes much shorter, so you can afford to increase the batch size and the number of epochs (for example, batch size 256 and 150 epochs). The loop itself is the same as for an ordinary NN (neural network), so I will omit a detailed explanation.
learn.py
epochs = 3

for epoch in range(epochs):
    running_loss = 0.0
    for i, data in enumerate(train_data_loader):
        # each element of the loader is (images, correct labels)
        train_data, teacher_labels = data
        # reset gradients, forward pass, compute loss, backpropagate, update weights
        optimizer.zero_grad()
        outputs = net(train_data)
        loss = criterion(outputs, teacher_labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()
        # print the average loss every 2000 mini-batches
        if i % 2000 == 1999:
            print("Learning progress: [{0},{1}]Learning loss: {2}".format(epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

print("End of learning")
-Execution result-
Learning progress: [1,2000]Learning loss: 0.595951408088862
Learning progress: [1,4000]Learning loss: 0.1649219517123347
Learning progress: [1,6000]Learning loss: 0.11794542767963322
Learning progress: [1,8000]Learning loss: 0.10531693481299771
Learning progress: [1,10000]Learning loss: 0.09142379220648672
Learning progress: [1,12000]Learning loss: 0.07094955143502511
Learning progress: [1,14000]Learning loss: 0.07791419489397276
Learning progress: [2,2000]Learning loss: 0.053966304615655415
Learning progress: [2,4000]Learning loss: 0.06185422517460961
Learning progress: [2,6000]Learning loss: 0.0553153860775642
Learning progress: [2,8000]Learning loss: 0.048434725125255054
Learning progress: [2,10000]Learning loss: 0.050712744873740806
Learning progress: [2,12000]Learning loss: 0.04531467049338937
Learning progress: [2,14000]Learning loss: 0.04466106877903383
Learning progress: [3,2000]Learning loss: 0.032806222373120246
Learning progress: [3,4000]Learning loss: 0.03608645232143033
Learning progress: [3,6000]Learning loss: 0.03568348353291483
Learning progress: [3,8000]Learning loss: 0.04021910582008564
Learning progress: [3,10000]Learning loss: 0.038381989276232556
Learning progress: [3,12000]Learning loss: 0.038247817724080646
Learning progress: [3,14000]Learning loss: 0.03719676028518558
End of learning
At the beginning we split the data into `train_data` and `test_data`, so now we evaluate the trained model on the held-out test data. `count` is the number of samples for which the trained model's prediction matches the correct label, and `total` is the total number of test samples; dividing the former by the latter gives the accuracy.
cross_validation.py
count = 0
total = 0

for data in test_data_loader:
    test_data, teacher_labels = data[0], data[1]
    results = net(test_data)
    # torch.max along dim=1 returns (max values, indices); the index is the predicted class
    _, predicted = torch.max(results.data, 1)
    count += (predicted == teacher_labels).sum().item()
    total += teacher_labels.size(0)

print("Correct answer rate: {0}/{1} -> {2}".format(count, total, (int(count) / int(total)) * 100))
Since we are using a CNN, the accuracy is as high as 99%.
-Execution result-
Correct answer rate: 9906/10000 -> 99.06
It is hard to get a real feel for the results from the accuracy number alone, so let's visualize the MNIST prediction data. Since `test_data_loader` is a PyTorch-specific iterable object, we first get an iterator from it and take one batch, which splits into the explanatory variables (images) and the objective variables (labels). (Of course, looping over it with a for statement is also fine.) After that, we obtain the predicted labels in the same way as in the evaluation above. Since MNIST consists of **28x28** handwritten digit images, we convert the test image to a NumPy array and `reshape` it to 28x28. You do not need to import NumPy yourself, because the tensor's `numpy()` method returns a NumPy array, on which `reshape` can be used directly.
result.py
# take one batch from the test data loader
test_itr = iter(test_data_loader)
test_data, labels = next(test_itr)

# predict the labels for the batch
results = net(test_data)
_, predicted = torch.max(results.data, 1)

# show the first image and its predicted label
plt.imshow(test_data[0].numpy().reshape(28, 28), cmap="inferno", interpolation="bicubic")
print("label: {0}".format(predicted[0]))
-Execution result-
label: 7
Thank you for reading this far. CNNs are a basic category of neural networks, and once you understand them they are easy to build on, so I recommend taking the time to understand them deeply. This time we trained on the ready-made MNIST dataset, but it can also be interesting to train on data you have prepared yourself. In that case there is some work to do before training, such as resizing the data and doing some preprocessing, but using your own data will deepen your understanding considerably, so if you are interested, I recommend trying it. If there are any mistakes in this article, I would appreciate it if you could let me know; I will correct them as they are found.