It's been about half a year since I moved from TensorFlow to PyTorch, so I'll summarize the basics. This time, I would like to focus on the following three points.
The pre-trained models currently available are:
To use a model pre-trained on ImageNet, load it as follows.
import torchvision
model = torchvision.models.alexnet(pretrained=True)
--Unless you set pretrained=True, the weights trained on ImageNet will not be loaded.
--Please note that the default is pretrained=False.
--Newer torchvision releases (0.13 and later) deprecate pretrained in favor of a weights argument, so check the documentation for your installed version.
--To check the structure of the model, call print(model). The following is the execution result.
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)
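To classify your own data, replace the final layer as follows. Take two-class classification as an example.
import torch.nn as nn
# Swap in a new layer; assigning model.classifier[6].out_features = 2 alone
# would change only the attribute shown by print, not the shape of the weights.
model.classifier[6] = nn.Linear(model.classifier[6].in_features, 2)
If you execute print(model) again, you can see that it has changed.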
AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=2, bias=True)
  )
)
Now let's get down to the main topic. This time we will implement a 1D CNN from scratch. Here is a simple example.
import torch
import torch.nn as nn

class Net1D(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering layers
        self.conv1 = nn.Conv1d(1, 8, kernel_size=3, stride=1)
        self.bn1 = nn.BatchNorm1d(8)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool1d(kernel_size=3, stride=2)
        self.conv2 = nn.Conv1d(8, 16, kernel_size=3, stride=1)
        self.bn2 = nn.BatchNorm1d(16)
        self.conv3 = nn.Conv1d(16, 64, kernel_size=3, stride=1)
        # Global average pooling; the attribute name gap is assumed
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(64, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.conv3(x)
        x = self.gap(x)              # (batch, 64, 1)
        x = x.view(x.size(0), -1)    # flatten to (batch, 64)
        x = self.fc(x)
        return x
If you want to see if this model works, try the following:
model = Net1D()
in_data = torch.randn(8, 1, 50)
out_data = model(in_data)
print(out_data.size())  # torch.Size([8, 2])
--Prepare suitable input data with torch.randn(). This method is convenient, and it also works for 2D models.
--The input is torch.randn(batch size, number of channels, length of the 1D array).
--The size of the output is torch.Size([8, 2]), which means torch.Size([batch size, final output size]).
--If you want to do a classification task, pass this output through softmax afterwards.
--There is also a convenient library called torchsummary that lets you check the size of each feature map, so please try it as well. I wrote an article about it before, so I will post a link.
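If you would rather not install an extra library, the feature map sizes can also be printed by hand, and softmax can be applied to the final output in one line. The following is a rough sketch under my own assumptions: `layers` is a hypothetical stand-in that only reuses the first three layer settings of the 1D model, and `logits` is random data standing in for the model output.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in with the same settings as the first layers of the 1D model
layers = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3, stride=1),
    nn.MaxPool1d(kernel_size=3, stride=2),
    nn.Conv1d(8, 16, kernel_size=3, stride=1),
)

x = torch.randn(8, 1, 50)  # (batch size, channels, length)
shapes = []
for layer in layers:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(layer.__class__.__name__, tuple(x.shape))
# Conv1d (8, 8, 48)
# MaxPool1d (8, 8, 23)
# Conv1d (8, 16, 21)

# Softmax over the class dimension turns logits into probabilities
logits = torch.randn(8, 2)  # stands in for the model output
probs = torch.softmax(logits, dim=1)
print(probs.sum(dim=1))  # every row sums to 1
```

Note that nn.CrossEntropyLoss expects raw logits, so softmax is usually applied only at inference time.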
nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
Parameter | Overview |
in_channels | Number of input channels. |
out_channels | Number of channels after convolution, i.e. the number of filters. |
kernel_size | Size of the kernel. |
stride | Step by which the kernel moves. The default is 1. |
padding | Amount of padding added to each end. If 1 is specified, one element is added at both ends, so the input length grows by 2. The default is 0. |
dilation | Spacing between kernel elements. Used in atrous (dilated) convolution etc. The default is 1. |
groups | Number of groups the channels are split into. The default is 1. Increasing it reduces the computation cost; in_channels and out_channels must both be divisible by groups. |
bias | Whether to include a bias term. The default is True. |
padding_mode | Padding mode. The default is 'zeros'. |
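How these parameters change the output length is easiest to see with numbers. The following sketch compares the output-length formula from the PyTorch documentation with what nn.Conv1d actually returns; the helper name `conv1d_out_len` is my own, not part of PyTorch.

```python
import torch
import torch.nn as nn

def conv1d_out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of Conv1d: floor((L + 2p - d*(k-1) - 1) / s) + 1."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

configs = [
    dict(kernel_size=3),              # plain convolution
    dict(kernel_size=3, padding=1),   # padding keeps the length
    dict(kernel_size=3, stride=2),    # stride roughly halves it
    dict(kernel_size=3, dilation=2),  # dilation widens the kernel
]

x = torch.randn(8, 1, 50)
results = []
for cfg in configs:
    conv = nn.Conv1d(1, 8, **cfg)
    actual = conv(x).shape[-1]
    expected = conv1d_out_len(50, **cfg)
    results.append((expected, actual))
    print(cfg, expected, actual)
# expected lengths: 48, 50, 24, 46
```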
nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
For num_features, enter the same number as the value of `out_channels` of the previous layer.
I wrote a simple 2D CNN sample. This time, the number of filters and the kernel sizes were chosen arbitrarily. When you create your own network, choose these values to suit your data and task.
import torch
import torch.nn as nn

class Net2D(nn.Module):
    def __init__(self):
        super().__init__()  # required before registering layers
        self.conv1 = nn.Conv2d(3, 16, kernel_size=3, stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=3, stride=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32, 64, kernel_size=3, stride=2)
        # Global average pooling; the attribute name gap is assumed
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(64, 32)
        self.fc2 = nn.Linear(32, 2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.conv3(x)
        x = self.gap(x)              # (batch, 64, 1, 1)
        x = x.view(x.size(0), -1)    # flatten to (batch, 64)
        x = self.fc1(x)
        x = self.fc2(x)
        return x
--When creating your own model, you need to inherit from nn.Module.
--As a rule, define the layers you use in `__init__`. I often see articles that define layers with parameters in `__init__` and call parameter-free ones only in forward, but then relu etc. are not displayed by print(model). So I define even parameter-free layers such as relu in `__init__`, as I did this time.
--forward determines the structure of the model.
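The difference in what print(model) shows can be confirmed with two tiny models. This is a small sketch with hypothetical class names (`WithModule`, `WithFunctional`); both compute the same thing, but only the first one lists ReLU in its printed structure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WithModule(nn.Module):
    """ReLU registered in __init__, so it appears in print(model)."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.fc(x))

class WithFunctional(nn.Module):
    """ReLU called only in forward, so print(model) does not show it."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        return F.relu(self.fc(x))

a, b = WithModule(), WithFunctional()
b.load_state_dict(a.state_dict())  # ReLU has no parameters, so the keys match

x = torch.randn(3, 4)
same_output = torch.equal(a(x), b(x))

print(a)
print(b)
print(same_output)  # True
```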
nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')
Parameter | Overview |
in_channels | Number of input channels. It is 3 for RGB images. |
out_channels | Number of channels after convolution, i.e. the number of filters. |
kernel_size | Size of the kernel. |
stride | Step by which the kernel moves. The default is 1. |
padding | Amount of padding added to each side. If 1 is specified, one pixel is added on every side, so the height and width each grow by 2. The default is 0. |
dilation | Spacing between kernel elements. Used in atrous (dilated) convolution etc. The default is 1. |
groups | Number of groups the channels are split into. The default is 1. Increasing it reduces the computation cost; in_channels and out_channels must both be divisible by groups. |
bias | Whether to include a bias term. The default is True. |
padding_mode | Padding mode. The default is 'zeros'. |
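To make the effect of groups concrete, the sketch below counts the parameters of the same convolution with groups=1 and groups=4. The channel counts and the `count_params` helper are just illustrative choices of mine.

```python
import torch
import torch.nn as nn

def count_params(module):
    """Total number of learnable values in a module."""
    return sum(p.numel() for p in module.parameters())

standard = nn.Conv2d(16, 32, kernel_size=3, padding=1)           # groups=1
grouped = nn.Conv2d(16, 32, kernel_size=3, padding=1, groups=4)  # 4 groups of 4 input channels

x = torch.randn(1, 16, 28, 28)
out_standard = standard(x)
out_grouped = grouped(x)

print(count_params(standard))  # 32*16*3*3 + 32 = 4640
print(count_params(grouped))   # 32*4*3*3 + 32 = 1184
print(out_standard.shape, out_grouped.shape)  # both torch.Size([1, 32, 28, 28])
```

The output shape is identical, but each filter in the grouped version only sees a quarter of the input channels, which is where the saving comes from.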
nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
--Batch normalization computes the mean and standard deviation over the batch and normalizes with them. After a convolution it normalizes each channel across the batch; after a fully connected layer it normalizes each unit.
--In addition, there are Layer Norm, Instance Norm, Group Norm, etc., so if you are interested, please look them up.
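The per-channel behavior is easy to verify numerically. In this sketch the input is deliberately shifted and scaled (mean 5, standard deviation 3, values I picked for illustration); in training mode, each channel of the output should come out with mean close to 0 and variance close to 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Input with mean 5 and standard deviation 3: (batch, channels, height, width)
x = torch.randn(16, 3, 8, 8) * 3.0 + 5.0

bn = nn.BatchNorm2d(3)  # num_features = number of channels
bn.train()              # use batch statistics rather than running averages

with torch.no_grad():
    y = bn(x)

# Statistics per channel, taken over the batch and spatial dimensions
mean = y.mean(dim=(0, 2, 3))
var = y.var(dim=(0, 2, 3), unbiased=False)
print(mean)  # close to 0 for every channel
print(var)   # close to 1 for every channel
```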
nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)
Use a pooling layer to emphasize the features: max pooling keeps only the largest value in each window while shrinking the feature map.
--There are two main usage patterns, so check below.
① When the size of the pool is square
m = nn.MaxPool2d(3, stride=2) #(pool of square window of size=3, stride=2)
② When you want to customize the size of the pool
m = nn.MaxPool2d((3, 2), stride=(2, 1)) #(pool of non-square window)
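The output sizes of the two patterns can be checked as follows. This is a small sketch; the 32x32 input is an arbitrary size I chose for illustration.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)  # (batch, channels, height, width)

square = nn.MaxPool2d(3, stride=2)                # 3x3 window, stride 2 in both directions
non_square = nn.MaxPool2d((3, 2), stride=(2, 1))  # 3x2 window, stride 2 vertically and 1 horizontally

out_square = square(x)
out_non_square = non_square(x)

print(out_square.shape)      # torch.Size([1, 3, 15, 15])
print(out_non_square.shape)  # torch.Size([1, 3, 15, 31])
```

Pooling does not change the number of channels, only the height and width.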
nn.AdaptiveMaxPool2d(output_size, return_indices=False)
With output_size=1, this is often called Global Max Pooling.
It is often used just before a fully connected layer, because it reduces each channel to a single value.
Set `output_size` to the desired output size per channel. I think `output_size=1` is used most often.
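As a quick check of what "a single value per channel" means, the sketch below applies nn.AdaptiveMaxPool2d(1) and compares it with taking the maximum over the spatial dimensions directly. The tensor sizes are arbitrary examples.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 64, 7, 7)  # (batch, channels, height, width)

gmp = nn.AdaptiveMaxPool2d(1)
pooled = gmp(x)                          # (2, 64, 1, 1)
flat = pooled.view(pooled.size(0), -1)   # (2, 64), ready for nn.Linear(64, ...)

# Same result as taking the max over height and width directly
manual = x.amax(dim=(2, 3))
matches = torch.equal(flat, manual)

print(pooled.shape)  # torch.Size([2, 64, 1, 1])
print(flat.shape)    # torch.Size([2, 64])
print(matches)       # True
```

Because the output size is fixed, the following fully connected layer works regardless of the input image size.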
nn.Linear(in_features, out_features, bias=True)
Specify in_features (size of each input sample) and out_features (size of each output sample). Use this when implementing a fully connected layer.
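Internally, a fully connected layer is just a matrix multiplication plus a bias. The following sketch reproduces the output of nn.Linear by hand to show how in_features and out_features map to the shape of the weight; the sizes are example values.

```python
import torch
import torch.nn as nn

fc = nn.Linear(64, 2)    # in_features=64, out_features=2
x = torch.randn(8, 64)   # (batch size, in_features)

with torch.no_grad():
    out = fc(x)
    manual = x @ fc.weight.T + fc.bias  # y = x W^T + b

print(fc.weight.shape)  # torch.Size([2, 64]) = (out_features, in_features)
print(out.shape)        # torch.Size([8, 2])
print(torch.allclose(out, manual, atol=1e-6))  # True
```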
It's been about half a year since I moved to PyTorch, and it's very easy to use. I hope this article will be of some help to you.