Introduction

It has been about half a year since I moved from TensorFlow to PyTorch, so I will summarize the basics. This article focuses on the following three points.

1. Use of pre-trained models
2. Implementation of 1D CNN
3. Implementation of 2D CNN

I will cover the 1D CNN only briefly and focus on the 2D CNN.
This article does not cover theory; it is implementation-centric.

1. Use of pre-trained models

The pre-trained models currently available in torchvision include:

AlexNet
VGG
ResNet
SqueezeNet
DenseNet
Inception v3
GoogLeNet
ShuffleNet v2
MobileNet v2
ResNeXt
Wide ResNet
MNasNet

See the torchvision.models documentation for details.

To use a model pre-trained on ImageNet, load it as follows.

import torchvision
model = torchvision.models.alexnet(pretrained=True)

- Unless you set pretrained=True, the weights trained on ImageNet are not loaded. Note that the default is pretrained=False.
- In torchvision 0.13 and later, the pretrained argument is deprecated in favor of weights (for example, weights="IMAGENET1K_V1").
- To check the structure of the model, run print(model). The following is the output.

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

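To classify your own data, replace the final layer as follows. Two-class classification is used as an example. Note that assigning to model.classifier[6].out_features only changes what print(model) displays; it does not resize the weight matrix, so the layer itself must be replaced.

import torch.nn as nn
model.classifier[6] = nn.Linear(4096, 2)

If you run print(model) again, you can see that it has changed.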

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=2, bias=True)
  )
)
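The following is a minimal sketch of swapping the last layer of a classifier head for a two-class task. To stay self-contained and avoid downloading weights, it rebuilds only the classifier part of AlexNet as a stand-in `nn.Sequential` with the layer sizes from the printout above; the variable names are illustrative, not part of torchvision.

```python
import torch
import torch.nn as nn

# Stand-in for AlexNet's classifier head (same layer sizes as the printout above).
classifier = nn.Sequential(
    nn.Dropout(p=0.5),
    nn.Linear(9216, 4096),
    nn.ReLU(inplace=True),
    nn.Dropout(p=0.5),
    nn.Linear(4096, 4096),
    nn.ReLU(inplace=True),
    nn.Linear(4096, 1000),
)

# Replace the final layer with a freshly initialized 2-class layer.
# Reading in_features from the old layer avoids hard-coding 4096.
in_features = classifier[6].in_features
classifier[6] = nn.Linear(in_features, 2)

classifier.eval()                      # disable dropout for the check
x = torch.randn(4, 9216)               # 4 flattened feature vectors
out = classifier(x)

head_out_shape = tuple(out.shape)
weight_shape = tuple(classifier[6].weight.shape)
print(head_out_shape, weight_shape)
```

The new layer starts from random weights, so it has to be trained on your own data even when the rest of the network is pre-trained.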

2. Implementation of 1D CNN

Now let's get to the main topic. Here we implement a 1D CNN from scratch. The following is a simple example.

import torch
import torch.nn as nn


class Net1D(nn.Module):
    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv1d(1, 8,kernel_size=3, stride=1)
        self.bn1 = nn.BatchNorm1d(8)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool1d(kernel_size=3, stride=2)

        self.conv2 = nn.Conv1d(8, 16,kernel_size=3, stride=1)
        self.bn2 = nn.BatchNorm1d(16)
        self.conv3 = nn.Conv1d(16,64,kernel_size=3, stride=1)
        self.gap = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(64,2)


    def forward(self,x):

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.conv3(x)
        x = self.gap(x)
        x = x.view(x.size(0),-1)
        x = self.fc(x)
 
        return x

To check that this model works, try the following:


model = Net1D()
in_data = torch.randn(8, 1, 50)
out_data = model(in_data)
print(out_data.size())  # torch.Size([8, 2])

- Prepare dummy input data with torch.randn(). This method is convenient, and it works for 2D as well.
- The input is torch.randn(batch size, number of channels, length of the 1D array). The output size is torch.Size([8, 2]), that is, torch.Size([batch size, number of outputs of the last layer]).
- For a classification task, pass the output through softmax after this.

- There is also a convenient library called torchsummary that lets you check the size of each feature map, so please try it as well. I wrote an article about it before, so here is the link.
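As a rough sketch of the softmax step, the snippet below converts a batch of raw outputs (logits) into class probabilities and predicted labels. The logits and targets are random stand-ins for a real model output and real labels. One assumption worth stating: when training with `F.cross_entropy` (or `nn.CrossEntropyLoss`), the raw logits are passed directly, because that loss applies log-softmax internally.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Stand-in for a model output of shape (batch size, number of classes).
logits = torch.randn(8, 2)

# Inference: softmax over the class dimension gives probabilities per sample.
probs = F.softmax(logits, dim=1)
pred = probs.argmax(dim=1)             # predicted class index per sample
row_sums = probs.sum(dim=1)            # each row sums to 1

# Training: pass raw logits (not probabilities) to cross_entropy.
targets = torch.randint(0, 2, (8,))    # random stand-in labels
loss = F.cross_entropy(logits, targets)

print(probs.shape, pred.shape, loss.item())
```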

===============================================================

Why torchsummary is seriously useful when building a model with PyTorch

===============================================================

【nn.Conv1d】

nn.Conv1d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Parameter	Overview
in_channels	Number of input channels.
out_channels	Number of channels after convolution, that is, the number of filters.
kernel_size	Size of the kernel.
stride	Step size by which the kernel moves.
padding	Amount of padding. It is added to both ends, so padding=1 increases the input length by 2. The default is 0.
dilation	Spacing between kernel elements. Used in atrous (dilated) convolution, etc.
groups	Number of groups the input and output channels are split into. The default is 1. Larger values reduce computation cost; in_channels and out_channels must both be divisible by groups.
bias	Whether to add a learnable bias. The default is True.
padding_mode	Padding mode. The default is 'zeros'.
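To make the effect of kernel_size, stride, and padding concrete, here is a small sketch that computes the output length with the standard formula and compares it with what the layers actually return. The helper `out_len` is a name introduced here for illustration; it is not part of PyTorch.

```python
import torch
import torch.nn as nn


def out_len(l_in, kernel_size, stride=1, padding=0, dilation=1):
    """Output length of Conv1d / MaxPool1d (ceil_mode=False)."""
    return (l_in + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1


x = torch.randn(8, 1, 50)                       # (batch, channels, length)

conv = nn.Conv1d(1, 8, kernel_size=3, stride=1)              # no padding
conv_same = nn.Conv1d(1, 8, kernel_size=3, stride=1, padding=1)
pool = nn.MaxPool1d(kernel_size=3, stride=2)

y = conv(x)            # length 50 -> 48
y_same = conv_same(x)  # padding=1 keeps length 50
z = pool(y)            # length 48 -> 23

print(y.shape, out_len(50, 3))
print(y_same.shape, out_len(50, 3, padding=1))
print(z.shape, out_len(48, 3, stride=2))
```

With the sample network above and an input of length 50, the same arithmetic gives 48, 23, 21, 10, and 8 for the successive conv and pool layers before the global average pooling reduces it to 1.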

【nn.BatchNorm1d】

nn.BatchNorm1d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

For num_features, specify the same value as out_channels of the previous layer.

3. 2D CNN

Here is a simple CNN sample. The number of filters and the kernel sizes are chosen arbitrarily here. When you design your own network, choose these values with your data and task in mind.


import torch
import torch.nn as nn

class Net2D(nn.Module):
    def __init__(self):
        super().__init__()

        self.conv1 = nn.Conv2d(3,16,kernel_size=3,stride=2)
        self.bn1 = nn.BatchNorm2d(16)
        self.relu = nn.ReLU()
        self.maxpool = nn.MaxPool2d(2)
        self.conv2 = nn.Conv2d(16,32,kernel_size=3,stride=2)
        self.bn2 = nn.BatchNorm2d(32)
        self.conv3 = nn.Conv2d(32,64,kernel_size=3,stride=2)
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.fc1 = nn.Linear(64,32)
        self.fc2 = nn.Linear(32,2)


    def forward(self,x):

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.maxpool(x)

        x = self.conv3(x)
        x = self.gap(x)
        x = x.view(x.size(0),-1)
        x = self.fc1(x)
        x = self.fc2(x)
        return x

- When creating your own model, you need to inherit from nn.Module.
- Basically, define the layers you use in __init__. I often see articles that define only the layers with parameters in __init__ and call the parameterless ones directly in forward, but then layers such as relu do not appear in print(model). For that reason I define even parameterless layers such as relu in __init__, as in this sample.

- forward defines how the input flows through the layers, that is, the structure of the model.
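The torch.randn check used for the 1D model also works in 2D. As a sketch, the snippet below rebuilds the same layer stack as a plain `nn.Sequential` (an equivalent stand-in for the class above, with `nn.Flatten` replacing the `view` call) and records the feature map size after each layer. The 224x224 input size is an assumption chosen for illustration.

```python
import torch
import torch.nn as nn

# Same layers, in the same order, as the forward pass of the sample 2D network.
layers = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, stride=2),
    nn.BatchNorm2d(32),
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, stride=2),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),                 # same effect as x.view(x.size(0), -1)
    nn.Linear(64, 32),
    nn.Linear(32, 2),
)

x = torch.randn(8, 3, 224, 224)   # (batch, channels, height, width)

shapes = []
for layer in layers:
    x = layer(x)
    shapes.append((layer.__class__.__name__, tuple(x.shape)))

for name, shape in shapes:
    print(f"{name:18s} {shape}")
```

Tracing shapes this way is a quick check that the in_features of the first fully connected layer matches the number of channels coming out of the pooling layer.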

【nn.Conv2d】


nn.Conv2d(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1, bias=True, padding_mode='zeros')

Parameter	Overview
in_channels	Number of input channels. It is 3 for RGB images.
out_channels	Number of channels after convolution, that is, the number of filters.
kernel_size	Size of the kernel.
stride	Step size by which the kernel moves.
padding	Amount of padding. It is added to both sides, so padding=1 increases the height and width by 2 each. The default is 0.
dilation	Spacing between kernel elements. Used in atrous (dilated) convolution, etc.
groups	Number of groups the input and output channels are split into. The default is 1. Larger values reduce computation cost; in_channels and out_channels must both be divisible by groups.
bias	Whether to add a learnable bias. The default is True.
padding_mode	Padding mode. The default is 'zeros'.
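To illustrate how groups reduces cost, the sketch below counts the parameters of three convolutions that differ only in groups. The channel counts (16 in, 32 out) and the helper `n_params` are chosen here for illustration.

```python
import torch
import torch.nn as nn


def n_params(module):
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


standard = nn.Conv2d(16, 32, kernel_size=3)              # groups=1
grouped = nn.Conv2d(16, 32, kernel_size=3, groups=4)     # 4 groups of 4 input channels
depthwise = nn.Conv2d(16, 32, kernel_size=3, groups=16)  # one input channel per group

# weights: out_channels * (in_channels / groups) * 3 * 3, plus 32 biases
print(n_params(standard))    # 32 * 16 * 9 + 32 = 4640
print(n_params(grouped))     # 32 *  4 * 9 + 32 = 1184
print(n_params(depthwise))   # 32 *  1 * 9 + 32 = 320

# The output shape does not depend on groups.
x = torch.randn(1, 16, 32, 32)
out_shapes = [tuple(m(x).shape) for m in (standard, grouped, depthwise)]
print(out_shapes)
```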

【nn.BatchNorm2d】

nn.BatchNorm2d(num_features, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)

- Batch normalization computes the mean and variance over the mini-batch and normalizes the activations with them. For convolutional layers, the statistics are computed per channel (over the batch and spatial dimensions). For fully connected layers, they are computed per unit.

- There are also Layer Norm, Instance Norm, Group Norm, etc., so look them up if you are interested.
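As a sketch of what "per channel" means, the snippet below reproduces the training-mode output of nn.BatchNorm2d by hand. It relies on the default initialization (scale 1, shift 0), so the affine step has no effect here; the input values are random stand-ins.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

bn = nn.BatchNorm2d(3)
bn.train()                                   # use batch statistics

x = torch.randn(8, 3, 5, 5) * 4 + 2          # (batch, channels, height, width)
y = bn(x)

# Statistics are taken over batch and spatial dims, separately for each channel.
mean = x.mean(dim=(0, 2, 3), keepdim=True)
var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)

matches = torch.allclose(y, manual, atol=1e-5)
channel_means = y.mean(dim=(0, 2, 3))        # close to 0 for every channel
print(matches, channel_means)
```

In eval mode the layer uses its running estimates instead of the statistics of the current batch, so remember to call model.eval() for inference.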

【nn.ReLU】

nn.ReLU(inplace=False)

ReLU(x) = max(0, x)

- ReLU is an activation function. Others include ReLU6, RReLU, SELU, CELU, and Sigmoid.
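A small sketch comparing ReLU with ReLU6 on a hand-picked tensor; ReLU6 additionally clips the output at 6.

```python
import torch
import torch.nn as nn

x = torch.tensor([-2.0, -0.5, 0.0, 3.0, 8.0])

relu_out = nn.ReLU()(x)     # negative values become 0
relu6_out = nn.ReLU6()(x)   # same, but values above 6 are clipped to 6

print(relu_out.tolist())    # [0.0, 0.0, 0.0, 3.0, 8.0]
print(relu6_out.tolist())   # [0.0, 0.0, 0.0, 3.0, 6.0]
```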

【nn.MaxPool2d】

nn.MaxPool2d(kernel_size, stride=None, padding=0, dilation=1, return_indices=False, ceil_mode=False)

A pooling layer downsamples the feature map while keeping the strongest responses, which emphasizes the features.

- There are two main usage patterns, shown below.

① When the size of the pool is square

m = nn.MaxPool2d(3, stride=2)  #(pool of square window of size=3, stride=2)

② When you want to customize the size of the pool

m = nn.MaxPool2d((3, 2), stride=(2, 1)) #(pool of non-square window)
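To see what the two patterns do to the feature map size, here is a minimal sketch on an 8x8 input; the input size is chosen only for illustration.

```python
import torch
import torch.nn as nn

x = torch.arange(64.0).reshape(1, 1, 8, 8)   # (batch, channels, height, width)

square = nn.MaxPool2d(3, stride=2)                 # 3x3 window, stride 2 in both directions
rect = nn.MaxPool2d((3, 2), stride=(2, 1))         # 3x2 window, stride 2 (height) and 1 (width)

out_square = square(x)   # height and width: (8 - 3) // 2 + 1 = 3
out_rect = rect(x)       # height: (8 - 3) // 2 + 1 = 3, width: (8 - 2) // 1 + 1 = 7

print(out_square.shape)  # torch.Size([1, 1, 3, 3])
print(out_rect.shape)    # torch.Size([1, 1, 3, 7])
```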

【nn.AdaptiveMaxPool2d】

nn.AdaptiveMaxPool2d(output_size, return_indices=False)

With output_size=1 this is often called Global Max Pooling. It is often used just before a fully connected layer, because it reduces each channel to a single value. Set output_size to the output size per channel; output_size=1 is the most common choice. The sample network above uses the averaging counterpart, nn.AdaptiveAvgPool2d, in the same way.
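The sketch below shows global max pooling with output_size=1: feature maps of different spatial sizes are both reduced to one value per channel, which is why the following fully connected layer can have a fixed in_features. The tensor sizes are arbitrary examples.

```python
import torch
import torch.nn as nn

gmp = nn.AdaptiveMaxPool2d(1)

a = torch.randn(8, 64, 6, 6)       # (batch, channels, height, width)
b = torch.randn(8, 64, 13, 9)      # different spatial size, same channels

out_a = gmp(a)                     # (8, 64, 1, 1)
out_b = gmp(b)                     # (8, 64, 1, 1)

# Flatten to (batch, channels) before a fully connected layer.
flat = out_a.view(out_a.size(0), -1)

# Equivalent by hand: the maximum over all spatial positions of each channel.
manual = a.flatten(2).max(dim=2).values
same = torch.equal(flat, manual)

print(out_a.shape, out_b.shape, flat.shape, same)
```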

【nn.Linear】


nn.Linear(in_features, out_features, bias=True)

Use this to implement a fully connected layer. Specify in_features (the size of each input vector) and out_features (the size of each output vector).
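As a minimal sketch of what nn.Linear computes, the snippet below checks its output against the explicit matrix form y = x W^T + b. The sizes 64 and 2 match the last layer of the 1D sample; the input is a random stand-in.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

fc = nn.Linear(64, 2)              # in_features=64, out_features=2
x = torch.randn(8, 64)             # (batch, in_features)

y = fc(x)                          # (batch, out_features)

# weight has shape (out_features, in_features), bias has shape (out_features,)
manual = x @ fc.weight.t() + fc.bias
same = torch.allclose(y, manual, atol=1e-5)

print(y.shape, tuple(fc.weight.shape), tuple(fc.bias.shape), same)
```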

In closing

It has been about half a year since I moved to PyTorch, and I find it very easy to use. I hope this article is of some help to you.

1D-CNN, 2D-CNN scratch implementation summary by Pytorch