This is the fifth installment of my walkthrough of the official PyTorch tutorials, following on from last time. It covers the second half of "Learning PyTorch with Examples"; Part 1 is here.
3.1. PyTorch: nn
For large neural networks, raw autograd is too low-level to build models conveniently, so the nn package provides higher-level building blocks. It defines Modules such as layers, the Sequential container that chains modules together (here, the input, hidden, and output layers), and common loss functions. The example below uses the nn package to implement a two-layer network.
import torch
# N: batch size
# D_in: input dimension
# H: hidden-layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and target (teacher) data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
# Use the nn package to define our model as a sequence of layers. nn.Sequential
# is a Module which contains other Modules and applies them in sequence to
# produce its output. Each Linear Module computes output from input using a
# linear function, and holds internal Tensors for its weight and bias.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
#The nn package also contains a loss function.
#This time, we will use mean squared error (MSE) as the loss function.
loss_fn = torch.nn.MSELoss(reduction='sum')
learning_rate = 1e-4
for t in range(500):
    # Forward pass: compute the predicted y by passing x to the model.
    # The base Module class overrides the __call__ operator, so the model can be
    # called like a function: pass it a Tensor of input data and it returns a
    # Tensor of output data.
    y_pred = model(x)
    # Compute and print the loss. The loss function takes the predicted y and the
    # target Tensor and returns a Tensor containing the loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    # Zero the gradients before running the backward pass.
    model.zero_grad()
    # Backward pass: compute the gradient of the loss with respect to all learnable
    # parameters. Internally, the parameters of each Module are stored with
    # requires_grad=True, so gradients are computed for all of them.
    loss.backward()
    # Update the weights using stochastic gradient descent.
    with torch.no_grad():
        for param in model.parameters():
            param -= learning_rate * param.grad
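Once the loop finishes, the same model call can be used for inference. A minimal sketch, assuming a hypothetical new input x_new (not part of the tutorial):
# Evaluate the trained model on new data without building an autograd graph.
x_new = torch.randn(1, D_in)      # one new input sample (hypothetical)
with torch.no_grad():
    y_new = model(x_new)          # forward pass only
print(y_new.size())               # torch.Size([1, 10])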
3.2. PyTorch: optim
So far, you have calculated and updated the model weights yourself as follows:
param -= learning_rate * param.grad
This update rule is (vanilla) stochastic gradient descent. In practice you often want a more sophisticated optimizer such as AdaGrad, RMSProp, or Adam, and PyTorch's optim package provides implementations of many such optimization algorithms. The following example defines the model with the nn package as in the example above, but updates the weights with the Adam algorithm from the optim package.
import torch
# N: batch size
# D_in: input dimension
# H: hidden-layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and target (teacher) data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
#Use the nn package to define the model and loss function.
model = torch.nn.Sequential(
    torch.nn.Linear(D_in, H),
    torch.nn.ReLU(),
    torch.nn.Linear(H, D_out),
)
loss_fn = torch.nn.MSELoss(reduction='sum')
#The optim package is used to define an optimization algorithm (optimizer) that updates model weights.
#We will use Adam here.
#The optim package contains many other optimization algorithms.
#The first argument to the Adam constructor specifies which Tensor to update.
learning_rate = 1e-4
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(500):
    # Forward pass: compute the predicted y by passing x to the model.
    y_pred = model(x)
    # Compute and print the loss.
    loss = loss_fn(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    # Zero the gradients of all the Tensors the optimizer will update (the model's
    # weights) before the backward pass. This is necessary because gradients are
    # accumulated (not overwritten) every time backward() is called.
    optimizer.zero_grad()
    # Backward pass: compute the gradient of the loss.
    loss.backward()
    # Calling the optimizer's step function updates the parameters.
    optimizer.step()
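Because the training loop only interacts with the optimizer through zero_grad() and step(), trying one of the other algorithms mentioned above is a one-line change. A minimal sketch (the learning rates are illustrative, not taken from the tutorial):
# Any one of these could replace the Adam optimizer above; the loop body is unchanged.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
optimizer = torch.optim.Adagrad(model.parameters(), lr=1e-2)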
3.3. PyTorch: Custom nn Modules
For models more complex than a simple sequence of layers, you can define your own module by subclassing nn.Module and overriding the forward function, which takes an input Tensor and returns an output Tensor. In this example, we implement the two-layer network as a custom Module subclass.
import torch
class TwoLayerNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we instantiate two nn.Linear modules and
        assign them as member variables.
        """
        super(TwoLayerNet, self).__init__()
        self.linear1 = torch.nn.Linear(D_in, H)
        self.linear2 = torch.nn.Linear(H, D_out)
    def forward(self, x):
        """
        The forward function must return an output Tensor computed from the input Tensor.
        We can use the modules defined in the constructor here.
        """
        h_relu = self.linear1(x).clamp(min=0)
        y_pred = self.linear2(h_relu)
        return y_pred
# N: batch size
# D_in: input dimension
# H: hidden-layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and target (teacher) data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
#Build a model by instantiating the neural network module defined above
model = TwoLayerNet(D_in, H, D_out)
# Define the loss function and the optimization algorithm (optimizer).
# Passing model.parameters() to the SGD constructor includes the parameters
# of the two nn.Linear modules that are members of the class defined above.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4)
for t in range(500):
    # Forward pass: compute the predicted y by passing x to the model.
    y_pred = model(x)
    # Compute and print the loss.
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    # Zero the gradients, run the backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
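To see that model.parameters() really includes the parameters of both nn.Linear members, you can list the registered parameters by name. A minimal sketch (the expected output is shown as comments):
# Each nn.Linear assigned in __init__ registers a weight and a bias with the module.
for name, param in model.named_parameters():
    print(name, tuple(param.shape))
# linear1.weight (100, 1000)
# linear1.bias (100,)
# linear2.weight (10, 100)
# linear2.bias (10,)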
3.4. PyTorch: Control Flow + Weight Sharing
As an example of dynamic graphs and weight sharing, we implement a somewhat unusual model: a ReLU network that, on every forward pass, chooses a random number between 0 and 3 and reuses the same hidden layer (and therefore the same weights) that many times. We implement this model as a Module subclass.
import random
import torch
class DynamicNet(torch.nn.Module):
    def __init__(self, D_in, H, D_out):
        """
        In the constructor we create the three nn.Linear instances
        (input layer, hidden layer, output layer) used in the forward pass.
        """
        super(DynamicNet, self).__init__()
        self.input_linear = torch.nn.Linear(D_in, H)
        self.middle_linear = torch.nn.Linear(H, H)
        self.output_linear = torch.nn.Linear(H, D_out)
    def forward(self, x):
        """
        The forward pass randomly chooses a number between 0 and 3 and reuses the
        middle_linear module that many times to compute the hidden layers.
        Because autograd builds the graph dynamically during forward propagation,
        ordinary Python control flow such as loops and conditionals can be used here,
        and the same module can be reused many times when defining the graph.
        This is an improvement over Lua Torch, where each module could be used only once.
        """
        h_relu = self.input_linear(x).clamp(min=0)
        for _ in range(random.randint(0, 3)):
            h_relu = self.middle_linear(h_relu).clamp(min=0)
            print(str(_))
            print(h_relu.size())
        y_pred = self.output_linear(h_relu)
        return y_pred
# N: batch size
# D_in: input dimension
# H: hidden-layer dimension
# D_out: output dimension
N, D_in, H, D_out = 64, 1000, 100, 10
# Create random input and target (teacher) data
x = torch.randn(N, D_in)
y = torch.randn(N, D_out)
#Build a model by instantiating the neural network module defined above
model = DynamicNet(D_in, H, D_out)
#Create a loss function and an optimizer.
# This model is hard to train (converge) with vanilla stochastic gradient descent,
# so we specify momentum.
criterion = torch.nn.MSELoss(reduction='sum')
optimizer = torch.optim.SGD(model.parameters(), lr=1e-4, momentum=0.9)
for t in range(500):
    # Forward pass: compute the predicted y by passing x to the model.
    y_pred = model(x)
    # Compute and print the loss.
    loss = criterion(y_pred, y)
    if t % 100 == 99:
        print(t, loss.item())
    # Zero the gradients, run the backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
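Because the depth is chosen at random on every forward pass, repeated calls on the same input can take different paths through the graph. A minimal sketch of checking this after training (the variable names out_a and out_b are illustrative):
# Two forward passes on the same input may reuse middle_linear a different number
# of times, so the outputs can differ between calls.
with torch.no_grad():
    out_a = model(x[:1])
    out_b = model(x[:1])
print(torch.allclose(out_a, out_b))  # often False, because the random depth differs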
That concludes the fifth tutorial, "Learning PyTorch with Examples". Working through it deepened my understanding of autograd, the torch.nn package, and the torch.optim package.
Next time, I would like to proceed with the sixth tutorial "What is torch.nn really?".
2020/07/10 First edition released
2020/07/10 Fixed the link to Part 1