This is the fifth installment of the PyTorch official tutorial series, following on from last time. This time, we will work through Learning PyTorch with Examples.
Learning PyTorch with Examples
This tutorial introduces the two main features of PyTorch through sample code:
- an n-dimensional Tensor, similar to numpy but able to run on GPUs
- automatic differentiation (autograd) for building and training neural networks
The network (model) used in the sample code has three layers (input layer, one hidden layer, output layer), with ReLU as the activation function.
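Written as equations (for reference, using the same names that appear in the code below), one forward pass and the loss are:

$$h = x W_1, \qquad h_{\mathrm{relu}} = \max(h, 0), \qquad \hat{y} = h_{\mathrm{relu}} W_2, \qquad L = \sum (\hat{y} - y)^2$$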
1.1. Warm-up: numpy
Before using PyTorch, let's first implement the network with numpy. Numpy has no built-in features for deep learning or gradients, but you can still build a simple neural network by implementing the forward and backward passes manually.
import numpy as np

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input data and teacher (target) data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Initialize the weights with random values
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward propagation: compute the predicted y with the current weights
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print the loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backpropagate the loss to compute the gradients of w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
When you run this code, you can see the loss value decreasing, which means training is progressing.
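For reference (this derivation is not in the tutorial), the manual gradient code above is just the chain rule applied to the loss $L = \sum (\hat{y} - y)^2$:

$$\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y), \qquad \frac{\partial L}{\partial W_2} = h_{\mathrm{relu}}^{\top} \frac{\partial L}{\partial \hat{y}}, \qquad \frac{\partial L}{\partial h_{\mathrm{relu}}} = \frac{\partial L}{\partial \hat{y}} W_2^{\top}$$

$$\frac{\partial L}{\partial h} = \frac{\partial L}{\partial h_{\mathrm{relu}}} \odot \mathbf{1}[h \geq 0], \qquad \frac{\partial L}{\partial W_1} = x^{\top} \frac{\partial L}{\partial h}$$

These correspond line by line to grad_y_pred, grad_w2, grad_h_relu, grad_h, and grad_w1 in the code.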
1.2. PyTorch: Tensors
Numpy cannot run its computations on a GPU, but PyTorch's Tensor can use the GPU to speed up numerical calculations. Tensors can also compute gradients automatically, but for now let's keep implementing backpropagation manually, as in the numpy example above.
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on the GPU.

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input data and teacher (target) data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Initialize the weights with random values
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward propagation: compute the predicted y with the current weights
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backpropagate the loss to compute the gradients of w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
With this code as well, you can see the loss decreasing and training progressing.
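Although it is not in the tutorial, a common pattern is to select the device automatically so the same script runs with or without a GPU (a small sketch):

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(64, 1000, device=device, dtype=torch.float)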
2.1. PyTorch: Tensors and autograd
In the example above, we implemented forward and backward propagation manually, but PyTorch's autograd package can automate the backpropagation calculation. Only two things are needed:
- Set requires_grad=True on each variable (Tensor) whose gradient you want to compute.
- Call backward() on the loss.
These two steps automate the backpropagation calculation.
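As a minimal illustration (not in the tutorial, using a toy tensor), these two steps are all autograd needs:

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = (a ** 2).sum()   # b = a_0^2 + a_1^2
b.backward()         # autograd computes db/da
print(a.grad)        # tensor([4., 6.]), i.e. 2 * a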
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on the GPU.

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random tensors to hold the input and teacher (target) data.
# The default requires_grad=False indicates that we do not need to compute
# gradients with respect to these tensors.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
# Setting requires_grad=True indicates that gradients will be computed for them.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward propagation: compute the predicted y using Tensor operations.
    # Since we no longer backpropagate manually, the intermediate value h_relu
    # does not need to be kept.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print the loss using Tensor operations.
    # loss is a Tensor holding a single value; loss.item() gets that scalar.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute backpropagation.
    # backward() computes the gradient of the loss with respect to every Tensor
    # with requires_grad=True. After this call, w1.grad and w2.grad are Tensors
    # holding the gradients of the loss with respect to w1 and w2.
    loss.backward()

    # Manually update the weights using gradient descent.
    # Because the weights have requires_grad=True, the update is wrapped in
    # torch.no_grad() so that it is not recorded in the computation graph.
    # The same thing can be done with torch.optim.SGD.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # After updating the weights, manually reset the gradients to zero
        w1.grad.zero_()
        w2.grad.zero_()
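As the comment above notes, the same update can be written with torch.optim.SGD. A minimal sketch (the optim package is covered properly later in the tutorial series), reusing x, y, w1, w2, and learning_rate from the code above:

optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    optimizer.zero_grad()  # reset the gradients held in w1.grad and w2.grad
    loss.backward()        # autograd computes the gradients
    optimizer.step()       # apply the SGD update to w1 and w2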
Although it is not in the tutorial, let's visualize the backpropagation computation graph. Computation graphs can be drawn with torchviz. If you are using Colaboratory, you need to install it first.
!pip install torchviz
Let's tweak the PyTorch: Tensors sample code a little, removing the loop so that the gradient is calculated only once.
# Create random input data and teacher (target) data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Initialize the weights with random values
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Forward propagation: compute the predicted y with the current weights
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)

# Compute the loss (kept as a Tensor rather than calling .item(),
# so that make_dot can visualize its computation graph)
loss = (y_pred - y).pow(2).sum()

# Backpropagate the loss to compute the gradients of w1 and w2 manually
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)
Let's draw the computation graph with torchviz's make_dot, visualizing both the forward propagation and the gradients. The param_dict argument is optional, but it lets the variable names appear in the diagram.
# Visualize the computation graph of the forward propagation.
from torchviz import make_dot
param_dict = {'w1': w1, 'w2': w2}
make_dot(loss, param_dict)

# Visualize the computation graph of the gradient of w1.
make_dot(grad_w1, param_dict)

# Visualize the computation graph of the gradient of w2.
make_dot(grad_w2, param_dict)
The calculation graph is below.
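Although it is not in the tutorial, make_dot returns a graphviz Digraph, so outside of a notebook you can save the diagram to a file (a sketch, assuming the graphviz binaries are installed):

dot = make_dot(loss, param_dict)
dot.render("forward_graph", format="png")  # writes forward_graph.png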
Similarly, modify the sample code from PyTorch: Tensors and autograd so that the gradient is calculated only once. Specifying create_graph=True when calling backward() preserves the graph of the derivative so that it can also be visualized.
import torch

# Create random tensors to hold the input and teacher (target) data.
# The default requires_grad=False indicates that we do not need gradients for them.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
# Setting requires_grad=True indicates that gradients will be computed for them.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Forward propagation: compute the predicted y using Tensor operations.
# The intermediate value h_relu does not need to be kept because we no longer
# backpropagate manually.
y_pred = x.mm(w1).clamp(min=0).mm(w2)

# Compute the loss using Tensor operations.
loss = (y_pred - y).pow(2).sum()

# Use autograd to compute backpropagation.
# backward() computes the gradient of the loss for every Tensor with
# requires_grad=True; afterwards w1.grad and w2.grad hold the gradients.
# backward() returns None, so there is no need to assign its result.
loss.backward(create_graph=True)
Similarly, let's visualize the forward propagation and the gradients computed by autograd.
# Visualize the computation graph of the forward propagation.
param_dict = {'w1': w1, 'w2': w2}
make_dot(loss, param_dict)

# Visualize the computation graph of the gradient of w1.
make_dot(w1.grad, param_dict)

# Visualize the computation graph of the gradient of w2.
make_dot(w2.grad, param_dict)
The forward propagation graph is the same. The backpropagation graph has a slightly different shape, but you can see that the backpropagation calculation is carried out automatically by autograd.
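Although it is not in the tutorial, torch.autograd.grad is another way to obtain gradients: it returns them directly instead of accumulating them in .grad, and with create_graph=True the returned gradients themselves carry a graph that make_dot can draw (a sketch, reusing loss, w1, w2, and param_dict from above):

g_w1, g_w2 = torch.autograd.grad(loss, [w1, w2], create_graph=True)
make_dot(g_w1, param_dict)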
2.2. PyTorch: Defining new autograd functions
In PyTorch, you can define your own function (operator) by defining a subclass of torch.autograd.Function. Implement the following two static methods in the subclass: forward, which computes the output from the input, and backward, which computes the gradient of the loss with respect to the input from the gradient with respect to the output.
In this example, we define our own autograd function that implements ReLU and use it in the two-layer network.
import torch

class MyReLU(torch.autograd.Function):
    """
    By subclassing torch.autograd.Function and implementing the forward
    and backward passes that operate on Tensors, you can implement your
    own custom autograd function.
    """

    @staticmethod
    def forward(ctx, input):
        """
        The forward pass receives a Tensor containing the input and
        returns a Tensor containing the output.
        ctx is a context object used for the backward computation.
        You can cache objects for the backward pass with the
        ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        The backward pass receives a Tensor containing the gradient of the
        loss with respect to the output, and must compute the gradient of
        the loss with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on the GPU.

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random tensors to hold the input and teacher (target) data.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply the function, use the Function.apply method.
    relu = MyReLU.apply

    # Forward propagation: compute the predicted y using the custom autograd function
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute backpropagation
    loss.backward()

    # Update the weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # After updating the weights, manually reset the gradients to zero
        w1.grad.zero_()
        w2.grad.zero_()
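Although it is not in the tutorial, you can sanity-check the backward implementation of a custom Function with torch.autograd.gradcheck, which compares it against numerical finite differences (a sketch; gradcheck expects double-precision inputs, and ReLU's kink at zero can occasionally make the check flaky):

test_input = torch.randn(8, 6, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(MyReLU.apply, (test_input,)))  # True if backward matches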
Let's also visualize the custom function. As before, modify the code so that the forward and backward passes run only once.
# Create random tensors to hold the input and teacher (target) data.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# To apply the function, use the Function.apply method.
relu = MyReLU.apply

# Forward propagation: compute the predicted y using the custom autograd function
y_pred = relu(x.mm(w1)).mm(w2)

# Compute the loss
loss = (y_pred - y).pow(2).sum()

# Use autograd to compute backpropagation, keeping the derivative graph
loss.backward(create_graph=True)
You can see that it results in a similar computation graph.
Since this post has become long, I will cover PyTorch: nn in the second part.
2020/05/27 First edition released