This is the fifth installment of the PyTorch official tutorial series, following on from last time. This time, we will work through Learning PyTorch with Examples.
Learning PyTorch with Examples
This tutorial introduces the two main features of PyTorch through sample code:
- an n-dimensional Tensor, similar to numpy but able to run on GPUs
- automatic differentiation (autograd) for building and training neural networks
The network (model) used in the sample code has three layers (input layer, one hidden layer, output layer), with ReLU as the activation function.
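Written as equations (for reference, using the same names that appear in the code below), one forward pass and the loss are:

$$h = x W_1, \qquad h_{\mathrm{relu}} = \max(h, 0), \qquad \hat{y} = h_{\mathrm{relu}} W_2, \qquad L = \sum (\hat{y} - y)^2$$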
1.1. Warm-up: numpy
Before using PyTorch, let's first implement the network with numpy. Numpy has no built-in features for deep learning or gradients, but you can still build a simple neural network by implementing the forward and backward passes manually.
import numpy as np

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input data and teacher (target) data
x = np.random.randn(N, D_in)
y = np.random.randn(N, D_out)

# Initialize the weights with random values
w1 = np.random.randn(D_in, H)
w2 = np.random.randn(H, D_out)

learning_rate = 1e-6
for t in range(500):
    # Forward propagation: compute the predicted y with the current weights
    h = x.dot(w1)
    h_relu = np.maximum(h, 0)
    y_pred = h_relu.dot(w2)

    # Compute and print the loss
    loss = np.square(y_pred - y).sum()
    print(t, loss)

    # Backpropagate the loss to compute the gradients of w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.T.dot(grad_y_pred)
    grad_h_relu = grad_y_pred.dot(w2.T)
    grad_h = grad_h_relu.copy()
    grad_h[h < 0] = 0
    grad_w1 = x.T.dot(grad_h)

    # Update the weights
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
When you run this code, you can see the loss value decreasing, which means training is progressing.
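For reference (this derivation is not in the tutorial), the manual gradient code above is just the chain rule applied to the loss $L = \sum (\hat{y} - y)^2$:

$$\frac{\partial L}{\partial \hat{y}} = 2(\hat{y} - y), \qquad \frac{\partial L}{\partial W_2} = h_{\mathrm{relu}}^{\top} \frac{\partial L}{\partial \hat{y}}, \qquad \frac{\partial L}{\partial h_{\mathrm{relu}}} = \frac{\partial L}{\partial \hat{y}} W_2^{\top}$$

$$\frac{\partial L}{\partial h} = \frac{\partial L}{\partial h_{\mathrm{relu}}} \odot \mathbf{1}[h \geq 0], \qquad \frac{\partial L}{\partial W_1} = x^{\top} \frac{\partial L}{\partial h}$$

These correspond line by line to grad_y_pred, grad_w2, grad_h_relu, grad_h, and grad_w1 in the code.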
1.2. PyTorch: Tensors
Numpy cannot run its computations on a GPU, but PyTorch's Tensor can use the GPU to speed up numerical calculations. Tensors can also compute gradients automatically, but for now let's keep implementing backpropagation manually, as in the numpy example above.
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on the GPU.

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random input data and teacher (target) data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Initialize the weights with random values
w1 = torch.randn(D_in, H, device=device, dtype=dtype)
w2 = torch.randn(H, D_out, device=device, dtype=dtype)

learning_rate = 1e-6
for t in range(500):
    # Forward propagation: compute the predicted y with the current weights
    h = x.mm(w1)
    h_relu = h.clamp(min=0)
    y_pred = h_relu.mm(w2)

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum().item()
    if t % 100 == 99:
        print(t, loss)

    # Backpropagate the loss to compute the gradients of w1 and w2
    grad_y_pred = 2.0 * (y_pred - y)
    grad_w2 = h_relu.t().mm(grad_y_pred)
    grad_h_relu = grad_y_pred.mm(w2.t())
    grad_h = grad_h_relu.clone()
    grad_h[h < 0] = 0
    grad_w1 = x.t().mm(grad_h)

    # Update the weights using gradient descent
    w1 -= learning_rate * grad_w1
    w2 -= learning_rate * grad_w2
With this code as well, you can see the loss decreasing and training progressing.
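Although it is not in the tutorial, a common pattern is to select the device automatically so the same script runs with or without a GPU (a small sketch):

import torch

# Use the GPU if one is available, otherwise fall back to the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
x = torch.randn(64, 1000, device=device, dtype=torch.float)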
2.1. PyTorch: Tensors and autograd
In the example above, we implemented forward and backward propagation manually, but PyTorch's autograd package can automate the backpropagation calculation. Only two things are needed:
- Set requires_grad=True on each variable (Tensor) whose gradient you want to compute.
- Call backward() on the loss.
These two steps automate the backpropagation calculation.
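As a minimal illustration (not in the tutorial, using a toy tensor), these two steps are all autograd needs:

import torch

a = torch.tensor([2.0, 3.0], requires_grad=True)
b = (a ** 2).sum()   # b = a_0^2 + a_1^2
b.backward()         # autograd computes db/da
print(a.grad)        # tensor([4., 6.]), i.e. 2 * a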
import torch

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on the GPU.

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random tensors to hold the input and teacher (target) data.
# The default requires_grad=False indicates that we do not need to compute
# gradients with respect to these tensors.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
# Setting requires_grad=True indicates that gradients will be computed for them.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # Forward propagation: compute the predicted y using Tensor operations.
    # Since we no longer backpropagate manually, the intermediate value h_relu
    # does not need to be kept.
    y_pred = x.mm(w1).clamp(min=0).mm(w2)

    # Compute and print the loss using Tensor operations.
    # loss is a Tensor holding a single value; loss.item() gets that scalar.
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute backpropagation.
    # backward() computes the gradient of the loss with respect to every Tensor
    # with requires_grad=True. After this call, w1.grad and w2.grad are Tensors
    # holding the gradients of the loss with respect to w1 and w2.
    loss.backward()

    # Manually update the weights using gradient descent.
    # Because the weights have requires_grad=True, the update is wrapped in
    # torch.no_grad() so that it is not recorded in the computation graph.
    # The same thing can be done with torch.optim.SGD.
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # After updating the weights, manually reset the gradients to zero
        w1.grad.zero_()
        w2.grad.zero_()
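As the comment above notes, the same update can be written with torch.optim.SGD. A minimal sketch (the optim package is covered properly later in the tutorial series), reusing x, y, w1, w2, and learning_rate from the code above:

optimizer = torch.optim.SGD([w1, w2], lr=learning_rate)
for t in range(500):
    y_pred = x.mm(w1).clamp(min=0).mm(w2)
    loss = (y_pred - y).pow(2).sum()

    optimizer.zero_grad()  # reset the gradients held in w1.grad and w2.grad
    loss.backward()        # autograd computes the gradients
    optimizer.step()       # apply the SGD update to w1 and w2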
Although it is not in the tutorial, let's visualize the backpropagation computation graph. Computation graphs can be drawn with torchviz. If you are using Colaboratory, you need to install it first.
!pip install torchviz
Let's tweak the PyTorch: Tensors sample code a little, removing the loop so that the gradient is calculated only once.
# Create random input data and teacher (target) data
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Initialize the weights with random values
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Forward propagation: compute the predicted y with the current weights
h = x.mm(w1)
h_relu = h.clamp(min=0)
y_pred = h_relu.mm(w2)

# Compute the loss (kept as a Tensor rather than calling .item(),
# so that make_dot can visualize its computation graph)
loss = (y_pred - y).pow(2).sum()

# Backpropagate the loss to compute the gradients of w1 and w2 manually
grad_y_pred = 2.0 * (y_pred - y)
grad_w2 = h_relu.t().mm(grad_y_pred)
grad_h_relu = grad_y_pred.mm(w2.t())
grad_h = grad_h_relu.clone()
grad_h[h < 0] = 0
grad_w1 = x.t().mm(grad_h)
Let's draw the computation graph with torchviz's make_dot, visualizing both the forward propagation and the gradients. The param_dict argument is optional, but it lets the variable names appear in the diagram.
# Visualize the computation graph of the forward propagation.
from torchviz import make_dot
param_dict = {'w1': w1, 'w2': w2}
make_dot(loss, param_dict)

# Visualize the computation graph of the gradient of w1.
make_dot(grad_w1, param_dict)

# Visualize the computation graph of the gradient of w2.
make_dot(grad_w2, param_dict)
The calculation graph is below.
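Although it is not in the tutorial, make_dot returns a graphviz Digraph, so outside of a notebook you can save the diagram to a file (a sketch, assuming the graphviz binaries are installed):

dot = make_dot(loss, param_dict)
dot.render("forward_graph", format="png")  # writes forward_graph.png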
Similarly, modify the sample code from PyTorch: Tensors and autograd so that the gradient is calculated only once. Specifying create_graph=True when calling backward() preserves the graph of the derivative so that it can also be visualized.
import torch

# Create random tensors to hold the input and teacher (target) data.
# The default requires_grad=False indicates that we do not need gradients for them.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
# Setting requires_grad=True indicates that gradients will be computed for them.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# Forward propagation: compute the predicted y using Tensor operations.
# The intermediate value h_relu does not need to be kept because we no longer
# backpropagate manually.
y_pred = x.mm(w1).clamp(min=0).mm(w2)

# Compute the loss using Tensor operations.
loss = (y_pred - y).pow(2).sum()

# Use autograd to compute backpropagation.
# backward() computes the gradient of the loss for every Tensor with
# requires_grad=True; afterwards w1.grad and w2.grad hold the gradients.
# backward() returns None, so there is no need to assign its result.
loss.backward(create_graph=True)
Similarly, let's visualize the forward propagation and the gradients computed by autograd.
# Visualize the computation graph of the forward propagation.
param_dict = {'w1': w1, 'w2': w2}
make_dot(loss, param_dict)

# Visualize the computation graph of the gradient of w1.
make_dot(w1.grad, param_dict)

# Visualize the computation graph of the gradient of w2.
make_dot(w2.grad, param_dict)
The forward propagation graph is the same. The backpropagation graph has a slightly different shape, but you can see that the backpropagation calculation is carried out automatically by autograd.
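Although it is not in the tutorial, torch.autograd.grad is another way to obtain gradients: it returns them directly instead of accumulating them in .grad, and with create_graph=True the returned gradients themselves carry a graph that make_dot can draw (a sketch, reusing loss, w1, w2, and param_dict from above):

g_w1, g_w2 = torch.autograd.grad(loss, [w1, w2], create_graph=True)
make_dot(g_w1, param_dict)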
2.2. PyTorch: Defining new autograd functions
In PyTorch, you can define your own function (operator) by defining a subclass of torch.autograd.Function. Implement the following two static methods in the subclass: forward, which computes the output from the input, and backward, which computes the gradient of the loss with respect to the input from the gradient with respect to the output.
In this example, we define our own autograd function that implements ReLU and use it in the two-layer network.
import torch

class MyReLU(torch.autograd.Function):
    """
    By subclassing torch.autograd.Function and implementing the forward
    and backward passes that operate on Tensors, you can implement your
    own custom autograd function.
    """

    @staticmethod
    def forward(ctx, input):
        """
        The forward pass receives a Tensor containing the input and
        returns a Tensor containing the output.
        ctx is a context object used for the backward computation.
        You can cache objects for the backward pass with the
        ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        """
        The backward pass receives a Tensor containing the gradient of the
        loss with respect to the output, and must compute the gradient of
        the loss with respect to the input.
        """
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0
        return grad_input

dtype = torch.float
device = torch.device("cpu")
# device = torch.device("cuda:0")  # Uncomment this to run on the GPU.

# N: Batch size
# D_in: Number of input dimensions
# H: Number of hidden-layer dimensions
# D_out: Number of output dimensions
N, D_in, H, D_out = 64, 1000, 100, 10

# Create random tensors to hold the input and teacher (target) data.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

learning_rate = 1e-6
for t in range(500):
    # To apply the function, use the Function.apply method.
    relu = MyReLU.apply

    # Forward propagation: compute the predicted y using the custom autograd function
    y_pred = relu(x.mm(w1)).mm(w2)

    # Compute and print the loss
    loss = (y_pred - y).pow(2).sum()
    if t % 100 == 99:
        print(t, loss.item())

    # Use autograd to compute backpropagation
    loss.backward()

    # Update the weights using gradient descent
    with torch.no_grad():
        w1 -= learning_rate * w1.grad
        w2 -= learning_rate * w2.grad

        # After updating the weights, manually reset the gradients to zero
        w1.grad.zero_()
        w2.grad.zero_()
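Although it is not in the tutorial, you can sanity-check the backward implementation of a custom Function with torch.autograd.gradcheck, which compares it against numerical finite differences (a sketch; gradcheck expects double-precision inputs, and ReLU's kink at zero can occasionally make the check flaky):

test_input = torch.randn(8, 6, dtype=torch.double, requires_grad=True)
print(torch.autograd.gradcheck(MyReLU.apply, (test_input,)))  # True if backward matches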
Let's also visualize the custom function. As before, modify the code so that the forward and backward passes run only once.
# Create random tensors to hold the input and teacher (target) data.
x = torch.randn(N, D_in, device=device, dtype=dtype)
y = torch.randn(N, D_out, device=device, dtype=dtype)

# Create random tensors to hold the weights.
w1 = torch.randn(D_in, H, device=device, dtype=dtype, requires_grad=True)
w2 = torch.randn(H, D_out, device=device, dtype=dtype, requires_grad=True)

# To apply the function, use the Function.apply method.
relu = MyReLU.apply

# Forward propagation: compute the predicted y using the custom autograd function
y_pred = relu(x.mm(w1)).mm(w2)

# Compute the loss
loss = (y_pred - y).pow(2).sum()

# Use autograd to compute backpropagation, keeping the derivative graph
loss.backward(create_graph=True)
You can see that it results in a similar computation graph.
Since this post has become long, I will cover PyTorch: nn in the second part.
2020/05/27 First edition released