Try the new scheduler chaining in PyTorch 1.4

Introduction

Have you upgraded to PyTorch 1.4 yet? If you haven't, you can find the official upgrade instructions here. (Note that the Python 2 series will no longer be supported from the next version.)

PyTorch 1.4 quietly added **scheduler chaining** as a new feature (release notes here), so let's try it right away.

What is a scheduler?

With a scheduler, the learning rate can be changed at every epoch. A higher learning rate makes training progress faster, but if it stays too high there is a risk of jumping over the optimum. It is therefore standard practice when training a neural network to use a scheduler and gradually lower the learning rate as the epochs go by. (Not directly related to this topic, but note that unlike Keras and others, PyTorch's schedulers are implemented around a multiplicative factor applied to the original learning rate.)
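
For reference, here is a minimal sketch of the usual pattern (the model and numbers are placeholders chosen only for illustration): build the scheduler from the optimizer and call step() once per epoch, after the optimizer update.

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR

model = torch.nn.Linear(10, 1)                           # dummy model
optimizer = SGD(model.parameters(), lr=0.1)
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)   # halve the lr every 10 epochs
for epoch in range(30):
    # ... forward pass, loss.backward(), etc. would go here ...
    optimizer.step()
    scheduler.step()                                     # update the learning rate once per epoch
    print(epoch, optimizer.param_groups[0]['lr'])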

What is the new chaining feature?

According to the official description, it is **"a feature that lets you define two schedulers and step them one after another so that their effects are combined"**. Until now, only one scheduler at a time could determine the learning rate, but it seems combinations are now possible. The description alone is not very intuitive, so let's actually run it and see how the learning rate changes.

First, check how it behaved in PyTorch 1.3

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = StepLR(optimizer, step_size=3, gamma=0.5)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
s1, s2, lr = [], [], []
for epoch in range(100):
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
    s1.append(scheduler1.get_lr()[0])
    s2.append(scheduler2.get_lr()[0])
    for param_group in optimizer.param_groups:
        lr.append(param_group['lr'])

We use two schedulers, StepLR and ExponentialLR; let's call them scheduler1 and scheduler2, respectively. Now plot the learning rates (s1, s2) recorded from each scheduler.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.plot(s1, label='StepLR (scheduler1)')
plt.plot(s2, label='ExponentialLR (scheduler2)')
plt.legend()

[Plot: learning rates of StepLR (scheduler1) and ExponentialLR (scheduler2), PyTorch 1.3]

You can see the characteristics of each scheduler. Next, plot the learning rate actually held by the optimizer.

plt.plot(lr, label='Learning Rate')
plt.legend()

[Plot: the optimizer's learning rate, PyTorch 1.3]

As you can see at a glance, only the learning rate from ExponentialLR (scheduler2) is used. Apparently, PyTorch 1.3 adopted the learning rate of whichever scheduler had its step() called last. The two schedulers also operated independently, without influencing each other.
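
A quick way to confirm this from the lists collected above (a small sketch assuming the PyTorch 1.3 run): the optimizer's learning rate should coincide with scheduler2's values at every epoch.

import math
# On PyTorch 1.3 this should print True: the last step() call (scheduler2) wins
print(all(math.isclose(a, b) for a, b in zip(lr, s2)))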

Finally, confirm the effect of chaining with PyTorch 1.4

import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR
model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = StepLR(optimizer, step_size=3, gamma=0.5)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)
s1, s2, lr = [], [], []
for epoch in range(100):
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
    s1.append(scheduler1.get_last_lr()[0])
    s2.append(scheduler2.get_last_lr()[0])
    for param_group in optimizer.param_groups:
        lr.append(param_group['lr'])

**Note that in PyTorch 1.3 the scheduler's learning rate was obtained with .get_lr(), but in PyTorch 1.4 it is .get_last_lr().** You can still call .get_lr(), but be aware that it may not return the correct value. This change was officially announced.

Now, let's plot the learning rate of each scheduler.

import matplotlib.pyplot as plt
import seaborn as sns
sns.set()
plt.plot(s1, label='StepLR (scheduler1)')
plt.plot(s2, label='ExponentialLR (scheduler2)')
plt.legend()

[Plot: learning rates of StepLR (scheduler1) and ExponentialLR (scheduler2), PyTorch 1.4]

Already at this point the situation differs from 1.3. Next, plot the optimizer's learning rate.

plt.plot(lr, label='Learning Rate')
plt.legend()

[Plot: the optimizer's learning rate, PyTorch 1.4]

As the plot shows, the multiplicative factors of the two schedulers are applied one after another, and their product gives the optimizer's final learning rate. In other words, the optimizer's learning rate changes under the combined influence of both schedulers as each of them evolves.
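
As a sanity check (a sketch that assumes the lists collected in the PyTorch 1.4 run above), the optimizer's learning rate at each epoch should equal the initial learning rate of 0.1 multiplied by the factors accumulated by both schedulers: 0.5 from StepLR every third step and 0.9 from ExponentialLR every step.

import math
expected, factor1, factor2 = [], 1.0, 1.0
for epoch in range(100):
    if (epoch + 1) % 3 == 0:   # StepLR contributes a factor of 0.5 every 3 steps
        factor1 *= 0.5
    factor2 *= 0.9             # ExponentialLR contributes a factor of 0.9 every step
    expected.append(0.1 * factor1 * factor2)
print(all(math.isclose(a, b) for a, b in zip(lr, expected)))   # should print True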

This should make it easier to express schedules that change the learning rate in complicated ways, which until now you had to write yourself. It seems especially useful when you want to add a little cyclical behavior, as sketched below.
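
For example, one possible combination (just a sketch, not taken from the release notes; the schedulers and numbers here are arbitrary choices) is to chain a slow ExponentialLR decay with CosineAnnealingLR so that the learning rate oscillates while trending downward. Whether the resulting curve is what you want is best checked by plotting it, as above.

import torch
import matplotlib.pyplot as plt
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, CosineAnnealingLR

model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
decay = ExponentialLR(optimizer, gamma=0.99)                    # slow overall decay
cosine = CosineAnnealingLR(optimizer, T_max=10, eta_min=0.01)   # oscillation with period 2 * T_max
lr = []
for epoch in range(100):
    optimizer.step()
    decay.step()
    cosine.step()
    lr.append(optimizer.param_groups[0]['lr'])
plt.plot(lr, label='ExponentialLR + CosineAnnealingLR')
plt.legend()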

That's all.
