Have you upgraded to PyTorch 1.4 yet? If you haven't, the official site explains how to upgrade (here). (Note that the Python 2 series will no longer be supported from the next release.)
As a new feature of PyTorch 1.4, a **scheduler chaining** feature was quietly added (release notes here). Let's try it out right away.
With a scheduler, the learning rate can be changed every epoch. A higher learning rate makes training progress faster, but if it stays too high there is a risk of jumping over the optimal solution. It is therefore standard practice when training a neural network to use a scheduler and gradually lower the learning rate as the epochs go by. (Not directly related to this story, but note that PyTorch's schedulers are built around a multiplier applied to the original learning rate, unlike Keras and others, where the schedule function returns the learning rate itself.)
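As a minimal sketch of that convention, here is LambdaLR: the user-supplied lambda returns a factor relative to the initial learning rate, not the learning rate itself. (The dummy parameter, the 0.1 initial rate, and the 0.95 factor are arbitrary choices for illustration.)

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import LambdaLR

# dummy parameter and optimizer, just so there is something to schedule
params = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(params, lr=0.1)

# the lambda returns a multiplier on the initial lr (0.1), not the lr itself
scheduler = LambdaLR(optimizer, lr_lambda=lambda epoch: 0.95 ** epoch)

for epoch in range(3):
    optimizer.step()
    scheduler.step()
    # the optimizer's lr is 0.1 * 0.95 ** (number of scheduler steps so far)
    print(optimizer.param_groups[0]['lr'])
```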
According to the official description, it is **"a feature that lets you define two schedulers and step them one after the other to combine their effects."** Until now only one scheduler could control the learning rate, but now it seems possible to combine several. That is hard to picture from words alone, so let's actually run it and see how the learning rate changes. First, for comparison, here is how the same code behaved on PyTorch 1.3.
```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR

model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = StepLR(optimizer, step_size=3, gamma=0.5)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

s1, s2, lr = [], [], []
for epoch in range(100):
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
    # record the rate seen by each scheduler and the rate held by the optimizer
    s1.append(scheduler1.get_lr()[0])
    s2.append(scheduler2.get_lr()[0])
    for param_group in optimizer.param_groups:
        lr.append(param_group['lr'])
```
We use two schedulers, StepLR and ExponentialLR; call them scheduler1 and scheduler2, respectively. Let's plot the learning rates recorded from each scheduler (s1 and s2).
```python
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

plt.plot(s1, label='StepLR (scheduler1)')
plt.plot(s2, label='ExponentialLR (scheduler2)')
plt.legend()
```
You can see the characteristics of each scheduler. Next, let's plot the learning rate actually held by the optimizer.
```python
plt.plot(lr, label='Learning Rate')
plt.legend()
```
At a glance you can see that only the learning rate of ExponentialLR (scheduler2) is applied. Apparently, in PyTorch 1.3 the optimizer simply ended up with the learning rate of whichever scheduler had .step() called last, and the two schedulers operated independently without affecting each other. Now let's run the same code on PyTorch 1.4; only the way the learning rate is read changes, as noted below.
```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import ExponentialLR, StepLR

model = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(model, 0.1)
scheduler1 = StepLR(optimizer, step_size=3, gamma=0.5)
scheduler2 = ExponentialLR(optimizer, gamma=0.9)

s1, s2, lr = [], [], []
for epoch in range(100):
    optimizer.step()
    scheduler1.step()
    scheduler2.step()
    # PyTorch 1.4: read each scheduler's current rate with get_last_lr()
    s1.append(scheduler1.get_last_lr()[0])
    s2.append(scheduler2.get_last_lr()[0])
    for param_group in optimizer.param_groups:
        lr.append(param_group['lr'])
```
**Note that in PyTorch 1.3 the scheduler learning rate was read with .get_lr(), but in PyTorch 1.4 you should use .get_last_lr().** You can still call .get_lr(), but be aware that it may not return the value you expect. This change was officially announced.
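For what it's worth, here is a tiny check you can run after the loop above (it assumes the `scheduler2` and `optimizer` defined there):

```python
# get_last_lr() returns the learning rate most recently computed by that scheduler;
# since scheduler2 stepped last, its value matches the rate the optimizer will use.
# get_lr() still exists but is intended for internal use and may warn or differ.
print(scheduler2.get_last_lr())
print(optimizer.param_groups[0]['lr'])
```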
Now, let's plot the learning rate of each scheduler.
```python
import matplotlib.pyplot as plt
import seaborn as sns
sns.set()

plt.plot(s1, label='StepLR (scheduler1)')
plt.plot(s2, label='ExponentialLR (scheduler2)')
plt.legend()
```
Already at this point things look different from 1.3. Next, let's plot the learning rate of the optimizer.
```python
plt.plot(lr, label='Learning Rate')
plt.legend()
```
This time you can see that the learning rates of the two schedulers are multiplied one after another to produce the final learning rate of the optimizer. In other words, rather than working independently, each scheduler updates the optimizer's learning rate in turn, and the optimizer's rate changes under their combined influence.
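As a rough sanity check, you can compare the recorded rate against a hand-derived closed form. From the behavior observed above, each epoch multiplies the rate by 0.9 (ExponentialLR), with an extra factor of 0.5 every third epoch (StepLR); the exact epoch at which the StepLR factor kicks in could be off by one depending on the version, so treat this only as an approximate check:

```python
# expected chained rate after `epoch` scheduler steps, assuming the factors simply multiply
initial_lr = 0.1
epoch = 10
expected = initial_lr * 0.9 ** epoch * 0.5 ** (epoch // 3)
print(expected)        # roughly 0.00436
print(lr[epoch - 1])   # the value recorded in the loop above (0-indexed list)
```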
This should make it easier to express schedules that change the learning rate in complicated ways, which until now you had to write yourself. It looks especially useful when you want to add a little cyclical behavior on top of an overall decay, as sketched below.
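As a rough sketch of the kind of combination I have in mind (the choice of MultiplicativeLR, which as far as I know was also added in 1.4, and all the numbers here are arbitrary and only for illustration), you could layer a mild zig-zag on top of a step decay like this:

```python
import torch
from torch.optim import SGD
from torch.optim.lr_scheduler import StepLR, MultiplicativeLR

params = [torch.nn.Parameter(torch.randn(2, 2, requires_grad=True))]
optimizer = SGD(params, lr=0.1)

# StepLR halves the rate every 10 epochs; MultiplicativeLR multiplies the current
# rate by the lambda's return value each step, here alternating up and down
decay = StepLR(optimizer, step_size=10, gamma=0.5)
wobble = MultiplicativeLR(optimizer, lr_lambda=lambda e: 1.2 if e % 2 == 0 else 1 / 1.2)

lrs = []
for epoch in range(50):
    optimizer.step()
    decay.step()
    wobble.step()
    lrs.append(optimizer.param_groups[0]['lr'])
# `lrs` now zig-zags up and down while trending downward overall
```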
That's all.