In the previous article (I implemented DeepPose with PyTorch), I compared Chainer and PyTorch while implementing DeepPose. PyTorch is as easy to implement as Chainer, and in terms of performance, its predictions are about the same as Chainer's while its training is faster. This time, we dig deeper into the performance aspect by carrying out the investigation and verification left over from last time.

Last time, regarding the fact that PyTorch trains faster than Chainer, I hypothesized that this is because the backward computation of the loss function (the automatic differentiation) is executed natively in C. This time, I changed the implementation in two ways to verify this.
Added a step to set the random seed before training starts. Note that Chainer's iterator uses `MultiprocessIterator`, and because it was difficult to fix the random numbers across multiple processes, data augmentation in the iterator is disabled.
Chainer
```diff
def start(self):
    """ Train pose net. """
+   # set random seed.
+   if self.seed is not None:
+       random.seed(self.seed)
+       np.random.seed(self.seed)
+       if self.gpu >= 0:
+           chainer.cuda.cupy.random.seed(self.seed)
    # initialize model to train.
    model = AlexNet(self.Nj, self.use_visibility)
```
PyTorch
```diff
def start(self):
    """ Train pose net. """
+   # set random seed.
+   if self.seed is not None:
+       random.seed(self.seed)
+       torch.manual_seed(self.seed)
+       if self.gpu:
+           torch.cuda.manual_seed(self.seed)
    # initialize model to train.
    model = AlexNet(self.Nj)
```
According to Extending PyTorch, the backward of a `Module` is handled by automatic differentiation, whereas a `Function` requires you to implement the differentiation yourself. So this time, it seems I should implement `Function.backward`. Also note that the input to a `Module` is a `Variable`, while the input to a `Function` is a `Tensor`.
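For contrast, a loss written as a `Module` needs no hand-written backward at all. Here is a minimal sketch (my own illustration, not code from the article's repository), assuming the inputs are `Variable`s:

```python
from torch import nn


class MSELossModule(nn.Module):
    """Minimal sketch of a mean-squared loss written as a Module.

    Every operation on the Variables below is tracked by autograd,
    so no explicit backward() needs to be written.
    """
    def forward(self, x, t):
        diff = x - t
        # Normalize by numel/2, mirroring the N = numel/2 convention used
        # for the (x, y) joint coordinates in the article's loss.
        return (diff * diff).sum() / (diff.data.numel() / 2)
```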
Note that `Function` has a convenient method called `save_for_backward`, which lets you save variables for the `backward` computation, but it only supports inputs and outputs, not values produced in the middle of the calculation. The intermediate results of the `forward` computation are therefore stored in member variables instead.
PyTorch
```diff
def forward(self, *inputs):
    x, t, v = inputs
-   diff = x - t
+   self.diff = x - t
    if self.use_visibility:
-       N = (v.sum()/2).data[0]
-       diff = diff*v
+       self.N = v.sum()/2
+       self.diff = self.diff*v
    else:
-       N = diff.numel()/2
-   diff = diff.view(-1)
-   return diff.dot(diff)/N
+       self.N = self.diff.numel()/2
+   diff = self.diff.view(-1)
+   return torch.Tensor([diff.dot(diff)/self.N])
+
+def backward(self, *grad_outputs):
+   coeff = grad_outputs[0][0]*2/self.N
+   gx0 = coeff*self.diff
+   return gx0, None, None
```
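For reference, a legacy-style `Function` like the one above is used by instantiating it and calling it with `Variable`s; autograd then routes the gradient through the hand-written `backward`. A rough usage sketch follows (the class name `MeanSquaredError` and its constructor argument are my assumptions, not confirmed against the repository):

```python
# Hypothetical usage; the class and argument names are assumptions.
criterion = MeanSquaredError(use_visibility=True)
loss = criterion(x, t, v)  # x, t, v: Variables; forward() receives their Tensors
loss.backward()            # invokes the explicit backward() defined above
```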
An additional experiment was conducted to verify the hypothesis set up last time. The dataset and environment used for verification are the same as last time.

To verify the effect of PyTorch's native C automatic differentiation on training speed, I measured the time required to train 100 epochs with PyTorch's automatic differentiation and with an explicitly implemented backward, in both the CPU environment and the GPU environment.
In the CPU environment, the training times with PyTorch's automatic differentiation and with explicit differentiation are almost the same.
| Library | Time required [h] |
|---|---|
| PyTorch (automatic differentiation) | 47.6 |
| PyTorch (explicit differentiation) | 47.6 |
Since the random seed is fixed, the learning curves of the two PyTorch runs almost overlap.
In the GPU environment, training with PyTorch's automatic differentiation was slightly slower than with explicit differentiation. It is hard to see why the Python implementation would be faster; this may be due to the nondeterminism introduced by the GPU, and the result might change over a few more trials.
| Library | Time required [h] |
|---|---|
| PyTorch (automatic differentiation) | 2.60 |
| PyTorch (explicit differentiation) | 2.49 |
Although the random seed is fixed, PyTorch's learning curve can shift along the time axis depending on the implementation method, because of the nondeterminism introduced by the GPU.
Looking at the above experimental results, the hypothesis made last time does not seem to be entirely correct. Looking at the code again to investigate the cause, the implementation of each layer such as Convolution is in Python in Chainer and in C in PyTorch. Since this seemed to have a dominant effect on training time, I measured, for both Chainer and PyTorch, the total time required for the forward and backward computations of the loss function (including the network). For each batch size of $2^n$, the measurement was repeated 100 times and the mean and variance were computed.
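The timing loop was roughly like the following sketch (my own reconstruction; `run_step` is assumed to be a callable that performs one forward and backward pass of the loss on a fixed mini-batch, which the article does not show):

```python
import time

import numpy as np


def benchmark(run_step, n_trials=100):
    """Time one forward+backward pass n_trials times; return mean and variance [sec]."""
    elapsed = []
    for _ in range(n_trials):
        start = time.time()
        run_step()
        elapsed.append(time.time() - start)
    elapsed = np.array(elapsed)
    return elapsed.mean(), elapsed.var()
```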
In the CPU environment, the processing time is almost the same when $n = 0$, but PyTorch becomes superior as $n$ increases. Considering that the average training time per epoch in the CPU environment is 8.2 [sec] for Chainer and 5.0 [sec] for PyTorch, the above hypothesis seems appropriate.
In the GPU environment, as in the CPU environment, PyTorch becomes superior as $n$ increases. Considering that the average training time per epoch in the GPU environment is 0.45 [sec] for Chainer and 0.28 [sec] for PyTorch, the above hypothesis seems appropriate.
The hypothesis I made last time, that PyTorch trains faster than Chainer because the backward computation of the loss function is executed natively in C, turned out to be half right and half wrong. The difference in how the backward computation of the loss function is implemented seems to have an insignificant effect on training time. What dominates the training time is how the layers of the network, such as Convolution, are implemented. Once understood, it is a natural conclusion. However, although I used PyTorch 0.1.10 in this experiment, I also tried the experiment with 0.1.12, the latest version at the time of writing, and in that case Chainer was actually faster. My impression is that PyTorch is still very much under development. The code is on GitHub, so I will try again once development has settled down.