TL;DR
- `enable_backprop` is applied when the forward graph is built, not when `backward()` is executed.
- However, since the behavior is not very intuitive, it is better not to change `enable_backprop` many times within one graph.
- In `chainer.configuration`, setting `train` does not affect `enable_backprop`.
- The standard `chainer.extensions.Evaluator` sets both of them appropriately, so you can rest assured.
- When writing something like your own `chainer.extensions.Evaluator`, change both configurations.

I finally got around to upgrading Chainer and my own helper library to v2, and in the process the behavior of `enable_backprop` caught my attention. In particular, consider the following questions:
- What happens if `enable_backprop` changes in the middle of the graph? Is it possible to do something like `.unchain_backward()`?
- If `train` is set to `False`, will `enable_backprop` also be automatically set to `False`?
- Is `chainer.extensions.Evaluator` OK?

By the way, this post is as of v2.0.0.
What happens if `enable_backprop` changes in the middle of the graph?

Since this is implemented with a contextmanager, I was very curious what happens if `enable_backprop` changes in the middle of the graph. If `no_backprop_mode` took effect at the time `backward()` is executed, then a situation like "I wrapped the forward computation in `no_backprop_mode` but forgot to wrap the backward computation, so backprop ran anyway and the accuracy became suspiciously good" would be possible, which would be troubling [^1].

[^1]: Of course, the parameters do not actually change unless you call the optimizer, so in practice this mistake would be harmless.
The answer can be found in function_node.py. As described there, the effect of `no_backprop_mode` is that the parents are not registered when the computation graph is constructed, so it **takes effect over the range that was specified at the time of the forward computation**.
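A quick way to see this is to check `Variable.creator` (a small sanity check of my own, not part of the original examples): a variable produced while `enable_backprop` is `False` has no creator, i.e. its parents were never registered.

```python
import numpy as np
import chainer

a = chainer.Variable(np.array([0.1], dtype=np.float32))

with chainer.configuration.using_config('enable_backprop', True):
    b = a * 2.0
print(b.creator)  # a function object: the link back to the parent was recorded

with chainer.configuration.using_config('enable_backprop', False):
    c = a * 2.0
print(c.creator)  # None: no parent was registered at forward time
```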
Normal execution

a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('enable_backprop', True):
    chainer.config.show()
    b = a * 2.0
b.backward()
print(a.grad)
output
cudnn_deterministic False
debug False
enable_backprop True
keep_graph_on_report False
train True
type_check True
use_cudnn auto
[ 2.]
Next, `enable_backprop` is set to `False` only around the forward computation. You can see that it still takes effect even though `backward()` is called outside the `with` block.
a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('enable_backprop', False):
    chainer.config.show()
    b = a * 2.0
b.backward()
print(a.grad)
output
cudnn_deterministic False
debug False
enable_backprop False
keep_graph_on_report False
train True
type_check True
use_cudnn auto
None
Also, as mentioned above, disabling `enable_backprop` breaks the connection to the **parent**. So it is the parent whose gradient is left unset (`None`), not the variable newly created inside the contextmanager.
a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('enable_backprop', False):
    b = a * 2.0
c = b + 0.5
c.backward()
print(a.grad)  # None
print(b.grad)  # [ 1.]
That said, it is not that the result is completely independent of the configuration at the time `backward()` is called, either. So rather than toggling `enable_backprop` many times within one computation graph, it seems better to just use `unchain_backward()`.
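For reference, a minimal sketch of the `unchain_backward()` approach would look like this; it cuts the graph explicitly and yields the same gradients as the example above.

```python
import numpy as np
import chainer

a = chainer.Variable(np.array([0.1], dtype=np.float32))
b = a * 2.0           # graph is built normally
c = b + 0.5

b.unchain_backward()  # explicitly cut the graph upstream of b

c.backward()
print(a.grad)  # None: the connection from b back to a was severed
print(b.grad)  # [ 1.]: b is still connected to c
```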
If `train` is set to `False`, will `enable_backprop` also be automatically set to `False`?

**No.**
a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('train', False):
    chainer.config.show()
    b = a * 2.0
    b.backward()
print(a.grad)
output
cudnn_deterministic False
debug False
enable_backprop True
keep_graph_on_report False
train False
type_check True
use_cudnn auto
[ 2.]
So if you write your own code that plays the role of `extensions.Evaluator`, you need to set both `enable_backprop` and `train` to `False`.
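For example, a minimal sketch of such an evaluation loop might look like the following; `model` and `test_iter` are placeholders for your own classifier link and test iterator, and the loss bookkeeping is just for illustration.

```python
import chainer


def evaluate(model, test_iter):
    """Run one pass over test_iter with both configurations switched off."""
    test_iter.reset()
    total_loss, count = 0.0, 0
    # `train` turns off training-time behavior such as dropout;
    # `enable_backprop` stops the computation graph from being built.
    with chainer.configuration.using_config('train', False), \
            chainer.configuration.using_config('enable_backprop', False):
        for batch in test_iter:
            x, t = chainer.dataset.concat_examples(batch)
            loss = model(x, t)  # e.g. an L.Classifier returning the loss
            total_loss += float(loss.data) * len(batch)
            count += len(batch)
    return total_loss / count
```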
Is `chainer.extensions.Evaluator` OK?

**It looks fine.** Both `enable_backprop` and `train` become `False`.
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions


# Network definition
class MLP(chainer.Chain):

    def __init__(self, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_out)

    def __call__(self, x):
        # Dump the configuration every time the model is called,
        # both during the training update and during evaluation.
        chainer.config.show()
        print("")
        return self.l1(x)


model = L.Classifier(MLP(10))
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

# Load the MNIST dataset (keep only one test example to keep the output short)
train, test = chainer.datasets.get_mnist()
test = chainer.datasets.split_dataset(test, 1)[0]

train_iter = chainer.iterators.SerialIterator(train, 32)
test_iter = chainer.iterators.SerialIterator(test, 1,
                                             repeat=False, shuffle=False)

# Set up a trainer that runs a single iteration and then evaluates
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (1, 'iteration'))
trainer.extend(extensions.Evaluator(test_iter, model), trigger=(1, 'iteration'))

# Run the training
trainer.run()
output
cudnn_deterministic False
debug False
enable_backprop True
keep_graph_on_report False
train True
type_check True
use_cudnn auto
cudnn_deterministic False
debug False
enable_backprop False
keep_graph_on_report False
train False
type_check True
use_cudnn auto
As you can see, during the evaluation both `enable_backprop` and `train` are `False`. In terms of code, the relevant part is around here.