About the behavior of enable_backprop in Chainer v2

TL;DR

- `enable_backprop` (`no_backprop_mode`) takes effect over the range it encloses at **forward** time: the parent is simply not registered when the computational graph is built.
- Setting `train` to `False` does **not** automatically set `enable_backprop` to `False`; you have to set both yourself.
- The standard `extensions.Evaluator` already sets both to `False`, so it can be used as-is.

Purpose

I finally got around to upgrading Chainer and my homemade helper library to v2, and in the process the behavior of `enable_backprop` caught my attention. In particular, I looked into the following questions.

This post is written as of v2.0.0.

Investigation

What if `enable_backprop` changes in the middle of the graph?

With the contextmanager-based implementation, I was very curious what happens if `enable_backprop` changes in the middle of a computational graph. If `no_backprop_mode` took effect at the time `backward()` is executed, then using `no_backprop_mode` for the forward computation but forgetting to enclose the backward computation in it would make the accuracy suspiciously good, which would be a problem [^1].

[^1]: Of course, the parameters do not actually change unless you call the optimizer, so this mistake is harmless in practice.

The answer can be found in `function_node.py`. As you can see there, the effect of `no_backprop_mode` is that the parent is not registered when the computational graph is constructed, so it **takes effect over the range enclosed at forward-computation time**.
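As a quick sanity check, here is a minimal sketch (relying only on the `Variable.creator` attribute) showing that the output variable simply has no registered parent when the flag is off:

import numpy as np
import chainer

a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('enable_backprop', False):
    b = a * 2.0
# The multiplication was not registered as b's parent, so the graph ends at b.
print(b.creator)  # None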

Normal execution

import numpy as np
import chainer

a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('enable_backprop', True):
    chainer.config.show()
    b = a * 2.0
b.backward()
print(a.grad)

output


cudnn_deterministic  False
debug                False
enable_backprop      True
keep_graph_on_report False
train                True
type_check           True
use_cudnn            auto
[ 2.]

Next, the same code with `enable_backprop` set to `False`. You can see that it takes effect even though `backward()` is called outside the `with` block.

a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('enable_backprop', False):
    chainer.config.show()
    b = a * 2.0
b.backward()
print(a.grad)

output


cudnn_deterministic  False
debug                False
enable_backprop      False
keep_graph_on_report False
train                True
type_check           True
use_cudnn            auto
None

Also, as mentioned above, `enable_backprop` cuts the connection to the **parent**. So it is the parent that ends up without a gradient, not the variable newly created inside the contextmanager.

a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('enable_backprop', False):
    b = a * 2.0
c = b + 0.5
c.backward()
print(a.grad)  # None
print(b.grad)  # [ 1.]

Moreover, the result is **not** simply determined by the configuration in effect when `backward()` is called. So rather than toggling `enable_backprop` many times within a single computational graph, it seems better to just use `unchain_backward()`.
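For reference, here is a minimal sketch of the `unchain_backward()` approach, mirroring the example above but cutting the graph explicitly instead of via the config:

import numpy as np
import chainer

a = chainer.Variable(np.array([0.1], dtype=np.float32))
b = a * 2.0
b.unchain_backward()  # explicitly sever the connection between a and b
c = b + 0.5
c.backward()
print(a.grad)  # None: the gradient no longer reaches a
print(b.grad)  # [ 1.]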

If `train` is set to `False`, will `enable_backprop` also be automatically set to `False`?

**No.**

a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('train', False):
    chainer.config.show()
    b = a * 2.0
b.backward()
print(a.grad)

output


cudnn_deterministic  False
debug                False
enable_backprop      True
keep_graph_on_report False
train                False
type_check           True
use_cudnn            auto
[ 2.]

So if you write your own evaluation code that does what `extensions.Evaluator` does, you need to set both `enable_backprop` and `train` to `False`.
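For example, nesting two `using_config` calls is one straightforward way to do this. A minimal sketch with the same toy variable as above:

import numpy as np
import chainer

a = chainer.Variable(np.array([0.1], dtype=np.float32))
with chainer.configuration.using_config('train', False):
    with chainer.configuration.using_config('enable_backprop', False):
        # evaluation-style forward pass: no graph is built here
        b = a * 2.0
b.backward()
print(a.grad)  # None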

Is it okay to trust and use the standard `chainer.training.extensions.Evaluator`?

**It looks fine.** Both `enable_backprop` and `train` are `False` during evaluation.

import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions

# Network definition
class MLP(chainer.Chain):
    def __init__(self, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_out)

    def __call__(self, x):
        chainer.config.show()
        print ""
        return self.l1(x)

model = L.Classifier(MLP(10))

optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

# Load the MNIST dataset
train, test = chainer.datasets.get_mnist()
test = chainer.datasets.split_dataset(test, 1)[0]

train_iter = chainer.iterators.SerialIterator(train, 32)
test_iter = chainer.iterators.SerialIterator(test, 1,
                                             repeat=False, shuffle=False)
# Set up a trainer
updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (1, 'iteration'))
trainer.extend(extensions.Evaluator(test_iter, model), trigger=(1, 'iteration'))

# Run the training
trainer.run()

output


cudnn_deterministic  False
debug                False
enable_backprop      True
keep_graph_on_report False
train                True
type_check           True
use_cudnn            auto

cudnn_deterministic  False
debug                False
enable_backprop      False
keep_graph_on_report False
train                False
type_check           True
use_cudnn            auto

As you can see, during the Evaluator's forward pass both `enable_backprop` and `train` are `False`. In the source, the corresponding code is in `Evaluator.evaluate()`.
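Based on the output above, running an evaluation forward pass yourself would amount to roughly the following (my own sketch continuing from the training script above, not the actual Evaluator source; note that it also prints the config again because of the `show()` call in `MLP.__call__`):

# Run one evaluation example the way the Evaluator does: both flags off.
x, t = chainer.dataset.concat_examples([test[0]])
with chainer.configuration.using_config('train', False):
    with chainer.configuration.using_config('enable_backprop', False):
        loss = model(x, t)
print(loss.creator)  # None: no graph was built for the evaluation pass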
