tl;dr
You get the minimum logging for free just by writing `trainer.extend(extensions.LogReport())`. Keeping `extensions.ParameterStatistics` in your toolbox will make you even happier.
It has been a year since the training loop abstraction (Trainer) was introduced to Chainer in v1.11 (June 2016). I suspect many people are still running on their existing assets (liabilities?) of hand-written training loops and have never touched it. Since it is about time I got the hang of Trainer and friends, I have summarized how the reporting of training metrics works; it does not get much attention, but it is very important.
Monitoring various metrics is essential in deep learning. TensorFlow, for example, has a powerful reporting facility in `tf.summary`.
Officially there is a class called `Reporter` that appears to be in charge of reporting (see the documentation), but if you look at the official example, `Reporter` does not appear anywhere; nevertheless, accuracy and the like are clearly written out, with no code that visibly does so. What is going on?
If you look at the official example (L96-97), you can see that `extensions` such as `LogReport` and `PrintReport` are added to the `Trainer` object.
```python
# Write a log of evaluation statistics for each epoch
trainer.extend(extensions.LogReport())
```
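For orientation, here is a minimal sketch of where those two lines typically sit, loosely modeled on the official MNIST example; the network size, optimizer, and the exact set of extensions are my own choices for illustration, not a quotation of the example.

```python
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions


class MLP(chainer.Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)
            self.l2 = L.Linear(None, n_out)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))


model = L.Classifier(MLP(100, 10))
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

train, test = chainer.datasets.get_mnist()
train_iter = chainer.iterators.SerialIterator(train, batch_size=100)
test_iter = chainer.iterators.SerialIterator(
    test, batch_size=100, repeat=False, shuffle=False)

updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (5, 'epoch'), out='result')

# Evaluate on the test set every epoch (produces the 'validation/main/*' keys)
trainer.extend(extensions.Evaluator(test_iter, model))
# Aggregate the reported values and write them as JSON to result/log
trainer.extend(extensions.LogReport())
# Print a subset of the logged entries to stdout
trainer.extend(extensions.PrintReport(
    ['epoch', 'main/loss', 'validation/main/accuracy', 'elapsed_time']))

trainer.run()
```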
Next, look at chainer/chainer/training/trainer.py L291-L299. At the beginning of each iteration of the training loop, a `reporter.scope(self.observation)` context is entered. Thanks to this declaration, every call of the form `chainer.reporter.report({'name': value_to_report})` made inside the training loop is stored in `self.observation`.
```python
def run(self):
    ....
    reporter = self.reporter
    stop_trigger = self.stop_trigger

    # main training loop
    try:
        while not stop_trigger(self):
            self.observation = {}
            with reporter.scope(self.observation):
                update()
                for name, entry in extensions:
                    if entry.trigger(self):
                        entry.extension(self)
```
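To see this mechanism in isolation, here is a small standalone sketch (not part of the Chainer source) that does by hand what the loop above does: anything reported while a `reporter.scope(observation)` is active lands in the `observation` dictionary.

```python
import chainer

observation = {}
reporter = chainer.reporter.Reporter()

# Inside the scope, the module-level report() routes values into `observation`,
# just as reports made during update() are routed into trainer.observation.
with reporter.scope(observation):
    chainer.reporter.report({'my_metric': 0.5})

print(observation)  # {'my_metric': 0.5}
```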
In other words, even if you never write `Reporter` explicitly, the metrics are collected behind the scenes. The collected data is then handed over, through the call to `entry.extension(self)`, to [chainer/chainer/training/extensions/log_report.py L67-L88](https://github.com/chainer/chainer/blob/v2.0.0/chainer/training/extensions/log_report.py#L67-L88).
```python
def __call__(self, trainer):
    # accumulate the observations
    keys = self._keys
    observation = trainer.observation
    summary = self._summary

    if keys is None:
        summary.add(observation)
    else:
        summary.add({k: observation[k] for k in keys if k in observation})

    if self._trigger(trainer):
        # output the result
        stats = self._summary.compute_mean()
        stats_cpu = {}
        for name, value in six.iteritems(stats):
            stats_cpu[name] = float(value)  # copy to CPU

        updater = trainer.updater
        stats_cpu['epoch'] = updater.epoch
        stats_cpu['iteration'] = updater.iteration
        stats_cpu['elapsed_time'] = trainer.elapsed_time
```
This function adds the epoch count and other bookkeeping values and writes everything out at the appropriate time. If no extension with a reporting role is registered, the collected data is simply discarded.
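Incidentally, the `self._keys` and `self._trigger` in the excerpt correspond to the `keys` and `trigger` constructor arguments of `LogReport`, so both which observations are kept and how often they are written out can be configured. A small sketch (the concrete values are arbitrary, and `trainer` is the object from before):

```python
from chainer.training import extensions

trainer.extend(extensions.LogReport(
    keys=['main/loss', 'main/accuracy'],  # keep only these observations
    trigger=(100, 'iteration'),           # aggregate and write every 100 iterations
    log_name='log_100iter'))              # file name under the trainer's out directory
```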
Now it is clear why the official example never mentions `Reporter` explicitly. But why are accuracy (`accuracy`) and loss (`loss`) reported even though we never call `chainer.reporter.report` ourselves? Take a look at chainer/chainer/links/model/classifier.py: `chainer.reporter.report` is called inside the official implementation of `Classifier`.
```python
self.loss = self.lossfun(self.y, t)
reporter.report({'loss': self.loss}, self)
if self.compute_accuracy:
    self.accuracy = self.accfun(self.y, t)
    reporter.report({'accuracy': self.accuracy}, self)
```
In other words, just writing `trainer.extend(extensions.LogReport())` gives you the minimum required logging, and just calling `chainer.reporter.report` inside your model lets you report anything you like. Convenient.
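As a concrete illustration, here is a sketch of a classifier-like chain (an imitation of `L.Classifier`, not the real one) that reports an extra, made-up metric next to the loss; when this chain is the optimizer target, the values show up in the log as `main/loss` and `main/max_logit`.

```python
import chainer
import chainer.functions as F
from chainer import reporter


class MyClassifier(chainer.Chain):
    def __init__(self, predictor):
        super(MyClassifier, self).__init__()
        with self.init_scope():
            self.predictor = predictor

    def __call__(self, x, t):
        y = self.predictor(x)
        loss = F.softmax_cross_entropy(y, t)
        # Everything reported here is picked up by LogReport, prefixed with the
        # observer name ('main' for the optimizer target).
        reporter.report({'loss': loss, 'max_logit': F.max(y)}, self)
        return loss
```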
By the way, running the example above produces the following log in `result/log`.
```
[{u'elapsed_time': 6.940603971481323,
  u'epoch': 1,
  u'iteration': 600,
  u'main/accuracy': 0.9213500021273892,
  u'main/loss': 0.2787705701092879,
  u'validation/main/accuracy': 0.9598000049591064,
  u'validation/main/loss': 0.13582063710317016},
 {u'elapsed_time': 14.360282897949219,
  u'epoch': 2,
  u'iteration': 1200,
...
```
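The log file itself is plain JSON, so it can be inspected or plotted outside the training process; a quick sketch, assuming the default `out='result'` and log name `log`:

```python
import json

with open('result/log') as f:
    log = json.load(f)

for entry in log:
    print(entry['epoch'], entry['main/loss'], entry['validation/main/accuracy'])
```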
This alone is convenient enough, but with `extensions.ParameterStatistics` you get rich monitoring along the lines of TensorFlow's `tf.summary.histogram`.
```python
...
trainer.extend(extensions.ParameterStatistics(model))
...
```
Summary statistics of the parameters of every `Link` contained in the model are collected automatically and added to the log. Very convenient.
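The `None/` at the head of each key below appears to be the default `prefix`. Monitoring can also be narrowed down, for example to a single link or to parameter values only; a sketch, assuming the constructor arguments `report_grads`, `prefix`, and `trigger` (check the version you are using) and a model with a `predictor.l1` link as above:

```python
from chainer.training import extensions

trainer.extend(extensions.ParameterStatistics(
    model.predictor.l1,    # monitor a single link instead of the whole model
    report_grads=False,    # statistics of parameter values only, no gradients
    prefix='l1_stats',     # replaces the 'None' prefix in the reported keys
    trigger=(1, 'epoch')))
```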
```
[{u'None/predictor/l1/W/data/max': 0.18769985591371854,
  u'None/predictor/l1/W/data/mean': 0.0006860141372822189,
  u'None/predictor/l1/W/data/min': -0.21658104345202445,
  u'None/predictor/l1/W/data/percentile/0': -0.1320047355272498,
  u'None/predictor/l1/W/data/percentile/1': -0.08497818301255008,
  u'None/predictor/l1/W/data/percentile/2': -0.04122352957670082,
  u'None/predictor/l1/W/data/percentile/3': 0.0008963784146650747,
  u'None/predictor/l1/W/data/percentile/4': 0.0428067545834066,
...
```
The full execution result of the above is available as a gist.