tl;dr
You get the minimum logging for free just by writing `trainer.extend(extensions.LogReport())`. Keeping `extensions.ParameterStatistics` in your toolbox will make you even happier.
It has been a year since the training loop abstraction (Trainer) was introduced to Chainer in v1.11 (June 2016). I suspect many people are still running on their existing assets (liabilities?) of hand-written training loops and have never touched it. Since it is about time I got the hang of Trainer and friends, I have summarized how the reporting of training metrics works; it does not get much attention, but it is very important.
Monitoring various metrics is essential in deep learning. TensorFlow, for example, has a powerful reporting facility in `tf.summary`.
Officially there is a class called `Reporter` that appears to be in charge of reporting (see the documentation), but if you look at the official example, `Reporter` does not appear anywhere; nevertheless, accuracy and the like are clearly written out, with no code that visibly does so. What is going on?
If you look at the official example (L96-97), you can see that `extensions` such as `LogReport` and `PrintReport` are added to the `Trainer` object.
```python
# Write a log of evaluation statistics for each epoch
trainer.extend(extensions.LogReport())
```
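For orientation, here is a minimal sketch of where those two lines typically sit, loosely modeled on the official MNIST example; the network size, optimizer, and the exact set of extensions are my own choices for illustration, not a quotation of the example.

```python
import chainer
import chainer.functions as F
import chainer.links as L
from chainer import training
from chainer.training import extensions


class MLP(chainer.Chain):
    def __init__(self, n_units, n_out):
        super(MLP, self).__init__()
        with self.init_scope():
            self.l1 = L.Linear(None, n_units)
            self.l2 = L.Linear(None, n_out)

    def __call__(self, x):
        return self.l2(F.relu(self.l1(x)))


model = L.Classifier(MLP(100, 10))
optimizer = chainer.optimizers.Adam()
optimizer.setup(model)

train, test = chainer.datasets.get_mnist()
train_iter = chainer.iterators.SerialIterator(train, batch_size=100)
test_iter = chainer.iterators.SerialIterator(
    test, batch_size=100, repeat=False, shuffle=False)

updater = training.StandardUpdater(train_iter, optimizer)
trainer = training.Trainer(updater, (5, 'epoch'), out='result')

# Evaluate on the test set every epoch (produces the 'validation/main/*' keys)
trainer.extend(extensions.Evaluator(test_iter, model))
# Aggregate the reported values and write them as JSON to result/log
trainer.extend(extensions.LogReport())
# Print a subset of the logged entries to stdout
trainer.extend(extensions.PrintReport(
    ['epoch', 'main/loss', 'validation/main/accuracy', 'elapsed_time']))

trainer.run()
```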
Next, look at chainer/chainer/training/trainer.py L291-L299. At the beginning of each iteration of the training loop, a `reporter.scope(self.observation)` context is entered. Thanks to this declaration, every call of the form `chainer.reporter.report({'name': value_to_report})` made inside the training loop is stored in `self.observation`.
```python
def run(self):
    ....
    reporter = self.reporter
    stop_trigger = self.stop_trigger

    # main training loop
    try:
        while not stop_trigger(self):
            self.observation = {}
            with reporter.scope(self.observation):
                update()
                for name, entry in extensions:
                    if entry.trigger(self):
                        entry.extension(self)
```
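To see this mechanism in isolation, here is a small standalone sketch (not part of the Chainer source) that does by hand what the loop above does: anything reported while a `reporter.scope(observation)` is active lands in the `observation` dictionary.

```python
import chainer

observation = {}
reporter = chainer.reporter.Reporter()

# Inside the scope, the module-level report() routes values into `observation`,
# just as reports made during update() are routed into trainer.observation.
with reporter.scope(observation):
    chainer.reporter.report({'my_metric': 0.5})

print(observation)  # {'my_metric': 0.5}
```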
In other words, even if you never write `Reporter` explicitly, the metrics are collected behind the scenes. The collected data is then handed over, through the call to `entry.extension(self)`, to [chainer/chainer/training/extensions/log_report.py L67-L88](https://github.com/chainer/chainer/blob/v2.0.0/chainer/training/extensions/log_report.py#L67-L88).
```python
def __call__(self, trainer):
    # accumulate the observations
    keys = self._keys
    observation = trainer.observation
    summary = self._summary

    if keys is None:
        summary.add(observation)
    else:
        summary.add({k: observation[k] for k in keys if k in observation})

    if self._trigger(trainer):
        # output the result
        stats = self._summary.compute_mean()
        stats_cpu = {}
        for name, value in six.iteritems(stats):
            stats_cpu[name] = float(value)  # copy to CPU

        updater = trainer.updater
        stats_cpu['epoch'] = updater.epoch
        stats_cpu['iteration'] = updater.iteration
        stats_cpu['elapsed_time'] = trainer.elapsed_time
```
This function adds the epoch count and other bookkeeping values and writes everything out at the appropriate time. If no extension with a reporting role is registered, the collected data is simply discarded.
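Incidentally, the `self._keys` and `self._trigger` in the excerpt correspond to the `keys` and `trigger` constructor arguments of `LogReport`, so both which observations are kept and how often they are written out can be configured. A small sketch (the concrete values are arbitrary, and `trainer` is the object from before):

```python
from chainer.training import extensions

trainer.extend(extensions.LogReport(
    keys=['main/loss', 'main/accuracy'],  # keep only these observations
    trigger=(100, 'iteration'),           # aggregate and write every 100 iterations
    log_name='log_100iter'))              # file name under the trainer's out directory
```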
Now it is clear why the official example never mentions `Reporter` explicitly. But why are accuracy (`accuracy`) and loss (`loss`) reported even though we never call `chainer.reporter.report` ourselves? Take a look at chainer/chainer/links/model/classifier.py: `chainer.reporter.report` is called inside the official implementation of `Classifier`.
```python
self.loss = self.lossfun(self.y, t)
reporter.report({'loss': self.loss}, self)
if self.compute_accuracy:
    self.accuracy = self.accfun(self.y, t)
    reporter.report({'accuracy': self.accuracy}, self)
```
In other words, just writing `trainer.extend(extensions.LogReport())` gives you the minimum required logging, and just calling `chainer.reporter.report` inside your model lets you report anything you like. Convenient.
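As a concrete illustration, here is a sketch of a classifier-like chain (an imitation of `L.Classifier`, not the real one) that reports an extra, made-up metric next to the loss; when this chain is the optimizer target, the values show up in the log as `main/loss` and `main/max_logit`.

```python
import chainer
import chainer.functions as F
from chainer import reporter


class MyClassifier(chainer.Chain):
    def __init__(self, predictor):
        super(MyClassifier, self).__init__()
        with self.init_scope():
            self.predictor = predictor

    def __call__(self, x, t):
        y = self.predictor(x)
        loss = F.softmax_cross_entropy(y, t)
        # Everything reported here is picked up by LogReport, prefixed with the
        # observer name ('main' for the optimizer target).
        reporter.report({'loss': loss, 'max_logit': F.max(y)}, self)
        return loss
```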
By the way, running the example above produces the following log in `result/log`.
```
[{u'elapsed_time': 6.940603971481323,
  u'epoch': 1,
  u'iteration': 600,
  u'main/accuracy': 0.9213500021273892,
  u'main/loss': 0.2787705701092879,
  u'validation/main/accuracy': 0.9598000049591064,
  u'validation/main/loss': 0.13582063710317016},
 {u'elapsed_time': 14.360282897949219,
  u'epoch': 2,
  u'iteration': 1200,
...
```
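The log file itself is plain JSON, so it can be inspected or plotted outside the training process; a quick sketch, assuming the default `out='result'` and log name `log`:

```python
import json

with open('result/log') as f:
    log = json.load(f)

for entry in log:
    print(entry['epoch'], entry['main/loss'], entry['validation/main/accuracy'])
```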
This alone is convenient enough, but with `extensions.ParameterStatistics` you get rich monitoring along the lines of TensorFlow's `tf.summary.histogram`.
```python
...
trainer.extend(extensions.ParameterStatistics(model))
...
```
Summary statistics of the parameters of every `Link` contained in the model are collected automatically and added to the log. Very convenient.
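The `None/` at the head of each key below appears to be the default `prefix`. Monitoring can also be narrowed down, for example to a single link or to parameter values only; a sketch, assuming the constructor arguments `report_grads`, `prefix`, and `trigger` (check the version you are using) and a model with a `predictor.l1` link as above:

```python
from chainer.training import extensions

trainer.extend(extensions.ParameterStatistics(
    model.predictor.l1,    # monitor a single link instead of the whole model
    report_grads=False,    # statistics of parameter values only, no gradients
    prefix='l1_stats',     # replaces the 'None' prefix in the reported keys
    trigger=(1, 'epoch')))
```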
```
[{u'None/predictor/l1/W/data/max': 0.18769985591371854,
  u'None/predictor/l1/W/data/mean': 0.0006860141372822189,
  u'None/predictor/l1/W/data/min': -0.21658104345202445,
  u'None/predictor/l1/W/data/percentile/0': -0.1320047355272498,
  u'None/predictor/l1/W/data/percentile/1': -0.08497818301255008,
  u'None/predictor/l1/W/data/percentile/2': -0.04122352957670082,
  u'None/predictor/l1/W/data/percentile/3': 0.0008963784146650747,
  u'None/predictor/l1/W/data/percentile/4': 0.0428067545834066,
...
```
The full execution result of the above is available as a gist.