- It's a pain to have to write the same training code every time you just want to try something out quickly by writing only the model and dataset code.
- Chainer has a handy tool called Trainer, but it's still a hassle to write a train.py that adds the same set of extensions every time.
- It would be easier if there were a command line tool that, given a YAML file describing the training settings, the model file, the dataset file, and so on, just starts training in the usual way. So I made one.
- It's basically a tool made entirely for my own convenience, so it wasn't designed with other users in mind at all, but since I went to the trouble of writing it, I'm releasing it.
ChainerCMD
https://github.com/mitmul/chainercmd
$ pip install chainercmd
$ chainer init
Then the following files will be created. Their contents look like this.
config.yml
stop_trigger: [20, "epoch"]
# max_workspace_size: 8388608  # default: 8 * 1024 * 1024

dataset:
  train:
    file: dataset.py
    name: MNIST
    batchsize: 128
    args:
      split: train
      ndim: 3
  valid:
    file: dataset.py
    name: MNIST
    batchsize: 64
    args:
      split: valid
      ndim: 3

model:
  file: model.py
  name: LeNet5
  args:
    n_class: 10

loss:
  module: chainer.links.model.classifier
  # file: loss.py  # If you use your own loss definition, remove "module" key above and use "file" key to specify the path to the file which describes the "name" class for a Loss link.
  name: Classifier

optimizer:
  method: MomentumSGD
  args:
    lr: 0.01
  weight_decay: 0.0001
  lr_drop_ratio: 0.1
  lr_drop_triggers:
    points: [10, 15]
    unit: epoch

# You can ommit this part
# updater_creator:
#   file: updater_creator.py
#   name: MyUpdaterCreator
#   args:
#     print: True

trainer_extension:
  - custom:
      file: custom_extension.py
      name: CustomExtension
      args:
        message: 'I am learning...'
  - LogReport:
      trigger: [1, "epoch"]
  - dump_graph:
      root_name: main/loss
      out_name: cg.dot
  - observe_lr:
      trigger: [1, "epoch"]
  - ParameterStatistics:
      links:
        - conv1
        - conv2
        - conv3
        - fc4
        - fc5
      report_params: True
      report_grads: True
      prefix: null
      trigger: [1, "epoch"]
  - Evaluator:
      module: chainer.training.extensions
      name: Evaluator
      trigger: [1, "epoch"]
      prefix: val
      # You can specify other evaluator like this:
      # module: chainercv.extensions
      # name: SemanticSegmentationEvaluator
      # trigger: [1, "epoch"]
      # prefix: val
  - PlotReport:
      y_keys:
        - conv1/W/data/mean
        - conv2/W/data/mean
        - conv3/W/data/mean
        - conv4/W/data/mean
        - fc5/W/data/mean
        - fc6/W/data/mean
      x_key: epoch
      file_name: parameter_mean.png
      trigger: [1, "epoch"]
  - PlotReport:
      y_keys:
        - conv1/W/data/std
        - conv2/W/data/std
        - conv3/W/data/std
        - conv4/W/data/std
        - fc5/W/data/std
        - fc6/W/data/std
      x_key: epoch
      file_name: parameter_std.png
      trigger: [1, "epoch"]
  - PlotReport:
      y_keys:
        - main/loss
        - val/main/loss
      x_key: epoch
      file_name: loss.png
      trigger: [1, "epoch"]
  - PlotReport:
      y_keys:
        - main/accuracy
        - val/main/accuracy
      x_key: epoch
      file_name: accuracy.png
      trigger: [1, "epoch"]
  - PrintReport:
      entries:
        - epoch
        - iteration
        - main/loss
        - main/accuracy
        - val/main/loss
        - val/main/accuracy
        - elapsed_time
        - lr
      trigger: [1, "epoch"]
  - ProgressBar:
      update_interval: 10
      trigger: [10, "iteration"]
  - snapshot:
      filename: trainer_{.updater.epoch}_epoch
      trigger: [10, "epoch"]
custom_extension.py
import chainer


class CustomExtension(chainer.training.Extension):

    def __init__(self, message):
        self._message = message

    def initialize(self, trainer):
        self._message += ' and Trainer ID is: {}'.format(id(trainer))

    def __call__(self, trainer):
        pass

    def serialize(self, serializer):
        self._message = serializer('_message', self._message)
dataset.py
import chainer


class Dataset(chainer.dataset.DatasetMixin):

    def __init__(self, split='train'):
        super().__init__()

    def __len__(self):
        pass

    def get_example(self, i):
        pass


# You can delete this
class MNIST(chainer.dataset.DatasetMixin):

    def __init__(self, split='train', ndim=3):
        super().__init__()
        train, valid = chainer.datasets.get_mnist(ndim=ndim)
        self.d = train if split == 'train' else valid

    def __len__(self):
        return len(self.d)

    def get_example(self, i):
        return self.d[i]
loss.py
from chainer import link
from chainer import reporter
from chainer.functions.evaluation import accuracy
from chainer.functions.loss import softmax_cross_entropy


class MyLoss(link.Chain):

    def __init__(self, predictor):
        super().__init__()
        self.lossfun = softmax_cross_entropy.softmax_cross_entropy
        self.accfun = accuracy.accuracy
        self.y = None
        self.loss = None
        self.accuracy = None
        with self.init_scope():
            self.predictor = predictor

    def __call__(self, *args):
        assert len(args) >= 2
        x = args[:-1]
        t = args[-1]
        self.y = None
        self.loss = None
        self.accuracy = None
        self.y = self.predictor(*x)
        self.loss = self.lossfun(self.y, t)
        reporter.report({'loss': self.loss}, self)
        self.accuracy = self.accfun(self.y, t)
        reporter.report({'accuracy': self.accuracy}, self)
        return self.loss
model.py
import chainer
import chainer.functions as F
import chainer.links as L


class Model(chainer.Chain):
    """Model definition.

    This is a template of model definition.
    """

    def __init__(self, n_class):
        super().__init__()
        with self.init_scope():
            pass

    def __call__(self, x):
        pass


# You can delete this! It's a sample model
class LeNet5(chainer.Chain):

    def __init__(self, n_class):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 6, 5, 1)
            self.conv2 = L.Convolution2D(6, 16, 5, 1)
            self.conv3 = L.Convolution2D(16, 120, 4, 1)
            self.fc4 = L.Linear(None, 84)
            self.fc5 = L.Linear(84, n_class)

    def __call__(self, x):
        h = F.relu(self.conv1(x))
        h = F.max_pooling_2d(h, 2, 2)
        h = F.relu(self.conv2(h))
        h = F.max_pooling_2d(h, 2, 2)
        h = F.relu(self.conv3(h))
        h = F.relu(self.fc4(h))
        return self.fc5(h)
After that, you basically edit these files for your own task, and when you're done, run:
$ chainer train config.yml --gpus 0
Then training will start on the device with GPU ID 0. If you want to use multiple GPUs, specify the IDs of the GPUs you want to use separated by spaces, e.g. `--gpus 0 1 2 3`.
config.yml
This file describes the training settings, which files to use for the model and dataset, and so on.
dataset
dataset:
  train:
    file: dataset.py
    name: MNIST
    batchsize: 128
    args:
      split: train
      ndim: 3
  valid:
    file: dataset.py
    name: MNIST
    batchsize: 64
    args:
      split: valid
      ndim: 3
These are the settings for the training dataset and the validation dataset. Define the dataset classes in a separate file in advance, then use the `file` key to specify the path to that file and the `name` key to specify the class name of the dataset class inside it. The value of `args` must be a dictionary; it is passed to the constructor of the dataset class as keyword arguments, just like `**args`. `batchsize` is the size of the mini-batch created from each dataset.
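Conceptually, what the tool does with this block should be something like the sketch below. This is only my rough illustration, not chainercmd's actual code; `load_dataset` and the variable names are hypothetical.

import importlib.util

import chainer


def load_dataset(config):
    # Hypothetical sketch: build a dataset and an iterator from one entry
    # (e.g. the "train" block) of the dataset section.
    # Load the module given by the "file" key, e.g. dataset.py.
    spec = importlib.util.spec_from_file_location('user_dataset', config['file'])
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Look up the class given by "name" and pass "args" as keyword arguments.
    dataset_class = getattr(module, config['name'])
    dataset = dataset_class(**config.get('args', {}))

    # "batchsize" becomes the mini-batch size of the iterator.
    iterator = chainer.iterators.SerialIterator(dataset, config['batchsize'])
    return dataset, iterator

For the `train` entry above, this would amount to `MNIST(split='train', ndim=3)` wrapped in a `SerialIterator` with a batch size of 128.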
model & loss
model:
  file: model.py
  name: LeNet5
  args:
    n_class: 10
This is much the same: the class named `name` in the file at the path given by `file` is instantiated to create the model, and if `args` is present, it is passed as keyword arguments.
loss:
  module: chainer.links
  # file: loss.py  # If you use your own loss definition, remove "module" key above and use "file" key to specify the path to the file which describes the "name" class for a Loss link.
  name: Classifier
The `loss` part is basically the same. `args` is omitted here, but if you specify a dictionary under an `args` key as in the `model` part, it is passed as keyword arguments to the constructor of the class used to compute the loss. For `loss` you can also use the `module` key, so that loss links Chainer already provides, such as `chainer.links.Classifier`, can be used as well. `module` and `file` cannot be used at the same time.
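To make the `module` / `file` distinction concrete, here is a sketch of how the model and loss entries could be resolved. Again, this is my own illustration rather than chainercmd's code, and the helper names (`_load_class`, `build_model_and_loss`) are hypothetical.

import importlib
import importlib.util


def _load_class(path, name):
    # Load class `name` from the Python file at `path` (same idea as the dataset sketch).
    spec = importlib.util.spec_from_file_location('user_module', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, name)


def build_model_and_loss(model_cfg, loss_cfg):
    # model: instantiate `name` from `file` with `args`, e.g. LeNet5(n_class=10).
    model_class = _load_class(model_cfg['file'], model_cfg['name'])
    model = model_class(**model_cfg.get('args', {}))

    if 'module' in loss_cfg:
        # "module": use a loss link Chainer already provides, e.g. chainer.links.Classifier.
        loss_class = getattr(importlib.import_module(loss_cfg['module']), loss_cfg['name'])
    else:
        # "file": use a user-defined loss link such as MyLoss in loss.py.
        loss_class = _load_class(loss_cfg['file'], loss_cfg['name'])

    # The loss link wraps the predictor; "args", if present, is passed as keyword arguments.
    return loss_class(model, **loss_cfg.get('args', {}))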
optimizer
optimizer:
  method: MomentumSGD
  args:
    lr: 0.01
  weight_decay: 0.0001
  lr_drop_ratio: 0.1
  lr_drop_triggers:
    points: [10, 15]
    unit: epoch
This is the Optimizer settings section. `method` is a class name under Chainer's `optimizers` module. `args` is a dictionary of keyword arguments passed to its constructor. If the `weight_decay` key is present, weight decay is added as an Optimizer hook. If both `lr_drop_ratio` and `lr_drop_triggers` are present, the learning rate is dropped using a ManualScheduleTrigger. In the dictionary passed to `lr_drop_triggers`, `points` gives the timings at which the learning rate is multiplied by `lr_drop_ratio`, and `unit` is the unit of those timings (`epoch` or `iteration` can be specified). In the example above, the learning rate of MomentumSGD is multiplied by 0.1 at 10 epochs and by 0.1 again at 15 epochs.
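In plain Chainer, the block above corresponds roughly to the following. This is only a sketch under the assumptions in the comments; the function names are mine, not part of chainercmd.

import chainer
from chainer import optimizers
from chainer.training import extensions, triggers


def create_optimizer(loss_link):
    # method / args: MomentumSGD(lr=0.01), set up on the loss link built above.
    optimizer = optimizers.MomentumSGD(lr=0.01)
    optimizer.setup(loss_link)
    # weight_decay: registered as an optimizer hook.
    optimizer.add_hook(chainer.optimizer.WeightDecay(0.0001))
    return optimizer


def add_lr_drop(trainer):
    # lr_drop_ratio / lr_drop_triggers: multiply "lr" by 0.1 at epochs 10 and 15.
    trainer.extend(
        extensions.ExponentialShift('lr', 0.1),
        trigger=triggers.ManualScheduleTrigger([10, 15], 'epoch'))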
updater
The Updater can be customized by preparing a function that takes the iterator, optimizer, and devices and returns an Updater object, and specifying it with the `updater_creator` key in config.yml.
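I haven't checked the exact interface chainercmd expects here, so the following is only a guess based on the commented-out `updater_creator` block in config.yml; in particular, treating `devices` as a dict with a `'main'` key is my assumption.

from chainer import training


class MyUpdaterCreator(object):
    # Hypothetical updater creator matching the commented-out config block.

    def __init__(self, **kwargs):
        # The "args" dict from config.yml (e.g. print: True) is assumed to
        # arrive here as keyword arguments.
        self._verbose = kwargs.get('print', False)

    def __call__(self, iterator, optimizer, devices):
        if self._verbose:
            print('Creating StandardUpdater on devices: {}'.format(devices))
        # Return any chainer.training.Updater object.
        return training.StandardUpdater(
            iterator, optimizer, device=devices['main'])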
The dataset classes, the model classes, and the wrapper class that computes the loss are all written in ordinary Chainer. As for custom_extension.py, when you want to add your own Extension to the Trainer, write your Extension class there, specify it from config.yml, and rewrite it as needed.
The files created by `chainer init` contain the model and dataset needed to run the MNIST example from the start, and config.yml is set up so that, left untouched, it runs the MNIST example using them. So all you have to do is run the `chainer` command with the `train` subcommand.
$ chainer train config.yml --gpus 0
Specify the path of the config YAML file you want to use after the `train` subcommand; the file name does not have to be config.yml. If you want to use a GPU, specify the device ID with the `--gpus` option. If you pass multiple IDs such as `--gpus 0 1 2 3`, ParallelUpdater or MultiprocessParallelUpdater (if NCCL is enabled) is selected automatically and training runs on multiple GPUs.
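For reference, the automatic selection corresponds very roughly to the sketch below in plain Chainer. The function name and the device-naming scheme are mine, and the real logic in chainercmd may well differ.

import chainer
from chainer.training import updaters


def create_updater(dataset, batchsize, optimizer, gpus):
    if len(gpus) > 1:
        # One iterator per GPU over a random split of the training data;
        # MultiprocessParallelUpdater requires NCCL, otherwise ParallelUpdater is the fallback.
        splits = chainer.datasets.split_dataset_n_random(dataset, len(gpus))
        iters = [chainer.iterators.MultiprocessIterator(s, batchsize) for s in splits]
        devices = {'main': gpus[0]}
        devices.update({'gpu{}'.format(g): g for g in gpus[1:]})
        return updaters.MultiprocessParallelUpdater(iters, optimizer, devices=devices)
    # Single GPU: a plain StandardUpdater is enough.
    train_iter = chainer.iterators.SerialIterator(dataset, batchsize)
    return updaters.StandardUpdater(train_iter, optimizer, device=gpus[0])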
On Ubuntu, you can avoid errors around PlotReport by setting the environment variable `MPLBACKEND=Agg`.
The result of actually running it is shown below. I used 4 GPUs, not that it was necessary.
$ MPLBACKEND=Agg chainer train config.yml --gpus 0 1 2 3
chainer version: 2.0.1
cuda: True, cudnn: True, nccl: True
result_dir: result/config_2017-07-18_23-26-41_0
train: 60000
valid: 10000
/home/shunta/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:142: UserWarning: optimizer.lr is changed to 0.0025 by MultiprocessParallelUpdater for new batch size.
format(optimizer.lr))
epoch iteration main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time lr
1 118 0.890775 0.234464 0.739672 0.928896 8.45449 0.0025
2 235 0.198075 0.141786 0.939503 0.957476 13.914 0.0025
3 352 0.128017 0.120378 0.960737 0.960839 19.1064 0.0025
4 469 0.100555 0.0979902 0.96895 0.969739 24.3107 0.0025
5 586 0.0865762 0.077968 0.971888 0.97587 29.2581 0.0025
6 704 0.0734014 0.0672336 0.976562 0.978837 34.3428 0.0025
7 821 0.0683174 0.0618281 0.977564 0.979826 39.1815 0.0025
8 938 0.0617364 0.0748559 0.980235 0.976958 44.0893 0.0025
9 1055 0.0573468 0.0596004 0.981904 0.980024 49.0457 0.0025
10 1172 0.0531992 0.0578394 0.98364 0.982694 54.3706 0.0025
11 1290 0.047489 0.0485524 0.986096 0.984573 59.3655 0.00025
12 1407 0.0417473 0.0482626 0.987513 0.984968 64.18 0.00025
13 1524 0.0406346 0.0473873 0.987914 0.984771 69.0114 0.00025
14 1641 0.0405981 0.0479212 0.987847 0.985265 74.0731 0.00025
15 1758 0.0394898 0.0478847 0.988114 0.986155 79.3369 0.00025
16 1875 0.0394069 0.0472816 0.988181 0.984968 84.2785 2.5e-05
17 1993 0.0389244 0.0471326 0.988546 0.985166 89.4715 2.5e-05
18 2110 0.0391655 0.046991 0.988181 0.985463 94.6602 2.5e-05
19 2227 0.0390729 0.0468674 0.988381 0.985364 99.7827 2.5e-05
20 2344 0.038587 0.0471131 0.988315 0.985166 104.962 2.5e-05
Because PlotReport extensions were set in config.yml, image files with the names specified there, such as loss.png and accuracy.png, are created in the result directory (a directory whose name concatenates the base name of the config file and the start date/time, created under result).
When training is started with the `chainer` command, a directory whose name includes the base name of the specified config file (for config.yml, the `config` part without the extension) and the start time is automatically created under a `result` directory in the place where the command was run, and the model file, loss file, config file, etc. are copied into it automatically. That directory is set as the Trainer's `out`, so snapshots and log files are written there as well.
What a tool like this can do is quite limited, but I found myself writing the same train.py over and over, so I generalized it as best I could. That said, for things like NLP or GANs it is hard to use unless the Updater can also be specified freely, so it is probably only useful for simple image recognition tasks.