- It's a pain to have to write the same training code every time you just want to try something out quickly by writing only the model and dataset code.
- Chainer has a handy tool called Trainer, but it's still a hassle to write a train.py that adds the same set of extensions every time.
- It would be easier if there were a command line tool that, given a YAML file describing the training settings, the model file, the dataset file, and so on, just starts training in the usual way. So I made one.
- It's basically a tool made entirely for my own convenience, so it wasn't designed with other users in mind at all, but since I went to the trouble of writing it, I'm releasing it.
ChainerCMD
https://github.com/mitmul/chainercmd
$ pip install chainercmd
$ chainer init
Then the following files will be created. Their contents look like this.
config.yml
stop_trigger: [20, "epoch"]
# max_workspace_size: 8388608  # default: 8 * 1024 * 1024

dataset:
  train:
    file: dataset.py
    name: MNIST
    batchsize: 128
    args:
      split: train
      ndim: 3
  valid:
    file: dataset.py
    name: MNIST
    batchsize: 64
    args:
      split: valid
      ndim: 3

model:
  file: model.py
  name: LeNet5
  args:
    n_class: 10

loss:
  module: chainer.links.model.classifier
  # file: loss.py  # If you use your own loss definition, remove "module" key above and use "file" key to specify the path to the file which describes the "name" class for a Loss link.
  name: Classifier

optimizer:
  method: MomentumSGD
  args:
    lr: 0.01
  weight_decay: 0.0001
  lr_drop_ratio: 0.1
  lr_drop_triggers:
    points: [10, 15]
    unit: epoch

# You can ommit this part
# updater_creator:
#   file: updater_creator.py
#   name: MyUpdaterCreator
#   args:
#     print: True

trainer_extension:
  - custom:
      file: custom_extension.py
      name: CustomExtension
      args:
        message: 'I am learning...'
  - LogReport:
      trigger: [1, "epoch"]
  - dump_graph:
      root_name: main/loss
      out_name: cg.dot
  - observe_lr:
      trigger: [1, "epoch"]
  - ParameterStatistics:
      links:
        - conv1
        - conv2
        - conv3
        - fc4
        - fc5
      report_params: True
      report_grads: True
      prefix: null
      trigger: [1, "epoch"]
  - Evaluator:
      module: chainer.training.extensions
      name: Evaluator
      trigger: [1, "epoch"]
      prefix: val
      # You can specify other evaluator like this:
      # module: chainercv.extensions
      # name: SemanticSegmentationEvaluator
      # trigger: [1, "epoch"]
      # prefix: val
  - PlotReport:
      y_keys:
        - conv1/W/data/mean
        - conv2/W/data/mean
        - conv3/W/data/mean
        - conv4/W/data/mean
        - fc5/W/data/mean
        - fc6/W/data/mean
      x_key: epoch
      file_name: parameter_mean.png
      trigger: [1, "epoch"]
  - PlotReport:
      y_keys:
        - conv1/W/data/std
        - conv2/W/data/std
        - conv3/W/data/std
        - conv4/W/data/std
        - fc5/W/data/std
        - fc6/W/data/std
      x_key: epoch
      file_name: parameter_std.png
      trigger: [1, "epoch"]
  - PlotReport:
      y_keys:
        - main/loss
        - val/main/loss
      x_key: epoch
      file_name: loss.png
      trigger: [1, "epoch"]
  - PlotReport:
      y_keys:
        - main/accuracy
        - val/main/accuracy
      x_key: epoch
      file_name: accuracy.png
      trigger: [1, "epoch"]
  - PrintReport:
      entries:
        - epoch
        - iteration
        - main/loss
        - main/accuracy
        - val/main/loss
        - val/main/accuracy
        - elapsed_time
        - lr
      trigger: [1, "epoch"]
  - ProgressBar:
      update_interval: 10
      trigger: [10, "iteration"]
  - snapshot:
      filename: trainer_{.updater.epoch}_epoch
      trigger: [10, "epoch"]
custom_extension.py
import chainer


class CustomExtension(chainer.training.Extension):

    def __init__(self, message):
        self._message = message

    def initialize(self, trainer):
        self._message += ' and Trainer ID is: {}'.format(id(trainer))

    def __call__(self, trainer):
        pass

    def serialize(self, serializer):
        self._message = serializer('_message', self._message)
dataset.py
import chainer


class Dataset(chainer.dataset.DatasetMixin):

    def __init__(self, split='train'):
        super().__init__()

    def __len__(self):
        pass

    def get_example(self, i):
        pass


# You can delete this
class MNIST(chainer.dataset.DatasetMixin):

    def __init__(self, split='train', ndim=3):
        super().__init__()
        train, valid = chainer.datasets.get_mnist(ndim=ndim)
        self.d = train if split == 'train' else valid

    def __len__(self):
        return len(self.d)

    def get_example(self, i):
        return self.d[i]
loss.py
from chainer import link
from chainer import reporter
from chainer.functions.evaluation import accuracy
from chainer.functions.loss import softmax_cross_entropy


class MyLoss(link.Chain):

    def __init__(self, predictor):
        super().__init__()
        self.lossfun = softmax_cross_entropy.softmax_cross_entropy
        self.accfun = accuracy.accuracy
        self.y = None
        self.loss = None
        self.accuracy = None
        with self.init_scope():
            self.predictor = predictor

    def __call__(self, *args):
        assert len(args) >= 2
        x = args[:-1]
        t = args[-1]
        self.y = None
        self.loss = None
        self.accuracy = None
        self.y = self.predictor(*x)
        self.loss = self.lossfun(self.y, t)
        reporter.report({'loss': self.loss}, self)
        self.accuracy = self.accfun(self.y, t)
        reporter.report({'accuracy': self.accuracy}, self)
        return self.loss
model.py
import chainer
import chainer.functions as F
import chainer.links as L


class Model(chainer.Chain):
    """Model definition.

    This is a template of model definition.
    """

    def __init__(self, n_class):
        super().__init__()
        with self.init_scope():
            pass

    def __call__(self, x):
        pass


# You can delete this! It's a sample model
class LeNet5(chainer.Chain):

    def __init__(self, n_class):
        super().__init__()
        with self.init_scope():
            self.conv1 = L.Convolution2D(None, 6, 5, 1)
            self.conv2 = L.Convolution2D(6, 16, 5, 1)
            self.conv3 = L.Convolution2D(16, 120, 4, 1)
            self.fc4 = L.Linear(None, 84)
            self.fc5 = L.Linear(84, n_class)

    def __call__(self, x):
        h = F.relu(self.conv1(x))
        h = F.max_pooling_2d(h, 2, 2)
        h = F.relu(self.conv2(h))
        h = F.max_pooling_2d(h, 2, 2)
        h = F.relu(self.conv3(h))
        h = F.relu(self.fc4(h))
        return self.fc5(h)
After that, you basically edit these files for your own task, and when you're done, run:
$ chainer train config.yml --gpus 0
Then training will start on the device with GPU ID 0. If you want to use multiple GPUs, specify the IDs of the GPUs you want to use separated by spaces, e.g. `--gpus 0 1 2 3`.
config.yml
This file describes the training settings, which files to use for the model and dataset, and so on.
dataset
dataset:
  train:
    file: dataset.py
    name: MNIST
    batchsize: 128
    args:
      split: train
      ndim: 3
  valid:
    file: dataset.py
    name: MNIST
    batchsize: 64
    args:
      split: valid
      ndim: 3
These are the settings for the training dataset and the validation dataset. Define the dataset classes in a separate file in advance, then use the `file` key to specify the path to that file and the `name` key to specify the class name of the dataset class inside it. The value of `args` must be a dictionary; it is passed to the constructor of the dataset class as keyword arguments, just like `**args`. `batchsize` is the size of the mini-batch created from each dataset.
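Conceptually, what the tool does with this block should be something like the sketch below. This is only my rough illustration, not chainercmd's actual code; `load_dataset` and the variable names are hypothetical.

import importlib.util

import chainer


def load_dataset(config):
    # Hypothetical sketch: build a dataset and an iterator from one entry
    # (e.g. the "train" block) of the dataset section.
    # Load the module given by the "file" key, e.g. dataset.py.
    spec = importlib.util.spec_from_file_location('user_dataset', config['file'])
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)

    # Look up the class given by "name" and pass "args" as keyword arguments.
    dataset_class = getattr(module, config['name'])
    dataset = dataset_class(**config.get('args', {}))

    # "batchsize" becomes the mini-batch size of the iterator.
    iterator = chainer.iterators.SerialIterator(dataset, config['batchsize'])
    return dataset, iterator

For the `train` entry above, this would amount to `MNIST(split='train', ndim=3)` wrapped in a `SerialIterator` with a batch size of 128.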
model & loss
model:
  file: model.py
  name: LeNet5
  args:
    n_class: 10
This is much the same: the class named `name` in the file at the path given by `file` is instantiated to create the model, and if `args` is present, it is passed as keyword arguments.
loss:
  module: chainer.links
  # file: loss.py  # If you use your own loss definition, remove "module" key above and use "file" key to specify the path to the file which describes the "name" class for a Loss link.
  name: Classifier
The `loss` part is basically the same. `args` is omitted here, but if you specify a dictionary under an `args` key as in the `model` part, it is passed as keyword arguments to the constructor of the class used to compute the loss. For `loss` you can also use the `module` key, so that loss links Chainer already provides, such as `chainer.links.Classifier`, can be used as well. `module` and `file` cannot be used at the same time.
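To make the `module` / `file` distinction concrete, here is a sketch of how the model and loss entries could be resolved. Again, this is my own illustration rather than chainercmd's code, and the helper names (`_load_class`, `build_model_and_loss`) are hypothetical.

import importlib
import importlib.util


def _load_class(path, name):
    # Load class `name` from the Python file at `path` (same idea as the dataset sketch).
    spec = importlib.util.spec_from_file_location('user_module', path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, name)


def build_model_and_loss(model_cfg, loss_cfg):
    # model: instantiate `name` from `file` with `args`, e.g. LeNet5(n_class=10).
    model_class = _load_class(model_cfg['file'], model_cfg['name'])
    model = model_class(**model_cfg.get('args', {}))

    if 'module' in loss_cfg:
        # "module": use a loss link Chainer already provides, e.g. chainer.links.Classifier.
        loss_class = getattr(importlib.import_module(loss_cfg['module']), loss_cfg['name'])
    else:
        # "file": use a user-defined loss link such as MyLoss in loss.py.
        loss_class = _load_class(loss_cfg['file'], loss_cfg['name'])

    # The loss link wraps the predictor; "args", if present, is passed as keyword arguments.
    return loss_class(model, **loss_cfg.get('args', {}))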
optimizer
optimizer:
  method: MomentumSGD
  args:
    lr: 0.01
  weight_decay: 0.0001
  lr_drop_ratio: 0.1
  lr_drop_triggers:
    points: [10, 15]
    unit: epoch
This is the Optimizer settings section. `method` is a class name under Chainer's `optimizers` module. `args` is a dictionary of keyword arguments passed to its constructor. If the `weight_decay` key is present, weight decay is added as an Optimizer hook. If both `lr_drop_ratio` and `lr_drop_triggers` are present, the learning rate is dropped using a ManualScheduleTrigger. In the dictionary passed to `lr_drop_triggers`, `points` gives the timings at which the learning rate is multiplied by `lr_drop_ratio`, and `unit` is the unit of those timings (`epoch` or `iteration` can be specified). In the example above, the learning rate of MomentumSGD is multiplied by 0.1 at 10 epochs and by 0.1 again at 15 epochs.
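In plain Chainer, the block above corresponds roughly to the following. This is only a sketch under the assumptions in the comments; the function names are mine, not part of chainercmd.

import chainer
from chainer import optimizers
from chainer.training import extensions, triggers


def create_optimizer(loss_link):
    # method / args: MomentumSGD(lr=0.01), set up on the loss link built above.
    optimizer = optimizers.MomentumSGD(lr=0.01)
    optimizer.setup(loss_link)
    # weight_decay: registered as an optimizer hook.
    optimizer.add_hook(chainer.optimizer.WeightDecay(0.0001))
    return optimizer


def add_lr_drop(trainer):
    # lr_drop_ratio / lr_drop_triggers: multiply "lr" by 0.1 at epochs 10 and 15.
    trainer.extend(
        extensions.ExponentialShift('lr', 0.1),
        trigger=triggers.ManualScheduleTrigger([10, 15], 'epoch'))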
updater
The Updater can be customized by preparing a function that takes the iterator, optimizer, and devices and returns an Updater object, and specifying it with the `updater_creator` key in config.yml.
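I haven't checked the exact interface chainercmd expects here, so the following is only a guess based on the commented-out `updater_creator` block in config.yml; in particular, treating `devices` as a dict with a `'main'` key is my assumption.

from chainer import training


class MyUpdaterCreator(object):
    # Hypothetical updater creator matching the commented-out config block.

    def __init__(self, **kwargs):
        # The "args" dict from config.yml (e.g. print: True) is assumed to
        # arrive here as keyword arguments.
        self._verbose = kwargs.get('print', False)

    def __call__(self, iterator, optimizer, devices):
        if self._verbose:
            print('Creating StandardUpdater on devices: {}'.format(devices))
        # Return any chainer.training.Updater object.
        return training.StandardUpdater(
            iterator, optimizer, device=devices['main'])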
The dataset classes, the model classes, and the wrapper class that computes the loss are all written in ordinary Chainer. As for custom_extension.py, when you want to add your own Extension to the Trainer, write your Extension class there, specify it from config.yml, and rewrite it as needed.
The files created by `chainer init` contain the model and dataset needed to run the MNIST example from the start, and config.yml is set up so that, left untouched, it runs the MNIST example using them. So all you have to do is run the `chainer` command with the `train` subcommand.
$ chainer train config.yml --gpus 0
Specify the path of the config YAML file you want to use after the `train` subcommand; the file name does not have to be config.yml. If you want to use a GPU, specify the device ID with the `--gpus` option. If you pass multiple IDs such as `--gpus 0 1 2 3`, ParallelUpdater or MultiprocessParallelUpdater (if NCCL is enabled) is selected automatically and training runs on multiple GPUs.
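For reference, the automatic selection corresponds very roughly to the sketch below in plain Chainer. The function name and the device-naming scheme are mine, and the real logic in chainercmd may well differ.

import chainer
from chainer.training import updaters


def create_updater(dataset, batchsize, optimizer, gpus):
    if len(gpus) > 1:
        # One iterator per GPU over a random split of the training data;
        # MultiprocessParallelUpdater requires NCCL, otherwise ParallelUpdater is the fallback.
        splits = chainer.datasets.split_dataset_n_random(dataset, len(gpus))
        iters = [chainer.iterators.MultiprocessIterator(s, batchsize) for s in splits]
        devices = {'main': gpus[0]}
        devices.update({'gpu{}'.format(g): g for g in gpus[1:]})
        return updaters.MultiprocessParallelUpdater(iters, optimizer, devices=devices)
    # Single GPU: a plain StandardUpdater is enough.
    train_iter = chainer.iterators.SerialIterator(dataset, batchsize)
    return updaters.StandardUpdater(train_iter, optimizer, device=gpus[0])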
On Ubuntu, you can avoid errors around PlotReport by setting the environment variable `MPLBACKEND=Agg`.
The result of actually running it is shown below. I used 4 GPUs, not that it was necessary.
$ MPLBACKEND=Agg chainer train config.yml --gpus 0 1 2 3
chainer version: 2.0.1
cuda: True, cudnn: True, nccl: True
result_dir: result/config_2017-07-18_23-26-41_0
train: 60000
valid: 10000
/home/shunta/.pyenv/versions/anaconda3-4.4.0/lib/python3.6/site-packages/chainer/training/updaters/multiprocess_parallel_updater.py:142: UserWarning: optimizer.lr is changed to 0.0025 by MultiprocessParallelUpdater for new batch size.
format(optimizer.lr))
epoch iteration main/loss validation/main/loss main/accuracy validation/main/accuracy elapsed_time lr
1 118 0.890775 0.234464 0.739672 0.928896 8.45449 0.0025
2 235 0.198075 0.141786 0.939503 0.957476 13.914 0.0025
3 352 0.128017 0.120378 0.960737 0.960839 19.1064 0.0025
4 469 0.100555 0.0979902 0.96895 0.969739 24.3107 0.0025
5 586 0.0865762 0.077968 0.971888 0.97587 29.2581 0.0025
6 704 0.0734014 0.0672336 0.976562 0.978837 34.3428 0.0025
7 821 0.0683174 0.0618281 0.977564 0.979826 39.1815 0.0025
8 938 0.0617364 0.0748559 0.980235 0.976958 44.0893 0.0025
9 1055 0.0573468 0.0596004 0.981904 0.980024 49.0457 0.0025
10 1172 0.0531992 0.0578394 0.98364 0.982694 54.3706 0.0025
11 1290 0.047489 0.0485524 0.986096 0.984573 59.3655 0.00025
12 1407 0.0417473 0.0482626 0.987513 0.984968 64.18 0.00025
13 1524 0.0406346 0.0473873 0.987914 0.984771 69.0114 0.00025
14 1641 0.0405981 0.0479212 0.987847 0.985265 74.0731 0.00025
15 1758 0.0394898 0.0478847 0.988114 0.986155 79.3369 0.00025
16 1875 0.0394069 0.0472816 0.988181 0.984968 84.2785 2.5e-05
17 1993 0.0389244 0.0471326 0.988546 0.985166 89.4715 2.5e-05
18 2110 0.0391655 0.046991 0.988181 0.985463 94.6602 2.5e-05
19 2227 0.0390729 0.0468674 0.988381 0.985364 99.7827 2.5e-05
20 2344 0.038587 0.0471131 0.988315 0.985166 104.962 2.5e-05
Because PlotReport extensions were set in config.yml, image files with the names specified there, such as loss.png and accuracy.png, are created in the result directory (a directory whose name concatenates the base name of the config file and the start date/time, created under result).
When training is started with the `chainer` command, a directory whose name includes the base name of the specified config file (for config.yml, the `config` part without the extension) and the start time is automatically created under a `result` directory in the place where the command was run, and the model file, loss file, config file, etc. are copied into it automatically. That directory is set as the Trainer's `out`, so snapshots and log files are written there as well.
What a tool like this can do is quite limited, but I found myself writing the same train.py over and over, so I generalized it as best I could. That said, for things like NLP or GANs it is hard to use unless the Updater can also be specified freely, so it is probably only useful for simple image recognition tasks.