Personal best practices when fine-tuning with Chainer

Merry Christmas! This is @tabe2314.

In this article, I'll share some personal best practices for fine-tuning existing models to create new ones with Chainer: techniques you can start using tomorrow.

Fine tuning

Fine-tuning means copying the parameters of a model trained on another problem or dataset and using them as the initial values of the parameters of a new neural network. A typical case is taking a network trained on ImageNet, a general object recognition dataset, and applying it to other problems such as object detection or semantic segmentation.

In general, deep learning requires a large amount of training data, but by fine-tuning from a model pretrained on a large dataset, you may be able to reach sufficient performance even when you only have a modest amount of data for the problem you actually want to solve. Fine-tuning also tends to shorten training time.

Converting a Caffe model to a Chainer model

When fine-tuning with Chainer, you will often want to start from a model published in Caffe's Model Zoo. In such cases, I recommend first converting the Caffe model to a Chainer model and then copying its parameters into the new model as described below. This section explains how to do that, using VGG (https://gist.github.com/ksimonyan/3785162f95cd2d5fee77#file-readme-md) as an example.

1. Load the Caffe model

Load the downloaded Caffe model as a Chainer model with chainer.links.caffe.CaffeFunction. Note that loading a large model takes quite a while.

```python
from chainer.links.caffe import CaffeFunction

vgg = CaffeFunction('VGG_ILSVRC_19_layers.caffemodel')
```

Once loaded, you can access each layer and its parameters under the names defined on the Caffe side, e.g. vgg.conv1_1.W.data.

2. Save as a Chainer model

Since loading a Caffe model takes time, it is convenient to save the model loaded with CaffeFunction as a Chainer model. Chainer 1.5 and later support HDF5 serialization, but by design this requires a Chainer network definition, which is a hassle to write separately when you have just loaded a Caffe model. For this reason, it's easier to save the model with cPickle.

```python
import cPickle as pickle

pickle.dump(vgg, open('vgg.pkl', 'wb'))
```
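Reloading the pickled model on later runs is then nearly instant compared with re-parsing the .caffemodel. A minimal sketch of the round trip (using a hypothetical stand-in class, so the example runs without the large model file; `pickle` is `cPickle` on Python 2):

```python
import pickle  # cPickle on Python 2; the plain pickle module on Python 3


# Hypothetical stand-in for the object returned by CaffeFunction, used
# here only so the example runs without the large .caffemodel file.
class DummyModel(object):
    def __init__(self):
        self.name = 'vgg'


model = DummyModel()

# Save once after the slow Caffe load...
with open('vgg.pkl', 'wb') as f:
    pickle.dump(model, f)

# ...then later runs can restore the model immediately.
with open('vgg.pkl', 'rb') as f:
    restored = pickle.load(f)

print(restored.name)  # -> vgg
```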

Copying parameters to a new model

This section shows how to copy parameters into a new model, either from a model converted from Caffe by the above procedure or from another trained Chainer model. The method presented here is useful when the network configurations of the source model and the new model differ partially. If the configurations are exactly the same, things are much simpler: just load the original model, train it as is, and save the result under a different name.

Typical situations where you fine-tune while partially changing the network configuration include the following:

- Replacing only the final layer to apply the model to a problem with different categories (e.g., using a model trained on ImageNet for scene recognition)
- Reusing only the first half of the convolution layers and training the rest as usual
- Keeping almost the same overall configuration but changing the parameters of some layers ("let's double the number of channels in the n-th layer")
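To make the matching rule concrete, here is a Chainer-free sketch (all layer names and shapes are illustrative, not the real VGG ones): a parameter can only be carried over when it exists in both models with the same shape, so a replaced final layer or a widened layer necessarily starts from a fresh initialization.

```python
# Illustrative parameter shapes (hypothetical, not the real VGG ones).
pretrained = {
    'conv1/W': (64, 3, 3, 3),
    'conv5/W': (512, 512, 3, 3),
    'fc8/W': (1000, 4096),         # 1000 ImageNet classes
}
new_model = {
    'conv1/W': (64, 3, 3, 3),
    'conv5/W': (1024, 512, 3, 3),  # "double the channels in the n-th layer"
    'fc8/W': (10, 4096),           # new task has only 10 classes
}

# A parameter is reusable only if its name and shape match in both models.
copied = [name for name in new_model
          if pretrained.get(name) == new_model[name]]
print(copied)  # -> ['conv1/W']
```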

All of these cases can be handled by passing the source model and the new model to the copy_model function below. This function looks for Links (functions with parameters) in the source model whose names and parameter shapes match those of Links in the destination, and copies their parameters. If Chains are nested, this is done recursively.

```python
from chainer import link


def copy_model(src, dst):
    assert isinstance(src, link.Chain)
    assert isinstance(dst, link.Chain)
    for child in src.children():
        if child.name not in dst.__dict__:
            continue
        dst_child = dst[child.name]
        if type(child) != type(dst_child):
            continue
        if isinstance(child, link.Chain):
            # Recurse into nested Chains.
            copy_model(child, dst_child)
        if isinstance(child, link.Link):
            # Copy only when every parameter name and shape matches.
            match = True
            for a, b in zip(child.namedparams(), dst_child.namedparams()):
                if a[0] != b[0]:
                    match = False
                    break
                if a[1].data.shape != b[1].data.shape:
                    match = False
                    break
            if not match:
                print('Ignore %s because of parameter mismatch' % child.name)
                continue
            for a, b in zip(child.namedparams(), dst_child.namedparams()):
                b[1].data = a[1].data
            print('Copy %s' % child.name)
```

Now you can automatically copy only the parts shared with the original model into a new model with a different configuration. Then train the new model however you like!
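What you end up with can be illustrated without Chainer as follows (all names and values here are made up, with plain lists standing in for weight arrays): layers shared with the source start from the pretrained weights, while changed layers keep their fresh initialization and are learned during subsequent training.

```python
import copy

# Stand-ins for weight arrays (illustrative values, not real weights).
pretrained = {'conv1/W': [0.5, -0.2], 'fc8/W': [0.1, 0.2, 0.3]}
new_model = {'conv1/W': [0.0, 0.0], 'fc8/W': [0.0, 0.0]}  # fc8 resized

# Same rule as copy_model: copy where both name and shape match.
for name, values in pretrained.items():
    if name in new_model and len(new_model[name]) == len(values):
        new_model[name] = copy.deepcopy(values)

# conv1 now starts from the pretrained weights; the resized fc8 keeps its
# fresh initialization and is trained from scratch on the new task.
print(new_model['conv1/W'])  # -> [0.5, -0.2]
print(new_model['fc8/W'])    # -> [0.0, 0.0]
```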

Summary

- Fine-tuning is a very important technique in practice.
- To fine-tune a Caffe model with Chainer, first load it with CaffeFunction and save it with pickle.
- With the copy_model function introduced above, you can copy the parameters of the parts a new model shares with the original model.
- The rest is up to you: do whatever you like with the new model!
