It is better to use a Weight Initializer when initializing with HeNormal in Chainer

What is HeNormal?

HeNormal is one of the methods for initializing weights, also known as "MSRA". It is used in the paper by Kaiming He et al. [1], and the names appear to come from the paper's first author and from Microsoft Research.

HeNormal initializes the weight $W$ with random numbers drawn from a zero-mean normal distribution with standard deviation $\sqrt{2 / N}$. Here, $N$ is the size of the input: the length of the input vector for links.Linear, and (number of input channels) x (kernel height) x (kernel width) for links.Convolution2D.
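The definition above can be sketched in plain Python. This is only an illustration of the formula, not Chainer's implementation: the helper names `fan_in`, `he_normal_std`, and `he_normal_sample` are hypothetical, and the weight shapes are assumed to be (out_size, in_size) for a linear layer and (out_channels, in_channels, kernel_height, kernel_width) for a 2D convolution.

```python
import math
import random


def fan_in(shape):
    """Return N, the input size: the product of all dimensions except the first.

    Assumed shapes: (out_size, in_size) for a linear layer,
    (out_channels, in_channels, kernel_h, kernel_w) for a 2D convolution.
    """
    n = 1
    for dim in shape[1:]:
        n *= dim
    return n


def he_normal_std(shape):
    """Standard deviation sqrt(2 / N) used by HeNormal."""
    return math.sqrt(2.0 / fan_in(shape))


def he_normal_sample(shape, seed=0):
    """Draw a flat list of weights from N(0, sqrt(2 / N)) for the given shape."""
    rng = random.Random(seed)
    std = he_normal_std(shape)
    count = shape[0] * fan_in(shape)
    return [rng.gauss(0.0, std) for _ in range(count)]


# Linear layer with 100 inputs: N = 100
print(fan_in((100, 100)), he_normal_std((100, 100)))
# Convolution with 3 input channels and a 5x5 kernel: N = 3 * 5 * 5 = 75
print(fan_in((64, 3, 5, 5)), he_normal_std((64, 3, 5, 5)))
```

Note that the standard deviation depends only on the input side of the layer, so two layers with the same input size get the same scale regardless of how many outputs they have.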

Can't you just specify wscale?

One way to initialize with HeNormal is to pass math.sqrt(2) as wscale when instantiating a link, for example chainer.links.Linear(100, 100, wscale=math.sqrt(2)). This approach is also used in the Network-in-Network model in Chainer's ImageNet example. However, it turns out that the behavior of wscale changed unintentionally as of v1.9.0. I have already filed an issue titled "wscale doesn't have backward compatibility". If you want to use HeNormal explicitly, you should avoid specifying it through wscale until this problem is resolved.
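For reference, here is why wscale = math.sqrt(2) was expected to give HeNormal. This assumes that the link's default initialization (before the v1.9.0 change) draws from a normal distribution with standard deviation $\sqrt{1 / N}$ and that wscale simply multiplies that standard deviation; both points are my understanding of the older behavior, not something the issue itself confirms.

$$
\sigma = \mathrm{wscale} \cdot \sqrt{\frac{1}{N}} = \sqrt{2} \cdot \sqrt{\frac{1}{N}} = \sqrt{\frac{2}{N}}
$$

If the default standard deviation is anything other than $\sqrt{1 / N}$, the same wscale no longer yields $\sqrt{2 / N}$, which is why relying on wscale is fragile.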

For example, if you state "initialized with HeNormal" when publishing results obtained with a Chainer implementation, or if another team's research uses HeNormal and you want to match their initialization in order to reproduce it, I think it is better to use a Weight Initializer as described below.

How to use Weight Initializer

In Chainer, you can specify how weights are initialized by using a Weight Initializer. To do so, pass an initializer instance as the `initialW` argument of a link. Example:

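import chainer
import chainer.links as L


class MLP(chainer.Chain):

    def __init__(self, n_in, n_units, n_out):
        # One HeNormal initializer instance is shared by all three layers.
        initializer = chainer.initializers.HeNormal()
        # Passing None as the input size lets each link infer it at the
        # first forward pass, so n_in is not used directly here.
        super(MLP, self).__init__(
            l1=L.Linear(None, n_units, initialW=initializer),  # n_in -> n_units
            l2=L.Linear(None, n_units, initialW=initializer),  # n_units -> n_units
            l3=L.Linear(None, n_out, initialW=initializer),  # n_units -> n_out
        )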

    ...

References

  1. Kaiming He et al. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification (2015)
