HeNormal is one of the weight initialization methods, also known as "MSRA". It was used in the paper by Kaiming He et al., and the names apparently come from the paper's first author and from Microsoft Research.
HeNormal initializes the weight $W$ with random values drawn from a normal distribution with standard deviation $\sqrt{2/N}$. Here, $N$ is the size of the input: the size of the input vector for `links.Linear`, and the number of input channels × kernel height × kernel width for `links.Convolution2D`.
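As a concrete illustration of this rule, here is a small NumPy sketch (the layer sizes are arbitrary examples, not anything prescribed by Chainer):

```python
import numpy as np

# links.Linear: N is the size of the input vector.
n_in, n_out = 100, 100
W_linear = np.random.normal(0.0, np.sqrt(2.0 / n_in),
                            size=(n_out, n_in)).astype(np.float32)

# links.Convolution2D: N is in_channels * kernel_height * kernel_width.
in_ch, out_ch, kh, kw = 3, 64, 3, 3
W_conv = np.random.normal(0.0, np.sqrt(2.0 / (in_ch * kh * kw)),
                          size=(out_ch, in_ch, kh, kw)).astype(np.float32)
```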
One way to initialize with HeNormal is to pass `math.sqrt(2)` as `wscale` when instantiating a link. For example, specify `chainer.links.Linear(100, 100, wscale=math.sqrt(2))`.
This method is also used in the Network-in-Network model in Chainer's ImageNet sample.
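For reference, here is that call written out with a rough empirical check. My understanding is that the default standard deviation of `L.Linear` used to be about $\sqrt{1/N}$, so `wscale = math.sqrt(2)` turns it into $\sqrt{2/N}$; this sketch assumes a Chainer version before v1.9.0 (see the caveat below):

```python
import math
import numpy as np
import chainer.links as L

# wscale = sqrt(2) rescales the default initialization to sqrt(2 / N),
# i.e. HeNormal, on Chainer versions before v1.9.0.
l = L.Linear(100, 100, wscale=math.sqrt(2))

# On those versions this should be close to sqrt(2 / 100) ~= 0.141.
print(np.std(l.W.data))
```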
However, it turns out that the behavior of `wscale` changed unintentionally in v1.9.0. I have already filed an issue about it: "wscale doesn't have backward compatibility". Until this problem is resolved, if you want to use HeNormal explicitly, you should avoid specifying it via `wscale`.
For example, if you publish results from a Chainer implementation and state that the weights were "initialized with HeNormal", or if another team's research uses HeNormal and you want to match the initialization method to reproduce it, I think it is better to use a weight initializer as described below.
In Chainer, you can specify how the weights are initialized by using a weight initializer. To initialize with an initializer, pass an instance of it to the link's `initialW` argument. Example:
import chainer
import chainer.links as L


class MLP(chainer.Chain):

    def __init__(self, n_in, n_units, n_out):
        initializer = chainer.initializers.HeNormal()
        super(MLP, self).__init__(
            l1=L.Linear(None, n_units, initialW=initializer),  # n_in -> n_units
            l2=L.Linear(None, n_units, initialW=initializer),  # n_units -> n_units
            l3=L.Linear(None, n_out, initialW=initializer),    # n_units -> n_out
        )
    ...
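To check that the initializer actually took effect, you can run a dummy input through one of the links and look at the empirical standard deviation. Since `None` is passed for the input size, the weights are created lazily at the first forward pass; the input size 784 below is only an example:

```python
import numpy as np

model = MLP(n_in=784, n_units=100, n_out=10)

x = np.zeros((1, 784), dtype=np.float32)
model.l1(x)  # first call creates l1.W using HeNormal

# The empirical std should be close to sqrt(2 / 784) ~= 0.0505
print(np.std(model.l1.W.data))
```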