HeNormal is one of the weight initialization methods, also known as "MSRA". It was used in the paper by Kaiming He et al., and the names apparently come from the paper's first author and from Microsoft Research.
HeNormal initializes the weight $W$ with random values drawn from a normal distribution with standard deviation $\sqrt{2/N}$. Here, $N$ is the size of the input: the size of the input vector for `links.Linear`, and the number of input channels × kernel height × kernel width for `links.Convolution2D`.
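As a concrete illustration of this rule, here is a small NumPy sketch (the layer sizes are arbitrary examples, not anything prescribed by Chainer):

```python
import numpy as np

# links.Linear: N is the size of the input vector.
n_in, n_out = 100, 100
W_linear = np.random.normal(0.0, np.sqrt(2.0 / n_in),
                            size=(n_out, n_in)).astype(np.float32)

# links.Convolution2D: N is in_channels * kernel_height * kernel_width.
in_ch, out_ch, kh, kw = 3, 64, 3, 3
W_conv = np.random.normal(0.0, np.sqrt(2.0 / (in_ch * kh * kw)),
                          size=(out_ch, in_ch, kh, kw)).astype(np.float32)
```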
One way to initialize with HeNormal is to pass `math.sqrt(2)` as `wscale` when instantiating a link. For example, specify `chainer.links.Linear(100, 100, wscale=math.sqrt(2))`.
This method is also used in the Network-in-Network model in Chainer's ImageNet sample.
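For reference, here is that call written out with a rough empirical check. My understanding is that the default standard deviation of `L.Linear` used to be about $\sqrt{1/N}$, so `wscale = math.sqrt(2)` turns it into $\sqrt{2/N}$; this sketch assumes a Chainer version before v1.9.0 (see the caveat below):

```python
import math
import numpy as np
import chainer.links as L

# wscale = sqrt(2) rescales the default initialization to sqrt(2 / N),
# i.e. HeNormal, on Chainer versions before v1.9.0.
l = L.Linear(100, 100, wscale=math.sqrt(2))

# On those versions this should be close to sqrt(2 / 100) ~= 0.141.
print(np.std(l.W.data))
```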
However, it turns out that the behavior of `wscale` changed unintentionally in v1.9.0. I have already filed an issue about it: "wscale doesn't have backward compatibility". Until this problem is resolved, if you want to use HeNormal explicitly, you should avoid specifying it via `wscale`.
For example, if you publish results from a Chainer implementation and state that the weights were "initialized with HeNormal", or if another team's research uses HeNormal and you want to match the initialization method to reproduce it, I think it is better to use a weight initializer as described below.
In Chainer, you can specify how the weights are initialized by using a weight initializer. To initialize with an initializer, pass an instance of it to the link's `initialW` argument. Example:
import chainer
import chainer.links as L


class MLP(chainer.Chain):

    def __init__(self, n_in, n_units, n_out):
        initializer = chainer.initializers.HeNormal()
        super(MLP, self).__init__(
            l1=L.Linear(None, n_units, initialW=initializer),  # n_in -> n_units
            l2=L.Linear(None, n_units, initialW=initializer),  # n_units -> n_units
            l3=L.Linear(None, n_out, initialW=initializer),    # n_units -> n_out
        )
    ...
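To check that the initializer actually took effect, you can run a dummy input through one of the links and look at the empirical standard deviation. Since `None` is passed for the input size, the weights are created lazily at the first forward pass; the input size 784 below is only an example:

```python
import numpy as np

model = MLP(n_in=784, n_units=100, n_out=10)

x = np.zeros((1, 784), dtype=np.float32)
model.l1(x)  # first call creates l1.W using HeNormal

# The empirical std should be close to sqrt(2 / 784) ~= 0.0505
print(np.std(model.l1.W.data))
```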