As a Chainer exercise, I tried a simple linear separation problem.
I want to learn a function that takes height (cm), weight (kg), and chest circumference (cm) as input and determines whether a person is obese. Obesity here is defined as a BMI of 25 or more, where BMI is weight in kilograms divided by the square of height in meters. Height and weight alone are therefore sufficient to decide obesity; the chest circumference carries no extra information. The question is whether the learner built here can judge obesity by focusing only on height and weight, without being confused by the chest circumference.
I made dummy data in Excel. Each line contains height, weight, chest circumference, and an obesity flag, separated by spaces. Height, weight, and chest circumference were each generated by adding normal random noise with an appropriate variance to the male average. The obesity flag was set to 1 if the BMI computed from height and weight was 25 or more. I generated 1000 such samples independently: 900 for training and 100 for evaluation. (A Python sketch of an equivalent generator appears after the sample data below.)
Height Weight Chest circumference Obesity flag
152.5110992 70.64096855 76.24909648 1
176.5483602 72.54812988 79.99468908 0
171.9815877 78.13768514 80.87788608 1
180.013773 77.60660479 79.71464192 0
171.9685041 81.20240554 84.93720091 1
186.3999693 77.03393024 82.25099179 0
175.1117213 81.23388203 86.89111757 1
As you can see, the data are almost linearly separable.
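For reference, an equivalent generator could be written in Python instead of Excel. This is a minimal sketch; the means and standard deviations here are my own guesses, not the exact values used for the data above.

import numpy as np

np.random.seed(0)
n = 1000

# Rough male averages with assumed standard deviations
height = np.random.normal(171.0, 5.5, n)   # cm
weight = np.random.normal(72.0, 8.0, n)    # kg
chest = np.random.normal(86.0, 5.0, n)     # cm

# Obesity flag: BMI = weight [kg] / (height [m])^2 >= 25
bmi = weight / (height / 100.0) ** 2
flag = (bmi >= 25).astype(np.int32)

# One sample per line: height, weight, chest circumference, flag
data = np.column_stack([height, weight, chest, flag])
np.savetxt('dummy.txt', data, fmt='%.7f %.7f %.7f %d')
# First 900 lines for training, last 100 for evaluation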
Having practiced with Chainer, I tried building a multi-layer perceptron. It has a three-layer structure: 3 input dimensions, 4 hidden units, and 2 output dimensions. (Since this is a linearly separable task, a single-layer perceptron would also do.) The other settings are as follows.
- Activation function: ReLU
- Optimization algorithm: Adam
- Loss function: softmax cross entropy
- Dropout rate: 0.5
- Mini-batch size: 5
- Number of epochs: 100
import chainer.functions as F
import chainer.links as L
from chainer import Chain, optimizers

class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            # 3-4-2 dimensional network
            l1=L.Linear(3, 4),
            l2=L.Linear(4, 2),
        )

    def forward(self, x, t, train):
        # Hidden layer: ReLU activation followed by dropout (ratio 0.5 by default)
        h1 = F.dropout(F.relu(self.l1(x)), train=train)
        y = self.l2(h1)
        # Return the loss and the accuracy for this batch
        return F.softmax_cross_entropy(y, t), F.accuracy(y, t)
# Instantiation
model = MLP()
# Adam is used as the optimization algorithm
optimizer = optimizers.Adam()
optimizer.setup(model)
N = 900        # number of training samples
N_test = 100   # number of evaluation samples
n_epoch = 100  # number of epochs
batchsize = 5  # mini-batch size
# Omitted below
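The training loop is omitted in the original. For reference, a minimal sketch of what it might look like in classic Chainer (v1) style, assuming x_train and x_test are float32 feature arrays and y_train and y_test are int32 label arrays loaded from the data file:

import numpy as np
from chainer import Variable

for epoch in range(n_epoch):
    perm = np.random.permutation(N)  # shuffle the training data each epoch
    sum_loss = sum_acc = 0.0
    for i in range(0, N, batchsize):
        x = Variable(x_train[perm[i:i + batchsize]])
        t = Variable(y_train[perm[i:i + batchsize]])
        model.zerograds()                       # clear accumulated gradients
        loss, acc = model.forward(x, t, train=True)
        loss.backward()                         # backpropagation
        optimizer.update()                      # one Adam step
        sum_loss += float(loss.data) * len(x.data)
        sum_acc += float(acc.data) * len(x.data)
    # Evaluate on the held-out data with dropout disabled
    loss, acc = model.forward(Variable(x_test), Variable(y_test), train=False)
    print('epoch %d: train loss %.3f acc %.3f / eval loss %.3f acc %.3f'
          % (epoch + 1, sum_loss / N, sum_acc / N, float(loss.data), float(acc.data)))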
The loss and the accuracy were plotted against the number of epochs.
- Left, blue: loss on the training data
- Left, green: accuracy on the training data
- Right, blue: loss on the evaluation data
- Right, green: accuracy on the evaluation data
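A plot like this could be drawn with matplotlib, assuming the per-epoch values were recorded in lists named train_loss, train_acc, test_loss, and test_acc (the names are my own):

import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
left.plot(train_loss, 'b-', label='loss')      # left, blue: training loss
left.plot(train_acc, 'g-', label='accuracy')   # left, green: training accuracy
left.set_title('training data')
right.plot(test_loss, 'b-', label='loss')      # right, blue: evaluation loss
right.plot(test_acc, 'g-', label='accuracy')   # right, green: evaluation accuracy
right.set_title('evaluation data')
for ax in (left, right):
    ax.set_xlabel('epoch')
    ax.legend()
plt.show()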
Accuracy settled at a little under 80%. Underwhelming?
I also looked at the output the trained network produces for the evaluation data.
[Height Weight Chest circumference] Obesity flag estimated by the system / Correct obesity flag
[ 179.30055237 69.73477936 84.73832703] 0 0
[ 176.89619446 84.05502319 85.10128021] 1 1
[ 172.04129028 77.36618805 87.89541626] 1 1
[ 168.48660278 73.91072845 84.5171814 ] 1 1
[ 166.53656006 71.42696381 83.17546844] 0 1
[ 163.44270325 77.11021423 90.57539368] 1 1
[ 180.63993835 77.33372498 85.33548737] 0 0
[ 165.73175049 71.87976837 80.57328033] 0 1
The last row and the fourth from the bottom, which are actually obese, were judged normal. Both have low weight if you look at weight alone; their BMIs (about 26.2 and 25.8) are only slightly above the threshold of 25. Perhaps the network has not fully grasped the relationship between weight and height?
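A table like the one above could be produced with a sketch like this (my own reconstruction: it runs the layers directly so that dropout is skipped, and takes the argmax of the 2-dimensional output as the estimated flag):

import chainer.functions as F
from chainer import Variable

y = model.l2(F.relu(model.l1(Variable(x_test))))  # forward pass without dropout
pred = y.data.argmax(axis=1)                      # estimated obesity flag
for features, p, t in zip(x_test, pred, y_test):
    print(features, p, t)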
Two problems came up along the way.

- The warning "overflow encountered in subtract" appeared, and the loss often became nan. This seems to be caused by taking the log of 0 when computing the cross-entropy loss (a tiny illustration follows this list). In fact, at first I used the identity (linear) function as the activation, and that apparently did not work.
- Training sometimes fell into a local minimum and stopped making progress. When that happened, I reduced the mini-batch size, re-initialized the weights, and retried several times until training went well.
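A tiny NumPy illustration of the first failure mode (my own, not from the actual run): once a softmax output collapses to exactly 0, its log is -inf, and multiplying that by 0 yields nan.

import numpy as np

p = np.array([1.0, 0.0])   # degenerate softmax output
print(np.log(p))           # [  0. -inf]  with a divide-by-zero RuntimeWarning
print(0.0 * np.log(p)[1])  # nan: 0 * -inf, which contaminates any sum it enters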
For this task, accuracy below 80% is on the low side. I want to build intuition by experimenting with the learning rate, mini-batch size, dropout rate, data normalization, and so on.