As a Chainer exercise, I tried a simple linear separation problem.
I want to learn a function that takes height (cm), weight (kg), and chest circumference (cm) as input and determines whether a person is obese. Obesity here is defined as a BMI of 25 or more, where BMI is weight in kilograms divided by the square of height in meters. Height and weight alone are therefore sufficient to decide obesity; the chest circumference carries no extra information. The question is whether the learner built here can judge obesity by focusing only on height and weight, without being confused by the chest circumference.
I made dummy data in Excel. Each line contains height, weight, chest circumference, and an obesity flag, separated by spaces. Height, weight, and chest circumference were each generated by adding normal random noise with an appropriate variance to the male average. The obesity flag was set to 1 if the BMI computed from height and weight was 25 or more. I generated 1000 such samples independently: 900 for training and 100 for evaluation. (A Python sketch of an equivalent generator appears after the sample data below.)
Height Weight Chest circumference Obesity flag
152.5110992 70.64096855 76.24909648 1
176.5483602 72.54812988 79.99468908 0
171.9815877 78.13768514 80.87788608 1
180.013773 77.60660479 79.71464192 0
171.9685041 81.20240554 84.93720091 1
186.3999693 77.03393024 82.25099179 0
175.1117213 81.23388203 86.89111757 1
As you can see, the data are almost linearly separable.
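For reference, an equivalent generator could be written in Python instead of Excel. This is a minimal sketch; the means and standard deviations here are my own guesses, not the exact values used for the data above.

import numpy as np

np.random.seed(0)
n = 1000

# Rough male averages with assumed standard deviations
height = np.random.normal(171.0, 5.5, n)   # cm
weight = np.random.normal(72.0, 8.0, n)    # kg
chest = np.random.normal(86.0, 5.0, n)     # cm

# Obesity flag: BMI = weight [kg] / (height [m])^2 >= 25
bmi = weight / (height / 100.0) ** 2
flag = (bmi >= 25).astype(np.int32)

# One sample per line: height, weight, chest circumference, flag
data = np.column_stack([height, weight, chest, flag])
np.savetxt('dummy.txt', data, fmt='%.7f %.7f %.7f %d')
# First 900 lines for training, last 100 for evaluation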
Having practiced with Chainer, I tried building a multi-layer perceptron. It has a three-layer structure: 3 input dimensions, 4 hidden units, and 2 output dimensions. (Since this is a linearly separable task, a single-layer perceptron would also do.) The other settings are as follows.
- Activation function: ReLU
- Optimization algorithm: Adam
- Loss function: softmax cross entropy
- Dropout rate: 0.5
- Mini-batch size: 5
- Number of epochs: 100
import chainer.functions as F
import chainer.links as L
from chainer import Chain, optimizers

class MLP(Chain):
    def __init__(self):
        super(MLP, self).__init__(
            # 3-4-2 dimensional network
            l1=L.Linear(3, 4),
            l2=L.Linear(4, 2),
        )

    def forward(self, x, t, train):
        # Hidden layer: ReLU activation followed by dropout (ratio 0.5 by default)
        h1 = F.dropout(F.relu(self.l1(x)), train=train)
        y = self.l2(h1)
        # Return the loss and the accuracy for this batch
        return F.softmax_cross_entropy(y, t), F.accuracy(y, t)
# Instantiation
model = MLP()
# Adam is used as the optimization algorithm
optimizer = optimizers.Adam()
optimizer.setup(model)
N = 900        # number of training samples
N_test = 100   # number of evaluation samples
n_epoch = 100  # number of epochs
batchsize = 5  # mini-batch size
# Omitted below
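The training loop is omitted in the original. For reference, a minimal sketch of what it might look like in classic Chainer (v1) style, assuming x_train and x_test are float32 feature arrays and y_train and y_test are int32 label arrays loaded from the data file:

import numpy as np
from chainer import Variable

for epoch in range(n_epoch):
    perm = np.random.permutation(N)  # shuffle the training data each epoch
    sum_loss = sum_acc = 0.0
    for i in range(0, N, batchsize):
        x = Variable(x_train[perm[i:i + batchsize]])
        t = Variable(y_train[perm[i:i + batchsize]])
        model.zerograds()                       # clear accumulated gradients
        loss, acc = model.forward(x, t, train=True)
        loss.backward()                         # backpropagation
        optimizer.update()                      # one Adam step
        sum_loss += float(loss.data) * len(x.data)
        sum_acc += float(acc.data) * len(x.data)
    # Evaluate on the held-out data with dropout disabled
    loss, acc = model.forward(Variable(x_test), Variable(y_test), train=False)
    print('epoch %d: train loss %.3f acc %.3f / eval loss %.3f acc %.3f'
          % (epoch + 1, sum_loss / N, sum_acc / N, float(loss.data), float(acc.data)))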
The loss and the accuracy were plotted against the number of epochs.
- Left, blue: loss on the training data
- Left, green: accuracy on the training data
- Right, blue: loss on the evaluation data
- Right, green: accuracy on the evaluation data
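A plot like this could be drawn with matplotlib, assuming the per-epoch values were recorded in lists named train_loss, train_acc, test_loss, and test_acc (the names are my own):

import matplotlib.pyplot as plt

fig, (left, right) = plt.subplots(1, 2, figsize=(10, 4))
left.plot(train_loss, 'b-', label='loss')      # left, blue: training loss
left.plot(train_acc, 'g-', label='accuracy')   # left, green: training accuracy
left.set_title('training data')
right.plot(test_loss, 'b-', label='loss')      # right, blue: evaluation loss
right.plot(test_acc, 'g-', label='accuracy')   # right, green: evaluation accuracy
right.set_title('evaluation data')
for ax in (left, right):
    ax.set_xlabel('epoch')
    ax.legend()
plt.show()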
Accuracy settled at a little under 80%. Underwhelming?
I also looked at the output the trained network produces for the evaluation data.
[Height Weight Chest circumference] Obesity flag estimated by the system / Correct obesity flag
[ 179.30055237 69.73477936 84.73832703] 0 0
[ 176.89619446 84.05502319 85.10128021] 1 1
[ 172.04129028 77.36618805 87.89541626] 1 1
[ 168.48660278 73.91072845 84.5171814 ] 1 1
[ 166.53656006 71.42696381 83.17546844] 0 1
[ 163.44270325 77.11021423 90.57539368] 1 1
[ 180.63993835 77.33372498 85.33548737] 0 0
[ 165.73175049 71.87976837 80.57328033] 0 1
The last row and the fourth from the bottom, which are actually obese, were judged normal. Both have low weight if you look at weight alone; their BMIs (about 26.2 and 25.8) are only slightly above the threshold of 25. Perhaps the network has not fully grasped the relationship between weight and height?
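A table like the one above could be produced with a sketch like this (my own reconstruction: it runs the layers directly so that dropout is skipped, and takes the argmax of the 2-dimensional output as the estimated flag):

import chainer.functions as F
from chainer import Variable

y = model.l2(F.relu(model.l1(Variable(x_test))))  # forward pass without dropout
pred = y.data.argmax(axis=1)                      # estimated obesity flag
for features, p, t in zip(x_test, pred, y_test):
    print(features, p, t)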
Two problems came up along the way.

- The warning "overflow encountered in subtract" appeared, and the loss often became nan. This seems to be caused by taking the log of 0 when computing the cross-entropy loss (a tiny illustration follows this list). In fact, at first I used the identity (linear) function as the activation, and that apparently did not work.
- Training sometimes fell into a local minimum and stopped making progress. When that happened, I reduced the mini-batch size, re-initialized the weights, and retried several times until training went well.
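A tiny NumPy illustration of the first failure mode (my own, not from the actual run): once a softmax output collapses to exactly 0, its log is -inf, and multiplying that by 0 yields nan.

import numpy as np

p = np.array([1.0, 0.0])   # degenerate softmax output
print(np.log(p))           # [  0. -inf]  with a divide-by-zero RuntimeWarning
print(0.0 * np.log(p)[1])  # nan: 0 * -inf, which contaminates any sum it enters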
For this task, accuracy below 80% is on the low side. I want to build intuition by experimenting with the learning rate, mini-batch size, dropout rate, data normalization, and so on.