Simple classification model with neural network

Continuing on from last time, I kept studying neural networks.

This time, the challenge was to build a simple classification model. Prepare the following two variables:

0 \le x_1,x_2 \le 1

Then classify them as follows, according to the sum of the two:

t(x_1,x_2) = 
\left\{
\begin{matrix}
0 & (x_1 + x_2 < 1) \\
1 & (x_1 + x_2 \ge 1)
\end{matrix}
\right.
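Written directly as code, this rule is just a threshold on the sum (for reference only; the training data later is generated vectorized with NumPy instead):

def t_label(x1, x2):
    # target rule: class 1 if x1 + x2 >= 1, otherwise class 0
    return 1 if x1 + x2 >= 1 else 0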

Let's implement such a simple model. Thinking of it as a zero-hidden-layer neural network (if that is the right term), consider the following formula.

y(x_1,x_2) = \sigma(w_1 x_1 + w_2 x_2 + w_0)
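A plain NumPy transcription of this formula, just for reference (the function names are mine; w1, w2, w0 are the parameters to be learned):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def y_pred(x1, x2, w1, w2, w0):
    # y(x1, x2) = sigma(w1*x1 + w2*x2 + w0)
    return sigmoid(w1 * x1 + w2 * x2 + w0)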

Here σ is the sigmoid function, whose output is confined to [0, 1]. This is the characteristic part: the output can be read as "class A if y < 0.5, class B if y > 0.5". In fact, I feel this is the crux of the whole thing. A neural network is a mechanism that produces the answer by adjusting w1, w2, and w0. As for the function that measures the error, in this case it turns out to be the cross entropy. Concretely, when the correct answer t is given for x1 and x2, the cross entropy is expressed as follows.

E(w_1,w_2,w_0) = - (t \ln y + (1-t) \ln (1-y) )

For details, see "Pattern Recognition and Machine Learning" (around p. 235), but as a rough picture, keeping in mind that 0 < y < 1:

- Consider y as the probability of class A (when t = 1)
- Consider 1 − y as the probability of class B (when t = 0)

Then, using the fact that X^0 = 1, the probability p(t) of observing a given t can be written as a single expression:

p(t)= y^t  (1-y)^{1-t}

For t = 1 this reduces to y, and for t = 0 it reduces to 1 − y, as intended. Taking the log of this then yields the formula for E(w_1, w_2, w_0). With multiple samples the probabilities multiply, so in terms of the per-sample outputs y_i:

p(t)= \prod_i y_i^{t_i}  (1-y_i)^{1-t_i}

Taking the log of this gives

\ln p(t)= \sum_i (t_i \ln y_i + (1-t_i) \ln (1-y_i) )

This is essentially the sum of the log-likelihoods of the correct answers, so the better the predictions match, the larger the value. To treat this as an optimization (minimization) problem, E is defined with the sign flipped. So, once more, the cross entropy (multi-sample version) is:

E(w_1,w_2,w_0) = - \sum_i (t_i \ln y_i + (1-t_i) \ln (1-y_i) )
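As a sanity check, this formula can be transcribed directly into NumPy (a sketch only; the small eps that guards against log(0) is my addition, not part of the article's code):

import numpy as np

def cross_entropy(y, t, eps=1e-12):
    # E = -sum( t*ln(y) + (1-t)*ln(1-y) ), clipped to avoid log(0)
    y = np.clip(y, eps, 1.0 - eps)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))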

Now, let's actually write the source code. First, the variable part.

# make placeholder
x_ph = tf.placeholder(tf.float32, [None, 3])
t_ph = tf.placeholder(tf.float32, [None, 1])

x_ph is the input part. Strictly it should have only two components, but I added a dummy variable (fixed to 1) for w0. Maybe this isn't needed? t_ph is the output part, used to give the correct answer; it contains 0 or 1.
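For what it's worth, the dense layers used later in this article (tf.layers.dense) add their own bias term by default (use_bias=True), so a version without the dummy column would presumably also work. A sketch, not what the rest of the article uses:

# hypothetical alternative: rely on the dense layer's built-in bias instead of a dummy column
x_ph_no_dummy = tf.placeholder(tf.float32, [None, 2])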

Then, generate the sample data to feed into these placeholders using random numbers.

# data making
N = 1000
x = np.random.rand(N,2)

# sum >= 1.0 -> 1 : else -> 0
t = np.floor(np.sum(x,axis=1))

# ext x
x = np.hstack([x,np.ones(N).reshape(N,1)])

After generating the two components of x, we create t, and then append one dummy column, so each sample of x ends up three-dimensional. t is created by truncating the decimal part of x1 + x2, which gives 1 when the sum is at least 1 and 0 when it is below 1.
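A quick sanity check one could add at this point (my addition, not in the original script) to confirm the floor trick matches the classification rule:

# the floor-based labels should agree with a direct comparison against 1.0
assert np.array_equal(t, (x[:, 0] + x[:, 1] >= 1.0).astype(float))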

Next, let's prepare a neural network with two layers for now.

# create neural network (depth=2, input:3 -> hidden:30 -> output:1)
hidden1 = tf.layers.dense(x_ph, 30, activation=tf.nn.relu)
newral_out = tf.layers.dense(hidden1, 1, activation=tf.nn.sigmoid)

Now let's define the cross entropy and the training step that minimizes it. This part is essentially copied from examples; I'm still not sure how to choose the optimization algorithm and its settings.

# Minimize the cross entropy
ce = -tf.reduce_sum(t_ph * tf.log(newral_out) + (1-t_ph)*tf.log(1-newral_out) )
optimizer = tf.train.AdamOptimizer()
train = optimizer.minimize(ce)
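As an aside, taking tf.log of a sigmoid output can blow up to NaN if the output saturates at exactly 0 or 1. A numerically safer sketch (my variation, not what this article runs) would keep the last layer linear and use TensorFlow's built-in helper:

# last layer outputs raw logits; the sigmoid is applied inside the loss helper
logits = tf.layers.dense(hidden1, 1, activation=None)
ce_stable = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=t_ph, logits=logits))
train_stable = tf.train.AdamOptimizer().minimize(ce_stable)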

So, I put the whole source together by combining the pieces above.

import numpy as np
import tensorflow as tf

# data making
N = 1000
x = np.random.rand(N,2)

# sum >= 1.0 -> 1 : else -> 0
t = np.floor(np.sum(x,axis=1))

# ext x
x = np.hstack([x,np.ones(N).reshape(N,1)])

train_x = x
train_t = t

# make placeholder
x_ph = tf.placeholder(tf.float32, [None, 3])
t_ph = tf.placeholder(tf.float32, [None, 1])
# create neural network (depth=2, input:3 -> hidden:30 -> output:1)
hidden1 = tf.layers.dense(x_ph, 30, activation=tf.nn.relu)
newral_out = tf.layers.dense(hidden1, 1, activation=tf.nn.sigmoid)

# Minimize the cross entropy
ce = -tf.reduce_sum(t_ph * tf.log(newral_out) + (1-t_ph)*tf.log(1-newral_out) )
optimizer = tf.train.AdamOptimizer()
train = optimizer.minimize(ce)


# initialize tensorflow session
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for k in range(1001):

    if np.mod(k,100) == 0:
        # get Newral predict data
        y_newral = sess.run( newral_out
                         ,feed_dict = {
                         x_ph: x, #I put the input data in x
                         })
        
        ce_newral = sess.run( ce
                         ,feed_dict = {
                         x_ph: x, #I put the input data in x
                         t_ph: t.reshape(len(t),1) # feed the correct answer data into t_ph
                         })
        
        sign_newral = np.sign(np.array(y_newral).reshape([len(t),1]) - 0.5)
        sign_orig = np.sign(np.array(t.reshape([len(t),1])) - 0.5)
        NGCNT = np.sum(np.abs(sign_newral-sign_orig))/2
        # check predict NewralParam
        print('[%d] loss %.2f hit_per:%.2f' % (k,ce_newral,(N-NGCNT)/N))


    # shuffle train_x and train_t
    n = np.random.permutation(len(train_x))
    train_x = train_x[n]
    train_t = train_t[n].reshape([len(train_t), 1])

    # execute train process
    sess.run(train,feed_dict = {
                     x_ph: train_x, # x is input data
                     t_ph: train_t # t is true data
                     })


#For test
x = np.array([0.41,0.5,1]).reshape([1,3])
loss_newral = sess.run( newral_out
                 ,feed_dict = {
                 x_ph: x, #I put the input data in x
                 })
# is it a success if this is < 0.5?
print(loss_newral)

Running this gives the following output.

[0] loss 727.36 hit_per:0.35
[100] loss 587.68 hit_per:0.78
[200] loss 465.78 hit_per:0.89
[300] loss 358.70 hit_per:0.93
[400] loss 282.45 hit_per:0.94
[500] loss 230.54 hit_per:0.96
[600] loss 194.34 hit_per:0.97
[700] loss 168.11 hit_per:0.98
[800] loss 148.34 hit_per:0.98
[900] loss 132.93 hit_per:0.99
[1000] loss 120.56 hit_per:0.99
[[0.27204064]]

Where it says loss, you can see the cross-entropy value gradually decreasing. (I should rename that variable...) hit_per is the accuracy on the training data; 1.00 would mean 100%, and here it reaches about 99%. Finally, the output of the network is shown for the test input x1 = 0.41, x2 = 0.5 from the code above. In this case it should be class B (t = 0), so an output below 0.5 is correct, and the closer the output is to 0.5, the less confident the answer. This time it is about 0.27, so the network answers it fairly confidently.
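To turn that probability into a hard class decision, one extra line on top of the test code would suffice (my addition; it just applies the 0.5 threshold described above):

# threshold the sigmoid output at 0.5: 0 -> class t=0, 1 -> class t=1
predicted_class = int(loss_newral[0, 0] >= 0.5)
print(predicted_class)  # prints 0 for this test point, since 0.27 < 0.5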

Finally, a few words about how the accuracy is obtained. The point is that we count the number of misclassifications (NG). The relevant excerpt is below.

        # get Newral predict data
        y_newral = sess.run( newral_out
                         ,feed_dict = {
                         x_ph: x, #I put the input data in x
                         })
        
        sign_newral = np.sign(np.array(y_newral).reshape([len(t),1]) - 0.5)
        sign_orig = np.sign(np.array(t.reshape([len(t),1])) - 0.5)
        NGCNT = np.sum(np.abs(sign_newral-sign_orig))/2

y_newral holds the classification estimated from x, as a value in the range [0, 1]. What matters is whether it falls above or below 0.5, so it has to be converted into something that can be compared directly. So I subtract 0.5, shifting the value into the range [-0.5, 0.5], and then take only the sign, so the result is either +1 or -1. The same processing is applied to t (the correct answer) to produce correct values of +1/-1. The two are correct when they agree and incorrect when they differ; writing out the patterns gives the following relationship.

| Estimated value | Correct answer value | Judgment | Estimated value − Correct answer value |
|---|---|---|---|
| 1 | 1 | OK | 0 |
| 1 | -1 | NG | 2 |
| -1 | 1 | NG | -2 |
| -1 | -1 | OK | 0 |

Therefore, the number of NGs can be counted as sum(abs(estimated value − correct answer value)) / 2, and from that the accuracy (the HIT rate) can be computed.
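As a tiny worked example of that count (illustrative values only, not taken from the training run):

import numpy as np

est = np.array([ 1,  1, -1, -1])   # sign(y - 0.5) for four samples
ans = np.array([ 1, -1,  1, -1])   # sign(t - 0.5) for the same samples
ng  = np.sum(np.abs(est - ans)) / 2
print(ng)            # 2.0 -> two of the four samples are misclassified
print((4 - ng) / 4)  # 0.5 -> the corresponding hit rate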

That worked nicely, but when I made the network simpler and dropped the hidden layer, it took a long time to converge, and the accuracy ended up noticeably worse. For example, with the following setting ...

newral_out = tf.layers.dense(x_ph, 1, activation=tf.nn.sigmoid)

The result:

[0] loss 761.80 hit_per:0.50
[100] loss 732.66 hit_per:0.50
[200] loss 706.48 hit_per:0.50
[300] loss 682.59 hit_per:0.50
[400] loss 660.61 hit_per:0.50
[500] loss 640.24 hit_per:0.54
[600] loss 621.26 hit_per:0.62
[700] loss 603.52 hit_per:0.70
[800] loss 586.88 hit_per:0.76
[900] loss 571.22 hit_per:0.80
[1000] loss 556.44 hit_per:0.84
[[0.52383685]]

Clearly this is not good enough. I still don't fully understand what makes the difference, but it seems that things work out because there is a hidden layer in between. How large should the hidden layer be for the model to behave well? I would like to study the theory around that as well.
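As a starting point for that experiment, one could parameterize the hidden width and sweep over it. A sketch (hypothetical helper, not something this article ran; no results implied):

def build_model(x_ph, hidden_units):
    # hidden_units = 0 means "no hidden layer", i.e. the simple model above
    h = x_ph
    if hidden_units > 0:
        h = tf.layers.dense(x_ph, hidden_units, activation=tf.nn.relu)
    return tf.layers.dense(h, 1, activation=tf.nn.sigmoid)

# to compare widths (e.g. 0, 1, 5, 30), rebuild the graph for each setting,
# calling tf.reset_default_graph() and recreating the placeholders each time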
