Simple classification model with neural network

Continuing on from last time, I kept studying neural networks.

This time, the challenge was to build a simple classification model. Prepare the following two variables:

0 \le x_1,x_2 \le 1

Then classify them as follows, according to the sum of the two:

t(x_1,x_2) = 
\left\{
\begin{matrix}
0 & (x_1 + x_2 < 1) \\
1 & (x_1 + x_2 \ge 1)
\end{matrix}
\right.
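Written directly as code, this rule is just a threshold on the sum (for reference only; the training data later is generated vectorized with NumPy instead):

def t_label(x1, x2):
    # target rule: class 1 if x1 + x2 >= 1, otherwise class 0
    return 1 if x1 + x2 >= 1 else 0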

Let's implement such a simple model. Thinking of it as a zero-hidden-layer neural network (if that is the right term), consider the following formula.

y(x_1,x_2) = \sigma(w_1 x_1 + w_2 x_2 + w_0)
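A plain NumPy transcription of this formula, just for reference (the function names are mine; w1, w2, w0 are the parameters to be learned):

import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def y_pred(x1, x2, w1, w2, w0):
    # y(x1, x2) = sigma(w1*x1 + w2*x2 + w0)
    return sigmoid(w1 * x1 + w2 * x2 + w0)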

Here σ is the sigmoid function, whose output is confined to [0, 1]. This is the characteristic part: the output can be read as "class A if y < 0.5, class B if y > 0.5". In fact, I feel this is the crux of the whole thing. A neural network is a mechanism that produces the answer by adjusting w1, w2, and w0. As for the function that measures the error, in this case it turns out to be the cross entropy. Concretely, when the correct answer t is given for x1 and x2, the cross entropy is expressed as follows.

E(w_1,w_2,w_0) = - (t \ln y + (1-t) \ln (1-y) )

For details, see "Pattern Recognition and Machine Learning" (around p. 235), but as a rough picture, keeping in mind that 0 < y < 1:

- Consider y as the probability of class A (when t = 1)
- Consider 1 − y as the probability of class B (when t = 0)

Then, using the fact that X^0 = 1, the probability p(t) of observing a given t can be written as a single expression:

p(t)= y^t  (1-y)^{1-t}

For t = 1 this reduces to y, and for t = 0 it reduces to 1 − y, as intended. Taking the log of this then yields the formula for E(w_1, w_2, w_0). With multiple samples the probabilities multiply, so in terms of the per-sample outputs y_i:

p(t)= \prod_i y_i^{t_i}  (1-y_i)^{1-t_i}

Taking the log of this gives

\ln p(t)= \sum_i (t_i \ln y_i + (1-t_i) \ln (1-y_i) )

This is essentially the sum of the log-likelihoods of the correct answers, so the better the predictions match, the larger the value. To treat this as an optimization (minimization) problem, E is defined with the sign flipped. So, once more, the cross entropy (multi-sample version) is:

E(w_1,w_2,w_0) = - \sum_i (t_i \ln y_i + (1-t_i) \ln (1-y_i) )
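As a sanity check, this formula can be transcribed directly into NumPy (a sketch only; the small eps that guards against log(0) is my addition, not part of the article's code):

import numpy as np

def cross_entropy(y, t, eps=1e-12):
    # E = -sum( t*ln(y) + (1-t)*ln(1-y) ), clipped to avoid log(0)
    y = np.clip(y, eps, 1.0 - eps)
    return -np.sum(t * np.log(y) + (1.0 - t) * np.log(1.0 - y))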

Now, let's actually write the source code. First, the variable part.

# make placeholder
x_ph = tf.placeholder(tf.float32, [None, 3])
t_ph = tf.placeholder(tf.float32, [None, 1])

x_ph is the input part. Strictly it should have only two components, but I added a dummy variable (fixed to 1) for w0. Maybe this isn't needed? t_ph is the output part, used to give the correct answer; it contains 0 or 1.
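For what it's worth, the dense layers used later in this article (tf.layers.dense) add their own bias term by default (use_bias=True), so a version without the dummy column would presumably also work. A sketch, not what the rest of the article uses:

# hypothetical alternative: rely on the dense layer's built-in bias instead of a dummy column
x_ph_no_dummy = tf.placeholder(tf.float32, [None, 2])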

Then, generate the sample data to feed into these placeholders using random numbers.

# data making
N = 1000
x = np.random.rand(N,2)

# sum >= 1.0 -> 1 : else -> 0
t = np.floor(np.sum(x,axis=1))

# ext x
x = np.hstack([x,np.ones(N).reshape(N,1)])

After generating the two components of x, we create t, and then append one dummy column, so each sample of x ends up three-dimensional. t is created by truncating the decimal part of x1 + x2, which gives 1 when the sum is at least 1 and 0 when it is below 1.
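A quick sanity check one could add at this point (my addition, not in the original script) to confirm the floor trick matches the classification rule:

# the floor-based labels should agree with a direct comparison against 1.0
assert np.array_equal(t, (x[:, 0] + x[:, 1] >= 1.0).astype(float))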

Next, let's prepare a neural network with two layers for now.

# create neural network (depth=2, input:3 -> hidden:30 -> output:1)
hidden1 = tf.layers.dense(x_ph, 30, activation=tf.nn.relu)
newral_out = tf.layers.dense(hidden1, 1, activation=tf.nn.sigmoid)

Now let's define the cross entropy and the training step that minimizes it. This part is essentially copied from examples; I'm still not sure how to choose the optimization algorithm and its settings.

# Minimize the cross entropy
ce = -tf.reduce_sum(t_ph * tf.log(newral_out) + (1-t_ph)*tf.log(1-newral_out) )
optimizer = tf.train.AdamOptimizer()
train = optimizer.minimize(ce)
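As an aside, taking tf.log of a sigmoid output can blow up to NaN if the output saturates at exactly 0 or 1. A numerically safer sketch (my variation, not what this article runs) would keep the last layer linear and use TensorFlow's built-in helper:

# last layer outputs raw logits; the sigmoid is applied inside the loss helper
logits = tf.layers.dense(hidden1, 1, activation=None)
ce_stable = tf.reduce_sum(
    tf.nn.sigmoid_cross_entropy_with_logits(labels=t_ph, logits=logits))
train_stable = tf.train.AdamOptimizer().minimize(ce_stable)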

So, I put the whole source together by combining the pieces above.

import numpy as np
import tensorflow as tf

# data making
N = 1000
x = np.random.rand(N,2)

# sum >= 1.0 -> 1 : else -> 0
t = np.floor(np.sum(x,axis=1))

# ext x
x = np.hstack([x,np.ones(N).reshape(N,1)])

train_x = x
train_t = t

# make placeholder
x_ph = tf.placeholder(tf.float32, [None, 3])
t_ph = tf.placeholder(tf.float32, [None, 1])
# create neural network (depth=2, input:3 -> hidden:30 -> output:1)
hidden1 = tf.layers.dense(x_ph, 30, activation=tf.nn.relu)
newral_out = tf.layers.dense(hidden1, 1, activation=tf.nn.sigmoid)

# Minimize the cross entropy
ce = -tf.reduce_sum(t_ph * tf.log(newral_out) + (1-t_ph)*tf.log(1-newral_out) )
optimizer = tf.train.AdamOptimizer()
train = optimizer.minimize(ce)


# initialize tensorflow session
sess = tf.Session()
sess.run(tf.global_variables_initializer())

for k in range(1001):

    if np.mod(k,100) == 0:
        # get Newral predict data
        y_newral = sess.run( newral_out
                         ,feed_dict = {
                         x_ph: x, #I put the input data in x
                         })
        
        ce_newral = sess.run( ce
                         ,feed_dict = {
                         x_ph: x, #I put the input data in x
                         t_ph: t.reshape(len(t),1) # feed the correct answer data into t_ph
                         })
        
        sign_newral = np.sign(np.array(y_newral).reshape([len(t),1]) - 0.5)
        sign_orig = np.sign(np.array(t.reshape([len(t),1])) - 0.5)
        NGCNT = np.sum(np.abs(sign_newral-sign_orig))/2
        # check predict NewralParam
        print('[%d] loss %.2f hit_per:%.2f' % (k,ce_newral,(N-NGCNT)/N))


    # shuffle train_x and train_t
    n = np.random.permutation(len(train_x))
    train_x = train_x[n]
    train_t = train_t[n].reshape([len(train_t), 1])

    # execute train process
    sess.run(train,feed_dict = {
                     x_ph: train_x, # x is input data
                     t_ph: train_t # t is true data
                     })


#For test
x = np.array([0.41,0.5,1]).reshape([1,3])
loss_newral = sess.run( newral_out
                 ,feed_dict = {
                 x_ph: x, #I put the input data in x
                 })
# is it a success if this is < 0.5?
print(loss_newral)

Running this gives the following output.

[0] loss 727.36 hit_per:0.35
[100] loss 587.68 hit_per:0.78
[200] loss 465.78 hit_per:0.89
[300] loss 358.70 hit_per:0.93
[400] loss 282.45 hit_per:0.94
[500] loss 230.54 hit_per:0.96
[600] loss 194.34 hit_per:0.97
[700] loss 168.11 hit_per:0.98
[800] loss 148.34 hit_per:0.98
[900] loss 132.93 hit_per:0.99
[1000] loss 120.56 hit_per:0.99
[[0.27204064]]

Where it says loss, you can see the cross-entropy value gradually decreasing. (I should rename that variable...) hit_per is the accuracy on the training data; 1.00 would mean 100%, and here it reaches about 99%. Finally, the output of the network is shown for the test input x1 = 0.41, x2 = 0.5 from the code above. In this case it should be class B (t = 0), so an output below 0.5 is correct, and the closer the output is to 0.5, the less confident the answer. This time it is about 0.27, so the network answers it fairly confidently.
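To turn that probability into a hard class decision, one extra line on top of the test code would suffice (my addition; it just applies the 0.5 threshold described above):

# threshold the sigmoid output at 0.5: 0 -> class t=0, 1 -> class t=1
predicted_class = int(loss_newral[0, 0] >= 0.5)
print(predicted_class)  # prints 0 for this test point, since 0.27 < 0.5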

Finally, a few words about how the accuracy is obtained. The point is that we count the number of misclassifications (NG). The relevant excerpt is below.

        # get Newral predict data
        y_newral = sess.run( newral_out
                         ,feed_dict = {
                         x_ph: x, #I put the input data in x
                         })
        
        sign_newral = np.sign(np.array(y_newral).reshape([len(t),1]) - 0.5)
        sign_orig = np.sign(np.array(t.reshape([len(t),1])) - 0.5)
        NGCNT = np.sum(np.abs(sign_newral-sign_orig))/2

y_newral holds the classification estimated from x, as a value in the range [0, 1]. What matters is whether it falls above or below 0.5, so it has to be converted into something that can be compared directly. So I subtract 0.5, shifting the value into the range [-0.5, 0.5], and then take only the sign, so the result is either +1 or -1. The same processing is applied to t (the correct answer) to produce correct values of +1/-1. The two are correct when they agree and incorrect when they differ; writing out the patterns gives the following relationship.

| Estimated value | Correct answer value | Judgment | Estimated value − Correct answer value |
|---|---|---|---|
| 1 | 1 | OK | 0 |
| 1 | -1 | NG | 2 |
| -1 | 1 | NG | -2 |
| -1 | -1 | OK | 0 |

Therefore, the number of NGs can be counted as sum(abs(estimated value − correct answer value)) / 2, and from that the accuracy (the HIT rate) can be computed.
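As a tiny worked example of that count (illustrative values only, not taken from the training run):

import numpy as np

est = np.array([ 1,  1, -1, -1])   # sign(y - 0.5) for four samples
ans = np.array([ 1, -1,  1, -1])   # sign(t - 0.5) for the same samples
ng  = np.sum(np.abs(est - ans)) / 2
print(ng)            # 2.0 -> two of the four samples are misclassified
print((4 - ng) / 4)  # 0.5 -> the corresponding hit rate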

That worked nicely, but when I made the network simpler and dropped the hidden layer, it took a long time to converge, and the accuracy ended up noticeably worse. For example, with the following setting ...

newral_out = tf.layers.dense(x_ph, 1, activation=tf.nn.sigmoid)

The result:

[0] loss 761.80 hit_per:0.50
[100] loss 732.66 hit_per:0.50
[200] loss 706.48 hit_per:0.50
[300] loss 682.59 hit_per:0.50
[400] loss 660.61 hit_per:0.50
[500] loss 640.24 hit_per:0.54
[600] loss 621.26 hit_per:0.62
[700] loss 603.52 hit_per:0.70
[800] loss 586.88 hit_per:0.76
[900] loss 571.22 hit_per:0.80
[1000] loss 556.44 hit_per:0.84
[[0.52383685]]

Clearly this is not good enough. I still don't fully understand what makes the difference, but it seems that things work out because there is a hidden layer in between. How large should the hidden layer be for the model to behave well? I would like to study the theory around that as well.
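As a starting point for that experiment, one could parameterize the hidden width and sweep over it. A sketch (hypothetical helper, not something this article ran; no results implied):

def build_model(x_ph, hidden_units):
    # hidden_units = 0 means "no hidden layer", i.e. the simple model above
    h = x_ph
    if hidden_units > 0:
        h = tf.layers.dense(x_ph, hidden_units, activation=tf.nn.relu)
    return tf.layers.dense(h, 1, activation=tf.nn.sigmoid)

# to compare widths (e.g. 0, 1, 5, 30), rebuild the graph for each setting,
# calling tf.reset_default_graph() and recreating the placeholders each time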
