I tried TensorFlow for the first time just the other day, and this time I spent a while playing with a neural network. At first I went for ambitious experiments and tried all sorts of things, but nothing worked at all, so in the spirit of valuing the basics I'm going to write up something super, super rudimentary.
What I played with was the tensorflow.layers.dense part. Apparently it corresponds to one "layer" of a neural network. Searching around, I found example code like this:
hidden1 = tf.layers.dense(x_ph, 32, activation=tf.nn.relu)
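For example, stacking two such layers already gives a small two-layer network. This is only a sketch for illustration and is not the model used in the rest of this post; x_ph stands for an input placeholder like the one defined further down.
# sketch only: a hidden layer with 32 ReLU units feeding a single linear output
hidden1 = tf.layers.dense(x_ph, 32, activation=tf.nn.relu)
output = tf.layers.dense(hidden1, 1)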
If you stack layers like that, it seems you can build quite complicated models. This time, however, I'll consider only the following simple model: two inputs and one output.
It barely counts as a neural network any more, but... I figure I have to understand this first, so that's where I'll start.
First, let's create one layer of a neural network.
newral_out = tf.layers.dense(x_ph, 1)
With this, the input comes from x_ph and the number of outputs is one. The options let you do all sorts of things, but this time I'll keep it to what amounts to a plain linear combination. x_ph is a placeholder, the box the input data goes into, and this time it is defined as follows:
x_ph = tf.placeholder(tf.float32, [None, 2])
The shape [None, 2] means: None lets you feed in any number of samples, and 2 is the number of input variables, which here is two (x1, x2).
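Just to convince myself that None really does mean "any number of samples", a tiny throwaway check like the following works. This is separate from the model built in this post, and the numbers are arbitrary.
import numpy as np
import tensorflow as tf

x_ph = tf.placeholder(tf.float32, [None, 2])
doubled = 2.0 * x_ph  # any operation on the placeholder will do

with tf.Session() as sess:
    # 3 samples of 2 variables each
    print(sess.run(doubled, feed_dict={x_ph: np.ones((3, 2))}))
    # 5 samples also works, because the first dimension is None
    print(sess.run(doubled, feed_dict={x_ph: np.ones((5, 2))}))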
This time, really simply, the output is meant to match y_1 given by the following formula.
y_1 = w_1 x_1 + w_2 x_2 + w_0
Given suitable pairs of (x_1, x_2) and y_1, can we estimate w_1, w_2, w_0 well? That's all; it's a really simple problem.
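As a concrete example of such data (the same scheme the full script below uses), pick some true parameters and generate samples. The values w_1 = w_2 = 0.5, w_0 = 0 are simply the ones I happen to use later.
import numpy as np

w1, w2, w0 = 0.5, 0.5, 0.0                # assumed true parameters
N = 50
x = np.random.rand(N, 2)                  # N samples of (x1, x2)
y = w1 * x[:, 0] + w2 * x[:, 1] + w0      # y1 for each sample
y = y.reshape(N, 1)                       # match the [None, 1] output shape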
Actually, at this point the code from the first sample can be reused almost as is, so implementing it is easy; it looks like the following.
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(newral_out - y_ph))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
After that, we just have to let it train, like this:
# initialize tensorflow session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for k in range(101):
    # shuffle train_x and train_y
    n = np.random.permutation(len(train_x))
    train_x = train_x[n]
    train_y = train_y[n].reshape([len(train_y), 1])
    # execute train process
    sess.run(train, feed_dict={
        x_ph: train_x,  # x is input data
        y_ph: train_y   # y is true data
    })
This seems to work, but since we've come this far, I'd also like to see how the parameters of the neural network behave. Apparently they can be read straight out of the graph, but at first glance that looked hard, and since the problem is so simple I came up with my own way of analyzing it.
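For reference, the direct readout turns out to need only a few lines after all. A minimal sketch, assuming the single dense layer above is the only thing in the graph (so tf.trainable_variables() returns exactly its kernel and bias) and that sess is the session from the training code:
# sketch: read the dense layer's parameters straight out of the graph
for v in tf.trainable_variables():
    print(v.name)                         # e.g. 'dense/kernel:0', 'dense/bias:0'

kernel, bias = tf.trainable_variables()   # assumes only this one layer exists
w_direct, b_direct = sess.run([kernel, bias])
print(w_direct)   # shape (2, 1): weights on x1 and x2
print(b_direct)   # shape (1,):   bias term
That said, below I'll stick with the hand-rolled least-squares check.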
The idea is to find w_0, w_1, w_2 by the least-squares method: treat the trained network as a virtual linear model and recover its parameters. First, take the input data
{{x_1}_k},{{x_2}_k}
and write the values inferred through the network as
{y^{(new)}}_k = Neural({{x_1}_k},{{x_2}_k})
Each {y^{(new)}}_k is a concrete number. Since the network's estimate may include a bias, consider the following equation with a bias term added:
{y^{(new)}}_k = w_1{x_1}_k + w_2{x_2}_k + w_0
Here w_1, w_2, w_0 are the unknowns and everything else is known. Writing this out for every k as a system of equations gives:
\left(
\begin{matrix}
{y^{(new)}}_1 \\
\vdots \\
{y^{(new)}}_K \\
\end{matrix}
\right)
=
\left(
\begin{matrix}
{x_1}_1 & {x_2}_1 & 1 \\
\vdots & \vdots & \vdots \\
{x_1}_K & {x_2}_K & 1 \\
\end{matrix}
\right)
\left(
\begin{matrix}
w_1 \\
w_2 \\
w_0 \\
\end{matrix}
\right)
At this point it's just a standard least-squares problem. For simplicity, define
A
=
\left(
\begin{matrix}
{x_1}_1 & {x_2}_1 & 1 \\
\vdots & \vdots & \vdots \\
{x_1}_K & {x_2}_K & 1 \\
\end{matrix}
\right)
and then the least-squares solution is
\left(
\begin{matrix}
w_1 \\
w_2 \\
w_0 \\
\end{matrix}
\right)
=
\left(
A^T A
\right)^{-1}
A^T
\left(
\begin{matrix}
{y^{(new)}}_1 \\
\vdots \\
{y^{(new)}}_K \\
\end{matrix}
\right)
So the parameters can be recovered. If you generate the true data with specific values of w_1, w_2, w_0 and then watch how the recovered values approach the correct answer, you start to feel like you understand a little of what's inside the neural network. (laughs)
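In code, the estimate above is just a few lines of NumPy. A minimal sketch, assuming x holds the (N, 2) inputs and y_newral holds the network's (N, 1) predictions (these names match the full script below); np.linalg.lstsq is shown as a numerically friendlier alternative to inverting A^T A explicitly:
import numpy as np

A = np.hstack([x, np.ones((len(x), 1))])     # columns: x1, x2, 1

# normal equations, exactly as in the formula above
w_ext = np.linalg.inv(A.T @ A) @ A.T @ y_newral

# same estimate via a least-squares solver
w_lstsq, _, _, _ = np.linalg.lstsq(A, y_newral, rcond=None)

print(w_ext.ravel())    # [w1, w2, w0]
print(w_lstsq.ravel())  # should agree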
So, I'll paste the whole code.
import numpy as np
#import matplotlib.pyplot as plt
import tensorflow as tf
# make training data
N = 50
x = np.random.rand(N,2)
# true parameters
w = np.array([0.5,0.5]).reshape(2,1)
# (alternative target, not used here: y = floor(x1 + x2), i.e. 0 or 1)
#y = np.floor(np.sum(x,axis=1))
y = np.matmul(x,w)
train_x = x
train_y = y
# make placeholder
x_ph = tf.placeholder(tf.float32, [None, 2])
y_ph = tf.placeholder(tf.float32, [None, 1])
# create the network: a single dense layer (2 inputs -> 1 output)
newral_out = tf.layers.dense(x_ph, 1)
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(newral_out - y_ph))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# initialize tensorflow session
sess = tf.Session()
sess.run(tf.global_variables_initializer())
for k in range(101):
    if np.mod(k, 10) == 0:
        # get the network's current predictions for the input data
        y_newral = sess.run(newral_out,
                            feed_dict={
                                x_ph: x,                    # input data
                                y_ph: y.reshape(len(y), 1)  # true data (not used by this op)
                            })
        # estimate the network parameters (w1, w2, bias) by least squares
        x_ext = np.hstack([x, np.ones(N).reshape(N, 1)])
        A = np.linalg.inv(np.matmul(np.transpose(x_ext), x_ext))
        A = np.matmul(A, np.transpose(x_ext))
        w_ext = np.matmul(A, y_newral)
        # error check: [network prediction] vs [true value], as a sum of squares
        err = y_newral - y
        err = np.matmul(np.transpose(err), err)
        # print the error and the least-squares estimate of the parameters
        print('[%d] err:%.5f w1:%.2f w2:%.2f bias:%.2f' % (k, err, w_ext[0], w_ext[1], w_ext[2]))
    # shuffle train_x and train_y
    n = np.random.permutation(len(train_x))
    train_x = train_x[n]
    train_y = train_y[n].reshape([len(train_y), 1])
    # execute train process
    sess.run(train, feed_dict={
        x_ph: train_x,  # x is input data
        y_ph: train_y   # y is true data
    })
I didn't explain it above, but the error printed here is the sum of squared differences between the network's predictions and the true values. Looking at the result:
[0] err:1.06784 w1:0.36 w2:0.36 bias:0.00
[10] err:0.02231 w1:0.45 w2:0.45 bias:0.06
[20] err:0.00795 w1:0.47 w2:0.47 bias:0.03
[30] err:0.00283 w1:0.48 w2:0.48 bias:0.02
[40] err:0.00101 w1:0.49 w2:0.49 bias:0.01
[50] err:0.00036 w1:0.49 w2:0.49 bias:0.01
[60] err:0.00013 w1:0.50 w2:0.50 bias:0.00
[70] err:0.00005 w1:0.50 w2:0.50 bias:0.00
[80] err:0.00002 w1:0.50 w2:0.50 bias:0.00
[90] err:0.00001 w1:0.50 w2:0.50 bias:0.00
[100] err:0.00000 w1:0.50 w2:0.50 bias:0.00
Since the true parameters are w1 = 0.5, w2 = 0.5, bias (w0) = 0, it looks like the estimates converge nicely after about 30 iterations.
Neural networks seemed complicated, but stripped down this far, the network is literally just a linear combination, an equivalent circuit of a linear model. That may be obvious to experts, but for me it was a big takeaway.
In this spirit, I'll try something a little more complicated next time!