I tried running TensorFlow, the machine learning library published by Google. It is used from Python, and I installed it into my Anaconda environment with the pip command, referring to the following site.
Installing TensorFlow on Windows was easy even for Python beginners
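For reference, installation and version pinning are a single pip command; the version number below is only a placeholder, not necessarily the one I actually used:

# placeholder version number; pick a release that works in your environment
pip install tensorflow==1.5.0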
It didn't work at first, but once I pinned TensorFlow to a slightly older version like that, it worked. So let's run the tutorial code right away: the sample from the Get Started guide, a simple linear regression. I referred to the following site.
Probably the most straightforward introduction to TensorFlow (Introduction)
y=0.1x+0.3
The task is to sample about 100 points on this line and estimate the equation's parameters, 0.1 and 0.3, from the data.
import tensorflow as tf
import numpy as np
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but Tensorflow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# Fit the line.
for step in range(201):
    if step % 20 == 0:
        print((step, sess.run(W), sess.run(b)))
    sess.run(train)
# Learns best fit is W: [0.1], b: [0.3]
It's a very good sample for learning how to use TensorFlow, but the API is a black box, so it is hard to tell what it is actually doing. After thinking it over, my reading is this: the parameters w and b are given suitable initial values, and the steepest descent (gradient descent) method is then used to converge on the minimum of the least-squares cost function.
[Steepest descent method (Wikipedia)](https://ja.wikipedia.org/wiki/%E6%9C%80%E6%80%A5%E9%99%8D%E4%B8%8B%E6%B3%95)
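Incidentally, if you just want to peek inside the black box, the TF1 Optimizer API can also show the gradients it computes before applying them. A small sketch along those lines (my own addition, reusing the loss and optimizer from the sample above):

# assumes the loss and optimizer defined in the sample above
# compute_gradients returns a list of (gradient, variable) pairs
grads_and_vars = optimizer.compute_gradients(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for grad, var in grads_and_vars:
        print(var.name, sess.run(grad))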
The algorithm itself is nothing special: you repeatedly update the parameters, using the first-order partial derivatives of the cost function as the update amount. Concretely, suppose the sample is defined as follows. (In this example, N = 100.)
\left\{ \left( x_n,y_n \right) \right\}_{n=1}^N
The relationship between x_n and y_n is as follows. (In this example the true values are w = 0.1 and b = 0.3.)
y_n=wx_n+b
Since the cost function is the sum of squared residuals, it becomes the following. Here w and b are the parameters to be estimated, starting from some initial values.
L(w,b)=\sum_{n=1}^N \left(y_n - (wx_n+b) \right)^2
Of course, when w and b take the true values, we get
L(w,b)=0
Therefore, you should search for w and b that minimize L.
In the steepest descent method, the parameters are updated using the first-order partial derivatives, so let's compute each of them.
\frac{\partial}{\partial w}L(w,b)=-2\sum_{n=1}^N \left( y_n - (wx_n+b)\right)x_n
\frac{\partial}{\partial b}L(w,b)=-2\sum_{n=1}^N \left( y_n - (wx_n+b)\right)
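As a quick sanity check on these derivatives (my own addition, not part of the original tutorial), the analytic gradients can be compared against central finite differences of the same cost function:

import numpy as np

# Analytic gradients of L(w,b) = sum((y - (w*x + b))**2)
def analytic_grads(w, b, x, y):
    r = y - (w * x + b)
    return -2 * np.dot(r, x), -2 * np.sum(r)

# Central finite differences of the same cost, for comparison
def numerical_grads(w, b, x, y, eps=1e-6):
    L = lambda w_, b_: np.sum((y - (w_ * x + b_)) ** 2)
    return ((L(w + eps, b) - L(w - eps, b)) / (2 * eps),
            (L(w, b + eps) - L(w, b - eps)) / (2 * eps))

x = np.random.rand(100)
y = 0.1 * x + 0.3
print(analytic_grads(0.5, 0.0, x, y))
print(numerical_grads(0.5, 0.0, x, y))  # should match the line above closely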
Using these derivatives, the parameter values w^{(k)} and b^{(k)} at step k are updated as follows.
\left(
\begin{matrix}
w^{(k+1)} \\
b^{(k+1)}
\end{matrix}
\right)
=
\left(
\begin{matrix}
w^{(k)} \\
b^{(k)}
\end{matrix}
\right)
- \alpha
\left(
\begin{matrix}
\frac{\partial L}{\partial w} \\
\frac{\partial L}{\partial b}
\end{matrix}
\right)
\\
=
\left(
\begin{matrix}
w^{(k)} \\
b^{(k)}
\end{matrix}
\right)
+ 2\alpha
\left(
\begin{matrix}
\sum (y_n - (wx_n+b))x_n \\
\sum (y_n - (wx_n+b))
\end{matrix}
\right)
I have to give the coefficient α top-down, without derivation, as follows. It is determined by how the learning rate passed to TensorFlow's optimizer is applied to this problem.
\alpha = \frac{1}{N} \beta
β ... is there a proper name for it? It is the learning rate passed to GradientDescentOptimizer, the parameter you set up front to control convergence. In this sample, β = 0.5.
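The 1/N factor appears because the sample code minimizes the mean of the squared errors (reduce_mean), while the L above is their sum; the mean's gradient is the sum's gradient divided by N, so a step of size β on the mean loss equals a step of size β/N on the summed loss. A tiny NumPy check of that relationship (my own, not from the original article):

import numpy as np

x = np.random.rand(100)
y = 0.1 * x + 0.3
N = len(x)
w, b, beta = 0.5, 0.0, 0.5

residual = y - (w * x + b)
grad_sum_w = -2 * np.dot(residual, x)   # d/dw of the summed squared error L above
grad_mean_w = grad_sum_w / N            # d/dw of the mean squared error used by reduce_mean
# one step of size beta on the mean loss == one step of size beta/N on the summed loss
print(w - beta * grad_mean_w)
print(w - (beta / N) * grad_sum_w)      # same value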
So let's write our own class and verify this.
Something like the following:
import numpy as np

class calcWB:
    def __init__(self, x, y, w, b):
        self.x = x
        self.y = y
        self.w = w
        self.b = b
        # get length of sample data
        self.N = len(x)

    def run(self, beta):
        # calculate current residual
        residual = self.y - (self.w*self.x + self.b)
        # calc dL/dw
        dw = -2*np.dot(residual, self.x)
        # calc dL/db
        db = -2*sum(residual)
        # calc alpha
        alpha = beta/self.N
        # update param (w, b)
        self.w = self.w - alpha*dw
        self.b = self.b - alpha*db
        return self.w, self.b
There are only two methods, one for initialization and one for learning. Before plugging it into the TensorFlow script, let's sanity-check the class on its own (see the sketch just below), and then use it to modify the first sample.
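A minimal standalone check with pure NumPy data (my own addition, not part of the original article; the names x_check and y_check are arbitrary):

import numpy as np

x_check = np.random.rand(100).astype(np.float32)
y_check = x_check * 0.1 + 0.3
obj = calcWB(x_check, y_check, np.random.rand() - .5, np.random.rand() - .5)
for step in range(201):
    w, b = obj.run(0.5)
print(w, b)  # should end up close to 0.1 and 0.3

With that confirmed, here is the first sample modified to run TensorFlow and calcWB side by side: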
# setting param init data
w_init = np.random.rand()-.5
b_init = np.random.rand()-.5
# GradientDescentOptimizer parameter
beta = 0.5
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
#W = tf.Variable(tf.random_uniform([1], -10, 10))
W = tf.Variable(w_init)
#b = tf.Variable(tf.zeros([1]))
b = tf.Variable(b_init)
y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(beta)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.global_variables_initializer()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# create calcWB object
objCalcWB = calcWB(x_data,y_data,w_init,b_init)
# Fit the line.
for step in range(201):
    sess.run(train)
    w_tmp, b_tmp = objCalcWB.run(beta)
    if step % 20 == 0:
        #print(step, sess.run(W), sess.run(b))
        print('[from TensorFlow] k=%d w=%.10f b=%.10f' % (step, sess.run(W), sess.run(b)))
        print('[from calcWB] k=%d w=%.10f b=%.10f' % (step, w_tmp, b_tmp))
# Learns best fit is W: [0.1], b: [0.3]
Looking at the execution result ...
[from TensorFlow] k=0 w=0.4332985282 b=0.2284004837
[from calcWB] k=0 w=0.4332985584 b=0.2284004998
[from TensorFlow] k=20 w=0.1567724198 b=0.2680215836
[from calcWB] k=20 w=0.1567724287 b=0.2680215712
[from TensorFlow] k=40 w=0.1113634855 b=0.2935992479
[from calcWB] k=40 w=0.1113634986 b=0.2935992433
[from TensorFlow] k=60 w=0.1022744998 b=0.2987188399
[from calcWB] k=60 w=0.1022745020 b=0.2987188350
[from TensorFlow] k=80 w=0.1004552618 b=0.2997435629
[from calcWB] k=80 w=0.1004552578 b=0.2997435619
[from TensorFlow] k=100 w=0.1000911444 b=0.2999486625
[from calcWB] k=100 w=0.1000911188 b=0.2999486686
[from TensorFlow] k=120 w=0.1000182480 b=0.2999897301
[from calcWB] k=120 w=0.1000182499 b=0.2999897517
[from TensorFlow] k=140 w=0.1000036523 b=0.2999979556
[from calcWB] k=140 w=0.1000036551 b=0.2999979575
[from TensorFlow] k=160 w=0.1000007242 b=0.2999995947
[from calcWB] k=160 w=0.1000007308 b=0.2999995937
[from TensorFlow] k=180 w=0.1000001431 b=0.2999999225
[from calcWB] k=180 w=0.1000001444 b=0.2999999224
[from TensorFlow] k=200 w=0.1000000909 b=0.2999999523
[from calcWB] k=200 w=0.1000000255 b=0.2999999832
The two results agree to about 7 digits after the decimal point, so this interpretation seems to be correct.
Well, now I think I understand a little better what TensorFlow's GradientDescentOptimizer is doing.