I tried running TensorFlow, the machine learning library published by Google. It is used from Python, and I installed it into my Anaconda environment with the pip command, referring to the following site.
Installing TensorFlow on Windows was easy even for Python beginners
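For reference, installation and version pinning are a single pip command; the version number below is only a placeholder, not necessarily the one I actually used:

# placeholder version number; pick a release that works in your environment
pip install tensorflow==1.5.0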
It didn't work at first, but once I pinned TensorFlow to a slightly older version like that, it worked. So let's run the tutorial code right away: the sample from the Get Started guide, a simple linear regression. I referred to the following site.
Probably the most straightforward introduction to TensorFlow (Introduction)
y=0.1x+0.3
The task is to sample about 100 points on this line and estimate the equation's parameters, 0.1 and 0.3, from the data.
import tensorflow as tf
import numpy as np
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but Tensorflow will
# figure that out for us.)
W = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
b = tf.Variable(tf.zeros([1]))
y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.initialize_all_variables()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# Fit the line.
for step in range(201):
    if step % 20 == 0:
        print((step, sess.run(W), sess.run(b)))
    sess.run(train)
# Learns best fit is W: [0.1], b: [0.3]
It's a very good sample for learning how to use TensorFlow, but the API is a black box, so it is hard to tell what it is actually doing. After thinking it over, my reading is this: the parameters w and b are given suitable initial values, and the steepest descent (gradient descent) method is then used to converge on the minimum of the least-squares cost function.
[Steepest descent method (Wikipedia)](https://ja.wikipedia.org/wiki/%E6%9C%80%E6%80%A5%E9%99%8D%E4%B8%8B%E6%B3%95)
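Incidentally, if you just want to peek inside the black box, the TF1 Optimizer API can also show the gradients it computes before applying them. A small sketch along those lines (my own addition, reusing the loss and optimizer from the sample above):

# assumes the loss and optimizer defined in the sample above
# compute_gradients returns a list of (gradient, variable) pairs
grads_and_vars = optimizer.compute_gradients(loss)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    for grad, var in grads_and_vars:
        print(var.name, sess.run(grad))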
The algorithm itself is nothing special: you repeatedly update the parameters, using the first-order partial derivatives of the cost function as the update amount. Concretely, suppose the sample is defined as follows. (In this example, N = 100.)
\left\{ \left( x_n,y_n \right) \right\}_{n=1}^N
The relationship between x_n and y_n is as follows. (In this example the true values are w = 0.1 and b = 0.3.)
y_n=wx_n+b
Since the cost function is the sum of squared residuals, it becomes the following. Here w and b are the parameters to be estimated, starting from some initial values.
L(w,b)=\sum_{n=1}^N \left(y_n - (wx_n+b) \right)^2
Of course, when w and b take the true values, we get
L(w,b)=0
Therefore, you should search for w and b that minimize L.
In the steepest descent method, the parameters are updated using the first-order partial derivatives, so let's compute each of them.
\frac{\partial}{\partial w}L(w,b)=-2\sum_{n=1}^N \left( y_n - (wx_n+b)\right)x_n
\frac{\partial}{\partial b}L(w,b)=-2\sum_{n=1}^N \left( y_n - (wx_n+b)\right)
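As a quick sanity check on these derivatives (my own addition, not part of the original tutorial), the analytic gradients can be compared against central finite differences of the same cost function:

import numpy as np

# Analytic gradients of L(w,b) = sum((y - (w*x + b))**2)
def analytic_grads(w, b, x, y):
    r = y - (w * x + b)
    return -2 * np.dot(r, x), -2 * np.sum(r)

# Central finite differences of the same cost, for comparison
def numerical_grads(w, b, x, y, eps=1e-6):
    L = lambda w_, b_: np.sum((y - (w_ * x + b_)) ** 2)
    return ((L(w + eps, b) - L(w - eps, b)) / (2 * eps),
            (L(w, b + eps) - L(w, b - eps)) / (2 * eps))

x = np.random.rand(100)
y = 0.1 * x + 0.3
print(analytic_grads(0.5, 0.0, x, y))
print(numerical_grads(0.5, 0.0, x, y))  # should match the line above closely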
Using these derivatives, the parameter values w^{(k)} and b^{(k)} at step k are updated as follows.
\left(
\begin{matrix}
w^{(k+1)} \\
b^{(k+1)}
\end{matrix}
\right)
=
\left(
\begin{matrix}
w^{(k)} \\
b^{(k)}
\end{matrix}
\right)
- \alpha
\left(
\begin{matrix}
\frac{\partial L}{\partial w} \\
\frac{\partial L}{\partial b}
\end{matrix}
\right)
\\
=
\left(
\begin{matrix}
w^{(k)} \\
b^{(k)}
\end{matrix}
\right)
+ 2\alpha
\left(
\begin{matrix}
\sum (y_n - (wx_n+b))x_n \\
\sum (y_n - (wx_n+b))
\end{matrix}
\right)
I have to give the coefficient α top-down, without derivation, as follows. It is determined by how the learning rate passed to TensorFlow's optimizer is applied to this problem.
\alpha = \frac{1}{N} \beta
β ... is there a proper name for it? It is the learning rate passed to GradientDescentOptimizer, the parameter you set up front to control convergence. In this sample, β = 0.5.
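The 1/N factor appears because the sample code minimizes the mean of the squared errors (reduce_mean), while the L above is their sum; the mean's gradient is the sum's gradient divided by N, so a step of size β on the mean loss equals a step of size β/N on the summed loss. A tiny NumPy check of that relationship (my own, not from the original article):

import numpy as np

x = np.random.rand(100)
y = 0.1 * x + 0.3
N = len(x)
w, b, beta = 0.5, 0.0, 0.5

residual = y - (w * x + b)
grad_sum_w = -2 * np.dot(residual, x)   # d/dw of the summed squared error L above
grad_mean_w = grad_sum_w / N            # d/dw of the mean squared error used by reduce_mean
# one step of size beta on the mean loss == one step of size beta/N on the summed loss
print(w - beta * grad_mean_w)
print(w - (beta / N) * grad_sum_w)      # same value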
So let's write our own class and verify this.
Something like the following:
import numpy as np

class calcWB:
    def __init__(self, x, y, w, b):
        self.x = x
        self.y = y
        self.w = w
        self.b = b
        # get length of sample data
        self.N = len(x)

    def run(self, beta):
        # calculate current residual
        residual = self.y - (self.w*self.x + self.b)
        # calc dL/dw
        dw = -2*np.dot(residual, self.x)
        # calc dL/db
        db = -2*sum(residual)
        # calc alpha
        alpha = beta/self.N
        # update param (w, b)
        self.w = self.w - alpha*dw
        self.b = self.b - alpha*db
        return self.w, self.b
There are only two methods, one for initialization and one for learning. Before plugging it into the TensorFlow script, let's sanity-check the class on its own (see the sketch just below), and then use it to modify the first sample.
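A minimal standalone check with pure NumPy data (my own addition, not part of the original article; the names x_check and y_check are arbitrary):

import numpy as np

x_check = np.random.rand(100).astype(np.float32)
y_check = x_check * 0.1 + 0.3
obj = calcWB(x_check, y_check, np.random.rand() - .5, np.random.rand() - .5)
for step in range(201):
    w, b = obj.run(0.5)
print(w, b)  # should end up close to 0.1 and 0.3

With that confirmed, here is the first sample modified to run TensorFlow and calcWB side by side: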
# setting param init data
w_init = np.random.rand()-.5
b_init = np.random.rand()-.5
# GradientDescentOptimizer parameter
beta = 0.5
# Create 100 phony x, y data points in NumPy, y = x * 0.1 + 0.3
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data * 0.1 + 0.3
# Try to find values for W and b that compute y_data = W * x_data + b
# (We know that W should be 0.1 and b 0.3, but TensorFlow will
# figure that out for us.)
#W = tf.Variable(tf.random_uniform([1], -10, 10))
W = tf.Variable(w_init)
#b = tf.Variable(tf.zeros([1]))
b = tf.Variable(b_init)
y = W * x_data + b
# Minimize the mean squared errors.
loss = tf.reduce_mean(tf.square(y - y_data))
optimizer = tf.train.GradientDescentOptimizer(beta)
train = optimizer.minimize(loss)
# Before starting, initialize the variables. We will 'run' this first.
init = tf.global_variables_initializer()
# Launch the graph.
sess = tf.Session()
sess.run(init)
# create calcWB object
objCalcWB = calcWB(x_data,y_data,w_init,b_init)
# Fit the line.
for step in range(201):
    sess.run(train)
    w_tmp, b_tmp = objCalcWB.run(beta)
    if step % 20 == 0:
        #print(step, sess.run(W), sess.run(b))
        print('[from TensorFlow] k=%d w=%.10f b=%.10f' % (step, sess.run(W), sess.run(b)))
        print('[from calcWB] k=%d w=%.10f b=%.10f' % (step, w_tmp, b_tmp))
# Learns best fit is W: [0.1], b: [0.3]
Looking at the execution result ...
[from TensorFlow] k=0 w=0.4332985282 b=0.2284004837
[from calcWB] k=0 w=0.4332985584 b=0.2284004998
[from TensorFlow] k=20 w=0.1567724198 b=0.2680215836
[from calcWB] k=20 w=0.1567724287 b=0.2680215712
[from TensorFlow] k=40 w=0.1113634855 b=0.2935992479
[from calcWB] k=40 w=0.1113634986 b=0.2935992433
[from TensorFlow] k=60 w=0.1022744998 b=0.2987188399
[from calcWB] k=60 w=0.1022745020 b=0.2987188350
[from TensorFlow] k=80 w=0.1004552618 b=0.2997435629
[from calcWB] k=80 w=0.1004552578 b=0.2997435619
[from TensorFlow] k=100 w=0.1000911444 b=0.2999486625
[from calcWB] k=100 w=0.1000911188 b=0.2999486686
[from TensorFlow] k=120 w=0.1000182480 b=0.2999897301
[from calcWB] k=120 w=0.1000182499 b=0.2999897517
[from TensorFlow] k=140 w=0.1000036523 b=0.2999979556
[from calcWB] k=140 w=0.1000036551 b=0.2999979575
[from TensorFlow] k=160 w=0.1000007242 b=0.2999995947
[from calcWB] k=160 w=0.1000007308 b=0.2999995937
[from TensorFlow] k=180 w=0.1000001431 b=0.2999999225
[from calcWB] k=180 w=0.1000001444 b=0.2999999224
[from TensorFlow] k=200 w=0.1000000909 b=0.2999999523
[from calcWB] k=200 w=0.1000000255 b=0.2999999832
The two results agree to about 7 digits after the decimal point, so this interpretation seems to be correct.
Well, now I think I understand a little better what TensorFlow's GradientDescentOptimizer is doing.