I spent two whole days over the weekend writing this. It follows roughly the order in which I learned things, so code comes up gradually as you read along. I wrote it for people who say, "I want to touch / try Tensorflow, but there's still a lot I don't understand!"
**Added on October 4, 2018:** This is a very old article, so links may well be broken and the official documentation has probably changed. The Tensorflow in this article was around ver 0.4–0.7, and now that it is around ver 2.0, much of what the text refers to may no longer apply.
Experts will no doubt object, but the point is this: why not think of it as a black box that performs regression analysis?
The word "regression" alone may put some people off, so put even more simply:
you have the machine compute a value that is as close as possible to the "value" you want to find. That's really all it is, isn't it?
e.g. I want to find a suitable function
e.g. I want to find suitable clusters
e.g. I want to find a proper "face" in a set of pixels
Put that way, there are plenty of things you would like to know! I imagine a lot of people feel that way.
"I want to capture only the frames (values) where the face looks great in an idol video!" Or, Qiita: Try to determine whether someone is big-breasted from a face photo by deep learning (it works, or it's iffy).
It seems there are always great pioneers who came before us.
And so: let's get started with Deep Learning!
**First of all, the amount of information**
At first I had no idea what Tensorflow was or what its functions were doing, and I thought about switching to Theano many times. But these days most questions have already been answered on Stackoverflow (in English) or discussed in GitHub issues, so Google's name power really is something. You can also find Tensorflow's own source code just by googling a function name, so your understanding of the library itself deepens as you use it.
**Before you start touching it,** I didn't even know what it could and couldn't do, so I read blogs by people running all sorts of experiments with deep learning frameworks.
In terms of how easy the documentation is to follow, my impression is Tensorflow > Theano > Chainer.
Other references:
- Tensorflow
  - kivantium activity diary: Identify the anime Yuruyuri's production company with TensorFlow
  - Sugyan Memo: Identify idols' faces by deep learning with TensorFlow
- Theano
  - A breakthrough on artificial intelligence: Implementation of a convolutional neural network with Theano (1)
  - StatsFragments: Deep Learning with Theano <3>: Convolutional Neural Network
- Chainer
  - Sekairabo: I made a bot that can answer naturally with LSTM
  - Oriental Robotics: Training an RNN to output literary text (aka DeepDazai)
  - Preferred Research: Robot control with distributed deep reinforcement learning
Even if you `print hoge_Tensor` outside of the learning process (i.e. outside a session run), its contents are not shown.
Values that change during learning, such as the "weights", are held in tf.Variable variables.
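As a minimal sketch (assuming the old graph-and-session API of the Tensorflow versions this article covers; `hoge_tensor` is just a made-up name), here is the difference between printing a Tensor object and actually evaluating it:

```python
import tensorflow as tf

# A made-up Tensor and Variable, just for illustration
hoge_tensor = tf.constant([[1.0, 2.0], [3.0, 4.0]])
w = tf.Variable(tf.zeros([2, 2]))  # learned parameters live in tf.Variable

print(hoge_tensor)  # only metadata: Tensor("Const:0", shape=(2, 2), dtype=float32)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # older API name; newer versions use tf.global_variables_initializer()
    print(sess.run(hoge_tensor))  # now the actual values: [[1. 2.] [3. 4.]]
    print(sess.run(w))            # [[0. 0.] [0. 0.]]
```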
**And a Tensor always has a Rank, a Shape, and a Type.**
These come up in error messages all the time, so things got a lot easier once I understood them.
Rank
t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] is Rank 2.
In short, it is the number of dimensions of the Tensor itself.
Rank | Mathematical entity | Python example |
---|---|---|
0 | Scalar (magnitude only) | s = 483 |
1 | Vector (magnitude and direction) | v = [1.1, 2.2, 3.3] |
2 | Matrix (an ordinary table) | m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] |
3 | 3-Tensor (three-dimensional) | t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]] |
n | n-Tensor (n-dimensional) | .... |
Shape
For the earlier t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
the Shape is 3 x 3, so [3, 3].
Rank | Shape | Dimension number | Example |
---|---|---|---|
0 | [] | 0-D | A 0-D tensor. A scalar. |
1 | [D0] | 1-D | A 1-D tensor with shape [5]. |
2 | [D0, D1] | 2-D | A 2-D tensor with shape [3, 4]. |
3 | [D0, D1, D2] | 3-D | A 3-D tensor with shape [1, 4, 3]. |
n | [D0, D1, ... Dn-1] | n-D | A tensor with shape [D0, D1, ... Dn-1]. |
Type
This is just int or float, so it needs little explanation.
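As a small sketch (again assuming the old graph API), you can check all three properties directly on a Tensor object:

```python
import tensorflow as tf

t = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print(t.get_shape())        # (3, 3)            -> Shape [3, 3]
print(t.get_shape().ndims)  # 2                 -> Rank 2
print(t.dtype)              # <dtype: 'int32'>  -> Type
```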
In the case of MNIST, we handle a Tensor of 55,000 images (images) and a Tensor of their answers (labels).
The images Tensor has Shape [55000, 784], Rank 2, dtype=tf.float32.
The labels Tensor has Shape [55000, 10], Rank 2, dtype=tf.float32.
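You can confirm these shapes yourself with the tutorial's data loader. A rough sketch (the exact import path of the `input_data` helper depends on your Tensorflow version; in older versions it was a standalone `input_data.py` shipped with the tutorial):

```python
# In later versions the helper lives here; older tutorials used a local input_data.py
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

print(mnist.train.images.shape)  # (55000, 784) -> source of the images Tensor
print(mnist.train.labels.shape)  # (55000, 10)  -> source of the labels Tensor
print(mnist.train.images.dtype)  # float32
```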
In the tutorial, these are first set up with tf.placeholder (it may be easier to understand if you think of it as reserving a Tensor).
input_Tensors
x = tf.placeholder(tf.float32, [None, 784]) #images
y_ = tf.placeholder(tf.float32, [None, 10]) #labels
# The batch size goes into the None dimension
Note that tf.placeholder() must be fed data via the feed_dict argument every time a learning step is executed.
In the tutorial, that learning loop appears near the end:
The training loop at the end of the code
for i in range(1000):
batch_xs, batch_ys = mnist.train.next_batch(100)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
So in practice, Tensors are processed 100 images at a time, with x: Shape [100, 784] and y_: Shape [100, 10].
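As a minimal sketch of what the None dimension buys you (hypothetical zero data, old graph API assumed), the same placeholder accepts any batch size at feed time:

```python
import numpy as np
import tensorflow as tf

x = tf.placeholder(tf.float32, [None, 784])
doubled = x * 2.0  # any op on the placeholder, just for illustration

with tf.Session() as sess:
    # The None dimension accepts whatever batch size you feed
    print(sess.run(doubled, feed_dict={x: np.zeros((100, 784), np.float32)}).shape)    # (100, 784)
    print(sess.run(doubled, feed_dict={x: np.zeros((55000, 784), np.float32)}).shape)  # (55000, 784)
```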
The image data is originally 28x28 pixels in grayscale (1 channel), but in the beginner tutorial it is flattened into a 784-dimensional vector to keep things simple (or rather, it comes already flattened).
28 x 28 x 1 = 784 dimensions
As you can see in the figure, it is like taking all the numbers lined up vertically and horizontally and laying them out in one long horizontal row.
00000000000000000000000000000000000000000000000000000000000000000000000000000000000000.6.7.7.50000000000.81111111.9.30000000.4.4.4.7111000000000000.1.10000000000000000000000000000000000000000000000000000000000
It seems to be "1" to those who can see it.
By the way, if you do not flatten the images, the Tensor is [55000, 28, 28, 1], i.e. Rank 4.
Even for color images only the channel count changes, to 3, giving [55000, 28, 28, 3], still Rank 4.
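A small NumPy sketch of this flattening (a dummy all-zero image, just to show the shapes):

```python
import numpy as np

# A dummy grayscale image: 28x28 pixels, 1 channel
img = np.zeros((28, 28, 1), dtype=np.float32)

flat = img.reshape(784)   # 28 * 28 * 1 = 784
print(flat.shape)         # (784,)

# And back again, e.g. for the convolution layers in the expert tutorial
restored = flat.reshape(28, 28, 1)
print(restored.shape)     # (28, 28, 1)
```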
Now that we understand Tensors, we can finally follow Tensorflow's machine learning process.
We have prepared the images Tensor x: [batch_num, 784], but how do we derive the correct answer out of 10 candidates from a 784-dimensional vector?
This is where matrix operations and **"weights", "bias", and Softmax regression** come in.
The matrix operation itself is simple.
If you multiply x: [batch_num, 784] by a [784, 10] matrix, you get a [batch_num, 10] matrix, which gives you 10 candidate answers.
Referring to the image on Wikipedia: A: [4, 2] times B: [2, 3] becomes [4, 3].
In Tensorflow
Matrix operation matmul
tf.matmul(A,B) # A is [4,2] and B is [2,3]. output would be [4,3]
'''
x: [batch_num, 784]
W: [784, 10]
matmul: [batch_num, 10]
'''
matmul = tf.matmul(x,W)
What plays the role of B: [2, 3] here is, in the case of MNIST, W: [784, 10], the all-important **weight**.
So we need the weight W: [784, 10]. In the code, that part is:
Weight W
W = tf.Variable(tf.zeros([784, 10]))
tf.Variable() is an in-memory buffer: a variable holding a Tensor that keeps the parameters you want to learn.
tf.zeros() creates a Tensor filled entirely with 0.
Filling it with 0 is only a starting point, since the values are updated continuously during learning. There is also tf.random_normal(), which fills the Tensor with random values.
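A quick sketch of those two initialization options (old API assumed; the stddev value is just an illustrative choice):

```python
import tensorflow as tf

# All-zero start, as in the beginner tutorial
W_zeros = tf.Variable(tf.zeros([784, 10]))

# Random start, often used instead so units don't all start identically
W_random = tf.Variable(tf.random_normal([784, 10], stddev=0.01))

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())  # older API name
    print(sess.run(W_zeros)[0][:3])   # [0. 0. 0.]
    print(sess.run(W_random)[0][:3])  # small random numbers
```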
The contents of W: [784, 10] end up being numbers multiplied against the image pixel by pixel: for this pixel the likelihood of a 0 is 0.XXX, the likelihood of a 1 is -0.XXX, the likelihood of a 2 is 0.0XX, and so on.
For example, for the earlier "1" image, the actual trained weight for the very first (top-left) pixel, W[0], is often [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]. The reason is obvious: for none of the digits 0 through 9 does the top-left pixel carry any meaning.
Looking at the weight W[380], around the middle of the image:
[-0.23017341 0.03032022 0.02670325 -0.06415708 0.07344861 -0.05119878 0.03592584 -0.00460929 0.09520938 0.08853132]
The fact that the weight for 0, -0.23017341, is negative means that **when the middle pixel is black, the image is unlikely to be a "0"**. That much you can read off directly.
This is really more relevant to the convolution layers of the expert tutorial, but **personally I feel the word "filter" fits better than "weight".**
Multiplying the images Tensor by this weight:
After matrix operation
matmul = tf.matmul(x,W)
print "matmul:", matmul[0] #First image(The answer is 7)
matmul: [ 1.43326855 -10.14613152 2.10967159 6.07900429 -3.25419664
-1.93730605 -8.57098293 10.21759605 1.16319525 2.90590048]
is returned. Hmm, this still doesn't tell us much.
"Bias" may sound grander than it really is, but if you have a function like
y = x(sin(2+(x^1+exp(0.01)+exp(0.5)))+x^(2+tan(10)))+x(x/2x+x^3x)+0.12
it is something like that 0.12 at the end.
More simply, it is the b in y = ax + b.
Ah, so that's why it's called bias.
That said, in the tutorial's case the accuracy of the answers did not change much even without the bias.
If the trained value of the bias ends up around b = 1e-10, it may not mean much.
In the code we create it the same way as the weights, but since the images Tensor and the weights have already been multiplied together, the bias added afterwards is Shape [10], Rank 1.
bias
b = tf.Variable(tf.zeros([10]))
print "b:",b #Post-learning bias
b: [-0.98651898 0.82111627 0.23709664 -0.55601585 0.00611385 2.46202803
-0.34819031 1.39600098 -2.53770232 -0.49392569]
On its own, I'm not sure what these numbers mean either.
So the original images Tensor x: [batch_num, 784] is matrix-multiplied by the weight W: [784, 10] to become matmul: [batch_num, 10], and then the bias b: [10] is added to it.
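A shape-only sketch of that pipeline in plain NumPy (dummy zero data, just to show how the [10] bias is broadcast across the batch):

```python
import numpy as np

batch_num = 100
x = np.zeros((batch_num, 784), dtype=np.float32)  # images
W = np.zeros((784, 10), dtype=np.float32)         # weights
b = np.zeros(10, dtype=np.float32)                # bias

matmul = x.dot(W)    # [batch_num, 784] x [784, 10] -> [batch_num, 10]
scores = matmul + b  # b: [10] is broadcast across all batch_num rows
print(scores.shape)  # (100, 10)
```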
However, these numbers still don't mean much to a human.
So we pass them to tf.nn.softmax() to turn them into something humans can interpret.
softmax
y = tf.nn.softmax(tf.matmul(x, W) + b)
print "y", y[0] #First image(The answer is 7)
y [ 2.04339485e-05 6.08732953e-10 5.19737077e-05 2.63350527e-03
2.94665284e-07 2.85405549e-05 2.29651920e-09 9.96997833e-01
1.14465665e-05 2.55984633e-04]
Looking at this, the value at index 7 is by far the largest. Apparently the probability that this image is a 7 is high.
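What tf.nn.softmax() does numerically can be sketched in NumPy (using roughly the matmul row shown above):

```python
import numpy as np

def softmax(v):
    # softmax(v)_i = exp(v_i) / sum_j exp(v_j)
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([1.43, -10.15, 2.11, 6.08, -3.25,
                   -1.94, -8.57, 10.22, 1.16, 2.91])  # matmul row for the first image
probs = softmax(scores)
print(probs)           # sums to 1.0; index 7 dominates
print(probs.argmax())  # 7
```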
If you just want the predicted answers rather than the array of probabilities:
Getting the answers
x_answer = tf.argmax(y,1)
y_answer = tf.argmax(y_,1)
print "x",x_answer[0:10] #The answer to the first 10 images Tensorflow thinks
print "y",y_answer[0:10] #10 The real answer of the image
x [7 2 1 0 4 1 4 9 6 9]
y [7 2 1 0 4 1 4 9 5 9]
I want to know the accuracy
correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print "accuracy:", accuracy
accuracy: 0.9128
The accuracy falls in the range (0, 1); here it is 0.9128, i.e. about 91% of the answers are correct.
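What tf.equal, tf.cast, and tf.reduce_mean are doing can be sketched in NumPy with the ten answers shown above:

```python
import numpy as np

predicted = np.array([7, 2, 1, 0, 4, 1, 4, 9, 6, 9])  # what the model thinks (from above)
actual    = np.array([7, 2, 1, 0, 4, 1, 4, 9, 5, 9])  # the real labels

correct = (predicted == actual)               # [True, ..., False, True]  (tf.equal)
accuracy = correct.astype(np.float32).mean()  # cast booleans to 1.0/0.0 and average
print(accuracy)                               # 0.9 -> 9 out of 10 correct
```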
At first I wrote "Softmax regression", but to be precise this is called "logistic regression", because it performs regression on probabilities; softmax is the name of the function that turns the inputs into those probability outputs.
Since MNIST is an image classification problem, the overall flow is:
"I want to know the probability of each label for this image" → "logistic regression (softmax)" → "the answer is the label with the highest probability (argmax)".
So you probably would not use softmax for a regression analysis where you want to predict real numbers.

Now you understand how Tensorflow produces its MNIST answers.
But how does the learning of the weights W and the bias b actually proceed?
The hint is in the part where Tensorflow's learning step is executed repeatedly.
The training loop at the end of the code
for i in range(1000):
batch_xs, batch_ys = mnist.train.next_batch(100)
sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})
This train_step appears to be what does the training. Its contents are:
Learning method
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
'''
y: [batch_num, 10] y is the output computed from x (the images)
y_: [batch_num, 10] y_ is labels
0.01 is a learning rate
'''
But let's break this down a little more.
tf.log() simply computes the log, element by element. The Tensor's shape doesn't change, so log_y is still [batch_num, 10].
Then we multiply it by the answer Tensor y_, and since y_ is all 0s except at the answer index, everything other than the answer position becomes 0 in the product.
The resulting Tensor still has Shape [batch_num, 10], but since everything except the answer position is 0, it may be easier to think of it as effectively [batch_num, 1].
log_y = tf.log(y)
print log_y[0]
[ -1.06416254e+01 -2.04846172e+01 -8.92418385e+00 -5.71210337e+00
-1.47629070e+01 -1.18935766e+01 -1.92577553e+01 -3.63449310e-03
-1.08472376e+01 -8.88469982e+00]
y_times_log_y = y_*tf.log(y)
print y_times_log_y[0] # Only the value at index 7 remains.
[-0. -0. -0. -0. -0. -0.
-0. -0.00181153 -0. -0. ]
tf.reduce_sum() adds up the elements across all dimensions; without a second argument or the keep_dims=True option it produces a Rank 0 Tensor (a scalar). In the MNIST case, it is the sum of all the values across the whole [batch_num] batch.
Example tf.reduce_sum()
# 'x' is [[1, 1, 1]
# [1, 1, 1]]
tf.reduce_sum(x) ==> 6
tf.reduce_sum(x, 0) ==> [2, 2, 2]
tf.reduce_sum(x, 1) ==> [3, 3]
tf.reduce_sum(x, 1, keep_dims=True) ==> [[3], [3]]
tf.reduce_sum(x, [0, 1]) ==> 6
------
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
print "cross_entropy:", cross_entropy #y_*tf.log(y)The total number of contents
cross_entropy 23026.0 #Numerical value after the first learning
.
.
.
cross_entropy: 3089.6 #Numerical value after the last learning
This article was very helpful for understanding cross entropy:
Neural Networks and Deep Learning: -Free Online Book- Chapter 3
http://nnadl-ja.github.io/nnadl_site_ja/chap3.html
In short, it is an indicator of how well the learning is going.
If you optimize the **weights** and **bias** while watching this value, the learning succeeds.
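For reference, the quantity computed by `-tf.reduce_sum(y_*tf.log(y))` is the cross entropy. Written out, with y' standing for the one-hot label (y_ in the code) and y for the predicted probabilities:

```math
H(y', y) = -\sum_{i} y'_i \log(y_i)
```

Because y' is one-hot, for each image this reduces to just the negative log of the probability assigned to the correct digit, summed over the batch.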
The actual optimization is done by tf.train.GradientDescentOptimizer(), but there are other optimizers to choose from under class tf.train.Optimizer, so it's fun to take a look.
Tensorflow/api_docs - Optimizers:
https://www.tensorflow.org/versions/r0.7/api_docs/python/train.html#optimizers
Calling .minimize() on it bundles the gradient computation and its application to the tf.Variables into one step.
Conversely, by calling .compute_gradients() you can see the values used to update the **weight** W and **bias** b during optimization, i.e. the error / correction values.
In practice, the gradients seem to start out as large positive and negative numbers and converge while swinging back and forth.
Gradient_values
#Early learning
cross_entropy 23026.0
grad W[0] [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
grad W[380] [ 511.78765869 59.3368187 -34.74549103 -163.8828125 -103.32589722
181.61528015 17.56824303 -60.38471603 -175.52197266 -232.44744873]
grad b [ 19.99900627 -135.00904846 -32.00152588 -9.99949074 18.00206184
107.99274445 41.992836 -27.99754715 26.00336075 -8.99738121]
#Last learning
cross_entropy 2870.42
grad W[0] [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
grad W[380] [ 6.80800724 1.27235568 -6.85943699 -22.70822525 -17.48428154
13.11752224 19.7425499 -32.00106812 -41.48160553 79.59416199]
grad b [ 19.52701187 3.17797041 -20.07606125 -48.88145447 -28.05920601
37.52313232 40.22808456 -34.04494858 -74.16973114 104.77211761]
As for the weight W, the very first pixel really does seem to be completely ignored... lol
I think it's best to leave these numbers to the machine and sip your tea in peace.
To be honest, I still haven't built the thing I actually want to make... I just got completely hooked on how strongly machine learning stimulates the "maker spirit". The deeper your understanding, the more ideas come to you: "let's try this", "let's try that". It doesn't go well, but it's fun. Ah, this nostalgic feeling. Next I'd like to explain the expert edition of the MNIST tutorial; I recommend it to those who don't yet understand convolution, pooling, and so on. Stocks, tweets, likes, hates, comments, and so on are all encouraging, so please do.