I'm neither a programmer nor a data scientist, but I've been playing with TensorFlow for a month, so here is an explanation that should be super easy to understand.

Just writing this ate up my entire Saturday and Sunday. It follows the order in which I learned things, so as you read through it, the code should gradually start to make sense. I wrote it for people who say, "I want to try TensorFlow, but there's still a lot I don't understand!"

**Added October 4, 2018:** Since this is a very old article, links may well be broken and the official documentation has changed. The TensorFlow in this article was around ver 0.4 to 0.7; now that it is around ver 2.0, much of what the text refers to may no longer be recognizable.

1: What is Deep Learning doing in the first place?

Experts will probably object, but the point is: why not call it a black box that performs regression analysis? Maybe the word "regression" alone raises a question mark. The idea is to have the machine compute values as close as possible to the "values" you want to find. That's about it, isn't it?

e.g. I want to find a suitable function (figure: non-linear regression)
e.g. I want to find suitable clusters (figure: k-means clustering)
e.g. I want to find a proper "face" from a set of pixels (figure: face detection screenshot)

"Then there are lots of things I'd like to know!" many people will say. I want to capture only the moments (values) when an idol's face looks great in a video! There always seem to be great predecessors, e.g. Qiita: "Trying to determine from a face photo whether someone is busty, by deep learning (it works, or maybe just barely)". And so it becomes: "Let's get started with Deep Learning!"

2: Choosing a framework: the good things about TensorFlow

**First of all, the amount of information.** At first I didn't know what TensorFlow was or what its functions were doing, and I thought about switching to Theano many times. But by now most questions are already answered on Stack Overflow (in English) or in GitHub issues, so Google's name power really is impressive. You can also find TensorFlow's own source code by googling a function name, so your understanding of the library deepens as you use it.

**Before you start, you don't even know what it can and cannot do**, so I read the blogs of people running various experiments with deep learning frameworks. In terms of how easy the documentation is to understand, I'd say TensorFlow > Theano > Chainer. Other resources:

List of blogs I read

- TensorFlow
  - kivantium activity diary: Identifying the anime Yuruyuri production company with TensorFlow
  - Sugyan Memo: Identifying idols' faces by deep learning with TensorFlow
- Theano
  - A breakthrough on artificial intelligence: Implementing a convolutional neural network with Theano (1)
  - StatsFragments: Deep Learning with Theano <3>: Convolutional Neural Network
- Chainer
  - Sekairabo: I made a bot that can answer naturally with LSTM
  - Oriental Robotics: Training an RNN to output literary text (aka DeepDazai)
  - Preferred Research: Robot control with distributed deep reinforcement learning

3: Hello, World! MNIST beginner edition

Rank: t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] is rank 2. The point is that rank is the number of dimensions of the tensor itself.

| Rank | Mathematical entity | Python example |
| --- | --- | --- |
| 0 | Scalar (magnitude only) | `s = 483` |
| 1 | Vector (magnitude and direction) | `v = [1.1, 2.2, 3.3]` |
| 2 | Matrix (an ordinary table) | `m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` |
| 3 | 3-Tensor (three-dimensional) | `t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]` |
| n | n-Tensor (n-dimensional) | .... |

Shape: for the previous t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]], the shape is 3 × 3, so [3, 3].

| Rank | Shape | Dimension number | Example |
| --- | --- | --- | --- |
| 0 | [] | 0-D | A 0-D tensor. A scalar. |
| 1 | [D0] | 1-D | A 1-D tensor with shape [5]. |
| 2 | [D0, D1] | 2-D | A 2-D tensor with shape [3, 4]. |
| 3 | [D0, D1, D2] | 3-D | A 3-D tensor with shape [1, 4, 3]. |
| n | [D0, D1, ... Dn-1] | n-D | A tensor with shape [D0, D1, ... Dn-1]. |

Type: this is just int or float, so it needs little explanation.
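To make rank, shape, and type concrete, here is a minimal sketch that builds a few tensors and inspects them (written against the TF 0.x/1.x-era API this article uses):

import tensorflow as tf

s = tf.constant(483)                                # rank 0: a scalar
v = tf.constant([1.1, 2.2, 3.3])                    # rank 1: a vector
m = tf.constant([[1, 2, 3], [4, 5, 6], [7, 8, 9]])  # rank 2: a matrix

print(s.get_shape())  # ()     -> rank 0, shape []
print(v.get_shape())  # (3,)   -> rank 1, shape [3]
print(m.get_shape())  # (3, 3) -> rank 2, shape [3, 3]
print(m.dtype)        # <dtype: 'int32'>, the Type part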

Seeing those Tensors in MNIST

In the case of MNIST, there is a tensor of 55,000 images (images) and a tensor of their answers (labels). The images tensor has shape [55000, 784], rank 2, dtype=tf.float32; the labels tensor has shape [55000, 10], rank 2, dtype=tf.float32. In the tutorial these are first set up with tf.placeholder. (It may be easier to think of this as reserving a tensor.)
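If you want to verify those shapes yourself, here is a minimal sketch using the tutorial's input_data helper (the import path below is the TF 0.x-era one and may differ in your version):

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)
print(mnist.train.images.shape)  # (55000, 784), the images tensor
print(mnist.train.labels.shape)  # (55000, 10), the labels tensor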

input_Tensors


x = tf.placeholder(tf.float32, [None, 784]) # images
y_ = tf.placeholder(tf.float32, [None, 10]) # labels
# The None dimension will hold the batch size

Note that a tf.placeholder() must be fed data via the feed_dict argument on every training run. In the tutorial's case, training starts near the end:

The training loop at the end of the script


for i in range(1000):
 batch_xs, batch_ys = mnist.train.next_batch(100)
 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

So in practice the tensors are processed 100 images at a time: x has shape [100, 784] and y_ has shape [100, 10].

**Digression: about the number of dimensions of an image**

The image data is originally 28 × 28 pixels of grayscale = 1 channel, but in the beginner tutorial it is flattened into a 784-dimensional vector to keep things simple (in fact, it arrives already flattened): 28 × 28 × 1 = 784 dimensions. It's as if all the pixel values, lined up vertically and horizontally, were laid out in one long horizontal row. (Figure: a 28 × 28 MNIST digit next to its grid of pixel values; if you squint at the numbers, it reads as a "1".) By the way, if you don't flatten the images, the tensor is [55000, 28, 28, 1], rank 4. Even a color image only changes the channel count to 3, so [55000, 28, 28, 3], still rank 4.
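To see the flattening itself, a small NumPy sketch (NumPy is used here purely for illustration; the dataset actually arrives pre-flattened):

import numpy as np

img = np.zeros((28, 28, 1), dtype=np.float32)  # one grayscale image: 28x28 pixels, 1 channel
flat = img.reshape(784)                        # 28 * 28 * 1 = a 784-dimensional vector
print(flat.shape)                              # (784,)

batch = np.zeros((55000, 28, 28, 1), dtype=np.float32)  # the unflattened, rank-4 version
print(batch.reshape(55000, 784).shape)                  # (55000, 784), what the tutorial hands you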

4: TensorFlow processing: what the beginner tutorial is doing

Now, once you understand tensors, you can finally follow TensorFlow's machine-learning flow. We prepared the images tensor x: [batch_num, 784], but how do you get from a 784-dimensional vector to one of 10 possible answers? This is where **matrix operations, "weights", "bias", and softmax regression** come in.

Matrix operation

Matrix operations are simple. If you multiply x: [batch_num, 784] by a [784, 10] matrix, the result is a [batch_num, 10] matrix, which gives 10 candidate answers. Referring to the image on Wikipedia: A: [4, 2] times B: [2, 3] gives [4, 3]. (Figure: matrix multiplication diagram.) In TensorFlow:

Matrix operation matmul


tf.matmul(A,B) # A is [4,2] and B is [2,3]. output would be [4,3]

'''
x: [batch_num, 784]
W: [784, 10]
matmul: [batch_num, 10]
'''
matmul = tf.matmul(x,W)

Here, what plays the role of B: [2, 3] is, in MNIST's case, the all-important **weight** W: [784, 10].

weight

Now we prepare the weight W: [784, 10]. The corresponding part of the code is:

Weight W


W = tf.Variable(tf.zeros([784, 10]))

tf.Variable() is an in-memory buffer containing a tensor: a variable that holds the parameters you want to train. tf.zeros() creates a tensor filled entirely with 0s. Starting from all zeros is fine because the values get updated again and again during training. There is also tf.random_normal(), which fills the tensor with random numbers.
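For comparison, a sketch of the all-zero start next to a random one (the stddev value is an arbitrary choice for illustration; tf.initialize_all_variables() is the TF 0.x-era name, later renamed tf.global_variables_initializer()):

import tensorflow as tf

W_zero = tf.Variable(tf.zeros([784, 10]))                      # all-zero start, as in the tutorial
W_rand = tf.Variable(tf.random_normal([784, 10], stddev=0.1))  # random start instead

init = tf.initialize_all_variables()
with tf.Session() as sess:
    sess.run(init)
    print(sess.run(W_rand)[0])  # the 10 weights attached to the very first pixel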

The role of weights

The contents of W: [784, 10] are the numbers each pixel's value gets multiplied by: the likelihood of a 0 is 0.XXX, the likelihood of a 1 is -0.XXX, the likelihood of a 2 is 0.0XX, and so on. For example, in the case of the earlier "1" image, the actual trained weight W[0] for the very first upper-left pixel is often [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]. The reason is clear: the upper-left pixel is meaningless for every digit from 0 to 9. Looking at W[380], around the middle of the image: [-0.23017341 0.03032022 0.02670325 -0.06415708 0.07344861 -0.05119878 0.03592584 -0.00460929 0.09520938 0.08853132]. The weight for 0 being negative (-0.23017341) means that **when the middle pixel is black, the image is unlikely to be a "0"**. (Figure: the MNIST digit again.) This is really more the territory of the expert tutorial's convolution layers, but **personally I feel the word "filter" fits better than "weight"**. We matrix-multiply this weight with the images tensor:

After matrix operation


matmul = tf.matmul(x,W)
print "matmul:", sess.run(matmul, feed_dict={x: mnist.test.images})[0]  # first image (the answer is 7)
matmul: [ 1.43326855 -10.14613152 2.10967159 6.07900429 -3.25419664
-1.93730605 -8.57098293 10.21759605 1.16319525 2.90590048]

is returned. Hmm, that still doesn't tell us much.

bias

"Bias" may sound grander than it deserves. Given a function like y = x(sin(2+(x^1+exp(0.01)+exp(0.5)))+x^(2+tan(10)))+x(x/2x+x^3x)+0.12, the bias is something like that trailing 0.12. (Figure: a graph of the function.) More simply, it's the b in y = ax + b. Ah, so that's why it's called a bias. In the tutorial's case, though, the accuracy of the answers barely changed even without the bias; if the bias's true value is something like b = 1e-10, it may not mean much. In the code we create it the same way as the weights, but since the images tensor and the weights have already been matrix-multiplied, the bias added afterwards is shape [10], rank 1.

bias


b = tf.Variable(tf.zeros([10]))
print "b:", sess.run(b)  # bias after training
b: [-0.98651898 0.82111627 0.23709664 -0.55601585 0.00611385 2.46202803
-0.34819031 1.39600098 -2.53770232 -0.49392569]

On its own, it's hard to tell what it means.

Softmax function: matching the answers

The original images tensor x: [batch_num, 784] is matrix-multiplied by the weight W: [784, 10], becoming matmul: [batch_num, 10], and then the bias b: [10] is added. But we still can't read any meaning into these numbers. So we pass them to tf.nn.softmax() to turn them into something humans can understand.

softmax


y = tf.nn.softmax(tf.matmul(x, W) + b)
print "y", sess.run(y, feed_dict={x: mnist.test.images})[0]  # first image (the answer is 7)
y [ 2.04339485e-05 6.08732953e-10 5.19737077e-05 2.63350527e-03
 2.94665284e-07 2.85405549e-05 2.29651920e-09 9.96997833e-01
 1.14465665e-05 2.55984633e-04]

Looking at it, the number at index 7 is by far the largest; apparently the probability of this being a 7 is high. If you simply want to compare answers rather than stare at an array of probabilities:

Please give me an answer


x_answer = tf.argmax(y,1)
y_answer = tf.argmax(y_,1)
print "x", sess.run(x_answer, feed_dict={x: mnist.test.images})[0:10]  # TensorFlow's answers for the first 10 images
print "y", sess.run(y_answer, feed_dict={y_: mnist.test.labels})[0:10]  # the real answers for the first 10 images
x [7 2 1 0 4 1 4 9 6 9]
y [7 2 1 0 4 1 4 9 5 9]

I want to know the accuracy


correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))
print "accuracy:", sess.run(accuracy, feed_dict={x: mnist.test.images, y_: mnist.test.labels})
accuracy: 0.9128
• Added 2016/05/19: The softmax function squashes a set of arbitrary real numbers into the range (0, 1). I originally wrote "softmax regression", but to be precise this is "logistic regression", since it performs regression on probabilities; softmax is the function that returns the output for a given input. Since MNIST is an image-classification problem, the sequence is: "I want the probability of each label for this image" → "logistic regression (softmax)" → "the answer is the label with the highest probability (argmax)". So you probably wouldn't use softmax for a regression problem where you want to predict real numbers.
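To see what softmax actually does to a row of raw scores, here is a NumPy sketch of the formula exp(x) / sum(exp(x)), applied to the matmul-plus-bias scores shown earlier (values rounded for brevity):

import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtracting the max is a standard numerical-stability trick
    return e / e.sum()

scores = np.array([1.43, -10.15, 2.11, 6.08, -3.25, -1.94, -8.57, 10.22, 1.16, 2.91])
probs = softmax(scores)
print(probs)           # every value lands in (0, 1) and the row sums to 1; index 7 dominates
print(probs.argmax())  # 7, the same answer argmax picks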

5: So when does the learning happen?

Now you understand how TensorFlow produces its MNIST answers. But how does the learning of the weight W and the bias b actually proceed? The hint lies in the part where TensorFlow's training step is executed over and over.

The training loop at the end of the script


for i in range(1000):
 batch_xs, batch_ys = mnist.train.next_batch(100)
 sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

This train_step seems to be where the training happens. Its contents are:

Learning method


cross_entropy = -tf.reduce_sum(y_*tf.log(y))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)
'''
y:  [batch_num, 10]  y is the processed numbers derived from x (the images)
y_: [batch_num, 10]  y_ is the labels
0.01 is the learning rate
'''

Let's chew on this a bit more. tf.log() simply computes the logarithm element-wise; the tensor's shape doesn't change, so tf.log(y) is still [batch_num, 10]. We then multiply it by the answers tensor y_, and since y_ contains 0 everywhere except at the answer's index, the multiplication zeroes out every index other than the answer. The resulting tensor still has shape [batch_num, 10], but since everything except the answer position is 0, it may be easier to think of it as effectively [batch_num, 1].

log_y = tf.log(y)
print sess.run(log_y, feed_dict={x: mnist.test.images})[0]
[ -1.06416254e+01 -2.04846172e+01 -8.92418385e+00 -5.71210337e+00
 -1.47629070e+01 -1.18935766e+01 -1.92577553e+01 -3.63449310e-03
 -1.08472376e+01 -8.88469982e+00]
y_times_log_y = y_ * tf.log(y)
print sess.run(y_times_log_y, feed_dict={x: mnist.test.images, y_: mnist.test.labels})[0]  # only the value at index 7 remains
[-0. -0. -0. -0. -0. -0.
-0. -0.00181153 -0. -0. ]

tf.reduce_sum() sums across all dimensions, and without a second argument or the keep_dims=True option it returns a rank-0 tensor (a scalar). In the MNIST case, it is the sum of all the values held across the [batch_num] entries.

Example tf.reduce_sum()


# 'x' is [[1, 1, 1]
#         [1, 1, 1]]
tf.reduce_sum(x) ==> 6
tf.reduce_sum(x, 0) ==> [2, 2, 2]
tf.reduce_sum(x, 1) ==> [3, 3]
tf.reduce_sum(x, 1, keep_dims=True) ==> [[3], [3]]
tf.reduce_sum(x, [0, 1]) ==> 6
------
cross_entropy = -tf.reduce_sum(y_*tf.log(y))
print "cross_entropy:", sess.run(cross_entropy, feed_dict={x: batch_xs, y_: batch_ys})  # total of the contents of y_*tf.log(y)
cross_entropy: 23026.0  # value after the first training step
.
.
.
cross_entropy: 3089.6  # value after the last training step

This article is very helpful on cross-entropy: Neural Networks and Deep Learning (a free online book), Chapter 3: http://nnadl-ja.github.io/nnadl_site_ja/chap3.html. In short, it's an indicator of how well the learning is going; learning succeeds by optimizing the **weight** and **bias** while watching this value. The actual optimization is done by tf.train.GradientDescentOptimizer(), but there are other optimizers to choose from under class tf.train.Optimizer, so it's fun to have a look. Tensorflow/api_docs - Optimizers: https://www.tensorflow.org/versions/r0.7/api_docs/python/train.html#optimizers. Calling .minimize() performs the gradient computation and its application to the tf.Variables in one step. Conversely, by calling .compute_gradients() you can see the values used to update the **weight** W and **bias** b during optimization, i.e. the error/correction values. In practice they seem to start out as large positive and negative numbers and converge while oscillating back and forth.
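A minimal sketch of peeking at those gradients via .compute_gradients() and then applying them with .apply_gradients(), which together are equivalent to what .minimize() does in one call (variable names follow the tutorial; the [0][0] indexing assumes W was the first Variable created):

optimizer = tf.train.GradientDescentOptimizer(0.01)
grads_and_vars = optimizer.compute_gradients(cross_entropy)  # a list of (gradient, variable) pairs
apply_step = optimizer.apply_gradients(grads_and_vars)       # equivalent to optimizer.minimize(cross_entropy)

grad_W = grads_and_vars[0][0]  # gradient for W, assuming W was the first Variable created
print(sess.run(grad_W, feed_dict={x: batch_xs, y_: batch_ys})[380])  # correction values for pixel 380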

Gradient_values


#Early learning
cross_entropy 23026.0
grad W[0] [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
grad W[380] [ 511.78765869 59.3368187 -34.74549103 -163.8828125 -103.32589722
 181.61528015 17.56824303 -60.38471603 -175.52197266 -232.44744873]
grad b [ 19.99900627 -135.00904846 -32.00152588 -9.99949074 18.00206184
 107.99274445 41.992836 -27.99754715 26.00336075 -8.99738121]
#Last learning
cross_entropy 2870.42
grad W[0] [ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
grad W[380] [ 6.80800724 1.27235568 -6.85943699 -22.70822525 -17.48428154
 13.11752224 19.7425499 -32.00106812 -41.48160553 79.59416199]
grad b [ 19.52701187 3.17797041 -20.07606125 -48.88145447 -28.05920601
 37.52313232 40.22808456 -34.04494858 -74.16973114 104.77211761]

As for the weight W, the first pixel really does seem to be completely ignored... lol. I think it's best to leave these numbers to the machine and go have a leisurely cup of tea.

6: Next time, the expert tutorial in detail!

Actually, I haven't yet built the thing I want to build... I was completely captivated by how strongly machine learning stimulates the "maker's spirit". The deeper your understanding gets, the more ideas come to you: "let's try this", "let's try that". It doesn't go well, but it's fun. What is this nostalgic feeling... Next, I'd like to walk through the MNIST expert tutorial; I recommend it to anyone who doesn't yet understand convolution, pooling, and so on. Stocks, tweets, likes (and hates), comments, and the like are all encouraging, so please do.
