Hello. In this article, I will explain the basic coding style and grammar of TensorFlow (TF). It is aimed at people who are new to TF and have not written much Python.
As background, the first time I wrote Python seriously was when I first touched TF. These are the notes I left from that time, so they may help people in a similar situation understand TF.
I used sergeant-wizard's "Assess the salary of professional baseball players with a neural network" as the sample code. Thank you for permission to publish it.
```python
import tensorflow as tf
import numpy
```
I also use numpy, so don't forget to import it.
```python
SCORE_SIZE = 33
HIDDEN_UNIT_SIZE = 32
TRAIN_DATA_SIZE = 90
```
- SCORE_SIZE: the dimension of the input data
- HIDDEN_UNIT_SIZE: the number of hidden-layer (intermediate-layer) nodes
- TRAIN_DATA_SIZE: the number of training samples

You can choose the variable names yourself!
```python
raw_input = numpy.loadtxt(open("input.csv"), delimiter=",")
[salary, score] = numpy.hsplit(raw_input, [1])
```
numpy.loadtxt: a data-loading function in the numpy library.
delimiter specifies the data delimiter. The author stored the data in input.csv. raw_input holds the loaded data as an array. http://goo.gl/h5g8cJ
numpy.hsplit: a function that makes a vertical cut and splits the array in the horizontal (column) direction.
raw_input: the array to be split. [1]: the split boundary. Column 0 is assigned to salary, and columns 1 onward are assigned to score. Therefore salary holds one value per player (94 players), and score is a 94 x 33 array. Note that salary is the teacher data and score is the input data. http://goo.gl/hoMyGH
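To see how the split behaves, here is a minimal sketch with made-up numbers (not the actual input.csv):

```python
import numpy

toy = numpy.array([[100, 1, 2],
                   [200, 3, 4]])
[first, rest] = numpy.hsplit(toy, [1])
print(first)  # [[100], [200]] -- column 0, shape (2, 1)
print(rest)   # [[1, 2], [3, 4]] -- columns 1 onward, shape (2, 2)
```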
```python
[salary_train, salary_test] = numpy.vsplit(salary, [TRAIN_DATA_SIZE])
[score_train, score_test] = numpy.vsplit(score, [TRAIN_DATA_SIZE])
```
numpy.vsplit: a function that makes a horizontal cut and splits the array in the vertical (row) direction.
salary_train: a 90 x 1 array. salary_test: a 4 x 1 array. score_train: a 90 x 33 array. score_test: a 4 x 33 array. (The 94 rows are split at row TRAIN_DATA_SIZE = 90.)
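Again as a toy check, with assumed shapes rather than the real data:

```python
import numpy

salary = numpy.zeros((94, 1))
[train, test] = numpy.vsplit(salary, [90])
print(train.shape)  # (90, 1)
print(test.shape)   # (4, 1)
```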
```python
def inference(score_placeholder):
    with tf.name_scope('hidden1') as scope:
        hidden1_weight = tf.Variable(tf.truncated_normal([SCORE_SIZE, HIDDEN_UNIT_SIZE], stddev=0.1), name="hidden1_weight")
        hidden1_bias = tf.Variable(tf.constant(0.1, shape=[HIDDEN_UNIT_SIZE]), name="hidden1_bias")
        hidden1_output = tf.nn.relu(tf.matmul(score_placeholder, hidden1_weight) + hidden1_bias)
    with tf.name_scope('output') as scope:
        output_weight = tf.Variable(tf.truncated_normal([HIDDEN_UNIT_SIZE, 1], stddev=0.1), name="output_weight")
        output_bias = tf.Variable(tf.constant(0.1, shape=[1]), name="output_bias")
        output = tf.matmul(hidden1_output, output_weight) + output_bias
    return tf.nn.l2_normalize(output, 0)
```
inference: the function name (you can choose it). score_placeholder: data defined further down in the code.
A placeholder is where you specify the data that drives the learning (that is, the input data and the teacher data); it is a data type peculiar to TF. In TF, the source data is taken in at each learning update by a mechanism called a feed. A placeholder is linked to the feed, and the fed data is reflected in every calculation that depends on it. http://goo.gl/uhHk3o
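Here is a minimal sketch of the placeholder/feed mechanism, using toy values and the same old-style TF API as this article:

```python
import tensorflow as tf

x = tf.placeholder("float", [None, 2], name="x")
y = tf.reduce_sum(x)

with tf.Session() as sess:
    # x has no value until one is fed in at run time via feed_dict
    print(sess.run(y, feed_dict={x: [[1.0, 2.0], [3.0, 4.0]]}))  # 10.0
```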
```python
with tf.name_scope('hidden1') as scope:
```
with: Python syntax. The statements nested under it run inside a context that does not carry over to the code outside it. tf.name_scope: TF's name-management function. The instructions nested under this function form one named context group.
The advantage of managing names is visualization: TensorFlow can draw the model you built on TensorBoard, and managed names make that drawing much easier to read. Here, the weight and bias operations that follow are nested under the name_scope function; that is, they are declared to belong to the 'hidden1' context, the hidden-layer parameter calculation. http://goo.gl/AYodFB
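As a small sketch of what the scope does to names (assuming the same old-style API used throughout this article):

```python
import tensorflow as tf

with tf.name_scope('hidden1') as scope:
    w = tf.Variable(tf.zeros([1]), name="w")
print(w.name)  # "hidden1/w:0" -- the scope name is prefixed
```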
```python
hidden1_weight = tf.Variable(tf.truncated_normal([SCORE_SIZE, HIDDEN_UNIT_SIZE], stddev=0.1), name="hidden1_weight")
hidden1_bias = tf.Variable(tf.constant(0.1, shape=[HIDDEN_UNIT_SIZE]), name="hidden1_bias")
hidden1_output = tf.nn.relu(tf.matmul(score_placeholder, hidden1_weight) + hidden1_bias)
```
This is the **input layer → hidden layer** calculation.
tf.Variable: TF's variable class.
Besides creating variables, it has various capabilities; for example, you can overwrite a variable's value with the assign method. (Details at the link below) http://goo.gl/nUJafs
tf.truncated_normal: returns random numbers drawn from a truncated normal distribution (values more than two standard deviations from the mean are re-drawn).
[SCORE_SIZE, HIDDEN_UNIT_SIZE]: the shape of the random array you want. stddev: specifies the standard deviation of the normal distribution (for the standard normal distribution you would specify mean = 0.0, stddev = 1.0). The initial value of the NN weight ${\bf W}$ is generated from random numbers, so at this stage the program can be interpreted as an NN without pre-training. hidden1_weight can be interpreted as the weight matrix ${\bf W}$. name simply names this operation. http://goo.gl/oZkcvs
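A quick way to see what this produces (a sketch; the shape is the same [SCORE_SIZE, HIDDEN_UNIT_SIZE] = [33, 32] as above):

```python
import tensorflow as tf

w = tf.truncated_normal([33, 32], stddev=0.1)
with tf.Session() as sess:
    sample = sess.run(w)
    print(sample.shape)       # (33, 32)
    print(abs(sample).max())  # stays within 2 * stddev = 0.2 because of the truncation
```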
tf.constant: a function that generates constants.
0.1: generate the constant 0.1. shape: the shape in which to create the constants. Here one 0.1 is created per hidden-layer unit (HIDDEN_UNIT_SIZE) and assigned to hidden1_bias. hidden1_bias is the bias term of the hidden layer, initialized here to 0.1.
tf.nn.relu: a function that computes ReLU, one of the activation functions. (The purpose of this article is to explain the code, so please learn the meaning of ReLU separately m(_ _)m)
tf.matmul: a function that computes the product of matrices (the inner product if the arguments are vectors).
Here we compute the product of score_placeholder and hidden1_weight. This product is well defined because, as defined further below, score_placeholder has SCORE_SIZE columns, matching the SCORE_SIZE rows of hidden1_weight. The result is one value per hidden unit for each sample.
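As a shape check with toy sizes (not the real model's 33 and 32):

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0, 3.0]])           # (1, 3): one sample, 3 features
b = tf.truncated_normal([3, 4], stddev=0.1)  # (3, 4): 3 inputs, 4 units
with tf.Session() as sess:
    print(sess.run(tf.matmul(a, b)).shape)   # (1, 4): one value per unit
```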
```python
with tf.name_scope('output') as scope:
    output_weight = tf.Variable(tf.truncated_normal([HIDDEN_UNIT_SIZE, 1], stddev=0.1), name="output_weight")
    output_bias = tf.Variable(tf.constant(0.1, shape=[1]), name="output_bias")
    output = tf.matmul(hidden1_output, output_weight) + output_bias
return tf.nn.l2_normalize(output, 0)
```
This is the **hidden layer → output layer** calculation. output has a single value per sample; in other words, the output layer has one unit, and this is what gets compared with the player's salary (the teacher data). Here the output layer's activation function is the identity map. tf.nn.l2_normalize: a function that computes and returns the normalization.
For a vector, each component is divided by the norm (the transformation that makes the vector's length 1); the same applies to matrices and higher tensors. For a scalar, 1 is returned. Since the original data is also normalized here, the comparison must be made between normalized values. https://goo.gl/NEFajc
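A toy check of l2_normalize along dimension 0 (made-up numbers):

```python
import tensorflow as tf

v = tf.constant([[3.0], [4.0]])
with tf.Session() as sess:
    # each component is divided by the norm sqrt(3^2 + 4^2) = 5
    print(sess.run(tf.nn.l2_normalize(v, 0)))  # [[0.6], [0.8]]
```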
```python
def loss(output, salary_placeholder, loss_label_placeholder):
    with tf.name_scope('loss') as scope:
        loss = tf.nn.l2_loss(output - tf.nn.l2_normalize(salary_placeholder, 0))
        tf.scalar_summary(loss_label_placeholder, loss)
    return loss
```
tf.nn.l2_loss: a function that computes the squared error (sum(t ** 2) / 2).
Note that there are restrictions on the data type. The two main error functions are the **squared error function** and the **cross-entropy error function**: the squared error is used for numerical-prediction tasks, and the cross entropy for classification tasks. salary_placeholder is fed the salary data as teacher data at every update. http://goo.gl/V67M7c
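Concretely, with toy values:

```python
import tensorflow as tf

t = tf.constant([1.0, 2.0])
with tf.Session() as sess:
    print(sess.run(tf.nn.l2_loss(t)))  # (1^2 + 2^2) / 2 = 2.5
```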
tf.scalar_summary: a function that attaches a string label to the target scalar, recording what the value means.
loss_label_placeholder is defined below as a string placeholder. http://goo.gl/z7JWNe
```python
def training(loss):
    with tf.name_scope('training') as scope:
        train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)
    return train_step
```
tf.train.GradientDescentOptimizer: the class that implements the SGD algorithm.
0.01: the learning rate ε. minimize(loss): slightly tricky, but this is a method of the tf.train.GradientDescentOptimizer class that minimizes the target variable (loss here). It returns an operation that, each time it is run, computes the gradients and applies the weight updates. http://goo.gl/5XENkX
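A minimal sketch of the optimizer on a one-variable toy problem (not the article's model):

```python
import tensorflow as tf

w = tf.Variable(0.0)
toy_loss = tf.square(w - 3.0)  # minimized at w = 3
step = tf.train.GradientDescentOptimizer(0.1).minimize(toy_loss)

with tf.Session() as sess:
    sess.run(tf.initialize_all_variables())
    for _ in range(50):
        sess.run(step)  # each run applies one gradient-descent update
    print(sess.run(w))  # close to 3.0
```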
In TensorFlow, everything related to running the learning algorithm is stored in a class called Graph. As the name suggests, it organizes not only the computation (session.run) but also the information needed to draw the graph (session.graph). (Convenient! (・ω<))
```python
with tf.Graph().as_default():
    salary_placeholder = tf.placeholder("float", [None, 1], name="salary_placeholder")
    score_placeholder = tf.placeholder("float", [None, SCORE_SIZE], name="score_placeholder")
    loss_label_placeholder = tf.placeholder("string", name="loss_label_placeholder")
```
with tf.Graph().as_default():: the statement that declares the graph. salary_placeholder: the object that stores the salary teacher data.
[None, 1]: means the number of columns is 1 and the number of rows is arbitrary.
score_placeholder: the object that stores the input data. loss_label_placeholder: a string-storing object that is included in the summary information at output time.
```python
feed_dict_train = {
    salary_placeholder: salary_train,
    score_placeholder: score_train,
    loss_label_placeholder: "loss_train"
}
```
feed_dict_train: the dictionary of data to be fed at each learning update.
The data is stored as Python dictionary-type data (feed_dict_train); for TF to read the data at every step, it has to be given through the feed. http://goo.gl/00Ikjg ↑ a review of dictionary-type data. Each key is a placeholder Tensor and each value is the data to feed into it. This is the dictionary for the training data; you can see why it is needed at session.run below.
```python
feed_dict_test = {
    salary_placeholder: salary_test,
    score_placeholder: score_test,
    loss_label_placeholder: "loss_test"
}
```
feed_dict_test: the dictionary for the test data. The structure is the same as above.
```python
output = inference(score_placeholder)
loss = loss(output, salary_placeholder, loss_label_placeholder)
training_op = training(loss)
```
Assign the forward-propagation NN computation `inference` to `output`, the squared-error loss computation `loss` to `loss`, and the SGD execution operation `training` to `training_op`. (Note that `loss = loss(...)` rebinds the name from the function to its result; it works here because the function is called only once.)
```python
summary_op = tf.merge_all_summaries()
```
tf.merge_all_summaries: aggregates the information of all the Summary operations.
http://goo.gl/wQo8Rz
```python
init = tf.initialize_all_variables()
```
Assign the operation tf.initialize_all_variables, which initializes all variables, to init. **The timing of this initialization declaration is important.** If you do not declare it after all the required variables have been defined, an error will be returned.
http://goo.gl/S58XJ2
```python
best_loss = float("inf")
```
Declare best_loss as a floating-point number. Below there is a step that updates the best error value, and best_loss holds it; the initial value is inf (infinity), which is larger than any number, so that the first computed loss is always recorded.
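This works because any finite loss compares as smaller than infinity:

```python
best_loss = float("inf")
print(0.5 < best_loss)  # True -- the first real loss value always replaces inf
```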
```python
with tf.Session() as sess:
```
Session is the core class of a Graph, packed with everything related to execution. If a Session is not declared, none of the TF object-related processing will run.
http://goo.gl/pDZeLI
```python
summary_writer = tf.train.SummaryWriter('data', graph_def=sess.graph_def)
```
tf.train.SummaryWriter: A function that writes summary information to an event file.
```python
sess.run(init)
```
sess.run: another core function.
It executes the operation given as the first argument. In Windows terms, it is like a .exe!
```python
for step in range(10000):
    sess.run(training_op, feed_dict=feed_dict_train)
    loss_test = sess.run(loss, feed_dict=feed_dict_test)
```
for step in range(10000): repeat 10000 times. "Train on the 90 players' training data and test on the 4 players' test data" counts as one step! feed_dict=feed_dict_train: an important option of session.run.
This is the feed mechanism explained repeatedly above: the required data is taken in here and the learning-update calculation is performed.
```python
if loss_test < best_loss:
    best_loss = loss_test
    best_match = sess.run(output, feed_dict=feed_dict_test)
```
Only when the minimum error value is updated do we record the error value and the output-layer values (the estimated salaries).
```python
if step % 100 == 0:
    summary_str = sess.run(summary_op, feed_dict=feed_dict_test)
    summary_str += sess.run(summary_op, feed_dict=feed_dict_train)
    summary_writer.add_summary(summary_str, step)
```
An instruction that collects the summary information once every 100 steps and writes the collected information to the event file.
```python
print sess.run(tf.nn.l2_normalize(salary_placeholder, 0), feed_dict=feed_dict_test)
print best_match
```
Finally, display the normalized teacher data and the most accurate output-layer estimates.