This article is day 22 of the TensorFlow2.0 Advent Calendar 2019.
In this article, we introduce the graph optimization process that runs inside TensorFlow when tf.function builds a computation graph from a Python program, and show how to control that optimization.
Because this article focuses on what happens behind tf.function, please refer to other articles for how to use tf.function itself. Several articles about tf.function have also been posted in the TensorFlow2.0 Advent Calendar 2019.
tf.function
Functions decorated with @tf.function are converted into TensorFlow computation graphs.
For example, consider the following program.
import tensorflow as tf

@tf.function
def simple_func(arg):
    a = tf.constant(7.9)
    b = tf.constant(6.3)
    c = arg + a
    d = a * b
    ret = c + d
    return ret

arg = tf.constant(8.9)
print(simple_func(arg))
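Running this program evaluates the traced graph and prints the result, which should look roughly like the following (up to float32 rounding):

tf.Tensor(66.57, shape=(), dtype=float32)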
The computation graph created by simple_func looks like this:
The computation graph produced by tf.function is optimized inside TensorFlow (in the C++ layer). This optimization process is also used in Graph Mode, the default in TensorFlow 1.x, so the graph optimization technology built up in TensorFlow 1.x is carried over to tf.function.
The article below was written for TensorFlow 1.13 and is a bit old, but if you are interested in the graph optimization process performed inside TensorFlow, please refer to it as well. TensorFlow 2.0 adds new optimizations beyond those introduced there, such as Auto Mixed Precision; I would like to cover the latest optimization passes another time.
Now, returning to our example, the graph generated by tf.function looks as follows after optimization.
Comparing it with the graph above, you can see that some nodes have been deleted. The optimization applied here is called Constant Folding: any node whose inputs are all constant values is evaluated while the graph is being built and replaced with a Const node. By pre-evaluating everything that can be evaluated at graph-construction time, the execution time of the graph as a whole can be shortened.
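To see the folding candidate from Python, you can list the operations of the graph as traced by tf.function, before Grappler runs. This is a minimal sketch using the public get_concrete_function API; the exact op names printed may differ by version.

import tensorflow as tf

@tf.function
def simple_func(arg):
    a = tf.constant(7.9)
    b = tf.constant(6.3)
    c = arg + a
    d = a * b
    return c + d

# get_concrete_function() traces the Python function and returns the
# graph as the user defined it, i.e. before Grappler optimizes it.
concrete = simple_func.get_concrete_function(tf.constant(8.9))
for op in concrete.graph.get_operations():
    print(op.type, op.name)

# The Mul op fed only by Const ops is exactly what Constant Folding
# replaces with a single precomputed Const node at optimization time.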
You can inspect the optimized computation graph using TensorBoard.
To output Summary data for TensorBoard, you need to call tf.summary.trace_on() before invoking the computation graph built by tf.function. Note that the optimized graph will not be output unless True is passed for both the graph and profiler arguments of tf.summary.trace_on().
Then, after executing the graph you want to inspect, call tf.summary.trace_export() to write out the optimized graph.
The source code that outputs the optimized computation graph for TensorBoard is shown below.
import tensorflow as tf

@tf.function
def simple_func(arg):
    a = tf.constant(7.9)
    b = tf.constant(6.3)
    c = arg + a
    d = a * b
    ret = c + d
    return ret

# Enable Summary data collection for TensorBoard
writer = tf.summary.create_file_writer("summary")
# Specify True for the graph and profiler arguments so that the
# optimized graph can be inspected
tf.summary.trace_on(graph=True, profiler=True)

arg = tf.constant(8.9)
print(simple_func(arg))

# Output the collected Summary data
with writer.as_default():
    tf.summary.trace_export("summary", step=0, profiler_outdir="summary")

# Disable the collection of Summary data
tf.summary.trace_off()
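If TensorBoard is not already running, a typical way to start it (assuming TensorBoard was installed together with TensorFlow) is to point it at the summary directory written by the program above:

tensorboard --logdir summary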
Launch TensorBoard and load the Summary data written by the program. First, let's look at the user-defined graph. With the GRAPHS tab of TensorBoard selected, you can view the user-defined graph by choosing "Graph" in the radio buttons on the left.
The user-defined graph is exactly the computation graph defined in simple_func.
Next, let's check the computation graph after optimization. With the GRAPHS tab still selected, you can display the optimized graph by choosing "Profile" in the radio buttons on the left.
In the optimized graph, one Mul node and its input Const node are grayed out. Grayed-out nodes are nodes that were never executed inside TensorFlow; most likely they are no longer computed as a result of the graph optimization described above.
In this way, TensorBoard lets you compare the user-defined computation graph with the optimized one.
The graph optimization performed inside TensorFlow can be enabled or disabled by calling tf.config.optimizer.set_experimental_options().
Note that only the optimization passes run by Grappler can be toggled with tf.config.optimizer.set_experimental_options(); optimizations implemented as GraphOptimizationPass or GraphOptimizer are always applied.
Before writing a program that enables or disables graph optimizations, let's check the default optimization settings.
You can check the current graph optimization settings by calling tf.config.optimizer.get_experimental_options().
import tensorflow as tf
tf.config.optimizer.get_experimental_options()
Running the above yields the following result:
{'disable_meta_optimizer': False, 'disable_model_pruning': False}
disable_meta_optimizer is a setting that disables all optimization performed by Grappler, and it defaults to False. In other words, Grappler's optimizations are enabled by default.
Since no other options are set, the [default optimization settings](https://qiita.com/nuka137/items/f1b0fe9c820e4d5f80cc#grappler%E3%81%AB%E3%82%88%E3%82%8B%E6%9C%80%E9%81%A9%E5%8C%96%E9%A0%85%E7%9B%AE) are applied for the remaining optimization passes.
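As a minimal sketch of toggling an individual pass (constant_folding is used here purely as an illustration; the option keys follow the Grappler pass names), you can set an option and read the settings back:

import tensorflow as tf

# Disable only the Constant Folding pass; all other Grappler passes
# keep their default behavior.
tf.config.optimizer.set_experimental_options({'constant_folding': False})

# The returned dict now contains 'constant_folding': False in addition
# to the default entries shown above.
print(tf.config.optimizer.get_experimental_options())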
Now let's look at a concrete example of enabling and disabling an optimization and check its effect.
Debug Stripper is an optimization pass that deletes nodes used for debugging purposes (Assert and the like).
Debug Stripper is disabled by default, so Assert nodes added by tf.Assert are not removed. As a result, the following code raises an exception at tf.Assert.
import tensorflow as tf

@tf.function
def assert_func():
    a = tf.constant(1.2)
    tf.Assert(tf.less_equal(a, 1.0), [a])  # Exception "InvalidArgumentError" occurs
    return a

print(assert_func())
On the other hand, if you enable Debug Stripper and run the same code, the Assert node added by tf.Assert is removed and the exception above no longer occurs.
import tensorflow as tf

# Enable "Debug Stripper"
tf.config.optimizer.set_experimental_options({'debug_stripper': True})

@tf.function
def assert_func():
    a = tf.constant(1.2)
    tf.Assert(tf.less_equal(a, 1.0), [a])  # No exception
    return a

print(assert_func())
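With the Assert node stripped from the graph, the call completes normally and simply prints the constant, which should look like:

tf.Tensor(1.2, shape=(), dtype=float32)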
Assert nodes added for debugging take time to process (the tensor data must actually be checked, for example), which affects the execution time of the graph.
Once debugging is finished, you could remove each tf.Assert added for debugging by hand, but simply enabling Debug Stripper as shown here deletes all computation that exists only for debugging, so it is worth taking advantage of.
Earlier I wrote that all optimizations performed by Grappler can be disabled by setting disable_meta_optimizer to True; let's see the effect here.
First, let's check the optimized graph under the default optimization settings. The source code below builds a graph containing two consecutive Transpose operations.
import tensorflow as tf
import numpy as np

@tf.function
def optimized(arg):
    a = arg * 2
    # Deleted by the "Arithmetic Optimizer"
    b = tf.transpose(a, perm=[1, 0])
    ret = tf.transpose(b, perm=[1, 0])
    return ret

writer = tf.summary.create_file_writer("summary")
tf.summary.trace_on(graph=True, profiler=True)

arg = tf.constant(np.random.normal(size=(30, 40)))
optimized(arg)

with writer.as_default():
    tf.summary.trace_export("summary", step=0, profiler_outdir="summary")
tf.summary.trace_off()
If you check the optimized graph in TensorBoard, you can see that the Transpose nodes have been deleted. This is the work of RemoveIdentityTranspose in the Arithmetic Optimizer, which removes pairs of transposes that cancel each other out. You can also see that the Identity node, which does not affect the result, has been removed by the optimization.
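As a quick sanity check of why this rewrite is safe (a standalone snippet, not part of the article's original code): applying the same two-dimensional transpose twice is the identity mapping, so removing the pair cannot change the result.

import tensorflow as tf
import numpy as np

x = tf.constant(np.random.normal(size=(30, 40)))
# Transposing with perm=[1, 0] twice restores the original tensor,
# which is what allows RemoveIdentityTranspose to delete the pair.
y = tf.transpose(tf.transpose(x, perm=[1, 0]), perm=[1, 0])
print(bool(tf.reduce_all(tf.equal(x, y))))  # True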
Next, run the same computation graph with disable_meta_optimizer set to True, and check the optimized graph in TensorBoard.
import tensorflow as tf
import numpy as np

# Disable all computation graph optimization performed by Grappler
tf.config.optimizer.set_experimental_options({'disable_meta_optimizer': True})

@tf.function
def not_optimized(arg):
    a = arg * 2
    b = tf.transpose(a, perm=[1, 0])
    ret = tf.transpose(b, perm=[1, 0])
    return ret

writer = tf.summary.create_file_writer("summary")
tf.summary.trace_on(graph=True, profiler=True)

arg = tf.constant(np.random.normal(size=(30, 40)))
not_optimized(arg)

with writer.as_default():
    tf.summary.trace_export("summary", step=0, profiler_outdir="summary")
tf.summary.trace_off()
Looking at the resulting graph, you can see that the Transpose nodes remain, which confirms that the Arithmetic Optimizer has been disabled. Note that the Identity node has not been deleted either.
We have seen how the computation graph built by tf.function is optimized inside TensorFlow and how to control that optimization. New optimizations are still being added, so graph optimization can be expected to keep evolving. Let's look forward to what comes next.
In addition, a Pull Request I submitted to add an English document covering the content of this article was merged the other day. It should be published as official TensorFlow documentation, so I hope you will try out graph optimization for yourself on Google Colab. The merged document will probably be translated into Japanese and contributed to the TensorFlow documentation as well (https://qiita.com/Suguru_Toyohara/items/f5d9f42578eec7cc1497), so please look forward to it.