This article is about clipping the individual values held by tensors in TensorFlow.
Specifically, it explains and gives implementation examples for the tf.clip_by_... family of methods.
There are many possible uses, but in the machine-learning computations that are TensorFlow's main purpose (gradient calculations in particular), the variables involved often differ greatly in scale and the calculation does not go well. Clipping and normalization are used in such cases.
In TensorFlow, variables, constants, and the various operations on them are conventionally represented as Op nodes (see the reference article). When run in a session, an Op node holds the set of values (a tensor) produced by its operation and passes it on to the next Op node. In this article, such a set of values after some processing has finished is simply called a **node**.
The basic image: when a node's values meet a condition relative to some fixed reference, suppress them.
tf.clip_by_value
tf.clip_by_value(
t,
clip_value_min,
clip_value_max,
name=None
)
For each value held by the node, values greater than the maximum clip_value_max are changed to clip_value_max, and values smaller than the minimum clip_value_min are changed to clip_value_min.
example1.py
import numpy as np
import tensorflow as tf

p1 = tf.placeholder(tf.int32, 6, name='p1')
p2 = tf.placeholder(tf.float32, 6, name='p2')
clip_value1 = tf.clip_by_value(p1, clip_value_max=2, clip_value_min=-2, name='clip_value1')
clip_value2 = tf.clip_by_value(p2, clip_value_max=2., clip_value_min=-2., name='clip_value2')
num1 = np.linspace(-4, 6, 6)  # [-4, -2, 0, 2, 4, 6]
with tf.Session() as sess:
    print(p1.eval(feed_dict={p1: num1}, session=sess))
    print(p2.eval(feed_dict={p2: num1}, session=sess))
    print(clip_value1.eval(feed_dict={p1: num1}, session=sess))
    print(clip_value2.eval(feed_dict={p2: num1}, session=sess))
console
[-4 -2 0 2 4 6]
[-4. -2. 0. 2. 4. 6.]
[-2 -2 0 2 2 2]
[-2. -2. 0. 2. 2. 2.]
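For reference, the same clipping behavior can be reproduced with plain NumPy's np.clip; a minimal check outside the TensorFlow graph:
# NumPy equivalent of the clip_value1 / clip_value2 results above.
import numpy as np

num1 = np.linspace(-4, 6, 6)                  # [-4. -2.  0.  2.  4.  6.]
print(np.clip(num1, -2, 2))                   # [-2. -2.  0.  2.  2.  2.]
print(np.clip(num1.astype(np.int32), -2, 2))  # int32 version: [-2 -2  0  2  2  2]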
**If the dtype of the node and the dtype of the clip values do not match, an error is raised.**
example1.py
clip_error1 = tf.clip_by_value(p1, clip_value_max=2., clip_value_min=-2., name='clip_error1')  # float clip values on the int32 node p1
print(clip_error1.eval(feed_dict={p1: num1}, session=sess))
console
TypeError: Expected int32 passed to parameter 'y' of op 'Minimum', got 2.0 of type 'float' instead.
tf.clip_by_norm
tf.clip_by_norm(
t,
clip_norm,
axes=None,
name=None
)
If the node's L2 norm is greater than clip_norm, every value is rescaled so that the L2 norm becomes clip_norm (each element is multiplied by clip_norm / L2 norm). If the L2 norm is less than or equal to clip_norm, the values are left unchanged.
example2.py
p3 = tf.placeholder(tf.float32, [2, 3], name='p3')
clip_norm1 = tf.clip_by_norm(p3, clip_norm=4, name='clip_norm1')
clip_norm2 = tf.clip_by_norm(p3, clip_norm=5, name='clip_norm2')
num2 = np.linspace(-2, 3, 6).reshape((2, 3))
with tf.Session() as sess:
    print(p3.eval(feed_dict={p3: num2}, session=sess))
    print(clip_norm1.eval(feed_dict={p3: num2}, session=sess))
    print(clip_norm2.eval(feed_dict={p3: num2}, session=sess))
console
[[-2. -1. 0.]
[ 1. 2. 3.]] #The overall L2 norm is 4.358 ...
[[-1.8353258 -0.9176629 0. ]
[ 0.9176629 1.8353258 2.7529888]]
[[-2. -1. 0.]
[ 1. 2. 3.]]
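The first result can be reproduced by hand: the overall L2 norm (about 4.358) exceeds clip_norm=4, so every element is multiplied by clip_norm / norm, while with clip_norm=5 the norm is already smaller and nothing changes. A minimal check with plain NumPy:
# NumPy check of the clip_by_norm scaling rule.
import numpy as np

num2 = np.linspace(-2, 3, 6).reshape((2, 3))
l2 = np.linalg.norm(num2)                  # about 4.358
print(num2 * 4 / l2)                       # matches clip_norm1 above
print(num2 if l2 <= 5 else num2 * 5 / l2)  # clip_norm=5: unchanged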
You can also specify axes in tf.clip_by_norm.
In that case the L2 norm is computed along the specified axes, and each slice is clipped independently.
example3.py
clip_norm3 = tf.clip_by_norm(p3, clip_norm=3, axes=1, name='clip_norm3')
with tf.Session() as sess:
    print(p3.eval(feed_dict={p3: num2}, session=sess))
    print(clip_norm3.eval(feed_dict={p3: num2}, session=sess))
console
[[-2. -1. 0.] #The L2 norm of row 0 is 2.236 ... (below clip_norm=3, so unchanged)
[ 1. 2. 3.]] #The L2 norm of row 1 is 3.741 ... (above clip_norm=3, so clipped)
[[-2. -1. 0. ]
[ 0.8017837 1.6035674 2.4053512]]
In addition, tf.clip_by_norm raises a TypeError if the node passed to it has a dtype that cannot represent fractional values.
Use a float or complex dtype (float32, float64, complex64, etc.).
tf.clip_by_global_norm
tf.clip_by_global_norm(
t_list,
clip_norm,
use_norm=None,
name=None
)
Unlike tf.clip_by_norm, this takes a **list of nodes** rather than a single node. Passing a node by itself raises a TypeError.
The L2 norm computed over every value of every node in the list is called global_norm. If global_norm is greater than clip_norm, all values in the list are rescaled (multiplied by clip_norm / global_norm) so that the overall L2 norm becomes clip_norm. If it is less than or equal to clip_norm, nothing is changed.
There are two return values: list_clipped, a list containing the **nodes after clipping**, and the computed global_norm.
example4.py
c1 = tf.constant([[0, 1, 2], [3, 4, 5]], dtype=tf.float32, name='c1')
c2 = tf.constant([[-2, -4], [2, 4]], dtype=tf.float32, name='c2')
C = [c1, c2]
clip_global_norm, global_norm = tf.clip_by_global_norm(C, clip_norm=9, name='clip_global_norm')
with tf.Session() as sess:
    for c in C:
        print(c.eval(session=sess))
    print(global_norm.eval(session=sess))
    for cgn in clip_global_norm:
        print(cgn.eval(session=sess))
console
[[0. 1. 2.]
[3. 4. 5.]]
[[-2. -4.]
[ 2. 4.]]
9.746795
[[0. 0.9233805 1.846761 ]
[2.7701416 3.693522 4.6169024]]
[[-1.846761 -3.693522]
[ 1.846761 3.693522]]
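The scaling can again be checked by hand: global_norm is the L2 norm over all values of both tensors, sqrt(55 + 40) = sqrt(95) ≈ 9.7468, which exceeds clip_norm=9, so every element is multiplied by 9 / 9.7468. A minimal NumPy check of that rule:
# NumPy check of the clip_by_global_norm scaling rule.
import numpy as np

a = np.array([[0., 1., 2.], [3., 4., 5.]])
b = np.array([[-2., -4.], [2., 4.]])
global_norm = np.sqrt((a ** 2).sum() + (b ** 2).sum())  # about 9.7468
scale = 9 / global_norm                                 # clip_norm / global_norm
print(a * scale)  # matches the first clipped tensor above
print(b * scale)  # matches the second clipped tensor above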
The tf.clip_by_norm and tf.clip_by_global_norm methods are simple in themselves, but they can be used, for example, for "gradient clipping", a countermeasure against exploding gradients in RNNs.
The following references are helpful:
- Pascanu et al. (2012), On the difficulty of training Recurrent Neural Networks (pdf)
- [Understanding LSTM - with recent trends (RNNs and gradient clipping)](https://qiita.com/t_Signull/items/21b82be280b46f467d1b#rnn%E3%81%A8%E5%8B%BE%E9%85%8D%E3%81%AE%E3%82%AF%E3%83%AA%E3%83%83%E3%83%94%E3%83%B3%E3%82%B0)
After building a model, if back-propagation blows up to inf during training, applying this technique may solve the problem.
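As a rough sketch of what that looks like in TF 1.x code (assuming a scalar loss tensor named loss already exists in the graph; the AdamOptimizer, learning rate, and clip_norm=5.0 are arbitrary choices for illustration):
# Sketch: gradient clipping with tf.clip_by_global_norm (TF 1.x style).
# `loss` is assumed to be a scalar loss tensor already defined in the graph.
params = tf.trainable_variables()
grads = tf.gradients(loss, params)
# Rescale all gradients together so that their global L2 norm is at most 5.0.
clipped_grads, grad_global_norm = tf.clip_by_global_norm(grads, clip_norm=5.0)
optimizer = tf.train.AdamOptimizer(learning_rate=1e-3)
train_op = optimizer.apply_gradients(zip(clipped_grads, params))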
The clipping operations above actually change the values held by the node, but it is also possible to compute the norm only.
tf.norm
tf.norm(
tensor,
ord='euclidean',
axis=None,
keepdims=None,
name=None
)
The parameter `ord` determines the value of p in the Lp norm. For the L∞ norm, specify np.inf.
example5.py
p4 = tf.placeholder(tf.float32, [3, 4], name='p4')
normalize1 = tf.norm(p4, name='normalize1')
normalize2 = tf.norm(p4, ord=1.5, axis=0, name='normalize2')
normalize3 = tf.norm(p4, ord=np.inf, axis=1, name='normalize3')
num3 = np.linspace(-10, 8, 12).reshape((3, 4))
with tf.Session() as sess:
    print(p4.eval(feed_dict={p4: num3}, session=sess))
    print(normalize1.eval(feed_dict={p4: num3}, session=sess))
    print(normalize2.eval(feed_dict={p4: num3}, session=sess))
    print(normalize3.eval(feed_dict={p4: num3}, session=sess))
console
[[-10. -8.363636 -6.7272725 -5.090909 ]
[ -3.4545455 -1.8181819 -0.18181819 1.4545455 ]
[ 3.090909 4.7272725 6.3636365 8. ]]
19.87232
[12.364525 11.0871725 10.408293 10.876119 ]
[10. 3.4545455 8. ]
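For reference, these values can be cross-checked with plain NumPy, since np.linalg.norm accepts the same ord and axis arguments for vector norms along an axis:
# NumPy cross-check of the tf.norm results above.
import numpy as np

num3 = np.linspace(-10, 8, 12).reshape((3, 4))
print(np.linalg.norm(num3))                      # Frobenius norm, about 19.87
print(np.linalg.norm(num3, ord=1.5, axis=0))     # L1.5 norm of each column
print(np.linalg.norm(num3, ord=np.inf, axis=1))  # L-infinity norm of each row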
- TensorFlow > API > TensorFlow Core r2.0 > Python > tf.clip_by_value
- TensorFlow > API > TensorFlow Core r2.0 > Python > tf.clip_by_norm
- TensorFlow > API > TensorFlow Core r2.0 > Python > tf.clip_by_global_norm
- TensorFlow > API > TensorFlow Core r2.0 > Python > tf.norm
Tomorrow is "Hacking the reservation system using Ruby" by @yoshishin. Please continue to enjoy GMO Advent Calendar 2019!