I stumbled on some undocumented behavior around variable names in tf.keras custom layers, so I'm writing it up here. The "variable name" in question is not a name in the Python sense, but the name (passed as a required argument) given to the TensorFlow variable (tf.Variable).
Before getting to the recommended way of writing a layer, let me explain a little about variable names.
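To make the distinction concrete, here is a quick illustration of my own (not from the tutorial) of what this name is:

import tensorflow as tf

v = tf.Variable([0.0], name='my_variable')
print(v.name)  # 'my_variable:0' -- the TensorFlow name, not the Python identifier 'v'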
In the sample code below, the variable names are not self.v1 or self.v2 but my_variable1 and my_variable2.
import tensorflow as tf

# Custom layer sample code
# A hand-rolled fully connected layer
class MyLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim):
        super().__init__()
        self.output_dim = output_dim
        # Bias term
        # It does not depend on the size of the input data
        self.v1 = self.add_weight(name='my_variable1', shape=[output_dim])

    def build(self, input_shape):
        # Affine matrix
        # Depends on the size of the input data
        self.v2 = self.add_weight(name='my_variable2', shape=[input_shape[1], self.output_dim])
        self.built = True

    def call(self, inputs, **kwargs):
        return tf.matmul(inputs, self.v2) + self.v1
The code up to this point is essentially what the official tutorial shows.
Let's actually run it and check.
model = MyLayer(output_dim=3)
# The build method runs the first time data is passed in, so feed in some suitably shaped data
x = tf.random.normal(shape=(3, 5))
y = model(x)
print(model.trainable_variables)
↓ This is the name
[<tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, numpy=array([-0.56484747, 0.00200152, 0.42238712], dtype=float32)>,
↓ This is the name
<tf.Variable 'my_layer/my_variable2:0' shape=(5, 3) dtype=float32, numpy=
array([[ 0.47857696, -0.04394728, 0.31904382],
[ 0.37552172, 0.22522384, 0.07408607],
[-0.74956644, -0.61549807, -0.41261673],
[ 0.4850598 , -0.45188528, 0.56900233],
[-0.39462167, 0.40858668, -0.5422235 ]], dtype=float32)>]
The names are my_variable1:0 and my_layer/my_variable2:0.
There are some extras attached, but we've confirmed that the variable names are my_variable1 and my_variable2 respectively, so all is well.
Or is it?
Let's continue with the previous example.
# Stacking the custom layers
model = tf.keras.Sequential([
    MyLayer(3),
    MyLayer(3),
    MyLayer(3)
])
y = model(x)
print(model.trainable_variables)
↓
[<tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, (omitted)>,
 <tf.Variable 'sequential/my_layer_1/my_variable2:0' shape=(5, 3) dtype=float32, (omitted)>,
 <tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, (omitted)>,
 <tf.Variable 'sequential/my_layer_2/my_variable2:0' shape=(3, 3) dtype=float32, (omitted)>,
 <tf.Variable 'my_variable1:0' shape=(3,) dtype=float32, (omitted)>,
 <tf.Variable 'sequential/my_layer_3/my_variable2:0' shape=(3, 3) dtype=float32, (omitted)>]
It's a sea of my_variable1 (sob). The three bias terms are indistinguishable.
Even when I drew histograms of the variables in TensorBoard, the names collided and I couldn't tell which was which.
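For reference, here is a sketch of how that collision shows up when logging; the 'logs' directory is just an example path, and model is the Sequential model from above:

import tensorflow as tf

writer = tf.summary.create_file_writer('logs')
with writer.as_default():
    for v in model.trainable_variables:
        # Strip the ':0' output-index suffix to use the name as a tag.
        # All three bias terms then share the tag 'my_variable1', so their
        # histograms land on top of one another.
        tf.summary.histogram(v.name.split(':')[0], v, step=0)

Now for the recommended way of writing the layer: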
class MyLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim):
        super().__init__()
        self.output_dim = output_dim

    def build(self, input_shape):
        # Bias term
        # It does not depend on the size of the input data
        self.v1 = self.add_weight(name='my_variable1', shape=[self.output_dim])
        # Affine matrix
        # Depends on the size of the input data
        self.v2 = self.add_weight(name='my_variable2', shape=[input_shape[1], self.output_dim])
        self.built = True

    def call(self, inputs, **kwargs):
        return tf.matmul(inputs, self.v2) + self.v1
The fix is simple: declare all the variables in the build method.
Since TensorFlow 2 is define-by-run, the model and layer hierarchy presumably cannot be resolved until the model is actually executed; build runs on the first call, when the layer already knows where it sits in the name scope. I think that is why __init__ and build behave so differently here.
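As a quick check of that (a sketch of mine, using the fixed MyLayer above), you can watch build fire on the first call:

import tensorflow as tf

layer = MyLayer(3)
print(layer.built)                     # False -- no variables exist yet
layer(tf.random.normal(shape=(1, 4)))  # the first call triggers build
print(layer.built)                     # True
print([v.name for v in layer.trainable_variables])
# e.g. ['my_layer_4/my_variable1:0', 'my_layer_4/my_variable2:0']
# -- both variables now carry the layer-name prefix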
By the way, built-in layers such as tf.keras.layers.Dense declare all of their variables in the build method, so you can use them with confidence.
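You can verify that yourself with a small sketch (the exact auto-generated prefix depends on your session):

import tensorflow as tf

d = tf.keras.layers.Dense(3)
print(d.weights)                    # [] -- nothing is created in __init__
d(tf.random.normal(shape=(1, 5)))   # the first call triggers build
print([v.name for v in d.weights])  # e.g. ['dense/kernel:0', 'dense/bias:0']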
When declaring a variable in a custom layer, always declare it in the build method. Do not declare it in the __init__ method.
Explanation of how the names are processed
The :0 at the end is added automatically per the TensorFlow specification. When you run on multiple GPUs, the variables are copied to each GPU and numbered :0, :1, :2, ... in order. This part of the specification is the same as in version 1.
In version 2, you can check this by running the same code as above on multiple GPUs using tf.distribute.MirroredStrategy or similar.
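A sketch of that check, assuming a machine with multiple GPUs (on a single device you will only ever see :0, and the exact replica naming is worth verifying on your own hardware):

import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()
with strategy.scope():
    mirrored = tf.keras.Sequential([MyLayer(3)])
    mirrored.build(input_shape=(None, 5))
print(mirrored.trainable_variables)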
my_layer is the default name given when you do not explicitly name the MyLayer instance; the class name is automatically converted to snake case.
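If you would rather control the prefix yourself, a sketch like the following works, assuming you extend __init__ to forward keyword arguments (including name=) to the base class:

import tensorflow as tf

class MyLayer(tf.keras.layers.Layer):
    def __init__(self, output_dim, **kwargs):
        super().__init__(**kwargs)  # passes name= through to Layer
        self.output_dim = output_dim

    def build(self, input_shape):
        self.v1 = self.add_weight(name='my_variable1', shape=[self.output_dim])
        self.v2 = self.add_weight(name='my_variable2',
                                  shape=[input_shape[1], self.output_dim])
        self.built = True

    def call(self, inputs, **kwargs):
        return tf.matmul(inputs, self.v2) + self.v1

layer = MyLayer(3, name='my_affine')
layer(tf.random.normal(shape=(3, 5)))
print([v.name for v in layer.trainable_variables])
# e.g. ['my_affine/my_variable1:0', 'my_affine/my_variable2:0']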
Also, in the second example using tf.keras.Sequential, the layers come out as my_layer_1, my_layer_2, my_layer_3. A numeric suffix is appended automatically to avoid name collisions; the numbering starts at _1 here because the first example had already created a my_layer in the same session.
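A side note from my own experiments: the counter is per-session state, and clearing the Keras session resets it, which is handy when re-running notebook cells:

import tensorflow as tf

tf.keras.backend.clear_session()
print(MyLayer(3).name)  # back to 'my_layer' rather than 'my_layer_4' etc.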
I believe this suffixing behavior is the same as in version 1. At least dm-sonnet, a wrapper library for TensorFlow, does the same thing.