RaggedTensor, which represents variable-length data, is available in TensorFlow 2.1 and later, but if you try to manipulate it with the same habits as an ordinary Tensor, there are various pitfalls. This time the topic is indexing: retrieving values from a RaggedTensor at specific indices. Once you get used to it, you can pull off fairly involved operations.
Suppose that x, the RaggedTensor to be indexed, is created as follows.
x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))
print(x)
# <tf.RaggedTensor [[0], [1, 2], [3, 4, 5], [6, 7, 8, 9], [10, 11, 12, 13, 14]]>
Column index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Row 0 | 0 | ||||
Row 1 | 1 | 2 | |||
Row 2 | 3 | 4 | 5 | ||
Row 3 | 6 | 7 | 8 | 9 | |
Row 4 | 10 | 11 | 12 | 13 | 14 |
The first operation is retrieving rows, which works just as for an ordinary Tensor; you can think of it as with numpy.ndarray. When you specify a range, **the first index is included and the last index is excluded.** Python users should find nothing surprising here.
print(x[2])
# tf.Tensor([3 4 5], shape=(3,), dtype=int32)
print(x[1:4])
# <tf.RaggedTensor [[1, 2], [3, 4, 5], [6, 7, 8, 9]]>
However, unlike numpy.ndarray, indexing with a list of discrete rows does not seem to be supported.
# This works for an ndarray
print(x.numpy()[[1, 3]])
# [array([1, 2], dtype=int32) array([6, 7, 8, 9], dtype=int32)]
# Not available for a Tensor/RaggedTensor
print(x[[1, 3]])
# InvalidArgumentError: slice index 3 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
Use tf.gather() instead.
# Fancy indexing for a Tensor/RaggedTensor
print(tf.gather(x, [1, 3], axis=0))
# <tf.RaggedTensor [[1, 2], [6, 7, 8, 9]]>
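As an aside (this example is my addition, not from the original article): if what you have is a boolean mask over the rows rather than an index list, tf.ragged.boolean_mask achieves the same selection.

```python
import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))

# Select rows 1 and 3 with a boolean mask instead of an index list
mask = tf.constant([False, True, False, True, False])
picked = tf.ragged.boolean_mask(x, mask)
print(picked)
# <tf.RaggedTensor [[1, 2], [6, 7, 8, 9]]>
```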
The following is an example of slicing with a fixed column index. Unlike an ordinary Tensor, whether an element exists at that index depends on the row, so simply writing
print(x[:, 2])
# ValueError: Cannot index into an inner ragged dimension.
like this fails. If you specify a range instead,
print(x[:, 2:3])
# <tf.RaggedTensor [[], [], [5], [8], [12]]>
it works. Rows where the specified index does not exist become [].
Column index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Row 0 | 0 | ||||
Row 1 | 1 | 2 | |||
Row 2 | 3 | 4 | **5** | ||
Row 3 | 6 | 7 | **8** | 9 | |
Row 4 | 10 | 11 | **12** | 13 | 14 |
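The same range-based trick can be pushed further. As a small sketch of my own (not in the original article), x[:, :2] keeps at most the first two elements of every row, leaving shorter rows as they are.

```python
import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))

# Keep at most the first two elements of each row
head2 = x[:, :2]
print(head2)
# <tf.RaggedTensor [[0], [1, 2], [3, 4], [6, 7], [10, 11]]>
```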
If you have a Tensor that lists the 2-D indices you want to gather, you can use tf.gather_nd().
ind = tf.constant([[0, 0], [1, 1], [2, 0], [4, 3]])
# We want the elements of x at (0, 0), (1, 1), (2, 0), (4, 3)
print(tf.gather_nd(x, ind))
# tf.Tensor([ 0 2 3 13], shape=(4,), dtype=int32)
Column index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Row 0 | **0** | ||||
Row 1 | 1 | **2** | |||
Row 2 | **3** | 4 | 5 | ||
Row 3 | 6 | 7 | 8 | 9 | |
Row 4 | 10 | 11 | 12 | **13** | 14 |
On the other hand, you may sometimes want to fetch exactly one element per row, but from a different column in each row.
col = tf.constant([0, 0, 2, 1, 2])
# We want the elements of x at (0, 0), (1, 0), (2, 2), (3, 1), (4, 2)
# Prepend the row number to each index, then use the same method as before
ind = tf.transpose(tf.stack([tf.range(tf.shape(col)[0]), col]))
print(tf.gather_nd(x, ind))
# tf.Tensor([ 0 1 5 7 12], shape=(5,), dtype=int32)
Column index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Row 0 | **0** | ||||
Row 1 | **1** | 2 | |||
Row 2 | 3 | 4 | **5** | ||
Row 3 | 6 | **7** | 8 | 9 | |
Row 4 | 10 | 11 | **12** | 13 | 14 |
However, this seems likely to be slow, so I came up with a smarter way.
print(tf.gather(x.values, x.row_starts() + col))
# tf.Tensor([ 0 1 5 7 12], shape=(5,), dtype=int32)
This works. The actual values of x live in a Tensor (not a RaggedTensor, and with one less dimension) that concatenates all the rows; it is accessible as x.values. To represent its shape, x also stores the start index of each row, available as x.row_starts(). Therefore you can add the desired column offsets to these start indices and gather from x.values directly.
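To make this concrete (my own illustration, not from the article), here is what the two attributes contain for the running example.

```python
import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))

# The flat values backing x: an ordinary 1-D Tensor
print(x.values)        # 0, 1, ..., 14
# The index in x.values at which each row begins
print(x.row_starts())  # 0, 1, 3, 6, 10
```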
%timeit tf.gather_nd(x, tf.transpose(tf.stack([tf.range(tf.shape(col)[0]), col])))
# 739 µs ± 75.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%timeit tf.gather(x.values, x.row_starts() + col)
# 124 µs ± 6.47 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
The latter is clearly faster (^_^)
If you want to master these operations, the official documentation is worth a look. The next example applies the fact that the actual values live in a one-dimensional Tensor.
col = tf.ragged.constant([[0], [], [0, 2], [1, 3], [2]])
# We want the elements of x at (0, 0), (2, 0), (2, 2), (3, 1), (3, 3), (4, 2)
# Get the start index of each row of x
row_starts = tf.cast(x.row_starts(), "int32")
# Find the row each component of col belongs to, convert it to the corresponding start index in x, and add the offset
ind_flat = tf.gather(row_starts, col.value_rowids()) + col.values
ret = tf.gather(x.values, ind_flat)
print(ret)
# tf.Tensor([ 0 3 5 7 9 12], shape=(6,), dtype=int32)
Column index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Row 0 | **0** | ||||
Row 1 | 1 | 2 | |||
Row 2 | **3** | 4 | **5** | ||
Row 3 | 6 | **7** | 8 | **9** | |
Row 4 | 10 | 11 | **12** | 13 | 14 |
The result above is an ordinary Tensor with the values listed; the information about which row each value came from is lost. What if you want to keep the row information? You can build a RaggedTensor by giving that Tensor the row structure. The row lengths of the result match those of col, so the row ids from col.value_rowids() can be reused directly.
print(tf.RaggedTensor.from_value_rowids(ret, col.value_rowids()))
# <tf.RaggedTensor [[0], [], [3, 5], [7, 9], [12]]>
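As a side note of mine (not from the article), the same structure can also be rebuilt from row lengths: col.row_lengths() combined with tf.RaggedTensor.from_row_lengths() gives an identical result.

```python
import tensorflow as tf

x = tf.RaggedTensor.from_row_lengths(tf.range(15), tf.range(1, 6))
col = tf.ragged.constant([[0], [], [0, 2], [1, 3], [2]])

# Same gather as in the article
row_starts = tf.cast(x.row_starts(), "int32")
ind_flat = tf.gather(row_starts, col.value_rowids()) + col.values
ret = tf.gather(x.values, ind_flat)

# Rebuild the ragged structure from row lengths instead of row ids
ragged_ret = tf.RaggedTensor.from_row_lengths(ret, col.row_lengths())
print(ragged_ret)
# <tf.RaggedTensor [[0], [], [3, 5], [7, 9], [12]]>
```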
Even when the items arranged along the variable-length axis are themselves two-dimensional or more (so the RaggedTensor has three or more dimensions including the batch dimension), the methods above can be used as they are.
x = tf.RaggedTensor.from_row_lengths(tf.reshape(tf.range(30), (15, 2)), tf.range(1, 6))
print(x)
# <tf.RaggedTensor [[[0, 1]], [[2, 3], [4, 5]], [[6, 7], [8, 9], [10, 11]], [[12, 13], [14, 15], [16, 17], [18, 19]], [[20, 21], [22, 23], [24, 25], [26, 27], [28, 29]]]>
The structure of this x
can be interpreted as follows.
Column index | 0 | 1 | 2 | 3 | 4 |
---|---|---|---|---|---|
Row 0 | [0, 1] | ||||
Row 1 | [2, 3] | [4, 5] | |||
Row 2 | [6, 7] | [8, 9] | [10, 11] | ||
Row 3 | [12, 13] | [14, 15] | [16, 17] | [18, 19] | |
Row 4 | [20, 21] | [22, 23] | [24, 25] | [26, 27] | [28, 29] |
The rest is exactly the same as before, but note that the returned Tensor is two-dimensional.
ind = tf.constant([[0, 0], [1, 1], [2, 0], [4, 3]])
# We want the elements of x at (0, 0), (1, 1), (2, 0), (4, 3)
print(tf.gather_nd(x, ind))
# tf.Tensor(
# [[ 0 1]
# [ 4 5]
# [ 6 7]
# [26 27]], shape=(4, 2), dtype=int32)
col = tf.constant([0, 0, 2, 1, 2])
# We want the elements of x at (0, 0), (1, 0), (2, 2), (3, 1), (4, 2)
print(tf.gather(x.values, x.row_starts() + col))
# tf.Tensor(
# [[ 0 1]
# [ 2 3]
# [10 11]
# [14 15]
# [24 25]], shape=(5, 2), dtype=int32)
col = tf.ragged.constant([[0], [], [0, 2], [1, 3], [2]])
# We want the elements of x at (0, 0), (2, 0), (2, 2), (3, 1), (3, 3), (4, 2)
# Get the start index of each row of x
row_starts = tf.cast(x.row_starts(), "int32")
# Find the row each component of col belongs to, convert it to the corresponding start index in x, and add the offset
ind_flat = tf.gather(row_starts, col.value_rowids()) + col.values
ret = tf.gather(x.values, ind_flat)
print(ret)
# tf.Tensor(
# [[ 0 1]
# [ 6 7]
# [10 11]
# [14 15]
# [18 19]
# [24 25]], shape=(6, 2), dtype=int32)
# If you want to keep the information about the original rows
print(tf.RaggedTensor.from_value_rowids(ret, col.value_rowids()))
# <tf.RaggedTensor [[[0, 1]], [], [[6, 7], [10, 11]], [[14, 15], [18, 19]], [[24, 25]]]>