Python is a programming language that is often used in machine learning research. It can be hard to use well, though, especially in basic research, because code that relies on many for statements runs slowly. Trying to get by without writing for statements brings various troubles of its own, and here I would like to introduce how to use `np.newaxis`, which is one I struggled with. You will often run into it when you want to experiment with various parameters.
References: "[Python] Numpy reference, extraction, combination"; "Indexing" (docs.scipy.org)
First, it is a good idea to sort out what kind of data you have and what goal you want to reach. Keep in mind what the data is, what shape the array has, and how many dimensions it has.
For the sake of simplicity, consider the following situation.
init.py
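import numpy as np
N, K = 4, 3  # example sizes (assumed for illustration)
a = np.random.rand(N)
b = np.random.rand(N, K)
assert a.shape == (N,) # ndarray
assert b.shape == (N, K) # ndarray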
`a` and `b` are both vectors, and I want to do `a + b`. However, since `b` has K samples, it is actually stored as a two-dimensional array of shape `(N, K)`. If you write this with a for statement,
naive.py
c = np.zeros((N, K))  # np.zeros takes the shape as a single tuple
for k in range(K):
    c[:, k] = a + b[:, k]
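However, I want to avoid writing for statements as much as possible. Simply writing `a + b` is not the answer either: NumPy lines up shapes starting from the last axis, so `(N,)` against `(N, K)` raises an error unless N happens to equal K, and leaning on automatic behavior without understanding it will not carry over to more complicated problems.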
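As a quick check of what plain `a + b` does with these shapes, here is a minimal sketch. The sizes N = 4 and K = 3 and the array contents are example values assumed for illustration, not taken from the original.

```python
import numpy as np

N, K = 4, 3  # example sizes, assumed for illustration
a = np.arange(N, dtype=float)  # shape (N,)
b = np.ones((N, K))            # shape (N, K)

# NumPy aligns shapes from the last axis, so the N of `a` is compared with K.
try:
    a + b
    plain_add_works = True
except ValueError:
    plain_add_works = False

print(plain_add_works)  # False whenever N != K (and neither is 1)
```

If N and K happened to be equal, the addition would run, but `a` would be added along the sample axis rather than along N, which is not the calculation wanted here.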
The goal this time is to do `a + b` for each of the K samples of `b` and get the result for each sample. Let's firmly define the shape of the final result (= the layout of the goal).
goal.py
c = np.zeros((N, K))
hogehoge()  # placeholder for whatever computation fills c
assert c.shape == (N, K)
It will be like this.
Normally, when I picture `a + b` in my head, I see two vectors being added element by element. But since `b` is a two-dimensional array holding multiple samples, it is not obvious how the calculation should be carried out.
So let's give NumPy instructions on how to calculate this. The tool for that is `np.newaxis`.
Actually, `np.newaxis` is nothing more than `None`, but let's use `np.newaxis` without worrying too much about that.
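The relationship between `np.newaxis` and `None` can be confirmed with a small sketch. The array `a` below is example data assumed for illustration.

```python
import numpy as np

a = np.arange(3)  # shape (3,), example data assumed for illustration

same_object = np.newaxis is None  # np.newaxis is an alias for None
col = a[:, np.newaxis]            # adds a length-1 axis: shape (3, 1)
col_none = a[:, None]             # the same indexing written with None

print(same_object, col.shape, col_none.shape)
```

Writing `np.newaxis` instead of `None` mainly serves readability: it states the intent of adding an axis.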
When an operation (here `+`) requires both arrays to have the same shape, this is where we tell NumPy "please make them the same shape". First, since `a.shape == (N,)`, we need at least `a.shape == (N, 1)`. To get that, use `a[:, np.newaxis]`. With this in place, NumPy works out the rest automatically from the other operand: it stretches `(N, 1)` to `(N, K)` and performs the calculation.
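To make the stretching step visible, the sketch below uses `np.broadcast_to` to do explicitly what NumPy does implicitly during the addition. The sizes and values are example assumptions.

```python
import numpy as np

N, K = 4, 3  # example sizes, assumed for illustration
a = np.arange(N, dtype=float)  # shape (N,)

a_col = a[:, np.newaxis]                    # shape (N, 1)
stretched = np.broadcast_to(a_col, (N, K))  # read-only view of shape (N, K)

# Every column of the stretched array is a copy of `a`.
print(a_col.shape, stretched.shape)
print(stretched)
```

No data is actually copied here; the length-1 axis is simply reused K times, which is why this approach is cheap compared with building the `(N, K)` array by hand.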
add_samples.py
import numpy as np
N, K = 4, 3  # example sizes (assumed for illustration)
a = np.random.rand(N)
b = np.random.rand(N, K)
assert a.shape == (N,) # ndarray
assert b.shape == (N, K) # ndarray
# a[:, (the missing axis goes here)]
c = a[:, np.newaxis] + b[:, :]
assert c.shape == (N, K)  # doubles as an error check and a note to self
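As a sanity check, the loop version and the `np.newaxis` version can be compared directly. This is a minimal sketch with randomly generated example data; the sizes and the seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, assumed for reproducibility
N, K = 5, 3                     # example sizes
a = rng.random(N)               # shape (N,)
b = rng.random((N, K))          # shape (N, K)

# Loop version: add `a` to each sample (column) of `b`.
c_loop = np.zeros((N, K))
for k in range(K):
    c_loop[:, k] = a + b[:, k]

# Broadcasting version: one expression, no for statement.
c_vec = a[:, np.newaxis] + b

matches = np.allclose(c_loop, c_vec)
print(matches)
```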
In this way, calmly understanding what NumPy does automatically makes it easier to deal with complex problems.
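The same idea extends to trying several parameter values at once. The sketch below is a hypothetical extension, not from the original: it assumes a scale parameter array `alphas` of shape `(M,)` and computes `a + alpha * b` for every value, giving a result of shape `(N, K, M)`.

```python
import numpy as np

N, K, M = 4, 3, 2                # example sizes, assumed for illustration
a = np.arange(N, dtype=float)    # shape (N,)
b = np.ones((N, K))              # shape (N, K)
alphas = np.array([0.5, 2.0])    # hypothetical parameter values, shape (M,)

# Give every array three axes (N, K, M), using length-1 axes where it has none.
d = (a[:, np.newaxis, np.newaxis]
     + alphas[np.newaxis, np.newaxis, :] * b[:, :, np.newaxis])

assert d.shape == (N, K, M)  # d[i, k, m] == a[i] + alphas[m] * b[i, k]
```

Writing out the target shape first, then deciding where each array needs a new axis, follows the same procedure used for the two-dimensional case.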
I will add more difficult problems later.
If you find the explanation difficult to understand, please let us know. Questions etc. are also accepted in the comments.