This series is training for becoming a data scientist with Python. At last, we get to NumPy, a library for manipulating large multidimensional arrays and matrices. Serious data analysis in Python is hard to imagine without it.

**PREV** → [Python] Road to snake charmer (3) Python class **NEXT** → [Python] Road to snake charmer (5) Play with Matplotlib

Array generation

First, import numpy.

>>> import numpy as np

Generate 1D, 2D, and 3D arrays.

>>> a1 = np.array([1, 2, 3], int)
>>> a2 = np.array([[1, 2, 3], [4, 5, 6]])
>>> a3 = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

Examine the data type (dtype) and shape (shape).

>>> a2.dtype, a1.shape, a2.shape, a3.shape
    (dtype('int64'), (3,), (2, 3), (2, 2, 3))
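Besides dtype and shape, a few other attributes are handy when inspecting an array. The following is a minimal sketch; the dtype is pinned to `np.int64` as an assumption, because the default integer type depends on the platform and may not be int64 everywhere.

```python
import numpy as np

# dtype is pinned here because the default integer type varies by platform.
a3 = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]], dtype=np.int64)

ndim = a3.ndim                # number of axes
size = a3.size                # total number of elements (2 * 2 * 3)
itemsize = a3.dtype.itemsize  # bytes per element
nbytes = a3.nbytes            # size * itemsize
print(ndim, size, itemsize, nbytes)
```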

Convert the array to another data type.

>>> a1.astype(float), a1.astype(complex)
    (array([ 1.,  2.,  3.]), array([ 1.+0.j,  2.+0.j,  3.+0.j]))
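Two details of `astype` are easy to miss: by default it returns a new array rather than modifying the original, and converting float to int discards the fractional part. A small sketch with made-up values:

```python
import numpy as np

a1 = np.array([1, 2, 3], int)

# astype returns a new array; writing to it leaves a1 untouched.
f = a1.astype(float)
f[0] = 99.0

# Float -> int conversion truncates toward zero (it does not round).
t = np.array([1.9, -1.9, 2.5]).astype(int)
print(a1, f, t)
```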

Generate an array filled with zeros.

>>> np.zeros((2, 3), int)
    array([[0, 0, 0],
           [0, 0, 0]])

Generate an array filled with ones.

>>> np.ones((2, 3), int)
    array([[1, 1, 1],
           [1, 1, 1]])

Generate a 3x3 identity matrix.

>>> np.identity(3, int)
    array([[1, 0, 0],
           [0, 1, 0],
           [0, 0, 1]])
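A few related constructors cover cases that zeros, ones, and identity do not. This sketch uses `np.full`, `np.eye`, and `np.zeros_like`; the variable names are only for illustration.

```python
import numpy as np

sevens = np.full((2, 3), 7)        # every element is 7
rect = np.eye(2, 3, dtype=int)     # 2x3, ones on the main diagonal
upper = np.eye(3, k=1, dtype=int)  # ones on the diagonal just above the main one
like = np.zeros_like(sevens)       # zeros with the shape and dtype of `sevens`
print(sevens, rect, upper, like, sep="\n")
```

Unlike `np.identity`, `np.eye` accepts a non-square shape and a diagonal offset `k`.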

Generate a diagonal matrix from diagonal components.

>>> a5 = np.diag([1, 2, 3]); a5
    array([[1, 0, 0],
           [0, 2, 0],
           [0, 0, 3]])

Extract the diagonal components from the square matrix.

>>> np.diag(a5)
    array([1, 2, 3])
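`np.diag` also takes an offset `k` to work with diagonals above or below the main one. A short sketch on a 3x3 example matrix (the matrix `m` is just an illustration):

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)  # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

above = np.diag(m, k=1)      # diagonal just above the main one
below = np.diag(m, k=-1)     # diagonal just below the main one
built = np.diag([1, 2], k=1) # 3x3 matrix with [1, 2] placed above the main diagonal
print(above, below, built, sep="\n")
```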

Shape change

Generate an array with shape = (2,3).

>>> a1 = np.array([[1, 2, 3], [4, 5, 6]]); a1
    array([[1, 2, 3],
           [4, 5, 6]])

Reshape it into a one-dimensional array with shape = (6,).

>>> a1.reshape(6,)
    array([1, 2, 3, 4, 5, 6])

Reshape it into a two-dimensional array with shape = (3,2).

>>> a1.reshape(3, 2)
    array([[1, 2],
           [3, 4],
           [5, 6]])
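Some behavior of `reshape` worth knowing: one dimension may be given as -1 and is then inferred, the result typically shares memory with the original when the data is contiguous, and an incompatible shape raises `ValueError`. A minimal sketch:

```python
import numpy as np

a1 = np.array([[1, 2, 3], [4, 5, 6]])

b = a1.reshape(3, -1)   # -1 is inferred as 2, giving shape (3, 2)
flat = a1.reshape(-1)   # shape (6,)

# For a contiguous array like this one, reshape returns a view, not a copy.
shares = np.shares_memory(a1, b)

# The total number of elements must match: 6 elements cannot fill 4 x 2.
try:
    a1.reshape(4, 2)
    failed = False
except ValueError:
    failed = True
print(b.shape, flat.shape, shares, failed)
```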

Subarray

slice

A slice describes indices in arithmetic progression (start:stop:step); passing it inside [...] returns a subarray.

>>> a1 = np.array(range(1, 10)).reshape(3, 3); a1
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])

Extract rows 1 up to (but not including) 3 and columns 1 up to (but not including) 3.

>>> a1[1:3, 1:3]
    array([[5, 6],
           [8, 9]])

A negative value for the third parameter (the step) of a slice means reverse order.

>>> a1[::-1, ::-1]
    array([[9, 8, 7],
           [6, 5, 4],
           [3, 2, 1]])

You can also use the slice object in this way.

>>> slice1 = slice(0, 3, 2) # 0:3:2
>>> a1[slice1, slice1]
    array([[1, 3],
           [7, 9]])
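The slice literal, the `slice` object, and NumPy's `np.s_` helper all describe the same thing. A small sketch to confirm the equivalence:

```python
import numpy as np

a1 = np.arange(1, 10).reshape(3, 3)

s = np.s_[0:3:2]             # np.s_ builds a slice object from slice syntax
same = s == slice(0, 3, 2)   # slice objects compare by (start, stop, step)

corners = a1[s, s]           # via slice objects
literal = a1[0:3:2, 0:3:2]   # via slice syntax
print(same, corners, literal, sep="\n")
```

Storing a slice in a variable is convenient when the same range is applied to several arrays or axes.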

fancy indexing

Fancy indexing is a way to get a subarray by passing an arbitrary sequence of indices inside [...].

Extract the 0th and 2nd rows.

>>> a1[[0, 2], :]
    array([[1, 2, 3],
           [7, 8, 9]])

Extract the 0th column and the 2nd column.

>>> a1[:, [0, 2]]
    array([[1, 3],
           [4, 6],
           [7, 9]])

Fancy indexing on both axes. Extract the [0,0] and [2,2] components.

>>> a1[[0, 2], [0, 2]]
    array([1, 9])

Please note that the above result is a 1D array, which is different from the following.

>>> a1[[0, 2], :][:, [0, 2]]
    array([[1, 3],
           [7, 9]])
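When the 2D sub-block is what you want, `np.ix_` builds an index that selects the cross product of the row and column lists in one step. A sketch comparing the three forms:

```python
import numpy as np

a1 = np.arange(1, 10).reshape(3, 3)

# Index lists on both axes are paired element-wise: (0, 0) and (2, 2).
pairs = a1[[0, 2], [0, 2]]

# np.ix_ selects every combination of the given rows and columns.
block = a1[np.ix_([0, 2], [0, 2])]

# Chained indexing gives the same block, at the cost of an intermediate copy.
chained = a1[[0, 2], :][:, [0, 2]]
print(pairs, block, chained, sep="\n")
```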

Fancy indexing can also be used to permute data, like this.

>>> a1[[2, 0, 1], :]
    array([[7, 8, 9],
           [1, 2, 3],
           [4, 5, 6]])

boolean indexing

Boolean indexing is a way to get a subarray containing only the elements whose position is True, by passing a boolean array inside [...].

Create a boolean array that is True only where the value is even.

>>> a1 = np.array(range(1, 10)).reshape(3, 3); a1
    array([[1, 2, 3],
           [4, 5, 6],
           [7, 8, 9]])

>>> a2 = (a1 % 2 == 0); a2
    array([[False,  True, False],
           [ True, False,  True],
           [False,  True, False]], dtype=bool)

Extract only the components whose index corresponds to True as a 1D array.

>>> a1[a2]
    array([2, 4, 6, 8])
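A boolean mask can also be used on the left-hand side of an assignment, and masks can be combined with `&` and `|`. A minimal sketch (the threshold 4 is an arbitrary example value):

```python
import numpy as np

a1 = np.arange(1, 10).reshape(3, 3)
mask = a1 % 2 == 0

# Assign through the mask: every even element becomes 0 (on a copy here).
b = a1.copy()
b[mask] = 0

# Combine conditions element-wise; the parentheses are required.
big_even = a1[(a1 % 2 == 0) & (a1 > 4)]

# np.nonzero recovers the row and column indices of the True entries.
rows, cols = np.nonzero(mask)
print(b, big_even, rows, cols, sep="\n")
```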

General rules for subarrays

A NumPy subarray is a view (a reference to the original data) when the index consists only of slices. In other cases (fancy indexing, boolean indexing, etc.) it is a copy. Even when a slice and fancy indexing are mixed, as in `a1[[0, 2], :]`, the result is a copy.
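The difference can be checked directly with `np.shares_memory` and by writing to each subarray. A minimal sketch:

```python
import numpy as np

a1 = np.arange(1, 10).reshape(3, 3)

s = a1[1:3, 1:3]    # slices only: shares memory with a1
f = a1[[0, 2], :]   # fancy index mixed with a slice: independent copy

slice_shares = np.shares_memory(a1, s)
fancy_shares = np.shares_memory(a1, f)

s[0, 0] = 50        # writes through to a1[1, 1]
f[0, 0] = -1        # does not touch a1[0, 0]
print(slice_shares, fancy_shares)
print(a1)
```

When an independent subarray is needed from a slice, calling `.copy()` on it avoids accidental writes to the original.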

Joining arrays

First, prepare two 2x2 two-dimensional arrays.

>>> a1 = np.array(range(1, 5)).reshape(2, 2); a1
    array([[1, 2],
           [3, 4]])

>>> a2 = np.array(range(5, 9)).reshape(2, 2); a2
    array([[5, 6],
           [7, 8]])

Join along the row direction (0th axis). The result has shape = (4,2).

>>> np.vstack((a1, a2))  # same as np.concatenate((a1, a2), axis=0)
    array([[1, 2],
           [3, 4],
           [5, 6],
           [7, 8]])

Join along the column direction (1st axis). The result has shape = (2,4).

>>> np.hstack((a1, a2))  # same as np.concatenate((a1, a2), axis=1)
    array([[1, 2, 5, 6],
           [3, 4, 7, 8]])
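The equivalence with `np.concatenate` can be verified as follows. The sketch also shows `np.stack`, which, unlike the functions above, joins along a new axis.

```python
import numpy as np

a1 = np.arange(1, 5).reshape(2, 2)
a2 = np.arange(5, 9).reshape(2, 2)

v = np.concatenate((a1, a2), axis=0)  # same result as np.vstack((a1, a2))
h = np.concatenate((a1, a2), axis=1)  # same result as np.hstack((a1, a2))

# np.stack adds a new leading axis instead of extending an existing one.
s = np.stack((a1, a2))
print(v.shape, h.shape, s.shape)
```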

Swapping axes

Prepare a 2D array with shape = (2,3).

>>> a1 = np.array(range(2 * 3)).reshape(2, 3); a1
    array([[0, 1, 2],
           [3, 4, 5]])

Transpose (swap the 0th axis (row) and 1st axis (column)).

B_{ji} = A_{ij}

>>> a1.T
    array([[0, 3],
           [1, 4],
           [2, 5]])

Prepare a 4-dimensional array with shape = (2,3,4,5).

>>> a2 = np.array(range(2 * 3 * 4 * 5)).reshape(2, 3, 4, 5)
>>> a2.shape
    (2, 3, 4, 5)

Swap the 1st and 3rd axes.

B_{ilkj} = A_{ijkl}

>>> a2.swapaxes(1, 3).shape
    (2, 5, 4, 3)

Let the 1st, 2nd, 3rd and 0th axes be the new 0th, 1st, 2nd and 3rd axes.

B_{jkli} = A_{ijkl}

>>> a3 = a2.transpose(1, 2, 3, 0)

The same operation can be written more readably with einsum, because the index string shows the axis order directly.

>>> a4 = np.einsum('ijkl->jkli', a2)
>>> a3.shape, a4.shape, (a3 == a4).all()
    ((3, 4, 5, 2), (3, 4, 5, 2), True)
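The index formulas can be checked element by element. The sketch below picks one arbitrary index tuple (i, j, k, l) = (1, 2, 3, 4) and confirms both relations; it also writes the `swapaxes(1, 3)` case as an einsum string.

```python
import numpy as np

a2 = np.arange(2 * 3 * 4 * 5).reshape(2, 3, 4, 5)
i, j, k, l = 1, 2, 3, 4  # arbitrary example indices

# B_{ilkj} = A_{ijkl}
b = a2.swapaxes(1, 3)
swap_ok = b[i, l, k, j] == a2[i, j, k, l]
e = np.einsum('ijkl->ilkj', a2)  # same permutation written as an index string

# B_{jkli} = A_{ijkl}
a3 = a2.transpose(1, 2, 3, 0)
perm_ok = a3[j, k, l, i] == a2[i, j, k, l]

# transpose only rearranges axes; no data is copied.
is_view = np.shares_memory(a2, a3)
print(swap_ok, perm_ok, is_view)
```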

outer and kron

Prepare the following matrix.

A =
\left(
\begin{matrix}
1 & 2 \\
3 & 4
\end{matrix}
\right)
\qquad
I =
\left(
\begin{matrix}
1 & 0 \\
0 & 1
\end{matrix}
\right)

>>> A = np.array([[1, 2], [3, 4]]); A
    array([[1, 2],
           [3, 4]])

>>> I = np.identity(2); I  # 2x2 identity matrix
    array([[ 1.,  0.],
           [ 0.,  1.]])

Flatten I and A into one-dimensional vectors and compute the tensor (outer) product of the two vectors. np.outer does the flattening itself.

C =
\left(
\begin{matrix}
1\\
0\\
0\\
1 
\end{matrix}
\right)
\otimes
\left(
\begin{matrix}
1\\
2\\
3\\
4 
\end{matrix}
\right)
=
\left(
\begin{matrix}
1&2&3&4\\
0&0&0&0\\
0&0&0&0\\
1&2&3&4 
\end{matrix}
\right)

>>> C = np.outer(I, A); C
    array([[ 1.,  2.,  3.,  4.],
           [ 0.,  0.,  0.,  0.],
           [ 0.,  0.,  0.,  0.],
           [ 1.,  2.,  3.,  4.]])
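To make the definition concrete, the same result can be reproduced with explicit flattening and broadcasting, or with einsum. This is a sketch for illustration; `np.outer` is still the simplest way to write it.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
I = np.identity(2)

C = np.outer(I, A)  # flattens both inputs, then C[m, n] = I_flat[m] * A_flat[n]

# Column vector times row vector via broadcasting.
C_broadcast = I.ravel()[:, np.newaxis] * A.ravel()[np.newaxis, :]

# The same outer product as an index expression.
C_einsum = np.einsum('i,j->ij', I.ravel(), A.ravel())
print(C)
```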

Compute the Kronecker product (the tensor product of operators, that is, of matrices).

D =
\left(
\begin{matrix}
1&0\\
0&1 
\end{matrix}
\right)
\otimes
\left(
\begin{matrix}
1&2\\
3&4 
\end{matrix}
\right)
=
\left(
\begin{matrix}
1&2&0&0\\
3&4&0&0\\
0&0&1&2\\
0&0&3&4 
\end{matrix}
\right)

>>> D = np.kron(I, A); D
    array([[ 1.,  2.,  0.,  0.],
           [ 3.,  4.,  0.,  0.],
           [ 0.,  0.,  1.,  2.],
           [ 0.,  0.,  3.,  4.]])
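The block structure of the Kronecker product can be written as D[2i+k, 2j+l] = I[i, j] * A[k, l]. The sketch below rebuilds D from that formula with einsum and a reshape, and shows that swapping the operands gives a different matrix.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
I = np.identity(2)

D = np.kron(I, A)

# Axes ordered (i, k, j, l) so that reshape merges (i, k) into rows and (j, l) into columns.
D_einsum = np.einsum('ij,kl->ikjl', I, A).reshape(4, 4)

# The Kronecker product is not commutative in general.
D_swapped = np.kron(A, I)
print(D, D_swapped, sep="\n")
```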


[Python] Road to a snake charmer (4) Tweak Numpy