Let's go back to the starting point. Why use Python for statistics and machine learning? Here is the author's view.
There are many other languages, such as Haskell and Ruby, that are appealing purely as languages. The R language is also well known for straightforward statistical and mathematical calculations, and there is commercial software for financial calculations and statistical analysis. However, Python may be the first candidate when you want a single language that is both specialized for scientific computing and able to solve problems in general system processing.
And the library that forms the basis of scientific computing in Python, including multidimensional array calculations and many other mathematical operations, is NumPy.
From now on, we'll focus on NumPy's array manipulation for a while to better understand its powerful features.
The NumPy ndarray object is a class for treating strided data (data held in a contiguous block of memory) as an N-dimensional array.
Attempting to work with multidimensional arrays in a general-purpose programming language without the help of such a library would end up requiring complex nested lists (arrays) and equally complex nested loops to compute with them. This is not realistic, so the quality of the multidimensional array library can be said to be essential in scientific computing.
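To make the contrast concrete, here is a minimal sketch that adds two small matrices, once with nested lists and once with ndarray. The variable names and sample values are made up for illustration.

```python
import numpy as np

# Two 2x3 matrices written as nested Python lists (sample values only)
a = [[1, 2, 3], [4, 5, 6]]
b = [[10, 20, 30], [40, 50, 60]]

# With plain lists, an element-wise sum needs one loop per dimension
list_sum = [[a[i][j] + b[i][j] for j in range(3)] for i in range(2)]

# With ndarray, the same computation is a single expression
array_sum = np.array(a) + np.array(b)

print(list_sum)            # [[11, 22, 33], [44, 55, 66]]
print(array_sum.tolist())  # [[11, 22, 33], [44, 55, 66]]
```

With only two dimensions the list version is still readable, but every extra dimension adds another level of nesting, while the ndarray expression stays the same.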
Here are some of the advantages of ndarray over nested lists (arrays).
On the other hand, ndarray has some disadvantages.
From a mathematical perspective, especially that of linear algebra, I think this is rather natural.
The ndarray holds strides internally, and the array object has attributes such as dtype (data type), shape, and strides.
The shape of the array can be accessed with the shape attribute. The strides can likewise be accessed with the strides attribute.
import numpy as np
np.zeros((3,4))
#=> array([[ 0., 0., 0., 0.],
# [ 0., 0., 0., 0.],
# [ 0., 0., 0., 0.]])
np.zeros((3,4)).shape
#=> (3, 4)
np.zeros((3,4)).strides
#=> (32, 8)
# For each dimension, the number of bytes (the "step length") needed to advance by one element.
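Where the value (32, 8) comes from can be sketched as follows. The helper `c_order_strides` is a hypothetical function written for this explanation, not part of NumPy, and it assumes a row-major (C order) contiguous array, which is what np.zeros returns by default.

```python
import numpy as np

def c_order_strides(shape, itemsize):
    """Hypothetical helper: byte strides of a C-contiguous array."""
    strides = []
    step = itemsize                # the last dimension moves one element at a time
    for n in reversed(shape):
        strides.append(step)
        step *= n                  # the next outer dimension skips n of these steps
    return tuple(reversed(strides))

z = np.zeros((3, 4))               # float64 by default, so itemsize is 8 bytes
expected = c_order_strides(z.shape, z.itemsize)

print(expected)                    # (32, 8)
print(z.strides == expected)       # True
print(z.T.strides)                 # (8, 32): transposing only swaps the strides
```

Moving one column to the right advances 8 bytes (one float64), and moving one row down advances 4 elements x 8 bytes = 32 bytes.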
For arrays of two or more dimensions, what a single index refers to is itself an array of one or more dimensions.
arr = np.array( [[[1,2,3], [4,5,6]], [[7,8,9],[10,11,12]]] )
arr.ndim #Number of dimensions
#=> 3
arr[0]
#=> array([[1, 2, 3],
# [4, 5, 6]])
#Returns a two-dimensional array
If you specify several indices at once, you retrieve an array whose number of dimensions is reduced by that many.
arr[1,0]
#=> array([7, 8, 9])
arr[1,0,2]
#=> 9
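As a small check of how each extra index removes one dimension, the following sketch prints ndim at every step, using the same sample array as above.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr.ndim)                 # 3
print(arr[1].ndim)              # 2
print(arr[1, 0].ndim)           # 1
print(np.ndim(arr[1, 0, 2]))    # 0 (a single scalar element)

# arr[1, 0] selects the same elements as the chained form arr[1][0]
same = bool((arr[1, 0] == arr[1][0]).all())
print(same)                     # True
```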
When you want to extract from a multidimensional array in a particular order, you can pass a list of integers or an integer ndarray giving that order as the index.
arr[[1,0]]
#=> array([[[ 7, 8, 9],
# [10, 11, 12]],
# [[ 1, 2, 3],
# [ 4, 5, 6]]])
arr = np.arange(48).reshape((4,3,4))
#=> array([[[ 0, 1, 2, 3],
# [ 4, 5, 6, 7],
# [ 8, 9, 10, 11]],
# [[12, 13, 14, 15],
# [16, 17, 18, 19],
# [20, 21, 22, 23]],
# [[24, 25, 26, 27],
# [28, 29, 30, 31],
# [32, 33, 34, 35]],
# [[36, 37, 38, 39],
# [40, 41, 42, 43],
# [44, 45, 46, 47]]])
arr[[0,0,1],[0,1,0],[0,0,2]]
#=> array([ 0, 4, 14])
# The elements at positions (0,0,0), (0,1,0), and (1,0,2) are taken out
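The way the three index lists are paired up position by position can be sketched with an explicit loop. The names `i`, `j`, `k`, `picked`, and `by_hand` are chosen here only for illustration.

```python
import numpy as np

arr = np.arange(48).reshape((4, 3, 4))

i = [0, 0, 1]   # indices along the first axis
j = [0, 1, 0]   # indices along the second axis
k = [0, 0, 2]   # indices along the third axis

# One index list per axis: the lists are read together, one position at a time
picked = arr[i, j, k]

# The same selection written out by hand with zip
by_hand = [int(arr[a, b, c]) for a, b, c in zip(i, j, k)]

print(picked.tolist())  # [0, 4, 14]
print(by_hand)          # [0, 4, 14]
```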
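So far, I have introduced the basic index references. References made with integers such as arr[0] or arr[1,0] do not copy the object in memory; they are provided as strided views on the same data. This is one of the features of ndarray. Note, however, that references made with a list of integers or an ndarray, such as arr[[1,0]], return a copy rather than a view.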
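Whether an index reference shares memory with the original array can be checked with np.shares_memory. The following is a minimal sketch; the names `view` and `copy` are chosen for illustration. In NumPy, indexing with integers gives a view, while indexing with a list of integers gives a new array.

```python
import numpy as np

arr = np.arange(48).reshape((4, 3, 4))

view = arr[1, 0]      # integer indexing: a view on arr's data
copy = arr[[1, 0]]    # integer-list indexing: a new array with copied data

print(np.shares_memory(arr, view))  # True
print(np.shares_memory(arr, copy))  # False

# Writing through the view changes the original array
view[0] = -1
print(arr[1, 0, 0])   # -1

# Writing to the copy leaves the original untouched
copy[0, 0, 0] = 999
print(arr[1, 0, 0])   # -1
```

Because a view writes through to the original, call .copy() explicitly when you need an independent array.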