Why use Python

Let's go back to the starting point. Why use Python for statistics and machine learning? In my view, the reasons are as follows.

Python itself is a versatile glue language.
Its libraries for scientific computing, covering mathematics, statistics, machine learning and more, are extremely rich and unrivaled among common general-purpose languages.
Because these libraries call linear algebra and other mathematical routines written in C/C++ or Fortran, the numerical parts run fast even though Python is an interpreted language.
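To make the last point concrete, here is a minimal sketch comparing a dot product computed by a loop in the Python interpreter with `np.dot`, which typically dispatches to an optimized compiled (BLAS) routine. The helper `py_dot` and the vector size are illustrative choices, and timings vary by machine, so treat the printed numbers as rough.

```python
import timeit

import numpy as np


def py_dot(xs, ys):
    """Illustrative helper: dot product computed element by element in the interpreter."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total


n = 100_000
a = np.arange(n, dtype=np.float64)
b = np.ones(n, dtype=np.float64)
xs, ys = a.tolist(), b.tolist()   # plain Python lists for the loop version

# Both give the same result; np.dot runs its loop in compiled code.
slow = py_dot(xs, ys)
fast = np.dot(a, b)
print(slow, fast)

t_py = timeit.timeit(lambda: py_dot(xs, ys), number=10)
t_np = timeit.timeit(lambda: np.dot(a, b), number=10)
print(f"pure Python: {t_py:.4f} s, np.dot: {t_np:.4f} s")
```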

There are many other languages, such as Haskell and Ruby, that are appealing purely as languages. R is well known for straightforward statistical calculations, and there is also commercial software for financial calculations and statistical analysis. However, if you want one language that both handles specialized scientific computation and solves general system-processing problems, Python may well be the first candidate.

The library that underpins scientific computing in Python, from multidimensional array operations to a wide range of mathematical calculations, is NumPy.

From now on, we'll focus on NumPy's array manipulation for a while to better understand its powerful features.

np.ndarray object

The NumPy ndarray object is a class that treats a block of memory as an N-dimensional array; its strides describe how to step through that block along each dimension.

Working with multidimensional arrays in a common programming language without the help of such a library means building complex nested lists (arrays) and then writing equally complex nested loops to compute over them. That is not practical, so the quality of the multidimensional array library is essential in scientific computing.
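As an illustration, here is element-wise addition of two 3×4 matrices written both ways: once with nested lists and explicit loops, and once with ndarray. The matrix contents and variable names are arbitrary choices for this sketch.

```python
import numpy as np

rows, cols = 3, 4

# Nested lists: build both matrices, then add them with two explicit loops.
a_list = [[r * cols + c for c in range(cols)] for r in range(rows)]
b_list = [[10 for _ in range(cols)] for _ in range(rows)]
sum_list = [[0] * cols for _ in range(rows)]
for r in range(rows):
    for c in range(cols):
        sum_list[r][c] = a_list[r][c] + b_list[r][c]

# ndarray: the same computation is a single expression.
a_arr = np.arange(rows * cols).reshape(rows, cols)
b_arr = np.full((rows, cols), 10)
sum_arr = a_arr + b_arr

print(sum_list)          # [[10, 11, 12, 13], [14, 15, 16, 17], [18, 19, 20, 21]]
print(sum_arr.tolist())  # same values
```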

Advantages and disadvantages of ndarray

Here are some advantages of ndarray over nested lists (arrays):

ndarray can apply many advanced mathematical operations on matrices more easily and faster than nested lists.
Operations and functions can be applied to all or part of an array's elements at once, which enables high-speed processing.
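A short sketch of applying operations to a whole array, to a slice, and to the elements selected by a condition, all without writing a loop. The array contents are arbitrary.

```python
import numpy as np

arr = np.arange(6, dtype=np.float64)   # [0. 1. 2. 3. 4. 5.]

doubled = arr * 2                # arithmetic on every element
roots = np.sqrt(arr)             # a ufunc applied element-wise
arr[3:] += 10                    # update only a slice, in place
mask_sum = arr[arr > 2].sum()    # operate only on elements meeting a condition

print(doubled)    # [ 0.  2.  4.  6.  8. 10.]
print(roots[4])   # 2.0
print(arr)        # [ 0.  1.  2. 13. 14. 15.]
print(mask_sum)   # 42.0
```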

On the other hand, ndarray has some disadvantages.

Nested lists can hold elements of different types. An ndarray, by contrast, basically requires all of its elements to be of the same type.
An ndarray must also be rectangular: along each dimension, every sub-array must have the same number of elements.
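Both restrictions can be observed directly. In this sketch, NumPy converts mixed inputs to one common dtype, and asking for a numeric dtype on a ragged nested list raises `ValueError`. (Without an explicit dtype, how a ragged list is handled has varied between NumPy versions, so the sketch does not rely on that case.)

```python
import numpy as np

mixed_list = [1, 2.5, True]     # a list happily holds int, float and bool
arr = np.array(mixed_list)
print(arr.dtype)                # float64: every element upcast to one type

with_str = np.array([1, "a"])
print(with_str.dtype.kind)      # 'U': every element became a Unicode string

ragged = [[1, 2, 3], [4, 5]]    # rows of different lengths
rejected = False
try:
    np.array(ragged, dtype=np.int64)
except ValueError as err:
    rejected = True
    print("ragged list rejected:", err)
```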

From a mathematical perspective, especially that of linear algebra, I think these restrictions are rather natural.

Components of ndarray

Internally, an ndarray holds a data buffer together with attributes such as dtype (the data type), shape, and strides.

The array's shape can be read from the shape attribute, and its strides from the strides attribute (both are attributes, not functions).

import numpy as np

np.zeros((3,4))
#=> array([[0., 0., 0., 0.],
#          [0., 0., 0., 0.],
#          [0., 0., 0., 0.]])

np.zeros((3,4)).shape
#=> (3, 4)

np.zeros((3,4)).strides
#=> (32, 8)
# Bytes to step in memory to move one element along each dimension
# (float64 is 8 bytes, so moving one row of 4 elements skips 32 bytes)
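To see what those numbers mean, the sketch below computes an element's byte offset by hand (sum of index times stride over all dimensions), then compares the strides for a smaller dtype and for the transpose. Mapping the offset back to a flat position with `ravel()` assumes a C-contiguous array like this one; the variable names are illustrative.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)
print(a.itemsize, a.strides)     # 8 (32, 8)

# Byte offset of element (i, j) from the start of the data buffer.
i, j = 2, 1
offset = i * a.strides[0] + j * a.strides[1]
print(offset)                                      # 72
print(a.ravel()[offset // a.itemsize] == a[i, j])  # True

# A smaller dtype shrinks the strides accordingly.
b = np.zeros((3, 4), dtype=np.int32)
print(b.strides)                 # (16, 4)

# Transposing only swaps the strides; no data is moved.
print(a.T.strides)               # (8, 32)
```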

Indexing

Slicing

For an array of two or more dimensions, indexing with a single integer returns an array with one fewer dimension.

arr = np.array( [[[1,2,3], [4,5,6]], [[7,8,9],[10,11,12]]] )

arr.ndim #Number of dimensions
#=> 3

arr[0]
#=> array([[1, 2, 3],
#          [4, 5, 6]])
#Returns a two-dimensional array

If you pass several comma-separated integer indices, the result has that many fewer dimensions.

arr[1,0]
#=> array([7, 8, 9])

arr[1,0,2]
#=> 9
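Integer indices can also be combined with `:` slices. Here is a short sketch on the same 2×2×3 array `arr`, re-created so the block runs on its own: each integer index removes a dimension, while a slice keeps it.

```python
import numpy as np

arr = np.array([[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]])

print(arr[1][0][2] == arr[1, 0, 2])  # True: chained and comma indexing agree

first_cols = arr[:, :, 0]            # first element of every innermost row
print(first_cols.tolist())           # [[1, 4], [7, 10]]

tail = arr[0, :, 1:]                 # block 0, all rows, columns 1 onward
print(tail.tolist())                 # [[2, 3], [5, 6]]
print(tail.ndim)                     # 2: one integer index, two slices
```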

Fancy indexing

To extract elements from a multidimensional array in a particular order, pass a list or ndarray of integers that specifies that order as the index.

arr[[1,0]]
#=> array([[[ 7,  8,  9],
#           [10, 11, 12]],
#          [[ 1,  2,  3],
#           [ 4,  5,  6]]])

arr = np.arange(48).reshape((4,3,4))
arr
#=> array([[[ 0,  1,  2,  3],
#           [ 4,  5,  6,  7],
#           [ 8,  9, 10, 11]],
#
#          [[12, 13, 14, 15],
#           [16, 17, 18, 19],
#           [20, 21, 22, 23]],
#
#          [[24, 25, 26, 27],
#           [28, 29, 30, 31],
#           [32, 33, 34, 35]],
#
#          [[36, 37, 38, 39],
#           [40, 41, 42, 43],
#           [44, 45, 46, 47]]])

arr[[0,0,1],[0,1,0],[0,0,2]]
#=> array([ 0,  4, 14])
# Takes out the elements at positions (0,0,0), (0,1,0), and (1,0,2)
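The three index lists are read position by position, so the same result can be reproduced with a plain loop over `zip`. When you instead want every combination of chosen rows and columns (a rectangular sub-block), `np.ix_` builds index arrays that broadcast against each other. A sketch on the same `arange(48)` array:

```python
import numpy as np

arr = np.arange(48).reshape((4, 3, 4))

picked = arr[[0, 0, 1], [0, 1, 0], [0, 0, 2]]
by_hand = [int(arr[i, j, k]) for i, j, k in zip([0, 0, 1], [0, 1, 0], [0, 0, 2])]
print(picked.tolist(), by_hand)   # [0, 4, 14] [0, 4, 14]

# Rows 2 and 0 of block 1, crossed with columns 3 and 1, in that order.
block = arr[1][np.ix_([2, 0], [3, 1])]
print(block.tolist())             # [[23, 21], [15, 13]]
```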

Summary

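This part covered basic indexing and fancy indexing. Basic indexing (integers and slices) does not copy the data in memory: whenever the result is an array, it is a view that shares the original data buffer and describes it with its own shape and strides. This is one of the distinguishing features of ndarray. Fancy indexing, by contrast, always returns a copy.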
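The difference between views and copies can be checked with `np.shares_memory`: writing through a slice changes the original array, while writing into the result of fancy indexing does not.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

view = arr[1:, 2:]                    # basic slicing: a view on the same buffer
print(np.shares_memory(arr, view))    # True
view[0, 0] = 100                      # writes through to arr[1, 2]
print(arr[1, 2])                      # 100

fancy = arr[[0, 2]]                   # fancy indexing: a new array (copy)
print(np.shares_memory(arr, fancy))   # False
fancy[0, 0] = -1                      # arr is unaffected
print(arr[0, 0])                      # 0
```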

NumPy array manipulation (1)