Picking up from last time, the NumPy series carries on, as persistent as ever.

Why use NumPy in the first place

We call it data analysis, but what is data in the first place? Data is a collection of numbers. Look around: the books on the desk, the plates on the table, and the apples lined up at the greengrocer's are all finite in number. There might be, say, 14 books, 22 plates, and 56 apples.

A small, finite number of things like these can be gathered into a single collection whenever necessary. The collections of things we live with and experience are made up of a finite number of items in this way.

A collection of things is a set. When the number of objects under consideration is small, recognizing each object individually and recognizing the set as a whole are not so different. As the number of objects gathered under one concept grows, however, recognizing the whole as a single thing becomes a different act. With the natural numbers, for example, recognizing each individual number differs from recognizing the entire set of natural numbers as one symbol. This way of treating a whole as a single object is the basis of set theory.

Now let's consider set theory and linear algebra. A vector space in linear algebra is a mathematical structure: a set of elements called vectors, together with addition and scalar multiplication. Roughly speaking, linear algebra is the mathematics of vectors and matrices, and handling vectors, matrices, and the multidimensional arrays built from them in practice requires the support of a dedicated library. Mastering NumPy therefore also means mastering the structure and manipulation of the actual data to be analyzed.

Saving ndarray objects

The np.save and np.load functions write ndarray objects to files and read them back, using NumPy's binary .npy format. np.savetxt and np.loadtxt do the same in text format.

If pandas is available, you can also use its higher-level read_csv and read_table functions for reading and the DataFrame.to_csv method for writing.
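As a rough sketch of the pandas route, the following writes a small DataFrame as CSV with DataFrame.to_csv and reads it back with read_csv. The column names and the in-memory buffer are assumptions made for illustration only; in practice you would normally pass a file path.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical column names, chosen only for this illustration.
df = pd.DataFrame({"id": np.arange(3), "value": [0.5, 1.5, 2.5]})

# Write CSV text to an in-memory buffer instead of a file on disk.
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Read it back; read_csv infers the column dtypes from the text.
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```

Passing index=False keeps the row index out of the CSV, so the restored frame has the same two columns as the original.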

import numpy as np

arr = np.arange(10)
#=> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

np.save('hoge', arr) # Save as hoge.npy (the .npy extension is appended automatically)
arr2 = np.load('hoge.npy') # Load the saved object
arr2
#=> array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) # Restored successfully

np.savetxt('fuga.txt', arr) # Save as text
arr3 = np.loadtxt('fuga.txt') # Read back from text (values come back as floats)
arr3
#=> array([ 0.,  1.,  2.,  3.,  4.,  5.,  6.,  7.,  8.,  9.])
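The text round trip above turns integers into floats, because np.loadtxt returns float64 unless told otherwise. A minimal sketch of keeping integers, assuming a temporary directory and an illustrative file name:

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "fuga.txt")  # illustrative file name

    # fmt="%d" writes plain integers instead of the default "%.18e" notation.
    np.savetxt(path, arr, fmt="%d")

    # loadtxt returns float64 unless a dtype is requested explicitly.
    as_float = np.loadtxt(path)
    as_int = np.loadtxt(path, dtype=int)

print(as_float.dtype)  # float64
print(as_int.dtype)    # an integer dtype (platform dependent)
```

The binary .npy format written by np.save stores the dtype alongside the data, which is why arr2 above came back as integers without any extra arguments.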

Matrix calculation

As mentioned above, linear algebra calculations hold an important place in array libraries such as NumPy.

The reshape function is very useful for generating multidimensional arrays.
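Two properties of reshape are worth a quick sketch: one dimension can be left for NumPy to infer, and the element count has to match. The names flat and grid are illustrative.

```python
import numpy as np

flat = np.arange(6)          # array([0, 1, 2, 3, 4, 5])

# -1 asks NumPy to infer that dimension from the total number of elements.
grid = flat.reshape(2, -1)
print(grid.shape)            # (2, 3)

# For a contiguous array like this one, reshape returns a view,
# so writing through the view also changes the original.
grid[0, 0] = 99
print(flat[0])               # 99

# The element count must match; 6 elements cannot fill a 4x2 array.
try:
    flat.reshape(4, 2)
except ValueError as exc:
    print("reshape failed:", exc)
```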

x = np.array([1,2,3,4,5,6]).reshape([2,3])
#=> array([[1, 2, 3],
#          [4, 5, 6]])

y = np.array([6,23,-1,7,8,9]).reshape([3,2])
#=> array([[ 6, 23],
#          [-1,  7],
#          [ 8,  9]])

x.dot(y) # Matrix product of the 2x3 array x and the 3x2 array y
#=> array([[ 28,  64],
#          [ 67, 181]])
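For 2-D arrays, dot computes the matrix product, where each entry is the inner product of a row of the left operand with a column of the right. A small sketch of that relationship, reusing the same x and y, and of the shape rule that goes with it:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6]).reshape(2, 3)
y = np.array([6, 23, -1, 7, 8, 9]).reshape(3, 2)

# For 2-D arrays, dot, np.matmul and the @ operator all compute the matrix product.
product = x @ y
assert np.array_equal(product, x.dot(y))
assert np.array_equal(product, np.matmul(x, y))

# Entry (i, j) is the inner product of row i of x with column j of y.
assert product[0, 0] == np.inner(x[0], y[:, 0])  # 1*6 + 2*(-1) + 3*8 = 28

# The inner dimensions must agree: (2, 3) @ (3, 2) works, (2, 3) @ (2, 3) does not.
try:
    x @ x
except ValueError as exc:
    print("shape mismatch:", exc)
```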

from numpy.linalg import inv, qr, pinv, eig

x = np.random.randn(5,5)
x.T #Transpose
#=> array([[ 0.1797343 , -1.48685211,  1.89995885, -1.48818535,  0.22707072],
#          [ 0.16362348,  0.73820851,  0.6830228 , -0.28744869,  1.60110706],
#          [-0.25212006, -0.75832623,  1.08510935,  0.36069392, -0.25172285],
#          [-1.23742215, -0.27616976,  1.09778477, -0.79290683,  1.88819678],
#          [ 1.25424329, -0.44571606, -0.37970879,  0.25329534, -0.0571783 ]])

mat = x.T.dot(x)
mat # Product of the transpose of x with x (a symmetric matrix)
#=> array([[ 8.119134  ,  1.02085855,  2.54992915,  3.88270879, -0.22322061],
#          [ 1.02085855,  3.68441518, -0.36661744,  3.59459509, -0.54751549],
#          [ 2.54992915, -0.36661744,  2.00954999,  0.95132328, -0.28449209],
#          [ 3.88270879,  3.59459509,  0.95132328,  7.00660306, -2.15457716],
#          [-0.22322061, -0.54751549, -0.28449209, -2.15457716,  1.98339569]])

inv(mat) # Returns the inverse of a square matrix
#=> array([[ 0.34294894,  0.13024165, -0.30841121, -0.30883099, -0.30517266],
#          [ 0.13024165,  0.87379103,  0.25904943, -0.69902729, -0.46633355],
#          [-0.30841121,  0.25904943,  0.98083558, -0.06094767,  0.11128055],
#          [-0.30883099, -0.69902729, -0.06094767,  0.91304207,  0.75537864],
#          [-0.30517266, -0.46633355,  0.11128055,  0.75537864,  1.17764414]])
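One way to sanity-check an inverse is to multiply it back and compare with the identity. The sketch below uses a small hand-picked matrix (an assumption for illustration, not the random mat above) and also shows np.linalg.solve and the error raised for a singular matrix.

```python
import numpy as np
from numpy.linalg import inv, solve

# Small hand-picked symmetric matrix, used only for illustration.
a = np.array([[4.0, 1.0],
              [1.0, 3.0]])

a_inv = inv(a)

# A matrix times its inverse should be the identity, up to rounding error.
print(np.allclose(a @ a_inv, np.eye(2)))   # True

# To solve a @ w = b, solve() avoids forming the inverse explicitly.
b = np.array([1.0, 2.0])
w = solve(a, b)
print(np.allclose(a @ w, b))               # True

# A singular matrix has no inverse; inv raises LinAlgError.
try:
    inv(np.array([[1.0, 2.0],
                  [2.0, 4.0]]))
except np.linalg.LinAlgError as exc:
    print("not invertible:", exc)
```

When the goal is to solve a linear system rather than to inspect the inverse itself, solve is usually the better choice, both for speed and for numerical accuracy.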

q, r = qr(mat) # QR decomposition
q
#=> array([[-0.86261627,  0.28238894,  0.35807769,  0.08420784, -0.20208678],
#          [-0.10846098, -0.74407004,  0.01429817,  0.58226198, -0.30880827],
#          [-0.27091687,  0.2393467 , -0.87410947,  0.31594231,  0.0736905 ],
#          [-0.41251786, -0.54595751, -0.16208726, -0.50524429,  0.50021529],
#          [ 0.02371604,  0.10611234,  0.28501978,  0.54661567,  0.7798415 ]])

r
#=> array([[-9.41221986, -2.67672118, -3.10345255, -6.93833754,  1.26485136],
#          [ 0.        , -4.56152677,  0.92426978, -5.40443517,  1.66303293],
#          [ 0.        , -0.        , -1.08401917, -1.13963141,  1.07545497],
#          [ 0.        ,  0.        ,  0.        , -1.997258  ,  1.7452655 ],
#          [-0.        , -0.        , -0.        , -0.        ,  0.66220471]])
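The QR decomposition factors a matrix into an orthogonal q and an upper triangular r. A minimal sketch that checks those three properties on a random matrix; the seed and the name a are arbitrary choices for this example:

```python
import numpy as np
from numpy.linalg import qr

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
a = rng.standard_normal((5, 5))

q, r = qr(a)

# q has orthonormal columns, so q.T @ q is the identity.
print(np.allclose(q.T @ q, np.eye(5)))   # True
# r is upper triangular: everything below the diagonal is zero.
print(np.allclose(r, np.triu(r)))        # True
# Multiplying the factors back together recovers the original matrix.
print(np.allclose(q @ r, a))             # True
```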

pinv(mat) # Returns the Moore-Penrose pseudoinverse (equal to inv(mat) here because mat is invertible)
#=> array([[ 0.34294894,  0.13024165, -0.30841121, -0.30883099, -0.30517266],
#          [ 0.13024165,  0.87379103,  0.25904943, -0.69902729, -0.46633355],
#          [-0.30841121,  0.25904943,  0.98083558, -0.06094767,  0.11128055],
#          [-0.30883099, -0.69902729, -0.06094767,  0.91304207,  0.75537864],
#          [-0.30517266, -0.46633355,  0.11128055,  0.75537864,  1.17764414]])
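For an invertible square matrix the pseudoinverse coincides with the ordinary inverse, which is why the output above matches inv(mat). It becomes more interesting for non-square matrices. A small sketch with a hand-picked 3x2 matrix (an illustrative assumption):

```python
import numpy as np
from numpy.linalg import pinv

# A 3x2 matrix has no ordinary inverse, but its pseudoinverse is defined.
a = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
a_pinv = pinv(a)
print(a_pinv.shape)                              # (2, 3)

# Two of the defining Moore-Penrose conditions:
print(np.allclose(a @ a_pinv @ a, a))            # True
print(np.allclose(a_pinv @ a @ a_pinv, a_pinv))  # True

# Because a has full column rank, a_pinv @ a is the 2x2 identity.
print(np.allclose(a_pinv @ a, np.eye(2)))        # True
```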

np.trace(mat) # Returns the sum of the diagonal elements (the trace)
#=> 22.80309791710043

eig(mat) # Returns the eigenvalues and eigenvectors of a square matrix
#=> (array([ 13.3600683 ,   5.95602662,   2.24791381,   0.81881059,   0.4202786 ]),
#    array([[-0.64467006, -0.63086541,  0.20659265,  0.31642236, -0.20881982],
#          [-0.31183983,  0.5014887 ,  0.55973159, -0.32086736, -0.48477798],
#          [-0.19333292, -0.33433927, -0.32170008, -0.86424035, -0.02091191],
#          [-0.65225524,  0.42536152, -0.20774674,  0.04440243,  0.59033922],
#          [ 0.15601897, -0.24042203,  0.70524491, -0.21917587,  0.61028427]]))
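The eigenvectors are returned as the columns of the second array, and the eigenvalues above sum to the trace printed earlier (about 22.803). A minimal sketch of both facts on a small hand-picked symmetric matrix, which is an assumption for illustration; it also shows eigh, the variant intended for symmetric matrices.

```python
import numpy as np
from numpy.linalg import eig, eigh

a = np.array([[2.0, 1.0],
              [1.0, 2.0]])

values, vectors = eig(a)

# Column i of `vectors` pairs with values[i]: a @ v equals value * v.
for i in range(len(values)):
    v = vectors[:, i]
    assert np.allclose(a @ v, values[i] * v)

# The trace equals the sum of the eigenvalues.
assert np.isclose(np.trace(a), values.sum())

# For symmetric matrices, eigh returns real eigenvalues in ascending order.
sym_values, sym_vectors = eigh(a)
print(sym_values)   # approximately [1. 3.]
```

Since mat = x.T.dot(x) is symmetric by construction, eigh would be a reasonable alternative to eig for it as well.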

Summary

This time I covered the linear algebra functions that come up most often and matter most. They are the foundation of scientific computing, so study them well.
