Python for Data Analysis Chapter 4

NumPy Basics: Arrays and Vectorized Computation

ndarray

An N-dimensional array object provided by NumPy.

Creating ndarrays

import numpy as np

#Created from a list
data1 = [6, 7.5, 8, 9]
arr1 = np.array(data1)

#Nested lists produce multidimensional arrays
data2 = [[1, 2, 3, 4], [5, 6, 7, 8]]
arr2 = np.array(data2)

#Array version of Python's range function
np.arange(10)

#Zero vector
np.zeros(10)

#Zero matrix
np.zeros((3, 6))

#Allocate without initializing (the contents are arbitrary)
np.empty((2, 3, 2))

#Check the number of dimensions
arr2.ndim

#Check the array shape
arr2.shape

#Check the data type
arr1.dtype

#Generate by specifying the data type
arr1 = np.array([1, 2, 3], dtype=np.float64)

#Generated from a list of numeric strings
data3 = ["1.1", "2.2", "3.3"]
arr3 = np.array(data3, dtype=np.float64)
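
As a small sketch of the dtype handling above, the data type of an existing array can also be converted afterwards with astype. The variable names here are only for illustration.

```python
import numpy as np

# Numeric strings stored as a NumPy string array
str_arr = np.array(["1.1", "2.2", "3.3"])

# astype returns a new array with the requested dtype
float_arr = str_arr.astype(np.float64)

# Converting floats to integers drops the fractional part
int_arr = float_arr.astype(np.int64)

print(float_arr.dtype)    # float64
print(int_arr.tolist())   # [1, 2, 3]
```

astype always makes a new array, so str_arr itself is left unchanged.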

Operations between Arrays and Scalars

#Arithmetic between arrays is applied element-wise (between elements at the same position)
arr = np.array([[1, 2, 3], [4, 5, 6]])
"""
In [32]: arr
Out[32]: 
array([[1, 2, 3],
       [4, 5, 6]])
"""
arr * arr
"""
In [33]: arr * arr
Out[33]: 
array([[ 1,  4,  9],
       [16, 25, 36]])
"""

#Arithmetic with a scalar is applied to every element
arr - 1
"""
In [34]: arr - 1
Out[34]: 
array([[0, 1, 2],
       [3, 4, 5]])
"""
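#Note: the output below comes from Python 2, where / on integers is integer division.
#In Python 3, 1 / arr returns floats; 1 // arr reproduces this output.
1 / arr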
"""
In [35]: 1 / arr
Out[35]: 
array([[1, 0, 0],
       [0, 0, 0]])
"""

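To make the element-wise behaviour concrete, here is a minimal sketch that compares the array expression with an explicit Python loop and shows the two division operators under Python 3. The names squared, looped, true_div and floor_div are just illustrative.

```python
import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

# Element-wise product: each element is multiplied by the element at the same position
squared = arr * arr

# The same calculation written as an explicit Python loop
looped = [[x * x for x in row] for row in arr.tolist()]

# In Python 3, / is true division (floats) and // is floor division (integers)
true_div = 1 / arr
floor_div = 1 // arr

print(squared.tolist() == looped)   # True
print(floor_div.tolist())           # [[1, 0, 0], [0, 0, 0]]
print(true_div[0, 1])               # 0.5
```
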
Basic Indexing and Slicing / Fancy Indexing

row \ col	0	1	2
0	0,0	0,1	0,2
1	1,0	1,1	1,2
2	2,0	2,1	2,2

Elements are specified the same way as in a mathematical matrix: (row, col).

Note: an array slice is a view on the original array, so if you don't copy it, the slice changes when the original array changes (and vice versa). If you want an independent copy of a slice, call copy():

arr[5:8].copy()
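
The difference between a view and a copy can be checked with a short sketch like the following (the names view and copy are only for illustration):

```python
import numpy as np

arr = np.arange(10)

view = arr[5:8]          # a view: shares memory with arr
copy = arr[5:8].copy()   # an independent copy

arr[5:8] = 12            # modify the original array

print(view.tolist())   # [12, 12, 12]  (the view follows the original)
print(copy.tolist())   # [5, 6, 7]     (the copy keeps the old values)
```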

Boolean Indexing

Arrays can be masked using a bool array.

name = np.array(["bob", "martin", "feed", "max", "rosetta", "john"])
"""
In [63]: name == "bob"
Out[63]: array([ True, False, False, False, False, False], dtype=bool)
"""
arr = np.arange(6)
"""
In [68]: arr[name=="rosetta"]
Out[68]: array([4])
"""

Boolean masks can be combined with the operators & (and) and | (or).
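
A minimal sketch of combining masks, reusing the name array from these notes. The ~ operator (element-wise not) and the mask names are additions for illustration. Each comparison needs its own parentheses because & and | bind more tightly than ==.

```python
import numpy as np

name = np.array(["bob", "martin", "feed", "max", "rosetta", "john"])
arr = np.arange(6)

either = (name == "rosetta") | (name == "martin")   # element-wise OR
neither = ~either                                   # element-wise NOT
both = (arr >= 3) & (name != "john")                # element-wise AND

print(arr[either].tolist())    # [1, 4]
print(arr[neither].tolist())   # [0, 2, 3, 5]
print(arr[both].tolist())      # [3, 4]
```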

mask = (name=="rosetta") | (name=="martin")
"""
In [72]: mask
Out[72]: array([False,  True, False, False,  True, False], dtype=bool)
"""

Selection by comparison operator

data = np.random.randn(10)
"""
In [78]: data
Out[78]: 
array([-0.43930899, -0.18084457,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])
"""
data[data < 0] = 0
"""
In [80]: data
Out[80]: 
array([ 0.        ,  0.        ,  0.50384496,  0.34177923,  0.4786331 ,
        0.0930973 ,  0.95264648,  1.29876589,  0.96616151,  0.69204729])
"""

Transposing Arrays and Swapping Axes

This part is difficult! I think it's easier to take only what you want with fancy indexing ...

arr = np.arange(15).reshape((3,5))

#Transpose
arr.T

#Matrix product
np.dot(arr.T, arr)

arr = np.arange(45).reshape((3,5,3))

#Permute the axes by specifying their new order
arr.transpose((1, 0, 2))

#Swap two axes
arr.swapaxes(1, 2)
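
Since the axis operations are hard to picture, here is a small sketch that only looks at shapes and at where one element ends up. The names t and s are illustrative.

```python
import numpy as np

arr = np.arange(45).reshape((3, 5, 3))

t = arr.transpose((1, 0, 2))   # new axis order: old axis 1, old axis 0, old axis 2
s = arr.swapaxes(1, 2)         # exchange axes 1 and 2

print(t.shape)   # (5, 3, 3)
print(s.shape)   # (3, 3, 5)

# The element at [i, j, k] in arr is at [j, i, k] in t and at [i, k, j] in s
print(arr[2, 4, 1] == t[4, 2, 1] == s[2, 1, 4])   # True
```

Both results are views on the same data, so no elements are copied.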

Universal Functions: Fast Element-wise Array Functions

One-argument functions

Functions that operate element by element. np.func(x) applies the function to each element of x.

Function	Description
abs	Absolute value
sqrt	Square root (x ** 0.5)
square	Square (x ** 2)
exp	Exponential (e ** x)
log, log10, log2	Logarithm of x with base e, 10, or 2
log1p	log(1 + x), accurate even when x is very small
sign	Returns the sign of each element (1, 0, or -1)
ceil	Round up to the nearest integer
floor	Round down to the nearest integer
rint	Round to the nearest integer
modf	Split each element into its fractional part and its integer part
isnan, isinf, isfinite	Return a bool array indicating whether each element is NaN, infinite, or finite
logical_not	Returns the bool value of not x

Two-argument functions

Called as np.func(x1, x2).

Function	Description
add, subtract, multiply, divide, power, mod	x1 (+, -, *, /, **, %) x2
maximum, minimum	Element-wise larger / smaller of the values at the same position in x1 and x2
copysign	Magnitude of x1 with the sign of x2
greater, greater_equal, less, less_equal, equal, not_equal	x1 (>, >=, <, <=, ==, !=) x2
logical_and, logical_or, logical_xor	x1 (&, |, ^) x2
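
A few of the functions from the two tables, tried on small arrays as a sketch. The input values are arbitrary examples chosen so the results are easy to check by hand.

```python
import numpy as np

x = np.array([-2.5, 0.0, 1.5])
y = np.array([1.0, -1.0, 2.0])

# One-argument functions
print(np.sign(x).tolist())   # [-1.0, 0.0, 1.0]
frac, whole = np.modf(x)     # modf returns two arrays: fractional and integer parts
print(frac.tolist())         # [-0.5, 0.0, 0.5]
print(whole.tolist())        # [-2.0, 0.0, 1.0]

# Two-argument functions
print(np.maximum(x, y).tolist())    # [1.0, 0.0, 2.0]
print(np.copysign(x, y).tolist())   # [2.5, -0.0, 1.5]
```

Note that modf is one of the few universal functions that returns more than one array.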

Data Processing Using Arrays

Visualize 2D data. As an example, display a grid on which sqrt(x^2 + y^2) is calculated.

import matplotlib.pyplot as plt

#Create 1000 points
points = np.arange(-5, 5, 0.01)
#Create a 2D mesh
#xs is a two-dimensional array whose rows each hold the x values;
#ys is a two-dimensional array whose columns each hold the y values
xs, ys = np.meshgrid(points, points)
#Calculation
z = np.sqrt(xs ** 2 + ys ** 2)
#Display
plt.imshow(z, cmap=plt.cm.gray); plt.colorbar()
plt.title(r"Image plot of $\sqrt{x^2 + y^2}$ for a grid of values")
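
What meshgrid returns is easier to see on a tiny grid. The following sketch uses three points instead of 1000 and skips the plotting:

```python
import numpy as np

points = np.array([-1.0, 0.0, 1.0])
xs, ys = np.meshgrid(points, points)

# Every row of xs is a copy of points; every column of ys is a copy of points
print(xs.tolist())   # [[-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0], [-1.0, 0.0, 1.0]]
print(ys.tolist())   # [[-1.0, -1.0, -1.0], [0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]

# Distance from the origin at every grid point
z = np.sqrt(xs ** 2 + ys ** 2)
print(z[1, 1])       # 0.0 (the centre of the grid is the origin)
```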

Expressing Conditional Logic as Array Operations

np.where selects each element from either its second or its third argument, depending on whether the corresponding element of the first argument is true. That is, np.where(cond, xarr, yarr) is equivalent to [(x if c else y) for x, y, c in zip(xarr, yarr, cond)], except that it returns an array.

arr = np.random.randn(5, 5)
"""
In [5]: arr
Out[5]: 
array([[-0.63774199, -0.76558645, -0.46003378,  0.61095653,  0.78277454],
       [ 0.25332127,  0.50226145, -1.45706102,  1.14315867,  0.28015   ],
       [-0.76326506,  0.33218657, -0.18509161, -0.3410194 , -0.29194451],
       [-0.32247669, -0.64285987, -0.61059921, -0.38261289,  0.41530912],
       [-1.7341384 ,  1.39960857,  0.78411537,  0.25922757, -0.22972615]])
"""
arrtf = np.where(arr > 0, True, False)
"""
In [6]: arrtf
Out[6]: 
array([[False, False, False,  True,  True],
       [ True,  True, False,  True,  True],
       [False,  True, False, False, False],
       [False, False, False, False,  True],
       [False,  True,  True,  True, False]], dtype=bool)
"""

By nesting np.where calls, it is possible to classify elements by multiple conditions.

cond1 = np.where(np.random.randn(10) > 0, True, False)
cond2 = np.where(np.random.randn(10) > 0, True, False)
"""
In [16]: cond1
Out[16]: array([False,  True, False, False,  True,  True,  True,  True,  True,  True], dtype=bool)

In [17]: cond2
Out[17]: array([False, False, False, False, False,  True, False,  True,  True,  True], dtype=bool)
"""
result = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))
"""
In [19]: result
Out[19]: array([3, 1, 3, 3, 1, 0, 1, 0, 0, 0])
"""

The same logic can also be written with if and else.

result = []
for i in range(len(cond1)):
    if cond1[i] and cond2[i]:
        result.append(0)
    elif cond1[i]:
        result.append(1)
    elif cond2[i]:
        result.append(2)
    else:
        result.append(3)

The same classification is also possible with an arithmetic expression. (Note that 0 and 3 are swapped compared with the results above: both conditions true gives 3, and neither gives 0.)

result = 1*cond1 + 2*cond2
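
To check how the three versions relate, here is a sketch with fixed conditions covering all four combinations. The names nested, looped and arithmetic are illustrative.

```python
import numpy as np

cond1 = np.array([False, True, False, True])
cond2 = np.array([False, False, True, True])

# Nested np.where
nested = np.where(cond1 & cond2, 0, np.where(cond1, 1, np.where(cond2, 2, 3)))

# Explicit loop with if / elif / else
looped = []
for c1, c2 in zip(cond1, cond2):
    if c1 and c2:
        looped.append(0)
    elif c1:
        looped.append(1)
    elif c2:
        looped.append(2)
    else:
        looped.append(3)

# Arithmetic on booleans (True counts as 1, False as 0)
arithmetic = 1 * cond1 + 2 * cond2

print(nested.tolist())       # [3, 1, 2, 0]
print(looped)                # [3, 1, 2, 0]
print(arithmetic.tolist())   # [0, 1, 2, 3]
```

The loop and the nested np.where agree exactly, while the arithmetic version gives the same labels for 1 and 2 and swaps 0 and 3.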

Mathematical and Statistical Methods

Statistical functions are also available.

arr = np.random.randn(5, 4)
arr.mean()
#An axis can also be specified
arr.mean(0)
arr.mean(1)
"""
In [60]: arr.mean()
Out[60]: 0.51585861805229682

In [62]: arr.mean(0)
Out[62]: array([ 0.65067115, -0.03856606,  1.06405353,  0.38727585])

In [63]: arr.mean(1)
Out[63]: array([ 1.18400902,  0.84203136,  0.50352006,  0.07445734, -0.0247247 ])
"""

sum
mean
std, var
min, max
argmin, argmax (return the index of the minimum / maximum)
cumsum (cumulative sum)
cumprod (cumulative product)
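
A short sketch of these methods on a small fixed array, so that the effect of the axis argument can be verified by hand:

```python
import numpy as np

arr = np.array([[0, 1, 2], [3, 4, 5], [6, 7, 8]])

print(arr.sum())                   # 36
print(arr.mean(axis=0).tolist())   # [3.0, 4.0, 5.0]  (mean of each column)
print(arr.mean(axis=1).tolist())   # [1.0, 4.0, 7.0]  (mean of each row)
print(arr.argmax())                # 8  (index into the flattened array)

print(arr.cumsum(axis=0).tolist())    # [[0, 1, 2], [3, 5, 7], [9, 12, 15]]
print(arr.cumprod(axis=1).tolist())   # [[0, 0, 0], [3, 12, 60], [6, 42, 336]]
```

axis=0 aggregates down the rows (one result per column) and axis=1 aggregates across the columns (one result per row).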

Methods for Boolean Arrays

Since the Boolean value True is counted as 1 and False as 0, the sum function is often used for counting.

arr = np.random.randn(100)
sumnum = (arr > 0).sum()
"""
In [75]: sumnum
Out[75]: 43
"""

Other Boolean functions

any (True if at least one element is True)
all (True if all elements are True)
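
For example, a minimal sketch of any and all on a fixed Boolean array:

```python
import numpy as np

bools = np.array([False, False, True, False])

print(bools.any())   # True  (at least one element is True)
print(bools.all())   # False (not every element is True)
print(bools.sum())   # 1     (number of True values)

# Non-Boolean arrays work too: nonzero elements are treated as True
print(np.array([1, 2, 0]).all())   # False
```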

Sorting

You can also sort arrays. The sort method sorts in place.

arr.sort()
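
As a sketch of the difference between the in-place method and the top-level function np.sort, which returns a sorted copy (the array names are illustrative):

```python
import numpy as np

arr = np.array([3, 1, 2])

sorted_copy = np.sort(arr)    # returns a sorted copy; arr is unchanged
print(arr.tolist())           # [3, 1, 2]
print(sorted_copy.tolist())   # [1, 2, 3]

arr.sort()                    # sorts arr in place and returns None
print(arr.tolist())           # [1, 2, 3]

# For multidimensional arrays, an axis can be given
table = np.array([[3, 1, 2], [9, 7, 8]])
table.sort(axis=1)            # sort each row independently
print(table.tolist())         # [[1, 2, 3], [7, 8, 9]]
```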

Unique and Other Set Logic

Set operations similar to those of Python's built-in set are also available.

unique(x) (sorted unique elements of x)
intersect1d(x, y) (unique(x) & unique(y))
union1d(x, y) (unique(x) | unique(y))
in1d(x, y) (returns a Boolean array indicating whether each element of x is contained in y)
setdiff1d(x, y) (values of x that are not in y)
setxor1d(x, y) (values that are in exactly one of x and y)
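
A quick sketch of the set functions on two small arrays. It uses np.isin for the membership test, which newer NumPy releases recommend over in1d; the input values are arbitrary.

```python
import numpy as np

x = np.array([3, 1, 2, 3, 1])
y = np.array([2, 3, 4])

print(np.unique(x).tolist())           # [1, 2, 3]
print(np.intersect1d(x, y).tolist())   # [2, 3]
print(np.union1d(x, y).tolist())       # [1, 2, 3, 4]
print(np.isin(x, y).tolist())          # [True, False, True, True, False]
print(np.setdiff1d(x, y).tolist())     # [1]
print(np.setxor1d(x, y).tolist())      # [1, 4]
```

All of these except the membership test return sorted, de-duplicated arrays.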

File Input and Output with Arrays

You can save NumPy array objects to an external file. Of course, you can also load saved files to restore them.

arr = np.arange(10)

#Save in binary format
np.save("array_name", arr)
#Load binary format file
arr = np.load("array_name.npy")
#Save multiple arrays as zip
np.savez("array_archive.npz", a=arr, b=arr)
#Load multiple array zip
arr_a = np.load("array_archive.npz")["a"]
arr_b = np.load("array_archive.npz")["b"]

#Save in csv format
np.savetxt("array_ex.txt", arr, delimiter=",")
#Read csv format file
arr = np.loadtxt("array_ex.txt", delimiter=",")
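
The following sketch runs the same save and load steps as a round trip inside a temporary directory, so nothing is left on disk. The use of tempfile and the variable names are assumptions made for the example.

```python
import os
import tempfile

import numpy as np

arr = np.arange(10)

with tempfile.TemporaryDirectory() as tmpdir:
    # Binary .npy round trip
    npy_path = os.path.join(tmpdir, "array_name.npy")
    np.save(npy_path, arr)
    restored = np.load(npy_path)

    # .npz archive: open it once and read both arrays from it
    npz_path = os.path.join(tmpdir, "array_archive.npz")
    np.savez(npz_path, a=arr, b=arr * 2)
    with np.load(npz_path) as archive:
        arr_a = archive["a"]
        arr_b = archive["b"]

    # Text round trip: values come back as floats by default
    txt_path = os.path.join(tmpdir, "array_ex.txt")
    np.savetxt(txt_path, arr, delimiter=",")
    from_text = np.loadtxt(txt_path, delimiter=",")

print(np.array_equal(restored, arr))   # True
print(arr_b.tolist())                  # [0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
print(from_text.dtype)                 # float64
```

Opening the archive once avoids reading the .npz file separately for every array.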

Linear Algebra

Linear algebra calculations are also available.

Function	Description
diag	Extract the diagonal elements
dot	Matrix product
trace	Sum of the diagonal elements
det	Determinant
eig	Decompose into eigenvalues and eigenvectors
inv	Inverse matrix
pinv	Moore-Penrose pseudo-inverse
qr	QR decomposition
svd	Singular value decomposition (SVD)
solve	Solve Ax = b for x, where A is a square matrix
lstsq	Compute the least-squares solution of Ax = b
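
A minimal sketch using a few of these on a 2x2 system. diag, dot and trace live in the top-level numpy namespace, while det, inv, qr, solve and the other decompositions are in numpy.linalg. The matrix values are arbitrary.

```python
import numpy as np

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 5.0])

x = np.linalg.solve(A, b)   # solve A x = b
A_inv = np.linalg.inv(A)    # inverse matrix
q, r = np.linalg.qr(A)      # QR decomposition

print(np.allclose(np.dot(A, x), b))               # True
print(np.allclose(np.dot(A, A_inv), np.eye(2)))   # True
print(np.allclose(np.dot(q, r), A))               # True
print(np.trace(A))                                # 5.0
print(np.diag(A).tolist())                        # [2.0, 3.0]
```

Floating-point results are compared with np.allclose rather than ==, since small rounding errors are expected.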

Random Number Generation

Random values from various distributions can be generated at high speed.

Function	Description
seed	Seed the random number generator
permutation	Return a randomly permuted copy of a sequence
shuffle	Randomly permute a sequence in place
rand	Generate uniform random values in [0, 1) with the shape passed as arguments
randint	Generate random integers from a given range
binomial	Random sampling from the binomial distribution
normal	Random sampling from the normal distribution
beta	Random sampling from the beta distribution
chisquare	Random sampling from the chi-square distribution
gamma	Random sampling from the gamma distribution
uniform	Random sampling from the uniform distribution over a given range
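
As a sketch of how some of these behave, using the np.random functions from the table (the seed value 12345 and the variable names are arbitrary):

```python
import numpy as np

# The same seed reproduces the same sequence of draws
np.random.seed(12345)
first = np.random.normal(size=3)
np.random.seed(12345)
second = np.random.normal(size=3)
print(np.array_equal(first, second))   # True

# permutation returns a shuffled copy; shuffle reorders in place
perm = np.random.permutation(5)
seq = np.arange(5)
np.random.shuffle(seq)

# uniform draws from [low, high); randint draws integers from [low, high)
u = np.random.uniform(2.0, 3.0, size=1000)
ints = np.random.randint(0, 10, size=(2, 3))
print(ints.shape)                      # (2, 3)
```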

Example: Random Walks

Run the following in ipython.

nsteps = 1000
draws = np.random.randint(0, 2, size=nsteps)
steps = np.where(draws > 0, 1, -1)
walk = steps.cumsum()
plt.plot(walk)

Simulating Many Random Walks at Once

nwalks = 100
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)
#Transpose so that each walk (row) is drawn as one line
plt.plot(walks.T)
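
Without plotting, the simulated walks can also be summarized with the Boolean and statistical methods from earlier. This is only a sketch: the seed and the distance threshold of 30 are arbitrary choices for illustration.

```python
import numpy as np

np.random.seed(0)   # arbitrary seed so that the run is repeatable

nwalks = 100
nsteps = 1000
draws = np.random.randint(0, 2, size=(nwalks, nsteps))
steps = np.where(draws > 0, 1, -1)
walks = steps.cumsum(1)   # one walk per row

print(walks.shape)   # (100, 1000)

# Which walks ever get at least 30 steps away from the origin?
hits30 = (np.abs(walks) >= 30).any(1)

# For those walks, the first step index at which that happens
# (argmax returns the position of the first True in each row)
crossing_times = (np.abs(walks[hits30]) >= 30).argmax(1)

print(hits30.sum() == len(crossing_times))   # True
```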

Expansion

The values may not look like very high-quality random numbers, but the quality should be quite high because np.random actually uses the Mersenne Twister.