These quantities could be computed immediately with existing library functions, but here we implement them from scratch in order to study how they work.
A: 0, 3, 3, 5, 5, 5, 5, 7, 7, 10
B: 0, 1, 2, 3, 5, 5, 7, 8, 9, 10
C: 3, 4, 4, 5, 5, 5, 5, 6, 6, 7
Calculate the mean difference and the Gini coefficient for the data above.
Here, the mean difference and the Gini coefficient are defined by the following formulas, respectively.
\frac{\sum_i \sum_j |x_i - x_j|}{n^2}
\frac{\sum_i \sum_j |x_i - x_j|}{2n^2 \bar{x}}
Therefore, I wrote the program as follows.
import numpy as np
A = np.array([0,3,3,5,5,5,5,7,7,10])
B = np.array([0,1,2,3,5,5,7,8,9,10])
C = np.array([3,4,4,5,5,5,5,6,6,7])
# Mean difference
def ave_diff(x):
    n = len(x)**2
    result = [np.abs(x[i] - x[j])/n for i in range(len(x)) for j in range(len(x))]
    return sum(result)
"""
print(ave_diff(A))
print(ave_diff(B))
print(ave_diff(C))
2.76
3.7599999999999976
1.2000000000000008
"""
# Gini coefficient
def get_gini(x):
    n = len(x)**2
    x_bar = x.mean()
    result = [np.abs(x[i] - x[j])/(2*n*x_bar) for i in range(len(x)) for j in range(len(x))]
    return sum(result)
"""
print(get_gini(A))
print(get_gini(B))
print(get_gini(C))
0.2760000000000002
0.3760000000000002
0.12000000000000008
"""
When p_i = f_i / n, the quantity

H(p_1, p_2, \dots, p_n) = -\sum_i p_i \log p_i

is called entropy. The larger H is, the more uniform the distribution; the smaller H is, the more concentrated it is.
Example: 100 students were asked where they came from, both this year and 10 years ago, with the following results. Compare the two distributions of place of origin in terms of concentration.
Area | A | B | C | D | E | Total
---|---|---|---|---|---|---
This year | 32 | 19 | 10 | 24 | 15 | 100 |
10 years ago | 28 | 13 | 18 | 29 | 12 | 100 |
import numpy as np
a=np.array([32, 19, 10, 24, 15])
b=np.array([28,13,18,29,12])
def entropy(x):
    n = sum(x)
    H = [x[i]/n * np.log10(x[i]/n) for i in range(len(x))]
    # Equivalent loop form:
    # for i in range(len(x)):
    #     p = x[i]/n
    #     H.append(p*np.log10(p))
    return -sum(H)
"""
print(entropy(a))
print(entropy(b))
0.667724435887455
0.6704368955892825
"""
Calculate the standard score and the deviation score for data B.
Standard score (standardization):

z_i = \frac{x_i - \bar{x}}{S_x}

So
def standard_score(x):
    x_bar = x.mean()
    s = np.sqrt(x.var())
    z = [(x[i] - x_bar)/s for i in range(len(x))]
    return z
"""
standard_score(B)
[-1.5214515486254614,
-1.217161238900369,
-0.9128709291752768,
-0.6085806194501845,
0.0,
0.0,
0.6085806194501845,
0.9128709291752768,
1.217161238900369,
1.5214515486254614]
"""
For the deviation score,

T_i = 10z_i + 50

so I changed the above function slightly:
def dev_val(x):
    x_bar = x.mean()
    s = np.sqrt(x.var())
    T = [(x[i] - x_bar)/s*10 + 50 for i in range(len(x))]
    return T
"""
dev_val(B)
[34.78548451374539,
37.82838761099631,
40.87129070824723,
43.91419380549816,
50.0,
50.0,
56.08580619450184,
59.12870929175277,
62.17161238900369,
65.21451548625461]
"""
These are the deviation scores for data B.
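The deviation score is just the affine transform T_i = 10z_i + 50 of the standard score, so it too can be written in one vectorized line (a sketch; `dev_val_vec` is an illustrative name):

```python
import numpy as np

B = np.array([0, 1, 2, 3, 5, 5, 7, 8, 9, 10])

def dev_val_vec(x):
    # T_i = 10 * z_i + 50, applied elementwise
    return (x - x.mean()) / x.std() * 10 + 50

print(dev_val_vec(B))
```

By construction the deviation scores have mean 50 and standard deviation 10.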