First, let's generate uniform random numbers and plot their distribution.
#Import the library for handling random numbers.
import random
sample_size = 10 #Number of random numbers generated
#Store uniform random numbers in dist (short for distribution)
dist = [random.random() for i in range(sample_size)]
#Check the contents of dist.
dist
#Import a library to illustrate diagrams and graphs.
import matplotlib.pyplot as plt
%matplotlib inline
#Draw a histogram.
plt.hist(dist)
plt.grid()
plt.show()
As the number of random numbers generated increases, the histogram approaches the shape of the "ideal" distribution.
sample_size = 100 #Number of random numbers generated
#Store uniform random numbers in dist
dist = [random.random() for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist)
plt.grid()
plt.show()
sample_size = 1000 #Number of random numbers generated
#Store uniform random numbers in dist
dist = [random.random() for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist)
plt.grid()
plt.show()
sample_size = 10000 #Number of random numbers generated
#Store uniform random numbers in dist
dist = [random.random() for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist)
plt.grid()
plt.show()
sample_size = 100000 #Number of random numbers generated
#Store uniform random numbers in dist
dist = [random.random() for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist)
plt.grid()
plt.show()
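The convergence seen in the histograms above can also be checked numerically. The following is a minimal sketch using only the standard library: it sorts the random numbers into 10 equal-width bins and reports how far the worst bin is from the ideal frequency of 0.1. The helper name `max_bin_deviation` and the fixed seed are assumptions made for illustration, not part of the original notebook.

```python
import random

random.seed(0)  # fixed seed (an assumption) so that the run is repeatable

def max_bin_deviation(sample_size, bins=10):
    """Largest gap between an observed bin frequency and the ideal 1/bins."""
    counts = [0] * bins
    for _ in range(sample_size):
        # random.random() is in [0, 1); min() guards against rounding at the top edge
        index = min(int(random.random() * bins), bins - 1)
        counts[index] += 1
    return max(abs(c / sample_size - 1 / bins) for c in counts)

sizes = [10, 100, 1000, 10000, 100000]
deviations = [max_bin_deviation(n) for n in sizes]
for n, d in zip(sizes, deviations):
    print(n, round(d, 4))
```

The printed deviation should generally shrink as the sample size grows, which is the numerical counterpart of the histogram flattening out.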
A box used for sorting garbage is called a bin, and in a histogram a bin is an interval into which the data are sorted. When drawing a histogram, the display differs depending on how many bins the data are sorted into. If you increase the number of bins, you can see the fine shape of the distribution, but the number of data points that fall into each bin naturally decreases.
sample_size = 100000 #Number of random numbers generated
#Store uniform random numbers in dist
dist = [random.random() for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist, bins=100) #Increase the number of bins
plt.grid()
plt.show()
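To make the trade-off concrete, the counts per bin can be computed directly. This is a rough sketch with a hypothetical helper `count_per_bin` that assumes equal-width bins over [0, 1); `plt.hist` instead spans the minimum to the maximum of the data, so its counts may differ slightly.

```python
import random

random.seed(0)  # fixed seed (an assumption) so that the run is repeatable
sample_size = 100000
dist = [random.random() for i in range(sample_size)]

def count_per_bin(values, bins):
    """Count how many values in [0, 1) fall into each of `bins` equal-width bins."""
    counts = [0] * bins
    for x in values:
        counts[min(int(x * bins), bins - 1)] += 1
    return counts

for bins in (10, 100, 1000):
    counts = count_per_bin(dist, bins)
    # expected count per bin, followed by the smallest and largest observed counts
    print(bins, sample_size / bins, min(counts), max(counts))
```

With more bins the expected count per bin falls, so the relative fluctuation between bins becomes more visible.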
__np.random.binomial(n, p)__ returns the number of odd numbers that appear when you play roulette n times, where each play produces an odd number with probability p (and an even number with probability 1-p). Such a distribution is called a binomial distribution.
Play roulette with equal probability of odd and even numbers 10 times and count the number of odd numbers. Repeat this 10,000 times. What is the probability that odd and even numbers appear the same number of times (5 times each)?
#Import the library of numerical calculations.
import numpy as np
sample_size = 10000 #Number of random numbers generated
#Distribution of the number of odd results when roulette is played n times,
#where each play gives an odd number with probability p (and an even number with probability 1-p)
dist = [np.random.binomial(n=10, p=0.5) for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist, bins=100)
plt.grid()
plt.show()
As you can see from the figure above, when odd and even numbers are equally likely, the probability that they appear the same number of times (5 times each) is about 25% (about 2,500 of the 10,000 trials). You may find this surprisingly small.
Suppose you were watching other guests play roulette at a casino. Odd numbers came up so often that you suspected the roulette wheel was rigged. If it is not rigged, odd and even numbers should appear with equal probability. However, this wheel produced an odd number 60 times out of 100. Is this roulette wheel rigged?
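The simulated figure of about 25% can be compared with the exact binomial probability, C(n, k) p^k (1-p)^(n-k). The sketch below uses only the standard library; the helper name `binomial_pmf` is an assumption introduced for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k odd results in n plays when each play is odd with probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

exact = binomial_pmf(5, 10, 0.5)
print(exact)  # 252 / 1024 = 0.24609375
```

The simulation fluctuates around this value from run to run because it is based on a finite number of trials.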
When you play roulette with equal probability of odd and even numbers 100 times, what is the probability that odd numbers will appear 60 times or more? First, let's draw the distribution.
sample_size = 10000 #Number of random numbers generated
#Distribution of the number of odd results when roulette is played n times,
#where each play gives an odd number with probability p (and an even number with probability 1-p)
dist = [np.random.binomial(n=100, p=0.5) for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist, bins=100)
plt.grid()
plt.show()
Using the same kind of simulation, let's estimate the probability of getting an odd number 60 times or more when playing roulette 100 times.
sample_size = 10000 #Number of random numbers generated
#Distribution of the number of odd results when roulette is played n times,
#where each play gives an odd number with probability p (and an even number with probability 1-p)
dist = [np.random.binomial(n=100, p=0.5) for i in range(sample_size)]
p = sum([1 for n in dist if n >= 60]) / sample_size
print("p-value: %(p)s " %locals())
When a roulette wheel with equal probability of odd and even numbers is played 100 times, the probability that odd numbers appear 60 times or more purely by chance turns out to be less than 5%. In other words, for a wheel that gives an odd number 60 times or more out of 100, it is reasonable to suspect that the wheel is rigged.
The probability p computed here is called the p-value (significance probability).
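As a cross-check on the simulated p-value, the same tail probability can be summed exactly from the binomial formula. This is a minimal sketch with a hypothetical helper `binomial_tail`; it is not part of the original notebook.

```python
from math import comb

def binomial_tail(k_min, n, p):
    """Probability of k_min or more odd results in n plays (one-sided upper tail)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

p_exact = binomial_tail(60, 100, 0.5)
print("exact p-value: %s" % p_exact)
```

The exact value is a little under 3%, so the simulated estimate should land close to it, with some run-to-run scatter.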
For a roulette wheel that produces odd numbers 60 times or more out of 100, it is reasonable to suspect that the wheel is rigged. What if odd numbers appear 6 or more times out of 10? The proportion of odd numbers is the same 60%, but can you still say that the wheel is rigged? Calculate the p-value and answer.
#Exercise 1
It is estimated that 5% of the total population has a certain infectious disease. If 20 people are randomly selected from the total population, how many of them will be affected? This count also follows a binomial distribution. Let's draw the distribution.
sample_size = 10000 #Number of random numbers generated
#Distribution of the number of affected people when n people are sampled
#and each person is affected with probability p (unaffected with probability 1-p)
dist = [np.random.binomial(n=20, p=0.05) for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist, bins=100)
plt.grid()
plt.show()
It is estimated that 5% of the total population has a certain infectious disease. When 100 people were randomly selected from all the inhabitants, more than 10 of them were affected.
(1) Estimate the probability that it will happen by chance.
(2) How should the result be interpreted?
#Exercise 2
__random.normalvariate(mu, sigma)__ is a function that generates random numbers that follow a normal distribution (mu is the mean, sigma is the standard deviation).
A normal distribution with a mean of 0 and a standard deviation of 1 is called a "standard normal distribution". Let's draw a standard normal distribution.
sample_size = 10000 #Number of random numbers generated
dist = [random.normalvariate(mu=0, sigma=1) for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist, bins=100)
plt.grid()
plt.show()
What is the probability that a random number following the standard normal distribution takes a value of 2 or more? Let's calculate.
sample_size = 10000 #Number of random numbers generated
dist = [random.normalvariate(mu=0, sigma=1) for i in range(sample_size)]
p = sum([1 for n in dist if n >= 2]) / sample_size
print("p-value: %(p)s " %locals())
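The simulated probability can be compared with the value obtained from the cumulative distribution function of the standard normal distribution. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` is available in the standard library.

```python
from statistics import NormalDist

# Upper-tail probability P(Z >= 2) for a standard normal variable Z
standard_normal = NormalDist(mu=0, sigma=1)
p_exact = 1 - standard_normal.cdf(2)
print("exact p-value: %s" % p_exact)
```

The result is roughly 2.3%, so with 10,000 samples the simulation should count somewhere around 230 values of 2 or more.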
It is assumed that the "deviation value", which is often used in university entrance exams, follows a normal distribution with a mean of 50 and a standard deviation of 10. Let's draw the distribution. Here, think of the vertical axis as the number of students.
sample_size = 10000 #Number of random numbers generated
#Normal distribution with mean 50 and standard deviation 10
dist = [random.normalvariate(mu=50, sigma=10) for i in range(sample_size)]
#Draw a histogram.
plt.hist(dist, bins=100)
plt.grid()
plt.show()
How many out of 10,000 students have a deviation value of 70 or more?
#Exercise 3
import numpy as np #Library for numerical calculation
import scipy as sp #Scientific calculation library
from scipy import stats #Statistical calculation library
The chi-square test is a method used to test whether two distributions are the same.
A die was rolled 60 times, and the number of times each face came up was counted. The result was as follows.
Dice roll | 1 | 2 | 3 | 4 | 5 | 6 |
---|---|---|---|---|---|---|
Number of occurrences | 17 | 10 | 6 | 7 | 15 | 5 |
Let's test whether these counts follow the theoretical (uniform) distribution.
significance = 0.05
o = [17, 10, 6, 7, 15, 5] #Measured value
e = [10, 10, 10, 10, 10, 10] #Theoretical value
chi2, p = stats.chisquare(o, f_exp = e)
print('chi2 value is %(chi2)s' %locals())
print('The probability is %(p)s' %locals())
if p < significance:
    print('At a significance level of %(significance)s, there is a significant difference' %locals())
else:
    print('At a significance level of %(significance)s, there is no significant difference' %locals())
chi2 value is 12.4
The probability is 0.029699459203520212
At a significance level of 0.05, there is a significant difference
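The chi2 value reported above is the sum of (observed - expected)^2 / expected over all faces of the die. The following sketch recomputes it by hand with plain Python; the variable names `chi2_manual` and `dof` are introduced here for illustration only.

```python
o = [17, 10, 6, 7, 15, 5]        # measured values
e = [10, 10, 10, 10, 10, 10]     # theoretical values

# Chi-square statistic: sum of squared differences scaled by the expected counts
chi2_manual = sum((obs - exp) ** 2 / exp for obs, exp in zip(o, e))

# Degrees of freedom for a goodness-of-fit test with fixed expected counts
dof = len(o) - 1

print("chi2 value (manual): %s" % chi2_manual)
print("degrees of freedom: %s" % dof)
```

Converting this statistic into a probability requires the chi-square distribution with `dof` degrees of freedom, which is the part that `stats.chisquare` handles.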
When the shipping grades of a vegetable grown by method A and method B are as shown in the table below, should we conclude that there is a relationship between the growing method and the product grade?
 | Excellent | Good | Fair | Total |
---|---|---|---|---|
A method | 12 | 30 | 58 | 100 |
B method | 14 | 90 | 96 | 200 |
Total | 26 | 120 | 154 | 300 |
#Exercise 4
#Unpaired t-test
significance = 0.05
X = [68, 75, 80, 71, 73, 79, 69, 65]
Y = [86, 83, 76, 81, 75, 82, 87, 75]
t, p = stats.ttest_ind(X, Y)
print('The t value is %(t)s' %locals())
print('The probability is %(p)s' %locals())
if p < significance:
    print('At a significance level of %(significance)s, there is a significant difference' %locals())
else:
    print('At a significance level of %(significance)s, there is no significant difference' %locals())
The t value is -3.214043146821967
The probability is 0.006243695014300228
At a significance level of 0.05, there is a significant difference
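With its default arguments, `stats.ttest_ind` assumes that the two groups share a common variance. Under that assumption the t value can be reproduced by hand from the pooled variance, as in this standard-library sketch (the names `pooled_var` and `t_manual` are introduced here for illustration).

```python
from math import sqrt
from statistics import mean, variance

X = [68, 75, 80, 71, 73, 79, 69, 65]
Y = [86, 83, 76, 81, 75, 82, 87, 75]
n_x, n_y = len(X), len(Y)

# Pooled (common) variance: sample variances weighted by their degrees of freedom
pooled_var = ((n_x - 1) * variance(X) + (n_y - 1) * variance(Y)) / (n_x + n_y - 2)

# t statistic: difference of the means divided by its standard error
t_manual = (mean(X) - mean(Y)) / sqrt(pooled_var * (1 / n_x + 1 / n_y))
print("t value (manual): %s" % t_manual)
```

The sign is negative simply because the mean of X is lower than the mean of Y.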
The same math test was given to two classes, 6th grade class 1 and 6th grade class 2, and the scores below were obtained. Test whether there is a difference in scores between the two classes.
6th grade class 1 | Score | 6th grade class 2 | Score |
---|---|---|---|
1 | 70 | 1 | 85 |
2 | 75 | 2 | 80 |
3 | 70 | 3 | 95 |
4 | 85 | 4 | 70 |
5 | 90 | 5 | 80 |
6 | 70 | 6 | 75 |
7 | 80 | 7 | 80 |
8 | 75 | 8 | 90 |
class_one = [70, 75, 70, 85, 90, 70, 80, 75]
class_two = [85, 80, 95, 70, 80, 75, 80, 90]
#Exercise 5
#Paired t-test
significance = 0.05
X = [68, 75, 80, 71, 73, 79, 69, 65]
Y = [86, 83, 76, 81, 75, 82, 87, 75]
t, p = stats.ttest_rel(X, Y)
print('The t value is %(t)s' %locals())
print('The probability is %(p)s' %locals())
if p < significance:
    print('At a significance level of %(significance)s, there is a significant difference' %locals())
else:
    print('At a significance level of %(significance)s, there is no significant difference' %locals())
The t value is -2.9923203754253302
The probability is 0.02016001617368161
At a significance level of 0.05, there is a significant difference
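A paired t-test is equivalent to a one-sample t-test on the pairwise differences. The sketch below reproduces the t value above from those differences using only the standard library; `d` and `t_manual` are names introduced for illustration.

```python
from math import sqrt
from statistics import mean, stdev

X = [68, 75, 80, 71, 73, 79, 69, 65]
Y = [86, 83, 76, 81, 75, 82, 87, 75]

# Difference within each pair (same subject measured twice)
d = [x - y for x, y in zip(X, Y)]

# t statistic: mean difference divided by its standard error
t_manual = mean(d) / (stdev(d) / sqrt(len(d)))
print("t value (manual): %s" % t_manual)
```

Because only the differences enter the calculation, variation between subjects drops out, which is why the paired and unpaired tests give different t values for the same numbers.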
Test whether there is a difference between the Japanese and arithmetic scores.
6th grade class 1 | Japanese | Arithmetic |
---|---|---|
1 | 90 | 95 |
2 | 75 | 80 |
3 | 75 | 80 |
4 | 75 | 80 |
5 | 80 | 75 |
6 | 65 | 75 |
7 | 75 | 80 |
8 | 80 | 85 |
kokugo = [90, 75, 75, 75, 80, 65, 75, 80]
sansuu = [95, 80, 80, 80, 75, 75, 80, 85]
#Exercise 6
#One-factor analysis of variance
significance = 0.05
a = [34, 39, 50, 72, 54, 50, 58, 64, 55, 62]
b = [63, 75, 50, 54, 66, 31, 39, 45, 48, 60]
c = [49, 36, 46, 56, 52, 46, 52, 68, 49, 62]
f, p = stats.f_oneway(a, b, c)
print('The f value is %(f)s' %locals())
print('The probability is %(p)s' %locals())
if p < significance:
    print('At a significance level of %(significance)s, there is a significant difference' %locals())
else:
    print('At a significance level of %(significance)s, there is no significant difference' %locals())
The f value is 0.09861516667148518
The probability is 0.9064161716556407
At a significance level of 0.05, there is no significant difference
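The f value is the ratio of the variation between group means to the variation within groups, each divided by its degrees of freedom. The following sketch recomputes it by hand for the three groups above; all intermediate names (`ss_between`, `ss_within`, `f_manual`, and so on) are introduced here for illustration.

```python
from statistics import mean

a = [34, 39, 50, 72, 54, 50, 58, 64, 55, 62]
b = [63, 75, 50, 54, 66, 31, 39, 45, 48, 60]
c = [49, 36, 46, 56, 52, 46, 52, 68, 49, 62]

groups = [a, b, c]
all_values = [x for g in groups for x in g]
grand_mean = mean(all_values)

# Between-group sum of squares: how far each group mean is from the grand mean
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: scatter of the values around their own group mean
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

df_between = len(groups) - 1
df_within = len(all_values) - len(groups)

f_manual = (ss_between / df_between) / (ss_within / df_within)
print("f value (manual): %s" % f_manual)
```

An f value far below 1, as here, means the group means differ less than the scatter within groups would lead one to expect, which is consistent with the large probability reported above.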
Perform an analysis of variance using the data below.
group1 = [80, 75, 80, 90, 95, 80, 80, 85, 85, 80, 90, 80, 75, 90, 85, 85, 90, 90, 85, 80]
group2 = [75, 70, 80, 85, 90, 75, 85, 80, 80, 75, 80, 75, 70, 85, 80, 75, 80, 80, 90, 80]
group3 = [80, 80, 80, 90, 95, 85, 95, 90, 85, 90, 95, 85, 98, 95, 85, 85, 90, 90, 85, 85]
#Exercise 7
Choose one of the following survey results on Twitter and perform a statistical test. Also, consider the results statistically.