Yesterday I explained the "law of large numbers" and the "central limit theorem", two important probability theorems for dealing with large data sets.
Today, let's actually use a computer to simulate the distribution of outcomes over many random trials.
A computer cannot generate truly random numbers, so we use pseudo-random numbers instead. (Properly designed pseudo-random numbers are said to show no bias, even though they are only "pseudo" random.) Random numbers are a very deep subject, and one could study them endlessly.
Here we use NumPy's numpy.random.randint, which returns integers drawn from a discrete uniform distribution.
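As a quick illustration (a sketch; the seed value 0 is an arbitrary choice added only so the output repeats from run to run), `np.random.randint(2, size=n)` draws n integers from {0, 1} with equal probability, which we can read as tails (0) and heads (1):

```python
import numpy as np

# Fix the seed so the same pseudo-random sequence is produced on every run
np.random.seed(0)

# randint(2, ...) draws from {0, 1}; the upper bound 2 is exclusive
tosses = np.random.randint(2, size=10)
print(tosses)

# Each value is 0 (tails) or 1 (heads)
heads = int(np.count_nonzero(tosses))
print("heads:", heads, "tails:", tosses.size - heads)
```

Re-seeding with the same value reproduces exactly the same tosses, which is handy when you want to compare runs.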
Yesterday I explained that, by the central limit theorem, 40,000 coin tosses will practically never produce 20,400 heads. The theorem proves this, but for an engineer it may be hard to accept without actually running it on a computer.
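The claim can be checked with a back-of-the-envelope calculation. The sketch below assumes the normal approximation to the binomial distribution (mean np, standard deviation √(np(1−p))) without a continuity correction; the variable names are illustrative.

```python
import math

n = 40_000        # tosses per experiment
p = 0.5           # probability of heads for a fair coin
repeats = 10_000  # number of experiments

mean = n * p                     # expected number of heads: 20,000
sd = math.sqrt(n * p * (1 - p))  # standard deviation: 100
z = (20_400 - mean) / sd         # 20,400 heads is 4 standard deviations above the mean

# One-sided tail probability P(Z >= z) for a standard normal variable
tail = 0.5 * math.erfc(z / math.sqrt(2))

print(f"mean={mean}, sd={sd}, z={z}")
print(f"P(heads >= 20,400) ~ {tail:.3g}")
print(f"expected count in {repeats} experiments ~ {tail * repeats:.2f}")
```

By symmetry the same probability applies to 19,600 or fewer heads, so in 10,000 experiments we would expect well under one such case on each side.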
So let's simulate it with the following code.
import numpy as np

def coin_toss(lim):
    """Simulate lim coin tosses and return the number of heads"""
    # Store 1 when heads comes up and 0 when tails comes up
    _randomized = np.random.randint(2, size=lim)
    # Count the number of heads (vectorized instead of a Python list comprehension)
    return int(np.count_nonzero(_randomized))

X = []
Y = []
lim = 10000
# Repeat the 40,000-toss experiment 10,000 times
for i in range(lim):
    X.append(i)
    Y.append(coin_toss(lim=40000))
print(X)
print(Y)

_over_lim = [i for i in Y if i >= 20400]
# Number of experiments with 20,400 or more heads
print(len(_over_lim))

_under_lim = [i for i in Y if i <= 19600]
# Number of experiments with 19,600 or fewer heads
print(len(_under_lim))
In other words, each run of this program tosses a coin 400 million times (40,000 × 10,000).
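Counting heads toss by toss is slow at this scale. As an alternative sketch (a different technique from the code above, not the original method), `numpy.random.Generator.binomial(n, p, size)` samples the number of heads in n tosses directly, so all 10,000 experiments can be drawn in one call; the seed is an arbitrary assumption for repeatability.

```python
import numpy as np

rng = np.random.default_rng(seed=1)  # seed chosen arbitrarily for repeatability

# Each element is the number of heads in 40,000 fair coin tosses; 10,000 experiments in total
Y = rng.binomial(n=40_000, p=0.5, size=10_000)

over = int(np.count_nonzero(Y >= 20_400))   # experiments with 20,400 or more heads
under = int(np.count_nonzero(Y <= 19_600))  # experiments with 19,600 or fewer heads
print("mean:", Y.mean(), "std:", Y.std())
print("20,400 or more:", over)
print("19,600 or fewer:", under)
```

The sample mean and standard deviation should come out close to the theoretical 20,000 and 100.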
Here is the result of the first run.
Naively, 40,000 tosses should give heads 20,000 times; the graph plots the number of heads in each of the 10,000 repetitions.
The counts are tightly centered around 20,000. This time, heads never came up 20,400 or more times, nor 19,600 or fewer times.
When I ran the experiment again, heads came up 19,600 or fewer times in only 1 of the 10,000 repetitions.
You can just about make this out in the graph.
This is the result of the third run.
There was only one repetition in which the number of heads barely stayed under 20,400, and only one repetition with fewer than 19,600 heads.
This is the result of the fourth run.
Once again, heads never exceeded 20,400.
As a follow-up, I ran the same experiment again on a different day.
There was one repetition with more than 20,400 heads and one with fewer than 19,600 heads.
Now let's use statistics to check whether the hypothesis that 40,000 coin tosses yield 20,400 heads holds up.
We set up the hypotheses as follows.
Hypothesis | Description |
---|---|
Null hypothesis | In 40,000 coin tosses, heads comes up 20,400 times |
Alternative hypothesis | In 40,000 coin tosses, heads does not come up 20,400 times (heads comes up 20,000 times) |
**Pearson's chi-square test** is the most basic and widely used form of the chi-square test (http://en.wikipedia.org/wiki/%E3%82%AB%E3%82%A4%E4%BA%8C%E4%B9%97%E5%88%86%E5%B8%83). Its test statistic is computed from the observed count $O_i$ and the expected count $E_i$ of each category $i$:

$$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$$
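To see what the formula computes, here is a minimal pure-Python sketch for the two-category case (heads/tails). It relies on the fact that with one degree of freedom the chi-square statistic is the square of a standard normal variable, so the upper-tail probability is erfc(√(χ²/2)); the function name `chi_square_1df` is hypothetical.

```python
import math

def chi_square_1df(observed, expected):
    """Pearson's chi-square statistic and its p-value for two categories (1 degree of freedom)."""
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # For 1 degree of freedom, P(chi2 >= x2) = P(|Z| >= sqrt(x2)) = erfc(sqrt(x2 / 2))
    p = math.erfc(math.sqrt(x2 / 2))
    return x2, p

# 204 heads and 196 tails, compared with 200 : 200 expected for a fair coin
# (204-200)^2/200 + (196-200)^2/200 = 0.08 + 0.08 = 0.16
x2, p = chi_square_1df([204, 196], [200, 200])
print(x2, p)
```

Calling the same function with [20400, 19600] against [20000, 20000] gives a statistic of 16.0; in general, SciPy's `scipy.stats.chisquare` covers any number of categories.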
The chi-square test is easy to implement using SciPy.
Let's start with 400 tosses: can they plausibly give 204 heads? To find out, we test whether the hypothesis that a result of 204 heads and 196 tails came from 400 coin tosses is significant.
# -*- coding:utf-8 -*-
import numpy as np
import scipy.stats

s = 204  # Number of times heads came up
f = 196  # Number of times tails came up
e = 200  # Expected count for each side with a fair coin

# Observed counts (204 : 196)
observed = np.array([s, f])
# Expected counts under a fair coin (200 : 200)
expected = np.array([e, e])

# Perform Pearson's chi-square test
x2, p = scipy.stats.chisquare(observed, expected)
print("Chi-square value is %(x2)s" % locals())
print("Probability is %(p)s" % locals())

# Check whether the p-value is higher than the 0.05 significance level
if p > 0.05:
    print("Significant")
else:
    print("Not significant")
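The result is as follows. Since p ≈ 0.689 is higher than 0.05, the code prints "Significant": a 204 : 196 split can easily happen by chance. (In standard statistical terminology, p > 0.05 means the deviation from 200 : 200 is not statistically significant; the label "Significant" here means the observed result is plausible for a fair coin.)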
Chi-square value is 0.16
Probability is 0.689156516779
Significant
Next, let's try 2,040 heads and 1,960 tails out of 4,000 tosses.
Chi-square value is 1.6
Probability is 0.205903210732
Significant
Still significant: this can easily happen by chance.
Then what about 20,400 heads and 19,600 tails out of 40,000 tosses?
Chi-square value is 16.0
Probability is 6.33424836662e-05
Not significant
The probability is tiny (note the exponent: about 6.3 × 10^-5), so this outcome is very unlikely to happen by chance.
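Why does the same 51 : 49 split go from plausible to implausible? A short derivation from Pearson's formula, assuming n tosses with 0.51n heads and 0.49n tails against n/2 expected for each side, shows that the statistic grows linearly with n:

$$\chi^2 = \frac{(0.51n - 0.5n)^2}{0.5n} + \frac{(0.49n - 0.5n)^2}{0.5n} = \frac{2\,(0.01n)^2}{0.5n} = 0.0004\,n$$

For n = 400, 4,000 and 40,000 this gives 0.16, 1.6 and 16.0, matching the three SciPy results above: the proportion of heads is the same each time, but the larger the sample, the less likely such a deviation is to be chance.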
What about 20,100 heads out of 40,000 tosses?
Chi-square value is 1.0
Probability is 0.317310507863
Significant
Significant again: about 20,100 heads in 40,000 tosses is quite possible.
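To see where the boundary lies for 40,000 tosses, the sketch below scans upward from 20,000 heads and reports the first count whose p-value falls to 0.05 or below. It uses the 1-degree-of-freedom identity p = erfc(√(χ²/2)) instead of SciPy, and the helper name `p_value` is hypothetical.

```python
import math

N = 40_000
expected = N / 2  # 20,000 heads and 20,000 tails for a fair coin

def p_value(heads):
    """Pearson chi-square p-value (1 degree of freedom) for a heads count out of N tosses."""
    tails = N - heads
    x2 = ((heads - expected) ** 2 + (tails - expected) ** 2) / expected
    return math.erfc(math.sqrt(x2 / 2))

# Smallest heads count at or above 20,000 whose p-value is 0.05 or less
threshold = next(h for h in range(int(expected), N + 1) if p_value(h) <= 0.05)
print(threshold, p_value(threshold), p_value(threshold - 1))
```

Under this test, roughly 20,196 or more heads (or, by symmetry, 19,804 or fewer) would be judged unlikely at the 0.05 level, while 20,100 sits comfortably inside the plausible range.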
With a computer we can run large amounts of calculation, perform simulations, and visualize the results. We also saw that a hypothesis test can check whether the observed relative frequencies of events are consistent with an expected frequency distribution.