I have summarized statistical hypothesis testing. I would eventually like to write an article organizing "statistical power" and "effect size", so I wrote this article as a first step toward that. I am not an expert in statistics, so I would appreciate it if you could point out any mistakes.
The following is used as a reference when summarizing the "statistical hypothesis test".
- Statistics Time
- Introduction to Statistics (Basic Statistics I), Department of Statistics, Faculty of Liberal Arts, University of Tokyo
- What is a hypothesis test?
We will organize the flow of statistical hypothesis testing and finally implement it in Python.
Wikipedia explains statistical hypothesis testing as follows.

> Statistical hypothesis testing is a statistical method for testing, based on a sample, a hypothesis about a parameter of the population distribution. The Japanese Industrial Standards define a statistical hypothesis as "a statement about a population parameter or a probability distribution; there are null hypotheses and alternative hypotheses." A test (statistical test) is a statistical procedure for deciding, based on observed values, whether to reject the null hypothesis and support the alternative hypothesis, or not to reject the null hypothesis. The procedure is designed so that the probability of rejecting the null hypothesis even though it actually holds is at most α. This α is called the significance level.
Technical terms such as "null hypothesis", "alternative hypothesis", and "significance level" appear and can be hard to pin down, but I understand the method as carrying out verification by the following logic:

**Assume that a hypothesis is correct; compute the probability of obtaining the actually observed data under that hypothesis; if that probability is small enough, judge that the hypothesis is unlikely to hold.**
The statistical hypothesis test is performed according to the following procedure.
If only the procedure is listed, it is abstract and difficult to understand, so I will explain it with a concrete example.
Suppose we play a coin-toss game five times: you pay 500 yen when the coin lands heads and receive 500 yen when it lands tails. The coin came up heads all five times, and I ended up paying 2,500 yen. Something smells fishy, but can we say this coin is rigged?

Some people might say, "Heads can come up five times in a row by chance," while others might say, "Five heads in a row is suspicious." Statistical hypothesis testing lets us judge such questions objectively.
This time, as the null hypothesis (the hypothesis we want to reject), we take: **this coin is not rigged**. If the coin is fair, the probability of heads is $p = 0.5$, so we write:

Null hypothesis: $H_{0}: p = 0.5$

The alternative hypothesis (the probability of heads is greater than 50%) is:

Alternative hypothesis: $H_{1}: p > 0.5$
This time, we will use the binomial test. (If the sample size is large, the binomial distribution can be approximated by a normal distribution, so other methods can also be used.)
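As an aside, the normal approximation mentioned above can be sketched as follows. The numbers here (60 heads in 100 tosses) are hypothetical, not the article's 5-toss data, since the approximation needs a larger sample:

```python
from math import sqrt

from scipy import stats

# Hypothetical data: 60 heads in 100 tosses (made-up numbers; the 5-toss
# example is too small for the normal approximation).
x, n, p0 = 60, 100, 0.5

# Under H0 the number of heads is approximately N(n*p0, n*p0*(1 - p0)).
z = (x - n * p0) / sqrt(n * p0 * (1 - p0))

# One-sided (upper-tail) p-value from the standard normal distribution.
p_value = stats.norm.sf(z)
print(z, p_value)
```

Here $z = (60 - 50)/\sqrt{25} = 2.0$, and the upper-tail probability is about $0.023$, so the null hypothesis would also be rejected at the 5% level.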
In this test, we set the significance level to **5%**. Under the null hypothesis $H_{0}: p = 0.5$, if the probability of obtaining the observed data is **5% or less**, the null hypothesis is rejected (that is, we conclude that **this coin is rigged**).

Also, since we want to test the suspicion that this coin comes up heads abnormally often, we perform a **one-sided test**: the rejection region lies on one side only.
The statistic called the p-value is the probability, assuming the null hypothesis is correct, of obtaining the observed data or data even more extreme.

Therefore, the p-value in this case is the probability of getting heads 5 times out of 5, assuming the probability of heads is $50\%$.
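Since each toss is independent, this p-value can be computed directly by hand: it is simply $0.5^5$.

```python
# P(heads 5 times in 5 tosses) under H0, where p = 0.5
p_value = 0.5 ** 5
print(p_value)  # 0.03125
```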
Now that we have all the information needed for the hypothesis test, we decide whether to reject the null hypothesis. The significance level of this test was $0.05$ ($5\%$), and the p-value was calculated as $0.03125$.

Since $0.03125 < 0.05$, the null hypothesis is rejected and the alternative hypothesis $H_{1}: p > 0.5$ is adopted.

Therefore, we conclude at the 5% significance level that this coin is rigged.
The above calculation can be done easily in Python. Below is the result of a binomial test using scipy 1.3.1.
```python
from scipy import stats

# x: number of successes in the observed data
# n: number of trials
# p: hypothesized success probability
# alternative: 'two-sided', 'greater', or 'less' (which one-sided test)
# Note: binom_test is deprecated in newer SciPy in favor of stats.binomtest.
p = stats.binom_test(x=5, n=5, p=0.5, alternative='greater')
print(p)
```
The output is shown below: the p-value corresponding to the specified arguments.

```
0.03125
```
Using the above, the binomial test can be performed simply by comparing the p-value against the chosen significance level and making the rejection or acceptance decision. (If you want to perform a different test, use the corresponding method.)
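The decision step itself is just a comparison with the significance level. A minimal sketch (the variable names and messages are my own, reusing the p-value computed above):

```python
alpha = 0.05       # significance level from this article
p_value = 0.03125  # p-value of the binomial test above

if p_value <= alpha:
    decision = "reject the null hypothesis"
else:
    decision = "do not reject the null hypothesis"
print(decision)  # reject the null hypothesis
```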
So far we have performed the hypothesis test only by calculation, but actually drawing the distribution makes it much easier to understand. Let us toss the coin 5 times and draw the distribution of the number of heads.
```python
import math

import matplotlib.pyplot as plt
import numpy as np
%matplotlib inline

def comb_(n, k):
    # number of combinations nCk
    return math.factorial(n) // (math.factorial(n - k) * math.factorial(k))

def binomial_dist(p, n, k):
    # binomial probability P(X = k) for n trials with success probability p
    return comb_(n, k) * (p ** k) * ((1 - p) ** (n - k))

x = np.arange(0, 6, 1)
y = [binomial_dist(0.5, 5, i) for i in x]
plt.bar(x, y, alpha=0.5)
plt.show()
```
The plot above shows the distribution, and you can see at a glance that the probability of heads appearing 5 times is below the significance level of **0.05**.
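As a cross-check (my own addition, not part of the original article), the same probabilities can be obtained from `scipy.stats.binom`, so the hand-rolled `binomial_dist` can be verified against it:

```python
from scipy import stats

# P(X = 5) for X ~ Binomial(n=5, p=0.5): the height of the bar at x = 5
pmf_5 = stats.binom.pmf(5, 5, 0.5)

# P(X >= 5) = survival function at 4: equals the one-sided p-value
p_value = stats.binom.sf(4, 5, 0.5)

print(pmf_5, p_value)  # both 0.03125
```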
NEXT: Next time, I will summarize Type I errors, Type II errors, and statistical power in hypothesis testing.