Introduction to Statistical Hypothesis Testing with statsmodels

Have you ever thought you had mastered a statistical test, only to find that you don't know what to do when you actually decide to use it? Statistical tests present higher hurdles than you might expect. This is true even for people who use statistics all the time: if you don't use a test for a while, the details no longer come to mind immediately.

Therefore, I decided to translate part of Basic Econometrics by Damodar N. Gujarati. I hope it will be helpful. In addition, we will hold a free online study session (linear regression) on June 16, 2020. I hope you will join us.

Statistical inference: hypothesis testing

Estimation and hypothesis testing form the two pillars of classical statistical inference. Having considered the problem of estimation, we now briefly consider the problem of testing statistical hypotheses.

The problem of hypothesis testing can be stated as follows. Suppose we have a random variable X with a known probability density function $f(x; \theta)$, where $\theta$ is a parameter of the distribution. From a randomly drawn sample of size n we obtain the point estimate $\hat{\theta}$. Since the true $\theta$ is rarely known, the question arises: is the estimator $\hat{\theta}$ "compatible" with some hypothesized value of $\theta$, say $\theta = \theta^*$, where $\theta^*$ is a specific numerical value of $\theta$? In other words, could the sample have come from the probability density function $f(x; \theta = \theta^*)$?

In the language of hypothesis testing, $\theta = \theta^*$ is called the null hypothesis and is commonly denoted H0. The null hypothesis is tested against an alternative hypothesis, denoted H1, which might be, for example, $\theta \ne \theta^*$. The null and alternative hypotheses can be simple or composite. A hypothesis is called simple if it specifies the value(s) of the parameter(s) of the distribution; otherwise it is called composite. Thus, if $X \sim N(\mu, \sigma^2)$, then H0: µ = 15 and σ = 2 is a simple hypothesis, whereas H0: µ = 15 and σ > 2 is a composite hypothesis, because no value of σ is specified.

To test the null hypothesis, that is, to test its validity, we use the sample information to obtain what is known as a test statistic. Often this test statistic is the point estimator of the unknown parameter. We then find the sampling (probability) distribution of the test statistic and test the null hypothesis using either confidence intervals or tests of significance. The procedure is explained next.

To understand these ideas, let's return to Example 23, the male height (X) of the population. We have $X_i \sim N(\mu, \sigma^2) = N(\mu, 2.5^2)$, $\bar{X} = 67$, and $n = 100$, and we assume H0: $\mu = \mu^* = 69$ and H1: $\mu \ne 69$. The question is: could the sample with test statistic $\bar{X} = 67$ have come from a population with mean 69? Intuitively, we do not reject the null hypothesis if $\bar{X}$ is "sufficiently close" to $\mu^*$; otherwise we reject it in favor of the alternative hypothesis. But how do we decide that $\bar{X}$ is "sufficiently close" to $\mu^*$? There are two approaches: (1) the confidence interval and (2) the test of significance. Both lead to the same conclusion.

Confidence interval method

Since $X_i \sim N(\mu, \sigma^2)$, the test statistic $\bar{X}$ is distributed as $\bar{X} \sim N(\mu, \sigma^2/n)$.

Since the probability distribution of $\bar{X}$ is known, why not establish, say, a 100(1 − α)% confidence interval for µ based on $\bar{X}$ and check whether this confidence interval includes µ = µ*?

If it does, we cannot reject the null hypothesis; if it does not, we may reject it.

Therefore, if α = 0.05 we have a 95% confidence interval, and if this interval includes µ* we do not reject the null hypothesis: 95 out of every 100 intervals so established are likely to include µ*.

The actual procedure is as follows. Since $\bar{X} \sim N(\mu, \sigma^2/n)$, it follows that

Z_i = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)

That is, $Z_i$ is a standard normal variable.

Next, since this is a two-sided test, the critical z value is obtained from the norm.ppf function at 1 − 0.05/2 = 0.975.

from scipy.stats import norm
norm.ppf(0.975) #Percent point function (inverse of cdf — percentiles).

1.959963984540054

We obtain $\Pr(-1.96 \le Z_i \le 1.96) = 0.95$. In other words,

\Pr\left(-1.96 \le \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \le 1.96\right) = 0.95

which can be rearranged as

\Pr\left(\bar{X} - 1.96\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + 1.96\frac{\sigma}{\sqrt{n}}\right) = 0.95

This is the 95% confidence interval for µ. Once this interval is established, testing the null hypothesis is simple: we check whether µ = µ* lies in the interval. If it does, we cannot reject the null hypothesis; if it does not, we may reject it.

In this example, the 95% confidence interval for µ is 66.51 ≤ µ ≤ 67.49, which clearly does not include µ = 69. Therefore, with a 95% confidence coefficient, we reject the null hypothesis that the true µ is 69. Geometrically, the situation is depicted in Figure A.12. In the language of hypothesis testing, the established confidence interval is called the region of acceptance, and the area outside it is called the critical region, or region of rejection, of the null hypothesis. The lower and upper limits of the acceptance region (which separate it from the rejection region) are called critical values. In this terminology, the null hypothesis cannot be rejected if the hypothesized value lies within the acceptance region; otherwise it can be rejected.
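As a quick check, this interval can be reproduced with scipy; the following is a minimal sketch using the example's numbers (known σ = 2.5, n = 100).

import numpy as np
from scipy.stats import norm

xbar, sigma, n = 67, 2.5, 100        # sample mean, known sigma, sample size
z = norm.ppf(0.975)                  # two-sided 5% critical value, about 1.96
half = z * sigma / np.sqrt(n)        # margin of error, about 0.49
print(xbar - half, xbar + half)      # 66.51 <= mu <= 67.49, so 69 lies outside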

Two types of error can occur when deciding whether to reject H0: (1) we may reject H0 even though it is actually true; this is called a type I error (thus, in the example above, $\bar{X} = 67$ could have come from a population with mean 69); or (2) we may fail to reject H0 even though it is actually false; this is called a type II error. Hypothesis testing therefore does not determine the true value of µ; it only provides a means of deciding whether µ = µ*.

(Figure A.12)

Ideally, we would like to minimize both type I and type II errors. Unfortunately, for any given sample size, it is not possible to minimize both errors simultaneously. The classical approach to this problem, embodied in the work of Neyman and Pearson, is that in practice a type I error is likely to be more serious than a type II error.

Therefore, we try to keep the probability of a type I error at a fairly low level, such as 0.01 or 0.05, and then minimize the probability of a type II error as much as possible. In the literature, the probability of a type I error is denoted α and is called the level of significance, and the probability of a type II error is denoted β. The probability of not committing a type II error is called the power of the test; in other words, power is the ability to reject a false null hypothesis. The classical methodology of hypothesis testing is to fix α at a level such as 0.01 (1 percent) or 0.05 (5 percent) and then maximize the power, that is, minimize β. It is important for the reader to understand the concept of power, which is explained with an example below.

Let $X \sim N(\mu, 100)$; that is, X is normally distributed with mean µ and variance 100. Suppose α = 0.05 and we have a sample of 25 observations with sample mean $\bar{X}$. Assume also H0: µ = 50. Since X follows a normal distribution, so does the sample mean: $\bar{X} \sim N(\mu, 100/25)$. Therefore, under the null hypothesis that µ = 50, the 95% confidence interval for $\bar{X}$ is $\mu \pm 1.96\sqrt{100/25} = \mu \pm 3.92$, or (46.08, 53.92). The rejection region therefore consists of all values of $\bar{X}$ below 46.08 or above 53.92: if the sample mean falls in that region, the null hypothesis that the true mean is 50 is rejected.

But if the true value of µ differs from 50, what is the probability that $\bar{X}$ falls in the rejection region just described? Suppose there are three alternative hypotheses: µ = 48, µ = 52, and µ = 56. If any of these alternatives is true, it is the actual mean of the distribution of $\bar{X}$. Since $\sigma^2$ is assumed to be 100, the standard error is the same for all three alternatives.

The shaded areas in Figure A.13 show the probability that $\bar{X}$ falls in the rejection region when each alternative hypothesis is true. These probabilities are 0.17 (for µ = 48), 0.05 (for µ = 50), 0.17 (for µ = 52), and 0.85 (for µ = 56). As the figure shows, whenever the true value of µ differs considerably from the hypothesized value (here µ = 50), the probability of rejecting the hypothesis is high, but when the true value differs little from the hypothesized value, the probability of rejection is small. Intuitively this makes sense, since in that case the null and alternative distributions overlap very closely.

(Figure A.13)

import numpy as np
from scipy.stats import norm

# probability that the sample mean falls in the rejection region
# (roughly X-bar < 46.1 or X-bar > 53.9) under each true mean; se = sqrt(100/25) = 2
print(norm.cdf((46.1-48)/np.sqrt(100/25)))
print(norm.cdf((46.1-50)/np.sqrt(100/25))+1-norm.cdf((53.9-50)/np.sqrt(100/25)))
print(1-norm.cdf((53.9-52)/np.sqrt(100/25)))
print(1-norm.cdf((53.9-56)/np.sqrt(100/25)))
0.171056126308482
0.051176119043277346
0.171056126308482
0.8531409436241042

This can be understood further from Figure A.14, which is known as the graph of the power function; the curve shown there is called the power curve. The reader will now notice that the confidence coefficient (1 − α) mentioned above is simply 1 minus the probability of a type I error. A 95% confidence coefficient means we are willing to accept at most a 5% probability of a type I error; that is, we do not want to reject the true hypothesis more than 5 times out of 100. (Figure A.14)

%matplotlib inline
import matplotlib.pyplot as plt
# power curve: probability of rejecting H0 as a function of the true mean
true_means = range(40, 61, 1)
rejection_prob = []
for mu in true_means:
    rejection_prob.append(norm.cdf((46.1-mu)/np.sqrt(100/25))+1-norm.cdf((53.9-mu)/np.sqrt(100/25)))
plt.plot(true_means, rejection_prob)

The p-value, or exact level of significance

Instead of preselecting α at some level such as 1, 5, or 10 percent, one can compute the p (probability) value, or exact level of significance, of the test statistic. The p-value is defined as the lowest significance level at which the null hypothesis can be rejected.

Suppose that in an application we obtain a t value of 3.552 with 20 degrees of freedom. The p-value, the exact probability of obtaining a t value of 3.552 or greater, can be seen from Table D.2 to be about 0.001 (one-tailed) or 0.002 (two-tailed). The observed t value of 3.552 is thus statistically significant at the 0.001 or 0.002 level, depending on whether we use a one- or two-tailed test. Many statistical packages output the p-value of the test statistic, so readers are encouraged to report p-values whenever possible.
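The quoted values can be checked with scipy's t distribution; a quick sketch (t.sf gives the upper-tail probability):

from scipy.stats import t

p_one = t.sf(3.552, df=20)    # one-tailed: P(T >= 3.552), about 0.001
print(p_one, 2 * p_one)       # two-tailed p-value, about 0.002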

Significance test method

Z_i = \frac{\bar{X}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)

Recall that $\bar{X}$ and n are known (or can be obtained from the sample), but the true µ and σ are unknown. However, if σ is specified and we assume (under H0) that µ = µ* (a specific number), then $Z_i$ can be computed directly, and a quick look at the normal distribution table gives the probability of obtaining the computed Z value. If this probability is small, say less than 5% or 1%, we can reject the null hypothesis: if the hypothesis were true, the probability of obtaining the particular Z value would be very small. This is the general idea behind the test-of-significance approach to hypothesis testing. The key idea is the test statistic (here the Z statistic) and its probability distribution under the assumed value µ = µ*. Appropriately, this test is known as the Z test, since we use the Z (standard normal) value in this case.

Z = \frac{\bar{X}-\mu^*}{\sigma/\sqrt{n}} = \frac{67-69}{2.5/\sqrt{100}} = -8

Looking at normal distribution table D.1, we see that the probability of obtaining such a Z value is extremely small. (Note: the probability of Z exceeding 3 or −3 is about 0.001, so the probability of it exceeding 8 in absolute value is even smaller.) Therefore we can reject the null hypothesis that µ = 69: given that value, it is very unlikely that $\bar{X}$ would be 67, so it is doubtful that the sample came from a population with mean 69. The situation is depicted in Figure A.15.

(Figure A.15)

In general, a test statistic is said to be statistically significant if the probability of obtaining it is less than or equal to α, the probability of committing a type I error. Thus for α = 0.05 we know that the probability of obtaining a Z value of −1.96 or +1.96 is 5 percent (2.5 percent in each tail of the standard normal distribution). In our example Z was −8, so the probability of obtaining such a Z value is much smaller than 2.5 percent, well below the prespecified probability of committing a type I error. Therefore the computed value Z = −8 is statistically significant; that is, we reject the null hypothesis that the true µ* is 69. Of course, we reached the same conclusion using the confidence-interval approach.
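A quick numerical check of this Z value and its two-sided p-value, using the example's numbers:

import numpy as np
from scipy.stats import norm

z = (67 - 69) / (2.5 / np.sqrt(100))   # = -8
p = 2 * norm.cdf(z)                    # two-sided p-value, essentially zero
print(z, p)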

Here is a summary of the steps for statistical hypothesis testing.

Step 1. State the null hypothesis H0 and the alternative hypothesis H1 (e.g., H0: µ = 69 and H1: µ $\ne$ 69).
Step 2. Select a test statistic (e.g., $\bar{X}$).
Step 3. Determine the probability distribution of the test statistic (e.g., $\bar{X} \sim N(\mu, \sigma^2/n)$).
Step 4. Choose the level of significance α (i.e., the probability of a type I error).
Step 5. Using the probability distribution of the test statistic, establish a 100(1 − α)% confidence interval. If the parameter value under the null hypothesis (e.g., µ = µ* = 69) lies in this confidence interval (the acceptance region), do not reject the null hypothesis. If it lies outside the interval (i.e., in the rejection region), you may reject it. Note that in rejecting the null hypothesis you run an α percent risk of being wrong.
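As a minimal sketch of these steps with statsmodels, the following uses simulated data in place of the height sample; note that ztest estimates σ from the sample rather than using the known σ, so the result differs slightly from the textbook Z of −8.

import numpy as np
from statsmodels.stats.weightstats import ztest

rng = np.random.default_rng(0)
x = rng.normal(loc=67, scale=2.5, size=100)   # simulated sample with true mean 67

z, p = ztest(x, value=69)   # H0: mu = 69, two-sided alternative by default
print(z, p)                 # large |z| and a tiny p-value, so reject H0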

This concludes the brief translation from Basic Econometrics.

About the rejection of the null hypothesis

What it means when the null hypothesis is rejected

In a hypothesis test, we choose either the null or the alternative hypothesis as a statistical decision based on the sample. If the alternative hypothesis is chosen, the probability that this choice is wrong is guaranteed to be at most $\alpha$. In that sense, the alternative hypothesis is strongly supported.

What it means when the null hypothesis is not rejected

Failing to reject the null hypothesis is not a reason to actively support it; there was simply insufficient evidence to reject it.

Type I error

A type I error is rejecting the null hypothesis when it is actually true. Its probability is denoted $\alpha$.

Type II error

A type II error is failing to reject the null hypothesis when it is false, that is, failing to reject it when the alternative hypothesis is true. Its probability is denoted $\beta$; when the alternative hypothesis is true, the probability of making the correct decision is $1-\beta$.

Significance level

Ideally, both type I and type II error probabilities would be small at the same time. However, the two are in a trade-off relationship, so in general we try to reduce the type II error while keeping the type I error at or below a fixed level $\alpha$. This $\alpha$ is called the significance level and is the probability of a type I error.

| | $H_0$ is true ($H_1$ is false) | $H_1$ is true ($H_0$ is false) |
|---|---|---|
| Reject $H_0$ | Type I error ($\alpha$) | Correct decision ($1-\beta$) |
| Do not reject $H_0$ | Correct decision ($1-\alpha$) | Type II error ($\beta$) |
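These rates can also be seen empirically. Below is a small Monte Carlo sketch under an assumed setup (not from the text): N(µ, 1) data with n = 25, H0: µ = 0, two-sided z test.

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, alpha, reps = 25, 0.05, 20000
z_crit = norm.ppf(1 - alpha / 2)

def rejection_rate(mu):
    # fraction of simulated samples whose z statistic lands in the rejection region
    x = rng.normal(mu, 1, size=(reps, n))
    z = x.mean(axis=1) / (1 / np.sqrt(n))
    return (np.abs(z) > z_crit).mean()

print(rejection_rate(0.0))      # about 0.05 = alpha (type I error rate)
print(1 - rejection_rate(0.5))  # beta when the true mu is 0.5; power is 1 - beta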

Power of the test

Now consider tossing a coin 20 times. If heads comes up 15 times, is the coin biased?

The null hypothesis is that the coin is not biased.

Null hypothesis $H_0$: $p = 0.5$; alternative hypothesis $H_1$: $p \ne 0.5$.

First, assume the coin is fair and find the probability that heads appears 15 times out of 20: $P[X=k]=\frac{n!}{k!(n-k)!}p^k (1-p)^{n-k}=\frac{20!}{15!(20-15)!}0.5^{15}0.5^{5} =0.015$. This is the probability of exactly 15 heads with an unbiased coin.

from math import factorial
a=factorial(20)/factorial(15)/factorial(20-15)   # binomial coefficient C(20, 15)
b=a*0.5**15*0.5**5                               # P(X = 15) for a fair coin
print(a,b)

15504.0 0.0147857666015625

However, 15 heads is not the only outcome at least this extreme; we must also include 16, 17, 18, 19, and 20 heads. $P[X=16]=\frac{20!}{16!(20-16)!}0.5^{16}0.5^{4} =0.005$, $P[X=17]=\frac{20!}{17!(20-17)!}0.5^{17}0.5^{3} =0.001$, $P[X=18]=\frac{20!}{18!(20-18)!}0.5^{18}0.5^{2} =0.0001$, and so on.

p=0.5
sump=0
# upper tail: i is the number of tails, so b = P(exactly 20 - i heads)
for i in range(6):
    a=factorial(20)/factorial(20-i)/factorial(i)
    b=a*p**(20-i)*(1-p)**i
    sump+=b
    print(i,a,b)
print("total",sump)

0 1.0 9.5367431640625e-07
1 20.0 1.9073486328125e-05
2 190.0 0.0001811981201171875
3 1140.0 0.001087188720703125
4 4845.0 0.004620552062988281
5 15504.0 0.0147857666015625
total 0.020694732666015625

The probabilities of 15 or more heads sum to about 2%.

But that is not the whole story. If heads can come up 15 times, it could equally come up only 5 times; the two-sided test must also include 0 to 5 heads.

p=0.5
sump=0
# lower tail (0 to 5 heads); by symmetry at p = 0.5 this equals the upper tail
for i in range(15,21):
    a=factorial(20)/factorial(20-i)/factorial(i)
    b=a*p**(20-i)*(1-p)**i
    sump+=b
    print(i,a,b)
print("total",sump)

15 15504.0 0.0147857666015625
16 4845.0 0.004620552062988281
17 1140.0 0.001087188720703125
18 190.0 0.0001811981201171875
19 20.0 1.9073486328125e-05
20 1.0 9.5367431640625e-07
total 0.020694732666015625

Therefore, the two tails together come to about 4%.

Since 4.1% is below the 5% significance level, 15 heads falls in the rejection region, and the null hypothesis that the coin is unbiased can be rejected. To understand the power of this test, let us next assume that the true probability of heads is 0.6. $P[X=15]=\frac{n!}{k!(n-k)!}p^k (1-p)^{n-k}=\frac{20!}{15!(20-15)!}0.6^{15}0.4^{5} =0.075$ $P[X=16]=\frac{20!}{16!(20-16)!}0.6^{16}0.4^{4} =0.035$ $P[X=17]=\frac{20!}{17!(20-17)!}0.6^{17}0.4^{3} =0.012$ $P[X=18]=\frac{20!}{18!(20-18)!}0.6^{18}0.4^{2} =0.003$

The probability of 15 or more heads is then about 12.6%. Let us compute the other tail as well.

p=0.6
sump=0
# upper tail (15 or more heads) under p = 0.6
for i in range(6):
    a=factorial(20)/factorial(20-i)/factorial(i)
    b=a*p**(20-i)*(1-p)**i
    sump+=b
    print(i,a,b)
print("total",sump)
print()
# add the lower tail (5 or fewer heads) under p = 0.6
for i in range(15,21):
    a=factorial(20)/factorial(20-i)/factorial(i)
    b=a*p**(20-i)*(1-p)**i
    sump+=b
    print(i,a,b)
print("two tails total",sump)

0 1.0 3.6561584400629733e-05
1 20.0 0.00048748779200839646
2 190.0 0.003087422682719845
3 1140.0 0.01234969073087938
4 4845.0 0.03499079040415825
5 15504.0 0.07464701952887093
total 0.12559897272303744

15 15504.0 0.0012944935222876579
16 4845.0 0.00026968615047659537
17 1140.0 4.230370987868163e-05
18 190.0 4.700412208742404e-06
19 20.0 3.2985348833280036e-07
20 1.0 1.0995116277760013e-08
two tails total 0.12721049736649373

In other words, under the null hypothesis there is about a 4% chance of an outcome as extreme as 0, 1, 2, 3, 4, 5, 15, 16, 17, 18, 19, or 20 heads. If we do the same computation with $p_0 = 0.6$, i.e., a biased coin, the probability is 12.72%. Let us plot the two distributions side by side for a better picture. The larger the area of the orange distribution to the right of the blue vertical line compared with that of the blue distribution, the further right the orange distribution lies, and the more likely it is that the coin would be judged biased.

%matplotlib inline
import matplotlib.pyplot as plt
# binomial pmfs for a fair coin (p = 0.5) and a biased coin (p = 0.6)
p05=[]
p06=[]
ii=range(21)
for i in ii:
    a=factorial(20)/factorial(20-i)/factorial(i)
    p05.append(a*0.5**(20-i)*0.5**i)
    p06.append(a*0.4**(20-i)*0.6**i)
plt.plot(ii,p05,label='p=0.5',linestyle='--')
plt.plot(ii,p06,label='p=0.6')
plt.axvline(x=15)   # observed number of heads
plt.legend()


Let's do the same calculation using the normal approximation.

Null hypothesis

H_0: p_0=0.5

Then the rejection region is given by

|X-np_0|=|X-20 \cdot 0.5|=|X-10|>c

for some critical value c. At a significance level of 5%, c is determined by

P(|X-n \cdot p_0|\ge c \mid p=p_0)=0.05

If $X - n p_0 > 0$, the condition becomes $X - n p_0 > c$; if $X - n p_0 < 0$, it becomes $X - n p_0 < -c$. Together, these define the rejection region.

If the null hypothesis is correct, $X$ approximately follows a normal distribution with mean $n p_0$ and variance $n p_0(1-p_0)$, so that $z=\frac{X-p \cdot n}{\sqrt{p(1-p)n}}$ follows a normal distribution with mean 0 and variance 1. Hence

|X-n \cdot p_0|\ge 1.96 \sqrt{n \cdot p_0(1-p_0)}=1.96 \sqrt{20 \cdot 0.5 \cdot 0.5 }=4.4

Therefore, if $X - n p_0 > 0$ we need $X - n p_0 > 4.4$, and if $X - n p_0 < 0$ we need $X - n p_0 < -4.4$. With $n p_0 = 10$, the rejection region is $X < 5.6$ or $X > 14.4$.
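As a cross-check, the exact binomial probability of this rejection region under the null can be computed with scipy; a quick sketch (the integer boundaries 5 and 15 correspond to X < 5.6 and X > 14.4):

from scipy.stats import binom

# P(X <= 5) + P(X >= 15) for a fair coin, n = 20
p_reject = binom.cdf(5, 20, 0.5) + binom.sf(14, 20, 0.5)
print(p_reject)   # about 0.041, close to the nominal 5%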

If $p_0 = 0.6$, we only need to find the probabilities corresponding to

z=\frac{14.4-0.6 \cdot 20}{\sqrt{0.6\cdot0.4\cdot 20}}=1.095, \qquad z=\frac{5.6-0.6 \cdot 20}{\sqrt{0.6\cdot0.4\cdot 20}}=-2.92

# standardize the rejection boundaries under the true p = 0.6
a=(14.4-0.6*20)/np.sqrt(0.6*0.4*20)
b=(5.6-0.6*20)/np.sqrt(0.6*0.4*20)
print(a,b)
# tail probabilities beyond each boundary
a=1-norm.cdf((14.4-0.6*20)/np.sqrt(0.6*0.4*20))
b=norm.cdf((5.6-0.6*20)/np.sqrt(0.6*0.4*20))
print(a,b,a+b)

1.0954451150103324 -2.921186973360886
0.13666083914614902 0.001743502446070691 0.1384043415922197

The probability is now 13.8%. That is, if the true probability of heads is 0.6, the probability that the null hypothesis is rejected is 13.8%, compared with the 5% that holds when the null hypothesis is true.
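The exact binomial counterpart of this power calculation, same sketch as before but now under p = 0.6:

from scipy.stats import binom

# P(X <= 5) + P(X >= 15) when the true p is 0.6
p_power = binom.cdf(5, 20, 0.6) + binom.sf(14, 20, 0.6)
print(p_power)   # about 0.127, close to the 13.8% from the normal approximation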

Power of the test

Writing the probability of a type II error as $\beta$, the probability of correctly deciding that $H_1$ is true is $1-\beta$. This $1-\beta$ is called the power of the test. The important point is that the probability of accepting the null hypothesis when it is true, $1-\alpha$, is not the same as the probability of rejecting the null hypothesis when the alternative is true, $1-\beta$. To find $\beta$, you must specify a concrete alternative value $\theta_1$, as in $H_1: \theta = \theta_1$.

To decide in favor of $H_1$ when it is in fact true, $H_0$ must be rejected. Moreover, $H_1$ must be specified as a specific value $\theta_1$ rather than as $\theta \ne \theta_0$; the type II error probability $\beta$ is then defined with respect to that value.

In the previous example, the power was 13.8%. But what if we want higher power?

The power function

Estimating the required sample size

As we saw above, with 20 observations the power is low. Under what conditions can the power be raised to about 90%?

Assuming the coin is fair, the rejection region at the 5% significance level with 20 observations is

|X-np_0|=|X-20 \cdot 0.5|=|X-10|>c

for some critical value c. Now replace the number of observations with $n$. Since $n$ is expected to be large, we approximate the binomial distribution by a normal distribution.

from scipy.stats import norm
norm.ppf(0.025)

-1.9599639845400545

Then the rejection region becomes

|X-np_0|=|X-0.5n|\ge 1.96 \cdot 0.5 \sqrt{n}=0.98\sqrt{n}

First, consider

P(|X-np_0|=|X-0.5n|> 0.98\sqrt{n})

which is the probability that the null hypothesis is rejected.

Next, suppose the coin is actually biased, with probability $p$ of heads and $1-p$ of tails. Then

z=\frac{X-p \cdot n}{\sqrt{p(1-p)n}}

follows a normal distribution with mean 0 and variance 1.

What is the probability that $X$ falls in the rejection region when $p = 0.7$? We need

P(|X-n0.5|\ge 0.98\sqrt{n}|p=0.7,1-p=0.3)

to be at least 90%.

Standardizing under the true $p$ (subtract $pn$ from $X$ and divide by $\sqrt{p(1-p)n}$), the condition becomes

P\left(\frac{X-p \cdot n}{\sqrt{p(1-p)n}}>\frac{(0.5-p)\sqrt{n}+0.98}{\sqrt{p(1-p)}}\right)+ P\left(\frac{X-p \cdot n}{\sqrt{p(1-p)n}}<\frac{(0.5-p)\sqrt{n}-0.98}{\sqrt{p(1-p)}}\right)>0.9

When $p = 0.7$, $X$ approximately follows a normal distribution with mean $0.7n$ and standard deviation $\sqrt{0.7 \cdot 0.3 \cdot n} = 0.46\sqrt{n}$.

Now, let's actually find the required sample size. Plugging in the numbers gives $P\left(\frac{X-0.7 \cdot n}{0.46\sqrt{n}}>2.13-0.43\sqrt{n}\right)+P\left(\frac{X-0.7 \cdot n}{0.46\sqrt{n}}<-2.13-0.43\sqrt{n}\right)$, where the second term is negligible.

norm.ppf(0.9)

1.2815515655446004

Setting the dominant first term to at least 0.9 gives $-2.13 + 0.43\sqrt{n} \ge 1.282$, so 90% power is achieved when $n$ is about 63.

((1.282+2.13)/0.43)**2

62.9623796646836
%matplotlib inline
import matplotlib.pyplot as plt
# binomial pmfs for p = 0.5 and p = 0.7 with n = 61, close to the required sample size
p05=[]
p07=[]
n=61
ii=range(1,n+1)
for i in ii:
    a=factorial(n)/factorial(n-i)/factorial(i)
    p05.append(a*0.5**(n-i)*0.5**i)
    p07.append(a*0.3**(n-i)*0.7**i)
plt.plot(ii,p05,label='p=0.5',linestyle='--')
plt.plot(ii,p07,label='p=0.7')
plt.axvline(x=39)   # smallest integer in the rejection region (0.5n + 0.98*sqrt(n) is about 38.2)
plt.legend()


The area of the orange distribution to the right of the blue vertical line is now about 90%. In other words, the area of the orange distribution to the right of the vertical line is $1-\beta$, and the part to the left is $\beta$.

More generally

For significance level $\alpha$, let $z^\prime$ be the realized value of the test statistic $z=\frac{\bar{x}-\mu_0}{\sigma/\sqrt{n}}$.

If $z^\prime \ge z_\alpha$, then $H_0$ is rejected; if $z^\prime < z_\alpha$, the evidence is not sufficient to reject $H_0$.

H_0:\mu = \mu_0
H_1:\mu \ne \mu_0
Suppose the true mean is $\mu = \mu_1$ (with $\mu_1$ known). Then the test statistic under $H_0$ or $H_1$ is normally distributed with

E(z|H_0)=0
E(z|H_1)=\sqrt{n}\delta
V(z|H_0)=V(z|H_1)=1

where $\frac{\mu_1-\mu_0}{\sigma/\sqrt{n}}=\sqrt{n}\frac{\mu_1-\mu_0}{\sigma}=\sqrt{n}\delta$. Transforming to $z^\star = z-\sqrt{n}\delta$ gives $z^\star \sim N(0,1)$, so that

P(z\le z_\alpha|H_1)=P(z^\star\le z_\alpha-\sqrt{n}\delta|z^\star\sim N(0,1))
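Wrapping this up as a small helper function; this is a sketch written for this article (not an existing statsmodels API), for a two-sided test with known σ, including the usually negligible far tail:

import numpy as np
from scipy.stats import norm

def power_one_sample_z(mu0, mu1, sigma, n, alpha=0.05):
    # power of a two-sided one-sample z test with known sigma
    delta = (mu1 - mu0) / sigma
    shift = np.sqrt(n) * delta
    z_crit = norm.ppf(1 - alpha / 2)
    return norm.sf(z_crit - shift) + norm.cdf(-z_crit - shift)

# reproduces the ~0.85 power for mu = 56 in the earlier example (sigma = 10, n = 25)
print(power_one_sample_z(mu0=50, mu1=56, sigma=10, n=25))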

Now let us test whether a stock's rate of return is zero. Suppose the volatility and the annual rate of return are known to be 20% and 20%. How many observations are needed to achieve 90% power at the 5% significance level?

H_0:\mu=\mu_0=0
H_1:\mu \ne \mu_0

First, convert the annual volatility to a daily standard deviation.

import numpy as np
0.2/np.sqrt(250)

0.012649110640673518

If the annual return $\mu_1$ is 20%, the daily return is

0.20/250

0.0008
norm.ppf(0.95)

1.6448536269514722

z^\star=z_\alpha-\sqrt{n}\delta=1.645-\sqrt{n}0.0008/0.0126

0.0008/0.0126

0.06349206349206349
norm.ppf(0.9)

1.2815515655446004

(1.645+1.28)/0.063

46.42857142857143

\sqrt{n}=\frac{1.645+1.28}{0.063}=46

46**2

2116

That is, you would need almost 10 years of daily data.

So what happens if the annual rate of return is 90% instead?

v=0.2        # annual volatility
r=0.9        # annual rate of return
a=0.05       # significance level
power=0.9    # target power
v1=v/np.sqrt(250)   # daily standard deviation
r1=r/250            # daily return
x=norm.ppf(1-a)     # z_alpha
y=norm.ppf(power)   # z value for the target power
((x+y)/(r1/v1))**2  # required number of observations

105.7265105020738

In that case, only about 100 days of data are needed.

β


The power can be increased by widening the difference in $p$ (the effect size) or by increasing the number of observations.

Is the null hypothesis difficult?

The null hypothesis was explained in the translated part of BASIC ECONOMETRICS above, but it is often the case that the role of a particular statistical test is well known while its null hypothesis remains somewhat obscure.

Jarque–Bera test

This is one of the goodness-of-fit tests for the normal distribution and uses skewness and kurtosis. Under the null hypothesis, skewness is zero and kurtosis is 3; in other words, the data follow a normal distribution.

Let's try it on normally distributed data.

import numpy as np
import statsmodels.api as sm
import statsmodels.stats.api as sms
from statsmodels.compat import lzip
nsample=100
name = ['Jarque-Bera', 'Chi^2 two-tail prob.', 'Skew', 'Kurtosis']
e = np.random.normal(size=nsample)
test = sms.jarque_bera(e)
lzip(name, test)

[('Jarque-Bera', 0.12532827370844277),
 ('Chi^2 two-tail prob.', 0.9392588831621588),
 ('Skew', 0.061591593745002705),
 ('Kurtosis', 2.877915242516404)]

There is not enough evidence to reject the null hypothesis; we cannot deny that these random numbers follow a normal distribution. Next, let's try a uniform distribution.

ee=np.random.rand(nsample)
test = sms.jarque_bera(ee)
lzip(name, test)

[('Jarque-Bera', 6.766980830340755),
 ('Chi^2 two-tail prob.', 0.033928822141024814),
 ('Skew', -0.13374676721016596),
 ('Kurtosis', 1.7539973481869713)]

This time the null hypothesis is rejected (p ≈ 0.034): the random numbers do not follow a normal distribution.

Dickey–Fuller test

The DF (ADF) test involves a composite hypothesis. The null hypothesis is that the AR model has a unit root. There are three variants:

  1. Random walk without drift: $\Delta Y_t = \delta Y_{t-1} + u_t$
  2. Random walk with drift: $\Delta Y_t = \beta_1 + \delta Y_{t-1} + u_t$
  3. Random walk with drift around a deterministic time trend: $\Delta Y_t = \beta_1 + \beta_2 t + \delta Y_{t-1} + u_t$

In each case, $H_0: \delta=0$ and $H_1: \delta<0$.

Trend stationary

from statsmodels.tsa.stattools import adfuller
import pandas as pd
index=['ADF Test Statistic','P-Value','# Lags Used','# Observations Used']

nsample = 100
y_true = np.linspace(0, 0.2*nsample/260, nsample)   # purely deterministic linear trend
y = y_true   # no stochastic component
plt.plot(y)
adfTest = adfuller(y, autolag='AIC')
dfResults = pd.Series(adfTest[0:4], index)
for key,value in adfTest[4].items():
    dfResults['Critical Value (%s)'%key] = value
print('Augmented Dickey-Fuller Test Results:')
print(dfResults)

Augmented Dickey-Fuller Test Results:
ADF Test Statistic       0.689710
P-Value                  0.989630
# Lags Used             11.000000
# Observations Used     88.000000
Critical Value (1%)     -3.506944
Critical Value (5%)     -2.894990
Critical Value (10%)    -2.584615


Although the generated series is trend-stationary (a deterministic trend with no stochastic component), the null hypothesis of a unit root was not rejected.

In such cases, the DF (ADF) test should be performed after removing the trend (or with a trend term included in the test regression).
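A minimal sketch of detrending before testing, on hypothetical data similar to the example above (equivalently, passing regression='ct' to adfuller includes the trend in the test regression):

import numpy as np
from statsmodels.tsa.stattools import adfuller

nsample = 100
t = np.arange(nsample)
y = np.linspace(0, 0.2*nsample/260, nsample) + np.random.normal(scale=0.01, size=nsample)

# fit and remove a linear trend, then run the ADF test on the residuals
beta = np.polyfit(t, y, 1)
resid = y - np.polyval(beta, t)
print(adfuller(resid, autolag='AIC')[1])   # p-value for the detrended series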

Normal random number

nsample = 100
y_true = np.linspace(0, 0.2*nsample/260, nsample)
e = np.random.normal(size=nsample)
y = y_true + e
plt.plot(y)
adfTest = adfuller(y, autolag='AIC')
dfResults = pd.Series(adfTest[0:4], index)
for key,value in adfTest[4].items():
    dfResults['Critical Value (%s)'%key] = value
print('Augmented Dickey-Fuller Test Results:')
print(dfResults)

Augmented Dickey-Fuller Test Results:
ADF Test Statistic     -1.141212e+01
P-Value                 7.207876e-21
# Lags Used             0.000000e+00
# Observations Used     9.900000e+01
Critical Value (1%)    -3.498198e+00
Critical Value (5%)    -2.891208e+00
Critical Value (10%)   -2.582596e+00


Since the series is essentially normal random noise around a slight trend, the null hypothesis was rejected.

Random walk without drift

from statsmodels.tsa.stattools import adfuller
import pandas as pd
index=['ADF Test Statistic','P-Value','# Lags Used','# Observations Used']

nsample = 100
e = np.random.normal(size=nsample)
y = np.cumsum(e)
adfTest = adfuller(y, autolag='AIC')
dfResults = pd.Series(adfTest[0:4], index)
for key,value in adfTest[4].items():
    dfResults['Critical Value (%s)'%key] = value
print('Augmented Dickey-Fuller Test Results:')
print(dfResults)

Augmented Dickey-Fuller Test Results:
ADF Test Statistic       0.470182
P-Value                  0.983938
# Lags Used              0.000000
# Observations Used     99.000000
Critical Value (1%)     -3.498198
Critical Value (5%)     -2.891208
Critical Value (10%)    -2.582596
dtype: float64

The generated series was a random walk without drift, so the evidence was not sufficient to reject the null hypothesis.

Random walk + deterministic trend

Let's create a random walk plus a deterministic trend with a 20% annual return and 20% volatility, and see what the Dickey–Fuller test says.

nsample = 260
y_true = np.linspace(0, 0.2*nsample/260, nsample)
e = np.random.normal(size=nsample)
sigma=0.2/np.sqrt(260)
y = y_true + np.cumsum(e*sigma)
plt.plot(y)
adfTest = adfuller(y,regression='nc', autolag='AIC')
dfResults = pd.Series(adfTest[0:4], index)
for key,value in adfTest[4].items():
    dfResults['Critical Value (%s)'%key] = value
print('Augmented Dickey-Fuller Test Results reg=nc:')
print(dfResults)
adfTest = adfuller(y,regression='c', autolag='AIC')
dfResults = pd.Series(adfTest[0:4], index)
print('Augmented Dickey-Fuller Test Results reg=c:')
print(dfResults)
adfTest = adfuller(y,regression='ct', autolag='AIC')
dfResults = pd.Series(adfTest[0:4], index)
print('Augmented Dickey-Fuller Test Results reg=ct:')
print(dfResults)

Augmented Dickey-Fuller Test Results reg=nc:
ADF Test Statistic       -1.618005
P-Value                   0.099695
# Lags Used               1.000000
# Observations Used     258.000000
Critical Value (1%)      -2.574460
Critical Value (5%)      -1.942090
Critical Value (10%)     -1.615830
dtype: float64
Augmented Dickey-Fuller Test Results reg=c:
ADF Test Statistic       -1.838595
P-Value                   0.361476
# Lags Used               1.000000
# Observations Used     258.000000

Augmented Dickey-Fuller Test Results reg=ct:
ADF Test Statistic       -2.218211
P-Value                   0.479608
# Lags Used               5.000000
# Observations Used     254.000000


  1. With the random-walk-without-drift specification (regression='nc'), the null hypothesis was rejected, though only at the 10% level (p ≈ 0.10).
  2. With the random-walk-with-drift specification (regression='c'), the null hypothesis was not rejected.
  3. With the drift + time trend specification (regression='ct'), the null hypothesis was not rejected.

In the DF (ADF) test, if the form of the time-series model is not known in advance, the power of the test is reduced. The model specification therefore needs to be decided in advance.

Moreover, the three variants above are not always enough to distinguish the possibilities: many kinds of deterministic trends are conceivable, and the debate is endless.

Campbell, J. Y. and Perron, P. (1991). "Pitfalls and Opportunities: What Macroeconomists Should Know about Unit Roots."

Stock, J. (1994). "Unit Roots, Structural Breaks, and Trends." In Engle, R. and McFadden, D. (eds.), Handbook of Econometrics. Amsterdam: Elsevier, pp. 2740–2843.

As a final example, consider a stationary AR(1) series (coefficient 0.9, close to but below a unit root) around the same deterministic trend:

nsample = 260
parm=nsample/260*0.2
y_true = np.linspace(0, parm, nsample)
e = np.random.normal(size=nsample)
sigma=0.2/np.sqrt(260)
ar=[1]
i=1
a=0.9   # AR(1) coefficient; a = 1 would be a unit root
for ee in e:
    ar.append(1-a+a*ar[i-1]+ee*sigma)   # AR(1) fluctuating around a level of 1
    i+=1
y = y_true + ar[1:]
plt.plot(y)
adfTest = adfuller(y,regression='nc')
dfResults = pd.Series(adfTest[0:4], index)
for key,value in adfTest[4].items():
    dfResults['Critical Value (%s)'%key] = value
print('Augmented Dickey-Fuller Test Results reg=nc:')
print(dfResults)
adfTest = adfuller(y,regression='c')
dfResults = pd.Series(adfTest[0:4], index)
for key,value in adfTest[4].items():
    dfResults['Critical Value (%s)'%key] = value
print('Augmented Dickey-Fuller Test Results reg=c:')
print(dfResults)
adfTest = adfuller(y,regression='ct')
dfResults = pd.Series(adfTest[0:4], index)
for key,value in adfTest[4].items():
    dfResults['Critical Value (%s)'%key] = value
print('Augmented Dickey-Fuller Test Results reg=ct:')
print(dfResults)


Prediction direction

In direction-of-change evaluation, each forecast is scored 1 if the direction of the actual price movement and the direction of the forecast agree, and 0 otherwise. The mean of these scores, $\hat{d_L}$, is then compared with 0.5, the value expected when correct and incorrect directions are equally likely. Under the null hypothesis of no predictive ability ($d_L = 0.5$), the statistic

\frac{\hat{dL}-0.5}{\sqrt{0.25/n}}

follows a standard normal distribution.
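A minimal sketch of this direction test; the arrays below are made up purely for illustration:

import numpy as np
from scipy.stats import norm

# hypothetical actual and predicted price changes
actual = np.array([0.5, -0.2, 0.1, -0.4, 0.3, 0.2, -0.1, 0.6])
pred   = np.array([0.3,  0.1, 0.2, -0.5, 0.4, -0.2, -0.3, 0.2])

hits = (np.sign(actual) == np.sign(pred)).astype(float)
dL = hits.mean()                      # proportion of correct directions
n = len(hits)
z = (dL - 0.5) / np.sqrt(0.25 / n)    # test statistic, N(0,1) under H0
p = 2 * (1 - norm.cdf(abs(z)))        # two-sided p-value
print(dL, z, p)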

References

Basic Econometrics, Damodar N. Gujarati
Mathematical Statistics, Kei Takeuchi
Introduction to Statistics, Department of Statistics, Faculty of Liberal Arts, University of Tokyo
7 Principles of Statistics
