This post is about the confidence interval of the **difference in population proportions**, not the confidence interval of a single population proportion.
A detailed explanation is omitted here; the following site explains it clearly.
Confidence interval for difference in population ratio
In business, we often run a chi-square test or a test for a difference in population proportions. Whether there is a **significant difference** matters, of course, but looking only at that conclusion makes it hard to grasp the **effect size and its variability**. The idea here is to make the result a little more intuitive by looking at the confidence interval instead.
Libraries seem to provide the confidence interval for a single population proportion, but apparently not for the difference in population proportions (based on a one-minute search).
How to use Python to estimate the 95% confidence interval for the population ratio and determine a reasonable sample size
The formula is not complicated, so let's just implement it.
$$
(\hat{p}_1 - \hat{p}_2) - z_{\frac{\alpha}{2}} \times \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \leq p_1 - p_2 \leq (\hat{p}_1 - \hat{p}_2) + z_{\frac{\alpha}{2}} \times \sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}
$$
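The formula maps directly onto a small function. This is a minimal sketch assuming the normal (Wald) approximation; the function name `diff_prop_ci` is hypothetical, and $z_{\frac{\alpha}{2}}$ is taken from the standard library's `statistics.NormalDist` (Python 3.8+) rather than hard-coded as 1.96.

```python
from math import sqrt
from statistics import NormalDist


def diff_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Wald (normal-approximation) confidence interval for p1 - p2.

    x1, x2: number of "successes" in each group; n1, n2: group sizes.
    (Hypothetical helper, not from any library.)
    """
    p1, p2 = x1 / n1, x2 / n2
    alpha = 1 - confidence
    # z_{alpha/2}: upper alpha/2 quantile of the standard normal (about 1.96 for 95%)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Unpooled standard error of the difference, as in the formula
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Counts from the purchase table: men 50 of 150, women 40 of 160
lb, ub = diff_prop_ci(50, 150, 40, 160)
print(f"{lb:.3f} <= p1 - p2 <= {ub:.3f}")  # -0.018 <= p1 - p2 <= 0.184
```

Passing `confidence=0.99` (or any other level) widens or narrows the interval without touching the formula.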
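See the site introduced earlier for the derivation. Here $\hat{p}_1$ and $\hat{p}_2$ are the sample proportions, $n_1$ and $n_2$ the sample sizes, and $p_1 - p_2$ the true difference in population proportions. The left-hand expression is the lower bound and the right-hand expression is the upper bound.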
If the interval from the lower bound to the upper bound does not contain 0, the difference can be said to be significant (at the 5% level for a 95% interval).
How to find the 95% confidence interval? Relationship with significant differences and the meaning and formula of 1.96
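A minimal sketch of that check, assuming a 95% interval, next to the usual two-proportion z test for comparison. The interval uses the unpooled standard error, while the z test (equivalent to the chi-square test on a 2x2 table without continuity correction) uses a pooled one, so the two criteria can disagree when a result sits right at the boundary. Both helper names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def diff_prop_ci(x1, n1, x2, n2, confidence=0.95):
    """Wald confidence interval for p1 - p2 (unpooled standard error)."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    return p1 - p2 - z * se, p1 - p2 + z * se


def two_prop_z_test(x1, n1, x2, n2):
    """Two-sided p-value of the z test for p1 == p2, using the pooled proportion."""
    pooled = (x1 + x2) / (n1 + n2)
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


lb, ub = diff_prop_ci(50, 150, 40, 160)
significant_by_ci = not (lb <= 0 <= ub)  # significant only if the interval excludes 0
p_value = two_prop_z_test(50, 150, 40, 160)
print(significant_by_ci, round(p_value, 3))
```

For the purchase table, both criteria agree that the difference is not significant at the 5% level, but the interval additionally shows how large the difference plausibly is.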
I belong to the "as long as it runs" school, so please go easy on the code.
The input is a 2x2 cross tabulation read from a CSV file, like this:
| | Purchased | Not purchased |
|---|---|---|
| Men | 50 | 100 |
| Women | 40 | 120 |
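Because main.py reads the file with `csv.QUOTE_NONNUMERIC`, every unquoted field is converted to `float`, so test.csv should hold only the four counts, without the header or row labels. A minimal sketch for creating such a file from the table (the file name `test.csv` matches main.py; the row order Men, Women is an assumption that makes p1 the men's proportion):

```python
import csv

# Counts from the table: rows = Men, Women; columns = Purchased, Not purchased
rows = [[50, 100], [40, 120]]

# Write the counts only (no header, no row labels)
with open('test.csv', 'w', newline='') as f:
    csv.writer(f).writerows(rows)

# Read it back the same way main.py does, to check that it parses as numbers
with open('test.csv', newline='') as f:
    d = [row for row in csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)]

print(d)  # [[50.0, 100.0], [40.0, 120.0]]
```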
main.py
```python
import csv

import numpy as np

# Parameters: z value for a 95% confidence interval
z = 1.96

# Read test data (a 2x2 table of counts only, no header or row labels)
with open('test.csv', newline='') as f:
    reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    d = [row for row in reader]

# Sample size and sample proportion of "Purchased" for each row
n = [sum(d[0]), sum(d[1])]
p = [d[0][0] / n[0], d[1][0] / n[1]]

# Standard error of the difference in proportions
se = np.sqrt(p[0] * (1 - p[0]) / n[0] + p[1] * (1 - p[1]) / n[1])

# Calculate the 95% confidence interval
lb = (p[0] - p[1]) - z * se
ub = (p[0] - p[1]) + z * se

# Output result
print('95% confidence interval of the difference in population proportions: {:.3f} <= p1 - p2 <= {:.3f}'.format(lb, ub))
```
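As a sanity check, plugging the sample table into the formula by hand (values rounded) gives the interval main.py should print for that data:

$$
\hat{p}_1 = \frac{50}{150} \approx 0.3333, \quad \hat{p}_2 = \frac{40}{160} = 0.25, \quad \hat{p}_1 - \hat{p}_2 \approx 0.0833
$$

$$
\sqrt{\frac{0.3333 \times 0.6667}{150} + \frac{0.25 \times 0.75}{160}} \approx \sqrt{0.001481 + 0.001172} \approx 0.0515
$$

$$
0.0833 \pm 1.96 \times 0.0515 \;\Rightarrow\; -0.018 \leq p_1 - p_2 \leq 0.184
$$

The interval contains 0, so for this data the difference in purchase rate between men and women is not significant at the 5% level, even though the point estimate is about 8 percentage points.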
It may be a niche topic, but it should come in handy.