Problem setting

In a supermarket, ** the number of customer complaints (per day) ** was aggregated for $ 365 $ a day, and the results were as follows.

Number of complaints	0 cases	1	2 cases	3 cases	4 cases	5 cases	6	7 cases
Observation frequency (Number of days)	22	49	82	86	63	39	16	8

Can we think of this data as ** following a Poisson distribution **? Test at the significance level $ 5 % $.

Visualization

Let's graph the "** expected frequency " and " observed frequency **" calculated by hypothesizing that they follow the Poisson distribution.

"** Expected frequency " is calculated as follows. First, multiply the "number of complaints" by the " observation frequency " and divide by the number of observation days $ 365 $ to get the ** average number of complaints per day ** $ \ mu = 2.931 . Then, find the probability mass ( k = 0.7 $) for the ** Poisson distribution ** with an average of $ 2.931 $, and multiply it by $ 365 $ for the number of observation days to obtain " expected frequency **".

`Visualization of observed and expected frequencies`


import numpy as np
import scipy.stats as st
import japanize_matplotlib
import matplotlib.pyplot as plt

n = np.arange(0,7+1)
f = np.array((22,49,82,86,63,39,16,8)) #frequency

mu = (n*f).sum()/f.sum() #Calculate average number of claims 2.931

Po = st.poisson(mu=mu) # μ=2.Poisson distribution of 931
xp = Po.pmf(k=n)       # k=1...Probability mass of 7
xp[-1] = 1 - st.poisson.cdf(n[-2],mu=mu) # k>=Probability mass of 7 k=Integrated into 7
fp = xp * f.sum()      #Expected frequency

plt.figure(figsize=(6,3),dpi=120,facecolor='white')
plt.bar(n,f,width=0.9, label='Observation frequency')
plt.plot(n,fp,'o--', c='tab:orange', label=f'Expected frequency$Po\,(\\lambda={mu:.2f})$')
plt.tick_params(axis='x',length=0)
plt.legend(framealpha=1, fancybox=False)
plt.gca().set_axisbelow(True)
plt.grid(axis='y')
plt.show()

The execution result is as follows.
Download (1) .png

Number of complaints	0 cases	1	2 cases	3 cases	4 cases	5 cases	6	7 cases
Observation frequency f	22	49	82	86	63	39	16	8
Expected frequency f_{p}	19.5	57.0	83.6	81.7	59.9	35.1	17.2	11.0

Goodness of fit test

Is the hypothesis that "the object follows a Poisson distribution" (null hypothesis) correct? Is evaluated by ** statistical test **.

If this hypothesis is correct, then the following ** statistic ** $ t calculated from ** observation frequency ** $ \ boldsymbol {x} $ and ** hypothesis-based expected frequency ** $ \ boldsymbol {x} _p $ The test is performed using the property that $ follows (approximately) ** chi-square distribution ** (degrees of freedom $ n-2 $).

t = \sum \frac{(x-x_p)^2}{x_p}

In the current problem, the number of claims is in the $ 8 $ range from $ 0 $ to $ 7 $, so ** degrees of freedom ** is $ n-2 = 8-2 = 6 $.

Let's draw a probability density function for a chi-square distribution with $ 6 $ degrees of freedom.

`Chi-square distribution with 6 degrees of freedom`


import numpy as np
import scipy.stats as st
import japanize_matplotlib
import matplotlib.pyplot as plt

x = np.linspace(0,25,200)
p = st.chi2.pdf(x, df=6)

x_p95 = st.chi2.ppf(0.95, df=6)

y_range = (0.00,0.16)
plt.figure(figsize=(6,3),dpi=120,facecolor='white')
plt.plot(x,p)
plt.vlines(x_p95,*y_range,lw=1,ls=':',color='tab:red')

x2 = np.linspace(0,x_p95,200)
plt.fill_between(x2,st.chi2.pdf(x2,df=6),np.zeros(len(x2)),facecolor='tab:blue',alpha=0.3)

plt.text(5,0.05,'0.95',ha='center',va='center')
plt.text(x_p95,-0.004,f'{x_p95:.2f}',c='tab:red', ha='center',va='top')

plt.text(x_p95,0.08,f'→ Rejection area',c='tab:red')

arrowprops=dict(arrowstyle='->',connectionstyle='angle,angleA=0,angleB=60, rad=1')
kw = dict(xycoords='data',textcoords='data',ha='left', arrowprops=arrowprops)
plt.gca().annotate('0.05', xy=(13.3, 0.003), xytext=(15.5,0.016), **kw)

plt.xlim(0,25)
plt.ylim(*y_range)

plt.xlabel('Statistics$t$')
plt.show()

The execution result is as follows.

In fact, find the statistic $ t $ and its corresponding $ p $ value.

`Calculation of test statistic and p-value`


t = ( ((f-fp)**2)/fp ).sum()        #Statistic t
p = 1 - st.chi2.cdf(t,df=len(n)-2)  #6 degrees of freedom(8-2)P-value corresponding to t in the chi-square distribution of

print(f'Statistic t= {t:.2f}, P-value= {p:.2f} ',end='')
if p >= 0.05 :
  print('（ >= 0.05 ) \n Null hypothesis (following Poisson distribution) is not rejected')
else :
  print('（ < 0.05 ) \n Null hypothesis (following Poisson distribution) is rejected')

The execution result is as follows.

Statistic t= 3.22, p-value= 0.78 （ >= 0.05 ) 
Null hypothesis (following Poisson distribution) is not rejected

Test whether the observed data follow the Poisson distribution (Test of the goodness of fit of the Poisson distribution by Python)

Problem setting

Visualization

Visualization of observed and expected frequencies

Goodness of fit test

Chi-square distribution with 6 degrees of freedom

Calculation of test statistic and p-value

`Visualization of observed and expected frequencies`

`Chi-square distribution with 6 degrees of freedom`

`Calculation of test statistic and p-value`