Various probability distributions appear in Grade 2 of the statistical test. In this post I briefly summarize the discrete probability distributions, and also draw each of them in Python to deepen understanding. (Explanations of the code are omitted.)
| symbol | meaning |
|---|---|
| $A$ | Event |
| $X$ | Random variable |
| $x$ | Value taken by the random variable $X$ |
| $p$ | Probability of success |
| $n$ | Number of trials |
When $n$ Bernoulli trials with success probability $p$ are performed, the number of successes $X$ follows the binomial distribution. Its probability function is

P(X=x)≡f(x)={}_n C_{x}p^x(1-p)^{n-x}\\
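As a quick sanity check (my own addition, not part of the original post), the following sketch confirms that computing the formula $ {}_n C_x\,p^x(1-p)^{n-x} $ by hand agrees with scipy's `binom.pmf`:

```python
from math import comb
from scipy.stats import binom

# Compare the textbook formula nCx * p^x * (1-p)^(n-x) with scipy's pmf
n, p = 40, 0.25
for x in range(n + 1):
    manual = comb(n, x) * p**x * (1 - p)**(n - x)
    assert abs(manual - binom.pmf(x, n, p)) < 1e-12
print("formula matches binom.pmf for all x")
```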
Calculating the expected value and variance of the binomial distribution from their definitions gives

E[X]=np\\
V[X]=E[X^2]-μ^2=np(1-p)\\
In particular, the distribution when $ n = 1 $ is called the Bernoulli distribution.
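These two facts can be verified numerically (a check of my own, not in the original post), using scipy's `binom.stats` for the moments and `bernoulli.pmf` for the $n = 1$ special case:

```python
from scipy.stats import binom, bernoulli
import numpy as np

n, p = 40, 0.25
mean, var = binom.stats(n, p)              # default moments: mean and variance
assert np.isclose(mean, n * p)             # E[X] = np = 10
assert np.isclose(var, n * p * (1 - p))    # V[X] = np(1-p) = 7.5

# With n = 1, the binomial pmf reduces to the Bernoulli pmf
for x in (0, 1):
    assert np.isclose(binom.pmf(x, 1, p), bernoulli.pmf(x, p))
print("moments and Bernoulli special case check out")
```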
Let's draw the binomial distribution ($n = 40$, $p = 0.25, 0.5, 0.75$) using Python's scipy library.
```python
from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt

# The support of Bin(40, p) is 0, 1, ..., 40
x = np.arange(0, 41)
y1 = binom.pmf(x, 40, 0.25)
y2 = binom.pmf(x, 40, 0.5)
y3 = binom.pmf(x, 40, 0.75)
plt.bar(x, y1, width=0.5, color="r", alpha=0.5, label="Binom p= {}".format(0.25))
plt.bar(x, y2, width=0.5, color="g", alpha=0.5, label="Binom p= {}".format(0.5))
plt.bar(x, y3, width=0.5, color="b", alpha=0.5, label="Binom p= {}".format(0.75))
plt.legend(loc=8)
plt.show()
```
The Poisson distribution is obtained from the binomial distribution by fixing the expected value $np = λ$ and taking the limits $n → ∞$ and $p → 0$ for the number of trials and the success probability. Its probability function is

P(X=x)≡f(x)=\frac{e^{-λ}λ^x}{x!}\\
The expected value and variance of this distribution are

E[X]=λ\\
V[X]=λ\\

This is easy to see from the limits above: the expected value and variance of the binomial distribution are $np$ and $np(1-p)$, and with $np = λ$ fixed, $np(1-p) → λ$ as $p → 0$.
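The following sketch (my own addition) checks the Poisson moments with scipy's `poisson.stats` and shows numerically how the binomial variance $np(1-p)$ approaches $λ$ as $p → 0$ with $np = λ$ fixed:

```python
from scipy.stats import poisson
import numpy as np

lam = 10
mean, var = poisson.stats(lam)
assert np.isclose(mean, lam) and np.isclose(var, lam)  # E[X] = V[X] = λ

# With np = λ fixed, the binomial variance np(1-p) approaches λ as p → 0
for n in (10, 100, 1000, 10000):
    p = lam / n
    print(f"n={n:>5}: np(1-p) = {n * p * (1 - p):.3f}")
```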
Let's draw Poisson distributions with $λ = 10, 20, 30$.

```python
from scipy.stats import poisson
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 50)
y1 = poisson.pmf(x, 10)
y2 = poisson.pmf(x, 20)
y3 = poisson.pmf(x, 30)
plt.bar(x, y1, width=0.5, color="r", alpha=0.5, label="Poisson λ= {}".format(10))
plt.bar(x, y2, width=0.5, color="g", alpha=0.5, label="Poisson λ= {}".format(20))
plt.bar(x, y3, width=0.5, color="b", alpha=0.5, label="Poisson λ= {}".format(30))
plt.legend()
plt.show()
```
The Poisson distribution is the limiting form of the binomial distribution in the parameters $n$ and $p$. Let's see how large $n$ actually has to be before the two distributions overlap. Fixing $λ = np = 10$ while varying $n$ and $p$, let's see how the binomial distribution changes.
```python
from scipy.stats import binom, poisson
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
x = np.arange(0, 30)
y1 = poisson.pmf(x, 10)
# Keep np = λ = 10 fixed: n = 10, 100, 1000 correspond to p = 1, 0.1, 0.01
y2 = binom.pmf(x, 10**1, 10**0)
y3 = binom.pmf(x, 10**2, 10**-1)
y4 = binom.pmf(x, 10**3, 10**-2)
axes[0].bar(x, y1, width=0.5, color="r", alpha=0.3, label="Poisson λ= {}".format(10))
axes[0].bar(x, y2, width=0.5, color="b", alpha=0.3, label="Binom n= {}".format(10))
axes[0].set_title('n=10')
axes[0].legend()
axes[1].bar(x, y1, width=0.5, color="r", alpha=0.3, label="Poisson λ= {}".format(10))
axes[1].bar(x, y3, width=0.5, color="b", alpha=0.3, label="Binom n= {}".format(100))
axes[1].set_title('n=100')
axes[1].legend()
axes[2].bar(x, y1, width=0.5, color="r", alpha=0.3, label="Poisson λ= {}".format(10))
axes[2].bar(x, y4, width=0.5, color="b", alpha=0.3, label="Binom n= {}".format(1000))
axes[2].set_title('n=1000')
axes[2].legend()
plt.show()
```
Comparing the graphs, the approximation is poor at $n = 10$, but at $n = 100$ and $n = 1000$ the two distributions are almost identical. In other words, once the number of Bernoulli trials is on the order of 100 or more, the Poisson distribution is a good approximation.
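The visual impression can be quantified (my own check, not in the original post) by looking at the largest pointwise gap between the two pmfs, which shrinks as $n$ grows:

```python
from scipy.stats import binom, poisson
import numpy as np

lam = 10
x = np.arange(0, 31)
pois = poisson.pmf(x, lam)
# Maximum pointwise difference between Bin(n, λ/n) and Poisson(λ)
for n in (10, 100, 1000, 10000):
    err = np.max(np.abs(binom.pmf(x, n, lam / n) - pois))
    print(f"n={n:>5}: max pointwise gap = {err:.5f}")
```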
The distribution of the number of trials $X$ needed to obtain the first success, when Bernoulli trials with success probability $p$ are repeated, is called the geometric distribution. Its probability function is

P(X=x)≡f(x)=p(1-p)^{x-1}\\

and its expected value and variance are

E[X]=\frac{1}{p}\\
V[X]=\frac{1-p}{p^2}\\

Drawing the geometric distribution with $p = 0.1$ in Python looks like this:
```python
from scipy.stats import geom
import numpy as np
import matplotlib.pyplot as plt

# The support of the geometric distribution starts at x = 1 (first success)
x = np.arange(1, 30)
y = geom.pmf(x, 0.1)
plt.bar(x, y, width=0.5, color="g", alpha=0.5, label="Geom p= {}".format(0.1))
plt.legend()
plt.show()
```
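As with the other distributions, the stated moments can be sanity-checked (my own addition) against scipy's `geom.stats`, and a quick simulation shows the sample mean landing near $1/p$:

```python
from scipy.stats import geom
import numpy as np

p = 0.1
mean, var = geom.stats(p)
assert np.isclose(mean, 1 / p)            # E[X] = 1/p = 10
assert np.isclose(var, (1 - p) / p**2)    # V[X] = (1-p)/p^2 = 90

# Empirical check: average number of trials until first success
samples = geom.rvs(p, size=100_000, random_state=0)
print(samples.mean())
```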
Next, I would like to summarize the continuous probability distributions. The statistical test is in two weeks, so I will do my best studying for it!