Various probability distributions appear in Grade 2 of the statistical test. In this post I briefly summarize the discrete probability distributions, and also draw each of them in Python to deepen understanding. (Explanations of the code are omitted.)
| symbol | meaning |
|---|---|
| $A$ | Event |
| $X$ | Random variable |
| $x$ | Value taken by the random variable $X$ |
| $p$ | Probability of success |
| $n$ | Number of trials |
When $n$ Bernoulli trials with success probability $p$ are performed, the number of successes $X$ follows the binomial distribution. Its probability function is

P(X=x)≡f(x)={}_n C_{x}p^x(1-p)^{n-x}\\
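As a quick sanity check (my own addition, not part of the original post), the following sketch confirms that computing the formula $ {}_n C_x\,p^x(1-p)^{n-x} $ by hand agrees with scipy's `binom.pmf`:

```python
from math import comb
from scipy.stats import binom

# Compare the textbook formula nCx * p^x * (1-p)^(n-x) with scipy's pmf
n, p = 40, 0.25
for x in range(n + 1):
    manual = comb(n, x) * p**x * (1 - p)**(n - x)
    assert abs(manual - binom.pmf(x, n, p)) < 1e-12
print("formula matches binom.pmf for all x")
```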
Calculating the expected value and variance of the binomial distribution from their definitions gives

E[X]=np\\
V[X]=E[X^2]-μ^2=np(1-p)\\
In particular, the distribution when $ n = 1 $ is called the Bernoulli distribution.
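These two facts can be verified numerically (a check of my own, not in the original post), using scipy's `binom.stats` for the moments and `bernoulli.pmf` for the $n = 1$ special case:

```python
from scipy.stats import binom, bernoulli
import numpy as np

n, p = 40, 0.25
mean, var = binom.stats(n, p)              # default moments: mean and variance
assert np.isclose(mean, n * p)             # E[X] = np = 10
assert np.isclose(var, n * p * (1 - p))    # V[X] = np(1-p) = 7.5

# With n = 1, the binomial pmf reduces to the Bernoulli pmf
for x in (0, 1):
    assert np.isclose(binom.pmf(x, 1, p), bernoulli.pmf(x, p))
print("moments and Bernoulli special case check out")
```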
Let's draw the binomial distribution ($n = 40$, $p = 0.25, 0.5, 0.75$) using Python's scipy library.
```python
from scipy.stats import binom
import numpy as np
import matplotlib.pyplot as plt

# The support of Bin(40, p) is 0, 1, ..., 40
x = np.arange(0, 41)
y1 = binom.pmf(x, 40, 0.25)
y2 = binom.pmf(x, 40, 0.5)
y3 = binom.pmf(x, 40, 0.75)
plt.bar(x, y1, width=0.5, color="r", alpha=0.5, label="Binom p= {}".format(0.25))
plt.bar(x, y2, width=0.5, color="g", alpha=0.5, label="Binom p= {}".format(0.5))
plt.bar(x, y3, width=0.5, color="b", alpha=0.5, label="Binom p= {}".format(0.75))
plt.legend(loc=8)
plt.show()
```
The Poisson distribution is obtained from the binomial distribution by fixing the expected value $np = λ$ and taking the limits $n → ∞$ and $p → 0$ for the number of trials and the success probability. Its probability function is

P(X=x)≡f(x)=\frac{e^{-λ}λ^x}{x!}\\
The expected value and variance of this distribution are

E[X]=λ\\
V[X]=λ\\

This is easy to see from the limits above: the expected value and variance of the binomial distribution are $np$ and $np(1-p)$, and with $np = λ$ fixed, $np(1-p) → λ$ as $p → 0$.
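The following sketch (my own addition) checks the Poisson moments with scipy's `poisson.stats` and shows numerically how the binomial variance $np(1-p)$ approaches $λ$ as $p → 0$ with $np = λ$ fixed:

```python
from scipy.stats import poisson
import numpy as np

lam = 10
mean, var = poisson.stats(lam)
assert np.isclose(mean, lam) and np.isclose(var, lam)  # E[X] = V[X] = λ

# With np = λ fixed, the binomial variance np(1-p) approaches λ as p → 0
for n in (10, 100, 1000, 10000):
    p = lam / n
    print(f"n={n:>5}: np(1-p) = {n * p * (1 - p):.3f}")
```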
Let's draw Poisson distributions with $λ = 10, 20, 30$.

```python
from scipy.stats import poisson
import numpy as np
import matplotlib.pyplot as plt

x = np.arange(0, 50)
y1 = poisson.pmf(x, 10)
y2 = poisson.pmf(x, 20)
y3 = poisson.pmf(x, 30)
plt.bar(x, y1, width=0.5, color="r", alpha=0.5, label="Poisson λ= {}".format(10))
plt.bar(x, y2, width=0.5, color="g", alpha=0.5, label="Poisson λ= {}".format(20))
plt.bar(x, y3, width=0.5, color="b", alpha=0.5, label="Poisson λ= {}".format(30))
plt.legend()
plt.show()
```
The Poisson distribution is the limiting form of the binomial distribution in the parameters $n$ and $p$. Let's see how large $n$ actually has to be before the two distributions overlap. Fixing $λ = np = 10$ while varying $n$ and $p$, let's see how the binomial distribution changes.
```python
from scipy.stats import binom, poisson
import numpy as np
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 3, figsize=(15, 5))
x = np.arange(0, 30)
y1 = poisson.pmf(x, 10)
# Keep np = λ = 10 fixed: n = 10, 100, 1000 correspond to p = 1, 0.1, 0.01
y2 = binom.pmf(x, 10**1, 10**0)
y3 = binom.pmf(x, 10**2, 10**-1)
y4 = binom.pmf(x, 10**3, 10**-2)
axes[0].bar(x, y1, width=0.5, color="r", alpha=0.3, label="Poisson λ= {}".format(10))
axes[0].bar(x, y2, width=0.5, color="b", alpha=0.3, label="Binom n= {}".format(10))
axes[0].set_title('n=10')
axes[0].legend()
axes[1].bar(x, y1, width=0.5, color="r", alpha=0.3, label="Poisson λ= {}".format(10))
axes[1].bar(x, y3, width=0.5, color="b", alpha=0.3, label="Binom n= {}".format(100))
axes[1].set_title('n=100')
axes[1].legend()
axes[2].bar(x, y1, width=0.5, color="r", alpha=0.3, label="Poisson λ= {}".format(10))
axes[2].bar(x, y4, width=0.5, color="b", alpha=0.3, label="Binom n= {}".format(1000))
axes[2].set_title('n=1000')
axes[2].legend()
plt.show()
```
Comparing the graphs, the approximation is poor at $n = 10$, but at $n = 100$ and $n = 1000$ the two distributions are almost identical. In other words, once the number of Bernoulli trials is on the order of 100 or more, the Poisson distribution is a good approximation.
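The visual impression can be quantified (my own check, not in the original post) by looking at the largest pointwise gap between the two pmfs, which shrinks as $n$ grows:

```python
from scipy.stats import binom, poisson
import numpy as np

lam = 10
x = np.arange(0, 31)
pois = poisson.pmf(x, lam)
# Maximum pointwise difference between Bin(n, λ/n) and Poisson(λ)
for n in (10, 100, 1000, 10000):
    err = np.max(np.abs(binom.pmf(x, n, lam / n) - pois))
    print(f"n={n:>5}: max pointwise gap = {err:.5f}")
```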
The distribution of the number of trials $X$ needed to obtain the first success, when Bernoulli trials with success probability $p$ are repeated, is called the geometric distribution. Its probability function is

P(X=x)≡f(x)=p(1-p)^{x-1}\\

and its expected value and variance are

E[X]=\frac{1}{p}\\
V[X]=\frac{1-p}{p^2}\\

Drawing the geometric distribution with $p = 0.1$ in Python looks like this:
```python
from scipy.stats import geom
import numpy as np
import matplotlib.pyplot as plt

# The support of the geometric distribution starts at x = 1 (first success)
x = np.arange(1, 30)
y = geom.pmf(x, 0.1)
plt.bar(x, y, width=0.5, color="g", alpha=0.5, label="Geom p= {}".format(0.1))
plt.legend()
plt.show()
```
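As with the other distributions, the stated moments can be sanity-checked (my own addition) against scipy's `geom.stats`, and a quick simulation shows the sample mean landing near $1/p$:

```python
from scipy.stats import geom
import numpy as np

p = 0.1
mean, var = geom.stats(p)
assert np.isclose(mean, 1 / p)            # E[X] = 1/p = 10
assert np.isclose(var, (1 - p) / p**2)    # V[X] = (1-p)/p^2 = 90

# Empirical check: average number of trials until first success
samples = geom.rvs(p, size=100_000, random_state=0)
print(samples.mean())
```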
Next, I would like to summarize the continuous probability distributions. The statistical test is in two weeks, so I will do my best studying for it!