The gamma distribution is used to model values that take positive continuous values. There are two parameters, but the definitions are slightly different depending on the library, and it is difficult to understand the average value and variance, so I feel like I'm always researching them, so I'll summarize them.
\begin{align}
f(x) &= \frac{1}{\Gamma(k) \theta^k} x^{k-1} e^{- \frac{x}{\theta}} \\
&= \frac{\lambda^k}{\Gamma(k)} x^{k-1} e^{- \lambda x}
\end{align}
It looks like this when plotted with some parameters.
Average $ \ mu = k \ theta = \ frac {k} {\ lambda} $ Variance $ v = k \ theta ^ 2 = \ frac {k} {\ lambda ^ 2} $
On the contrary, if you want to determine the parameters from the mean and variance, use this.
In summary, it looks like this. I wanted to write this table!
Library | shape parameter | scale parameter |
---|---|---|
numpy.random.gamma | ||
scipy.stats.gamma | ||
PyMC3(pm.Gamma) | ||
TensorFlow Probability (tfp.Gamma)) | concentration | 1/rate |
Stan (gamma) | ||
R (rgamma) | shape | scale, 1/rate |
numpy and scipy use scale parameter $ \ theta $, but in so-called stochastic programming languages such as PyMC3 and Stan TFP, they are specified by the reciprocal of $ \ theta $.
The reciprocal of $ \ theta $, $ \ lambda $, is called rate parameter, and the library that calls the parameter $ \ alpha, \ beta $ seems to adopt the definition by $ \ lambda $.
In PyMC3, it is also possible to specify the gamma distribution by mean (mu) and standard deviation (sigma). Also, in R, it seems that either shape or rate can be specified.
In order to confirm whether the above parameter list is correct, we obtained 10,000 random numbers from $ Gamma (2, 2) $ in each library, estimated the probability density function, and compared them.
import numpy as np
import scipy as sp
import pymc3 as pm
import tensorflow_probability as tfp
import pystan
import matplotlib.pyplot as plt
import seaborn as sns
shape = 2
scale = 2
rate = 1 / scale
n_sample = 10000
xx = np.linspace(0, 20)
# ground truth
gamma_pdf = sp.stats.gamma(a=shape, scale=scale).pdf
s_np = np.random.gamma(shape=shape, scale=scale, size=n_sample)
s_sp = sp.stats.gamma(a=shape, scale=scale).rvs(size=n_sample)
s_tfp = tfp.distributions.Gamma(concentration=shape, rate=rate).sample(n_sample).numpy()
s_pm = pm.Gamma.dist(alpha=shape, beta=rate).random(size=n_sample)
fig, ax = plt.subplots()
ax.plot(xx, gamma_pdf(xx), label='truth', lw=2, c='k')
sns.kdeplot(s_np, ax=ax, label='numpy', alpha=0.8)
sns.kdeplot(s_sp, ax=ax, label='scipy', alpha=0.8)
sns.kdeplot(s_tfp, ax=ax, label='TFP', alpha=0.8)
sns.kdeplot(s_pm, ax=ax, label='PyMC3', alpha=0.8)
The result is shown in the figure below, and it was confirmed that all libraries were implemented correctly.
Only Stan did not know how to obtain random numbers directly from the probability distribution, so instead I tried to estimate the parameters of the gamma distribution from the random numbers obtained above.
stan_code = '''
data {
int N;
vector<lower=0>[N] Y;
}
parameters {
real<lower=0> shape;
real<lower=0> rate;
}
model {
Y ~ gamma(shape, rate);
}
'''
data = dict(N=n_sample, Y=s_np)
stan_model = pystan.StanModel(model_code=stan_code)
fit = stan_model.sampling(data=data)
print(fit)
The estimated values of the shape and rate parameters are 1.98 and 0.49, respectively, which are also the expected results.
Inference for Stan model: anon_model_6a5d60bed963727c801dc434b96a49a1.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
shape 1.98 8.3e-4 0.03 1.93 1.97 1.98 2.0 2.04 1016 1.0
rate 0.49 2.3e-4 7.4e-3 0.47 0.48 0.49 0.49 0.5 1020 1.0
lp__ -2.3e4 0.03 1.01 -2.3e4 -2.3e4 -2.3e4 -2.3e4 -2.3e4 1192 1.0
It is convenient to prepare a function to calculate the shape and scale of the corresponding gamma distribution from the mean and standard deviation.
def calc_gamma_param(mu, sigma):
return (mu / sigma)**2, sigma**2 / mu
mu, sigma = 4, 2
shape, scale = calc_gamma_param(mu, sigma)
def plot_gamma(xx, shape, scale):
plt.plot(xx, sp.stats.gamma(a=shape, scale=scale).pdf(xx), label=f'shape={shape}, scale={scale}')
xx = np.linspace(0, 10)
plot_gamma(xx, shape, scale)
plt.legend()
Gamma distribution with mean 4 and standard deviation 2. Note that the mode (the most probable value) is smaller than the mean because the distribution has a long tail to the right.
When $ k = 1 $, the gamma distribution matches the exponential distribution of the parameter $ \ theta $. When $ k = \ frac {n} {2} (n = 1,2, \ dots), \ \ theta = 2 $, the gamma distribution matches the chi-square distribution with $ n $ degrees of freedom.
Click here for the code.
from scipy import stats
xx = np.linspace(0, 10)
fig, ax = plt.subplots(1, 2, figsize=(10, 3))
shape, scale = 1, 3
ax[0].plot(xx, stats.gamma(a=shape, scale=scale).pdf(xx), label=f'Gamma({shape}, {scale})')
ax[0].plot(xx, stats.expon(scale=scale).pdf(xx), label=f'Exp({scale})')
ax[0].legend()
shape, scale = 3/2, 2
ax[1].plot(xx, stats.gamma(a=shape, scale=scale).pdf(xx), label=f'Gamma({shape}, {scale})')
ax[1].plot(xx, stats.chi2(df=2*shape).pdf(xx), label=f'Chi2({int(2*shape)})')
ax[1].legend()
plt.savefig('gamma_exp_chi2.png')
Recommended Posts