This article is about Bayesian estimation of a simple linear regression model. Specifically, the Markov chain Monte Carlo method (commonly known as MCMC) is implemented in Python using Gibbs sampling. MCMC in Python is usually done with the well-known pymc module, but this time I deliberately implemented it with only numpy and scipy, without using pymc. In writing this article, I referred heavily to Kozumi's ["Bayesian Computational Statistics" (Asakura Shoten)](https://www.amazon.co.jp/%E3%83%99%E3%82%A4%E3%82%BA%E8%A8%88%E7%AE%97%E7%B5%B1%E8%A8%88%E5%AD%A6-%E7%B5%B1%E8%A8%88%E8%A7%A3%E6%9E%90%E3%82%B9%E3%82%BF%E3%83%B3%E3%83%80%E3%83%BC%E3%83%89-%E5%8F%A4%E6%BE%84-%E8%8B%B1%E7%94%B7/dp/4254128568/ref=sr_1_1?ie=UTF8&qid=1485493617&sr=8-1&keywords=%E3%83%99%E3%82%A4%E3%82%BA%E8%A8%88%E7%AE%97%E7%B5%B1%E8%A8%88).
Before the implementation, let's briefly review the theory, which is simple. The linear regression model we want to estimate is
y_i=\boldsymbol{x}_{i}'\boldsymbol{\beta}+u_i, \ \ u_i\sim N(0,\sigma^2) \ \ (i=1,\cdots,n)
($y_i$ is the explained variable, $\boldsymbol{x}_i$ is a $k \times 1$ vector of explanatory variables, and $u_i$ is an error term that follows a normal distribution with mean 0 and variance $\sigma^2$, independently across observations). Since $y_i$ therefore follows a normal distribution with mean $\boldsymbol{x}_{i}'\boldsymbol{\beta}$ and variance $\sigma^2$, the likelihood function is
\begin{align}
f(\boldsymbol{y} \ | \ \boldsymbol{\beta},\sigma^2) &= \prod_{i=1}^{n}\frac{1}{(2\pi\sigma^2)^{1/2}}\mathrm{exp}\left\{-\frac{(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2}{2\sigma^2}\right\} \\
&\propto \left(\frac{1}{\sigma^2}\right)^{n/2}\mathrm{exp}\left\{-\frac{\sum_{i=1}^{n}(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2}{2\sigma^2}\right\}
\end{align}
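As an aside that is not in the original article, this likelihood is easy to check numerically with scipy. Below is a minimal sketch, where `X` is assumed to be the n×k design matrix and `log_likelihood` is just an illustrative helper name:

```python
import numpy as np
from scipy.stats import norm

def log_likelihood(y, X, beta, sigma2):
    """Log of f(y | beta, sigma^2) for the linear regression model above."""
    mean = X @ beta                               # x_i' beta for each observation i
    return norm.logpdf(y, loc=mean, scale=np.sqrt(sigma2)).sum()
```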
For the prior distributions of the parameters, we assume
\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0,\boldsymbol{B}_0), \ \ \sigma^2 \sim IG\left(\frac{n_0}{2},\frac{s_0}{2}\right)
(where $IG$ denotes the inverse gamma distribution). Applying Bayes' theorem to the likelihood and prior distributions above, the posterior distribution becomes
\pi(\boldsymbol{\beta},\sigma^2 \ | \ \boldsymbol{y}) \propto \left(\frac{1}{\sigma^2}\right)^{n/2}\mathrm{exp}\left\{-\frac{\sum_{i=1}^{n}(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2}{2\sigma^2}\right\}\mathrm{exp}\left\{-\frac{1}{2}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)'\boldsymbol{B}_{0}^{-1}(\boldsymbol{\beta}-\boldsymbol{\beta}_0)\right\}\left(\frac{1}{\sigma^2}\right)^{n_0/2+1}\mathrm{exp}\left(-\frac{s_0}{2\sigma^2}\right)
We have now derived the kernel of the posterior distribution, but the integral of the above expression cannot be evaluated analytically, so the normalizing constant of the posterior distribution cannot be computed (which is exactly why sampling with MCMC is needed). We therefore generate random numbers that follow the posterior distribution by Gibbs sampling and use them to study the properties of the target distribution. To perform Gibbs sampling, however, we must derive the full conditional distribution of each parameter. These can be calculated from the posterior kernel derived above, but the derivation is tedious, so I omit it and show only the results (for details, please refer to "Bayesian Computational Statistics" (Asakura Shoten) linked above). The full conditional distribution of each parameter is
\begin{align}
\pi(\boldsymbol{\beta} \ | \ \sigma^2,\boldsymbol{y}) &= N(\hat{\boldsymbol{\beta}},\hat{\boldsymbol{B}}) \\
\pi(\sigma^2 \ | \ \boldsymbol{\beta},\boldsymbol{y}) &= IG\left(\frac{n+n_0}{2},\frac{\sum_{i=1}^{n}(y_i-\boldsymbol{x}_{i}'\boldsymbol{\beta})^2+s_0}{2}\right)
\end{align}
where
\hat{\boldsymbol{B}}^{-1}=\sum_{i=1}^{n}\frac{\boldsymbol{x}_i\boldsymbol{x}_i'}{\sigma^2}+\boldsymbol{B}_{0}^{-1}, \ \ \hat{\boldsymbol{\beta}}=\hat{\boldsymbol{B}}\left(\sum_{i=1}^{n}\frac{\boldsymbol{x}_iy_i}{\sigma^2}+\boldsymbol{B}_{0}^{-1}\boldsymbol{\beta}_0\right)
All that remains is to sample from these full conditional distributions alternately, one after the other.
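Before the full script, here is a minimal sketch of what one Gibbs update looks like when these formulas are coded directly. The function `gibbs_step` and its arguments are my own names for illustration; `X` is assumed to be the n×k design matrix, `B0_inv` the prior precision matrix, and scipy's `invgamma` is parameterized so that `scale` corresponds to the second $IG$ argument:

```python
import numpy as np
from numpy.random import multivariate_normal
from scipy.stats import invgamma

def gibbs_step(y, X, sigma2, beta0, B0_inv, n0, s0):
    """One Gibbs update: draw beta | sigma^2, y, then sigma^2 | beta, y."""
    n = len(y)
    # beta | sigma^2, y ~ N(beta_hat, B_hat)
    B_hat = np.linalg.inv(X.T @ X / sigma2 + B0_inv)
    beta_hat = B_hat @ (X.T @ y / sigma2 + B0_inv @ beta0)
    beta = multivariate_normal(beta_hat, B_hat)
    # sigma^2 | beta, y ~ IG((n + n0)/2, (sum of squared residuals + s0)/2)
    resid = y - X @ beta
    sigma2 = invgamma.rvs((n + n0) / 2, scale=(resid @ resid + s0) / 2)
    return beta, sigma2
```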
We generate pseudo data, perform Bayesian estimation on that data, and check whether the resulting distribution approaches the original parameters. It is very simple, but the Python code for the Gibbs sampling is below.
import numpy as np
from numpy.random import rand, normal, multivariate_normal
from scipy.stats import invgamma
import matplotlib.pyplot as plt
##Pseudo data generation
alpha = 2.5
beta = 0.8
sigma = 10
data = 100
x = np.c_[np.ones(data), rand(data) * data]
y = alpha * x[:,0] + beta * x[:,1] + normal(0, sigma, data)  # true model: y = 2.5 + 0.8*x + noise
plt.plot(x[:,1],y,'o')
##Gibbs sampling
#Specify the number of burn ins and the number of samplings
burnin = 10000
it = 2000
beta = np.zeros([burnin + it, 2])
sigma = np.zeros(burnin + it)
B_hat = np.zeros([2,2])
beta_hat = np.zeros(2)
sum1 = np.zeros([2,2])
sum2 = np.zeros(2)
#Specifying hyperparameters
beta0 = np.array([0,0])
B0 = np.array([[1, 0],[0,1]])
n0 = 2
s0 = 2
#Initial value setting
beta[0,0] = 1
beta[0,1] = 1
sigma[0] = 1
# Precompute the sums that appear in B_hat and beta_hat
for i in range(data):
    sum1[0,0] += x[i,0] ** 2
    sum1[0,1] += x[i,0] * x[i,1]
    sum1[1,0] += x[i,0] * x[i,1]
    sum1[1,1] += x[i,1] ** 2
    sum2[0] += x[i,0] * y[i]
    sum2[1] += x[i,1] * y[i]
for i in range(burnin + it - 1):
    # Draw beta | sigma^2, y from N(beta_hat, B_hat)
    B_hat = np.linalg.inv(sum1/sigma[i] + np.linalg.inv(B0))
    beta_hat = B_hat @ (sum2/sigma[i] + np.linalg.inv(B0) @ beta0)
    beta[i+1,:] = multivariate_normal(beta_hat, B_hat)
    # Draw sigma^2 | beta, y from IG((n + n0)/2, (sum of squared residuals + s0)/2)
    resid = y - beta[i+1,0]*x[:,0] - beta[i+1,1]*x[:,1]
    sigma[i+1] = invgamma.rvs((data + n0)/2, scale=(np.dot(resid, resid) + s0)/2)
As a result of the sampling, the histogram of beta looks as follows.
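The plotting code is not shown in the original article; as a minimal sketch, the burn-in draws could be discarded and the histogram drawn like this, assuming the arrays `beta` and `burnin` from the code above:

```python
# Discard the burn-in draws and summarize the slope coefficient
beta_slope = beta[burnin:, 1]
print("posterior mean of the slope:", beta_slope.mean())

plt.hist(beta_slope, bins=50)
plt.xlabel("beta (slope)")
plt.ylabel("frequency")
plt.show()
```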
It can be seen that the MAP estimate is close to the value of 0.8 that we set at the beginning. It worked.
(Details will be added later)