The previous articles explained the Markov chain and the Monte Carlo method, respectively, and outlined the Markov chain Monte Carlo method (commonly known as the MCMC method). There are multiple algorithms for the MCMC method; this time I would like to summarize the Metropolis-Hastings method (commonly known as the MH method).
I referred to the following articles and books:
[Bayesian Statistics from the Basics: A Practical Introduction to the Hamiltonian Monte Carlo Method](https://www.amazon.co.jp/%E5%9F%BA%E7%A4%8E%E3%81%8B%E3%82%89%E3%81%AE%E3%83%99%E3%82%A4%E3%82%BA%E7%B5%B1%E8%A8%88%E5%AD%A6-%E3%83%8F%E3%83%9F%E3%83%AB%E3%83%88%E3%83%8B%E3%82%A2%E3%83%B3%E3%83%A2%E3%83%B3%E3%83%86%E3%82%AB%E3%83%AB%E3%83%AD%E6%B3%95%E3%81%AB%E3%82%88%E3%82%8B%E5%AE%9F%E8%B7%B5%E7%9A%84%E5%85%A5%E9%96%80-%E8%B1%8A%E7%94%B0-%E7%A7%80%E6%A8%B9/dp/4254122128)
[Statistics] Explaining sampling with the Markov chain Monte Carlo method (MCMC) through animation
Bayesian Statistics from the Basics, reading-group material, Chapter 4: The Metropolis-Hastings Method
The Markov chain Monte Carlo method (commonly known as the MCMC method) is a technique for obtaining a complex distribution, such as a probability distribution over a multidimensional space, from **sampling based on that probability distribution**.
Consider, for example, computing the expected value $θ_{eap}$ of a parameter (the EAP estimate). In data analysis this integral tends to be high-dimensional, which makes the calculation very complicated and often analytically intractable, so we instead draw random samples from the target distribution (the Monte Carlo method). The Markov chain then adjusts the distribution of the sampled values so that it approaches the desired distribution. Several major algorithms implement the MCMC method; this time I will summarize the Metropolis-Hastings method.
In the MCMC method, we consider the detailed balance condition shown below for arbitrary values $θ$ and $θ'$ of the random variable.
f(θ'|θ)f(θ) = f(θ|θ')f(θ') \hspace{1cm} (Equation 1)
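As a minimal illustration (my own check, not from the referenced book), detailed balance can be verified numerically for a small two-state Markov chain whose stationary distribution is known:

```python
import numpy as np

# Hypothetical two-state chain: stationary distribution f and transition matrix P
f = np.array([0.25, 0.75])   # f(θ=0), f(θ=1)
P = np.array([[0.7, 0.3],    # P[i, j] = f(θ'=j | θ=i)
              [0.1, 0.9]])

# Check f(θ'|θ)f(θ) = f(θ|θ')f(θ') for every pair (θ, θ')
for i in range(2):
    for j in range(2):
        assert np.isclose(P[i, j] * f[i], P[j, i] * f[j])
print("detailed balance holds")  # and indeed f @ P equals f
```

A chain satisfying this condition leaves $f$ invariant, which is exactly the property the MCMC method exploits.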
Here, $f(θ'|θ)$ denotes the transition kernel: the conditional probability of moving from the state $θ$ to the state $θ'$. Designing a transition kernel that satisfies Equation 1 directly is difficult, so instead we prepare a proposal distribution $q(θ'|θ)$, a probability distribution that proposes suitable candidates for samples from the target distribution. Because the proposal distribution is chosen somewhat arbitrarily, it does not in general satisfy the equality. For example, we may have
q(θ|θ')f(θ') > q(θ'|θ)f(θ) \hspace{1cm} (Equation 2)
as in Equation 2. The method of applying a probabilistic correction to this situation so that the detailed balance condition is satisfied is called the **Metropolis-Hastings algorithm (commonly known as the MH method)**.
It is an algorithm originally proposed in the field of physical chemistry and later applied to Bayesian statistics. Equation 2 says that the probability of moving to $θ$ is greater than the probability of moving to $θ'$. To correct this through the transition probabilities, we introduce positive correction coefficients $c$ and $c'$:
f(θ|θ') = cq(θ|θ')\\
f(θ'|θ) = c'q(θ'|θ)
Therefore, after correction,
cq(θ|θ')f(θ')=c'q(θ'|θ)f(θ)
and the equality holds. Even so, the values of $c$ and $c'$ cannot be determined uniquely from this single equation. We therefore work only with their ratio, setting
r = \frac{c}{c'} = \frac{q(θ'|θ)f(θ)}{q(θ|θ')f(θ')}\\
r' =\frac{c'}{c'} = 1
Substituting these back,
rq(θ|θ')f(θ')=r'q(θ'|θ)f(θ)
holds identically by the definition of $r$, so the detailed balance condition is satisfied.
The MH algorithm is then as follows. Draw a candidate $a$ from the proposal distribution $q(a|θ^{t})$. When
q(a|θ^{t})f(θ^{t}) > q(θ^{t}|a)f(a)
holds, the correction coefficient

r =\frac {q(θ^{t}|a)f(a) }{q(a|θ^{t})f(θ^{t})}

is calculated; $a$ is accepted with probability $r$, setting $θ^{t+1} = a$, and discarded with probability $1-r$, keeping $θ^{t+1} = θ^{t}$.
In summary, we **accept the proposed candidate point $a$ with probability $\min(1, r)$ (setting $θ^{t+1} = a$) or otherwise stay in place ($θ^{t+1} = θ^{t}$), and repeat this as many times as needed**. In this way, $θ$ tends to move toward regions where the target distribution $f(θ)$ is high, as the sketch below illustrates.
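To make the flow concrete, here is a minimal sketch of a single MH update. The names `f`, `q_pdf`, and `q_sample` are placeholders of my own for the target density, the proposal density, and the proposal sampler; they are not from the original article.

```python
import numpy as np

def mh_step(theta_t, f, q_pdf, q_sample):
    """One Metropolis-Hastings update from the current state theta_t.
    q_pdf(x, cond) is the density of proposing x given the state cond."""
    a = q_sample(theta_t)  # draw a candidate from the proposal distribution
    r = (q_pdf(theta_t, a) * f(a)) / (q_pdf(a, theta_t) * f(theta_t))
    if r >= 1 or np.random.uniform() < r:  # accept with probability min(1, r)
        return a
    return theta_t  # otherwise stay in place
```

Iterating `mh_step` produces the chain $θ^{0}, θ^{1}, θ^{2}, \dots$ whose long-run distribution approaches $f(θ)$.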
Now I would like to actually implement the MH method. There are two variants: the independent MH method and the random walk MH method. First, the independent MH method. The problem setting is as follows.
Suppose the posterior distribution of the parameter $θ$ is the gamma distribution $f(θ|α=11, λ=13)$. For the proposal distribution, we use a normal distribution $q(θ)$ with mean 1 and variance 0.5. Since this proposal does not depend on the current state, the correction coefficient is
r =\frac {q(θ^{t})f(a) }{q(a)f(θ^{t})}
The program is below (library imports and helper definitions are not written out here; see the full-code link at the end). To keep the article self-contained, a sketch of those omitted pieces comes first.
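The helper names `f_gamma`, `q`, and `rand_prop` are assumed to match the linked repository, so the details there may differ; this is a minimal sketch of the problem setup described above:

```python
import numpy as np
import scipy.stats as st
import matplotlib.pyplot as plt

# Target: posterior f(θ) = Gamma(α=11, λ=13) (shape-rate parameterization)
def f_gamma(theta):
    return st.gamma(a=11, scale=1/13.).pdf(theta)

# Independent proposal: normal with mean 1 and variance 0.5
prop_mean, prop_sd = 1.0, np.sqrt(0.5)

def q(theta):
    return st.norm(prop_mean, prop_sd).pdf(theta)

def rand_prop():
    return st.norm.rvs(prop_mean, prop_sd)
```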
```python
theta = []
current = 3  # initial value
theta.append(current)
n_itr = 100000

for i in range(n_itr):
    a = rand_prop()  # draw a candidate from the proposal distribution
    if a < 0:
        # the gamma density is zero for negative values, so always reject
        theta.append(theta[-1])
        continue
    # correction coefficient r for the independent proposal
    r = (q(current) * f_gamma(a)) / (q(a) * f_gamma(current))
    if r >= 1 or r > st.uniform.rvs():  # accept with probability min(1, r)
        theta.append(a)
        current = a
    else:
        # reject: stay at the current value
        theta.append(theta[-1])
```
The key point is how to express "accept with probability $r$". In the line

```python
if r >= 1 or r > st.uniform.rvs():
```

st.uniform.rvs() generates a random number from the uniform distribution on $[0, 1]$, and the candidate is accepted if $r$ is larger than that number (or immediately if $r \ge 1$).
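As a quick sanity check (my own addition, not in the original article), repeating this comparison many times accepts at almost exactly the intended rate:

```python
import scipy.stats as st

r = 0.3                           # example acceptance ratio
u = st.uniform.rvs(size=100000)   # uniform draws on [0, 1]
print((r > u).mean())             # ≈ 0.3: the accepted fraction matches r
```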
Let us check whether this program reproduces the posterior distribution.
```python
n_burn_in = 10000  # number of initial samples to discard as burn-in (value assumed)
plt.title("Histogram from MCMC sampling.")
plt.hist(theta[n_burn_in:], bins=30, density=True)  # density=True replaces the deprecated normed=True
xx = np.linspace(0, 2.5, 501)
plt.plot(xx, st.gamma(11, 0, 1/13.).pdf(xx))
plt.show()
```
The blue histogram is the frequency distribution of the $θ$ values visited by the MH method, normalized so that the total area is 1. The orange curve is the posterior distribution we originally wanted to reproduce, and the two agree, so we can confirm that the distribution has been reproduced.
By the way, this seemed to work without any problem, but the proposal distribution's mean of 1 and variance of 0.5 were chosen arbitrarily, and other choices may not work. For example, if a normal distribution with mean 3 and variance 0.5 is used as the proposal, the sampled histogram shifts away from the target, as sketched below. In fact, the independent MH method has the known problem that the chain does not converge well unless the proposal distribution is close to the target distribution. In other words, with the independent MH method, the quality of the proposal makes a big difference in the results and in the time until convergence. Here the posterior distribution was known for simplicity, but in actual data analysis the distribution is usually unknown, so we need a way to work around this problem.
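A minimal sketch of that failure case, reusing the helpers above and changing only the proposal mean (the exact parameters of the original experiment are assumed):

```python
import scipy.stats as st

# Mis-specified independent proposal: mean 3 instead of 1
prop_mean_bad = 3.0

def q_bad(theta):
    return st.norm(prop_mean_bad, prop_sd).pdf(theta)

def rand_prop_bad():
    return st.norm.rvs(prop_mean_bad, prop_sd)

# Re-running the sampling loop with q_bad / rand_prop_bad in place of
# q / rand_prop yields a histogram visibly shifted away from the target.
```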
An algorithm called the random walk MH method has been proposed as a method to solve this problem.
The candidate is proposed as

a =θ^{t} + e

where $e$ is drawn from, for example, a normal distribution with mean 0 or a uniform distribution.
If a symmetric distribution such as a normal or uniform distribution is chosen as the proposal,

q(a|θ^{t}) = q(θ^{t}|a)

holds. Therefore, the correction coefficient $r$ in the random walk MH method reduces to

r = \frac {f(a)}{f(θ^{t})}

and the proposal distribution always cancels out of the ratio.
Now, let's implement and check the same problem as before.
```python
# MCMC sampling (random walk MH)
theta = []
current = 5  # initial value
theta.append(current)
n_itr = 100000
prop_sd = np.sqrt(0.10)  # standard deviation of the random walk step e

def rand_prop1(mean):
    # candidate a = θ^t + e, where e ~ N(0, prop_sd^2)
    return st.norm.rvs(mean, prop_sd)

for i in range(n_itr):
    a = rand_prop1(current)  # draw a candidate centered on the current state
    if a < 0:
        # the gamma density is zero for negative values, so always reject
        theta.append(theta[-1])
        continue
    # the symmetric proposal cancels, leaving only the ratio of target densities
    r = f_gamma(a) / f_gamma(current)
    if r >= 1 or r > st.uniform.rvs():  # accept with probability min(1, r)
        theta.append(a)
        current = a
    else:
        # reject: stay at the current value
        theta.append(theta[-1])
```
The plotting code is below.
```python
plt.title("Histogram from MCMC sampling.")
n, b, p = plt.hist(theta[n_burn_in:], bins=50, density=True)
xx = np.linspace(0, 2.5, 501)
plt.plot(xx, st.gamma(11, 0, 1/13.).pdf(xx))
plt.show()
```
```python
kk = 11.
tt = 13.
print("sample mean:{0:.5f}, sample std:{1:.5f}".format(np.mean(theta), np.std(theta)))
# theoretical mean and standard deviation of Gamma(α=11, λ=13)
print("mean:{0:.5f}, std:{1:.5f}".format(kk/tt, np.sqrt(kk*((1/tt)**2))))
```
The distribution was reproduced neatly. Compared with the independent MH method described earlier, the random walk MH algorithm seems safer to use.

I would like to say this is a relief, but in reality issues remain. The scale of $e$ still has to be specified properly, so a hyperparameter choice is again required. If the variance of $e$ is small, the proportion of accepted transitions is high, but the chain advances through the state space as a slow random walk and needs a long time to converge. Conversely, the larger the variance, the higher the rejection rate. One practical way to monitor this trade-off is sketched below. The Hamiltonian Monte Carlo method is a way to resolve the trade-off itself; I hope to summarize it next time.
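A minimal sketch (my own addition) that estimates the acceptance rate from the chain produced above, counting a move as accepted whenever the state actually changed:

```python
# fraction of iterations in which the chain moved to a new value
n_accept = sum(1 for prev, cur in zip(theta[:-1], theta[1:]) if prev != cur)
print("acceptance rate: {0:.3f}".format(n_accept / (len(theta) - 1)))
```

If this rate is very high, the steps are probably too small; if it is very low, too many candidates are being wasted.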
This time, I have summarized the Metropolis-Hastings method. Thinking about how the chain converges toward the posterior distribution felt much like optimizing the weight parameters of a neural network.

The algorithms differ, but the underlying goal is similar, and I felt that grasping this connection reveals the fun and breadth of statistics and machine learning.
The full program is stored here: https://github.com/Fumio-eisan/Montecarlo20200516