Understand the Metropolis-Hastings method (one of the Markov chain Monte Carlo methods) with implementation

Introduction

Previous articles explained the Markov chain and the Monte Carlo method respectively, and summarized the outline of the Markov chain Monte Carlo method (commonly known as the MCMC method). There are multiple algorithms for the MCMC method; this time I would like to summarize the Metropolis-Hastings method (commonly known as the MH method).


Looking back on the Markov chain Monte Carlo method

The Markov chain Monte Carlo method (commonly known as the MCMC method) is a method for obtaining a complex distribution, such as a probability distribution in a multidimensional space, from **sampling based on the probability distribution**.

When calculating an expected value such as the EAP estimate $θ_{EAP}$ of a parameter, the function involved tends to be high-dimensional in data analysis. In that case the calculation is very complicated and often analytically intractable, so we instead sample randomly from the target distribution (the Monte Carlo method). The Markov chain then adjusts the distribution of the sampled values so that it approaches the desired distribution. There are several major algorithms for the MCMC method.
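As a minimal sketch of the Monte Carlo idea (a hypothetical toy example, not from the original post), the expected value under the gamma distribution used later in this article can be approximated by averaging random samples:

import numpy as np
import scipy.stats as st

# Approximate E[θ] under Gamma(α=11, scale=1/13) by averaging samples
# instead of evaluating the integral directly.
samples = st.gamma(11, 0, 1/13.).rvs(size=100000)
print("Monte Carlo estimate:", np.mean(samples))  # ≈ 11/13 ≈ 0.84615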

This time, I will summarize the Metropolis-Hastings method.

What is the Metropolis-Hastings method?

In the MCMC method, the detailed balance condition shown below is considered for arbitrary random variables $θ$ and $θ'$.

f(θ'|θ)f(θ) = f(θ|θ')f(θ') \hspace{1cm}Equation 1

Here, $f(θ'|θ)$ and $f(θ|θ')$ are distributions called transition kernels. For a known posterior distribution $f(θ)$, it is usually very difficult to find a transition kernel that satisfies Equation 1 from scratch.

So, instead of the transition kernel $f(\cdot|\cdot)$, we use an appropriately chosen transition kernel $q(\cdot|\cdot)$ as a proposal distribution. As the later implementation shows, this "appropriate choice" is actually where the difficulty lies.

The proposal distribution is a probability distribution that proposes suitable candidates as samples from the target distribution. Because it is chosen arbitrarily, it does not necessarily satisfy the equality of Equation 1. For example,

q(θ|θ')f(θ') > q(θ'|θ)f(θ)\hspace{1cm}Equation 2\\

as in Equation 2. The method of applying a probability correction to this formula so that the detailed balance condition is satisfied is called the **Metropolis-Hastings method (commonly known as the MH method)**.

It is originally an algorithm proposed in the field of physical chemistry and was later applied to Bayesian statistics. Equation 2 shows that the probability of moving to $θ$ is greater than the probability of moving to $θ'$. To correct the transition probabilities, we introduce positive correction coefficients $c$ and $c'$:

f(θ|θ') = cq(θ|θ')\\
f(θ'|θ) = c'q(θ'|θ)

Therefore, after correction,

 cq(θ|θ')f(θ')=c'q(θ'|θ)f(θ)

and the equality is established. However, even in this case the coefficients $c$ and $c'$ cannot be determined individually; only their ratio matters. Therefore, normalizing by $c'$, we set

r = \frac{c}{c'} = \frac{q(θ'|θ)f(θ)}{q(θ|θ')f(θ')}\\
r' =\frac{c'}{c'} = 1

Substituting these back,

 rq(θ|θ')f(θ')=r'q(θ'|θ)f(θ)

holds, so taking $rq(θ|θ')$ and $r'q(θ'|θ)$ as the transition kernels satisfies the detailed balance condition.

MH algorithm

Now, the MH algorithm is below.

  1. Use the proposal distribution $q(\cdot|θ^{t})$ to generate a candidate $a$.
  2. Compute the correction coefficient

r  =\frac {q(θ^{t}|a)f(a) }{q(a|θ^{t})f(θ^{t})}

When $q(a|θ^{t})f(θ^{t}) > q(θ^{t}|a)f(a)$, i.e. $r < 1$, accept $a$ with probability $r$ and set $θ^{t+1} = a$; discard $a$ with probability $1-r$ and set $θ^{t+1} = θ^{t}$. Otherwise ($r \geq 1$), always accept $a$.
  3. Set $t = t + 1$ and return to 1.

In summary, we **accept the proposed candidate point $a$ with probability $min(1, r)$ $(θ^{t+1} = a)$, or stay in place $(θ^{t+1} = θ^{t})$, and repeat this any number of times**. The outline of the movement is shown in the figure below: $θ$ tends to move toward regions where the target distribution $f(θ)$ is high.

(Figure: schematic of MH moves converging toward the target distribution $f(θ)$)
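As a sketch (not the author's exact code), one MH update can be written generically as follows; the names `mh_step`, `f`, `q_pdf`, and `q_rvs` are hypothetical placeholders for the target density, the proposal density $q(x|y)$, and the proposal sampler:

import scipy.stats as st

def mh_step(current, f, q_pdf, q_rvs):
    # One Metropolis-Hastings update.
    # q_pdf(x, y) = proposal density q(x|y); q_rvs(y) draws from q(.|y).
    a = q_rvs(current)                                  # 1. propose a candidate
    r = (q_pdf(current, a) * f(a)) / (q_pdf(a, current) * f(current))
    if r >= 1 or r > st.uniform.rvs():                  # 2. accept with probability min(1, r)
        return a
    return current                                      # 3. stay in place on rejection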

Implement and check (independent MH method)

Now let's actually implement the MH method. There are two variants: the independent MH method and the random walk MH method. First, the independent MH method. The problem setting is as follows.

The posterior distribution of the parameter $θ$ is assumed to be the gamma distribution $f(θ|α=11, λ=13)$. The proposal distribution is a normal distribution $q(θ)$ with mean 1 and variance 0.5; since it does not depend on the current state, this is the independent MH method. Here the correction coefficient is

r = \frac{q(θ^{t})f(a)}{q(a)f(θ^{t})}

The program is below (the imports and helper definitions are sketched at the top as reconstructions; see the full-text link at the end for the original).

import numpy as np
import scipy.stats as st

# Reconstructed helpers (assumed; see the linked repository for the originals):
# target posterior Gamma(α=11, λ=13), independent proposal N(1, 0.5)
f_gamma = st.gamma(11, 0, 1/13.).pdf   # target density
prop_m, prop_sd = 1, np.sqrt(0.5)      # proposal mean / standard deviation
q = st.norm(prop_m, prop_sd).pdf       # proposal density

def rand_prop():
    return st.norm.rvs(prop_m, prop_sd)

theta = []

# Initial value
current = 3
theta.append(current)

n_itr = 100000
n_burn_in = 1000  # burn-in length (assumed value)

for i in range(n_itr):
    # Random number generation from the proposed distribution
    a = rand_prop()
    if a < 0:
        # Outside the support of the gamma distribution: reject
        theta.append(theta[-1])
        continue

    # Correction coefficient of the independent MH method
    r = (q(current)*f_gamma(a)) / (q(a)*f_gamma(current))

    if r >= 1 or r > st.uniform.rvs():  # generate a uniform random number; accept if r is greater
        # Accept
        theta.append(a)
        current = a
    else:
        # Reject
        theta.append(theta[-1])

The point is how to express "accept with probability $r$":

if r >= 1 or r > st.uniform.rvs()

The `st.uniform.rvs()` method generates a random number from the uniform distribution on $[0, 1]$, and the candidate is accepted if $r$ is greater than that value (or if $r \geq 1$).
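Written as a standalone helper (a hypothetical function name, just to make the idiom explicit), this is exactly acceptance with probability $min(1, r)$:

import scipy.stats as st

def accept(r):
    # Accept with probability min(1, r): draw u ~ U(0, 1) and accept if r > u.
    return r >= 1 or r > st.uniform.rvs()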

Let's check whether this program reproduces the posterior distribution.


plt.title("Histgram from MCMC sampling.")
plt.hist(theta[n_burn_in:], bins=30, normed=True)

xx = np.linspace(0, 2.5,501)
plt.plot(xx, st.gamma(11, 0, 1/13.).pdf(xx))
plt.show()

(Figure: histogram of the samples overlaid with the target gamma density; 100,000 iterations, proposal mean 1)

The blue histogram is the frequency distribution of $θ$ visited by the MH method, normalized so that the total area is 1. The orange curve is the posterior distribution we originally wanted to reproduce, so we can confirm that it has been reproduced.

By the way, this seems to have worked without any problem, but the proposal distribution had mean 1 and variance 0.5. That choice is arbitrary and may not actually work. For example, with a normal distribution of mean 3 and variance 0.5 as the proposal distribution:

(Figure: histogram shifted away from the target gamma density; proposal mean 3)

the histogram shifts like this. In fact, the independent MH method has the problem that the chain does not converge well unless the proposal distribution is close to the target distribution. In other words, with the independent MH method the quality of the proposal distribution makes a big difference in the results and in the time to convergence. This time the posterior distribution was known for simplicity, but in actual data analysis the distribution is usually unknown, so we need a way around this problem. A simple way to observe the sensitivity is sketched below.
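As a rough diagnostic (a hypothetical helper, not from the original post), the acceptance rate of the independent MH sampler can be tracked for different proposal means; a proposal far from the target is rejected far more often:

import numpy as np
import scipy.stats as st

f_gamma = st.gamma(11, 0, 1/13.).pdf  # target density
prop_sd = np.sqrt(0.5)

def acceptance_rate(prop_m, n_itr=20000):
    # Independent MH with proposal N(prop_m, 0.5); count accepted moves.
    q = st.norm(prop_m, prop_sd).pdf
    current, n_acc = 3, 0
    for _ in range(n_itr):
        a = st.norm.rvs(prop_m, prop_sd)
        if a <= 0:
            continue  # outside the gamma support
        r = (q(current) * f_gamma(a)) / (q(a) * f_gamma(current))
        if r >= 1 or r > st.uniform.rvs():
            current, n_acc = a, n_acc + 1
    return n_acc / n_itr

print(acceptance_rate(1))  # proposal centered near the target mean: high rate
print(acceptance_rate(3))  # proposal centered far away: low rate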

Random walk MH method

An algorithm called the random walk MH method has been proposed as a method to solve this problem.

The proposal candidate is generated as

a =θ^{t} + e

where the noise $e$ is drawn from a distribution with mean 0, such as a normal distribution or a uniform distribution.

If you choose a symmetric distribution such as a normal distribution or a uniform distribution for the proposed distribution,

q(a|θ^{t}) = q(θ^{t}|a)

holds for the proposal distribution. Therefore, the correction coefficient $r$ in the random walk MH method reduces to

r = \frac {f(a)}{f(θ^{t})}

and the proposal distribution always cancels out of the correction coefficient.

Implement and check (random walk MH method)

Now, let's implement and check the same problem as before.

# MCMC sampling (random walk MH method)
import numpy as np
import scipy.stats as st

f_gamma = st.gamma(11, 0, 1/13.).pdf  # target density, same as before

theta = []
current = 5
theta.append(current)
n_itr = 100000
n_burn_in = 1000         # burn-in length (assumed value)
prop_sd = np.sqrt(0.10)  # step size of the random walk

def rand_prop1(prop_m):
    # Propose a = current + e with e ~ N(0, 0.10)
    return st.norm.rvs(prop_m, prop_sd)

for i in range(n_itr):
    a = rand_prop1(current)            # random number generation from the proposed distribution
    r = f_gamma(a) / f_gamma(current)  # symmetric proposal, so q cancels out

    if a < 0:
        # Outside the support of the gamma distribution: reject
        theta.append(theta[-1])
    elif r >= 1 or r > st.uniform.rvs():
        # Accept
        theta.append(a)
        current = a
    else:
        # Reject
        theta.append(theta[-1])

The plotting code is below.

plt.title("Histgram from MCMC sampling.")
n, b, p = plt.hist(theta[n_burn_in:], bins=50, density=True)

xx = np.linspace(0, 2.5,501)
plt.plot(xx, st.gamma(11, 0, 1/13.).pdf(xx))
plt.show()

kk = 11.
tt = 13.
print ("sample mean:{0:.5f}, sample std:{1:.5f}".format(np.mean(theta), np.std(theta)))
# Theoretical mean / standard deviation
print("mean:{0:.5f}, std:{1:.5f}".format( kk/tt, np.sqrt(kk*((1/tt)**2))) )

(Figure: histogram of the random-walk MH samples matching the target gamma density)

The distribution is reproduced neatly. Compared to the independent MH method above, the random walk MH algorithm seems safer to use.

Remaining problems with the MH method

I would like to say that this is a relief, but in reality there are still issues to solve. We still have to choose the distribution of $e$ properly, so a hyperparameter remains. If its variance is small, the proportion of accepted transitions is high, but the chain advances through the state space as a slow random walk and needs a long time to converge. Conversely, the larger the variance, the higher the rejection rate. The sketch below illustrates this trade-off. The Hamiltonian Monte Carlo method is one way to solve this; I hope to summarize it next time.
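The trade-off can be observed by re-running the random-walk sampler with different step sizes (a hedged sketch; the helper name and the specific values are illustrative, not from the original post):

import numpy as np
import scipy.stats as st

f_gamma = st.gamma(11, 0, 1/13.).pdf  # target density

def rw_acceptance_rate(prop_sd, n_itr=20000):
    # Random-walk MH: acceptance rate as a function of the step size prop_sd.
    current, n_acc = 1.0, 0
    for _ in range(n_itr):
        a = current + st.norm.rvs(0, prop_sd)
        r = f_gamma(a) / f_gamma(current)
        if a > 0 and (r >= 1 or r > st.uniform.rvs()):
            current, n_acc = a, n_acc + 1
    return n_acc / n_itr

for sd in [0.05, 0.3, 1.0]:
    # Small steps: almost always accepted but explore slowly;
    # large steps: explore widely but are often rejected.
    print(sd, rw_acceptance_rate(sd))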

At the end

This time, we have summarized the Metropolis-Hastings method. Thinking about how to converge toward the mean of the posterior distribution is much like optimizing the weight parameters of a neural network.

The algorithms differ, but the goals are similar, and I felt that grasping this connection reveals the fun and breadth of statistics and machine learning.

The full text of the program is stored here. https://github.com/Fumio-eisan/Montecarlo20200516
