[Statistics] Let's explain sampling by Markov chain Monte Carlo method (MCMC) with animation.

This article implements and explains the Markov chain Monte Carlo method (MCMC) in Python.

On p. 16 of "Computational Statistics II: Markov Chain Monte Carlo Method and Its Surroundings" there is the advice:

The best way to get a feel for the content of this section is to pick any programming language and try to implement what has been described here from scratch.

So I obediently gave it a try. And since I went to the trouble, I will explain the code and how it works.

I'll show the resulting animation and plot first. (Burn-in period: steps 1-30 [data in this period are plotted in a lighter color]; up to 150 samplings in total, including rejections.)

metropolis_norm_1-compressor.gif

Here is a plot of the result of repeating the sampling 10,000 times (of which the first 2,000 are burn-in).

mcmc10000-compressor.png

Introduction

First of all, import the required libraries.

import numpy as np
import numpy.random as rd
import pandas as pd
import scipy.stats as st
import copy

%matplotlib inline
import matplotlib.pyplot as plt
from matplotlib import animation as ani
import matplotlib.cm as cm
import seaborn as sns
sns.set(style="whitegrid", palette="muted", color_codes=True)

Target Distribution to be sampled this time

f(x_1, x_2) = {\sqrt{1-b^2} \over 2\pi } \exp \left( -{1 \over 2} (x_1^2 - 2bx_1x_2 + x_2^2)  \right) \\
\propto \exp \left( -{1 \over 2} (x_1^2 - 2bx_1x_2 + x_2^2)  \right)

We use a two-dimensional normal distribution with its normalization constant omitted. Since the algorithm only ever compares probability densities as a ratio, the constant factor can be ignored. We define this target distribution as a Python function P(・).

# Target distribution: probability density of a 2-D normal distribution with the normalization constant omitted
def P(x1, x2, b):
    assert np.abs(b) < 1
    return np.exp(-0.5*(x1**2 - 2*b*x1*x2 + x2**2))
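As a quick sanity check (my own addition, not in the original article), we can confirm that this P is proportional to a proper bivariate normal density by comparing it with `scipy.stats.multivariate_normal`. The covariance used below is the inverse of the precision matrix [[1, -b], [-b, 1]] implied by the exponent of P:

```python
import numpy as np
import scipy.stats as st

def P(x1, x2, b):
    assert np.abs(b) < 1
    return np.exp(-0.5*(x1**2 - 2*b*x1*x2 + x2**2))

b = 0.5
# precision (inverse covariance) matrix implied by the exponent of P
prec = np.array([[1.0, -b], [-b, 1.0]])
cov = np.linalg.inv(prec)  # covariance of the corresponding normal distribution

rv = st.multivariate_normal(mean=[0.0, 0.0], cov=cov)

# the ratio P / pdf should be the same constant at every point
pts = [(0.0, 0.0), (1.0, 2.0), (-3.0, 0.5)]
ratios = [P(x1, x2, b) / rv.pdf([x1, x2]) for x1, x2 in pts]
print(ratios)  # all three entries are (nearly) identical
```

Since MCMC only ever uses ratios of P at two points, this constant factor never matters.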

Define various parameters.

# parameters
b = 0.5            # correlation parameter of the target distribution
delta = 1          # scale (std. dev. or half-width) of the proposal distribution
dist_type = "norm" # type of proposal distribution ("norm" or "unif")
print("Type of proposal distribution:", dist_type)

num_frame = 150    # total number of frames in the animation

# list to store the sampling results
sample = []
# list to record accept (1) / reject (0)
acc_rej = [] 

After setting the parameters, we start the sampling and draw the animation. I have added explanatory comments at the important points.

#Setting the initial position and storing it in the sampling result list
current = (3, 3)
sample.append(current)

#A function that draws each frame of the animation
def animate(nframe):
    global current, acc_rej
    print(nframe, end=" ")  # show progress

    # choose the next step candidate from the proposal distribution
    # dist_type: "norm": normal distribution / "unif": uniform distribution
    if dist_type == "norm":
        next = (current[0] + rd.normal(0, delta), current[1] + rd.normal(0, delta))
    else:
        next = (current[0] + rd.uniform(-delta, delta), current[1] + rd.uniform(-delta, delta))
    # ratio of target-distribution densities at the two positions ...[[1]]
    P_prev = P(current[0], current[1], b)   # density of the target distribution at the current position (up to a constant)
    P_next = P(next[0], next[1], b)         # density of the target distribution at the candidate position (up to a constant)

    # take the ratio of the two values
    r = P_next/P_prev
    
    # white rectangle in the upper left of the graph where Accept/Reject is displayed
    ax = fig.add_subplot(111)
    rect = plt.Rectangle((-3.8,3.2), 1.1, .5, fc="#ffffff", zorder=nframe)
    ax.add_patch(rect)
    
    # draw a dotted line for the path from the current position to the candidate
    plt.plot([current[0], next[0]], [current[1], next[1]], linestyle="--", lw=.3, color="gray") 
    
    if r > 1 or r > rd.uniform(0, 1):     # ...[[2]]
        # accept: r exceeds 1, or r exceeds a uniform random number on [0, 1)
        current = copy.copy(next)
        # store the sampled value in the list
        sample.append(current) 
        
        if nframe < num_frame*.2:
            # treat the first 20% of iterations as the burn-in period
            # (plot them in a lighter color)
            alpha = 0.2
        else:
            # restore full opacity after the burn-in period
            alpha = 0.8
            # record the accept
            acc_rej.append(1)
            
        # the candidate was accepted, so plot the point
        plt.scatter(current[0], current[1], alpha=alpha)
        plt.text(-3.7, 3.35, "Accept", zorder=nframe, fontdict={'color':"b"})
        
    else:  
        # reject: the uniform random number exceeded r
        # plot an x mark at the rejected candidate
        plt.scatter(next[0], next[1], alpha=0.5, color="r", marker="x")
        plt.text(-3.7, 3.35, "Reject", zorder=nframe, fontdict={'color':"r"})
        
        if nframe >= num_frame*.2:
            # record the reject (only after the burn-in period, to match accepts)
            acc_rej.append(0)

    if nframe >= num_frame*.2:
        plt.title("cnt:{}".format(nframe+1))
    else:
        plt.title("cnt:{} [burn-in]".format(nframe+1))

    #Setting the drawing range of the graph
    plt.xlim(-4, 4)
    plt.ylim(-4, 4)
    

fig = plt.figure(figsize=(8,7))
anim = ani.FuncAnimation(fig, animate, frames=int(num_frame), blit=False)
anim.save('metropolis_norm.gif', writer='imagemagick', fps=3, dpi=96)

# calculate the acceptance rate
print("Accept ratio: {0:.5f}".format(np.mean(acc_rej)))

metropolis_norm_1-compressor.gif

The acceptance rate was about 60%, which is not particularly high.
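How high the acceptance rate is depends strongly on the scale delta of the proposal distribution. As a small side experiment (my own, not from the article), here is a one-dimensional Metropolis run on a standard normal target with several values of delta; larger steps get rejected more often:

```python
import numpy as np

rng = np.random.default_rng(0)

def acceptance_rate(delta, n=5000):
    """Run a 1-D Metropolis sampler on N(0, 1) and return the acceptance rate."""
    x = 0.0
    accepted = 0
    for _ in range(n):
        cand = x + rng.normal(0, delta)    # propose a step of scale delta
        r = np.exp(-0.5*(cand**2 - x**2))  # density ratio (constants cancel)
        if r > rng.uniform(0, 1):          # accept with probability min(1, r)
            x = cand
            accepted += 1
    return accepted / n

rates = [acceptance_rate(d) for d in (0.5, 2.0, 8.0)]
print(rates)  # acceptance rate drops as the step size grows
```

Too small a delta accepts almost everything but explores the space slowly; too large a delta rejects most candidates. Tuning delta to a moderate acceptance rate is a standard practical concern.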

At each sampling step, the Metropolis-Hastings method selects the next transition candidate according to the proposal distribution. The figure below shows, after 40 samples have been drawn, the contour lines of the proposal distribution used to select the next sample. The red dot is the **current position**. The next candidate is generated as

x_1^{(t+1)} = x_1^{(t)} + N(0, 1) \\
x_2^{(t+1)} = x_2^{(t)} + N(0, 1) 

which is why the contours look like this. Since this proposal distribution generates a value according to $N(0, 1)$ regardless of the current position, the next candidate may also fall in a direction where the probability density of the target distribution is low, as in the figure below.

proposal2-compressor.png

Below is a cross-section taken along the red line in the figure above. The blue curve is the target distribution and the green curve is the proposal distribution. A random number has been drawn from the proposal distribution centered at the current position, and the blue dot has been selected as the candidate for the transition destination.

prop_target01-compressor.png

Now let's compare the probability density of the target distribution at the current position with the probability density at the transition candidate. The two pink lines below correspond to these two values. Intuitively, this candidate lies where the target density is low, so it should rarely be realized. However, since it was generated purely from the proposal distribution, the target density has not been taken into account at all. To reflect the target density, we accept or reject the candidate according to a certain rule.

prop_target02-compressor.png

Let's compare the two densities by taking their ratio. Calling the ratio $r$, it can be written as

r = \frac{f(x_1^{(a)}, x_2^{(a)})}{f(x_1^{(t)}, x_2^{(t)})}

where $(x_1^{(t)}, x_2^{(t)})$ is the current position and $(x_1^{(a)}, x_2^{(a)})$ is the transition candidate. This corresponds to the following part of the Python code:

    # ratio of target-distribution densities at the two positions ...[[1]]
    P_prev = P(current[0], current[1], b)   # density at the current position (up to a constant)
    P_next = P(next[0], next[1], b)         # density at the candidate position (up to a constant)
    # take the ratio of the two values
    r = P_next/P_prev

This $r$ takes a value greater than or equal to 0. The acceptance rule is obtained by interpreting this ratio as an acceptance probability.

First, if $r \ge 1$, the candidate is always accepted.

If $0 \le r < 1$, the candidate is accepted with probability $r$, which is implemented by comparing $r$ with a uniform random number on $[0, 1]$.

This corresponds to the following part of the Python code:

    if r > 1 or r > rd.uniform(0, 1):     # ...[[2]]
        …
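As a side note (my own check, not from the article), the condition `r > 1 or r > rd.uniform(0, 1)` is equivalent, apart from zero-probability edge cases, to the textbook formulation "accept with probability min(1, r)":

```python
import numpy as np

def accept_article(r, u):
    # the form used in the code above
    return r > 1 or r > u

def accept_min(r, u):
    # the textbook form: accept iff u < min(1, r)
    return u < min(1.0, r)

# compare both predicates over ratios r >= 0 and uniform draws u in [0, 1)
rs = [0.0, 0.3, 0.99, 1.0, 1.5, 10.0]
us = np.linspace(0.0, 0.999, 100)
agree = all(accept_article(r, u) == accept_min(r, u) for r in rs for u in us)
print(agree)  # True
```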

By repeating this procedure, we can sample from the target distribution. For the theoretical proof that this works, please refer to "Computational Statistics II: Markov Chain Monte Carlo Method and Its Surroundings".
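The whole procedure can be condensed into a short self-contained sketch (my own rewrite, not the article's exact code; note that in the standard Metropolis formulation the chain records the current state again whenever a candidate is rejected):

```python
import numpy as np

def P(x1, x2, b):
    # target density up to a constant, as defined earlier
    return np.exp(-0.5*(x1**2 - 2*b*x1*x2 + x2**2))

def metropolis(n_steps, b=0.5, delta=1.0, seed=0):
    rng = np.random.default_rng(seed)
    x = np.array([3.0, 3.0])       # initial state
    chain = []
    for _ in range(n_steps):
        cand = x + rng.normal(0, delta, size=2)
        r = P(cand[0], cand[1], b) / P(x[0], x[1], b)
        if r > rng.uniform(0, 1):
            x = cand               # accept the candidate
        chain.append(x.copy())     # on rejection, the current state repeats
    return np.array(chain)

chain = metropolis(20000)
burned = chain[4000:]               # discard the first 20% as burn-in
print(burned.mean(axis=0))          # close to (0, 0)
print(np.corrcoef(burned.T)[0, 1])  # close to b = 0.5
```

For this target, the correlation between the two coordinates equals b, so the empirical correlation of the chain is a convenient check.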

10,000 sampling executions

Since the animation above used only 150 samplings, let's now draw 10,000 samples and plot them so that the shape of the target distribution becomes clearer.

# parameters
b = 0.5
delta = 1
dist_type = "norm" # "norm" or "unif"
n_samples = 10000

# result
sample = []

#initial state
current = (5, 5)
sample.append(current)

print(dist_type)

cnt = 1
while cnt < n_samples:
    # candidate of next step
    if dist_type == "norm":
        next = (current[0] + rd.normal(0, delta), current[1] + rd.normal(0, delta))
    else:
        next = (current[0] + rd.uniform(-delta, delta), current[1] + rd.uniform(-delta, delta))

    P_prev = P(current[0], current[1], b)
    P_next = P(next[0], next[1], b)

    r = P_next/P_prev

    if r > 1 or r > rd.uniform(0, 1):
        # accept: update the state when r exceeds 1 or the uniform random number
        current = copy.copy(next)
        sample.append(current)
        cnt += 1
    
sample = np.array(sample)
plt.figure(figsize=(9,6))
plt.scatter(sample[int(len(sample)*0.2):,0], sample[int(len(sample)*0.2):,1], alpha=0.2)
plt.title("Scatter plot of 2-dim normal random variable with MCMC.")
plt.show()

mcmc10000-compressor.png

With b = 0.5, the covariance matrix of the target distribution is proportional to

[[1,  0.5],
 [0.5,  1]]

(more precisely, it is this matrix multiplied by 1/(1-b^2) = 4/3), and the plot indeed looks like samples from a two-dimensional normal distribution with correlation 0.5.

Transition of average value

Let's also look at how the means of $x_1$ and $x_2$ evolve during sampling. You can see that the mean gradually approaches 0, as expected. The mean has only just reached 0 by the time the 10,000 samples (including 2,000 burn-in) are exhausted, so it might be better to increase the number of samplings a bit more.

ave = [[],[]]

start = int(len(sample) * 0.2)  # discard the burn-in period
for i, d in enumerate(sample[start:]):
    for j in range(2):
        if i == 0:
            ave[j].append(float(d[j]))
        else:
            # incremental update of the running mean
            ave[j].append((ave[j][i-1]*i + d[j])/float(i+1))


plt.figure(figsize=(15, 5))
plt.xlim(0, len(sample) - start)
plt.plot(np.array(ave).T, lw=1)
plt.title("Sequence of x's and y's mean.")
plt.show()
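As an aside, the running mean computed by the loop above can also be obtained without an explicit loop using `np.cumsum` (my own variant; `post` below is a stand-in for the post-burn-in samples):

```python
import numpy as np

# stand-in for the post-burn-in samples, shape (n, 2)
post = np.random.default_rng(1).normal(size=(1000, 2))

# running mean: cumulative sum divided by the number of terms so far
counts = np.arange(1, len(post) + 1)[:, None]
running_mean = np.cumsum(post, axis=0) / counts

# the last row of the running mean equals the overall mean
print(np.allclose(running_mean[-1], post.mean(axis=0)))  # True
```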

mean_2-compressor.png

Histogram of sampling results

fig = plt.figure(figsize=(15,6))

ax = fig.add_subplot(121)
plt.hist(sample[int(start):,0], bins=30)
plt.title("x axis")

ax = fig.add_subplot(122)
plt.hist(sample[int(start):,1], bins=30, color="g")
plt.title("y axis")

plt.show()

hist_02-compressor.png

Set of code

The code used in this article can be found on GitHub here.

reference

"Computational Statistics II: Markov Chain Monte Carlo Method and Its Surroundings", Part I "Basics of the Markov Chain Monte Carlo Method" (Yukito Iba)   https://www.iwanami.co.jp/.BOOKS/00/0/0068520.html

Understanding the Markov Chain Monte Carlo Method (MCMC) through Visualization   http://d.hatena.ne.jp/hoxo_m/20140911/p1 ⇒ @hoxo_m had already published a similar animation on his blog... sorry for the rehash...

How to set things up to generate animated GIFs from Python on a Mac   http://qiita.com/kenmatsu4/items/573ca0733b192d919d0e
