I'm currently learning about Bayesian statistics. Among them, I will summarize the Markov chain Monte Carlo method. As a starting point, this article gives an overview of Markov chains and the Monte Carlo method.
Here is the book that I referred to this time.
In Bayesian inference, consider the probability model $ P (X, Z) $, where the observed data is $ X $ and the set of unobserved variables such as parameters and latent variables is $ Z $. At this time, it is necessary to calculate the posterior distribution of $ P (Z | X) $ in order to solve specific problems such as learning and prediction. By the way, there is a problem that this posterior distribution cannot be solved analytically, or that solving it numerically requires a large amount of calculation. Therefore, I will try to investigate the characteristics of this desired distribution by obtaining multiple samples from $ P (Z | X) $. As this sample ring method, there is ** Markov chain Monte Carlo method **.
To briefly explain each term,
It will be. Below is a brief summary of each.
This is a method to find an approximate solution by trial using random numbers. An approximate solution with a pi of $ π $ is often used as an example of explanation. Consider a circle with a radius of $ R $ and a square covering it.
At this time, if the area of the circle is $ So $ and the area of the square minus the area of the circle is $ Ss $,
So = πR^2\\
Ss = 4R^2-πR^2\\
Can be expressed as. Now, add these two equations and express $ π $ using $ So $ and $ Ss $.
π = \frac {4So}{So + Ss}
It will be. Now, let's change the radius to $ 1 $. By generating a random number in the interval $ [0,1] $, the probability of becoming $ 1 $ if it is inside the circle and $ 0 $ if it is outside the circle is calculated. Implement in python as an example.
import random
inner =0
for i in range(1000000):
x, y = random.random(), random.random()
if x**2 + y**2 < 1:
inner +=1
print (inner * 4/1000000)
It became $ 3.141072 $. It turns out that the value is close to $ 3.1415 $. I confirmed the Monte Carlo method that approximates with random numbers.
A Markov chain means that the next state is determined only by the previous state. If you know the recurrence formula that you learn in high school mathematics, I think you should think of it. It's hard to imagine an event that is determined from the previous state in an actual example, but here is an example seen in a textbook or web page.
There is no end to raising it.
Now consider an example.
Tie problem: One high school officially offers three types of uniform ties. Vermilion, green and blue. Suppose there are 100 first graders. It is empirically known to wear vermilion, green and blue ties on the first day with a probability of $ 0.6,0.25 and 0.15 respectively. ** Students at this high school decide on the day's tie pattern based solely on the tie pattern they squeezed the day before. ** ** The probability is shown below. Now, what is the probability of wearing each color one day later, or ten days later?
table | On the day: Zhu | On the day: Green | On the day: Blue |
---|---|---|---|
The day before: Zhu | 0.3 | 0.3 | 0.4 |
The day before: green | 0.1 | 0.5 | 0.4 |
The day before: blue | 0.2 | 0.6 | 0.2 |
Let the random variable representing the pattern of the tie on the $ t $ day be $ X ^ {t} $ using the subscript $ t (= 1, ...., T) $. Random variables that change over time are called ** stochastic processes **. The general stochastic process is
P(X^t|X^{t-1},・ ・ ・,X^1)
Like, it is a conditional probability based on the circumstances before that. However, in this case,
P(X^t|X^{t-1},・ ・ ・,X^1) = P(X^t|X^{t-1})
A stochastic process that is affected only by the probability of the previous day is called a Markov chain.
The conditional probabilities that define this Markov chain can be expressed as follows. This is called the ** transition kernel or transition kernel **.
{\bf{P}}=\begin{pmatrix} 0.3 & 0.3 & 0.4 \\
0.1 & 0.5 & 0.4 \\
0.2 & 0.6 & 0.2 \end{pmatrix}
Now, let's find the probability distribution after a few days. When $ p_Zhu ^ {t} = p (X ^ {t} = Zhu) $ is written, the probability distribution of $ X ^ {t} $ is $ \ textbf p ^ {t} = (p_1 ^ {t} , p_2 ^ {t}, p_3 ^ {t}) $, and $ \ textbf p ^ {1} = (0.6,0.25,0.15) $. The probability of wearing a vermilion tie on the second day is $ \ Textbf p (X ^ 2 = Zhu) = 0.3 × 0.6 + 0.1 × 0.25 + 0.2 × 0.15 = 0.235 $ It will be.
It can be calculated as follows by repeating the calculation with the same idea.
You can see that there is no change after the 4th day. The answer I mentioned earlier is, "After the 4th day, it will be stable at vermilion: green: blue = 1/6: 1/2: 1/3."
The probability distribution $ \ bf p $ that does not change at this time is called ** steady distribution or invariant distribution **. The period until this steady distribution is replaced is called the ** burn-in period **.
The purpose of this tie problem was to find the steady state, which is the final wearing state. However, this time, with the posterior distribution clear, I would like to obtain a random number that matches the posterior distribution. In other words, consider designing a transition nucleus so that the posterior distribution becomes a steady distribution. In this way, the method of constructing a Markov chain with the distribution to be sampled as a ** stationary distribution ** is called the ** Markov chain Monte Carlo method ** (commonly known as the MCMC method).
In the MCMC method, the steady-state distribution is known, but the transition nucleus is unknown (opposite to the tie problem).
Here, the concept of ** detailed balance condition ** is used as a condition for the Markov chain to converge to a steady distribution. The following equation refers to the random variables $ θ and θ'$.
f(θ'|θ)f(θ) = f(θ|θ')f(θ')
In the case of the tie problem I mentioned earlier,
p(Vermilion|Green) ×\frac {1}{6} = p(Green|Vermilion) ×\frac {1}{2}
It will be.
Here is an overview of detailed balance conditions. Let the point $ θ $ be near the center of the distribution and the point $ θ'$ be the periphery of the distribution. At this time, what the detailed balance conditions mean is
f(θ'):f(θ) = 1 :If a\\
f(θ|θ'):f(θ'|θ) = a :1
It will be.
The movement of $ θ'$ → $ θ $ is $ a $ more likely than the movement of $ θ $ → $ θ'$. In other words, it means that it is easy to move near the center.
If this detailed balance condition is cracked, it means that the random number sequence will eventually be drawn to the center even if the initial state is taken randomly.
After summarizing the Markov chain and Monte Carlo methods, we have summarized the tie problem as an example of the Markov chain Monte Carlo method. Now that the outline is complete, I would like to deal with the explanation of the algorithm that utilizes this MCMC method next time.
The next article.
Understand the metropolitan hasting method (one of the methods in Markov chain Monte Carlo method) with implementation https://qiita.com/Fumio-eisan/items/6e8bad9977e945ddb2fc
It is also described below as a blog. The role is to summarize in more detail.
https://fumio-eisan.hatenablog.com/
Recommended Posts