Implemented the EM algorithm for a Gaussian mixture model from Chapter 9 of PRML. Link to Jupyter notebook
Result
Clustered the Old Faithful geyser data with K-means and with the EM algorithm for a Gaussian mixture model.
Consideration
The EM algorithm is not limited to Gaussian mixtures; by considering other latent-variable models it can be applied to a variety of problems.
EM algorithm for Gaussian mixtures
Mixtures of Gaussians
The Gaussian mixture distribution is a superposition of K Gaussian distributions:

$p(\mathbf{x}) = \Sigma_{k=1}^{K} \pi_{k} \mathcal{N}(\mathbf{x}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})$

Each Gaussian $\mathcal{N}(\mathbf{x}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})$ is called a __mixture component__ and has its own mean $\mathbf{\mu_{\rm{k}}}$ and covariance $\mathbf{\Sigma_{\rm{k}}}$ as parameters.
The parameter $\pi_{k}$ is called the __mixing coefficient__ and satisfies $\Sigma_{k=1}^{K}\pi_{k}=1$ and $0 \leqslant \pi_{k} \leqslant 1$.
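As a minimal sketch of this definition (assuming SciPy's `multivariate_normal` and arbitrary example parameters; this is not the code from the linked notebook), the mixture density is just a weighted sum of component densities:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gmm_density(x, pis, mus, Sigmas):
    """Evaluate p(x) = sum_k pi_k * N(x | mu_k, Sigma_k)."""
    return sum(pi * multivariate_normal.pdf(x, mean=mu, cov=Sigma)
               for pi, mu, Sigma in zip(pis, mus, Sigmas))

# Arbitrary example parameters: K = 2 components in 2 dimensions.
pis = [0.4, 0.6]                              # mixing coefficients, sum to 1
mus = [np.zeros(2), np.array([3.0, 3.0])]     # component means
Sigmas = [np.eye(2), 0.5 * np.eye(2)]         # component covariances
print(gmm_density(np.array([1.0, 1.0]), pis, mus, Sigmas))
```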
Introducing latent variables
Introduce a K-dimensional binary random variable $\mathbf{z}$. It indicates which mixture component generated $\mathbf{x}$, using the 1-of-K coding scheme (exactly one $z_{k}$ is 1 and the rest are 0). The joint distribution of $\mathbf{x}$ and $\mathbf{z}$ is

$p(\mathbf{x}, \mathbf{z})=p(\mathbf{z})p(\mathbf{x}|\mathbf{z})$

$p(\mathbf{z})$ is determined by the mixing coefficients $\pi_{k}$:

$p(z_{k}=1)=\pi_{k}$

Because of the 1-of-K coding, this can also be written as

$p(\mathbf{z})=\Pi_{k=1}^{K}\pi_{k}^{z_k}$

The conditional distribution of $\mathbf{x}$ given a value of $\mathbf{z}$ is the Gaussian

$p(\mathbf{x}|z_{k}=1)=\mathcal{N}(\mathbf{x}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})$

which can also be written in the form

$p(\mathbf{x}|\mathbf{z})=\Pi_{k=1}^{K} \mathcal{N}(\mathbf{x}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})^{z_{k}}$

Marginalizing $p(\mathbf{x}, \mathbf{z})$ over $\mathbf{z}$ recovers the earlier $p(\mathbf{x})$:

$p(\mathbf{x})=\Sigma_{\mathbf{z}}p(\mathbf{z})p(\mathbf{x}|\mathbf{z})=\Sigma_{k=1}^{K} \pi_{k} \mathcal{N}(\mathbf{x}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})$

Introducing $\mathbf{z}$ may seem pointless, but it is needed to derive the EM algorithm.
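One way to see the role of $\mathbf{z}$ is ancestral sampling: first draw $\mathbf{z}$ from $p(\mathbf{z})$, then draw $\mathbf{x}$ from the selected component. A minimal sketch, assuming NumPy and the same arbitrary example parameters as above (not taken from the linked notebook):

```python
import numpy as np

rng = np.random.default_rng(0)
pis = [0.4, 0.6]                              # p(z_k = 1), arbitrary example values
mus = [np.zeros(2), np.array([3.0, 3.0])]
Sigmas = [np.eye(2), 0.5 * np.eye(2)]

def sample_gmm(n):
    """Ancestral sampling: draw z ~ p(z), then x ~ p(x|z)."""
    ks = rng.choice(len(pis), size=n, p=pis)  # component index, i.e. which z_k equals 1
    X = np.array([rng.multivariate_normal(mus[k], Sigmas[k]) for k in ks])
    return X, ks

X, ks = sample_gmm(500)                       # 500 samples and their latent assignments
```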
Responsibility
The conditional probability of $\mathbf{z}$ given $\mathbf{x}$ is denoted $\gamma(z_{k})$.
\begin{align}
\gamma(z_{k})\equiv p(z_{k}=1|\mathbf{x})&=\frac{p(z_{k}=1)p(\mathbf{x}|z_{k}=1)}{\Sigma_{j=1}^{K}p(z_{j}=1)p(\mathbf{x}|z_{j}=1)}\\
&=\frac{\pi_{k}\mathcal{N}(\mathbf{x}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})}{\Sigma_{j=1}^{K}\pi_{j}\mathcal{N}(\mathbf{x}|\mathbf{\mu_{\rm{j}}}, \mathbf{\Sigma_{\rm{j}}})}
\end{align}
Here $\pi_{k}$ is regarded as the __prior probability__ of the event $z_{k}=1$, and $\gamma(z_{k})$ as the corresponding __posterior probability__ once $\mathbf{x}$ has been observed. $\gamma(z_{k})$ can also be interpreted as the responsibility, i.e. the degree to which mixture component k "explains" the observation $\mathbf{x}$.
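As a small sketch (assuming NumPy/SciPy; not taken from the linked notebook), the responsibilities for all data points can be computed in vectorized form directly from this formula:

```python
import numpy as np
from scipy.stats import multivariate_normal

def responsibilities(X, pis, mus, Sigmas):
    """gamma[n, k] = pi_k N(x_n|mu_k, Sigma_k) / sum_j pi_j N(x_n|mu_j, Sigma_j)."""
    # Weighted component densities, one column per mixture component.
    weighted = np.column_stack([pi * multivariate_normal.pdf(X, mean=mu, cov=Sigma)
                                for pi, mu, Sigma in zip(pis, mus, Sigmas)])
    # Normalize each row so the responsibilities for a data point sum to 1.
    return weighted / weighted.sum(axis=1, keepdims=True)
```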
EM algorithm for Gaussian mixtures
1. Initialize the means $\mathbf{\mu_{\rm{k}}}$, the covariances $\mathbf{\Sigma_{\rm{k}}}$, and the mixing coefficients $\pi_{k}$, and evaluate the initial value of the log likelihood.
2. $\mathbf{E}$ step: Evaluate the responsibilities using the current parameter values: $\gamma(z_{nk})=\frac{\pi_{k}\mathcal{N}(\mathbf{x_{\rm{n}}}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})}{\Sigma_{j=1}^{K}\pi_{j}\mathcal{N}(\mathbf{x_{\rm{n}}}|\mathbf{\mu_{\rm{j}}}, \mathbf{\Sigma_{\rm{j}}})}$, where $\mathbf{x_{\rm{n}}}$ is the n-th data point and $z_{nk}$ is the latent variable for the n-th data point.
3. $\mathbf{M}$ step: Re-estimate the parameters with the following equations using the current responsibilities.
\begin{align}
\mathbf{\mu_{\rm{k}}^{\rm{new}}}&=\frac{1}{N_{k}}\Sigma_{n=1}^{N}\gamma(z_{nk})\mathbf{x_{\rm{n}}}\\
\Sigma_{k}^{new}&=\frac{1}{N_{k}}\Sigma_{n=1}^{N}\gamma(z_{nk})(\mathbf{x_{\rm{n}}}-\mathbf{\mu_{\rm{k}}^{\rm{new}}})(\mathbf{x_{\rm{n}}}-\mathbf{\mu_{\rm{k}}^{\rm{new}}})^{T}\\
\pi_{k}^{new}&=\frac{N_{k}}{N}
\end{align}
where $N_{k}=\Sigma_{n=1}^{N}\gamma(z_{nk})$.
4. Evaluate the log likelihood $\ln p(\mathbf{X}|\mathbf{\mu}, \mathbf{\Sigma}, \mathbf{\pi})=\Sigma_{n=1}^{N}\ln\\{\Sigma_{k=1}^{K}\pi_{k}\mathcal{N}(\mathbf{x_{\rm{n}}}|\mathbf{\mu_{\rm{k}}}, \mathbf{\Sigma_{\rm{k}}})\\}$ and check for convergence by observing the change in the parameter values or in the log likelihood; if the convergence criterion is not satisfied, return to step 2 (a minimal code sketch of the whole loop follows these steps).
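Putting the four steps together, here is a minimal NumPy/SciPy sketch of the whole loop. The initialization scheme (random data points as means, identity covariances, uniform mixing coefficients), the tolerance, and the helper names are assumptions for illustration, not the exact code of the linked notebook:

```python
import numpy as np
from scipy.stats import multivariate_normal

def _weighted_densities(X, pis, mus, Sigmas):
    """Column k holds pi_k * N(x_n | mu_k, Sigma_k) for every data point x_n."""
    return np.column_stack([pi * multivariate_normal.pdf(X, mean=mu, cov=Sigma)
                            for pi, mu, Sigma in zip(pis, mus, Sigmas)])

def em_gmm(X, K, max_iter=100, tol=1e-6, seed=0):
    """EM for a Gaussian mixture; returns mixing coefficients, means, covariances, responsibilities."""
    N, D = X.shape
    rng = np.random.default_rng(seed)
    # 1. Initialization: random data points as means, identity covariances, uniform weights.
    mus = X[rng.choice(N, size=K, replace=False)]
    Sigmas = np.array([np.eye(D) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    prev_ll = -np.inf

    for _ in range(max_iter):
        # 2. E step: responsibilities gamma[n, k] under the current parameters.
        weighted = _weighted_densities(X, pis, mus, Sigmas)
        gamma = weighted / weighted.sum(axis=1, keepdims=True)

        # 3. M step: re-estimate the parameters from the responsibilities.
        Nk = gamma.sum(axis=0)                # effective number of points per component
        mus = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - mus[k]
            Sigmas[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k]
        pis = Nk / N

        # 4. Log likelihood of the updated parameters; stop when it no longer improves.
        ll = np.log(_weighted_densities(X, pis, mus, Sigmas).sum(axis=1)).sum()
        if ll - prev_ll < tol:
            break
        prev_ll = ll

    return pis, mus, Sigmas, gamma
```

Assuming the Old Faithful data is already loaded as an `(N, 2)` NumPy array `X`, one would call `pis, mus, Sigmas, gamma = em_gmm(X, K=2)` and read off hard cluster assignments as `gamma.argmax(axis=1)`.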
Reference
C. M. Bishop, Pattern Recognition and Machine Learning, Chapter 9 (Japanese edition translated by Hiroshi Motoda, Takio Kurita, Tomoyuki Higuchi, Yuji Matsumoto, and Noboru Murata, Maruzen Publishing).