Since I started posting qiita this time, I will write about the implementation of the continuous hidden Markov model by python, which I had a hard time implementing a few months ago. As a memorandum, I will write from the basics in the order in which I studied. Also, I think that this result is probably made, but there are some suspicious parts, so if you notice something, I would appreciate it if you could let me know.
A Markov model is a model created based on a Markov process, and a Markov process is a stochastic process with Markov properties. Markov came out a lot and I couldn't understand it easily, so I will explain my own understanding. First, consider a diagram like the graph theory below.
In this figure, it is assumed that after a certain period of time, the state changes stochastically in the direction in which the arrow points, and that probability is the number attached to the arrow. If it is in state A now, it will transition to state B with a probability of 0.3 and to state C with a probability of 0.7. If the A state comes from X or Y, the transition probability from state A to state B or state C does not change. In other words, Markov property is a property in which the transition probability to the next state (B or C) depends only on the current state (A). A stochastic process is written as "a random variable that changes with time.-Wikipedia-". In this case, the probability of taking a state changes with time, so it can be said to be a stochastic process. ~~ I'm sorry, I'll leave the strict definition to other information sites ~~ A model created in this way can be called a Markov model.
The hidden Markov model is a model like the Markov model mentioned earlier, but it is a model used when it has a certain value (output) depending on the state instead of not knowing the state. Let's make a concrete example based on natural language processing. First, consider the example sentence (output) of "I book a film" (I reserve a movie). At this time, "I" and "film" are nouns, "book" is a verb, and "a" is an article. Such part of speech is considered hidden because it is not known from this sentence. Suppose you want the machine to decide the part of speech of this sentence. However, although "I" and "a" are almost uniquely determined, "book" and "film" can also be used as verbs such as taking a book or movie, which is a noun. Therefore, let's consider whether it is possible to classify part of speech well using the hidden Markov model.
In English, the order of part of speech is fixed to some extent, such as 5 sentence patterns. Therefore, the ease of changing part of speech is likely to be as shown in the table below. (Price is set by your own sense)
Beginning of sentence | noun | verb | article | End of sentence | |
---|---|---|---|---|---|
Beginning of sentence | 0 | 0.6 | 0.2 | 0.2 | 0 |
noun | 0 | 0.3 | 0.5 | 0.1 | 0.1 |
verb | 0 | 0.6 | 0 | 0.3 | 0.1 |
article | 0 | 1 | 0 | 0 | 0 |
End of sentence | 0 | 0 | 0 | 0 | 0 |
Think of the left as the current state and the top as the next state. To give an example, the probability that a noun will come next to a noun is 0.3, the probability that a verb will come is 0.5, the probability that an article will come is 0.1, and the probability that the end of a sentence will come is 0.1. ~~ Using 0 for the probability is not good for zero frequency problem ~~ Then, it is assumed that the output is taken with the following probabilities in each state.
In other words, when it is a noun, "I" is 0.4, "a" is 0.0, "book" is 0.3, and "film" is 0.2. The method of state transition at this time is as follows. (The state where the transition probability is 0 is deleted in advance)
Then, the number of words is n, the word string (observable) is $ W = w_1, w_2, ..., w_n $, and the part of speech sequence (unobservable: state) is $ T = t_1, t_2 ,. .., t_n $, the probability that this model will give when deciding one route is
Can be represented by. In other words, the probability of changing from the previous state to the current state
(However, in the example of the sentence used this time, $ w_0 $ is blank, $ w_1 $ is'I', ..., $ w_n $ is blank.) Then, the part of speech of this sentence can be predicted by finding the transition method with the highest probability. By the way, in the case of this example, it can be said that it is biased in an easy-to-understand manner. ..
Since the hidden Markov model mentioned earlier was thinking about four words, the output can be thought of as a four-value output. So the output was a discrete value. Therefore, it can be said to be a discrete hidden Markov model. Therefore, the hidden Markov model when using continuous values such as temperature is called the continuous hidden Markov model. In the discrete Markov model mentioned earlier, the appearance probabilities of each state are calculated individually, but since it is not realistic to assign all values to continuous values, we will define the appearance probabilities using a mixed Gaussian distribution. .. Before we get into the implementation of the Hidden Markov Model, let's first write about the mixed Gaussian distribution.
The mixed Gaussian distribution is a representation of the probability distribution by adding a number of Gaussian distributions.
In the first place, the Gaussian distribution is also called the normal distribution.
It is represented by.
-Image borrowed from wikipedia- As you can see from this graph, the closer this x is to the average $ \ mu $ (0 in this figure), the larger the value. And if the variance is large, it will be flat. In other words, if this value is considered as a probability, the closer the value of x is to the average $ \ mu $, the higher the value will be, and if the variance is large, it will be possible to take that value even if it is far from the average. I can say.
Also, the mixed Gaussian distribution is like adding multiple normal distributions together, and if K Gaussian distributions are combined, the formula for the mixed Gaussian distribution is
However, there is a condition of $ \ sum_ {k = 1} ^ K \ pi_k = 1 $.
It is represented by. If you try to illustrate this
It looks like this.
In the future implementation of the continuous hidden Markov model, it is necessary to identify the Gaussian distribution from the data points. Therefore, I would like to see the flow by implementing soft clustering with a mixed Gaussian distribution.
First, let's plot the points based on the three Gaussian distributions.
This is the plot. These points are in a state where we know from which Gaussian distribution they were generated. (Complete data) However, when actually clustering, I do not know from which distribution it was generated. (Incomplete data) Therefore, from now on, we will adjust the parameters of the mixed Gaussian distribution consisting of three Gaussian distributions so that they can be best divided into three according to the Gaussian distribution.
How can we divide it into three well? It means that the simultaneous probability of all points according to the desired mixed Gaussian distribution should be the highest. In other words
$ X = [x_1, x_2, ..., x_n] $ ・ ・ ・ Each data point $ K $ ・ ・ ・ Number of models (number of clusters)
Adjust the parameters so that is the highest. This is also called the log-likelihood function. I would like to change the parameters so as to maximize this function, but since the sum of the sums appears and it is difficult to solve analytically, I will solve it using the EM algorithm.
As a method of EM algorithm E step: Calculation of expected value M step: Adjust the parameters to maximize the expected value It is a method to find the optimum solution by repeating.
The flow this time is
I will do it in the flow.
The initial initial value is decided appropriately. (Of course, if you know this value somehow, it seems that it will converge faster if you adjust it.)
Next is the E step. The term burden rate comes up here, which is the probability that a point x occurs from model k. Given the plot above, I didn't know which Gaussian distribution model the point came from. Consider the latent variable z indicating whether that was born from any model for that. If $ z_ {i, k} = 1 $, then the point $ x_i $ is a point born from model k.
When this is made into an expression
Also,
Because
Is. (Bayes' theorem)
Also, since $ p (x_i) $ is a value marginalized with respect to z, it is expressed by the formula of mixed Gaussian distribution.
It will be.
here,
I haven't caught up with why $ p (z_ {i, k} = 1) = \ pi_k $, but this is a value that has nothing to do with x, and how much it belongs to model k. It is a feeling that it may certainly be the case if it is an amount to compare whether it is easy or not.)
The reason why this is called the burden rate is that it shows how much the model k occupies the proportion of the mixed Gaussian distribution. If you look at the figure two years ago, you can imagine it somehow.
Next is the parameter update. As for each policy, for $ \ mu and \ sigma $, the likelihood function should be the highest, so we will proceed with the policy of differentiating and searching for the point where the value becomes 0. Also, since $ \ pi $ has the condition of $ \ sum_ {k = 1} ^ K \ pi_k = 1 $, it will be solved by Lagrange's undetermined multiplier method.
The derivation of this derivative is EM algorithm thorough explanation, so other people explain it in a very easy-to-understand manner, so I will summarize only the results. ~~ I personally derived it, but it was very difficult ~~
result
The result is.
Since the derivation was organized, I actually implemented clustering of a mixed Gaussian distribution.
ex_gmm.py
import numpy as np
import matplotlib.pyplot as plt
#difine
data_num = 100
model_num = 3
dim = 2
color = ['#ff0000','#00ff00','#0000ff']
epsilon = 1.0e-3
check = False # if not check == False
#gaussian calculator
def cal_gauss(x,mu,sigma):
return np.exp(-np.dot((x-mu),np.dot(np.linalg.inv(sigma),(x-mu).T))/2)/np.sqrt(2*np.pi*abs(np.linalg.det(sigma)))
#init create param
mu =[]
sigma = []
for i in range(model_num):
mu.append([np.random.rand()*10 ,np.random.rand()*10])
temp = np.random.rand()-np.random.rand()
sigma.append([[np.random.rand()+1,temp],[temp,np.random.rand()+1]])
#create data
values = []
for i in range(model_num):
values.append(np.random.multivariate_normal(mu[i],sigma[i],data_num))
for i in range(model_num):
plt.scatter(values[i][:,0],values[i][:,1],c=color[i],marker='+')
plt.show()
#create training data
train_values =values[0]
for i in range(model_num-1):
train_values = np.concatenate([train_values,values[i+1]])
plt.scatter(train_values[:,0],train_values[:,1],marker='+')
#init model param
model_mu =[]
model_sigma =[]
for i in range(model_num):
model_mu.append([np.random.rand()*10 ,np.random.rand()*10])
temp = np.random.rand()-np.random.rand()
model_sigma.append([[np.random.rand()+1,temp],[temp,np.random.rand()+1]])
model_mu = np.array(model_mu)
model_sigma = np.array(model_sigma)
for i in range(model_num):
plt.scatter(model_mu[i,0],model_mu[i,1],c=color[i],marker='o') #plot pre train mu
temp = []
for i in range(model_num):
temp.append(np.random.rand())
model_pi = np.zeros_like(temp)
for i in range(model_num):
model_pi[i] = temp[i]/np.sum(temp)
plt.show()
#training
old_L = 1
new_L = 0
gamma = np.zeros((data_num*model_num,model_num))
while(epsilon<abs(new_L-old_L)):
#gamma calculation
for i in range(data_num*model_num):
result_gauss = []
for j in range(model_num):
result_gauss.append(cal_gauss(train_values[i,:],model_mu[j],model_sigma[j]))
result_gauss = np.array(result_gauss)
old_L = np.sum(model_pi*result_gauss) #old_L calculation
for j in range(model_num):
gamma[i,j] = (model_pi[j]*result_gauss[j])/np.sum(model_pi*result_gauss,axis=0)
N_k =np.sum(gamma,axis=0)
#calculation sigma
for i in range(model_num):
temp = np.zeros((dim,dim))
for j in range(data_num*model_num):
temp+=gamma[j,i]*np.dot((train_values[j,:]-model_mu[i]).reshape(-1,1),(train_values[j,:]-model_mu[i]).reshape(1,-1))
model_sigma[i] = temp/N_k[i]
#calculation mu
for i in range(model_num):
model_mu[i] = np.sum(gamma[:,i].reshape(-1,1)*train_values,axis=0)/N_k[i]
#calculation pi
for i in range(model_num):
model_pi[i]=N_k[i]/(data_num*model_num)
#check plot
if(check == True ):
gamma_crasta = np.argmax(gamma,axis=1)
for i in range(data_num*model_num):
color_temp = color[gamma_crasta[i]]
plt.scatter(train_values[i,0],train_values[i,1],c=color_temp,marker='+')
for i in range(model_num):
plt.scatter(model_mu[i,0],model_mu[i,1],c=color[i],marker='o')
plt.show()
##new_Lcalculation
for i in range(data_num*model_num):
result_gauss = []
for j in range(model_num):
result_gauss.append(cal_gauss(train_values[i,:],model_mu[j],sigma[j]))
result_gauss = np.array(result_gauss)
new_L = np.sum(model_pi*result_gauss)
gamma_crasta = np.argmax(gamma,axis=1)
for i in range(data_num*model_num):
color_temp = color[gamma_crasta[i]]
plt.scatter(train_values[i,0],train_values[i,1],c=color_temp,marker='+')
for i in range(model_num):
plt.scatter(model_mu[i,0],model_mu[i,1],c=color[i],marker='o')
plt.show()
It should work like this. In this program, all the parameters are randomly output, so it may not be possible to cluster well. ~~ Actually, when such data comes in, clustering is not possible ... ~~
It's been a detour, but I'd like to see the implementation of the continuous Markov model from now on. When the observation information y = {$ y_1, y_2, ..., y_t $}, which is a continuous Markov model, is obtained, the probability $ p (y | M) $ that this model recalls this value is determined by the trellis algorithm. can do. The trellis algorithm $ p (y | M) = \ sum_ {i} \ alpha (i, T) $ (however, this i is the final state)
Is required at. And this $ \ alpha (i, t) $ is
\begin{eqnarray}
\alpha_{i,t}=\left\{ \begin{array}{ll}
\ \pi_i & (t=0) \\
\ \sum_j \alpha(j,t-1)a_{j,i}・ B_{i}(y_t) & (|l-k|=1) \\
\end{array} \right.
\end{eqnarray}
is. And $ a_ {j, i} $ and $ b_ {j, i} (y_t) $ that appear here are a little while ago, but they show the transition probability and the appearance probability that also appeared in the hidden Markov model. I am. (This story had nothing to do with learning ...) This $ \ alpha (i, t) $ traces the probability from the front (forward function), but defines a function that traces this from the end (backward function).
\begin{eqnarray}
\beta_{i,T}=\left\{ \begin{array}{ll}
\ 1 & (Final state) \\
\ 0 & (Not in final state) \\
\end{array} \right.
\end{eqnarray}
\beta(i,t)=\sum_{j}a_{i,j}・ B_{i}(y_t)・\beta(j,t+1) \qquad (t=T,T-1,...,1)
The Baum-Welch algorithm, which is a type of EM algorithm, can be used to train the model parameters of the hidden Markov model. The Baum-Welch algorithm is used to calculate in E steps. $ \ Gamma (i, j, t) = \ frac {\ alpha (i, t-1) ・ a_ {i, j} ・ b_ {j} ・ \ beta (j, t)} {p (y | M )} $
This value. This value is considered to be the probability of transitioning from the state i to j at time t and outputting $ y_t $ divided by the probability of y occurring in this model.
And the parameter estimation (update) formula is
Next, Because b has a Gaussian distribution
Is required at.
Then it is a code. Implementation file of Gaussian distribution because it was adjusted sequentially. It is divided into three parts: a hidden Markov model file and an executable file. (At first, I was planning to make it with GMM-HMM, so there are many parts that are useless to implement. Please wait for a while as I will update it again as soon as GMM-HMM works.)
class_b.py
import numpy as np
import matplotlib.pyplot as plt
import math
import sys
#construct
#c = mixing coefficient
#mu = mean
#sigma = dispersion
#m = model num
#size = datasize
#x = data
##method
#gaussian(x) : calculation gaussian:x=data
#updata(x) : update gmm :x=data
#output(x) : gmm output:x=data (size=1)
class function_b:
upsilon = 1.0e-50
def __init__(self,c,mu,sigma,m,size,x):
self.c = np.array(c)
self.c = self.c.reshape((-1,1))
self.mu = np.array(mu)
self.mu = self.mu.reshape((-1,1))
self.sigma = np.array(sigma)
self.sigma = self.sigma.reshape((-1,1))
self.size = size
self.model_num = m
self.x = np.array(x).reshape(1,-1)
self.ganma = np.zeros((self.model_num,self.size))
self.sum_gaus = 0
self.N_k=np.zeros(self.model_num)
def gaussian(self,x):
return np.exp(-((x-self.mu)**2)/(2*self.sigma))/np.sqrt(2*math.pi*self.sigma)
def update(self,pre_ganma):
#x=[x1~xt]
self.ganma = np.zeros((self.model_num,self.size))
#calculate ganma
for m in range(self.model_num):
self.ganma[m] = pre_ganma*((self.c[m]*self.gaussian(self.x)[m]+function_b.upsilon)/np.sum(self.c*self.gaussian(self.x)+function_b.upsilon,axis=0))+function_b.upsilon
for m in range(self.model_num):
self.N_k[m] = np.sum(self.ganma[m])
#calculate c
for m in range(self.model_num):
self.c[m]=self.N_k[m]/np.sum(self.N_k)
#calculate sigma
for m in range(self.model_num):
self.sigma[m]=np.sum(self.ganma[m]*((self.x-self.mu[m])**2))/self.N_k[m]
#calculate mu
for m in range(self.model_num):
self.mu[m]=np.sum(self.ganma[m]*self.x)/self.N_k[m]
def output(self,x):
return np.sum(self.c.reshape((1,-1))*self.gaussian(x).reshape((1,-1)))
class_hmm_gmm.py
import numpy as np
import sys
import matplotlib.pyplot as plt
from class_b import function_b
import copy
#construct
#s = state num
#x = data
#pi = init prob
#a = transition prob
#c = gmm mixing coefficient
#mu = gmm mean
#sigma = gmm dispersion
#m = gmm model num
#method
#cal_alpha() : calculation alpha
#cal_beta() : calculation beta
#cal_exp() : calculation expected value(?)
#update() : update prams
#makestatelist : make statelist
#viterbi : viterbi algorithm
class Hmm_Gmm():
upsilon=1.0e-300
def __init__(self,s,x,pi,a,c,mu,sigma,m):
self.s = s
self.x = x
self.a = np.array(a)
self.size=len(x)
self.b=[]
for i in range(s):
self.b.append(function_b(c[i],mu[i],sigma[i],m,self.size,x))
self.pi = np.array(pi)
self.alpha = np.zeros((self.s,self.size))
self.beta = np.zeros((self.s,self.size))
self.prob = None
self.kusai = np.zeros((self.s,self.s,self.size-1))
self.ganma = np.zeros((self.s,self.size))
self.preganma = None
self.delta = None
self.phi = None
self.F = None
self.P = None
self.max_statelist=None
def cal_alpha(self):
for t in range(self.size):
for j in range(self.s):
if t==0:
self.alpha[j,t]=self.pi[j]*self.b[j].output(self.x[t])
else:
self.alpha[j,t]=0
for i in range(self.s):
self.alpha[j,t]+=self.alpha[i,t-1]*self.a[i,j]*self.b[j].output(self.x[t])
self.prob =0
for i in range(self.s):
self.prob+=self.alpha[i,self.size-1]+Hmm_Gmm.upsilon
def cal_beta(self):
for t in reversed(range(self.size)):
for i in range(self.s):
if t ==self.size-1:
self.beta[i,t]=1
else:
self.beta[i,t]=0
for j in range(self.s):
self.beta[i,t]+=self.beta[j,t+1]*self.a[i,j]*self.b[j].output(self.x[t+1])
def cal_exp(self):
for t in range(self.size-1):
for i in range(self.s):
for j in range(self.s):
self.kusai[i,j,t] = ((self.alpha[i,t]*self.a[i,j]*self.b[j].output(self.x[t+1])*self.beta[j,t+1])+(Hmm_Gmm.upsilon/(self.s)))/self.prob
for t in range(self.size):
for i in range(self.s):
#self.ganma[i,t] = (self.alpha[i,t]*self.beta[i,t])/self.prob
#self.ganma[i,t] = (self.alpha[i,t]*self.beta[i,t])/self.prob + Hmm_Gmm.uplilon
self.ganma[i,t] = ((self.alpha[i,t]*self.beta[i,t])+(Hmm_Gmm.upsilon))/self.prob
def update(self):
#cal exp
self.cal_alpha()
self.cal_beta()
self.cal_exp()
#cal pi
self.pi = self.ganma[:,0]
#cal a
for i in range(self.s):
for j in range(self.s):
#self.a[i,j]=np.sum(self.ganma[i,:])
self.a[i,j]=np.sum(self.kusai[i,j,:])/np.sum(self.ganma[i,:])
#self.a[i,j]=np.sum(self.kusai[i,j,:])
#cal b
#cal preganma
self.preganma = np.zeros((self.s,self.size))
for t in range(self.size):
temp = self.alpha[:,t]*self.beta[:,t]+self.upsilon
self.preganma[:,t] = temp/np.sum(temp)
for i in range(self.s):
self.b[i].update(np.array(self.preganma[i]))
def makestatelist(self,f,size):
statelist = np.zeros(size)
for t in reversed(range(size)):
if t == size-1:
statelist[t] = f
else:
statelist[t] = self.phi[statelist[t+1],t+1]
return statelist
def viterbi(self,x):
data=np.array(x)
self.delta = np.zeros((self.s,data.shape[0]))
self.phi = np.zeros((self.s,data.shape[0]))
for t in range(data.shape[0]):
for i in range(self.s):
if t == 0:
self.phi[i,t] = np.nan
self.delta[i,t] = np.log(self.pi[i])
else:
self.phi[i,t] = np.argmax(self.delta[:,t-1]+np.log(self.a[:,i]))
self.delta[i,t] = np.max(self.delta[:,t-1]+np.log(self.a[:,i]*self.b[i].output(data[t])+self.upsilon))
self.F = np.argmax(self.delta[:,data.shape[0]-1])
self.P = np.max(self.delta[:,data.shape[0]-1])
#print(self.makestatelist(self.F,data.shape[0]))
self.max_statelist=self.makestatelist(self.F,data.shape[0])
ex_hmm_demo.py
import numpy as np
import matplotlib.pyplot as plt
import sys
from class_hmm_gmm import Hmm_Gmm
model_num = 1
state_num = 4
color=['#ff0000','#00ff00','#0000ff','#888888']
value_size = int(10*model_num*state_num)
create_mu = []
create_sigma= []
create_pi =[]
for i in range(state_num):
temp = []
temp1 = []
temp2=[]
for j in range(model_num):
temp.extend([(np.random.rand()*30-np.random.rand()*30)])
temp1.extend([np.random.rand()+1])
temp2.extend([np.random.rand()])
create_mu.append(temp)
create_sigma.append(temp1)
create_pi.append((temp2/np.sum(temp2)).tolist())
#create data
values = []
for i in range(state_num):
temp = []
for it in range(int(value_size/state_num/model_num)):
for j in range(model_num):
temp.extend(np.random.normal(create_mu[i][j],create_sigma[i][j],1).tolist())
values.append(temp)
x_plot = np.linspace(0,value_size,value_size)
for i in range(state_num):
plt.scatter(x_plot[i*len(values[0]):i*len(values[0])+len(values[0])],values[i],c=color[i],marker='+')
for i in range(state_num):
for j in range(model_num):
plt.plot(x_plot[i*len(values[0]):i*len(values[0])+len(values[0])],np.full(len(values[0]),create_mu[i][j]),c=color[i])
plt.show()
train_values=[]
for i in range(state_num):
train_values.extend(values[i])
train_mu = []
train_sigma= []
train_pi =[]
for i in range(state_num):
temp = []
temp1 = []
temp2=[]
for j in range(model_num):
temp.extend([np.random.rand()*30-np.random.rand()*30])
temp1.extend([np.random.rand()+1])
temp2.extend([np.random.rand()])
train_mu.append(temp)
train_sigma.append(temp1)
train_pi.append((temp2/np.sum(temp2)).tolist())
##backup train
b_train_mu = train_mu
b_train_sigma = train_sigma
plt.scatter(x_plot,train_values,c="#000000",marker='+')
for i in range(state_num):
for j in range(model_num):
plt.plot(x_plot[i*len(values[0]):i*len(values[0])+len(values[0])],np.full(len(values[0]),train_mu[i][j]),c=color[i])
plt.show()
#init a
a = []
for j in range(state_num):
temp = []
for i in range(state_num):
temp.append(np.random.rand())
a.append(temp)
for i in range(state_num):
for j in range(state_num):
#a[j][i] = a[j][i]/np.sum(np.array(a)[:,i])
a = (np.array(a)/np.sum(np.array(a),axis=0)).tolist()
#init init_prob
init_prob = []
for j in range(state_num):
temp2.extend([np.random.rand()])
#temp2[0]+=1
init_prob.extend((temp2/np.sum(temp2)).tolist())
#init Hmm_Gmm model
hmm_gmm = Hmm_Gmm(state_num,train_values,init_prob,a,train_pi,train_mu,train_sigma,model_num)
#train roop
print(create_mu)
print(a)
count = 0
while(count != 100):
hmm_gmm.update()
count +=1
#viterbi
hmm_gmm.viterbi(train_values)
print(hmm_gmm.max_statelist)
#*************************************
plt.scatter(x_plot,train_values,c="#000000",marker='+')
##plot of init train mu
for i in range(state_num):
for j in range(model_num):
plt.plot(x_plot[i*len(values[0]):i*len(values[0])+len(values[0])],np.full(len(values[0]),b_train_mu[i][j]),c="#dddddd")
for i in range(state_num):
for j in range(model_num):
plt.plot(x_plot[i*len(values[0]):i*len(values[0])+len(values[0])],np.full(len(values[0]),hmm_gmm.b[i].mu[j]),c=color[i])
plt.show()
result The points output from the initial $ \ mu $ value and the Gaussian distribution. This time, four states are assumed, and points are generated from each of the four Gaussian distributions. From the beginning, each state is called state 1 (red), state 2 (green), state 3 (blue), and state 4 (gray).
The average value $ \ mu $ was reset with a random number. We will continue to learn about this. This is the result of updating 100 times.
It is a learning result. The states are in the order of state 2, state 1, state 4, and state 3 from the beginning, but as a result, I feel that it is working well.
This time I tried to summarize the hidden Markov model in my own way, but the amount is larger than I expected, and there are many places where I forget how to think even though I implemented it a few months ago, so it is not possible to output early. I really felt it was important. Because of that influence, I can't explain the bitterbi algorithm and the function of the state list. I will summarize it again at a later date. ~~ (There is also a reason why I couldn't enter because of my lack of writing ability ...) ~~ Also, the Gaussian distribution may not fit properly in this program, so I would like to think a little more about whether it is due to my program or whether it has fallen into a locally optimal solution. I also tried to implement GMM-HMM (changing model_num makes it GMM), but it didn't work well, and where did it go wrong again (or did it fall into the local optimal solution) again? I would like to review it. I'm sorry it wasn't completed properly. Also, I found a paper that seems to describe a method for solving a problem that falls into a locally optimal solution, so I will study this as well.
https://mieruca-ai.com/ai/markov_model_hmm/ [Technical explanation] Markov model and hidden Markov model https://qiita.com/kenmatsu4/items/59ea3e5dfa3d4c161efb Thorough explanation of EM algorithm
Recommended Posts