This article is the 18th day of the Wacul Advent Calendar.
I have been working on the analysis team at WACUL Co., Ltd. for about a year. Python experience: about three weeks.
As Python practice, I will implement a deep Boltzmann machine from scratch, mainly for the purpose of studying probabilistic deep learning. The theoretical background all comes from the following books.
" Deep Learning (Machine Learning Professional Series) "Takayuki Okatani (Author)
" Deep Learning "(supervised by the Japanese Society for Artificial Intelligence)
In this article, we will first implement a restricted Boltzmann machine as a preparation. Next time, we will build a deep Boltzmann machine by stacking restricted Boltzmann machines.
A probability distribution is simply a "function that maps a set of values to the probability density (or probability mass) with which that set is obtained," so taken as a whole it can be regarded as the "mechanism behind the generation of those values." A model built from this viewpoint is called a ***generative model***. Once you can build a generative model, you can examine the relationships between variables and compute expected values, which widens the range of applications. The Boltzmann machine covered this time is a generative model that computes the probability density using the Boltzmann distribution: the probability density is the normalized value obtained by taking the exponential of the energy function (Φ) multiplied by minus one (multiplied by a constant so that the result is a proper probability in the closed interval [0, 1]). The energy function is a weighted linear sum of the node biases and the links.
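Written out (an added sketch; with node values $x_i \in \{0, 1\}$, a bias $b_i$ for each node, and a weight $w_{ij}$ for each link), the energy function and the Boltzmann distribution look like this:

```math
\Phi(\boldsymbol{x}) = -\sum_i b_i x_i - \sum_{(i,j)} w_{ij} x_i x_j,
\qquad
p(\boldsymbol{x}) = \frac{\exp(-\Phi(\boldsymbol{x}))}{Z},
\qquad
Z = \sum_{\boldsymbol{x}} \exp(-\Phi(\boldsymbol{x}))
```

Here the second sum runs over the links of the model, and $Z$ is the normalizing (partition) constant.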
Visible variables are variables that can be observed directly; hidden variables are variables that cannot. Introducing hidden variables improves the expressive power of the Boltzmann machine. For a model that includes hidden variables, the **distribution over the visible variables alone, obtained by marginalizing out the hidden variables**, is used as the objective function for maximum likelihood estimation.
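In formulas (splitting the nodes $\boldsymbol{x}$ into visible $\boldsymbol{v}$ and hidden $\boldsymbol{h}$), the marginalized distribution and the log-likelihood objective are:

```math
p(\boldsymbol{v}) = \sum_{\boldsymbol{h}} p(\boldsymbol{v}, \boldsymbol{h}),
\qquad
L(\theta) = \sum_{n} \log p(\boldsymbol{v}^{(n)})
```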
The restricted Boltzmann machine is a model that simplifies parameter estimation by introducing hidden variables into a general Boltzmann machine and then imposing certain constraints. The constraints are two: ① there are no links between visible nodes, and ② there are no links between hidden nodes. Because of these two conditions, the restricted Boltzmann machine has two layers, a "visible layer" consisting only of visible nodes and a "hidden layer" consisting only of hidden nodes, and is expressed as a model with links only between the two layers.
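With visible biases $b_i$, hidden biases $c_j$, and links $w_{ij}$ only between the two layers, the energy function above specializes to:

```math
\Phi(\boldsymbol{v}, \boldsymbol{h}) = -\sum_i b_i v_i - \sum_j c_j h_j - \sum_{i,j} w_{ij} v_i h_j
```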
This means that ***each layer is conditionally independent given the other layer***. Thanks to this property, you can repeat the following cycle:

- Draw samples of all hidden-layer nodes from the current expected values of all visible-layer nodes.
- Approximate the expected values of all hidden-layer nodes from those hidden-layer samples.
- Draw samples of all visible-layer nodes from the expected values of all hidden-layer nodes.
- Approximate the expected values of all visible-layer nodes from those visible-layer samples.

This computation is relatively easy, and you get samples for an entire layer at once. The restricted Boltzmann machine can be said to gain this benefit in exchange for some of the model's expressive power.
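Here is a minimal sketch of this block sampling, separate from the RBM class implemented below (the names `W`, `b_v`, `b_h` are hypothetical); each conditional probability is just a sigmoid of bias plus weighted input:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def sample_hidden(v, W, b_h):
    # p(h_j = 1 | v) = sigmoid(b_h[j] + sum_i v[i] * W[i, j]); all hidden nodes in one shot
    p_h = sigmoid(b_h + v @ W)
    return (np.random.rand(len(b_h)) < p_h).astype(int), p_h

def sample_visible(h, W, b_v):
    # p(v_i = 1 | h) = sigmoid(b_v[i] + sum_j W[i, j] * h[j]); all visible nodes in one shot
    p_v = sigmoid(b_v + W @ h)
    return (np.random.rand(len(b_v)) < p_v).astype(int), p_v

# one v -> h -> v cycle with toy parameters
W, b_v, b_h = np.random.rand(3, 4), np.zeros(3), np.zeros(4)
v = np.array([1, 0, 1])
h, p_h = sample_hidden(v, W, b_h)
v_new, p_v = sample_visible(h, W, b_v)
print(h, v_new)
```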
Once the shape of the model is decided, the next step is to learn the parameters. Parameter learning for the restricted Boltzmann machine uses the orthodox gradient (ascent) method, with the aim of maximizing the log-likelihood function. Gradient ascent repeatedly updates the parameters a little at a time in the direction in which the objective function increases; here the objective function is the log-likelihood of the marginalized distribution over the visible variables. To maximize it, the objective function is differentiated with respect to each parameter to obtain the gradient. Computing the gradient requires expected values over the visible and hidden layers; since these are hard to compute directly, they are approximated by actually generating samples with Gibbs sampling and averaging them. At this point, the "conditional independence between layers" of the restricted Boltzmann machine mentioned above makes the computation easy.
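For reference, the log-likelihood gradients take the standard "data average minus model average" form; the model averages are the quantities approximated by Gibbs sampling:

```math
\frac{\partial L}{\partial b_i} = \langle v_i \rangle_{\text{data}} - \langle v_i \rangle_{\text{model}},
\qquad
\frac{\partial L}{\partial c_j} = \langle h_j \rangle_{\text{data}} - \langle h_j \rangle_{\text{model}},
\qquad
\frac{\partial L}{\partial w_{ij}} = \langle v_i h_j \rangle_{\text{data}} - \langle v_i h_j \rangle_{\text{model}}
```

The data averages are taken over the training examples (using $p(h_j = 1 \mid \boldsymbol{v})$ for the hidden nodes), and the model averages are approximated by the Gibbs samples.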
Learning goes through the following process when viewed broadly.
⚫️ Randomly initialize parameters
⚫️ Repeat below----------------------------------------------
① Alternately sample the visible layer and the hidden layer using the current parameter values
② Calculate the expected value of each node in the visible layer and the hidden layer from the samples in ①
③ Using ②, calculate the gradients (derivatives) for the visible-layer biases, the hidden-layer biases, and each link
④ Update the parameters using ③
---------------------------------------------------------
Persistent contrastive divergence is used for the sampling. Since processing efficiency is not a concern here, the convergence check is omitted.
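A rough, self-contained sketch of the "persistent" idea (hypothetical names, not the class below): the Gibbs chain is initialized once and continued across parameter updates, instead of being reset to a training example each time.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def gibbs_step(v, W, b_v, b_h):
    # one block-Gibbs update: sample h given v, then v given h
    h = (np.random.rand(len(b_h)) < sigmoid(b_h + v @ W)).astype(int)
    v_new = (np.random.rand(len(b_v)) < sigmoid(b_v + W @ h)).astype(int)
    return v_new, h

W, b_v, b_h = np.random.rand(3, 4), np.zeros(3), np.zeros(4)
v_chain = np.array([1, 0, 1])                      # initialize the chain once
for update in range(5):
    v_chain, h = gibbs_step(v_chain, W, b_v, b_h)  # continue the same chain every update
    # ... compute gradients from (v_chain, h) and update W, b_v, b_h here ...
    # (ordinary CD-k would instead reset v_chain to a training example at each update)
```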
RBM class implementation
import itertools
from typing import List

import numpy as np

Input = List[int]

def sigmoid(z):
    return 1 / (1 + np.exp(-z))
class RBM:
learningRate = 0.005
sample_num = 100
#Vn is the number of visible nodes, Hn is the number of hidden nodes
def __init__(self , Vn : int , Hn : int):
self.Vn = Vn
self.Hn = Hn
self.bias_v = np.array(np.random.rand(Vn)) #Randomly initialized
self.bias_h = np.array(np.random.rand(Hn)) #Randomly initialized
self.link = np.array(np.random.rand(Vn,Hn)) #Randomly initialized
self.hidden_value = []
    def train(self, data_for_learning , iteration_times):
        for iteration in range(iteration_times):
# samples_vh is v*h sample
samples_h =[]
samples_v = []
samples_vh = np.zeros((self.Vn,self.Hn))
messege_in_sampling = 0
sigmoid_belief = 0
            #① Alternately sample the visible layer and the hidden layer using the current parameter values.
            # The chain state is initialized once here and carried through the sampling loop
            # (copied so that the training data itself is not overwritten).
            current_v = data_for_learning[0].copy()
            current_h = np.zeros(self.Hn)
            for index in range(RBM.sample_num):
##Sampling hidden layers
for j in range(self.Hn) :
messege_in_sampling = self.bias_h[j] + sum(list(map(lambda i: current_v[i] * self.link[i][j] , range(self.Vn))))
sigmoid_belief = 1/(1+ np.exp(-1 * messege_in_sampling))
r = np.random.rand(1)
if sigmoid_belief > r[0] :
current_h[j] = 1
else :
current_h[j] = 0
                samples_h.append(current_h.copy())
##Sampling the visible layer
for i in range(self.Vn):
messege_in_sampling = self.bias_v[i] + sum(list(map(lambda j: current_h[j] * self.link[i][j] , range(self.Hn))))
sigmoid_belief = 1/(1+ np.exp(-1 * messege_in_sampling))
r = np.random.rand(1)
if sigmoid_belief > r[0] :
current_v[i] = 1
else :
current_v[i] = 0
                samples_v.append(current_v.copy())
##Visible layer ✖️ Hidden layer sampling
z = itertools.product(current_v,current_h)
product = []
for element in z:
product.append(element)
current_vh = (np.array(list(map(lambda x : x[0] * x[1] ,product))) ) .reshape(self.Vn,self.Hn)
samples_vh += current_vh
#② Calculate the expected value of each node in the visible layer and hidden layer
E_V = np.sum(np.array(samples_v),axis=0) / RBM.sample_num
E_H = np.sum(np.array(samples_h),axis=0) / RBM.sample_num
E_VH = samples_vh / RBM.sample_num
#③ Calculate the bias of the visible layer, the bias of the hidden layer, and the gradient (derivative) of each link.
##Derivative of visible layer bias
gradient_v = sum(np.array(data_for_learning)) / len(data_for_learning) - E_V
##Derivative of hidden layer bias
gradient_h = []
for j in range(len(self.bias_h)):
gradient_h_1 = []
sigmoid_beliefs_h_1 = []
for n in range(len(data_for_learning)):
messege_h_1 = self.bias_h[j] + sum(list(map(lambda i: data_for_learning[n][i] * self.link[i][j] , range(self.Vn))))
sigmoid_belief_h_1 = 1/(1+ np.exp(-1 * messege_h_1))
sigmoid_beliefs_h_1.append(sigmoid_belief_h_1)
gradient_h_1 = sum(sigmoid_beliefs_h_1) / len(data_for_learning)
gradient_h_2 = E_H[j]
                gradient_h_j = gradient_h_1 - gradient_h_2  # data expectation minus model expectation
gradient_h.append(gradient_h_j)
##Derivative of the link
gradient_vh = []
for i in range(len(self.bias_v)):
for j in range(len(self.bias_h)):
gradient_vh_1 = []
sigmoid_beliefs_vh_1 = []
for n in range(len(data_for_learning)):
messege_vh = self.bias_h[j] + sum(list(map(lambda i: data_for_learning[n][i] * self.link[i][j] , range(self.Vn))))
sigmoid_belief_vh_1 = 1/(1+ np.exp(-1 * messege_vh))
sigmoid_beliefs_vh_1.append(sigmoid_belief_vh_1 * data_for_learning[n][i])
gradient_vh_1 = sum(sigmoid_beliefs_vh_1) / len(data_for_learning)
gradient_vh_2 = E_VH[i][j]
                    gradient_vh_ij = gradient_vh_1 - gradient_vh_2  # data expectation minus model expectation
gradient_vh.append(gradient_vh_ij)
#④ Update parameters using ③
self.bias_v += RBM.learningRate * np.array(gradient_v)
self.bias_h += RBM.learningRate * np.array(gradient_h)
self.link += RBM.learningRate * np.array(gradient_vh).reshape(self.Vn,self.Hn)
def energy(self,input : Input) -> float:
#Visible layer energy
energy_v = sum(list(map(lambda i : self.bias_v[i] * input[i] , range(self.Vn))))
#Hidden layer energy
##Estimated value of hidden layer
hidden_layer = self.getHiddenValue(input)
energy_h = sum(list(map(lambda j : self.bias_h[j] * hidden_layer[j] , range(self.Hn))))
#link energy
accumlator = []
for i in range(self.Vn):
for j in range(self.Hn):
accumlator.append(input[i]*hidden_layer[j] * self.link[i][j])
energy_vh = sum(accumlator)
energy = -1 * (energy_v + energy_h + energy_vh)
return energy
def estimate(self,input:Input) -> float:
#Calculate energy function
energy = self.energy(input)
#Calculate partition function
all_patterns = list(itertools.product(range(0,2), repeat=self.Vn))
partition = sum(list(map(lambda x: np.exp(-1 *self.energy(list(x))) , all_patterns)))
return np.exp(-1* energy ) / partition
def getHiddenValue(self,input:Input) :
hidden_layer = np.zeros(self.Hn)
for j in range(self.Hn) :
messege_in_sampling = self.bias_h[j] + sum(list(map(lambda i: input[i] * self.link[i][j] , range(self.Vn))))
sigmoid_belief = 1/(1+ np.exp(-1 * messege_in_sampling))
r = np.random.rand(1)
if sigmoid_belief > r[0]:
hidden_layer[j] = 1
else :
hidden_layer[j] = 0
self.hidden_value = hidden_layer
return hidden_layer
Execution example
#Build the model
visible_node = 3
hidden_node = 4
rbm = RBM(visible_node, hidden_node)
#Learning data creation
training_data = np.random.randint(0, 2, size=(100, 3))  # 100 random 3-bit training vectors
#Parameter learning
rbm.train(training_data,10)
print(rbm.link)
input = [1,0,1]
#Get the value of the hidden layer
rbm.getHiddenValue(input)
#Energy function output
rbm.energy(input)
#Probability density output
rbm.estimate(input)
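As one extra check that is not in the original script, you can print the probability the model assigns to each of the $2^3$ visible patterns; since `getHiddenValue` samples the hidden layer stochastically inside `energy`, the numbers vary from run to run:

```python
for pattern in itertools.product(range(2), repeat=visible_node):
    print(pattern, rbm.estimate(list(pattern)))
```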
It seems to run, but it is unclear whether the model actually fits well... I will move on for now.
Next time, we will stack these restricted Boltzmann machines to construct a deep Boltzmann machine. The deeper the layers, the harder learning becomes, so the idea is to first obtain good initial parameter values by pre-training with restricted Boltzmann machines, and then use them to fine-tune the whole model.