Last month I read through ["Introduction to Machine Learning by Bayesian Inference"](https://www.amazon.co.jp/dp/4061538322) and implemented it in Python.
The content covered here is Chapter 4, "4.3 Inference in the Poisson Mixture Model". It's the green book with the flower pattern. And this is just the introduction... machine learning runs deep. Lol
This time, we create multimodal one-dimensional data by mixing two Poisson distributions, giving a bimodal distribution.
```python
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline

plt.figure(figsize=(12, 5))
plt.title('Poisson distribution', fontsize=20)

# Create sample data
Lambda_x1 = 30
Lambda_x2 = 50
samples_x1 = 300
samples_x2 = 200
x1 = np.random.poisson(Lambda_x1, samples_x1)
x2 = np.random.poisson(Lambda_x2, samples_x2)

# Combine the data
X = np.concatenate([x1, x2])

plt.hist(X, 30, density=True, label='all data')
plt.legend()
plt.show()
```
The result is as follows.
By sheer coincidence, this histogram doesn't look obviously bimodal at a glance, which makes it a perfect demonstration of how effective Bayesian inference can be!
First, as preparation, we create a function that computes logsumexp.
… And here you might be thinking, "logsumexp doesn't appear in the book, does it?" That's right, it doesn't. Lol
However, this "logsumexp" plays an important role in this algorithm!
This function first shows up in equation (4.38) of the book. Because of the constraint on η in the lower row of that equation, logsumexp is needed.
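As a rough reconstruction based on the code later in this post (the book's exact typesetting may differ), equation (4.38) has the following form, with the unnormalized η in the upper row and the constraint in the lower row:

$$
\eta_{nk} \propto \exp\left\{ x_n \ln \lambda_k - \lambda_k + \ln \pi_k \right\},
\qquad \text{s.t.} \quad \sum_{k=1}^{K} \eta_{nk} = 1
$$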
If you compute η directly from the formula in the upper row, it does not satisfy the constraint in the lower row, so it has to be normalized. Normalization here means making the values sum to 1, i.e. dividing each value by the total. The formula is therefore transformed as shown below; the second step follows from the identities exp(log x) = x and exp(−log x) = 1/x.
The third step is straightforward if you use the law of exponents, exp(a)·exp(b) = exp(a + b).
As a result, a normalization term appears at the end of the third row. Look closely at it: it contains a "log", a "sum [= Σ]", and an "exp"!
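Here is a sketch of that transformation, written out with the same quantities as in the code below (a reconstruction, so the notation may differ slightly from the book):

$$
\begin{aligned}
\eta_{nk} &= \frac{\exp\left\{ x_n \ln \lambda_k - \lambda_k + \ln \pi_k \right\}}{\sum_{k'=1}^{K} \exp\left\{ x_n \ln \lambda_{k'} - \lambda_{k'} + \ln \pi_{k'} \right\}} \\
&= \exp\left\{ x_n \ln \lambda_k - \lambda_k + \ln \pi_k \right\} \cdot \exp\left\{ -\ln \sum_{k'=1}^{K} \exp\left\{ x_n \ln \lambda_{k'} - \lambda_{k'} + \ln \pi_{k'} \right\} \right\} \\
&= \exp\left\{ x_n \ln \lambda_k - \lambda_k + \ln \pi_k - \ln \sum_{k'=1}^{K} \exp\left\{ x_n \ln \lambda_{k'} - \lambda_{k'} + \ln \pi_{k'} \right\} \right\}
\end{aligned}
$$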
And it turns out that when this logsumexp term is computed naively, overflow or underflow can occur...
So, to prevent overflow and underflow, we need the logsumexp function we are about to implement!
It's been a long preamble, but the function itself is simple, so let's implement it now.
```python
def log_sum_exp(X):
    # Subtract the row-wise maximum before exponentiating to avoid overflow/underflow
    max_x = np.max(X, axis=1).reshape(-1, 1)
    return np.log(np.sum(np.exp(X - max_x), axis=1).reshape(-1, 1)) + max_x
```
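As a quick sanity check (my own small addition, not from the book), you can see why subtracting the maximum matters: the naive computation overflows for large log values, while `log_sum_exp` stays finite.

```python
test = np.array([[1000.0, 1001.0]])
print(np.log(np.sum(np.exp(test), axis=1)))  # naive version -> [inf] (overflow warning)
print(log_sum_exp(test))                     # stable version -> [[1001.31326169]]
```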
I won't go into more detail here, so please see the article "Mixed Gaussian distribution and logsumexp" for the full story.
Now we finally get to implement the algorithm!
This time, we implement "Algorithm 4.2 Gibbs Sampling for the Poisson Mixture Model" from ["Introduction to Machine Learning by Bayesian Inference"](https://www.amazon.co.jp/dp/4061538322) using Python and NumPy.
```python
# Lists to store the samples
sample_s = []
sample_lambda = []
sample_pi = []

# Constants (number of iterations, number of data points, number of clusters)
MAXITER = 100
N = len(X)
K = 2

# Initial parameter values (prior hyperparameters)
init_Gam_param_a = 1
init_Gam_param_b = 1
init_Dir_alpha = np.ones(K)

# Initial values of λ and π
Lambda = np.ones(K)
Pi = np.ones(K)

# Normalize π to satisfy its constraint (divide each value by the total)
if np.sum(Pi) != 1:
    Pi = Pi / np.sum(Pi)

# Repeat the sampling MAXITER times
for i in range(MAXITER):
    # Container for s
    s = np.zeros((N, K))

    # Compute η for sampling s
    log_eta = np.dot(X.reshape(N, 1), np.log(Lambda.reshape(1, K))) - Lambda.reshape(1, K) + np.log(Pi.reshape(1, K))

    # Normalize η (use logsumexp to prevent overflow and underflow)
    logsumexp_eta = -1 * log_sum_exp(log_eta)
    eta = np.exp(log_eta + logsumexp_eta)

    # Sample s from the categorical distribution with parameter η
    for n in range(N):
        s[n] = np.random.multinomial(1, eta[n])

    # Add to the sample list
    sample_s.append(np.copy(s))

    # Compute a and b for sampling λ
    Gam_param_a = (np.dot(s.T, X.reshape(N, 1)) + init_Gam_param_a).T[0]
    Gam_param_b = np.sum(s, axis=0).T + init_Gam_param_b

    # Sample λ from the gamma distribution with parameters a and 1/b
    Lambda = np.random.gamma(Gam_param_a, 1/Gam_param_b)

    # Add to the sample list
    sample_lambda.append(np.copy(Lambda))

    # Compute α for sampling π
    Dir_alpha = np.sum(s, axis=0) + init_Dir_alpha

    # Sample π from the Dirichlet distribution with parameter α
    Pi = np.random.dirichlet(Dir_alpha)

    # Add to the sample list
    sample_pi.append(np.copy(Pi))

# Convert the sample lists to ndarrays
# (the clusters come out in no particular order, since the label order is not defined)
sample_s_ndarray = np.array(sample_s)
sample_lambda_ndarray = np.array(sample_lambda)
sample_pi_ndarray = np.array(sample_pi)
```
The implementation basically follows the book, but there are a few points to be careful about.

- Update the parameters (η for s; a and b for λ; α for π) before sampling each variable.
- π needs to be normalized at initialization (just divide each value by the total).
- η is normalized inside np.exp() by combining it with logsumexp.
- The book uses for loops over N and K, but with matrix computation a for loop is only needed when sampling s (see the sketch after these notes).
- The sampled values are appended to their corresponding lists at each iteration.

Also, the initial value of each parameter can be something simple like 1. (Just be careful to normalize π!)
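For reference, here is a rough sketch of what the book's loop-based computation of η looks like, just to show that it matches the vectorized one-liner used above (this is my own illustration, not the book's code; it reuses X, N, K, Lambda, and Pi from the snippet above):

```python
log_eta_loop = np.zeros((N, K))
for n in range(N):
    for k in range(K):
        # Same quantity as log_eta, computed element by element
        log_eta_loop[n, k] = X[n] * np.log(Lambda[k]) - Lambda[k] + np.log(Pi[k])

# Normalize with logsumexp, exactly as in the vectorized version
eta_loop = np.exp(log_eta_loop - log_sum_exp(log_eta_loop))
```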
Let's check the results in order!
… But before that, note that the clusters come out in no particular order. This time the results happened to come out in the same order in which the clusters were defined, but the order you set and the order of the resulting clusters may not match. Rest assured that this is not a problem either way.
Let's start with λ!
```python
# Average value of λ for each cluster
ave_Lambda = list(np.average(sample_lambda_ndarray, axis=0))
print(f'prediction: {ave_Lambda}')
```
The result is as follows.

```
prediction: [29.921538459827033, 49.185569726045905]
```

λ corresponds to the mean of each cluster. When creating the data I set Lambda_x1 = 30 and Lambda_x2 = 50, so the accuracy is pretty good!
Next, let's look at π.
```python
# Proportion of each cluster's samples in the whole data set
ave_Pi = list(np.average(sample_pi_ndarray, axis=0))
all_samples = samples_x1 + samples_x2
print(f'prediction: {ave_Pi}')
```
The result is as follows.

```
prediction: [0.5801320180878531, 0.4198679819121469]
```

For the number of data points I set samples_x1 = 300 and samples_x2 = 200. Dividing each by the total of 500 data points gives [0.6, 0.4], so the estimates are almost spot on!
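For comparison, the true mixing ratio implied by the data-generation settings can be computed directly (a small addition, reusing the all_samples variable defined above):

```python
print([samples_x1 / all_samples, samples_x2 / all_samples])  # -> [0.6, 0.4]
```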
Finally, check s and finish! Thank you for your hard work.
```python
# Number of samples assigned to each cluster
sum_s = np.sum(np.sum(sample_s_ndarray, axis=0), axis=0) / MAXITER
print(f'prediction: {sum_s}')
```
The result is:

```
prediction: [291.18 208.82]
```

Since s is sampled once per iteration, the accumulated counts are MAXITER times the actual number of data points, so we divide the per-cluster totals by MAXITER.
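Equivalently (just another way to write the same computation), you can average the one-hot assignments over the iterations first and then sum over the data points:

```python
print(np.sum(np.mean(sample_s_ndarray, axis=0), axis=0))  # same values as sum_s
```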
["Introduction to Machine Learning by Bayesian Inference"](https://www.amazon.co.jp/ Machine Learning Startup Series-Introduction to Machine Learning by Bayesian Inference-KS Information Science Special Book-Suyama-Atsushi / dp / 4061538322) "Mixed Gaussian distribution and logsumexp"