(Machine learning) I tried to understand Bayesian linear regression carefully with implementation.

Introduction

I am studying Bayesian theory. This time, I will summarize linear regression based on Bayesian theory.

What is Bayesian linear regression?

The linear regression you first come across while learning machine learning is different from the Bayesian linear regression we deal with this time. The outline is shown below.

(figure: outline comparing the two approaches to linear regression)

(Frequency theory) Linear regression

To distinguish linear regression based on the least squares method from Bayesian linear regression, I will call it frequentist linear regression here. (If this naming is not strictly correct, I would appreciate a comment.)

This method can be computed very easily, even in Excel, and I use it often when compiling data at work. The idea behind it is very simple:


E(w) = \frac {1}{2}\sum_{n=1}^{N}(y(x_n,w)-t_n)^2

where $y(x_n, w)$ is the model's prediction for input $x_n$ with polynomial coefficients $w$, and $t_n$ is the observed target value.

All you have to do is find the polynomial coefficients $w$ that minimize this error function $E(w)$.
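As a quick illustration (my own example, not from the article), NumPy's polyfit solves exactly this least-squares problem:

import numpy as np

# Illustrative data: noisy samples of sin(2*pi*x)
x = np.linspace(0, 1, 10)
t = np.sin(2 * np.pi * x) + np.random.normal(0, 0.1, 10)

w = np.polyfit(x, t, deg=3)            # coefficients minimizing E(w)
pred = np.polyval(w, x)                # y(x, w)
print(w, np.sum((pred - t) ** 2) / 2)  # E(w) at the minimum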

Bayesian linear regression

On the other hand, consider Bayesian linear regression. For the observed data $X, Y$, consider the following probability.


p(w|X,Y) = \frac{p(w)\,p(Y|X,w)}{p(Y|X)}

where $p(w)$ is the prior distribution of the weights, $p(Y|X,w)$ is the likelihood of the observed data, and $p(w|X,Y)$ is the posterior distribution of the weights.

The point of the Bayesian way of thinking is that everything is organized in terms of probability (that is, distributions). **Rather than a single optimal solution, we try to capture the answer as a distribution that contains it.**

Another point is the idea called Bayesian inference: starting from a distribution assumed in advance (the prior probability), we use the data to obtain an updated probability (the posterior probability). When I first heard this, it felt refreshingly new.

In this case, $p(w|X,Y)$ is the probability distribution of the weight parameters $w$ given $X, Y$, and it is what we want to find. It is computed using $p(Y|X,w)$, the distribution of $Y$ predicted from the observed values $X$ and the weight parameters $w$. You might wonder how we can compute anything using $w$ when $w$ is exactly what we wanted to know.

The Bayesian way of thinking requires awareness of a time axis: we tentatively determine the weight parameters from some data, then update them to better values as more data arrives. This makes it a very convenient idea when data accumulates gradually from a small initial set, as the sketch below shows.
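Here is a minimal sketch of that sequential updating (my own addition; it uses the Gaussian posterior update formulas derived later in this article, and update_posterior and all the data are illustrative):

import numpy as np

def update_posterior(m0, S0, Phi, t, beta):
    """One Bayesian update: prior N(w|m0, S0) + data (Phi, t) -> posterior N(w|mN, SN)."""
    S0_inv = np.linalg.inv(S0)
    SN = np.linalg.inv(S0_inv + beta * Phi.T.dot(Phi))
    mN = SN.dot(S0_inv.dot(m0) + beta * Phi.T.dot(t))
    return mN, SN

# Updating twice with two halves of the data equals one update with all of it
rng = np.random.default_rng(0)
Phi, t, beta = rng.normal(size=(10, 3)), rng.normal(size=10), 9.0
m, S = np.zeros(3), 10.0 * np.eye(3)  # prior
m1, S1 = update_posterior(m, S, Phi[:5], t[:5], beta)
m2, S2 = update_posterior(m1, S1, Phi[5:], t[5:], beta)
mb, Sb = update_posterior(m, S, Phi, t, beta)
print(np.allclose(m2, mb), np.allclose(S2, Sb))  # True True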

Actually finding the distribution (setting up the formulas for Bayesian linear regression)

Now, let us actually solve the regression problem with Bayesian linear regression. Writing the target values as $t$, we consider the following for $p(w|t)$.

p(w|t) \propto p(t|w)p(w)

Strictly speaking, $p(t)$ should appear in the denominator, but since it complicates the calculation we work with the proportionality relation $\propto$ instead. Here, $p(t|w)$ and $p(w)$ are given as follows.

p(t|w)=\prod_{n=1}^N \mathcal{N} (t_n|w^T\phi(x_n), \beta^{-1})\\
p(w)=\mathcal{N}(w|m_0, S_0)

Both are formulated as Gaussian distributions.

Here, $\beta$ is the precision (inverse variance) of the observation noise, $\phi(x)$ is the vector of basis functions, and $m_0, S_0$ are the mean and covariance of the prior. Multiplying the likelihood and the prior,

p(t|w)p(w) = \prod_{n=1}^N \mathcal{N} (t_n|w^T\phi(x_n), \beta^{-1}) \, \mathcal{N}(w|m_0, S_0)\\
\propto \left( \prod_{n=1}^N \exp\left[ -\frac{\beta}{2} \left\{ t_n - w^T \phi(x_n) \right\}^2 \right] \right) \exp\left[ -\frac{1}{2} (w - m_0)^TS_0^{-1}(w - m_0) \right]\\
= \exp\left[  -\frac{1}{2} \left\{ \beta \sum_{n=1}^N \left( t_n - w^T\phi(x_n) \right)^2 + (w - m_0)^TS_0^{-1}(w - m_0) \right\} \right]

The product of the two can thus be transformed as above. From here, we carefully expand the contents of the exponent.

\beta \sum_{n=1}^N \left( t_n - w^T\phi(x_n) \right)^2 + (w - m_0)^TS_0^{-1}(w - m_0)\\
= \beta \left( \begin{array}{c}
t_1 - w^T\phi(x_1) \\
\vdots\\
t_N - w^T\phi(x_N)
\end{array} \right)^T
\left( \begin{array}{c}
t_1 - w^T\phi(x_1) \\
\vdots\\
t_N - w^T\phi(x_N)
\end{array} \right)
+ (w - m_0)^TS_0^{-1}(w - m_0)

Each entry can be rewritten using $w^T\phi(x_n) = \phi^T(x_n)w$. Therefore,

= \beta \left( \begin{array}{c}
t_1 - \phi^T(x_1)w \\
\vdots\\
t_N - \phi^T(x_N)w
\end{array} \right)^T
\left( \begin{array}{c}
t_1 - \phi^T(x_1)w \\
\vdots\\
t_N - \phi^T(x_N)w
\end{array} \right)
+ (w - m_0)^TS_0^{-1}(w - m_0)\\

Next, we collect the basis functions evaluated at each data point into what is called the design matrix:

\Phi = \left( \begin{array}{c}
\phi^T(x_1) \\
\vdots\\
\phi^T(x_N)
\end{array} \right)

We use this to tidy the expression further. The terms not involving $w$ (those containing only $\beta$, $t$, and $m_0$) are constants, and we collect them into $C$.

= \beta (t - \Phi w)^T(t - \Phi w) + (w - m_0)^TS_0^{-1}(w - m_0)\\
= \beta ( w^T\Phi^T\Phi w - w^T\Phi^Tt - t^T\Phi w ) + w^TS_0^{-1}w - w^TS_0^{-1}m_0 - m_0^TS_0^{-1}w + C\\

Since $S_0$ is a covariance matrix, it is symmetric, and so is its inverse: $(S_0^{-1})^T = S_0^{-1}$. Applying this to the terms linear in $w$ gives

= w^T(S_0^{-1} + \beta \Phi^T\Phi)w - w^T(S_0^{-1}m_0 + \beta \Phi^T t) - (S_0^{-1}m_0 + \beta \Phi^T t)^Tw + C

Now define $S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi$ and $R = S_0^{-1} m_0 + \beta \Phi^T t$. Then

= w^TS_N^{-1}w - w^TR - R^Tw + C

and the expression is now in compact form. From here it could be completed into a square and factorized directly, but that calculation is rather involved. So we take the contrarian route instead: we expand the already completed-square form of the Gaussian and confirm that it matches.

Expanding the completed-square form

\mathcal{N}(w|m_N, S_N)\\
\propto \exp\left\{ -\frac{1}{2} (w - m_N)^T S_N^{-1} (w - m_N) \right\}

As before, we discuss only the contents of the $-\frac{1}{2}$ exponent of the Gaussian.

(w - m_N)^T S_N^{-1} (w - m_N)\\

Here, $m_N$ and $S_N$ are given as follows.

m_N = S_N(S_0^{-1} m_0 + \beta \Phi^Tt) = S_NR\\
S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi

Since $S_N$ also appears in the target expression derived from the posterior, we leave it unexpanded and substitute only $m_N = S_N R$.

(w - m_N)^T S_N^{-1} (w - m_N)\\
= (w - S_NR)^T S_N^{-1} (w - S_NR)\\
= w^T S_N^{-1} w - w^T R - R^Tw + C

Therefore, the posterior distribution $p(w|t)$ has the same form as the normal distribution $\mathcal{N}(w|m_N, S_N)$: the two expansions agree up to a constant in $w$.
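As a sanity check (my own addition), we can verify numerically that the expanded exponent and the completed-square form differ only by a constant independent of $w$:

import numpy as np

rng = np.random.default_rng(0)
N, M, beta = 10, 4, 9.0
Phi = rng.normal(size=(N, M))
t = rng.normal(size=N)
m0 = rng.normal(size=M)
A = rng.normal(size=(M, M))
S0 = A.dot(A.T) + M * np.eye(M)  # symmetric positive-definite prior covariance

S_N = np.linalg.inv(np.linalg.inv(S0) + beta * Phi.T.dot(Phi))
m_N = S_N.dot(np.linalg.inv(S0).dot(m0) + beta * Phi.T.dot(t))

def lhs(w):  # beta*||t - Phi w||^2 + (w - m0)^T S0^{-1} (w - m0)
    r = t - Phi.dot(w)
    d = w - m0
    return beta * r.dot(r) + d.dot(np.linalg.solve(S0, d))

def rhs(w):  # (w - m_N)^T S_N^{-1} (w - m_N)
    d = w - m_N
    return d.dot(np.linalg.solve(S_N, d))

# The difference should be the same constant for any w
w1, w2 = rng.normal(size=M), rng.normal(size=M)
print(lhs(w1) - rhs(w1), lhs(w2) - rhs(w2))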

Implementing it to check

This time, let's regress randomly generated data points based on the $\sin$ function.

Make a design matrix

beyes.ipynb



import numpy as np
import matplotlib.pyplot as plt

# Hyperparameters
alpha = 0.1
beta = 9.0
M = 12

# phi is not shown in the article; a simple polynomial basis is assumed here
def phi(x):
    return np.array([x ** i for i in range(M)])

# Make the design matrix (run after X is generated in the next block)
Phi = np.array([phi(x) for x in X])

Create randomly scattered data based on a sin function

beyes.ipynb



# Generate n noisy points around sin(2*pi*x)
n = 10
X = np.random.uniform(0, 1, n)
T = np.sin(2 * np.pi * X) + np.random.normal(0, 0.1, n)
plt.scatter(X, T)
plt.plot(np.linspace(0, 1), np.sin(2 * np.pi * np.linspace(0, 1)), c="g")
plt.show()

(figure: the generated data points scattered around the sin curve)

Find the mean and covariance of the posterior

For simplicity, set $m_0 = 0$ and $S_0 = \alpha^{-1} I$, which gives $S_N^{-1} = \alpha I + \beta \Phi^T \Phi$ and $m_N = \beta S_N \Phi^T t$.
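For reference, here is the substitution spelled out:

S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi = \alpha I + \beta \Phi^T \Phi\\
m_N = S_N(S_0^{-1} m_0 + \beta \Phi^T t) = S_N(\alpha I \cdot 0 + \beta \Phi^T t) = \beta S_N \Phi^T t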

beyes.ipynb


# Posterior covariance: S_N = (alpha*I + beta*Phi^T*Phi)^(-1)
S = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T.dot(Phi))
# Posterior mean: m_N = beta * S_N * Phi^T * t
m = beta * S.dot(Phi.T).dot(T)

Plotting the results

beyes.ipynb


# The article does not show the definition of norm; the sketch below assumes it
# evaluates the predictive density N(y | m^T phi(x), 1/beta + phi(x)^T S phi(x))
from scipy.stats import norm as gauss

def norm(x, y):
    mean = m.dot(phi(x))
    var = 1 / beta + phi(x).dot(S).dot(phi(x))
    return gauss.pdf(y, mean, np.sqrt(var))

x_, y_ = np.meshgrid(np.linspace(0, 1), np.linspace(-1.5, 1.5))
Z = np.vectorize(norm)(x_, y_)
x = np.linspace(0, 1)
y = [m.dot(phi(x__)) for x__ in x]

plt.figure(figsize=(10, 6))
plt.pcolor(x_, y_, Z, alpha=0.2)
plt.colorbar()
plt.scatter(X, T)
# Mean of the predictive distribution
plt.plot(x, y)
# True function
plt.plot(np.linspace(0, 1), np.sin(2 * np.pi * np.linspace(0, 1)), c="g")
plt.show()

# Sample parameter vectors from the posterior and draw the corresponding curves
m_list = [np.random.multivariate_normal(m, S) for i in range(5)]

for m_ in m_list:
    x = np.linspace(0, 1)
    y = [m_.dot(phi(x__)) for x__ in x]
    plt.plot(x, y, c="r")
    plt.plot(np.linspace(0, 1), np.sin(2 * np.pi * np.linspace(0, 1)), c="g")
plt.show()

(figure: predictive density shading with the data points, predictive mean, true sin curve, and sampled regression curves)

The shading indicates where the probability density of the predictive distribution is high.

In closing

This time, I summarized Bayesian linear regression from concept to implementation. The formula manipulation was quite involved and I nearly got lost along the way. The underlying idea, however, is very simple, and obtaining a plausible distribution from a small amount of data is an important idea in machine learning.

I will continue studying to deepen my understanding of the Bayesian way of thinking.

The full program is here. https://github.com/Fumio-eisan/Beyes_20200512
