I wrote a Gaussian process with numpy

Introduction

Nice to meet you. My name is Mimura, and I am in charge of day 21 of the NTT DoCoMo Service Innovation Department Advent Calendar. As in past years, DoCoMo volunteers are writing an Advent Calendar, so I would like to contribute an article.

Now, are you familiar with regression problems? Regression problems come up in many places, such as stock price forecasting, and DoCoMo also deals with them in various settings.

In this article, I will implement Gaussian process regression, one of the methods for solving regression problems. The goal this time is the numpy implementation, not a theoretical explanation. For those who want to know the details behind it, I recommend ["Gaussian Processes and Machine Learning (Machine Learning Professional Series)"](https://www.amazon.co.jp/dp/4061529269).

Also, if you actually want to use Gaussian processes in practice, use GPyTorch or GPy! Note that this time I have neither optimized the kernel function nor tuned its parameters. Sorry…

A brief explanation of how to solve a regression problem

Put simply, a regression problem is: "I want to predict some data $Y$ from some observable data $X$." One way to solve it is to prepare a function such as $y = Ax + B$; then, once we observe a variable $x_i$, we can predict $y_i$.

In reality, however, data is rarely expressed well by a function as simple as $y = Ax + B$. One remedy is to make the formula more complex and increase its expressiveness, as in $y = Ax^2 + Bx + C$.

This time, we will express such a complicated function as $y = W^T \phi(x)$. For example, if $W = [A, B, C]$ and $\phi(x) = [x^2, x^1, x^0]$, then $y = Ax^2 + Bx + C$ is expressed nicely. In other words, if you can make $\phi(x)$ complex enough and find the corresponding $W$, you can predict $Y$ well!!!

Yay!!!
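As a quick illustration, here is a minimal sketch of finding $W$ by least squares for the basis $\phi(x) = [x^2, x, 1]$. The data here is made up for this example.

import numpy as np

# Toy data from y = 2x^2 - 3x + 1 plus noise (made-up example)
x = np.linspace(-1, 1, 50)
y = 2 * x**2 - 3 * x + 1 + np.random.normal(0, 0.1, len(x))

# Design matrix: each row is phi(x_i) = [x_i^2, x_i, 1]
Phi = np.stack([x**2, x, np.ones_like(x)], axis=1)

# Least-squares estimate of W = [A, B, C]
W, *_ = np.linalg.lstsq(Phi, y, rcond=None)
print(W)  # approximately [2, -3, 1]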

If the article ended here it would betray its title, so from here I will explain the Gaussian process.

A brief explanation of the Gaussian process

First, consider making the basis functions $\phi(x)$ in $y = W^T \phi(x)$ Gaussian. In other words, try $\phi(x) = \exp\left(-\frac{(x-\mu_h)^2}{\sigma}\right)$. The regression problem can still be solved this way.
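For instance, such Gaussian basis functions could look like the following minimal sketch (the centers $\mu_h$ and width $\sigma$ here are assumed values for illustration):

import numpy as np

def gauss_basis(x, centers, sigma=0.1):
    # Gaussian basis functions; returns a matrix of shape (len(x), len(centers))
    return np.exp(-(x[:, None] - centers[None, :])**2 / sigma)

phi = gauss_basis(np.linspace(0, 1, 100), np.linspace(0, 1, 10))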

With this method, however, the amount of computation explodes due to the curse of dimensionality. To deal with this, consider taking the expectation over $W$ and integrating it out of the model.

Writing $y = \Phi W$, suppose $w$ is generated from a Gaussian distribution with mean $0$ and covariance $\lambda^2 I$, i.e. $w \sim N(0, \lambda^2 I)$. This means that $y$ is "a linear transformation, by the constant matrix $\Phi$, of $W$ following a Gaussian distribution".

At this time, the expectation and covariance are

E[y]=\Phi E[w]=0 \\
cov[y]=E[yy^T]=\Phi E[ww^T]\Phi^T=\lambda^2\Phi\Phi^T

From this result, $y$ follows the multivariate Gaussian distribution $y \sim N(0, \lambda^2 \Phi \Phi^T)$.

Setting $K = \lambda^2 \Phi \Phi^T$, i.e. $K_{n,n'} = \lambda^2 \phi(X_n)^T \phi(X_{n'})$, and normalizing $y$ to mean $0$, we get $y \sim N(0, K)$. Even so, computing $\phi(X_n)^T \phi(X_{n'})$ for every entry is still heavy… But here comes the famous "use the kernel trick!": the idea is that $K_{n,n'}$ can be computed directly, without going to the trouble of computing $\phi(X_n)$.

Having come this far, suppose we are given training data

D=\{(x_1,y_1),(x_2,y_2),\cdots,(x_N,y_N)\} \\

After learning from this data, all we have to do is work out what $y^{new}$ becomes when we observe a new data point $x^{new}$, i.e. solve $y^{new} = w^T \phi(x^{new})$.

To achieve this, stack the data as $y' = (y_1, y_2, \cdots, y_N, y^{new})$ and $X' = (x_1, x_2, \cdots, x_N, x^{new})$; then $y' \sim N(0, K')$.

In other words, this can be written as

\begin{pmatrix}
y \\ 
y^{new}
\end{pmatrix}
\sim 
N\left(0,\begin{pmatrix}
K & k_{new} \\ 
k_{new}^T & k_{new,new}
\end{pmatrix}\right)

Here, $k_{new}$ and $k_{new,new}$ are as follows.

\left\{ 
\begin{array}{ll}
k_{new} &=(k(x^{new},x_1),k(x^{new},x_2),\cdots,k(x^{new},x_N))^T \\
k_{new,new}&= (k(x^{new},x^{new}))
\end{array}
\right.

Now, it is known that when

\begin{pmatrix}
y_1 \\ 
y_2
\end{pmatrix}
\sim N\left(
\begin{pmatrix}
\mu_1 \\ 
\mu_2
\end{pmatrix}
,
\begin{pmatrix}
\Sigma_{1,1} & \Sigma_{1,2} \\
\Sigma_{2,1} & \Sigma_{2,2}
\end{pmatrix}
\right)

the conditional distribution $p(y_2|y_1)$ can be expressed as $p(y_2|y_1)=N(\mu_2+\Sigma_{2,1}\Sigma_{1,1}^{-1}(y_1-\mu_1),\Sigma_{2,2}-\Sigma_{2,1}\Sigma_{1,1}^{-1}\Sigma_{1,2})$. This time $\mu_1=0$ and $\mu_2=0$, so it becomes $p(y_2|y_1)=N(\Sigma_{2,1}\Sigma_{1,1}^{-1}y_1,\Sigma_{2,2}-\Sigma_{2,1}\Sigma_{1,1}^{-1}\Sigma_{1,2})$.
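This conditioning formula is easy to sanity-check numerically. A small sketch with an arbitrary 2D Gaussian (the covariance values are made up):

import numpy as np

# Arbitrary zero-mean 2D Gaussian, partitioned into y_1 and y_2
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])
y_1 = 0.8  # observed value

# p(y_2 | y_1) = N(S21 S11^-1 y_1, S22 - S21 S11^-1 S12)
mu_cond  = Sigma[1, 0] / Sigma[0, 0] * y_1
var_cond = Sigma[1, 1] - Sigma[1, 0] / Sigma[0, 0] * Sigma[0, 1]
print(mu_cond, var_cond)  # 0.48, 1.64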

That was long, but in short,

p(y^{new}|X^{new},D)=N(k_{new}^TK^{-1}y,k_{new,new}-k_{new}^TK^{-1}k_{new})

is all we have to implement!!!

Let's actually implement it!

p(y^{new}|X^{new},D)=N(k_{new}^TK^{-1}y,k_{new,new}-k_{new}^TK^{-1}k_{new})

We established above that this is all we need to implement.

We also saw that $K$, $k_{new}$, and $k_{new,new}$ are given by

\left\{ 
\begin{array}{ll}
K   &=\left[\begin{array}{c}
k(x_1,x_1),k(x_1,x_2),\cdots,k(x_1,x_N) \\
k(x_2,x_1),k(x_2,x_2),\cdots,k(x_2,x_N) \\
\cdots \\
k(x_N,x_1),k(x_N,x_2),\cdots,k(x_N,x_N)
\end{array}
\right] \\
k_{new} &=(k(x^{new},x_1),k(x^{new},x_2),\cdots,k(x^{new},x_N))^T \\
k_{new,new}&= (k(x^{new},x^{new}))
\end{array}
\right.


This time, we use the following kernel function, an RBF kernel plus a noise term:

k(x,x')=\theta_1\exp \left(-\frac{(x-x')^2}{\theta_2}\right)+\theta_3\delta(x,x')

Using this, the kernel function can be implemented as:

import numpy as np

def RBF(X, X_):
    # RBF kernel with a noise term:
    # k(x, x') = theta_1 * exp(-(x - x')^2 / theta_2) + theta_3 * delta(x, x')
    theta_1 = 1
    theta_2 = 0.2
    theta_3 = 0.01
    distance = ((X - X_)**2).sum(-1)
    if distance.shape[0] == distance.shape[1]:
        # Square Gram matrix: add the noise term on the diagonal
        return theta_1 * np.exp(-1 * distance / theta_2) + theta_3 * np.eye(distance.shape[0])
    else:
        return theta_1 * np.exp(-1 * distance / theta_2)


Let's use this to compute $K^{-1}$, $k_{new}$, and $k_{new,new}$. First, $K^{-1}$:

# Tile X so that broadcasting gives pairwise differences x_j - x_i
X_    = np.array([X for _ in range(len(X))])   # shape (N, N, d); X_[i, j] = x_j
K     = RBF(X_, X_.transpose(1, 0, 2))         # Gram matrix, shape (N, N)
inv_K = np.linalg.inv(K)

Next, compute $k_{new}$ and $k_{new,new}$:

X__        = np.array([X      for _ in range(len(Test_X))])                     # (M, N, d); X__[i, j] = x_j
Test_X__   = np.array([Test_X for _ in range(len(X))]).transpose(1, 0, 2)       # (M, N, d); Test_X__[i, j] = x_i^new
Test_X__a  = np.array([Test_X for _ in range(len(Test_X))]).transpose(1, 0, 2)  # (M, M, d)
k          = RBF(X__, Test_X__)                            # k_new for every test point, shape (M, N)
k__        = RBF(Test_X__a, Test_X__a.transpose(1, 0, 2))  # k_new,new, shape (M, M)
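As an aside, for larger datasets it is more numerically stable to solve linear systems than to form $K^{-1}$ explicitly. A minimal sketch computing the same quantities as inv_K above (Y is the vector of training targets used in the final line below):

alpha = np.linalg.solve(K, Y)    # same as inv_K.dot(Y), used for the predictive mean
V     = np.linalg.solve(K, k.T)  # same as inv_K.dot(k.T), used for the predictive covariance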

Finally, to generate $y^{new}$ from

p(y^{new}|X^{new},D)=N(k_{new}^TK^{-1}y,k_{new,new}-k_{new}^TK^{-1}k_{new})

all we have to do is call:

# Sample from the predictive distribution N(k^T K^-1 y, k_new,new - k^T K^-1 k)
Y_result = np.random.multivariate_normal(k.dot(inv_K).dot(Y), k__ - k.dot(inv_K).dot(k.T))

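Putting it all together, here is a minimal end-to-end sketch on made-up 1D data (X, Y, and Test_X below are toy stand-ins, not the article's actual data):

import numpy as np

# Toy 1D training data: a noisy sine curve (made up for illustration)
X      = np.linspace(0, 1, 20).reshape(-1, 1)   # training inputs, shape (N, 1)
Y      = np.sin(2 * np.pi * X).ravel() + np.random.normal(0, 0.1, 20)
Test_X = np.linspace(0, 1, 100).reshape(-1, 1)  # test inputs, shape (M, 1)

# With this setup the snippets above run as-is:
# K is (20, 20), k is (100, 20), k__ is (100, 100),
# and Y_result is one length-100 sample from the predictive distribution.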

Experiment with bike share data in New York

Let's try this on New York's bike share data (https://www.citibikenyc.com/system-data). This time, I visualized the number of bicycles returned between 8:00 am and 12:00 noon on June 30th.

The return counts are log-normalized for easier viewing.

(Figure: plot_demand.png — log-normalized bike return counts)

Visualizing the predictions from the Gaussian process gives the following.

(Figure: GP_result_port.png — Gaussian process prediction; stars mark port locations)

The stars are the positions of the ports. Values of $0$ and below are clipped to $0$ for easier viewing.

The results show high demand in the center. At positions with no ports the variance is large, so a wide range of values gets sampled, which is why some port-free areas were inferred to have high demand. This is the nice thing about Gaussian processes: they not only predict at points with no data, but also tell you the variance there.
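Incidentally, the per-point uncertainty is just the diagonal of the predictive covariance. A short sketch using the variables from the implementation above:

# Predictive standard deviation at each test point
cov = k__ - k.dot(inv_K).dot(k.T)
std = np.sqrt(np.maximum(np.diag(cov), 0))  # clip tiny negatives caused by numerical error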

Finally

Regression problems can be solved like this, so why not give it a try?
