A Python implementation of a Bayesian linear regression class

Introduction

I was concerned that there are few correct implementations of Bayesian linear regression out there, and even fewer that support multidimensional input, so I implemented it as an easy-to-use class. The notation basically follows PRML.

Implemented class

That said, it is not a large implementation, only about 50 lines, so I will show the whole class here. Its functions are as follows. It's very simple.

- Give a pair of $\phi$ and $t$ to compute the posterior distribution
- Randomly sample parameters from the mean and covariance of the Bayesian linear regression posterior
- Give $\phi$ to output the mean and variance of the predictive distribution

Beyes_LR.py


import numpy as np
from scipy.stats import multivariate_normal

class BeyesLinearRegression:
    def __init__(self, mu, S, beta):
        # Prior mean vector, prior covariance matrix, and noise precision.
        self.mu = mu
        self.S = S
        self.beta = beta

    def calc_posterior(self, phi, t):
        # PRML update formulas:
        #   S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
        #   m_N = S_N (S_0^{-1} m_0 + beta * Phi^T t)
        S_inv = np.linalg.inv(self.S)

        # Accept a single sample as well as a batch.
        if len(phi.shape) == 1:
            phi = phi.reshape(1, -1)
            t = t.reshape(1, 1)
        self.S = np.linalg.inv(S_inv + self.beta * phi.T @ phi)
        self.mu = self.S @ (S_inv @ self.mu + np.squeeze(self.beta * phi.T @ t))

    def sampling_params(self, n=1, random_state=0):
        # Draw n weight vectors from the current posterior N(mu, S).
        np.random.seed(random_state)
        return np.random.multivariate_normal(self.mu, self.S, n)

    def probability(self, x):
        # Log density of a weight vector x under the posterior.
        dist = multivariate_normal(mean=self.mu, cov=self.S)
        return dist.logpdf(x)

    def predict(self, phi):
        # Predictive mean m_N^T phi(x) and variance 1/beta + phi(x)^T S_N phi(x).
        if len(phi.shape) == 1:
            phi = phi.reshape(1, -1)
        pred = np.array([self.mu.T @ _phi for _phi in phi])
        S_pred = np.array([(1 / self.beta) + _phi.T @ self.S @ _phi for _phi in phi])

        # The above is a simple implementation.
        # The vectorized form below may be better if you want speed.
        # pred = self.mu @ phi.T
        # S_pred = (1 / self.beta) + np.diag(phi @ self.S @ phi.T)
        return pred, S_pred

All the code is also on GitHub (although it is only about 50 lines).

GitHub

About Bayesian linear regression

The derivation of the formulas is covered in detail in the following articles, so I will not repeat it here.

- Implementing Bayesian linear regression in python

- Regression using Bayesian inference

Calculation of posterior distribution

The important thing is to update the distribution as follows. Put plainly, $\phi$ is the explanatory variable and $t$ is the target; the mean and covariance matrix are updated accordingly.

m_N = S_N (S_0^{-1} m_0 + \beta \Phi^T t)
S_N^{-1} = S_0^{-1} + \beta \Phi^T \Phi
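
To make the update concrete, here is a minimal numpy sketch of these two formulas, assuming a zero-mean prior $m_0 = 0$ with unit covariance $S_0 = I$; the design matrix, targets, and precision below are made-up toy values for illustration.

import numpy as np

beta = 25.0                   # noise precision, 1 / sigma^2 (toy value)
Phi = np.array([[1.0, 0.5],   # design matrix, one row per observation
                [1.0, 0.9]])
t = np.array([0.3, 0.8])      # targets (toy values)

S0_inv = np.eye(2)            # prior precision S_0^{-1} (here S_0 = I)
m0 = np.zeros(2)              # prior mean m_0

# S_N^{-1} = S_0^{-1} + beta * Phi^T Phi
S_N = np.linalg.inv(S0_inv + beta * Phi.T @ Phi)
# m_N = S_N (S_0^{-1} m_0 + beta * Phi^T t)
m_N = S_N @ (S0_inv @ m0 + beta * Phi.T @ t)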

Predicted distribution

The predictive distribution is shown below. I will omit the details here as well, but the point is that, for a new point $x$, the distribution is expressed by the mean and variance shown below.

N(m_N^T \phi(x),\ 1/\beta + \phi(x)^T S_N \phi(x))
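
Continuing the toy sketch above (reusing m_N, S_N, and beta from it), evaluating this for a new feature vector $\phi(x)$ is just:

# Predictive mean and variance for a new feature vector phi(x).
phi_new = np.array([1.0, 0.7])                   # phi(x), toy value
pred_mean = m_N @ phi_new                        # m_N^T phi(x)
pred_var = 1.0 / beta + phi_new @ S_N @ phi_new  # 1/beta + phi(x)^T S_N phi(x)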

Using the implemented class

From here, let's try Bayesian linear regression using the class we just implemented. The class requires you to provide $\phi$, a feature computed from $x$. In common implementations, the part that generates $\phi$ (for example, a polynomial) is also baked into the class, which makes it hard to tell where the Bayesian linear regression ends and the polynomial feature design begins. Here they are kept separate.

So, once you implement a function that creates $\phi$ from the original data $x$, $y$, you are basically done.

Regression with sine wave data

Let's try with Toy data first.

Data generation and feature function design

The input data are observations created by adding noise to a true sine wave. The feature function is designed as a composite of trigonometric functions at several frequencies: the method x_to_phi vectorizes 10 waves, and _phi represents the composite wave. The amplitudes are the parameters found by Bayesian linear regression. In mathematical form it looks like the following. (Come to think of it, the first term is zero... it isn't needed...)

y = w_1 \sin(0) + w_2 \sin(2\pi x) + w_3 \sin(2 \times 2\pi x) + \cdots + w_{10} \sin(9 \times 2\pi x)

def x_to_phi(x):
    # Stack sine waves of frequencies 0..9 as the feature vector.
    if len(x.shape) == 1:
        x = x.reshape(-1, 1)
    return np.concatenate([np.sin(2 * np.pi * x * m) for m in range(0, 10)], axis=1)


def _phi(x, params):
    # Composite wave: weighted sum of the sine features.
    return np.array([p * np.sin(2 * np.pi * x * i) for i, p in enumerate(params)]).sum(axis=0)

Here is the actual data-generation code.

x = np.arange(0, 1, 0.01)
phi = x_to_phi(x)                  # shape (100, 10)

e = np.random.randn(len(x)) / 10   # observation noise, sd = 0.1
y_true = np.sin(2 * np.pi * x)
y = y_true + e

When only one point is observed

First, consider the case where only a single point, the 50th data point, is observed.

import matplotlib.pyplot as plt

train_idx = 50
x_train = x[train_idx]
phi_train = phi[train_idx]
y_train = y[train_idx]
plt.scatter(x_train, y_train, c='crimson', marker='o', label='observation')
plt.plot(x, y_true, label='true')
plt.legend()
plt.show()

[Figure: toy_input.png — the observed point and the true sine curve]

Let's calculate the posterior distribution for this data right away. If you just want to fit the model, the learning itself is a single line:

import BeyesLR  # module containing the BeyesLinearRegression class above

# Initial values for Bayesian linear regression
input_dim = phi.shape[1]
mu = np.zeros(input_dim)
S = np.identity(input_dim)
sigma = 0.1
beta = 1.0 / (sigma ** 2)

# Model definition and learning
beyes_linear_model = BeyesLR.BeyesLinearRegression(mu, S, beta)
beyes_linear_model.calc_posterior(phi_train, y_train)

Waveforms randomly sampled from the learned posterior distribution are drawn with dotted green lines, and the predictive distribution is shown in light blue. Since only one point has been observed, almost everything is light blue.
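
The plotting code for this figure is not in the post; here is a rough sketch of how it might look, built on the class methods above (the styling choices are my assumptions):

# Sketch of the figure: posterior samples as dotted green lines and a
# +/- 1 sigma predictive band in light blue. Styling is an assumption.
pred, S_pred = beyes_linear_model.predict(phi)
std = np.sqrt(S_pred)

for w in beyes_linear_model.sampling_params(n=5):
    plt.plot(x, phi @ w, 'g:', alpha=0.7)   # waveforms sampled from the posterior
plt.fill_between(x, pred - std, pred + std, color='lightblue', alpha=0.5)
plt.plot(x, y_true, label='true')
plt.scatter(x_train, y_train, c='crimson', marker='o', label='observation')
plt.legend()
plt.show()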

[Figure: toy_sin_predict.png — posterior samples and predictive distribution after one observation]

When observing 50 points

Let's do exactly the same thing with 50 observed data points. The only difference in the code is that train_idx now picks 50 points at random, for example as in the sketch below. The resulting predictive distribution is shown in the figure that follows.
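
The post does not show the sampling line itself; one plausible version (the use of np.random.choice is my assumption):

# One way to pick 50 random training indices (exact sampling not shown in the post)
train_idx = np.random.choice(len(x), size=50, replace=False)
x_train, phi_train, y_train = x[train_idx], phi[train_idx], y[train_idx]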

[Figure: toy_sin_predict_50.png — predictive distribution after 50 observations]

Regression on Advertising dataset

Next is a real problem, which is also a multidimensional linear regression. If feature extraction were performed on the multidimensional input, the number of dimensions would grow too quickly and become hard to interpret, so here $\phi$ is a linear function.

Input data

This is the Advertising dataset familiar from ISLR. This time we use TV and Radio advertising expenses as input and Sales as the response.

Since $\phi$ is linear this time, we just add an intercept term. The regression equation takes the following form: $Sales = w_0 + w_1 TV + w_2 Radio$

def x_to_phi(x, typ='linear', degree=3):
    # typ and degree are unused in this linear version; only an
    # intercept column is prepended to x.
    if len(x.shape) == 1:
        x = x.reshape(-1, 1)
    return np.concatenate([np.ones(x.shape[0]).reshape(-1, 1), x], axis=1)


import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv(ADVERTISING_DATASET)
x = df[['TV', 'Radio']].values
y = df['Sales'].values

phi = x_to_phi(x)
x_train, x_test, phi_train, phi_test, y_train, y_test = \
    train_test_split(x, phi, y, train_size=0.05, random_state=0)

All that is left is to train it as in the previous example.

input_dim = phi.shape[1]
mu = np.zeros(input_dim)
S = np.identity(input_dim)
sigma = 10
beta = 1.0 / (sigma ** 2)

beyes_linear_model = BeyesLR.BeyesLinearRegression(mu, S, beta)
beyes_linear_model.calc_posterior(phi_train, y_train)

In the code, train_size is set to 0.05; the regression planes drawn while varying this value are shown below. The model is learned by Bayesian linear regression, and 5 planes drawn by random sampling from the posterior are displayed. When the number of training samples is small the sampled planes are nearly random, but they converge as the number of data points grows.

[Figure: beyes_linear.gif — sampled regression planes while varying train_size]
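
The drawing code for the animation is not included in the post; here is a sketch of how five sampled planes might be rendered (the grid ranges and 3-D styling are my assumptions):

# Sketch: draw 5 regression planes sampled from the posterior over a
# TV x Radio grid. Grid resolution and styling are assumptions.
from mpl_toolkits.mplot3d import Axes3D  # registers 3-D axes on older matplotlib
import matplotlib.pyplot as plt

tv, radio = np.meshgrid(np.linspace(x[:, 0].min(), x[:, 0].max(), 20),
                        np.linspace(x[:, 1].min(), x[:, 1].max(), 20))

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.scatter(x_train[:, 0], x_train[:, 1], y_train, c='crimson')
for w0, w1, w2 in beyes_linear_model.sampling_params(n=5):
    ax.plot_surface(tv, radio, w0 + w1 * tv + w2 * radio, alpha=0.2)
ax.set_xlabel('TV'); ax.set_ylabel('Radio'); ax.set_zlabel('Sales')
plt.show()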

In conclusion

Finally, a little promotion of Bayesian linear regression. The design of the feature extraction function $\phi$ is important in Bayesian linear regression, although there are parts of it I have not fully worked out. My understanding is that Gaussian process regression avoids writing $\phi$ down explicitly by treating the covariance matrix as a design matrix via kernel functions. In my experience, however, if you have prior knowledge that yields good features, Bayesian linear regression is sufficient in terms of explanatory power. In addition, thanks to how Bayesian linear regression is computed, sequential learning works as-is: simply use the computed posterior distribution as the prior for the next update, as sketched below. There is no need to recompute a Gram matrix as in Gaussian process regression. (There may be an online version of that, but...)
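
For example, sequential learning with the class above might look like this; each call to calc_posterior takes the current posterior as the next prior (splitting into five mini-batches is arbitrary, just for illustration):

# Sequential (online) updating: the posterior after each mini-batch
# becomes the prior for the next one.
model = BeyesLR.BeyesLinearRegression(np.zeros(input_dim), np.identity(input_dim), beta)
for phi_batch, t_batch in zip(np.array_split(phi_train, 5),
                              np.array_split(y_train, 5)):
    model.calc_posterior(phi_batch, t_batch)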

With that, have a comfortable Bayesian linear regression life!
