This post reproduces how the predictive distribution of Bayesian linear regression converges, following Chapter 3 of Pattern Recognition and Machine Learning. What we are doing is almost the same as the Bayesian estimation in Chapter 1; the only real difference is that the basis functions are the Gaussians of (3.4), so the code can largely be reused.
Compared to Chapter 1, Chapter 3 rewrites the formulas in vector form. Is there a particular intention behind this? Since the concept of the feature vector is introduced in (3.16), I suppose that is why it is written this way, but I am not sure.
(1) What we want to find is the predictive distribution (3.57), which takes the Gaussian form

$$
p(t|{\bf x}, {\bf t}, \alpha, \beta) = \mathcal{N}(t\,|\,{\bf m}_N^{\rm T}\phi({\bf x}),\ \sigma_N^2({\bf x})) \qquad (3.58)
$$
(2) The basis functions are the Gaussians of (3.4), $\phi_j(x) = \exp\left\{-\frac{(x-\mu_j)^2}{2s^2}\right\}$. The book specifies 9 basis functions but gives no instruction on how to place them, so the centers $\mu_j$ are spread evenly over the input range.
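As a minimal sketch of this placement (assuming, as in the script below, $M = 9$ centers spread evenly over $[0, 1]$ and width $s = 0.1$), the feature vector $\phi(x)$ can be evaluated like this:

```python
import numpy as np

M, s = 9, 0.1              # 9 basis functions of width s (assumed values)
mu = np.linspace(0, 1, M)  # centers mu_j spread evenly over the input range [0, 1]

def phi(x):
    """Feature vector phi(x): the M Gaussian basis functions (3.4) evaluated at x."""
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2))

print(phi(0.5))            # e.g. the feature vector at x = 0.5
```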
(3) The mean and variance of the predictive distribution are computed from the following three formulas, so implement each of them to obtain the predictive distribution (a vectorized sketch follows the equations).

$$
{\bf m}_N = \beta {\bf S}_N \Phi^{\rm T} {\bf t} \qquad (3.53)
$$

$$
\sigma_N^2({\bf x}) = \beta^{-1} + \phi({\bf x})^{\rm T} {\bf S}_N \phi({\bf x}) \qquad (3.59)
$$

$$
{\bf S}_N^{-1} = \alpha {\bf I} + \beta \Phi^{\rm T} \Phi \qquad (3.54)
$$
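Substituting (3.53) into the mean of (3.58) gives ${\bf m}_N^{\rm T}\phi(x) = \beta\,\phi(x)^{\rm T}{\bf S}_N \sum_n \phi(x_n) t_n$, which is the per-sample-sum form (analogous to (1.70)–(1.72) in Chapter 1) that the functions `m`, `s2`, and `S` in the script implement. As a hedged, vectorized sketch of the same three formulas (reusing the basis from the sketch above and the example values $\alpha = 0.1$, $\beta = 9$ used later):

```python
import numpy as np

M, s = 9, 0.1                            # same assumed basis as above
mu = np.linspace(0, 1, M)

def phi(x):
    # Gaussian basis functions (3.4)
    return np.exp(-(x - mu) ** 2 / (2 * s ** 2))

def posterior(x_train, y_train, alpha=0.1, beta=9.0):
    """Posterior over the weights: S_N from (3.54), m_N from (3.53)."""
    Phi = np.array([phi(x) for x in x_train])                    # N x M design matrix
    S_N = np.linalg.inv(alpha * np.eye(M) + beta * Phi.T @ Phi)  # (3.54)
    m_N = beta * S_N @ Phi.T @ np.asarray(y_train)               # (3.53)
    return m_N, S_N

def predictive(x, m_N, S_N, beta=9.0):
    """Predictive distribution (3.58): mean m_N^T phi(x) and variance (3.59)."""
    p = phi(x)
    return m_N @ p, 1.0 / beta + p @ S_N @ p
```

The matrix form and the per-sample sums in the script are the same thing, since $\sum_n \phi(x_n)\phi(x_n)^{\rm T} = \Phi^{\rm T}\Phi$ and $\sum_n \phi(x_n) t_n = \Phi^{\rm T}{\bf t}$.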
For drawing the figure, I referred to @naoya_t's article "Reproduction of Bayesian linear regression (PRML §3.3)", which shows how the predictive distribution converges as the sample data increase. I wasn't confident I could come up with a nicer way to draw it myself, so I followed it almost verbatim. Thank you very much.
```python
import numpy as np
from numpy.linalg import inv
import matplotlib.pyplot as plt


def func(x_train, y_train):
    # params for the prior and the noise precision
    alpha = 0.1
    Beta = 9
    s = 0.1

    # use 9 Gaussian basis functions with evenly spaced centers
    M = 9
    mu_i = np.linspace(0, 1, M)

    # (3.4) Gaussian basis function: column feature vector phi(x)
    def phi(x):
        return np.array([np.exp(-(x - mu) ** 2 / (2 * s ** 2)) for mu in mu_i]).reshape((M, 1))

    # (3.54)' ((1.72)) posterior covariance S_N
    def S(x_train, y_train):
        Sigma = np.zeros((M, M))
        for n in range(len(x_train)):
            Sigma += np.dot(phi(x_train[n]), phi(x_train[n]).T)
        S_inv = alpha * np.identity(M) + Beta * Sigma
        return inv(S_inv)

    # (3.53)' ((1.70)) mean of the predictive distribution
    def m(x, x_train, y_train, S_N):
        total = np.zeros((M, 1))
        for n in range(len(x_train)):
            total += phi(x_train[n]) * y_train[n]
        return Beta * phi(x).T.dot(S_N).dot(total)

    # (3.59)' ((1.71)) variance of the predictive distribution
    def s2(x, S_N):
        return 1.0 / Beta + phi(x).T.dot(S_N).dot(phi(x))

    S_N = S(x_train, y_train)

    # the underlying sine curve
    x_real = np.arange(0, 1, 0.01)
    y_real = np.sin(2 * np.pi * x_real)

    # predictive distribution over the whole range of x
    mean = np.array([m(x, x_train, y_train, S_N)[0, 0] for x in x_real])
    variance = np.array([s2(x, S_N)[0, 0] for x in x_real])
    SD = np.sqrt(variance)
    upper = mean + SD
    lower = mean - SD

    plt.plot(x_train, y_train, 'bo')
    plt.plot(x_real, y_real, 'g-')
    plt.plot(x_real, mean, 'r-')
    plt.fill_between(x_real, upper, lower, color='pink')
    plt.xlim(0.0, 1.0)
    plt.ylim(-2, 2)
    plt.title("Figure 3.8")
    plt.show()


if __name__ == "__main__":
    # full set of observed data points (underlying function plus some noise)
    x_train = np.linspace(0, 1, 26)

    # "small level of random noise having a Gaussian distribution"
    loc = 0
    scale = 0.3
    y_train = np.sin(2 * np.pi * x_train) + np.random.normal(loc, scale, 26)

    # pick k sample data points at random
    def randidx(n, k):
        r = list(range(n))
        np.random.shuffle(r)
        return np.sort(r[0:k])

    for k in (1, 2, 5, 26):
        indices = randidx(len(x_train), k)
        func(x_train[indices], y_train[indices])
```