This is the most basic algorithm for regression. Algorithms that appear later, such as Bayesian linear regression, neural networks, and Gaussian processes, extend it with probability theory or adapt it to higher-dimensional input spaces, but the basic part is common to all of them.
Suppose you observe inputs $\boldsymbol{x} = (x_1, x_2, \cdots, x_N)^T$ and outputs $\boldsymbol{t} = (t_1, t_2, \cdots, t_N)^T$, where $N$ is the number of data points. Now suppose you want to predict the output $t$ for a new input $x$. This is called the regression problem.
Now consider a function $y(x)$. If the values $y(x_1), y(x_2), \cdots, y(x_N)$ obtained by feeding $x_1, x_2, \cdots, x_N$ into this function are close to $t_1, t_2, \cdots, t_N$, then the output $t$ for a new input $x$ is also likely to be close to $y(x)$.
In polynomial curve fitting, we introduce the parameter vector $\boldsymbol{w} = (w_0, w_1, \cdots, w_M)^T$ and consider the following polynomial, where $M$ is the degree of the polynomial and is fixed in advance.
y(x, \boldsymbol{w})=w_0+w_1x^1+w_2x^2+\cdots+w_Mx^M=\boldsymbol{w}^T\boldsymbol{\phi}(x)
where $\boldsymbol{\phi}(x) = (1, x, x^2, \cdots, x^M)^T$.
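For example, the basis vector $\boldsymbol{\phi}(x)$ is just the powers of $x$ from $0$ to $M$. A minimal NumPy sketch (x = 0.5 and M = 3 are arbitrary values chosen only for illustration):

import numpy as np

M = 3
x = 0.5
phi = np.array([x**m for m in range(M + 1)])  # (1, x, x^2, x^3)
print(phi)  # [1.    0.5   0.25  0.125]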
Substituting $x_1, x_2, \cdots, x_N$ into this function and collecting the results into a vector gives the following.
\boldsymbol{y}(\boldsymbol{x}, \boldsymbol{w})=(y(x_1, \boldsymbol{w}),y(x_2, \boldsymbol{w}),\cdots,y(x_N, \boldsymbol{w}))^T=\boldsymbol{\Phi}\boldsymbol{w}
where $\boldsymbol{\Phi} = (\boldsymbol{\phi}(x_1), \boldsymbol{\phi}(x_2), \cdots, \boldsymbol{\phi}(x_N))^T$ is the $N \times (M+1)$ design matrix.
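In NumPy, this matrix can be built directly; np.vander with increasing=True produces exactly these columns. A minimal sketch with a toy input vector (the values are arbitrary):

import numpy as np

x = np.array([0.0, 0.5, 1.0])               # toy inputs (assumption)
M = 2
Phi = np.vander(x, M + 1, increasing=True)  # row n is (1, x_n, x_n^2)
print(Phi)  # rows: [1, 0, 0], [1, 0.5, 0.25], [1, 1, 1]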
We want to find the $\boldsymbol{w}$ that makes $\boldsymbol{y}(\boldsymbol{x}, \boldsymbol{w})$ match $\boldsymbol{t}$ as closely as possible. To do this, first take the difference between the two and square it:
\begin{align}
E(\boldsymbol{w}) &= (\boldsymbol{y}(\boldsymbol{x}, \boldsymbol{w})-\boldsymbol{t})^T(\boldsymbol{y}(\boldsymbol{x}, \boldsymbol{w})-\boldsymbol{t})\\
&=\boldsymbol{w}^T\boldsymbol{\Phi}^T\boldsymbol{\Phi}\boldsymbol{w} - 2\boldsymbol{t}^T\boldsymbol{\Phi}\boldsymbol{w} +\boldsymbol{t}^T\boldsymbol{t}
\end{align}
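In code, $E(\boldsymbol{w})$ is just the squared norm of the residual vector. A minimal sketch with toy values (the design matrix, targets, and candidate parameters below are assumptions made only to show the computation):

import numpy as np

Phi = np.vander(np.array([0.0, 0.5, 1.0]), 3, increasing=True)  # toy design matrix (3 points, M = 2)
t = np.array([0.1, 0.9, -0.2])                                  # toy targets
w = np.zeros(3)                                                 # some candidate parameter vector
residual = Phi.dot(w) - t
E = residual.dot(residual)  # equals (Phi w - t)^T (Phi w - t)
print(E)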
We want the $\boldsymbol{w}$ that makes this error $E(\boldsymbol{w})$ as small as possible, so differentiating with respect to $\boldsymbol{w}$ and setting the result to $0$ gives
\begin{align}
\frac{dE(\boldsymbol{w})}{d{\boldsymbol{w}}} &=2\boldsymbol{\Phi} ^T\boldsymbol{\Phi}\boldsymbol{w} - 2\boldsymbol{\Phi}^T\boldsymbol{t} = 0
\end{align}
Solving this for $\boldsymbol{w}$,
\boldsymbol{w} =(\boldsymbol{\Phi} ^T\boldsymbol{\Phi})^{-1}\boldsymbol{\Phi}^T\boldsymbol{t}
This is the closed-form least-squares solution. Let's implement it.
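The training data hard-coded in the script below look like noisy samples of the true curve $\sin(2\pi x)$ at 10 evenly spaced points. A hypothetical way to generate similar data (the noise scale 0.3 is an assumption, not taken from the source):

import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)                                        # 10 evenly spaced inputs on [0, 1]
t = np.sin(2 * np.pi * x) + rng.normal(scale=0.3, size=x.shape)  # noisy targets

The script itself uses the fixed values below.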
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import numpy as np
import matplotlib.pyplot as plt
#Training data
data = np.array(
[[0.000000,0.349486],
[0.111111, 0.830839],
[0.222222, 1.007332],
[0.333333, 0.971507],
[0.444444, 0.133066],
[0.555556, 0.166823],
[0.666667, -0.848307],
[0.777778, -0.445686],
[0.888889, -0.563567],
[1.000000, 0.261502]])
x=data[:,0]
t=data[:,1]
#Data for plot
plotS = 100
X = np.linspace(0,1,plotS)
Y = np.zeros(plotS)
def _phi(xn, M):
    #Basis vector phi(x_n) = (1, x_n, x_n^2, ..., x_n^M)^T
    ret = np.zeros([M + 1])
    for m in range(M + 1):
        ret[m] = xn**m
    return ret

def _Phi(x, M):
    #Design matrix Phi: row n is phi(x_n), shape (N, M+1)
    N = x.shape[0]
    ret = np.zeros([N, M + 1])
    for n in range(N):
        ret[n, :] = _phi(x[n], M)
    return ret
plotArea = 0
for M in [0, 1, 3, 9]:
    #w learning: closed-form least-squares solution
    Phi = _Phi(x, M)
    w = np.linalg.inv(Phi.T.dot(Phi)).dot(Phi.T).dot(t)
    plotArea += 1
    plt.subplot(2, 2, plotArea)
    #Training data plot
    plt.plot(x, t, 'o', c='w', ms=5, markeredgecolor='blue', markeredgewidth=1)
    #True curve plot
    plt.plot(X, np.sin(2 * np.pi * X), 'g')
    #Approximate curve plot
    for i in range(plotS):
        Y[i] = w.dot(_phi(X[i], M))
    plt.plot(X, Y, 'r')
plt.show()
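As a side note, explicitly inverting Phi.T.dot(Phi) with np.linalg.inv can be numerically fragile when that matrix is ill-conditioned (for example M = 9 with only 10 data points). np.linalg.lstsq solves the same least-squares problem more stably; a minimal sketch reusing _Phi, x, and t from the script above:

#More stable alternative to the explicit inverse (same solution in exact arithmetic)
Phi = _Phi(x, 9)
w, residuals, rank, sv = np.linalg.lstsq(Phi, t, rcond=None)
print(w)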