The perceptron learning rule is a method for finding the weights of a linear discriminant function through learning. A pattern is represented as a point in feature space. When considering two classes, for example, the question is whether the points distributed in the feature space can be separated into two groups by the hyperplane generated by the linear discriminant function. To separate them, the weights of the linear discriminant function must be adjusted, and this adjustment is the learning: in other words, learning produces the hyperplane that splits the feature space in two. First, let the classes be Ci (i = 1, ..., c), let x be the input pattern, and let d be the number of dimensions. The linear discriminant function gi(x) of class Ci is
g_i(x) = \omega_{i0} + \sum_{j=1}^{d} \omega_{ij} x_j
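As a minimal illustration (the weights and input below are made-up values, not the ones learned later), gi(x) is simply a bias term plus a weighted sum of the feature values:

import numpy as np

def g(x, w0, w):
    # g_i(x) = w0 + sum_j w_j * x_j for one class's weights
    return w0 + np.dot(w, x)

# hypothetical weights and a hypothetical 3-dimensional input pattern
w0, w = -1.0, np.array([0.5, 0.5, 0.5])
x = np.array([1.0, 1.0, 1.0])
print(g(x, w0, w))  # 0.5, so this x lies on the positive side of the hyperplane g(x) = 0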
This time, for simplicity, we consider two classes, C₁ and C₂. We also assume that x belongs to class Ci when gi(x) is the maximum (depending on the linear discriminant function, the minimum can also be taken as the discrimination result, so we state up front here that the maximum gives the result). Then, when the input pattern belongs to class C₁, g₁(x) > g₂(x), and when it belongs to class C₂, g₁(x) < g₂(x). The discriminant functions gi(x) can therefore be combined into one:
g(x) = g_1(x) - g_2(x)
= \omega'_0 + \sum_{j=1}^{d} \omega'_j x_j \qquad (\omega'_j = \omega_{1j} - \omega_{2j})
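In code form, combining the two functions amounts to taking the difference of the two weight vectors and checking the sign of the result (the weights here are again made-up values for illustration only):

import numpy as np

# hypothetical per-class weight vectors, with the bias w0 stored as element 0
w1 = np.array([-1.0, 1.0, 2.0, 0.5])   # weights of g1(x)
w2 = np.array([ 0.5, 0.0, 1.0, 1.0])   # weights of g2(x)
w_diff = w1 - w2                        # weights of g(x) = g1(x) - g2(x)

x_ext = np.array([1.0, 1.0, 0.0, 1.0])  # extended input: leading 1 multiplies w0
g_val = np.dot(w_diff, x_ext)
print("C1" if g_val > 0 else "C2")      # g(x) > 0 means g1(x) > g2(x), i.e. class C1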
In this way, a single weight vector W is obtained. The linear discriminant function with these weights substituted into g(x) separates the feature space by class with a hyperplane: x is assigned to C₁ when g(x) > 0 and to C₂ when g(x) < 0. However, this works only for linearly separable distributions, because the perceptron learning rule is guaranteed to converge only when the classes are linearly separable (the perceptron convergence theorem).
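For reference, one update step of the learning rule that the script below uses can be sketched as follows (ρ is the learning coefficient; this is a simplified sketch of the logic, not the script itself):

import numpy as np

def perceptron_step(w, x, is_class1, rho=1):
    # g(x) is the dot product of the (extended) weight and pattern vectors
    g_val = np.dot(w, x)
    if is_class1 and g_val <= 0:
        # Class C1 pattern on the wrong side: pull the hyperplane toward it
        return w + rho * x, True
    if not is_class1 and g_val >= 0:
        # Class C2 pattern on the wrong side: push the hyperplane away from it
        return w - rho * x, True
    return w, False  # correctly classified: no update

# The rule is applied repeatedly over all patterns until a full pass makes no update.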
First, the feature vectors of the learning patterns. They could have been given as input at execution time, but because the perceptron learning rule cannot be applied unless the distribution is linearly separable, they were fixed in advance as follows.

Class C₁: x₁ = (1, 1, 1, 1), x₂ = (1, 0, 1, 1), x₃ = (1, 1, 0, 1), x₄ = (1, 1, 1, 0)
Class C₂: x₅ = (1, 0, 0, 0), x₆ = (1, 1, 0, 0), x₇ = (1, 0, 1, 0), x₈ = (1, 0, 0, 1)

The 0th element of every feature vector is 1; it is the value multiplied by ω₀ in the linear discriminant function g(x). The actual features are the three elements that follow, so the feature space is three-dimensional (strictly speaking, the four-dimensional vector including the 0th element is called the extended feature vector; the feature vector proper, without extension, consists of the 1st through dth elements). Since the feature space is three-dimensional, learning succeeds if the two classes are separated by a two-dimensional hyperplane. The learning coefficient in the learning rule was set to 1. The feature vectors of the learning patterns are distributed in the feature space as shown in the figure (red: class C₁, blue: class C₂). I implemented the learning rule in Python. The code is as follows.
perceptron_learning.py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
#Learning pattern feature vector
x = np.array([[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]])
# Initial weight vector
w = np.array([-5, 1, 1, 1])
# Perceptron learning rule
while True:
    judge = 0   # becomes 1 if any weight update occurred in this pass
    count = 1
    for data in x:
        print(w)
        # The first half of the patterns belongs to class C1
        if count <= x.shape[0]/2:
            # Misclassified C1 pattern (g(x) <= 0): move w toward the pattern
            if np.dot(data, w) <= 0:
                w += data
                judge = 1
        else:
            # Misclassified C2 pattern (g(x) >= 0): move w away from the pattern
            if np.dot(data, w) >= 0:
                w -= data
                judge = 1
        count += 1
    print()
    # Stop when a full pass over all patterns produces no updates
    if judge == 0:
        break
# Linear discriminant function: solve w0 + w1*x1 + w2*x2 + w3*x3 = 0 for x3
def f(x1, x2):
    return (w[1]*x1 + w[2]*x2 + w[0]) / (-w[3])
x1, x2 = np.mgrid[-1:2:0.1, -1:2:0.1]
x3 = f(x1, x2)
#Feature space settings
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d", facecolor="w")
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('x3')
ax.set_title("feature space")
#Drawing a linear discriminant function
ax.plot_surface(x1, x2, x3, cmap='BuGn')
i = 1
# Drawing the learning patterns (red: class C1, blue: class C2)
for point in x:
    if i <= x.shape[0]/2:
        ax.scatter(point[1], point[2], point[3], c='red')
    else:
        ax.scatter(point[1], point[2], point[3], c='blue')
    i += 1
#Display feature space
plt.show()
Running the script produces the following feature space. As the figure shows, the red and blue points are separated by the green plane (the hyperplane), so the weights were indeed obtained by the learning rule.
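Besides looking at the plot, the result can also be checked numerically. The following small addition (a sketch that assumes it is appended to perceptron_learning.py, so x and the learned w are still in scope) prints g(x) for every learning pattern; the first four values should be positive and the last four negative:

# Numerical check of the learned weights
for i, data in enumerate(x):
    g_val = np.dot(data, w)
    expected = "C1 (g > 0)" if i < x.shape[0] / 2 else "C2 (g < 0)"
    print(f"x{i + 1}: g(x) = {g_val}, expected {expected}")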