The perceptron learning rule is a method for finding the weights of a linear discriminant function through learning. A pattern is represented as a point in feature space. When considering two classes, for example, the question is whether the points distributed in the feature space can be separated into two groups by the hyperplane generated by the linear discriminant function. To separate them, the weights of the linear discriminant function must be adjusted, and this adjustment is the learning: in other words, learning produces the hyperplane that splits the feature space in two. First, let the classes be Ci (i = 1, ..., c), let x be the input pattern, and let d be the number of dimensions. The linear discriminant function gi(x) of class Ci is
g_i(x) = \omega_{i0} + \sum_{j=1}^{d} \omega_{ij} x_j
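As a minimal illustration (the weights and input below are made-up values, not the ones learned later), gi(x) is simply a bias term plus a weighted sum of the feature values:

import numpy as np

def g(x, w0, w):
    # g_i(x) = w0 + sum_j w_j * x_j for one class's weights
    return w0 + np.dot(w, x)

# hypothetical weights and a hypothetical 3-dimensional input pattern
w0, w = -1.0, np.array([0.5, 0.5, 0.5])
x = np.array([1.0, 1.0, 1.0])
print(g(x, w0, w))  # 0.5, so this x lies on the positive side of the hyperplane g(x) = 0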
This time, for simplicity, we consider two classes, C₁ and C₂. We also assume that x belongs to class Ci when gi(x) is the maximum (depending on the linear discriminant function, the minimum can also be taken as the discrimination result, so we state up front here that the maximum gives the result). Then, when the input pattern belongs to class C₁, g₁(x) > g₂(x), and when it belongs to class C₂, g₁(x) < g₂(x). The discriminant functions gi(x) can therefore be combined into one:
g(x) = g_1(x) - g_2(x)
= \omega'_0 + \sum_{j=1}^{d} \omega'_j x_j \qquad (\omega'_j = \omega_{1j} - \omega_{2j})
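In code form, combining the two functions amounts to taking the difference of the two weight vectors and checking the sign of the result (the weights here are again made-up values for illustration only):

import numpy as np

# hypothetical per-class weight vectors, with the bias w0 stored as element 0
w1 = np.array([-1.0, 1.0, 2.0, 0.5])   # weights of g1(x)
w2 = np.array([ 0.5, 0.0, 1.0, 1.0])   # weights of g2(x)
w_diff = w1 - w2                        # weights of g(x) = g1(x) - g2(x)

x_ext = np.array([1.0, 1.0, 0.0, 1.0])  # extended input: leading 1 multiplies w0
g_val = np.dot(w_diff, x_ext)
print("C1" if g_val > 0 else "C2")      # g(x) > 0 means g1(x) > g2(x), i.e. class C1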
In this way, a single weight vector W is obtained. The linear discriminant function with these weights substituted into g(x) separates the feature space by class with a hyperplane: x is assigned to C₁ when g(x) > 0 and to C₂ when g(x) < 0. However, this works only for linearly separable distributions, because the perceptron learning rule is guaranteed to converge only when the classes are linearly separable (the perceptron convergence theorem).
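For reference, one update step of the learning rule that the script below uses can be sketched as follows (ρ is the learning coefficient; this is a simplified sketch of the logic, not the script itself):

import numpy as np

def perceptron_step(w, x, is_class1, rho=1):
    # g(x) is the dot product of the (extended) weight and pattern vectors
    g_val = np.dot(w, x)
    if is_class1 and g_val <= 0:
        # Class C1 pattern on the wrong side: pull the hyperplane toward it
        return w + rho * x, True
    if not is_class1 and g_val >= 0:
        # Class C2 pattern on the wrong side: push the hyperplane away from it
        return w - rho * x, True
    return w, False  # correctly classified: no update

# The rule is applied repeatedly over all patterns until a full pass makes no update.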
First, the feature vectors of the learning patterns. They could have been given as input at execution time, but because the perceptron learning rule cannot be applied unless the distribution is linearly separable, they were fixed in advance as follows.

Class C₁: x₁ = (1, 1, 1, 1), x₂ = (1, 0, 1, 1), x₃ = (1, 1, 0, 1), x₄ = (1, 1, 1, 0)
Class C₂: x₅ = (1, 0, 0, 0), x₆ = (1, 1, 0, 0), x₇ = (1, 0, 1, 0), x₈ = (1, 0, 0, 1)

The 0th element of every feature vector is 1; it is the value multiplied by ω₀ in the linear discriminant function g(x). The actual features are the three elements that follow, so the feature space is three-dimensional (strictly speaking, the four-dimensional vector including the 0th element is called the extended feature vector; the feature vector proper, without extension, consists of the 1st through dth elements). Since the feature space is three-dimensional, learning succeeds if the two classes are separated by a two-dimensional hyperplane. The learning coefficient in the learning rule was set to 1. The feature vectors of the learning patterns are distributed in the feature space as shown in the figure (red: class C₁, blue: class C₂). I implemented the learning rule in Python. The code is as follows.
perceptron_learning.py
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
#Learning pattern feature vector
x = np.array([[1, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 0, 0, 1]])
# Initial weight vector
w = np.array([-5, 1, 1, 1])
# Perceptron learning rule
while True:
    judge = 0   # becomes 1 if any weight update occurred in this pass
    count = 1
    for data in x:
        print(w)
        # The first half of the patterns belongs to class C1
        if count <= x.shape[0]/2:
            # Misclassified C1 pattern (g(x) <= 0): move w toward the pattern
            if np.dot(data, w) <= 0:
                w += data
                judge = 1
        else:
            # Misclassified C2 pattern (g(x) >= 0): move w away from the pattern
            if np.dot(data, w) >= 0:
                w -= data
                judge = 1
        count += 1
    print()
    # Stop when a full pass over all patterns produces no updates
    if judge == 0:
        break
# Linear discriminant function: solve w0 + w1*x1 + w2*x2 + w3*x3 = 0 for x3
def f(x1, x2):
    return (w[1]*x1 + w[2]*x2 + w[0]) / (-w[3])
x1, x2 = np.mgrid[-1:2:0.1, -1:2:0.1]
x3 = f(x1, x2)
#Feature space settings
fig = plt.figure()
ax = fig.add_subplot(111, projection="3d", facecolor="w")
ax.set_xlabel('x1')
ax.set_ylabel('x2')
ax.set_zlabel('x3')
ax.set_title("feature space")
#Drawing a linear discriminant function
ax.plot_surface(x1, x2, x3, cmap='BuGn')
i = 1
# Drawing the learning patterns (red: class C1, blue: class C2)
for point in x:
    if i <= x.shape[0]/2:
        ax.scatter(point[1], point[2], point[3], c='red')
    else:
        ax.scatter(point[1], point[2], point[3], c='blue')
    i += 1
#Display feature space
plt.show()
Running the script produces the following feature space. As the figure shows, the red and blue points are separated by the green plane (the hyperplane), so the weights were indeed obtained by the learning rule.
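Besides looking at the plot, the result can also be checked numerically. The following small addition (a sketch that assumes it is appended to perceptron_learning.py, so x and the learned w are still in scope) prints g(x) for every learning pattern; the first four values should be positive and the last four negative:

# Numerical check of the learned weights
for i, data in enumerate(x):
    g_val = np.dot(data, w)
    expected = "C1 (g > 0)" if i < x.shape[0] / 2 else "C2 (g < 0)"
    print(f"x{i + 1}: g(x) = {g_val}, expected {expected}")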