- A network consisting of three layers: an input layer, an intermediate layer, and an output layer.
- The layers are fully connected.
- All neurons output $1$ or $0$.
The purpose of the perceptron is to learn the weights (synaptic weights) between the intermediate layer and the output layer so that it produces the output pattern corresponding to each input pattern.
Suppose the input layer has $M$ neurons. Each of them simply passes the external input through unchanged, so the output of the $i$-th input-layer neuron is $output_i = input_i$.
Suppose the intermediate layer has $N$ neurons. The input given to the $j$-th neuron in the intermediate layer is the sum of the output values of all input-layer neurons, each multiplied by the synaptic weight $w_{i,j}$:

$$u_j = \sum_{i=1}^{M} w_{i,j}\, output_i$$

A threshold $\theta_j$ is then set for each intermediate-layer neuron and subtracted from the signal it receives. Since the output must be $0$ or $1$, we also need an output function $f$. The output value of the $j$-th neuron in the intermediate layer is therefore $output_j = f(u_j - \theta_j)$, where $f$ is defined by the following formula.
$$
f(u) = \left\{
\begin{array}{ll}
1 & (u \gt 0) \\
0 & (u \leq 0)
\end{array}
\right.
$$
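As a minimal sketch of this computation for a single intermediate-layer neuron (the variable names and numbers below are illustrative, not taken from the program later in this article):

import numpy as np

def f(u):
    # the step function defined above
    return 1 if u > 0 else 0

input_out = np.array([1, 0, 1])          # outputs of the M = 3 input-layer neurons
w_j = np.array([0.2, -0.1, 0.4])         # synaptic weights w_{i,j} into neuron j
theta_j = 0.5                            # threshold of neuron j

u_j = np.dot(w_j, input_out) - theta_j   # weighted sum minus the threshold
output_j = f(u_j)                        # 0.6 - 0.5 = 0.1 > 0, so output_j = 1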
The input of the output layer, like the input of the intermediate layer, is the sum of the output values of all neurons in the previous layer, each multiplied by a synaptic weight. Let the number of neurons in the output layer be $1$.
The output of the output layer is obtained, as in the intermediate layer, by subtracting the threshold from the input and applying the same step function $f$:

$$output_o = f\left(\sum_{j=1}^{N} w_{j,o}\, output_j - \theta_o\right)$$
During training, only the synaptic weights $w_{j,o}$ between the intermediate layer and the output layer are updated; all other parameters are left unchanged:

$$w_{j,o} \leftarrow w_{j,o} + \eta\,(t_o - output_o)\, output_j$$

$\eta$ is the learning rate, and it is usually set to a small positive value. $t_o - output_o$ is the difference between the teacher output $t_o$ and the actual output $output_o$, so the synaptic weights are updated only when the computed result differs from the teacher signal.
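As a tiny worked example of one update step (all numbers are made up): if the network answers $1$ but the teacher signal is $0$, the weights coming from the intermediate neurons that fired are pushed down.

import numpy as np

eta = 0.1
mid_out = np.array([1, 0, 1])        # outputs of the intermediate layer
w = np.array([0.2, -0.4, 0.1])       # weights w_{j,o} to the single output neuron
theta = 0.05                         # output-layer threshold (not updated)

out = int(w @ mid_out - theta > 0)   # 0.3 - 0.05 = 0.25 > 0, so out = 1
t = 0                                # teacher signal
w = w + eta * (t - out) * mid_out    # only weights from neurons that output 1 change
print(w)                             # [ 0.1 -0.4  0. ]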
The explanation and proof of the perceptron convergence theorem are omitted here; if you are interested, please look it up.
Library imports and input path settings
import numpy as np
import matplotlib.pyplot as plt
PATH_X = "./../input_x.npy"
PATH_Y = "./../input_y.npy"
Convert the input data from $(x, y)$ into a sequence of $0$s and $1$s of length $8$.
def to_input(data):
    x = data[0]
    y = data[1]
    # pack the two coordinates into one integer (x and y are assumed to fit in 4 bits each)
    n = x * 16 + y
    # write it as an 8-digit binary string and turn that into an array of 0s and 1s
    return np.array([int(k) for k in format(n, '08b')])
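For example, assuming both coordinates fit in $4$ bits, the point $(3, 5)$ is packed as $3 \times 16 + 5 = 53$, which is `00110101` in binary:

print(to_input(np.array([3, 5])))   # [0 0 1 1 0 1 0 1]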
Perceptron class. __Caution!__ In the program, the weight calculations described by the formulas above are carried out as vector operations.
class Perceptron:
    def __init__(self, m, n, o):
        # decide initial weights in [-0.005, 0.005)
        # one column is added so the threshold can be handled as an ordinary weight
        self.w_IM = np.random.rand(n, m + 1) - 0.5
        self.w_IM = self.w_IM / 100
        self.w_MO = np.random.rand(o, n + 1) - 0.5
        self.w_MO = self.w_MO / 100

    # calculate accuracy
    def get_acc(self, x, y):
        ok = 0
        for i in range(len(x)):
            # append a neuron that always outputs 1 (it plays the role of the threshold)
            mid_in = np.inner(np.append(x[i], 1.), self.w_IM)
            mid_out = np.array([int(k > 0) for k in mid_in])
            # append a neuron that always outputs 1 (it plays the role of the threshold)
            out_in = np.inner(np.append(mid_out, 1.), self.w_MO)
            ok += int(int(out_in[0] > 0) == y[i])
        return ok / len(x)

    def learn(self, train_x, train_y, eta=0.00001):
        # append a neuron that always outputs 1 (it plays the role of the threshold)
        mid_in = np.inner(np.append(train_x, 1.), self.w_IM)
        mid_out = np.array([int(k > 0) for k in mid_in])
        # append a neuron that always outputs 1 (it plays the role of the threshold)
        out_in = np.inner(np.append(mid_out, 1.), self.w_MO)
        out = int(out_in[0] > 0)
        # update the intermediate-to-output weights from the output and the teacher value
        self.w_MO[0, :-1] = self.w_MO[0, :-1] + eta * (train_y - out) * mid_out
Parameter setting and result graph drawing
def main():
    # read data
    x = np.load(PATH_X)
    y = np.load(PATH_Y)
    # split data into a training half and a test half
    train_x, test_x = np.split(x, 2)
    train_y, test_y = np.split(y, 2)
    # preprocess - transform the data into network inputs
    datas = np.array([to_input(k) for k in train_x])
    tests = np.array([to_input(k) for k in test_x])
    # number of neurons in the input layer
    m = 8
    # number of neurons in the intermediate layer
    n = 10
    # number of neurons in the output layer
    o = 1
    # define the perceptron
    P = Perceptron(m, n, o)
    # learning time (number of epochs)
    N = 200
    cnt = 0
    # epoch axis for the accuracy plot (x is reused here)
    x = np.linspace(0, N, N)
    acc_train = np.copy(x)
    acc_test = np.copy(x)
    while True:
        acc = P.get_acc(datas, train_y)
        acc_train[cnt] = acc
        acc = P.get_acc(tests, test_y)
        acc_test[cnt] = acc
        print("Try ", cnt, ": ", acc)
        cnt += 1
        for i in range(len(datas)):
            P.learn(datas[i], train_y[i])
        if cnt >= N:
            break
    plt.plot(x, acc_train, label="train")
    plt.plot(x, acc_test, label="test")
    plt.legend()
    plt.savefig("result.png")

if __name__ == "__main__":
    main()
The code is also uploaded to GitHub: https://github.com/xuelei7/NeuralNetwork/tree/master/Perceptron
Result for $30$ intermediate-layer neurons:
Result for $100$ intermediate-layer neurons:
If you find anything incorrect, please contact the author so it can be fixed.
"Neural Network", Yasunari Yoshitomi, Asakura Shoten,