Explanation and implementation of simple perceptron

Goal

The simple perceptron is the foundation of machine learning, and especially of neural networks. I am still studying myself, but let's start implementing neural networks with a simple perceptron and eventually work our way up to implementing Deep Learning! (I'll do my best too.)

What is a simple perceptron?

A simple perceptron is a neural network that takes an array as input and returns a single number, 0 or 1. The algorithm originated as an imitation of brain cells, but keep in mind that it is only a rough imitation and extremely simplified. The only mathematical operations that appear in a simple perceptron are multiplication and addition. There are only two formulas to remember in total, and the learning algorithm alone is about 20 lines of source code.

Caution

I do not yet understand this topic deeply, and a detailed proof of the learning algorithm would make this article very long. The formulas themselves are few enough that you can memorize them and implement them as they are, so I will not explain why this algorithm actually learns. Treat this as an introduction and a stepping stone. I would like to post a detailed explanation separately once I understand it better. (It still got fairly long, because I emphasize clarity.)

Simplified brain cells

Let me briefly describe the actual structure of a brain cell. Input to a brain cell takes place through processes called dendrites. Each dendrite transmits its signal either easily or with difficulty. Expressed as a formula, this is simply multiplication. Let i be the input and w be the ease of transmission.

i \times w

The ease of transmission is called a weight, which is why it is written w. There are multiple dendrites, and likewise the input to a simple perceptron is an array of numbers. Therefore, this calculation is done for each input element. For example, if there are 5 elements:

i_0 \times w_0\\
i_1 \times w_1\\
i_2 \times w_2\\
i_3 \times w_3\\
i_4 \times w_4\\ 

The numbers written next to i and w are subscripts. i and w correspond to arrays in a program, so i_0 × w_0 means i[0] * w[0]. You can see that the weight array and the input array have the same number of elements, and elements with the same subscript are multiplied together.
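As a concrete illustration (the file name and all numbers below are made up for this example), the element-wise products look like this in Python:

products_example.py

i = [1, 0, 1, 1, 0]             #example input with 5 elements
w = [0.2, -0.5, 0.1, 0.4, 0.3]  #example weights, one per input element
products = [i[k] * w[k] for k in range(5)]  #i[0]*w[0], i[1]*w[1], ..., i[4]*w[4]
print(products)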

Input to a brain cell comes in through each dendrite. On the other hand, it has only one output, called an axon. Both the dendrites and the axon extend from what is called the cell body. The inputs from the dendrites merge at the cell body, and the cell body sends its output along the single axon.

The input and transmission of signals along the dendrites could be expressed simply by multiplication, as in the formulas above. Next, these inputs have to be merged at the cell body. The cell body simply sums the inputs that came through the dendrites; in other words, merging is realized by addition.

i_0 \times w_0 + i_1 \times w_1 + i_2 \times w_2 + i_3 \times w_3 + i_4 \times w_4

Because formulas like this get long, we use an abbreviation. The symbol sigma (Σ) is used to abbreviate a sum. For example, the sum of 1 to 5 is written as follows.

\sum_{i=1}^{5} i = 1 + 2 + 3 + 4 + 5

sigma.py


tmp = 0
for i in range(1,6):
    tmp += i

I have also shown the Python source code above for reference. Sigma (Σ) is very simple: below the sigma you write the index variable and its initial value. That is the i = 1 part. Note that this i is the index, not the input. The variable name can be anything, not just i. Above the sigma is the final value of the index; the variable i is incremented by exactly 1 each step. The sigma means that all of these terms are added together.

The following is the formula for the inputs coming in through the dendrites. Since i is already used for the input, the index variable is j instead.

\sum_{j=0}^{4} i_j \times w_j = i_0 \times w_0 + i_1 \times w_1 + i_2 \times w_2 + i_3 \times w_3 + i_4 \times w_4

sigma.py


#It is assumed that i and w are predefined.
tmp = 0
for j in range(5):
    tmp += i[j] * w[j]

It is inconvenient not to have a name for this value, so let's call it the weighted input. For now, the variable name will be z. Also, the number of elements will be written n so that the formula works for any number of elements, not just 5.

z = \sum_{j=0}^{n-1} i_j w_j

This completes the formula for the weighted input. With it we can express the inputs, the ease of transmission of each input, and how they are summed together. Finally, we need a formula that determines the output from this sum.
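As a minimal sketch (assuming i and w are already defined as lists of the same length), the weighted input formula corresponds to the following Python:

weighted_input.py

#It is assumed that i and w are predefined lists of the same length.
n = len(i)
z = 0
for j in range(n):
    z += i[j] * w[j]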

Again, this mimics real brain cells. An actual brain cell transmits a signal if its input is above a certain level and does not transmit one if it is below that level. Producing a signal once the input exceeds a certain level is called firing.

The Simple Perceptron is a very simple model. Therefore, this formula is also simple. Let z be the weighted input and o be the output.

o = 
\begin{cases}
1 & \text{if } z > 0\\
0 & \text{if } z \leqq 0
\end{cases}

step.py


#Let z be predefined
if z > 0:
    o = 1
elif z <= 0:
    o = 0

That's all. If the weighted input is greater than 0 the perceptron fires (outputs 1); otherwise it does not fire (outputs 0). If you have been reading carefully, you may wonder why firing is decided around 0, since that clearly presumes negative signals can enter the cell body.

But remember that the simple perceptron is a very simple model. Firing when the value is greater than 0 is just a specification chosen to keep the perceptron simple. That concludes the discussion of the perceptron's output.

Summarizing the above, the Python code for the output of the simple perceptron is as follows.

output.py


#It is assumed that i and w are predefined.

def dot(vec0,vec1):
    tmp = 0
    for i in range(len(vec0)):
        tmp += vec0[i] * vec1[i]
    return tmp

def step(num):
    if num > 0:
        return 1
    else:
        return 0

o = step(dot(i,w))

A function that returns 1 when its input is greater than 0 (and 0 otherwise) is known in mathematics as a step function. The operation that multiplies corresponding elements and takes their sum is called the inner product, or dot product in English, which is where the function name dot comes from.

output.py


step(dot(i,w))

If you remember the one-line expression above, you can write the output algorithm. The remaining formula to remember is the learning formula.

Learning in the perceptron

As noted at the beginning, this article does not explain why this algorithm is able to learn. The purpose is to make this a warm-up exercise, through implementation, for the neural networks that come next. The ultimate goal is Deep Learning. (Even so, I ended up explaining quite a lot ...)

However, memorizing the formula blindly is difficult, so I will include some explanation. First, the goal of a perceptron is to return the proper output for a given input. For this we need data consisting of inputs together with the appropriate output for each input. The appropriate output is called a teacher label.

Among neural networks, the simple perceptron falls under what is called supervised learning. In supervised learning, you need the data to be trained on and, for each input, the appropriate output you want to see, that is, the teacher label. Learning in a simple perceptron means updating the weights w based on this data.

This time, we will use learning the logical AND operation as an example. The AND operation takes two numbers (each 0 or 1), outputs 1 if both are 1, and outputs 0 otherwise; it is one of the basic operations of a computer.

a | b | a and b
0 | 0 | 0
0 | 1 | 0
1 | 0 | 0
1 | 1 | 1

Let's prepare the training data and teacher labels. In this field, x and y are often used for these, so call them train_x and train_y.

data.py


train_x = [[0,0],[0,1],[1,0],[1,1]]
train_y = [0,0,0,1]

There seems to be no particular problem here: it expresses that 0 should be output for [0,0], [0,1], [1,0], and 1 should be output for [1,1].

But in fact, this is a mistake.

Why? Because something is missing from the explanation of the simple perceptron given so far: the bias term. This is the correct version.

data.py


train_x = [[0,0,1],[0,1,1],[1,0,1],[1,1,1]]
train_y = [0,0,0,1]

A 1 has been appended to every element of train_x. The bias term is a fixed value shared by all input examples. It can be something other than 1 as long as it is fixed (for example, [[0,0,0.5],[0,1,0.5],[1,0,0.5],[1,1,0.5]]), but usually 1 is used as the bias term. So why do we need a bias term?

Let's take the case where the bias term is 1 as an example and see what happens to the weighted input z. An extra pair i[n] and w[n] is added.

z = \sum_{j=0}^{n-1} i_j w_j + i_n w_n

Here, the bias term i[n] is 1, so this becomes:

z = \sum_{j=0}^{n-1} i_j w_j + w_n

In short, adding a bias term of 1 is the same as adding one extra weight. During learning this w[n] is updated like any other weight, and as a result the weighted input z is shifted up or down by the value of w[n]. This works because weights can take negative values, as mentioned above. In other words, adding a bias term allows the weighted input z to be biased toward either side. (Hence the name: bias means a leaning to one side.)

In terms of the actual behavior of brain cells, this is equivalent to indirectly changing the firing threshold (the level of input above which a signal is output is called the threshold). The threshold of the step function itself is of course still 0, but, for example, a bias contribution of 0.5 indirectly corresponds to the threshold moving to -0.5.
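Here is a minimal sketch of that effect (all numbers are made up for illustration): a negative bias weight acts like raising the firing threshold.

bias_example.py

#Without a bias term: z = 0.3 > 0, so the perceptron fires
i_no_bias = [0, 1]
w_no_bias = [0.3, 0.3]

#Same input with the bias input 1 appended and bias weight -0.5:
#z = 0.3 - 0.5 = -0.2 <= 0, so the perceptron no longer fires,
#as if the threshold had been raised to 0.5
i_bias = [0, 1, 1]
w_bias = [0.3, 0.3, -0.5]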

Changing the threshold determines how easily the perceptron outputs 1 or 0, so you can see that it is an important factor. So let's include the bias. Even when learning AND, it will not work unless a bias term is included. Mathematically, this can (apparently) be explained by the fact that when separating with a hyperplane, having no intercept means you can only draw hyperplanes that pass through the origin, which reduces expressiveness, but I will omit that here. I would like to give a mathematical proof later.

Let's get back to the topic: learning in a simple perceptron. Learning means updating the weights. For that, as noted, we need teacher data: a set of inputs and the outputs expected for them. What else do we need in order to change the weights so that the correct output is returned? Of course, the current weights themselves.

data.py


train_x = [[0,0,1],[0,1,1],[1,0,1],[1,1,1]]
train_y = [0,0,0,1]
weight  = [0,0,0]

For now, I initialized the weight vector with all zeros. The weight vector simply means the weights. An array containing only numbers is commonly called a vector, so from here on, when I say vector, take it to mean an array.

What else do we need? In order to update the weight vector, we first have to decide whether it needs updating at all. Since the weight vector obviously needs updating exactly when the wrong output is produced, we need the output o.

data.py


train_x = [[0,0,1],[0,1,1],[1,0,1],[1,1,1]]
train_y = [0,0,0,1]
weight  = [0,0,0]
#Example: output when the input i is train_x[0]
o = step(dot(train_x[0],weight))

What else is needed? In fact, that's it: with the data above you can update the weight vector. Well, not quite. Normally a learning coefficient is also included. The learning coefficient is a number that controls how much the weights change on each update. However, the algorithm converges (learning finishes properly) even with a learning coefficient of 1, and when the weight vector starts as all zeros, the number of updates needed to converge is the same regardless of the learning coefficient (this is apparently shown in the references, to be added later), so it can be omitted. Still, let's include it for now.

data.py


train_x = [[0,0,1],[0,1,1],[1,0,1],[1,1,1]]
train_y = [0,0,0,1]
weight  = [0,0,0]
#Example: output when the input i is train_x[0]
o = step(dot(train_x[0],weight))
#Example: Learning coefficient
eta = 0.1

Now let's look at the update algorithm itself. First, we have to decide whether to increase or decrease the weights when the output is incorrect. If we want to output 0 but output 1, the weights should be decreased; if we want to output 1 but output 0, the weights should be increased. And of course, if the output matches, the weights should not change.

When the value we want to output (the teacher label) is y and the actual output is o, this can be expressed as

y - o

The table below shows each case.

y | o | y - o
0 | 0 | 0
0 | 1 | -1
1 | 0 | 1
1 | 1 | 0

When the output o is too large, y - o is -1 and the weights are decreased; when o is too small, y - o is +1 and the weights are increased.
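As a quick check of the table (a throwaway sketch, not part of the perceptron itself):

y_o_table.py

#Print y - o for every combination of teacher label y and output o
for y in (0, 1):
    for o in (0, 1):
        print(y, o, y - o)
#Prints the same rows as the table: 0, -1, 1, 0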

However, this value applies to the weight vector as a whole. Let me explain what that means. In actual code, the expression above is:

y_o.py


#Example: calculate using the elements of train_x and train_y at subscript 0
y = train_y[0]
o = step(dot(train_x[0],weight))
y - o

The point is that y - o is computed from the whole weight vector. It tells us whether this weight vector should be updated, and in which direction (positive or negative), but not which element should be updated. This is obviously no good: if you update all of the weights uniformly, every element of the weight vector ends up with the same value! What should we do?

Let's look back at the behavior of the perceptron. A perceptron takes an input vector, multiplies each element by the corresponding weight, takes the sum, passes it to the step function, and outputs 0 or 1. So, of course, if you give it the same input it returns the same output. And if you give it different inputs, in particular inputs whose teacher labels differ, it should ideally return different outputs.

Returning different outputs means that the weighted inputs need to be biased towards either positive or negative numbers. This is because the weighted input is passed through the step function.

For data whose teacher label is 1 the weighted input should be biased toward a positive number, and for data whose teacher label is 0 it should be biased toward a negative number. Put simply, the weights must contain both positive and negative values; they cannot all be the same.

It also seems that data with a teacher label of 1 should "pass through" the positive weights, and data with a teacher label of 0 should pass through the negative weights. Then, naturally, data with label 1 that passed through positive weights produces 1, and data with label 0 that passed through negative weights produces 0.

However, whether a weight is positive or negative, the perceptron of course passes the input vector through all of the weights, so this phrasing is not strictly correct. How can we make it correct? Consider the case where one of the elements of the input vector is 0.

Since that input is 0, the weight corresponding to it has no effect when the dot product is taken: 0 × weight = 0. That is, when an input is 0, the corresponding weight is effectively ignored when computing the output. It is as if the input did not pass through that weight at all.

If an input is 0, the input does not pass through that weight, so that weight does not need to be updated. Moreover, some data with a teacher label of 1 may not pass through a given weight while data with a teacher label of 0 does pass through it; if you update a weight that does not need updating, you end up changing, without permission and in your own direction, a weight that other data will pass through, which has the opposite effect.

You can probably see it already: when updating the weight vector, a weight should not be updated if its input is 0, and should be updated if its input is 1. Since inputs are not necessarily limited to 0 or 1, we can simply make the amount by which each weight is updated proportional to its input.

That is, for example, the update amount for subscript 0 of the weight vector is

(y - o) \times i_0

It will be. Oops, you forgot to include the learning factor.

(y - o) \times i_0 \times \eta

What else do we need ... nothing! This is the entire amount to add when updating w[0]. Writing the new w[0] as w with a dot over it, the formula looks like this.

\dot{w}_0 = w_0 + (y - o) \times i_0 \times \eta

Let's replace the subscript with j so the formula works for indices other than 0.

\dot{w}_j = w_j + (y - o) \times i_j \times \eta

That is the second and last formula to remember. The rest is just a matter of writing out the program. Thank you for reading this far.

Implementation

Let's implement it piece by piece. I assume you can read Python. First, we need the function that computes the inner product, and the step function.

dot.py


def dot(vec0,vec1):
    tmp = 0
    for i, j in zip(vec0,vec1):
        tmp += i * j
    return tmp

The zip function pairs up two lists; here it yields corresponding elements of the two lists into i and j, starting from the front. We just multiply them and add them up. The zip function is handy, so it's worth remembering.
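For example (a throwaway illustration of zip itself):

zip_example.py

#zip pairs up elements of two lists by position
for i, j in zip([1, 2, 3], [4, 5, 6]):
    print(i, j)  #prints 1 4, then 2 5, then 3 6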

step.py


def step(num):
    if num > 0:
        return 1
    else:
        return 0

Let's also define the output function. I named it feedforward; calling it output would be fine too.

feedforward.py


def feedforward(i,w):
    return step(dot(i,w))

Next is learning. Remember this formula.

\dot{w}_j = w_j + (y - o) \times i_j \times \eta

For now, writing just the arguments the function takes, it looks like this.

train.py


def train(w,i,y,eta):
    pass

Here i and y are single elements of train_x and train_y. train_x and train_y are the teacher data, but we train on the elements of the teacher data one at a time. This is called online learning, or sequential learning. Let's compute o first.

train.py


def train(w,i,y,eta):
    o = feedforward(i,w)

Since all of the w[j] need to be updated, loop over them with a for statement.

train.py


def train(w,i,y,eta):
    o = feedforward(i,w)
    for j in range(len(w)):
        w[j] = w[j] + (y - o) * i[j] * eta
    return w

Prepare teacher data, weight vector, and learning rate ...

data.py


train_x = [[0,0,1],[0,1,1],[1,0,1],[1,1,1]]
train_y = [0,0,0,1]
weight  = [0,0,0]
eta     = 0.1

Train on the teacher data one example at a time ...

main.py


for x,y in zip(train_x,train_y):
    weight = train(weight,x,y,eta)

Repeat this multiple times ... (each pass over the data is called an epoch, and the number of repetitions is the epoch count)

main.py


epoch = 100
for i in range(epoch):
    for x,y in zip(train_x,train_y):
        weight = train(weight,x,y,eta)

Complete!

simple_perceptron.py


#inner product
def dot(vec0,vec1):
    tmp = 0
    for i, j in zip(vec0,vec1):
        tmp += i * j
    return tmp
#step function
def step(num):
    if num > 0:
        return 1
    else:
        return 0
#output
def feedforward(i,w):
    return step(dot(i,w))
#Sequential learning
def train(w,i,y,eta):
    o = feedforward(i,w)
    for j in range(len(w)):
        w[j] = w[j] + (y - o) * i[j] * eta
    return w

#Main processing: learn AND
if __name__ == "__main__":
    train_x = [[0,0,1],[0,1,1],[1,0,1],[1,1,1]]
    train_y = [0,0,0,1]
    weight  = [0,0,0]
    eta     = 0.1

    epoch = 100
    for i in range(epoch):
        for x,y in zip(train_x,train_y):
            weight = train(weight,x,y,eta)

    #Verification: it's OK if 0, 0, 0, 1 are output
    for x in train_x:
        print(feedforward(x,weight))

The input data can be any array of numbers, that is, any vector. Digit recognition is also possible by converting handwritten images of 0 and 1 into a one-dimensional array (vector). The accuracy is so-so, though ...
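For instance, here is a minimal sketch of that idea (the tiny 3x3 "image" is made up for illustration; real handwritten digit images would be much larger):

flatten_example.py

#A made-up 3x3 binary "image"
image = [[0, 1, 0],
         [0, 1, 0],
         [0, 1, 0]]
#Flatten it into a one-dimensional vector and append the bias term 1
x = [pixel for row in image for pixel in row] + [1]
#After training on such vectors with teacher labels, x can be passed to feedforward(x, weight)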
