In writing this article, I referred to *Python Machine Learning Programming* and its accompanying GitHub repository. I am very grateful to its authors.
A neuron (nerve cell) is a cell that becomes excited when stimulated and transmits that stimulus to other cells.
A neuron looks something like the image below.
[Nikkei Cross Tech](https://xtech.nikkei.com/dm/atcl/feature/15/032300023/00003/)
It is said that there are about 200 billion neurons in the brain, and they are connected to one another as shown in the image below.
[Earth Seminar 36-3](http://blog.livedoor.jp/nara_suimeishi/archives/51595095.html)
These junctions between neurons are called synapses.
A simple diagram of a neuron is shown below.
There can be many input values, $ x_1 ... x_m $, but the output value is always either "fire" or "do not fire". Here, the output is 1 when the neuron fires and -1 when it does not (the choice of output values is arbitrary). By the way, $ w $ stands for weight, which determines the importance of each input value, and $ y $ represents the output value.
Expressed mathematically, this becomes:

z = x_1w_1 + ... + x_mw_m = \sum_{i=1}^{m} x_iw_i

The new quantity $ z $ introduced here is called the total input, or net input in English.
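For example, with made-up numbers (these values are only for illustration and do not appear in the original article), the total input is just a dot product:

```python
import numpy as np

# illustrative input values and weights
x = np.array([0.5, 1.0, -0.3])
w = np.array([0.2, 0.4, 0.1])

# total input z = x_1*w_1 + ... + x_m*w_m
z = np.dot(x, w)
print(z)  # 0.5*0.2 + 1.0*0.4 + (-0.3)*0.1 = 0.47
```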
The $ \theta $ shown in the image above is called the threshold: when $ z \geq \theta $, the neuron fires and 1 is output, while when $ z < \theta $, it does not fire and -1 is output. If we move $ \theta $ to the left-hand side (treating $ -\theta $ as a bias term that is always added to the total input), the firing condition becomes $ z \geq 0 $, and the decision rule can be written as:
f(z) = \left\{
\begin{array}{ll}
1 & (z \geq 0) \\
-1 & (z \lt 0)
\end{array}
\right.
Here $ f(z) $ is the **decision function**: it outputs 1 when $ z $ is 0 or more and -1 when $ z $ is less than 0.
As the graph of this function shows, the moment $ z $ reaches 0 the neuron fires and 1 is output.
A function that jumps abruptly at a certain threshold like this is called a step function; this version (outputting -1 instead of 0 below the threshold) is a variant of the **Heaviside step function**.
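As a minimal sketch (my own addition), this decision function can be written in one line with NumPy, using the 1 / -1 convention from above:

```python
import numpy as np

def decision_function(z):
    # 1 ("fire") when z >= 0, otherwise -1 ("do not fire")
    return np.where(z >= 0.0, 1, -1)

print(decision_function(0.47))   # 1
print(decision_function(-0.2))   # -1
```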
So far, we have talked about a simple model of the neuron called the formal neuron. Next, I will explain an algorithm called the perceptron.
Introduced in 1958, the simple perceptron runs on a very simple algorithm:

1. Initialize the weights to 0 or small random numbers.
2. For each input sample, calculate the output value and update the weights.

That's it.
The formula for updating the weight is as follows.
W_j := W_j + \Delta W_j
Each weight $ W_j $ on the left is updated by adding $ \Delta W_j $, where $ \Delta W_j $ can be written as

\Delta W_j = \eta\:(\:y^{(i)}-\hat{y}^{(i)}\:)\: x_{j}^{(i)}

Here, $ y^{(i)} $ is the true class label (the correct classification), $ \hat{y}^{(i)} $ is the output value (the classification result), $ \eta $ is the learning rate, and $ x_{j}^{(i)} $ is the input value.
Now that the update formula is in place, let's try it with concrete numbers.
For example, consider a sample that has been misclassified: the true label is 1, but the prediction is -1.
\Delta W_j = \eta\:(\:1-(-1)\:)\: x_{j}^{(i)} = \eta\:(2)\:x_{j}^{(i)}
If $ \eta $ is 1 and $ x_{j}^{(i)} $ is 0.5, the weight increases by 1.
This makes that sample's contribution to the total input ($ x_{j}^{(i)} W_j $) a larger positive number, so the sample is less likely to be misclassified again.
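The arithmetic of this example can be spelled out in a few lines (a sketch with the same illustrative numbers as above):

```python
# one weight update for a misclassified sample
eta = 1.0      # learning rate
x_j = 0.5      # input value
y_true = 1     # true class label
y_hat = -1     # predicted label (misclassified)

delta_w = eta * (y_true - y_hat) * x_j
print(delta_w)  # 1 * (1 - (-1)) * 0.5 = 1.0, so the weight increases by 1
```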
The **Simple Perceptron** simply keeps repeating this update until no samples are misclassified.
Simple perceptrons cannot separate two classes that are not linearly separable.
Also, because it cannot perform linear separation, a simple perceptron cannot learn the XOR operation. (What is the XOR operation? Consider two input values $ x_1 $ and $ x_2 $: the output is 1 when exactly one of them is 1, and 0 otherwise.)
| $ x_1 $ | $ x_2 $ | output |
|:---:|:---:|:---:|
| 0 | 0 | 0 |
| 0 | 1 | 1 |
| 1 | 1 | 0 |
| 1 | 0 | 1 |
Plotting these four points gives the picture below.
(Image source: Machine learning that even high school graduates can understand)
As you can see, this is not linearly separable.
In order to solve the XOR operation, it is necessary to implement a multi-layer perceptron.
Development environment: Google Colab (Chrome 83, macOS High Sierra)
```python
import numpy as np


class Perceptron(object):
    def __init__(self, eta=0.01, n_iter=50, random_state=1):
        self.eta = eta                    # learning rate
        self.n_iter = n_iter              # number of passes over the training data
        self.random_state = random_state  # seed for the weight initialization

    def fit(self, X, y):
        # initialize the weights with small random numbers (w_[0] is the bias)
        rgen = np.random.RandomState(self.random_state)
        self.w_ = rgen.normal(loc=0.0, scale=0.01, size=1 + X.shape[1])
        self.errors_ = []

        for _ in range(self.n_iter):
            errors = 0
            for xi, target in zip(X, y):
                # Delta w = eta * (y - y_hat) * x
                update = self.eta * (target - self.predict(xi))
                self.w_[1:] += update * xi
                self.w_[0] += update
                errors += int(update != 0.0)
            # record the number of misclassifications in each epoch
            self.errors_.append(errors)
        return self

    def net_input(self, X):
        # total input z = X . w + bias
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        # decision function: 1 if z >= 0, otherwise -1
        return np.where(self.net_input(X) >= 0.0, 1, -1)
```
Original code: GitHub
```python
# Load the data, check it, and define X and y
import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv('https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data',
                 header=None)
df.head()

# use the first 100 samples (setosa and versicolor) and two features
y = df.iloc[0:100, 4].values
y = np.where(y == 'Iris-setosa', -1, 1)   # setosa -> -1, versicolor -> 1
X = df.iloc[0:100, [0, 2]].values         # sepal length and petal length
```
Original code: GitHub
```python
from matplotlib.colors import ListedColormap

def plot_decision_regions(X, y, classifier, resolution=0.02):
    # setup marker generator and color map
    markers = ('s', 'x', 'o', '^', 'v')
    colors = ('red', 'blue', 'lightgreen', 'gray', 'cyan')
    cmap = ListedColormap(colors[:len(np.unique(y))])

    # plot the decision surface
    x1_min, x1_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    x2_min, x2_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx1, xx2 = np.meshgrid(np.arange(x1_min, x1_max, resolution),
                           np.arange(x2_min, x2_max, resolution))
    Z = classifier.predict(np.array([xx1.ravel(), xx2.ravel()]).T)
    Z = Z.reshape(xx1.shape)
    plt.contourf(xx1, xx2, Z, alpha=0.3, cmap=cmap)
    plt.xlim(xx1.min(), xx1.max())
    plt.ylim(xx2.min(), xx2.max())

    # plot class samples
    for idx, cl in enumerate(np.unique(y)):
        plt.scatter(x=X[y == cl, 0],
                    y=X[y == cl, 1],
                    alpha=0.8,
                    c=colors[idx],
                    marker=markers[idx],
                    label=cl,
                    edgecolor='black')
```
Original code: GitHub
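The plotting cell below uses a trained model `ppn`, but the training step itself does not appear above. A minimal version of that step (the hyperparameters here are illustrative) would be:

```python
# train the perceptron defined earlier on the Iris data
ppn = Perceptron(eta=0.1, n_iter=10)
ppn.fit(X, y)

# the number of misclassifications per epoch should drop to 0
print(ppn.errors_)
```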
```python
plot_decision_regions(X, y, classifier=ppn)
plt.xlabel('sepal length [cm]')
plt.ylabel('petal length [cm]')
plt.legend(loc='upper left')
# plt.savefig('images/02_08.png', dpi=300)
plt.show()
```
Original code: GitHub
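Finally, returning to the XOR discussion above, here is a small sketch (my own addition, not part of the original code) that trains the same Perceptron class on the XOR truth table. Because XOR is not linearly separable, the per-epoch error count never reaches 0:

```python
# XOR truth table, with outputs mapped to the -1 / 1 convention used in this article
X_xor = np.array([[0, 0], [0, 1], [1, 1], [1, 0]])
y_xor = np.array([-1, 1, -1, 1])

ppn_xor = Perceptron(eta=0.1, n_iter=20)
ppn_xor.fit(X_xor, y_xor)
print(ppn_xor.errors_)  # never reaches 0
```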
Next, I would like to try implementing a multilayer perceptron.