Continuing from "Classification of Machine Learning", I will work through the algorithm step by step: the theory, an implementation in Python, and an analysis using scikit-learn. I'm writing this for my own learning, so please forgive any mistakes.
Starting with this post, I will take up classification problems, beginning with the most basic one: the perceptron.
I referred to the following sites this time. Thank you very much.
Two-class classification means outputting "1" or "0" (or "1" or "-1") for a given input. Rather than saying "it may break down with 60% probability", we give a black-and-white answer: it breaks down or it doesn't. There are many kinds of two-class classifiers, and the **perceptron** is the most basic one.
The perceptron is a model inspired by nerve cells: it applies weights to many inputs and outputs 1 when the weighted sum exceeds a certain threshold. It is the picture you often see in illustrations of neural networks.
With $n$ inputs $\boldsymbol{x} = (x_0, x_1, \cdots, x_{n})$ and weights $\boldsymbol{w} = (w_0, w_1, \cdots, w_{n})$, adding them all up gives
w_0x_0+w_1x_1+\cdots+w_{n}x_{n} \\
=\sum_{i=0}^{n}w_ix_i \\
= \boldsymbol{w}^T\boldsymbol{x}
where $T$ denotes the transpose. If this value is positive, the perceptron outputs 1; if it is negative, it outputs -1. A function that returns only such values of -1 or 1 is called a step function.
The constant term that does not depend on the input is called the **bias term**; if we take the bias to be $w_0$ and fix $x_0 = 1$, the formula above can be used as is.
Since Python can compute matrix (and vector) products with the `@` operator, calling the perceptron's input `input` and its output `output`, this becomes
import numpy as np
w = np.array([1.,-2.,3.,-4.])
x = np.array([1.,2.,3.,4.])
input = w.T @ x
output = 1 if input>=0 else -1
It's easy.
The perceptron is trained by so-called "supervised learning". For given inputs $\boldsymbol{x}$ with correct labels $\boldsymbol{t} = (t_0, t_1, \cdots, t_n)$, we need to find a $\boldsymbol{w}$ such that $\boldsymbol{w}^T\boldsymbol{x}$ returns the correct label.
As with regression, this is learned from the training data. For the perceptron, the same approach works: define a loss function and update the parameter $\boldsymbol{w}$ so as to minimize the loss.
So what loss function should we use? The idea is that a correctly classified sample incurs no loss, while a misclassified sample incurs a loss according to its distance from the boundary that separates the two classes.
The hinge function is often used to meet this kind of requirement. The perceptron in scikit-learn also seems to use a hinge-type loss. The hinge function is

$$h(x) = \max(0, x)$$

that is, a function that stays at zero up to a certain value and increases beyond it.

As for the loss function: if the correct label $t_n$ of a sample and the predicted value $step(\boldsymbol{w}^T\boldsymbol{x}_n)$ agree, then $t_n\boldsymbol{w}^T\boldsymbol{x}_n$ is positive, and if they differ it is negative. Since a smaller loss is better, we take the loss function to be

$$L = \max(0, -t_n\boldsymbol{w}^T\boldsymbol{x}_n)$$
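As a quick check of the sign argument, here is a tiny sketch (with arbitrarily chosen numbers) showing that the loss is zero when the label agrees with the prediction and positive when it does not:

```python
import numpy as np

w = np.array([0.5, 1.0, -1.0])   # made-up weights, bias first
x = np.array([1.0, 2.0, 1.0])    # made-up sample, x[0] = 1 for the bias
# w.T @ x = 1.5 > 0, so the prediction is +1
for t in (1, -1):
    loss = max(0.0, -t * (w.T @ x))
    print(t, loss)               # t=1 -> loss 0.0, t=-1 -> loss 1.5
```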
Taking the partial derivative of $L$ with respect to $\boldsymbol{w}$ (for a misclassified sample, where the loss equals $-t_n\boldsymbol{w}^T\boldsymbol{x}_n$) gives

\frac{\partial L}{\partial \boldsymbol{w}}=-t_n\boldsymbol{x}_n
so the recurrence formula for updating $\boldsymbol{w}$ can be written as

\boldsymbol{w}_{i+1}=\boldsymbol{w}_{i}+\eta t_n\boldsymbol{x}_n

where $\eta$ is the learning rate.
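And a minimal sketch of a single update, reusing the misclassified case from the example above; after the update, $t\,\boldsymbol{w}^T\boldsymbol{x}$ moves toward the positive side:

```python
import numpy as np

eta = 0.1
w = np.array([0.5, 1.0, -1.0])
x = np.array([1.0, 2.0, 1.0])
t = -1                        # correct label is -1, but w.T @ x = 1.5 > 0: misclassified

print(t * (w.T @ x))          # -1.5
w = w + eta * t * x           # w <- w + eta * t * x
print(t * (w.T @ x))          # -0.9: still wrong, but closer to the boundary
```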
Now let's actually implement it in Python. The data used is the familiar iris dataset from scikit-learn. See the following for a detailed description of the dataset.
Since this is two-class classification, I will restrict the data accordingly. Any pair of classes would do, but by my own arbitrary choice the labels are "versicolor" and "virginica", and "sepal length (cm)" and "petal width (cm)" were selected as the features.
First, visualize the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_iris
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
df_iris['target'] = iris.target_names[iris.target]
fig, ax = plt.subplots()
x1 = df_iris[df_iris['target']=='versicolor'].iloc[:,3].values
y1 = df_iris[df_iris['target']=='versicolor'].iloc[:,0].values
x2 = df_iris[df_iris['target']=='virginica'].iloc[:,3].values
y2 = df_iris[df_iris['target']=='virginica'].iloc[:,0].values
ax.scatter(x1, y1, color='red', marker='o', label='versicolor')
ax.scatter(x2, y2, color='blue', marker='s', label='virginica')
ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")
ax.legend()
plt.show()
It looks like the two classes can more or less be separated (admittedly, I chose data that would turn out this way).
Let's implement the Perceptron class. The bias term is added explicitly.
class Perceptron:
    def __init__(self, eta=0.1, n_iter=1000):
        self.eta = eta            # learning rate
        self.n_iter = n_iter      # number of passes over the training data
        self.w = np.array([])

    def fit(self, x, y):
        # add the bias term: w[0] is the bias and its input is fixed to 1
        self.w = np.ones(len(x[0]) + 1)
        x = np.hstack([np.ones((len(x), 1)), x])
        for _ in range(self.n_iter):
            for i in range(len(x)):
                # hinge loss max(0, -t * w^T x): zero when the sample is classified correctly
                loss = np.max([0, -y[i] * self.w.T @ x[i]])
                if loss != 0:
                    # update only misclassified samples: w <- w + eta * t * x
                    self.w += self.eta * y[i] * x[i]

    def predict(self, x):
        x = np.hstack([1., x])    # prepend the bias input
        return 1 if self.w.T @ x >= 0 else -1

    @property
    def w_(self):
        return self.w
The hinge loss is computed for each sample, and whenever a sample is misclassified the weights are updated by (stochastic) gradient descent. Here the computation simply stops after the specified number of iterations, but it could instead stop once the error falls below some threshold, as sketched below.
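For instance, one possible way to do that (a sketch only, not used in the rest of this post; `PerceptronEarlyStop` is a name I made up) is to stop as soon as a full pass over the data produces no updates:

```python
class PerceptronEarlyStop(Perceptron):
    # same as the Perceptron above, but stops once a whole pass makes no mistakes
    def fit(self, x, y):
        self.w = np.ones(len(x[0]) + 1)
        x = np.hstack([np.ones((len(x), 1)), x])
        for _ in range(self.n_iter):
            n_updates = 0
            for i in range(len(x)):
                if -y[i] * (self.w.T @ x[i]) > 0:   # nonzero hinge loss: misclassified
                    self.w += self.eta * y[i] * x[i]
                    n_updates += 1
            if n_updates == 0:                      # every sample was classified correctly
                break
```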
Let's feed the data into the class above, train it, and draw the decision boundary.
df = df_iris[df_iris['target']!='setosa']
df = df.drop(df.columns[[1,2]], axis=1)
df['target'] = df['target'].map({'versicolor':1, 'virginica':-1})
x = df.iloc[:,0:2].values
y = df['target'].values
model = Perceptron()
model.fit(x, y)
#Drawing a graph
fig, ax = plt.subplots()
x1 = df_iris[df_iris['target']=='versicolor'].iloc[:,3].values
y1 = df_iris[df_iris['target']=='versicolor'].iloc[:,0].values
x2 = df_iris[df_iris['target']=='virginica'].iloc[:,3].values
y2 = df_iris[df_iris['target']=='virginica'].iloc[:,0].values
ax.scatter(x1, y1, color='red', marker='o', label='versicolor')
ax.scatter(x2, y2, color='blue', marker='s', label='virginica')
ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")
#Draw classification boundaries
w = model.w_
x_fig = np.linspace(1.,2.5,100)
y_fig = [-w[2]/w[1]*xi-w[0]/w[1] for xi in x_fig]
ax.plot(x_fig, y_fig)
ax.set_ylim(4.8,8.2)
ax.legend()
plt.show()
It seems that virginica is classified correctly, but some versicolor samples are not. Is that just how it is?
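To put a number on this, we can check the training accuracy of the hand-made perceptron with its `predict` method (a quick sketch using the `x`, `y`, and `model` defined in the training cell above):

```python
# fraction of training samples the hand-made Perceptron classifies correctly
pred = np.array([model.predict(xi) for xi in x])
print((pred == y).mean())
```

Next, let's try the same classification with scikit-learn's Perceptron.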
df = df_iris[df_iris['target']!='setosa']
df = df.drop(df.columns[[1,2]], axis=1)
df['target'] = df['target'].map({'versicolor':1, 'virginica':-1})
x = df.iloc[:,0:2].values
y = df['target'].values
from sklearn.linear_model import Perceptron
model = Perceptron(max_iter=40, eta0=0.1)
model.fit(x,y)
#The graph part is omitted
This time, in contrast, versicolor can be classified. The loss function may be slightly different, but I haven't verified that.
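For reference, the omitted graph can be drawn just like before; scikit-learn's `Perceptron` exposes the learned weights as `coef_` and the bias as `intercept_` (a sketch that reuses the scatter data `x1`, `y1`, `x2`, `y2` prepared earlier):

```python
# draw the boundary learned by scikit-learn's Perceptron
fig, ax = plt.subplots()
ax.scatter(x1, y1, color='red', marker='o', label='versicolor')
ax.scatter(x2, y2, color='blue', marker='s', label='virginica')
ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")
w = model.coef_[0]           # weights for [sepal length, petal width]
b = model.intercept_[0]      # bias term
x_fig = np.linspace(1., 2.5, 100)
y_fig = [-w[1] / w[0] * xi - b / w[0] for xi in x_fig]  # solve w[0]*sepal + w[1]*petal + b = 0
ax.plot(x_fig, y_fig)
ax.set_ylim(4.8, 8.2)
ax.legend()
plt.show()
```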
In this post I looked at the perceptron, the basis of classifiers. Since deep learning is a model that combines a large number of perceptrons, understanding the perceptron will only become more important.