Continuing from "Classification of Machine Learning", I will work through the algorithm step by step: the theory, an implementation in Python, and an analysis using scikit-learn. I'm writing this for my own learning, so please forgive any mistakes.
Starting with this post, I will take up classification problems, beginning with the most basic one: the perceptron.
I referred to the following sites this time. Thank you very much.
Two-class classification means outputting "1" or "0" (or "1" or "-1") for a given input. Rather than saying "it may break down with 60% probability", we give a black-and-white answer: it breaks down or it doesn't. There are many kinds of two-class classifiers, and the **perceptron** is the most basic one.
The perceptron is a model inspired by nerve cells: it applies weights to many inputs and outputs 1 when the weighted sum exceeds a certain threshold. It is the picture you often see in illustrations of neural networks.
With $n$ inputs $\boldsymbol{x} = (x_0, x_1, \cdots, x_{n})$ and weights $\boldsymbol{w} = (w_0, w_1, \cdots, w_{n})$, adding them all up gives
w_0x_0+w_1x_1+\cdots+w_{n}x_{n} \\
=\sum_{i=0}^{n}w_ix_i \\
= \boldsymbol{w}^T\boldsymbol{x}
where $T$ denotes the transpose. If this value is positive, the perceptron outputs 1; if it is negative, it outputs -1. A function that returns only such values of -1 or 1 is called a step function.
The constant term that does not depend on the input is called the **bias term**; if we take the bias to be $w_0$ and fix $x_0 = 1$, the formula above can be used as is.
Since Python can compute matrix (and vector) products with the `@` operator, calling the perceptron's input `input` and its output `output`, this becomes
import numpy as np
w = np.array([1.,-2.,3.,-4.])
x = np.array([1.,2.,3.,4.])
input = w.T @ x
output = 1 if input>=0 else -1
It's easy.
The perceptron is trained by so-called "supervised learning". For given inputs $\boldsymbol{x}$ with correct labels $\boldsymbol{t} = (t_0, t_1, \cdots, t_n)$, we need to find a $\boldsymbol{w}$ such that $\boldsymbol{w}^T\boldsymbol{x}$ returns the correct label.
As with regression, this is learned from the training data. For the perceptron, the same approach works: define a loss function and update the parameter $\boldsymbol{w}$ so as to minimize the loss.
So what loss function should we use? The idea is that a correctly classified sample incurs no loss, while a misclassified sample incurs a loss according to its distance from the boundary that separates the two classes.
The hinge function is often used to meet this kind of requirement. The perceptron in scikit-learn also seems to use a hinge-type loss. The hinge function is

$$h(x) = \max(0, x)$$

that is, a function that stays at zero up to a certain value and increases beyond it.

As for the loss function: if the correct label $t_n$ of a sample and the predicted value $step(\boldsymbol{w}^T\boldsymbol{x}_n)$ agree, then $t_n\boldsymbol{w}^T\boldsymbol{x}_n$ is positive, and if they differ it is negative. Since a smaller loss is better, we take the loss function to be

$$L = \max(0, -t_n\boldsymbol{w}^T\boldsymbol{x}_n)$$
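As a quick check of the sign argument, here is a tiny sketch (with arbitrarily chosen numbers) showing that the loss is zero when the label agrees with the prediction and positive when it does not:

```python
import numpy as np

w = np.array([0.5, 1.0, -1.0])   # made-up weights, bias first
x = np.array([1.0, 2.0, 1.0])    # made-up sample, x[0] = 1 for the bias
# w.T @ x = 1.5 > 0, so the prediction is +1
for t in (1, -1):
    loss = max(0.0, -t * (w.T @ x))
    print(t, loss)               # t=1 -> loss 0.0, t=-1 -> loss 1.5
```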
Taking the partial derivative of $L$ with respect to $\boldsymbol{w}$ (for a misclassified sample, where the loss equals $-t_n\boldsymbol{w}^T\boldsymbol{x}_n$) gives

\frac{\partial L}{\partial \boldsymbol{w}}=-t_n\boldsymbol{x}_n
so the recurrence formula for updating $\boldsymbol{w}$ can be written as

\boldsymbol{w}_{i+1}=\boldsymbol{w}_{i}+\eta t_n\boldsymbol{x}_n

where $\eta$ is the learning rate.
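And a minimal sketch of a single update, reusing the misclassified case from the example above; after the update, $t\,\boldsymbol{w}^T\boldsymbol{x}$ moves toward the positive side:

```python
import numpy as np

eta = 0.1
w = np.array([0.5, 1.0, -1.0])
x = np.array([1.0, 2.0, 1.0])
t = -1                        # correct label is -1, but w.T @ x = 1.5 > 0: misclassified

print(t * (w.T @ x))          # -1.5
w = w + eta * t * x           # w <- w + eta * t * x
print(t * (w.T @ x))          # -0.9: still wrong, but closer to the boundary
```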
Now let's actually implement it in Python. The data used is the familiar iris dataset from scikit-learn. See the following for a detailed description of the dataset.
Since this is two-class classification, I will restrict the data accordingly. Any pair of classes would do, but by my own arbitrary choice the labels are "versicolor" and "virginica", and "sepal length (cm)" and "petal width (cm)" were selected as the features.
First, visualize the data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_iris
iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
df_iris['target'] = iris.target_names[iris.target]
fig, ax = plt.subplots()
x1 = df_iris[df_iris['target']=='versicolor'].iloc[:,3].values
y1 = df_iris[df_iris['target']=='versicolor'].iloc[:,0].values
x2 = df_iris[df_iris['target']=='virginica'].iloc[:,3].values
y2 = df_iris[df_iris['target']=='virginica'].iloc[:,0].values
ax.scatter(x1, y1, color='red', marker='o', label='versicolor')
ax.scatter(x2, y2, color='blue', marker='s', label='virginica')
ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")
ax.legend()
plt.show()
It looks like the two classes can more or less be separated (admittedly, I chose data that would turn out this way).
Let's implement the Perceptron class. The bias term is added explicitly.
class Perceptron:
    def __init__(self, eta=0.1, n_iter=1000):
        self.eta = eta            # learning rate
        self.n_iter = n_iter      # number of passes over the training data
        self.w = np.array([])

    def fit(self, x, y):
        # add the bias term: w[0] is the bias and its input is fixed to 1
        self.w = np.ones(len(x[0]) + 1)
        x = np.hstack([np.ones((len(x), 1)), x])
        for _ in range(self.n_iter):
            for i in range(len(x)):
                # hinge loss max(0, -t * w^T x): zero when the sample is classified correctly
                loss = np.max([0, -y[i] * self.w.T @ x[i]])
                if loss != 0:
                    # update only misclassified samples: w <- w + eta * t * x
                    self.w += self.eta * y[i] * x[i]

    def predict(self, x):
        x = np.hstack([1., x])    # prepend the bias input
        return 1 if self.w.T @ x >= 0 else -1

    @property
    def w_(self):
        return self.w
The hinge loss is computed for each sample, and whenever a sample is misclassified the weights are updated by (stochastic) gradient descent. Here the computation simply stops after the specified number of iterations, but it could instead stop once the error falls below some threshold, as sketched below.
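For instance, one possible way to do that (a sketch only, not used in the rest of this post; `PerceptronEarlyStop` is a name I made up) is to stop as soon as a full pass over the data produces no updates:

```python
class PerceptronEarlyStop(Perceptron):
    # same as the Perceptron above, but stops once a whole pass makes no mistakes
    def fit(self, x, y):
        self.w = np.ones(len(x[0]) + 1)
        x = np.hstack([np.ones((len(x), 1)), x])
        for _ in range(self.n_iter):
            n_updates = 0
            for i in range(len(x)):
                if -y[i] * (self.w.T @ x[i]) > 0:   # nonzero hinge loss: misclassified
                    self.w += self.eta * y[i] * x[i]
                    n_updates += 1
            if n_updates == 0:                      # every sample was classified correctly
                break
```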
Let's feed the data into the class above, train it, and draw the decision boundary.
df = df_iris[df_iris['target']!='setosa']
df = df.drop(df.columns[[1,2]], axis=1)
df['target'] = df['target'].map({'versicolor':1, 'virginica':-1})
x = df.iloc[:,0:2].values
y = df['target'].values
model = Perceptron()
model.fit(x, y)
#Drawing a graph
fig, ax = plt.subplots()
x1 = df_iris[df_iris['target']=='versicolor'].iloc[:,3].values
y1 = df_iris[df_iris['target']=='versicolor'].iloc[:,0].values
x2 = df_iris[df_iris['target']=='virginica'].iloc[:,3].values
y2 = df_iris[df_iris['target']=='virginica'].iloc[:,0].values
ax.scatter(x1, y1, color='red', marker='o', label='versicolor')
ax.scatter(x2, y2, color='blue', marker='s', label='virginica')
ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")
#Draw classification boundaries
w = model.w_
x_fig = np.linspace(1.,2.5,100)
y_fig = [-w[2]/w[1]*xi-w[0]/w[1] for xi in x_fig]
ax.plot(x_fig, y_fig)
ax.set_ylim(4.8,8.2)
ax.legend()
plt.show()
It seems that virginica is classified correctly, but some versicolor samples are not. Is that just how it is?
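To put a number on this, we can check the training accuracy of the hand-made perceptron with its `predict` method (a quick sketch using the `x`, `y`, and `model` defined in the training cell above):

```python
# fraction of training samples the hand-made Perceptron classifies correctly
pred = np.array([model.predict(xi) for xi in x])
print((pred == y).mean())
```

Next, let's try the same classification with scikit-learn's Perceptron.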
df = df_iris[df_iris['target']!='setosa']
df = df.drop(df.columns[[1,2]], axis=1)
df['target'] = df['target'].map({'versicolor':1, 'virginica':-1})
x = df.iloc[:,0:2].values
y = df['target'].values
from sklearn.linear_model import Perceptron
model = Perceptron(max_iter=40, eta0=0.1)
model.fit(x,y)
#The graph part is omitted
This time, in contrast, versicolor can be classified. The loss function may be slightly different, but I haven't verified that.
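For reference, the omitted graph can be drawn just like before; scikit-learn's `Perceptron` exposes the learned weights as `coef_` and the bias as `intercept_` (a sketch that reuses the scatter data `x1`, `y1`, `x2`, `y2` prepared earlier):

```python
# draw the boundary learned by scikit-learn's Perceptron
fig, ax = plt.subplots()
ax.scatter(x1, y1, color='red', marker='o', label='versicolor')
ax.scatter(x2, y2, color='blue', marker='s', label='virginica')
ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")
w = model.coef_[0]           # weights for [sepal length, petal width]
b = model.intercept_[0]      # bias term
x_fig = np.linspace(1., 2.5, 100)
y_fig = [-w[1] / w[0] * xi - b / w[0] for xi in x_fig]  # solve w[0]*sepal + w[1]*petal + b = 0
ax.plot(x_fig, y_fig)
ax.set_ylim(4.8, 8.2)
ax.legend()
plt.show()
```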
In this post I looked at the perceptron, the basis of classifiers. Since deep learning is a model that combines a large number of perceptrons, understanding the perceptron will only become more important.