This time, I would like to summarize the support vector machine, a machine learning model. The outline is as follows.
The support vector machine is a machine learning algorithm that optimizes its model based on a criterion called "margin maximization". It is used mainly for classification and regression problems. Typical algorithms (that I am aware of) for solving such classification / regression problems are as follows.
Apart from logistic regression, these algorithms can also express non-linear relationships. Among them, the support vector machine seems to be popular because it is practical and easy to handle.
In Japanese, the name is sometimes written with a different transliteration of "vector" (ベクター vs. ベクトル), which is why you may see the name rendered in two ways.
Now, let me explain the terminology. As shown in the figure below, the margin is the distance between the separating boundary (a line of the form $w^T x + b$) and the elements (△, ▲) closest to it. These closest elements are called support vectors.
Maximizing this margin is what the support vector machine (hereinafter abbreviated as SVM) optimizes.
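As a small illustration of margins and support vectors, here is a minimal sketch that fits a linear-kernel SVC on a toy, linearly separable dataset and reads off the separator $w^T x + b$ and the support vectors from the fitted model (the toy data and parameter values here are just examples I chose):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative only)
X_toy, y_toy = make_blobs(n_samples=50, centers=2, cluster_std=0.6, random_state=0)

lin_svc = SVC(kernel="linear", C=1000)  # large C -> (nearly) hard margin
lin_svc.fit(X_toy, y_toy)

print(lin_svc.coef_, lin_svc.intercept_)  # w and b of the separator w^T x + b
print(lin_svc.support_vectors_)           # the elements closest to the boundary
```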
A linear classifier works well for data with easily separable characteristics like the above, but in practice the data often calls for a much more complex decision boundary. In that case, a non-linear classifier is needed, and SVM makes it easy to build and use one.
As an example of SVM analysis, let's use the make_moons dataset. It can be generated with a function imported from sklearn.datasets as shown below, and it produces so-called moon-shaped data. Let's classify it with a non-linear SVM.
SVM.ipynb
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate the moon-shaped dataset
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

def plot_dataset(X, y, axes):
    # Class 0 as blue squares, class 1 as green triangles
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^")
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)

plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.show()
In a polynomial-kernel SVM, the degree parameter specifies the degree of the polynomial, and the performance of the classifier changes depending on this value.
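The plotting code below also calls a plot_predictions helper that is not defined in this article (it is presumably in the full notebook linked at the end). A minimal sketch of such a helper could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_predictions(clf, axes):
    # Evaluate the classifier on a grid and shade the predicted regions
    x0s = np.linspace(axes[0], axes[1], 100)
    x1s = np.linspace(axes[2], axes[3], 100)
    x0, x1 = np.meshgrid(x0s, x1s)
    X_grid = np.c_[x0.ravel(), x1.ravel()]
    y_pred = clf.predict(X_grid).reshape(x0.shape)
    plt.contourf(x0, x1, y_pred, alpha=0.2)
```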
SVM.ipynb
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Degree-3 polynomial kernel
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(X, y)

# Degree-12 polynomial kernel with a larger coef0
poly100_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=12, coef0=100, C=5))
])
poly100_kernel_svm_clf.fit(X, y)
plt.figure(figsize=(11, 4))
plt.subplot(121)
plot_predictions(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=3, r=1, C=5$", fontsize=18)
plt.subplot(122)
plot_predictions(poly100_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=12, r=100, C=5$", fontsize=18)
plt.show()
In the example above, the left figure uses a degree-3 polynomial kernel and the right figure a degree-12 one. With degree 12 the model is overfitting, so in this case the degree should be lowered. Conversely, if you lower the degree to 2, the data can no longer be classified properly, as shown in the figure below.
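For reference, here is a sketch of that degree-2 case (it reuses the imports, data, and plotting helpers defined above; the values other than degree mirror the degree-3 model):

```python
# Degree-2 polynomial kernel, reproducing the underfitting case described above
poly2_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=2, coef0=1, C=5))
])
poly2_kernel_svm_clf.fit(X, y)

plot_predictions(poly2_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=2, r=1, C=5$", fontsize=18)
plt.show()
```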
The program above used a class called Pipeline, so let me describe it briefly. Building a machine learning model requires detailed preprocessing such as handling missing values and standardizing values. scikit-learn's Pipeline is a mechanism that lets you bundle these steps together. In this case, the following processing is performed.
"scaler", StandardScaler()#Standardize the value (= subtract by mean and divide by variance)
"svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5)#Set the SVM classifier
I understand it as a way to chain together the processing steps you want to treat as a single model.
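As a quick illustration of what the pipeline buys you (the input points below are just examples): calling predict on the fitted pipeline automatically applies the scaler before the SVM, and each fitted step remains accessible through named_steps.

```python
import numpy as np

# predict() runs StandardScaler.transform first, then SVC.predict
print(poly_kernel_svm_clf.predict(np.array([[0.5, 0.5], [2.0, -0.5]])))

# The fitted steps can be inspected individually
print(poly_kernel_svm_clf.named_steps["scaler"].mean_)
```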
Now, let's use one of the SVM classification methods, the Gaussian RBF kernel method. The terms were all hard for me to grasp at first, so I would like to go through the meaning of each one.
First, about kernel functions. These are very important for non-linear representation, and I already used one casually in the make_moons example above. As shown in the figure below, the idea is to map the data into a higher-dimensional space in which it can be separated linearly by a plane (in practice called a hyperplane, since the space has three or more dimensions); the function used to perform this mapping is called a kernel function.
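To make this idea concrete, here is a minimal sketch that expands the moon data with explicit polynomial features and then separates it with a plain linear SVM; the kernel trick lets an SVM achieve the same effect without constructing these features explicitly. (The degree and C values here are just examples.)

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Map the 2-D inputs to higher-dimensional polynomial features,
# then separate them with a *linear* classifier in that space
polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, max_iter=10000))
])
polynomial_svm_clf.fit(X, y)
print(polynomial_svm_clf.score(X, y))  # training accuracy
```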
Regression using this kernel function is called kernel regression. The formula that represents kernel regression is as follows.
f({\bf x}) = \sum_{i=1}^{N} \alpha_i k({\bf x}^{(i)}, {\bf x})
$\alpha_i$ is the coefficient to be optimized, and $k({\bf x}^{(i)}, {\bf x})$ is called the kernel function. Several types of kernel function are in use; one of them, shown below, is the Gaussian kernel.
k({\bf x}, {\bf x}') = \exp(-\beta \|{\bf x} - {\bf x}'\|^2)
Such a function is called a Radial Basis Function (RBF). Various functions can serve as an RBF, but the most commonly used is the Gaussian function.
\phi({\bf x}) = \exp\left(-\frac{({\bf x} - {\bf c})^T({\bf x} - {\bf c})}{2\sigma^2}\right)
${\bf x}$ is the input vector and ${\bf c}$ is the center of the Gaussian function.
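To relate the $\beta$ in the kernel formula above to scikit-learn, here is a small numeric sketch: the Gaussian kernel computed by hand matches sklearn's rbf_kernel, whose gamma parameter plays the role of $\beta$. (The vectors and the value of $\beta$ are just examples.)

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
x_prime = np.array([[0.0, 1.0]])
beta = 0.5

# Manual computation of k(x, x') = exp(-beta * ||x - x'||^2)
manual = np.exp(-beta * np.sum((x - x_prime) ** 2))

# scikit-learn's rbf_kernel uses the same definition with gamma = beta
sklearn_value = rbf_kernel(x, x_prime, gamma=beta)[0, 0]

print(manual, sklearn_value)  # both equal exp(-0.5 * 2) = exp(-1)
```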
SVM.ipynb
from sklearn.svm import SVC

# Train one RBF-kernel SVM for each (gamma, C) combination
gamma1, gamma2 = 0.1, 5
C1, C2 = 0.001, 1000
hyperparams = (gamma1, C1), (gamma1, C2), (gamma2, C1), (gamma2, C2)

svm_clfs = []
for gamma, C in hyperparams:
    rbf_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C))
    ])
    rbf_kernel_svm_clf.fit(X, y)
    svm_clfs.append(rbf_kernel_svm_clf)

# Plot the decision regions for each classifier
plt.figure(figsize=(11, 11))
for i, svm_clf in enumerate(svm_clfs):
    plt.subplot(221 + i)
    plot_predictions(svm_clf, [-1.5, 2.5, -1, 1.5])
    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
    gamma, C = hyperparams[i]
    plt.title(r"$\gamma = {}, C = {}$".format(gamma, C), fontsize=16)
plt.show()
Both γ and C need to be chosen as hyperparameters, and you can see that making either value too large leads to overfitting.
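In practice, rather than picking γ and C by eye, they are usually tuned by cross-validation. Here is a minimal sketch using GridSearchCV; the parameter ranges are just examples.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf"))
])

# The "svm_clf__" prefix routes each parameter to the SVC step inside the pipeline
param_grid = {
    "svm_clf__gamma": [0.1, 1, 5],
    "svm_clf__C": [0.001, 1, 1000],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```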
While SVM is a very easy-to-use model, the mathematical theory behind it is very profound.
This time I referred to the following article.
Linear method and kernel method (regression analysis) https://qiita.com/wsuzume/items/09a59036c8944fd563ff
The full program is stored here. https://github.com/Fumio-eisan/SVM_20200417