This time, I would like to summarize the support vector machine, a machine learning model. The outline is as follows.
The support vector machine is a machine learning algorithm that optimizes its model based on a criterion called "margin maximization". It is used mainly for classification and regression problems. Typical algorithms (that I am aware of) for solving such classification / regression problems are as follows.
Apart from logistic regression, these algorithms can also express non-linear relationships. Among them, the support vector machine seems to be popular because it is practical and easy to handle.
In Japanese, the name is sometimes written with a different transliteration of "vector" (ベクター vs. ベクトル), which is why you may see the name rendered in two ways.
Now, let me explain the terminology. As shown in the figure below, the margin is the distance between the separating boundary (a line of the form $w^T x + b$) and the elements (△, ▲) closest to it. These closest elements are called support vectors.
Maximizing this margin is what the support vector machine (hereinafter abbreviated as SVM) optimizes.
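As a small illustration of margins and support vectors, here is a minimal sketch that fits a linear-kernel SVC on a toy, linearly separable dataset and reads off the separator $w^T x + b$ and the support vectors from the fitted model (the toy data and parameter values here are just examples I chose):

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Toy, linearly separable data (illustrative only)
X_toy, y_toy = make_blobs(n_samples=50, centers=2, cluster_std=0.6, random_state=0)

lin_svc = SVC(kernel="linear", C=1000)  # large C -> (nearly) hard margin
lin_svc.fit(X_toy, y_toy)

print(lin_svc.coef_, lin_svc.intercept_)  # w and b of the separator w^T x + b
print(lin_svc.support_vectors_)           # the elements closest to the boundary
```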
A linear classifier works well for data with easily separable characteristics like the above, but in practice the data often calls for a much more complex decision boundary. In that case, a non-linear classifier is needed, and SVM makes it easy to build and use one.
As an example of SVM analysis, let's use the make_moons dataset. It can be generated with a function imported from sklearn.datasets as shown below, and it produces so-called moon-shaped data. Let's classify it with a non-linear SVM.
SVM.ipynb
import matplotlib.pyplot as plt
from sklearn.datasets import make_moons

# Generate the moon-shaped dataset
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

def plot_dataset(X, y, axes):
    # Class 0 as blue squares, class 1 as green triangles
    plt.plot(X[:, 0][y==0], X[:, 1][y==0], "bs")
    plt.plot(X[:, 0][y==1], X[:, 1][y==1], "g^")
    plt.axis(axes)
    plt.grid(True, which='both')
    plt.xlabel(r"$x_1$", fontsize=20)
    plt.ylabel(r"$x_2$", fontsize=20, rotation=0)

plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.show()
In a polynomial-kernel SVM, the degree parameter specifies the degree of the polynomial, and the performance of the classifier changes depending on this value.
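The plotting code below also calls a plot_predictions helper that is not defined in this article (it is presumably in the full notebook linked at the end). A minimal sketch of such a helper could look like this:

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_predictions(clf, axes):
    # Evaluate the classifier on a grid and shade the predicted regions
    x0s = np.linspace(axes[0], axes[1], 100)
    x1s = np.linspace(axes[2], axes[3], 100)
    x0, x1 = np.meshgrid(x0s, x1s)
    X_grid = np.c_[x0.ravel(), x1.ravel()]
    y_pred = clf.predict(X_grid).reshape(x0.shape)
    plt.contourf(x0, x1, y_pred, alpha=0.2)
```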
SVM.ipynb
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Degree-3 polynomial kernel
poly_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5))
])
poly_kernel_svm_clf.fit(X, y)

# Degree-12 polynomial kernel with a larger coef0
poly100_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=12, coef0=100, C=5))
])
poly100_kernel_svm_clf.fit(X, y)
plt.figure(figsize=(11, 4))
plt.subplot(121)
plot_predictions(poly_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=3, r=1, C=5$", fontsize=18)
plt.subplot(122)
plot_predictions(poly100_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=12, r=100, C=5$", fontsize=18)
plt.show()
In the example above, the left figure uses a degree-3 polynomial kernel and the right figure a degree-12 one. With degree 12 the model is overfitting, so in this case the degree should be lowered. Conversely, if you lower the degree to 2, the data can no longer be classified properly, as shown in the figure below.
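For reference, here is a sketch of that degree-2 case (it reuses the imports, data, and plotting helpers defined above; the values other than degree mirror the degree-3 model):

```python
# Degree-2 polynomial kernel, reproducing the underfitting case described above
poly2_kernel_svm_clf = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="poly", degree=2, coef0=1, C=5))
])
poly2_kernel_svm_clf.fit(X, y)

plot_predictions(poly2_kernel_svm_clf, [-1.5, 2.5, -1, 1.5])
plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
plt.title(r"$d=2, r=1, C=5$", fontsize=18)
plt.show()
```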
The program above used a class called Pipeline, so let me describe it briefly. Building a machine learning model requires detailed preprocessing such as handling missing values and standardizing values. scikit-learn's Pipeline is a mechanism that lets you bundle these steps together. In this case, the following processing is performed.
"scaler", StandardScaler()#Standardize the value (= subtract by mean and divide by variance)
"svm_clf", SVC(kernel="poly", degree=3, coef0=1, C=5)#Set the SVM classifier
I understand it as a way to chain together the processing steps you want to treat as a single model.
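As a quick illustration of what the pipeline buys you (the input points below are just examples): calling predict on the fitted pipeline automatically applies the scaler before the SVM, and each fitted step remains accessible through named_steps.

```python
import numpy as np

# predict() runs StandardScaler.transform first, then SVC.predict
print(poly_kernel_svm_clf.predict(np.array([[0.5, 0.5], [2.0, -0.5]])))

# The fitted steps can be inspected individually
print(poly_kernel_svm_clf.named_steps["scaler"].mean_)
```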
Now, let's use one of the SVM classification methods, the Gaussian RBF kernel method. The terms were all hard for me to grasp at first, so I would like to go through the meaning of each one.
First, about kernel functions. These are very important for non-linear representation, and I already used one casually in the make_moons example above. As shown in the figure below, the idea is to map the data into a higher-dimensional space in which it can be separated linearly by a plane (in practice called a hyperplane, since the space has three or more dimensions); the function used to perform this mapping is called a kernel function.
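To make this idea concrete, here is a minimal sketch that expands the moon data with explicit polynomial features and then separates it with a plain linear SVM; the kernel trick lets an SVM achieve the same effect without constructing these features explicitly. (The degree and C values here are just examples.)

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# Map the 2-D inputs to higher-dimensional polynomial features,
# then separate them with a *linear* classifier in that space
polynomial_svm_clf = Pipeline([
    ("poly_features", PolynomialFeatures(degree=3)),
    ("scaler", StandardScaler()),
    ("svm_clf", LinearSVC(C=10, max_iter=10000))
])
polynomial_svm_clf.fit(X, y)
print(polynomial_svm_clf.score(X, y))  # training accuracy
```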
Regression using this kernel function is called kernel regression. The formula that represents kernel regression is as follows.
f({\bf x}) = \sum_{i=1}^{N} \alpha_i k({\bf x}^{(i)}, {\bf x})
$\alpha_i$ is the coefficient to be optimized, and $k({\bf x}^{(i)}, {\bf x})$ is called the kernel function. Several types of kernel function are in use; one of them, shown below, is the Gaussian kernel.
k({\bf x}, {\bf x}') = \exp(-\beta \|{\bf x} - {\bf x}'\|^2)
Such a function is called a Radial Basis Function (RBF). Various functions can serve as an RBF, but the most commonly used is the Gaussian function.
\phi({\bf x}) = \exp\left(-\frac{({\bf x} - {\bf c})^T({\bf x} - {\bf c})}{2\sigma^2}\right)
${\bf x}$ is the input vector and ${\bf c}$ is the center of the Gaussian function.
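To relate the $\beta$ in the kernel formula above to scikit-learn, here is a small numeric sketch: the Gaussian kernel computed by hand matches sklearn's rbf_kernel, whose gamma parameter plays the role of $\beta$. (The vectors and the value of $\beta$ are just examples.)

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
x_prime = np.array([[0.0, 1.0]])
beta = 0.5

# Manual computation of k(x, x') = exp(-beta * ||x - x'||^2)
manual = np.exp(-beta * np.sum((x - x_prime) ** 2))

# scikit-learn's rbf_kernel uses the same definition with gamma = beta
sklearn_value = rbf_kernel(x, x_prime, gamma=beta)[0, 0]

print(manual, sklearn_value)  # both equal exp(-0.5 * 2) = exp(-1)
```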
SVM.ipynb
from sklearn.svm import SVC

# Train one RBF-kernel SVM for each (gamma, C) combination
gamma1, gamma2 = 0.1, 5
C1, C2 = 0.001, 1000
hyperparams = (gamma1, C1), (gamma1, C2), (gamma2, C1), (gamma2, C2)

svm_clfs = []
for gamma, C in hyperparams:
    rbf_kernel_svm_clf = Pipeline([
        ("scaler", StandardScaler()),
        ("svm_clf", SVC(kernel="rbf", gamma=gamma, C=C))
    ])
    rbf_kernel_svm_clf.fit(X, y)
    svm_clfs.append(rbf_kernel_svm_clf)

# Plot the decision regions for each classifier
plt.figure(figsize=(11, 11))
for i, svm_clf in enumerate(svm_clfs):
    plt.subplot(221 + i)
    plot_predictions(svm_clf, [-1.5, 2.5, -1, 1.5])
    plot_dataset(X, y, [-1.5, 2.5, -1, 1.5])
    gamma, C = hyperparams[i]
    plt.title(r"$\gamma = {}, C = {}$".format(gamma, C), fontsize=16)
plt.show()
Both γ and C need to be chosen as hyperparameters, and you can see that making either value too large leads to overfitting.
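In practice, rather than picking γ and C by eye, they are usually tuned by cross-validation. Here is a minimal sketch using GridSearchCV; the parameter ranges are just examples.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("svm_clf", SVC(kernel="rbf"))
])

# The "svm_clf__" prefix routes each parameter to the SVC step inside the pipeline
param_grid = {
    "svm_clf__gamma": [0.1, 1, 5],
    "svm_clf__C": [0.001, 1, 1000],
}
grid = GridSearchCV(pipe, param_grid, cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```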
While SVM is a very easy-to-use model, the mathematical theory behind it is very profound.
This time I referred to the following article.
Linear method and kernel method (regression analysis) https://qiita.com/wsuzume/items/09a59036c8944fd563ff
The full program is stored here. https://github.com/Fumio-eisan/SVM_20200417