What happens when I change the hyperparameters of SVM (RBF kernel)?

Overview

SVM (Support Vector Machine) is a machine learning method known for its high classification accuracy. To actually get that accuracy, however, its hyperparameters must be tuned on the training data. In this article, I explain how the decision boundary changes when the hyperparameters of an SVM with the RBF kernel (Gaussian kernel) are adjusted.

Hyperparameters to decide

An SVM with the RBF kernel has two hyperparameters to adjust: the cost parameter $C$ and the kernel parameter $\gamma$.

About the cost parameter

SVM is a method that determines the hyperplane separating a set of data points mapped into a feature space. However, the mapped points are not always separable. In the figure below, for example, no straight line perfectly separates the two types of symbols.

(Figure: a linearly inseparable point set — na.PNG)

Now, let us allow some misclassification and still draw a straight line that divides the point set. For example, the line below divides the two types of symbols while misclassifying a few points.

(Figure: a separating line with some misclassified points — miss.PNG)

The cost parameter $C$ determines how much misclassification is tolerated. It appears in the objective of the quadratic programming problem solved by the SVM:

$$\min_{\beta}\frac{1}{2}\|\beta\|^2+C\sum_{i=1}^{N}\xi_i$$

Here the $\xi_i$ are slack variables measuring how far each point violates the margin. A smaller $C$ lets the hyperplane tolerate misclassification, while a larger $C$ penalizes misclassification heavily.
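
As a quick illustration (my own sketch, not part of the article's experiment below; the fixed $\gamma$ value here is an arbitrary choice), fitting scikit-learn's SVC with a small and a large $C$ on the same data shows how the tolerance for training errors changes:

# Sketch: the effect of C on training errors and support vectors.
import numpy as np
from sklearn import svm, datasets

iris = datasets.load_iris()
X, y = iris.data[:100, :2], iris.target[:100]

for C in [2 ** -5, 2 ** 15]:
    clf = svm.SVC(C=C, gamma=2 ** -3).fit(X, y)
    errors = np.sum(clf.predict(X) != y)
    print("C=%g: %d training errors, %d support vectors"
          % (C, errors, len(clf.support_)))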

About the RBF kernel parameter

The RBF kernel parameter $\gamma$ appears in the kernel expression:

$$K(x, x')=\exp(-\gamma\|x-x'\|^2)$$

As the experiment described later shows, the smaller the value of $\gamma$, the simpler the decision boundary; the larger the value, the more complicated the decision boundary.
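
To get a feel for this before the experiment, here is a small sketch (plain NumPy, my own addition, with an arbitrary pair of points) evaluating the kernel at the two extreme $\gamma$ values used below. With a small $\gamma$ the similarity stays near 1 even between distant points, so the boundary comes out smooth; with a large $\gamma$ the similarity decays to almost 0, so each training point only influences its immediate neighborhood.

# Sketch: how gamma changes the RBF similarity between two fixed points.
import numpy as np

def rbf_kernel(x, x_prime, gamma):
    # K(x, x') = exp(-gamma * ||x - x'||^2)
    return np.exp(-gamma * np.sum((x - x_prime) ** 2))

x, x_prime = np.array([0.0, 0.0]), np.array([1.0, 1.0])  # squared distance = 2
for gamma in [2 ** -15, 2 ** 3]:
    print("gamma=%g -> K(x, x') = %g" % (gamma, rbf_kernel(x, x_prime, gamma)))
# gamma=2^-15 gives K close to 1; gamma=2^3 gives K close to 0.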

Experiment

Let's draw the decision boundaries when $C$ and $\gamma$ are set to extreme values: $C$ is set to $2^{-5}$ and $2^{15}$, and $\gamma$ to $2^{-15}$ and $2^{3}$, respectively. We use the SVM implemented in scikit-learn (0.15), which internally uses [LIBSVM](http://www.csie.ntu.edu.tw/~cjlin/libsvm/). The dataset is iris, which contains 3 class labels and 4 features; this time we use only 2 class labels and 2 features. To make the problem more difficult, we add noise to each of the two features.

Source code

# -*- coding: utf-8 -*-

import numpy as np
from sklearn import svm, datasets
import matplotlib.pyplot as plt
from itertools import product

if __name__ == '__main__':
    iris = datasets.load_iris()
    # Use the first two features and the first two class labels
    # (copy so the original iris data is not modified in place)
    X = iris.data[:100, :2].copy()
    # Add uniform noise to the features
    E = np.random.uniform(0, 1.0, size=np.shape(X))
    X += E
    y = iris.target[:100]
    # Mesh step size for the decision-region plot
    h = 0.02
    # Cost parameter values
    Cs = [2 ** -5, 2 ** 15]
    # RBF kernel parameter values
    gammas = [2 ** -15, 2 ** 3]
    
    # Fit one SVM per (C, gamma) combination
    svms = [svm.SVC(C=C, gamma=gamma).fit(X, y) for C, gamma in product(Cs, gammas)]
    titles = ["C: small, gamma: small", "C: small, gamma: large",
              "C: large, gamma: small", "C: large, gamma: large"]
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    
    for i, clf in enumerate(svms):
        plt.subplot(2, 2, i + 1)
        plt.subplots_adjust(wspace=0.4, hspace=0.4)
        # Predict the class of every grid point and color the decision regions
        Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
        Z = Z.reshape(xx.shape)
        plt.contourf(xx, yy, Z, cmap=plt.cm.Paired, alpha=0.8)
        plt.scatter(X[:, 0], X[:, 1], c=y, cmap=plt.cm.Paired)
        plt.xlabel("Sepal length")
        plt.ylabel("Sepal width")
        plt.xlim(xx.min(), xx.max())
        plt.ylim(yy.min(), yy.max())
        plt.xticks(())
        plt.yticks(())
        plt.title(titles[i])
    plt.show()

Execution result

(Figure: decision boundaries for the four (C, gamma) combinations — plot.png)

The horizontal and vertical axes represent the two features. When $C$ is small, many points fall on the wrong side of the decision boundary; when $C$ is large, few do. When $\gamma$ is small the decision boundary is simple (a nearly straight line), while when $\gamma$ is large it takes a complicated shape.

Other

Adjusting $C$ and $\gamma$ can apparently produce decision boundaries similar to those of a linear kernel. If you are unsure which kernel to choose, the RBF kernel seems a safe default, but tuning its parameters takes time. (´・ω・`)
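
If tuning by hand is tedious, the usual remedy is a cross-validated grid search over log-scale values of $C$ and $\gamma$. Below is a minimal sketch of this (my own addition; the grid values are arbitrary, and note that in current scikit-learn GridSearchCV lives in sklearn.model_selection, whereas version 0.15 used above had it in sklearn.grid_search):

# Sketch: cross-validated grid search over C and gamma (log-scale grid).
import numpy as np
from sklearn import svm, datasets
from sklearn.model_selection import GridSearchCV

iris = datasets.load_iris()
X, y = iris.data[:100, :2], iris.target[:100]

param_grid = {"C": 2.0 ** np.arange(-5, 16, 4),
              "gamma": 2.0 ** np.arange(-15, 4, 4)}
search = GridSearchCV(svm.SVC(), param_grid, cv=5).fit(X, y)
print(search.best_params_)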
