This is part of a series implementing the programming assignments of Coursera's Machine Learning course (Professor Andrew Ng) in Python. In ex6, classification is done with a support vector machine (SVM).
The first task uses a linear (no-kernel) SVM. scikit-learn gives all machine learning models a unified interface: instantiate the model, then train it with model.fit(X, y). The syntax is the same whether the model is linear regression, logistic regression, or an SVM. For SVMs, the sklearn.svm.SVC() class is used.
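As an illustration of that common interface, here is a minimal sketch using made-up toy data (only the estimator classes come from scikit-learn; everything else is invented for illustration):
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.svm import SVC

X = np.array([[0.0], [1.0], [2.0], [3.0]])  # toy feature matrix
y = np.array([0, 0, 1, 1])                  # toy labels

# The same instantiate-then-fit pattern works for all three models
for model in (LinearRegression(), LogisticRegression(), SVC(kernel='linear')):
    model.fit(X, y)
    print(type(model).__name__, model.predict(X))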
Here is the code. As usual, scipy.io.loadmat is used to load the MATLAB-format data, and then the model is trained.
import numpy as np
import matplotlib.pyplot as plt
import scipy.io as scio
from sklearn import svm
# Load the MATLAB-format data with scipy.io.loadmat()
data = scio.loadmat('ex6data1.mat')
X = data['X']
y = data['y'].ravel()
pos = (y==1) # numpy bool index
neg = (y==0) # numpy bool index
plt.scatter(X[pos,0], X[pos,1], marker='+', c='k')
plt.scatter(X[neg,0], X[neg,1], marker='o', c='y')
#Linear SVM
model = svm.SVC(C=1.0, kernel='linear')
model.fit(X, y)
#Draw a decision boundary
px = np.linspace( np.min(X[:,0]), np.max(X[:,0]), 100)
w = model.coef_[0]
py = - (w[0]*px + model.intercept_[0]) / w[1]
plt.plot(px,py)
plt.show()
Here is the result.
Next, following the assignment, we adjust the SVM regularization parameter C and observe how the behavior changes. Here is the result with C = 100.0.
Increasing C weakens the regularization. As a result, the decision boundary shifts so that the outlier on the left is also classified correctly. This is slightly overfit, and the earlier C = 1.0 setting arguably gives the more natural classification.
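For reference, the C = 100.0 plot can be reproduced by changing only the SVC construction; a minimal sketch, assuming X, y, and px are already defined as in the script above:
# Retrain with weaker regularization (larger C) and redraw the boundary
model = svm.SVC(C=100.0, kernel='linear')
model.fit(X, y)
w = model.coef_[0]
py = -(w[0]*px + model.intercept_[0]) / w[1]
plt.plot(px, py)
plt.show()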
In the next task we classify another dataset, one that cannot be separated by a straight line, with a Gaussian kernel SVM. To use the Gaussian kernel, pass kernel='rbf' to svm.SVC.
import numpy as np
import matplotlib.pyplot as plt
import scipy.io as scio
from sklearn import svm
# Load the MATLAB-format data with scipy.io.loadmat()
data = scio.loadmat('ex6data2.mat')
X = data['X']
y = data['y'].ravel()
pos = (y==1) # numpy bool index
neg = (y==0) # numpy bool index
plt.scatter(X[pos,0], X[pos,1], marker='+', c='k')
plt.scatter(X[neg,0], X[neg,1], marker='o', c='y')
# Gaussian kernel (RBF) SVM
model = svm.SVC(C=1.0, gamma=50.0, kernel='rbf', probability=True)
model.fit(X, y)
# Plot the decision boundary
px = np.arange(0, 1, 0.01)
py = np.arange(0, 1, 0.01)
PX, PY = np.meshgrid(px, py) # PX and PY are each 100x100 matrices
XX = np.c_[PX.ravel(), PY.ravel()] # XX is a 10000x2 matrix
Z = model.predict_proba(XX)[:,1] # Predict with the SVM model; the probability of y=1 is the second column, so extract it. Z is a 10000-dimensional vector
Z = Z.reshape(PX.shape) # Reshape Z into a 100x100 matrix
plt.contour(PX, PY, Z, levels=[0.5], linewidths=3) # The contour at Z=0.5 is the decision boundary
plt.xlim(0.0,1.0)
plt.ylim(0.4,1.0)
plt.show()
The resulting plot is shown below; even data requiring a decision boundary with a complicated shape is classified cleanly.
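As an aside, the same boundary can also be drawn without probability=True by contouring the SVM's decision function at level 0; a sketch that reuses model, XX, PX, and PY from the script above:
# decision_function() is 0 exactly on the decision boundary,
# so contour it at level 0 instead of the 0.5 probability level
Z = model.decision_function(XX).reshape(PX.shape)
plt.contour(PX, PY, Z, levels=[0], linewidths=3)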
The key point this time is gamma, the RBF-kernel parameter of the SVM. In Coursera the Gaussian kernel is written in terms of $\sigma$ as $K(x, x') = \exp\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right)$, whereas sklearn.svm.SVC() takes gamma, with $K(x, x') = \exp\left(-\gamma \|x - x'\|^2\right)$. Comparing the two equations gives $\gamma = \frac{1}{2\sigma^2}$. The Coursera example uses $\sigma = 0.1$, so we set $\gamma = 50$ accordingly.
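The conversion itself is one line:
sigma = 0.1
gamma = 1 / (2 * sigma**2)  # = 50.0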
As we have seen, the Gaussian kernel SVM has two parameters to tune: C and $\sigma$.
- C is the strength of regularization. The smaller C is, the stronger the regularization (the model fits the training data less tightly and generalizes better); the larger C is, the weaker the regularization (the model fits the training data more closely and can overfit).
- $\sigma$ is the width of the Gaussian kernel. The larger $\sigma$ is, the smoother the classification boundary.
Based on these characteristics, we tune both parameters.
So, in the next task, we train on a new dataset with various combinations of C and $\sigma$ and adopt the pair that gives the highest classification accuracy on the cross-validation data. Eight values (0.01, 0.03, 0.1, 0.3, 1, 3, 10, 30) are tried for each of C and $\sigma$, so 8 x 8 = 64 models are trained.
Here is the code.
import numpy as np
import matplotlib.pyplot as plt
import scipy.io as scio
from sklearn import svm
# Load the MATLAB-format data with scipy.io.loadmat()
data = scio.loadmat('ex6data3.mat')
X = data['X']
y = data['y'].ravel()
Xval = data['Xval']
yval = data['yval'].ravel()
c_values = np.array([0.01, 0.03, 0.1, 0.3, 1.0, 3.0, 10.0, 30.0])
gamma_values = 1 / (2 * c_values**2)  # the same list is used for sigma; convert to gamma = 1/(2*sigma^2)
# Train a Gaussian kernel (RBF) SVM for each combination of C and gamma
scores = np.zeros([8,8])
for i_c in range(0,8):
    for i_g in range(0,8):
        model = svm.SVC(C=c_values[i_c], gamma=gamma_values[i_g], kernel='rbf')
        model.fit(X, y)
        # Compute the score on the cross-validation data
        scores[i_c, i_g] = model.score(Xval, yval)
# Find the C and gamma with the highest score
max_idx = np.unravel_index(np.argmax(scores), scores.shape)
# Retrain the SVM with the best C and gamma
model = svm.SVC(C=c_values[max_idx[0]], gamma=gamma_values[max_idx[1]], kernel='rbf', probability=True)
model.fit(X, y)
# Plot the cross-validation data
pos = (yval==1) # numpy bool index
neg = (yval==0) # numpy bool index
plt.scatter(Xval[pos,0], Xval[pos,1], marker='+', c='k')
plt.scatter(Xval[neg,0], Xval[neg,1], marker='o', c='y')
# Plot the decision boundary
px = np.arange(-0.6, 0.25, 0.01)
py = np.arange(-0.8, 0.6, 0.01)
PX, PY = np.meshgrid(px, py) # PX and PY are grid matrices of shape (len(py), len(px))
XX = np.c_[PX.ravel(), PY.ravel()] # XX has one row per grid point
Z = model.predict_proba(XX)[:,1] # Predict with the SVM model; the probability of y=1 is the second column, so extract it
Z = Z.reshape(PX.shape) # Reshape Z back to the grid shape
plt.contour(PX, PY, Z, levels=[0.5], linewidths=3) # The contour at Z=0.5 is the decision boundary
plt.show()
Validation on the cross-validation data shows that C = 1.0 and $\sigma$ = 0.1 (i.e. gamma = 50) give the highest performance, and the resulting classifier's decision boundary is as follows.
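Incidentally, the same grid search could also be written with scikit-learn's GridSearchCV and PredefinedSplit (so that only Xval/yval is used for validation). This is just a sketch, assuming X, y, Xval, yval, c_values, and gamma_values from the script above; note that with the default refit=True the best model is refit on the combined data, unlike the manual loop.
from sklearn.model_selection import GridSearchCV, PredefinedSplit

# -1 = always in the training fold, 0 = the single validation fold
test_fold = np.concatenate([-np.ones(len(y), dtype=int), np.zeros(len(yval), dtype=int)])
search = GridSearchCV(svm.SVC(kernel='rbf'),
                      {'C': c_values, 'gamma': gamma_values},
                      cv=PredefinedSplit(test_fold))
search.fit(np.vstack([X, Xval]), np.concatenate([y, yval]))
print(search.best_params_, search.best_score_)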
Note: to find the index of the maximum value in a two-dimensional NumPy array A, use np.unravel_index(np.argmax(A), A.shape); it returns a tuple of indices.
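For example:
import numpy as np
A = np.array([[1, 9, 2],
              [4, 3, 5]])
idx = np.unravel_index(np.argmax(A), A.shape)
print(idx)     # (0, 1)
print(A[idx])  # 9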
Professor Ng gives the following guidelines on when to use logistic regression, a linear SVM, or a Gaussian kernel SVM, where m is the number of training samples and n is the number of features:
- If n is large (~10,000) and m is small (~1,000): logistic regression or a linear SVM (there are not enough samples to learn a complex nonlinear decision boundary).
- If n is small (1 to 1,000) and m is moderate (~50,000): a Gaussian kernel SVM.
- If n is small (1 to 1,000) and m is very large (around 1 million): add features and use logistic regression or a linear SVM (a Gaussian kernel SVM is slow when m is large).
The second half of ex6 (a spam filter using an SVM) will be covered separately.