Last time: University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (8) Make your own stochastic steepest descent method
This time we classify two clusters of points. The assignment is split into two cases: one without outliers, and one with outliers that lie far away from the clusters.
The YouTube commentary starts at about the 54-minute mark of lecture 6 (1). The hinge-loss version for the no-outlier case is actually the same as the scikit-learn example. Using matplotlib was harder than the SVM part. The code below is based on that example, with comments added for my own learning.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.datasets import make_blobs
# Create a classification dataset of 40 random samples; centers sets the number of clusters
X, y = make_blobs(n_samples=40, centers=2, random_state=6)
# kernel='linear' gives a linear SVM (hinge loss). The larger C is, the weaker the regularization.
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)
# Draw the classification data. The colors are decided by the cmap argument.
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)
#Drawing the decision boundary
ax = plt.gca()
xlim = ax.get_xlim()
ylim = ax.get_ylim()
#Make a 30x30 grid
xx = np.linspace(xlim[0], xlim[1], 30)
yy = np.linspace(ylim[0], ylim[1], 30)
YY, XX = np.meshgrid(yy, xx)
xy = np.vstack([XX.ravel(), YY.ravel()]).T
# Evaluate the decision function at each grid point
Z = clf.decision_function(xy).reshape(XX.shape)
# Draw contour lines: level 0 is the decision boundary, levels -1 and 1 are the margins
ax.contour(XX, YY, Z, colors='k', levels=[-1, 0, 1], alpha=0.5,
           linestyles=['--', '-', '--'])
# Circle the support vectors (the points that lie on the margin)
ax.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1], s=100,
           linewidth=1, facecolors='none', edgecolors='k')
plt.savefig("5.3.png")
Only two lines do the actual computation; everything else is data generation and plotting. The result looks like this.
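As a rough check of what the fitted model looks like in terms of hinge loss, the following sketch refits the same SVM and evaluates $\max(0, 1 - y f(x))$ on every training point. The names `scores`, `y_pm`, `hinge` and `sv_scores` are my own additions for illustration and do not appear in the original script.

```python
import numpy as np
from sklearn import svm
from sklearn.datasets import make_blobs

# Same data and model as in the script above
X, y = make_blobs(n_samples=40, centers=2, random_state=6)
clf = svm.SVC(kernel='linear', C=1000)
clf.fit(X, y)

# decision_function returns the signed score f(x) = w.x + b
scores = clf.decision_function(X)

# Hinge loss expects labels in {-1, +1}; make_blobs returns {0, 1}
y_pm = y * 2 - 1
hinge = np.maximum(0.0, 1.0 - y_pm * scores)

# Scores at the support vectors (expected to sit close to -1 or +1)
sv_scores = clf.decision_function(clf.support_vectors_)

print("number of support vectors:", len(clf.support_vectors_))
print("support vector scores:", np.round(sv_scores, 3))
print("total hinge loss:", hinge.sum())
```

With well-separated clusters and a large C, the total hinge loss should be close to zero, and the support vectors should be the points drawn on the dashed margin lines.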
Next is the square-loss version. I could work this one out only because the commentary video showed the source code; it would have been impossible for me without it.
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn.datasets import make_blobs
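# Create a classification dataset of 40 random samples; centers sets the number of clusters
X, y = make_blobs(n_samples=40, centers=2, random_state=6)
# Convert the labels from {0, 1} to {-1, 1}
y = y*2-1
# Square loss (ordinary least squares)
# The normalize argument was removed in scikit-learn 1.2; for least squares with an intercept it does not change the fitted boundary
clf = linear_model.LinearRegression(fit_intercept=True, copy_X=True)
clf.fit(X, y)
# Draw the classification data. The colors are decided by the cmap argument.
plt.scatter(X[:, 0], X[:, 1], c=y, s=30, cmap=plt.cm.Paired)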
# Draw the decision boundary
x_plot = np.linspace(4, 10, 100)
w = [clf.intercept_, clf.coef_[0], clf.coef_[1]]
y_plot = -(w[1]/w[2]) * x_plot - w[0]/w[2]
plt.plot(x_plot, y_plot)
plt.savefig("5.3.png")
The idea is to run a multiple linear regression with the $X$ created by make_blobs as the features (two of them) and $y$ as the target.
In this example, the number of samples is 40.
In the graph above, the horizontal axis is $x_1$ and the vertical axis is $x_2$. The regression model can be written as

$$
y = w_0 + w_1 \times x_1 + w_2 \times x_2
$$
make_blobs returns labels $y = 0, 1$, but these are changed to $y = -1, 1$ by y = y*2-1.
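The model and the label conversion above can be sketched as follows. This is only an illustration: the names `y_pm`, `reg`, `scores`, `pred` and `accuracy` are mine, and classifying by the sign of the regression output is my assumption about how the fitted model is meant to be used as a classifier.

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=40, centers=2, random_state=6)

# Labels {0, 1} -> {-1, +1}
y_pm = y * 2 - 1

# Least-squares fit of y = w_0 + w_1*x_1 + w_2*x_2
reg = linear_model.LinearRegression(fit_intercept=True)
reg.fit(X, y_pm)

# Regression output computed by hand from the fitted weights
scores = reg.intercept_ + X @ reg.coef_

# Turn the continuous output into a class label by its sign
pred = np.where(scores >= 0, 1, -1)
accuracy = np.mean(pred == y_pm)

print("labels after conversion:", np.unique(y_pm))
print("training accuracy:", accuracy)
```

Using $\pm 1$ labels puts the threshold between the two classes at $y = 0$, which is what makes the boundary formula below so simple.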
The decision boundary can be drawn by setting $y = 0$ and solving for $x_2$:

$$
\begin{aligned}
0 &= w_0 + w_1 \times x_1 + w_2 \times x_2 \\
x_2 &= -\frac{w_0}{w_2} - \frac{w_1}{w_2} x_1
\end{aligned}
$$
This is what the last part of the source code computes.
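As a sanity check on the formula, the sketch below fits the same regression and confirms that points on the line $x_2 = -w_0/w_2 - (w_1/w_2) x_1$ give a model output of zero. The variable names (`reg`, `w0`, `w1`, `w2`, `residual`) are mine, and the code assumes $w_2 \neq 0$.

```python
import numpy as np
from sklearn import linear_model
from sklearn.datasets import make_blobs

X, y = make_blobs(n_samples=40, centers=2, random_state=6)
reg = linear_model.LinearRegression().fit(X, y * 2 - 1)

w0 = reg.intercept_
w1, w2 = reg.coef_

# Points on the decision boundary, using the same x range as the script
x1 = np.linspace(4, 10, 100)
x2 = -(w0 / w2) - (w1 / w2) * x1

# Model output along the boundary; it should vanish up to rounding error
residual = w0 + w1 * x1 + w2 * x2
print("max |y| on the boundary:", np.abs(residual).max())
```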
This is what I drew.
If there are no large outliers, hinge loss and square loss give similar results. However, when there are large outliers, the square loss assigns them an excessively large loss, and the correct boundary can no longer be obtained. This will be explained next time.
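To give a feel for the difference, here is a small numerical sketch comparing the two losses as functions of the margin $m = y f(x)$ with $y = \pm 1$. For such labels the square loss $(y - f(x))^2$ equals $(1 - m)^2$. The function names and the sample margins are chosen by me for illustration.

```python
import numpy as np

def hinge_loss(margin):
    # Zero once a point is on the correct side with margin >= 1
    return np.maximum(0.0, 1.0 - margin)

def square_loss(margin):
    # (y - f)^2 rewritten with m = y*f, valid for y in {-1, +1}
    return (1.0 - margin) ** 2

# Misclassified, on the boundary, on the margin, far on the correct side
margins = np.array([-1.0, 0.0, 1.0, 5.0])

print("hinge :", hinge_loss(margins))   # values: 2, 1, 0, 0
print("square:", square_loss(margins))  # values: 4, 1, 0, 16
```

A point far on the correct side ($m = 5$) costs nothing under hinge loss but 16 under square loss, so a distant point can dominate a least-squares fit even when it is classified correctly.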
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (1)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (2)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (3)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (4)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (5)
University of Tsukuba Machine Learning Course: Study sklearn while creating the Python script part of the assignment (6)
University of Tsukuba Machine Learning Course: Study sklearn while making the Python script part of the task (7) Make your own steepest descent method