The Support Vector Machine (SVM) minimizes the following objective function:
\min_{\bf w}\frac{\lambda}{2}\|{\bf w}\|^2+\frac{1}{m}\sum_{({\bf x},y)\in S}l({\bf w}; ({\bf x}, y)) \\
l({\bf w}; ({\bf x},y))=\max(0,1-y\left<{\bf w},{\bf x} \right>)
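As a quick sanity check of the notation, here is a minimal NumPy sketch that evaluates the hinge loss and the regularized objective for a given weight vector (the function names `hinge_loss` and `objective` are my own, not from the paper):

```python
import numpy as np

def hinge_loss(w, x, y):
    # l(w; (x, y)) = max(0, 1 - y <w, x>)
    return max(0.0, 1.0 - y * np.dot(w, x))

def objective(w, X, y, lmd):
    # (lambda / 2) * ||w||^2 + (1 / m) * sum of hinge losses over S
    m = len(y)
    reg = 0.5 * lmd * np.dot(w, w)
    loss = sum(hinge_loss(w, x_i, y_i) for x_i, y_i in zip(X, y)) / m
    return reg + loss
```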
*Online Machine Learning* presents an algorithm that minimizes this objective function by stochastic gradient descent, but that algorithm does not support the kernel trick.
So, while looking for an SVM that uses the kernel trick and is trained by stochastic gradient descent, I found the paper [Pegasos: Primal Estimated sub-GrAdient SOlver for SVM](http://ttic.uchicago.edu/~nati/Publications/PegasosMPB.pdf).
In this article, I will implement the algorithm shown in the paper.
I implement Fig. 3 of the paper (the kernelized algorithm). The inputs of the algorithm are the training set $S$, the regularization parameter $\lambda$, and the number of iterations $T$. At each iteration $t$, an index $i_t \in \lbrace 1,\dots,|S| \rbrace$ is chosen uniformly at random.
The output of the algorithm is a vector $\alpha_{T+1}$ with $m$ elements. Using the obtained $\alpha_{T+1}$, a prediction is made as follows:
\hat{y}={\bf w}_{T+1}^{\rm T}\phi({\bf x})=\frac{1}{\lambda T}\sum_{j=1}^{m}\alpha_{T+1}[j]y_jK({\bf x}_j,{\bf x})
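For reference, the per-iteration update in Fig. 3 boils down to the following margin check (my paraphrase of the paper; $\alpha_t[j]$ counts how many times example $j$ has triggered an update so far, and all entries other than $i_t$ are carried over unchanged):
\alpha_{t+1}[i_t]=\begin{cases}\alpha_t[i_t]+1 & {\rm if}\ \ y_{i_t}\frac{1}{\lambda t}\sum_{j=1}^{m}\alpha_t[j]y_jK({\bf x}_{i_t},{\bf x}_j)<1 \\ \alpha_t[i_t] & {\rm otherwise}\end{cases}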
I implemented it with a scikit-learn-like interface. I also modified the algorithm to add a bias term. The source code has been uploaded here.
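Concretely (this is my reading of the code below, not something stated in the paper), the bias is handled by appending a constant feature $b$ to every input and running the algorithm on the augmented vectors $\tilde{\bf x}=({\bf x},b)$; for the linear kernel this gives
\left<\tilde{\bf x},\tilde{\bf x}'\right>=\left<{\bf x},{\bf x}'\right>+b^2
so the extra coordinate of ${\bf w}$ plays the role of the bias term.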
SGDSVC.py
```python
import numpy as np


class SGDSVC(object):

    def __init__(self,
                 kernel="rbf", lmd=1e-1, gamma=0.1, bias=1.0, max_iter=100):
        if kernel not in self.__kernel_dict:
            print(kernel + " kernel does not exist!\nUse rbf kernel.")
            kernel = "rbf"
        if kernel == "rbf":
            def kernel_func(x, y):
                return self.__kernel_dict[kernel](x, y, gamma=gamma)
        else:
            kernel_func = self.__kernel_dict[kernel]
        self.kernel = kernel_func
        self.lmd = lmd
        self.bias = bias
        self.max_iter = max_iter

    def __linear_kernel(x, y):
        return np.dot(x, y)

    def __gaussian_kernel(x, y, gamma):
        diff = x - y
        return np.exp(-gamma * np.dot(diff, diff))

    __kernel_dict = {"linear": __linear_kernel, "rbf": __gaussian_kernel}

    def fit(self, X, y):
        def update_alpha(alpha, t):
            data_size, feature_size = np.shape(self.X_with_bias)
            new_alpha = np.copy(alpha)
            # Choose i_t uniformly at random.
            it = np.random.randint(low=0, high=data_size)
            x_it = self.X_with_bias[it]
            y_it = self.y[it]
            # Margin check of Fig. 3: increment alpha[i_t] only on a violation.
            if (y_it *
                    (1. / (self.lmd * t)) *
                    sum([alpha_j * y_j * self.kernel(x_it, x_j)
                         for x_j, y_j, alpha_j in zip(
                             self.X_with_bias, self.y, alpha)])) < 1.:
                new_alpha[it] += 1
            return new_alpha

        # Append the bias term as an extra constant feature.
        self.X_with_bias = np.c_[X, np.ones(np.shape(X)[0]) * self.bias]
        self.y = np.asarray(y)
        alpha = np.zeros(np.shape(self.X_with_bias)[0])
        for t in range(1, self.max_iter + 1):
            alpha = update_alpha(alpha, t)
        self.alpha = alpha

    def decision_function(self, X):
        X_with_bias = np.c_[X, np.ones(np.shape(X)[0]) * self.bias]
        # w_{T+1}^T phi(x) = (1 / (lambda * T)) * sum_j alpha[j] * y_j * K(x_j, x)
        y_score = [(1. / (self.lmd * self.max_iter)) *
                   sum([alpha_j * y_j * self.kernel(x_j, x)
                        for (x_j, y_j, alpha_j) in zip(
                            self.X_with_bias, self.y, self.alpha)])
                   for x in X_with_bias]
        return np.array(y_score)

    def predict(self, X):
        y_score = self.decision_function(X)
        return np.where(y_score >= 0., 1, -1)
```
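For reference, the class can also be used with the linear kernel. Here is a small toy example of my own (not from the original article), using only the constructor parameters defined above:

```python
import numpy as np
import SGDSVC

# Toy linearly separable data: the class is the sign of the first feature.
rng = np.random.RandomState(0)
X = rng.randn(40, 2)
y = np.where(X[:, 0] >= 0., 1, -1)

clf = SGDSVC.SGDSVC(kernel="linear", lmd=0.1, max_iter=200)
clf.fit(X, y)
print(clf.predict(X[:5]))
```

Since the decision function sums over all training points, prediction cost grows with the size of the training set, which is the usual price of the kernel expansion.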
Here is an example prediction for the iris dataset.
To sanity-check the implementation, the training data itself is reused as the test data.
The parameters are $\lambda = 0.1$ and $T = 100$.
main.py
```python
# -*- coding: utf-8 -*-
import numpy as np
import SGDSVC
from sklearn import datasets
from sklearn import preprocessing
from sklearn import metrics

if __name__ == '__main__':
    iris = datasets.load_iris()
    X = iris.data[:100]
    y = iris.target[:100]
    # Relabel to 1 / -1
    y = np.where(y == 1, 1, -1)
    # Scaling
    scaler = preprocessing.StandardScaler()
    X = scaler.fit_transform(X)

    clf = SGDSVC.SGDSVC()
    clf.fit(X, y)
    y_pred = clf.predict(X)
    print("Confusion Matrix")
    print(metrics.confusion_matrix(y, y_pred))
    print(metrics.classification_report(y, y_pred))
```
The execution result is as follows.
```
Confusion Matrix
[[50  0]
 [ 0 50]]
             precision    recall  f1-score   support

         -1       1.00      1.00      1.00        50
          1       1.00      1.00      1.00        50

avg / total       1.00      1.00      1.00       100
```
As mentioned above, this is the result of reusing the training data as the test data; you can see that every data point is classified correctly.
Passive Aggressive is well known as an online variant of SVM; Section 1 of the paper describes how Pegasos differs from PA. Also, the name Pegasos is really cool.
(2015/7/8) Corrected the prediction formula because it was slightly wrong. (2015/7/10) Corrected Japanese. (2016/1/15) Corrected the link of the cited paper.