Step by step, I will study the theory, the implementation in Python, and the analysis using scikit-learn of the algorithms taken up in "Classification of Machine Learning". I am writing this for my own learning, so please overlook any mistakes.
Last time, I extended two-class classification to multi-class classification. This time I will actually implement it in Python.
I referred to the following sites. Thank you very much.
I would like to extend the logistic regression implemented before to multiple classes. I will try two approaches: One-vs-Rest and softmax (multinomial logistic regression).
The iris dataset is used for classification. It has four features (sepal_length, sepal_width, petal_length, petal_width), and the task is to classify samples into three classes (setosa, versicolor, virginica).
Below, for clarity, we implement the classification using only sepal_length and sepal_width.
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.datasets import load_iris
sns.set()
iris = sns.load_dataset("iris")
ax = sns.scatterplot(x=iris.sepal_length, y=iris.sepal_width,
                     hue=iris.species, style=iris.species)
One-vs-Rest
One-vs-Rest builds a two-class classifier for each class label and finally adopts the most plausible prediction. Since logistic regression outputs a probability, the class of the classifier with the highest probability is adopted.
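Writing the same idea as a formula, with $\boldsymbol{w}_k$ denoting the weights learned by the classifier for class $k$ and $\sigma$ the sigmoid function, the predicted class is

$$
\hat{y} = \mathop{\rm arg\,max}_{k}\ \sigma(\boldsymbol{w}_k^T \boldsymbol{x}), \qquad \sigma(a) = \frac{1}{1 + e^{-a}}
$$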
We use a `LogisticRegression` class, a slightly modified version of the logistic regression code from last time. I added a `predict_proba` method, because One-vs-Rest decides which class to adopt based on the predicted probability.
from scipy import optimize

class LogisticRegression:
    def __init__(self):
        self.w = None

    def sigmoid(self, a):
        return 1.0 / (1 + np.exp(-a))

    def predict_proba(self, x):
        # Prepend the bias term, then return the probability of the positive class
        x = np.hstack([1, x])
        return self.sigmoid(self.w.T @ x)

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else -1

    def cross_entropy_loss(self, w, *args):
        def safe_log(x, minval=0.0000000001):
            # Clip to avoid log(0)
            return np.log(x.clip(min=minval))

        t, x = args
        loss = 0
        for i in range(len(t)):
            ti = 1 if t[i] > 0 else 0
            h = self.sigmoid(w.T @ x[i])
            loss += -ti * safe_log(h) - (1 - ti) * safe_log(1 - h)
        return loss / len(t)

    def grad_cross_entropy_loss(self, w, *args):
        t, x = args
        grad = np.zeros_like(w)
        for i in range(len(t)):
            ti = 1 if t[i] > 0 else 0
            h = self.sigmoid(w.T @ x[i])
            grad += (h - ti) * x[i]
        return grad / len(t)

    def fit(self, x, y):
        # Initial weights (including bias), optimized with conjugate gradient
        w0 = np.ones(len(x[0]) + 1)
        x = np.hstack([np.ones((len(x), 1)), x])
        self.w = optimize.fmin_cg(self.cross_entropy_loss, w0,
                                  fprime=self.grad_cross_entropy_loss, args=(y, x))

    @property
    def w_(self):
        return self.w
Next, implement the One-vs-Rest class. I also implemented an `accuracy_score` method to compute the accuracy, since I will use it later to compare algorithms.
from sklearn.metrics import accuracy_score

class OneVsRest:
    def __init__(self, classifier, labels):
        self.classifier = classifier
        self.labels = labels
        # One binary classifier per class label
        self.classifiers = [classifier() for _ in range(len(self.labels))]

    def fit(self, x, y):
        y = np.array(y)
        for i in range(len(self.labels)):
            # Relabel: 1 for the current class, 0 for all the others
            y_ = np.where(y == self.labels[i], 1, 0)
            self.classifiers[i].fit(x, y_)

    def predict(self, x):
        # Adopt the class whose classifier outputs the highest probability
        probas = [self.classifiers[i].predict_proba(x) for i in range(len(self.labels))]
        return np.argmax(probas)

    def accuracy_score(self, x, y):
        pred = [self.labels[self.predict(i)] for i in x]
        acc = accuracy_score(y, pred)
        return acc
Now let's actually classify the data shown earlier.
model = OneVsRest(LogisticRegression, np.unique(iris.species))
x = iris[['sepal_length', 'sepal_width']].values
y = iris.species
model.fit(x, y)
print("accuracy_score: {}".format(model.accuracy_score(x,y)))
accuracy_score: 0.8066666666666666
An accuracy of about 81% is not very good. Let's visualize how the data was classified.
We use matplotlib's `contourf` method for visualization, coloring each grid point according to the class it is classified into.
from matplotlib.colors import ListedColormap

x_min = iris.sepal_length.min()
x_max = iris.sepal_length.max()
y_min = iris.sepal_width.min()
y_max = iris.sepal_width.max()

x = np.linspace(x_min, x_max, 100)
y = np.linspace(y_min, y_max, 100)

# Predict the class for every point on the grid
data = []
for i in range(len(y)):
    data.append([model.predict([x[j], y[i]]) for j in range(len(x))])

xx, yy = np.meshgrid(x, y)
cmap = ListedColormap(('blue', 'orange', 'green'))
plt.contourf(xx, yy, data, alpha=0.25, cmap=cmap)
ax = sns.scatterplot(x=iris.sepal_length, y=iris.sepal_width,
                     hue=iris.species, style=iris.species)
plt.show()
As you can see, setosa is classified properly, but the remaining two classes are mixed together, which is why the accuracy is a little low. For now, this will do.
Next, let's implement a `LogisticRegressionMulti` class that does softmax-based logistic regression.
The cross-entropy error is used as the loss function, and the parameters are learned by plain (steepest-descent) gradient descent. I'm sorry it is a rather rough implementation.
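For reference, these are the formulas the class below implements: with one-hot targets $t_{nk}$ (the `y` passed to the class), softmax outputs $p_{nk}$, and $N$ samples,

$$
p_{nk} = \frac{\exp(\boldsymbol{w}_k^T \boldsymbol{x}_n)}{\sum_{j}\exp(\boldsymbol{w}_j^T \boldsymbol{x}_n)}, \qquad
E(W) = -\frac{1}{N}\sum_{n}\sum_{k} t_{nk}\log p_{nk}, \qquad
\nabla_W E = \frac{1}{N}\, X^T (P - T)
$$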
from sklearn.metrics import accuracy_score

class LogisticRegressionMulti:
    def __init__(self, labels, n_iter=1000, eta=0.01):
        self.w = None
        self.labels = labels
        self.n_iter = n_iter
        self.eta = eta
        self.loss = np.array([])

    def softmax(self, a):
        if a.ndim == 1:
            return np.exp(a) / np.sum(np.exp(a))
        else:
            return np.exp(a) / np.sum(np.exp(a), axis=1)[:, np.newaxis]

    def cross_entropy_loss(self, w, *args):
        x, y = args

        def safe_log(x, minval=0.0000000001):
            # Clip to avoid log(0)
            return np.log(x.clip(min=minval))

        p = self.softmax(x @ w)
        loss = -np.sum(y * safe_log(p))
        return loss / len(x)

    def grad_cross_entropy_loss(self, w, *args):
        x, y = args
        p = self.softmax(x @ w)
        grad = -(x.T @ (y - p))
        return grad / len(x)

    def fit(self, x, y):
        # One weight vector per class, plus a bias term
        self.w = np.ones((len(x[0]) + 1, len(self.labels)))
        x = np.hstack([np.ones((len(x), 1)), x])
        # Full-batch gradient descent, recording the loss at every step
        for i in range(self.n_iter):
            self.loss = np.append(self.loss, self.cross_entropy_loss(self.w, x, y))
            grad = self.grad_cross_entropy_loss(self.w, x, y)
            self.w -= self.eta * grad

    def predict(self, x):
        x = np.hstack([1, x])
        return np.argmax(self.softmax(x @ self.w))

    def accuracy_score(self, x, y):
        pred = [self.predict(i) for i in x]
        y_ = np.argmax(y, axis=1)
        acc = accuracy_score(y_, pred)
        return acc

    @property
    def loss_(self):
        return self.loss
The input to `LogisticRegressionMulti` uses one-hot encoded labels. This is easy with pandas' `get_dummies`. (I realized only afterwards that I should have called `get_dummies` inside the class.)
model = LogisticRegressionMulti(np.unique(iris.species), n_iter=10000, eta=0.1)
x = iris[['sepal_length', 'sepal_width']].values
y = pd.get_dummies(iris['species']).values
model.fit(x, y)
print("accuracy_score: {}".format(model.accuracy_score(x, y)))
accuracy_score: 0.8266666666666667
The accuracy is about 83%. Looking at the loss history, it appears to have converged, so this is about as far as it goes.
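(The loss-history plot itself is not reproduced here. Since the class keeps the per-iteration loss in `loss_`, a minimal sketch like the following can be used to check convergence.)

plt.plot(model.loss_)
plt.xlabel("iteration")
plt.ylabel("cross entropy loss")
plt.show()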
Also, let's color the classification regions in the same way as before.
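The plotting code for this figure is omitted; a minimal sketch that reuses `x_min`, `x_max`, `y_min`, `y_max`, and `cmap` from the One-vs-Rest plot (note that `LogisticRegressionMulti.predict` already returns a class index) would be:

x = np.linspace(x_min, x_max, 100)
y = np.linspace(y_min, y_max, 100)
# Predict the class index for every point on the grid
data = []
for i in range(len(y)):
    data.append([model.predict([x[j], y[i]]) for j in range(len(x))])
xx, yy = np.meshgrid(x, y)
plt.contourf(xx, yy, data, alpha=0.25, cmap=cmap)
ax = sns.scatterplot(x=iris.sepal_length, y=iris.sepal_width,
                     hue=iris.species, style=iris.species)
plt.show()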
Finally, using all four features, let's compare the classifiers built this time with scikit-learn's LogisticRegression class. The results are summarized in the table below (a sketch of the comparison code follows the table).
| Method | accuracy_score |
| --- | --- |
| OneVsRest | 0.98 |
| LogisticRegressionMulti | 0.98 |
| sklearn LogisticRegression | 0.973 |
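The code for this comparison is not included here. A minimal sketch, assuming everything is trained and evaluated on the full dataset with the same settings as above (the alias `SklearnLogisticRegression` and the `model_*` variable names are mine, introduced to avoid clashing with our own `LogisticRegression` class), might look like this:

from sklearn.linear_model import LogisticRegression as SklearnLogisticRegression

x = iris[['sepal_length', 'sepal_width', 'petal_length', 'petal_width']].values
y = iris.species

# One-vs-Rest with the LogisticRegression class defined above
model_ovr = OneVsRest(LogisticRegression, np.unique(iris.species))
model_ovr.fit(x, y)
print("OneVsRest:", model_ovr.accuracy_score(x, y))

# Softmax version (needs one-hot labels)
model_multi = LogisticRegressionMulti(np.unique(iris.species), n_iter=10000, eta=0.1)
y_onehot = pd.get_dummies(iris.species).values
model_multi.fit(x, y_onehot)
print("LogisticRegressionMulti:", model_multi.accuracy_score(x, y_onehot))

# scikit-learn's implementation (default settings), evaluated on the same data
model_sk = SklearnLogisticRegression()
model_sk.fit(x, y)
print("sklearn LogisticRegression:", model_sk.score(x, y))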
Even with this implementation, it seems we can get a good score, at least for classifying irises.
In this article, I implemented multi-class classification using logistic regression. I think other classifiers can be extended to multiple classes in a similar way. In particular, softmax-based multi-class classification is widely used in neural networks, so understanding the theory here should prove useful later.