Machine learning algorithm (logistic regression)

Introduction

This is a step-by-step study of the theory, a Python implementation, and an analysis with scikit-learn of one of the algorithms previously covered in "Classification of Machine Learning". I'm writing this for my own learning, so please forgive any mistakes.

This time the topic is **logistic regression**. Although it is called regression, logistic regression is an algorithm for binary classification, like the perceptron.

I referred to the following sites. Thank you very much.

Theory

Regarding the theory of logistic regression, let's first derive the activation function, the **sigmoid function**.

Sigmoid function

Since logistic regression is a binary classification, consider two classes $C_1$ and $C_2$. The probability $P(C_1)$ of $C_1$ and the probability $P(C_2)$ of $C_2$ sum to 1.

The probability of class $C_1$ given the data $\boldsymbol{x}$ follows from **Bayes' theorem**:

\begin{align}
P(C_1|\boldsymbol{x})&=\frac{P(\boldsymbol{x}|C_1)P(C_1)}{P(\boldsymbol{x})} \\
&= \frac{P(\boldsymbol{x}|C_1)P(C_1)}{P(\boldsymbol{x}|C_1)P(C_1)+P(\boldsymbol{x}|C_2)P(C_2)} \\
&= \frac{1}{1+\frac{P(\boldsymbol{x}|C_2)P(C_2)}{P(\boldsymbol{x}|C_1)P(C_1)}} \\
&= \frac{1}{1+\exp(-\ln\frac{P(\boldsymbol{x}|C_1)P(C_1)}{P(\boldsymbol{x}|C_2)P(C_2)})} \\
&= \frac{1}{1+\exp(-a)} = \sigma(a)
\end{align}

This $\sigma(a)$ is called the **sigmoid function**. As shown below, the sigmoid function takes values between 0 and 1, which makes it a convenient function for expressing a probability.

(Figure: the sigmoid function)
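
A minimal sketch to reproduce the curve (assuming NumPy and matplotlib; not part of the original post):

import numpy as np
import matplotlib.pyplot as plt

a = np.linspace(-10, 10, 200)
sigma = 1.0 / (1.0 + np.exp(-a))  # the sigmoid stays within (0, 1)

plt.plot(a, sigma)
plt.xlabel('a')
plt.ylabel('sigma(a)')
plt.show()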

Logistic regression model

Using the given data $\boldsymbol{x}=(x_0, x_1, \cdots, x_{n-1})$ and the teacher labels $\boldsymbol{t}=(t_0, t_1, \cdots, t_{n-1})$, we will optimize the parameters $\boldsymbol{w}=(w_0, w_1, \cdots, w_m)$ of the model

L(\boldsymbol{x})=\frac{1}{1+\exp(-\boldsymbol{w}^T\boldsymbol{x})}

Cross entropy error

When data $x_i$ is given, let $p_i$ denote the probability $P(C_1|x_i)$ of class $C_1$; the probability $P(C_2|x_i)$ of class $C_2$ is then $(1-p_i)$. That is, the probability $P(t_i|x_i)$ of the label $t_i$ is $P(t_i|x_i)=p_i^{t_i}(1-p_i)^{1-t_i}$.

Applying this to all the data gives

\begin{align}
P(\boldsymbol{t}|\boldsymbol{x})&=P(t_0|x_0)P(t_1|x_1)\cdots P(t_{n-1}|x_{n-1}) \\
&=\prod_{i=0}^{n-1}P(t_i|x_i) \\
&=\prod_{i=0}^{n-1}p_i^{t_i}(1-p_i)^{1-t_i}
\end{align}

Taking the logarithm of both sides,

\log P(\boldsymbol{t}|\boldsymbol{x}) = \sum_{i=0}^{n-1}\{t_i\log p_i+(1-t_i)\log (1-p_i)\}

This is called the **log-likelihood**. Maximizing the log-likelihood is equivalent to minimizing its negative, so we invert the sign (and normalize by $n$):

E(\boldsymbol{x}) = -\frac{1}{n}\log P(\boldsymbol{t}|\boldsymbol{x}) = \frac{1}{n}\sum_{i=0}^{n-1}\{-t_i\log p_i-(1-t_i)\log (1-p_i)\}

This $E$ is called the **cross-entropy error function**. Since we will need it later, the gradient of $E$ is

\frac{\partial{E}}{\partial{\boldsymbol{w}}}=\frac{1}{n}\sum_{i=0}^{n-1}(p_i-t_i)\boldsymbol{x}_i

(Derivation omitted; the key step is that the sigmoid satisfies $\sigma'(a)=\sigma(a)(1-\sigma(a))$.)
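
As a sanity check on this formula (an illustration that is not in the original article; all names below are made up), the analytic gradient can be compared with a central-difference numerical gradient on toy data:

import numpy as np

def sigmoid(a):
  return 1.0 / (1.0 + np.exp(-a))

def loss(w, x, t):
  # cross-entropy error E
  p = sigmoid(x @ w)
  return np.mean(-t * np.log(p) - (1 - t) * np.log(1 - p))

def grad(w, x, t):
  # analytic gradient: (1/n) * sum_i (p_i - t_i) x_i
  p = sigmoid(x @ w)
  return (x.T @ (p - t)) / len(t)

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 3))
t = np.array([0, 1, 1, 0, 1])
w = rng.normal(size=3)

eps = 1e-6
numerical = np.array([(loss(w + eps * np.eye(3)[j], x, t) -
                       loss(w - eps * np.eye(3)[j], x, t)) / (2 * eps)
                      for j in range(3)])
print(np.allclose(numerical, grad(w, x, t)))  # expected: True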

Conjugate gradient method

Now, to minimize the cross-entropy error function, we use the gradient methods mentioned previously. The steepest descent method or stochastic gradient descent would also work, but here we use the **conjugate gradient method**. For details, see [Wikipedia: Conjugate gradient method](https://ja.wikipedia.org/wiki/%E5%85%B1%E5%BD%B9%E5%8B%BE%E9%85%8D%E6%B3%95); it converges faster than steepest descent and does not require setting a learning rate. I would like to implement it myself in Python, but that is tedious (!), so we use the library function [scipy.optimize.fmin_cg](https://docs.scipy.org/doc/scipy-0.13.0/reference/generated/scipy.optimize.fmin_cg.html).
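
As a minimal sketch of the fmin_cg interface on a toy quadratic (an illustration, not part of the original article):

import numpy as np
from scipy import optimize

def f(w):
  # f(w) = ||w - 1||^2 is minimized at w = (1, 1, 1)
  return np.sum((w - 1.0) ** 2)

def grad_f(w):
  return 2.0 * (w - 1.0)

w0 = np.zeros(3)
w_opt = optimize.fmin_cg(f, w0, fprime=grad_f)
print(w_opt)  # should be close to [1. 1. 1.]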

Implementation in Python

We will implement a LogisticRegression class using the theory so far. fmin_cg gives better results when a gradient function is supplied, so we pass the gradient via the fprime argument.

import numpy as np
from scipy import optimize

class LogisticRegression:
  def __init__(self):
    self.w = np.array([])

  def sigmoid(self, a):
    return 1.0 / (1 + np.exp(-a))
  
  def cross_entropy_loss(self, w, *args):
    def safe_log(x, minval=0.0000000001):
      return np.log(x.clip(min=minval))
    t, x = args
    loss = 0
    for i in range(len(t)):
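      # convert the labels from {-1, 1} to {0, 1}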
      ti = (t[i]+1)/2
      h = self.sigmoid(w.T @ x[i])
      loss += -ti*safe_log(h) - (1-ti)*safe_log(1-h)

    return loss/len(t)

  def grad_cross_entropy_loss(self, w, *args):
    t, x = args
    grad = np.zeros_like(w)
    for i in range(len(t)):
      ti = (t[i]+1)/2
      h = self.sigmoid(w.T @ x[i])
      grad += (h - ti) * x[i]

    return grad/len(t)

  def fit(self, x, y):
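    # w0 has one extra weight for the bias; a column of ones is prepended to x to match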
    w0 = np.ones(len(x[0])+1)
    x = np.hstack([np.ones((len(x),1)), x])

    self.w = optimize.fmin_cg(self.cross_entropy_loss, w0, fprime=self.grad_cross_entropy_loss, args=(y, x))

  @property
  def w_(self):
    return self.w

Let's use this class to classify the iris data and draw the decision boundary as well. The boundary is the line $\boldsymbol{w}^T\boldsymbol{x}=0$, i.e. $w_0+w_1x_1+w_2x_2=0$, which rearranges to $x_2=-\frac{w_1}{w_2}x_1-\frac{w_0}{w_2}$. Since the two classes are labeled 1 and -1 here, the code converts them to 0 and 1 internally.
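
The code below assumes df_iris, StandardScaler, and matplotlib from the earlier articles in this series. A minimal setup sketch (my assumption; the original prepares the data in a previous post):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler

iris = load_iris()
df_iris = pd.DataFrame(iris.data, columns=iris.feature_names)
# map the numeric targets to the species names used below
df_iris['target'] = pd.Series(iris.target).map({0: 'setosa', 1: 'versicolor', 2: 'virginica'})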

df = df_iris[df_iris['target']!='setosa']
df = df.drop(df.columns[[1,2]], axis=1)
df['target'] = df['target'].map({'versicolor':1, 'virginica':-1})

# Draw the graph
fig, ax = plt.subplots()

df_versicolor = df_iris[df_iris['target']=='versicolor']

x1 = df_iris[df_iris['target']=='versicolor'].iloc[:,3].values
y1 = df_iris[df_iris['target']=='versicolor'].iloc[:,0].values

x2 = df_iris[df_iris['target']=='virginica'].iloc[:,3].values
y2 = df_iris[df_iris['target']=='virginica'].iloc[:,0].values

xs = StandardScaler()
ys = StandardScaler()

xs.fit(np.append(x1,x2).reshape(-1, 1))
ys.fit(np.append(y1,y2).reshape(-1, 1))

x1s = xs.transform(x1.reshape(-1, 1))
x2s = xs.transform(x2.reshape(-1, 1))
y1s = ys.transform(y1.reshape(-1, 1))
y2s = ys.transform(y2.reshape(-1, 1))

x = np.concatenate([np.concatenate([x1s, y1s], axis=1), np.concatenate([x2s, y2s], axis=1)])

y = df['target'].values

model = LogisticRegression()
model.fit(x, y)

ax.scatter(x1s, y1s, color='red', marker='o', label='versicolor')
ax.scatter(x2s, y2s, color='blue', marker='s', label='virginica')

ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")

# Draw the classification boundary
w = model.w_
x_fig = np.linspace(-2.,2.,100)
y_fig = [-w[1]/w[2]*xi-w[0]/w[2] for xi in x_fig]
ax.plot(x_fig, y_fig)
ax.set_ylim(-2.5,2.5)

ax.legend()
print(w)
plt.show()

Optimization terminated successfully.
         Current function value: 0.166434
         Iterations: 12
         Function evaluations: 41
         Gradient evaluations: 41
[-0.57247091 -5.42865492 -0.20202263]
(Figure: logistic_regression_1.png — scatter plot with the learned decision boundary)

The two classes appear to be separated fairly cleanly.

Implementation with scikit-learn

scikit-learn also provides a LogisticRegression class, so the code is almost the same as above. Note that scikit-learn applies L2 regularization by default; C is the inverse of the regularization strength, so C=100 keeps the regularization weak.

from sklearn.linear_model import LogisticRegression

df = df_iris[df_iris['target']!='setosa']
df = df.drop(df.columns[[1,2]], axis=1)
df['target'] = df['target'].map({'versicolor':1, 'virginica':-1})

# Draw the graph
fig, ax = plt.subplots()

df_versicolor = df_iris[df_iris['target']=='versicolor']

x1 = df_iris[df_iris['target']=='versicolor'].iloc[:,3].values
y1 = df_iris[df_iris['target']=='versicolor'].iloc[:,0].values

x2 = df_iris[df_iris['target']=='virginica'].iloc[:,3].values
y2 = df_iris[df_iris['target']=='virginica'].iloc[:,0].values

xs = StandardScaler()
ys = StandardScaler()

xs.fit(np.append(x1,x2).reshape(-1, 1))
ys.fit(np.append(y1,y2).reshape(-1, 1))

x1s = xs.transform(x1.reshape(-1, 1))
x2s = xs.transform(x2.reshape(-1, 1))
y1s = ys.transform(y1.reshape(-1, 1))
y2s = ys.transform(y2.reshape(-1, 1))

x = np.concatenate([np.concatenate([x1s, y1s], axis=1), np.concatenate([x2s, y2s], axis=1)])

y = df['target'].values

model = LogisticRegression(C=100)
model.fit(x, y)

ax.scatter(x1s, y1s, color='red', marker='o', label='versicolor')
ax.scatter(x2s, y2s, color='blue', marker='s', label='virginica')

ax.set_xlabel("petal width (cm)")
ax.set_ylabel("sepal length (cm)")

# Draw the classification boundary
w = model.coef_[0]

x_fig = np.linspace(-2.,2.,100)
y_fig = [-w[0]/w[1]*xi-model.intercept_/w[1] for xi in x_fig]
ax.plot(x_fig, y_fig)
ax.set_ylim(-2.5,2.5)

ax.legend()
plt.show()
(Figure: logistic_regression_2.png — scatter plot with the learned decision boundary)

This also classifies the data well.
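
As a quick check (not in the original article), the training accuracy can also be obtained with scikit-learn's score method:

print(model.score(x, y))  # mean accuracy on the training data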

Summary

We have summarized logistic regression, which is (I believe) relatively important in the world of machine learning. The theory starts to get more difficult from around here.
