I tried to deepen my understanding by building my own binary classifier using logistic regression.

Introduction

In this post, I summarize what I learned by implementing logistic regression myself, without using a framework such as scikit-learn.

The outline is below.

Regression analysis and logistic regression

Before discussing logistic regression, let's review regression analysis. Regression expresses an objective variable $y$ in terms of explanatory variables $x$, where $y$ takes continuous values. If $x$ is one-dimensional, it is called simple regression; if it is two-dimensional or more, it is called multiple regression. Typical examples are predicting land prices or the number of visitors to a store: both land prices and visitor counts are continuous values.

Logistic regression, on the other hand, is a method of estimating the probability of belonging to a class (for example, whether an email is spam) from explanatory variables. As with linear regression, it performs a linear computation on the explanatory variables. However, instead of outputting that result as it is, **it returns the logistic of the result.**

"Logistic" here means squashing the output into the range 0 to 1. The function used for this in logistic regression is called the sigmoid function.

f(x) = \frac{1}{1+e^{-x}} \\
x = \beta_0 \alpha_0 + \beta_1

(Figure: plot of the sigmoid function)

$\beta_0 \alpha_0 + \beta_1$ is the regression equation used in linear regression ($\beta_0, \beta_1$ are constants, $\alpha_0$ is a variable); here, as an example, it is a linear expression in $\alpha_0$. $f(x)$ returns a value between 0 and 1.
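
As a quick illustration (a minimal sketch of my own, not part of the original notebook), you can check in Python that the sigmoid squashes any real input into the interval from 0 to 1:

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))  # maps any real number into (0, 1)

print(sigmoid(-10), sigmoid(0), sigmoid(10))  # approx. 4.5e-05, 0.5, 0.99995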

The difference between linear regression and logistic regression is briefly summarized below.

(Figure: comparison of linear regression and logistic regression)

Implementing logistic regression to understand it

Now, let's proceed to implement logistic regression ourselves, without using a framework such as scikit-learn. The steps are:

  1. Generate 100 random points (2D).
  2. Classify the 100 points by the sign of a straight line, here $f(x, y) = 2x + 3y - 1$: class 1 if $f(x, y) > 0$, class 0 otherwise.
  3. Create a classifier with parameters $w$ (a 3D vector). Pass the inner product $w \cdot \phi$ with the basis function $\phi = (x, y, 1)$ through the sigmoid function to output a value between 0 and 1.
  4. Update the parameters from step 3 with stochastic gradient descent so that the classifier can reproduce $f(x, y)$.
  5. Confirm that the classifier reproduces the boundary given by $f(x, y)$.

Generate 100 random points (2D)

First, generate 100 random points in the two-dimensional plane using np.random.randn().

logistic.ipynb


import numpy as np
import matplotlib.pyplot as plt

N = 100  # number of data points
np.random.seed(0)  # fix the random seed so the data points are reproducible
X = np.random.randn(N, 2)  # random N x 2 matrix = N random points in 2D space

Classify the 100 points by the sign of the straight line $f(x, y) = 2x + 3y - 1$

Next, define the straight line (here $f(x, y) = 2x + 3y - 1$) and classify the 100 random points according to whether $f(x, y) > 0$ or $f(x, y) < 0$.

logistic.ipynb


def f(x, y):
    return 2 * x + 3 * y - 1  # true separating line: 2x + 3y = 1

T = np.array([1 if f(x, y) > 0 else 0 for x, y in X])  # teacher labels (0 or 1)
plt.figure(figsize=(6, 6))
plt.plot(X[T==1,0], X[T==1,1], 'o', color='red')
plt.plot(X[T==0,0], X[T==0,1], 'o', color='blue')
plt.show()

(Figure: the 100 points, colored red for class 1 and blue for class 0)

You can see the straight boundary that we want the classifier to learn.

Create a classifier $w$ (a 3D vector) and output a value between 0 and 1 by passing the inner product $w \cdot \phi$ with the basis function $\phi = (x, y, 1)$ through the sigmoid function

Next, generate a 3D vector $w$ as the classifier's parameters, and define the basis function $\phi = (x, y, 1)$. We then take the inner product $w \cdot \phi$ (multiply component by component and sum). In a linear regression problem, this inner product itself would be the prediction. In **logistic regression, the point is that the prediction is the value between 0 and 1 obtained by feeding this inner product into the sigmoid function.** The actual implementation is as follows.

logistic.ipynb


np.random.seed()  # re-seed so the initial parameters are random
w = np.random.randn(3)  # randomly initialize the parameters

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def phi(x, y):  # basis function
    return np.array([x, y, 1])

seq = np.arange(-3, 3, 0.1)
xlist, ylist = np.meshgrid(seq, seq)
zlist = sigmoid(w[0] * xlist + w[1] * ylist + w[2])  # w . phi fed into the sigmoid, evaluated over the grid
plt.imshow(zlist, extent=[-3,3,-3,3], origin='lower', cmap='bwr')
plt.show()

(Figure: decision regions of the untrained classifier)

The distribution produced by the (still untrained) classifier is shown above. Regions where the output is roughly 0.5 or more (predicted class 1) are red, and regions at 0.5 or less (predicted class 0) are blue. If we can make these regions match the regions determined earlier by $f(x, y)$, we have succeeded.

Update the parameters from step 3 with stochastic gradient descent so that the classifier reproduces $f(x, y)$

Now let's update the classifier's parameters with stochastic gradient descent. I explained stochastic gradient descent in the context of neural networks in the article below.

I tried to understand the learning function of neural networks carefully without using a machine learning library (second half) https://qiita.com/Fumio-eisan/items/7507d8687ca651ab301d

Now, here is the parameter update formula for stochastic gradient descent in logistic regression.

\begin{align}
w_{i+1} &= w_i - \eta \cdot \frac{\partial E}{\partial w} \\
&= w_i - \eta \cdot (y_n - t_n)\phi(x_n)
\end{align}

Here $w$ is the parameter vector of the classifier, $\eta$ is the learning rate, $y$ is the probability between 0 and 1 obtained from the sigmoid function, $t$ is the teacher label (0 or 1), and $\phi(x)$ is the basis function.

I jumped through the derivation, but the transformation below **is particular to logistic regression.**

\frac{\partial E}{\partial w} = (y_n - t_n)\phi(x_n)

In neural networks and the like, computing the gradient of the loss function is mathematically involved and there is a concern that the amount of computation grows, which is why techniques such as backpropagation are used. In logistic regression, by exploiting the properties of the sigmoid function, the gradient can be written as this surprisingly simple expression.
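
As a sketch of why this holds (my own summary of the standard calculation, assuming the cross-entropy loss $E = -\{t_n \ln y_n + (1-t_n) \ln (1-y_n)\}$ with $y_n = f(w \cdot \phi(x_n))$ and the sigmoid property $f'(a) = f(a)(1-f(a))$):

\begin{align}
\frac{\partial E}{\partial w} &= \frac{\partial E}{\partial y_n} \cdot \frac{\partial y_n}{\partial w} \\
&= \frac{y_n - t_n}{y_n(1-y_n)} \cdot y_n(1-y_n)\phi(x_n) \\
&= (y_n - t_n)\phi(x_n)
\end{align}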

The derivation is explained very carefully at the URL below, so please refer to it for the details.

Reference URL http://gihyo.jp/dev/serial/01/machine-learning/0019

The actual implementation is as follows. The learning rate is initially set to 0.1, and it is gradually decreased each iteration to make convergence easier.

logistic.ipynb



eta = 0.1  # initial learning rate

for i in range(len(xlist)):  # epochs (the grid size is reused as the epoch count here)
    for n in range(N):
        x_n, y_n = X[n, :]
        t_n = T[n]

        # predicted probability for this point
        feature = phi(x_n, y_n)
        predict = sigmoid(np.inner(w, feature))
        w -= eta * (predict - t_n) * feature  # SGD update: w <- w - eta * (y - t) * phi

    # reduce the learning rate each epoch
    eta *= 0.9
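
As a quick sanity check (my own addition, not in the original notebook), we can measure the fraction of the 100 points that the trained classifier labels correctly:

predicted_labels = (sigmoid(X @ w[:2] + w[2]) > 0.5).astype(int)  # threshold the predicted probabilities at 0.5
print("training accuracy:", (predicted_labels == T).mean())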

Finally, let's recompute the predicted distribution with the trained parameters and draw it together with the data points.

logistic.ipynb


# recompute the predicted distribution with the trained parameters
zlist = sigmoid(w[0] * xlist + w[1] * ylist + w[2])

# draw the scatter plot and the predicted distribution
plt.figure(figsize=(6, 6))
plt.imshow(zlist, extent=[-3,3,-3,3], origin='lower', cmap='GnBu')
plt.plot(X[T==1,0], X[T==1,1], 'o', color='red')
plt.plot(X[T==0,0], X[T==0,1], 'o', color='blue')
plt.show()

(Figure: the trained classifier's decision regions together with the data points)

We have successfully trained a classifier that separates the red and blue regions of the random points.
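
As one final cross-check (my own addition; the whole point of this article is to avoid frameworks, but a comparison is reassuring), scikit-learn's LogisticRegression should learn an essentially equivalent boundary from the same data:

from sklearn.linear_model import LogisticRegression

clf = LogisticRegression().fit(X, T)  # fit on the same 100 points and teacher labels
print("scikit-learn accuracy:", clf.score(X, T))
print("coefficients:", clf.coef_, "intercept:", clf.intercept_)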

In closing

This time, I built my own logistic regression classifier. It was interesting to follow how the loss function is optimized mathematically.

The full program is here. https://github.com/Fumio-eisan/logistic_20200411
