This time, I summarize what I learned by implementing logistic regression from scratch, without using a framework such as scikit-learn.
Before discussing logistic regression, let's briefly review regression analysis. Regression expresses an objective variable $y$ in terms of explanatory variables $x$, where $y$ takes continuous values. If $x$ is one-dimensional it is called simple regression; if it is two-dimensional or more, it is called multiple regression. Typical examples are estimating a land price or the number of visitors to a store: both the land price and the number of visitors are continuous values. Logistic regression, on the other hand, estimates the probability of belonging to a class (for example, whether an e-mail is spam) from the explanatory variables. As in linear regression, a linear combination of the explanatory variables is computed. However, instead of outputting that result as it is, **the logistic of the result is returned.**
Here, "logistic" means mapping the value into the range 0 to 1. The function used for this in logistic regression is called the sigmoid function.
f(x) = \frac{1}{1+e^{-x}} \\
x = \beta_0 \alpha_0 + \beta_1
$\beta_0 \alpha_0 + \beta_1$ is the regression equation used in linear regression ($\beta_0, \beta_1$ are constants, $\alpha_0$ is the explanatory variable). Here it is written as a linear expression in $\alpha_0$ as an example. $f(x)$ returns a value between 0 and 1.
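As a quick numerical check of this formula (a minimal sketch using NumPy), the output always stays between 0 and 1:

import numpy as np

x = np.array([-5.0, 0.0, 5.0])
print(1 / (1 + np.exp(-x)))  # -> approximately [0.0067, 0.5, 0.9933]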
To summarize the difference briefly: linear regression outputs the linear combination of the explanatory variables as-is (a continuous value), whereas logistic regression passes that value through the sigmoid function and outputs a probability between 0 and 1, which is then used for classification.
Now, let's implement logistic regression by hand, without using a framework such as scikit-learn.
First, generate 100 random points in the two-dimensional plane with np.random.randn().
logistic.ipynb
import numpy as np
import matplotlib.pyplot as plt

N = 100  # number of data points
np.random.seed(0)  # fix the random seed so the data points are reproducible
X = np.random.randn(N, 2)  # random N x 2 matrix = N random points in the 2D plane
Next, define a straight line (here $f(x, y) = 2x + 3y - 1$) and label the 100 random points according to whether $f(x, y) > 0$ or $< 0$.
logistic.ipynb
def f(x, y):
    return 2 * x + 3 * y - 1  # true separating line: 2x + 3y = 1

T = np.array([1 if f(x, y) > 0 else 0 for x, y in X])  # teacher labels

plt.figure(figsize=(6, 6))
plt.plot(X[T==1, 0], X[T==1, 1], 'o', color='red')
plt.plot(X[T==0, 0], X[T==0, 1], 'o', color='blue')
plt.show()
We can see the points divided along the line that we want the classifier to learn.
Next, generate a three-dimensional vector $w$ as the classifier's parameters, and define the basis function $\phi(x, y) = (x, y, 1)$. We then take their inner product (the sum of the component-wise products). In a linear regression problem, this inner product itself would be the prediction. In **logistic regression, however, the point is to predict with the value between 0 and 1 obtained by feeding this inner product into the sigmoid function.** The actual implementation is as follows.
logistic.ipynb
np.random.seed()  # re-seed the random number generator (no fixed seed this time)
w = np.random.randn(3)  # randomly initialize the parameters

def sigmoid(z):  # sigmoid function: maps any real value into (0, 1)
    return 1 / (1 + np.exp(-z))

def phi(x, y):  # basis function
    return np.array([x, y, 1])

seq = np.arange(-3, 3, 0.1)
xlist, ylist = np.meshgrid(seq, seq)
# inner product of the parameters and the basis function, passed through the sigmoid
zlist = np.array([[sigmoid(np.inner(w, phi(x, y))) for x, y in zip(xrow, yrow)]
                  for xrow, yrow in zip(xlist, ylist)])
plt.imshow(zlist, extent=[-3, 3, -3, 3], origin='lower', cmap='bwr')
plt.show()
The figure above shows the distribution output by the (still untrained) classifier. Values of roughly 0.5 or more (predicted label 1) are shown in red, and values below 0.5 (predicted label 0) in blue. If we can make this decision boundary match the region determined by $f(x, y)$ earlier, the classifier is successful.
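As a small optional check (this snippet is not in the original code; it reuses zlist, np, and plt from above), one could overlay the true boundary $2x + 3y = 1$ on this distribution to see how far the untrained classifier is from it:

xs = np.linspace(-3, 3, 100)
plt.imshow(zlist, extent=[-3, 3, -3, 3], origin='lower', cmap='bwr')
plt.plot(xs, (1 - 2 * xs) / 3, 'k--')  # the true boundary 2x + 3y = 1
plt.ylim(-3, 3)
plt.show()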
Next, we update the classifier's parameters with stochastic gradient descent (SGD). I summarized SGD in the context of neural networks in the following article:
I tried to understand the learning function of neural networks carefully without using a machine learning library (second half) https://qiita.com/Fumio-eisan/items/7507d8687ca651ab301d
Now, here is the parameter update formula for the stochastic gradient descent method in logistic regression.
\begin{align}
w_{i+1} &= w_i - \eta \cdot \frac{\partial E}{\partial w} \\
&= w_i - \eta \cdot (y_n - t_n)\,\phi(x_n)
\end{align}
Here $w$ is the classifier's parameter vector, $\eta$ is the learning rate, $y_n$ is the probability between 0 and 1 output by the sigmoid function, $t_n$ is the teacher label indicating 0 or 1, and $\phi(x_n)$ is the basis function.
The transformation was written in one step above, but the transformation below **is peculiar to logistic regression.**
\frac{\partial E}{\partial w} = (y_n - t_n)\,\phi(x_n)
In neural networks, computing the gradient of the loss function is mathematically involved and the amount of computation can become large, which is why techniques such as error backpropagation are used. In logistic regression, thanks to the properties of the sigmoid function, the gradient can be written in this surprisingly simple form.
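For reference, here is a sketch of that derivation for a single sample, writing $\sigma$ for the sigmoid function and assuming the cross-entropy loss $E = -\{t_n \log y_n + (1 - t_n)\log(1 - y_n)\}$ with $y_n = \sigma(w \cdot \phi(x_n))$ (see the reference below for the careful version):

\begin{align}
\frac{\partial E}{\partial y_n} &= -\frac{t_n}{y_n} + \frac{1 - t_n}{1 - y_n} = \frac{y_n - t_n}{y_n(1 - y_n)} \\
\frac{\partial y_n}{\partial w} &= y_n(1 - y_n)\,\phi(x_n) \qquad \left(\because \ \sigma'(z) = \sigma(z)(1 - \sigma(z))\right) \\
\frac{\partial E}{\partial w} &= \frac{\partial E}{\partial y_n} \cdot \frac{\partial y_n}{\partial w} = (y_n - t_n)\,\phi(x_n)
\end{align}

The $y_n(1 - y_n)$ factors cancel, which is exactly why the gradient takes such a simple form.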
This derivation is explained very carefully at the following URL, so please refer to it for the details.
Reference URL http://gihyo.jp/dev/serial/01/machine-learning/0019
The actual implementation is as follows. The learning rate starts at 0.1 and is gradually decreased at each iteration to make convergence easier.
logistic.ipynb
# initial value of the learning rate
eta = 0.1

for i in range(len(xlist)):  # number of passes over the data (here tied to the grid size)
    for n in range(N):
        x_n, y_n = X[n, :]
        t_n = T[n]
        # predicted probability for sample n
        feature = phi(x_n, y_n)
        predict = sigmoid(np.inner(w, feature))
        # SGD update: w <- w - eta * (y_n - t_n) * phi(x_n)
        w -= eta * (predict - t_n) * feature
    # reduce the learning rate after each pass to help convergence
    eta *= 0.9
The result after training is shown in the figure drawn below.
logistic.ipynb
# recompute the predicted distribution with the trained parameters
zlist = np.array([[sigmoid(np.inner(w, phi(x, y))) for x, y in zip(xrow, yrow)]
                  for xrow, yrow in zip(xlist, ylist)])

# draw the scatter plot on top of the predicted distribution
plt.figure(figsize=(6, 6))
plt.imshow(zlist, extent=[-3, 3, -3, 3], origin='lower', cmap='GnBu')
plt.plot(X[T==1, 0], X[T==1, 1], 'o', color='red')
plt.plot(X[T==0, 0], X[T==0, 1], 'o', color='blue')
plt.show()
We have successfully trained a classifier that separates the red and blue regions of the random points.
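As a quick sanity check (this snippet is not in the original notebook; it reuses sigmoid, phi, w, X, and T defined above), we can compare the trained classifier's predictions with the teacher labels:

# predict label 1 when the estimated probability is at least 0.5, otherwise 0
pred = np.array([1 if sigmoid(np.inner(w, phi(x, y))) >= 0.5 else 0 for x, y in X])
print("training accuracy:", np.mean(pred == T))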
This time, I implemented a logistic regression classifier from scratch. It was interesting to follow how the loss function is optimized mathematically.
The full program is here. https://github.com/Fumio-eisan/logistic_20200411