This time, I summarize what I learned by implementing logistic regression from scratch, without using a framework such as scikit-learn.
Before discussing logistic regression, let's briefly review regression analysis. Regression expresses an objective variable $y$ in terms of explanatory variables $x$, where $y$ takes continuous values. If $x$ is one-dimensional it is called simple regression; if it is two-dimensional or more, it is called multiple regression. Typical examples are estimating a land price or the number of visitors to a store: both the land price and the number of visitors are continuous values. Logistic regression, on the other hand, estimates the probability of belonging to a class (for example, whether an e-mail is spam) from the explanatory variables. As in linear regression, a linear combination of the explanatory variables is computed. However, instead of outputting that result as it is, **the logistic of the result is returned.**
Here, "logistic" means mapping the value into the range 0 to 1. The function used for this in logistic regression is called the sigmoid function.
f(x) = \frac{1}{1+e^{-x}} \\
x = \beta_0 \alpha_0 + \beta_1
$\beta_0 \alpha_0 + \beta_1$ is the regression equation used in linear regression ($\beta_0, \beta_1$ are constants, $\alpha_0$ is the explanatory variable). Here it is written as a linear expression in $\alpha_0$ as an example. $f(x)$ returns a value between 0 and 1.
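As a quick numerical check of this formula (a minimal sketch using NumPy), the output always stays between 0 and 1:

import numpy as np

x = np.array([-5.0, 0.0, 5.0])
print(1 / (1 + np.exp(-x)))  # -> approximately [0.0067, 0.5, 0.9933]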
To summarize the difference briefly: linear regression outputs the linear combination of the explanatory variables as-is (a continuous value), whereas logistic regression passes that value through the sigmoid function and outputs a probability between 0 and 1, which is then used for classification.
Now, let's implement logistic regression by hand, without using a framework such as scikit-learn.
First, generate 100 random points in the two-dimensional plane with np.random.randn().
logistic.ipynb
import numpy as np
import matplotlib.pyplot as plt

N = 100  # number of data points
np.random.seed(0)  # fix the random seed so the data points are reproducible
X = np.random.randn(N, 2)  # random N x 2 matrix = N random points in the 2D plane
Next, define a straight line (here $f(x, y) = 2x + 3y - 1$) and label the 100 random points according to whether $f(x, y) > 0$ or $< 0$.
logistic.ipynb
def f(x, y):
    return 2 * x + 3 * y - 1  # true separating line: 2x + 3y = 1

T = np.array([1 if f(x, y) > 0 else 0 for x, y in X])  # teacher labels

plt.figure(figsize=(6, 6))
plt.plot(X[T==1, 0], X[T==1, 1], 'o', color='red')
plt.plot(X[T==0, 0], X[T==0, 1], 'o', color='blue')
plt.show()
We can see the points divided along the line that we want the classifier to learn.
Next, generate a three-dimensional vector $w$ as the classifier's parameters, and define the basis function $\phi(x, y) = (x, y, 1)$. We then take their inner product (the sum of the component-wise products). In a linear regression problem, this inner product itself would be the prediction. In **logistic regression, however, the point is to predict with the value between 0 and 1 obtained by feeding this inner product into the sigmoid function.** The actual implementation is as follows.
logistic.ipynb
np.random.seed()  # re-seed the random number generator (no fixed seed this time)
w = np.random.randn(3)  # randomly initialize the parameters

def sigmoid(z):  # sigmoid function: maps any real value into (0, 1)
    return 1 / (1 + np.exp(-z))

def phi(x, y):  # basis function
    return np.array([x, y, 1])

seq = np.arange(-3, 3, 0.1)
xlist, ylist = np.meshgrid(seq, seq)
# inner product of the parameters and the basis function, passed through the sigmoid
zlist = np.array([[sigmoid(np.inner(w, phi(x, y))) for x, y in zip(xrow, yrow)]
                  for xrow, yrow in zip(xlist, ylist)])
plt.imshow(zlist, extent=[-3, 3, -3, 3], origin='lower', cmap='bwr')
plt.show()
The figure above shows the distribution output by the (still untrained) classifier. Values of roughly 0.5 or more (predicted label 1) are shown in red, and values below 0.5 (predicted label 0) in blue. If we can make this decision boundary match the region determined by $f(x, y)$ earlier, the classifier is successful.
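As a small optional check (this snippet is not in the original code; it reuses zlist, np, and plt from above), one could overlay the true boundary $2x + 3y = 1$ on this distribution to see how far the untrained classifier is from it:

xs = np.linspace(-3, 3, 100)
plt.imshow(zlist, extent=[-3, 3, -3, 3], origin='lower', cmap='bwr')
plt.plot(xs, (1 - 2 * xs) / 3, 'k--')  # the true boundary 2x + 3y = 1
plt.ylim(-3, 3)
plt.show()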
Next, we update the classifier's parameters with stochastic gradient descent (SGD). I summarized SGD in the context of neural networks in the following article:
I tried to understand the learning function of neural networks carefully without using a machine learning library (second half) https://qiita.com/Fumio-eisan/items/7507d8687ca651ab301d
Now, here is the parameter update formula for the stochastic gradient descent method in logistic regression.
\begin{align}
w_{i+1} &= w_i - \eta \cdot \frac{\partial E}{\partial w} \\
&= w_i - \eta \cdot (y_n - t_n)\,\phi(x_n)
\end{align}
Here $w$ is the classifier's parameter vector, $\eta$ is the learning rate, $y_n$ is the probability between 0 and 1 output by the sigmoid function, $t_n$ is the teacher label indicating 0 or 1, and $\phi(x_n)$ is the basis function.
The transformation was written in one step above, but the transformation below **is peculiar to logistic regression.**
\frac{\partial E}{\partial w} = (y_n - t_n)\,\phi(x_n)
In neural networks, computing the gradient of the loss function is mathematically involved and the amount of computation can become large, which is why techniques such as error backpropagation are used. In logistic regression, thanks to the properties of the sigmoid function, the gradient can be written in this surprisingly simple form.
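For reference, here is a sketch of that derivation for a single sample, writing $\sigma$ for the sigmoid function and assuming the cross-entropy loss $E = -\{t_n \log y_n + (1 - t_n)\log(1 - y_n)\}$ with $y_n = \sigma(w \cdot \phi(x_n))$ (see the reference below for the careful version):

\begin{align}
\frac{\partial E}{\partial y_n} &= -\frac{t_n}{y_n} + \frac{1 - t_n}{1 - y_n} = \frac{y_n - t_n}{y_n(1 - y_n)} \\
\frac{\partial y_n}{\partial w} &= y_n(1 - y_n)\,\phi(x_n) \qquad \left(\because \ \sigma'(z) = \sigma(z)(1 - \sigma(z))\right) \\
\frac{\partial E}{\partial w} &= \frac{\partial E}{\partial y_n} \cdot \frac{\partial y_n}{\partial w} = (y_n - t_n)\,\phi(x_n)
\end{align}

The $y_n(1 - y_n)$ factors cancel, which is exactly why the gradient takes such a simple form.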
This derivation is explained very carefully at the following URL, so please refer to it for the details.
Reference URL http://gihyo.jp/dev/serial/01/machine-learning/0019
The actual implementation is as follows. The learning rate starts at 0.1 and is gradually decreased at each iteration to make convergence easier.
logistic.ipynb
# initial value of the learning rate
eta = 0.1

for i in range(len(xlist)):  # number of passes over the data (here tied to the grid size)
    for n in range(N):
        x_n, y_n = X[n, :]
        t_n = T[n]
        # predicted probability for sample n
        feature = phi(x_n, y_n)
        predict = sigmoid(np.inner(w, feature))
        # SGD update: w <- w - eta * (y_n - t_n) * phi(x_n)
        w -= eta * (predict - t_n) * feature
    # reduce the learning rate after each pass to help convergence
    eta *= 0.9
The result after training is shown in the figure drawn below.
logistic.ipynb
# recompute the predicted distribution with the trained parameters
zlist = np.array([[sigmoid(np.inner(w, phi(x, y))) for x, y in zip(xrow, yrow)]
                  for xrow, yrow in zip(xlist, ylist)])

# draw the scatter plot on top of the predicted distribution
plt.figure(figsize=(6, 6))
plt.imshow(zlist, extent=[-3, 3, -3, 3], origin='lower', cmap='GnBu')
plt.plot(X[T==1, 0], X[T==1, 1], 'o', color='red')
plt.plot(X[T==0, 0], X[T==0, 1], 'o', color='blue')
plt.show()
We have successfully trained a classifier that separates the red and blue regions of the random points.
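As a quick sanity check (this snippet is not in the original notebook; it reuses sigmoid, phi, w, X, and T defined above), we can compare the trained classifier's predictions with the teacher labels:

# predict label 1 when the estimated probability is at least 0.5, otherwise 0
pred = np.array([1 if sigmoid(np.inner(w, phi(x, y))) >= 0.5 else 0 for x, y in X])
print("training accuracy:", np.mean(pred == T))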
This time, I implemented a logistic regression classifier from scratch. It was interesting to follow how the loss function is optimized mathematically.
The full program is here. https://github.com/Fumio-eisan/logistic_20200411