Learn logistic regression and the basics of the multi-layer perceptron by implementing them with SciPy

In Optimization such as interpolation and curve fitting, we learned how to approximate various curves. So what curve should we fit when $ Y $ takes only the values $ 0 $ or $ 1 $, as in the following data?

X1 = [4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 
      3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 
      4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 
      3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 
      4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 
      6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 
      5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 
      5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 
      6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 
      5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1]

Y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

Let's start by plotting the data.

%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(12,4))
plt.scatter(X1, Y)
plt.grid()
plt.show()

[Figure output_3_1.png: scatter plot of X1 against Y]

Logistic regression

To approximate such a relationship we use "logistic regression", which fits a sigmoid curve to the data. Let's implement it with scipy.optimize.curve_fit.

# Convert the Python lists to NumPy arrays
import numpy as np
X1 = np.array(X1)
Y = np.array(Y)

Approximating a sigmoid curve with one explanatory variable

Define a sigmoid function func1 with one explanatory variable. We will optimize $ a $ and $ b $.

import numpy as np
def func1(X, a, b):  # sigmoid curve with one explanatory variable
    f = a + b * X
    return 1. / (1. + np.exp(-f))

An example of using func1 looks like this.

func1(X1, 1, 1)
array([0.99666519, 0.99592986, 0.99726804, 0.99330715, 0.99631576,
       0.99592986, 0.99666519, 0.98661308, 0.99631576, 0.99260846,
       0.98901306, 0.9945137 , 0.99330715, 0.99666519, 0.9900482 ,
       0.99550373, 0.99592986, 0.9939402 , 0.99592986, 0.99260846,
       0.99698158, 0.99330715, 0.99726804, 0.99666519, 0.9950332 ,
       0.99550373, 0.99698158, 0.99752738, 0.99592986, 0.98901306,
       0.99183743, 0.9909867 , 0.99260846, 0.99776215, 0.99592986,
       0.99592986, 0.99666519, 0.99550373, 0.9939402 , 0.99330715,
       0.99550373, 0.99631576, 0.99330715, 0.98661308, 0.9945137 ,
       0.9945137 , 0.9945137 , 0.9950332 , 0.98201379, 0.9939402 ,
       0.99908895, 0.99776215, 0.99899323, 0.99864148, 0.99888746,
       0.9994998 , 0.99592986, 0.99932492, 0.99888746, 0.99917558,
       0.99776215, 0.99816706, 0.99849882, 0.99752738, 0.99776215,
       0.99816706, 0.99849882, 0.99954738, 0.99962939, 0.99752738,
       0.9987706 , 0.99726804, 0.99954738, 0.99726804, 0.9987706 ,
       0.99908895, 0.99698158, 0.99726804, 0.99864148, 0.99888746,
       0.99917558, 0.99938912, 0.99864148, 0.99776215, 0.99864148,
       0.99917558, 0.99864148, 0.99849882, 0.99698158, 0.9983412 ,
       0.99864148, 0.99776215, 0.99776215, 0.99899323, 0.9987706 ,
       0.99797468, 0.99752738, 0.99797468, 0.9983412 , 0.99776215])

Use scipy.optimize.curve_fit to get the optimal solution for $ a $ and $ b $.

from scipy.optimize import curve_fit
popt, pcov = curve_fit(func1, X1, Y)  # popt is the optimal estimate, pcov is the covariance
popt
array([-47.16056308,   9.69474387])

This is the optimal solution for $ a $ and $ b $.

When $ X_1 $ is passed through the sigmoid curve with the obtained optimal parameters, the predicted value of $ Y $ can be interpreted as "the probability of being correctly classified as $ 1 $ when predicting whether a sample belongs to class $ 0 $ or $ 1 $". Let's look at the distribution of that probability.

plt.figure(figsize=(12,4))
plt.hist(func1(X1, -47.16056308,   9.69474387))
plt.grid()
plt.show()

[Figure output_13_0.png: histogram of the predicted probabilities]

Regressing $ X_1 $ onto the sigmoid curve with the obtained optimal parameters gives the figure below. The blue dots are the original data, the curve is the regression curve, and the orange dots are the fitted values. The vertical axis is the "probability of being correctly classified as $ 1 $ when predicting whether a sample belongs to class $ 0 $ or $ 1 $".

[Figure output_15_0.png: regression curve with the original data (blue) and the fitted values (orange)]

Exercise 1

Use matplotlib to illustrate the regression curve as shown above.
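As a reference, here is a minimal sketch of one possible way to draw it, reusing popt from curve_fit above (the grid variable x_plot is my own naming):

x_plot = np.linspace(min(X1), max(X1), 100)               # evenly spaced grid over the range of X1
plt.figure(figsize=(12, 4))
plt.scatter(X1, Y, label='original data')                 # blue dots: original data
plt.plot(x_plot, func1(x_plot, *popt), label='regression curve')
plt.scatter(X1, func1(X1, *popt), label='fitted values')  # orange dots: data after regression
plt.legend()
plt.grid()
plt.show()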

Estimating prediction accuracy

The actual classifications ($ 0 $ or $ 1 $) are stored in the variable $ Y $. If we set a threshold, we can measure how accurate the regression is, for example by classifying a sample as $ 1 $ if its regression value is $ 0.5 $ or more and as $ 0 $ if it is less than $ 0.5 $. Here we divide the samples into four groups:

TP (true positive): predicted $ 1 $, actually $ 1 $
FP (false positive): predicted $ 1 $, actually $ 0 $
FN (false negative): predicted $ 0 $, actually $ 1 $
TN (true negative): predicted $ 0 $, actually $ 0 $
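A minimal sketch of how these four groups could be collected as lists of predicted probabilities, assuming the 0.5 threshold above and the popt obtained earlier:

proba = func1(X1, *popt)  # predicted probability of class 1 for each sample
TP = [p for p, y in zip(proba, Y) if p >= 0.5 and y == 1]  # predicted 1, actually 1
FP = [p for p, y in zip(proba, Y) if p >= 0.5 and y == 0]  # predicted 1, actually 0
FN = [p for p, y in zip(proba, Y) if p < 0.5 and y == 1]   # predicted 0, actually 1
TN = [p for p, y in zip(proba, Y) if p < 0.5 and y == 0]   # predicted 0, actually 0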

Exercise 2

There are various evaluation metrics. Here, calculate the simplest one:

`Accuracy = (TP + TN) / (TP + FP + FN + TN)`

Also, as shown in the figure below, plot the distribution of the predicted probabilities for each of TP, FP, FN, and TN.

print("Accuracy: ", len(TP + TN) / len(TP + FP + FN + TN))
plt.figure(figsize=(12,4))
plt.hist([TP, FP, FN, TN], label=['TP', 'FP', 'FN', 'TN'], color=['blue', 'green', 'orange', 'red'])
plt.legend()
plt.grid()
plt.show()

[Figure output_20_1.png: accuracy printout and histogram of the predicted probabilities for TP, FP, FN, TN]

Approximating a sigmoid curve with two explanatory variables

Let's try the case where there are two explanatory variables.

X2 = [1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 
      1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 
      1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 
      1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 
      1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 
      2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 
      2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 
      2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 
      1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 
      2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8]

X = np.array([X1, X2])

Define a sigmoid function func2 with two explanatory variables. We will optimize $ a $, $ b $, and $ c $.
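A minimal sketch of one possible definition of func2, following the same pattern as func1 (here X is the (2, 100) array built above, so X[0] is X1 and X[1] is X2):

def func2(X, a, b, c):  # sigmoid of a linear combination of two explanatory variables
    f = a + b * X[0] + c * X[1]
    return 1. / (1. + np.exp(-f))

It can then be fitted in the same way as before, for example with popt, pcov = curve_fit(func2, X, Y).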

Exercise 3

Illustration of regression surface

Since there are two explanatory variables, we get not a regression "curve" but a regression "surface". As a reference, let's illustrate it with a 3D plot using matplotlib.

N = 1000
x1_axis = np.linspace(min(X[0]), max(X[0]), N)
x2_axis = np.linspace(min(X[1]), max(X[1]), N)
x1_grid, x2_grid = np.meshgrid(x1_axis, x2_axis)
x_mesh = np.c_[np.ravel(x1_grid), np.ravel(x2_grid)]

Assuming that the resulting $ a $, $ b $, and $ c $ have the following values:

y_plot = func2(x_mesh.T, -34.73855674,   4.53539756,   7.68378862).reshape(x1_grid.shape)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D

fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(x1_grid, x2_grid, y_plot, cmap='bwr', linewidth=0)
fig.colorbar(surf)
ax.set_title("Surface Plot")
plt.show()

[Figure output_32_0.png: 3D surface plot of the regression surface]

The important point here is that although we are regressing on a sigmoid curve (or surface), when we treat this as a classification problem the decision boundary is a "straight line" (or a plane).

Approximating a sigmoid curve with more explanatory variables

Let's improve it so that it can handle more explanatory variables.

X3 = [7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 
      5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 
      5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 
      5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 
      5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 
      6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 
      6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 
      6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2, 
      7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 
      6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9]

X4 = [3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 
      2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 
      3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 
      2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 
      2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 
      3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 
      3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 
      3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 
      2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 
      3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0]

X = np.array([X1, X2, X3, X4])

Improve the function to accommodate any number of explanatory variables.

import numpy as np
def func(X, *params):  # sigmoid of a linear combination of any number of explanatory variables
    f = np.zeros_like(X[0])
    for i, param in enumerate(params):
        if i == 0:
            f = f + param             # params[0] is the intercept
        else:
            f = f + param * X[i - 1]  # params[i] is the coefficient of X[i - 1]
    return 1. / (1. + np.exp(-f))
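Because func takes a variable number of parameters, curve_fit cannot infer how many there are, so an initial guess p0 must be supplied. A minimal sketch with the four explanatory variables above (the zero initial values are my own choice, and the result can depend on them):

popt, pcov = curve_fit(func, X, Y, p0=np.zeros(5), maxfev=10000)  # intercept + 4 coefficients
print(popt)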

Exercise 4

Refer to Optimization such as interpolation and curve fitting.

Recommendation of scikit-learn

The calculation above is called "logistic regression", which is used as one of the "classification" methods in machine learning. The purpose this time was to understand logistic regression and how to use scipy.optimize.curve_fit, but in practice it is convenient to use a machine learning library called scikit-learn.
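For reference, a minimal sketch of the same classification with scikit-learn (assuming the (4, 100) array X built above; scikit-learn expects samples in rows, hence X.T, and it applies L2 regularization by default, so the coefficients will not exactly match the curve_fit result):

from sklearn.linear_model import LogisticRegression

model = LogisticRegression()
model.fit(X.T, Y)                     # samples in rows, features in columns
print(model.intercept_, model.coef_)  # intercept and coefficients of the linear part
print(model.score(X.T, Y))            # mean accuracy on the training data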

Multilayer perceptron

A multilayer perceptron (MLP) is a type of feedforward neural network consisting of at least three layers of nodes (an input layer `X`, a hidden layer `H`, and an output layer `O`). Every node other than the input nodes is a neuron with a nonlinear activation function, and by training with a supervised learning method called backpropagation, an MLP can discriminate data that are not linearly separable. The sigmoid function is one of the functions used as a nonlinear activation function.

This time we will implement an MLP with scipy.optimize.curve_fit as follows: four input variables, three hidden units, and one output unit.

def mlp(X, *params):
    # Weights and bias of each of the three hidden units (four inputs + bias)
    h01, h02, h03, h04, h0b = params[0], params[1], params[2], params[3], params[4]
    h11, h12, h13, h14, h1b = params[5], params[6], params[7], params[8], params[9]
    h21, h22, h23, h24, h2b = params[10], params[11], params[12], params[13], params[14]

    # Weights and bias of the output unit (three hidden inputs + bias)
    o01, o02, o03, o0b = params[15], params[16], params[17], params[18]

    # Hidden layer: sigmoid activations
    h0 = 1. / (1. + np.exp(-(h01 * X[0] + h02 * X[1] + h03 * X[2] + h04 * X[3] + h0b)))
    h1 = 1. / (1. + np.exp(-(h11 * X[0] + h12 * X[1] + h13 * X[2] + h14 * X[3] + h1b)))
    h2 = 1. / (1. + np.exp(-(h21 * X[0] + h22 * X[1] + h23 * X[2] + h24 * X[3] + h2b)))

    # Output layer: sigmoid activation giving the predicted probability
    o0 = 1. / (1. + np.exp(-(o01 * h0 + o02 * h1 + o03 * h2 + o0b)))

    return o0
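A minimal sketch of fitting the 19 parameters with curve_fit, assuming the optimizer converges; the random initial values, seed, and maxfev are my own choices, and the fit can land in a poor local minimum:

np.random.seed(0)                           # fix the seed so the sketch is reproducible
p0 = np.random.uniform(-1, 1, 19)           # random initial values for the 19 parameters
popt, pcov = curve_fit(mlp, X, Y, p0=p0, maxfev=100000)
pred = (mlp(X, *popt) >= 0.5).astype(int)   # threshold the output probability at 0.5
print("Training accuracy:", (pred == Y).mean())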

Exercise 5

scikit-learn and PyTorch

The MLP is the simplest configuration among the machine learning methods known as deep learning. The purpose this time was to understand the MLP and how to use scipy.optimize.curve_fit, but in practice it is convenient to use libraries such as scikit-learn and PyTorch.
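For reference, a minimal sketch with scikit-learn's MLPClassifier; the layer size, activation, and iteration count are my own choices, intended to roughly mirror the three-hidden-unit network above:

from sklearn.neural_network import MLPClassifier

clf = MLPClassifier(hidden_layer_sizes=(3,), activation='logistic', max_iter=5000, random_state=0)
clf.fit(X.T, Y)           # samples in rows, features in columns
print(clf.score(X.T, Y))  # mean accuracy on the training data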

Reference article: Govern a lot with PyTorch from linear multiple regression to logistic regression, multi-layer perceptron, autoencoder
