In "Optimization such as interpolation and curve fitting" we learned how to fit various curves. So what kind of curve should we fit when $Y$ takes only the values $0$ or $1$, as in the following data?
X1 = [4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9,
3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9,
4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5,
3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0,
4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1,
6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1,
5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0,
5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8,
6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4,
5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1]
Y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
First, let's plot the data.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(12,4))
plt.scatter(X1, Y)
plt.grid()
plt.show()
To approximate such a relationship we use "logistic regression", which fits a sigmoid curve. Let's implement it with scipy.optimize.curve_fit.
# Convert the Python lists to NumPy arrays.
import numpy as np
X1 = np.array(X1)
Y = np.array(Y)
Define a sigmoid function `func1` with one explanatory variable. We will optimize $a$ and $b$.
import numpy as np
def func1(X, a, b):  # sigmoid curve with one explanatory variable
    f = a + b * X
    return 1. / (1. + np.exp(-f))
An example of calling `func1` looks like this.
func1(X1, 1, 1)
array([0.99666519, 0.99592986, 0.99726804, 0.99330715, 0.99631576,
0.99592986, 0.99666519, 0.98661308, 0.99631576, 0.99260846,
0.98901306, 0.9945137 , 0.99330715, 0.99666519, 0.9900482 ,
0.99550373, 0.99592986, 0.9939402 , 0.99592986, 0.99260846,
0.99698158, 0.99330715, 0.99726804, 0.99666519, 0.9950332 ,
0.99550373, 0.99698158, 0.99752738, 0.99592986, 0.98901306,
0.99183743, 0.9909867 , 0.99260846, 0.99776215, 0.99592986,
0.99592986, 0.99666519, 0.99550373, 0.9939402 , 0.99330715,
0.99550373, 0.99631576, 0.99330715, 0.98661308, 0.9945137 ,
0.9945137 , 0.9945137 , 0.9950332 , 0.98201379, 0.9939402 ,
0.99908895, 0.99776215, 0.99899323, 0.99864148, 0.99888746,
0.9994998 , 0.99592986, 0.99932492, 0.99888746, 0.99917558,
0.99776215, 0.99816706, 0.99849882, 0.99752738, 0.99776215,
0.99816706, 0.99849882, 0.99954738, 0.99962939, 0.99752738,
0.9987706 , 0.99726804, 0.99954738, 0.99726804, 0.9987706 ,
0.99908895, 0.99698158, 0.99726804, 0.99864148, 0.99888746,
0.99917558, 0.99938912, 0.99864148, 0.99776215, 0.99864148,
0.99917558, 0.99864148, 0.99849882, 0.99698158, 0.9983412 ,
0.99864148, 0.99776215, 0.99776215, 0.99899323, 0.9987706 ,
0.99797468, 0.99752738, 0.99797468, 0.9983412 , 0.99776215])
Use `scipy.optimize.curve_fit` to obtain the optimal values of $a$ and $b$.
from scipy.optimize import curve_fit
popt, pcov = curve_fit(func1,X1,Y) #popt is the optimal estimate, pcov is the covariance
popt
array([-47.16056308, 9.69474387])
This is the optimal solution for $ a $ and $ b $.
When $X_1$ is regressed on the sigmoid curve with the obtained optimal values, the predicted value of $Y$ can be interpreted as the probability that a sample is classified as $1$ when predicting whether it belongs to class $0$ or $1$. Let's look at the distribution of that probability.
plt.figure(figsize=(12,4))
plt.hist(func1(X1, -47.16056308, 9.69474387))
plt.grid()
plt.show()
Regressing $X_1$ onto the sigmoid curve with the obtained optimal values gives the figure below. The blue dots are the original data, the curve is the regression curve, and the orange dots are the regressed values. The vertical axis is the probability that a sample is classified as $1$ when predicting whether it belongs to class $0$ or $1$.
Use matplotlib to illustrate the regression curve as shown above.
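One possible way to draw it (a sketch, reusing `func1` and the fitted `popt` from above; `x_plot` is a name introduced here for a dense grid of $X_1$ values):
x_plot = np.linspace(X1.min(), X1.max(), 200)          # dense grid for a smooth curve
plt.figure(figsize=(12,4))
plt.scatter(X1, Y, label='original data')               # blue: original 0/1 labels
plt.plot(x_plot, func1(x_plot, *popt), color='gray', label='regression curve')
plt.scatter(X1, func1(X1, *popt), color='orange', label='regressed values')
plt.legend()
plt.grid()
plt.show()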
The actual classification ($0$ or $1$) is stored in the variable $Y$. If you set a threshold, for example classifying a sample as $1$ if the regressed value is $0.5$ or more and as $0$ if it is less than $0.5$, you can calculate how accurate the classification is. That is what we will do here.
There are various evaluation metrics. The simplest, which we use here, is `Accuracy = (TP + TN) / (TP + FP + FN + TN)`.
Also, as shown in the figure below, plot the distribution of the predicted probabilities for each of TP, FP, FN, and TN.
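One possible way to build these four groups as Python lists of predicted probabilities, so that the code below runs as written (a sketch; `y_prob` is a name introduced here, and $0.5$ is the threshold discussed above):
y_prob = func1(X1, *popt)   # predicted probability of class 1
TP = [p for p, y in zip(y_prob, Y) if p >= 0.5 and y == 1]  # predicted 1, actually 1
FP = [p for p, y in zip(y_prob, Y) if p >= 0.5 and y == 0]  # predicted 1, actually 0
FN = [p for p, y in zip(y_prob, Y) if p < 0.5 and y == 1]   # predicted 0, actually 1
TN = [p for p, y in zip(y_prob, Y) if p < 0.5 and y == 0]   # predicted 0, actually 0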
print("Accuracy: ", len(TP + TN) / len(TP + FP + FN + TN))
plt.figure(figsize=(12,4))
plt.hist([TP, FP, FN, TN], label=['TP', 'FP', 'FN', 'TN'], color=['blue', 'green', 'orange', 'red'])
plt.legend()
plt.grid()
plt.show()
Let's try the case where there are two explanatory variables.
X2 = [1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4,
1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1,
1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0,
1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3,
1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3,
2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5,
2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5,
2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6,
1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1,
2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8]
X = np.array([X1, X2])
Define a sigmoid function `func2` with two explanatory variables. We will optimize $a$, $b$, and $c$.
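A minimal sketch of such a function, following the same form as `func1` (an assumed implementation, not necessarily the original one):
def func2(X, a, b, c):  # sigmoid curve with two explanatory variables
    f = a + b * X[0] + c * X[1]
    return 1. / (1. + np.exp(-f))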
Use `func2` with `scipy.optimize.curve_fit` to obtain the optimal coefficients $a$, $b$, and $c$, and again plot the probability distributions for TP, FP, FN, and TN. Since there are two explanatory variables, the result is not a regression "curve" but a regression "surface". As a reference, let's use matplotlib to draw it as a 3D plot.
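Fitting `func2` could look like this (a sketch; `popt2` and `pcov2` are names introduced here, and an explicit initial guess `p0` may be needed if the default starting point does not converge):
popt2, pcov2 = curve_fit(func2, X, Y)  # X has shape (2, 100): rows X1 and X2
popt2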
N = 1000
x1_axis = np.linspace(min(X[0]), max(X[0]), N)
x2_axis = np.linspace(min(X[1]), max(X[1]), N)
x1_grid, x2_grid = np.meshgrid(x1_axis, x2_axis)
x_mesh = np.c_[np.ravel(x1_grid), np.ravel(x2_grid)]
Assuming that the resulting $ a $, $ b $, and $ c $ have the following values:
y_plot = func2(x_mesh.T, -34.73855674, 4.53539756, 7.68378862).reshape(x1_grid.shape)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(x1_grid, x2_grid, y_plot, cmap='bwr', linewidth=0)
fig.colorbar(surf)
ax.set_title("Surface Plot")
plt.show()
The important thing here is that we are regressing on a sigmoid curve (or curved surface), but when we think of it as a classification problem, the classification boundary is a "straight line" (or a plane).
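To see why, note that the predicted probability equals $0.5$ exactly where the linear part inside the sigmoid is zero, i.e. where $a + b x_1 + c x_2 = 0$, which is the equation of a straight line in the $(x_1, x_2)$ plane (and a plane, or hyperplane, when there are more explanatory variables).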
Let's improve it so that it can handle more explanatory variables.
X3 = [7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2,
5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6,
5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7,
5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5,
5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7,
6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2,
6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0,
6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2,
7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9,
6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9]
X4 = [3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7,
2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5,
3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6,
2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5,
2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8,
3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6,
3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2,
3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0,
2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1,
3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0]
X = np.array([X1, X2, X3, X4])
Improve the function to accommodate any number of explanatory variables.
import numpy as np
def func(X, *params):  # sigmoid curve with any number of explanatory variables
    f = np.zeros_like(X[0])
    for i, param in enumerate(params):
        if i == 0:
            f = f + param              # intercept
        else:
            f = f + param * X[i - 1]   # coefficient of the (i-1)-th variable
    return 1. / (1. + np.exp(-f))
Refer to Optimization such as interpolation and curve fitting.
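One point to note: because `func` takes a variable number of parameters (`*params`), `curve_fit` cannot infer how many parameters to fit, so an explicit initial guess `p0` is required; its length is one (for the intercept) plus the number of explanatory variables used. A minimal sketch using only the first variable $X_1$ (`popt1` is a name introduced here):
popt1, _ = curve_fit(func, X[:1], Y, p0=np.ones(2))  # intercept + coefficient of X1
popt1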
Use `func` to make a prediction using only the first variable $X_1$ and calculate the accuracy. Do the same using only the first two variables $X_1$ and $X_2$, then the first three variables $X_1$, $X_2$, $X_3$, and finally all four variables $X_1$, $X_2$, $X_3$, $X_4$; a sketch of the last case is given below.
The above calculation is called "logistic regression". Logistic regression is used as one of the "classification" methods in machine learning. This time the purpose was to understand logistic regression and how to use `scipy.optimize.curve_fit`, but in practice it is more convenient to do the calculation with the machine learning library `scikit-learn`.
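As an illustration of the last case, a sketch using all four variables (`popt4` and `y_pred4` are names introduced here; `maxfev` is raised in case the default iteration limit is reached):
popt4, _ = curve_fit(func, X, Y, p0=np.ones(5), maxfev=10000)  # intercept + 4 coefficients
y_pred4 = (func(X, *popt4) >= 0.5).astype(int)                 # classify with the 0.5 threshold
print("Accuracy: ", (y_pred4 == Y).mean())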
A multilayer perceptron (MLP) is a type of feedforward neural network consisting of at least three layers of nodes (an input layer `X`, a hidden layer `H`, and an output layer `O`). Every node other than the input nodes is a neuron with a nonlinear activation function, and by training with a supervised learning method called backpropagation, an MLP can separate data that is not linearly separable. One of the functions commonly used as the nonlinear activation function is the sigmoid function.
This time we will implement an MLP with `scipy.optimize.curve_fit` as follows (four input variables, three hidden nodes, and one output node).
def mlp(X, *params):
    # weights and bias of the three hidden nodes
    h01, h02, h03, h04, h0b = params[0], params[1], params[2], params[3], params[4]
    h11, h12, h13, h14, h1b = params[5], params[6], params[7], params[8], params[9]
    h21, h22, h23, h24, h2b = params[10], params[11], params[12], params[13], params[14]
    # weights and bias of the output node
    o01, o02, o03, o0b = params[15], params[16], params[17], params[18]
    # hidden layer: sigmoid of a linear combination of the four inputs
    h0 = 1. / (1. + np.exp(-(h01 * X[0] + h02 * X[1] + h03 * X[2] + h04 * X[3] + h0b)))
    h1 = 1. / (1. + np.exp(-(h11 * X[0] + h12 * X[1] + h13 * X[2] + h14 * X[3] + h1b)))
    h2 = 1. / (1. + np.exp(-(h21 * X[0] + h22 * X[1] + h23 * X[2] + h24 * X[3] + h2b)))
    # output layer: sigmoid of a linear combination of the hidden outputs
    o0 = 1. / (1. + np.exp(-(o01 * h0 + o02 * h1 + o03 * h2 + o0b)))
    return o0
Check that `mlp` works. Then use `mlp` to make a prediction using all four variables $X_1$, $X_2$, $X_3$, $X_4$ and calculate the accuracy; a sketch is given below.
An MLP is the simplest configuration among the machine learning methods known as deep learning. This time the purpose was to understand MLP and how to use `scipy.optimize.curve_fit`, but in practice it is more convenient to do the calculation with libraries such as `scikit-learn` and `pytorch`.
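For reference, a minimal sketch of fitting `mlp` in the same way (the random seed, initial values, and `maxfev` here are my own choices; convergence is not guaranteed with this many parameters):
np.random.seed(0)                                    # reproducible initial guess
p0 = np.random.uniform(-1, 1, 19)                    # 19 parameters: 3*(4+1) hidden + (3+1) output
popt_mlp, _ = curve_fit(mlp, X, Y, p0=p0, maxfev=100000)
y_pred_mlp = (mlp(X, *popt_mlp) >= 0.5).astype(int)  # classify with the 0.5 threshold
print("Accuracy: ", (y_pred_mlp == Y).mean())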
Reference article: Linear multiple regression, logistic regression, multilayer perceptron, and autoencoder, all at once with PyTorch