In Optimization such as interpolation and curve fitting, we learned how to approximate various curves. So what curve should we approximate if $ Y $ has only $ 0 $ or $ 1 $, as follows?
X1 = [4.7, 4.5, 4.9, 4.0, 4.6, 4.5, 4.7, 3.3, 4.6, 3.9, 
      3.5, 4.2, 4.0, 4.7, 3.6, 4.4, 4.5, 4.1, 4.5, 3.9, 
      4.8, 4.0, 4.9, 4.7, 4.3, 4.4, 4.8, 5.0, 4.5, 3.5, 
      3.8, 3.7, 3.9, 5.1, 4.5, 4.5, 4.7, 4.4, 4.1, 4.0, 
      4.4, 4.6, 4.0, 3.3, 4.2, 4.2, 4.2, 4.3, 3.0, 4.1, 
      6.0, 5.1, 5.9, 5.6, 5.8, 6.6, 4.5, 6.3, 5.8, 6.1, 
      5.1, 5.3, 5.5, 5.0, 5.1, 5.3, 5.5, 6.7, 6.9, 5.0, 
      5.7, 4.9, 6.7, 4.9, 5.7, 6.0, 4.8, 4.9, 5.6, 5.8, 
      6.1, 6.4, 5.6, 5.1, 5.6, 6.1, 5.6, 5.5, 4.8, 5.4, 
      5.6, 5.1, 5.1, 5.9, 5.7, 5.2, 5.0, 5.2, 5.4, 5.1]
Y = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
     1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
Let's illustrate the data for the time being.
%matplotlib inline
import matplotlib.pyplot as plt
plt.figure(figsize=(12,4))
plt.scatter(X1, Y)
plt.grid()
plt.show()

When approximating such a relationship, we use "Logistic Regression", which approximates a sigmoid curve. Let's implement it with scipy.optimize.curve_fit.
#Let's convert a Python List to a Numpy Array.
import numpy as np
X1 = np.array(X1)
Y = np.array(Y)
Defines a sigmoid curve func1 with one explanatory variable. You will be optimizing $ a $ and $ b $.
import numpy as np
def func1(X, a, b): #Sigmaid curve
    f = a + b * X
    return 1. / (1. + np.exp(-f))
An example of using func1 looks like this.
func1(X1, 1, 1)
array([0.99666519, 0.99592986, 0.99726804, 0.99330715, 0.99631576,
       0.99592986, 0.99666519, 0.98661308, 0.99631576, 0.99260846,
       0.98901306, 0.9945137 , 0.99330715, 0.99666519, 0.9900482 ,
       0.99550373, 0.99592986, 0.9939402 , 0.99592986, 0.99260846,
       0.99698158, 0.99330715, 0.99726804, 0.99666519, 0.9950332 ,
       0.99550373, 0.99698158, 0.99752738, 0.99592986, 0.98901306,
       0.99183743, 0.9909867 , 0.99260846, 0.99776215, 0.99592986,
       0.99592986, 0.99666519, 0.99550373, 0.9939402 , 0.99330715,
       0.99550373, 0.99631576, 0.99330715, 0.98661308, 0.9945137 ,
       0.9945137 , 0.9945137 , 0.9950332 , 0.98201379, 0.9939402 ,
       0.99908895, 0.99776215, 0.99899323, 0.99864148, 0.99888746,
       0.9994998 , 0.99592986, 0.99932492, 0.99888746, 0.99917558,
       0.99776215, 0.99816706, 0.99849882, 0.99752738, 0.99776215,
       0.99816706, 0.99849882, 0.99954738, 0.99962939, 0.99752738,
       0.9987706 , 0.99726804, 0.99954738, 0.99726804, 0.9987706 ,
       0.99908895, 0.99698158, 0.99726804, 0.99864148, 0.99888746,
       0.99917558, 0.99938912, 0.99864148, 0.99776215, 0.99864148,
       0.99917558, 0.99864148, 0.99849882, 0.99698158, 0.9983412 ,
       0.99864148, 0.99776215, 0.99776215, 0.99899323, 0.9987706 ,
       0.99797468, 0.99752738, 0.99797468, 0.9983412 , 0.99776215])
Use scipy.optimize.curve_fit to get the optimal solution for $ a $ and $ b $.
from scipy.optimize import curve_fit  
popt, pcov = curve_fit(func1,X1,Y) #popt is the optimal estimate, pcov is the covariance
popt
array([-47.16056308,   9.69474387])
This is the optimal solution for $ a $ and $ b $.
When $ X_1 $ is regressed on the sigmoid curve using the obtained optimal solution, the predicted value of $ Y $ is "correctly classified as 1 when predicting whether it is classified as 0 or 1." It can be interpreted as "probability of being." Let's look at the distribution of that probability.
plt.figure(figsize=(12,4))
plt.hist(func1(X1, -47.16056308,   9.69474387))
plt.grid()
plt.show()

Regressing $ X_1 $ back to the sigmoid curve using the obtained optimal solution, the figure below is obtained. The blue dot is the original data, the curve is the regression curve, and the orange dot is the data after regression. This vertical axis is the "probability that the classification to $ 1 $ is correct when predicting whether it will be classified as $ 0 $ or $ 1 $".

Use matplotlib to illustrate the regression curve as shown above.
The actual classification ($ 0 $ or $ 1 $) is in a variable called $ Y $. If you set a threshold, you can calculate how accurate it is, such as classifying it as $ 1 $ if the value obtained by regression is $ 0.5 $ or more, and $ 0 $ if it is less than $ 0.5 $. here,
will do.
There are various evaluation indexes. The simplest here
ʻAccuracy = (TP + TN) / (TP + FP + FN + TN)`.
Also, as shown in the figure below, plot the probability distribution for each TP, FP, FN, TN.
print("Accuracy: ", len(TP + TN) / len(TP + FP + FN + TN))
plt.figure(figsize=(12,4))
plt.hist([TP, FP, FN, TN], label=['TP', 'FP', 'FN', 'TN'], color=['blue', 'green', 'orange', 'red'])
plt.legend()
plt.grid()
plt.show()

Let's try the case where there are two explanatory variables.
X2 = [1.4, 1.5, 1.5, 1.3, 1.5, 1.3, 1.6, 1.0, 1.3, 1.4, 
      1.0, 1.5, 1.0, 1.4, 1.3, 1.4, 1.5, 1.0, 1.5, 1.1, 
      1.8, 1.3, 1.5, 1.2, 1.3, 1.4, 1.4, 1.7, 1.5, 1.0, 
      1.1, 1.0, 1.2, 1.6, 1.5, 1.6, 1.5, 1.3, 1.3, 1.3, 
      1.2, 1.4, 1.2, 1.0, 1.3, 1.2, 1.3, 1.3, 1.1, 1.3, 
      2.5, 1.9, 2.1, 1.8, 2.2, 2.1, 1.7, 1.8, 1.8, 2.5, 
      2.0, 1.9, 2.1, 2.0, 2.4, 2.3, 1.8, 2.2, 2.3, 1.5, 
      2.3, 2.0, 2.0, 1.8, 2.1, 1.8, 1.8, 1.8, 2.1, 1.6, 
      1.9, 2.0, 2.2, 1.5, 1.4, 2.3, 2.4, 1.8, 1.8, 2.1, 
      2.4, 2.3, 1.9, 2.3, 2.5, 2.3, 1.9, 2.0, 2.3, 1.8]
X = np.array([X1, X2])
The explanatory variables define a sigmoid curve func2 with two. You will be optimizing $ a $, $ b $, and $ c $.
func2 and use scipy.optimize.curve_fit to get the optimal solution for the coefficients $ a $, $ b $ and $ c $.TP, FP, FN, TN.Since there are two explanatory variables, it is not a regression "curve" but a regression "curved surface". Let's use matplotlib to illustrate the 3D plot as a reference.
N = 1000
x1_axis = np.linspace(min(X[0]), max(X[0]), N)
x2_axis = np.linspace(min(X[1]), max(X[1]), N)
x1_grid, x2_grid = np.meshgrid(x1_axis, x2_axis)
x_mesh = np.c_[np.ravel(x1_grid), np.ravel(x2_grid)]
Assuming that the resulting $ a $, $ b $, and $ c $ have the following values:
y_plot = func2(x_mesh.T, -34.73855674,   4.53539756,   7.68378862).reshape(x1_grid.shape)
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(x1_grid, x2_grid, y_plot, cmap='bwr', linewidth=0)
fig.colorbar(surf)
ax.set_title("Surface Plot")
fig.show()

The important thing here is that we are regressing on a sigmoid curve (or curved surface), but when we think of it as a classification problem, the classification boundary is a "straight line" (or a plane).
Let's improve it so that it can handle more explanatory variables.
X3 = [7.0, 6.4, 6.9, 5.5, 6.5, 5.7, 6.3, 4.9, 6.6, 5.2, 
      5.0, 5.9, 6.0, 6.1, 5.6, 6.7, 5.6, 5.8, 6.2, 5.6, 
      5.9, 6.1, 6.3, 6.1, 6.4, 6.6, 6.8, 6.7, 6.0, 5.7, 
      5.5, 5.5, 5.8, 6.0, 5.4, 6.0, 6.7, 6.3, 5.6, 5.5, 
      5.5, 6.1, 5.8, 5.0, 5.6, 5.7, 5.7, 6.2, 5.1, 5.7, 
      6.3, 5.8, 7.1, 6.3, 6.5, 7.6, 4.9, 7.3, 6.7, 7.2, 
      6.5, 6.4, 6.8, 5.7, 5.8, 6.4, 6.5, 7.7, 7.7, 6.0, 
      6.9, 5.6, 7.7, 6.3, 6.7, 7.2, 6.2, 6.1, 6.4, 7.2, 
      7.4, 7.9, 6.4, 6.3, 6.1, 7.7, 6.3, 6.4, 6.0, 6.9, 
      6.7, 6.9, 5.8, 6.8, 6.7, 6.7, 6.3, 6.5, 6.2, 5.9]
X4 = [3.2, 3.2, 3.1, 2.3, 2.8, 2.8, 3.3, 2.4, 2.9, 2.7, 
      2.0, 3.0, 2.2, 2.9, 2.9, 3.1, 3.0, 2.7, 2.2, 2.5, 
      3.2, 2.8, 2.5, 2.8, 2.9, 3.0, 2.8, 3.0, 2.9, 2.6, 
      2.4, 2.4, 2.7, 2.7, 3.0, 3.4, 3.1, 2.3, 3.0, 2.5, 
      2.6, 3.0, 2.6, 2.3, 2.7, 3.0, 2.9, 2.9, 2.5, 2.8, 
      3.3, 2.7, 3.0, 2.9, 3.0, 3.0, 2.5, 2.9, 2.5, 3.6, 
      3.2, 2.7, 3.0, 2.5, 2.8, 3.2, 3.0, 3.8, 2.6, 2.2, 
      3.2, 2.8, 2.8, 2.7, 3.3, 3.2, 2.8, 3.0, 2.8, 3.0, 
      2.8, 3.8, 2.8, 2.8, 2.6, 3.0, 3.4, 3.1, 3.0, 3.1, 
      3.1, 3.1, 2.7, 3.2, 3.3, 3.0, 2.5, 3.0, 3.4, 3.0]
X = np.array([X1, X2, X3, X4])
Improve the function to accommodate any number of explanatory variables.
import numpy as np
def func(X, *params):
    f = np.zeros_like(X[0])
    for i, param in enumerate(params):
        if i == 0:
            f = f + param
        else:
            f = f + np.array(param * X[i - 1])
    return 1. / (1. + np.exp(-f))
Refer to Optimization such as interpolation and curve fitting.
func to make a prediction using only the first variable $ X_1 $ and calculate the correct answer rate.func to make a prediction using only the first two variables $ X_1 $ and $ X_2 $, and calculate the correct answer rate.func to make a prediction using only the first three variables $ X_1 $, $ X_2 $, $ X_3 $, and calculate the correct answer rate.func to make a prediction using all four variables $ X_1 $, $ X_2 $, $ X_3 $, $ X_4 $, and calculate the correct answer rate.The above calculation is called "logistic regression". Logistic regression is used as one of the "classification" methods in machine learning. This time, the purpose was to understand logistic regression and how to use scipy.optimize.curve_fit, but in practice, it is convenient to calculate using a machine learning library called scikit-learn.
Multilayer perceptron (MLP) is a type of feedforward neural network consisting of at least three node layers (input layer X, hidden layer H, output layer ʻO`). Individual nodes other than the input node are neurons that use a nonlinear activation function, and by using a supervised learning method called backpropagation, data that is not linearly separable can be identified. .. One of the functions used as a nonlinear activation function is the sigmoid function.
This time I will implement MLP with scipy.optimize.curve_fit as follows.
def mlp(X, *params):
    h01, h02, h03, h04, h0b = params[0], params[1], params[2], params[3], params[4]
    h11, h12, h13, h14, h1b = params[5], params[6], params[7], params[8], params[9]
    h21, h22, h23, h24, h2b = params[10], params[11], params[12], params[13], params[14]
    o01, o02, o03, o0b = params[15], params[16], params[17], params[18]
    h0 = 1. / (1. + np.exp(-(h01 * X[0] + h02 * X[1] + h03 * X[2] + h04 * X[3] + h0b)))
    h1 = 1. / (1. + np.exp(-(h11 * X[0] + h12 * X[1] + h13 * X[2] + h14 * X[3] + h1b)))
    h2 = 1. / (1. + np.exp(-(h21 * X[0] + h22 * X[1] + h23 * X[2] + h24 * X[3] + h2b)))
    o0 = 1. / (1. + np.exp(-(o01 * h0 + o02 * h1 + o03 * h2 + o0b)))
    return o0
mlp works.mlp to make a prediction using all four variables $ X_1 $, $ X_2 $, $ X_3 $, $ X_4 $, and calculate the correct answer rate.MLP has the simplest configuration of machine learning methods called deep learning. This time, the purpose was to understand MLP and how to use scipy.optimize.curve_fit, but in practice, it is convenient to calculate using libraries such as scikit-learn and pytorch.
Reference article: Govern a lot with PyTorch from linear multiple regression to logistic regression, multi-layer perceptron, autoencoder
Recommended Posts