- A model that makes predictions by **calculating probabilities** from several explanatory variables.
- A type of **generalized linear model (GLM)**.
- Despite the name "regression," it is most often used for **classification**.
- A generalized linear model is a **linear model that can be used even when the response variable** follows a probability distribution other than the normal distribution.
For example:
**〇 Weight = β0 + β1 × Height** (weight is a variable that roughly follows a normal distribution)
**✖ Clothing size = β0 + β1 × Height** (clothing size is clearly not a variable that follows a normal distribution)
**Number of ice creams sold = β0 + β1 × Temperature** (left side: response variable; right side: linear predictor)
The "number of ice creams sold" can only be positive, but the right side may become negative depending on the temperature.
**Therefore!!** Introduce a **link function (the log function)** to the rescue:
**log(Number of ice creams sold) = β0 + β1 × Temperature**
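A quick numerical sketch of why the log link helps. The coefficient values below are made up for illustration; the point is that the linear predictor can go negative, while its exponential (the inverse of the log link) cannot:

```python
import numpy as np

# hypothetical coefficients for: log(sales) = b0 + b1 * temperature
b0, b1 = 1.0, 0.08

temperatures = np.array([-20.0, 10.0, 30.0])
log_sales = b0 + b1 * temperatures  # linear predictor: can be negative
sales = np.exp(log_sales)           # inverse link: always positive

print(log_sales)  # the first value is negative
print(sales)      # exp(...) maps every value to a positive number
```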
**✖ Test pass/fail (1, 0) = β0 + β1 × Study time**: the right-hand side is clearly not an expression that takes only the values 1 or 0.
**✖ Test pass probability = β0 + β1 × Study time**: still insufficient. The pass probability must lie between 0 and 1, but the right side is not restricted to that range.
Therefore!! Introduce a **link function (the logit function)** to the rescue:
**log(p / (1 − p)) = β0 + β1 × Study time**. Solving this for p gives
**p = 1 / {1 + exp(−(β0 + β1 × Study time))}**. With this formula, the right side always falls in the range 0 to 1.
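The formula above can be checked numerically. The study times and coefficients here are made-up values; the point is that the sigmoid output always stays strictly between 0 and 1:

```python
import numpy as np

def sigmoid(z):
    # p = 1 / (1 + exp(-z)): squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical coefficients: z = b0 + b1 * study_time
b0, b1 = -4.0, 1.0
study_time = np.array([0.0, 2.0, 4.0, 8.0])

p = sigmoid(b0 + b1 * study_time)
print(p)  # every value lies strictly between 0 and 1
```

At study_time = 4 the linear predictor is exactly 0, so p = 0.5, the decision boundary.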
The goal is to optimize the **parameters β0 and β1** of this equation.
Consider the **likelihood function**. Writing the predicted probability for the n-th person as pn and the observed outcome as yn (1 or 0), the likelihood of the whole data set is the product **L = Π pn^yn × (1 − pn)^(1 − yn)**.
【Solution】 ① Take the **logarithm** to turn the product into a sum. ② Attach a **minus sign** so that **gradient descent** can be used (gradient descent is suited to finding minimum values).
The resulting expression, **E = −Σ {yn × log(pn) + (1 − yn) × log(1 − pn)}**, is called the **cross-entropy error function**.
The optimal parameter values are found by differentiating this function with respect to **β0 and β1** and applying **gradient descent**!
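Putting the pieces together, here is a minimal from-scratch sketch of gradient descent on the cross-entropy error. The toy study-time data and the learning rate are both assumptions of mine, chosen only to make the idea concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: study time (hours) and pass/fail labels (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

b0, b1 = 0.0, 0.0
lr = 0.1  # learning rate

for _ in range(5000):
    p = sigmoid(b0 + b1 * x)
    # gradients of the cross-entropy error with respect to b0 and b1
    grad_b0 = np.sum(p - y)
    grad_b1 = np.sum((p - y) * x)
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

p = sigmoid(b0 + b1 * x)
print(np.round(p))  # rounded predictions reproduce the labels
```

The gradient of the cross-entropy error conveniently reduces to the residual (p − y), which is what makes this loop so short.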
**This time, let's analyze a dataset from the sklearn library.**
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

X = iris.data[50:, 2].reshape(-1, 1)  # petal length only; rows 50+ keep targets 1 and 2 (drop target 0)
y = iris.target[50:]

scaler = StandardScaler()  # standardization
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

log_reg = LogisticRegression().fit(X_train, y_train)
print(log_reg.coef_)       # regression coefficient
print(log_reg.intercept_)  # intercept
print(log_reg.score(X_train, y_train))  # accuracy on the training data
print(log_reg.score(X_test, y_test))    # accuracy on the test data
```
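Once fitted, the same model can also output class predictions and the probabilities behind them via `predict` and `predict_proba`. A short sketch, repeated self-contained so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data[50:, 2].reshape(-1, 1)  # petal length, classes 1 and 2 only
y = iris.target[50:]

X_scaled = StandardScaler().fit_transform(X)
log_reg = LogisticRegression().fit(X_scaled, y)

# each predict_proba row is [P(class 1), P(class 2)] and sums to 1
print(log_reg.predict(X_scaled[:3]))
print(log_reg.predict_proba(X_scaled[:3]).round(3))
```

`score` for a classifier is mean accuracy, so these probabilities are the quantity p from the logit formula above, thresholded at 0.5.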