- A model that makes predictions by **calculating probabilities** from several explanatory variables.
- A type of **generalized linear model (GLM)**.
- Despite the name "regression," it is most often used for **classification**.
- A generalized linear model is a **linear model that can be used even when the response variable** follows a probability distribution other than the normal distribution.
For example:
**〇 Weight = β0 + β1 × Height** (weight is a variable that roughly follows a normal distribution)
**✖ Clothing size = β0 + β1 × Height** (clothing size is clearly not a variable that follows a normal distribution)
**Number of ice creams sold = β0 + β1 × Temperature** (left side: response variable; right side: linear predictor)
The "number of ice creams sold" can only be positive, but the right side may become negative depending on the temperature.
**Therefore!!** Introduce a **link function (the log function)** to the rescue:
**log(Number of ice creams sold) = β0 + β1 × Temperature**
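A quick numerical sketch of why the log link helps. The coefficient values below are made up for illustration; the point is that the linear predictor can go negative, while its exponential (the inverse of the log link) cannot:

```python
import numpy as np

# hypothetical coefficients for: log(sales) = b0 + b1 * temperature
b0, b1 = 1.0, 0.08

temperatures = np.array([-20.0, 10.0, 30.0])
log_sales = b0 + b1 * temperatures  # linear predictor: can be negative
sales = np.exp(log_sales)           # inverse link: always positive

print(log_sales)  # the first value is negative
print(sales)      # exp(...) maps every value to a positive number
```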
**✖ Test pass/fail (1, 0) = β0 + β1 × Study time**: the right-hand side is clearly not an expression that takes only the values 1 or 0.
**✖ Test pass probability = β0 + β1 × Study time**: still insufficient. The pass probability must lie between 0 and 1, but the right side is not restricted to that range.
Therefore!! Introduce a **link function (the logit function)** to the rescue:
**log(p / (1 − p)) = β0 + β1 × Study time**. Solving this for p gives
**p = 1 / {1 + exp(−(β0 + β1 × Study time))}**. With this formula, the right side always falls in the range 0 to 1.
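The formula above can be checked numerically. The study times and coefficients here are made-up values; the point is that the sigmoid output always stays strictly between 0 and 1:

```python
import numpy as np

def sigmoid(z):
    # p = 1 / (1 + exp(-z)): squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical coefficients: z = b0 + b1 * study_time
b0, b1 = -4.0, 1.0
study_time = np.array([0.0, 2.0, 4.0, 8.0])

p = sigmoid(b0 + b1 * study_time)
print(p)  # every value lies strictly between 0 and 1
```

At study_time = 4 the linear predictor is exactly 0, so p = 0.5, the decision boundary.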
The goal is to optimize the **parameters β0 and β1** of this equation.
Consider the **likelihood function**. Writing the predicted probability for the n-th person as pn and the observed outcome as yn (1 or 0), the likelihood of the whole data set is the product **L = Π pn^yn × (1 − pn)^(1 − yn)**.
【Solution】 ① Take the **logarithm** to turn the product into a sum. ② Attach a **minus sign** so that **gradient descent** can be used (gradient descent is suited to finding minimum values).
The resulting expression, **E = −Σ {yn × log(pn) + (1 − yn) × log(1 − pn)}**, is called the **cross-entropy error function**.
The optimal parameter values are found by differentiating this function with respect to **β0 and β1** and applying **gradient descent**!
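Putting the pieces together, here is a minimal from-scratch sketch of gradient descent on the cross-entropy error. The toy study-time data and the learning rate are both assumptions of mine, chosen only to make the idea concrete:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data: study time (hours) and pass/fail labels (made up for illustration)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([0, 0, 0, 1, 1, 1])

b0, b1 = 0.0, 0.0
lr = 0.1  # learning rate

for _ in range(5000):
    p = sigmoid(b0 + b1 * x)
    # gradients of the cross-entropy error with respect to b0 and b1
    grad_b0 = np.sum(p - y)
    grad_b1 = np.sum((p - y) * x)
    b0 -= lr * grad_b0
    b1 -= lr * grad_b1

p = sigmoid(b0 + b1 * x)
print(np.round(p))  # rounded predictions reproduce the labels
```

The gradient of the cross-entropy error conveniently reduces to the residual (p − y), which is what makes this loop so short.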
**This time, let's analyze a dataset from the sklearn library.**
```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["target"] = iris.target

X = iris.data[50:, 2].reshape(-1, 1)  # petal length only; rows 50+ keep targets 1 and 2 (drop target 0)
y = iris.target[50:]

scaler = StandardScaler()  # standardization
X_scaled = scaler.fit_transform(X)
X_train, X_test, y_train, y_test = train_test_split(X_scaled, y, random_state=0)

log_reg = LogisticRegression().fit(X_train, y_train)
print(log_reg.coef_)       # regression coefficient
print(log_reg.intercept_)  # intercept
print(log_reg.score(X_train, y_train))  # accuracy on the training data
print(log_reg.score(X_test, y_test))    # accuracy on the test data
```
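Once fitted, the same model can also output class predictions and the probabilities behind them via `predict` and `predict_proba`. A short sketch, repeated self-contained so it runs on its own:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

iris = load_iris()
X = iris.data[50:, 2].reshape(-1, 1)  # petal length, classes 1 and 2 only
y = iris.target[50:]

X_scaled = StandardScaler().fit_transform(X)
log_reg = LogisticRegression().fit(X_scaled, y)

# each predict_proba row is [P(class 1), P(class 2)] and sums to 1
print(log_reg.predict(X_scaled[:3]))
print(log_reg.predict_proba(X_scaled[:3]).round(3))
```

`score` for a classifier is mean accuracy, so these probabilities are the quantity p from the logit formula above, thresholded at 0.5.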