Notes on using StatsModels for linear regression and GLMs in Python

StatsModels

StatsModels is a Python package that provides a range of statistical models, including linear regression, logistic regression, generalized linear models (GLM), and ARIMA, along with tools such as autocorrelation function estimation.

API list: https://www.statsmodels.org/stable/api.html

1. Install

Install with pip: https://www.statsmodels.org/stable/install.html

terminal


pip install statsmodels
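To confirm that the installation worked, a quick check such as the following can be run. It only imports the package and prints its version; the variable name `version` is just for illustration.

```python
import statsmodels

# Importing succeeds only if the package is installed in this environment.
version = statsmodels.__version__
print("statsmodels version:", version)
```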

2. Linear regression

You can fit a linear regression model by ordinary least squares with statsmodels.api.OLS(). OLS() does not add an intercept automatically: if x contains only the explanatory variables, the model is fitted without a y-intercept. To fit a model with a y-intercept, add a constant column to x with statsmodels.api.add_constant().

python


import numpy as np

import statsmodels.api as sm

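# Load the Spector and Mazzeo (1980) dataset as NumPy arrays.
# If your statsmodels version rejects the as_pandas argument,
# call load() without it and convert the result with np.asarray().
spector_data = sm.datasets.spector.load(as_pandas=False)
x = spector_data.exog
xc = sm.add_constant(x, prepend=False)  # append a column of ones for the intercept
y = spector_data.endog
print(xc.shape, y.shape)

# Fit and summarize OLS model
model = sm.OLS(y, xc)
res = model.fit()

print(res.summary())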

You can also retrieve individual values from the fitted result.

python


>>> res.params  # coefficients
array([ 0.46385168,  0.01049512,  0.37855479, -1.49801712])

>>> res.pvalues  # p-values
array([0.00784052, 0.59436148, 0.01108768, 0.00792932])

>>> res.aic, res.bic  # Akaike information criterion, Bayesian information criterion
(33.95649234217083, 39.81943595336974)

>>> res.bse  # standard errors
array([0.16195635, 0.01948285, 0.13917274, 0.52388862])

>>> res.resid  # residuals
array([ 0.05426921, -0.07340692, -0.27529932,  0.01762875,  0.42221284,
       -0.00701576,  0.03936941, -0.05363477, -0.16983152,  0.37535999,
        0.06818476, -0.28335827, -0.39932119,  0.72348259, -0.41225249,
        0.0276562 , -0.03995305, -0.01409045, -0.56914272,  0.39131297,
       -0.06696482,  0.14645583, -0.36800073, -0.78153024,  0.22554445,
        0.52339378,  0.36858806, -0.37090458,  0.20600614,  0.0226678 ,
       -0.53887544,  0.8114495 ])

Use predict() to get the predicted values.

python


res.predict(xc)

result


array([-0.05426921,  0.07340692,  0.27529932, -0.01762875,  0.57778716,
        0.00701576, -0.03936941,  0.05363477,  0.16983152,  0.62464001,
       -0.06818476,  0.28335827,  0.39932119,  0.27651741,  0.41225249,
       -0.0276562 ,  0.03995305,  0.01409045,  0.56914272,  0.60868703,
        0.06696482,  0.85354417,  0.36800073,  0.78153024,  0.77445555,
        0.47660622,  0.63141194,  0.37090458,  0.79399386,  0.9773322 ,
        0.53887544,  0.1885505 ])

3. Logistic regression

python


import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Load the data from Spector and Mazzeo (1980)
spector_data = sm.datasets.spector.load()
spector_data.exog = sm.add_constant(spector_data.exog)

y = spector_data.endog
x = spector_data.exog

# Follow the statsmodels IPython notebook example
model = sm.Logit(y, x)
res = model.fit(disp=0)

print(res.summary())

You can retrieve individual values here as well.

python


>>> res.params
array([-13.02134686,   2.82611259,   0.09515766,   2.37868766])

>>> res.pvalues
array([0.00827746, 0.02523911, 0.50143424, 0.0254552 ])

>>> res.aic, res.bic
(33.779268444262826, 39.642212055461734)

>>> res.bse
array([4.93132421, 1.26294108, 0.14155421, 1.06456425])

>>> res.resid_dev
array([-0.23211021, -0.35027122, -0.64396264, -0.22909819,  1.06047795,
       -0.26638437, -0.23178275, -0.32537884, -0.48538752,  0.85555565,
       -0.22259715, -0.64918082, -0.88199929,  1.81326864, -0.94639849,
       -0.24758297, -0.3320177 , -0.28054444, -1.33513084,  0.91030269,
       -0.35592175,  0.44718924, -0.74400503, -1.95507406,  0.59395382,
        1.20963752,  0.95233204, -0.85678568,  0.58707192,  0.33529199,
       -1.22731092,  2.09663887])

python


>>> res.predict(x)
array([0.02657799, 0.05950125, 0.18725993, 0.02590164, 0.56989295,
       0.03485827, 0.02650406, 0.051559  , 0.11112666, 0.69351131,
       0.02447037, 0.18999744, 0.32223955, 0.19321116, 0.36098992,
       0.03018375, 0.05362641, 0.03858834, 0.58987249, 0.66078584,
       0.06137585, 0.90484727, 0.24177245, 0.85209089, 0.83829051,
       0.48113304, 0.63542059, 0.30721866, 0.84170413, 0.94534025,
       0.5291172 , 0.11103084])

4. Generalized linear model

Select the distribution and link function from the supported combinations.

[Image: table of distribution and link function combinations]

Details about the distributions and link functions are summarized at https://www.statsmodels.org/stable/glm.html#families

The family=sm.families.Gamma() argument of sm.GLM() specifies the distribution and link function. In the following example no link function is given, so the gamma distribution's default link (inverse) is used. To use a log link instead, pass it to the family: sm.families.Gamma(sm.families.links.Log()). Older statsmodels releases also accept the lowercase alias sm.families.links.log.

python


import statsmodels.api as sm
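# Load the Scottish devolution vote dataset as NumPy arrays.
# If your statsmodels version rejects the as_pandas argument,
# call load() without it.
data = sm.datasets.scotland.load(as_pandas=False)
x = sm.add_constant(data.exog)
y = data.endog

# Gamma family with its default (inverse) link
model = sm.GLM(y, x, family=sm.families.Gamma())
res = model.fit()
print(res.summary())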


python


>>> res.params
[-1.77652703e-02  4.96176830e-05  2.03442259e-03 -7.18142874e-05
  1.11852013e-04 -1.46751504e-07 -5.18683112e-04 -2.42717498e-06]

>>> res.scale
0.003584283173493321

>>> res.deviance
0.08738851641699877

>>> res.pearson_chi2
0.08602279616383915

>>> res.llf
-83.01720216107174

python


>>> res.predict(x)
array([57.80431482, 53.2733447 , 50.56347993, 58.33003783, 70.46562169,
       56.88801284, 66.81878401, 66.03410393, 57.92937473, 63.23216907,
       53.9914785 , 61.28993391, 64.81036393, 63.47546816, 60.69696114,
       74.83508176, 56.56991106, 72.01804172, 64.35676519, 52.02445881,
       64.24933079, 71.15070332, 45.73479688, 54.93318588, 66.98031261,
       52.02479973, 56.18413736, 58.12267471, 67.37947398, 60.49162862,
       73.82609217, 69.61515621])
