Call dlm from python to run a time-varying coefficient regression model

This is an article to try the time-varying coefficient regression model, which is one of the variations of the state-space model. Since it is easy to use R dlm, this time I will call and execute R dlm from Python through rpy2. For information on how to use the dlm time-varying coefficient model by R, see Logics of Blue "[Dlm-based time-varying coefficient model](http://logics-of-blue.com/dlm%E3%81%AB%E3%82%". 88% E3% 82% 8B% E6% 99% 82% E5% A4% 89% E4% BF% 82% E6% 95% B0% E3% 83% A2% E3% 83% 87% E3% 83% AB / ) ”Was a great reference. In this article, I changed the explanatory variables from one to two from this model and tried using it from Python through rpy2.

1. What is a time-varying coefficient regression model?

As the name implies, it is a model in which the regression coefficient changes with time.

y_t = a_t x_t + b_t + v_t \quad \cdots (1)

$ y_t $: Dependent variable at time $ t $ $ x_t $: Explanatory variable for time $ t $ $ a_t $: Regression coefficient at time point $ t $ $ b_t $: intercept at time $ t $ $ v_t $: Observation error at time point $ t $

As shown, the regression coefficient changes with time. that time,

a_t = a_{t-1} + e_t \quad \cdots (2)

As shown in the above, the error of $ e_t $ is added to the regression coefficient of the previous period, and it becomes the regression coefficient of the period $ t $. In terms of the state space model, (1) is the observation model and (2) is the system model.

This time I would like to try the case where there are two explanatory variables, so the series of equations is as follows.

\begin{aligned}
y_t &= a^{(1)}_t x^{(1)}_t + a^{(2)}_t x^{(2)}_t +b_t + v_t \\
a^{(1)}_t &= a^{(1)}_{t-1} + e^{(1)}_t \\
a^{(2)}_t &= a^{(2)}_{t-1} + e^{(2)}_t \\
b_t &= b_{t-1} + w_t \\
\\
v_t &\sim N(0, \sigma_{v})\\
e^{(1)}_t &\sim N(0, \sigma_{e}^{(1)}) \\
e^{(2)}_t &\sim N(0, \sigma_{e}^{(2)}) \\
w_t &\sim N(0, \sigma_{w})\\
\end{aligned}

There is one observation model and three system models. There are 3 latent variables, $ a ^ {(1)} _t, a ^ {(2)} _t, b_t $ x $ 3t $ of hours $ t $.

2. Create data

After setting various parameters according to the equation above, generate random numbers to create data. The parameter specifications of this artificial data are based on [1].

rd.seed(71)

T = 500
v_sd = 20 #Root-mean-squared of observation error
i_sd = 10 #Standard deviation of intercept
a10 = 10
e_sd1 = .5 #Standard deviation of variation of regression coefficient 1
a20 = 20
e_sd2 = .8 #Standard deviation of variation of regression coefficient 1

#Two time-varying regression coefficients
e1 = np.random.normal(0, e_sd1, size=T)
a1 = e1.cumsum() + a10
e2 = np.random.normal(0, e_sd2, size=T)
a2 = e2.cumsum() + a20

# intercept
intercept = np.cumsum(np.random.normal(0, i_sd, size=T))

#Explanatory variable
x1 = rd.normal(10, 10, size=T)
x2 = rd.normal(10, 10, size=T)

#Dependent variable
v = np.random.normal(0, v_sd, size=T) #Observation error
y = intercept + a1*x1 + a2*x2 + v
y_noerror = intercept + a1*x1 + a2*x2  #Y when there was no observation error

The resulting graph is below. First, a plot of the three latent variables you want to estimate.

Explanatory variables are data that follow two independent normal distributions.

As a result, the graph of $ y $ doesn't seem to have any structure with just random numbers to the human eye. I would like to confirm whether the latent variables can be estimated well from this data.

3. Model estimation using dlm

Well, here is the production. I would like to call dlm of R from Python through rpy2 to estimate the time-varying coefficient regression model.

3-1. rpy2 import

Import rpy2.

import rpy2
from rpy2.robjects.packages import importr
import rpy2.robjects as robjects
from rpy2.rinterface import R_VERSION_BUILD
from rpy2.robjects import pandas2ri
pandas2ri.activate()

Import the R library, dlm.

dlm = importr("dlm")

3-2. Model specification of state space model

Here you specify the type of the state space model. This time, since it is a regression model with explanatory variables, we will use dlmModReg. X is an explanatory variable, and this time there are two, x1 and x2, so set that data. dV is the variance of the observation model error, which represents the variance of $ v_t $ $ \ sigma_v $. dW is the error variance of the system model. There are three system models related to $ e ^ {(1)} _t, e ^ {(2)} _t, w_t $ this time.

It corresponds to $ \ sigma_ {e} ^ {(1)}, \ sigma_ {e} ^ {(2)}, \ sigma_w $, respectively.

python


buildDlmReg = lambda theta: dlm.dlmModReg(
    X=df[["x1", "x2"]], 
    dV=np.exp(theta[0]), 
    dW=[np.exp(theta[1]), np.exp(theta[2]), np.exp(theta[3])]
  )

Now that the model type has been determined, set the observed value y and obtain the parameters by the maximum likelihood method. Since the estimation may be incorrect depending on the initial value, it is estimated using dlmMLE twice. It means that the result of the first dlmMLE is used as the initial value and the seconddlmMLE is executed to improve the stability.

3-3. Estimating the variance of the error by the maximum likelihood method

python


#Maximum likelihood estimation in two steps
%time parm = dlm.dlmMLE(y, parm=[2, 1, 1, 1], build=buildDlmReg, method="L-BFGS-B") 
par = np.array(get_method(parm, "par"))
print("par:", par)
%time fitDlmReg = dlm.dlmMLE(y, parm=par, build=buildDlmReg, method="SANN")

Let's display the result.

python


show_result(fitDlmReg)

The following par corresponds to the estimation results of the four standard deviations of $ \ sigma_v, e ^ {(1)} _t, e ^ {(2)} _t, w_t $, but it is set to dlmModReg. Sometimes I put np.exp in between to prevent the value from becoming negative, so if I take the result np.exp and then take np.sqrt, it will be the standard deviation.

out


par [1]  5.9998846  4.5724213 -1.6572242 -0.9232988

value [1] 2017.984

counts function gradient 
   10000       NA 

convergence [1] 0

message NULL

When you actually calculate, you can see that the parameters specified when creating the artificial data are almost reproduced.

python


par = np.asanyarray(get_method(fitDlmReg, "par"))
modDlmReg = buildDlmReg(par)
estimated_sd = np.sqrt(np.exp(np.array(get_method(fitDlmReg, "par"))))
pd.DataFrame({"estimated_sd":estimated_sd, "sd":[v_sd, i_sd, e_sd1, e_sd2]})
estimated_sd sd
0 20.084378 20.0
1 9.837589 10.0
2 0.436655 0.5
3 0.630243 0.8

3-4. Performing filtering and smoothing

Now, using this result, find the value filtered by the Kalman filter, and then find the smoothed value from that filter value.

python


#Kalman filter
filterDlmReg = dlm.dlmFilter(y, modDlmReg)

#Smoothing
smoothDlmReg = dlm.dlmSmooth(filterDlmReg)

The obtained filtering and smoothing results are returned to the Python world and stored as ndarray.

python


#Get filtered values
filtered_a = np.array(get_method(filterDlmReg, "a"))

#Get smoothed value
smoothed_a = np.array(get_method(smoothDlmReg, "s"))

3-5. Visualization of estimation results

Below is a graph of the two results of the intercept ʻinterceptand the regression coefficients $ a ^ {(1)} _ t and a ^ {(2)} _ t $. The original value of the artificial data is shown in blue, the filtered value is shown in green, and the smoothed value is shown in red, but I think that the movement of the original data can be reproduced considerably. Of course, dlm is given only three data,x1, x2, y`, and the type of the state space model, but the intercept and regression coefficient can be restored properly! (The value is rampant only near the initial value)

python


plt.figure(figsize=(12,5))
plt.plot(df.intercept, label="intercept(original)")
plt.plot(filtered_a[1:,0], label="intercept(filtered)")
plt.plot(smoothed_a[1:,0], label="intercept(smoothed)")
plt.legend(loc="best")
plt.title("plot of intercept")
graph_intercept.png

python


plt.figure(figsize=(12,5))
plt.plot(df.a1, label="a1(original)")
plt.plot(filtered_a[1:,1], label="a1(filtered)")
plt.plot(smoothed_a[1:,1], label="a1(smoothed)")
plt.legend(loc="best")
plt.title("plot of regression coeff a1")
graph_a1.png

python


plt.figure(figsize=(12,5))
plt.plot(df.a2, label="a2(original)")
plt.plot(filtered_a[1:,2], label="a2(filtered)")
plt.plot(smoothed_a[1:,2], label="a2(smoothed)")
plt.legend(loc="best")
plt.title("plot of regression coeff a2")
graph_a2.png

Finally, let's compare the one without the observation error of the y value and the one calculated with y using the smoothed latent variable. You can see that the true y value can be reproduced considerably by removing the observation error here as well!

python


estimatedLevel = smoothed_a[1:,0] + df.x1*smoothed_a[1:,1] + df.x2*smoothed_a[1:,2]

python


plt.figure(figsize=(12,5))
plt.plot(y_noerror, lw=4, label="y(original)")
plt.plot(estimatedLevel, "r", lw=1, label="y(estimated) [no observation error]")
plt.legend(loc="best")
plt.title("plot of target value.")
graph_y.png

reference

[1] Logics of blue "Time-varying coefficient model by dlm" http://logics-of-blue.com/dlm%E3%81%AB%E3%82%88%E3%82%8B%E6%99%82%E5%A4%89%E4%BF%82%E6%95%B0%E3%83%A2%E3%83%87%E3%83%AB/

[2] This code is uploaded (github) https://github.com/matsuken92/Qiita_Contents/blob/master/General/dlm_time_varying_regression.ipynb

[2] Version of this code that can be supported by n variables (github) https://github.com/matsuken92/Qiita_Contents/blob/master/General/dlm_time_varying_regression_n-var.ipynb

Recommended Posts

Call dlm from python to run a time-varying coefficient regression model
[Python] How to call a c function from python (ctypes)
How to run a Python program from within a shell script
Call Matlab from Python to optimize
Call a Python function from p5.js.
Simple code to call a python program from Javascript on EC2
Send a message from Python to Slack
Call a command from Python (Windows version)
How to run a Maya Python script
When running a Python shell from Electron, pass multiple arguments to run Python.
Edit Excel from Python to create a PivotTable
How to open a web browser from python
How to generate a Python object from JSON
Call a Python script from Embedded Python in C ++ / C ++
Run a Python file from html using Django
Run a python script from excel (using xlwings)
An easy way to call Java from Python
C language to see and remember Part 3 Call C language from Python (argument) c = a + b
Changes from Python 3.0 to Python 3.5
Changes from Python 2 to Python 3.0
Create a plugin to run Python Doctest in Vim (2)
Run python from excel
Create a plugin to run Python Doctest in Vim (1)
A memorandum to run a python script in a bat file
From buying a computer to running a program with python
Consider a conversion from a Python recursive function to a non-recursive function
Python script to create a JSON file from a CSV file
A mechanism to call a Ruby method from Python that can be done in 200 lines
How to create a kubernetes pod from python code
I want to run a quantum computer with Python
Call Rust from Python to speed it up! PyO3 Tutorial: Wrapping a Simple Function Part ➀
Call Rust from Python to speed it up! PyO3 Tutorial: Wrapping a Simple Function Part ➁
Call C language functions from Python to exchange multidimensional arrays
How to slice a block multiple array from a multiple array in Python
How to run a Python file at a Windows 10 command prompt
How to use the __call__ method in a Python class
How to launch AWS Batch from a python client app
I want to start a lot of processes from python
I want to send a message from Python to LINE Bot
Go language to see and remember Part 8 Call GO language from Python
How to call Python or Julia from Ruby (experimental implementation)
Extract the value closest to a value from a Python list element
Call CPLEX from Python (DO cplex)
Post from Python to Slack
Cheating from PHP to Python
A road to intermediate Python
How to call a function
Run illustrator script from python
Anaconda updated from 4.2.0 to 4.3.0 (python3.5 updated to python3.6)
Switch from python2.7 to python3.6 (centos7)
How to run Notepad ++ Python
Connect to sqlite from python
[Road to Python Intermediate] Call a class instance like a function with __call__
I created a Python library to call the LINE WORKS API
Create a shell script to run the python file multiple times
Post a message from IBM Cloud Functions to Slack in Python
From building a Python environment for inexperienced people to Hello world
Try to extract a character string from an image with Python3
[Python] How to get & change rows / columns / values from a table.
From installing Ansible to building a Python environment in Vagrant's virtual environment
A story about trying to run multiple python versions (Mac edition)