-** I will describe how to implement time series analysis of a general Gauss state-space model using Python ** --Implement using stats models --Implementation when seasonality and exogenousness are taken into consideration --The theoretical part is limited to the reference article. --No detailed examination (cannot be done ...) -** "Basics of Time Series Analysis and State Space Models Theory and Implementation Learned with R and Stan" I read it, but I was in trouble because I didn't understand R! Write in python! !! Person **
I am studying time series analysis using ** "Basics of Time Series Analysis and State Space Models: Theory and Implementation Learned with R and Stan" ** (so-called Hayabusa). However, since I am used to Python, I skipped the implementation in R and implemented it by referring to the following article on the blog of the author of the book. --State space model by Python: https://logics-of-blue.com/python-state-space-models/
In this article, I was able to understand how to implement a general Gaussian state-space model using Python, but it was not as effective as the implementation method when it was exogenous. Therefore, we will use ** separate data to summarize the implementation and effects of exogenous data. ** **
--Using TEPCO's 2019 electricity demand data: http://www.tepco.co.jp/forecast/index-j.html
python
df = pd.read_csv("data/juyo-2019.csv",encoding="shift-jis")
df.head()
Since the data is hourly, reorganize it into daily data. You can do it right away with the groupby function. Then correct the index of the data to date.
python
df["date"] = pd.to_datetime(df["date"],format="%Y/%m/%d")
df["power"] = df["power"].values.astype(int)
daily_data = df.groupby("date")["power"].sum().reset_index()
daily_data = daily_data.set_index(["date"])
rcParams["figure.figsize"] = [12,5]
daily_data.plot()
Looking at the data, we can see that there are ** periodic jaggedness **. Since this fluctuates every 7 days, it can be inferred that it is a cycle related to the value day of the week. In other words, electricity demand changes depending on whether it is a holiday or not.
In addition, it is presumed that not only the days of the week but also holidays affect the power demand. Holidays do not occur periodically and can be modeled as extrinsic.
** In summary, we will try to model more accurately by modeling the following. --Periodic: Occurs every 7 days on the day of the week --Extrinsic: Occurs randomly on holidays **
First, let's model without considering seasonality, that is, the day of the week. Model using a local linear model that does not consider seasonality and exogenousness. For this modeling,
python
#Local linear trend model
mod_trend = sm.tsa.UnobservedComponents(daily_data,"local linear trend")
res_trend = mod_trend.fit(method="bfgs")
rcParams["figure.figsize"] = 15,15
fig = res_trend.plot_components()
In the figure, the top is the actual data, the second is the state, and the third is the trend data. ** You can see that the trend is messed up without considering the periodicity. ** **
Next, when considering periodicity. All you have to do is put the seasonal argument in Unobserved Components. Since the day of the week is taken into consideration here, the argument is set to 7. A graph showing seasonality has been added to the fourth of the figure. You can see that it fluctuates periodically.
python
#Local linear trend model with seasonality
mod_season_trend = sm.tsa.UnobservedComponents(daily_data.power,
"local linear trend",
seasonal = 7)
res_season_trend = mod_season_trend.fit(
method='bfgs',
maxiter=1000,
)
rcParams["figure.figsize"] = 15,20
fig = res_season_trend.plot_components()
** The big difference from the previous one is that the trend data is no longer jagged. ** ** By removing the periodicity, the trend can be seen more accurately than before. (I'm worried that the first data per month in the periodic graph is high ...)
You can also see that the level graph has decreased significantly at the beginning of May. This is probably due to the GW rather than the state changing. So next, let's consider a model when holidays are incorporated as extrinsic.
Finally, it is a model when exogenous is also taken into consideration. Consider holidays as extrinsic. Import jpholiday to put the holiday flag. (Here is the code when using google colab) Add a holiday column that will be 1 for holidays.
Since Saturday and Sunday are closed from the beginning, the holiday flag is not set.
python
#Put a holiday flag, in the case of google colab
!pip install jpholiday
import jpholiday
#Holiday flag
daily_data["holiday"] = 0
for i in range(len(daily_data)):
if daily_data.index[i].dayofweek != 6: #Only if not Sunday
date_i = daily_data.index[i] #date
if jpholiday.is_holiday(date_i): #For holidays
daily_data.loc[date_i,"holiday"] = 1
daily_data.head()
If it's only a holiday, you can't consider consecutive holidays. Manually rewrite the winter vacation from December to January, the GW in May, and the Obon holiday in August to 1.
python
list1 = daily_data.query("20190101 <= index <= 20190104").index
list2 = daily_data.query("20190429 <= index <= 20190503").index
list3 = daily_data.query("20190812 <= index <= 20190816").index
list4 = daily_data.query("20191230 <= index <= 20191231").index
for i in list1:
daily_data.loc[i,"holiday"]=1
for i in list2:
daily_data.loc[i,"holiday"]=1
for i in list3:
daily_data.loc[i,"holiday"]=1
for i in list4:
daily_data.loc[i,"holiday"]=1
Finally, let's create a local linear trend model with exogenous and plot it. ** Put exog in the argument and specify the holiday flag created earlier. ** **
python
#Local linear trend model with seasonality and exogenousness
mod_season_exog_trend = sm.tsa.UnobservedComponents(
daily_data.power,
"local linear trend",
seasonal=7,
exog=daily_data.holiday
)
res_season_exog_trend = mod_season_exog_trend.fit(
method="bfgs",
maxiter=1000,
)
#Drawing estimated state / trend / seasonal effects
rcParams['figure.figsize'] = 15, 20
fig = res_season_exog_trend.plot_components()
Focusing on the beginning of May, the decrease is ** slightly ** suppressed compared to the case of periodicity alone. However, it seems that this decrease is not completely suppressed only by adding the holiday flag.
It seems that further improvement is needed, but this was the limit of my current ability. .. ..
--Implementing seasonal and extrinsic models using stats model --Implemented seasonality as a day of the week and extrinsic as a holiday for electricity demand. --The periodic model was highly effective and could be implemented. —— Holidays alone are not enough for extrinsic models and need further improvement
Thank you for reading for me until the end.
--State space model by Python: https://logics-of-blue.com/python-state-space-models/ --Quick trend extraction: Introduction to time series analysis with statsmodels in Python: https://data.gunosy.io/entry/statsmodel_trend
Recommended Posts