Hello. I'm Sugato from Future Search Brazil.

This topic has probably been covered many times already, but I would like to write about forecasting time series data.

1. Introduction

Time series forecasting has a reputation for not being very useful, but I wanted to see how well it actually performs and whether it can be used in practice.

Specifically, I tried the following:

**Collect my ~~pitiful~~ daily Twitter follower count** with the API, doing my best.

**Predict my Twitter follower count**
(1) Prediction with a SARIMA model
・ [Combining neural network model with seasonal time series ARIMA model] https://www.sciencedirect.com/science/article/pii/S004016250000113X
・ [Analysis of time series data with SARIMA (prediction of PV number)] https://www.kumilog.net/entry/sarima-pv @xkumiyu

(2) Prediction with a Prophet model
・ [Prophet Official] https://facebook.github.io/prophet/docs/quick_start.html
・ [Time Series Analysis Library Prophet Official Document Translation 1 (Overview & Features)] https://qiita.com/japanesebonobo/items/96868e58d4da42d36807 @japanesebonobo

What this post covers

Predicting a follower count that shrinks day by day because I never tweet only deepens the wound. To give the conclusion up front: my follower count will keep decreasing, and there is no prospect of it increasing.

2. Environment

Machine
- macOS 10.15.1
Python
- Python 3.7.0

3. Preparation

The daily follower count data looks like this. It is painful to look at. (Account: https://twitter.com/Ndtn_/) Data: http://web.sfc.wide.ad.jp/~nadechin/follower.csv

date        follower
2018/9/6    39,569
2018/9/7    39,570
2018/9/8    39,573
...
2019/12/10  37,861
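To load this file with pandas, the comma thousands separators can be handled at read time. The sketch below assumes that follower.csv is comma-separated with a `date` column and a quoted `follower` column such as `"39,569"` (an assumption based on the `replace(',', '')` call in the Prophet code later). It uses an inline sample built from the rows above instead of downloading the file, and `load_followers` is a hypothetical helper name.

```python
import io

import pandas as pd

# Inline stand-in for follower.csv (assumed layout: date,"follower" with thousands separators)
SAMPLE_CSV = '''date,follower
2018/9/6,"39,569"
2018/9/7,"39,570"
2018/9/8,"39,573"
2019/12/10,"37,861"
'''


def load_followers(source) -> pd.DataFrame:
    """Read the follower CSV into a DataFrame indexed by date, with integer counts."""
    df = pd.read_csv(source, thousands=",")  # "39,569" -> 39569
    df["date"] = pd.to_datetime(df["date"], format="%Y/%m/%d")  # "2018/9/6" -> Timestamp
    return df.set_index("date")


df = load_followers(io.StringIO(SAMPLE_CSV))
print(df)
```

With the real file, `load_followers("follower.csv")` would be the equivalent call.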

4. Processing time series data

Split the data into training and test sets. Either pandas or numpy works; for now I used the following split:
・ 2018/09/06 to 2019/12/10: original data
・ 2018/09/06 to 2019/11/30: training data
・ 2019/12/01 to 2019/12/10: test data
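With pandas, the split above can be written as date-label slicing on a `DatetimeIndex`, where both ends of a label slice are inclusive. This is a minimal sketch; `split_by_date` is a hypothetical helper, and the series is synthetic data covering the same dates, not the real follower counts.

```python
import pandas as pd


def split_by_date(df: pd.DataFrame, train_end: str, test_start: str):
    """Split a DatetimeIndex-ed frame into training and test parts (label bounds are inclusive)."""
    train = df.loc[:train_end]
    test = df.loc[test_start:]
    return train, test


# Synthetic daily frame over the same date range as the original data (values are made up)
idx = pd.date_range("2018-09-06", "2019-12-10", freq="D")
df = pd.DataFrame({"follower": range(len(idx))}, index=idx)

train, test = split_by_date(df, "2019-11-30", "2019-12-01")
print(len(train), len(test))
```

Slicing by date rather than by position keeps the split correct even if a day is missing from the data.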

Check the stationarity of the data with the ADF (augmented Dickey-Fuller) test.
・ [Statsmodels.tsa.stattools.adfuller] http://www.statsmodels.org/dev/generated/statsmodels.tsa.stattools.adfuller.html
・ [Null hypothesis, significance level] http://www.gen-info.osaka-u.ac.jp/MEPHAS/express/express11.html

`res = sm.tsa.stattools.adfuller(df.follower)`

The resulting p-value (the second element of `res`) is:

`p-value = 0.9774`


⇨ p-value > 0.05

Therefore, we cannot reject the null hypothesis that the series has a unit root, so it cannot be considered stationary. To make it stationary, take the first difference and remove the seasonal component.

`predict.py`


from plotly.graph_objs import Scatter
data = [Scatter(x=df.index, y=df.follower.diff())]  # first difference of the follower count

Next, remove the seasonal component (the seasonal part of a seasonal decomposition of the series).

`predict.py`


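res = sm.tsa.seasonal_decompose(df.follower)  # decomposition, not the ADF result; period inferred from the DatetimeIndex freq
data = [Scatter(x=df.index, y=df.follower - res.seasonal)]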

Then run the ADF test again on the processed series.

`p-value = 1.109e-25`


⇨ p-value < 0.05

As a result, the processed series can be treated as stationary.

5. Forecasting time series data

For the SARIMA model, create and fit the model on the training data as follows:

`predict.py`


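# coding:utf-8
from statsmodels.tsa.statespace.sarimax import SARIMAX

# train: training series; p, d, q, sp, sd, sq, s: chosen model orders
model = SARIMAX(
    train,
    order=(p, d, q),                  # non-seasonal ARIMA orders
    seasonal_order=(sp, sd, sq, s),   # seasonal orders and season length s
    enforce_stationarity=False,
    enforce_invertibility=False)
result = model.fit()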

Here `order=(p, d, q)` holds the non-seasonal ARIMA parameters and `seasonal_order=(sp, sd, sq, s)` holds the seasonal parameters, where `s` is the number of time steps in one seasonal cycle.
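For reference, the standard textbook form of a SARIMA(p, d, q)(P, D, Q)_s model, written here without a constant term, is shown below. P, D, Q correspond to `sp`, `sd`, `sq` in the code.

```latex
\phi_p(B)\,\Phi_P(B^s)\,(1 - B)^d\,(1 - B^s)^D\,y_t = \theta_q(B)\,\Theta_Q(B^s)\,\varepsilon_t
```

B is the backshift operator (B y_t = y_{t-1}), phi_p and theta_q are the non-seasonal AR and MA polynomials of degree p and q, Phi_P and Theta_Q are their seasonal counterparts in B^s, and epsilon_t is white noise. The factors (1 - B)^d and (1 - B^s)^D are the regular and seasonal differencing that the previous section did by hand.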

For details, see:
・ [Statsmodels.tsa.statespace.sarimax.SARIMAX] https://www.statsmodels.org/dev/generated/statsmodels.tsa.statespace.sarimax.SARIMAX.html
・ [Analysis of time series data with SARIMA (prediction of PV number)] https://www.kumilog.net/entry/sarima-pv @xkumiyu

Next, create a Prophet model.

Prophet builds a model just from the training data you feed it. It delivers the experience of "I don't know what I'm doing, but I've made something that looks like a forecast." Starting today, anyone can become a data scientist with two seconds of copy and paste.
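For context, the Prophet documentation and paper describe the model it fits as a decomposable model, additive by default: a trend g(t), periodic seasonality s(t), holiday effects h(t), and an error term.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t
```

Each component is fitted automatically from the `ds`/`y` columns, which is why so little code is needed to get a forecast.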

`predict.py`


# coding:utf-8
import pandas as pd
from fbprophet import Prophet  # in Prophet >= 1.0 the package is named `prophet`

data = pd.read_csv('follower.csv')
# Strip thousands separators such as "39,569" and convert to int
data.follower = data.follower.apply(lambda x: int(x.replace(',', '')))
# Prophet requires the columns to be named 'ds' (date) and 'y' (value)
data = data.rename(columns={'date': 'ds', 'follower': 'y'})
model = Prophet()
model.fit(data)
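The input-preparation part of the code above can be factored into a small helper that also parses the dates explicitly. This is a sketch; `to_prophet_frame` is a hypothetical name, and it assumes the raw columns are `date` (formatted like `2018/9/6`) and `follower` (possibly with thousands separators). Only pandas is used, so it runs without Prophet installed.

```python
import pandas as pd


def to_prophet_frame(raw: pd.DataFrame) -> pd.DataFrame:
    """Convert a raw (date, follower) frame into Prophet's required (ds, y) layout."""
    frame = raw.rename(columns={"date": "ds", "follower": "y"})
    frame["ds"] = pd.to_datetime(frame["ds"], format="%Y/%m/%d")
    # Accept both "39,569"-style strings and plain numbers
    frame["y"] = frame["y"].astype(str).str.replace(",", "", regex=False).astype(int)
    return frame[["ds", "y"]]


raw = pd.DataFrame({"date": ["2018/9/6", "2018/9/7"], "follower": ["39,569", "39,570"]})
print(to_prophet_frame(raw))
```

Once a `Prophet` model is fitted on such a frame, Prophet's `model.make_future_dataframe(periods=10)` and `model.predict(future)` produce the next 10 days, with the point forecast in the `yhat` column. To mirror the SARIMA setup, one would fit only on rows up to 2019/11/30.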

6. Forecast results

・ SARIMA model

Forecasts for the test period (2019/12/01 to 2019/12/10) from the SARIMA model:

2019-12-01  38002.878685
2019-12-02  38001.204647
2019-12-03  37998.080676
2019-12-04  37988.324131
2019-12-05  37981.134367
2019-12-06  37974.569498
2019-12-07  37966.333432
2019-12-08  37958.270232
2019-12-09  37956.258566
2019-12-10  37952.875398

・ Prophet model

Forecasts for the test period (2019/12/01 to 2019/12/10) from the Prophet model:

2019-12-01  37958.337506
2019-12-02  37959.963661
2019-12-03  37957.304699
2019-12-04  37943.272430
2019-12-05  37934.533210
2019-12-06  37920.537811
2019-12-07  37908.529618
2019-12-08  37905.819057
2019-12-09  37907.445213
2019-12-10  37904.786251

Numbers alone feel a bit lonely, so let's plot them.

[Overall view]

[Prediction part]

[Enlarged view of the predicted part]

7. What I found

Let's look at the forecasts for the first day after the end of the training data.

date, follower

# Actual data
2019-12-01, 38003.000000

# SARIMA
2019-12-01, 38002.878685

# Prophet
2019-12-01, 37958.337506

As the [Enlarged view of the predicted part] shows, the SARIMA forecast for the first day after the training data (38,002.88) almost exactly matches the actual value (38,003). SARIMA seems well suited to predicting the time point right after the training data.

Prophet, honestly, was underwhelming: its forecast for the same day (37,958.34) is off by about 45 followers.
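The comparison above can be put in numbers with the absolute error and the percentage error against the actual value. The sketch below uses only the 2019-12-01 values listed in this section; `forecast_errors` is a hypothetical helper name.

```python
from typing import Tuple

ACTUAL = 38003.0  # actual follower count on 2019-12-01
FORECASTS = {"SARIMA": 38002.878685, "Prophet": 37958.337506}


def forecast_errors(actual: float, predicted: float) -> Tuple[float, float]:
    """Return (absolute error, percentage error relative to the actual value)."""
    abs_err = abs(actual - predicted)
    pct_err = 100 * abs_err / actual
    return abs_err, pct_err


for name, pred in FORECASTS.items():
    abs_err, pct_err = forecast_errors(ACTUAL, pred)
    print(f"{name}: abs error = {abs_err:.2f}, pct error = {pct_err:.4f}%")
```

On a series of around 38,000, even Prophet's miss is roughly 0.12%, so the difference between the models is clearer in absolute followers than in percentages.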

8. Let's try a single one-day-ahead forecast

I suspected it might work surprisingly well if I trained on data up to 2019/12/09 and predicted only 2019/12/10, so let's try it.

The results are below.

date, follower

# Actual data
2019-12-10, 37861.000000

# SARIMA
2019-12-10, 37868.158032

That looks good: the forecast is off by only about 7 followers. For a forecast of just one day ahead, the accuracy seems reasonably good, arguably at a practical level.

As I keep saying, Prophet was honestly underwhelming.

Summary

Prophet is convenient, but it was not practical enough here. With the SARIMA model, I felt that time series forecasting can be put to use for one-day-ahead predictions. I would have liked to compare a few more models at once. See you next time.

And my follower count will keep decreasing.

Comparison of time series data predictions between SARIMA and Prophet models
