Prophet, developed by Facebook, is easy to understand, and I have been using it for a long time to forecast time series data. Recently I learned about a library called Darts, which lets you use Prophet through an sklearn-like API and wraps other time series analysis methods as well. I tried it out on Covid-19 data to check its usability, so I will explain what I found.

What is Darts

https://github.com/unit8co/darts

Darts is a library released by a Swiss company in June 2020. It is very convenient because Prophet, deep learning models such as LSTM, and statistical models such as ARIMA can all be handled through an sklearn-style API.

Installation method

You can install it with pip.

pip install 'u8darts[all]'

You can also install it with pip install u8darts, without [all], but then PyTorch and other dependencies did not seem to be installed, and an error occurred when I ran the LSTM. If you just want to check the usability for now, [all] may not be necessary.
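Before running the deep learning models, it may be worth checking whether the optional dependency is actually present. The following is a minimal sketch using only the standard library. `has_module` is a hypothetical helper name of my own, not part of Darts, and `torch` is the import name of PyTorch.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module with this name can be found."""
    return importlib.util.find_spec(name) is not None


# 'torch' is the import name of PyTorch, which the LSTM model needs.
if has_module("torch"):
    print("PyTorch is installed; the deep learning models should be usable.")
else:
    print("PyTorch is missing; try: pip install 'u8darts[all]'")
```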

Execution environment

OS: macOS 11.1
CPU: Core i5
Memory: 16 GB
Python: 3.8.7
Darts: 0.5.0

Data preparation

This time I used Covid-19 data. Download the daily number of positive cases from the download link on the Ministry of Health, Labour and Welfare website:

https://www.mhlw.go.jp/content/pcr_positive_daily.csv

I suspect the number of PCR tests is also needed to predict the number of positive cases, but since this is only a trial of Darts, I will build a model that simply predicts future positive counts from past positive counts.

I tried using it

Importing libraries

import warnings
warnings.simplefilter('ignore')  # Many warnings are issued; suppress them if they bother you
import pandas as pd
import darts
from darts import TimeSeries  # Darts' own time series data type
import matplotlib.pyplot as plt

Data reading

df = pd.read_csv('https://www.mhlw.go.jp/content/pcr_positive_daily.csv')  # At the time of writing, data up to January 14 could be downloaded

The contents of the data look like this. So close! Just one day short of a full year!

[Screenshot: preview of the downloaded data]

Data type conversion

Darts converts a pandas DataFrame into its own type with the TimeSeries class. This time, I will try to forecast the period after December 1, 2020. This part follows the style of sklearn's API, which helps a lot. It is intuitive and easy to understand.

ts = TimeSeries.from_dataframe(df, time_col='date', value_cols='Number of PCR positives(Single day)')
train, val = ts.split_after(pd.Timestamp('20201201'))
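To make the split concrete, here is a minimal sketch in plain Python of what splitting "after" a date means. It assumes, as I understand the Darts behavior, that the split date itself goes into the first part. The function below is my own illustration on made-up values, not Darts code.

```python
from datetime import date, timedelta


def split_after(points, split_day):
    """Split (day, value) pairs so the first part ends at split_day (inclusive)."""
    head = [(d, v) for d, v in points if d <= split_day]
    tail = [(d, v) for d, v in points if d > split_day]
    return head, tail


# Toy daily series: 5 days starting 2020-11-29 (values are made up).
start = date(2020, 11, 29)
series = [(start + timedelta(days=i), 100 + 10 * i) for i in range(5)]

train_part, val_part = split_after(series, date(2020, 12, 1))
print(len(train_part), len(val_part))     # 3 2
print(train_part[-1][0], val_part[0][0])  # 2020-12-01 2020-12-02
```

With this convention, the training part covers everything up to and including December 1, and the validation part starts on December 2.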

Creating a learning model

Darts provides a lot of models. The deep learning models need an extra data conversion step, so here I run the other models in a for loop. I have said it several times already, but the sklearn-style API really is easy. You just call fit and predict, as you have done countless times before.

# Import the models
from darts.models import ExponentialSmoothing, NaiveSeasonal, NaiveDrift, Prophet, ARIMA
from darts.models import AutoARIMA, StandardRegressionModel, Theta, FFT

# The number of tests varies by day of the week, so a second Prophet
# with weekly seasonality enabled is included as well.
models = [ExponentialSmoothing(),
          NaiveSeasonal(),
          NaiveDrift(),
          Prophet(daily_seasonality=True, yearly_seasonality=True),
          Prophet(daily_seasonality=True, yearly_seasonality=True, weekly_seasonality=True),
          ARIMA(),
          AutoARIMA(),
          StandardRegressionModel(),
          Theta(),
          FFT()]

for model in models:
    print(model)
    try:  # Some models raise an error when run, so catch it and continue
        model.fit(train)  # sklearn-style fit
        prediction = model.predict(len(val))
        # Check the result visually
        plt.figure(figsize=(12, 5))
        # Plotting from the beginning made the forecast period hard to see,
        # so only the data after 2020-11-01 is plotted
        ts.split_after(pd.Timestamp('20201101'))[1].plot(label='actual', lw=1)
        prediction.plot(label='forecast', lw=1)
        plt.legend()
        plt.xlabel('Day')
        plt.show()
    except Exception as e:
        print('error\t:{}'.format(e))
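To give a feel for what the two simplest baselines in the list do, here is a minimal sketch of a naive seasonal and a naive drift forecast in plain Python. It reflects my understanding of these methods in general and is not the Darts source code. The function names, the parameter `k`, and the values are made up for illustration.

```python
def naive_seasonal(values, n, k=1):
    """Forecast n steps by cycling through the last k observed values."""
    last_k = values[-k:]
    return [last_k[i % k] for i in range(n)]


def naive_drift(values, n):
    """Forecast n steps by extending the line through the first and last points."""
    slope = (values[-1] - values[0]) / (len(values) - 1)
    return [values[-1] + slope * step for step in range(1, n + 1)]


history = [10, 12, 11, 15, 16]  # made-up daily counts
print(naive_seasonal(history, n=3, k=2))  # [15, 16, 15]
print(naive_drift(history, n=3))          # [17.5, 19.0, 20.5]
```

Baselines like these are useful mainly as a reference point: a more complex model should at least beat them.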

Execution result

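Forecast plots (images not reproduced here) were shown for each of the following models: Exponential smoothing, Naive seasonal model, Naive drift model, Prophet, Prophet (weekly_seasonality=True), ARIMA, Auto-ARIMA, and FFT.

Theta: It was an error! I need to look at the official documentation.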

It seems that the ARIMA model and Exponential smoothing learned the data well. Prophet's weekly seasonality defaults to auto, so both Prophet results were the same. Looking at the actual line, the numbers have risen explosively since April. It is understandable that a state of emergency was declared to bring this under control.
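Judging the models by eye is subjective, so a numeric score can help when comparing them. The following is a minimal sketch of the mean absolute percentage error (MAPE) in plain Python, using made-up numbers. It is my own illustration of the standard formula and assumes that no actual value is zero.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent. Assumes no actual value is 0."""
    if len(actual) != len(forecast):
        raise ValueError("series must have the same length")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


actual = [100, 200, 400]    # made-up observed values
forecast = [110, 180, 400]  # made-up predictions
print(round(mape(actual, forecast), 2))  # 6.67
```

A lower value means the forecast is closer to the actual values on average, which makes it easier to rank the models than comparing plots.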

I tried the deep learning model

I tried an LSTM. For the parameters, I used the ones described in an article I referred to, which is listed at the bottom of the page.

Data processing

from darts.models import TCNModel, RNNModel
from darts.dataprocessing.transformers import Scaler
from darts.metrics import mape, r2_score
from darts.utils.missing_values import fill_missing_values

# Data preparation. Scaler seems to be a wrapper around an sklearn scaler that normalizes values to the range 0 to 1.
scaler = Scaler()
train_tr = scaler.fit_transform(train)
val_tr = scaler.transform(val)
ts_tr = scaler.transform(ts)
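The reason for calling fit only on the training data can be shown with a minimal min-max scaling sketch in plain Python. The `MinMax` class is a hypothetical stand-in of my own, not the Darts `Scaler`, and it assumes the training values are not all identical.

```python
class MinMax:
    """Tiny min-max scaler: learns min/max in fit, reuses them in transform."""

    def fit(self, values):
        self.lo, self.hi = min(values), max(values)
        return self

    def transform(self, values):
        span = self.hi - self.lo  # assumed non-zero in this sketch
        return [(v - self.lo) / span for v in values]


train_vals = [0, 50, 100]  # made-up training values
val_vals = [100, 150]      # later values, larger than anything in training

scaler_sketch = MinMax().fit(train_vals)
print(scaler_sketch.transform(train_vals))  # [0.0, 0.5, 1.0]
print(scaler_sketch.transform(val_vals))    # [1.0, 1.5]
```

Because the minimum and maximum come from the training data alone, later values can fall outside the range 0 to 1. That is expected when the series keeps rising after the split date.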

LSTM

It will take some time.

model = RNNModel(
    model='LSTM',
    output_length=1,  # Number of output (= predicted) time steps
    hidden_size=25,  # Size of the hidden state of the RNN
    n_rnn_layers=3,  # Number of RNN layers
    input_length=12,  # Number of past time steps taken into account (I did not fully understand this)
    dropout=0.4,
    batch_size=16,
    n_epochs=400,
    optimizer_kwargs={'lr': 1e-3},
    log_tensorboard=True,
    random_state=42
)
model.fit(train_tr, val_training_series=val_tr, verbose=True)
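As far as I understand it, input_length and output_length describe how the series is cut into training samples: a window of past values as input and the following values as the target. Here is a minimal sketch of that idea in plain Python. It is a general illustration of sliding windows, not the internal code of Darts, and `make_windows` is a hypothetical helper name.

```python
def make_windows(values, input_length, output_length):
    """Return (inputs, targets) pairs taken from consecutive positions."""
    samples = []
    last_start = len(values) - input_length - output_length
    for start in range(last_start + 1):
        split = start + input_length
        inputs = values[start:split]
        targets = values[split:split + output_length]
        samples.append((inputs, targets))
    return samples


series = list(range(1, 8))  # 1..7, made-up values
for inputs, targets in make_windows(series, input_length=3, output_length=1):
    print(inputs, "->", targets)
# [1, 2, 3] -> [4]
# [2, 3, 4] -> [5]
# [3, 4, 5] -> [6]
# [4, 5, 6] -> [7]
```

With input_length=12 and output_length=1 as in the settings above, each sample would pair 12 consecutive days with the day that follows them.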

Confirmation of execution result

prediction = model.predict(len(val))
fig = plt.figure(figsize=(12, 5))
ts_tr_after10 = ts_tr.drop_before(pd.Timestamp('20201001'))
ts_tr_after10.plot(label='actual')
prediction.plot(label='forecast', color='red')
plt.legend()

The accuracy is so-so ...

Summary

I found Darts quite useful. I don't expect there to be any difference, but I will check whether the accuracy is worse than when using the original Prophet directly, and if there is no problem, I will use Darts. Since it is sklearn-like, I would like to try a hyperparameter search and put more effort into tuning. Also, naturally, the accuracy is not high because I just ran everything with default parameters and did nothing clever with the data. That is not the library's fault. It is because I cut corners. The backtesting features also look well developed, so I would like to try those as well.
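For readers new to backtesting, the idea can be sketched in a few lines of plain Python: at each time step, forecast the next value from only the data seen so far, then average the errors. This is my own illustration with made-up values and a trivial forecaster. It does not use Darts' backtesting functions.

```python
def backtest(values, start, forecaster):
    """Expanding-window backtest: forecast each point from all earlier points."""
    errors = []
    for t in range(start, len(values)):
        predicted = forecaster(values[:t])  # only data before time t is visible
        errors.append(abs(values[t] - predicted))
    return sum(errors) / len(errors)        # mean absolute error


def last_value(history):
    """Naive one-step forecaster: tomorrow equals today."""
    return history[-1]


counts = [10, 12, 11, 15, 16, 20]  # made-up daily counts
print(backtest(counts, start=3, forecaster=last_value))  # 3.0
```

Any model with a fit and predict step could take the place of `last_value`, at the cost of refitting it at every step.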

Closing remarks

With the state of emergency declared, I had to stay at home, so I wrote an article for the first time. Now that I have started, I hope to keep writing little by little. Please feel free to comment on any parts of the code that should be fixed. I will use your comments as a reference.

Reference articles

https://blog.ikedaosushi.com/entry/2020/08/25/003557
https://qiita.com/hironey/items/d1d8a80c8329d5d46c16

I tried to predict Covid-19 using Darts