This is the day 19 article of the Gunosy Advent Calendar 2015. Another year is coming to an end.
I started working at Gunosy in November, and it has been a lot of fun. Since my usual work is data analysis and algorithm development, this time I will briefly introduce time series analysis as used in business analysis.
Time series analysis is an attempt to capture the fluctuation of a phenomenon in relation to its past movements.
[From "Introduction to Time Series Analysis" by Genshiro Kitagawa](http://www.amazon.co.jp/%E6%99%82%E7%B3%BB%E5%88%97%E8%A7%A3%E6%9E%90%E5%85%A5%E9%96%80-%E5%8C%97%E5%B7%9D-%E6%BA%90%E5%9B%9B%E9%83%8E/dp/4000054554)
The terms "data-driven management" and "big data" are beginning to take hold, and I think many companies are making decisions to improve their products based on the data. However, data that changes daily (especially indicators called sales and KPIs) varies widely, and it can be difficult to properly grasp changes. Therefore, time-series analysis can be used to properly capture changes and make predictions with some accuracy.
This time, I will introduce seasonally adjusted data. Roughly speaking, this is a model that explains time series data as
Observed value = trend component + seasonal component + noise component
The apps we provide are closely tied to people's daily lives, so they are also influenced by the rhythm of human life. Roughly speaking, there are monthly factors, day-of-week factors, and time-of-day factors; this time I will focus on the day of the week and implement a sample.
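To make the additive model concrete, here is a minimal sketch in R with made-up numbers (not the actual data): a slowly rising trend, a repeating 7-day pattern, and random noise are simply summed to form the observed series.
set.seed(1)
n        <- 70                                          # 10 weeks of daily data
trend    <- seq(100, 130, length.out=n)                 # slowly rising trend component
seasonal <- rep(c(5, 3, 0, -2, -4, 2, 1), length.out=n) # repeating weekly (7-day) seasonal component
noise    <- rnorm(n, mean=0, sd=1)                      # random noise component
observed <- trend + seasonal + noise                    # observed value = sum of the three components
plot(observed, type="l")
Seasonal adjustment works in the opposite direction: given only the observed series, it estimates and separates these three components.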
I would like to implement this using TEPCO's electricity usage data. First, let's do it in R and plot the raw data.
data <- read.csv("tokyo2015_day.csv", header=T) #Get data from csv
power <- data[,2] #Extract numbers
plot(power, type="l") #plot
It's jagged.
R has a ts function that converts data into a periodic time series and an stl function that decomposes it into seasonally adjusted time series data; using these, you can easily build a seasonally adjusted model.
data <- read.csv("tokyo2015_day.csv", header=T) #Get data from csv
power <- data[,2] #Extract numbers
plot(power, type="l") #plot
ts <- ts(power, frequency=7) #Cycle is 7 days(1 week)
stl <- stl(ts, s.window="periodic") #Seasonally adjusted time series data creation
plot(stl, type="l") #plot
Of the four panels, the top one is the raw data (observed values); below it, in order, are the trend component, the seasonal component, and the noise component.
In data analysis, looking at the trend component lets you capture long-term changes.
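For reference, each component can be pulled out of the stl result by column name (its time.series matrix holds "seasonal", "trend" and "remainder" columns), so a quick way to look at just the long-term movement is something like this:
trend <- stl$time.series[, "trend"] # trend component only
plot(trend, type="l")               # long-term movement without the weekly fluctuation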
After all, usage is high in summer and winter, and the day-of-week effect seems to be large (you can see that many people live on a similar weekly cycle). The main values of the day-of-week (seasonal) component are as follows. Since the data starts on January 1st, 2015, which is a Thursday, you can see that the seasonal components for the weekend (Saturday and Sunday) are negative.
$ print(stl$time.series[,1]) #Output seasonal components
2321.9288 1927.3324 -2517.1524 -6122.9112 293.1919 1872.1087 2225.5017...
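To make the mapping explicit, the first week of seasonal components can be lined up with weekday names. A small sketch (weekdays() returns names in your locale):
weekday  <- weekdays(as.Date("2015-01-01") + 0:6) # Thursday, Friday, ..., Wednesday
seasonal <- stl$time.series[1:7, "seasonal"]      # first week of seasonal components
data.frame(weekday, seasonal)                     # Saturday and Sunday come out negative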
Seeing the amount of electricity used drop sharply after summer, you can't help but think, "It's true that it suddenly got cooler from September this year" (* though I can't really say without comparing against other years).
Let's also do it in Python, on Jupyter. What we are doing is the same.
import csv
import matplotlib.pyplot as plt
import pandas as pd
from statsmodels.tsa.seasonal import seasonal_decompose
%matplotlib inline
filename = "tokyo2015_day.csv"
with open(filename, 'rt') as f:
    data = list(csv.reader(f))  # read all rows, including the header
headers = data.pop(0)
df = pd.DataFrame(data, columns=headers)
# Daily datetime index starting 2015-01-01, one entry per observation
dataFrame = pd.DataFrame(df['power'].values.astype(int),
                         index=pd.date_range(start='2015-01-01', periods=len(df['power']), freq='D'))
ts = seasonal_decompose(dataFrame.values, freq=7)  # 7-day cycle (newer statsmodels uses period=7)
plt.plot(ts.trend)     # Trend component
plt.plot(ts.seasonal)  # Seasonal component
plt.plot(ts.resid)     # Noise (residual) component
Part of the Python code was rewritten from R to Python by @moyomot.
The end
[Genshiro Kitagawa, "Introduction to Time Series Analysis"](http://www.amazon.co.jp/%E6%99%82%E7%B3%BB%E5%88%97%E8%A7%A3%E6%9E%90%E5%85%A5%E9%96%80-%E5%8C%97%E5%B7%9D-%E6%BA%90%E5%9B%9B%E9%83%8E/dp/4000054554)