Introduction

It has been a while since I posted an article under this same title, and that post is getting stale, so here is an updated version.

Postscript

See also the following article, written on 2017/12/22: Power usage forecast with TensorFlow with Keras

Data collection

Power demand

First, download the power demand data from the TEPCO website.

http://www.tepco.co.jp/forecast/html/download-j.html

http://www.tepco.co.jp/forecast/html/images/juyo-2016.csv

As of January 14, 2017, we were able to download hourly data from April 1, 2016 to December 31, 2016.

Changing the year in the URL also gives you the 2014 data.

http://www.tepco.co.jp/forecast/html/images/juyo-2014.csv

As of January 14, 2017, the file for 2015 was empty.

Each downloaded file is a CSV with three columns: date, time, and actual power demand.

Incidentally, the files can apparently also be downloaded with the following commands.

`shell`


$ curl -O http://www.tepco.co.jp/forecast/html/images/juyo-2014.csv
$ curl -O http://www.tepco.co.jp/forecast/html/images/juyo-2016.csv
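If you would rather stay in Python, the same download can be sketched with the standard library. This is only a sketch: the helper names `juyo_url` and `download_juyo` are made up for illustration, and the URL pattern is simply inferred from the two links above, so other years may not exist.

```python
from urllib.request import urlretrieve

# URL pattern inferred from the 2014 and 2016 links above (an assumption).
BASE_URL = "http://www.tepco.co.jp/forecast/html/images/juyo-{year}.csv"


def juyo_url(year):
    """Build the download URL for one year of demand data."""
    return BASE_URL.format(year=year)


def download_juyo(year):
    """Download one year's CSV into the current directory and return its filename."""
    filename = "juyo-{}.csv".format(year)
    urlretrieve(juyo_url(year), filename)
    return filename


# Usage (needs network access):
#   download_juyo(2014)
#   download_juyo(2016)
```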

Temperature

Like last time, we will download the past weather data from the Japan Meteorological Agency.

http://www.data.jma.go.jp/gmd/risk/obsdl/index.php

Select "Tokyo" as the observation point and "hourly value" and "temperature" as the items, then download two periods: 2013/12/31 to 2015/1/1 and 2015/12/31 to 2017/1/1. Save them as data-2014.csv and data-2016.csv respectively. The periods are deliberately a little wider than needed because they are trimmed to the range of the power data later.

Each downloaded file is a CSV containing the date and time, the temperature, quality information, and a homogeneity number.

Unlike the power data, this file apparently has to be downloaded manually through the website.

Data reading

Library

`python`


import pandas as pd
import numpy as np
import datetime as dt
import math

Power demand

First, load the 2014 power data.

`python`


filename = "juyo-2014.csv"

#The file is encoded in Shift JIS; skip the leading non-data lines.
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=2)

#Rename the columns
df.columns = ["DATE","TIME","KW"]

#The date and time are in separate columns, so join them, parse the result as a datetime, and use it as the index.
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"],format="%Y/%m/%d %H:%M")

#Month
df["MONTH"] = df.index.month

#Day of the week (Monday=0 ... Sunday=6)
df["WEEK"] = df.index.weekday

#Hour of the day
df["HOUR"] = df.index.hour

df_kw = df
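To see what this step produces without the real file, here is a small sketch on three made-up rows in the same DATE, TIME, KW layout. The values are invented for illustration and are not TEPCO data.

```python
import io

import pandas as pd

# Made-up rows in the DATE,TIME,KW layout described above (not real TEPCO values).
sample = io.StringIO(
    "DATE,TIME,KW\n"
    "2014/4/1,0:00,2500\n"
    "2014/4/1,1:00,2400\n"
    "2014/4/1,2:00,2350\n"
)
df = pd.read_csv(sample)

# Join the two text columns and parse them in one vectorized call.
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"], format="%Y/%m/%d %H:%M")

# Calendar features come straight from the DatetimeIndex.
df["MONTH"] = df.index.month
df["WEEK"] = df.index.weekday  # Monday=0 ... Sunday=6
df["HOUR"] = df.index.hour

print(df[["KW", "MONTH", "WEEK", "HOUR"]])
```

April 1, 2014 was a Tuesday, so WEEK comes out as 1 for every row here.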

Temperature

Next, load the temperature data for 2014.

`python`


filename = "data-2014.csv"

#The file is encoded in Shift JIS; skip the leading non-data lines and keep only the first two columns.
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=4).iloc[:,[0,1]]

#Rename the columns
df.columns = ["DATE","TEMP"]

#Parse the date/time strings as datetimes and use them as the index
df.index = pd.to_datetime(df["DATE"],format="%Y/%m/%d %H:%M:%S")

df_temp = df

Combine power demand and temperature data

`python`


d1 = df_kw.index.min()
d2 = df_kw.index.max()

#.ix was removed from pandas; .loc does the same label-based slice
df_kw["TEMP"] = df_temp.loc[d1:d2].TEMP
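The join relies on pandas aligning rows by index label when a Series is assigned as a new column. A minimal sketch with two tiny hypothetical frames (the numbers are invented) shows the wider temperature range being trimmed to the demand range:

```python
import pandas as pd

# Hypothetical hourly frames: temperature covers a wider range than demand.
one_hour = pd.Timedelta(hours=1)
idx_kw = pd.date_range("2014-04-01 00:00", periods=3, freq=one_hour)
idx_temp = pd.date_range("2014-03-31 22:00", periods=6, freq=one_hour)

df_kw = pd.DataFrame({"KW": [2500, 2400, 2350]}, index=idx_kw)
df_temp = pd.DataFrame({"TEMP": [9.0, 8.5, 8.1, 7.9, 7.6, 7.4]}, index=idx_temp)

d1, d2 = df_kw.index.min(), df_kw.index.max()

# Label-based slicing on a sorted DatetimeIndex includes both endpoints.
# The assignment then matches rows by timestamp, not by position.
df_kw["TEMP"] = df_temp.loc[d1:d2, "TEMP"]

print(df_kw)
```

Any demand timestamp that has no matching temperature row would end up as NaN, so it is worth checking for missing values after the join.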

Data processing

Next, extract the input and output data for machine learning. Since we are predicting power demand, the KW column is the output and the MONTH, WEEK, HOUR, and TEMP columns are the inputs.

`python`


#Columns used as input
X_cols = ["MONTH","WEEK","HOUR","TEMP"]

#Column used as output
y_cols = ["KW"]

#Extract the input / output data as NumPy arrays (as_matrix was removed from pandas)
X = df_kw[X_cols].to_numpy().astype('float')
y = df_kw[y_cols].to_numpy().astype('int').flatten()

Split the data into training data and test data.

`python`


#sklearn.cross_validation was removed; train_test_split now lives in model_selection
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.1, random_state=42)
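As a quick check of what the split does, here is a sketch on stand-in arrays (100 samples, 4 features) rather than the real data:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays: 100 samples with 4 features, like MONTH, WEEK, HOUR, TEMP.
X = np.arange(400, dtype=float).reshape(100, 4)
y = np.arange(100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42
)

print(X_train.shape, X_test.shape)  # (90, 4) (10, 4)
```

Note that the rows are shuffled before splitting by default, so the test set is a random 10% of hours scattered across the year, not the last 10% of the calendar.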

Standardize the input data.

`python`


from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
scaler.fit(X_train)

X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
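The important detail is that the scaler is fitted on the training data only and then reused unchanged on the test data. A small sketch with hypothetical values for a single feature:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical values for a single feature column.
X_train = np.array([[10.0], [20.0], [30.0]])
X_test = np.array([[20.0], [40.0]])

scaler = StandardScaler()
scaler.fit(X_train)  # learns the mean and standard deviation of the training data only

print(scaler.mean_)  # [20.]
print(scaler.transform(X_test))  # test values expressed in training-set units
```

Tree-based models such as random forests are largely insensitive to this kind of scaling, so the step matters more if you later swap in a different regressor.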

Learning

Train a random forest regression model.

`python`


from sklearn.ensemble import RandomForestRegressor

model = RandomForestRegressor()
model.fit(X_train, y_train)
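A random forest is randomized, so two runs of the code above can give slightly different models. If you want repeatable results, one option is to fix the seed. The sketch below uses synthetic data, and the `n_estimators` and `random_state` values are illustrative choices, not the settings used in this article.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data standing in for the scaled features and the KW target.
rng = np.random.RandomState(0)
X_train = rng.rand(200, 4)
y_train = 1000 * X_train[:, 3] + 50 * rng.randn(200)

# Illustrative settings (assumptions, not the article's configuration).
model_a = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
model_b = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)

# With the same seed, two fits give identical predictions.
same = np.allclose(model_a.predict(X_train), model_b.predict(X_train))
print(same)
```

Also keep in mind that the default number of trees changed from 10 to 100 in scikit-learn 0.22, so a fresh run may not reproduce the scores quoted in this article exactly.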

Forecast

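Calculate the score on the held-out test data. For a scikit-learn regressor, the score method returns the coefficient of determination R^2, where 1.0 is a perfect fit.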

`python`


print(model.score(X_test,y_test))

The score was "0.91601162513664502" (^-^)
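For reference, the number printed by the score method of a scikit-learn regressor is the coefficient of determination R^2. The sketch below computes it by hand on a few hypothetical values and compares the result with `r2_score`:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical actual and predicted values.
y_true = np.array([3000.0, 3200.0, 2800.0, 3500.0])
y_pred = np.array([3100.0, 3150.0, 2900.0, 3400.0])

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2_manual = 1.0 - ss_res / ss_tot

print(r2_manual, r2_score(y_true, y_pred))
```

A value of 1.0 means perfect predictions, and 0.0 means the model does no better than always predicting the mean.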

Confirmation of forecast results

Let's graph the prediction result and the actual data and check it.

`python`


#Prediction result
result = model.predict(X_test)

#Convert to data frame
df_result = pd.DataFrame({
    "y_test":y_test,
    "result":result
})

#Graph library
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt

#Graph drawing
df_result.plot(figsize=(15, 3))

The predictions seem to track the actual values, but with this many points it is hard to tell.

Plot fewer data points and check again.

`python`


#Graph drawing
df_result[:20].plot(figsize=(15, 3))

That looks pretty good!
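Eyeballing a graph only goes so far, so it can help to put a number on the error as well. This is an addition of mine rather than part of the original procedure: a sketch that computes the mean absolute error and the mean absolute percentage error on a hypothetical frame with the same two columns as df_result.

```python
import pandas as pd

# Hypothetical frame with the same two columns as df_result.
df_result = pd.DataFrame({
    "y_test": [3000, 3200, 2800, 3500],
    "result": [3100.0, 3150.0, 2900.0, 3400.0],
})

error = df_result["result"] - df_result["y_test"]

# Mean absolute error, in the same unit as the KW column.
mae = error.abs().mean()

# Mean absolute percentage error.
mape = (error.abs() / df_result["y_test"]).mean() * 100

print(mae, mape)
```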

Forecast using 2016 data

Data reading

Load the 2016 data using the same procedure as the 2014 data.

`python`


#Power demand
filename = "juyo-2016.csv"

df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=2)
df.columns = ["DATE","TIME","KW"]
df.index = pd.to_datetime(df["DATE"] + " " + df["TIME"],format="%Y/%m/%d %H:%M")
df["MONTH"] = df.index.month
df["WEEK"] = df.index.weekday
df["HOUR"] = df.index.hour

#Use only April (copy so that adding a column below does not modify a view)
df_kw = df[df.index.month == 4].copy()

#Temperature
filename = "data-2016.csv"

df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=4).iloc[:,[0,1]]
df.columns = ["DATE","TEMP"]
df.index = pd.to_datetime(df["DATE"],format="%Y/%m/%d %H:%M:%S")

df_temp = df

#Join the data
d1 = df_kw.index.min()
d2 = df_kw.index.max()
df_kw["TEMP"] = df_temp.loc[d1:d2].TEMP

Data processing

`python`


#Extract the input / output data as NumPy arrays
X = df_kw[X_cols].to_numpy().astype('float')
y = df_kw[y_cols].to_numpy().astype('int').flatten()

X_test = scaler.transform(X)
y_test = y

Forecast

Predict and calculate scores using a model trained with 2014 data.

`python`


model.score(X_test,y_test)

The result was "0.82435418225963963", which was a little lower.

Confirmation of forecast results

`python`


#Prediction result
result = model.predict(X_test)

#Convert to data frame
df_result = pd.DataFrame({
    "y_test":y_test,
    "result":result
})

#Graph drawing
df_result.plot(figsize=(15, 3))

This needs a bit more work (-_-;)

Predict power demand with machine learning Part 2
