I experimented with how well power demand can be predicted by machine learning, so I'll write up the procedure here.
Half a year after writing this article, I had learned a lot of new things, so I wrote a follow-up: Predicting power demand with machine learning, Part 2.
I also wrote the following article on 2017/12/22, so please refer to it as well: Power usage forecast with TensorFlow and Keras.
- Machine: MacBook Air (Mid 2012)
- Language: Python 3.5.1
- Environment: Jupyter Notebook 4.2.1
- Library: scikit-learn on Anaconda 4.1.0
First, download the power demand data from the TEPCO website.
http://www.tepco.co.jp/forecast/html/download-j.html
The file name is "juyo-2016.csv". The downloaded data is a CSV of date, time, and actual power, but it contains some extra header text, so clean it up and save it in UTF-8 (a scripted version of this cleanup follows the table below).
The format of the saved data is as follows.
DATE | TIME | Performance(10,000 kW)
---|---|---
2016/04/01 | 00:00 | 234
2016/04/01 | 01:00 | 235
... | ... | ...
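As a minimal sketch, this cleanup can also be scripted rather than done by hand. This assumes the raw TEPCO file is Shift-JIS with two extra header lines, matching the `skiprows=2` used in the direct-read code later in this post:

```python
import pandas as pd

# Read the raw Shift-JIS file, skipping the two extra header lines,
# then re-save it as clean UTF-8
raw = pd.read_csv("juyo-2016.csv", encoding="shift_jis", skiprows=2)
raw.to_csv("juyo-2016.csv", index=False, encoding="utf-8")
```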
Next, while searching for reference data, I found that past weather data can be downloaded from the Japan Meteorological Agency, so I downloaded temperature data for the same period and interval as the power demand data.
http://www.data.jma.go.jp/gmd/risk/obsdl/index.php
The file name is "data.csv". The downloaded data is a CSV of date and time, temperature, quality information, and homogeneity number, but it also contains some extra header text, so clean it up and save it in UTF-8.
The format of the saved data is as follows.
Date and time | temperature(℃)
---|---
2016-04-01 00:00 | 15
2016-04-01 01:00 | 16
... | ...
```python
import pandas as pd
import numpy as np

# Read the cleaned-up power demand data
kw_df = pd.read_csv("juyo-2016.csv")
# Read the cleaned-up temperature data
temp_df = pd.read_csv("data.csv")
```
Alternatively, the raw files can be read directly, without manual correction, by specifying the Shift-JIS encoding and skipping the extra header lines:

```python
import pandas as pd

# Read the power demand data straight from the TEPCO URL,
# skipping the two extra header lines
url = "http://www.tepco.co.jp/forecast/html/images/juyo-2016.csv"
kw_df = pd.read_csv(url, encoding="shift_jis", skiprows=2)
kw_df.head()

# Read the temperature data, skipping the four extra header lines
# and keeping only the first two columns (date/time and temperature)
file = "data.csv"
temp_df = pd.read_csv(file, encoding="shift_jis", skiprows=4)
temp_df = temp_df[temp_df.columns[:2]]
temp_df.columns = ["Date and time", "temperature(℃)"]
```
To apply machine learning, we combine the two datasets and convert everything to numeric data. First, the date is not usable as-is, so convert it to a day-of-the-week value (Python's `datetime.weekday()` numbers Monday as 0).

[Example] Monday -> 0, Tuesday -> 1, ..., Sunday -> 6

Next, since the data is hourly, convert the time to an integer hour.

[Example] 0:00 -> 0, 1:00 -> 1, ...
```python
import datetime

# Combine the two datasets
df = kw_df
df["temperature"] = temp_df["temperature(℃)"]

# Convert the date to a day-of-the-week value (Monday = 0, ..., Sunday = 6)
pp = df["DATE"]
tmp = []
for i in range(len(pp)):
    d = datetime.datetime.strptime(pp[i], "%Y/%m/%d")
    tmp.append(d.weekday())
df["weekday"] = tmp

# Convert the time to an integer hour (0-23)
pp = df["TIME"]
tmp = []
for i in range(len(pp)):
    d = datetime.datetime.strptime(pp[i], "%H:%M")
    tmp.append(d.hour)
df["hour"] = tmp
```
Create training data and test data from the processed data. The input (explanatory) variables are temperature, day of the week, and hour; the output (target) variable is the power demand.

The columns of the processed data are "DATE", "TIME", "Performance(10,000 kW)", "temperature", "weekday", and "hour", so, counting from zero, the inputs are taken from columns 3, 4, and 5, and the output from column 2, the actual demand in units of 10,000 kW. The inputs are also standardized for machine learning; note that the scaler is fitted on the training set only and then applied to both sets, so no test information leaks into training.
```python
# Input (explanatory variables)
pp = df[["temperature", "weekday", "hour"]]
X = pp.values.astype("float")

# Output (target variable)
pp = df["Performance(10,000 kW)"]
y = pp.values.flatten()

# Split the labeled data into a training set (X_train, y_train)
# and a test set (X_test, y_test)
# (train_test_split lived in sklearn.cross_validation before scikit-learn 0.18)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)

# Standardize the inputs (zero mean, unit variance),
# fitting the scaler on the training set only
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```
Let's try machine learning with SVM.
```python
# Load the SVM module
from sklearn import svm

# Note: SVC is a classifier, so each distinct power value
# is treated as a separate class label
model = svm.SVC()

# Train the model
model.fit(X_train, y_train)

# model.score(X, y) returns the prediction accuracy
print(model.score(X_test, y_test))
```
The score came out as "0.00312989045383"!! I was surprised at how low it was!! You might think it's hopeless, but in hindsight this makes sense: svm.SVC is a classifier, so score() reports exact-match accuracy, and predicting the exact power value as a class label is nearly impossible.
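As an aside, a regression model such as svm.SVR is the natural fit for predicting a continuous value like power demand. Here is a minimal sketch, not part of the original experiment, whose score() reports R² rather than accuracy:

```python
from sklearn import svm

# SVR predicts continuous values; score() returns R^2
reg = svm.SVR()
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))
```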
To see what the SVC's predictions actually look like anyway, I drew a graph and checked.
```python
# Predictions for the test set
result = model.predict(X_test)

# Compare predictions with actual values and compute the error
pp = pd.DataFrame({"kw": np.array(y_test), "result": np.array(result)})
pp["err"] = pp["kw"] - pp["result"]
pp.plot()
```
Somehow, the predictions do seem to track the actual values (^-^). Let's check the prediction errors against the actual values numerically.
```python
# Maximum, minimum, and mean of the prediction error
err_max = 0
err_min = 50000
err_ave = 0
for i in range(len(pp)):
    if err_max < pp["err"][i]:
        err_max = pp["err"][i]
    if err_min > pp["err"][i]:
        err_min = pp["err"][i]
    err_ave += pp["err"][i]
print(err_max)
print(err_min)
print(err_ave / len(pp))
```
The execution result (maximum error, minimum error, and mean error, in units of 10,000 kW) is as follows.
1571
-879
114.81661442
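Incidentally, the same summary statistics can be computed directly with pandas built-ins; a minimal equivalent sketch:

```python
# Equivalent summary statistics using pandas built-ins
print(pp["err"].max())   # maximum error
print(pp["err"].min())   # minimum error
print(pp["err"].mean())  # mean error
```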
Hmm, what to make of this result? I can't tell whether it's good or bad... (-_-;)
Of course there is more to investigate, but if I ever had to forecast power demand in a situation with little historical data, I think this approach would give a usable result.
Incidentally, the Jupyter Notebook and the processed CSV files are published on GitHub, so please refer to them as well.
https://github.com/shinob/predict_kw