I experimented with how well power demand can be predicted by machine learning, so I'll write up the procedure here.
Half a year after writing this article, I had learned a lot of new things, so I wrote a follow-up: Predicting power demand with machine learning, Part 2.
I also wrote the following article on 2017/12/22, so please refer to it as well: Power usage forecast with TensorFlow and Keras.
- Machine: MacBook Air (Mid 2012)
- Language: Python 3.5.1
- Environment: Jupyter Notebook 4.2.1
- Library: scikit-learn on Anaconda 4.1.0
First, download the power demand data from the TEPCO website.
http://www.tepco.co.jp/forecast/html/download-j.html
The file name is "juyo-2016.csv". The downloaded data is a CSV of date, time, and actual power, but it contains some extra header text, so clean it up and save it in UTF-8 (a scripted version of this cleanup follows the table below).
The format of the saved data is as follows.
DATE | TIME | Performance(10,000 kW)
---|---|---
2016/04/01 | 00:00 | 234
2016/04/01 | 01:00 | 235
... | ... | ...
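As a minimal sketch, this cleanup can also be scripted rather than done by hand. This assumes the raw TEPCO file is Shift-JIS with two extra header lines, matching the `skiprows=2` used in the direct-read code later in this post:

```python
import pandas as pd

# Read the raw Shift-JIS file, skipping the two extra header lines,
# then re-save it as clean UTF-8
raw = pd.read_csv("juyo-2016.csv", encoding="shift_jis", skiprows=2)
raw.to_csv("juyo-2016.csv", index=False, encoding="utf-8")
```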
Next, while searching for reference data, I found that past weather data can be downloaded from the Japan Meteorological Agency, so I downloaded temperature data for the same period and interval as the power demand data.
http://www.data.jma.go.jp/gmd/risk/obsdl/index.php
The file name is "data.csv". The downloaded data is a CSV of date and time, temperature, quality information, and homogeneity number, but it also contains some extra header text, so clean it up and save it in UTF-8.
The format of the saved data is as follows.
Date and time | temperature(℃)
---|---
2016-04-01 00:00 | 15
2016-04-01 01:00 | 16
... | ...
```python
import pandas as pd
import numpy as np

# Read the cleaned-up power demand data
kw_df = pd.read_csv("juyo-2016.csv")
# Read the cleaned-up temperature data
temp_df = pd.read_csv("data.csv")
```
Alternatively, the raw files can be read directly, without manual correction, by specifying the Shift-JIS encoding and skipping the extra header lines:

```python
import pandas as pd

# Read the power demand data straight from the TEPCO URL,
# skipping the two extra header lines
url = "http://www.tepco.co.jp/forecast/html/images/juyo-2016.csv"
kw_df = pd.read_csv(url, encoding="shift_jis", skiprows=2)
kw_df.head()

# Read the temperature data, skipping the four extra header lines
# and keeping only the first two columns (date/time and temperature)
file = "data.csv"
temp_df = pd.read_csv(file, encoding="shift_jis", skiprows=4)
temp_df = temp_df[temp_df.columns[:2]]
temp_df.columns = ["Date and time", "temperature(℃)"]
```
To apply machine learning, we combine the two datasets and convert everything to numeric data. First, the date is not usable as-is, so convert it to a day-of-the-week value (Python's `datetime.weekday()` numbers Monday as 0).

[Example] Monday -> 0, Tuesday -> 1, ..., Sunday -> 6

Next, since the data is hourly, convert the time to an integer hour.

[Example] 0:00 -> 0, 1:00 -> 1, ...
```python
import datetime

# Combine the two datasets
df = kw_df
df["temperature"] = temp_df["temperature(℃)"]

# Convert the date to a day-of-the-week value (Monday = 0, ..., Sunday = 6)
pp = df["DATE"]
tmp = []
for i in range(len(pp)):
    d = datetime.datetime.strptime(pp[i], "%Y/%m/%d")
    tmp.append(d.weekday())
df["weekday"] = tmp

# Convert the time to an integer hour (0-23)
pp = df["TIME"]
tmp = []
for i in range(len(pp)):
    d = datetime.datetime.strptime(pp[i], "%H:%M")
    tmp.append(d.hour)
df["hour"] = tmp
```
Create training data and test data from the processed data. The input (explanatory) variables are temperature, day of the week, and hour; the output (target) variable is the power demand.

The columns of the processed data are "DATE", "TIME", "Performance(10,000 kW)", "temperature", "weekday", and "hour", so, counting from zero, the inputs are taken from columns 3, 4, and 5, and the output from column 2, the actual demand in units of 10,000 kW. The inputs are also standardized for machine learning; note that the scaler is fitted on the training set only and then applied to both sets, so no test information leaks into training.
```python
# Input (explanatory variables)
pp = df[["temperature", "weekday", "hour"]]
X = pp.values.astype("float")

# Output (target variable)
pp = df["Performance(10,000 kW)"]
y = pp.values.flatten()

# Split the labeled data into a training set (X_train, y_train)
# and a test set (X_test, y_test)
# (train_test_split lived in sklearn.cross_validation before scikit-learn 0.18)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=42)

# Standardize the inputs (zero mean, unit variance),
# fitting the scaler on the training set only
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
```
Let's try machine learning with SVM.
```python
# Load the SVM module
from sklearn import svm

# Note: SVC is a classifier, so each distinct power value
# is treated as a separate class label
model = svm.SVC()

# Train the model
model.fit(X_train, y_train)

# model.score(X, y) returns the prediction accuracy
print(model.score(X_test, y_test))
```
The score came out as "0.00312989045383"!! I was surprised at how low it was!! You might think it's hopeless, but in hindsight this makes sense: svm.SVC is a classifier, so score() reports exact-match accuracy, and predicting the exact power value as a class label is nearly impossible.
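As an aside, a regression model such as svm.SVR is the natural fit for predicting a continuous value like power demand. Here is a minimal sketch, not part of the original experiment, whose score() reports R² rather than accuracy:

```python
from sklearn import svm

# SVR predicts continuous values; score() returns R^2
reg = svm.SVR()
reg.fit(X_train, y_train)
print(reg.score(X_test, y_test))
```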
To see what the SVC's predictions actually look like anyway, I drew a graph and checked.
```python
# Predictions for the test set
result = model.predict(X_test)

# Compare predictions with actual values and compute the error
pp = pd.DataFrame({"kw": np.array(y_test), "result": np.array(result)})
pp["err"] = pp["kw"] - pp["result"]
pp.plot()
```
Somehow, the predictions do seem to track the actual values (^-^). Let's check the prediction errors against the actual values numerically.
```python
# Maximum, minimum, and mean of the prediction error
err_max = 0
err_min = 50000
err_ave = 0
for i in range(len(pp)):
    if err_max < pp["err"][i]:
        err_max = pp["err"][i]
    if err_min > pp["err"][i]:
        err_min = pp["err"][i]
    err_ave += pp["err"][i]
print(err_max)
print(err_min)
print(err_ave / len(pp))
```
The execution result (maximum error, minimum error, and mean error, in units of 10,000 kW) is as follows.
1571
-879
114.81661442
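Incidentally, the same summary statistics can be computed directly with pandas built-ins; a minimal equivalent sketch:

```python
# Equivalent summary statistics using pandas built-ins
print(pp["err"].max())   # maximum error
print(pp["err"].min())   # minimum error
print(pp["err"].mean())  # mean error
```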
Hmm, what to make of this result? I can't tell whether it's good or bad... (-_-;)
Of course there is more to investigate, but if I ever had to forecast power demand in a situation with little historical data, I think this approach would give a usable result.
Incidentally, the Jupyter Notebook and the processed CSV files are published on GitHub, so please refer to them as well.
https://github.com/shinob/predict_kw