It has been a while since I posted an article with this same title, and it has gotten a bit stale, so I am posting it again.
Please also refer to the following article, written on 2017/12/22: Power usage forecast with TensorFlow with Keras
First, download the power demand data from the TEPCO website.
http://www.tepco.co.jp/forecast/html/download-j.html
http://www.tepco.co.jp/forecast/html/images/juyo-2016.csv
You can also get the 2014 data by changing the year in the URL.
http://www.tepco.co.jp/forecast/html/images/juyo-2014.csv
The downloaded data is a CSV of the date, time, and actual power demand.
You can also download both files with the following commands.
shell
$ curl -O http://www.tepco.co.jp/forecast/html/images/juyo-2014.csv
$ curl -O http://www.tepco.co.jp/forecast/html/images/juyo-2016.csv
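If you would rather stay in Python, the same download can be sketched with the standard library. This is only a sketch: the helper names `juyo_url` and `download_juyo` are made up for illustration, and the URL pattern is simply inferred from the two URLs above, so other years may not be available at the same location.

```python
import urllib.request
from pathlib import Path

# Base location taken from the two URLs above.
BASE_URL = "http://www.tepco.co.jp/forecast/html/images"


def juyo_url(year: int) -> str:
    """Build the CSV URL for one year (pattern inferred from the 2014/2016 URLs)."""
    return f"{BASE_URL}/juyo-{year}.csv"


def download_juyo(year: int, dest_dir: str = ".") -> Path:
    """Download one year's CSV into dest_dir and return the local path."""
    dest = Path(dest_dir) / f"juyo-{year}.csv"
    urllib.request.urlretrieve(juyo_url(year), str(dest))
    return dest
```

Calling `download_juyo(2014)` and `download_juyo(2016)` should leave the same two files in the current directory as the curl commands do.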
Like last time, we will download the past weather data from the Japan Meteorological Agency.
http://www.data.jma.go.jp/gmd/risk/obsdl/index.php
Select "Tokyo" as the location and "hourly value" and "temperature" as the items, then download two periods, 2013/12/31 to 2015/1/1 and 2015/12/31 to 2017/1/1, and save them as data-2014.csv and data-2016.csv respectively. The periods are slightly longer than needed because the data is narrowed down to the exact range later.
The downloaded data is a CSV containing the date and time, temperature, quality information, and homogeneity number.
Note that this data apparently has to be downloaded through the site in the usual way, not with a command.
Import the required libraries.
python
import pandas as pd
import numpy as np
import datetime as dt
import math
First, load the 2014 power data.
python
filename = "juyo-2014.csv"
#The character code is Shift JIS; skip the unneeded leading lines when reading.
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=2)
#Rename the columns
df.columns = ["DATE","TIME","KW"]
#The date and time are in separate columns, so join them, parse them as datetimes, and use the result as the index.
df.index = pd.DatetimeIndex([dt.datetime.strptime(d + " " + t,"%Y/%m/%d %H:%M") for d, t in zip(df["DATE"], df["TIME"])])
#Month
df["MONTH"] = df.index.month
#Day of the week (0 = Monday, 6 = Sunday)
df["WEEK"] = df.index.weekday
#Hour of the day
df["HOUR"] = df.index.hour
df_kw = df
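To see what the MONTH, WEEK, and HOUR columns end up containing, here is a small sketch on a few made-up rows in the same DATE, TIME, KW layout. The values are invented for illustration and are not taken from the TEPCO file.

```python
import datetime as dt
import io

import pandas as pd

# Made-up sample rows in the DATE, TIME, KW layout described above.
sample_csv = io.StringIO(
    "DATE,TIME,KW\n"
    "2014/1/1,0:00,3000\n"
    "2014/1/1,1:00,2900\n"
    "2014/1/5,12:00,3500\n"
)
df_sample = pd.read_csv(sample_csv)

# Join date and time, parse them, and use the result as the index.
df_sample.index = pd.DatetimeIndex([
    dt.datetime.strptime(d + " " + t, "%Y/%m/%d %H:%M")
    for d, t in zip(df_sample["DATE"], df_sample["TIME"])
])

df_sample["MONTH"] = df_sample.index.month
df_sample["WEEK"] = df_sample.index.weekday  # 0 = Monday ... 6 = Sunday
df_sample["HOUR"] = df_sample.index.hour

print(df_sample[["MONTH", "WEEK", "HOUR", "KW"]])
```

2014/1/1 was a Wednesday, so WEEK is 2 for the first two rows, and 2014/1/5 was a Sunday, so WEEK is 6 for the last one.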
Next, load the temperature data for 2014.
python
filename = "data-2014.csv"
#The character code is Shift JIS; skip the unneeded leading lines and keep only the first 2 columns
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=4).iloc[:, [0, 1]]
#Rename the columns
df.columns = ["DATE","TEMP"]
#Parse the date and time strings as datetimes and use them as the index
df.index = pd.DatetimeIndex([dt.datetime.strptime(s,"%Y/%m/%d %H:%M:%S") for s in df["DATE"]])
df_temp = df
Join the temperature data to the power data for the same period.
python
d1 = df_kw.index.min()
d2 = df_kw.index.max()
df_kw["TEMP"] = df_temp.loc[d1:d2, "TEMP"]
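The join above relies on pandas aligning the assigned Series with the target frame by index label. A minimal sketch with made-up values shows what that means in practice, including what happens when an hour is missing on the temperature side. The frame names `power` and `temp` are only for this illustration.

```python
import pandas as pd

# Made-up power data for three consecutive hours.
power = pd.DataFrame(
    {"KW": [3000, 2900, 2800]},
    index=pd.to_datetime(
        ["2014-01-01 00:00", "2014-01-01 01:00", "2014-01-01 02:00"]
    ),
)

# Made-up temperature data: starts one hour earlier and lacks the 02:00 row.
temp = pd.DataFrame(
    {"TEMP": [5.5, 5.0, 4.0, 3.5]},
    index=pd.to_datetime(
        [
            "2013-12-31 23:00",
            "2014-01-01 00:00",
            "2014-01-01 01:00",
            "2014-01-01 03:00",
        ]
    ),
)

# Slice the temperature to the power data's period, then assign.
# Rows are matched by timestamp, not by position.
d1, d2 = power.index.min(), power.index.max()
power["TEMP"] = temp.loc[d1:d2, "TEMP"]

# Timestamps without a matching temperature become NaN.
missing = int(power["TEMP"].isna().sum())
print(power)
print("rows without temperature:", missing)
```

Because unmatched timestamps silently turn into NaN, it is probably worth checking `isna().sum()` on the real data before training.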
Extract the input data and output data used for machine learning. Since we are predicting power demand, we use the KW column as the output and the MONTH, WEEK, HOUR, and TEMP columns as the input.
python
#Columns used as input
X_cols = ["MONTH","WEEK","HOUR","TEMP"]
#Column used as output
y_cols = ["KW"]
#Extract the input / output data as NumPy arrays
X = df_kw[X_cols].to_numpy().astype('float')
y = df_kw[y_cols].to_numpy().astype('int').flatten()
Split the data into training data and test data.
python
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.1, random_state=42)
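As a rough illustration of what this split does, here is a sketch on a tiny made-up array. `train_test_split` shuffles the rows by default, so the test rows are scattered through the original order instead of being the last 10%. The `_demo` names are only for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# 20 made-up samples with 2 features each; y_demo is just the row number.
X_demo = np.arange(40, dtype=float).reshape(20, 2)
y_demo = np.arange(20)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.1, random_state=42
)

# 10% of 20 rows go to the test set, the rest to the training set.
print(X_tr.shape, X_te.shape)
# Which original rows ended up in the test set.
print(y_te)
```

For hourly data this means that hours from the same day can land on both sides of the split, which is worth keeping in mind when reading the score on this test set.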
Standardize the input data (zero mean and unit variance per column).
python
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
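The scaler is fitted on the training data only and then reused as-is for any other data. A small sketch with made-up numbers shows that `transform` simply applies the mean and standard deviation learned during `fit`. The `_demo` names are illustrative.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data and one new row that was not seen during fit.
X_tr_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
X_new_demo = np.array([[4.0, 40.0]])

scaler_demo = StandardScaler().fit(X_tr_demo)

# Same computation by hand, using the training mean and
# (population) standard deviation of each column.
manual = (X_new_demo - X_tr_demo.mean(axis=0)) / X_tr_demo.std(axis=0)

scaled = scaler_demo.transform(X_new_demo)
print(scaled)
print(np.allclose(scaled, manual))
```

This is why new data, such as another year, should go through `scaler.transform` only and not through a second `fit`.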
Train a regression model.
python
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor()
model.fit(X_train, y_train)
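One thing to be aware of: a random forest involves randomness, so scores can differ slightly between runs unless `random_state` is fixed. The following sketch on synthetic data (not the power data) shows that two fits with the same seed give identical predictions. The data, the `fit_predict` helper, and the choice of `n_estimators=50` are all assumptions for this illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic data: one "hour" feature and a noisy daily cycle as the target.
rng = np.random.default_rng(0)
X_demo = rng.uniform(0, 24, size=(200, 1))
y_demo = (
    3000
    + 500 * np.sin(X_demo[:, 0] / 24 * 2 * np.pi)
    + rng.normal(0, 20, size=200)
)


def fit_predict(seed):
    """Train a forest with the given seed and predict the first 5 rows."""
    forest = RandomForestRegressor(n_estimators=50, random_state=seed)
    forest.fit(X_demo, y_demo)
    return forest.predict(X_demo[:5])


# Same seed, same data: the predictions match.
same = np.allclose(fit_predict(0), fit_predict(0))
print(same)
```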
Calculate the score (the coefficient of determination, R^2) using the held-out test data.
python
print(model.score(X_test,y_test))
The score was "0.91601162513664502" (^-^)
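For reference, `score` on a scikit-learn regressor returns the coefficient of determination R^2: 1 minus the ratio of the squared prediction error to the squared deviation from the mean. A minimal sketch of that calculation, with a hypothetical helper name `r2_manual` and made-up numbers:

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared prediction error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared deviation from the mean
    return 1.0 - ss_res / ss_tot


# Perfect predictions give 1.0; always predicting the mean gives 0.0.
print(r2_manual([1, 2, 3], [1, 2, 3]))
print(r2_manual([1, 2, 3], [2, 2, 2]))
```

So a score of about 0.92 means the model is far better than always predicting the average demand, but it does not say how large the errors are in KW.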
Let's graph the prediction result and the actual data and check it.
python
#Prediction result
result = model.predict(X_test)
#Convert to data frame
df_result = pd.DataFrame({
"y_test":y_test,
"result":result
})
#Graph library
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
#Graph drawing
df_result.plot(figsize=(15, 3))
The predictions appear to track the actual values, but it is hard to tell at this scale.
Plot only the first 20 data points and check again.
python
#Graph drawing
df_result[:20].plot(figsize=(15, 3))
That looks pretty good!
Next, load the 2016 data using the same procedure as the 2014 data, this time keeping only April.
python
#Power demand
filename = "juyo-2016.csv"
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=2)
df.columns = ["DATE","TIME","KW"]
df.index = pd.DatetimeIndex([dt.datetime.strptime(d + " " + t,"%Y/%m/%d %H:%M") for d, t in zip(df["DATE"], df["TIME"])])
df["MONTH"] = df.index.month
df["WEEK"] = df.index.weekday
df["HOUR"] = df.index.hour
#Use only April (copy so that adding a column below does not act on a view)
df_kw = df[df.index.month == 4].copy()
#Temperature
filename = "data-2016.csv"
df = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=4).iloc[:, [0, 1]]
df.columns = ["DATE","TEMP"]
df.index = pd.DatetimeIndex([dt.datetime.strptime(s,"%Y/%m/%d %H:%M:%S") for s in df["DATE"]])
df_temp = df
#Data join
d1 = df_kw.index.min()
d2 = df_kw.index.max()
df_kw["TEMP"] = df_temp.loc[d1:d2, "TEMP"]
python
#Extract the input / output data as NumPy arrays
X = df_kw[X_cols].to_numpy().astype('float')
y = df_kw[y_cols].to_numpy().astype('int').flatten()
X_test = scaler.transform(X)
y_test = y
Predict and calculate scores using a model trained with 2014 data.
python
model.score(X_test,y_test)
The result was "0.82435418225963963", which was a little lower.
python
#Prediction result
result = model.predict(X_test)
#Convert to data frame
df_result = pd.DataFrame({
"y_test":y_test,
"result":result
})
#Graph drawing
df_result.plot(figsize=(15, 3))
It still needs a little more ingenuity (-_-;)