Previously, I explained how to predict the water level from precipitation. After further investigation, I was able to predict the water level one hour later with an accuracy of about 95%, so I have reorganized the material into this article.
Item | Contents |
---|---|
Machine | MacBook Air (13-inch, Early 2015) |
Processor | 2.2 GHz Intel Core i7 |
Memory | 8 GB 1600 MHz DDR3 |
Python | 3.6.0 :: Anaconda 4.3.1 (x86_64) |
Jupyter Notebook | 4.2.1 |
For setting up the environment, please refer to the following article (pardon the self-promotion).
Procedure to quickly create a deep learning environment on Mac with TensorFlow and OpenCV
Open Data List | Data City Sabae Portal Site
If you select the "Disaster prevention" group on the above website, you will see the following notation. Click the "CSV" button and download the CSV from the displayed link.
In addition, past weather data can be downloaded from the Japan Meteorological Agency, so we will download hourly precipitation data in Fukui City.
Japan Meteorological Agency | Past Meteorological Data Download
Use Jupyter Notebook to load the following libraries.
python
from ipywidgets import FloatProgress
from IPython.display import display
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
import datetime
python
#Read file
filename = "sparql.csv"
df_level = pd.read_csv(filename, header=None, skiprows=1)
#Rename column
df_level.columns = ["url","datetime","level"]
#Convert date and time to timestamp
df_level["datetime"] = df_level.datetime.map(lambda _: pd.to_datetime(_))
#Set date and time as index
df_level.index = df_level.pop("datetime")
#Sort by date and time(...I think it will work without it, but I'll leave it)
df_level = df_level.sort_index()
#graph display
df_level["level"].plot(figsize=(15,5))
When executed, the following graph will be displayed.
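As a side note, the date conversion and index setting above can also be done while reading the file. The following is a minimal sketch, assuming the same three-column layout; the sample rows and URLs are made up, and `df_sample` is a hypothetical name used only here.

```python
import io
import pandas as pd

# Hypothetical three-row sample in the same layout as the downloaded CSV
# (a header row, then url, datetime, level). The values are made up.
sample = io.StringIO(
    "s,dt,level\n"
    "http://example.com/1,2016-06-01 00:05:00,52\n"
    "http://example.com/0,2016-06-01 00:00:00,51\n"
    "http://example.com/2,2016-06-01 00:10:00,53\n"
)

df_sample = pd.read_csv(
    sample,
    header=None,
    skiprows=1,                          # skip the original header row
    names=["url", "datetime", "level"],  # column names used in this article
    parse_dates=["datetime"],            # convert to timestamps while reading
    index_col="datetime",                # use the timestamps as the index
).sort_index()

print(df_sample["level"])
```

The result has the same shape as `df_level` above, so the plotting line would work unchanged.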
Next, read the precipitation data and display it on the graph. Note that the CSV contains leading rows that are not data and that the character encoding is Shift JIS.
python
#Read file
filename = "data.csv"
df_rain = pd.read_csv(filename,encoding="SHIFT-JIS",skiprows=4)
#Rename column
df_rain.columns = ["datetime", "rain", "Information without phenomenon","quality information","Homogeneous number"]
#Convert date and time to timestamp
df_rain["datetime"] = df_rain.datetime.map(lambda _: pd.to_datetime(_))
#Set date and time as index
df_rain.index = df_rain.pop("datetime")
#graph display
df_level.level.plot(figsize=(15,5))
df_rain.rain.plot(figsize=(15,5))
When executed, the following graph will be displayed. By the way, orange is the amount of precipitation.
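To make the two points about this file concrete (leading rows that are not data, and the Shift JIS encoding), here is a small sketch that builds a stand-in file in memory and reads it back. The contents are invented and much simpler than the real download, which has more columns; `df_demo` is a hypothetical name.

```python
import io
import pandas as pd

# Made-up stand-in for the downloaded file: four lines of preamble that are
# not data, followed by the hourly records.
text = (
    "ダウンロードした時刻\n"
    "地点\n"
    "福井\n"
    "年月日時,降水量(mm)\n"
    "2016/6/1 1:00:00,0.5\n"
    "2016/6/1 2:00:00,3.0\n"
)
raw = text.encode("shift_jis")  # bytes as they would be stored on disk

df_demo = pd.read_csv(
    io.BytesIO(raw),
    encoding="shift_jis",        # decode the bytes as Shift JIS
    skiprows=4,                  # drop the preamble lines
    header=None,
    names=["datetime", "rain"],
)
df_demo["datetime"] = pd.to_datetime(df_demo["datetime"])
print(df_demo)
```

If the encoding argument is left out, pandas assumes UTF-8 and will typically fail to decode the Japanese header text.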
This time, I would like to predict the maximum water level over the following hour, using the water levels over the previous hour and the amount of precipitation.
For that, the training data is as follows.
Input | Output |
---|---|
Precipitation 1 hour ago, water level every 5 minutes 1 hour ago (10 points) | Maximum water level after 1 hour |
Since the water level data is recorded at 5-minute intervals, there should be 12 points per 60 minutes, but some data is missing, and depending on the timing some hours have fewer than 12 points. After trial and error, I settled on using 10 points per hour.
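To see how many readings each hour actually contains, counting them per hour is a quick check. This is a minimal sketch on synthetic data with a few readings deliberately removed; it is not part of the original notebook, and the series here is made up.

```python
import numpy as np
import pandas as pd

# Synthetic 5-minute series covering three hours (values are arbitrary).
idx = pd.date_range("2016-06-01 00:00", periods=36, freq=pd.Timedelta(minutes=5))
level = pd.Series(np.arange(36, dtype=float), index=idx)

# Simulate gaps by dropping three readings from the second hour.
level = level.drop(idx[[13, 14, 20]])

# Count how many readings fall into each one-hour bin.
counts = level.resample(pd.Timedelta(hours=1)).count()
print(counts)

# Hours that still have at least 10 readings.
usable = counts[counts >= 10]
print(len(usable), "of", len(counts), "hours have 10 or more points")
```

Running the same count on the real `df_level["level"]` would show how many hours survive a given threshold.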
In addition, since the precipitation data is described as "1 hour before" on the Japan Meteorological Agency's website, I treat each value as the precipitation for the hour preceding the date and time in the index.
Based on this, the data processing method is as follows.
python
#Get Precipitation Index
ixs = df_rain.index
#Creating an array for data acquisition
df = []
y = []
for i in range(len(ixs)-2):
    #Get date and time from index
    dt1 = ixs[i]
    dt2 = ixs[i + 1]
    dt3 = ixs[i + 2]
    #Get water level data from date and time data (label slices include both ends)
    d1 = df_level.loc[dt1:dt2].level.tolist()
    d2 = df_level.loc[dt2:dt3].level.tolist()
    if len(d1) > 10 and len(d2) > 10:
        #Get the maximum water level after 1 hour
        y.append(max(d2))
        #Sort the water level data one hour ago in descending order
        d1.sort()
        d1.reverse()
        #Get 10 points of data
        d1 = d1[:10]
        #Get precipitation data (.iloc replaces .ix, which was removed from pandas)
        d1.append(df_rain["rain"].iloc[i])
        #Get an array of input data
        df.append(d1)
#Convert to data frame
df = pd.DataFrame(df)
df["y"] = y
#Check the number of data
print(df.shape)
When I executed it, (6863, 12) was displayed and I was able to get 6863 rows of data.
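One way to check windowing logic like this is to run it on synthetic data where the expected result is easy to work out by hand. The sketch below wraps the same idea in a function; `build_samples` and its column labels are hypothetical names of my own, and the data is made up.

```python
import numpy as np
import pandas as pd

def build_samples(level, rain, n_points=10):
    """Pair each hour's top-n water levels and rainfall with the next hour's maximum."""
    rows = []
    hours = rain.index
    for i in range(len(hours) - 2):
        t1, t2, t3 = hours[i], hours[i + 1], hours[i + 2]
        past = level.loc[t1:t2].tolist()    # readings in the current hour (inclusive slice)
        future = level.loc[t2:t3].tolist()  # readings in the following hour
        if len(past) > n_points and len(future) > n_points:
            features = sorted(past, reverse=True)[:n_points]
            features.append(rain.iloc[i])
            rows.append(features + [max(future)])
    cols = list(range(n_points)) + ["rain", "y"]
    return pd.DataFrame(rows, columns=cols)

# Synthetic data: the level rises by exactly 1 every 5 minutes for 5 hours.
lv_idx = pd.date_range("2016-06-01 00:00", periods=61, freq=pd.Timedelta(minutes=5))
level = pd.Series(np.linspace(50.0, 110.0, 61), index=lv_idx)
rn_idx = pd.date_range("2016-06-01 00:00", periods=6, freq=pd.Timedelta(hours=1))
rain = pd.Series([0.0, 1.0, 2.5, 0.5, 0.0, 0.0], index=rn_idx)

samples = build_samples(level, rain)
print(samples.shape)
print(samples.head())
```

With a steadily rising level, the target of each row should simply be the reading at the end of the following hour, which makes mistakes in the slicing easy to spot.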
We will train the model on the first 90% of the data and verify the result on the remaining 10%.
python
#Divide data into input and output
y = df.pop("y").values.astype("int").flatten()
X = df.values.astype("float")
#Divided to use 90% for learning and 10% for verification
num = int(len(X) * 0.9)
print(len(X), num, len(X)-num)
X_train = X[:num]
X_test = X[num:]
y_train = y[:num]
y_test = y[num:]
#Set a random forest as a learning model
from sklearn.ensemble import RandomForestRegressor
model = RandomForestRegressor(random_state=42)
#Learning and verification
model.fit(X_train, y_train)
result = model.predict(X_test)
#Score
print(model.score(X_test,y_test))
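When I ran it, the score was "0.952915078747". For a regressor, the score method returns the coefficient of determination (R²), so this measures how well the predictions fit rather than a percentage of correct answers.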
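For reference, the value returned by a scikit-learn regressor's score method is the coefficient of determination. Here is a minimal sketch of that calculation with NumPy only; the sample values are made up, and `r2_score_manual` is a hypothetical helper name.

```python
import numpy as np

def r2_score_manual(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # squared error of the predictions
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of always predicting the mean
    return 1.0 - ss_res / ss_tot

# Made-up water levels for illustration only.
actual = [50, 52, 60, 75, 70]
predicted = [51, 52, 58, 73, 71]

print(r2_score_manual(actual, predicted))
# A perfect prediction scores 1; predicting the mean everywhere scores 0.
print(r2_score_manual(actual, actual))
print(r2_score_manual(actual, [np.mean(actual)] * len(actual)))
```

A score near 1 means the squared error is small compared with the spread of the water level itself, so a series that barely moves can make the score harder to interpret.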
The number alone doesn't show how the predictions behave, so I'll draw a graph.
python
pp = pd.DataFrame({'act': np.array(y_test), "pred": np.array(result), "rain": X_test[:,-1]})
pp.rain = pp.rain * 5
plt.figure(figsize=(15,5))
plt.ylim(0,250)
plt.plot(pp)
Blue is the actual water level, orange is the predicted water level, and it overlaps so much that the blue line can hardly be seen (^-^)
Wow!
Now, let's take the water levels at a certain point in time, vary only the precipitation, and predict the water level one hour later.
python
import random
#Randomly select index (randint includes both endpoints, hence len(df) - 1)
i = random.randint(0, len(df) - 1)
d = df.iloc[i].values.tolist()
print(d)
#Get a test array
df_test = []
#Create test data by changing precipitation from 0 to 20
for rain in range(21):
    temp = d[:10]
    temp.append(rain)
    df_test.append(temp)
#Forecast
test = model.predict(np.array(df_test).astype("float"))
#graph display
plt.plot(test)
The data used were the following values.
python
[150.0, 149.0, 149.0, 148.0, 147.0, 147.0, 147.0, 146.0, 146.0, 146.0, 8.0, 147.0]
The graph of the forecast results is as follows.
The X-axis is the precipitation and the Y-axis is the water level. Looking at this graph, the water level rises gradually with precipitation, but it rises sharply after 10 mm and then falls at 13 mm.
I tried a few other tests, but all of them had a slightly distorted graph. Even if the prediction accuracy of time series data is high, this is not useful ... (-_-;)
I thought that the water level would rise as the amount of precipitation increased, but the prediction based on the test data was a little different from what I expected, and it did not increase uniformly. This is probably because it is not possible to correctly predict what is not included in the training data.
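The limitation with values outside the training data can be reproduced on a toy problem. The sketch below is not the water-level model; it assumes a simple relationship y = 2x observed only for x between 0 and 10, and asks a random forest about larger x.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Toy relationship y = 2x, observed only for x between 0 and 10.
x_train = np.linspace(0, 10, 101).reshape(-1, 1)
y_train = 2.0 * x_train.ravel()

forest = RandomForestRegressor(n_estimators=50, random_state=42)
forest.fit(x_train, y_train)

# Inside the training range the fit is close; outside it the forest can
# only return averages of target values it has already seen.
x_new = np.array([[5.0], [20.0], [30.0]])
pred = forest.predict(x_new)
print(pred)
```

The predictions for x = 20 and x = 30 come out the same and never exceed the largest target seen in training, because each tree returns the average of training targets in a leaf. That is consistent with the uneven curves above whenever the combination of water level and precipitation is rare in the training data.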
Alright, let's consider the next method with this in mind!
Now let's try a neural network, an algorithm that has been popular recently. The data processing is the same as before; only the machine learning part changes, as follows.
By the way, this kind of neural network is also known as a multi-layer perceptron. In addition, since neural networks train best on inputs of a small, comparable scale (roughly -1 to 1), the training data is standardized first.
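As a quick illustration of what the scaling step does, here is a sketch with a made-up feature matrix. StandardScaler subtracts each column's mean and divides by its standard deviation, using statistics taken from the training rows only; `scaler_demo` and the arrays are hypothetical.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up feature matrix: a water-level column and a precipitation column.
train = np.array([[50.0, 0.0], [60.0, 2.0], [70.0, 4.0], [80.0, 10.0]])
test = np.array([[65.0, 3.0], [120.0, 20.0]])

scaler_demo = StandardScaler().fit(train)  # statistics come from the training rows only
train_std = scaler_demo.transform(train)
test_std = scaler_demo.transform(test)

# The same result by hand: subtract the training mean, divide by the training std.
manual = (test - train.mean(axis=0)) / train.std(axis=0)

print(train_std.mean(axis=0), train_std.std(axis=0))
print(test_std)
```

Note that the result is not clipped to the range -1 to 1: a test row far from the training data, like the second one here, still maps to a large value.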
python
#Divide data into input and output
y = df.pop("y").values.astype("int").flatten()
X = df.values.astype("float")
#Divided to use 90% for learning and 10% for verification
num = int(len(X) * 0.9)
print(len(X), num, len(X)-num)
X_train = X[:num]
X_test = X[num:]
y_train = y[:num]
y_test = y[num:]
#Data normalization
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
scaler.fit(X_train)
X_train = scaler.transform(X_train)
X_test = scaler.transform(X_test)
#Set a neural network as a learning model
from sklearn.neural_network import MLPRegressor
model = MLPRegressor(random_state=42)
#Learning and verification
model.fit(X_train, y_train)
result = model.predict(X_test)
#Score
print(model.score(X_test,y_test))
When executed, the score is "0.947163962045", which is a little worse than Random Forest (-_-;)
But for the time being, I will try until the end.
python
import random
#Randomly select index (randint includes both endpoints, hence len(df) - 1)
i = random.randint(0, len(df) - 1)
d = df.iloc[i].values.tolist()
print(d)
df_test = []
#Create test data by changing precipitation from 0 to 20
for rain in range(21):
    temp = d[:10]
    temp.append(rain)
    df_test.append(temp)
#Input data normalization
d = scaler.transform(np.array(df_test).astype("float"))
#Forecast
test = model.predict(d)
plt.plot(test)
Let's run it.
[54.0, 54.0, 54.0, 53.0, 53.0, 53.0, 53.0, 53.0, 53.0, 53.0, 0.0, 53.0]
There it is!!
Neural networks are amazing!!
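One step that is easy to forget in the sweep above is scaling the test inputs with the same scaler used for training. A scikit-learn pipeline can bundle the two so that prediction applies the scaling automatically. This is a minimal sketch on synthetic stand-in data with an assumed toy relationship, not the Sabae data; all names and the max_iter value are my own choices.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(42)

# Synthetic stand-in data: 10 water-level columns plus one precipitation column.
levels = rng.uniform(40, 160, size=(300, 1)) + rng.normal(0, 1, size=(300, 10))
rain = rng.uniform(0, 20, size=(300, 1))
X_demo = np.hstack([levels, rain])
y_demo = levels.mean(axis=1) + 2.0 * rain.ravel()  # assumed toy relationship

# The pipeline standardizes inputs before the network in both fit and predict.
pipe = make_pipeline(StandardScaler(), MLPRegressor(max_iter=2000, random_state=42))
pipe.fit(X_demo, y_demo)

# Sweep precipitation from 0 to 20 for one fixed set of water levels.
base = X_demo[0, :10]
sweep = np.array([np.append(base, r) for r in range(21)], dtype=float)
curve = pipe.predict(sweep)  # no separate scaler.transform call needed
print(curve.shape)
```

The result can be plotted with plt.plot(curve) in the same way as the graphs above.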
Thank you to everyone involved in open data in Sabae City for their valuable data. We look forward to working with you in the future.
I have published the Jupyter Notebook in which I ran the above, so please refer to it as well.
Water level prediction using open data in Sabae City, Fukui Prefecture-2017 version