I tried to predict the change in snowfall for 2 years by machine learning

This entry is a sequel to my earlier post, "I tried to predict the presence or absence of snow by machine learning". That time I predicted only whether there would be snow (1 or 0); this time I went a little further and tried to predict how the amount of snow changes over time.

Here are the results first. The horizontal axis is the number of days, and the vertical axis is the amount of snow (cm).

Result 1 (blue line: actual amount of snow, red line: predicted amount of snow) [Figure: Screenshot 2016-05-01 17.45.39.png]

Result 2 (blue line: actual amount of snow, red line: predicted amount of snow) [Figure: Screenshot 2016-05-01 17.59.35.png]

Read on to find out what "Result 1" and "Result 2" each represent.

What I wanted to do

In the earlier post, "I tried to predict the presence or absence of snow by machine learning", I used scikit-learn to predict whether there would be snow. This time I got a little greedy and wanted to predict the actual amount of snow (cm) over a period, not just whether there was any.

Specifically, I obtain meteorological data such as snow cover, wind speed, and temperature from the Japan Meteorological Agency, use roughly the first 7500 days for training, then predict the change in snowfall over the remaining 2 years (365 x 2 = 730 days) and compare it with the actual change.
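The split described above can be sketched as follows. This is a minimal illustration that assumes one record per day in chronological order; `split_train_test` and the toy data are hypothetical names, not part of the original script.

```python
def split_train_test(samples, targets, test_days=365 * 2):
    """Hold out the last `test_days` records for evaluation; train on the rest."""
    if len(samples) != len(targets):
        raise ValueError("samples and targets must have the same length")
    if len(samples) <= test_days:
        raise ValueError("not enough records to hold out the test period")
    cut = len(samples) - test_days
    return (samples[:cut], targets[:cut]), (samples[cut:], targets[cut:])


# Toy stand-in for 7500 + 730 daily records (hypothetical data)
samples = [[day] for day in range(8230)]
targets = [0] * 8230
(train_x, train_y), (test_x, test_y) = split_train_test(samples, targets)
print(len(train_x), len(test_x))  # 7500 730
```

Splitting by position rather than at random keeps the test period strictly after the training period, which matters for time-ordered data.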

Collect training data

The training data is what the Japan Meteorological Agency publishes. For how to obtain it, see the earlier post, "I tried to predict the presence or absence of snow by machine learning".

The CSV data obtained looks like this. The target is Tonami City in Toyama Prefecture, which gets a lot of snow.

`data_2013_2015.csv`


Download time: 2016/03/20 20:31:19

,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami,Tonami
Date and time,temperature(℃),temperature(℃),temperature(℃),Snow cover(cm),Snow cover(cm),Snow cover(cm),wind speed(m/s),wind speed(m/s),wind speed(m/s),wind speed(m/s),wind speed(m/s),Precipitation(mm),Precipitation(mm),Precipitation(mm)
,,,,,,,,,Wind direction,Wind direction,,,,
,,quality information,Homogeneous number,,quality information,Homogeneous number,,quality information,,quality information,Homogeneous number,,quality information,Homogeneous number
2013/2/1 1:00:00,-3.3,8,1,3,8,1,0.4,8,West,8,1,0.0,8,1
2013/2/1 2:00:00,-3.7,8,1,3,8,1,0.3,8,North,8,1,0.0,8,1
2013/2/1 3:00:00,-4.0,8,1,3,8,1,0.2,8,Quiet,8,1,0.0,8,1
2013/2/1 4:00:00,-4.8,8,1,3,8,1,0.9,8,South-southeast,8,1,0.0,8,1
...
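As a rough sketch of how the measured values can be picked out of rows like these, the snippet below reads two of the sample rows with the standard `csv` module. The column positions (1, 4, 7, 12) are the ones the script later in this post uses; `parse_row` and the dictionary keys are hypothetical names for illustration, and the header rows are assumed to have been removed beforehand.

```python
import csv
import io

# Two data rows copied from the sample above (header rows omitted)
SAMPLE = """2013/2/1 1:00:00,-3.3,8,1,3,8,1,0.4,8,West,8,1,0.0,8,1
2013/2/1 2:00:00,-3.7,8,1,3,8,1,0.3,8,North,8,1,0.0,8,1
"""


def parse_row(row):
    """Keep the measured values; skip the quality and homogeneity columns."""
    return {
        "date": row[0].split(" ")[0],
        "temperature_c": float(row[1]),
        "snow_cover_cm": int(row[4]),
        "wind_speed_ms": float(row[7]),
        "precipitation_mm": float(row[12]),
    }


records = [parse_row(row) for row in csv.reader(io.StringIO(SAMPLE))]
print(records[0])
```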

Basic approach

This is probably the standard approach for this kind of prediction: train a model on pairs of several kinds of peripheral data and the resulting amount of snow, then give the trained model only the peripheral data and get back a predicted amount of snowfall. This is so-called "supervised learning".

In this case, the following data was used as peripheral data.

Temperature

Wind speed

Yesterday's snowfall

1 day ago temperature, 2 days ago temperature, 3 days ago temperature

Wind speed 1 day ago, wind speed 2 days ago, wind speed 3 days ago

Expressed as a diagram, the training data looks like this.

[temperature, wind speed, yesterday's snowfall, temperature 1 day ago, temperature 2 days ago, temperature 3 days ago, wind speed 1 day ago, wind speed 2 days ago, wind speed 3 days ago] → snowfall on the day
[temperature, wind speed, yesterday's snowfall, temperature 1 day ago, temperature 2 days ago, temperature 3 days ago, wind speed 1 day ago, wind speed 2 days ago, wind speed 3 days ago] → snowfall on the day
[temperature, wind speed, yesterday's snowfall, temperature 1 day ago, temperature 2 days ago, temperature 3 days ago, wind speed 1 day ago, wind speed 2 days ago, wind speed 3 days ago] → snowfall on the day
....
[temperature, wind speed, yesterday's snowfall, temperature 1 day ago, temperature 2 days ago, temperature 3 days ago, wind speed 1 day ago, wind speed 2 days ago, wind speed 3 days ago] → snowfall on the day

Then, based on this, I give the model only the peripheral data and get the predicted value:

[temperature, wind speed, yesterday's snowfall, temperature 1 day ago, temperature 2 days ago, temperature 3 days ago, wind speed 1 day ago, wind speed 2 days ago, wind speed 3 days ago] → (predicted amount of snow on the day)

That is how I did it. The inputs are basically data for the forecast target date itself; only yesterday's snowfall comes from the day before the target date. It also seemed to be the input with the most influence on the prediction, which is natural when you think about it.
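The feature layout above can be sketched in a few lines. This is a simplified illustration that assumes the data has already been reduced to one `(temperature, wind speed, snow)` tuple per day; `build_samples` and the toy values are hypothetical and are not the loader used in the script below.

```python
def build_samples(daily):
    """daily: list of (temperature, wind_speed, snow_cm), one per day, oldest first.

    Returns (features, targets) where each feature row is
    [temp, wind, snow yesterday, temp 1-3 days ago, wind 1-3 days ago].
    """
    features, targets = [], []
    for i in range(3, len(daily)):  # the first 3 days lack a full history
        temp, wind, snow = daily[i]
        row = [temp, wind, daily[i - 1][2]]
        row += [daily[i - k][0] for k in (1, 2, 3)]  # lagged temperatures
        row += [daily[i - k][1] for k in (1, 2, 3)]  # lagged wind speeds
        features.append(row)
        targets.append(snow)
    return features, targets


daily = [(-1, 0.5, 10), (-2, 0.8, 12), (0, 1.1, 11), (1, 0.3, 9), (-3, 2.0, 15)]
features, targets = build_samples(daily)
print(features[0])  # [1, 0.3, 11, 0, -2, -1, 1.1, 0.8, 0.5]
print(targets)      # [9, 15]
```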

As I wrote at the beginning, I use about 7500 days of the data obtained from the Japan Meteorological Agency for training, predict the change in snow cover over the remaining 2 years, and compare it with the actual change.
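One common way to put a number on "compare it with the actual change" is the root mean squared error (RMSE) over the held-out period. The sketch below uses only the standard library; the `rmse` helper and the sample values are hypothetical.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


actual = [0, 5, 12, 20, 8]      # measured snow depth in cm (made-up values)
predicted = [0, 7, 10, 17, 8]   # model output in cm (made-up values)
print(rmse(predicted, actual))
```

The result is in the same unit as the data (cm), so it can be read as a typical size of the prediction error.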

Try to predict

The actual code looks like this:

snow_forecaster.py

    import csv

    import numpy as np
    from matplotlib import pyplot
    from sklearn import linear_model


    class SnowForecast:

        def __init__(self):
            """Initialize each instance variable"""
            self.model = None     # Trained model
            self.data = []        # Array of training data
            self.target = []      # Array of actual snow cover
            self.predicts = []    # Array of predicted snowfall values
            self.reals = []       # Array of actual snow cover
            self.day_counts = []  # Array of days elapsed since the start date
            self.date_list = []
            self.record_count = 0

        def load_csv(self):
            """Read the CSV file used for training"""
            with open("sample_data/data.csv", "r") as f:
                reader = csv.reader(f)
                accumulation_yesterday0 = 0
                date_yesterday = ""
                temp_3days = []
                wind_speed_3days = []
                for row in reader:
                    if row[4] == "":
                        continue

                    daytime = row[0]                # "yyyy/mm/dd HH:MM:SS"
                    date = daytime.split(" ")[0]    # "yyyy/mm/dd"
                    temp = int(float(row[1]))       # Temperature: slight effect
                    wind_speed = float(row[7])      # Wind speed: slight effect
                    precipitation = float(row[12])  # Precipitation: no effect, so unused
                    accumulation = int(row[4])      # Snow cover: yesterday's value has a big effect

                    if len(wind_speed_3days) == 3:
                        # Training data
                        # [temperature, wind speed, yesterday's snowfall,
                        #  temperature 1/2/3 days ago, wind speed 1/2/3 days ago]
                        sample = [temp, wind_speed, accumulation_yesterday0]
                        sample.extend(temp_3days)
                        sample.extend(wind_speed_3days)
                        self.data.append(sample)
                        self.target.append(accumulation)

                    if date_yesterday != date:
                        accumulation_yesterday0 = accumulation
                        self.date_list.append(date)

                        wind_speed_3days.insert(0, wind_speed)
                        if len(wind_speed_3days) > 3:
                            wind_speed_3days.pop()

                        temp_3days.insert(0, temp)
                        if len(temp_3days) > 3:
                            temp_3days.pop()

                    date_yesterday = date

            self.record_count = len(self.data)
            return self.data

        def train(self):
            """Train the model on roughly the first 7500 days of the original data"""
            x = self.data
            y = self.target
            print(len(x))
            # Of ElasticNetCV, LassoCV, and RidgeCV, ElasticNetCV had the smallest error
            model = linear_model.ElasticNetCV(fit_intercept=True)
            model.fit(x[0:self.training_data_count()], y[0:self.training_data_count()])
            self.model = model

        def predict(self):
            """Predict with the trained model. Forecast the last two years"""
            x = self.data
            y = self.target
            model = self.model
            for i, xi in enumerate(x):
                real_val = y[i]
                if i < self.training_data_count() + 1:
                    # Training period: store a placeholder instead of a prediction
                    self.predicts.append(0)
                    self.reals.append(real_val)
                    self.day_counts.append(i)
                    continue

                predict_val = int(model.predict([xi])[0])
                # A predicted snowfall below 0 is clipped to 0
                if predict_val < 0:
                    predict_val = 0
                self.predicts.append(predict_val)
                self.reals.append(real_val)
                self.day_counts.append(i)

        def show_graph(self):
            """Compare predicted and measured values on a graph"""
            start = self.predict_start_num()
            pyplot.plot(self.day_counts[start:], self.reals[start:], "b")
            pyplot.plot(self.day_counts[start:], self.predicts[start:], "r")
            pyplot.show()

        def check(self):
            """Measure the error between predicted and measured values"""
            start = self.predict_start_num()
            p = np.array(self.predicts[start:])
            e = p - np.array(self.reals[start:])
            rmse = np.sqrt(np.mean(e * e))
            print("RMSE over the forecast period: {}".format(rmse))

        def training_data_count(self):
            """Leave out the last two years and use everything before as training data.
            Returns the number of training records"""
            return self.record_count - 365 * 2

        def predict_start_num(self):
            """The last two years are predicted and compared with the measured values.
            Returns the index where prediction starts"""
            return self.training_data_count() + 1


    if __name__ == "__main__":
        forecaster = SnowForecast()
        forecaster.load_csv()
        forecaster.train()
        forecaster.predict()
        forecaster.check()
        forecaster.show_graph()

The most tedious part, as in the previous article, was creating training data from the raw data. Still, it is easy because it is Python.

Running this gives the following result (blue line: actual amount of snow, red line: predicted amount of snow). This is "Result 1" shown at the beginning.

The prediction looks reasonably plausible.

At this point a doubt suddenly occurred to me: "I am predicting from the amount of snow one day earlier, so if I actually use this for future prediction, I can only predict tomorrow's amount of snow...?"

Well, yes. By that logic the same goes for temperature and wind speed. But you see, for those there are weather forecasts... (cough, cough)

Changed to predict the next day's snowfall from my own prediction of yesterday's snowfall

So I immediately modified the code accordingly. The training part of the model is unchanged. Among the inputs given at prediction time, yesterday's snowfall is replaced with the value the model itself predicted for the previous day, instead of the actual measurement.

The full code is as follows. Only the predict function has changed.
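The feedback loop can be isolated in a small sketch. Here `model` is any callable that takes one feature row and returns a number, a hypothetical stand-in for a fitted regressor; `forecast_recursively` and `toy_model` are made-up names, and index 2 of each row is assumed to hold yesterday's snowfall as in the feature layout above.

```python
def forecast_recursively(model, feature_rows, first_snow_yesterday):
    """Predict day by day, feeding each prediction back as yesterday's snowfall."""
    predictions = []
    snow_yesterday = first_snow_yesterday
    for row in feature_rows:
        row = list(row)          # copy so the caller's data is not modified
        row[2] = snow_yesterday  # overwrite the measured value
        predicted = max(0, int(model(row)))  # snow depth cannot be negative
        predictions.append(predicted)
        snow_yesterday = predicted
    return predictions


def toy_model(row):
    """Toy rule: 90% of yesterday's snow remains, minus 1 cm per degree above zero."""
    return 0.9 * row[2] - max(0, row[0])


# 99 marks the measured "yesterday" values, which the loop ignores
rows = [[-1, 0.5, 99], [2, 0.5, 99], [0, 0.5, 99]]
print(forecast_recursively(toy_model, rows, first_snow_yesterday=20))  # [18, 14, 12]
```

Copying each row before overwriting index 2 is a design choice in this sketch: it leaves the measured data intact so the same rows can still be used for a one-step-ahead comparison.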

snow_forecaster.py

    import csv

    import numpy as np
    from matplotlib import pyplot
    from sklearn import linear_model


    class SnowForecast:

        def __init__(self):
            """Initialize each instance variable"""
            self.model = None     # Trained model
            self.data = []        # Array of training data
            self.target = []      # Array of actual snow cover
            self.predicts = []    # Array of predicted snowfall values
            self.reals = []       # Array of actual snow cover
            self.day_counts = []  # Array of days elapsed since the start date
            self.date_list = []
            self.record_count = 0

        def load_csv(self):
            """Read the CSV file used for training"""
            with open("sample_data/data.csv", "r") as f:
                reader = csv.reader(f)
                accumulation_yesterday0 = 0
                date_yesterday = ""
                temp_3days = []
                wind_speed_3days = []
                for row in reader:
                    if row[4] == "":
                        continue

                    daytime = row[0]                # "yyyy/mm/dd HH:MM:SS"
                    date = daytime.split(" ")[0]    # "yyyy/mm/dd"
                    temp = int(float(row[1]))       # Temperature: slight effect
                    wind_speed = float(row[7])      # Wind speed: slight effect
                    precipitation = float(row[12])  # Precipitation: no effect, so unused
                    accumulation = int(row[4])      # Snow cover: yesterday's value has a big effect

                    if len(wind_speed_3days) == 3:
                        # Training data
                        # [temperature, wind speed, yesterday's snowfall,
                        #  temperature 1/2/3 days ago, wind speed 1/2/3 days ago]
                        sample = [temp, wind_speed, accumulation_yesterday0]
                        sample.extend(temp_3days)
                        sample.extend(wind_speed_3days)
                        self.data.append(sample)
                        self.target.append(accumulation)

                    if date_yesterday != date:
                        accumulation_yesterday0 = accumulation
                        self.date_list.append(date)

                        wind_speed_3days.insert(0, wind_speed)
                        if len(wind_speed_3days) > 3:
                            wind_speed_3days.pop()

                        temp_3days.insert(0, temp)
                        if len(temp_3days) > 3:
                            temp_3days.pop()

                    date_yesterday = date

            self.record_count = len(self.data)
            return self.data

        def train(self):
            """Train the model on roughly the first 7500 days of the original data"""
            x = self.data
            y = self.target
            print(len(x))
            # Of ElasticNetCV, LassoCV, and RidgeCV, ElasticNetCV had the smallest error
            model = linear_model.ElasticNetCV(fit_intercept=True)
            model.fit(x[0:self.training_data_count()], y[0:self.training_data_count()])
            self.model = model

        def predict(self):
            """Predict the amount of snowfall with the trained model. Forecast the last two years"""
            x = self.data
            y = self.target
            model = self.model
            yesterday_predict_val = None  # Holds the value predicted for the previous day
            for i, xi in enumerate(x):
                real_val = y[i]
                if i < self.training_data_count() + 1:
                    # Training period: store a placeholder instead of a prediction
                    self.predicts.append(0)
                    self.reals.append(real_val)
                    self.day_counts.append(i)
                    continue

                # Replace yesterday's measured snowfall with yesterday's prediction
                # (this overwrites the row in self.data in place)
                if yesterday_predict_val is not None:
                    xi[2] = yesterday_predict_val

                predict_val = int(model.predict([xi])[0])
                # A predicted snowfall below 0 is clipped to 0
                if predict_val < 0:
                    predict_val = 0
                self.predicts.append(predict_val)
                self.reals.append(real_val)
                self.day_counts.append(i)
                yesterday_predict_val = predict_val

        def show_graph(self):
            """Compare predicted and measured values on a graph"""
            start = self.predict_start_num()
            pyplot.plot(self.day_counts[start:], self.reals[start:], "b")
            pyplot.plot(self.day_counts[start:], self.predicts[start:], "r")
            pyplot.show()

        def check(self):
            """Measure the error between predicted and measured values"""
            start = self.predict_start_num()
            p = np.array(self.predicts[start:])
            e = p - np.array(self.reals[start:])
            rmse = np.sqrt(np.mean(e * e))
            print("RMSE over the forecast period: {}".format(rmse))

        def training_data_count(self):
            """Leave out the last two years and use everything before as training data.
            Returns the number of training records"""
            return self.record_count - 365 * 2

        def predict_start_num(self):
            """The last two years are predicted and compared with the measured values.
            Returns the index where prediction starts"""
            return self.training_data_count() + 1


    if __name__ == "__main__":
        forecaster = SnowForecast()
        forecaster.load_csv()
        forecaster.train()
        forecaster.predict()
        forecaster.check()
        forecaster.show_graph()

Running the modified script gives the following result (blue line: actual amount of snow, red line: predicted amount of snow). This is "Result 2" shown at the beginning.

Hmm. As expected, it is less accurate than when the actual snowfall of the previous day is given. Still, the shape of the curve is not that far off.

Impressions etc.

I expected the prediction to be much worse, so I was fairly pleased with how it turned out. However, although I glossed over it with a cough earlier, the temperature and wind speed given at prediction time are the measured values for the day. To make predictions for a period in the future, you would have to feed in forecast values for those inputs or stop using them altogether, and with forecast values the accuracy would drop, more so the further into the future you go. When you chain predictions like this, predicting from a predicted value and then predicting again from that, a slight error early on can grow considerably in later steps. That is why the Japan Meteorological Agency works so hard at it.
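How a small error can grow when predictions are chained can be seen in a toy example. The recurrence below (snow[t] = coef * snow[t-1] + 1) and both coefficients are made up purely for illustration and have nothing to do with real snow physics; the point is only to compare one-step-ahead prediction with chained prediction for a model whose coefficient is slightly off.

```python
def simulate(coef, start, days):
    """Iterate value[t] = coef * value[t-1] + 1 for `days` steps."""
    values, current = [], start
    for _ in range(days):
        current = coef * current + 1
        values.append(current)
    return values


TRUE_COEF, MODEL_COEF = 0.95, 0.97  # the model's coefficient is slightly wrong
truth = simulate(TRUE_COEF, start=50, days=30)

# One-step-ahead: each prediction starts from the true value of the day before
previous = [50] + truth[:-1]
one_step = [MODEL_COEF * p + 1 for p in previous]

# Chained: each prediction starts from the previous prediction
chained = simulate(MODEL_COEF, start=50, days=30)

# Print day number, one-step error, and chained error
for day in (0, 9, 29):
    print(day + 1,
          round(abs(one_step[day] - truth[day]), 2),
          round(abs(chained[day] - truth[day]), 2))
```

On the first day both errors are identical because both start from the same known value; after that the one-step error stays small while the chained error keeps accumulating.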

