**2016/8/31: Corrected — the original AUC had been computed by looking only at the "fine" class.**
When it comes to CNNs (Convolutional Neural Networks) in deep learning, image processing is the main application, but I have not done any image-based analysis yet. So this time I want to see **whether tomorrow's weather can be predicted from weather images**.
The meteorological images were downloaded from the Kochi University site, covering January 2015 to July 2016; an example image is shown above. Since images are available every hour, I used the 17:00 image for each day.
Source: Provided by Kochi University, University of Tokyo, Japan Meteorological Agency
Past weather records can be obtained from the Japan Meteorological Agency's [download page](http://www.data.jma.go.jp/risk/obsdl/index.php). Here too I used daily weather data from January 2015 to July 2016, for Tokyo, taking the daytime weather.
The task is to predict whether the next day will be fine or rainy from the weather image around Japan at 17:00 on the previous day. The JMA weather labels include things like "fine, later rain" and "cloudy"; any label that mentions rain is treated as "rain" and everything else as "fine", making this a binary classification problem.
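As a minimal sketch of this labeling rule (assuming the JMA daily weather is available as a Japanese text label such as 「晴後雨」, "fine, later rain"):

```python
# -*- coding: utf-8 -*-
# Sketch of the binary labeling rule described above (assumption: the daily
# weather comes as a text label like u"晴後雨").
def to_label(weather_str):
    # 1 = "rain" if the label mentions rain at all, otherwise 0 = "fine"
    return 1 if u"雨" in weather_str else 0

print(to_label(u"晴後雨"))  # 1 (treated as rain)
print(to_label(u"曇"))      # 0 (treated as fine)
```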
Tomorrow's weather is affected not only by the movement of clouds near Tokyo but also by the westerlies, so a fairly wide area of the meteorological image is needed, although the weather far from Japan is not. I therefore crop the images down to the area around Japan.
```python
import numpy as np
from PIL import Image
import datetime as dt

# Original image size (640x480)
w = 640
h = 480

"""Crop box covering only the area around Japan (left, top, right, bottom)"""
sw = 320
sh = 65
ew = 540
eh = 320

"""Image compression (resize the crop down to 50x50 if True)"""
is_comp = False

def get_mat(dates=[]):
    """
    Load the 17:00 weather image for each date, crop it to the area around
    Japan and return it as a float32 array of shape (N, channel, width, height).
    :param dates: list of date strings formatted as "%Y/%m/%d"
    :return: (image matrix, list of dates whose image could not be read)
    """
    l = len(dates)
    if not is_comp:
        wr = ew - sw
        hr = eh - sh
    else:
        wr = 50
        hr = 50
    mat = np.zeros((l, 3, wr, hr), dtype=np.float32)
    # base_file_dir and base_hour are assumed to be defined elsewhere
    # (the download directory and the hour string, e.g. "17")
    file_base = base_file_dir + "fe.%s" + base_hour + ".jpg"
    j = 0
    err_dates = []
    for ddd in dates:
        dd = dt.datetime.strptime(ddd, "%Y/%m/%d")
        dd_str = dd.strftime("%y%m%d")
        try:
            im = Image.open(file_base % (dd_str))
            im = im.crop((sw, sh, ew, eh))
            im = im.resize((wr, hr))
            mat0 = np.array(im)
            # np.array(im) is (height, width, channel); transpose each channel
            # so the matrix is stored as (width, height)
            for i in range(0, 3):
                mat[j, i, :, :] = mat0[:, :, i].T
            j += 1
        except Exception:
            err_dates.append(ddd)
            print(dd_str + " --> Error!!")
    return mat[0:j], err_dates
```
The function takes a list of dates, returns the images as a numpy matrix, and also returns the dates that caused errors. Note that converting the PIL image object `im` to a numpy array gives an array in (height, width, channel) order, so each channel is transposed when it is stored. Here the 640x480 image is cropped down to a region that just contains the Japanese archipelago; the result looks like the following.
Since each cell of the resulting matrix holds a value from 0 to 255, divide by 255 to scale the data into the 0 to 1 range.
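A minimal usage sketch, putting the loading and scaling together (the date list is illustrative, and `base_file_dir` / `base_hour` are assumed to point at the downloaded images):

```python
# Illustrative usage of get_mat followed by the 0-1 scaling described above.
sample_dates = ["2015/01/01", "2015/01/02", "2015/01/03"]
x, err_dates = get_mat(sample_dates)
x /= 255.0          # scale pixel values from [0, 255] to [0, 1]
print(x.shape)      # (n_loaded, 3, 220, 255) with the crop settings above
print(err_dates)    # dates whose image could not be read
```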
Because of the CNN training time later on, I considered compressing the images (the `is_comp` flag above), but decided against it this time.
The convolutional neural network model is the same one I have used in previous posts; please refer to those articles for details (.com/wbh/items/da881fac695f17042b19).
The parameter settings are as follows.
```python
params = {"clm_dim": clm_dim, "in_channels": 3, "out_channels": 3, "row_dim": row_dim,
          "filt_clm": 3, "filt_row": 3, "pool_clm": 3, "pool_row": 3,
          "batchsize": 200, "hidden_dim": 500, "n_classes": 2}
```
`clm_dim` and `row_dim` correspond to the width and height of each image, and `in_channels` is 3 because the images are RGB. Max pooling is used. Be aware that increasing the filter size or pooling size increases the computation time considerably (this tripped me up).
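For reference, a small sketch of how `clm_dim` and `row_dim` would follow from the crop settings above (assuming "column" corresponds to width and "row" to height, matching the matrix layout in `get_mat`):

```python
# Assumed mapping: column = width, row = height (matching get_mat's output shape).
clm_dim = ew - sw   # 540 - 320 = 220
row_dim = eh - sh   # 320 - 65  = 255
```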
Even with these settings, training takes a long time on my MacBook Pro, so this time I set the number of epochs to 50.
Because of the computation time, the training data covers one year from 2015/1/1 to 2015/12/31, and the test data covers 2016/1/1 to 2016/7/31. Even this takes a while. Image processing is scary...
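As a sketch, the two date lists could be built like this (assuming dates formatted as "%Y/%m/%d", matching what `get_mat` expects):

```python
import datetime as dt

def date_range(start, end):
    """Inclusive list of dates formatted as "%Y/%m/%d"."""
    d0 = dt.datetime.strptime(start, "%Y/%m/%d")
    d1 = dt.datetime.strptime(end, "%Y/%m/%d")
    return [(d0 + dt.timedelta(days=i)).strftime("%Y/%m/%d")
            for i in range((d1 - d0).days + 1)]

train_dates = date_range("2015/01/01", "2015/12/31")  # one year of training data
test_dates = date_range("2016/01/01", "2016/07/31")   # seven months of test data
```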
The learning process and test accuracy are as follows.
The training accuracy looks like it would keep rising, but the test accuracy is not keeping up.
With the AUC at ~~0.78~~ **0.70**, the other metrics are as follows.
| | Precision | Recall | F-Score |
|---|---|---|---|
| rain | 0.54 | 0.67 | 0.60 |
| fine | 0.83 | 0.74 | 0.78 |
| average | 0.74 | 0.72 | 0.72 |
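For reference, metrics like these (including an AUC computed from the predicted probability of the rain class) could be obtained with scikit-learn; this is only a sketch using dummy arrays, not the code behind the numbers above:

```python
import numpy as np
from sklearn.metrics import classification_report, roc_auc_score

# Dummy placeholders: in practice y_test holds the true test labels
# (1 = rain, 0 = fine) and p_rain the CNN's predicted probability of rain.
y_test = np.array([1, 0, 0, 1, 0, 1])
p_rain = np.array([0.8, 0.3, 0.2, 0.6, 0.4, 0.7])

# Per-class precision / recall / F-score at a 0.5 threshold
print(classification_report(y_test, (p_rain >= 0.5).astype(int),
                            target_names=["fine", "rain"]))
# AUC computed against the rain probability (one score per sample)
print(roc_auc_score(y_test, p_rain))
```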
Not bad for a first attempt, I think. However, of the days predicted to be rain, only 54% actually turned out rainy.
Let's sort the test samples in descending order of predicted rain probability and plot the actual rain ratio and the actual fine ratio.
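One way to draw such a plot, as a sketch (reusing the `y_test` / `p_rain` placeholders from the previous snippet; the original plot may have been binned differently):

```python
import numpy as np
import matplotlib.pyplot as plt

# Sort samples by predicted rain probability (descending) and plot, for the
# top-k samples, the cumulative ratio of days that were actually rain / fine.
order = np.argsort(-p_rain)
y_sorted = y_test[order]
k = np.arange(1, len(y_sorted) + 1)
plt.plot(np.cumsum(y_sorted) / k.astype(float), label="actual rain ratio")
plt.plot(np.cumsum(1 - y_sorted) / k.astype(float), label="actual fine ratio")
plt.xlabel("samples sorted by predicted rain probability (descending)")
plt.ylabel("cumulative ratio")
plt.legend()
plt.show()
```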
The result looks predictive rather than random. For reference, the actual rain ratio in the test data is about 30% and the fine ratio about 60%, so a random ordering would simply hover around those levels.
One reason the accuracy does not improve further seems to be the treatment of "cloudy". There are labels such as "fine, later cloudy" and "cloudy, occasional rain", and days involving cloud actually make up more than 70% of the data (far too cloudy). This time the former kind is treated as fine and the latter as rain, so the model presumably cannot tell which way such borderline days should fall. That remains a challenge. Even so, clearly rainy days seem to be predicted with reasonably good accuracy.
A three-class classification would also be difficult, because "cloudy" would dominate.
Weather forecasters are amazing.
I wonder whether the algorithm is the issue, but on the data side there are things to try, such as taking the difference from the previous day's image or removing the pixels of the base map. I plan to work on these little by little.