The Japan Meteorological Agency will provide historical weather data free of charge until the end of March 2020. (Reference) Usage environment of past weather data https://www.data.jma.go.jp/developer/past_data/index.html
The basic weather data can be used by "anyone, regardless of the purpose and target of use", so let's try doing something with it.
The same data can also be downloaded from the Japan Meteorological Agency's "Past Meteorological Data Download" page, but this service is more convenient because everything can be downloaded at once. The available data is listed at the following URL. https://www.data.jma.go.jp/developer/past_data/data_list_20200114.pdf The deadline is approaching, so download what you need early.
This time, I will try to express the weather during the Tokyo Olympics in words. WordCloud (1.6.0) is used to render the words.
As with "Use of past meteorological data 2 (change in maximum temperature during the Tokyo Olympics)", "Ground weather observation"-"Hourly / daily values" will be used. Check the following for the file format. http://data.wxbc.jp/basic_data/kansoku/surface/format_surface.pdf
Download the file to the surface folder. The file is about 2 GB, so the download takes a while.
import os
import urllib.request
# Download the ground weather observation hourly/daily value file
url = 'http://data.wxbc.jp/basic_data/kansoku/surface/hourly_daily_1872-2019_v191121.tar'
folder = 'surface'
path = 'surface/hourly_daily_1872-2019_v191121.tar'
# Create the folder
os.makedirs(folder, exist_ok=True)
if not os.path.exists(path):
    # Download only if the file is not already present
    urllib.request.urlretrieve(url, path)
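Because the download is large, it can help to see how far it has progressed. The following is a minimal sketch, assuming the reporthook argument of urllib.request.urlretrieve, which is called with the block number, the block size, and the total size. The helper names percent_done and make_progress_hook are made up for this illustration and are not part of the original code.

```python
import urllib.request


def percent_done(block_num, block_size, total_size):
    """Percentage downloaded, capped at 100 because the last block may be short."""
    return min(100.0, block_num * block_size * 100.0 / total_size)


def make_progress_hook(label):
    """Return a reporthook for urlretrieve that prints download progress."""
    def hook(block_num, block_size, total_size):
        if total_size <= 0:
            # The server did not report a size, so only show the bytes so far.
            print(f"{label}: {block_num * block_size} bytes", end="\r")
            return
        print(f"{label}: {percent_done(block_num, block_size, total_size):5.1f}%",
              end="\r")
    return hook


# Usage (not run here, since it would start the roughly 2 GB download):
# urllib.request.urlretrieve(url, path, reporthook=make_progress_hook('surface'))
```

The percentage is capped because the block count multiplied by the block size can slightly exceed the real file size on the final block.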
For details of the file, refer to "Use of past weather data 2 (Transition of maximum temperature during the Tokyo Olympics)".
The weather data stores a general weather overview for "daytime" and for "nighttime". This time, we will use the "daytime" data. An overview consists of up to four weather terms joined by conjunctions. If it is sunny all day long, the overview is simply "fine". When several weather terms are combined, it reads like "cloudy, sometimes fine", "cloudy, later fine", or "cloudy, sometimes fine, accompanied by thunder".
code | conjunction |
---|---|
0 | No data |
1 | Blank |
2 | Temporarily |
3 | Sometimes |
4 | Later |
5 | Later, temporarily |
6 | Later, sometimes |
7 | Accompanied by (accompanied by ○○) |
code | weather | code | weather | code | weather |
---|---|---|---|---|---|
0 | No weather | 10 | Sleet | 20 | Typhoon |
1 | Clear | 11 | Snow | 21 | Thunder |
2 | Fine | 12 | Heavy snow | 22 | Hailstone |
3 | Light cloud | 13 | Snowstorm | 23 | Hail |
4 | Cloudy | 14 | Blowing snow (fubuki) | 24 | Typhoon / thunder |
5 | Fog | 15 | Drifting snow (ji-fubuki) | 25 | Thunder / hailstone |
6 | Drizzle | 16 | Reserved | 26 | Thunder / hail |
7 | Rain | 17 | Reserved | 27 | Thunder / fog |
8 | Heavy rain | 18 | Reserved | 28 | No precipitation |
9 | Rainstorm | 19 | Reserved | 29 | There is a sunny day |
30 | Reserved | | | | |
31 | × | | | | |
"Heavy rain" is used when there is rainfall of 30 mm or more. For details, refer to the following site. http://www.data.jma.go.jp/obd/stats/data/mdrr/man/gaikyo.html
Prepare a list for conversion.
# Labels indexed by code. Multi-word labels are joined with underscores so that
# the regexp \w+ used later with WordCloud keeps each label as a single token.
conjunction = ['No_data', 'Blank', 'Temporarily', 'Sometimes', 'Later', 'Later_temporarily', 'Later_sometimes', '、']
conjunction7 = 'Accompanied'
weather = ['No_weather', 'Clear', 'Fine', 'Light_cloud', 'Cloudy', 'Fog', 'Drizzle', 'Rain', 'Heavy_rain', 'Rainstorm',
           'Sleet', 'Snow', 'Heavy_snow', 'Snowstorm', 'Blowing_snow', 'Drifting_snow', 'Reserved', 'Reserved', 'Reserved', 'Reserved',
           'Typhoon', 'Thunder', 'Hailstone', 'Hail', 'Typhoon_thunder', 'Thunder_hailstone', 'Thunder_hail', 'Thunder_fog', 'No_precipitation', 'There_is_a_sunny_day',
           'Reserved', '×']
Conjunctions normally come before the weather term, but code 7 ("accompanied by ○○") is the exception: the weather term (the ○○ part) sits in the middle, so a separator is placed before the weather term and the "accompanied" part is appended after it.
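The ordering rule can be tried in isolation. The following is a minimal sketch that turns (conjunction code, weather code) pairs into a space-separated string. The shortened lookup tables and the name build_overview are assumptions for illustration, and a blank conjunction is treated as an empty string here, which may differ from the list above.

```python
# Shortened lookup tables for illustration; the labels are assumptions.
CONJUNCTION = {1: '', 2: 'temporarily', 3: 'sometimes', 4: 'later', 7: ','}
WEATHER = {2: 'fine', 4: 'cloudy', 7: 'rain', 21: 'thunder'}
ACCOMPANIED = 'accompanied'


def build_overview(groups):
    """Turn (conjunction_code, weather_code) pairs into a space-separated string."""
    words = []
    for conj, wx in groups:
        if CONJUNCTION[conj]:
            words.append(CONJUNCTION[conj])
        words.append(WEATHER[wx])
        if conj == 7:
            # Code 7 wraps the weather term: separator first, suffix after.
            words.append(ACCOMPANIED)
    return ' '.join(words)


print(build_overview([(1, 4), (3, 2), (7, 21)]))
# prints: cloudy sometimes fine , thunder accompanied
```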
Load the data into a pandas data frame. The weather overview is stored starting at byte offset 1500 of each daily record. The numeric codes are converted to words as they are read, and a space is inserted between words for later display in WordCloud. WordCloud does not segment unspaced text into words on its own, so the text has to be separated into words before it is passed in.
# Create a data frame for data storage
import pandas as pd
tokyo_df = pd.DataFrame()
# Get the weather overview
import tarfile
# Station setting = Tokyo
p_no = '662'
# Iterate over the files contained in the tar file
with tarfile.open(path, 'r') as tf:
    for tarinfo in tf:
        if tarinfo.isfile():
            # Open each nested tar.gz file and iterate over its files
            with tarfile.open(fileobj=tf.extractfile(tarinfo), mode='r') as tf2:
                for tarinfo2 in tf2:
                    if tarinfo2.isfile():
                        # Read only files whose name ends with the station number
                        if tarinfo2.name[-3:] == p_no:
                            print(tarinfo2.name)
                            # Open the file
                            with tf2.extractfile(tarinfo2) as tf3:
                                lines = tf3.readlines()
                                for line in lines:
                                    # Skip lines whose first three bytes are blank (no data)
                                    if not line[0:3].strip():
                                        continue
                                    # Year
                                    year = line[14:18].decode()
                                    # Month and day (MMDD)
                                    date = line[18:22].decode().replace(' ', '0')
                                    # Build the weather overview from four 5-byte groups
                                    conditions = ''
                                    p = 1500
                                    for i in range(4):
                                        # Conjunction code and its remark flag
                                        c = int(line[p:p+1])
                                        c_rmk = int(line[p+1:p+2])
                                        if c_rmk == 8:
                                            conditions += conjunction[c] + ' '
                                        # Weather code and its remark flag
                                        w = int(line[p+2:p+4])
                                        w_rmk = int(line[p+4:p+5])
                                        if w_rmk == 8:
                                            conditions += weather[w] + ' '
                                            if c == 7:  # For "accompanied by ○○", append the suffix after the weather
                                                conditions += conjunction7 + ' '
                                        p += 5
                                    # Store the data
                                    tokyo_df.loc[year, date] = conditions
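The fixed-width slicing in the loop above can be checked on a synthetic record before running it against the 2 GB archive. This is a minimal sketch: the function name parse_overview_fields and the sample bytes are made up for illustration, and it follows the loop above in treating a remark flag of 8 as marking a usable value.

```python
def parse_overview_fields(line, start=1500, groups=4):
    """Slice `groups` 5-byte fields starting at byte offset `start`.

    Each field is laid out as: conjunction (1 byte), its remark flag (1 byte),
    weather code (2 bytes), its remark flag (1 byte).
    """
    fields = []
    for i in range(groups):
        p = start + 5 * i
        fields.append({
            'conjunction': int(line[p:p + 1]),
            'conjunction_rmk': int(line[p + 1:p + 2]),
            'weather': int(line[p + 2:p + 4]),
            'weather_rmk': int(line[p + 4:p + 5]),
        })
    return fields


# Synthetic record: 1500 filler bytes followed by four 5-byte groups.
sample = b' ' * 1500 + b'18048' + b'38028' + b'00000' + b'00000'
fields = parse_overview_fields(sample)
valid = [(f['conjunction'], f['weather']) for f in fields
         if f['weather_rmk'] == 8]
print(valid)
# prints: [(1, 4), (3, 2)]
```

With the tables in this article, the two valid groups correspond to a blank conjunction with "cloudy" followed by "sometimes" with "fine".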
Check the data.
#Data confirmation
tokyo_df
The older records have no weather overview. The current format appears to start in 1989, so I extract the data from 1989 onward, limited to the dates of the Olympic period (July 24 to August 9).
tokyo_olympic_df = tokyo_df.loc['1989':'2019','0724':'0809']
tokyo_olympic_df
The weather overview has been read correctly.
Let's display the weather using WordCloud. The size of each word reflects how often it appears in the weather overviews. This does not reflect the weather precisely, because "fine" is counted once whether the day was "fine" all day or "fine, temporarily cloudy", but it should give a rough picture. The text also contains the conjunctions; I leave them in because they indicate changes in the weather.
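To see why an all-day "fine" and a partly cloudy "fine" weigh the same, the counting can be reproduced with collections.Counter. This is only a sketch of the frequency idea on hypothetical overview strings. It is not WordCloud's own tokenizer, which applies further processing.

```python
from collections import Counter

# Hypothetical overview strings in the space-separated form built earlier.
overviews = [
    'fine',
    'fine temporarily cloudy',
    'cloudy sometimes rain',
]

# Every occurrence of a word adds one, regardless of how long that weather lasted.
counts = Counter(word for text in overviews for word in text.split())
for word, n in counts.most_common():
    print(word, n)
```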
First, let's look at the weather during the Olympic period year by year. For each year, the weather overview strings are concatenated and passed to WordCloud. There are two points to note:
- Specify a font (font_path) that can display Japanese. In my environment I specified MS Gothic on Windows. Change it to a font suited to your environment.
- By default, single-character words are not displayed. Pass a regular expression via regexp so that they are displayed.
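The effect of the regexp setting can be checked with the standard re module alone. The sketch below assumes that WordCloud's default pattern requires at least two characters (written here as \w[\w']+, which is what I believe version 1.6.0 uses, so check it against your installed version) and compares it with the \w+ pattern used in this article.

```python
import re

text = '晴 一時 曇'

# Pattern assumed to be WordCloud's default: two or more characters.
default_like = re.findall(r"\w[\w']+", text)
# Pattern used in this article: one or more characters.
single_ok = re.findall(r"\w+", text)

print(default_like)
print(single_ok)
```

Under that assumption, single-character weather words such as 晴 and 曇 are only kept by the second pattern.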
import matplotlib.pyplot as plt
from wordcloud import WordCloud
# Display by year
i = 1
plt.figure(figsize=(16, 33))
for row in tokyo_olympic_df.index:
    # Concatenate the overviews of every day in this year
    text = ''
    for column in tokyo_olympic_df.columns:
        text += tokyo_olympic_df.loc[row, column]
    # Create the word image with WordCloud (raw string avoids an invalid-escape warning)
    wordcloud = WordCloud(colormap='jet', font_path="msgothic.ttc", regexp=r"\w+").generate(text)
    plt.subplot(11, 3, i)
    plt.imshow(wordcloud)
    plt.title(row)
    plt.axis("off")
    i += 1
plt.show()
Here is the result. It would be nice if the font color varied with the weather, for example fine versus rain, but the colors appear to be chosen at random. There is some variation from year to year, but the weather is mostly fine or cloudy.
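One possible way to tie color to weather is WordCloud's color_func parameter, which takes a callable that returns a color for each word and takes precedence over colormap. The following is a sketch under that assumption. The color table and the function name weather_color are hypothetical, and the keys must match the labels you actually put in the text.

```python
# Hypothetical word-to-color table; extend it with the labels you actually use.
WEATHER_COLORS = {
    'fine': 'orange',
    'cloudy': 'gray',
    'rain': 'blue',
}


def weather_color(word, font_size=None, position=None, orientation=None,
                  random_state=None, **kwargs):
    """Return a fixed color per weather word, black for anything else."""
    return WEATHER_COLORS.get(word, 'black')


# Usage (requires the wordcloud package, so it is not run here):
# WordCloud(color_func=weather_color, font_path="msgothic.ttc", regexp=r"\w+")
```

The extra keyword arguments are accepted and ignored so that the function tolerates whatever WordCloud passes along with the word.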
By date.
# Display by date
i = 1
plt.figure(figsize=(16, 18))
for column in tokyo_olympic_df.columns:
    # Concatenate the overviews of this date across all years
    text = ''
    for row in tokyo_olympic_df.index:
        text += tokyo_olympic_df.loc[row, column]
    # Create the word image with WordCloud
    wordcloud = WordCloud(colormap='jet', font_path="msgothic.ttc", regexp=r"\w+").generate(text)
    plt.subplot(6, 3, i)
    plt.imshow(wordcloud)
    plt.title(column)
    plt.axis("off")
    i += 1
plt.show()
It's midsummer, so it's sunny.
While we're at it, let's create a weather calendar covering a full year. The 2019 data covers only part of the year, so I use the data through 2018. February 29 of leap years is dropped.
tokyo_365_df = tokyo_df.loc['1989':'2018']
tokyo_365_df = tokyo_365_df.drop('0229', axis=1)
tokyo_365_df
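After dropping February 29, the calendar should have 365 date columns in MMDD form. As a cross-check that does not depend on the data frame, the expected labels can be generated with the standard datetime module. The helper name mmdd_labels is made up for this sketch.

```python
from datetime import date, timedelta


def mmdd_labels(year):
    """Return 'MMDD' labels for every day of `year`, skipping February 29."""
    labels = []
    day = date(year, 1, 1)
    while day.year == year:
        label = day.strftime('%m%d')
        if label != '0229':
            labels.append(label)
        day += timedelta(days=1)
    return labels


labels = mmdd_labels(2016)  # 2016 is a leap year
print(len(labels), labels[0], labels[-1])
# prints: 365 0101 1231
```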
# Display by date
i = 1
plt.figure(figsize=(16, 40))
for column in tokyo_365_df.columns:
    # Concatenate the overviews of this date across all years
    text = ''
    for row in tokyo_365_df.index:
        text += tokyo_365_df.loc[row, column]
    # Create the word image with WordCloud
    wordcloud = WordCloud(colormap='jet', font_path="msgothic.ttc", regexp=r"\w+").generate(text)
    plt.subplot(37, 10, i)
    plt.imshow(wordcloud)
    plt.title(column)
    plt.axis("off")
    i += 1
plt.show()
It is a calendar for one year in Tokyo.
During the rainy season, rain stands out. October 10, the date of the opening ceremony of the previous Tokyo Olympics (1964), is said to be a singular day for fine weather, that is, a date on which fine weather occurs unusually often, and it does seem to have less rain than the surrounding dates.
Other observation stations can be handled by changing the station number in p_no = '662'. To create a calendar for your own area, change the station number. You can look up station numbers in the station information history file (smaster_201909.tar.gz).
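The station filter can be tried without the real archive. The sketch below builds a tiny nested archive in memory (an outer tar holding one tar.gz) and reads only the members whose names end with the given station number, mirroring the name[-3:] check in the loader. The member names and contents are hypothetical, since the real file names are not shown in this article.

```python
import io
import tarfile


def add_bytes(tar, name, data):
    """Add an in-memory file to an open tar archive."""
    info = tarfile.TarInfo(name)
    info.size = len(data)
    tar.addfile(info, io.BytesIO(data))


def build_nested_archive():
    """Build an outer tar holding one inner tar.gz with two station files."""
    inner_buf = io.BytesIO()
    with tarfile.open(fileobj=inner_buf, mode='w:gz') as inner:
        add_bytes(inner, 'demo/sfc_662', b'tokyo record\n')    # hypothetical name
        add_bytes(inner, 'demo/sfc_412', b'sapporo record\n')  # hypothetical name
    outer_buf = io.BytesIO()
    with tarfile.open(fileobj=outer_buf, mode='w') as outer:
        add_bytes(outer, 'demo.tar.gz', inner_buf.getvalue())
    outer_buf.seek(0)
    return outer_buf


def read_station(archive, p_no):
    """Return the contents of inner files whose names end with `p_no`."""
    found = []
    with tarfile.open(fileobj=archive, mode='r') as tf:
        for member in tf:
            if not member.isfile():
                continue
            with tarfile.open(fileobj=tf.extractfile(member), mode='r') as tf2:
                for member2 in tf2:
                    if member2.isfile() and member2.name.endswith(p_no):
                        found.append(tf2.extractfile(member2).read())
    return found


print(read_station(build_nested_archive(), '412'))
# prints: [b'sapporo record\n']
```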
This is an example of Sapporo (412). There is a lot of snow in winter.
This time, I tried expressing the weather through the size of words. Numbers or graphs are more accurate, but word size is intuitive and fun.
The data is available only until the end of March 2020, so if you need it, I recommend downloading it as soon as possible.