Snow at Christmas is wonderful, but living in Tokyo I almost never see any. What about Hokkaido, or Tohoku? So let's work out the **White Christmas rate by prefecture** from the past 50 years of weather data for each prefecture, and then plot the result on a map of Japan.
- How to approach a data analysis and think it through
- DataFrame operations for data cleansing
- Drawing a map of Japan with japanmap
- etc.
In this analysis, a White Christmas is defined as **a year in which it snowed, even briefly, on the night of 12/24 or 12/25**.
Although each prefecture has several weather observation points, as a rule **the data for the location of the prefectural office is used**. The exception is when there is no observation point at the prefectural capital. (On checking, this applied to Saitama and Shiga prefectures, so the data for Kumagaya and Hikone were used instead.)
The past weather data comes from the Japan Meteorological Agency website below. https://www.data.jma.go.jp/gmd/risk/obsdl/index.php
Select the 47 points described above, then download the CSV with the item "Weather overview (night: 18:00 to 06:00 the next day)" and the period "Display daily values from December 24 to December 25 from 1970 to 2019".
Then remove the extra rows at the top in Excel and arrange the data in the following format.
The file name is **Xmas.csv**.
Now let's process this data and calculate the White Christmas rate. First, read the CSV.
import pandas as pd
from datetime import datetime
df_xmas_master = pd.read_csv("Xmas.csv", encoding="shift_jis", index_col="Unnamed: 0", parse_dates=True)
First, drop the extra columns filled with "8" and "1" that came with the CSV. Selecting every third column, as shown below, takes care of this.
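# Keep every third column; .copy() lets us modify df_xmas without a chained-assignment warning
df_xmas = df_xmas_master.iloc[:,::3].copy()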
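To see what the step of 3 does, here is a minimal sketch on a made-up frame. The column names and values are assumptions for illustration only and are not the actual layout of the downloaded CSV.

```python
import pandas as pd

# Hypothetical layout: each station has a weather column followed by
# two extra columns (filled here with 8 and 1, as described in the text).
toy = pd.DataFrame(
    [["snow", 8, 1, "fine", 8, 1],
     ["cloudy", 8, 1, "rain then snow", 8, 1]],
    columns=["Sapporo", "Sapporo.1", "Sapporo.2", "Tokyo", "Tokyo.1", "Tokyo.2"],
)

# Start at column 0 and take every third column: only the weather columns remain.
weather_only = toy.iloc[:, ::3]
print(list(weather_only.columns))
```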
Then replace the cells with "snow" in the weather with True and the cells without it with False.
for i in df_xmas.columns:
    df_xmas[i] = df_xmas[i].str.contains("snow")
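As an aside, the same replacement can be tried on a tiny made-up frame. The station names and weather strings below are assumptions for illustration, not values from the real data.

```python
import pandas as pd

# Hypothetical weather overviews for two stations on two nights.
toy = pd.DataFrame(
    {"Sapporo": ["snow", "cloudy then snow"], "Tokyo": ["fine", "rain"]},
    index=pd.to_datetime(["2019-12-24", "2019-12-25"]),
)

# Replace each text cell with True if it mentions "snow", otherwise False.
flags = toy.copy()
for col in flags.columns:
    flags[col] = flags[col].str.contains("snow")

print(flags)
```

Because str.contains does a substring match, a mixed description such as "cloudy then snow" also counts as snow, which matches the "even briefly" definition used here.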
Then, the data frame is as follows.
Once you have got this far, all that is left is to aggregate.
df_white_rate = df_xmas.resample("Y").max().mean()
df_white_rate = df_white_rate.to_frame().reset_index()
df_white_rate.columns = ["Prefectural office location","White Xmas rate"]
Treating True as 1 and False as 0, aggregating by year and taking the maximum gives **1 for a year in which the weather on 12/24 or 12/25 contains "snow" at least once, and 0 for a year in which neither does**.
Calling mean() on that then gives the fraction of years with snow for each station. (Where the weather could not be obtained, i.e. the value is NaN, it is left out of the calculation.)
As for the last two lines of the code: mean() returns a Series, so to make it easier to handle later I simply converted it to a data frame and gave the columns new names.
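The max-then-mean idea can be checked by hand on a small example. This is only a sketch with made-up flags. It groups by calendar year with groupby instead of resample, a swapped-in aggregation that should give the same rates for this kind of data, so that it does not depend on which resampling alias a particular pandas version accepts.

```python
import pandas as pd

# Hypothetical flags for two stations over two Christmases (True = snow seen).
flags = pd.DataFrame(
    {"Sapporo": [True, False, False, True], "Tokyo": [False, False, False, True]},
    index=pd.to_datetime(["2018-12-24", "2018-12-25", "2019-12-24", "2019-12-25"]),
)

# max() per year: True if either night of that year had snow.
per_year = flags.groupby(flags.index.year).max()

# mean() over years: fraction of years with a White Christmas.
rate = per_year.mean()
print(rate)
```

In this toy data Sapporo has snow in both years and Tokyo in one of the two, so the rates come out as 1.0 and 0.5.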
By this process, df_white_rate becomes as follows.
That completes the calculation of the White Christmas rate. Now let's put this result onto a map of Japan.
To draw the map we use a convenient library called **japanmap**. It is an excellent tool that **returns a colored map of Japan if you simply pass it a Series of "prefecture names" and "colors"**.
But here is the problem. The data frame created earlier is keyed not by "prefecture name" but by "prefectural office location name". **The prefectural office location names must be linked to the prefecture names.**
So, I will borrow the prefecture name-prefectural office location name table from the following site.
https://sites.google.com/site/auroralrays/hayamihyou/kenchoushozaichi
The code is below.
df_center_master = pd.read_html("https://sites.google.com/site/auroralrays/hayamihyou/kenchoushozaichi")
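# Take the third table on the page, skipping its first two rows and first column
df_center = df_center_master[2].iloc[2:,1:].copy()
# Strip the trailing "city" character from each prefectural office location
df_center[2] = df_center[2].str[:-1]
df_center = df_center.reset_index().iloc[:,1:]
df_center.columns = ["Prefectures","Prefectural office location"]
# Use .loc so the assignment writes to df_center itself, not to a temporary row
df_center.loc[10, "Prefectural office location"] = "Kumagaya"
df_center.loc[12, "Prefectural office location"] = "Tokyo"
df_center.loc[24, "Prefectural office location"] = "Hikone"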
The site above appends "city" to each prefectural office location name, so I removed it to match the format of the White Xmas data frame created earlier.
The site also lists Tokyo's prefectural capital as "Shinjuku", so I changed it to "Tokyo". And since this time the weather data for Saitama and Shiga prefectures comes from "Kumagaya" and "Hikone" rather than the prefectural office locations, I changed those as well.
The data frame **df_center** created in this way is as follows.
It seems good to combine this data frame with the White Xmas data frame created earlier.
df_all = pd.merge(df_center,df_white_rate,on="Prefectural office location")
df_all = df_all[["Prefectures","White Xmas rate"]]
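Because pd.merge defaults to an inner join, a location name that fails to match would silently drop that prefecture. The following is a minimal sketch of one way to check for this, using made-up rows and rates (all values are assumptions for illustration) in which Saitama has deliberately not been renamed to Kumagaya.

```python
import pandas as pd

# Hypothetical lookup table where Saitama has not yet been changed to Kumagaya.
centers = pd.DataFrame({
    "Prefectures": ["Hokkaido", "Saitama", "Tokyo"],
    "Prefectural office location": ["Sapporo", "Saitama", "Tokyo"],
})
# Hypothetical rates keyed by observation point (made-up numbers).
rates = pd.DataFrame({
    "Prefectural office location": ["Sapporo", "Kumagaya", "Tokyo"],
    "White Xmas rate": [0.9, 0.2, 0.1],
})

# A left join with indicator=True marks rows that found no partner.
checked = pd.merge(centers, rates, on="Prefectural office location",
                   how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "Prefectures"].tolist()
print(unmatched)
```

An empty list here would mean every prefecture found its rate, and checking that the merged frame has 47 rows serves the same purpose.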
Let's check df_all...
It looks like the **prefecture name to White Xmas rate** data frame has been completed successfully.
Next, let's convert the numerical values into colors.
Any shade is fine, but let's make the color **closer to white the higher the White Christmas rate is, and closer to green the lower it is**.
In terms of color codes, this means **the color approaches #ffffff when the White Christmas rate is high, and #00ff00 when it is low**.
The first two digits of a color code are the red intensity (256 levels) written in hexadecimal, the middle two digits are the green intensity, and the last two digits are the blue intensity. Since 0 to 255 in decimal becomes 00 to ff in hexadecimal, each intensity always fits in 2 digits.
So let's create **a function that converts a number into a shade of green**.
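The structure of a color code can be sketched with a small helper that splits it back into its three intensities. The function name split_color_code is hypothetical and is not part of the article's code.

```python
def split_color_code(code):
    """Split '#rrggbb' into (red, green, blue) integers in the range 0-255."""
    hex_digits = code.lstrip("#")
    # Each pair of hexadecimal digits is one channel.
    return tuple(int(hex_digits[i:i + 2], 16) for i in (0, 2, 4))

print(split_color_code("#ffffff"))  # white: all three channels at maximum
print(split_color_code("#00ff00"))  # green: only the green channel at maximum
```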
def num2color(x):
    color_code = "#" + format(round(x*255),"#04x")[2:] + "ff" + format(round(x*255),"#04x")[2:]
    return color_code
At this point the rates are in the range 0 to 1, so the function multiplies by 255 to map them to 0 to 255, rounds to an integer, and converts the result to a 2-digit hexadecimal number. Leaving the green intensity at its maximum and changing only the red and blue intensities moves the color between green and white.
Now let's use this function to create a **prefecture name to color code** list.
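To make the mapping concrete, here is the same function applied to three sample rates. The function is repeated so the snippet runs on its own, and the sample rates are arbitrary values chosen for illustration.

```python
def num2color(x):
    # Red and blue share one 2-digit hex level; green stays at "ff".
    color_code = "#" + format(round(x*255), "#04x")[2:] + "ff" + format(round(x*255), "#04x")[2:]
    return color_code

for sample_rate in (0.0, 0.5, 1.0):
    print(sample_rate, num2color(sample_rate))
```

A rate of 0 gives pure green (#00ff00), a rate of 1 gives white (#ffffff), and 0.5 gives #80ff80. Python's round() rounds exact halves to the nearest even integer, which is why 127.5 becomes 128 here.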
df_all = df_all.set_index("Prefectures")
df_all = df_all["White Xmas rate"].apply(num2color)
So what does df_all look like now?
It looks like the prefecture name to color code list has been completed.
Finally, let's put it on the map of Japan. This only requires importing matplotlib and japanmap and passing df_all to japanmap's picture function.
import matplotlib.pyplot as plt
from japanmap import picture
# plt.subplots returns a (figure, axes) pair, so unpack both
fig, ax = plt.subplots(figsize=(10,10))
ax.imshow(picture(df_all))
Well, the result is ...
This completes the map of White Christmas rates by prefecture. The color scheme may not be great, but that is down to my sense of design, so please go easy on me...
So far, we have done the mapping of the white Christmas rate with Python.
A recording of this analysis is also available on YouTube at the link below. (The source code in this article is a reworked version of what was written in that video.) If you like, I hope you can also get a **realistic feel for how data analysis is done** there.
https://youtu.be/nu_RqJAMYTY
We hope that this article will serve as a reference for data analysis policies and visualization methods.
Finally, since the White Christmas rate was hard to read from the figure, I will end by pasting the White Christmas rate for the past 50 years by prefecture as percentages. (So much for visualization...)