Continuing from yesterday, I looked into what else could be done with the data released by Shimane Prefecture. It turns out that river water level data is published for a wide area, so I tried visualizing it.
[Shimane Prefecture] Daily latest river water level data (for 40 days)
First, there is the catalog page.
https://shimane-opendata.jp/db/organization/main
There is a "River water level data" page in the catalog page.
https://shimane-opendata.jp/db/dataset/010010
The river water level data, recorded every 10 minutes, is published as one CSV file per day. For example, to download the data for June 30, you access the following URL.
https://shimane-opendata.jp/db/dataset/010010/resource/88f86c3b-b609-45a2-b3b9-1949c459aeae
July 1st ...
https://shimane-opendata.jp/db/dataset/010010/resource/2db49bb8-1e87-4f7d-9bc3-3e3c5d188044
Huh? The URL is completely different from one day to the next.
Furthermore, the CSV URL is ...
https://shimane-opendata.jp/storage/download/bf1d010d-940d-4f9e-82b0-2a3609300320/suii_10min_20200701.csv
Yes, it's hard to use!
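Only the file name at the end of the CSV URL follows a predictable pattern; the storage ID in the middle appears to change per file, which is why scraping is needed. A small sketch of the file-name pattern (the helper function is my own, not from the site):

```python
from datetime import date

# The daily CSV file name appears to follow this pattern; the storage UUID
# preceding it varies per file, so the full URL still has to be scraped
# from each daily resource page.
def csv_name(d: date) -> str:
    return f"suii_10min_{d:%Y%m%d}.csv"

print(csv_name(date(2020, 7, 1)))  # → suii_10min_20200701.csv
```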
So let's work through the visualization with the following procedure. As before, I'm using Colaboratory.

First, get the URLs of the daily pages with the following script.
```python
import requests
from bs4 import BeautifulSoup

urlBase = "https://shimane-opendata.jp"
urlName = urlBase + "/db/dataset/010010"

def get_tag_from_html(urlName, tag):
    # Fetch the page and return all elements with the given tag
    url = requests.get(urlName)
    soup = BeautifulSoup(url.content, "html.parser")
    return soup.find_all(tag)

def get_page_urls_from_catalogpage(urlName):
    # Collect the daily resource-page URLs linked from the catalog page
    urlNames = []
    elems = get_tag_from_html(urlName, "a")
    for elem in elems:
        try:
            string = elem.get("class")[0]
            if string == "heading":
                href = elem.get("href")
                if href.find("resource") > 0:
                    urlNames.append(urlBase + href)
        except (TypeError, IndexError):
            # Skip <a> tags without a class attribute
            pass
    return urlNames

urlNames = get_page_urls_from_catalogpage(urlName)
print(urlNames)
```
Get the CSV URL with the following script.
```python
def get_csv_urls_from_url(urlName):
    # Return the first CSV link found on a daily resource page
    urlNames = []
    elems = get_tag_from_html(urlName, "a")
    for elem in elems:
        try:
            href = elem.get("href")
            if href.find(".csv") > 0:
                urlNames.append(href)
        except AttributeError:
            # Skip <a> tags without an href attribute
            pass
    return urlNames[0]

urls = []
for urlName in urlNames:
    urls.append(get_csv_urls_from_url(urlName))

print(urls)
```
Read the data directly from the URLs obtained above. Note that the character code is Shift JIS, and the first few lines contain information other than the data, so they are excluded.
```python
import pandas as pd

df = pd.DataFrame()

for url in urls:
    # Shift JIS encoding; the leading metadata rows are dropped
    df = pd.concat([df, pd.read_csv(url, encoding="Shift_JIS").iloc[6:]])

df.shape
```
```python
df.info()
```
You can get the column information by executing the above.
```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2016 entries, 6 to 149
Data columns (total 97 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Observatory 2016 non-null object
1 Iwasakibashi 2016 non-null object
2 Otani 2016 non-null object
3 Tamayu River 2016 non-null object
4 Kashima 2016 non-null object
5 Mabashi River 2016 non-null object
6 Hizugawa Sluice Upstream 2016 non-null object
7 Downstream of Hitsugawa Sluice 2016 non-null object
8 Kitada River Sluice Upstream 2016 non-null object
9 Downstream of Kitada River Sluice 2016 non-null object
10 Kyobashi River 2016 non-null object
11 Kyobashi River Sluice Upstream 2016 non-null object
12 Kyobashi River Sluice Downstream 2016 non-null object
13 Upper Tekai Sluice 2016 non-null object
14 Downstream of Tekai Sluice 2016 non-null object
15 Kano Bridge 2016 non-null object
16 Izumokyo 2016 non-null object
17 Nunobe 2016 non-null object
18 Owatari 2016 non-null object
19 Yada 2016 non-null object
20 Iinashi Bridge 2016 non-null object
21 Shimoyama Sa 2016 non-null object
22 Grandfather Tanigawa 2016 non-null object
23 Hirotsuruhashi 2016 non-null object
24 Yasugi Ohashi 2016 non-null object
25 Yoshida Bridge 2016 non-null object
26 Hinodebashi 2016 non-null object
27 Kakeya Ohashi 2016 non-null object
28 Sakayama Bridge 2016 non-null object
29 Kandabashi 1 2016 non-null object
30 Hachiguchi Bridge 2016 non-null object
31 Yagami 2016 non-null object
32 Yokota Shinohashi 2016 non-null object
33 Mitsunari Ohashi 2016 non-null object
34 Shimbashi 2016 non-null object
35 Takasegawa 2016 non-null object
36 Goemon Bridge 2016 non-null object
37 Ron Tagawa 2016 non-null object
38 Yutanigawa 2016 non-null object
39 Nishihirata 2016 non-null object
40 Ichimonbashi 2016 non-null object
41 Nie 2016 non-null object
42 Sada 2016 non-null object
43 Kimura Bridge 2016 non-null object
44 Shinnai Fujikawa 2016 non-null object
45 Akakawa 2016 non-null object
46 Flowing Bridge 2016 non-null object
47 Touma River 2016 non-null object
48 Lake Jinzai 2016 non-null object
49 Imbara 2016 non-null object
50 Shimokuchiba 2016 non-null object
51 Kawai Bridge 2016 non-null object
52 Yokaichibashi 2016 non-null object
53 Masahara Bridge 2016 non-null object
54 exit 2016 non-null object
55 sunrise 2016 non-null object
56 Kandabashi 2 2016 non-null object
57 Nagahisa 2016 non-null object
58 Kute 2016 non-null object
59 Togeshika 2016 non-null object
60 Takuno 2016 non-null object
61 Zenkoji Bridge 2016 non-null object
62 Furuichibashi 2016 non-null object
63 Eo 2016 non-null object
64 Tochiya 2016 non-null object
65 Chikahara 2016 non-null object
66 Hinui 2016 non-null object
67 Katsuji 2016 non-null object
68 Toji 2016 non-null object
69 Fuchubashi 2016 non-null object
70 Shimoraihara 2016 non-null object
71 Isago 2016 non-null object
72 Sangubashi 2016 non-null object
73 Nakashibabashi 2016 non-null object
74 Hamada Ohashi 2016 non-null object
75 Hamada 2016 non-null object
76 Midfield 2016 non-null object
77 Misumi 2016 non-null object
78 Nishikawachi 2016 non-null object
79 Keikawabashi 2016 non-null object
80 Daido Bridge 2016 non-null object
81 Showa Bridge 2016 non-null object
82 Somewa 2016 non-null object
83 Asakura 2016 non-null object
84 Kiami River 2016 non-null object
85 Too Bridge 2016 non-null object
86 Aioi Bridge 2016 non-null object
87 Asahibashi 2016 non-null object
88 Machida 2016 non-null object
89 Nakajo 2016 non-null object
90 Yao River 2016 non-null object
91 Hatabashi 2016 non-null object
92 Shintsutsumi Bridge 2016 non-null object
93 Kiyomi Bridge 2016 non-null object
94 Goka Ohashi 2016 non-null object
95 Tsuma River 2016 non-null object
96 Mita 2016 non-null object
dtypes: object(97)
memory usage: 1.5+ MB
```
Since every column's Dtype is object, the numeric data seems to be stored as strings...

Also, looking inside, the data contains the strings "Not collected", "Missing", and "Maintenance". After replacing those strings, the values are converted to floats. Since the date and time column is also a string, it has to be converted to datetime values as well.
So, execute the following script.
```python
# Use the date/time column as a DatetimeIndex
df.index = pd.to_datetime(df["Observatory"])

# Replace non-numeric status strings with a sentinel value
df = df.replace("Not collected", "-1")
df = df.replace("Missing", "-1")
df = df.replace("Maintenance", "-1")

cols = df.columns[1:]

for col in cols:
    df[col] = df[col].astype("float")
```
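As a variation (my own suggestion, not what the script above does), `pd.to_numeric` with `errors="coerce"` would turn the status strings into NaN instead of -1, so they are simply left out of the plots rather than appearing as dips to -1:

```python
import pandas as pd

# Hypothetical column mimicking the real data: readings mixed with status strings
s = pd.Series(["0.52", "Missing", "0.61", "Maintenance"])
vals = pd.to_numeric(s, errors="coerce")  # non-numeric strings become NaN
print(vals.tolist())  # → [0.52, nan, 0.61, nan]
```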
Try drawing the graphs after setting up the environment so that Japanese labels display correctly.
```python
!pip install japanize_matplotlib
```

```python
import matplotlib.pyplot as plt
import japanize_matplotlib
import seaborn as sns

sns.set(font="IPAexGothic")

# Plot the first five observatories over the full period
df[cols[:5]].plot(figsize=(15, 5))
plt.show()

# Zoom in on July 12-13
df["2020-07-12":"2020-07-13"][cols[:5]].plot(figsize=(15, 5))
plt.show()
```
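Since the index is now a DatetimeIndex, the 10-minute readings could also be aggregated before plotting, e.g. into hourly means with `resample`. A minimal sketch with made-up data (`df_demo` and its values are hypothetical stand-ins for the real frame):

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute series standing in for one observatory's column
idx = pd.date_range("2020-07-12", periods=144, freq="10min")
df_demo = pd.DataFrame({"Iwasakibashi": np.linspace(0.5, 1.2, 144)}, index=idx)

# Average the six 10-minute readings in each hour
hourly = df_demo["Iwasakibashi"].resample("1h").mean()
print(len(hourly))  # → 24
```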
It has been raining since the other day, so you can see at a glance how the water level is rising.
Well, what are we going to do now?