Continuing from yesterday, I looked into what else could be done with the data released by Shimane Prefecture. It turns out that river water level data is published for a wide area, so I tried visualizing it.
[Shimane Prefecture] Daily latest river water level data (for 40 days)
First, there is the catalog page.
https://shimane-opendata.jp/db/organization/main
There is a "River water level data" page in the catalog page.
https://shimane-opendata.jp/db/dataset/010010
The river water level data, recorded every 10 minutes, is published as one CSV file per day. For example, to download the data for June 30, you access the following URL.
https://shimane-opendata.jp/db/dataset/010010/resource/88f86c3b-b609-45a2-b3b9-1949c459aeae
July 1st ...
https://shimane-opendata.jp/db/dataset/010010/resource/2db49bb8-1e87-4f7d-9bc3-3e3c5d188044
Huh? The URL is completely different from one day to the next.
Furthermore, the CSV URL is ...
https://shimane-opendata.jp/storage/download/bf1d010d-940d-4f9e-82b0-2a3609300320/suii_10min_20200701.csv
Yes, it's hard to use!
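Only the file name at the end of the CSV URL follows a predictable pattern; the storage ID in the middle appears to change per file, which is why scraping is needed. A small sketch of the file-name pattern (the helper function is my own, not from the site):

```python
from datetime import date

# The daily CSV file name appears to follow this pattern; the storage UUID
# preceding it varies per file, so the full URL still has to be scraped
# from each daily resource page.
def csv_name(d: date) -> str:
    return f"suii_10min_{d:%Y%m%d}.csv"

print(csv_name(date(2020, 7, 1)))  # → suii_10min_20200701.csv
```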
So let's work through the visualization with the following procedure. As before, I'm using Colaboratory.

First, get the URLs of the daily pages with the following script.
```python
import requests
from bs4 import BeautifulSoup

urlBase = "https://shimane-opendata.jp"
urlName = urlBase + "/db/dataset/010010"

def get_tag_from_html(urlName, tag):
    # Fetch the page and return all elements with the given tag
    url = requests.get(urlName)
    soup = BeautifulSoup(url.content, "html.parser")
    return soup.find_all(tag)

def get_page_urls_from_catalogpage(urlName):
    # Collect the daily resource-page URLs linked from the catalog page
    urlNames = []
    elems = get_tag_from_html(urlName, "a")
    for elem in elems:
        try:
            string = elem.get("class")[0]
            if string == "heading":
                href = elem.get("href")
                if href.find("resource") > 0:
                    urlNames.append(urlBase + href)
        except (TypeError, IndexError):
            # Skip <a> tags without a class attribute
            pass
    return urlNames

urlNames = get_page_urls_from_catalogpage(urlName)
print(urlNames)
```
Get the CSV URL with the following script.
```python
def get_csv_urls_from_url(urlName):
    # Return the first CSV link found on a daily resource page
    urlNames = []
    elems = get_tag_from_html(urlName, "a")
    for elem in elems:
        try:
            href = elem.get("href")
            if href.find(".csv") > 0:
                urlNames.append(href)
        except AttributeError:
            # Skip <a> tags without an href attribute
            pass
    return urlNames[0]

urls = []
for urlName in urlNames:
    urls.append(get_csv_urls_from_url(urlName))

print(urls)
```
Read the data directly from the URLs obtained above. Note that the character code is Shift JIS, and the first few lines contain information other than the data, so they are excluded.
```python
import pandas as pd

df = pd.DataFrame()

for url in urls:
    # Shift JIS encoding; the leading metadata rows are dropped
    df = pd.concat([df, pd.read_csv(url, encoding="Shift_JIS").iloc[6:]])

df.shape
```
```python
df.info()
```
You can get the column information by executing the above.
```
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2016 entries, 6 to 149
Data columns (total 97 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Observatory 2016 non-null object
1 Iwasakibashi 2016 non-null object
2 Otani 2016 non-null object
3 Tamayu River 2016 non-null object
4 Kashima 2016 non-null object
5 Mabashi River 2016 non-null object
6 Hizugawa Sluice Upstream 2016 non-null object
7 Downstream of Hitsugawa Sluice 2016 non-null object
8 Kitada River Sluice Upstream 2016 non-null object
9 Downstream of Kitada River Sluice 2016 non-null object
10 Kyobashi River 2016 non-null object
11 Kyobashi River Sluice Upstream 2016 non-null object
12 Kyobashi River Sluice Downstream 2016 non-null object
13 Upper Tekai Sluice 2016 non-null object
14 Downstream of Tekai Sluice 2016 non-null object
15 Kano Bridge 2016 non-null object
16 Izumokyo 2016 non-null object
17 Nunobe 2016 non-null object
18 Owatari 2016 non-null object
19 Yada 2016 non-null object
20 Iinashi Bridge 2016 non-null object
21 Shimoyama Sa 2016 non-null object
22 Grandfather Tanigawa 2016 non-null object
23 Hirotsuruhashi 2016 non-null object
24 Yasugi Ohashi 2016 non-null object
25 Yoshida Bridge 2016 non-null object
26 Hinodebashi 2016 non-null object
27 Kakeya Ohashi 2016 non-null object
28 Sakayama Bridge 2016 non-null object
29 Kandabashi 1 2016 non-null object
30 Hachiguchi Bridge 2016 non-null object
31 Yagami 2016 non-null object
32 Yokota Shinohashi 2016 non-null object
33 Mitsunari Ohashi 2016 non-null object
34 Shimbashi 2016 non-null object
35 Takasegawa 2016 non-null object
36 Goemon Bridge 2016 non-null object
37 Ron Tagawa 2016 non-null object
38 Yutanigawa 2016 non-null object
39 Nishihirata 2016 non-null object
40 Ichimonbashi 2016 non-null object
41 Nie 2016 non-null object
42 Sada 2016 non-null object
43 Kimura Bridge 2016 non-null object
44 Shinnai Fujikawa 2016 non-null object
45 Akakawa 2016 non-null object
46 Flowing Bridge 2016 non-null object
47 Touma River 2016 non-null object
48 Lake Jinzai 2016 non-null object
49 Imbara 2016 non-null object
50 Shimokuchiba 2016 non-null object
51 Kawai Bridge 2016 non-null object
52 Yokaichibashi 2016 non-null object
53 Masahara Bridge 2016 non-null object
54 exit 2016 non-null object
55 sunrise 2016 non-null object
56 Kandabashi 2 2016 non-null object
57 Nagahisa 2016 non-null object
58 Kute 2016 non-null object
59 Togeshika 2016 non-null object
60 Takuno 2016 non-null object
61 Zenkoji Bridge 2016 non-null object
62 Furuichibashi 2016 non-null object
63 Eo 2016 non-null object
64 Tochiya 2016 non-null object
65 Chikahara 2016 non-null object
66 Hinui 2016 non-null object
67 Katsuji 2016 non-null object
68 Toji 2016 non-null object
69 Fuchubashi 2016 non-null object
70 Shimoraihara 2016 non-null object
71 Isago 2016 non-null object
72 Sangubashi 2016 non-null object
73 Nakashibabashi 2016 non-null object
74 Hamada Ohashi 2016 non-null object
75 Hamada 2016 non-null object
76 Midfield 2016 non-null object
77 Misumi 2016 non-null object
78 Nishikawachi 2016 non-null object
79 Keikawabashi 2016 non-null object
80 Daido Bridge 2016 non-null object
81 Showa Bridge 2016 non-null object
82 Somewa 2016 non-null object
83 Asakura 2016 non-null object
84 Kiami River 2016 non-null object
85 Too Bridge 2016 non-null object
86 Aioi Bridge 2016 non-null object
87 Asahibashi 2016 non-null object
88 Machida 2016 non-null object
89 Nakajo 2016 non-null object
90 Yao River 2016 non-null object
91 Hatabashi 2016 non-null object
92 Shintsutsumi Bridge 2016 non-null object
93 Kiyomi Bridge 2016 non-null object
94 Goka Ohashi 2016 non-null object
95 Tsuma River 2016 non-null object
96 Mita 2016 non-null object
dtypes: object(97)
memory usage: 1.5+ MB
```
Since every column's Dtype is object, the numeric data seems to be stored as strings...

Also, looking inside, the data contains the strings "Not collected", "Missing", and "Maintenance". After replacing those strings, the values are converted to floats. Since the date and time column is also a string, it has to be converted to datetime values as well.
So, execute the following script.
```python
# Use the date/time column as a DatetimeIndex
df.index = pd.to_datetime(df["Observatory"])

# Replace non-numeric status strings with a sentinel value
df = df.replace("Not collected", "-1")
df = df.replace("Missing", "-1")
df = df.replace("Maintenance", "-1")

cols = df.columns[1:]

for col in cols:
    df[col] = df[col].astype("float")
```
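As a variation (my own suggestion, not what the script above does), `pd.to_numeric` with `errors="coerce"` would turn the status strings into NaN instead of -1, so they are simply left out of the plots rather than appearing as dips to -1:

```python
import pandas as pd

# Hypothetical column mimicking the real data: readings mixed with status strings
s = pd.Series(["0.52", "Missing", "0.61", "Maintenance"])
vals = pd.to_numeric(s, errors="coerce")  # non-numeric strings become NaN
print(vals.tolist())  # → [0.52, nan, 0.61, nan]
```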
Try drawing the graphs after setting up the environment so that Japanese labels display correctly.
```python
!pip install japanize_matplotlib
```

```python
import matplotlib.pyplot as plt
import japanize_matplotlib
import seaborn as sns

sns.set(font="IPAexGothic")

# Plot the first five observatories over the full period
df[cols[:5]].plot(figsize=(15, 5))
plt.show()

# Zoom in on July 12-13
df["2020-07-12":"2020-07-13"][cols[:5]].plot(figsize=(15, 5))
plt.show()
```
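Since the index is now a DatetimeIndex, the 10-minute readings could also be aggregated before plotting, e.g. into hourly means with `resample`. A minimal sketch with made-up data (`df_demo` and its values are hypothetical stand-ins for the real frame):

```python
import numpy as np
import pandas as pd

# Hypothetical 10-minute series standing in for one observatory's column
idx = pd.date_range("2020-07-12", periods=144, freq="10min")
df_demo = pd.DataFrame({"Iwasakibashi": np.linspace(0.5, 1.2, 144)}, index=idx)

# Average the six 10-minute readings in each hour
hourly = df_demo["Iwasakibashi"].resample("1h").mean()
print(len(hourly))  # → 24
```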
It has been raining since the other day, so you can see at a glance how the water level is rising.
Well, what are we going to do now?