This is my fourth Qiita post. I am new to the IT industry, so I may use some terms incorrectly. If you notice anything, I would appreciate your advice.
My house is a 25-year-old wooden house; the first floor is very cold in winter and the second floor is quite hot in summer. Probably because it is a wooden building, I also feel that the temperature inside the house changes slowly. For example, in June the house is cool from 9 a.m. to 11 a.m., but it becomes humid and stuffy around 7 p.m.
I therefore wanted to measure temperature data in various seasons and places, so I tried acquiring and processing temperature data with a Raspberry Pi.
Ideally I wanted to measure and compare temperatures at multiple locations at the same time, but due to budget constraints I could only get one Raspberry Pi, so I decided to compare my measurements with the data from the Japan Meteorological Agency station near my house.
Scraping is the extraction of specific information from a website. For details, please see Wikipedia (https://ja.wikipedia.org/wiki/%E3%82%A6%E3%82%A7%E3%83%96%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0).
I built the circuit and wrote my script with reference to "Measure temperature and humidity with Raspberry Pi and DHT11" and "Make a temperature and humidity logger with Raspberry Pi Zero and DHT11" (/items/2737749d4532150026ee).
I used a Raspberry Pi 3B+, and the circuit looks like this.
I understood each function in the sample script thanks to [Detailed description of the temperature / humidity sensor (DHT11)](https://www.souichi.club/technology/dht11-datasheet/#parsedatapulluplengths%E3%83%A1%E3%82%BD%E3%83%83%E3%83%89).
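As a rough sketch of what the sample script does, a single reading looks roughly like the following. This is only a minimal sketch, assuming the dht11 library from the articles above and GPIO pin 14 (the pin used in the script later in this post); it is not the sample script itself.

```python
import RPi.GPIO as GPIO
import dht11

# Assumed wiring: DHT11 data pin connected to GPIO14 (BCM numbering)
GPIO.setwarnings(True)
GPIO.setmode(GPIO.BCM)
instance = dht11.DHT11(pin=14)

result = instance.read()
# is_valid() is True only when the received bits pass the checksum
if result.is_valid():
    print("temperature:", result.temperature)
    print("humidity:", result.humidity)
else:
    print("read failed, error code:", result.error_code)

GPIO.cleanup()
```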
Since the Japan Meteorological Agency data is obtained by scraping, the timestamps of the scraped data have to be matched with the data acquired by the sensor.
Error handling is already done inside the sample script's collect_input and parse_data_pull_up_lengths functions, so I only rewrote the sample's example.py so that the measured data is saved to a csv file. The is_valid() function is called to check whether a reading was obtained correctly, but it returns a failure fairly often.
When I counted the number of attempts with the count variable, the results were roughly:
-- success on the 1st attempt ... 70%
-- success by the 17th attempt ... 15%
-- success by the 33rd attempt ... 10%
-- more than that ... 5%
(Sorry, these are not real statistics, just my impression.) When it fails, the failures seem to come in runs of roughly 2^n attempts. Since data is acquired only once every 60 minutes, I set the retry limit to a little over 2^10 (= 1029), a number of attempts that should cover the failure runs seen so far.
```python
import RPi.GPIO as GPIO
import dht11
import time
import datetime
import os
import numpy as np
import pandas as pd


def add_data(filepath):
    # Check whether the csv file already exists at the specified path
    if os.path.isfile(filepath):
        # Read it if it exists
        df = pd.read_csv(filepath, index_col=0)
    else:
        # Otherwise create a new, empty DataFrame
        df = pd.DataFrame([], columns=["year", "month", "day", "hour",
                                       "sensor_temprature", "sensor_humidity",
                                       "scraping_temprature", "read_try"])
    # Variable that counts the number of read attempts
    count = 0
    # Keep reading (up to 1029 times) until is_valid() returns True
    while count <= 1029:
        result = instance.read()
        count += 1
        if result.is_valid():
            # Store year, month, day and hour so the row can be matched with the scraped data later
            year = datetime.datetime.now().year
            month = datetime.datetime.now().month
            day = datetime.datetime.now().day
            hour = datetime.datetime.now().hour
            temp = result.temperature
            hum = result.humidity
            # scraping_temprature is left empty and filled in later by the scraping script
            data = [year, month, day, hour, temp, hum, "", count]
            s = pd.Series(data, index=df.columns)
            df = df.append(s, ignore_index=True)
            break
    return df


# initialize GPIO
GPIO.setwarnings(True)
GPIO.setmode(GPIO.BCM)
instance = dht11.DHT11(pin=14)

filepath = '/home/pi/Desktop/DHT11_Python/data.csv'
df = add_data(filepath)
df.to_csv(filepath)
print(df)
```
I set the script to run on the hour by referring to "Automatically run a program using systemd on Raspberry Pi". For the timer setting I used "hourly", which is described near the bottom of the Arch manual page SYSTEMD.TIME(7).
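For reference, a minimal sketch of such a service/timer pair could look like the following. The unit names and the script path are assumptions, not the exact files from this setup.

```ini
# /etc/systemd/system/dht11-logger.service (assumed name and path)
[Unit]
Description=Log DHT11 temperature to csv

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /home/pi/Desktop/DHT11_Python/example.py
```

```ini
# /etc/systemd/system/dht11-logger.timer (assumed name and path)
[Unit]
Description=Run the DHT11 logger every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

After placing the files, something like `sudo systemctl enable --now dht11-logger.timer` registers the timer.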
I wrote the code for scraping the Japan Meteorological Agency data by referring to "Scraping past weather data with Python". Note that the exact syntax changes slightly depending on the versions of Beautiful Soup and of Raspberry Pi OS / pip.
The differences between the page in the reference article and the page targeted for scraping this time are:
--While the page in the reference article has two tables, the target page this time has only one table.
Therefore, the code has to be rewritten on the assumption that the first of those tables does not exist.
--The table on the reference page has a single row of column names, while the table header on the target page spans two rows.
Since this is inconvenient when creating a DataFrame with pandas, I set the column names manually.
Screenshot of the page scraped this time
Screenshot of the page scraped in the reference article
Other changes I made when writing the code:
--Only one day's worth of temperature data can be obtained per page.
Therefore, I made the year, month, and day variables so that the code can move between pages and reach every date.
--Trying to scrape today's weather data fails because the page does not exist yet.
Therefore, today's page is skipped.
```python
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
import datetime
import os
import sys


# Changing the prec and block codes changes the region.
# m_prec=44 is Tokyo, m_block=1133 is Fuchu.
def scraping_weather(m_year, m_month, m_day, m_prec=44, m_block=1133):
    url = "https://www.data.jma.go.jp/obd/stats/etrn/view/hourly_a1.php?prec_no={prec}&block_no={block}&year={year}&month={month}&day={day}&view="
    url = url.format(prec=m_prec, block=m_block, year=m_year, month=m_month, day=m_day)
    html = requests.get(url)
    soup = BeautifulSoup(html.content, "html.parser")
    # Extract the <table> whose id is 'tablefix1'
    table = soup.find('table', id='tablefix1')
    # Extract all <th> elements in the table
    th_all = table.find_all('th')
    # Set the column titles manually
    table_column = ["Time", "Precipitation", "temperature", "Wind speed / direction(m/s)",
                    "Sunshine duration", "Snow(cm)", "Wind speed", "Wind direction",
                    "Snowfall", "Snow depth"]
    # Extract all <tr> elements in the <table>
    tr_all = table.find_all('tr')
    # The first tr only contains the header, so skip it
    tr_all = tr_all[1:]
    # Calculate the number of rows and columns and create an ndarray
    number_of_cols = len(table_column)
    number_of_rows = len(tr_all)
    table_data = np.zeros((number_of_rows, number_of_cols), dtype=np.float32)
    # Store the data of each row in the ndarray
    for r, tr in enumerate(tr_all):
        td_all = tr.find_all('td')
        for c, td in enumerate(td_all):
            try:
                table_data[r, c] = td.string
            except (ValueError, TypeError):
                # Cells that are empty or not numeric become NaN
                table_data[r, c] = np.nan
    # Build a DataFrame from the extracted data
    df = pd.DataFrame(data=table_data, columns=table_column)
    return df


def combine_scraping_data(df):
    date_before = str(0)
    # Go through all rows of the sensor data
    for i in range(len(df)):
        # Check whether this row was already filled by a previous scraping run
        if np.isnan(df.loc[i, "scraping_temprature"]):
            year = df.loc[i, "year"]
            month = df.loc[i, "month"]
            day = df.loc[i, "day"]
            # There is no scraping data for today yet, so skip rows from today
            if ((day == datetime.date.today().day) & (month == datetime.date.today().month)):
                continue
            # Build the date as a str
            date_now = str(year) + str(month) + str(day)
            # Check whether the date changed compared with the previous row
            if date_now != date_before:
                # If it changed, scrape that day and store the result in t_df
                t_df = scraping_weather(year, month, day)
                # Update the date_before variable
                date_before = date_now
            for j in range(len(t_df)):
                # Match the scraped rows with the sensor row and copy the scraped temperature
                if df.loc[i, "hour"] == t_df.loc[j, "Time"]:
                    df.loc[i, "scraping_temprature"] = t_df.loc[j, "temperature"]
    return df


filepath = '/home/pi/Desktop/DHT11_Python/data.csv'
if os.path.isfile(filepath):
    df = pd.read_csv(filepath, index_col=0)
else:
    print("No data. This python will be stopped.")
    sys.exit()

# The JMA table labels midnight as hour 24 of the previous day, so convert hour 0 accordingly
for i in range(len(df)):
    if df.loc[i, 'hour'] == 0:
        df.loc[i, 'hour'] = 24
        df.loc[i, 'day'] -= 1

df = combine_scraping_data(df)
df.to_csv(filepath)
print(df['scraping_temprature'])
```
Finally, I create a graph of the sensor and scraping data with matplotlib and save it in the specified folder. The plot was made with reference to [Draw a line graph with matplotlib](https://pythondatascience.plavox.info/matplotlib/%E6%8A%98%E3%82%8C%E7%B7%9A%E3%82%B0%E3%83%A9%E3%83%95).
I decided to create a folder for each date inside the result folder and store the csv file and the graph (png file) there. The directory layout looks like this:
├── creating_graph.py
└── result
├── 6_10
│ ├── 6_10.csv
│ └── 6_10.png
└── 6_11
├── 6_11.csv
└── 6_11.png
Also, after creating the graph, I decided to delete the target data from the original data.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys

filepath = '/home/pi/Desktop/DHT11_Python/data.csv'
# Read the csv file into a DataFrame
if os.path.isfile(filepath):
    df = pd.read_csv(filepath, index_col=0)
else:
    print("No data. This python will be stopped.")
    sys.exit()

# A graph is created per day, so loop over the month / day values in the data
for g_month in df["month"].unique():
    for g_day in df["day"].unique():
        # Load the target day's data into t_df
        t_df = df[(df['day'] == g_day) & (df['month'] == g_month)]
        # Skip month/day combinations that do not exist in the data
        if t_df.empty:
            continue
        # Skip days for which scraping data does not exist yet
        if t_df["scraping_temprature"].isnull().any():
            continue
        # Create a new folder for the day
        result_filepath = '/home/pi/Desktop/DHT11_Python/dht11_creating_graph/result/{month}_{day}'
        result_filepath = result_filepath.format(month=g_month, day=g_day)
        os.makedirs(result_filepath, exist_ok=True)
        # Build the path for the csv file and save it
        result_filename = '/{month}_{day}.csv'
        result_filename = result_filename.format(month=g_month, day=g_day)
        t_df.to_csv(result_filepath + result_filename)
        # Build the path for the graph
        result_graphname = '/{month}_{day}.png'
        result_graphname = result_graphname.format(month=g_month, day=g_day)
        # Create the graph
        x = t_df['hour']
        y1 = t_df['sensor_temprature']
        y2 = t_df['scraping_temprature']
        fig = plt.figure()
        p1 = plt.plot(x, y1, linewidth=2)
        p2 = plt.plot(x, y2, linewidth=2, linestyle="dashed")
        plt.legend((p1[0], p2[0]), ('sensor_temprature', 'scraping_temprature'), loc=2)
        plt.xlabel("hour")
        plt.ylabel("temprature")
        # Save the graph and close the figure
        fig.savefig(result_filepath + result_graphname)
        plt.close(fig)
        # Delete the graphed data from the original data
        df = df[(df['day'] != g_day) | (df['month'] != g_month)]
        # Reindex the original data
        df.reset_index(inplace=True, drop=True)

# Save the remaining original data
df.to_csv(filepath)
```
There is a lot to look at, but I obtained data for the hot room on the second floor on June 14th and for the not-so-hot room on the first floor on June 16th. Looking at the graphs below, in the first-floor graph the room temperature is lower than the scraped temperature from 8:00 to 18:00, while in the second-floor graph the room temperature is always higher than the scraped temperature.
Graph of hot room on the 2nd floor
Graph of a room on the first floor that is not so hot
Some doubts remain about the measurement:
--The second-floor temperature was measured on a rainy day and the first-floor temperature on a sunny day, so aren't the preconditions different in the first place?
--How large is the measurement error of the temperature sensor itself?
--Isn't there an error depending on how the wind and the sunlight hit the measurement location?
--The temperature sensor module itself lights up. Does this light raise the temperature and make the measured value fluctuate?
--If the temperature sensor is tilted, does that introduce an error in the measurement result?
Reference photo
There are still doubts about the measurement results, but since I was able to automate the scraping of temperature data and the creation of graphs to some extent, I will stop here for now. I would be grateful if you could contact me if you notice anything!