This is my fourth Qiita post. I am new to the IT industry, so I may use some terms incorrectly. If you notice anything, I would appreciate your advice.
My house is a 25-year-old wooden house; the first floor is very cold in winter and the second floor is quite hot in summer. Probably because it is a wooden building, I also feel that the temperature inside the house changes slowly. For example, in June the house is cool from 9 a.m. to 11 a.m., but it becomes humid and stuffy around 7 p.m.
I therefore wanted to measure temperature data in various seasons and places, so I tried acquiring and processing temperature data with a Raspberry Pi.
Ideally I wanted to measure and compare temperatures at multiple locations at the same time, but due to budget constraints I could only get one Raspberry Pi, so I decided to compare my measurements with the data from the Japan Meteorological Agency station near my house.
Scraping is the extraction of specific information from a website. For details, please see Wikipedia (https://ja.wikipedia.org/wiki/%E3%82%A6%E3%82%A7%E3%83%96%E3%82%B9%E3%82%AF%E3%83%AC%E3%82%A4%E3%83%94%E3%83%B3%E3%82%B0).
I built the circuit and wrote my script with reference to "Measure temperature and humidity with Raspberry Pi and DHT11" and "Make a temperature and humidity logger with Raspberry Pi Zero and DHT11" (/items/2737749d4532150026ee).
I used a Raspberry Pi 3B+, and the circuit looks like this.
I understood each function in the sample script thanks to [Detailed description of the temperature / humidity sensor (DHT11)](https://www.souichi.club/technology/dht11-datasheet/#parsedatapulluplengths%E3%83%A1%E3%82%BD%E3%83%83%E3%83%89).
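As a rough sketch of what the sample script does, a single reading looks roughly like the following. This is only a minimal sketch, assuming the dht11 library from the articles above and GPIO pin 14 (the pin used in the script later in this post); it is not the sample script itself.

```python
import RPi.GPIO as GPIO
import dht11

# Assumed wiring: DHT11 data pin connected to GPIO14 (BCM numbering)
GPIO.setwarnings(True)
GPIO.setmode(GPIO.BCM)
instance = dht11.DHT11(pin=14)

result = instance.read()
# is_valid() is True only when the received bits pass the checksum
if result.is_valid():
    print("temperature:", result.temperature)
    print("humidity:", result.humidity)
else:
    print("read failed, error code:", result.error_code)

GPIO.cleanup()
```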
Since the Japan Meteorological Agency data is obtained by scraping, the timestamps of the scraped data have to be matched with the data acquired by the sensor.
Error handling is already done inside the sample script's collect_input and parse_data_pull_up_lengths functions, so I only rewrote the sample's example.py so that the measured data is saved to a csv file. The is_valid() function is called to check whether a reading was obtained correctly, but it returns a failure fairly often.
When I counted the number of attempts with the count variable, the results were roughly:
-- success on the 1st attempt ... 70%
-- success by the 17th attempt ... 15%
-- success by the 33rd attempt ... 10%
-- more than that ... 5%
(Sorry, these are not real statistics, just my impression.) When it fails, the failures seem to come in runs of roughly 2^n attempts. Since data is acquired only once every 60 minutes, I set the retry limit to a little over 2^10 (= 1029), a number of attempts that should cover the failure runs seen so far.
```python
import RPi.GPIO as GPIO
import dht11
import time
import datetime
import os
import numpy as np
import pandas as pd


def add_data(filepath):
    # Check whether the csv file already exists at the specified path
    if os.path.isfile(filepath):
        # Read it if it exists
        df = pd.read_csv(filepath, index_col=0)
    else:
        # Otherwise create a new, empty DataFrame
        df = pd.DataFrame([], columns=["year", "month", "day", "hour",
                                       "sensor_temprature", "sensor_humidity",
                                       "scraping_temprature", "read_try"])
    # Variable that counts the number of read attempts
    count = 0
    # Keep reading (up to 1029 times) until is_valid() returns True
    while count <= 1029:
        result = instance.read()
        count += 1
        if result.is_valid():
            # Store year, month, day and hour so the row can be matched with the scraped data later
            year = datetime.datetime.now().year
            month = datetime.datetime.now().month
            day = datetime.datetime.now().day
            hour = datetime.datetime.now().hour
            temp = result.temperature
            hum = result.humidity
            # scraping_temprature is left empty and filled in later by the scraping script
            data = [year, month, day, hour, temp, hum, "", count]
            s = pd.Series(data, index=df.columns)
            df = df.append(s, ignore_index=True)
            break
    return df


# initialize GPIO
GPIO.setwarnings(True)
GPIO.setmode(GPIO.BCM)
instance = dht11.DHT11(pin=14)

filepath = '/home/pi/Desktop/DHT11_Python/data.csv'
df = add_data(filepath)
df.to_csv(filepath)
print(df)
```
I set the script to run on the hour by referring to "Automatically run a program using systemd on Raspberry Pi". For the timer setting I used "hourly", which is described near the bottom of the Arch manual page SYSTEMD.TIME(7).
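For reference, a minimal sketch of such a service/timer pair could look like the following. The unit names and the script path are assumptions, not the exact files from this setup.

```ini
# /etc/systemd/system/dht11-logger.service (assumed name and path)
[Unit]
Description=Log DHT11 temperature to csv

[Service]
Type=oneshot
ExecStart=/usr/bin/python3 /home/pi/Desktop/DHT11_Python/example.py
```

```ini
# /etc/systemd/system/dht11-logger.timer (assumed name and path)
[Unit]
Description=Run the DHT11 logger every hour

[Timer]
OnCalendar=hourly
Persistent=true

[Install]
WantedBy=timers.target
```

After placing the files, something like `sudo systemctl enable --now dht11-logger.timer` registers the timer.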
I wrote the code for scraping the Japan Meteorological Agency data by referring to "Scraping past weather data with Python". Note that the exact syntax changes slightly depending on the versions of Beautiful Soup and of Raspberry Pi OS / pip.
The differences between the page in the reference article and the page targeted for scraping this time are:
--While the page in the reference article has two tables, the target page this time has only one table.
Therefore, the code has to be rewritten on the assumption that the first of those tables does not exist.
--The table on the reference page has a single row of column names, while the table header on the target page spans two rows.
Since this is inconvenient when creating a DataFrame with pandas, I set the column names manually.
Screenshot of the page scraped this time
Screenshot of the page scraped in the reference article
Other changes I made when writing the code:
--Only one day's worth of temperature data can be obtained per page.
Therefore, I made the year, month, and day variables so that the code can move between pages and reach every date.
--Trying to scrape today's weather data fails because the page does not exist yet.
Therefore, today's page is skipped.
```python
from bs4 import BeautifulSoup
import pandas as pd
import numpy as np
import requests
import datetime
import os
import sys


# Changing the prec and block codes changes the region.
# m_prec=44 is Tokyo, m_block=1133 is Fuchu.
def scraping_weather(m_year, m_month, m_day, m_prec=44, m_block=1133):
    url = "https://www.data.jma.go.jp/obd/stats/etrn/view/hourly_a1.php?prec_no={prec}&block_no={block}&year={year}&month={month}&day={day}&view="
    url = url.format(prec=m_prec, block=m_block, year=m_year, month=m_month, day=m_day)
    html = requests.get(url)
    soup = BeautifulSoup(html.content, "html.parser")
    # Extract the <table> whose id is 'tablefix1'
    table = soup.find('table', id='tablefix1')
    # Extract all <th> elements in the table
    th_all = table.find_all('th')
    # Set the column titles manually
    table_column = ["Time", "Precipitation", "temperature", "Wind speed / direction(m/s)",
                    "Sunshine duration", "Snow(cm)", "Wind speed", "Wind direction",
                    "Snowfall", "Snow depth"]
    # Extract all <tr> elements in the <table>
    tr_all = table.find_all('tr')
    # The first tr only contains the header, so skip it
    tr_all = tr_all[1:]
    # Calculate the number of rows and columns and create an ndarray
    number_of_cols = len(table_column)
    number_of_rows = len(tr_all)
    table_data = np.zeros((number_of_rows, number_of_cols), dtype=np.float32)
    # Store the data of each row in the ndarray
    for r, tr in enumerate(tr_all):
        td_all = tr.find_all('td')
        for c, td in enumerate(td_all):
            try:
                table_data[r, c] = td.string
            except (ValueError, TypeError):
                # Cells that are empty or not numeric become NaN
                table_data[r, c] = np.nan
    # Build a DataFrame from the extracted data
    df = pd.DataFrame(data=table_data, columns=table_column)
    return df


def combine_scraping_data(df):
    date_before = str(0)
    # Go through all rows of the sensor data
    for i in range(len(df)):
        # Check whether this row was already filled by a previous scraping run
        if np.isnan(df.loc[i, "scraping_temprature"]):
            year = df.loc[i, "year"]
            month = df.loc[i, "month"]
            day = df.loc[i, "day"]
            # There is no scraping data for today yet, so skip rows from today
            if ((day == datetime.date.today().day) & (month == datetime.date.today().month)):
                continue
            # Build the date as a str
            date_now = str(year) + str(month) + str(day)
            # Check whether the date changed compared with the previous row
            if date_now != date_before:
                # If it changed, scrape that day and store the result in t_df
                t_df = scraping_weather(year, month, day)
                # Update the date_before variable
                date_before = date_now
            for j in range(len(t_df)):
                # Match the scraped rows with the sensor row and copy the scraped temperature
                if df.loc[i, "hour"] == t_df.loc[j, "Time"]:
                    df.loc[i, "scraping_temprature"] = t_df.loc[j, "temperature"]
    return df


filepath = '/home/pi/Desktop/DHT11_Python/data.csv'
if os.path.isfile(filepath):
    df = pd.read_csv(filepath, index_col=0)
else:
    print("No data. This python will be stopped.")
    sys.exit()

# The JMA table labels midnight as hour 24 of the previous day, so convert hour 0 accordingly
for i in range(len(df)):
    if df.loc[i, 'hour'] == 0:
        df.loc[i, 'hour'] = 24
        df.loc[i, 'day'] -= 1

df = combine_scraping_data(df)
df.to_csv(filepath)
print(df['scraping_temprature'])
```
Finally, I create a graph of the sensor and scraping data with matplotlib and save it in the specified folder. The plot was made with reference to [Draw a line graph with matplotlib](https://pythondatascience.plavox.info/matplotlib/%E6%8A%98%E3%82%8C%E7%B7%9A%E3%82%B0%E3%83%A9%E3%83%95).
I decided to create a folder for each date inside the result folder and store the csv file and the graph (png file) there. The directory layout looks like this:
├── creating_graph.py
└── result
├── 6_10
│ ├── 6_10.csv
│ └── 6_10.png
└── 6_11
├── 6_11.csv
└── 6_11.png
Also, after creating the graph, I decided to delete the target data from the original data.
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
import sys

filepath = '/home/pi/Desktop/DHT11_Python/data.csv'
# Read the csv file into a DataFrame
if os.path.isfile(filepath):
    df = pd.read_csv(filepath, index_col=0)
else:
    print("No data. This python will be stopped.")
    sys.exit()

# A graph is created per day, so loop over the month / day values in the data
for g_month in df["month"].unique():
    for g_day in df["day"].unique():
        # Load the target day's data into t_df
        t_df = df[(df['day'] == g_day) & (df['month'] == g_month)]
        # Skip month/day combinations that do not exist in the data
        if t_df.empty:
            continue
        # Skip days for which scraping data does not exist yet
        if t_df["scraping_temprature"].isnull().any():
            continue
        # Create a new folder for the day
        result_filepath = '/home/pi/Desktop/DHT11_Python/dht11_creating_graph/result/{month}_{day}'
        result_filepath = result_filepath.format(month=g_month, day=g_day)
        os.makedirs(result_filepath, exist_ok=True)
        # Build the path for the csv file and save it
        result_filename = '/{month}_{day}.csv'
        result_filename = result_filename.format(month=g_month, day=g_day)
        t_df.to_csv(result_filepath + result_filename)
        # Build the path for the graph
        result_graphname = '/{month}_{day}.png'
        result_graphname = result_graphname.format(month=g_month, day=g_day)
        # Create the graph
        x = t_df['hour']
        y1 = t_df['sensor_temprature']
        y2 = t_df['scraping_temprature']
        fig = plt.figure()
        p1 = plt.plot(x, y1, linewidth=2)
        p2 = plt.plot(x, y2, linewidth=2, linestyle="dashed")
        plt.legend((p1[0], p2[0]), ('sensor_temprature', 'scraping_temprature'), loc=2)
        plt.xlabel("hour")
        plt.ylabel("temprature")
        # Save the graph and close the figure
        fig.savefig(result_filepath + result_graphname)
        plt.close(fig)
        # Delete the graphed data from the original data
        df = df[(df['day'] != g_day) | (df['month'] != g_month)]
        # Reindex the original data
        df.reset_index(inplace=True, drop=True)

# Save the remaining original data
df.to_csv(filepath)
```
There is a lot to look at, but I obtained data for the hot room on the second floor on June 14th and for the not-so-hot room on the first floor on June 16th. Looking at the graphs below, in the first-floor graph the room temperature is lower than the scraped temperature from 8:00 to 18:00, while in the second-floor graph the room temperature is always higher than the scraped temperature.
Graph of hot room on the 2nd floor
Graph of a room on the first floor that is not so hot
Some doubts remain about the measurement:
--The second-floor temperature was measured on a rainy day and the first-floor temperature on a sunny day, so aren't the preconditions different in the first place?
--How large is the measurement error of the temperature sensor itself?
--Isn't there an error depending on how the wind and the sunlight hit the measurement location?
--The temperature sensor module itself lights up. Does this light raise the temperature and make the measured value fluctuate?
--If the temperature sensor is tilted, does that introduce an error in the measurement result?
Reference photo
There are still doubts about the measurement results, but since I was able to automate the scraping of temperature data and the creation of graphs to some extent, I will stop here for now. I would be grateful if you could contact me if you notice anything!