TL;DR: I wanted to scrape a web page with Python, so I started with the usual requests + BeautifulSoup combination. For some reason I could only retrieve part of the page, and after some digging I found a library called "requests-html", which I introduce here.
Install the requests_html module with pip.
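The install command is the usual pip one (the package is spelled `requests_html` here to match the rest of the article; pip treats `requests-html` and `requests_html` as the same package):

```shell
pip install requests_html
```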
This worked fine on macOS, but running `pip install requests_html` on a Raspberry Pi produced the following error:
```
ERROR: Command errored out with exit status 1:
(abridgement)
Error: Please make sure the libxml2 and libxslt development packages are installed.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output
```
Apparently requests_html depends on lxml, and building lxml fails on the Raspberry Pi when the libxml2/libxslt development headers are missing. The following resolved it:
```
sudo apt-get install libxml2-dev libxslt-dev python3-dev
pip install lxml
```
```python
from requests_html import HTMLSession

url = "https://stopcovid19.metro.tokyo.lg.jp/cards/positive-rate"

# Start a session
session = HTMLSession()
r = session.get(url)
r.html.render()  # execute the page's JavaScript and render the result

# Get elements
rows = r.html.find("span")
for row in rows:
    print(row.text)  # prints the text of every span element
```
`r.html.find("element name")` returns every matching element on the page. In this example I scraped the Tokyo Metropolitan COVID-19 information site; with requests + BeautifulSoup I could only get part of the page, presumably because most of the content is generated by JavaScript, which `render()` executes.
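For comparison, here is what plain BeautifulSoup sees: it only parses the HTML the server sends and never executes JavaScript, so anything injected client-side is invisible to it. A minimal, self-contained sketch (the HTML snippet is made up for illustration, not taken from the actual site):

```python
from bs4 import BeautifulSoup

# Made-up static HTML standing in for a server response: the div
# would be filled in by JavaScript in a browser, so requests +
# BeautifulSoup never sees that content.
html = """
<html><body>
  <span>Tokyo</span>
  <span>positive rate</span>
  <div id="app"><!-- rendered client-side --></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
spans = [span.text for span in soup.find_all("span")]
print(spans)  # only the static spans appear
```

This is why the dynamic parts of the site were missing in my first attempt, while `r.html.render()` in requests-html makes them visible.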