TL;DR: I wanted to scrape a web page with Python, so I started with the usual requests + BeautifulSoup combination. For some reason I could only retrieve part of the page, and after some digging I found a library called "requests-html", which I introduce here.
Install the requests_html module with pip.
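The install command is the usual pip one (the package is spelled `requests_html` here to match the rest of the article; pip treats `requests-html` and `requests_html` as the same package):

```shell
pip install requests_html
```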
This worked fine on macOS, but running `pip install requests_html` on a Raspberry Pi produced the following error:
```
ERROR: Command errored out with exit status 1:
(abridgement)
Error: Please make sure the libxml2 and libxslt development packages are installed.
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output
```
Apparently requests_html depends on lxml, and building lxml fails on the Raspberry Pi when the libxml2/libxslt development headers are missing. The following resolved it:
```
sudo apt-get install libxml2-dev libxslt-dev python3-dev
pip install lxml
```
```python
from requests_html import HTMLSession

url = "https://stopcovid19.metro.tokyo.lg.jp/cards/positive-rate"

# Start a session
session = HTMLSession()
r = session.get(url)
r.html.render()  # execute the page's JavaScript and render the result

# Get elements
rows = r.html.find("span")
for row in rows:
    print(row.text)  # prints the text of every span element
```
`r.html.find("element name")` returns every matching element on the page. In this example I scraped the Tokyo Metropolitan COVID-19 information site; with requests + BeautifulSoup I could only get part of the page, presumably because most of the content is generated by JavaScript, which `render()` executes.
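For comparison, here is what plain BeautifulSoup sees: it only parses the HTML the server sends and never executes JavaScript, so anything injected client-side is invisible to it. A minimal, self-contained sketch (the HTML snippet is made up for illustration, not taken from the actual site):

```python
from bs4 import BeautifulSoup

# Made-up static HTML standing in for a server response: the div
# would be filled in by JavaScript in a browser, so requests +
# BeautifulSoup never sees that content.
html = """
<html><body>
  <span>Tokyo</span>
  <span>positive rate</span>
  <div id="app"><!-- rendered client-side --></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
spans = [span.text for span in soup.find_all("span")]
print(spans)  # only the static spans appear
```

This is why the dynamic parts of the site were missing in my first attempt, while `r.html.render()` in requests-html makes them visible.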