I used to scrape with Python's requests module. That works for sites whose HTML is generated on the server side, but requests only gives you the response as it is before any JavaScript runs. So it could not be used on SPA sites, which execute JavaScript on the client side and build the HTML there.
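To see the difference concretely, here is a minimal sketch that uses only the standard library. The two HTML strings and the names `SPA_SHELL`, `SERVER_RENDERED`, `BodyTextExtractor`, and `visible_body_text` are made up for illustration. A real SPA's initial response will differ in detail, but it often looks like the first string: an empty mount point plus a script tag.

```python
from html.parser import HTMLParser

# Hypothetical SPA shell: roughly what a plain HTTP GET receives
# before any JavaScript has run.
SPA_SHELL = """<!DOCTYPE html>
<html>
  <head><title>Example SPA</title></head>
  <body>
    <div id="app"></div>
    <script src="/static/bundle.js"></script>
  </body>
</html>"""

# Hypothetical server-rendered page: the content is already in the HTML.
SERVER_RENDERED = """<!DOCTYPE html>
<html>
  <head><title>Example page</title></head>
  <body>
    <div id="app"><h1>Hello</h1><p>Rendered on the server.</p></div>
  </body>
</html>"""


class BodyTextExtractor(HTMLParser):
    """Collect the text inside <body>, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._in_body = False
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is inside <body> and not in a script.
        if self._in_body and self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_body_text(html):
    """Return the body text of an HTML string, joined by single spaces."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


if __name__ == "__main__":
    print(repr(visible_body_text(SPA_SHELL)))        # ''
    print(repr(visible_body_text(SERVER_RENDERED)))  # 'Hello Rendered on the server.'
```

The SPA shell yields no text at all, because the content only appears once the script has run in a browser. That is the gap a JavaScript-rendering step has to fill.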
To scrape SPA sites, you can use requests-html, which can run the page's JavaScript before you read the HTML.
pip install requests-html
main.py
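# -*- coding: utf-8 -*-
import requests
from requests_html import HTMLSession


def main_render_javascript_page():
    url = 'https://hogehoge'
    session = HTMLSession()
    r = session.get(url)
    # Run the page's JavaScript in headless Chromium
    # (downloaded automatically on the first call).
    r.html.render()
    # Text of <body> after rendering.
    body_text = r.html.find('body', first=True).text
    print(body_text)


def main_normal_page():
    url = 'https://hogehoge'
    # Plain GET: the HTML as sent by the server, before any JavaScript runs.
    r = requests.get(url)
    print(r.text)


if __name__ == '__main__':
    main_normal_page()
    main_render_javascript_page()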
https://requests.readthedocs.io/projects/requests-html/en/latest/
https://dev.classmethod.jp/articles/python-asyncio/
https://blog.ikedaosushi.com/entry/2019/09/15/162445