How to scrape websites built as SPAs

I used to scrape with Python's requests module. That works for sites whose HTML is generated on the server, but requests only returns the response as it is before any JavaScript runs. It therefore cannot be used on a site built as an SPA (single-page application), which executes JavaScript on the client side and generates the HTML in the browser.
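
To make the problem concrete, here is a minimal sketch using only the standard library. The `SPA_SHELL` string is a hypothetical example of what a server might send for an SPA (it is not taken from any real site): an empty mount point plus a script tag. Extracting the visible body text from it yields nothing, which is roughly what you see when you fetch such a page without running its JavaScript.

```python
from html.parser import HTMLParser

# Hypothetical response body of an SPA: the server sends only an empty mount
# point and a script tag; the visible content is created later in the browser.
SPA_SHELL = """<!DOCTYPE html>
<html>
  <head><title>Example SPA</title></head>
  <body>
    <div id="app"></div>
    <script src="/static/bundle.js"></script>
  </body>
</html>"""


class BodyTextExtractor(HTMLParser):
    """Collects visible text inside <body>, skipping <script> contents."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.in_script = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Keep only non-whitespace text that a reader would actually see.
        if self.in_body and not self.in_script and data.strip():
            self.chunks.append(data.strip())


def visible_body_text(html):
    parser = BodyTextExtractor()
    parser.feed(html)
    return " ".join(parser.chunks)


# The shell has no visible text until JavaScript fills in <div id="app">.
print(repr(visible_body_text(SPA_SHELL)))
```

A server-rendered page, by contrast, already carries its text in the response, so the same helper returns it directly.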

requests-html module

To scrape a site built as an SPA, you can use requests-html, which can execute the page's JavaScript before you read the HTML.

Installation

pip install requests-html

How to use

`main.py`


# -*- coding: utf-8 -*-
import requests
from requests_html import HTMLSession


def main_render_javascript_page():
    # Fetch the page, execute its JavaScript, then read the rendered HTML.
    url = 'https://hogehoge'
    session = HTMLSession()
    r = session.get(url)
    r.html.render()  # downloads Chromium the first time it is called
    body_text = r.html.find('body', first=True).text
    print(body_text)


def main_normal_page():
    # Plain requests: prints the HTML exactly as the server sent it.
    url = 'https://hogehoge'
    r = requests.get(url)
    print(r.text)


if __name__ == '__main__':
    main_normal_page()
    main_render_javascript_page()
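
If the page needs a moment to finish drawing, or you want to read something other than the whole body, the same steps can be wrapped in a small helper. The following is only a sketch under assumptions: the function name `fetch_rendered_text`, its parameters, and the default values of 1 second of sleep and a 20 second timeout are made up for illustration and are not from the original script. It uses the `sleep` and `timeout` arguments of `render()` and closes the session it creates.

```python
def fetch_rendered_text(url, selector="body", sleep=1, timeout=20, session=None):
    """Return the text of the first element matching `selector` after JavaScript has run.

    Hypothetical helper: the name, parameters, and defaults are illustrative.
    """
    owns_session = session is None
    if owns_session:
        # Imported lazily so the helper can also be used with a stand-in session.
        from requests_html import HTMLSession
        session = HTMLSession()
    try:
        r = session.get(url)
        r.raise_for_status()
        # Execute the page's JavaScript; `sleep` waits after the initial render,
        # `timeout` bounds how long the page load may take.
        r.html.render(sleep=sleep, timeout=timeout)
        element = r.html.find(selector, first=True)
        return element.text if element is not None else None
    finally:
        # Only close a session that this function created itself.
        if owns_session:
            session.close()
```

For example, `fetch_rendered_text('https://hogehoge', selector='#app')` would return the rendered text of the element with id `app`, or `None` if no such element exists. The right `sleep` value depends on the site, so treat the default as a starting point.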

Official documentation

https://requests.readthedocs.io/projects/requests-html/en/latest/

Reference sites

https://dev.classmethod.jp/articles/python-asyncio/

https://blog.ikedaosushi.com/entry/2019/09/15/162445
