Overview

In an earlier article, I created a page that loads infinitely. The usual curl or requests call can't get the source of such a page, because the response never finishes, so something extra is needed. This time, I will write code that gets the contents of that page.
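The server code for the infinite-loading page is not reproduced here. For readers who want something to test against, the following is a minimal stand-in built on the standard library. It is only a sketch under assumptions: the names `InfiniteHandler` and `make_server` are hypothetical, and the 1.5-second interval is an assumed value, not taken from the original page.

```python
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InfiniteHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in: sends one <p> element per interval, forever."""

    interval = 1.5  # seconds between paragraphs (assumed value)

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # No Content-Length is sent, so the client keeps reading
        # until the connection is closed.
        self.end_headers()
        count = 0
        try:
            while True:
                self.wfile.write(f"<p>Hello World ! {count}</p>".encode())
                self.wfile.flush()
                count += 1
                time.sleep(self.interval)
        except ConnectionError:
            pass  # the client closed the connection

    def log_message(self, format, *args):
        pass  # keep the terminal quiet


def make_server(port=8000):
    """Build a server bound to localhost; each request gets its own thread."""
    return ThreadingHTTPServer(("localhost", port), InfiniteHandler)
```

To serve the page, call `make_server().serve_forever()` and leave it running in its own terminal.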

Environment

* Python 3.8.1

Code

The script below gets the source of the page served by the code from the earlier article on infinite loading. When either the time limit or the byte limit is reached, it prints whatever has been received so far.

`get_inf_page.py`


import requests
import timeout_decorator

r_bytes = b""  # module-level so the received bytes survive a timeout in load_bytes
def main():
    url = "http://localhost:8000"
    # stream=True returns once the headers arrive; the body is read lazily
    r = requests.get(url, stream=True, timeout=20)

    byte_limit = 30
    @timeout_decorator.timeout(100)
    def load_bytes(r):
        global r_bytes
        # iter_content() yields one byte at a time by default (chunk_size=1)
        for chunk in r.iter_content():
            r_bytes += chunk
            if len(r_bytes) % 500 == 0:
                print(f"loaded:{len(r_bytes)}/{byte_limit}")
            if len(r_bytes) > byte_limit:
                r.close()
                print("reached size limit")
                break

    try:
        load_bytes(r)
    except timeout_decorator.timeout_decorator.TimeoutError:
        print("timeout")
        r.close()  # release the connection that is still streaming

    print(r_bytes)

if __name__ == "__main__":
    main()
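The same two limits can also be applied without `timeout_decorator`, by checking a deadline each time a chunk arrives. The following is a sketch of that alternative, not the method used in the script above; `read_with_limits` is a hypothetical helper. One assumption to keep in mind: the deadline is only checked between chunks, so it cannot interrupt a read that is blocked waiting for data. In that case the `timeout=` argument of `requests.get` is what bounds each individual read.

```python
import time


def read_with_limits(chunks, byte_limit, time_limit):
    """Collect bytes from an iterable of chunks until a limit is hit.

    Returns (data, reason), where reason is "size", "time" or "eof".
    """
    deadline = time.monotonic() + time_limit
    data = bytearray()
    for chunk in chunks:
        data += chunk
        if len(data) > byte_limit:
            return bytes(data), "size"
        # Checked only after a chunk arrives, never during a blocked read.
        if time.monotonic() >= deadline:
            return bytes(data), "time"
    return bytes(data), "eof"
```

With the streaming response from the script, this would be called as `read_with_limits(r.iter_content(), 30, 100)`, followed by `r.close()`.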

Operation check (stop when the byte limit is exceeded)

Run the script above while the server code for the infinite-loading page is running in another terminal. The output looks like this.

reached size limit
b'<p>Hello World ! 0</p><p>Hello '
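The output above is cut off in the middle of the second paragraph, because the limit is counted in bytes and not in elements. If only complete elements are wanted, one option is to filter the partial bytes afterwards. A minimal sketch, assuming the flat `<p>...</p>` structure shown in the output; `complete_paragraphs` is a hypothetical helper and not part of the original script:

```python
import re
from typing import List


def complete_paragraphs(partial: bytes) -> List[str]:
    """Return the text of every <p> element that was received in full."""
    # errors="ignore" drops a multi-byte character that was cut in half
    text = partial.decode("utf-8", errors="ignore")
    # Non-greedy match: an unclosed trailing <p> simply does not match
    return re.findall(r"<p>(.*?)</p>", text)


print(complete_paragraphs(b"<p>Hello World ! 0</p><p>Hello "))
# prints ['Hello World ! 0']
```

A regular expression is enough for this flat structure; nested or attribute-bearing markup would call for a real HTML parser.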

Operation check (stop when the time limit is exceeded)

Change the `byte_limit` line and the `@timeout_decorator.timeout` line (the 11th and 12th lines) to the following, then check the operation in the same way as above.

    byte_limit = 1000
    @timeout_decorator.timeout(5)

Only the content received within the first 5 seconds after startup is printed.

timeout
b'<p>Hello World ! 0</p><p>Hello World ! 1</p><p>Hello World ! 2</p>'

That's all.
