In another article, I created a page that keeps loading forever. Because the response never finishes, you can't fetch the source of such a page with a plain curl or requests call, the tools most often used for this, so it needs special handling. This time, I will write code that fetches the contents of that page.
* Python 3.8.1
The script below fetches the source of the page served by the code from that article. When either the time limit or the byte limit is reached, it prints whatever it has received so far.
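The page's own code lives in the other article and is not reproduced here. For experimenting without it, the following is a minimal stand-in sketch, assuming the page simply streams `<p>Hello World ! N</p>` fragments forever. The fragment format is taken from the output shown later in this post; the one-second interval and the names `InfinitePageHandler` and `make_server` are my own assumptions, not the original code.

```python
import itertools
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class InfinitePageHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the infinite page (not the original code)."""

    interval = 1.0  # assumed pause between fragments, in seconds

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        # No Content-Length header: with HTTP/1.0 the body only ends when
        # the connection closes, and this handler never closes it itself.
        self.end_headers()
        try:
            for i in itertools.count():
                self.wfile.write(f"<p>Hello World ! {i}</p>".encode())
                self.wfile.flush()
                time.sleep(self.interval)
        except ConnectionError:
            pass  # the client disconnected

    def log_message(self, format, *args):
        pass  # keep the terminal quiet


def make_server(port=8000):
    """Create the stand-in server; call .serve_forever() on the result to run it."""
    return ThreadingHTTPServer(("localhost", port), InfinitePageHandler)
```

Adding `make_server().serve_forever()` at the bottom of the file and running it in one terminal should give the fetching script something to connect to at http://localhost:8000.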
get_inf_page.py
import requests
import timeout_decorator

r_bytes = b""  # received bytes; global so they survive a timeout in load_bytes


def main():
    url = "http://localhost:8000"
    r = requests.get(url, stream=True, timeout=20)  # stream=True: return before the body ends

    byte_limit = 30
    @timeout_decorator.timeout(100)
    def load_bytes(r):
        global r_bytes
        # iter_content() yields one byte at a time by default
        for chunk in r.iter_content():
            r_bytes += chunk
            if len(r_bytes) % 500 == 0:
                print(f"loaded:{len(r_bytes)}/{byte_limit}")
            if len(r_bytes) > byte_limit:
                r.close()
                print("reached size limit")
                break

    try:
        load_bytes(r)
    except timeout_decorator.timeout_decorator.TimeoutError:
        print("timeout")

    print(r_bytes)


if __name__ == "__main__":
    main()
Run the code above while the code from that article is running in another terminal. The output looks like this:
reached size limit
b'<p>Hello World ! 0</p><p>Hello '
Next, change the 11th and 12th lines of get_inf_page.py (the byte_limit assignment and the timeout decorator) to the following, and run it again in the same way.
byte_limit = 1000
@timeout_decorator.timeout(5)
This time, only what was received in the 5 seconds after startup is printed.
timeout
b'<p>Hello World ! 0</p><p>Hello World ! 1</p><p>Hello World ! 2</p>'
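The same two limits can also be enforced without a decorator by checking a deadline inside the read loop. The following is a minimal sketch of that alternative, not the approach used above. `read_limited` and `fake_infinite_page` are hypothetical names, and the fake generator stands in for `r.iter_content()` so that the example runs without a server. One caveat: the deadline is only checked when a chunk arrives, so a server that stalls completely would still need the `timeout=` argument of `requests.get`.

```python
import itertools
import time


def read_limited(chunks, byte_limit, time_limit):
    """Collect byte chunks until the source ends, more than byte_limit
    bytes have arrived, or time_limit seconds have passed."""
    deadline = time.monotonic() + time_limit
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        if len(buf) > byte_limit:
            return bytes(buf), "reached size limit"
        if time.monotonic() >= deadline:
            return bytes(buf), "timeout"
    return bytes(buf), "finished"


def fake_infinite_page(delay=0.0):
    """Hypothetical stand-in for the page: yields one byte at a time, forever."""
    for i in itertools.count():
        for b in f"<p>Hello World ! {i}</p>".encode():
            time.sleep(delay)
            yield bytes([b])


if __name__ == "__main__":
    data, reason = read_limited(fake_infinite_page(), byte_limit=30, time_limit=100)
    print(reason)
    print(data)
```

With a real response, the call would look like `read_limited(r.iter_content(), 30, 100)`. A possible advantage of this form is that it does not depend on signals, which, as far as I know, is how timeout_decorator works by default.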
That's all.