There was an article about this a while back: http://postd.cc/fast-scraping-in-python-with-asyncio/. Let's redo it with the new Python 3.5 syntax.
Ah, the original article scraped HTML, but I wanted to hit a set of URLs that all return responses of roughly the same size, so I used RSS feeds instead. So it's not really scraping at all. Well, what I'm doing is essentially the same thing...
import asyncio
import aiohttp
import feedparser
import time

async def print_first_title(url):
    # Fetch the feed asynchronously and print the title of its first entry
    response = await aiohttp.request('GET', url)
    body = await response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)

rss = []  # A list of RSS URLs; I used about 10 Yahoo! News feeds

if __name__ == '__main__':
    start = time.time()
    loop = asyncio.get_event_loop()
    # Schedule all the fetches concurrently and wait until they all finish
    loop.run_until_complete(asyncio.wait([print_first_title(url) for url in rss]))
    end = time.time()
    print("{0} ms".format((end - start) * 1000))
390.4871940612793 ms
Well, it's certainly easier to read. But on its own, this just looks like new syntax for what the @asyncio.coroutine decorator and yield from were already doing; so far that's all I'm getting out of it. Is that really so amazing?
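For reference, here is roughly what the same coroutine looks like in the pre-3.5 style (a sketch, assuming the same aiohttp API as the snippet above):

import asyncio
import aiohttp
import feedparser

# The pre-3.5 spelling: a generator-based coroutine marked with the
# @asyncio.coroutine decorator, using "yield from" instead of "await"
@asyncio.coroutine
def print_first_title(url):
    response = yield from aiohttp.request('GET', url)
    body = yield from response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)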
If anything, the real find here is aiohttp, the library the original article also uses to make the networking part asynchronous. It's extremely convenient! I had no idea it existed!!!
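(Note that the module-level aiohttp.request('GET', ...) coroutine above is the API of the aiohttp releases from back then; current aiohttp routes requests through a ClientSession instead. A minimal sketch, assuming a recent aiohttp:)

import aiohttp
import feedparser

async def print_first_title(url):
    # Current aiohttp wants requests made through a ClientSession,
    # used as an async context manager
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            body = await response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)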
Now for a speed comparison against a version that doesn't use coroutines:
import urllib.request
import feedparser
import time

def print_first_title(url):
    # Fetch the feed synchronously and print the title of its first entry
    response = urllib.request.urlopen(url)
    body = response.read()
    d = feedparser.parse(body)
    print(d.entries[0].title)

rss = []  # The same list of RSS URLs as above

if __name__ == '__main__':
    start = time.time()
    for url in rss:  # One request at a time, each blocking until it completes
        print_first_title(url)
    end = time.time()
    print("{0} ms".format((end - start) * 1000))
1424.4353771209717 ms
Slow! Makes sense: the synchronous version waits for each request to finish before starting the next, so the latencies add up, while the coroutine version overlaps all that waiting on I/O. That's it!
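As an aside, on Python 3.7 and later the event-loop boilerplate in the first snippet gets simpler too. A minimal sketch, assuming the same print_first_title coroutine and rss list as above:

import asyncio

async def main():
    # gather() runs all the coroutines concurrently, much like the
    # asyncio.wait() call above but without touching the loop directly
    await asyncio.gather(*(print_first_title(url) for url in rss))

if __name__ == '__main__':
    asyncio.run(main())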