There was an article about this a while back: http://postd.cc/fast-scraping-in-python-with-asyncio/. Let's redo it with the new Python 3.5 syntax.
Ah, the original article scraped HTML, but I wanted to hit a set of URLs that all return responses of roughly the same size, so I used RSS feeds instead. So it's not really scraping at all. Well, what I'm doing is essentially the same thing...
import asyncio
import aiohttp
import feedparser
import time

async def print_first_title(url):
    # Fetch the feed asynchronously and print the title of its first entry
    response = await aiohttp.request('GET', url)
    body = await response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)

rss = []  # A list of RSS URLs; I used about 10 Yahoo! News feeds

if __name__ == '__main__':
    start = time.time()
    loop = asyncio.get_event_loop()
    # Schedule all the fetches concurrently and wait until they all finish
    loop.run_until_complete(asyncio.wait([print_first_title(url) for url in rss]))
    end = time.time()
    print("{0} ms".format((end - start) * 1000))
390.4871940612793 ms
Well, it's certainly easier to read. But on its own, this just looks like new syntax for what the @asyncio.coroutine decorator and yield from were already doing; so far that's all I'm getting out of it. Is that really so amazing?
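For reference, here is roughly what the same coroutine looks like in the pre-3.5 style (a sketch, assuming the same aiohttp API as the snippet above):

import asyncio
import aiohttp
import feedparser

# The pre-3.5 spelling: a generator-based coroutine marked with the
# @asyncio.coroutine decorator, using "yield from" instead of "await"
@asyncio.coroutine
def print_first_title(url):
    response = yield from aiohttp.request('GET', url)
    body = yield from response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)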
If anything, the real find here is aiohttp, the library the original article also uses to make the networking part asynchronous. It's extremely convenient! I had no idea it existed!!!
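(Note that the module-level aiohttp.request('GET', ...) coroutine above is the API of the aiohttp releases from back then; current aiohttp routes requests through a ClientSession instead. A minimal sketch, assuming a recent aiohttp:)

import aiohttp
import feedparser

async def print_first_title(url):
    # Current aiohttp wants requests made through a ClientSession,
    # used as an async context manager
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            body = await response.text()
    d = feedparser.parse(body)
    print(d.entries[0].title)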
Now for a speed comparison against a version that doesn't use coroutines:
import urllib.request
import feedparser
import time

def print_first_title(url):
    # Fetch the feed synchronously and print the title of its first entry
    response = urllib.request.urlopen(url)
    body = response.read()
    d = feedparser.parse(body)
    print(d.entries[0].title)

rss = []  # The same list of RSS URLs as above

if __name__ == '__main__':
    start = time.time()
    for url in rss:  # One request at a time, each blocking until it completes
        print_first_title(url)
    end = time.time()
    print("{0} ms".format((end - start) * 1000))
1424.4353771209717 ms
Slow! Makes sense: the synchronous version waits for each request to finish before starting the next, so the latencies add up, while the coroutine version overlaps all that waiting on I/O. That's it!
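As an aside, on Python 3.7 and later the event-loop boilerplate in the first snippet gets simpler too. A minimal sketch, assuming the same print_first_title coroutine and rss list as above:

import asyncio

async def main():
    # gather() runs all the coroutines concurrently, much like the
    # asyncio.wait() call above but without touching the loop directly
    await asyncio.gather(*(print_first_title(url) for url in rss))

if __name__ == '__main__':
    asyncio.run(main())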