Thanks to the script I made the other day, "Automatically pick up cat images with feedparser", my cat image collection has been growing day after day...
However, it runs slowly. It just isn't fast enough, so I'll try modifying it to see whether it can be made faster.
First, I measured how slow the current script is. Below is the source with the measurement logic added.
get_cat.py
# -*- coding: utf-8 -*-
import feedparser
import urllib
import os
import time


def download_picture(q, count=10):
    u"""Fetch count images of q."""
    count = str(count)
    feed = feedparser.parse("https://picasaweb.google.com/data/feed/base/all?q=" + q + "&max-results=" + count)
    # Save into a directory named after the query, next to this script.
    save_dir = os.path.join(os.path.dirname(__file__), q)
    if not os.path.exists(save_dir):
        os.mkdir(save_dir)
    for entry in feed['entries']:
        url = entry.content[0].src
        urllib.urlretrieve(url, os.path.join(save_dir, os.path.basename(url)))
        print('download:' + url)


if __name__ == "__main__":
    # Measure the wall-clock time of the whole download.
    time1 = time.time()
    download_picture("cat", 10)
    time1_2 = str(time.time() - time1)
    print("complete!(" + time1_2 + ")")
Result
complete!(6.05635690689)
It took 6 seconds to download 10 images. I tried it several times, but it always came out at about 6 seconds. At this rate I can only download about 144,000 images in 24 hours (86,400 seconds / 6 seconds x 10 images). That's far from ideal.
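The measurement in get_cat.py is simply time.time() taken before and after the call. As a rough sketch, the same pattern can be wrapped in a small helper so that different versions of a script are timed the same way. This is written for Python 3 (the scripts in this post are Python 2), and the names timed and slow_job are placeholders introduced here, not part of the original script.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Time the enclosed block and store the elapsed seconds under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


def slow_job():
    """Stand-in for download_picture("cat", 10); it only sleeps briefly."""
    time.sleep(0.05)


results = {}
with timed("slow_job", results):
    slow_job()
print("complete!(" + str(results["slow_job"]) + ")")
```

Replacing slow_job() with the real download call would give the same kind of number as the script above, and keeping the results in a dictionary makes it easy to compare several runs side by side.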
httplib2
I heard through the grapevine about a library called httplib2. If anything, it seems to be more of a standard than the standard library. Features of httplib2 (↓).
Isn't it wonderful? Let's use it now.
$ sudo pip install httplib2
Installation is quick. Then I modify the program.
get_cat2.py
# -*- coding: utf-8 -*-
import feedparser
import httplib2
import os
import time


def download_picture(q, count=10):
    u"""Fetch count images of q."""
    count = str(count)
    feed = feedparser.parse("https://picasaweb.google.com/data/feed/base/all?q=" + q + "&max-results=" + count)
    # Passing a directory name makes httplib2 cache responses on disk.
    http = httplib2.Http(".cache")
    # Create the destination directory if it does not exist yet.
    save_dir = os.path.join(os.path.dirname(__file__), q)
    if not os.path.exists(save_dir):
        os.mkdir(save_dir)
    for entry in feed['entries']:
        url = entry.content[0].src
        # http.request() returns (response, content); content is the image body.
        response, content = http.request(url)
        with open(os.path.join(save_dir, os.path.basename(url)), 'wb') as f:
            f.write(content)
        print('download:' + url)


if __name__ == "__main__":
    time1 = time.time()
    download_picture("cat", 10)
    time1_2 = str(time.time() - time1)
    print("complete!(" + time1_2 + ")")
It looks a little odd in places.
Run it right away!
Execution result
First, the original program. About what you'd expect.
complete!(5.79861092567)
The revised version, first run. Hmm?
complete!(5.06348490715)
The revised version, second run. Oh...
complete!(1.20627403259)
The revised version, third run. Whoa!
complete!(0.768098115921)
It's fast! This is practical enough. It's no exaggeration to say it finishes in about a second.
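The drop from about 5 seconds to about 1 second between runs is what you would expect if httplib2 is answering from its on-disk cache: passing a directory name to httplib2.Http turns caching on, and the response object returned by request() carries a fromcache flag. The post does not confirm that this is the cause, so treat it as an assumption. The sketch below (Python 3; save_image is a hypothetical helper, not part of get_cat2.py) saves one image and reports whether the response came from the cache. It accepts any object with an httplib2-style request(url) method, so the block itself does not import httplib2.

```python
import os


def save_image(http, url, dest_dir):
    """Fetch url with an httplib2-style client and save the body in dest_dir.

    Returns (path, from_cache). from_cache is the response's fromcache
    attribute when present, otherwise False.
    """
    os.makedirs(dest_dir, exist_ok=True)
    # httplib2's Http.request() returns a (response, content) pair.
    response, content = http.request(url)
    path = os.path.join(dest_dir, os.path.basename(url))
    with open(path, "wb") as f:
        f.write(content)
    return path, bool(getattr(response, "fromcache", False))
```

With the real library this would be called roughly as save_image(httplib2.Http(".cache"), url, "cat"). Printing the second return value on each run shows whether the cache was hit, which would confirm or rule out caching as the reason for the faster runs.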
**Conclusion: Use httplib2 to improve cat image collection.**