This is just a personal note. I'll write a Python program that downloads images quickly using a library called requests. `urllib.request` seems useful in Python 3, but apparently it can't be used in Python 2 (I haven't researched this thoroughly), so I used requests instead. requests lets you configure various things such as cookies, but here I'll write a simple program that just accesses each URL and downloads the image.
Official: python-requests
$ pip install requests
$ python
>>> import requests
>>> url = "http://docs.python-requests.org/en/master/#"
>>> res = requests.get(url)
>>> res.status_code
200
>>> res.headers["content-type"]
'text/html'
>>> res.content
'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n <head>\n...
>>> res.text
u'<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"\n "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">\n\n\n<html xmlns="http://www.w3.org/1999/xhtml">\n <head>\n ...
See The User Guide for more information.
To add query parameters to the URL, pass them as a dictionary to the params argument.
res = requests.get('http://httpbin.org/get', params={'key':'value'})
print(res.url) #=> http://httpbin.org/get?key=value
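Roughly speaking, requests turns the params dictionary into a URL-encoded query string and appends it to the URL. The Python 3 standard library can build the same kind of string with `urllib.parse.urlencode`. The sketch below is only an illustration of the resulting URL, not requests' own code; `build_url` is a hypothetical helper, and it assumes the base URL has no query string of its own.

```python
from urllib.parse import urlencode


def build_url(base_url, params):
    """Append params (a dict) to base_url as a query string.

    Hypothetical helper: assumes base_url has no existing query string.
    """
    if not params:
        return base_url
    # urlencode percent-encodes values and joins pairs with "&"
    return base_url + "?" + urlencode(params)


print(build_url("http://httpbin.org/get", {"key": "value"}))
# http://httpbin.org/get?key=value
```

Note that `urlencode` encodes a space as `+`, so `{"q": "a b"}` becomes `?q=a+b`.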
For POST and PUT, form data can be sent as a dictionary via the data argument.
res = requests.post('http://httpbin.org/post', data = {'key':'value'})
res = requests.put('http://httpbin.org/put', data = {'key':'value'})
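When data is a dictionary, requests sends it as an HTML form: the body is URL-encoded and the Content-Type header is `application/x-www-form-urlencoded`. A minimal Python 3 sketch of just that encoding step, using only the standard library (`encode_form` is a hypothetical helper, not part of requests):

```python
from urllib.parse import urlencode


def encode_form(data):
    """Return (body_bytes, headers) for a form-encoded POST/PUT body."""
    # urlencode percent-encodes non-ASCII text, so the result is pure ASCII
    body = urlencode(data).encode("ascii")
    headers = {"Content-Type": "application/x-www-form-urlencoded"}
    return body, headers


body, headers = encode_form({"key": "value"})
print(body)  # b'key=value'
print(headers["Content-Type"])
```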
requests provides a separate function for each HTTP method.
res = requests.get('http://httpbin.org/get')
res = requests.post('http://httpbin.org/post', data = {'key':'value'})
res = requests.put('http://httpbin.org/put', data = {'key':'value'})
res = requests.delete('http://httpbin.org/delete')
res = requests.head('http://httpbin.org/get')
res = requests.options('http://httpbin.org/get')
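For comparison with the `urllib.request` module mentioned at the start: Python 3's `urllib.request` has no per-method functions like `requests.put`. Instead you build a `Request` object and pass the method name. The sketch below only constructs the requests and never sends them; `make_request` is a hypothetical helper, and the `method` argument assumes Python 3.3 or later.

```python
from urllib.parse import urlencode
from urllib.request import Request


def make_request(method, url, data=None):
    """Build (but do not send) a urllib Request for the given HTTP method.

    data, if given, is URL-encoded as a form body.
    """
    body = urlencode(data).encode("ascii") if data is not None else None
    return Request(url, data=body, method=method)


get_req = make_request("GET", "http://httpbin.org/get")
put_req = make_request("PUT", "http://httpbin.org/put", {"key": "value"})
del_req = make_request("DELETE", "http://httpbin.org/delete")
print(get_req.get_method(), put_req.get_method(), del_req.get_method())
# GET PUT DELETE
# To actually send one: urllib.request.urlopen(put_req)
```

This is the extra bookkeeping that requests hides behind `requests.get`, `requests.put`, and so on.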
The response object exposes the following attributes.
res = requests.get('http://httpbin.org/get')
# HTTP status code
print(res.status_code)
# Check the Content-Type response header
print(res.headers["content-type"])
# Response body as raw bytes
print(res.content)
# Response body decoded as text, and the encoding used to decode it
print(res.text)
print(res.encoding)
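In effect, `res.text` is `res.content` decoded with `res.encoding`. requests takes the encoding from the `charset` parameter of the Content-Type header and, for text types without a charset (like the `'text/html'` in the session above), falls back to ISO-8859-1; when no encoding is found at all, it guesses one from the body. The Python 3 sketch below approximates only the header-based part. The helper names are hypothetical, and defaulting to UTF-8 when the encoding is None is an assumption of this sketch, not requests' behavior.

```python
def encoding_from_content_type(content_type):
    """Return the charset from a Content-Type value, or a fallback.

    Approximation: an explicit charset wins, text/* defaults to
    ISO-8859-1, and anything else returns None.
    """
    if not content_type:
        return None
    parts = [p.strip() for p in content_type.split(";")]
    mime_type = parts[0].lower()
    for param in parts[1:]:
        name, _, value = param.partition("=")
        if name.strip().lower() == "charset":
            return value.strip().strip("'\"")
    if mime_type.startswith("text/"):
        return "ISO-8859-1"
    return None


def decode_body(content, encoding):
    """Decode raw bytes roughly like res.text; assumes UTF-8 if encoding is None."""
    # errors="replace" turns undecodable bytes into U+FFFD instead of raising
    return content.decode(encoding or "utf-8", errors="replace")


enc = encoding_from_content_type("text/html; charset=utf-8")
print(enc)  # utf-8
print(decode_body(b"caf\xc3\xa9", enc))  # café
```

Decoding the same bytes with the ISO-8859-1 fallback would give `cafÃ©`, which is why a missing charset header can produce garbled `res.text`.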
The input is a text file `input.txt` containing one URL per line, and the images are saved to the output directory `images/` as 0.jpg, 1.jpg, 2.jpg, and so on (the extension is taken from each URL).
Some of the code is deliberately a little odd, simply because I like it that way.
import os
import requests
# Download an image and return its raw bytes
def download_image(url, timeout=10):
    response = requests.get(url, allow_redirects=False, timeout=timeout)
    if response.status_code != 200:
        # status_code is an int, so convert it before concatenating
        e = Exception("HTTP status: " + str(response.status_code))
        raise e
    # Use .get() so a missing Content-Type header does not raise KeyError
    content_type = response.headers.get("content-type", "")
    if "image" not in content_type:
        e = Exception("Content-Type: " + content_type)
        raise e
    return response.content
# Decide the file name of the image: <number><extension from the URL>
def make_filename(base_dir, number, url):
    ext = os.path.splitext(url)[1]  # Get the extension
    filename = str(number) + ext  # Add the extension to the number to make the file name
    fullpath = os.path.join(base_dir, filename)
    return fullpath
# Save the image bytes to a file
def save_image(filename, image):
    with open(filename, "wb") as fout:
        fout.write(image)
# Main
if __name__ == "__main__":
    urls_txt = "input.txt"
    images_dir = "images"
    idx = 0
    # Create the output directory if it does not exist yet
    if not os.path.isdir(images_dir):
        os.makedirs(images_dir)
    with open(urls_txt, "r") as fin:
        for line in fin:
            url = line.strip()
            if not url:  # Skip blank lines
                continue
            filename = make_filename(images_dir, idx, url)
            print(url)
            try:
                image = download_image(url)
                save_image(filename, image)
                idx += 1  # Only advance the number after a successful save
            except KeyboardInterrupt:
                break
            except Exception as err:
                print(err)
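Two spots in the program above could be tightened. `make_filename` runs `os.path.splitext` on the whole URL, so a query string leaks into the extension: for `http://example.com/a/cat.jpg?size=large` the extension becomes `.jpg?size=large`. Also, `download_image` keeps the whole image in memory; requests can stream the body instead if you pass `stream=True` to `requests.get` and read `response.iter_content(chunk_size=...)`. The Python 3 sketch below shows both ideas. `make_filename_from_url` and `save_chunks` are hypothetical names, and the `.jpg` default extension is an assumption.

```python
import os
from urllib.parse import urlparse


def make_filename_from_url(base_dir, number, url, default_ext=".jpg"):
    """Build base_dir/<number><ext>, taking ext from the URL path only.

    Query strings and fragments are ignored; default_ext (an assumption)
    is used when the path has no extension.
    """
    path = urlparse(url).path
    ext = os.path.splitext(path)[1] or default_ext
    return os.path.join(base_dir, str(number) + ext)


def save_chunks(filename, chunks):
    """Write an iterable of byte chunks to filename; return bytes written.

    With requests, chunks would typically be
    requests.get(url, stream=True).iter_content(chunk_size=8192).
    """
    total = 0
    with open(filename, "wb") as fout:
        for chunk in chunks:
            if chunk:  # skip empty keep-alive chunks
                fout.write(chunk)
                total += len(chunk)
    return total


print(make_filename_from_url("images", 0, "http://example.com/a/cat.jpg?size=large"))
```

On Linux or macOS this prints `images/0.jpg`.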