When you collect images from the Web with a crawler or similar tool, you sometimes want only the size (resolution) of an image, not the whole file. The dimensions are stored near the start of the file, so reading just the header gives you that information without downloading everything.
The source code is below. I tested it with Python 3.4.2 on OS X 10.10 (Yosemite). It supports the GIF, JPEG, and PNG formats and needs no additional libraries such as OpenCV.
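One way to make the "header only" idea explicit at the HTTP level is a Range request. This is a different technique from simply reading the first bytes of a response and then closing it. The sketch below only builds such a request. The helper name `build_header_request`, the example URL, and the 1024-byte default are assumptions for illustration; JPEG files with large metadata segments may need more bytes, and servers that ignore the Range header will send the full body anyway.

```python
import urllib.request


def build_header_request(url, num_bytes=1024):
    """Build a request for only the first num_bytes of the resource.

    Hypothetical helper. Servers that ignore the Range header reply with
    the full body (status 200) instead of 206 Partial Content.
    """
    headers = {'Range': 'bytes=0-{}'.format(num_bytes - 1)}
    return urllib.request.Request(url, headers=headers)


req = build_header_request('http://example.com/image.png')
print(req.get_header('Range'))  # bytes=0-1023
# Passing req to urllib.request.urlopen would then fetch at most 1024 bytes
# from a server that honors Range requests.
```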
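import struct
import urllib.request


def parse_jpeg(res):
    # Walk the segments after the SOI marker until a start-of-frame (SOF)
    # segment, which holds the image dimensions, is found.
    while not res.closed:
        header = res.read(4)
        if len(header) < 4:
            break  # data ended before an SOF segment appeared
        (marker, size) = struct.unpack('>2sH', header)
        body = res.read(size - 2)
        # SOF0 (baseline) or SOF2 (progressive)
        if marker in (b'\xff\xc0', b'\xff\xc2'):
            # 1-byte precision, then unsigned 16-bit height and width
            (_, height, width) = struct.unpack('>BHH', body[:5])
            return (width, height)
    return (-1, -1)


def parse_png(res):
    # Skip the rest of the signature, the chunk length and 'IHDR' (14 bytes),
    # then read big-endian 32-bit width and height.
    (_, width, height) = struct.unpack(">14sII", res.read(22))
    return (width, height)


def parse_gif(res):
    # Skip the rest of 'GIF87a'/'GIF89a' (4 bytes), then read little-endian
    # 16-bit width and height.
    (_, width, height) = struct.unpack("<4sHH", res.read(8))
    return (width, height)


def get_image_size(url):
    res = urllib.request.urlopen(url)
    size = (-1, -1)
    if res.status == 200:
        # The first two bytes are enough to tell the three formats apart.
        signature = res.read(2)
        if signature == b'\xff\xd8':  # jpg
            size = parse_jpeg(res)
        elif signature == b'\x89\x50':  # png
            size = parse_png(res)
        elif signature == b'\x47\x49':  # gif
            size = parse_gif(res)
    res.close()
    return size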
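To check the byte offsets that `parse_png` and `parse_gif` rely on without touching the network, the following sketch builds minimal PNG and GIF headers in memory and decodes them. The helper `size_from_header` and the sample dimensions are made up for illustration. It uses equivalent offsets counted from the start of the file rather than from after the two signature bytes.

```python
import struct


def size_from_header(data):
    """Return (width, height) from the first bytes of a PNG or GIF file.

    Hypothetical helper for illustration; it works on bytes already in
    memory instead of an HTTP response.
    """
    if data[:2] == b'\x89\x50':  # PNG
        # 8-byte signature + 4-byte chunk length + 4-byte 'IHDR' = 16 bytes,
        # followed by big-endian 32-bit width and height.
        (_, width, height) = struct.unpack('>16sII', data[:24])
        return (width, height)
    if data[:2] == b'\x47\x49':  # GIF
        # 6-byte 'GIF87a'/'GIF89a' signature, then little-endian 16-bit
        # width and height.
        (_, width, height) = struct.unpack('<6sHH', data[:10])
        return (width, height)
    return (-1, -1)


# Hand-built headers with made-up dimensions.
png_header = (b'\x89PNG\r\n\x1a\n'               # signature
              + struct.pack('>I', 13) + b'IHDR'  # IHDR chunk length and type
              + struct.pack('>II', 640, 480))    # width, height
gif_header = b'GIF89a' + struct.pack('<HH', 320, 200)

print(size_from_header(png_header))  # (640, 480)
print(size_from_header(gif_header))  # (320, 200)
```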
Add error handling as appropriate for your use case.
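As one possible starting point for error handling, the sketch below wraps a size lookup so that network failures and truncated headers yield `None` instead of an exception. `safe_image_size` and its `lookup` parameter are hypothetical names; `lookup` stands for any function that behaves like `get_image_size` above.

```python
import struct
import urllib.error


def safe_image_size(url, lookup):
    """Call lookup(url) and return its (width, height), or None on failure.

    Hypothetical wrapper: lookup is assumed to return (-1, -1) for
    unsupported formats and may raise on network errors or truncated data.
    """
    try:
        size = lookup(url)
    except (urllib.error.URLError, struct.error, ValueError, OSError):
        # URLError/HTTPError: the request failed; struct.error: the header
        # was shorter than expected; ValueError: malformed URL.
        return None
    if size is None or size == (-1, -1):
        return None  # format not recognized
    return size


# Example with a stand-in lookup that always fails, so no network is needed.
def always_fails(url):
    raise urllib.error.URLError('unreachable')


print(safe_image_size('http://example.invalid/a.png', always_fails))  # None
```

Returning `None` keeps the caller's check simple; a crawler that needs to tell a network failure from an unsupported format would want to log or re-raise instead.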
Reference: Darkside Communication Group, "File Format Encyclopedia" (ISBN 4-87310-064-X)