A blog I wrote long ago had no export function. I built a list of the image URLs with curl and grep, but since each URL has a form like /Img?hogehoge, saving them with wget -i just produces file names like Img0.1 and Img0.2.
If I searched carefully there might be a curl or wget option that handles this nicely, but searching felt like more trouble than writing a script.
The script takes the file name's date part from the Last-Modified header and its extension from the Content-Type header. Some files share the same modification time, so a serial number is appended as well.
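The naming scheme can be illustrated in isolation. This is just a sketch using a hypothetical pair of response headers (the sample date and MIME type are made up, not from the blog in question):

```python
import datetime

# Hypothetical sample headers, only to illustrate the naming scheme
headers = {
    'Last-Modified': 'Wed, 21 Oct 2015 07:28:00 GMT',
    'Content-Type': 'image/jpeg',
}

# Extension is the subtype of the MIME type: "image/jpeg" -> "jpeg"
ext = headers['Content-Type'].split('/')[1]
# Parse the RFC-style HTTP date into a datetime
lm = datetime.datetime.strptime(
    headers['Last-Modified'], '%a, %d %b %Y %H:%M:%S GMT')
# Timestamp plus a zero-padded serial number (0 here) plus the extension
fname = lm.strftime('%Y%m%d-%H%M%S') + ('-%03d.' % 0) + ext
print(fname)  # 20151021-072800-000.jpeg
```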
cat url_list.txt | python get-contents.py
get-contents.py
# -*- coding: utf-8 -*-
import sys
import datetime
import requests

cnt = 0
for line in sys.stdin:
    r = requests.get(line.strip())
    # print(r.headers)
    # Extension from the Content-Type header, e.g. "image/jpeg" -> "jpeg"
    ext = r.headers['Content-Type'].split('/')[1]
    # Date part from the Last-Modified header
    lm = datetime.datetime.strptime(
        r.headers['Last-Modified'], '%a, %d %b %Y %H:%M:%S GMT')
    # Serial number keeps files with identical timestamps from colliding
    fname = lm.strftime('%Y%m%d-%H%M%S') + ('-%03d.' % cnt) + ext
    print(fname)
    with open(fname, "wb") as fout:
        fout.write(r.content)
    cnt += 1
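The script above numbers every file with one global counter. An alternative sketch, not from the original post, would append a serial only until a free name is found, so files with distinct timestamps keep a clean -000 suffix (the `unique_name` helper and its `exists` parameter are hypothetical):

```python
import os

def unique_name(stem, ext, exists=os.path.exists):
    """Try stem-000.ext, stem-001.ext, ... and return the first free name."""
    n = 0
    while True:
        candidate = '%s-%03d.%s' % (stem, n, ext)
        if not exists(candidate):
            return candidate
        n += 1

# With nothing on disk, the first candidate is free:
print(unique_name('20151021-072800', 'jpeg', exists=lambda p: False))
# 20151021-072800-000.jpeg
```

Passing `exists` as a parameter also makes the collision logic easy to test without touching the file system.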