A blog I wrote long ago had no export function. I built a list of the image URLs with curl and grep, but since each URL has a form like /Img?hogehoge, saving them with wget -i just produces file names like Img0.1 and Img0.2.
If I searched carefully there might be a curl or wget option that handles this nicely, but searching felt like more trouble than writing a script.
The script takes the file name's date part from the Last-Modified header and its extension from the Content-Type header. Some files share the same modification time, so a serial number is appended as well.
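The naming scheme can be illustrated in isolation. This is just a sketch using a hypothetical pair of response headers (the sample date and MIME type are made up, not from the blog in question):

```python
import datetime

# Hypothetical sample headers, only to illustrate the naming scheme
headers = {
    'Last-Modified': 'Wed, 21 Oct 2015 07:28:00 GMT',
    'Content-Type': 'image/jpeg',
}

# Extension is the subtype of the MIME type: "image/jpeg" -> "jpeg"
ext = headers['Content-Type'].split('/')[1]
# Parse the RFC-style HTTP date into a datetime
lm = datetime.datetime.strptime(
    headers['Last-Modified'], '%a, %d %b %Y %H:%M:%S GMT')
# Timestamp plus a zero-padded serial number (0 here) plus the extension
fname = lm.strftime('%Y%m%d-%H%M%S') + ('-%03d.' % 0) + ext
print(fname)  # 20151021-072800-000.jpeg
```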
cat url_list.txt | python get-contents.py
get-contents.py
# -*- coding: utf-8 -*-
import sys
import datetime
import requests

cnt = 0
for line in sys.stdin:
    r = requests.get(line.strip())
    # print(r.headers)
    # Extension from the Content-Type header, e.g. "image/jpeg" -> "jpeg"
    ext = r.headers['Content-Type'].split('/')[1]
    # Date part from the Last-Modified header
    lm = datetime.datetime.strptime(
        r.headers['Last-Modified'], '%a, %d %b %Y %H:%M:%S GMT')
    # Serial number keeps files with identical timestamps from colliding
    fname = lm.strftime('%Y%m%d-%H%M%S') + ('-%03d.' % cnt) + ext
    print(fname)
    with open(fname, "wb") as fout:
        fout.write(r.content)
    cnt += 1
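The script above numbers every file with one global counter. An alternative sketch, not from the original post, would append a serial only until a free name is found, so files with distinct timestamps keep a clean -000 suffix (the `unique_name` helper and its `exists` parameter are hypothetical):

```python
import os

def unique_name(stem, ext, exists=os.path.exists):
    """Try stem-000.ext, stem-001.ext, ... and return the first free name."""
    n = 0
    while True:
        candidate = '%s-%03d.%s' % (stem, n, ext)
        if not exists(candidate):
            return candidate
        n += 1

# With nothing on disk, the first candidate is free:
print(unique_name('20151021-072800', 'jpeg', exists=lambda p: False))
# 20151021-072800-000.jpeg
```

Passing `exists` as a parameter also makes the collision logic easy to test without touching the file system.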