I wrote this up as a memo for myself. If you find it helpful, feel free to use it too.

The source code here is based on two references: "Technical book of TomoProg" and "Resize images at once using Python, Pillow (How to enlarge / reduce)". Please check those web pages for more detailed explanations.

Below, I describe the environment I used and my impressions after using the code.

Development environment

Windows 7, Python 3.5, PyCharm

Image acquisition source code

import urllib.parse
import urllib.request
import bs4

#URL of the web page you want to get
url = "https://www.google.co.jp/"
request = urllib.request.urlopen(url)
html = request.read()

#List of candidate character encodings to try, in this order
encoding_list = ["cp932", "utf-8", "utf_8", "euc_jp",
                 "euc_jis_2004", "euc_jisx0213", "shift_jis",
                 "shift_jis_2004", "shift_jisx0213", "iso2022jp",
                 "iso2022_jp_1", "iso2022_jp_2", "iso2022_jp_3",
                 "iso2022_jp_ext", "latin_1", "ascii"]

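#Try each encoding in turn and keep the first result that decodes without error
#(latin_1 accepts any byte sequence, so decoded_html is always set by the end)
decoded_html = None
for enc in encoding_list:
    try:
        decoded_html = html.decode(enc)
        break
    except UnicodeDecodeError:
        continue

resources = []

#Create a BeautifulSoup object (the parser is specified explicitly)
soup = bs4.BeautifulSoup(decoded_html, "html.parser")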

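#Get the contents of the src attribute of every img tag in the HTML
for img_tag in soup.find_all("img"):
    src_str = img_tag.get("src")
    #Skip img tags that have no src attribute
    if src_str:
        #Convert relative paths to absolute URLs so that urlopen can fetch them
        resources.append(urllib.parse.urljoin(url, src_str))

#Copy the collected image URLs into the download list
array_jpg = []
for resource in resources:
    array_jpg.append(resource)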

#Open the URL of each image file
#(array_jpg holds the URLs of the image files)
for count, image_url in enumerate(array_jpg):
    request = urllib.request.urlopen(image_url)

    #Open the file in binary mode and write the contents of the URL
    #File names are serial numbers (Example: 0.jpg/1.jpg/......)
    #The with statement closes the file automatically
    with open("%d.jpg" % (count), "wb") as f:
        f.write(request.read())
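The encoding loop above can also be packaged as a small helper. The following is only a sketch under my own assumptions: the function name `decode_html`, its parameters, and the short fallback list are hypothetical and not taken from the referenced pages. If the server declares a charset, something like `request.headers.get_content_charset()` could be passed in as `declared_encoding`, so that the declared value is tried first.

```python
def decode_html(raw_bytes, declared_encoding=None,
                fallback_encodings=("utf-8", "cp932", "euc_jp")):
    """Return (text, encoding) for the first encoding that decodes cleanly.

    Hypothetical helper: the name, parameters, and fallback order are
    assumptions made for illustration.
    """
    candidates = []
    if declared_encoding:
        candidates.append(declared_encoding)
    candidates.extend(fallback_encodings)

    for enc in candidates:
        try:
            return raw_bytes.decode(enc), enc
        except (UnicodeDecodeError, LookupError):
            # UnicodeDecodeError: the bytes are not valid in this encoding
            # LookupError: the encoding name is unknown to Python
            continue

    # Last resort: latin_1 maps every byte to a character, so it never fails
    return raw_bytes.decode("latin_1"), "latin_1"


# Bytes encoded as cp932 are rejected by utf-8 and accepted by cp932
sample = "画像を取得".encode("cp932")
text, used = decode_html(sample)
print(used, text)
```

Trying utf-8 before the Japanese legacy encodings is a design choice of this sketch: utf-8 rejects most byte sequences that were not written as utf-8, so it rarely produces a false match.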

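One point to watch when downloading: the src attribute of an img tag is often a relative path, and `urllib.request.urlopen` needs an absolute URL. Here is a minimal sketch of resolving src values against the page URL with the standard library's `urljoin`. The function name `resolve_image_urls` and the example src values are my own assumptions, used only for illustration.

```python
from urllib.parse import urljoin


def resolve_image_urls(page_url, src_values):
    """Turn src attribute values into absolute URLs, skipping missing ones.

    Hypothetical helper for illustration only.
    """
    absolute_urls = []
    for src in src_values:
        # img tags without a src attribute yield None; skip those
        if not src:
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        absolute_urls.append(urljoin(page_url, src))
    return absolute_urls


# Made-up src values: a root-relative path, a missing src,
# an absolute URL, and a path relative to the page
example_srcs = ["/images/logo.png", None, "https://example.com/a.jpg", "img/b.gif"]
image_urls = resolve_image_urls("https://www.google.co.jp/", example_srcs)
print(image_urls)
```

The resulting list can be handed to the download loop as it is.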
Image resizing source code

#coding:utf-8

from PIL import Image
import os

input_path = "C:\\Users\\image"
output_path = "C:\\Users\\image_480x300"

#Get the file names in the image folder
list_input_path = os.listdir(input_path)

#Create the output folder if it does not exist yet
os.makedirs(output_path, exist_ok=True)

for file_name in list_input_path:
    #Open image file
    img = Image.open(os.path.join(input_path, file_name), 'r')

    #The first argument of resize is the target size (width, height), the second is the resampling filter
    img_resize_lanczos = img.resize((480, 300), Image.LANCZOS)
    #Convert to RGB so that the image can be saved as JPEG
    img_resize_lanczos = img_resize_lanczos.convert("RGB")
    #Save resized image
    img_resize_lanczos.save(os.path.join(output_path, file_name), quality = 100)
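Note that resizing every image to a fixed (480, 300) distorts any image whose aspect ratio is different. If that matters, one option is to compute a size that fits inside 480x300 while keeping the aspect ratio. The sketch below is my own addition, not something from the referenced pages, and the function name `fit_within` is hypothetical.

```python
def fit_within(original_size, max_size):
    """Return the largest (width, height) that fits in max_size
    while keeping the aspect ratio of original_size.

    Hypothetical helper for illustration only.
    """
    width, height = original_size
    max_width, max_height = max_size
    # The smaller of the two ratios makes both sides fit inside the box
    scale = min(max_width / width, max_height / height)
    # Keep each side at least 1 pixel so the size stays valid
    return (max(1, round(width * scale)), max(1, round(height * scale)))


# A 1920x1080 image fitted into a 480x300 box
print(fit_within((1920, 1080), (480, 300)))  # → (480, 270)
```

The result could be passed to `img.resize(fit_within(img.size, (480, 300)), Image.LANCZOS)` in place of the fixed size. This sketch also enlarges images that are smaller than the box. For shrinking only, Pillow's `Image.thumbnail` may be a simpler choice.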

Impressions after using it

The explanations on the websites above were carefully written and concise. They were very well organized sites.

Machine learning needs data. With this kind of knowledge, you can collect data right away and get started. I collected a lot of images with this source code myself, and I would like to use them for machine learning. Please be careful about the copyright of the images you collect.
