[Python] How to bulk-save images from the Web with Beautiful Soup

Introduction

This article shows how to save images from the Web in bulk by web scraping.

Warning: If the content is protected by copyright, or if copyright allows the use but the site's terms of use prohibit scraping, you may face a claim for damages. Make sure you understand the applicable copyright law and the site's terms of use before you scrape.

1. How to do web scraping
2. Actually save the image
3. Extraction flow
4. Summary
5. Bonus
6. Reference

1. How to do web scraping

Web scraping can be done in many languages, such as Ruby, PHP, and JavaScript. This article uses Python's Beautiful Soup library.

2. Actually save the image

① Install beautifulsoup4 with pip (along with requests, which step ⑥ uses to download the files)

pip install beautifulsoup4 requests

② Choose a site to scrape

This time, we will download images from "Irasutoya". https://www.irasutoya.com/search/label/%E3%83%93%E3%82%B8%E3%83%8D%E3%82%B9

③ Get the URL of each image page from the list page

import urllib.request

from bs4 import BeautifulSoup

url = "https://www.irasutoya.com/search/label/%E3%83%93%E3%82%B8%E3%83%8D%E3%82%B9"
# Prepare a list to store the URLs of the image pages
link_list = []
response = urllib.request.urlopen(url)
soup = BeautifulSoup(response, "html.parser")
# Get all the link tags that point to image pages
image_list = soup.select('div.boxmeta.clearfix > h2 > a')
# Extract the link URLs one by one
for image_link in image_list:
    link_url = image_link.attrs['href']
    link_list.append(link_url)
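
To see what the selector in step ③ matches without touching the network, here is a minimal sketch that runs it against a small hand-written HTML string. The markup and the example.com URLs are made up for illustration and are only assumed to resemble the structure of the list page; the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup, assumed to resemble the list page (not copied from the real site).
sample_html = """
<div class="boxmeta clearfix">
  <h2><a href="https://example.com/page1.html">Title 1</a></h2>
</div>
<div class="boxmeta clearfix">
  <h2><a href="https://example.com/page2.html">Title 2</a></h2>
</div>
<div class="other"><h2><a href="https://example.com/ignored.html">Skip</a></h2></div>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# "div.boxmeta.clearfix" requires both classes on the div;
# ">" matches direct children only, so the third div is ignored.
link_list = [a.attrs["href"] for a in soup.select("div.boxmeta.clearfix > h2 > a")]
print(link_list)
```

Only the links inside a div carrying both the boxmeta and clearfix classes are collected, which is why a selector tuned to one site usually has to be rewritten for another.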

④ Get all the img tags on each image page

for page_url in link_list:
    page_html = urllib.request.urlopen(page_url)
    page_soup = BeautifulSoup(page_html, "html.parser")
    # Get all the img tags for image files on this page
    img_list = page_soup.select('div.separator > a > img')
    # Steps ⑤ and ⑥ run here, inside this loop, once for each page

⑤ Take out the img tags one by one and get the URL of the image file.

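import re

for img in img_list:
    # Get the URL of the image file
    img_url = img.attrs['src']
    # Capture the file name after the last "/" (.png or .jpg only)
    file_name = re.search(r".*/(.*png|.*jpg)$", img_url)
    if file_name is None:
        continue  # Skip anything that is not a .png or .jpg
    # output_folder: a pathlib.Path for the save destination, defined beforehand
    save_path = output_folder.joinpath(file_name.group(1))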
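
The file-name extraction in step ⑤ can be tried on its own. The sketch below wraps the same regular expression in a helper; the name build_save_path, the "images" folder, and the example.com URLs are hypothetical and only serve the illustration.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical destination folder; the article's output_folder is assumed to be similar.
output_folder = Path("images")


def build_save_path(img_url: str) -> Optional[Path]:
    """Return the local path for a .png/.jpg URL, or None if the URL does not match."""
    # Greedy ".*/" consumes everything up to the last "/", so group 1 is the file name.
    match = re.search(r".*/(.*png|.*jpg)$", img_url)
    if match is None:
        return None
    return output_folder.joinpath(match.group(1))


print(build_save_path("https://example.com/s400/business_man.png"))
print(build_save_path("https://example.com/s400/animation.gif"))
```

Because the pattern is anchored with $, a URL that ends in anything other than png or jpg (a .gif, or a .png followed by a query string) yields no match, so checking for None before calling group(1) avoids an AttributeError.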

⑥ Download the data from the URL of the image file

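import requests

try:
    # Get the data from the image file URL
    image = requests.get(img_url)
    # Save the data to the destination file path
    with open(save_path, 'wb') as f:
        f.write(image.content)
    # Show the saved file name
    print(save_path)
except ValueError:
    # requests raises ValueError subclasses (e.g. MissingSchema) for malformed URLs
    print("ValueError!")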
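
One way a ValueError can arise in step ⑥ is a src value without a scheme, such as one starting with "//", which requests rejects with MissingSchema (a ValueError subclass). Assuming that is the kind of failure being guarded against, a possible alternative is to resolve each src against the URL of the page it came from before downloading. This is only a sketch: the helper name to_absolute_url and the example.com URLs are hypothetical.

```python
from urllib.parse import urljoin


def to_absolute_url(page_url: str, img_src: str) -> str:
    """Resolve an img src against the URL of the page it was found on."""
    return urljoin(page_url, img_src)


# Hypothetical values for illustration only.
page_url = "https://example.com/2020/01/post.html"

# Scheme-relative src: inherits "https" from the page URL.
print(to_absolute_url(page_url, "//img.example.com/s400/a.png"))
# Root-relative src: inherits scheme and host from the page URL.
print(to_absolute_url(page_url, "/images/b.jpg"))
# Already absolute: returned unchanged.
print(to_absolute_url(page_url, "https://cdn.example.com/c.png"))
```

With this in place, img_url could be passed through the helper before requests.get, so that fewer images are skipped by the except branch.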

That's all for the procedure.

↓ ↓ Execution result ↓ ↓

3. Extraction flow

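Steps ③ to ⑤ can be hard to picture, so here is a rough extraction flow: the list page yields the URL of each image page (③), each image page yields its img tags (④), and each img tag yields the URL of an image file (⑤).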
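
The flow of steps ③ to ⑤ can also be sketched as a single function. To keep it runnable offline, the pages are supplied by a fetch callable instead of being downloaded; the function name collect_image_urls, the fetch parameter, and the fake_site dictionary with its example.com URLs are all assumptions made for this illustration, not part of the original source.

```python
from typing import Callable, List

from bs4 import BeautifulSoup


def collect_image_urls(list_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Follow steps 3 to 5: list page -> image pages -> image file URLs."""
    # Step 3: list page -> URL of each image page
    list_soup = BeautifulSoup(fetch(list_url), "html.parser")
    link_list = [a.attrs["href"] for a in list_soup.select("div.boxmeta.clearfix > h2 > a")]

    img_urls = []
    for page_url in link_list:
        # Step 4: image page -> img tags
        page_soup = BeautifulSoup(fetch(page_url), "html.parser")
        img_list = page_soup.select("div.separator > a > img")
        # Step 5: img tag -> URL of the image file
        img_urls.extend(img.attrs["src"] for img in img_list)
    return img_urls


# Hypothetical pages standing in for the real site.
fake_site = {
    "https://example.com/list": (
        '<div class="boxmeta clearfix"><h2><a href="https://example.com/p1">A</a></h2></div>'
        '<div class="boxmeta clearfix"><h2><a href="https://example.com/p2">B</a></h2></div>'
    ),
    "https://example.com/p1": (
        '<div class="separator"><a href="#"><img src="https://example.com/img/a.png"></a></div>'
    ),
    "https://example.com/p2": (
        '<div class="separator"><a href="#"><img src="https://example.com/img/b.jpg"></a></div>'
        '<div class="separator"><a href="#"><img src="https://example.com/img/c.png"></a></div>'
    ),
}

image_urls = collect_image_urls("https://example.com/list", fake_site.__getitem__)
print(image_urls)
```

Against a real site, fetch would be any callable that returns a page's HTML for a given URL, for example a thin wrapper around urllib.request.urlopen.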

The source code for this article is also available on GitHub, so please refer to it as well. https://github.com/miyazakikna/SaveLocalImageWebScraping.git

4. Summary

This article explained how to save images in bulk using Python's Beautiful Soup. I downloaded images from Irasutoya this time, but the same approach should work on other sites once the selectors are adjusted to their markup, so please give it a try.

5. Bonus

For how to rename the downloaded files in bulk, see ↓ ↓ [[Work efficiency] How to change file names in Python](https://qiita.com/miyazakikna/items/b9c6d6d83ebcd529afd7)