Here, we introduce how to save images from the web in bulk using web scraping.
Warning: If the images are protected by copyright, or if copyright is not an issue but the site's terms of use prohibit scraping, you may be liable for damages. Make sure you understand the applicable copyright law and the site's terms of use before you scrape.
Web scraping can be done in various languages such as Ruby, PHP, and JavaScript, but this time I will introduce a method that uses Python's Beautiful Soup.
① Install beautifulsoup4 with pip (the download in step ⑥ uses requests, so install it as well)
pip install beautifulsoup4 requests
② Decide on a site to scrape (this article uses Irasutoya)
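When choosing a site, one machine-readable signal you can check is its robots.txt. The following is a minimal sketch using the standard library's urllib.robotparser. The robots.txt rules, the example.com URLs, and the user agent name "my-scraper" are made up for illustration, and passing this check does not replace reading the site's terms of use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, parsed offline for illustration.
# For a real site you would call set_url(".../robots.txt") and read() instead.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Build a parser from robots.txt text that is already in memory."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)
# "my-scraper" is an assumed user agent name
allowed = parser.can_fetch("my-scraper", "https://example.com/search/label/business")
blocked = parser.can_fetch("my-scraper", "https://example.com/private/page.html")
print(allowed, blocked)
```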
③ Get the URL of each image link page from the list page
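import re
import urllib.request
from pathlib import Path

import requests
from bs4 import BeautifulSoup

# Folder to save images into (the name "images" is an assumption; the article does not show it)
output_folder = Path("images")
output_folder.mkdir(exist_ok=True)
url = "https://www.irasutoya.com/search/label/%E3%83%93%E3%82%B8%E3%83%8D%E3%82%B9"
# Prepare a list to store the URLs of the image pages
link_list = []
response = urllib.request.urlopen(url)
soup = BeautifulSoup(response, "html.parser")
# Get all image link tags
image_list = soup.select('div.boxmeta.clearfix > h2 > a')
# Extract the image page links one by one
for image_link in image_list:
    link_url = image_link.attrs['href']
    link_list.append(link_url)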
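The ">" in the selector above is the CSS child combinator, so only an a tag that sits directly inside the h2 is matched. Here is a small sketch of that behavior run against a hand-written HTML fragment. The fragment and its example.com URLs are invented for illustration and only imitate the structure the selector expects; the real page markup may differ.

```python
from bs4 import BeautifulSoup

# Invented HTML: one direct link, one link nested in a <span>,
# and one <div> that lacks the "clearfix" class.
SAMPLE_LIST_PAGE = """
<div class="boxmeta clearfix"><h2><a href="https://example.com/page1.html">One</a></h2></div>
<div class="boxmeta clearfix"><h2><span><a href="https://example.com/nested.html">Nested</a></span></h2></div>
<div class="boxmeta"><h2><a href="https://example.com/other.html">Other</a></h2></div>
"""

soup = BeautifulSoup(SAMPLE_LIST_PAGE, "html.parser")

# Child combinator: the <a> must be a direct child of the <h2>
child_links = [a.get("href") for a in soup.select("div.boxmeta.clearfix > h2 > a")]

# Descendant combinator (a space): any <a> anywhere inside the <div>
descendant_links = [a.get("href") for a in soup.select("div.boxmeta.clearfix a")]

print(child_links)
print(descendant_links)
```

If a site wraps its links in extra tags, the stricter child selector silently returns fewer matches, so printing the length of the result is a quick sanity check.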
④ Get all the img tags from each image page
for page_url in link_list:
    page_html = urllib.request.urlopen(page_url)
    page_soup = BeautifulSoup(page_html, "html.parser")
    # Get all img tags for the image files
    img_list = page_soup.select('div.separator > a > img')
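⑤ Take out the img tags one by one and get the URL of each image file (this loop runs inside the for loop from step ④)
    for img in img_list:
        # Get the URL of the image file
        img_url = img.attrs['src']
        # Use the last path segment (a name ending in png or jpg) as the file name
        file_name = re.search(r".*/(.*png|.*jpg)$", img_url)
        if file_name is None:
            continue  # skip anything that is not a png or jpg
        save_path = output_folder.joinpath(file_name.group(1))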
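A src attribute is not always a full https:// URL; it can be protocol-relative (starting with //) or relative to the page. The following is a minimal sketch of normalizing the URL and deriving the save path with the standard library only. The function name resolve_image and the sample URLs are hypothetical, and the sketch swaps the regular expression for urllib.parse and pathlib.

```python
from pathlib import Path, PurePosixPath
from typing import Optional, Tuple
from urllib.parse import urljoin, urlparse

# The two extensions the regular expression looks for
ALLOWED_SUFFIXES = {".png", ".jpg"}


def resolve_image(page_url: str, src: str, output_folder: Path) -> Optional[Tuple[str, Path]]:
    """Return (absolute image URL, save path), or None for other file types."""
    # urljoin fills in a missing scheme or host from the page URL
    absolute_url = urljoin(page_url, src)
    # Take the last segment of the URL path as the file name
    name = PurePosixPath(urlparse(absolute_url).path).name
    if PurePosixPath(name).suffix.lower() not in ALLOWED_SUFFIXES:
        return None
    return absolute_url, output_folder / name


result = resolve_image(
    "https://example.com/2020/01/page.html",    # hypothetical page URL
    "//img.example.com/s400/business_man.png",  # hypothetical protocol-relative src
    Path("images"),
)
print(result)
```

Because the file name is taken from the parsed URL path, a query string such as ?v=2 after the extension does not end up in the saved file name.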
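⑥ Download the data from the URL of the image file (still inside the loop from step ⑤)
        try:
            # Get the data from the image file URL
            image = requests.get(img_url)
            # Save the data to the destination file path
            with open(save_path, 'wb') as f:
                f.write(image.content)
            # Show the saved file name
            print(save_path)
        except ValueError:
            # requests raises a ValueError subclass for a malformed URL (e.g. one missing "https:")
            print("ValueError!")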
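The try block above only catches ValueError, so a network failure would stop the script, and an HTTP error page would be saved as if it were an image. The following sketch shows one way to handle both cases. To stay within the standard library it uses urllib.request (already used for the HTML pages) rather than requests, and the function name download_image and the 10-second timeout are assumptions.

```python
import urllib.request
from pathlib import Path


def download_image(img_url: str, save_path: Path, timeout: float = 10.0) -> bool:
    """Save the bytes at img_url to save_path; return True on success."""
    try:
        # urlopen raises HTTPError for 4xx/5xx responses
        with urllib.request.urlopen(img_url, timeout=timeout) as response:
            data = response.read()
    except ValueError:
        # Malformed URL, for example one that is missing its scheme
        print(f"Invalid URL: {img_url}")
        return False
    except OSError as err:
        # URLError, HTTPError, and socket timeouts are all OSError subclasses
        print(f"Download failed for {img_url}: {err}")
        return False
    # Write the file only after the whole download succeeded
    save_path.write_bytes(data)
    print(save_path)
    return True
```

Returning a boolean instead of raising lets the surrounding loop continue with the next image and, if needed, count the failures.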
That's all for the procedure.
↓ ↓ Execution result ↓ ↓
I thought steps ③ to ⑤ were a little hard to picture, so I drew a rough diagram of the extraction flow.
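As a text version of that flow, steps ③ to ⑤ can be sketched as two small functions chained together: list page, then image page URLs, then image file URLs. The fetch parameter, the function names, and the fake pages are all hypothetical; fetch stands in for downloading a page so the flow can be followed without network access.

```python
from typing import Callable, Dict, List

from bs4 import BeautifulSoup


def image_page_urls(list_html: str) -> List[str]:
    """Step 3: list page -> URLs of the individual image pages."""
    soup = BeautifulSoup(list_html, "html.parser")
    return [a["href"] for a in soup.select("div.boxmeta.clearfix > h2 > a")]


def image_file_urls(page_html: str) -> List[str]:
    """Steps 4 and 5: image page -> URLs of the image files on it."""
    soup = BeautifulSoup(page_html, "html.parser")
    return [img["src"] for img in soup.select("div.separator > a > img")]


def collect_image_urls(list_url: str, fetch: Callable[[str], str]) -> List[str]:
    """Chain the steps: fetch the list page, then every image page it links to."""
    urls: List[str] = []
    for page_url in image_page_urls(fetch(list_url)):
        urls.extend(image_file_urls(fetch(page_url)))
    return urls


# Invented pages standing in for the real site
FAKE_SITE: Dict[str, str] = {
    "list": '<div class="boxmeta clearfix"><h2><a href="page1">A</a></h2></div>',
    "page1": '<div class="separator"><a href="#"><img src="https://example.com/a.png"></a></div>',
}
found = collect_image_urls("list", FAKE_SITE.__getitem__)
print(found)
```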
The source code for this article is also posted on GitHub, so please refer to it here: https://github.com/miyazakikna/SaveLocalImageWebScraping.git
Here, I explained how to save images in bulk using Python's Beautiful Soup. I downloaded images from Irasutoya this time, but I think you can download images from other sites in the same way, so please give it a try.
For how to rename the files in bulk after downloading the images, see: [[Work efficiency] How to change file names in Python](https://qiita.com/miyazakikna/items/b9c6d6d83ebcd529afd7)