Overview

This post saves an image from a website to your PC using Python's requests and Beautiful Soup. The saved image is also displayed when the script runs.

Motivation

I wanted to save images of Perfume, and thought it would be convenient if this could be done automatically.

Development

Environment

OS	Windows 10
Python	3.7.3
requests	2.22.0
beautifulsoup4	4.8.2

Completed code

`main.py`


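```python
import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import cv2

root = "https://www.perfume-web.jp/"
url = "https://www.perfume-web.jp/index-jpn.php"
store_path = "PATH"  # replace with the local file to save to, e.g. "top.jpg"

def img_store(path):
    # Download the raw image bytes from the absolute URL
    img = requests.get(path).content

    print(path)

    # Write the bytes to disk unchanged
    with open(store_path, "wb") as f:
        f.write(img)

    # OpenCV loads images in BGR order; convert to RGB for matplotlib
    img_local = cv2.cvtColor(cv2.imread(store_path), cv2.COLOR_BGR2RGB)

    plt.imshow(img_local)
    plt.show()

# Fetch the top page and parse its HTML
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

# src of the first <img> inside <div id="main"> (a relative path)
top_img = soup.find("div", id="main").find("img").get("src")

# Prepend the site root to turn the relative src into a full URL
img_store(root+top_img)
```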
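The script writes every download to the single placeholder store_path. If you want to keep several images, one option is to derive each file name from the image URL itself. The sketch below is a hypothetical helper (local_path_for is not part of main.py) that uses only the standard library; the example URL is made up.

```python
from pathlib import Path
from urllib.parse import unquote, urlparse


def local_path_for(img_url, save_dir="images"):
    """Hypothetical helper: build a local path from the last segment of an image URL."""
    # urlparse separates the path from the query string (?v=2) and fragment
    last_segment = urlparse(img_url).path.rsplit("/", 1)[-1]
    # unquote decodes percent-escapes such as %20
    name = unquote(last_segment)
    if not name:
        # The URL ended with "/" (or had no path), so there is no file name to reuse
        name = "image.bin"
    return Path(save_dir) / name


example = local_path_for("https://example.com/img/top.jpg?v=2")
print(example)
```

Before writing, the folder can be created with `path.parent.mkdir(parents=True, exist_ok=True)`, and the returned path opened in place of store_path.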

Description

```python
response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

top_img = soup.find("div", id="main").find("img").get("src")
```

The first two lines download the HTML from the specified URL and parse it. Next, read the site's HTML yourself: in Chrome, right-click the page and select "Inspect" to open the developer tools, where you can see how the page is structured. This time I wanted the top image of the web page, so I targeted the div tag whose id is main. find() returns only the first matching element, even if several elements share the same tag or id, so a single value comes back. I then took the src attribute of the img tag inside it. Which tags and ids to target can only be worked out by actually reading the site and its HTML and experimenting on your own. bs4 also provides many other methods, such as find_all() and select(), for extracting more complex elements.
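To make the behavior of find() concrete, here is a small sketch run against a made-up HTML snippet rather than the real site. It shows why scoping the search to `<div id="main">` matters, how find() differs from find_all(), the CSS-selector equivalent select_one(), and that get() returns None for a missing attribute.

```python
from bs4 import BeautifulSoup

# Hypothetical page standing in for the real site's HTML
html = """
<div id="header"><img src="img/logo.png"></div>
<div id="main">
  <img src="img/top1.jpg">
  <img src="img/top2.jpg">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Searching the whole page hits the header logo first
page_first = soup.find("img").get("src")

# Scoping to <div id="main"> first; find() then stops at the first <img> inside it
main = soup.find("div", id="main")
first_src = main.find("img").get("src")

# find_all() returns every match as a list
all_srcs = [img.get("src") for img in main.find_all("img")]

# select_one() does the scoped lookup with a CSS selector
css_src = soup.select_one("div#main img").get("src")

# get() returns None instead of raising when the attribute is missing
missing = main.find("img").get("alt")

print(page_first)  # img/logo.png
print(first_src)   # img/top1.jpg
print(all_srcs)    # ['img/top1.jpg', 'img/top2.jpg']
print(css_src)     # img/top1.jpg
print(missing)     # None
```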



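```python
def img_store(path):
    # Download the raw image bytes from the absolute URL
    img = requests.get(path).content

    print(path)

    # Write the bytes to disk unchanged
    with open(store_path, "wb") as f:
        f.write(img)

    # OpenCV loads images in BGR order; convert to RGB for matplotlib
    img_local = cv2.cvtColor(cv2.imread(store_path), cv2.COLOR_BGR2RGB)

    plt.imshow(img_local)
    plt.show()
```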
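The function above calls cv2.cvtColor with cv2.COLOR_BGR2RGB because cv2.imread returns the color channels in BGR order, while plt.imshow interprets a 3-channel array as RGB; without the conversion, reds and blues would be swapped. The NumPy sketch below uses a made-up 1x2 image to show the same reordering done by reversing the channel axis.

```python
import numpy as np

# Hypothetical 1x2 image in OpenCV's BGR order:
# pixel 0 is pure blue (B=255), pixel 1 is pure red (R=255)
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last (channel) axis turns BGR into RGB, the same
# reordering cv2.COLOR_BGR2RGB performs on a 3-channel image.
# The result is a view with a negative stride; some OpenCV functions
# reject such arrays, so wrap it in np.ascontiguousarray before reuse there.
rgb = bgr[..., ::-1]

print(rgb[0, 0].tolist())  # [0, 0, 255]  blue, now in RGB order
print(rgb[0, 1].tolist())  # [255, 0, 0]  red, now in RGB order
```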

Since the image's src was a relative path, I stored the site's domain in root and built the image's full URL by joining root with that relative path. The rest of the function saves the image and displays it. Displaying it with matplotlib is just my personal preference: I like that the axes are scaled in pixels.
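main.py joins root and the src with `+`, which works for this page's src. A possibly more robust option, not what the original script does, is urllib.parse.urljoin from the standard library, which resolves a src against the page URL the way a browser does. The img/top.jpg paths below are hypothetical.

```python
from urllib.parse import urljoin

root = "https://www.perfume-web.jp/"
page = "https://www.perfume-web.jp/index-jpn.php"

# Plain concatenation is fine for "img/top.jpg" ...
concat_ok = root + "img/top.jpg"
# ... but doubles the slash when src starts with "/"
concat_bad = root + "/img/top.jpg"

# urljoin resolves each kind of src against the page URL
joined_rel = urljoin(page, "img/top.jpg")
joined_root_rel = urljoin(page, "/img/top.jpg")
joined_full = urljoin(page, "https://cdn.example.com/top.jpg")  # already absolute: unchanged

for u in (concat_ok, concat_bad, joined_rel, joined_root_rel, joined_full):
    print(u)
```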

Result

[Screenshot: the saved image displayed with matplotlib]

The image is displayed like this. (I have hidden the three members' precious faces. If you want to see them, please visit the Perfume Official Site.) You can save other images by changing how you search for them.
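As one example of changing how you search, the sketch below collects every image on a page instead of only the first one in div#main. collect_img_urls is a hypothetical helper, the sample HTML is made up, and it uses urljoin in place of the root + src concatenation from main.py.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def collect_img_urls(html, page_url):
    """Hypothetical helper: absolute URLs of every <img src> on a page, without duplicates."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    # src=True skips <img> tags that have no src attribute at all
    for img in soup.find_all("img", src=True):
        url = urljoin(page_url, img["src"])
        if url not in urls:  # keep first-seen order, drop repeats
            urls.append(url)
    return urls


sample = """
<div id="main"><img src="img/top.jpg"><img src="/img/banner.png"></div>
<img alt="no src">
<img src="img/top.jpg">
"""
urls = collect_img_urls(sample, "https://www.example.com/index.php")
print(urls)
```

Each URL could then be passed to img_store, ideally with a short pause such as `time.sleep(1)` between downloads. Because img_store writes to the single store_path, each image needs its own file name or the previous one is overwritten.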

Discussion

I tried web scraping as a stepping stone toward my idea that it would be nice if the site could be updated automatically. There seem to be various rules and laws that apply to web scraping, so please refer to the following article: https://qiita.com/nezuq/items/c5e827e1827e7cb29011 Careless scraping puts load on the other party's server and can amount to a kind of attack. For debugging and practice, it is a good idea to save the site's entire HTML once and work from that copy. An unexpected infinite loop that keeps sending requests would be scary.
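To follow the advice of working from a saved copy, one option is to download the HTML once and read it from a local file on later runs. load_html and its fetch parameter are hypothetical names: in practice fetch could be `lambda u: requests.get(u).text`, and the demo below uses a fake fetcher so it runs without network access.

```python
import tempfile
from pathlib import Path


def load_html(url, cache_file, fetch):
    """Hypothetical helper: return page HTML, calling fetch(url) only if cache_file is missing."""
    cache = Path(cache_file)
    if cache.exists():
        # Reuse the saved copy: no request reaches the server
        return cache.read_text(encoding="utf-8")
    html = fetch(url)
    cache.write_text(html, encoding="utf-8")
    return html


# Real use (assumption): load_html(url, "top.html", lambda u: requests.get(u).text)

calls = []


def fake_fetch(url):
    # Stand-in for a real HTTP request; records every call it receives
    calls.append(url)
    return "<html><body>cached</body></html>"


with tempfile.TemporaryDirectory() as tmp:
    cache_path = Path(tmp) / "page.html"
    first = load_html("https://example.com/", cache_path, fake_fetch)
    second = load_html("https://example.com/", cache_path, fake_fetch)

print(len(calls))  # 1: the second call was served from the file
```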


Save images with web scraping