Save images with web scraping

Overview

Save an image from a website to your PC using Python's requests and Beautiful Soup. As a bonus, the script also displays the saved image when it runs.

Motivation

I wanted to save an image of Perfume, and I thought it would be convenient if it could be saved automatically.

Development

Environment

OS Windows 10
Python 3.7.3
requests 2.22.0
beautifulsoup4 4.8.2

Completed code

main.py


import requests
from bs4 import BeautifulSoup
import matplotlib.pyplot as plt
import cv2

root = "https://www.perfume-web.jp/"
url = "https://www.perfume-web.jp/index-jpn.php"
store_path = "PATH"

def img_store(path):
    img = requests.get(path).content

    print(path)

    with open(store_path, "wb") as f:
        f.write(img)

    img_local = cv2.cvtColor(cv2.imread(store_path), cv2.COLOR_BGR2RGB)

    plt.imshow(img_local)
    plt.show()

response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

top_img = soup.find("div", id="main").find("img").get("src")

img_store(root+top_img)

Description

response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

top_img = soup.find("div", id="main").find("img").get("src")

The first two lines fetch the HTML from the specified URL and parse it. Next, read the site's HTML to decide what to extract. In Chrome, you can view it in the panel that opens when you right-click the page and select "Inspect". This time I wanted the top image of the web page, so I specified the div tag whose id is main. find() returns only the first match even when several elements share the same tag or id, so only one value is returned. From the img tag inside that div, I took the src attribute. The only way to work out which tags and ids to target is to read the site's HTML yourself and experiment. bs4 has many other functions for selecting more complex elements.
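To illustrate how find() differs from the other search functions in bs4, here is a minimal sketch that runs on a small hand-written HTML string instead of the real site. The markup, the image file names, and the variable names are assumptions made up for this example; the actual page structure may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for the real page (an assumption, not the actual site).
html = """
<div id="header"><img src="img/logo.png"></div>
<div id="main">
  <img src="img/top.jpg">
  <img src="img/second.jpg">
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns only the first matching element, even if several match.
first_src = soup.find("div", id="main").find("img").get("src")

# find_all() returns every matching element as a list.
all_srcs = [img.get("src") for img in soup.find("div", id="main").find_all("img")]

# select() takes a CSS selector and also returns every match.
css_srcs = [img.get("src") for img in soup.select("div#main img")]

# find() returns None when nothing matches, so check before chaining calls.
missing = soup.find("div", id="no-such-id")

print(first_src)
print(all_srcs)
print(css_srcs)
print(missing)
```

Because find() returns None for a missing element, chaining .find("img") directly onto it raises an AttributeError when the page layout changes, so a None check before chaining is probably worthwhile in a real script.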

def img_store(path):
    img = requests.get(path).content

    print(path)

    with open(store_path, "wb") as f:
        f.write(img)

    img_local = cv2.cvtColor(cv2.imread(store_path), cv2.COLOR_BGR2RGB)

    plt.imshow(img_local)
    plt.show()

The image path in the src attribute was relative, so I defined the site's domain as root and built the full image URL by joining it with the relative path. The rest is just saving and displaying the image. Displaying it with matplotlib is purely personal preference: I like that the axes show a scale in pixels.
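As an alternative to joining the strings by hand, the standard library's urllib.parse.urljoin can resolve a src value against the page URL. The sketch below is only an illustration: the three src values are hypothetical, chosen to show cases where plain concatenation and urljoin behave differently.

```python
from urllib.parse import urljoin

root = "https://www.perfume-web.jp/"
page_url = "https://www.perfume-web.jp/index-jpn.php"

# Hypothetical src values (assumptions for illustration only).
relative_src = "img/top.jpg"                      # relative to the page
rooted_src = "/img/top.jpg"                       # relative to the site root
absolute_src = "https://cdn.example.com/top.jpg"  # already a full URL

# Plain concatenation works for the first case but doubles the slash in the second.
naive_relative = root + relative_src
naive_rooted = root + rooted_src

# urljoin resolves each form the way a browser would.
joined_relative = urljoin(page_url, relative_src)
joined_rooted = urljoin(page_url, rooted_src)
joined_absolute = urljoin(page_url, absolute_src)

print(naive_rooted)
print(joined_relative)
print(joined_rooted)
print(joined_absolute)
```

With this approach, img_store(urljoin(url, top_img)) would replace img_store(root+top_img), and the separate root variable would no longer be needed.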

Result

[Image: Screenshot (7).png]

The image is displayed like this. (I have hidden the precious faces of the three members. If you want to see them, please visit the Perfume official site.) You can save other images by changing how you search for them.
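As one way of changing the search, the sketch below collects every img tag on a page and derives a separate local file name for each, so that images do not overwrite one another the way they would with a single store_path. The sample HTML, the example.com URL, the images directory, and the helper name image_targets are all hypothetical; nothing is downloaded here.

```python
from pathlib import Path, PurePosixPath
from urllib.parse import urljoin, urlparse

from bs4 import BeautifulSoup

# Hypothetical page content and URL (assumptions for illustration only).
sample_html = """
<div id="main">
  <img src="img/a.jpg">
  <img alt="no src attribute">
  <img src="/photos/b.png?size=large">
</div>
"""
sample_page_url = "https://www.example.com/index.html"


def image_targets(html, page_url, store_dir):
    """Return (image URL, local path) pairs for every img tag that has a src."""
    soup = BeautifulSoup(html, "html.parser")
    targets = []
    for img in soup.find_all("img"):
        src = img.get("src")
        if not src:
            continue  # skip img tags without a src attribute
        image_url = urljoin(page_url, src)
        # Use the last path segment of the URL (query string excluded) as the file name.
        filename = PurePosixPath(urlparse(image_url).path).name
        if not filename:
            continue  # skip URLs that end in a slash
        targets.append((image_url, Path(store_dir) / filename))
    return targets


targets = image_targets(sample_html, sample_page_url, "images")
for image_url, local_path in targets:
    print(image_url, "->", local_path)
```

Each pair could then be passed to a saving function that writes requests.get(image_url).content to local_path, ideally with a short pause between requests.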

Consideration

I tried web scraping as a first step toward the idea that it would be nice to pick up site updates automatically. There appear to be various rules and laws concerning web scraping, so please refer to the following article. Careless scraping puts load on the other party's server and can amount to a kind of attack. https://qiita.com/nezuq/items/c5e827e1827e7cb29011 For debugging and practice, it's a good idea to save the site's HTML once and work from that local copy. An unexpected infinite loop of requests would be scary.
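The idea of saving the HTML once and reusing it can be sketched as a small cache helper. This is only one possible approach: the function names load_html and fetch_html, the cache file name, and the one-second delay are assumptions introduced for this example.

```python
import time
from pathlib import Path


def fetch_html(url):
    """Download a page over the network (requires the requests package)."""
    import requests  # imported here so the cache logic can be tried offline

    response = requests.get(url, timeout=10)
    response.raise_for_status()  # fail loudly on 4xx/5xx responses
    return response.text


def load_html(url, cache_path, fetch=fetch_html, delay_seconds=1.0):
    """Return the page HTML, reading from a local file when one already exists."""
    cache_file = Path(cache_path)
    if cache_file.exists():
        # Reuse the saved copy so repeated debugging runs never hit the server.
        return cache_file.read_text(encoding="utf-8")
    time.sleep(delay_seconds)  # pause before each real request to limit server load
    html = fetch(url)
    cache_file.write_text(html, encoding="utf-8")
    return html
```

For example, html = load_html(url, "index-jpn.html") followed by BeautifulSoup(html, "html.parser") contacts the site only on the first run. Even if a bug causes a loop, later calls read the local file instead of sending more requests.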

Reference

http://kondou.com/BS4/
https://qiita.com/Azunyan1111/items/9b3d16428d2bcc7c9406
https://qiita.com/YosukeHoshi/items/189c615187f41f2f4e27
