How to download all photos from the egao School Photo Service with Python

Background (Situation & Task)

When a child attends nursery school, the staff often take photos of the children.

There are several ways to share these photos; one of them is the "egao School Photo Service", run by Studio Alice. I think it's a pretty good system: you select and purchase photos of your child, then download them from the web later.

egao School Photo Service homepage: https://egao.photo/store/

However, many parents end up choosing a lot of photos (our household has over a hundred), yet the web service has no bulk-download option. Clicking them one by one, you gradually lose track of which ones you've already downloaded. It's awful.

I'm sure I'll end up in the same situation again, so I'm writing this down as a memo for myself.

**This article is based on the egao website as of March 2020 and may stop working if the egao site's specifications change.**

(If the site's specifications do change, I'd love for them to add a bulk-download feature.)

What I tried to do (Action)

For now, I planned to download the photos with the following flow.

  1. Access the site
  2. Log in
  3. Transition to the download page
  4. Download the displayed images (purchased photos) in a batch

Prepare in advance

Before starting, you'll need the following.

  - Selenium and Beautiful Soup installed, plus lxml, which the parsing step uses. (On the PC side especially, watch the WebDriver version: it needs to match your installed browser.)
  - Your login ID (e-mail address) and password.
  - The URL of the list page containing the photos you want to download, copied from your browser.

The articles listed at the end of this post cover this setup in detail, so I'll skip it here.
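Since everything below depends on those packages, a quick way to confirm they are installed is to ask Python for their versions. This is a small sketch using the standard library's importlib.metadata (Python 3.8+); the names passed in are the PyPI distribution names. As for the driver, recent Selenium releases (4.6 and later) should be able to fetch a matching ChromeDriver on their own through Selenium Manager, so manual driver-version matching mainly matters on older setups.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: installed version, or None if missing}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


if __name__ == "__main__":
    # selenium drives the browser, beautifulsoup4 parses the HTML,
    # and lxml is the parser handed to BeautifulSoup in this article.
    for name, version in installed_versions(["selenium", "beautifulsoup4", "lxml"]).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Any line reading NOT INSTALLED points to the package to `pip install` before continuing.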

Actual procedure (Result)

First, I imported the necessary libraries.

python


from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from bs4 import BeautifulSoup

Next, I launched Chrome through WebDriver for automated control.

python


driver = webdriver.Chrome()
driver.implicitly_wait(3)  # wait up to 3 seconds for elements to appear before failing

After launching, run the following code to open the site and log in. Note that if you shrink the browser window that opens automatically, the page's HTML structure seems to change, which can cause errors. I haven't looked into how to handle that.

python


url = "https://egao.photo/store/" #Web page with login page
user = "[email protected]" #My E-Describe mail
password = "hogehoge" #Enter the password you have set
driver.get(url)

elem = driver.find_element_by_id("btn-login")#Press the login button on the top page
elem.click()
elem = driver.find_element_by_id("inputEmail")#enter email address
elem.clear()
elem.send_keys(user)
elem = driver.find_element_by_id("inputPassword")#Password input
elem.clear()
elem.send_keys(password)
elem = driver.find_element_by_xpath("//*[@id='login-modal']/div/div/div[2]/form/div/div[3]/div[1]/button")#Press the login button
elem.click()

The screenshot below shows which element each elem step targets. For the final login button I wanted to use an id, but I couldn't find one, so I specified it with an XPath instead.

(Screenshot: egao School Photo Service homepage, showing the login steps)
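Hardcoding the e-mail address and password in the script, as in the login code above, makes them easy to leak when the file is shared or backed up. One alternative is to read them from environment variables. In this sketch the variable names EGAO_USER and EGAO_PASSWORD are hypothetical choices of mine, not anything the egao site defines.

```python
import os


def load_credentials(user_var="EGAO_USER", password_var="EGAO_PASSWORD"):
    """Read the login e-mail and password from environment variables.

    The default variable names are hypothetical; use whatever names you like.
    Raises RuntimeError if either variable is missing or empty.
    """
    user = os.environ.get(user_var)
    password = os.environ.get(password_var)
    missing = [
        name
        for name, value in ((user_var, user), (password_var, password))
        if not value
    ]
    if missing:
        raise RuntimeError("Missing environment variable(s): " + ", ".join(missing))
    return user, password
```

With the variables set in your shell, `user, password = load_credentials()` can replace the two hardcoded assignments before the login steps run.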

Next, specify the page whose photos you want to bulk-download, and have WebDriver navigate to it.

python


url_target = "https://egao.photo/store/EventPhoto/Download?Model=hogehogehogehogehoge-1"
driver.get(url_target)

That completes the Selenium part for now; next, Beautiful Soup comes into play (don't close the browser window WebDriver opened). Beautiful Soup reads the page currently open in WebDriver and parses it.

python


page_source = driver.page_source
soup = BeautifulSoup(page_source, 'lxml')

The download buttons for the images all share the name "photoId". So I first collected every button with that name into a list, then pulled out each one's id (a unique id per image).

python


# All download buttons share name="photoId"
linklist = soup.find_all('button', attrs={'name': 'photoId'})

# Each button's id (e.g. 'Download_...') is unique per image
linklist_2 = [button.attrs['id'] for button in linklist]

If linklist_2 looks like the following, everything is working.

['Download_XYXYXYXYXYYYY', 'Download_YYYYYYYYYYYYY', 'Download_XXXXXXXXXXXYY', 'Download_XXXXXXXXXXXXY']
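If you'd rather not install BeautifulSoup and lxml, the same extraction (find every `<button>` with name="photoId" and keep its id) can be done with the standard library's html.parser instead. This swaps in a different parser than the article uses, so treat it as a sketch: the sample markup is made up and only mimics the two attributes the article relies on, name="photoId" and the Download_... ids.

```python
from html.parser import HTMLParser


class PhotoIdButtonParser(HTMLParser):
    """Collect the id of every <button name="photoId"> in a page."""

    def __init__(self):
        super().__init__()
        self.ids = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        if tag == "button" and attributes.get("name") == "photoId" and attributes.get("id"):
            self.ids.append(attributes["id"])


def extract_download_ids(html):
    parser = PhotoIdButtonParser()
    parser.feed(html)
    parser.close()
    return parser.ids


# Hypothetical markup standing in for the real download page
sample = """
<form>
  <button type="submit" name="photoId" id="Download_AAA">DL</button>
  <button type="button" name="other" id="Share_AAA">Share</button>
  <button type="submit" name="photoId" id="Download_BBB">DL</button>
</form>
"""
print(extract_download_ids(sample))  # ['Download_AAA', 'Download_BBB']
```

On the real page you would pass `driver.page_source` instead of `sample`.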

Finally, I went back to Selenium and downloaded the images for each id.

python


from selenium.webdriver.common.by import By

for a in linklist_2:
    elem = driver.find_element(By.ID, a)  # the download button for this photo
    elem.click()
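One thing to watch: clicking only starts the downloads, so closing the browser right after the loop could cut off photos still in transit. The sketch below waits for them, assuming Chrome's default behaviour of naming unfinished downloads *.crdownload, and assuming you download into an otherwise empty folder (for example one set through the "download.default_directory" preference, passed with `ChromeOptions.add_experimental_option("prefs", ...)` when launching Chrome).

```python
import time
from pathlib import Path


def wait_for_downloads(directory, expected_count, timeout=120.0, poll_interval=1.0):
    """Poll `directory` until it holds `expected_count` finished files.

    Files ending in .crdownload (Chrome's in-progress downloads) are not
    counted as finished. The 120-second default timeout is arbitrary.
    Returns the finished file paths, or raises TimeoutError.
    """
    deadline = time.monotonic() + timeout
    while True:
        files = [p for p in Path(directory).iterdir() if p.is_file()]
        in_progress = [p for p in files if p.suffix == ".crdownload"]
        finished = [p for p in files if p.suffix != ".crdownload"]
        if not in_progress and len(finished) >= expected_count:
            return sorted(finished)
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{len(finished)} of {expected_count} downloads finished, "
                f"{len(in_progress)} still in progress"
            )
        time.sleep(poll_interval)
```

For example, `wait_for_downloads("/path/to/egao_photos", len(linklist_2))` before calling `driver.quit()`; the folder path is a placeholder.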

This downloads every photo shown on the page in one go, up to the maximum number a single page displays. For the remaining photos, navigate to the next list page the same way and run the same commands again.
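Repeating the navigate-and-click steps by hand gets tedious when there are many pages. As a sketch of how it could be automated, the function below visits each list-page URL you copied and clicks every photoId button, asking Selenium for the buttons directly instead of going through BeautifulSoup (a different route from the article's). The XPath mirrors the `find_all('button', attrs={'name': 'photoId'})` call used earlier; the one-second pause between clicks is an arbitrary precaution, not something the egao site is known to require.

```python
import time

# Selenium's By.XPATH is the plain string "xpath"; the literal is used so this
# sketch can be exercised with a stand-in driver where Selenium isn't installed.
BY_XPATH = "xpath"
PHOTO_BUTTONS = "//button[@name='photoId']"


def download_from_pages(driver, page_urls, pause=1.0):
    """Open each list page and click every photo download button on it.

    Returns {url: number of buttons clicked}. `pause` is the delay in
    seconds between clicks (an arbitrary default).
    """
    clicked = {}
    for url in page_urls:
        driver.get(url)
        buttons = driver.find_elements(BY_XPATH, PHOTO_BUTTONS)
        for button in buttons:
            button.click()
            time.sleep(pause)
        clicked[url] = len(buttons)
    return clicked
```

After logging in, something like `download_from_pages(driver, [url_page1, url_page2])` would cover two pages, where `url_page1` and `url_page2` are hypothetical names for the list-page URLs you copied.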

Impressions & what I want to do in the future

Next time I need to download a large batch, I'd like to automate more of it, including the repetitive parts. Either way, I'm glad this will make things easier the next time I have to download that many images.

Referenced web pages

Most of what I needed was covered in the following two articles. Thank you.

  - [Selenium] Log in and write data to csv
  - [Beautiful Soup] Download images of Irasutoya at once with Python scraping
