I saved the images from a Nogizaka46 member's blog by scraping with Python. The script scrapes the first page of Manatsu Akimoto's blog.
scraping.py
import requests
import urllib.request
import os
from bs4 import BeautifulSoup
def scraping():
    #Member URL
    member_name = "manatsu.akimoto"
    url = "http://blog.nogizaka46.com/" + member_name + "/"
    #Create folder
    if not os.path.isdir(member_name):  # If the "member_name" folder does not exist
        print("Create folder")
        os.mkdir(member_name)
    #For counting the number of saved images
    cnt = 0
    #BeautifulSoup object generation
    headers = {"User-Agent": "Mozilla/5.0"}
    soup = BeautifulSoup(requests.get(
        url, headers=headers).content, 'html.parser')
    #Find the html where the image is located
    for entry in soup.find_all("div", class_="entrybody"):  #Get all entry bodies
        for img in entry.find_all("img"):  #Get all img
            cnt += 1
            urllib.request.urlretrieve(
                img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
    print("I have saved " + str(cnt) + " images.")
if __name__ == '__main__':
    scraping()
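The save path in the script is assembled by string concatenation and always ends in .jpeg. As a minimal sketch, the same path could be built with os.path.join, taking the extension from the image URL when it has one. The helper name build_save_path, its default_ext parameter, and the example.com URL are assumptions for illustration and are not part of the original script.

```python
import os
from urllib.parse import urlparse


def build_save_path(member_name, index, img_url, default_ext=".jpeg"):
    """Return a path like <member_name>/<member_name>-<index><ext> (hypothetical helper)."""
    # Take the extension from the URL path only, ignoring any query string.
    ext = os.path.splitext(urlparse(img_url).path)[1]
    if not ext:
        ext = default_ext  # fall back when the URL has no extension
    filename = f"{member_name}-{index}{ext}"
    return os.path.join(member_name, filename)


# Example with a made-up image URL.
print(build_save_path("manatsu.akimoto", 1, "http://example.com/img/0001.png?size=l"))
```

Inside the loop, the second argument of urlretrieve could then be build_save_path(member_name, cnt, img.attrs["src"]).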
 
Since the member's name is part of the URL, set member_name to the name of the member whose images you want to get.
member_name = "manatsu.akimoto"
url = "http://blog.nogizaka46.com/" + member_name + "/"
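If you want to reuse the same URL pattern for more than one member, the concatenation can be wrapped in a small function. This is only a sketch: the names BASE_URL and blog_url are hypothetical, and the list contains only the member name used in this article.

```python
BASE_URL = "http://blog.nogizaka46.com/"  # base URL taken from the script in this article


def blog_url(member_name):
    """Build the blog URL for one member (hypothetical helper)."""
    return BASE_URL + member_name + "/"


# Other member names would be added to this list.
for name in ["manatsu.akimoto"]:
    print(blog_url(name))  # prints http://blog.nogizaka46.com/manatsu.akimoto/
```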
For how to select elements with BeautifulSoup, there is an easy-to-understand explanation on the following site. Reference site: https://python.civic-apps.com/beautifulsoup4-selector/
Looking at the HTML that makes up the blog, the body of each post is in a div tag with the class name "entrybody". The images are in img tags inside that div, so the script saves each one to the folder as soon as it finds it.
for entry in soup.find_all("div", class_="entrybody"):  #Get all entry bodies
    for img in entry.find_all("img"):  #Get all img
        cnt += 1
        urllib.request.urlretrieve(img.attrs["src"], "./" + member_name + "/" + member_name + "-" + str(cnt) + ".jpeg")
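Note that img.attrs["src"] raises a KeyError if an img tag has no src attribute, and urlretrieve needs an absolute URL. The following is a minimal sketch of a guard for both cases, assuming that some src values on the page might be missing or relative (I have not checked this against the actual blog). The function name resolve_image_urls and the sample src values are made up for illustration.

```python
from urllib.parse import urljoin


def resolve_image_urls(page_url, srcs):
    """Turn raw src values into absolute URLs, skipping missing ones (hypothetical helper)."""
    urls = []
    for src in srcs:
        if not src:  # None or empty string: the img tag had no usable src
            continue
        # urljoin leaves absolute URLs unchanged and resolves relative ones
        # against the page URL.
        urls.append(urljoin(page_url, src))
    return urls


page_url = "http://blog.nogizaka46.com/manatsu.akimoto/"
# In the script, these values would come from img.get("src") for each img tag,
# which returns None instead of raising when the attribute is missing.
srcs = ["http://example.com/a.jpeg", "/img/b.jpeg", None]
print(resolve_image_urls(page_url, srcs))
```

Each URL in the returned list can then be passed to urllib.request.urlretrieve as before.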


Execution result:
Create folder
I have saved 22 images.