Scraping: Save website locally

A memo to myself. This is for backup purposes only, in preparation for something like a global power outage (solar electromagnetic waves? a solar magnetic storm?). I do it this way because microCMS does not have a backup function.

Code

import os
from urllib.request import urlretrieve

#URL for each article category
#base_url = "https://benzoinfojapan.org/patients-article/"
#base_url = "https://benzoinfojapan.org/doctors-article/"
base_url = "https://benzoinfojapan.org/medias-article/"

#Save destination file name prefix
#prefix = "patients-article"
#prefix = "doctors-article"
prefix = "medias-article"

#Upper limit of the article number for each category (values as of October 2020)
#  patients-article: 10
#  doctors-article:  26
#  medias-article:   13
last_num = 13

#Directory where HTML files are saved (created if it does not exist)
save_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), "html")
os.makedirs(save_dir, exist_ok=True)

num = 1
while num <= last_num:
    url = base_url + str(num)

    #Destination file path
    save_file = os.path.join(save_dir, prefix + str(num) + ".html")

    print("Download started: " + url)
    urlretrieve(url, save_file)

    #Skip a missing article number. The original note says the 22nd
    #doctors-article is missing; as written, this check skips number 12,
    #so adjust it to match the category being downloaded.
    if num != 11:
        num += 1
    else:
        num += 2
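
The hard-coded skip for the missing article could be replaced by catching the error at download time. This is only a minimal sketch, under the assumption that the site answers a missing article with an HTTP error status such as 404 (not verified here). The helper name `fetch_or_skip` and its `retrieve` parameter are hypothetical and not part of the original script.

```python
from urllib.error import HTTPError
from urllib.request import urlretrieve


def fetch_or_skip(url, save_file, retrieve=urlretrieve):
    """Save url to save_file; return False instead of raising on an HTTP error."""
    try:
        retrieve(url, save_file)
    except HTTPError as err:
        # Assumption: a missing article is answered with an error status (e.g. 404).
        print("Skipped " + url + " (HTTP " + str(err.code) + ")")
        return False
    return True
```

With a helper like this, the loop could call `fetch_or_skip(url, save_file)` in place of `urlretrieve(url, save_file)` and simply increase `num` by 1 each time.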

How to use

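Run the above code three times, once for each of the three categories, changing the parameters each time.

Only three things change between runs: base_url, prefix, and the upper limit of the article number in the while loop.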
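
The three manual runs could also be folded into a single run. The following is a minimal sketch, not the script used above: the `CATEGORIES` table and the helper names `article_jobs` and `download_all` are hypothetical. The upper limits are the October 2020 values from the code comments, and the skip entry assumes the note about the missing 22nd doctors-article is accurate.

```python
import os
from urllib.request import urlretrieve

BASE = "https://benzoinfojapan.org/"

# Hypothetical table: category -> (last article number, numbers to skip).
# Limits are the October 2020 values; the skip entry follows the note that
# the 22nd doctors-article is missing. Adjust both to the live site.
CATEGORIES = {
    "patients-article": (10, set()),
    "doctors-article": (26, {22}),
    "medias-article": (13, set()),
}


def article_jobs(categories, save_dir):
    """Return (url, save_path) pairs for every article number not skipped."""
    jobs = []
    for prefix, (last_num, skip) in categories.items():
        for num in range(1, last_num + 1):
            if num in skip:
                continue
            url = BASE + prefix + "/" + str(num)
            path = os.path.join(save_dir, prefix + str(num) + ".html")
            jobs.append((url, path))
    return jobs


def download_all(categories, save_dir):
    """Download every page listed in the table into save_dir."""
    os.makedirs(save_dir, exist_ok=True)
    for url, path in article_jobs(categories, save_dir):
        print("Downloading " + url)
        urlretrieve(url, path)
```

Calling `download_all(CATEGORIES, "html")` would then save all three categories in one pass, using the same file naming as the script above.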

Result

Each page is saved as an HTML file on your local drive.

That's all.
