Scraping the list of Go To EAT member stores in Niigata prefecture and converting it to CSV

Previous post:

Convert PDF of list of Go To EAT member stores in Niigata prefecture to CSV https://qiita.com/barobaro/items/74fb5bdedbf1ae7267a0

This time I couldn't find a PDF, so I scraped the website to build the list.

Scraping

import re
import time

import requests
from bs4 import BeautifulSoup

url = "https://niigata-gte.com/shop/"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko"
}

result = []

while True:

    r = requests.get(url, headers=headers)
    r.raise_for_status()

    soup = BeautifulSoup(r.content, "html.parser")

    for shop in soup.select("div#result > div.cont"):

        data = {}

        data["Dealer code"] = (
            shop.select_one("div.no").get_text(strip=True).split(":", 1)[-1]
        )

        span = shop.select("div.tag > span")

        data["area"] = span[0].get_text(strip=True)
        data["Genre"] = span[1].get_text(strip=True)

        if len(span) > 2:
            temp = {i.get("alt"): "○" for i in span[2].select("img")}
            data.update(temp)

        h4 = shop.select_one("h4")

        data["Store name"] = h4.get_text(strip=True)

        if h4.select_one("a"):

            link = h4.a.get("href")

            if link:
                data["home page"] = link

        p_add = shop.select_one("p.add").contents

        postcode, address = p_add[0].split(sep=None, maxsplit=1)

        # Extract latitude / longitude from the Google Maps link
        gps = re.search(r"(?<=@)(.+?),(.+?)(?=,\d{1,2}z)", p_add[1].a.get("href"))

        if gps:
            data["latitude"] = float(gps.group(1))
            data["longitude"] = float(gps.group(2))

        data["Postal code"] = postcode.strip()
        data["location"] = address.strip()

        data["phone number"] = shop.select_one("p.tel").get_text(strip=True)

        result.append(data)

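    # Follow the "next" link; its URL is embedded in the onclick attribute
    tag = soup.select_one("li.next")

    if tag is None or tag.a is None:
        break

    m = re.search(
        r"https://niigata-gte\.com/shop/page/\d+/", tag.a.get("onclick", "")
    )

    # Stop when no next-page URL is found, rather than refetching the same page
    if m is None:
        break

    url = m.group(0)

    # Wait between requests to avoid overloading the server
    time.sleep(3)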

result
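
The two regular expressions in the scraper are easier to check in isolation. The following is a minimal sketch, not part of the original script: the helper names `extract_gps` and `extract_next_url`, the sample Google Maps link, and the sample `onclick` value are all made up for illustration, and the real site's attributes may look different.

```python
import re

# Same idea as in the scraper: capture "lat,lng" between "@" and the zoom suffix (e.g. ",17z").
GPS_PATTERN = re.compile(r"(?<=@)(.+?),(.+?)(?=,\d{1,2}z)")

# Next-page URL embedded in an onclick attribute (dots escaped, raw string).
NEXT_PATTERN = re.compile(r"https://niigata-gte\.com/shop/page/\d+/")


def extract_gps(href):
    """Return (latitude, longitude) as floats, or None if the link has no "@lat,lng,zoom" part."""
    m = GPS_PATTERN.search(href)
    if m is None:
        return None
    return float(m.group(1)), float(m.group(2))


def extract_next_url(onclick):
    """Return the next-page URL found in the onclick text, or None if there is none."""
    m = NEXT_PATTERN.search(onclick or "")
    return m.group(0) if m else None


# Hypothetical inputs, invented for this example.
sample_map = "https://www.google.com/maps/place/Example/@37.9161,139.0364,17z/data=abc"
sample_onclick = "location.href='https://niigata-gte.com/shop/page/2/'"

print(extract_gps(sample_map))
print(extract_gps("https://www.google.com/maps?q=Example"))
print(extract_next_url(sample_onclick))
print(extract_next_url("return false;"))
```

Returning `None` when nothing matches lets the caller decide whether to skip the coordinates or stop paging, instead of raising an `AttributeError` on a failed match.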

Convert to CSV

import pandas as pd

df = pd.DataFrame(result)

df.index += 1

df.to_csv("niigata.csv", encoding="utf_8_sig")
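
Two details of the CSV step are worth seeing on a small scale: stores do not all share the same columns (the "○" flags exist only for some of them), and the `utf_8_sig` encoding prepends a byte order mark, which Excel typically uses to recognize the file as UTF-8. The sketch below is an illustration using only the standard `csv` module instead of pandas; the records and the column name `Takeout` are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical records; the second store has an extra column that the first lacks.
rows = [
    {"Store name": "Shop A", "area": "Niigata"},
    {"Store name": "Shop B", "area": "Nagaoka", "Takeout": "○"},
]

# Collect column names in first-seen order so every row fits the same header.
fieldnames = []
for row in rows:
    for key in row:
        if key not in fieldnames:
            fieldnames.append(key)

path = os.path.join(tempfile.mkdtemp(), "sample.csv")

# restval fills columns that a row does not have with an empty string.
with open(path, "w", encoding="utf_8_sig", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)

# Inspect the raw bytes: the file should start with the UTF-8 byte order mark.
with open(path, "rb") as f:
    raw = f.read()

has_bom = raw.startswith(b"\xef\xbb\xbf")
lines = raw.decode("utf_8_sig").splitlines()

print(has_bom)
print(lines)
```

pandas handles the missing columns in a similar way when building a `DataFrame` from a list of dicts, leaving missing values as `NaN`, which `to_csv` writes as empty fields by default.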
