Let's get notified of the weather in your favorite area from Yahoo Weather on LINE! ~ Part 2 ~

Recap of last time

In the previous article, "Let's get notified of the weather in your favorite area from Yahoo Weather on LINE!", I explained how to get the URLs for each region of the country from Yahoo Weather.

This time

In this Part 2, I will briefly explain how to go from the regional URLs to getting the URLs of the detailed areas (cities, wards, towns, and villages).

First, here are the URLs I got last time. [Image: yahooURL.png]

The page behind the first of these URLs looks like this: [Image: yahoo天気.png] You can easily see the weather for the main cities, wards, towns, and villages that belong to the region.

If you click "Wakkanai" on this screen, the detailed weather and precipitation probability for Wakkanai are displayed. [Image: Wakkanai.png]

There are two things to do from here:
・ "Getting the names and URLs of the regions and municipalities"
・ "Getting weather information from each municipality's URL"

Program implementation

From here, I will explain how to actually get the information. First, "getting the names and URLs of the regions and municipalities".

The program is as follows.

with open("yahooChiku.csv", "r", encoding="utf-8") as readChikuNum:
    reader         = csv.reader(readChikuNum)
    with open("shosaiChiku.csv", "w", encoding="cp932", newline="") as schiku:
        writer     = csv.writer(schiku)
        column     = ["Rural", "Municipality", "URL"]
        writer.writerow(column)
        for target_url in reader:
            res    = requests.get(target_url[0])
            soup   = BeautifulSoup(res.text, 'lxml')
            chiku  = re.search(r".*of", str(soup.find("title").text)).group().strip("of")
            elems  = soup.find_all("a")
            chikuList, shosaiNumList = [], []
            chikuNameList = [chikuName.get_text() for chikuName in soup.find_all(class_= "name")]
            for e in elems:
                if re.search(r'data-ylk="slk:prefctr', str(e)):
                    if re.search(r'"https://.*html"', str(e)):
                        row = re.search(r'"https://.*html"', str(e)).group().strip('"')
                        chikuList.append(chiku)
                        shosaiNumList.append(row)

            for p, e, c in zip(chikuList, chikuNameList, shosaiNumList):
                writeList = [p, e, c]
                writer.writerow(writeList)

The first with open reads the URL file, and the second with open opens the file that each region, municipality, and URL will be written to. Next, the HTML is stored in soup and the necessary information is extracted in turn. The region name is assigned to chiku, trimmed with a regular expression so that only the name itself remains. elems stores all the a tags found with find_all; these are where the municipality URLs come from.
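To make the title parsing concrete, here is a minimal sketch of that one step; the title string below is a hypothetical example of what a regional page's title looks like.

import re

# Hypothetical page title for a regional page
title = "道北（稚内）の天気 - Yahoo!天気・災害"
# Match everything up to the particle "の", then strip the trailing "の"
chiku = re.search(r".*の", title).group().strip("の")
print(chiku)  # 道北（稚内）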

This is where the variables written to the file come into play. chikuNameList uses a list comprehension to collect every element of the regional page whose class is "name"; fortunately, all the municipality names are inside "name" elements. As for the for loop: the municipality URLs are found in tags carrying data-ylk="slk:prefctr", so that is the condition in the first if statement. Since those tags also hold data other than municipality URLs, a second regular expression search keeps only the entries that match the URL format. The region name is then appended to chikuList and the municipality URL to shosaiNumList.
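To illustrate the two if conditions, here is a small sketch run against a made-up anchor tag in the shape described above (the href and link text are hypothetical values):

import re
from bs4 import BeautifulSoup

# A made-up anchor in the shape the regional pages use (hypothetical values)
html = '<a href="https://weather.yahoo.co.jp/weather/jp/1b/1100.html" data-ylk="slk:prefctr">稚内</a>'
e = BeautifulSoup(html, "lxml").find("a")
if re.search(r'data-ylk="slk:prefctr', str(e)):      # 1) keep only prefctr links
    match = re.search(r'"https://.*html"', str(e))   # 2) keep only ones in URL format
    if match:
        print(match.group().strip('"'))  # https://weather.yahoo.co.jp/weather/jp/1b/1100.html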

In the last for statement, the region names, municipality names, and URLs stored in the lists are written to "shosaiChiku.csv" line by line.

And the resulting file looks like this: [Image: shosaiChiku.png]

It would be possible to access each municipality's URL as-is and pull out the desired data with regular expressions and scraping, but I noticed that there is an RSS feed, so I decided to add that as well.

import csv
import re

import pandas as pd

df = pd.read_csv("shosaiChiku.csv", encoding="cp932")
with open("dataBase.csv", "w", encoding="cp932", newline="") as DBcsv:
    writer = csv.writer(DBcsv)
    # Write the header
    columns = ["Rural", "Municipality", "URL", "RSS"]
    writer.writerow(columns)

    # Write the data (region, municipality, URL, RSS) line by line
    for place, city, url in zip(df["Rural"], df["Municipality"], df["URL"]):
        row = [place, city, url]
        rssURL = "https://rss-weather.yahoo.co.jp/rss/days/"
        # Take "number.html" from the URL and reshape it into "number.xml" for RSS
        url_pattern = re.search(r"\d*\.html", url).group()
        url_pattern = url_pattern.replace("html", "xml")
        rssURL = rssURL + url_pattern
        row.append(rssURL)
        writer.writerow(row)

Almost everything here is the same as the previous source. Most of the data is already in shosaiChiku.csv, and all that has to be added is the RSS URL, so it's only a small addition. (I changed my mind and tried pandas's read_csv this time.) The base of the RSS URL is the string "https://rss-weather.yahoo.co.jp/rss/days/" stored in rssURL. What the program does is first read shosaiChiku.csv line by line to get the region, municipality, and URL. I noticed that the part after "days/" in an RSS URL is the same as the number part of the municipality's URL, so the next step extracts just that number part with a regular expression. Also, RSS uses ".xml" rather than ".html", so the extension is converted. Now that the RSS URL is known, it is appended to the row and written out.
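As a quick check of that conversion, here is the reshaping step on its own, using a hypothetical municipality URL:

import re

# Hypothetical municipality URL
url = "https://weather.yahoo.co.jp/weather/jp/1a/1100.html"
num = re.search(r"\d*\.html", url).group()  # "1100.html"
rss = "https://rss-weather.yahoo.co.jp/rss/days/" + num.replace("html", "xml")
print(rss)  # https://rss-weather.yahoo.co.jp/rss/days/1100.xml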

Here is the resulting file. [Image: database.png] It's hard to read, since you won't be opening it directly anyway, but we now have the data needed to do what we want. (When I have time, I plan to use sqlite to make it more like a proper database.)
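As for that sqlite idea, here is a minimal sketch of what loading dataBase.csv might look like; the database file name, table name, and the municipality being queried are all hypothetical:

import sqlite3

import pandas as pd

# Load the CSV produced above into a sqlite table
# ("weather.db" and "areas" are hypothetical names)
df = pd.read_csv("dataBase.csv", encoding="cp932")
with sqlite3.connect("weather.db") as conn:
    df.to_sql("areas", conn, if_exists="replace", index=False)
    # Example lookup: the RSS URL for one municipality (hypothetical value)
    row = conn.execute(
        'SELECT "RSS" FROM areas WHERE "Municipality" = ?', ("稚内",)
    ).fetchone()
    print(row)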

In conclusion

I wrote quite a lot, and this has gotten long, so of the two tasks I'll stop here at "getting the names and URLs of the regions and municipalities". In the next update, I hope to explain how to get the weather information and send it on LINE.

See you next time.
