These days, everyone has probably run into "communication restrictions" (data throttling). It's painful when you can't use your smartphone freely on the go. ~~You could just raise your monthly data cap~~
Even so, there are times on the way to work or school when I want to look something up, even if it's just the weather. It's a problem if I can't find out right away because my connection is throttled.
Plain text still gets through reasonably well, though ... So why not get the information as a LINE message?!
So my final goal is this: if I send a message about the first and last trains on LINE, I get the information I want back as a reply. That's why I started development.
Maybe a text message is even faster than opening an app ...
First, collect information on each station!!
First of all, as scraping practice, I'd like to build a system that exchanges messages on LINE and reports the weather for each area.
For now, let's scrape the information from Yahoo! Weather.
import csv
import re

import requests
from bs4 import BeautifulSoup
To do this, use requests to fetch the URL from Python. Then import BeautifulSoup from bs4 to extract data from the HTML or XML that was fetched. The code further down also uses the standard csv and re modules.
Next, the Yahoo! Weather URL is https://weather.yahoo.co.jp/weather/. This page shows the weather for the whole country at a glance, and from it you can drill down into the area you want to know about.
First, get the URL of each region from the nationwide page.
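As a rough illustration of what BeautifulSoup does once a page has been fetched, here is a minimal sketch that parses a small HTML string instead of a real response. The HTML, the example.com links, and the variable names are made up for this example. It uses the built-in "html.parser" so it runs without lxml.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page (res.text in the article).
sample_html = """
<html><body>
  <a href="//example.com/a/">Area A</a>
  <a href="//example.com/b/">Area B</a>
  <p>No link here</p>
</body></html>
"""

# "html.parser" ships with Python; the article itself uses the 'lxml' parser.
soup = BeautifulSoup(sample_html, "html.parser")

# find_all("a") returns every <a> tag in document order.
links = soup.find_all("a")

# .get("href") reads an attribute, .get_text() reads the visible text.
hrefs = [a.get("href") for a in links]
texts = [a.get_text() for a in links]

print(hrefs)
print(texts)
```

With a real page, the only difference is that the string comes from requests.get(...).text instead of a literal.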
target_url = "https://weather.yahoo.co.jp/weather/"
res = requests.get(target_url)
soup = BeautifulSoup(res.text, 'lxml')
elems = soup.find_all("a")  # every <a> tag on the nationwide page
with open("yahooChiku.csv", "w", encoding="utf-8", newline="") as ychiku:
    writer = csv.writer(ychiku)
    for e in elems:
        chikuNumList = []
        # Only look at links that carry a data-ylk attribute
        if re.search(r'<a data-ylk="', str(e)):
            # Keep only links that point to a regional weather page
            if re.search(r'"//weather.yahoo.co.jp/weather/jp/\d.*/"', str(e)):
                row = re.search(r'"//weather.yahoo.co.jp/weather/jp/\d.*/"', str(e)).group().strip('"')
                row = "https:" + row  # the link in the tag has no scheme
                chikuNumList.append(row)
                writer.writerow(chikuNumList)  # one URL per CSV row
Please look up the various ways to use Beautiful Soup yourself. I only use it on an ad hoc basis, so my knowledge is limited ...
What this source code does is extract the URL of each region from the nationwide weather page. You can get the tags that contain the regional URLs with soup.find_all("a"). However, these tags contain other information as well, so use regular expressions to pick out only the ones that include a region URL. Since "https:" is not included in the tag, add "https:" at the beginning, then write the result to a CSV file and save it.
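The filtering step can be tried without touching the network. The sketch below runs the same idea against a few hand-written tags. The sample tags, the data-ylk values, the function name extract_region_urls, and the slightly stricter pattern (escaped dots, no match across a closing quote) are assumptions for illustration, not what the real page contains.

```python
import csv
import io
import re

# Hypothetical tags imitating the structure described in the article.
sample_tags = [
    '<a data-ylk="slk:area" href="//weather.yahoo.co.jp/weather/jp/1a/">Region A</a>',
    '<a data-ylk="slk:area" href="//weather.yahoo.co.jp/weather/jp/13/">Region B</a>',
    '<a data-ylk="slk:news" href="//news.yahoo.co.jp/">News</a>',
    '<a href="//weather.yahoo.co.jp/weather/jp/27/">no data-ylk</a>',
]

# Quoted, scheme-less link to a regional page: /weather/jp/<digit...>/
REGION_HREF = re.compile(r'"//weather\.yahoo\.co\.jp/weather/jp/\d[^"]*/"')


def extract_region_urls(tags):
    """Return full https URLs for tags that look like regional links."""
    urls = []
    for tag in tags:
        # Same first check as the article: the tag must start with data-ylk.
        if '<a data-ylk="' not in tag:
            continue
        match = REGION_HREF.search(tag)
        if match:
            # Drop the surrounding quotes and add the missing scheme.
            urls.append("https:" + match.group().strip('"'))
    return urls


region_urls = extract_region_urls(sample_tags)

# Write one URL per row, to an in-memory buffer instead of yahooChiku.csv.
buffer = io.StringIO(newline="")
writer = csv.writer(buffer)
for url in region_urls:
    writer.writerow([url])
csv_text = buffer.getvalue()

print(region_urls)
```

Only the first two sample tags survive: the third is not a regional link, and the fourth has no data-ylk attribute.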
This is getting long, so that's it for now.
Now that I have the region URLs, next time I'd like to write about getting the URLs of each city, ward, town, and village, getting the name of each region, and simply sending a message on LINE.
See you next time!!