I am a new graduate (class of 2020) working in IT. This is my first post on Qiita, so there may be some rough edges, but thank you for reading.
I got hooked on making bots with the Discord API, so as scraping practice I built the bot described in the title.
Windows
VPS
I will skip how to obtain the bot token, since many other articles already cover it. Install the libraries used here with the following commands:
pip install beautifulsoup4
pip install requests
pip install discord
The site scraped this time is the Tokyo Metropolitan Government's COVID-19 countermeasures site.
First of all, let's verify whether you can get the data you want with Beautiful Soup. The source code is as follows.
soup.py
import requests
from bs4 import BeautifulSoup
# Fetch the page with a GET request
res = requests.get("https://stopcovid19.metro.tokyo.lg.jp/cards/number-of-confirmed-cases/")
# Parse the HTML into a BeautifulSoup object
soup = BeautifulSoup(res.text, "html.parser")
# Find the element by tag and class, then use extract() to remove the unit part we don't need
con = soup.find("span", class_="DataView-DataInfo-summary")
con.find("small", class_="DataView-DataInfo-summary-unit").extract()
text = con.get_text()
# Strip the surrounding whitespace
text = text.strip()
print(text + " people")
When I ran this, the number was retrieved correctly and the surrounding whitespace was removed.
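To see what extract() is doing here, the following sketch runs the same steps against a small hand-written HTML snippet instead of the live site. The snippet, the placeholder figure 123 in it, and the helper name extract_count are made up for illustration; the real page's markup may differ.

```python
from bs4 import BeautifulSoup

# Hand-written stand-in for the relevant part of the page (an assumption,
# not copied from the real site). 123 is a placeholder, not real data.
SAMPLE_HTML = """
<span class="DataView-DataInfo-summary">
  123
  <small class="DataView-DataInfo-summary-unit">人</small>
</span>
"""


def extract_count(html):
    """Return the summary text without its unit, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    summary = soup.find("span", class_="DataView-DataInfo-summary")
    if summary is None:
        # The page layout changed or the request returned something else
        return None
    unit = summary.find("small", class_="DataView-DataInfo-summary-unit")
    if unit is not None:
        # extract() detaches the <small> tag from the tree, so the
        # following get_text() no longer includes the unit
        unit.extract()
    return summary.get_text().strip()


print(extract_count(SAMPLE_HTML))
```

Returning None when the element is not found lets the caller skip posting instead of crashing if the site's layout changes.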
Next, we will incorporate this into the discord bot program.
Using the loop feature (tasks.loop) included in discord.py, I created a program that runs once every 60 seconds and, at the specified time, scrapes the site and posts the result to a text channel. The source is as follows.
bot.py
#coding:UTF-8
import discord
from discord.ext import tasks
from datetime import datetime
import requests
from bs4 import BeautifulSoup
TOKEN = "hoge"  # Bot token (replace hoge with your own)
CHANNEL_ID = hoge  # Channel ID (replace hoge with the integer ID)
# Create the object needed for the connection
client = discord.Client()
# Loop once every 60 seconds
@tasks.loop(seconds=60)
async def loop():
    now = datetime.now().strftime('%H:%M')
    if now == '20:05':
        await client.wait_until_ready()
        channel = client.get_channel(CHANNEL_ID)
        # The previous code (soup.py)
        res = requests.get("https://stopcovid19.metro.tokyo.lg.jp/cards/number-of-confirmed-cases/")
        soup = BeautifulSoup(res.text, "html.parser")
        con = soup.find("span", class_="DataView-DataInfo-summary")
        con.find("small", class_="DataView-DataInfo-summary-unit").extract()
        text = con.get_text()
        text = text.strip()
        await channel.send("The number of infected people in Tokyo today is " + text + " people")
# Bonus part: reply with a comment in response to a specified word
@client.event
async def on_message(message):
    # Ignore the message if the sender is a bot
    if message.author.bot:
        return
    # If someone says "Shinjuku", reply with "Dense"
    if message.content == 'Shinjuku':
        await message.channel.send('Dense')
    if message.content == 'Edogawa Ward':
        await message.channel.send('Dense')
# Start the loop
loop.start()
# Launch the bot
client.run(TOKEN)
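The loop above compares the current time as an 'HH:MM' string, so if one iteration happens to be delayed past that minute, the post for the day would be skipped. One possible way around this, shown here only as a sketch, is to remember the date of the last post and send once the target time has passed. The names should_post and last_posted are hypothetical and are not part of bot.py.

```python
from datetime import date, datetime, time


def should_post(now, last_posted, target=time(20, 5)):
    """Return True if the target time has passed and nothing was posted today.

    now         -- current local datetime
    last_posted -- date of the most recent post, or None if never posted
    target      -- time of day after which the daily post is due
    """
    return now.time() >= target and last_posted != now.date()


# Example of how the loop body could use it
last_posted = None
now = datetime(2020, 5, 1, 20, 5, 30)
if should_post(now, last_posted):
    # ... scrape and send the message here ...
    last_posted = now.date()
```

A side effect of this design is that a bot started after the target time posts right away on its first iteration, which may or may not be what you want.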
The time is set to 20:05 because I assume the number of infected people on this site is updated at 20:00.
Execution result
I built a Python environment with
docker run -it --name hoge python /bin/bash
then went inside the container with
docker exec -it hoge bash
and started working.
First, install the required libraries with pip.
pip install beautifulsoup4
pip install requests
pip install discord
I can't edit files without vim, so I install it with
apt-get update
apt-get install vim
Because the default time zone is UTC, change it to Tokyo time with this command:
ln -sf /usr/share/zoneinfo/Asia/Tokyo /etc/localtime
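As an alternative to changing the container's time zone, the time could also be converted in the code itself. This is a minimal sketch, assuming a fixed UTC+9 offset is acceptable (Japan does not observe daylight saving time). The helper name tokyo_hhmm is hypothetical and is not part of bot.py.

```python
from datetime import datetime, timedelta, timezone

# Japan Standard Time as a fixed offset from UTC
JST = timezone(timedelta(hours=9), "JST")


def tokyo_hhmm(moment=None):
    """Return the time in Tokyo as an 'HH:MM' string.

    moment -- a timezone-aware datetime; defaults to the current time
    """
    if moment is None:
        moment = datetime.now(timezone.utc)
    return moment.astimezone(JST).strftime("%H:%M")


# 11:05 UTC corresponds to 20:05 in Tokyo
print(tokyo_hhmm(datetime(2020, 5, 1, 11, 5, tzinfo=timezone.utc)))
```

With this approach the comparison against '20:05' gives the same result regardless of how the server or container clock is configured.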
After that, create a suitable folder, create bot.py in it, and paste in the source. That completes the setup.
If you start it with python bot.py, the bot comes online on Discord and reports today's number of COVID-19 infections in Tokyo at the specified time!
This time I simply assumed that the site is updated at 20:00 and hard-coded the time, but ideally the bot would scrape and report as soon as the site is updated. I think this could be done by comparing, in each loop iteration, the newly fetched data with the data from the previous iteration, and having the bot report when the two differ. If I manage to improve it, I will write it up in another article.
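The comparison idea might look something like the following. This is only a sketch of the idea and not something the current bot does. The class name ChangeDetector is hypothetical, and treating the very first value as a baseline rather than as a change is my own assumption.

```python
class ChangeDetector:
    """Remember the last scraped value and tell whether a new one differs."""

    def __init__(self):
        self._last = None  # nothing scraped yet

    def update(self, value):
        """Store value and return True only if it differs from the previous one.

        A value of None (for example, a failed scrape) is ignored.
        The first real value only sets the baseline and returns False.
        """
        if value is None:
            return False
        changed = self._last is not None and value != self._last
        self._last = value
        return changed


# Example: values as they might be scraped on successive loop iterations
detector = ChangeDetector()
for scraped in ["100", "100", "120"]:
    if detector.update(scraped):
        print("Site updated, new value:", scraped)
```

In the bot, update() would be called with the scraped text on every loop iteration, and the message would be sent only when it returns True.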