This article is the Day 1 entry of the Tokyo City University Advent Calendar 2019. https://adventar.org/calendars/4282
Hello! My name is Ojie (@920oj) and I study information systems at Tokyo City University. I set up an Advent Calendar for Tokyo City University this year, so I am kicking it off myself with "The story of making a LINE bot that informs you of a 100 yen breakfast at university with Python"!
Our school has a great system that lets you eat breakfast for 100 yen. I moved to Tokyo from the countryside and live alone, so it is a lifesaver when money is tight.
[Photo: all of this for 100 yen]
The menu is posted on the cafeteria website every morning, but the website is a tricky one: you have to go through the app or the school portal to access it.
Therefore, I decided to scrape the cafeteria website automatically every morning and use LINE Notify to send the day's 100 yen breakfast to LINE.
Environment: Python v3.7.1, pip 19.1.1, Windows 10 v1903
Source code: Tokyo City University Yokohama Campus Student Cafeteria 100 yen Breakfast notification bot https://github.com/920oj/TCU-YC-Breakfast-Notify-Bot
The bot sends a notification every morning at 7:30 (only on days when a 100 yen breakfast is served)!
First, fetch the school cafeteria website with requests and parse it with Beautiful Soup. Since you need to log in to browse the cafeteria website, my plan was to send a single POST to authenticate and then work with the resulting session ID.
However, on closer inspection there is no real authentication step: the site seems to authenticate simply by reading the ID and password written in plain text in the cookie. (Is that okay?)
(Normally you would authenticate once with a POST, the server would tie the session ID to that successful login, and later requests would go through because of it. For some reason this site works without the POST. It works, so fine. (Although it really is not fine.))
In any case, visiting the login page hands out a "session key" and a "session ID", so the first step is to obtain them.
import sys
from time import sleep

import requests
from bs4 import BeautifulSoup


def get_sessionid():
    # Fetch the login page once to receive the initial cookies
    r = requests.get('https://livexnet.jp/local/default.asp')
    first_access_cookie = str(r.headers['Set-Cookie'])
    # Cut out "ASPSESSIONID + 8 uppercase letters" (the key) and its value (24 uppercase letters)
    asp_session = first_access_cookie[first_access_cookie.find("ASPSESSIONID"):first_access_cookie.find("; secure")]
    # Split "KEY=VALUE" at the first "="
    asp_session_key, _, asp_session_id = asp_session.partition("=")
    return asp_session_key, asp_session_id
The framework seems to be ASP-based (possibly ASP.NET). The cookie name is ASPSESSIONID followed by eight uppercase letters, so the function captures the full name as well. The function returns two values, asp_session_key and asp_session_id, as a tuple.
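The string slicing above can also be expressed with a regular expression. The following is a minimal sketch, assuming the cookie name is always ASPSESSIONID plus eight uppercase letters and the value consists only of uppercase letters, as described above. The function name parse_asp_session and the sample header are hypothetical and not part of the original script.

```python
import re

# Cookie name: "ASPSESSIONID" followed by 8 uppercase letters; value: uppercase letters
ASP_SESSION_PATTERN = re.compile(r"(ASPSESSIONID[A-Z]{8})=([A-Z]+)")


def parse_asp_session(set_cookie_header):
    """Return (cookie_name, cookie_value), or None if no session cookie is present."""
    match = ASP_SESSION_PATTERN.search(set_cookie_header)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Made-up header value, for illustration only
sample = "ASPSESSIONIDABCDEFGH=ABCDEFGHIJKLMNOPQRSTUVWX; secure; path=/"
print(parse_asp_session(sample))
```

Returning None when nothing matches lets the caller notice a changed cookie format, instead of continuing with a wrongly sliced string.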
def get_breakfast_info(key, id):
    # Cookies the site expects (the values may change in the future).
    # SITE_ID, SITE_PASS and today_data are module-level values not shown here.
    site_cookies = {
        key: id,
        'KCD': '02320',
        'company_id': SITE_ID,
        'company_pw': SITE_PASS,
        'wrd': 'jp',
        'dip': '0',
        'ink': 'a',
        'bcd': '02320',
        'val': 'daily'
    }
    # Access the menu / nutrition table page
    url = 'https://reporting.livexnet.jp/eiyouka/menu.asp?val=daily&bcd=02320&ink=a&col=&str=' + today_data
    r = requests.get(url, cookies=site_cookies)
    r.encoding = r.apparent_encoding
    # Strip <br> tags so the paragraph holds a single text node, then parse the HTML
    all_html = r.text.replace('<br>', '')
    souped_html = BeautifulSoup(all_html, 'lxml')
    try:
        breakfast = souped_html.find('p', class_="img_comment6").string
        return breakfast
    except AttributeError:
        # find() returned None: the page has no breakfast entry
        return False
Use Chrome's developer tools to see which cookies the site sets.
Once you know them, prepare the same cookies as a dictionary so that they can be passed to requests.
Here key is the session key obtained earlier and id is the session ID. (These are poor variable names. They should be more descriptive.)
As mentioned earlier, the authentication ID and password appear to be read in plain text from the cookie (?), so SITE_ID and SITE_PASS are set as they are.
(Doesn't that make fetching the session ID pointless...? Please let me know if you know the details.)
After that, all you have to do is parse the HTML with lxml and extract the p element with the class "img_comment6"!
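The extraction step can be tried offline without any third-party package. The following is a minimal sketch that uses the standard library's html.parser in place of Beautiful Soup and lxml. The names CommentExtractor and extract_breakfast and the sample HTML are hypothetical; only the class name img_comment6 comes from the code above.

```python
from html.parser import HTMLParser


class CommentExtractor(HTMLParser):
    """Collect the text inside the first <p class="img_comment6"> element."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value can be None
        classes = (dict(attrs).get("class") or "").split()
        if tag == "p" and "img_comment6" in classes and not self._done:
            self._inside = True

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            self._done = True

    def handle_data(self, data):
        if self._inside:
            self.parts.append(data)


def extract_breakfast(html):
    """Return the menu text, or None when the element is missing or empty."""
    parser = CommentExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts).strip()
    return text or None


# Made-up HTML fragment, for illustration only
sample_html = '<div><p class="img_comment6">Grilled fish set</p></div>'
print(extract_breakfast(sample_html))
```

Like the original function, this reports a missing element with a falsy value, so the caller can skip the notification on days without a 100 yen breakfast.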
def post_line(result):
    # Build the notification text
    post_data = "Today's (" + today_data + ") 100 yen breakfast is " + result + "."
    # LINE Notify authenticates with a bearer token in the Authorization header
    line_api_headers = {"Authorization": "Bearer " + LINE_TOKEN}
    line_payload = {"message": post_data}
    r = requests.post(LINE_API_URL, headers=line_api_headers, params=line_payload)
    return r.status_code
All that is left is to send it to LINE Notify. With LINE Notify, you put the access token in the Authorization header, put the message in the request parameters, and send a POST to the API endpoint. The message is then delivered to the chat you configured in advance.
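To see what such a request looks like without sending anything, it can be built with the standard library. This is a minimal sketch under some assumptions: it uses urllib.request in place of requests, it assumes LINE_API_URL refers to the LINE Notify endpoint https://notify-api.line.me/api/notify, and it puts the message in the form body rather than in the query string as the function above does. The function name build_notify_request and the token value are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed value of LINE_API_URL (not shown in the article)
NOTIFY_URL = "https://notify-api.line.me/api/notify"


def build_notify_request(token, message):
    """Build (but do not send) a POST request carrying the token and message."""
    body = urlencode({"message": message}).encode("utf-8")
    headers = {"Authorization": "Bearer " + token}
    # Passing data makes urllib treat the request as a POST
    return Request(NOTIFY_URL, data=body, headers=headers)


req = build_notify_request("dummy-token", "hello")
print(req.get_method(), req.full_url)
print(req.get_header("Authorization"))
print(req.data)
```

Building the request separately from sending it makes it easy to check the header and body in a test before any real token is involved.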
def main():
    print('Tokyo City University 100 Yen Breakfast Menu Display Program by 920OJ')
    print('Today is ' + today_data + '.')
    session_key, session_id = get_sessionid()
    print('Obtained initial credentials: ' + session_key + ' = ' + session_id + '. Waiting 3 seconds ...')
    sleep(3)
    result = get_breakfast_info(session_key, session_id)
    if not result:
        print('Information could not be obtained. The 100 yen breakfast may not be available today.')
        sys.exit()
    print("Today's 100 yen breakfast is " + result + ". Sending a notification to LINE.")
    post_status = post_line(result)
    if post_status == 200:
        print('The LINE notification was successful. Exiting the program.')
    else:
        print('The LINE notification failed. The response status was ' + str(post_status) + '. Exiting the program.')


if __name__ == "__main__":
    main()
After that, the main process simply chains together the functions created earlier.
Finally, write `if __name__ == "__main__":` so that the process does not run on its own if this program is imported from somewhere else. This was the first time I learned how it works: `__name__` is set to "__main__" only when the file is executed directly, so main() is called only in that case.
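The effect of the guard can be checked with a tiny module. This is a minimal sketch in which the file name greeter.py is hypothetical.

```python
def main():
    # Return a value so the behaviour is easy to check
    return "main() ran in module: " + __name__


if __name__ == "__main__":
    # Reached only when the file is executed directly, e.g. `python greeter.py`
    print(main())
```

Running the file directly prints a line ending in `__main__`, while `import greeter` prints nothing because the guarded block is skipped.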
I host it on a Lightsail instance (VPS) that I rent and run it regularly with cron. The script fetches the day's breakfast as soon as it starts, so cron runs it every morning at 7:30.
To tell the truth, I wrote this code in April, so looking at it now there are poorly chosen variable names and sloppy parts in the implementation. I would like to rewrite it when I have a little more free time.
Tomorrow's article is by K (@ke_odakyu9000)! Looking forward to it!
https://adventar.org/calendars/4282