As the title suggests, I wrote some Python code that automatically searches Twitter via the Twitter API and sends the information it finds to me with LINE Notify.
With the spread of the new coronavirus, the number of newly infected people is announced every day. Whether or not the number is really worth worrying about, I have developed a habit of opening Twitter between noon and evening, when the figure for Tokyo comes out, and searching over and over for a breaking-news tweet that could appear at any moment. (Personally, I think checking the preliminary figure on Twitter is the fastest way.) ➡︎ I want to get rid of this pointless manual searching!! That is the motivation for this project.
This implementation is specialized for the number of infected people in Tokyo, but it should be applicable to other things with only small changes.
- **Python execution environment**: Jupyter Notebook is used here.
- **Twitter API registration**: I registered by following "Summary of procedures from Twitter API registration (account application method) to approval (information as of August 2019)".
- **LINE Notify registration**: I used the method and code from "Send a message to LINE with Python" as is.
- **Python modules that need extra environment setup**
    - **tweepy**: `pip install tweepy` should be enough.
    - **mecab-python3**: You need to install MeCab itself and its dictionaries in addition to the Python module. "A story of struggling to introduce mecab-python3 on Mac" should help you set up the environment. Alternatively, on Google Colaboratory, simply running `!pip install mecab-python3==0.7` installs everything you need automatically, which may be easier. (A quick check that the setup works is shown right after this list.)
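To make sure the environment is ready before moving on, a minimal sanity check like the one below (an illustrative sketch; the Japanese test sentence is arbitrary) should print the installed tweepy version and a space-separated tokenization without errors.

```python
# Minimal environment check: both imports must succeed,
# and MeCab should return the sentence split into tokens.
import tweepy
import MeCab

tagger = MeCab.Tagger("-Owakati")
print(tweepy.__version__)                 # e.g. 3.x at the time of writing
print(tagger.parse("今日の東京の感染者数"))  # => tokens separated by spaces
```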
Preparing all of this may actually be the most troublesome part. Though it is fun to google "{what you want to do} python" and discover all kinds of tools lol
The implementation details are explained below while looking at the actual code.
**1. Various imports**
import requests
import datetime
import time
import pandas as pd
from IPython.display import clear_output  # used to clear previous print output
import tweepy
import MeCab
tagger = MeCab.Tagger("-Owakati")
The "tagger" in the last line is the object used when splitting sentences in MeCab.
Specifically, using tagger.parse
tagger.parse("It's nice weather today, is not it")
# => 'It's nice weather today, is not it\n'
tagger.parse("It's nice weather today, is not it").split()
# => ['today', 'Is', 'Good', 'weather', 'is', 'Ne']
As we will see later, this implementation uses `split()` to turn each tweet into a list of tokens before handling it.
**2. Create an object for Twitter access**

Create an object that holds the key and tokens obtained when registering for the Twitter API. Reference: "How to use tweepy ~ Part 1 ~ [Getting tweets]"
consumer_key = "Key obtained here/Enter token"
consumer_secret = "Same as above"
access_token = "Same as above"
access_token_secret = "Same as above"
auth = tweepy.OAuthHandler(consumer_key,consumer_secret)
auth.set_access_token(access_token,access_token_secret)
api = tweepy.API(auth)  # this api object is used later
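To check that the credentials work before building the rest, something like this should print a few recent tweets. This is only a sketch: it assumes the `api.search` endpoint of tweepy v3.x (renamed `api.search_tweets` in v4), and the search keyword here is just an example.

```python
# Sanity check: fetch a few recent Japanese tweets mentioning 東京 (Tokyo)
for tweet in tweepy.Cursor(api.search, q="東京", lang="ja").items(3):
    print(tweet.user.screen_name, ":", tweet.text[:50])
```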
**3. Create an object for LINE notification**

I used the code from the article quoted earlier, "Send a message to LINE with Python" (https://qiita.com/moriita/items/5b199ac6b14ceaa4f7c9), as is.
class LINENotifyBot:
    API_URL = 'https://notify-api.line.me/api/notify'

    def __init__(self, access_token):
        self.__headers = {'Authorization': 'Bearer ' + access_token}

    def send(
            self, message,
            image=None, sticker_package_id=None, sticker_id=None,
            ):
        payload = {
            'message': message,
            'stickerPackageId': sticker_package_id,
            'stickerId': sticker_id,
            }
        files = {}
        if image is not None:
            files = {'imageFile': open(image, 'rb')}
        r = requests.post(
            LINENotifyBot.API_URL,
            headers=self.__headers,
            data=payload,
            files=files,
            )
access_token_Notify = "Enter the token here"
bot_Notify = LINENotifyBot(access_token=access_token_Notify)
Now, calling `bot_Notify.send(message="xxxxx")` delivers a LINE notification to the account linked to the specified token.
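For example, a one-off test (with an arbitrary message) looks like this:

```python
# Send a test message to confirm the LINE Notify token is set up correctly
bot_Notify.send(message="Test notification from Python")
```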
**4. Create the automatic search function**

The basic idea is:

- Retrieve a fixed number of the latest tweets containing the keywords "Tokyo", "infection", and "{date}日" (the day of the month).
- Whenever a tweet contains the expression "n人" ("n people"), extract that n.
- Among the extracted values of n, the one that appears most often becomes the candidate for the number of infected people.
- Send a notification when the share of tweets containing that candidate exceeds a certain ratio.

This process is repeated at regular intervals.
Here is the final function to run, **auto_search**.
def auto_search(item=100, wait_time=60, rate=0.5):
    """
    item: number of tweets to retrieve
    wait_time: interval between automatic searches (in seconds)
    rate: required ratio of tweets containing the estimated number of infected people
    """
    d = datetime.datetime.now().day
    m = datetime.datetime.now().month
    print("searching on Twitter...")
    pre_mode = 0  # records the number that previously exceeded rate
    while True:
        df = find_infected_num(d, item)  # DataFrame of the n's taken from "n people" expressions
        num_mode = df.mode().values[0, 0]  # mode of df = candidate for the number of infected people
        count = df.groupby("num").size()  # number of tweets per value of n
        # notify if num_mode appears more often than rate and has not been notified before
        if count.max() > item*rate and num_mode != pre_mode:
            # print the result
            print("\n--RESULT--")
            print(count)
            # notify the result via LINE
            text = "{}/{}\nInfected people in Tokyo: {}\n* tweet ratio: {:.2f}%".format(m, d, num_mode, count.max()/item*100)
            bot_Notify.send(message=text)  # send to LINE
            # allow the search to continue if the result looks wrong
            if input("\ncontinue? y/n ") == "n":
                break  # finish
        waiting(2, wait_time, count)  # progress display during the waiting time
        # update pre_mode
        if count.max() > item*rate:
            pre_mode = num_mode
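The heart of the majority vote is the combination of `mode()` and `groupby("num").size()`. As a toy illustration with made-up values (188 happens to be the 7/19 figure mentioned later):

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by find_infected_num
toy = pd.DataFrame([188, 188, 0, 188, 290], columns=["num"])

print(toy.mode().values[0, 0])    # => 188: most frequent value = candidate count
print(toy.groupby("num").size())  # => tweets per value: 0 -> 1, 188 -> 3, 290 -> 1
# With item=5 and rate=0.5, count.max() = 3 > 5*0.5, so 188 would trigger a notification.
```

In practice the function is simply called as, for example, `auto_search(item=30)`, with the other arguments left at their defaults or overridden as needed.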
**find_infected_num** is the function that returns each n from the "n人" ("n people") expressions as a DataFrame. It uses the tagger prepared in step 1 and the api prepared in step 2.
def find_infected_num(d, item):
    num_list = []  # list to store each extracted n
    # search keywords: 感染 (infection), 東京 (Tokyo), "{d}日" (the day of the month)
    for tweet in tweepy.Cursor(api.search, q=['感染', '東京', '{}日'.format(d)]).items(item):
        split_tweet = tagger.parse(tweet.text).split()
        if '人' in split_tweet:  # 人 = "people"
            index = split_tweet.index('人') - 1
            n = cut_number(split_tweet, index)  # function that returns the number just before 人
            num_list.append(n)
    return pd.DataFrame(num_list, columns=["num"])
The **cut_number** called here is a function that extracts the full number immediately before "人" ("people").
def cut_number(split_tweet, index):
    start_i = index  # position where the number starts within the tweet
    # return 0 if the token just before 人 is not a numeric string (10000 is an arbitrary upper bound)
    if not split_tweet[index] in list(map(str, range(0, 10000))):
        return 0
    ans = split_tweet[start_i]  # the digit immediately before 人
    while True:
        # keep prepending to ans as long as the previous token is a single digit
        if split_tweet[start_i-1] in list(map(str, range(0, 10))):
            start_i -= 1
            ans = split_tweet[start_i] + ans
        # return ans once the digits end
        else:
            return ans
Let me explain why such a function is needed. For example, take the sentence "今日の感染者は123人" ("There are 123 infected people today"). Splitting it with MeCab gives:

tagger.parse("今日の感染者は123人").split()
# => ['今日', 'の', '感染', '者', 'は', '1', '2', '3', '人']

The 1, 2, and 3 are separated into individual tokens, so taking only the token right before "人" would give just the last digit of the number. **cut_number** stitches the digits back together to recover the correct value.
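Putting the pieces together, here is roughly how the number is recovered from a tokenized tweet, assuming the tokenization behaves as shown above:

```python
tokens = tagger.parse("今日の感染者は123人").split()
# => ['今日', 'の', '感染', '者', 'は', '1', '2', '3', '人']
index = tokens.index("人") - 1     # position of the digit just before 人
print(cut_number(tokens, index))   # => '123'
```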
The other function appearing in **auto_search** is **waiting**, which visualizes the time remaining until the next automatic search. (This is just a bonus; it has nothing to do with the core functionality.)
def waiting(div, wait_time, count):
    clear_output()
    for i in range(1, wait_time//div + 1):
        print("waiting: |" + "*"*i + " "*(wait_time//div - i) + "|")
        print("\n--RESULT--")
        print(count)
        time.sleep(div)
        clear_output()
    print("searching on Twitter...")
Due to the nature of the algorithm, it cannot always catch the preliminary report of the number of infected people, so I actually ran it and tuned the parameters. (The code above already uses the adjusted values.)
The following is the execution result for **7/19**.

(Running with `item=30`)
At this time, the following notification was sent to LINE.
Entering y at the continue? prompt keeps the search going. The number of infected people announced in Tokyo that day was 188, so this result is correct. (Even if the search continues, you will not receive repeated notifications because `pre_mode = 188`.)
(Running with `item=100`)
While waiting, the display looks like this. As you can see from RESULT, no candidate exceeds 50%, so no notification is sent. It also shows that once some time has passed since the number was announced, the ratio drops even for the correct value.
Based on these test results, here are the remaining issues:

- It cannot yet catch the number at the moment the preliminary report first comes out.
- `rate=0.5` is used here, but depending on the time of day an incorrect value may be picked up as the preliminary figure.
I plan to keep running this program daily to investigate these issues.
It seems that you could get the preliminary figure (accurately, too) by watching the tweets of a specific official news account instead of taking a majority vote over miscellaneous tweets, but I don't know exactly which account tweets it fastest, so I went with this approach instead.
By the way, when I tested it on 7/18, the tweet rate for "290" exceeded 80% right after the announcement, presumably because the number of infected people in Tokyo that day was 290. Based on that, I initially ran 7/19 with `rate=0.8`, but it failed: no notification arrived even after the preliminary figure was out. I ended up lowering rate and re-tuning it.
The spread of tweets varies from day to day, which makes things difficult, but I look forward to seeing how accurate the notifications can become with the simple algorithm of "pick the number that the most people are tweeting".
So this project may be more about personal interest and learning than practical use lol