This is a memo about a wall I ran into right away when I tried to use IBM Cloud's Personality Insights.
Personality Insights requires (or rather, works best with) input text of 3,000 words or more. So I borrowed celebrities' tweets from Twitter. Assuming about 15 words per tweet, I fetch 200 tweets per person, which comes to roughly 3,000 words.
As a prerequisite, you need to register for the Twitter API. I had already applied for registration, so I skip that step here.
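The 3,000-word target above can be checked before sending anything to Personality Insights. The following is a minimal sketch, not part of the original script: `count_words` and `has_enough_text` are hypothetical helper names, and the count assumes whitespace-separated words, so Japanese tweets would need a proper tokenizer instead.

```python
def count_words(tweets):
    """Count whitespace-separated words across a list of tweet texts.

    Assumption: words are separated by whitespace. This does not hold
    for Japanese text, which would need a tokenizer.
    """
    return sum(len(text.split()) for text in tweets)


def has_enough_text(tweets, minimum_words=3000):
    """Return True when the tweets reach the desired word count."""
    return count_words(tweets) >= minimum_words


# 200 tweets of 15 words each reach the target exactly.
sample = [" ".join(["word"] * 15)] * 200
print(count_words(sample))      # 3000
print(has_enough_text(sample))  # True
```

If the check fails for a user, fetching more tweets for that user is one option.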
# -*- coding:utf-8 -*-
import tweepy
import re
import subprocess

# User list
import user_list
# Twitter API authentication keys:
# Access_token, Access_secret, Consumer_key, Consumer_secret
from auth import twitter_credentials as tc


def get_twitterdata(username, rfile):
    # Load the authentication keys and set up the API
    auth = tweepy.OAuthHandler(tc.Consumer_key, tc.Consumer_secret)
    auth.set_access_token(tc.Access_token, tc.Access_secret)
    api = tweepy.API(auth, wait_on_rate_limit=True)

    # List to store tweets
    tweets_data = []

    # Fetch up to 200 tweets
    for tweet in api.user_timeline(screen_name=username, count=200):
        # Get the tweet text
        tmp_text = tweet.text
        # Collapse consecutive line breaks into one
        tmp_text = re.sub('\n+', '\n', tmp_text)
        # Add the tweet to the list
        tweets_data.append(tmp_text + '\n')

    # Write to file
    with open(rfile, "w", encoding="utf-8") as wf:
        wf.writelines(tweets_data)


if __name__ == '__main__':
    # Get the list of Twitter usernames
    userlist = user_list.username
    for i, username in enumerate(userlist):
        rfile = "./data/tweet_" + str(i).zfill(3) + ".csv"
        try:
            get_twitterdata(username, rfile)
        # Generate an empty file if the tweets cannot be fetched,
        # e.g. when the account is set to private
        except Exception:
            subprocess.run(["touch", rfile])
The user list (user_list.py) is rough, and I leave out the people's real names.
username = ["ariyoshihiroiki", "matsu_bouzu", "takapon_jp"]
Running the script creates the files below. I omit their contents because the tweets may be copyrighted.
tweet_000.csv
tweet_001.csv
tweet_002.csv
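The script writes into ./data and falls back to the `touch` command, so it assumes the directory already exists and that `touch` is available. A possible variation, sketched here with hypothetical helpers `output_path` and `make_empty_file`, builds the same tweet_NNN.csv names with `pathlib` and creates the directory on demand.

```python
from pathlib import Path
import tempfile


def output_path(index, directory="./data"):
    """Return the tweet_NNN.csv path for a user index, creating the directory."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    # Same zero-padded numbering as str(i).zfill(3) in the script.
    return out_dir / f"tweet_{index:03d}.csv"


def make_empty_file(path):
    """Portable stand-in for `touch`: create the file if it is missing."""
    Path(path).touch()


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    path = output_path(1, directory=tmp)
    make_empty_file(path)
    print(path.name, path.exists(), path.stat().st_size)  # tweet_001.csv True 0
```

This is only a sketch of the file handling; the fetching logic stays as it is.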
The amount of data you can fetch in each 15-minute (?) window is limited, so if you get greedy you will have to wait a long time. It may also be worth excluding retweets and other content that is not the person's own text.
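Excluding retweets and non-text content could look roughly like the following. This is a sketch under assumptions: `clean_tweets` is a hypothetical helper, the regular expressions are illustrative rather than exhaustive, and retweets are detected only by the "RT @" prefix on the text. As far as I know, `api.user_timeline` also accepts `include_rts=False`, which would drop retweets at the API level.

```python
import re

# Illustrative patterns (assumptions), not a complete tweet cleaner.
URL_PATTERN = re.compile(r"https?://\S+")
MENTION_PATTERN = re.compile(r"@\w+")


def clean_tweets(texts):
    """Drop retweets and strip URLs and mentions from the remaining tweets."""
    cleaned = []
    for text in texts:
        # Classic retweets come back with the text prefixed by "RT @user:".
        if text.startswith("RT @"):
            continue
        text = URL_PATTERN.sub("", text)
        text = MENTION_PATTERN.sub("", text)
        # Collapse runs of spaces/tabs left behind by the removals.
        text = re.sub(r"[ \t]+", " ", text).strip()
        # Skip tweets that contained nothing but links or mentions.
        if text:
            cleaned.append(text)
    return cleaned


sample = [
    "RT @someone: not my own words",
    "Good morning! https://t.co/abc123",
    "@friend thanks for yesterday",
    "https://t.co/onlylink",
]
print(clean_tweets(sample))  # ['Good morning!', 'thanks for yesterday']
```

Applying it to `tmp_text` values before they are written out would keep the word count closer to what the person actually wrote.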