Windows 10 / Anaconda3 (Jupyter Notebook)
A memorandum for my university graduation thesis. The theme is to build a classifier that separates news tweets that spread from those that do not. This post covers collecting the tweets for it.
Prerequisites:
・Twitter developer account approved
・Tweepy installed
https://qiita.com/i_am_miko/items/a2e5168e619ed37afeb9
The account to collect from is @livedoornews. I chose it because it stands out both in follower count and in how responsive those followers are (i.e., whether tweets actually get retweeted).
get_newstweet.ipynb
#Import the required libraries
import tweepy
import pandas as pd
get_newstweet.ipynb
#Consumer key and access token settings for using Twitter API
Consumer_key = "API key"
Consumer_secret = "API secret Key"
Access_token = "Access token"
Access_secret = "Access token secret"
#Authentication
auth = tweepy.OAuthHandler(Consumer_key,Consumer_secret)
auth.set_access_token(Access_token, Access_secret)
api = tweepy.API(auth)
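Before fetching anything, it is worth confirming that authentication actually succeeded. A minimal check, assuming Tweepy 3.x (where the exception class is tweepy.TweepError; 4.x renamed it):
#Optional sanity check: confirm the credentials work
try:
    me = api.verify_credentials()
    print("Authenticated as:", me.screen_name)
except tweepy.TweepError as e:
    print("Authentication failed:", e)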
get_newstweet.ipynb
#Specify account name
acount = "@livedoornews"
"""
Retrieved fields: tweet ID, time, tweet text, number of likes, number of RTs
"""
def get_tweets(acount):
    tweet_data = []  #Empty list to store the collected data
    for tweet in tweepy.Cursor(api.user_timeline, screen_name=acount, exclude_replies=True).items():
        tweet_data.append([tweet.id, tweet.created_at, tweet.text.replace('\n', ''), tweet.favorite_count, tweet.retweet_count])
    df = pd.DataFrame(tweet_data, columns=['tweet_no', 'time', 'text', 'favorite_count', 'RT_count'])  #Store in a pandas DataFrame
    return df

df = get_tweets(acount)
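Two caveats worth knowing here: the standard user_timeline endpoint only reaches back about 3,200 tweets per account, and iterating with Cursor over that many tweets can hit rate limits. Tweepy can sleep through rate limits automatically if the API object is built with wait_on_rate_limit (a minimal tweak to the authentication cell above):
#Have Tweepy wait automatically when a rate limit is hit,
#instead of raising an error in the middle of the Cursor loop
api = tweepy.API(auth, wait_on_rate_limit=True)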
If you want to keep collecting tweets with the above function over time, the results need to be saved and appended to. I therefore wrote two saving routines: one for saving a new file and one for appending to an existing file.
get_newstweet.ipynb
#Save as a new file
file_name = "../data/tweet_{}.csv".format(acount)
df.to_csv(file_name, index=False) #index is often not needed
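As a quick sanity check (my own addition, not part of the original notebook), the file can be read back to confirm that the rows and columns were written as expected:
#Optional: read the saved CSV back and inspect it
check_df = pd.read_csv(file_name)
print(check_df.shape)
print(check_df.columns.tolist())  #['tweet_no', 'time', 'text', 'favorite_count', 'RT_count']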
get_newstweet.ipynb
#Append save (merge with the previously saved tweets)
file_name = "../data/tweet_{}.csv".format(acount)
pre_df = pd.read_csv(file_name) #Load the previous csv
df = pd.concat([df, pre_df])
df = df.drop_duplicates(subset=['tweet_no']) #Drop duplicates by tweet ID (keeps the newer data, which comes first)
df.to_csv(file_name, index=False)
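Rather than keeping two separate cells, the two cases could be merged by branching on whether the CSV already exists. This is a sketch of my own variation using os.path.exists, not the notebook's actual code:
import os

file_name = "../data/tweet_{}.csv".format(acount)
if os.path.exists(file_name):
    #File already exists: merge with the previously saved tweets
    pre_df = pd.read_csv(file_name)
    df = pd.concat([df, pre_df])
    df = df.drop_duplicates(subset=['tweet_no'])  #Keeps the first occurrence, i.e. the newer rows
df.to_csv(file_name, index=False)  #Either way, write the result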
That's all for collecting and saving the tweets. The split between new-save and append-save could probably be handled more cleanly (one idea is sketched above). Next time, I plan to strip RTs and URLs from the tweet text.