This article assumes you can already use the Twitter API (sorry). It is for people who want to do various things with tweets but are having trouble collecting the data.
If you search tweets the normal way with Twitter's search API, a single request returns at most 100 tweets. However, by following the ID assigned to each tweet, you can collect tweets going back about one week. Here is the search program.
tweet_search.py
```python
# -*- coding:utf-8 -*-
import tweepy


def search_tweet(api):
    # api.search is the tweepy 3.x method; tweepy 4.x renamed it api.search_tweets
    print('Output from the latest tweets')
    print('Search page: 1')
    # q is the search keyword; count can be at most 100 per request
    tweet_data = api.search(q=' ', count=100)
    for tweet in tweet_data:
        print(tweet.text)
    print('************************************************\n')
    if not tweet_data:
        return
    next_max_id = tweet_data[-1].id  # ID of the oldest tweet on this page

    for i in range(2, 11):
        print('Search page: ' + str(i))
        # max_id is inclusive, so subtract 1 to avoid getting the same tweet twice
        tweet_data = api.search(q=' ', count=100, max_id=next_max_id - 1)
        if not tweet_data:  # no older tweets left (the search only covers about a week)
            break
        next_max_id = tweet_data[-1].id
        for tweet in tweet_data:
            print(tweet.text)
        print('************************************************\n')


if __name__ == '__main__':
    # Fill in your own credentials
    consumer_key = "XXXXXXXXXXXXXXXXXXXXXXX"
    consumer_secret = "XXXXXXXXXXXXXXXXXXXXXXXX"
    access_token = "XXXXXXXXXXXXXXXXXXXXXXXXXXXX"
    access_token_secret = "XXXXXXXXXXXXXXXXXXXXXXXXXX"

    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    api = tweepy.API(auth)
    search_tweet(api)
```
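For reuse, the same paging loop can be written as a generator that stops as soon as a page comes back empty. This is a minimal sketch, not the article's program: it assumes `search` is any callable accepting the `q=`, `count=`, and `max_id=` keywords used above (such as `api.search` in tweepy 3.x), and `FakeTweet` and `make_fake_search` are hypothetical stand-ins so it runs without network access or credentials.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


def iter_pages(search: Callable[..., list], query: str,
               pages: int = 10, count: int = 100) -> Iterator[list]:
    """Yield up to `pages` result pages, newest first, following max_id."""
    max_id: Optional[int] = None
    for _ in range(pages):
        kwargs = {"q": query, "count": count}
        if max_id is not None:
            kwargs["max_id"] = max_id
        page = search(**kwargs)
        if not page:  # nothing older is available (e.g. past the ~7-day window)
            return
        yield page
        # max_id is inclusive, so step one below the oldest ID already fetched
        max_id = page[-1].id - 1


# --- Hypothetical offline stand-ins so the sketch runs without the API ---
@dataclass
class FakeTweet:
    id: int
    text: str


def make_fake_search(ids: List[int]) -> Callable[..., list]:
    """Return a search() that mimics inclusive max_id filtering over fixed IDs."""
    ordered = sorted(ids, reverse=True)

    def search(q: str, count: int = 100, max_id: Optional[int] = None) -> list:
        hits = [i for i in ordered if max_id is None or i <= max_id]
        return [FakeTweet(i, f"tweet {i}") for i in hits[:count]]

    return search


if __name__ == "__main__":
    fake_search = make_fake_search(list(range(1, 251)))
    for n, page in enumerate(iter_pages(fake_search, "python", count=100), 1):
        print(f"Search page: {n} ({len(page)} tweets, oldest id {page[-1].id})")
```

With a real tweepy 3.x client you would write something like `for page in iter_pages(api.search, "your keyword"): ...`. The offline demo yields three pages (100, 100, and 50 tweets) and then stops, because the fourth request comes back empty.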
To briefly explain the mechanism:
・ Every tweet has a tweet ID (a number).
・ Newer tweets have larger IDs. (For example, if the latest tweet's ID is 7000, an earlier tweet might be 6999. Real IDs are not consecutive, but they do increase with posting time.)
・ A normal API search returns tweets in order from newest to oldest.
・ The search API takes an argument called max_id; if you specify an ID there, the search returns only tweets whose ID is less than or equal to that value.
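The second point holds because tweet IDs are "Snowflake" IDs: the upper bits hold a millisecond timestamp and the lowest 22 bits hold a machine ID and sequence number. The sketch below assumes that standard layout and Twitter's published Snowflake epoch; `make_id` is a hypothetical helper for illustration only. It shows why a later tweet always gets a larger ID (at millisecond resolution) even though the IDs are far from consecutive.

```python
from datetime import datetime, timedelta, timezone

# Assumption: standard Snowflake layout (timestamp in bits 22 and up,
# 22 low bits for machine ID + sequence) with Twitter's custom epoch.
TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04 01:42:54.657 UTC
UNIX_EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def id_to_datetime(tweet_id: int) -> datetime:
    """Recover the creation time (UTC, ms precision) encoded in a tweet ID."""
    ms = (tweet_id >> 22) + TWITTER_EPOCH_MS
    return UNIX_EPOCH + timedelta(milliseconds=ms)


def make_id(unix_ms: int, low_bits: int = 0) -> int:
    """Hypothetical helper: build an ID from a Unix-ms timestamp and 22 low bits."""
    return ((unix_ms - TWITTER_EPOCH_MS) << 22) | (low_bits & 0x3FFFFF)


if __name__ == "__main__":
    earlier = make_id(1600000000000, low_bits=4095)
    later = make_id(1600000000001)  # 1 ms later, low bits all zero
    print(later > earlier)          # True: the timestamp bits dominate
    print(later - earlier)          # 4190209, not 1
    print(id_to_datetime(later))
```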
Using these four points, you can run a search such as "tweets with IDs of 7,000 or less".
In other words, take the ID of the oldest tweet in the results you just received, subtract 1, and pass it as max_id for the next search.
By repeating this, you could keep fetching older tweets forever! Or so I'd like to say, but unfortunately the standard search API only returns tweets from about the past week, so it isn't infinite. Even so, you can collect far more tweets than with a single ordinary API search.
That's all.
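Here is that step traced with the 7000 example from above. `search_ids` is a hypothetical offline stand-in (not part of tweepy or the Twitter API) that mimics the inclusive max_id filter over consecutive IDs, which is a simplification of real tweet IDs.

```python
from typing import List, Optional


def search_ids(all_ids: List[int], count: int, max_id: Optional[int] = None) -> List[int]:
    """Hypothetical stand-in for a search: up to `count` IDs <= max_id, newest first."""
    newest_first = sorted(all_ids, reverse=True)
    return [i for i in newest_first if max_id is None or i <= max_id][:count]


ids = list(range(6990, 7011))  # pretend IDs 6990..7010 exist

page1 = search_ids(ids, count=5, max_id=7000)
print(page1)  # [7000, 6999, 6998, 6997, 6996]

# Passing the oldest ID as-is returns that tweet again, because max_id is inclusive.
print(search_ids(ids, count=5, max_id=page1[-1]))  # [6996, 6995, 6994, 6993, 6992]

# Subtracting 1 continues exactly where the previous page ended.
page2 = search_ids(ids, count=5, max_id=page1[-1] - 1)
print(page2)  # [6995, 6994, 6993, 6992, 6991]
```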