Write a script to search Twitter using Python. The final deliverable can be downloaded below.
https://github.com/mima3/searchTwitter
(1) Python 2.7 must be installed.
(2) Install python_twitter:
easy_install python_twitter
(3) Get API keys for Twitter from the following page: https://dev.twitter.com/
Please refer to the following page for the detailed acquisition method. http://support.dreamone.co.jp/Pandora/dp.do?jumpTo=DreamX&variables%28LPID%29=162
application/rate_limit_status.json
https://dev.twitter.com/docs/api/1.1/get/application/rate_limit_status
Gets the limits for each API. This tells you how many more times each API can be called and when the limit resets.
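To make the shape of that response concrete, here is a minimal sketch that pulls the search quota out of a rate-limit dictionary. The helper name `search_quota` and the sample values are hypothetical, not real API output; the key path (`resources`, `search`, `/search/tweets`) is the one used in the sample script in this article. The sketch is written for Python 3 and makes no network calls.

```python
import time


def search_quota(rate, now=None):
    """Summarise the /search/tweets quota from a rate_limit_status-style dict.

    `rate` is assumed to have the shape used in this article:
    rate['resources']['search']['/search/tweets'] -> remaining / limit / reset.
    """
    entry = rate['resources']['search']['/search/tweets']
    if now is None:
        now = time.time()
    # 'reset' is a Unix timestamp; never report a negative wait.
    wait = max(0, int(entry['reset'] - now))
    return {
        'remaining': entry['remaining'],
        'limit': entry['limit'],
        'seconds_until_reset': wait,
    }


# Hypothetical sample values for illustration only.
sample = {'resources': {'search': {'/search/tweets': {
    'remaining': 170, 'limit': 180, 'reset': 1000900}}}}
quota = search_quota(sample, now=1000000)
print("Limit %d / %d, resets in %d s" % (
    quota['remaining'], quota['limit'], quota['seconds_until_reset']))
```

A caller could sleep for `seconds_until_reset` when `remaining` reaches 0 instead of failing on the next request.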
search/tweets.json
https://dev.twitter.com/docs/api/1.1/get/search/tweets
The API for searching tweets.
A single API call returns at most 100 tweets. Results from the API may differ from those on the official search page. The official page also finds tweets from 7 or more days ago, but the API does not search tweets that far back.
The tweets returned also depend on result_type:
recent: returns tweets in chronological order
popular: returns popular tweets
mixed: returns a mixture of the above
The query string passed to the search API accepts the same search operators as "advanced search". https://twitter.com/search-advanced
(Erin OR Eirin) AND (BBA OR Auntie OR Baba)
This searches for tweets that contain "Erin" or "Eirin" together with "BBA", "Auntie", or "Baba".
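Queries of this form can be assembled from lists of terms rather than typed by hand. The following is a small sketch; the helper names `or_group` and `and_groups` are hypothetical, and wrapping multi-word terms in double quotes so they match as a phrase is an assumption that goes beyond what this article shows.

```python
def or_group(terms):
    """Join terms with OR and wrap the group in parentheses."""
    quoted = []
    for term in terms:
        # Assumption: quote multi-word terms so they are matched as a phrase.
        quoted.append('"%s"' % term if ' ' in term else term)
    return "(" + " OR ".join(quoted) + ")"


def and_groups(*groups):
    """Combine several OR groups with AND."""
    return " AND ".join(or_group(g) for g in groups)


query = and_groups(["Erin", "Eirin"], ["BBA", "Auntie", "Baba"])
print(query)
```

The resulting string is what would be passed as the search term.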
To search tweets from a specific user, enter the user name after "from:".
from:mima_ita
Searches using "from:" seem to be limited to 100 results through the API.
To search by location, specify the coordinates and radius after "geocode:". The following example searches for tweets within a 500 m radius of Tokyo Tower.
geocode:35.65858,139.745433,0.5km
Searches using "geocode:" also seem to be limited to 100 results through the API.
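Both operators are plain strings, so they can be generated from values held in variables. This is a sketch only; `from_operator` and `geocode_operator` are hypothetical helper names, and the output format simply mirrors the two examples above.

```python
def from_operator(screen_name):
    """Build a from: operator; a leading @ is dropped if present."""
    return "from:" + screen_name.lstrip("@")


def geocode_operator(lat, lon, radius_km):
    """Build a geocode: operator in latitude,longitude,radius form."""
    return "geocode:%s,%s,%skm" % (lat, lon, radius_km)


print(from_operator("mima_ita"))
print(geocode_operator(35.65858, 139.745433, 0.5))
```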
Here is an example implementation that searches for a specified query string. A single call to the search API returns at most 100 tweets, so the sample repeats the call to collect as many tweets as the limits allow.
First, call the search API with "result_type = recent" to get tweets in chronological order, newest first. This returns only the latest 100 tweets.
For the second call, fetch tweets older than the oldest tweet returned by the first call. To do this, specify "max_id = (smallest id from the previous call) - 1".
Repeat this until no more tweets are returned, or until you reach the API limit reported by rate_limit_status.
A simple sample of this is shown below.
#!/usr/bin/python
# -*- coding: utf-8 -*-
# python_twitter 1.1
from twitter import Api
import sys
import time

# Python 2 only: make utf-8 the default encoding so tweet text prints cleanly
reload(sys)
sys.setdefaultencoding('utf-8')

maxcount = 1000   # stop once at least this many tweets have been printed
maxid = 0         # smallest tweet id seen so far (0 = none yet)
terms = ["Rin Yainaga", "Eirin", "Erin"]
search_str = " OR ".join(terms)

api = Api(base_url="https://api.twitter.com/1.1",
          consumer_key='XXXXX',
          consumer_secret='XXXXX',
          access_token_key='XXXXX',
          access_token_secret='XXXXX')


def print_rate_limit():
    # Show the remaining quota and the reset time for search/tweets
    rate = api.GetRateLimitStatus()
    search = rate['resources']['search']['/search/tweets']
    print "Limit %d / %d" % (search['remaining'], search['limit'])
    tm = time.localtime(search['reset'])
    print "Reset Time %d:%02d" % (tm.tm_hour, tm.tm_min)


print_rate_limit()
print "-----------------------------------------\n"

# First call: the latest 100 tweets
found = api.GetSearch(term=search_str, count=100, result_type='recent')
i = 0
while True:
    for f in found:
        # Track the oldest (smallest) id in the results
        if maxid > f.id or maxid == 0:
            maxid = f.id
        print f.text
        i = i + 1
    if len(found) == 0:
        break
    if maxcount <= i:
        break
    print maxid
    # Next call: only tweets older than the oldest one seen so far
    found = api.GetSearch(term=search_str, count=100,
                          result_type='recent', max_id=maxid - 1)

print "-----------------------------------------\n"
print_rate_limit()
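The paging logic in the sample can also be separated from the network calls, which makes it easy to check without API keys. The sketch below is one way to do that, not the author's implementation: `collect_tweets`, `fake_search`, and the `Tweet` tuple are hypothetical names, and `fake_search` merely imitates an API that returns results newest first. It is written for Python 3.

```python
from collections import namedtuple

# Minimal stand-in for a status object: only the fields the loop needs.
Tweet = namedtuple("Tweet", ["id", "text"])


def collect_tweets(search, max_count=1000, page_size=100):
    """Page backwards through results using max_id.

    `search(count, max_id)` is a caller-supplied function returning a list
    of objects with an `id` attribute, newest first; max_id may be None.
    """
    collected = []
    max_id = None
    while len(collected) < max_count:
        page = search(count=page_size, max_id=max_id)
        if not page:
            break  # nothing older is available
        collected.extend(page)
        # The next request must stop just below the oldest id seen so far.
        max_id = min(t.id for t in page) - 1
    return collected[:max_count]


def fake_search(count, max_id=None):
    """Stand-in for the API: 250 tweets with ids 1..250, newest first."""
    top = 250 if max_id is None else min(max_id, 250)
    return [Tweet(i, "tweet %d" % i)
            for i in range(top, max(top - count, 0), -1)]


tweets = collect_tweets(fake_search)
print(len(tweets), tweets[0].id, tweets[-1].id)
```

With python_twitter, the `search` argument could presumably be a small wrapper around `api.GetSearch(term=..., count=count, result_type='recent', max_id=max_id)`, as in the sample above.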
A more complete script that builds on the sample above can be downloaded from the following page. https://github.com/mima3/searchTwitter
That script saves the search results in SQLite. It searches past tweets until it hits the API call limit. The next time it is run, it behaves as follows.
If all searchable past tweets have already been retrieved: search for tweets newer than those registered in the DB.
If past tweets remain to be retrieved: search for tweets older than the last acquired tweet.
With this script, a large number of search results can be easily obtained.
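The resume behaviour described above can be sketched as a decision about which id bound to send with the next search. This is an illustration under stated assumptions and may differ from what the repository actually does: the table `tweets(id, text)`, the function `next_search_params`, and the `history_exhausted` flag are all hypothetical. `since_id` and `max_id` are the id bounds accepted by the search API.

```python
import sqlite3


def next_search_params(conn, history_exhausted):
    """Decide which id bound to pass to the next search.

    Assumes a hypothetical table tweets(id INTEGER PRIMARY KEY, text TEXT).
    """
    oldest, newest = conn.execute(
        "SELECT MIN(id), MAX(id) FROM tweets").fetchone()
    if newest is None:
        return {}  # empty DB: start from the most recent tweets
    if history_exhausted:
        # All reachable past tweets are stored: only fetch newer ones.
        return {"since_id": newest}
    # Older tweets remain: continue below the oldest stored id.
    return {"max_id": oldest - 1}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO tweets VALUES (?, ?)",
                 [(101, "a"), (102, "b"), (103, "c")])
print(next_search_params(conn, history_exhausted=False))
print(next_search_params(conn, history_exhausted=True))
```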