Purpose

Write a script that searches Twitter using Python. The finished script can be downloaded from the following repository.

https://github.com/mima3/searchTwitter

Preparation

(1) Install Python 2.7.

(2) Install python_twitter:

easy_install python_twitter

(3) Get the API credentials for Twitter (consumer key and secret, access token key and secret) from the following page: https://dev.twitter.com/

For the detailed procedure, see the following page: http://support.dreamone.co.jp/Pandora/dp.do?jumpTo=DreamX&variables%28LPID%29=162

Twitter API to use

application/rate_limit_status.json
https://dev.twitter.com/docs/api/1.1/get/application/rate_limit_status
Gets the rate limits for each API endpoint. Use it to find out how many more times each endpoint can be called and when the limit resets.
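As a rough illustration of reading that response, the sketch below pulls the search quota out of a dictionary shaped like the one the sample script later in this article indexes into. The `sample_status` values and the `search_quota` helper are made up for illustration; only the key names come from the script.

```python
import time

# Hypothetical, abbreviated response; the numbers are illustrative only.
sample_status = {
    "resources": {
        "search": {
            "/search/tweets": {"limit": 180, "remaining": 175, "reset": 1400000000}
        }
    }
}


def search_quota(status):
    """Return (remaining, limit, reset_epoch) for the search endpoint."""
    entry = status["resources"]["search"]["/search/tweets"]
    return entry["remaining"], entry["limit"], entry["reset"]


remaining, limit, reset_epoch = search_quota(sample_status)
# "reset" is a Unix timestamp; convert it to local time for display.
reset_local = time.strftime("%H:%M", time.localtime(reset_epoch))
print("Limit %d / %d, resets at %s" % (remaining, limit, reset_local))
```

Checking `remaining` before each batch of searches makes it possible to stop cleanly instead of running into an error from the API.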

search/tweets.json
https://dev.twitter.com/docs/api/1.1/get/search/tweets
The API for searching tweets.

A single API call returns at most 100 tweets. Results from the API may differ from the results of a search on the official site: the official search also returns tweets older than 7 days, but the API does not return such older tweets.

Also, the tweets returned by a search depend on result_type:

recent: returns tweets in chronological order, newest first
popular: returns popular tweets
mixed: returns a mixture of the two
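To make the parameters concrete, here is a minimal sketch that assembles the query string for search/tweets.json. The `build_search_params` helper is hypothetical; `q`, `result_type`, and `count` are the request parameters of the search API. It only builds the parameters and does not perform the authenticated request.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode  # Python 2.7

VALID_RESULT_TYPES = ("recent", "popular", "mixed")


def build_search_params(query, result_type="recent", count=100):
    """Validate the arguments and return them as an ordered parameter list."""
    if result_type not in VALID_RESULT_TYPES:
        raise ValueError("result_type must be one of %s" % (VALID_RESULT_TYPES,))
    if not 1 <= count <= 100:
        raise ValueError("count must be between 1 and 100")
    return [("q", query), ("result_type", result_type), ("count", count)]


print(urlencode(build_search_params("Erin", "popular", 50)))
```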

Search strings

The search API accepts the same search operators that are available in "advanced search". https://twitter.com/search-advanced

Search example using AND, OR

(Erin OR Eirin) AND (BBA OR Auntie OR Baba)

This searches for tweets that contain "Erin" or "Eirin" and also contain "BBA", "Auntie", or "Baba".
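A query of this shape can also be assembled from word lists rather than typed by hand. The helpers `any_of` and `all_of` below are hypothetical names used only for this sketch.

```python
def any_of(words):
    """Join alternatives with OR and wrap them in parentheses."""
    return "(" + " OR ".join(words) + ")"


def all_of(groups):
    """Require every group to match by joining the groups with AND."""
    return " AND ".join(groups)


query = all_of([any_of(["Erin", "Eirin"]), any_of(["BBA", "Auntie", "Baba"])])
print(query)  # (Erin OR Eirin) AND (BBA OR Auntie OR Baba)
```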

Example of searching for tweets by a specific user

Enter the user name after "from:".

from:mima_ita

Searches using "from:" appear to be limited to 100 results through the API.

Example of searching for tweets from a specific place

Specify the coordinates (latitude, longitude) and the radius after "geocode:". The following example searches for tweets within a 500 m radius of Tokyo Tower.

geocode:35.65858,139.745433,0.5km

Searches using "geocode:" also appear to be limited to 100 results through the API.
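When the coordinates come from data rather than being typed in, the operator can be formatted in code. The following is a small sketch; `geocode_operator` is a hypothetical helper, and the range checks are an added safeguard rather than something the API documentation in this article prescribes.

```python
def geocode_operator(latitude, longitude, radius_km):
    """Format a geocode: search operator from numeric values."""
    if not -90 <= latitude <= 90:
        raise ValueError("latitude out of range")
    if not -180 <= longitude <= 180:
        raise ValueError("longitude out of range")
    if radius_km <= 0:
        raise ValueError("radius must be positive")
    return "geocode:%s,%s,%skm" % (latitude, longitude, radius_km)


# Tokyo Tower, 500 m radius
print(geocode_operator(35.65858, 139.745433, 0.5))
```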

Implementation example

This section shows an example implementation that searches for a given search string. A single call to the search API returns at most 100 tweets, so the calls are chained to retrieve as many tweets as the limits allow.

The first call searches with "result_type = recent" to get tweets in chronological order. At this point, only the latest 100 tweets are retrieved.

The second call retrieves tweets older than the oldest tweet returned by the first call. To do this, specify "max_id = (smallest id from the previous call) - 1".

Repeat this until no more tweets are returned, or until the API limit reported by rate_limit_status is reached.

A simple sample of this is shown below.

#!/usr/bin/python
# -*- coding: utf-8 -*-
# python_twitter 1.1
import sys
import time

from twitter import Api

# Python 2 only: make UTF-8 the default encoding so tweet text prints without errors
reload(sys)
sys.setdefaultencoding('utf-8')

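maxcount = 1000   # stop once at least this many tweets have been printed
maxid = 0         # smallest tweet id seen so far (0 = nothing fetched yet)
terms = ["Rin Yainaga", "Eirin", "Erin"]
search_str = " OR ".join(terms)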

api = Api(base_url="https://api.twitter.com/1.1",
                  consumer_key='XXXXX',
                  consumer_secret='XXXXX',
                  access_token_key='XXXXX',
                  access_token_secret='XXXXX')
rate = api.GetRateLimitStatus()
print "Limit %d / %d" % (rate['resources']['search']['/search/tweets']['remaining'],rate['resources']['search']['/search/tweets']['limit'])
tm = time.localtime(rate['resources']['search']['/search/tweets']['reset'])
print "Reset Time  %d:%d" % (tm.tm_hour , tm.tm_min)
print "-----------------------------------------\n"
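# First request: the 100 most recent matching tweets
found = api.GetSearch(term=search_str, count=100, result_type='recent')
i = 0
while True:
  for f in found:
    # Track the smallest (oldest) tweet id seen so far
    if maxid > f.id or maxid == 0:
      maxid = f.id
    print f.text
    i = i + 1
  if len(found) == 0:
    break   # no more tweets to fetch
  if maxcount <= i:
    break   # reached the requested number of tweets
  print maxid
  # Next request: only tweets older than the oldest one seen so far
  found = api.GetSearch(term=search_str, count=100, result_type='recent', max_id=maxid-1)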

print "-----------------------------------------\n"
rate = api.GetRateLimitStatus()
print "Limit %d / %d" % (rate['resources']['search']['/search/tweets']['remaining'],rate['resources']['search']['/search/tweets']['limit'])
tm = time.localtime(rate['resources']['search']['/search/tweets']['reset'])
print "Reset Time  %d:%d" % (tm.tm_hour , tm.tm_min)
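The max_id paging used in the sample can be tried without credentials by swapping the API call for a stand-in. The sketch below is only an offline illustration: `FakeTweet`, `fake_search`, and `collect` are hypothetical names, and `fake_search` merely imitates a search that returns the newest matches with an id no greater than `max_id`.

```python
class FakeTweet(object):
    """Stand-in for a tweet object with the two attributes the loop uses."""

    def __init__(self, tweet_id):
        self.id = tweet_id
        self.text = "tweet %d" % tweet_id


def fake_search(all_ids, count=100, max_id=None):
    """Return up to `count` tweets, newest first, with id <= max_id."""
    ids = sorted((i for i in all_ids if max_id is None or i <= max_id),
                 reverse=True)
    return [FakeTweet(i) for i in ids[:count]]


def collect(search, max_total=1000, page_size=100):
    """Page backwards with max_id until nothing is left or max_total is hit."""
    collected = []
    max_id = None
    while len(collected) < max_total:
        page = search(count=page_size, max_id=max_id)
        if not page:
            break  # no older tweets remain
        collected.extend(page)
        # Next page: strictly older than the oldest tweet just received
        max_id = min(t.id for t in page) - 1
    return collected[:max_total]


archive = range(1, 251)  # 250 stand-in tweet ids
tweets = collect(lambda **kw: fake_search(archive, **kw))
print("%d tweets, ids %d down to %d" % (len(tweets), tweets[0].id, tweets[-1].id))
```

Subtracting 1 from the smallest id matters because max_id is inclusive; without it, each page would start with the last tweet of the previous page.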

Extended version

An extended version of the script above can be downloaded from the following repository. https://github.com/mima3/searchTwitter

This extended script saves the search results in SQLite and searches past tweets until the API call limit is reached. On the next run, it behaves as follows.

If all searchable past tweets have already been retrieved: it searches for tweets newer than those registered in the DB.

If past tweets remain to be retrieved: it searches for tweets older than the last tweet retrieved.
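The two cases above can be sketched as a small decision function. This is not the repository's actual code: the `tweets` table layout, the `reached_oldest` flag, and the `next_search_params` helper are assumptions made for illustration. `since_id` and `max_id` are the search API parameters for "newer than" and "no newer than" a given tweet id.

```python
import sqlite3


def next_search_params(conn, reached_oldest):
    """Pick since_id or max_id for the next run from the stored tweet ids."""
    newest, oldest = conn.execute(
        "SELECT MAX(id), MIN(id) FROM tweets").fetchone()
    if newest is None:
        return {}  # empty DB: start with a plain search
    if reached_oldest:
        # Everything older is already stored, so only look for newer tweets
        return {"since_id": newest}
    # Older tweets remain, so continue below the oldest stored id
    return {"max_id": oldest - 1}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tweets (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany("INSERT INTO tweets VALUES (?, ?)",
                 [(101, "a"), (205, "b"), (150, "c")])

print(next_search_params(conn, reached_oldest=True))   # {'since_id': 205}
print(next_search_params(conn, reached_oldest=False))  # {'max_id': 100}
```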

With this script, a large number of search results can be easily obtained.

Search Twitter using Python