Use Search Tweets: Full Archive / Sandbox in Python

In my previous article, I started using Python for my thesis and tried various things with the Twitter API. My Python experience is no longer just 10 days. Well, my skills haven't improved that much, but ...

The premise: Does search on the official Twitter website display everything?

In the end, I collected the data for my thesis by searching on the official Twitter website, which gives full access to past tweets. Primitive, I know. I was counting the data that way, but the further back I went, the more the number of search results for my keyword dropped, **more than I expected**.

An unpleasant premonition crossed my mind at that point.

**"Maybe searches on the official Twitter website don't display all tweets ...?"**

Certainly, it's not unthinkable that the older tweets get, the less demand there is to browse them, so the search results get thinned out. If that's the case, quantitative analysis is out of the question. My thesis is in crisis. (；・ ∀ ・)

Search specifications on the official Twitter website

Search on the official Twitter website (hereinafter "Twitter search") offers several display formats for search results. The distinction that gets the most attention is between "topic tweets" and "latest." (In the past, "topic tweets" was called "topics" and "latest" was called "all tweets.")

According to the official Twitter announcement ("Frequently Asked Questions about Search Results"):

**Why doesn't my favorite tweet appear in the topic tweets?**
Topic tweets are the ones most relevant to your search. Twitter uses a number of factors to determine relevance, including the popularity of a tweet (many people respond and share through retweets, replies, etc.), the keywords it contains, and more. If your favorite tweet doesn't appear in the topic tweets, it means that the tweet isn't very relevant to your search. To see recent tweets that match your search terms, click or tap All Tweets (Latest).

Of course, I used the "latest" tab for this data collection as well. It used to be called "all tweets", so I assumed that every tweet would be displayed ... but it is now called "latest", and I could not find any clear official statement that all tweets are displayed, so I can't confirm it.

What I did this time

Therefore, I decided to use the free trial version of the Twitter API (Search Tweets: Full Archive / Sandbox) to verify whether Twitter search displays all results. Because of the request limit (50 requests / month for Sandbox), it can't be used for quantitative data collection, but a handful of searches is enough for verification. Also, as far as I could see, there was no comprehensive Japanese article about Search Tweets: Full Archive / Sandbox, so I'll try to summarize it here (though I hardly feel qualified to). I had already obtained a Twitter developer account.

① Development environment setting on Dashboard

First of all, it seems that you have to log in with a developer account and set up the dev environment for Search Tweets: Full Archive / Sandbox from the Dashboard. (Screenshot: ２.PNG) In the screenshot, nothing is displayed because the setup has already been completed, but in the initial state, "You must first set up a dev environment before accessing an endpoint and viewing usage." is displayed at the bottom. Click "set up a dev environment" and set the "Dev environment label" and the app that will use Search Tweets: Full Archive / Sandbox. The Dev environment label (development in this case) will be used later, and the app you choose should be the one whose consumer key you use for authentication.

② Stumbling points in the code

I wondered whether the code from my previous search with the free version could be used as is, but first I had to rewrite the endpoint URL (naturally). With the free version, the URL that worked was
https://api.twitter.com/1.1/search/tweets.json
but this time it is
https://api.twitter.com/1.1/tweets/search/fullarchive/development.json
(Of course, you can find it in the Search Tweets: Premium search APIs Reference.) As mentioned above, the Dev environment label you set is included in this URL; in this case it is the development part. Rewrite it to match your own label.
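As a minimal sketch of how the label ends up in the URL, the helper below assembles the endpoint from a Dev environment label. The name `premium_search_url` and the `product` argument are hypothetical, introduced only for illustration; the only URL confirmed here is the fullarchive one shown above.

```python
BASE_URL = 'https://api.twitter.com/1.1/tweets/search'


def premium_search_url(label, product='fullarchive'):
    """Build the search endpoint for a Dev environment label.

    Hypothetical helper: `label` is the "Dev environment label" set on
    the Dashboard ('development' in this article).
    """
    return f'{BASE_URL}/{product}/{label}.json'


print(premium_search_url('development'))
```

If you named your environment something else, only the argument changes, so the rest of the code can stay untouched.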

Also, I don't know why (I'm no expert), but the parameter names differ between the Standard search API and the Premium search APIs. So the code from the free version didn't work until I rewrote those as well. (See Search Tweets: Standard search API Reference and Search Tweets: Premium search APIs Reference.)

The parameter that holds the search query is named "q" in the free version, while it is "query" in the Premium search APIs. I rewrote the other parameters as well while looking at the reference.

It's a minor point, but the key names in the returned JSON are also slightly different, so I rewrote those too.
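To make the renaming concrete, here is a rough sketch that converts a Standard-style parameter dictionary to the Premium names. It only maps `q` to `query` and `count` to `maxResults`; the `count` pairing is my reading of the two references, and the function name `to_premium_params` is hypothetical. Anything else should be checked against the reference rather than taken from this sketch.

```python
# Standard name -> Premium name. Only the pairs touched in this article;
# check the official references for any other parameter.
RENAMES = {'q': 'query', 'count': 'maxResults'}


def to_premium_params(standard_params):
    """Rename known keys and refuse keys this sketch does not cover."""
    premium = {}
    for name, value in standard_params.items():
        if name not in RENAMES:
            raise KeyError(f'no known Premium equivalent for {name!r}')
        premium[RENAMES[name]] = value
    return premium


print(to_premium_params({'q': '"Pikmin"', 'count': 20}))
```

Raising on unknown keys is deliberate: silently passing a Standard-only parameter through would just reproduce the failure I ran into.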
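A small sketch of that difference, assuming (per the respective references) that the Standard search API returns its tweets under a `statuses` key while the Premium search APIs use `results`, which is the key the code in section ③ reads. The helper name `extract_tweets` and the sample dictionaries are made up for illustration.

```python
def extract_tweets(response_json):
    """Return the list of tweet objects from either response shape."""
    for key in ('results', 'statuses'):
        if key in response_json:
            return response_json[key]
    return []


# Made-up responses that only mimic the top-level layout.
premium_like = {'results': [{'text': 'a'}]}
standard_like = {'statuses': [{'text': 'b'}], 'search_metadata': {}}

print([t['text'] for t in extract_tweets(premium_like)])
print([t['text'] for t in extract_tweets(standard_like)])
```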

③ Code

Since this is based on my previous code, I will again cite the article that the previous code was based on. Thank you very much.

Play with twitter API # 3 (Get search results)

`test.py`


#! python3
# -*- coding: utf-8 -*-

import json
from requests_oauthlib import OAuth1Session

# OAuth authentication part
CK      = 'Obtained Consumer key'
CS      = 'Obtained Consumer secret'
AT      = 'Obtained Access token'
ATS     = 'Obtained Access token secret'
twitter = OAuth1Session(CK, CS, AT, ATS)

# Twitter endpoint (get search results); 'development' is the Dev environment label
url = 'https://api.twitter.com/1.1/tweets/search/fullarchive/development.json'

# Parameters to pass to the endpoint
keyword = '"Pikmin"'

params = {
    'query': keyword,            # Search keyword
    'maxResults': 20,            # Number of tweets to get
    'fromDate': '201301311500',  # Start of the period, UTC (YYYYMMDDhhmm)
    'toDate': '201302011500',    # End of the period, UTC (YYYYMMDDhhmm)
}

req = twitter.get(url, params=params)

if req.status_code == 200:
    res = json.loads(req.text)
    # Matching tweets are stored under the 'results' key
    for line in res['results']:
        print(line['text'])
        print('*******************************************')
else:
    print("Failed: %d" % req.status_code)

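'fromDate' and 'toDate' are optional parameters, but note that they are specified in UTC, not Japan time (JST). JST is 9 hours ahead of UTC, so 201301311500 corresponds to 2013-02-01 00:00 JST and 201302011500 to 2013-02-02 00:00 JST.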
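Rather than doing the 9-hour subtraction by hand, the conversion can be sketched with the standard library as below. The function name `to_api_timestamp` is hypothetical, and the YYYYMMDDhhmm output format is inferred from the values used in the code above.

```python
from datetime import datetime, timedelta, timezone

JST = timezone(timedelta(hours=9))  # Japan Standard Time, UTC+9


def to_api_timestamp(year, month, day, hour=0, minute=0, tz=JST):
    """Convert a local wall-clock time to a UTC 'YYYYMMDDhhmm' string."""
    local = datetime(year, month, day, hour, minute, tzinfo=tz)
    return local.astimezone(timezone.utc).strftime('%Y%m%d%H%M')


print(to_api_timestamp(2013, 2, 1))  # 201301311500
print(to_api_timestamp(2013, 2, 2))  # 201302011500
```

The two printed values are the 'fromDate' and 'toDate' used above, i.e. one full day in Japan time.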

④ Result

I compared the output of this code with the result of searching ["Pikmin" since:2013-02-01_00:00:00_JST until:2013-02-02_00:00:00_JST](https://twitter.com/search?q=%22%E3%83%94%E3%82%AF%E3%83%9F%E3%83%B3%22%20since%3A2013-02-01_00%3A00%3A00_JST%20until%3A2013-02-02_00%3A00%3A00_JST&src=typed_query&f=live) in Twitter search. I marked the tweets that appeared in only one of the two with a red frame. Tweets that start with "RT @XXXX:" appear only in the API search results. Were those the official RTs of that time ...? On the other hand, some tweets were not returned by the API search. I have no idea why. If you know anything, I would appreciate it if you could tell me. (Maybe my code is at fault. Let's set aside the tweet that appears twice at the end of the API search results.)
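A comparison like this can also be sketched in code, assuming the tweet texts from each source have been collected into two lists. The function name `compare_results` and the sample texts are made up for illustration, and matching by exact text is a simplification (matching by tweet ID would be more reliable where IDs are available on both sides).

```python
def compare_results(api_texts, web_texts):
    """Split tweet texts into those seen only via the API or only on the web."""
    api_set, web_set = set(api_texts), set(web_texts)  # sets also drop duplicates
    only_api = sorted(api_set - web_set)
    only_web = sorted(web_set - api_set)
    # Texts starting with 'RT @' look like retweets in the API output.
    retweets_only_api = [t for t in only_api if t.startswith('RT @')]
    return only_api, only_web, retweets_only_api


# Made-up sample data, not real search results.
api_texts = ['RT @XXXX: Pikmin is fun', 'I bought Pikmin', 'I bought Pikmin']
web_texts = ['I bought Pikmin', 'Pikmin 3 when?']

only_api, only_web, retweets = compare_results(api_texts, web_texts)
print(only_api)
print(only_web)
print(retweets)
```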

Maybe I can write a thesis

Well, I still don't know why some tweets appear in the Twitter search results but not in the API search results, but it seems that the "latest" tab in Twitter search shows everything except retweets (although it is possible that some tweets appear in neither ...). So I think a quantitative analysis based on Twitter search has some validity. I've scraped through by the skin of my teeth, so I will do my best to write my thesis. (= ゜ ω ゜)ノ

If you have any information, I would appreciate it if you could let me know. I am still a beginner, so please point out anything I have gotten wrong or overlooked.