In order to learn about crawling and how to handle APIs, I decided to actually make a crawling tool for Twitter.
The environment for this development is as follows.
- MacBook Air (Retina, 13-inch, 2018)
- Processor: 1.6 GHz Intel Core i5
- Memory: 8 GB 2133 MHz LPDDR3
To set up API access, follow the steps below.
First, get your Twitter API keys from the Twitter Developers site. After logging in with the account you want the keys for, click Create an app and press Approve. You will then be asked a number of questions; answer them as appropriate. Once that is done, you will receive the following four API keys.
- Consumer Key
- Consumer Secret Key
- Access Token
- Access Secret Token
We will implement the crawler so that it can also be used as a module. First, create the following config.py.
```python:config.py
Consumer_Key = "Consumer Key"
Consumer_Secret_Key = "Consumer Secret Key"
Access_Token = "Access Token"
Access_Secret_Token = "Access Secret Token"
```
Next, create an empty file called `__init__.py` in the same directory.
Then we will use these files to implement the crawler. The crawler file will be named tweet_butler.py; create it in the same directory as config.py.
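For reference, the directory layout ends up looking like this (the package folder name `crawler` is my own placeholder; any name works):

```
crawler/
├── __init__.py
├── config.py
└── tweet_butler.py
```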
First, the imports:

```python:tweet_butler.py
from . import config
from requests_oauthlib import OAuth1Session
import json
```
Note that `from . import config` is a relative import, so if there is no `__init__.py` file, an ImportError will occur.
The config.py file is needed to use the API keys, and the requests_oauthlib library is needed for OAuth authentication. Since the API response is returned as JSON, import the json library as well.
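As a quick check, here is a minimal sketch of importing the module as part of a package (again assuming the placeholder folder name `crawler`); run Python from the directory that contains the `crawler/` folder:

```python
# Run from the directory *containing* crawler/.
# With crawler/__init__.py present this works; without it,
# the relative import inside tweet_butler.py raises ImportError.
from crawler import tweet_butler
```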
Next, search using the API.
```python:tweet_butler.py
CK = config.Consumer_Key
CS = config.Consumer_Secret_Key
AT = config.Access_Token
AS = config.Access_Secret_Token

twitter = OAuth1Session(CK, CS, AT, AS)

url = "https://api.twitter.com/1.1/search/tweets.json"
params = {"q": "search word"}
res = twitter.get(url, params=params)
```
Now you have the response from the API. Check that the status code is 200, then parse the body as JSON.
```python:tweet_butler.py
if res.status_code == 200:
    search_result = json.loads(res.text)
```
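To get a feel for the parsed response, note that the tweets live under the "statuses" key; this quick loop is just an illustrative sketch:

```python
# Each element of "statuses" is one tweet as a dictionary.
for status in search_result["statuses"]:
    print(status["user"]["screen_name"], ":", status["text"])
```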
We want to wrap these operations in a class, so we will prepare a Butler class.
```python:tweet_butler.py
class Butler:

    def __init__(self, setting=config):
        self.api = OAuth1Session(setting.Consumer_Key,
                                 setting.Consumer_Secret_Key,
                                 setting.Access_Token,
                                 setting.Access_Secret_Token)
        self.search_url = "https://api.twitter.com/1.1/search/tweets.json"

    def search(self, word, count=10):
        params = {"q": word, "count": count}
        res = self.api.get(self.search_url, params=params)
        if res.status_code == 200:
            search_results = json.loads(res.text)
            tweets = [Tweet(result) for result in search_results["statuses"]]
            return tweets
```
In the search handling above there is the expression `Tweet(result)`. Since it is cumbersome to handle tweets as raw dictionaries, we create a separate Tweet class and pass each tweet's dictionary to it.
```python:tweet_butler.py
class Tweet:

    def __init__(self, tweet):
        self.dict = tweet
        self.tweet_id = tweet["id"]
        self.user_id = tweet["user"]["id"]
        self.user_name = tweet["user"]["screen_name"]
        self.text = tweet["text"]
        self.created_at = tweet["created_at"]
        self.favorite_count = tweet["favorite_count"]
        self.hashtags = tweet["entities"]["hashtags"]
        self.symbols = tweet["entities"]["symbols"]
        self.mention = tweet["entities"]["user_mentions"]

    def get_text(self):
        return self.text
```
At this point we only need the text of each tweet, so the Tweet class is kept simple. You can now get the text by calling get_text() on each tweet in the list returned by Butler.
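Putting it together, a minimal usage sketch (assuming config.py holds valid keys and the package is imported from one level up; `crawler` is my placeholder package name):

```python
from crawler.tweet_butler import Butler

butler = Butler()
tweets = butler.search("python", count=5)
for tweet in tweets:
    print(tweet.get_text())
```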
Next, I would like to add a method to Butler for looking up a user's profile, and create a User class.
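As a rough sketch of where this is headed, the lookup could use the v1.1 users/show.json endpoint; the User class and get_user method below are my assumption of the shape, not the author's implementation:

```python
# Hypothetical sketch, not the author's implementation.
class User:

    def __init__(self, user):
        self.dict = user
        self.user_id = user["id"]
        self.screen_name = user["screen_name"]
        self.description = user["description"]
        self.followers_count = user["followers_count"]

# A matching Butler method might look like:
#
#     def get_user(self, screen_name):
#         url = "https://api.twitter.com/1.1/users/show.json"
#         res = self.api.get(url, params={"screen_name": screen_name})
#         if res.status_code == 200:
#             return User(json.loads(res.text))
```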
References

- [Crawler creation using Twitter API](https://datumstudio.jp/blog/twitterapi%E3%82%92%E7%94%A8%E3%81%84%E3%81%9F%E3%82%AF%E3%83%AD%E3%83%BC%E3%83%A9%E3%83%BC%E4%BD%9C%E6%88%90)
- [Twitter Developers](https://developer.twitter.com/)