In order to learn about crawling and how to handle APIs, I decided to actually make a crawling tool for Twitter.
The environment for this development is as follows.
- MacBook Air (Retina, 13-inch, 2018)
- Processor: 1.6 GHz Intel Core i5
- Memory: 8 GB 2133 MHz LPDDR3
To set up API access, follow the steps below.
First, get your Twitter API keys from the Twitter Developers site. After logging in with the account you want the keys for, click Create an app and press Approve. You will then be asked a number of questions; answer them as appropriate. Once that is done, you will receive the following four API keys.
- Consumer Key
- Consumer Secret Key
- Access Token
- Access Secret Token
We will implement the crawler so that it can also be used as a module. First, create the following config.py.
```python:config.py
Consumer_Key = "Consumer Key"
Consumer_Secret_Key = "Consumer Secret Key"
Access_Token = "Access Token"
Access_Secret_Token = "Access Secret Token"
```
Next, create an empty file called `__init__.py` in the same directory.
Then we will use these files to implement the crawler. The crawler file will be named tweet_butler.py; create it in the same directory as config.py.
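For reference, the directory layout ends up looking like this (the package folder name `crawler` is my own placeholder; any name works):

```
crawler/
├── __init__.py
├── config.py
└── tweet_butler.py
```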
First, the imports:

```python:tweet_butler.py
from . import config
from requests_oauthlib import OAuth1Session
import json
```
Note that `from . import config` is a relative import, so if there is no `__init__.py` file, an ImportError will occur.
The config.py file is needed to use the API keys, and the requests_oauthlib library is needed for OAuth authentication. Since the API response is returned as JSON, import the json library as well.
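As a quick check, here is a minimal sketch of importing the module as part of a package (again assuming the placeholder folder name `crawler`); run Python from the directory that contains the `crawler/` folder:

```python
# Run from the directory *containing* crawler/.
# With crawler/__init__.py present this works; without it,
# the relative import inside tweet_butler.py raises ImportError.
from crawler import tweet_butler
```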
Next, search using the API.
```python:tweet_butler.py
CK = config.Consumer_Key
CS = config.Consumer_Secret_Key
AT = config.Access_Token
AS = config.Access_Secret_Token

twitter = OAuth1Session(CK, CS, AT, AS)

url = "https://api.twitter.com/1.1/search/tweets.json"
params = {"q": "search word"}
res = twitter.get(url, params=params)
```
Now you have the response from the API. Check that the status code is 200, then parse the body as JSON.
```python:tweet_butler.py
if res.status_code == 200:
    search_result = json.loads(res.text)
```
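To get a feel for the parsed response, note that the tweets live under the "statuses" key; this quick loop is just an illustrative sketch:

```python
# Each element of "statuses" is one tweet as a dictionary.
for status in search_result["statuses"]:
    print(status["user"]["screen_name"], ":", status["text"])
```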
We want to wrap these operations in a class, so we will prepare a Butler class.
```python:tweet_butler.py
class Butler:

    def __init__(self, setting=config):
        self.api = OAuth1Session(setting.Consumer_Key,
                                 setting.Consumer_Secret_Key,
                                 setting.Access_Token,
                                 setting.Access_Secret_Token)
        self.search_url = "https://api.twitter.com/1.1/search/tweets.json"

    def search(self, word, count=10):
        params = {"q": word, "count": count}
        res = self.api.get(self.search_url, params=params)
        if res.status_code == 200:
            search_results = json.loads(res.text)
            tweets = [Tweet(result) for result in search_results["statuses"]]
            return tweets
```
In the search handling above there is the expression `Tweet(result)`. Since it is cumbersome to handle tweets as raw dictionaries, we create a separate Tweet class and pass each tweet's dictionary to it.
```python:tweet_butler.py
class Tweet:

    def __init__(self, tweet):
        self.dict = tweet
        self.tweet_id = tweet["id"]
        self.user_id = tweet["user"]["id"]
        self.user_name = tweet["user"]["screen_name"]
        self.text = tweet["text"]
        self.created_at = tweet["created_at"]
        self.favorite_count = tweet["favorite_count"]
        self.hashtags = tweet["entities"]["hashtags"]
        self.symbols = tweet["entities"]["symbols"]
        self.mention = tweet["entities"]["user_mentions"]

    def get_text(self):
        return self.text
```
At this point we only need the text of each tweet, so the Tweet class is kept simple. You can now get the text by calling get_text() on each tweet in the list returned by Butler.
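Putting it together, a minimal usage sketch (assuming config.py holds valid keys and the package is imported from one level up; `crawler` is my placeholder package name):

```python
from crawler.tweet_butler import Butler

butler = Butler()
tweets = butler.search("python", count=5)
for tweet in tweets:
    print(tweet.get_text())
```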
Next, I would like to add a method to Butler for looking up a user's profile, and create a User class.
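As a rough sketch of where this is headed, the lookup could use the v1.1 users/show.json endpoint; the User class and get_user method below are my assumption of the shape, not the author's implementation:

```python
# Hypothetical sketch, not the author's implementation.
class User:

    def __init__(self, user):
        self.dict = user
        self.user_id = user["id"]
        self.screen_name = user["screen_name"]
        self.description = user["description"]
        self.followers_count = user["followers_count"]

# A matching Butler method might look like:
#
#     def get_user(self, screen_name):
#         url = "https://api.twitter.com/1.1/users/show.json"
#         res = self.api.get(url, params={"screen_name": screen_name})
#         if res.status_code == 200:
#             return User(json.loads(res.text))
```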
References

- [Crawler creation using Twitter API](https://datumstudio.jp/blog/twitterapi%E3%82%92%E7%94%A8%E3%81%84%E3%81%9F%E3%82%AF%E3%83%AD%E3%83%BC%E3%83%A9%E3%83%BC%E4%BD%9C%E6%88%90)
- [Twitter Developers](https://developer.twitter.com/)