Spoiler up front: apart from RTs, these tweets cannot be filtered out while fetching them through the API (at least, I couldn't find a way), so this article shows how to exclude them locally from the fetched results.
The goal is to use tweepy to get only tweets that consist of a plain comment. Here, a plain-comment tweet means a tweet that contains no URL.
import tweepy

# CK, CS, AT, AS: your consumer key/secret and access token/secret
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)

# n: number of tweets to fetch
results = api.user_timeline(screen_name="screen_name", count=n)
for result in results:
    print(result.entities)
    print(result.text)
For example, for a quote RT, `result.text` is
Quote RT https://t.co/ksg6oW95Wo
That is, it includes the URL of the quoted tweet. Similarly, tweets that include media such as images or videos have the URL of that media in their text. To exclude these tweets, skip any tweet that matches the following condition:
if result.entities["urls"] != [] or "media" in result.entities or result.is_quote_status:
    continue  # inside the loop above: skip tweets with a URL, a quote, or media
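A note on the condition: in Python, `("media" or "is_quote_status")` evaluates to `"media"` (the first truthy operand), so a check written that way only tests the `media` key, and `is_quote_status` is an attribute of the tweet rather than a key in `entities`. The sketch below wraps the check in a predicate and applies it to a list of tweets. `SimpleNamespace` objects stand in for tweepy `Status` objects, and the function name and sample data are hypothetical, chosen so the example runs without API credentials.

```python
from types import SimpleNamespace

# `or` returns its first truthy operand, so this is just the string "media";
# a condition written this way never looks at is_quote_status.
print(("media" or "is_quote_status"))  # media


def has_url_quote_or_media(tweet) -> bool:
    """True if the tweet contains a URL, is a quote RT, or has media.

    `tweet` only needs `.entities` (a dict) and `.is_quote_status` (a bool),
    which tweepy's Status objects provide.
    """
    return (
        tweet.entities["urls"] != []
        or "media" in tweet.entities
        or tweet.is_quote_status
    )


# Hypothetical stand-ins for Status objects, so this runs without the API
plain = SimpleNamespace(
    text="hello",
    entities={"hashtags": [], "symbols": [], "user_mentions": [], "urls": []},
    is_quote_status=False,
)
quoted = SimpleNamespace(
    text="Quote RT https://t.co/ksg6oW95Wo",
    entities={"hashtags": [], "symbols": [], "user_mentions": [],
              "urls": [{"url": "https://t.co/ksg6oW95Wo"}]},
    is_quote_status=True,
)

results = [plain, quoted]
# Keep only the text of plain-comment tweets
plain_only = [t.text for t in results if not has_url_quote_or_media(t)]
print(plain_only)  # ['hello']
```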
The `entities` of the quote RT above are as follows:
{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/ksg6oW95Wo', 'expanded_url': 'https://twitter.com/T3ahat/status/1242458821550690304', 'display_url': 'twitter.com/T3ahat/status/…', 'indices': [5, 28]}]}
As you can see, `urls` contains the URL of the quoted tweet, so you can detect tweets that contain a URL, including quote RTs, by checking whether `result.entities["urls"]` is `[]`.
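The `urls` check can be tried offline on the quote RT's `entities` shown above, copied here as a plain dict. This is a minimal sketch; the only assumption is that `result.entities` is an ordinary dict with this shape, as in the printed output above. `expanded_url` also shows where the t.co link really points, which for a quote RT is the quoted tweet.

```python
# entities of the quote RT shown above
entities = {
    "hashtags": [], "symbols": [], "user_mentions": [],
    "urls": [{
        "url": "https://t.co/ksg6oW95Wo",
        "expanded_url": "https://twitter.com/T3ahat/status/1242458821550690304",
        "display_url": "twitter.com/T3ahat/status/…",
        "indices": [5, 28],
    }],
}

# A non-empty "urls" list means the tweet text contains a link
has_url = entities["urls"] != []
print(has_url)  # True

# expanded_url is the real destination behind the t.co short link
expanded = [u["expanded_url"] for u in entities["urls"]]
print(expanded)  # ['https://twitter.com/T3ahat/status/1242458821550690304']
```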
Also, for a quote RT `result.is_quote_status` is `True`, so you can also detect quote RTs by checking whether `result.is_quote_status` is `True`.
Here is an example of the `entities` of an RT:
{'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'T3ahat', 'name': 'teahat', 'id': 890647790048509952, 'id_str': '890647790048509952', 'indices': [3, 10]}], 'urls': []}
So for an RT, `urls` is `[]`. However, `result.text` returns the text of the retweeted tweet (prefixed with `RT @screen_name:`), so checking `urls` does not catch RTs. Instead, add `-filter:retweets` to the search query.
This is efficient because RTs are excluded when you call the API, so you don't fetch extra tweets only to discard them.
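A minimal sketch of excluding RTs on the API side. The standard-search operator for this is `-filter:retweets`. The helper names are hypothetical; `api` is assumed to be a `tweepy.API` instance, where `search_tweets` is the tweepy v4 method name (tweepy v3 calls it `api.search`). `user_timeline` takes no search query, so `-filter:` cannot be used there; its `include_rts=False` parameter drops native RTs instead.

```python
def build_query(keyword: str) -> str:
    """Append the standard-search operator that excludes retweets."""
    return f"{keyword} -filter:retweets"


def search_without_retweets(api, keyword: str, n: int):
    """Search tweets and let the API drop RTs.

    `api` is assumed to be a tweepy.API (tweepy v4 method name).
    """
    return api.search_tweets(q=build_query(keyword), count=n)


def timeline_without_retweets(api, screen_name: str, n: int):
    """Fetch a user's timeline without native RTs.

    With include_rts=False the API may return fewer than `n` tweets,
    because RTs are removed after `count` is applied.
    """
    return api.user_timeline(screen_name=screen_name, count=n, include_rts=False)


print(build_query("tweepy"))  # tweepy -filter:retweets
```

Both helpers only forward arguments, so any object with the same method names (for example a test double) can be passed as `api`.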
The `entities` of a tweet that includes image media are as follows:
{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1242466345960144898, 'id_str': '1242466345960144898', 'indices': [6, 29], 'media_url': 'http://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'media_url_https': 'https://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'url': 'https://t.co/4IZF0jmTZy', 'display_url': 'pic.twitter.com/4IZF0jmTZy', 'expanded_url': 'https://twitter.com/T3ahat/status/1242466350351540225/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 400, 'h': 400, 'resize': 'fit'}, 'medium': {'w': 400, 'h': 400, 'resize': 'fit'}, 'small': {'w': 400, 'h': 400, 'resize': 'fit'}}}]}
So for tweets that include media, `urls` is `[]`, yet `result.text` still contains the URL of the image. The key point here is that a `media` key is added to `entities`. Tweets that do not include media have no `media` key at all, so you can detect tweets that include media by checking whether `"media" in result.entities` is `True`.
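The `media` check can be tried offline on the image tweet's `entities` from above, trimmed to the relevant keys, alongside the quote RT's `entities` for contrast. This is a sketch assuming `result.entities` is a plain dict as printed above. It shows that `urls` alone misses the image, and that `.get("media", [])` lets you read the media list without a `KeyError` when the key is absent.

```python
# entities of the image tweet shown above, trimmed to the relevant keys
image_entities = {
    "hashtags": [], "symbols": [], "user_mentions": [], "urls": [],
    "media": [{
        "url": "https://t.co/4IZF0jmTZy",
        "media_url_https": "https://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg",
        "type": "photo",
    }],
}
# the quote RT's entities have no "media" key at all
quote_entities = {"hashtags": [], "symbols": [], "user_mentions": [],
                  "urls": [{"url": "https://t.co/ksg6oW95Wo"}]}

print(image_entities["urls"] != [])   # False: the urls check misses the image
print("media" in image_entities)      # True
print("media" in quote_entities)      # False

# .get() with a default avoids a KeyError when "media" is absent
print([m["type"] for m in image_entities.get("media", [])])  # ['photo']
print([m["type"] for m in quote_entities.get("media", [])])  # []
```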
Summary:
- URL or quote RT: `result.entities["urls"] != []` or `result.is_quote_status`
- RT: exclude at request time with `-filter:retweets` in the search query
- Media: `"media" in result.entities`