To give away the conclusion first: apart from RTs, these tweets cannot be filtered out when fetching through the API (at least, I could not find a way). This article shows how to exclude them locally from the fetched results.

I want to use tweepy to get tweets that consist only of a comment. Here, a comment-only tweet means a tweet that does not include a URL.

Sample code

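import tweepy

# CK, CS: consumer key and secret; AT, AS: access token and secret (placeholders)
auth = tweepy.OAuthHandler(CK, CS)
auth.set_access_token(AT, AS)
api = tweepy.API(auth)
# n is the number of tweets to fetch (placeholder)
results = api.user_timeline(screen_name="screen_name", count=n)
for result in results:
    print(result.entities)
    print(result.text)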

For example, if you fetch a quote RT like this tweet, result.text will be:

Quote RT https://t.co/ksg6oW95Wo

As shown, the URL of the quoted tweet is included in the text. Similarly, for tweets that include media such as images and videos, the text contains the URL of that media.

Solution


if result.entities["urls"] != [] or "media" in result.entities or result.is_quote_status:
    continue  # inside the for loop: skip tweets with a URL, media, or a quote
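
To make the condition above easier to try out, here is a minimal sketch that wraps it in a helper and applies it to a list. The name `is_comment_only` and the `SimpleNamespace` stand-ins are assumptions for illustration only; real tweepy results would be passed in their place.

```python
from types import SimpleNamespace


def is_comment_only(status):
    """Return True when a tweet has no URL, no media, and is not a quote RT."""
    entities = status.entities
    has_url = bool(entities.get("urls"))      # non-empty list means a URL or a quote
    has_media = "media" in entities           # key only exists when media is attached
    is_quote = getattr(status, "is_quote_status", False)
    return not (has_url or has_media or is_quote)


# Stand-ins for tweepy results so the sketch runs without API keys (hypothetical data).
results = [
    SimpleNamespace(text="just a comment",
                    entities={"urls": []},
                    is_quote_status=False),
    SimpleNamespace(text="photo https://t.co/4IZF0jmTZy",
                    entities={"urls": [], "media": [{"type": "photo"}]},
                    is_quote_status=False),
    SimpleNamespace(text="Quote RT https://t.co/ksg6oW95Wo",
                    entities={"urls": [{"url": "https://t.co/ksg6oW95Wo"}]},
                    is_quote_status=True),
]

comment_only = [r.text for r in results if is_comment_only(r)]
print(comment_only)
```

Using `entities.get("urls")` instead of indexing is a small defensive choice in case the key is ever missing; it is not required by the examples in this article.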

Commentary

・ Tweets with a URL and quote RTs

The `entities` of the quote RT above are as follows.

{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [{'url': 'https://t.co/ksg6oW95Wo', 'expanded_url': 'https://twitter.com/T3ahat/status/1242458821550690304', 'display_url': 'twitter.com/T3ahat/status/…', 'indices': [5, 28]}]}

As you can see, `urls` contains the (quoted) URL, so tweets containing a URL or a quote RT can be detected by checking whether `result.entities["urls"]` is `[]`.

Also, for a quote RT, `result.is_quote_status` is True, so you can alternatively check whether `result.is_quote_status` is True.

・ Retweet

An example of an RT's `entities` is shown below.

{'hashtags': [], 'symbols': [], 'user_mentions': [{'screen_name': 'T3ahat', 'name': 'teahat', 'id': 890647790048509952, 'id_str': '890647790048509952', 'indices': [3, 10]}], 'urls': []}

Thus, for an RT, `urls` is `[]`. However, `result.text` returns the body text of the retweeted tweet, so add `-filter:hoge` to the search query (for retweets, the standard search operator is `-filter:retweets`). This is efficient because the tweets are excluded at the API request itself, so you never fetch the extra tweets.
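
As a rough illustration of the RT case, the sketch below builds a search query with the standard `-filter:retweets` operator and adds a local fallback check. The helper names `build_query` and `is_retweet` are hypothetical, and the fallback assumes that retweets in the v1.1 payload carry a `retweeted_status` field while ordinary tweets do not.

```python
from types import SimpleNamespace


def build_query(keyword):
    """Append the search operator that drops retweets on the server side."""
    return f"{keyword} -filter:retweets"


def is_retweet(status):
    """Local fallback (assumption): only retweets have a retweeted_status attribute."""
    return hasattr(status, "retweeted_status")


print(build_query("tweepy"))

# Stand-ins for tweepy results (hypothetical data, no API call is made).
retweet = SimpleNamespace(text="RT @T3ahat: hello",
                          retweeted_status=SimpleNamespace(text="hello"))
original = SimpleNamespace(text="hello")
print(is_retweet(retweet), is_retweet(original))
```

The query string would be passed to tweepy's search call. For `user_timeline`, which takes no search query, the underlying endpoint documents an `include_rts` parameter that may serve the same purpose; otherwise the local check can be used.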

・ Tweets including media

The `entities` of a tweet that includes image media are as follows.

{'hashtags': [], 'symbols': [], 'user_mentions': [], 'urls': [], 'media': [{'id': 1242466345960144898, 'id_str': '1242466345960144898', 'indices': [6, 29], 'media_url': 'http://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'media_url_https': 'https://pbs.twimg.com/media/ET4gjwxVAAIVdD5.jpg', 'url': 'https://t.co/4IZF0jmTZy', 'display_url': 'pic.twitter.com/4IZF0jmTZy', 'expanded_url': 'https://twitter.com/T3ahat/status/1242466350351540225/photo/1', 'type': 'photo', 'sizes': {'thumb': {'w': 150, 'h': 150, 'resize': 'crop'}, 'large': {'w': 400, 'h': 400, 'resize': 'fit'}, 'medium': {'w': 400, 'h': 400, 'resize': 'fit'}, 'small': {'w': 400, 'h': 400, 'resize': 'fit'}}}]}

In this way, `urls` is `[]` for tweets that include media. However, `result.text` contains the URL of the image. What is noteworthy here is that `media` is added to `entities`. Tweets without media do not have the `media` key at all, so tweets that include media can be detected by checking whether `"media" in result.entities` is True.

Summary

・ With URL, quote RT: `result.entities["urls"] != []` or `result.is_quote_status`
・ RT: exclude with `-filter:hoge`
・ Media: `"media" in result.entities`
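
When debugging, it can help to see which of the checks summarized above matched a given tweet. The following is a small sketch under that assumption; `exclusion_reasons` is a hypothetical helper, and the dictionaries are trimmed copies of the `entities` examples shown earlier.

```python
def exclusion_reasons(entities, is_quote_status=False):
    """List which of the article's checks match; an empty list means comment-only."""
    reasons = []
    if entities.get("urls"):
        reasons.append("url")
    if is_quote_status:
        reasons.append("quote")
    if "media" in entities:
        reasons.append("media")
    return reasons


# Trimmed versions of the entities dictionaries from the examples above.
quote_rt = {"hashtags": [], "symbols": [], "user_mentions": [],
            "urls": [{"url": "https://t.co/ksg6oW95Wo", "indices": [5, 28]}]}
media_tweet = {"hashtags": [], "symbols": [], "user_mentions": [], "urls": [],
               "media": [{"url": "https://t.co/4IZF0jmTZy", "type": "photo"}]}
plain_tweet = {"hashtags": [], "symbols": [], "user_mentions": [], "urls": []}

print(exclusion_reasons(quote_rt, is_quote_status=True))  # ['url', 'quote']
print(exclusion_reasons(media_tweet))                     # ['media']
print(exclusion_reasons(plain_tweet))                     # []
```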

Exclude tweets containing URLs with tweepy [Python]