I wanted to download a user's media timeline (what you see when you open the "Media" tab on a user page), so I'm writing this article as a note to myself.
(Incidentally, this is my first Python program.)
For the tokens and Python setup required to connect to the Twitter API, I followed other people's articles (so I'll skip that part here). With that setup in place, I first tried fetching my own timeline.
If you print the text of the retrieved tweets, you can check the result. The tweets come through, and the images were saved in the img folder next to the source code. However, retweets are mixed in. The Media tab on Twitter does not show tweets the user has retweeted, so I want to exclude RTs here as well.
Looking at the official documentation (the API specification for user_timeline), there is a parameter called **include_rts**; setting it to **False** excludes retweets. With tweepy, it seems it can be passed like this:
tweetpytest.py
search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False).items(count_no)
This excludes RTs from the results. One caveat: the count parameter no longer matches the number of tweets actually returned. Reading the official documentation again, it seems the timeline is fetched including RTs first, and tweets are then filtered out afterwards based on the parameter conditions.
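That post-fetch filtering can be illustrated without calling the API. In the real payload a retweet carries a `retweeted_status` field; the tweet dicts below are made-up stand-ins for the API response:

```python
# Sketch of how a fetched page shrinks when RTs are filtered out afterwards.
# These dicts are hypothetical stand-ins; in the real API response,
# retweets carry a 'retweeted_status' field.
page = [
    {"id": 1, "text": "original tweet"},
    {"id": 2, "text": "RT", "retweeted_status": {"id": 99}},
    {"id": 3, "text": "another original"},
    {"id": 4, "text": "RT again", "retweeted_status": {"id": 98}},
    {"id": 5, "text": "last original"},
]

# Filtering happens after the fetch, so a page of 5 can shrink to 3.
filtered = [t for t in page if "retweeted_status" not in t]
print(len(page), len(filtered))  # 5 3
```

This is why asking for `count` tweets can return fewer than `count` results.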
After that, iterate over the returned results with for ... in, extract the image or video URL, and pass it to a save method. The procedure is as follows.
The return value from tweepy is an ItemIterator, and each child element is a Status object (confirmed by debugging). If it were a dict, you could check a field with dict.get('field_name'), but these are class attributes, so that approach doesn't work. Instead, I check with the built-in **hasattr** function.
tweetpytest.py
if hasattr(result, 'extended_entities'):
If this attribute is missing, it isn't a media tweet, so skip it and move on to the next one.
Image tweets don't have video_info, so that key distinguishes images from videos.
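Both checks can be seen on a toy example. The class below is a hypothetical stand-in for a tweepy Status object (not tweepy itself): hasattr handles the object attribute, and the in operator handles the dict key.

```python
class FakeStatus:
    """Hypothetical stand-in for a tweepy Status object, for illustration."""
    def __init__(self, media=None):
        if media is not None:
            # the attribute only exists on media tweets, like the real Status
            self.extended_entities = {"media": media}

text_tweet = FakeStatus()
photo_tweet = FakeStatus(media=[{"type": "photo", "media_url": "http://example.com/a.jpg"}])
video_tweet = FakeStatus(media=[{"type": "video", "video_info": {"variants": []}}])

for t in (text_tweet, photo_tweet, video_tweet):
    if not hasattr(t, "extended_entities"):
        print("no media")      # plain text tweet: skip it
    elif "video_info" in t.extended_entities["media"][0]:
        print("video or GIF")  # dict key check works here
    else:
        print("image")
```

Running this prints "no media", "image", "video or GIF" in that order.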
GIF is also saved as mp4 for the time being.
tweetpytest.py
bitrate_array = []
for movie in ex_media_video_variants:
    bitrate_array.append(movie.get('bitrate', 0))
max_index = bitrate_array.index(max(bitrate_array))
movie_url = ex_media_video_variants[max_index]['url']
The variants are stored in various sizes and formats, and there seems to be no guarantee about which one sits at which index. With a fixed index you might get a small size, or a URL whose format isn't mp4 (this actually happened). So the loop above inspects every variant so that the mp4 video with the largest bitrate is downloaded; variants without a bitrate field are ranked lowest by .get('bitrate', 0).
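With sample variant data (made up here, but shaped like the real video_info['variants'] payload), the same idea can also be written to filter on content_type explicitly and pick the best candidate with max:

```python
# Hypothetical variants list shaped like the Twitter API's video_info['variants'].
variants = [
    {"content_type": "application/x-mpegURL", "url": "http://example.com/pl.m3u8"},
    {"content_type": "video/mp4", "bitrate": 320000, "url": "http://example.com/low.mp4"},
    {"content_type": "video/mp4", "bitrate": 2176000, "url": "http://example.com/high.mp4"},
    {"content_type": "video/mp4", "bitrate": 832000, "url": "http://example.com/mid.mp4"},
]

# Keep only mp4 variants, then take the one with the highest bitrate.
mp4s = [v for v in variants if v["content_type"] == "video/mp4"]
best = max(mp4s, key=lambda v: v.get("bitrate", 0))
print(best["url"])  # http://example.com/high.mp4
```

Filtering on content_type first guarantees a non-mp4 URL can never be selected, even if its entry happened to carry a bitrate.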
Putting the above together, the source is as follows.
tweetpytest.py
#!/usr/bin/python3
# coding: UTF-8
import os
import urllib.request
import tweepy
import config  # config.py holds the API keys
import update_tweetinfo_csv  # author's own module (not used in this snippet)

CONSUMER_KEY = config.CONSUMER_KEY
CONSUMER_SECRET = config.CONSUMER_SECRET
ACCESS_TOKEN = config.ACCESS_TOKEN
ACCESS_SECRET = config.ACCESS_TOKEN_SECRET
FOLDER_PATH = 'img/'  # destination folder for downloaded media

# Authentication
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth)

def download_file(url, file_name):
    urllib.request.urlretrieve(url, FOLDER_PATH + file_name)

os.makedirs(FOLDER_PATH, exist_ok=True)  # make sure the img folder exists
key_account = input('Enter account name:')
count_no = int(input('Set search count:'))
search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False).items(count_no)
for result in search_results:
    if hasattr(result, 'extended_entities'):  # only media tweets have this attribute
        ex_media = result.extended_entities['media']
        tweet_id = result.id
        if 'video_info' in ex_media[0]:
            ex_media_video_variants = ex_media[0]['video_info']['variants']
            media_name = '%s-%s.mp4' % (key_account, tweet_id)
            if 'animated_gif' == ex_media[0]['type']:
                # GIF: saved as mp4
                gif_url = ex_media_video_variants[0]['url']
                download_file(gif_url, media_name)
            else:
                # Video: pick the variant with the highest bitrate
                bitrate_array = []
                for movie in ex_media_video_variants:
                    bitrate_array.append(movie.get('bitrate', 0))
                max_index = bitrate_array.index(max(bitrate_array))
                movie_url = ex_media_video_variants[max_index]['url']
                download_file(movie_url, media_name)
        else:
            # Images: download the original size of each one
            for image in ex_media:
                image_url = image['media_url']
                image_name = image_url.split('/')[-1]
                download_file(image_url + ':orig', image_name)
print('End')
An img folder is created under the source folder, and the images and videos are saved into it. For now, the goal of this article is achieved.
That completes the basic version, but as it stands, every run fetches the whole timeline again. To fetch only tweets newer than the previous run, add the following parameter.
search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False, since_id=pre_last_tweet_id).items(count_no)  # pre_last_tweet_id: newest tweet ID from the previous run
This adds the condition "since_id < target tweet ID", i.e. only tweets with an ID greater than since_id are returned. At the end of the previous source, print the last tweet ID, and on the next run pass it in:

search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False, since_id=pre_last_tweet_id - 1).items(count_no)
If you keep the last tweet ID at the end of the loop and save it to a text file or similar, keyed by user ID, the total number of tweets fetched per run can be reduced.
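A minimal sketch of that bookkeeping (the file name and JSON format are my own choice, not from the article): store a screen_name → last tweet ID mapping in a small state file, and feed the loaded value into since_id on the next run.

```python
import json
import os

STATE_FILE = 'last_tweet_ids.json'  # hypothetical file name

def load_last_id(screen_name):
    """Return the saved last tweet ID for this user, or None on the first run."""
    if not os.path.exists(STATE_FILE):
        return None
    with open(STATE_FILE) as f:
        return json.load(f).get(screen_name)

def save_last_id(screen_name, tweet_id):
    """Remember the newest tweet ID seen for this user."""
    state = {}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)
    state[screen_name] = tweet_id
    with open(STATE_FILE, 'w') as f:
        json.dump(state, f)

save_last_id('some_account', 1234567890)
print(load_last_id('some_account'))   # 1234567890
print(load_last_id('other_account'))  # None
```

On the next run, `since_id=load_last_id(key_account)` (skipping the parameter when it returns None) would restrict the search to new tweets only.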
At first I wrote the script to hit the API directly without tweepy, but I ran into the limit that a single request can return at most 200 tweets. I tried building a loop that pages through with since_id and max_id, but since tweepy handles paging transparently, I switched to a tweepy-based source partway through.
Based on this source, I plan to build an application that saves images a little more conveniently.
References:
- I tried to get Twitter images in "batch" with Python
- Official documentation: API specification for user_timeline