I wanted to download a user's media timeline (what you see when you open the "Media" tab on a user page), so I'm writing this article as a note to myself.
(Incidentally, this is my first Python program.)
For the tokens and Python setup required to connect to the Twitter API, I followed other people's articles (so I'll skip that part here). With that setup in place, I first tried fetching my own timeline.
If you print the text of the retrieved tweets, you can check the result. The tweets come through, and the images were saved in the img folder next to the source code. However, retweets are mixed in. The Media tab on Twitter does not show tweets the user has retweeted, so I want to exclude RTs here as well.
Looking at the official documentation (the API specification for user_timeline), there is a parameter called **include_rts**; setting it to **False** excludes retweets. With tweepy, it seems it can be passed like this:
tweetpytest.py
search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False).items(count_no)
This excludes RTs from the results. One caveat: the count parameter no longer matches the number of tweets actually returned. Reading the official documentation again, it seems the timeline is fetched including RTs first, and tweets are then filtered out afterwards based on the parameter conditions.
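That post-fetch filtering can be illustrated without calling the API. In the real payload a retweet carries a `retweeted_status` field; the tweet dicts below are made-up stand-ins for the API response:

```python
# Sketch of how a fetched page shrinks when RTs are filtered out afterwards.
# These dicts are hypothetical stand-ins; in the real API response,
# retweets carry a 'retweeted_status' field.
page = [
    {"id": 1, "text": "original tweet"},
    {"id": 2, "text": "RT", "retweeted_status": {"id": 99}},
    {"id": 3, "text": "another original"},
    {"id": 4, "text": "RT again", "retweeted_status": {"id": 98}},
    {"id": 5, "text": "last original"},
]

# Filtering happens after the fetch, so a page of 5 can shrink to 3.
filtered = [t for t in page if "retweeted_status" not in t]
print(len(page), len(filtered))  # 5 3
```

This is why asking for `count` tweets can return fewer than `count` results.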
After that, iterate over the returned results with for ... in, extract the image or video URL, and pass it to a save method. The procedure is as follows.
The return value from tweepy is an ItemIterator, and each child element is a Status object (confirmed by debugging). If it were a dict, you could check a field with dict.get('field_name'), but these are class attributes, so that approach doesn't work. Instead, I check with the built-in **hasattr** function.
tweetpytest.py
if hasattr(result, 'extended_entities'):
If this attribute is missing, it isn't a media tweet, so skip it and move on to the next one.
Image tweets don't have video_info, so that key distinguishes images from videos.
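Both checks can be seen on a toy example. The class below is a hypothetical stand-in for a tweepy Status object (not tweepy itself): hasattr handles the object attribute, and the in operator handles the dict key.

```python
class FakeStatus:
    """Hypothetical stand-in for a tweepy Status object, for illustration."""
    def __init__(self, media=None):
        if media is not None:
            # the attribute only exists on media tweets, like the real Status
            self.extended_entities = {"media": media}

text_tweet = FakeStatus()
photo_tweet = FakeStatus(media=[{"type": "photo", "media_url": "http://example.com/a.jpg"}])
video_tweet = FakeStatus(media=[{"type": "video", "video_info": {"variants": []}}])

for t in (text_tweet, photo_tweet, video_tweet):
    if not hasattr(t, "extended_entities"):
        print("no media")      # plain text tweet: skip it
    elif "video_info" in t.extended_entities["media"][0]:
        print("video or GIF")  # dict key check works here
    else:
        print("image")
```

Running this prints "no media", "image", "video or GIF" in that order.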
GIF is also saved as mp4 for the time being.
tweetpytest.py
bitrate_array = []
for movie in ex_media_video_variants:
    bitrate_array.append(movie.get('bitrate', 0))
max_index = bitrate_array.index(max(bitrate_array))
movie_url = ex_media_video_variants[max_index]['url']
The variants are stored in various sizes and formats, and there seems to be no guarantee about which one sits at which index. With a fixed index you might get a small size, or a URL whose format isn't mp4 (this actually happened). So the loop above inspects every variant so that the mp4 video with the largest bitrate is downloaded; variants without a bitrate field are ranked lowest by .get('bitrate', 0).
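With sample variant data (made up here, but shaped like the real video_info['variants'] payload), the same idea can also be written to filter on content_type explicitly and pick the best candidate with max:

```python
# Hypothetical variants list shaped like the Twitter API's video_info['variants'].
variants = [
    {"content_type": "application/x-mpegURL", "url": "http://example.com/pl.m3u8"},
    {"content_type": "video/mp4", "bitrate": 320000, "url": "http://example.com/low.mp4"},
    {"content_type": "video/mp4", "bitrate": 2176000, "url": "http://example.com/high.mp4"},
    {"content_type": "video/mp4", "bitrate": 832000, "url": "http://example.com/mid.mp4"},
]

# Keep only mp4 variants, then take the one with the highest bitrate.
mp4s = [v for v in variants if v["content_type"] == "video/mp4"]
best = max(mp4s, key=lambda v: v.get("bitrate", 0))
print(best["url"])  # http://example.com/high.mp4
```

Filtering on content_type first guarantees a non-mp4 URL can never be selected, even if its entry happened to carry a bitrate.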
Putting the above together, the source is as follows.
tweetpytest.py
#!/usr/bin/python3
# coding: UTF-8
import os
import urllib.request
import tweepy
import config  # config.py holds the API keys
import update_tweetinfo_csv  # author's own module (not used in this snippet)

CONSUMER_KEY = config.CONSUMER_KEY
CONSUMER_SECRET = config.CONSUMER_SECRET
ACCESS_TOKEN = config.ACCESS_TOKEN
ACCESS_SECRET = config.ACCESS_TOKEN_SECRET
FOLDER_PATH = 'img/'  # destination folder for downloaded media

# Authentication
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth)

def download_file(url, file_name):
    urllib.request.urlretrieve(url, FOLDER_PATH + file_name)

os.makedirs(FOLDER_PATH, exist_ok=True)  # make sure the img folder exists
key_account = input('Enter account name:')
count_no = int(input('Set search count:'))
search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False).items(count_no)
for result in search_results:
    if hasattr(result, 'extended_entities'):  # only media tweets have this attribute
        ex_media = result.extended_entities['media']
        tweet_id = result.id
        if 'video_info' in ex_media[0]:
            ex_media_video_variants = ex_media[0]['video_info']['variants']
            media_name = '%s-%s.mp4' % (key_account, tweet_id)
            if 'animated_gif' == ex_media[0]['type']:
                # GIF: saved as mp4
                gif_url = ex_media_video_variants[0]['url']
                download_file(gif_url, media_name)
            else:
                # Video: pick the variant with the highest bitrate
                bitrate_array = []
                for movie in ex_media_video_variants:
                    bitrate_array.append(movie.get('bitrate', 0))
                max_index = bitrate_array.index(max(bitrate_array))
                movie_url = ex_media_video_variants[max_index]['url']
                download_file(movie_url, media_name)
        else:
            # Images: download the original size of each one
            for image in ex_media:
                image_url = image['media_url']
                image_name = image_url.split('/')[-1]
                download_file(image_url + ':orig', image_name)
print('End')
An img folder is created under the source folder, and the images and videos are saved into it. For now, the goal of this article is achieved.
That completes the basic version, but as it stands, every run fetches the whole timeline again. To fetch only tweets newer than the previous run, add the following parameter.
search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False, since_id=pre_last_tweet_id).items(count_no)  # pre_last_tweet_id: newest tweet ID from the previous run
This adds the condition "since_id < target tweet ID", i.e. only tweets with an ID greater than since_id are returned. At the end of the previous source, print the last tweet ID, and on the next run pass it in:

search_results = tweepy.Cursor(api.user_timeline, screen_name=key_account, include_rts=False, since_id=pre_last_tweet_id - 1).items(count_no)
If you keep the last tweet ID at the end of the loop and save it to a text file or similar, keyed by user ID, the total number of tweets fetched per run can be reduced.
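A minimal sketch of that bookkeeping (the file name and JSON format are my own choice, not from the article): store a screen_name → last tweet ID mapping in a small state file, and feed the loaded value into since_id on the next run.

```python
import json
import os

STATE_FILE = 'last_tweet_ids.json'  # hypothetical file name

def load_last_id(screen_name):
    """Return the saved last tweet ID for this user, or None on the first run."""
    if not os.path.exists(STATE_FILE):
        return None
    with open(STATE_FILE) as f:
        return json.load(f).get(screen_name)

def save_last_id(screen_name, tweet_id):
    """Remember the newest tweet ID seen for this user."""
    state = {}
    if os.path.exists(STATE_FILE):
        with open(STATE_FILE) as f:
            state = json.load(f)
    state[screen_name] = tweet_id
    with open(STATE_FILE, 'w') as f:
        json.dump(state, f)

save_last_id('some_account', 1234567890)
print(load_last_id('some_account'))   # 1234567890
print(load_last_id('other_account'))  # None
```

On the next run, `since_id=load_last_id(key_account)` (skipping the parameter when it returns None) would restrict the search to new tweets only.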
At first I wrote the script to hit the API directly without tweepy, but I ran into the limit that a single request can return at most 200 tweets. I tried building a loop that pages through with since_id and max_id, but since tweepy handles paging transparently, I switched to a tweepy-based source partway through.
Based on this source, I plan to build an application that saves images a little more conveniently.
References:
- I tried to get Twitter images in "batch" with Python
- Official documentation: API specification for user_timeline