I thought it would be handy to have a program that goes back through a user's tweets on Twitter and saves all the posted images in one go, so I wrote one. It is written in Python 2.7, which I am currently studying, and should run in any Python 2 environment.
Basically, you write the API keys of your Twitter account, the id of the user whose images you want to collect, and the number of tweets to go back into the program, and then run it. Due to API restrictions, you can only go back through the latest 3200 tweets of any one user. Tweets with multiple images are supported, but GIFs, videos, etc. are not saved. If the target is a protected (locked) account, you need the API keys of an account that follows that user.
imgcrawler_twi.py
#coding: UTF-8
from requests_oauthlib import OAuth1Session
import json
import twitkey
import requests
import sys, urllib
import os.path
import time
# The four key values are read from the separate file twitkey.py.
twitter = OAuth1Session(twitkey.twkey["CONSUMER_KEY"],
                        twitkey.twkey["CONSUMER_SECRET"],
                        twitkey.twkey["ACCESS_TOKEN"],
                        twitkey.twkey["ACCESS_TOKEN_SECRET"]
                        )

Get_Count = 17     # Number of requests + 1 (the loop runs Get_Count - 1 times)
Get_At_Once = 200  # Number of tweets fetched per request (maximum 200)
User_Id = ""       # id of the user whose images you want to save. Example: github
Path = ""          # Directory to save the images in, ending with "/". Example: ./Images/

num = None   # id of the oldest tweet seen so far
counter = 1  # running number of the tweet being processed
for i in range(1, Get_Count):
    params = {"count": Get_At_Once}
    if num is not None:
        # max_id is inclusive, so subtract 1 to avoid fetching the same tweet twice
        params["max_id"] = num - 1
    req = twitter.get("https://api.twitter.com/"
                      "1.1/statuses/user_timeline.json"
                      "?screen_name=%s&include_rts=false" % User_Id,
                      params=params)
    timeline = json.loads(req.text)
    if req.status_code == 200:
        for tweet in timeline:
            print counter
            print tweet["text"]
            num = tweet["id"]
            counter = counter + 1
            if "extended_entities" in tweet.keys():
                if "media" in tweet["extended_entities"].keys():
                    media = tweet["extended_entities"]["media"]
                    print len(media)
                    # j numbers the images attached to one tweet (0 to 3)
                    for j in range(0, len(media)):
                        if "type" in media[j].keys():
                            if media[j]["type"] == "photo":
                                print tweet["text"]
                                url = media[j]["media_url_https"]
                                img = urllib.urlopen(url)
                                Name = tweet["user"]["name"]
                                # created_at looks like "Wed Oct 10 20:19:24 +0000 2018"
                                created_at = tweet["created_at"]
                                Month = created_at[4:7]
                                Date = created_at[8:10]
                                Hour = created_at[11:13]
                                Minute = created_at[14:16]
                                Second = created_at[17:19]
                                Year = created_at[26:]
                                img_name = Name+"_"+Year+"_"+Month+"_"+Date+"_"+Hour+"_"+Minute+"_"+Second
                                localfile = open(Path + img_name + "_" + str(j) + ".jpg", 'wb')
                                localfile.write(img.read())
                                img.close()
                                localfile.close()
            else:
                print "No Image"
    else:
        # Error handling: print the status code and wait before the next request
        print (req.status_code)
        time.sleep(240)
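As a side note, the nested checks on "extended_entities", "media" and "type" in the loop above could probably be folded into one small function. The following is only a sketch of that idea, not part of the program: the name photo_urls and the sample tweets are made up for illustration, and it is written so that it also runs on Python 3.

```python
def photo_urls(tweet):
    """Return the HTTPS URLs of every photo attached to one tweet dict."""
    media = tweet.get("extended_entities", {}).get("media", [])
    # Only entries whose type is "photo" are kept, so GIFs and videos are skipped.
    return [m["media_url_https"] for m in media if m.get("type") == "photo"]


# Hand-written samples shaped like timeline entries (not real data).
photo_tweet = {
    "text": "two photos",
    "extended_entities": {
        "media": [
            {"type": "photo", "media_url_https": "https://example.com/a.jpg"},
            {"type": "photo", "media_url_https": "https://example.com/b.jpg"},
        ]
    },
}
gif_tweet = {
    "text": "one gif",
    "extended_entities": {
        "media": [
            {"type": "animated_gif", "media_url_https": "https://example.com/c.jpg"},
        ]
    },
}

print(photo_urls(photo_tweet))           # the two example.com photo URLs
print(photo_urls(gif_tweet))             # []
print(photo_urls({"text": "no media"}))  # []
```

With a helper like this, the download loop would only need to iterate over the returned list, and an empty list would correspond to the "No Image" case.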
twitkey.py
#coding: UTF-8
twkey = {
"CONSUMER_KEY": "",
"CONSUMER_SECRET": "",
"ACCESS_TOKEN": "",
"ACCESS_TOKEN_SECRET": ""
}
#Please enter each parameter
In twitkey.py, enter the API keys of your Twitter account. The account can be either a throwaway account or your main one. Four values are required, and they are easy to look up.
The four values required this time are CONSUMER_KEY, CONSUMER_SECRET, ACCESS_TOKEN, and ACCESS_TOKEN_SECRET.
This page explains how to find your API keys: http://phiary.me/twitter-api-key-get-how-to/
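Since a blank value in twitkey.py only shows up later as a failed request, it may help to check the dictionary before running the crawler. This is a minimal sketch under my own assumptions: the function name missing_keys and the placeholder values are hypothetical, and nothing here talks to Twitter.

```python
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET",
                 "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def missing_keys(twkey):
    """Return the names of required keys that are absent or left empty."""
    return [name for name in REQUIRED_KEYS if not twkey.get(name)]


# Placeholder values only; real keys come from your own Twitter account.
example = {"CONSUMER_KEY": "xxxx", "CONSUMER_SECRET": "", "ACCESS_TOKEN": "yyyy"}
print(missing_keys(example))  # ['CONSUMER_SECRET', 'ACCESS_TOKEN_SECRET']
```

In the real program you would pass twitkey.twkey instead of the example dictionary and stop if the returned list is not empty.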
There are four parameter values in the program. Specify the required values for each.
Get_Count =
Get_At_Once =
User_Id = ""
Path = ""
Use "Get_Count" and "Get_At_Once" to specify how many tweets to go back. Specifically, the program fetches "Get_At_Once" tweets at a time and repeats this "Get_Count-1" times. At most 200 tweets can be fetched in one request, so the maximum value of "Get_At_Once" is 200. The API also only lets you go back through the latest 3200 tweets per user, so make sure that ("Get_Count"-1) x "Get_At_Once" is 3200 or less. To go back as far as possible, I recommend 17 for "Get_Count" and 200 for "Get_At_Once".
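To make the arithmetic concrete, here is a rough sketch of how the two settings relate, plus a toy version of the "go back page by page" idea. The names get_count_for, walk_timeline and fake_fetch are made up for this example, and fake_fetch stands in for the real API call, so treat it as an illustration only. The sketch subtracts 1 from the oldest id because, as far as I understand the API, max_id is inclusive.

```python
API_LIMIT = 3200       # latest tweets reachable per user
MAX_PER_REQUEST = 200  # upper bound for Get_At_Once


def get_count_for(target_tweets, get_at_once=MAX_PER_REQUEST):
    """Return the Get_Count value needed to go back target_tweets tweets."""
    if not 1 <= get_at_once <= MAX_PER_REQUEST:
        raise ValueError("Get_At_Once must be between 1 and 200")
    target = min(target_tweets, API_LIMIT)
    requests_needed = -(-target // get_at_once)  # ceiling division
    return requests_needed + 1  # range(1, Get_Count) runs Get_Count - 1 times


def walk_timeline(fetch_page, get_count, get_at_once):
    """Yield tweets newest-first, making at most get_count - 1 requests.

    fetch_page(count, max_id) is a hypothetical callable returning a list of
    tweet dicts with id <= max_id (or the newest ones when max_id is None).
    """
    max_id = None
    for _ in range(1, get_count):
        page = fetch_page(get_at_once, max_id)
        if not page:
            break
        for tweet in page:
            yield tweet
        max_id = page[-1]["id"] - 1  # step past the oldest tweet already seen


def fake_fetch(count, max_id):
    """Pretend timeline holding ten tweets with ids 10 down to 1."""
    ids = [i for i in range(10, 0, -1) if max_id is None or i <= max_id]
    return [{"id": i} for i in ids[:count]]


print(get_count_for(3200))      # 17
print(get_count_for(450, 200))  # 4
ids = [t["id"] for t in walk_timeline(fake_fetch, get_count=4, get_at_once=3)]
print(ids)                      # [10, 9, 8, 7, 6, 5, 4, 3, 2]
```

The last line shows that a Get_Count of 4 with a Get_At_Once of 3 makes three requests and reaches nine tweets, with no tweet returned twice.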
For "User_Id", specify the id of the user whose images you want to collect, for example "github".
For "Path", specify the path of the folder where you want to save the images. You can save them anywhere, but for example, create a folder called "Images" in the same directory as the code and set "Path" to "./Images/". The program simply puts the file name after "Path", so the folder must already exist and the path must end with "/".
Each image is saved as "User's account name" _ "Tweet date and time (year, month, day, hour, minute, second)" _ "0 to 3 (the index used when a tweet has multiple images attached)", with the extension .jpg.
The tweet date and time are returned in UTC (London time), so they are 9 hours behind Japan time.
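If you would rather have Japan time in the file name, or do not want to depend on the trailing "/" in "Path", something like the following might work. It is a sketch under my own assumptions: the function names are hypothetical, the month is written as a number rather than "Oct", the timestamp is assumed to always carry the "+0000" offset, and parsing the English day and month names relies on the default locale.

```python
import os
import tempfile
from datetime import datetime, timedelta

# Shape of created_at, e.g. "Wed Oct 10 20:19:24 +0000 2018"
TWITTER_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def local_stamp(created_at, offset_hours=9):
    """Convert a UTC created_at string into a local-time file name stamp."""
    utc = datetime.strptime(created_at, TWITTER_FORMAT)
    local = utc + timedelta(hours=offset_hours)
    return local.strftime("%Y_%m_%d_%H_%M_%S")


def prepare_dir(path):
    """Create the save directory if it does not exist yet and return it."""
    if not os.path.isdir(path):
        os.makedirs(path)
    return path


def image_path(directory, name, created_at, index):
    """Build the save path; works with or without a trailing slash."""
    file_name = "%s_%s_%d.jpg" % (name, local_stamp(created_at), index)
    return os.path.join(directory, file_name)


created = "Wed Oct 10 20:19:24 +0000 2018"
save_dir = prepare_dir(os.path.join(tempfile.mkdtemp(), "Images"))
print(local_stamp(created))  # 2018_10_11_05_19_24
print(os.path.basename(image_path(save_dir, "github", created, 0)))
```

The example uses a temporary folder so that it can be run anywhere; in the crawler you would pass your own "Path" value instead.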
Save twitkey.py and imgcrawler_twi.py in the same directory and run imgcrawler_twi.py. If the images are saved in the folder you specified, it worked.
When you run it, you may get an error telling you that a module is missing. In that case, install modules such as "requests" and "requests-oauthlib" with pip and run it again.
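If you want to find out which modules are missing before running the crawler, a small check like this could be used. It is only a sketch: the function name missing_modules is made up, and the list contains the import names, which can differ from the pip package names ("requests-oauthlib" is imported as "requests_oauthlib").

```python
import importlib


def missing_modules(names):
    """Return the module names from names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing


# Third-party import names used by imgcrawler_twi.py.
needed = ["requests", "requests_oauthlib"]
print(missing_modules(needed))  # an empty list means nothing is missing
```

Any name that is printed needs to be installed with pip before the crawler will start.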
I have also put this crawler in a repository on GitHub. Please take a look! https://github.com/tyokuyoku/Twitter_Images_Crawler