This is an article for the SLP KBIT Advent Calendar 2019. I've always wanted to collect images from Twitter, so I took this opportunity to do it. The program downloads the images attached to the tweets of a specific account.
Python 3.7.5
Because the program retrieves images posted on Twitter through the API, you need API keys. I will omit how to obtain them, since a quick search will turn up instructions. The program uses tweepy, so install it first.
pip install tweepy
Create a file to hold the keys you obtained. The keys could also be written directly in the script itself, but I personally prefer a separate file.
config.py
CONFIG = {
"CONSUMER_KEY":"XXXXXXXXXXX",
"CONSUMER_SECRET":"XXXXXXXXXXXX",
"ACCESS_TOKEN":"XXXXXXXXXXXXXXXXXXX",
"ACCESS_SECRET":"XXXXXXXXXXXXXXXXX",
}
Import what you need and load the keys from config.py. The last three lines set up authentication and are required to use the API.
twitter.py
import tweepy
from config import CONFIG
import urllib.request
import re
CONSUMER_KEY = CONFIG["CONSUMER_KEY"]
CONSUMER_SECRET = CONFIG["CONSUMER_SECRET"]
ACCESS_TOKEN = CONFIG["ACCESS_TOKEN"]
ACCESS_SECRET = CONFIG["ACCESS_SECRET"]
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth)
The program runs in the terminal. When you enter an account ID (screen name), it fetches tweets from that account and downloads the images from any tweets that have them.
twitter.py
import tweepy
from config import CONFIG
import urllib.request
import re
CONSUMER_KEY = CONFIG["CONSUMER_KEY"]
CONSUMER_SECRET = CONFIG["CONSUMER_SECRET"]
ACCESS_TOKEN = CONFIG["ACCESS_TOKEN"]
ACCESS_SECRET = CONFIG["ACCESS_SECRET"]
auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)
api = tweepy.API(auth)
# Collect image URLs from the account's tweets
def log(user_name, count, id):
    result_url = []
    # Fetch two pages of up to `count` tweets each, going back in time
    for i in range(0, 2):
        results = api.user_timeline(screen_name=user_name, count=count, max_id=id)
        if not results:
            break
        # max_id is inclusive, so continue from just below the oldest tweet seen
        id = results[-1].id - 1
        for result in results:
            if 'media' in result.entities:
                # Skip retweets
                if 'RT @' not in result.text:
                    for media in result.extended_entities['media']:
                        result_url.append(media['media_url'])
    return result_url
# Build the local file name from the image URL
def extract_pic_file(image_url):
    m = re.search(r"(([A-Za-z0-9]|_)+\.(png|jpg))", image_url)
    if m:
        name = 'img_dl/' + m.group(0)
    else:
        name = 'img_dl/None.png'
    return name
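# Download every URL in the list; the img_dl/ folder must already exist
def save_image(url, name):
    # `name` is passed by start() but is not used here
    for image_url in url:
        file_name = extract_pic_file(image_url)
        urllib.request.urlretrieve(image_url, file_name)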
# Return the ID of the account's newest tweet (None if there are no tweets)
def fast(user_name):
    results = api.user_timeline(screen_name=user_name, count=1)
    for result in results:
        return result.id
    return None
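def start():
    count = 100
    user_name = input("Enter ID>>")
    id = fast(user_name)
    if id is None:
        print("No tweets found")
        return
    # max_id is inclusive, so pass the newest ID as-is to include that tweet
    url = log(user_name, count, id)
    save_image(url, user_name)
if __name__ == "__main__":
    start()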
The log function collects the image URLs, and the save_image function saves the images to the specified folder.
The log function receives the account ID, the number of tweets to fetch, and a tweet ID as arguments. The fetched tweets are processed one by one in a for loop. The image URL is stored in a JSON-like structure, so it is taken out and appended to result_url, which becomes the return value. I don't want retweeted images this time, so an if statement excludes them.
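To make the extraction step concrete, here is a minimal sketch that runs on hand-written dictionaries instead of real API results. The sample data and the helper name collect_media_urls are assumptions made for illustration only; with tweepy the same fields are read as attributes (result.text, result.entities, result.extended_entities).

```python
# Hypothetical sample data shaped like the fields the log function reads.
sample_tweets = [
    {
        "text": "New drawing!",
        "entities": {"media": [{}]},
        "extended_entities": {
            "media": [
                {"media_url": "http://pbs.twimg.com/media/AAA111.jpg"},
                {"media_url": "http://pbs.twimg.com/media/BBB222.png"},
            ]
        },
    },
    {
        "text": "RT @someone: look at this",
        "entities": {"media": [{}]},
        "extended_entities": {
            "media": [{"media_url": "http://pbs.twimg.com/media/CCC333.jpg"}]
        },
    },
    {"text": "Just text, no image", "entities": {}},
]


def collect_media_urls(tweets):
    """Return the image URLs of all non-retweet tweets that carry media."""
    urls = []
    for tweet in tweets:
        if "media" not in tweet["entities"]:
            continue  # no attached image
        if "RT @" in tweet["text"]:
            continue  # skip retweets, the same check as in log
        for media in tweet["extended_entities"]["media"]:
            urls.append(media["media_url"])
    return urls


print(collect_media_urls(sample_tweets))
```

Only the two URLs from the first tweet come back: the second tweet is a retweet and the third has no media.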
As its name implies, save_image is the function that saves the images. Using the URL itself as the file name causes an error when saving, so it calls the extract_pic_file function to derive a file name.
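One thing the saving step depends on is that the output folder exists. The following is a rough sketch of a variant that creates the folder first; the function name save_images and the out_dir parameter are hypothetical, and the file name is simply taken from the last part of the URL to keep the example short.

```python
import os
import urllib.request


def save_images(image_urls, out_dir="img_dl"):
    """Download each URL into out_dir and return the saved file paths."""
    # Create the folder if it is missing; do nothing if it already exists.
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    for image_url in image_urls:
        # Use the text after the last "/" as the local file name.
        file_name = os.path.join(out_dir, image_url.rsplit("/", 1)[-1])
        urllib.request.urlretrieve(image_url, file_name)
        saved.append(file_name)
    return saved


# Usage (needs network access, so it is left as a comment):
# save_images(["http://pbs.twimg.com/media/AAA111.jpg"])
```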
The extract_pic_file function decides the file name of the image from the URL using a regular expression. This prevents images from overwriting each other when several are retrieved from the same account.
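The pattern only accepts letters, digits, and underscores before the extension, so a file name containing any other character, such as a hyphen, is cut short. The sketch below compares the same pattern with one possible alternative that takes the last segment of the URL path. The function names and sample URLs are made up for this illustration.

```python
import posixpath
import re
from urllib.parse import urlparse


def name_by_regex(image_url):
    """Same pattern as extract_pic_file, without the img_dl/ prefix."""
    m = re.search(r"(([A-Za-z0-9]|_)+\.(png|jpg))", image_url)
    return m.group(0) if m else None


def name_by_path(image_url):
    """Hypothetical alternative: take the last segment of the URL path."""
    return posixpath.basename(urlparse(image_url).path)


plain = "http://pbs.twimg.com/media/EKabc123.jpg"
hyphen = "http://pbs.twimg.com/media/EK-abc_12.jpg"

print(name_by_regex(plain))   # EKabc123.jpg
print(name_by_regex(hyphen))  # abc_12.jpg (the part before "-" is lost)
print(name_by_path(hyphen))   # EK-abc_12.jpg
```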
The fast function handles the special case of the first request. It is needed because the log function takes a tweet ID as an argument and retrieves tweets going back in time from that tweet, so the ID of the most recent tweet has to be obtained first.
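The idea of getting the newest ID first and then walking backwards can be tried without the API. In this sketch fake_user_timeline is a hypothetical stand-in for api.user_timeline that works on a fixed list of IDs. It treats max_id as inclusive, which is how the Twitter API defines it, so each new page starts one below the oldest ID already seen.

```python
# Hypothetical stand-in for the account's timeline: tweet IDs, newest first.
FAKE_TIMELINE = [110, 109, 107, 104, 103, 101]


def fake_user_timeline(count, max_id=None):
    """Return up to `count` IDs that are <= max_id (inclusive)."""
    ids = [i for i in FAKE_TIMELINE if max_id is None or i <= max_id]
    return ids[:count]


def fetch_pages(count, pages):
    """Walk backwards through the timeline, `count` tweets per page."""
    newest = fake_user_timeline(count=1)  # the role played by fast
    if not newest:
        return []
    max_id = newest[0]
    collected = []
    for _ in range(pages):
        page = fake_user_timeline(count=count, max_id=max_id)
        if not page:
            break
        collected.extend(page)
        # Step just below the oldest ID so the next page does not repeat it.
        max_id = page[-1] - 1
    return collected


print(fetch_pages(count=2, pages=2))  # [110, 109, 107, 104]
```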
I'm not good at writing clear explanations. I find a program easiest to understand when I know what happens inside each function, so I wrote a description for each one.