Introduction

This article is part of the SLP KBIT Advent Calendar 2019. I have always wanted to collect images from Twitter, so I took this opportunity to do it. The program downloads the images attached to the tweets of a specific account.

Environment

Python 3.7.5

Preparation

The program retrieves images posted on Twitter through the Twitter API, so you will need API keys. I will omit how to obtain them, because a quick search turns up the procedure. The program uses tweepy, so install it.

pip install tweepy

Create a file to hold the keys you obtained. They could be written directly in the script that you run, but I personally prefer keeping them in a separate file.

`config.py`


# The name CONFIG must match the name imported in twitter.py
CONFIG = {
    "CONSUMER_KEY": "XXXXXXXXXXX",
    "CONSUMER_SECRET": "XXXXXXXXXXXX",
    "ACCESS_TOKEN": "XXXXXXXXXXXXXXXXXXX",
    "ACCESS_SECRET": "XXXXXXXXXXXXXXXXX",
}
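
Before calling the API, it can help to confirm that every key in the configuration has actually been filled in. The following is only a minimal sketch and not part of the original program: the helper name `missing_keys` and the rule that a value made up only of `X` characters counts as a placeholder are assumptions made for illustration.

```python
# Hypothetical helper: report which of the four keys are still unset.
REQUIRED_KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_SECRET")


def missing_keys(config):
    """Return the required keys that are absent, empty, or still placeholders."""
    missing = []
    for key in REQUIRED_KEYS:
        value = config.get(key, "")
        # Assumption: a value consisting only of "X" is an unedited placeholder
        if not value or set(value) == {"X"}:
            missing.append(key)
    return missing


if __name__ == "__main__":
    sample = {
        "CONSUMER_KEY": "XXXXXXXXXXX",
        "CONSUMER_SECRET": "abc123",
        "ACCESS_TOKEN": "",
        "ACCESS_SECRET": "def456",
    }
    print(missing_keys(sample))
```

In the real script, the dictionary imported from config.py would be passed to the helper, and the program could stop early if the returned list is not empty.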

Image acquisition

Import what you need and load the keys from config.py. The last three lines are needed to use the API.

`twitter.py`


import tweepy
from config import CONFIG
import urllib.request
import re


CONSUMER_KEY = CONFIG["CONSUMER_KEY"]
CONSUMER_SECRET = CONFIG["CONSUMER_SECRET"]
ACCESS_TOKEN = CONFIG["ACCESS_TOKEN"]
ACCESS_SECRET = CONFIG["ACCESS_SECRET"]

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

api = tweepy.API(auth)

Whole program

The program runs in the terminal. When you enter an account ID (the screen name), it fetches that account's tweets and downloads any images attached to them.

`twitter.py`


import tweepy
from config import CONFIG
import urllib.request
import re


CONSUMER_KEY = CONFIG["CONSUMER_KEY"]
CONSUMER_SECRET = CONFIG["CONSUMER_SECRET"]
ACCESS_TOKEN = CONFIG["ACCESS_TOKEN"]
ACCESS_SECRET = CONFIG["ACCESS_SECRET"]

auth = tweepy.OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_SECRET)

api = tweepy.API(auth)

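# Collect image URLs from the account's timeline, skipping retweets
def log(user_name, count, id):
    result_url = []
    for i in range(0, 2):
        results = api.user_timeline(screen_name=user_name, count=count, max_id=id)
        if not results:
            break  # no more tweets to read
        # max_id is inclusive, so continue from just below the oldest tweet fetched
        id = results[-1].id - 1
        for result in results:
            # Keep only tweets that have attached media and are not retweets
            if 'media' in result.entities and 'RT @' not in result.text:
                for media in result.extended_entities['media']:
                    result_url.append(media['media_url'])
    return result_url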

def extract_pic_file(image_url):
    # File names in media URLs can contain "-" as well as "_"
    m = re.search(r"[A-Za-z0-9_-]+\.(png|jpg)", image_url)
    if m:
        name = 'img_dl/' + m.group(0)
    else:
        name = 'img_dl/None.png'

    return name

# Download every image URL into the img_dl folder (the folder must already exist)
def save_image(url, name):
    for image_url in url:
        file_name = extract_pic_file(image_url)
        urllib.request.urlretrieve(image_url, file_name)

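# Return the ID of the account's most recent tweet, used as the starting point
def fast(user_name):
    results = api.user_timeline(screen_name=user_name, count=1)
    if not results:
        raise ValueError("No tweets found for " + user_name)
    return results[0].id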

def start():
    count = 100
    user_name = input("Enter ID>>")
    id = fast(user_name)
    # max_id is inclusive, so pass the newest ID itself to include the newest tweet
    url = log(user_name, count, id)
    save_image(url, user_name)

if __name__ == "__main__":
    start()

Flow

The log function collects the image URLs, and the save_image function saves the images in the img_dl folder, which must exist before the program is run.

Function description

log function

The function receives the account ID, the number of tweets to fetch, and a tweet ID as arguments. It loops over the fetched tweets with a for statement. The image URLs are stored in a JSON-like structure, so the function extracts them, appends them to result_url, and returns that list. I do not want retweeted images this time, so an if statement excludes retweets.
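
The filtering described above can be tried without an API key. The following is a minimal sketch under stated assumptions: `image_urls` and `fake_tweet` are hypothetical names, and the stand-in objects only imitate the three attributes of a tweepy status that the log function reads (`text`, `entities`, `extended_entities`). As far as I understand the API, `entities` lists only the first image while `extended_entities` lists all of them, which is why the loop reads the latter.

```python
from types import SimpleNamespace


def image_urls(tweets):
    """Collect media URLs from tweets that carry images and are not retweets."""
    urls = []
    for tweet in tweets:
        if "media" not in tweet.entities:
            continue  # text-only tweet
        if "RT @" in tweet.text:
            continue  # retweet, skipped in the same way as in the log function
        for media in tweet.extended_entities["media"]:
            urls.append(media["media_url"])
    return urls


def fake_tweet(text, urls):
    """Build a stand-in for a tweet with only the fields used above."""
    media = [{"media_url": u} for u in urls]
    entities = {"media": media[:1]} if media else {}
    return SimpleNamespace(text=text, entities=entities,
                           extended_entities={"media": media})


if __name__ == "__main__":
    tweets = [
        fake_tweet("two photos", ["http://example.com/a.jpg", "http://example.com/b.jpg"]),
        fake_tweet("RT @someone: a photo", ["http://example.com/c.jpg"]),
        fake_tweet("just text", []),
    ]
    print(image_urls(tweets))
```

Only the two URLs of the first tweet are collected: the second tweet is treated as a retweet and the third has no media.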

save_image function

As the name implies, this function saves the images. A URL cannot be used as a file name as it is, and saving would fail with an error, so the function calls extract_pic_file to turn each URL into a file name.
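
One possible variation, shown here only as a sketch, creates the destination folder when it is missing and accepts the download function as a parameter so that the logic can be tried offline. The name `save_images`, the `retrieve` parameter, and the choice to take the file name from the last path segment of the URL are all assumptions for illustration, not the original implementation.

```python
import os
import tempfile
import urllib.request


def save_images(urls, folder, retrieve=urllib.request.urlretrieve):
    """Download each URL into folder and return the saved paths."""
    os.makedirs(folder, exist_ok=True)  # create the folder on first use
    saved = []
    for image_url in urls:
        # Use the last path segment of the URL as the file name
        file_name = os.path.join(folder, image_url.rsplit("/", 1)[-1])
        retrieve(image_url, file_name)
        saved.append(file_name)
    return saved


if __name__ == "__main__":
    # Offline demonstration: a stand-in downloader that writes an empty file
    def fake_retrieve(url, path):
        open(path, "wb").close()

    with tempfile.TemporaryDirectory() as tmp:
        paths = save_images(["http://example.com/media/a.jpg"],
                            os.path.join(tmp, "img_dl"), retrieve=fake_retrieve)
        print([os.path.basename(p) for p in paths])
```

When `retrieve` is left at its default, the function downloads over the network with `urllib.request.urlretrieve`, as the original save_image does.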

extract_pic_file function

The file name of the image is derived from the URL with a regular expression. Because each image gets its own name, images retrieved from the same account do not overwrite one another.
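
The behaviour of such a pattern is easy to check on sample URLs. This is a minimal sketch: `file_name_from_url` is a hypothetical name, the sample URLs are made up, and including `-` in the character class is an assumption based on media file names that contain hyphens.

```python
import re

# Letters, digits, "_" and "-" followed by a .png or .jpg extension
PATTERN = re.compile(r"[A-Za-z0-9_-]+\.(?:png|jpg)")


def file_name_from_url(image_url, folder="img_dl"):
    """Return folder/<file name>, or folder/None.png when nothing matches."""
    m = PATTERN.search(image_url)
    name = m.group(0) if m else "None.png"
    return folder + "/" + name


if __name__ == "__main__":
    print(file_name_from_url("http://pbs.twimg.com/media/EK-abc_1.jpg"))
    print(file_name_from_url("http://example.com/video.mp4"))
```

The first call yields `img_dl/EK-abc_1.jpg`. The second URL has no png or jpg name, so it falls back to `img_dl/None.png`, which means every non-matching URL would share that one file name.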

fast function

This function handles the special case of the first request. The log function takes a tweet ID as an argument and retrieves tweets going back in time from that tweet, so a starting ID, the ID of the account's most recent tweet, has to be obtained first.
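
The idea of walking backwards through a timeline from a starting ID can be sketched without the API. In this sketch `fetch_pages` and `fake_fetch` are hypothetical names, plain integers stand in for tweets, and the stand-in fetch function treats its `max_id` argument as inclusive, which is my understanding of how the timeline endpoint behaves.

```python
def fetch_pages(fetch, start_id, pages):
    """Call fetch(max_id) repeatedly, walking backwards through tweet IDs."""
    collected = []
    max_id = start_id
    for _ in range(pages):
        batch = fetch(max_id)
        if not batch:
            break  # nothing older is left
        collected.extend(batch)
        # Assumption: max_id is inclusive, so step just below the oldest ID
        max_id = batch[-1] - 1
    return collected


if __name__ == "__main__":
    all_ids = [50, 40, 30, 20, 10]  # newest first, like a timeline

    def fake_fetch(max_id, count=2):
        # Stand-in for the API call: IDs at or below max_id, newest first
        return [i for i in all_ids if i <= max_id][:count]

    print(fetch_pages(fake_fetch, start_id=50, pages=2))
```

Starting from the newest ID, two pages of two items each are collected without fetching any item twice.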

Conclusion

I am not good at writing clear explanations. I thought the program would be easier to follow if it were clear what happens inside each function, so I wrote a description for each one.
