The person in charge of a Twitter-linked campaign in an in-house project had to collect tweets by hand. I wanted to do something about that, so I built a tweet collection tool with Docker and Python.

Problems I ran into

- I had a free Twitter Developer account, but it places many restrictions on which tweets you can retrieve, and signing a high-priced premium contract just for automation was not an option.
- So I used the third-party library "GetOldTweets" (https://github.com/Jefferson-Henrique/GetOldTweets-python) to build a program that could fetch any number of tweets from any period without the API.
  => One day the program started throwing errors. When I checked, the problem had already been reported in an Issue, and it seems unlikely to be fixed for a while.

- While I was stuck, I found the following repository, which an Indian engineer had mentioned in a comment on that Issue: https://github.com/itsayushisaxena/Get_Old_Tweets-Python
  Judging from the source code, it requires a Twitter Standard API account, but by combining tweepy and snscrape it can apparently retrieve tweets for any date range and count, as before.

What I made

- An environment where Python 3 runs in Docker
- Tweet scraping with the Twitter API in Python
- A shell script that automates everything, so the person in charge can run it from the terminal without knowing any docker commands

Source code: https://github.com/hikkymouse1007/GetTweets_pub

This time, I built a mechanism that runs on the project lead's own PC and removes the need for any difficult operations.

To do this, I used Docker and a shell script to chain container startup, tweet acquisition, and CSV creation into a single flow.

The directory structure is as follows.

.
├── Dockerfile
├── Makefile
├── README.md
├── command
│   └── twitter //Shell script
├── docker-compose.yml
└── src
    ├── csv_files //Output CSV here
    └── got_v2.py //Python source code

Dockerfile, docker-compose.yml

For the recipe of a container that runs Python 3, I referred to the following article: https://qiita.com/reflet/items/4b3f91661a54ec70a7dc
Because tweepy did not support Python 3.9 at the time of writing, I pinned the image to Python 3.8.

The Dockerfile sets up the Python runtime and installs the required libraries.

# Dockerfile
FROM python:3.8
USER root

# The build already runs as root, so sudo is unnecessary.
# Japanese locale, cowsay/fortunes for the bonus, and editors for debugging.
RUN apt-get update && \
    apt-get -y install locales cowsay fortunes vim less && \
    localedef -f UTF-8 -i ja_JP ja_JP.UTF-8 && \
    rm -rf /var/lib/apt/lists/*
# cowsay is installed under /usr/games
ENV PATH $PATH:/usr/games

ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
ENV TZ JST-9
ENV TERM xterm

RUN pip install --upgrade pip setuptools && \
    pip install requests requests_oauthlib pandas IPython twitter tweepy snscrape
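To confirm that the image was built as intended, a small check like the following could be run inside the container (for example with docker-compose run --rm python3 python /root/src/check_env.py). This is a sketch, not part of the original repository: the file name check_env.py and the module list are my own assumptions, taken from the pip install line above.

```python
import importlib.util
import sys

# Modules the tweet collector relies on (assumed list, based on the pip install line)
REQUIRED_MODULES = ["tweepy", "snscrape", "pandas", "requests_oauthlib"]


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def check_python(expected=(3, 8)):
    """True if the running interpreter's major.minor version equals `expected`."""
    return sys.version_info[:2] == tuple(expected)


if __name__ == "__main__":
    print("Python %d.%d, matches 3.8: %s" % (sys.version_info[0], sys.version_info[1], check_python()))
    print("Missing modules:", missing_modules(REQUIRED_MODULES) or "none")
```

An empty "Missing modules" line means every library the script imports was installed in the image.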

# docker-compose.yml
version: '3'
services:
  python3:
    restart: always
    build: .
    container_name: 'python3'
    working_dir: '/root/'
    tty: true
    volumes:
      - ./src:/root/src

got_v2.py

The source code that accesses the Twitter API and retrieves tweets. I borrowed the basic code from this repository: https://github.com/itsayushisaxena/Get_Old_Tweets-Python

Fill in the following keys, which you receive when you are issued a Twitter Standard API account.

Constant name	Key to enter
TWITTER_CLIENT_KEY	API key
TWITTER_CLIENT_SECRET	API secret key
TWITTER_CLIENT_ID_ACCESS_TOKEN	Access token
TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET	Access token secret
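got_v2.py hardcodes these four keys in the source file. One alternative, sketched here as an assumption rather than what the repository does, is to read them from environment variables with the same names as the constants, so the keys never end up in version control. load_credentials is a hypothetical helper.

```python
import os

# Same names as the constants in the table above (reading them from the
# environment is a hypothetical alternative to hardcoding them)
CREDENTIAL_NAMES = (
    "TWITTER_CLIENT_KEY",
    "TWITTER_CLIENT_SECRET",
    "TWITTER_CLIENT_ID_ACCESS_TOKEN",
    "TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=None):
    """Return the four credentials as a dict; fail loudly if any is missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise KeyError("missing Twitter credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_NAMES}
```

In got_v2.py you would then pass creds["TWITTER_CLIENT_KEY"] and the other values to tweepy.OAuthHandler and set_access_token, and add four more -e options to the docker-compose run call in the shell script.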

In short, the flow is: the shell script (described later) passes the search conditions into the Docker container as environment variables; Python reads the hashtag, dates, and tweet count from those variables, fetches the matching tweets, and writes them to a CSV file. In the code below, the tweets are actually fetched by snscrape; the tweepy api object is authenticated but not used afterwards.
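The environment-variable step can be made a little safer by validating the values before scraping starts. The sketch below is my own assumption, not code from the repository: build_query, csv_filename, and read_settings are hypothetical helpers that rebuild the search string and the file name used by got_v2.py and the shell script. Note the space between the tag and since:, which keeps the two search operators from running together.

```python
from datetime import date


def build_query(tag, since_date, until_date):
    """Build '<tag> since:YYYY-MM-DD until:YYYY-MM-DD' after validating the dates."""
    since = date.fromisoformat(since_date)  # raises ValueError for malformed dates
    until = date.fromisoformat(until_date)
    if since > until:
        raise ValueError("start date must not be after end date")
    return "%s since:%s until:%s" % (tag, since.isoformat(), until.isoformat())


def csv_filename(tag, since_date, until_date, count):
    """Same naming scheme that got_v2.py and the shell script use."""
    return "%s_from_%s_to_%s_%s_tweets.csv" % (tag, since_date, until_date, count)


def read_settings(env):
    """Read TAG/FROM/UNTIL/NUM from a mapping such as os.environ; return (query, count)."""
    count = int(env["NUM"])
    if count <= 0:
        raise ValueError("NUM must be a positive integer")
    return build_query(env["TAG"], env["FROM"], env["UNTIL"]), count
```

Inside the container this would be called as query, max_tweets = read_settings(os.environ), so a typo in a date fails immediately instead of silently returning no tweets.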

import csv
import os
import sys

import snscrape.modules.twitter as sntwitter  # pip install snscrape
import tweepy

sys.dont_write_bytecode = True

# Search conditions passed in from the shell script as environment variables
tag = os.environ["TAG"]
since_date = os.environ["FROM"]
until_date = os.environ["UNTIL"]
tweet_count = os.environ["NUM"]

# Provide your own credentials here.
TWITTER_CLIENT_KEY = '####################'
TWITTER_CLIENT_SECRET = '########################'
TWITTER_CLIENT_ID_ACCESS_TOKEN = '####################################'
TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET = '################################'

auth = tweepy.OAuthHandler(TWITTER_CLIENT_KEY, TWITTER_CLIENT_SECRET)
auth.set_access_token(TWITTER_CLIENT_ID_ACCESS_TOKEN, TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET)
api = tweepy.API(auth, wait_on_rate_limit=True)

max_tweets = int(tweet_count)  # the number of tweets you require
# Twitter search syntax; the space after the tag separates it from since:
query = '%s since:%s until:%s' % (tag, since_date, until_date)
print(query)

csv_path = '/root/src/csv_files/%s_from_%s_to_%s_%s_tweets.csv' % (tag, since_date, until_date, tweet_count)
# "with" closes the file even on errors; newline='' avoids blank rows from the csv module
with open(csv_path, 'a', newline='', encoding='utf-8') as csv_file:
    csv_writer = csv.writer(csv_file)
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
        if i >= max_tweets:  # stop after exactly max_tweets rows
            break
        # If you need more information, just add more tweet attributes
        csv_writer.writerow([tweet.date, tweet.username, tweet.content])
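An alternative to the enumerate/break pattern in the loop above is itertools.islice, which states the row limit directly and cannot be off by one. The sketch below uses a hypothetical FakeTweet namedtuple as a stand-in for snscrape's tweet objects (only the three attributes the script writes), so it runs without network access; write_tweets is likewise a name I made up.

```python
import csv
import io
from collections import namedtuple
from itertools import islice

# Hypothetical stand-in for snscrape tweet objects (only the attributes written to CSV)
FakeTweet = namedtuple("FakeTweet", ["date", "username", "content"])


def write_tweets(items, writer, max_tweets):
    """Write at most max_tweets rows; islice stops pulling from `items` after that."""
    written = 0
    for tweet in islice(items, max_tweets):
        writer.writerow([tweet.date, tweet.username, tweet.content])
        written += 1
    return written


if __name__ == "__main__":
    fake = (FakeTweet("2020-08-10", "user%d" % i, "hello #test") for i in range(10))
    buffer = io.StringIO()
    print(write_tweets(fake, csv.writer(buffer), 3))  # prints 3
    print(buffer.getvalue())
```

In the real script you would pass the scraper's get_items() iterator, the csv.writer, and the tweet count; because the scraper is a lazy generator, islice also stops requesting further results once the limit is reached.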

Shell script

Once the command named twitter is on your PATH, running it executes the whole process of fetching tweets that match your search conditions. The parameters typed in at the prompts are passed into the Docker container as environment variables.

command/twitter

#!/bin/bash
# bash (not sh) is required: the script uses arrays, $RANDOM and read -p.
# Placeholder: the directory that contains docker-compose.yml and src/
PROJECT_DIR=~/path/to

echo "Enter the following data and press enter"
read -r -p "hashtag(eg. #test): " TAG
read -r -p "Data acquisition start date(eg. 2020-08-10): " FROM
read -r -p "Data acquisition end date(eg. 2020-08-20): " UNTIL
read -r -p "Number of tweets acquired(eg. 100): " NUM
echo "Entered data"
echo "$TAG $FROM $UNTIL $NUM"

ANIMALS=("cheese" \
         "cock" \
         "dragon-and-cow" \
         "ghostbusters" \
         "pony" \
         "stegosaurus" \
         "turtle" \
         "turkey" \
         "gnu" \
        )
# Pick one cow file name at random
ANIMAL=${ANIMALS[$((RANDOM % ${#ANIMALS[@]}))]}

docker-compose -f "$PROJECT_DIR/docker-compose.yml" \
    run \
    --rm \
    -e TAG="$TAG" \
    -e FROM="$FROM" \
    -e UNTIL="$UNTIL" \
    -e NUM="$NUM" \
    -e ANIMAL="$ANIMAL" \
    python3 \
    /bin/bash -c "python /root/src/got_v2.py && cowsay -f $ANIMAL 'I collected tweets'"

# src/ is mounted into the container, so the CSV is also visible on the host
FILENAME="${TAG}_from_${FROM}_to_${UNTIL}_${NUM}_tweets.csv"
echo "$FILENAME"
mkdir -p ~/Desktop/twitter_csv_files
cp "$PROJECT_DIR/src/csv_files/$FILENAME" ~/Desktop/twitter_csv_files
open ~/Desktop/twitter_csv_files/"$FILENAME"  # macOS: opens the CSV in the default app
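If the shell script ever needs to be more robust (for example, for hashtags containing spaces), one option is to build the same docker-compose run call as an argument list in Python and hand it to subprocess.run, which avoids shell word splitting entirely. This is a hypothetical port, not part of the repository; build_run_command, run_collection, and COMPOSE_FILE are names I made up, and ~/path/to is the same placeholder as in the script.

```python
import os
import shlex
import subprocess

# Hypothetical placeholder, same as ~/path/to in the shell script
COMPOSE_FILE = "~/path/to/docker-compose.yml"


def build_run_command(compose_file, settings, animal):
    """Return the docker-compose argv as a list so no value is re-split by a shell."""
    cmd = ["docker-compose", "-f", os.path.expanduser(compose_file), "run", "--rm"]
    for key in ("TAG", "FROM", "UNTIL", "NUM"):
        cmd += ["-e", "%s=%s" % (key, settings[key])]
    # Only the final step goes through bash inside the container; quote the animal for it
    script = "python /root/src/got_v2.py && cowsay -f %s 'I collected tweets'" % shlex.quote(animal)
    cmd += ["-e", "ANIMAL=%s" % animal, "python3", "/bin/bash", "-c", script]
    return cmd


def run_collection(settings, animal, compose_file=COMPOSE_FILE):
    """Start the container and wait; raises CalledProcessError if docker-compose fails."""
    subprocess.run(build_run_command(compose_file, settings, animal), check=True)
```

Because each -e value is a single list element, a tag such as "#my tag" reaches the container intact without any extra quoting.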

Makefile

make path creates the directory and installs the twitter command in one shot. Run make in the root directory of this repository. It creates a directory called command directly under your home directory and places the twitter script there. make rm-path removes it again.

.PHONY: docker-path path rm-path

docker-path:
	@echo $(PWD)
path:
	@mkdir -p ~/command
	@cp ./command/twitter ~/command/twitter
	@chmod 755 ~/command/twitter
	@ln -si ~/command/twitter /usr/local/bin
rm-path:
	@rm -rf ~/command
	@rm -f /usr/local/bin/twitter
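The path target is essentially "mkdir -p, copy, make executable, symlink into a bin directory". The same steps can be sketched with pathlib so that running it twice does not fail; this is an illustration only, install_command is a hypothetical helper, and the demo uses a temporary directory instead of ~/command and /usr/local/bin.

```python
import os
import stat
import tempfile
from pathlib import Path

MODE_755 = stat.S_IRWXU | stat.S_IRGRP | stat.S_IXGRP | stat.S_IROTH | stat.S_IXOTH


def install_command(script, command_dir, bin_dir):
    """Copy `script` into command_dir and symlink it from bin_dir; safe to run repeatedly."""
    command_dir = Path(command_dir)
    command_dir.mkdir(parents=True, exist_ok=True)   # like mkdir -p
    target = command_dir / Path(script).name
    target.write_bytes(Path(script).read_bytes())    # like cp
    target.chmod(MODE_755)                           # rwxr-xr-x rather than 777
    link = Path(bin_dir) / target.name
    if link.is_symlink() or link.exists():
        link.unlink()                                # replace an old link without prompting
    link.symlink_to(target.resolve())
    return link


if __name__ == "__main__":
    # Demo in a throwaway directory
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp, "twitter")
        script.write_text("#!/bin/bash\necho hello\n")
        bin_dir = Path(tmp, "bin")
        bin_dir.mkdir()
        link = install_command(script, Path(tmp, "command"), bin_dir)
        install_command(script, Path(tmp, "command"), bin_dir)  # second run must not fail
        print(link, "->", os.readlink(link))
```

Replacing the link unconditionally trades ln -i's confirmation prompt for idempotence; keep -i if you prefer to be asked before overwriting.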

Running it

Below is a video of the program in action.

output1

Bonus

A mysterious animal appears in the video above: it is a program called cowsay, which is installed in the Docker image built this time. So that the person doing this monotonous work does not get bored, a randomly chosen cute animal announces that the CSV file has been created. The shell script picks one of the animal names listed in it at random, passes it as an environment variable when starting the container, and runs the cowsay command at the end.
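The random pick in the shell script (an index computed from $RANDOM modulo the array length) can be expressed in Python with random.choice, which also avoids the slight modulo bias. This is only an illustrative counterpart; pick_animal and cowsay_argv are hypothetical names, and the animal list is copied from the script.

```python
import random

# Cow file names copied from the shell script
ANIMALS = ("cheese", "cock", "dragon-and-cow", "ghostbusters", "pony",
           "stegosaurus", "turtle", "turkey", "gnu")


def pick_animal(animals=ANIMALS, rng=random):
    """Return one name chosen uniformly at random; pass random.Random(seed) for repeatability."""
    return rng.choice(animals)


def cowsay_argv(animal, message="I collected tweets"):
    """argv for the final cowsay call; -f selects the cow file by name."""
    return ["cowsay", "-f", animal, message]
```

Passing a seeded random.Random makes the choice reproducible, which is handy when testing the rest of the flow.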

Example

There are many other animals besides the ones listed here (cowsay -l lists the available cow files), so if you are interested, check them out and add them to the list.

References

- Shell script: https://qiita.com/Lambda34/items/7d24ebe6f7bde5bedddc
- GetOldTweet (TPL): https://github.com/hikkymouse1007/Get_Old_Tweets-Python
- Docker+Python3: https://qiita.com/reflet/items/4b3f91661a54ec70a7dc
- Cowsay: https://qiita.com/Hiroki_lzh/items/8cf206d54f91e29b3912#unix%E3%81%A7%E6%9C%89%E5%90%8D%E3%81%AA%E3%82%A2%E3%82%B9%E3%82%AD%E3%83%BC%E3%82%A2%E3%83%BC%E3%83%88%E3%82%92%E5%87%BA%E5%8A%9B%E3%81%99%E3%82%8Bcowsay

I tried to automate internal operations with Docker, Python and Twitter API + bonus