In an in-house project, the person in charge of a Twitter-linked campaign had to collect tweets by hand. I wanted to do something about that, so I built a tweet collection tool with Docker and Python.
--I had a free Twitter Developer account, but it places many restrictions on the tweets you can retrieve, and a high-priced premium contract just for automation was out of the question.
--So I used the following third-party library, "GetOldTweets-python", to write a program that could fetch any number of tweets from any period without the API.
https://github.com/Jefferson-Henrique/GetOldTweets-python
=> One day the program started throwing errors. When I looked into it, the problem had already been reported in the repository's Issues, and it seemed unlikely that the bug would be fixed any time soon.
--While I was stuck, I found the following repository, which an Indian engineer had posted in a comment on the Issue.
https://github.com/itsayushisaxena/Get_Old_Tweets-Python
Judging from the source code, it requires a Twitter Standard API account, but by combining tweepy and snscrape it appears to retrieve tweets for the date range and count you want, just as before.
--Building an environment where Python 3 runs with Docker
--Scraping with the Twitter API in Python
--A shell script that automates the workflow so the person in charge can run it from the terminal without having to know any docker commands
Click here for source code https://github.com/hikkymouse1007/GetTweets_pub
This time I built a mechanism that runs on the PC of the person in charge of the project and hides the difficult operations.
Docker and a shell script carry out the whole flow: starting the container, fetching the tweets, and creating the CSV.
The directory structure is as follows.
.
├── Dockerfile
├── Makefile
├── README.md
├── command
│   └── twitter //Shell script
├── docker-compose.yml
└── src
    ├── csv_files //Output CSV here
    └── got_v2.py //Python source code
Dockerfile, docker-compose
For the recipe of a container that runs Python 3, I referred to the following article. https://qiita.com/reflet/items/4b3f91661a54ec70a7dc
Since tweepy did not support Python 3.9 at the time, I pinned the image to Python 3.8.
The Dockerfile installs the Python runtime and the required libraries.
# Dockerfile
FROM python:3.8
USER root
RUN apt-get update
RUN apt-get -y install locales && \
localedef -f UTF-8 -i ja_JP ja_JP.UTF-8
RUN apt-get -y install sudo
RUN sudo apt-get update && apt-get install -y cowsay fortunes
ENV PATH $PATH:/usr/games
RUN echo $PATH
ENV LANG ja_JP.UTF-8
ENV LANGUAGE ja_JP:ja
ENV LC_ALL ja_JP.UTF-8
ENV TZ JST-9
ENV TERM xterm
RUN apt-get install -y vim less
RUN pip install --upgrade pip
RUN pip install --upgrade setuptools
RUN pip install requests requests_oauthlib
RUN pip install pandas
RUN pip install IPython
RUN pip install twitter
RUN pip install tweepy
RUN pip install snscrape
# docker-compose.yml
version: '3'
services:
  python3:
    restart: always
    build: .
    container_name: 'python3'
    working_dir: '/root/'
    tty: true
    volumes:
      - ./src:/root/src
got_v2.py
Source code for accessing the Twitter API and retrieving tweets. I borrowed the basic source code from this repository. https://github.com/itsayushisaxena/Get_Old_Tweets-Python
Enter the following values, which you receive when you register for a Twitter Standard API account.
Constant name | Type of key to enter |
---|---|
TWITTER_CLIENT_KEY | API key |
TWITTER_CLIENT_SECRET | API secret key |
TWITTER_CLIENT_ID_ACCESS_TOKEN | Access token |
TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET | Access token secret |
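Writing the keys directly into the source is fine for a private tool, but the same four values could also be read from environment variables, the same way the search conditions are. The following is only a minimal sketch under that assumption: the `load_credentials` helper is hypothetical, and reusing the constant names from the table as environment variable names is a choice made for this example, not something the original script does.

```python
import os

# The four values from the table above. Using these as environment
# variable names is an assumption made for this sketch.
CREDENTIAL_NAMES = (
    "TWITTER_CLIENT_KEY",
    "TWITTER_CLIENT_SECRET",
    "TWITTER_CLIENT_ID_ACCESS_TOKEN",
    "TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET",
)


def load_credentials(env=os.environ):
    """Return the four credentials as a dict, failing early if any is missing."""
    missing = [name for name in CREDENTIAL_NAMES if not env.get(name)]
    if missing:
        raise RuntimeError("missing credentials: " + ", ".join(missing))
    return {name: env[name] for name in CREDENTIAL_NAMES}
```

With something like this in place, the returned values would be handed to `tweepy.OAuthHandler` and `set_access_token` instead of the literals, and the keys would stay out of the repository.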
The flow is simple: the shell script described later passes environment variables into the Docker container, Python reads the hashtag and the other conditions from those variables, and the script uses tweepy and snscrape to fetch the tweets.
The retrieved tweets are then written to a CSV file.
import tweepy
import csv
import os
import snscrape.modules.twitter as sntwitter
import sys
sys.dont_write_bytecode = True
#ENV_VALUES
tag = os.environ["TAG"]
since_date = os.environ["FROM"]
until_date = os.environ["UNTIL"]
tweet_count = os.environ["NUM"]
#Provide your own credentials here.
TWITTER_CLIENT_KEY = '####################'
TWITTER_CLIENT_SECRET = '########################'
TWITTER_CLIENT_ID_ACCESS_TOKEN = '####################################'
TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET = '################################'
auth = tweepy.OAuthHandler(TWITTER_CLIENT_KEY, TWITTER_CLIENT_SECRET)
auth.set_access_token(TWITTER_CLIENT_ID_ACCESS_TOKEN, TWITTER_CLIENT_ID_ACCESS_TOKEN_SECRET)
api = tweepy.API(auth,wait_on_rate_limit=True)
#pip install snscrape
# Append the results to a CSV file named after the search conditions.
csvFile = open('/root/src/csv_files/%s_from_%s_to_%s_%s_tweets.csv' % (tag, since_date, until_date, tweet_count), 'a', newline='')
csvWriter = csv.writer(csvFile)
maxTweets = int(tweet_count) # the number of tweets you require
# The hashtag and the date operators must be separated by spaces.
query = '%s since:%s until:%s' % (tag, since_date, until_date)
print(query)
for i, tweet in enumerate(sntwitter.TwitterSearchScraper(query).get_items()):
    if i >= maxTweets: # stop once exactly maxTweets rows have been written
        break
    csvWriter.writerow([tweet.date, tweet.username, tweet.content]) #If you need more information, just provide the attributes
csvFile.close()
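The search string and the cut-off are the two places where a loop like this is easy to get wrong: a missing space changes what is searched for, and an off-by-one comparison returns one tweet too many. The sketch below isolates both so they can be checked without network access. `build_query` and `take` are hypothetical helper names introduced for illustration; the `since:` and `until:` operators are the ones the script already uses.

```python
from itertools import islice


def build_query(tag, since_date, until_date):
    """Join the hashtag and the date operators with single spaces."""
    return "%s since:%s until:%s" % (tag, since_date, until_date)


def take(items, limit):
    """Return an iterator over at most `limit` items of any iterable."""
    return islice(items, limit)


if __name__ == "__main__":
    print(build_query("#test", "2020-08-10", "2020-08-20"))
    # A stand-in generator plays the role of get_items() here.
    print(list(take((n for n in range(1000)), 3)))
```

Because `islice` stops pulling from the generator as soon as the limit is reached, the scraper would not be asked for more results than are needed.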
Once the path to the command named twitter has been set up, running it executes the whole process of fetching the tweets that match your search conditions. The parameters typed at the prompts are passed into the Docker container as environment variables.
command/twitter
#!/bin/bash
# bash is required: the script uses arrays, $RANDOM, and read -p
echo "Enter the following data and press enter"
read -p "hashtag(eg. #test): " str1
read -p "Data acquisition start date(eg. 2020-08-10): " str2
read -p "Data acquisition end date(eg. 2020-08-20): " str3
read -p "Number of tweets acquired(eg. 100): " str4
TAG=$str1 FROM=$str2 UNTIL=$str3 NUM=$str4
echo "Entered data"
echo $TAG $FROM $UNTIL $NUM
# Cowfiles to choose from at random
ANIMALS=("cheese" \
  "cock" \
  "dragon-and-cow" \
  "ghostbusters" \
  "pony" \
  "stegosaurus" \
  "turtle" \
  "turkey" \
  "gnu" \
)
ANIMAL=${ANIMALS[$((RANDOM % ${#ANIMALS[@]}))]}
# Run the scraper in a throwaway container, passing the inputs as environment variables
docker-compose -f ~/path/to/docker-compose.yml \
  run \
  --rm \
  -e TAG="$TAG" \
  -e FROM="$FROM" \
  -e UNTIL="$UNTIL" \
  -e NUM="$NUM" \
  -e ANIMAL="$ANIMAL" \
  python3 \
  /bin/bash -c "python /root/src/got_v2.py && cowsay -f $ANIMAL 'I collected tweets'"
# Copy the generated CSV to the Desktop and open it
FILENAME="${TAG}_from_${FROM}_to_${UNTIL}_${NUM}_tweets.csv"
echo "$FILENAME"
mkdir -p ~/Desktop/twitter_csv_files
cp "src/csv_files/$FILENAME" ~/Desktop/twitter_csv_files
open ~/Desktop/twitter_csv_files/"$FILENAME"
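The script passes whatever was typed straight into the container, so a mistyped date or count may only show up later as an error or an unexpected CSV. As one possible safeguard, the Python side could check the four values before searching. This is a minimal sketch, not part of the original tool: `validate_params` is a hypothetical helper, and the YYYY-MM-DD format is taken from the examples shown in the prompts.

```python
from datetime import datetime


def validate_params(tag, since_date, until_date, tweet_count):
    """Return (tag, since, until, count) with parsed types, or raise ValueError."""
    if not tag.strip():
        raise ValueError("hashtag must not be empty")
    # strptime raises ValueError when the text is not in YYYY-MM-DD form.
    since = datetime.strptime(since_date, "%Y-%m-%d").date()
    until = datetime.strptime(until_date, "%Y-%m-%d").date()
    if since > until:
        raise ValueError("start date is after end date")
    # int raises ValueError when the text is not a whole number.
    count = int(tweet_count)
    if count <= 0:
        raise ValueError("number of tweets must be positive")
    return tag.strip(), since, until, count
```

In the scraper this could be called with the four `os.environ` values right after they are read, so that bad input stops the run before any request is made.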
Makefile
Running make path creates a directory and sets up the twitter command in one shot. Run the make command in the root directory of this repository. It creates a directory called command directly under your home directory and places the script file for the twitter command there. You can remove the command again with make rm-path.
docker-path:
	@echo $(PWD)
path:
	@mkdir -p ~/command
	@cp ./command/twitter ~/command/twitter
	@ln -si ~/command/twitter /usr/local/bin
	@chmod 755 ~/command/twitter
rm-path:
	@rm -rf ~/command
	@rm /usr/local/bin/twitter
Below is a video of running the program created this time.
A mysterious animal appears in the video above. It comes from a program called cowsay, which is installed in the Docker image created this time. So that people do not tire of the monotonous work, I had a randomly chosen cute animal announce that the CSV file is complete. The shell script picks one of the animal names it lists at random and passes it as an environment variable when starting Docker, and the cowsay command runs at the end of the script.
There are many other animals besides the ones listed here, so if you are interested, please check them out and add them.
--Shell script https://qiita.com/Lambda34/items/7d24ebe6f7bde5bedddc