I wanted to delete my past Peing answer tweets on Twitter, but there were nearly 1,000 of them, so deleting them by hand was not realistic. Instead, I wrote a script that deletes the target tweets automatically.
Tweepy is a Python library for accessing the Twitter API. You can use it to build Twitter bots or to automate likes and follows. This time, I will introduce a script that automatically deletes specific tweets.
Prerequisites:

- Registration for Twitter API access (please refer to here)
- Downloading your Twitter archive data ([please refer to here](https://help.twitter.com/en/managing-your-account/how-to-download-your-twitter-archive))
- Installation of Tweepy and pandas
When you download your Twitter archive, it includes a file called tweet.js. This file contains the data for all of your past tweets, as shown below. Each block starting with **"tweet": {** holds the data for one tweet, and the file is made up of a very large number of such blocks. The plan is to delete only the tweets whose **"source":** line contains "https://peing.net". Deleting a tweet also requires its ID, so we extract the number on the **"id_str":** line as well.
```json
{
  "tweet" : {
    "retweeted" : false,
    "source" : "<a href=\"https://peing.net\" rel=\"nofollow\">Peing</a>",
    "entities" : {
      "hashtags" : [ {
        "text" : "Peing",
        "indices" : [ "18", "24" ]
      }, {
        "text" : "Question box",
        "indices" : [ "25", "29" ]
      } ],
      "symbols" : [ ],
      "user_mentions" : [ ],
      "urls" : [ {
        "url" : "https://t.co/snIXxSjooH",
        "expanded_url" : "https://peing.net/ja/qs/636766292",
        "display_url" : "peing.net/ja/qs/636766292",
        "indices" : [ "30", "53" ]
      } ]
    },
    "display_text_range" : [ "0", "53" ],
    "favorite_count" : "0",
    "id_str" : "1203602228591788032",
    "truncated" : false,
    "retweet_count" : "0",
    "id" : "1203602228591788032",
    "possibly_sensitive" : false,
    "created_at" : "Sun Dec 08 09:08:27 +0000 2019",
    "favorited" : false,
    "full_text" : "It is a plump and glossy rice.\n\n#Peing #Question box https://t.co/snIXxSjooH",
    "lang" : "ja"
  }
}
```
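Since tweet.js is essentially a JSON array with a JavaScript assignment in front of it, the **"source"** and **"id_str"** fields could also be read by parsing the file rather than matching individual lines. The sketch below is an alternative, not the method used in this post. It assumes the file starts with a prefix such as `window.YTD.tweet.part0 = ` followed by the array, and that each element wraps one tweet in a `"tweet"` key as in the sample above. The function names `load_tweet_archive` and `peing_tweet_ids` are hypothetical.

```python
import json


def load_tweet_archive(path):
    """Parse tweet.js into a list of tweet dicts.

    Assumption: the file is a JavaScript assignment such as
    `window.YTD.tweet.part0 = [ ... ]`, so everything from the first
    "[" onward is a JSON array.
    """
    with open(path, encoding="utf-8") as f:
        text = f.read()
    # Drop the JavaScript assignment prefix and keep only the JSON array
    records = json.loads(text[text.index("["):])
    # Assumption: each element wraps one tweet as {"tweet": {...}}
    return [record["tweet"] for record in records]


def peing_tweet_ids(tweets):
    """Return the IDs (as ints) of tweets whose "source" mentions peing.net."""
    return [int(t["id_str"]) for t in tweets if "https://peing.net" in t["source"]]
```

For example, `peing_tweet_ids(load_tweet_archive('tweet.js'))` would return the list of IDs to delete. Because this reads the structure of the data, the result does not depend on how the file happens to be indented.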
Regular expressions are used to extract the strings for the tweets to delete from tweet.js, so import the **re** module. Also import **pandas** to build a data frame from the extracted data. **datetime** is not required; I import it only to measure how long the script runs. **tweepy** is, of course, required.
```python
import re
import pandas as pd
from datetime import datetime
import tweepy
```
Define a function that extracts the required data (**"source":** and **"id_str":**) from tweet.js and returns it as a data frame.
```python
def read_tweet_file(file):
    """
    reads a tweet.js into a pd.DataFrame
    """
    # Read the tweet.js file (it contains non-ASCII text, so open it as UTF-8)
    with open(file, encoding='utf-8') as dataFile:
        datalines = dataFile.readlines()
    # Create an empty data frame to store the extracted data
    colname = ['source', 'id']
    df = pd.DataFrame([], columns=colname)
    # Patterns for the lines to extract. Assumes the top-level keys of each
    # tweet are indented by four spaces, so deeper "id_str" keys
    # (e.g. inside user_mentions) are not matched.
    regexes = [r' {4}"source".*', r' {4}"id_str".*']
    for i, regex in enumerate(regexes):
        L = []
        for line in datalines:
            # Extract the part that matches the condition
            match_obj = re.match(regex, line)
            if match_obj:
                L.append(match_obj.group())
        # Store the matches in the data frame
        df[colname[i]] = pd.Series(L)
    return df
```
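The leading spaces in the patterns matter because `re.match` only matches at the start of each line, and `"id_str"` also appears inside nested objects such as `user_mentions`. The snippet below illustrates this on a few made-up lines, assuming top-level keys are indented by four spaces and nested ones by eight; the sample lines are hypothetical, not copied from a real archive.

```python
import re

# Hypothetical lines laid out like tweet.js: top-level keys of a tweet are
# assumed to be indented by four spaces, while the "id_str" of a mentioned
# user sits deeper, inside "entities" -> "user_mentions".
lines = [
    '    "source" : "<a href=\\"https://peing.net\\" rel=\\"nofollow\\">Peing</a>",\n',
    '        "id_str" : "111",\n',
    '    "id_str" : "1203602228591788032",\n',
]

# Exactly four leading spaces: skips the nested, more deeply indented key
anchored = [line for line in lines if re.match(r' {4}"id_str".*', line)]
# Matching anywhere in the line: also picks up the mentioned user's ID
unanchored = [line for line in lines if re.search(r'"id_str"', line)]

print(len(anchored))    # 1
print(len(unanchored))  # 2
```

If nested IDs were picked up, the `source` and `id` columns would no longer line up row by row, and the wrong IDs would be paired with Peing tweets.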
Define a function that returns, from the data frame, the IDs of the tweets to delete.
```python
def extract_id(df):
    target_id = []
    for i in range(len(df)):
        # Keep only tweets posted via Peing
        match_obj = re.search(r'https://peing.net', df['source'][i])
        if match_obj:
            # Collect the ID of the tweet to delete as an int
            target_id.append(int(re.search(r'[0-9]+', df['id'][i]).group()))
    return target_id
```
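The loop in `extract_id` can also be written with pandas string methods. This is a sketch of a vectorized alternative, not the code used in this post; the name `extract_id_vectorized` is hypothetical, and it assumes the data frame has the same `source` and `id` columns that `read_tweet_file` produces. Rows with a missing `source` are treated as non-Peing.

```python
import pandas as pd


def extract_id_vectorized(df):
    """Return the IDs of Peing tweets as a list of ints (hypothetical helper)."""
    # Boolean mask: True where the source line mentions peing.net
    is_peing = df['source'].str.contains('https://peing.net', regex=False, na=False)
    # Pull the first run of digits out of each matching '"id_str" : "..."' line
    ids = df.loc[is_peing, 'id'].str.extract(r'(\d+)', expand=False)
    return ids.astype('int64').tolist()
```

Using `regex=False` makes the dot in "peing.net" match only a literal dot, which the loop version's pattern does not guarantee.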
Define a function that deletes the tweets with the given IDs.
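```python
def delete_tweets(target_id):
    delete_count = 0
    for status_id in target_id:
        try:
            # Delete the tweet (uses the global `api` created below)
            api.destroy_status(status_id)
            print(status_id, 'deleted!')
            delete_count += 1
        except Exception as e:
            # Report the failure (e.g. already deleted) and keep going
            print(status_id, 'deletion failed:', e)
    print(delete_count, 'tweets deleted.')
```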
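Because deletion cannot be undone, it can help to separate the deletion loop from the Twitter client and run it once without deleting anything. The helper below is a hypothetical sketch: `delete_fn` stands for any callable that deletes one status, such as `api.destroy_status`, and `dry_run=True` only lists the IDs.

```python
def delete_with_report(status_ids, delete_fn, dry_run=True):
    """Hypothetical helper: delete each ID via delete_fn and report results.

    Returns (deleted_ids, [(failed_id, error_message), ...]).
    With dry_run=True nothing is deleted and both lists are empty.
    """
    if dry_run:
        for status_id in status_ids:
            print(status_id, 'would be deleted (dry run)')
        return [], []
    deleted, failed = [], []
    for status_id in status_ids:
        try:
            delete_fn(status_id)
            print(status_id, 'deleted!')
            deleted.append(status_id)
        except Exception as exc:
            # Record why this one failed and continue with the rest
            print(status_id, 'deletion failed:', exc)
            failed.append((status_id, str(exc)))
    print(len(deleted), 'tweets deleted,', len(failed), 'failed.')
    return deleted, failed
```

After checking the dry-run output, a call such as `delete_with_report(target_id, api.destroy_status, dry_run=False)` would perform the deletion and return the deleted and failed IDs. Passing the delete function in as a parameter also makes the loop easy to test with a fake.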
Run the functions defined above.
```python
# Authenticate to access the Twitter API
auth = tweepy.OAuthHandler('*API key*', '*API secret key*')
auth.set_access_token('*Access token*', '*Access token secret*')
api = tweepy.API(auth)
# Fetch the authenticated user (api.me() is Tweepy 3.x; Tweepy 4 replaced it with verify_credentials())
user = api.me()

# Run
print(datetime.now())
df = read_tweet_file('tweet.js')
target_id = extract_id(df)
delete_tweets(target_id)
print(datetime.now())
```
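The credentials above are hard-coded placeholders. One common alternative is to read them from environment variables so that they never end up in the script itself. The sketch below assumes four hypothetical variable names and raises an error if any of them is missing.

```python
import os


def load_credentials():
    """Read the four Twitter API credentials from environment variables.

    The variable names are hypothetical; use whatever names you prefer.
    """
    names = ['TWITTER_API_KEY', 'TWITTER_API_SECRET',
             'TWITTER_ACCESS_TOKEN', 'TWITTER_ACCESS_TOKEN_SECRET']
    # Treat unset and empty variables alike as missing
    missing = [name for name in names if not os.environ.get(name)]
    if missing:
        raise RuntimeError('missing environment variables: ' + ', '.join(missing))
    return tuple(os.environ[name] for name in names)
```

The returned values could then be unpacked as `api_key, api_secret, access_token, access_token_secret = load_credentials()` and passed to `tweepy.OAuthHandler(api_key, api_secret)` and `auth.set_access_token(access_token, access_token_secret)`.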
I was able to automatically delete all 976 target tweets. (The run took about 10 minutes.)
```
2020-02-07 17:24:57.816773
1204021701639426048 deleted!
1204020924015472640 deleted!
1204020044683833344 deleted!
1203904952684302337 deleted!
... (Omitted) ...
1204025368052523014 deleted!
1204023316488560640 deleted!
1204023315221733376 deleted!
1204022282311499776 deleted!
976 tweets deleted.
2020-02-07 17:35:16.302221
```
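The two timestamps printed at the start and end of the run can be subtracted directly to get the elapsed time. Using the values from the log above, the run took a little over 10 minutes, or roughly 0.63 seconds per deleted tweet:

```python
from datetime import datetime

# Start and end timestamps copied from the log above
start = datetime.fromisoformat('2020-02-07 17:24:57.816773')
end = datetime.fromisoformat('2020-02-07 17:35:16.302221')

# Subtracting two datetimes gives a timedelta
elapsed = end - start
print(elapsed)                                  # 0:10:18.485448
# Average time per deleted tweet (976 tweets, from the log)
print(round(elapsed.total_seconds() / 976, 2))  # 0.63
```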
Feel free to experiment with the code introduced here, and enjoy a fulfilling coding life. Thank you for reading. Well then!