[Twitter] I want to make the downloaded past tweets (of my account) into a beautiful CSV

On Twitter, you can download an archive of all your past tweets from Settings > Account > Twitter Data > Download Archive.

The downloaded archive contains your past tweets, retweets, likes, direct messages, and more. (Apparently you can browse it by opening the index.html that normally comes with the download, but in my case index.html was not included. Why?)


Postscript (2019/12/15)
It seems the specification has changed so that index.html is no longer included in the download in the first place.

I noticed this after reading the article "Hands-on to visualize your tweets while understanding BERT".
(Reference) "[Solved] I can't download all tweet history on twitter [Method]"


If you want to do text mining or some other kind of analysis, the file you will probably want to read is tweet.json. In this article, we process this json file into a csv that is easy to use for morphological analysis. The resulting csv has two columns, "Timestamp" and "Text Body".

(Image: the csv that is finally produced)

Environment
Python 3.6.5
macOS Mojave 10.14.4
pandas==0.23.0

When you open the downloaded json, it starts like this. (Image: the beginning of the file, with the leading assignment underlined in red)

The part underlined in red,

window.YTD.tweet.part0 = 

is unnecessary, so please delete it. Then change the extension to .txt and put the file in your working directory. The script below reads it as tweet.txt.
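If you would rather not edit the file by hand, the prefix can also be dropped in code. The following is only a minimal sketch: the helper name load_archive_json is hypothetical, and it assumes that the data is a JSON array, so that everything before the first "[" is the assignment prefix.

```python
import json


def load_archive_json(text):
    """Parse archive text, ignoring anything before the JSON array (hypothetical helper)."""
    # Assumption: the payload is a JSON array, so the first "[" marks its start
    # and everything before it is the "window.YTD.tweet.part0 = " prefix.
    start = text.index("[")
    return json.loads(text[start:])


# A tiny made-up sample in the same shape as the downloaded file.
sample = (
    'window.YTD.tweet.part0 = '
    '[ {"created_at": "Sat Dec 07 08:53:12 +0000 2019", "full_text": "hello"} ]'
)
tweets = load_archive_json(sample)
print(tweets[0]["full_text"])
```

With a real file, you would read its contents with open(...).read() and pass the resulting string to the helper.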

read_dl_tweet.py


import pandas as pd
import json

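with open("tweet.txt", "r", encoding="utf-8") as tweets_file:
    tweet = json.load(tweets_file)

# Build the DataFrame that the following steps refer to as tweet_data_frame
tweet_data_frame = pd.DataFrame(tweet)

The script above loads the json into a pandas DataFrame. There are many columns, so we extract only the ones we need.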

read_dl_tweet.py


df = tweet_data_frame.loc[:,["created_at","full_text"]]

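Characters such as line breaks and commas cause trouble when writing a csv, so remove them. It didn't work without regex=True: by default, replace only matches a cell whose entire value equals the target, whereas with regex=True it replaces matching substrings inside each cell.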

read_dl_tweet.py


df = df.replace(['\n', ',', '\t', '\r'], '', regex=True)
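To see why regex=True matters, here is a small sketch on made-up data (the sample text is not from a real archive). It compares the default whole-cell matching with substring replacement.

```python
import pandas as pd

# Made-up sample: one tweet text containing a line break and a comma.
df = pd.DataFrame({"full_text": ["line one\nline two, with comma"]})

# Default behaviour: a cell is replaced only if its whole value equals a target,
# so this text is left untouched.
whole_cell = df.replace(["\n", ","], "")

# regex=True: each target is treated as a pattern and replaced inside the text.
substring = df.replace(["\n", ","], "", regex=True)

print(repr(whole_cell.loc[0, "full_text"]))
print(repr(substring.loc[0, "full_text"]))
```

Only the second call actually strips the line break and the comma from the text.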

Also, the timestamp is in a format that cannot be used for sorting, so convert it into something easier to handle. The to_datetime method of pandas converted it in one shot.

read_dl_tweet.py


df_date = pd.to_datetime(df["created_at"])
df["date_form"] = df_date
df_sorted = df.sort_values("date_form") 
df_text_date = df_sorted.loc[:,["date_form","full_text"]]

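The rows are now sorted by the newly created timestamp column, and only that column and the text body are kept. Finally, write the result to csv.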
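The sketch below illustrates, on made-up timestamps, why the raw strings are not sortable: they begin with the weekday name, so sorting them as text orders by weekday rather than by time. The explicit format string is an assumption about the layout of created_at. The article's code lets pandas infer the format instead, and the %z directive may not be supported by older pandas versions.

```python
import pandas as pd

# Made-up timestamps in the same layout as created_at.
df = pd.DataFrame({
    "created_at": [
        "Sat Dec 07 08:53:12 +0000 2019",
        "Mon Jan 01 00:00:00 +0000 2018",
        "Wed Jun 05 12:30:00 +0000 2019",
    ]
})

# Assumed layout: weekday, month, day, time, UTC offset, year.
TWITTER_FORMAT = "%a %b %d %H:%M:%S %z %Y"
df["date_form"] = pd.to_datetime(df["created_at"], format=TWITTER_FORMAT)

# Sorting the raw strings is alphabetical; sorting date_form is chronological.
by_string = df.sort_values("created_at")["created_at"].tolist()
by_date = df.sort_values("date_form")["created_at"].tolist()

print(by_string)
print(by_date)
```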

read_dl_tweet.py


df_text_date.to_csv("df_text_date.csv", header=False, index=False,sep=',',encoding='utf-16')

Change the to_csv options as needed (for example, use a tab as the delimiter).

In the next article, I will graph the number of tweets per period from the csv created here.
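As one example of changing the options, here is a sketch of tab-delimited output. The DataFrame is made-up sample data with the same two columns. Passing None as the path makes to_csv return the text instead of writing a file, which is convenient for checking the result.

```python
import pandas as pd

# Made-up sample with the same two columns as df_text_date.
df_text_date = pd.DataFrame({
    "date_form": pd.to_datetime(["2019-12-07 08:53:12", "2019-12-08 09:00:00"]),
    "full_text": ["first tweet", "second tweet"],
})

# sep="\t" switches the delimiter to a tab; with None as the path,
# to_csv returns the output as a string instead of writing a file.
tsv_text = df_text_date.to_csv(None, header=False, index=False, sep="\t")
print(tsv_text)

# To write a file instead, pass a file name (the name here is only an example):
# df_text_date.to_csv("df_text_date.tsv", header=False, index=False, sep="\t")
```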

This code: https://github.com/KanikaniYou/plot_tweet_graph
