Hello, this is my first post.
Since last year (2019), it is no longer possible to download your Twitter archive in CSV format, so I wrote a script that extracts the "time" and "text" of each tweet from tweet.js and tweet-part1.js.
If you only have tweet.js:
#### **`read_js.py`**
```python
# Put tweet.js in the same directory as this script.
# If you have MeCab installed, uncomment the MeCab lines to also get word-segmented output.
import re
import datetime
import json
#import MeCab

tw_open = open("tweet.js", "r", encoding="utf-8")
tw_time = open("tweet_mytext_time.txt", "a", encoding="utf-8")
tw_a = open("tweet_mytext.txt", "a", encoding="utf-8")
#tw_mecab = open("tweet_mytext_mecab.txt", "a", encoding="utf-8")

twr = tw_open.read()
# Strip the JavaScript assignment so that only the JSON array remains.
twr = twr.replace("window.YTD.tweet.part0 = ", "", 1)
twrj = json.loads(twr)
big = []
#mecab = MeCab.Tagger("-Owakati")

for n in range(len(twrj)):
    tw = eval(str(twrj[n]["tweet"]))
    twf = str(tw["full_text"])
    # Remove URLs and line breaks from the tweet text.
    twf = re.sub(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-…]+", "", twf)
    twf = twf.replace("\n", "")
    twc = str(tw["created_at"])
    tim = datetime.datetime.strptime(twc, "%a %b %d %H:%M:%S %z %Y").replace(tzinfo=None)
    tim_r = str(tim).replace(" ", "_")
    # Skip retweets and replies: the part before the first colon contains "RT" or "@".
    twf_b = twf.split(":")[0]
    if "RT" not in twf_b and "@" not in twf_b:
        small = [str(tim.timestamp()).replace(".0", ""), tim_r, twf]
        big.append(small)

# Sort by the formatted time, newest first.
big.sort(key=lambda x: x[1], reverse=True)
for num in range(len(big)):
    tw_a.write(big[num][2] + "\n")
    tw_time.write(big[num][1] + " " + big[num][2] + "\n")
    #text = big[num][2]
    #text_m = mecab.parse(text)
    #tw_mecab.write(str(text_m))

# Close the files so that everything is flushed to disk.
for f in (tw_open, tw_time, tw_a):
    f.close()
```
If you also have tweet-part1.js and MeCab is installed:
#### **`read_js.py`**
```python
import re
import datetime
import json
import MeCab

tw_open = open("tweet.js", "r", encoding="utf-8")
tw1_open = open("tweet-part1.js", "r", encoding="utf-8")
tw_time = open("tweet_mytext_time.txt", "a", encoding="utf-8")
tw_a = open("tweet_mytext.txt", "a", encoding="utf-8")
tw_mecab = open("tweet_mytext_mecab.txt", "a", encoding="utf-8")

twr = tw_open.read()
tw1r = tw1_open.read()
# Strip the JavaScript assignment so that only the JSON array remains.
twr = twr.replace("window.YTD.tweet.part0 = ", "", 1)
tw1r = tw1r.replace("window.YTD.tweet.part1 = ", "", 1)
twrj = json.loads(twr)
tw1rj = json.loads(tw1r)
big = []
mecab = MeCab.Tagger("-Owakati")

# Both files have the same structure, so they go through the same loop.
for tweets in (twrj, tw1rj):
    for n in range(len(tweets)):
        tw = eval(str(tweets[n]["tweet"]))
        twf = str(tw["full_text"])
        # Remove URLs and line breaks from the tweet text.
        twf = re.sub(r"https?://[\w/:%#\$&\?\(\)~\.=\+\-…]+", "", twf)
        twf = twf.replace("\n", "")
        twc = str(tw["created_at"])
        tim = datetime.datetime.strptime(twc, "%a %b %d %H:%M:%S %z %Y").replace(tzinfo=None)
        tim_r = str(tim).replace(" ", "_")
        # Skip retweets and replies: the part before the first colon contains "RT" or "@".
        twf_b = twf.split(":")[0]
        if "RT" not in twf_b and "@" not in twf_b:
            small = [str(tim.timestamp()).replace(".0", ""), tim_r, twf]
            big.append(small)

#print(big)
# Sort by the formatted time, newest first.
big.sort(key=lambda x: x[1], reverse=True)
for num in range(len(big)):
    tw_a.write(big[num][2] + "\n")
    tw_time.write(big[num][1] + " " + big[num][2] + "\n")
    text = big[num][2]
    text_m = mecab.parse(text)
    tw_mecab.write(str(text_m))

# Close the files so that everything is flushed to disk.
for f in (tw_open, tw1_open, tw_time, tw_a, tw_mecab):
    f.close()
```
When you run this, the output files should look like this:
#### **`tweet_mytext.txt`**
```text
text
.....
```
#### **`tweet_mytext_time.txt`**
```text
2020-01-19_05:47:57 text
.....
```
Each file should contain one tweet per line, newest first.
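The timestamp at the start of each line in `tweet_mytext_time.txt` comes from the tweet's `created_at` field. As a small sketch of just that conversion (the helper name `format_created_at` is made up for illustration and is not part of the script above):

```python
import datetime

def format_created_at(created_at):
    """Turn a created_at string into the YYYY-MM-DD_HH:MM:SS form used in the output."""
    tim = datetime.datetime.strptime(created_at, "%a %b %d %H:%M:%S %z %Y")
    # Drop the UTC offset, then replace the space so the line can be split on " ".
    return str(tim.replace(tzinfo=None)).replace(" ", "_")

print(format_created_at("Sun Jan 19 06:49:10 +0000 2020"))  # 2020-01-19_06:49:10
```

Note that the offset is only dropped, not converted, so the times stay in whatever zone `created_at` uses (`+0000` in the sample data).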
## Description
Aside from the basics, I had a lot of trouble working out how to handle the quotes in the JSON:
#### **`python`**
```python
twrj=json.loads(twr)
tw=eval(str(twrj[n]["tweet"]))
```
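Here, `open` and `read` load the whole file as a single string, the extra header (`window.YTD.tweet.part0 = `) is removed, and `json.loads` converts that string into Python objects (a list of dictionaries).
From there, `eval` converts the value of each entry's `tweet` key into a dictionary once more. Since `json.loads` already returns nested dictionaries, indexing `twrj[n]["tweet"]` directly gives the same result.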
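As a minimal sketch of these steps, the snippet below strips the header with a regular expression and parses the rest with `json.loads`. It assumes the header always has the form `window.YTD.tweet.partN = `; the helper name `parse_tweet_js` and the shortened sample string are made up for illustration. The tweet value is indexed directly, without `eval`, because `json.loads` already deals with the quotes inside the strings.

```python
import json
import re

# Assumption: every part file starts with "window.YTD.tweet.partN = ".
HEADER = re.compile(r"^\s*window\.YTD\.tweet\.part\d+\s*=\s*")

def parse_tweet_js(text):
    """Strip the JavaScript assignment and parse the remaining JSON array."""
    return json.loads(HEADER.sub("", text, count=1))

# Hypothetical, heavily shortened stand-in for the contents of tweet.js.
sample = (
    'window.YTD.tweet.part0 = [{"tweet": {"full_text": "He said \\"hi\\"", '
    '"created_at": "Sun Jan 19 06:49:10 +0000 2020"}}]'
)

tweets = parse_tweet_js(sample)
tw = tweets[0]["tweet"]  # already a dict; no further conversion needed
print(type(tw).__name__, tw["full_text"])
```

With a real archive, the same function could be called on the result of `open("tweet.js", encoding="utf-8").read()`.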
#### **`python`**
```python
{'tweet': {'retweeted': False, 'source': '<a href="http://twitter.com/download/android" rel="nofollow">Twitter for Android</a>', 'entities': {'hashtags': [], 'symbols': [], 'user_mentions': [{'name': 'Saidjon', 'screen_name': 'noppo6', 'indices': ['3', '10'], 'id_str': '240638809', 'id': '240638809'}], 'urls': []}, 'display_text_range': ['0', '140'], 'favorite_count': '0', 'id_str': '1218787465110024192', 'truncated': False, 'retweet_count': '0', 'id': '1218787465110024192', 'created_at': 'Sun Jan 19 06:49:10 +0000 2020', 'favorited': False, 'full_text': 'RT @noppo6:Samarkand is blue because it was painted blue to attract tourists after independence. Almost the level of ruins destruction. Congratulations to the Japanese guidebooks and media that praise Samarkand as the "blue city". I think this is good for this simple Samarkand.\Even if it's n, Shahizi ...', 'lang': 'ja'}}
```
This is an example of one entry in the parsed JSON.
Q. How do you tell whether a tweet is your own?
A. A retweet starts with "RT @user:", so the script splits the text on the colon and checks element 0. If that part contains "RT" or "@", the tweet is skipped.
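The check can be sketched as a small function. The name `is_own_tweet` is made up for illustration; the logic is the same as in the script:

```python
def is_own_tweet(full_text):
    """Return True unless the text before the first colon contains "RT" or "@"."""
    head = full_text.split(":")[0]
    return "RT" not in head and "@" not in head

print(is_own_tweet("RT @noppo6:Samarkand is blue"))  # False (retweet)
print(is_own_tweet("@someone thanks"))               # False (reply)
print(is_own_tweet("Good morning"))                  # True
```

This is a rough heuristic. When a tweet contains no colon, the whole text is checked, so an ordinary tweet that merely contains "@" or the letters "RT" (for example inside the word "ART") is skipped as well.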
That's all I've written for now, but I'd be happy if it helps. [I'm always on Twitter (@kenkensz9), so feel free to ask if you have any questions](https://twitter.com/kenkensz9).
I hope you like it!