I had been collecting data from Twitter, but then left it untouched for a while. When I looked at the data again for the first time in ages, it was full of mysterious fav0 dating tweets that name a municipality, lots and lots of them. Searching Twitter confirmed that they really exist.
The user names were so obscene that I have hidden them. What are these mysterious words? I think they used to look a little more like sentences. Previously I specified particular words and said goodbye to any tweet that hit, but with this few characters there is no common word to specify.
So I will delete them with a regular expression.
Here it is in one go ("dokan"), together with sample data for checking that it works. Dokan, as in earthen pipe.
At first glance, the tweets follow two patterns: ① "1 hiragana character" + "hiragana or punctuation" + "municipality name", and ② "3 hiragana characters" + "symbol" + "municipality name". Since there are only these two, I replace the matching text with an empty string and then delete the rows that became empty.
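To make the two patterns concrete, here is a minimal sketch using only the standard `re` module. The character classes (`HIRAGANA`, `SYMBOL`, `SUFFIX`), the helper `is_municipality_spam`, and the sample strings are assumptions made for illustration, not something taken from the collected data.

```python
import re

# Assumed character classes (illustrative, not confirmed by the data)
HIRAGANA = "[ぁ-ん]"
SYMBOL = r"[!-/:-@\[-`{-~、。,…]"   # ASCII symbols plus some Japanese punctuation
SUFFIX = "[町村市]"                  # town / village / city

# Pattern 1: one hiragana + (hiragana or symbol) + text ending in a municipality suffix
PATTERN_1 = rf"{HIRAGANA}(?:{HIRAGANA}|{SYMBOL}).+{SUFFIX}"
# Pattern 2: three hiragana + symbol + text ending in a municipality suffix
PATTERN_2 = rf"{HIRAGANA}{{3}}{SYMBOL}.+{SUFFIX}"
SPAM_RE = re.compile(rf"(?:{PATTERN_1}|{PATTERN_2})")


def is_municipality_spam(text: str) -> bool:
    """Return True when the whole tweet consists of one of the two patterns."""
    return SPAM_RE.fullmatch(text) is not None


samples = ["あ、大阪市", "あ、大阪市 aaa", "おっと、大阪市"]
flags = [is_municipality_spam(s) for s in samples]
```

Note that pattern 1 is quite broad: any short tweet that starts with two hiragana and ends in 町, 村 or 市 would also be flagged, so it is worth checking a few real tweets before trusting it.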
The data is in a DataFrame, so I will handle it there. It has been a while since I last had some Python time. It will be over quickly.
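import pandas as pd
# Sample data for checking the behavior (the Japanese strings are illustrative)
DF_samp=pd.DataFrame({'col_0': {'row_0': "あ、大阪市", 'row_1': "あ、大阪市 aaa", 'row_2': "おっと、大阪市"},'col_1': {'row_0': 3, 'row_1': 4, 'row_2': 5},})
cols=DF_samp.col_0
# [ぁ-ん]: one hiragana; the long class: ASCII symbols plus 、。,… (reconstructed); [町村市]: municipality suffix
pattern=r"[ぁ-ん][ぁ-ん][!-/:-@\[-`{-~、。,…].+[町村市]$|[ぁ-ん][ぁ-ん!-/:-@\[-`{-~、。,…].+[町村市]$"
cols0=cols.str.replace(pattern, '', regex=True)
# Emptied cells are '' rather than NaN, so mask them to NaN before dropna
DF_samp['col_0']=cols0.mask(cols0 == '')
DF_samp=DF_samp.dropna(subset=['col_0'])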
With this, only the mysterious tweets in question were wiped out. Yatta ne! I can almost hear someone asking "you're reassigning there?", but it has been a long time, so please let it slide...
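As an alternative to blanking the matched cells and then dropping them, the rows could also be filtered directly with a boolean mask. The following is only a sketch under assumptions: the frame `df`, its sample strings, and the simplified single pattern are illustrative, and only the column name `col_0` follows the sample above.

```python
import pandas as pd

# Hypothetical sample frame; the strings are illustrative
df = pd.DataFrame({
    "col_0": ["あ、大阪市", "あ、大阪市 aaa", "おっと、大阪市"],
    "col_1": [3, 4, 5],
})

# Assumed pattern: starts with a hiragana, followed by a hiragana or a symbol,
# and ends in 町, 村 or 市
pattern = r"^[ぁ-ん][ぁ-ん!-/:-@\[-`{-~、。,…].+[町村市]$"

# True for rows whose text matches the pattern; missing values count as non-spam
is_spam = df["col_0"].str.contains(pattern, regex=True, na=False)

# Keep only the rows that did not match
cleaned = df[~is_spam].reset_index(drop=True)
```

This never creates empty strings in the first place, so no separate step for removing blank rows is needed.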
It occurred to me that if the people running the bots see this, a new pattern might show up... I will deal with that when the time comes.
Anyway, I want to live in a world where I can block efficiently! That said, the tweets are collected through the API, so blocking has nothing to do with it this time.