It has been a year since the ban on online election campaigning was lifted, and every time an election ends, the online discourse keeps burning through the remaining lives of democracy and Japan like a Spelunker player. How is everyone doing?
Now, this time I will examine the content of tweets before and after the vote count.
I collected tweets posted from 18:00 on 2014/12/14 to 07:00 the next morning (Japan time) that contained any of the following keywords:
#election,#House of Representatives election,election
** Keep getting tweets containing specific keywords using Streaming API in Python ** http://qiita.com/mima_ita/items/ecdf7de2fe619378beee
https://github.com/mima3/stream_twitter
Confirmed to work on Windows 7 with Python 2.7.
The collected data can be downloaded from the following. http://needtec.sakura.ne.jp/doc/shuin47twitter.zip
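First, let's look at the number of tweets over time. Using the code above, I pulled the hourly counts from 18:00 on 2014/12/14 to 07:00 on 2014/12/15 (Japan time). The script's arguments are in UTC, so 18:00 Japan time is written as 9:00.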
python twitter_db_hist.py "2014/12/14 9:00" "2014/12/14 22:00" 3600
The result is as follows:
Time (UTC) | Japan time | Number of tweets |
---|---|---|
12/14 09:00 | 12/14 18:00 | 3149 |
12/14 10:00 | 12/14 19:00 | 4047 |
12/14 11:00 | 12/14 20:00 | 11280 |
12/14 12:00 | 12/14 21:00 | 9755 |
12/14 13:00 | 12/14 22:00 | 7199 |
12/14 14:00 | 12/14 23:00 | 5207 |
12/14 15:00 | 12/15 00:00 | 3472 |
12/14 16:00 | 12/15 01:00 | 3801 |
12/14 17:00 | 12/15 02:00 | 1545 |
12/14 18:00 | 12/15 03:00 | 529 |
12/14 19:00 | 12/15 04:00 | 292 |
12/14 20:00 | 12/15 05:00 | 300 |
12/14 21:00 | 12/15 06:00 | 477 |
The count peaks in the 20:00 hour, the timing of the ballot counting, and then falls over time. However, it rebounds in the 1:00 hour, falls again after that, and recovers a little from around 5:00, when people wake up.
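The bucketing behind a table like this can be sketched as follows. This is only a minimal illustration, not the actual `twitter_db_hist.py`: the function names `histogram` and `parse_utc` and the sample timestamps are assumptions made up for this example. It counts timestamps per fixed-width interval in UTC and adds the Japan-time label (UTC+9) for each bucket.

```python
from collections import Counter
from datetime import datetime, timedelta

JST_OFFSET = timedelta(hours=9)  # Japan Standard Time is UTC+9


def parse_utc(text):
    """Parse the 'YYYY/MM/DD H:MM' format used in the commands in this article."""
    return datetime.strptime(text, "%Y/%m/%d %H:%M")


def histogram(timestamps, start, end, interval_sec):
    """Count UTC timestamps in [start, end) per bucket of interval_sec seconds.

    Returns a list of (bucket start in UTC, bucket start in JST, count).
    """
    n_buckets = int((end - start).total_seconds() // interval_sec)
    counts = Counter()
    for ts in timestamps:
        if start <= ts < end:
            counts[int((ts - start).total_seconds() // interval_sec)] += 1
    rows = []
    for i in range(n_buckets):
        bucket_start = start + timedelta(seconds=i * interval_sec)
        rows.append((bucket_start, bucket_start + JST_OFFSET, counts[i]))
    return rows


if __name__ == "__main__":
    # Hypothetical timestamps, only to show the shape of the output.
    sample = [parse_utc("2014/12/14 16:26"),
              parse_utc("2014/12/14 16:27"),
              parse_utc("2014/12/14 16:27")]
    rows = histogram(sample, parse_utc("2014/12/14 16:00"),
                     parse_utc("2014/12/14 17:00"), 60)
    for utc, jst, n in rows:
        if n:
            print(utc.strftime("%m/%d %H:%M"), "|", jst.strftime("%m/%d %H:%M"), "|", n)
```

Passing 3600 as the interval gives hourly rows, and 60 gives per-minute rows, which makes it easy to zoom in on an hour that looks unusual.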
It is understandable that the number of tweets drops late at night and rises again in the morning. But why did it increase in the 1:00 hour, in the middle of the night?
To find out, let's look at the 1:00 hour in 1-minute units.
python twitter_db_hist.py "2014/12/14 16:00" "2014/12/14 17:00" 60
The result shows a sudden spike around 1:27.
What happened at this time? Let's check the 2ch thread of the Kaieda Democratic Party Research crowd, who love the Democratic Party.
** [Fukatsu no Jumon is different] Kaieda Democratic Party Research 802th [Tosen no Sho has been decided] ** http://anago.2ch.net/test/read.cgi/asia/1418565521/
811: Nameless place of sunrise: 2014/12/15(Mon) 01:26:44.86 ID:tG+ZZ8gB
[Banri Kaieda] Democratic Party representative Banri Kaieda fails to be revived in the proportional Tokyo block, defeat confirmed (01:19) (c)2ch.net
http://daily.2ch.net/test/read.cgi/newsplus/1418574054/
812: Nameless place of sunrise: 2014/12/15(Mon) 01:26:49.76 ID:4Us97nfn
Defeat confirmed w
813: Nameless place of sunrise: 2014/12/15(Mon) 01:26:51.11 ID:pW7uplw3
Goodbye, Mari
814: Nameless place of sunrise: 2014/12/15(Mon) 01:27:01.20 ID:yIjazH47
Wow ah ah NHK also lost w
815: Nameless place of sunrise: 2014/12/15(Mon) 01:27:02.02 ID:NOhUWn58
Mali completely defeated at NHK
No, I'm going to Phoenix from here! It's definitely Fenix, so _____
816: Nameless place of sunrise: 2014/12/15(Mon) 01:27:08.70 ID:4zmUGrZE
>>802
㌧. I haven't bought any snacks so I can open the mackerel can w
Judging from the thread, it seems that Asahi reported Kaieda's defeat in an extra at 1:19, and NHK reported it at 1:27.
As expected, the fate of the leader of the largest opposition party had enough impact to shake late-night Twitter out of its drowsiness. The result also suggests that television spreads news more strongly than newspaper extras.
Next, let's look at frequent words. I ran morphological analysis with MeCab and tallied the words.
This can be done with the following script.
python twitter_db_mecab.py "2014/12/14 9:00" "2014/12/14 22:00" > mecab.txt
The top 100 words are shown below.
Word | Number of appearances |
---|---|
election | 70626 |
Ward | 33315 |
Selection | 27196 |
House of Representatives | 27152 |
Voting | 13740 |
1 | 11698 |
Probably | 8386 |
Liberal Democratic Party | 7403 |
Breaking news | 7120 |
Mr | 7074 |
Tokyo | 6864 |
Vote counting | 6484 |
Winning | 6456 |
Raw | 6443 |
NHK | 6222 |
0 | 5866 |
# | 5519 |
Lost | 5504 |
official | 5488 |
kyodo | 5487 |
Sure | 5384 |
2 | 5352 |
party | 5236 |
Extra | 5229 |
Seat | 5025 |
go | 4811 |
Man | 4796 |
BqAAr | 4633 |
vlhS | 4606 |
rate | 4460 |
Proportional | 4419 |
Liberal Democratic Party | 4302 |
block | 4208 |
4 | 4196 |
Teru | 4035 |
Day | 3912 |
Candidate | 3811 |
House of Representatives | 3782 |
seiji | 3773 |
9 | 3745 |
During ~ | 3726 |
Japan | 3611 |
jimin | 3607 |
koho | 3603 |
representative | 3599 |
Democracy | 3592 |
3 | 3589 |
Person | 3548 |
nicohou | 3490 |
JNSC | 3203 |
blogos | 3170 |
ld | 3125 |
name | 3098 |
Abe | 3068 |
Democratic Party | 3039 |
To tell | 3011 |
Special number | 2959 |
Next generation | 2889 |
% | 2881 |
Time | 2844 |
Nico | 2817 |
Be | 2750 |
Appearance | 2727 |
Beat Takeshi | 2723 |
To be | 2721 |
Kanagawa | 2690 |
Politics | 2532 |
5 | 2511 |
Kaieda | 2504 |
it can | 2488 |
Minutes | 2424 |
Long | 2371 |
Year | 2315 |
Viewing | 2315 |
Okinawa | 2231 |
Resurrection | 2176 |
Up | 2092 |
U | 1997 |
Acquired | 1977 |
Vote | 1954 |
Absent | 1953 |
Current | 1926 |
Restoration | 1905 |
Prime Minister | 1898 |
Press | 1888 |
Report | 1882 |
pond | 1831 |
take | 1775 |
Communist Party | 1773 |
Substitute | 1769 |
necessary | 1766 |
nMDR | 1761 |
YidT | 1761 |
Youth | 1750 |
Target | 1727 |
Paper | 1712 |
Mari | 1666 |
senkyost | 1645 |
information | 1628 |
I'd love to | 1618 |
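The tallying behind a table like this can be sketched as follows. This is a minimal illustration, not the actual `twitter_db_mecab.py`: the function names `count_words` and `mecab_tokenize` are assumptions, and this sketch counts every surface form without filtering by part of speech, which may differ from what the real script does. The tokenizer is passed in as a function so the counting logic can be tried without MeCab installed.

```python
from collections import Counter


def count_words(texts, tokenize):
    """Count tokens across all texts; `tokenize` maps a string to a list of words."""
    counts = Counter()
    for text in texts:
        counts.update(tokenize(text))
    return counts


def mecab_tokenize(text):
    """Split Japanese text into surface forms with MeCab (requires the MeCab binding)."""
    import MeCab  # imported lazily so count_words works without MeCab installed
    # Wakati mode returns the surface forms separated by spaces.
    # A tagger is created per call for simplicity; reuse one for large inputs.
    tagger = MeCab.Tagger("-Owakati")
    return tagger.parse(text).split()


if __name__ == "__main__":
    # Whitespace tokenizer on made-up English text, only to show the output shape.
    demo = count_words(["election result", "election night"], str.split)
    for word, n in demo.most_common(3):
        print(word, "|", n)
```

With MeCab available, `count_words(tweets, mecab_tokenize).most_common(100)` would give a top-100 list in the same shape as the table above.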
As expected, the most frequently extracted party name was the "Liberal Democratic Party," which won the majority. Next is the "Democratic Party," followed by the "Next Generation," then "Restoration" and the "Communist Party." For the Next Generation party, there seems to be a considerable gap between the actual number of seats and the degree of attention on the Internet.
Among place names, "Tokyo" and "Okinawa" were extracted. Tokyo ranks high because articles from "Tokyo Shimbun" were retweeted, and Okinawa probably attracted more attention than other areas because the LDP was wiped out in its single-seat constituencies.
The names of people that attracted attention were "Abe," "Beat Takeshi," and "Kaieda." The prime minister and the leader of the largest opposition party are no surprise, but it was surprising that "Beat Takeshi" drew attention. Apparently this is because he was appearing on Nico Nico Live.
Finally, let's use CaboCha to tabulate the dependency relations between phrases. See below for how to install CaboCha on Windows.
** Put Cabocha in Windows and analyze the dependency with Python ** http://qiita.com/mima_ita/items/161cd869648edb30627b
This time I ran the analysis with CaboCha 0.66. I expect the latest version to give similar results.
This can be done with the following script.
python twitter_db_cabocha.py "2014/12/14 9:00" "2014/12/14 22:00" > cabocha.txt
The top 100 pairs are shown below.
Clause 1 | Clause 2 | Number of appearances |
---|---|---|
Lost | Sure | 1762 |
co/ | 4nMDR4YidT#General election http://t | 1557 |
Turnout | 0% | 1538 |
[Election] House of Representatives election, | Teen | 1534 |
Teen | Turnout | 1534 |
Youth | go | 1504 |
name | write | 1504 |
RT@whsaito:Ballot | Fill out | 1502 |
name | Fill out | 1502 |
By all means candidate | name | 1502 |
method | take | 1502 |
high | Japan | 1502 |
write | go | 1502 |
take | Japan | 1502 |
Fill out | method | 1502 |
14th | go | 1502 |
Education level | high | 1502 |
RT@kyoho_times: | Teen | 1460 |
Resurrection | Sure | 1288 |
guy | Win | 1208 |
Such | guy | 1208 |
go-denial | Win | 1186 |
Probably | Report | 1172 |
3700kei:#General election election | go-denial | 1141 |
RT@keisei | 3700kei:#General election election | 1107 |
Proportional Tokyo block | Resurrection | 1075 |
RT@kyodo_official:Democratic Party | Banri Kaieda representative | 964 |
Chairman Tetsu Katayama | Lose | 928 |
Lose | Lose | 928 |
Banri Kaieda representative | Resurrection | 928 |
Opposition | Lose | 928 |
Socialist Party | Chairman Tetsu Katayama | 928 |
1949 House of Representatives election | Lose | 928 |
defeat | Sure | 914 |
Everyone | Politics http://t | 885 |
Winning | Sure | 815 |
House of Representatives election special page | →http://t | 761 |
feel | #election | 755 |
Polling place | listen | 755 |
1 vote | Disparity | 755 |
Disparity | feel | 754 |
listen | feel | 754 |
Girls high school | listen | 754 |
RT@kurosia:acquaintance | Polling place | 752 |
The lowest after the war | Last time | 742 |
RT@ld_blogos: | [Breaking news] | 663 |
Next generation | party | 633 |
[Breaking news] | Probably | 561 |
Below | http://t | 551 |
RT@kyodo_official:Next generation | party | 546 |
House of Representatives election | Turnout | 535 |
When | State | 514 |
Kiyomi Tsujimoto, Democracy, Osaka 10th District | Probably | 475 |
Candidate information | House of Representatives election | 2014-Yahoo |
afternoon | As of 6 o'clock | 424 |
Turnout | 34 | 424 |
79 points | Below | 420 |
National average | 34 | 420 |
As of 6 o'clock | 34 | 420 |
98% | Last time | 420 |
By | 34 | 420 |
RT@senkyost: | [Acquired seats___ | 386 |
Voting | go | 379 |
Right of collective self-defense | Exercise acceptance | 377 |
defeat | Report | 370 |
Probably | Break | 367 |
thing | know | 363 |
Japan | know | 360 |
Return to J League | know | 359 |
If this happens | Run | 359 |
gradually | Return to J League | 359 |
I | Run | 359 |
necessary | To tell | 357 |
Winner#Hope to spread___#RT | Follow everyone | 356 |
[Sad news] For anime | necessary | 356 |
Regulation | necessary | 356 |
Winner | necessary | 356 |
To tell | #Election http://t | 355 |
Follow everyone | #Election http://t | 355 |
8bu_: | necessary | 352 |
RT@K | 8bu_: | 352 |
#election#NHK#衆議院election#Ikegami | election#Vote counting | 344 |
Mr. Ishihara | Make a statement | 341 |
party | Shintaro Ishihara Chief Advisor | 341 |
House of Representatives election this time | Retired from politics | 341 |
Shintaro Ishihara Chief Advisor | Lost | 341 |
Retired from politics | Make a statement | 341 |
co/ | 7LGbX1z | 322 |
RT@mainichijpedit:Ministry of Internal Affairs and Communications | By | 309 |
Understanding | obtain | 304 |
___http | ://t | 303 |
People | Understanding | 303 |
RT@jimin_koho: | /To do | 301 |
thing | Sure | 278 |
RT@jimin_koho: | / | 270 |
Exercise acceptance | To express | 268 |
Liberal Democratic Party | To express | 268 |
To express | Seiichiro Murakami | 268 |
Opposition | To express | 268 |
2nd ward | To express | 268 |
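Counting pairs like these can be sketched as follows. This is a minimal illustration, not the actual `twitter_db_cabocha.py`: a dependency parser such as CaboCha reports, for each phrase, the index of the phrase it depends on, and this sketch assumes that information has already been converted into plain `(phrase_text, head_index)` tuples. That input shape and the name `count_dependency_pairs` are assumptions for this example.

```python
from collections import Counter


def count_dependency_pairs(sentences):
    """Count (phrase, head phrase) pairs over parsed sentences.

    Each sentence is a list of (phrase_text, head_index) tuples, where
    head_index is the position of the phrase this one depends on, or -1
    when the phrase has no head (the root of the sentence).
    """
    pairs = Counter()
    for chunks in sentences:
        for text, head in chunks:
            if 0 <= head < len(chunks):
                pairs[(text, chunks[head][0])] += 1
    return pairs


if __name__ == "__main__":
    # Made-up parses using labels from the table, only to show the output shape.
    parsed = [
        [("Lost", 1), ("Sure", -1)],
        [("Resurrection", 1), ("Sure", -1)],
        [("Lost", 1), ("Sure", -1)],
    ]
    for (src, dst), n in count_dependency_pairs(parsed).most_common():
        print(src, "|", dst, "|", n)
```

Because retweets repeat the same sentence verbatim, every pair in a heavily retweeted tweet gets the same count, which is why runs of identical numbers such as 1502 appear in the table.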
Since "Lost" → "Sure" (a confirmed defeat) ranks first, Twitter seems to be more interested in who loses than in who wins.
There also seem to be many references to youth turnout. However, it is also true that many of them come from the widely retweeted item "the turnout of teens is 0%".
"High" → "Japan" seems to be the result of a large number of tweets saying "Only Japan, with its high level of education, uses the method of writing names on ballots."
Also, as I mentioned at the beginning, I looked into democracy and Japan, which supposedly lose a life at every election. Against 2 tweets saying that democracy will die, there were 11 tweets saying that democracy will not die, so democracy seems to have lost fewer lives than expected.
However, a few phrase pairs that cost democracy a life were extracted, such as the following.
Clause 1 | Clause 2 | Number of appearances |
---|---|---|
Democracy | die | 2 |
Democracy | End | 2 |
Democracy | End | 2 |
Democracy | Collapse | 2 |
···Democracy | Collapse | 1 |
The lives lost by Japan are as follows.
Clause 1 | Clause 2 | Number of appearances |
---|---|---|
RT@inosan08260:End of Japan confirmed | 178kakapo:Japan | 7 |
Liberal Democratic Party | Japan collapse | 4 |
Already | Japan collapse | 4 |
Laughable | Japan collapse | 4 |
Japan | Crush | 2 |
Japan | End | 2 |
From this result, it seems that democracy lost 9 lives in this election, and Japan lost about 23.
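These totals are simply the sums of the "Number of appearances" columns in the two tables above. As a quick check, the following snippet adds them up; the variable names are made up for this example.

```python
# Counts copied from the "Number of appearances" column of each table.
democracy_counts = [2, 2, 2, 2, 1]
japan_counts = [7, 4, 4, 4, 2, 2]

democracy_lost = sum(democracy_counts)
japan_lost = sum(japan_counts)

print("democracy:", democracy_lost)  # 9
print("japan:", japan_lost)  # 23
```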
・ Twitter got excited when Mr. Kaieda lost his seat, even though it was the middle of the night, and the pair "Lost" → "Sure" appears frequently, so who loses draws more attention than who wins.
・ Comparing the number of appearances of the word "Next generation" with the actual results, attention on the Internet alone does not win seats.
・ I had the impression that democracy and Japan lose lives at every election, but that does not seem to be the case as much as expected.
This is the kind of pseudo-analysis you can do. For tweets over time, as in this example, I think it is a good idea to look at where the counts rise and fall and investigate in detail the points where something changed.
Word frequency makes it easy to see what is getting a lot of attention. However, as this example shows, a high count is not always a positive reaction.
Dependency parsing may be able to overcome the weakness of looking only at frequent words. Honestly, though, I did not fully realize that potential this time. This remains an issue for the future.
By the way, tweets from protected accounts (the ones with the lock icon) could not be retrieved with the Streaming API filter.