I tried to easily visualize the tweets of JAWS DAYS 2017 with Python + ELK

While I was thinking about what to try with Elasticsearch, I decided to use the tweet logs of JAWS DAYS 2017, where I had been tweeting myself.

Prerequisites

  1. Python: 2.7 series
  2. ELK: 5.2.2

Fetching tweet logs from Twitter

The source code I created is below. You can use it by filling in your own access keys. (As advance preparation, you need to register an application with the Twitter API.)

https://github.com/kojiisd/TweetsSearch

- The source code itself is very simple.
- The cutoff of 3,000 was picked arbitrarily; I simply guessed there would be about that many tweets.
- Note that when you use the Twitter API, there is a rate limit on how many calls you can make (the count resets after a while), so be careful.

from twitter import Api
import os
import sys

# Python 2.7: force UTF-8 so Japanese tweet text can be written to the file
reload(sys)
sys.setdefaultencoding('utf-8')

search_word = "#XXXXXX"

api = Api(base_url="https://api.twitter.com/1.1",
          consumer_key='XXXXXXXX',
          consumer_secret='XXXXXXXX',
          access_token_key='XXXXXXXX',
          access_token_secret='XXXXXXXX')

count = 0
maxid = 0
out = open('../data/result.json', 'w')

# First page: up to 100 Japanese tweets posted before the given end date
found = api.GetSearch(term=search_word, count=100, lang="ja",
                      result_type='mixed', until="yyyy-mm-dd")
while count < 3000:
    if not found:
        # No more results; stop instead of looping forever
        break
    for result in found:
        # str() on a Status object yields the tweet as a one-line JSON string
        out.write(str(result) + os.linesep)
        count += 1
        maxid = result.id
    # Page backwards: fetch tweets older than the last one we saw
    found = api.GetSearch(term=search_word, count=100, lang="ja",
                          result_type='mixed', max_id=maxid - 1)

out.close()
print "TweetsNum: " + str(count)

Trying a search with the JAWS DAYS 2017 hashtag

- I searched for "#jawsdays" with the program above. The tweets I got back looked like the following.
- About 1,900 hits.
- The period was 2017/03/11 00:00:00 to 2017/03/12 09:00:00, chosen roughly to run from when tweets about JAWS DAYS started in earnest until they settled down.

{
    "created_at": "Sat Mar 11 04:57:29 +0000 2017", 
    "favorited": false, 
    "hashtags": [
        "jd2017_b", 
        "jawsdays"
    ], 
    "id": XXXXXXXXXXXXXXXXXX, 
    "lang": "ja", 
    "retweeted": false, 
    "source": "<a href="http://twitter.com" rel="nofollow">Twitter Web Client</a>", 
    "text": "Full + standing. It's amazing.#jd2017_b #jawsdays", 
    "truncated": false, 
    "user": {
        :
        :
        :
        "name": "Koji, 
        :
        :
        :
        "screen_name": "kojiisd", 
        :
        :
        :
    }
}

For Logstash, I wanted to see results right away, so I just imported the file as JSON and barely wrote any processing in the configuration file...
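For reference, a minimal Logstash configuration along those lines might look like the sketch below. This is my own sketch rather than the exact file used here; the file path, the index name "tweets", and a local Elasticsearch on localhost:9200 are all assumptions.

input {
  file {
    path => "/path/to/data/result.json"
    start_position => "beginning"
    sincedb_path => "/dev/null"    # forget the read position, re-read on every run
    codec => "json"                # each line of the file is one tweet as JSON
  }
}
output {
  elasticsearch {
    hosts => ["localhost:9200"]
    index => "tweets"
  }
}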

As a result, the index ended up rather messy, but I'll keep going with it as-is for now.

[Screenshot 2017-04-06 7.26.23.png]

Anyway, ELK is convenient no matter how many times I use it. Once the data is poured into Elasticsearch, Kibana gives you a quick visualization, which alone makes me happy. I tried ranking the number of tweets per user among the tweets collected this time. The tweeting user can be obtained from "user.screen_name".

[Screenshot 2017-04-04 7.14.22.png]

Hmm, I may have put up a decent fight, but I didn't even reach 100 tweets, did I? I'll have to do better.

The most retweeted person was "nakayama_san", the one who always posts easy-to-understand illustrations at study sessions. That makes sense. Retweets can be narrowed down with the "retweeted_status.user.screen_name" field, though I'm not sure this is the right way to filter...

[Screenshot 2017-04-04 7.26.32.png]
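For reference, the same rankings can be pulled straight out of Elasticsearch with a terms aggregation, without going through Kibana. A minimal sketch, assuming the elasticsearch-py client, an index named "tweets", and the ES 5.x default dynamic mapping (which gives every string field a ".keyword" sub-field):

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# Top 20 tweeters; swap the field to
# "retweeted_status.user.screen_name.keyword" for the retweet ranking
body = {
    "size": 0,
    "aggs": {
        "by_user": {
            "terms": {"field": "user.screen_name.keyword", "size": 20}
        }
    }
}
res = es.search(index="tweets", body=body)
for bucket in res["aggregations"]["by_user"]["buckets"]:
    print bucket["key"], bucket["doc_count"]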

By the way, when I display a tag cloud as things stand, it comes out like this. It's no good at all (^^; Which is only natural: I haven't configured anything, such as how the text field is analyzed.

[Screenshot 2017-04-06 7.33.21.png]

JAWS DAYS itself doesn't have that many tweets, but it seems I need some preparation: analyzing the text strings properly, thinking through the Logstash settings, adding a template to the target index, and so on.
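As one example of such preparation, an index template could map the "text" field with a Japanese analyzer before loading the data. A minimal sketch, assuming the analysis-kuromoji plugin is installed and that the index name matches "tweets*" (both are my assumptions, not settings from this article):

from elasticsearch import Elasticsearch

es = Elasticsearch(["localhost:9200"])

# ES 5.x index template, applied to matching indices created afterwards
template = {
    "template": "tweets*",
    "mappings": {
        "_default_": {
            "properties": {
                # Tokenize tweet text with the kuromoji Japanese analyzer
                "text": {"type": "text", "analyzer": "kuromoji"}
            }
        }
    }
}
es.indices.put_template(name="tweets_template", body=template)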

Still, if I prepare properly and then analyze, I should be able to spot trending keywords at re:Invent.

Summary

For the time being, I built a flow from data acquisition to visualization. This time I built it assuming local execution, but if the analysis could run on AWS as-is with a published URL, it might even be possible to display trending keywords in a tag cloud in real time. If I feel like it, I'll give it a try before re:Invent.
