Hello, this is sunfish. My first posts here were a series analyzing corona-related tweets:
I'm tired of Python, so I analyzed the data with nehan (corona related, is that word now?)
I'm tired of Python, so I analyzed the data with nehan (I want to go live even with Corona-Sequel)
I'm tired of Python, so I tried to analyze the data with nehan (I want to go live even with Corona sickness-Part 1)
This time, I would like to wrap up the series with **how to collect Twitter data in the first place**. Of course, I use the analysis tool nehan for this as well.
Amazon S3 is used as the storage for the accumulated data.
First, you have to apply for access to the Twitter API. A Google search turns up plenty of guides, so I will omit the details. It's just an application process, but it is a little troublesome: there are various fields to fill in, and you have to write them in English.
nehan has many connectors for importing external data. Since it uses CData drivers, it can also import data from web services. If you select Twitter and enter the API credentials you obtained, you can get tweet data with a SQL query.
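To give a feel for what "getting tweets with a SQL query" looks like, here is a minimal sketch. It does not talk to Twitter or nehan at all: Python's built-in sqlite3 stands in for the connector, and the table name `Tweets` and its columns are hypothetical names I made up for illustration, not the connector's actual schema.

```python
import sqlite3

# sqlite3 is only a local stand-in for the Twitter connector.
# The table and column names below are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Tweets (id TEXT, created_at TEXT, text TEXT)")
conn.executemany(
    "INSERT INTO Tweets VALUES (?, ?, ?)",
    [
        ("1", "2020-05-01 09:00:00", "corona news today"),
        ("2", "2020-05-01 10:30:00", "lunch was great"),
        ("3", "2020-05-02 08:15:00", "live shows cancelled due to corona"),
    ],
)

# Select only the tweets that mention the keyword, oldest first.
query = """
    SELECT id, created_at, text
    FROM Tweets
    WHERE text LIKE ?
    ORDER BY created_at
"""
rows = conn.execute(query, ("%corona%",)).fetchall()
conn.close()
```

In nehan the connection settings are entered in the connector screen, so only the query itself would be written by hand.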
I add a little processing to the acquired data before storing it in Amazon S3. One extra column holds the processing time, so that I can see when the data was acquired. This is where nehan's variable function comes in handy: the execution time and execution date are defined dynamically. Finally, exporting to S3 completes the accumulation step. I also put a variable in the export file name so that the processing date is visible.
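For readers who want to see the same idea outside nehan, the two steps (stamping each row with the acquisition time, and embedding the processing date in the file name) can be sketched roughly as follows. The column name `acquired_at`, the `tweets` prefix, and the helper names are my own assumptions for illustration; nehan does this through its variable function, not through code like this.

```python
from datetime import datetime


def stamp_rows(rows, now):
    """Return copies of the rows with the acquisition time added as a new column."""
    stamp = now.strftime("%Y-%m-%d %H:%M:%S")
    # "acquired_at" is a hypothetical column name.
    return [{**row, "acquired_at": stamp} for row in rows]


def export_name(now, prefix="tweets"):
    """Build a file name that embeds the processing date."""
    return f"{prefix}_{now:%Y%m%d}.csv"


# A fixed time is used here so the result is reproducible;
# a real job would pass datetime.now() instead.
now = datetime(2020, 5, 1, 0, 0, 0)
rows = [{"id": "1", "text": "hello"}]

stamped = stamp_rows(rows, now)
name = export_name(now)
```

The actual upload to S3 is left out of the sketch; with a dated file name like this, each daily run writes a new object instead of overwriting the previous day's data.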
Since I can't run the above process by hand every day, I set up automatic updates. The flow that updates the tweet data and stores it in S3 is scheduled to run automatically at 0:00 every day. Sometimes the Twitter API doesn't respond and no data comes back, so the run occasionally fails...
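If you were writing such a job by hand, one common way to soften the occasional API failure is to retry a few times before giving up. The sketch below is only that: an illustration of the retry idea, not something nehan's scheduler is known to do. `fetch_with_retry` and `flaky_fetch` are hypothetical names, and the flaky function simply simulates an API that fails twice.

```python
import time


def fetch_with_retry(fetch, attempts=3, wait_seconds=0.0):
    """Call fetch() until it succeeds, up to `attempts` times."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except Exception as error:
            last_error = error
            if attempt < attempts:
                # Wait a little before trying again.
                time.sleep(wait_seconds)
    raise RuntimeError(f"failed after {attempts} attempts") from last_error


# Simulated API that does not respond on the first two calls.
calls = {"count": 0}


def flaky_fetch():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("API did not respond")
    return ["tweet"]


result = fetch_with_retry(flaky_fetch)
```

In a real job you would set `wait_seconds` to something larger than zero so the retries are spread out.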
The daily files accumulated in Amazon S3 are then collected in one batch and imported into nehan. The analyses in the earlier posts used data captured this way.
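The batch step amounts to reading every daily file and stacking the rows into one table. A rough sketch of that idea, assuming the daily exports are CSV files with identical headers (the file names, columns, and the `source_file` column are hypothetical, and in-memory strings stand in for the objects in S3):

```python
import csv
import io


def combine_daily_files(files):
    """Merge daily CSV texts (file name -> content) into one list of rows."""
    combined = []
    # Sorting by name puts dated file names in chronological order.
    for name in sorted(files):
        reader = csv.DictReader(io.StringIO(files[name]))
        for row in reader:
            # Remember which daily file each row came from.
            row["source_file"] = name
            combined.append(row)
    return combined


daily_files = {
    "tweets_20200502.csv": "id,text\n3,third\n",
    "tweets_20200501.csv": "id,text\n1,first\n2,second\n",
}

all_rows = combine_daily_files(daily_files)
```

Keeping the source file name on each row makes it easy to spot a day where the scheduled run failed and no file was written.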
Collecting external data makes you want to combine it with your own data, but the collection itself can be very troublesome. With nehan, you can go straight from collection to analysis, and of course no programming is required. For analysts who are tired of collecting data and writing Python, why not live a comfortable analytical life with nehan?