I tried to build an environment that can acquire, store, and analyze tweet data in WSL (bash)

Background

In a Windows environment, I built an environment for **acquiring, storing, and analyzing tweet data**, and I have organized the steps here as a memo. For the **analysis** in particular, I wanted to **perform file operations easily** (using grep, sed, awk, Python, etc.), so I enabled [WSL (Windows Subsystem for Linux)](https://ja.wikipedia.org/wiki/Windows_Subsystem_for_Linux) and installed **bash (Ubuntu)**.

1. Procedure outline

- I decided to build the whole environment on **bash in WSL1**.
- For setting up WSL, I used "Let's install Windows Subsystem for Linux (WSL1)!" as a reference.
- Since I wanted to use MongoDB for storing the data this time, I chose **Ubuntu 18.04** for bash, as it has a track record of running MongoDB on WSL1.

2. Detailed procedure

Python-related

Python itself comes preinstalled with Ubuntu, so there is no need to install it separately.

- Install **pip3** for installing packages:

```bash
sudo apt install python3-pip
```

- Make **sqlite3** accessible from Python:

```bash
pip3 install pysqlite3
```

- Make **unicodecsv** available:

```bash
pip3 install unicodecsv
```

- For operating MongoDB from Python, I obtained the **mongo_dao** tool from odicchi/tweet_learning.

- **pymongo** is also required; install it with the following command:

```bash
pip3 install pymongo
```

- Install **tweepy**, which makes it easy to work with tweet data from Python:

```bash
pip3 install tweepy
```

- In addition, install **OAuth-related packages** to make authentication for the Twitter API easy:

```bash
pip3 install requests requests_oauthlib
```
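The packages above are installed without showing how `requests_oauthlib` is actually used. As a minimal sketch, assuming the credentials are arranged like `twitter_conf` in the sample in section 4, the four keys can be mapped onto the keyword arguments of `requests_oauthlib.OAuth1Session`, which then behaves like a `requests` session that signs every request. The helper names `oauth_kwargs` and `make_session` are hypothetical.

```python
from typing import Dict


def oauth_kwargs(conf: Dict[str, Dict[str, str]]) -> Dict[str, str]:
    """Map a nested credential dict (same shape as twitter_conf) to OAuth1Session keyword arguments."""
    return {
        "client_key": conf["consumer"]["key"],
        "client_secret": conf["consumer"]["secret"],
        "resource_owner_key": conf["access"]["key"],
        "resource_owner_secret": conf["access"]["secret"],
    }


def make_session(conf: Dict[str, Dict[str, str]]):
    """Create an OAuth 1.0a signed session.

    requests_oauthlib is imported inside the function so that this file
    can be loaded (and oauth_kwargs tested) even where it is not installed.
    """
    from requests_oauthlib import OAuth1Session

    return OAuth1Session(**oauth_kwargs(conf))
```

With real keys, something like `make_session(twitter_conf).get("https://api.twitter.com/1.1/account/verify_credentials.json")` would send a signed request to a Twitter API v1.1 endpoint; whether it succeeds depends on the access level of your API keys.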

Data storage

- To install **MongoDB**, I followed "Try using a database on Windows Subsystem for Linux" and ran the following command. Refer to that article for how to start and stop the database.

```bash
sudo apt-get install mongodb
```

- Install **sqlite3**:

```bash
sudo apt-get install sqlite3
```
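Python 3's standard library already includes a `sqlite3` module, so SQLite can be used from Python without any extra package. The following is a minimal sketch, not the author's code, that stores follow-list rows like those printed by the sample in section 4 (user ID, account name, screen name). The table name `friends` and the helpers `save_friends` and `load_friends` are hypothetical.

```python
import sqlite3
from typing import Iterable, List, Tuple

Row = Tuple[int, str, str]  # (user ID, account name, screen name)


def save_friends(conn: sqlite3.Connection, rows: Iterable[Row]) -> None:
    """Create the table if needed, then insert or update each row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS friends ("
        " id INTEGER PRIMARY KEY,"
        " name TEXT NOT NULL,"
        " screen_name TEXT NOT NULL)"
    )
    # INSERT OR REPLACE keeps a single row per user ID when the script is rerun.
    conn.executemany(
        "INSERT OR REPLACE INTO friends (id, name, screen_name) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def load_friends(conn: sqlite3.Connection) -> List[Row]:
    """Return all stored rows ordered by user ID."""
    return conn.execute(
        "SELECT id, name, screen_name FROM friends ORDER BY id"
    ).fetchall()


# In-memory database for the demo; pass a file path such as "tweets.db" to persist it.
conn = sqlite3.connect(":memory:")
save_friends(conn, [(2, "Bob", "bob"), (1, "Alice", "alice")])
save_friends(conn, [(1, "Alice A.", "alice")])  # same ID: replaced, not duplicated
print(load_friends(conn))  # [(1, 'Alice A.', 'alice'), (2, 'Bob', 'bob')]
```

A database file written this way can also be inspected with the `sqlite3` command installed above, e.g. `sqlite3 tweets.db`.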

3. Other

- **To access the Windows file system from WSL**, I referred to "File linkage between WSL and Windows". For example, the C drive can be accessed at **/mnt/c**.
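To make the `/mnt/c` mapping concrete, here is a small hypothetical helper (not part of WSL) that converts a Windows drive-letter path into the corresponding WSL path, assuming the default mount location `/mnt/<drive letter>`. If the automount root has been customized, pass a different `mount_root`. WSL also ships a `wslpath` command that performs this kind of conversion from the shell.

```python
from pathlib import PurePosixPath, PureWindowsPath


def to_wsl_path(win_path: str, mount_root: str = "/mnt") -> str:
    """Translate a drive-letter Windows path into its WSL /mnt/<drive> equivalent."""
    p = PureWindowsPath(win_path)
    # Only plain drive-letter paths ("C:") are handled; UNC shares are rejected.
    if len(p.drive) != 2 or p.drive[1] != ":":
        raise ValueError(f"not a drive-letter path: {win_path!r}")
    drive = p.drive[0].lower()  # "C:" -> "c"
    # p.parts[0] is the anchor ("C:\\"); the remaining parts are directories and the file name.
    return str(PurePosixPath(mount_root, drive, *p.parts[1:]))


print(to_wsl_path(r"C:\Users\me\data.csv"))  # /mnt/c/Users/me/data.csv
```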

4. Sample execution

- As a sample run in this environment, here is the Python code I used to extract **the information (ID, account name, screen name) of the accounts** that a given user (ID) follows on Twitter. The base of the code comes from "List the people you followed on tweepy".

```python
#!/usr/bin/env python3
# Written against the tweepy 3.x API; in tweepy 4.x, friends_ids was renamed get_friend_ids.
import config  # config.py holding the API credentials (see below)
import tweepy

# Login settings
twitter_conf = {
    'consumer': {
        'key': config.CONSUMER_KEY,
        'secret': config.CONSUMER_SECRET,
    },
    'access': {
        'key': config.ACCESS_TOKEN,
        'secret': config.ACCESS_TOKEN_SECRET,
    },
}

# Authentication
auth = tweepy.OAuthHandler(
    twitter_conf['consumer']['key'],
    twitter_conf['consumer']['secret'])
auth.set_access_token(
    twitter_conf['access']['key'],
    twitter_conf['access']['secret'])

# tweepy initialization
api = tweepy.API(auth)

target_id = 'XXXXXXX'  # Specify the target user ID

# Get the IDs of all accounts the target user follows.
# Cursor pages through every result, but it is not a list, so collect the IDs into one.
friends_ids = []
for friend_id in tweepy.Cursor(api.friends_ids, user_id=target_id).items():
    friends_ids.append(friend_id)

# Look up user details in batches of 100 IDs
for i in range(0, len(friends_ids), 100):
    for user in api.lookup_users(user_ids=friends_ids[i:i + 100]):
        print(f"{user.id} : {user.name} : @{user.screen_name}")
```
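The sample only prints the results. As a sketch of the storage side, assuming you want a CSV file you can also open on Windows, the rows can be written with Python 3's built-in `csv` module, which handles non-ASCII names such as Japanese directly (the `unicodecsv` package installed earlier mainly fills that gap on Python 2). The function name `write_friends_csv` and the column headers are hypothetical; in the sample you would pass tuples of `(user.id, user.name, user.screen_name)`.

```python
import csv
import io
from typing import Iterable, TextIO, Tuple


def write_friends_csv(out: TextIO, rows: Iterable[Tuple[int, str, str]]) -> int:
    """Write a header plus one line per followed account; return the number of data rows."""
    writer = csv.writer(out)
    writer.writerow(["id", "name", "screen_name"])
    count = 0
    for user_id, name, screen_name in rows:
        writer.writerow([user_id, name, screen_name])
        count += 1
    return count


# For a real file, open it with newline="" as the csv docs recommend; "utf-8-sig"
# adds a BOM so Excel on Windows detects UTF-8 (the path below is hypothetical):
#   with open("/mnt/c/Users/me/friends.csv", "w", newline="", encoding="utf-8-sig") as f:
#       write_friends_csv(f, rows)
buf = io.StringIO()
n = write_friends_csv(buf, [(1, "山田太郎", "taro"), (2, "Alice", "alice")])
print(n)  # 2
```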

- The authentication credentials are kept in a separate file called **config.py**:

```python
CONSUMER_KEY = "XXXXXX"
CONSUMER_SECRET = "XXXXXX"
ACCESS_TOKEN = "XXXXXX"
ACCESS_TOKEN_SECRET = "XXXXXX"
```
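config.py keeps the keys in plain text, so it is usually best kept out of version control. One common alternative, shown here as a hedged sketch rather than what this article uses, is to read the same four values from environment variables. The variable names mirror config.py, and the `load_credentials` helper is hypothetical.

```python
import os
from typing import Dict, Mapping, Optional

KEYS = ("CONSUMER_KEY", "CONSUMER_SECRET", "ACCESS_TOKEN", "ACCESS_TOKEN_SECRET")


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Read the four Twitter credentials from the environment, failing loudly if any is missing."""
    env = os.environ if env is None else env
    missing = [k for k in KEYS if not env.get(k)]
    if missing:
        raise KeyError("missing credentials: " + ", ".join(missing))
    return {k: env[k] for k in KEYS}
```

After setting the variables in bash (e.g. `export CONSUMER_KEY=...`), `creds = load_credentials()` and `creds["CONSUMER_KEY"]` can stand in wherever the sample uses `config.CONSUMER_KEY`.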

Related articles

  1. Let's install Windows Subsystem for Linux (WSL1)!
  2. Try using a database on Windows Subsystem for Linux
  3. File linkage between WSL and Windows
  4. List the people you followed on tweepy
