I tried building an environment on WSL (bash) for acquiring, storing, and analyzing tweet data

Background

I built an environment on Windows for **acquiring, storing, and analyzing tweet data**, so I am writing it up as a memorandum. For the **analysis** in particular, I wanted to be able to **perform file operations easily** (with grep, sed, awk, python, and so on), so I enabled [WSL (Windows Subsystem for Linux)](https://ja.wikipedia.org/wiki/Windows_Subsystem_for_Linux) and installed **bash (Ubuntu)**.

1. Procedure outline

- I decided to build the whole environment on **WSL1 bash**.
- To install WSL, I followed the article "Let's install Windows Subsystem for Linux (WSL1)!".
- Because I wanted to use MongoDB for file operations this time, I set up bash with **Ubuntu 18.04**, which has a track record of running MongoDB on WSL1.

2. Detailed procedure

Python related

Python itself was installed along with Ubuntu, so there is no need to install it again.

- Install **pip3** for package installation

sudo apt install python3-pip

- Make **sqlite3 accessible** from Python

pip3 install pysqlite3

- Make **unicodecsv** available

pip3 install unicodecsv

- **mongo_dao**, a tool for operating MongoDB from Python, was obtained from odicchi/tweet_learning

- **pymongo** is also required, so install it with the following command

pip3 install pymongo

- Install **tweepy**, which is convenient for manipulating tweet data from Python

pip3 install tweepy

- Also install the **OAuth-related packages** to make the authentication needed for the Twitter API easier

pip3 install requests requests_oauthlib
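After running the installs above, it can help to confirm that every package is actually importable from python3. The following is a minimal sketch, not part of the original setup: the helper name `missing_packages` is hypothetical, and the import names are assumed to match the pip package names used above.

```python
from importlib.util import find_spec

# Import names assumed to match the packages installed with pip3 above
PACKAGES = [
    "pysqlite3",
    "unicodecsv",
    "pymongo",
    "tweepy",
    "requests",
    "requests_oauthlib",
]


def missing_packages(names):
    """Return the top-level module names that cannot be found by the import system."""
    return [name for name in names if find_spec(name) is None]


# Report anything that still needs to be installed
print("Missing packages:", missing_packages(PACKAGES) or "none")
```

If the list is not empty, re-running the corresponding `pip3 install` command is the first thing to try.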

Data storage related

- To install **MongoDB**, I referred to "Try using a database on Windows Subsystem for Linux" and ran the following command. See the same reference for how to start and stop the database.

sudo apt-get install mongodb

- Install **sqlite3**

sudo apt-get install sqlite3
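As an illustration of how the storage side might be used, here is a minimal sketch that saves account records (ID, account name, screen name) to SQLite. It is an assumption-laden example, not the author's code: it uses Python's standard-library `sqlite3` module rather than the `pysqlite3` package, an in-memory database instead of a file, and a hypothetical `users` table and `save_users` helper.

```python
import sqlite3


def save_users(conn, users):
    """Store (id, name, screen_name) tuples, replacing rows that share an ID."""
    # Hypothetical table layout chosen for this sketch
    conn.execute(
        "CREATE TABLE IF NOT EXISTS users ("
        "id INTEGER PRIMARY KEY, name TEXT, screen_name TEXT)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO users (id, name, screen_name) VALUES (?, ?, ?)",
        users,
    )
    conn.commit()


# ":memory:" keeps the example self-contained; pass a file path to persist the data
conn = sqlite3.connect(":memory:")
save_users(conn, [(1, "Alice", "alice"), (2, "Bob", "bob")])
save_users(conn, [(2, "Bob B.", "bob")])  # same ID, so the row is replaced

rows = conn.execute(
    "SELECT id, name, screen_name FROM users ORDER BY id"
).fetchall()
print(rows)
```

Using the ID as the primary key means that re-running a collection job updates existing accounts instead of duplicating them.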

3. Other

- To **access the Windows file system from WSL**, I referred to "File linkage between WSL and windows". For example, the C drive can apparently be accessed under **/mnt/c**.
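The drive mapping described above can be sketched as a small path conversion. This is only an illustration: the helper name `windows_to_wsl` is hypothetical, and it assumes the default mount root `/mnt` with lowercase drive letters.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path, mount_root="/mnt"):
    """Convert an absolute drive-letter Windows path to its WSL mount path."""
    win = PureWindowsPath(path)
    drive = win.drive  # e.g. "C:"
    if len(drive) != 2 or drive[1] != ":" or not win.root:
        raise ValueError(f"expected an absolute drive-letter path, got {path!r}")
    # parts[0] is the anchor (e.g. "C:\\"); the rest are directory and file names
    return str(PurePosixPath(mount_root, drive[0].lower(), *win.parts[1:]))


print(windows_to_wsl(r"C:\Users\me\tweets.csv"))
```

The result can then be passed to grep, sed, awk, or Python's `open()` from the bash side.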

4. Sample execution

- As a sample run in the environment built above, here is the Python code I used on Twitter to extract **the account information (ID, account name, screen name)** of the accounts that a given person (ID) follows. The base of the code comes from "List people who followed on tweepy".

#!/usr/bin/env python3
# Written against the Tweepy 3.x API. In Tweepy 4.x, api.me() was removed,
# api.friends_ids became api.get_friend_ids, and lookup_users takes user_id=.
import config
import tweepy

# Login settings
twitter_conf = {
    'consumer': {
        'key': config.CONSUMER_KEY,
        'secret': config.CONSUMER_SECRET
    },
    'access': {
        'key': config.ACCESS_TOKEN,
        'secret': config.ACCESS_TOKEN_SECRET
    }
}

# Authentication
auth = tweepy.OAuthHandler(
    twitter_conf['consumer']['key'],
    twitter_conf['consumer']['secret'])
auth.set_access_token(
    twitter_conf['access']['key'],
    twitter_conf['access']['secret'])

# tweepy initialization
api = tweepy.API(auth)
my_info = api.me()  # information about the authenticated account (not used below)

friends_ids = []

target_id = 'XXXXXXX'  # Specify target ID

# Get all IDs of the accounts that the target follows
# Cursor fetches every page, but it is an iterator rather than a list, so collect the IDs into a list
for friend_id in tweepy.Cursor(api.friends_ids, user_id=target_id).items():
    friends_ids.append(friend_id)

# Get the user details 100 IDs at a time
for i in range(0, len(friends_ids), 100):
    for user in api.lookup_users(user_ids=friends_ids[i:i+100]):
        print(str(user.id) + " : " + user.name + " : @" + user.screen_name)
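The sample looks up user details 100 IDs at a time, which matches the upper limit of IDs that the Twitter v1.1 users/lookup endpoint accepts per request. The batching step on its own can be sketched as follows; the `chunked` helper is a hypothetical name introduced for illustration, and the IDs are dummy values.

```python
def chunked(items, size=100):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


ids = list(range(250))  # dummy IDs standing in for friends_ids
batches = list(chunked(ids))
print([len(batch) for batch in batches])  # [100, 100, 50]
```

Each batch could then be passed to the lookup call in place of the slice `friends_ids[i:i+100]`.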

- The authentication information is kept outside the script, in a separate file named **config.py**.

CONSUMER_KEY = "XXXXXX"
CONSUMER_SECRET = "XXXXXX"
ACCESS_TOKEN = "XXXXXX"
ACCESS_TOKEN_SECRET = "XXXXXX"
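Because the file above ships with placeholder values, a quick check before calling the API can catch credentials that were never filled in. This is a minimal sketch under stated assumptions: `unset_credentials` is a hypothetical helper, and a `SimpleNamespace` with dummy values stands in for `import config` so the example runs on its own.

```python
from types import SimpleNamespace

REQUIRED_KEYS = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def unset_credentials(cfg, placeholder="XXXXXX"):
    """Return the names of credentials that are missing, empty, or still the placeholder."""
    unset = []
    for key in REQUIRED_KEYS:
        value = getattr(cfg, key, None)
        if not value or value == placeholder:
            unset.append(key)
    return unset


# Stand-in for the real config module; all values are dummies
sample = SimpleNamespace(
    CONSUMER_KEY="abc",
    CONSUMER_SECRET="XXXXXX",
    ACCESS_TOKEN="",
    ACCESS_TOKEN_SECRET="def",
)
print(unset_credentials(sample))  # ['CONSUMER_SECRET', 'ACCESS_TOKEN']
```

In the real script, the same function could be called with the imported `config` module before the authentication step.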