In a Windows environment, I built an **environment for acquiring, storing, and analyzing tweet data**, and have written it up here as a memo. For the **analysis** in particular, I wanted to be able to **perform file operations easily** (using grep, sed, awk, python, etc.), so I enabled [WSL (Windows Subsystem for Linux)](https://ja.wikipedia.org/wiki/Windows_Subsystem_for_Linux) and installed **bash (ubuntu)**.
- I decided to build the whole environment on **WSL1 bash**.
- To set up WSL, I followed the article "Let's install Windows Subsystem for Linux (WSL1)!".
- Because I wanted to use MongoDB this time, I set up bash with **ubuntu 18.04**, which has a track record of running MongoDB on WSL1.
Python itself is installed along with ubuntu, so there is no need to install it again.
- Install **pip3** for package installation
sudo apt install python3-pip
- Make **sqlite3** accessible from python
pip3 install pysqlite3
- Make **unicodecsv** available
pip3 install unicodecsv
- As a tool for operating MongoDB from python, I obtained **mongo_dao** from odicchi/tweet_learning
- **pymongo** is also required, and is installed with the following command
pip3 install pymongo
- Install **tweepy**, which is convenient for handling tweet data from python
pip3 install tweepy
- Also install the **OAuth-related packages**, which simplify the authentication needed to use the Twitter API
pip3 install requests requests_oauthlib
- To install **MongoDB**, I followed the article "Try using a database on Windows Subsystem for Linux" and ran the following command. See that article for how to start and stop the database.
sudo apt-get install mongodb
- Install **sqlite3**
sudo apt-get install sqlite3
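As a rough illustration of what the sqlite3 setup makes possible, the sketch below stores account records (ID, name, screen name) in a table and reads them back. It uses Python's standard `sqlite3` module with an in-memory database. The table name `accounts`, the helper functions, and the sample rows are assumptions made for this example and are not part of the original setup.

```python
import sqlite3


def create_store(path=":memory:"):
    """Open a database and make sure the (hypothetical) accounts table exists."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "id INTEGER PRIMARY KEY, name TEXT, screen_name TEXT)"
    )
    return conn


def save_accounts(conn, rows):
    """Insert (id, name, screen_name) tuples, replacing rows that share an id."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.executemany(
            "INSERT OR REPLACE INTO accounts (id, name, screen_name) "
            "VALUES (?, ?, ?)",
            rows,
        )


def load_accounts(conn):
    """Return all stored accounts ordered by id."""
    return conn.execute(
        "SELECT id, name, screen_name FROM accounts ORDER BY id"
    ).fetchall()


# Made-up sample data, only for demonstration
conn = create_store()
save_accounts(conn, [(2, "Bob", "bob_example"), (1, "Alice", "alice_example")])
save_accounts(conn, [(2, "Robert", "bob_example")])  # same id: row is replaced
for row in load_accounts(conn):
    print(row)
```

Passing a file path instead of `:memory:` would keep the data between runs, and the resulting file can then also be inspected with the `sqlite3` command installed above.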
- To **access the Windows file system from WSL**, I referred to the article "File linkage between WSL and windows". For example, the C drive can be accessed under **/mnt/c**.
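The drive mapping can be sketched in python as a small path conversion. This is only an illustration that assumes the default mount location (`/mnt/<drive letter>`); the function name `windows_to_wsl` and the sample path are hypothetical.

```python
from pathlib import PurePosixPath, PureWindowsPath


def windows_to_wsl(path):
    """Convert an absolute Windows path such as C:\\Users\\me to /mnt/c/Users/me.

    Assumes drives are mounted under /mnt, which is the WSL default.
    """
    win = PureWindowsPath(path)
    # Accept only absolute paths on a lettered drive (no UNC or relative paths)
    if not win.is_absolute() or len(win.drive) != 2:
        raise ValueError("expected an absolute path with a drive letter: %r" % path)
    drive_letter = win.drive[0].lower()
    # win.parts[0] is the anchor (e.g. 'C:\\'); the rest are path components
    return PurePosixPath("/mnt", drive_letter, *win.parts[1:])


print(windows_to_wsl(r"C:\Users\me\tweets.csv"))
```

The pure path classes only manipulate strings, so the conversion can be tried on any platform without touching the file system.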
- As a sample run in the environment built above, here is the python code I used to extract the **account information (ID, account name, screen name)** of the accounts that a given user (ID) follows on Twitter. The base of the code is taken from the article "List people who followed on tweepy".
#!/usr/bin/python3
# Written against the tweepy 3.x API; tweepy 4.x renamed or removed
# some of the calls used below (api.me, api.friends_ids, user_ids=).
import config
import tweepy

# Login settings
twitter_conf = {
    'consumer': {
        'key': config.CONSUMER_KEY,
        'secret': config.CONSUMER_SECRET
    },
    'access': {
        'key': config.ACCESS_TOKEN,
        'secret': config.ACCESS_TOKEN_SECRET
    }
}

# Authentication
auth = tweepy.OAuthHandler(
    twitter_conf['consumer']['key'],
    twitter_conf['consumer']['secret'])
auth.set_access_token(
    twitter_conf['access']['key'],
    twitter_conf['access']['secret'])

# tweepy initialization
api = tweepy.API(auth)
my_info = api.me()  # authenticated user; also confirms the credentials work

friends_ids = []
target_id = 'XXXXXXX'  # Specify target ID

# Get all IDs of the accounts the target follows.
# Cursor fetches everything, but it is not a list, so collect the IDs into one.
for friend_id in tweepy.Cursor(api.friends_ids, user_id=target_id).items():
    friends_ids.append(friend_id)

# Get the details, 100 IDs at a time
for i in range(0, len(friends_ids), 100):
    for user in api.lookup_users(user_ids=friends_ids[i:i+100]):
        print(str(user.id) + " : " + user.name + " : @" + user.screen_name)
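The final loop of the script looks up details 100 IDs at a time by slicing the list with a step of 100. The sketch below isolates that batching step so its behavior can be checked without calling the Twitter API. The helper name `chunks` and the made-up ID list are assumptions for illustration only.

```python
def chunks(items, size=100):
    """Yield consecutive slices of `items`, each holding at most `size` elements."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(items), size):
        # The last slice is simply shorter when len(items) is not a multiple of size
        yield items[start:start + size]


# 250 made-up IDs: two full batches and one partial batch
sample_ids = list(range(250))
print([len(batch) for batch in chunks(sample_ids)])  # prints [100, 100, 50]
```

Because slicing past the end of a list is safe in python, no special handling is needed for the final, shorter batch.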
- The authentication information is kept outside the script, in a separate file called **config.py**.
CONSUMER_KEY = "XXXXXX"
CONSUMER_SECRET = "XXXXXX"
ACCESS_TOKEN = "XXXXXX"
ACCESS_TOKEN_SECRET = "XXXXXX"
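Since the values shown are placeholders, it can be useful to check that all four settings have been filled in before calling the API. The following is a minimal sketch of such a check. The function `missing_credentials` is hypothetical, and a `SimpleNamespace` stands in for the imported `config` module so that the example runs on its own.

```python
from types import SimpleNamespace

REQUIRED = (
    "CONSUMER_KEY",
    "CONSUMER_SECRET",
    "ACCESS_TOKEN",
    "ACCESS_TOKEN_SECRET",
)


def missing_credentials(cfg, placeholder="XXXXXX"):
    """Return the names of settings that are absent, empty, or still the placeholder."""
    missing = []
    for name in REQUIRED:
        value = getattr(cfg, name, None)
        if not value or value == placeholder:
            missing.append(name)
    return missing


# Stand-in for `import config`; the values are made up
example = SimpleNamespace(
    CONSUMER_KEY="abc",
    CONSUMER_SECRET="XXXXXX",  # placeholder left unchanged
    ACCESS_TOKEN="def",
    # ACCESS_TOKEN_SECRET is not defined at all
)
print(missing_credentials(example))  # prints ['CONSUMER_SECRET', 'ACCESS_TOKEN_SECRET']
```

In the real script the same function could be called as `missing_credentials(config)` right after `import config`.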