[Python] I tried to visualize the follow relationship of Twitter

A friend of a friend, then a friend of that friend, and so on: how far can you reach? Sometimes two people turn out to be connected in a surprising place.

In this article, I tried to visualize the people who follow each other on Twitter.

What I did

I did two main things:
- Get information about the users an account is following
- Visualize it

Get information about users you are following

import json
import config
from requests_oauthlib import OAuth1Session
from time import sleep
from mongo_dao import MongoDAO
import datetime

# API key setting (defined in another file config.py)
CK = config.CONSUMER_KEY
CS = config.CONSUMER_SECRET
AT = config.ACCESS_TOKEN
ATS = config.ACCESS_TOKEN_SECRET

# Authentication process
twitter = OAuth1Session(CK, CS, AT, ATS)  

mongo = MongoDAO("db", "followers_info")

get_friends_url = "https://api.twitter.com/1.1/friends/list.json"  # Get the accounts a user is following
get_user_info_url = "https://api.twitter.com/1.1/users/show.json"  # Get user information
count = 200
targets = ['yurinaNECOPLA']
registed_list = []
depth = 2  # How deep to dive
max_friends_count = 1000  # Some accounts follow a huge number of users, so skip any account that follows more than this

# Return True only if the account follows no more than max_friends_count users
def judge_friends_count(screen_name):
    params = {'screen_name': screen_name}
    while True:
        res = twitter.get(get_user_info_url, params=params)
        result_json = json.loads(res.text)
        if res.status_code == 200:
            # "friends_count" is the number of accounts this user follows; "followers_count" is the number of followers
            if result_json['friends_count'] > max_friends_count:
                return False
            else:
                return True
        elif res.status_code == 429:
            # Requests are limited to 15 per 15 minutes, so wait when the limit is reached
            now = datetime.datetime.now()
            print(now.strftime("%Y/%m/%d %H:%M:%S") + ' wait for connection limit')
            sleep(15 * 60)  # wait 15 minutes
        else:
            return False

# Get the accounts that the specified screen_name is following
def get_followers_info(screen_name):
    followers_info = []
    params = {'count': count,'screen_name': screen_name}
    while True:
        res = twitter.get(get_friends_url, params=params)
        result_json = json.loads(res.text)

        if res.status_code == 200 and len(result_json['users']) != 0:
            for user in result_json['users']:
                # Keep only the fields we need from the API response, as a dict (id is not used in this program)
                followers_info.append({'screen_name': user['screen_name'], 'id': user['id']})
            # A next_cursor of 0 means there are no more pages
            if result_json['next_cursor'] == 0:
                break
            # Set the position of the next page in the cursor parameter
            params['cursor'] = result_json['next_cursor']
        # Handle the case where the API rate limit is exceeded
        elif res.status_code == 429:
            now = datetime.datetime.now()
            print(now.strftime("%Y/%m/%d %H:%M:%S") + ' wait for connection limit')
            sleep(15 * 60)  # wait 15 minutes
        else:
            break
    return followers_info

# Get list of screen_name only from list of dict
def followers_list(followers_info):
    followers_list = []
    for follower in followers_info:
        followers_list.append(follower['screen_name'])
    return followers_list

# Recursive processing
def dive_search(target_list, d):
    for name in target_list:
        if name in registed_list or not judge_friends_count(name):
            continue
        print(name)
        followers_info = get_followers_info(name)
        mongo.insert_one({'screen_name': name, 'followers_info': followers_info})
        registed_list.append(name)
        if depth > d:
            dive_search(followers_list(followers_info), d + 1)
        else:
            return
    
dive_search(targets, 0)

The program starts from a chosen account. (Here I started from the account of Yurina Aoshima of the idol group Necopla.)

After that, accounts are processed recursively according to the following flow.
① Get information about the users the account is following
② Register the information from ① in MongoDB
③ Take the users obtained in ① one by one and repeat from ①

You can change how deep the recursion goes by changing the value of depth.
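The flow and the effect of depth can be tried offline with a small sketch. This is a simplified illustration rather than the program above: toy_follows and crawl are hypothetical stand-ins for the Twitter API and MongoDB, so it runs without credentials.

```python
# Hypothetical toy data standing in for the Twitter API:
# account -> list of accounts it follows
toy_follows = {
    'start': ['alice', 'bob'],
    'alice': ['start', 'carol'],
    'bob': ['start'],
    'carol': ['dave'],
    'dave': [],
}


def crawl(targets, d, max_depth, store):
    """Depth-limited recursive crawl; store stands in for the MongoDB collection."""
    for name in targets:
        if name in store:  # already registered, skip
            continue
        friends = toy_follows.get(name, [])  # (1) get the followed accounts
        store[name] = friends                # (2) register them
        if max_depth > d:                    # (3) repeat for each followed account
            crawl(friends, d + 1, max_depth, store)


for max_depth in (0, 1, 2):
    store = {}
    crawl(['start'], 0, max_depth, store)
    print(max_depth, sorted(store))
```

With a maximum depth of 2, carol (a friend of a friend) is registered, but dave, one step further away, is not.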

With a depth of 2, the idea is to get friends of friends. I really wanted to get more data, but the API that returns follow information only allows 15 requests per 15 minutes. The starting account currently follows 100 accounts, and even starting from this account the process ran for about 3 hours. What's more, the error "The existing connection was forcibly disconnected by the remote host" occurred partway through, and the process failed.

At that point, only about 60 of the 100 accounts that the starting account follows had been processed. Even if it had run to the end, I think it would have taken about 6 hours.
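Since a dropped connection ended the run, one possible safeguard is to retry a failed request a few times before giving up. Below is a minimal sketch, assuming a hypothetical helper call_with_retry that is not part of the original program. With requests, the exception to catch would typically be requests.exceptions.ConnectionError; the built-in ConnectionError is used here so the sketch runs on its own.

```python
import time


def call_with_retry(func, retries=3, wait_seconds=5, exceptions=(ConnectionError,)):
    """Call func(); if it raises one of the given exceptions, wait and try again.

    Hypothetical helper: retries is the total number of attempts.
    """
    for attempt in range(1, retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise  # give up after the last attempt
            time.sleep(wait_seconds)


# Usage with a fake request that fails twice and then succeeds
calls = {'n': 0}


def flaky_request():
    calls['n'] += 1
    if calls['n'] < 3:
        raise ConnectionError('connection reset')
    return 'ok'


result = call_with_retry(flaky_request, retries=3, wait_seconds=0)
print(result, calls['n'])
```

In the collection program, the twitter.get(...) calls could be wrapped in the same way, passing the requests exception class through the exceptions argument.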

The following code is used to register the data in MongoDB.

MongoDao
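The MongoDAO class itself does not appear in this article. As a rough idea of what such a wrapper might look like, here is a minimal sketch assuming pymongo and a MongoDB server on localhost:27017. The class body, the client parameter, and the connection settings are all assumptions; only the constructor arguments and the method names insert_one and find are taken from how the programs in this article call it.

```python
class MongoDAO:
    """Hypothetical thin wrapper around one MongoDB collection."""

    def __init__(self, db_name, collection_name, client=None):
        if client is None:
            # Imported here so the class can also be used with a stand-in client
            from pymongo import MongoClient
            client = MongoClient('localhost', 27017)  # assumed connection settings
        self.collection = client[db_name][collection_name]

    def insert_one(self, document):
        # Register a single document (a dict)
        return self.collection.insert_one(document)

    def find(self, filter=None):
        # Return a cursor (iterator) over the documents matching the filter
        return self.collection.find(filter)
```

Saved as mongo_dao.py, this would support calls such as MongoDAO("db", "followers_info") and mongo.find(filter={"screen_name": ...}).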

Visualize

As mentioned in the previous section, not all of the data could be collected, but for the time being let's visualize what was collected.

I used the NetworkX library for visualization. It can be installed with the following command.

pip install networkx

import networkx as nx
import matplotlib.pyplot as plt
from mongo_dao import MongoDAO

mongo = MongoDAO("db", "followers_info")
start_screen_name = 'yurinaNECOPLA'

# Create a new graph
G = nx.Graph()
# Add the starting node
G.add_node(start_screen_name)

depth = 3
processed_list = []

def get_followers_list(screen_name):
    result = mongo.find(filter={"screen_name": screen_name})
    followers_list = []
    try:
        doc = result.next()
        if doc != None:
            for user in doc['followers_info']:
                followers_list.append(user['screen_name'])
        return followers_list
    except StopIteration:
        return followers_list

def dive(screen_name, d):
    # Stop once the configured depth is reached
    if depth > d:
        if screen_name in processed_list:
            return
        followers_list = get_followers_list(screen_name)
        for follower in followers_list:
            f = get_followers_list(follower)
            # Add an edge only if this account also follows the starting account
            if start_screen_name in f:
                G.add_edge(screen_name, follower)
                processed_list.append(screen_name)
                dive(follower, d + 1)
    else:
        return

dive(start_screen_name, 0)

# Creating a diagram. figsize is the size of the figure
plt.figure(figsize=(10, 8))
 
# Determine the layout of the figure. The smaller the value of k, the denser the figure
pos = nx.spring_layout(G, k=0.8)
 
# Drawing nodes and edges
# edge_color, node_color: specify the color
# alpha: specify the transparency
nx.draw_networkx_edges(G, pos, edge_color='y')
nx.draw_networkx_nodes(G, pos, node_color='r', alpha=0.5)
 
# Add node name
nx.draw_networkx_labels(G, pos, font_size=10)
 
# Setting not to display X-axis and Y-axis
plt.axis('off')

plt.savefig("mutual_follow.png")
# Display the figure
plt.show()

The procedure and logic are similar to those used for collecting the data. The program recursively walks the stored follow lists and adds an edge when it finds accounts that are following each other.
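The core idea, adding an edge only between accounts that follow each other, can be shown on a small example. This is a sketch with a hypothetical follows dict and mutual_edges function, neither of which is in the original code; it uses plain Python so that it runs without a database.

```python
def mutual_edges(follows):
    """Return sorted (a, b) pairs where a follows b and b follows a.

    follows: dict mapping an account to the list of accounts it follows.
    """
    edges = set()
    for user, friends in follows.items():
        for friend in friends:
            if user in follows.get(friend, []):
                # Sort the pair so that (a, b) and (b, a) count as one edge
                edges.add(tuple(sorted((user, friend))))
    return sorted(edges)


# Made-up data: alice and bob follow each other, carol follows nobody back
follows = {
    'alice': ['bob', 'carol'],
    'bob': ['alice'],
    'carol': [],
}
edges = mutual_edges(follows)
print(edges)
```

The resulting pairs could then be passed to NetworkX with G.add_edges_from(edges).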

Result

It turned out to be something like this.

(Figure: mutual_follow.png)

I don't understand the detailed mechanism of the library, but accounts with many connections end up crowded together. The accounts in this dense cluster are idols belonging to the same agency, so I was satisfied with the result.

Impressions

The result was quite interesting. Since the Twitter API limits these requests to 15 per 15 minutes, I couldn't collect very much data. With more time and more data, it might be possible to see the connections among friends of friends of friends.
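One way to spend less time waiting is to sleep only until the rate-limit window resets, instead of a fixed 15 minutes. The v1.1 API reports the reset time, in epoch seconds, in the x-rate-limit-reset response header. Here is a sketch, assuming a hypothetical helper seconds_until_reset that is not in the original program:

```python
import time


def seconds_until_reset(headers, now=None, default=15 * 60):
    """Return how many seconds to wait before the rate-limit window resets.

    headers: response headers as a dict-like object.
    Falls back to default (15 minutes) if the header is missing or invalid.
    """
    if now is None:
        now = time.time()
    lowered = {k.lower(): v for k, v in headers.items()}
    try:
        reset_at = int(lowered['x-rate-limit-reset'])
    except (KeyError, ValueError):
        return default
    # Add one second of margin and never return a negative wait
    return max(0, reset_at - int(now) + 1)


# Example with made-up header values
print(seconds_until_reset({'x-rate-limit-reset': '1600000300'}, now=1600000000))
```

In the 429 branches of the collection program, this could replace the fixed wait, for example sleep(seconds_until_reset(res.headers)).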
