[Python] I tried to visualize the follow relationship of Twitter

Article content

Friends of friends, and friends of those friends... who will you eventually reach? Sometimes you discover that two people are connected in a surprising way.

In this article, I tried to visualize the people who follow each other on Twitter.

What I did

There are two main things I have done.
- Get information about the users an account is following
- Visualize

Get information about users you are following

import json
import config
from requests_oauthlib import OAuth1Session
from time import sleep
from mongo_dao import MongoDAO
import datetime

# API key setting (defined in another file config.py)
CK = config.CONSUMER_KEY
CS = config.CONSUMER_SECRET
AT = config.ACCESS_TOKEN
ATS = config.ACCESS_TOKEN_SECRET

# Authentication process
twitter = OAuth1Session(CK, CS, AT, ATS)  

mongo = MongoDAO("db", "followers_info")

get_friends_url = "https://api.twitter.com/1.1/friends/list.json"  # Get the accounts a user is following
get_user_info_url = "https://api.twitter.com/1.1/users/show.json"  # Get user information
count = 200
targets = ['yurinaNECOPLA']
registed_list = []
depth = 2  # Dive depth
max_friends_count = 1000  # Some accounts follow a huge number of users, so exclude them above a certain number

# Determine if the number of follow accounts exceeds a certain number
def judge_friends_count(screen_name):
    params = {'screen_name': screen_name}
    while True:
        res = twitter.get(get_user_info_url, params=params)
        result_json = json.loads(res.text)
        if res.status_code == 200:
            # "friends_count" is the number of accounts the user follows; "followers_count" is the number of followers
            if result_json['friends_count'] > max_friends_count:
                return False
            else:
                return True
        elif res.status_code == 429:
            # You can only send requests 15 times in 15 minutes, so wait if you reach the limit
            now = datetime.datetime.now()
            print(now.strftime("%Y/%m/%d %H:%M:%S") + ' wait for connection limit')
            sleep(15 * 60)  # wait 15 minutes
        else:
            return False

# Get the accounts that the specified screen_name follows (the API calls them "friends")
def get_followers_info(screen_name):
    followers_info = []
    params = {'count': count,'screen_name': screen_name}
    while True:
        res = twitter.get(get_friends_url, params=params)
        result_json = json.loads(res.text)

        if res.status_code == 200 and len(result_json['users']) != 0:
                for user in result_json['users']:
                    # Keep only the necessary fields from the API response, as a dict (id is not used in this program)
                    followers_info.append({'screen_name': user['screen_name'], 'id': user['id']})
                # Set the position of the next page (cursor) in the request parameters
                params['cursor'] = result_json['next_cursor']
        # Processing when the API connection limit is exceeded
        elif res.status_code == 429:
            now = datetime.datetime.now()
            print(now.strftime("%Y/%m/%d %H:%M:%S") + ' wait for connection limit')
            sleep(15 * 60)  # wait 15 minutes
        else:
            break
    return followers_info

# Get list of screen_name only from list of dict
def followers_list(followers_info):
    followers_list = []
    for follower in followers_info:
        followers_list.append(follower['screen_name'])
    return followers_list

# Recursive processing
def dive_search(target_list, d):
    for name in target_list:
        if name in registed_list or not judge_friends_count(name):
            continue
        print(name)
        followers_info = get_followers_info(name)
        mongo.insert_one({'screen_name': name, 'followers_info': followers_info})
        registed_list.append(name)
        if depth > d:
            dive_search(followers_list(followers_info), d + 1)
        else:
            return
    
dive_search(targets, 0)

In this program, you first decide the starting account. (I started from the account of Yurina Aoshima, a member of the idol group Necopla.)

After that, processing proceeds recursively according to the following flow.
① Get information about the users the account is following
② Register the information from ① in MongoDB
③ Take the users acquired in ① one by one and repeat from ①
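
The flow above can be sketched without the Twitter API or MongoDB. This is only a minimal illustration under stated assumptions: `crawl`, `fetch_friends`, and the toy account names are hypothetical stand-ins, and a plain dict takes the place of the database. Unlike a version that returns from inside the loop, this sketch keeps going through every name at the deepest level.

```python
def crawl(start_names, fetch_friends, max_depth):
    """Depth-limited recursive crawl; returns {screen_name: [friend names]}."""
    collected = {}

    def visit(names, d):
        for name in names:
            if name in collected:
                continue  # already registered, skip
            friends = fetch_friends(name)   # step 1: get the followed accounts
            collected[name] = friends       # step 2: store (stand-in for MongoDB)
            if d < max_depth:
                visit(friends, d + 1)       # step 3: repeat for each friend

    visit(start_names, 0)
    return collected


# Toy data standing in for the API (hypothetical accounts)
toy = {'a': ['b', 'c'], 'b': ['a', 'd'], 'c': [], 'd': ['e'], 'e': []}
result = crawl(['a'], lambda name: toy.get(name, []), max_depth=1)
print(sorted(result))  # ['a', 'b', 'c']
```

With `max_depth=1` the starting account and its friends are fetched, which is enough to know the friends of friends by name.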

You can change how deep the recursion goes by changing the value of depth.

With a depth of 2, the idea is to get friends of friends. I really wanted to get more data, but the API for follow-related information accepts only 15 requests per 15 minutes. The starting account currently follows 100 accounts, and even starting from this account, the process took about 3 hours. What's more, the error "The existing connection was forcibly disconnected by the remote host" occurred partway through, and the process failed.
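
To get a feel for why the crawl is so slow, here is a rough lower-bound estimate of the waiting time. It assumes the limit of 15 requests per 15 minutes described above and a page size of 200 (the `count` value in the script); the function names and the example figure of 101 requests are my own assumptions, not measurements.

```python
import math


def pages_needed(friends_count, page_size=200):
    """Number of friends/list requests needed to page through one account."""
    return max(1, math.ceil(friends_count / page_size))


def estimate_wait_minutes(total_requests, limit=15, window_minutes=15):
    """Lower bound on the minutes spent waiting for rate-limit windows to reset."""
    if total_requests <= 0:
        return 0
    # The first `limit` requests go out immediately; every further
    # batch of `limit` requests costs one full window of waiting.
    windows_to_wait = (total_requests - 1) // limit
    return windows_to_wait * window_minutes


# One request for the starting account plus one for each of 100 friends
print(estimate_wait_minutes(101))  # 90
print(pages_needed(1000))          # 5
```

Real runs take longer than this bound, because accounts with many friends need several pages and each deeper level adds more requests.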

At that point, only about 60 of the 100 followed accounts had been processed. Even if it had worked, I think it would have taken about 6 hours.
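
One way to survive a dropped connection like the one described above is to retry the request a few times with an increasing delay. The following is only a sketch: `call_with_retry` and its parameters are hypothetical and not part of the original script. When using requests, you would likely pass `exceptions=(requests.exceptions.ConnectionError,)`, since that class is separate from Python's built-in `ConnectionError`.

```python
import time


def call_with_retry(func, retries=3, base_delay=1.0,
                    exceptions=(ConnectionError,), sleep=time.sleep):
    """Call func(); on a listed exception, wait and try again.

    The delay doubles after each failure (base_delay, 2x, 4x, ...).
    After `retries` failed retries the last exception is re-raised.
    """
    for attempt in range(retries + 1):
        try:
            return func()
        except exceptions:
            if attempt == retries:
                raise
            sleep(base_delay * (2 ** attempt))
```

For example, `call_with_retry(lambda: twitter.get(get_friends_url, params=params))` would wrap the request made in `get_followers_info`.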

The following code is used to register data in MongoDB.

MongoDao
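
The linked code is not reproduced here. As a rough idea of what such a wrapper could look like, here is a minimal sketch built on pymongo. It is my own assumption, not the author's class: it only covers the two methods the scripts in this article call (`insert_one` and `find`), and it assumes a MongoDB server on the default localhost port when no client is passed in.

```python
class MongoDAO:
    """Hypothetical thin wrapper around a single MongoDB collection."""

    def __init__(self, db_name, collection_name, client=None):
        if client is None:
            # Imported lazily so the class can be tried with a stand-in client
            from pymongo import MongoClient
            client = MongoClient()  # assumption: server on localhost:27017
        self.collection = client[db_name][collection_name]

    def insert_one(self, document):
        """Insert a single document (a dict) into the collection."""
        return self.collection.insert_one(document)

    def find(self, filter=None):
        """Return a cursor over the documents matching the filter."""
        return self.collection.find(filter=filter)
```

Usage then matches the scripts: `mongo = MongoDAO("db", "followers_info")` followed by `mongo.insert_one({...})` or `mongo.find(filter={"screen_name": name})`.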

Visualize

As mentioned in the previous section, the data is incomplete, but for the time being, let's visualize what was collected.

The library used for visualization was NetworkX. Installation can be done with the following command.

pip install networkx

import json
import networkx as nx
import matplotlib.pyplot as plt
from requests_oauthlib import OAuth1Session
from mongo_dao import MongoDAO

mongo = MongoDAO("db", "followers_info")
start_screen_name = 'yurinaNECOPLA'

# Create a new graph
G = nx.Graph()
# Add node
G.add_node(start_screen_name)

depth = 3
processed_list = []

def get_followers_list(screen_name):
    result = mongo.find(filter={"screen_name": screen_name})
    followers_list = []
    try:
        doc = result.next()
        if doc != None:
            for user in doc['followers_info']:
                followers_list.append(user['screen_name'])
        return followers_list
    except StopIteration:
        return followers_list

def dive(screen_name, d):
    # Stop once the current depth d reaches the configured depth
    if depth > d:
        if screen_name in processed_list:
            return
        followers_list = get_followers_list(screen_name)
        for follower in followers_list:
            f = get_followers_list(follower)
            # Add an edge only if this account also follows the starting account
            if start_screen_name in f:
                G.add_edge(screen_name, follower)
                processed_list.append(screen_name)
                dive(follower, d + 1)
    else:
        return

dive(start_screen_name, 0)

# Creating a diagram. figsize is the size of the figure
plt.figure(figsize=(10, 8))
 
# Determine the layout of the figure. The smaller the value of k, the denser the figure
pos = nx.spring_layout(G, k=0.8)
 
# Drawing nodes and edges
# _color: Specify color
# alpha: Specifying transparency
nx.draw_networkx_edges(G, pos, edge_color='y')
nx.draw_networkx_nodes(G, pos, node_color='r', alpha=0.5)
 
# Add node name
nx.draw_networkx_labels(G, pos, font_size=10)
 
# Setting not to display X-axis and Y-axis
plt.axis('off')

plt.savefig("mutual_follow.png")
# Draw a diagram
plt.show()

The procedure and logic are similar to the step that fetched the follow information. We recursively walk the stored follow lists and add an edge when we find accounts that are following each other.
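
The idea of "following each other" can also be checked without recursion. The sketch below is a simplified alternative, not the code used above: `mutual_pairs` is a hypothetical helper that takes a dict mapping each account to the list of accounts it follows and returns every pair that follows each other. The account names are toy data.

```python
def mutual_pairs(follows):
    """Return the set of (a, b) pairs, a < b, where a follows b and b follows a."""
    pairs = set()
    for user, friends in follows.items():
        for friend in friends:
            # An edge exists only if the follow goes both ways
            if friend != user and user in follows.get(friend, ()):
                pairs.add(tuple(sorted((user, friend))))
    return pairs


# Toy data (hypothetical accounts)
toy_follows = {'a': ['b', 'c'], 'b': ['a'], 'c': ['d'], 'd': ['c', 'a']}
print(sorted(mutual_pairs(toy_follows)))  # [('a', 'b'), ('c', 'd')]
```

The resulting pairs could then be handed to NetworkX in one call with `G.add_edges_from(...)`.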

Result

It turned out something like this.

(Figure: mutual_follow.png)

I don't understand the detailed mechanism of the library, but accounts with many connections are clustered together. The clustered accounts are idols who belong to the same agency, so I was satisfied with the result.

Impressions

The result was quite interesting. Since the Twitter API limits these requests to 15 per 15 minutes, I couldn't collect very much data. If you can find the time to collect more data, you may be able to see the connections among friends of friends of friends.
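
One small way to waste less time on the limit is to wait only until the current window resets instead of always sleeping a full 15 minutes. Twitter API v1.1 responses carry an `x-rate-limit-reset` header with the reset time in epoch seconds. The helper below is a sketch under that assumption; `seconds_until_reset` is a hypothetical name, and it falls back to 15 minutes when the header is missing or unreadable.

```python
import time


def seconds_until_reset(headers, now=None, default=15 * 60):
    """Seconds to wait until the rate-limit window resets.

    `headers` is a mapping such as a response's headers.
    """
    now = time.time() if now is None else now
    reset = headers.get('x-rate-limit-reset')
    if reset is None:
        return default
    try:
        return max(0.0, float(reset) - now)
    except ValueError:
        return default


print(seconds_until_reset({'x-rate-limit-reset': '1060'}, now=1000))  # 60.0
```

In the scripts above, `sleep(seconds_until_reset(res.headers))` could replace the fixed `sleep(15 * 60)` in the 429 branches.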
