Friends of friends, then friends of friends of friends... how far can you reach? Sometimes it turns out that two people you know are connected in a surprising way.

In this article, I tried to visualize the people who follow each other on Twitter.

What I did

I did two main things:

- Get information about the users an account is following
- Visualize it

Get information about users you are following

import json
import config
from requests_oauthlib import OAuth1Session
from time import sleep
from mongo_dao import MongoDAO
import datetime

# API key setting (defined in another file config.py)
CK = config.CONSUMER_KEY
CS = config.CONSUMER_SECRET
AT = config.ACCESS_TOKEN
ATS = config.ACCESS_TOKEN_SECRET

# Authentication process
twitter = OAuth1Session(CK, CS, AT, ATS)  

mongo = MongoDAO("db", "followers_info")

get_friends_url = "https://api.twitter.com/1.1/friends/list.json"  # Get the accounts a user is following
get_user_info_url = "https://api.twitter.com/1.1/users/show.json"  # Get user information
count = 200
targets = ['yurinaNECOPLA']
registed_list = []
depth = 2  # How deep to dive
max_friends_count = 1000  # Some accounts follow a huge number of users, so skip any account that follows more than this

# Determine if the number of follow accounts exceeds a certain number
def judge_friends_count(screen_name):
    params = {'screen_name': screen_name}
    while True:
        res = twitter.get(get_user_info_url, params=params)
        result_json = json.loads(res.text)
        if res.status_code == 200:
            # "friends_count" is the number of accounts this user follows; "followers_count" is the number of accounts following this user
            if result_json['friends_count'] > max_friends_count:
                return False
            else:
                return True
        elif res.status_code == 429:
            # Only 15 requests are allowed per 15 minutes, so wait when the limit is reached
            now = datetime.datetime.now()
            print(now.strftime("%Y/%m/%d %H:%M:%S") + ' wait for connection limit')
            sleep(15 * 60)  # wait 15 minutes
        else:
            return False

# Get the accounts that the specified screen_name follows (kept under the name followers_info)
def get_followers_info(screen_name):
    followers_info = []
    params = {'count': count,'screen_name': screen_name}
    while True:
        res = twitter.get(get_friends_url, params=params)
        result_json = json.loads(res.text)

        if res.status_code == 200 and len(result_json['users']) != 0:
            for user in result_json['users']:
                # Of the information returned by the API, keep only what is needed, as a dict (id is not used in this program)
                followers_info.append({'screen_name': user['screen_name'], 'id': user['id']})
            # Set the position of the next page in the parameters
            params['cursor'] = result_json['next_cursor']
        # Handling for when the API rate limit is exceeded
        elif res.status_code == 429:
            now = datetime.datetime.now()
            print(now.strftime("%Y/%m/%d %H:%M:%S") + ' wait for connection limit')
            sleep(15 * 60)  # wait 15 minutes
        else:
            break
    return followers_info

# Get list of screen_name only from list of dict
def followers_list(followers_info):
    followers_list = []
    for follower in followers_info:
        followers_list.append(follower['screen_name'])
    return followers_list

# Recursive processing
def dive_search(target_list, d):
    for name in target_list:
        if name in registed_list or not judge_friends_count(name):
            continue
        print(name)
        followers_info = get_followers_info(name)
        mongo.insert_one({'screen_name': name, 'followers_info': followers_info})
        registed_list.append(name)
        if depth > d:
            dive_search(followers_list(followers_info), d + 1)
        else:
            return
    
dive_search(targets, 0)

The program begins from a fixed starting account. (Here I started from the account of Yurina Aoshima of the idol group Necopla.)

After that, processing proceeds recursively as follows:

① Get information about the users the account is following
② Register the information from ① in MongoDB
③ Take the users obtained in ① one by one and repeat from ①
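The flow in ① to ③ can be tried without touching the Twitter API. The following is a minimal sketch, not the program above: `FOLLOWS`, `fetch_friends`, and `crawl` are hypothetical names, the follow data is made up, and a plain dict stands in for MongoDB.

```python
# Hypothetical in-memory follow data standing in for the Twitter API.
FOLLOWS = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["e"],
    "e": [],
}


def fetch_friends(name):
    """Stand-in for the API call: return the accounts that `name` follows."""
    return FOLLOWS.get(name, [])


def crawl(targets, max_depth, d=0, store=None):
    """Depth-limited recursive crawl following steps 1 to 3."""
    if store is None:
        store = {}
    for name in targets:
        if name in store:
            # Already registered, so do not fetch it again.
            continue
        friends = fetch_friends(name)   # step 1: get the followed accounts
        store[name] = friends           # step 2: register (dict instead of MongoDB)
        if d < max_depth:               # step 3: repeat for each followed account
            crawl(friends, max_depth, d + 1, store)
    return store


result = crawl(["a"], max_depth=1)
print(sorted(result))
```

With `max_depth=1` only the starting account and the accounts it follows are fetched; raising it to 2 also fetches the follow lists of friends of friends.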

You can change how deep the recursion goes by changing the value of depth.

With a value of 2, the idea is to fetch friends of friends. I really wanted to get more data, but the API for follow-related information only accepts 15 requests per 15 minutes. The starting account currently follows 100 accounts, and even starting from this account the run went on for about 3 hours. On top of that, the error "The existing connection was forcibly disconnected by the remote host" occurred partway through, and the process failed.
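To get a feel for why the limit of 15 requests per 15 minutes dominates the running time, here is a rough back-of-the-envelope sketch. `estimate_minutes` is a hypothetical helper; it assumes one request per account (a single page of results) and ignores the separate `users/show.json` calls, so it is only a lower bound.

```python
import math


def estimate_minutes(num_requests, per_window=15, window_minutes=15):
    """Lower bound on time spent waiting for rate-limit windows."""
    if num_requests <= 0:
        return 0
    windows_needed = math.ceil(num_requests / per_window)
    # Requests in the first window go out immediately;
    # every additional window costs one full wait.
    return (windows_needed - 1) * window_minutes


# 1 request for the starting account + 1 for each of the 100 accounts it follows
minutes = estimate_minutes(1 + 100)
print(minutes)
```

Accounts that follow more than 200 users need several pages, and each page is another request, so real runs take longer than this estimate.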

At that point, only about 60 of the 100 followed accounts had been processed. Even if the run had completed, I think it would have taken about 6 hours.
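One way to make a long run survive a dropped connection is to retry the failing call a few times before giving up. The sketch below is an assumption about how that could look, not something the program above does: `with_retry` and `flaky` are hypothetical names, and it catches Python's built-in `ConnectionError`. With `requests`, the error would more likely surface as `requests.exceptions.ConnectionError`, which would have to be caught instead.

```python
import time


def with_retry(func, retries=3, wait_seconds=5, sleep=time.sleep):
    """Call func(); on ConnectionError wait and retry, up to `retries` attempts."""
    last_error = None
    for attempt in range(retries):
        try:
            return func()
        except ConnectionError as err:
            last_error = err
            if attempt < retries - 1:
                sleep(wait_seconds)
    # Every attempt failed, so re-raise the last error.
    raise last_error


# Demo: a function that fails twice and then succeeds.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionResetError("connection reset")
    return "ok"


result = with_retry(flaky, sleep=lambda seconds: None)  # skip real waiting in the demo
print(result, calls["n"])
```

Because already fetched accounts are stored in MongoDB, a restart could also skip them instead of starting from scratch.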

The following code is used to register the data in MongoDB.

MongoDao
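The scripts in this article only call `MongoDAO(db, collection)`, `insert_one(...)`, and `find(filter=...)` followed by `.next()` on the result. As a rough illustration of that interface, here is an in-memory stand-in that makes it possible to try the rest of the code without a database. `InMemoryDAO` and `_Cursor` are hypothetical names; this is not the MongoDAO class referred to above and it does not talk to MongoDB.

```python
class _Cursor:
    """Tiny iterator that also offers .next(), which the scripts call on find() results."""

    def __init__(self, docs):
        self._it = iter(docs)

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    next = __next__


class InMemoryDAO:
    """Hypothetical stand-in exposing only the methods the scripts use."""

    def __init__(self, db_name, collection_name):
        self.db_name = db_name
        self.collection_name = collection_name
        self._docs = []

    def insert_one(self, document):
        # Store a shallow copy so later changes by the caller do not leak in.
        self._docs.append(dict(document))

    def find(self, filter=None):
        """Return a cursor over documents whose fields equal those in `filter`."""
        conditions = filter or {}
        matches = [
            doc for doc in self._docs
            if all(doc.get(key) == value for key, value in conditions.items())
        ]
        return _Cursor(matches)


dao = InMemoryDAO("db", "followers_info")
dao.insert_one({"screen_name": "a", "followers_info": [{"screen_name": "b", "id": 1}]})
doc = dao.find(filter={"screen_name": "a"}).next()
print(doc["followers_info"][0]["screen_name"])
```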

Visualize

As mentioned in the previous section, the data set is incomplete, but for now let's visualize what was collected.

The library used for visualization is NetworkX. It can be installed with the following command.

pip install networkx

import json
import networkx as nx
import matplotlib.pyplot as plt
from requests_oauthlib import OAuth1Session
from mongo_dao import MongoDAO

mongo = MongoDAO("db", "followers_info")
start_screen_name = 'yurinaNECOPLA'

# Create a new graph
G = nx.Graph()
# Add the starting node
G.add_node(start_screen_name)

depth = 3
processed_list = []

def get_followers_list(screen_name):
    result = mongo.find(filter={"screen_name": screen_name})
    followers_list = []
    try:
        doc = result.next()
        if doc != None:
            for user in doc['followers_info']:
                followers_list.append(user['screen_name'])
        return followers_list
    except StopIteration:
        return followers_list

def dive(screen_name, d):
    if depth > d:
        if screen_name in processed_list:
            return
        followers_list = get_followers_list(screen_name)
        for follower in followers_list:
            f = get_followers_list(follower)
            # Add an edge only when this account also follows the starting account
            if start_screen_name in f:
                G.add_edge(screen_name, follower)
                processed_list.append(screen_name)
                dive(follower, d + 1)
    else:
        return

dive(start_screen_name, 0)

# Creating a diagram. figsize is the size of the figure
plt.figure(figsize=(10, 8))
 
# Determine the layout of the figure. The smaller the value of k, the denser the figure
pos = nx.spring_layout(G, k=0.8)
 
# Drawing nodes and edges
# _color: Specify color
# alpha: Specifying transparency
nx.draw_networkx_edges(G, pos, edge_color='y')
nx.draw_networkx_nodes(G, pos, node_color='r', alpha=0.5)
 
# Add node name
nx.draw_networkx_labels(G, pos, font_size=10)
 
# Setting not to display X-axis and Y-axis
plt.axis('off')

plt.savefig("mutual_follow.png")
# Draw a diagram
plt.show()

The procedure and logic are similar to those used when collecting the follow data. We recursively walk the stored follow lists and add an edge when we find accounts that follow each other.
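The core idea of "following each other" can be shown in isolation. The following is a minimal sketch with made-up data; `mutual_pairs` is a hypothetical helper, and it checks mutual follows across the whole data set rather than reproducing the recursive `dive` function above.

```python
def mutual_pairs(follows):
    """Return sorted pairs (a, b) where a follows b and b follows a."""
    pairs = set()
    for a, friends in follows.items():
        for b in friends:
            if a != b and a in follows.get(b, []):
                # Sort the pair so (a, b) and (b, a) count as one edge.
                pairs.add(tuple(sorted((a, b))))
    return sorted(pairs)


# Made-up data: each key follows the accounts in its list.
follows = {
    "alice": ["bob", "carol"],
    "bob": ["alice"],
    "carol": ["dave"],
    "dave": ["carol", "alice"],
}

edges = mutual_pairs(follows)
print(edges)
```

A list of pairs like this could be handed to NetworkX with `G.add_edges_from(edges)` to build the same kind of undirected graph.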

Result

It turned out to be something like this.

I don't understand the detailed mechanism of the library, but accounts with many connections end up clustered together. The accounts in this cluster are idols who belong to the same agency, so I was satisfied with the result.

Impressions

The result was quite interesting. Since the Twitter API only allows 15 requests per 15 minutes, I couldn't collect very much data. With more time and more data, it might be possible to see the connections among friends of friends of friends.