I wanted to extract something like a "small group where all of each other is connected", that is, a community, from the entire network, so I did it with the python library "NetworkX".
https://networkx.github.io/
In human terms, it is a "group of people who are friends with each other", and it is a set that is a complete graph (all connected) as a complement of the entire network. It seems to be called "Clique" in graph theory.
Experiment with a small set of 6 nodes.
from networkx import *
import matplotlib.pyplot as plt
#raw data
data = [["A","B"],["A","C"],["C","B"],["D","C"],["D","E"],["C","E"],["E","F"]]
#Building nodes
G = nx.Graph()
for p in list(set([r[0] for r in data] + [r[1] for r in data])):
G.add_node(p)
#Edge construction
for d in data:
G.add_edge(d[0],d[1])
#drawing
pos = nx.spring_layout(G)
nx.draw_networkx_nodes(G, pos, node_size=1200,node_color="white")
nx.draw_networkx_edges(G, pos)
nx.draw_networkx_labels(G, pos)
plt.show()
Visualized graph structure
Now that we have a graph structure, let's detect the community.
cliques = nx.find_cliques(G)
#Standard output
print [ c for c in cliques]
Standard output
[['C', 'A', 'B'], ['C', 'E', 'D'], ['F', 'E']]
came out! You can extract the community properly!
If it is small data, it feels like Hoon, but if you do it with large data, there are various discoveries.
It is very fast because it can process about 100,000 data in a few seconds. However, clique detection seems to be a system processing whose processing time explodes as the number of nodes increases (probably a problem with NP), so if it is more big data, NetWork X may reach a plateau, but it is still a mystery.
https://en.wikipedia.org/wiki/Clique_problem
Perhaps serious people use Spark or something.
Recommended Posts