SimRank is a method of calculating (recursively) the similarity between nodes in a graph.
It has the feature that the similarity can be calculated without having a common adjacent node.
Papers Easy-to-understand explanation in Japanese Wikipedia
By the way, the second author of the SimRank paper is a super-famous DB researcher.
In the paper, there was a method to calculate SimRank for bipartite graph, so I implemented it as well.
simrank.py
import numpy as np
def simrank(G,C,n,t=10):
S = np.identity(n)
I = np.identity(n)
G = normalize(G)
i = 1
while True:
S = C * np.dot(np.dot(G.T,S),G) + (1-C) * I
for j in range(n):
S[j][j] = 1
if i >= t:
break
i += 1
return S
def bipertite_simrank(G,C,n,m,diag_1=True,t=10):
S1 = np.identity(n)
S2 = np.identity(m)
I1 = np.identity(n)
I2 = np.identity(m)
G_T = normalize(G.T)
G = normalize(G)
i = 1
while True:
S2 = C * np.dot(np.dot(G.T,S1),G) + (1-C) * I2
for j in range(m):
S2[j][j] = 1
S1 = C * np.dot(np.dot(G_T.T,S2),G_T) + (1-C) * I1
for j in range(n):
S1[j][j] = 1
if i >= t:
break
i += 1
return (S1,S2)
def normalize(G):
s = G.sum(0)
return G/s
if __name__ == '__main__':
C = 0.8
""" univ """
n = 5
G = np.zeros((n,n))
G[0][1] = 1
G[0][2] = 1
G[1][3] = 1
G[2][4] = 1
G[3][0] = 1
G[4][2] = 1
S = simrank(G,C,n)
print S
print '=================='
""" cake """
n = 2
m = 4
G = np.zeros((n,m))
G[0][0] = 1
G[0][1] = 1
G[0][2] = 1
G[1][1] = 1
G[1][2] = 1
G[1][3] = 1
S1, S2 = bipertite_simrank(G,C,n,m)
print S1
print S2
As a matter of fact, just calculate and update the matrix product.
The value of C is said to be about 0.8 in the paper, but it seems that about 0.6 is good in subsequent research (source is Wikipedia).
Also, the number of iterations is enough in the paper, but it seems that it is better to do more in subsequent research (source is Wikipedia).
>> python simrank.py
[[ 1. 0. 0.1321943 0. 0.032768 ]
[ 0. 1. 0.4131072 0. 0.10575544]
[ 0.1321943 0.4131072 1. 0.04096 0.08687124]
[ 0. 0. 0.04096 1. 0.33048576]
[ 0.032768 0.10575544 0.08687124 0.33048576 1. ]]
==================
[[ 1. 0.54658196]
[ 0.54658196 1. ]]
[[ 1. 0.61863088 0.61863088 0.43726175]
[ 0.61863088 1. 0.61863088 0.61863088]
[ 0.61863088 0.61863088 1. 0.61863088]
[ 0.43726175 0.61863088 0.61863088 1. ]]
It matches the results in the paper (Figure 1, Figure 2), so I'm sure it matches!
Recommended Posts