Implemented label propagation method in Python

Preface

A technique for estimating the label of a node on a graph.

lp.py was implemented by referring to here. Or rather, almost as it is.

lp_original.py was implemented by referring to Chapter 2 of Paper.

In lp_original.py, we also implemented a method to calculate the iterative and iterative matrix product without solving the simultaneous equations (lp_iter).

Implementation

lp.py


# -*- coding: utf-8 -*-
import sys
from scipy.sparse import *
from scipy.sparse.linalg import spsolve

def load_adjucency(filename,n):
    W = lil_matrix((n,n))
    for line in open(filename, 'r'):
        entries = line.rstrip().split(' ')
        src = int(entries[0])
        dst = int(entries[1])
        W[src,dst] = 1
    return W

def load_labels(filename,n):
    y = lil_matrix((1,n))
    for line in open(filename, 'r'):
        entries = line.rstrip().split(' ')
        nid = int(entries[0])
        label = int(entries[1])
        y[0,nid] = label
    return y

n = int(sys.argv[1])
W = load_adjucency(sys.argv[2],n)
y = load_labels(sys.argv[3],n)

D = dia_matrix((W.sum(0),[0]), W.shape)
L = D - W

I = identity(W.shape[0])

lam = 1
f = spsolve((I + lam * L), y)
print f

lp_original.py


# -*- coding: utf-8 -*-
import sys
from scipy.sparse import *
from scipy.sparse.linalg import spsolve
from sklearn.preprocessing import normalize

def load_matrix(filename,n,m):
    W = lil_matrix((n+m,n+m))
    for line in open(filename, 'r'):
        entries = line.rstrip().split(' ')
        src = int(entries[0])
        dst = int(entries[1])
        W[src,dst] = 1
    W = normalize(W,norm='l1',axis=1)
    Wul = W[n:n+m,0:n]
    Wuu = W[n:n+m,n:n+m]
    return (Wuu,Wul)

def load_labels(filename,n):
    y = lil_matrix((n,1))
    i = 0
    for line in open(filename, 'r'):
        label = int(line.rstrip())
        y[i,0] = label
        i += 1
    return y

def lp_solve(Wuu,Wul,y):
    return spsolve((identity(m) + Wuu), Wul*y)

def lp_iter(Wuu,Wul,y,t=10):
    i = 0
    f = lil_matrix((Wuu.shape[0],1))
    while True:
        f = Wuu * f + Wul * y
        if i > t:
            break
        i += 1
    return f

""" # of labeled nodes """
n = int(sys.argv[1])
""" # of unlabeled nodes """
m = int(sys.argv[2])
""" propagation matrix (row normalized) """
Wuu,Wul = load_matrix(sys.argv[3],n,m)
""" labels of labeled nodes """
y = load_labels(sys.argv[4],n)


print lp_solve(Wuu,Wul,y)
print '============='
print lp_iter(Wuu,Wul,y)

How to use

lp.py is executed by passing the number of nodes, the file path of the edge list, and the file path of the node label in this order. The data used is the same as the blog I referred to.

lp.Execution example of py


$ cat edge_list
0 2
2 0
0 4
4 0
0 9
9 0
1 2
2 1
2 3
3 2
3 4
4 3
3 6
6 3
5 9
9 5
6 7
7 6
6 8
8 6
7 9
9 7
$ cat labels
0 1
2 1
4 1
5 -1
6 -1
7 -1
$ python lp.py 10 edge_list labels
[ 0.45966011  0.23023256  0.46046512  0.1519678   0.5372093  -0.57951699
 -0.38980322 -0.51627907 -0.19490161 -0.15903399]

lp_original.py is executed by passing the number of labeled nodes n, the number of unlabeled nodes m, the file path of the edge list, and the file path of the edge list in this order.

However, the node numbers must be from 0 to n-1 to labeled nodes and from n to n + m to unlabeled nodes.

In this example, the correspondence between the node number passed to lp.py and the node number passed to lp_original.py is as follows.

lp.py [0,1,2,3,4,5,6,7,8,9] lp_original.py [0,6,1,7,2,3,4,5,8,9]

Therefore, each line of the label file contains the labels of the labeled nodes from 0 to n.

With this implementation, you only need to use the edge from the unlabeled node to the labeled node and the edge from the unlabeled node to the unlabeled node.

lp_original.Execution example of py


$ cat edge_list_for_original
9 0
6 1
7 1
7 2
7 4
9 3
8 4
9 5
$ cat labels_for_original
1
1
1
-1
-1
-1
$ python lp_original.py 6 4 edge_list_for_original labels_for_original
[ 1.          0.33333333 -1.         -0.33333333]
=============
  (0, 0)	1.0
  (1, 0)	0.333333333333
  (2, 0)	-1.0
  (3, 0)	-0.333333333333

The result is different from lp.py, but the tendency is the same.

Recommended Posts

Implemented label propagation method in Python
Implemented SimRank in Python
Simplex method (simplex method) in Python
Private method in python
Implemented Shiritori in Python
Sudoku solver implemented in Python 3
Implement method chain in Python
Suppressing method overrides in Python
6 Ball puzzle implemented in python
Implemented k-nearest neighbor method in python from scikit learn
Implemented image segmentation in python (Union-Find)
Try implementing extension method in python
Widrow-Hoff learning rules implemented in Python
Simulate Monte Carlo method in Python
Hash method (open address method) in Python
Implemented Perceptron learning rules in Python
Implemented in 1 minute! LINE Notify in Python
Let's build a belief propagation method (Python)
A simple HTTP client implemented in Python
Method to build Python environment in Xcode 6
Electron Microscopy Simulation in Python: Multislice Method (1)
Electron Microscopy Simulation in Python: Multislice Method (2)
I implemented Cousera's logistic regression in Python
Implemented in Python PRML Chapter 5 Neural Networks
Implemented Stooge sort in Python3 (Bubble sort & Quicksort)
Implemented in Python PRML Chapter 1 Bayesian Inference
Alignment algorithm by insertion method in Python
Quadtree in Python --2
Python in optimization
Metaprogramming in Python
Python 3.3 in Anaconda
Geocoding in python
SendKeys in Python
Meta-analysis in Python
Unittest in python
Epoch in Python
Discord in Python
Sudoku in Python
nCr in python
N-Gram in Python
Programming in python
Plink in Python
Constant in python
Lifegame in Python.
FizzBuzz in Python
Sqlite in python
StepAIC in Python
N-gram in python
LINE-Bot [0] in Python
Csv in python
Disassemble in Python
Reflection in Python
Constant in python
nCr in Python.
format in python
Scons in Python3
Puyo Puyo in python
python in virtualenv
PPAP in Python
Quad-tree in Python
Reflection in Python