The friends-of-friends algorithm is very simple and has only one parameter, the threshold distance r. Two points closer than r are "friends", and a cluster is any set of points connected through a chain of friends.
pyfof is a library for fast friends-of-friends cluster finding in Python. Rather than implementing the algorithm naively, it apparently gets its speed from an R*-tree (I don't know the details).
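To make the idea concrete, here is a minimal brute-force sketch of friends-of-friends using only the standard library. It is not how pyfof works internally (no R*-tree, and it checks every pair, so it is O(n^2)). The function name `naive_friends_of_friends` is made up for this illustration, and treating a distance of exactly r as "friends" is an assumption.

```python
import math

def naive_friends_of_friends(points, r):
    """Brute-force friends-of-friends: check all pairs, merge with union-find."""
    n = len(points)
    parent = list(range(n))  # every point starts as its own group

    def find(i):
        # Walk up to the group's representative, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            # Assumption: points at Euclidean distance <= r are friends.
            if math.dist(points[i], points[j]) <= r:
                parent[find(i)] = find(j)

    # Collect point indices by the representative of their group.
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())

points = [(0.0, 0.0), (0.3, 0.0), (0.6, 0.0), (2.0, 2.0), (2.2, 2.0)]
groups = naive_friends_of_friends(points, 0.4)
print(groups)  # [[0, 1, 2], [3, 4]]
```

Points 0 and 2 are 0.6 apart, which is more than r = 0.4, but they still end up in the same group because point 1 is a friend of both. That chaining is the whole algorithm.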
Install it with pip:
python
pip install pyfof
This installed without any problems (on Google Colab, 2020-08-19).
python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import pyfof
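npts = 10000
ndim = 2
nptsperdim = npts // ndim  # number of points in each Gaussian blob
# Two Gaussian blobs (sigma = 0.2) centered at (-1, -1) and (1, 1)
data = np.vstack((np.random.normal(-1, 0.2, (nptsperdim, ndim)),
                  np.random.normal(1, 0.2, (nptsperdim, ndim))))
# Cluster with threshold r = 0.4; each group holds indices into data
groups = pyfof.friends_of_friends(data, 0.4)
colors = cm.rainbow(np.linspace(0, 1, len(groups)))
# Plot each group in its own color
for g, c in zip(groups, colors):
    plt.scatter(data[g, 0], data[g, 1], color=c, s=3)
plt.show()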
As a result, the points are neatly divided into two classes.
Next, why not put another class in the middle?
python
npts = 10000
ndim = 2
nptsperdim = npts // ndim  # number of points in each Gaussian blob
# Three Gaussian blobs (sigma = 0.2): the two from before plus one at the origin
data = np.vstack((np.random.normal(-1, 0.2, (nptsperdim, ndim)),
                  np.random.normal(1, 0.2, (nptsperdim, ndim)),
                  np.random.normal(0., 0.2, (nptsperdim, ndim))))
# A threshold of 0.4 is too large here: everything is put into the same class.
groups = pyfof.friends_of_friends(data, 0.4)
colors = cm.rainbow(np.linspace(0, 1, len(groups)))
for g, c in zip(groups, colors):
    plt.scatter(data[g, 0], data[g, 1], color=c, s=3)
plt.show()
This time all the points have the same color, meaning they were all assigned to the same class.
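Colors can be hard to judge by eye, so it may help to check the result numerically as well. The sketch below assumes `groups` is a list of index lists, which is what the plotting loop (`data[g,0]`) implies. `summarize_groups` is a hypothetical helper, not part of pyfof, and `example_groups` is hand-written stand-in data rather than real output.

```python
def summarize_groups(groups):
    """Return (number of groups, group sizes sorted from largest to smallest)."""
    sizes = sorted((len(g) for g in groups), reverse=True)
    return len(groups), sizes

# Stand-in for the list of index lists produced by the clustering step.
example_groups = [[0, 1, 2, 5], [3, 4], [6]]
n_groups, sizes = summarize_groups(example_groups)
print(n_groups, sizes)  # 3 [4, 2, 1]
```

Calling `summarize_groups(groups)` on the real result shows directly whether everything collapsed into one class, and also reveals tiny groups that are easy to miss in a scatter plot.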
Let's change the parameters a little.
python
npts = 10000
ndim = 2
nptsperdim = npts // ndim  # number of points in each Gaussian blob
# Reduce the Gaussian sigma from 0.2 to 0.1.
data = np.vstack((np.random.normal(-1, 0.1, (nptsperdim, ndim)),
                  np.random.normal(1, 0.1, (nptsperdim, ndim)),
                  np.random.normal(0., 0.1, (nptsperdim, ndim))))
# Lower the threshold to 0.2 as well, to go with the smaller sigma above.
groups = pyfof.friends_of_friends(data, 0.2)
colors = cm.rainbow(np.linspace(0, 1, len(groups)))
for g, c in zip(groups, colors):
    plt.scatter(data[g, 0], data[g, 1], color=c, s=3)
plt.show()
This time the points were properly classified into 3 classes. With the Gaussian sigma reduced from 0.2 to 0.1 and the threshold reduced from 0.4 to 0.2, the clusters are tighter and no longer get linked to each other.
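The effect of the threshold is easiest to see in one dimension, where friends-of-friends reduces to sorting the values and splitting wherever the gap between neighbors exceeds r. The toy sketch below uses hand-picked values instead of the random 2D Gaussians above so that the counts are reproducible. `fof_1d` is a name made up for this illustration, and treating a gap of exactly r as linked is an assumption.

```python
def fof_1d(values, r):
    """Friends-of-friends in 1D: sort, then split wherever a gap exceeds r."""
    ordered = sorted(values)
    if not ordered:
        return []
    groups = [[ordered[0]]]
    for prev, cur in zip(ordered, ordered[1:]):
        if cur - prev <= r:
            groups[-1].append(cur)   # close enough: same group as the previous value
        else:
            groups.append([cur])     # gap too wide: start a new group
    return groups

# Three clumps around -1, 0 and 1; gaps are about 0.2 inside a clump, 0.6 between clumps.
values = [-1.2, -1.0, -0.8, -0.2, 0.0, 0.2, 0.8, 1.0, 1.2]
for r in (0.1, 0.3, 0.7):
    print(r, len(fof_1d(values, r)))
```

This prints 9, 3 and 1 groups: a threshold below the spacing inside a clump isolates every point, one between the two spacings recovers the three clumps, and one above the clump separation merges everything, which is the same behavior seen with 0.4 in the three-class example.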
The code can also be viewed on Google Colab.