Implementation and experiments with the convex clustering method

The code I wrote

It's a crude result, but I have published the code on GitHub.

References

- [1] "Continued: Easy-to-Understand Pattern Recognition (An Introduction to Unsupervised Learning)" (a very easy-to-understand book!)
- [2] D. Lashkari and P. Golland: "Convex clustering with exemplar-based models" (the original paper on the convex clustering method)

My rough understanding

The convex clustering method is a clustering method for the case where the number of classes is unknown, based on parameter estimation of a mixture of normal distributions.

The difference from parameter estimation of a mixture of normal distributions via the usual EM algorithm (hereinafter, soft K-means) lies in which parameters must be given initial values:

| Method | Initial values required |
| --- | --- |
| soft K-means | Number of normal distributions and the mean of each distribution |
| Convex clustering method | Variance of the normal distributions (shared across all distributions) |

Therefore, it has the advantage that the solution depends less on initial values than soft K-means.

All observed data points are regarded as centroids, and the likelihood of each data point under the normal distribution formed by each centroid is calculated. This likelihood calculation is done only once; afterwards, only the prior probabilities are updated. As the updates proceed, the prior probabilities of centroids that fail to form a distribution approach zero, and the number of centroids whose prior probability remains above a certain level becomes the final number of classes.
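In my understanding, this can be written as follows (my notation, sketched after [2]): every sample x_j doubles as a centroid with a fixed normal density f_j(x) = \mathcal{N}(x \mid x_j, \sigma^2 I), and only the mixture weights (priors) q_j are updated by the EM-style step

q_j \leftarrow \frac{1}{n} \sum_{i=1}^{n} \frac{q_j f_j(x_i)}{\sum_{k=1}^{n} q_k f_k(x_i)}

where n is the number of samples. Centroids that explain little of the data see their q_j shrink toward zero; this step is exactly what calculatePrior below computes.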

Motivation for implementation

I became interested because it was introduced in the literature [1]. It is attractive in that it seems easy to implement and seems to depend little on initial values. As an approach for hobby toy projects, I experimented with it expecting it to be easier to use than nonparametric Bayes.

Implementation

Implemented in Python + NumPy. Only the parameter-update part of the convex clustering method is shown.

convex_clustering.py


import numpy as np


def calculateLikelihoodCombination(x, sigma, gaussianFunc):
    # Likelihood of every sample under the Gaussian centered at every sample.
    likelihood = np.empty(0)
    for i1 in x:
        # gaussianFunc returns the densities of all samples under N(i1, sigma^2).
        likelihood = np.append(likelihood, gaussianFunc(x, i1, sigma))
    samples = len(x)
    # Row j holds the likelihoods of all samples under centroid j.
    return np.reshape(likelihood, (samples, samples))


def calculatePrior(prePrior, likelihood):
    samples = len(prePrior)
    # Numerator of the responsibilities: prior_j * likelihood[j, i].
    numerator = (prePrior * likelihood.T).T
    # Denominator: mixture density of each sample, guarded against division by zero.
    denominator = np.sum(numerator, axis=0) + 1e-10
    # New prior of each centroid: its average responsibility over all samples.
    return np.sum(numerator / denominator, axis=1) / samples


def runConvexClustering(x, hyperSigma, gaussian, plotter):
    # Pre-compute the likelihoods once; they stay fixed during the updates.
    likelihood = calculateLikelihoodCombination(x, hyperSigma, gaussian)
    # Initialize the prior probabilities uniformly.
    classNum = len(x)  # The number of samples is used as the initial number of classes.
    firstPrior = 1.0 / classNum  # Any non-zero value works.
    prior = np.repeat(firstPrior, classNum)

    # Update only the prior probabilities.
    for dummy in range(1000):
        prior = calculatePrior(prior, likelihood)

It's simple enough to fit in a few dozen lines. I think it's even shorter than the soft K-means code I once wrote locally.
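For reference, here is a minimal sketch of how the functions above might be driven on 1D data. The gaussian helper and the toy data are my assumptions; the original gaussianFunc and the plotting code are not shown here.

import numpy as np

def gaussian(x, mean, sigma):
    # Density of each element of x under N(mean, sigma^2) (assumed signature).
    return np.exp(-0.5 * ((x - mean) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

np.random.seed(0)
# Two well-separated 1D clusters, 100 points each (toy data).
x = np.concatenate([np.random.normal(-5.0, 1.0, 100),
                    np.random.normal(5.0, 1.0, 100)])

likelihood = calculateLikelihoodCombination(x, 1.0, gaussian)
prior = np.repeat(1.0 / len(x), len(x))
for _ in range(1000):
    prior = calculatePrior(prior, likelihood)
# `prior` now concentrates its mass on the few samples that act as class centroids.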

Experimental conditions

| Factor | Levels |
| --- | --- |
| Data | \mathbb{R}^{1} (1D data), \mathbb{R}^{2} (2D data) |
| Number of normal distributions used for sampling | 4 |
| Number of samples from one normal distribution | 100 points |
| Scale of the initial-value variance (multiplier) | normal (×1), large (×2), very large (×10) |

The total number of samples in one experiment is 4 × 100 = 400 points. The multiplier on the initial-value variance is relative to the standard deviation of the normal distributions used for sampling.
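For concreteness, the sampling setup might look like the following sketch (the means and their spacing are my own choices; only the factor levels come from the table above):

import numpy as np

np.random.seed(0)
N, STD = 100, 1.0  # 100 samples per distribution; sampling standard deviation

# 1D data: four normal distributions -> 4 * 100 = 400 points.
means_1d = [-6.0, -2.0, 2.0, 6.0]
x_1d = np.concatenate([np.random.normal(m, STD, N) for m in means_1d])

# 2D data: four isotropic normal distributions.
means_2d = [(-4.0, -4.0), (-4.0, 4.0), (4.0, -4.0), (4.0, 4.0)]
x_2d = np.concatenate(
    [np.random.multivariate_normal(m, STD ** 2 * np.eye(2), N) for m in means_2d])

# Initial-value variance levels, as multiples of the sampling standard deviation.
hyper_sigma = {"normal": 1.0 * STD, "large": 2.0 * STD, "very large": 10.0 * STD}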

result

The following table shows the clustering results. The rows are the data and the columns are the scale of the initial-value variance.

| | normal | large | very large |
| --- | --- | --- | --- |
| \mathbb{R}^{1} | figure_1.png | figure_2.png | figure_3.png |
| \mathbb{R}^{2} | figure_4.png | figure_5.png | figure_6.png |

Blue shows the samples, and green shows the shape of the convex combination of the normal distributions generated from the class centroids found by the convex clustering method.
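For the 1D case, the green curve can be computed as the prior-weighted convex combination of the centroid Gaussians. A sketch (my plotting choices; x_1d and STD come from the sampling sketch above, and prior from running the update loop on x_1d):

import numpy as np
import matplotlib.pyplot as plt

def mixtureDensity(grid, centroids, priors, sigma):
    # Convex combination of Gaussians centered at the centroids, weighted by the priors.
    density = np.zeros_like(grid)
    for q, c in zip(priors, centroids):
        density += q * np.exp(-0.5 * ((grid - c) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    return density

grid = np.linspace(x_1d.min() - 3.0, x_1d.max() + 3.0, 500)
plt.hist(x_1d, bins=50, density=True, color="blue", alpha=0.5)  # samples (blue)
plt.plot(grid, mixtureDensity(grid, x_1d, prior, STD), color="green")  # mixture (green)
plt.show()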

In addition, the number of classes remaining after the cut is shown below (a minimal sketch of this cut follows the table). As before, the rows are the data and the columns are the scale of the initial-value variance.

| | normal | large | very large |
| --- | --- | --- | --- |
| \mathbb{R}^{1} | 46 | 40 | 5 |
| \mathbb{R}^{2} | 23 | 11 | 15 |
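The cut here is a simple threshold on the converged priors; the threshold value below is my assumption and is not taken from [1] or [2]:

# Centroids whose prior stays above a small threshold define the final classes.
threshold = 1e-4  # hypothetical cutoff
survivors = prior > threshold
print("number of classes:", np.count_nonzero(survivors))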

Consideration

When the initial-value variance is [normal] or [large], the data seem to be clustered reasonably well. If anything, clustering looks more accurate with [large]. When applying this to a real problem, it may be better to use a somewhat larger variance than expected. When the initial-value variance is [very large], the four distributions are clustered as a single distribution, but this is a fairly predictable result.

The number of centroids for the 2D data is closer to the number of normal distributions used to generate the samples (= 4) than for the 1D data. This is also mentioned in Ref. [2]: the wider the space being searched, the less the convex clustering method depends on initial values. However, I am a little worried that the number of centroids did not get as close to 4 as in the literature [1][2]. # I sincerely hope it is not an implementation bug.

Summary

What I felt about the convex clustering method through this implementation and experimentation is that "the implementation is very easy, but applying it to real problems seems to require some tricks." The discussions in [1][2] cover its advantages and disadvantages, and depending on the problem it may be difficult to use. I think it is a cost-effective clustering method, but I would like to consider more carefully whether it is the right approach for the problem I want to solve.
