Summary

There is a dimension reduction algorithm for data visualization called t-SNE.
Since t-SNE is an algorithm that is conscious of the local structure of data, it is compatible with DBSCAN, which is also a clustering method that is conscious of the local structure.
If the DBSCAN results are further classified by a decision tree, can the characteristics of the clusters that appear in the data visualized by t-SNE be explained automatically?

I tried it with the idea. In fact, I haven't figured out how it can be applied.

Try it with sklearn's load_boston

First import what you need

import numpy as np
from sklearn import datasets
from sklearn.manifold import TSNE
from matplotlib import pyplot as plt

Load boston and try to visualize it with TSNE Visually see some clusters appear

boston = datasets.load_boston()
model = TSNE(n_components=2)
tsne_result = model.fit_transform(boston.data) 
plt.plot(tsne_result[:,0], tsne_result[:,1], ".")

Let's cluster with kmeans once for comparison

from sklearn.cluster import MiniBatchKMeans
#Number of clusters`n_clusters`Looked at the TSNE graph and decided by feeling
kmeans = MiniBatchKMeans(n_clusters=10, max_iter=300)
kmeans_tsne = kmeans.fit_predict(tsne_result)

#Color it nicely
color=cm.brg(np.linspace(0,1,np.max(kmeans_tsne) - np.min(kmeans_tsne)+1))
for i in range(np.min(kmeans_tsne), np.max(kmeans_tsne)+1):
    plt.plot(tsne_result[kmeans_tsne == i][:,0],
             tsne_result[kmeans_tsne == i][:,1],
             ".",
             color=color[i]
             )
    plt.text(tsne_result[kmeans_tsne == i][:,0][0],
             tsne_result[kmeans_tsne == i][:,1][0],
             str(i), color="black", size=16
             )

Clusters (1,5), (2,8), and (4,7,9) are split, but structurally connected, which is not very desirable (for me).

Try clustering with DBSCAN

from sklearn.cluster import DBSCAN
# `eps`Is the result of trial and error
dbscan = DBSCAN(eps=3)
dbscan_tsne = dbscan.fit_predict(tsne_result)

#Color it nicely
color=cm.brg(np.linspace(0,1,np.max(dbscan_tsne) - np.min(dbscan_tsne)+1))
for i in range(np.min(dbscan_tsne), np.max(dbscan_tsne)+1):
    plt.plot(tsne_result[dbscan_tsne == i][:,0],
             tsne_result[dbscan_tsne == i][:,1],
             ".",
             color=color[i+1]
             )
    plt.text(tsne_result[dbscan_tsne == i][:,0][0],
             tsne_result[dbscan_tsne == i][:,1][0],
             str(i), color="black", size=16
             )

In DBSCAN, it is desirable because the connected islands are in the same cluster. (-1 is a cluster that contains things that are out of order)

In addition, generate a decision tree to try to explain each cluster well.

from sklearn import tree
clf = tree.DecisionTreeClassifier()
#dbscan-The label is because 1 cluster is generated-Start from 1
clf.classes_ = np.max(dbscan_tsne) - np.min(dbscan_tsne) + 1
clf.fit(boston.data, dbscan_tsne)

#Generate a graphviz dot file
with open("boston_tsne_dt.dot", 'w') as f:
    tree.export_graphviz(
        clf,
        out_file=f,
        feature_names=boston.feature_names,
        filled=True,
        rounded=True,  
        special_characters=True,
        impurity=False,
        proportion=False,
        class_names=map(str, range(-1, np.max(dbscan_tsne) - np.min(dbscan_tsne)+1))
    )

dot -T png boston_tsne_dt.dot > boston_tsne_dt.png

The result is shown in the figure below.

For reference, draw the target (house price) of each cluster.

plt.boxplot([boston.target[dbscan_tsne == i]
             for i in range(np.min(dbscan_tsne), 
                            np.max(dbscan_tsne)+1)],
            labels=range(np.min(dbscan_tsne), 
                         np.max(dbscan_tsne)+1)
            )

Consideration

To summarize what I was interested in,

Clusters with high tax rates (TAX) (6,7) do not mean higher prices
In other words, isn't the price high because it's an urban area?
Clusters with high prices (0,1,2) are located in places with low tax rates, and are subtly separated by the ratio of residential land (ZN) and age group (AGE).
However, the wide range of prices makes the cluster and price less relevant.
The price distribution of clusters (2,5) is close, but they are divided by tax rate.
Maybe the tax rate has some other characteristics linked to it, so did the TSNE results make a clear distinction between the two?
Among the clusters with low tax rates, 3 is a special cluster

However, when it comes to providing some information with this, I feel quite suspicious. By the way, even if you mix boston.target with the original data, the result will be quite close.

Perform (Visualization> Clustering> Feature Description) with (t-SNE, DBSCAN, Decision Tree)

Summary

Try it with sklearn's load_boston

Consideration