Summary

This is what I built.

The example is a scatter plot of MNIST reduced to two dimensions by t-SNE. Hover the mouse cursor over a point to see the image that corresponds to that point.

Environment

Python 3.7.3 (Anaconda)
matplotlib 3.1.0
numpy 1.16.4
scikit-learn 0.21.2

First of all, t-SNE processing

This article focuses on how to use matplotlib. The explanation of MNIST and t-SNE is omitted.

However, the variable names and the way the arrays were created matter for the rest of the code, so I post that part here.

import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.manifold import TSNE

width = 28
nskip = 35

mnist = fetch_openml("mnist_784", version=1)
mnist_img = mnist["data"][::nskip, :]
mnist_label = mnist["target"][::nskip]
mnist_int = np.asarray(mnist_label, dtype=int)

x_embedded = TSNE(n_components=2).fit_transform(mnist_img)

width is the width of the image in pixels. nskip is the subsampling stride, i.e. the reciprocal of the sampling fraction. The full dataset has 70,000 samples, which is too many to plot, so I keep every 35th sample and use 2,000 samples.

The other arrays are as follows.

mnist_img: an array of shape (2000, 784) of double-precision floating-point numbers. Raw image data, stored as pixel values from 0 to 255.
mnist_label: an array of shape (2000,). The digit labels are stored as character strings.
mnist_int: an array of shape (2000,). mnist_label converted to integer type.
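As a quick check of the shapes described above, here is a minimal sketch that applies the same slicing and type conversion to stand-in arrays. The names fake_img and fake_label are hypothetical, and the image array uses 4 columns instead of 784 only to keep the example small; no data is downloaded.

```python
import numpy as np

n_total, nskip = 70000, 35

# Stand-ins for mnist["data"] and mnist["target"] (hypothetical, not real MNIST).
fake_img = np.zeros((n_total, 4))
fake_label = np.array([str(i % 10) for i in range(n_total)], dtype=object)

# Keep every nskip-th sample, exactly as in the article's code.
sub_img = fake_img[::nskip, :]
sub_label = fake_label[::nskip]

# String labels such as "5" become integers such as 5.
sub_int = np.asarray(sub_label, dtype=int)

print(sub_img.shape, sub_label.shape, sub_int.dtype)
```

Note that the slice is a fixed stride, not a random draw, so the same 2,000 samples are selected on every run.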

Ordinary plot

This is probably the second most straightforward way to plot the result.

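    import matplotlib.pyplot as plt

    # plt.text does not autoscale the axes, so set the limits explicitly.
    plt.xlim(x_embedded[:, 0].min(), x_embedded[:, 0].max())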
    plt.ylim(x_embedded[:, 1].min(), x_embedded[:, 1].max())
    for x, label in zip(x_embedded, mnist_label):
        plt.text(x[0], x[1], label)
    plt.xlabel("component 0")
    plt.ylabel("component 1")
    plt.show()


The idea of plotting the labels as text comes from https://qiita.com/stfate/items/8988d01aad9596f9d586.

If you simply use scatter, the x and y axis ranges are adjusted automatically. Because this method places text at each point instead, you must set xlim and ylim yourself. Still, you can see at a glance that a cluster forms for each digit and that other digits are occasionally mixed in like noise.
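The difference in axis handling can be seen in a small sketch. It assumes the headless Agg backend and uses made-up coordinates in place of x_embedded.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs without a window
import matplotlib.pyplot as plt
import numpy as np

points = np.array([[-30.0, 5.0], [12.0, -40.0], [45.0, 22.0]])  # made-up coordinates

# Text only: text artists do not take part in autoscaling,
# so the x limits stay at matplotlib's default.
fig_text, ax_text = plt.subplots()
for x, y in points:
    ax_text.text(x, y, "7")
text_xlim = ax_text.get_xlim()

# Scatter: the limits are autoscaled to cover the data.
fig_sc, ax_sc = plt.subplots()
ax_sc.scatter(points[:, 0], points[:, 1])
scatter_xlim = ax_sc.get_xlim()

print(text_xlim, scatter_xlim)
```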

Look at this

I would rather draw the points as plain points. When the digits overlap, the plot becomes hard to read.
However, keeping track of the dot colors and of which color corresponds to which digit is tedious.
So it would be better if moving the cursor to a point of interest displayed its details.

That was my thinking. So, first, let's display the label on mouse-over.

Annotation display by mouse over

I found the answer on Stack Overflow.

https://stackoverflow.com/questions/7908636/possible-to-make-labels-appear-when-hovering-over-a-point-in-matplotlib

Adapt this code to MNIST.

    fig, ax = plt.subplots()
    cmap = plt.cm.RdYlGn

    # vmin/vmax pin the color scale to [0, 1]. Without them scatter rescales
    # the values to their own min/max, and the point colors would not match
    # cmap(label / 10) used for the annotation box below.
    sc = plt.scatter(x_embedded[:, 0], x_embedded[:, 1], c=mnist_int/10.0,
                     cmap=cmap, vmin=0.0, vmax=1.0, s=3)
    # Empty annotation, hidden until the cursor is over a point.
    annot = ax.annotate("", xy=(0,0), xytext=(20,20), textcoords="offset points",
                        bbox=dict(boxstyle="round", fc="w"),
                        arrowprops=dict(arrowstyle="->"))
    annot.set_visible(False)

    def update_annot(ind):
        # Use the first point found under the cursor.
        i = ind["ind"][0]
        pos = sc.get_offsets()[i]
        annot.xy = pos
        text = mnist_label[i]
        annot.set_text(text)
        annot.get_bbox_patch().set_facecolor(cmap(int(text)/10))

    def hover(event):
        vis = annot.get_visible()
        if event.inaxes == ax:
            # cont is True when the cursor is over at least one point.
            cont, ind = sc.contains(event)
            if cont:
                update_annot(ind)
                annot.set_visible(True)
                fig.canvas.draw_idle()
            else:
                if vis:
                    annot.set_visible(False)
                    fig.canvas.draw_idle()

    fig.canvas.mpl_connect("motion_notify_event", hover)

    plt.show()

This uses the kind of event mechanism that is common in GUI toolkits. First, create an empty Annotation object annot, then define a function update_annot that updates its position, text, and so on. Register the hover function with fig.canvas.mpl_connect("motion_notify_event", hover). Inside hover, if the cursor is over a point, call update_annot and show annot; if it is not over any point, hide annot.

To pass c = mnist_int / 10.0 as the color in scatter, I prepared mnist_int, an array that holds the labels as integers.
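To make the event flow concrete without opening a window, here is a minimal sketch that registers a hover handler and then feeds it two synthetic mouse events. It assumes the Agg backend and made-up points; constructing MouseEvent by hand and dispatching it through fig.canvas.callbacks.process is only a way to simulate the cursor for illustration and is not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs without a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.backend_bases import MouseEvent

points = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.5]])  # made-up data
fig, ax = plt.subplots()
sc = ax.scatter(points[:, 0], points[:, 1], s=30)
fig.canvas.draw()  # finalize axis limits and transforms before computing pixels

hits = []  # index reported by the handler for each event (None if no point)

def hover(event):
    # contains() returns (bool, {"ind": indices of the points under the cursor})
    cont, ind = sc.contains(event)
    hits.append(int(ind["ind"][0]) if cont else None)

fig.canvas.mpl_connect("motion_notify_event", hover)

# Simulate the cursor exactly on point 1, then on an empty spot.
x_on, y_on = ax.transData.transform(points[1])
x_off, y_off = ax.transData.transform((0.5, 0.9))
for x_pix, y_pix in ((x_on, y_on), (x_off, y_off)):
    event = MouseEvent("motion_notify_event", fig.canvas, x_pix, y_pix)
    fig.canvas.callbacks.process("motion_notify_event", event)

print(hits)
```

In an interactive session the GUI backend produces these events itself; the handler code is the same.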

Now you can draw an interactive scatter plot like the one in the video above.

I skipped it this time, but it would be kinder to display a legend that maps colors to digits somewhere.
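One possible way to add such a legend is sketched below. It uses random stand-in data instead of the t-SNE result, builds one proxy handle per digit with Line2D, and passes vmin=0.0 and vmax=1.0 to scatter so that the point colors equal cmap(digit / 10). The variable names xy and digits are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs without a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.lines import Line2D

rng = np.random.default_rng(0)
xy = rng.normal(size=(200, 2))          # stand-in for x_embedded
digits = rng.integers(0, 10, size=200)  # stand-in for mnist_int

cmap = plt.cm.RdYlGn
fig, ax = plt.subplots()
ax.scatter(xy[:, 0], xy[:, 1], c=digits / 10.0, cmap=cmap,
           vmin=0.0, vmax=1.0, s=3)

# One proxy marker per digit, colored the same way as the points.
handles = [
    Line2D([], [], linestyle="none", marker="o",
           color=cmap(d / 10.0), label=str(d))
    for d in range(10)
]
legend = ax.legend(handles=handles, title="digit", loc="upper right")
legend_labels = [t.get_text() for t in legend.get_texts()]
print(legend_labels)
```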

Having come this far, I found something else to be dissatisfied with.

"How unusual does a noisy point, for example a 7 inside a cluster of 1s, actually look? The algorithm may have turned an ordinary-looking point into noise, so I want to check the raw data as well as the label."

To do this, displaying the original image on mouse-over seems like a good approach.

Image display with mouse over

The official matplotlib gallery has a demo that displays images in annotations.

https://matplotlib.org/3.1.0/gallery/text_labels_and_annotations/demo_annotation_box.html

Combine this with the event registration mentioned earlier.

Since the annotation was text-only earlier, Annotation was sufficient, but images are a little more involved. Two steps are required:

First, pass the image to an OffsetImage object.
Then, put that object in an AnnotationBbox.

To begin, import the required classes.

from matplotlib.offsetbox import OffsetImage, AnnotationBbox

Prepare the required figure and axes objects.

    fig, ax = plt.subplots()
    cmap = plt.cm.RdYlGn

After that, create the OffsetImage, using the 0th image as a placeholder.

    img = mnist_img[0, :].reshape((width, width))
    imagebox = OffsetImage(img, zoom=1.0)
    imagebox.image.axes = ax

Based on that, create the AnnotationBbox.

    annot = AnnotationBbox(imagebox, xy=(0,0), xybox=(width,width),
                        xycoords="data", boxcoords="offset points", pad=0.5,
                        arrowprops=dict( arrowstyle="->", connectionstyle="arc3,rad=-0.3"))
    annot.set_visible(False)
    ax.add_artist(annot)

Note that xybox is not the size of annot but its position relative to the annotated point xy.

Next come the scatter plot and the image update.

    sc = plt.scatter(x_embedded[:, 0], x_embedded[:, 1], c=mnist_int/10.0, cmap=cmap, s=3)

    def update_annot(ind):
        i = ind["ind"][0]
        pos = sc.get_offsets()[i]
        annot.xy = (pos[0], pos[1])
        img = mnist_img[i, :].reshape((width, width))
        imagebox.set_data(img)

Here I set the new image data on imagebox; no additional update of annot itself appears to be necessary.

Also, in a separate experiment, the box adapted dynamically when the size of img changed, so I expect this to work even when the images have various sizes.
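The size experiment can be reproduced with a small sketch like the following. It assumes the Agg backend and uses random arrays of different sizes in place of MNIST images.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the sketch runs without a window
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

fig, ax = plt.subplots()
imagebox = OffsetImage(np.zeros((28, 28)), zoom=1.0)  # placeholder image
annot = AnnotationBbox(imagebox, xy=(0.5, 0.5), xybox=(28, 28),
                       xycoords="data", boxcoords="offset points", pad=0.5)
ax.add_artist(annot)

shapes = []
for size in (28, 14, 56):
    # Swap in an image of a different size; only imagebox is touched.
    imagebox.set_data(np.random.rand(size, size))
    fig.canvas.draw()  # redraw with the new image
    shapes.append(imagebox.get_data().shape)

print(shapes)
```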

The rest of the event registration is the same.

    def hover(event):
        vis = annot.get_visible()
        if event.inaxes == ax:
            cont, ind = sc.contains(event)
            if cont:
                update_annot(ind)
                annot.set_visible(True)
                fig.canvas.draw_idle()
            else:
                if vis:
                    annot.set_visible(False)
                    fig.canvas.draw_idle()

    fig.canvas.mpl_connect("motion_notify_event", hover)

    plt.show()

Sorry that the image is displayed only briefly, but I confirmed that the noise points do have shapes that are close to another digit and are likely to be misread.

Conclusion

This technique is meaningless for static image output, but I think it can be quite useful during trial and error.

While working on this, I learned about Artist, the abstract base class for the objects that matplotlib draws.

Bonus: the whole code

Change the plot part you want to display from if False: to if True:. I have not checked what happens when more than one is set to True.


import numpy as np
from sklearn.datasets import fetch_openml
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
from matplotlib.offsetbox import OffsetImage, AnnotationBbox

width = 28
nskip = 35

mnist = fetch_openml("mnist_784", version=1)
mnist_img = mnist["data"][::nskip, :]
mnist_label = mnist["target"][::nskip]
mnist_int = np.asarray(mnist_label, dtype=int)
# Inspect the image array: type, maximum pixel value, and dtype.
print(type(mnist_img))
print(mnist_img.max())
print(mnist_img.dtype)

x_embedded = TSNE(n_components=2).fit_transform(mnist_img)

if True: 
    plt.xlim(x_embedded[:, 0].min(), x_embedded[:, 0].max())
    plt.ylim(x_embedded[:, 1].min(), x_embedded[:, 1].max())
    for x, label in zip(x_embedded, mnist_label):
        plt.text(x[0], x[1], label)
    plt.xlabel("component 0")
    plt.ylabel("component 1")
    plt.show()
    exit()


fig, ax = plt.subplots()
cmap = plt.cm.RdYlGn

if False: 
    # vmin/vmax pin the color scale so the points match cmap(label / 10) below.
    sc = plt.scatter(x_embedded[:, 0], x_embedded[:, 1], c=mnist_int/10.0,
                     cmap=cmap, vmin=0.0, vmax=1.0, s=3)
    annot = ax.annotate("", xy=(0,0), xytext=(20,20),textcoords="offset points",
                    bbox=dict(boxstyle="round", fc="w"),
                    arrowprops=dict(arrowstyle="->"))
    annot.set_visible(False)
    def update_annot(ind):
        i = ind["ind"][0]
        pos = sc.get_offsets()[i]
        annot.xy = pos
        text = mnist_label[i]
        annot.set_text(text)
        annot.get_bbox_patch().set_facecolor(cmap(int(text)/10))

    def hover(event):
        vis = annot.get_visible()
        if event.inaxes == ax:
            cont, ind = sc.contains(event)
            if cont:
                update_annot(ind)
                annot.set_visible(True)
                fig.canvas.draw_idle()
            else:
                if vis:
                   annot.set_visible(False)
                   fig.canvas.draw_idle()

    fig.canvas.mpl_connect("motion_notify_event", hover)

    plt.show()

if False:
    img = mnist_img[0, :].reshape((width, width))
    imagebox = OffsetImage(img, zoom=1.0)
    imagebox.image.axes = ax

    sc = plt.scatter(x_embedded[:, 0], x_embedded[:, 1], c=mnist_int/10.0, cmap=cmap, s=3)
    annot = AnnotationBbox(imagebox, xy=(0,0), xybox=(width,width),
                        xycoords="data", boxcoords="offset points", pad=0.5,
                        arrowprops=dict( arrowstyle="->", connectionstyle="arc3,rad=-0.3"))
    annot.set_visible(False)
    ax.add_artist(annot)

    def update_annot(ind):
        i = ind["ind"][0]
        pos = sc.get_offsets()[i]
        annot.xy = (pos[0], pos[1])
        img = mnist_img[i, :].reshape((width, width))
        imagebox.set_data(img)

    def hover(event):
        vis = annot.get_visible()
        if event.inaxes == ax:
            cont, ind = sc.contains(event)
            if cont:
                update_annot(ind)
                annot.set_visible(True)
                fig.canvas.draw_idle()
            else:
                if vis:
                    annot.set_visible(False)
                    fig.canvas.draw_idle()

    fig.canvas.mpl_connect("motion_notify_event", hover)

    plt.show()

Mouse over Matplotlib to display the corresponding image