This article is the first entry of the Furukawa Lab Advent Calendar. It was written by a student at Furukawa Lab as part of his studies, so some of the content may be imprecise and some expressions may be slightly off.
In this article, I would like to introduce UKR (Unsupervised Kernel Regression), which I actually implemented myself. I read and implemented the paper listed in the reference at the end.
If you are interested in manifold learning, please give it a try.
What UKR aims to do is, given observed data $\mathbf{Y}=(\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_N) \in \mathbb{R}^{D \times N}$, to estimate the latent variables $\mathbf{X}=(\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N) \in \mathbb{R}^{M \times N}$ corresponding to each observation and the map $\mathbf{f}: \mathbb{R}^{M} \to \mathbb{R}^{D}$.
The map $\mathbf{f}$ is estimated by the following formula.
\mathbf{f}(\mathbf{x} ; \mathbf{X})=\sum_{i} \mathbf{y}_{i} \frac{K\left(\mathbf{x}-\mathbf{x}_{i}\right)}{\sum_{j} K\left(\mathbf{x}-\mathbf{x}_{j}\right)}\tag{1}
Maybe some of you are familiar with it: this formula is almost the same as the Nadaraya-Watson estimator. For those who are not, the $K$ in the expression is called a kernel function and is defined in this article by the following expression.
K\left(\mathbf{x}-\mathbf{x}^{\prime}\right)=\exp \left(\frac{-1}{2h}\left\|\mathbf{x}-\mathbf{x}^{\prime}\right\|^{2}\right)
This kernel function is also known as the Gaussian kernel. The only difference from the Nadaraya-Watson estimator is that the kernel width $h$ is fixed to $1$ in the paper.
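As a minimal sketch of how Eq. (1) can be computed with NumPy (the function name `estimate_f` and the argument names are my own for illustration, not from the paper), the kernel values between all pairs of latent variables are normalized into weights and used to average the observations:

```python
import numpy as np

def estimate_f(X, Y, h=1.0):
    """Eq. (1): f(x_i; X) for every latent point x_i.
    X: (N, M) latent variables, Y: (N, D) observations."""
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=2)  # ||x_i - x_j||^2
    K = np.exp(-sq_dist / (2.0 * h))                                 # Gaussian kernel
    r = K / K.sum(axis=1, keepdims=True)                             # normalized weights r_ij
    return r @ Y                                                      # mapped points, shape (N, D)
```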
The latent variables are mapped to the observation space by Eq. (1), and the error with respect to the observed data is computed by the following formula.
R(\mathbf{X})=\frac{1}{N} \sum_{i}\left\|\mathbf{y}_{i}-\mathbf{f}\left(\mathbf{x}_{i} ; \mathbf{X}\right)\right\|^{2}\tag{2}
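Under the same assumptions as the sketch above, Eq. (2) is simply the mean squared distance between each observation and the image of its latent variable; a possible implementation reusing `estimate_f`:

```python
def objective(X, Y, h=1.0):
    """Eq. (2): mean squared reconstruction error R(X)."""
    F = estimate_f(X, Y, h)                       # f(x_i; X) for all i
    return np.mean(np.sum((Y - F) ** 2, axis=1))  # (1/N) * sum_i ||y_i - f(x_i; X)||^2
```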
The latent variables are updated by the gradient method so as to reduce this error. When implementing this, you either have to use automatic differentiation or carry out the differentiation yourself and write the resulting formula into the program. In this article, I did my best and differentiated it myself. Writing out the full expansion would be quite long and a heavy burden on me, so I will only show the final result. First, define the following quantities.
r_{ij}=\frac{K\left(\mathbf{x}_{i}-\mathbf{x}_{j}\right)}{\sum_{j^{\prime}} K\left(\mathbf{x}_{i}-\mathbf{x}_{j^{\prime}}\right)}
\mathbf{d}_{ij}=\mathbf{f}(\mathbf{x}_{j};\mathbf{X})-\mathbf{y}_{i}
\mathbf{\delta}_{ij}=\mathbf{x}_{i}-\mathbf{x}_{j}
Using these variables, the derivative of Eq. (2) is expressed as follows.
\frac{1}{N} \frac{\partial}{\partial \mathbf{x}_{n}} \sum_{i}\left\|\mathbf{y}_{i}-\mathbf{f}(\mathbf{x}_{i};\mathbf{X})\right\|^{2}=\frac{2}{N}\sum_{i}\left[r_{n i} \mathbf{d}_{n n}^{\mathrm{T}} \mathbf{d}_{n i} \boldsymbol{\delta}_{n i}-r_{i n} \mathbf{d}_{i i}^{\mathrm{T}} \mathbf{d}_{i n} \boldsymbol{\delta}_{i n}\right]
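Hand-derived gradients like this are easy to get wrong, so one sanity check (my own suggestion, not part of the paper) is to compare the analytic expression with a finite-difference approximation of the gradient of Eq. (2), using the sketches above:

```python
def numerical_grad(X, Y, h=1.0, eps=1e-6):
    """Finite-difference gradient of R(X) with respect to each latent variable."""
    G = np.zeros_like(X)
    for n in range(X.shape[0]):
        for m in range(X.shape[1]):
            Xp, Xm = X.copy(), X.copy()
            Xp[n, m] += eps
            Xm[n, m] -= eps
            G[n, m] = (objective(Xp, Y, h) - objective(Xm, Y, h)) / (2 * eps)
    return G
```

If the hand-derived expression matches this numerical gradient up to the constant factors you choose to absorb into the learning rate, the derivation is probably correct.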
The overall flow is: estimate the map with Eq. (1), then update the latent variables by the gradient method so that the error of Eq. (2) decreases, and repeat these two steps. The implementation is shown below.
```python
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib.animation as anm
import numpy as np

fig = plt.figure(1, figsize=(14, 6))
ax1 = fig.add_subplot(121, projection='3d')
ax2 = fig.add_subplot(122)


class UKR:
    def __init__(self, N, w, z, D, L):
        self.N = N        # number of data points
        self.zeta = z     # latent variables (N x L)
        self.x = w        # observed data (N x D)
        self.f = []       # mapped points f(x_i; X)
        self.sigma = 1.0  # kernel width h
        self.lamda = 0.1  # regularization coefficient on the latent variables
        self.lr = 1.0     # learning rate of the gradient method
        self.D = D        # dimension of the observation space
        self.L = L        # dimension of the latent space
        self.gamma = 1.0 / (self.sigma * self.sigma)

    def fit(self, T):
        self.history = {"z": np.zeros((T, self.N, self.L)),
                        "Y": np.zeros((T, self.N, self.D))}
        for t in range(T):
            # delta_ij = x_i - x_j
            self.delta = self.zeta[:, None, :] - self.zeta[None, :, :]
            # Gaussian kernel K(x_i - x_j)
            self.h_kn = np.exp(-1 / (2 * self.sigma ** 2)
                               * np.sum((self.zeta[None, :, :] - self.zeta[:, None, :]) ** 2, axis=2))
            self.g_k = np.sum(self.h_kn, axis=1)
            # normalized weights r_ij (each row sums to 1)
            self.r_ij = self.h_kn / self.g_k[:, None]
            # Eq. (1): map each latent variable into the observation space
            self.f = self.r_ij @ self.x
            # d_ij = f(x_j; X) - y_i
            self.d_ij = self.f[:, None, :] - self.x[None, :, :]
            # gradient of Eq. (2) with respect to the latent variables
            self.A = self.gamma * self.r_ij * np.einsum('nd,nid->ni', self.f - self.x, self.d_ij)
            self.bibun = np.sum((self.A + self.A.T)[:, :, None] * self.delta, axis=1)
            # gradient descent step with a small regularization term on the latent variables
            self.zeta -= self.lr * (self.bibun + self.lamda * self.zeta)
            self.history["Y"][t] = self.f
            self.history["z"][t] = self.zeta


if __name__ == "__main__":
    N = 20  # number of mesh points per axis
    T = 50  # number of learning iterations
    w = np.zeros((N * N, 3))
    # latent variables initialized on a regular mesh over [-1, 1]^2
    zeta = np.dstack(np.meshgrid(np.linspace(-1, 1, N), np.linspace(-1, 1, N), indexing='ij'))
    zeta = np.reshape(zeta, (N * N, 2))
    # saddle-shaped observation data: z = x^2 - y^2
    for i in range(N * N):
        mesh = zeta[i, 0] ** 2 - zeta[i, 1] ** 2
        w[i] = np.append(zeta[i], mesh)
    np.random.seed(1)

    ukr = UKR(N * N, w, zeta, 3, 2)
    ukr.fit(T)
    kekka = np.reshape(ukr.history["Y"], (T, N, N, 3))
    k = np.array(ukr.history["z"])

    def update(i, zeta, w):
        print(i)
        ax1.cla()
        ax2.cla()
        # observation space: data points and the estimated manifold (wireframe)
        ax1.scatter(w[:, 0], w[:, 1], w[:, 2], c=w[:, 0])
        ax1.plot_wireframe(kekka[i, :, :, 0], kekka[i, :, :, 1], kekka[i, :, :, 2])
        # latent space
        ax2.scatter(k[i, :, 0], k[i, :, 1], c=w[:, 0])
        ax1.set_xlabel("X_axis")
        ax1.set_ylabel("Y_axis")
        ax1.set_zlabel("Z_axis")
        ax2.set_title('latentspace')
        ax2.set_xlabel("X_axis")
        ax2.set_ylabel("Y_axis")

    ani = anm.FuncAnimation(fig, update, fargs=(zeta, w), interval=500, frames=T - 1)
    ani.save("test.gif", writer='imagemagick')
```
This is the result of running the implementation with saddle-shaped data as the observation data. The figure on the left shows the observation space: each point is an observed data point, and the wireframe is the manifold obtained by mapping the mesh of latent variables. The figure on the right shows the latent space.
It is nice to see the manifold gradually cover the observation data. I never get tired of watching it.
Reference: Variants of unsupervised kernel regression: General cost functions