I read and implemented "Variants of UKR"

This article is the first entry of the Furukawa Lab Advent Calendar. It was written by a student at Furukawa Lab as part of his studies, so some of the content may be imprecise or worded a little differently from the original paper.

Introduction

In this article I introduce UKR (Unsupervised Kernel Regression), which I actually implemented myself. I read and implemented this paper.

If you are interested in manifold learning, please give it a try.

What UKR wants to do

What UKR wants to do is this: given observed data $\mathbf{Y} = (\mathbf{y}_1, \mathbf{y}_2, \dots, \mathbf{y}_N) \in \mathbb{R}^{D \times N}$, estimate both the latent variables $\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_N) \in \mathbb{R}^{M \times N}$ corresponding to each observation and the map $\mathbf{f}: \mathbb{R}^{M} \rightarrow \mathbb{R}^{D}$.

The map $\mathbf{f}$ is estimated by the following formula.

\mathbf{f}(\mathbf{x} ; \mathbf{X})=\sum_{i} \mathbf{y}_{i} \frac{K\left(\mathbf{x}-\mathbf{x}_{i}\right)}{\sum_{j} K\left(\mathbf{x}-\mathbf{x}_{j}\right)}\tag{1}

Some of you may find this familiar: the formula is essentially the Nadaraya-Watson estimator. For those who are not, $K$ in the expression is called a kernel function and is defined in this article by the following expression.

K\left(\mathbf{x}-\mathbf{x}^{\prime}\right)=\exp \left(\frac{-1}{2h}\left\|\mathbf{x}-\mathbf{x}^{\prime}\right\|^{2}\right)

This kernel function is also called the Gaussian kernel. The only difference from Nadaraya-Watson estimation is that the kernel width $h$ is fixed at $1$ in the paper.
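
To make Eq. (1) concrete, here is a minimal NumPy sketch of the smoother (the function name ukr_map and its argument names are my own, not from the paper); with h = 1 it corresponds to the fixed kernel width described above.

import numpy as np

def ukr_map(X_query, X, Y, h=1.0):
    # X_query: (Q, M) latent points to map, X: (N, M) latent variables,
    # Y: (N, D) observations. Returns the (Q, D) mapped points of Eq. (1).
    sqdist = np.sum((X_query[:, None, :] - X[None, :, :]) ** 2, axis=2)  # (Q, N)
    K = np.exp(-sqdist / (2.0 * h))               # Gaussian kernel values
    R = K / np.sum(K, axis=1, keepdims=True)      # normalized weights r_ij
    return R @ Y                                  # weighted average of the observations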

The latent variables are mapped to the observation space by Eq. (1), and the error with respect to the observation data is computed by the following formula.

R(\mathbf{X})=\frac{1}{N} \sum_{i}\left\|\mathbf{y}_{i}-\mathbf{f}\left(\mathbf{x}_{i} ; \mathbf{X}\right)\right\|^{2}\tag{2}
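
Continuing the sketch above, the objective of Eq. (2) can be computed by mapping every latent point back into the observation space (ukr_objective is my own helper, reusing ukr_map from the previous sketch).

def ukr_objective(X, Y, h=1.0):
    F = ukr_map(X, X, Y, h=h)                      # f(x_i; X) for every i
    return np.mean(np.sum((Y - F) ** 2, axis=1))   # (1/N) * sum_i ||y_i - f(x_i; X)||^2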

This error is reduced with the gradient method, which updates the latent variables. When implementing it, you either rely on automatic differentiation or carry out the differentiation yourself and write the resulting formula into the program. In this article I did the differentiation by hand. Writing out the full derivation would be quite long and a heavy burden on me, so I only post the final result.

r_{ij}=\frac{K\left(\mathbf{x}_{i}-\mathbf{x}_{j}\right)}{\sum_{j^{\prime}} K\left(\mathbf{x}_{i}-\mathbf{x}_{j^{\prime}}\right)}
\mathbf{d}_{ij}=\mathbf{f}(\mathbf{x}_{j};\mathbf{X})-\mathbf{y}_{i}
\mathbf{\delta}_{ij}=\mathbf{x}_{i}-\mathbf{x}_{j}

Using these variables, the derivative of Eq. (2) is expressed as follows.

\frac{1}{N} \frac{\partial}{\partial \mathbf{x}_{n}} \sum_{i}\left\|\mathbf{y}_{i}-\mathbf{f}(\mathbf{x}_{i};\mathbf{X})\right\|^{2}=\frac{2}{N}\sum_{i}\left[r_{n i} \mathbf{d}_{n n}^{\mathrm{T}} \mathbf{d}_{n i} \boldsymbol{\delta}_{n i}-r_{i n} \mathbf{d}_{i i}^{\mathrm{T}} \mathbf{d}_{i n} \boldsymbol{\delta}_{i n}\right]
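
An easy way to check a hand-derived gradient like this is to compare it with a central finite-difference approximation of Eq. (2). The helper below is my own sanity check (it reuses ukr_objective from the sketch above), not part of the paper.

def numerical_grad(X, Y, h=1.0, eps=1e-6):
    # central finite differences of Eq. (2) with respect to every latent coordinate
    grad = np.zeros_like(X)
    for n in range(X.shape[0]):
        for m in range(X.shape[1]):
            Xp, Xm = X.copy(), X.copy()
            Xp[n, m] += eps
            Xm[n, m] -= eps
            grad[n, m] = (ukr_objective(Xp, Y, h) - ukr_objective(Xm, Y, h)) / (2 * eps)
    return grad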

The overall flow is: estimate the map with Eq. (1), then repeatedly update the latent variables by the gradient method so that the error in Eq. (2) is minimized.
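
Put together, a toy version of this loop looks like the sketch below. It reuses my ukr_map and numerical_grad helpers and omits the small regularization term that appears in the full code in the next section, so it is only an illustration of the flow.

def ukr_fit(X0, Y, h=1.0, lr=0.1, n_iter=100):
    X = X0.copy()
    for _ in range(n_iter):
        X -= lr * numerical_grad(X, Y, h)   # gradient step on Eq. (2)
    return X, ukr_map(X, X, Y, h)           # final latent variables and manifold points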

Implementation result

from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
import matplotlib.animation as anm
import numpy as np

fig = plt.figure(1, figsize=(14, 6))
ax1 = fig.add_subplot(121, projection='3d')  # observation space (3D)
ax2 = fig.add_subplot(122)                   # latent space (2D)
class UKR:
    def __init__(self, N, w, z, D, L):
        self.N = N          # number of data points
        self.zeta = z       # latent variables (N x L)
        self.x = w          # observation data (N x D)
        self.sigma = 1.0    # kernel width, fixed to 1 as in the paper
        self.lamda = 0.1    # regularization coefficient for the latent variables
        self.lr = 1.0       # learning rate of the gradient method
        self.D = D          # dimension of the observation space
        self.L = L          # dimension of the latent space
        self.gamma = 1.0 / (self.sigma * self.sigma)
    def fit(self, T):
        self.history = {"z": np.zeros((T, self.N, self.L)),
                        "Y": np.zeros((T, self.N, self.D))}
        for t in range(T):
            # pairwise differences between latent variables: delta[i, j] = x_i - x_j
            self.delta = self.zeta[:, None, :] - self.zeta[None, :, :]
            # Gaussian kernel values K(x_i - x_j)
            self.h_kn = np.exp(-1 / (2 * self.sigma ** 2) * np.sum(self.delta ** 2, axis=2))
            # normalization term and weights r_ij of Eq. (1)
            self.g_k = np.sum(self.h_kn, axis=1)
            self.r_ij = self.h_kn / self.g_k[:, None]
            # map to the observation space: f(x_i; X)
            self.f = self.r_ij @ self.x
            # d[i, j] = f(x_i; X) - y_j
            self.d_ij = self.f[:, None, :] - self.x[None, :, :]
            # A[n, i] = gamma * r_ni * (d_nn^T d_ni); combining A and A^T with delta gives the gradient of Eq. (2)
            self.A = self.gamma * self.r_ij * np.einsum('nd,nid->ni', self.f - self.x, self.d_ij)
            self.bibun = np.sum((self.A + self.A.T)[:, :, None] * self.delta, axis=1)  # "bibun" = derivative
            # gradient step with a weak regularization that keeps the latent variables near the origin
            self.zeta -= self.lr * (self.bibun + self.lamda * self.zeta)
            self.history["Y"][t] = self.f
            self.history["z"][t] = self.zeta

if __name__ == "__main__":
    N = 20   # number of grid points per axis (N*N data points in total)
    T = 50   # number of gradient-descent iterations

    # saddle-shaped observation data: z = x^2 - y^2 on a regular grid
    zeta = np.dstack(np.meshgrid(np.linspace(-1, 1, N), np.linspace(-1, 1, N), indexing='ij'))
    zeta = np.reshape(zeta, (N * N, 2))
    w = np.zeros((N * N, 3))
    for i in range(N * N):
        mesh = zeta[i, 0] ** 2 - zeta[i, 1] ** 2
        w[i] = np.append(zeta[i], mesh)

    ukr = UKR(N * N, w, zeta, 3, 2)
    ukr.fit(T)

    kekka = ukr.history["Y"]                  # mapped points ("kekka" = result) at every iteration
    kekka = np.reshape(kekka, (T, N, N, 3))   # back to a grid so it can be drawn as a wireframe
    k = ukr.history["z"]                      # latent variables at every iteration
    def update(i, zeta, w):
        # redraw both panels for animation frame i
        ax1.cla()
        ax2.cla()
        ax1.scatter(w[:, 0], w[:, 1], w[:, 2], c=w[:, 0])
        ax1.plot_wireframe(kekka[i, :, :, 0], kekka[i, :, :, 1], kekka[i, :, :, 2])
        ax2.scatter(k[i, :, 0], k[i, :, 1], c=w[:, 0])
        ax1.set_xlabel("X_axis")
        ax1.set_ylabel("Y_axis")
        ax1.set_zlabel("Z_axis")
        ax2.set_title('latentspace')
        ax2.set_xlabel("X_axis")
        ax2.set_ylabel("Y_axis")

    ani = anm.FuncAnimation(fig, update, fargs=(zeta, w), interval=500, frames=T - 1)
    ani.save("test.gif", writer='imagemagick')

(Animation: test.gif)

This is the result when saddle-shaped data are given as the observation data. The left panel shows the observation space: the points are the observation data and the wireframe is the manifold obtained by mapping the mesh of latent points. The right panel shows the latent space.

It is satisfying to watch the manifold gradually cover the observation data; I never get tired of looking at it.

References

Variants of unsupervised kernel regression: General cost functions
