I tried hard to understand Spectral Normalization and singular value decomposition, which contribute to the stability of GAN.

Introduction

This article is about GANs (generative adversarial networks). Training a GAN does not necessarily converge to a model whose images are indistinguishable from real ones. Training tends to stall because of instabilities such as vanishing gradients and mode collapse.

It is said that controlling the Lipschitz continuity and the Lipschitz constant of the Discriminator is important for dealing with this instability, and Spectral Normalization is a useful way to do so.

Well, there are some terms here that I didn't understand. In this article, I would like to summarize my own interpretation of what they mean.

Here is the book that I used as a reference this time as well.

I wrote a book to learn about deep learning and the latest GAN circumstances from Inpainting. https://qiita.com/koshian2/items/aefbe4b26a7a235b5a5e

What is Lipschitz continuity and Lipschitz function?

A function $ f (x) $ is Lipschitz continuous if there is a constant $ k $ such that, for any $ x_1 \neq x_2 $,

$$
\left|\frac{f(x_1)-f(x_2)}{x_1-x_2}\right| \leq k \tag{1}
$$

holds. Such a $ k $ is called a Lipschitz constant.

Before going further into Lipschitz continuity, I would like to review the ordinary continuity of functions. A function $ f (x) $ is continuous at $ x = x_0 $ when

$$
\lim_{x \to x_0} f(x) = f(x_0) \tag{2}
$$

holds. And $ f (x) $ is a continuous function when it is continuous at every point of the domain of interest.

For example, the figure below shows a function that is continuous and one that is not.

image.png

I think it's easy to understand intuitively.

Lipschitz continuity, on the other hand, means that a $ k $ satisfying Equation 1 above exists.

001.png

In the figure above, draw two straight lines with slopes $ ± k $ through any point on the graph. If the graph of the function always stays within the region between these two lines, the function is Lipschitz continuous. Take $ y = x $ as an example. From Equation 1,

$$
\left|\frac{f(x_1)-f(x_2)}{x_1-x_2}\right| = \left|\frac{x_1-x_2}{x_1-x_2}\right| = 1 \leq k
$$

so the inequality holds for any $ k \geq 1 $, and $ y = x $ is Lipschitz continuous with Lipschitz constant 1. A smaller value such as $ k = 0.01 $ does not satisfy the inequality, so it cannot serve as a Lipschitz constant for this function. As for the relationship between being continuous and being Lipschitz continuous,

$$
\text{Lipschitz continuous} \subset \text{continuous}
$$

that is, every Lipschitz continuous function is continuous, but not the other way around.
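As a rough numerical check of Equation 1, the sketch below samples points on an interval and takes the largest slope between any two of them. The helper name `max_slope` and the sampling interval are assumptions made for illustration, and a sampled maximum only gives a lower bound on the true Lipschitz constant.

```python
import numpy as np

def max_slope(f, xs):
    """Largest |f(x1) - f(x2)| / |x1 - x2| over all pairs of distinct samples."""
    x1, x2 = np.meshgrid(xs, xs)
    mask = x1 != x2  # Equation 1 is only defined for x1 != x2
    slopes = np.abs(f(x1[mask]) - f(x2[mask])) / np.abs(x1[mask] - x2[mask])
    return slopes.max()

xs = np.linspace(-5.0, 5.0, 201)  # hypothetical sampling interval

k_identity = max_slope(lambda x: x, xs)     # y = x
k_sin = max_slope(np.sin, xs)               # y = sin(x)
k_square = max_slope(lambda x: x ** 2, xs)  # y = x^2

print(k_identity, k_sin, k_square)
```

For $ y = x $ every slope equals 1, and for $ \sin x $ the slopes never exceed 1, so both fit within $ k = 1 $. For $ x ^ 2 $ the largest sampled slope grows with the width of the interval, which is why it has no single Lipschitz constant over the whole real line.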

In GANs, it is commonly said as a rule of thumb that imposing the constraint $ k = 1 $ improves stability.

Reference URL https://mathwords.net/lipschitz

What is singular value decomposition?

Next, I will explain singular value decomposition. It is a matrix operation that is needed for Spectral Normalization below, so I summarize it here.

Singular value decomposition factors any $ m × n $ matrix $ A $ as $ A = UΣV ^ T $, where $ U $ and $ V $ are orthogonal matrices and $ Σ $ is a matrix whose off-diagonal entries are 0 and whose diagonal entries are non-negative and arranged in descending order. The diagonal entries of $ Σ $ are called singular values. Please refer to the following pdf for how to find $ U, V, Σ $.

http://www.cfme.chiba-u.jp/~haneishi/class/iyogazokougaku/SVD.pdf

In Python, the singular value decomposition can be computed easily.

SN.ipynb


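import numpy as np

# 2 x 4 example matrix
data = np.array([[1, 2, 3, 4], [3, 4, 5, 6]])

# np.linalg.svd returns U, the singular values, and V transposed (Vt)
U, S, Vt = np.linalg.svd(data)
print(U)
print(S)
print(Vt)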
[[-0.50566621 -0.86272921]
 [-0.86272921  0.50566621]]
[10.73807223  0.8329495 ] #Singular value
[[-0.28812004 -0.41555404 -0.54298803 -0.67042202] 
 [ 0.7854851   0.35681206 -0.07186099 -0.50053403]
 [-0.40008743  0.25463292  0.69099646 -0.54554195]
 [-0.37407225  0.79697056 -0.47172438  0.04882607]]

The singular values are [10.73807223 0.8329495], so the maximum singular value is about 10.74.
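To connect this output back to the definition, the following sketch rebuilds the original matrix from the three factors and checks that the largest singular value equals the spectral norm (matrix 2-norm). The names `Sigma` and `reconstructed` are introduced here only for illustration.

```python
import numpy as np

data = np.array([[1, 2, 3, 4], [3, 4, 5, 6]], dtype=float)
U, S, Vt = np.linalg.svd(data)  # Vt is V transposed

# Place the singular values on the diagonal of an m x n matrix Sigma
Sigma = np.zeros(data.shape)
Sigma[:len(S), :len(S)] = np.diag(S)

reconstructed = U @ Sigma @ Vt
print(np.allclose(reconstructed, data))           # A = U Sigma V^T
print(np.allclose(U @ U.T, np.eye(2)))            # U is orthogonal
print(np.allclose(Vt @ Vt.T, np.eye(4)))          # V is orthogonal
print(np.isclose(np.linalg.norm(data, 2), S[0]))  # 2-norm = max singular value
```

The last check is the property Spectral Normalization relies on: the maximum singular value is the largest factor by which the matrix can stretch a vector.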

Reference URL https://thinkit.co.jp/article/16884

What is Spectral Normalization?

Now, finally, Spectral Normalization. Among the layers used to build neural networks, a method called Batch Normalization (hereinafter referred to as Batch Norm) is well known. Batch Norm was proposed in 2015 and is a layer inserted after a fully connected layer or a convolution layer. It is known for effects such as faster and more stable training.

The processing is as follows.

image.png

For a mini-batch of $ m $ inputs $ x_1, x_2, \dots, x_m $, the mean $ μ_B $ and the variance $ σ_B ^ 2 $ are calculated.
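Since the formula is only available as an image, here is a minimal sketch of the usual Batch Norm computation, assuming the standard form in which each input is shifted by $ μ_B $, divided by $ \sqrt{σ_B ^ 2 + ε} $, and then scaled and shifted. The function name `batch_norm` and the default values of `gamma`, `beta`, and `eps` are assumptions for illustration, not taken from the figure.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a mini-batch x of shape (m, features) feature by feature."""
    mu_b = x.mean(axis=0)                      # mini-batch mean
    var_b = x.var(axis=0)                      # mini-batch variance
    x_hat = (x - mu_b) / np.sqrt(var_b + eps)  # divide by the standard deviation
    return gamma * x_hat + beta                # scale and shift

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=(16, 4))  # m = 16 samples, 4 features
y = batch_norm(x)

print(y.mean(axis=0))  # close to 0 for every feature
print(y.var(axis=0))   # close to 1 for every feature
```

The division by the standard deviation in `x_hat` is the part of the computation that the discussion of continuity refers to.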

Batch Norm brings these benefits, but in GAN training it is cited as a factor that impairs continuity. As the formula above shows, Batch Norm divides by the standard deviation, so it has the form of a fractional function. A fractional function such as $ 1 / x $ is not continuous at $ x = 0 $, which is the intuition for why continuity is lost.

Therefore, Spectral Normalization is the solution to this problem.

Spectral Normalization for Generative Adversarial Networks https://arxiv.org/abs/1802.05957

This paper is by Japanese authors and was presented by people from Preferred Networks, Inc. The idea of Spectral Normalization is to divide the weights (coefficients) of a layer by their maximum singular value. This ensures Lipschitz continuity of the model and lets you control the Lipschitz constant to be 1. To find this maximum singular value, you can use the singular value decomposition described above.
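The effect of dividing by the maximum singular value can be checked on a single weight matrix. This is a minimal sketch with NumPy, assuming a hypothetical random matrix `W` standing in for the weights of one linear layer; it is not the implementation used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 128))  # hypothetical weight matrix of one linear layer

sigma = np.linalg.svd(W, compute_uv=False)[0]  # maximum singular value of W
W_sn = W / sigma                               # spectrally normalized weights

sigma_sn = np.linalg.svd(W_sn, compute_uv=False)[0]
print(sigma_sn)  # approximately 1

# The map x -> W_sn x does not stretch the distance between two inputs
x1, x2 = rng.normal(size=128), rng.normal(size=128)
ratio = np.linalg.norm(W_sn @ x1 - W_sn @ x2) / np.linalg.norm(x1 - x2)
print(ratio)     # at most 1
```

The ratio in the last step is the multi-dimensional counterpart of the left-hand side of Equation 1, so keeping it at or below 1 corresponds to a Lipschitz constant of 1 for this layer.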

It is very easy to implement. With TensorFlow, it can be implemented simply by using a ConvSN2D layer, as shown below.

SN.ipynb


import tensorflow as tf
from inpainting_layers import ConvSN2D

inputs = tf.random.normal((16, 256, 256, 3))
x = ConvSN2D(64,3,padding='same')(inputs)

print(x.shape)

Now, as for how to find the singular value: applying SVD as it is would make the amount of computation enormous, so an algorithm called the power method is used instead.

The maximum singular value of an $ (N, M) $ matrix $ X $ is estimated as follows.

  1. Define a matrix $ U $ of shape $ (1, M) $, initialized with normal random numbers.
  2. Repeat the following $ P $ times.
     + Estimate $ V = L_2 (UX ^ T) $, where $ L_2 (x) = x / (\sqrt{Σ x_ {i, j} ^ 2} + ε) $ and $ ε $ is a tiny constant.
     + Estimate $ U = L_2 (VX) $.
  3. Estimate $ σ = VXU ^ T $. $ σ $ is the maximum singular value.

When implemented, it will be as follows. The original data matrix is the one used above.

python


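import numpy as np
import matplotlib.pyplot as plt

# `data` is the 2 x 4 matrix defined in the SVD example above.

def l2_normalize(x, eps=1e-8):
    # L2 normalization from step 2; the value of eps is an assumed small constant
    return x / (np.sqrt(np.sum(x ** 2)) + eps)

results = []

# Estimate the maximum singular value with P = 1, ..., 5 iterations
for p in range(1, 6):
    U = np.random.randn(1, data.shape[1])    # shape (1, M), normal random init
    for i in range(p):
        V = l2_normalize(np.dot(U, data.T))  # shape (1, N)
        U = l2_normalize(np.dot(V, data))    # shape (1, M)
    sigma = np.dot(np.dot(V, data), U.T)     # sigma = V X U^T
    results.append(sigma.flatten())

plt.plot(np.arange(1, 6), results)
plt.ylim([10, 11])
plt.show()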

002.png

The estimate is around 10.74, the same value as the maximum singular value obtained with SVD earlier. This is how the maximum singular value is obtained in the implementation.
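To put a number on "around 10.74", the same steps can be wrapped in a function and compared directly with the value from `np.linalg.svd`. This is a sketch under assumptions: the function name `max_singular_value`, the fixed `seed`, and the default of 10 iterations are choices made here for reproducibility, not part of the original notebook.

```python
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Divide x by its L2 norm; eps avoids division by zero."""
    return x / (np.sqrt(np.sum(x ** 2)) + eps)

def max_singular_value(X, n_iter=10, seed=0):
    """Estimate the largest singular value of X with the power method."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(1, X.shape[1]))  # shape (1, M), normal random init
    for _ in range(n_iter):
        V = l2_normalize(U @ X.T)         # shape (1, N)
        U = l2_normalize(V @ X)           # shape (1, M)
    return (V @ X @ U.T).item()           # sigma = V X U^T

data = np.array([[1, 2, 3, 4], [3, 4, 5, 6]], dtype=float)
estimate = max_singular_value(data)
exact = np.linalg.svd(data, compute_uv=False)[0]

print(estimate, exact, abs(estimate - exact))
```

For this matrix the second singular value is much smaller than the first, so the estimate settles after very few iterations. Matrices whose top two singular values are close would need more.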

Conclusion

This time, I summarized the topics related to Spectral Normalization. Although I grasped the general flow, my understanding of the mathematical aspects is still lacking. I would like to deepen it as I continue to implement things.

The program is stored here. https://github.com/Fumio-eisan/SN_20200404
