I tried running a GAN in Colaboratory, the Jupyter Notebook environment that I had been curious about for a long time.
For Colaboratory itself, the article [Use a free GPU in seconds] Deep Learning Practice Tips on Colaboratory was helpful.
For GAN itself, I skimmed the paper Generative Adversarial Networks. A GAN is a method for approximating the probability distribution of the data at hand (treated as a uniform distribution over the samples, i.e. the empirical distribution). When the two networks G and D are trained well, the probability distribution of the data generated by G comes to match the probability distribution of the data at hand. Training does not always go well, though, and how to train GANs reliably is still an active research topic.
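For reference, the objective from the paper is the following minimax game, which D tries to maximize and G tries to minimize:

\min_G \max_D \; \mathbb{E}_{x \sim p_{\textrm{data}}}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]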
For the GAN code, I referred to the code here: a GAN written simply with Keras, and I learned a lot from it.
Two MLPs are defined (these are G and D), the output of G is fed into D, and the two are trained alternately with Adam. D is trained to distinguish the "data at hand" from the "output of G". G is trained by setting the teacher labels so that D's verdict on G's output becomes "data at hand"; D itself is not updated during this step. The training data is MNIST.
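A rough sketch of this alternating step looks like the following (this is not the referenced code; the layer sizes, learning rates, and the names generator, discriminator, and combined are assumptions for illustration):

import numpy as np
from keras.models import Sequential, Model
from keras.layers import Dense, LeakyReLU, Input
from keras.optimizers import Adam

latent_dim = 100

# G: noise -> flattened 28x28 "image"
generator = Sequential([
    Dense(256, input_dim=latent_dim), LeakyReLU(0.2),
    Dense(784, activation='tanh'),
])

# D: image -> probability that the image is real
discriminator = Sequential([
    Dense(256, input_dim=784), LeakyReLU(0.2),
    Dense(1, activation='sigmoid'),
])
discriminator.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5),
                      metrics=['accuracy'])

# Combined model used to train G: D is frozen here, so only G gets updated
discriminator.trainable = False
z = Input(shape=(latent_dim,))
combined = Model(z, discriminator(generator(z)))
combined.compile(loss='binary_crossentropy', optimizer=Adam(0.0002, 0.5))

def train_step(real_imgs, batch_size=32):
    # real_imgs: a batch of flattened MNIST images scaled to [-1, 1]
    noise = np.random.normal(0, 1, (batch_size, latent_dim))
    gen_imgs = generator.predict(noise)
    # Train D: real images are labelled 1, generated images 0
    d_loss_real = discriminator.train_on_batch(real_imgs, np.ones((batch_size, 1)))
    d_loss_fake = discriminator.train_on_batch(gen_imgs, np.zeros((batch_size, 1)))
    # Train G: label the generated images 1 so that G learns to fool D
    g_loss = combined.train_on_batch(noise, np.ones((batch_size, 1)))
    return d_loss_real, d_loss_fake, g_loss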
If you notice a ReLU in the code that you are not familiar with, it is the Leaky ReLU, which is apparently used a lot these days (reference: About the activation function ReLU and the ReLU family [additional information]). Unlike ReLU, even when the input x of the activation function is 0 or less, it outputs x * α instead of 0. The wiki says it is not clear whether it actually helps, but perhaps the idea is to reduce vanishing gradients as much as possible? I'm not sure.
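Written as a formula (α is a small constant such as 0.2 or 0.3):

\textrm{LeakyReLU}(x) = \begin{cases} x & (x > 0) \\ \alpha x & (x \le 0) \end{cases}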
The code ran without any problems, but since training is done by calling train_on_batch inside my own loop instead of fit, no history is returned. I want to visualize the loss and accuracy, so I added code to store them in instance variables and code to plot them.
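The lists have to be created before the training loop; a minimal sketch, assuming they live on the same class as the train method (e.g. in __init__):

self.all_d_loss_real = []
self.all_d_loss_fake = []
self.all_g_loss = []

Then, inside the training loop: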
# save the per-iteration losses (train_on_batch returns [loss, acc] for D)
self.all_d_loss_real.append(d_loss_real)
self.all_d_loss_fake.append(d_loss_fake)
self.all_g_loss.append(g_loss)

if epoch % sample_interval == 0:
    self.sample_images(epoch)
    np.save('d_loss_real.npy', self.all_d_loss_real)
    np.save('d_loss_fake.npy', self.all_d_loss_fake)
    np.save('g_loss.npy', self.all_g_loss)
Here, real is D's loss on the data at hand, and fake is D's loss on the data generated by G. Next is the code to download the files to the local machine.
from google.colab import files
import os

file_list = os.listdir("images")
for file in file_list:
    files.download("images" + os.sep + file)

files.download('d_loss_real.npy')
files.download('d_loss_fake.npy')
files.download('g_loss.npy')
The following code plots the loss and accuracy.
import numpy as np
import matplotlib.pyplot as plt

# load the saved losses; each row of the D arrays is [loss, acc]
t1 = np.load('d_loss_real.npy')
t2 = np.reshape(np.load('d_loss_fake.npy'), [np.shape(t1)[0], 2])  # reshape so each row is [loss, acc], like t1
g_loss = np.load('g_loss.npy')

t = (t1 + t2) / 2          # average of real and fake
d_loss = t[:, 0]
acc = t[:, 1]
d_loss_real = t1[:, 0]
d_loss_fake = t2[:, 0]
acc_real = t1[:, 1]
acc_fake = t2[:, 1]

n_epoch = 29801            # number of recorded training iterations (length of the saved arrays)
x = np.linspace(1, n_epoch, n_epoch)
plt.plot(x, acc, label='acc')
plt.plot(x, d_loss, label='d_loss')
plt.plot(x, g_loss, label='g_loss')
plt.plot(x, d_loss_real, label='d_loss_real')
plt.plot(x, d_loss_fake, label='d_loss_fake')
plt.legend()
plt.ylim([0, 2])
plt.grid()
plt.show()
# moving average
num = 100  # moving-average window size
b = np.ones(num) / num
# mode='same' pads with zeros, so the first/last ~num/2 points are pulled toward 0
acc2 = np.convolve(acc, b, mode='same')
d_loss2 = np.convolve(d_loss, b, mode='same')
d_loss_real2 = np.convolve(d_loss_real, b, mode='same')
d_loss_fake2 = np.convolve(d_loss_fake, b, mode='same')
g_loss2 = np.convolve(g_loss, b, mode='same')

x = np.linspace(1, n_epoch, n_epoch)
plt.plot(x, acc2, label='acc')
plt.plot(x, d_loss2, label='d_loss')
plt.plot(x, g_loss2, label='g_loss')
plt.plot(x, d_loss_real2, label='d_loss_real')
plt.plot(x, d_loss_fake2, label='d_loss_fake')
plt.legend()
plt.ylim([0, 1.2])
plt.grid()
plt.show()
Images generated by G, at epoch = 0, 200, 1000, 3000, 7000, 10000, 20000, and 30000:
As the number of epochs increases, images that look more like MNIST are generated, but from around epoch 7000 there seems to be no particular change.
Accuracy and loss
Moving average of the figure above (n = 100, zero-padded at both ends)
From about epoch 7000: acc ≈ 0.63, d_loss (both real and fake) ≈ 0.63, and g_loss ≈ 1.02-1.08 with a slight upward trend (d_loss and g_loss are binary cross-entropy). As before, real is D's loss on the data at hand, fake is D's loss on the data generated by G, and d_loss is their average.
The loss is defined by the following formula.
\textrm{loss} = -\frac{1}{N}\sum_{n=1}^{N}\bigl( y_n\log{p_n}+(1-y_n)\log{(1-p_n)}\bigr)
N is the number of data points, y_n is the label, and p_n is the output of D (between 0 and 1).
It looks confusing because of the $\log$, but all we are really doing is taking the average (geometric mean) of D's outputs, $\bigl(\prod_{n=1}^{N} p_n^{y_n}\bigr)^{\frac{1}{N}}$, passing it through $\log$, and, since $\log$ of a value between 0 and 1 is a negative number and hard to read, attaching a minus sign to make it positive.
\begin{align}
\textrm{loss} &= -\frac{1}{N}\sum_{n=1}^{N}\bigl( y_n\log{p_n}+(1-y_n)\log{(1-p_n)}\bigr) \\
&= -\log{\bigl( \prod_{n=1}^{N}p_n^{y_n}\bigr)^{\frac{1}{N}}} -\log{\bigl( \prod_{n=1}^{N}(1-p_n)^{1-y_n}\bigr)^{\frac{1}{N}}}
\end{align}
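In particular, when a batch contains only real samples (all $y_n = 1$) or only generated samples (all $y_n = 0$), the geometric-mean D output can be recovered from the loss:

\bigl(\prod_{n=1}^{N}p_n\bigr)^{\frac{1}{N}} = e^{-\textrm{loss}}

Presumably this is how the "average D output" column in the table below was obtained, e.g. $e^{-1.06} \approx 0.35$ and $e^{-0.63} \approx 0.53$; for the generated samples the loss measures $1 - p_n$, so the corresponding D output is $1 - 0.53 = 0.47$.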
◯ Loss around epoch 25000
| | loss | average D output |
|---|---|---|
| g_loss | 1.06 | 0.35 |
| d_loss | 0.63 | 0.53 |
| d_loss_real | 0.63 | 0.53 |
| d_loss_fake | 0.63 | 0.47 |
The closer D's output is to the label, the smaller the loss. A GAN does not aim to drive the loss to zero, so the fact that it does not decrease is not itself a problem.
If training goes well and the data at hand and the data generated by G become completely indistinguishable (being in this state is the goal of a GAN), acc should be 0.5, but judging from the results that is not the case here.
Looking at the images generated by G, they are clearly not handwritten digits, which is probably why acc stays high. Tweaking the parameters might improve things a little, but since pushing the results was not the goal, I will stop here for now.
As for g_loss: the lower it is, the more often D judges the images generated by G to be real, i.e. the more D is fooled; conversely, the higher it is, the less D is fooled. If the goal is for D's average output on G's images to be 0.5, the corresponding g_loss would be -log(0.5) ≈ 0.7, so I would like it to drop a little further.
I don't think it means anything in particular that acc and d_loss happen to match.
From around epoch 7000, it bothers me that the decrease in d_loss_fake is smaller than the increase in g_loss; even in terms of average D output there is a difference of roughly a factor of ten. Since the training order is D first, then G, maybe that is having a direct effect?
Overall I feel like I managed to do what I set out to do. There was nothing I got particularly stuck on, but Colaboratory is not very stable: calculations sometimes crash midway, and when the screen is reloaded an old version of the notebook is sometimes displayed for some reason. I didn't notice this, saved over my work, and had to rewrite the code in tears.
Be careful if this pop-up appears from the bottom of the screen after a reload. If you look closely, the code shown is the unedited code from right after opening the Colaboratory, and if you save it, your edited code gets overwritten.
As a countermeasure, I think you should simply reload the page. My browser is Safari, and when I press Ctrl-R to reload, the edited code is displayed again and the variables from the last execution are also kept. If this pop-up appears, it is safer not to rush to overwrite anything.
Against calculation crashes, I think you just have to make regular backups.