What an autoencoder does is easy to understand from the figure. In short, the encoder compresses the information by reducing the dimensionality of the image data, and the decoder reconstructs the image from that compressed representation. Training minimizes the distance (here, the MAE) between the pixels of the input image and those of the output image. Since no labels are required, this is unsupervised learning.
Now, how do you generate an image using this autoencoder? First, train the autoencoder on a set of images, optimizing the parameters of the two networks, the encoder and the decoder, until they are appropriate. This gives you a rough sense of how input images are represented in the latent space. When the autoencoder is used as a generative model, the encoder part is essentially discarded; only the latent space and the decoder are used.
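As a concrete (if minimal) sketch, an autoencoder of this kind could be written with Keras's Sequential API, which is also used for the GAN later in this article. The layer sizes here are illustrative assumptions, not values from the figure:
from keras.models import Sequential
from keras.layers import Dense
autoencoder = Sequential()
# Encoder: compress the flattened 28x28 image (784 pixels) into a 32-dimensional latent code
autoencoder.add(Dense(32, activation='relu', input_dim=784))
# Decoder: reconstruct the 784 pixels from the latent code
autoencoder.add(Dense(784, activation='sigmoid'))
# Minimize the pixel-wise MAE between input and reconstruction, as described above
autoencoder.compile(optimizer='adam', loss='mae')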
While an ordinary autoencoder learns each latent representation as a fixed array, a variational autoencoder (VAE) learns the parameters that define a distribution over the latent space. An image is then reconstructed by sampling a concrete value from this latent distribution and feeding it to the decoder.
In the figure, the left side shows a normal autoencoder and the right side a variational autoencoder.
For details, this article gives a very clear explanation.
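To make the sampling step concrete, here is a minimal sketch of the reparameterization trick that VAEs use to sample from the latent distribution. The names z_mean and z_log_var are hypothetical stand-ins for the two outputs of the encoder (in a real VAE this step is written with backend tensor ops so it can sit inside the training graph):
import numpy as np
def sample_z(z_mean, z_log_var):
    # Draw epsilon from a standard normal, then shift and scale it so that
    # z is distributed as N(z_mean, exp(z_log_var)). Writing the sampling this
    # way keeps the path from the encoder outputs to z differentiable.
    epsilon = np.random.normal(0, 1, z_mean.shape)
    return z_mean + np.exp(0.5 * z_log_var) * epsilon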
GAN
For the details of how GANs work, see other articles; here I describe the properties that characterize them. In the GAN architecture, both the generator and the discriminator are trained through the discriminator's loss function. The discriminator's own training tries to minimize the discriminator loss over all training data. The generator, on the other hand, tries to maximize the discriminator's loss on the fake samples it produces. In other words, while ordinary neural network training is an optimization problem, GAN training is not optimization but a game in which the generator and discriminator compete. It stabilizes when a Nash equilibrium is reached.
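Formally, this game is usually written as the minimax objective from the original GAN paper (Goodfellow et al., 2014):
min_G max_D V(D, G) = E_x~pdata [log D(x)] + E_z~pz [log(1 − D(G(z)))]
The discriminator pushes V(D, G) up by scoring real images near 1 and fakes near 0, while the generator pushes it down by producing samples for which D(G(z)) approaches 1.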
The GAN training algorithm can be summarized as follows.
In each training iteration:
1. Discriminator training
    a. Randomly select samples from the real data to form a mini-batch x
    b. Make a mini-batch of random vectors z and a mini-batch of fake samples G(z) = x'
    c. Compute the discrimination loss for D(x) and D(x'), and backpropagate the total error to update the discriminator parameters
2. Generator training
    a. Make a mini-batch of random vectors z and a mini-batch of fake samples G(z) = x'
    b. Compute the discrimination loss for D(x'), and backpropagate it to update the generator parameters so that the discrimination loss is maximized
Note that the generator parameters are not updated while training the discriminator in step 1, and the discriminator parameters are not updated while training the generator in step 2!
Using the basic knowledge of GANs covered so far, let's implement a GAN in as simplified a form as possible. (A more practical implementation will be given in a later article.)
This time, we will implement code that generates images from the MNIST data using Keras's Sequential API.
%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from keras.datasets import mnist
from keras.layers import Dense, Flatten, Reshape
from keras.layers.advanced_activations import LeakyReLU
from keras.models import Sequential
from keras.optimizers import Adam
# Input image dimensions
img_rows = 28
img_cols = 28
channels = 1
img_shape = (img_rows, img_cols, channels)
# Dimension of the noise vector that is input to the generator
z_dim = 100
# Generator
def build_generator(img_shape, z_dim):
    model = Sequential()
    model.add(Dense(128, input_dim=z_dim))
    model.add(LeakyReLU(alpha=0.01))
    # tanh keeps outputs in [-1, 1], matching the rescaled MNIST pixels
    model.add(Dense(28 * 28 * 1, activation='tanh'))
    model.add(Reshape(img_shape))
    return model
# Discriminator
def build_discriminator(img_shape):
    model = Sequential()
    model.add(Flatten(input_shape=img_shape))
    model.add(Dense(128))
    model.add(LeakyReLU(alpha=0.01))
    # Sigmoid output: the probability that the input image is real
    model.add(Dense(1, activation='sigmoid'))
    return model
# Combined model: generator followed by discriminator
def build_gan(generator, discriminator):
    model = Sequential()
    model.add(generator)
    model.add(discriminator)
    return model
# Build and compile the discriminator
discriminator = build_discriminator(img_shape)
discriminator.compile(loss='binary_crossentropy',
                      optimizer=Adam(),
                      metrics=['accuracy'])
# Build the generator
generator = build_generator(img_shape, z_dim)
# Freeze the discriminator's parameters while the combined model trains the generator.
# In Keras the trainable flag takes effect when a model is compiled, so the standalone
# discriminator compiled above still trains normally.
discriminator.trainable = False
# Build and compile the GAN model
gan = build_gan(generator, discriminator)
gan.compile(loss='binary_crossentropy', optimizer=Adam())
#Training!
losses = []
accuracies = []
iteration_checkpoints = []
def train(iterations, batch_size, sample_interval):
    (X_train, _), (_, _) = mnist.load_data()  # X_train.shape = (60000, 28, 28)
    # Rescale pixel values from [0, 255] to [-1, 1] to match the generator's tanh output
    X_train = X_train / 127.5 - 1.0
    X_train = np.expand_dims(X_train, axis=3)
    # Labels: 1 for real images, 0 for fake images
    real = np.ones((batch_size, 1))
    fake = np.zeros((batch_size, 1))
    for iteration in range(iterations):
        # Make a randomly picked batch from the real images
        idx = np.random.randint(0, X_train.shape[0], batch_size)
        imgs = X_train[idx]
        # Create a batch of fake images
        z = np.random.normal(0, 1, (batch_size, z_dim))
        gen_imgs = generator.predict(z)
        # Discriminator training
        d_loss_real = discriminator.train_on_batch(imgs, real)
        d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
        d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)
        # Generator training: draw fresh noise and update the generator so that
        # the (frozen) discriminator classifies G(z) as real
        z = np.random.normal(0, 1, (batch_size, z_dim))
        g_loss = gan.train_on_batch(z, real)
        if (iteration + 1) % sample_interval == 0:
            # Record the loss and accuracy values for this iteration
            losses.append((d_loss, g_loss))
            accuracies.append(100.0 * accuracy)
            iteration_checkpoints.append(iteration + 1)
            print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))
            sample_images(generator)
def sample_images(generator, image_grid_rows=4, image_grid_columns=4):
    # Sample random noise vectors
    z = np.random.normal(0, 1, (image_grid_rows * image_grid_columns, z_dim))
    gen_imgs = generator.predict(z)
    # Rescale pixel values from [-1, 1] back to [0, 1] for display
    gen_imgs = 0.5 * gen_imgs + 0.5
    fig, axs = plt.subplots(image_grid_rows,
                            image_grid_columns,
                            figsize=(4, 4),
                            sharey=True,
                            sharex=True)
    cnt = 0
    for i in range(image_grid_rows):
        for j in range(image_grid_columns):
            # Output a grid of generated images
            axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
            axs[i, j].axis('off')
            cnt += 1
iterations = 20000
batch_size = 128
sample_interval = 1000
train(iterations, batch_size, sample_interval)
1000 [D loss: 0.129656, acc.: 96.09%] [G loss: 3.387729]
2000 [D loss: 0.079047, acc.: 97.66%] [G loss: 3.964481]
3000 [D loss: 0.071152, acc.: 97.27%] [G loss: 5.072118]
4000 [D loss: 0.217956, acc.: 91.02%] [G loss: 3.993687]
5000 [D loss: 0.380112, acc.: 86.72%] [G loss: 3.941338]
6000 [D loss: 0.292950, acc.: 89.45%] [G loss: 4.491636]
7000 [D loss: 0.345073, acc.: 85.55%] [G loss: 4.056399]
8000 [D loss: 0.396545, acc.: 86.33%] [G loss: 3.101150]
9000 [D loss: 0.744731, acc.: 70.70%] [G loss: 2.761991]
10000 [D loss: 0.444913, acc.: 80.86%] [G loss: 3.474383]
11000 [D loss: 0.362310, acc.: 82.81%] [G loss: 3.101751]
12000 [D loss: 0.383188, acc.: 84.38%] [G loss: 3.111648]
13000 [D loss: 0.283140, acc.: 89.06%] [G loss: 3.082010]
14000 [D loss: 0.411019, acc.: 81.64%] [G loss: 2.747284]
15000 [D loss: 0.386751, acc.: 82.03%] [G loss: 2.795580]
16000 [D loss: 0.475734, acc.: 80.86%] [G loss: 2.436490]
17000 [D loss: 0.285364, acc.: 89.45%] [G loss: 2.764011]
18000 [D loss: 0.202013, acc.: 91.80%] [G loss: 4.058733]
19000 [D loss: 0.285773, acc.: 86.72%] [G loss: 3.038511]
20000 [D loss: 0.354960, acc.: 81.64%] [G loss: 2.719907]
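Once training has finished, the generator can also be used on its own, without the discriminator, just as only the decoder is needed at generation time in the autoencoder case. A minimal example using the objects defined above:
z = np.random.normal(0, 1, (1, z_dim))
new_img = generator.predict(z)  # shape (1, 28, 28, 1), pixel values in [-1, 1]
plt.imshow(0.5 * new_img[0, :, :, 0] + 0.5, cmap='gray')
plt.axis('off')
plt.show()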
[Generated samples at 1,000 / 2,000 / 10,000 / 20,000 iterations]
At the start of training the samples were nothing but noise, but by the end even this simple two-layer generator could produce fairly realistic handwritten digits. However, the images produced by this simple GAN show white speckles in the background, which makes it immediately obvious that they are not real handwriting. Next time, I would like to implement a DCGAN, which uses convolutions, to address this weakness!