DCGAN (Deep Convolutional GAN)

What is DCGAN?

A DCGAN is a GAN in which both the generator and the discriminator are built from convolutional neural networks, instead of the simple two-layer feedforward networks used for the simple GAN in the previous article.

Batch Normalization

The DCGAN in this article uses batch normalization. For a detailed explanation, this person's article is very easy to follow.

Briefly, the main advantages of batch normalization are:

Training can proceed faster
Training is less dependent on the initial weight values
Overfitting can be suppressed

And so on. In this implementation, keras.layers.BatchNormalization takes care of computing the mini-batch statistics and updating its parameters behind the scenes. Let's actually implement DCGAN! The overall flow is almost the same as in the previous article.
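As a rough illustration of what happens behind the scenes, here is a minimal NumPy sketch of the training-time computation: each feature is normalized with the mean and variance of the current mini-batch, then scaled and shifted. The function name batch_norm_forward and the parameters gamma, beta, and eps are names chosen for this sketch, not part of the article's code. The Keras layer also learns gamma and beta and keeps moving averages for inference, which this sketch leaves out.

```python
import numpy as np

def batch_norm_forward(x, gamma=1.0, beta=0.0, eps=1e-3):
  """Normalize each feature (column) of a mini-batch x with shape (batch, features)."""
  mean = x.mean(axis=0)                    # per-feature mean over the mini-batch
  var = x.var(axis=0)                      # per-feature variance over the mini-batch
  x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, roughly unit variance
  return gamma * x_hat + beta              # scale and shift (fixed here, learned in Keras)

# Hypothetical mini-batch: 128 samples, 4 features, far from zero mean and unit variance
rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(128, 4))
normalized = batch_norm_forward(batch)

print(normalized.mean(axis=0))  # close to 0 for every feature
print(normalized.std(axis=0))   # close to 1 for every feature
```

Whatever scale the input arrives at, the next layer sees values in a similar range, which is one intuition for why training depends less on the initial weights.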

It's time to implement!

1. Various imports

#First of all, import

%matplotlib inline

import matplotlib.pyplot as plt
import numpy as np

from keras.datasets import mnist
# Import every layer from keras.layers directly; the advanced_activations and
# convolutional submodule paths are not available in newer Keras releases
from keras.layers import (Activation, BatchNormalization, Conv2D, Conv2DTranspose,
                          Dense, Dropout, Flatten, LeakyReLU, Reshape)
from keras.models import Sequential
from keras.optimizers import Adam

2. Setting the input dimension of the model

#Model input dimension settings

img_rows = 28
img_cols = 28
channels = 1

img_shape = (img_rows, img_cols, channels)

#The dimension of the noise vector used as the input to the generator
z_dim = 100

3. Implementation of generator

Since the generator produces an image from the noise vector z, we use transposed convolution. In other words, in the figure below, the leftmost image is generated from the rightmost z vector.

[Figure: generator architecture, from the noise vector z (right) to the generated image (left)]

The specific steps are summarized below.

Pass the noise vector through a fully connected layer and reshape the result into a 7x7x256 tensor
Convert 7x7x256 to 14x14x128 with a transposed convolution layer
Perform batch normalization and apply Leaky ReLU
Convert 14x14x128 to 14x14x64 with a transposed convolution layer. Height and width do not change in this step
Perform batch normalization and apply Leaky ReLU
Convert 14x14x64 to the output image size 28x28x1 with a transposed convolution layer
Apply the tanh function

For the parameters of Conv2DTranspose, I referred to this article.
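The shape changes in the steps above can be checked with a little arithmetic. With padding='same', a transposed convolution multiplies the height and width by the stride, and the number of filters becomes the number of output channels. The helper conv_transpose_same below is a hypothetical name used only for this sketch, which does not need Keras to run.

```python
def conv_transpose_same(shape, filters, strides):
  """Output shape of a transposed convolution with padding='same'."""
  height, width, _ = shape
  return (height * strides, width * strides, filters)

# (filters, strides) for the three transposed convolution steps listed above
layers = [(128, 2), (64, 1), (1, 2)]

shape = (7, 7, 256)  # tensor shape after the fully connected layer and reshape
shapes = [shape]
for filters, strides in layers:
  shape = conv_transpose_same(shape, filters, strides)
  shapes.append(shape)

print(shapes)
# [(7, 7, 256), (14, 14, 128), (14, 14, 64), (28, 28, 1)]
```

The stride of 1 in the middle step is what keeps the size at 14x14, and the final shape matches img_shape.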

#Generator

def build_generator(z_dim):
  model = Sequential()

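  # Noise vector -> fully connected layer -> 7x7x256 tensor
  model.add(Dense(256*7*7, input_dim=z_dim))
  model.add(Reshape((7, 7, 256)))

  # 7x7x256 -> 14x14x128
  model.add(Conv2DTranspose(128, kernel_size=3, strides=2, padding='same'))
  model.add(BatchNormalization())
  model.add(LeakyReLU(alpha=0.01))

  # 14x14x128 -> 14x14x64 (strides=1 keeps height and width unchanged)
  model.add(Conv2DTranspose(64, kernel_size=3, strides=1, padding='same'))
  model.add(BatchNormalization())
  model.add(LeakyReLU(alpha=0.01))

  # 14x14x64 -> 28x28x1, with tanh so that pixel values fall in [-1, 1]
  model.add(Conv2DTranspose(1, kernel_size=3, strides=2, padding='same'))
  model.add(Activation('tanh'))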

  return model

4. Implementation of discriminator

The discriminator uses the familiar CNN structure. Roughly speaking, it takes image data as input, passes it through convolution layers, and finally outputs the probability that the image is real. Please check the code below for details.
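As a quick sanity check on the layer sizes used in the code below, the following sketch traces how the image shrinks through the three convolution layers. It assumes the Keras rule for padding='same', where the output size is the input size divided by the stride, rounded up. The helper conv_same is a hypothetical name used only here.

```python
import math

def conv_same(shape, filters, strides):
  """Output shape of a convolution with padding='same'."""
  height, width, _ = shape
  return (math.ceil(height / strides), math.ceil(width / strides), filters)

shape = (28, 28, 1)  # input image shape
disc_shapes = [shape]
for filters in (32, 64, 128):  # filter counts of the three Conv2D layers
  shape = conv_same(shape, filters, strides=2)
  disc_shapes.append(shape)

# Number of values fed to the final Dense(1) layer after Flatten
flattened = shape[0] * shape[1] * shape[2]

print(disc_shapes)  # [(28, 28, 1), (14, 14, 32), (7, 7, 64), (4, 4, 128)]
print(flattened)    # 2048
```

The last convolution rounds 7 / 2 up to 4, so the sigmoid output is computed from 4 x 4 x 128 = 2048 features.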

#Discriminator

def build_discriminator(img_shape):

  model = Sequential()

  model.add(Conv2D(32, kernel_size=3, strides=2, input_shape=img_shape, padding='same'))
  model.add(LeakyReLU(alpha=0.01))

  model.add(Conv2D(64, kernel_size=3, strides=2, padding="same"))
  model.add(BatchNormalization())
  model.add(LeakyReLU(alpha=0.01))

  model.add(Conv2D(128, kernel_size=3, strides=2, padding="same"))
  model.add(BatchNormalization())
  model.add(LeakyReLU(alpha=0.01))

  model.add(Flatten())
  model.add(Dense(1, activation="sigmoid"))

  return model

5. Compile DCGAN

#DCGAN compilation
def build_gan(generator, discriminator):

  model = Sequential()

  model.add(generator)
  model.add(discriminator)

  return model

discriminator = build_discriminator(img_shape)
discriminator.compile(loss="binary_crossentropy", optimizer=Adam(), metrics=["accuracy"])

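generator = build_generator(z_dim)

# Freeze the discriminator inside the combined model so that
# only the generator is updated when training gan
discriminator.trainable = False

gan = build_gan(generator, discriminator)
gan.compile(loss="binary_crossentropy", optimizer=Adam())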

6. Training settings

#Training

losses = []
accuracies = []
iteration_checkpoints = []

def train(iterations, batch_size, sample_interval):
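  # Load MNIST; only the training images are needed
  (X_train, _), (_, _) = mnist.load_data()

  # Rescale pixel values from [0, 255] to [-1, 1] to match the generator's tanh output
  X_train = X_train / 127.5 - 1.0
  # Add a channel axis: (60000, 28, 28) -> (60000, 28, 28, 1)
  X_train = np.expand_dims(X_train, axis=3)

  # Labels: 1 for real images, 0 for generated (fake) images
  real = np.ones((batch_size, 1))
  fake = np.zeros((batch_size, 1))

  for iteration in range(iterations):

    # Train the discriminator on a random batch of real images
    # and a batch of generated images
    idx = np.random.randint(0, X_train.shape[0], batch_size)
    imgs = X_train[idx]

    z = np.random.normal(0, 1, (batch_size, z_dim))
    gen_imgs = generator.predict(z)

    d_loss_real = discriminator.train_on_batch(imgs, real)
    d_loss_fake = discriminator.train_on_batch(gen_imgs, fake)
    d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)

    # Train the generator through the combined model, labeling its output as real
    z = np.random.normal(0, 1, (batch_size, z_dim))
    g_loss = gan.train_on_batch(z, real)

    # Show samples right after the first training step
    if iteration == 0:
      sample_images(generator)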

    if ((iteration + 1) % sample_interval == 0):

      losses.append((d_loss, g_loss))
      accuracies.append(100 * accuracy)
      iteration_checkpoints.append(iteration+1)

      print("%d [D loss: %f, acc.: %.2f%%] [G loss: %f]" %
                  (iteration + 1, d_loss, 100.0 * accuracy, g_loss))
      sample_images(generator)
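One detail of the loop above is worth spelling out. Because the discriminator was compiled with metrics=["accuracy"], train_on_batch returns the loss together with the accuracy, and 0.5 * np.add(...) averages the real-batch and fake-batch results element by element. The numbers in this small sketch are made up purely for illustration.

```python
import numpy as np

# Hypothetical [loss, accuracy] pairs, standing in for the values
# returned by discriminator.train_on_batch
d_loss_real = [0.75, 0.5]
d_loss_fake = [0.25, 1.0]

# Element-wise average: first entry is the mean loss, second the mean accuracy
d_loss, accuracy = 0.5 * np.add(d_loss_real, d_loss_fake)

print(d_loss, accuracy)  # 0.5 0.75
```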

7. Image display

def sample_images(generator, image_grid_rows=4, image_grid_columns=4):

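  # Sample one noise vector per grid cell and generate images from them
  z = np.random.normal(0, 1, (image_grid_rows * image_grid_columns, z_dim))
  gen_imgs = generator.predict(z)
  # Rescale pixel values from [-1, 1] to [0, 1] for display
  gen_imgs = 0.5 * gen_imgs + 0.5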

  fig, axs = plt.subplots(image_grid_rows,
                           image_grid_columns,
                           figsize=(4,4),
                           sharey=True,
                           sharex=True
                           )
  cnt = 0
  for i in range(image_grid_rows):
    for j in range(image_grid_columns):
      axs[i, j].imshow(gen_imgs[cnt, :, :, 0], cmap='gray')
      axs[i, j].axis('off')
      cnt += 1
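The rescaling in sample_images undoes the preprocessing applied in train, up to a factor of 255: training images are mapped from [0, 255] to [-1, 1] to match the tanh output of the generator, and generated images are mapped from [-1, 1] to [0, 1] for display. A minimal check with three sample pixel values (chosen here only for illustration):

```python
import numpy as np

pixels = np.array([0.0, 127.5, 255.0])  # black, mid-gray, white in the original scale

scaled = pixels / 127.5 - 1.0  # preprocessing used in train: [0, 255] -> [-1, 1]
display = 0.5 * scaled + 0.5   # rescaling used in sample_images: [-1, 1] -> [0, 1]

print(scaled)   # [-1.  0.  1.]
print(display)  # [0.  0.5 1. ]
```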

8. Let's train!

iterations = 20000
batch_size = 128
sample_interval = 1000
train(iterations, batch_size, sample_interval)

Results

[Image: initial noise]
[Image: after 1,000 iterations]
[Image: after 10,000 iterations]
[Image: after 20,000 iterations]

How did it turn out? I was able to generate images at a level that is hard to distinguish from the real handwritten digits in the MNIST dataset. In the simple GAN from the previous article, the generated images contained pixel-level noise. This time, because DCGAN can capture the relationships between neighboring pixels, it generates clean images without that noise.