Training a Wasserstein GAN with a Keras model and TensorFlow optimization

Introduction


- What this article covers: training a Wasserstein GAN by defining the model in Keras and optimizing with TensorFlow
- Prerequisite knowledge: the basic training procedure of GANs
- Who doesn't need this article: those who can freely code this with any deep learning library, or from scratch

Deep generative models, the so-called GANs, have been a hot topic recently, and I have been experimenting with them in various ways. As a library, Keras is intuitive and easy to understand.

An ordinary DCGAN or Wasserstein GAN can be implemented and trained quickly even in Keras, but once you move to variants with modified losses or optimization details, you may find yourself wondering "how am I supposed to write this in Keras?" That happened to me with the Gradient Penalty proposed in Improved Training of Wasserstein GANs. I will omit the explanation of Gradient Penalty here, but it made me think: why not define the model in Keras and train it with TensorFlow? I got this working with a plain Wasserstein GAN, so I decided to write it up. I hope it serves as a reference. The code is available here.

Requirements

Model construction (Keras)


The Generator and Discriminator are written in Keras. It's easy (see model.py). The Generator converts a random-number input into an image with deconvolution layers (filter size 4x4, stride 2x2). The Discriminator outputs a function value from the input image with convolution layers (filter size 4x4, stride 2x2). I wrote them with Keras's functional API, but the Sequential model (the style that stacks layers with model.add) should work just as well.
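For reference, here is a minimal sketch of what the Generator and Discriminator in model.py could look like. The actual model.py in the repository may differ; the 64x64 RGB image size, the 100-dimensional noise input, and the layer widths are my assumptions.

# A minimal sketch (not necessarily the repository's model.py):
# 64x64 RGB images, 100-dim noise, 4x4 filters with stride 2,
# and a linear (no sigmoid) Discriminator output.
from keras.layers import (Input, Dense, Reshape, Flatten, Activation,
                          Conv2D, Conv2DTranspose, BatchNormalization, LeakyReLU)
from keras.models import Model

def Generator(z_dim=100):
    z = Input(shape=(z_dim,))
    h = Dense(4 * 4 * 512)(z)
    h = Reshape((4, 4, 512))(h)
    for ch in [256, 128, 64]:                                 # 4 -> 8 -> 16 -> 32
        h = Conv2DTranspose(ch, 4, strides=2, padding='same')(h)
        h = BatchNormalization()(h)
        h = Activation('relu')(h)
    x = Conv2DTranspose(3, 4, strides=2, padding='same')(h)   # 32 -> 64
    x = Activation('tanh')(x)
    return Model(z, x)

def Discriminator(image_size=64):
    x = Input(shape=(image_size, image_size, 3))
    h = x
    for ch in [64, 128, 256, 512]:                            # 64 -> 32 -> 16 -> 8 -> 4
        h = Conv2D(ch, 4, strides=2, padding='same')(h)
        h = LeakyReLU(0.2)(h)
    h = Flatten()(h)
    d = Dense(1)(h)    # raw function value, not a probability
    return Model(x, d)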

Training (TensorFlow)


In Wasserstein GAN, the Discriminator outputs a real-valued function value rather than a judgment of whether the input image is real or fake, and the objective is

\min_{G} \max_{D} \; \mathbb{E}_{x\sim p(x)}[D(x)] - \mathbb{E}_{z \sim p(z)}[D(G(z))]

where D is constrained to be 1-Lipschitz. This makes the gradients more stable during training than in the original GAN, which minimizes the Jensen-Shannon divergence; for details, please refer to the original paper (https://arxiv.org/abs/1701.07875) and other commentaries.
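In practice this single objective is split into the two losses that D and G each minimize; these correspond to d_loss and g_loss in the code below:

L_D = -\left(\mathbb{E}_{x\sim p(x)}[D(x)] - \mathbb{E}_{z \sim p(z)}[D(G(z))]\right), \qquad L_G = -\mathbb{E}_{z \sim p(z)}[D(G(z))]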

Normally in Keras you would pick a built-in loss function or define your own, then call model.compile; in this article, however, the optimization is done with TensorFlow.
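For contrast, the usual Keras-only route would look roughly like the sketch below. It is not used in this article; `disc` is assumed to be the Discriminator model, and the +1/-1 labels fed as y_true just need a sign convention consistent between the Discriminator and Generator updates.

# Not used in this article: the usual Keras-only route with a custom loss.
# Labels of +1 / -1 are fed as y_true so that K.mean(y_true * y_pred)
# reproduces the Wasserstein loss above (up to a consistent sign convention).
from keras import backend as K
from keras.optimizers import RMSprop

def wasserstein_loss(y_true, y_pred):
    return K.mean(y_true * y_pred)

disc.compile(optimizer=RMSprop(lr=5e-5), loss=wasserstein_loss)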

main.py


import numpy as np
import tensorflow as tf
from keras import backend as K


class WassersteinGAN:

    def __init__(self, ...):

        self.image_size = 64
        self.gen = generator    # Generator defined in model.py
        self.disc = discriminator    # Discriminator defined in model.py

        # Real images and input noise are defined as placeholders.
        self.x = tf.placeholder(tf.float32, (None, self.image_size, self.image_size, 3), name='x')  # real images
        self.z = tf.placeholder(tf.float32, (None, 100), name='z')  # input noise
        self.x_ = self.gen(self.z)  # fake images generated from the input noise

        self.d = self.disc(self.x)    # output for real images
        self.d_ = self.disc(self.x_)  # output for fake images

        self.d_loss = -(tf.reduce_mean(self.d) - tf.reduce_mean(self.d_))  # Discriminator objective
        self.g_loss = -tf.reduce_mean(self.d_)  # Generator objective

        # Set the optimizers. The Generator's learning rate is set a little smaller.
        self.d_opt = tf.train.RMSPropOptimizer(learning_rate=5e-5)\
                     .minimize(self.d_loss, var_list=self.disc.trainable_weights)
        self.g_opt = tf.train.RMSPropOptimizer(learning_rate=1e-5)\
                     .minimize(self.g_loss, var_list=self.gen.trainable_weights)

        # Set the TensorFlow session.
        self.sess = tf.Session()
        K.set_session(self.sess)  # needed so that Keras and TensorFlow share the same session

    def train(self, ...):

With these definitions in place, we actually feed the data and train.

In my code, I made a class for fetching input images and input noise (misc/dataIO.py, InputSampler). The sampler that appears in the code below is an instance of this InputSampler: the image_sample method returns a mini-batch of real images, and the noise_sample method returns a mini-batch of input noise. The reload method splits the large set of training images into chunks and holds one at a time; since the frequently used face-image dataset CelebA contains more than 200,000 images, I made this design for memory reasons.
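A rough sketch of the sampler interface used below is shown here; the actual misc/dataIO.py in the repository differs, and the uniform noise, image scaling, and chunk handling are my assumptions.

import numpy as np

class InputSampler:
    def __init__(self, images, z_dim=100):
        self.images = images          # preloaded chunk of training images, scaled to [-1, 1]
        self.z_dim = z_dim

    def image_sample(self, batch_size):
        # mini-batch of real images, shape (batch_size, 64, 64, 3)
        idx = np.random.randint(0, len(self.images), batch_size)
        return self.images[idx]

    def noise_sample(self, batch_size, z_dim=None):
        # mini-batch of input noise
        z_dim = z_dim or self.z_dim
        return np.random.uniform(-1., 1., size=(batch_size, z_dim)).astype(np.float32)

    def reload(self):
        # in the real code this swaps in the next chunk of CelebA images
        # (kept small for memory reasons); omitted here
        pass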

Also, if the Keras model includes BatchNormalization, you need to feed K.learning_phase() when running the graph. That is included in the code below.

main.py


class WassersteinGAN:

    def __init__(self, ...):
        # -- omitted (see above) --

    def train(self, ...):

        for e in range(epochs):
            for batch in range(num_batches):

                # train the Discriminator several times per Generator update
                for _ in range(5):
                    # weight clipping to enforce Lipschitz continuity
                    d_weights = [np.clip(w, -0.01, 0.01) for w in self.disc.get_weights()]
                    self.disc.set_weights(d_weights)

                    # mini-batch of real images
                    bx = sampler.image_sample(batch_size)
                    # mini-batch of input noise
                    bz = sampler.noise_sample(batch_size)
                    # feed the placeholders and K.learning_phase() to update the Discriminator
                    self.sess.run(self.d_opt, feed_dict={self.x: bx, self.z: bz,
                                                         K.learning_phase(): 1})

                bz = sampler.noise_sample(batch_size)
                # feed the placeholders and K.learning_phase() to update the Generator
                self.sess.run(self.g_opt, feed_dict={self.z: bz,
                                                     K.learning_phase(): 1})

                # when printing the losses
                d_loss, g_loss = self.sess.run([self.d_loss, self.g_loss],
                                               feed_dict={self.x: bx, self.z: bz,
                                                          K.learning_phase(): 1})
                print('epoch : {}, batch : {}, d_loss : {}, g_loss : {}'\
                      .format(e, batch, d_loss, g_loss))

~~Since the model is written in Keras, its parameters can be saved the Keras way. I usually save them every epoch.~~ ~~*If you try to save the whole model the Keras way inside a TensorFlow session, it may not work; I will check.~~ Model parameters can be saved with the Keras method `model.save_weights()`, and saved parameters can be loaded with `model.load_weights()`.

main.py


class WassersteinGAN:

    def __init__(self, ...):
        # -- omitted (see above) --

    def train(self, ...):

        for e in range(epochs):
            for batch in range(num_batches):
                # -- omitted (see above) --
                pass

            # parameters can be saved the Keras way, once per epoch
            self.gen.save_weights('path/to/g_{}epoch.h5'.format(e))
            self.disc.save_weights('path/to/d_{}epoch.h5'.format(e))
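
Conversely, the saved weights can be restored with `model.load_weights()` for generation. A minimal sketch follows; the epoch number in the file name, the `Generator` constructor, and the 100-dimensional noise are assumptions.

import numpy as np
from model import Generator   # assuming model.py exposes a Generator constructor

gen = Generator()
gen.load_weights('path/to/g_10epoch.h5')   # hypothetical file saved by the loop above
z = np.random.uniform(-1., 1., size=(16, 100)).astype(np.float32)
fake_images = gen.predict(z)               # in [-1, 1] if the Generator ends with tanh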

This trained reasonably well, but the generated images were a bit blurry. In general, Wasserstein GAN seems to produce slightly blurrier images than DCGAN, but I feel it can generate clean results with tuning, such as adjusting the learning rates.

Conclusion


In this way, I was able to build the model with Keras and train it with TensorFlow. Many recently proposed methods come with customized optimization schemes, so I hope this is of some help when reproducing them. Please feel free to point out anything that could be improved.

Postscript


I also wrote code that trains a simple CNN with Keras and TensorFlow in the same way. It is simpler than the GAN, so here may be easier to follow.
