Write DCGAN in Keras

Without having learned my lesson, I am continuing the follow-up experiments from the previous article. I wanted to write a DCGAN, so I did, and along the way I picked up a few minor findings that I will write down here. The contents are mainly as follows.

- Keras tips
- The DCGAN tinkering process

See other articles for a description of DCGAN itself. I mainly referred to the following.

- Let the computer draw an illustration using Chainer
- Automatic face illustration generation with Chainer
- keras-dcgan

Keras related

This part is only about Keras, so skip it if you're not interested.

Keras trainable

If you search for Keras DCGAN, keras-dcgan comes up at the top. Peeking at it for reference, it appears to switch the trainable value during training so that the Discriminator's weights are not updated while the Generator is being trained. If you have read the Keras documentation, you know that specifying trainable = False prevents a layer's weights from being updated, but two things bothered me after reading that code. First, even if you set trainable on a Model, the layers inside it remain trainable = True and their weights are still updated. I mentioned this briefly in the previous article. Second, the Keras documentation FAQ says the following about freezing weights.

Additionally, you can set the trainable property of a layer to True or False after instantiation. For this to take effect, you will need to call compile() on your model after modifying the trainable property. Here's an example:

keras-dcgan does not recompile after changing trainable, so I suspect the change is not reflected in training and things proceed incorrectly (the example there does output fairly decent images, and I haven't run it myself, so what actually happens is a mystery).

What to do

I then got fed up at the thought of setting trainable on every layer, over and over at each alternating step, so I tried various things and found a better way. As mentioned above, Keras is specified so that a change to trainable is not reflected until you compile, and I wondered whether that could be put to good use (or whether it is simply not well designed). Below is a partial excerpt of the experimental code; the full version is available separately.

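from keras.models import Sequential
from keras.layers import Dense

# modelA plays the role of the Generator, modelB that of the Discriminator
modelA = Sequential([
    Dense(10, input_dim=100, activation='sigmoid')
])

modelB = Sequential([
    Dense(100, input_dim=10, activation='sigmoid')
])

# compile modelB while all of its layers are still trainable
modelB.compile(optimizer='adam', loss='binary_crossentropy')

# set_trainable (defined in the full code) sets trainable on every layer of the model
set_trainable(modelB, False)
connected = Sequential([modelA, modelB])
connected.compile(optimizer='adam', loss='binary_crossentropy')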

This is DCGAN stripped down to the bare minimum. Immediately after instantiation, everything is trainable. Compiling modelB in this state lets modelB.fit update its weights. Next, set_trainable sets trainable = False on all layers of modelB, and then connected, the model that chains modelA and modelB, is compiled. What happens to modelB's weights if we now call fit on modelB and on connected?

w0 = np.copy(modelB.layers[0].get_weights()[0])

connected.fit(X1, X1)
w1 = np.copy(modelB.layers[0].get_weights()[0])
print('Freezed in "connected":', np.array_equal(w0, w1))
# Freezed in "connected": True

modelB.fit(X2, X1)
w2 = np.copy(modelB.layers[0].get_weights()[0])
print('Freezed in "modelB":', np.array_equal(w1, w2))
# Freezed in "modelB": False

connected.fit(X1, X1)
w3 = np.copy(modelB.layers[0].get_weights()[0])
print('Freezed in "connected":', np.array_equal(w2, w3))
# Freezed in "connected": True

Surprisingly (?), the output of the code above is as shown in the comments: the settings in effect at compile time stay alive, so modelB's weights are updated when training modelB but not when training connected. This means that if you set things up correctly once, you never need to change them again. You don't have to toggle trainable at every step of the training loop, which keeps it clean. (As an aside, keras-dcgan contains many seemingly unnecessary compile calls, which makes me suspicious of it.)
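The set_trainable helper itself is not shown in the excerpt. As a minimal sketch, assuming it does nothing more than walk the model and set trainable on every layer it finds (the real helper lives in the full code and may differ), it could look like the following. FakeLayer and FakeModel are hypothetical stand-ins so the snippet runs without Keras; with real Keras models, only the function would be needed.

```python
def set_trainable(model, trainable):
    """Hypothetical helper: set `trainable` on a model and on all nested layers."""
    model.trainable = trainable
    # Models expose their children via `.layers`; plain layers have no such attribute.
    for layer in getattr(model, "layers", []):
        set_trainable(layer, trainable)


# Stand-in classes (not Keras) so the sketch is runnable on its own.
class FakeLayer:
    def __init__(self):
        self.trainable = True


class FakeModel(FakeLayer):
    def __init__(self, layers):
        super().__init__()
        self.layers = layers


inner = FakeModel([FakeLayer(), FakeLayer()])
outer = FakeModel([FakeLayer(), inner])
set_trainable(outer, False)
print(all(not layer.trainable for layer in inner.layers))
```

The recursion is there because of the first problem described earlier: setting trainable on the Model alone leaves the layers inside it trainable.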

Keras Batch Normalization

With the problem from the previous section solved, I wrote the code. What I wrote without much thought looked roughly like this.

discriminator = Sequential([
    Convolution2D(64, 3, 3, border_mode='same', subsample=(2,2), input_shape=[32, 32, 1]),
    LeakyReLU(),
    Convolution2D(128, 3, 3, border_mode='same', subsample=(2,2)),
    BatchNormalization(),
    LeakyReLU(),
    Convolution2D(256, 3, 3, border_mode='same', subsample=(2,2)),
    BatchNormalization(),
    LeakyReLU(),
    Flatten(),
    Dense(2048),
    BatchNormalization(),
    LeakyReLU(),
    Dense(1, activation='sigmoid')
], name="discriminator")

generator = Sequential([
    # (omitted)
])

# setup models

print("setup discriminator")
opt_d = Adam(lr=1e-5, beta_1=0.1)
discriminator.compile(optimizer=opt_d, 
                      loss='binary_crossentropy', 
                      metrics=['accuracy'])

print("setup dcgan")
set_trainable(discriminator, False)
dcgan = Sequential([generator, discriminator])
opt_g = Adam(lr=2e-4, beta_1=0.5)
dcgan.compile(optimizer=opt_g, 
              loss='binary_crossentropy', 
              metrics=['accuracy'])

When I tried to train this, I got the following error.

Exception: You are attempting to share a same `BatchNormalization` layer across different data flows. This is not possible. You should use `mode=2` in `BatchNormalization`, which has a similar behavior but is shareable (see docs for a description of the behavior).

The message says to set mode=2 on BatchNormalization. This error arises when you try to reuse the same layer elsewhere, as in the example in the Shared Layers section of the documentation. I think the point is that sharing a BatchNormalization layer can cause trouble, for example when the two inputs have different distributions. In the code above, both the Discriminator alone and Generator + Discriminator are compiled, so the layers are apparently considered shared. In the DCGAN, the Discriminator inside Generator + Discriminator has trainable = False and takes no part in learning, so I think specifying mode=2 is fine. For what mode=2 actually does, see the BatchNormalization page of the documentation.

Train step with Keras + DCGAN

Now that I had a model, I started training it in earnest. Following keras-dcgan, I alternated between training the Generator on a batch of random vectors and training the Discriminator on a batch made of images generated from those same vectors plus real images, and a mysterious phenomenon occurred. Specifically, it was the same as what is described in [this article](http://qiita.com/rezoolab/items/5cc96b6d31153e0c86bc#%E3%83%91%E3%83%A9%E3%83%A1%E3%83%BC%E3%82%BF%E8%AA%BF%E6%95%B4%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6%E3%81%AE%E6%84%9F%E6%83%B3).

There are two ways to update the Discriminator's parameters: update on the real-image batch and the fake-image batch together as a single batch, or explicitly split the loss function in two and update. Without a Batch Normalization layer the final gradient is the same either way, but with one there is a clear difference (it's complicated). At first I used the former method and got the strange result that both G and D had a 100% win rate. I initially suspected a bug on the Chainer side, but when I finally paid attention to the nature of BN and implemented the latter, training converged cleanly (it wasn't a bug).
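To make the difference concrete, here is a small NumPy sketch of the normalization step alone. It is not taken from either article: the batch_norm function omits the learned scale and shift, and the "real" and "fake" activations are made-up 1-D values drawn from two different distributions. It only shows that the statistics differ between the two update styles; it does not explain the 100% win rate.

```python
import numpy as np


def batch_norm(x, eps=1e-5):
    """Normalize each feature with the mean/variance of the given batch
    (training-time BN without the learned scale and shift)."""
    return (x - x.mean(axis=0)) / np.sqrt(x.var(axis=0) + eps)


rng = np.random.default_rng(0)
# Made-up activations: "real" and "fake" samples come from different distributions.
real = rng.normal(loc=2.0, scale=1.0, size=(64, 1))
fake = rng.normal(loc=-2.0, scale=1.0, size=(64, 1))

# (a) One mixed batch: statistics are computed over real and fake together,
#     so the two groups remain separated after normalization.
mixed = batch_norm(np.concatenate([real, fake]))
mixed_real, mixed_fake = mixed[:64], mixed[64:]

# (b) Two separate batches: each group is normalized with its own statistics,
#     so both end up centered at zero.
sep_real, sep_fake = batch_norm(real), batch_norm(fake)

print("mixed   :", mixed_real.mean(), mixed_fake.mean())
print("separate:", sep_real.mean(), sep_fake.mean())
```

In the mixed case the normalized activations themselves still reveal which group a sample belongs to, while in the separate case that offset is removed, so the layers after BN see quite different inputs depending on how the batch was assembled.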

I understand that putting real images and generated images in the same batch and normalizing them together is not good, but I have no idea why this particular phenomenon occurs (incidentally, the discriminator in keras-dcgan contains no BN, so it apparently does not run into this issue). As the quote says, it should be enough to "explicitly split the loss function in two and update", but I haven't grasped what explicitly splitting the loss function in two means. That article came with no source code, so instead let's look at the source code from another article. The relevant part:

DCGAN.py


# train generator
z = Variable(xp.random.uniform(-1, 1, (batchsize, nz), dtype=np.float32))
x = gen(z)
yl = dis(x)
L_gen = F.softmax_cross_entropy(yl, Variable(xp.zeros(batchsize, dtype=np.int32)))
L_dis = F.softmax_cross_entropy(yl, Variable(xp.ones(batchsize, dtype=np.int32)))

# train discriminator

x2 = Variable(cuda.to_gpu(x2))
yl2 = dis(x2)
L_dis += F.softmax_cross_entropy(yl2, Variable(xp.zeros(batchsize, dtype=np.int32)))

#print "forward done"

o_gen.zero_grads()
L_gen.backward()
o_gen.update()
            
o_dis.zero_grads()
L_dis.backward()
o_dis.update()
            
sum_l_gen += L_gen.data.get()
sum_l_dis += L_dis.data.get()
            
#print "backward done"

I don't know Chainer at all, but it appears to compute the loss separately for the real images and the generated images and then update the weights from both. I wanted to do the same thing in Keras, but I couldn't find a way, even around the Functional API. In the end, as a compromise, I decided to split the real images and the generated images and call train_on_batch on each separately (as I explain later, with BN in the Discriminator training never succeeded even once, so there is a good chance this approach is not correct).
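As a rough sketch of that compromise, the Discriminator update can be written as two train_on_batch calls, one per kind of image. The function name and the stub classes are hypothetical and exist only so the snippet runs without Keras; the labels follow the convention used in this article's training code (0 = real, 1 = generated). Note that this performs two separate weight updates, which is not the same thing as the single update from a summed loss in the Chainer code.

```python
import numpy as np


def train_discriminator_separately(discriminator, generator, X_batch, rnd_batch):
    """Update D with two train_on_batch calls, one per kind of image.

    Labels follow this article's convention: 0 = real, 1 = generated.
    """
    generated = generator.predict(rnd_batch)
    result_real = discriminator.train_on_batch(X_batch, np.zeros(len(X_batch)))
    result_fake = discriminator.train_on_batch(generated, np.ones(len(generated)))
    return result_real, result_fake


# Stand-ins (not Keras) so the sketch runs on its own; pass real models instead.
class StubGenerator:
    def predict(self, z):
        return np.zeros((len(z), 32, 32, 1))


class StubDiscriminator:
    def __init__(self):
        self.calls = []

    def train_on_batch(self, x, y):
        # Record what each update saw: the batch shape and the (constant) label.
        self.calls.append((x.shape, float(np.mean(y))))
        return [0.0, 0.0]  # placeholder for [loss, accuracy]


d = StubDiscriminator()
X_batch = np.ones((8, 32, 32, 1))
rnd_batch = np.random.uniform(-1, 1, size=(8, 4, 4, 4))
train_discriminator_separately(d, StubGenerator(), X_batch, rnd_batch)
print(d.calls)
```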

DCGAN

Setup

Environment

Ubuntu 16.04, Core i7 2600K, GeForce GTX 1060 6GB

Dataset

The dataset consists of 11664 images, made from the 54 32x32 grayscale images of abyss characters used in the previous article by forcibly augmenting them with scaling, shifts up, down, left and right, and brightness changes ([only the original images are kept in the repository](https://github.com/t-ae/abyss-letter2/tree/master/abyss_letters)). For that reason, assume that what follows does not generalize.

Various measurements

I don't know how a DCGAN is supposed to be evaluated, so I measured it in ways that seemed reasonable to me.

- train_loss and train_accuracy for each step
- val_loss and val_accuracy for each epoch
- Whether it can interpolate between random vectors

The last one is explained in detail in [this article](http://qiita.com/mattya/items/e5bfe5e04b9d2f0bbd47#z%E3%81%AE%E7%A9%BA%E9%96%93%E3%82%92%E8%AA%BF%E3%81%B9%E3%82%8B), so please read it.

When Batch Normalization is put in Discriminator

First, as in the paper, I tried putting Batch Normalization in both the Discriminator and the Generator. The images below plot train_loss and train_accuracy.

norm_high_d.png norm_low_d.png

I tried various optimizer parameters, but training always fell into an obviously unnatural state like the one toward the end of the loss plot. Below is an example of the images output in that state.

norm_low_d_out.png

Other runs produced all-black output; in every case, only near-identical images were generated regardless of the random input. The likely cause is the train step described above, which I adapted to fit Keras's constraints, but since I couldn't fix it, I followed keras-dcgan and decided not to use BN in the Discriminator. If you want to follow the paper faithfully, or want the convenience of reusing earlier implementations, I think you are better off with something other than Keras.

After BN exclusion

After I removed BN from the Discriminator, it started producing decent images (though parameter tuning is still hard). The code is in the repository, but I'm still tinkering with it, so it may differ from what is shown below.

Model

train_dcgan.py


# define models
discriminator = Sequential([
    Convolution2D(64, 3, 3, border_mode='same', subsample=(2,2), input_shape=[32, 32, 1]),
    LeakyReLU(),
    Convolution2D(128, 3, 3, border_mode='same', subsample=(2,2)),
    LeakyReLU(),
    Convolution2D(256, 3, 3, border_mode='same', subsample=(2,2)),
    LeakyReLU(),
    Flatten(),
    Dense(2048),
    LeakyReLU(),
    Dense(1, activation='sigmoid')
], name="discriminator")

generator = Sequential([
    Convolution2D(64, 3, 3, border_mode='same', input_shape=[4, 4, 4]),
    UpSampling2D(), # 8x8
    Convolution2D(128, 3, 3, border_mode='same'),
    BatchNormalization(),
    ELU(),
    UpSampling2D(), #16x16
    Convolution2D(128, 3, 3, border_mode='same'),
    BatchNormalization(),
    ELU(),
    UpSampling2D(), # 32x32
    Convolution2D(1, 5, 5, border_mode='same', activation='tanh')
], name="generator")

# setup models
print("setup discriminator")
opt_d = Adam(lr=1e-5, beta_1=0.1)
discriminator.compile(optimizer=opt_d, 
                      loss='binary_crossentropy', 
                      metrics=['accuracy'])

print("setup dcgan")
set_trainable(discriminator, False)
dcgan = Sequential([generator, discriminator])
opt_g = Adam(lr=2e-4, beta_1=0.5)
dcgan.compile(optimizer=opt_g, 
              loss='binary_crossentropy', 
              metrics=['accuracy'])

The model is a modified version of the one in the paper. The Discriminator learns so quickly that the Generator could not keep up at all, so I increased the number of hidden units more than strictly necessary and lowered the learning rate.

Learning

train_dcgan.py


import math
import sys

import numpy as np

# X_train, batch_size and met_curve are assumed to be defined earlier in the file.

def create_random_features(num):
    # Generator input: uniform noise shaped like a 4x4x4 feature map
    return np.random.uniform(low=-1, high=1, 
                            size=[num, 4, 4, 4])

for epoch in range(1, sys.maxsize):

    print("epoch: {0}".format(epoch))
    
    np.random.shuffle(X_train)
    rnd = create_random_features(len(X_train))

    # train on batch
    for i in range(math.ceil(len(X_train)/batch_size)):
        print("batch:", i, end='\r')
        X_batch = X_train[i*batch_size:(i+1)*batch_size]
        rnd_batch = rnd[i*batch_size:(i+1)*batch_size]

        # train G first: the target 0 is the label used for real images
        loss_g, acc_g = dcgan.train_on_batch(rnd_batch, [0]*len(rnd_batch))
        # then train D on real images (label 0) plus generated images (label 1)
        generated = generator.predict(rnd_batch)
        X = np.append(X_batch, generated, axis=0)
        y = [0]*len(X_batch) + [1]*len(generated)
        loss_d, acc_d = discriminator.train_on_batch(X,y)
        
        met_curve = np.append(met_curve, [[loss_d, acc_d, loss_g, acc_g]], axis=0)

    # computing val_loss etc., output and model saving follow

I based this on keras-dcgan but changed the order so that the Generator is trained first and then the Discriminator. My reasoning was that with G first, the generated images have a higher chance of fooling D, which should make early training more efficient. I have not been able to confirm the effect.

Output

Here is an example where the parameter tuning went well. 3000 epochs took about 8 hours. It would be a good idea to add a stopping condition, such as ending when val_acc is 0 many times in a row.

At 100 epochs: 100epoch.png
At 300 epochs: 300epoch.png
At 1000 epochs: 1000epoch.png
At 2000 epochs: 2000epoch.png
At 3000 epochs: 3000epoch.png

Epochs 1000 to 2000 look best. By the final 3000 epochs, the output has fallen apart.
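A stopping condition of the kind suggested above (ending when val_acc is 0 many times in a row) could be sketched as follows. The function name and the patience parameter are hypothetical; "many times" is left to the caller.

```python
def should_stop(val_acc_history, patience=10):
    """Return True when the last `patience` val_acc values are all 0."""
    if len(val_acc_history) < patience:
        return False
    return all(acc == 0 for acc in val_acc_history[-patience:])


history = [0.4, 0.1, 0.0, 0.0, 0.0]
print(should_stop(history, patience=3))  # True
print(should_stop(history, patience=4))  # False
```

In the training loop, one would append val_acc to a list at the end of each epoch and break out of the loop once should_stop returns True.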

The train_loss / accuracy was as follows. a.png

G's loss oscillates while trending upward. I tried various things, but even with a considerably smaller learning rate for D I could not suppress the rise in G's loss. Here is a closer look at the middle part. met.png Although train_loss spiked frequently, it seemed fairly stable overall. train_acc stays low, but I think learning would proceed more efficiently if it stabilized at a higher value. In particular, when training in the order G -> D as in the code above, G's train_acc corresponds directly to the number of successfully deceiving images fed to D.

Finally, let's check whether it can interpolate between two vectors. middle.png The leftmost and rightmost columns are images generated from random vectors, and the columns in between interpolate between them. Character-like or not, the images deform fairly smoothly, so the space seems to be well formed.
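For reference, here is a minimal sketch of how the inputs for one row of such a figure could be built. It assumes plain linear interpolation between the two random vectors, which the article does not actually specify, and the interpolate function is a hypothetical name. The 4x4x4 shape matches the Generator input used above.

```python
import numpy as np


def interpolate(z_start, z_end, steps):
    """Return `steps` vectors evenly spaced on the line from z_start to z_end."""
    # Shape (steps, 1, 1, ...) so the ratios broadcast over the vector's axes.
    ratios = np.linspace(0.0, 1.0, steps).reshape(-1, *([1] * z_start.ndim))
    return (1.0 - ratios) * z_start + ratios * z_end


z_left = np.random.uniform(-1, 1, size=(4, 4, 4))
z_right = np.random.uniform(-1, 1, size=(4, 4, 4))
row = interpolate(z_left, z_right, steps=8)
print(row.shape)  # (8, 4, 4, 4)
# generator.predict(row) would then give the images for one row of the figure.
```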

Summary

While I struggled with various problems, my impression is that DCGAN itself can be made to work with brute force (reasonable parameters and time). I used to give up early when the early loss / acc looked hopeless, but even those runs might have worked out given enough time.

I watched train_loss / acc to judge how well training was going. The Generator's loss starts rising from around the beginning, so it was useful as a criterion for giving up when it seemed to rise without end. On the other hand, stable values did not necessarily mean good images, so it seems not very useful for evaluation from the middle stage onward.

Although I listed val_loss / acc in the bullet points, I have not mentioned them at all. The variation from step to step is so large, and the direction of movement so inconsistent, that a value evaluated once per epoch is not very meaningful. It was useful as a stopping criterion, though.

Evaluating interpolation between two vectors is likely to be an objective criterion for whether the DCGAN has succeeded. However, even when that test is passed, output quality does not always track the number of epochs, so it seems you have to judge subjectively which stage of the model is best (it might be good to scan the feature space and check what fraction of it corresponds to features of the original images, but that seems insanely difficult).

I had wondered why Chainer separates forward and backward, but this train step seems to be exactly the situation where that pays off. I also looked at some TensorFlow implementations, but I didn't really understand the samples. Writing in Keras makes things easier to understand, but the restrictions on the train step remained a concern. Keras's lack of flexibility stood out in this article, but I think it is the best choice for quickly writing general problems such as image classification, so if, like me, you don't want to study TF, please give it a try.

For the time being, I now have a model that generates plausible-looking abyss characters, so I would like to post a brushed-up version of the previous article by the time the 5th volume of Made in Abyss is released.


Addendum (11/20): When I reduced the number of dimensions of the random input to 30, good output was obtained in as few as about 200 epochs. It is a trade-off with expressiveness, but it seems training can be sped up by matching the number of dimensions to the diversity of the original images. With a measure of the diversity of the output, there would be more to explore.
