This is a shameless follow-up to the extra experiments from my previous article. I wanted to write a DCGAN, so I wrote one, and along the way I accumulated some minor bits of knowledge, so I'm writing them up. The contents are mainly as follows.
- Keras tips
- The process of tinkering with DCGAN
See other articles for an explanation of DCGAN itself. I mainly referred to the following:
- Let the computer draw an illustration using Chainer
- Automatic face illustration generation with Chainer
- keras-dcgan
This post only touches on Keras, so if that's not of interest, feel free to skip it.
Searching for Keras DCGAN turns up keras-dcgan at the top. Peeking at it for reference, it switches the value of `trainable` during training so that the Discriminator's weights are not updated while the Generator is being trained. If you've read the Keras documentation, you know that you can prevent a layer's weights from being updated by specifying `trainable = False`, but reading the code above, two things bothered me.
First, even if you set `trainable` on a Model, the layers inside it remain `trainable = True` and their weights are still updated. I touched on this briefly in the previous article.
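A quick way to see this (as I understand the behavior of the Keras 1.x versions of the time; treat it as a sketch):

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential([Dense(1, input_dim=2)])
model.trainable = False
# the container is flagged, but the layer inside it is untouched
print(model.layers[0].trainable)  # => True
```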
Second, the Keras documentation FAQ says the following about freezing weights:
> Additionally, you can set the `trainable` property of a layer to `True` or `False` after instantiation. For this to take effect, you will need to call `compile()` on your model after modifying the `trainable` property. Here's an example:
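The example given there is essentially the following (reproduced from the FAQ of that era, so treat the details as approximate):

```python
from keras.layers import Input, Dense
from keras.models import Model

x = Input(shape=(32,))
layer = Dense(32)
layer.trainable = False
y = layer(x)

frozen_model = Model(x, y)
# in the model below, the weights of `layer` will not be updated during training
frozen_model.compile(optimizer='rmsprop', loss='mse')

layer.trainable = True
trainable_model = Model(x, y)
# with this model, the weights of `layer` will be updated during training
trainable_model.compile(optimizer='rmsprop', loss='mse')

# (data and labels assumed defined)
frozen_model.fit(data, labels)     # does NOT update the weights of `layer`
trainable_model.fit(data, labels)  # updates the weights of `layer`
```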
keras-dcgan does not `compile` after changing `trainable`, so the change should not be reflected in training, and I suspect training does not proceed as intended (that said, the example produces fairly decent images, and I haven't tried it myself, so what actually happens remains a mystery).
Setting `trainable` on every layer and recompiling at each alternating step sounded tedious, so I experimented a bit and found a better way.
As mentioned earlier, Keras has the specification that a change to `trainable` is not reflected until you `compile`, and I wondered whether this could be exploited (or is it just an under-designed corner?). Here is a partial excerpt of the experimental code. Click here for the full text.
```python
from keras.models import Sequential
from keras.layers import Dense

def set_trainable(model, trainable):
    # helper from the full script: set trainable on the model and every layer inside it
    model.trainable = trainable
    for layer in model.layers:
        layer.trainable = trainable

modelA = Sequential([
    Dense(10, input_dim=100, activation='sigmoid')
])
modelB = Sequential([
    Dense(100, input_dim=10, activation='sigmoid')
])
modelB.compile(optimizer='adam', loss='binary_crossentropy')  # compiled while still trainable
set_trainable(modelB, False)  # freeze modelB, then compile the combined model
connected = Sequential([modelA, modelB])
connected.compile(optimizer='adam', loss='binary_crossentropy')
```
This is DCGAN pared down to the bare minimum. Immediately after instantiation, everything is in the `trainable` state. Compiling `modelB` at this point means its weights can be updated with `modelB.fit`. Next, `set_trainable` sets `trainable = False` on every layer of `modelB`, and then `connected`, which chains `modelA` and `modelB`, is compiled. In this state, what happens to the weights of `modelB` when we `fit` `modelB` and `connected`?
```python
import numpy as np

# dummy stand-ins for the data defined in the full script (shapes match the models above)
X1 = np.random.uniform(size=(64, 100))
X2 = np.random.uniform(size=(64, 10))

w0 = np.copy(modelB.layers[0].get_weights()[0])

connected.fit(X1, X1)
w1 = np.copy(modelB.layers[0].get_weights()[0])
print('Frozen in "connected":', np.array_equal(w0, w1))
# Frozen in "connected": True

modelB.fit(X2, X1)
w2 = np.copy(modelB.layers[0].get_weights()[0])
print('Frozen in "modelB":', np.array_equal(w1, w2))
# Frozen in "modelB": False

connected.fit(X1, X1)
w3 = np.copy(modelB.layers[0].get_weights()[0])
print('Frozen in "connected":', np.array_equal(w2, w3))
# Frozen in "connected": True
```
Surprisingly (?), the output is as shown in the comments above: the settings in force at `compile` time persist, so `modelB` can still learn on its own while it stays frozen inside `connected`. In other words, if you set things up correctly once at the start, there is no need to switch anything afterwards. The training loop no longer has to toggle `trainable` at every step, which keeps the code clean. (As an aside, this makes keras-dcgan look all the more suspicious, with its many unnecessary `compile`s.)
Having cleared up the problem from the previous section, I wrote the actual code. Written without thinking too hard, it came out roughly like this:
```python
from keras.models import Sequential
from keras.layers import Dense, Flatten, Convolution2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import LeakyReLU
from keras.optimizers import Adam

discriminator = Sequential([
    Convolution2D(64, 3, 3, border_mode='same', subsample=(2, 2), input_shape=[32, 32, 1]),
    LeakyReLU(),
    Convolution2D(128, 3, 3, border_mode='same', subsample=(2, 2)),
    BatchNormalization(),
    LeakyReLU(),
    Convolution2D(256, 3, 3, border_mode='same', subsample=(2, 2)),
    BatchNormalization(),
    LeakyReLU(),
    Flatten(),
    Dense(2048),
    BatchNormalization(),
    LeakyReLU(),
    Dense(1, activation='sigmoid')
], name="discriminator")

generator = Sequential([
    # abridged
])

# setup models
print("setup discriminator")
opt_d = Adam(lr=1e-5, beta_1=0.1)
discriminator.compile(optimizer=opt_d,
                      loss='binary_crossentropy',
                      metrics=['accuracy'])

print("setup dcgan")
set_trainable(discriminator, False)
dcgan = Sequential([generator, discriminator])
opt_g = Adam(lr=2e-4, beta_1=0.5)
dcgan.compile(optimizer=opt_g,
              loss='binary_crossentropy',
              metrics=['accuracy'])
```
When I tried to train this, I got the following error:

```
Exception: You are attempting to share a same `BatchNormalization` layer across different data flows. This is not possible. You should use `mode=2` in `BatchNormalization`, which has a similar behavior but is shareable (see docs for a description of the behavior).
```

It says to set `mode=2` on `BatchNormalization`.
This error concerns reusing the same layer somewhere else, as in the example in the Shared Layers section of the documentation. I take it to mean that sharing a `BatchNormalization` layer can cause trouble when, for example, the distributions of the two inputs differ.
In the code above, both the Discriminator alone and Generator + Discriminator are `compile`d, so the layers are apparently treated as shared. In a DCGAN, the Discriminator inside Generator + Discriminator is `trainable = False` and is not trained there, so specifying `mode = 2` should be harmless.
In hindsight, this is spelled out on the BatchNormalization documentation page.
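Concretely, the fix is just the constructor argument: in the Keras 1.x of the time, `mode=2` makes BN use per-batch statistics and, unlike the default `mode=0`, allows the layer to be shared. A minimal sketch:

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.layers.normalization import BatchNormalization

# a BN layer built with mode=2 can live in a model that is compiled
# both on its own and as part of a combined Generator + Discriminator
shareable = Sequential([
    Dense(32, input_dim=16),
    BatchNormalization(mode=2),
    Dense(1, activation='sigmoid'),
])
```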
Now that I had a model, I started training it in earnest. Following keras-dcgan, I trained the Generator on batches of random numbers and alternately trained the Discriminator on batches of the images generated from those same random numbers plus real images, and a mysterious phenomenon occurred. Specifically, it is the same thing described in [this article](http://qiita.com/rezoolab/items/5cc96b6d31153e0c86bc#%E3%83%91%E3%83%A9%E3%83%A1%E3%83%BC%E3%82%BF%E8%AA%BF%E6%95%B4%E3%81%AB%E3%81%A4%E3%81%84%E3%81%A6%E3%81%AE%E6%84%9F%E6%83%B3):
> There are two ways to update the Discriminator's parameters: update on the batch of real images and the batch of fake images combined into a single batch, or explicitly split the loss function in two and update. If there is no Batch Normalization layer the final gradient is the same either way, but with BN included there is a clear difference (it gets complicated). At first, updating with the former method, I got the strange result that both G and D had a 100% win rate. I initially suspected a bug on the Chainer side, but when I finally noticed this property of BN and implemented the latter, it converged cleanly (it was not a bug).
I can see that putting real images and random-number images into the same batch and normalizing them together is bad, but I don't understand at all why this particular phenomenon occurs (incidentally, the keras-dcgan Discriminator contains no BN, so it doesn't seem to run into this issue). As written, "explicitly split the loss function in two and update" sounds like the way to go, but I hadn't yet grasped what that means concretely. That article comes with no source code, so look instead at the source code accompanying the other article mentioned above. The relevant part:
DCGAN.py
```python
# train generator
z = Variable(xp.random.uniform(-1, 1, (batchsize, nz), dtype=np.float32))
x = gen(z)
yl = dis(x)
# labels: 0 = real, 1 = generated
L_gen = F.softmax_cross_entropy(yl, Variable(xp.zeros(batchsize, dtype=np.int32)))
L_dis = F.softmax_cross_entropy(yl, Variable(xp.ones(batchsize, dtype=np.int32)))

# train discriminator: the real images go through a separate forward pass,
# so the BN layers never see a batch mixing real and generated images
x2 = Variable(cuda.to_gpu(x2))
yl2 = dis(x2)
L_dis += F.softmax_cross_entropy(yl2, Variable(xp.zeros(batchsize, dtype=np.int32)))
#print "forward done"

o_gen.zero_grads()
L_gen.backward()
o_gen.update()

o_dis.zero_grads()
L_dis.backward()
o_dis.update()

sum_l_gen += L_gen.data.get()
sum_l_dis += L_dis.data.get()
#print "backward done"
```
I don't know Chainer at all, but it appears that the loss is computed separately for the real images and the random-number images, and the weights are then updated from the combined result. So I thought I'd do the same thing in Keras, but poking around the Functional API I couldn't find a way. In the end, as a compromise, I took the approach of separating the real images from the generated images and running `train_on_batch` on each separately (as I'll describe later, once BN went into the Discriminator this never once succeeded, so there is a good chance this approach is wrong).
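What I mean by running them separately is roughly this sketch (variable names are illustrative):

```python
# real and generated images go through two separate train_on_batch calls,
# so a BN layer never normalizes a batch mixing the two distributions
loss_real, acc_real = discriminator.train_on_batch(X_batch, [0] * len(X_batch))
loss_fake, acc_fake = discriminator.train_on_batch(generated, [1] * len(generated))
```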
DCGAN training environment: Ubuntu 16.04, Core i7-2600K, GeForce GTX 1060 6GB.
The dataset is the 54 32x32 grayscale Abyss-character images used in the previous article, forcibly augmented to 11,664 images via brightness changes, scaling, and shifts up, down, left, and right ([only the original images are kept in the repository](https://github.com/t-ae/abyss-letter2/tree/master/abyss_letters)). So please assume that what follows may not generalize.
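The augmentation code isn't shown in this article; as a rough sketch of the kind of transforms involved (a hypothetical helper, the actual preprocessing lives in the repository):

```python
import numpy as np

def augment(img, brightness, dx, dy):
    # hypothetical sketch: brightness change plus up/down/left/right shift
    # (the real preprocessing, including scaling, is in the repository)
    out = np.clip(img * brightness, 0.0, 1.0)
    return np.roll(np.roll(out, dy, axis=0), dx, axis=1)
```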
I don't know of an established way to evaluate a DCGAN, so I'll measure with methods I judged appropriate myself:

- train_loss and train_accuracy at each step
- val_loss and val_accuracy at each epoch
- Can it interpolate between two random vectors?

The last one is covered in detail in [this article](http://qiita.com/mattya/items/e5bfe5e04b9d2f0bbd47#z%E3%81%AE%E7%A9%BA%E9%96%93%E3%82%92%E8%AA%BF%E3%81%B9%E3%82%8B), so please read that.
First, following the paper, I put Batch Normalization in both the Discriminator and the Generator. The plot below shows train_loss and train_accuracy. I tried changing the Optimizer parameters in various ways, but toward the end the loss always fell into an obviously unnatural state. Below is an example of the images output in this state. Apart from these, everything came out black; in any case, only near-identical images were generated no matter the random input. The likely cause is the train step reworked to fit the Keras constraints described above, but since I couldn't fix that, I followed keras-dcgan's lead and dropped BN from the Discriminator. If you want to follow the paper faithfully, or to build on the implementation examples of your predecessors, I think you're better off using something other than Keras.
Once BN was removed from the Discriminator (though parameter tuning remained difficult), it started producing decent images. The code is in the repository, but since I'm still tinkering with it, the following may have drifted from what's there.
train_dcgan.py
```python
from keras.models import Sequential
from keras.layers import Dense, Flatten, Convolution2D, UpSampling2D
from keras.layers.normalization import BatchNormalization
from keras.layers.advanced_activations import LeakyReLU, ELU
from keras.optimizers import Adam

# define models
discriminator = Sequential([
    Convolution2D(64, 3, 3, border_mode='same', subsample=(2, 2), input_shape=[32, 32, 1]),
    LeakyReLU(),
    Convolution2D(128, 3, 3, border_mode='same', subsample=(2, 2)),
    LeakyReLU(),
    Convolution2D(256, 3, 3, border_mode='same', subsample=(2, 2)),
    LeakyReLU(),
    Flatten(),
    Dense(2048),
    LeakyReLU(),
    Dense(1, activation='sigmoid')
], name="discriminator")

generator = Sequential([
    Convolution2D(64, 3, 3, border_mode='same', input_shape=[4, 4, 4]),
    UpSampling2D(),  # 8x8
    Convolution2D(128, 3, 3, border_mode='same'),
    BatchNormalization(),
    ELU(),
    UpSampling2D(),  # 16x16
    Convolution2D(128, 3, 3, border_mode='same'),
    BatchNormalization(),
    ELU(),
    UpSampling2D(),  # 32x32
    Convolution2D(1, 5, 5, border_mode='same', activation='tanh')
], name="generator")

# setup models
print("setup discriminator")
opt_d = Adam(lr=1e-5, beta_1=0.1)
discriminator.compile(optimizer=opt_d,
                      loss='binary_crossentropy',
                      metrics=['accuracy'])

print("setup dcgan")
set_trainable(discriminator, False)
dcgan = Sequential([generator, discriminator])
opt_g = Adam(lr=2e-4, beta_1=0.5)
dcgan.compile(optimizer=opt_g,
              loss='binary_crossentropy',
              metrics=['accuracy'])
```
The model is a modified version of the one in the paper. The Discriminator learned so fast that the Generator couldn't keep up at all, so I increased the number of hidden units beyond what should be necessary and lowered the Discriminator's learning rate.
train_dcgan.py
```python
import math
import sys
import numpy as np

# X_train, batch_size and met_curve are defined earlier in the full script

def create_random_features(num):
    return np.random.uniform(low=-1, high=1,
                             size=[num, 4, 4, 4])

for epoch in range(1, sys.maxsize):
    print("epoch: {0}".format(epoch))
    np.random.shuffle(X_train)
    rnd = create_random_features(len(X_train))

    # train on batch
    for i in range(math.ceil(len(X_train) / batch_size)):
        print("batch:", i, end='\r')
        X_batch = X_train[i*batch_size:(i+1)*batch_size]
        rnd_batch = rnd[i*batch_size:(i+1)*batch_size]

        # generator step: labels 0 ("real") through the frozen discriminator
        loss_g, acc_g = dcgan.train_on_batch(rnd_batch, [0]*len(rnd_batch))
        generated = generator.predict(rnd_batch)

        # discriminator step: real images labeled 0, generated images labeled 1
        X = np.append(X_batch, generated, axis=0)
        y = [0]*len(X_batch) + [1]*len(generated)
        loss_d, acc_d = discriminator.train_on_batch(X, y)
        met_curve = np.append(met_curve, [[loss_d, acc_d, loss_g, acc_g]], axis=0)

    # computation of val_loss etc., image output and model saving follow
```
I referred to keras-dcgan but swapped the order, training the Generator first and the Discriminator second. My thinking was that if G goes first, the random batch has a higher chance of fooling D, making early training more efficient; whether it actually helps, I don't really know.
Here is an example from a relatively well-tuned run. 3000 epochs took about 8 hours. It would probably be worth adding a stopping condition, such as ending when val_acc has been 0 several times in a row (a sketch follows below). Generated samples: at 100 epochs, at 300 epochs, at 1000 epochs, at 2000 epochs, at 3000 epochs.
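Such a stopping rule might look like the following sketch (a hypothetical helper, not in the actual script):

```python
def should_stop(val_acc_history, patience=10):
    # hypothetical rule: give up once val_acc has been 0
    # for `patience` consecutive epochs
    recent = val_acc_history[-patience:]
    return len(recent) == patience and all(acc == 0 for acc in recent)
```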
Around 1000 to 2000 epochs looks best. By the final 3000 epochs it has broken down.

train_loss/accuracy came out as follows.

G's loss oscillates while trending upward. I tried various things, but even lowering D's learning rate considerably could not suppress the rise in G's loss. Zooming in on the middle part: train_loss spiked frequently, but overall it was fairly stable. train_acc stayed low; I suspect training would proceed more efficiently if it stabilized at a higher value. In particular, when training in G-then-D order as in the code above, G's train_acc directly determines how many successfully deceptive images get fed to D.
Finally, let's check interpolation between two random vectors. The leftmost and rightmost columns are images generated from random vectors, and the columns in between interpolate between them. The morphing is fairly smooth whether or not the intermediate images look like characters, so the latent space seems to have formed well.
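For reference, here is a sketch of how such an interpolation strip can be produced, assuming the `generator` and `create_random_features` from the code above:

```python
import numpy as np

# linear interpolation between two random feature tensors; the generator
# then renders each intermediate point between z0 and z1
z0 = create_random_features(1)
z1 = create_random_features(1)
steps = 8
zs = np.concatenate([z0 + (z1 - z0) * t for t in np.linspace(0, 1, steps)])
images = generator.predict(zs)  # (steps, 32, 32, 1), values in [-1, 1] from tanh
```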
While struggling with these various problems, I came to feel that DCGAN itself can be made to work with brute force (reasonable parameters plus time). I used to give up early whenever loss/acc looked hopeless in the opening stages, but even those runs might have worked out given enough time.

I used train_loss/acc to judge how training was going. The Generator's loss starts rising early on, so it was useful as a criterion for giving up when it seemed to climb without end; conversely, even when the values were stable it was hard to get good images, so past the middle stages it wasn't much use for evaluation. As for the val_loss/acc listed in the bullets above, which I haven't mentioned since: the step-to-step variation is so large, and its direction so inconsistent, that a value sampled once per epoch means very little. It was useful as a stopping criterion, though.

Interpolation between two vectors seems likely to serve as an objective criterion for whether a DCGAN has succeeded. Even when a model passes it, though, epoch count and output quality don't move in lockstep, so choosing which snapshot of the model is best still seems to come down to subjective judgment (it might be good to scan the feature space and check what fraction of it maps to features of the original images, but that sounds insanely difficult).
I had wondered why Chainer separates forward and backward, but this train step turns out to be exactly the kind of situation where that design pays off. I also looked at a few TensorFlow implementations, but couldn't really follow the samples. Keras is much easier to read and write, though the constraints around the train step kept nagging at me. In this article Keras's lack of flexibility stood out, but for quickly writing up common problems like image classification I think it is the best option, so if, like me, you don't want to study TF, do give it a try.
For the time being I have a model that generates passable Abyss characters, so I'd like to post a brushed-up version of the previous article by the time volume 5 of Made in Abyss comes out.
Addendum (11/20): When I reduced the dimensionality of the random features to 30, good output appeared after only about 200 epochs. It's a trade-off against expressiveness, but matching the dimensionality to the diversity of the source images seems to speed up training. Given a measure of output diversity, this could probably be explored further.