I tried to implement CycleGAN on Linux

This time, I implemented CycleGAN, mostly following the code published on GitHub. This page gives a brief explanation of the paper and then walks through the implementation. Applying it to my own dataset is left for next time.

- About CycleGAN
- Implementation on Linux

This is only a brief write-up, but I'll explain things in the order of the two items above.

About CycleGAN

Paper: https://arxiv.org/pdf/1703.10593.pdf

The explanation below follows this paper.

Introduction

CycleGAN is a Generative Adversarial Network (GAN) that enables style conversion between image domains.

(Figure from the paper: paired training data on the left, unpaired training data on the right.)

As this figure from the paper shows, a style conversion such as the colorization on the left has conventionally been learned from pairs of input and output images, as in **pix2pix**. In other words, a one-to-one correspondence is required, as shown under "Paired" in the figure. Methods that allow unpaired style conversion, as shown on the right, have also been proposed. However, those unpaired methods had to define a metric space for each style conversion task, such as a class label space, an image feature space, or a pixel space, and use it to bring the input and output closer together.

**CycleGAN** was therefore proposed as a method that needs neither one-to-one paired images nor a change of learning method for each task.

(Figure from the paper: images converted by CycleGAN.)

These are images converted by **CycleGAN**. Landscape photographs have been transformed into the style of world-famous painters such as Monet and Van Gogh. This cannot be done with learning that requires pairs, like **pix2pix**, because you would have to travel back in time to photograph the landscapes that Van Gogh and others painted. CycleGAN also enables conversion between zebras and horses, and between summer and winter. With **CycleGAN**, style conversion can be learned without changing the learning method for each task.

Objective function

What makes this possible is the introduction of the **cycle-consistency loss**. It is the heart of the method, and it is explained after the adversarial loss.

(Figure from the paper: overview of the losses used in CycleGAN, panels (a) to (c).)

The figure shows the losses used in **CycleGAN**. First, **(a)** corresponds to the **adversarial loss**, the standard **GAN** loss.

$$
\mathcal{L}_{GAN}(G, D_Y, X, Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))]
$$

In this formula, the first term means that the Discriminator identifies real data *y* as real. The second term means that it identifies data produced by the Generator as fake. Training maximizes this **adversarial loss** for the Discriminator (correct identification) and minimizes it for the Generator (misidentification). For the Discriminator, maximizing the first term means assigning real data *y* a probability close to 1 (real), and maximizing the second term means assigning the fake *G(x)*, generated by applying *G* to *x*, a probability close to 0 (fake). For the Generator the opposite holds: the goal is a *G* whose outputs the Discriminator cannot tell apart from real data. During each maximization or minimization step, the other network is held fixed, and training proceeds by alternating the two steps. This is done for both domains, that is, we optimize

$$
\min_G \max_{D_Y} \mathcal{L}_{GAN}(G, D_Y, X, Y) \quad \text{and} \quad \min_F \max_{D_X} \mathcal{L}_{GAN}(F, D_X, Y, X).
$$
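To make the two terms of the adversarial loss concrete, here is a minimal sketch in plain Python. It assumes the Discriminator outputs are already available as probabilities between 0 and 1; the function name `adversarial_loss` and the sample values are hypothetical, chosen only for illustration. It follows the log-likelihood form described in the text, and the published code may well use a different variant of this loss.

```python
import math

def adversarial_loss(d_real, d_fake):
    """Estimate the adversarial loss from Discriminator outputs.

    d_real: Discriminator probabilities for real samples y
    d_fake: Discriminator probabilities for generated samples G(x)
    Every value must lie strictly between 0 and 1.
    """
    # First term: mean of log D(y), largest when real data is scored near 1.
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    # Second term: mean of log(1 - D(G(x))), largest when fakes are scored near 0.
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Illustrative values: a Discriminator that separates real from fake well...
good_d = adversarial_loss([0.9, 0.95], [0.1, 0.05])
# ...and one that has been fooled into scoring fakes as real.
fooled_d = adversarial_loss([0.9, 0.95], [0.9, 0.95])
print(good_d, fooled_d)
```

The loss is higher when the Discriminator is right and lower when the Generator fools it, which is why one side maximizes the value and the other minimizes it.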

Next, let's look at **(b)** and **(c)**. Together they form the **cycle-consistency loss**, which is expressed by the following formula.

$$
\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\lVert F(G(x)) - x \rVert_1] + \mathbb{E}_{y \sim p_{data}(y)}[\lVert G(F(y)) - y \rVert_1]
$$

The first term uses the L1 norm to check that *F(G(x))*, obtained by converting *x* with *G* and then mapping the result *G(x)* back to the original domain with *F*, is close to the original *x*. The second term does the same in the opposite direction. The idea is simple.
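The cycle-consistency idea can be sketched with toy data. The example below is only an illustration under simplifying assumptions: images are flat lists of numbers, the L1 norm is a plain sum of absolute differences, and the mappings `G`, `F_inverse`, and `F_bad` are hypothetical stand-ins rather than neural networks.

```python
def l1_distance(a, b):
    """Sum of absolute element-wise differences between two flat vectors."""
    return sum(abs(p - q) for p, q in zip(a, b))

def cycle_consistency_loss(xs, ys, G, F):
    """Mean L1 error of the round trips x -> G -> F and y -> F -> G."""
    forward = sum(l1_distance(F(G(x)), x) for x in xs) / len(xs)
    backward = sum(l1_distance(G(F(y)), y) for y in ys) / len(ys)
    return forward + backward

# Toy mappings: G doubles every value, F_inverse halves it (an exact inverse),
# and F_bad halves it but adds an offset, so the round trip misses the input.
G = lambda v: [2.0 * p for p in v]
F_inverse = lambda v: [0.5 * p for p in v]
F_bad = lambda v: [0.5 * p + 1.0 for p in v]

xs = [[1.0, 2.0], [3.0, 4.0]]  # samples from domain X
ys = [[2.0, 4.0], [6.0, 8.0]]  # samples from domain Y

perfect = cycle_consistency_loss(xs, ys, G, F_inverse)
imperfect = cycle_consistency_loss(xs, ys, G, F_bad)
print(perfect, imperfect)
```

When the two mappings undo each other the loss is zero, and any mismatch after the round trip is penalized in proportion to its size.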

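Finally, **(a)** to **(c)** are combined into the full objective function,

$$
\mathcal{L}(G, F, D_X, D_Y) = \mathcal{L}_{GAN}(G, D_Y, X, Y) + \mathcal{L}_{GAN}(F, D_X, Y, X) + \lambda \mathcal{L}_{cyc}(G, F),
$$

where $\lambda$ weights the cycle-consistency term. Solving the optimization problem

$$
G^*, F^* = \arg\min_{G, F} \max_{D_X, D_Y} \mathcal{L}(G, F, D_X, D_Y)
$$

yields the desired *G* and *F*.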
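As a small worked example of how the three parts are combined, the sketch below adds the two adversarial terms and a weighted cycle-consistency term. The function name `full_objective` and the input values are hypothetical. The default weight of 10 is the value the paper reports for its experiments, written as lambda there; treat it as a starting point rather than a requirement.

```python
def full_objective(gan_g, gan_f, cyc, lam=10.0):
    """Combine the loss terms into one objective value.

    gan_g: adversarial loss for the X -> Y mapping and its Discriminator
    gan_f: adversarial loss for the Y -> X mapping and its Discriminator
    cyc:   cycle-consistency loss
    lam:   weight of the cycle-consistency term
    """
    return gan_g + gan_f + lam * cyc

# Illustrative values only: the same loss terms under two different weights.
default_weight = full_objective(-0.5, -0.7, 0.2)
no_cycle_term = full_objective(-0.5, -0.7, 0.2, lam=0.0)
print(default_weight, no_cycle_term)
```

A larger weight makes the round-trip reconstruction dominate the objective, while a weight of zero reduces it to two independent GAN losses.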

Experimental result

These are examples of the experimental results. We can see that high-quality style conversion is achieved on a variety of tasks, such as converting between horses and zebras, between summer and winter in landscape photographs, and between apples and oranges.

(Figure from the paper: example results.)

This is an example of a failure. In the horse-to-zebra conversion, President Putin, who is riding the horse, has been given zebra stripes as well. As you can see, texture conversion works well, but conversions that require understanding shape appear to be difficult. I think introducing an object detection method could help with this.

If you want to see more results, please take a look at the paper.

Implementation on Linux

Public code https://github.com/xhujoy/CycleGAN-tensorflow

Implementation environment

Ubuntu 18.04 LTS
Python 3.6
PyTorch 0.4.0
Tensorflow 1.4.0
numpy 1.11.0
scipy 0.17.0
pillow 3.3.0

Running on a public dataset

First, clone the repository into any directory, then move into the CycleGAN-tensorflow/ directory. This time, we will download the **horse2zebra** dataset, which was also used in the paper.

$ git clone https://github.com/xhujoy/CycleGAN-tensorflow
$ cd CycleGAN-tensorflow/
$ bash ./download_dataset.sh horse2zebra

Training

Next, we will train with the downloaded **horse2zebra** dataset.

$ CUDA_VISIBLE_DEVICES=0 python main.py --dataset_dir=horse2zebra

Use CUDA_VISIBLE_DEVICES= to choose which GPU to use. Training then begins.

Epoch: [ 0] [   0/1067] time: 14.2652
Epoch: [ 0] [   1/1067] time: 16.9671
Epoch: [ 0] [   2/1067] time: 17.6442
Epoch: [ 0] [   3/1067] time: 18.3194
Epoch: [ 0] [   4/1067] time: 19.0001
Epoch: [ 0] [   5/1067] time: 19.6724
Epoch: [ 0] [   6/1067] time: 20.3511
Epoch: [ 0] [   7/1067] time: 21.0326
Epoch: [ 0] [   8/1067] time: 21.7106
Epoch: [ 0] [   9/1067] time: 22.3866
Epoch: [ 0] [  10/1067] time: 23.0501
Epoch: [ 0] [  11/1067] time: 23.7298
.
.
.
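The log above can be used for a rough estimate of training time. The sketch below parses a few of the lines shown and computes the average time per step. It assumes that the `time` column is the elapsed time in seconds since training started, which the steadily increasing values suggest; the helper names are hypothetical.

```python
import re

# A few lines copied from the log excerpt above.
LOG_EXCERPT = """\
Epoch: [ 0] [   0/1067] time: 14.2652
Epoch: [ 0] [   1/1067] time: 16.9671
Epoch: [ 0] [  10/1067] time: 23.0501
Epoch: [ 0] [  11/1067] time: 23.7298
"""

LINE_PATTERN = re.compile(
    r"Epoch: \[\s*(\d+)\] \[\s*(\d+)/(\d+)\] time: ([\d.]+)"
)

def parse_log(text):
    """Return (epoch, step, steps_per_epoch, elapsed_seconds) tuples."""
    records = []
    for line in text.splitlines():
        match = LINE_PATTERN.search(line)
        if match:
            epoch, step, total, elapsed = match.groups()
            records.append((int(epoch), int(step), int(total), float(elapsed)))
    return records

def seconds_per_step(records):
    """Average time per step between the first and last parsed record."""
    first, last = records[0], records[-1]
    steps_done = (last[0] - first[0]) * first[2] + (last[1] - first[1])
    return (last[3] - first[3]) / steps_done

records = parse_log(LOG_EXCERPT)
per_step = seconds_per_step(records)
per_epoch_minutes = per_step * records[0][2] / 60.0
print("%.2f s/step, about %.1f min/epoch" % (per_step, per_epoch_minutes))
```

For this excerpt the estimate comes out at about 0.86 seconds per step, or roughly 15 minutes per epoch of 1067 steps on the machine that produced the log. Taking the difference between the first and last record keeps the startup time before the first step out of the average.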

By default, the number of epochs is set to 200. You can change this to suit your dataset; if the transformation you are learning does not involve a large change in appearance, you might try fewer epochs. Note that the downloaded datasets/horse2zebra/ directory contains testA/, testB/, trainA/, and trainB/, each with images inside. Even during training, if either testA/ or testB/ contains no data, the following error is thrown.

ValueError: Cannot feed value of shape (1, 256, 256, 6) for Tensor 'real_A_and_B_images:0', which has shape '(?, 512, 512, 6)'

Keep this in mind when building your own dataset.
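One way to catch this problem before starting a long training run is to check the dataset folders first. The following is a minimal sketch, not part of the published code; the function name `missing_or_empty` is hypothetical, and it only checks that each of the four subdirectories named above exists and contains at least one entry.

```python
import os

# The four subdirectories described above.
REQUIRED_SUBDIRS = ("trainA", "trainB", "testA", "testB")

def missing_or_empty(dataset_root):
    """Return the required subdirectories that are missing or contain no files."""
    problems = []
    for name in REQUIRED_SUBDIRS:
        path = os.path.join(dataset_root, name)
        if not os.path.isdir(path) or not os.listdir(path):
            problems.append(name)
    return problems

if __name__ == "__main__":
    # Path used in this article; replace it with your own dataset directory.
    problems = missing_or_empty(os.path.join("datasets", "horse2zebra"))
    if problems:
        print("Missing or empty:", ", ".join(problems))
    else:
        print("Dataset layout looks complete.")
```

Run it from the CycleGAN-tensorflow/ directory so that the relative path resolves. An empty list means all four folders contain something, although the check does not verify that the entries are valid images.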

Test

The test is done with the following command.

$ CUDA_VISIBLE_DEVICES=0 python main.py --dataset_dir=horse2zebra --phase=test --which_direction=AtoB

Specify AtoB or BtoA with the --which_direction= option. Images in datasets/horse2zebra/testA or datasets/horse2zebra/testB are converted and saved in test/. So that each image is easy to identify, its file name is prefixed with `AtoB_` or `BtoA_`.
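The relationship between the direction, the input folder, and the output file name can be summarized in a few lines. This sketch is inferred from the description above rather than taken from the published code; the helper names and the file name `example.jpg` are hypothetical.

```python
import os

# Assumed mapping from conversion direction to input folder, as described above.
SOURCE_DIRS = {"AtoB": "testA", "BtoA": "testB"}

def source_dir(dataset_name, direction):
    """Folder whose images are converted for the given direction."""
    return os.path.join("datasets", dataset_name, SOURCE_DIRS[direction])

def output_path(input_path, direction, test_dir="test"):
    """Output location: the direction is prepended to the original file name."""
    if direction not in SOURCE_DIRS:
        raise ValueError("direction must be 'AtoB' or 'BtoA'")
    return os.path.join(test_dir, direction + "_" + os.path.basename(input_path))

# Hypothetical input file, used only to show the naming.
example_input = os.path.join(source_dir("horse2zebra", "AtoB"), "example.jpg")
example_output = output_path(example_input, "AtoB")
print(example_input, "->", example_output)
```

Because the prefix is part of the file name, results from both directions can sit in the same test/ folder without overwriting each other.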

The following is an example of the test results.

horse2zebra (AtoB)

zebra2horse (BtoA)

The conversion works well, which is great. That concludes the implementation on Linux. Next time, I would like to apply it to my own dataset.

Reference material

Paper: https://arxiv.org/pdf/1703.10593.pdf
GitHub: https://github.com/xhujoy/CycleGAN-tensorflow

I implemented CycleGAN (1)