Creating an unknown Pokemon with StyleGAN2 [Part 1]

What to do this time

It may feel like a rehash, but this time I am going to train an automatic Pokemon generation model. Looking into it, some people have already tried this with the latest model, StyleGAN2, but the actual code, dataset, and implementation details have not been published, so I would like to summarize my own attempt in this article.

What is StyleGAN2

StyleGAN2 is an image generation model announced by NVIDIA. It is a generative model characterized by its style-based architecture, and it is a powerful model that currently achieves state-of-the-art results on several tasks.

- Official implementation of StyleGAN2
- Trends in image generation task methods

Dataset

While looking for a good dataset, I found that someone had already collected about 15,000 images and published them, so I used that: MonsterGAN.

Other candidates include Kaggle's Pokemon-Image-Dataset and [One-Shot-Pokemon-Images](https://www.kaggle.com/aaronyin/oneshotpokemon). One-Shot-Pokemon-Images also contains a Pokemon card dataset with a very large number of images, and it appears to have been used for a [Pokemon card generation task](https://devopstar.com/2019/05/21/stylegan-pokemon-card-generator).

pokemon-stylegan-example.png

Prior work

Michael Friese has been enthusiastically working on Pokemon generation with StyleGAN / StyleGAN2, and his StyleGAN runs have already produced quite good results. michael-stylegan-pokemon.jpg

When transfer learning is performed from a model pretrained on another domain such as cats or horses, training itself goes well, and the generated images seem to carry over some of the atmosphere of the source domain. The results are quite evocative.

Transfer from horse horse.jpg

Transfer from cat cat.jpg

Transfer from car car.jpg

For StyleGAN2, he has only released intermediate results so far; this is the output after training up to 540 kimg. Since training is still in progress, quality aside, the generated images show quite a lot of variation. michael-stylegan2-pokemon-540k.png

My own training results

Training is still in progress, but here is how it is going with StyleGAN2. Since I am training on an RTX 1070, a large model is out of the question given the GPU memory, so this time I resized the entire dataset to 64x64; a minimal sketch of that preprocessing step follows below.
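The resizing itself is nothing special; here is a minimal sketch using Pillow. The folder names (`raw_images/`, `pokemon_64/`) are placeholders for illustration, not the actual paths from this project.

```python
import os
from PIL import Image  # Pillow

SRC_DIR = "raw_images"   # hypothetical folder containing the collected Pokemon images
DST_DIR = "pokemon_64"   # hypothetical output folder for the 64x64 dataset
SIZE = (64, 64)

os.makedirs(DST_DIR, exist_ok=True)

for name in os.listdir(SRC_DIR):
    if not name.lower().endswith((".png", ".jpg", ".jpeg")):
        continue
    img = Image.open(os.path.join(SRC_DIR, name)).convert("RGB")
    img = img.resize(SIZE, Image.LANCZOS)
    img.save(os.path.join(DST_DIR, os.path.splitext(name)[0] + ".png"))
```

If you train with the official StyleGAN2 implementation, the resized images then still need to be packed into its TFRecords dataset format; the repository's dataset_tool.py script handles that step.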

Model overview

G                           Params    OutputShape       WeightShape     
---                         ---       ---               ---             
latents_in                  -         (?, 512)          -               
labels_in                   -         (?, 0)            -               
lod                         -         ()                -               
dlatent_avg                 -         (512,)            -               
G_mapping/latents_in        -         (?, 512)          -               
G_mapping/labels_in         -         (?, 0)            -               
G_mapping/Normalize         -         (?, 512)          -               
G_mapping/Dense0            262656    (?, 512)          (512, 512)      
G_mapping/Dense1            262656    (?, 512)          (512, 512)      
G_mapping/Dense2            262656    (?, 512)          (512, 512)      
G_mapping/Dense3            262656    (?, 512)          (512, 512)      
G_mapping/Dense4            262656    (?, 512)          (512, 512)      
G_mapping/Dense5            262656    (?, 512)          (512, 512)      
G_mapping/Dense6            262656    (?, 512)          (512, 512)      
G_mapping/Dense7            262656    (?, 512)          (512, 512)      
G_mapping/Broadcast         -         (?, 10, 512)      -               
G_mapping/dlatents_out      -         (?, 10, 512)      -               
Truncation/Lerp             -         (?, 10, 512)      -               
G_synthesis/dlatents_in     -         (?, 10, 512)      -               
G_synthesis/4x4/Const       8192      (?, 512, 4, 4)    (1, 512, 4, 4)  
G_synthesis/4x4/Conv        2622465   (?, 512, 4, 4)    (3, 3, 512, 512)
G_synthesis/4x4/ToRGB       264195    (?, 3, 4, 4)      (1, 1, 512, 3)  
G_synthesis/8x8/Conv0_up    2622465   (?, 512, 8, 8)    (3, 3, 512, 512)
G_synthesis/8x8/Conv1       2622465   (?, 512, 8, 8)    (3, 3, 512, 512)
G_synthesis/8x8/Upsample    -         (?, 3, 8, 8)      -               
G_synthesis/8x8/ToRGB       264195    (?, 3, 8, 8)      (1, 1, 512, 3)  
G_synthesis/16x16/Conv0_up  2622465   (?, 512, 16, 16)  (3, 3, 512, 512)
G_synthesis/16x16/Conv1     2622465   (?, 512, 16, 16)  (3, 3, 512, 512)
G_synthesis/16x16/Upsample  -         (?, 3, 16, 16)    -               
G_synthesis/16x16/ToRGB     264195    (?, 3, 16, 16)    (1, 1, 512, 3)  
G_synthesis/32x32/Conv0_up  2622465   (?, 512, 32, 32)  (3, 3, 512, 512)
G_synthesis/32x32/Conv1     2622465   (?, 512, 32, 32)  (3, 3, 512, 512)
G_synthesis/32x32/Upsample  -         (?, 3, 32, 32)    -               
G_synthesis/32x32/ToRGB     264195    (?, 3, 32, 32)    (1, 1, 512, 3)  
G_synthesis/64x64/Conv0_up  2622465   (?, 512, 64, 64)  (3, 3, 512, 512)
G_synthesis/64x64/Conv1     2622465   (?, 512, 64, 64)  (3, 3, 512, 512)
G_synthesis/64x64/Upsample  -         (?, 3, 64, 64)    -               
G_synthesis/64x64/ToRGB     264195    (?, 3, 64, 64)    (1, 1, 512, 3)  
G_synthesis/images_out      -         (?, 3, 64, 64)    -               
G_synthesis/noise0          -         (1, 1, 4, 4)      -               
G_synthesis/noise1          -         (1, 1, 8, 8)      -               
G_synthesis/noise2          -         (1, 1, 8, 8)      -               
G_synthesis/noise3          -         (1, 1, 16, 16)    -               
G_synthesis/noise4          -         (1, 1, 16, 16)    -               
G_synthesis/noise5          -         (1, 1, 32, 32)    -               
G_synthesis/noise6          -         (1, 1, 32, 32)    -               
G_synthesis/noise7          -         (1, 1, 64, 64)    -               
G_synthesis/noise8          -         (1, 1, 64, 64)    -               
images_out                  -         (?, 3, 64, 64)    -               
---                         ---       ---               ---             
Total                       27032600                                    
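To make the generator summary easier to read: the G_mapping block is eight 512-to-512 fully connected layers applied to the normalized latent, and the resulting vector is broadcast to one style vector per synthesis layer (10 layers at 64x64 resolution, hence the (?, 10, 512) shapes). Below is a NumPy sketch of just those shape transformations, with random weights standing in for the trained ones; it illustrates the table above and is not code from the official implementation.

```python
import numpy as np

batch, latent_size, num_layers = 4, 512, 10                     # 10 synthesis layers at 64x64

z = np.random.randn(batch, latent_size)                         # latents_in: (?, 512)
z = z / np.sqrt(np.mean(z ** 2, axis=1, keepdims=True) + 1e-8)  # G_mapping/Normalize (pixel norm)

w = z
for _ in range(8):                                              # G_mapping/Dense0 .. Dense7
    W = np.random.randn(latent_size, latent_size) * 0.01        # weight shape (512, 512)
    b = np.zeros(latent_size)                                   # 512*512 + 512 = 262656 params per layer
    h = w @ W + b
    w = np.where(h >= 0, h, 0.2 * h)                            # leaky ReLU, as in StyleGAN's mapping network

dlatents = np.repeat(w[:, None, :], num_layers, axis=1)         # G_mapping/Broadcast
print(dlatents.shape)                                           # (4, 10, 512), matching (?, 10, 512)
```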


D                    Params    OutputShape       WeightShape     
---                  ---       ---               ---             
images_in            -         (?, 3, 64, 64)    -               
labels_in            -         (?, 0)            -               
64x64/FromRGB        2048      (?, 512, 64, 64)  (1, 1, 3, 512)  
64x64/Conv0          2359808   (?, 512, 64, 64)  (3, 3, 512, 512)
64x64/Conv1_down     2359808   (?, 512, 32, 32)  (3, 3, 512, 512)
64x64/Skip           262144    (?, 512, 32, 32)  (1, 1, 512, 512)
32x32/Conv0          2359808   (?, 512, 32, 32)  (3, 3, 512, 512)
32x32/Conv1_down     2359808   (?, 512, 16, 16)  (3, 3, 512, 512)
32x32/Skip           262144    (?, 512, 16, 16)  (1, 1, 512, 512)
16x16/Conv0          2359808   (?, 512, 16, 16)  (3, 3, 512, 512)
16x16/Conv1_down     2359808   (?, 512, 8, 8)    (3, 3, 512, 512)
16x16/Skip           262144    (?, 512, 8, 8)    (1, 1, 512, 512)
8x8/Conv0            2359808   (?, 512, 8, 8)    (3, 3, 512, 512)
8x8/Conv1_down       2359808   (?, 512, 4, 4)    (3, 3, 512, 512)
8x8/Skip             262144    (?, 512, 4, 4)    (1, 1, 512, 512)
4x4/MinibatchStddev  -         (?, 513, 4, 4)    -               
4x4/Conv             2364416   (?, 512, 4, 4)    (3, 3, 513, 512)
4x4/Dense0           4194816   (?, 512)          (8192, 512)     
Output               513       (?, 1)            (512, 1)        
scores_out           -         (?, 1)            -               
---                  ---       ---               ---             
Total                26488833                                    
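As a quick sanity check on the two tables, the parameter counts follow the usual formulas: kernel height x kernel width x input channels x output channels plus one bias per output channel for a convolution, and inputs x outputs plus biases for a dense layer. A few rows worked out in Python:

```python
# 3x3 conv, 512 -> 512 channels, plus bias: matches 64x64/Conv0 in D
print(3 * 3 * 512 * 512 + 512)   # 2359808

# 512 -> 512 dense layer, plus bias: matches each G_mapping/DenseN row
print(512 * 512 + 512)           # 262656

# dense from the flattened 512x4x4 feature map to 512 units: matches 4x4/Dense0 in D
print(8192 * 512 + 512)          # 4194816
```

The synthesis convolutions in G are larger than the plain formula suggests (2622465 instead of 2359808) because each one also carries a 512-to-512 style-modulation dense layer and a per-layer noise-strength parameter.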

Apologies: I forgot to change the snapshot grid settings, so the preview images are hard to see... Retraining would be costly, so I have not redone it.

Dataset example (reals.png) reals.png

Generation result (288 kimg: 19 hours) fakes000288.png Contours are gradually taking shape, and outlines of Pokemon that look like humanoid or animal types have started to appear.

To monitor whether training is progressing properly, the Frechet Inception Distance (FID) is logged for each snapshot, and so far it is heading in the right direction. The official page reports single-digit FID values, but training that far would be too expensive for me, so I plan to stop once the generated images look good to my own eyes. A small script for plotting the values from the log below is sketched after it.

network-snapshot-              time 19m 34s      fid50k 278.0748
network-snapshot-              time 19m 34s      fid50k 382.7474
network-snapshot-              time 19m 34s      fid50k 338.3625
network-snapshot-              time 19m 24s      fid50k 378.2344
network-snapshot-              time 19m 33s      fid50k 306.3552
network-snapshot-              time 19m 33s      fid50k 173.8370
network-snapshot-              time 19m 30s      fid50k 112.3612
network-snapshot-              time 19m 31s      fid50k 99.9480
network-snapshot-              time 19m 35s      fid50k 90.2591
network-snapshot-              time 19m 38s      fid50k 75.5776
network-snapshot-              time 19m 39s      fid50k 67.8876
network-snapshot-              time 19m 39s      fid50k 66.0221
network-snapshot-              time 19m 46s      fid50k 63.2856
network-snapshot-              time 19m 40s      fid50k 64.6719
network-snapshot-              time 19m 31s      fid50k 64.2135
network-snapshot-              time 19m 39s      fid50k 63.6304
network-snapshot-              time 19m 42s      fid50k 60.5562
network-snapshot-              time 19m 36s      fid50k 59.4038
network-snapshot-              time 19m 36s      fid50k 57.2236
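The log above comes from the metric file the training script writes for each snapshot. Below is a minimal sketch for extracting the fid50k values and plotting the trend; the filename metric-fid50k.txt is an assumption about how the run directory names its metric log, so adjust it to whatever your run produced.

```python
import re
import matplotlib.pyplot as plt

fids = []
with open("metric-fid50k.txt") as f:              # assumed name of the metric log file
    for line in f:
        m = re.search(r"fid50k\s+([0-9.]+)", line)
        if m:
            fids.append(float(m.group(1)))

plt.plot(range(len(fids)), fids, marker="o")
plt.xlabel("snapshot index")
plt.ylabel("fid50k")
plt.title("FID over training snapshots")
plt.savefig("fid_curve.png")
```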

Summary

This time I surveyed datasets and prior work, and wrote up to the point where training is actually running. The code that worked in my environment, along with the results after further training, will be summarized in the second part of this article at a later date.

Part 2 → Creating an unknown Pokemon with StyleGAN2 [Part 2]

Postscript: Training progress report

640 kimg: Distinctive, Pokemon-like shapes have started to emerge. fakes000640.png

Unfortunately, right around this point the C drive I was training on ran out of space, the model checkpoint was saved as an empty file, and it was lost... Taking the lesson to heart, I moved everything to an additional HDD at the cost of read speed and restarted training. Losing two days of training is painful...

It is entirely my own fault for not planning ahead, but I would be glad if anyone who has had a similar experience can sympathize...

When saving the model, I had been worried about disk space and kept overwriting only the latest snapshot each time, but that backfired. From now on I will diligently keep every snapshot on the HDD.
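What I do now instead is copy every new snapshot off the training drive as soon as it appears. Here is a minimal sketch of that idea; the two paths are placeholders for my results directory and the HDD backup folder, not the actual ones.

```python
import glob
import os
import shutil
import time

RESULTS_DIR = "results/00000-stylegan2-pokemon"   # hypothetical run directory on the fast drive
BACKUP_DIR = "/mnt/hdd/stylegan2-backups"         # hypothetical backup folder on the HDD

os.makedirs(BACKUP_DIR, exist_ok=True)

while True:
    # copy any snapshot that has not been backed up yet, keeping all of them
    for pkl in glob.glob(os.path.join(RESULTS_DIR, "network-snapshot-*.pkl")):
        dst = os.path.join(BACKUP_DIR, os.path.basename(pkl))
        if not os.path.exists(dst):
            shutil.copy2(pkl, dst)
            print("backed up", dst)
    time.sleep(600)  # check again every 10 minutes
```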
