This may feel like a rehash, but I am going to train a model that automatically generates Pokemon. If you search around, several people have already tried this with the latest model, StyleGAN2, but their actual code, datasets, and implementations have not been published, so I would like to summarize mine in this article.
StyleGAN2 is an image generation model announced by NVIDIA. It is a generative model characterized by its style-based architecture, and it is a powerful model that is currently state of the art (SOTA) on multiple tasks.
- Official StyleGAN2 implementation
- Trends in image-generation task methods
While investigating whether a good dataset existed, I found that someone had already collected 15,000 images and published them, so I used that: MonsterGAN.
Other candidates include Kaggle's Pokemon-Image-Dataset and [One-Shot-Pokemon-Images](https://www.kaggle.com/aaronyin/oneshotpokemon). One-Shot-Pokemon-Images also contains a Pokemon card dataset with a very large number of images, so it has been applied to a [Pokemon card generation task](https://devopstar.com/2019/05/21/stylegan-pokemon-card-generator).
Michael Friese appears to be enthusiastically generating Pokemon with StyleGAN and StyleGAN2, and he has already produced quite good results with StyleGAN.
When transfer learning is performed from a model pretrained on another domain such as cats or horses, training itself succeeds, and the generated images seem to retain the atmosphere of the source domain. It is genuinely evocative.
Transfer from horse
Transfer from cat
Transfer from car
Only intermediate results with StyleGAN2 have been released; this is the output after training up to 540kimg. Since training is still in progress, quality aside, there is a good deal of variation in the samples.
My own training is also still in progress, but I will show how it is going with StyleGAN2. I am training on a single GTX 1070, so a large model is out of the question given the memory size; this time I resized the entire dataset to 64x64.
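As a sketch, the 64x64 preprocessing step might look like the following. The folder names (`raw_images`, `resized_64`) are placeholders, not the actual paths used in this experiment:

```python
# Sketch: resize a folder of raw images to 64x64 RGB PNGs for training.
# Source/destination directory names are hypothetical examples.
from pathlib import Path
from PIL import Image

def resize_dataset(src_dir: str, dst_dir: str, size: int = 64) -> int:
    """Resize every image under src_dir to size x size and save as PNG."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    count = 0
    for path in sorted(Path(src_dir).glob("*")):
        if path.suffix.lower() not in {".png", ".jpg", ".jpeg"}:
            continue
        img = Image.open(path).convert("RGB")          # drop any alpha channel
        img = img.resize((size, size), Image.LANCZOS)  # high-quality downsample
        img.save(dst / f"{path.stem}.png")
        count += 1
    return count

if __name__ == "__main__":
    n = resize_dataset("raw_images", "resized_64")
    print(f"resized {n} images")
```

Converting to RGB first matters because many sprite images carry an alpha channel, which the StyleGAN2 dataset tooling does not expect.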
Model overview
```
G                            Params    OutputShape          WeightShape
---                          ---       ---                  ---
latents_in                   -         (?, 512)             -
labels_in                    -         (?, 0)               -
lod                          -         ()                   -
dlatent_avg                  -         (512,)               -
G_mapping/latents_in         -         (?, 512)             -
G_mapping/labels_in          -         (?, 0)               -
G_mapping/Normalize          -         (?, 512)             -
G_mapping/Dense0             262656    (?, 512)             (512, 512)
G_mapping/Dense1             262656    (?, 512)             (512, 512)
G_mapping/Dense2             262656    (?, 512)             (512, 512)
G_mapping/Dense3             262656    (?, 512)             (512, 512)
G_mapping/Dense4             262656    (?, 512)             (512, 512)
G_mapping/Dense5             262656    (?, 512)             (512, 512)
G_mapping/Dense6             262656    (?, 512)             (512, 512)
G_mapping/Dense7             262656    (?, 512)             (512, 512)
G_mapping/Broadcast          -         (?, 10, 512)         -
G_mapping/dlatents_out       -         (?, 10, 512)         -
Truncation/Lerp              -         (?, 10, 512)         -
G_synthesis/dlatents_in      -         (?, 10, 512)         -
G_synthesis/4x4/Const        8192      (?, 512, 4, 4)       (1, 512, 4, 4)
G_synthesis/4x4/Conv         2622465   (?, 512, 4, 4)       (3, 3, 512, 512)
G_synthesis/4x4/ToRGB        264195    (?, 3, 4, 4)         (1, 1, 512, 3)
G_synthesis/8x8/Conv0_up     2622465   (?, 512, 8, 8)       (3, 3, 512, 512)
G_synthesis/8x8/Conv1        2622465   (?, 512, 8, 8)       (3, 3, 512, 512)
G_synthesis/8x8/Upsample     -         (?, 3, 8, 8)         -
G_synthesis/8x8/ToRGB        264195    (?, 3, 8, 8)         (1, 1, 512, 3)
G_synthesis/16x16/Conv0_up   2622465   (?, 512, 16, 16)     (3, 3, 512, 512)
G_synthesis/16x16/Conv1      2622465   (?, 512, 16, 16)     (3, 3, 512, 512)
G_synthesis/16x16/Upsample   -         (?, 3, 16, 16)       -
G_synthesis/16x16/ToRGB      264195    (?, 3, 16, 16)       (1, 1, 512, 3)
G_synthesis/32x32/Conv0_up   2622465   (?, 512, 32, 32)     (3, 3, 512, 512)
G_synthesis/32x32/Conv1      2622465   (?, 512, 32, 32)     (3, 3, 512, 512)
G_synthesis/32x32/Upsample   -         (?, 3, 32, 32)       -
G_synthesis/32x32/ToRGB      264195    (?, 3, 32, 32)       (1, 1, 512, 3)
G_synthesis/64x64/Conv0_up   2622465   (?, 512, 64, 64)     (3, 3, 512, 512)
G_synthesis/64x64/Conv1      2622465   (?, 512, 64, 64)     (3, 3, 512, 512)
G_synthesis/64x64/Upsample   -         (?, 3, 64, 64)       -
G_synthesis/64x64/ToRGB      264195    (?, 3, 64, 64)       (1, 1, 512, 3)
G_synthesis/images_out       -         (?, 3, 64, 64)       -
G_synthesis/noise0           -         (1, 1, 4, 4)         -
G_synthesis/noise1           -         (1, 1, 8, 8)         -
G_synthesis/noise2           -         (1, 1, 8, 8)         -
G_synthesis/noise3           -         (1, 1, 16, 16)       -
G_synthesis/noise4           -         (1, 1, 16, 16)       -
G_synthesis/noise5           -         (1, 1, 32, 32)       -
G_synthesis/noise6           -         (1, 1, 32, 32)       -
G_synthesis/noise7           -         (1, 1, 64, 64)       -
G_synthesis/noise8           -         (1, 1, 64, 64)       -
images_out                   -         (?, 3, 64, 64)       -
---                          ---       ---                  ---
Total                        27032600
```
```
D                            Params    OutputShape          WeightShape
---                          ---       ---                  ---
images_in                    -         (?, 3, 64, 64)       -
labels_in                    -         (?, 0)               -
64x64/FromRGB                2048      (?, 512, 64, 64)     (1, 1, 3, 512)
64x64/Conv0                  2359808   (?, 512, 64, 64)     (3, 3, 512, 512)
64x64/Conv1_down             2359808   (?, 512, 32, 32)     (3, 3, 512, 512)
64x64/Skip                   262144    (?, 512, 32, 32)     (1, 1, 512, 512)
32x32/Conv0                  2359808   (?, 512, 32, 32)     (3, 3, 512, 512)
32x32/Conv1_down             2359808   (?, 512, 16, 16)     (3, 3, 512, 512)
32x32/Skip                   262144    (?, 512, 16, 16)     (1, 1, 512, 512)
16x16/Conv0                  2359808   (?, 512, 16, 16)     (3, 3, 512, 512)
16x16/Conv1_down             2359808   (?, 512, 8, 8)       (3, 3, 512, 512)
16x16/Skip                   262144    (?, 512, 8, 8)       (1, 1, 512, 512)
8x8/Conv0                    2359808   (?, 512, 8, 8)       (3, 3, 512, 512)
8x8/Conv1_down               2359808   (?, 512, 4, 4)       (3, 3, 512, 512)
8x8/Skip                     262144    (?, 512, 4, 4)       (1, 1, 512, 512)
4x4/MinibatchStddev          -         (?, 513, 4, 4)       -
4x4/Conv                     2364416   (?, 512, 4, 4)       (3, 3, 513, 512)
4x4/Dense0                   4194816   (?, 512)             (8192, 512)
Output                       513       (?, 1)               (512, 1)
scores_out                   -         (?, 1)               -
---                          ---       ---                  ---
Total                        26488833
```
Apologies: I forgot to change the snapshot grid settings, so the sample images are hard to see... Training takes too long, so I have not rerun it to fix this.
Dataset example (reals.png)
Generation result (288kimg, 19 hours): contours are gradually taking shape, and outlines of Pokemon that look like human or animal types are starting to appear.
I output the Frechet Inception Distance (FID) to monitor whether training is progressing properly; so far, things are going well. The official page reports single-digit FID values, but training that far would be too costly, so I will stop once the generated images look good to my own eyes.
```
network-snapshot-  time 19m 34s  fid50k 278.0748
network-snapshot-  time 19m 34s  fid50k 382.7474
network-snapshot-  time 19m 34s  fid50k 338.3625
network-snapshot-  time 19m 24s  fid50k 378.2344
network-snapshot-  time 19m 33s  fid50k 306.3552
network-snapshot-  time 19m 33s  fid50k 173.8370
network-snapshot-  time 19m 30s  fid50k 112.3612
network-snapshot-  time 19m 31s  fid50k 99.9480
network-snapshot-  time 19m 35s  fid50k 90.2591
network-snapshot-  time 19m 38s  fid50k 75.5776
network-snapshot-  time 19m 39s  fid50k 67.8876
network-snapshot-  time 19m 39s  fid50k 66.0221
network-snapshot-  time 19m 46s  fid50k 63.2856
network-snapshot-  time 19m 40s  fid50k 64.6719
network-snapshot-  time 19m 31s  fid50k 64.2135
network-snapshot-  time 19m 39s  fid50k 63.6304
network-snapshot-  time 19m 42s  fid50k 60.5562
network-snapshot-  time 19m 36s  fid50k 59.4038
network-snapshot-  time 19m 36s  fid50k 57.2236
```
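For reference, the FID tracked in the log above measures the distance between Gaussian fits of Inception-network activations for real versus generated images. A minimal sketch of that formula, assuming the activation means and covariances have already been computed elsewhere:

```python
# Sketch: Frechet Inception Distance between two Gaussian fits
# (mu, sigma) of Inception activations. The statistics are assumed
# to be precomputed; this only evaluates the closed-form distance.
import numpy as np
from scipy import linalg

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """FID = ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2*(sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    covmean, _ = linalg.sqrtm(sigma1 @ sigma2, disp=False)
    if np.iscomplexobj(covmean):
        # numerical error can introduce a tiny imaginary component
        covmean = covmean.real
    return diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean)
```

Identical statistics give a distance of zero, which is why the steadily falling `fid50k` values above indicate that the generated distribution is moving toward the real one.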
This time I surveyed the datasets and prior work, and wrote up to the point of actually starting training. The code that worked in my environment and the results after further training will be summarized in the second part of this article at a later date.
Part 2 → Creating an unknown Pokemon with StyleGAN2 [Part 2]
640kimg: distinctive, somewhat Pokemon-like shapes have begun to emerge.
Unfortunately, once training had progressed this far, the C drive I was running on ran out of space; the model checkpoint was written out empty and the results were lost... Taking that lesson to heart, I moved everything to a newly added HDD at the cost of read speed and re-ran. It is painful... two days of training, gone...
It was entirely my own carelessness, but my thanks to anyone who sympathizes...
When saving the model, I had worried about disk space and kept overwriting only the latest checkpoint each time, which backfired. From now on I will diligently keep snapshots on the HDD.
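One way to balance disk space against safety is to keep the newest few snapshots instead of a single overwritten file, so one bad write cannot destroy all progress. A minimal sketch, assuming snapshots follow the `network-snapshot-*.pkl` naming pattern (the directory path is a placeholder):

```python
# Sketch: retain only the N most recent snapshot files, deleting older
# ones, instead of overwriting a single checkpoint in place.
from pathlib import Path

def prune_snapshots(snapshot_dir: str, keep: int = 3) -> list:
    """Delete all but the `keep` most recently modified snapshot files."""
    snaps = sorted(Path(snapshot_dir).glob("network-snapshot-*.pkl"),
                   key=lambda p: p.stat().st_mtime)
    removed = []
    for old in (snaps[:-keep] if keep > 0 else snaps):
        old.unlink()  # oldest snapshots are deleted first
        removed.append(old.name)
    return removed
```

Running this after each save keeps disk usage bounded while always leaving a few known-good checkpoints to fall back on.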