In the previous post, **"Does Yui Aragaki live in the latent space of StyleGAN2?"**, I found that a **trained StyleGAN2 model** has **high image generation ability** even for **new images** that were not used for training.
This time, I would like to use the **trained StyleGAN2 model** to see how far **new images that were not used for training can be edited**.
The code was written in **Google Colab** and posted on **GitHub**. Please try running it.
Unlike earlier GANs, which generate an image from a single latent variable, StyleGAN uses a Mapping network to generate an image from 18 latent variables w (these are called styles). Taking advantage of this feature, a kind of editing called **Style Mixing** becomes possible.
The 18 latent variables **w0 to w17** are connected, two each, to the layers at 9 resolutions (4 x 4, 8 x 8, 16 x 16, 32 x 32, 64 x 64, 128 x 128, 256 x 256, 512 x 512, 1024 x 1024).
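The pairing of latent variables and resolutions can be written as a small formula. The following is only a minimal sketch: the function name `layer_resolution` is made up for illustration and is not part of the StyleGAN2 code. It assumes the two-per-resolution layout described above, starting at 4 x 4.

```python
def layer_resolution(i: int) -> int:
    """Return the side length of the layer that latent variable w_i feeds.

    Assumes 18 latent variables, two per resolution, starting at 4 x 4
    (the layout described in the text). The name is hypothetical.
    """
    if not 0 <= i < 18:
        raise ValueError("latent index must be in 0..17")
    # w0, w1 -> 4, w2, w3 -> 8, ..., w16, w17 -> 1024
    return 4 * 2 ** (i // 2)


if __name__ == "__main__":
    for i in range(18):
        r = layer_resolution(i)
        print(f"w{i}: {r} x {r}")
```

For example, `layer_resolution(4)` and `layer_resolution(5)` both give 16, so w4 and w5 belong to the 16 x 16 layers.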
Latent variables affect image generation differently depending on the resolution. At low resolutions they affect the overall picture, such as face orientation, face shape, and hairstyle; as the resolution increases, they affect details such as the eyes and mouth.
Here, you can mix the features of two **images** A and B by swapping only some of their **latent variables w**. This is called **Style Mixing**.
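The swap itself is just an array operation. The following is a rough sketch, not the notebook's actual code: it assumes each image's latent variables are held as a NumPy array of shape (18, 512), and the names `style_mix`, `w_row`, and `w_col` are made up for illustration. Obtaining w from a real image and feeding the result to the generator are left out.

```python
import numpy as np


def style_mix(w_row: np.ndarray, w_col: np.ndarray, layers) -> np.ndarray:
    """Return a copy of w_row whose rows listed in `layers` come from w_col.

    w_row, w_col: latent variables of two images, assumed shape (18, 512).
    layers: indices of the latent variables to swap, e.g. [4, 5].
    """
    if w_row.shape != w_col.shape:
        raise ValueError("both latent arrays must have the same shape")
    idx = list(layers)
    mixed = w_row.copy()      # leave the original latent untouched
    mixed[idx] = w_col[idx]   # overwrite only the selected styles
    return mixed


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w_row = rng.standard_normal((18, 512))  # stand-in for Row_pic's latent
    w_col = rng.standard_normal((18, 512))  # stand-in for Col_pic's latent
    mixed = style_mix(w_row, w_col, [0, 1])
    print(np.array_equal(mixed[:2], w_col[:2]))  # swapped rows
    print(np.array_equal(mixed[2:], w_row[2:]))  # untouched rows
```

The mixed array would then be passed to the synthesis network in place of the original latent variables to render the edited image.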
This image shows the result of replacing the latent variables **w0, w1** of the **Row_pic** in the blue frame with those of the **Col_pic** in the red frame. **w0 and w1 mainly affect the orientation of the face and the presence or absence of glasses**, so only the orientation of the face can be changed independently.
This image is Row_pic with **w4, w5** replaced by those of Col_pic. It is **mainly w4, w5** that affect the shape of the mouth, which is the key to a smile. The way the mouth opens seems to carry over as is, so you can edit the nuances of how a person smiles.
This image is Row_pic with **w0, w1, w2** replaced by those of Col_pic. **It is mainly w0, w1, w2** that affect the glasses. Since these overlap with the variables that control face orientation, the orientation of the face also changes at the same time.
What's interesting is that the shape of the glasses does not carry over as is; instead, each Row_pic seems to have its **own glasses attribute**. Therefore, it is difficult to put on glasses of an intended shape.
This image is Row_pic with **w4, w5, w6, w7** replaced by those of Col_pic. I **left w2, w3**, which affect face shape and hairstyle, as they are, and replaced only **w4, w5**, which affect the shape of the mouth, and **w6, w7**, which affect the shape of the eyes. Changing the shape of the eyes and mouth makes the face look quite a bit younger.
As before, w2 and w3, which affect the shape of the face and hairstyle, are left as they are, and **only w4, w5, w6, and w7, which affect the shape of the mouth and eyes, are replaced**. Just changing the shape of the eyes and mouth makes the face look a little older.
This is a bonus. Here too, only w4, w5, w6, w7 are replaced, but the result is not good (laughs).
This is a rough summary of which latent variables w mainly relate to which elements of the face image. **w8 and above only affect things such as contrast and color**, and do not seem to directly affect the shape of the face.
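The same summary can be kept as a small lookup table so that the indices to swap do not have to be remembered. This is only a sketch of the rough relationships observed in this post, not a property guaranteed by StyleGAN2; the names `STYLE_GROUPS` and `layers_for` are made up for illustration.

```python
# Rough relationships observed in this post (approximate, not exact).
STYLE_GROUPS = {
    "face_orientation": (0, 1),
    "glasses": (0, 1, 2),
    "face_shape_hairstyle": (2, 3),
    "mouth": (4, 5),
    "eyes": (6, 7),
    "color_contrast": tuple(range(8, 18)),  # w8 and above
}


def layers_for(*attributes: str) -> list:
    """Return the sorted, de-duplicated latent indices for the given attributes."""
    indices = set()
    for name in attributes:
        indices.update(STYLE_GROUPS[name])  # KeyError on an unknown attribute
    return sorted(indices)


if __name__ == "__main__":
    print(layers_for("mouth", "eyes"))                # [4, 5, 6, 7]
    print(layers_for("glasses", "face_orientation"))  # [0, 1, 2]
```

Because "glasses" and "face_orientation" share w0 and w1, the table also makes it visible why the two cannot be edited separately.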
I think that the trained StyleGAN2 model has **high image editing ability** even for **new images**.
(reference) Play StyleGAN !! ~ Image editing without additional learning ~