Do you know this icon?
Yes, it's an icon by the famous artist Melville. Many people on Twitter have Melville draw their favorite characters and use the results as their profile thumbnails, and these icons have gained great popularity. Icons drawn by this artist are often called "Mel Icons" because of their distinctive style. Here are some typical examples of Mel Icons:
(The icons of Yukatayu and Shun, respectively, as of February 19, 2020)
I want an icon like this too!!! So I built a Mel Icon generator with machine learning. In this article, I'd like to briefly introduce the method behind it.
A GAN (Generative Adversarial Network) is used for generation.
This method combines a neural network that generates images (the Generator) with a neural network that judges whether its input is a real Mel Icon (the Discriminator). The Generator tries to produce images that resemble Mel Icons as closely as possible in order to fool the Discriminator, while the Discriminator learns to tell real from fake more and more accurately. As the two networks train against each other, the Generator gradually becomes able to produce images close to real Mel Icons.
For the Generator to learn to produce Mel-Icon-like images, and for the Discriminator to learn to judge whether an input image is a Mel Icon, you need to gather as many real Mel Icons as possible and build a training dataset from them. So I went around Twitter, repeatedly finding and saving Mel Icon thumbnails, and collected more than 100 images. These are used for training.
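As a rough idea of what the data pipeline might look like, here is a minimal sketch of loading such a dataset in PyTorch. The folder path `./mel_icons/` and the use of `ImageFolder` are my assumptions, not details from the original setup:

```python
import torch
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

# Resize every collected icon to 64x64 RGB and scale pixels to [-1, 1],
# a common convention when training GANs with a tanh output layer.
transform = transforms.Compose([
    transforms.Resize((64, 64)),
    transforms.ToTensor(),
    transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

# "./mel_icons/" is a hypothetical path; ImageFolder expects the images to be
# grouped in at least one subdirectory (the class label is ignored here).
dataset = datasets.ImageFolder(root="./mel_icons/", transform=transform)
dataloader = DataLoader(dataset, batch_size=5, shuffle=True)  # batch size 5, as used later in the article
```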
We show the Generator the Mel Icons prepared above and train it to produce images that look like them. The generated image is 64 x 64 pixels with 3 RGB color channels. If the Generator produced similar data every time, training would not proceed well, so it needs to be able to generate as wide a variety of images as possible. Therefore, a sequence of random numbers is fed into the Generator as the seed for image generation. A process called "transposed convolution" (described below) is applied to this sequence layer by layer, gradually transforming it into a 64 x 64 pixel, 3-channel RGB image.
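As a minimal illustration, the random-number input can be a tensor of standard-normal noise; the dimension of 100 is my assumption, since the article doesn't state the noise size:

```python
import torch

# One noise vector per image, shaped (batch, channels, 1, 1) so that
# transposed convolutions can progressively upsample it to 64x64.
z = torch.randn(5, 100, 1, 1)  # batch of 5; the 100-dimensional noise size is assumed
```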
In an ordinary convolution, as shown below, the kernel slides across the input and at each position the sum of elementwise products is output. In PyTorch this can be implemented with torch.nn.Conv2d, for example:
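A minimal shape check (the channel counts and kernel size here are illustrative, not the article's exact values):

```python
import torch
import torch.nn as nn

# A 4x4 kernel slides over a 64x64 input with stride 2; each output value is
# the sum of elementwise products between the kernel and the patch under it.
conv = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=4, stride=2)
x = torch.randn(1, 3, 64, 64)
print(conv(x).shape)  # torch.Size([1, 64, 31, 31]) -- the resolution shrinks
```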
In the transposed convolution used here, on the other hand, each input element is multiplied by the kernel, and the overlapping results are summed. Intuitively, it feels like each element is being expanded. In PyTorch this can be implemented with torch.nn.ConvTranspose2d, for example:
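And the corresponding shape check for a transposed convolution (again with illustrative parameters):

```python
import torch
import torch.nn as nn

# Each input element is multiplied by the whole kernel and the overlapping
# results are summed, so the spatial resolution grows instead of shrinking.
deconv = nn.ConvTranspose2d(in_channels=100, out_channels=64, kernel_size=4, stride=1)
z = torch.randn(1, 100, 1, 1)
print(deconv(z).shape)  # torch.Size([1, 64, 4, 4]) -- a 1x1 input expands to 4x4
```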
These transposed convolution layers are stacked together with self_attention layers (described later), and the last layer has 3 output channels (corresponding to R, G, and B). Putting this together, the outline of the Generator we are building is as shown in the figure below.
This Generator has a total of 5 transposed convolution layers, with a layer called self_attention inserted between the 3rd and 4th layers and between the 4th and 5th layers. By attending to pixels with similar values all at once, self_attention makes it possible to evaluate the whole image with a relatively small amount of computation.
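Below is a minimal sketch of what such a Generator might look like in PyTorch. The channel counts, batch-norm layers, and the internals of the self_attention layer are my assumptions based on typical SAGAN implementations; only the layer counts and their placement come from the description above:

```python
import torch
import torch.nn as nn

class SelfAttention(nn.Module):
    """SAGAN-style self-attention: every position attends to every other
    position, so pixels with similar features are evaluated together."""
    def __init__(self, channels):
        super().__init__()
        self.query = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.key = nn.Conv2d(channels, channels // 8, kernel_size=1)
        self.value = nn.Conv2d(channels, channels, kernel_size=1)
        self.gamma = nn.Parameter(torch.zeros(1))  # learned weight of the attention branch

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.query(x).view(b, -1, h * w).permute(0, 2, 1)  # (b, hw, c/8)
        k = self.key(x).view(b, -1, h * w)                     # (b, c/8, hw)
        attn = torch.softmax(torch.bmm(q, k), dim=-1)          # (b, hw, hw)
        v = self.value(x).view(b, -1, h * w)                   # (b, c, hw)
        out = torch.bmm(v, attn.permute(0, 2, 1)).view(b, c, h, w)
        return self.gamma * out + x  # residual connection

class Generator(nn.Module):
    """Five transposed convolutions; self_attention between the 3rd/4th
    and 4th/5th layers; the last layer outputs 3 channels (RGB)."""
    def __init__(self, z_dim=100):
        super().__init__()
        self.net = nn.Sequential(
            nn.ConvTranspose2d(z_dim, 512, 4, 1, 0), nn.BatchNorm2d(512), nn.ReLU(),  # 1x1 -> 4x4
            nn.ConvTranspose2d(512, 256, 4, 2, 1), nn.BatchNorm2d(256), nn.ReLU(),    # 4x4 -> 8x8
            nn.ConvTranspose2d(256, 128, 4, 2, 1), nn.BatchNorm2d(128), nn.ReLU(),    # 8x8 -> 16x16
            SelfAttention(128),
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.BatchNorm2d(64), nn.ReLU(),      # 16x16 -> 32x32
            SelfAttention(64),
            nn.ConvTranspose2d(64, 3, 4, 2, 1), nn.Tanh(),                            # 32x32 -> 64x64
        )

    def forward(self, z):
        return self.net(z)
```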
In its untrained state, the Generator configured this way outputs images like the following, for example. (The result depends on the random-number sequence you feed in.) Since it hasn't learned anything yet, it can only output something like noise. However, by training against the Discriminator, the network that judges whether its input is a Mel Icon (explained next), it becomes able to output images like these.
The Discriminator's job is to look at an image, such as one generated by the Generator above, and judge whether it is a Mel Icon. In essence, it is an image recognizer. The input image is 64 x 64 pixels with 3 RGB channels, and the output is a value (in the range 0 to 1) indicating how strongly the input resembles a Mel Icon. The architecture stacks 5 ordinary convolution layers, with a self_attention layer sandwiched between the 3rd and 4th layers and between the 4th and 5th layers, as shown in the figure below.
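A matching sketch of the Discriminator, reusing the SelfAttention module above. Again the channel counts and activations are my assumptions. Note also that with the hinge loss used below, the raw (unbounded) score is typically used directly, so this sketch omits a final sigmoid even though the output is described above as a 0-to-1 value:

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Five ordinary convolutions; self_attention between the 3rd/4th
    and 4th/5th layers; outputs one realness score per image."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 64, 4, 2, 1), nn.LeakyReLU(0.1),     # 64x64 -> 32x32
            nn.Conv2d(64, 128, 4, 2, 1), nn.LeakyReLU(0.1),   # 32x32 -> 16x16
            nn.Conv2d(128, 256, 4, 2, 1), nn.LeakyReLU(0.1),  # 16x16 -> 8x8
            SelfAttention(256),
            nn.Conv2d(256, 512, 4, 2, 1), nn.LeakyReLU(0.1),  # 8x8 -> 4x4
            SelfAttention(512),
            nn.Conv2d(512, 1, 4, 1, 0),                       # 4x4 -> 1x1 score
        )

    def forward(self, x):
        return self.net(x).view(-1)  # one scalar per image
```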
The training procedures for the Discriminator and the Generator are described below.
When given an image, the Discriminator returns a number between 0 and 1 indicating how much it looks like a Mel Icon. First, feed in a real Mel Icon and call the output (a value from 0 to 1) $d_{real}$. Next, feed random numbers into the Generator and have it produce an image. Feeding this image into the Discriminator likewise returns a value between 0 and 1; call it $d_{fake}$. The $d_{real}$ and $d_{fake}$ obtained this way are plugged into the loss function described below to get the value used for backpropagation.
SAGAN, one GAN variant, uses the "hinge version of the adversarial loss" below. In short, let $l_{i}$ and $l_{i}^{\prime}$ be the correct labels, $y_{i}$ and $y_{i}^{\prime}$ the corresponding outputs of the Discriminator, and $M$ the number of data items per mini-batch. The loss is then written as
-\frac{1}{M}\sum_{i=1}^{M}\left(l_{i}\min(0,-1+y_{i})+(1-l_{i}^{\prime})\min(0,-1-y_{i}^{\prime})\right)
[^1] This time we set $y_{i} = d_{real}$, $y_{i}^{\prime} = d_{fake}$, $l_{i} = 1$ (meaning "definitely a Mel Icon"), and $l_{i}^{\prime} = 0$ (meaning "definitely not a Mel Icon"), so the loss becomes
-\frac{1}{M}\sum_{i=1}^{M}\left(\min(0,-1+d_{real})+\min(0,-1-d_{fake})\right)
This is the Discriminator's loss function used this time. Adam was used as the optimization method for backpropagation, with the learning rate set to 0.0004 and Adam's first and second moments (the exponential decay rates used for moment estimation) set to 0.0 and 0.9, respectively.
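Put into code, the Discriminator's hinge loss and its Adam optimizer might look like the following sketch (netD reuses the Discriminator sketched earlier; torch.clamp(..., max=0.0) implements the min(0, ...) terms):

```python
import torch

netD = Discriminator()
optimizerD = torch.optim.Adam(netD.parameters(), lr=0.0004, betas=(0.0, 0.9))

def discriminator_hinge_loss(d_real, d_fake):
    # -(1/M) * sum( min(0, -1 + d_real) + min(0, -1 - d_fake) )
    loss_real = -torch.mean(torch.clamp(d_real - 1.0, max=0.0))   # min(0, -1 + d_real)
    loss_fake = -torch.mean(torch.clamp(-d_fake - 1.0, max=0.0))  # min(0, -1 - d_fake)
    return loss_real + loss_fake
```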
When a sequence of random numbers is input, the Generator produces an image, trying to make it look as much like a Mel Icon as possible. First, feed a random-number sequence $z_{i}$ into the Generator to obtain an image. Then feed that image into the Discriminator, which outputs a value indicating how much it looks like a Mel Icon; call this $r_{i}$.
In SAGAN's "hinge version of the adversarial loss", the Generator's loss function is defined as follows:
-\frac{1}{M}\sum_{i=1}^{M}r_{i}
In SAGAN, this definition is empirically known to work well. [^1] Since $M$ is the number of data items per mini-batch, this amounts to using the Discriminator's judgment as-is. That surprised me a little; what do you think? Adam was used as the optimization method here too, with the learning rate set to 0.0001 and the first and second moments set to 0.0 and 0.9, respectively (the same as the Discriminator except for the learning rate).
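The Generator side is even shorter; as a sketch, with the same assumed names:

```python
import torch

netG = Generator()
optimizerG = torch.optim.Adam(netG.parameters(), lr=0.0001, betas=(0.0, 0.9))

def generator_hinge_loss(d_fake):
    # -(1/M) * sum(r_i): raise the Discriminator's score on generated images
    return -torch.mean(d_fake)
```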
Reprinting the figure introduced above: the Generator and Discriminator built earlier are combined in this way to form the GAN.
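Combining everything above, one training epoch might be sketched like this (all names come from the earlier sketches; the noise dimension of 100 is still an assumption):

```python
import torch

for real_images, _ in dataloader:
    batch = real_images.size(0)

    # --- Discriminator step: score real icons and freshly generated fakes ---
    z = torch.randn(batch, 100, 1, 1)
    d_real = netD(real_images)
    d_fake = netD(netG(z).detach())  # detach so this step doesn't update the Generator
    lossD = discriminator_hinge_loss(d_real, d_fake)
    optimizerD.zero_grad()
    lossD.backward()
    optimizerD.step()

    # --- Generator step: try to raise the Discriminator's score on fakes ---
    z = torch.randn(batch, 100, 1, 1)
    lossG = generator_hinge_loss(netD(netG(z)))
    optimizerG.zero_grad()
    lossG.backward()
    optimizerG.step()
```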
Train on the collected real Mel Icons and have the Generator produce Mel Icons. The number of data items $M$ per mini-batch is kept at 5. The results are as follows. __Awesome!!!__ __I'm so impressed!!!__ For comparison, examples of the training data are shown in the top row, and the actually generated images in the bottom row. Note that the generated results change with every run. Personally, I was quite surprised that this could be done with source code that isn't all that long. GAN is really amazing!!!
I've built something that can do all this, but a few issues remain unsolved.
The code I wrote is in this repository. https://github.com/zassou65535/image_generator
GAN is an insanely powerful technique. Even though mode collapse occurred, it produced something quite close to real Mel Icons from a dataset of only about 100 images. Try generating your own exciting images with GAN too.
As a bonus: if you simply average all the collected Mel Icons, you get the following image.
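For reference, the per-pixel average can be computed with a few lines like these (reusing the dataloader sketched earlier):

```python
import torch

# Sum every collected icon pixel-by-pixel, then divide by the count.
total, count = torch.zeros(3, 64, 64), 0
for images, _ in dataloader:
    total += images.sum(dim=0)
    count += images.size(0)
mean_icon = (total / count + 1) / 2  # undo the [-1, 1] normalization for viewing
```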
[^1]: "Learn While Making! Advanced Deep Learning with PyTorch"