The story of making the Mel Icon Generator version 2

Introduction

The icons drawn by Melville, known as "Mel icons", are gaining popularity with many people for their unique style. Above is Melville's own icon. In particular, many people are known to ask Melville to draw an icon for them and then use it as their Twitter icon. Some typical Mel icons:

(From left to right: Yukatayu, Shun Shun, kaage; as of August 5, 2020)

"I want a Mel icon like this too!!! So I implemented a Mel icon generator with machine learning!!!" ... that is the rough outline of the previous work. This time we took a hard look at the algorithm, improved many points, and greatly evolved the Mel Icon Generator. In this article, I introduce the methods used for it.

What is GAN

GAN (Generative Adversarial Network) is the method used here to generate images.

(Figure: schematic of a GAN, quoted from the original source)

This method pits a neural network that generates images (the Generator) against a neural network that judges whether its input is a real Mel icon (the Discriminator). The Generator tries to generate images that look as much like Mel icons as possible in order to fool the Discriminator, while the Discriminator learns to identify images ever more accurately so as not to be fooled. As the two neural networks train against each other, the Generator becomes able to generate images close to real Mel icons. In short, it is Generator VS Discriminator.

Progressive GAN

Even under the single name GAN, there are many variants. This time I used one of them, **Progressive GAN**. The idea is, for example, to first train convolution layers corresponding to a low resolution of 4x4, then add convolution layers corresponding to 8x8 and train again, then add 16x16, and so on: training proceeds while the resolution is raised step by step.

At the start of training, the Generator outputs images at 4x4 resolution, as shown. The Discriminator likewise takes a 4x4 image as input and outputs a value indicating how much it looks like a Mel icon.

The Generator generates images, and the Discriminator receives two kinds of input: generated images and real images (Mel icons from the training data).

After training at 4x4 resolution for a while, we add the convolution layers corresponding to 8x8 and continue training.

When 8x8 is done we add 16x16, and so on; the final structure looks like this. The goal this time was to output 256x256 images.

GAN has a weakness: training tends to become unstable for images of relatively high resolution. Progressive GAN overcomes this by first learning the broad features of the image and only then gradually focusing on small, complex details.
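To make the schedule concrete, here is a minimal sketch in Python of how the resolution and the fade-in factor α (explained later) might progress. Only the 7,500 iterations per resolution comes from a setting mentioned later in this article; the other numbers are illustrative, and the real schedule lives in the repository linked at the end.

```python
# Minimal sketch of the progressive-growing schedule. Only the 7,500
# iterations per resolution comes from the article; fade_in_steps and
# the print stride are illustrative assumptions.
resolutions = [4, 8, 16, 32, 64, 128, 256]
steps_per_resolution = 7500
fade_in_steps = 3750  # assumption: fade each new block in over half a stage

for res in resolutions:
    for step in range(0, steps_per_resolution, 2500):
        # the first stage (4x4) has no freshly added block to fade in
        alpha = 1.0 if res == 4 else min(1.0, step / fade_in_steps)
        print(f"res={res:3d}x{res:<3d}  step={step:5d}  alpha={alpha:.2f}")
```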

Data set preparation

For the Generator to become able to generate Mel-icon-like images, and for the Discriminator to become able to judge whether an input image is a Mel icon, we need to gather as many real Mel icons as possible into a training dataset. This time, Melville provided every Mel icon created so far: 751 of them. (Overwhelming..... Thanks....!!!) From these we picked out the Mel icons usable for training, excluding the ones that are too irregular. For example:

Icons like these were excluded. There were also icons that were almost identical, differing only slightly in hair length. Considering their impact on the overall training, we added up to 4 of each group of similar Mel icons to the dataset and excluded the rest when there were 5 or more.

The dataset usable this way came to about 640 images. Considering that last time there were at most 100, the usable data has grown more than sixfold. These serve as the training data.

Creating a Generator

The role of the Generator is to take a sequence of random numbers (which we will call noise) as input and generate a Mel-icon-like image from it. It learns so that its generated icons, when fed into the Discriminator, pass as real Mel icons. As its basic operation, the Generator generates an image by repeatedly applying convolutions to the input noise.

In the initial state, the neural network that makes up the Generator is as shown in the figure below.

Data enters at the top layer, is processed and passed to the next layer in sequence, and the result is obtained from the bottom layer.

The top convolution layer receives the noise input to the Generator (512 channels at 4x4 resolution), applies its convolution, and outputs data with 256 channels at 4x4 resolution. That data is passed to the next convolution layer, and so on; the last layer outputs an image with 3 channels at 4x4 resolution. The 3 output channels correspond to (R, G, B), and 4x4 is the resolution of the image the Generator outputs.
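As a minimal PyTorch sketch of this initial stack: the channel counts (512 → 256 → 3) follow the text, while the kernel sizes and number of layers are simplifying assumptions, and the actual network in the repository is more elaborate.

```python
import torch
import torch.nn as nn

# Toy version of the initial 4x4 Generator stack: noise in, RGB image out.
# Channel counts follow the text; everything else is simplified.
g_4x4 = nn.Sequential(
    nn.Conv2d(512, 256, kernel_size=3, padding=1),  # 512ch 4x4 -> 256ch 4x4
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 3, kernel_size=1),               # "to RGB": 3ch 4x4 image
)

noise = torch.randn(1, 512, 4, 4)  # noise: 512 channels at 4x4 resolution
print(g_4x4(noise).shape)          # torch.Size([1, 3, 4, 4])
```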

While training these layers, we "introduce" the layers corresponding to the next resolution, 8x8, little by little. ("Introducing little by little" is described later.) By doing so, we aim for the following state.

Here, a layer called Upsample is sandwiched between the 4x4 layers and the 8x8 layers. Given data at 4x4 resolution, it converts it to 8x8 by interpolating intermediate pixel values. This bridges the data between the 4x4 and 8x8 layers.
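In PyTorch this bridging can be done with nn.Upsample, for example (the interpolation mode here is an assumption):

```python
import torch
import torch.nn as nn

# Upsample converts 4x4 data to 8x8 by interpolating intermediate pixel
# values, bridging the 4x4 block to the 8x8 block.
upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)

x = torch.randn(1, 256, 4, 4)
print(upsample(x).shape)  # torch.Size([1, 256, 8, 8])
```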

"Introduced little by little"

It is known that abruptly switching in a new layer harms training. Progressive GAN therefore "introduces" layers little by little.

For example, when adding the 8x8 layers after the 4x4 layers, we take the output of the 4x4 path multiplied by (1-α) and the output of the 8x8 path multiplied by α, and add the two together to form the output image. The value of α starts at 0 and approaches 1 as training progresses.

When α is 0, the Generator network is equivalent to the one below.

When α is 1, the Generator network is equivalent to the one below.

By moving gradually from the α = 0 state to the α = 1 state, the high-resolution layers are mixed in gradually instead of high-resolution training starting all at once.
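A minimal sketch of this fade-in blend (blend is a hypothetical helper name; the real code is in the repository):

```python
import torch
import torch.nn.functional as F

# Blend the old 4x4 path (upsampled to 8x8) with the new 8x8 path,
# weighted by (1 - alpha) and alpha respectively.
def blend(rgb_from_4x4: torch.Tensor, rgb_from_8x8: torch.Tensor,
          alpha: float) -> torch.Tensor:
    old = F.interpolate(rgb_from_4x4, scale_factor=2, mode="nearest")
    return (1.0 - alpha) * old + alpha * rgb_from_8x8

out = blend(torch.randn(1, 3, 4, 4), torch.randn(1, 3, 8, 8), alpha=0.3)
print(out.shape)  # torch.Size([1, 3, 8, 8])
```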

The same is done for the transition from 8x8 to 16x16, from 16x16 to 32x32, and so on. Ultimately, we train toward a network that can generate 3-channel (R, G, B) Mel icons at 256x256 resolution, as shown below.

Creating a Discriminator

The role of the Discriminator is to take image data as input and identify whether it is a real Mel icon. It learns to raise its accuracy so as not to be fooled by the Generator.

In the initial state, the neural network that makes up the Discriminator is as shown in the figure below. (The red part in the figure, MiniBatchStd, is described later.)

The top convolution layer receives the image input to the Discriminator (3 channels, corresponding to (R, G, B), at 4x4 resolution), applies its convolution, and outputs data with 256 channels at 4x4 resolution, which it passes to the next layer. Each layer processes the data and passes it on, and the last layer outputs data with 1 channel at 1x1 resolution. This 1x1x1 output, in short a single value, indicates how much the input image looks like a Mel icon.
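A toy version of this initial Discriminator stack, mirroring the Generator sketch above (kernel sizes are assumptions; MiniBatchStd is omitted here and sketched later):

```python
import torch
import torch.nn as nn

# Toy version of the initial 4x4 Discriminator stack: RGB image in,
# a single "how much does this look like a Mel icon" score out.
d_4x4 = nn.Sequential(
    nn.Conv2d(3, 256, kernel_size=1),               # "from RGB": 3ch -> 256ch, 4x4
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 256, kernel_size=3, padding=1),  # 256ch 4x4 -> 256ch 4x4
    nn.LeakyReLU(0.2),
    nn.Conv2d(256, 1, kernel_size=4),               # 256ch 4x4 -> 1ch 1x1 score
)

image = torch.randn(1, 3, 4, 4)
print(d_4x4(image).shape)  # torch.Size([1, 1, 1, 1])
```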

As with the Generator, while training these layers we gradually "introduce" the layers for the next resolution, 8x8, aiming for the following state.

In the Generator, a process called Upsample raised the resolution in order to bridge data between the layers of adjacent resolutions. The Discriminator inserts the exact opposite process, called Downsample. It converts, for example, 8x8 data to 4x4 so that data can be bridged from the 8x8 layers to the 4x4 layers. (In PyTorch, the function AdaptiveAvgPool2d is handy for this.)
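For example:

```python
import torch
import torch.nn as nn

# Downsample is the opposite of Upsample: 8x8 data is average-pooled
# down to 4x4 so it can be bridged to the 4x4 block.
downsample = nn.AdaptiveAvgPool2d((4, 4))

x = torch.randn(1, 256, 8, 8)
print(downsample(x).shape)  # torch.Size([1, 256, 4, 4])
```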

As with the Generator, α is raised gradually from 0 to 1 in this way, mixing the new layers in little by little.

Ultimately, we train toward a network that takes a 3-channel (R, G, B) Mel icon at 256x256 resolution as input, as shown below, and judges whether it is genuine or fake.

VS mode collapse

The "Mini Batch Std", which is contained only in the 4x4 layer, prevents a phenomenon called "mode collapse".

What is mode collapse?

We want the Generator to produce as many different kinds of Mel icons as possible. However, a GAN can end up in a state where, no matter which random numbers it is fed, it generates images that are almost indistinguishable from one another. This phenomenon is called mode collapse.

These results are from the previous work, but I will use them here because they are the clearest example.

The top row shows 5 of the training images; the bottom row shows 5 images output by the GAN. The outputs are nearly identical even though a different random input was fed in each of the 5 times.

This happens because the Generator learns that a trick works. Suppose a generated image successfully fools the Discriminator. An image almost identical to it is likely to fool the Discriminator again. Repeat this, and the Generator ends up only able to produce nearly identical images.

Mini batch standard deviation

Progressive GAN has a mechanism that stops the Generator from doing this: the layer called "MiniBatchStd". It computes a statistic called the minibatch standard deviation, which prevents mode collapse.

The Discriminator receives several images at once when identifying them, and takes the standard deviation at each pixel across those images. For example, if it receives 8 images, it has to judge for each of the 8 whether it came from the Generator or is a real Mel icon; across those 8 images, it takes the per-pixel standard deviation.

The per-pixel standard deviations are then averaged over all channels and pixels, giving a single value.

Finally, the MiniBatchStd layer outputs this value broadcast into a 1-channel map with the same resolution as the original input, and passes it to the next 4x4 layer together with the original features.

This value indicates how diverse the batch of input images is (think of it as something like a variance). If it looks too small, the Discriminator can judge that the Generator has started cutting corners and detect the inputs as generated images. So a Generator that produces only similar images gets caught, and it is thereby forced to generate a variety of images.

The MiniBatchStd layer, paired with the 4x4 block near the end of the Discriminator, rules out mode collapse in this way.
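A minimal sketch of such a layer (a common way to implement it; the repository's version may differ in detail):

```python
import torch
import torch.nn as nn

# Minibatch standard deviation: measure how diverse the batch is, and
# hand that information to the Discriminator as one extra channel.
class MiniBatchStd(nn.Module):
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        std = x.std(dim=0, unbiased=False)  # per-channel, per-pixel std over the batch
        mean_std = std.mean()               # average over all channels and pixels
        feat = mean_std.expand(n, 1, h, w)  # broadcast into a 1-channel map
        return torch.cat([x, feat], dim=1)  # append to the original features

x = torch.randn(8, 256, 4, 4)   # a batch of M = 8 feature maps at 4x4
print(MiniBatchStd()(x).shape)  # torch.Size([8, 257, 4, 4])
```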

Learning method / loss function

Both the Generator and the Discriminator use the **WGAN-GP** losses, defined respectively as follows.

-E[d_{fake}]

E[d_{fake}] - E[d_{real}] + \lambda E_{\hat{x}\sim P_{\hat{x}}}[(\|\nabla_{\hat{x}}D(\hat{x})\|_2-1)^2]

I will explain these in order.

Feed noise $z$ into the Generator and obtain as many images as the mini-batch size (hereafter the mini-batch size is $M$; this time $M = 8$). Feed them into the Discriminator and get $M$ outputs, one per image, indicating how much each looks like a Mel icon; call these $d_{fake}$. Likewise, feed $M$ real Mel icons into the Discriminator and call its $M$ outputs $d_{real}$.

WGAN-GP computes the losses from these $d_{real}$ and $d_{fake}$.

Learning Generator

Given an input sequence of random numbers, the Generator tries to generate an image that looks as much like a Mel icon as possible in order to fool the Discriminator.

Loss function

In WGAN-GP, the Generator's loss function is defined as follows:

-E[d_{fake}]

That is, the $M$ images generated by the Generator are scored by the Discriminator, the outputs are averaged, and the sign is flipped. In WGAN-GP, this definition is empirically known to work well. Adam was used as the optimization method for backpropagation, with the learning rate set to 0.0005 and Adam's first- and second-moment coefficients set to 0.0 and 0.99, respectively.

In addition, only while training the 256x256 layers, the learning rate is lowered to 0.0001 after a certain number of iterations. (Subjectively, this seemed to make Mel icon generation go somewhat better, though that may be my imagination; there may be a better way.)
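Putting this together, one Generator update might look like the following sketch, reusing the toy g_4x4 and d_4x4 networks from the earlier snippets (the learning-rate decay is omitted):

```python
import torch

# One Generator step with the WGAN-GP generator loss -E[d_fake].
# Optimizer settings follow the text: Adam, lr=0.0005, betas=(0.0, 0.99).
opt_g = torch.optim.Adam(g_4x4.parameters(), lr=0.0005, betas=(0.0, 0.99))

z = torch.randn(8, 512, 4, 4)  # M = 8 noise inputs
d_fake = d_4x4(g_4x4(z))       # Discriminator scores for the generated images
loss_g = -d_fake.mean()        # -E[d_fake]

opt_g.zero_grad()
loss_g.backward()
opt_g.step()
```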

Learning Discriminator

After backpropagating through the Generator, we next backpropagate through the Discriminator.

Loss function

In WGAN-GP, the Discriminator loss function is defined as follows.

E[d_{fake}] - E[d_{real}] + \lambda E_{\hat{x}\sim P_{\hat{x}}}[(\|\nabla_{\hat{x}}D(\hat{x})\|_2-1)^2]

Gradient penalty

The gradient penalty term is defined as follows.

\lambda E_{\hat{x}\sim P_{\hat{x}}}[(\|\nabla_{\hat{x}}D(\hat{x})\|_2-1)^2]

Here, writing the distribution of generated images as $P_{fake}$ and the distribution of real images as $P_{real}$, we define

\epsilon\sim U[0,1],\quad x_{fake}\sim P_{fake},\quad x_{real}\sim P_{real}

\hat{x}=(1-\epsilon)x_{fake}+\epsilon x_{real}

Let me explain the intuition behind this. (It is only an intuition, and a fairly rough one at that.)

Consider the many images $\hat{x}$ obtained by mixing generated images and real images in random proportions, and the space of outputs the Discriminator gives for them. For an optimized Discriminator, it is known that the gradient has norm 1 at almost every point of this space. Presumably, staying around 1 is convenient because the gradient then neither vanishes nor diverges during backpropagation. Therefore, in the Mel Icon Generator's Discriminator too, we train so that this value becomes 1. The term responsible for that is the gradient penalty:

\lambda E_{\hat{x}\sim P_{\hat{x}}}[(\|\nabla_{\hat{x}}D(\hat{x})\|_2-1)^2]

Also, the constant $\lambda$ is set to 10.0 this time. (The reference material used 10.0, so I followed it.)
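A sketch of how this term can be computed in PyTorch (a standard WGAN-GP recipe, not necessarily line-for-line what the repository does):

```python
import torch

# Gradient penalty: mix real and fake images in a random ratio epsilon,
# then penalize the Discriminator's gradient norm at those mixed points
# for deviating from 1. lam = 10.0 follows the text.
def gradient_penalty(disc, x_real, x_fake, lam=10.0):
    eps = torch.rand(x_real.size(0), 1, 1, 1)  # one epsilon per image
    x_hat = ((1 - eps) * x_fake.detach()
             + eps * x_real.detach()).requires_grad_(True)
    d_hat = disc(x_hat)
    grads, = torch.autograd.grad(d_hat.sum(), x_hat, create_graph=True)
    grad_norm = grads.flatten(start_dim=1).norm(2, dim=1)  # ||grad D(x_hat)||_2
    return lam * ((grad_norm - 1) ** 2).mean()
```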

That is the Discriminator loss in plain WGAN-GP. On top of it, we additionally square $d_{real}$, take its mean, and add the term $E[{d_{real}}^2]$. This term reduces the bad influence of extreme gradients on training, giving:

E[d_{fake}] - E[d_{real}] + \lambda E_{\hat{x}\sim P_{\hat{x}}}[(\|\nabla_{\hat{x}}D(\hat{x})\|_2-1)^2] + \beta E[{d_{real}}^2]

The constant $\beta$ is set to 0.001. (Again, simply because the reference material used 0.001.)

The above is the Discriminator loss used this time. Adam was used as the optimization method for backpropagation, with the learning rate 0.0005 and Adam's first and second moment coefficients (the exponential decay rates used in moment estimation) set to 0.0 and 0.99, respectively. Furthermore, only while training the 256x256 layers, the learning rate is lowered to 0.0001 after a certain number of iterations. (Apart from the loss function, this is exactly the same as the Generator.)
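Assembled into one Discriminator update, again with the toy networks and the gradient_penalty sketch from above:

```python
import torch

# One Discriminator step with the full loss:
# E[d_fake] - E[d_real] + gradient penalty + beta * E[d_real^2], beta = 0.001.
opt_d = torch.optim.Adam(d_4x4.parameters(), lr=0.0005, betas=(0.0, 0.99))

x_real = torch.rand(8, 3, 4, 4)                     # stand-in for real Mel icons
x_fake = g_4x4(torch.randn(8, 512, 4, 4)).detach()  # generated images

d_real = d_4x4(x_real)
d_fake = d_4x4(x_fake)
loss_d = (d_fake.mean() - d_real.mean()
          + gradient_penalty(d_4x4, x_real, x_fake)
          + 0.001 * (d_real ** 2).mean())

opt_d.zero_grad()
loss_d.backward()
opt_d.step()
```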

Overall picture

Reprinting the figures introduced above: the Generator and Discriminator built earlier are combined to form the Progressive GAN. With the following state as the final goal, we train layer by layer, starting from low resolution.

Generate

This time, the mini-batch size is 8 and we move to the next resolution every 7,500 training iterations. We train on the real Mel icons we received and have the Generator produce Mel icons.

**It worked!!!!!!**

It now generates a different image every time, and the resolution is much improved. Progressive GAN is seriously great!!!

The output during learning is as follows.

You can see that learning is progressing for each resolution.

Digression: Data augmentation

In machine learning, a technique called "data augmentation" is often used to increase the variety of images in a dataset. At each training step, the dataset can be inflated by randomly changing an image's contrast or hue, flipping it horizontally, rotating it, or distorting the whole image.

However, doing this with the Mel Icon Generator poses problems. First of all, a distinctive feature of Mel icons is that the head is drawn growing in from the lower left.

(Icon: Minagi (as of August 5, 2020))

Because of this, distortion, rotation, horizontal flipping, and the like would very likely be learned unintentionally, so they are best avoided. I also skip hue conversion, since it produces icons with eerie colors. Contrast conversion, however, had few negative effects, so we trained with that form of data augmentation.
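A sketch of contrast-only augmentation with torchvision (a standard approach; the exact transform and range used for this project are assumptions here):

```python
from PIL import Image
from torchvision import transforms

# Randomly rescale contrast between 0.5x and 2.0x, while deliberately
# leaving hue, rotation, and flips untouched.
augment = transforms.Compose([
    transforms.ColorJitter(contrast=(0.5, 2.0)),
    transforms.ToTensor(),
])

icon = Image.open("mel_icon.png").convert("RGB")  # hypothetical file name
augmented = augment(icon)                         # 3 x H x W tensor in [0, 1]
```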

On the left is an original image; on the right, the same image with the contrast doubled. Expecting this to give an overwhelmingly larger dataset than last time, I greedily doubled the number of training iterations and ran the training. The result is below.

It does not look dramatically better than without augmentation, but the method itself seems good.

Summary

The Mel Icon Generator now not only overcomes mode collapse thanks to Progressive GAN, it also achieves a much higher resolution. Depending on the method and the dataset, Progressive GAN is reportedly capable of generating even full-HD images. (For a Twitter icon, 256x256 is plenty.) Real-world applications, for example in the medical field, also appear active, and the method seems likely to attract still more attention from now on.

Let's generate exciting images with Progressive GAN.

Source code

The code I wrote is in this repository: https://github.com/zassou65535/image_generator_2

Bonus

Taking the per-pixel average (torch.mean) over the roughly 640 images in the dataset gives the following image.

I tried other statistics in the same way. Below, from left to right, are the standard deviation (torch.std), the median (torch.median), and the mode (torch.mode).
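These statistics can be computed by stacking the dataset into one tensor and reducing over the image dimension; a sketch with stand-in data:

```python
import torch

# Per-pixel statistics over the dataset: reduce a (N, 3, H, W) stack
# along dimension 0. Random tensors stand in for the ~640 real icons.
icons = torch.rand(640, 3, 256, 256)

mean_img   = torch.mean(icons, dim=0)
std_img    = torch.std(icons, dim=0)
median_img = torch.median(icons, dim=0).values  # median/mode return (values, indices)
mode_img   = torch.mode(icons, dim=0).values
min_img    = torch.min(icons, dim=0).values     # likewise for torch.max
```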

I also tried the minimum (torch.min) and the maximum (torch.max), but they yielded only a nearly black image and a nearly white image, respectively.

Incidentally, taking the standard deviation (torch.std) of 5 randomly chosen images gives this. It looks a little stylish.

Also, while taking the minimum (torch.min) over the full dataset of nearly 640 images outputs only a nearly black image, limiting it to about 7 images makes a Mel-icon-like image pop out. Below is the minimum over 7 randomly chosen images.

Previous work

The story of making a mel icon generator

References

Practical GAN: Deep Learning with Generative Adversarial Networks
Learn While Making! Advanced Deep Learning with PyTorch
Implement PGGAN with PyTorch
PGGAN: "curriculum learning full of kindness"
[DL Reading Group] Improved Training of Wasserstein GANs
GAN (4) WGAN
