Explaining and running the SMIS model that lets you "virtually try on clothes" with deep learning

Introduction

A research team from Huazhong University of Science and Technology and Peking University in China published a paper entitled "Semantically Multi-modal Image Synthesis" (SMIS) at CVPR 2020. According to this paper, you can change the pattern and color of the shirt or trousers in a single photo to create a virtual try-on image. The method is deep-learning based: image regions are classified by object, and parts of the image are replaced and recombined.

This article gives a brief explanation of the SMIS model and then shows how to actually run it. If you just want to run it, skip ahead to the section on running the model. Details of the paper are below. smis.gif

Paper: https://arxiv.org/abs/2003.12697
Project page: https://seanseattle.github.io/SMIS/
GitHub: https://github.com/Seanseattle/SMIS
Video: http://www.youtube.com/watch?v=uarUonGi_ZU

About the model

To change the pattern or color of clothes, the model first needs to know which pixels belong to the shirt and which to the trousers. Semantic segmentation, which assigns image regions to class labels, is performed using DeepFashion, a dataset in which clothing parts are labeled.
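To make the per-class control concrete, here is a minimal sketch of how a segmentation label map can be split into per-class binary masks, one per semantic class, which is the form in which class information is typically fed to a generator. The class indices and shapes below are made up for the example, not taken from the SMIS code:

```python
import numpy as np

def one_hot_masks(label_map, num_classes):
    """Turn an HxW integer label map into num_classes binary masks (H, W, C)."""
    return (label_map[..., None] == np.arange(num_classes)).astype(np.float32)

# Toy 2x3 label map with 3 classes (e.g. background / shirt / trousers).
labels = np.array([[0, 1, 1],
                   [0, 2, 2]])
masks = one_hot_masks(labels, 3)
print(masks.shape)    # (2, 3, 3)
print(masks[..., 1])  # binary mask of class 1 (the "shirt" region)
```

Editing only one class then amounts to changing the generator's input for that one mask channel while leaving the others untouched.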

By adjusting a controller for each semantic segmentation class, you can transform the image only in the corresponding region. Traditionally, this kind of image synthesis built a separate generation network for each class and integrated the outputs of the different networks into the final image.

However, with this approach, training time grows and accuracy drops as the number of classes increases. SMIS solves this problem with GroupDNet (Group Decreasing Network), a network that replaces the conventional convolutions used for class control with group convolutions, unifying the generation process into a single model.

main.jpg

GroupDNet can exploit cross-correlations between classes that resemble each other (for example, grass color and leaf color), which improves overall image quality. It also reduces the amount of computation when there are many classes. This makes it easier to convert semantic labels into another image, and the authors obtained high-quality results even on datasets with many classes.
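To see why group convolution reduces computation as the number of classes grows, here is a back-of-the-envelope calculation. The channel counts are hypothetical, not the actual GroupDNet configuration; the point is only that splitting a convolution into g groups divides its weight count by g:

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution: with grouping, each output channel
    sees only c_in / groups input channels, so weights shrink by `groups`."""
    assert c_in % groups == 0 and c_out % groups == 0
    return c_out * (c_in // groups) * k * k

# Hypothetical layer: 512 channels, 3x3 kernel, 8 semantic classes as groups.
standard = conv_params(512, 512, 3, groups=1)  # 2,359,296 weights
grouped  = conv_params(512, 512, 3, groups=8)  #   294,912 weights (1/8)
print(standard, grouped)
```

In PyTorch this corresponds to the `groups` argument of `torch.nn.Conv2d`, which GroupDNet uses to keep per-class latent codes in separate channel groups within one network.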

l_koya_sem2.png Figure 1 GroupDNet architecture (quoted from [1])

Figure 2 shows a comparison with other models. Among the several metrics for evaluating the quality of images produced by generative models, one is the Fréchet Inception Distance (FID), an index that measures the distance between the distributions of real and generated images. On FID, SMIS achieves state-of-the-art results compared with the other models. In terms of speed, however, it runs at 12.2 FPS, which is slower than the others.
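For reference, FID fits a Gaussian to the Inception-v3 features of each image set and computes the Fréchet distance ||mu1 - mu2||^2 + Tr(S1 + S2 - 2(S1·S2)^{1/2}) between the two Gaussians. A minimal NumPy sketch of that formula (the features here are toy random data, not Inception features):

```python
import numpy as np

def _sqrtm_psd(m):
    """Square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between Gaussians N(mu1, sigma1) and N(mu2, sigma2).
    Uses Tr sqrt(S1 S2) = Tr sqrt(S1^{1/2} S2 S1^{1/2}) to stay symmetric."""
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    tr_covmean = np.trace(_sqrtm_psd(s1_half @ sigma2 @ s1_half))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2)
                 - 2.0 * tr_covmean)

# Toy check: identical feature statistics give a distance of (near) zero.
rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 4))
mu, sigma = feats.mean(0), np.cov(feats, rowvar=False)
fid_same = frechet_distance(mu, sigma, mu, sigma)
print(fid_same)  # ~0.0
```

Lower FID means the generated distribution is closer to the real one, which is why Figure 2 reports it as the quality metric.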

スクリーンショット 2020-09-04 14.08.43.png Figure 2 Comparison with other models (quoted from [1])

Since this method changes the style after semantic segmentation, as shown in Figure 1, it is not limited to fashion. It can be applied in various ways, such as changing a building into a tree, placing a bed in an empty space, or morphing gradually from one image to another.
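The gradual morphing mentioned above can be understood as interpolating the per-class latent codes of two images and decoding each intermediate code with the generator. A minimal sketch of that interpolation step, with made-up shapes and helper names that are illustrative rather than taken from the SMIS codebase:

```python
import numpy as np

def interpolate_latents(z_a, z_b, steps):
    """Linearly interpolate per-class latent codes between two images.
    Decoding each intermediate code would yield one frame of the morph."""
    ts = np.linspace(0.0, 1.0, steps)
    return [(1.0 - t) * z_a + t * z_b for t in ts]

# Toy per-class codes: 3 classes, each with a 4-dimensional latent vector.
rng = np.random.default_rng(1)
z_a, z_b = rng.normal(size=(3, 4)), rng.normal(size=(3, 4))
path = interpolate_latents(z_a, z_b, steps=5)
assert np.allclose(path[0], z_a) and np.allclose(path[-1], z_b)
```

Because the codes are grouped per class, one could also interpolate only a single class's code (say, the shirt) while keeping the rest of the image fixed.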

(Parts of this explanation are quoted from [2]. There may be omissions or errors, so please leave a comment if you find any.)

Running the model

Now let's run the SMIS model. The research team has published a PyTorch implementation on GitHub, and I wrote code that actually runs it on Google Colaboratory. You can check the notebook I created at the URL below; copy it to your own Drive and run it (just execute the cells in order from the top). https://colab.research.google.com/drive/1HGqqOXxFKTSJibg2tQ-Shb9ArFiX-96g?usp=sharing

I will also walk through the Colaboratory code here on Qiita.

Clone the repository

Clone the repository (https://github.com/Seanseattle/SMIS) from GitHub.

!git clone https://github.com/Seanseattle/SMIS
%cd SMIS

Download trained model

Download the trained model from the following Google Drive link: https://drive.google.com/open?id=1og_9By_xdtnEd9-xawAj4jYbXR6A9deG The trained model is linked from the README of the repository. To download directly from Google Drive on the command line, you need to pass cookie information to wget, so download it as follows (how to download from Drive using cookies is summarized at https://qiita.com/tommy19970714/items/3e6a2e8b9dc15982a5de). Alternatively, you can simply open the URL in your browser and download it through the GUI.

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1og_9By_xdtnEd9-xawAj4jYbXR6A9deG' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1og_9By_xdtnEd9-xawAj4jYbXR6A9deG" -O smis_pretrained.rar && rm -rf /tmp/cookies.txt

Extract the downloaded pretrained model archive and rename the folder.

!unrar x smis_pretrained.rar
!mv smis_pretrained checkpoints

Download DeepFashion dataset

Similarly, download the DeepFashion dataset from the following Google Drive link. It contains both train and test data; this time we only need the test data. https://drive.google.com/open?id=1ckx35-mlMv57yzv47bmOCrWTm5l2X-zD

!wget --load-cookies /tmp/cookies.txt "https://docs.google.com/uc?export=download&confirm=$(wget --quiet --save-cookies /tmp/cookies.txt --keep-session-cookies --no-check-certificate 'https://docs.google.com/uc?export=download&id=1ckx35-mlMv57yzv47bmOCrWTm5l2X-zD' -O- | sed -rn 's/.*confirm=([0-9A-Za-z_]+).*/\1\n/p')&id=1ckx35-mlMv57yzv47bmOCrWTm5l2X-zD" -O deepfashion.tar && rm -rf /tmp/cookies.txt

Extract the file.

!tar -xvf deepfashion.tar

Install the required libraries

The libraries used in this repository are listed in requirements.txt. Install them:

!pip install -r requirements.txt

Run the model test

All that's left is to run test.py. The --gpu_ids parameter specifies the GPU id to use; you can check it with the nvidia-smi command. The --dataroot parameter points to the directory containing the cihp_test_mask and cihp_train_mask folders of the DeepFashion dataset you just downloaded. Since we saved it in the current directory, ./ is specified.

!python test.py --name deepfashion_smis --dataset_mode deepfashion --dataroot ./ --no_instance \
--gpu_ids 0 --ngf 160 --batchSize 4 --model smis --netG deepfashion

The generated images are saved under results/deepfashion_smis/test_latest/images/. Please check them.

You can see that images like the following are generated.

output1.pngoutput2.png output3.pngoutput4.png

In closing

In this article, I covered the SMIS model for clothing style conversion, which is state-of-the-art on the FID metric, from the explanation through actually running it. The field of deep learning image generation is full of surprises, with new papers coming out every day, and it is a lot of fun to follow these latest techniques. The application of this model is very clear, and I think it is a technology that companies in the fashion industry will also want. I hope articles like this help bring deep learning models into practical use.

I tweet about deep learning models and personal development on Twitter → @tommy19970714

If there are omissions or errors in the explanation of the model, please leave a comment.

References

[1] Zhu, Zhen, et al. "Semantically Multi-modal Image Synthesis." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2020, pp. 5467-5476.
[2] ITmedia: "Deep learning makes it easy to change clothes — 'SMIS', a technology that replaces and synthesizes only part of an image" (https://www.itmedia.co.jp/news/articles/2009/04/news027.html)
[3] Overview of Progressive / Big / Style GANs and performance evaluation metrics for GANs
