Overview

illustration2vec (https://github.com/rezoo/illustration2vec) is a model that estimates tags for an illustration and extracts a feature vector from it. Its architecture is close to VGG, with some modifications; see the paper linked from the repository above for details. The released model is for Caffe and Chainer. It's an interesting model, but Chainer development appears to have ended, so I wanted to keep it usable and wrote code to convert it to Keras. I have never used PyTorch, so I skip a PyTorch version here.

I ran everything on Google Colaboratory; the notebook is here: https://colab.research.google.com/drive/1UZN7pn4UzU5s501dwSIA2IHGmjnAmouY

If you can't open the link, paste the following code into Colab and run it.

All code

!git clone https://github.com/rezoo/illustration2vec.git
!sh illustration2vec/get_models.sh
!pip install -r /content/illustration2vec/requirements.txt
!mv /content/illustration2vec/i2v /content/
import i2v
illust2vec_tag = i2v.make_i2v_with_chainer('/content/illustration2vec/illust2vec_tag_ver200.caffemodel', '/content/illustration2vec/tag_list.json')
%tensorflow_version 2.x
import tensorflow as tf
import numpy as np

# tag estimator model
model_tag = tf.keras.Sequential(name='illustration2vec_tag')
model_tag.add(tf.keras.layers.Input(shape=(224, 224, 3)))
pool_idx = [0, 1, 3, 5, 7]
for i, chainer_layer in enumerate(illust2vec_tag.net.children()):
    kernel, bias = tuple(chainer_layer.params())
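    # Chainer kernel (out, in, kh, kw) -> Keras kernel (kh, kw, in, out)
    k_kernel = np.transpose(kernel.data, axes=[2, 3, 1, 0])
    bias = bias.data
    if i == 0:
        # reverse the in_channel axis so the model takes RGB instead of BGR
        k_kernel = k_kernel[:,:,::-1,:]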
    channel = bias.shape[0]
    keras_layer = tf.keras.layers.Conv2D(channel, 3, padding='SAME', activation='relu', kernel_initializer=tf.keras.initializers.constant(k_kernel), bias_initializer=tf.keras.initializers.constant(bias), name='Conv_%d'%i)
    model_tag.add(keras_layer)
    if i in pool_idx:
        model_tag.add(tf.keras.layers.MaxPooling2D())
    del kernel, bias
model_tag.add(tf.keras.layers.AveragePooling2D(pool_size=(7, 7)))
model_tag.add(tf.keras.layers.Lambda(lambda x : tf.nn.sigmoid(tf.squeeze(x, axis=[1, 2])), name='sigmoid'))
model_tag.save('illust2vec_tag_ver200.h5')
del model_tag, illust2vec_tag

#feature vector model
illust2vec = i2v.make_i2v_with_chainer('/content/illustration2vec/illust2vec_ver200.caffemodel')
model = tf.keras.Sequential(name='illustration2vec')
model.add(tf.keras.layers.Input(shape=(224, 224, 3)))
pool_idx = [0, 1, 3, 5, 7]
for i, chainer_layer in enumerate(illust2vec.net.children()):
    if i == 12:
        break
    kernel, bias = tuple(chainer_layer.params())
    if len(kernel.data.shape) == 4:
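        # Chainer kernel (out, in, kh, kw) -> Keras kernel (kh, kw, in, out)
        k_kernel = np.transpose(kernel.data, axes=[2, 3, 1, 0])
        bias = bias.data
        if i == 0:
            # reverse the in_channel axis so the model takes RGB instead of BGR
            k_kernel = k_kernel[:,:,::-1,:]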
        channel = bias.shape[0]
        keras_layer = tf.keras.layers.Conv2D(channel, 3, padding='SAME', activation='relu', kernel_initializer=tf.keras.initializers.constant(k_kernel), bias_initializer=tf.keras.initializers.constant(bias), name='Conv_%d'%i)
        model.add(keras_layer)
        if i in pool_idx:
            model.add(tf.keras.layers.MaxPooling2D())
        elif i == 10:
            model.add(tf.keras.layers.Flatten())
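    elif len(kernel.data.shape) == 2:
        # Chainer Linear stores W as (out, in); Keras Dense expects (in, out), so transpose.
        # Caution: Chainer flattens (C, H, W) while Keras Flatten uses (H, W, C); if the
        # flattened map is larger than 1x1, the rows of W would also need reordering.
        model.add(tf.keras.layers.Dense(4096, kernel_initializer=tf.keras.initializers.constant(kernel.data.T), bias_initializer=tf.keras.initializers.constant(bias.data), name='encode1'))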
    del kernel, bias
model.save('illust2vec_ver200.h5')
del model, illust2vec
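# Preprocess a list of HxWx3 RGB images: scale to [0, 1], resize to 224x224, restore the
# original value range, then subtract the per-channel mean (in RGB order)
def resize(imgs):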
    mean = tf.constant(np.array([181.13838569, 167.47864617, 164.76139251]).reshape((1, 1, 3)), dtype=tf.float32)
    resized = []
    for img in imgs:
        img = tf.cast(img, tf.float32)
        im_max = tf.reduce_max(img, keepdims=True)
        im_min = tf.reduce_min(img, keepdims=True)
        im_std = (img - im_min) / (im_max - im_min + 1e-10)
        resized_std = tf.image.resize(im_std, (224, 224))
        resized_im = resized_std*(im_max - im_min) + im_min
        resized_im = resized_im - mean
        resized.append(tf.expand_dims(resized_im, 0))
    return tf.concat(resized, 0)
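The final AveragePooling2D(pool_size=(7, 7)) in the tag model follows from the five max-pooling layers added after the conv indices in pool_idx: 224 / 2**5 = 7, so averaging over the whole 7x7 map leaves a 1x1 map, which tf.squeeze then drops. A small sketch of that arithmetic, assuming stride-1 3x3 'SAME' convolutions (size-preserving) and MaxPooling2D's default pool size and stride of 2; `spatial_sizes` is a hypothetical helper written only for this illustration:

```python
# Spatial size after each of the first eight conv indices in the conversion loop.
# Assumes 'SAME' 3x3 convs with stride 1 (size unchanged) and a 2x2, stride-2
# MaxPooling2D after every index listed in pool_idx.
pool_idx = [0, 1, 3, 5, 7]

def spatial_sizes(input_size=224, n_convs=8, pool_idx=pool_idx):
    size = input_size
    sizes = []
    for i in range(n_convs):
        if i in pool_idx:
            size //= 2  # MaxPooling2D() defaults: pool_size=2, strides=2, padding='valid'
        sizes.append(size)
    return sizes

print(spatial_sizes())  # [112, 56, 56, 28, 28, 14, 14, 7]
```

The convolutions after index 7 keep the 7x7 size, which is why a single 7x7 average pool reduces each of the tag channels to one value.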

Commentary

All I do is define a Keras model with the same structure and initialize it with the weights of the Chainer model. There are three changes:

- Weight transpose: the convolution kernels of this Chainer model have shape (out_channel, in_channel, k_h, k_w); they are transposed to (k_h, k_w, in_channel, out_channel) for the Keras model.
- Input changed from BGR to RGB: the original model takes BGR input, but the converted model takes RGB. To compensate, the first layer's convolution kernel is reversed along the in_channel axis.
- Input changed to channels-last: Chainer convolutions take (N, C, H, W) input, whereas Keras defaults to (N, H, W, C).
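The three changes can be checked on toy data with the NumPy sketch below (random weights and a tiny image, not the real model). `conv2d_nchw` and `conv2d_nhwc` are naive helpers written only for this illustration. It shows that transposing a Chainer-layout kernel with axes=[2, 3, 1, 0], reversing its in_channel axis, and feeding a channels-last RGB image reproduces the original channels-first BGR result. Both Chainer and TensorFlow convolutions compute cross-correlation, so no spatial flip of the kernel is needed.

```python
import numpy as np

def conv2d_nchw(x, w):
    """Naive stride-1 'valid' cross-correlation in Chainer layout.
    x: (C_in, H, W); w: (C_out, C_in, kh, kw)."""
    c_out, _, kh, kw = w.shape
    _, h, wd = x.shape
    out = np.zeros((c_out, h - kh + 1, wd - kw + 1))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[:, i:i + kh, j:j + kw]  # (C_in, kh, kw)
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

def conv2d_nhwc(x, k):
    """Same operation in Keras layout.
    x: (H, W, C_in); k: (kh, kw, C_in, C_out)."""
    kh, kw, _, c_out = k.shape
    h, wd, _ = x.shape
    out = np.zeros((h - kh + 1, wd - kw + 1, c_out))
    for i in range(h - kh + 1):
        for j in range(wd - kw + 1):
            patch = x[i:i + kh, j:j + kw, :]  # (kh, kw, C_in)
            out[i, j, :] = np.tensordot(patch, k, axes=([0, 1, 2], [0, 1, 2]))
    return out

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 3, 3, 3))   # toy first-layer kernel, Chainer layout, expects BGR
x_bgr = rng.standard_normal((3, 8, 8))  # toy BGR image, channels-first
ref = conv2d_nchw(x_bgr, w)             # what the original first layer computes

x_rgb = np.transpose(x_bgr[::-1], (1, 2, 0))               # BGR -> RGB, then (H, W, C)
k_rgb = np.transpose(w, axes=[2, 3, 1, 0])[:, :, ::-1, :]  # (kh, kw, in, out), in axis reversed
out = np.transpose(conv2d_nhwc(x_rgb, k_rgb), (2, 0, 1))   # back to (C_out, H', W') to compare
print(np.allclose(out, ref))  # True

# axes=[3, 2, 1, 0] yields (kw, kh, in, out): each 3x3 kernel is transposed spatially
k_swapped = np.transpose(w, axes=[3, 2, 1, 0])[:, :, ::-1, :]
print(np.allclose(np.transpose(conv2d_nhwc(x_rgb, k_swapped), (2, 0, 1)), ref))  # False
```

The last check shows that swapping kh and kw only gives the same result when every kernel is spatially symmetric, so the order of the two spatial axes matters.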

Both converted models take input of shape (batch, 224, 224, 3). Passing a list of images as numpy arrays to the resize function (in the Colab notebook linked above or in the full code) resizes and normalizes them into that shape. There are two models, illust2vec_tag_ver200.h5 and illust2vec_ver200.h5. The first outputs the probability of each of the 1539 tags in tag_list.json from https://github.com/rezoo/illustration2vec. The second outputs a 4096-dimensional feature vector of the image.
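To read the tag model's output, one option is to pair each probability with its tag name and sort. The sketch below uses a hypothetical helper, `top_tags`, and assumes tag_list.json is a JSON list of 1539 names in the same order as the model's outputs; the toy tags and probabilities are placeholders, not real model output. The commented lines show how it might be wired to the converted model, untested here.

```python
import numpy as np

def top_tags(probs, tags, k=5):
    """Return the k (tag, probability) pairs with the highest probability.

    probs: 1-D array, one output row of the tag model.
    tags:  tag names, assumed to be in the same order as the model outputs.
    """
    probs = np.asarray(probs)
    order = np.argsort(probs)[::-1][:k]  # indices sorted by descending probability
    return [(tags[i], float(probs[i])) for i in order]

# Hypothetical wiring to the converted model (assumes the files created above):
# model = tf.keras.models.load_model('illust2vec_tag_ver200.h5')
# probs = model.predict(resize([img]))[0]
# tags = json.load(open('/content/illustration2vec/tag_list.json'))

toy_tags = ['tag_a', 'tag_b', 'tag_c', 'tag_d']  # placeholder names
toy_probs = np.array([0.9, 0.2, 0.7, 0.4])       # placeholder probabilities
print(top_tags(toy_probs, toy_tags, k=2))  # [('tag_a', 0.9), ('tag_c', 0.7)]
```

Since each output is an independent sigmoid, a per-tag threshold is an alternative to a fixed k.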

Please let me know if you spot any problems.

I wrote code to convert illustration2vec to a Keras model
