Before I knew it, Golden Week was over in the blink of an eye. This year I spent GW walking around Tokyo (and a bit of Saitama) with the whole family. I went up the Skytree, but I didn't expect it to be that crowded. If I go again, it might be more interesting to go at night; apparently something is screened there then.
I've since modified the network structure and various parts of the dataset so that better line art can be extracted, so I've updated this article accordingly.
As I wrote in the previous article, I've been trying line-art colorization with Deep Learning, and along the way I felt there were some problems with the dataset. The line art itself was created with reference to http://qiita.com/khsk/items/6cf4bae0166e4b12b942, but that approach has a few issues.
Still, despite those issues, I'm sticking with that method for now because the line art can be generated easily (it's just OpenCV) and quickly (again, thanks to OpenCV); a rough sketch of the idea is below.
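For reference, OpenCV-based extraction of this kind is commonly done in a dilate-and-diff style. The following is only a minimal sketch under that assumption; the kernel size and other details are my own choices, not taken from the linked article.

```python
import cv2
import numpy as np

def extract_lineart(path):
    """Minimal dilate-and-diff line-art extraction sketch (parameters are illustrative)."""
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    # Dilate the image, then take the difference with the original;
    # edges change the most under dilation, so they remain in the diff.
    kernel = np.ones((3, 3), np.uint8)
    dilated = cv2.dilate(gray, kernel, iterations=1)
    diff = cv2.absdiff(dilated, gray)
    # Invert so lines are dark on a white background.
    return 255 - diff
```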
As for existing research, there is the following. http://hi.cs.waseda.ac.jp:8081/
The paper is here. http://hi.cs.waseda.ac.jp/~esimo/publications/SimoSerraSIGGRAPH2016.pdf
It is an AutoEncoder-based technique that converts rough sketches into clean line art. I built my network with reference to this.
class AutoEncoder(object):
    """Define autoencoder"""

    def __init__(self):
        # Encoder: three strided (downsampling) convolutions, each followed by
        # stride-1 "flat" convolutions.
        self.conv1 = Encoder(3, 48, 5, 5, strides=[1, 2, 2, 1], name='encoder1')
        self.conv1_f1 = Encoder(48, 128, 3, 3, name='encoder1_flat1')
        self.conv1_f2 = Encoder(128, 128, 3, 3, name='encoder1_flat2')
        self.conv2 = Encoder(128, 256, 5, 5, strides=[1, 2, 2, 1], name='encoder2')
        self.conv2_f1 = Encoder(256, 256, 3, 3, name='encoder2_flat1')
        self.conv2_f2 = Encoder(256, 256, 3, 3, name='encoder2_flat2')
        self.conv3 = Encoder(256, 256, 5, 5, strides=[1, 2, 2, 1], name='encoder3')
        self.conv3_f1 = Encoder(256, 512, 3, 3, name='encoder3_flat1')
        self.conv3_f2 = Encoder(512, 1024, 3, 3, name='encoder3_flat2')
        self.conv3_f3 = Encoder(1024, 512, 3, 3, name='encoder3_flat3')
        self.conv3_f4 = Encoder(512, 256, 3, 3, name='encoder3_flat4')

        # Batch normalization for each encoder convolution.
        self.bnc1 = op.BatchNormalization(name='bnc1')
        self.bnc1_f1 = op.BatchNormalization(name='bnc1_flat1')
        self.bnc1_f2 = op.BatchNormalization(name='bnc1_flat2')
        self.bnc2 = op.BatchNormalization(name='bnc2')
        self.bnc2_f1 = op.BatchNormalization(name='bnc2_flat1')
        self.bnc2_f2 = op.BatchNormalization(name='bnc2_flat2')
        self.bnc3 = op.BatchNormalization(name='bnc3')
        self.bnc3_f1 = op.BatchNormalization(name='bnc3_flat1')
        self.bnc3_f2 = op.BatchNormalization(name='bnc3_flat2')
        self.bnc3_f3 = op.BatchNormalization(name='bnc3_flat3')
        self.bnc3_f4 = op.BatchNormalization(name='bnc3_flat4')

        # Decoder: three upsampling deconvolutions back toward the input resolution,
        # again with stride-1 "flat" convolutions in between.
        self.deconv1 = Decoder(256, 256, 4, 4, strides=[1, 2, 2, 1], name='decoder1')
        self.deconv1_f1 = Encoder(256, 128, 3, 3, name='decoder1_flat1')
        self.deconv1_f2 = Encoder(128, 128, 3, 3, name='decoder1_flat2')
        self.deconv2 = Decoder(128, 128, 4, 4, strides=[1, 2, 2, 1], name='decoder2')
        self.deconv2_f1 = Encoder(128, 128, 3, 3, name='decoder2_flat1')
        self.deconv2_f2 = Encoder(128, 48, 3, 3, name='decoder2_flat2')
        self.deconv3 = Decoder(48, 48, 4, 4, strides=[1, 2, 2, 1], name='decoder3')
        self.deconv3_f1 = Decoder(48, 24, 3, 3, name='decoder3_flat1')
        self.deconv3_f2 = Decoder(24, 1, 3, 3, name='decoder3_flat2')

        # Batch normalization for each decoder layer (the final layer has none;
        # it feeds straight into a sigmoid).
        self.bnd1 = op.BatchNormalization(name='bnd1')
        self.bnd1_f1 = op.BatchNormalization(name='bnd1_flat1')
        self.bnd1_f2 = op.BatchNormalization(name='bnd1_flat2')
        self.bnd2 = op.BatchNormalization(name='bnd2')
        self.bnd2_f1 = op.BatchNormalization(name='bnd2_flat1')
        self.bnd2_f2 = op.BatchNormalization(name='bnd2_flat2')
        self.bnd3 = op.BatchNormalization(name='bnd3')
        self.bnd3_f1 = op.BatchNormalization(name='bnd3_flat1')
def autoencoder(images, height, width):
    """make autoencoder network"""
    AE = AutoEncoder()

    def div(v, d):
        return max(1, v // d)

    relu = tf.nn.relu
    # Encoder: each strided convolution halves the spatial resolution.
    net = relu(AE.bnc1(AE.conv1(images, [height, width])))
    net = relu(AE.bnc1_f1(AE.conv1_f1(net, [div(height, 2), div(width, 2)])))
    net = relu(AE.bnc1_f2(AE.conv1_f2(net, [div(height, 2), div(width, 2)])))
    net = relu(AE.bnc2(AE.conv2(net, [div(height, 2), div(width, 2)])))
    net = relu(AE.bnc2_f1(AE.conv2_f1(net, [div(height, 4), div(width, 4)])))
    net = relu(AE.bnc2_f2(AE.conv2_f2(net, [div(height, 4), div(width, 4)])))
    net = relu(AE.bnc3(AE.conv3(net, [div(height, 4), div(width, 4)])))
    net = relu(AE.bnc3_f1(AE.conv3_f1(net, [div(height, 8), div(width, 8)])))
    net = relu(AE.bnc3_f2(AE.conv3_f2(net, [div(height, 8), div(width, 8)])))
    net = relu(AE.bnc3_f3(AE.conv3_f3(net, [div(height, 8), div(width, 8)])))
    net = relu(AE.bnc3_f4(AE.conv3_f4(net, [div(height, 8), div(width, 8)])))
    # Decoder: each deconvolution doubles the resolution back toward the input size.
    net = relu(AE.bnd1(AE.deconv1(net, [div(height, 4), div(width, 4)])))
    net = relu(AE.bnd1_f1(AE.deconv1_f1(net, [div(height, 4), div(width, 4)])))
    net = relu(AE.bnd1_f2(AE.deconv1_f2(net, [div(height, 4), div(width, 4)])))
    net = relu(AE.bnd2(AE.deconv2(net, [div(height, 2), div(width, 2)])))
    net = relu(AE.bnd2_f1(AE.deconv2_f1(net, [div(height, 2), div(width, 2)])))
    net = relu(AE.bnd2_f2(AE.deconv2_f2(net, [div(height, 2), div(width, 2)])))
    net = relu(AE.bnd3(AE.deconv3(net, [height, width])))
    net = relu(AE.bnd3_f1(AE.deconv3_f1(net, [height, width])))
    # Final layer maps to a single channel with a sigmoid, i.e. a line-art image in [0, 1].
    net = tf.nn.sigmoid(AE.deconv3_f2(net, [height, width]))
    return net
The AutoEncoder looks like the above; I put it together more or less by trial and error. The paper's main contribution seems to be the technique called the loss map, but since I didn't know how to compute the histogram it needs in TensorFlow, that part is not implemented.
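The post doesn't show the actual loss or optimizer, but with the loss map omitted, training presumably comes down to a plain per-pixel reconstruction loss. A minimal sketch in the same TensorFlow 1.x style (the placeholders, MSE loss, and learning rate are assumptions on my part):

```python
import tensorflow as tf

height, width = 256, 256
images = tf.placeholder(tf.float32, [None, height, width, 3])   # colored input images
targets = tf.placeholder(tf.float32, [None, height, width, 1])  # ground-truth line art

line = autoencoder(images, height, width)          # network defined above
loss = tf.reduce_mean(tf.square(line - targets))   # plain per-pixel MSE instead of the paper's loss map
train_op = tf.train.AdamOptimizer(1e-4).minimize(loss)
```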
The network was trained for roughly 250,000 iterations with the following parameters.
I gave up on huge image sizes because 2 GiB of memory simply wasn't enough in the first place. It seems better to shrink the image once, extract the line art, and then put it through another encoder to bring the resolution back up; a sketch of that idea follows.
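The "shrink, extract, then upscale" idea could look roughly like the following hypothetical helper (not code from the post; a super-resolution encoder would replace the final plain resize):

```python
import cv2

def extract_small_then_upscale(img, run_network, max_side=512):
    """Shrink a large image before extraction, then scale the result back up."""
    h, w = img.shape[:2]
    scale = min(1.0, max_side / float(max(h, w)))
    small = cv2.resize(img, (int(w * scale), int(h * scale)))
    line = run_network(small)  # line-art extraction at the reduced size
    # An upscaling encoder would go here; a plain resize is only a stand-in.
    return cv2.resize(line, (w, h), interpolation=cv2.INTER_CUBIC)
```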
"Mikon! I tried coloring it" by Amezuku @ looking for a job
The original images are all borrowed from colored works on Pixiv. This image has no original line art, so there is nothing to compare against in the first place, and that's that. Even at this size and on a GPU it took about 10 seconds, so I honestly don't want to think about running it on a CPU.
I think it turned out fairly solid. The images in the dataset average around 1000 × 1000, which is quite large, so large images can be handled as well. Unfortunately, though, there is some jaggedness peculiar to the lower part of the image... it appears sometimes and not others, so I can't really say what causes it.
The size is 256 × 256. For reference, I'll also include the version extracted with OpenCV. Please ignore the fact that 256 × 256 is basically a thumbnail.
The original picture. At first glance it looks like clean lines should come out, but...
The OpenCV version. All of the fine color tones come through, so I can't shake the feeling that it's more a grayscale image than line art.
This network's version. Parts of it are questionable (the hand, in particular, is hopeless), but it properly ignores the influence of shading, and the hair is expressed with simple lines. I'm tooting my own horn here, but doesn't it feel pretty good? Very fine details such as the frills do get crushed, but that seems like something that can be improved little by little with more training.
The dataset used for training was basically collected from the colored-line-art works on Pixiv. The most important point in building the dataset was that **the line art and the colored image have the same aspect ratio**. If this is off, training essentially cannot proceed, so I collected the pairs while checking each one.
I also sometimes found that **even with the same aspect ratio, the line art and the colored image were slightly offset from each other**. These also kept training from progressing, so they had to be weeded out as well.
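The checking was done by hand while collecting the pairs, but the aspect-ratio condition at least is easy to automate. A small illustrative check (the file paths and tolerance here are hypothetical, not from the post):

```python
import cv2

def same_aspect(line_path, color_path, tol=0.01):
    """Return True if a line-art / colored pair has (nearly) the same aspect ratio."""
    line = cv2.imread(line_path)
    color = cv2.imread(color_path)
    ratio_line = line.shape[1] / float(line.shape[0])
    ratio_color = color.shape[1] / float(color.shape[0])
    return abs(ratio_line - ratio_color) < tol
```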
It depends on how the original is drawn, but the result is simpler lines than the OpenCV version, and I don't feel that much detail is lost. That said, whether because of the network structure or the nature of what it's doing, there are some drawbacks that can't really be helped.
Basically, rather than using the output as-is, I think it's better suited as a starting point for further processing.
Generative networks like this AutoEncoder are interesting. I'd also like to try feeding in parameters to change the style of the extracted line art.
Researching and implementing this is hard work, but even an amateur can do Deep Learning these days, so I recommend investing a little (in your own hardware or in the cloud) and giving it a try.