When you want to detect a character's face in an anime image with Python, the tool I most often see used is lbpcascade_animeface, implemented by nagadomi. lbpcascade_animeface is a module that estimates the bounding box of a face.
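For context, here is a minimal sketch of the usual OpenCV-based usage of such a cascade file (assuming `lbpcascade_animeface.xml` has already been downloaded; the file names are placeholders):

```python
# Minimal sketch: detect anime faces with an LBP cascade via OpenCV.
# Assumes lbpcascade_animeface.xml and sample.png exist in the working directory.
import cv2

cascade = cv2.CascadeClassifier('lbpcascade_animeface.xml')
image = cv2.imread('sample.png')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
gray = cv2.equalizeHist(gray)

# Each detection is an (x, y, w, h) face bounding box
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(24, 24))
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 0, 255), 2)
cv2.imwrite('out.png', image)
```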
In addition, animeface-2009, also implemented by nagadomi, can detect not only the face but also facial parts: the bounding boxes of the eyes, nose, mouth, and chin [^1]. However, it only provides a Ruby-level API.
According to the README.md of lbpcascade_animeface, "the detection accuracy is higher in `animeface-2009`". So I definitely wanted to use it from Python as well.
[^1]: However, the bounding boxes for the nose and chin are 1x1 pixels, so they might be better described as landmarks.
The Git repository for this implementation is here.
Rewriting C code that was implemented for Ruby so that it works with Python is no easy task for a novice engineer like me. So, although it is an inelegant approach, I implemented a [Python function](https://github.com/meow-noisy/animeface_result2xml/blob/master/animeface_poor_caller.py) that launches Ruby through the shell with Python's subprocess module and runs the Ruby script of `animeface-2009`. Of course, **this code alone cannot detect faces**; advance preparation is required. You need to lay out the directory structure like the Git repository mentioned above and also build animeface-2009 [^2].
[^2]: The original `animeface-2009` was difficult to build in several environments, so I use a [fork repository](https://github.com/meow-noisy/animeface-2009/tree/a39361157ba8bf16ccd838942fa2f333658e9627) with changes applied. The steps required at build time have been added to the README.md of that repository, so please check it as well.
```python
# a poor python caller of animeface-2009
# call rb module via shell by using subprocess...
import subprocess
import sys
import json
from pathlib import Path

this_file_dir = (Path(__file__).resolve()).parent


def detect_animeface(im_path):
    im_path = Path(im_path).resolve()
    assert im_path.exists()

    # Run the Ruby sample script and capture its standard output
    ruby_script_path = this_file_dir / 'animeface-2009/animeface-ruby/sample.rb'
    ret = subprocess.check_output(
        ["ruby", str(ruby_script_path), str(im_path)]).decode('utf-8')

    # Massage the printed Ruby hash into JSON: "=>" becomes ":", and the
    # "#<Magick::Pixel:...>" object notation becomes a quoted string
    ret = ret.replace("=>", ":")
    ret = ret.replace(">", "\"")
    ret = ret.replace("#", "\"")

    list_ = json.loads(ret)
    return list_
```
Data is not passed directly from Ruby to Python. Ruby prints the detected bounding box information to standard output, and on the Python side the string is massaged into JSON format and parsed into dictionaries [^4]. As for the function interface, the input is an image path and the return value is the detection result: a list of dictionaries like the one below, with one dictionary per detected face.
[^4]: Some unnecessary elements (strings representing Ruby objects) are included, but post-processing them away was a hassle, so they are output as-is.
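For reference, a call looks like the following (the image path here is hypothetical; the image is assumed to contain two characters):

```python
from pprint import pprint

detections = detect_animeface('two_characters.png')  # hypothetical image path
pprint(detections)  # one dictionary per detected face
```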
```python
[{'chin': {'x': 228, 'y': 266},
  'eyes': {'left': {'colors': ['<Magick::Pixel:0x00007ff93e969148',
                               '<Magick::Pixel:0x00007ff93e968e28',
                               '<Magick::Pixel:0x00007ff93e968d88',
                               '<Magick::Pixel:0x00007ff93e968bf8'],
                    'height': 31,
                    'width': 39,
                    'x': 222,
                    'y': 181},
           'right': {'colors': ['<Magick::Pixel:0x00007ff93e968040',
                                '<Magick::Pixel:0x00007ff93e968018',
                                '<Magick::Pixel:0x00007ff93e968180',
                                '<Magick::Pixel:0x00007ff93e9681a8'],
                     'height': 28,
                     'width': 31,
                     'x': 165,
                     'y': 202}},
  'face': {'height': 127, 'width': 127, 'x': 158, 'y': 158},
  'hair_color': '<Magick::Pixel:0x00007ff93e969cb0',
  'likelihood': 1.0,
  'mouth': {'height': 12, 'width': 25, 'x': 210, 'y': 243},
  'nose': {'x': 207, 'y': 233},
  'skin_color': '<Magick::Pixel:0x00007ff93e96a020'},
 {'chin': {'x': 379, 'y': 243},
  'eyes': {'left': {'colors': ['<Magick::Pixel:0x00007ff93e96b6a0',
                               '<Magick::Pixel:0x00007ff93e96b8d0',
                               '<Magick::Pixel:0x00007ff93e96b9e8',
                               '<Magick::Pixel:0x00007ff93e96bab0'],
                    'height': 29,
                    'width': 32,
                    'x': 418,
                    'y': 177},
           'right': {'colors': ['<Magick::Pixel:0x00007ff93e963568',
                                '<Magick::Pixel:0x00007ff93e963478',
                                '<Magick::Pixel:0x00007ff93e963298',
                                '<Magick::Pixel:0x00007ff93e9631a8'],
                     'height': 31,
                     'width': 39,
                     'x': 354,
                     'y': 157}},
  'face': {'height': 139, 'width': 139, 'x': 329, 'y': 121},
  'hair_color': '<Magick::Pixel:0x00007ff93e96ab38',
  'likelihood': 1.0,
  'mouth': {'height': 12, 'width': 20, 'x': 383, 'y': 218},
  'nose': {'x': 401, 'y': 205},
  'skin_color': '<Magick::Pixel:0x00007ff93e96a7c8'}]
```
Incidentally, when the Python file is executed as a script, it also outputs an image that depicts the detection results.
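That script section is not shown here, but a minimal sketch of such a visualization (assuming OpenCV; this is illustrative, not the actual script) might look like:

```python
import cv2

def draw_detections(im_path, out_path):
    # Draw the face bounding box of each detection onto the image (illustrative)
    image = cv2.imread(str(im_path))
    for det in detect_animeface(im_path):
        f = det['face']
        cv2.rectangle(image,
                      (f['x'], f['y']),
                      (f['x'] + f['width'], f['y'] + f['height']),
                      (0, 0, 255), 2)
    cv2.imwrite(str(out_path), image)
```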
The above function lets you detect anime faces (and their parts), but not every face is detected. To improve detection performance going forward, I am also considering migrating to a DNN-based detector. To that end, I implemented a function that uses the detection results of animeface-2009 to build an object detection dataset for training a DNN. The dataset format is the standard PASCAL VOC.
```
$ python animeface_result2xml.py [directory of images to detect] [directory to output the XML files] [path of a text file listing the created XML files]
```
Collect the images you want to run detection on in a directory beforehand (subdirectories are allowed), then specify the directory to output the XML files to and the path of a text file that will list the created files.
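As a rough idea of what the conversion involves (a minimal sketch, not the actual `animeface_result2xml.py`; the helper name is made up): animeface-2009 reports boxes as x/y/width/height, while PASCAL VOC uses xmin/ymin/xmax/ymax, so xmax = x + width and ymax = y + height. This matches the sample output above, where the face at x=158 with width 127 becomes xmax=285.

```python
import xml.etree.ElementTree as ET

def box_to_voc_object(name, box):
    # Hypothetical helper: convert one animeface-2009 box (x, y, width, height)
    # into a PASCAL VOC <object> element (xmin, ymin, xmax, ymax).
    obj = ET.Element('object')
    ET.SubElement(obj, 'name').text = name
    ET.SubElement(obj, 'pose').text = 'Unspecified'
    ET.SubElement(obj, 'truncated').text = '1'
    ET.SubElement(obj, 'difficult').text = '0'
    bndbox = ET.SubElement(obj, 'bndbox')
    ET.SubElement(bndbox, 'xmin').text = str(box['x'])
    ET.SubElement(bndbox, 'ymin').text = str(box['y'])
    ET.SubElement(bndbox, 'xmax').text = str(box['x'] + box['width'])
    ET.SubElement(bndbox, 'ymax').text = str(box['y'] + box['height'])
    return obj
```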
By the way, an output XML file looks like the following.
```xml
<annotation>
  <folder>folder_name</folder>
  <filename>img_file_path</filename>
  <path>/path/to/dummy</path>
  <source>
    <database>Unknown</database>
  </source>
  <size>
    <width>600</width>
    <height>600</height>
    <depth>3</depth>
  </size>
  <segmented>0</segmented>
  <object>
    <name>face</name>
    <pose>Unspecified</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>158</xmin>
      <ymin>158</ymin>
      <xmax>285</xmax>
      <ymax>285</ymax>
    </bndbox>
  </object>
  <object>
    <name>right_eye</name>
    <pose>Unspecified</pose>
    <truncated>1</truncated>
    <difficult>0</difficult>
    <bndbox>
      <xmin>165</xmin>
      <ymin>202</ymin>
      <xmax>196</xmax>
      <ymax>230</ymax>
    </bndbox>
  ...
```
For editing the XML files, the annotation tool LabelImg is recommended. You can check the estimated bounding boxes graphically and correct mistakes on the spot.
After that, the created XML files are parsed by the deep learning framework's data loader to train an object detection DNN model.
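As a minimal sketch of that parsing step (assuming the standard VOC layout shown above; the function name is made up):

```python
import xml.etree.ElementTree as ET

def load_voc_annotation(xml_path):
    # Parse one PASCAL VOC XML file into (label, (xmin, ymin, xmax, ymax)) pairs,
    # which a data loader can then turn into training targets
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.findall('object'):
        name = obj.find('name').text
        b = obj.find('bndbox')
        box = tuple(int(b.find(k).text) for k in ('xmin', 'ymin', 'xmax', 'ymax'))
        boxes.append((name, box))
    return boxes
```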
I implemented a dirty function to drive `animeface-2009`, a Ruby module that detects anime face parts, from Python. I also implemented a function that converts the detection results into XML files in PASCAL VOC format. I would like to use these to expand the range of machine learning applications for anime.
This article took the position that annotation labels are required to detect anime objects with a DNN, but some people may find that a bit troublesome. Recently, kanosawa published an article that applies Domain Adaptation to training an object detection model. It allows you to train an object detection model (Faster R-CNN) without annotating anime face bounding boxes, so please check it out if you are interested: "I tried to realize high-precision anime face detection without annotation (Domain Adaptation)".