I work in customer support at an IT company, where I automate tasks with things like Selenium macros and Python. I happened to come across the AI development contest "Neural Network Console Challenge" and took part, so here is my write-up.
https://nnc-challenge.com/
"Neural Network Console (NNC)" is a GUI tool from Sony Network Communications, Inc. that lets you develop AI without programming. Combined with 10,000 person-image data items provided by PIXTA Inc., which you normally could not get your hands on, the contest was a beginner-friendly AI development challenge.
In this challenge, each participant decides on their own image-classification theme, trains a model on NNC, and submits the training results along with the process.
Looking through the images, most of them seemed to contain people, so I started by extracting faces. I cropped a large number of faces from the PIXTA images using an OpenCV cascade classifier.
For the image preprocessing, I referred to the following articles. Thanks to the authors, whose write-ups are easy to follow even for beginners.
- Create AI to identify Zuckerberg's face by deep learning
- Momoclo member face recognition by TensorFlow
# -*- coding:utf-8 -*-
import glob
import os

import cv2

# Folder containing the source images
input_data_path = './pixta_images/'
# Directory to save the cropped faces
save_path = './cutted_face_images/'
# Path to OpenCV's default face classifier
cascade_path = './opencv/data/haarcascades/haarcascade_frontalface_default.xml'
faceCascade = cv2.CascadeClassifier(cascade_path)

# Number of successful face detections so far
face_detect_count = 0

# Detect faces in the collected images, then crop and save each one
types = ['*.jpg']
paths = []
for t in types:
    paths.extend(glob.glob(os.path.join(input_data_path, t)))

for p in paths:
    img = cv2.imread(p, cv2.IMREAD_COLOR)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = faceCascade.detectMultiScale(gray, 1.1, 3)
    for (x, y, w, h) in faces:
        cv2.imwrite(save_path + 'cutted_face' + str(face_detect_count) + '.jpg',
                    img[y:y + h, x:x + w])
        face_detect_count += 1
About 2,500 faces were detected from 1,500 images. The detections included shadows and polka-dot patterns that merely looked like faces, so I deleted those by hand. After removing roughly 1,000 crops, about 1,500 faces remained.
Data for learning: PIXTA
The theme examples from the organizers included classifying by emotion, such as happy / sad / embarrassed, but looking at the face photos, almost everyone was simply smiling. So I decided to classify the smiles into several types instead.
To make the data uploadable to NNC, I sorted the 1,500 faces by degree of smile. First I classified them into two categories, smiling and not smiling, and put them into separate folders. I then prepared a CSV that maps each file name to its label and uploaded it to NNC.
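Generating that label CSV can be scripted. The sketch below is my own illustration, not the exact script used here: the folder names and label numbers are assumptions, and the header row follows NNC's dataset CSV convention (an `x:`-prefixed input column and a `y:`-prefixed label column) as I understand it.

```python
import csv
import glob
import os

# Hypothetical layout: one folder per category (assumed names)
label_dirs = {'not_smiling': 0, 'smiling': 1}

rows = []
for dirname, label in label_dirs.items():
    # Collect every cropped face in this category's folder
    for path in sorted(glob.glob(os.path.join(dirname, '*.jpg'))):
        rows.append((path, label))

# NNC reads a CSV whose header names the image and label columns
with open('dataset.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['x:image', 'y:label'])
    writer.writerows(rows)
```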
For the model, I built the following network based on the NNC tutorial videos that Sony publishes on YouTube.
The training result is...
Well, the number of images varied quite a bit from label to label, so maybe that is to be expected. I am not sure, but two categories is not very interesting, so next I increased the number of categories.
- Ahaha (laughing out loud)
- Niconico (whole-face smile)
- Smile (smiling mouth or eyes)
- Fufufu (slight smile)
- Serious
I also extracted additional faces, prepared about 200 images per category, and uploaded them split into training data and test data.
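The training/test split can be done with a small script. This is a sketch under my own assumptions (an 80/20 split, one source folder per label, and hypothetical directory names), not the procedure actually used:

```python
import glob
import os
import random
import shutil

def split_dataset(label_dir, train_dir, test_dir, test_ratio=0.2, seed=0):
    """Copy the jpgs in label_dir into train_dir/test_dir at the given ratio."""
    paths = sorted(glob.glob(os.path.join(label_dir, '*.jpg')))
    random.Random(seed).shuffle(paths)  # reproducible shuffle
    n_test = int(len(paths) * test_ratio)
    os.makedirs(train_dir, exist_ok=True)
    os.makedirs(test_dir, exist_ok=True)
    for p in paths[:n_test]:
        shutil.copy(p, test_dir)
    for p in paths[n_test:]:
        shutil.copy(p, train_dir)
```

Running this once per label folder keeps the class balance roughly equal between the two sets.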
The result was spectacularly bad. If you count predictions that are only one category off, it roughly fits. The root cause may be that I could not define the categories clearly myself while sorting. After staring at extracted smiles long enough, I could no longer tell what was what, lol.
I cut the classification down by one and rebuilt the data with four categories.
- Ahaha (laughing out loud)
- Niconico (open mouth)
- Fufufu (closed mouth)
- Serious
This improved things a lot compared with before, but accuracy still did not reach 70%.
After that, referring to the following videos, I tried Image Augmentation, Cutout, Dropout, and so on to prevent overfitting, but accuracy did not improve easily.
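Of those, Cutout simply masks out a random square patch of the input image during training. NNC provides this as a layer, but to illustrate the idea, here is a minimal NumPy sketch of my own (the function name and patch size are assumptions):

```python
import numpy as np

def cutout(image, size=16, rng=None):
    """Zero out one random square patch (Cutout-style augmentation sketch)."""
    rng = rng or np.random.default_rng()
    h, w = image.shape[:2]
    out = image.copy()
    # Pick the patch centre anywhere in the image, then clip to the borders
    cy = int(rng.integers(0, h))
    cx = int(rng.integers(0, w))
    y0, y1 = max(0, cy - size // 2), min(h, cy + size // 2)
    x0, x1 = max(0, cx - size // 2), min(w, cx + size // 2)
    out[y0:y1, x0:x1] = 0
    return out
```

Because the network keeps seeing faces with a patch missing, it cannot rely on any single region (e.g. just the mouth) and is pushed toward more robust features.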
With the deadline approaching, time ran out with the following results, after also reducing the hidden layers.
Data for learning: PIXTA
I was not fully satisfied with the final result of about 75%, but if I find the time, I would like to try again with images I prepare myself. I enjoyed working on this with no prior knowledge!