What is head direction estimation?

Head direction estimation is called Head Pose Estimation in English. It is an algorithm that estimates which way a face is pointing and how the head is tilted, from an input image and facial landmark data. Recently it has been widely used in VTuber development.

Head orientation estimation method

Several methods for estimating head direction have already been introduced on Qiita. The following Qiita article summarizes them very well: Investigating face orientation estimation

For head pose estimation with Python and OpenCV + dlib, this is probably the article most people refer to: Head Pose Estimation using OpenCV and Dlib

The face orientation algorithm is described in great detail in the "How do pose estimation algorithms work?" section of that page.

An example of a program

For now, I will write out the program introduced in that article. You can download the dat file for facial landmark detection from here: [dlib.net] trained 68-point facial landmark data [DL]

Module loading

`HeadPoseEstimation.py`


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np

This imports OpenCV for image processing, dlib for face detection and landmark prediction, and imutils as a helper for resizing frames and converting landmarks to NumPy arrays.

Camera and face detector settings

`HeadPoseEstimation.py`


DEVICE_ID = 0 #Camera ID; 0 is the standard webcam
capture = cv2.VideoCapture(DEVICE_ID)
#Path to the dlib trained data (copy and paste the path of your dat file)
predictor_path = ",,,/shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

See the dlib documentation for details on the dlib functions.

Contents of head direction estimation

The program grabs one frame at a time from the camera and processes it.

`HeadPoseEstimation.py`


while True: #Get images continuously from the camera
    ret, frame = capture.read() #Read one frame from the camera into frame
    if not ret: #Stop when no frame could be read (e.g. camera unavailable)
        break

    frame = imutils.resize(frame, width=1000) #Resize the frame to a width of 1000 px
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect face from gray
    image_points = None
     
    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        
        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)

        image_points = np.array([
                tuple(shape[30]),#Nose tip
                tuple(shape[21]),
                tuple(shape[22]),
                tuple(shape[39]),
                tuple(shape[42]),
                tuple(shape[31]),
                tuple(shape[35]),
                tuple(shape[48]),
                tuple(shape[54]),
                tuple(shape[57]),
                tuple(shape[8]),
                ],dtype='double')
    
    if len(rects) > 0:
        model_points = np.array([
                (0.0,0.0,0.0), # 30
                (-30.0,-125.0,-30.0), # 21
                (30.0,-125.0,-30.0), # 22
                (-60.0,-70.0,-60.0), # 39
                (60.0,-70.0,-60.0), # 42
                (-40.0,40.0,-50.0), # 31
                (40.0,40.0,-50.0), # 35
                (-70.0,130.0,-100.0), # 48
                (70.0,130.0,-100.0), # 54
                (0.0,158.0,-10.0), # 57
                (0.0,250.0,-50.0) # 8
                ])

        size = frame.shape

        focal_length = size[1]
        center = (size[1] // 2, size[0] // 2) #Image center, used as the principal point

        camera_matrix = np.array([
            [focal_length, 0, center[0]],
            [0, focal_length, center[1]],
            [0, 0, 1]
        ], dtype='double')

        dist_coeffs = np.zeros((4, 1))

        (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                                      dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        #Convert the rotation vector to a 3x3 rotation matrix (the Jacobian is not used)
        (rotation_matrix, jacobian) = cv2.Rodrigues(rotation_vector)
        mat = np.hstack((rotation_matrix, translation_vector))

        #Extract pitch, yaw, and roll as Euler angles in degrees
        (_, _, _, _, _, _, eulerAngles) = cv2.decomposeProjectionMatrix(mat)
        pitch, yaw, roll = eulerAngles.flatten() #Plain scalars, in the order x, y, z

        print("yaw",int(yaw),"pitch",int(pitch),"roll",int(roll)) #Print the head pose angles

        cv2.putText(frame, 'yaw : ' + str(int(yaw)), (20, 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'pitch : ' + str(int(pitch)), (20, 25), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'roll : ' + str(int(roll)), (20, 40), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)

        (nose_end_point2D, _) = cv2.projectPoints(np.array([(0.0, 0.0, 500.0)]), rotation_vector,
                                                         translation_vector, camera_matrix, dist_coeffs)
        #Plot of points used in the calculation/Display of face direction vector
        for p in image_points:
            cv2.drawMarker(frame, (int(p[0]), int(p[1])),  (0.0, 1.409845, 255),markerType=cv2.MARKER_CROSS, thickness=1)

        p1 = (int(image_points[0][0]), int(image_points[0][1]))
        p2 = (int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]))

        cv2.arrowedLine(frame, p1, p2, (255, 0, 0), 2)
    
    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break


capture.release() #Exit video capture
cv2.destroyAllWindows() #close window

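If it works properly, the result looks like this: the landmarks, the yaw, pitch, and roll values, and an arrow showing the face direction are drawn on the camera image. [Screenshot 2019-11-24 23.32.58.png]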
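The listing does not calibrate the camera. It approximates the camera matrix by using the frame width as the focal length and the image center as the principal point, with no lens distortion. As a rough illustration of what that matrix means, here is a minimal pure-Python sketch of the pinhole projection that the pose estimation builds on. The function names and the 1000 x 750 frame size are made up for this example and do not appear in the program above.

```python
def make_camera_matrix(width, height):
    """Approximate intrinsics the same way the listing does:
    focal length = frame width (pixels), principal point = image center."""
    focal_length = float(width)
    cx, cy = width / 2.0, height / 2.0
    return [
        [focal_length, 0.0, cx],
        [0.0, focal_length, cy],
        [0.0, 0.0, 1.0],
    ]


def project_point(camera_matrix, point_cam):
    """Project a 3D point given in camera coordinates (z > 0 means in
    front of the camera) to pixel coordinates, ignoring lens distortion."""
    x, y, z = point_cam
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    fx, cx = camera_matrix[0][0], camera_matrix[0][2]
    fy, cy = camera_matrix[1][1], camera_matrix[1][2]
    return (fx * x / z + cx, fy * y / z + cy)


if __name__ == "__main__":
    K = make_camera_matrix(1000, 750)  # hypothetical frame size
    print(project_point(K, (0.0, 0.0, 1000.0)))    # a point on the optical axis
    print(project_point(K, (100.0, 0.0, 1000.0)))  # 100 units to the right of it
```

A point on the optical axis lands on the image center, and moving it sideways shifts the pixel position in proportion to focal length divided by depth. Pose estimation runs this relation in reverse: it looks for the rotation and translation that make the projected model points match the detected landmarks.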

Parameter explanations and notes

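yaw, roll, pitch

The head pose parameters yaw, roll, and pitch are defined the same way as for an airplane: yaw is turning the head left and right, pitch is nodding up and down, and roll is tilting the head sideways.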
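To make the relation between a rotation matrix and these three angles concrete, here is a small pure-Python sketch that builds a rotation matrix from pitch, yaw, and roll and then recovers the angles from it. It assumes the convention R = Rz(roll) · Ry(yaw) · Rx(pitch), with pitch about the x axis, yaw about the y axis, and roll about the z axis. That convention and the function names are my own choice for illustration; the signs and axis order that OpenCV returns may differ.

```python
import math


def rotation_from_angles(pitch_deg, yaw_deg, roll_deg):
    """Build R = Rz(roll) @ Ry(yaw) @ Rx(pitch) from angles in degrees."""
    p, y, r = (math.radians(a) for a in (pitch_deg, yaw_deg, roll_deg))
    cp, sp = math.cos(p), math.sin(p)
    cy, sy = math.cos(y), math.sin(y)
    cr, sr = math.cos(r), math.sin(r)
    return [
        [cr * cy, cr * sy * sp - sr * cp, cr * sy * cp + sr * sp],
        [sr * cy, sr * sy * sp + cr * cp, sr * sy * cp - cr * sp],
        [-sy, cy * sp, cy * cp],
    ]


def angles_from_rotation(R):
    """Recover (pitch, yaw, roll) in degrees under the same convention.
    Only valid while |yaw| < 90 degrees (no gimbal lock)."""
    yaw = math.atan2(-R[2][0], math.hypot(R[0][0], R[1][0]))
    pitch = math.atan2(R[2][1], R[2][2])
    roll = math.atan2(R[1][0], R[0][0])
    return tuple(math.degrees(a) for a in (pitch, yaw, roll))


if __name__ == "__main__":
    R = rotation_from_angles(10.0, -25.0, 5.0)
    print([round(a, 6) for a in angles_from_rotation(R)])
```

The round trip only works while the yaw stays below 90 degrees, which is one reason angle readouts can jump when the face turns far to the side.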

Facial features to use

Please refer to the following article for the positions of the image_points defined here: Facial landmarks with dlib, OpenCV, and Python. The numbers below are counted from 1, as in the landmark figure, so each one is 1 larger than the index used in the code. The points used this time are:

・ Inner ends of the eyebrows (22, 23)
・ Inner corners of the eyes (40, 43)
・ Nose tip (31)
・ Both sides of the nose (32, 36)
・ Both outer corners of the mouth (49, 55)
・ Below the lips (58)
・ Chin (9)

That is 11 points in total. The algorithm can estimate the head direction with 5 points, but when I tried it with only a few points, the direction vector at the nose tip flipped around, so I increased the number of points. (Is it because the trained data is based on Westerners ...?)

The more points on the outer part of the face you use, the better the accuracy will be, but if points such as the eyebrows go out of view when you turn sideways, the landmarks are misdetected, so try to use points near the center of the face as much as possible.

[Image: IMG_18D234CF6CC9-1.jpeg]
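Because the landmark figure counts from 1 while the `shape` array in the program counts from 0, it is easy to be off by one. The following small sketch converts the numbers listed above into array indices; the dictionary and function names are hypothetical helpers, not part of the original program.

```python
# Landmark numbers as written on the 68-point figure (counted from 1),
# in the same order as image_points / model_points in the program.
LANDMARK_NUMBERS = {
    "nose tip": [31],
    "inner ends of the eyebrows": [22, 23],
    "inner corners of the eyes": [40, 43],
    "sides of the nose": [32, 36],
    "outer corners of the mouth": [49, 55],
    "below the lips": [58],
    "chin": [9],
}


def to_array_indices(numbers):
    """Convert 1-based landmark numbers to 0-based indices into `shape`."""
    return [n - 1 for n in numbers]


# Flatten to a single list of indices, keeping the order above.
POSE_INDICES = [
    index
    for numbers in LANDMARK_NUMBERS.values()
    for index in to_array_indices(numbers)
]

if __name__ == "__main__":
    print(POSE_INDICES)
```

The resulting list is the same sequence of indices that the program passes to `shape[...]` when it builds image_points.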

The remaining problem is model_points, that is, which 3D coordinates to use for the parts of your own face. I defined them by brute force using the following program. The (x, y) coordinates of each landmark, with the nose tip as the origin, are shown on the image, so face the camera as straight as possible, sit up straight, and read the values off with determination. The z coordinates are my guesses: work out the distance from the tip of your nose to the point between your eyes and use it to estimate the height of your nose. Hang in there!

`HPEcal.py`


#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np


#Get a VideoCapture object
DEVICE_ID = 0 #ID 0 is the standard web cam
capture = cv2.VideoCapture(DEVICE_ID)
#Path to the dlib trained data
predictor_path = "/shape_predictor_68_face_landmarks.dat"

print("[INFO] loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

while True: #Get images continuously from the camera
    ret, frame = capture.read() #Read one frame from the camera into frame
    if not ret: #Stop when no frame could be read (e.g. camera unavailable)
        break

    frame = imutils.resize(frame, width=2000) #Resize the frame to a width of 2000 px
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect face from gray
    image_points = None
     
    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        #print(shape[30])#Nose coordinates
        cal = shape-shape[30]
        print("######[X,Y]#######",
              "\n point18=",cal[17],
              "\n point22=",cal[21],
              "\n point37=",cal[36],
              "\n point40=",cal[39],
              "\n point28=",cal[27],
              "\n point31=",cal[30],
              "\n point32=",cal[31],
              "\n point49=",cal[48],
              "\n point58=",cal[57],
              "\n point9=",cal[8])
        
        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)
            cv2.putText(frame,str((x, y)-shape[30]),(x,y), cv2.FONT_HERSHEY_PLAIN, 1.0, (0, 0, 255), 2)

    
    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break

capture.release() #Exit video capture
cv2.destroyAllWindows() #close window
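Once you have read the (x, y) offsets off the screen, they still have to be turned into 3D model points. The sketch below shows one possible way to do that: average each left/right pair so that the model is mirror-symmetric, put midline points exactly on x = 0, and attach a guessed z value. Symmetrizing is my own simplification, the helper names are hypothetical, and the numbers are made up for illustration rather than taken from a real measurement.

```python
def symmetric_pair(left_xy, right_xy, z):
    """Average a left/right landmark pair (as seen in the image) into a
    mirror-symmetric pair of 3D model points. Offsets are (x, y) in
    pixels relative to the nose tip; z is a guessed depth."""
    half_width = (abs(left_xy[0]) + abs(right_xy[0])) / 2.0
    y = (left_xy[1] + right_xy[1]) / 2.0
    return (-half_width, y, z), (half_width, y, z)


def centered_point(xy, z):
    """Put a midline landmark (nose tip, chin, ...) exactly on x = 0."""
    return (0.0, float(xy[1]), z)


if __name__ == "__main__":
    # Made-up offsets of the kind HPEcal.py prints, not real measurements.
    left_eye, right_eye = (-58, -72), (62, -68)
    z_guess = -60.0  # guessed depth behind the nose tip
    print(symmetric_pair(left_eye, right_eye, z_guess))
    print(centered_point((3, 250), -50.0))
```

Averaging the two sides reduces the effect of a slightly turned head while measuring, but it also hides any real asymmetry of the face, so treat it as a starting point to tune by hand.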

Finally

Finally, here is the whole program in one piece. Thank you for reading to the end.

program

`HeadPoseEstimation.py`


import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np


#Get a VideoCapture object
DEVICE_ID = 0 #ID 0 is the standard web cam
capture = cv2.VideoCapture(DEVICE_ID)
#Path to the dlib trained data
predictor_path = ".../shape_predictor_68_face_landmarks.dat"

detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

while True: #Get images continuously from the camera
    ret, frame = capture.read() #Read one frame from the camera into frame
    if not ret: #Stop when no frame could be read (e.g. camera unavailable)
        break

    frame = imutils.resize(frame, width=1000) #Resize the frame to a width of 1000 px
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect face from gray
    image_points = None
     
    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)

        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)

        image_points = np.array([
                tuple(shape[30]),#Nose tip
                tuple(shape[21]),
                tuple(shape[22]),
                tuple(shape[39]),
                tuple(shape[42]),
                tuple(shape[31]),
                tuple(shape[35]),
                tuple(shape[48]),
                tuple(shape[54]),
                tuple(shape[57]),
                tuple(shape[8]),
                ],dtype='double')
    
    if len(rects) > 0:
        model_points = np.array([
                (0.0,0.0,0.0), # 30
                (-30.0,-125.0,-30.0), # 21
                (30.0,-125.0,-30.0), # 22
                (-60.0,-70.0,-60.0), # 39
                (60.0,-70.0,-60.0), # 42
                (-40.0,40.0,-50.0), # 31
                (40.0,40.0,-50.0), # 35
                (-70.0,130.0,-100.0), # 48
                (70.0,130.0,-100.0), # 54
                (0.0,158.0,-10.0), # 57
                (0.0,250.0,-50.0) # 8
                ])

        size = frame.shape

        focal_length = size[1]
        center = (size[1] // 2, size[0] // 2) #Image center, used as the principal point

        camera_matrix = np.array([
            [focal_length, 0, center[0]],
            [0, focal_length, center[1]],
            [0, 0, 1]
        ], dtype='double')

        dist_coeffs = np.zeros((4, 1))

        (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                                      dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        #Convert the rotation vector to a 3x3 rotation matrix (the Jacobian is not used)
        (rotation_matrix, jacobian) = cv2.Rodrigues(rotation_vector)
        mat = np.hstack((rotation_matrix, translation_vector))

        #Extract pitch, yaw, and roll as Euler angles in degrees
        (_, _, _, _, _, _, eulerAngles) = cv2.decomposeProjectionMatrix(mat)
        pitch, yaw, roll = eulerAngles.flatten() #Plain scalars, in the order x, y, z

        print("yaw",int(yaw),"pitch",int(pitch),"roll",int(roll)) #Print the head pose angles

        cv2.putText(frame, 'yaw : ' + str(int(yaw)), (20, 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'pitch : ' + str(int(pitch)), (20, 25), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'roll : ' + str(int(roll)), (20, 40), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)

        (nose_end_point2D, _) = cv2.projectPoints(np.array([(0.0, 0.0, 500.0)]), rotation_vector,
                                                         translation_vector, camera_matrix, dist_coeffs)
        #Plot of points used in the calculation/Display of face direction vector
        for p in image_points:
            cv2.drawMarker(frame, (int(p[0]), int(p[1])),  (0.0, 1.409845, 255),markerType=cv2.MARKER_CROSS, thickness=1)

        p1 = (int(image_points[0][0]), int(image_points[0][1]))
        p2 = (int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]))

        cv2.arrowedLine(frame, p1, p2, (255, 0, 0), 2)
    
    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break


capture.release() #Exit video capture
cv2.destroyAllWindows() #close window

2020/4/2 postscript

OpenCV doesn't work because of Qt-related errors!

I recently got this error:

qt.qpa.plugin: Could not find the Qt platform plugin "cocoa" in ""
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.

This error seems to appear when a recent version of OpenCV is installed with pip. Downgrading the version makes it work.

pip3 install opencv-python==4.1.2.30

Reference summary

Qiita Investigating face orientation estimation

External

Head Pose Estimation using OpenCV and Dlib

dlib documentation

Facial landmarks with dlib, OpenCV, and Python

Head orientation estimation using Python and OpenCV + dlib
