Head direction estimation is called Head Pose Estimation in English. It is an algorithm that estimates which way the face is pointing and how the head is tilted, using an input image and facial landmark data. It has recently become widely used in VTuber development.
Several methods for estimating the head direction have already been introduced on Qiita. This Qiita article summarizes them very well: Investigating face orientation estimation
For head pose estimation with Python and OpenCV + dlib, this is probably the article everyone refers to: Head Pose Estimation using OpenCV and Dlib
The face orientation algorithm is described in great detail in the "How do pose estimation algorithms work?" section of that page.
For now, I will write out the program introduced in that article. You can download the dat file of the trained landmark model from here: [dlib.net] 68 points learned data for face recognition [DL]
HeadPoseEstimation.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np
This imports OpenCV for image processing, dlib for face detection and landmark prediction, and imutils as a helper for resizing frames and converting the landmarks to NumPy arrays.
HeadPoseEstimation.py
DEVICE_ID = 0 #Camera ID to use; 0 is the standard webcam
capture = cv2.VideoCapture(DEVICE_ID)
#Read dlib trained data
predictor_path = ",,,/shape_predictor_68_face_landmarks.dat"
#Copy and paste the path of the learned dat file
detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face
See the dlib documentation for details of the dlib functions.
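Before constructing the predictor, it can help to confirm that the dat file really is at the path you pasted, since a wrong path is an easy mistake to make here. The following is only a small sketch under that assumption; `check_predictor_file` is a hypothetical helper name and is not part of dlib or of the original program.

```python
from pathlib import Path


def check_predictor_file(path_str):
    """Return the path as a Path if the file exists, else raise FileNotFoundError.

    Hypothetical helper for illustration; it only checks the file system
    and does not validate the contents of the model file.
    """
    path = Path(path_str).expanduser()
    if not path.is_file():
        raise FileNotFoundError(
            f"Landmark model not found: {path}. "
            "Download shape_predictor_68_face_landmarks.dat first."
        )
    return path
```

You could call it as `check_predictor_file(predictor_path)` just before `dlib.shape_predictor(predictor_path)`, so that a wrong path is reported with a message that names the missing file.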
The program then acquires one frame at a time from the camera and processes it.
HeadPoseEstimation.py
while True: #Get images continuously from the camera
    ret, frame = capture.read() #Capture one frame of image data from the camera into frame
    if not ret: #Stop when no frame can be read from the camera
        break
    frame = imutils.resize(frame, width=1000) #Adjust the display size of the frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect faces in gray
    image_points = None

    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)

        image_points = np.array([
            tuple(shape[30]),#Nose tip
            tuple(shape[21]),
            tuple(shape[22]),
            tuple(shape[39]),
            tuple(shape[42]),
            tuple(shape[31]),
            tuple(shape[35]),
            tuple(shape[48]),
            tuple(shape[54]),
            tuple(shape[57]),
            tuple(shape[8]),
        ],dtype='double')

    if len(rects) > 0:
        model_points = np.array([
            (0.0,0.0,0.0), # 30
            (-30.0,-125.0,-30.0), # 21
            (30.0,-125.0,-30.0), # 22
            (-60.0,-70.0,-60.0), # 39
            (60.0,-70.0,-60.0), # 42
            (-40.0,40.0,-50.0), # 31
            (40.0,40.0,-50.0), # 35
            (-70.0,130.0,-100.0), # 48
            (70.0,130.0,-100.0), # 54
            (0.0,158.0,-10.0), # 57
            (0.0,250.0,-50.0) # 8
        ])

        size = frame.shape
        focal_length = size[1] #Approximate the focal length by the image width
        center = (size[1] // 2, size[0] // 2) #Image center, used as the principal point

        camera_matrix = np.array([
            [focal_length, 0, center[0]],
            [0, focal_length, center[1]],
            [0, 0, 1]
        ], dtype='double')
        dist_coeffs = np.zeros((4, 1)) #Assume no lens distortion

        (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                                      dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        #Rotation matrix and Jacobian
        (rotation_matrix, jacobian) = cv2.Rodrigues(rotation_vector)
        mat = np.hstack((rotation_matrix, translation_vector))

        #Take out yaw, pitch, roll
        (_, _, _, _, _, _, eulerAngles) = cv2.decomposeProjectionMatrix(mat)
        pitch, yaw, roll = eulerAngles.flatten() #eulerAngles is a 3x1 array in degrees

        print("yaw",int(yaw),"pitch",int(pitch),"roll",int(roll))#Extraction of head posture data
        cv2.putText(frame, 'yaw : ' + str(int(yaw)), (20, 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'pitch : ' + str(int(pitch)), (20, 25), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'roll : ' + str(int(roll)), (20, 40), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)

        (nose_end_point2D, _) = cv2.projectPoints(np.array([(0.0, 0.0, 500.0)]), rotation_vector,
                                                  translation_vector, camera_matrix, dist_coeffs)
        #Plot of points used in the calculation/Display of face direction vector
        for p in image_points:
            cv2.drawMarker(frame, (int(p[0]), int(p[1])), (0.0, 1.409845, 255),markerType=cv2.MARKER_CROSS, thickness=1)

        p1 = (int(image_points[0][0]), int(image_points[0][1]))
        p2 = (int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]))
        cv2.arrowedLine(frame, p1, p2, (255, 0, 0), 2)

    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break

capture.release() #Exit video capture
cv2.destroyAllWindows() #close window
If it works properly, it will be like this.
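The camera_matrix in the program is not a calibrated one. It assumes that the focal length equals the image width in pixels and that the optical axis passes through the image center. To show what that approximation means, here is a minimal pure-Python sketch of the pinhole projection it implies. `make_camera_matrix` and `project_point` are hypothetical names used only for illustration, and lens distortion is ignored, just as dist_coeffs is all zeros in the program.

```python
def make_camera_matrix(width, height):
    """Approximate intrinsics: focal length = image width, principal point = image center."""
    focal_length = float(width)
    cx, cy = width / 2.0, height / 2.0
    return [
        [focal_length, 0.0, cx],
        [0.0, focal_length, cy],
        [0.0, 0.0, 1.0],
    ]


def project_point(camera_matrix, point_3d):
    """Project a 3D point given in camera coordinates (z > 0) to pixel coordinates."""
    x, y, z = point_3d
    fx, cx = camera_matrix[0][0], camera_matrix[0][2]
    fy, cy = camera_matrix[1][1], camera_matrix[1][2]
    return (fx * x / z + cx, fy * y / z + cy)


K = make_camera_matrix(1000, 750)
print(project_point(K, (0.0, 0.0, 1000.0)))     # (500.0, 375.0): on the optical axis
print(project_point(K, (100.0, -50.0, 1000.0)))  # (600.0, 325.0)
```

A point on the optical axis lands exactly on the image center, which is why the program can use the frame size alone to build the matrix. If the estimated angles look biased, a properly calibrated camera matrix is the first thing to try.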
The head pose parameters yaw, roll, and pitch are defined as shown here. (They are the same as for an airplane.)
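To make the relationship between the three angles and a rotation matrix concrete, here is a small pure-Python sketch. It assumes one common convention, R = Rz(roll) · Ry(yaw) · Rx(pitch), which is my assumption for illustration. The axis order and signs returned by cv2.decomposeProjectionMatrix may differ from it, so treat this as an explanation of the idea rather than a reimplementation of OpenCV. The function names are hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def rotation_from_euler(pitch_deg, yaw_deg, roll_deg):
    """Build R = Rz(roll) * Ry(yaw) * Rx(pitch) from angles in degrees (assumed convention)."""
    p, y, r = (math.radians(v) for v in (pitch_deg, yaw_deg, roll_deg))
    rx = [[1, 0, 0], [0, math.cos(p), -math.sin(p)], [0, math.sin(p), math.cos(p)]]
    ry = [[math.cos(y), 0, math.sin(y)], [0, 1, 0], [-math.sin(y), 0, math.cos(y)]]
    rz = [[math.cos(r), -math.sin(r), 0], [math.sin(r), math.cos(r), 0], [0, 0, 1]]
    return matmul3(rz, matmul3(ry, rx))


def euler_from_rotation(R):
    """Recover (pitch, yaw, roll) in degrees; valid while |yaw| < 90 degrees."""
    yaw = math.atan2(-R[2][0], math.hypot(R[0][0], R[1][0]))
    pitch = math.atan2(R[2][1], R[2][2])
    roll = math.atan2(R[1][0], R[0][0])
    return tuple(math.degrees(v) for v in (pitch, yaw, roll))


R = rotation_from_euler(10.0, -25.0, 5.0)
print([round(v, 6) for v in euler_from_rotation(R)])
```

The round trip only works away from yaw = ±90 degrees, where pitch and roll can no longer be separated (gimbal lock). This is one reason the estimated angles can jump when the face turns far to the side.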
For the positions of the image_points defined this time, please refer to: Facial landmarks with dlib, OpenCV, and Python
The points used this time are the following 11 points. The numbers are the landmark numbers counted from 1, so the array index used in the code is one less.
・ Inner ends of the eyebrows (22, 23)
・ Inner corners of the eyes (40, 43)
・ Nose tip (31)
・ Both sides of the nose (32, 36)
・ Both outer corners of the mouth (49, 55)
・ Under the lips (58)
・ Chin (9)
The algorithm can estimate the direction of the head with 5 points, but when I tried it with only a few points, the direction vector at the tip of the nose flipped around, so I increased the number of points. (Is it because the learned data is based on Westerners ...)
The more you use the points on the outside of the face, the better the accuracy will be, but if the eyebrows and so on go out of view when you turn to the side, the landmarks are misdetected, so try to use the points near the center of the face as much as possible.
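The landmark numbers listed above count from 1, while the program indexes the shape array from 0. A tiny sketch of that conversion is below. `POINT_NUMBERS` and `to_indices` are hypothetical names for illustration, and the order follows the image_points array in the program.

```python
# 1-based landmark numbers, in the same order as image_points in the program
POINT_NUMBERS = [31, 22, 23, 40, 43, 32, 36, 49, 55, 58, 9]


def to_indices(numbers, total=68):
    """Convert 1-based landmark numbers to 0-based array indices."""
    for n in numbers:
        if not 1 <= n <= total:
            raise ValueError(f"landmark number out of range: {n}")
    return [n - 1 for n in numbers]


indices = to_indices(POINT_NUMBERS)
print(indices)  # [30, 21, 22, 39, 42, 31, 35, 48, 54, 57, 8]
```

The printed indices match the `shape[...]` subscripts used for image_points. If you change the set of points, remember that model_points must be changed in the same order.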
The remaining problem is model_points, that is, what to use as the 3D coordinates of the parts of my own face. I determined them by brute force with the following program. It shows the (x, y) coordinates of each landmark on the image, with the tip of the nose as the origin, so face the camera as straight as possible, sit up straight, and read the values off with determination. The z coordinates are my guesses. Work out the distance from the tip of your nose to the point between your eyes and use it to judge the height of your nose. Hang in there!
HPEcal.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np
#Gets a VideoCapture object
DEVICE_ID = 0 #ID 0 is standard web cam
capture = cv2.VideoCapture(DEVICE_ID)
#Read dlib trained data
predictor_path = "/shape_predictor_68_face_landmarks.dat"
print("[INFO] loading facial landmark predictor...")
detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

while True: #Get images continuously from the camera
    ret, frame = capture.read() #Capture one frame of image data from the camera into frame
    if not ret: #Stop when no frame can be read from the camera
        break
    frame = imutils.resize(frame, width=2000) #Adjust the display size of the frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect faces in gray
    image_points = None

    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        #print(shape[30])#Nose coordinates
        cal = shape-shape[30] #Landmark coordinates relative to the nose tip
        print("######[X,Y]#######",
              "\n point18=",cal[17],
              "\n point22=",cal[21],
              "\n point37=",cal[36],
              "\n point40=",cal[39],
              "\n point28=",cal[27],
              "\n point31=",cal[30],
              "\n point32=",cal[31],
              "\n point49=",cal[48],
              "\n point58=",cal[57],
              "\n point9=",cal[8])

        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)
            #Label each landmark with its offset from the nose tip
            cv2.putText(frame,str((x, y)-shape[30]),(x,y), cv2.FONT_HERSHEY_PLAIN, 1.0, (0, 0, 255), 2)

    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break

capture.release() #Exit video capture
cv2.destroyAllWindows() #close window
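The (x, y) offsets read from HPEcal.py still have to be combined with the guessed z values to form model_points. The following is only a sketch of one way to do that bookkeeping: `average_offsets` and `build_model_points` are hypothetical helpers, averaging several readings is my own suggestion rather than something the original procedure does, and the sample numbers are taken from the model_points used in this article.

```python
def average_offsets(samples):
    """Average several (x, y) readings of the same landmark to reduce jitter."""
    n = len(samples)
    return (sum(s[0] for s in samples) / n, sum(s[1] for s in samples) / n)


def build_model_points(xy_offsets, z_guesses, order):
    """Combine measured (x, y) offsets and guessed z values into (x, y, z) tuples.

    xy_offsets and z_guesses are dicts keyed by the 0-based landmark index;
    order lists the indices in the same order as image_points.
    """
    points = []
    for idx in order:
        x, y = xy_offsets[idx]
        points.append((float(x), float(y), float(z_guesses[idx])))
    return points


# Two slightly different readings of landmark 21, averaged into one value
xy = {
    30: (0, 0),
    21: average_offsets([(-29, -124), (-31, -126)]),
    22: (30, -125),
}
z = {30: 0, 21: -30, 22: -30}  # guessed depths relative to the nose tip
print(build_model_points(xy, z, [30, 21, 22]))
```

The result can be passed to `np.array(...)` to obtain model_points. Note that HPEcal.py resizes the frame to a width of 2000 while the main program uses 1000. The model only needs to be consistent in its own units, but all readings should come from the same setup.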
Finally, here is the whole program to finish. Thank you for your hard work.
HeadPoseEstimation.py
import cv2 #OpenCV:Image processing library
import dlib #Machine learning library
import imutils #OpenCV assistance
from imutils import face_utils
import numpy as np

#Gets a VideoCapture object
DEVICE_ID = 0 #ID 0 is standard web cam
capture = cv2.VideoCapture(DEVICE_ID)
#Read dlib trained data
predictor_path = ".../shape_predictor_68_face_landmarks.dat"
detector = dlib.get_frontal_face_detector() #Call the face detector. Only the face is detected.
predictor = dlib.shape_predictor(predictor_path) #Output landmarks such as eyes and nose from the face

while True: #Get images continuously from the camera
    ret, frame = capture.read() #Capture one frame of image data from the camera into frame
    if not ret: #Stop when no frame can be read from the camera
        break
    frame = imutils.resize(frame, width=1000) #Adjust the display size of the frame image
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY) #Convert to gray scale
    rects = detector(gray, 0) #Detect faces in gray
    image_points = None

    for rect in rects:
        shape = predictor(gray, rect)
        shape = face_utils.shape_to_np(shape)
        for (x, y) in shape: #Plot 68 landmarks on the entire face
            cv2.circle(frame, (x, y), 1, (255, 255, 255), -1)

        image_points = np.array([
            tuple(shape[30]),#Nose tip
            tuple(shape[21]),
            tuple(shape[22]),
            tuple(shape[39]),
            tuple(shape[42]),
            tuple(shape[31]),
            tuple(shape[35]),
            tuple(shape[48]),
            tuple(shape[54]),
            tuple(shape[57]),
            tuple(shape[8]),
        ],dtype='double')

    if len(rects) > 0:
        model_points = np.array([
            (0.0,0.0,0.0), # 30
            (-30.0,-125.0,-30.0), # 21
            (30.0,-125.0,-30.0), # 22
            (-60.0,-70.0,-60.0), # 39
            (60.0,-70.0,-60.0), # 42
            (-40.0,40.0,-50.0), # 31
            (40.0,40.0,-50.0), # 35
            (-70.0,130.0,-100.0), # 48
            (70.0,130.0,-100.0), # 54
            (0.0,158.0,-10.0), # 57
            (0.0,250.0,-50.0) # 8
        ])

        size = frame.shape
        focal_length = size[1] #Approximate the focal length by the image width
        center = (size[1] // 2, size[0] // 2) #Image center, used as the principal point

        camera_matrix = np.array([
            [focal_length, 0, center[0]],
            [0, focal_length, center[1]],
            [0, 0, 1]
        ], dtype='double')
        dist_coeffs = np.zeros((4, 1)) #Assume no lens distortion

        (success, rotation_vector, translation_vector) = cv2.solvePnP(model_points, image_points, camera_matrix,
                                                                      dist_coeffs, flags=cv2.SOLVEPNP_ITERATIVE)
        #Rotation matrix and Jacobian
        (rotation_matrix, jacobian) = cv2.Rodrigues(rotation_vector)
        mat = np.hstack((rotation_matrix, translation_vector))

        #Take out yaw, pitch, roll
        (_, _, _, _, _, _, eulerAngles) = cv2.decomposeProjectionMatrix(mat)
        pitch, yaw, roll = eulerAngles.flatten() #eulerAngles is a 3x1 array in degrees

        print("yaw",int(yaw),"pitch",int(pitch),"roll",int(roll))#Extraction of head posture data
        cv2.putText(frame, 'yaw : ' + str(int(yaw)), (20, 10), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'pitch : ' + str(int(pitch)), (20, 25), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)
        cv2.putText(frame, 'roll : ' + str(int(roll)), (20, 40), cv2.FONT_HERSHEY_PLAIN, 1, (0, 0, 255), 2)

        (nose_end_point2D, _) = cv2.projectPoints(np.array([(0.0, 0.0, 500.0)]), rotation_vector,
                                                  translation_vector, camera_matrix, dist_coeffs)
        #Plot of points used in the calculation/Display of face direction vector
        for p in image_points:
            cv2.drawMarker(frame, (int(p[0]), int(p[1])), (0.0, 1.409845, 255),markerType=cv2.MARKER_CROSS, thickness=1)

        p1 = (int(image_points[0][0]), int(image_points[0][1]))
        p2 = (int(nose_end_point2D[0][0][0]), int(nose_end_point2D[0][0][1]))
        cv2.arrowedLine(frame, p1, p2, (255, 0, 0), 2)

    cv2.imshow('frame',frame) #Display image
    if cv2.waitKey(1) & 0xFF == ord('q'): #Press q to break and exit while
        break

capture.release() #Exit video capture
cv2.destroyAllWindows() #close window
I got this error recently
qt.qpa.plugin: Could not find the Qt platform plugin "cocoa" in ""
This application failed to start because no Qt platform plugin could be initialized. Reinstalling the application may fix this problem.
It seems that this error appears when a recent version of OpenCV is installed with pip. It works when the version is lowered.
pip3 install opencv-python==4.1.2.30
Qiita: Investigating face orientation estimation
Head Pose Estimation using OpenCV and Dlib
dlib documentation
Facial landmarks with dlib, OpenCV, and Python