When you concentrate on desk work, you can end up slouching without realizing it. This is all the more true with remote work, since there is no one else around to notice.
So I built a mechanism that alerts you when your posture deteriorates!
- Python 3.7.4
- Webcam (Logitech HD Webcam C615)
- OpenCV
Installation
```
$ pip install opencv-python
```
First, we will capture the camera image and detect the eyes.
The following is a reference for the object detection method in OpenCV.
"Face detection with Haar Cascades" http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html#face-detection
Since we want to detect the eyes this time, we will use "haarcascade_eye.xml" as the classifier. You can download it from the link below.
https://github.com/opencv/opencv/tree/master/data/haarcascades
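Incidentally, if you installed opencv-python with pip, the cascade files are bundled with the package, so you can also load one without downloading it manually (a minimal sketch):

```python
import cv2

# opencv-python bundles the Haar cascade XML files; cv2.data.haarcascades
# is the directory they were installed into.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')
assert not cascade.empty(), 'failed to load haarcascade_eye.xml'
```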
capture.py
```python
import cv2

# Camera device number
# Available cameras are numbered from 0
DEVICE_ID = 0

# Select a classifier
cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

# Capture the camera image
cap = cv2.VideoCapture(DEVICE_ID, cv2.CAP_DSHOW)

while cap.isOpened():
    # Get a frame
    ret, frame = cap.read()
    if not ret:
        break

    # Convert to grayscale
    img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect eyes
    eyes = cascade.detectMultiScale(img_gray, minSize=(30, 30))

    # Enclose each detected eye in a rectangle
    for (x, y, w, h) in eyes:
        color = (255, 0, 0)
        cv2.rectangle(img_gray, (x, y), (x + w, y + h), color, thickness=3)

    # Display
    cv2.imshow('capture', img_gray)

    # Exit the loop with the ESC key
    if cv2.waitKey(1) & 0xFF == 27:
        break

# Clean up
cv2.destroyAllWindows()
cap.release()
```
The `y` in the detection loop (`for (x, y, w, h) in eyes:`) is the vertical position of the detected eye. Note that image coordinates start at the top-left corner, so `y` grows as the eye drops. Record this value and you can watch the eye height fall!
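For example, inside the capture loop you could record a flipped average eye height per frame, so that larger values mean higher on screen (a minimal sketch; the `eye_heights` list and the frame height of 480 are my assumptions):

```python
FRAME_HEIGHT = 480  # assumed capture height in pixels
eye_heights = []    # time series of eye height; larger = higher on screen

# ... inside the capture loop, after detectMultiScale:
if len(eyes) > 0:
    mean_y = sum(y for _, y, _, _ in eyes) / len(eyes)
    eye_heights.append(FRAME_HEIGHT - mean_y)  # flip so "up" is positive
```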
Well, I have the eye height now, but using the raw value as-is isn't ideal: a single momentary dip shouldn't be judged as slouching. Instead, I work with the average over a certain period of time.
That's where the "moving average" comes into play; anyone who trades stocks or forex may be familiar with it. Here I used the simplest variant, the "simple moving average".
> A simple moving average (SMA) is the unweighted mean of the previous $n$ data points. For example, a 10-day simple moving average of closing prices is the mean of the previous 10 days' closing prices. If those prices are $p_M, p_{M-1}, \ldots, p_{M-9}$, the formula for the simple moving average $\text{SMA}_M$ is:
>
> ```math
> \text{SMA}_M = \frac{p_M + p_{M-1} + \cdots + p_{M-9}}{10}
> ```
>
> When calculating the next day's simple moving average, a new closing price is added and the oldest one drops out, so there is no need to recompute the whole sum:
>
> ```math
> \text{SMA}_{\mathrm{today}} = \text{SMA}_{\mathrm{yesterday}} - \frac{p_{M-n+1}}{n} + \frac{p_{M+1}}{n}
> ```
The point is that each element of the series gets replaced by the average of the previous $n$ values. If the difference between the first and the last value of this simple-moving-average time series exceeds a certain threshold, we can judge the posture as slouching.
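As a minimal sketch of both forms (the incremental `update_sma` helper is a name I made up for illustration, not part of the final script):

```python
def simple_moving_average(n, data):
    """Naive SMA: average every window of n consecutive values."""
    return [sum(data[m - n + 1:m + 1]) / n for m in range(n - 1, len(data))]

def update_sma(prev_sma, oldest, newest, n):
    """Incremental SMA update: drop the oldest term, add the newest."""
    return prev_sma - oldest / n + newest / n

data = [3, 5, 7, 5, 3, 5, 7, 5]
smas = simple_moving_average(4, data)
print(smas)                                   # [5.0, 5.0, 5.0, 5.0, 5.0]
print(update_sma(smas[-1], data[-4], 9, 4))   # next SMA if the new value is 9 -> 6.5
```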
The eye-height value $y$ discussed so far is just a pixel position in the camera image. We still need to work out how many centimeters a given change in $y$ corresponds to in the real world.
To do this, consider a projective transformation. The camera image is two-dimensional, so the three-dimensional world has to be projected onto a plane when it is rendered; here we treat it as a perspective projection.
A perspective projection maps the 3D point $(x, y, z)$ to $\left(\frac{x}{z}, \frac{y}{z}, 0\right)$ on the image plane. I find this easy to accept because it matches the intuition that distant things look small.
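As a toy sketch of this mapping (assuming a pinhole camera with focal length `f` in pixels):

```python
def project(x, y, z, f):
    """Pinhole perspective projection of a 3D point onto the image plane."""
    return (f * x / z, f * y / z)

# The same object twice as far away appears half as tall:
print(project(0.0, 10.0, 50.0, f=500))   # (0.0, 100.0)
print(project(0.0, 10.0, 100.0, f=500))  # (0.0, 50.0)
```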
Hence the relationship between a height difference $\Delta y_d$ (px) in the camera image and a height difference $\Delta y_v$ (cm) in the real world is as follows.
```math
\frac{\Delta y_d}{f} = \frac{\Delta y_v}{z_v} \tag{1}
```
Here $z_v$ is the distance from the camera to the subject, and $f$ is the focal length of the camera (in pixels). The relation follows from the similarity of triangles.
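Rearranged, Eq. (1) converts a pixel difference into centimeters. A minimal sketch (the function name is my own):

```python
def pixels_to_cm(delta_y_px, f_px, z_cm):
    """Convert an image-plane height change (px) to real-world cm via Eq. (1)."""
    return delta_y_px * z_cm / f_px

print(pixels_to_cm(33, 500, 45))  # ~3 cm
```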
The focal length of the camera varies depending on the camera model, etc., so it must be obtained by calibration. Camera calibration is also provided by OpenCV.
"Camera Calibration" http://whitewell.sakura.ne.jp/OpenCV/py_tutorials/py_calib3d/py_calibration/py_calibration.html
Calibration yields the camera's intrinsic parameter matrix, from which the focal length can be read off:
```math
K = \left[
\begin{array}{ccc}
f & 0 & x_c \\
0 & f & y_c \\
0 & 0 & 1
\end{array}
\right]
```
$x_c, y_c$ are the coordinates of the center of the projection plane.
However, this calibration is a little tedious, because you have to prepare photos of a chessboard actually taken with the camera...
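If you do go through it, a condensed sketch of the linked tutorial's flow looks like this (the 9x6 board size and the `calib_*.jpg` file pattern are my assumptions):

```python
import glob
import numpy as np
import cv2

# 3D coordinates of the 9x6 inner chessboard corners (on the z = 0 plane)
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for fname in glob.glob('calib_*.jpg'):
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)

# K = [[fx, 0, xc], [0, fy, yc], [0, 0, 1]]; fx and fy are usually close,
# so their mean serves as a single focal length f (in pixels).
f = (K[0, 0] + K[1, 1]) / 2
print(f)
```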
For what it's worth, the webcam I use (Logitech HD Webcam C615) comes out at roughly $f \approx 500$.
Once you have $f$, you're all set.
In the time series of the eye-height simple moving average, take the difference between the first and the last value as $\Delta y_d$ and judge it with Eq. (1). The distance $z_v$ from the camera to my face is about 45 cm, and I set the real-world eye-drop threshold $\Delta y_v$ to 3 cm. With $f \approx 500$, the pixel threshold works out to $f \cdot \Delta y_v / z_v = 500 \times 3 / 45 \approx 33$ px.
The data is visualized by plotting with matplotlib's pyplot, and the alert uses tkinter's messagebox.
detect_posture.py
```python
import numpy as np
from matplotlib import pyplot as plt
import cv2
import tkinter as tk
from tkinter import messagebox

WINDOW_NAME = "capture"            # Video capture window name
CAP_FRAME_WIDTH = 640              # Video capture width
CAP_FRAME_HEIGHT = 480             # Video capture height
CAP_FRAME_FPS = 30                 # Video capture fps (depends on your camera)

DEVICE_ID = 0                      # Web camera id

SMA_SEC = 10                       # SMA window length in seconds
SMA_N = SMA_SEC * CAP_FRAME_FPS    # SMA window size n
PLOT_NUM = 20                      # Number of points to plot
PLOT_DELTA = 1 / CAP_FRAME_FPS     # Step of the x axis

Z = 45                             # (cm) Distance from PC to face
D = 3                              # (cm) Allowed drop in eye height
F = 500                            # (px) Focal length


def simple_moving_average(n, data):
    """Return the simple moving average of data over a window of n."""
    result = []
    for m in range(n - 1, len(data)):
        total = sum([data[m - i] for i in range(n)])
        result.append(total / n)
    return result


def add_simple_moving_average(smas, n, data):
    """Append the average of the last n elements of data to smas."""
    total = sum([data[-1 - i] for i in range(n)])
    smas.append(total / n)


if __name__ == '__main__':
    # Hide the tkinter root window (only the messagebox should appear)
    root = tk.Tk()
    root.iconify()

    # Choose a cascade classifier
    cascade = cv2.CascadeClassifier("haarcascade_eye.xml")

    # Capture setup
    cap = cv2.VideoCapture(DEVICE_ID, cv2.CAP_DSHOW)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, CAP_FRAME_WIDTH)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, CAP_FRAME_HEIGHT)
    cap.set(cv2.CAP_PROP_FPS, CAP_FRAME_FPS)

    # Prepare the window
    cv2.namedWindow(WINDOW_NAME)

    # Time series data of eye height
    eye_heights = []
    sma_eye_heights = []

    # Plot setup
    ax = plt.subplot()
    graph_x = np.arange(0, PLOT_NUM * PLOT_DELTA, PLOT_DELTA)
    eye_y = [0] * PLOT_NUM
    sma_eye_y = [0] * PLOT_NUM
    eye_lines, = ax.plot(graph_x, eye_y, label="realtime")
    sma_eye_lines, = ax.plot(graph_x, sma_eye_y, label="SMA")
    ax.legend()

    while cap.isOpened():
        # Get a frame
        ret, frame = cap.read()
        if not ret:
            break

        # Convert the image to grayscale
        img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect human eyes
        eyes = cascade.detectMultiScale(img_gray, minSize=(30, 30))

        # Mark the detected eyes
        for (x, y, w, h) in eyes:
            color = (255, 0, 0)
            cv2.rectangle(img_gray, (x, y), (x + w, y + h), color, thickness=3)

        # Store the eye height, flipped so that larger means higher on screen
        if len(eyes) > 0:
            eye_average_height = CAP_FRAME_HEIGHT - sum([y for _, y, _, _ in eyes]) / len(eyes)
            eye_heights.append(eye_average_height)
            if len(eye_heights) == SMA_N:
                sma_eye_heights = simple_moving_average(SMA_N, eye_heights)
            elif len(eye_heights) > SMA_N:
                add_simple_moving_average(sma_eye_heights, SMA_N, eye_heights)

        # Detect bad posture: the drop in px exceeds F * D / Z (Eq. 1)
        if sma_eye_heights and (sma_eye_heights[0] - sma_eye_heights[-1] > F * D / Z):
            res = messagebox.showinfo(
                "BAD POSTURE!",
                "Sit up straight!\nCorrect your posture, then click ok.")
            if res == "ok":
                # Reset the state and restart from the beginning
                eye_heights = []
                sma_eye_heights = []
                graph_x = np.arange(0, PLOT_NUM * PLOT_DELTA, PLOT_DELTA)
                continue

        # Plot the eye heights
        graph_x += PLOT_DELTA
        ax.set_xlim((graph_x.min(), graph_x.max()))
        ax.set_ylim(0, CAP_FRAME_HEIGHT)
        if len(eye_heights) >= PLOT_NUM:
            eye_y = eye_heights[-PLOT_NUM:]
            eye_lines.set_data(graph_x, eye_y)
            plt.pause(.001)
        if len(sma_eye_heights) >= PLOT_NUM:
            sma_eye_y = sma_eye_heights[-PLOT_NUM:]
            sma_eye_lines.set_data(graph_x, sma_eye_y)
            plt.pause(.001)

        # Show the result
        cv2.imshow(WINDOW_NAME, img_gray)

        # Quit with the ESC key
        if cv2.waitKey(1) & 0xFF == 27:
            break

    # Clean up
    cv2.destroyAllWindows()
    cap.release()
```
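To try it, put haarcascade_eye.xml in the same directory as the script and run:

```
$ python detect_posture.py
```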