When you concentrate on desk work, you can end up slouching without realizing it. This is all the more true with remote work, since there is no one else around to notice.
So I built a mechanism that alerts you when your posture deteriorates!
- Python 3.7.4
- Webcam (Logitech HD Webcam C615)
- OpenCV
Installation
```
$ pip install opencv-python
```
First, we will capture the camera image and detect the eyes.
The following is a reference for the object detection method in OpenCV.
"Face detection with Haar Cascades" http://labs.eecs.tottori-u.ac.jp/sd/Member/oyamada/OpenCV/html/py_tutorials/py_objdetect/py_face_detection/py_face_detection.html#face-detection
Since we want to detect the eyes this time, we will use "haarcascade_eye.xml" as the classifier. You can download it from the link below.
https://github.com/opencv/opencv/tree/master/data/haarcascades
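Incidentally, if you installed opencv-python with pip, the cascade files are bundled with the package, so you can also load one without downloading it manually (a minimal sketch):

```python
import cv2

# opencv-python bundles the Haar cascade XML files; cv2.data.haarcascades
# is the directory they were installed into.
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + 'haarcascade_eye.xml')
assert not cascade.empty(), 'failed to load haarcascade_eye.xml'
```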
capture.py
```python
import cv2

# Camera device number
# Available cameras are numbered from 0
DEVICE_ID = 0

# Select a classifier
cascade = cv2.CascadeClassifier('haarcascade_eye.xml')

# Capture the camera image
cap = cv2.VideoCapture(DEVICE_ID, cv2.CAP_DSHOW)

while cap.isOpened():
    # Get a frame
    ret, frame = cap.read()
    if not ret:
        break

    # Convert to grayscale
    img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Detect eyes
    eyes = cascade.detectMultiScale(img_gray, minSize=(30, 30))

    # Enclose each detected eye in a rectangle
    for (x, y, w, h) in eyes:
        color = (255, 0, 0)
        cv2.rectangle(img_gray, (x, y), (x + w, y + h), color, thickness=3)

    # Display
    cv2.imshow('capture', img_gray)

    # Exit the loop with the ESC key
    if cv2.waitKey(1) & 0xFF == 27:
        break

# Clean up
cv2.destroyAllWindows()
cap.release()
```
The `y` in the detection loop (`for (x, y, w, h) in eyes:`) is the vertical position of the detected eye. Note that image coordinates start at the top-left corner, so `y` grows as the eye drops. Record this value and you can watch the eye height fall!
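For example, inside the capture loop you could record a flipped average eye height per frame, so that larger values mean higher on screen (a minimal sketch; the `eye_heights` list and the frame height of 480 are my assumptions):

```python
FRAME_HEIGHT = 480  # assumed capture height in pixels
eye_heights = []    # time series of eye height; larger = higher on screen

# ... inside the capture loop, after detectMultiScale:
if len(eyes) > 0:
    mean_y = sum(y for _, y, _, _ in eyes) / len(eyes)
    eye_heights.append(FRAME_HEIGHT - mean_y)  # flip so "up" is positive
```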
Well, I have the eye height now, but using the raw value as-is isn't ideal: a single momentary dip shouldn't be judged as slouching. Instead, I work with the average over a certain period of time.
That's where the "moving average" comes into play; anyone who trades stocks or forex may be familiar with it. Here I used the simplest variant, the "simple moving average".
> A simple moving average (SMA) is the unweighted mean of the previous $n$ data points. For example, a 10-day simple moving average of closing prices is the mean of the previous 10 days' closing prices. If those prices are $p_M, p_{M-1}, \ldots, p_{M-9}$, the formula for the simple moving average $\text{SMA}_M$ is:
>
> ```math
> \text{SMA}_M = \frac{p_M + p_{M-1} + \cdots + p_{M-9}}{10}
> ```
>
> When calculating the next day's simple moving average, a new closing price is added and the oldest one drops out, so there is no need to recompute the whole sum:
>
> ```math
> \text{SMA}_{\mathrm{today}} = \text{SMA}_{\mathrm{yesterday}} - \frac{p_{M-n+1}}{n} + \frac{p_{M+1}}{n}
> ```
The point is that each element of the series gets replaced by the average of the previous $n$ values. If the difference between the first and the last value of this simple-moving-average time series exceeds a certain threshold, we can judge the posture as slouching.
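As a minimal sketch of both forms (the incremental `update_sma` helper is a name I made up for illustration, not part of the final script):

```python
def simple_moving_average(n, data):
    """Naive SMA: average every window of n consecutive values."""
    return [sum(data[m - n + 1:m + 1]) / n for m in range(n - 1, len(data))]

def update_sma(prev_sma, oldest, newest, n):
    """Incremental SMA update: drop the oldest term, add the newest."""
    return prev_sma - oldest / n + newest / n

data = [3, 5, 7, 5, 3, 5, 7, 5]
smas = simple_moving_average(4, data)
print(smas)                                   # [5.0, 5.0, 5.0, 5.0, 5.0]
print(update_sma(smas[-1], data[-4], 9, 4))   # next SMA if the new value is 9 -> 6.5
```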
The eye-height value $y$ discussed so far is just a pixel position in the camera image. We still need to work out how many centimeters a given change in $y$ corresponds to in the real world.
To do this, consider a projective transformation. The camera image is two-dimensional, so the three-dimensional world has to be projected onto a plane when it is rendered; here we treat it as a perspective projection.
A perspective projection maps the 3D point $(x, y, z)$ to $\left(\frac{x}{z}, \frac{y}{z}, 0\right)$ on the image plane. I find this easy to accept because it matches the intuition that distant things look small.
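As a toy sketch of this mapping (assuming a pinhole camera with focal length `f` in pixels):

```python
def project(x, y, z, f):
    """Pinhole perspective projection of a 3D point onto the image plane."""
    return (f * x / z, f * y / z)

# The same object twice as far away appears half as tall:
print(project(0.0, 10.0, 50.0, f=500))   # (0.0, 100.0)
print(project(0.0, 10.0, 100.0, f=500))  # (0.0, 50.0)
```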
Hence the relationship between a height difference $\Delta y_d$ (px) in the camera image and a height difference $\Delta y_v$ (cm) in the real world is as follows.
```math
\frac{\Delta y_d}{f} = \frac{\Delta y_v}{z_v} \tag{1}
```
Here $z_v$ is the distance from the camera to the subject, and $f$ is the focal length of the camera (in pixels). The relation follows from the similarity of triangles.
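Rearranged, Eq. (1) converts a pixel difference into centimeters. A minimal sketch (the function name is my own):

```python
def pixels_to_cm(delta_y_px, f_px, z_cm):
    """Convert an image-plane height change (px) to real-world cm via Eq. (1)."""
    return delta_y_px * z_cm / f_px

print(pixels_to_cm(33, 500, 45))  # ~3 cm
```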
The focal length of the camera varies depending on the camera model, etc., so it must be obtained by calibration. Camera calibration is also provided by OpenCV.
"Camera Calibration" http://whitewell.sakura.ne.jp/OpenCV/py_tutorials/py_calib3d/py_calibration/py_calibration.html
Calibration yields the camera's intrinsic parameter matrix, from which the focal length can be read off:
```math
K = \left[
\begin{array}{ccc}
f & 0 & x_c \\
0 & f & y_c \\
0 & 0 & 1
\end{array}
\right]
```
$x_c, y_c$ are the coordinates of the center of the projection plane.
However, this calibration is a little tedious, because you have to prepare photos of a chessboard actually taken with the camera...
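If you do go through it, a condensed sketch of the linked tutorial's flow looks like this (the 9x6 board size and the `calib_*.jpg` file pattern are my assumptions):

```python
import glob
import numpy as np
import cv2

# 3D coordinates of the 9x6 inner chessboard corners (on the z = 0 plane)
objp = np.zeros((9 * 6, 3), np.float32)
objp[:, :2] = np.mgrid[0:9, 0:6].T.reshape(-1, 2)

objpoints, imgpoints = [], []
for fname in glob.glob('calib_*.jpg'):
    gray = cv2.cvtColor(cv2.imread(fname), cv2.COLOR_BGR2GRAY)
    found, corners = cv2.findChessboardCorners(gray, (9, 6), None)
    if found:
        objpoints.append(objp)
        imgpoints.append(corners)

ret, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    objpoints, imgpoints, gray.shape[::-1], None, None)

# K = [[fx, 0, xc], [0, fy, yc], [0, 0, 1]]; fx and fy are usually close,
# so their mean serves as a single focal length f (in pixels).
f = (K[0, 0] + K[1, 1]) / 2
print(f)
```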
For what it's worth, the webcam I use (Logitech HD Webcam C615) comes out at roughly $f \approx 500$.
Once you have $f$, you're all set.
In the time series of the eye-height simple moving average, take the difference between the first and the last value as $\Delta y_d$ and judge it with Eq. (1). The distance $z_v$ from the camera to my face is about 45 cm, and I set the real-world eye-drop threshold $\Delta y_v$ to 3 cm. With $f \approx 500$, the pixel threshold works out to $f \cdot \Delta y_v / z_v = 500 \times 3 / 45 \approx 33$ px.
The data is visualized by plotting with matplotlib's pyplot, and the alert uses tkinter's messagebox.
detect_posture.py
```python
import numpy as np
from matplotlib import pyplot as plt
import cv2
import tkinter as tk
from tkinter import messagebox

WINDOW_NAME = "capture"            # Video capture window name
CAP_FRAME_WIDTH = 640              # Video capture width
CAP_FRAME_HEIGHT = 480             # Video capture height
CAP_FRAME_FPS = 30                 # Video capture fps (depends on your camera)

DEVICE_ID = 0                      # Web camera id

SMA_SEC = 10                       # SMA window length in seconds
SMA_N = SMA_SEC * CAP_FRAME_FPS    # SMA window size n
PLOT_NUM = 20                      # Number of points to plot
PLOT_DELTA = 1 / CAP_FRAME_FPS     # Step of the x axis

Z = 45                             # (cm) Distance from PC to face
D = 3                              # (cm) Allowed drop in eye height
F = 500                            # (px) Focal length


def simple_moving_average(n, data):
    """Return the simple moving average of data over a window of n."""
    result = []
    for m in range(n - 1, len(data)):
        total = sum([data[m - i] for i in range(n)])
        result.append(total / n)
    return result


def add_simple_moving_average(smas, n, data):
    """Append the average of the last n elements of data to smas."""
    total = sum([data[-1 - i] for i in range(n)])
    smas.append(total / n)


if __name__ == '__main__':
    # Hide the tkinter root window (only the messagebox should appear)
    root = tk.Tk()
    root.iconify()

    # Choose a cascade classifier
    cascade = cv2.CascadeClassifier("haarcascade_eye.xml")

    # Capture setup
    cap = cv2.VideoCapture(DEVICE_ID, cv2.CAP_DSHOW)
    cap.set(cv2.CAP_PROP_FRAME_WIDTH, CAP_FRAME_WIDTH)
    cap.set(cv2.CAP_PROP_FRAME_HEIGHT, CAP_FRAME_HEIGHT)
    cap.set(cv2.CAP_PROP_FPS, CAP_FRAME_FPS)

    # Prepare the window
    cv2.namedWindow(WINDOW_NAME)

    # Time series data of eye height
    eye_heights = []
    sma_eye_heights = []

    # Plot setup
    ax = plt.subplot()
    graph_x = np.arange(0, PLOT_NUM * PLOT_DELTA, PLOT_DELTA)
    eye_y = [0] * PLOT_NUM
    sma_eye_y = [0] * PLOT_NUM
    eye_lines, = ax.plot(graph_x, eye_y, label="realtime")
    sma_eye_lines, = ax.plot(graph_x, sma_eye_y, label="SMA")
    ax.legend()

    while cap.isOpened():
        # Get a frame
        ret, frame = cap.read()
        if not ret:
            break

        # Convert the image to grayscale
        img_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detect human eyes
        eyes = cascade.detectMultiScale(img_gray, minSize=(30, 30))

        # Mark the detected eyes
        for (x, y, w, h) in eyes:
            color = (255, 0, 0)
            cv2.rectangle(img_gray, (x, y), (x + w, y + h), color, thickness=3)

        # Store the eye height, flipped so that larger means higher on screen
        if len(eyes) > 0:
            eye_average_height = CAP_FRAME_HEIGHT - sum([y for _, y, _, _ in eyes]) / len(eyes)
            eye_heights.append(eye_average_height)
            if len(eye_heights) == SMA_N:
                sma_eye_heights = simple_moving_average(SMA_N, eye_heights)
            elif len(eye_heights) > SMA_N:
                add_simple_moving_average(sma_eye_heights, SMA_N, eye_heights)

        # Detect bad posture: the drop in px exceeds F * D / Z (Eq. 1)
        if sma_eye_heights and (sma_eye_heights[0] - sma_eye_heights[-1] > F * D / Z):
            res = messagebox.showinfo(
                "BAD POSTURE!",
                "Sit up straight!\nCorrect your posture, then click ok.")
            if res == "ok":
                # Reset the state and restart from the beginning
                eye_heights = []
                sma_eye_heights = []
                graph_x = np.arange(0, PLOT_NUM * PLOT_DELTA, PLOT_DELTA)
                continue

        # Plot the eye heights
        graph_x += PLOT_DELTA
        ax.set_xlim((graph_x.min(), graph_x.max()))
        ax.set_ylim(0, CAP_FRAME_HEIGHT)
        if len(eye_heights) >= PLOT_NUM:
            eye_y = eye_heights[-PLOT_NUM:]
            eye_lines.set_data(graph_x, eye_y)
            plt.pause(.001)
        if len(sma_eye_heights) >= PLOT_NUM:
            sma_eye_y = sma_eye_heights[-PLOT_NUM:]
            sma_eye_lines.set_data(graph_x, sma_eye_y)
            plt.pause(.001)

        # Show the result
        cv2.imshow(WINDOW_NAME, img_gray)

        # Quit with the ESC key
        if cv2.waitKey(1) & 0xFF == 27:
            break

    # Clean up
    cv2.destroyAllWindows()
    cap.release()
```
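To try it, put haarcascade_eye.xml in the same directory as the script and run:

```
$ python detect_posture.py
```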