**Note: The video used in this article was recorded a while ago; I am currently refraining from training at the gym.**
This is the second installment of "Muscle Training x Deep Learning". The first is here (Deep learning made it dramatically easier to see the time-lapse of physical changes). A "form check" is a habit of many trainees (people who love muscle training): fix your smartphone in place during training, record yourself, and review the footage later to learn your own form habits and improve the quality of subsequent sessions. In this article I focus on the squat (the king of muscle training) and describe how I checked my form using deep learning.
The full source code is available as a Google Colab notebook: https://colab.research.google.com/drive/18bFHGZ415T6emBoPHacrMyEckPGW9VCv You can also run it on your own video, so please give it a try.
Pose estimation is a technique for estimating the posture of humans and animals. For the human body, for example, a pose can be expressed by detecting keypoints such as the neck, hips, and knees and connecting them. A well-known example is OpenPose.
This time, I referred to the GitHub page of "[Learn by Making! Advanced Deep Learning with PyTorch (つくりながら学ぶ!PyTorchによる発展ディープラーニング)](https://www.amazon.co.jp/%E3%81%A4%E3%81%8F%E3%82%8A%E3%81%AA%E3%81%8C%E3%82%89%E5%AD%A6%E3%81%B6%EF%BC%81PyTorch%E3%81%AB%E3%82%88%E3%82%8B%E7%99%BA%E5%B1%95%E3%83%87%E3%82%A3%E3%83%BC%E3%83%97%E3%83%A9%E3%83%BC%E3%83%8B%E3%83%B3%E3%82%B0-%E5%B0%8F%E5%B7%9D-%E9%9B%84%E5%A4%AA%E9%83%8E-ebook/dp/B07VPDVNKW/)". I wanted to implement this with PyTorch on Google Colab, so that implementation was very helpful.
For now, let's try pose estimation on a single image using a free stock photo. The code here is a schematic version; see the Google Colab notebook for details.
```python
import torch

def create_model(weights_path):
    """
    Create the model.
    The pretrained weights and OpenPoseNet use different layer names
    (e.g. module.model0.0.weight vs. model0.model.0.weight),
    so map the names by position and load.
    """
    # Model definition (OpenPoseNet comes from the referenced implementation)
    model = OpenPoseNet()

    # Load the pretrained parameters
    net_weights = torch.load(
        weights_path, map_location={'cuda:0': 'cpu'})
    keys = list(net_weights.keys())

    # Copy the loaded values onto OpenPoseNet's own parameter names
    # (model.state_dict().keys()), relying on matching order
    weights_load = {}
    for i in range(len(keys)):
        weights_load[list(model.state_dict().keys())[i]] = net_weights[keys[i]]

    # Hand the copied weights to the model
    state = model.state_dict()
    state.update(weights_load)
    model.load_state_dict(state)
    return model
```
```python
import cv2
from google.colab.patches import cv2_imshow

model = create_model(weights_path)
model.to(device)
model.eval()

# `img` is the preprocessed input tensor (see the notebook for preprocessing)
with torch.no_grad():
    predicted_outputs, _ = model(img.to(device))

# PAFs (Part Affinity Fields) and part heatmaps from the network output
pafs = predicted_outputs[0][0].cpu().detach().numpy().transpose(1, 2, 0)
heatmaps = predicted_outputs[1][0].cpu().detach().numpy().transpose(1, 2, 0)

# Resize both back to the original image size
pafs = cv2.resize(
    pafs, (test_img.shape[1], test_img.shape[0]), interpolation=cv2.INTER_CUBIC)
heatmaps = cv2.resize(
    heatmaps, (test_img.shape[1], test_img.shape[0]), interpolation=cv2.INTER_CUBIC)

# Decode the pose and draw the skeleton on the image
_, result_img, _, _ = decode_pose(test_img, heatmaps, pafs)
cv2_imshow(result_img)
```
The results are as follows.
It looks pretty good. If you run this on each frame of a video, you can check your form. However, I don't need all of these joints, only the minimum necessary parts, so next I will narrow the output down to just those.
The model outputs a heatmap for each body part and estimates the pose by extracting each part's location from its heatmap. First, let's draw the heatmaps for the target parts.
```python
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm

# 1: neck, 8: hip (right), 9: knee (right), 10: ankle (right)
necessary_parts = [1, 8, 9, 10]

fig, ax = plt.subplots(2, 2, figsize=(16, 10))
for i, part in enumerate(necessary_parts):
    heat_map = heatmaps[:, :, part]
    # Color the heatmap with the jet colormap and blend it onto the image
    heat_map = np.uint8(cm.jet(heat_map) * 255)
    heat_map = cv2.cvtColor(heat_map, cv2.COLOR_RGBA2RGB)
    blend_img = cv2.addWeighted(test_img, 0.5, heat_map, 0.5, 0)
    ax[i // 2, i % 2].imshow(blend_img)
plt.show()
```
The results are as follows. The target parts are extracted properly.
Next, identify each joint's position from its heatmap. Assuming that only one person appears in each frame, we keep only one point per part and then connect the specified parts.
```python
def find_joint_coords(heatmaps, necessary_parts,
                      param={'thre1': 0.1, 'thre2': 0.05, 'thre3': 0.5}):
    """
    Detect joint coordinates from the part heatmaps.
    """
    joints = []
    for part in necessary_parts:
        heat_map = heatmaps[:, :, part]
        peaks = find_peaks(param, heat_map)
        # If no peak is found, record NaN coordinates and move on
        if len(peaks) == 0:
            joints.append(np.array([np.nan, np.nan]))
            continue
        # If there are two or more peaks, keep only the strongest one
        if peaks.shape[0] > 1:
            max_peak = None
            for peak in peaks:
                val = heat_map[peak[1], peak[0]]
                if max_peak is None or heat_map[max_peak[1], max_peak[0]] < val:
                    max_peak = peak
        else:
            max_peak = peaks[0]
        joints.append(max_peak)
    return joints
```
```python
img = test_img.copy()
joints = find_joint_coords(heatmaps, necessary_parts)
# Connect neighboring joints (neck -> hip -> knee -> ankle) with lines
for i in range(len(joints) - 1):
    img = cv2.line(img, tuple(joints[i].astype(int)),
                   tuple(joints[i + 1].astype(int)), (255, 0, 0), 3)
cv2_imshow(img)
```
The results are as follows. Looks good.
Now that I know the approach works, I'll apply it to my squat video.
The proportions of the shin, thigh, and trunk vary from person to person, so the optimal squat form also differs from person to person. The figure below should make this easier to picture.
In other words, the form that suits you comes down to skeletal proportions more than to flexibility or strength balance. If you can grasp your own proportions, you can prevent injuries and increase your squat weight safely. For more information, check out YouTube and great pages such as the one below.
Squats Part 1: Fold-Ability and Proportions
Now that the joint coordinates are available, compute the joint angles and moment arms and arrange them in chronological order. A joint's angle is the angle between the two vectors that sandwich the joint. For the moment arm, I treat the midfoot as the ideal center of gravity and use the length of the perpendicular from the joint to the vertical line through that point. Computing this for every frame and stringing the results together completes the analysis. Finally, plot the per-frame angles and moment arms with matplotlib, paste the plot at the top of the frame, and add a graph that makes it look like a proper analysis.
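As a minimal sketch of those two calculations: the functions below are my own illustration, not the notebook's code, and using the ankle's x coordinate as the midfoot position is a rough stand-in.

```python
import numpy as np

def joint_angle(p_prev, p_joint, p_next):
    """Angle (degrees) between the two vectors that sandwich p_joint."""
    v1 = np.asarray(p_prev, float) - np.asarray(p_joint, float)
    v2 = np.asarray(p_next, float) - np.asarray(p_joint, float)
    cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

def moment_arm(p_joint, midfoot_x):
    """Length of the perpendicular (in pixels) from the joint to the
    vertical line through the assumed center of gravity (midfoot)."""
    return abs(float(p_joint[0]) - float(midfoot_x))

# Per-frame example with the joints from find_joint_coords
# (order: neck, hip, knee, ankle)
neck, hip, knee, ankle = joints
hip_angle = joint_angle(neck, hip, knee)    # trunk-thigh angle at the hip
knee_angle = joint_angle(hip, knee, ankle)  # thigh-shin angle at the knee
midfoot_x = ankle[0]  # rough stand-in for the midfoot x position
hip_arm, knee_arm = moment_arm(hip, midfoot_x), moment_arm(knee, midfoot_x)
```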
**The source code gets long, so I omit it here. Please refer to the notebook.**
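For reference, though, here is a rough sketch of what the per-frame loop could look like. This is not the notebook's actual code: `preprocess` is a hypothetical stand-in for the resize/normalize step, the file names are examples, and the matplotlib overlay is omitted.

```python
import cv2

cap = cv2.VideoCapture('squat.mp4')  # example input file
fps = cap.get(cv2.CAP_PROP_FPS)
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
writer = cv2.VideoWriter('squat_form_check.mp4',
                         cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))

while True:
    ret, frame = cap.read()
    if not ret:
        break
    # `preprocess` (hypothetical) turns the frame into the input tensor,
    # as was done for the single image above
    with torch.no_grad():
        predicted_outputs, _ = model(preprocess(frame).to(device))
    heatmaps = predicted_outputs[1][0].cpu().detach().numpy().transpose(1, 2, 0)
    heatmaps = cv2.resize(heatmaps, (w, h), interpolation=cv2.INTER_CUBIC)
    joints = find_joint_coords(heatmaps, necessary_parts)
    # Draw the skeleton only when every joint was detected in this frame
    if not any(np.isnan(j).any() for j in joints):
        for i in range(len(joints) - 1):
            frame = cv2.line(frame, tuple(joints[i].astype(int)),
                             tuple(joints[i + 1].astype(int)), (255, 0, 0), 3)
    writer.write(frame)

cap.release()
writer.release()
```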
A completed example is shown below (representative frames selected).
I applied deep-learning-based pose estimation to perform a squat form check, and I think the joints were detected with high accuracy. Building on this result, you may be able to improve your squat further by finding a lifter whose skeletal proportions are close to yours and comparing their form with your own.
The challenges are as follows.
As shown above, muscle training you can do at home is not just about moving your body! Have a fun muscle-training hack life!