"Selfie (body)" is a habit of many trainees (people who love muscle training). It's a blissful time to take a picture of your pumped body after training and look back at it later. In addition, if you animate the captured image like a time lapse, you can see that muscle growth is more pickable! This article uses deep learning to dramatically improve the time-lapse of the body.
Changes in the body from December 2017 to March 2020
I created a time-lapse from the photos I had taken, but the frame-to-frame jumps bothered me, so I first corrected them by hand to get a smooth result. Then, to spare myself that manual work, I automated the correction with deep learning.
To start, let's build a time-lapse that simply plays the images one after another, as-is.
Time-lapse creation code (excerpt)
```python
# OpenCV alone can write videos, but to produce an mp4 file that plays on
# Discord from a Google Colab environment, I used skvideo.
import cv2
import skvideo.io

def create_video(imgs, out_video_path, size_wh):
    vid_out = skvideo.io.FFmpegWriter(out_video_path,
                                      inputdict={
                                          "-r": "10"
                                      },
                                      outputdict={
                                          "-r": "10"
                                      })
    for img in imgs:
        img = cv2.resize(img, size_wh)
        # skvideo expects RGB, OpenCV images are BGR
        vid_out.writeFrame(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    vid_out.close()

imgs = load_images("images_dir")  # load_images: omitted helper that reads the frames
create_video(imgs, "video.mp4", (w, h))
```
The result is as follows.
The jumps between frames are so distracting that I can't concentrate on my dear child (my body).
I wanted an easy way to get rid of these jumps. If I could fix some reference points on my body across all the frames, the problem would be solved, and it took me about 0.1 seconds to land on "nipples" and "navel". Here's how to pin the nipples and navel in place.
First, I built a tool for annotating the nipple and navel coordinates on each image. Something like CVAT could probably do the job, but when I weighed the time it would take to master it against the time to write my own tool, I concluded that writing my own was faster, so I did.
The tool's specification: given a folder, it displays the images one after another; for each image you click the three points (both nipples and the navel), and the clicked coordinates are written to a CSV file. The GUI uses tkinter (source omitted).
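Since the source is omitted, here is a minimal sketch of what such a tool could look like, assuming tkinter and Pillow; the class name, CSV column names, and window handling here are my own assumptions, not the author's actual code.

```python
import csv
import glob
import os
import tkinter as tk

from PIL import Image, ImageTk

class ClickAnnotator:
    def __init__(self, img_dir, out_csv):
        self.files = sorted(glob.glob(os.path.join(img_dir, "*.jpg")))
        self.idx = 0
        self.points = []
        self.csv_file = open(out_csv, "w", newline="")
        self.writer = csv.writer(self.csv_file)
        self.writer.writerow(["file", "p1x", "p1y", "p2x", "p2y", "p3x", "p3y"])
        self.root = tk.Tk()
        self.canvas = tk.Canvas(self.root)
        self.canvas.pack()
        self.canvas.bind("<Button-1>", self.on_click)
        self.show()
        self.root.mainloop()

    def show(self):
        # display the current image at native size, so click coordinates map 1:1
        self.photo = ImageTk.PhotoImage(Image.open(self.files[self.idx]))
        self.canvas.config(width=self.photo.width(), height=self.photo.height())
        self.canvas.create_image(0, 0, image=self.photo, anchor=tk.NW)

    def on_click(self, event):
        self.points += [event.x, event.y]
        if len(self.points) == 6:  # three points clicked -> write one CSV row
            self.writer.writerow([self.files[self.idx]] + self.points)
            self.points = []
            self.idx += 1
            if self.idx == len(self.files):
                self.csv_file.close()
                self.root.destroy()
                return
            self.show()

# ClickAnnotator("images_dir", "annotations.csv")
```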
The nipples and navel are then pinned in place with an affine transformation that maps each image onto the first one.
Corrected time-lapse creation code (excerpt)
```python
import cv2
import numpy as np

def p3affine_img(img, src_p, dst_p):
    h, w, ch = img.shape
    pts1 = np.float32([src_p[0], src_p[1], src_p[2]])
    pts2 = np.float32([dst_p[0], dst_p[1], dst_p[2]])
    M = cv2.getAffineTransform(pts1, pts2)
    # note: warpAffine takes the output size as (width, height)
    dst = cv2.warpAffine(img, M, (w, h))
    return dst

df = read_annotations()  # omitted: reads the CSV written by the annotation tool
imgs = []
src_p = None
for index, row in df.iterrows():
    img = cv2.imread(row.file)
    dst_p = [[row.p1x, row.p1y],   # left nipple
             [row.p2x, row.p2y],   # right nipple
             [row.p3x, row.p3y]]   # navel
    if src_p is None:
        src_p = dst_p  # the first image becomes the reference
    else:
        img = p3affine_img(img, dst_p, src_p)
    imgs.append(img)
write_video(imgs)  # omitted
```
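For completeness, the two omitted helpers above could look roughly like this, assuming the CSV layout (`file, p1x, p1y, p2x, p2y, p3x, p3y`) sketched for the annotation tool; the default paths are hypothetical.

```python
import pandas as pd

def read_annotations(csv_path="annotations.csv"):
    # one row per image: file, p1x, p1y, p2x, p2y, p3x, p3y
    return pd.read_csv(csv_path)

def write_video(imgs, out_video_path="video.mp4", size_wh=(720, 1280)):
    # simply reuse create_video from the first snippet
    create_video(imgs, out_video_path, size_wh)
```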
The results are as follows.
I got exactly the time-lapse I wanted. Congratulations! **...or not!**
The number of images annotated so far is 120 (covering September 9, 2019 to March 2020). But I still have 281 images taken since December 2017 with no coordinates, and I'll keep training for decades to come, which means decades of annotating ahead of me. Merely imagining it makes my cortisol flow and tips me into a catabolic state. I considered taking in some sugar to cope.
That's right, ~~let's go to the gym~~ let's do deep learning.
I build a model that estimates the positions of the nipples and navel; once that works, the rest is the same affine transformation as before. I treat nipple/navel detection as a segmentation task. Keypoint detection in the style of pose estimation would probably be a better fit, but I personally have more experience with segmentation, so that's what I chose.
The dataset works out as follows: the images from September 9, 2019 to March 2020 already have coordinates, so they become the training and validation images, and the model then infers coordinates for the remaining period automatically.
This could be posed as a 4-class problem ("right nipple", "left nipple", "navel", "background"), but I reduced it to 2 classes: "right nipple / left nipple / navel" versus "background". As long as the three points are detected, telling them apart afterwards should be easy with simple rules. Now for the mask images: based on the coordinate data created earlier, each point is grown into a small disc filled with 1, and everything else is background, set to 0.
```python
# img_h, img_w: size of the source images
for index, row in df.iterrows():
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    # fill a disc of radius 15 around each annotated point with 1
    mask = cv2.circle(mask, (row.p1x, row.p1y), 15, (1), -1)
    mask = cv2.circle(mask, (row.p2x, row.p2y), 15, (1), -1)
    mask = cv2.circle(mask, (row.p3x, row.p3y), 15, (1), -1)
    save_img(mask, row.file)  # omitted
```
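One plausible implementation of the omitted `save_img`, under my own assumption that masks are written as PNGs named after the source image into a `masks_dir` folder:

```python
def save_img(mask, img_file, masks_dir="masks_dir"):
    # hypothetical layout: one PNG mask per image, same base name
    name = os.path.splitext(os.path.basename(img_file))[0] + ".png"
    cv2.imwrite(os.path.join(masks_dir, name), mask)
```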
Visualized (1 as white, 0 as black), a mask looks like this.
These masks are paired with the body images.
For training I used DeepLab v3 (torchvision). The 120 images were split 8:2 into training and validation sets. That is a rather small dataset, but I did no data augmentation.
Honestly, augmenting would probably be better (I simply found it too tedious); a possible sketch follows.
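The `MaskDataset` class below already accepts a `transforms` callable applied to each item, so a paired transform could be plugged straight in. A sketch of what that might look like (my own addition, not the author's code); horizontal flips are deliberately avoided because they would swap the left and right nipples:

```python
import random

import torchvision.transforms.functional as TF

def paired_augment(item):
    img, mask = item["image"], item["mask"]
    # small random rotation, applied identically to image and mask
    angle = random.uniform(-10, 10)
    img, mask = TF.rotate(img, angle), TF.rotate(mask, angle)
    # brightness jitter on the image only
    img = TF.adjust_brightness(img, random.uniform(0.8, 1.2))
    return {"image": img, "mask": mask}

# dataset = MaskDataset("images_dir", "masks_dir", 0.5, transforms=paired_augment)
```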
Dataset class and training-related functions
```python
import glob
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class MaskDataset(Dataset):
    def __init__(self, imgs_dir, masks_dir, scale=1, transforms=None):
        self.imgs_dir = imgs_dir
        self.masks_dir = masks_dir
        self.imgs = list(sorted(glob.glob(os.path.join(imgs_dir, "*.jpg"))))
        self.msks = list(sorted(glob.glob(os.path.join(masks_dir, "*.png"))))
        self.transforms = transforms
        self.scale = scale

    def __len__(self):
        return len(self.imgs)

    @classmethod
    def preprocess(cls, pil_img, scale):
        # Grayscale would probably work fine too, but I skipped it (too much hassle)
        # pil_img = pil_img.convert("L")
        w, h = pil_img.size
        newW, newH = int(scale * w), int(scale * h)
        pil_img = pil_img.resize((newW, newH))
        img_nd = np.array(pil_img)
        if len(img_nd.shape) == 2:
            img_nd = np.expand_dims(img_nd, axis=2)
        # HWC to CHW
        img_trans = img_nd.transpose((2, 0, 1))
        if img_trans.max() > 1:
            img_trans = img_trans / 255
        return img_trans

    def __getitem__(self, i):
        mask_file = self.msks[i]
        img_file = self.imgs[i]
        mask = Image.open(mask_file)
        img = Image.open(img_file)
        img = self.preprocess(img, self.scale)
        mask = self.preprocess(mask, self.scale)
        item = {"image": torch.from_numpy(img), "mask": torch.from_numpy(mask)}
        if self.transforms:
            item = self.transforms(item)
        return item
```
```python
from torchvision import models
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

def create_deeplabv3(num_classes):
    model = models.segmentation.deeplabv3_resnet101(pretrained=True, progress=True)
    # swap in a new head; the ResNet-101 backbone outputs 2048 channels
    model.classifier = DeepLabHead(2048, num_classes)
    # Grayscale input would probably work fine too, but I skipped it (too much hassle)
    # model.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    return model
```
```python
import copy
import time

from tqdm import tqdm

def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=25, print_freq=1):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = 1e15
    loss_history = {"train": [], "val": []}
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch + 1, num_epochs))
        print('-' * 10)
        for phase in ["train", "val"]:
            if phase == "train":
                model.train()
            else:
                model.eval()
            for sample in tqdm(iter(dataloaders[phase])):
                imgs = sample["image"].to(device, dtype=torch.float)
                msks = sample["mask"].to(device, dtype=torch.float)
                optimizer.zero_grad()
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(imgs)
                    loss = criterion(outputs["out"], msks)
                    if phase == "train":
                        loss.backward()
                        optimizer.step()
            # loss of the last batch in this phase (crude, but enough to track progress)
            epoch_loss = loss.item()
            if (epoch + 1) % print_freq == 0:
                print("Epoch: [%d/%d], Loss: %.4f" % (epoch + 1, num_epochs, epoch_loss))
            loss_history[phase].append(epoch_loss)
            # keep a deep copy of the weights whenever the validation loss improves
            if phase == "val" and epoch_loss < best_loss:
                best_loss = epoch_loss
                best_model_wts = copy.deepcopy(model.state_dict())
    time_elapsed = time.time() - since
    print("Training complete in {:.0f}m {:.0f}s".format(time_elapsed // 60, time_elapsed % 60))
    print("Best val loss: {:4f}".format(best_loss))
    model.load_state_dict(best_model_wts)
    return model, loss_history
```
Running the training
```python
from torch import nn, optim
from torch.utils.data import DataLoader, random_split

dataset = MaskDataset("images_dir", "masks_dir", 0.5, transforms=None)

# split into training and validation sets
val_percent = 0.2
batch_size = 4
n_val = int(len(dataset) * val_percent)
n_train = len(dataset) - n_val
train, val = random_split(dataset, [n_train, n_val])
train_loader = DataLoader(train, batch_size=batch_size, shuffle=True,
                          num_workers=8, pin_memory=True, drop_last=True)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=False,
                        num_workers=8, pin_memory=True, drop_last=True)
dataloaders = {"train": train_loader, "val": val_loader}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# with BCEWithLogitsLoss, binary classification uses a single output channel
num_classes = 1
model = create_deeplabv3(num_classes)
# to resume from a saved checkpoint:
# model.load_state_dict(torch.load("model.pth"))
model.to(device)

# background pixels overwhelmingly outnumber the targets, so compensate with pos_weight
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(10000.0).to(device))
params = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
optimizer = optim.Adam(params)

total_epoch = 50
model, loss_dict = train_model(model, criterion, optimizer, dataloaders, device, total_epoch)
```
After about 50 epochs, training had more or less converged.
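To check convergence, the `loss_dict` returned by `train_model` above can be plotted; a quick sketch (matplotlib comes preinstalled on Colab):

```python
import matplotlib.pyplot as plt

# one loss value per epoch and phase, as recorded by train_model
plt.plot(loss_dict["train"], label="train")
plt.plot(loss_dict["val"], label="val")
plt.xlabel("epoch")
plt.ylabel("BCE loss")
plt.legend()
plt.show()
```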
The results were generally good, with the three points detected cleanly, but occasionally I got outputs like the following (rendered as a heat map).
Of course there's no such thing as two left nipples, so the small blob at the top right is a false positive. Incidentally, there were no false negatives.
Given inference results like the above, post-processing proceeds as follows:
1. Discard every pixel that is not clearly confident, keeping only strong responses for the next step. The threshold was set empirically to 0.995.
2. Split the surviving pixels into objects (clusters) with cv2.connectedComponents. For details, see How to label connected components with OpenCV - connectedComponents - pynote.
3. Looking over the failure cases, the false-positive regions away from the nipples and navel were always small in area, so keep the three clusters with the largest area. I don't think this kind of countermeasure is particularly robust, but it worked here, so I adopted it.
4. Compute each cluster's centroid with cv2.moments. For details, see Calculating the center of gravity with Python + OpenCV - Introduction to CV image analysis.
5. The affine transformation needs point correspondences, so the ordering of the nipple and navel coordinates must be consistent across images. Every image was taken standing upright, and along the horizontal axis the points always appear as nipple → navel → nipple, so a simple sort by x coordinate is enough.
Inference-time code
```python
# detect the three points from the predicted mask
def triangle_pt(heatmask, thresh=0.995):
    mask = heatmask.copy()
    # 2-4-1. binarize: discard every pixel at or below the threshold
    mask[mask > thresh] = 255
    mask[mask <= thresh] = 0
    mask = mask.astype(np.uint8)
    # 2-4-2. split into connected components
    nlabels, labels = cv2.connectedComponents(mask)
    pt = []
    if nlabels != 4:  # expected case: background + 3 points
        # if there are fewer, do nothing
        # (I really should lower the threshold instead, but that's a hassle)
        if nlabels < 4:
            return None
        # 2-4-3. if there are more than 3 clusters, keep the 3 largest by area
        elif nlabels > 4:
            sum_px = []
            for i in range(1, nlabels):
                sum_px.append((labels == i).sum())
            # +1 to skip the background label 0
            indices = [x + 1 for x in np.argsort(-np.array(sum_px))[:3]]
    else:
        indices = [x for x in range(1, nlabels)]
    # 2-4-4. centroid of each cluster
    for i in indices:
        base = np.zeros_like(mask, dtype=np.uint8)
        base[labels == i] = 255
        mu = cv2.moments(base, False)
        x, y = int(mu["m10"] / mu["m00"]), int(mu["m01"] / mu["m00"])
        pt.append([x, y])
    # 2-4-5. sort by the x coordinate of each centroid (right nipple → navel → left nipple)
    sort_key = lambda v: v[0]
    pt.sort(key=sort_key)
    return np.array(pt)
```
```python
from torchvision import transforms

def correct_img(model, device, in_dir, out_dir,
                draw_heatmap=True, draw_triangle=True, correct=True):
    imgs = []
    base_3p = None
    model.eval()
    with torch.no_grad():
        imglist = sorted(glob.glob(os.path.join(in_dir, "*.jpg")))
        for idx, img_path in enumerate(imglist):
            # batch size 1, to keep things simple
            full_img = Image.open(img_path)
            img = torch.from_numpy(MaskDataset.preprocess(full_img, 0.5))
            img = img.unsqueeze(0)
            img = img.to(device=device, dtype=torch.float32)
            output = model(img)["out"]
            probs = torch.sigmoid(output)
            probs = probs.squeeze(0)
            # scale the half-size probability map back up to the original size
            # (the images are portrait, so the smaller edge equals the width)
            tf = transforms.Compose(
                [
                    transforms.ToPILImage(),
                    transforms.Resize(full_img.size[0]),
                    transforms.ToTensor()
                ]
            )
            probs = tf(probs.cpu())
            full_mask = probs.squeeze().cpu().numpy()
            full_img = np.asarray(full_img).astype(np.uint8)
            full_img = cv2.cvtColor(full_img, cv2.COLOR_RGB2BGR)
            # the three detected points
            triangle = triangle_pt(full_mask)
            if draw_triangle and triangle is not None:
                cv2.drawContours(full_img, [triangle], 0, (0, 0, 255), 5)
            # heat map overlay
            if draw_heatmap:
                full_mask = (full_mask * 255).astype(np.uint8)
                jet = cv2.applyColorMap(full_mask, cv2.COLORMAP_JET)
                alpha = 0.7
                full_img = cv2.addWeighted(full_img, alpha, jet, 1 - alpha, 0)
            # affine transformation onto the first successfully detected frame
            if correct:
                if base_3p is None and triangle is not None:
                    base_3p = triangle
                elif triangle is not None:
                    full_img = p3affine_img(full_img, triangle, base_3p)
            if out_dir is not None:
                cv2.imwrite(os.path.join(out_dir, os.path.basename(img_path)), full_img)
            imgs.append(full_img)
    return imgs

imgs = correct_img(model, device,
                   "images_dir", None,
                   draw_heatmap=False, draw_triangle=False, correct=True)
```
For comparison, the uncorrected time-lapse looks like this.
And the corrected time-lapse looks like this.
By detecting the nipples and navel with deep learning and correcting the images automatically, the time-lapse became dramatically easier to watch, which in turn boosted my motivation to train. Of course, some of you may be thinking **"Couldn't plain, non-deep CV handle this?"**, but in my case, any time spent devising hand-crafted rules is time I'd rather spend raising a barbell, so I went with the brute-force solution. Everything except the annotation tool was developed on Google Colab: 3150 (saikō, i.e. awesome)!
Challenges remain, but worrying too hard only secretes cortisol, so I'm not going to stress over them!
Have a fun muscle-training life!