"Selfie (body)" is a habit of many trainees (people who love muscle training). It's a blissful time to take a picture of your pumped body after training and look back at it later. In addition, if you animate the captured image like a time lapse, you can see that muscle growth is more pickable! This article uses deep learning to dramatically improve the time-lapse of the body.
Changes in the body from December 2017 to March 2020
I created a time-lapse from the photos I had taken, but the frame-to-frame jumps bothered me, so I first corrected them by hand to get a smooth result. Then, to spare myself that manual work, I automated the correction with deep learning.
To start, let's build a time-lapse that simply plays the images one after another, as-is.
Time-lapse creation code (excerpt)
```python
# OpenCV alone can write videos, but to produce an mp4 file that plays on
# Discord from a Google Colab environment, I used skvideo.
import cv2
import skvideo.io

def create_video(imgs, out_video_path, size_wh):
    vid_out = skvideo.io.FFmpegWriter(out_video_path,
                                      inputdict={
                                          "-r": "10"
                                      },
                                      outputdict={
                                          "-r": "10"
                                      })
    for img in imgs:
        img = cv2.resize(img, size_wh)
        # skvideo expects RGB, OpenCV images are BGR
        vid_out.writeFrame(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    vid_out.close()

imgs = load_images("images_dir")  # load_images: omitted helper that reads the frames
create_video(imgs, "video.mp4", (w, h))
```
The result is as follows.
The jumps between frames are so distracting that I can't concentrate on my dear child (my body).
I wanted an easy way to get rid of these jumps. If I could fix some reference points on my body across all the frames, the problem would be solved, and it took me about 0.1 seconds to land on "nipples" and "navel". Here's how to pin the nipples and navel in place.
First, I built a tool for annotating the nipple and navel coordinates on each image. Something like CVAT could probably do the job, but when I weighed the time it would take to master it against the time to write my own tool, I concluded that writing my own was faster, so I did.
The tool's specification: given a folder, it displays the images one after another; for each image you click the three points (both nipples and the navel), and the clicked coordinates are written to a CSV file. The GUI uses tkinter (source omitted).
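Since the source is omitted, here is a minimal sketch of what such a tool could look like, assuming tkinter and Pillow; the class name, CSV column names, and window handling here are my own assumptions, not the author's actual code.

```python
import csv
import glob
import os
import tkinter as tk

from PIL import Image, ImageTk

class ClickAnnotator:
    def __init__(self, img_dir, out_csv):
        self.files = sorted(glob.glob(os.path.join(img_dir, "*.jpg")))
        self.idx = 0
        self.points = []
        self.csv_file = open(out_csv, "w", newline="")
        self.writer = csv.writer(self.csv_file)
        self.writer.writerow(["file", "p1x", "p1y", "p2x", "p2y", "p3x", "p3y"])
        self.root = tk.Tk()
        self.canvas = tk.Canvas(self.root)
        self.canvas.pack()
        self.canvas.bind("<Button-1>", self.on_click)
        self.show()
        self.root.mainloop()

    def show(self):
        # display the current image at native size, so click coordinates map 1:1
        self.photo = ImageTk.PhotoImage(Image.open(self.files[self.idx]))
        self.canvas.config(width=self.photo.width(), height=self.photo.height())
        self.canvas.create_image(0, 0, image=self.photo, anchor=tk.NW)

    def on_click(self, event):
        self.points += [event.x, event.y]
        if len(self.points) == 6:  # three points clicked -> write one CSV row
            self.writer.writerow([self.files[self.idx]] + self.points)
            self.points = []
            self.idx += 1
            if self.idx == len(self.files):
                self.csv_file.close()
                self.root.destroy()
                return
            self.show()

# ClickAnnotator("images_dir", "annotations.csv")
```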
The nipples and navel are then pinned in place with an affine transformation that maps each image onto the first one.
Corrected time-lapse creation code (excerpt)
```python
import cv2
import numpy as np

def p3affine_img(img, src_p, dst_p):
    h, w, ch = img.shape
    pts1 = np.float32([src_p[0], src_p[1], src_p[2]])
    pts2 = np.float32([dst_p[0], dst_p[1], dst_p[2]])
    M = cv2.getAffineTransform(pts1, pts2)
    # note: warpAffine takes the output size as (width, height)
    dst = cv2.warpAffine(img, M, (w, h))
    return dst

df = read_annotations()  # omitted: reads the CSV written by the annotation tool
imgs = []
src_p = None
for index, row in df.iterrows():
    img = cv2.imread(row.file)
    dst_p = [[row.p1x, row.p1y],   # left nipple
             [row.p2x, row.p2y],   # right nipple
             [row.p3x, row.p3y]]   # navel
    if src_p is None:
        src_p = dst_p  # the first image becomes the reference
    else:
        img = p3affine_img(img, dst_p, src_p)
    imgs.append(img)
write_video(imgs)  # omitted
```
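For completeness, the two omitted helpers above could look roughly like this, assuming the CSV layout (`file, p1x, p1y, p2x, p2y, p3x, p3y`) sketched for the annotation tool; the default paths are hypothetical.

```python
import pandas as pd

def read_annotations(csv_path="annotations.csv"):
    # one row per image: file, p1x, p1y, p2x, p2y, p3x, p3y
    return pd.read_csv(csv_path)

def write_video(imgs, out_video_path="video.mp4", size_wh=(720, 1280)):
    # simply reuse create_video from the first snippet
    create_video(imgs, out_video_path, size_wh)
```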
The results are as follows.
I got exactly the time-lapse I wanted. Congratulations! **...or not!**
The number of images annotated so far is 120 (covering September 9, 2019 to March 2020). But I still have 281 images taken since December 2017 with no coordinates, and I'll keep training for decades to come, which means decades of annotating ahead of me. Merely imagining it makes my cortisol flow and tips me into a catabolic state. I considered taking in some sugar to cope.
That's right, ~~let's go to the gym~~ let's do deep learning.
I build a model that estimates the positions of the nipples and navel; once that works, the rest is the same affine transformation as before. I treat nipple/navel detection as a segmentation task. Keypoint detection in the style of pose estimation would probably be a better fit, but I personally have more experience with segmentation, so that's what I chose.
The dataset works out as follows: the images from September 9, 2019 to March 2020 already have coordinates, so they become the training and validation images, and the model then infers coordinates for the remaining period automatically.
This could be posed as a 4-class problem ("right nipple", "left nipple", "navel", "background"), but I reduced it to 2 classes: "right nipple / left nipple / navel" versus "background". As long as the three points are detected, telling them apart afterwards should be easy with simple rules. Now for the mask images: based on the coordinate data created earlier, each point is grown into a small disc filled with 1, and everything else is background, set to 0.
```python
# img_h, img_w: size of the source images
for index, row in df.iterrows():
    mask = np.zeros((img_h, img_w), dtype=np.uint8)
    # fill a disc of radius 15 around each annotated point with 1
    mask = cv2.circle(mask, (row.p1x, row.p1y), 15, (1), -1)
    mask = cv2.circle(mask, (row.p2x, row.p2y), 15, (1), -1)
    mask = cv2.circle(mask, (row.p3x, row.p3y), 15, (1), -1)
    save_img(mask, row.file)  # omitted
```
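One plausible implementation of the omitted `save_img`, under my own assumption that masks are written as PNGs named after the source image into a `masks_dir` folder:

```python
def save_img(mask, img_file, masks_dir="masks_dir"):
    # hypothetical layout: one PNG mask per image, same base name
    name = os.path.splitext(os.path.basename(img_file))[0] + ".png"
    cv2.imwrite(os.path.join(masks_dir, name), mask)
```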
Visualized (1 as white, 0 as black), a mask looks like this.
These masks are paired with the body images.
For training I used DeepLab v3 (torchvision). The 120 images were split 8:2 into training and validation sets. That is a rather small dataset, but I did no data augmentation.
Honestly, augmenting would probably be better (I simply found it too tedious); a possible sketch follows.
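The `MaskDataset` class below already accepts a `transforms` callable applied to each item, so a paired transform could be plugged straight in. A sketch of what that might look like (my own addition, not the author's code); horizontal flips are deliberately avoided because they would swap the left and right nipples:

```python
import random

import torchvision.transforms.functional as TF

def paired_augment(item):
    img, mask = item["image"], item["mask"]
    # small random rotation, applied identically to image and mask
    angle = random.uniform(-10, 10)
    img, mask = TF.rotate(img, angle), TF.rotate(mask, angle)
    # brightness jitter on the image only
    img = TF.adjust_brightness(img, random.uniform(0.8, 1.2))
    return {"image": img, "mask": mask}

# dataset = MaskDataset("images_dir", "masks_dir", 0.5, transforms=paired_augment)
```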
Dataset class and training-related functions
```python
import glob
import os

import numpy as np
import torch
from PIL import Image
from torch.utils.data import Dataset

class MaskDataset(Dataset):
    def __init__(self, imgs_dir, masks_dir, scale=1, transforms=None):
        self.imgs_dir = imgs_dir
        self.masks_dir = masks_dir
        self.imgs = list(sorted(glob.glob(os.path.join(imgs_dir, "*.jpg"))))
        self.msks = list(sorted(glob.glob(os.path.join(masks_dir, "*.png"))))
        self.transforms = transforms
        self.scale = scale

    def __len__(self):
        return len(self.imgs)

    @classmethod
    def preprocess(cls, pil_img, scale):
        # Grayscale would probably work fine too, but I skipped it (too much hassle)
        # pil_img = pil_img.convert("L")
        w, h = pil_img.size
        newW, newH = int(scale * w), int(scale * h)
        pil_img = pil_img.resize((newW, newH))
        img_nd = np.array(pil_img)
        if len(img_nd.shape) == 2:
            img_nd = np.expand_dims(img_nd, axis=2)
        # HWC to CHW
        img_trans = img_nd.transpose((2, 0, 1))
        if img_trans.max() > 1:
            img_trans = img_trans / 255
        return img_trans

    def __getitem__(self, i):
        mask_file = self.msks[i]
        img_file = self.imgs[i]
        mask = Image.open(mask_file)
        img = Image.open(img_file)
        img = self.preprocess(img, self.scale)
        mask = self.preprocess(mask, self.scale)
        item = {"image": torch.from_numpy(img), "mask": torch.from_numpy(mask)}
        if self.transforms:
            item = self.transforms(item)
        return item
```
```python
from torchvision import models
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

def create_deeplabv3(num_classes):
    model = models.segmentation.deeplabv3_resnet101(pretrained=True, progress=True)
    # swap in a new head; the ResNet-101 backbone outputs 2048 channels
    model.classifier = DeepLabHead(2048, num_classes)
    # Grayscale input would probably work fine too, but I skipped it (too much hassle)
    # model.backbone.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
    return model
```
```python
import copy
import time

from tqdm import tqdm

def train_model(model, criterion, optimizer, dataloaders, device, num_epochs=25, print_freq=1):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_loss = 1e15
    loss_history = {"train": [], "val": []}
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch + 1, num_epochs))
        print('-' * 10)
        for phase in ["train", "val"]:
            if phase == "train":
                model.train()
            else:
                model.eval()
            for sample in tqdm(iter(dataloaders[phase])):
                imgs = sample["image"].to(device, dtype=torch.float)
                msks = sample["mask"].to(device, dtype=torch.float)
                optimizer.zero_grad()
                with torch.set_grad_enabled(phase == "train"):
                    outputs = model(imgs)
                    loss = criterion(outputs["out"], msks)
                    if phase == "train":
                        loss.backward()
                        optimizer.step()
            # loss of the last batch in this phase (crude, but enough to track progress)
            epoch_loss = loss.item()
            if (epoch + 1) % print_freq == 0:
                print("Epoch: [%d/%d], Loss: %.4f" % (epoch + 1, num_epochs, epoch_loss))
            loss_history[phase].append(epoch_loss)
            # keep a deep copy of the weights whenever the validation loss improves
            if phase == "val" and epoch_loss < best_loss:
                best_loss = epoch_loss
                best_model_wts = copy.deepcopy(model.state_dict())
    time_elapsed = time.time() - since
    print("Training complete in {:.0f}m {:.0f}s".format(time_elapsed // 60, time_elapsed % 60))
    print("Best val loss: {:4f}".format(best_loss))
    model.load_state_dict(best_model_wts)
    return model, loss_history
```
Running the training
```python
from torch import nn, optim
from torch.utils.data import DataLoader, random_split

dataset = MaskDataset("images_dir", "masks_dir", 0.5, transforms=None)

# split into training and validation sets
val_percent = 0.2
batch_size = 4
n_val = int(len(dataset) * val_percent)
n_train = len(dataset) - n_val
train, val = random_split(dataset, [n_train, n_val])
train_loader = DataLoader(train, batch_size=batch_size, shuffle=True,
                          num_workers=8, pin_memory=True, drop_last=True)
val_loader = DataLoader(val, batch_size=batch_size, shuffle=False,
                        num_workers=8, pin_memory=True, drop_last=True)
dataloaders = {"train": train_loader, "val": val_loader}

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# with BCEWithLogitsLoss, binary classification uses a single output channel
num_classes = 1
model = create_deeplabv3(num_classes)
# to resume from a saved checkpoint:
# model.load_state_dict(torch.load("model.pth"))
model.to(device)

# background pixels overwhelmingly outnumber the targets, so compensate with pos_weight
criterion = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(10000.0).to(device))
params = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.SGD(params, lr=0.005, momentum=0.9, weight_decay=0.0005)
optimizer = optim.Adam(params)

total_epoch = 50
model, loss_dict = train_model(model, criterion, optimizer, dataloaders, device, total_epoch)
```
After about 50 epochs, training had more or less converged.
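To check convergence, the `loss_dict` returned by `train_model` above can be plotted; a quick sketch (matplotlib comes preinstalled on Colab):

```python
import matplotlib.pyplot as plt

# one loss value per epoch and phase, as recorded by train_model
plt.plot(loss_dict["train"], label="train")
plt.plot(loss_dict["val"], label="val")
plt.xlabel("epoch")
plt.ylabel("BCE loss")
plt.legend()
plt.show()
```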
The results were generally good, with the three points detected cleanly, but occasionally I got outputs like the following (rendered as a heat map).
Of course there's no such thing as two left nipples, so the small blob at the top right is a false positive. Incidentally, there were no false negatives.
Given inference results like the above, post-processing proceeds as follows:
1. Discard every pixel that is not clearly confident, keeping only strong responses for the next step. The threshold was set empirically to 0.995.
2. Split the surviving pixels into objects (clusters) with cv2.connectedComponents. For details, see How to label connected components with OpenCV - connectedComponents - pynote.
3. Looking over the failure cases, the false-positive regions away from the nipples and navel were always small in area, so keep the three clusters with the largest area. I don't think this kind of countermeasure is particularly robust, but it worked here, so I adopted it.
4. Compute each cluster's centroid with cv2.moments. For details, see Calculating the center of gravity with Python + OpenCV - Introduction to CV image analysis.
5. The affine transformation needs point correspondences, so the ordering of the nipple and navel coordinates must be consistent across images. Every image was taken standing upright, and along the horizontal axis the points always appear as nipple → navel → nipple, so a simple sort by x coordinate is enough.
Inference-time code
```python
# detect the three points from the predicted mask
def triangle_pt(heatmask, thresh=0.995):
    mask = heatmask.copy()
    # 2-4-1. binarize: discard every pixel at or below the threshold
    mask[mask > thresh] = 255
    mask[mask <= thresh] = 0
    mask = mask.astype(np.uint8)
    # 2-4-2. split into connected components
    nlabels, labels = cv2.connectedComponents(mask)
    pt = []
    if nlabels != 4:  # expected case: background + 3 points
        # if there are fewer, do nothing
        # (I really should lower the threshold instead, but that's a hassle)
        if nlabels < 4:
            return None
        # 2-4-3. if there are more than 3 clusters, keep the 3 largest by area
        elif nlabels > 4:
            sum_px = []
            for i in range(1, nlabels):
                sum_px.append((labels == i).sum())
            # +1 to skip the background label 0
            indices = [x + 1 for x in np.argsort(-np.array(sum_px))[:3]]
    else:
        indices = [x for x in range(1, nlabels)]
    # 2-4-4. centroid of each cluster
    for i in indices:
        base = np.zeros_like(mask, dtype=np.uint8)
        base[labels == i] = 255
        mu = cv2.moments(base, False)
        x, y = int(mu["m10"] / mu["m00"]), int(mu["m01"] / mu["m00"])
        pt.append([x, y])
    # 2-4-5. sort by the x coordinate of each centroid (right nipple → navel → left nipple)
    sort_key = lambda v: v[0]
    pt.sort(key=sort_key)
    return np.array(pt)
```
```python
from torchvision import transforms

def correct_img(model, device, in_dir, out_dir,
                draw_heatmap=True, draw_triangle=True, correct=True):
    imgs = []
    base_3p = None
    model.eval()
    with torch.no_grad():
        imglist = sorted(glob.glob(os.path.join(in_dir, "*.jpg")))
        for idx, img_path in enumerate(imglist):
            # batch size 1, to keep things simple
            full_img = Image.open(img_path)
            img = torch.from_numpy(MaskDataset.preprocess(full_img, 0.5))
            img = img.unsqueeze(0)
            img = img.to(device=device, dtype=torch.float32)
            output = model(img)["out"]
            probs = torch.sigmoid(output)
            probs = probs.squeeze(0)
            # scale the half-size probability map back up to the original size
            # (the images are portrait, so the smaller edge equals the width)
            tf = transforms.Compose(
                [
                    transforms.ToPILImage(),
                    transforms.Resize(full_img.size[0]),
                    transforms.ToTensor()
                ]
            )
            probs = tf(probs.cpu())
            full_mask = probs.squeeze().cpu().numpy()
            full_img = np.asarray(full_img).astype(np.uint8)
            full_img = cv2.cvtColor(full_img, cv2.COLOR_RGB2BGR)
            # the three detected points
            triangle = triangle_pt(full_mask)
            if draw_triangle and triangle is not None:
                cv2.drawContours(full_img, [triangle], 0, (0, 0, 255), 5)
            # heat map overlay
            if draw_heatmap:
                full_mask = (full_mask * 255).astype(np.uint8)
                jet = cv2.applyColorMap(full_mask, cv2.COLORMAP_JET)
                alpha = 0.7
                full_img = cv2.addWeighted(full_img, alpha, jet, 1 - alpha, 0)
            # affine transformation onto the first successfully detected frame
            if correct:
                if base_3p is None and triangle is not None:
                    base_3p = triangle
                elif triangle is not None:
                    full_img = p3affine_img(full_img, triangle, base_3p)
            if out_dir is not None:
                cv2.imwrite(os.path.join(out_dir, os.path.basename(img_path)), full_img)
            imgs.append(full_img)
    return imgs

imgs = correct_img(model, device,
                   "images_dir", None,
                   draw_heatmap=False, draw_triangle=False, correct=True)
```
For comparison, the uncorrected time-lapse looks like this.
And the corrected time-lapse looks like this.
By detecting the nipples and navel with deep learning and correcting the images automatically, the time-lapse became dramatically easier to watch, which in turn boosted my motivation to train. Of course, some of you may be thinking **"Couldn't plain, non-deep CV handle this?"**, but in my case, any time spent devising hand-crafted rules is time I'd rather spend raising a barbell, so I went with the brute-force solution. Everything except the annotation tool was developed on Google Colab: 3150 (saikō, i.e. awesome)!
Challenges remain, but worrying too hard only secretes cortisol, so I'm not going to stress over them!
Have a fun muscle-training life!