On the fashion coordination site WEAR, snapshots of various people are uploaded. Looking at them, you can see that the models put some thought into how they stand so that their clothes are easier to see, such as turning slightly sideways or raising one leg a little, rather than just standing stiffly facing the camera. This time, I estimated the poses of the snapshot models and clustered the estimation results.
https://github.com/ildoonet/tf-pose-estimation
I used this model. Given the path to an image, the following function returns a list that concatenates the x and y coordinates of each detected body point (body_parts). Not every point can be estimated in every image, so it only returns a result for images where all of the skeleton parts except the eyes and ears (max_n_body_parts of them) could be estimated.
from tf_pose import common
import cv2
import numpy as np
import pandas as pd
import requests
from io import BytesIO
from PIL import Image
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

# Load the model first
model = 'mobilenet_thin'
resize = '432x368'
w, h = model_wh(resize)
if w == 0 or h == 0:
    e = TfPoseEstimator(get_graph_path(model), target_size=(432, 368))
else:
    e = TfPoseEstimator(get_graph_path(model), target_size=(w, h))
def img2vec(estimator, w, h, img_path, resize=resize, local_file=True):
    max_n_body_parts = 14  # Omit eyes and ears
    resize_out_ratio = 4.0
    if local_file:
        image = common.read_imgfile(img_path, None, None)
        if image is None:
            return  # the local file could not be read
    else:
        res = requests.get(img_path)
        image = np.array(Image.open(BytesIO(res.content)).convert('RGB'))
    humans = estimator.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=resize_out_ratio)
    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    image_h, image_w = image.shape[:2]
    dfs = pd.DataFrame(index=[])
    columns = ['human', 'point', 'x', 'y']
    xx = 0
    if len(humans) != 1:
        return  # keep only images with exactly one detected person
    for human in humans:  # in practice there is only one
        xx = xx + 1
        for m in human.body_parts:
            body_part = human.body_parts[m]
            center = (int(body_part.x * image_w + 0.5), int(body_part.y * image_h + 0.5))
            row = [[xx, m, center[0], center[1]]]
            df = pd.DataFrame(data=row, columns=columns)
            dfs = pd.concat([dfs, df])
    dfs = dfs[dfs['point'] < max_n_body_parts]
    if len(dfs) != max_n_body_parts:
        return  # some of the 14 skeleton parts were not detected
    return np.array(dfs.x).tolist() + np.array(dfs.y).tolist()
Also, to correct for where the body sits within the image, min-max normalization is applied to the x coordinates and the y coordinates separately, so the actual vectorization step looks like this.
def min_max_norm(l):
    min_, max_ = min(l), max(l)
    return [(l_ - min_) / (max_ - min_) for l_ in l]

vec = img2vec(e, w, h, f"{file_path}", resize="432x368", local_file=False)
min_max_norm(vec[:14]) + min_max_norm(vec[14:])  # This is the vector for one image (a list)
This time, I collected data separately for men's snapshots and women's snapshots. Also, for winter clothing I felt that pose estimation would be somewhat harder because of coats, long skirts, mufflers, and so on, so I focused on images from the summer season.
Appending the vectors produced by the function above to lists named men_vecs and women_vecs gave 708 and 374 vectors, respectively.
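As a rough sketch of that collection step (the list of image URLs, here called men_image_urls, is a hypothetical placeholder), it might look like this:

men_vecs = []
for url in men_image_urls:  # hypothetical list of snapshot image URLs
    vec = img2vec(e, w, h, url, local_file=False)
    if vec is None:  # skip images where the 14 skeleton parts were not all detected
        continue
    men_vecs.append(min_max_norm(vec[:14]) + min_max_norm(vec[14:]))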
We now cluster the sets of vectors for men and for women. Since I was not familiar with vectorization and clustering methods specific to pose coordinates, I simply clustered the vectors that concatenate the x and y coordinates with k-means in that space.
First, to decide on the number of clusters (k), we compute something like a loss for each candidate number of clusters.
According to sklearn's KMeans documentation:
inertia_ : float Sum of squared distances of samples to their closest cluster center.
So we plot this value (the sum of squared distances from each point to its nearest centroid) against the number of clusters as that loss-like quantity.
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Calculate the inertia while varying k
errors = []
for k in range(1, 14):
    kmeans_model = KMeans(n_clusters=k, random_state=0).fit(np.array(men_vecs))  # or women_vecs
    errors.append(kmeans_model.inertia_)
plt.plot(range(1, 14), errors)
[Elbow plots: men's image clustering (left), women's image clustering (right)]
From here on, assume that men_vecs or women_vecs has been assigned to vecs.
vecs = men_vecs
# vecs = women_vecs
Based on the plot above, I decided to use k = 3 this time. Next, check how the data scatter across clusters with a two-dimensional plot using principal component analysis (PCA).
k = 3
kmeans_model = KMeans(n_clusters=k, random_state=0).fit(np.array(vecs))
labels = kmeans_model.labels_
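As a quick sanity check (a small sketch, not part of the original pipeline), the number of images assigned to each cluster can be counted from the labels:

from collections import Counter

print(Counter(labels))  # e.g. Counter({0: ..., 1: ..., 2: ...})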
Points are colored by cluster, and the cluster centroids are all drawn in black.
from sklearn.decomposition import PCA
import seaborn as sns

pca = PCA()
pca.fit(np.array(vecs))
feature = pca.transform(np.array(vecs))
centroids_pca = pca.transform(kmeans_model.cluster_centers_)

# NOTE: give each cluster its own color; the centroids get black
color_codes = list(sns.color_palette(n_colors=k).as_hex())
colors = [color_codes[label] for label in labels]
colors += ['#000000' for i in range(k)]

plt.figure(figsize=(6, 6))
for x, y in zip(feature[:, 0], feature[:, 1]):
    plt.text(x, y, '', alpha=0.8, size=10)  # placeholder hook for labeling points
features = np.append(feature, centroids_pca, axis=0)
plt.scatter(features[:, 0], features[:, 1], alpha=0.8, color=colors)
plt.show()
[PCA plots: clustering of men's images (left), clustering of women's images (right)]
Now that the pose-estimation clustering is done, we want to turn the centroid of each cluster back from a vector into a plot of x and y pose coordinates. Each body_part is drawn as a scatter point, and adjacent parts are connected by line segments to form the actual skeleton.
def show_poses(vecs_list, m=50):
    n_poses = len(vecs_list)
    fig, axes = plt.subplots(n_poses, 1, figsize=(276/m, 368/m*n_poses))
    for i, vecs in enumerate(vecs_list):
        x, y = vecs[:14], vecs[14:]
        # pairs of body_part indices to connect: neck, arms, torso, and legs
        links = [[0, 1], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7], [1, 8], [8, 9], [9, 10], [1, 11], [11, 12], [12, 13]]
        # negate y so the head ends up at the top of the plot
        axes[i].scatter(x, [-y_ for y_ in y]) if n_poses > 1 else plt.scatter(x, [-y_ for y_ in y])
        for l in links:
            axes[i].plot([x[l[0]], x[l[1]]], [-y[l[0]], -y[l[1]]]) if n_poses > 1 else plt.plot([x[l[0]], x[l[1]]], [-y[l[0]], -y[l[1]]])
Displaying the estimation result for the first women's image gives the following; you can picture a stylish standing pose.
show_poses([women_vecs[0]])
The following is the result of converting each cluster centroid into a pose plot.
show_poses([v.tolist() for v in kmeans_model.cluster_centers_])
[Centroid pose plots: rows for men's and women's images, columns for Cluster 1 to Cluster 3]
I can see that the third men's cluster is standing at an angle with one leg extended, but overall the results were hard to interpret.
Possible improvements to the method are:

- Vectorization: simply concatenating the x and y coordinates may have been too simplistic.
- Normalization: I scaled both the x coordinates and the y coordinates to the 0 to 1 range, but their true ranges differ (a human figure is taller than it is wide), so the proportions get distorted.

The latter is probably why the pose plots in the second half of the article look wider than the actual images; I hope there is a better way (one possible sketch follows).
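As one possible direction, here is a minimal sketch of a normalization that preserves the aspect ratio (the helper name norm_keep_aspect is hypothetical; it shifts both axes to zero and divides by a single shared scale):

def norm_keep_aspect(x, y):
    # shift both axes so they start at 0
    x0 = [x_ - min(x) for x_ in x]
    y0 = [y_ - min(y) for y_ in y]
    # divide by one shared scale so the body's proportions are preserved
    scale = max(max(x0), max(y0))
    return [x_ / scale for x_ in x0], [y_ / scale for y_ in y0]

# usage sketch: vec = img2vec(...); xs, ys = norm_keep_aspect(vec[:14], vec[14:])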