On the fashion coordination site WEAR, snapshots of various people are uploaded. Looking at them, you can see that the models put some thought into how they stand so that their clothes are easier to see, such as turning slightly sideways or raising one leg a little, rather than just standing stiffly facing the camera. This time, I estimated the poses of the snapshot models and clustered the estimation results.
https://github.com/ildoonet/tf-pose-estimation
I used this model. Given the path to an image, the following function returns a list that concatenates the x and y coordinates of each detected body point (body_parts). Not every point can be estimated in every image, so it only returns a result for images where all of the skeleton parts except the eyes and ears (max_n_body_parts of them) could be estimated.
from tf_pose import common
import cv2
import numpy as np
import pandas as pd
import requests
from io import BytesIO
from PIL import Image
from tf_pose.estimator import TfPoseEstimator
from tf_pose.networks import get_graph_path, model_wh

# Load the model first
model = 'mobilenet_thin'
resize = '432x368'
w, h = model_wh(resize)
if w == 0 or h == 0:
    e = TfPoseEstimator(get_graph_path(model), target_size=(432, 368))
else:
    e = TfPoseEstimator(get_graph_path(model), target_size=(w, h))
def img2vec(estimator, w, h, img_path, resize=resize, local_file=True):
    max_n_body_parts = 14  # Omit eyes and ears
    resize_out_ratio = 4.0
    if local_file:
        image = common.read_imgfile(img_path, None, None)
        if image is None:
            return  # the local file could not be read
    else:
        res = requests.get(img_path)
        image = np.array(Image.open(BytesIO(res.content)).convert('RGB'))
    humans = estimator.inference(image, resize_to_default=(w > 0 and h > 0), upsample_size=resize_out_ratio)
    image = TfPoseEstimator.draw_humans(image, humans, imgcopy=False)
    image_h, image_w = image.shape[:2]
    dfs = pd.DataFrame(index=[])
    columns = ['human', 'point', 'x', 'y']
    xx = 0
    if len(humans) != 1:
        return  # keep only images with exactly one detected person
    for human in humans:  # in practice there is only one
        xx = xx + 1
        for m in human.body_parts:
            body_part = human.body_parts[m]
            center = (int(body_part.x * image_w + 0.5), int(body_part.y * image_h + 0.5))
            row = [[xx, m, center[0], center[1]]]
            df = pd.DataFrame(data=row, columns=columns)
            dfs = pd.concat([dfs, df])
    dfs = dfs[dfs['point'] < max_n_body_parts]
    if len(dfs) != max_n_body_parts:
        return  # some of the 14 skeleton parts were not detected
    return np.array(dfs.x).tolist() + np.array(dfs.y).tolist()
Also, to correct for where the body sits within the image, min-max normalization is applied to the x coordinates and the y coordinates separately, so the actual vectorization step looks like this.
def min_max_norm(l):
    min_, max_ = min(l), max(l)
    return [(l_ - min_) / (max_ - min_) for l_ in l]

vec = img2vec(e, w, h, f"{file_path}", resize="432x368", local_file=False)
min_max_norm(vec[:14]) + min_max_norm(vec[14:])  # This is the vector for one image (a list)
This time, I collected data separately for men's snapshots and women's snapshots. Also, for winter clothing I felt that pose estimation would be somewhat harder because of coats, long skirts, mufflers, and so on, so I focused on images from the summer season.
Appending the vectors produced by the function above to lists named men_vecs and women_vecs gave 708 and 374 vectors, respectively.
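As a rough sketch of that collection step (the list of image URLs, here called men_image_urls, is a hypothetical placeholder), it might look like this:

men_vecs = []
for url in men_image_urls:  # hypothetical list of snapshot image URLs
    vec = img2vec(e, w, h, url, local_file=False)
    if vec is None:  # skip images where the 14 skeleton parts were not all detected
        continue
    men_vecs.append(min_max_norm(vec[:14]) + min_max_norm(vec[14:]))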
We now cluster the sets of vectors for men and for women. Since I was not familiar with vectorization and clustering methods specific to pose coordinates, I simply clustered the vectors that concatenate the x and y coordinates with k-means in that space.
First, to decide on the number of clusters (k), we compute something like a loss for each candidate number of clusters.
According to sklearn's KMeans documentation:
inertia_ : float Sum of squared distances of samples to their closest cluster center.
So we plot this value (the sum of squared distances from each point to its nearest centroid) against the number of clusters as that loss-like quantity.
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Calculate the inertia while varying k
errors = []
for k in range(1, 14):
    kmeans_model = KMeans(n_clusters=k, random_state=0).fit(np.array(men_vecs))  # or women_vecs
    errors.append(kmeans_model.inertia_)
plt.plot(range(1, 14), errors)
[Elbow plots: men's image clustering (left), women's image clustering (right)]
From here on, assume that men_vecs or women_vecs has been assigned to vecs.
vecs = men_vecs
# vecs = women_vecs
Based on the plot above, I decided to use k = 3 this time. Next, check how the data scatter across clusters with a two-dimensional plot using principal component analysis (PCA).
k = 3
kmeans_model = KMeans(n_clusters=k, random_state=0).fit(np.array(vecs))
labels = kmeans_model.labels_
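As a quick sanity check (a small sketch, not part of the original pipeline), the number of images assigned to each cluster can be counted from the labels:

from collections import Counter

print(Counter(labels))  # e.g. Counter({0: ..., 1: ..., 2: ...})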
Points are colored by cluster, and the cluster centroids are all drawn in black.
from sklearn.decomposition import PCA
import seaborn as sns

pca = PCA()
pca.fit(np.array(vecs))
feature = pca.transform(np.array(vecs))
centroids_pca = pca.transform(kmeans_model.cluster_centers_)

# NOTE: give each cluster its own color; the centroids get black
color_codes = list(sns.color_palette(n_colors=k).as_hex())
colors = [color_codes[label] for label in labels]
colors += ['#000000' for i in range(k)]

plt.figure(figsize=(6, 6))
for x, y in zip(feature[:, 0], feature[:, 1]):
    plt.text(x, y, '', alpha=0.8, size=10)  # placeholder hook for labeling points
features = np.append(feature, centroids_pca, axis=0)
plt.scatter(features[:, 0], features[:, 1], alpha=0.8, color=colors)
plt.show()
[PCA plots: clustering of men's images (left), clustering of women's images (right)]
Now that the pose-estimation clustering is done, we want to turn the centroid of each cluster back from a vector into a plot of x and y pose coordinates. Each body_part is drawn as a scatter point, and adjacent parts are connected by line segments to form the actual skeleton.
def show_poses(vecs_list, m=50):
    n_poses = len(vecs_list)
    fig, axes = plt.subplots(n_poses, 1, figsize=(276/m, 368/m*n_poses))
    for i, vecs in enumerate(vecs_list):
        x, y = vecs[:14], vecs[14:]
        # pairs of body_part indices to connect: neck, arms, torso, and legs
        links = [[0, 1], [1, 2], [2, 3], [3, 4], [1, 5], [5, 6], [6, 7], [1, 8], [8, 9], [9, 10], [1, 11], [11, 12], [12, 13]]
        # negate y so the head ends up at the top of the plot
        axes[i].scatter(x, [-y_ for y_ in y]) if n_poses > 1 else plt.scatter(x, [-y_ for y_ in y])
        for l in links:
            axes[i].plot([x[l[0]], x[l[1]]], [-y[l[0]], -y[l[1]]]) if n_poses > 1 else plt.plot([x[l[0]], x[l[1]]], [-y[l[0]], -y[l[1]]])
Displaying the estimation result for the first women's image gives the following; you can picture a stylish standing pose.
show_poses([women_vecs[0]])
The following is the result of converting each cluster centroid into a pose plot.
show_poses([v.tolist() for v in kmeans_model.cluster_centers_])
[Centroid pose plots: rows for men's and women's images, columns for Cluster 1 to Cluster 3]
I can see that the third men's cluster is standing at an angle with one leg extended, but overall the results were hard to interpret.
Possible improvements to the method are:

- Vectorization: simply concatenating the x and y coordinates may have been too simplistic.
- Normalization: I scaled both the x coordinates and the y coordinates to the 0 to 1 range, but their true ranges differ (a human figure is taller than it is wide), so the proportions get distorted.

The latter is probably why the pose plots in the second half of the article look wider than the actual images; I hope there is a better way (one possible sketch follows).
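As one possible direction, here is a minimal sketch of a normalization that preserves the aspect ratio (the helper name norm_keep_aspect is hypothetical; it shifts both axes to zero and divides by a single shared scale):

def norm_keep_aspect(x, y):
    # shift both axes so they start at 0
    x0 = [x_ - min(x) for x_ in x]
    y0 = [y_ - min(y) for y_ in y]
    # divide by one shared scale so the body's proportions are preserved
    scale = max(max(x0), max(y0))
    return [x_ / scale for x_ in x0], [y_ / scale for y_ in y0]

# usage sketch: vec = img2vec(...); xs, ys = norm_keep_aspect(vec[:14], vec[14:])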