I tried to visualize the bowing of a violin performance by pose estimation

I tried to detect the feature point of the right wrist joint

ezgif.com-crop-2.gif

Introduction

This article was posted as the 22nd-day entry of the Fujitsu Systems Web Technology Advent Calendar 2019. The content is my personal opinion and is the responsibility of the author alone; it does not represent the organization I belong to. (The usual disclaimer.)

Overview

I enjoy playing the violin as a hobby, but the violin is a hard instrument. Thanks to Shizuka-chan, it is widely known that it is difficult to play, but there are difficulties beyond the playing itself.

image.png

When I play the violin, there is something I always do before playing a piece. It is called bowing: while looking at the score, deciding note by note whether to play each note with an up-bow or a down-bow.

--V → up-bow (from bottom to top)
--П → down-bow (from top to bottom)

image.png

I write the bowing into the score in consideration of the phrasing of the piece and the ease of playing, but this work is quite painstaking. There is no single correct bowing; it also reveals the player's sensibility. And when playing in a quartet or an orchestra, it may need to be coordinated with the other instruments such as viola and cello. It is no exaggeration to say that an automatic bowing annotator is a dream of string players.

So I thought: even without going as far as a fully automatic bowing annotator, couldn't I save time by reading the bowing off other people's performance videos and using it as a reference?

Strategy

The strategy is this.

--Using an open-source library that can estimate poses, record the coordinates of the right wrist holding the bow, moment by moment, from the performance video.
--Overlay the trajectory of the right-wrist coordinates on the score.

With this, I should be able to read the bowing of the performance video perfectly! The library used is tf-pose-estimation, an open-source implementation of OpenPose on TensorFlow.

What is OpenPose?

OpenPose is a technique announced by CMU (Carnegie Mellon University) at CVPR 2017, an international conference on computer vision; it detects human keypoints and estimates the connections between them. With OpenPose, you can obtain the coordinates of feature points of the human body, such as joint positions, as shown below.

dance_foot.gif https://github.com/CMU-Perceptual-Computing-Lab/openpose

What is tf-pose-estimation?

tf-pose-estimation is an implementation of the same neural network as OpenPose on TensorFlow. I chose it this time because errno-mmd's extended fork of tf-pose-estimation looked useful: it has an option to write the estimated two-dimensional joint positions to JSON files, which is exactly what I wanted.

Performance video to analyze

I couldn't find a suitable freely usable violin performance video, so I used a recording of my own playing for the analysis. The piece is the first 10 bars of the first movement of Mozart's Eine kleine Nachtmusik. I played at a constant tempo with a metronome.

Slightly modified to mark only the right wrist joint

By default, tf-pose-estimation marks various feature points such as the eyes, shoulders, elbows, wrists, and ankles, and connects them with lines. This time, I modified the code slightly so that only the right wrist is marked.

tf-pose-estimation/tf_pose/estimator.py



# Around line 440: draw a circle only on the right wrist
if i == CocoPart.RWrist.value:
    cv2.circle(npimg, center, 8, common.CocoColors[i], thickness=3, lineType=8, shift=0)

# Around line 449: disable the line drawing that connects feature points
# cv2.line(npimg, centers[pair[0]], centers[pair[1]], common.CocoColors[pair_order], 3)
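For context, these two edits sit inside TfPoseEstimator.draw_humans, which loops over the detected body parts and draws a circle for each. A rough sketch of the modified loop (the exact code varies by library version, so treat this as an outline rather than a verbatim excerpt):

for human in humans:
    centers = {}
    for i in range(common.CocoPart.Background.value):
        if i not in human.body_parts.keys():
            continue
        body_part = human.body_parts[i]
        center = (int(body_part.x * image_w + 0.5), int(body_part.y * image_h + 0.5))
        centers[i] = center
        # Draw a circle only on the right wrist
        if i == CocoPart.RWrist.value:
            cv2.circle(npimg, center, 8, common.CocoColors[i], thickness=3, lineType=8, shift=0)
    # The subsequent loop that drew lines between keypoint pairs is commented out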

Performing pose estimation

The execution environment is Google Colaboratory. After running the setup commands according to the README.md of https://github.com/errno-mmd/tf-pose-estimation, execute the following command.

%run -i run_video.py --video "/content/drive/My Drive/violin_playing/EineKleineNachtmusik_20191226.mp4" --model mobilenet_v2_large --write_json "/content/drive/My Drive/violin_playing/json" --no_display --number_people_max 1 --write_video "/content/drive/My Drive/violin_playing/EineKleine_keypoints_20191226.mp4"

Using a performance video uploaded to Google Drive in advance as input, this outputs a video with the right-wrist joint point drawn on it, plus JSON files containing the coordinates of the joint points for each frame.

Output video file

Here is the output video, with a few-second excerpt cut out and converted to a GIF.

ezgif.com-crop-2.gif

As far as I can see, it tracks the right-wrist joint point with a reasonable degree of accuracy.

Output JSON file

A JSON file containing the coordinates of the feature points is output for every frame (1/60 of a second). Here is the JSON file for the 10th frame.

000000000010_keypoints.json


{
	"version": 1.2,
	"people": [
		{
			"pose_keypoints_2d": [
				265.5925925925926,
				113.04347826086956,
				0.7988795638084412,
				244.55555555555557,
				147.82608695652175,
				0.762155294418335,
				197.22222222222223,
				149.56521739130434,
				0.6929810643196106,
				165.66666666666669,
				189.56521739130434,
				0.7044630646705627,
				220.88888888888889,
				166.95652173913044,
				0.690696656703949,
				289.2592592592593,
				146.08695652173913,
				0.5453883409500122,
				299.77777777777777,
				212.17391304347825,
				0.6319900751113892,
				339.22222222222223,
				177.3913043478261,
				0.6045356392860413,
				213,
				253.91304347826087,
				0.23064623773097992,
				0,
				0,
				0,
				0,
				0,
				0,
				268.22222222222223,
				276.52173913043475,
				0.2685505151748657,
				0,
				0,
				0,
				0,
				0,
				0,
				257.7037037037037,
				106.08695652173913,
				0.8110038042068481,
				270.85185185185185,
				107.82608695652173,
				0.7383710741996765,
				231.4074074074074,
				107.82608695652173,
				0.7740614414215088,
				0,
				0,
				0
			]
		}
	],
	"face_keypoints_2d": [],
	"hand_left_keypoints_2d": [],
	"hand_right_keypoints_2d": [],
	"pose_keypoints_3d": [],
	"face_keypoints_3d": [],
	"hand_left_keypoints_3d": [],
	"hand_right_keypoints_3d": []
}

Reading the code, the 13th element (counting from 0) of pose_keypoints_2d is the y coordinate of the right wrist. In the example of the 10th frame above, that is 166.95652173913044.
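The layout is the flat COCO keypoint format [x0, y0, confidence0, x1, y1, confidence1, ...], and the right wrist is part number 4 (CocoPart.RWrist), so its y coordinate lands at index 4 * 3 + 1 = 13. A tiny helper to make that arithmetic explicit (the names here are my own):

RWRIST = 4  # CocoPart.RWrist.value in tf-pose-estimation

def keypoint(pose_keypoints_2d, part):
    # Return (x, y, confidence) of one part from the flat keypoint list
    return tuple(pose_keypoints_2d[part * 3 : part * 3 + 3])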

Graphing the y coordinate of the right wrist

I would like to graph it with matplotlib. First, collect the target values from the JSON files output for each frame.

import json
import glob

# Sort the file list so the frames are processed in chronological order
files = sorted(glob.glob('/content/drive/My Drive/violin_playing/json/*'))
x = list(range(len(files)))
y = []
for file in files:
    with open(file) as f:
        df = json.load(f)
        # Element 13 of pose_keypoints_2d = y coordinate of the right wrist
        y.append(df['people'][0]['pose_keypoints_2d'][13])

The variable x holds the frame numbers, and the variable y holds the y coordinates of the right-wrist joint point.
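Incidentally, the element right after the y coordinate (index 14 for the right wrist) is a detection confidence, so a variant of the loop above could drop the frames where the wrist was not detected reliably. A minimal sketch, with a hypothetical threshold of 0.3:

CONF_THRESHOLD = 0.3  # hypothetical cutoff; tune as needed

x, y = [], []
for i, file in enumerate(files):
    with open(file) as f:
        kp = json.load(f)['people'][0]['pose_keypoints_2d']
    if kp[14] >= CONF_THRESHOLD:  # index 14 = right-wrist confidence
        x.append(i)
        y.append(kp[13])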

Then graph it with matplotlib. The x-axis ticks are placed 30 frames apart: since I played the piece at quarter note = 120 and the video is 60 fps, 30 frames correspond exactly to one quarter-note beat, which makes it easier to see the correspondence with the score.

import matplotlib.pyplot as plt
import matplotlib.ticker as ticker

fig = plt.figure(figsize=(120, 10))
ax = fig.add_subplot(1, 1, 1)
# One major tick every 30 frames = one quarter-note beat (60 fps at 120 bpm)
ax.xaxis.set_major_locator(ticker.MultipleLocator(30))
ax.plot(x, y, "red", linestyle='solid')
plt.show()
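To line the curve up with the score afterwards, it also helps to save the figure to an image file (the path here is just an example):

# Save the graph so it can be overlaid on the score later
fig.savefig('/content/drive/My Drive/violin_playing/wrist_y.png', bbox_inches='tight')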

The output graph is here.

image.png

Overlay

Here is the result of superimposing the graph on the score. The overlay was done by hand.

image.png

Where there is a down bow the line should be going down, and where there is an up bow it should be going up, and... well... I guess it's mostly right (^ _ ^;)
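If the tempo really is steady, one could even try to label the bow direction for each beat automatically, for example from the sign of the change in y over each 30-frame beat. A sketch under that assumption (one stroke per beat, which the actual piece of course violates in places; also note that image y grows downward in OpenCV, so the sign-to-direction mapping may need flipping):

import numpy as np

FRAMES_PER_BEAT = 30  # 60 fps at quarter note = 120

y_arr = np.array(y)
for beat in range(len(y_arr) // FRAMES_PER_BEAT):
    seg = y_arr[beat * FRAMES_PER_BEAT:(beat + 1) * FRAMES_PER_BEAT]
    # Wrist moving down in the image means y increases
    direction = 'П (down bow)' if seg[-1] > seg[0] else 'V (up bow)'
    print(f'beat {beat + 1}: {direction}')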

Summary

To be honest, it is not practically usable as it is. Identifying the bowing of fast passages such as 16th notes seems difficult with this method. For a more relaxed piece, it might be helpful to some extent.

That said, the following issues remain when it comes to matching the graph against the musical score.

--The width of one bar is not uniform across the score
--Similarly, the beats are not evenly spaced on the page
--The tempo can change during a piece (Adagio → Allegro, etc.)
--Even under the same tempo marking, the tempo fluctuates slightly with the phrasing

If the matching with the score cannot be automated, bowing annotation will not actually become more efficient, so the problems above have to be overcome. Recognizing the pitches and mapping them onto the score may be one solution, though the difficulty seems high (^ _ ^;)

This time I focused on making the work of bowing annotation more efficient, but the approach of visualizing the differences in bow handling between experienced players and beginners, and drawing insights from that, also seems interesting. I felt that pose estimation from video is a wonderful technology with many potential applications.
