DanceDanceRevolution [^DDR] is a music game developed by KONAMI. Every chart in DanceDanceRevolution has a difficulty level [^level], which indicates how hard that chart is to play.
Apart from that, there is a mechanism called the groove radar, which shows the character of a chart as five values: STREAM (roughly, overall note density), VOLTAGE (peak density), AIR (jumps), FREEZE (freeze arrows), and CHAOS (irregular rhythms).
Groove radar values can be computed mechanically and exactly from the chart data itself. The formula has not been published, but volunteer players have reverse-engineered it with considerable accuracy.
The difficulty level, on the other hand, is assigned by hand by the developers. It is therefore sometimes revised when the game is upgraded to a new version.
So, can the difficulty level be estimated from the groove radar values? Let's find out.
[^DDR]: The name is long and I would like to abbreviate it, but on Qiita an abbreviation would be a hindrance to people trying to look it up, so I spell it out.
[^level]: Since it was once indicated by foot icons, levels are sometimes written like "foot 16".
Imports and helper functions.
```python
import math
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from pycm import ConfusionMatrix
from scipy.optimize import minimize, differential_evolution, Bounds


def ustd_coefficient(n):
    # Correction factor that turns np.std (ddof=0) into an unbiased
    # estimator of the standard deviation itself (the 1/c4 correction).
    try:
        return math.sqrt(n / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)
    except OverflowError:
        # The gamma ratio overflows for large n; fall back to its asymptotic form.
        return math.sqrt(n / (n - 1.5))


def std_u(a):
    # Unbiased estimate of the standard deviation of a.
    return np.std(a) * ustd_coefficient(len(a))


oo = math.inf
```
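For example (a small illustrative check, not from the original post): for normally distributed samples, `std_u` is unbiased for $\sigma$ itself, unlike `np.std(..., ddof=1)`, which only unbiases the variance:

```python
# With n = 5, the ddof=1 estimator underestimates sigma by the factor c4(5) ~ 0.94.
rng = np.random.default_rng(0)
samples = rng.normal(0., 1., size=(10000, 5))
print(np.mean([std_u(row) for row in samples]))   # ~1.00
print(np.mean(np.std(samples, axis=1, ddof=1)))   # ~0.94
```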
Now read the chart data. I took the old-song and new-song lists for DanceDanceRevolution A20 from the BEMANI wiki as it was at the time and saved them as CSV files, which are placed here. The old songs will serve as training data for fitting, and the new songs as evaluation data. Let's read both into DataFrames.
```python
old_csv = Path('./old.csv')
new_csv = Path('./new.csv')
train_df = pd.read_csv(old_csv)
test_df = pd.read_csv(new_csv)
display(train_df)
display(test_df)
```
| | VERSION | MUSIC | SEQUENCE | LEVEL | STREAM | VOLTAGE | AIR | FREEZE | CHAOS |
|---|---|---|---|---|---|---|---|---|---|
0 | DanceDanceRevolution A | Words of love | BEGINNER | 3 | 21 | 22 | 7 | 26 | 0 |
1 | DanceDanceRevolution A | Words of love | BASIC | 5 | 34 | 22 | 18 | 26 | 0 |
2 | DanceDanceRevolution A | Words of love | DIFFICULT | 7 | 43 | 34 | 23 | 26 | 7 |
3 | DanceDanceRevolution A | Words of love | EXPERT | 11 | 63 | 45 | 21 | 25 | 28 |
4 | DanceDanceRevolution A | Amanojaku | BEGINNER | 3 | 20 | 25 | 0 | 0 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
3390 | DanceDanceRevolution 1st | PARANOiA | EXPERT | 11 | 67 | 52 | 25 | 0 | 17 |
3391 | DanceDanceRevolution 1st | TRIP MACHINE | BEGINNER | 3 | 25 | 26 | 5 | 0 | 0 |
3392 | DanceDanceRevolution 1st | TRIP MACHINE | BASIC | 8 | 47 | 40 | 14 | 0 | 4 |
3393 | DanceDanceRevolution 1st | TRIP MACHINE | DIFFICULT | 9 | 52 | 40 | 30 | 0 | 7 |
3394 | DanceDanceRevolution 1st | TRIP MACHINE | EXPERT | 10 | 56 | 53 | 36 | 0 | 12 |
3395 rows × 9 columns
| | VERSION | MUSIC | SEQUENCE | LEVEL | STREAM | VOLTAGE | AIR | FREEZE | CHAOS |
|---|---|---|---|---|---|---|---|---|---|
0 | DanceDanceRevolution A20 | Okay! Lovely! Sweetie! Darling! | BEGINNER | 3 | 18 | 21 | 5 | 16 | 0 |
1 | DanceDanceRevolution A20 | Okay! Lovely! Sweetie! Darling! | BASIC | 7 | 37 | 28 | 18 | 39 | 0 |
2 | DanceDanceRevolution A20 | Okay! Lovely! Sweetie! Darling! | DIFFICULT | 12 | 60 | 56 | 54 | 55 | 21 |
3 | DanceDanceRevolution A20 | Okay! Lovely! Sweetie! Darling! | EXPERT | 15 | 95 | 99 | 30 | 25 | 100 |
4 | DanceDanceRevolution A20 | Revolution Passionate | BEGINNER | 3 | 16 | 16 | 1 | 35 | 0 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
380 | DanceDanceRevolution A20 | 50th Memorial Songs -The BEMANI History- | EXPERT | 13 | 63 | 79 | 14 | 62 | 63 |
381 | DanceDanceRevolution A20 | 50th Memorial Songs -When we are two ~ under the cherry bl... | BEGINNER | 3 | 17 | 20 | 3 | 46 | 0 |
382 | DanceDanceRevolution A20 | 50th Memorial Songs -When we are two ~ under the cherry bl... | BASIC | 7 | 40 | 33 | 36 | 29 | 0 |
383 | DanceDanceRevolution A20 | 50th Memorial Songs -When we are two ~ under the cherry bl... | DIFFICULT | 9 | 50 | 46 | 47 | 3 | 6 |
384 | DanceDanceRevolution A20 | 50th Memorial Songs -When we are two ~ under the cherry bl... | EXPERT | 12 | 73 | 60 | 60 | 15 | 32 |
385 rows × 9 columns
Next, standardize each groove radar value: transform it as $z = (x - \mu) / \sigma$ using the training-data mean $\mu$ and unbiased standard deviation $\sigma$, so that the training data has mean 0 and standard deviation 1, and apply the same train-fitted transformation to the evaluation data.
```python
grs = ['STREAM', 'VOLTAGE', 'AIR', 'FREEZE', 'CHAOS']
sgrs = ['S_{}'.format(gr) for gr in grs]
m = {}
s = {}
for gr, sgr in zip(grs, sgrs):
    v = train_df.loc[:, gr].values
    v_t = test_df.loc[:, gr].values
    m[gr] = np.mean(v)   # mean of the training data
    s[gr] = std_u(v)     # unbiased std of the training data
    train_df[sgr] = (v - m[gr]) / s[gr]
    test_df[sgr] = (v_t - m[gr]) / s[gr]  # reuse the train statistics
display(train_df.loc[:, sgrs])
display(test_df.loc[:, sgrs])
```
| | S_STREAM | S_VOLTAGE | S_AIR | S_FREEZE | S_CHAOS |
|---|---|---|---|---|---|
0 | -0.981448 | -0.838977 | -0.636332 | 0.056063 | -0.661167 |
1 | -0.534364 | -0.838977 | -0.160513 | 0.056063 | -0.661167 |
2 | -0.224844 | -0.405051 | 0.055768 | 0.056063 | -0.441192 |
3 | 0.462978 | -0.007285 | -0.030744 | 0.014296 | 0.218735 |
4 | -1.015839 | -0.730495 | -0.939125 | -1.029883 | -0.661167 |
... | ... | ... | ... | ... | ... |
3390 | 0.600542 | 0.245838 | 0.142280 | -1.029883 | -0.126941 |
3391 | -0.843883 | -0.694335 | -0.722844 | -1.029883 | -0.661167 |
3392 | -0.087279 | -0.188088 | -0.333538 | -1.029883 | -0.535467 |
3393 | 0.084676 | -0.188088 | 0.358562 | -1.029883 | -0.441192 |
3394 | 0.222240 | 0.281999 | 0.618099 | -1.029883 | -0.284066 |
3395 rows × 5 columns
| | S_STREAM | S_VOLTAGE | S_AIR | S_FREEZE | S_CHAOS |
|---|---|---|---|---|---|
0 | -1.08462 | -0.87514 | -0.72284 | -0.36161 | -0.66117 |
1 | -0.43119 | -0.62201 | -0.16051 | 0.599036 | -0.66117 |
2 | 0.359805 | 0.39048 | 1.396711 | 1.26731 | -0.00124 |
3 | 1.563493 | 1.945381 | 0.358562 | 0.014296 | 2.481343 |
4 | -1.1534 | -1.05594 | -0.89587 | 0.431967 | -0.66117 |
... | ... | ... | ... | ... | ... |
380 | 0.462978 | 1.222171 | -0.33354 | 1.55968 | 1.318614 |
381 | -1.11901 | -0.9113 | -0.80936 | 0.891406 | -0.66117 |
382 | -0.32802 | -0.44121 | 0.618099 | 0.181364 | -0.66117 |
383 | 0.015894 | 0.028875 | 1.093917 | -0.90458 | -0.47262 |
384 | 0.806889 | 0.535122 | 1.656248 | -0.40338 | 0.344436 |
385 rows × 5 columns
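As an aside, roughly the same standardization could be done with scikit-learn's `StandardScaler` (assuming scikit-learn is available; note that it uses the biased ddof=0 estimator, so the values differ very slightly from the `std_u` version above):

```python
from sklearn.preprocessing import StandardScaler

scaler = StandardScaler().fit(train_df[grs])   # fit on the training data only
train_scaled = scaler.transform(train_df[grs])
test_scaled = scaler.transform(test_df[grs])   # reuse the train statistics
```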
Next, extract a rank-2 tensor (a matrix) holding each chart's groove radar values and a rank-1 tensor (a vector) holding each chart's difficulty level.
```python
train_sgr_arr = train_df.loc[:, sgrs].values
test_sgr_arr = test_df.loc[:, sgrs].values
train_level_arr = train_df.loc[:, 'LEVEL'].values
test_level_arr = test_df.loc[:, 'LEVEL'].values
```
Multiple regression analysis is based on the following idea: given explanatory variables $x_1, \dots, x_n$ and an objective variable $y$, assume the model

$$y = k_0 + k_1 x_1 + k_2 x_2 + \cdots + k_n x_n + e$$

and find the coefficients $k_0, \dots, k_n$ that minimize the error. So first, define the function we want to minimize: the sum of squared errors $\sum e^2$.
```python
def hadprosum(a, b):
    # Row-wise dot product: Hadamard product followed by a sum over columns.
    return (a * b).sum(axis=1)


def estimate(x, sgr_arr):
    # x[0] is the constant term, x[1:] are the coefficients of the five radar values.
    x_const = x[0]
    x_coef = x[1:]
    return hadprosum(sgr_arr, x_coef) + x_const


def sqerr(x):
    # Sum of squared errors over the training data.
    est = estimate(x, train_sgr_arr)
    return ((est - train_level_arr) ** 2).sum()
```
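Incidentally, this is ordinary least squares, so the optimum can also be obtained in closed form. A quick cross-check with `np.linalg.lstsq` (an illustrative sketch; note that the bounded search below restricts the coefficients to $[0, 10]$, so the two only agree when the unconstrained optimum already lies inside that box):

```python
# Closed-form least squares: prepend a column of ones for the constant term.
A = np.hstack([np.ones((len(train_sgr_arr), 1)), train_sgr_arr])
coef, *_ = np.linalg.lstsq(A, train_level_arr, rcond=None)
print(coef)  # [constant, STREAM, VOLTAGE, AIR, FREEZE, CHAOS]
```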
Give this function to SciPy's `differential_evolution`. For the search range, I settled on bounds that seemed generous enough after some trial and error.
```python
bounds = Bounds([0.] * 6, [10.] * 6)
result = differential_evolution(sqerr, bounds, seed=300)
print(result)
```
```
     fun: 5170.056057917698
     jac: array([-0.00236469, 0.14933903, 0.15834303, 0.07094059, 0.01737135,
       0.1551598 ])
 message: 'Optimization terminated successfully.'
    nfev: 3546
     nit: 37
 success: True
       x: array([8.04447683, 2.64586828, 0.58686288, 0.42785461, 0.45934494,
       0.4635763 ])
```
Looking at the coefficients, STREAM has by far the largest influence, followed by VOLTAGE, then CHAOS, FREEZE, and AIR at similar levels.
Now, let's evaluate with the parameters actually obtained.
First, define a prediction function: given the parameters and a matrix of groove radar values, it returns a vector of predicted difficulty levels.
```python
def pred1(x, sgr_arr):
    # Clip to the valid level range (1 to 19) and round to integer levels.
    est = estimate(x, sgr_arr).clip(1., 19.)
    return np.round(est).astype(int)
```
Passing this function's return value and the actual difficulty levels to PyCM's `ConfusionMatrix` creates a confusion matrix object; from its properties we get the accuracy and the macro-averaged F1 score.
```python
train_pred1_arr = pred1(result.x, train_sgr_arr)
test_pred1_arr = pred1(result.x, test_sgr_arr)
train_cm1 = ConfusionMatrix(train_level_arr, train_pred1_arr)
test_cm1 = ConfusionMatrix(test_level_arr, test_pred1_arr)
print('====================')
print('Train Score')
print(' Accuracy: {}'.format(train_cm1.Overall_ACC))
print(' Fmeasure: {}'.format(train_cm1.F1_Macro))
print('====================')
print('Test Score')
print(' Accuracy: {}'.format(test_cm1.Overall_ACC))
print(' Fmeasure: {}'.format(test_cm1.F1_Macro))
print('====================')
```
```
====================
Train Score
 Accuracy: 0.33431516936671574
 Fmeasure: 0.2785969345790368
====================
Test Score
 Accuracy: 0.3142857142857143
 Fmeasure: 0.24415916194348555
====================
```
The test accuracy was 31.4%, a much worse result than I expected. Let's draw the confusion matrix as a heatmap with Seaborn.
```python
plt.figure(figsize=(10, 10), dpi=200)
sns.heatmap(pd.DataFrame(test_cm1.table), annot=True, square=True, cmap='Blues')
plt.show()
```
The horizontal axis is the actual difficulty and the vertical axis is the predicted difficulty. Low-difficulty charts tend to be predicted too high, mid-range charts too low, and the highest charts too high again: instead of following a straight line, the heatmap bends into a bow shape.
Next, let's try logistic regression. Logistic regression is suited to analyses where the objective variable is a probability, modeled as a logistic curve:

$$p = \frac{1}{1 + e^{-(k_0 + k_1 x_1 + k_2 x_2 + \cdots + k_n x_n)}}$$

This time we are not dealing with a binary outcome but with membership in an ordered set of classes. In that case, assume multiple logistic curves that differ only in the constant term $k_0$, and interpret them as "the probability that the level is 2 or higher", "the probability that the level is 3 or higher", and so on. The probability that the level is exactly 2 is then "the probability of level 2 or higher" minus "the probability of level 3 or higher", which lets us compute the likelihood.
First, convert the difficulty levels into a one-hot rank-2 tensor for the likelihood calculation.
```python
# One column per difficulty level (1 to 19).
train_level_onehot_arr = np.zeros(train_level_arr.shape + (19,))
for i, l in np.ndenumerate(train_level_arr):
    train_level_onehot_arr[i, l - 1] = 1.
```
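As an aside, the same one-hot encoding can be written more compactly by indexing into an identity matrix:

```python
# Equivalent one-hot encoding via fancy indexing (an alternative sketch).
train_level_onehot_arr = np.eye(19)[train_level_arr - 1]
```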
Next, the function to minimize. Since we are minimizing, we define the log-likelihood described above with its sign flipped.
```python
def upperscore(x, sgr_arr):
    # x[:18] are the threshold constants, x[18:] the radar coefficients.
    # Insert +/- infinity at both ends so that P(level >= 1) = 1 and
    # P(level >= 20) = 0.
    x_const = np.append(np.append(oo, x[:18].copy()), -oo)
    x_coef = x[18:]
    var = np.asarray([hadprosum(sgr_arr, x_coef)]).T  # shape (n, 1)
    cons = np.asarray([x_const])                      # shape (1, 20)
    return 1 / (1 + np.exp(-(var + cons)))            # P(level >= j), shape (n, 20)


def score(x, sgr_arr):
    us = upperscore(x, sgr_arr)
    us_2 = np.roll(us, -1)  # shift by one; only the last column is corrupted by the wrap-around
    # Subtract the shifted copy and drop the corrupted last column to get
    # P(level == j) for j = 1..19.
    return np.delete(us - us_2, -1, axis=1)


def mloglh(x):
    # Negative log-likelihood; clip to avoid log(0).
    sc = score(x, train_sgr_arr)
    return -(np.log((sc * train_level_onehot_arr).sum(axis=1).clip(1e-323, oo)).sum())
```
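As a quick sanity check (an illustrative sketch with arbitrary, properly ordered parameters, not the fitted ones), each row of `score` should sum to 1, since $\sigma(+\infty) = 1$ and $\sigma(-\infty) = 0$:

```python
# The 18 constants must be decreasing for the probabilities to be non-negative.
rng = np.random.default_rng(0)
x_demo = np.concatenate((np.sort(rng.uniform(-60., 20., 18))[::-1],
                         rng.uniform(0., 10., 5)))
print(score(x_demo, train_sgr_arr[:5]).sum(axis=1))  # -> approximately [1. 1. 1. 1. 1.]
```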
Run the search. Be aware that it takes considerably longer than before.
```python
bounds = Bounds([-60.] * 18 + [0] * 5, [20.] * 18 + [10] * 5)
result = differential_evolution(mloglh, bounds, seed=300)
print(result)
```
```
     fun: 4116.792196474322
     jac: array([ 0.00272848,  0.00636646, -0.00090949,  0.00327418, -0.00563887,
       -0.00291038, -0.00509317,  0.00045475,  0.00800355,  0.00536602,
       -0.00673026,  0.00536602,  0.00782165, -0.01209628,  0.00154614,
       -0.0003638 ,  0.00218279,  0.00582077,  0.04783942,  0.03237801,
        0.01400622,  0.00682121,  0.03601599])
 message: 'Optimization terminated successfully.'
    nfev: 218922
     nit: 625
 success: True
       x: array([ 14.33053717,  12.20158703,   9.97549255,   8.1718939 ,
         6.36190483,   4.58724228,   2.61478521,   0.66474105,
        -1.46625252,  -3.60065138,  -6.27127806,  -9.65032254,
       -14.06390123, -18.287351  , -23.44011235, -28.39033479,
       -32.35825176, -43.38390248,   6.13059504,   2.01974223,
         0.64631137,   0.67555403,   2.44873606])
```
This time the order of influence is STREAM > CHAOS > VOLTAGE > FREEZE > AIR; compared with the linear model, CHAOS has gained influence.
Let's evaluate this model in the same way.
```python
def pred2(x, sgr_arr):
    # Predict the level with the highest probability.
    sc = score(x, sgr_arr)
    return np.argmax(sc, axis=1) + 1


train_pred2_arr = pred2(result.x, train_sgr_arr)
test_pred2_arr = pred2(result.x, test_sgr_arr)
train_cm2 = ConfusionMatrix(train_level_arr, train_pred2_arr)
test_cm2 = ConfusionMatrix(test_level_arr, test_pred2_arr)
print('====================')
print('Train Score')
print(' Accuracy: {}'.format(train_cm2.Overall_ACC))
print(' Fmeasure: {}'.format(train_cm2.F1_Macro))
print('====================')
print('Test Score')
print(' Accuracy: {}'.format(test_cm2.Overall_ACC))
print(' Fmeasure: {}'.format(test_cm2.F1_Macro))
print('====================')
```
```
====================
Train Score
 Accuracy: 0.4960235640648012
 Fmeasure: 0.48246495009640167
====================
Test Score
 Accuracy: 0.5454545454545454
 Fmeasure: 0.5121482282311358
====================
```
This time the test accuracy was 54.5%. That is a big improvement over the linear model, but still a long way from practical.
```python
plt.figure(figsize=(10, 10), dpi=200)
sns.heatmap(pd.DataFrame(test_cm2.table), annot=True, square=True, cmap='Blues')
plt.show()
```
The predictions now lie roughly along the diagonal, but the lowest and highest difficulty ranges are still handled poorly.
The bottom line is: "you can get values in the right ballpark, but it is not practical." I made this attempt hoping it could serve as a reference when assigning difficulty levels to StepMania charts, but in the end it seems that playing and adjusting by hand is still necessary.
One more thing: in this logistic regression the constant terms should naturally satisfy a monotonic ordering, but this code does not enforce that constraint, so depending on the random numbers the search may pass the convergence test at an abnormal solution [^bound]. SciPy's optimization functions can accept inequality constraints, but when I tried, I got an error saying a constraint of unknown form was passed, and I could not make it work. The unconstrained search also wastes time, so if anyone knows how to solve this, I would appreciate the advice.
[^bound]: I ran into this phenomenon once while tuning the search bounds.
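One possible workaround (a hypothetical sketch I have not validated on this data) is to reparameterize the 18 constants as a first threshold plus cumulative sums of non-positive increments, which guarantees the ordering by construction without explicit constraints:

```python
# Hypothetical reparameterization: z[0] is the first threshold, z[1:18] are
# non-positive increments, z[18:] are the radar coefficients as before.
def unpack(z):
    consts = z[0] + np.concatenate(([0.], np.cumsum(z[1:18])))
    return np.concatenate((consts, z[18:]))

def mloglh_monotonic(z):
    return mloglh(unpack(z))

bounds_mono = Bounds([-60.] + [-10.] * 17 + [0.] * 5,
                     [20.] + [0.] * 17 + [10.] * 5)
# result = differential_evolution(mloglh_monotonic, bounds_mono, seed=300)
```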