I'm a gamer, so I'm always itching to make AI for games.
This time, I came up with the idea of building a video analysis AI for Splatoon players.
In this article:
・An overview of the task
・Results from image classification
・A demo video

In the next article:
・I plan to post the results of a video classification model.
In terms of tasks, this is closest to "action segmentation." Action segmentation is a video classification task that predicts which action class each frame belongs to. For a golf swing, for example:
・Frames 1-30: "Backswing"
・Frames 31-45: "Downswing"
・Frames 46-65: "Follow Through"
That sort of thing.
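To make the output format concrete, here is a minimal sketch in Python of expanding segment annotations into per-frame labels, which is exactly what an action segmentation model predicts. The segments are just the golf example above, not real data:

```python
# Expand segment-level annotations into one label per frame.
# These segments are the illustrative golf-swing example from above.
segments = [
    (1, 30, "Backswing"),
    (31, 45, "Downswing"),
    (46, 65, "Follow Through"),
]

frame_labels = {}
for start, end, label in segments:
    for frame in range(start, end + 1):
        frame_labels[frame] = label

print(frame_labels[10])  # -> Backswing
print(frame_labels[50])  # -> Follow Through
```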
We will do this with game data.
Simply put, the model outputs the label shown at the bottom left here.
A sample of Splatoon action recognition by machine learning (the lower left is the output label) pic.twitter.com/eNTT5PHNoT
itdk (@itdk1996), December 26, 2020
| Label | Action |
|---|---|
| paint (painting) | Painting the surroundings. Checks are included here |
| attack (attack) | Facing off against an opponent and attacking them. The painting motion is the same |
| move (moving) | Moving in squid or humanoid form. Includes moving while painting |
| hidden (hidden) | Includes scouting and ink recovery. Visually the same state as moving |
| map (map) | Having the map open while alive |
| special (special) | Using a special |
| super jump (super jump) | Super jumping |
| object (object) | Rule involvement: play related to the Area, Hoko, Clams, or Yagura. It overlaps easily with other classes, but it's an important factor in a match |
| respawn (respawn) | Applies only while dead |
| opening (opening) | The opening |
| ending (ending) | The ending |
Those are the labels.
What makes this difficult is that the classes overlap in the input. As shown in the figure below, the act of spraying ink can serve the purpose of "attacking," "painting," or "being involved in an object." So this is really action *purpose* segmentation: the goal is to recover the purpose. Because of this overlap, class priorities are decided in advance.
Classes further to the left take priority. For example, a player involved in an object can also be moving while looking at the map; in that case the object label wins. A small sketch of this rule follows.
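As a sketch, the priority rule can be written as "pick the leftmost applicable class from a fixed ordering." The exact ordering below is my assumption (the text above only fixes that object beats map and moving), and `resolve_label` is a hypothetical helper:

```python
# Classes earlier in the list win when several apply to the same frame.
# This particular ordering is an assumption for illustration.
PRIORITY = [
    "opening", "ending", "respawn", "superjump", "special",
    "object", "map", "attack", "painting", "hidden", "moving",
]

def resolve_label(applicable: set) -> str:
    """Return the highest-priority label among those that apply to a frame."""
    for label in PRIORITY:
        if label in applicable:
            return label
    raise ValueError("no known label applies")

# A frame where the player is on the objective, moving, with the map open:
print(resolve_label({"object", "map", "moving"}))  # -> object
```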
Because of these properties, there are limits to classifying single frames on their own. A frame in transit should be "moving," but the visual state is identical to "hidden."
Likewise, a frame is "attacking" because enemies are present, but it looks quite similar to "painting."
So a purely image-based method has its limits, and I started this work assuming I would eventually use a model built for the action segmentation task.
Frame-level labels don't mean much on their own, but aggregated over a whole video they produce a table like the one below:
1 video ---> action distribution. If you can do this,
・you can compare skilled players with unskilled players
・you can compare the same player across rules
・you can compare a player's good matches with their bad ones
and so on. I think that once the numbers are extracted from a video of play, they can be used for all sorts of analyses.
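For instance, turning per-frame predictions into an action distribution takes only a few lines with `collections.Counter`. A minimal sketch, assuming the predictions are already a flat list of label strings (the counts are made up):

```python
from collections import Counter

def action_distribution(frame_labels):
    """Fraction of frames spent in each action class for one video."""
    counts = Counter(frame_labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}

# Hypothetical per-frame predictions for a short clip:
preds = ["moving"] * 120 + ["painting"] * 300 + ["attack"] * 90 + ["respawn"] * 60
print(action_distribution(preds))
# roughly {'moving': 0.21, 'painting': 0.53, 'attack': 0.16, 'respawn': 0.11}
```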
Each video is at most about six minutes, but going back through them all for annotation still took a long time...
After considering things in advance, I decided to narrow the environment down somewhat and give it a try:
・Rules (Gachihoko, Gachi Area, etc.) are reasonably spread out
・Stages are whatever happened to be in rotation at the time, so they are biased
・Weapons are the Hero Roller and the Hero Roller Betchu
・I'm wearing Ninja Squid
・My Udemae (rank) is X in every mode
・21 videos for training and 5 for testing (one each: Nawabari, Area, Hoko, Yagura, Clam)
・The average video is about 4 minutes
That's the rough setup. I wondered about spreading the weapons out too, but then the gear would change and things like the charge times of splatlings and chargers would complicate matters, so I decided to go with my own weapons for the time being.
Also (perhaps it's forgivable since this is for fun), the quality and standard of the annotations differ slightly between the first video (r1.mp4) and the last (r20.mp4) lol
For example, the sequence "spot an opponent -> move -> attack" could be annotated as hiding -> moving -> attacking, or, since the movement is for the sake of the attack, as hiding -> attacking. Either is plausible. (Also, since the special would otherwise only have been the landing one, this was practically my first time using the Roller Betchu.)
I'm still considering whether to publish the dataset.
First, there is the choice between transfer learning and fine tuning. Two concerns come to mind:
・The Splatoon domain is very particular; with a frozen backbone, will the pretrained representations carry over?
・The number of videos is small, so fine tuning may overfit.
So I verify which of the two is better.
The network structure looks like this
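As a minimal sketch of the two training modes (not the exact architecture in the figure), assuming torchvision's VGG16 with an 11-class head; `build_model` and its mode names are my own illustration, not the repo's actual code:

```python
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 11  # the eleven action labels above

def build_model(mode: str = "transfer") -> nn.Module:
    """VGG16 classifier: 'transfer' freezes the backbone, 'finetune' trains everything."""
    model = models.vgg16(pretrained=True)
    if mode == "transfer":
        for param in model.features.parameters():
            param.requires_grad = False  # only the classifier head is trained
    # Swap the final layer so it outputs the 11 action classes.
    model.classifier[6] = nn.Linear(model.classifier[6].in_features, NUM_CLASSES)
    return model

transfer_model = build_model("transfer")
finetune_model = build_model("finetune")
```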
Proper evaluation is tedious, so I use accuracy. There are various evaluation metrics for action segmentation, but I'll think about those later.
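As a sketch of how these numbers can be computed with scikit-learn, assuming `y_true` and `y_pred` are flat per-frame label arrays accumulated over the test videos (the values here are hypothetical):

```python
from sklearn.metrics import accuracy_score, precision_score

# One entry per frame, accumulated over all test videos (hypothetical values).
y_true = [0, 0, 1, 2, 2, 2, 1, 0]
y_pred = [0, 1, 1, 2, 2, 0, 1, 0]

print("accuracy :", accuracy_score(y_true, y_pred))
# Per-class precision, like the second results table below.
print("precision:", precision_score(y_true, y_pred, average=None, zero_division=0))
```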
The results of trying it look like this:

| Model | Mode | Accuracy |
|---|---|---|
| VGG | transfer learning | 64.7 |
| MobileNet | transfer learning | 61.6 |
| VGG | fine tuning | 62.7 |
| MobileNet | fine tuning | 59.7 |
Incidentally, fine tuning takes considerably longer to train.
Transfer learning with VGG was the best, which was a little surprising.
I had also expected fine tuning to do better, but it didn't.
Class | Precision |
---|---|
total_accuracy | 0.64768 |
opening | 0.97024 |
moving | 0.57446 |
hidden | 0.12668 |
painting | 0.53247 |
battle | 0.68719 |
respawn | 0.93855 |
superjump | 0.45745 |
object | 0.22072 |
special | 0.76923 |
map | 0.38922 |
ending | 0.98046 |
As expected, the classes I suspected would be difficult are in fact still hard to classify.
It's fun to build a dataset from scratch and try all sorts of machine learning on it. I'll tackle the video model from here on.
That said, as far as I can tell so far, stages the model has seen (Arowana Mall in the test data) seem to do qualitatively quite well. So it may be worth creating training data covering all stages for the time being.
YouTube playlist: https://www.youtube.com/playlist?list=PL0Al7LkpEHRJPkI6HZfrUP9HKv1TI_tHn
There are 5 test videos, one each for Nawabari, Area, Hoko, Yagura, and Clam. For some of them the stage appears in the training data and for some it does not; part of the point is to see whether the model works even on a stage it has never seen. Whether each stage is in the training data is noted in the video's description.
GitHub: https://github.com/daikiclimate/action_segmentation
I've put the weights and the demo code there, so it should be runnable if you have a video and the environment set up (unverified).