I beat the Touhou Project with Deep Learning ... or rather, I wanted to.

(Added on 2017/08/16)

This article reflects what I knew at the time of writing. It contains obvious mistakes, so please treat it only as a rough reference.


It is only mid-February, still a little early, but temperatures are in the double digits every day and it feels like spring. I imagine "Sakurasaku" news (word of a passed exam) has started to reach some of the people reading this. ~~Meanwhile, I have been rejected five times in a row in my job hunt.~~

By the way, about three weeks ago the shocking news arrived that "Google's Go AI 'AlphaGo' beats a professional Go player for the first time in history". Go was considered an unsolved problem, and only the other day I had been saying "this could be used for evaluation", so I was very surprised to learn that the matter had already been settled in October of last year. At the same time, I thought:

"Aren't there a lot of problems that have in fact already been solved, and we simply don't know it?"

It may seem surprising, but in some games a program has already learned to play better than humans. (This is explained in great detail here.) The Deep Q-Network used in this video is a method from 2013, and the paper at the time used a very simple model. Meanwhile, image processing research has advanced over the last few years, and models far more complex than the one used in Deep Q-Network have proven effective. AlphaGo's result also builds on those advances in image processing. So I thought: "If I bring in image processing techniques on par with AlphaGo's, or more advanced, it may be possible to beat complicated action games that nobody has tried yet, without using any internal game information."

This is the record of a little over two weeks in which one student had that thought and acted on it.

Background

Deep Q-Network and similar studies mainly use games for the Atari 2600, released in 1977, as a benchmark. This is partly because the libraries for it are well developed, but the main reason is that games more complex than those on the Atari 2600 are harder to beat. 3D action games in particular are said to be difficult because of the difficulty of processing three-dimensional information.

So I focused on 2D shooters as something harder than Atari 2600 games but easier than 3D, and therefore about right. Being 2D, they do not require as much information as 3D games, but they demand more complexity and speed than Atari 2600 games. Among them, this time I targeted Touhou Konjuden ~ Legacy of Lunatic Kingdom, whose trial version is released for free and is easy to obtain.

~~Actually, the bigger reason is simply that I like the Touhou Project~~

Design

image.png

Roughly speaking, screenshots of the game screen are passed from Windows, where the game runs, to Ubuntu, and the chosen operation is returned and applied to the game.

Actual operation example

A short video is available on Twitter.

Program

The main programs, except for the model, are published on GitHub. However, they are still at the prototype stage with very finicky settings, so I don't think they are suitable as a base for writing something of your own.

Program description

The client takes a screenshot with PIL, converts it to a NumPy array, and sends the image over a socket.
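As a rough illustration of the "send the image over a socket" step, here is a minimal sketch. The wire format (a small header followed by raw pixels) and the function names `send_frame`, `recv_exact`, and `recv_frame` are assumptions made for this example, not taken from the published program.

```python
import socket
import struct

import numpy as np

# Hypothetical wire format (not from the original program): a 12-byte header
# of three big-endian uint32 values (height, width, channels), followed by
# the raw uint8 pixel bytes.
HEADER = struct.Struct("!III")


def send_frame(sock: socket.socket, frame: np.ndarray) -> None:
    """Send one H x W x C uint8 image as header + raw pixel bytes."""
    frame = np.ascontiguousarray(frame, dtype=np.uint8)
    height, width, channels = frame.shape
    sock.sendall(HEADER.pack(height, width, channels) + frame.tobytes())


def recv_exact(sock: socket.socket, size: int) -> bytes:
    """Read exactly `size` bytes, because recv() may return fewer."""
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = sock.recv(min(remaining, 65536))
        if not chunk:
            raise ConnectionError("socket closed before the frame was complete")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def recv_frame(sock: socket.socket) -> np.ndarray:
    """Receive one image that was sent with send_frame()."""
    height, width, channels = HEADER.unpack(recv_exact(sock, HEADER.size))
    data = recv_exact(sock, height * width * channels)
    return np.frombuffer(data, dtype=np.uint8).reshape(height, width, channels)
```

On the client, `frame` could for instance come from `numpy.asarray(...)` applied to the image returned by `PIL.ImageGrab.grab` with a bounding box around the game window. Sending the shape first lets the receiver know how many bytes to wait for, since a single `recv()` call is not guaranteed to return a whole image.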

The server receives the image over the socket, preprocesses it with NumPy and OpenCV, and uses Chainer to judge death and decide the action. If no death is detected, the action is returned over the socket. If a death is detected, a meta message is returned instead and the episode ends. Depending on the situation, training with Chainer then starts.

On receiving the reply, the client uses SendInput to send the action as key input that DirectInput (the DirectX game input system) accepts. When the death meta message is returned, the client pauses operation and waits. It keeps waiting while training takes place, and resumes operation otherwise.

Death judgment

img0006010.jpg

In Touhou Konjuden, the words "Capture failure" are displayed on death, so this is used to detect death. Specifically, the server crops the region where this text is displayed, and a simple three-layer model judges whether or not the crop is an ordinary game image. This reached an accuracy of 99% or more. (Its one flaw is that it occasionally misfires.)
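The crop-then-classify idea can be sketched as follows. This is not the author's model: the crop coordinates in `DEATH_BOX`, the hidden size, and the untrained random weights are all placeholders chosen for illustration, and plain NumPy stands in for Chainer.

```python
import numpy as np

# Hypothetical crop box (top, bottom, left, right) in pixels. The real
# position of the death message is not given in the article.
DEATH_BOX = (200, 232, 160, 288)


def crop_death_region(frame: np.ndarray, box=DEATH_BOX) -> np.ndarray:
    """Cut out the message region and flatten it to floats in [0, 1]."""
    top, bottom, left, right = box
    region = frame[top:bottom, left:right]
    return region.astype(np.float32).reshape(-1) / 255.0


def init_params(input_size: int, hidden: int = 64, seed: int = 0) -> list:
    """Untrained placeholder weights for a three-layer perceptron."""
    rng = np.random.default_rng(seed)
    sizes = [input_size, hidden, hidden, 2]
    return [
        (rng.normal(0.0, 0.01, (n_in, n_out)), np.zeros(n_out))
        for n_in, n_out in zip(sizes[:-1], sizes[1:])
    ]


def is_dead(x: np.ndarray, params: list) -> bool:
    """Forward pass: two ReLU layers, then a 2-way output (alive, dead)."""
    h = x
    for weights, bias in params[:-1]:
        h = np.maximum(h @ weights + bias, 0.0)
    weights, bias = params[-1]
    logits = h @ weights + bias
    return bool(np.argmax(logits) == 1)
```

In practice the weights would be trained on labelled crops (death message versus ordinary play). Restricting the input to the small message region is what lets such a small model work at all.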

Action decision

img0000012.jpg

For a single frame image, the action with the highest evaluation is selected, as shown above. There are 18 possible actions in total:

- z (shot button)
- z (shot button) + 8 directions
- z (shot button) + SHIFT (slow movement)
- z (shot button) + SHIFT (slow movement) + 8 directions
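The count of 18 follows from 2 × (1 + 8): with or without SHIFT, and either no movement or one of 8 directions. A small sketch that enumerates them; the key names other than "z" and "shift" are illustrative labels, not identifiers from the original program.

```python
# Direction labels are illustrative; the article only names "z" and SHIFT.
DIRECTIONS = [
    ("up",), ("down",), ("left",), ("right",),
    ("up", "left"), ("up", "right"), ("down", "left"), ("down", "right"),
]


def build_actions() -> list:
    """Return the 18 key combinations in the order listed in the text."""
    actions = []
    for slow in (False, True):
        base = ("z", "shift") if slow else ("z",)
        actions.append(base)  # shot only, no movement
        actions.extend(base + direction for direction in DIRECTIONS)
    return actions


ACTIONS = build_actions()
```

The index into `ACTIONS` can then serve as the discrete action that the model outputs and that the client translates into key presses.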

The evaluation itself is learned and estimated from pairs of action and survival time.
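One possible reading of "pairs of action and survival time" is sketched below: each frame of an episode is labelled with how long the player survived afterwards, and the action with the highest predicted value is chosen. The clipping value `cap` and the `epsilon` random-choice parameter are assumptions added for this example; the article does not say how the targets were actually built.

```python
import random


def survival_targets(episode_length: int, cap: int = 100) -> list:
    """Training target for frame t: frames survived after t, clipped to `cap`.

    Assumes the episode ended in death at its last frame.
    """
    return [min(episode_length - 1 - t, cap) for t in range(episode_length)]


def choose_action(ratings, epsilon: float = 0.0, rng=random) -> int:
    """Pick the highest-rated action, or a random one with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(ratings))
    return max(range(len(ratings)), key=lambda i: ratings[i])
```

For example, `survival_targets(5, cap=3)` gives `[3, 3, 2, 1, 0]`, and `choose_action([0.1, 0.9, 0.3])` returns `1`.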

So how was the performance?

I tried various combinations on Lunatic (the highest difficulty), but I couldn't get past the first chapter.

As you can see in the short video mentioned earlier, unless the action is chosen at RANDOM, the agent settles on a single action. In that state it cannot avoid attacks aimed at the player character, so it dies. This is probably due to insufficient training.

In that video, a training phase was inserted each time a certain amount of play had finished, so that the evaluation was learned along the way. The result shown comes from training on about 2000 images, 5 times each. Even by the standards of ordinary image processing, that cannot be called sufficient training.

However, training the models of 100 layers or more that are currently used in image processing requires a considerable amount of computation. In my environment, a single pass over all the images took almost an hour, even with only a few thousand of them. A single pass is enough to solve a simple classification problem, but as far as I could tell from this attempt, this problem was not that easy.

Then how can I beat the Touhou Project?

To train further, I think it will be necessary to use a parallel GPU computing library such as "Distributed TensorFlow", which is due to be released soon. The article above introduces an example in which parallel processing on 500 units ran 300 times faster. If that 300-fold speedup were available, the problem above could be solved. (With it, AlphaGo, which requires nearly 700 days of computation, could be trained in about 2 days.)[^1]

[^1]: I had a little spare time and went through various documents, and I now suspect that this was a misunderstanding: the roughly two years may already have been the figure with distributed training. Until I can confirm it, I will leave the original text as it is. (Added on 2016/3/13)

However, buying 500 Titan X cards, the most powerful GPU on the market today, would cost over 70 million yen for the cards alone. That is not an amount a poor student can pay, but isn't there some way to get hold of this much computation?
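Taking the figures quoted in the text at face value (the footnote already flags the 700-day figure as unconfirmed), the back-of-the-envelope arithmetic works out as follows. The variable names are just labels for this example.

```python
# Figures as quoted in the text; the 700-day figure is unconfirmed (see footnote).
alphago_days = 700          # "nearly 700 days" of computation
speedup = 300               # 300-fold speedup reported for 500 units
gpu_count = 500
total_price = 70_000_000    # "over 70 million" for the GPUs alone

days_with_speedup = alphago_days / speedup
price_per_gpu = total_price / gpu_count
parallel_efficiency = speedup / gpu_count

print(f"{days_with_speedup:.2f} days")                    # 2.33 days
print(f"{price_per_gpu:,.0f} per GPU")                    # 140,000 per GPU
print(f"{parallel_efficiency:.0%} parallel efficiency")   # 60% parallel efficiency
```

So "2 days" is rounded down from about 2.3, and a 300-fold speedup on 500 units means each unit contributes about 60% of its ideal share.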

~~I'd like to get a job before that, though I wonder if I can ...~~

References

- History of DQN + Deep Q-Network written in Chainer
- Take a screenshot with Pillow for Python
- Transfer images from Raspberry Pi with OpenCV
- Simulate Python keypresses for controlling a game

Correction and apology (Added on February 18, 2016)

AlphaGo uses a technique called Deep Q-Network.

In the first draft, the sentence above appeared at the beginning of the article. However, it was pointed out to me that "reinforcement learning is used in the course of optimizing the model, but Deep Q-Network is not, so this is a clear error", and I have corrected it.

I would like to take this opportunity to apologize for posting incorrect information.
