I'm a fourth-year university student studying reinforcement learning. (Actually, I have already graduated.) I started this research with the simple idea that I could make money with the power of AI. Perhaps because the topic is tied to finance, searching turns up almost no information... So I hope my research from this year, "Acquisition of investment strategy through deep reinforcement learning", will be helpful to someone somewhere. (I haven't presented it at any academic society, so please understand that you won't find it even if you search for a paper.) (I wrote that this is research on making money, but it is not an information product or anything like that. Rest assured that I won't direct you to any strange URLs.)

This post, part (1), is the introduction. I would like to explain the actual theory and program in part (2), part (3), and later posts.

In addition, nothing posted here is something I learned in class; I learned all of it on my own. I'm sure there are mistakes here and there, but I hope you'll read it with a forgiving eye. Thank you.

What is reinforcement learning (policy gradient method)?

First, I will briefly introduce what reinforcement learning is, using "AlphaGo", which got me interested in reinforcement learning, as an example.

AlphaGo

Roughly speaking (and this really is rough; my apologies to the experts):

- Get the current board from the Go environment
- The reinforcement learning agent reads the board
- Output a **probability distribution** over the next move
- Choose the next move probabilistically from that distribution

That is the series of steps for deciding what to play from the board.
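
As a rough illustration of the last two steps, here is a minimal Python sketch. The scores are made-up numbers standing in for what a neural network would output, and `softmax` and `choose_move` are names chosen only for this example; this is not AlphaGo's actual code.

```python
import math
import random

def softmax(scores):
    """Turn arbitrary real-valued scores into a probability distribution."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def choose_move(scores, rng=random):
    """Sample one move index according to the softmax probabilities."""
    probs = softmax(scores)
    move = rng.choices(range(len(probs)), weights=probs, k=1)[0]
    return move, probs

# Hypothetical scores for three candidate moves (assumed values, not real data).
scores = [2.0, 0.5, -1.0]
move, probs = choose_move(scores, random.Random(0))
print("probabilities:", probs)
print("chosen move index:", move)
```

The move with the highest score is chosen most often, but not always, which is what "probabilistically" means here.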

Output a **probability distribution** over the next move

This is the key point. Depending on the current board, the goal is to output an "appropriate" probability distribution: a **high probability** for a move that is good to play next and brings a win closer, and a **lower probability** for a move that would put you in a tight spot.

So what can take such an input and produce such an output for the reinforcement learning agent? It is the "neural network" familiar from deep learning. The role of reinforcement learning is to train the neural network so that the probability distribution it outputs becomes "appropriate".

That's why the two are combined into deep reinforcement learning.
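
To make "learning an appropriate probability distribution" a little more concrete, here is a minimal sketch of one policy-gradient (REINFORCE-style) update. It assumes a bare softmax policy over three logits instead of a real neural network, and the reward values and learning rate are made up for illustration.

```python
import math

def softmax(logits):
    """Convert logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_step(logits, action, reward, lr=0.1):
    """One REINFORCE-style update for a softmax policy.

    The gradient of log pi(action) with respect to logit i is
    (1 if i == action else 0) - pi_i, scaled here by the reward.
    """
    probs = softmax(logits)
    return [
        x + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (x, p) in enumerate(zip(logits, probs))
    ]

# Start from a uniform policy over three hypothetical moves.
logits = [0.0, 0.0, 0.0]
before = softmax(logits)
after_win = softmax(reinforce_step(logits, action=0, reward=+1.0))
after_loss = softmax(reinforce_step(logits, action=0, reward=-1.0))
print(before[0], after_win[0], after_loss[0])
```

A move that led to a positive reward becomes more likely, and a move that led to a negative reward becomes less likely. In deep reinforcement learning the same kind of update is applied to the weights of the neural network rather than directly to the logits.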

What is amazing about AlphaGo

AlphaGo, and deep reinforcement learning in general, can be described as an algorithm that "learns an **appropriate** probability distribution according to the current environment". The great things about this algorithm are:

- "It learns without using the human-discovered knowledge known as joseki"
- "It converges in real time"
- "Its strength far surpasses humans"

That's right... The knowledge that people have built up over decades and centuries of competition is beaten this easily. It is deep reinforcement learning that shows such strength in Go.

Application to stock investment

Setting aside the detailed theory of deep reinforcement learning for now, I wondered: couldn't the algorithm that "learns an **appropriate** probability distribution according to the current environment" be used for stock prices?

People buy and sell stocks

In this way, people trade according to price movements from the past to the present. Of course, this can fail because we don't know the future.

Reinforcement learning agent reads the stock price

Couldn't the agent convert it into probabilities and trade on them?

If we can find the probabilities of buying and selling at the next point in time, without predicting the stock price itself, it is quite possible to make money.

Therefore

That is the goal this time and what the program does.
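
As a rough picture of "reading the stock price and converting it into probabilities", here is a minimal sketch. Everything in it is an assumption for illustration: the prices are made up, the input is the last five log returns, and a hand-written linear scoring stands in for the neural network. The actual network and inputs are left for later posts.

```python
import math

def make_state(prices, window=5):
    """Return the last `window` log returns (an assumed choice of input features)."""
    if len(prices) < window + 1:
        raise ValueError("need at least window + 1 prices")
    recent = prices[-(window + 1):]
    return [math.log(b / a) for a, b in zip(recent, recent[1:])]

def action_probabilities(state, weights):
    """Linear score per action followed by softmax; a stand-in for a neural network."""
    scores = [sum(w * x for w, x in zip(row, state)) for row in weights]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical price history (not real market data).
prices = [100.0, 101.0, 100.5, 102.0, 103.0, 102.5]
state = make_state(prices, window=5)

# Hypothetical weights, one row per action: buy, sell, hold no position.
weights = [[1.0] * 5, [-1.0] * 5, [0.0] * 5]
probs = action_probabilities(state, weights)
print("buy / sell / flat probabilities:", probs)
```

Note that nothing here predicts the next price; the output is only a distribution over what to do next.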

About the result

If the results were bad, reading on would be pointless, so I will show them first. I am aware, however, that much is left unexplained here. I will post detailed results and methods at a later date, so for now I hope you can take away that the probabilities do converge with this reinforcement learning method and that, whether or not it surpasses humans, it may be able to make some profit.

Learning data ↓

State of learning ↓

- The blue line is the value of the neural network's error function. You can see that the probability converges as it approaches 0.
- The orange line is the average profit obtained over the 50 days of the learning period.

The average profit increases as the number of steps increases.

Regarding buying and selling, we prepared three outputs:

- "Take a buy position"
- "Take a sell position (short sale)"
- "Close the position, or hold no position"

Trades are made one unit at a time, on the premise that one unit of the Nikkei Stock Average can always be bought (or sold).
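
To show how profit could be counted with these three outputs, here is a minimal sketch. It assumes each position is held for exactly one step of one unit and ignores fees and slippage; the prices are made up, and the actual buying and selling rules are left for a later post.

```python
# Position held over one step: +1 = buy (long), -1 = sell (short), 0 = no position.
LONG, SHORT, FLAT = 1, -1, 0

def step_profit(position, price_now, price_next):
    """Profit of holding one unit in `position` from this step to the next."""
    return position * (price_next - price_now)

def total_profit(positions, prices):
    """Sum of per-step profits; positions[t] is held from prices[t] to prices[t + 1]."""
    if len(positions) != len(prices) - 1:
        raise ValueError("need exactly one position per price interval")
    return sum(
        step_profit(p, now, nxt)
        for p, now, nxt in zip(positions, prices, prices[1:])
    )

# Hypothetical index values (not real Nikkei data).
prices = [100.0, 102.0, 101.0, 105.0]
positions = [LONG, SHORT, FLAT]
profit = total_profit(positions, prices)
print("total profit:", profit)
```

A long position gains when the price rises, a short position gains when it falls, and holding no position neither gains nor loses.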

- Specific output design
- Structure of the neural network itself
- Buying and selling rules
- Reinforcement learning algorithm
- Actual program
- Detailed results (including test period results)

I would like to post about these at a later date.

It may become a long series of posts, but I hope you'll stay with me. Thank you.

Stock investment by deep reinforcement learning (policy gradient method) (1)
