Inspired by [Nekopuni's blog](http://nekopuni.holy.jp/2014/09/python%E5%BC%B7%E5%8C%96%E5%AD%A6%E7%BF%92%EF%BC%8B%E7%82%BA%E6%9B%BF%E3%83%88%E3%83%AC%E3%83%BC%E3%83%89%E6%88%A6%E7%95%A5%E3%81%9D%E3%81%AE2/), I looked into algorithmic trading with reinforcement learning, partly as a hobby and partly for practical benefit. I am not fully confident in what I have written (especially the code). This is mainly a personal memo, but I would be glad if it could serve as a small survey for reference.
The difference between reinforcement learning and supervised learning is as follows.
In supervised learning, each condition is optimized in isolation, without a unified trading policy. In reinforcement learning, on the other hand, the algorithm can be constructed to include the environment and the policy itself, so reinforcement learning is considered the more effective approach here.
■Value Function RL
A value is assigned to each state or state-action pair. Based on this value, the policy (the action to take when a given state is observed) is optimized so as to maximize the long-term expected reward. Q-learning falls into this category; for details on Q-learning, see Reference 5.
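As a quick memo (this is the standard textbook update rule, not something specific to this post), tabular Q-learning updates the action value as

$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\left[r_{t+1} + \gamma \max_{a'} Q(s_{t+1}, a') - Q(s_t, a_t)\right]$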
■Direct RL
The agent is adjusted directly based on the rewards (observed values) obtained as experience from the environment. Unlike Q-learning, no Q-table is required, so the time and space complexity is small; on the other hand, the maximization of expected reward becomes short-term. Recurrent Reinforcement Learning (RRL) falls into this category; for details on RRL, see Reference 5. (The code in this post uses this RRL.)
RRL Financial Trading Framework
The Differential Sharpe Ratio (DSR) is used; it comes in when updating the weights.
The form of the formula is the same as that of a simple one-layer neural network. In fact, neural-network techniques are applied, and the threshold $v_t$ is also included in the weight vector to be optimized.
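Written out explicitly (this is my reading of the code below, so take the notation as an assumption rather than the paper's exact form), the trader output is

$F_t = \tanh\left(\sum_{i=0}^{M} w_{i,t}\, r_{t-i} + w_{M+1,t} F_{t-1} + v_t\right)$

where $r_t$ is the price change at time $t$, $F_{t-1}$ is the previous output, and the actual position is taken as $\mathrm{sign}(F_t)$.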
■Sharpe Ratio
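As a reminder, the standard definition over the returns $R_t$ (my restatement of the usual form) is

$S_T = \dfrac{A_T}{\sqrt{B_T - A_T^2}}, \quad A_T = \frac{1}{T}\sum_{t=1}^{T} R_t, \quad B_T = \frac{1}{T}\sum_{t=1}^{T} R_t^2$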
■Differential Sharpe Ratio (DSR)
An improved version of the Sharpe ratio, based on moving averages, adapted for online learning.
The DSR is computationally lighter and (apparently) converges faster.
The above $\hat{S}$ is Taylor-expanded around $\eta = 0$ and the first-order term is taken.
$D_t$ is treated as an instantaneous performance measure, and the weights are updated so as to maximize it.
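Concretely (my restatement of Moody & Saffell's definition, which also matches `calculate_dsr` in the code below): with the exponential moving averages $A_t = A_{t-1} + \eta\,\Delta A_t$ and $B_t = B_{t-1} + \eta\,\Delta B_t$, where $\Delta A_t = R_t - A_{t-1}$ and $\Delta B_t = R_t^2 - B_{t-1}$, the differential Sharpe ratio is

$D_t \equiv \dfrac{d\hat{S}_t}{d\eta}\Big|_{\eta=0} = \dfrac{B_{t-1}\,\Delta A_t - \frac{1}{2} A_{t-1}\,\Delta B_t}{\left(B_{t-1} - A_{t-1}^2\right)^{3/2}}$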
There is a statement that it should be set as a parameter, but no specific value is given, so there seems to be no choice but to pick an empirical value.
The signal is obtained by first computing tanh and then taking the sign according to whether $F_t$ is greater than 0 or not. Features such as a loss cut are not implemented. Even I have to admit the code looks rather dubious, so please treat it as a reference only.
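For reference, the update that `update_parameters` in the code performs is gradient ascent on $D_t$ (this is my own summary, so the exact partial derivatives used should be checked against the paper):

$w_t \leftarrow w_{t-1} + \rho\,\dfrac{dD_t}{dw}, \qquad \dfrac{dD_t}{dw} \approx \dfrac{\partial D_t}{\partial R_t}\left(\dfrac{\partial R_t}{\partial F_t}\dfrac{dF_t}{dw} + \dfrac{\partial R_t}{\partial F_{t-1}}\dfrac{dF_{t-1}}{dw}\right)$

with the recursive term $\dfrac{dF_t}{dw} = (1 - F_t^2)\left(x_t + w_{M+1}\dfrac{dF_{t-1}}{dw}\right)$, where $x_t$ is the input vector (the lagged price changes, $F_{t-1}$, and the bias term).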
```python
# coding: utf-8
import os

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from math import tanh, copysign


class RRLAgentForFX:
    TRADING_COST = 0.003
    EPS = 1e-6

    def __init__(self, M, rho=0.01, eta=0.1, bias=1.0):
        np.random.seed(555)
        self.M = M  # number of lags
        self.weights = np.zeros(self.M + 3, dtype=np.float64)
        self.bias = bias  # bias term
        self.rho = rho    # learning rate for the weight update
        self.eta = eta    # adaptation rate of the moving averages A_t, B_t
        self.price_diff = np.zeros(self.M + 1)  # r_t
        self.pre_price = None
        self.pre_signal = 0
        self.pre_A = 0.0
        self.pre_B = 0.0
        self.pre_gradient_F = 0.0
        # result store
        self.signal_store = []
        self.profit_store = []
        self.dsr_store = []
        self.sr_store = []
        self.cumulative_profit = 0.0

    def train_online(self, price):
        self.calculate_price_diff(price)
        signal, self.F_t_value = self.select_signal()
        print("signal", signal)
        self.calculate_return(signal)
        self.update_parameters()
        self.pre_price = price
        self.pre_signal = signal
        # store result
        self.signal_store.append(signal)

    def calculate_price_diff(self, price):
        r = price - self.pre_price if self.pre_price is not None else 0
        self.price_diff[:self.M] = self.price_diff[1:]
        self.price_diff[self.M] = r

    def calculate_return(self, signal):
        # R_t = F_{t-1} * r_t - cost * |F_t - F_{t-1}|
        R_t = self.pre_signal * self.price_diff[-1]
        R_t -= self.TRADING_COST * abs(signal - self.pre_signal)
        self.return_t = R_t
        self.cumulative_profit += R_t
        self.profit_store.append(self.cumulative_profit)

    def select_signal(self):
        values_sum = (self.weights[:self.M + 1] * self.price_diff).sum()
        values_sum += self.weights[-2] * self.pre_signal
        values_sum += self.bias * self.weights[-1]
        F_t_value = tanh(values_sum)
        return copysign(1, F_t_value), F_t_value

    def update_parameters(self):
        # update weights
        self.weights += self.rho * self.calculate_gradient_weights()
        print("weights", self.weights)
        # update the moments of R_t
        self.update_R_moment()

    def calculate_gradient_weights(self):
        """Differentiate D_t with respect to w_t."""
        denominator = self.pre_B - self.pre_A ** 2
        if denominator > self.EPS:
            # dD_t/dR_t = (B_{t-1} - A_{t-1} R_t) / (B_{t-1} - A_{t-1}^2)^{3/2}
            diff_D_R = self.pre_B - self.pre_A * self.return_t
            diff_D_R /= denominator ** 1.5
        else:
            diff_D_R = 0.0
        gradient_F = self.calculate_gradient_F()
        print("gradient_F", gradient_F)
        # dR_t/dF_t = -TRADING_COST, dR_t/dF_{t-1} = r_t - TRADING_COST
        delta_weights = -self.TRADING_COST * gradient_F
        delta_weights += (self.price_diff[-1] - self.TRADING_COST) \
            * self.pre_gradient_F
        delta_weights *= diff_D_R
        self.pre_gradient_F = gradient_F
        return delta_weights

    def calculate_gradient_F(self):
        """Differentiate F_t with respect to w_t."""
        diff_tanh = 1 - self.F_t_value ** 2
        diff_F_w = diff_tanh * np.r_[self.price_diff, self.pre_signal, self.bias]
        diff_F_F = diff_tanh * self.weights[-2]
        return diff_F_w + diff_F_F * self.pre_gradient_F

    def update_R_moment(self):
        delta_A = self.return_t - self.pre_A
        delta_B = self.return_t ** 2 - self.pre_B
        A_t = self.pre_A + self.eta * delta_A  # A_t: first moment of R_t
        B_t = self.pre_B + self.eta * delta_B  # B_t: second moment of R_t
        variance = max(B_t - A_t ** 2, self.EPS)
        self.sr_store.append(A_t / np.sqrt(variance))  # Sharpe ratio
        self.calculate_dsr(delta_A, delta_B)
        self.pre_A = A_t
        self.pre_B = B_t

    def calculate_dsr(self, delta_A, delta_B):
        dsr = self.pre_B * delta_A - 0.5 * self.pre_A * delta_B
        dsr /= max(self.pre_B - self.pre_A ** 2, self.EPS) ** 1.5
        self.dsr_store.append(dsr)


if __name__ == '__main__':
    M = 8
    fx_agent = RRLAgentForFX(M, rho=0.01, eta=0.01, bias=0.25)

    ifname = os.getcwd() + '/input/quote.csv'
    data = pd.read_csv(ifname)
    train_data = data.loc[:3000, 'USD']

    for price in train_data.values:
        fx_agent.train_online(price)
```
I downloaded the CSV file of historical exchange rates (per yen) from the Mizuho Bank Historical Data page and used that. I took USD/JPY from April 1, 2002 onward and trained on 3,000 data points.
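The plots below were produced from the series stored by the agent; a minimal sketch of how they can be drawn (assuming `fx_agent` has just been trained as above) is:

```python
# Plot the series accumulated during training: DSR, SR, and cumulative profit.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(3, 1, figsize=(8, 9), sharex=True)
axes[0].plot(fx_agent.dsr_store)
axes[0].set_title('DSR')
axes[1].plot(fx_agent.sr_store)
axes[1].set_title('SR')
axes[2].plot(fx_agent.profit_store)
axes[2].set_title('Cumulative profit')
axes[2].set_xlabel('step')
plt.tight_layout()
plt.show()
```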
(Plot: DSR over the training period)
(Plot: SR over the training period)
The results depend heavily on the values of ρ and η, and they are far too unstable. I intend to update the code whenever I notice a mistake; if you spot anything odd, I would greatly appreciate a comment.