[Python] Try optimizing FX systole parameters with random search
It is a continuation of. Let's implement a genetic algorithm (GA) instead of a random search.
The creation of hourly data is the same as last time.
import numpy as np
import pandas as pd
import indicators as ind #indicators.Import of py
from backtest import Backtest,BacktestReport
dataM1 = pd.read_csv('DAT_ASCII_EURUSD_M1_2015.csv', sep=';',
names=('Time','Open','High','Low','Close', ''),
index_col='Time', parse_dates=True)
dataM1.index += pd.offsets.Hour(7) #7 hour offset
ohlc = ind.TF_ohlc(dataM1, 'H') #Creation of 1-hour data
You need indicators.py and backtest.py uploaded on GitHub. For backtest.py, there is a slight fix in BacktestReport
.
This time, in order to increase the combination of parameter values to be optimized, we will add a signal for settlement to the crossing system of two moving averages. The signal for payment is defined as follows.
In this system, there are 3 parameters. This time, we will search each parameter in the following range.
SlowMAperiod = np.arange(7, 151) #Range of long-term moving average period
FastMAperiod = np.arange(5, 131) #Range of short-term moving averages
ExitMAperiod = np.arange(3, 111) #Range of moving average period for settlement
There are about 2 million combinations. It's possible to brute force, but it will take several hours.
The main routine of the genetic algorithm is almost the same as the previous random search. Simply replace the random search for parameters with the genetic processing described below. In addition, the trading signal adds a signal for settlement as in the above rule.
def Optimize(ohlc, Prange):
def shift(x, n=1): return np.concatenate((np.zeros(n), x[:-n])) #Shift function
SlowMA = np.empty([len(Prange[0]), len(ohlc)]) #Long-term moving average
for i in range(len(Prange[0])):
SlowMA[i] = ind.iMA(ohlc, Prange[0][i])
FastMA = np.empty([len(Prange[1]), len(ohlc)]) #Short-term moving average
for i in range(len(Prange[1])):
FastMA[i] = ind.iMA(ohlc, Prange[1][i])
ExitMA = np.empty([len(Prange[2]), len(ohlc)]) #Moving average for payment
for i in range(len(Prange[2])):
ExitMA[i] = ind.iMA(ohlc, Prange[2][i])
Close = ohlc['Close'].values #closing price
M = 20 #Population
Eval = np.zeros([M, 6]) #Evaluation item
Param = InitParam(Prange, M) #Parameter initialization
gens = 0 #Number of generations
while gens < 100:
for k in range(M):
i0 = Param[k,0]
i1 = Param[k,1]
i2 = Param[k,2]
#Buy entry signal
BuyEntry = (FastMA[i1] > SlowMA[i0]) & (shift(FastMA[i1]) <= shift(SlowMA[i0]))
#Sell entry signal
SellEntry = (FastMA[i1] < SlowMA[i0]) & (shift(FastMA[i1]) >= shift(SlowMA[i0]))
#Buy exit signal
BuyExit = (Close < ExitMA[i2]) & (shift(Close) >= shift(ExitMA[i2]))
#Sell exit signal
SellExit = (Close > ExitMA[i2]) & (shift(Close) <= shift(ExitMA[i2]))
#Backtest
Trade, PL = Backtest(ohlc, BuyEntry, SellEntry, BuyExit, SellExit)
Eval[k] = BacktestReport(Trade, PL)
#Alternation of generations
Param = Evolution(Param, Eval[:,0], Prange)
gens += 1
print(gens, Eval[0,0])
Slow = Prange[0][Param[:,0]]
Fast = Prange[1][Param[:,1]]
Exit = Prange[2][Param[:,2]]
return pd.DataFrame({'Slow':Slow, 'Fast':Fast, 'Exit':Exit, 'Profit': Eval[:,0], 'Trades':Eval[:,1],
'Average':Eval[:,2],'PF':Eval[:,3], 'MDD':Eval[:,4], 'RF':Eval[:,5]},
columns=['Slow','Fast','Exit','Profit','Trades','Average','PF','MDD','RF'])
Another change from the last time is that the range of the three parameters SlowMAperiod
, FastMAperiod
, and ʻExitMAperiod is put together in a list called
Prange` and passed to each function. By doing this, even if the number of parameters increases, it can be handled as it is.
In the above function, the functions added for GA are ʻInitParam () and ʻEvolution ()
. First, ʻInitParam ()` is the initialization of the parameters of each individual.
from numpy.random import randint,choice
#Parameter initialization
def InitParam(Prange, M):
Param = randint(len(Prange[0]), size=M)
for i in range(1,len(Prange)):
Param = np.vstack((Param, randint(len(Prange[i]), size=M)))
return Param.T
ʻEvolution ()` contains some genetic processing as follows:
#Genetic processing
def Evolution(Param, Eval, Prange):
#Roulette selection with elite storage
#1 point crossing
#Neighborhood generation
#mutation
return Param
Each process is explained below.
First, select the individual to be left for the next generation from the current individuals. There are several ways to select it, but I found a Numpy function that is useful for roulette selection, so I'll use that. Roulette selection is a method of selecting individuals that will be stochastically left for the next generation according to the magnitude of fitness, which is the evaluation value of the back test. The higher the fitness, the easier it is to remain.
The function used this time is a function called numpy.random.choice ()
, which randomly selects the required number from the list, but if you add a list of selection probabilities called p
to the optional argument, that It will choose according to the probability. This is the roulette selection itself. The code looks like this:
#Roulette selection with elite storage
Param = Param[np.argsort(Eval)[::-1]] #sort
R = Eval-min(Eval)
R = R/sum(R)
idx = choice(len(Eval), size=len(Eval), replace=True, p=R)
idx[0] = 0 #Elite save
Param = Param[idx]
However, it is inconvenient if the probability is negative, so it has been corrected so that the minimum value of fitness is 0. Also, if you only select roulette, you may not be unluckily selected even if the fitness is high, so we sort the fitness so that the highest individual (elite) will always remain in the next generation.
Next, the genes are crossed. This is to select two individuals and exchange some of their genetic information with each other. There are several methods of crossing, but here I chose one of the parameters and exchanged the front and back.
#1 point crossing
N = 10
idx = choice(np.arange(1,len(Param)), size=N, replace=False)
for i in range(0,N,2):
ix = idx[i:i+2]
p = randint(1,len(Prange))
Param[ix] = np.hstack((Param[ix][:,:p], Param[ix][:,p:][::-1]))
Again, use choice ()
to generate a sequence of random numbers for the number of intersecting individuals. By adding replace = False
, you can get a unique sequence of random numbers. Then, crossing is realized by selecting two individuals as ʻixand exchanging the data in the latter half of the crossing point
p`.
In normal GA, evolution is simulated by selection, crossing, and mutation, but if the crossing point is limited to the break of the parameter like this time, it will be full of the same individuals before you know it. Evolution will stop. However, if you increase the number of mutations, it will be close to a random search, so it is not very efficient. Therefore, this time, we will change some of the parameters to +1 or -1. It is a so-called neighborhood solution, which is often used in local search algorithms.
#Neighborhood generation
N = 10
idx = choice(np.arange(1,len(Param)), size=N, replace=False)
diff = choice([-1,1], size=N).reshape(N,1)
for i in range(N):
p = randint(len(Prange))
Param[idx[i]][p:p+1] = (Param[idx[i]][p]+diff[i]+len(Prange[p]))%len(Prange[p])
Select an individual that creates a neighborhood as well as a cross. Then, decide which parameter to change with a random number again, and change that parameter by 1.
Finally, mutate. There are several ways to do this, but some of the parameters of the selected individual are rewritten with new random numbers. In the case of GA, mutation is important to get out of the local solution, but if you use it a lot, the randomness will increase, so here we will set it to about two.
#mutation
N = 2
idx = choice(np.arange(1,len(Param)), size=N, replace=False)
for i in range(N):
p = randint(len(Prange))
Param[idx[i]][p:p+1] = randint(len(Prange[p]))
Let's execute the genetic algorithm using the function defined above.
result = Optimize(ohlc, [SlowMAperiod, FastMAperiod, ExitMAperiod])
result.sort_values('Profit', ascending=False)
GA also uses random numbers, so the results will be different each time. The following is an example of the results, showing the highest fitness for each generation. Since it is saved as an elite, the high fitness will be updated sequentially.
1 -94.9
2 958.2
3 958.2
4 958.2
5 1030.3
6 1030.3
7 1030.3
8 1454.0
9 1550.9
10 1550.9
11 1850.8
12 1850.8
13 1850.8
14 1850.8
15 1850.8
16 1850.8
17 2022.5
18 2076.5
19 2076.5
20 2076.5
:
61 2076.5
62 2076.5
63 2076.5
64 2076.5
65 2076.5
66 2316.2
67 2316.2
68 2316.2
69 2316.2
70 2316.2
:
95 2316.2
96 2316.2
97 2316.2
98 2316.2
99 2316.2
100 2316.2
The individuals that remained in the last generation are as follows.
Slow | Fast | Exit | Profit | Trades | Average | PF | MDD | RF | |
---|---|---|---|---|---|---|---|---|---|
0 | 126 | 17 | 107 | 2316.2 | 75.0 | 30.882667 | 2.306889 | 387.1 | 5.983467 |
18 | 126 | 15 | 107 | 2316.2 | 75.0 | 30.882667 | 2.306889 | 387.1 | 5.983467 |
8 | 105 | 18 | 106 | 2210.2 | 76.0 | 29.081579 | 2.247080 | 387.1 | 5.709636 |
17 | 126 | 18 | 108 | 2130.9 | 75.0 | 28.412000 | 2.158098 | 424.9 | 5.015062 |
10 | 126 | 18 | 107 | 2078.4 | 79.0 | 26.308861 | 1.980794 | 448.3 | 4.636181 |
9 | 127 | 18 | 107 | 2074.5 | 73.0 | 28.417808 | 2.184819 | 371.3 | 5.587126 |
6 | 126 | 15 | 7 | 2030.3 | 76.0 | 26.714474 | 2.007143 | 415.7 | 4.884051 |
16 | 126 | 14 | 107 | 2024.9 | 76.0 | 26.643421 | 2.100489 | 424.9 | 4.765592 |
5 | 126 | 17 | 107 | 1954.7 | 74.0 | 26.414865 | 1.917441 | 448.3 | 4.360250 |
13 | 126 | 17 | 105 | 1878.7 | 79.0 | 23.781013 | 1.888694 | 414.2 | 4.535732 |
2 | 127 | 18 | 107 | 1872.4 | 75.0 | 24.965333 | 1.878813 | 448.3 | 4.176667 |
12 | 126 | 17 | 101 | 1869.6 | 76.0 | 24.600000 | 2.063300 | 420.4 | 4.447193 |
11 | 92 | 15 | 107 | 1859.5 | 73.0 | 25.472603 | 2.006223 | 358.8 | 5.182553 |
14 | 125 | 14 | 108 | 1843.1 | 84.0 | 21.941667 | 1.811938 | 473.6 | 3.891681 |
4 | 124 | 14 | 107 | 1839.8 | 75.0 | 24.530667 | 1.975245 | 420.4 | 4.376308 |
3 | 42 | 19 | 107 | 1796.8 | 75.0 | 23.957333 | 1.912405 | 410.7 | 4.374970 |
1 | 125 | 15 | 106 | 1614.7 | 81.0 | 19.934568 | 1.711729 | 386.9 | 4.173430 |
19 | 104 | 18 | 107 | 1583.7 | 94.0 | 16.847872 | 1.654746 | 393.4 | 4.025674 |
7 | 125 | 17 | 106 | 1421.7 | 81.0 | 17.551852 | 1.629015 | 574.4 | 2.475104 |
15 | 92 | 16 | 107 | 539.8 | 103.0 | 5.240777 | 1.150513 | 605.1 | 0.892084 |
I haven't investigated the optimal solution to this problem, so I don't know, but I think this result is reasonably high. In the first place, the purpose of GA is not to find the optimal solution, but to find the quasi-optimal solution in a shorter time than brute force.
In fact, finding the optimal solution with the parameters of the systole does not mean that the system will produce the same results over different periods of time. So, if you get a decent solution in 2000 trials out of 2 million combinations, you should be fine.
Recommended Posts