Try to predict FX with LSTM using Keras + Tensorflow Part 2 (Calculate with GPU)

Last time I wrote that I would finally get going, and now I have. The reason is that deep learning and forex involve a large number of parameters, and I expected it to take a considerable amount of time to figure out which of them matter and what their right values are. In other words, you cannot really find them without a GPU. So, now that I can use a GPU, this time I will brute-force the various parameters I have been curious about and try to find good values.
The source can be found at https://github.com/rakichiki/keras_fx, or you can clone it:
git clone https://github.com/rakichiki/keras_fx.git
The notebook this time is keras_fx_gpu_multi.ipynb. Get it and upload it to Jupyter.
Let me explain a little.
First, I decided which parameters I wanted to vary. They are listed below (if you look closely, not all of them are actually swept ...).
Brute force
```python
l_of_s_list = [20, 25]
n_next_list = [5, 7]
check_treshhold_list = [0.50, 0.60]
#activation_list = ['sigmoid','tanh','linear']
activation_list = ['tanh']
#loss_func_list = ['mean_squared_error','mean_absolute_error','mean_squared_logarithmic_error']
loss_func_list = ['mean_squared_error','mean_absolute_error']
#optimizer_func_list = ['sgd','adadelta','adam','adamax']
optimizer_func_list = ['adadelta','adam','adamax']
#validation_split_number_list = [0.1,0.05]
validation_split_number_list = [0.05]
currency_pair_list = ['usdjpy']

# Storage of result files
if os.path.exists('result') == False:
    os.mkdir('result')
if os.path.exists('png') == False:
    os.mkdir('png')

# Time-stamped base name for the result and png files
# (the second assignment overwrites the first, so only the bare timestamp is kept)
save_file_name = 'result/result_' + dt.today().strftime("%Y%m%d%H%M%S") + '.txt'
save_file_name = dt.today().strftime("%Y%m%d%H%M%S")

# fx data acquisition
start_day = "20010101"
end_day = dt.today().strftime("%Y%m%d")

for currency_pair in currency_pair_list:
    (train_start_count, train_end_count, test_start_count, test_end_count, data) = \
        get_date(start_day, end_day, currency_pair)
    file_name = currency_pair + '_d.csv'

    for l_of_s in l_of_s_list:
        for n_next in n_next_list:
            for check_treshhold in check_treshhold_list:
                # build the dataset for this window / horizon / threshold combination
                (chane_data, average_value, diff_value, up_down, check_percent) = \
                    get_data(l_of_s, n_next, check_treshhold, file_name, train_start_count, \
                             train_end_count, test_start_count, test_end_count, data)
                # train and evaluate one model per remaining combination
                for activation in activation_list:
                    for loss_func in loss_func_list:
                        for optimizer_func in optimizer_func_list:
                            for validation_split_number in validation_split_number_list:
                                print('--------------------------')
                                fit_starttime = time.time()
                                fit(l_of_s, n_next, check_treshhold, file_name, save_file_name, activation, loss_func, optimizer_func, \
                                    validation_split_number, train_start_count, train_end_count, test_start_count, test_end_count, \
                                    chane_data, average_value, diff_value, up_down, check_percent)
                                print(str(math.floor(time.time() - fit_starttime)) + "s")
                                print('')
```
I would like to say that I will brute-force all of these across the full range I care about, but since the run time grows explosively, it is better to narrow things down to some extent and investigate little by little. Even if the GPU makes things 10 times faster, if the amount of computation grows 1,000 times you are back to square one (though this is already well beyond what a CPU could handle).
Also, because of the problem described later, go little by little rather than running a huge sweep right from the start (I am the one who failed by running too much at the beginning).
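To get a feel for how quickly the grid grows, here is a minimal sketch (using the parameter lists from the block above) that simply counts the combinations the nested loops will visit:

```python
import itertools

# the parameter lists defined in the brute-force block above
param_lists = [l_of_s_list, n_next_list, check_treshhold_list, activation_list,
               loss_func_list, optimizer_func_list, validation_split_number_list,
               currency_pair_list]

# every combination the nested loops will run
combinations = list(itertools.product(*param_lists))
print(len(combinations))  # 2 * 2 * 2 * 1 * 2 * 3 * 1 * 1 = 48 patterns
```

Multiply the per-pattern run time by this count before committing to a sweep.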
The amount of computation has exploded because of the brute-forced parameters, and the speed-up from introducing the GPU alone is not enough. So I also introduce Early Stopping so that the epochs do not loop longer than necessary.
EarlyStopping
```python
from keras.callbacks import EarlyStopping

# stop training once val_loss has not improved for 10 consecutive epochs
early_stopping = EarlyStopping(monitor='val_loss', patience=10, verbose=1)

~~

high_history = high_model.fit(X_high_train, y_high_train, batch_size=100, epochs=300, \
                              validation_split=validation_split_number, callbacks=[early_stopping])
```
Keras makes this part easy. However, I am not sure whether this is really the right condition for Early Stopping.
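If you want to tighten or loosen that condition, EarlyStopping accepts a few more arguments. A minimal sketch follows; the min_delta and patience values here are placeholders I have not tuned, and restore_best_weights is only available in newer Keras versions:

```python
from keras.callbacks import EarlyStopping

# require at least a 1e-4 improvement in val_loss, wait up to 20 epochs for it,
# and (in newer Keras) roll back to the weights of the best epoch when stopping
early_stopping = EarlyStopping(monitor='val_loss',
                               min_delta=1e-4,
                               patience=20,
                               verbose=1,
                               restore_best_weights=True)
```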
Of course, you cannot tell whether the parameters are reasonable without looking at the learning curve, and adding one is not difficult: just keep the return value of fit and plot it.
Learning curve
```python
import matplotlib.pyplot as plt

# Learning
high_history = high_model.fit(X_high_train, y_high_train, batch_size=100, epochs=300, \
                              validation_split=validation_split_number, callbacks=[early_stopping])

~~~~

# high
val_loss = high_history.history['val_loss']
plt.rc('font', family='serif')
fig = plt.figure()
plt.plot(range(len(high_history.history['val_loss'])), val_loss, label='val_loss', color='black')
plt.xlabel('epochs')
# save first, then show: calling plt.show() before savefig leaves an empty image
plt.savefig('png/' + save_file_name + '_high_' + \
            str(l_of_s) + '_' + str(n_next) + \
            '_' + str(check_treshhold) + '_' + file_name + \
            '_' + activation + '_' + loss_func + \
            '_' + optimizer_func + '_' + str(validation_split_number) + \
            '.png')
plt.show()
```
One caveat: if you want to keep the graph, call plt.show() after plt.savefig(). I do not know the exact reason, but in the reverse order the saved image does not survive (I picked this up from an answer on a Q&A site somewhere).
When things go well, a graph of how val_loss evolves is displayed like the one below.
Of course, a nice-looking curve does not by itself mean the hit rate is good, but the graph does tell you whether learning is happening at all.
The sweep is expected to take a very long time, and the PC may go down along the way. I am not the type who wants to keep a PC without ECC memory running for more than 10 hours while praying that it does not crash.
So I save the analysis results to a file as I go, so that something survives even if the PC dies in the middle (if the storage itself fails, I give up).
File output
```python
# append this run's settings and results so something survives if the PC dies mid-sweep
f = open(save_file_name, 'a')
f.write('l_of_s: ' + str(l_of_s) + ' n_next: ' + str(n_next) + \
        ' check_treshhold:' + str(check_treshhold) + ' file_name:' + file_name + \
        ' activation:' + activation + ' loss_func:' + loss_func + \
        ' optimizer_func:' + optimizer_func + ' validation_split_number:' + str(validation_split_number) + \
        '\n')
f.write('UP: ' + str(up_ok_count) + ' - ' + str(up_ng_count) + ' - ' + str(up_ev_count) + '\n')
f.write('DN: ' + str(down_ok_count) + ' - ' + str(down_ng_count) + ' - ' + str(down_ev_count) + '\n')
f.close()
```
Would CSV have been better? Or JSON (I do like JSON)? For now I simply dump the progress as plain text. Actually, a single JSON document is no good here, because a run that dies part-way through would leave it unparseable.
You may also want to save the graph, as mentioned above.
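If crash tolerance is the main worry, one alternative (not what the script above does) is to append one JSON object per line, so every line already written stays parseable even if the run dies part-way. A rough sketch, with field names chosen just for illustration:

```python
import json

# one self-contained JSON object per pattern, appended as a single line
result = {
    'l_of_s': l_of_s, 'n_next': n_next, 'check_treshhold': check_treshhold,
    'activation': activation, 'loss_func': loss_func, 'optimizer_func': optimizer_func,
    'up_ok': up_ok_count, 'up_ng': up_ng_count,
    'down_ok': down_ok_count, 'down_ng': down_ng_count,
}
with open(save_file_name + '.jsonl', 'a') as f:
    f.write(json.dumps(result) + '\n')
```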
I ran a fair number of combinations. For the reason described later, though, the sweep is a modest one (please don't point out that there is only one activation-function pattern).
The only currency pair is usdjpy. The results are as follows (anything beyond the number of days used for the trading judgment is not included in the hit rate).
Days for trading judgment | Days after buying/selling | Change rate for trading judgment | Activation function | Loss function | Optimizer | Validation split | Hits on up predictions | Misses on up predictions | Hits on down predictions | Misses on down predictions | Overall hit rate (%) |
---|---|---|---|---|---|---|---|---|---|---|---|
20 | 5 | 0.5 | tanh | mse | adadelta | 0.05 | 55 | 34 | 114 | 81 | 59.5 |
20 | 5 | 0.5 | tanh | mse | adam | 0.05 | 24 | 22 | 66 | 46 | 57.0 |
20 | 5 | 0.5 | tanh | mse | adamax | 0.05 | 14 | 14 | 46 | 33 | 56.1 |
20 | 5 | 0.5 | tanh | mae | adadelta | 0.05 | 69 | 58 | 95 | 88 | 52.9 |
20 | 5 | 0.5 | tanh | mae | adam | 0.05 | 31 | 28 | 69 | 58 | 53.8 |
20 | 5 | 0.5 | tanh | mae | adamax | 0.05 | 29 | 26 | 84 | 69 | 54.3 |
20 | 5 | 0.6 | tanh | mse | adadelta | 0.05 | 72 | 53 | 129 | 98 | 57.1 |
20 | 5 | 0.6 | tanh | mse | adam | 0.05 | 64 | 52 | 111 | 97 | 54.0 |
20 | 5 | 0.6 | tanh | mse | adamax | 0.05 | 43 | 33 | 59 | 52 | 54.5 |
20 | 5 | 0.6 | tanh | mae | adadelta | 0.05 | 51 | 40 | 140 | 120 | 54.4 |
20 | 5 | 0.6 | tanh | mae | adam | 0.05 | 75 | 57 | 102 | 75 | 57.3 |
20 | 5 | 0.6 | tanh | mae | adamax | 0.05 | 45 | 39 | 107 | 93 | 53.5 |
20 | 7 | 0.5 | tanh | mse | adadelta | 0.05 | 11 | 12 | 84 | 81 | 50.5 |
20 | 7 | 0.5 | tanh | mse | adam | 0.05 | 7 | 5 | 45 | 35 | 56.5 |
20 | 7 | 0.5 | tanh | mse | adamax | 0.05 | 22 | 18 | 61 | 40 | 58.9 |
20 | 7 | 0.5 | tanh | mae | adadelta | 0.05 | 46 | 37 | 92 | 81 | 53.9 |
20 | 7 | 0.5 | tanh | mae | adam | 0.05 | 25 | 28 | 47 | 31 | 55.0 |
20 | 7 | 0.5 | tanh | mae | adamax | 0.05 | 20 | 28 | 75 | 62 | 51.4 |
20 | 7 | 0.6 | tanh | mse | adadelta | 0.05 | 23 | 16 | 39 | 39 | 53.0 |
20 | 7 | 0.6 | tanh | mse | adam | 0.05 | 24 | 21 | 77 | 67 | 53.4 |
20 | 7 | 0.6 | tanh | mse | adamax | 0.05 | 27 | 26 | 61 | 45 | 55.3 |
20 | 7 | 0.6 | tanh | mae | adadelta | 0.05 | 56 | 43 | 120 | 107 | 54.0 |
20 | 7 | 0.6 | tanh | mae | adam | 0.05 | 40 | 36 | 65 | 58 | 52.8 |
20 | 7 | 0.6 | tanh | mae | adamax | 0.05 | 49 | 41 | 60 | 54 | 53.4 |
25 | 5 | 0.5 | tanh | mse | adadelta | 0.05 | 54 | 32 | 86 | 60 | 60.3 |
25 | 5 | 0.5 | tanh | mse | adam | 0.05 | 25 | 21 | 59 | 41 | 57.5 |
25 | 5 | 0.5 | tanh | mse | adamax | 0.05 | 15 | 14 | 53 | 39 | 56.2 |
25 | 5 | 0.5 | tanh | mae | adadelta | 0.05 | 46 | 37 | 126 | 95 | 56.6 |
25 | 5 | 0.5 | tanh | mae | adam | 0.05 | 34 | 30 | 56 | 41 | 55.9 |
25 | 5 | 0.5 | tanh | mae | adamax | 0.05 | 25 | 24 | 69 | 47 | 57.0 |
25 | 5 | 0.6 | tanh | mse | adadelta | 0.05 | 23 | 21 | 108 | 94 | 53.3 |
25 | 5 | 0.6 | tanh | mse | adam | 0.05 | 19 | 20 | 58 | 51 | 52.0 |
25 | 5 | 0.6 | tanh | mse | adamax | 0.05 | 18 | 19 | 86 | 69 | 54.2 |
25 | 5 | 0.6 | tanh | mae | adadelta | 0.05 | 92 | 80 | 92 | 85 | 52.7 |
25 | 5 | 0.6 | tanh | mae | adam | 0.05 | 26 | 28 | 117 | 100 | 52.8 |
25 | 5 | 0.6 | tanh | mae | adamax | 0.05 | 32 | 31 | 126 | 102 | 54.3 |
25 | 7 | 0.5 | tanh | mse | adadelta | 0.05 | 32 | 18 | 110 | 95 | 55.7 |
25 | 7 | 0.5 | tanh | mse | adam | 0.05 | 16 | 16 | 37 | 19 | 60.2 |
25 | 7 | 0.5 | tanh | mse | adamax | 0.05 | 9 | 10 | 42 | 28 | 57.3 |
25 | 7 | 0.5 | tanh | mae | adadelta | 0.05 | 33 | 23 | 40 | 30 | 57.9 |
25 | 7 | 0.5 | tanh | mae | adam | 0.05 | 25 | 21 | 71 | 55 | 55.8 |
25 | 7 | 0.5 | tanh | mae | adamax | 0.05 | 36 | 29 | 55 | 38 | 57.6 |
25 | 7 | 0.6 | tanh | mse | adadelta | 0.05 | 43 | 35 | 104 | 92 | 53.6 |
25 | 7 | 0.6 | tanh | mse | adam | 0.05 | 23 | 23 | 63 | 58 | 51.5 |
25 | 7 | 0.6 | tanh | mse | adamax | 0.05 | 25 | 22 | 90 | 70 | 55.6 |
25 | 7 | 0.6 | tanh | mae | adadelta | 0.05 | 37 | 25 | 118 | 108 | 53.8 |
25 | 7 | 0.6 | tanh | mae | adam | 0.05 | 33 | 25 | 76 | 63 | 55.3 |
25 | 7 | 0.6 | tanh | mae | adamax | 0.05 | 40 | 25 | 74 | 59 | 57.6 |
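For reference, the overall hit rate column appears to be simply hits divided by all counted trades. Checking the first row:

```python
# first row of the table: hits/misses on up and down predictions
up_ok, up_ng, down_ok, down_ng = 55, 34, 114, 81
hit_rate = 100.0 * (up_ok + down_ok) / (up_ok + up_ng + down_ok + down_ng)
print(round(hit_rate, 1))  # 59.5
```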
The hit rate was about 60% at best, about 50% at worst, and roughly 55% on average, which is only a little better than rolling dice. Incidentally, it took about 2 hours to run the 48 patterns (on a GeForce GTX 1070), and adding more parameters is expected to blow the run time up even further. So I will need to speed things up somewhere, and the hit rate itself also needs work, but before that a bigger problem turned up.
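A back-of-the-envelope estimate based on that timing shows why the grid has to stay small; the 1,000-pattern figure below is the sweep size I originally attempted:

```python
# 48 patterns took roughly 2 hours on the GTX 1070
seconds_per_pattern = 2 * 60 * 60 / 48.0          # about 150 s per pattern
patterns = 1000                                    # the sweep I originally tried
print(patterns * seconds_per_pattern / 3600.0)     # roughly 42 hours
```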
I was able to narrow down the parameters to some extent, but I also ran into a disappointing problem.
It is memory consumption. Initially I was sweeping around 1,000 patterns, but part-way through everything became very slow. After taking countermeasures, the state of the PC after running just under 48 patterns was as follows.
The PC itself was consuming 12 GB of memory and the GPU about 2 GB. I did not capture it while a single pattern was running, but in that case the GPU used less than 1 GB and the machine less than 4 GB.
If anything, this looks like a memory leak. This PC originally had only 8 GB of memory, so in a hurry I swapped the modules (the case is Mini-ITX with only two memory slots) and went up to 32 GB (16 GB would probably have been enough, but half-hearted investment rarely pays off, so I went straight to 32 GB).
I do not know why the memory is not being released, but if you want to run more with this script, you should factor in the memory consumption along with the number of patterns and the run time. I have not found a workaround so far.
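One mitigation that is often tried when Keras memory grows across a training loop (I have not verified it on this script) is to clear the backend session and force garbage collection between patterns:

```python
import gc
from keras import backend as K

# at the end of each pattern in the innermost loop
K.clear_session()   # drop the accumulated TensorFlow graph
gc.collect()        # let Python release whatever it can
```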
This series was actually only planned up to this point. Going forward I will do my best to improve on results that are barely better than dice, or within the margin of error, but there is no guarantee that I will get there.
For that reason I do not know how far I can go, but I think there are plenty of things to try. Here is what I have in mind at this point:
It looks like a lot could be done, but frankly it is more than a single GPU is likely to handle. In that case it may become necessary to install multiple GPUs or rent time on AWS. I plan to think about the next steps with that in mind.