This method has not yet produced good results. I am still at the stage of trying out ideas as a hobby, so this will probably not be useful to anyone looking for a tool they can use right away. Please keep that in mind. m(_ _)m
[Previous article] Examination of exchange rate forecasting method using deep learning and wavelet transform
Figure 1 summarizes what we did this time. We examined whether the direction (up or down) of the exchange rate 12 hours ahead can be predicted by applying a continuous wavelet transform to the 5-minute closing prices of USD/JPY and EUR/JPY, converting the result into images, and training an AI (a CNN) on them. Over 8,000 or more training iterations (beyond this point, the accuracy on the training data no longer increases even with more iterations), the average accuracy on the test data was 53.7%.
Figure 1. Summary of what I did this time
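The prediction target can be sketched as follows. This is a hypothetical labelling rule for illustration only, since the function that builds the labels is not shown in this section. A window is labelled 1 ("up") if the close 144 bars after the window's last bar is higher than that last close. 144 five-minute bars are 12 hours, matching `predict_time_inc` in the script. Otherwise the window is labelled 0, and ties count as 0 by assumption.

```python
import numpy as np

BARS_PER_HOUR = 12              # 5-minute bars per hour
HORIZON = 12 * BARS_PER_HOUR    # 12 hours ahead = 144 bars (predict_time_inc in the script)


def updown_labels(close, window_ends, horizon=HORIZON):
    """Hypothetical labelling: 1 if close[end + horizon] > close[end], else 0.

    close       : 1-D sequence of closing prices
    window_ends : index of the last bar of each input window
    """
    close = np.asarray(close, dtype=float)
    window_ends = np.asarray(window_ends)
    future = close[window_ends + horizon]               # price 12 hours later
    return (future > close[window_ends]).astype(int)    # ties are treated as "down"
```

For example, `updown_labels(prices, [0, 36, 72])` would label three windows that end 36 bars (3 hours) apart.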
Figure 2 shows a schematic diagram of the wavelet transform. The Fourier transform represents a complex waveform as a sum of sine waves that extend infinitely in time. The wavelet transform, on the other hand, represents a complex waveform as a sum of localized waves (wavelets). The Fourier transform is good at analyzing stationary signals, whereas the wavelet transform is suited to irregular, non-stationary waveforms.
Figure 2. Schematic diagram of wavelet transform Source: https://www.slideshare.net/ryosuketachibana12/ss-42388444
A map of the wavelet strength at each shift (time) and each scale (frequency) is called a scalogram. Figure 3 is a scalogram created from the wavelet transform of y = sin(πx/16). Using the wavelet transform in this way, any waveform can be turned into an image.
Figure 3. Scalogram example, y = sin (πx / 16)
There are two types of wavelet transform: the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT). This time we used the CWT. Wavelets come in many shapes, but for now we use a Gaussian wavelet (the script uses "gaus1", the first derivative of a Gaussian).
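To show what a CWT scalogram is numerically, here is a minimal NumPy-only sketch. It is not the author's code: the script calls PyWavelets, whose normalization and scale-to-frequency conversion differ. The sketch uses an unnormalised first-derivative-of-Gaussian wavelet and computes W(a, b) = (1/√a) Σ x(t) ψ((t − b)/a) for each scale a and shift b. The ±4-scale truncation of the wavelet is an assumption made for this sketch.

```python
import numpy as np


def gaus1(t):
    """First derivative of a Gaussian (unnormalised): an odd, zero-mean wavelet."""
    return -t * np.exp(-(t ** 2) / 2)


def cwt_scalogram(signal, scales):
    """Sketch of a CWT: W(a, b) = (1 / sqrt(a)) * sum_t x(t) * psi((t - b) / a).

    Returns |W| with shape (len(scales), len(signal)):
    rows = scale (frequency), columns = shift (time).
    """
    x = np.asarray(signal, dtype=float)
    out = np.empty((len(scales), x.size))
    for i, a in enumerate(scales):
        half = int(np.ceil(4 * a))              # truncate the wavelet at +-4 scale units
        u = np.arange(-half, half + 1)          # offsets t - b
        psi = gaus1(u / a) / np.sqrt(a)         # dilated wavelet with 1/sqrt(a) factor
        xp = np.pad(x, half, mode="constant")   # zero padding keeps the output length len(x)
        # 'valid' correlation: out[b] = sum_u x[b + u] * psi(u / a) / sqrt(a)
        out[i] = np.abs(np.correlate(xp, psi, mode="valid"))
    return out
```

For the Figure 3 example, `cwt_scalogram(np.sin(np.pi * np.arange(256) / 16), np.arange(1, 33))` gives a 32 x 256 array. It can be shown as an image with `matplotlib.pyplot.imshow(..., aspect="auto")`.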
Last time, we predicted the price movement of USD/JPY from USD/JPY alone; this time we predict it from both USD/JPY and EUR/JPY. I wanted to create scalograms from data at the same timestamps for USD/JPY and EUR/JPY, but the exchange data I used has missing timestamps. It was therefore necessary to extract only the data at timestamps that exist in both USD/JPY and EUR/JPY, which is what the following code does. The exchange data I used is shown in Table 1.
Table 1. USD/JPY 5-minute data
align_USD_EUR
import numpy as np

def align_USD_EUR(USD_csv, EUR_csv):
    """
    Delete missing data in USD/JPY and EUR/JPY and extract the closing prices
    at the timestamps that exist in both
    USD_csv : file name of the USD/JPY 5-minute data
    EUR_csv : file name of the EUR/JPY 5-minute data
    """
    # Columns 0, 1, 5 = date, time, closing price (read as byte strings)
    USD = np.loadtxt(USD_csv, delimiter=",", usecols=(0, 1, 5), skiprows=1, dtype="S8")
    EUR = np.loadtxt(EUR_csv, delimiter=",", usecols=(0, 1, 5), skiprows=1, dtype="S8")
    print("EUR shape " + str(EUR.shape))  # for debug
    print("USD shape " + str(USD.shape))  # for debug
    print("")
    USD_close = USD[:, 2]
    EUR_close = EUR[:, 2]
    # date + time -> one timestamp string per row
    USD = np.char.add(USD[:, 0], USD[:, 1])
    EUR = np.char.add(EUR[:, 0], EUR[:, 1])

    def first_mismatch(a, b):
        """Index of the first differing timestamp within the common length,
        or the common length itself if the overlapping parts all match."""
        n = min(a.shape[0], b.shape[0])
        idx = np.where(a[:n] != b[:n])[0]
        return idx[0] if idx.size > 0 else n

    # Index where the times do not match (idx_mismatch)
    idx_mismatch = first_mismatch(USD, EUR)
    while idx_mismatch < min(USD.shape[0], EUR.shape[0]):
        print("idx mismatch " + str(idx_mismatch))  # for debug
        print("USD[idx_mismatch] " + str(USD[idx_mismatch]))
        print("EUR[idx_mismatch] " + str(EUR[idx_mismatch]))
        # Delete the row whose timestamp is missing from the other currency pair
        if USD[idx_mismatch] > EUR[idx_mismatch]:
            EUR = np.delete(EUR, idx_mismatch)
            EUR_close = np.delete(EUR_close, idx_mismatch)
        else:
            USD = np.delete(USD, idx_mismatch)
            USD_close = np.delete(USD_close, idx_mismatch)
        print("EUR shape " + str(EUR.shape))  # for debug
        print("USD shape " + str(USD.shape))  # for debug
        print("")
        idx_mismatch = first_mismatch(USD, EUR)
    # Drop trailing rows that exist in only one of the two files
    USD, EUR = USD[:idx_mismatch], EUR[:idx_mismatch]
    USD_close, EUR_close = USD_close[:idx_mismatch], EUR_close[:idx_mismatch]
    # Columns of USD_EUR.csv: USD time, EUR time, USD close, EUR close
    USD = np.reshape(USD, (-1, 1))
    EUR = np.reshape(EUR, (-1, 1))
    USD_close = np.reshape(USD_close, (-1, 1))
    EUR_close = np.reshape(EUR_close, (-1, 1))
    USD = np.append(USD, EUR, axis=1)
    USD = np.append(USD, USD_close, axis=1)
    USD = np.append(USD, EUR_close, axis=1)
    # Decode the byte strings so the CSV does not contain b'...'
    np.savetxt("USD_EUR.csv", np.char.decode(USD, "ascii"), delimiter=",", fmt="%s")
    return USD_close, EUR_close
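The function above deletes one mismatched row at a time. A shorter alternative, offered as a sketch rather than the author's method, is `np.intersect1d` with `return_indices=True`. It assumes each timestamp appears at most once per file and that the timestamp strings sort chronologically. The timestamp format used in the example is hypothetical.

```python
import numpy as np


def align_by_timestamp(usd_time, usd_close, eur_time, eur_close):
    """Keep only rows whose timestamp appears in both series.

    Assumes timestamps are unique within each series and sort chronologically.
    Returns (common timestamps, USD closes, EUR closes), sorted by timestamp.
    """
    common, i_usd, i_eur = np.intersect1d(
        usd_time, eur_time, assume_unique=True, return_indices=True
    )
    return common, np.asarray(usd_close)[i_usd], np.asarray(eur_close)[i_eur]
```

Because `intersect1d` works on whole arrays, it avoids repeatedly copying the data with `np.delete` inside a loop.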
Last time, every scalogram was created from one day of waveform data. However, people who actually trade change the period of the waveform they evaluate as needed. So this time we created scalograms from data covering different periods, chosen as follows. Finer steps would be possible, but this was the limit imposed by memory.
Training data: 1 day, 1.5 days, 2 days, 2.5 days, 3 days
Test data: 1 day
The problem is that the size of the scalogram changes with the data period, and a CNN cannot learn from images of different sizes. We therefore unified the image size to 128x128 using the image processing library Pillow.
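The periods above map to the `train_heights` list in the script, because a day contains 288 five-minute bars. The sketch below checks that mapping. It also counts how many windows a series yields. That count assumes each window starts `step` bars after the previous one (as `tr_over_lap_inc` suggests) and must end at least `predict_time_inc` bars before the end of the data. The increment step of the loop is not shown in this section, so the counting rule is an assumption.

```python
BARS_PER_DAY = 24 * 60 // 5  # 288 five-minute bars in 24 hours


def days_to_bars(days):
    """Number of 5-minute bars in `days` days."""
    return int(round(days * BARS_PER_DAY))


def num_windows(num_bars, height, predict_time_inc=144, step=4):
    """Count windows [start, start + height) with start = 0, step, 2*step, ...
    such that start + height <= num_bars - predict_time_inc (assumed rule)."""
    last_end = num_bars - predict_time_inc
    if last_end < height:
        return 0
    return (last_end - height) // step + 1


train_heights = [days_to_bars(d) for d in (1, 1.5, 2, 2.5, 3)]
```

Because every height yields its own set of windows, each additional period adds roughly another full copy of the training images, which is consistent with memory becoming the limit.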
Figure 4. Schematic diagram of unified image size
Resize scalogram with Pillow
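import numpy as np
from PIL import Image

def resize_scalogram(original_scalogram, width=128, height=128):
    """
    original_scalogram : original scalogram (2-D numpy array)
    width : image width after resizing (=128)
    height : image height after resizing (=128)
    """
    # Convert to a 32-bit float image object (Pillow mode "F")
    img_scalogram = Image.fromarray(np.asarray(original_scalogram, dtype=np.float32))
    img_scalogram = img_scalogram.resize((width, height))  # Resize; Pillow takes (width, height)
    array_scalogram = np.array(img_scalogram)  # Convert back to a numpy array
    return array_scalogram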
The CNN structure and the training flow are shown in Figures 5 and 6, respectively.
Figure 5. CNN structure
Figure 6. Learning flow
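Since Figure 5 is an image, the layer sizes can be worked out from the script in the appendix. It uses a 128 x 128 x 2 input (USD/JPY and EUR/JPY channels), followed by three blocks of 5x5 convolution (16, 32 and 64 filters, stride 1, "SAME" padding) and 2x2 max pooling with stride 2. These feed a 1024-unit fully connected layer and a 2-unit output. The sketch below only does that arithmetic and does not build the network.

```python
def same_out(size, stride=2):
    """Spatial size after a TensorFlow op with padding="SAME": ceil(size / stride)."""
    return -(-size // stride)


def cnn_shapes(height=128, width=128, channels=(16, 32, 64)):
    """Feature-map shapes after each 5x5 conv (stride 1, SAME) + 2x2 max-pool (stride 2)."""
    shapes = []
    h, w = height, width
    for c in channels:
        h, w = same_out(h), same_out(w)  # the conv keeps h, w; the pool halves them
        shapes.append((h, w, c))
    return shapes


def count_params(in_ch=2, channels=(16, 32, 64), k=5, flat=16 * 16 * 64,
                 fc_units=1024, num_classes=2):
    """Weights + biases of the three conv layers, fc1 and fc2."""
    total, prev = 0, in_ch
    for c in channels:
        total += k * k * prev * c + c  # conv kernel + bias
        prev = c
    total += flat * fc_units + fc_units  # fc1
    total += fc_units * num_classes + num_classes  # fc2
    return total
```

The flattened feature map has 16 x 16 x 64 = 16,384 values. The network has about 16.8 million parameters, of which fc1 alone accounts for about 16.78 million.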
Figure 7 shows how the accuracy on the training data and the test data changes during training. Beyond 8,000 training iterations, the accuracy on the training data no longer increases. The average accuracy on the test data between 8,000 and 20,000 iterations was 53.7%. The test accuracy appears to rise between 0 and 4,000 iterations, but I cannot draw any conclusion from that.
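The script records accuracy every 100 steps in `acc_list.csv`, with the columns step, training accuracy, test accuracy, training loss and test loss. An average over a step range can be computed from it as sketched below. The exact way the 53.7% figure was averaged is not stated, so this is an assumption.

```python
import numpy as np


def mean_test_accuracy(acc_list, start_step=8000, end_step=20000):
    """Mean test accuracy over checkpoints with start_step <= step <= end_step.

    acc_list rows: [step, train_acc, test_acc, train_loss, test_loss]
    (the layout written to acc_list.csv by the script).
    """
    acc = np.asarray(acc_list, dtype=float)
    in_range = (acc[:, 0] >= start_step) & (acc[:, 0] <= end_step)
    return acc[in_range, 2].mean()
```

Usage: `mean_test_accuracy(np.loadtxt(r"temp_result\acc_list.csv", delimiter=","))`.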
Figure 7. Transition of the accuracy rate
The AI outputs its prediction as probabilities such as "up: 82%, down: 18%". Figure 8 shows how the predictions for the test data change during training. Early in training, confidence is low for most samples, for example "up: 52%, down: 48%". As training proceeds, however, almost all predictions fall in the 90% to 100% confidence range. Given such confident answers, an accuracy of only 53.7% seems strange.
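The training script groups test predictions into confidence bands with five hand-written blocks. Each block checks the probability of one class, p, together with its mirror 1 − p. A compact equivalent, offered as a sketch, defines confidence as max(p, 1 − p) and uses `np.searchsorted`. The bands are [0.5, 0.6], (0.6, 0.7], ..., (0.9, 1.0]. They can differ from the script only for values lying exactly on a boundary.

```python
import numpy as np


def confidence_bands(prob_class0, correct, edges=(0.5, 0.6, 0.7, 0.8, 0.9, 1.0)):
    """Count predictions and accuracy per confidence band.

    prob_class0 : predicted probability of one class (AI_prediction[:, 0])
    correct     : boolean array, True where the prediction was right
    Returns (counts, accuracies); accuracy is nan for an empty band.
    """
    p = np.asarray(prob_class0, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    conf = np.maximum(p, 1.0 - p)  # confidence of whichever class was chosen
    # band 0: [0.5, 0.6], band 1: (0.6, 0.7], ..., band 4: (0.9, 1.0]
    idx = np.searchsorted(edges[1:-1], conf, side="left")
    counts, accs = [], []
    for k in range(len(edges) - 1):
        mask = idx == k
        n = int(mask.sum())
        counts.append(n)
        accs.append(float(correct[mask].mean()) if n else float("nan"))
    return counts, accs
```

Plotting the counts per band for each checkpoint shows the shift toward the 90% to 100% band seen in Figure 8.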
Figure 8. Transition of the prediction results for the test data
So, although the accuracy was above 50%, I still cannot tell whether this was a coincidence or whether the model captured real features... I think it is necessary to check whether the same accuracy is obtained when the training and test periods are changed. As Figure 8 shows, the number of high-confidence predictions grows as training progresses, which means the training data contains scalograms with features similar to those in the test data. I suspect the accuracy does not improve because, even when the AI judges two scalograms to be similar, the subsequent price movements in the training data and the test data do not match. I therefore hope that using not only the euro but also other currencies and financial data (increasing the number of data channels) will make the scalograms more diverse and reduce these misjudgments. For now, there are far too many high-confidence predictions compared with the accuracy... lol
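One quick way to ask whether an accuracy above 50% could be a coincidence is a binomial test against chance, sketched here with the normal approximation. The number of test samples is not given in the article, so `n` must be supplied by the reader.

Two caveats make this an optimistic check. First, consecutive test windows overlap heavily (they start 36 bars apart but span 288 bars and look 144 bars ahead), so the samples are not independent and the effective n is smaller than the window count. Second, if the up/down classes are unbalanced, `p0` should be the majority-class rate, which the script prints as `mean_t_test`, rather than 0.5.

```python
import math


def accuracy_z_test(accuracy, n, p0=0.5):
    """Normal approximation to a binomial test of `accuracy` on `n`
    independent samples against the chance level p0.

    Returns (z, two-sided p-value).
    """
    se = math.sqrt(p0 * (1 - p0) / n)              # standard error under chance
    z = (accuracy - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # = 2 * (1 - Phi(|z|))
    return z, p_value
```

Usage: `accuracy_z_test(0.537, n)` with the actual number of test windows for `n`.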
Yu-Nie
Appendix
The data used for the analysis can be downloaded from the following:
USDJPY_20160301_20170228_5min.csv
USDJPY_20170301_20170731_5min.csv
EURJPY_20160301_20170228_5min.csv
EURJPY_20170301_20170731_5min.csv
Below is the code used for the analysis.
scalogram_test_7
# 20170731
# y.izumi
import tensorflow as tf
import numpy as np
import scalogram4 as sca  # Module for CWT and scalogram creation
import time
"""Functions that perform parameter initialization, convolution operations, and pooling operations"""
#=============================================================================================================================================
# Weight initialization function
def weight_variable(shape, stddev=5e-3):  # default stddev = 1e-4
    initial = tf.truncated_normal(shape, stddev=stddev)
    return tf.Variable(initial)
# Bias initialization function
def bias_variable(shape):
    initial = tf.constant(0.0, shape=shape)
    return tf.Variable(initial)
# Convolution operation
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")
# pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
#=============================================================================================================================================
"""Scalogram creation conditions"""
#=============================================================================================================================================
train_USD_csv = "USDJPY_20160301_20170228_5min.csv" #Exchange data file name, train
train_EUR_csv = "EURJPY_20160301_20170228_5min.csv"
# train_USD_csv = "USDJPY_20170301_20170731_5min.csv" #Exchange data file name, train, for debug
# train_EUR_csv = "EURJPY_20170301_20170731_5min.csv"
test_USD_csv = "USDJPY_20170301_20170731_5min.csv" #Exchange data file name, test
test_EUR_csv = "EURJPY_20170301_20170731_5min.csv"
# scales = np.arange(1,129)
predict_time_inc = 144  # How many time steps (5-minute bars) ahead the price movement is predicted: 144 x 5 min = 12 hours
# train_heights = [288]  # Scalogram heights (number of time lines), specified as a list
# test_heights = [288]
train_heights = [288, 432, 576, 720, 864]  # Scalogram heights (number of time lines), specified as a list: 1, 1.5, 2, 2.5, 3 days
test_heights = [288]
base_height = 128  # Height of the scalogram after resizing (used for the training data)
width = 128  # Scalogram width (number of frequency lines)
ch_flag = 1  # Data to use from the four prices (OHLC) and the volume: ch_flag=1: close; under construction: ch_flag=2: close and volume, ch_flag=5: open, high, low, close, volume
input_dim = (ch_flag, base_height, width)  # channel = (1, 2, 5), height(time_lines), width(freq_lines)
save_flag = 0  # save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
scales = np.linspace(0.2, 80, width)  # Scales to use (numpy array). The scale sets the wavelet frequency: large scale = low frequency, small scale = high frequency
wavelet = "gaus1"  # Wavelet name: 'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh'
tr_over_lap_inc = 4  # Increment of the CWT start time, train data
te_over_lap_inc = 36  # Increment of the CWT start time, test data
#==============================================================================================================================================
"""Creating scalograms and labels"""
#==============================================================================================================================================
# carry out CWT and make labels
print("Making the train data.")
x_train, t_train, freq = sca.merge_scalogram3(train_USD_csv, train_EUR_csv, scales, wavelet, train_heights, base_height, width, predict_time_inc, ch_flag, save_flag, tr_over_lap_inc)
# x_train, t_train, freq = sca.merge_scalogram3(test_USD_csv, test_EUR_csv, scales, wavelet, train_heights, base_height, width, predict_time_inc, ch_flag, save_flag, tr_over_lap_inc) # for debug
print("Making the test data.")
x_test, t_test, freq = sca.merge_scalogram3(test_USD_csv, test_EUR_csv, scales, wavelet, test_heights, base_height, width, predict_time_inc, ch_flag, save_flag, te_over_lap_inc)
# save scalograms and labels
print("Save scalograms and labels")
np.savetxt(r"temp_result\x_train.csv", x_train.reshape(-1, 2*base_height*width), delimiter = ",")
np.savetxt(r"temp_result\x_test.csv", x_test.reshape(-1, 2*base_height*width), delimiter = ",")
np.savetxt(r"temp_result\t_train.csv", t_train, delimiter = ",", fmt = "%.0f")
np.savetxt(r"temp_result\t_test.csv", t_test, delimiter = ",", fmt = "%.0f")
np.savetxt(r"temp_result\frequency.csv", freq, delimiter = ",")
# load scalograms and labels
# print("Load scalograms and labels")
# x_train = np.loadtxt(r"temp_result\x_train.csv", delimiter = ",")
# x_test = np.loadtxt(r"temp_result\x_test.csv", delimiter = ",")
# t_train = np.loadtxt(r"temp_result\t_train.csv", delimiter = ",", dtype = "i8")
# t_test = np.loadtxt(r"temp_result\t_test.csv", delimiter = ",", dtype = "i8")
# x_train = x_train.reshape(-1, 2, base_height, width)
# x_test = x_test.reshape(-1, 2, base_height, width)
# freq = np.loadtxt(r"temp_result\frequency.csv", delimiter = ",")
print("x_train shape " + str(x_train.shape))
print("t_train shape " + str(t_train.shape))
print("x_test shape " + str(x_test.shape))
print("t_test shape " + str(t_test.shape))
print("mean_t_train " + str(np.mean(t_train)))
print("mean_t_test " + str(np.mean(t_test)))
print("frequency " + str(freq))
#==============================================================================================================================================
"""Data shape processing"""
#==============================================================================================================================================
#Swap dimensions for tensorflow
x_train = x_train.transpose(0, 2, 3, 1) # (num_data, ch, height(time_lines), width(freq_lines)) ⇒ (num_data, height(time_lines), width(freq_lines), ch)
x_test = x_test.transpose(0, 2, 3, 1)
train_size = x_train.shape[0] #Number of training data
test_size = x_test.shape[0] #Number of test data
train_batch_size = 100 #Learning batch size
test_batch_size = 600 #Test batch size
# labels to one-hot
t_train_onehot = np.zeros((train_size, 2))
t_test_onehot = np.zeros((test_size, 2))
t_train_onehot[np.arange(train_size), t_train] = 1
t_test_onehot[np.arange(test_size), t_test] = 1
t_train = t_train_onehot
t_test = t_test_onehot
# print("t train shape onehot" + str(t_train.shape)) # for debug
# print("t test shape onehot" + str(t_test.shape))
#==============================================================================================================================================
"""Build CNN"""
#==============================================================================================================================================
x = tf.placeholder(tf.float32, [None, input_dim[1], input_dim[2], 2])  # (num_data, height(time), width(freq_lines), ch), ch = number of input channels: USD/JPY, EUR/JPY => ch = 2
y_ = tf.placeholder(tf.float32, [None, 2])  # (num_data, num_label)
print("input shape ", str(x.get_shape()))
with tf.variable_scope("conv1") as scope:
    W_conv1 = weight_variable([5, 5, 2, 16])
    b_conv1 = bias_variable([16])
    h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)
    print("conv1 shape ", str(h_pool1.get_shape()))
with tf.variable_scope("conv2") as scope:
    W_conv2 = weight_variable([5, 5, 16, 32])
    b_conv2 = bias_variable([32])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)
    print("conv2 shape ", str(h_pool2.get_shape()))
    h_pool2_height = int(h_pool2.get_shape()[1])
    h_pool2_width = int(h_pool2.get_shape()[2])
with tf.variable_scope("conv3") as scope:
    W_conv3 = weight_variable([5, 5, 32, 64])
    b_conv3 = bias_variable([64])
    h_conv3 = tf.nn.relu(conv2d(h_pool2, W_conv3) + b_conv3)
    h_pool3 = max_pool_2x2(h_conv3)
    print("conv3 shape ", str(h_pool3.get_shape()))
    h_pool3_height = int(h_pool3.get_shape()[1])
    h_pool3_width = int(h_pool3.get_shape()[2])
with tf.variable_scope("fc1") as scope:
    W_fc1 = weight_variable([h_pool3_height*h_pool3_width*64, 1024])
    b_fc1 = bias_variable([1024])
    h_pool3_flat = tf.reshape(h_pool3, [-1, h_pool3_height*h_pool3_width*64])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool3_flat, W_fc1) + b_fc1)
    print("fc1 shape ", str(h_fc1.get_shape()))
keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)
with tf.variable_scope("fc2") as scope:
    W_fc2 = weight_variable([1024, 2])
    b_fc2 = bias_variable([2])
    y_logits = tf.matmul(h_fc1_drop, W_fc2) + b_fc2  # raw scores (logits)
    y_conv = tf.nn.softmax(y_logits)  # output probabilities
    print("output shape ", str(y_conv.get_shape()))
# Visualize parameters with tensorboard (the variables themselves are not overwritten)
for name, var in [("W_conv1", W_conv1), ("b_conv1", b_conv1), ("W_conv2", W_conv2), ("b_conv2", b_conv2),
                  ("W_conv3", W_conv3), ("b_conv3", b_conv3), ("W_fc1", W_fc1), ("b_fc1", b_fc1),
                  ("W_fc2", W_fc2), ("b_fc2", b_fc2)]:
    tf.summary.histogram(name, var)
#==============================================================================================================================================
"""Specifying the error function"""
#==============================================================================================================================================
# cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
# softmax_cross_entropy_with_logits applies softmax internally, so it takes the raw logits, not y_conv
cross_entropy = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y_logits))
loss_summary = tf.summary.scalar("loss", cross_entropy)  # for tensorboard
#==============================================================================================================================================
"""Specify optimizer"""
#==============================================================================================================================================
optimizer = tf.train.AdamOptimizer(1e-4)
train_step = optimizer.minimize(cross_entropy)
# Visualize the gradients with tensorboard
# compute_gradients returns (gradient, variable) pairs, so only the gradient is histogrammed
grads = optimizer.compute_gradients(cross_entropy)
grad_names = ["dW_conv1", "db_conv1", "dW_conv2", "db_conv2", "dW_conv3",
              "db_conv3", "dW_fc1", "db_fc1", "dW_fc2", "db_fc2"]
for name, (grad, var) in zip(grad_names, grads):
    tf.summary.histogram(name, grad)  # for tensorboard
# for i in range(8): # for debug
#     print(grads[i])
#==============================================================================================================================================
"""Parameters for accuracy verification"""
#==============================================================================================================================================
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
accuracy_summary = tf.summary.scalar("accuracy", accuracy) # for tensorboard
#==============================================================================================================================================
"""Execution of learning"""
#==============================================================================================================================================
acc_list = []  # List to save the progress of the accuracy and the loss
num_data_each_conf = []  # List to save the progress of the number of data in each confidence band
acc_each_conf = []  # List to save the progress of the accuracy in each confidence band
start_time = time.time()  # Calculation time count
total_cal_time = 0
with tf.Session() as sess:
    saver = tf.train.Saver()
    sess.run(tf.global_variables_initializer())
    # Exporting files for tensorboard
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter(r"temp_result", sess.graph)
    for step in range(20001):
        batch_mask = np.random.choice(train_size, train_batch_size)
        tr_batch_xs = x_train[batch_mask]
        tr_batch_ys = t_train[batch_mask]
        # Check the accuracy during training
        if step % 100 == 0:
            cal_time = time.time() - start_time  # Calculation time count
            total_cal_time += cal_time
            # train
            train_accuracy = accuracy.eval(feed_dict={x: tr_batch_xs, y_: tr_batch_ys, keep_prob: 1.0})
            train_loss = cross_entropy.eval(feed_dict={x: tr_batch_xs, y_: tr_batch_ys, keep_prob: 1.0})
            # test
            # use all data
            test_accuracy = accuracy.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0})
            test_loss = cross_entropy.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0})
            # use test batch
            # batch_mask = np.random.choice(test_size, test_batch_size, replace=False)
            # te_batch_xs = x_test[batch_mask]
            # te_batch_ys = t_test[batch_mask]
            # test_accuracy = accuracy.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})
            # test_loss = cross_entropy.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})
            print("calculation time %d sec, step %d, training accuracy %g, training loss %g, test accuracy %g, test loss %g" % (cal_time, step, train_accuracy, train_loss, test_accuracy, test_loss))
            acc_list.append([step, train_accuracy, test_accuracy, train_loss, test_loss])
            AI_prediction = y_conv.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0})  # AI prediction results, use all data
            # AI_prediction = y_conv.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})  # AI prediction results, use test batch
            AI_correct_prediction = correct_prediction.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0})  # Correct: True, incorrect: False, use all data
            # AI_correct_prediction = correct_prediction.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})  # use test batch
            # Number of data and accuracy for each confidence band.
            # p is the output probability of class 0; each band collects p and its mirror 1 - p
            # (e.g. 50-60% confidence: 0.5 <= p <= 0.6 or 0.4 <= p < 0.5)
            p = AI_prediction[:, 0]
            conf_bands = [
                ("50to60", ((p >= 0.5) & (p <= 0.6)) | ((p >= 0.4) & (p < 0.5))),
                ("60to70", ((p > 0.6) & (p <= 0.7)) | ((p >= 0.3) & (p < 0.4))),
                ("70to80", ((p > 0.7) & (p <= 0.8)) | ((p >= 0.2) & (p < 0.3))),
                ("80to90", ((p > 0.8) & (p <= 0.9)) | ((p >= 0.1) & (p < 0.2))),
                ("90to100", ((p > 0.9) & (p <= 1.0)) | ((p >= 0.0) & (p < 0.1))),
            ]
            sums = [int(np.sum(mask)) for _, mask in conf_bands]  # number of data in each band
            accs = [np.sum(mask & AI_correct_prediction) / n if n > 0 else float("nan")
                    for (_, mask), n in zip(conf_bands, sums)]  # accuracy in each band
            print("Number of data of each confidence " + ", ".join("%s:%g" % (name, n) for (name, _), n in zip(conf_bands, sums)))
            print("Accuracy rate of each confidence " + ", ".join("%s:%g" % (name, a) for (name, _), a in zip(conf_bands, accs)))
            print("")
            num_data_each_conf.append([step] + sums)
            acc_each_conf.append([step] + accs)
            # Exporting files for tensorboard
            result = sess.run(merged, feed_dict={x: tr_batch_xs, y_: tr_batch_ys, keep_prob: 1.0})
            writer.add_summary(result, step)
            start_time = time.time()
        # Execution of learning
        train_step.run(feed_dict={x: tr_batch_xs, y_: tr_batch_ys, keep_prob: 0.5})
    # Final accuracy for the test data
    # use all data
    print("test accuracy %g" % accuracy.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0}))
    # use test batch
    # batch_mask = np.random.choice(test_size, test_batch_size, replace=False)
    # te_batch_xs = x_test[batch_mask]
    # te_batch_ys = t_test[batch_mask]
    # test_accuracy = accuracy.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})
    print("total calculation time %g sec" % total_cal_time)
    np.savetxt(r"temp_result\acc_list.csv", acc_list, delimiter=",")  # Write out the progress of the accuracy and the loss
    np.savetxt(r"temp_result\number_of_data_each_confidence.csv", num_data_each_conf, delimiter=",")  # Write out the progress of the number of data in each confidence band
    np.savetxt(r"temp_result\accuracy_rate_of_each_confidence.csv", acc_each_conf, delimiter=",")  # Write out the progress of the accuracy in each confidence band
    saver.save(sess, r"temp_result\spectrogram_model.ckpt")  # Export final parameters
#==============================================================================================================================================
scalogram4.py
# -*- coding: utf-8 -*-
"""
Created on Tue Jul 25 11:24:50 2017
@author: izumiy
"""
import pywt
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
def align_USD_EUR(USD_csv, EUR_csv):
"""USD/JPY and EUR/A function that deletes missing data in JPY and extracts the closing price of the time that exists in both"""
USD = np.loadtxt(USD_csv, delimiter = ",", usecols = (0,1,5), skiprows = 1, dtype="S8")
EUR = np.loadtxt(EUR_csv, delimiter = ",", usecols = (0,1,5), skiprows = 1, dtype="S8")
# print("USD time " + str(USD[:,1])) # for debag
print("EUR shape " + str(EUR.shape)) # for debag
print("USD shape " + str(USD.shape)) # for debag
print("")
# USD_num_data = USD.shape[0]
# EUR_num_data = EUR.shape[0]
# idx_difference = abs(USD_num_data - EUR_num_data)
# print("USD num data " + str(USD_num_data)) # for debag
USD_close = USD[:,2]
EUR_close = EUR[:,2]
USD = np.core.defchararray.add(USD[:,0], USD[:,1])
EUR = np.core.defchararray.add(EUR[:,0], EUR[:,1])
# print("USD " + str(USD)) # for debag
#Index where the time does not match(idx_mismatch)To get
if USD.shape[0] > EUR.shape[0]:
temp_USD = USD[:EUR.shape[0]]
# print("EUR shape " + str(EUR.shape)) # for debag
# print("temp USD shape " + str(temp_USD.shape)) # for debag
coincidence = EUR == temp_USD
        idx_mismatch = np.where(coincidence == False)
        idx_mismatch = idx_mismatch[0][0]
    elif EUR.shape[0] > USD.shape[0]:
        temp_EUR = EUR[:USD.shape[0]]
        # print("temp EUR shape " + str(temp_EUR.shape))  # for debug
        # print("USD shape " + str(USD.shape))  # for debug
        coincidence = USD == temp_EUR
        idx_mismatch = np.where(coincidence == False)
        idx_mismatch = idx_mismatch[0][0]
    elif USD.shape[0] == EUR.shape[0]:
        coincidence = USD == EUR
        if (coincidence == False).any():
            idx_mismatch = np.where(coincidence == False)
            idx_mismatch = idx_mismatch[0][0]
        else:
            # Already aligned: idx_mismatch == USD.shape[0], so the while loop is skipped
            idx_mismatch = np.where(coincidence == True)
            idx_mismatch = idx_mismatch[0].shape[0]
    while USD.shape[0] != idx_mismatch:
        print("idx mismatch " + str(idx_mismatch))  # for debug
        print("USD[idx_mismatch] " + str(USD[idx_mismatch]))
        print("EUR[idx_mismatch] " + str(EUR[idx_mismatch]))
        # Delete the timestamp that exists in only one of the two series
        if USD[idx_mismatch] > EUR[idx_mismatch]:
            EUR = np.delete(EUR, idx_mismatch)
            EUR_close = np.delete(EUR_close, idx_mismatch)
        elif EUR[idx_mismatch] > USD[idx_mismatch]:
            USD = np.delete(USD, idx_mismatch)
            USD_close = np.delete(USD_close, idx_mismatch)
        print("EUR shape " + str(EUR.shape))  # for debug
        print("USD shape " + str(USD.shape))  # for debug
        print("")
        if USD.shape[0] > EUR.shape[0]:
            temp_USD = USD[:EUR.shape[0]]
            # print("EUR shape " + str(EUR.shape))  # for debug
            # print("temp USD shape " + str(temp_USD.shape))  # for debug
            coincidence = EUR == temp_USD
            idx_mismatch = np.where(coincidence == False)
            idx_mismatch = idx_mismatch[0][0]
        elif EUR.shape[0] > USD.shape[0]:
            temp_EUR = EUR[:USD.shape[0]]
            # print("temp EUR shape " + str(temp_EUR.shape))  # for debug
            # print("USD shape " + str(USD.shape))  # for debug
            coincidence = USD == temp_EUR
            idx_mismatch = np.where(coincidence == False)
            idx_mismatch = idx_mismatch[0][0]
        elif USD.shape[0] == EUR.shape[0]:
            coincidence = USD == EUR
            if (coincidence == False).any():
                idx_mismatch = np.where(coincidence == False)
                idx_mismatch = idx_mismatch[0][0]
            else:
                idx_mismatch = np.where(coincidence == True)
                idx_mismatch = idx_mismatch[0].shape[0]
    # Save the aligned timestamps and closing prices side by side for checking
    USD = np.reshape(USD, (-1,1))
    EUR = np.reshape(EUR, (-1,1))
    USD_close = np.reshape(USD_close, (-1,1))
    EUR_close = np.reshape(EUR_close, (-1,1))
    USD = np.append(USD, EUR, axis=1)
    USD = np.append(USD, USD_close, axis=1)
    USD = np.append(USD, EUR_close, axis=1)
    np.savetxt("USD_EUR.csv", USD, delimiter = ",", fmt="%s")
    return USD_close, EUR_close
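The loop above deletes, one entry at a time, the timestamps that exist in only one of the two series. As an alternative (not the method used in this article), NumPy's `np.intersect1d` with `return_indices=True` keeps only the common timestamps in a single call. This is a minimal sketch assuming each timestamp array is sorted, has no duplicates, and sorts correctly as strings (zero-padded dates and times); `align_by_common_timestamps` and the toy data are hypothetical.

```python
import numpy as np

def align_by_common_timestamps(usd_time, eur_time, usd_close, eur_close):
    """Keep only the closing prices whose timestamps appear in both series.

    Hypothetical helper: assumes sorted, duplicate-free timestamp arrays.
    """
    common, usd_idx, eur_idx = np.intersect1d(usd_time, eur_time, return_indices=True)
    return common, usd_close[usd_idx], eur_close[eur_idx]

# Toy data: USD/JPY lacks 00:10, EUR/JPY lacks 00:15
usd_time = np.array(["00:00", "00:05", "00:15", "00:20"])
usd_close = np.array([110.0, 110.1, 110.3, 110.4])
eur_time = np.array(["00:00", "00:05", "00:10", "00:20"])
eur_close = np.array([125.0, 125.1, 125.2, 125.4])

common, usd_aligned, eur_aligned = align_by_common_timestamps(usd_time, eur_time, usd_close, eur_close)
print(common)  # ['00:00' '00:05' '00:20']
print(usd_aligned, eur_aligned)
```

Unlike the loop, this also copes with extra entries at the end of one series, because it never indexes into a position that only one array has.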
def variable_timelines_scalogram_1(time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, heights, base_height, width):
    """
    A function that performs a continuous wavelet transform with several window lengths (time lines)
    Uses the closing price
    time_series      : Currency data, closing price
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    ch_flag          : Number of channels to use, ch_flag=1: close
    heights          : Image heights (numbers of time steps), given as a list
    width            : Image width (number of scales, i.e. frequencies)
    base_height      : Height of the scalograms used as training data; windows of other heights are resized to it
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0]  # Number of data points
    print(" number of the series data : " + str(num_series_data))
    close = time_series
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    scalogram = np.empty((0, ch_flag, base_height, width))
    label_array = np.array([])
    for height in heights:
        print(" time line = ", height)
        print(" carry out cwt...")
        time_start = 0
        time_end = time_start + height
        # hammingWindow = np.hamming(height)  # Hamming window
        # hanningWindow = np.hanning(height)  # Hanning window
        # blackmanWindow = np.blackman(height)  # Blackman window
        # bartlettWindow = np.bartlett(height)  # Bartlett window
        while(time_end <= num_series_data - predict_time_inc):
            # print("time start " + str(time_start))  # for debug
            temp_close = close[time_start:time_end]
            # Optionally apply a window function
            # temp_close = temp_close * hammingWindow
            # Mirror padding: add the reversed data before and after the window
            mirror_temp_close = temp_close[::-1]
            x = np.append(mirror_temp_close, temp_close)
            temp_close = np.append(x, mirror_temp_close)
            temp_cwt_close, freq_close = pywt.cwt(temp_close, scales, wavelet)  # Continuous wavelet transform
            temp_cwt_close = temp_cwt_close.T  # Transpose CWT(freq, time) => CWT(time, freq)
            # Keep only the central (original) part of the mirrored data
            temp_cwt_close = temp_cwt_close[height:2*height, :]
            if height != base_height:
                # Resize to base_height; PIL sizes are given as (width, height)
                img_scalogram = Image.fromarray(temp_cwt_close)
                img_scalogram = img_scalogram.resize((width, base_height))
                temp_cwt_close = np.array(img_scalogram)
            temp_cwt_close = np.reshape(temp_cwt_close, (-1, ch_flag, base_height, width))  # num_data, ch, height(time), width(freq)
            # print("temp_cwt_close_shape " + str(temp_cwt_close.shape))  # for debug
            scalogram = np.append(scalogram, temp_cwt_close, axis=0)
            # print("cwt_close_shape " + str(cwt_close.shape))  # for debug
            time_start = time_end
            time_end = time_start + height
        print(" scalogram shape " + str(scalogram.shape))
        """Creating a label"""
        print(" make label...")
        # Vectorized: compare the series with itself shifted by predict_time_inc
        last_time = num_series_data - predict_time_inc
        current_close = close[:last_time]
        predict_close = close[predict_time_inc:]
        temp_label_array = predict_close > current_close
        # print(temp_label_array[:30])  # for debug
        """
        # Alternative using a while loop (slow)
        label_array = np.array([])
        print(label_array)
        time_start = 0
        time_predict = time_start + predict_time_inc
        while(time_predict < num_series_data):
            if close[time_start] >= close[time_predict]:
                label = 0  # Goes down
            else:
                label = 1  # Goes up
            label_array = np.append(label_array, label)
            time_start = time_start + 1
            time_predict = time_start + predict_time_inc
        # print(label_array[:30])  # for debug
        """
        """Trim temp_label_array (time axis) so that its length is divisible by height"""
        raw_num_shift = temp_label_array.shape[0]
        num_shift = int(raw_num_shift / height) * height
        temp_label_array = temp_label_array[0:num_shift]
        """Extract the label of each scalogram (label of its last time step), shape (number of samples,)"""
        col = height - 1
        temp_label_array = np.reshape(temp_label_array, (-1, height))
        temp_label_array = temp_label_array[:, col]
        label_array = np.append(label_array, temp_label_array)
        print(" label shape " + str(label_array.shape))
    """File output"""
    if save_flag == 1:
        print(" output the files")
        save_cwt_close = np.reshape(scalogram, (-1, width))
        np.savetxt("scalogram.csv", save_cwt_close, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")
    print("CWT is done")
    return scalogram, label_array, freq_close
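The mirror padding in `variable_timelines_scalogram_1` above can be isolated as follows. Reflecting the window at both ends reduces CWT edge effects at the window boundaries, and only the central `height` rows are kept afterwards. This is a minimal sketch assuming PyWavelets is installed; `mirror_padded_cwt` is a hypothetical name, and `gaus1` with scales 1 to 16 on a random-walk price are example settings only.

```python
import numpy as np
import pywt

def mirror_padded_cwt(window, scales, wavelet="gaus1"):
    """CWT of a 1-D window with mirror padding; returns coefficients shaped (time, freq)."""
    height = window.shape[0]
    mirrored = window[::-1]
    # [reversed | original | reversed] -> length 3 * height
    padded = np.concatenate([mirrored, window, mirrored])
    coef, freqs = pywt.cwt(padded, scales, wavelet)  # coef: (len(scales), 3 * height)
    coef = coef.T                                    # -> (3 * height, len(scales))
    return coef[height:2 * height, :], freqs         # keep the unpadded centre only

rng = np.random.default_rng(0)
close = 110 + np.cumsum(rng.normal(0, 0.01, size=128))  # toy random-walk closing price
scales = np.arange(1, 17)
scalogram, freqs = mirror_padded_cwt(close, scales)
print(scalogram.shape)  # (128, 16)
```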
def create_scalogram_1(time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width):
    """
    A function that performs a continuous wavelet transform on consecutive windows
    Uses the closing price
    time_series      : Currency data, closing price
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    ch_flag          : Number of channels to use, ch_flag=1: close
    height           : Image height (number of time steps)
    width            : Image width (number of scales, i.e. frequencies)
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0]  # Number of data points
    print("number of the series data : " + str(num_series_data))
    close = time_series
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    time_start = 0
    time_end = time_start + height
    scalogram = np.empty((0, ch_flag, height, width))
    # hammingWindow = np.hamming(height)  # Hamming window
    # hanningWindow = np.hanning(height)  # Hanning window
    # blackmanWindow = np.blackman(height)  # Blackman window
    # bartlettWindow = np.bartlett(height)  # Bartlett window
    while(time_end <= num_series_data - predict_time_inc):
        # print("time start " + str(time_start))  # for debug
        temp_close = close[time_start:time_end]
        # Optionally apply a window function
        # temp_close = temp_close * hammingWindow
        # Mirror padding: add the reversed data before and after the window
        mirror_temp_close = temp_close[::-1]
        x = np.append(mirror_temp_close, temp_close)
        temp_close = np.append(x, mirror_temp_close)
        temp_cwt_close, freq_close = pywt.cwt(temp_close, scales, wavelet)  # Continuous wavelet transform
        temp_cwt_close = temp_cwt_close.T  # Transpose CWT(freq, time) => CWT(time, freq)
        # Keep only the central (original) part of the mirrored data
        temp_cwt_close = temp_cwt_close[height:2*height, :]
        temp_cwt_close = np.reshape(temp_cwt_close, (-1, ch_flag, height, width))  # num_data, ch, height(time), width(freq)
        # print("temp_cwt_close_shape " + str(temp_cwt_close.shape))  # for debug
        scalogram = np.append(scalogram, temp_cwt_close, axis=0)
        # print("cwt_close_shape " + str(cwt_close.shape))  # for debug
        time_start = time_end
        time_end = time_start + height
    """Creating a label"""
    print("make label...")
    # Vectorized: compare the series with itself shifted by predict_time_inc
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30])  # for debug
    """
    # Alternative using a while loop (slow)
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc
    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0  # Goes down
        else:
            label = 1  # Goes up
        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30])  # for debug
    """
    """Trim label_array (time axis) so that its length is divisible by height"""
    raw_num_shift = label_array.shape[0]
    num_shift = int(raw_num_shift / height) * height
    label_array = label_array[0:num_shift]
    """Extract the label of each scalogram (label of its last time step), shape (number of samples,)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]
    """File output"""
    if save_flag == 1:
        print("output the files")
        save_cwt_close = np.reshape(scalogram, (-1, width))
        np.savetxt("scalogram.csv", save_cwt_close, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")
    print("CWT is done")
    return scalogram, label_array, freq_close
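The labelling shared by these functions compares each closing price with the one `predict_time_inc` steps later, then keeps one label per non-overlapping window: the label of the window's last time step. The sketch below checks on toy data that the vectorized comparison agrees with the slow while-loop version kept in the comments, and shows the reshape trick. `make_window_labels` and `slow_labels` are hypothetical names used only for illustration.

```python
import numpy as np

def make_window_labels(close, predict_time_inc, height):
    """Return one up(1)/down(0) label per non-overlapping window of `height` steps."""
    n = close.shape[0]
    # Vectorized: True where the price predict_time_inc steps ahead is higher
    labels = close[predict_time_inc:] > close[:n - predict_time_inc]
    # Trim so the length is divisible by height, then take each window's last label
    num_shift = (labels.shape[0] // height) * height
    return labels[:num_shift].reshape(-1, height)[:, height - 1].astype(int)

def slow_labels(close, predict_time_inc):
    """While-loop version, as in the commented-out code, for comparison."""
    out = []
    t = 0
    while t + predict_time_inc < close.shape[0]:
        out.append(0 if close[t] >= close[t + predict_time_inc] else 1)
        t += 1
    return np.array(out)

close = np.array([1.0, 2.0, 1.5, 1.5, 3.0, 2.5, 2.0, 4.0, 3.5, 1.0])
fast = (close[2:] > close[:-2]).astype(int)
assert np.array_equal(fast, slow_labels(close, 2))
print(make_window_labels(close, predict_time_inc=2, height=4))  # [1 0]
```

Both versions label a tie (no price change) as "down", so they agree exactly.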
def create_scalogram_5(time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width):
    """
    A function that performs a continuous wavelet transform on consecutive windows
    Uses the open, high, low and closing prices and the volume
    time_series      : Currency data, columns: open, high, low, close, volume
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    ch_flag          : Number of channels to use, ch_flag=5: start, high, low, close, volume
    height           : Image height (number of time steps)
    width            : Image width (number of scales, i.e. frequencies)
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0]  # Number of data points
    print("number of the series data : " + str(num_series_data))
    start = time_series[:,0]
    high = time_series[:,1]
    low = time_series[:,2]
    close = time_series[:,3]
    volume = time_series[:,4]
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    time_start = 0
    time_end = time_start + height
    scalogram = np.empty((0, ch_flag, height, width))
    while(time_end <= num_series_data - predict_time_inc):
        # print("time start " + str(time_start))  # for debug
        temp_start = start[time_start:time_end]
        temp_high = high[time_start:time_end]
        temp_low = low[time_start:time_end]
        temp_close = close[time_start:time_end]
        temp_volume = volume[time_start:time_end]
        temp_cwt_start, freq_start = pywt.cwt(temp_start, scales, wavelet)  # Continuous wavelet transform
        temp_cwt_high, freq_high = pywt.cwt(temp_high, scales, wavelet)
        temp_cwt_low, freq_low = pywt.cwt(temp_low, scales, wavelet)
        temp_cwt_close, freq_close = pywt.cwt(temp_close, scales, wavelet)
        temp_cwt_volume, freq_volume = pywt.cwt(temp_volume, scales, wavelet)
        temp_cwt_start = temp_cwt_start.T  # Transpose CWT(freq, time) => CWT(time, freq)
        temp_cwt_high = temp_cwt_high.T
        temp_cwt_low = temp_cwt_low.T
        temp_cwt_close = temp_cwt_close.T
        temp_cwt_volume = temp_cwt_volume.T
        temp_cwt_start = np.reshape(temp_cwt_start, (-1, 1, height, width))  # num_data, ch, height(time), width(freq)
        temp_cwt_high = np.reshape(temp_cwt_high, (-1, 1, height, width))
        temp_cwt_low = np.reshape(temp_cwt_low, (-1, 1, height, width))
        temp_cwt_close = np.reshape(temp_cwt_close, (-1, 1, height, width))
        temp_cwt_volume = np.reshape(temp_cwt_volume, (-1, 1, height, width))
        # print("temp_cwt_close_shape " + str(temp_cwt_close.shape))  # for debug
        # Stack the five channels along axis 1
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_high, axis=1)
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_low, axis=1)
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_close, axis=1)
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_volume, axis=1)
        # print("temp_cwt_start_shape " + str(temp_cwt_start.shape))  # for debug
        scalogram = np.append(scalogram, temp_cwt_start, axis=0)
        # print("cwt_close_shape " + str(cwt_close.shape))  # for debug
        time_start = time_end
        time_end = time_start + height
    """Creating a label"""
    print("make label...")
    # Vectorized: compare the series with itself shifted by predict_time_inc
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30])  # for debug
    """
    # Alternative using a while loop (slow)
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc
    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0  # Goes down
        else:
            label = 1  # Goes up
        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30])  # for debug
    """
    """Trim label_array (time axis) so that its length is divisible by height"""
    raw_num_shift = label_array.shape[0]
    num_shift = int(raw_num_shift / height) * height
    label_array = label_array[0:num_shift]
    """Extract the label of each scalogram (label of its last time step), shape (number of samples,)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]
    """File output"""
    if save_flag == 1:
        print("output the files")
        save_cwt_close = np.reshape(scalogram, (-1, width))
        np.savetxt("scalogram.csv", save_cwt_close, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")
    print("CWT is done")
    return scalogram, label_array, freq_close
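`create_scalogram_5` builds each 5-channel sample by reshaping every CWT to `(1, 1, height, width)` and chaining `np.append` on axis 1. An equivalent, more compact form is `np.stack` on a new channel axis. The sketch below checks that both give the same array; random matrices stand in for the five transposed CWT results, and the sizes are example values.

```python
import numpy as np

height, width = 8, 4  # example sizes: time steps x scales
rng = np.random.default_rng(1)
# Stand-ins for the transposed CWTs of open, high, low, close, volume: each (height, width)
channels = [rng.normal(size=(height, width)) for _ in range(5)]

# Approach used in the article: reshape each to (1, 1, h, w) and append along axis 1
appended = np.reshape(channels[0], (-1, 1, height, width))
for ch in channels[1:]:
    appended = np.append(appended, np.reshape(ch, (-1, 1, height, width)), axis=1)

# Equivalent: stack on a new channel axis, then add the sample axis
stacked = np.stack(channels, axis=0)[np.newaxis]  # (1, 5, h, w)

print(stacked.shape)  # (1, 5, 8, 4)
assert np.array_equal(appended, stacked)
```

The channel order is the order of the list, so open, high, low, close, volume map to channels 0 to 4 in both versions.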
def CWT_1(time_series, scales, wavelet, predict_time_inc, save_flag):
    """
    A function that performs a continuous wavelet transform on the whole series
    Uses the closing price
    time_series      : Currency data, closing price
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0]  # Number of data points
    print("number of the series data : " + str(num_series_data))
    close = time_series
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    cwt_close, freq_close = pywt.cwt(close, scales, wavelet)
    # Transpose CWT(freq, time) => CWT(time, freq)
    cwt_close = cwt_close.T
    """Creating a label"""
    print("make label...")
    # Vectorized: compare the series with itself shifted by predict_time_inc
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30])  # for debug
    """
    # Alternative using a while loop
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc
    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0  # Goes down
        else:
            label = 1  # Goes up
        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30])  # for debug
    """
    """File output"""
    if save_flag == 1:
        print("output the files")
        np.savetxt("CWT_close.csv", cwt_close, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")
    print("CWT is done")
    return [cwt_close], label_array, freq_close
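`CWT_1` transforms the whole series at once. `pywt.cwt` returns coefficients shaped `(len(scales), len(data))`, which is why the code transposes them to `(time, freq)`. It also returns the frequency of each scale in cycles per sample (here, per 5-minute bar, since `sampling_period` defaults to 1). A small check using a sine like the y = sin(πx / 16) example from the introduction; the length and scale range are example choices, and PyWavelets is assumed to be installed.

```python
import numpy as np
import pywt

x = np.arange(512)
y = np.sin(np.pi * x / 16)  # period of 32 samples
scales = np.arange(1, 33)
coef, freqs = pywt.cwt(y, scales, "gaus1")

print(coef.shape)    # (32, 512): (scale, time)
print(coef.T.shape)  # (512, 32): (time, freq), as in CWT_1
# frequency = central frequency of the wavelet / scale, so freq * scale is constant
assert np.allclose(freqs * scales, freqs[0] * scales[0])
```

Larger scales therefore correspond to lower frequencies, matching the description of `scales` in the docstrings.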
def merge_CWT_1(cwt_list, label_array, height, width):
    """
    Uses the closing price
    cwt_list    : List of CWT results
    label_array : Numpy array containing the labels
    height      : Image height (number of time steps)
    width       : Image width (number of scales, i.e. frequencies)
    """
    print("merge CWT")
    cwt_close = cwt_list[0]  # Closing price CWT(time, freq)
    """Trim CWT(time, freq) so that the number of time steps is divisible by height"""
    raw_num_shift = cwt_close.shape[0]
    num_shift = int(raw_num_shift / height) * height
    cwt_close = cwt_close[0:num_shift]
    label_array = label_array[0:num_shift]
    """Reshape to (number of samples, channel, height(time), width(freq))"""
    cwt_close = np.reshape(cwt_close, (-1, 1, height, width))
    """Extract the label of each scalogram (label of its last time step), shape (number of samples,)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]
    return cwt_close, label_array
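`merge_CWT_1` cuts the `(time, freq)` coefficient matrix into consecutive, non-overlapping blocks of `height` rows with a single `reshape`. Because NumPy arrays are row-major by default, `reshape(-1, 1, height, width)` keeps rows `k*height` to `(k+1)*height - 1` together as sample `k`. A quick check on a toy matrix (the sizes are arbitrary):

```python
import numpy as np

height, width = 3, 2
cwt = np.arange(8 * width).reshape(8, width)   # toy CWT(time, freq) with 8 time steps
num_shift = (cwt.shape[0] // height) * height  # 6: drop the incomplete last block
blocks = cwt[:num_shift].reshape(-1, 1, height, width)

print(blocks.shape)  # (2, 1, 3, 2)
# Sample 1 holds time steps 3, 4, 5
assert np.array_equal(blocks[1, 0], cwt[3:6])
```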
def CWT_2(time_series, scales, wavelet, predict_time_inc, save_flag):
    """
    A function that performs a continuous wavelet transform on the whole series
    Uses the closing price and the volume
    time_series      : Currency data, columns: closing price, volume
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0]  # Number of data points
    print("number of the series data : " + str(num_series_data))
    close = time_series[:,0]
    volume = time_series[:,1]
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    cwt_close, freq_close = pywt.cwt(close, scales, wavelet)
    cwt_volume, freq_volume = pywt.cwt(volume, scales, wavelet)
    # Transpose CWT(freq, time) => CWT(time, freq)
    cwt_close = cwt_close.T
    cwt_volume = cwt_volume.T
    """Creating a label"""
    print("make label...")
    # Vectorized: compare the series with itself shifted by predict_time_inc
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30])  # for debug
    """
    # Alternative using a while loop
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc
    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0  # Goes down
        else:
            label = 1  # Goes up
        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30])  # for debug
    """
    """File output"""
    if save_flag == 1:
        print("output the files")
        np.savetxt("CWT_close.csv", cwt_close, delimiter = ",")
        np.savetxt("CWT_volume.csv", cwt_volume, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")
    print("CWT is done")
    return [cwt_close, cwt_volume], label_array, freq_close
def merge_CWT_2(cwt_list, label_array, height, width):
    """
    Uses the closing price and the volume
    cwt_list    : List of CWT results
    label_array : Numpy array containing the labels
    height      : Image height (number of time steps)
    width       : Image width (number of scales, i.e. frequencies)
    """
    print("merge CWT")
    cwt_close = cwt_list[0]  # Closing price CWT(time, freq)
    cwt_volume = cwt_list[1]  # Volume
    """Trim CWT(time, freq) so that the number of time steps is divisible by height"""
    raw_num_shift = cwt_close.shape[0]
    num_shift = int(raw_num_shift / height) * height
    cwt_close = cwt_close[0:num_shift]
    cwt_volume = cwt_volume[0:num_shift]
    label_array = label_array[0:num_shift]
    """Reshape to (number of samples, channel, height(time), width(freq))"""
    cwt_close = np.reshape(cwt_close, (-1, 1, height, width))
    cwt_volume = np.reshape(cwt_volume, (-1, 1, height, width))
    """Merge the channels"""
    cwt_close = np.append(cwt_close, cwt_volume, axis=1)
    """Extract the label of each scalogram (label of its last time step), shape (number of samples,)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]
    return cwt_close, label_array
def CWT_5(time_series, scales, wavelet, predict_time_inc, save_flag):
    """
    A function that performs a continuous wavelet transform on the whole series
    Uses the open, high, low and closing prices and the volume
    time_series      : Currency data, columns: open, high, low, close, volume
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0]  # Number of data points
    print("number of the series data : " + str(num_series_data))
    start = time_series[:,0]
    high = time_series[:,1]
    low = time_series[:,2]
    close = time_series[:,3]
    volume = time_series[:,4]
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    cwt_start, freq_start = pywt.cwt(start, scales, wavelet)
    cwt_high, freq_high = pywt.cwt(high, scales, wavelet)
    cwt_low, freq_low = pywt.cwt(low, scales, wavelet)
    cwt_close, freq_close = pywt.cwt(close, scales, wavelet)
    cwt_volume, freq_volume = pywt.cwt(volume, scales, wavelet)
    # Transpose CWT(freq, time) => CWT(time, freq)
    cwt_start = cwt_start.T
    cwt_high = cwt_high.T
    cwt_low = cwt_low.T
    cwt_close = cwt_close.T
    cwt_volume = cwt_volume.T
    """Creating a label"""
    print("make label...")
    # Vectorized: compare the series with itself shifted by predict_time_inc
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array.dtype) >>> bool
    """
    # Alternative using a while loop
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc
    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0  # Goes down
        else:
            label = 1  # Goes up
        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30])  # for debug
    """
    """File output"""
    if save_flag == 1:
        print("output the files")
        np.savetxt("CWT_start.csv", cwt_start, delimiter = ",")
        np.savetxt("CWT_high.csv", cwt_high, delimiter = ",")
        np.savetxt("CWT_low.csv", cwt_low, delimiter = ",")
        np.savetxt("CWT_close.csv", cwt_close, delimiter = ",")
        np.savetxt("CWT_volume.csv", cwt_volume, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")
    print("CWT is done")
    return [cwt_start, cwt_high, cwt_low, cwt_close, cwt_volume], label_array, freq_close
def merge_CWT_5(cwt_list, label_array, height, width):
    """
    Uses the open, high, low and closing prices and the volume
    cwt_list    : List of CWT results
    label_array : Numpy array containing the labels
    height      : Image height (number of time steps)
    width       : Image width (number of scales, i.e. frequencies)
    """
    print("merge CWT")
    cwt_start = cwt_list[0]  # Open price
    cwt_high = cwt_list[1]  # High price
    cwt_low = cwt_list[2]  # Low price
    cwt_close = cwt_list[3]  # Closing price CWT(time, freq)
    cwt_volume = cwt_list[4]  # Volume
    """Trim CWT(time, freq) so that the number of time steps is divisible by height"""
    raw_num_shift = cwt_close.shape[0]
    num_shift = int(raw_num_shift / height) * height
    cwt_start = cwt_start[0:num_shift]
    cwt_high = cwt_high[0:num_shift]
    cwt_low = cwt_low[0:num_shift]
    cwt_close = cwt_close[0:num_shift]
    cwt_volume = cwt_volume[0:num_shift]
    label_array = label_array[0:num_shift]
    """Reshape to (number of samples, channel, height(time), width(freq))"""
    cwt_start = np.reshape(cwt_start, (-1, 1, height, width))
    cwt_high = np.reshape(cwt_high, (-1, 1, height, width))
    cwt_low = np.reshape(cwt_low, (-1, 1, height, width))
    cwt_close = np.reshape(cwt_close, (-1, 1, height, width))
    cwt_volume = np.reshape(cwt_volume, (-1, 1, height, width))
    """Merge the channels"""
    cwt_start = np.append(cwt_start, cwt_high, axis=1)
    cwt_start = np.append(cwt_start, cwt_low, axis=1)
    cwt_start = np.append(cwt_start, cwt_close, axis=1)
    cwt_start = np.append(cwt_start, cwt_volume, axis=1)
    """Extract the label of each scalogram (label of its last time step), shape (number of samples,)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]
    # print(label_array.dtype) >>> bool
    return cwt_start, label_array
def make_scalogram(input_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc):
    """
    input_file_name  : Exchange data file name
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    height           : Image height (number of time steps)
    width            : Image width (number of scales, i.e. frequencies)
    ch_flag          : Number of channels to use, ch_flag=1: close, ch_flag=2: close and volume, ch_flag=5: start, high, low, close, volume
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    over_lap_inc     : Step by which the CWT start time is shifted
    """
    scalogram = np.empty((0, ch_flag, height, width))  # Arrays to store all scalograms and labels
    label = np.array([])
    over_lap_start = 0
    over_lap_end = int((height - 1) / over_lap_inc) * over_lap_inc + 1
    if ch_flag==1:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (5,), skiprows = 1)  # Get the closing price as a numpy array
        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:]  # Shift the CWT start time
            cwt_list, label_array, freq = CWT_1(temp_time_series, scales, wavelet, predict_time_inc, save_flag)  # Run the CWT
            temp_scalogram, temp_label = merge_CWT_1(cwt_list, label_array, height, width)  # Create the scalograms
            scalogram = np.append(scalogram, temp_scalogram, axis=0)  # Combine all scalograms and labels into one array
            label = np.append(label, temp_label)
        print("scalogram_shape " + str(scalogram.shape))
        print("label shape " + str(label.shape))
        print("frequency " + str(freq))
    elif ch_flag==2:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (5,6), skiprows = 1)  # Get the closing price and volume as a numpy array
        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:]  # Shift the CWT start time
            cwt_list, label_array, freq = CWT_2(temp_time_series, scales, wavelet, predict_time_inc, save_flag)  # Run the CWT
            temp_scalogram, temp_label = merge_CWT_2(cwt_list, label_array, height, width)  # Create the scalograms
            scalogram = np.append(scalogram, temp_scalogram, axis=0)  # Combine all scalograms and labels into one array
            label = np.append(label, temp_label)
        print("scalogram_shape " + str(scalogram.shape))
        print("label shape " + str(label.shape))
        print("frequency " + str(freq))
    elif ch_flag==5:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (2,3,4,5,6), skiprows = 1)  # Get the open, high, low and closing prices and the volume as a numpy array
        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:]  # Shift the CWT start time
            cwt_list, label_array, freq = CWT_5(temp_time_series, scales, wavelet, predict_time_inc, save_flag)  # Run the CWT
            temp_scalogram, temp_label = merge_CWT_5(cwt_list, label_array, height, width)  # Create the scalograms
            scalogram = np.append(scalogram, temp_scalogram, axis=0)  # Combine all scalograms and labels into one array
            label = np.append(label, temp_label)
            # print(temp_label.dtype) >>> bool
            # print(label.dtype) >>> float64
        print("scalogram_shape " + str(scalogram.shape))
        print("label shape " + str(label.shape))
        print("frequency " + str(freq))
    label = label.astype(int)  # np.int was removed in NumPy 1.24; use the built-in int
    return scalogram, label
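In `make_scalogram`, each pass shifts the start of the series by `i` and cuts it into non-overlapping windows again, so together the passes yield windows starting every `over_lap_inc` steps. `over_lap_end = int((height - 1) / over_lap_inc) * over_lap_inc + 1` makes `range` stop at the largest multiple of `over_lap_inc` below `height`; an offset of `height` would only repeat the offset-0 windows. The sketch checks the offsets and, as a suggested alternative (not the original code), collects per-pass arrays in a Python list and concatenates once, which avoids the full copy that `np.append` makes on every iteration. The zero/one arrays are placeholders for `temp_scalogram` and `temp_label`.

```python
import numpy as np

def overlap_offsets(height, over_lap_inc):
    """Start offsets used by make_scalogram (same formula as over_lap_end)."""
    over_lap_end = int((height - 1) / over_lap_inc) * over_lap_inc + 1
    return list(range(0, over_lap_end, over_lap_inc))

print(overlap_offsets(height=10, over_lap_inc=3))  # [0, 3, 6, 9]
print(overlap_offsets(height=12, over_lap_inc=4))  # [0, 4, 8]

# Accumulate in a list, concatenate once at the end
parts = [np.zeros((n, 1, 4, 4)) for n in (5, 4, 4)]  # placeholders for temp_scalogram
scalogram = np.concatenate(parts, axis=0)
label = np.concatenate([np.ones(5), np.zeros(4), np.ones(4)]).astype(int)  # built-in int, not np.int
print(scalogram.shape, label.shape)  # (13, 1, 4, 4) (13,)
```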
def merge_scalogram(input_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc):
    """
    input_file_name  : Exchange data file name
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    height           : Image height (number of time steps)
    width            : Image width (number of scales, i.e. frequencies)
    ch_flag          : Number of channels to use, ch_flag=1: close, ch_flag=2: close and volume, ch_flag=5: start, high, low, close, volume
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    over_lap_inc     : Step by which the CWT start time is shifted
    """
    scalogram = np.empty((0, ch_flag, height, width))  # Arrays to store all scalograms and labels
    label = np.array([])
    over_lap_start = 0
    over_lap_end = int((height - 1) / over_lap_inc) * over_lap_inc + 1
    if ch_flag==1:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (5,), skiprows = 1)  # Get the closing price as a numpy array
        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:]  # Shift the CWT start time
            temp_scalogram, temp_label, freq = create_scalogram_1(temp_time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width)
            scalogram = np.append(scalogram, temp_scalogram, axis=0)  # Combine all scalograms and labels into one array
            label = np.append(label, temp_label)
            # print("scalogram_shape " + str(scalogram.shape))
            # print("label shape " + str(label.shape))
            # print("frequency " + str(freq))
    if ch_flag==5:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (2,3,4,5,6), skiprows = 1)  # Get the open, high, low and closing prices and the volume as a numpy array
        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:]  # Shift the CWT start time
            temp_scalogram, temp_label, freq = create_scalogram_5(temp_time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width)
            scalogram = np.append(scalogram, temp_scalogram, axis=0)  # Combine all scalograms and labels into one array
            label = np.append(label, temp_label)
    label = label.astype(int)  # np.int was removed in NumPy 1.24; use the built-in int
    return scalogram, label, freq
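With `ch_flag=1`, the number of scalograms that `merge_scalogram` returns can be predicted in advance: the pass with offset `i` works on `N - i` points, and the `while` loop in `create_scalogram_1` produces `floor((N - i - predict_time_inc) / height)` windows, with exactly one label each. Summing over the offsets gives the total, which is a handy sanity check on `scalogram.shape[0]`. A sketch; `expected_num_samples` is a hypothetical helper, and the numbers below (10,000 bars, 128-step windows, a 144-bar horizon, which is 12 hours of 5-minute bars, and offsets every 32 steps) are example values only.

```python
def expected_num_samples(n, height, predict_time_inc, over_lap_inc):
    """Number of windows merge_scalogram should produce on the ch_flag=1 path."""
    over_lap_end = int((height - 1) / over_lap_inc) * over_lap_inc + 1
    total = 0
    for i in range(0, over_lap_end, over_lap_inc):
        # Windows of `height` steps that still leave predict_time_inc steps for the label
        total += (n - i - predict_time_inc) // height
    return total

print(expected_num_samples(10_000, 128, 144, 32))  # 305
```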
def merge_scalogram2(input_file_name, scales, wavelet, heights, base_height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc):
    """
    input_file_name  : Exchange data file name
    scales           : Scales to use, as a numpy array. Each scale corresponds to the frequency of the analysing wavelet: a large scale means a low frequency, a small scale a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    heights          : Image heights (numbers of time steps), given as a list
    width            : Image width (number of scales, i.e. frequencies)
    ch_flag          : Number of channels to use, ch_flag=1: close, ch_flag=2: close and volume, ch_flag=5: start, high, low, close, volume
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    over_lap_inc     : Step by which the CWT start time is shifted
    base_height      : Height of the scalograms used as training data
    """
    scalogram = np.empty((0, ch_flag, base_height, width))  # Arrays to store all scalograms and labels
    label = np.array([])
    over_lap_start = 0
    over_lap_end = int((base_height - 1) / over_lap_inc) * over_lap_inc + 1
    if ch_flag==1:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (5,), skiprows = 1)  # Get the closing price as a numpy array
        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:]  # Shift the CWT start time
            temp_scalogram, temp_label, freq = variable_timelines_scalogram_1(temp_time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, heights, base_height, width)
            scalogram = np.append(scalogram, temp_scalogram, axis=0)  # Combine all scalograms and labels into one array
            label = np.append(label, temp_label)
            # print("scalogram_shape " + str(scalogram.shape))
            # print("label shape " + str(label.shape))
            # print("frequency " + str(freq))
    label = label.astype(int)  # np.int was removed in NumPy 1.24; use the built-in int
    return scalogram, label, freq
def merge_scalogram3(USD_csv, EUR_csv, scales, wavelet, heights, base_height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc):
    """
    USD_csv : USD/JPY exchange-rate data file name
    EUR_csv : EUR/JPY exchange-rate data file name
    scales : Scales to use, given as a numpy array. Each scale corresponds to a wavelet frequency: large scales give low frequencies, small scales give high frequencies
    wavelet : Wavelet name, one of the following:
        'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Number of time steps ahead at which the price movement is predicted
    heights : Image heights (number of time points), given as a list
    width : Image width (number of frequency lines)
    ch_flag : Number of channels to use, ch_flag=1: close (ch_flag=2: close and volume, and ch_flag=5: open, high, low, close, volume, are under construction)
    save_flag : save_flag=1: save the CWT coefficients as a CSV file, save_flag=0: do not save them
    over_lap_inc : Increment of the CWT start time between passes
    base_height : Height of the scalograms used as training data
    """
    scalogram = np.empty((0, 2, base_height, width)) # Array to store all scalograms (2 channels: USD/JPY and EUR/JPY)
    label = np.array([])
    over_lap_start = 0
    over_lap_end = int((base_height - 1) / over_lap_inc) * over_lap_inc + 1
    if ch_flag==1:
        print("Reading the input file...")
        USD_close, EUR_close = align_USD_EUR(USD_csv, EUR_csv) # Remove missing data from USD/JPY and EUR/JPY and keep only the closing prices at timestamps present in both
        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("Over Lap Start " + str(i))
            temp_USD_close = USD_close[i:] # Change the start time of CWT
            temp_EUR_close = EUR_close[i:]
            print("CWT USD/JPY")
            temp_USD_scalogram, temp_USD_label, USD_freq = variable_timelines_scalogram_1(temp_USD_close, scales, wavelet, predict_time_inc, save_flag, ch_flag, heights, base_height, width)
            print("CWT EUR/JPY")
            temp_EUR_scalogram, temp_EUR_label, EUR_freq = variable_timelines_scalogram_1(temp_EUR_close, scales, wavelet, predict_time_inc, save_flag, ch_flag, heights, base_height, width)
            # print("temp USD scalogram shape " + str(temp_USD_scalogram.shape))
            # print("temp EUR scalogram shape " + str(temp_EUR_scalogram.shape))
            temp_scalogram = np.append(temp_USD_scalogram, temp_EUR_scalogram, axis=1) # Stack USD/JPY and EUR/JPY as two channels
            # print("temp scalogram shape " + str(temp_scalogram.shape))
            scalogram = np.append(scalogram, temp_scalogram, axis=0) # Combine all scalograms and labels into one array
            label = np.append(label, temp_USD_label) # Only the USD/JPY labels are used
            # label = np.append(label, temp_EUR_label)
            print("Scalogram shape " + str(scalogram.shape))
            print("Label shape " + str(label.shape))
            print("")
            # print("scalogram_shape " + str(scalogram.shape))
            # print("label shape " + str(label.shape))
            # print("frequency " + str(freq))
        label = label.astype(int) # Built-in int: the np.int alias was removed in NumPy 1.24
        return scalogram, label, USD_freq
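In merge_scalogram2 and merge_scalogram3, the CWT is repeated with the start of the series shifted by over_lap_inc samples (5-minute bars), for offsets from 0 up to over_lap_end. The sketch below reproduces only that offset arithmetic and, under an assumption of mine, lists the window start positions it would produce. The assumption is that each pass cuts the shifted series into consecutive, non-overlapping windows of base_height samples; the actual cutting happens in variable_timelines_scalogram_1, which is not reproduced here. The helper names overlap_offsets and window_starts are hypothetical.

```python
def overlap_offsets(base_height, over_lap_inc):
    """Start offsets used by merge_scalogram2/3: range(0, over_lap_end, over_lap_inc)."""
    over_lap_end = int((base_height - 1) / over_lap_inc) * over_lap_inc + 1
    # The largest offset is at most base_height - 1, so every offset lies inside one window length
    return list(range(0, over_lap_end, over_lap_inc))


def window_starts(n_samples, base_height, over_lap_inc):
    """Hypothetical helper: start index of every window over all passes.

    ASSUMPTION: each pass cuts the shifted series into consecutive,
    non-overlapping windows of base_height samples (the real cutting is done
    in variable_timelines_scalogram_1, which is not reproduced here).
    """
    starts = []
    for offset in overlap_offsets(base_height, over_lap_inc):
        n_windows = (n_samples - offset) // base_height
        starts.extend(offset + k * base_height for k in range(n_windows))
    return sorted(starts)


print(overlap_offsets(128, 32))  # [0, 32, 64, 96]
print(window_starts(10, 4, 2))   # [0, 2, 4, 6]
```

Under that assumption, the shifted passes behave roughly like a sliding window with stride over_lap_inc, which is why they act as a form of data augmentation. Stopping the offsets below base_height avoids passes that would only repeat windows already cut at a smaller offset.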
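merge_scalogram3 builds two-channel images by appending the USD/JPY and EUR/JPY scalograms along axis=1, then grows the result with np.append on every overlap pass. np.append returns a new copy each time, so collecting the per-pass arrays in a list and concatenating once gives the same array with less copying. The sketch below assumes each per-pass scalogram has shape (n_windows, 1, H, W), as implied by the np.empty((0, 2, base_height, width)) initialisation with ch_flag=1. The function name merge_usd_eur is hypothetical, and, as in the original, only the USD/JPY labels are kept.

```python
import numpy as np


def merge_usd_eur(usd_parts, eur_parts, usd_label_parts):
    """Hypothetical sketch of the merge step in merge_scalogram3.

    usd_parts, eur_parts: lists with one array per overlap pass,
        each shaped (n_windows, 1, H, W) (one channel per currency, ch_flag=1).
    usd_label_parts: list with one 1-D label array per pass
        (only the USD/JPY labels are kept, as in the original code).
    """
    images = []
    for usd, eur in zip(usd_parts, eur_parts):
        # Both currencies must give the same number of windows per pass; aligning the
        # two series on common timestamps beforehand is what should make this hold.
        if usd.shape != eur.shape:
            raise ValueError(f"shape mismatch: {usd.shape} vs {eur.shape}")
        # Stack along the channel axis: (n, 1, H, W) + (n, 1, H, W) -> (n, 2, H, W)
        images.append(np.concatenate([usd, eur], axis=1))
    # One concatenation at the end instead of a full copy on every pass
    scalogram = np.concatenate(images, axis=0)
    # Built-in int: the np.int alias was removed in NumPy 1.24
    label = np.concatenate(usd_label_parts).astype(int)
    return scalogram, label


# Usage with dummy data: two passes with 3 and 2 windows, 4 x 5 images
usd_parts = [np.zeros((3, 1, 4, 5)), np.zeros((2, 1, 4, 5))]
eur_parts = [np.ones((3, 1, 4, 5)), np.ones((2, 1, 4, 5))]
labels = [np.array([0.0, 1.0, 1.0]), np.array([1.0, 0.0])]
scalogram, label = merge_usd_eur(usd_parts, eur_parts, labels)
print(scalogram.shape, label)  # (5, 2, 4, 5) [0 1 1 1 0]
```

The explicit shape check is a design choice: if the two currencies ever produced a different number of windows, np.concatenate along axis=1 would fail with a less readable error, and mismatched pairs would silently mislabel EUR/JPY images if the counts happened to match by accident after a misalignment.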