This method has not produced good results yet. I am trying out ideas as a hobby, so this article will not be useful if you are looking for a tool you can use right away. Please keep that in mind.
In the previous two articles, I converted exchange-rate data (USD/JPY) into images with the spectrogram, a technique from audio analysis, and trained a CNN on them. The result was a crushing defeat: the accuracy on the test data did not improve.
[Article before last] Examination of exchange rate forecasting method using Deep Learning and Spectrogram
[Previous article] Examination of exchange rate forecasting method using Deep Learning and Spectrogram-Part 2-
While thinking about what to do next, I came across an article claiming that the wavelet transform suits financial data better than the FFT. So this time I examined whether combining the wavelet transform with a CNN can predict whether the exchange rate will be higher or lower 30 minutes later.
Figure 1 shows a schematic diagram of the wavelet transform. The FFT used in the previous articles expresses a complex waveform as a sum of sine waves that extend infinitely in time. The wavelet transform, by contrast, expresses a complex waveform as a sum of localized waves (wavelets). The FFT is good at analyzing stationary signals, whereas the wavelet transform is suited to irregular, non-stationary waveforms.
Figure 1. Schematic diagram of wavelet transform Source: https://www.slideshare.net/ryosuketachibana12/ss-42388444
A map of the wavelet strength at each shift (time) and each scale (frequency) is called a scalogram. Figure 2 is a scalogram created from the wavelet transform of y = sin(πx/16). In this way, the wavelet transform can turn an arbitrary waveform into an image.
Figure 2. Scalogram example, y = sin (πx / 16)
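As a rough illustration of how such a scalogram can be computed, here is a minimal sketch using only NumPy. It is not the code used for the analysis (that code calls pywt.cwt, see the appendix); the function names, the scale range 1 to 32 and the unnormalised first-derivative-of-Gaussian wavelet are assumptions chosen for the example.

```python
import numpy as np

def gaus1(t):
    """First derivative of a Gaussian (unnormalised), used as the mother wavelet."""
    return -t * np.exp(-t ** 2 / 2.0)

def cwt_scalogram(signal, scales, wavelet=gaus1, support=5.0):
    """Return the magnitude of a simple CWT, shape (len(scales), len(signal))."""
    signal = np.asarray(signal, dtype=float)
    out = np.empty((len(scales), signal.size))
    for i, s in enumerate(scales):
        # Sample the stretched wavelet on +/- `support` widths and scale it by 1/sqrt(s)
        half = int(np.ceil(support * s))
        t = np.arange(-half, half + 1) / s
        psi = wavelet(t) / np.sqrt(s)
        # Convolution flips the kernel, which only changes the sign for this
        # antisymmetric wavelet; taking the magnitude removes that difference
        out[i] = np.abs(np.convolve(signal, psi, mode="same"))
    return out

x = np.arange(512)
y = np.sin(np.pi * x / 16)
scales = np.arange(1, 33)
scalogram = cwt_scalogram(y, scales)

# Scale with the strongest average response away from the edges
best_scale = scales[np.argmax(scalogram[:, 128:384].mean(axis=1))]
print("scalogram shape:", scalogram.shape, "strongest scale:", best_scale)
```

For a pure sine wave the response concentrates in a narrow band of scales, which is the horizontal stripe seen in Figure 2.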
There are two kinds of wavelet transform, the continuous wavelet transform (CWT) and the discrete wavelet transform (DWT); this time I used the CWT. Wavelets come in various shapes, but for now I use a Gaussian wavelet ("gaus1" in the code in the appendix).
I created scalograms from the closing prices of USD/JPY 5-minute bars. From several years of data, I repeatedly cut out 24 hours of data (288 bars) and turned each piece into one scalogram. One data sample is one scalogram plus the direction of the price movement (up or down) 30 minutes after its last bar. There was one problem here: the scalogram is distorted near its boundaries (edges). This happens because there is no data beyond the boundary, so a wavelet placed near either end of the window partly overlaps a region with no data.
Figure 3. Distortion at the scalogram boundary
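The slicing and labelling described above can be sketched as follows. This is a simplified stand-alone version, not the appendix code itself; the function name and the toy price series are made up for the example, while the window of 288 bars and the horizon of 6 bars (30 minutes) follow the settings in the appendix.

```python
import numpy as np

def make_windows_and_labels(close, window=288, horizon=6):
    """Cut non-overlapping windows and label each by the move `horizon` bars later.

    A label is 1 if the close `horizon` bars after the last bar of the window
    is higher than the close of that last bar, otherwise 0.
    """
    close = np.asarray(close, dtype=float)
    windows, labels = [], []
    start = 0
    while start + window + horizon <= close.size:
        end = start + window
        windows.append(close[start:end])
        labels.append(int(close[end - 1 + horizon] > close[end - 1]))
        start = end  # the next window starts where this one ended
    return np.array(windows), np.array(labels)

# Toy series that rises steadily, so every label should be "up"
demo_close = np.arange(600, dtype=float)
windows, labels = make_windows_and_labels(demo_close)
print(windows.shape, labels)
```

Each row of `windows` would then be turned into one scalogram, and the matching entry of `labels` is what the CNN is asked to predict.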
To remove the distortion, I appended a left-right mirrored copy of the raw data to each end of it. After the wavelet transform, I extracted only the central part, which corresponds to the raw data. This is a common way to suppress edge distortion, but opinions on it seem divided because it adds fictitious data.
Figure 4. How to remove distortion at the boundary
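The mirror-padding step can be sketched on its own, independent of the wavelet transform. In the example below a moving average stands in for the transform, purely as an assumption for illustration, because it shows the same kind of edge effect; the helper names are hypothetical.

```python
import numpy as np

def transform_with_mirror_padding(window, transform):
    """Apply `transform` to [mirrored | window | mirrored] and keep the middle part.

    `transform` must return an array whose last axis is time and has the same
    length as its input.
    """
    window = np.asarray(window, dtype=float)
    n = window.size
    mirrored = window[::-1]
    padded = np.concatenate([mirrored, window, mirrored])
    result = np.asarray(transform(padded))
    return result[..., n:2 * n]  # only the part that corresponds to the raw data

def moving_average(signal, width=9):
    """Stand-in transform; np.convolve treats values outside the signal as zero."""
    return np.convolve(signal, np.ones(width) / width, mode="same")

ramp = 100.0 + np.arange(50)
plain = moving_average(ramp)
padded = transform_with_mirror_padding(ramp, moving_average)

print("error at the left edge without padding:", abs(plain[0] - ramp[0]))
print("error at the left edge with padding:   ", abs(padded[0] - ramp[0]))
```

Without padding, the values near the edges are pulled toward zero because nothing exists beyond the boundary. With mirror padding the transform sees plausible, though fictitious, data there, which is the trade-off mentioned above.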
I also changed the training flow a little. Until last time, I trained on the past 10 years of data and verified the accuracy on the most recent 3 months. This time, after training on the past 10 years, I continued training on the past 5 years, and then shortened the training period further to 2 years and 1 year. The reasoning is that the most recent price movements should carry more weight when predicting the future. I also extended the test period to 5 months. The test data is never used for training; in other words, it is unseen data for the AI.
Figure 5. Learning flow
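The idea of this flow is that the same parameters keep being trained while the data is narrowed to more and more recent periods. Here is a toy sketch of that idea with a small logistic regression instead of the CNN; the model, the synthetic data, the stage sizes and all names are assumptions for illustration only.

```python
import numpy as np

def sgd_steps(w, x, t, iters, rng, lr=0.1, batch_size=32):
    """Run `iters` mini-batch gradient steps of logistic regression on (x, t)."""
    for _ in range(iters):
        idx = rng.choice(len(x), batch_size)
        p = 1.0 / (1.0 + np.exp(-(x[idx] @ w)))
        w = w - lr * (x[idx].T @ (p - t[idx])) / batch_size
    return w

def staged_training(x, t, recent_sizes, iters_per_stage, seed=0):
    """Train on the last n samples for each n in `recent_sizes`, reusing the weights."""
    rng = np.random.default_rng(seed)
    w = np.zeros(x.shape[1])
    for n_recent in recent_sizes:
        # Rows are assumed to be in time order, so the last rows are the most recent
        w = sgd_steps(w, x[-n_recent:], t[-n_recent:], iters_per_stage, rng)
    return w

# Synthetic data in time order: the label depends only on the first feature
data_rng = np.random.default_rng(1)
x_demo = data_rng.normal(size=(1000, 3))
t_demo = (x_demo[:, 0] > 0).astype(float)

w_final = staged_training(x_demo, t_demo, recent_sizes=[1000, 500, 200, 100], iters_per_stage=200)
print("final weights:", w_final)
```

The weights are never reset between stages, so each shorter period fine-tunes what was learned on the longer one.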
Figure 6 shows the structure of the CNN used this time.
Figure 6. Structure of CNN used this time
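To make the layer sizes concrete, the following sketch traces the feature-map shapes through the network, assuming the settings in the appendix code: a 288 x 128 single-channel input, two 5x5 convolutions with "SAME" padding and 16 and 32 filters, each followed by 2x2 max pooling, then a 1024-unit fully connected layer. The function names are made up for the example.

```python
import math

def same_pool_out(size, stride=2):
    """Output length of a pooling layer with 'SAME' padding."""
    return math.ceil(size / stride)

def cnn_shapes(height=288, width=128, filters=(16, 32)):
    """Return the shape after each conv + pool stage and the flattened size."""
    shapes = []
    h, w = height, width
    for n_filters in filters:
        # A stride-1 'SAME' convolution keeps height and width; pooling halves them
        h, w = same_pool_out(h), same_pool_out(w)
        shapes.append((h, w, n_filters))
    flat_size = h * w * filters[-1]
    return shapes, flat_size

shapes, flat_size = cnn_shapes()
fc1_weights = flat_size * 1024
print(shapes, flat_size, fc1_weights)
```

Almost all of the parameters sit in the first fully connected layer, because the flattened feature map is still large after only two pooling stages.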
So I ran it expecting it to go well, but the result is shown in Figure 7. Once again, the accuracy on the test data did not improve. Incidentally, the accuracy on the training data drops at iterations = 20000 and 30000, which coincides with the points where the training data period is switched.
Figure 7. Calculation result
I think one reason it does not work is that every scalogram covers a fixed time span. This time, every scalogram is created from 24 hours of waveform data, whereas people who actually trade change the span of the waveform they look at as needed. Recently I have become interested in game theory and am studying it, so I will take a break from currency analysis for a while.
Yu-Nie
Appendix
The data used for the analysis can be downloaded from the following.
Training data
USDJPY_20070301_20170228_5min.csv
USDJPY_20120301_20170228_5min.csv
USDJPY_20150301_20170228_5min.csv
USDJPY_20160301_20170228_5min.csv
Test data
USDJPY_20170301_20170731_5min.csv
Below is the code used for the analysis.
Jack_for_qiita_TF_version.py
# 20170821
# y.izumi
import tensorflow as tf
import numpy as np
import scalogram2 as sca
import time
"""Functions that perform parameter initialization, convolution operations, and pooling operations"""
#=============================================================================================================================================
# Weight initialization function
def weight_variable(shape, stddev=1e-4): # default stddev = 1e-4
    initial = tf.truncated_normal(shape, stddev=stddev)
    return tf.Variable(initial)

# Bias initialization function
def bias_variable(shape):
    initial = tf.constant(0.0, shape=shape)
    return tf.Variable(initial)

# Convolution operation
def conv2d(x, W):
    return tf.nn.conv2d(x, W, strides=[1, 1, 1, 1], padding="SAME")

# Pooling
def max_pool_2x2(x):
    return tf.nn.max_pool(x, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding="SAME")
#=============================================================================================================================================
"""Functions that perform learning"""
#=============================================================================================================================================
def train(x_train, t_train, x_test, t_test, iters, acc_list, num_data_each_conf, acc_each_conf, total_cal_time, train_step, train_batch_size, test_batch_size):
    """
    x_train : Training data
    t_train : Training labels, one-hot
    x_test : Test data
    t_test : Test labels, one-hot
    iters : Number of training iterations
    acc_list : List that records the history of the accuracy
    num_data_each_conf : List that records the history of the number of samples at each confidence level
    acc_each_conf : List that records the history of the accuracy at each confidence level
    total_cal_time : Total calculation time
    train_step : Training op
    train_batch_size : Batch size of training data
    test_batch_size : Batch size of test data
    """
    train_size = x_train.shape[0] # Number of training samples
    test_size = x_test.shape[0] # Number of test samples
    start_time = time.time()
    iters = iters + 1
    for step in range(iters):
        batch_mask = np.random.choice(train_size, train_batch_size)
        tr_batch_xs = x_train[batch_mask]
        tr_batch_ys = t_train[batch_mask]
        # Check the accuracy during training
        if step%100 == 0:
            cal_time = time.time() - start_time # Calculation time count
            total_cal_time += cal_time
            # train
            train_accuracy = accuracy.eval(feed_dict={x: tr_batch_xs, y_: tr_batch_ys, keep_prob: 1.0})
            train_loss = cross_entropy.eval(feed_dict={x: tr_batch_xs, y_: tr_batch_ys, keep_prob: 1.0})
            # test
            # use all data
            test_accuracy = accuracy.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0})
            test_loss = cross_entropy.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0})
            # use test batch
            # batch_mask = np.random.choice(test_size, test_batch_size)
            # te_batch_xs = x_test[batch_mask]
            # te_batch_ys = t_test[batch_mask]
            # test_accuracy = accuracy.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})
            # test_loss = cross_entropy.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})
            print("calculation time %d sec, step %d, training accuracy %g, training loss %g, test accuracy %g, test loss %g"%(cal_time, step, train_accuracy, train_loss, test_accuracy, test_loss))
            acc_list.append([step, train_accuracy, test_accuracy, train_loss, test_loss])
            AI_prediction = y_conv.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0}) # AI prediction results
            # print("AI_prediction.shape " + str(AI_prediction.shape)) # for debug
            # print("AI_prediction.type" + str(type(AI_prediction)))
            AI_correct_prediction = correct_prediction.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0}) # Correct answer: True, incorrect answer: False
            # np.int was removed in NumPy 1.24, so the built-in int is used instead
            AI_correct_prediction_int = AI_correct_prediction.astype(int) # Correct answer: 1, incorrect answer: 0
            # Calculate the number of samples and the accuracy for each confidence level.
            # p is the first softmax output. A sample belongs to the "60to70" bin if
            # 0.6 < p <= 0.7 or 0.3 <= p < 0.4, and likewise for the other bins.
            p = AI_prediction[:,0]
            conf_bins = [
                # (lower, upper, mirrored lower, mirrored upper)
                (0.5, 0.6, 0.4, 0.5),
                (0.6, 0.7, 0.3, 0.4),
                (0.7, 0.8, 0.2, 0.3),
                (0.8, 0.9, 0.1, 0.2),
                (0.9, 1.0, 0.0, 0.1),
            ]
            sums = [] # Number of samples in each bin
            accs = [] # Accuracy in each bin
            for lower, upper, m_lower, m_upper in conf_bins:
                # Only the 50to60 bin includes its lower bound (p == 0.5)
                above = (p >= lower) if lower == 0.5 else (p > lower)
                in_bin = np.logical_or(np.logical_and(above, p <= upper), np.logical_and(p >= m_lower, p < m_upper))
                # np.int was removed in NumPy 1.24, so the built-in int is used instead
                num_in_bin = np.sum(in_bin.astype(int))
                num_correct = np.sum(np.logical_and(in_bin, AI_correct_prediction).astype(int))
                sums.append(num_in_bin)
                accs.append(num_correct / num_in_bin) # nan when the bin is empty
            print("Number of data of each confidence 50to60:%g, 60to70:%g, 70to80:%g, 80to90:%g, 90to100:%g "%tuple(sums))
            print("Accuracy rate of each confidence 50to60:%g, 60to70:%g, 70to80:%g, 80to90:%g, 90to100:%g "%tuple(accs))
            print("")
            num_data_each_conf.append([step] + sums)
            acc_each_conf.append([step] + accs)
            # Exporting files for tensorboard
            result = sess.run(merged, feed_dict={x:tr_batch_xs, y_: tr_batch_ys, keep_prob: 1.0})
            writer.add_summary(result, step)
            start_time = time.time()
        # Execution of training
        train_step.run(feed_dict={x: tr_batch_xs, y_: tr_batch_ys, keep_prob: 0.5})
    return acc_list, num_data_each_conf, acc_each_conf, total_cal_time
#==============================================================================================================================================
"""Functions that create scalograms and labels"""
#==============================================================================================================================================
def make_scalogram(train_file_name, test_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc):
    """
    train_file_name : File name of training data
    test_file_name : File name of test data
    scales : Scales to use, as a numpy array. The scale corresponds to the frequency of the wavelet used for analysis: a large scale means low frequency, a small scale means high frequency
    wavelet : Wavelet name, one of the following
     'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    height : Image height, num of time lines
    width : Image width, num of freq lines
    predict_time_inc : Increment of time to predict price movement
    ch_flag : Number of channels to use, ch_flag=1:close, ch_flag=5:start, high, low, close, volume
    save_flag : save_flag=1 : Save the CWT coefficients as a csv file, save_flag=0 : Do not save the CWT coefficients
    over_lap_inc : Increment of the CWT start time
    """
    # Creating scalograms and labels
    # train
    x_train, t_train, freq_train = sca.merge_scalogram(train_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc)
    # x_train, t_train, freq_train = sca.merge_scalogram(test_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc) # for debug
    # test
    x_test, t_test, freq_test = sca.merge_scalogram(test_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc)
    print("x_train shape " + str(x_train.shape))
    print("t_train shape " + str(t_train.shape))
    print("x_test shape " + str(x_test.shape))
    print("t_test shape " + str(t_test.shape))
    print("frequency " + str(freq_test))
    # Swap dimensions for tensorflow
    x_train = x_train.transpose(0, 2, 3, 1) # (num_data, ch, height(time_lines), width(freq_lines)) ⇒ (num_data, height(time_lines), width(freq_lines), ch)
    x_test = x_test.transpose(0, 2, 3, 1)
    train_size = x_train.shape[0] # Number of training samples
    test_size = x_test.shape[0] # Number of test samples
    # labels to one-hot
    # astype(int) makes sure the labels are integer column indices (0 or 1), even if they arrive as booleans
    t_train_onehot = np.zeros((train_size, 2))
    t_test_onehot = np.zeros((test_size, 2))
    t_train_onehot[np.arange(train_size), t_train.astype(int)] = 1
    t_test_onehot[np.arange(test_size), t_test.astype(int)] = 1
    t_train = t_train_onehot
    t_test = t_test_onehot
    # print("t train shape onehot" + str(t_train.shape)) # for debug
    # print("t test shape onehot" + str(t_test.shape))
    return x_train, t_train, x_test, t_test
#==============================================================================================================================================
"""Scalogram creation conditions"""
#=============================================================================================================================================
predict_time_inc = 6 # Number of bars ahead at which the price movement is predicted (6 x 5 min = 30 min)
height = 288 # Image height, num of time lines (288 x 5 min = 24 h)
width = 128 # Image width, num of freq lines
ch_flag = 1 # Number of channels to use, ch_flag=1:close, ch_flag=5:start, high, low, close, volume
input_dim = (ch_flag, height, width) # channel = (1, 5), height(time_lines), width(freq_lines)
save_flag = 0 # save_flag=1 : Save the CWT coefficients as a csv file, save_flag=0 : Do not save the CWT coefficients
scales = np.linspace(0.2,80,width) # Scales to use, as a numpy array. A large scale means low frequency, a small scale means high frequency
# scales = np.arange(1,129)
wavelet = "gaus1" # Wavelet name, 'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
over_lap_inc = 72 # Increment of the CWT start time
#==============================================================================================================================================
"""Build CNN"""
#==============================================================================================================================================
x = tf.placeholder(tf.float32, [None, input_dim[1], input_dim[2], input_dim[0]]) # (num_data, height(time), width(freq_lines), ch)
y_ = tf.placeholder(tf.float32, [None, 2]) # (num_data, num_label)
print("input shape ", str(x.get_shape()))

with tf.variable_scope("conv1") as scope:
    W_conv1 = weight_variable([5, 5, input_dim[0], 16])
    b_conv1 = bias_variable([16])
    h_conv1 = tf.nn.relu(conv2d(x, W_conv1) + b_conv1)
    h_pool1 = max_pool_2x2(h_conv1)
    print("conv1 shape ", str(h_pool1.get_shape()))

with tf.variable_scope("conv2") as scope:
    W_conv2 = weight_variable([5, 5, 16, 32])
    b_conv2 = bias_variable([32])
    h_conv2 = tf.nn.relu(conv2d(h_pool1, W_conv2) + b_conv2)
    h_pool2 = max_pool_2x2(h_conv2)
    print("conv2 shape ", str(h_pool2.get_shape()))

h_pool2_height = int(h_pool2.get_shape()[1])
h_pool2_width = int(h_pool2.get_shape()[2])

with tf.variable_scope("fc1") as scope:
    W_fc1 = weight_variable([h_pool2_height*h_pool2_width*32, 1024])
    b_fc1 = bias_variable([1024])
    h_pool2_flat = tf.reshape(h_pool2, [-1, h_pool2_height*h_pool2_width*32])
    h_fc1 = tf.nn.relu(tf.matmul(h_pool2_flat, W_fc1) + b_fc1)
    print("fc1 shape ", str(h_fc1.get_shape()))

keep_prob = tf.placeholder(tf.float32)
h_fc1_drop = tf.nn.dropout(h_fc1, keep_prob)

with tf.variable_scope("fc2") as scope:
    W_fc2 = weight_variable([1024, 2])
    b_fc2 = bias_variable([2])
    y_conv = tf.nn.softmax(tf.matmul(h_fc1_drop, W_fc2) + b_fc2)
    print("output shape ", str(y_conv.get_shape()))
#Visualize parameters with tensorboard
W_conv1 = tf.summary.histogram("W_conv1", W_conv1)
b_conv1 = tf.summary.histogram("b_conv1", b_conv1)
W_conv2 = tf.summary.histogram("W_conv2", W_conv2)
b_conv2 = tf.summary.histogram("b_conv2", b_conv2)
W_fc1 = tf.summary.histogram("W_fc1", W_fc1)
b_fc1 = tf.summary.histogram("b_fc1", b_fc1)
W_fc2 = tf.summary.histogram("W_fc2", W_fc2)
b_fc2 = tf.summary.histogram("b_fc2", b_fc2)
#==============================================================================================================================================
"""Specifying the error function"""
#==============================================================================================================================================
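# cross_entropy = -tf.reduce_sum(y_ * tf.log(y_conv))
# Note: y_conv is already a softmax output, while softmax_cross_entropy_with_logits
# expects raw logits, so softmax is effectively applied twice in the line below.
cross_entropy = tf.reduce_sum(tf.nn.softmax_cross_entropy_with_logits(labels = y_, logits = y_conv))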
loss_summary = tf.summary.scalar("loss", cross_entropy) # for tensorboard
#==============================================================================================================================================
"""Specify optimizer"""
#==============================================================================================================================================
optimizer = tf.train.AdamOptimizer(1e-4)
train_step = optimizer.minimize(cross_entropy)
# Visualize the gradients with tensorboard
# compute_gradients returns a list of (gradient, variable) pairs, so [0] selects the gradient
grads = optimizer.compute_gradients(cross_entropy)
dW_conv1 = tf.summary.histogram("dW_conv1", grads[0][0]) # for tensorboard
db_conv1 = tf.summary.histogram("db_conv1", grads[1][0])
dW_conv2 = tf.summary.histogram("dW_conv2", grads[2][0])
db_conv2 = tf.summary.histogram("db_conv2", grads[3][0])
dW_fc1 = tf.summary.histogram("dW_fc1", grads[4][0])
db_fc1 = tf.summary.histogram("db_fc1", grads[5][0])
dW_fc2 = tf.summary.histogram("dW_fc2", grads[6][0])
db_fc2 = tf.summary.histogram("db_fc2", grads[7][0])
# for i in range(8): # for debug
#     print(grads[i])
#==============================================================================================================================================
"""Parameters for accuracy verification"""
#==============================================================================================================================================
correct_prediction = tf.equal(tf.argmax(y_conv,1), tf.argmax(y_,1))
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
accuracy_summary = tf.summary.scalar("accuracy", accuracy) # for tensorboard
#==============================================================================================================================================
"""Execution of learning"""
#==============================================================================================================================================
acc_list = [] # List that records the history of the accuracy and the loss
num_data_each_conf = [] # List that records the history of the number of samples at each confidence level
acc_each_conf = [] # List that records the history of the accuracy at each confidence level
start_time = time.time() # Calculation time count
total_cal_time = 0
iters = 10000 # Number of training iterations for each training data set
train_batch_size = 100 # Training batch size
test_batch_size = 100 # Test batch size
with tf.Session() as sess:
    saver = tf.train.Saver()
    sess.run(tf.global_variables_initializer())
    # Exporting files for tensorboard
    merged = tf.summary.merge_all()
    writer = tf.summary.FileWriter(r"temp_result", sess.graph)

    test_file_name = "USDJPY_20170301_20170731_5min.csv" # Exchange data file name, test
    # Train on the longest period first, then continue on shorter, more recent periods
    training_stages = [
        ("10year", "USDJPY_20070301_20170228_5min.csv"),
        ("5year", "USDJPY_20120301_20170228_5min.csv"),
        ("2year", "USDJPY_20150301_20170228_5min.csv"),
        ("1year", "USDJPY_20160301_20170228_5min.csv"),
    ]
    for term, train_file_name in training_stages: # Exchange data file name, train
        print("learning term = " + term)
        # train_file_name = "USDJPY_20170301_20170731_5min.csv" # for debug
        # Creating the scalograms
        x_train, t_train, x_test, t_test = make_scalogram(train_file_name, test_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc)
        # Execution of training
        acc_list, num_data_each_conf, acc_each_conf, total_cal_time = train(x_train, t_train, x_test, t_test, iters, acc_list, num_data_each_conf, acc_each_conf, total_cal_time, train_step, train_batch_size, test_batch_size)

    # Final accuracy on the test data
    # use all data
    print("test accuracy %g"%accuracy.eval(feed_dict={x: x_test, y_: t_test, keep_prob: 1.0}))
    # use test batch
    # batch_mask = np.random.choice(test_size, test_batch_size)
    # te_batch_xs = x_test[batch_mask]
    # te_batch_ys = t_test[batch_mask]
    # test_accuracy = accuracy.eval(feed_dict={x: te_batch_xs, y_: te_batch_ys, keep_prob: 1.0})
    print("total calculation time %g sec"%total_cal_time)
    np.savetxt(r"temp_result\acc_list.csv", acc_list, delimiter = ",") # Write out the history of the accuracy and the loss
    np.savetxt(r"temp_result\number_of_data_each_confidence.csv", num_data_each_conf, delimiter = ",") # Write out the history of the number of samples at each confidence level
    np.savetxt(r"temp_result\accuracy_rate_of_each_confidence.csv", acc_each_conf, delimiter = ",") # Write out the history of the accuracy at each confidence level
    saver.save(sess, r"temp_result\spectrogram_model.ckpt") # Export the final parameters
#==============================================================================================================================================
scalogram2.py
# -*- coding: utf-8 -*-
"""
Created on Tue Jul 25 11:24:50 2017
@author: izumiy
"""
import pywt
import numpy as np
import matplotlib.pyplot as plt
def create_scalogram_1(time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width):
    """
    A function that performs a continuous wavelet transform
    Uses the closing price
    time_series : Currency data, closing price
    scales : Scales to use, as a numpy array. The scale corresponds to the frequency of the wavelet used for analysis: a large scale means low frequency, a small scale means high frequency
    wavelet : Wavelet name, one of the following
     'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Increment of time to predict price movement
    save_flag : save_flag=1 : Save the CWT coefficients as a csv file, save_flag=0 : Do not save the CWT coefficients
    ch_flag : Number of channels to use, ch_flag=1 : close
    height : Image height, num of time lines
    width : Image width, num of freq lines
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0] # Get the number of data points
    print("number of the series data : " + str(num_series_data))
    close = time_series
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    time_start = 0
    time_end = time_start + height
    scalogram = np.empty((0, ch_flag, height, width))
    # hammingWindow = np.hamming(height) # Hamming window
    # hanningWindow = np.hanning(height) # Hanning window
    # blackmanWindow = np.blackman(height) # Blackman window
    # bartlettWindow = np.bartlett(height) # Bartlett window
    while(time_end <= num_series_data - predict_time_inc):
        # print("time start " + str(time_start)) for debug
        temp_close = close[time_start:time_end]
        # With window function
        # temp_close = temp_close * hammingWindow
        # Mirror: add left-right inverted data before and after the data
        mirror_temp_close = temp_close[::-1]
        x = np.append(mirror_temp_close, temp_close)
        temp_close = np.append(x, mirror_temp_close)
        temp_cwt_close, freq_close = pywt.cwt(temp_close, scales, wavelet) # Performing continuous wavelet transform
        temp_cwt_close = temp_cwt_close.T # Transpose CWT(freq, time) ⇒ CWT(time, freq)
        # Mirror: extract only the central part that corresponds to the raw data
        temp_cwt_close = temp_cwt_close[height:2*height,:]
        temp_cwt_close = np.reshape(temp_cwt_close, (-1, ch_flag, height, width)) # num_data, ch, height(time), width(freq)
        # print("temp_cwt_close_shape " + str(temp_cwt_close.shape)) # for debug
        scalogram = np.append(scalogram, temp_cwt_close, axis=0)
        # print("cwt_close_shape " + str(cwt_close.shape)) # for debug
        time_start = time_end
        time_end = time_start + height
    """Creating a label"""
    print("make label...")
    # Compare two shifted sequences: True if the price predict_time_inc bars later is higher
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30]) # for debug
    """
    # Alternative using a while loop, slow
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc
    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0 # Go down
        else:
            label = 1 # Go up
        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30]) # for debug
    """
    """label_array(time), slice so that time is divisible by height"""
    raw_num_shift = label_array.shape[0]
    num_shift = int(raw_num_shift / height) * height
    label_array = label_array[0:num_shift]
    """Extraction of the label corresponding to each scalogram, (number of data, label)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]
    """File output"""
    if save_flag == 1:
        print("output the files")
        save_cwt_close = np.reshape(scalogram, (-1, width))
        np.savetxt("scalogram.csv", save_cwt_close, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")
    print("CWT is done")
    return scalogram, label_array, freq_close
def create_scalogram_5(time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width):
    """
    A function that performs a continuous wavelet transform
    Uses start, high, low, close and volume
    time_series : Currency data, columns: start, high, low, close, volume
    scales : Scales to use, as a numpy array. The scale corresponds to the frequency of the wavelet used for analysis: a large scale means low frequency, a small scale means high frequency
    wavelet : Wavelet name, one of the following
     'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Increment of time to predict price movement
    save_flag : save_flag=1 : Save the CWT coefficients as a csv file, save_flag=0 : Do not save the CWT coefficients
    ch_flag : Number of channels to use, ch_flag=5 : start, high, low, close, volume
    height : Image height, num of time lines
    width : Image width, num of freq lines
    """
    """Reading exchange time series data"""
    num_series_data = time_series.shape[0] # Get the number of data points
    print("number of the series data : " + str(num_series_data))
    start = time_series[:,0]
    high = time_series[:,1]
    low = time_series[:,2]
    close = time_series[:,3]
    volume = time_series[:,4]
    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    time_start = 0
    time_end = time_start + height
    scalogram = np.empty((0, ch_flag, height, width))
    while(time_end <= num_series_data - predict_time_inc):
        # print("time start " + str(time_start)) for debug
        temp_start = start[time_start:time_end]
        temp_high = high[time_start:time_end]
        temp_low = low[time_start:time_end]
        temp_close = close[time_start:time_end]
        temp_volume = volume[time_start:time_end]
        temp_cwt_start, freq_start = pywt.cwt(temp_start, scales, wavelet) # Performing continuous wavelet transform
        temp_cwt_high, freq_high = pywt.cwt(temp_high, scales, wavelet)
        temp_cwt_low, freq_low = pywt.cwt(temp_low, scales, wavelet)
        temp_cwt_close, freq_close = pywt.cwt(temp_close, scales, wavelet)
        temp_cwt_volume, freq_volume = pywt.cwt(temp_volume, scales, wavelet)
        temp_cwt_start = temp_cwt_start.T # Transpose CWT(freq, time) ⇒ CWT(time, freq)
        temp_cwt_high = temp_cwt_high.T
        temp_cwt_low = temp_cwt_low.T
        temp_cwt_close = temp_cwt_close.T
        temp_cwt_volume = temp_cwt_volume.T
        temp_cwt_start = np.reshape(temp_cwt_start, (-1, 1, height, width)) # num_data, ch, height(time), width(freq)
        temp_cwt_high = np.reshape(temp_cwt_high, (-1, 1, height, width))
        temp_cwt_low = np.reshape(temp_cwt_low, (-1, 1, height, width))
        temp_cwt_close = np.reshape(temp_cwt_close, (-1, 1, height, width))
        temp_cwt_volume = np.reshape(temp_cwt_volume, (-1, 1, height, width))
        # print("temp_cwt_close_shape " + str(temp_cwt_close.shape)) # for debug
        # Stack the five channels along axis 1
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_high, axis=1)
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_low, axis=1)
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_close, axis=1)
        temp_cwt_start = np.append(temp_cwt_start, temp_cwt_volume, axis=1)
        # print("temp_cwt_start_shape " + str(temp_cwt_start.shape)) for debug
        scalogram = np.append(scalogram, temp_cwt_start, axis=0)
        # print("cwt_close_shape " + str(cwt_close.shape)) # for debug
        time_start = time_end
        time_end = time_start + height
    """Creating a label"""
    print("make label...")
    # Compare two shifted sequences: True if the price predict_time_inc bars later is higher
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30]) # for debug
"""
#How to use while,slow
label_array = np.array([])
print(label_array)
time_start = 0
time_predict = time_start + predict_time_inc
while(time_predict < num_series_data):
if close[time_start] >= close[time_predict]:
label = 0 #Go down
else:
label = 1 #Go up
label_array = np.append(label_array, label)
time_start = time_start + 1
time_predict = time_start + predict_time_inc
# print(label_array[:30]) # for debag
"""
"""label_array(time),Slice so that time is divisible by height"""
raw_num_shift = label_array.shape[0]
num_shift = int(raw_num_shift / height) * height
label_array = label_array[0:num_shift]
"""Extraction of labels corresponding to each scalogram, (The number of data,label)"""
col = height - 1
label_array = np.reshape(label_array, (-1, height))
label_array = label_array[:, col]
"""File output"""
if save_flag == 1:
print("output the files")
save_cwt_close = np.reshape(scalogram, (-1, width))
np.savetxt("scalogram.csv", save_cwt_close, delimiter = ",")
np.savetxt("label.csv", label_array.T, delimiter = ",")
print("CWT is done")
return scalogram, label_array, freq_close
def CWT_1(time_series, scales, wavelet, predict_time_inc, save_flag):
    """
    Perform a continuous wavelet transform (CWT)
    Uses the closing price
    time_series      : Currency data (closing price)
    scales           : Scales to use, given as a numpy array. The scale corresponds to the frequency
                       of the wavelet used for analysis: a large scale means a low frequency,
                       a small scale means a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Time increment over which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    """

    """Reading exchange time series data"""
    num_series_data = time_series.shape[0] # Number of data points
    print("number of the series data : " + str(num_series_data))
    close = time_series

    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    cwt_close, freq_close = pywt.cwt(close, scales, wavelet)

    # Transpose: CWT(freq, time) ⇒ CWT(time, freq)
    cwt_close = cwt_close.T

    """Creating a label"""
    print("make label...")

    # Vectorized comparison of two shifted sequences
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30]) # for debug

    """
    # Alternative using a while loop
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc

    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0 # Goes down
        else:
            label = 1 # Goes up

        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30]) # for debug
    """

    """File output"""
    if save_flag == 1:
        print("output the files")
        np.savetxt("CWT_close.csv", cwt_close, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")

    print("CWT is done")
    return [cwt_close], label_array, freq_close

def merge_CWT_1(cwt_list, label_array, height, width):
    """
    Uses the closing price
    cwt_list    : List of CWT results
    label_array : Numpy array containing the labels
    height      : Image height (number of time steps)
    width       : Image width (number of scales)
    """
    print("merge CWT")
    cwt_close = cwt_list[0] # Closing price CWT(time, freq)

    """CWT(time, freq): slice so that the number of time steps is divisible by height"""
    # label_array is predict_time_inc shorter than the CWT result,
    # so the number of usable time steps is taken from the labels
    raw_num_shift = label_array.shape[0]
    num_shift = int(raw_num_shift / height) * height
    cwt_close = cwt_close[0:num_shift]
    label_array = label_array[0:num_shift]

    """Shape change, (number of data, channel, height(time), width(freq))"""
    cwt_close = np.reshape(cwt_close, (-1, 1, height, width))

    """Extract the label corresponding to each scalogram, (number of data, label)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]

    return cwt_close, label_array

def CWT_2(time_series, scales, wavelet, predict_time_inc, save_flag):
    """
    Perform a continuous wavelet transform (CWT)
    Uses the closing price and the volume
    time_series      : Currency data (closing price, volume)
    scales           : Scales to use, given as a numpy array. The scale corresponds to the frequency
                       of the wavelet used for analysis: a large scale means a low frequency,
                       a small scale means a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Time increment over which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    """

    """Reading exchange time series data"""
    num_series_data = time_series.shape[0] # Number of data points
    print("number of the series data : " + str(num_series_data))
    close = time_series[:,0]
    volume = time_series[:,1]

    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    cwt_close, freq_close = pywt.cwt(close, scales, wavelet)
    cwt_volume, freq_volume = pywt.cwt(volume, scales, wavelet)

    # Transpose: CWT(freq, time) ⇒ CWT(time, freq)
    cwt_close = cwt_close.T
    cwt_volume = cwt_volume.T

    """Creating a label"""
    print("make label...")

    # Vectorized comparison of two shifted sequences
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array[:30]) # for debug

    """
    # Alternative using a while loop
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc

    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0 # Goes down
        else:
            label = 1 # Goes up

        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30]) # for debug
    """

    """File output"""
    if save_flag == 1:
        print("output the files")
        np.savetxt("CWT_close.csv", cwt_close, delimiter = ",")
        np.savetxt("CWT_volume.csv", cwt_volume, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")

    print("CWT is done")
    return [cwt_close, cwt_volume], label_array, freq_close

def merge_CWT_2(cwt_list, label_array, height, width):
    """
    Uses the closing price and the volume
    cwt_list    : List of CWT results
    label_array : Numpy array containing the labels
    height      : Image height (number of time steps)
    width       : Image width (number of scales)
    """
    print("merge CWT")
    cwt_close = cwt_list[0]  # Closing price CWT(time, freq)
    cwt_volume = cwt_list[1] # Volume

    """CWT(time, freq): slice so that the number of time steps is divisible by height"""
    # label_array is predict_time_inc shorter than the CWT result,
    # so the number of usable time steps is taken from the labels
    raw_num_shift = label_array.shape[0]
    num_shift = int(raw_num_shift / height) * height
    cwt_close = cwt_close[0:num_shift]
    cwt_volume = cwt_volume[0:num_shift]
    label_array = label_array[0:num_shift]

    """Shape change, (number of data, channel, height(time), width(freq))"""
    cwt_close = np.reshape(cwt_close, (-1, 1, height, width))
    cwt_volume = np.reshape(cwt_volume, (-1, 1, height, width))

    """Merge"""
    cwt_close = np.append(cwt_close, cwt_volume, axis=1)

    """Extract the label corresponding to each scalogram, (number of data, label)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]

    return cwt_close, label_array

def CWT_5(time_series, scales, wavelet, predict_time_inc, save_flag):
    """
    Perform a continuous wavelet transform (CWT)
    Uses the open, high, low and closing prices and the volume
    time_series      : Currency data (open, high, low, close, volume)
    scales           : Scales to use, given as a numpy array. The scale corresponds to the frequency
                       of the wavelet used for analysis: a large scale means a low frequency,
                       a small scale means a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Time increment over which the price movement is predicted
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    """

    """Reading exchange time series data"""
    num_series_data = time_series.shape[0] # Number of data points
    print("number of the series data : " + str(num_series_data))
    start = time_series[:,0]
    high = time_series[:,1]
    low = time_series[:,2]
    close = time_series[:,3]
    volume = time_series[:,4]

    """Performing continuous wavelet transform"""
    # https://pywavelets.readthedocs.io/en/latest/ref/cwt.html
    print("carry out cwt...")
    cwt_start, freq_start = pywt.cwt(start, scales, wavelet)
    cwt_high, freq_high = pywt.cwt(high, scales, wavelet)
    cwt_low, freq_low = pywt.cwt(low, scales, wavelet)
    cwt_close, freq_close = pywt.cwt(close, scales, wavelet)
    cwt_volume, freq_volume = pywt.cwt(volume, scales, wavelet)

    # Transpose: CWT(freq, time) ⇒ CWT(time, freq)
    cwt_start = cwt_start.T
    cwt_high = cwt_high.T
    cwt_low = cwt_low.T
    cwt_close = cwt_close.T
    cwt_volume = cwt_volume.T

    """Creating a label"""
    print("make label...")

    # Vectorized comparison of two shifted sequences
    last_time = num_series_data - predict_time_inc
    current_close = close[:last_time]
    predict_close = close[predict_time_inc:]
    label_array = predict_close > current_close
    # print(label_array.dtype) >>> bool

    """
    # Alternative using a while loop
    label_array = np.array([])
    print(label_array)
    time_start = 0
    time_predict = time_start + predict_time_inc

    while(time_predict < num_series_data):
        if close[time_start] >= close[time_predict]:
            label = 0 # Goes down
        else:
            label = 1 # Goes up

        label_array = np.append(label_array, label)
        time_start = time_start + 1
        time_predict = time_start + predict_time_inc
    # print(label_array[:30]) # for debug
    """

    """File output"""
    if save_flag == 1:
        print("output the files")
        np.savetxt("CWT_start.csv", cwt_start, delimiter = ",")
        np.savetxt("CWT_high.csv", cwt_high, delimiter = ",")
        np.savetxt("CWT_low.csv", cwt_low, delimiter = ",")
        np.savetxt("CWT_close.csv", cwt_close, delimiter = ",")
        np.savetxt("CWT_volume.csv", cwt_volume, delimiter = ",")
        np.savetxt("label.csv", label_array.T, delimiter = ",")

    print("CWT is done")
    return [cwt_start, cwt_high, cwt_low, cwt_close, cwt_volume], label_array, freq_close

def merge_CWT_5(cwt_list, label_array, height, width):
    """
    Uses the open, high, low and closing prices and the volume
    cwt_list    : List of CWT results
    label_array : Numpy array containing the labels
    height      : Image height (number of time steps)
    width       : Image width (number of scales)
    """
    print("merge CWT")
    cwt_start = cwt_list[0]  # Open price
    cwt_high = cwt_list[1]   # High price
    cwt_low = cwt_list[2]    # Low price
    cwt_close = cwt_list[3]  # Closing price CWT(time, freq)
    cwt_volume = cwt_list[4] # Volume

    """CWT(time, freq): slice so that the number of time steps is divisible by height"""
    # label_array is predict_time_inc shorter than the CWT result,
    # so the number of usable time steps is taken from the labels
    raw_num_shift = label_array.shape[0]
    num_shift = int(raw_num_shift / height) * height
    cwt_start = cwt_start[0:num_shift]
    cwt_high = cwt_high[0:num_shift]
    cwt_low = cwt_low[0:num_shift]
    cwt_close = cwt_close[0:num_shift]
    cwt_volume = cwt_volume[0:num_shift]
    label_array = label_array[0:num_shift]

    """Shape change, (number of data, channel, height(time), width(freq))"""
    cwt_start = np.reshape(cwt_start, (-1, 1, height, width))
    cwt_high = np.reshape(cwt_high, (-1, 1, height, width))
    cwt_low = np.reshape(cwt_low, (-1, 1, height, width))
    cwt_close = np.reshape(cwt_close, (-1, 1, height, width))
    cwt_volume = np.reshape(cwt_volume, (-1, 1, height, width))

    """Merge"""
    cwt_start = np.append(cwt_start, cwt_high, axis=1)
    cwt_start = np.append(cwt_start, cwt_low, axis=1)
    cwt_start = np.append(cwt_start, cwt_close, axis=1)
    cwt_start = np.append(cwt_start, cwt_volume, axis=1)

    """Extract the label corresponding to each scalogram, (number of data, label)"""
    col = height - 1
    label_array = np.reshape(label_array, (-1, height))
    label_array = label_array[:, col]
    # print(label_array.dtype) >>> bool

    return cwt_start, label_array

def make_scalogram(input_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc):
    """
    input_file_name  : Exchange data file name
    scales           : Scales to use, given as a numpy array. The scale corresponds to the frequency
                       of the wavelet used for analysis: a large scale means a low frequency,
                       a small scale means a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Time increment over which the price movement is predicted
    height           : Image height (number of time steps)
    width            : Image width (number of scales)
    ch_flag          : Number of channels to use, ch_flag=1: close, ch_flag=2: close and volume, ch_flag=5: start, high, low, close, volume
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    over_lap_inc     : Increment of the CWT start time
    """
    scalogram = np.empty((0, ch_flag, height, width)) # Arrays that collect all scalograms and labels
    label = np.array([])
    over_lap_start = 0
    over_lap_end = int((height - 1) / over_lap_inc) * over_lap_inc + 1

    if ch_flag==1:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (5,), skiprows = 1) # Closing price as a numpy array

        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:] # Shift the start time of the CWT
            cwt_list, label_array, freq = CWT_1(temp_time_series, scales, wavelet, predict_time_inc, save_flag) # Run the CWT
            temp_scalogram, temp_label = merge_CWT_1(cwt_list, label_array, height, width) # Create the scalograms

            scalogram = np.append(scalogram, temp_scalogram, axis=0) # Collect all scalograms and labels in one array
            label = np.append(label, temp_label)

        print("scalogram_shape " + str(scalogram.shape))
        print("label shape " + str(label.shape))
        print("frequency " + str(freq))

    elif ch_flag==2:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (5,6), skiprows = 1) # Closing price and volume as a numpy array

        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:] # Shift the start time of the CWT
            cwt_list, label_array, freq = CWT_2(temp_time_series, scales, wavelet, predict_time_inc, save_flag) # Run the CWT
            temp_scalogram, temp_label = merge_CWT_2(cwt_list, label_array, height, width) # Create the scalograms

            scalogram = np.append(scalogram, temp_scalogram, axis=0) # Collect all scalograms and labels in one array
            label = np.append(label, temp_label)

        print("scalogram_shape " + str(scalogram.shape))
        print("label shape " + str(label.shape))
        print("frequency " + str(freq))

    elif ch_flag==5:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (2,3,4,5,6), skiprows = 1) # Open, high, low, closing price and volume as a numpy array

        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:] # Shift the start time of the CWT
            cwt_list, label_array, freq = CWT_5(temp_time_series, scales, wavelet, predict_time_inc, save_flag) # Run the CWT
            temp_scalogram, temp_label = merge_CWT_5(cwt_list, label_array, height, width) # Create the scalograms

            scalogram = np.append(scalogram, temp_scalogram, axis=0) # Collect all scalograms and labels in one array
            label = np.append(label, temp_label)
            # print(temp_label.dtype) >>> bool
            # print(label.dtype) >>> float64

        print("scalogram_shape " + str(scalogram.shape))
        print("label shape " + str(label.shape))
        print("frequency " + str(freq))

    # np.int was removed in NumPy 1.24; the built-in int is equivalent here
    label = label.astype(int)

    return scalogram, label

def merge_scalogram(input_file_name, scales, wavelet, height, width, predict_time_inc, ch_flag, save_flag, over_lap_inc):
    """
    input_file_name  : Exchange data file name
    scales           : Scales to use, given as a numpy array. The scale corresponds to the frequency
                       of the wavelet used for analysis: a large scale means a low frequency,
                       a small scale means a high frequency
    wavelet          : Wavelet name, one of the following
                       'gaus1', 'gaus2', 'gaus3', 'gaus4', 'gaus5', 'gaus6', 'gaus7', 'gaus8', 'mexh', 'morl'
    predict_time_inc : Time increment over which the price movement is predicted
    height           : Image height (number of time steps)
    width            : Image width (number of scales)
    ch_flag          : Number of channels to use, ch_flag=1: close, ch_flag=5: start, high, low, close, volume
                       (ch_flag=2 is not handled by this function)
    save_flag        : save_flag=1: save the CWT coefficients as a csv file, save_flag=0: do not save them
    over_lap_inc     : Increment of the CWT start time
    """
    scalogram = np.empty((0, ch_flag, height, width)) # Arrays that collect all scalograms and labels
    label = np.array([])
    over_lap_start = 0
    over_lap_end = int((height - 1) / over_lap_inc) * over_lap_inc + 1

    if ch_flag==1:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (5,), skiprows = 1) # Closing price as a numpy array

        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:] # Shift the start time of the CWT
            temp_scalogram, temp_label, freq = create_scalogram_1(temp_time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width)
            scalogram = np.append(scalogram, temp_scalogram, axis=0) # Collect all scalograms and labels in one array
            label = np.append(label, temp_label)

        # print("scalogram_shape " + str(scalogram.shape))
        # print("label shape " + str(label.shape))
        # print("frequency " + str(freq))

    elif ch_flag==5:
        print("reading the input file...")
        time_series = np.loadtxt(input_file_name, delimiter = ",", usecols = (2,3,4,5,6), skiprows = 1) # Open, high, low, closing price and volume as a numpy array

        for i in range(over_lap_start, over_lap_end, over_lap_inc):
            print("over_lap_start " + str(i))
            temp_time_series = time_series[i:] # Shift the start time of the CWT
            temp_scalogram, temp_label, freq = create_scalogram_5(temp_time_series, scales, wavelet, predict_time_inc, save_flag, ch_flag, height, width)
            scalogram = np.append(scalogram, temp_scalogram, axis=0) # Collect all scalograms and labels in one array
            label = np.append(label, temp_label)

    # np.int was removed in NumPy 1.24; the built-in int is equivalent here
    label = label.astype(int)

    return scalogram, label, freq
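
The windowing and labelling steps shared by the functions above can be checked on a toy array without running pywt. The following is only a minimal sketch: `toy_windows`, the stand-in coefficient array and the sizes are hypothetical and are not part of the original code. It assumes `predict_time_inc` is at least 1.

```python
import numpy as np

def toy_windows(coeffs, close, height, predict_time_inc):
    """Cut a (time, freq) array into images and pick one label per image.

    Hypothetical helper for illustration only; it mirrors the reshape and
    label steps of the listing but is not the original implementation.
    """
    width = coeffs.shape[1]
    # Label at time t is True when the close predict_time_inc steps later is higher
    label_array = close[predict_time_inc:] > close[:-predict_time_inc]
    # Keep only as many time steps as fit into whole windows
    num_shift = (label_array.shape[0] // height) * height
    images = np.reshape(coeffs[:num_shift], (-1, 1, height, width))
    # The label of an image is the one at its last time step
    labels = np.reshape(label_array[:num_shift], (-1, height))[:, height - 1]
    return images, labels.astype(int)

close = np.arange(20, dtype=float)        # steadily rising toy prices
coeffs = np.tile(close[:, None], (1, 3))  # stand-in for CWT(time, freq) with 3 scales
images, labels = toy_windows(coeffs, close, height=4, predict_time_inc=2)
print(images.shape)  # (4, 1, 4, 3)
print(labels)        # [1 1 1 1]
```

With 20 prices and a prediction horizon of 2 there are 18 labels, so four complete windows of height 4 remain, and every label is 1 because the toy series only rises.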