Previously: (1, 2, [3](http://qiita.com/yai/items/1b7f8ef69f8f2343e3e9)). This post is a continuation of those. Classification alone isn't much fun, so this time I'd like to try a regression problem: using the machine learning frameworks TensorFlow and scikit-learn to predict the stock price directly. Last time the next day's stock price was classified as "up or down"; this time we predict "how many yen" directly. The input data is reused from before. Apologies, as always.
--Try to predict the stock price using TensorFlow and scikit-learn.
--Check the accuracy and usability.
"Use several days' worth of global stock indexes (Dow, Nikkei 225, DAX, etc.) to predict the next day's Nikkei 225" (regression)
--scikit-learn: scikit-learn 0.17.1 / Python 2.7 / Windows 7
--TensorFlow: TensorFlow 0.7 / Python 2.7 / Ubuntu 14.04 (AWS EC2 micro instance)
Download the Nikkei, Dow, Hang Seng, and German (DAX) stock indices from the Quandl site, and combine them into a single text file. (Manual work.)
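For reference, a minimal sketch of loading such a merged file with numpy. The file name merged_indices.csv, the comma delimiter, and the layout (16 numeric columns, e.g. 4 values per index, one row per trading day, newest row first) are my assumptions; the post does this step by hand and doesn't show it.

import numpy as np

# Load the manually merged index data into array_base.
# Assumed layout: comma-separated, 16 numeric columns, newest row first.
array_base = np.loadtxt('merged_indices.csv', delimiter=',')
print array_base.shape  # (number_of_days, 16)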
The correct-answer data is the next day's closing price (i.e., we predict the next day's close). However, when I used the stock price directly as the label, training simply diverged, so I had to get creative: the prediction target is made indirect, "by what percentage will the next day's close be higher or lower than the previous day's," and the closing price is recovered from that afterwards. The same rate-of-change representation is used for the inputs described below, so it's consistent... (・・;)
# JUDGE_DAY = 1; the second subscript [3] holds the Nikkei 225 closing price.
y_array.append([(array_base[i][3] - array_base[i+JUDGE_DAY][3]) / array_base[i][3] * 100])
For the inputs, instead of the stock prices themselves, we give a list of "how much (%) each value went up or down compared to the previous day." (Feeding in the raw prices didn't work at all.)
tmp_array = []
for j in xrange(i+1, i + data_num + 1):
    for k in range(16):
        tmp_array.append((array_base[j][k] - array_base[j+1][k]) / array_base[j][k] * 100)
x_array.append(tmp_array)
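Both the label snippet and this input snippet refer to a loop variable i that isn't shown, so here is how I assume the two fragments fit together into a single dataset-building loop (data_num and JUDGE_DAY are defined elsewhere in the original code; rows are assumed to be newest first):

x_array, y_array = [], []
for i in range(len(array_base) - data_num - 1):
    # Label: next day's close (row i, column 3) as a % change vs. the previous day.
    y_array.append([(array_base[i][3] - array_base[i+JUDGE_DAY][3]) / array_base[i][3] * 100])
    # Inputs: day-over-day % change of all 16 columns over the past data_num days.
    tmp_array = []
    for j in xrange(i+1, i + data_num + 1):
        for k in range(16):
            tmp_array.append((array_base[j][k] - array_base[j+1][k]) / array_base[j][k] * 100)
    x_array.append(tmp_array)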
The TensorFlow network has two hidden layers, with 50 and 25 units respectively.
NUM_HIDDEN1 = 50
NUM_HIDDEN2 = 25

def inference(x_ph, keep_prob):
    with tf.name_scope('hidden1'):
        weights = tf.Variable(tf.truncated_normal([data_num * price_num, NUM_HIDDEN1], stddev=stddev), name='weights')
        biases = tf.Variable(tf.zeros([NUM_HIDDEN1]), name='biases')
        hidden1 = tf.nn.relu(tf.matmul(x_ph, weights) + biases)
    with tf.name_scope('hidden2'):
        weights = tf.Variable(tf.truncated_normal([NUM_HIDDEN1, NUM_HIDDEN2], stddev=stddev), name='weights')
        biases = tf.Variable(tf.zeros([NUM_HIDDEN2]), name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
    # Dropout
    dropout = tf.nn.dropout(hidden2, keep_prob)
    with tf.name_scope('regression'):
        weights = tf.Variable(tf.truncated_normal([NUM_HIDDEN2, 1], stddev=stddev), name='weights')
        biases = tf.Variable(tf.zeros([1]), name='biases')
        y = tf.matmul(dropout, weights) + biases
    return y
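The placeholders x_ph, y_ph, and keep_prob used here and in training() are not shown in the post; presumably they are created along these lines (TF 0.7-era API; the dtype strings and shapes are my guess from how they are used):

x_ph = tf.placeholder("float", [None, data_num * price_num])  # input: rate-of-change vectors
y_ph = tf.placeholder("float", [None, 1])                     # target: next-day % change
keep_prob = tf.placeholder("float")                           # dropout keep probability
y = inference(x_ph, keep_prob)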
The loss is computed with l2_loss(). Since we are regressing a number, taking the difference between the predicted and actual values as the loss seems reasonable, but I'm not sure it's correct. If you think "no, that's wrong," comments are welcome.
def loss(y, target):
    return tf.reduce_mean(tf.nn.l2_loss((y - target)))
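One caveat: tf.nn.l2_loss(t) already returns the scalar sum(t ** 2) / 2, so the reduce_mean() around it is effectively a no-op and the loss grows with the batch size. If you wanted a loss independent of batch size, a per-example mean squared error would be an alternative (not what this post uses):

def loss_mse(y, target):
    # Mean squared error, averaged over the batch.
    return tf.reduce_mean(tf.square(y - target))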
Nothing special to mention here.
def optimize(loss):
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_step = optimizer.minimize(loss)
    return train_step
Nothing special to mention here either.
def training(sess, train_step, loss, x_train_array, y_flg_train_array):
    summary_op = tf.merge_all_summaries()
    init = tf.initialize_all_variables()
    sess.run(init)
    summary_writer = tf.train.SummaryWriter(LOG_DIR, graph_def=sess.graph_def)
    for i in range(int(len(x_train_array) / bach_size)):
        batch_xs = getBachArray(x_train_array, i * bach_size, bach_size)
        batch_ys = getBachArray(y_flg_train_array, i * bach_size, bach_size)
        sess.run(train_step, feed_dict={x_ph: batch_xs, y_ph: batch_ys, keep_prob: 0.8})
        ce = sess.run(loss, feed_dict={x_ph: batch_xs, y_ph: batch_ys, keep_prob: 1.0})
        summary_str = sess.run(summary_op, feed_dict={x_ph: batch_xs, y_ph: batch_ys, keep_prob: 1.0})
        summary_writer.add_summary(summary_str, i)
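The helper getBachArray() is not included in the post; presumably it just slices out one mini-batch, something like:

def getBachArray(array, start, size):
    # Presumed implementation: return `size` consecutive examples from `start`
    # (the original helper is not shown in the post).
    return array[start:start + size]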
For accuracy, I take the difference between the predicted rate of change and the actual rate of change and output the mean of their absolute values. In other words, it's simply the mean absolute error.
accuracy = tf.reduce_mean(tf.abs(y - y_ph))
print "accuracy"
print(sess.run(accuracy, feed_dict={x_ph: test_batch_xs, y_ph: test_batch_ys, keep_prob: 1.0}))
scikit-learn offers various algorithms, but... I'm not sure which is best, so I picked three and used them with no arguments (default parameters).
from sklearn import linear_model, svm, tree

# SGDRegressor
clf = linear_model.SGDRegressor()
testClf(clf, x_train_array, y_train_array, x_test_array, y_test_array)

# DecisionTreeRegressor
clf = tree.DecisionTreeRegressor()
testClf(clf, x_train_array, y_train_array, x_test_array, y_test_array)

# SVM
clf = svm.SVR()
testClf(clf, x_train_array, y_train_array, x_test_array, y_test_array)
Training just calls fit(). When I ran score(), some of the values came out negative and were hard to interpret (for regressors, score() returns the coefficient of determination R^2, which can go negative when the model does worse than always predicting the mean), so, as with TensorFlow, I take the predicted rate of change and average the absolute difference from the actual rate of change (in short, the mean absolute error).
import numpy as np

def testClf(clf, x_train_array, y_flg_train_array, x_test_array, y_flg_test_array):
    print clf
    clf.fit(x_train_array, y_flg_train_array)
    result = clf.predict(x_test_array)
    print clf.score(x_test_array, y_flg_test_array)
    print np.mean(np.abs(np.array(result) - np.array(y_flg_test_array)))
TensorFlow: 1.00044
scikit-learn:
--SGDRegressor: 0.943171296872
--DecisionTreeRegressor: 1.3551351662
--SVM (SVR): 0.945361479916
So the error was about 1% in each case. A 1% error in a stock price forecast... can't be used at all... ahem.
While we're at it, let's actually produce a predicted value. The data at hand runs up to 2016/03/24, so we predict the Nikkei 225 closing price for 2016/03/25. For scikit-learn, the SVM (SVR) is used.
TensorFlow
p = sess.run(y, feed_dict={x_ph: data, keep_prob: 1.0})
# 16892.33 is the Nikkei 225 closing price on 2016/03/24;
# convert the predicted % change back into a price.
price = ((p[0][0] / 100.) + 1.) * 16892.33
print price
scikit-learn
p = clf.predict(data)
price = ((p[0] / 100.) + 1.) * 16892.33  # convert % change back to a price, as above
print price
TensorFlow: 16804.3398821
scikit-learn: 16822.6013292
And the actual closing price on 3/25 was... 17,002.75. **...Well, that's right.** Both models predicted a drop of roughly 0.4-0.5%, while the market actually rose about 0.65%; with an average error of around 1%, a move of that size is simply lost in the noise.
--Machine learning doesn't seem to mean "just throw data at it and the machine will do its best to come up with the best answer." Humans, too, need to think about how to frame the problem so the machine can learn and answer more easily.
--Regression problems are fun.
--Errors are easy to spot, because unlike classification the model can't give an unmotivated answer like "everything is the first category."
--The code differs a little from the classification version, but development was easy because about 80% of the code could be reused.