I earned a certificate on Coursera, so I decided to spend my summer vacation trying my hand at something like machine learning or deep learning. There are brilliant people out there, and plenty of approaches have already been tried exhaustively; I'll try them myself with the thinking skills I picked up while studying.
Eventually I'd like to build a highly accurate model, earn that dream passive income, and pay my taxes.
The studying took me about a day.
How to install TensorFlow, in a way even machine learning beginners can manage right away: many people have already written this up, so just follow along. The one thing to be careful about is to use Python 3; it's a good idea to pin the version with pyenv or similar.
I think it can be done in about 10 minutes.
Keras is a high-level neural network library written in Python that runs on top of TensorFlow, CNTK, or Theano.
I had assumed deep learning meant TensorFlow, but of course there are many frameworks that wrap it, and Keras is one of them. As documented, you can choose Theano or TensorFlow as the backend.
In the end both deal with tensors, and I don't really understand the difference, but I decided it wasn't that important at first and skipped over it.
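For reference, a minimal sketch of switching the backend: Keras reads the KERAS_BACKEND environment variable (falling back to ~/.keras/keras.json) the first time it is imported, so set it before the import.

```python
import os

# Choose the backend before the first `import keras`;
# Keras falls back to ~/.keras/keras.json if this is unset.
os.environ['KERAS_BACKEND'] = 'tensorflow'  # or 'theano'

import keras  # typically prints "Using TensorFlow backend."
```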
numpy: needless to say, the convenient numerical computation library for Python.
pandas: a convenient library for handling time series data.
import pandas

# Load the CSV, sort it chronologically, and renumber the index
data = pandas.read_csv('./csv/bitcoin_log_1month.csv')
data = data.sort_values(by='date')
data = data.reset_index(drop=True)
Loading a CSV is as easy as this. Besides loading, pandas can also sort and re-index.
Scikit-learn: provides convenient modules for machine learning. It assumes it is operating on numpy.ndarray, so that is what it processes. It also has normalization modules.
tflearn: a library that lets you use TensorFlow with a Scikit-learn-like feel. It is built on top of TensorFlow, and with it you can build neural networks and define models concisely.
matplotlib: everyone loves MATLAB-style plotting. It is used to visualize the resulting graphs and save them as images. An error occurred on import, so I partially modified matplotlibrc by referring to the link below.
What should I do first? Let's look at a bunch of helpful links. After all, what I want to build is something that can solve the binary classification problem of whether tomorrow's closing price will rise or fall. Just listing the reference links conveys everyone's shared desire to generate passive income. Ah yeah.
- Stock Price Forecast with TensorFlow (LSTM) ~ Stock Forecast Part 1 ~
- Forecast the number of airline passengers next month with RNN: Let's implement from LSTM to GRU with TFLearn
- Try to predict the exchange rate (FX) with TensorFlow (deep learning)
ML and DL turn out to be quite different, and what each handles differs a lot, so I organized my notes.
There are various places where you can get bitcoin time series data via API, but I wanted to download it from investing. You can get several years of daily closing prices. If you are not limited to JPY, there are many other options.
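Just to illustrate the API route, fetching and ordering the data might look like the sketch below. The endpoint and JSON fields here are hypothetical placeholders, not a real service; substitute whatever exchange API you actually use.

```python
import requests
import pandas

# Hypothetical endpoint and field names, for illustration only
resp = requests.get('https://api.example.com/v1/btc_jpy/ohlc',
                    params={'period': 'daily'})
records = resp.json()  # assumed: a list of {'date': ..., 'close': ...} dicts
data = pandas.DataFrame(records).sort_values(by='date').reset_index(drop=True)
```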
We normalize and standardize the necessary data. With sklearn's [preprocessing package](http://qiita.com/nazoking@github/items/d6ac1948ee138d73fef1#431-%E6%A8%99%E6%BA%96%E5%8C%96%E5%B9%B3%E5%9D%87%E9%99%A4%E5%8E%BB%E3%81%A8%E5%88%86%E6%95%A3%E3%81%AE%E3%82%B9%E3%82%B1%E3%83%BC%E3%83%AA%E3%83%B3%E3%82%B0) you can easily do the following:
from sklearn import preprocessing

# Standardize the closing prices (zero mean, unit variance)
data['close'] = preprocessing.scale(data['close'])
The appropriate loss and model differ depending on the data you use, so even without deep knowledge you can experiment just by swapping these arguments. Study properly and I think you can get genuinely meaningful results.
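To make the distinction concrete, here is a small sketch of the two preprocessings with sklearn, using a few of the closing prices from the sample data shown later: preprocessing.scale standardizes (zero mean, unit variance), while preprocessing.minmax_scale normalizes into [0, 1].

```python
import numpy as np
from sklearn import preprocessing

prices = np.array([499204.7813, 523714.1875, 542277.3125])

# Standardization: zero mean, unit variance
print(preprocessing.scale(prices))

# Normalization: min-max scaling into [0, 1]
print(preprocessing.minmax_scale(prices))
```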
The principles of deep learning, briefly
LSTM (Long Short-Term Memory) is a powerful model that eliminates the drawbacks of RNNs and can learn long-term time series data.
There are various other models such as GRU, but I won't go into their individual features here.
Next comes the error.
On the meaning of the various errors (RMSE, MAE, etc.): there are several kinds, such as MAPE and RMSE. They are evaluation indices: they measure how far apart the measured and predicted values are when you change the model or add more data.
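As a concrete example, these indices are easy to compute by hand with numpy:

```python
import numpy as np

actual = np.array([100.0, 110.0, 105.0])     # measured values
predicted = np.array([102.0, 108.0, 109.0])  # model outputs

mae = np.mean(np.abs(actual - predicted))           # Mean Absolute Error
rmse = np.sqrt(np.mean((actual - predicted) ** 2))  # Root Mean Squared Error

print('MAE: {:.3f}, RMSE: {:.3f}'.format(mae, rmse))
```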
Choose how much data to train on and how much to hold back for comparison. A training:test split of 8:2 seemed to be common.
A model is created from the constructed neural network, fitted to the data, and then used to predict.
After that, the results are graphed, visualized, and evaluated. With TensorBoard, you can easily visualize the model and the errors.
From the references, I gathered that roughly implementing this flow would be enough.
# -*- coding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tflearn
from sklearn import preprocessing
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM
class Prediction:
def __init__(self):
self.dataset = None
#Values to calculate
self.model = None
self.train_predict = None
self.test_predict = None
#Data set parameter settings
self.steps_of_history = 3
self.steps_in_future = 1
self.csv_path = './csv/bitcoin_log.csv'
def load_dataset(self):
#Data preparation
        # The CSV rows are newest-first, so reverse them into chronological order
        dataframe = pd.read_csv(self.csv_path,
                                usecols=['closing price'],
                                engine='python')[::-1].reset_index(drop=True)
self.dataset = dataframe.values
self.dataset = self.dataset.astype('float32')
        # Normalization: min-max scaling into the range [0, 1]
        self.dataset -= np.min(np.abs(self.dataset))
        self.dataset /= np.max(np.abs(self.dataset))
def create_dataset(self):
X, Y = [], []
for i in range(0, len(self.dataset) - self.steps_of_history, self.steps_in_future):
X.append(self.dataset[i:i + self.steps_of_history])
Y.append(self.dataset[i + self.steps_of_history])
X = np.reshape(np.array(X), [-1, self.steps_of_history, 1])
Y = np.reshape(np.array(Y), [-1, 1])
return X, Y
def setup(self):
self.load_dataset()
X, Y = self.create_dataset()
# Build neural network
net = tflearn.input_data(shape=[None, self.steps_of_history, 1])
        # Use GRU, since LSTM takes longer to train
# http://dhero.hatenablog.com/entry/2016/12/02/%E6%9C%80%E5%BC%B1SE%E3%81%A7%E3%82%82%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%81%A7%E3%81%8A%E9%87%91%E3%81%8C%E7%A8%BC%E3%81%8E%E3%81%9F%E3%81%84%E3%80%905%E6%97%A5%E7%9B%AE%E3%83%BBTFLearn%E3%81%A8
net = tflearn.gru(net, n_units=6)
net = tflearn.fully_connected(net, 1, activation='linear')
        # Regression settings
        # Optimizing with Adam
        # http://qiita.com/TomokIshii/items/f355d8e87d23ee8e0c7a
        # Using mean_square as the prediction-accuracy index for time series analysis
        # MAPE seems to be more common
        # (categorical_crossentropy is for classification, not used here)
        # mean_square: mean squared error
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
loss='mean_square')
# Define model
self.model = tflearn.DNN(net, tensorboard_verbose=0)
        # This time, 80% is the training dataset and 20% is the test dataset
pos = round(len(X) * (1 - 0.2))
trainX, trainY = X[:pos], Y[:pos]
testX, testY = X[pos:], Y[pos:]
return trainX, trainY, testX
def executePredict(self, trainX, trainY, testX):
# Start training (apply gradient descent algorithm)
self.model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=1, n_epoch=150, run_id='btc')
# predict
self.train_predict = self.model.predict(trainX)
self.test_predict = self.model.predict(testX)
def showResult(self):
# plot train data
train_predict_plot = np.empty_like(self.dataset)
train_predict_plot[:, :] = np.nan
train_predict_plot[self.steps_of_history:len(self.train_predict) + self.steps_of_history, :] = \
self.train_predict
        # plot test data
test_predict_plot = np.empty_like(self.dataset)
test_predict_plot[:, :] = np.nan
test_predict_plot[len(self.train_predict) + self.steps_of_history:len(self.dataset), :] = \
self.test_predict
# plot show res
plt.figure(figsize=(8, 8))
plt.title('History={} Future={}'.format(self.steps_of_history, self.steps_in_future))
plt.plot(self.dataset, label="actual", color="k")
plt.plot(train_predict_plot, label="train", color="r")
plt.plot(test_predict_plot, label="test", color="b")
plt.savefig('result.png')
plt.show()
if __name__ == "__main__":
prediction = Prediction()
trainX, trainY, testX = prediction.setup()
prediction.executePredict(trainX, trainY, testX)
prediction.showResult()
I relied heavily on the reference sites, but it still took about an hour to implement and understand.
Here is a brief description of the code.
load_dataset
The data used this time looks like the sample below. We read it in and use it as the dataset. Since the rows are in reverse chronological order, they need to be reversed.
"Date","closing price","Open price","High price","Low price","The day before ratio"
"September 03, 2017","523714.1875","499204.7813","585203.1250","499204.7813","4.91"
"September 02, 2017","499204.7813","542277.3125","585203.1250","498504.5000","-7.94"
setup builds the neural network. GRU is one of the deep learning architectures; there are others such as LSTM and plain RNN, but this time I chose GRU with an emphasis on speed.
net = tflearn.input_data(shape=[None, self.steps_of_history, 1])
net = tflearn.gru(net, n_units=6)
net = tflearn.fully_connected(net, 1, activation='linear')
Choosing the optimizer and the loss: this time the optimizer is Adam, and the loss is mean_square (mean squared error). Metrics such as MAPE are not provided by default, so it seems you would need to partially rewrite the library to use them.
net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
loss='mean_square')
executePredict
Split the data into training and test sets, then predict.
# This time, 80% is the training dataset and 20% is the test dataset
pos = round(len(X) * (1 - 0.2))
trainX, trainY = X[:pos], Y[:pos]
testX, testY = X[pos:], Y[pos:]
# Start training (apply gradient descent algorithm)
self.model.fit(trainX, trainY, validation_set=0.1, show_metric=True, batch_size=1, n_epoch=150, run_id='btc')
# predict
self.train_predict = self.model.predict(trainX)
self.test_predict = self.model.predict(testX)
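Note that setup() only returns trainX, trainY, and testX, so testY is never used for scoring. Below is a minimal sketch of post-hoc evaluation, assuming setup() is modified to also return testY; this is also an easy place to compute the MAPE that tflearn does not ship with.

```python
import numpy as np

# Assumption: setup() is changed to `return trainX, trainY, testX, testY`
trainX, trainY, testX, testY = prediction.setup()
prediction.executePredict(trainX, trainY, testX)

predicted = np.array(prediction.test_predict).reshape(-1)
actual = np.array(testY).reshape(-1)

rmse = np.sqrt(np.mean((actual - predicted) ** 2))
# Careful: on min-max scaled data, actual values near 0 inflate MAPE
mape = np.mean(np.abs((actual - predicted) / actual)) * 100

print('test RMSE: {:.4f}, test MAPE: {:.2f}%'.format(rmse, mape))
```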
In the plot, black is the actual data, red is the prediction on the training data, and blue is the prediction on the test data.
Here is the result of TensorBoard.
Hmm, so that's how it looks. Loss/Validation seems off for now. From here, I'll dig for parameters that are likely to reduce the error while varying the number of epochs, the model and step size, the training data ratio and amount of data, the number of GRU layers, and so on, as sketched below.
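A rough sketch of that kind of sweep follows; the parameter values are arbitrary examples, and trainX/trainY come from the script above. One caveat: tflearn builds everything into TensorFlow's default graph, so the graph has to be reset between runs.

```python
import tensorflow as tf
import tflearn

for n_units in [6, 12, 24]:       # arbitrary example values
    for n_epoch in [50, 150]:
        tf.reset_default_graph()  # clear the previous run's graph

        net = tflearn.input_data(shape=[None, prediction.steps_of_history, 1])
        net = tflearn.gru(net, n_units=n_units)
        net = tflearn.fully_connected(net, 1, activation='linear')
        net = tflearn.regression(net, optimizer='adam', learning_rate=0.001,
                                 loss='mean_square')

        model = tflearn.DNN(net, tensorboard_verbose=0)
        model.fit(trainX, trainY, validation_set=0.1, show_metric=True,
                  batch_size=1, n_epoch=n_epoch,
                  run_id='btc_u{}_e{}'.format(n_units, n_epoch))
```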
By the way, I also tried predicting with an LSTM in Keras (not tflearn), as follows.
from keras.models import Sequential
from keras.layers.core import Dense, Activation
from keras.layers.recurrent import LSTM

model = Sequential()
model.add(LSTM(self.hidden_neurons,
               batch_input_shape=(None, self.length_of_sequences, self.in_out_neurons),
               return_sequences=False))
model.add(Dense(self.in_out_neurons))
model.add(Activation("linear"))
model.compile(loss="mape", optimizer="adam")
mape isn't in tflearn, but it turns out it is built into Keras (as mean_absolute_percentage_error), not sklearn. Quite confusing.
I've made something that works for now, but summer vacation is over, so the rest will be done another day. Now that it runs, I'll study validation methods properly and work on improving the accuracy.
Visual Studio Code for Mac was very good. I had assumed PyCharm was the obvious choice for Python, but VS Code is excellent.