It's been a long time.
I had no motivation at all: my boarding house caught fire, I was let go from my part-time job, and with the coronavirus situation I got hooked on mahjong. Still, I decided to write an update as a memo to myself after this long gap.
After trying various things, I came to the conclusion that the code in my previous article, Forecasting the Nikkei 225 with Pytorch, was not very good. So I wrote this article.
The references are almost the same as in the previous article. I additionally referred to the following.
I tried FX prediction using LSTM
[Python] Stock Price Forecast by LSTM [Chainer]
If you don't have enough data, you can increase it.
Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market
Affinity Loss is a story of trying hard for accuracy with CIFAR-10
Judging from the articles above, prediction with LSTM seems to be quite inaccurate. The plots show the predicted prices deviating from the actual stock prices.
When I search the web I find many articles like these, but in most cases I am not convinced. There is no evidence; the evaluation is done by feel. Is that really science?
So, in the previous article, Forecasting the Nikkei 225 with Pytorch, I tried to predict when the index would rise by 3%. However, I felt this was hard because of the data imbalance.
I was working with Nikkei 225 data, about 4300 rows (20 years' worth). Looking only at whether the price went up or down, the classes were balanced at roughly 1:1. However, once the condition of a rise of +3% is added, the rise:fall ratio becomes skewed to about 1:10. In that case the model simply predicts a fall for everything, so the predictions tell us nothing.
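To make the imbalance concrete, here is a minimal sketch in plain Python. The labeling rule (1 if the next close is at least 3% above the current close), the function names, and the toy prices are all assumptions for illustration, not the code from the previous article.

```python
def label_rises(closes, threshold=0.03):
    """Label each day 1 if the NEXT close is at least `threshold` higher, else 0.

    The labeling rule is an assumption made for this sketch.
    """
    labels = []
    for today, tomorrow in zip(closes, closes[1:]):
        change = (tomorrow - today) / today
        labels.append(1 if change >= threshold else 0)
    return labels


def majority_baseline_accuracy(labels):
    """Accuracy of a model that always predicts the more common class."""
    positives = sum(labels)
    return max(positives, len(labels) - positives) / len(labels)


# Toy closing prices (made up); only one day-to-day move exceeds +3%.
closes = [100.0, 101.0, 100.5, 104.0, 103.0, 103.5,
          102.0, 102.5, 101.0, 101.5, 101.0, 100.0]
labels = label_rises(closes)

rises = sum(labels)
falls = len(labels) - rises
print(f"rise:fall = {rises}:{falls}")
print(f"always-predict-fall accuracy = {majority_baseline_accuracy(labels):.3f}")
```

With a 1:10 ratio, a model that always answers "fall" already scores about 91% accuracy, which is why accuracy alone is misleading here.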
This is also mentioned in the article Deep Learning Stock Price Forecast Model_1 with 67% Accuracy, which says the problem was solved with Data Augmentation (referring to the article If you don't have enough data, you can increase it.).
Data Augmentation
So, since there are only about 4300 rows, I performed upsampling; in other words, I increased the amount of data. Because it is time series data, I kept it in a csv file like the one in the image. From there, the flow is to feed 20 days' worth (20 rows) at a time to the LSTM and train it.
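Reading 20 rows at a time can be sketched as a sliding window over the rows. This is only an illustration: the function name `make_windows` and the choice to pair each window with the label of its last row are assumptions, not the original implementation.

```python
def make_windows(rows, labels, window=20):
    """Pair each run of `window` consecutive rows with the label of its last row.

    Assumption: labels[i] is the target associated with rows[i].
    """
    samples = []
    for end in range(window, len(rows) + 1):
        samples.append((rows[end - window:end], labels[end - 1]))
    return samples


# Toy data: 25 "days", each row reduced to a single number for brevity.
rows = list(range(25))
labels = [r % 2 for r in rows]

samples = make_windows(rows, labels)
print(len(samples))    # 25 - 20 + 1 = 6 windows
print(samples[0][0])   # the first window covers rows 0..19
```

Each window would then be converted to a tensor of shape (20, number of features) before being passed to the LSTM.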
With upsampling, the added data comes from the smaller class (here, the rows and labels for a rise), and it is appended in one block. So, to bring the class ratio to 1:1, nearly 4000 rows labeled as a rise are added in a row.
Since I was training on 20 rows at a time, I figured the data in those last 4000 rows would be completely unreliable (and in fact, the model could not predict properly). I thought this was a problem, so I gave up on using Data Augmentation for time series data.
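The problem can be reproduced with a small sketch. It assumes the upsampling simply appends randomly chosen copies of minority rows to the end of the file; the helper names and the synthetic 1:10 labels are hypothetical.

```python
import random


def upsample_by_appending(rows, labels, seed=0):
    """Append copies of minority (label 1) rows until both classes are equal in size."""
    rng = random.Random(seed)
    minority = [r for r, y in zip(rows, labels) if y == 1]
    majority_count = len(labels) - len(minority)
    extra = [rng.choice(minority) for _ in range(majority_count - len(minority))]
    return rows + extra, labels + [1] * len(extra)


def count_tail_only_windows(n_original, n_total, window=20):
    """Count 20-row windows that lie entirely inside the appended block."""
    return sum(1 for end in range(window, n_total + 1)
               if end - window >= n_original)


# Synthetic stand-in for the real data: 4300 rows, about one rise per ten falls.
rows = list(range(4300))
labels = [1 if i % 11 == 0 else 0 for i in rows]

new_rows, new_labels = upsample_by_appending(rows, labels)
added = len(new_rows) - len(rows)
tail_windows = count_tail_only_windows(len(rows), len(new_rows))
print(added, tail_windows)
```

In this toy setup about 3500 rows are appended, and almost as many windows consist only of duplicated "rise" rows in arbitrary order, so they no longer represent 20 consecutive trading days.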
At that point, I remembered that deep learning first attracted attention when AlexNet was used in an image classification competition. So I started thinking about making predictions from images.
In fact, for plain binary classification of rise versus fall, image-based prediction seems to boast high accuracy. In the paper Using Deep Learning Neural Networks and Candlestick Chart Representation to Predict Stock Market, stock prices are read from chart images, and the reported accuracy is 97%, said to be SOTA ... apparently ...
The code for this paper was on GitHub, so I decided to use it to make predictions.
With image data, I thought the data imbalance might be solved by Data Augmentation. More recently, it seems that incorporating deep metric learning makes it possible to learn from even more imbalanced data.
Affinity Loss is a story of trying hard for accuracy with CIFAR-10
I wasn't sure whether the approach in the article above could be applied to an LSTM on time series data, so I decided to train on images.
With this in mind, I'm currently writing the code. Stay tuned. Next article: Forecast of Nikkei 225 by Pytorch 2