References: [Finance Machine Learning](https://www.amazon.co.jp/%E3%83%95%E3%82%A1%E3%82%A4%E3%83%8A%E3%83%B3%E3%82%B9%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E2%80%95%E9%87%91%E8%9E%8D%E5%B8%82%E5%A0%B4%E5%88%86%E6%9E%90%E3%82%92%E5%A4%89%E3%81%88%E3%82%8B%E6%A9%9F%E6%A2%B0%E5%AD%A6%E7%BF%92%E3%82%A2%E3%83%AB%E3%82%B4%E3%83%AA%E3%82%BA%E3%83%A0%E3%81%AE%E7%90%86%E8%AB%96%E3%81%A8%E5%AE%9F%E8%B7%B5-%E3%83%9E%E3%83%AB%E3%82%B3%E3%82%B9%E3%83%BB%E3%83%AD%E3%83%9A%E3%82%B9%E3%83%BB%E3%83%87%E3%83%BB%E3%83%97%E3%83%A9%E3%83%89-ebook/dp/B0834XJQTY)
When forecasting financial data, you first need to define what you want to forecast, and the approach differs completely depending on that target. The most familiar choice is probably to label whether the stock price at $T + 1$ goes up or down, using either the rate of price change or its sign. In some cases, however, such a target is hard to predict, and even when the accuracy is high, the average return and Sharpe ratio can be terrible. Labeling alone does not solve that problem, but labeling is often neglected even though it carries real weight.
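As a minimal sketch of how high accuracy and a poor return can coexist, the toy example below uses made-up returns (not market data) and a hypothetical predictor that always says "up". The Sharpe ratio here is a simple per-period ratio with a zero risk-free rate, which is an assumption made only for illustration.

```python
import numpy as np

# Hypothetical next-period returns: many small gains, a few large losses
returns = np.array([0.001, 0.001, 0.001, 0.001, -0.02,
                    0.001, 0.001, 0.001, 0.001, -0.02])

# Hypothetical predictor that always predicts "up" (+1)
predictions = np.ones_like(returns)

# Share of periods where the predicted sign matches the realized sign
accuracy = np.mean(np.sign(returns) == predictions)

# Return of trading in the predicted direction each period
strategy_returns = predictions * returns
mean_return = strategy_returns.mean()

# Per-period Sharpe ratio, assuming a zero risk-free rate
sharpe = mean_return / strategy_returns.std(ddof=1)

print(f"accuracy={accuracy:.2f}, mean_return={mean_return:.4f}, sharpe={sharpe:.2f}")
```

The sign label is right in 8 of 10 periods, yet the two large losses outweigh the eight small gains, so both the mean return and the Sharpe ratio are negative.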
For example, suppose you have daily OHLC data for the Nikkei Stock Average and want to predict the closing price on the next business day, writing the closing prices so far as $X_1, X_2, \ldots, X_T$.
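The two familiar labels for daily closing prices, the next-day rate of change and its sign, might be built roughly as follows. This is only a sketch: the series `close` and its values are hypothetical toy numbers, not real Nikkei data.

```python
import numpy as np
import pandas as pd

# Hypothetical daily closes X_1..X_T (toy numbers, not real Nikkei data)
close = pd.Series([28000.0, 28140.0, 28000.0, 28000.0, 28280.0], name="close")

# Rate of change from day t to day t+1, stored on row t
next_return = close.shift(-1) / close - 1

# Direction label: +1 up, -1 down, 0 unchanged.
# The last row has no next day, so it stays NaN.
direction = np.sign(next_return)

labels = pd.DataFrame({"close": close,
                       "next_return": next_return,
                       "direction": direction})
print(labels)
```

Note that an unchanged close produces a 0 label here, so this is a three-class target unless those rows are dropped or merged into one of the other classes.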
Building on the example above, is there a case where an accuracy above 50% on the predicted labels is enough? Consider the following strategy. We make the strong assumptions that we can trade at the asset's price and that liquidity is excellent (the price does not jump). Open a new position at $T = 0$ and settle it as soon as the price moves +1 bps or -1 bps. This is the simplest binomial model introduced in finance. In this case, if more than 50% of the predicted labels are correct, the expected value is positive.
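A short worked calculation makes the 50% threshold explicit. The symbol $p$ (the probability that the predicted label is correct) is introduced here for illustration, and the calculation assumes the position is taken in the predicted direction with no transaction costs.

$$
E[\text{profit}] = p \cdot (+1\ \text{bps}) + (1 - p) \cdot (-1\ \text{bps}) = (2p - 1)\ \text{bps}
$$

So the expected profit per trade is positive exactly when $p > 1/2$, and it is zero at $p = 1/2$.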
So how should the labels be built?
The data is tick data of the order book mid price (mid).
Let me explain using Python code.
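label.py

    import numpy as np

    # df: tick data with a "mid" column (order book mid price), assumed to be loaded elsewhere
    labels = df["mid"].diff().shift(-1).replace(0, np.nan).bfill()  # next non-zero change in mid
    labels = np.sign(labels)  # keep only the direction: +1 or -1
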
- The mid price moves in steps of 1 bps, so `diff()` gives the change between consecutive ticks.
- We want the change from $X_T$ to $X_{T+1}$ on row $T$, so `shift(-1)` moves each difference one row earlier.
- A difference of 0 means no trade is settled, so 0 is replaced with NaN.
- The position is settled only when the difference is non-zero, so `bfill()` fills each NaN with the next non-zero change. If the stopping time is $t$, the profit of the strategy opened at $T = 0$ is $X_t - X_0$.
- Finally, we want only two labels (1 or -1), so we keep only the sign.
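The steps above can be traced on a tiny example. The mid prices below are hypothetical toy numbers (not real order book data), and the intermediate variable names are introduced here only to show each step separately.

```python
import numpy as np
import pandas as pd

# Hypothetical mid prices (toy numbers, not real order book data)
df = pd.DataFrame({"mid": [100.00, 100.00, 100.01, 100.01, 100.01, 100.00, 100.01]})

# Change from this tick to the next one, stored on the current row
next_change = df["mid"].diff().shift(-1)

# Treat "no change" as missing, then pull back the next non-zero change
next_move = next_change.replace(0, np.nan).bfill()

# Keep only the direction: +1 (up) or -1 (down)
labels = np.sign(next_move)

print(pd.DataFrame({"mid": df["mid"], "next_change": next_change, "label": labels}))
```

In this toy data the labels come out as 1, 1, -1, -1, -1, 1, and the final row stays NaN because no later tick exists. Such trailing rows would need to be dropped (for example with `dropna()`) before training.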
The book also introduces the Triple Barrier method, the Trend-Scanning method, and others, so they are worth trying as a reference.