Since I wanted to get some hands-on practice while reading, I decided to work with TOPIX historical data, which I was able to download from the WSJ website: https://quotes.wsj.com/index/JP/TOKYO%20EXCHANGE%20(TOPIX)/180460/historical-prices
There appear to be no missing values. The period runs from December 30, 2008 to November 29, 2019, and the data are four-value (OHLC) historical prices without ex-dividend adjustment. Plotting the closing price gives the figure below: the period stretches from just after the Lehman shock, when the index broke below 1,000pt, through the years after Abenomics. The daily returns are as follows. Since the data are daily it makes little difference, but I used log returns ($\Delta \log y_t = \log(y_t) - \log(y_{t-1})$). The largest drop, $\Delta \log y_t = -0.0995$, was recorded on March 15, 2011; the trigger was the Tōhoku earthquake, which struck on March 11, about 15 minutes before that day's market close.
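For reference, a minimal sketch for loading the CSV and reproducing the closing-price plot. The filename `topix.csv`, the lowercase column rename, and the newest-first row order are assumptions (WSJ exports typically use capitalized headers such as `Close` and list the newest date first); adjust to match the actual download.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Assumptions: filename, capitalized headers, newest-first row order
tpx = pd.read_csv('topix.csv')
tpx.columns = [c.strip().lower() for c in tpx.columns]  # e.g. 'Close' -> 'close'
tpx = tpx.iloc[::-1].reset_index(drop=True)             # flip to oldest-first
plt.plot(tpx['close'].values)
plt.title('TOPIX close')
plt.show()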
**Mean**
tpx_return = np.log(tpx['close'].values)[1:] - np.log(tpx['close'].values)[:-1]  # daily log returns
tpx_return.mean()
0.00025531962222667643
**Autocovariance**
import statsmodels.api as sm
sm.tsa.stattools.acovf(tpx_return, fft=True, nlag=5)
array([ 1.57113176e-04, 1.16917913e-06, 3.48846296e-06, -4.67502133e-06, -5.31500282e-06, -2.82855375e-06])
I'm not used to the statsmodels library, so let me check it by hand. The sample autocovariance at lag $k$ is $\hat{\gamma}_k = \frac{1}{T}\sum_{t=k+1}^{T}(y_t - \bar{y})(y_{t-k} - \bar{y})$ (note the divisor is $T$, not $T-k$).
# k=0
((tpx_return-tpx_return.mean())**2).sum() / len(tpx_return)
0.00015711317609153836
# k=4
((tpx_return-tpx_return.mean())[4:]*(tpx_return-tpx_return.mean())[:-4]).sum() / len(tpx_return)
-5.315002816332674e-06
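To generalize the by-hand check, the same formula applied to all six lags at once should match statsmodels:
# Sanity check: by-hand autocovariance for k = 0..5 vs. acovf
demeaned = tpx_return - tpx_return.mean()
manual = np.array([(demeaned[k:] * demeaned[:len(demeaned)-k]).sum() / len(tpx_return)
                   for k in range(6)])
np.allclose(manual, sm.tsa.stattools.acovf(tpx_return, fft=True, nlag=5))  # -> True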
**Autocorrelation**
sm.tsa.stattools.acf(tpx_return)[:5]
array([ 1. , 0.00744164, 0.0222035 , -0.02975576, -0.03382913])
Checking against the $k = 0$ and $k = 4$ results above (the autocorrelation is the autocovariance normalized by the variance, $\hat{\rho}_k = \hat{\gamma}_k / \hat{\gamma}_0$):
-5.315002816332674e-06 / 0.00015711317609153836
-0.03382913482212345
So it seems that the library is doing the calculations as I imagined.
Let me also draw a correlogram, that is, a bar plot of the autocorrelation coefficients against the lag.
autocorr = sm.tsa.stattools.acf(tpx_return)
ci = 1.96 / np.sqrt(len(tpx_return))  # two-sided 95% band under the i.i.d. null
plt.bar(np.arange(len(autocorr)), autocorr)
plt.hlines([ci, -ci], 0, len(autocorr), linestyle='dashed')
plt.title('Correlogram')
plt.ylim(-0.05, 0.05)
plt.show()
CI stands for confidence interval: when the data are mutually independent and identically distributed, $\hat{\rho}_k$ asymptotically follows a normal distribution with mean $0$ and variance $\frac{1}{T}$, and we use this property to compute the two-sided 95% band.
It looks like this when using the library's plotting function. Stylish.
sm.graphics.tsa.plot_acf(tpx_return)
plt.ylim(-0.05,0.05)
plt.show()
Let's think a little about the numbers we obtained.
The Ljung-Box test is a method for testing the null hypothesis that the autocorrelation coefficients up to lag $m$ are all zero.
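For reference, the statistic computed by `acorr_ljungbox` (and by the manual implementation later in this post) is $Q(m) = T(T+2)\sum_{k=1}^{m}\frac{\hat{\rho}_k^2}{T-k}$, which asymptotically follows a $\chi^2$ distribution with $m$ degrees of freedom under the null hypothesis.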
First, the pattern that uses the statsmodels library. A common rule of thumb is $m \approx \log(T) \approx 7.89$ here, so for now let's look at lags up to 16.
lvalue, pvalue = sm.stats.diagnostic.acorr_ljungbox(tpx_return, lags=16)  # Q statistics and p-values for m = 1..16
That's it. Very easy.
The p-value stayed above 0.05 for every value of $m$, so we cannot conclude that the daily returns of TOPIX are autocorrelated. Well, the market is not that simple.
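For reference, here is a quick way to eyeball the p-values per lag. This sketch assumes the tuple return of older statsmodels versions; recent versions return a DataFrame by default, in which case use its `lb_pvalue` column instead.
# Print the Ljung-Box p-value for each m = 1..16
for m, p in enumerate(pvalue, start=1):
    print(f'm={m:2d}  p-value={p:.3f}')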
Finally, let's implement it without the library to deepen our understanding.
from scipy.stats import chi2

def Q_func(data, max_m):
    T = len(data)
    auto_corr = sm.tsa.stattools.acf(data)[1:]  # rho_1, rho_2, ...
    # Ljung-Box statistic Q(m), accumulated over m = 1..max_m
    lvalue = T*(T+2)*((auto_corr**2/(T-np.arange(1,len(auto_corr)+1)))[:max_m]).cumsum()
    # p-value from the chi-squared distribution with m degrees of freedom
    pvalue = 1 - chi2.cdf(lvalue, np.arange(1,len(lvalue)+1))
    return lvalue, pvalue
Confirm that the same result is obtained.
l_Q_func, p_Q_func = Q_func(tpx_return,max_m=16)
l_sm, p_sm = sm.stats.diagnostic.acorr_ljungbox(tpx_return, lags=16)
((l_Q_func-l_sm)**2).mean(), ((p_Q_func-p_sm)**2).mean()
(0.0, 7.824090399073146e-34)
The statistics match exactly and the p-values differ only at floating-point noise level, so the two implementations agree. Taking a closer look at the case $m = 8$:
T = len(tpx_return)
auto_corr = sm.tsa.stattools.acf(tpx_return)[1:]
lvalue = T*(T+2)*((auto_corr**2/(T-np.arange(1,len(auto_corr)+1)))[:8]).sum()
print(lvalue)
8.604732778577853
1-chi2.cdf(lvalue,8)
0.37672860496603844
So with a p-value of roughly 0.38, it turns out to be quite difficult to reject the null hypothesis.
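As a side note, `scipy.stats.chi2.sf(lvalue, 8)` computes the same upper-tail probability directly and avoids the loss of precision that `1 - chi2.cdf(...)` can suffer when p-values are very small.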