Because of the new coronavirus, the Shanghai Composite Index is on a bargain sale at 7% off. When it falls this far, the government will probably inject money, so I feel it should recover a little in the short term (sorry if I turn out to be wrong). My wife (who is Chinese) loves bargain sales and is urging me to buy with everything we have, so let's analyze it a little quantitatively. What I want to know is whether buying is profitable, given how the index reacts from the day after a crash up to one month later.
So, first, convert the daily returns to z-scores and extract only the days on which the z-score is at or below -2, that is, days that fell by 2σ or more.
t\in\{t~|~{\rm zscore}(r_t)\leq -2\},~{\rm zscore}(r_t)=\frac{r_t - \mu}{\sigma}
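As a rough illustration of this filter, here is a minimal sketch in pandas. The function name `crash_days` and the synthetic return series are my own assumptions for the example, not the data used in this post.

```python
import numpy as np
import pandas as pd

def crash_days(returns: pd.Series, threshold: float = -2.0) -> pd.Series:
    """Keep only the days whose z-scored return is at or below `threshold`."""
    z = (returns - returns.mean()) / returns.std()
    return returns[z <= threshold]

# Synthetic daily returns (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, size=1000))
returns.iloc[100] = -0.08  # plant one obvious crash day

crashes = crash_days(returns)
print(len(crashes), "crash days out of", len(returns))
```

Note that μ and σ here are taken over the whole sample, so the threshold uses information from the future; that seems acceptable for a quick look like this one, but it is worth keeping in mind.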
Then, examine how the return on those crash days relates to the returns one day, one week, two weeks, three weeks, and one month later.
E[r_{t+d}]=f(r_t), ~d\in\{1, 5, 10, 15, 20\}
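The forward returns for each d could be built along the following lines. This is only a sketch under assumptions: the helper `forward_mean_returns`, the column names such as `5d`, and the choice of the average daily return over t+1 to t+d are mine, and the table actually used in this post may have been prepared differently.

```python
import numpy as np
import pandas as pd

def forward_mean_returns(returns: pd.Series, horizons=(1, 5, 10, 15, 20)) -> pd.DataFrame:
    """Build a table with today's return and the mean daily return over the next d days."""
    out = pd.DataFrame({'ret': returns})
    for d in horizons:
        # rolling(d).mean() at row t covers t-d+1..t;
        # shifting by -d moves that window to t+1..t+d
        out[f'{d}d'] = returns.rolling(d).mean().shift(-d)
    return out

# Synthetic daily returns (hypothetical data, for illustration only)
rng = np.random.default_rng(0)
returns = pd.Series(rng.normal(0.0, 0.01, size=250))
table = forward_mean_returns(returns)
print(table.head())
```

The last d rows of each horizon column are NaN because the future window runs past the end of the data.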
First, let's look at autocorrelation. The Shanghai Composite, as you might expect from China, is full of outliers, and I feel that the usual Pearson product-moment correlation (which is easily affected by outliers) is not a good fit, so here I computed Spearman's rank correlation instead (because it works on ranks, it is much less affected by outliers).
\rho=1-\frac{6\sum D^2}{N^3-N}
Where D is the difference between the corresponding X and Y ranks, and N is the number of value pairs (see Wiki for details).
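To check the formula by hand, here is a small sketch that computes ρ from the rank differences and compares it with the value pandas returns. The five value pairs are made up for illustration, and the formula in this form only holds when there are no tied ranks.

```python
import pandas as pd

def spearman_rho(x: pd.Series, y: pd.Series) -> float:
    """Spearman's rho via 1 - 6 * sum(D^2) / (N^3 - N); assumes no tied ranks."""
    d = x.rank() - y.rank()   # D: difference between corresponding ranks
    n = len(x)                # N: number of value pairs
    return 1.0 - 6.0 * (d ** 2).sum() / (n ** 3 - n)

# Made-up pairs with one extreme value in x (hypothetical data)
x = pd.Series([0.010, -0.020, 0.030, -0.500, 0.005])
y = pd.Series([0.001, 0.015, -0.010, 0.020, -0.004])

rho_manual = spearman_rho(x, y)
corr_table = pd.DataFrame({'x': x, 'y': y}).corr(method='spearman')
rho_pandas = corr_table.loc['x', 'y']
print(rho_manual, rho_pandas)
```

Because only the ranks enter the calculation, replacing -0.500 with -0.050 in `x` would leave ρ unchanged, which is exactly why the rank correlation suits data like this.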
Computing a rank correlation in Excel is a pain, but with pandas it takes a single line like the one below. Wonderful!
rho_spearman = df.corr(method='spearman')
The correlations actually computed with pandas are shown in the figure, and a negative autocorrelation is confirmed across the board (= if the mean is 0, the index tends to rebound).
Next, let's use regression analysis to predict the return from the next day up to one month out, given today's crash (-7.72% at the close). As usual, to reduce the influence of outliers on the regression coefficients, first winsorize the returns (both the crash-day return and the subsequent returns) to the range of ±2σ.
{\rm winsorize}(r_{t+d})={\rm min}({\rm max}(r_{t+d},~\mu-2\sigma),~\mu+2\sigma)
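The clipping in this formula can be written in one line with pandas. A minimal sketch, assuming a hypothetical helper named `winsorize` and synthetic returns:

```python
import numpy as np
import pandas as pd

def winsorize(s: pd.Series, n_sigma: float = 2.0) -> pd.Series:
    """Clip values to [mu - n_sigma * sigma, mu + n_sigma * sigma]."""
    mu, sigma = s.mean(), s.std()
    return s.clip(lower=mu - n_sigma * sigma, upper=mu + n_sigma * sigma)

# Synthetic daily returns with one planted outlier (hypothetical data)
rng = np.random.default_rng(1)
r = pd.Series(rng.normal(0.0, 0.01, size=500))
r.iloc[10] = -0.08

w = winsorize(r)
print(r.iloc[10], "->", w.iloc[10])
```

Values inside the band are left untouched; only the tails are pulled in to the band edges, so no observations are dropped.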
Then, I ran a linear regression with x as the return on the day of the crash and y as the return over the following d days, and obtained the predicted value of y at today's return x = -7.72% for each d.
{\rm winsorize}(r_{t+d})=\beta_d*{\rm winsorize}(r_{t})+\alpha_d+\epsilon_{t+d}
Running the regression above with `LinearRegression` from scikit-learn's `sklearn.linear_model` gives the following predicted returns after d days (expressed as a daily rate). As the figure shows, even if the index does rebound, the rebound seems to last only about one to two weeks. One caveat: some readers may find it odd that the one-month prediction is negative even though the autocorrelation is still negative at one month. This happens because, within the sample that meets the condition (days on which the return fell below -2σ), the average return over the following month was itself significantly negative. Correlation is computed after centering each series on its mean, whereas regression also captures the mean through the intercept term. Mean returns are often non-zero, so it is dangerous to draw conclusions from the correlation alone.
Furthermore, the figures below are scatter plots of x: the return on the day (crash days only) against y: the return over the following n days (n = 1, 5, 10, 15, 20; the plot titles label y as the average return from t+1 to t+n). With seaborn's `sns.regplot()`, the regression line and its confidence band are drawn on the scatter plot in one go! Convenient!
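The point about the intercept is easy to reproduce with made-up numbers. In the sketch below, the data are synthetic and `np.polyfit` stands in for `LinearRegression` to keep the example short; x and y are negatively correlated, yet the prediction for a crash day is still negative because the average of y is itself well below zero.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical crash-day returns and subsequent returns:
# y reacts against x (slope -0.1) but is centered on -2%
x = rng.uniform(-0.08, -0.03, size=200)
y = -0.02 - 0.1 * (x - x.mean()) + rng.normal(0.0, 0.002, size=200)

corr = np.corrcoef(x, y)[0, 1]          # ignores the level of y
beta, alpha = np.polyfit(x, y, deg=1)   # slope and intercept

x_today = -0.0772
y_pred = alpha + beta * x_today

print("correlation:", corr)
print("prediction :", y_pred, "vs. mean of y:", y.mean())
```

The negative slope only says that a deeper crash is followed by a return that is less bad than the conditional average; whether that return is actually positive depends on the intercept.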
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.linear_model import LinearRegression

sns.set()
reg = LinearRegression()

# Read the table copied to the clipboard; it needs a 'Date' column and a 'ret' column
df = pd.read_clipboard()
df['Date'] = pd.to_datetime(df['Date'])
df = df.set_index('Date').astype('float')

# Winsorize every column to mean ± 2σ
df_clip = df.apply(lambda x: np.clip(x, x.mean() - x.std() * 2, x.mean() + x.std() * 2), axis='index')

# Horizon columns: everything except the first and last column
span_list = df_clip.columns[1:-1]
pred = pd.Series(index=span_list, dtype=float)
X_pred = -0.0772  # today's return

for span in span_list:
    X = df_clip['ret'].values.reshape(-1, 1)
    Y = df_clip[span].values.reshape(-1, 1)
    reg.fit(X, Y)
    # Predicted return at X_pred, multiplied by the number of days in the horizon
    pred[span] = reg.predict(np.array([[X_pred]]))[0, 0] * int(span[:-1])

    # Scatter plot with regression line, one file per horizon
    plt.clf()
    sns.regplot(x=df_clip['ret'], y=df_clip[span])
    plt.title('x: ret(t), y:average_ret(t+1:t+' + span[:-1] + ')')
    plt.savefig(span + '.png')

# Bar charts of the rank correlations and of the predicted returns
plt.clf()
fig, [ax1, ax2] = plt.subplots(ncols=1, nrows=2)
df_clip.corr(method='spearman').iloc[0, 1:-1].plot(kind='bar', ax=ax1)
ax1.set_title('conditional autocorrelation when ret < -2σ')
pred.plot(kind='bar', ax=ax2)
ax2.set_title('conditional predicted return when ret < -2σ')
plt.tight_layout()
plt.savefig('pred_ret.png')