I don't know how many people in the world need to do this calculation.
The average pairwise stock correlation is the average of the correlation coefficients of price movements between all pairs of stocks. Typically, during a crisis such as the Lehman shock or the European debt crisis, the market as a whole drops sharply and then rebounds sharply, and the correlation between stocks rises. When correlation jumps, all stocks move together, which makes it hard for active equity investors to earn excess returns through stock selection. So you can use this indicator to adjust the risk level of your portfolio, for example by reducing risk when you are unlikely to win. On a less serious note, active stock managers can also use it as an excuse: "I haven't been winning lately, but please forgive me given this market environment."
First, prepare the data: an m x n DataFrame with dates as rows (m days), stocks as columns (n stocks), and each stock's daily return in each cell.
This time, I prepared the data separately and read it in from a CSV file.
Alternatively, you could download stock prices from Yahoo with pandas as described in this article and compute daily returns with the `pct_change()` method.
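If you don't have a CSV at hand, here is a minimal sketch of the expected input, assuming synthetic data: random-walk prices for three hypothetical tickers (`AAA`, `BBB`, `CCC`), not real quotes. `pct_change()` turns an m x n price table into daily returns; the first row has no previous price, so it is NaN and is dropped here.

```python
import numpy as np
import pandas as pd

# Hypothetical example data: 250 business days of random-walk prices
# for three made-up tickers (not real market data).
rng = np.random.default_rng(0)
dates = pd.bdate_range("2020-01-01", periods=250)
log_steps = rng.normal(0.0, 0.01, size=(len(dates), 3))
prices = pd.DataFrame(
    100 * np.exp(np.cumsum(log_steps, axis=0)),
    index=dates,
    columns=["AAA", "BBB", "CCC"],
)

# Daily returns: P_t / P_{t-1} - 1 for each stock (column).
# The first row has no previous price, so it is NaN and is dropped.
df = prices.pct_change().iloc[1:]

print(df.shape)  # (249, 3): m days x n stocks
```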
Finally, calculate the correlation coefficients:

```python
# Rolling pairwise correlation (the result is a Panel)
result = df.rolling(window=60, min_periods=30).corr()
```
Calculation of rolling pairwise correlation is completed in this one line!
What it does is: "For each date, compute the n x n correlation matrix between all pairs of stocks from the daily returns of the 60 days up to that date, leaving the value missing (None/NaN) for any pair with fewer than 30 days of data in the window, and repeat this for every date." The resulting `result` is a `pandas.Panel` object: a 2D `DataFrame` with an extra time axis, which makes it 3D (m x n x n).
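On pandas 0.20 and later, the same one-liner no longer returns a `Panel`; it returns a DataFrame with a two-level row index (date, stock) and the stocks as columns, i.e. m x n rows by n columns. The sketch below, on hypothetical random returns for four made-up stocks, checks that shape and pulls out the n x n matrix for a single date with `.loc`.

```python
import numpy as np
import pandas as pd

# Hypothetical daily returns: 100 days x 4 made-up stocks.
rng = np.random.default_rng(1)
dates = pd.bdate_range("2021-01-01", periods=100)
df = pd.DataFrame(rng.normal(0, 0.01, (100, 4)), index=dates,
                  columns=["A", "B", "C", "D"])

result = df.rolling(window=60, min_periods=30).corr()

# One row per (date, stock) pair, one column per stock.
print(result.shape)  # (400, 4)

# The n x n correlation matrix on the last date.
last_matrix = result.loc[dates[-1]]
print(last_matrix.shape)  # (4, 4)
```

Before the 30th day (`min_periods=30`), every entry of that day's matrix is NaN.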
Computing the correlations was easy, but getting the average takes a little more work, for two reasons: 1) each slice is a correlation matrix, so its diagonal is all 1s, and those must be excluded from the average; 2) where a correlation coefficient cannot be calculated because of missing data, the entry is None (NaN), and the average must be taken while skipping it.
First, the diagonal 1s: I replace them with None using NumPy's `np.fill_diagonal()` (in a float array, None is stored as NaN). For a single DataFrame you would write something like `np.fill_diagonal(df.values, None)`, but here we use the `apply()` method to apply it to the entire `Panel`, as follows:
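```python
# Convert diagonal components to None (NaN).
# np.fill_diagonal returns None and modifies x.values in place, so this
# relies on x.values being a view into result; tmp itself is not used.
tmp = result.apply(lambda x: np.fill_diagonal(x.values, None), axis=(1,2))
```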
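To see what the diagonal step does to one matrix, here is a small sketch on a hand-written 3 x 3 correlation matrix (made-up numbers). It works on a copy of the underlying NumPy array, because `np.fill_diagonal` modifies its argument in place and returns None, and it writes NaN explicitly. The `mask` line is a pandas-only alternative that does not depend on writing into `.values`.

```python
import numpy as np
import pandas as pd

# Made-up 3 x 3 correlation matrix for illustration.
corr = pd.DataFrame(
    [[1.0, 0.5, 0.2],
     [0.5, 1.0, 0.8],
     [0.2, 0.8, 1.0]],
    index=["A", "B", "C"], columns=["A", "B", "C"],
)

# Option 1: NumPy, on a copy (fill_diagonal works in place and returns None).
values = corr.to_numpy(copy=True)
np.fill_diagonal(values, np.nan)
no_diag = pd.DataFrame(values, index=corr.index, columns=corr.columns)

# Option 2: pandas only, masking the diagonal with a boolean identity matrix.
no_diag2 = corr.mask(np.eye(len(corr), dtype=bool))

print(no_diag)
```

Option 2 is convenient when you would rather not think about whether `.values` is a view or a copy.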
Then use the `mean(skipna=True)` method to take the mean while ignoring the missing values. This is the average pairwise correlation. It is also applied along the time axis with a single `apply()` call.
```python
# Ignore None (NaN) and calculate the average
apc = result.apply(lambda x: x.unstack().mean(skipna=True), axis=(1,2))
```
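As a worked example of the averaging step, take a made-up 3 x 3 matrix whose diagonal is already blank and whose B-C pair is missing (NaN), as if there were too little overlapping data. `unstack()` flattens the matrix into one 9-element Series, and `mean(skipna=True)` (also the default) averages only what is left: the A-B pair 0.5 and the A-C pair 0.2, each counted twice because the matrix is symmetric, giving (0.5 + 0.2 + 0.5 + 0.2) / 4 = 0.35.

```python
import numpy as np
import pandas as pd

nan = np.nan
# Made-up matrix: diagonal already set to NaN, B-C pair missing.
matrix = pd.DataFrame(
    [[nan, 0.5, 0.2],
     [0.5, nan, nan],
     [0.2, nan, nan]],
    index=["A", "B", "C"], columns=["A", "B", "C"],
)

# Flatten the 3 x 3 matrix into a 9-element Series and average, skipping NaN.
apc_one_day = matrix.unstack().mean(skipna=True)
print(apc_one_day)  # approximately 0.35
```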
With `Panel.apply(..., axis=(1,2))`, the correlation matrix at each point in time is passed to the function as a DataFrame `x`, one date at a time along the time axis.
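Since `Panel` no longer exists in recent pandas, here is one way to express the same per-date loop, assuming pandas 0.20 or later, where `rolling().corr()` returns a (date, stock)-indexed DataFrame: group by the date level and reduce each n x n slice to the mean of its off-diagonal, non-missing entries. The helper names `mean_off_diagonal` and `average_pairwise_correlation` are my own, not pandas APIs. The test data is constructed so the answer is known: stock B is twice A and C is the negative of A, so the pairwise correlations are +1, -1 and -1, and their average is -1/3.

```python
import numpy as np
import pandas as pd


def mean_off_diagonal(matrix: pd.DataFrame) -> float:
    """Mean of one n x n correlation matrix, excluding the diagonal and NaN."""
    values = matrix.to_numpy(dtype=float, copy=True)
    np.fill_diagonal(values, np.nan)
    # pandas' mean skips NaN and returns NaN if nothing is left.
    return pd.Series(values.ravel()).mean()


def average_pairwise_correlation(returns: pd.DataFrame, window: int = 60,
                                 min_periods: int = 30) -> pd.Series:
    """Rolling average pairwise correlation, one value per date."""
    corr = returns.rolling(window=window, min_periods=min_periods).corr()
    # Level 0 of the row index is the date; each group is that date's matrix.
    return corr.groupby(level=0).apply(mean_off_diagonal)


# Constructed example: B = 2 * A and C = -A, so the pairwise
# correlations are +1 (A, B), -1 (A, C) and -1 (B, C).
rng = np.random.default_rng(42)
a = rng.normal(0, 0.01, 120)
dates = pd.bdate_range("2022-01-03", periods=120)
returns = pd.DataFrame({"A": a, "B": 2 * a, "C": -a}, index=dates)

apc = average_pairwise_correlation(returns)
print(apc.tail())  # every value from the 30th day on is close to -1/3
```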
The calculation is complete! Let's plot it.

```python
apc.plot()
```
You can see that the correlation jumps whenever the market is driven by occasional macro factors.
Here is the whole code put together. It's easy. pandas was originally developed by Wes McKinney while he was working at the hedge fund AQR Capital, which is why it makes financial data so easy to handle.
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Read the data (dates as rows, stocks as columns, daily returns as values)
df = pd.read_csv('data.csv', na_values=' ', index_col=0, parse_dates=True)

# Rolling pairwise correlation (the result is a Panel)
result = df.rolling(window=60, min_periods=30).corr()

# Convert diagonal components to None (NaN), in place via x.values
tmp = result.apply(lambda x: np.fill_diagonal(x.values, None), axis=(1,2))

# Ignore None (NaN) and calculate the average
apc = result.apply(lambda x: x.unstack().mean(skipna=True), axis=(1,2))

# Plot
apc.plot()
plt.show()
```
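- pandas documentation, Window functions: http://pandas.pydata.org/pandas-docs/stable/computation.html#window-functions

The `rolling` method changed in pandas 0.18.0, so the code above needs pandas 0.18.0 or later. Newer is not always better here, though: from pandas 0.20.0, pairwise rolling `corr()` returns a MultiIndexed DataFrame instead of a `Panel`, and `Panel` was removed entirely in pandas 0.25.0, so the Panel-based code above runs as written only on pandas 0.18.x and 0.19.x.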
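A per-date loop, like `Panel.apply` above, can be slow on long histories. On pandas 0.20 or later, the `rolling().corr()` output has exactly n rows per date in the original column order, so one alternative, sketched here under that assumption, is to reshape it into an (m, n, n) NumPy array and blank out the diagonal for all dates at once. The function name `average_pairwise_correlation_fast` is hypothetical, and the example returns (five made-up stocks, one with no data for the first 80 days) are random.

```python
import numpy as np
import pandas as pd


def average_pairwise_correlation_fast(returns: pd.DataFrame, window: int = 60,
                                      min_periods: int = 30) -> pd.Series:
    """Vectorized rolling average pairwise correlation (hypothetical helper)."""
    m, n = returns.shape
    corr = returns.rolling(window=window, min_periods=min_periods).corr()
    # n rows per date, in column order -> one n x n matrix per date.
    cube = corr.to_numpy(dtype=float, copy=True).reshape(m, n, n)
    idx = np.arange(n)
    cube[:, idx, idx] = np.nan  # drop the diagonal 1s on every date

    valid = ~np.isnan(cube)
    counts = valid.sum(axis=(1, 2))
    sums = np.where(valid, cube, 0.0).sum(axis=(1, 2))
    # Divide only where at least one pair exists; otherwise leave NaN.
    means = np.full(m, np.nan)
    np.divide(sums, counts, out=means, where=counts > 0)
    return pd.Series(means, index=returns.index)


rng = np.random.default_rng(7)
dates = pd.bdate_range("2023-01-02", periods=200)
returns = pd.DataFrame(rng.normal(0, 0.01, (200, 5)), index=dates,
                       columns=list("VWXYZ"))
returns.iloc[:80, 4] = np.nan  # stock Z has no data for the first 80 days

apc = average_pairwise_correlation_fast(returns)
print(apc.dropna().describe())
```

The trade-off is memory: the whole m x n x n array is held at once, whereas a per-date loop only needs one n x n matrix at a time.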