Phenomena such as "when stock A goes up, stock B goes up too" and "when stock A goes down, stock B goes down too" are called "take-up" and "take-down". BNF, a genius trader, is said to have made a lot of money by trading on this. "I'll imitate him and make a lot of money too!" I thought, but I couldn't find a way to identify pairs of stocks that move together. So I implemented one in Python, and here I publish the code along with an explanation.
The code is on GitHub. Feel free to use it: https://github.com/toshiikuoo/puclic/blob/master/%E6%A0%AA%E4%BE%A1%E7%9B%B8%E9%96%A2.ipynb
The flow of operation is as follows
Get the stock list from the Wikipedia S&P 500 page
↓
Download the prices of the listed stocks from Yahoo Finance
↓
Calculate the correlation of stock prices for every pair of stocks
* Correlation: a number expressing how similarly two data series move
↓
Sort the pairs by correlation
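To make "correlation" concrete before looking at the real data, here is a minimal sketch using made-up prices. The three arrays are toy numbers chosen for illustration, not market data; `pearsonr` is the same SciPy function used in the code later on.

```python
import numpy as np
from scipy.stats import pearsonr

# Toy closing prices (made-up numbers, not real market data).
stock_a = np.array([100.0, 101.0, 103.0, 102.0, 105.0])
stock_b = np.array([50.0, 50.7, 51.4, 51.1, 52.4])  # tends to move with stock_a
stock_c = np.array([30.0, 29.5, 28.5, 29.0, 27.5])  # moves opposite to stock_a

# pearsonr returns the correlation coefficient and a p-value.
r_ab, p_ab = pearsonr(stock_a, stock_b)
r_ac, p_ac = pearsonr(stock_a, stock_c)

print(f"A vs B: r = {r_ab:.3f}")  # close to +1: the two rise and fall together
print(f"A vs C: r = {r_ac:.3f}")  # close to -1: one rises when the other falls
```

A coefficient near +1 is the "take-up / take-down" relationship this post is looking for; a coefficient near -1 means the two stocks move in opposite directions.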
I will walk through the code, quoting excerpts from it.
#Required library import
!pip install lxml html5lib beautifulsoup4
import pandas as pd
from pandas import Series,DataFrame
from pandas_datareader import DataReader
import numpy as np
from datetime import datetime
from scipy.stats import pearsonr
import itertools
# Install yfinance package.
!pip install yfinance
# Import yfinance
import yfinance as yf
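# Create a list of all S&P 500 stocks
url="https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
sp500_list=pd.read_html(url)[0].Symbol.values.tolist()
len(sp500_list)
# Assumption: this excerpt does not show how sp500_list_yahoo is defined.
# Yahoo Finance writes share classes with "-" instead of "." (e.g. BRK.B -> BRK-B),
# so one plausible reconstruction is:
sp500_list_yahoo=[s.replace('.','-') for s in sp500_list]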
#Store the closing prices of sp500 stocks in one DataFrame
close_sp500_list=yf.download(sp500_list_yahoo,'2019-10-04','2019-11-01')["Adj Close"]
#Calculate the correlation for every pair of columns
#Dictionary that stores the calculated correlations
correlations={}
#Calculate correlation
for cola,colb in itertools.combinations(sp500_list_yahoo,2):
    #Mask the days on which either stock has no price
    nas=np.logical_or(np.isnan(close_sp500_list.loc[:,cola]),np.isnan(close_sp500_list.loc[:,colb]))
    try:
        correlations[cola + '__'+ colb]=pearsonr(close_sp500_list.loc[:,cola][~nas],close_sp500_list.loc[:,colb][~nas])
    except ValueError:
        #pearsonr needs at least two data points; skip pairs without enough data
        pass
#Output the result. "correlations" is a dict, so convert it to a DataFrame
result=DataFrame.from_dict(correlations,orient='index')
result.columns=['PCC','p-value']
print(result.sort_values('PCC'))
The final output is shown below: the stock pairs sorted by correlation, from the most negative to the most positive.
                   PCC       p-value
BKR__SPGI    -0.968878  1.437804e-03
BIIB__HAS    -0.962712  8.038530e-13
BKR__PGR     -0.959178  2.465597e-03
PGR__WCG     -0.941347  6.818268e-11
CI__PGR      -0.935051  1.840799e-10
...                ...           ...
CNC__WCG      0.996087  1.493074e-22
BKR__PRGO     0.997290  1.101006e-05
CBS__VIAB     0.998546  7.579099e-27
BBT__STI      0.998835  8.266321e-28
GOOGL__GOOG   0.999502  1.701271e-31

[127260 rows x 2 columns]
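As a possible cross-check of the loop above, pandas can compute all pairwise Pearson correlations in one call with `DataFrame.corr`, which also drops missing values pair by pair. The sketch below is not the notebook's method: it runs on a small random table, and the names `prices`, `corr_matrix`, `pairs` and the four tickers are hypothetical. Note that `DataFrame.corr` gives only the coefficient, not the p-value.

```python
import itertools
import math

import numpy as np
import pandas as pd

# Toy price table standing in for close_sp500_list (random walk, made-up values).
rng = np.random.default_rng(0)
prices = pd.DataFrame(
    rng.normal(size=(20, 4)).cumsum(axis=0) + 100,
    columns=["AAA", "BBB", "CCC", "DDD"],  # hypothetical tickers
)
prices.loc[3, "BBB"] = np.nan  # a missing value, as can happen in real downloads

# All pairwise Pearson correlations at once; NaNs are excluded pair by pair.
corr_matrix = prices.corr(method="pearson")

# Keep each unordered pair once, using the same "A__B" key style, then sort.
pairs = pd.Series(
    {
        f"{a}__{b}": corr_matrix.loc[a, b]
        for a, b in itertools.combinations(prices.columns, 2)
    },
    name="PCC",
).sort_values()

print(pairs)
print(len(pairs), "pairs; n*(n-1)/2 =", math.comb(len(prices.columns), 2))
```

The same counting explains the size of the real output: 127260 rows equals `math.comb(505, 2)`, which is consistent with 505 symbols in the list at the time.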
Next, I would like to use these results to group highly correlated stocks. (Please feel free to contact me with any questions or suggestions for improvement. This is my first post, so there may be some rough spots. GitHub is difficult ...)
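As one possible starting point for that grouping, the sketch below treats every pair whose correlation is at or above a threshold as a link and collects linked tickers into groups (connected components, via a small union-find). This is only an illustration: the function name `group_pairs` and the 0.95 threshold are assumptions, not something from the notebook. The sample values are copied from the output above.

```python
from collections import defaultdict


def group_pairs(pair_corr, threshold=0.95):
    """Group tickers linked by pairs whose correlation is >= threshold.

    pair_corr maps "A__B" keys to correlation coefficients.
    The threshold is a hypothetical cut-off chosen for illustration.
    """
    parent = {}

    def find(x):
        # Return the representative of x's group, registering x if new.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for key, pcc in pair_corr.items():
        a, b = key.split("__")
        if pcc >= threshold:
            parent[find(a)] = find(b)  # merge the two groups

    groups = defaultdict(set)
    for ticker in parent:
        groups[find(ticker)].add(ticker)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Values copied from the output above.
sample = {
    "CNC__WCG": 0.996087,
    "CBS__VIAB": 0.998546,
    "GOOGL__GOOG": 0.999502,
    "BKR__PRGO": 0.997290,
    "BKR__SPGI": -0.968878,  # strongly negative, so it creates no link
}
print(group_pairs(sample))
```

With the DataFrame from the code above, the input could be built with `result['PCC'].to_dict()`. Because links are transitive here, A and C end up in the same group whenever A links to B and B links to C, even if A and C themselves fall below the threshold; a stricter grouping would need a different method.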