I am interested in both machine learning and financial engineering, so to kill two birds with one stone I am taking Machine learning for trading @ Udacity. There I learned about the stock metrics $\alpha$ and $\beta$, so I animated how they change over time with pandas + matplotlib and played around with the result. The targets are the top 10 stocks in the World Market Capitalization Ranking as of February 2017. The code is available on GitHub. As you can see, I am an amateur, so I would appreciate it if you could point out any mistakes or deficiencies.
Since daily stock price data is used, it has to be obtained in advance. For example, you can get it from YAHOO! FINANCE with the following procedure. Please note that **scraping is a violation of the terms of use**.
1. Open the page of the stock you want on YAHOO! FINANCE (for example, GOOG).
2. Click Historical Data at the top of the page.
3. Set Time Period: to the desired period and Frequency: to Daily, and click Download Data.
4. Rename the downloaded file (table.csv → GOOG.csv).

In this article, we will assume that stock price data such as GOOG.csv is stored in the data/ directory.
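As a rough sketch of what reading such a file with pandas might look like: the column names `Date` and `Adj Close` are an assumption about the layout of the downloaded CSV, and the in-memory sample below uses made-up prices in place of a real `data/GOOG.csv`.

```python
import io

import pandas as pd

# Made-up sample standing in for data/GOOG.csv (the values are NOT real prices).
# The header row is an assumed layout of the downloaded file.
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2016-11-29,100.0,101.0,99.0,100.5,100.5,1000\n"
    "2016-11-30,100.5,102.0,100.0,101.5,101.5,1200\n"
    "2016-12-01,101.5,103.0,101.0,102.0,102.0,1100\n"
)

# Keep only the date and the adjusted closing price, indexed by date.
goog = pd.read_csv(sample_csv, index_col="Date", parse_dates=True,
                   usecols=["Date", "Adj Close"])
goog = goog.rename(columns={"Adj Close": "GOOG"})
print(goog)
```

With a real file, `sample_csv` would be replaced by a path such as `"data/GOOG.csv"`.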
2-2. python
pandas is used for data shaping and matplotlib for animation, so both need to be installed in advance. I confirmed that the code works in the following environment.
3-1. daily-return
The daily-return $r$ is, simply put, the rate of change of the stock price from the previous day. Strictly speaking, it is defined by the following formula, where $x[t]$ is the adjusted closing price on date $t$.
$$ r = \frac{x[t]}{x[t-1]} - 1 $$
With pandas, you can easily calculate the daily-return. See Machine learning for trading @ Udacity for details. For reference, the daily-return of GOOG from December 1, 2006 to December 1, 2016 is shown in the figure.
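As a small sketch of the pandas calculation mentioned above, the formula for $r$ can be applied to a whole price series at once with `shift`. The prices here are made up for illustration and the variable names are my own.

```python
import pandas as pd

# Made-up adjusted closing prices x[t] for four consecutive days.
prices = pd.Series([100.0, 102.0, 101.0, 104.03],
                   index=pd.date_range("2016-11-28", periods=4))

# r = x[t] / x[t-1] - 1; shift(1) lines up each day with the previous one.
daily_return = prices / prices.shift(1) - 1

# The first day has no previous price, so its value is NaN.
print(daily_return)
```

`prices.pct_change()` gives the same result; the explicit form is shown because it mirrors the formula.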
By comparing the daily-return with the market average, you can evaluate the characteristics of a stock. Here, SPY is taken as the market average, and I draw a scatter plot of the daily-returns of SPY and GOOG together with a regression line. The period is from December 1, 2006 to December 1, 2016.
The intercept of this regression line $y = \beta x + \alpha$ is called the **alpha value**, and the slope is called the **beta value**. When the stock moves exactly like the market average, the regression line coincides with the straight line $y = x$, that is, $\alpha = 0$ and $\beta = 1$. The larger $\alpha$ is, the larger the excess return over the market average, and the larger $\beta$ is, the more strongly the stock moves with the market average. By the way, in the case of the above figure, $\alpha = 0.000309329688614$ and $\beta = 0.917720842929$. $\alpha$ and $\beta$ are indicators used to evaluate the active and passive returns of individual stocks.
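The regression itself is a one-liner with `numpy.polyfit`. The sketch below uses synthetic returns (not real market data) generated with a known alpha and beta, so the fit can be checked against them; the names `market` and `stock` are placeholders of my own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily returns: a "market" series and a "stock" built from it
# with known alpha = 0.0003 and beta = 0.9 plus a little noise.
true_alpha, true_beta = 0.0003, 0.9
market = rng.normal(0.0, 0.01, size=500)
stock = true_alpha + true_beta * market + rng.normal(0.0, 0.002, size=500)

# A degree-1 fit returns [slope, intercept], i.e. [beta, alpha].
beta, alpha = np.polyfit(market, stock, 1)
print("alpha =", alpha, "beta =", beta)
```

Note the order of the return values: the slope ($\beta$) comes first and the intercept ($\alpha$) second.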
I use matplotlib to animate how the alpha and beta values change over time. The targets are the following top 10 stocks in the World Market Capitalization Ranking as of February 2017. For the industry, I referred to Wikipedia. I do not have enough knowledge to offer any real analysis, so all I can do is look at the results and speculate.
# | Symbol | Company name | Industry |
---|---|---|---|
1 | AAPL | Apple | Electrical equipment |
2 | GOOG | Alphabet | Conglomerate |
3 | MSFT | Microsoft | Information and communication industry |
4 | BRK-A | Berkshire Hathaway | Insurance business |
5 | AMZN | Amazon.com | Retail business |
6 | FB | Facebook | Internet |
7 | XOM | Exxon Mobil | Petroleum and coal products |
8 | JNJ | Johnson & Johnson | Service industry |
9 | JPM | JPMorgan Chase | Other financial industry |
10 | WFC | Wells Fargo | Financial industry |
Read the stock price information from the CSV files and calculate the daily-return. This article uses the following functions [^ license] introduced in the Udacity course.
- `get_data(symbol)`: A function that reads the adjusted closing price from the CSV file of `symbol` and outputs it in `pandas.DataFrame` format.
- `fill_missing_values(stock_data)`: A function that fills in the missing parts of `stock_data` (`pandas.DataFrame`).
- `compute_daily_returns(stock_data)`: A function that calculates the daily-return from `stock_data` (`pandas.DataFrame`) and outputs it as a `pandas.DataFrame`.

I made a function to animate $\alpha$ and $\beta$. With the default settings, the values are calculated for the above 10 stocks (`symbols`) from the most recent 2 years of data (`period`), over the range from December 1, 2006 (`start_date`) to December 1, 2016 (`end_date`). While shifting the calculation window by 10 days (`interval`), it draws one frame at a time with `animate_polyfit()` and outputs `ab.gif`. The size of each marker represents the magnitude of the correlation coefficient between SPY and the stock in question. The date at the upper right of the graph is the last day of the calculation window. In other words, the values displayed are calculated from the data for the `period` days up to that date.
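For readers who have not used `matplotlib.animation` before, here is a minimal sketch of the frame-by-frame mechanism on its own: a callback redraws the figure for each frame number, and `FuncAnimation` calls it once per frame. The moving point is made-up data, and the commented-out `pillow` writer is just one possible way to save a GIF, not necessarily the setup used in this article.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so that no display is needed
import matplotlib.animation as ani
import matplotlib.pyplot as plt
import numpy as np

# Made-up (alpha, beta) positions, one per frame.
points = np.column_stack([np.linspace(-0.002, 0.002, 20),
                          np.linspace(0.5, 1.5, 20)])

fig = plt.figure(figsize=(4, 4))


def draw_frame(nframe):
    """Clear the figure and draw the point for frame number nframe."""
    plt.clf()
    alpha, beta = points[nframe]
    plt.plot(alpha, beta, "o", ms=10)
    plt.xlim([-0.003, 0.003])
    plt.ylim([0.0, 2.0])
    plt.xlabel("Alpha")
    plt.ylabel("Beta")


anim = ani.FuncAnimation(fig, draw_frame, frames=len(points))
# anim.save("sketch.gif", writer="pillow", fps=10)  # uncomment to write a GIF
```

Fixing the axis limits inside the callback matters because `plt.clf()` discards them on every frame.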
You need to put the CSV file of each stock in the data directory. Also note that `udacity.py`, which defines the functions in Section 4.1, must be placed in the same directory as the script.
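If you do not have the course material at hand, the following hypothetical stand-ins may help in following the code. They are my own guess at the behaviour of `fill_missing_values` and `compute_daily_returns`, inferred only from the descriptions above and from how the functions are called (the former is called without using a return value, so it is assumed to work in place). They are not the course's source code.

```python
import numpy as np
import pandas as pd


def fill_missing_values(stock_data):
    """Assumed behaviour: fill gaps in place, forward first, then backward."""
    stock_data.ffill(inplace=True)  # carry the last known price forward
    stock_data.bfill(inplace=True)  # fill any gap at the very beginning


def compute_daily_returns(stock_data):
    """Assumed behaviour: r = x[t] / x[t-1] - 1, with the first row set to 0."""
    daily_returns = stock_data / stock_data.shift(1) - 1
    daily_returns.iloc[0] = 0  # there is no previous day for the first row
    return daily_returns


# Made-up prices with gaps, only to exercise the two functions.
prices = pd.DataFrame({"SPY": [100.0, np.nan, 102.0],
                       "XYZ": [np.nan, 50.0, 55.0]},
                      index=pd.date_range("2016-11-28", periods=3))
fill_missing_values(prices)
returns = compute_daily_returns(prices)
print(returns)
```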
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as ani
import matplotlib.cm as cm
import udacity  # Functions defined in "Machine learning for trading"


def animate_a_b(
        symbols=["AAPL", "GOOG", "MSFT", "BRK-A", "AMZN",
                 "FB", "XOM", "JNJ", "JPM", "WFC"],
        start_date="2006-12-01", end_date="2016-12-01",
        period=252 * 2):
    # --- Preprocess: You have to take Udacity! ---
    # Read data
    dates = pd.date_range(start_date, end_date)  # date range as index
    stock_data = udacity.get_data(symbols, dates)  # get data for each symbol
    # Fill missing values
    udacity.fill_missing_values(stock_data)
    # Daily returns
    daily_returns = udacity.compute_daily_returns(stock_data)

    # --- Make animation ---
    interval = 10  # Nframe interval
    # Integer division: the number of frames must be an int
    frames = (len(stock_data) - period) // interval  # Num of frames
    markers = ["o", "^", "s"]

    def animate_polyfit(nframe):
        plt.clf()
        # Rows of the current calculation window (selected by position)
        daily_returns_p = daily_returns.iloc[
            nframe * interval: nframe * interval + period]
        corr = daily_returns_p.corr(method="pearson")
        xmin, xmax = -0.003, 0.003
        ymin, ymax = 0.0, 2.0
        plt.plot([0, 0], [ymin, ymax], '--', color='black')
        plt.plot([xmin, xmax], [1, 1], '--', color='black')
        # symbols[0] is skipped; this only works if get_data() has inserted
        # "SPY" at the head of symbols (assumed here)
        for n, symbol in enumerate(symbols[1:]):
            beta, alpha = np.polyfit(daily_returns_p["SPY"],
                                     daily_returns_p[symbol], 1)
            # .iloc replaces .ix, which no longer exists in recent pandas
            plt.plot(alpha, beta, markers[n % len(markers)], alpha=0.7,
                     label=symbol, color=cm.jet(n * 1. / len(symbols)),
                     ms=np.absolute(corr.iloc[0, n + 1]) * 25)
        plt.xlim([xmin, xmax])
        plt.ylim([ymin, ymax])
        plt.xlabel("Alpha")
        plt.ylabel("Beta")
        plt.text(xmax, ymax, str(daily_returns_p.index[-1]),
                 ha="right", va="bottom")
        plt.legend(loc="upper left")

    fig = plt.figure(figsize=(8, 8))
    anim = ani.FuncAnimation(fig, animate_polyfit, frames=frames)
    anim.save("ab.gif", writer="imagemagick", fps=18)
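The sliding-window calculation inside `animate_polyfit()` can also be tried without the Udacity helpers or any CSV files. Below is a minimal sketch on synthetic returns: the ticker `XYZ`, the random data, and the shortened `period` of 60 days are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)

# Synthetic daily returns for a "market" (SPY) and one made-up stock (XYZ).
dates = pd.date_range("2016-01-01", periods=120, freq="B")
spy = rng.normal(0.0, 0.01, size=len(dates))
xyz = 0.0002 + 1.2 * spy + rng.normal(0.0, 0.002, size=len(dates))
returns = pd.DataFrame({"SPY": spy, "XYZ": xyz}, index=dates)

period, interval = 60, 10  # window length and step, in trading days
frames = (len(returns) - period) // interval

results = []
for nframe in range(frames):
    # Slice one window by position, then fit XYZ against SPY within it.
    window = returns.iloc[nframe * interval: nframe * interval + period]
    beta, alpha = np.polyfit(window["SPY"], window["XYZ"], 1)
    results.append((window.index[-1], alpha, beta))

# One row per frame, labelled by the last day of its window.
trajectory = pd.DataFrame(results,
                          columns=["Date", "Alpha", "Beta"]).set_index("Date")
print(trajectory)
```

Each row of `trajectory` corresponds to one frame of the animation for that stock.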
Around the time of the Lehman shock in 2008, the two stocks in the financial industry (JPM, WFC) go wild... By the way, FB is a relatively young company, so it only appears partway through the animation.
I tried animating the transition of $\alpha$ and $\beta$ with pandas + matplotlib. Setting aside whether this is the right use for it, animation may be a way to intuitively express how indicators with two or more dimensions change over time [^ anime]. I would like to continue taking this course and write up anything that seems worth an article. Thank you for reading to the end!
[^ license]: The source code is not posted here to avoid rights issues. The course is free, so please take it!
[^ anime]: Actually, I remember seeing a TED presentation that used an animation 100 times cooler than this one, so I tried to imitate it. I don't remember the details, but it showed the movements of major countries in terms of GDP and something else.
Appendix. Trendalyzer
As was pointed out in the comments, the inspiration for this article is Hans Rosling's Trendalyzer. Trendalyzer can be used as a Motion Graph in Google Spreadsheets, so I tried it. The spreadsheet is available here, so feel free to use it. The data itself is on sheet 1 and the motion graph is on sheet 2.
I saved the calculation results in Excel format so that they can be opened in Google Spreadsheets. The `pandas.DataFrame` is built according to the format [^ format] required by the motion graph.
save_a_b()
def save_a_b(
        symbols=["AAPL", "GOOG", "MSFT", "BRK-A", "AMZN",
                 "FB", "XOM", "JNJ", "JPM", "WFC"],
        start_date="2006-12-01", end_date="2016-12-01",
        period=252 * 2):
    # --- Preprocess: You have to take Udacity! ---
    # Read data
    dates = pd.date_range(start_date, end_date)  # date range as index
    stock_data = udacity.get_data(symbols, dates)  # get data for each symbol
    # Fill missing values
    udacity.fill_missing_values(stock_data)
    # Daily returns
    daily_returns = udacity.compute_daily_returns(stock_data)

    # --- Calculate spreadsheet of alpha and beta ---
    columns = ["Symbol", "Date", "Alpha", "Beta", "Color", "Size"]
    rows = []
    interval = 10  # Nframe interval
    # Integer division: range() needs an int
    frames = (len(stock_data) - period) // interval  # Num of frames
    for nframe in range(frames):
        daily_returns_p = daily_returns.iloc[
            nframe * interval: nframe * interval + period]
        corr = daily_returns_p.corr(method="pearson")
        # symbols[0] is skipped; this only works if get_data() has inserted
        # "SPY" at the head of symbols (assumed here)
        for n, symbol in enumerate(symbols[1:]):
            beta, alpha = np.polyfit(daily_returns_p["SPY"],
                                     daily_returns_p[symbol], 1)
            rows.append(
                [symbol, daily_returns_p.index[-1].strftime("%Y/%m/%d"),
                 alpha, beta, n, np.absolute(corr.iloc[0, n + 1]) * 25])
    # DataFrame.append no longer exists in recent pandas,
    # so the sheet is built from the collected rows in one go
    sheet = pd.DataFrame(rows, columns=columns)
    sheet.to_excel("ab.xlsx")
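The layout that `save_a_b()` builds is a "long" table with one row per stock per date. A tiny sketch with made-up values (the tickers `XYZ` and `ABC` and all numbers are placeholders) shows the shape of the sheet and the two date formats that `strftime` can produce:

```python
import pandas as pd

columns = ["Symbol", "Date", "Alpha", "Beta", "Color", "Size"]
last_day = pd.Timestamp("2016-12-01")  # last day of one calculation window

# One row per stock for this date; every number here is made up.
rows = [
    ["XYZ", last_day.strftime("%Y/%m/%d"), 0.0003, 0.92, 0, 20.0],
    ["ABC", last_day.strftime("%Y/%m/%d"), -0.0001, 1.10, 1, 15.0],
]
sheet = pd.DataFrame(rows, columns=columns)
print(sheet)

# The same date in the two formats, year-first and month-first.
print(last_day.strftime("%Y/%m/%d"), last_day.strftime("%m/%d/%Y"))
```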
[^ format]: At first I output the dates in `%m/%d/%Y` format, but I got stuck because Google would not turn the data into a motion graph. In the end, it was solved by outputting in `%Y/%m/%d` format and specifying date under display format > number in Google Spreadsheets. I'm not sure why... :bow:
Press the start button to start the animation. You can also change the speed of the animation.
Use the time bar to fast forward and rewind at will. You can display the trajectory of a specific stock by selecting it. Wow!
In creating this article, I referred to the following. Thank you very much!