Animate the alpha and beta values of the world's top market cap stocks with pandas + matplotlib

1. Introduction

I'm interested in both machine learning and financial engineering, so I'm taking Machine Learning for Trading @ Udacity to kill two birds with one stone. The course introduced the stock metrics $\alpha$ and $\beta$, so I animated how they change over time with pandas + matplotlib and played around with the result. The targets are the top 10 stocks in the world market capitalization ranking as of February 2017. The code is on GitHub. As you can see, I'm an amateur, so I would appreciate it if you pointed out any mistakes or omissions.

ab_optimized.gif

2. Environment

2-1. Obtaining stock price data

This article uses daily stock price data, so you need to obtain it in advance. For example, you can download it from YAHOO! FINANCE with the following steps. Please note that **scraping violates the terms of use**.

  1. Open the summary page of the target stock (example: Google, GOOG).
  2. Click Historical Data at the top of the page.
  3. Confirm that Time Period: is the desired period and Frequency: is Daily, then click Download Data.
  4. If necessary, rename the file (example: table.csv → GOOG.csv).

This article assumes that stock price data such as GOOG.csv is stored in the data/ directory.
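
As a minimal sketch of reading one of these files with pandas, the snippet below loads the adjusted close from a CSV. It assumes the download has `Date` and `Adj Close` columns; the helper name `load_adj_close` and the sample numbers are made up for illustration, and this is not the loader from the Udacity course.

```python
import io

import pandas as pd


def load_adj_close(source, symbol):
    """Read one downloaded CSV and return its adjusted close as a Series.

    `source` can be a path such as "data/GOOG.csv" or any file-like object.
    The column names "Date" and "Adj Close" are assumptions about the file.
    """
    df = pd.read_csv(source, index_col="Date", parse_dates=True,
                     usecols=["Date", "Adj Close"])
    return df["Adj Close"].rename(symbol).sort_index()


# Tiny in-memory stand-in for data/GOOG.csv (made-up numbers, not real quotes)
sample_csv = io.StringIO(
    "Date,Open,High,Low,Close,Adj Close,Volume\n"
    "2016-11-29,100.0,102.0,99.0,101.0,101.0,1000\n"
    "2016-11-30,101.0,103.0,100.0,102.0,102.0,1200\n"
    "2016-12-01,102.0,104.0,101.0,103.0,103.0,900\n"
)
goog = load_adj_close(sample_csv, "GOOG")
print(goog)
```

In practice you would pass a path such as `data/GOOG.csv` instead of the in-memory sample.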

2-2. Python

pandas is used for data wrangling and matplotlib for the animation, so both must be installed in advance. I confirmed that the code works in the following environment.

3. Evaluation index

3-1. Daily return

The daily return $r$ is, simply put, the rate of change in the stock price from the previous day. Strictly speaking, it is defined by the following formula, where $x[t]$ is the adjusted closing price on date $t$.

$$ r = \frac{x[t]}{x[t-1]} - 1 $$

With pandas, the daily return is easy to calculate. See Machine Learning for Trading @ Udacity for details. For reference, the daily return of GOOG from December 1, 2006 to December 1, 2016 is shown below.

goog.png
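
The formula above maps directly onto a shifted division in pandas. The following is a rough sketch under my own assumptions, not the course's implementation: the function name `daily_returns_from_prices` and the toy prices are hypothetical, and the first row is set to 0 because it has no previous day.

```python
import pandas as pd


def daily_returns_from_prices(prices):
    """Return r[t] = x[t] / x[t-1] - 1 for a Series or DataFrame of prices."""
    returns = prices / prices.shift(1) - 1
    returns.iloc[0] = 0  # the first day has no previous day to compare with
    return returns


# Toy adjusted closing prices (made-up values)
prices = pd.Series([100.0, 110.0, 99.0],
                   index=pd.date_range("2016-11-29", periods=3))
returns = daily_returns_from_prices(prices)
print(returns)
```

pandas also provides `pct_change()`, which computes the same ratio and leaves the first row as NaN.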

3-2. Alpha and beta values

Comparing a stock's daily return with the market average reveals the stock's characteristics. Here SPY is taken as the market average: we draw a scatter plot of the daily returns of SPY and GOOG and fit a regression line. The period is December 1, 2006 to December 1, 2016.

figure_3.png

The intercept of this regression line $y = \beta x + \alpha$ is called the **alpha value**, and the slope is called the **beta value**. If the stock moves exactly like the market average, the regression line coincides with the straight line $y = x$, that is, $\alpha = 0$ and $\beta = 1$. The larger $\alpha$ is, the larger the excess return over the market average; the larger $\beta$ is, the more strongly the stock moves with the market average. In the figure above, $\alpha = 0.000309329688614$ and $\beta = 0.917720842929$. $\alpha$ and $\beta$ are metrics used to evaluate the active and passive returns of individual stocks.
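
As a small sketch of the regression itself, `np.polyfit` with degree 1 returns the slope first and the intercept second, so beta comes before alpha. The helper name `fit_alpha_beta` and the synthetic returns below are assumptions for illustration; the data is constructed to lie exactly on a line with $\alpha = 0.001$ and $\beta = 0.9$.

```python
import numpy as np


def fit_alpha_beta(market_returns, stock_returns):
    """Fit stock = beta * market + alpha by least squares."""
    beta, alpha = np.polyfit(market_returns, stock_returns, 1)
    return alpha, beta


# Synthetic daily returns (not real market data)
market = np.array([-0.02, -0.01, 0.0, 0.01, 0.02])
stock = 0.9 * market + 0.001

alpha, beta = fit_alpha_beta(market, stock)
print(alpha, beta)
```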

4. Evaluation

Here I use matplotlib to animate how the alpha and beta values change over time. The targets are the following top 10 stocks in the world market capitalization ranking as of February 2017. The industry column is taken from Wikipedia. I don't have enough background knowledge to offer any real analysis, so all that's left is to look at the results and let your imagination run.

| # | Symbol | Company name | Industry |
|---|--------|--------------|----------|
| 1 | AAPL | Apple | Electrical equipment |
| 2 | GOOG | Alphabet | Conglomerate |
| 3 | MSFT | Microsoft | Information and communication industry |
| 4 | BRK-A | Berkshire Hathaway | Insurance business |
| 5 | AMZN | Amazon.com | Retail business |
| 6 | FB | Facebook | Internet |
| 7 | XOM | Exxon Mobil | Petroleum and coal products |
| 8 | JNJ | Johnson & Johnson | Service industry |
| 9 | JPM | JPMorgan Chase | Other financial industry |
| 10 | WFC | Wells Fargo | Financial industry |

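4-1. Preprocessing

Read the stock price data from the CSV files and calculate the daily returns. This article uses functions introduced in the Udacity course[^license]: `get_data`, `fill_missing_values`, and `compute_daily_returns`, which are called in the code in Section 4-2.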

4-2. Animation

I wrote a function that animates $\alpha$ and $\beta$. With the default settings, it works on the 10 stocks above (`symbols`) over the range December 1, 2006 (`start_date`) to December 1, 2016 (`end_date`), and computes the values from the most recent 2 years of data (`period`). The calculation window is shifted by 10 days (`interval`) at a time; `animate_polyfit()` draws one frame per window, and the result is written to `ab.gif`. The size of each marker represents the magnitude of the correlation coefficient between SPY and that stock. The date at the upper right of the graph is the last day of the calculation window; in other words, each frame shows the values computed from the `period` days of data up to that date. The CSV file for each stock must be in the `data` directory. Also note that `udacity.py`, which defines the functions in Section 4-1, must be in the same directory.

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.animation as ani
import matplotlib.cm as cm
import udacity  # Functions defined in "Machine learning for trading"


def animate_a_b(
        symbols=["AAPL", "GOOG", "MSFT", "BRK-A", "AMZN",
                 "FB", "XOM", "JNJ", "JPM", "WFC"],
        start_date="2006-12-01", end_date="2016-12-01",
        period=252 * 2):

    """ --- Preprocess: You have to take Udacity! --- """
    # Read data
    dates = pd.date_range(start_date, end_date)  # date range as index
    stock_data = udacity.get_data(symbols, dates)  # get data for each symbol

    # Fill missing values
    udacity.fill_missing_values(stock_data)
    # Daily returns
    daily_returns = udacity.compute_daily_returns(stock_data)

    """ --- Make animation --- """
    interval = 10  # Number of rows the window shifts per frame
    frames = (len(stock_data) - period) // interval  # Num of frames (must be an int)
    markers = ["o", "^", "s"]

    def animate_polyfit(nframe):

        plt.clf()
        daily_returns_p = daily_returns[
            nframe * interval: nframe * interval + period]
        corr = daily_returns_p.corr(method="pearson")

        xmin, xmax = -0.003, 0.003
        ymin, ymax = 0.0, 2.0

        plt.plot([0, 0], [ymin, ymax], '--', color='black')
        plt.plot([xmin, xmax], [1, 1], '--', color='black')
        for n, symbol in enumerate(symbols[1:]):
            beta, alpha = np.polyfit(daily_returns_p["SPY"],
                                     daily_returns_p[symbol], 1)
            plt.plot(alpha, beta, markers[n % len(markers)], alpha=0.7,
                     label=symbol, color=cm.jet(n * 1. / len(symbols)),
                     ms=np.absolute(corr.iloc[0, n + 1]) * 25)
        plt.xlim([xmin, xmax])
        plt.ylim([ymin, ymax])
        plt.xlabel("Alpha")
        plt.ylabel("Beta")
        plt.text(xmax, ymax, str(daily_returns_p.index[-1]),
                 ha="right", va="bottom")
        plt.legend(loc="upper left")

    fig = plt.figure(figsize=(8, 8))
    anim = ani.FuncAnimation(fig, animate_polyfit, frames=frames)
    anim.save("ab.gif", writer="imagemagick", fps=18)
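
The window slicing used by `animate_polyfit()` can be checked in isolation. This is a rough sketch with a hypothetical helper, `window_bounds`, that reproduces only the index arithmetic (frame count and the start and end row of each window) for a made-up number of rows.

```python
def window_bounds(n_rows, period, interval):
    """Return the (start, end) row slice used for each animation frame."""
    frames = (n_rows - period) // interval  # same formula as in animate_a_b
    return [(k * interval, k * interval + period) for k in range(frames)]


# Toy sizes: 100 rows of data, a 40-row window, shifted 10 rows per frame
bounds = window_bounds(n_rows=100, period=40, interval=10)
for start, end in bounds:
    print(start, end)
```

With this formula the last window stops one step short of the final row, so the most recent `interval` rows or so never appear in a frame.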

ab_optimized.gif

Around the Lehman shock in 2008, the two financial stocks (JPM, WFC) move around wildly... By the way, FB is a relatively young company, so it only appears partway through the animation.

5. Conclusion

I tried animating the transition of $\alpha$ and $\beta$ with pandas + matplotlib. Setting aside how useful this particular example is, animation may be a good way to intuitively show how two or more indicators change over time[^anime]. I plan to continue taking this course and will write more articles if I find something worth sharing. Thank you for reading to the end!

[^license]: The source code is not posted here to avoid rights issues. The course is free, so please take it!

[^anime]: Actually, I remember seeing a TED presentation with an animation 100 times cooler than this one, and I tried to imitate it. I don't remember the details, but it showed how major countries moved over time in terms of GDP and some other measure.

Appendix. Trendalyzer

As pointed out in the comments, the presentation that inspired this article used Hans Rosling's Trendalyzer. Trendalyzer is available as a motion graph in Google Spreadsheets, so I tried it. The spreadsheet is public, so feel free to use it. The data is on sheet 1 and the motion graph is on sheet 2.

A-1. Preparation

I saved the calculation results in Excel format so that they can be opened in Google Spreadsheets. The pandas.DataFrame is built to match the format[^format] required by the motion graph.

save_a_b()


def save_a_b(
        symbols=["AAPL", "GOOG", "MSFT", "BRK-A", "AMZN",
                 "FB", "XOM", "JNJ", "JPM", "WFC"],
        start_date="2006-12-01", end_date="2016-12-01",
        period=252 * 2):

    """ --- Preprocess: You have to take Udacity! --- """
    # Read data
    dates = pd.date_range(start_date, end_date)  # date range as index
    stock_data = udacity.get_data(symbols, dates)  # get data for each symbol

    # Fill missing values
    udacity.fill_missing_values(stock_data)
    # Daily returns
    daily_returns = udacity.compute_daily_returns(stock_data)

    """ --- Calculate spreadsheet of alpha and beta ---"""
    sheet = pd.DataFrame(columns=["Symbol", "Date", "Alpha", "Beta", "Color",
                                  "Size"])
    interval = 10  # Number of rows the window shifts per frame
    frames = (len(stock_data) - period) // interval  # Num of frames (must be an int)

    for nframe in range(frames):

        daily_returns_p = daily_returns[
            nframe * interval: nframe * interval + period]
        corr = daily_returns_p.corr(method="pearson")

        for n, symbol in enumerate(symbols[1:]):
            beta, alpha = np.polyfit(daily_returns_p["SPY"],
                                     daily_returns_p[symbol], 1)

            new_row = pd.DataFrame(
                [[symbol, daily_returns_p.index[-1].strftime("%Y/%m/%d"),
                  alpha, beta, n, np.absolute(corr.iloc[0, n + 1]) * 25]],
                columns=sheet.columns)
            # DataFrame.append was removed in pandas 2.0; use pd.concat instead
            sheet = pd.concat([sheet, new_row], ignore_index=True)

    sheet.to_excel("ab.xlsx")

[^format]: At first I output the dates in %m/%d/%Y format and got stuck because Google would not turn the data into a motion graph. In the end, it worked after I output the dates in %Y/%m/%d format and set the display format to date under Format > Number in Google Spreadsheets. I'm not sure why... :bow:

A-2. Playing with it

Press the start button to start the animation. You can also change the speed of the animation.

ab_resize.gif

Use the time bar to fast forward and rewind at will. You can also display the trajectory of a specific stock by selecting it. Wow!

select_resize.gif

References

In creating this article, I referred to the following. Thank you very much!
