This is a summary of tips for backtesting with Python. Let's finish the troublesome pre-processing quickly and concentrate on model making! That is the purpose. Although not introduced in the article, you can try various portfolio selection models such as multi-factor model because you can get macro data with pandas-datareader.

Overview: --Prepare an environment for Backtesting with Python. --Select the target assets from the TSE TOPIX constituent stocks and form a minimum diversified portfolio.

Acquisition of stock price data

First, install pandas-datareader in your environment.

pandas-datareader is a convenient Python package (with pandas.Dataframe friendly) that allows you to download market data such as stock prices via the Web API. IEX, World Bank, OECD, Yahoo! Finance, FRED, [Stooq] You can read the data acquired on the python code by hitting API such as (https://stooq.com/q/?s=usdkrw) internally. Please refer to Official Document for detailed usage.

# Install pandas-datareader (latest version)
pip install git+https://github.com/pydata/pandas-datareader.git
# Or install pandas-datareader (stable version)
pip install pandas-datareader

This time, the target products are stocks listed on the Tokyo Stock Exchange (TSE). Most of the data published on the Web is in the US market, but Poland's strongest site stooq.com is Tokyo Stock Exchange. The past data of the exchange is open to the public. Use pandas-datareader to get data for individual stocks from stooq.

Basically, you can run pandas_datareader.stooq.StooqDailyReader (). For the argument, specify the securities code (or ticker symbol) registered in each market and the site (Yahoo !, Stooq, ...) of the data disclosure source.

A 4-digit securities code is assigned to stocks handled on the Tokyo Stock Exchange, so we will use this this time. (Example: Toyota Motor is a stock whose securities code is ** 7203 ** on the TSE. In NYSE, it is traded as a stock whose ticker symbol is ** TM **.)

As a test, let's get the stock price data of Toyota Motor Corporation (TSE: 7203) and plot it.

import datetime
import pandas_datareader

start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2020, 11, 30)
stockcode = "7203.jp" # Toyota Motor Corporation (TM)

df = pandas_datareader.stooq.StooqDailyReader(stockcode, start, end).read()
df = df.sort_values(by='Date',ascending=True)
display(df) # Show dataframe
-----
            Open	High	Low	    Close	Volume
Date					
2015-01-05	6756.50	6765.42	6623.43	6704.69	10653925
2015-01-06	6539.48	6601.09	6519.83	6519.83	13870266
2015-01-07	6480.52	6685.05	6479.64	6615.40	12837377
2015-01-08	6698.46	6748.46	6693.98	6746.69	11257646
2015-01-09	6814.56	6846.70	6752.92	6795.80	11672928
...	...	...	...	...	...
2020-11-04	7024.00	7054.00	6976.00	6976.00	6278100
2020-11-05	6955.00	7032.00	6923.00	6984.00	5643400
2020-11-06	7070.00	7152.00	7015.00	7019.00	11092900
2020-11-09	7159.00	7242.00	7119.00	7173.00	7838600
2020-11-10	7320.00	7360.00	7212.00	7267.00	8825700

Daily stock price transition data was acquired as pandas.Dataframe! Let's plot the contents of the df just created. (Basically use the closing price)

# Plot timeseries (2015/1/1 - 2020/11/30)
plt.figure(figsize=(12,8))
plt.plot(df.index, df["Close"].values)
plt.show()

As shown in the figure below, the transition of the closing price (Close) could be easily plotted.

Create panel data for target assets

As a preparation for solving the portfolio optimization problem, create panel data for multiple assets (stocks) and organize them as a pandas.Dataframe object.

This time, we will select 5 stocks listed in TOPIX 500 as investment target assets. In addition, as a pre-processing, "closing price" is converted to "closing price-based rate of return". Please change the code in this part according to the situation.

import datetime
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import pandas_datareader.stooq as stooq


def get_stockvalues_tokyo(stockcode, start, end, use_ratio=False):
    """
    stockcode: market code of each target stock (ex. "NNNN") defined by the Tokyo stock market.
    start, end: datetime object
    """
    # Get index data from https://stooq.com/
    df = stooq.StooqDailyReader(f"{stockcode}.jp", start, end).read()
    df = df.sort_values(by='Date',ascending=True)
    
    if use_ratio:
        df = df.apply(lambda x: (x - x[0]) / x[0] )
    return df

def get_paneldata_tokyo(stockcodes, start, end, use_ratio=False):
    # Use "Close" value only 
    dfs = []
    for sc in stockcodes:
        df = get_stockvalues_tokyo(sc, start, end, use_ratio)[['Close']]
        df = df.rename(columns={'Close': sc})
        dfs.append(df)
    df_concat = pd.concat(dfs, axis=1)
    return df_concat

Create panel data using get_paneldata_tokyo ().

start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2020, 11, 30)
stockcodes=["1301", "1762", "1820", "1967", "2127"]

df = get_paneldata_tokyo(stockcodes, start, end, use_ratio=True)
display(df) # return ratio daily
-----
            1301	    1762    	1820    	1967	    2127
Date					
2015-01-05	0.000000	0.000000	0.000000	0.000000	0.000000
2015-01-06	-0.010929	-0.018385	-0.033937	-0.002265	-0.038448
2015-01-07	-0.014564	-0.020433	-0.059863	-0.013823	-0.059680
2015-01-08	-0.007302	-0.016338	-0.057883	-0.013823	-0.039787
2015-01-09	0.000000	-0.004490	-0.031938	-0.025407	-0.043770
...	...	...	...	...	...
2020-10-29	0.096138	-0.032923	-0.030777	0.858573	5.682321
2020-10-30	0.093336	-0.039657	-0.041199	0.832831	5.704266
2020-11-02	0.107748	-0.026188	-0.032198	0.845702	5.418978
2020-11-04	0.099341	-0.024392	-0.020829	0.858573	5.704266
2020-11-05	0.069315	-0.014964	-0.042147	0.904909	6.055390

With this, the panel data of each asset to be evaluated has been acquired.

Markowitz's mean variance model and its solution

Determining an appropriate investment ratio for multiple assets to be invested is called ** portfolio optimization **. This time, we will use the ** mean-variance model ** (Mean-Variance Model) proposed by Markowitz as the most basic portfolio optimization problem setting.

Markowitz mean variance model

In Markowitz's mean variance model, we consider an optimization problem that "minimizes portfolio variance" under the constraint that "expected return of portfolio is above a certain value".

In general, for a portfolio of $ n $ assets, the variance of the portfolio is a quadratic form of the covariance matrix between the $ n $ assets, so this optimization problem is a quadratic programming problem. It becomes a class of Programming, QP) and is formulated as follows.

\begin{align} \underset{\bf x}{\rm minimize} ~~~ &{\bf x}^T \Sigma {\bf x} \\\ {\rm subject~to} ~~~ &{\bf r}^T {\bf x} = \sum_{i=1}^{n} r_i x_i \geq r_e \\\ &{\|\| {\bf x} \|\|}\_{1} = \sum_{i=1}^{n} x_i = 1 \\\ &x_i \geq 0 ~~ (i = 1, \cdots, n) \end{align}

-$ \ Sigma \ in \ mathbb {R} ^ {n \ times n} - n $ Covariance matrix of assets -$ {\ bf x} \ in \ mathbb {R} ^ {n} - n $ Investment ratio vector of assets -$ {\ bf r} \ in \ mathbb {R} ^ {n} - n $ Expected rate of return vector of assets -$ x_i \ in \ mathbb {R} $-Investment ratio of assets $ i $ -$ r_i \ in \ mathbb {R} $-Expected rate of return on assets $ i $ -$ r_e \ in \ mathbb {R} $-Investor's expected rate of return

The first constraint formula requires that the expected rate of return of the portfolio be greater than or equal to a certain value ($ = r_e $). The second and third constraint formulas are self-explanatory from the definition of portfolio. If you allow short selling of assets, you may want to remove the third constraint.

How to use CVXOPT

Solve this quadratic programming problem (QP) using Python's convex optimization package CVXOPT. When dealing with quadratic programming problems with CVXOPT, organize the optimization problems you want to solve into the following generalized format.

\begin{align} \underset{\bf x}{\rm minimize} ~~~ &\frac{1}{2} {\bf x}^{T} P {\bf x} + {\bf q}^{T} {\bf x} \\\ {\rm subject~to} ~~~ & G {\bf x} \leq {\bf h} \\\ &A {\bf x} = {\bf b} \end{align}

Calculate the parameters P, q, G, h, A and execute thecvxopt.solvers.qp ()function to find the optimum solution and value. For Markowitz's mean / variance model, $ P = 2 \cdot \Sigma, ~~~ q = {\bf 0}, ~~~ G = - {\bf I}\_n, ~~~ h = - r_e, ~~~ A = {\bf 1}\_n^T, ~~~ b = 1 $ It becomes.

reference:

https://cvxopt.org/userguide/coneprog.html#quadratic-programming
https://qiita.com/ryoshi81/items/8b0c6add3e367f94c828

Calculated in Python

Calculate the required statistics from the panel data df of the target asset.

Covariance matrix between assets in the portfolio $ \ Sigma $:

df.cov() # Covariance matrix
-----
        1301	    1762        1820	    1967	    2127
1301	0.024211	0.015340	0.018243	0.037772	0.081221
1762	0.015340	0.014867	0.015562	0.023735	0.038868
1820	0.018243	0.015562	0.025023	0.029918	0.040811
1967	0.037772	0.023735	0.029918	0.109754	0.312827
2127	0.081221	0.038868	0.040811	0.312827	1.703412

Expected rate of return for each asset in the portfolio $ {\ bf r} $:

df.mean().values # Expected returns
-----
array([0.12547322, 0.10879767, 0.07469455, 0.44782516, 1.75209493])

Solve the optimization problem using CVXOPT.

import cvxopt

def cvxopt_qp_solver(r, r_e, cov):
    # CVXOPT QP Solver for Markowitz' Mean-Variance Model
    # See https://cvxopt.org/userguide/coneprog.html#quadratic-programming
    # See https://cdn.hackaday.io/files/277521187341568/art-mpt.pdf
    n = len(r)
    r = cvxopt.matrix(r)
    
    P = cvxopt.matrix(2.0 * np.array(cov))
    q = cvxopt.matrix(np.zeros((n, 1)))
    G = cvxopt.matrix(np.concatenate((-np.transpose(r), -np.identity(n)), 0))
    h = cvxopt.matrix(np.concatenate((-np.ones((1,1)) * r_e, np.zeros((n,1))), 0))
    A = cvxopt.matrix(1.0, (1, n))
    b = cvxopt.matrix(1.0)    
    sol = cvxopt.solvers.qp(P, q, G, h, A, b)
    return sol

r = df.mean().values # Expected returns
r_e = 1.45 * # Lower bound for portfolio's return
cov = df.cov() # Covariance matrix

# Solve QP and derive optimal portfolio
sol = cvxopt_qp_solver(r, r_e, cov)
x_opt = np.array(sol['x'])
print(x_opt)
print("Variance (x_opt) :", sol["primal objective"])

-----

 pcost       dcost       gap    pres   dres
 0:  4.3680e-03 -8.6883e-02  5e+00  2e+00  2e+00
 1:  9.1180e-02 -2.2275e-01  5e-01  1e-01  1e-01
 2:  2.1337e-02 -6.0274e-02  8e-02  2e-16  1e-16
 3:  1.0483e-02 -1.7810e-03  1e-02  1e-16  3e-17
 4:  4.9857e-03  1.5180e-03  3e-03  2e-16  8e-18
 5:  4.0217e-03  3.6059e-03  4e-04  3e-17  1e-17
 6:  3.7560e-03  3.7107e-03  5e-05  3e-17  1e-18
 7:  3.7187e-03  3.7168e-03  2e-06  1e-17  4e-18
 8:  3.7169e-03  3.7168e-03  2e-08  1e-16  6e-18
Optimal solution found.
[ 5.56e-05]
[ 1.00e+00]
[ 1.76e-05]
[ 3.84e-07]
[ 2.63e-07]

Variance (x_opt):  0.003716866155475511 #Optimal portfolio diversification

The optimal solution (optimal investment ratio for each asset) and the optimal value (portfolio diversification when the optimal investment ratio is applied) were obtained. The optimal solution based on the mean variance model used this time is called "minimum variance portfolio" because it emphasizes the optimality for portfolio risk (variance).

In many cases, the Sharpe ratio, which takes into account the rate of return (inflation rate) of risk-free assets, is used as the evaluation index for the rate of return. There are several ways to backtest, so please refer to specialized books and treatises.

Thank you for reading to the end!