This is a summary of tips for backtesting with Python. Let's finish the troublesome pre-processing quickly and concentrate on model making! That is the purpose. Although not introduced in the article, you can try various portfolio selection models such as multi-factor model because you can get macro data with pandas-datareader
.
Overview: --Prepare an environment for Backtesting with Python. --Select the target assets from the TSE TOPIX constituent stocks and form a minimum diversified portfolio.
First, install pandas-datareader
in your environment.
pandas-datareader
is a convenient Python package (with pandas.Dataframe friendly) that allows you to download market data such as stock prices via the Web API. IEX, World Bank, OECD, Yahoo! Finance, FRED, [Stooq] You can read the data acquired on the python code by hitting API such as (https://stooq.com/q/?s=usdkrw) internally. Please refer to Official Document for detailed usage.
# Install pandas-datareader (latest version)
pip install git+https://github.com/pydata/pandas-datareader.git
# Or install pandas-datareader (stable version)
pip install pandas-datareader
This time, the target products are stocks listed on the Tokyo Stock Exchange (TSE).
Most of the data published on the Web is in the US market, but Poland's strongest site stooq.com is Tokyo Stock Exchange. The past data of the exchange is open to the public. Use pandas-datareader
to get data for individual stocks from stooq.
Basically, you can run pandas_datareader.stooq.StooqDailyReader ()
. For the argument, specify the securities code (or ticker symbol) registered in each market and the site (Yahoo !, Stooq, ...) of the data disclosure source.
A 4-digit securities code is assigned to stocks handled on the Tokyo Stock Exchange, so we will use this this time. (Example: Toyota Motor is a stock whose securities code is ** 7203 ** on the TSE. In NYSE, it is traded as a stock whose ticker symbol is ** TM **.)
As a test, let's get the stock price data of Toyota Motor Corporation (TSE: 7203) and plot it.
import datetime
import pandas_datareader
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2020, 11, 30)
stockcode = "7203.jp" # Toyota Motor Corporation (TM)
df = pandas_datareader.stooq.StooqDailyReader(stockcode, start, end).read()
df = df.sort_values(by='Date',ascending=True)
display(df) # Show dataframe
-----
Open High Low Close Volume
Date
2015-01-05 6756.50 6765.42 6623.43 6704.69 10653925
2015-01-06 6539.48 6601.09 6519.83 6519.83 13870266
2015-01-07 6480.52 6685.05 6479.64 6615.40 12837377
2015-01-08 6698.46 6748.46 6693.98 6746.69 11257646
2015-01-09 6814.56 6846.70 6752.92 6795.80 11672928
... ... ... ... ... ...
2020-11-04 7024.00 7054.00 6976.00 6976.00 6278100
2020-11-05 6955.00 7032.00 6923.00 6984.00 5643400
2020-11-06 7070.00 7152.00 7015.00 7019.00 11092900
2020-11-09 7159.00 7242.00 7119.00 7173.00 7838600
2020-11-10 7320.00 7360.00 7212.00 7267.00 8825700
Daily stock price transition data was acquired as pandas.Dataframe!
Let's plot the contents of the df
just created. (Basically use the closing price)
# Plot timeseries (2015/1/1 - 2020/11/30)
plt.figure(figsize=(12,8))
plt.plot(df.index, df["Close"].values)
plt.show()
As shown in the figure below, the transition of the closing price (Close) could be easily plotted.
As a preparation for solving the portfolio optimization problem, create panel data for multiple assets (stocks) and organize them as a pandas.Dataframe object.
This time, we will select 5 stocks listed in TOPIX 500 as investment target assets. In addition, as a pre-processing, "closing price" is converted to "closing price-based rate of return". Please change the code in this part according to the situation.
import datetime
import numpy as np
import pandas as pd
import pandas_datareader.data as web
import pandas_datareader.stooq as stooq
def get_stockvalues_tokyo(stockcode, start, end, use_ratio=False):
"""
stockcode: market code of each target stock (ex. "NNNN") defined by the Tokyo stock market.
start, end: datetime object
"""
# Get index data from https://stooq.com/
df = stooq.StooqDailyReader(f"{stockcode}.jp", start, end).read()
df = df.sort_values(by='Date',ascending=True)
if use_ratio:
df = df.apply(lambda x: (x - x[0]) / x[0] )
return df
def get_paneldata_tokyo(stockcodes, start, end, use_ratio=False):
# Use "Close" value only
dfs = []
for sc in stockcodes:
df = get_stockvalues_tokyo(sc, start, end, use_ratio)[['Close']]
df = df.rename(columns={'Close': sc})
dfs.append(df)
df_concat = pd.concat(dfs, axis=1)
return df_concat
Create panel data using get_paneldata_tokyo ()
.
start = datetime.datetime(2015, 1, 1)
end = datetime.datetime(2020, 11, 30)
stockcodes=["1301", "1762", "1820", "1967", "2127"]
df = get_paneldata_tokyo(stockcodes, start, end, use_ratio=True)
display(df) # return ratio daily
-----
1301 1762 1820 1967 2127
Date
2015-01-05 0.000000 0.000000 0.000000 0.000000 0.000000
2015-01-06 -0.010929 -0.018385 -0.033937 -0.002265 -0.038448
2015-01-07 -0.014564 -0.020433 -0.059863 -0.013823 -0.059680
2015-01-08 -0.007302 -0.016338 -0.057883 -0.013823 -0.039787
2015-01-09 0.000000 -0.004490 -0.031938 -0.025407 -0.043770
... ... ... ... ... ...
2020-10-29 0.096138 -0.032923 -0.030777 0.858573 5.682321
2020-10-30 0.093336 -0.039657 -0.041199 0.832831 5.704266
2020-11-02 0.107748 -0.026188 -0.032198 0.845702 5.418978
2020-11-04 0.099341 -0.024392 -0.020829 0.858573 5.704266
2020-11-05 0.069315 -0.014964 -0.042147 0.904909 6.055390
With this, the panel data of each asset to be evaluated has been acquired.
Determining an appropriate investment ratio for multiple assets to be invested is called ** portfolio optimization **. This time, we will use the ** mean-variance model ** (Mean-Variance Model) proposed by Markowitz as the most basic portfolio optimization problem setting.
In Markowitz's mean variance model, we consider an optimization problem that "minimizes portfolio variance" under the constraint that "expected return of portfolio is above a certain value".
In general, for a portfolio of $ n $ assets, the variance of the portfolio is a quadratic form of the covariance matrix between the $ n $ assets, so this optimization problem is a quadratic programming problem. It becomes a class of Programming, QP) and is formulated as follows.
-$ \ Sigma \ in \ mathbb {R} ^ {n \ times n}
The first constraint formula requires that the expected rate of return of the portfolio be greater than or equal to a certain value ($ = r_e $). The second and third constraint formulas are self-explanatory from the definition of portfolio. If you allow short selling of assets, you may want to remove the third constraint.
Solve this quadratic programming problem (QP) using Python's convex optimization package CVXOPT. When dealing with quadratic programming problems with CVXOPT, organize the optimization problems you want to solve into the following generalized format.
Calculate the parameters P
, q
, G
, h
, A
and execute thecvxopt.solvers.qp ()
function to find the optimum solution and value. For Markowitz's mean / variance model,
reference:
Calculate the required statistics from the panel data df
of the target asset.
Covariance matrix between assets in the portfolio $ \ Sigma $:
df.cov() # Covariance matrix
-----
1301 1762 1820 1967 2127
1301 0.024211 0.015340 0.018243 0.037772 0.081221
1762 0.015340 0.014867 0.015562 0.023735 0.038868
1820 0.018243 0.015562 0.025023 0.029918 0.040811
1967 0.037772 0.023735 0.029918 0.109754 0.312827
2127 0.081221 0.038868 0.040811 0.312827 1.703412
Expected rate of return for each asset in the portfolio $ {\ bf r} $:
df.mean().values # Expected returns
-----
array([0.12547322, 0.10879767, 0.07469455, 0.44782516, 1.75209493])
Solve the optimization problem using CVXOPT.
import cvxopt
def cvxopt_qp_solver(r, r_e, cov):
# CVXOPT QP Solver for Markowitz' Mean-Variance Model
# See https://cvxopt.org/userguide/coneprog.html#quadratic-programming
# See https://cdn.hackaday.io/files/277521187341568/art-mpt.pdf
n = len(r)
r = cvxopt.matrix(r)
P = cvxopt.matrix(2.0 * np.array(cov))
q = cvxopt.matrix(np.zeros((n, 1)))
G = cvxopt.matrix(np.concatenate((-np.transpose(r), -np.identity(n)), 0))
h = cvxopt.matrix(np.concatenate((-np.ones((1,1)) * r_e, np.zeros((n,1))), 0))
A = cvxopt.matrix(1.0, (1, n))
b = cvxopt.matrix(1.0)
sol = cvxopt.solvers.qp(P, q, G, h, A, b)
return sol
r = df.mean().values # Expected returns
r_e = 1.45 * # Lower bound for portfolio's return
cov = df.cov() # Covariance matrix
# Solve QP and derive optimal portfolio
sol = cvxopt_qp_solver(r, r_e, cov)
x_opt = np.array(sol['x'])
print(x_opt)
print("Variance (x_opt) :", sol["primal objective"])
-----
pcost dcost gap pres dres
0: 4.3680e-03 -8.6883e-02 5e+00 2e+00 2e+00
1: 9.1180e-02 -2.2275e-01 5e-01 1e-01 1e-01
2: 2.1337e-02 -6.0274e-02 8e-02 2e-16 1e-16
3: 1.0483e-02 -1.7810e-03 1e-02 1e-16 3e-17
4: 4.9857e-03 1.5180e-03 3e-03 2e-16 8e-18
5: 4.0217e-03 3.6059e-03 4e-04 3e-17 1e-17
6: 3.7560e-03 3.7107e-03 5e-05 3e-17 1e-18
7: 3.7187e-03 3.7168e-03 2e-06 1e-17 4e-18
8: 3.7169e-03 3.7168e-03 2e-08 1e-16 6e-18
Optimal solution found.
[ 5.56e-05]
[ 1.00e+00]
[ 1.76e-05]
[ 3.84e-07]
[ 2.63e-07]
Variance (x_opt): 0.003716866155475511 #Optimal portfolio diversification
The optimal solution (optimal investment ratio for each asset) and the optimal value (portfolio diversification when the optimal investment ratio is applied) were obtained. The optimal solution based on the mean variance model used this time is called "minimum variance portfolio" because it emphasizes the optimality for portfolio risk (variance).
In many cases, the Sharpe ratio, which takes into account the rate of return (inflation rate) of risk-free assets, is used as the evaluation index for the rate of return. There are several ways to backtest, so please refer to specialized books and treatises.
Thank you for reading to the end!
Recommended Posts