Li Yuqiong_BacktestingExercises
Date: 2017 May 4¶
Author: Li Yuqiong¶
Exercises for backtesting¶
Updates from Previous version on 2017 Mar 2:¶
- Added calculation of monthly log return and volatility.
Current version of this document demonstrates a Python script to implement two trading strategies based on historical daily end stock data.
The stock data containing historical end price is downloaded from Yahoo! Finance using Pandas. This script contains a list of S&P500 ticker data and thus can analyze the performance on these listed companies.
Everyday’s trading state is characterized by four variables:
- “Share” = number of shares of each stock symbol. The value takes integers only.
- “Cash” = cash available for buying stocks.
- “Position” = Share * Cash. The value of stocks at that day’s end price.
- “Total” = Cash + Position. The net asset value of trading account at the date according to stock’s daily end price.
Two trading strategies are:
- The trigger point is, once encounter a 5% decrease of initial day end price within the month, cut 100% position the next day. Buy back the same stock using all available cash at the beginning of next month, and repeat the process.
- First trigger point is, cut 20% position at 5% loss on initial day end price based on day end price within every month. Then, if encounter another 5% loss on day end price within the same month, cut 100% of position the next day. Buy back the same stock using all available cash at the beginning of next month, and repeat the process.
In the end, this version of the script outputs a Pandas Panel object with three dimension being 1) ticker symbol 2) trading date and 3) net asset value (i.e. “Total”) at the end of each day using strategy 1 and strategy 2.
With the output panel object, the remaining work is to do a statistical analysis on the structural performance of these strategies, and the optimal allocation of capital on different investment options. This part of analysis is not included in this version of the code.
Also, some of the indexing methods in this version of code is not the most efficient way. This can be improved in future versions of the codes.
Finally, for demonstration, initial setup in this program is Share = 0, Total = 10000.0
The testing period is from 2015-1-1 to 2015-03-01
The stocks being tested are “ABBV”, “ABT”, “MMM”
import numpy as np
import pandas as pd # pandas for data storage and analysis
import pandas_datareader.data as web # this module provides function to fetch data from yahoo finance
import datetime
import requests, bs4 # for web scrapping
The section below includes self-defined functions used in constructing trading strategies. These actions include: get ticker list from Wikipedia; get monthly data based on historical daily end price; check for trigger dates; initialize each month’s first state of Share and Cash; perform buying, selling and holding actions.
Explainations are included before each function.
### Function name: getSP500.
### Dependencies: requests, bs4
### Note: This function scapes S&P500 ticker symbols from wikipedia
###### and store them in a list
def getSP500():
url = "https://en.wikipedia.org/wiki/List_of_S%26P_500_companies"
res = requests.get(url)
res.raise_for_status()
total = bs4.BeautifulSoup(res.text, "lxml")
elems = total.select('a[href^="https://www.nyse.com/quote/"]')
ticker = [x.getText() for x in elems]
return(ticker);
### Function name: getMonthlyData
### Dependencies: pandas
### Note: This function takes in a pandas dataframe with all data and
##### return slices of it based on year and month
def getMonthlyData(df, year, month):
dat = df.loc[(df.index.year==year) & (df.index.month == month)]
return(dat);
### Function name: checkCut
### Dependencies: pandas
### Note: This function takes in a pandas dataframe with monthly data
##### return dates of loss trigger and cut
def checkCut(dat, percent):
threshold1 = dat.Close[0] * (1-percent)
if dat.loc[dat.Close <= threshold1].empty:
return {};
else:
triggerDate1 = dat.index[dat.Close <= threshold1][0]
if (triggerDate1 == dat.index[-1]):
return {}; # if triggerDate1 is the last day of the month, do not cut
else:
cutDate1 = dat.ix[dat.index.get_loc(triggerDate1)+1].name
datslice = dat[triggerDate1: ]
threshold2 = dat.Close[0] * (1-percent)* (1-percent)
if datslice.loc[datslice.Close <= threshold2].empty:
return {"triggerDate1": triggerDate1, "cutDate1": cutDate1};
else:
triggerDate2 = datslice.index[datslice.Close <= threshold2][0]
if (triggerDate2 == datslice.index[-1]):
return {"triggerDate1": triggerDate1, "cutDate1": cutDate1};
else:
cutDate2 = datslice.ix[datslice.index.get_loc(triggerDate2)+1].name
return {"triggerDate1": triggerDate1, "cutDate1": cutDate1,
"triggerDate2": triggerDate2, "cutDate2": cutDate2};
### Function name: Cut
### Dependencies: pandas
### Note: This function takes in a pandas dataframe with monthly data and indexes
##### return an updated dataframe slices where stock from index1 are cut at index2
def Cut(dat, index1, index2, percent):
dat.Share[index2] = np.floor(dat.Position[index1] * (1-percent) /dat.Close[index2])
dat.Cash[index2] = dat.Cash[index1] + (dat.Share[index1] - dat.Share[index2]) * dat.Close[index2]
dat.Position[index2] = dat.Share[index2] * dat.Close[index2]
dat.Total[index2] = dat.Position[index2] + dat.Cash[index2]
return(dat);
### Function name: Hold
### Dependencies: pandas
### Note: This function takes in a pandas dataframe with monthly data and indexes
##### return an updated dataframe slices where stock are hold from index1 to index2
def Hold(dat, index1, index2):
dat.Share[index1:index2] = dat.Share[index1]
dat.Cash[index1:index2] = dat.Cash[index1]
dat.Position[index1:index2] = dat.Share[index1:index2] * dat.Close[index1:index2]
dat.Total[index1:index2] = dat.Position[index1:index2] + dat.Cash[index1:index2]
return(dat);
Caveats for the codes below: this version of getPrevious() function works for Year = 2015 only! A further revision is to modify parameters to make manual adjustment of years in this chunk of code unnecessary.
### Function name: getPrevious
### Dependencies: pandas
### Note: This function takes in 1) a pandas dataframe containing all historical data and 2) year and month to fetch data
##### return an the initial state of stock share and cash from last month
### Caveats: Beginning of months are tricky
def getPrevious(df, year, month):
if ((year == df.index.year[0]) & (month == df.index.month[0])):
inShare = 0.0
inCash = 10000.0
elif (month == 1):
inShare = df.Share[(df.index.year==year-1) & (df.index.month == 12)].tail(1)[0]
inCash = df.Cash[(df.index.year==year-1) & (df.index.month == 12)].tail(1)[0]
else:
inShare = df.Share[(df.index.year==year) & (df.index.month == month-1)].tail(1)[0]
inCash = df.Cash[(df.index.year==year) & (df.index.month == month-1)].tail(1)[0]
return([inShare, inCash]);
### Function name: firstDay
### Dependencies: pandas, getPrevious function
### Note: This function takes in 1) monthly data to be updated 2) initial states of cash and share
##### return the first day position of each month, where supposed to buy back
def firstDay(dat, inShare, inCash):
dat.Share[0] = inShare + np.floor(inCash/dat.Close[0])
dat.Cash[0] = inCash - (dat.Share[0]-inShare) * dat.Close[0]
dat.Position[0] = dat.Close[0] * dat.Share[0]
dat.Total[0] = dat.Position[0] + dat.Cash[0]
return(dat);
This section below includes codes to perform strategy 1 and strategy 2 as described at the beginning.
### Function name: getAsset1
### Dependencies: pandas, getPrevious, getMonthlyData, firstDay, Hold, Cut
### Note: This function takes in monthly data to be updated
##### return the updated data slice after executing strategy 1
def getAsset1(df):
for year in range(df.index.year[0], df.index.year[-1]+1):
for month in range(df.loc[df.index.year==year].index.month[0], df.loc[df.index.year==year].index.month[-1]+1):
dat = getMonthlyData(df, year, month)
inShare = getPrevious(df, year, month)[0]
inCash = getPrevious(df, year, month)[1]
dat = firstDay(dat, inShare, inCash)
beginDate = dat.index[0]
endDate = dat.index[-1]
if (len(checkCut(dat, percent))==0):
Hold(dat, beginDate, endDate)
else:
triggerDate1 = checkCut(dat, percent)['triggerDate1']
cutDate1 = checkCut(dat, percent)['cutDate1']
Hold(dat, beginDate, triggerDate1)
Cut(dat, triggerDate1, cutDate1, s1cutPt1)
Hold(dat, cutDate1, endDate)
df.loc[(df.index.year==year) & (df.index.month == month)] = dat
return(df);
### Function name: getAsset2
### Dependencies: pandas, getPrevious, getMonthlyData, firstDay, Hold, Cut
### Note: This function takes in monthly data to be updated
##### return the updated data slice after executing strategy 2
def getAsset2(df):
for year in range(df.index.year[0], df.index.year[-1]+1):
for month in range(df.loc[df.index.year==year].index.month[0], df.loc[df.index.year==year].index.month[-1]+1):
dat = getMonthlyData(df, year, month)
inShare = getPrevious(df, year, month)[0]
inCash = getPrevious(df, year, month)[1]
dat = firstDay(dat, inShare, inCash)
beginDate = dat.index[0]
endDate = dat.index[-1]
if (len(checkCut(dat, percent))==0):
Hold(dat, beginDate, endDate)
elif (len(checkCut(dat, percent))==2):
triggerDate1 = checkCut(dat, percent)['triggerDate1']
cutDate1 = checkCut(dat, percent)['cutDate1']
Hold(dat, beginDate, triggerDate1)
Cut(dat, triggerDate1, cutDate1, s2cutPt1)
Hold(dat, cutDate1, endDate)
else:
triggerDate1 = checkCut(dat, percent)['triggerDate1']
cutDate1 = checkCut(dat, percent)['cutDate1']
triggerDate2 = checkCut(dat, percent)['triggerDate2']
cutDate2 = checkCut(dat, percent)['cutDate2']
Hold(dat, beginDate, triggerDate1)
Cut(dat, triggerDate1, cutDate1, s2cutPt1)
Hold(dat, cutDate1, triggerDate2)
Cut(dat, triggerDate2, cutDate2, s2cutPt2)
Hold(dat, cutDate2, endDate)
df.loc[(df.index.year==year) & (df.index.month == month)] = dat
return(df);
Experiment¶
This part prepares and cleans data available for analysis.
Again, for demonstration, initial setup in this program is Share = 0, Total = 10000.0
The testing period is from 2002-11-16 to 2005-04-24. These two dates are a Saturday and a Sunday respectively. To change the period of backtesting, modify the start and end date is enough.
The stocks being tested are “MSFT” Microsoft and “AMZN” Amazon. This code works for other tickers listed on S&P500 as well.
### From here start stat analysis on strategies
start = datetime.datetime(2002, 11, 16) # This is a Saturaday
end = datetime.datetime(2005, 4, 24) # This is a Sunday
# start = datetime.datetime(1998, 3, 14) # Another test date
# end = datetime.datetime(2001, 1, 1)
# ticker = getSP500()[0:3]
ticker = ['MSFT', 'AMZN']
f = web.DataReader(ticker, "yahoo", start, end)
pnl = f.transpose(2, 1, 0)
pnl = pnl.drop(['Open', 'High', 'Low', 'Volume', 'Adj Close'], 2)
pnl.loc[:, :, 'Share'] = 0.0
pnl.loc[:, :, 'Position'] = 0.0
pnl.loc[:, :, 'Cash'] = 10000.0
pnl.loc[:, :, 'Total'] = 10000.0
percent = 0.05
s1cutPt1 = 1
s2cutPt1 = 0.2
s2cutPt2 = 1
This part of the codes implements the previous two trading strategies on data, collect results and organize it into a sigle output object.
s1pnl = pnl.apply(lambda x:getAsset1(x), axis=(1, 2)) # results of strategy 1
s2pnl = pnl.apply(lambda x:getAsset2(x), axis=(1, 2)) # results of strategy 2
Get a detailed view of each stock’s specific performance for each strategy¶
The charts below shows the daily changes in trading account, including shares, cash, position and total money. There thus should be four tables: Amazon for strategy one and two, Microsoft for strategy one and two.
s1pnl.loc['AMZN'] # Performance for Amazon at Strategy one
s1pnl.loc['MSFT'] # Performance for Microsoft at Strategy one
s2pnl.loc['AMZN'] # Performance for Amazon at Strategy two
s2pnl.loc['MSFT'] # Performance for Microsoft at Strategy two
Get a detailed view of the tickers available in meta output, with only tickers, closing prices and totals.¶
The problem of date span is fixed thanks to Pandas DataReader’s built-in facilities.
pnlAll = {'Close': s1pnl[:, :, 'Close'],
'Total1': s1pnl[:, :, 'Total'],
'Total2': s2pnl[:, :, "Total"]}
pnlAll = pd.Panel(pnlAll).transpose(2, 1, 0)
pnlAll
pnlAll.keys()
pnlAll.loc['AMZN']
pnlAll.loc['MSFT']
Visualization¶
The codes below visualizes the performance of strategy one and two, with regard to the two stocks being analyzed. For both chart the red line represents strategy one and the blue line represents strategy two.
import matplotlib.pyplot as plt
plt.style.use('ggplot') # Set plotting style
# The plot for Amazon
plt.xlabel('Date')
plt.ylabel('Total amount of money')
plt.title('Results of Amazon, Red = strategy 1 and Blue = strategy 2')
plt.plot(pnlAll.loc['AMZN'].index, pnlAll.loc['AMZN'].Total1, pnlAll.loc['AMZN'].index, pnlAll.loc['AMZN'].Total2)
plt.show()
# The plot for Microsoft
plt.xlabel('Date')
plt.ylabel('Total amount of money')
plt.title('Results of Microsoft, Red = strategy 1 and Blue = strategy 2')
plt.plot(pnlAll.loc['MSFT'].index, pnlAll.loc['MSFT'].Total1, pnlAll.loc['MSFT'].index, pnlAll.loc['MSFT'].Total2)
plt.show()
Statistical analysis of strategies’ performance for each stock based on return series¶
Data for analysis:¶
Extracting time series of monthly returns. Why? Because the updating frequency of each strategy is monthly, so daily return doesn’t really capture the full performance of strategies.
Indexes to look at include:
- Monthly logarithmic returns
- Volatility: the standard deviation of monthly logarithmic returns
- Sharp Ratio: risk/return measures.
- Max Drop:
- Benchmark performance:
### A function to get the end dates of each month
def getMonthDates(tempTradeDays):
dateRange = [tempTradeDays[0]] # A list with first date
tempYear = None
dictYears = tempTradeDays.groupby(tempTradeDays.year)
for yr in dictYears.keys():
tempYear = pd.DatetimeIndex(dictYears[yr]).groupby(pd.DatetimeIndex(dictYears[yr]).month)
for m in tempYear.keys():
dateRange.append(max(tempYear[m]))
dateRange = pd.DatetimeIndex(dateRange).order()
return(dateRange);
import math;
# A function to compute log return
## df is a time-series with datetimeIndex and monthly return
def logReturn(df):
ratio = df/df.shift(1)
log_return = ratio.apply(lambda x: math.log(x))
return(log_return);
monthly_end_date = getMonthDates(pnlAll.iloc[0].index)
returnAll = pnlAll.loc[:,monthly_end_date, :]
returnAll.ix[:,:, 'Log_Return1'] = returnAll.ix[:,:, 'Total1'].apply(logReturn)
returnAll.ix[:,:, 'Log_Return2'] = returnAll.ix[:,:, 'Total2'].apply(logReturn)
The log return for strategy one on Amazon and Microsoft.
returnAll.ix[:,:, 'Log_Return1']
The log return for strategy two on Amazon and Microsoft.
returnAll.ix[:,:, 'Log_Return2']
The volatility for strategy one on the period being tested.
return_volatility_1 = np.std(returnAll.ix[:,:, 'Log_Return1'])
return_volatility_1
The volatility for strategy two on the period being tested.
return_volatility_2 = np.std(returnAll.ix[:,:, 'Log_Return2'])
return_volatility_2
The Sharp ratio for strategy one and two. The interest rate for T-Bill at the period is
Pingback: Use Python to do finance-related stuff – Yuqiong Li