On backtesting – Keep updating

Intro: vectorized backtesters and event-driven backtesters

Vectorized backtesting

This can be done in Excel, R or Python, since it only needs dataframes. The data structure is simple; still, how should the project flow be designed?

For a simple strategy, there are five steps:

  1. Acquire the data (from a CSV file or a DataFrame)
  2. Create the indicators
  3. Set up the trading logic and generate signals (this step is somewhat problematic: how should state variables be recorded?)
  4. Calculate the returns
  5. Set up a report of portfolio metrics
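The five steps can be sketched as a small vectorized backtest in pandas. This is a minimal sketch under stated assumptions: synthetic random-walk prices stand in for real vendor data, the indicator is a hypothetical moving-average crossover, bars are daily (252 per year) and the risk-free rate is zero. The function name run_vectorized_backtest and the window lengths are illustrative choices, not taken from any tutorial.

```python
import numpy as np
import pandas as pd


def run_vectorized_backtest(prices: pd.Series, short: int = 10, long: int = 40) -> dict:
    """Hypothetical moving-average crossover backtest, fully vectorized."""
    # 1. Acquire the data: here a price series is passed in directly.
    df = pd.DataFrame({"close": prices})

    # 2. Create the indicators.
    df["sma_short"] = df["close"].rolling(short).mean()
    df["sma_long"] = df["close"].rolling(long).mean()

    # 3. Trading logic: long (1) when the short SMA is above the long SMA, else flat (0).
    #    While the long SMA is still NaN the comparison is False, so the signal is 0.
    df["signal"] = (df["sma_short"] > df["sma_long"]).astype(int)

    # 4. Returns: shift the signal one bar so each return uses the previous bar's
    #    decision (avoids look-ahead bias).
    df["market_ret"] = df["close"].pct_change()
    df["strategy_ret"] = df["signal"].shift(1) * df["market_ret"]
    df["equity"] = (1 + df["strategy_ret"].fillna(0)).cumprod()

    # 5. Report: a few portfolio metrics (252 trading days per year assumed).
    rets = df["strategy_ret"].dropna()
    sharpe = np.sqrt(252) * rets.mean() / rets.std() if rets.std() > 0 else float("nan")
    drawdown = df["equity"] / df["equity"].cummax() - 1
    return {
        "total_return": df["equity"].iloc[-1] - 1,
        "sharpe": sharpe,
        "max_drawdown": drawdown.min(),
        "frame": df,
    }


if __name__ == "__main__":
    # Synthetic random-walk prices, for illustration only.
    rng = np.random.default_rng(0)
    idx = pd.bdate_range("2017-01-02", periods=500)
    prices = pd.Series(100 * np.exp(np.cumsum(rng.normal(0, 0.01, len(idx)))), index=idx)
    report = run_vectorized_backtest(prices)
    print({k: v for k, v in report.items() if k != "frame"})
```

Step 3 stays vectorized here only because the crossover signal does not depend on what the strategy currently holds; rules that depend on the current holding need state carried from bar to bar.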

A series of tutorials, covering both vectorized and event-driven backtesting.


Components of a backtesting system – tutorials on QuantStart

When developing a backtesting system it is tempting to want to constantly “rewrite it from scratch” as more factors are found to be crucial in assessing performance.

There are generally two types of backtesting system that will be of interest. The first is research-based, used primarily in the early stages, where many strategies will be tested in order to select those for more serious assessment. These research backtesting systems are often written in Python, R or MatLab as speed of development is more important than speed of execution in this phase.

The second type of backtesting system is event-based. That is, it carries out the backtesting process in an execution loop similar (if not identical) to the trading execution system itself. It will realistically model market data and the order execution process in order to provide a more rigorous assessment of a strategy.

The latter systems are often written in a high-performance language such as C++ or Java, where speed of execution is essential. For lower frequency strategies (although still intraday), Python is more than sufficient to be used in this context.

QuantStart uses an object-oriented research backtester in Python.

Three components:

  • strategy – receives a pandas DataFrame of bar (open-high-low-close-volume) data points, then produces signals: a timestamp paired with an element of the set {1, 0, -1}, i.e. a long, hold or short signal
  • portfolio – receives the signals and creates positions, producing an equity curve
  • performance – takes the portfolio and produces a set of statistics: risk, return, Sharpe ratio, information ratio…
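For reference, the Sharpe and information ratios listed under performance are commonly computed from periodic returns as below. The notation is an assumption for this note: N is the number of periods per year (e.g. N = 252 for daily bars), r_t the strategy return, r_f the risk-free rate and r_{b,t} a benchmark return; exact conventions (annualization factor, excess vs. raw returns) vary between libraries.

```latex
\mathrm{SR} = \sqrt{N}\,\frac{\operatorname{mean}(r_t - r_f)}{\operatorname{std}(r_t - r_f)},
\qquad
\mathrm{IR} = \sqrt{N}\,\frac{\operatorname{mean}(r_t - r_{b,t})}{\operatorname{std}(r_t - r_{b,t})}
```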

Abstract base classes in Python …
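A minimal sketch of the three components wired together with Python's abc module. The class and method names (Strategy.generate_signals, Portfolio.equity_curve), the hypothetical BuyAboveAverage rule and the fixed 100-share position size are assumptions for illustration, not QuantStart's actual code; plain lists of closing prices stand in for the pandas DataFrame of bars to keep it dependency-free.

```python
from abc import ABC, abstractmethod
from math import sqrt
from statistics import mean, stdev


class Strategy(ABC):
    """Turns bars into signals in {1, 0, -1} (long, hold/flat, short)."""

    @abstractmethod
    def generate_signals(self, closes: list[float]) -> list[int]:
        raise NotImplementedError


class Portfolio(ABC):
    """Turns signals into positions and an equity curve."""

    @abstractmethod
    def equity_curve(self, closes: list[float], signals: list[int]) -> list[float]:
        raise NotImplementedError


class BuyAboveAverage(Strategy):
    """Hypothetical rule: long when the close is above its trailing mean, short when below."""

    def __init__(self, window: int = 3):
        self.window = window

    def generate_signals(self, closes):
        signals = []
        for i, price in enumerate(closes):
            if i < self.window:
                signals.append(0)  # not enough history yet: hold
                continue
            avg = mean(closes[i - self.window:i])
            signals.append(1 if price > avg else -1 if price < avg else 0)
        return signals


class FixedSharePortfolio(Portfolio):
    """Holds signal * shares; with a fixed size, positions need no path dependence."""

    def __init__(self, initial_capital: float = 10_000.0, shares: int = 100):
        self.initial_capital = initial_capital
        self.shares = shares

    def equity_curve(self, closes, signals):
        equity = [self.initial_capital]
        for i in range(1, len(closes)):
            position = signals[i - 1] * self.shares  # act on the previous bar's signal
            equity.append(equity[-1] + position * (closes[i] - closes[i - 1]))
        return equity


def performance(equity: list[float], periods_per_year: int = 252) -> dict:
    """Total return and annualized Sharpe ratio (zero risk-free rate assumed)."""
    rets = [b / a - 1 for a, b in zip(equity, equity[1:])]
    vol = stdev(rets) if len(rets) > 1 else 0.0
    sharpe = sqrt(periods_per_year) * mean(rets) / vol if vol > 0 else float("nan")
    return {"total_return": equity[-1] / equity[0] - 1, "sharpe": sharpe}
```

Usage: signals = BuyAboveAverage(window=3).generate_signals(closes), then FixedSharePortfolio().equity_curve(closes, signals), then performance(...) on the result. Keeping each stage behind an abstract interface is what lets a different strategy or sizing rule be swapped in without touching the other two components.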


QSTrader and Zipline

Components of an event-driven system:

  • event
  • event queue – stores all the events
  • datahandler –
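A minimal sketch of how the event, event queue and data handler pieces fit together, using the standard library's queue.Queue. The event classes, ListDataHandler and the toy "signal long when the close rises" rule are hypothetical simplifications in the general spirit of these event-driven designs, not code copied from QSTrader or Zipline; orders, fills and a portfolio are omitted.

```python
import queue
from dataclasses import dataclass


@dataclass
class MarketEvent:
    """Hypothetical event: a new bar is available."""
    timestamp: str
    close: float


@dataclass
class SignalEvent:
    """Hypothetical event: the strategy wants to go long (1), hold (0) or short (-1)."""
    timestamp: str
    direction: int


class ListDataHandler:
    """Hypothetical data handler that drip-feeds bars from a list, one per call."""

    def __init__(self, bars):
        self._bars = iter(bars)
        self.continue_backtest = True

    def update_bars(self, events: queue.Queue) -> None:
        try:
            timestamp, close = next(self._bars)
        except StopIteration:
            self.continue_backtest = False
            return
        events.put(MarketEvent(timestamp, close))


def run(bars):
    events = queue.Queue()  # the event queue stores all pending events
    data = ListDataHandler(bars)
    last_close = None
    signals = []
    while True:
        data.update_bars(events)  # outer loop: one new bar per "heartbeat"
        if not data.continue_backtest:
            break
        while not events.empty():  # inner loop: drain and dispatch events
            event = events.get()
            if isinstance(event, MarketEvent):
                # Toy strategy: signal long when the close rises, otherwise do nothing.
                if last_close is not None and event.close > last_close:
                    events.put(SignalEvent(event.timestamp, 1))
                last_close = event.close
            elif isinstance(event, SignalEvent):
                signals.append(event)  # a real system would size and send an order here
    return signals
```

The strategy only ever sees one bar at a time through the queue, which is what makes this loop structurally similar to live trading.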


Jul 26 2017

Researched Quantopian, Zipline and Pyfolio; these turned out to be difficult to use because:

  • Quantopian limits you to its own data
  • Zipline and Pyfolio rely on Yahoo! Finance data, but the API went down in May

Thus, I decided to stick with QuantStart, which also covers building a research environment. Link here

Python tutorial on Classes 

MIT Python materials 

The Codecademy course on Python classes is the most straightforward.

C++ concepts and object-oriented programming terminology: constructor and destructor.

Quandl: a financial data library


Jul 27 2017

Zipline with CSV data… http://www.prokopyshen.com/create-custom-zipline-data-bundle

Basic Python: what does if __name__ == "__main__": do?

On Stack Overflow …

The code under that check runs only when the module is executed directly as a script, not when it is imported from another module.
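A small illustration of the idiom. The module name my_backtest and the function main are hypothetical placeholders.

```python
# my_backtest.py (hypothetical module name)

def main() -> str:
    """Entry point: in a real project this would load data and run the backtest."""
    return "running backtest"


# __name__ is "__main__" only when this file is run directly, e.g. `python my_backtest.py`.
# When another module does `import my_backtest`, __name__ is "my_backtest",
# so main() is NOT called and only the definitions above are loaded.
if __name__ == "__main__":
    print(main())
```

This is why backtest scripts usually keep their classes and functions at module level and put the actual run under the guard: the same file can then be imported by tests or other scripts without triggering a full backtest.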

Gap in my knowledge: modules

Vendor data: question here

Structure of a bar — Open-High-Low-Close-Volume (OHLCV) data points at a particular frequency
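One way to represent a single bar in Python is a small immutable record. The Bar name, its field names and the consistency check are illustrative assumptions, not any vendor's format.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Bar:
    """One OHLCV bar at a fixed frequency (e.g. daily)."""
    timestamp: datetime
    open: float
    high: float
    low: float
    close: float
    volume: int

    def __post_init__(self):
        # Basic consistency check: high and low must bracket open and close.
        if not (self.low <= min(self.open, self.close) <= max(self.open, self.close) <= self.high):
            raise ValueError(f"inconsistent OHLC values in bar at {self.timestamp}")
        if self.volume < 0:
            raise ValueError("volume cannot be negative")
```

Making the record frozen means a bar cannot be modified after it is loaded, which guards against accidentally rewriting historical data inside a backtest.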

Data vendor: Quandl

  1. Authenticate the Python session with your API key, using the following lines:
       import quandl
       quandl.ApiConfig.api_key = 'YOURAPIKEY'
  2. Quandl codes when retrieving data … To download a dataset, you will need to know its “Quandl code”. Every Quandl code has 2 parts: the database code (e.g. “WIKI”), which specifies where the data comes from, and the dataset code (e.g. “FB”), which identifies the specific time series you want; together they form a code such as “WIKI/FB”.

Jul 28

Gave up on the idea of building a whole backtesting system… Starting with something simple instead. Aims of this version of the code:

  • Use strategy and position classes
  • Add a basic performance tear sheet

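Abstract base class – a class that other classes inherit from: it defines a common interface (methods every subclass must implement), so the same structure can be reused across many classes.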
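A tiny demonstration of what an abstract base class enforces, using the standard abc module: the base class cannot be instantiated, and a subclass becomes instantiable only once it implements every abstract method. The class names and the can_instantiate helper are hypothetical.

```python
from abc import ABC, abstractmethod


class Strategy(ABC):
    """Abstract base class: defines the interface, not the behaviour."""

    @abstractmethod
    def generate_signals(self, closes):
        """Return one signal in {1, 0, -1} per bar."""


class AlwaysLong(Strategy):
    """Concrete subclass: implements the required method, so it can be instantiated."""

    def generate_signals(self, closes):
        return [1] * len(closes)


class Incomplete(Strategy):
    """Forgets to implement generate_signals, so Python refuses to instantiate it."""


def can_instantiate(cls) -> bool:
    """Return True if calling cls() succeeds, False if abc blocks it with TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

Here can_instantiate(Strategy) and can_instantiate(Incomplete) are False, while can_instantiate(AlwaysLong) is True: the error surfaces when the object is created rather than later in the middle of a backtest.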

Conversation with a quant: a strategy is about the timing of entry and exit, applied to single names.

With multiple names and weighting, it becomes portfolio management. Different models.

He does not believe in NLP, as many news articles are not written by humans… unless you have a clear idea of what you are putting into that system.

I made a stupid mistake: borrowing code that was written three years ago. Since then the syntax and library versions have changed, so there are many bugs and the code no longer works… shit.

  • class—Tell Python to make a new kind of thing.
  • object—Two meanings: the most basic kind of thing, and any instance of some thing.
  • instance – What you get when you tell Python to create an object from a class.
  • def—How you define a function inside a class.
  • self—Inside the functions in a class, self is a variable for the instance/object being accessed.
  • inheritance—The concept that one class can inherit traits from another class, much like you and your parents.
  • composition—The concept that a class can be composed of other classes as parts, much like how a car has wheels.
  • attribute—A property classes have that are from composition and are usually variables.
  • is-a—A phrase to say that something inherits from another, as in a “salmon” is-a “fish.”
  • has-a—A phrase to say that something is composed of other things or has a trait, as in “a salmon has-a mouth.”

Speed of development – One shouldn’t have to spend months and months implementing a backtest engine. Prototyping should only take a few weeks. Make sure that your software is not hindering your progress to any great extent, just to grab a few extra percentage points of execution speed. C++ is the “elephant in the room” here!
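The is-a and has-a entries in the glossary can be shown in a few lines, reusing its salmon example: Salmon inherits from Fish (is-a) and holds a Mouth object as an attribute (has-a). The classes are purely illustrative.

```python
class Mouth:
    def eat(self, food: str) -> str:
        return f"eating {food}"


class Fish:
    """Base class."""

    def swim(self) -> str:
        return "swimming"


class Salmon(Fish):  # inheritance: a Salmon is-a Fish
    def __init__(self, name: str):
        self.name = name        # attribute
        self.mouth = Mouth()    # composition: a Salmon has-a Mouth

    def feed(self, food: str) -> str:
        return self.mouth.eat(food)  # delegate to the composed object


if __name__ == "__main__":
    s = Salmon("Sam")            # s is an instance (object) of the class Salmon
    print(isinstance(s, Fish))   # True: is-a
    print(s.swim())              # inherited method: "swimming"
    print(s.feed("shrimp"))      # via the has-a relationship: "eating shrimp"
```

In backtesting terms, a concrete strategy is-a Strategy (inheritance), while a backtest engine typically has-a data handler, a strategy and a portfolio (composition).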

A post on sentiment trading… Now I am in a better position to understand it.

A key challenge in developing such a system is integrating the events representing sentiment, as stored in a CSV file of “datetime-ticker-sentiment” rows, into an event-driven trading system that is (usually) designed to trade directly off pricing data.
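One common way to tackle this is to turn both sources into timestamped events and merge them into a single chronologically ordered stream, which the event loop then consumes. A minimal sketch, assuming the CSV has exactly the datetime, ticker and sentiment columns mentioned above with ISO-formatted dates. The sample rows and prices are made up for illustration, and the tie-breaking rule (price bars before sentiment on the same timestamp) is an assumption you may want to change.

```python
import csv
import heapq
import io
from datetime import datetime

# Illustrative inputs only (not real data).
SENTIMENT_CSV = """datetime,ticker,sentiment
2017-07-26,XYZ,0.8
2017-07-27,XYZ,-0.3
"""
PRICE_BARS = [
    ("2017-07-26", "XYZ", 100.0),
    ("2017-07-27", "XYZ", 101.5),
]


def sentiment_events(text):
    """Yield (timestamp, priority, kind, ticker, value) tuples from the CSV rows."""
    for row in csv.DictReader(io.StringIO(text)):
        ts = datetime.fromisoformat(row["datetime"])
        yield (ts, 1, "sentiment", row["ticker"], float(row["sentiment"]))


def price_events(bars):
    """Yield the same tuple shape for price bars; priority 0 sorts them first on ties."""
    for day, ticker, close in bars:
        yield (datetime.fromisoformat(day), 0, "price", ticker, close)


def merged_stream(text, bars):
    # heapq.merge assumes each input is already sorted by timestamp.
    return list(heapq.merge(price_events(bars), sentiment_events(text)))


if __name__ == "__main__":
    for event in merged_stream(SENTIMENT_CSV, PRICE_BARS):
        print(event)
```

Because heapq.merge streams lazily, the same approach works for files too large to load at once, as long as each source is already in time order.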

Note: for the cut-loss (stop-loss) rule I implemented, this is exactly the challenge of how to combine signals with trade actions. I had to use a for loop…, since this cannot be vectorized: each position depends on the previous positions.

The most important reason this cannot be vectorized is that the position depends on past holdings and share counts. In QuantStart’s example this is not the case, because it always buys a fixed number of shares! So it can generate the position series without a for loop…
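The contrast can be made concrete. With a fixed share size, the position is simply the signal times that size, an element-wise operation. With a stop-loss, whether you hold today depends on the entry price of a trade opened earlier, so state has to be carried forward bar by bar. A minimal sketch, assuming long-only signals in {0, 1}, a hypothetical 5% stop measured from the entry close, and that after a stop the strategy stays flat until the signal drops to 0 and returns to 1; these rules are illustrative, not the author’s actual cut-loss logic.

```python
def fixed_share_positions(signals, shares=100):
    """Vectorizable: each position depends only on the same bar's signal."""
    return [s * shares for s in signals]


def stop_loss_positions(closes, signals, stop=0.05):
    """Path-dependent: a loop is needed because each bar depends on the entry price and prior state."""
    positions = []
    in_trade = False
    stopped_out = False
    entry_price = None
    for price, signal in zip(closes, signals):
        if signal == 0:
            in_trade, stopped_out, entry_price = False, False, None  # flat; re-arm after a stop
        elif not in_trade and not stopped_out:
            in_trade, entry_price = True, price  # new entry at this bar's close
        elif in_trade and price <= entry_price * (1 - stop):
            in_trade, stopped_out = False, True  # cut the loss, wait for a fresh signal
        positions.append(1 if in_trade else 0)
    return positions
```

For closes [100, 101, 94, 96, 97, 98] with signals [1, 1, 1, 1, 0, 1], the fixed-size version returns [100, 100, 100, 100, 0, 100], while the stop-loss version returns [1, 1, 0, 0, 0, 1]: the stop at 94 keeps the position flat on the next bar even though the signal is still 1, which no element-wise operation on the signal alone can reproduce.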

A collection of libraries 


Worth looking into:

  • backtrader
  • bt
  • zipline
  • QSTrader

What to look at?

  • the design of each system
  • how they integrate with a performance tear sheet

For backtrader:

Use the get_pf_items method of the PyFolio analyzer to retrieve the 4 components later needed by pyfolio:

returns, positions, transactions, gross_lev = pyfoliozer.get_pf_items()