Monthly Archives: July 2017

Efficiency

近期的工作日志。一个项目,要写 backtesting 回测。我 7 月 11 号开始看这个项目,到 19 号写出大纲。花了一个半星期琢磨 Python 编程,网上查阅资料。31号白天,一天就弄得差不多了。发现这件事情本可以更快完成。

当然,不完全怪我。我第一次写回测就是自己写,最后系统出了问题。Bruce 亦告诉过我看别人的 coding,因此花时间琢磨,是出于这个原因。但是,低估了写这么一个系统的难度。同时缺乏指点,不知道什么是重点。加上缺乏经验。这是客观原因。

但还是,做事情这么慢?这个工作习惯不是第一次出现。费了很大气力,但是产出并没有很多。读研究生也是,其实看了很多书,但是论文进展并不快。

  • 把简单问题复杂化了。回测系统可以写得简单,也可以写得复杂,考虑 transaction cost 以及如何设计系统,等等。对这个具体项目而言,其实很简单的 Excel 就可以搞定。但是我看了很久 Python 的内容。技术上确实学到一些,但是因为没有指导,自己摸索,效率并不高。Bruce 说,有人碰到问题会上手做,有的人碰到问题会先想清楚。我可能是后者。
    • 类似案例:有次帮老师算香港离婚率相关的矩阵,明明 excel 就可以做,我非要用 R。十分钟就可以做好的任务,我花了三天学习离婚率矩阵、学术界怎么测量,最后还是用简单方法算出来。
    • 有次看  momentum factor,本来要求很简单,但是我读了一些文献等等,因此多花费了时间。
    • 可能我就是喜欢把事情搞清楚?
  • 畏难情绪。并不是所有任务我都会拖延。比如改 PPT ,或者翻译,或者简单的任务,其实我做得很快。但是对于编程、论文等等需要动脑筋的工作,或者我信心不足的工作,似乎都会先选择“积累知识”,再去着手解决任务。例如三月份时需要扒数据,我选择的不是尽快学会 Python 包裹,而是先看了很多 html/javascript 网站设计的内容。
    • 这可能是种逃避。认为工作做得不好,会影响对自己的看法。因此永远要先做好准备工作才动手。
    • 可能是因为懒惰,不想做事情。而看 html 知识比较简单比较快。
    • 当然也和自己探索有关系。没有人指点用什么方法最快,或者任务如何简单处理。针对网站爬虫,当时确实也碰到技术难题,需要更多知识才能解决。
  • 目标和计划不明确。做事和学习的心态应该是非常不同的。
    • 做事:明确客户需求,准时高效 deliver。不需要搞得很复杂
    • 学习:学得扎实,学到新东西就可以了。不一定要有产出,而是练习的过程。
    • 也就是 “知识的消耗者” 和 “知识的生产者” 的区别。
  • 经验不足。没有做过的事情,对时间预估不足也挺正常。毕竟是探索的过程。
    • 自学能力有待提高。学海无涯,但是要想明白自己要从学习中获得什么。
  • 决心不够坚定。没有明确的 deadline。因为没有 external structure,关键原因是没有很 motivated。

如何解决这个问题?

  • 要下定决心。合理估计,立下的事情一定要完成。因为长时间来负面工作习惯,缺乏 validation,导致自信心降低,因此缺乏 motivation。从小任务开始,建立自信是第一步。
  • 练习 get things done 的 mindset 并每天记录,养成习惯

Update 2017 Aug 1

Bruce showed me the power of Excel and I started to wonder why I invested the time to write Python script in the first place. Is this somewhat tech-ism? Like to use complicate things. 

(Deeper level: feel like an intellectual. Self-esteem is built on “doing difficult things”.)

https://www.quora.com/What-are-the-benefits-of-a-Pythons-pandas-over-Microsoft-Excel-for-data-analysis

https://www.quora.com/When-should-I-use-excel-instead-of-python-for-data-analysis-and-vice-versa

I don’t think its a choice of “Python & Panda” or “Excel.” Rather, I view them as complimentary. I wouldnt use Panda to browse data (but you could), and I wouldn’t use Excel as a tool to clean up data or automate tasks (but you could). I’d use the right tool at the right time for the job.

Panda has a lot of power, but at a high level, the module is really good at two things:
1) Munging Data Sets: helping you clean up and put data together into a format that is easy to use, excel friendly, and analyze.
2) Automating the clean up of data sets (missing data, incongruent dates in series,etc).

Excel is simply not good at these things. Even if you are a keyboard jockey, it can take hours and hours to clean up and get even the smallest data sets to the point where you can do things like pivot tables etc (think lots of selecting, cutting and pasting).

To give a real world example, I use ad networks to monetize remnant inventory on my mobile apps. I use probably 10-15 ad networks (different apps, countries etc) and each ad network generates a csv file in a slightly different format. If I were to download each of these reports by hand each day and combine them into Excel, I would never have any time to actually analyze the results (not to mention the fact that this approach is fraught with the potential to create errors).  As result, I use Python and Pandas to take all my files, clean and combine them, and dump them into an Excel workbook.  THEN, I use Excel to browse, think about, and make decisions about the data. 

On the other hand, lets say I want to do a quick ad hoc analysis and I have a fairly neat, clean and reasonably sized (100s or 1000s of lines) data set (e.g. stock data), I’m probably not going to write a python script to analyze it in the early stages. Rather, Im just going to pull it into Excel, maybe put it into a pivot table and take a look at it and noodle on it some. If I decide that this is a data set I want to do something special with or I am going to be using this data over and over in the future, then I’ll invest the time to write a script.

Tips from Learn Python the Hard Way

What I discovered after this journey of learning is that it’s not the languages that matter but what you do with them. Actually, I always knew that, but I’d get distracted by the languages and forget it periodically. Now I never forget it, and neither should you.

Which programming language you learn and use doesn’t matter. Do not get sucked into the religion surrounding programming languages as that will only blind you to their true purpose of being your tool for doing interesting things.

People who can code in the world of technology companies are a dime a dozen and get no respect. People who can code in biology, medicine, government, sociology, physics, history, and mathematics are respected and can do amazing things to advance those disciplines.

We are defined by our memories 

If you’ve had an incredible morning, you’ll likely continue succeeding the rest of the day. Conversely, if you hit the snooze button a dozen times, and wastefully drag through your morning, you’ll likely justify mediocrity the rest of the day.

If we do this long enough, our whole life—our past—will not be what we intended it to be. As J.M. Barrie, author of Peter Pan, has said, “The life of every man is a diary in which he means to write one story, and writes another; and his humblest hour is when he compares the volume as it is with what he vowed to make it.”

Mindfulness increases self-control; since you’re not getting thrown by threats to your self-esteem, you’re better able to regulate your behavior. That’s the other irony: Inhabiting your own mind more fully has a powerful effect on your interactions with others.

On backtesting – Keep updating

Intros : vectorized backtester and event-driven backtester

vectorized backtesting 

Excel R or Python, as only need dataframes. Data structure is simple, yet still, how to design project flow?

For a simple strategy, five steps

  1. Acquire the data — csv or data frame
  2. Create the indicators
  3. Set up the trading logic and generate signals (note here, kind of problematic. how to record variables?)
  4. Calculate the returns
  5. Set up a report of portfolio metrics

A series of tutorials :   including vectorized backtesting, and event-driven backtesing.

 

Components of backtesting system – tutorials on Quantstart

When developing a backtesting system it is tempting to want to constantly “rewrite it from scratch” as more factors are found to be crucial in assessing performance.

There are generally two types of backtesting system that will be of interest. The first is research-based, used primarily in the early stages, where many strategies will be tested in order to select those for more serious assessment. These research backtesting systems are often written in Python, R or MatLab as speed of development is more important than speed of execution in this phase.

The second type of backtesting system is event-based. That is, it carries out the backtesting process in an execution loop similar (if not identical) to the trading execution system itself. It will realistically model market data and the order execution process in order to provide a more rigourous assessment of a strategy.

The latter systems are often written in a high-performance language such as C++ or Java, where speed of execution is essential. For lower frequency strategies (although still intraday), Python is more than sufficient to be used in this context.

Quantstart use an object-oriented research backtester in Python 

Three components :

  • strategy – receive a Pandas DataFrame of bars (open-high-low-close-volume) data points. Then produce signals – a timestamp and an element from the set {1, 0, -1} — long, hold or short signal
  • portfolio – receive signal and create positions. produce an equity curve
  • performance – takes portfolio and produce a set of statistics. Risk, return, Sharpe, Information Ratios…

Abstract base classes in python …

libraries

QSTrader and Zipline

Event driven strategies components 

  • event
  • event quene – stores all the events
  • datahandler –

 

Jul 26 2017

Researched on Quantopian, Zipline and Pyfolio – the other two difficult to use because

  • Quantopian limit its own data
  • Zipline and Pyfolio rely on Yahoo! finance data but the API is down in May

Thus, decided to stick with Quantstart. Also for the purpose build research environment. Link here 

Python tutorial on Classes 

MIT Python materials 

CodeAcademy course on Python class is the most straightforward. 

C++ concepts and Object-oriented programming terminologies — constructor and destructor.

Quadl — a finance data library

 

Jul 27 2017

Zipline and csv… http://www.prokopyshen.com/create-custom-zipline-data-bundle

Basic python === _name_ == “__main__” what does this do?

On stackoverflow …

Run the module only when it is used by itself, not when imported from another module.

Lack of knowledge: Modules 

Vendor Data : question here 

Structure of a bar — Open-High-Low-Close-Volume (OHLCV) data points at a particular frequency

Data vendor = quandl 

  1. need to authentiate Python session with API keys, with the following line –
    1. import quandl
      quandl.ApiConfig.api_key = ‘YOURAPIKEY’
  2. Quandl codes when retreving data …To download a dataset, you will need to know its “Quandl code”.  In the above example, you downloaded a dataset with the Quandl code “WIKI/FB”.Every Quandl code has 2 parts: the database code (“WIKI”) which specifies where the data comes from, and the dataset code (“FB”) which identifies the specific time series you want.

Jul 28

Give up on the idea of building a whole backtesting system… Start from something simple. Aim of this version of codes:

  • Use strategy and position class
  • Add some performance tearsheet

Abstract base class – class for classes. Duplicate usage of classes.

Conversation with a quant – a strategy is about timing of enter and exit. Put on single names.

If multiple names and weighting, about portfolio management. Different morals.

He does not believe in NLP as many newspapers are not written by human… unless you have a clear thought what you put into that system.

I made a stupid mistake: borrowing codes that were written three years ago. Now that grammar and version changes there are many bugs and codes no longer work… shit.

  • class—Tell Python to make a new kind of thing.
  • object—Two meanings: the most basic kind of thing, and any instance of some thing.
  • instance—What you get when you tell Python to create a class.
  • def—How you define a function inside a class.
  • self—Inside the functions in a class, self is a variable for the instance/object being accessed.
  • inheritance—The concept that one class can inherit traits from another class, much like you and your parents.
  • composition—The concept that a class can be composed of other classes as parts, much like how a car has wheels.
  • attribute—A property classes have that are from composition and are usually variables.
  • is-a—A phrase to say that something inherits from another, as in a “salmon” is-a “fish.”
  • has-a—A phrase to say that something is composed of other things or has a trait, as in “a salmon has-a mouth.”
  • Speed of Development – One shouldn’t have to spend months and months implementing a backtest engine. Prototyping should only take a few weeks. Make sure that your software is not hindering your progress to any great extent, just to grab a few extra percentage points of execution speed. C++ is the “elephant in the room” here!

A post on sentiment trading… Now at a better position to understand it.

A key challenge in developing such a system is integrating the events representing sentiment, as stored in a CSV file of “datetime-ticker-sentiment” rows, into an event-driven trading system that is (usually) designed to trade directly off pricing data.

Note for the cut-loss I did, this is indeed a challenge in how to incorporate signals with trade actions? I had to use for loop… As this is impossible to be done vectorizely, because each position depends on previous positions.

The most important factor why this cannot be done vectorized is: the position is dependent on past holdings and shares. In QuantStart, this is not the case because it always buys a fixed amount of shares !!!! So it can generate a position series without for loop…

A collection of libraries 

https://www.quantstart.com/articles/backtesting-systematic-trading-strategies-in-python-considerations-and-open-source-frameworks

Worth looking into:

  • backtrader
  • bt
  • zipline
  • QStrader

What to look?

  • designs of system
  • also how they incorporate into tearsheet.

For backtrader:

Use the analyzer method get_pf_items to retrieve the 4 components later needed for pyfolio:

returns, positions, transactions, gross_lev = pyfoliozer.get_pf_items()

发誓要多多旅行

目前我没有旅行的习惯。可能是因为懒,或者穷,不喜欢旅行也是我身上阶级性的东西。父母很辛苦,我很幸运。

大学里认识的同学,交往比较多的两位,都很喜欢旅行。旅行似乎是很酷的一件事情。去欧洲游玩的经历,想起来也很值得。所以,要克服的懒的习惯,多多旅行。

其实也就是三四天的事情。并不是时间问题。因为太忙而没有做的事情,闲下来也不会做。生活不是一种习惯,是一种态度。

公开发布以表决心。2017 年的计划,去趟台湾。美国也要多走走。

多交爱好旅行的朋友。

也想学唱歌,乐器,网球,拳击,和打枪。生活要精彩起来。

好了我去写在 to-do list 上了。

Market Timing

Related concepts :

  • Buy and Hold = periodic re-balance
  • day trading = short term
    • Think of hedge fund… One should never sell a security unless you need the money. Think about what Maggie said last time about hedge fund. These are smart guys, but they have to cut loss, because of the nature of capital.
  • Investment strategy 
    • contrarian investment = select good companies in time of down market
  • smaller companies

Mean reversion strategy : the tendancy for a stock to return to its mean. But the key is how to determine timing?

 

VBA tips

This post introduces some tips in writing efficient VBA programs, as well as code for certain common tasks.

Tasks:

  1. Find last row in a column. souce : https://www.rondebruin.nl/win/s9/win005.htm

Sub LastRowInOneColumn()
'Find the last used row in a Column: column A in this example
Dim LastRow As Long
With ActiveSheet
LastRow = .Cells(.Rows.Count, "A").End(xlUp).Row
End With
MsgBox LastRow
End Sub

2. Change date formats in Excel spreadsheet:
=IF(ISTEXT(A3),IF(LEN(A3)=9,DATE(RIGHT(A3,4),LEFT(A3,1),RIGHT(LEFT(A3,4),2)), IF(LEN(A3)=10,DATE(RIGHT(A3,4),LEFT(A3,2),RIGHT(LEFT(A3,5),2)),"Error")),DATE(YEAR(A3),DAY(A3),MONTH(A3)))

Other tips with using dates:

  • Bloomberg downloads data into Excel — date problem. If < 12 then interpret as days
  • Use python to clean data, use pd.read_csv(path, dayfirst = True)
  • Then in Python the dates sequences are right. In excel, also right.
  • But when output to csv, still wrong! It takes any number < 12 to be month….

3. Find last used row and column and clear all formats

  • selection.ClearContents
  • selection.ClearFormats

4. Copy contents from one sheet to another.

Sub Simple_Copy_Paste() Sheets("Sheet1").Range("B2:B8").Copy Sheets("Sheet2").Range("B2") End Sub

5. Count the number of non-empty cells !

MsgBox LoopIndex & " " & CellCount

6. Work with Ranges :

  • copying a range Range(“A1:A5”).Copy Range(“B1”)
  • copying a variable size range
  • selecting to the end of a row or column Range(ActiveCell, ActiveCell.End(xlDown)).Select
  • selecting a row or column ActiveCell.EntireColumn.Select

7. Change number formatting…

paste_ws.Range("J3:J" & LastRow).NumberFormat = "0.000000"

8. How to format cells ?

9. Replace blank values with zeros..
paste_ws.Range("K3:K" & row_length).Replace "", 0, xlWhole

Tips 

  1. What’s the difference between a VBA sub, vs a VBA function?
    1. function return values, but subs do not
    2. functions can only be called in other procedures, but subs can be executed directly

Reference here: http://analystcave.com/vba-function-vs-vba-sub-procedures/

2. Avoid avoid select method (because it is slow and error-prone) using VBA

A stackoverflow answer: https://stackoverflow.com/questions/10714251/how-to-avoid-using-select-in-excel-vba-macros#

3. Best practices for debugging

  • Examining the code
  • Inserting MsgBox functions at various locations in your code
  • Inserting Debug.Print statements
  • Using the Excel built-in debugging tools

4. The scope of VBA variable …

  • local — within sub
  • module — at the beginning of module
  • public scope — every module

5. Be careful when output to csv using VBA! Always alert. Closing this by force?

Note When using the SaveAs method for workbooks to overwrite an existing file, the Confirm Save As dialog box has a default of No, while the Yes response is selected by Excel when the DisplayAlerts property is set to False . The Yes response overwrites the existing file.When using the SaveAs method for workbooks to save a workbook that contains a Visual Basic for Applications (VBA) project in the Excel 5.0/95 file format, the Microsoft Excel dialog box has a default of Yes, while the Cancel response is selected by Excel when the DisplayAlerts property is set to False . You cannot save a workbook that contains a VBA project using the Excel 5.0/95 file format.

6. Handling errors in VBA. VBA Error handling ..

7. Excel relative, absolute and mixed references  – very helpful to know.

https://support.office.com/en-us/article/Switch-between-relative-absolute-and-mixed-references-dfec08cd-ae65-4f56-839e-5f0d8d0baca9

Examole: $A$1 vs A1

8. Identification a variable before calling: https://docs.microsoft.com/en-us/dotnet/visual-basic/language-reference/error-messages/object-variable-or-with-block-variable-not-set

9. Do not name a variable as the same name as a method. e.g. I once defined a sub named “selection” which conflicts with the Selection method. The error was Error: Object variable or With block variable not set.

References 

Book: VBA for dummies. A quick guide for knowing the structure of the language. Suggested reading time = 2 hours.

Book : IBM Excel spreadsheet modeling best practices. An old book published in 1999 …