Efficiency

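A recent work log. One project required writing a backtest. I started looking at the project on July 11 and had an outline by the 19th. I spent a week and a half working through Python programming and looking things up online. Then on the 31st, in a single day, I got it mostly done. I realized the whole thing could have been finished much faster.

Of course, it's not entirely my fault. The first time I wrote a backtest I did it all on my own, and the system ended up having problems. Bruce also told me to read other people's code, which is why I spent time studying it. But I underestimated how hard it is to write a system like this. I also had no guidance and didn't know what mattered most, and I lacked experience. Those are the objective reasons.

Still, why am I this slow at getting things done? This isn't the first time this work habit has shown up. I put in a lot of effort, but the output is small. Grad school was the same: I read a lot of books, but my thesis moved slowly.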

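  • Making simple problems complicated. A backtesting system can be written simply or elaborately, with transaction costs, system design, and so on. For this particular project, a very simple Excel sheet would have done the job. But I spent a long time reading about Python. I did learn some things technically, but with no guidance and only my own trial and error, it was not efficient. Bruce says some people dive in when they hit a problem, while others think it through first. I am probably the latter.
    • A similar case: I once helped my teacher compute a matrix related to Hong Kong divorce rates. Excel would clearly have been enough, but I insisted on using R. For a task that could have taken ten minutes, I spent three days studying divorce-rate matrices and how academics measure them, and in the end still computed it the simple way.
    • Another time I looked at the momentum factor. The requirement was simple, but I read a stack of papers and so on, which cost extra time.
    • Maybe I just like getting to the bottom of things?
  • Fear of difficulty. I don't procrastinate on every task. Editing slides, translating, or other simple tasks I actually do quickly. But for programming, thesis writing, and other work that takes real thinking, or work I lack confidence in, I seem to choose to “accumulate knowledge” first and only then tackle the task. For example, in March I needed to scrape data. Instead of learning the relevant Python packages as fast as possible, I first read a lot about HTML/JavaScript web design.
    • This may be a form of avoidance. I believe that doing the work badly would affect how I see myself, so I always have to finish preparing before I start.
    • It may be laziness, not wanting to do the work. Reading up on HTML is easier and faster.
    • It also has to do with exploring on my own. Nobody pointed out the fastest method or how to simplify the task. For the web scraper, I did run into technical problems at the time that needed more knowledge to solve.
  • Unclear goals and plans. The mindset for getting work done and the mindset for learning should be very different.
    • Work: pin down what the client needs and deliver on time and efficiently. It doesn't need to be complicated.
    • Learning: learn it solidly and pick up something new; that is enough. It doesn't have to produce output. It is practice.
    • In other words, the difference between a “consumer of knowledge” and a “producer of knowledge”.
  • Lack of experience. Underestimating the time for something I've never done is fairly normal. It is an exploratory process, after all.
    • My self-study skills need work. There is no end to learning, but I need to be clear about what I want to get out of it.
  • Not enough resolve. There was no clear deadline. With no external structure, the key reason is that I wasn't very motivated.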

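How do I fix this?

  • Make a firm commitment. Estimate realistically, and finish whatever I commit to. Long-standing bad work habits and a lack of validation have lowered my self-confidence, and with it my motivation. Starting with small tasks and rebuilding confidence is the first step.
  • Practice a get-things-done mindset, log it every day, and make it a habit.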

Update 2017 Aug 1

Bruce showed me the power of Excel, and I started to wonder why I invested the time to write a Python script in the first place. Is this a kind of tech-ism, a fondness for using complicated things?

(Deeper level: feel like an intellectual. Self-esteem is built on “doing difficult things”.)

https://www.quora.com/What-are-the-benefits-of-a-Pythons-pandas-over-Microsoft-Excel-for-data-analysis

https://www.quora.com/When-should-I-use-excel-instead-of-python-for-data-analysis-and-vice-versa

I don’t think it’s a choice of “Python & Pandas” or “Excel.” Rather, I view them as complementary. I wouldn’t use Pandas to browse data (but you could), and I wouldn’t use Excel as a tool to clean up data or automate tasks (but you could). I’d use the right tool at the right time for the job.

Pandas has a lot of power, but at a high level, the module is really good at two things:
1) Munging data sets: helping you clean up and put data together into a format that is Excel-friendly and easy to use and analyze.
2) Automating the cleanup of data sets (missing data, incongruent dates in series, etc.).

Excel is simply not good at these things. Even if you are a keyboard jockey, it can take hours and hours to clean up and get even the smallest data sets to the point where you can do things like pivot tables etc (think lots of selecting, cutting and pasting).

To give a real-world example, I use ad networks to monetize remnant inventory on my mobile apps. I use probably 10-15 ad networks (different apps, countries, etc.) and each ad network generates a csv file in a slightly different format. If I were to download each of these reports by hand each day and combine them in Excel, I would never have any time to actually analyze the results (not to mention the fact that this approach is fraught with the potential to create errors). As a result, I use Python and Pandas to take all my files, clean and combine them, and dump them into an Excel workbook. THEN, I use Excel to browse, think about, and make decisions about the data.

On the other hand, let’s say I want to do a quick ad hoc analysis and I have a fairly neat, clean and reasonably sized (100s or 1000s of lines) data set (e.g. stock data). I’m probably not going to write a Python script to analyze it in the early stages. Rather, I’m just going to pull it into Excel, maybe put it into a pivot table, take a look at it, and noodle on it some. If I decide that this is a data set I want to do something special with, or I am going to be using this data over and over in the future, then I’ll invest the time to write a script.
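
To make the ad-network example concrete for myself, here is a minimal sketch of the "clean and combine, then browse in Excel" step. It is not the answer author's script: the quoted workflow uses Pandas, while this sketch swaps in Python's standard `csv` module so it runs without extra packages. The network names, column names, and sample rows are all made up for illustration.

```python
import csv
import io

# Hypothetical per-network column names mapped onto one common schema.
COLUMN_MAPS = {
    "network_a": {"Date": "date", "Country": "country", "Revenue": "revenue"},
    "network_b": {"day": "date", "geo": "country", "earnings_usd": "revenue"},
}

# Made-up sample reports standing in for the downloaded csv files.
REPORTS = {
    "network_a": "Date,Country,Revenue\n2017-08-01,US,12.50\n2017-08-01,HK,3.20\n",
    "network_b": "day,geo,earnings_usd\n2017-08-01,US,7.25\n",
}


def normalize(network, text):
    """Yield rows from one report, renamed to the common schema."""
    mapping = COLUMN_MAPS[network]
    for row in csv.DictReader(io.StringIO(text)):
        clean = {new: row[old].strip() for old, new in mapping.items()}
        clean["revenue"] = float(clean["revenue"])  # text -> number
        clean["network"] = network  # remember where the row came from
        yield clean


def combine(reports):
    """Merge all reports into one list of uniformly shaped rows."""
    combined = []
    for network, text in reports.items():
        combined.extend(normalize(network, text))
    return combined


def to_csv(rows):
    """Serialize the combined rows as csv text that Excel can open."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=["date", "country", "network", "revenue"]
    )
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


rows = combine(REPORTS)
print(to_csv(rows))
```

The point of the design is the one the answer makes: the script only does the repetitive reshaping, and the actual looking and thinking still happens in a spreadsheet.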

Tips from Learn Python the Hard Way

What I discovered after this journey of learning is that it’s not the languages that matter but what you do with them. Actually, I always knew that, but I’d get distracted by the languages and forget it periodically. Now I never forget it, and neither should you.

Which programming language you learn and use doesn’t matter. Do not get sucked into the religion surrounding programming languages as that will only blind you to their true purpose of being your tool for doing interesting things.

People who can code in the world of technology companies are a dime a dozen and get no respect. People who can code in biology, medicine, government, sociology, physics, history, and mathematics are respected and can do amazing things to advance those disciplines.

We are defined by our memories 

If you’ve had an incredible morning, you’ll likely continue succeeding the rest of the day. Conversely, if you hit the snooze button a dozen times, and wastefully drag through your morning, you’ll likely justify mediocrity the rest of the day.

If we do this long enough, our whole life—our past—will not be what we intended it to be. As J.M. Barrie, author of Peter Pan, has said, “The life of every man is a diary in which he means to write one story, and writes another; and his humblest hour is when he compares the volume as it is with what he vowed to make it.”

Mindfulness increases self-control; since you’re not getting thrown by threats to your self-esteem, you’re better able to regulate your behavior. That’s the other irony: Inhabiting your own mind more fully has a powerful effect on your interactions with others.


One thought on “Efficiency”

  1. YX

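    Because of heavy encapsulation, the cost gap between using something and learning it is enormous. That is the nature of programming. Other disciplines may share this trait, but the widespread use of libraries and interfaces amplifies it in programming.
    I get the same feeling whenever I start a project. I begin by reading papers on the evolution of the algorithm, the mathematical derivations, and side-by-side comparisons of the impact of various minibatch choices, and as I read I often end up thinking “why is this so hard to understand” and feel I can't go on. When I can't stand it anymore, I check whether an existing library offers a similar algorithm, and often find that a single built-in function call solves it: “how can this be so simple?” But on closer inspection there is always a gap between the theoretical algorithm and the practical application, so I wonder whether I can tweak it myself and make do. Then I open the source code and think “changing things is easy, but whatever I add myself always produces errors at runtime”... By the end of it I'm exhausted.
    I think you still need to figure out whether you are an “engineer” or a “scientist”. The former has to think about how to change the requirements to fit the tools being used; only the latter needs to understand the tools or even modify them. The learning curves are vastly different.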
