Confidential and Proprietary - © Cerebellum Capital, 2009-2011

Predictive Analytics in the Land of the Vampire Squid
Dr. David Andre
CEO, Cerebellum Capital
dandre at cerebellumcapital.com
Predictive Analytics World, 2011
San Francisco

What I did after getting my PhD:

What I moved to in early 2008

Overview
• Perils and opportunities of Wall Street
• Seven ways to get it very wrong
• Our approach
• Takeaways

AI has come a long way: Watson
Computers tend to win in the end when the contests are heavily influenced by speed, memory, and probability calculations

So…Is The Finance Domain Worth the Effort?
• Asset Management is a $20 Trillion - $50 Trillion Domain, depending on how you count.
• Machines are taking over trading at a fantastic rate.
• Many excellent algorithms exist for high-frequency trading, many using predictive analytics in a relatively simple way.
• Yet there is a widespread belief that computers can’t design meaningful new strategies – this belief is oddly ubiquitous in the field of quantitative finance.
• Furthermore, Finance is nearly a perfect test-bed for AI.

Finance seems deep & rich for AI
• Huge datasets in both time & companies
• Huge amounts of data about companies are flooding the web
• Easily measured metrics for success
• Huge rewards if you get it right
• Nearly every aspect of AI can be useful:
  – Machine Learning (prediction & estimation)
  – Planning (trade planning)
  – Optimization (Portfolio optimization)
  – Knowledge Engineering
  – Text & Speech understanding

Wait – aren’t the markets efficient?
Maybe, unless:
• You have better info or can use the info to make better predictions
• You can reach the right answer faster or you can trade faster
• Markets or investors allow a bookie to be a middle man, making both sides happy and taking a cut
• You’re first to the inefficiency!

Four ways being smart helps funds win
• Deducing (sometimes using data available only to a few) who’s buying/selling what/when allows high-frequency front-running.
• Predicting who will buy/sell based on recent price moves is technical trading.
• Predicting relationships between assets that are semi-constant yields statistical arbitrage and pairs trades.
• Estimating a company’s true value better yields longer-term profits.

Seven traps of using predictive analytics on Wall St.
1. Lemons & Butter
2. Know thyself (or how to overfit without really trying)
3. Broker is not a noun, it’s an adjective
4. The test set must be IID and drawn from the same distribution…
5. What, my reality isn’t the world’s?!?
6. The big kahuna
7. Time moves forward one moment at a time

Lemons and butter

Know thyself
• Most Predictive Analytics researchers know well to test out-of-sample
• However, you have to test the whole system out-of-sample (including the humans) to really get a fair test
• A canonical way to get this wrong is to run the whole process more than once (even if it includes cross-validation and out-of-sample testing).
• Another is to throw away ideas that don’t work.

Great way to fool yourself
[Flowchart: Add a new input stream → Train/learn in sample → Test out of sample → Does this out-of-sample result look good? (YES: keep most recently added input stream; NO: discard most recently added input stream) → Do the out-of-sample results always look good? (YES: done; NO: continue)]

Broker?
• Simple trading algorithms offered by most brokers are easily manipulated by the HFT community.
• If you aren’t the shark, you are getting eaten.
• The rules are set up to favor the bankers.
• Brokers all have different symbology.
• Trades do get messed up, and usually not in your favor.
• The big brokers won’t take you if you’re not big, and it’s hard to get big without the advantages they offer.

The same distribution…
• For the guarantees of theoretical ML to hold, the test set must be drawn from the same distribution as the training set.
• This is seldom less true than in finance.
• Sarbanes-Oxley?
• New scrutiny and regulation on shorting?
• Context matters, e.g. tightness of credit
• Other funds/traders are figuring things out, so alpha changes over time in a very complex way…

The same distribution?

The same distribution? Not so much

What? I can’t trade for free?
• Most academic or online sources present alpha without taking into account real costs
• Costs to borrow money and obtain leverage
• Minimum commission costs
• Slippage is real, especially in non-S&P 100 names
• Costs to borrow (and some assets can’t be borrowed at all) can be very high

What? I can’t trade for free?
[Figure panels: Without realistic trading constraints; With realistic trading constraints]

The Big Kahuna
• You may not be as clever as you think, and lots of other hedge funds might be trading on very similar names.
• If they have to get out of their positions (say, due to a liquidity crisis), your positions can get pounded.
• If you can manage to hang in there, it can be worth it – but you have to be willing to accept 20 or 30% down!

Time moves forward one moment at a time
• It is remarkably easy to time-travel.
• And, despite there being “so much data”, there is no way to get it faster – it comes only one minute a minute, one day a day.
• Time travel is letting any information from the future get into the simulated past.
• This can include even things like processor speed, so it’s very difficult to police.

Some examples of time-travel
• Survivor bias – can’t throw out Enron or Lehman.
• Using a modern computer? They didn’t have that in 1995.
• Using a cleaned dataset? When did it get cleaned?
• Stock universes? Are you selecting stocks based on inclusion in the S&P 500 today and looking at their past?
• Throwing out models that used to look great
• Trading like you could know the price now, now.

Our approach
• Focus on the long tail and automate discovery, not just tuning
• Search over strategy space
• Hybrid of humans and computers
• Time safety
• Bogosity detectors.
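
As a minimal illustration of the time-travel trap "trading like you could know the price now, now", the Python sketch below runs the same toy rule twice: once seeing only prices up to time t, and once with a one-step peek into the future. The function names (`backtest`, `momentum`), the `lookahead` parameter, and the price series are all hypothetical and chosen for illustration; this is not Cerebellum's framework.

```python
from typing import Callable, Sequence


def backtest(prices: Sequence[float],
             signal: Callable[[Sequence[float]], float],
             lookahead: int = 0) -> float:
    """Total P&L from holding signal(history) units over each step t -> t+1.

    With lookahead=0 the signal at time t sees prices[0..t] only.
    lookahead=1 is a deliberate bug: the signal also sees prices[t+1].
    """
    total = 0.0
    for t in range(1, len(prices) - 1):
        history = prices[: t + 1 + lookahead]  # what the rule is allowed to see
        position = signal(history)             # +1 long, -1 short
        total += position * (prices[t + 1] - prices[t])
    return total


def momentum(history: Sequence[float]) -> float:
    """Toy rule (an assumption for this sketch): follow the last observed move."""
    return 1.0 if history[-1] > history[-2] else -1.0


# Made-up prices, not market data.
prices = [100, 101, 100, 102, 101]
honest = backtest(prices, momentum)              # -4.0
leaky = backtest(prices, momentum, lookahead=1)  # 4.0
print(honest, leaky)
```

On this toy series the rule loses money when it is time-safe and looks profitable only because of the one-step leak, which is the kind of result a backtest should be built to make impossible.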

A Decomposition of the Challenge
[Diagram: Raw Data Streams → Features (Feature 1 of Data Stream 1 … Feature K of Data Stream T … Feature Y of Data Stream Z; hundreds) → Predictors (Predictor 1 … Predictor K … Predictor N; hundreds) → Strategies (Strategy 1 … Strategy K … Strategy W; hundreds) → Allocate to the best strategies so as to maximize returns, minimize risk, and keep portfolio balanced within risk constraints → Trade]

Long Tail
[Figure labels: Cost-effective for most Quant funds; The other 99% of the market inefficiencies!; Getting the cost to launch 1 additional strategy/leg toward zero]

Long Tail
• Learn/discover the strategies automatically
• Humans find useful data sets
• Advanced program search & evaluation finds features and combines them

Search in program space…
• We focus on looking for diverse strategies, not just simple variants of a human-derived strategy.
• Essentially, we’re automating the process of science with respect to financial data.
• Key to this is good input, as garbage-in, garbage-out.

Process Is What Matters
[Figure labels: Weak Human Chess Players; World Class Human Chess Players; Weak Computer Chess Players; World Class Computer Chess Players]
(2005 “Freestyle Tournament” on Playchess.com: the winner was 2 weak humans + 3 weak laptops + Innovative Process)

Roll Forward Cross Validation
[Figure: along the Time axis, a repeated window of "Learn from past examples" → "Pick best predictors and strategies to use" → "Out of Sample Test", rolled forward step by step]

Time safety in the programming language
• Our strategy language and software infrastructure were built time-safety first.
• All strategies are coded in the same framework, with a “self-aware” representation so that the system can reason about the strategies.
• The code can’t “see ahead” in the data structures, so time safety is guaranteed.

Detecting Bogosity
[Figure: Candidate strategy vs. SPY]

Detecting Bogosity
12% of random strategies with the same structure beat it in returns, 29% in Sharpe!
[Figure: Candidate strategy]

Some Strategies that ran in February

Takeaways
1. Be rigorous about avoiding time-travel and fooling yourself
2. Learn/discover the right features
3. Use randomization to find the right complexity of model
4. The hybrid person/machine solution works best
5. Look where everyone else isn’t looking

Questions?
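
The "Detecting Bogosity" comparison against random strategies can be sketched roughly as follows. The slides do not say how the random strategies "with the same structure" are generated, so this sketch assumes one simple choice: shuffling the candidate's own positions in time, which keeps the same mix of long and short bets. All names (`total_return`, `sharpe`, `fraction_beating`) and the return series are hypothetical.

```python
import random
import statistics
from typing import Callable, Sequence

Metric = Callable[[Sequence[float], Sequence[float]], float]


def total_return(positions: Sequence[float], returns: Sequence[float]) -> float:
    """Sum of per-period P&L for the given positions."""
    return sum(p * r for p, r in zip(positions, returns))


def sharpe(positions: Sequence[float], returns: Sequence[float]) -> float:
    """Unannualized mean/stdev of per-period P&L (0.0 if P&L is constant)."""
    pnl = [p * r for p, r in zip(positions, returns)]
    sd = statistics.pstdev(pnl)
    return statistics.fmean(pnl) / sd if sd > 0 else 0.0


def fraction_beating(positions: Sequence[float], returns: Sequence[float],
                     metric: Metric, trials: int = 1000, seed: int = 0) -> float:
    """Fraction of shuffled-position strategies that strictly beat the candidate."""
    rng = random.Random(seed)          # seeded so the check is repeatable
    target = metric(positions, returns)
    shuffled = list(positions)
    wins = 0
    for _ in range(trials):
        rng.shuffle(shuffled)          # assumption: "same structure" = same positions, random timing
        if metric(shuffled, returns) > target:
            wins += 1
    return wins / trials


# Made-up per-period returns and candidate positions, not market data.
returns = [0.01, -0.02, 0.015, -0.005, 0.02, -0.01]
candidate = [1, 1, -1, -1, 1, -1]
print(fraction_beating(candidate, returns, total_return))
print(fraction_beating(candidate, returns, sharpe))
```

A large fraction on either metric, like the 12% and 29% quoted on the slide, suggests the candidate's performance may owe more to its structure than to its timing.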