Artificial Intelligence and Automation of Trading Systems

Hindemburg Melão Jr. is an autodidact, holder of the world record for the longest announced checkmate in simultaneous blindfold chess (Guinness Book 1998, Brazilian edition, pages 110-111), and an honorary member of the Pars Society (for people with IQs above 180). He has worked on the development of automated trading systems since 2006 and is the author of "Saturn V".

Abstract: This article offers a brief analysis of the positive and negative aspects of automating trading systems for stocks, commodities, currencies, and other financial instruments. The different stages of the development of a strategy are compared with and without automation resources, and the results are evaluated at each step. In the optimization phase, we highlight the advantages of genetic algorithms, which make it possible to tackle problems involving 9.22 quintillion setups. In the validation phase, we present a small innovation: comparing two or more backtests under slightly different conditions, instead of the traditional comparison between a backtest and a real account, which brings some improvements.

Keywords: automated trading systems, investments, stocks, backtest, Forex

1. THE PIONEERS OF AUTOMATION

In 1960, Richard Donchian, B.A. in Economics from Yale University, suggested using the emerging technologies of the time to mechanize the testing of strategies, making the process faster, more precise, and more objective. In 1970, Edward Seykota, B.Sc. in Engineering from MIT, grew enthusiastic after reading one of Donchian's letters on the matter and implemented the first automated trading system on record. According to Jack Schwager (Market Wizards, 1998), Seykota's system would have turned US$5,000.00, invested in 1972, into US$15,000,000.00 sometime in 1988. However, the results Schwager states in his book are not backed by any documents or evidence. Judging from the technology available at the time, as well as from the later evolution of Seykota's track record, it is likely that his mathematical modeling contained inaccuracies, making his backtest results better than what could possibly have been obtained in real trading.

In 1982, James Simons, Ph.D. in Mathematics from the University of California and recipient of several international prizes, founded Renaissance Technologies and began using a system he developed to manage its funds. Over the last 32 years, Simons has achieved annual average returns higher than 60% (35% net returns after discounting the 44% performance fee and the 5% management fee). Simons currently has a personal net worth of 12.5 billion dollars, according to Forbes. He is regarded as the world's best investment manager of the last few decades, surpassing Soros and Buffett. Simons was probably the first to obtain results that were consistent, homogeneous, and stable in the long run through the use of automated trading systems. He remains the best at managing volumes greater than 10 billion dollars, although there is now at least one system superior to his when managing smaller volumes.

2. TECHNICAL ANALYSIS AND FUNDAMENTAL ANALYSIS

It is common to split investment methodologies into two large groups: Quantitative Analysis (or Technical Analysis) and Fundamental Analysis. Personally, I consider it more appropriate to partition them into three groups: Scientific Analysis, Philosophical Analysis, and Esoteric Analysis.
1. The group of scientific analysts comprises James Simons, Ed Seykota, Robert Pardo, Edward Thorp, and others who employ rigorous protocols, in agreement with contemporary Scientific Methodology and Formal Logic. A scientific analyst is distinguished by the use of statistical tools to investigate quantitative and morphological properties in historical price data, with the goal of recognizing fragments of recurring patterns and using them to calculate the probabilities of prices moving in a given direction.

2. The group of philosophical analysts comprises George Soros, Warren Buffett, Jim Rogers, Paul Tudor Jones, and others who, despite making their decisions based on subjective evaluations, also make good use of Logic and Scientific Methodology, and who reach practical results that confirm the efficiency of their methods. A philosophical analyst is distinguished by the ability to identify relevant factors in macroeconomics, politics, climate, culture, and other domains that determine price movements. He applies his knowledge, as well as his intuition, to weigh the relative importance of these factors, seeking to understand the structure of the interrelations between them and to interpret the effects they have on the prices of financial markets.

3. The group of esoteric analysts is very wide and heterogeneous, covering a large variety of profiles. These are the people who lose money systematically; their participation in the market introduces noise into price movements, because the "methods" they use are inefficient, equivalent to trading randomly. This group comprises the great majority of market agents, including those who sell books and courses on Technical Analysis, their students, and their followers. Their decisions are not rooted in Logic or Science, but rather in superstitions and dogmas. The methods they employ lack a scientific basis and would never pass any meticulous experiment designed to validate them.

According to the usual classification, the traders in the first and third groups would be lumped together as "technical analysts". However, there are strong reasons to split them into distinct groups: not only do they employ essentially different methods, they also produce opposite effects on the market. The former group helps organize market prices into less chaotic structures, i.e., it decreases their entropy, while the latter group increases it. Furthermore, in terms of etymology and semantics, the terms "scientific analyst", "philosophical analyst", and "esoteric analyst" are more pertinent and describe more accurately the characteristics observed in the typical profile of a trader belonging to each of these groups.

Having made this distinction, we can now evaluate the impact that recent technological advances have had on the different types of investors.

For philosophical analysts, automation resources have not yet reached a level that allows them to benefit from the new technologies as much as scientific analysts have. Even with the use of Fuzzy Logic and Neural Networks, it is very difficult to "teach" a machine the criteria recommended by Benjamin Graham for deciding whether to buy or sell. The selection of the information that should feed the system is also very subjective, and very likely biased. This makes it impossible to automate philosophical strategies with the technology currently available.

For esoteric analysts, the new advances represent a threat, for they unmask these analysts and reveal the fragility of the beliefs they try to spread.
For scientific analysts, the possibility of automating a strategy is an extraordinary evolutionary leap, allowing names like James Simons to rise above powerhouses that had dominated the scene for decades, like Buffett and Soros. Had Simons lived 50 years earlier, he would probably not have stood out as an investor. The great revolution caused by the automation of strategies takes place among scientific analysts, and for this reason they will be the focus of this article.

3. THE FORMULATION OF A STRATEGY

In order to better understand some of the effects brought about by the technologies that enable the automation of strategies, we will conduct a case study with a simple strategy, analyzing every stage involved: conception, parameter optimization, validation, and finally real-time trading.

The first stage in formulating a trading strategy consists in investigating the properties of the market. The analysis of the morphological patterns that recur with the greatest regularity and frequency in the historical data is crucial, for it allows us to determine the very moments at which there are peaks of asymmetry in the probabilities of prices moving in a given direction. If these studies are conducted manually, that is, not in an automated manner, there will be two fundamental and highly restrictive problems:

- subjectivity in the classification and grouping of similar patterns;
- the time needed for analysis and testing.

The use of statistical tools such as Hierarchical Clustering, Dynamic Clouds, Wavelets, or Neural Networks enables us to group similar patterns according to objective criteria, which represents an important advantage over idiosyncratic pattern recognition (a minimal sketch of this grouping step is given at the end of this section).

Before we can test the efficiency of a strategy, we must define it exactly and unambiguously, through clear and well-defined criteria. This is one of the advantages of automation. When testing manually, there is always the risk of slightly changing the criteria in each trade, so that, in the end, one cannot know whether the result obtained reliably confirms the strategy's efficiency or whether it is strongly biased by the desire to see the strategy win. The level of sophistication and complexity of a manually analyzed strategy is also very limited, whereas in an automated strategy it can be as high as the author's imagination and 64-bit technology allow. The ratio between the time needed to search historical price data for patterns manually and the time needed to do so automatically is also one of the most important advantages in favor of automation.
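As an illustration of the pattern-grouping step, the sketch below clusters normalized windows of closing prices with hierarchical clustering. It is a minimal Python example, not the method used by Saturn V or by any particular platform; the array of closing prices is assumed to be already loaded, and the window length and number of clusters are arbitrary illustrative choices (for long series one would also subsample the windows).

    # Minimal sketch: grouping similar price patterns with hierarchical clustering.
    # `closes` is assumed to be a 1-D numpy array of closing prices.
    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    def cluster_price_windows(closes, window=20, n_clusters=8):
        # Overlapping windows of `window` consecutive closing prices.
        segments = np.array([closes[i:i + window]
                             for i in range(len(closes) - window)])
        # Normalize each window to zero mean and unit variance so that only the
        # shape of the pattern matters, not the absolute price level.
        segments = (segments - segments.mean(axis=1, keepdims=True)) \
            / (segments.std(axis=1, keepdims=True) + 1e-12)
        # Agglomerative (Ward) clustering of the normalized shapes.
        tree = linkage(segments, method="ward")
        return segments, fcluster(tree, t=n_clusters, criterion="maxclust")

    # Hypothetical usage:
    # closes = np.loadtxt("eurusd_m15_close.csv")   # one closing price per line
    # segments, labels = cluster_price_windows(closes)
    # print(np.bincount(labels))                    # windows per pattern group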
4. THE BENEFITS OF OPTIMIZATION

In general, a simple strategy can take dozens of hours to be tested manually over 20 years of historical prices, with the added risk of mistakes in the application of the buy and sell criteria and in the specification of the exact entry and exit points. If the same strategy is automated, it can be tested within a fraction of a second, with virtually no risk of errors in the application of the criteria. This time advantage makes it possible to retest the strategy thousands or even millions of times, using a different setup of the parameters that determine the entry and exit points in each run. This superiority in processing power allows for something that is not usually done manually: optimization.

Assume a strategy of the following type:

- Indicator: 14-period RSI.
- Entry criterion: when the RSI < 25, buy using 100% of the portfolio.
- Exit criterion: when the RSI > 75, sell 100% of the portfolio. If the position reaches a 5% profit, it is closed (take profit). If the position reaches a 5% loss, it is closed by a stop loss. Once a position's profit exceeds 2%, the trailing stop is moved to 1% and gradually follows prices at half the distance between the entry point and the current price.

When testing this strategy on EURUSD on a 15-minute time frame, we see that, from 1986 to 2014, it quickly leads the portfolio to ruin. But could it work if we modified the values of a few parameters? If the strategy is not automated, each new change in the value of one or more parameters will demand another 20 to 50 hours of testing. In addition, the trader will be tempted to alter the strategy before concluding the test, out of restlessness, anxiety, the urge to save time, and other reasons, so that the result of the test will not correctly reflect the strategy's efficiency. In fact, the results obtained from manually tested strategies are usually so different from reality that it is almost frightening. Very often a trader will manually test his strategy and grow convinced that it is good, since it was profitable throughout 10 to 20 years of backtests, or even more. In truth, one need only run an automated backtest to end the illusion and see that the strategy actually leads to ruin in less than a year, regardless of the date on which the test is started.

The graphs below show the results of different setups of the aforementioned strategy: RSI 25/75, 20/80, 15/85, and 10/90, respectively. The blue line in the upper part of each graph represents the evolution of the portfolio's balance, and the green bars in the lower part represent the volume traded in each position. The numbers on the x-axis are the IDs of the trades; the y-axis shows the balance in dollars. We can see that in the long run (27.5 years) every setup tested incurs losses, although a few setups were able to remain profitable for a few months. We can also see that the RSI 25/75 setup loses much more quickly than the RSI 10/90 setup. This is not because the latter setup is better, but because the criteria RSI < 10 and RSI > 90 are met less frequently, generating fewer trading signals, which entails fewer trades and results in slower losses.

When the strategy is automated, it yields reliable results, highly representative of how that strategy would have performed had it been used in the real market over the same period as the historical data on which it was tested. Moreover, it can be retested numerous times, modifying the values of the parameters each time. If one is able to run 1 backtest per second, one can test more than 10,000 different setups in 3 hours. This might sound like a lot, but when we calculate the total number of possible setups, we realize that it is actually very little. In order to test every RSI period between 3 and 32; every buy and sell RSI level from 0 to 100, in steps of 1; every stop-loss value between 0.1% and 10% in steps of 0.1%; every take-profit value between 0.1% and 10% in steps of 0.1%; and every trailing-stop value between 0.1% and 8% in steps of 0.1%, we would have 30 x 101 x 101 x 100 x 100 x 80, or about 2.4 x 10^11, possible combinations. At 1 backtest per second, it would take roughly 8,000 years to test every one of these possibilities. If the strategy were a bit more complex, with 10 parameters instead of 6, the time needed would easily exceed the theoretical age of the observable Universe (in the Friedmann-Lemaître cosmological model).
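Before turning to genetic algorithms, here is a minimal Python sketch of the simplified strategy above, intended only to make the entry/exit rules and the size of the parameter grid concrete. It assumes `closes` is an array of 15-minute closing prices already loaded into memory, ignores spreads, commissions, slippage, and the trailing stop, and evaluates exits only on closing prices; it is not the testing engine of MetaTrader or of any other platform.

    import numpy as np

    def rsi(closes, period=14):
        # RSI based on simple moving averages of gains and losses
        # (Wilder's exponential smoothing is omitted for brevity).
        delta = np.diff(closes)
        gains = np.where(delta > 0, delta, 0.0)
        losses = np.where(delta < 0, -delta, 0.0)
        kernel = np.ones(period) / period
        avg_gain = np.convolve(gains, kernel, mode="valid")
        avg_loss = np.convolve(losses, kernel, mode="valid")
        return 100.0 - 100.0 / (1.0 + avg_gain / (avg_loss + 1e-12))

    def backtest(closes, period=14, buy_below=25, sell_above=75,
                 stop_loss=0.05, take_profit=0.05, start_balance=10_000.0):
        values = rsi(closes, period)
        offset = len(closes) - len(values)      # align RSI with the price series
        balance, entry = start_balance, None
        for i, r in enumerate(values):
            price = closes[i + offset]
            if entry is None:
                if r < buy_below:               # entry: RSI below the buy level
                    entry = price
            else:
                change = price / entry - 1.0
                # exit: RSI above the sell level, take profit, or stop loss
                if r > sell_above or change >= take_profit or change <= -stop_loss:
                    balance *= 1.0 + change     # 100% of the portfolio per trade
                    entry = None
        return balance

    # Size of the full parameter grid discussed above (~2.4 x 10^11 setups):
    # print(30 * 101 * 101 * 100 * 100 * 80)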
5. GENETIC ALGORITHMS IN THE OPTIMIZATION PROCESS

Fortunately, in order to find setups that have a high probability of coming close to the best possible ones, it is not necessary to test the majority of the possible combinations. The use of genetic algorithms allows the values of the strategy's parameters to be assigned judiciously, based on the results obtained by previously tested setups, which tremendously accelerates the search for the best setup.

MetaTrader 4 is a platform for optimizing, backtesting, and trading. It includes a simple genetic algorithm with which one can test about 10,000 setups selected from a pool of 2^63 - 1 different genotypes, that is, 9.22 quintillion (9.22 x 10^18) different setups. The algorithm does this in such a way that there is a reasonable probability that this small sample of 10,000 contains some of the best genotypes in the entire set of 9.22 quintillion setups. The use of genetic algorithms therefore allows us to find promising setups for complex strategies that would otherwise take far longer than billions of years to be discovered.

In summarized and simplified terms, a genetic algorithm combines the setups of pairs of genotypes in each generation in order to define the setups of the genotypes of the next generation. It assigns to each genotype a crossover probability proportional to its adaptation score, and introduces mutations of different magnitudes with different probabilities. When dealing with a set of 9.22 quintillion possibilities, this mechanism yields results trillions of times superior to what could be obtained through random choices.

The following graph shows the results of an optimization of version 9.01a of Saturn V. Each blue point represents a genotype. The x-axis shows the IDs of the genotypes, which, in this case, also coincide with the generation numbers, since each generation has a population of only one individual. The y-axis indicates the final balance reached. In the early generations, the average performance of the genotypes increases exponentially, but the speed of this evolution gradually decreases. If the graph included a few thousand more generations, an asymptote would soon appear, and the genotypes would hit a performance ceiling whose value would roughly represent the maximum performance achievable by this strategy. This could indicate a local maximum that was not avoided by a tabu mechanism, or it could be that the algorithm actually found setups whose performance is fairly close to the maximum achievable by this strategy.

It is very interesting that, out of a total of 9,223,372,036,854,775,807 setups, a relatively simple genetic algorithm can select only 10,000 setups to be tested and still have a good chance that these 10,000 contain the best, or one of the best, possible setups. The benefits that can be reaped from this type of tool are exceptional, not only because of the time saved, but mainly because it allows us to find very reasonable solutions to problems whose exact solution would be impossible to obtain in a lifetime.
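The mechanism just described can be sketched in a few lines. The toy optimizer below illustrates fitness-proportional selection, uniform crossover, and mutation over the six-parameter space of the RSI example; it is not MetaTrader's genetic algorithm, and the population size, number of generations, and mutation rate are arbitrary assumptions. It expects a fitness function that runs a backtest for a setup and returns its final balance.

    import random

    # Integer ranges for: RSI period, buy level, sell level, stop loss (0.1% steps),
    # take profit (0.1% steps), trailing stop (0.1% steps).
    GENE_RANGES = [(3, 32), (0, 100), (0, 100), (1, 100), (1, 100), (1, 80)]

    def random_setup():
        return [random.randint(lo, hi) for lo, hi in GENE_RANGES]

    def crossover(a, b):
        # Uniform crossover: each gene is taken from one of the two parents.
        return [random.choice(pair) for pair in zip(a, b)]

    def mutate(setup, rate=0.1):
        # Each gene is occasionally redrawn anywhere inside its range.
        return [random.randint(lo, hi) if random.random() < rate else gene
                for gene, (lo, hi) in zip(setup, GENE_RANGES)]

    def optimize(fitness, population=50, generations=200):
        pop = [random_setup() for _ in range(population)]
        for _ in range(generations):
            scores = [fitness(s) for s in pop]
            # Crossover probability proportional to the adaptation score
            # (clamped at zero, so ruined setups are effectively never selected).
            weights = [max(s, 0.0) + 1e-9 for s in scores]
            pop = [mutate(crossover(*random.choices(pop, weights=weights, k=2)))
                   for _ in range(population)]
        return max(pop, key=fitness)

    # Hypothetical usage with the backtest() sketch from the previous section
    # (the trailing-stop gene, index 5, is unused because that sketch ignores it):
    # best = optimize(lambda s: backtest(closes, period=s[0], buy_below=s[1],
    #                                    sell_above=s[2], stop_loss=s[3] / 1000,
    #                                    take_profit=s[4] / 1000))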
6. THE IMPORTANCE OF THE VALIDATION OF A STRATEGY

First, we analyzed the properties of a financial asset until we found some patterns that predict asymmetries in the probabilities of price movements. Then we used this knowledge to formulate a strategy. Afterward, we optimized its quantitative and qualitative parameters and saw that the best setups are promising. We are now close to the final phase, which consists of confirming the results obtained and validating the strategy.

There are three important questions that must be answered before using a strategy in the real market:

1) How would this strategy have performed had it been used in the last years or decades?
2) What reasons do we have to believe that the results obtained in the last years or decades will hold over the next years or decades?
3) What reasons do we have to assume that the results obtained in backtests are representative of the results we would actually obtain in real trading?

One proper way to seek answers to the first two questions consists in dividing the historical data into two parts. The first part provides the automated system with a large volume of data on the properties of the market and is used to optimize the parameters of the strategy. The second part serves to confirm whether the setup obtained for the champion genotype in the first part still works in scenarios different from those on which the optimization was conducted.

For example: if we have historical data from 1986 to 2014, we can use the period from 1986 to 1990 to optimize the strategy's parameters with the aid of the genetic algorithm. We then use the champion genotype of this period to trade in the following period, from 1991 to 2014, and verify whether it continues to yield profits consistently (a sketch of this walk-forward test is given at the end of this section). If the historical data is of sufficiently high quality and the backtest is well conducted in every technical detail, this result means a great deal: it is equivalent to having lived in the year 1990, having used data from 1986 to 1990 to optimize the strategy, and then having put the system to trade in the real market starting in 1991. If this works, it is the first strong indication that the strategy is efficient.

One should not expect a genotype resulting from an optimization on a given period to achieve the same level of profitability over a different period. When the strategy is optimized on a specific period, each genotype trained on that period had to adapt to particular properties of the scenarios inherent to that period, as well as to other properties that are present in any historical moment and are general, universal, and timeless. When such a genotype is put to trade in a different period, it will not find the properties exclusive to the optimization period, but it will still find the universal and timeless ones. For this reason, the genotype will recognize a lower percentage of patterns, and its performance will tend to be somewhat below that obtained in the period on which it was optimized. Hence, one should not discard as "bad" a genotype that yielded yearly profits of 25% from 1986 to 1990 and then only 15% from 1991 to 2014. It is fitting, instead, to conduct tests of homogeneity, heteroscedasticity, and anisotropy in order to verify whether the observed difference is statistically expected, or whether it might signal a danger of overfitting.
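The in-sample/out-of-sample split described above can be expressed compactly. The sketch below reuses the hypothetical backtest() and optimize() helpers from the previous sections and assumes that closes and dates are aligned NumPy arrays covering 1986-2014; the split date matches the example in the text. It illustrates the procedure, not how any particular platform implements it.

    import numpy as np

    def run(setup, closes):
        # Run the simplified backtest with a genetic-algorithm setup.
        return backtest(closes, period=setup[0], buy_below=setup[1],
                        sell_above=setup[2], stop_loss=setup[3] / 1000,
                        take_profit=setup[4] / 1000)

    def walk_forward(closes, dates, split_date=np.datetime64("1991-01-01")):
        in_sample = closes[dates < split_date]     # 1986-1990: optimization period
        out_sample = closes[dates >= split_date]   # 1991-2014: validation period
        champion = optimize(lambda s: run(s, in_sample))
        # A setup that is profitable only in-sample is a strong sign of overfitting;
        # some performance drop out-of-sample is normal and should be judged with
        # statistical tests rather than discarded outright.
        return run(champion, in_sample), run(champion, out_sample)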
Saturn V is an exception, for it is an adaptive system capable of "deducing" how to deal with unprecedented scenarios by interpolating or extrapolating known patterns. Interpolating and extrapolating data from a smooth and continuous function, defined by a Minkowski metric, is fairly simple, but interpolating data from a fractal time series in a fundamentally correct manner is much more difficult. A good adaptive system may also modify itself and refine its criteria according to the success rate of its attempts at recognizing new patterns. This makes the results obtained in any period much more similar to the results of the optimization period, and it also ensures greater homogeneity in performance. However, when an abnormal abundance of unprecedented patterns occurs, as in the 2010 Greek debt crisis, performance may be impacted more heavily. Other anomalies may have a positive effect, like the burst of the subprime crisis in 2008, the dot-com bubble of 1999-2000, or the Wall Street Crash of 1929, because these events are associated with strong trends, which increase the profits of trend-following strategies.

The third question can be answered by comparing the results obtained in a real account with those obtained in a backtest over the same period. This comparison enables different types of analysis of point-to-point similarity, local similarity, and global similarity. Although there is no threshold that defines whether the similarity is good enough, one can use as a criterion a 9-to-10 ratio between the annual average returns, or 0.9^T for the ratio between the historical balances, where T is the time in years.

If the answers to the three questions are positive, in the sense that the backtests have shown themselves to be reliable representations of what can be expected in real trading, and that optimizations on a specific period are capable of determining setups that remain profitable over long periods different from the one on which the optimization was conducted, then one has clear formal corroboration that the strategy is appropriate for real trading.

This procedure has the advantage of being highly reliable, because it directly compares real trading with backtesting. On the other hand, it has some drawbacks, one of which is that one needs at least a few months to gather a reasonable sample of real trading results for the comparison.

One can also estimate the dissimilarity between backtesting and real trading over periods as long as several years without having to wait several years. A simple and efficient way of doing this consists in comparing the results of a backtest done on tick-by-tick data with the results of a backtest on the same data with the internal ticks of each 1-minute or 5-minute candle removed. This type of comparison grants greater control over several variables, allowing one to measure the differences produced in the balance as a function of differences in execution latency, slippage, spread variations, etc. One can also compare backtests done on data from two or more different sources, or on data with real ticks and artificial ticks, among other possibilities, depending on the goal.

If the goal is to estimate the probable maximum dissimilarity expected between backtesting and real trading, then the method of comparing backtests on tick-by-tick data with backtests on data with empty candles is suitable. This is because the difference between a backtest with empty candles and a backtest with every tick and an average delay of 5 seconds per trade will presumably be larger than the difference between real trading and backtesting with every tick.
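As a rough illustration of the similarity criteria mentioned above, the helper below compares two balance curves over the same period, for example a tick-by-tick backtest against an empty-candle backtest, or a backtest against a real account. The 9-to-10 ratio of annual average returns and the 0.9^T threshold on the balances are the criteria suggested in the text; everything else (equal-length curves sampled at the same dates, strictly positive balances) is an assumption of this sketch.

    import numpy as np

    def similar_enough(curve_a, curve_b, years):
        # Annual average (geometric) return of each balance curve.
        ret_a = (curve_a[-1] / curve_a[0]) ** (1.0 / years) - 1.0
        ret_b = (curve_b[-1] / curve_b[0]) ** (1.0 / years) - 1.0
        return_ratio = min(ret_a, ret_b) / max(ret_a, ret_b)
        # Ratio between the final balances, compared with the 0.9**T threshold.
        balance_ratio = min(curve_a[-1], curve_b[-1]) / max(curve_a[-1], curve_b[-1])
        # The two alternative criteria suggested in the text.
        return return_ratio >= 0.9, balance_ratio >= 0.9 ** years

    # Point-to-point similarity can be inspected separately, for example with the
    # correlation between the two curves sampled at the same dates:
    # print(np.corrcoef(curve_a, curve_b)[0, 1])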
The two graphs below allow for a comparison of backtests of Saturn V between 4/5/2007 and 30/4/2014, on EURUSD data from Dukascopy. In the first graph we used tick-by-tick data, while in the second we used empty 15-minute candles. We can see that the similarity between the two graphs is very high, not only in the big picture, but also in every small stretch. The difference in the total number of trades is also small (less than 1%), and the final balance, although fairly similar, is lower on the empty-candle data than on the tick-by-tick data. This means that the projections are underestimated: if the results from empty-candle data relate to the results from tick-by-tick data in the same way that the results from tick-by-tick data relate to the results of real trading, then real profits can be expected to be slightly higher than the projections based on backtest results.

Other measures may be taken to make backtests harder than real trading, in order to give real results a good probability of being better than the forecasts. For example, one can use wider spreads or higher commissions than those seen in real trading, or a combination of both.

In this example, the comparison confirmed the expectations and the strategy was successfully validated (in fact, it had already been validated in 2010, based on other tests). However, most of the time the results obtained in the validation phase are negative, calling for many improvements in the testing methodology and in the noise filtering of the historical data before a reasonable level of similarity between backtesting and real trading is reached.

In order to achieve a good level of similarity, backtests must take the following factors into account:

- spreads (the difference between the bid and the ask prices);
- commissions, custody and settlement fees, and swaps;
- latency in execution and slippage;
- penetration into the order book.

Delays in execution sometimes result in slightly better execution prices, sometimes in slightly worse ones, but penetration into the order book always results in prices worse than the bid or the ask. For this reason, when dealing with strategies that involve short trades (on the order of 1% profit or loss per trade), it is fundamental that backtests represent real trading accurately in every detail. When dealing with strategies that involve longer trades, executed with volumes that are relatively small compared to the market's liquidity, the relative importance of details such as order-book penetration diminishes. On the other hand, if the traded volumes are large enough to influence prices and penetrate deep into the order book, the backtesting simulations must also represent these factors as precisely as possible.
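A hedged sketch of how some of these execution factors could be layered onto a simulated fill price follows. All cost figures are placeholders, not estimates for any real broker, and a production backtester would model the order book, swaps, and latency in far more detail.

    import random

    def adjusted_fill(mid_price, volume, side,
                      spread=0.00010,        # bid/ask spread in price units
                      commission=0.00002,    # commission per unit of volume
                      slippage_std=0.00005,  # latency slippage (zero-mean noise)
                      book_impact=1e-9):     # adverse impact per unit of volume
        # Spread and order-book penetration always work against the trader;
        # latency slippage is modelled as zero-mean noise (sometimes favourable).
        offset = spread / 2 + book_impact * volume + random.gauss(0.0, slippage_std)
        price = mid_price + offset if side == "buy" else mid_price - offset
        return price, commission * volume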
With good-quality historical price data and properly conducted backtests, one can test the quality of a strategy fairly rigorously and know beforehand what levels of risk and return to expect when it is used in real trading. One can control many of the variables involved and verify how the strategy behaves under different conditions. However, automation does not work miracles; it does not transform a losing strategy into a profitable one. Automation allows one to extract the best from each strategy and to learn its real probability of success before executing it in real trading. Human imagination still plays the main role at every stage, especially the most important one: the creation of the strategy.

In 2006, I analyzed more than 500 strategies available on investment websites, as well as about 2,000 variants of the original strategies. I tested thousands of setups for each of them, but none proved profitable in the long run, and only two showed signs that they could become profitable if exhaustively optimized. In 2010 and 2014, I conducted new tests with the strategies that had received the most favorable user reviews on specialized websites, and the results were no better than in 2006. These facts reveal that, despite the immense processing power provided by computers, compounded by the advantages of using genetic algorithms in optimizations, it is still very difficult and rare to find a strategy that is truly efficient, consistent, and stable in the long run. Hence, for the vast majority of people, automating their favorite strategies does not fulfill the goal of profiting from their use; in the end, the tests only reveal that their beloved strategies do not work. Although this reality seems bitter to most, its effects are beneficial, because they help avoid large monetary losses caused by the use of losing strategies. As disappointing as it is to watch your favorite strategy bring your balance to zero in backtests, it is much better than watching the same thing happen to your real account.

Bibliographical references:
http://dlmf.nist.gov
http://mathworld.wolfram.com
https://www.statsoft.com/textbook
http://www.wikipedia.org
http://www.saturnov.com

Historical price data:
http://www.olsendata.com
http://www.dukascopy.com
https://www.globalfinancialdata.com