Artificial Intelligence and Automation of Trading Systems

Hindemburg Melão Jr. is an autodidact, holder of the world record for the longest announced checkmate in simultaneous blindfold chess (Guinness Book 1998, Brazilian edition, pages 110-111), and an honorary member of the Pars Society (for people with IQs higher than 180). He has worked on the development of automated trading systems since 2006 and is the author of “Saturn V”.
Abstract: This article is a brief analysis of the positive and negative aspects of automating trading systems for stocks, commodities, currencies and other financial instruments. Different stages of the development of a strategy are compared with and without automation resources, and the results are evaluated at each step. In the optimization phase, we highlight the advantages of genetic algorithms, which enable the solution of problems involving 9.22 quintillion setups. In the validation phase, we present a small innovation: comparing two or more backtests under slightly different conditions, instead of the traditional comparison between a backtest and a real account, which brings some improvements.
Keywords: automated trading systems, investments, stocks, backtest, Forex
1. THE PIONEERS OF AUTOMATION
In 1960, Richard Donchian, B.A. in Economics from Yale University, suggested
the use of emerging technologies at the time to mechanize the testing of strategies,
making the process faster, more precise, and more objective.
In 1970, Edward Seykota, B.Sc. in Engineering from MIT, grew excited after
reading one of Donchian’s letters on the matter, and implemented the first automated
system ever recorded. According to Jack Schwager (Market Wizards, 1998), Seykota’s
system would have turned US$5,000.00, invested in 1972, into US$15,000,000.00
sometime in 1988. However, the results Schwager states in his book are not backed by any documents or evidence. Judging from the technology available at the time, as well as from how Seykota’s track record later evolved, it is likely that his mathematical modeling contained inaccuracies, making his backtest results better than what could possibly be obtained in real trading.
In 1982, James Simons, Ph.D. in Mathematics from the University of California, who has been awarded international prizes, founded a fund management firm named Renaissance Technologies and started using a system he developed to manage its funds. In the last
32 years, Simons achieved annual average returns higher than 60% (35% net returns
after discounting the 44% performance fee and 5% management fee). Currently,
Simons has a personal net worth of 12.5 billion dollars, according to Forbes. He is
known as the world’s best investment manager of the last few decades, surpassing
Soros and Buffett.
Simons was probably the first to obtain results that were consistent,
homogeneous, and stable in the long run through the use of automated trading
systems. He remains the best in managing volumes greater than 10 billion dollars,
although there is now at least one system superior to his when managing smaller
volumes.
2. TECHNICAL ANALYSIS AND FUNDAMENTAL ANALYSIS
It is common to split investment methodologies into two large groups: Quantitative Analysis (or Technical Analysis) and Fundamental Analysis. Personally, I consider it more appropriate to partition them into three groups: Scientific Analysis, Philosophical Analysis, and Esoteric Analysis.
1. The group of scientific analysts comprises James Simons, Ed Seykota, Robert Pardo, Edward Thorp and others who employ rigorous protocols, in agreement with contemporary Scientific Methodology and Formal Logic. A
scientific analyst is distinguished by the use of statistical tools to investigate
quantitative and morphological properties in historical price data, with the goal of
recognizing fragments of recurring patterns and using this data to calculate the
probabilities of prices moving in a given direction.
2. The group of philosophical analysts comprises George Soros, Warren Buffett, Jimmy Rogers, Paul Tudor Jones, and others who, despite making their decisions based on subjective evaluations, also make good use of Logic and Scientific Methodology, and who reach practical results that confirm the efficiency of their methods. A philosophical analyst is distinguished by the ability to identify relevant factors in Macroeconomics, Politics, climate, Culture, and other aspects that determine price movement. He applies his knowledge, as well as his feeling, to weigh the relative importance of these factors, seeking to understand the structure of their interrelations and to interpret the effects they have on prices in the Financial Markets.
3. The group of esoteric analysts is very wide and heterogeneous, covering a large
variety of profiles. These are the people who lose money in a systematic manner;
their participation in the market introduces noise in price movement, because the
“methods” they use are inefficient, equivalent to trading randomly. This group is
comprised of the great majority of market agents, including those who sell books
and courses on Technical Analysis, their students, and their followers. Their
decisions are not rooted in Logic or Science, but rather in superstitions and
dogmas. The methods they employ lack a scientific basis, and would never pass
any meticulous experiment seeking to validate their strategies.
According to the usual classification, the traders in the first and third groups would be lumped together into a single one: “technical analysts”. However, there are strong reasons to split them into distinct groups, not only because they employ essentially different methods, but also because they produce opposite effects on the market: the former group helps organize market prices into less chaotic structures, i.e., it decreases their entropy, while the latter group increases it.
Furthermore, in terms of etymology and semantics, the terms “scientific analyst,”
“philosophical analyst,” and “esoteric analyst” are more pertinent and describe more
accurately the characteristics observed in the typical profile of a trader belonging to
each of these groups.
Having made this distinction, we can now evaluate the impact which recent
technological advancements have had on the different types of investors:

For philosophical (fundamental) analysts, automation resources have not yet reached a level that lets them benefit from new technologies as much as scientific analysts have been doing. Even with the use of Fuzzy Logic and Neural Networks, it is very difficult to “teach” a machine the criteria recommended by Benjamin Graham for deciding whether to buy or sell. The selection of which information should feed the system is also very subjective, and very likely biased. This makes it impossible, with the technology currently available, to automate philosophical strategies.

For esoteric analysts, new advancements represent a threat, for they unmask the
analysts and reveal the fragility of the beliefs they try to spread.

For scientific analysts, the possibility of automating a strategy is an extraordinary
evolutionary leap, allowing for the rise of names like James Simons above world
powerhouses that previously dominated the scene for decades, like Buffett and
Soros. Had Simons lived 50 years earlier, he would probably not have stood out
as an investor.
The great revolution caused by the automation of strategies takes place among
scientific analysts. For this reason, this will be our focus in this article.
3. THE FORMULATION OF A STRATEGY
In order to better understand some of the effects brought about by technologies that
enable the automation of strategies, we will conduct a case study with a simple strategy.
We will analyze every stage involved: from conception to parameter optimization, validation, and finally real-time trading.
The first stage that should be undertaken to formulate a trading strategy consists in
investigating the properties of the Market. The analysis of morphological patterns that
reoccur with greater regularity and frequency in the historical data is crucial, for it allows
us to determine the very instants in which there are peaks of asymmetry in the
probabilities of prices moving in a given direction. If these studies are conducted
manually – that is, not in an automated manner – there will be two fundamental and
highly restrictive problems:
- Subjectivity in the classification and grouping of similar patterns;
- Time needed for analysis and testing.
The use of statistical tools like Hierarchical Clustering, Dynamic Clouds, Wavelets, or Neural Networks enables us to group similar patterns according to objective criteria, which represents an important advantage over idiosyncratic pattern recognition.
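As a minimal sketch of what such objective grouping might look like (assuming, purely for illustration, an array of closing prices and the SciPy library), one could normalize fixed-length price windows and cluster them hierarchically:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_price_patterns(closes, window=24, n_clusters=10):
    # Slice the price series into fixed-length windows.
    windows = np.array([closes[i:i + window] for i in range(len(closes) - window)])
    # Normalize each window so that only its shape (morphology) matters.
    windows = (windows - windows.mean(axis=1, keepdims=True)) \
              / (windows.std(axis=1, keepdims=True) + 1e-12)
    # Agglomerative (hierarchical) clustering with Ward linkage,
    # then cut the tree into the requested number of groups.
    tree = linkage(windows, method="ward")
    return fcluster(tree, t=n_clusters, criterion="maxclust")

# Hypothetical usage: labels = cluster_price_patterns(eurusd_closes)

Each resulting label identifies a family of similar patterns whose subsequent price behavior can then be studied statistically.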
Before we can test the efficiency of a strategy, we must define it with exactitude and
univocity, through clear and well-defined criteria. This is one of the advantages of
automation. When testing manually, there is always the risk of slightly changing the
criteria in each trade, so that, in the end, one cannot possibly know whether the result obtained reliably confirms the strategy’s efficiency or whether it is strongly biased by the trader’s desire to see the strategy succeed.
The level of sophistication and complexity of a manually-analyzed strategy is also
very limited, whereas it can be very high in an automated strategy – as high as the
author’s imagination and 64-bit technology allow it to be.
The ratio between the time needed to search historical price data for patterns manually and the time needed to do so automatically is also one of the most important advantages in favor of automation.
4. THE BENEFITS OF OPTIMIZATION
In general, a simple strategy can take dozens of hours to be manually tested over 20
years of historical prices. There is also the risk of making mistakes in the application of
the buy and sell criteria and in the specification of the exact entry and exit points. If the
same strategy is automated, it can be tested within a fraction of a second, with virtually
no risk of errors in the application of the criteria. This time advantage allows for the
possibility of retesting the strategy thousands or even millions of times over, using
different setups for the parameters that determine the entry and exit points in each run.
This superiority in processing power allows for something that is not usually done
manually: optimization.
Assume a strategy of the following type (a rough code sketch of these rules appears right after the list):
- Indicator: 14-period RSI.
- Entry criterion: when RSI < 25, buy using 100% of the portfolio.
- Exit criterion: when RSI > 75, sell 100% of the portfolio.
- If the position reaches a 5% profit, it is closed (take profit).
- If the position reaches a 5% loss, it is closed by a stop loss.
- Once a position’s profit exceeds 2%, the trailing stop is moved to 1% and then gradually follows prices, staying at half the distance between the entry price and the current price.
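Purely as an illustration (not the exact Saturn V or MetaTrader implementation), the rules above could be coded roughly as follows, assuming a hypothetical array of closing prices and a simplified moving-average variant of the RSI:

import numpy as np

def rsi(closes, period=14):
    # Simplified RSI using simple moving averages of gains and losses.
    deltas = np.diff(closes)
    gains = np.where(deltas > 0, deltas, 0.0)
    losses = np.where(deltas < 0, -deltas, 0.0)
    kernel = np.ones(period) / period
    avg_gain = np.convolve(gains, kernel, mode="valid")
    avg_loss = np.convolve(losses, kernel, mode="valid")
    return 100.0 - 100.0 / (1.0 + avg_gain / (avg_loss + 1e-12))

def run_strategy(closes, period=14, buy_level=25, sell_level=75,
                 take_profit=0.05, stop_loss=0.05):
    values = rsi(closes, period)
    offset = len(closes) - len(values)                # align RSI values with prices
    in_position, entry, trail = False, 0.0, None
    trade_returns = []
    for i, r in enumerate(values):
        price = closes[i + offset]
        if not in_position:
            if r < buy_level:                         # entry: RSI below the buy threshold
                in_position, entry, trail = True, price, None
            continue
        change = price / entry - 1.0                  # open profit of the position
        if change > 0.02:                             # once profit exceeds 2%, arm/raise the
            trail = max(trail or 0.0, change / 2.0)   # trailing stop at half the open profit
        if (r > sell_level                            # exit: RSI above the sell threshold
                or change >= take_profit              # take profit at +5%
                or change <= -stop_loss               # stop loss at -5%
                or (trail is not None and change <= trail)):   # trailing stop hit
            trade_returns.append(change)
            in_position = False
    return trade_returns

Position sizing (100% of the portfolio) then reduces to compounding the per-trade returns; spreads, commissions and execution details are deliberately ignored in this sketch and are discussed in the validation section.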
When testing this strategy on EURUSD on a 15-minute time frame, we see that,
from 1986 to 2014, it quickly leads the portfolio to ruin. But if we modify the values of a
few parameters, can it possibly work? If the strategy is not automated, each new
change in the value of one or more parameters will demand another 20 or 50 hours of
testing. In addition, the trader will be induced to make alterations to the strategy before
concluding the test, due to restlessness, anxiety, attempts to save time as well as other
reasons, in such a way that the result of the test will not correctly reflect the strategy’s
efficiency. Actually, the results obtained from manually tested strategies are usually so
different from reality that it is almost frightening.
Very often a trader will manually test his strategy and grow convinced that it is
good, since it was profitable throughout 10 to 20 years of backtests, or even more. In
truth, one must only run an automated backtest to end the illusion and see that the
strategy actually leads to ruin in less than a year, regardless of the date in which the
test is started.
The graphs below show the results of different setups of the aforementioned
strategy: RSI 25/75, 20/80, 15/85, and 10/90, respectively:
The blue line in the upper part of the graphs represents the evolution of the portfolio’s balance. The green bars in the lower part of the graphs represent the volume traded in each position. The numbers on the x-axis are the IDs of each trade; the y-axis shows the balance in dollars.
We can see that, in the long run (27.5 years), every setup tested incurs losses, although a few setups were able to remain profitable for a few months. We can also see that the RSI 25/75 setup loses much more quickly than the RSI 10/90. This is not because the latter setup is better, but rather because the criteria RSI < 10 and RSI > 90 are less frequently met, generating fewer trading signals, which entails fewer trades and results in slower losses.
When the strategy is automated, it yields reliable results, which are highly
representative of how that strategy would have performed had it been put to use in the
real market in the same period of the historical data on which it was tested. Moreover, it
can be retested numerous times, modifying the values of the parameters each time. If
one is able to run 1 backtest per second, one can test more than 10,000 different setups
in 3 hours.
This might sound like a lot, but when we calculate the total number of possible
setups, we realize that it is actually little. In order to test every RSI period between 3
and 32, every value for the buy RSI and the sell RSI from 0 to 100, in intervals of 1;
every stop loss value between 0.1% and 10% in intervals of 0.1%; every take profit
value between 0.1% and 10% in intervals of 0.1%; every trailing stop value between
0.1% and 8% in intervals of 0.1%, we will have 30 x 101 x 101 x 100 x 100 x 80, or about 2.4 x 10^11 possible combinations. If we run 1 backtest per second, it would take us almost 8,000 years to test every one of these possibilities. If the strategy were a bit more complex, with 10 parameters instead of 6, the time needed would easily exceed the theoretical age of the observable Universe (in the Friedmann-Lemaître cosmological model).
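The arithmetic above is easy to verify; a quick computation with the ranges exactly as listed gives:

# Parameter ranges as listed above.
rsi_periods   = 32 - 3 + 1          # 30 values
rsi_buy       = 101                 # 0 to 100 in steps of 1
rsi_sell      = 101
stop_loss     = 100                 # 0.1% to 10% in steps of 0.1%
take_profit   = 100
trailing_stop = 80                  # 0.1% to 8% in steps of 0.1%

combinations = rsi_periods * rsi_buy * rsi_sell * stop_loss * take_profit * trailing_stop
print(combinations)                               # 244824000000, about 2.4 x 10^11
print(combinations / (365.25 * 24 * 3600))        # about 7,758 years at 1 backtest per second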
5. GENETIC ALGORITHMS IN THE OPTIMIZATION PROCESS
Fortunately, in order to find setups that have a high probability of coming close to
being the best possible setups, it is not necessary to test the majority of possible
combinations. The use of Genetic Algorithms allows for the attribution of the strategy’s
parameter values to be done judiciously, based on the results obtained by previously
tested setups. This tremendously accelerates the process of searching for the best
setup.
MetaTrader 4 is a platform for optimizing, backtesting, and trading. It has a simple genetic algorithm with which one can test about 10,000 setups selected from a pool of 2^63 - 1 different genotypes, meaning 9.22 quintillion (9.22 x 10^18) different setups. The algorithm does this in such a way that there is a reasonable probability that this small sample of 10,000 contains some of the best genotypes from the entire set of 9.22 quintillion setups.
The use of genetic algorithms allows us to find promising setups for complex
strategies that would otherwise take much longer than billions of years to be discovered.
In summarized and simplified terms, a genetic algorithm combines the setups of pairs of genotypes in each generation in order to define the setups of the genotypes of the next generation. It assigns each genotype of a generation a crossover probability proportional to its adaptation score, and introduces different degrees of mutation with different probabilities. When dealing with a set of 9.22 quintillion possibilities, this
mechanism yields results trillions of times superior to what could be obtained through
random choices. The following graph shows the results of an optimization of version
9.01a of Saturn V:
Each blue point represents a genotype. The x-axis shows the IDs of the genotypes, which, in this case, also coincide with the generation numbers, since each generation has a population of only 1 individual. The y-axis indicates the final balance reached.
In the early generations, the average performance of the genotypes increases
exponentially, but the speed of this evolution gradually decreases. If the graph included
a few thousand more generations, the curve would soon approach an asymptote, and the genotypes would hit a performance ceiling whose value would roughly represent the maximum performance achievable by this strategy. This ceiling could indicate a local maximum that was not escaped (for lack of a tabu mechanism, for instance), or it could be that the algorithm actually found setups whose performance is fairly close to the maximum achievable by this strategy.
It is very interesting that from a total of 9,223,372,036,854,775,807 setups, a
relatively simple genetic algorithm can select only 10,000 setups to be tested and still
have good chances that these 10,000 contain the best or one of the best possible
setups. The benefits that can be reaped from this type of tool are exceptional, not only
due to the time saved, but mainly because it allows us to find very reasonable solutions
to problems whose exact solution would be impossible to obtain in a lifetime.
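In rough outline, the mechanism described in this section (crossover probability proportional to the adaptation score, plus occasional mutation) can be sketched as a toy genetic algorithm. The code below is only an illustration of the idea, not the MetaTrader 4 optimizer, and the fitness function (a full backtest) is assumed to exist elsewhere:

import random

def evolve(fitness, bounds, population=50, generations=200, mutation_rate=0.1):
    # bounds: one (low, high) integer range per strategy parameter.
    pop = [[random.randint(lo, hi) for lo, hi in bounds] for _ in range(population)]
    for _ in range(generations):
        # Adaptation score of each genotype (clamped so losing setups keep a tiny chance).
        scores = [max(fitness(g), 1e-9) for g in pop]
        total = sum(scores)

        def pick():
            # Roulette-wheel selection: probability proportional to the adaptation score.
            r, acc = random.uniform(0, total), 0.0
            for g, s in zip(pop, scores):
                acc += s
                if acc >= r:
                    return g
            return pop[-1]

        new_pop = []
        for _ in range(population):
            a, b = pick(), pick()
            child = [random.choice(pair) for pair in zip(a, b)]   # uniform crossover
            for j, (lo, hi) in enumerate(bounds):
                if random.random() < mutation_rate:               # occasional mutation
                    child[j] = random.randint(lo, hi)
            new_pop.append(child)
        pop = new_pop
    return max(pop, key=fitness)

# Hypothetical usage, with backtest_profit(setup) standing in for a real backtest:
# best = evolve(backtest_profit, bounds=[(3, 32), (0, 100), (0, 100), (1, 100), (1, 100), (1, 80)])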
6. THE IMPORTANCE OF THE VALIDATION OF A STRATEGY
First, we analyzed the properties of a financial asset until we found some patterns that predict asymmetries in the probabilities of price movements. Then, we used this
knowledge to formulate a strategy. Afterward, we optimized its quantitative and
qualitative parameters, and we have seen that the best setups are promising. We are
now close to the final phase, which consists of confirming the results obtained and
validating the strategy.
There are 3 important questions that must be answered before using a strategy in
the real market:
1) How would this strategy have performed had it been used in the last years or
decades?
2) What reasons can we have to believe that the results obtained in the last
years or decades will remain unchanged in the next years or decades?
3) What reasons do we have to assume that the results we obtained in
backtests are representative of the results we would actually obtain in real
trading?
One of the proper ways to seek answers for the first two questions consists in
dividing historical data in two parts. The first part will serve to provide the automated
system with a large volume of data on the properties of the market. This data will be
used to optimize the parameters of the strategy. The second part will serve to confirm
whether the setup obtained for the champion genotype in the first part still works in
other scenarios, different from those on which the optimization was conducted.
For example: if we have historical data from 1986 to 2014, we can use the period
from 1986 to 1990 to optimize the strategy’s parameters with the aid of the genetic
algorithm. We then use the champion genotype of this period to trade in the following
period, from 1991 to 2014, and we then verify whether it continues yielding profits
consistently. If the historical data is of high enough quality and the backtest is well
conducted in every technical detail, this result means much. It is equivalent to having
lived in the year 1990, and having used data from 1986 to 1990 to optimize our strategy
in order to put the system to trade in the real market starting in 1991. If this works, it is
the first strong indication that the strategy is efficient.
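A minimal sketch of this in-sample/out-of-sample procedure is shown below; the helper functions optimize and backtest are assumptions, standing in for the genetic optimization described earlier and for a full backtester:

def walk_forward_check(history, optimize, backtest, cutoff_year=1990):
    # history: a list of (timestamp, price) bars covering, say, 1986-2014.
    in_sample  = [bar for bar in history if bar[0].year <= cutoff_year]
    out_sample = [bar for bar in history if bar[0].year >  cutoff_year]
    champion = optimize(in_sample)           # optimize only on 1986-1990
    return backtest(champion, out_sample)    # then verify it on 1991-2014, data never seen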
One should not expect that a genotype resulting from an optimization on a given
period achieve the same level of profitability over a different period. This occurs
because when the strategy is optimized on a specific period, each genotype trained in
that period had to adapt to some particular properties found in the scenarios inherent to
that period. Each genotype also had to adapt to other properties that are present in any
historical moment, properties that are general, universal, and timeless. When this
genotype is put to trade in a different period, it will not find the same properties
exclusive to the optimization period, but it will find the universal and timeless properties.
For this reason, the genotype will recognize a lower percentage of patterns, so its
performance will tend to be a bit below that which was obtained in the period on which it
was optimized.
Hence, one should not discard a “bad” genotype that yielded yearly profits of
25% from 1986 to 1990 and then only 15% from 1991 to 2014. It is only fitting to
conduct some tests of homogeneity, heteroscedasticity, and anisotropy in order to verify
whether the difference observed is statistically expected, or whether it might signal any
danger of overfitting.
Saturn V is an exception, for it is an adaptive system capable of “deducing” how
to deal with unprecedented scenarios based on the interpolation or extrapolation of
known patterns. Interpolating and extrapolating data from a smooth and continuous
function, defined by a Minkowski metric, is fairly simple, but interpolating data from a
temporal fractal series in a fundamentally correct manner is much more difficult.
A good adaptive system may also modify itself and refine its criteria according to
the success rate of its attempts at recognizing new patterns. This makes results
obtained in any period much more similar to the results of the optimization period, and it
also ensures greater homogeneity in performance. However, when an abnormal
abundance of unprecedented patterns occurs, as in the 2010 Greek debt crisis,
performance may be impacted more heavily. Different anomalies may have a positive
effect, like the outbreak of the subprime crisis in 2008, the dot-com bubble in 1999-2000, or the Wall Street Crash of 1929, because these events are associated with strong trends, which contribute to increasing the profits of trend-following strategies.
The third question can be answered with a comparison of the results obtained in
a real account with those obtained in a backtest in the same period. This comparison
enables different types of analysis about point-to-point similarity, local similarity, and
global similarity.
Although there is no threshold that defines whether the similarity is good enough, one can use as a criterion a 9 to 10 ratio of the annual average returns, or 0.9^T for the ratio of the historical balances, where “T” is the time in years.
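As a worked example of this criterion (the 0.9 factor comes from the text above; the specific numbers are only illustrative), the tolerated divergence grows with time:

def min_acceptable_balance_ratio(years, annual_ratio=0.9):
    # Real balance / backtest balance should stay at or above 0.9 ** T.
    return annual_ratio ** years

print(min_acceptable_balance_ratio(1))   # 0.9
print(min_acceptable_balance_ratio(3))   # about 0.73
print(min_acceptable_balance_ratio(7))   # about 0.48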
If the answers to the 3 questions are positive, in the sense that backtests have been
shown to be reliable representations of what can be expected in real trading, and that
optimizations on a specific period are capable of determining setups that remain
profitable for long periods of time different from that on which the optimization was
conducted, then one has clear formal corroboration that the strategy is appropriate for
real trading.
This procedure has the advantage of being highly reliable, because it directly
compares real trading with backtesting. On the other hand, it does exhibit some
drawbacks, one of which is that one needs at least a few months to gather a reasonable
sample of real trading results for this comparison.
One can also estimate the dissimilarity between backtesting and real trading over periods as long as several years, without having to wait for several years. A simple and efficient way of doing this consists in comparing the results of a backtest done on tick-by-tick data with the results of a backtest on the same data, except with the internal ticks of its 1-minute or 5-minute candles removed.
This type of comparison grants greater control over various variables, allowing
one to know the magnitude of the differences produced in balance as a function of
differences in execution latency, slippage, spread variations, etc. One can also compare backtests done on data from two or more different sources, or on data with real ticks versus artificial ticks, among other possibilities, in order to reach different goals.
If the goal is to estimate the probable maximum dissimilarity expected between backtesting and real trading, then comparing backtests on tick-by-tick data to backtests on data with empty candles is a suitable method. This is because the difference between a backtest with empty candles and a backtest with every tick (and an average delay of 5 seconds in each trade) will presumably be larger than the difference between real trading and backtesting with every tick.
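A rough sketch of how an “empty candle” series could be derived from tick data for such a comparison (assuming a hypothetical list of (timestamp, price) ticks; real platforms generate these candle models internally):

def empty_candles(ticks, minutes=15):
    # ticks: list of (datetime, price) pairs, sorted by time.
    # Keep only the first tick (the open) of each candle and discard all
    # internal ticks, which is roughly what a backtest on empty candles sees.
    candles, current_bucket = [], None
    for t, price in ticks:
        bucket = t.replace(minute=(t.minute // minutes) * minutes, second=0, microsecond=0)
        if bucket != current_bucket:
            candles.append((bucket, price))
            current_bucket = bucket
    return candles

# Hypothetical comparison: run the same backtest on `ticks` and on empty_candles(ticks),
# then compare the resulting balance curves and trade counts.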
The two graphs below allow for a comparison of backtests of Saturn V between
4/5/2007 and 30/4/2014, on EURUSD data from Dukascopy. In the first graph, we used
tick-by-tick data, while in the second we used empty 15-minute candles:
We can see that the similarity between the two graphs is very high, not only in the big picture but also in every small stretch. The difference in the total number of trades is also small (less than 1%), and we can see that the final balance, although fairly similar, is lower with the empty-candle data than with the tick-by-tick data. This means that the projections are underestimates. That is, if results from empty-candle data relate to results from tick-by-tick data in the same way that results from tick-by-tick data relate to results from real trading, then real profits are expected to be slightly higher than the projections based on backtest results.
There are other measures that may be taken to make backtests more difficult
than real trading, in order to force real results to have a good probability of being better
than the forecasts. For example, one can use wider spreads or higher commissions
than those seen in real trading, or a combination of both.
In this example, the comparison confirmed the expectations and the strategy was
successfully validated (actually, it had been validated in 2010, based on other tests).
However, most of the time, results obtained in the validation phase are negative, calling
for many improvements in testing methodology and noise filtering in the historical data
before a reasonable level of similarity between backtesting and real trading is reached.
In order to achieve a good level of similarity, backtests must take the following factors into account:
- Spreads (difference between the bid and the ask prices).
- Commissions, custody and settlement fees, and swaps.
- Latency in execution and slippage.
- Penetration in the order book.
Delays in execution sometimes result in slightly better execution prices,
sometimes slightly worse, but the penetration in the order book always results in worse
prices than the bid or the ask. For this reason, when dealing with strategies that involve
short trades (something like 1% profit or loss in each trade), it is fundamental that
backtests are accurate representations of real trading in every detail. When dealing with
strategies that involve longer trades done with relatively small volumes in comparison to
the market’s liquidity, the relative importance of details such as the penetration in the
order book diminishes.
On the other hand, if traded volumes are large enough to influence prices and penetrate deep into the order book, backtesting simulations must also represent these factors as precisely as possible.
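As an illustration of how the factors listed above might enter a simulation (the numbers and function names are hypothetical, not taken from any specific platform), one possible fill-price model is:

import random

def fill_price(mid_price, side, spread=0.0002, max_slippage=0.0001, book_impact=0.0):
    # side: +1 for a buy (filled on the ask side), -1 for a sell (bid side).
    # Half the spread and the order-book penetration always worsen the price;
    # slippage from execution delays can go either way.
    half_spread = spread / 2.0
    slippage = random.uniform(-max_slippage, max_slippage)
    return mid_price + side * (half_spread + book_impact) + slippage

def net_trade_result(entry_mid, exit_mid, side, volume, commission_per_lot=7.0):
    # side: +1 for a long trade, -1 for a short trade; volume in lots (hypothetical units).
    entry = fill_price(entry_mid, side)
    exit_ = fill_price(exit_mid, -side)
    gross = side * (exit_ - entry) * volume
    return gross - commission_per_lot * volume   # commissions and fees reduce the result

For large volumes, book_impact would itself grow with the traded volume, reflecting deeper penetration into the order book.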
With good-quality historical price data and properly conducted backtests, one can test the quality of a strategy fairly rigorously and know beforehand what levels of risk and return to expect when it is used in real trading. One is able to take control of the many variables involved and verify how the strategy behaves under different conditions.
However, automation does not work miracles; it does not transform a losing
strategy into a profitable one. Automation allows one to extract the best from each
strategy, and to get to know its real probabilities of success before executing it in real
trading. Human imagination still plays the main role in every stage, especially in the
most important one: the creation of the strategy.
In 2006, I analyzed more than 500 strategies available on investment websites, as well as about 2,000 variants of the original strategies. I tested thousands of setups for each one of them, but none proved to be profitable in the long run, and only two of them showed signs that they could be profitable if exhaustively optimized. In 2010 and 2014, I conducted new tests with the strategies that had received the most favorable user reviews on specialized websites, and the results were no better than in 2006.
These facts reveal that despite the immense processing power provided by the
use of computers, compounded by the advantages of using genetic algorithms in
optimizations, it is still very difficult and rare to find a strategy that is truly efficient,
consistent, and stable in the long run. Hence, to the vast majority of people, the
automation of their favorite strategies does not actually fulfill the goal of profiting through
their use; in the end, tests only reveal that their beloved strategies do not work.
Although this reality seems bitter to most, its effects are beneficial, because they help avoid large monetary losses caused by the use of losing strategies. As disappointing
as it is to watch your favorite strategy bring your balance to zero in backtests, it is much
better than watching the same happen to your real account.
Bibliographical References:
http://dlmf.nist.gov
http://mathworld.wolfram.com
https://www.statsoft.com/textbook
http://www.wikipedia.org
http://www.saturnov.com
Historical price data:
http://www.olsendata.com
http://www.dukascopy.com
https://www.globalfinancialdata.com