High-Dimensional Index Tracking

High-dimensional Index Tracking with Cointegrated assets using a hybrid Genetic Algorithm

Daniele Bianchi (a, b, *), Antonio Gargano (a, c)

(a) Department of Finance, Bocconi University, Milan, Italy
(b) IROM Department, Red McCombs School of Business, University of Texas at Austin, USA
(c) Rady School of Management, UCSD, USA
Abstract
In this paper we present a two-step procedure to solve a large class of portfolio management problems, namely (active) index tracking in high-dimensional spaces. The goal is to track the trajectory of broad indexes rather than their returns, as is usually done. Cointegration between the index and the constituent price levels is the key criterion for stock selection. The binary stock selection problem, which is NP-hard, is solved efficiently by heuristic procedures that use the Augmented Dickey-Fuller test as decision rule. The procedure is completed by selecting the optimal set of weights, with an objective defined as a function of the normalized trajectories of the index and of the tracking portfolio. Transaction costs are taken explicitly into account by introducing an L1-penalization in this objective function. The approach is tested, on the basis of several distance and performance measures, against other well-established index tracking procedures and on three different indexes. We also consider several cardinality sizes and different portfolio rebalancing horizons, consistent with the changing nature of broad-based indexes. We find that the index trajectory can be tracked satisfactorily. More importantly, we outperform the benchmark return-based procedures with respect to several distance and performance measures.
This draft: March 14, 2011. We thank Paolo Colla, Fabrizio Leisen and Eva Besada for helpful comments and suggestions.
* Corresponding author. Email address: [email protected] (Daniele Bianchi)
1. Introduction
As opposed to traditional fund managers, who seek to beat the market through all sorts of stock picking strategies, trackers attempt to match the risk-return profile of a benchmark index. Index tracking funds became a popular investment vehicle in the 1990s, when a handful of investment banks began offering these products to small investors. Since then, ETFs and tracker funds have seen a sharp increase in volume, which almost doubled during the 2000s and is still increasing. Index tracking is usually related to market efficiency, as shown in Jensen (1968) among others. Indeed, empirical findings seem to support the scarce risk-adjusted profitability of traditional funds, especially once transaction costs are involved, which can lead to losses even in bull markets (see e.g. Barber and Odean (2000)). Tracker funds thus represent a useful alternative investment method, even where capital market efficiency does not hold, to get around the perceived drawbacks of more aggressive vehicles.
Furthermore, a well-designed index tracking methodology can be used to track not only equity indexes but also commodity and bond indexes, such as the Goldman Sachs Commodity Index, which usually have countercyclical risk-return profiles with respect to the stock market. Finally, index tracking allows the investor to focus on a specific asset, industry or geographical sector while still obtaining a reliable amount of diversification within it.
The simplest way to replicate an index is to invest in all of the assets composing it. This is called full replication. Although it is, in theory, the most accurate approach, full replication is not only cumbersome but also rather costly. Indeed, trading and monitoring costs hamper this approach, especially for high-dimensional indexes, since the index composition varies over time and requires frequent rebalancing. Another relevant approach is synthetic replication through equity derivatives such as futures contracts. Taken individually, these usually carry lower transaction costs. However, rolling contracts to track the underlying index dynamically is rather expensive and risky, which makes equity derivative strategies less attractive. Finally, an investor might consider partial replication of a benchmark index. Partial indexing is the core business of ETFs and tracker funds, and it is the framework we focus on in this paper. Managers pursuing a partial index tracking strategy encounter two main problems: (1) selecting the optimal set of stocks to create the tracker index, and (2) optimally quantifying the amount of wealth to invest in each of the selected stocks.
In the present paper we propose a computationally tractable solution for the design of near-optimal replication strategies in which the investor limits the number of assets used to track the benchmark index. As opposed to most of the literature, the aim is to track the trajectory of a benchmark index instead of its returns. We develop a two-step procedure in which the key ingredient is cointegration between the index and the tracker fund constituents. As pointed out in the seminal paper by Alexander and Dimitru (2005), the usual correlation-based procedures produce relatively unstable tracking portfolios. Alexander (2001), Alexander and Dimitru (2005) and Dunis and Ho (2005) propose to use cointegration to capture stable long-run equilibria between the tracker fund and the benchmark index. Their methodology, however, treats stock picking as a kind of black-box procedure. Furthermore, they do not consider high-dimensional benchmarks.
We extend the reference literature in several directions. First (1), we treat the stock picking procedure explicitly, developing a hybrid genetic algorithm to extract the strongest cointegration relationships. In this sense we address the computational burden of making cointegration-based stock picking explicit. Then (2), we consistently structure an objective function to track the normalized trajectory of a benchmark index. This is partly similar to Focardi and Fabozzi (2004) and Dose and Cincotti (2005). However, they apply clustering methods for stock picking instead of cointegration, and they track the trajectory without explicitly considering transaction costs as part of the objective function. Finally (3), we provide further insight into the joint effect of cointegration-based stock picking and trajectory tracking, disentangling their benefits relative to correlation/returns-based procedures.
The structure of the paper is as follows. Section 2 formalizes the reason why we should rely on trajectory approximations instead of returns, providing some insight into why the usual returns-based distance measures turn out to be suboptimal. Section 3 deals with the nature of the index tracking problem, describing the risk measure used in the minimization procedure, together with the transaction costs and the optimization program as a whole. Section 4 presents the genetic algorithm, the portfolio simulation strategy, and the performance measures used to check the reliability of the portfolio construction. Section 5 presents the benchmark index tracking methodologies against which we compare our index tracking algorithm. Section 6 describes the sample selection and the dataset. Finally, Sections 7 and 8 report, respectively, the empirical results and the concluding remarks.
2. Setting objectives under classical distance measures: why trajectory instead
of returns
Let us consider a general benchmark index with constituent assets $\{k_i\}_{i=1}^{K}$. Each stock has price $p_i(t) \in (0, \infty)$, and $c_i(t)$ denotes the proportion invested in the ith asset, such that the ith weight at time $t$ is defined as

$$ w_i(t) = \frac{c_i(t)\, p_i(t)}{\sum_{j \in \Phi} c_j(t)\, p_j(t)} \qquad \text{such that} \qquad w_i(t) \in (0, 1) \tag{1} $$
The population level of the index and of the tracker portfolio can be defined, respectively, as linear combinations of prices:

$$ I(t) = \sum_{i=1}^{K} c_i(t)\, p_i(t) \qquad \text{and} \qquad V(t) = \sum_{i \in \Xi} c_i(t)\, p_i(t) \tag{2} $$
with $\Xi$ the subset of $N < K$ selected stocks. Consider the usual discrete-time returns definitions, $r_I(t) = (I(t) - I(t-1))/I(t-1)$ for the index and $r_i(t) = (p_i(t) - p_i(t-1))/p_i(t-1)$ for the ith stock, together with the usual mean squared loss function

$$ MSE(r_I, r_p) = \frac{1}{T} \sum_{t=1}^{T} \left( r_p(t) - r_I(t) \right)^2 = \frac{1}{T} \sum_{t=1}^{T} \left( \sum_{i=1}^{N} w_i(t)\, r_i(t) - r_I(t) \right)^2 \tag{3} $$

It is easy to see that (1), (2) and the returns definition together introduce high nonlinearity in the index tracking problem (3). Indeed, this represents one of the most appealing motivations behind the index tracking literature.
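As a minimal illustration of definitions (1) to (3), the following Python sketch computes value weights, index and tracker levels, and the returns-based MSE. The function names, the toy prices and the holdings are hypothetical and chosen only for illustration; they are not taken from the paper's dataset.

```python
def weights(holdings, prices):
    """Eq. (1): value weights w_i(t) = c_i(t) p_i(t) / sum_j c_j(t) p_j(t)."""
    values = [c * p for c, p in zip(holdings, prices)]
    total = sum(values)
    return [v / total for v in values]


def level(holdings, prices):
    """Eq. (2): level of an index or tracker, sum_i c_i(t) p_i(t)."""
    return sum(c * p for c, p in zip(holdings, prices))


def simple_returns(series):
    """Discrete returns r(t) = (x(t) - x(t-1)) / x(t-1)."""
    return [(b - a) / a for a, b in zip(series, series[1:])]


def returns_mse(tracker_returns, index_returns):
    """Eq. (3): mean squared deviation between tracker and index returns."""
    T = len(index_returns)
    return sum((rp - ri) ** 2 for rp, ri in zip(tracker_returns, index_returns)) / T


# Hypothetical toy data: 3 stocks observed on 4 dates, constant holdings.
prices = [[10.0, 20.0, 30.0], [11.0, 19.0, 31.0], [12.0, 21.0, 30.0], [12.5, 22.0, 29.0]]
index_holdings = [1.0, 1.0, 1.0]    # all K = 3 constituents
tracker_holdings = [2.0, 0.0, 1.0]  # subset Xi = {stock 0, stock 2}

index_levels = [level(index_holdings, p) for p in prices]
tracker_levels = [level(tracker_holdings, p) for p in prices]
mse = returns_mse(simple_returns(tracker_levels), simple_returns(index_levels))
```

Because the weights in (1) depend on prices, the tracker return is a ratio of sums of prices, which is where the nonlinearity mentioned above comes from.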
Now, considering price dynamics, the index clearly readjusts over time. This stems not only from the endogenously changing nature of prices; exogenous factors such as revisions of the index composition, share issues and repurchases, as well as spin-offs, also play a relevant role (see Frino et al. (2004) for more details).
Different strategies have been implemented in the literature. A first possible strategy is to keep $c_i(t)$ constant over time in $w_i(t)$ throughout the tracking period. This procedure, however, involves high nonlinearity and non-trivial optimization behavior, because the dynamics of $p_i(t)$ make the portfolio weights evolve according to

$$ w_i(t) = \frac{c_i\, p_i(t)}{\sum_{j=1}^{K} c_j\, p_j(t)}, \qquad \forall i = 1, \ldots, K \tag{4} $$
This limits the applicability of classical exact optimization procedures. An alternative strategy, proposed in Ammann and Zimmermann (2001) and exploited in Ruiz-Torrubiano and Suàrez (2009), is to hold $w_i(t)$ constant by actively managing $c_i(t)$, such that

$$ w_i = \frac{c_i(t)\, p_i(t)}{\sum_{j=1}^{K} c_j(t)\, p_j(t)}, \qquad \forall i = 1, \ldots, K \tag{5} $$
This second strategy has the merit of keeping the optimization problem quadratic and computationally feasible, so that exact solutions can be obtained. However, using the MSE as a loss function, with the tracker returns approximated as $r_p(t) = \sum_{i=1}^{N} w_i(t)\, r_i(t)$, entails $c_i(t) \approx c_i(t-1)$. This implies a relevant approximation/measurement error, especially when minimizing returns-based measures (see Ruiz-Torrubiano and Suàrez (2009), Derigs and Nickel (2004) and Beasley et al. (2003) among others). This concept is formalized in Proposition 1. A textbook proof is provided in the appendix.
Proposition 1. Consider the returns-based MSE minimization defined in (3). Define the index returns as $r_I(t) = (I(t) - I(t-1))/I(t-1)$ and the returns of the ith stock as $r_i(t) = (p_i(t) - p_i(t-1))/p_i(t-1)$. Then assuming that the portfolio returns are defined as $r_p(t) = \sum_{i=1}^{N} w_i(t)\, r_i(t)$ entails $c_i(t) \approx c_i(t-1)$. This introduces a relevant approximation error, since it implies $V(t) = \sum_{i=1}^{N} c_i(t)\, p_i(t) \equiv \sum_{i=1}^{N} c_i(t-1)\, p_i(t)$ in $[t-1, t)$.
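The content of Proposition 1 can be checked numerically with a small sketch. Under the assumption that weights are measured at the beginning of the period, the weighted sum of stock returns reproduces the change in the portfolio level only when holdings are unchanged between $t-1$ and $t$. All numbers and function names below are hypothetical.

```python
def weighted_return(holdings_prev, p_prev, p_now):
    """Sum_i w_i r_i, with weights computed from holdings and prices at t-1."""
    value_prev = sum(c * p for c, p in zip(holdings_prev, p_prev))
    w_prev = [c * p / value_prev for c, p in zip(holdings_prev, p_prev)]
    r = [(b - a) / a for a, b in zip(p_prev, p_now)]
    return sum(w * ri for w, ri in zip(w_prev, r))


def level_change(holdings_prev, holdings_now, p_prev, p_now):
    """Relative change of V(t) = sum_i c_i(t) p_i(t) between t-1 and t."""
    v_prev = sum(c * p for c, p in zip(holdings_prev, p_prev))
    v_now = sum(c * p for c, p in zip(holdings_now, p_now))
    return v_now / v_prev - 1.0


# Hypothetical two-stock example.
p_prev, p_now = [10.0, 20.0], [11.0, 19.0]
c_prev = [3.0, 1.0]
c_now = [2.0, 1.5]  # holdings actively changed at t

# Holdings kept fixed: the two quantities coincide (up to rounding).
gap_fixed = level_change(c_prev, c_prev, p_prev, p_now) - weighted_return(c_prev, p_prev, p_now)
# Holdings changed: the weighted-return formula no longer matches the level.
gap_active = level_change(c_prev, c_now, p_prev, p_now) - weighted_return(c_prev, p_prev, p_now)
```

In this toy case `gap_fixed` is zero up to floating-point error, while `gap_active` is not, which is the approximation error the proposition refers to.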
Another relevant issue with returns is information. Computing returns implies detrending the price levels, which means losing a relatively important amount of information, especially about the long-run behavior of the index and of the shares. Finally, the usual distance measures are strictly related to the correlation between the index and the tracker fund. Indeed, it is quite straightforward to see that index tracking problems can usually be restated as least squares minimizations, i.e. regressions (see Fan et al. (2008) and Brodie et al. (2008) for more details). Thus, the well-known time-varying nature of correlation represents a further issue that limits the usefulness of returns-based measures with respect to more general distance functionals.
The methodology we propose aims to track the trajectory of the index instead of its returns. Price levels are non-stationary, so the usual correlation measures do not apply. Some relevant examples of distance functionals are given in Focardi and Fabozzi (2004) and Dose and Cincotti (2005), who use clustering techniques based on a modified correlation measure for integrated random variables. Our methodology is instead based on the concept of cointegration.
Definition 1. A stochastic process $X_t$ which satisfies $X_t - E(X_t) = \sum_{i=0}^{\infty} C_i\, \epsilon_{t-i}$ is called I(0), with $\sum_{i=0}^{\infty} C_i \neq 0$ and $\epsilon_t \sim \text{i.i.d. } \pi(0, \Omega)$ for a general multivariate distribution $\pi(\cdot)$. Now, we call $X \sim I(1)$ cointegrated, with cointegrating vector $\beta \neq 0$, if $\beta' X \sim I(0)$, i.e. if it can be made stationary.
Strictly speaking, two non-stationary random variables are cointegrated if there exists a non-trivial linear relationship between them which is stationary. We exploit the concept of cointegration in the univariate framework developed in Granger (1981), Granger (1983) and Engle and Granger (1987). We deal more explicitly with univariate cointegration in Section 3. Pioneering index tracking applications of cointegration can be found in Alexander (2001), Alexander and Dimitru (2005) and Dunis and Ho (2005).
3. Problem Formulation
Let us consider a market with $N$ securities with price $p_i(t)$ at each time $t$. We denote by $I(t)$ the level of the index at time $t$. Time is discrete, and at each step the objective is to select the optimal subset of $n < N$ stocks such that some risk measure is minimized. Now $w_i(t) \in (0, 1)$ represents the fraction of the portfolio value held in the ith stock at the beginning of period $t$. We consider a dynamic setting in which the portfolio positions are readjusted over time as new market information arrives. The composition of the tracking portfolio does not change, meaning that we assume the stocks selected at time $t$ are still available at time $t+h$, with $h$ the revision investment horizon. This is reasonable as long as $h$ is kept sufficiently short. We implicitly apply a historical look-back approach as in Beasley et al. (2003). The assumption is that the past contains enough information to capture the potential future dynamics of the index. This is reasonable as long as we do not estimate covariances. We make no predictions, which are outside the scope of the paper. Instead, we apply population heuristics to find equilibria between the index and the tracker fund (cointegration) and numerical procedures to obtain the optimal vector of weights (objective).
3.1. Risk-measure and transaction costs
Several risk measures have been proposed in the literature. Most of them are based on correlation measures or on estimates of the variance of tracking deviations. The latter are, however, flawed. As noted in Beasley et al. (2003), if the difference between the index path and that of the tracking portfolio is constant over time, then the tracking error is zero. This is an undesirable result because it does not take the tracking bias into account. In the current investigation we consider a weighted average of the root mean squared error of the tracking deviation and the tracker excess return, which penalizes constant differences (see also Beasley et al. (2003) and Ammann and Zimmermann (2001)). The general baseline risk function is a modified version of Gaivoronski and Van der Wijst (2005), defined as follows
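$$ Q(w, \mu_t, I_t) = \left\{ \lambda \sqrt{ E\left[ \left( \frac{\mu_t}{\mu_{t-1}} - \frac{I_t}{I_{t-1}} \right)^2 \right] } + (1 - \lambda)\, E\left[ \frac{\mu_t}{\mu_{t-1}} - \frac{I_t}{I_{t-1}} \right] \right\} \tag{6} $$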
where $\mu_t = \tilde{p}_t w$ is the trajectory of the tracking portfolio and $\tilde{p}_t = p_t / p_0$ is the matrix of normalized prices. Notice that $\lambda \in [0, 1]$ represents an implicit trade-off between tracking error and excess return. For instance, $\lambda = 1$ corresponds to minimizing the tracking error, i.e. pure index tracking, while imposing $\lambda = 0$ implies maximizing the excess return. In the empirical experiment we solve the optimization problem for different values of $\lambda$ over its admissible domain.
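A minimal sketch of the sample analogue of the risk function (6) is given below, with expectations replaced by averages over time. The function name `tracking_risk` and the toy data are hypothetical, and the sign of the excess-return term follows (6) exactly as written above.

```python
import math


def tracking_risk(w, prices, index, lam):
    """Sample analogue of Eq. (6).

    prices: list of rows (one per date) of stock prices;
    index:  list of index levels on the same dates;
    lam:    trade-off parameter lambda in [0, 1].
    """
    p0 = prices[0]
    # Normalised trajectory mu_t = p~_t w, with p~_t = p_t / p_0.
    mu = [sum(wi * p / p_init for wi, p, p_init in zip(w, row, p0)) for row in prices]
    # Difference between tracker and index gross returns at each date.
    diffs = [mu[t] / mu[t - 1] - index[t] / index[t - 1] for t in range(1, len(index))]
    T = len(diffs)
    rmse = math.sqrt(sum(d * d for d in diffs) / T)
    excess = sum(diffs) / T
    return lam * rmse + (1.0 - lam) * excess


# Hypothetical data: the first portfolio replicates the index path exactly.
prices_demo = [[10.0, 20.0], [11.0, 22.0], [12.1, 24.2]]
index_demo = [100.0, 110.0, 121.0]
q_perfect = tracking_risk([0.5, 0.5], prices_demo, index_demo, lam=1.0)

# A tracker growing by 20% per period against an index growing by 10%.
prices_fast = [[10.0, 20.0], [12.0, 22.0], [14.4, 24.2]]
q_fast = tracking_risk([1.0, 0.0], prices_fast, index_demo, lam=1.0)
```

With `lam=1.0` only the root mean squared deviation matters, so the perfectly tracking portfolio scores zero and the faster-growing one scores the constant gap of 0.1.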
The extension of (6) that we develop also takes transaction costs directly into account in the decision problem, as proposed by Derigs and Nickel (2004) and Adcock and Meade (1994). Transaction costs are treated as a sort of second objective, which allows us to consider those portfolios that are efficient with respect to both the risk measure and transaction costs. This not only helps to regularize the optimization algorithm but, above all, helps the fund manager to discriminate against those stocks with high transaction costs coming from, for instance, high liquidity risk and hence high bid-ask spreads. As in Brodie et al. (2008), transaction costs are modeled as
$$ TC(w, \bar{w}, s) = \sum_{i=1}^{n} s_i\, |w_i - \bar{w}_i| \tag{7} $$

where $s_i$, $w_i$ and $\bar{w}_i$ are, respectively, the per-share transaction cost, the optimal weights to be chosen and the old optimal weights. Notice that $\bar{w}_i = 0$ at the first date of investment. Now, let us define the information set available to the manager as $D_t = (\bar{w}, s, \mu_t, I_t)$. Then we can write the population version of the loss function as a combination of (6) and (7):

$$ L(w, D_t) = Q(w, D_t) + TC(w, D_t) \tag{8} $$
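The L1 transaction cost term (7) and its combination with a risk value as in (8) can be sketched in a few lines. The function names and the cost figures below are hypothetical; `risk` stands for any already computed value of $Q$.

```python
def transaction_cost(w_new, w_old, s):
    """Eq. (7): sum_i s_i * |w_i - w_bar_i|."""
    return sum(si * abs(a - b) for si, a, b in zip(s, w_new, w_old))


def total_loss(risk, w_new, w_old, s):
    """Eq. (8): risk measure plus transaction costs."""
    return risk + transaction_cost(w_new, w_old, s)


# Hypothetical per-share costs for two stocks.
costs = [0.01, 0.02]
# First investment date: old weights are zero, so the whole position is traded.
tc_initial = transaction_cost([0.5, 0.5], [0.0, 0.0], costs)
# Later rebalancing: only the change in weights is charged.
tc_rebalance = transaction_cost([0.6, 0.4], [0.5, 0.5], costs)
```

Since the penalty is proportional to the absolute change in each weight, small reallocations that barely improve the risk term tend not to be worth their cost.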
3.2. Constraints and optimization problem
Portfolio selection has to deal with investment preferences and legal guidelines. Therefore the set $W$ of all potential portfolios should be restricted to a smaller set of feasible portfolios. From a mathematical perspective the source of a constraint is irrelevant; what matters is its structure. The most basic constraint is the budget constraint, i.e. $e'w = 1$, with $e$ a vector of ones. There are several other commonly considered manager-specific constraints. The simplest is the floor/ceiling trading constraint, defined as $L_i \leq w_i \leq U_i$, with $L_i$ and $U_i$ representing, respectively, the lower and upper bounds for each portfolio weight. With the same rationale we can define static bundle constraints, which set lower and upper bounds on trading in a certain industry or geographical area, i.e. $L_B \leq \sum_{i \in B} w_i \leq U_B$. Gross-exposure constraints not only have economic relevance but also help to regularize the optimization algorithm. Indeed, as shown in Jagannathan and Ma (2003) and Fan et al. (2008), no-short-sale constraints together with gross-exposure constraints help to minimize portfolio risk. This is not true in general; however, it is true if any dynamics or estimation is used in the optimization problem.
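The budget, floor/ceiling and bundle constraints described above can be checked for a candidate weight vector with a short helper. This is only a sketch: the function name `is_feasible`, the tolerance and the example bounds are assumptions made for illustration.

```python
def is_feasible(w, lower, upper, bundles=(), tol=1e-9):
    """Check budget, floor/ceiling and bundle constraints for weights w.

    bundles: iterable of (indices, L_B, U_B) triples, one per industry or
    geographical group B.
    """
    # Budget constraint: e'w = 1.
    if abs(sum(w) - 1.0) > tol:
        return False
    # Floor/ceiling constraints: L_i <= w_i <= U_i.
    if any(wi < lo - tol or wi > up + tol for wi, lo, up in zip(w, lower, upper)):
        return False
    # Bundle constraints: L_B <= sum_{i in B} w_i <= U_B.
    for indices, lo_b, up_b in bundles:
        exposure = sum(w[i] for i in indices)
        if exposure < lo_b - tol or exposure > up_b + tol:
            return False
    return True


# Hypothetical three-stock portfolio; stocks 0 and 1 belong to the same sector.
w_demo = [0.5, 0.3, 0.2]
ok = is_feasible(w_demo, [0.0] * 3, [0.6] * 3, bundles=[((0, 1), 0.0, 0.9)])
too_concentrated = is_feasible(w_demo, [0.0] * 3, [0.6] * 3, bundles=[((0, 1), 0.0, 0.7)])
```

Here the sector exposure is 0.8, so the portfolio passes with a sector cap of 0.9 and fails with a cap of 0.7.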
As in Beasley et al. (2003), Ruiz-Torrubiano and Suàrez (2009) and Derigs and Nickel (2004), we consider the stock selection problem embedded in the weight selection by introducing an auxiliary binary variable $z = \{z_i\}_{i=1}^{N}$, such that $z_i = 1$ if the ith stock is selected by the stock picking procedure and zero otherwise. From the population version defined in (6) and (8) we can write the empirical counterpart with the trading constraints as follows

$$
\begin{aligned}
w^* = \arg\min_{w \in W}\ & \left\{ \lambda \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( \frac{\tilde{p}_t w}{\tilde{p}_{t-1} w} - \frac{I_t}{I_{t-1}} \right)^2 } + (1 - \lambda)\, \frac{1}{T} \sum_{t=1}^{T} \left( \frac{\tilde{p}_t w}{\tilde{p}_{t-1} w} - \frac{I_t}{I_{t-1}} \right) \right\} + \sum_{i=1}^{N} s_i\, |w_i - \bar{w}_i| \\
\text{s.t.}\ & w \in C(w)
\end{aligned}
\tag{9}
$$

where

$$ C(w) = \left\{ w \;\middle|\; \sum_{i=1}^{N} z_i w_i = 1, \quad \sum_{i=1}^{N} |w_i z_i| = 1, \quad \sum_{i=1}^{N} z_i \leq n \right\} \tag{10} $$
with $n$ representing the user-defined maximum cardinality of the portfolio. As shown in Ruiz-Torrubiano and Suàrez (2009), including a cardinality constraint, which is discrete, leads to an NP-hard and therefore highly nonlinear problem. A possible approach is to treat the asset weights as a functional of $z$, such that the solution of (9), (10) is expressed as $\{w_i(z)\}_{i=1}^{n}$ for a fixed value of $z$. This is similar to the approach used in the present paper. The objective function used to find $z^*$ is, however, very different from (9). For that reason we separate the problem into two steps, namely stock picking (binary problem) and weight selection (minimizing (9) for a given $z$), such that the vector of weights turns out to be a functional of $z^*$.
The stock picking step is a binary model selection problem of the form

$$ z^* \in \arg\min_{z \in Z} R(I, P) \qquad \text{s.t.} \qquad \sum_{i=1}^{N} z_i = n, \quad \text{with } n < N \tag{11} $$

with $R(I, P)$ the loss function and $P \in \mathcal{P}$ the subset of constituent stocks considered, drawn from the set $\mathcal{P}$ of available assets. Notice that the loss used is a functional of the residuals of a generalised linear model, in the spirit of univariate cointegration.
In other words, for a fixed value of $n$ we solve (11) with respect to $R(I, P)$ and then plug $z^*$ into (9) to get $w^*$. The binary problem is solved by means of a genetic algorithm, and its risk function is explained in detail in the next section.
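To give a sense of why the binary problem (11) calls for heuristics, the sketch below counts the candidate subsets and solves a tiny instance by exhaustive enumeration. The loss used here is a hypothetical additive score, not the cointegration-based loss of the paper, and the function names are illustrative only.

```python
import itertools
import math


def n_candidate_subsets(N, n):
    """Number of ways to choose n stocks out of N, i.e. the size of the search space."""
    return math.comb(N, n)


def exhaustive_selection(N, n, loss):
    """Brute-force solution of the binary problem: only viable for very small N."""
    return min(itertools.combinations(range(N), n), key=loss)


# Hypothetical per-stock scores; the toy loss of a subset is the sum of its scores.
scores = [0.9, 0.1, 0.5, 0.3, 0.7]
toy_loss = lambda subset: sum(scores[i] for i in subset)
best_subset = exhaustive_selection(5, 2, toy_loss)

small_space = n_candidate_subsets(10, 3)
large_space = n_candidate_subsets(500, 25)  # e.g. 5% of a 500-stock index
```

For five stocks the ten candidate pairs can be enumerated directly, whereas choosing 25 stocks out of 500 already yields more than $10^{40}$ candidates, which rules out brute force.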
4. Stock Picking, Genetic Algorithm and the ADF loss function
The stock selection problem is a binary model selection problem, and it is in some sense nested in (9). Given the high non-stationarity of the random variables involved, classical Stochastic Search Variable Selection methods cannot be used efficiently in the usual least squares sense. This is especially true in high-dimensional spaces, i.e. for large indexes. This lack of efficiency of classical methodologies justifies the use of a Genetic Algorithm [GA]. Indeed, the GA allows us to solve the problem quite efficiently, helping to mitigate the computational burden. The loss function used in the GA is as follows.
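Let us consider the linear model

$$ \log(I_t) = \sum_{i=0}^{N} z_i\, \beta_i \log(p_{i,t}) + \epsilon_t, \qquad \epsilon_t \sim \pi(0, \sigma^2), \qquad \text{with} \qquad z_i = \begin{cases} 1 & \text{if the $i$th stock is selected} \\ 0 & \text{otherwise} \end{cases} \tag{12} $$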
where $I_t$ and $p_{i,t}$ represent, respectively, the index and the ith constituent prices, and $\pi(\cdot)$ is a general distribution function. Our purpose is to choose $z^*$ such that $\beta = \{\beta_1, \ldots, \beta_p\}$ represents the strongest equilibrium relationship between the index trajectory and the constituent stocks. By Definition 1, given that both the log of the index and the logs of prices are I(1) stochastic processes, if $\beta$ represents a cointegration relationship then $\epsilon_t \sim I(0)$, namely the residuals are stationary processes. The loss function is defined as
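$$ R(I, P) \equiv R(\epsilon) = \frac{\hat{\rho} - 1}{se(\hat{\rho})} \tag{13} $$

with $\rho$ a regression coefficient in the following auxiliary residual regression

$$ \epsilon_t = \rho\, \epsilon_{t-1} + \sum_{i=1}^{D} \Delta\epsilon_{t-i} + u_t \equiv x_t \beta + u_t \qquad \text{with} \qquad \Delta\epsilon_t = \epsilon_t - \epsilon_{t-1} \tag{14} $$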
where $se(\hat{\rho})$ and $D$ are, respectively, the standard error of $\hat{\rho}$ and the lag order. The null hypothesis is $H_0: \rho = 1$, i.e. there is a unit root. The univariate autoregressive model in (14) has neither an intercept nor a trend, since $E(u_t) = 0$. This is the well-known Augmented Dickey-Fuller [ADF] test. The aim is to select the vector $z^*$ such that $R(\epsilon)$ is minimized: the lower the ADF test statistic, the stronger the evidence of stationarity of the residuals, and hence the stronger the equilibrium relationship between the index and the subset of selected stocks. The ADF test has long been used in pairs trading algorithms and has more recently been applied to index tracking problems, as in Alexander (2001), Alexander and Dimitru (2005) and Dunis and Ho (2005) among others. It was originally developed to test for the presence of a unit root in a general stochastic process (see Engle and Granger (1987), Granger (1983) and Granger (1981) for more details on the ADF test and cointegration).
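The two regressions behind the loss in (12) to (14) can be sketched with the Python standard library alone. This is an illustrative implementation under stated assumptions, not the authors' code: the $i = 0$ term of (12) is treated as an intercept, the lagged differences in the auxiliary regression receive estimated coefficients as in the standard ADF regression, least squares is solved through the normal equations, and the simulated data are hypothetical.

```python
import math
import random


def _solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        piv = M[col][col]
        M[col] = [v / piv for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]


def ols(X, y):
    """Least squares via normal equations: (coefficients, residuals, inv(X'X))."""
    k = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    identity = [[1.0 if a == b else 0.0 for b in range(k)] for a in range(k)]
    inv = _solve(XtX, identity)
    beta = [sum(inv[a][b] * Xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    return beta, resid, inv


def adf_loss(log_index, log_prices, lags=1):
    """Statistic (rho_hat - 1) / se(rho_hat) computed on cointegrating residuals."""
    # Step 1: cointegrating regression (12), with an intercept column (assumption).
    X = [[1.0] + list(row) for row in log_prices]
    _, eps, _ = ols(X, log_index)
    # Step 2: auxiliary regression (14) of eps_t on eps_{t-1} and lagged differences.
    d = [eps[t] - eps[t - 1] for t in range(1, len(eps))]  # d[t-1] = delta eps_t
    rows, target = [], []
    for t in range(lags + 1, len(eps)):
        rows.append([eps[t - 1]] + [d[t - 1 - i] for i in range(1, lags + 1)])
        target.append(eps[t])
    beta, u, inv = ols(rows, target)
    sigma2 = sum(v * v for v in u) / (len(target) - len(beta))
    return (beta[0] - 1.0) / math.sqrt(sigma2 * inv[0][0])


# Hypothetical simulated log prices: two random walks plus an unrelated one.
rng = random.Random(42)
p1, p2, other = [0.0], [0.0], [0.0]
for _ in range(299):
    p1.append(p1[-1] + rng.gauss(0, 0.01))
    p2.append(p2[-1] + rng.gauss(0, 0.01))
    other.append(other[-1] + rng.gauss(0, 0.01))
log_prices = [[a, b] for a, b in zip(p1, p2)]
tracked = [0.5 * a + 0.5 * b + rng.gauss(0, 0.01) for a, b in zip(p1, p2)]
stat_cointegrated = adf_loss(tracked, log_prices)
stat_unrelated = adf_loss(other, log_prices)
```

The series built as a combination of the two stocks plus stationary noise yields a much more negative statistic than the unrelated random walk, which is the ordering the stock picking step exploits.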
4.1. Genetic Algorithm
As explained in the last section, the loss function used to select the optimal indicator vector $z^*$ is highly nonlinear. Finding the best indicator vector and plugging it into (9) is therefore not manageable with either the usual stochastic search or brute force algorithms. Population heuristics are then used as a computational tool to solve the model selection problem in (11) quickly and fairly precisely. As mentioned above, the model selection issue is to find the most cointegrated subset of stocks, where the degree of cointegration is a functional of the residuals of the linear model in (12). The Genetic Algorithm used is developed as follows.

Algorithm 1 Genetic Algorithm
  Generate initial population P; initialize p_mut and p_cross
  while stopping criteria not met do
      Select P′ ⊂ P (mating pool); set P″ = ∅ (set of children)
      for i = 1 to n_P (population size) do
          Select individuals m_a and m_b at random from P′
          if u(0, 1) < p_cross then
              cross-over: set m_c = m_a ∧ m_b
          else
              do not cross-over: set m_c = m_a ∨ m_b
          end if
          if u(0, 1) < p_mut then
              mutate m_c into µ
          else
              do not mutate: set µ = m_c
          end if
          set P″ = P″ ∪ µ
      end for
      set P = P″
  end while
  Return first row of P

$P$ and $P''$ are $n \times k$ matrices, where each row represents a portfolio, i.e. a vector of integers from 1 to $N$, where $N$ is the number of stocks in the index. Intuitively, each integer represents a stock. Similarly, $P'$ is a submatrix of $P$ whose row size depends on the parameters discussed below. When the solutions are taken individually as $1 \times k$ vectors, they are labeled with $m$. The asymptotic convergence to the global solution relies
on the schema theorem by Holland (1974) and on Moral and Miclo (1999), who use the possibility of constructing a Markov transition matrix from one generation to the next in order to apply standard Markov chain theory. In practice, however, as the population size or the number of generations increases (i.e. as the problem becomes more difficult), convergence is harder to reach in a reasonable amount of time. Therefore the parameters of the Genetic Algorithm are often chosen according to general guidelines presented in the literature and by calibrating the algorithm to the problem, mainly by trial and error (see Chen (2002)). Following Jong (1975) and Haupt and Haupt (2004), the values of $p_{cross}$ and $p_{mut}$ have been set to 0.9 and 0.3. This allows the algorithm to explore a wide set of initial solutions. As suggested by Sivanandam and Deepa (2008), they decrease over time in order to make the algorithm converge faster.
Given its optimal properties, uniform crossover is adopted. When a solution is mutated, $n < N$ genes (stocks) are replaced by $n$ randomly selected genes (stocks). None of the replacing elements may belong to the initial chromosome (portfolio), since a duplicate would generate a non-invertible regressor matrix. In order to improve the performance of the algorithm, an elitism operator is used by setting the row size of $P'$ to 0.5 times that of $P$; this implies that only the better half of the current population is used to breed the next generation. The algorithm stops when either at least half of the $P$ matrix is populated by the same model, or the best model has remained the same for more than half of the total number of generations. In both cases, continuing to run the algorithm would be useless because it has most probably already converged towards a solution. In light of these considerations, the only parameter to be tuned by trial and error is $n_P$ (population size). This is crucial: if it is too small, there is not enough genetic diversity, to the detriment of the quality of the final solution; if it is too big, too many iterations are performed before the algorithm eventually converges. A set of different population sizes is chosen; for each value in this range the algorithm is run ten times and the median fit function is computed. These values are then plotted against the population sizes, and the value at which the median fit function starts converging is chosen. The last point to be discussed is how we select the fifty portfolios. Instead of running the algorithm fifty times and picking the best solution each time, we run it once, rank all unique solutions and select the first 50. This is done not only to reduce the computational burden but also to obtain a more diversified set of final portfolios.
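The steps described in this section can be sketched as a compact Python routine. It is an illustration under several assumptions rather than the authors' implementation: portfolios are encoded as sorted tuples of distinct stock indices, uniform crossover is adapted to index sets, a parent is copied unchanged when no crossover occurs, the probabilities are kept constant instead of decreasing over time, and the fitness used in the example is a hypothetical toy function instead of the ADF loss.

```python
import random


def genetic_search(fitness, N, k, pop_size=40, p_cross=0.9, p_mut=0.3,
                   n_mut=1, max_gen=200, rng=None):
    """Minimise `fitness` over k-subsets of range(N); returns (best, final population)."""
    rng = rng or random.Random(0)

    def random_portfolio():
        return tuple(sorted(rng.sample(range(N), k)))

    def crossover(a, b):
        # Keep genes shared by both parents, fill the remaining slots at
        # random from genes carried by exactly one parent (no duplicates).
        shared = set(a) & set(b)
        rest = list(set(a) ^ set(b))
        rng.shuffle(rest)
        return tuple(sorted(shared | set(rest[:k - len(shared)])))

    def mutate(m):
        # Replace n_mut genes by stocks that are not already in the portfolio.
        keep = list(m)
        rng.shuffle(keep)
        outside = [i for i in range(N) if i not in m]
        return tuple(sorted(keep[n_mut:] + rng.sample(outside, n_mut)))

    population = [random_portfolio() for _ in range(pop_size)]
    best, stall = None, 0
    for _ in range(max_gen):
        population.sort(key=fitness)
        if population[0] == best:
            stall += 1
        else:
            best, stall = population[0], 0
        # Stop if half the population is identical or the best model has stalled.
        most_common = max(population.count(m) for m in set(population))
        if most_common >= pop_size / 2 or stall > max_gen / 2:
            break
        pool = population[:pop_size // 2]  # elitism: better half breeds
        children = []
        for _ in range(pop_size):
            ma, mb = rng.choice(pool), rng.choice(pool)
            mc = crossover(ma, mb) if rng.random() < p_cross else ma
            children.append(mutate(mc) if rng.random() < p_mut else mc)
        population = children
    population.sort(key=fitness)
    return population[0], population


# Hypothetical toy fitness: prefer portfolios made of low-numbered stocks.
toy_fitness = lambda m: sum(m)
best, final_population = genetic_search(toy_fitness, N=30, k=5)
```

Ranking the unique rows of `final_population` by fitness gives a set of distinct candidate portfolios from a single run, in the spirit of the selection of the fifty portfolios described above.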
4.2. Portfolio simulation and rebalancing
From the optimization algorithm described in (9), (10) and (11) we get $K = 50$ optimal vectors of weights $w^*$, one for each of the GA solutions. We take $K > 1$ possible optimal portfolios as a robustness check. Our goal is to investigate the out-of-sample performance of the procedure against some standard alternative benchmark procedures. The analysis relies on a rolling sample approach (see footnote 1). In particular, given a dataset with $T$ daily observations for each of the $n$ stocks, we choose an estimation window of $M = 24$ months of daily observations. This is the training sample used to calibrate the stock selection model. Now consider a rebalancing period $h$; at each time $t$, starting from $t = M + h$, we use the data from the previous $M$ months to calibrate the optimization problem. The output vector of weights $w^*$ is then used to determine the tracker portfolio returns in $t + h$. The process is carried on by adding a period $h$ of daily data and dropping the earliest one, and then running the optimization procedure again. The outcome is $T - M$ monthly out-of-sample returns generated by each of the $K$ portfolio solutions. The period $T - M$ is called the testing period, since it is here that we test the reliability of the cointegration-based stock selection strategy.
Notice that the portfolio composition is held fixed from $t = M$ to $t = T$. The GA is run once to obtain the initial composition of the tracker fund, which is not modified in the rebalancing procedure. This is meant to stress the reliability of cointegration in reducing turnover and transaction costs. Indeed, selecting the most cointegrated subset of stocks is intended to pay off especially in the long run. A drawback is that we should limit the investment horizon to one year because of the changing nature of the index composition. However, dynamically modeling the composition of the index is outside our scope.
The rebalancing horizons considered are $h = 1, 3, 6, 12$ months of daily data. Transaction costs are subtracted at each portfolio rebalancing. In the case of one-month rebalancing, $h$ consists of 21 trading-day observations. Finally, we consider different values of $n$, taken as percentages of the size of the original benchmark, with $p = [0.05, 0.1, 0.15]$, such that $n = p \times N$ represents the size of the tracker fund.
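The rolling scheme can be sketched as a function that returns the training and testing windows as index ranges over the daily observations. The function name, the end-exclusive convention and the choice of moving the window forward by exactly one rebalancing period are assumptions made for this illustration.

```python
def rolling_schedule(T, window, step):
    """Return (train_start, train_end, test_end) triples, end-exclusive.

    T:      total number of daily observations
    window: length of the estimation window in days (M months of data)
    step:   rebalancing period in days (h months of data)
    """
    schedule = []
    start = 0
    while start + window < T:
        train_end = start + window
        schedule.append((start, train_end, min(train_end + step, T)))
        start += step  # add a period of new data, drop the earliest one
    return schedule


TRADING_DAYS_PER_MONTH = 21  # as in the one-month rebalancing case
M_days = 24 * TRADING_DAYS_PER_MONTH
h_days = 1 * TRADING_DAYS_PER_MONTH

# Small hypothetical example with 100 observations.
demo = rolling_schedule(T=100, window=40, step=20)
```

In the small example the weights estimated on observations 0 to 39 are held over observations 40 to 59, then the window moves forward by 20 observations, and so on until the sample is exhausted.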
Footnote 1: As a robustness check, the procedure has been repeated using an expanding-window approach. The results turn out to be qualitatively the same. However, we decided to report the rolling-window approach since it allows us to impose some very simple dynamics on the rebalancing framework.
4.3. Performance measures
The output of the index tracking procedure is a series of $T - M$ out-of-sample daily returns generated by each of the $K$ portfolios. Based on this output we compute several performance measures to assess the reliability of the cointegration-based GA procedure. We choose the performance measures so as to be consistent with the objective function (9) and with the reference literature. As a first measure of closeness we compute the average beta of the tracker portfolios. This is done by regressing the index returns on each of the $K$ tracking portfolio returns. The linear model is estimated by Ordinary Least Squares:

$$ \hat{\beta} = \frac{1}{K} \sum_{i=1}^{K} \beta_i \qquad \text{where $\beta_i$ comes from} \qquad r_I(t) = \beta_i\, r_{i,p}(t) + \eta_t \qquad \text{for } i = 1, \ldots, K \tag{15} $$
where $\eta_t$ is a stationary error term. The $\beta_i$ in (15) gives a measure of the sensitivity of the ith tracking portfolio return with respect to the index return. This is meant as a closeness measure: in principle, the closer $\hat{\beta}$ is to 1, the closer the tracking behavior is to the benchmark. The second measure adopted is the out-of-sample average Mean Squared Error, computed as

$$ \widehat{MSE} = \frac{1}{K} \sum_{i=1}^{K} MSE_i \qquad \text{with} \qquad MSE_i = \frac{1}{T - M} \sum_{t=1}^{T-M} \left( r_p^i(t) - r_I(t) \right)^2 \tag{16} $$
We adopt (16) because it is consistent with part of the reference literature, such as Maringer and Oyewumi (2007) and Beasley et al. (2003) among others, and with the objective function we use in (9).

The third measure adopted is the average Tracking Error Volatility (TEV) across the selected portfolios. The TEV is computed as

$$ \widehat{TEV} = \frac{1}{K} \sum_{i=1}^{K} TEV_i \qquad \text{with} \qquad TEV_i = \frac{1}{T - M} \sum_{t=1}^{T-M} \left[ (r_I - r_i) - (\hat{r}_I - \hat{r}_i) \right]^2 \tag{17} $$
where $\hat{r}_I$ and $\hat{r}_i$ are, respectively, the mean out-of-sample index return and the mean return of the ith tracking portfolio. This measure is meant to be consistent with most of the reference financial literature, such as Alexander and Dimitru (2005), Jansen and Van Dijk (2002) and Jorion (2003).

The last closeness measure is the average turnover. The average turnover provides a measure of the stability of the portfolio selection procedure, namely of the reliability of cointegration and the GA together with the weight selection as in (9). The average trading volume is computed as

$$ \widehat{TO} = \frac{1}{K} \sum_{i=1}^{K} TO_i \qquad \text{with} \qquad TO_i = \frac{1}{T - M} \sum_{t=1}^{T-M} \sum_{j=1}^{N} \left| w_{j,t+1}^{i} - w_{j,t}^{i} \right| \tag{18} $$
Finally, we consider a non-parametric perfomance measure. A non-parametric measure is
useful since do not take into account outliers in determining the probability of getting en
excess positive returns. This could be the case in using the expected excess returns. Again,
what we are interested in is the probability to outperform the index, especially getting close
to λ = 0. The non-paramteric measure used is just the average probability of a positive
excess returns with respect to the index, and is defined as follows
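$$\widehat{P}(r_i > r_I) = \frac{1}{K}\sum_{i=1}^{K} P(r_i > r_I) \tag{19}$$
with
$$P(r_i > r_I) = \frac{1}{T-M}\sum_{t=1}^{T-M} \mathbf{1}_{\{r_i > r_I\}} \quad\text{where}\quad \mathbf{1}_{\{r_i > r_I\}} = \begin{cases} 1 & \text{if } r_i > r_I \\ 0 & \text{otherwise} \end{cases} \tag{20}$$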
Notice that the averages are taken across the K portfolios generated for each of the comparison methodologies. The latter are explained in more detail in Section 5.
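The non-parametric measure in (19) and (20) amounts to a frequency count, as the following Python sketch illustrates. The function name and the array layout are assumptions for the example.

```python
import numpy as np

def average_outperformance_probability(tracker_returns, index_returns):
    """Average frequency of positive excess returns, as in (19)-(20).

    tracker_returns: array of shape (K, T_out).
    index_returns:   array of shape (T_out,).
    """
    tracker_returns = np.asarray(tracker_returns, dtype=float)
    index_returns = np.asarray(index_returns, dtype=float)
    # Indicator 1{r_i > r_I} for each portfolio and date.
    wins = tracker_returns > index_returns
    # P(r_i > r_I): share of out-of-sample dates with a positive excess return.
    p_i = wins.mean(axis=1)
    return p_i.mean(), p_i
```

Because only the sign of the excess return enters the count, a single extreme observation changes the measure by at most 1/(T − M).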
5. Comparing the algorithm: Alternative Methodologies
We propose four different methodologies to test the reliability of the cointegrated index
tracking methodology proposed earlier. They have been chosen to help disentangle the
joint benefit of cointegration in stock picking and of the weight selection objective function (9).
Notice that we carry out the stock picking and the weight selection separately. This allows us to
consistently compare these benchmark procedures with the cointegrated GA we propose.
5.1. Correlation-based stock picking
The first two methodologies are based on the correlation between the index returns and the constituent stocks. The “optimal” subset of stocks for the tracker fund is obtained by taking the
n stocks most correlated with the index. To generate the K = 50 portfolios we use a stationary non-parametric bootstrap as in Politis and Romano (1994), obtaining K tracker funds.
This exercise is done considering three different fund sizes, with p = [0.05, 0.1, 0.15] of
the benchmark such that n = p × N, with N the index dimension. In this case, for fixed p,
the stock selection is based on a binary selection variable z∗j defined as
$$z^{*}_{j} = \mathrm{rank}\,\{\rho_{i,j}\}_{i=1}^{n} \quad\text{with}\quad \rho_{i,j} = \mathrm{corr}(r_{i,j}, r_I) \quad\text{and}\quad j = 1,\dots,K \tag{21}$$
where ri,j is the return of the ith stock in the jth sample. Again, rank(.) is a ranking
operator that sorts the stocks in descending order of correlation with the index and keeps the n most correlated ones.
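The selection rule in (21) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the original code: the function names, the mean block length of 20 days and the seed are hypothetical, and the resampling follows the standard description of the stationary bootstrap (random block starts, geometrically distributed block lengths, wrap-around at the end of the sample).

```python
import numpy as np

def stationary_bootstrap_indices(T, mean_block, rng):
    """Time indices for one stationary-bootstrap resample of length T."""
    p = 1.0 / mean_block  # probability of starting a new block
    idx = np.empty(T, dtype=int)
    idx[0] = rng.integers(T)
    for t in range(1, T):
        if rng.random() < p:
            idx[t] = rng.integers(T)       # start a new block
        else:
            idx[t] = (idx[t - 1] + 1) % T  # continue the block, wrapping
    return idx

def correlation_selection(stock_returns, index_returns, n, K,
                          mean_block=20, seed=0):
    """Binary selection vectors z*_j, j = 1..K, in the spirit of (21).

    stock_returns: array of shape (T, N); index_returns: shape (T,).
    Returns a (K, N) array with ones for the n selected stocks.
    """
    stock_returns = np.asarray(stock_returns, dtype=float)
    index_returns = np.asarray(index_returns, dtype=float)
    T, N = stock_returns.shape
    rng = np.random.default_rng(seed)
    z = np.zeros((K, N), dtype=int)
    for j in range(K):
        idx = stationary_bootstrap_indices(T, mean_block, rng)
        r, r_I = stock_returns[idx], index_returns[idx]
        # rho_{i,j}: correlation of each stock with the index in sample j.
        rho = np.array([np.corrcoef(r[:, i], r_I)[0, 1] for i in range(N)])
        # Keep the n most correlated stocks.
        z[j, np.argsort(-rho)[:n]] = 1
    return z
```

The same time indices are applied to stocks and index so that the cross-sectional dependence is preserved within each resample.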
Once the stock picking is done, we find the optimal portfolio weights by applying either
(9) plus (10) or the following returns-based counterpart
$$w^{*} = \min_{w \in W}\left\{ \lambda\sqrt{\frac{1}{T}\sum_{t=1}^{T}(r_t w - I_t)^2} + (1-\lambda)\frac{1}{T}\sum_{t=1}^{T}(r_t w - I_t) \right\} + \sum_{i=1}^{N} s_i\,|w_i - w_i| \quad\text{s.t.}\quad w \in C(w) \tag{22}$$
where
$$C(w) = \left\{ w : \sum_{i=1}^{N} z_i w_i = 1,\ \sum_{i=1}^{N} |w_i z_i| = 1,\ \sum_{i=1}^{N} z_i \le n \right\} \tag{23}$$
with rt the n-dimensional vector of returns of the selected stocks. Notice that (22) is solved for
each of the K sampled portfolios, generating different solutions whose performances are then averaged².
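To make the structure of (22) and (23) concrete, the sketch below evaluates the objective for a given weight vector and checks membership in the constraint set. It is written in Python with NumPy under explicit assumptions: the function names are hypothetical, the signs follow (22) as printed, and the reference weights in the L1 transaction-cost penalty (`w_ref`) are assumed here to be the holdings before rebalancing. The sketch only evaluates the objective; it does not perform the minimization.

```python
import numpy as np

def returns_objective(w, stock_returns, index_returns, lam, costs, w_ref):
    """Value of the returns-based objective in (22) for weights w.

    stock_returns: array (T, n) of returns of the selected stocks.
    index_returns: array (T,) of index returns.
    costs:         array (n,) of proportional costs s_i.
    w_ref:         array (n,) of reference weights (an assumption, see text).
    """
    w = np.asarray(w, dtype=float)
    diff = np.asarray(stock_returns, dtype=float) @ w - np.asarray(index_returns, dtype=float)
    rmse = np.sqrt(np.mean(diff ** 2))   # root mean squared tracking error
    mean_diff = np.mean(diff)            # average return difference
    penalty = np.sum(np.asarray(costs, dtype=float) * np.abs(w - np.asarray(w_ref, dtype=float)))
    return lam * rmse + (1.0 - lam) * mean_diff + penalty

def satisfies_constraints(w, z, n, tol=1e-9):
    """Check the three conditions defining C(w) in (23)."""
    w = np.asarray(w, dtype=float)
    z = np.asarray(z, dtype=int)
    budget = abs(np.sum(z * w) - 1.0) <= tol
    gross = abs(np.sum(np.abs(w * z)) - 1.0) <= tol
    cardinality = z.sum() <= n
    return bool(budget and gross and cardinality)
```

Taken together, the first two conditions in (23) rule out short positions among the selected stocks, since the net and the gross exposure must both equal one.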
5.2. Random selection
The methods referred to as 3 and 4 in the empirical results are based on random stock
selection. This method is taken as a benchmark because taking a large subset of stocks from a
broadly based index likely implies some cointegration between the selected stocks
and the index. Thus, comparing this method with the one we propose helps
point out the advantages of the genetic algorithm with the ADF as loss function. In
general, the binary selection vector z∗ is defined by randomly taking n stocks among
the N available. Once we get the binary selection vector, we apply either (9) or (22) to
obtain methods 3 and 4, respectively. This helps disentangle and compare the joint benefit
of (9) and (11) used in the proposed cointegrated GA. The stock and weight selection
procedure is repeated K times to obtain the portfolios, and the performances are averaged.
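The random benchmark can be sketched in a few lines of Python. The function name `random_selection` and the seed handling are assumptions for the example; the sketch only draws the binary selection vectors, and the weights would then be obtained from (9) or (22).

```python
import numpy as np

def random_selection(N, n, K, seed=0):
    """Draw K binary selection vectors with exactly n ones out of N."""
    rng = np.random.default_rng(seed)
    z = np.zeros((K, N), dtype=int)
    for k in range(K):
        # Sample n distinct stock indices uniformly at random.
        z[k, rng.choice(N, size=n, replace=False)] = 1
    return z
```

For example, `random_selection(N=100, n=10, K=50)` would give 50 candidate subsets of 10 stocks each from a 100-stock index.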
6. Data
6.1. Sample Description
In the empirical investigation we considered three different broadly based indexes,
selected by increasing dimension. The aim is to disentangle the benefit of the
GA jointly with the cointegration procedure as the computational burden gets bigger. The
indexes considered are the FTSE 100, the NIKKEI 225 and the S&P 500. Not only the
dimension but also the way the index composition is constructed differs across indexes.
Indeed, the S&P 500 and the FTSE are value-weighted composite indexes, while the Nikkei
is one of the few examples of price-weighted indexes. This helps clarify the benefit of
tracking the trajectories instead of the returns. The latter do not take into account,
even indirectly, the weighting procedure specific to price-weighted indexes³.
² The combination of (21) and (9) is referred to as method 1 in the empirical results, while method 2
of the alternative strategies refers to the combination of (21) and (22).
The database is from CRSP, Datastream and Bloomberg. The information used consists
of daily closing prices of the component stocks over the period September 2005 - August 2008.
The benchmarks are the aforementioned indexes, namely the S&P 500, the NIKKEI 225 and the FTSE
100. The descriptive statistics are reported in Table (1).
[Insert Table 1 here]
The sample period has been chosen on purpose, so as to have a bull market in-sample and a bear market
out-of-sample. The aim is to check the robustness of the stock
picking (cointegration) procedure. Therefore, even though the goal is not to estimate any
covariance structure, we can implicitly argue that the algorithm captures the strongest
equilibria even under market regime switching. A plot of the indexes is reported in Figure (1).
[Insert Figure 1 here]
The grey vertical line indicates the end of the in-sample period for the GA stock selection.
As we can see, the in-sample period is a bull market while the out-of-sample period is a bear market, meaning
that there is a shift from one market regime to another at the beginning of the testing
period. This helps clarify the benefits of the cointegration approach as a stock selection
strategy for the tracker fund.
7. Empirical Results
In this section we report the empirical results of the GA + cointegration procedure
compared to the other benchmark/alternative methodologies. The results are reported
separately for each of the three indexes considered. Further, the results are reported for
three different values of λ in (9), precisely λ = [1, 0.75, 0.5]. Again, we consider different
ns, meaning different percentages of the index as subset of stocks. In particular, we take
p = [0.05, 0.1, 0.15] of the stocks such that n = p × N, with N the index dimension. Then
we take the average across these percentages. This is meant as a robustness exercise. Figure
(2) reports the results for the average portfolio obtained with the GA procedure.
³ The value-weighted index is based on capitalization. However, since the latter is the number
of outstanding shares times the trading price, we can fairly say that the weight of a share in the index is a
function of its price, even though indirectly.
[Insert Figure 2 here]
In each table, the first column reports the absolute value of the measure for our
algorithm. From the second to the last column we report the ratio with
the GA procedure as numerator and the alternative method as denominator. Therefore, a
value greater than 1 means the GA has a greater value for that measure, and vice
versa. The alternative procedures are numbered as follows. Method 1 is the correlation-based
stock picking with (9) as objective function. Method 2 is the correlation-based
stock selection strategy with the usual MSE as in (3). Methods 3 and 4 are the
randomly selected portfolios with (9) and (3), respectively, as objective functions. Finally,
the βs are compared considering the ratio |1 − β̂|/|1 − β′|, where β′ is the average beta from
the K alternative portfolios generated.
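The beta comparison can be illustrated with the following Python sketch. The function names are assumptions for the example, and the slope is estimated by regressing the tracker returns on the index returns with an intercept, consistent with the reading of β as the sensitivity of the tracker to the index.

```python
import numpy as np

def ols_beta(tracker_returns, index_returns):
    """OLS slope of tracker returns on index returns (with intercept)."""
    y = np.asarray(tracker_returns, dtype=float)
    x = np.asarray(index_returns, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(xc @ yc / (xc @ xc))

def beta_ratio(beta_ga, beta_alt):
    """Relative distance from 1: |1 - beta_ga| / |1 - beta_alt|."""
    return abs(1.0 - beta_ga) / abs(1.0 - beta_alt)
```

With this definition a ratio below 1 means that the GA beta is closer to 1 than the average beta of the alternative method.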
7.1. FTSE 100
Table (2) reports the experimental results for the FTSE 100. The β̂ range from 0.78 to
0.80. All of the alternative procedures report significantly lower ratios. This is true for
all four investment horizons considered and all of the λs. Again, the randomly selected
portfolios seem to do better than the correlation-based ones. This provides some insight into the
out-of-sample unreliability of the usual correlation as a stock selection procedure. The reason
why the randomly selected portfolios do not perform so badly is probably that taking a large subset
of stocks in the index likely implies picking up some weak cointegration relationship.
[Insert Table 2 here]
The relationship between correlation-based and randomly selected portfolios persists also with reference
to the MSE. Again, this is because of the aforementioned probability of randomly selecting a
cointegration relationship in a broadly based index such as the FTSE 100. As we can see, the GA
procedure significantly outperforms the others. This is true across all of the lambdas and
all of the investment horizons. The picture changes slightly for the average TEV. Indeed, both
the randomly selected portfolios and the correlation-based ones comparably underperform
the proposed index tracking methodology. On the other hand, considering the probability
of getting positive returns as in (19), we can see that the correlation-based portfolios slightly outperform the
randomly selected ones. The GA in any case achieves the highest probability of positive excess
returns. This holds regardless of the lambda considered and is persistent across investment horizons.
[Insert Table 3 here]
Table (3), Panel A, shows the average trading volume for the FTSE case. As we can see, for the
short rebalancing horizons, namely 1 month and 3 months, the performance of the cointegrated GA
procedure is close to that of the alternatives. This is true especially for λ = 1. On the other
hand, the GA starts to noticeably outperform the other procedures as the investment horizon
increases, i.e. 6 and 12 months. This shows that the strongest cointegration relationships
found by the GA through the ADF actually allow us to reduce the rebalancing needs, yielding
more stable portfolios. This is perfectly in line with our prior assumption that cointegration
matters more than correlation as a measure of (especially long-term) dependence.
7.2. NIKKEI 225
Table (4) reports the results for the Nikkei 225 index. With reference to the βs, they
increase in absolute terms. This is reasonable since the percentages of the index taken as
subset of stocks increase proportionally to the index dimension. In relative terms, we can
point out the same rationale as for the FTSE case. Indeed, random selection seems
to be a tough competitor in terms of tracker sensitivity. This is true especially if we use
(9) as objective function. However, this competitiveness of the random selection strategy
disappears with respect to all of the other measures. Indeed, all four alternative
procedures are clearly, and equally, outperformed by the cointegrated GA method looking at the
MSE measure. This is also true with respect to the average tracking error volatility.
[Insert Table 4 here]
The probability of getting positive excess returns shows another interesting picture. Indeed,
all of the procedures perform about equally. However, generally speaking, the cointegrated GA procedure achieves the same probability of realizing excess returns with less risk.
Hence the risk-adjusted probability of getting excess returns is in any case higher for the methodology we propose in the paper. Finally, Panel B of Table (3) shows the average
trading volume. We can point out that the correlation-based methods are clearly outperformed, while the random selection seems to be fairly comparable. The latter indeed turns
out to be better, especially for the long-term investment horizon. However, overall the cointegrated GA procedure turns out to outperform the random selection, since we get lower
risk and transaction-cost-adjusted expected returns. This is true since the probability of
getting excess returns is adjusted for transaction costs, meaning that risk-adjusted excess
returns are net of trading volume. Overall, then, our procedure seems to
outperform the others.
7.3. S&P 500
The last index is the most relevant for our purposes, because it is the largest index
in our dataset and hence the most informative with respect to the curse of dimensionality. The prior is that
the cointegrated GA procedure does clearly better on all of the measures, thanks to the benefit of
both the efficiency of the genetic algorithm and cointegration. Table (5) reports the results
for the average tracker fund. As before, the first column reports the absolute value for our
methodology, while from the second to the last column we report the relative (ratio) values
where the GA is the numerator.
As we can see, the βs are markedly better than those of the alternative procedures. The evidence
is far stronger than for the other two indexes. This shows how our procedure improves
when the curse of dimensionality starts to become a serious issue. This makes sense given
the rationale behind the GA and the parsimony coming from cointegration in the stock picking
procedure. The same pattern emerges with reference to the MSE measure. Again, none of the
four alternatives is comparably close to the one we propose. The only one that seems to be fairly
close is the one with random selection and (9) as objective function. Again, this is because
taking a fairly large subset implies picking up some (maybe weak) equilibrium (cointegration)
relationship. Then the proper optimization function we defined helps in getting closer to
the index trajectory, avoiding the usual measurement error mentioned above.
[Insert Table 5 here]
The average TEV points to the same rationale. The random selection does not perform so badly,
while the correlation-based strategies are by far the worst. As before, this is true for all of
the λs considered and across the index percentages. In relative terms, the probability of getting
positive returns is fairly comparable across all four alternative methodologies. However, as for
the other two indexes, the risk-adjusted performance is clearly in favour of the cointegrated
GA procedure. The same argument applies to the average trading volume shown in Panel C of
Table (3). Although the random selection with (9) as objective is fairly comparable in relative
terms, our procedure is better overall, especially because the average trading volume
for the longest investment horizon is largely in favour of our procedure. This was our objective
from the beginning: to stress the reliability of obtaining stable tracker portfolios
through the strongest cointegration relationships.
8. Concluding remarks
In this paper we consider an alternative index tracking methodology based on cointegration applied to high-dimensional indexes. The allocation procedure is separated into two
steps: (1) getting the optimal subset of stocks by minimizing a particular loss function, i.e. the
ADF, through a genetic algorithm, and (2) getting the optimal vector of weights by tracking the
normalized trajectory of the index instead of the ubiquitous returns. In this way we empirically
investigate the theoretical benefits of cointegration between the tracking portfolio and the
benchmark.
The genetic algorithm allows us to handle the binary model selection problem (stock picking) efficiently by using the ADF as loss function. We essentially exploit the fact
that the tracker fund and the benchmark are tied together, especially in the long run. Moreover,
we point out that tracking trajectories instead of returns allows us to exploit the full amount of information, including that in the common and linear trends in the
non-stationary level of prices. This is clearly an advantage with respect to the usual correlation-based distance measures, which are widely known to be unstable across time.
We test the reliability of our procedure against four alternative methods. The
first two refer to correlation-based stock picking, while the others are based on random
portfolio selection. In all cases we apply the stock selection with different
weight selection objective functions. This allows us to disentangle the joint benefit of using the strongest
equilibria (cointegration) as stock picking guidelines and of tracking benchmark trajectories.
In the empirical investigation we considered three different indexes, four different investment horizons and three different sizes of tracker funds. All of the performances
refer to the average of 50 generated portfolios for each of the procedures and rebalancing
periods. Our tracker funds yield some interesting results. We outperform the aforementioned alternative procedures with respect to all of the distance measures: Mean Squared
Error, Tracking Error Volatility and Beta. The same is true for the risk-adjusted probability
of getting positive excess returns. Finally, the average trading volume turns out to be lower
than that of the competing models, especially for longer investment horizons. This is coherent with
the inherent benefit of cointegration as a long-run equilibrium relationship between the tracker
fund and the benchmark.
Appendix A: Proposition 1
Let us consider the usual definition of portfolio value at time t for a subset of n < N stocks,
$V(t) = \sum_{i=1}^{n} c_i(t)p_i(t)$. The portfolio return is
$$r_p(t) = \frac{V(t) - V(t-1)}{V(t-1)} \tag{24}$$
Then consider the usual tracker portfolio return approximation, defined as $r_p(t) = \sum_{i=1}^{n} w_i(t) r_i(t)$
with $w_i(t) = c_i(t)p_i(t) / \sum_{i=1}^{n} c_i(t)p_i(t)$ and $r_i(t) = (p_i(t) - p_i(t-1))/p_i(t-1)$. We can see that
$$r_p(t) = \frac{V(t) - V(t-1)}{V(t-1)} \equiv \sum_{i=1}^{n} w_i(t) r_i(t) \iff c_i(t) \approx c_i(t-1) \tag{25}$$
Indeed, the equivalence in (25) comes from
$$r_p(t) = \frac{V(t) - V(t-1)}{V(t-1)} \equiv \sum_{i=1}^{n} \frac{c_i(t)p_i(t) - c_i(t-1)p_i(t-1)}{p_i(t-1)} \times \frac{p_i(t-1)}{\sum_{i=1}^{n} c_i(t-1)p_i(t-1)} \tag{26}$$
together with the assumption ci(t) ≈ ci(t − 1). The rp(t) approximation therefore carries a
non-negligible amount of error. This is especially relevant when using (3) as the
primary objective function.
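A small numerical check of the argument above is sketched below in Python. The function names and the numbers are purely illustrative. Following (26), the weighted-return approximation uses the holdings and prices at t − 1; it coincides with the exact return (24) when the holdings are unchanged and departs from it when they are rebalanced.

```python
import numpy as np

def exact_return(c_prev, c_now, p_prev, p_now):
    """Portfolio return from values, as in (24)."""
    v_prev = np.dot(c_prev, p_prev)
    v_now = np.dot(c_now, p_now)
    return (v_now - v_prev) / v_prev

def weighted_return(c_prev, p_prev, p_now):
    """Weighted sum of stock returns with beginning-of-period weights."""
    c_prev, p_prev, p_now = (np.asarray(a, dtype=float) for a in (c_prev, p_prev, p_now))
    w = c_prev * p_prev / np.dot(c_prev, p_prev)
    r = (p_now - p_prev) / p_prev
    return np.dot(w, r)

# Illustrative holdings (number of shares) and prices for two stocks.
c_prev = np.array([10.0, 20.0])
p_prev = np.array([5.0, 2.0])
p_now = np.array([5.5, 1.9])

# Unchanged holdings: the two definitions agree.
same = exact_return(c_prev, c_prev, p_prev, p_now)
approx = weighted_return(c_prev, p_prev, p_now)

# Rebalanced holdings: the value-based return differs from the approximation.
changed = exact_return(c_prev, np.array([12.0, 18.0]), p_prev, p_now)
```

In this example the approximation matches the exact return only in the first case, which is the content of the condition ci(t) ≈ ci(t − 1).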
References
Adcock, C. and Meade, N. (1994). A simple algorithm to incorporate transaction costs in quadratic optimization. European Journal of Operational Research, 79:85–94.
Alexander, C. (2001). Optimal hedging using cointegration. Philosophical Transactions of the Royal Society,
Series A:2039–2058.
Alexander, C. and Dimitru, A. (2005). Indexing and statistical arbitrage. Journal of Portfolio Management,
Winter:50–63.
Ammann, M. and Zimmermann, H. (2001). Tracking error and tactical asset allocation. Financial Analysts
Journal, 57(2):32–43.
Barber, B. and Odean, T. (2000). Trading is hazardous to your wealth: the common stock investment
performance of individual investors. Journal of Finance, 55(2):773–806.
Beasley, J. E., Meade, N., and Chang, T. J. (2003). An evolutionary heuristic for the index tracking problem.
European Journal of Operational Research, 148(3):621–643.
Brodie, J., Daubechies, I., De Mol, C., Giannone, D., and Loris, I. (2008). Sparse and stable markowitz
portfolios. ECB, Working Paper Series, No. 936:4–19.
Chen, S., editor (2002). Genetic Algorithms and Genetic Programming in computational finance.
Derigs, U. and Nickel, N. H. (2004). On a local-search heuristic for a class of tracking error minimization
problems in portfolio management. Annals of Operations Research, 131:45–77.
Dose, C. and Cincotti, S. (2005). Clustering of financial time series with application to index and enhanced
index tracking portfolio. Physica A, 355:145–151.
Dunis, C. and Ho, R. (2005). Cointegration portfolios of european equities for index tracking and market
neutral strategies. Journal of Asset Management, 6:33–52.
Engle, R. F. and Granger, C. W. J. (1987). Cointegration and error correction: Representation, estimation
and testing. Econometrica, 55:251–276.
Fan, J., Zhang, J., and Yu, K. (2008). Asset allocation and risk assessment with gross exposure constraints
for vast portfolios. Preprint, Princeton University.
Focardi, S. and Fabozzi, F. (2004). A methodology for index tracking based on time-series clustering.
Quantitative Finance, 4:417–425.
Frino, A., Gallagher, D., Neubert, S., and Oetomo, T. (2004). Index design and implications for index
tracking. The Journal of Portfolio Management, Winter 2004:89–95.
Gaivoronski, A., K. S. and Van der Wijst, N. (2005). Optimal portfolio selection and dynamic benchmark
tracking. European Journal of Operational Research, 163:115–131.
Granger, C. W. J. (1981). Some properties of time series data and their use in econometric model specification. Journal of Econometrics, 16:121–30.
Granger, C. W. J. (1983). Cointegrated variables and error correction models. Discussion Paper, UCSD.
Haupt, R. and Haupt, S. (2004). Practical genetic algorithms. Wiley.
Holland, J. (1974). Adaptation in natural and artificial systems. University of Michigan press.
Jagannathan, R. and Ma, T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraint
helps. Journal of Finance, 58(4):1651–1684.
Jansen, R. and Van Dijk, R. (2002). Optimal benchmark tracking with small portfolios. The Journal of
Portfolio Management, Winter 2002:33–39.
Jensen, M. (1968). The performance of mutual funds in the period 1945-1964. Journal of Finance, 23(2):389–
416.
Jong, J. D. (1975). An analysis of the behavior of a class of genetic adaptive systems.
Jorion, P. (2003). Portfolio optimization with tracking-error constraints. Financial Analysts Journal,
September/October 2003:70–82.
Maringer, D. and Oyewumi, O. (2007). Index tracking with constrained portfolios. Intelligent Systems in
Accounting, Finance and Management, 15:57–71.
Moral, P. D. and Miclo, L. (1999). On the convergence and applications of generalised simulated annealing.
SIAM Journal on Control and Optimization, 2:109–135.
Politis, D. N. and Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical
Association, 89.
Ruiz-Torrubiano, R. and Suàrez, A. (2009). An hybrid optimization approach to index tracking. Annals of
Operations Research, 166:57–71.
Sivanandam, S. and Deepa, S. (2008). Introduction to Genetic Algorithms.
TABLES AND FIGURES
Table 1: Descriptive Statistics
This table reports descriptive statistics of the considered indexes. Period from Sep 2005 to Aug 2008,
T = 756 daily observations. ValueW denotes a value-weighted index composition and PriceW a
price-weighted one. Data are not expressed in percentages.
             Num. Stocks   Mean      Median   St. Dev.   Skew      Kurt     Type
FTSE 100     81            -0.0001   0.0000   0.0117     0.1304    8.7012   ValueW
NIKKEI 225   208           0.0002    0.0000   0.0131     -0.4368   4.6387   PriceW
S&P 500      442           0.0001    0.0004   0.0094     -0.1624   5.2076   ValueW
Table 2: FTSE 100
This table reports the results for the FTSE 100 index. The sample runs from Sep 2005 to Aug 2008 with
T = 756 daily observations. The in-sample length is M = 504 daily observations with a testing period of
T − M = 252 daily observations. A month is represented by 21 trading days. Models 1 and 2 are
correlation-based stock picking, while models 3 and 4 entail randomly selected portfolios. Beta represents
the OLS coefficient of regressing the index returns on the tracker returns, Mse is the Mean Squared Error,
Tev represents the Tracking Error Volatility, while Prob is the probability of getting a positive excess return.
Out-of-Sample Performances

                  Beta                                    Mse
λ      Horizon    GAcoin   1       2       3       4      GAcoin      1       2       3       4
1      1          0.805    0.535   0.569   0.702   0.778  8.35e-005   0.698   0.759   0.696   0.846
1      3          0.803    0.549   0.590   0.742   0.816  8.68e-005   0.730   0.819   0.714   0.867
1      6          0.793    0.559   0.593   0.735   0.837  9.24e-005   0.699   0.763   0.694   0.867
1      12         0.787    0.520   0.559   0.758   0.819  9.89e-005   0.567   0.606   0.667   0.836
0.75   1          0.808    0.512   0.555   0.684   0.735  7.81e-005   0.631   0.704   0.679   0.732
0.75   3          0.800    0.546   0.601   0.732   0.803  8.85e-005   0.740   0.846   0.775   0.856
0.75   6          0.789    0.566   0.624   0.759   0.797  9.57e-005   0.739   0.827   0.770   0.816
0.75   12         0.782    0.525   0.580   0.784   0.786  1.04e-004   0.590   0.644   0.750   0.784
0.5    1          0.807    0.528   0.556   0.679   0.794  8.07e-005   0.680   0.723   0.699   0.865
0.5    3          0.800    0.553   0.600   0.734   0.829  8.77e-005   0.750   0.827   0.758   0.878
0.5    6          0.792    0.562   0.607   0.739   0.829  9.01e-005   0.706   0.757   0.712   0.834
0.5    12         0.784    0.522   0.569   0.761   0.819  9.72e-005   0.552   0.598   0.681   0.805

                  Tev                                       Prob
λ      Horizon    GAcoin      1       2       3       4      GAcoin   1       2       3       4
1      1          3.40e-002   0.799   0.847   0.820   0.892  49.54    1.189   1.161   1.041   1.041
1      3          3.42e-002   0.813   0.870   0.824   0.895  49.05    1.149   1.131   1.029   1.020
1      6          3.54e-002   0.798   0.838   0.818   0.896  49.22    1.200   1.159   1.030   1.032
1      12         3.60e-002   0.707   0.745   0.807   0.875  49.13    1.180   1.136   1.033   1.030
0.75   1          3.34e-002   0.768   0.826   0.816   0.848  49.26    1.180   1.153   1.040   1.043
0.75   3          3.45e-002   0.811   0.884   0.849   0.890  49.02    1.159   1.130   1.026   1.029
0.75   6          3.59e-002   0.813   0.876   0.851   0.871  49.21    1.192   1.161   1.030   1.041
0.75   12         3.68e-002   0.715   0.769   0.842   0.853  48.92    1.181   1.142   1.030   1.036
0.5    1          3.37e-002   0.793   0.831   0.822   0.904  49.07    1.173   1.138   1.034   1.034
0.5    3          3.45e-002   0.819   0.878   0.843   0.904  49.07    1.153   1.131   1.025   1.023
0.5    6          3.53e-002   0.806   0.851   0.833   0.891  49.09    1.185   1.151   1.029   1.033
0.5    12         3.62e-002   0.706   0.752   0.820   0.874  49.01    1.175   1.130   1.033   1.034
Table 3: Average Trading Volumes
This table reports the results for the trading volumes. The results are in percentages. The sample runs
from 15-Sep-2005 to 6-Aug-2008 with T = 756 daily observations. The in-sample length is M = 504 daily
observations with a testing period of T − M = 252 daily observations. A month is represented by 21
trading days. Models 1 and 2 are correlation-based stock picking, while models 3 and 4 entail randomly
selected portfolios.
Panel A: Ftse 100
λ      Horizon    GAcoin   1       2       3       4
1      1          8.73     0.948   0.874   1.027   0.851
1      3          18.39    0.898   0.933   1.010   1.001
1      6          18.63    0.567   0.464   0.697   0.916
0.75   1          7.89     0.871   0.766   0.881   0.767
0.75   3          17.59    0.815   0.869   0.854   0.897
0.75   6          18.19    0.552   0.520   0.643   0.844
0.5    1          7.23     0.883   0.711   0.806   0.728
0.5    3          16.51    0.829   0.839   0.832   0.911
0.5    6          15.64    0.483   0.456   0.518   0.782

Panel B: Nikkei 225
λ      Horizon    GAcoin   1       2       3       4
1      1          4.89     0.763   0.681   0.999   0.890
1      3          7.46     0.614   0.630   0.817   0.840
1      6          14.60    0.858   0.884   1.142   1.304
0.75   1          5.26     0.805   0.745   1.067   0.950
0.75   3          7.46     0.610   0.659   0.814   0.777
0.75   6          14.89    0.894   1.123   1.186   1.515
0.5    1          5.03     0.753   0.697   1.017   0.880
0.5    3          7.63     0.633   0.735   0.843   0.784
0.5    6          15.18    0.946   1.352   1.215   1.435

Panel C: S&P 500
λ      Horizon    GAcoin   1       2       3       4
1      1          2.40     0.603   0.514   1.032   0.738
1      3          3.47     0.618   0.467   0.943   0.708
1      6          3.33     0.315   0.209   0.647   0.500
0.75   1          2.17     0.531   0.463   0.986   0.718
0.75   3          3.72     0.714   0.512   1.086   0.861
0.75   6          3.83     0.363   0.239   0.753   0.607
0.5    1          2.20     0.555   0.471   1.047   0.743
0.5    3          3.53     0.674   0.460   1.064   0.840
0.5    6          3.51     0.333   0.223   0.649   0.579
Table 4: NIKKEI 225
This table reports the results for the Nikkei 225 index. The sample runs from 2-Aug-2005 to 23-June-2008
with T = 756 daily observations. The in-sample length is M = 504 daily observations with a testing period
of T − M = 252 daily observations. A month is represented by 21 trading days. Models 1 and 2 are
correlation-based stock picking, while models 3 and 4 entail randomly selected portfolios. Beta represents
the OLS coefficient of regressing the index returns on the tracker returns, Mse is the Mean Squared Error,
Tev represents the Tracking Error Volatility, while Prob is the probability of getting a positive excess return.
Out-of-Sample Performances

                  Beta                                    Mse
λ      Horizon    GAcoin   1       2       3       4      GAcoin      1       2       3       4
1      1          0.932    0.486   0.508   0.901   0.700  2.45e-005   0.841   0.727   0.759   0.707
1      3          0.929    0.537   0.546   0.925   0.774  2.45e-005   0.865   0.727   0.744   0.714
1      6          0.931    0.552   0.602   1.012   0.827  2.46e-005   0.854   0.759   0.753   0.755
1      12         0.931    0.569   0.606   1.095   0.824  2.36e-005   0.838   0.739   0.761   0.713
0.75   1          0.930    0.495   0.526   0.948   0.730  2.44e-005   0.869   0.732   0.761   0.760
0.75   3          0.927    0.528   0.584   0.944   0.758  2.43e-005   0.867   0.757   0.749   0.731
0.75   6          0.929    0.560   0.640   1.039   0.800  2.44e-005   0.886   0.793   0.753   0.750
0.75   12         0.929    0.577   0.645   1.109   0.818  2.34e-005   0.863   0.768   0.763   0.724
0.5    1          0.931    0.485   0.524   0.934   0.682  2.44e-005   0.863   0.764   0.764   0.693
0.5    3          0.927    0.542   0.602   0.971   0.754  2.45e-005   0.898   0.829   0.764   0.707
0.5    6          0.929    0.564   0.646   1.046   0.766  2.44e-005   0.889   0.823   0.754   0.696
0.5    12         0.929    0.570   0.646   1.104   0.784  2.34e-005   0.862   0.795   0.764   0.681

                  Tev                                       Prob
λ      Horizon    GAcoin      1       2       3       4      GAcoin   1       2       3       4
1      1          1.94e-002   0.926   0.869   0.885   0.876  42.44    0.933   0.918   0.925   0.918
1      3          1.95e-002   0.939   0.876   0.881   0.881  45.83    0.991   0.998   0.995   0.987
1      6          1.95e-002   0.935   0.892   0.886   0.899  46.58    1.030   1.010   1.004   1.005
1      12         1.91e-002   0.926   0.881   0.891   0.881  47.33    1.050   1.027   1.021   1.024
0.75   1          1.94e-002   0.938   0.872   0.887   0.888  46.61    1.037   1.001   0.994   0.983
0.75   3          1.94e-002   0.937   0.892   0.884   0.879  46.12    1.030   0.989   1.003   0.999
0.75   6          1.94e-002   0.946   0.906   0.886   0.888  46.47    1.032   0.996   1.006   1.015
0.75   12         1.91e-002   0.935   0.894   0.892   0.878  48.67    1.081   1.046   1.054   1.053
0.5    1          1.93e-002   0.935   0.888   0.889   0.858  46.12    1.030   0.989   1.003   0.999
0.5    3          1.94e-002   0.952   0.921   0.891   0.870  47.21    1.050   1.014   0.971   1.014
0.5    6          1.94e-002   0.948   0.918   0.887   0.865  44.95    1.000   0.961   1.032   1.037
0.5    12         1.91e-002   0.935   0.905   0.893   0.857  48.67    1.088   1.047   1.049   1.054
Table 5: S&P 500
This table reports the results for the S&P 500 index. The sample runs from 15-Sep-2005 to 6-Aug-2008
with T = 756 daily observations. The in-sample length is M = 504 daily observations with a testing period
of T − M = 252 daily observations. A month is represented by 21 trading days. Models 1 and 2 are
correlation-based stock picking, while models 3 and 4 entail randomly selected portfolios. Beta represents
the OLS coefficient of regressing the index returns on the tracker returns, Mse is the Mean Squared Error,
Tev represents the Tracking Error Volatility, while Prob is the probability of getting a positive excess return.
Out-of-Sample Performances

                  Beta                                    Mse
λ      Horizon    GAcoin   1       2       3       4      GAcoin      1       2       3       4
1      1          0.947    0.221   0.218   0.661   0.504  1.77e-005   0.271   0.271   0.863   0.666
1      3          0.959    0.212   0.202   0.698   0.527  1.66e-005   0.212   0.195   0.826   0.637
1      6          0.963    0.211   0.197   0.697   0.521  1.64e-005   0.203   0.179   0.812   0.628
1      12         0.962    0.194   0.177   0.751   0.515  1.66e-005   0.163   0.133   0.871   0.613
0.75   1          0.950    0.210   0.208   0.661   0.493  1.70e-005   0.257   0.259   0.868   0.625
0.75   3          0.956    0.200   0.192   0.695   0.511  1.65e-005   0.210   0.197   0.870   0.613
0.75   6          0.962    0.200   0.188   0.700   0.513  1.64e-005   0.162   0.134   0.919   0.594
0.75   12         0.960    0.182   0.166   0.759   0.498  1.91e-002   0.935   0.894   0.892   0.878
0.5    1          0.952    0.219   0.218   0.681   0.558  1.67e-005   0.253   0.258   0.840   0.723
0.5    3          0.961    0.210   0.201   0.728   0.589  1.61e-005   0.206   0.192   0.845   0.719
0.5    6          0.965    0.207   0.195   0.726   0.588  1.60e-005   0.197   0.176   0.830   0.721
0.5    12         0.965    0.191   0.175   0.806   0.569  1.61e-005   0.158   0.133   0.905   0.688

                  Tev                                       Prob
λ      Horizon    GAcoin      1       2       3       4      GAcoin   1       2       3       4
1      1          1.66e-002   0.528   0.528   0.951   0.856  49.30    1.158   1.181   1.025   1.018
1      3          1.61e-002   0.470   0.449   0.929   0.837  49.27    1.168   1.195   1.023   1.018
1      6          1.60e-002   0.461   0.431   0.922   0.829  49.26    1.174   1.202   1.023   1.018
1      12         1.61e-002   0.417   0.374   0.952   0.824  49.17    1.184   1.221   1.013   1.015
0.75   1          1.63e-002   0.514   0.516   0.952   0.835  49.15    1.152   1.180   1.019   1.015
0.75   3          1.61e-002   0.468   0.451   0.952   0.831  49.38    1.173   1.200   1.021   1.017
0.75   6          1.59e-002   0.458   0.429   0.941   0.829  49.07    1.170   1.198   1.017   1.021
0.75   12         1.60e-002   0.415   0.374   0.978   0.822  49.07    1.183   1.217   1.011   1.015
0.5    1          1.61e-002   0.511   0.515   0.940   0.875  49.23    1.156   1.177   1.022   1.014
0.5    3          1.59e-002   0.464   0.446   0.941   0.871  49.31    1.169   1.189   1.017   1.015
0.5    6          1.58e-002   0.454   0.426   0.935   0.871  49.21    1.172   1.197   1.018   1.021
0.5    12         1.58e-002   0.411   0.373   0.972   0.858  49.15    1.183   1.220   1.012   1.023
Figure 1: Indexes
This figure reports the three indexes considered. The grey vertical line represents the in-sample end for the
GA stock picking procedure. As we can see there is a regime switch from bull to bear market at the
beginning of the out-of-sample period.
Figure 2: Indexes and the Cointegrated GA procedure
This figure reports the three indexes considered together with the average tracker fund obtained with our
proposed approach. The series reported cover the T − M out-of-sample period.