High-dimensional Index Tracking with Cointegrated assets using an hybrid Genetic Algorithm

Daniele Bianchi*,a,b, Antonio Gargano a,c

a Department of Finance, Bocconi University, Milan, Italy
b IROM Department, Red McCombs School of Business, University of Texas at Austin, USA
c Rady School of Management, UCSD, USA

This draft: March 14, 2011. We thank Paolo Colla, Fabrizio Leisen and Eva Besada for helpful comments and suggestions.
* Corresponding author. Email address: [email protected] (Daniele Bianchi)

Abstract

In this paper we present a two-step procedure to solve a large class of portfolio management problems, namely (active) index tracking in high-dimensional spaces. The goal is to track the trajectory of large indexes rather than their returns, as is usually done. Cointegration between the index and the constituent price levels is the key criterion for stock selection. The binary stock selection problem, which is NP-hard, is solved efficiently by a heuristic procedure that uses the Augmented Dickey-Fuller test as decision rule. The procedure is completed by selecting the optimal set of weights, with the objective defined as a function of the normalized trajectories of the index and of the tracking portfolio. Transaction costs are taken into account by introducing an L1 penalization in this objective function. The approach is tested on several distance and performance measures, against other well-established index tracking procedures, on three different indexes. We also consider several cardinality sizes and different portfolio rebalancing horizons, consistent with the changing nature of broadly based indexes. We find that the index trajectory can arguably be tracked satisfactorily. More importantly, we outperform the benchmark return-based procedures with respect to several distance and performance measures.

1. Introduction

As opposed to traditional fund managers, who seek to beat the market through stock picking strategies of every sort, trackers attempt to match the risk-return profile of a benchmark index. Index tracking funds became a popular investment vehicle in the 1990s, when a handful of investment banks began offering these products to small investors. Since then, ETFs and tracker funds have seen a sharp increase in volume, which almost doubled during the 2000s and is still increasing. Index tracking is usually related to market efficiency, as shown in Jensen (1968) among others. Indeed, empirical findings seem to support the scarce risk-adjusted profitability of traditional funds, especially once transaction costs are involved, which can lead to losses even in bull markets (see e.g. Barber and Odean (2000)). Tracker funds therefore represent a useful alternative investment method, even where capital market efficiency does not hold, to get round the perceived drawbacks of more aggressive vehicles. Furthermore, a well-designed index tracking methodology can be used to track not only equity indexes but also commodity and bond indexes, such as the Goldman Sachs Commodity Index, which usually have countercyclical risk-return profiles with respect to the stock market. Finally, index tracking allows the investor to focus on a specific asset, industry or geographical sector while still obtaining a reliable amount of diversification within it.

The simplest way to replicate an index is to invest in all of the assets composing it. This is called full replication. Although in theory it is the most accurate approach, full replication is not only cumbersome but also rather costly. Indeed, trading and monitoring costs hamper this approach, especially for high-dimensional indexes, since the index composition varies over time and requires frequent rebalancing.
Another relevant approach is synthetic replication through equity derivatives such as futures contracts. Taken individually, these usually carry lower transaction costs. However, rolling contracts to dynamically track the underlying index is rather expensive and risky, which makes equity derivatives strategies less attractive. Finally, an investor might consider partial replication of a benchmark index. Partial indexing is the core business of ETFs and tracker funds, and it is the framework we focus on in this paper. Managers pursuing a partial index tracking strategy encounter two main problems: (1) selecting the optimal set of stocks to create the tracker index, and (2) optimally quantifying the amount of wealth to invest in each of the selected stocks.

In the present paper we propose a computationally tractable solution for the design of near-optimal replication strategies in which the investor limits the number of assets used to track the benchmark index. As opposed to most of the literature, the aim is to track the trajectory of a benchmark index instead of its returns. We develop a two-step procedure in which the key point is cointegration between the index and the tracker fund constituents. As pointed out in the seminal paper by Alexander and Dimitru (2005), the usual correlation-based procedures produce relatively unstable tracking portfolios. Alexander (2001), Alexander and Dimitru (2005) and Dunis and Ho (2005) propose using cointegration to capture stable long-run equilibria between the tracker fund and the benchmark index. Their methodology, however, treats stock picking as a kind of black-box procedure. Furthermore, they did not consider high-dimensional benchmarks. We extend the reference literature in several directions. First, (1) we take the stock picking procedure carefully into account by developing a hybrid genetic algorithm to extract the strongest cointegration relationships.
In this sense we address the computational burden of making cointegration-based stock picking explicit. Then, (2) we consistently structure an objective function to track the normalized trajectory of a benchmark index. This is partly similar to Focardi and Fabozzi (2004) and Dose and Cincotti (2005). However, they apply clustering methods for stock picking instead of cointegration, and they track the trajectory without explicitly considering transaction costs as part of the objective function. Finally, (3) we provide further insight into the joint effect of cointegration-based stock picking and trajectory tracking, disentangling their benefits relative to correlation-based and returns-based procedures.

The structure of the paper is as follows. Section 2 formalizes why we should rely on trajectory approximations instead of returns, providing some insight into why the usual returns-based distance measures turn out to be suboptimal. Section 3 deals with the nature of the index tracking problem, describing the risk measure used in the minimization procedure, together with the transaction costs and the optimization program as a whole. Section 4 presents the genetic algorithm, the portfolio simulation strategy, and the performance measures used to check the reliability of portfolio construction. Section 5 describes the benchmark index tracking methodologies against which our algorithm is compared. Section 6 describes the sample selection and the dataset. Finally, Sections 7 and 8 report the empirical results and the concluding remarks, respectively.

2. Setting objectives under classical distance measures: why trajectory instead of returns

Let us consider a general benchmark index with constituent assets $\{k_i\}_{i=1}^{K}$.
Each stock has price $p_i(t) \in (0, \infty)$, and $c_i(t)$ denotes the proportion invested in the $i$th asset, so that the $i$th weight at time $t$ is defined as

$$ w_i(t) = \frac{c_i(t)\, p_i(t)}{\sum_{j \in \Phi} c_j(t)\, p_j(t)} \quad \text{such that } w_i(t) \in (0, 1). \tag{1} $$

The population level of the index and of the tracker portfolio can be defined, respectively, as linear combinations of price vectors:

$$ I(t) = \sum_{i=1}^{K} c_i(t) \times p_i(t) \quad \text{and} \quad V(t) = \sum_{i \in \Xi} c_i(t) \times p_i(t), \tag{2} $$

with $\Xi$ the subset of $N < K$ selected stocks. Consider the usual discrete-time return definitions, $r_I(t) = (I(t) - I(t-1))/I(t-1)$ for the index and $r_i(t) = (p_i(t) - p_i(t-1))/p_i(t-1)$ for the $i$th stock, and the usual mean squared loss function

$$ MSE(r_I, r_p) = \frac{1}{T} \sum_{t=1}^{T} \left( r_p(t) - r_I(t) \right)^2 = \frac{1}{T} \sum_{t=1}^{T} \left( \sum_{i=1}^{N} w_i(t)\, r_i(t) - r_I(t) \right)^2. \tag{3} $$

It is easy to see that (1), (2) and the return definitions together introduce strong nonlinearity into the index tracking problem (3). Indeed, this represents one of the most appealing motivations behind the index tracking literature.

Now, considering price dynamics, the index clearly readjusts over time. This is due not only to the endogenously changing nature of prices but also to exogenous factors such as composite revisions, share issues and repurchases, as well as spinoffs (see Frino et al. (2004) for more details). Different strategies have been implemented in the literature. A first possible strategy is to keep $c_i(t)$ constant in $w_i(t)$ throughout the tracking period. This procedure, however, involves strong nonlinearity and non-trivial optimization behavior, because the dynamics of $p_i(t)$ make the portfolio weights evolve according to

$$ w_i(t) = \frac{c_i\, p_i(t)}{\sum_{j=1}^{K} c_j\, p_j(t)}, \quad \forall i = 1, \dots, K. \tag{4} $$

This limits the applicability of classical exact optimization procedures.
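The weight drift implied by (4) can be illustrated with a small numerical sketch. The function name `drifting_weights` and the two-stock price path are hypothetical and chosen only for illustration; the code simply evaluates (4) for constant holdings.

```python
import numpy as np

def drifting_weights(holdings, prices):
    """Evaluate w_i(t) = c_i p_i(t) / sum_j c_j p_j(t) for constant holdings c.

    holdings : (K,) constant holdings c_i
    prices   : (T, K) price paths p_i(t)
    """
    values = prices * holdings                       # position values, shape (T, K)
    return values / values.sum(axis=1, keepdims=True)

# Hypothetical example: two stocks, the first one doubles in price.
holdings = np.array([1.0, 1.0])
prices = np.array([[10.0, 10.0],
                   [20.0, 10.0]])
weights = drifting_weights(holdings, prices)
```

Although the holdings never change, the weights move from equal shares to two thirds and one third, which is the nonlinearity in the prices that the text refers to.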
An alternative strategy, proposed in Ammann and Zimmermann (2001) and exploited in Ruiz-Torrubiano and Suàrez (2009), is to hold $w_i(t)$ constant while actively managing $c_i(t)$, such that

$$ w_i = \frac{c_i(t)\, p_i(t)}{\sum_{j=1}^{K} c_j(t)\, p_j(t)}, \quad \forall i = 1, \dots, K. \tag{5} $$

This second strategy has the merit of keeping the optimization problem quadratic and computationally feasible, so that exact solutions can be obtained. However, using the MSE as a loss function, based on approximating the tracker returns as $r_p(t) \approx \sum_{i=1}^{N} w_i(t)\, r_i(t)$, entails $c_i(t) \approx c_i(t-1)$. This implies a relevant approximation/measurement error, especially when minimizing returns-based measures (see Ruiz-Torrubiano and Suàrez (2009), Derigs and Nickel (2004) and Beasley et al. (2003) among others). This concept is formalized in Proposition 1. A textbook proof is provided in the appendix.

Proposition 1. Consider the returns-based MSE minimization as defined in (3). Define the index returns as $r_I(t) = (I(t) - I(t-1))/I(t-1)$ and the returns of the $i$th stock as $r_i(t) = (p_i(t) - p_i(t-1))/p_i(t-1)$. Then assuming that the portfolio returns are defined as $r_p = \sum_{i=1}^{N} w_i(t)\, r_i(t)$ entails $c_i(t) \approx c_i(t-1)$. This introduces a relevant approximation error, since it implies $V(t) = \sum_{i=1}^{N} c_i(t)\, p_i(t) \equiv \sum_{i=1}^{N} c_i(t-1)\, p_i(t)$ in $[t-1, t)$.

Another relevant issue with returns is information. Indeed, computing returns implies detrending the price levels. This means losing a relatively important amount of information, especially about the long-run behavior of the index and of the shares. Finally, the usual distance measures are strictly related to the correlation between the index and the tracker fund. Indeed, it is quite straightforward to see that index tracking problems can usually be restated as least squares minimizations, i.e. regressions (see Fan et al. (2008) and Brodie et al. (2008) for more details).
Thus, the well-known time-varying nature of correlation is a further issue that limits the usefulness of returns-based measures relative to more general distance functionals.

The methodology we propose aims to track the trajectory of the index instead of its returns. Price levels are non-stationary, so the usual correlation measures do not hold. Some relevant examples of distance functionals are given in Focardi and Fabozzi (2004) and Dose and Cincotti (2005), who used clustering techniques based on a modified correlation measure for integrated random variables. Our methodology is based on the concept of cointegration.

Definition 1. A stochastic process $X$ which satisfies $X_t - E(X_t) = \sum_{i=0}^{\infty} C_i\, \varepsilon_{t-i}$, with $\sum_{i=0}^{\infty} C_i \neq 0$ and $\varepsilon_t \sim \text{i.i.d. } \pi(0, \Omega)$ for a general multivariate distribution $\pi(\cdot)$, is called $I(0)$. Now, we call $X \sim I(1)$ cointegrated with cointegrating vector $\beta \neq 0$ if $\beta' X \sim I(0)$, i.e. if it can be made stationary.

Strictly speaking, two non-stationary random variables are cointegrated if there exists a non-trivial linear relationship between them that is stationary. We exploit the concept of cointegration in the univariate framework developed in Granger (1981), Granger (1983) and Engle and Granger (1987). We deal more explicitly with univariate cointegration in Section 3. Pioneering index tracking applications of cointegration can be found in Alexander (2001), Alexander and Dimitru (2005) and Dunis and Ho (2005).

3. Problem Formulation

Let us consider a market with $N$ securities with price $p_i(t)$ at each time $t$. We denote by $I(t)$ the level of the index at time $t$. Time is discrete, and at each step the objective is to select the optimal subset of $n < N$ stocks such that some risk measure is minimized. Now $w_i(t) \in (0, 1)$ represents the fraction of the portfolio value held in the $i$th stock at the beginning of period $t$. We consider a dynamic setting, where the portfolio positions are readjusted over time. These changes are due to new market information.
The composition of the tracking portfolio does not change; that is, we assume that the stocks selected at time $t$ are still available at time $t+h$, with $h$ the investment revision horizon. This is reasonable if $h$ is kept sufficiently short. We apply a historical look-back approach, as done implicitly in Beasley et al. (2003). The assumption is that the past contains enough information to capture the potential future dynamics of the index. This is reasonable as long as we do not estimate covariances. We do not deal with prediction, which is outside the scope of the paper. Instead, we apply population heuristics to find equilibria between the index and the tracker fund (cointegration) and numerical procedures to obtain the optimal vector of weights (objective).

3.1. Risk-measure and transaction costs

Several risk measures have been proposed in the literature. Most of them are based on correlation measures or on estimates of the variance of tracking deviations. The latter are, however, flawed. As noted in Beasley et al. (2003), if the difference between the index path and that of the tracking portfolio is constant over time, then the tracking error would be zero. This is, of course, an undesirable result because it does not take the tracking bias into account. In the current investigation we consider a weighted average of the root mean squared error of the tracking deviation and the tracker excess return, which penalizes constant differences (see also Beasley et al. (2003) and Ammann and Zimmermann (2001)). The baseline risk function is a modified version of Gaivoronski and Van der Wijst (2005), defined as

$$ Q(w, \mu_t, I_t) = \left\{ \lambda \sqrt{ E\left[ \left( \frac{\mu_t}{\mu_{t-1}} - \frac{I_t}{I_{t-1}} \right)^2 \right] } + (1 - \lambda)\, E\left[ \frac{\mu_t}{\mu_{t-1}} - \frac{I_t}{I_{t-1}} \right] \right\} \tag{6} $$

where $\mu_t = \tilde{p}_t w$ is the trajectory of the tracking portfolio and $\tilde{p}_t = p_t / p_0$ is the matrix of normalized prices. Notice that $\lambda \in (0, 1)$ represents an implicit trade-off between tracking error and excess return.
For instance, $\lambda = 1$ corresponds to minimizing the tracking error, i.e. pure index tracking, while $\lambda = 0$ implies maximizing the excess return. In the empirical experiment we solve the optimization problem for different values of $\lambda$ in its domain. The extension of (6) that we develop also takes transaction costs directly into account in the decision problem, as proposed by Derigs and Nickel (2004) and Adcock and Meade (1994). Transaction costs are treated as a sort of second objective, which allows us to consider those portfolios that are efficient with respect to both the risk measure and transaction costs. This not only helps to regularize the optimization algorithm but, above all, helps the fund manager to discriminate against stocks with high transaction costs arising from, for instance, high liquidity risk and hence wide bid-ask spreads. As in Brodie et al. (2008), transaction costs are defined as

$$ TC(w, \bar{w}, s) = \sum_{i=1}^{n} s_i\, |w_i - \bar{w}_i| \tag{7} $$

where $s_i$, $w_i$ and $\bar{w}_i$ are, respectively, the per-share transaction cost, the optimal weights to be chosen and the old optimal weights. Notice that $\bar{w}_i = 0$ at the first investment date. Now define the information set available to the manager as $D_t = (\bar{w}, s, \mu_t, I_t)$. Then the population version of the loss function can be written as a combination of (6) and (7):

$$ L(w, D_t) = Q(w, D_t) + TC(w, D_t). \tag{8} $$

3.2. Constraints and optimization problem

Portfolio selection has to deal with investment preferences and legal guidelines. Therefore the set $W$ of all potential portfolios should be restricted to a smaller set of feasible portfolios. From a mathematical perspective the source of a constraint is irrelevant; what matters is its structure. The most basic constraint is the budget constraint, i.e. $e'w = 1$. There are several other commonly considered manager-specific constraints.
The simplest is the floor/ceiling trading constraint, defined as $L_i \le w_i \le U_i$, with $L_i$ and $U_i$ representing the lower and upper bounds for each portfolio weight, respectively. With the same rationale we could define static bundle constraints, by setting lower and upper bounds on trading in a certain industry or geographical area, i.e. $L_B \le \sum_{i \in B} w_i \le U_B$. Gross-exposure constraints not only have economic relevance but also help to regularize the optimization algorithm. Indeed, as shown in Jagannathan and Ma (2003) and Fan et al. (2008), no-short-sale constraints together with gross-exposure constraints help to minimize portfolio risk. This is not true in general; however, it is true if any dynamics or estimation is used in the optimization problem. As in Beasley et al. (2003), Ruiz-Torrubiano and Suàrez (2009) and Derigs and Nickel (2004), we consider the stock selection problem embedded in the weight selection by introducing an auxiliary binary variable $z = \{z_i\}_{i=1}^{N}$ such that $z_i = 1$ if the $i$th stock is selected by the stock picking procedure and zero otherwise. From the population version defined in (6) and (8) we can write the empirical counterpart with the trading constraints as

$$ w^* = \arg\min_{w \in W} \left\{ \lambda \sqrt{ \frac{1}{T} \sum_{t=1}^{T} \left( \frac{\tilde{p}_t w}{\tilde{p}_{t-1} w} - \frac{I_t}{I_{t-1}} \right)^2 } + (1 - \lambda)\, \frac{1}{T} \sum_{t=1}^{T} \left( \frac{\tilde{p}_t w}{\tilde{p}_{t-1} w} - \frac{I_t}{I_{t-1}} \right) \right\} + \sum_{i=1}^{N} s_i\, |w_i - \bar{w}_i| \quad \text{s.t. } w \in C(w) \tag{9} $$

where

$$ C(w) = \left\{ w : \sum_{i=1}^{N} z_i w_i = 1, \; \sum_{i=1}^{N} |w_i z_i| = 1, \; \sum_{i=1}^{N} z_i \le n \right\} \tag{10} $$

with $n$ the user-defined maximum cardinality of the portfolio. As shown in Ruiz-Torrubiano and Suàrez (2009), including a cardinality constraint, which is discrete, leads to an NP-hard and highly nonlinear problem. A possible approach is to treat the asset weights as a functional of $z$, such that the solution of (9)-(10) is expressed as $\{w_i(z)\}_{i=1}^{n}$ for a fixed value of $z$. This is similar to the approach used in the present paper. The objective function used to find $z^*$, however, is very different from (9).
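For a fixed selection $z$, the empirical objective in (9) can be evaluated as in the following minimal sketch. The function name `tracking_loss` and its argument names are assumptions made for illustration, and the constraint set (10) is not enforced here. The sketch follows (9) as printed, so the mean deviation enters with a plus sign; whether an implementation aimed at maximizing excess return should flip that sign is left open.

```python
import numpy as np

def tracking_loss(w, norm_prices, index, lam, costs, w_old):
    """Objective (9) as printed, for the stocks already selected by z.

    w           : (n,) candidate weights
    norm_prices : (T+1, n) normalized prices p_t / p_0 of the selected stocks
    index       : (T+1,) index levels
    lam         : trade-off parameter lambda
    costs       : (n,) transaction costs s_i
    w_old       : (n,) previous weights (zeros at the first investment date)
    """
    growth_port = (norm_prices[1:] @ w) / (norm_prices[:-1] @ w)
    growth_index = index[1:] / index[:-1]
    dev = growth_port - growth_index
    risk = lam * np.sqrt(np.mean(dev ** 2)) + (1.0 - lam) * np.mean(dev)
    transaction = np.sum(costs * np.abs(w - w_old))   # L1 penalty, eq. (7)
    return risk + transaction

# Hypothetical check: a portfolio that reproduces the index exactly
# only pays the transaction cost of the initial purchase.
P = np.array([[1.0, 1.0], [1.1, 0.9], [1.2, 1.0]])
w = np.array([0.5, 0.5])
loss = tracking_loss(w, P, P @ w, lam=1.0,
                     costs=np.array([0.01, 0.01]), w_old=np.zeros(2))
```

A function of this form could be passed to a generic constrained numerical optimizer; which solver the authors used is not stated in this section.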
For that reason we separate the problem into two steps, namely stock picking (binary problem) and weight selection (minimizing (9) for a given $z$), such that the vector of weights turns out to be a functional of $z^*$. The stock picking step is a binary model selection problem:

$$ z^* \in \arg\min_{z \in Z} R(I, P) \quad \text{s.t. } \sum_{i=1}^{N} z_i = n, \text{ with } n < N \tag{11} $$

with $R(I, P)$ the loss function and $P \in \mathcal{P}$ the subset of constituent stocks considered in the set $\mathcal{P}$ of available assets. Notice that the loss is a functional of the residuals of a generalized linear model, in the spirit of univariate cointegration. In other words, for a fixed value of $n$ we solve (11) with respect to $R(I, P)$, then we plug $z^*$ into (9) to get $w^*$. The minimizer of the risk function in the binary problem is found using a genetic algorithm, which is explained in detail in the next section.

4. Stock Picking, Genetic Algorithm and the ADF loss function

The stock selection problem is a binary model selection problem and is, in a sense, nested in (9). Given the strong non-stationarity of the random variables involved, classical Stochastic Search Variable Selection methods cannot be used efficiently in the usual least squares sense. This is especially true in high-dimensional spaces, i.e. for large indexes. This lack of efficiency of classical methodologies justifies the use of a Genetic Algorithm [GA]. Indeed, a GA allows us to solve the problem quite efficiently, helping to mitigate the computational burden. The loss function used in the GA is as follows. Consider the linear model

$$ \log(I_t) = \sum_{i=0}^{N} z_i \beta_i \log(p_{i,t}) + \varepsilon_t, \quad \varepsilon_t \sim \pi(0, \sigma^2), \quad \text{with } z_i = \begin{cases} 1 & \text{for the } i\text{th stock} \\ 0 & \text{otherwise} \end{cases} \tag{12} $$

where $I_t$ and $p_{i,t}$ represent the index and the $i$th constituent prices, respectively, and $\pi(\cdot)$ is a general distribution function. Our purpose is to choose $z^*$ such that $\beta = \{\beta_1, \dots, \beta_p\}$ represents the strongest equilibrium relationship between the index trajectory and the constituent stocks.
By exploiting Definition 1, we can see that, given that both the log index and the log prices are $I(1)$ stochastic processes, if $\beta$ represents a cointegration relationship then $\varepsilon \sim I(0)$, namely the residuals are stationary processes. The loss function is defined as

$$ R(I, P) \equiv R(\varepsilon) = \frac{\hat{\rho} - 1}{se(\hat{\rho})} \tag{13} $$

with $\rho$ a regression coefficient in the following auxiliary residual regression

$$ \varepsilon_t = \rho\, \varepsilon_{t-1} + \sum_{i=1}^{D} \Delta\varepsilon_{t-i} + u_t \equiv x_t \beta + u_t, \quad \text{with } \Delta\varepsilon_t = \varepsilon_t - \varepsilon_{t-1} \tag{14} $$

where $se(\hat{\rho})$ and $D$ are the standard error of $\hat{\rho}$ and the lag order, respectively. The null hypothesis is $H_0: \rho = 1$, i.e. there is a unit root. The autoregressive univariate model in (14) includes neither an intercept nor a trend, since $E(u_t) = 0$. This is the well-known Augmented Dickey-Fuller [ADF] test. The aim is to select the vector $z^*$ such that $R(\varepsilon)$ is minimized: the lower the ADF test statistic, the stronger the evidence of stationarity of the residuals, and hence the stronger the equilibrium relationship between the index and the subset of selected stocks. The ADF test has long been used in pairs trading algorithms and was more recently pioneered for index tracking problems in Alexander (2001), Alexander and Dimitru (2005) and Dunis and Ho (2005) among others. It was originally developed to test for the presence of a unit root in a general stochastic process (see Engle and Granger (1987), Granger (1983) and Granger (1981) for more details on ADF and cointegration).

4.1. Genetic Algorithm

As explained in the last section, the loss function used to select the optimal indicator vector $z^*$ is highly nonlinear. Finding the best indicator vector and plugging it into (9) is therefore not manageable with either the usual stochastic search or brute-force algorithms. Population heuristics are then used as a computational tool to solve the model selection problem in (11) quickly and fairly precisely.
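The ADF loss in (13) and (14), which serves as the fitness of each candidate subset, can be sketched with plain least squares. This is a minimal illustration under stated assumptions: the function name `adf_loss` is hypothetical, the cointegrating regression is run without an intercept, each lagged difference in the auxiliary regression carries its own coefficient (the standard ADF form), and the simulated series are invented for the example.

```python
import numpy as np

def adf_loss(log_index, log_prices, n_lags=1):
    """ADF-type statistic (rho_hat - 1) / se(rho_hat) on cointegration residuals.

    log_index  : (T,) log index levels
    log_prices : (T, n) log prices of the selected stocks
    n_lags     : lag order D of the auxiliary regression
    """
    # Step 1: regression of the log index on the selected log prices, as in (12).
    beta, *_ = np.linalg.lstsq(log_prices, log_index, rcond=None)
    eps = log_index - log_prices @ beta

    # Step 2: eps_t on eps_{t-1} and D lagged differences, no intercept or trend.
    d_eps = np.diff(eps)                                 # d_eps[j] = eps[j+1] - eps[j]
    y = eps[n_lags + 1:]
    cols = [eps[n_lags:-1]]                              # eps_{t-1}
    for i in range(1, n_lags + 1):
        cols.append(d_eps[n_lags - i:len(d_eps) - i])    # eps_{t-i} - eps_{t-i-1}
    X = np.column_stack(cols)

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    u = y - X @ coef
    sigma2 = u @ u / (len(y) - X.shape[1])
    se_rho = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[0, 0])
    return (coef[0] - 1.0) / se_rho

# Simulated illustration: one index cointegrated with the two stocks,
# one index that is an unrelated random walk.
rng = np.random.default_rng(0)
T = 500
log_prices = np.cumsum(rng.normal(scale=0.01, size=(T, 2)), axis=0)
coint_index = log_prices @ np.array([0.6, 0.4]) + rng.normal(scale=0.002, size=T)
unrelated_index = np.cumsum(rng.normal(scale=0.01, size=T))
stat_coint = adf_loss(coint_index, log_prices)
stat_unrelated = adf_loss(unrelated_index, log_prices)
```

The cointegrated pair yields a much more negative statistic than the unrelated one, which is the ordering the stock picking step exploits when it minimizes $R(\varepsilon)$.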
As mentioned above, the model selection issue is to find the most cointegrated subset of stocks, where the degree of cointegration is a functional of the residuals of the linear model in (12). The Genetic Algorithm used is developed as follows.

Algorithm 1 Genetic Algorithm
  Generate initial population P, initialize p_mut and p_cross
  while stopping criteria not met do
    Select P' ⊂ P (mating pool), set P'' = ∅ (set of children)
    for i = 1 to n_P (population size) do
      Select individuals m_a and m_b at random from P'
      if u(0, 1) < p_cross then
        cross-over: set m_c = m_a ∧ m_b
      else
        do not cross-over: set m_c = m_a ∨ m_b
      end if
      if u(0, 1) < p_mut then
        mutate m_c into µ
      else
        do not mutate: set µ = m_c
      end if
      set P'' = P'' ∪ µ
    end for
    set P = P''
  end while
  Return first row of P

$P$ and $P''$ are $n \times k$ matrices, where each row represents a portfolio, i.e. a vector of integers from 1 to $N$, where $N$ is the number of stocks in the index. Intuitively, each integer represents a stock. Similarly, $P'$ is a submatrix of $P$ whose number of rows depends on the parameters discussed below. When the solutions are taken individually as $1 \times k$ vectors, they are labeled $m$.

The asymptotic convergence to the global solution relies on the schema theorem of Holland (1974) and on Moral and Miclo (1999), who use the possibility of constructing a Markov transition matrix from one generation to the next in order to apply standard Markov chain theory. In practice, however, as the population size or the number of generations increases (i.e. for more difficult problems), convergence is harder to reach in a reasonable amount of time. Therefore the Genetic Algorithm's parameters are often chosen according to general guidelines from the literature and by calibrating the algorithm to the problem, mainly by trial and error (see Chen (2002)). Following Jong (1975) and Haupt and Haupt (2004), the values of $p_{cross}$ and $p_{mut}$ have been set to 0.9 and 0.3, respectively.
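A compact Python sketch of a genetic search in the spirit of Algorithm 1 is given below. It is not the authors' implementation, and several choices are assumptions made for illustration: the name `genetic_search`, a fixed number of generations in place of the stopping rules, constant crossover and mutation probabilities, a crossover that keeps the genes shared by both parents and fills the remainder from the genes held by only one of them (so that no stock is duplicated), and explicit bookkeeping of the best portfolio evaluated so far. The toy fitness only demonstrates the interface; in the setting of this section the fitness would be the ADF loss of the selected subset.

```python
import numpy as np

def genetic_search(fitness, n_stocks, k, pop_size=40, n_gen=50,
                   p_cross=0.9, p_mut=0.3, n_swap=1, seed=0):
    """Minimize fitness(portfolio) over portfolios of k distinct stock indices."""
    rng = np.random.default_rng(seed)
    # Each row of the population is a portfolio: k distinct integers.
    pop = np.array([rng.choice(n_stocks, size=k, replace=False)
                    for _ in range(pop_size)])
    best, best_fit = None, np.inf
    for _ in range(n_gen):
        fit = np.array([fitness(m) for m in pop])
        order = np.argsort(fit)
        if fit[order[0]] < best_fit:              # remember the best portfolio evaluated
            best, best_fit = np.sort(pop[order[0]]), fit[order[0]]
        mating_pool = pop[order[:pop_size // 2]]  # only the better half breeds
        children = []
        for _ in range(pop_size):
            ia, ib = rng.choice(len(mating_pool), size=2, replace=False)
            ma, mb = mating_pool[ia], mating_pool[ib]
            if rng.random() < p_cross:
                shared = np.intersect1d(ma, mb)   # genes present in both parents
                rest = np.setxor1d(ma, mb)        # genes present in exactly one parent
                need = k - len(shared)
                fill = rng.choice(rest, size=need, replace=False) if need else rest[:0]
                child = np.concatenate([shared, fill])
            else:
                child = ma.copy()
            if rng.random() < p_mut:
                # Replace n_swap genes by stocks that are not in the chromosome.
                outside = np.setdiff1d(np.arange(n_stocks), child)
                drop = rng.choice(k, size=n_swap, replace=False)
                child[drop] = rng.choice(outside, size=n_swap, replace=False)
            children.append(child)
        pop = np.array(children)
    return best, float(best_fit)

def toy_fitness(portfolio):
    """Hypothetical stand-in for the ADF loss: prefers low stock indices."""
    return float(np.sum(portfolio))

best, best_fit = genetic_search(toy_fitness, n_stocks=30, k=5, seed=1)
```

Drawing replacement genes only from stocks outside the chromosome mirrors the requirement that a portfolio never contains the same stock twice.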
These relatively high values of $p_{cross}$ and $p_{mut}$ allow the algorithm to explore a wide set of initial solutions. As suggested by Sivanandam and Deepa (2008), they decrease over time in order to make the algorithm converge faster. Given its optimal properties, uniform crossover is adopted. When a solution is mutated, $n < N$ genes (stocks) are replaced by $n$ randomly selected genes (stocks). None of the replacing elements, however, may already belong to the initial chromosome (portfolio), since a duplicated stock would generate a non-invertible regressor matrix. In order to improve the algorithm's performance, an elitism operator is used by setting the number of rows of $P'$ to 0.5 times that of $P$; this implies that only the better half of the current population is used to breed the next generation. The algorithm stops when at least half of the $P$ matrix is populated by the same model, or when the best model has remained the same for more than half of the total number of generations. In both cases continuing to run the algorithm would be useless because it has, most probably, already converged towards a solution.

In light of these considerations, the only parameter to be tuned by trial and error is $n_P$ (the population size). This is crucial: if it is too small, there is not enough genetic diversity, to the detriment of the quality of the final solution; if it is too large, too many iterations are performed before the algorithm eventually converges. A set of different population sizes is chosen; for each value in this range the algorithm is run ten times and the median fit function is computed. These values are then plotted against the population sizes, and the value at which the median fit function starts to converge is chosen. The last point to be discussed is how we select the fifty portfolios. Instead of running the algorithm fifty times and picking the best solutions, we run it once, rank all unique solutions and select the first 50. This is done not only to reduce the computational burden but also to obtain a more diversified set of final portfolios.

4.2. Portfolio simulation and rebalancing

From the optimization algorithm described in (9), (10) and (11) we get $K = 50$ optimal vectors of weights $w^*$, one for each of the GA solutions. We take $K > 1$ possible optimal portfolios as a robustness check. Our goal is to investigate the out-of-sample performance of the procedure against some other standard benchmark procedures. The analysis relies on a rolling sample approach [1]. In particular, given a dataset with $T$ daily observations for each of the $n$ stocks, we choose an estimation window of $M = 24$ months of daily observations. This is the training sample used to calibrate the stock selection model. Now consider a rebalancing period $h$; then at each time $t$, starting from $t = M + h$, we use the data in the previous $M$ months to calibrate the optimization problem. The output vector of weights $w^*$ is then used to determine the tracker portfolio returns in $t + h$. This process is continued by adding a period $h$ of daily data, dropping the earliest one, and running the optimization procedure again. The outcome is then $T - M$ monthly out-of-sample returns generated by each of the $K$ portfolio solutions. The period $T - M$ is called the testing period, since here we test the reliability of the cointegration-based stock selection strategy. Notice that the portfolio composition is held fixed from $t = M$ to $t = T$. That is, the GA is run once to obtain the initial tracker fund composition, which is not modified in the rebalancing procedure. This is meant to stress the reliability of cointegration in reducing turnover and transaction costs. Indeed, selecting the most cointegrated subset of stocks is intended to pay off especially in the long run. A drawback is that we should limit the investment horizon to one year because of the changing nature of the index composition. However, dynamically modeling the composition of the index is outside our scope. The rebalancing horizons considered are $h = 1, 3, 6, 12$ months of daily data.
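The rolling scheme described above can be sketched as a generator of window boundaries. The function name `rolling_windows`, the sample length of 600 days and the convention of 21 trading days per month are assumptions made for illustration; the exact alignment of the first window in the empirical exercise may differ.

```python
def rolling_windows(n_obs, train_len, step):
    """Yield (train_start, train_end, test_end) index triples.

    Weights are calibrated on rows [train_start, train_end) and then held
    over the out-of-sample rows [train_end, test_end). Each new window adds
    `step` observations and drops the earliest `step` observations.
    """
    start = 0
    while start + train_len + step <= n_obs:
        train_end = start + train_len
        yield start, train_end, train_end + step
        start += step

DAYS_PER_MONTH = 21                      # assumed number of trading days per month
windows = list(rolling_windows(n_obs=600,
                               train_len=24 * DAYS_PER_MONTH,   # M = 24 months
                               step=1 * DAYS_PER_MONTH))        # h = 1 month
```

With these hypothetical numbers the generator returns four windows, the first one training on rows 0 to 503 and testing on rows 504 to 524. In each window only the weights would be re-optimized, since the stock selection itself is held fixed.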
Transaction costs are subtracted at each portfolio rebalancing. In the case of one-month rebalancing, $h$ consists of 21 trading-day observations. Finally, we consider different values of $n$. We take these as percentages of the size of the original benchmark, with $p = [0.05, 0.1, 0.15]$, such that $n = p \times N$ represents the size of the tracker fund.

[1] As a robustness check the procedure has been repeated using an expanding-window approach. The results turn out to be qualitatively the same. However, we decided to report the rolling approach since it allows us to impose some very simple dynamics on the rebalancing framework.

4.3. Performance measures

The output of the index tracking procedure is a series of $T - M$ out-of-sample daily returns generated by each of the $K$ portfolios. Based on this output we compute several performance measures to assess the reliability of the cointegrated GA procedure. We chose the performance measures so as to be consistent with the objective function (9) and the reference literature. As a first measure of closeness we compute the average beta of the tracker portfolios. This is done by regressing the index returns on each of the $K$ tracking returns by Ordinary Least Squares:

$$ \hat{\beta} = \frac{1}{K} \sum_{i=1}^{K} \beta_i, \quad \text{where } \beta_i \text{ comes from } r_I(t) = \beta_i\, r_{i,p}(t) + \eta_t \text{ for } i = 1, \dots, K \tag{15} $$

where $\eta_t$ is stationary. The $\beta_i$ in (15) measures the sensitivity of the $i$th tracking portfolio return with respect to the index return. This is meant as a closeness measure: in principle, the closer $\hat{\beta}$ is to 1, the closer the tracking behavior is to the benchmark. The second measure adopted is the out-of-sample average Mean Squared Error, computed as

$$ \widehat{MSE} = \frac{1}{K} \sum_{i=1}^{K} MSE_i \quad \text{with} \quad MSE_i = \frac{1}{T - M} \sum_{t=1}^{T-M} \left( r_p^i(t) - r_I(t) \right)^2. \tag{16} $$

We adopt (16) because it is consistent with some of the reference literature, such as Maringer and Oyewumi (2007) and Beasley et al. (2003) among others, and with the objective function used in (9).
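The two closeness measures in (15) and (16) can be computed as in the following sketch. The function names and the three-observation return series are hypothetical; the regression in (15) is taken without an intercept, as printed.

```python
import numpy as np

def tracking_beta(r_index, r_port):
    """OLS slope, without intercept, from regressing index returns on portfolio returns."""
    return float(r_port @ r_index / (r_port @ r_port))

def tracking_mse(r_index, r_port):
    """Out-of-sample mean squared deviation between portfolio and index returns."""
    return float(np.mean((r_port - r_index) ** 2))

def average_over_portfolios(measure, r_index, r_ports):
    """Average a per-portfolio measure across the K tracking portfolios."""
    return float(np.mean([measure(r_index, r) for r in r_ports]))

# Hypothetical example with K = 2 portfolios: a perfect tracker and one
# whose returns are twice the index returns.
r_index = np.array([0.01, -0.02, 0.03])
r_ports = [r_index.copy(), 2.0 * r_index]
avg_beta = average_over_portfolios(tracking_beta, r_index, r_ports)
avg_mse = average_over_portfolios(tracking_mse, r_index, r_ports)
```

The perfect tracker has a beta of one and zero MSE, while the leveraged portfolio has a beta of one half, so the averages move away from their ideal values as soon as one portfolio tracks poorly.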
The third measure adopted is the average Tracking Error Volatility [TEV] across the selected portfolios, computed as

$$ \widehat{TEV} = \frac{1}{K} \sum_{i=1}^{K} TEV_i \quad \text{with} \quad TEV_i = \frac{1}{T - M} \sum_{t=1}^{T-M} \left[ (r_I - r_i) - (\hat{r}_I - \hat{r}_i) \right]^2 \tag{17} $$

where $\hat{r}_I$ and $\hat{r}_i$ are the mean out-of-sample index and $i$th tracking returns, respectively. This measure is meant to be consistent with most of the reference financial literature, such as Alexander and Dimitru (2005), Jansen and Van Dijk (2002) and Jorion (2003). The last closeness measure is the average turnover, which provides a measure of the stability of the portfolio selection procedure, namely the reliability of cointegration and the GA together with the weight selection in (9). The average turnover is computed as

$$ \widehat{TO} = \frac{1}{K} \sum_{i=1}^{K} TO_i \quad \text{with} \quad TO_i = \frac{1}{T - M} \sum_{t=1}^{T-M} \sum_{j=1}^{N} \left| w^i_{j,t+1} - w^i_{j,t} \right|. \tag{18} $$

Finally, we consider a non-parametric performance measure. A non-parametric measure is useful since it is not affected by outliers in determining the probability of obtaining a positive excess return, as could be the case when using the expected excess return. Again, what we are interested in is the probability of outperforming the index, especially as $\lambda$ approaches 0. The non-parametric measure used is simply the average probability of a positive excess return with respect to the index, defined as

$$ \widehat{P}(r_i > r_I) = \frac{1}{K} \sum_{i=1}^{K} P(r_i > r_I) \tag{19} $$

with

$$ P(r_i > r_I) = \frac{1_{\{r_i > r_I\}}}{T - M}, \quad \text{where } 1_{\{r_i > r_I\}} = \begin{cases} 1 & \text{if } r_i > r_I \\ 0 & \text{otherwise.} \end{cases} \tag{20} $$

Notice that the averages are taken across the $K$ portfolios generated for each of the comparison methodologies. The latter are explained in more detail in Section 5.

5. Comparing the algorithm: Alternative Methodologies

We propose four different methodologies to test the reliability of the cointegrated index tracking methodology proposed earlier. They have been chosen to help disentangle the joint benefit of cointegration-based stock picking and the weight selection objective function (9).
Notice that we keep the stock picking and the weight selection procedures separate. This is to consistently compare these benchmark procedures with the cointegrated GA we propose.

5.1. Correlation-based stock picking

The first two methodologies are based on the correlation between the index returns and the returns of the constituent stocks. The "optimal" subset of stocks for the tracker fund is obtained by taking the n stocks most correlated with the index. To generate the K = 50 portfolios we use a stationary non-parametric bootstrap as in Politis and Romano (1994), obtaining K tracker funds. This exercise is repeated for three different fund sizes, with p = [0.05, 0.1, 0.15] of the benchmark such that n = p × N, with N the index dimension. In this case, for fixed p, the stock selection is based on a binary selection variable z*_j defined as

$$ z^{*}_{j} = \operatorname{rank}\{\rho_{i,j}\}_{i=1}^{n} \quad \text{with} \quad \rho_{i,j} = \operatorname{corr}(r_{i,j}, r_I) \quad \text{and} \quad j = 1, \ldots, K \tag{21} $$

where r_{i,j} is the return of the ith stock in the jth sampling and rank(·) is a ranking operator that sorts the stocks in descending order of correlation with the index and retains the n most correlated ones. Once the stock picking is done, we find the optimal portfolio weights by applying either (9) plus (10) or the following returns-based counterpart

$$ w^{*} = \arg\min_{w \in W} \left\{ \lambda \sqrt{\frac{1}{T}\sum_{t=1}^{T} (r_t w - I_t)^2} + (1-\lambda)\,\frac{1}{T}\sum_{t=1}^{T} (r_t w - I_t) + \sum_{i=1}^{N} s_i \left| w_i - \tilde{w}_i \right| \right\} \quad \text{s.t. } w \in C(w) \tag{22} $$

where \tilde{w}_i denotes the reference weights entering the transaction-cost penalty and

$$ C(w) = \left\{ w \;\middle|\; \sum_{i=1}^{N} z_i w_i = 1,\ \sum_{i=1}^{N} |w_i z_i| = 1,\ \sum_{i=1}^{N} z_i \le n \right\} \tag{23} $$

with r_t the n-dimensional vector of returns of the selected stocks. Notice that (22) is solved for each of the K sampled portfolios, generating different solutions whose performances are then averaged (see footnote 2).

5.2. Random selection

The methods referred to as 3 and 4 in the empirical results are based on random stock selection. This method is taken as a benchmark because taking a large subset of stocks from a broadly based index is likely to produce some cointegration between the selected stocks and the index.
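Both benchmark selection rules, the correlation ranking in (21) and the random draw of Section 5.2, reduce to building a binary selection vector with n ones. The sketch below illustrates this, together with a simple index resampler in the spirit of the stationary bootstrap of Politis and Romano (1994). It is only an illustration: the data layout (a dictionary of return lists), the function names and the `mean_block` parameter are our assumptions, since the block length used in the bootstrap is not stated in the text.

```python
import math
import random
from typing import Dict, List, Sequence


def correlation(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample Pearson correlation between two equally long series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def pick_most_correlated(
    stock_returns: Dict[str, Sequence[float]], index_returns: Sequence[float], n: int
) -> Dict[str, int]:
    """Binary selection in the spirit of (21): 1 for the n most correlated stocks."""
    ranked = sorted(
        stock_returns,
        key=lambda s: correlation(stock_returns[s], index_returns),
        reverse=True,
    )
    chosen = set(ranked[:n])
    return {s: int(s in chosen) for s in stock_returns}


def pick_random(names: Sequence[str], n: int, rng: random.Random) -> Dict[str, int]:
    """Binary selection for the random benchmark: n stocks drawn without replacement."""
    chosen = set(rng.sample(list(names), n))
    return {s: int(s in chosen) for s in names}


def stationary_bootstrap_indices(
    length: int, mean_block: float, rng: random.Random
) -> List[int]:
    """Resampled time indices: blocks of random (geometric) length, wrapping around."""
    p = 1.0 / mean_block  # probability of starting a new block at each step
    idx = [rng.randrange(length)]
    while len(idx) < length:
        if rng.random() < p:
            idx.append(rng.randrange(length))      # start a new block
        else:
            idx.append((idx[-1] + 1) % length)     # continue the current block
    return idx


if __name__ == "__main__":
    rng = random.Random(0)
    index = [0.01, -0.02, 0.03, 0.00]
    stocks = {
        "A": [0.01, -0.02, 0.03, 0.00],
        "B": [-0.01, 0.02, -0.03, 0.01],
        "C": [0.02, -0.01, 0.03, 0.01],
    }
    print(pick_most_correlated(stocks, index, 2))
    print(pick_random(list(stocks), 2, rng))
    print(stationary_bootstrap_indices(len(index), 2.0, rng))
```

In a setting like the one described here, each of the K samples would apply the resampled indices to the return series before ranking, so that the K correlation-based portfolios can differ from one another.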
Thus, comparing this method with the one we propose helps to point out the advantages of the genetic algorithm with the ADF as loss function. In general, the binary selection vector z* is defined by randomly taking n stocks among the N available. Once we have the binary selection vector, we apply either (9) or (22) to obtain methods 3 and 4 respectively. This helps to disentangle and compare the joint benefit of (9) and (11) as used in the proposed cointegrated GA. The stock and weight selection procedure is repeated K times to obtain the portfolios, and the performances are averaged.

2 The combination of (21) and (9) is referred to as method 1 in the empirical results, while method 2 of the alternative strategies refers to the combination of (21) and (22).

6. Data

6.1. Sample Description

In the empirical investigation we consider three different broadly based indexes, selected in order of increasing dimension. The aim is to disentangle the benefit of the GA jointly with the cointegration procedure as the computational burden grows. The indexes considered are the FTSE 100, the NIKKEI 225 and the S&P 500. Not only the dimension but also the way the index composition is constructed differs across indexes. Indeed, the S&P 500 and the FTSE are value-weighted composite indexes, while the Nikkei is one of the few examples of a price-weighted index. This helps to clarify the benefit of tracking the trajectories instead of the returns. The latter do not take into account, even indirectly, the weighting procedure that is peculiar to price-weighted indexes (see footnote 3). The database is from CRSP, Datastream and Bloomberg. The information used consists of the daily closing prices of the component stocks over the period September 2005 - August 2008. The benchmarks are the aforementioned indexes, namely the S&P 500, the NIKKEI 225 and the FTSE 100. The descriptive statistics are reported in Table (1).

[Insert Table 1 here]

The sample period has been chosen on purpose.
This is to have a bull market in-sample and a bear market out-of-sample. The aim is to check the robustness of the stock picking (cointegration) procedure. Therefore, even though the goal is not to estimate any covariance structure, we can argue that the algorithm identifies the strongest equilibrium relationships even when the market regime switches. A plot of the indexes is reported in Figure (1).

[Insert Figure 1 here]

The grey vertical line indicates the end of the in-sample period for the GA stock selection. As we can see, the in-sample period is a bull market while the out-of-sample period is a bear market, meaning that there is a shift from one market regime to another at the beginning of the testing period. This helps to clarify the benefits of the cointegration approach as a stock selection strategy for the tracker fund.

3 The value-weighted index is based on capitalization. However, since capitalization is the number of outstanding shares times the trading price, we can fairly say that the weight of a share in the index is a function of its price, even though indirectly.

7. Empirical Results

In this section we report the empirical results of the GA + cointegration procedure compared with the other benchmark/alternative methodologies. The results are reported separately for each of the three indexes considered. Further, the results are reported for three different values of λ in (9), precisely λ = [1, 0.75, 0.5]. Again, we consider different values of n, meaning different percentages of the index as subset of stocks. Specifically, we take p = [0.05, 0.1, 0.15] of the stocks such that n = p × N, with N the index dimension. Then we take the average across these percentages. This is meant as a robustness exercise. Figure (2) reports the results for the average portfolio obtained with the GA procedure.

[Insert Figure 2 here]

In each table, the first column reports the absolute value of the measure for our algorithm.
The second to the last columns instead report the ratio with the GA procedure as numerator and the alternative procedure as denominator. Therefore, a value greater than 1 means that the GA has a greater value for that measure, and vice versa. The alternative procedures are numbered as follows. Method 1 is the correlation-based stock picking with (9) as objective function. Method 2 is the correlation-based stock selection strategy with the usual MSE as in (3). Methods 3 and 4 are the randomly selected portfolios with (9) and (3) respectively as objective functions. Finally, the βs are compared using the ratio |1 − β̂|/|1 − β′|, where β′ is the average beta of the K alternative portfolios generated.

7.1. FTSE 100

Table (2) reports the experimental results for the FTSE 100. The β̂ ranges from roughly 0.78 to 0.81. For all of the alternative procedures the ratios are clearly below 1. This is true for all four investment horizons considered and for all of the λs. Again, the randomly selected portfolios seem to do better than the correlation-based ones. This provides some insight into the out-of-sample unreliability of the usual correlation as a stock selection procedure. The reason why the randomly selected portfolios do not perform so badly is probably that taking a large subset of the stocks in the index is likely to capture some weak cointegration relationship.

[Insert Table 2 here]

The relationship between correlation-based and randomly selected portfolios persists also with reference to the MSE. Again, this is due to the aforementioned probability of randomly selecting a cointegration relationship in a broadly based index such as the FTSE 100. As we can see, the GA procedure clearly outperforms the others. This is true across all of the lambdas and all of the investment horizons. The picture changes slightly for the average TEV.
Indeed, both the randomly selected portfolios and the correlation-based ones underperform the proposed index tracking methodology to a comparable extent. On the other hand, considering the probability of obtaining positive excess returns as in (19), we can see that the correlation-based portfolios slightly outperform the randomly selected ones. The GA in any case achieves the highest probability of positive excess returns. This holds regardless of the lambda considered and persists across investment horizons.

[Insert Table 3 here]

Table (3), Panel A, shows the average trading volume for the FTSE case. As we can see, for the short rebalancing terms, namely one month and three months, the performance of the cointegrated GA procedure is close to that of the alternatives. This is true especially for λ = 1. On the other hand, the GA starts to noticeably outperform the other procedures as the investment horizon increases, i.e. 6 and 12 months. This shows that the strongest cointegration relationships found by the GA through the ADF actually reduce the rebalancing needs, yielding more stable portfolios. This is perfectly in line with our prior assumption that cointegration matters more than correlation as a measure of (especially long-term) dependence.

7.2. NIKKEI 225

Table (4) reports the results for the Nikkei 225 index. The βs increase in absolute terms. This is to be expected, since the number of stocks taken as a subset increases proportionally to the index dimension. In relative terms, we can point out the same rationale as for the FTSE case. Indeed, random selection seems to be a tough competitor in terms of tracker sensitivity. This is true especially if we use (9) as objective function. However, this competitiveness of the random selection strategy disappears with respect to all of the other measures. Indeed, all four alternative procedures are clearly, and to a similar extent, outperformed by the cointegrated GA method when looking at the MSE measure.
This is also true with respect to the average tracking error volatility.

[Insert Table 4 here]

The probability of obtaining a positive excess return shows another interesting picture. Indeed, all of the procedures deliver roughly comparable values. However, generally speaking, the cointegrated GA procedure achieves the same probability of realizing an excess return with less risk. Hence the risk-adjusted probability of obtaining excess returns is in any case higher for the methodology proposed in this paper. Finally, Panel B of Table (3) shows the average trading volume. We can point out that the correlation-based methods are clearly outperformed, while the random selection seems to be fairly comparable. The latter indeed turns out to be better, especially for the long-term investment horizon. However, overall the cointegrated GA procedure outperforms the random selection, since we obtain better expected returns once adjusted for risk and transaction costs. This holds because the probability of obtaining excess returns is adjusted for transaction costs, meaning that risk-adjusted excess returns are net of the effect of trading volume. Our procedure therefore seems to outperform the others overall.

7.3. S&P 500

The last index is the most relevant for our purposes, because it is the largest index in our dataset and hence the most informative with respect to the curse of dimensionality. Our prior is that the cointegrated GA procedure performs clearly better on all of the measures, thanks to the joint benefit of the efficiency of the genetic algorithm and of cointegration. Table (5) reports the results for the average tracker fund. As before, the first column reports the absolute value for our methodology, while the second to the last columns report the relative (ratio) values, with the GA as the numerator. As we can see, the βs are much better than those of the alternative procedures. The evidence is far stronger than for the other two indexes.
This shows how our procedure improves as the curse of dimensionality starts to become a serious issue. This makes sense given the rationale behind the GA and the parsimony that cointegration brings to the stock picking procedure. The same pattern emerges for the MSE measure. Again, none of the four alternatives is comparably close to the one we propose. The only one that seems to be fairly close is the one with random selection and (9) as objective function. Again, this is because taking a fairly large subset implies capturing some (maybe weak) equilibrium (cointegration) relationship. The optimization function we defined then helps in getting closer to the index trajectory, avoiding the measurement error mentioned earlier.

[Insert Table 5 here]

The average TEV points to the same rationale. The random selection does not perform so badly, while the correlation-based strategies are by far the worst. As before, this is true for all of the λs considered and across the index percentages. In relative terms, the probability of obtaining positive returns is fairly comparable for all four alternative methodologies. However, as for the other two indexes, the risk-adjusted performance is clearly in favour of the cointegrated GA procedure. The same argument applies to the average trading volume shown in Panel C of Table (3). Although the random selection with (9) as objective is fairly comparable in relative terms, our procedure is better overall, especially because the average trading volume for the longest investment horizon is largely in favour of our procedure. This is consistent with our initial objective of obtaining stable tracker portfolios through the strongest cointegration relationships.

8. Concluding remarks

In this paper we consider an alternative index tracking methodology based on cointegration applied to high-dimensional indexes.
The allocation procedure is separated into two steps: (1) obtaining the optimal subset of stocks by minimizing a particular loss function, i.e. the ADF, by means of a genetic algorithm, and (2) obtaining the optimal vector of weights by tracking the normalized trajectory of the index instead of the ubiquitous returns. In this way we empirically investigate the theoretical benefits of cointegration between the tracking portfolio and the benchmark. The genetic algorithm handles the binary model selection problem (stock picking) efficiently by using the ADF as loss function. We essentially exploit the fact that the tracker fund and the benchmark are tied together, especially in the long run. Moreover, we point out that tracking trajectories instead of returns allows us to exploit the full amount of information, including that contained in the common and linear trends of the non-stationary price levels. This is clearly an advantage with respect to the usual correlation-based distance measures, which are widely known to be unstable over time. We test the reliability of our procedure against four alternative methods. The first two refer to correlation-based stock picking, while the others are based on random portfolio selection. In all cases we combine the stock selection with different weight selection objective functions. This allows us to disentangle the joint benefit of using the strongest equilibria (cointegration) as stock picking guidelines and of tracking the benchmark trajectories. In the empirical investigation we consider three different indexes, four different investment horizons and three different tracker fund sizes. All of the performances refer to the average of 50 generated portfolios for each of the procedures and rebalancing periods. Our tracker funds deliver some interesting results.
We outperform the aforementioned alternative procedures with respect to all of the distance measures: Mean Squared Error, Tracking Error Volatility and Beta. The same is true for the risk-adjusted probability of obtaining positive excess returns. Finally, the average trading volume turns out to be lower than that of the competing models, especially for longer investment horizons. This is coherent with the inherent benefit of cointegration as a long-run equilibrium relationship between the tracker fund and the benchmark.

Appendix A: Proposition 1

Let us consider the usual definition of portfolio value, for a subset of n < N stocks, at time t, $V(t) = \sum_{i=1}^{n} c_i(t)\,p_i(t)$. The portfolio return is

$$ r_p(t) = \frac{V(t) - V(t-1)}{V(t-1)} \tag{24} $$

Then consider the usual approximation of the tracker portfolio return, defined as $r_p(t) = \sum_{i=1}^{n} w_i(t)\,r_i(t)$ with $w_i(t) = c_i(t)\,p_i(t) / \sum_{j=1}^{n} c_j(t)\,p_j(t)$ and $r_i(t) = (p_i(t) - p_i(t-1))/p_i(t-1)$. We can see that

$$ r_p(t) = \frac{V(t) - V(t-1)}{V(t-1)} \equiv \sum_{i=1}^{n} w_i(t)\,r_i(t) \quad \Leftrightarrow \quad c_i(t) \approx c_i(t-1) \tag{25} $$

Indeed, the equivalence in (25) comes from

$$ r_p(t) = \frac{V(t) - V(t-1)}{V(t-1)} \equiv \sum_{i=1}^{n} \frac{c_i(t)\,p_i(t) - c_i(t-1)\,p_i(t-1)}{p_i(t-1)} \times \frac{p_i(t-1)}{\sum_{j=1}^{n} c_j(t-1)\,p_j(t-1)} \tag{26} $$

together with the assumption c_i(t) ≈ c_i(t − 1). The approximation of r_p(t) therefore incorporates a non-negligible amount of error. This is especially relevant when using (3) as the primary objective function.

References

Adcock, C. and Meade, N. (1994). A simple algorithm to incorporate transaction costs in quadratic optimization. European Journal of Operational Research, 79:85–94.
Alexander, C. (2001). Optimal hedging using cointegration. Philosophical Transactions of the Royal Society, Series A:2039–2058.
Alexander, C. and Dimitru, A. (2005). Indexing and statistical arbitrage. Journal of Portfolio Management, Winter:50–63.
Ammann, M. and Zimmermann, H. (2001). Tracking error and tactical asset allocation. Financial Analysts Journal, 57(2):32–43.
Barber, B. and Odean, T.
(2000). Trading is hazardous to your wealth: the common stock investment performance of individual investors. Journal of Finance, 55(2):773–806.
Beasley, J. E., Meade, N., and Chang, T. J. (2003). An evolutionary heuristic for the index tracking problem. European Journal of Operational Research, 148(3):621–643.
Brodie, J., Daubechies, I., De Mol, C., Giannone, D., and Loris, I. (2008). Sparse and stable Markowitz portfolios. ECB, Working Paper Series, No. 936:4–19.
Chen, S., editor (2002). Genetic Algorithms and Genetic Programming in computational finance.
Derigs, U. and Nickel, N. H. (2004). On a local-search heuristic for a class of tracking error minimization problems in portfolio management. Annals of Operations Research, 131:45–77.
Dose, C. and Cincotti, S. (2005). Clustering of financial time series with application to index and enhanced index tracking portfolio. Physica A, 355:145–151.
Dunis, C. and Ho, R. (2005). Cointegration portfolios of European equities for index tracking and market neutral strategies. Journal of Asset Management, 6:33–52.
Engle, R. F. and Granger, C. W. J. (1987). Cointegration and error correction: Representation, estimation and testing. Econometrica, 55:251–276.
Fan, J., Zhang, J., and Yu, K. (2008). Asset allocation and risk assessment with gross exposure constraints for vast portfolios. Preprint, Princeton University.
Focardi, S. and Fabozzi, F. (2004). A methodology for index tracking based on time-series clustering. Quantitative Finance, 4:417–425.
Frino, A., Gallagher, D., Neubert, S., and Oetomo, T. (2004). Index design and implications for index tracking. The Journal of Portfolio Management, Winter 2004:89–95.
Gaivoronski, A., K. S. and Van der Wijst, N. (2005). Optimal portfolio selection and dynamic benchmark tracking. European Journal of Operational Research, 163:115–131.
Granger, C. W. J. (1981). Some properties of time series data and their use in econometric model specification. Journal of Econometrics, 16:121–30.
Granger, C. W. J. (1983). Cointegrated variables and error correction models. Discussion Paper, UCSD.
Haupt, R. and Haupt, S. (2004). Practical genetic algorithms. Wiley.
Holland, J. (1974). Adaptation in natural and artificial systems. University of Michigan Press.
Jagannathan, R. and Ma, T. (2003). Risk reduction in large portfolios: Why imposing the wrong constraint helps. Journal of Finance, 58(4):1651–1684.
Jansen, R. and Van Dijk, R. (2002). Optimal benchmark tracking with small portfolios. The Journal of Portfolio Management, Winter 2002:33–39.
Jensen, M. (1968). The performance of mutual funds in the period 1945-1964. Journal of Finance, 23(2):389–416.
Jong, J. D. (1975). An analysis of the behavior of a class of genetic adaptive systems.
Jorion, P. (2003). Portfolio optimization with tracking-error constraints. Financial Analysts Journal, September/October 2003:70–82.
Maringer, D. and Oyewumi, O. (2007). Index tracking with constrained portfolios. Intelligent Systems in Accounting, Finance and Management, 15:57–71.
Moral, P. D. and Miclo, L. (1999). On the convergence and applications of generalised simulated annealing. SIAM Journal on Control and Optimization, 2:109–135.
Politis, D. N. and Romano, J. P. (1994). The stationary bootstrap. Journal of the American Statistical Association, 89.
Ruiz-Torrubiano, R. and Suárez, A. (2009). A hybrid optimization approach to index tracking. Annals of Operations Research, 166:57–71.
Sivanandam, S. and Deepa, S. (2008). Introduction to Genetic Algorithms.

TABLES AND FIGURES

Table 1: Descriptive Statistics
This table reports descriptive statistics of the considered indexes. Period from Sep 2005 to Aug 2008, T = 756 daily observations. ValueW denotes a value-weighted index composition and PriceW a price-weighted one. Data are not expressed in percentages.

Index        Num. Stocks  Mean     Median  St. Dev.  Skew     Kurt    Type
FTSE 100     81           -0.0001  0.0000  0.0117    0.1304   8.7012  ValueW
NIKKEI 225   208          0.0002   0.0000  0.0131    -0.4368  4.6387  PriceW
S&P 500      442          0.0001   0.0004  0.0094    -0.1624  5.2076  ValueW

Table 2: FTSE 100
This table reports the results for the FTSE 100 index. The sample runs from Sep 2005 to Aug 2008 with T = 756 daily observations. The in-sample length is M = 504 daily observations, with a testing period of T − M = 252 daily observations. A month is represented by 21 trading days. Models 1 and 2 use correlation-based stock picking, while Models 3 and 4 use randomly selected portfolios. Beta is the OLS coefficient from regressing the index returns on the tracker returns, Mse is the Mean Squared Error, Tev is the Tracking Error Volatility and Prob is the probability of obtaining a positive excess return.

Out-of-Sample Performances

Beta
λ     Horizon  GAcoin  1      2      3      4
1     1        0.805   0.535  0.569  0.702  0.778
1     3        0.803   0.549  0.590  0.742  0.816
1     6        0.793   0.559  0.593  0.735  0.837
1     12       0.787   0.520  0.559  0.758  0.819
0.75  1        0.808   0.512  0.555  0.684  0.735
0.75  3        0.800   0.546  0.601  0.732  0.803
0.75  6        0.789   0.566  0.624  0.759  0.797
0.75  12       0.782   0.525  0.580  0.784  0.786
0.5   1        0.807   0.528  0.556  0.679  0.794
0.5   3        0.800   0.553  0.600  0.734  0.829
0.5   6        0.792   0.562  0.607  0.739  0.829
0.5   12       0.784   0.522  0.569  0.761  0.819

Mse
λ     Horizon  GAcoin     1      2      3      4
1     1        8.35e-005  0.698  0.759  0.696  0.846
1     3        8.68e-005  0.730  0.819  0.714  0.867
1     6        9.24e-005  0.699  0.763  0.694  0.867
1     12       9.89e-005  0.567  0.606  0.667  0.836
0.75  1        7.81e-005  0.631  0.704  0.679  0.732
0.75  3        8.85e-005  0.740  0.846  0.775  0.856
0.75  6        9.57e-005  0.739  0.827  0.770  0.816
0.75  12       1.04e-004  0.590  0.644  0.750  0.784
0.5   1        8.07e-005  0.680  0.723  0.699  0.865
0.5   3        8.77e-005  0.750  0.827  0.758  0.878
0.5   6        9.01e-005  0.706  0.757  0.712  0.834
0.5   12       9.72e-005  0.552  0.598  0.681  0.805

Tev
λ     Horizon  GAcoin     1      2      3      4
1     1        3.40e-002  0.799  0.847  0.820  0.892
1     3        3.42e-002  0.813  0.870  0.824  0.895
1     6        3.54e-002  0.798  0.838  0.818  0.896
1     12       3.60e-002  0.707  0.745  0.807  0.875
0.75  1        3.34e-002  0.768  0.826  0.816  0.848
0.75  3        3.45e-002  0.811  0.884  0.849  0.890
0.75  6        3.59e-002  0.813  0.876  0.851  0.871
0.75  12       3.68e-002  0.715  0.769  0.842  0.853
0.5   1        3.37e-002  0.793  0.831  0.822  0.904
0.5   3        3.45e-002  0.819  0.878  0.843  0.904
0.5   6        3.53e-002  0.806  0.851  0.833  0.891
0.5   12       3.62e-002  0.706  0.752  0.820  0.874

Prob
λ     Horizon  GAcoin  1      2      3      4
1     1        49.54   1.189  1.161  1.041  1.041
1     3        49.05   1.149  1.131  1.029  1.020
1     6        49.22   1.200  1.159  1.030  1.032
1     12       49.13   1.180  1.136  1.033  1.030
0.75  1        49.26   1.180  1.153  1.040  1.043
0.75  3        49.02   1.159  1.130  1.026  1.029
0.75  6        49.21   1.192  1.161  1.030  1.041
0.75  12       48.92   1.181  1.142  1.030  1.036
0.5   1        49.07   1.173  1.138  1.034  1.034
0.5   3        49.07   1.153  1.131  1.025  1.023
0.5   6        49.09   1.185  1.151  1.029  1.033
0.5   12       49.01   1.175  1.130  1.033  1.034

Table 3: Average Trading Volumes
This table reports the results for the trading volumes. The results are in percentages. The sample runs from 15-Sep-2005 to 6-Aug-2008 with T = 756 daily observations. The in-sample length is M = 504 daily observations, with a testing period of T − M = 252 daily observations. A month is represented by 21 trading days. Models 1 and 2 use correlation-based stock picking, while Models 3 and 4 use randomly selected portfolios.

Panel A: FTSE 100
λ     Horizon  GAcoin  1      2      3      4
1     1        8.73    0.948  0.874  1.027  0.851
1     3        18.39   0.898  0.933  1.010  1.001
1     6        18.63   0.567  0.464  0.697  0.916
0.75  1        7.89    0.871  0.766  0.881  0.767
0.75  3        17.59   0.815  0.869  0.854  0.897
0.75  6        18.19   0.552  0.520  0.643  0.844
0.5   1        7.23    0.883  0.711  0.806  0.728
0.5   3        16.51   0.829  0.839  0.832  0.911
0.5   6        15.64   0.483  0.456  0.518  0.782

Panel B: Nikkei 225
λ     Horizon  GAcoin  1      2      3      4
1     1        4.89    0.763  0.681  0.999  0.890
1     3        7.46    0.614  0.630  0.817  0.840
1     6        14.60   0.858  0.884  1.142  1.304
0.75  1        5.26    0.805  0.745  1.067  0.950
0.75  3        7.46    0.610  0.659  0.814  0.777
0.75  6        14.89   0.894  1.123  1.186  1.515
0.5   1        5.03    0.753  0.697  1.017  0.880
0.5   3        7.63    0.633  0.735  0.843  0.784
0.5   6        15.18   0.946  1.352  1.215  1.435

Panel C: S&P 500
λ     Horizon  GAcoin  1      2      3      4
1     1        2.40    0.603  0.514  1.032  0.738
1     3        3.47    0.618  0.467  0.943  0.708
1     6        3.33    0.315  0.209  0.647  0.500
0.75  1        2.17    0.531  0.463  0.986  0.718
0.75  3        3.72    0.714  0.512  1.086  0.861
0.75  6        3.83    0.363  0.239  0.753  0.607
0.5   1        2.20    0.555  0.471  1.047  0.743
0.5   3        3.53    0.674  0.460  1.064  0.840
0.5   6        3.51    0.333  0.223  0.649  0.579

Table 4: NIKKEI 225
This table reports the results for the Nikkei 225 index. The sample runs from 2-Aug-2005 to 23-June-2008 with T = 756 daily observations. The in-sample length is M = 504 daily observations, with a testing period of T − M = 252 daily observations. A month is represented by 21 trading days. Models 1 and 2 use correlation-based stock picking, while Models 3 and 4 use randomly selected portfolios. Beta is the OLS coefficient from regressing the index returns on the tracker returns, Mse is the Mean Squared Error, Tev is the Tracking Error Volatility and Prob is the probability of obtaining a positive excess return.

Out-of-Sample Performances

Beta
λ     Horizon  GAcoin  1      2      3      4
1     1        0.932   0.486  0.508  0.901  0.700
1     3        0.929   0.537  0.546  0.925  0.774
1     6        0.931   0.552  0.602  1.012  0.827
1     12       0.931   0.569  0.606  1.095  0.824
0.75  1        0.930   0.495  0.526  0.948  0.730
0.75  3        0.927   0.528  0.584  0.944  0.758
0.75  6        0.929   0.560  0.640  1.039  0.800
0.75  12       0.929   0.577  0.645  1.109  0.818
0.5   1        0.931   0.485  0.524  0.934  0.682
0.5   3        0.927   0.542  0.602  0.971  0.754
0.5   6        0.929   0.564  0.646  1.046  0.766
0.5   12       0.929   0.570  0.646  1.104  0.784

Mse
λ     Horizon  GAcoin     1      2      3      4
1     1        2.45e-005  0.841  0.727  0.759  0.707
1     3        2.45e-005  0.865  0.727  0.744  0.714
1     6        2.46e-005  0.854  0.759  0.753  0.755
1     12       2.36e-005  0.838  0.739  0.761  0.713
0.75  1        2.44e-005  0.869  0.732  0.761  0.760
0.75  3        2.43e-005  0.867  0.757  0.749  0.731
0.75  6        2.44e-005  0.886  0.793  0.753  0.750
0.75  12       2.34e-005  0.863  0.768  0.763  0.724
0.5   1        2.44e-005  0.863  0.764  0.764  0.693
0.5   3        2.45e-005  0.898  0.829  0.764  0.707
0.5   6        2.44e-005  0.889  0.823  0.754  0.696
0.5   12       2.34e-005  0.862  0.795  0.764  0.681

Tev
λ     Horizon  GAcoin     1      2      3      4
1     1        1.94e-002  0.926  0.869  0.885  0.876
1     3        1.95e-002  0.939  0.876  0.881  0.881
1     6        1.95e-002  0.935  0.892  0.886  0.899
1     12       1.91e-002  0.926  0.881  0.891  0.881
0.75  1        1.94e-002  0.938  0.872  0.887  0.888
0.75  3        1.94e-002  0.937  0.892  0.884  0.879
0.75  6        1.94e-002  0.946  0.906  0.886  0.888
0.75  12       1.91e-002  0.935  0.894  0.892  0.878
0.5   1        1.93e-002  0.935  0.888  0.889  0.858
0.5   3        1.94e-002  0.952  0.921  0.891  0.870
0.5   6        1.94e-002  0.948  0.918  0.887  0.865
0.5   12       1.91e-002  0.935  0.905  0.893  0.857

Prob
λ     Horizon  GAcoin  1      2      3      4
1     1        42.44   0.933  0.918  0.925  0.918
1     3        45.83   0.991  0.998  0.995  0.987
1     6        46.58   1.030  1.010  1.004  1.005
1     12       47.33   1.050  1.027  1.021  1.024
0.75  1        46.61   1.037  1.001  0.994  0.983
0.75  3        46.12   1.030  0.989  1.003  0.999
0.75  6        46.47   1.032  0.996  1.006  1.015
0.75  12       48.67   1.081  1.046  1.054  1.053
0.5   1        46.12   1.030  0.989  1.003  0.999
0.5   3        47.21   1.050  1.014  0.971  1.014
0.5   6        44.95   1.000  0.961  1.032  1.037
0.5   12       48.67   1.088  1.047  1.049  1.054

Table 5: S&P 500
This table reports the results for the S&P 500 index. The sample runs from 15-Sep-2005 to 6-Aug-2008 with T = 756 daily observations. The in-sample length is M = 504 daily observations, with a testing period of T − M = 252 daily observations. A month is represented by 21 trading days. Models 1 and 2 use correlation-based stock picking, while Models 3 and 4 use randomly selected portfolios. Beta is the OLS coefficient from regressing the index returns on the tracker returns, Mse is the Mean Squared Error, Tev is the Tracking Error Volatility and Prob is the probability of obtaining a positive excess return.

Out-of-Sample Performances

Beta
λ     Horizon  GAcoin  1      2      3      4
1     1        0.947   0.221  0.218  0.661  0.504
1     3        0.959   0.212  0.202  0.698  0.527
1     6        0.963   0.211  0.197  0.697  0.521
1     12       0.962   0.194  0.177  0.751  0.515
0.75  1        0.950   0.210  0.208  0.661  0.493
0.75  3        0.956   0.200  0.192  0.695  0.511
0.75  6        0.962   0.200  0.188  0.700  0.513
0.75  12       0.960   0.182  0.166  0.759  0.498
0.5   1        0.952   0.219  0.218  0.681  0.558
0.5   3        0.961   0.210  0.201  0.728  0.589
0.5   6        0.965   0.207  0.195  0.726  0.588
0.5   12       0.965   0.191  0.175  0.806  0.569

Mse
λ     Horizon  GAcoin     1      2      3      4
1     1        1.77e-005  0.271  0.271  0.863  0.666
1     3        1.66e-005  0.212  0.195  0.826  0.637
1     6        1.64e-005  0.203  0.179  0.812  0.628
1     12       1.66e-005  0.163  0.133  0.871  0.613
0.75  1        1.70e-005  0.257  0.259  0.868  0.625
0.75  3        1.65e-005  0.210  0.197  0.870  0.613
0.75  6        1.64e-005  0.162  0.134  0.919  0.594
0.75  12       1.91e-002  0.935  0.894  0.892  0.878
0.5   1        1.67e-005  0.253  0.258  0.840  0.723
0.5   3        1.61e-005  0.206  0.192  0.845  0.719
0.5   6        1.60e-005  0.197  0.176  0.830  0.721
0.5   12       1.61e-005  0.158  0.133  0.905  0.688

Tev
λ     Horizon  GAcoin     1      2      3      4
1     1        1.66e-002  0.528  0.528  0.951  0.856
1     3        1.61e-002  0.470  0.449  0.929  0.837
1     6        1.60e-002  0.461  0.431  0.922  0.829
1     12       1.61e-002  0.417  0.374  0.952  0.824
0.75  1        1.63e-002  0.514  0.516  0.952  0.835
0.75  3        1.61e-002  0.468  0.451  0.952  0.831
0.75  6        1.59e-002  0.458  0.429  0.941  0.829
0.75  12       1.60e-002  0.415  0.374  0.978  0.822
0.5   1        1.61e-002  0.511  0.515  0.940  0.875
0.5   3        1.59e-002  0.464  0.446  0.941  0.871
0.5   6        1.58e-002  0.454  0.426  0.935  0.871
0.5   12       1.58e-002  0.411  0.373  0.972  0.858

Prob
λ     Horizon  GAcoin  1      2      3      4
1     1        49.30   1.158  1.181  1.025  1.018
1     3        49.27   1.168  1.195  1.023  1.018
1     6        49.26   1.174  1.202  1.023  1.018
1     12       49.17   1.184  1.221  1.013  1.015
0.75  1        49.15   1.152  1.180  1.019  1.015
0.75  3        49.38   1.173  1.200  1.021  1.017
0.75  6        49.07   1.170  1.198  1.017  1.021
0.75  12       49.07   1.183  1.217  1.011  1.015
0.5   1        49.23   1.156  1.177  1.022  1.014
0.5   3        49.31   1.169  1.189  1.017  1.015
0.5   6        49.21   1.172  1.197  1.018  1.021
0.5   12       49.15   1.183  1.220  1.012  1.023

Figure 1: Indexes
This figure reports the three indexes considered. The grey vertical line represents the end of the in-sample period for the GA stock picking procedure. As we can see, there is a regime switch from bull to bear market at the beginning of the out-of-sample period.

Figure 2: Indexes and the Cointegrated GA procedure
This figure reports the three indexes considered together with the average tracker fund obtained with our proposed approach. The series reported covers the T − M out-of-sample period.