
Dynamic Model Averaging for Practitioners in
Economics and Finance: The eDMA Package
Leopoldo Catania
University of Rome, “Tor Vergata”

Nima Nonejad
Aalborg University and CREATES
Abstract
Raftery, Kárnỳ, and Ettler (2010) introduce an estimation technique called Dynamic Model Averaging (DMA). In their application, DMA is applied to the problem of predicting the output strip thickness for a cold rolling mill, where the output is measured with a time delay. Recently, DMA has also been shown to be very useful in macroeconomic and financial applications. In this paper, we present the eDMA package for DMA estimation in R, which is especially suited for practitioners in economics and finance. Our implementation proves to be up to 133 times faster than a standard implementation on a single-core CPU. Using our package, practitioners are able to perform DMA on a standard PC without needing to resort to clusters with large memory. We demonstrate the usefulness of this package through simulation experiments and an empirical application using quarterly inflation data.
Keywords: Dynamic model averaging, Multi-core CPU, Parallel computing, R, OpenMP.
1. Introduction
Modeling and forecasting economic variables such as real GDP, inflation and the equity premium is of clear importance for researchers in economics and finance. For instance, forecasting inflation is crucial for central banks with regard to conducting optimal monetary policy. Similarly, understanding the behavior of the equity premium is one of the most widely discussed topics in financial economics, as it has great implications for portfolio choice and risk management. In order to obtain the best forecast possible, practitioners often try to take advantage of the many potential predictors available and seek to combine the information from these predictors in an optimal way, see Groen, Paap, and Ravazzolo (2013), Stock and Watson (1999) and Stock and Watson (2008), just to mention a few references. In the context of economic applications, Koop and Korobilis (2011) and Koop and Korobilis (2012) implement a technique developed by Raftery et al. (2010), referred to as Dynamic Model Averaging (DMA). The original intent of that article is to predict the output strip thickness for a cold rolling mill, where the output is measured with a time delay. Besides allowing for time-variation in the regression coefficients of each individual model among the total number of model combinations, DMA also allows the relevant model set to change over time. Koop and Korobilis (2011) and Koop and Korobilis (2012) argue (very elegantly) that, by slightly adjusting the original framework, DMA can be used in an economic context, especially for inflation forecasting.[1]
[1] Specifically, Koop and Korobilis (2012) change the conditional volatility formula of Raftery et al. (2010), arguing that the original formula is not suited to economic data. Their suggestion proves more compatible with the behavior of economic data such as inflation.
Dangl and Halling (2012) provide further suggestions on how to improve DMA such that it
better suits the nature of economic data. More importantly, the aforementioned authors also
provide a very useful variance decomposition scheme using the output from the estimation
procedure. Byrne, Korobilis, and Ribeiro (2014) use the modifications proposed in Dangl and
Halling (2012) to model currency exchange–rate behavior.
However, designing an efficient DMA algorithm remains a challenging task. As previously mentioned, DMA considers all possible combinations from a set of predictors. Typically, many candidate variables are available, which strains the computational facilities at hand; for many practitioners, these typically consist of a standard 8-core CPU. Thus, while a relatively small number of model combinations, typically 1000 to 3000, can be handled with standard loops and software, a larger number of models becomes very burdensome, especially in terms of memory consumption. In order to deal with this issue, Onorante and Raftery (2016) suggest a strategy that considers not the whole model space, but rather a subset of models, and dynamically optimizes the choice of models at each point in time. However, Onorante and Raftery (2016) have to assume that models do not change too fast over time, which is not an ideal assumption when dealing with financial and, in some cases, monthly economic data. Furthermore, the approach of Onorante and Raftery (2016) cannot clearly indicate why certain predictors are considered less important than others. Finally, incorporating the modifications suggested in Dangl and Halling (2012), which are very important with regard to model interpretation, is impossible using the approach of Onorante and Raftery (2016).
In this paper, we introduce the eDMA package for R (R Core Team 2015), which efficiently implements a DMA procedure based on Raftery et al. (2010) and Dangl and Halling (2012). The routines in the eDMA package are principally written in C++ using the armadillo library of Sanderson (2010) and then made available in R through the Rcpp and RcppArmadillo packages of Eddelbuettel, Francois, Allaire, Ushey, Kou, Bates, and Chambers (2016a) and Eddelbuettel, Francois, and Bates (2016b), respectively. Furthermore, the OpenMP API (OpenMP 2008) is used to speed up the computations when shared-memory multiprocessor hardware is available, which, nowadays, is standard for the majority of commercial laptops. However, if the hardware does not have multiple processors, the eDMA package can still be used with the classical sequential CPU implementation.
We must emphasize that the purpose of this paper is neither to provide a detailed technical analysis of the computational facilities of R nor to compare it with other packages. Rather, we aim to provide a package that can be used by a large audience from different academic fields. More importantly, our package allows practitioners to perform DMA using a large number of predictors without needing to master complex programming concepts such as “how to efficiently allocate memory” or “how to efficiently parallelize the computations”. It is worth noting that, within the R environment, the package dma, downloadable from CRAN, can be used to perform the DMA of Raftery et al. (2010). However, dma has several weaknesses: (i) it does not allow for the extensions of Dangl and Halling (2012), which are important for interpreting the amount of time-variation in the regression coefficients and performing a variance decomposition analysis; (ii) it is remarkably slow compared to our alternative; (iii) it requires a very large amount of RAM when executed for moderately large applications; and (iv) it does not allow for parallel computing.
We refer the reader interested in these aspects to Section 5, where we report a comparative analysis between dma and eDMA using simulations. The empirical application in Section 6 intuitively presents different features of the package and illustrates how practitioners can use the output from the estimation procedure to interpret data and draw possibly more nuanced conclusions.
The structure of this paper is as follows: Sections 2 and 3 briefly introduce DMA and its extensions. Section 4 presents the technical aspects. Section 5 provides an intuitive description of the challenges that DMA poses from a practical point of view. Section 6 provides an empirical application to demonstrate the advantages of eDMA from a practical point of view. Practitioners who are solely interested in how to implement DMA with the eDMA package can therefore skip Section 2. Finally, Section 7 concludes.
2. The Model
In this section, we briefly introduce the DMA of Raftery et al. (2010). We then pinpoint some
of the main challenges associated with applying DMA in economic and financial applications.
Thereafter, we propose solutions in Section 5.
Assume a time series of T observations and let i = 1, ..., k index a pool of candidate Dynamic Linear Models (DLMs), see West and Harrison (1999) for more details,

$$y_t = F_t^{(i)\prime} \theta_t^{(i)} + \varepsilon_t^{(i)}, \qquad \varepsilon_t^{(i)} \sim N\big(0, V_t^{(i)}\big), \tag{1}$$

$$\theta_t^{(i)} = \theta_{t-1}^{(i)} + \eta_t^{(i)}, \qquad \eta_t^{(i)} \sim N\big(0, W_t^{(i)}\big). \tag{2}$$
In other words, we have a set of models characterized by different subsets of predictors, $F_t^{(i)}$, drawn from the total number of n predictors available. Let p denote the number of explanatory variables in $F_t^{(i)}$ for model i. Then, $\theta_t^{(i)}$ is a p × 1 vector of time-varying regression coefficients, which evolve according to (2). Note that we do not assume any systematic movements in $\theta_t^{(i)}$; on the contrary, we consider changes in $\theta_t^{(i)}$ as unpredictable.[2] The conditional variances, $V_t^{(i)}$ and $W_t^{(i)}$, for model i are the unknown quantities associated with the observation equation, (1), and the state equation, (2). Obviously, when $W_t^{(i)} = 0$, $\theta_t^{(i)}$ is constant over time. Thus, our model automatically nests the specification of constant regression coefficients. As $W_t^{(i)}$ increases, the variation in $\theta_t^{(i)}$ changes according to (2).[3]
DMA considers a total of $k = 2^n - 1$ possible combinations of the predictors.[4] It then averages across the k combinations using a recursive updating scheme. At each point in time, the ranking of the models is based on the predictive likelihood. Thus, DMA allows for time-variation in the model parameters as well as in the entire model set. More importantly, DMA avoids the difficult task of specifying $W_t^{(i)}$ by using a forgetting factor, $0 < \delta \le 1$, which describes the loss of information through time. Using the same notation as the Appendix of Dangl and Halling (2012), we define: (i) $R_t^{(i)}$ as the unconditional variance of $\theta_t^{(i)}$ (see their equation (14)), (ii) $C_t^{(i)}$ as the estimator for the variance of $\theta_t^{(i)}$ (see their equation (20)), and (iii) $S_t^{(i)}$ as the estimator of the observational variance (see their equation (17)).

[2] See Dangl and Halling (2012) and Koop and Korobilis (2012) for a similar model specification.
[3] Typically, the way one updates the forgetting factor, δ, which determines the magnitude of the shocks that hit $\theta_t$ at each t, avoids unreasonable behavior of $\theta_t$. Thus, we do not need to put any restriction on $\theta_t$ itself, see Dangl and Halling (2012) for more details.
[4] The model $y_t = \varepsilon_t$ is not considered in the universe of models, see also Dangl and Halling (2012).
Then, using the forgetting factor δ, we can rewrite $R_t^{(i)} = \delta^{-1} C_{t-1}^{(i)}$, indicating that there is a relationship between $W_t^{(i)}$ and δ, namely $W_t^{(i)} = \frac{1-\delta}{\delta} C_{t-1}^{(i)}$. This equation shows that the loss of information is proportional to $S_{t-1}^{(i)} C_{t-1}^{(i)}$. In this way, we can control the magnitude of the shocks that affect $\theta_t^{(i)}$ by adjusting δ, without the need to resort to simulation techniques. Accordingly, δ = 1 corresponds to $W_t^{(i)} = 0$, i.e. $\theta_t^{(i)}$ is constant over time. Thus, (1)–(2) with δ = 1 corresponds to the linear regression model. For δ < 1, we introduce more time-variation in $\theta_t^{(i)}$. For instance, with δ = 0.98 the variance increases by 50% within 20 months, which corresponds to gradual time-variation in $\theta_t^{(i)}$. Similarly, fixing δ at 0.96 translates into a 50% increase within 10 months, suggesting relatively more abrupt changes in the regression coefficients (see the short check after the list below). Evidently, while this renders the model more flexible to adapt to changes in $y_t$, the increased variability in $\theta_t^{(i)}$ translates into higher prediction variance. The same line of argument holds for the additional parameter α, which controls the forgetting for the entire model set, see Raftery et al. (2010). For instance, α = 0.95 means that past model performance is less important than when α = 0.99. Finally, we must also model $V_t^{(i)}$. Here we have several choices, which we discuss in more detail below. To summarize, DMA depends on:
• (a): The number of predictors to include in $F_t$. Typically, in economic applications, the set of predictors contains exogenous variables as well as lagged values of $y_t$. For instance, in the context of forecasting quarterly inflation, besides considering predictors such as the unemployment rate and T-bill rates, Koop and Korobilis (2012) also include the first three lags of $y_t$ in $F_t$.
• (b): The choice of α and δ. Typically, in many applications α ∈ {0.98, 0.99, 1} works very well and results do not change drastically. On the other hand, as previously mentioned, the choice of δ is more important. Koop and Korobilis (2012) fix δ at {0.95, 0.98, 0.99, 1} and run DMA using each of these values. They find that results differ considerably in terms of out-of-sample forecasts, see Koop and Korobilis (2012) for more details. Evidently, in many economic applications, it is very plausible that δ would indeed be time-varying. For instance, it is plausible to expect that δ is relatively low in recessions or periods of market turmoil (as there is considerable time-variation in $\theta_t^{(i)}$). On the other hand, δ is expected to be close to 1 during tranquil and expansion periods. Dangl and Halling (2012) propose a very elegant solution to this problem by considering a grid of values for δ and incorporating it in the DMA setting by averaging over the number of predictors as well as over the grid of values for δ. Furthermore, this procedure can also be used to obtain more information from the data through a variance decomposition scheme, see below for more details.
• Modeling $V_t^{(i)}$: In this paper, we facilitate conjugate analysis by making the following assumptions, see also Dangl and Halling (2012) and Byrne et al. (2014): we assume that $V_t^{(i)} = V^{(i)}$ for all t. For time t = 0, we specify a Normal prior on $\theta_0^{(i)}$ and a Gamma prior on $\phi^{(i)} = V^{(i)-1}$. The time-t estimate of the variance of the error term in (1) is then given by equation (16) in the Appendix of Dangl and Halling (2012). More importantly, under these assumptions, when we integrate the conditional density of $y_t$ over the values of $\theta_t^{(i)}$ and $V^{(i)}$ to obtain the predictive density, $p(y_t | F_{t-1})$, the corresponding density has a closed-form solution, namely $p(y_t | F_{t-1}) \sim t_{n_t^{(i)}}\big(\hat{y}_t^{(i)}, Q_t^{(i)}\big)$, where $t_{n_t^{(i)}}$ stands for the Student-t distribution with $n_t^{(i)}$ degrees of freedom and mean and scale given by $\hat{y}_t^{(i)}$ and $Q_t^{(i)}$, see equations (11) and (12) of Dangl and Halling (2012), respectively. We can also follow Koop and Korobilis (2012) and use an Exponentially Weighted Moving Average (EWMA) estimate of $V_t^{(i)}$. However, this requires the practitioner to consider an additional parameter, κ, and, besides adding to the computational burden, EWMA does not drastically change the results.
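The 50% figures quoted above follow from compounding the forgetting step: ignoring data updates, $R_t^{(i)} = \delta^{-1} C_{t-1}^{(i)}$ inflates the coefficient variance by a factor of roughly $\delta^{-h}$ after h periods. A quick sanity check in R (a back-of-the-envelope calculation, not part of the package):

R> c(0.98^(-20), 0.96^(-10))  # both roughly 1.50, i.e. a 50% variance increase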
3. Modified DMA
Below, we present the DMA algorithm modified to incorporate the extensions mentioned above. We refer the reader to Dangl and Halling (2012) for more details. Let $M_i$ denote a model containing a specific set of predictors chosen from the $k = 2^n - 1$ candidates and let $\delta_j \in \{\delta_1, ..., \delta_d\}$ denote a specific choice of the degree of time-variation in the regression coefficients at time t. The total posterior density of model $M_i$ and forgetting factor $\delta_j$ at time t, $p(M_i, \delta_j | F_t)$, is given as
$$p(M_i, \delta_j | F_t) = p(M_i | \delta_j, F_t)\, p(\delta_j | F_t).$$
In order to obtain $p(M_i | F_t)$ and $p(\delta_j | F_t)$, we can use
$$p(M_i | F_t) = \sum_{j=1}^{d} p(M_i | \delta_j, F_t)\, p(\delta_j | F_t). \tag{3}$$
The term $p(M_i | \delta_j, F_t)$ in (3) is given as
$$p(M_i | \delta_j, F_t) = \frac{p(y_t | M_i, \delta_j, F_{t-1})\, p(M_i | \delta_j, F_{t-1})}{\sum_{l=1}^{k} p(y_t | M_l, \delta_j, F_{t-1})\, p(M_l | \delta_j, F_{t-1})}, \tag{4}$$
where
$$p(M_i | \delta_j, F_{t-1}) = \frac{p(M_i | \delta_j, F_{t-1})^{\alpha}}{\sum_{l=1}^{k} p(M_l | \delta_j, F_{t-1})^{\alpha}}. \tag{5}$$
The second term on the right-hand side of (3) is given as
$$p(\delta_j | F_t) = \frac{p(y_t | \delta_j, F_{t-1})\, p(\delta_j | F_{t-1})}{\sum_{l=1}^{d} p(y_t | \delta_l, F_{t-1})\, p(\delta_l | F_{t-1})}, \tag{6}$$
where
$$p(\delta_j | F_{t-1}) = \frac{p(\delta_j | F_{t-1})^{\alpha}}{\sum_{l=1}^{d} p(\delta_l | F_{t-1})^{\alpha}}.$$
As previously mentioned, $p(y_t | M_i, \delta_j, F_{t-1}) \sim t_{n_t}\big(\hat{y}_{i,t}^{(j)}, Q_{i,t}^{(j)}\big)$, where $\hat{y}_{i,t}^{(j)}$ and $Q_{i,t}^{(j)}$ are the corresponding quantities of model $M_i$, $i = 1, ..., k$, conditional on $\delta_j$, $j = 1, ..., d$, and α. Typically, $p(M_i, \delta_j | F_0) = 1/(d \cdot k)$, such that, initially, all model combinations and degrees of time-variation are equally likely. Thereafter, model probabilities are updated using the above recursions.
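To fix ideas, for a single value of $\delta_j$ one step of the scheme in (4)–(5) amounts to two vectorized operations. The following minimal R sketch illustrates the update; vProb (the k probabilities $p(M_i | \delta_j, F_{t-1})$), vLike (the predictive likelihoods $p(y_t | M_i, \delta_j, F_{t-1})$) and dAlpha are hypothetical objects, not part of the eDMA interface:

R> # Forgetting step, equation (5): flatten past performance towards uniformity
R> vPred = vProb^dAlpha / sum(vProb^dAlpha)
R> # Bayes update, equation (4): reweight by the predictive likelihoods
R> vPost = vPred * vLike / sum(vPred * vLike)

The recursion over δ in (6) has exactly the same structure, with the k models replaced by the d grid points.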
3.1. Using the output from DMA
For practitioners, the most interesting outputs from DMA are:
• (i): The predictive mean of $y_{t+1}$ conditional on $F_t$, $\hat{y}_{t+1}$. This is simply an average of the individual model predictive means, that is
$$\hat{y}_{t+1} = \sum_{j=1}^{d} E\big[y_{t+1}^{(j)} | F_t\big]\, p(\delta_j | F_t), \tag{7}$$
where
$$E\big[y_{t+1}^{(j)} | F_t\big] = \sum_{i=1}^{k} E\big[y_{t+1,i}^{(j)} | F_t\big]\, p(M_i | \delta_j, F_t).$$
The formulas for the predictive density are very similar; we only replace the predictive mean with the predictive density, see Dangl and Halling (2012).
• (ii): Quantities such as the expected model size, $E[\text{Size}_t] = \sum_{i=1}^{k} \text{Size}_{t,i}\, p(M_i | F_t)$, where $\text{Size}_{t,i}$ is the number of predictors at time t in model i. This quantity reveals the average number of predictors used by DMA, see Koop and Korobilis (2012).
• (iii): Posterior inclusion probabilities of the predictors. That is, at each t, we calculate $\sum_{i=1}^{k} 1_{(i \subset m)}\, p(M_i | F_t)$, where $1_{(i \subset m)}$ is an indicator function taking the value one if model $M_i$ includes the mth predictor and zero otherwise (see the sketch after this list).
• (iv): The posterior weighted average of δ at each point in time, see equation (6).
• (v): Posterior weighted average estimates of $\theta_t$,
$$E[\theta_t | F_t] = \sum_{j=1}^{d} E\big[\theta_t^{(j)} | F_t\big]\, p(\delta_j | F_t),$$
where
$$E\big[\theta_t^{(j)} | F_t\big] = \sum_{i=1}^{k} E\big[\theta_{t,i}^{(j)} | F_t\big]\, p(M_i | \delta_j, F_t).$$
• (vi): The variance decomposition of the data, $\text{Var}(y_{t+1} | F_t)$, decomposed as
$$\begin{aligned}
\text{Var}(y_{t+1} | F_t) &= \sum_{j=1}^{d}\left[\sum_{i=1}^{k} \big(S_t | M_i, \delta_j, F_t\big)\, p(M_i | \delta_j, F_t)\right] p(\delta_j | F_t) \\
&\quad + \sum_{j=1}^{d}\left[\sum_{i=1}^{k} \big(F_t^{\prime} R_t F_t | M_i, \delta_j, F_t\big)\, p(M_i | \delta_j, F_t)\right] p(\delta_j | F_t) \\
&\quad + \sum_{j=1}^{d}\left[\sum_{i=1}^{k} \big(\hat{y}_{t+1,i}^{(j)} - \hat{y}_{t+1}^{(j)}\big)^2\, p(M_i | \delta_j, F_t)\right] p(\delta_j | F_t) \\
&\quad + \sum_{j=1}^{d} \big(\hat{y}_{t+1}^{(j)} - \hat{y}_{t+1}\big)^2\, p(\delta_j | F_t).
\end{aligned} \tag{8}$$
The first term is the observational variance, Obs. The remaining terms are: the variance due to errors in the estimation of the coefficients, Coeff; the variance due to uncertainty with respect to the choice of the predictors, Mod; and the variance due to uncertainty with respect to the choice of the degree of time-variation in the regression coefficients, TVP. Each of these terms is directly available from the DMA procedure. This way, we can investigate which of these components best explains the expected variation in $y_t$.
4. The eDMA package for R
The eDMA package for R offers an integrated environment for practitioners in economics and finance to perform our DMA algorithm. It is principally written in C++, exploiting the armadillo library of Sanderson (2010) to speed up the computations; the relevant functions are then made available in R through the Rcpp and RcppArmadillo packages of Eddelbuettel et al. (2016a) and Eddelbuettel et al. (2016b), respectively. It also makes use of the OpenMP API (OpenMP 2008) to parallelize part of the routines needed to perform DMA over the series of alternative models. Multiple processors are automatically used if supported by the hardware; however, as will be discussed later, the user is free to manage the level of resources used by the program. The eDMA package is written using the S4 object-oriented system, meaning that classes and methods are available in the code. Specifically, R users will find common methods such as plot(), show() and as.data.frame() in order to visualise the output of DMA and extract estimated quantities.
Once the package is correctly installed and loaded, the user faces one function, named doDMA(), to perform DMA over a series of possible models. The doDMA() function accepts a series of arguments and returns an object of the class DMAclass, which comes with several methods: plot(), show() and as.data.frame(). The arguments the doDMA() function accepts are:
• vY, a T × 1 numeric vector of time-ordered observations of the dependent variable. It can also be an object of the class xts; if this is the case, the time information is used in the graphical representation of the results.
• mF, a T × n matrix of the predictors.
• vDelta, a d × 1 numeric vector representing the grid of values for δ, typically chosen as {0.9, 0.91, ..., 1}.
• dAlpha, a numeric variable representing α in equation (5). By default dAlpha = 0.99.
• vKeep, a numeric vector of indices representing the predictors that must always be included in the models. Combinations of predictors that do not include the variables declared in vKeep are automatically discarded. The indices must be consistent with the columns of mF, i.e., if the first and the fourth variables always have to be included, then we must set vKeep = c(1, 4). It can also be a character vector indicating the names of the predictors, provided these are consistent with the column names of mF. By default, vKeep = NULL, i.e., all combinations are considered and all predictors are free to enter or leave.
• bZellnerPrior, a boolean variable indicating whether the Zellner prior (see Dangl and Halling 2012) should be used for the coefficients at time t = 0. By default, bZellnerPrior = TRUE.
• dG, a numeric variable equal to 50 by default. If bZellnerPrior = TRUE, this represents the variable g in equation (4) of Dangl and Halling (2012). Otherwise, if bZellnerPrior = FALSE, it represents the scaling factor for the variance-covariance matrix of the Normal prior for $\theta_0$, i.e., $\theta_0 \sim N(0, dG \cdot I)$, where I is the identity matrix. Our experience reveals that the latter prior works better for data with T = 100 and T = 200. For T > 200, both priors perform very similarly.
• bParallelize, a boolean variable indicating whether to use multiple processors to speed up the computations. By default, bParallelize = TRUE. Since the use of multiple processors is basically effortless for the user, we suggest not changing this value. Furthermore, if the hardware does not permit parallel computation, the program will automatically adapt and run on a single core.
• iCores, an integer indicating the number of cores to use if bParallelize = TRUE. By default, all but one core are used. The number of cores is guessed using the detectCores() function from the parallel package. The optimal number of cores depends somewhat on the specific application, namely the length of the time series T and the number of predictors n. As detailed in Chapman, Jost, and Van Der Pas (2008), the level of parallelization of the code should be traded off against the increase in computational time due to thread communication. Consequently, the user can fine-tune the application to the available hardware by changing this parameter.
The doDMA() function returns an object of the class DMAclass. This object contains model information and estimated quantities, organised in three slots: model, Est and data. The slot model contains information about the specification used to perform DMA, for example the number of considered models and the computational time. The slot Est contains estimated quantities such as point forecasts, the predictive likelihood, posterior inclusion probabilities of the predictors, filtered estimates of the regression coefficients, etc. Finally, the slot data includes the data passed to the doDMA() function, vY and mF.
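Since DMAclass is an S4 class, its slots can also be inspected directly with standard S4 tools. For instance, given an object DMA returned by doDMA() (as in Section 4.1 below), the following should work; “@” is ordinary S4 slot access:

R> slotNames(DMA)     # lists the three slots: model, Est and data
R> Info = DMA@model   # direct access to the model slot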
For reproducibility purposes, the data that we use in the empirical application of this paper are included in the eDMA package. These are two time series objects of the class xts, named gdpdef and predictors, representing US quarterly inflation, measured as the logarithmic change in the implicit GDP price deflator, and several related predictors, such as the unemployment rate, T-bill rates and expected inflation, respectively. A detailed description of the variables included in the predictors object is reported in Section 6.
4.1. Using eDMA
After having installed eDMA through the usual install.packages() function, it can be loaded using library(eDMA). Once the package is loaded, gdpdef and predictors can be made available in the workspace using the following chunk of code:

R> library(eDMA)
R> data("gdpdef")
R> data("predictors")
For instance, suppose that we are interested in performing DMA using the first seven variables included in the object predictors. We can define a new object mF as

R> mF = predictors[, 1:7]

The mF object is a T × 7 matrix containing the first seven predictors detailed in Section 6; the first four columns are the constant term and the first three lagged values of inflation, respectively. Suppose now that we want to run DMA over all possible combinations of the predictors while keeping the first and the second predictors fixed (the constant term and the first lag of $y_t$). This can easily be done using the following portion of code:
R> DMA = doDMA(gdpdef, mF, vDelta = seq(0.9,1.0,0.02),
dAlpha = 0.99, vKeep = c(1, 2))
A summary of the DMA procedure is available by calling the object returned by doDMA():

R> DMA
------------------------------------------
Dynamic Model Averaging
------------------------------------------
Model Specification
T = 203
n = 7
Number of possible models = 32
Variables always included : const, lag1
------------------------------------------
Instability parameters:
Delta = 0.9, 0.92, 0.94, 0.96, 0.98, 1
------------------------------------------
Elapsed time : 0.15 secs
Note that we have specified a grid of six equally spaced values for δ (d = 6), ranging from 0.9 to 1.0. Furthermore, since we did not specify any values for bZellnerPrior and bParallelize, their default values were used, i.e., bZellnerPrior = TRUE and bParallelize = TRUE.
In order to extract the quantities estimated by the DMA procedure, the user can rely on the as.data.frame() method. The as.data.frame() method accepts two arguments: (i): an object of the class DMAclass, and (ii): a character string, which, indicating the quantity to extract. Possible values for which are:
• mincpmt: Posterior inclusion probabilities of the predictors at each point in time, see Koop and Korobilis (2012).
• vsize: Expected number of predictors (average size), see Koop and Korobilis (2012).
• mtheta: Filtered estimates of the regression coefficients for DMA.
• mpmt: Posterior probabilities of the degrees of instability.
• vyhat: Point forecasts of DMA, see equation (7).
• vLpdfhat: Predictive likelihood of DMA, see Dangl and Halling (2012).
• vdeltahat: Posterior weighted average of δ.
• mvdec: Individual components of (8), see point (vi) in Section 3.1 and Dangl and Halling (2012) for more details. The function returns a T × 5 matrix whose columns contain the following variables:
  – vtotal: Total variance, that is, vtotal = vobs + vcoeff + vmod + vtvp.
  – vobs: Observational variance.
  – vcoeff: Variance due to errors in the estimation of the coefficients.
  – vmod: Variance due to model uncertainty.
  – vtvp: Variance due to uncertainty with respect to the choice of the degree of time-variation in the regression coefficients.
For instance, in order to extract the posterior inclusion probabilities of the predictors, we can
easily run the following command
R> PostProb = as.data.frame(DMA, which = 'mincpmt')
which returns a T × 7 matrix of inclusion probabilities for the regressors at each point in time. Furthermore, if the supplied vector of observations vY is an xts object, the class membership is automatically transferred to the output of the as.data.frame() method. Obviously, the inclusion probabilities for those predictors that are always included are equal to one by construction.
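In the same way, the variance decomposition in (8) can be extracted and turned into the relative weights plotted in panel (d) of Figure 4. The following is a sketch of that normalization; VarDec and RelWeights are hypothetical names:

R> VarDec     = as.data.frame(DMA, which = 'mvdec')
R> RelWeights = VarDec[, 2:5] / VarDec[, 1]  # Obs, Coeff, Mod, TVP as shares of vtotal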
Finally, the plot() method is available for the class DMAclass. Specifically, this method prints an interactive menu in the console, permitting the user to choose among a series of graphical representations of the estimated quantities. It can be executed by running
R> plot(DMA)
Print 1-11 or 0 to exit

 1: Point forecasts
 2: Predictive likelihood
 3: Posterior weighted average of delta
 4: Posterior inclusion probability of the predictors
 5: Posterior probability of the degree of instability
 6: Filtered estimates of the regression coefficients
 7: Observational variance
 8: Variance due to errors in the estimation of the coefficients
 9: Variance due to model uncertainty
10: Variance due to uncertainty with respect to the choice of the degrees
    of time-variation in the regression coefficients
11: Expected number of predictors (average size)
and selecting the desired option. Examples of the graphs produced by eDMA are reported in Section 6. As mentioned before, if the data passed to doDMA() are of the class xts, then the time information is used in the graphs.
5. Computational challenges
Although estimation of DMA does not require resorting to simulation methods, performing DMA in many economic applications can become computationally very cumbersome. As can be seen from the set of recursions in Section 3, DMA consists of a large number of summations over different model combinations. Furthermore, the information from these combinations cannot be discarded, as it is subsequently used to compute (6)–(8), which means that the estimation procedure tends to occupy a large chunk of memory. Consequently, estimation with a large number of predictors becomes difficult from a practical point of view, as the system simply runs out of memory. For instance, Koop and Korobilis (2012) use DMA to forecast quarterly inflation based on the generalized Phillips curve at different horizons. Thus, $y_t$ in (1) is the percentage change in the quarterly GDP price deflator and $F_t$ consists of 14 exogenous predictors. Furthermore, since inflation is highly persistent, Koop and Korobilis (2012) also include three lags of $y_t$ in $F_t$. However, handling $2^{17}$ combinations proves very burdensome in their programming framework; as a consequence, the authors choose to include the lags of inflation in all model combinations, thus reducing the model space. Furthermore, Koop and Korobilis (2012) estimate several DMAs by fixing δ at different values, such as 0.95, 0.98, 0.99 and 1.0, which again increases the computational burden.
We can argue that DMA imposes a very substantial challenge for the practitioner when dealing with a large number of predictors: besides the tasks of translating mathematical equations into code, handling data and addressing estimation issues, a practitioner also has to overcome “technical/computer science” challenges, such as how to deal with extensive memory consumption and how to use multiple cores instead of a single core to reduce computation time. Although one can always improve the computational procedure by “coding smarter” or discovering ways to optimize memory allocation, it seems unreasonable to expect practitioners in economics to have extensive knowledge of computer science concepts such as those stated above. In this paper, we provide practical solutions to these problems. First, computation time is reduced by writing all the code in C++ using the armadillo library of Sanderson (2010). Second, we exploit multiple processors through the OpenMP API whenever the hardware supports it. The combination of C++ routines and parallel processing dramatically speeds up the computations relative to the same code written in plain R. As an example, we report a comparison between our code and the available dma package. For this experiment, since the dma package cannot operate over a grid of values for δ, we consider the case of δ = 0.95. We simulate
T = {100, 500, 1000} observations from a DLM with n = {4, 6, 8, 10, 12, 14, 16} predictors and evaluate the differences in the computational time of the dma() function in the dma package and the doDMA() function in the presented eDMA package. The experiment is performed on a standard Intel Core i7-4790 processor with 8 threads running Ubuntu 12.04 server edition.

T \ n     4      6      8      10     12     14     16
100      10.9   92.6  133.2   88.4   69.4   70.5   58.4
500      37.5   29.4   31.5   30.7   25.7   26.5   25.4
1000     13.0   15.0   13.8   12.9   12.7   13.5   13.8

Table 1: Ratio between the computational time of the dma() function from the dma package and the doDMA() function of the eDMA package for different values of T and n.
Table 1 reports the ratio of the CPU time between the dma() and doDMA() functions for different values of T and n. As one can note, the decrease in computational time is huge. For example, in the case T = 500 and n = 16, dma() takes 37.5 minutes while doDMA() takes only 1.8. It is also worth stressing that the benefit of using eDMA is not limited to running moderately large applications in a reasonable time on commercial hardware; it also enables practitioners to run applications with a large number of exogenous variables. To give an idea of the computational time an eDMA user faces, we report a second simulation study. We simulate from a DLM with T = {100, 200, ..., 900, 1000} and n = {2, 3, ..., 18} and run doDMA() using a grid of values for δ between 0.9 and 1.0 with a varying number of grid points d, namely d = {2, 3, ..., 10}.[5] Figure 1 displays the computational time in minutes for all combinations of T, n and d. The lines reported in each subfigure represent the computational time for a specific choice of d: the line at the bottom of each subfigure is for d = 2, the one immediately above is for d = 3, and so on until the last, which is for d = 10. From the figure, we can see that, when T ≤ 400, even for n = 18 and d = 10, the computational time is less than 15 minutes. Such sample sizes are relatively common in economic applications. When T increases, computational time increases approximately linearly. For example, when T = 800, n = 18 and d = 10, the computational time is 30 minutes, which is double that of the same case with T = 400.

[5] In this case δ can only take the values δ = 0.9 and δ = 1.0.
The other relevant problem with DMA is RAM usage. Specifically, if we wanted to store the quantities defined in (1) and (4) for all time points, we would need two arrays of dimension T × d × (2^n − 1). Such objects are not present in the eDMA package, since we rely on the Markovian nature of the model in (1)–(2): we keep track of the quantities coming from (4) and $p(y_t | M_i, \delta_j, F_{t-1})$ only for two consecutive periods during the loop over T. Arrays of size 2 × d × (2^n − 1), however, cannot be avoided in our framework. RAM usage is nevertheless modest; indeed, the computer on which we ran all our simulations has only 8GB of RAM.
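To see the orders of magnitude involved, consider an application of the size of Section 6 (T = 206, n = 18, d = 11). A back-of-the-envelope calculation in R, assuming 8-byte doubles:

R> T = 206; n = 18; d = 11; k = 2^n - 1
R> T * d * k * 8 / 1024^3  # full T x d x k array: roughly 4.4 GB each
R> 2 * d * k * 8 / 1024^2  # two-period buffer kept by eDMA: roughly 44 MB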
6. Empirical Application
In this section, we demonstrate the usefulness of the eDMA package in an actual empirical application. This can be thought of as a typical assignment for a researcher at a central bank who is interested in forecasting inflation one period ahead. The data we use are exactly those used in Groen et al. (2013), downloadable from http://www.tandfonline.com/doi/suppl/10.1080/07350015.2012.727718. We use the same labels as the mentioned paper, see page 33 of Groen et al. (2013) for more details. The measure of inflation is the percentage change in the implicit GDP price deflator from 1960q1 to 2011q2, for a total of T = 206 observations. We use the same 14 exogenous predictors as Groen et al. (2013) along with four lags of inflation. A constant term is also included in all models. Thus, we have a total of 2^18 = 262144 model combinations. Furthermore, we let δ = {0.9, 0.91, ..., 1} such that we consider a total of 2^18 · 11 = 2883584 combinations at each point in time. The total estimation time of our DMA is around 28 minutes on an Intel Core i7-3630QM.

δ         0.92    0.95    0.98    0.99    1.00
RMSE      1.09    1.07    1.02    1.00    0.99
log-PBF  -12.79   -7.40   -2.29   -1.87   -2.20

Table 2: Relative Mean Squared Error (RMSE) and difference in log-predictive likelihood (log-predictive Bayes factor, log-PBF) between DMA with δ fixed and DMA with δ time-varying.
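For later reference, the log-PBF reported in Table 2 is the cumulative difference in log predictive likelihoods between the fixed-δ DMA and the δ-averaged DMA (our reading of the table caption; negative values thus favor averaging over δ):

$$\text{log-PBF}(\delta) = \sum_{t=1}^{T} \left[ \log p_{\delta}(y_t | F_{t-1}) - \log p_{\text{DMA}}(y_t | F_{t-1}) \right].$$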
In Figures 2 to 4, we provide results using the output from the estimation procedure. We focus mainly on their intuition, such that a reader interested in future economic applications can grasp the potential of the estimation procedure. In panel (c) of Figure 4, we report $E[\text{Size}_t | F_t]$, that is, the average number of predictors in DMA (not including the intercept, which is common to all models). Similar to Koop and Korobilis (2012), we find that, although we have many predictors, there is considerable shrinkage, as $E[\text{Size}_t | F_t]$ lies around 6 to 8 predictors. The inclusion probabilities in Figure 2 show that the first, third and fourth lags of $y_t$ show up as important predictors, whereas the second lag of $y_t$ does not seem to be important. We also see that there is time-variation in the inclusion probabilities. For instance, the real spot price of oil, OIL, becomes important for forecasting inflation only from the 1980s onwards, whereas the opposite happens to YL, which denotes the level factor of the term structure.
The posterior weighted average estimate of δ also paints a very clear picture, see panel (b) of Figure 4. It comes close to the lower bound during the recessions of the 1970s, which fares well with the notion of large time-variation in $\theta_t$ during periods of high inflation, inflation volatility and recessions, see Figure 4. It then rises during the Great Moderation and subsequently falls during the Great Recession of 2008.
In Table 2, we report the one-step-ahead predictive mean and density of our DMA compared to versions where δ is fixed at certain values. Evidently, we see that there are improvements both in terms of density and point forecasts from allowing for time-variation in δ. We expect these gains to be greater for longer forecast horizons (we leave the empirical implications of this for future research). For h > 1, a practitioner basically only needs to lag mF by h periods and then run doDMA(), see Koop and Korobilis (2012) for a similar approach and more details on forecasting with DMA. Similarly, the posterior probability of δ = 1 increases during the Great Moderation and subsequently falls to zero at the beginning of the Great Recession, see panel (a) of Figure 4.
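As a sketch of the h > 1 case just mentioned, assuming gdpdef and mF are xts objects as in Section 4.1 (the horizon h = 4 and the object names are purely illustrative):

R> h = 4                  # forecast horizon in quarters
R> mF_h = lag(mF, h)      # predictors dated t - h, aligned at time t
R> DMA_h = doDMA(gdpdef[-(1:h)], mF_h[-(1:h), ],  # drop the h leading NAs
                 vDelta = seq(0.9, 1.0, 0.02), vKeep = c(1, 2))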
Finally, we also use the output from the estimation procedure to perform a variance decomposition analysis. In panel (d) of Figure 4, we plot the relative weights of the variance components over time. Evidently, the dominant source of uncertainty is the observational variance. This observation is not surprising, as over a short forecast horizon random fluctuations are expected to dominate uncertainty. Uncertainty about the correct magnitude of time-variation in $\theta_t$ is high during recessions such as the Great Recession of 2008. As with the results regarding the magnitude of time-variation in δ, we believe that practitioners can build on the variance decomposition results to better understand the sources of variation in inflation.
7. Conclusion
In this paper, we present the eDMA package for R. The purpose of eDMA is to offer an integrated environment to perform DMA via the doDMA() function, which enables practitioners to perform DMA exploiting multiple processors without major effort. Furthermore, R users will find common methods to represent and extract estimated quantities, such as plot() and as.data.frame().
Overall, eDMA is able to: (i) incorporate the extensions introduced in Dangl and Halling (2012), which are particularly relevant for economic and financial applications; (ii) run considerably faster than alternative implementations; (iii) require a smaller amount of RAM, even for moderately large applications; and (iv) exploit parallel computing.
In Section 5, we also detail the time the program takes to perform DMA under different sample sizes, numbers of predictors and numbers of grid points. For typical economic applications, estimation time is around 30 minutes on a commercial laptop. Very large applications can still benefit from the use of eDMA when performed on desktops or even clusters, without additional effort from the user.
A. Figures
[Figure 1 appears here: ten panels, (a)-(e) for T = 100, 200, 300, 400, 500 and (f)-(j) for T = 600, 700, 800, 900, 1000, each plotting the computational time of doDMA() (1, 5, 15 and 30 minutes on the vertical axis) against the number of predictors n = 2, ..., 18, with one line per grid size d = 2, ..., 10.]
Figure 1: Computational time for doDMA() using simulated data. Each panel reports the computation time in minutes for doDMA() using different sample sizes T, numbers of predictors n and values of d, i.e., the number of points in the grid of δ. The values of d range between 2 and 10: the line at the bottom of each subfigure is for d = 2, the one immediately above is for d = 3, and so on until the last, which is for d = 10. The experiment is performed on a standard Intel Core i7-4790 processor with 8 threads and 8 GB of RAM, running Ubuntu 12.04 server edition.
[Figure 2 appears here: eight panels, (a)-(h), showing posterior inclusion probabilities (from 0 to 1) over 1969-2011 for, respectively, yt−1, yt−3, yt−4, PIMP, MS, M2, OIL and YL.]
Figure 2: Posterior inclusion probabilities of the predictors. Graphs (a), (b) and (c): first, third and fourth lags of inflation. Graph (d): import deflator. Graph (e): inflation expectations, measured by the one-year-ahead inflation expectations from the Reuters/Michigan Survey of Consumers. Graph (f): M2 monetary aggregate. Graph (g): real spot price of oil. Graph (h): level factor of the term structure. We refer the reader to Groen et al. (2013) for more details regarding the variables. Grey vertical bands represent periods of recession according to the Recession Indicators Series available from the Federal Reserve Bank of St. Louis.
[Figure 3 appears here: eight panels, (a)-(h), showing the filtered regression coefficients over 1969-2011 for, respectively, yt−1, yt−3, yt−4, PIMP, MS, M2, OIL and YL.]
Figure 3: Filtered estimates of the regression coefficients for the most important predictors of DMA. Graphs (a), (b) and (c): first, third and fourth lags of inflation. Graph (d): import deflator. Graph (e): inflation expectations, measured by the one-year-ahead inflation expectations from the Reuters/Michigan Survey of Consumers. Graph (f): M2 monetary aggregate. Graph (g): real spot price of oil. Graph (h): level factor of the term structure. We refer the reader to Groen et al. (2013) for more details regarding the variables. Grey vertical bands represent periods of recession according to the Recession Indicators Series available from the Federal Reserve Bank of St. Louis.
[Figure 4 appears here: four panels over 1969-2011. Panel (a): posterior probabilities of δ = 0.92, 0.96, 0.98 and 1.00. Panel (b): posterior weighted average of δ (roughly between 0.93 and 0.99). Panel (c): expected number of predictors (roughly between 5 and 9). Panel (d): relative variance weights of the Obs, Coef, Mod and TVP components.]
Figure 4: Graph (a): posterior probabilities of δ. Graph (b): posterior weighted average estimate of δ. Graph (c): expected size of DMA. Graph (d): variance decomposition analysis. Grey vertical bands represent periods of recession according to the Recession Indicators Series available from the Federal Reserve Bank of St. Louis.
References
Byrne JP, Korobilis D, Ribeiro PJ (2014). “On the Sources of Uncertainty in Exchange Rate
Predictability.” Available at SSRN 2502586.
Chapman B, Jost G, Van Der Pas R (2008). Using OpenMP: portable shared memory parallel
programming, volume 10. MIT press.
Dangl T, Halling M (2012). “Predictive regressions with time-varying coefficients.” Journal
of Financial Economics, 106(1), 157–181.
Eddelbuettel D, Francois R, Allaire J, Ushey K, Kou Q, Bates D, Chambers J (2016a). Rcpp: Seamless R and C++ Integration. R package version 0.12.3, URL https://cran.r-project.org/web/packages/Rcpp/index.html.
Eddelbuettel D, Francois R, Bates D (2016b). RcppArmadillo: Rcpp Integration for the Armadillo Templated Linear Algebra Library. R package version 3.2, URL https://cran.r-project.org/web/packages/RcppArmadillo/index.html.
Groen JJ, Paap R, Ravazzolo F (2013). “Real-time inflation forecasting in a changing world.”
Journal of Business & Economic Statistics, 31(1), 29–44.
Koop G, Korobilis D (2011). “UK macroeconomic forecasting with many predictors: Which
models forecast best and when do they do so?” Economic Modelling, 28(5), 2307–2318.
Koop G, Korobilis D (2012). “Forecasting inflation using dynamic model averaging.” International Economic Review, 53(3), 867–886.
Onorante L, Raftery AE (2016). “Dynamic model averaging in large model spaces using dynamic Occam's window.” European Economic Review, 81, 2–14.
OpenMP Architecture Review Board (2008). “OpenMP Application Program Interface, v. 3.0.”
R Core Team (2015). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Raftery AE, Kárnỳ M, Ettler P (2010). “Online prediction under model uncertainty via
dynamic model averaging: Application to a cold rolling mill.” Technometrics, 52(1), 52–66.
Sanderson C (2010). “Armadillo: An open source C++ linear algebra library for fast prototyping and computationally intensive experiments.”
Stock JH, Watson MW (1999). “Forecasting inflation.” Journal of Monetary Economics,
44(2), 293–335.
Stock JH, Watson MW (2008). “Phillips curve inflation forecasts.” Technical report, National
Bureau of Economic Research.
West M, Harrison J (1999). Bayesian Forecasting & Dynamic Models. Springer.
Affiliation:
Leopoldo Catania
Department of Economics and Finance
Faculty of Economics
University of Rome, “Tor Vergata”
Via Columbia, 2
00133 Rome, Italy
E-mail: [email protected]
Nima Nonejad
Department of Mathematical Sciences
Aalborg University and CREATES
Fredrik Bajers Vej 7G
9220, Aalborg East, Denmark
E-mail: [email protected]