Norming Rates and Limit Theory for Some Time-Varying Coe¢ cient Autoregressions O¤er Lieberman and Peter C. B. Phillipsy January, 2013 Abstract A time-varying autoregression is considered with a similarity-based coe¢ cient and possible drift. It is shown that the random walk model has a natural interpretation as the leading term in a small-sigma expansion of a similarity model with an exponential similarity function as its autoregressive coe¢ cient. Consistency of the quasi-maximum likelihood estimator of the parameters in this model is established and the behavior of the score function is analyzed. A complete list is provided of the normalization rates required for the consistency proof and for the score function standardization. A large family of unit root models with stationary and explosive alternatives are characterized within the similarity class through the asymptotic negligibility of a certain quadratic form that appears in the score function. A variant of the stochastic unit root model within the class is studied and a large sample limit theory provided which leads to a new nonlinear di¤usion process limit showing the form of the drift and conditional volatility induced by this model . Some simulations and a brief empirical application to data on an Australian Exchange Traded Fund are included. Bar-Ilan University. Support from Israel Science Foundation grant No. 396/10 is gratefully acknowledged. I am indebted to Gabi Gayer and Itzhak Gilboa for numerous helpful comments. Correspondence to: Department of Economics, Bar-Ilan University, Ramat Gan 52900, Israel. E-mail: o¤[email protected] y Yale University, University of Auckland, Southampton University, and Singapore Management University. Support is acknowledged from the NSF under Grant No. SES 0956687. 1 Key words and phrases: Autoregression; Consistency; Nonlinear diffusion; Nonstationarity; Similarity; Small sigma approximation; Stochastic unit root; Time-varying coe¢ cients. JEL Classi…cation: C22 1 Introduction First-order autoregressions with possible unit roots or roots that are in the vicinity of unity have attracted an enormous amount of interest over recent decades. The literature now provides a near comprehensive coverage of estimation and testing of the coe¢ cient of the lag dependent variable in stationary, unit root, explosive, and many intermediate cases of near and mild integration, including models with or without …tted intercepts and trends. Traditional analysis of this model relates to data generating processes (DGP’s) with …xed coe¢ cients that are consistent with a single scenario. For instance, empirical studies frequently work under a null hypothesis that the DGP is a unit root process with drift, not that the process may have ‡uctuating, timedependent parameters that are compatible with stationary behavior for some parts of the sample, unit root behavior for other parts, and mildly explosive behavior elsewhere. Most econometric software packages include common tests for a unit root which re‡ect this characterization. But recent empirical work, particularly on the global …nancial crisis, has shown the advantages of working with ‡exible systems that accommodate multiple regimes of stationary and nonstationary behavior and transition mechanisms between them (e.g. Phillips and Yu, 2011). A second trend in the literature involves models with time varying coe¢ cients, such that the process is at least weakly stationary. See among others, Nicholls and Quinn (1980, 1981, 1982), Chen and Tsay (1993), Dahlhaus et. al. (1999), Dahlhaus (2000) and Lundbergh et. al. (2003). Some related work on explosive random coe¢ cient autoregressive processes has been done by Hwang and Basawa (2005). In addition, Granger and Swanson (1997) introduced a stochastic unit root (STUR) model where the autoregressive root is in the vicinity of unity, is stochastic, and is driven by an independent stationary process - see also McCabe and Tremayne (1995). Some properties of that model were derived but no limit theory or estimation theory was established. Within the context of a wider class of models, the present paper considers a variant of the STUR model, provides a large sample limit theory 1 and studies the discriminatory power of unit root testing against STUR alternatives. We do not cover in this paper an important line of the literature dealing with time varying coe¢ cient models that are not autoregressions. Those models give rise to issues which are very di¤erent from the ones which surface in autoregressions. Recently, Lieberman (2012) introduced a similarity-based model in the context of time varying coe¢ cient autoregressions. That paper developed the asymptotic theory for quasi-maximum-likelihood estimation (QMLE) of this model and various statistical tests. Unlike earlier literature, the coe¢ cient of the lag dependent variable in this model can ‡uctuate freely and, at any speci…c period t; the process may behave in a stationary, unit root, or explosive manner. This feature of the similarity model adds some ‡exibility to the prominent unit root model producing a system for which unit root effects may hold on average in a given sample but not necessarily at all points within the sample. In this paper we develop the idea further by considering a larger class of models and by showing that the unit root model can be naturally interpreted as a small- asymptotic approximation to the similarity model. To …x ideas, we consider the process Y1 = Yt = + "1 ; + t (w) Yt 1 + "t ; t = 2; :::; n; (1) where 2 R, w is an m-dimensional vector of unknown parameters, assumed to lie in a compact subset of Rm , t (w) = t (xt ; xt 1 ; w) 2 R+ , xt is an mvector of explanatory variables, possibly random, and f"t g is a sequence of iid (0; 2 ) random variables. It is emphasized that can be zero or otherwise and that the set of permissible speci…cations for t (xt ; xt 1 ; w) is rich. Moreover, for a given t, t (w) can be less than-, equal to- or greater than unity, so that the model can behave in a stationary-, unit-root- or explosive manner over subperiods. Of particular interest is the model with an exponential similarity function t (xt ; xt 1 ; w) = exp (wZt ) ; (2) where Zt = Xt is the source of variation in the coe¢ cient. A natural choice for Xt would be an explanatory variable for Yt , but other generators of coe¢ cient randomness are possible. EZt ( t ) is the moment generating 2 function (mgf) of Zt when it exists, and under certain conditions that will be discussed in Section 2 the sample average response behaves as 1X n t=1 n t (w) !p 1 + O (w 2 Z) ; (3) where 2Z = V ar (Zt ), 8t. Thus, on average, the unit root speci…cation, which is believed to be prevalent in economic and …nancial data, may be interpreted as the leading form in a small- expansion of the similarity model, with an error which is of the order O ( 2Z ). In studying the model (1), we prove consistency of the QMLE of w when 2 R. This extends t (w) is allowed to be random, non-negative and with the results of Lieberman (2012) by allowing for the cases = 0 and a random t (w). To achieve the results, we introduce uniform norming factors which are functions of n n matrices, one of which covers the = 0 case and another the case 6= 0. The behavior of the score function is analyzed and, as with the consistency proof, one uniform norming factor is given for the = 0 case and another for the 6= 0 case. The simplest scenario, in which 6= 0 and the process is approximately a unit root, leads to a score-based test which is asymptotically normal (see Lieberman 2012). A further case, where the slope coe¢ cient w = wn in the similarity function (2) is local to zero, also approximates a unit root model. This STUR model is analyzed by weak convergence methods and its limit theory is related to that of a local to unity process but gives rise to a new nonlinear di¤usion. The properties of this limit process reveal the explicit form of the conditional volatility induced by the similarity function. In many other cases the distribution theory is much more complicated. The plan for the paper is as follows. In Section 2 we discuss some important special cases of the model and shows how the random walk model can be interpreted as a small- approximation of the similarity model with an exponential similarity function. Notation and the main results are introduced in Section 3, followed by some discussion in Section 4. Section 5 studies a similarity-based STUR model, its limit theory, and the discriminatory power of unit root tests against STUR alternatives. Simulations and an empirical application are provided in Section 6. All proofs are contained in the Appendix. 3 2 2.1 Special Cases A Unit Root model as a Smallto the Similarity Model Approximation Let Zt be iid random copies of Z, each with mgf MZ (w). Under exponential similarity (2) we then have the pointwise convergence 1 X wZt e !p MZ (w) : n t=2 n Suppose that in addition to the above, Zt has a zero mean with small variation, as when it is distributed as U [ a; a], or N (0; 2Z ) with small a or 2Z , respectively. The import is that the coe¢ cient of Yt 1 varies with Zt but that the ‡uctuations in the coe¢ cient value are not too large. In the former case, as 2Z = a2 =3, e wa 2wa (wa)2 (wa)4 = 1+ + + O (wa)6 6 120 (w Z )2 3 (w Z )4 + + O (wa)6 = 1+ 2 40 MZ (w) = ewa (4) and the property of the limit shown in (3) follows. In the second case, EZ ( t (w)) = e(w = 1+ Z) 2 =2 (w 2 Z) 2 + (w Z) 8 4 + O (w 6 Z) so that (3) again holds. Of course, in both cases we can reparameterize with wZt = Zt U [ w ; w ], w = wa or Zt N (0; w ), w = w Z . These choices of t are natural and re‡ect the principle that the average Yt 1 coe¢ cient value may be close to unity across the sample but will deviate from unity at any point on the trajectory. Figure 1 illustrates this point showing the maximum likelihood estimate of t for data on an Australian Exchange Traded Fund (ETF). The unit root model can thus be viewed as a small- approximation to a ‡exible similarity model. 4 2.2 CAPM The capital asset pricing model (CAPM) has long been central to …nance and provides a working foundation for more sophisticated models. Let log Yt be the expected excess return of a certain capital asset and log Zt be the market premium, both at time t. The CAPM model relates these quantities through the equation log Yt = log Zt : Upon rearrangement, the model is Yt = exp ( Xt ) Yt 1 ; where Xt = log Zt . Thus, the CAPM model is just a similarity model with an exponential similarity function in which the value of Yt is determined by its similarity to Yt 1 through the closeness of Xt to Xt 1 . 2.3 A K–nearest neighbors DGP The threshold case t (xt ; xt 1 ; w) = 1 fk xt k < w1 g + w2 1 fk xt k w1 g ; where k k is the Euclidean norm, is of particular interest. Here, Yt 1 receives a unit weight in the response only if its characteristics, xt 1 , are within w1 -Euclidean distance from xt , the characteristics of Yt . Otherwise, Yt 1 is considered to be too ‘far’ from Yt and receives only a w2 -weight, where jw2 j < 1. This model essentially has the form of a K-nearest neighbors (KN N ) mechanism for the coe¢ cient. Whereas KN N is a curve …tting technique, here the mechanism underlies the dgp. 3 Notation and Main Results This section contains the main theoretical results of the paper. Assumptions and proofs for what follows are given in the Appendix. Let C = C (w) be an n n matrix with entries [C (w)]t;t 1 = t (w) = t , t = 2; :::; n and [C (w)]i;j = 0, otherwise, In be the identity matrix of order 0 n, S = In C, 1 = 2 , 02 = (w1 ; : : : ; wm ), = ( 1 ; 02 ) , y = (Y1 ; :::; Yn )0 and " = ("1 ; :::; "n )0 . Following convention, the true values of ; w, and 5 are denoted 0 ; w0 and 0 , respectively. Similarly, we set C0 = C (w0 ) and S0 = In C0 : For the 0 = 0 case, as det (S) = 1, the quasi-log-likelihood function is given by y 0 S 0 Sy n log 2 2 ln ( ) = : (5) 2 2 2 For the 0 6= 0 case, we concentrate the quasi-log-likelihood function by using ^ n ( ) = 10 S ( ) y=n in place of , which leads to lnc ( ) = n log 2 2 y 0 S 0 M Sy ; 2 2 2 (6) where M = In P , P = 110 =n and 10 is an n 1 (row) vector of 1’s. For brevity, ^n will denote the QMLE of using either (5), for the case 0 = 0, or (6), for the case 0 6= 0. We use the l1 -, spectral- and Frobenius norms of an n n matrix A, Pn denoted by kAk1 , kAk2 and kAkF , and given by kAk1 = i;j=1 [A]i;j , kAk2 = supjxj=1 x0 A0 Ax The quantity 1=2 and kAkF = (tr (A0 A))1=2 . n S0 1 = kS 10 S 2 F 1k 1 turns out to be central to the asymptotic analysis. Terms in the expansions of ln ( ) ; lnc ( ) and the score functions based on them can be conveniently grouped according to orders of magnitude of powers of n . As C0 is nonnegative and nilpotent by Assumption A2, S0 1 = In + C0 + + C0n 1 ; so that all the elements of S0 1 are nonnegative. It follows that S0 1 2 F = n X S0 10 S0 1 n X i;i i=1 S0 10 S0 1 i;j = S 10 S 1 1 ; (7) i;j=1 implying that 1: n (8) The behavior of n has been analyzed by Lieberman (2012, lemma 1) in some leading special cases. In particular, for …xed coe¢ cient stable or 6 explosive autoregressions n is bounded from below. For the unit root model 1 n = O (n ). Consistency of the QMLE is given in Theorem 1. Theorem 1 Under Assumptions A0-A3, ^n !p 0. The requirements for consistency seem quite weak. To establish the asymptotic distribution of the score, we let Dn and Dnc be n n normalizing matrices corresponding to the 0 = 0 and 0 6= 0 cases, respectively, such that, 1 0 1 p 0 0 n C B 0 1 C B kS0 1 kF C B C B Dn = B C C B A @ 1 0 1 kS0 kF and 0 p1 n B 0 B B c Dn = B B B @ 0 0 1 0 1 1=2 kS0 10 S0 1 k1 1 kS0 The normalized concentrated score is @ln ( @ c @l ( znc ( ) = Dnc n @ zn ( ) = Dn ) ) 10 S0 , if 0 = 0; , if 0 6= 0; 1 1=2 k1 C C C C: C C A c with components znr ( ) and znr ( ), r = 1; :::; m + 1. We have, with S_ 0r = @S0 =@wr , p n y 0 S00 S0 y p ; = 0; zn1 ( ) = + 2 20 2 40 n 0 p n y 0 S00 M S0 y c p ; 0 6= 0; zn1 ( ) = + (9) 2 20 2 40 n 7 and for r = 2; :::; m + 1, 0 "0 S_ 0r S0 1 + S0 10 S_ 0r " znr ( ) = c znr ( )= 2 S0 1 2 0 ; 0 F 0 0 M " + 2 0 10 S0 10 S_ 0r M" "0 M S_ 0r S0 1 + S0 10 S_ 0r 2 and c znr ( )= 2 0 S0 10 S0 1 10 _ 0 0 0 1 S0 S0r M " 1 2 0 S0 F (10) = 0; 1=2 1 + op (1) ; ; 0 6= 0; if n = O (1) (11) 0 6= 0; if n = o (1) : (12) c As is evident from (10)-(12), the leading terms of znr ( ) and znr ( ) depend on both the value of 0 and on the order of magnitude of n . In particular, in the 0 6= 0 case with n = o (1), the leading term in (12) is linear in ". This case corresponds, for instance, to a unit root with a drift c process. On the other hand, in the 0 6= 0 with n = O (1) case, znr ( ) (r = 2; :::; m + 1) involves both a linear and a quadratic form in ", so the asymptotic distributions of the score in the two cases are very di¤erent. Let 4 be the fourth-order cumulant of "t . Theorem 2 Under Assumptions A0-A3, 1. zn1 ( 0 ) !d N 0; 2 1 4 + 0 4 4 8 0 : c ( 0 ) are non-negligible, r = 2; :::; m + 1. 2. znr ( 0 ) and znr c 3. In the case 0 = 6 0, n = o (1) and a Gaussian ", znr ( 0 ) is asymptotc ically normal. In all other cases znr ( ) and znr ( ) involve quadratic forms in ". c Since, in general, znr ( ) and znr ( ) involve quadratic forms, their asymptotic distributions cannot be determined without additional structure on the model. See Theorem 2 of Lieberman (2012). 8 4 Discussion Some of the implications of these …ndings are as follows. 1. The normalization rates required for ln ( ) and lnc ( ) in the proof of Theorem 1 are detailed in Table 1. For the case 0 = 0, we require an 2 1 S0 1 F - or an S0 10 S0 1 1 - normalization when n = O (1) and an 2 S0 1 F -normalization when n = o (1). In terms of the …xed coe¢ cient framework, as the unit root model is characterized by the 2 condition n = o (1), an S0 1 F -normalization corresponds to an n 2 -normalization. For the …xed coe¢ cient stable or explosive AR(1) 2 model, both which are characterized by n = O (1), an S0 1 F - or an 1 2n S0 10 S0 1 1 - normalization corresponds to an n 1 - and a - normalization in the stable- and explosive cases, respectively. These rate results are well known – see Evans and Savin (1984), Phillips (1987) and Lemma 1 of Lieberman (2012). 1 2. In the 0 6= 0 case, an S0 10 S0 1 1 - normalization is required for the consistency proof based on lnc ( ), regardless of the order of magnitude 2n of n . This corresponds to an n 1 - and a - normalization in the stable- and explosive cases, respectively and an n 3 -normalization for the unit root model. Thus, unlike the stable and explosive cases in which the normalization is uniform in 0 , di¤erent normalizations are required for the 0 = 0 and 0 6= 0 cases in the unit root case. 3. To establish the behavior of the score, we use the normalizations summarized in Table 2. Let 0 QFn = "0 S_ 0r S0 1 + S0 10 S_ 0r "; (13) 0 M "; QFnc = "0 M S_ 0r S0 1 + S0 10 S_ 0r (14) and LFnc = 0 10 _ 0 0 1 S0 S0r M ": (15) The notations QFn and LFn stands for quadratic form and linear form, respectively, and when the superscript c is used, it indicates that the score is based on lnc ( ). In the 0 = 0 case, the score is a scalar multiple 1 of QFn and needs to be normalized by S0 1 F . In the 0 6= 0 and 9 = O (1) case, both QFnc and LFnc are of the same order of magnitude 1=2 whereas in the 0 6= 0 and n = o (1) case the term QFn = kS 10 S 1 k1 is negligible. As the condition n = o (1) characterizes a unit root type process, the vanishing of the latter term is indicative that the process is unit root and not a stable or an explosive process. Moreover, since LFn is the dominant term in the 0 6= 0 and n = o (1), case, if " is Gaussian, the normalized score, being linear in ", is also Gaussian. This is the case discussed in Lieberman (2012). For all other cases, the normalized score involves a quadratic form in " and therefore is not asymptotically Gaussian. It is clear that in general it is not possible to determine the asymptotic distribution of the normalized score without additional structure on the model. n 4. In the special case 0 = 0 and a (…xed coe¢ cient)unit root model, the normalized score given by QFn = S0 1 F is easily seen to converge to a ( 2 (1) 1) =2 variate, which agrees with the result of Phillips (1987). See also Lieberman (2010). 5. The present setting assumes exogenous regressors Zt . This assumption is common in time series regression where additional regressors are introduced and occurs in models such as the CAPM, ARMAX, and error correction models. In the following section, we consider a stochastic unit root model where the regressors may be dependent. 5 A Similarity STUR Model As a further example of a similarity autoregression we consider a model that belongs to a class most closely related to the stochastic unit root (STUR) model studied in Granger and Swanson (1997). We use the no-intercept similarity autoregression y1 = "1 ; yt = t (wn ) yt 1 + "t ; t = 2; :::; n; (16) with exponential similarity function t (xt ; xt 1 ; wn ) = ewn xt = ewn ut ; where xt = ut is the source of the variation in the autoregressive coe¢ cient. In 10 pa n this formulation, wn = t is local to zero, so that a p ut n = exp a = 1 + p ut + Op n 1 n ! 1; (17) is local to unity as n ! 1. However, unlike the usual constant coe¢ cient 1 + na ; the coe¢ cient is stochastic local to unity model where = exp na and may therefore lie in the stationary or the explosive region, depending on the realization of ut . Deviations from unity are Op n 1=2 in this model rather than deterministic and O (n 1 ) : The model (16) and (17) is closely related to the STUR model of Granger and Swanson (1997) in which the autoregressive coe¢ cient took the form t = e t where t is generated independently of yt by a stationary autoregression. Some of the properties of that model were studied in Granger and Swanson but no limit theory was provided. With the localizing coe¢ cient wn = pan in the time-varying representation t = exp (wn xt ) ; the behavior of the model is autoregressive in the vicinity of unity and is amenable to functional limit theory, as we show below. This behavioir may be directly analyzed as a stochastic alternative to either a unit root model or a constant local to unity model. As we will see, the limit behavior of the system is a nonlinear di¤usion process rather than a linear di¤usion process. We shall assume that the moment generating function Mu (s) = E (esut ) of ut is …nite over some interval s 2 ( ; ) of the origin for > 0: Solving (16) we have ) ( j ) ( j t 1 t 1 Y pa u Y X X t k "t j = "t + e n "t j y t = "t + t k j=1 = "t + e j=1 j=1 k=0 t 1 n X pa n Pj k=0 ut k o "t j = "t + k=0 t 1 n X e pa n s=1 Pt s `=0 us+` o "s : Observe that the impulse responses in this system are j Y @yt = @"t j k=0 t k =e pa n 11 Pj k=0 ut k u = eaXnj ; (18) P u where Xnj = n 1=2 jk=0 ut k is stochastic and a normalized partial sum process which wanders over R. Hence, the impulse response function @"@yt t j 2 R+ and may be arbitrarily small or arbitrarily large, the values being driven u by the partial sum process Xnj . We assume that partial sums of t = ("t ; ut )0 satisfy the invariance princiP BM( ) ; where B = (Bu ; B" )0 is vector Brownple n 1=2 bnrc s=1 s ) B (r) ian motion (BM) with variance matrix = ( ij ) : Then, the asymptotic behavior of the time series yt in (16) has the following form n 1=2 ybnrc = n 1=2 = n 1=2 bnrc 1 n X e pa n j=1 bnrc 1 n X e pa n s=1 = n ) Z 0 1=2 r bnrc 1 n e s=1 Rr a p dBu (q) e aBu (r) = e X Z pa n Pj k=0 ubnrc Pbnrc s `=0 Pbnrc j=s uj k us+` o o o "bnrc j + Op n "s + O p n "s + Op n 1=2 1=2 1=2 dB" (p) r e aBu (p) dB" (p) =: Ga (r) : (19) 0 The limit process Ga (r) is a nonlinear Itō di¤usion and satis…es the stochastic di¤erential equation dGa (r) = aGa (r) dBu (r) + a2 22 2 Ga (r) dr + dB" (r) ; showing the form of the drift and conditional volatility in the process that are induced by the similarity function t (wn ) = exp (wn xt ) in (16). The quadratic variation of Ga (r) is Z r Z r 2 2 [Ga ]r = 11 r + a 22 Ga (s) ds + 2a 12 Ga (s) ds 0 0 Z r = ea (s)0 ea (s) ds; 0 12 where ea (s)0 = (1; aGa (s)) : Evidently, realized and integrated volatility depend on the past history of the state variable fGa (s) ; s rg : If 12 = 0; Bu is independent of B" and the limit process Ga (r) is mixed Gaussian with the following covariance kernel conditional on Fu = fBu (s) : s 2 [0; 1]g Fu (r; s) = E fGa (r) Ga (s) jFu g Z rZ s aBu (r) aBu (s) e aBu (p1 ) e aBu (p2 ) E fdB" (p1 ) dB" (p2 )g = 11 e e Z r^s Z0 r^s0 2aBu (p) aBu (r) aBu (s) aBu (r) aBu (s) e dp = 11 e e e 2aBu (p) dp; = 11 e e 0 0 and unconditional covariance kernel aBu (r)+aBu (s) (r; s) = E = = = Z r^s e Fu (r; s) = 11 E e 0 Z r^s E ea[Bu (r) Bu (p)]+a[Bu (s) Bu (p)] dp 11 0 Z r^s a[Bu (r_s) Bu (r^s)] E e2a[Bu (r^s) 11 E e 11 e 1 2 a (r_s 2 r^s) 22 Z 2 = 11 22 r^s 2a2 1 e dp Bu (p)] dp 0 r^s e 2a2 [r^s p] 22 dp = 11 e 0 e2a 2aBu (p) 1 2 a 22 (r_s 2 r^s) 1 2 a (r_s 2 r^s) 22 " : 22 Observe that when a2 22 ! 0; (r; s) ! 11 r ^ s; the covariance kernel of the Brownian motion B" ; corresponding to the case where t = 1; Ga (r) ! B" (r) ; and (16) is a random walk. Thus, for small jaj or 22 ; the model is local to a simple unit root model. If 12 6= 0; Bu and B" are dependent, and the limit process Ga (r) is no longer conditionally Gaussian. Instead, we have Z r Z r aBu (r) aBu (p) aBu (r) Ga (r) = e e dB" (p) = e e aBu (p) dB" (p) 0 Z0 r Z r 12 aBu (r) aBu (r) aBu (p) = e e dB":u (p) + e e aBu (p) dBu (p) 22 0 = : Ga;":u (r) + 12 Ga;u (r) ; 22 13 0 2 [r^s e2a 2a2 p] 22 22 #r^s 0 2 12 Bu is BM( 11:2 ) with 11:2 = 11 Here B":u (r) = B" (r) 12 = 22 ; and 22 Ga;":u (r) is a conditional Gaussian (Itō di¤usion) process with conditional covariance kernel Z r^s aBu (r) aBu (s) e e 2aBu (p) dp; ":u;Fu (r; s) = E fGa;":u (r) Ga;":u (s) jFu g = 11:2 e 0 and unconditional covariance kernel 2 ":u e2a (r; s) = E fGa;":u (r) Ga;":u (s)g = 11:2 22 r^s 2a2 1 1 2 (r_s e2a r^s) 22 : 22 Rr The process Ga;u (r) = eaBu (r) 0 e aBu (p) dBu (p) is a nonlinear stochastic integral of Bu and obviously non Gaussian when a 6= 0: For t = bnrc for some r 2 (0; 1] and large j = bn c; > 0; the impulse responses (18) have the form @ybnrc @"bnrc bn = c bn c Y k=0 t k =e pa n Pbn c k=0 ubnrc k ) ea Rr r dBu (s) = ea[Bu (r) Bu (r )] ; which may be arbitrarily small, close to unity, or arbitrarily large depending on the historical trajectory of the process Bu over the past interval [r ; r] : The functional law (19) enables us to derive the limit behavior of statistics arising from the model (16) and (17). For example, we may consider conventional unit root tests applied to model (16) and (17). Observe that least squares applied to (16) gives Pn pa ut 2 Pn Pn n yt 1 y t 1 "t y y t t 1 t=1 e ^ = Pt=1 Pn 2 = + Pt=1 n n 2 2 t=1 yt 1 t=1 yt 1 t=1 yt 1 h h a io i Pn Pn n pa ut pa ut p u Pn 2 n n E e y e E e n t yt2 t 1 t=1 t=1 y t 1 "t t=1 Pn 2 Pn 2 = + Pn 2 + t=1 yt 1 t=1 yt 1 t=1 yt 1 n a o P p u n Pn n t pa e M yt2 1 t=1 n y t 1 "t a t=1 Pn 2 = M p + + Pn 2 n t=1 yt 1 t=1 yt 1 14 1 so that n ^ a p n M P n 1 nt=1 yt 1 "t n P = + n 2 nt=1 yt2 1 1 Pn n pa ut n M t=1 e P n n 2 t=1 yt2 pa n 1 o yt2 1 By standard weak convergence arguments we have R1 P Ga (r) dB" (r) n 1 nt=1 yt 1 "t Pn 2 : ) 0R 1 2 2 n G (r) dr t=1 yt 1 a 0 Expanding the moment generating function we have n;t pa (a) = e n ( ut M a p n 1 a 1 + p ut + 2 n = a p n a2 2 a = p ut + u 2n t n 22 2 u2t ) + op n ( 1 1+ 2 1 a p n 2 22 ) + op n 1 ; so that by standard arguments Pn 2 n t=1 yt 1 n;t (a) Pn 2 2 n t=1 yt 1 1 a ) a Pn yt p t=1 n R1 0 Pn 1 t=1 2 ut p n 1 n yt p 2 + Op n 1=2 1 n Ga (r)2 dBu (r) ; R1 Ga (r)2 dr 0 which leads to the limit theory R1 R1 G (r) dB (r) Ga (r)2 dBu (r) a a " 0 0 ^ n M p ) R1 +a R1 : n Ga (r)2 dr Ga (r)2 dr 0 0 Correspondingly, with a unit root centering, we have R1 R1 h i G (r) dB (r) Ga (r)2 dBu (r) 1 a " 2 0 0 n ^ 1 ) a 22 + R 1 + a R1 2 Ga (r)2 dr Ga (r)2 dr 0 0 15 (20) It follows that standard coe¢ cient based UR tests have only local discriminatory power against a stochastic UR of the form (16). Similar results apply for t-ratio based and other UR tests. 6 6.1 Simulations and Empirics Simulations To evaluate the behavior of the estimators, simulations were conducted on the model (1) with t = exp (wZt ) for n = 250, 500, 1000, = 0, 0:25, and w = 0:07, 0:2, with 2000 replications per experiment. In one setting we took Zt N ID ( w=2; 1) and in the other Zt U [ 1; 1] + bw , bw = 0:0116648 if w = 0:07 and bw = 0:033289 if w = 0:2. With these choices EZ ( t ) = 1, 8t, but for each t, t can be greater than- equal to- or less than unity. Means and standard deviations of w^n , ^ n and ^ t for all the scenarios considered here are summarized in Tables 3 and 4. Overall, with no noticeable di¤erences between the cases, the means are very close to the true values and the estimated standard deviations decline with n, as is expected, corroborating consistency. 6.2 Financial Data Application We consider eight country Exchange Traded Funds (ETFs), denoted by Pt , traded in the U.S. and measured in 15-minutes intervals. The model is Pt = expf N AVt (w1 + w2 Dt ) + SPt (w3 + w4 Dt ) +ECTt 1 (w5 + w6 Dt )gPt 1 + "t ; (21) where N AVt is the net asset value, SPt is the S&P500 index, ECTt is an error correction term, equal to Pt N AVt , Dt is a dummy variable, taking the value of unity if t is the U.S. market-open time and zero otherwise, and all variables (apart from Dt ) were transformed by a natural logarithm. The data are available for Australia, Hong Kong, Japan, Malaysia, Singapore, Taiwan, South Korea and China. The tickers for these countries are given by EWA, EWH, EWJ, EWM, EWS, EWT, EWY and FXI, respectively. The sample range is 12/15/2000-12/13/2010, apart for China, where it is available for 10/8/2004-12/13/2010. The data is discussed in detail in Levy 16 and Lieberman (2012). Unit root tests for Pt with and without a constant are reported in Table 5. As expected, the p-values in all cases are very high and they are generally higher for the version of the ADF test which does not include a constant. Thus, the (…xed coe¢ cient) unit root null hypothesis cannot be rejected, although the underlying process may well include a volatile lag dependent variable coe¢ cient. The estimated model results are given in Table 6, with M1 and M2 denoting the unit root with a drift model and model (21), respectively. For the former, the slope coe¢ cient equals unity to the third decimal place throughout, whereas the drift parameter is very small with large standard errors. Thus, a driftless unit root model seems to be a reasonable approximation to the dgp. Nevertheless, the average AIC criterion for M1 equals 7:949, whereas for M2, it is 9:618. To the third decimal place the SC averages are almost identical to the AIC averages and are therefore omitted. Across all cases, the error correction coe¢ cient estimates w^5 and w^6 are negative, with cross country sample averages equal to 0:004 and 0:230, respectively. These results imply that, ceteris paribus, a positive change in N AVt will result in Pt > Pt 1 . Conversely, when the error correction term is positive so that Pt 1 > N AVt 1 , there will be a downward correction to Pt . The model thus has a time varying coe¢ cient together with an embedded error correction mechanism that is a nonlinear driver of the coe¢ cient of the lagged dependent variable. The graph of ^ t for EWA is displayed in Figure 1, exhibiting volatility and ‡uctuating around unity, as expected. Other cases look very similar. Overall, the random walk model appears to be a reasonable approximation to (21) but it does not re‡ect any of the period by period variation captured in Figure 1 or potential drivers of that variation. 7 Conclusions We investigated time varying autoregressions in which variation in the coef…cient of the lag dependent variable is driven by a similarity function. A key feature of this model is that the slope coe¢ cient can be equal to, less than, or greater than unity at any point in time, giving the model a high degree of ‡exibility in the autoregressive response. Consistency of the quasi MLE of the parameter vector was established together with a complete taxonomy of 17 the required norming rates and standardization of the score function for the di¤erent cases. A local to zero similarity-based STUR system was introduced and a new limit theory was established in which the limit process, Ga (r), is a nonlinear Itō di¤usion process. The similarity function impacts this model by inducing drift and conditional volatility in the limit process, showing how the ‡exible autoregressive response in a STUR system can be the source of both drift and volatility. Our simulations show that the quasi MLE performs well for sample sizes ranging from 250 to 1000. The model is illustrated empirically in an application to international ETF data. While a unit root model is not rejected, the time varying coe¢ cient characteristics of the autoregressive responses are vividly apparent in the sample data, showing both mildly explosive and mildly integrated realizations. Appendix The parameter space is given by = 1 2 , where 1, 2 are the 2 spaces in which and w are assumed to lie, respectively. By K we denote a generic bounding constant, independent of n, which may vary from step to step. In the following we enlist the assumptions used for our model. Assumption A0: f"t gnt=1 is a sequence of iid continuous random variables, each with a zero mean, variance 2 and bounded cumulants r , r 3. If w 6= w0 , t (w) 6= t (w0 ), 8t. Assumption A1: There exist 2L , 2H , wL and wH , such that 20 2 [ 2L ; 2H ], with 0 < 2L < 2H < 1 and for each i = 1; :::; m, wi;0 2 [wL ; wH ], with 1 < wL < wH < 1. In addition, 0 2 R. Assumption A2: For all 1 < t n, the function continuous and is twice-continuously di¤erentiable. Let C0 = C (w0 ), so that S0 = In t (w) is non-negative, C0 and set C_ r (w) = @C (w) =@wr and C•r;s (w) = @ 2 C (w) =@wr @ws : Assumption A3: For all 1 Ci;j r m, 1 KC0i;j ; 18 i; j n, w 2 2 Rm , h KL [C0 (w)]i;j i C_ r (w) K [C0 (w)]i;j : i;j Assumption A0 includes an identi…cation condition. If t = , 8t, then this condition trivially holds. Assumptions A0–A3 are similar to those of Lieberman (2012), the key di¤erence being that 0 is allowed to be zero here. All the assumptions hold for all special cases discussed in this paper. The following inequalities will be used throughout (see, among others, Graybill (1983)). p kAk2 kAkF n kAk2 ; (22) 0 1 A1 n kAk2 , for A > 0, jtr (AB)j kAkF kBkF ; kABkF kAk2 kBkF , kABkF kAkF kBk2 : Proof of Theorem 1: The proof for the case 0 6= 0 was given by Lieberman (2012). The case 0 = 0 requires a di¤erent normalization and we deal with it here. For any 1 > 0, denote by B 1 ( 0 ) the ball f 2 : jj 0 jj 1g c and by B 1 ( 0 ) the complement of B 1 ( 0 ) in . Using Wu’s (1981) criterion, it is su¢ cient to show that 8 1 > 0, lim inf inf n n!1 B c ( 1 1 and lim inf inf n!1 B c ( 1 0) 2 0 0 ; 20 ln 0) 2 S0 1 2 0 0 ; 20 ln F 2 ln 0 20 ; (23) ; 2 0 0; 2 ln (24) are strictly positive in probability. Now, E 0 n 1 ln 2 ; 0 20 = 1 log 2 2 and V ar 0 n 1 ln 2 ; 0 20 = 2 4 0 4n 2 0 ; 2 2 2 + 4 4 4n : Hence, n 1 ln 2 0 ; 20 ln 2 ; 0 20 !p 19 1 2 2 0 2 1 log 2 0 2 0; with equality i¤ 2 = 20 . To establish (24), we use the decomposition 2 0 0 ; 20 ln 2 0 0; 2 ln = 1 2 0 2 + 1 2 2 0 (y 0 S 0 Sy y 0 S0 S0 y) y 0 S00 S0 10 G0 + GS0 1 S0 y 1 2 = 2 0 y 0 G0 Gy; (25) = Q1n + Q2n ; say, where G = S S0 . We have, S0 1 Q1n = Op F (26) : Considering Q2n , we have E 0 (y 0 G0 Gy) = E 0 "0 S0 10 G0 GS0 1 " K S0 1 and V ar 0 (y 0 G0 Gy) K S0 1 so that S0 1 Q2n = Op 2 F 2 F ; 4 F : 2 As S0 1 F n, Q2n strictly dominates Q1n . To complete the proof, we see that Q2n 0, because G0 G is positive semide…nite. Now, with gmin = min2 i n [G]i;i 1 , E 0 "0 S0 10 G0 GS0 1 " S0 1 2 F = = 20 2 0 tr S0 10 G0 GS0 1 2 F S0 1 2 0 GS0 1 S0 1 2 F 2 F Pn 1 i;j=1 2 2 0 gmin Pn i;j=1 2 i;j 1 2 S0 i;j S0 1 KL ; for some KL > 0 which is independent of n. Because y is continuous, Q2n 0 2 2 and E 0 Q2n = S0 1 F > 0, Q2n = S0 1 F is strictly positive in probability uniformly in B c1 ( 0 ), as required. Proof of Theorem 2: Case 1: 0 = 0. The score with respect to 2 is given by p n y 0 S00 S0 y p : + zn1 ( 0 ) = 2 20 2 40 n (27) Hence, E 0 (zn1 ( 0 )) = 0 and V ar 0 (zn1 ( 0 )) = 1 2 4 0 + 4 : 8 0 4 Higher order cumulants of zn1 ( 0 ) tend to zero, so part 1 of the Theorem is done. For r = 2; :::; m + 1, @ln ( ) = @ r = 0 y 0 S00 S_ 0r S0 1 + S0 10 S_ 0r S0 y 2 2 0 0 " "0 S_ 0r S0 1 + S0 10 S_ 0r 2 2 0 (28) : It follows that 0 " = Op "0 S_ 0r S0 1 + S0 10 S_ 0r S0 1 By Assumption A3, C_ 0r S0 1 F KL S 0 1 21 In F : F : (29) Hence, S0 1 2 F = 0 _ tr S0 10 S_ 0r S0r S0 1 S_ 0r S0 1 KL = KL Pn i;j=1 C0 + Pn 2 F S0 1 F 1 2 S0 F 2 S0 1 I F + C0n C0 + i;j=1 2 1 2 i;j +n 1 2 i;j + C0n ! n 2 i;j=1 [C0 ]i;j < KL 1 + Pn K. Therefore, there exists a c; such that, 1 2 S0 1 F V ar 0 0 "0 S_ 0r S0 1 + S0 10 S_ 0r " > c > 0; implying that the required normalization for @ln ( ) =@ r , r = 2; :::; m + 1, is 1 S0 1 F . Case 2: 0 6= 0. The score wrt 2 has been dealt with by Lieberman (2012). The concentrated score wrt r , r = 2; :::; m + 1, is given by @lnc ( ) = @ r 1 2 2 0 (QFnc + 2 0 LFnc ) ; (30) where QFnc and LFnc are given by (14) and (15). We know from Lieberman (2012) that QFnc = Op S0 1 F (31) and that LFnc = Op S0 10 S0 1 1=2 1 : (32) Therefore, we need to distinguish between two subcases. Subcase 2(i): n = O (1). In this subcase, we can normalize the score 22 by S0 10 S0 1 1=2 1 to obtain c ( 0 ) = Op znr but Op p n p + Op (1) n = Op (1), and because S0 1 F KL S0 10 S0 1 1=2 , 1 for some c znr ( 0) = LFnc S0 10 S0 1 1=2 1=2 1 10 1=2 S0 S0 1 1 1 > KL > 0, obtaining the lower bound on the variance of is similar to the derivation of the lower bound in the 0 case and therefore, this subcase is done. Subcase 2(ii): n = o (1). By (30)-(32), in this subcase QFnc = + op (1) : The lower bound on LFnc = S0 10 S0 1 1 was established in Lieberman (2012) and therefore the proof of Theorem 2 is completed. 23 Table 1. Normalization factors for the consistency proof. 0 = 0 0 6= 0 Normalization Normalization 2 1 1 1 10 1 S0 F or S0 S0 1 S0 10 S0 1 1 n = O (1) 2 1 S0 1 F S0 10 S0 1 1 n = o (1) Table 2. Normalization factors and dominant terms for the score. 0 = 0 0 6= 0 Normalization Normalization Dominant Term 1 1=2 S0 1 F kS 10 S 1 k1 QFn + LFn n = O (1) 1 1=2 1 10 1 S0 F kS S k1 LFn n = o (1) 24 Table 3. Performance of the Model Estimators = 0, w = 0:07 ^t w^ ^ n Mean SD Mean SD Mean SD 250 :0696 :0164 :0010 :0624 1:0000 :0027 500 :0702 :0082 :0001 :0448 1:0000 :0018 1000 :0699 :0042 :0006 :0318 1:0000 :0013 = 0, w = 0:2 ^t w^ ^ n Mean SD Mean SD Mean SD 250 :1998 :0170 :0023 :0632 :9996 :0074 500 :1998 :0093 :0007 :0448 :9997 :0052 1000 :2001 :0048 :0000 :0309 :9999 :0037 Zt = 0:25, w w^ ^ n Mean SD Mean 250 :0701 :0038 :2500 500 :0700 :0013 :2496 1000 :0700 :0005 :2501 = 0:07 = 0:25, w w^ ^ n Mean SD Mean 250 :1998 :0053 :2513 500 :2000 :0021 :2523 1000 :2000 :0010 :2501 = 0:2 ^ t SD Mean SD :0626 1:0000 :0026 :0450 1:0000 :0018 :0326 1:0000 :0013 ^ t SD Mean SD :0630 :9999 :0074 :0447 1:0001 :0052 :0316 1:0001 :0036 U [ 1; 1] + bw , where bw = 0:0116648 if w = 0:07 and bw = 0:033289 if w = 0:2. 25 Table 4. Performance of the Model’s Estimators = 0, w = 0:07 ^t w^ ^ n Mean SD Mean SD Mean SD 250 :0700 :0094 :0010 :0630 :9999 :0044 500 :0698 :0050 :0009 :0457 :9999 :0031 1000 :0700 :0023 :0009 :0324 1:0000 :0022 w^ n Mean SD 250 :1996 :0106 500 :1997 :0052 1000 :1999 :0027 = 0, w = 0:2 ^t ^ Mean SD Mean SD :0026 :0629 :9998 :0128 :0004 :0454 :9998 :0091 :0002 :0321 1:0000 :0064 = 0:25, w w^ ^ n Mean SD Mean 250 :0700 :0026 :2485 500 :0700 :0010 :2507 1000 :0700 :0004 :2494 = 0:07 = 0:25, w w^ ^ n Mean SD Mean 250 :2000 :0043 :2514 500 :1999 :0019 :2501 1000 :1999 :0009 :2499 = 0:2 Zt N ID ( Z ; 1) ^ ^ t SD Mean SD :0636 :9999 :0128 :0436 :9998 :0093 :0322 1:0001 :0063 with 26 t SD Mean SD :0638 1:0000 :0045 :0460 :9999 :0032 :0315 1:0000 :0022 Z = w=2. Table 5. p-values of the ADF tests for the ETF data Ticker EWA EWH EWJ EWM (Australia) (Hong Kong) (Japan) (Malaysia) Intercept 0.1800 0.7910 0.6403 0.7002 None 0.5913 0.6190 0.6109 0.8491 Ticker EWS EWT EWY FXI (Singapore) (Taiwan) (South Korea) (China) Intercept 0.6105 0.6142 0.0675 0.3243 None 0.8078 0.6400 0.8034 0.8596 Note: Intercept-the ADF test with an intercept; None-the ADF test without a trend or an intercept. 27 Table 6. Similarity-based model estimation of the ETF data ^ w^1 w^2 w^3 w^4 Ticker EWA (Australia) M1 0.001 (0.0004) M2 EWH (Hong Kong) M1 0.000 (0.003) M2 EWJ (Japan) M1 0.000 (0.0002) M2 EWM (Malaysia) M1 0.000 (0.0003) M2 EWS (Singapore) M1 0.001 (0.0005) M2 EWT (Taiwan) M1 0.000 (0.0003) M2 EWY (South Korea) M1 0.001 (0.0004) M2 FXI (China) M1 M2 0.002 (0.0008) 1.000 (0.0001) 0.269 (0.0033) 1.000 (0.0001) 0.040 (0.0026) 1.000 (0.0000) 0.213 (0.0037) 1.000 (0.0001) 0.031 (0.0109) 1.000 (0.0002) 0.104 (0.0141) 1.000 (0.0001) 0.0258 (0.0082) 1.000 (0.0001) 0.004 (0.0023) 1.000 (0.0002) 0.010 (0.0170) w^5 AIC -7.625 0.001 (0.0036) 0.283 (0.0018) 0.017 (0.0032) -0.003 (0.0004) -0.251 (0.0019) -9.903 -8.010 0.244 (0.0026) 0.374 (0.0019) -0.053 (0.0035) -0.003 (0.0004) -0.241 (0.0018) -9.668 -8.779 0.072 (0.0040) 0.336 (0.0017) 0.054 (0.0028) -0.002 (0.0003) -0.237 (0.0016) -10.180 -8.144 0.329 (0.0114) 0.339 (0.0032) -0.082 (0.0055) -0.008 (0.0007) -0.236 (0.0035) -9.196 -7.583 0.299 (0.0144) 0.424 (0.0030) -0.165 (0.0059) -0.007 (0.0008) -0.348 (0.0035) -9.575 -7.958 0.275 (0.0084) 0.399 (0.0025) 0.042 (0.0044) -0.004 (0.0004) -0.204 (0.0020) -9.335 -7.773 0.202 (0.0026) 0.281 (0.0016) 0.008 (0.0030) -0.001 (0.0003) -0.159 (0.0013) -9.370 -7.722 0.189 (0.0170) 0.332 (0.0017) -0.061 (0.0034) -0.001 (0.0003) Note: M1-the random walk with a drift model; For M1, w^1 ^ ; M2-the similarity model (21); standard errors are in brackets. 28 w^6 -0.166 (0.0017) -9.715 1.04 1.03 1.02 1.01 1.00 0.99 0.98 0.97 Fig. 1: Recursive values of ^ t based on the …tted version of (21) 29 References Chen, R. & R. S. Tsay (1993) Functional-coe¢ cient autoregressive models. Journal of the American Statistical Association 88, 298–308. Dahlhaus, R. (2000) A likelihood approximation for locally stationary processes. The Annals of Statistics 28, 1762–1794. Dahlhaus, R., R.H. Neumann & R. Von Sachs (1999) Nonlinear wavelet estimation of time-varying autoregressive process. Bernoulli 5, 873– 906. Evans, G.B.A. & N.E. Savin (1984) Testing for Unit Roots: 2. Econometrica 52, 1241–1269. Granger, C.W.J. & N.R. Swanson (1997) Introduction to stochastic unit root processes. Journal of Econometrics 80, 35–62. Graybill, F.A. (1983) Matrices with Applications in Statistics, 2nd Edition. Wadsworth, Belmont, California. Hwang, S. Y. & I. V. Basawa (2005). Explosive random coe¢ cient AR(1) processes and related asymptotics for least squares estimation, Journal of Time Series Analysis, 26, 807-824. Levy, A. & O. Lieberman (2012) Overreaction of Country ETFs to US Market Returns: Intraday vs. Daily Horizons and the Role of Synchronized Trading. Forthcoming in Journal of Banking and Finance. Lieberman, O. (2010) Asymptotic Theory for Empirical Similarity models. Econometric Theory 26, 1032–1059. Lieberman, O. (2012) A similarity-based approach to time-varying coe¢ cient nonstationary autoregression. Journal of Time Series Analysis 33, 484–502. Lundbergh, S., T. Ter½asvirta, & D. van Dijk (2003) Time-varying smooth transition autoregressive models. Journal of Business and Economic Statistics 21, 104-121. McCabe, B. P. and A. R. Tremayne (1995). Testing a time series for di¤erence stationarity. Annals of Statistics, 23, 1015-1028. 30 Nicholls, D.F. & B.G. Quinn (1980) The estimation of random coe¢ cient autoregressive models. I. Journal of Time Series Analysis 1, 37–46. Nicholls, D.F. and B.G. Quinn (1981) The estimation of random coe¢ cient autoregressive models. II. Journal of Time Series Analysis 2, 185–203. Nicholls, D.F. & B.G. Quinn (1982) Random Coe¢ cient Autoregressive Models: An Introduction. Lecture Notes in Statistics 11. SpringerVerlag. Phillips, P.C.B. (1987) Towards a uni…ed asymptotic theory for autoregression. Biometrika 74, 535–547. Phillips P. C. B. and J. Yu (2011). “Dating the Timeline of Financial Bubbles during the Subprime Crisis,” Quantitative Economics, 2, pp. 455-491. Wu, C.F. (1981) Asymptotic theory of nonlinear least squares estimation. The Annals of Statistics 9, 501–513. 31
© Copyright 2026 Paperzz