THE FITTING OF TIME-SERIES MODELS^{1,2}

by

J. Durbin
University of North Carolina and London School of Economics

Institute of Statistics Mimeograph Series No. 244
December 1959

1 This research was supported by the Office of Naval Research under Contract No. Nonr-855(09) for research in probability and statistics at Chapel Hill. Reproduction in whole or in part for any purpose of the United States Government is permitted.

2 Invited review paper presented at a meeting of the Econometric Society at Washington, D.C. on December 30, 1959.

1. Introduction

The purpose of this paper is to review methods of efficient estimation of the parameters in some of the models commonly employed in time-series analysis. The models we shall consider are the following.

The autoregressive model:

(1) $u_t + \alpha_1 u_{t-1} + \cdots + \alpha_k u_{t-k} = \varepsilon_t \qquad (t = 1, \ldots, n);$

Regression on fixed x's and lagged y's:

(2) $y_t + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} = \beta_1 x_{1t} + \cdots + \beta_q x_{qt} + \varepsilon_t \qquad (t = 1, \ldots, n);$

Regression on fixed x's with autoregressive disturbances:

(3) $y_t = \beta_1 x_{1t} + \cdots + \beta_q x_{qt} + u_t$, where $u_t + \alpha_1 u_{t-1} + \cdots + \alpha_p u_{t-p} = \varepsilon_t \qquad (t = 1, \ldots, n);$

The moving-average model:

(4) $u_t = \varepsilon_t + \delta_1 \varepsilon_{t-1} + \cdots + \delta_h \varepsilon_{t-h} \qquad (t = 1, \ldots, n);$

The autoregressive model with moving-average errors:

(5) $u_t + \gamma_1 u_{t-1} + \cdots + \gamma_p u_{t-p} = \varepsilon_t + \delta_1 \varepsilon_{t-1} + \cdots + \delta_q \varepsilon_{t-q} \qquad (t = 1, \ldots, n).$

In all cases $\{\varepsilon_t\}$ is assumed to be a sequence of independently and identically distributed random variables with mean zero, variance $\sigma^2$ and finite moments of all orders. For discussions of efficiency, but only then, the $\varepsilon_t$'s will be assumed to be normally distributed. The sequences $u_0, u_{-1}, u_{-2}, \ldots$ and $y_0, y_{-1}, y_{-2}, \ldots$ are regarded as sequences of fixed constants. For model (1) we assume that all the roots of the equation $x^k + \alpha_1 x^{k-1} + \cdots + \alpha_k = 0$ have modulus less than one; similar assumptions hold for the equation $x^p + \alpha_1 x^{p-1} + \cdots + \alpha_p = 0$ in models (2) and (3), the equation $x^h + \delta_1 x^{h-1} + \cdots + \delta_h = 0$ in model (4), and the equations $x^p + \gamma_1 x^{p-1} + \cdots + \gamma_p = 0$ and $x^q + \delta_1 x^{q-1} + \cdots + \delta_q = 0$ in model (5). These assumptions are more restrictive than is necessary for certain parts of the exposition; they have been made at the outset in this rather sweeping fashion for the sake of simplicity of presentation later on. Throughout the paper an estimator will be said to be efficient when it is asymptotically efficient.

2. The autoregressive model

There is an underlying unity in the methods of estimation to be discussed in this paper, arising from the fact that they all depend fundamentally on the method of least squares. For the autoregressive model the minimisation of $\sum_{t=1}^n (u_t + a_1 u_{t-1} + \cdots + a_k u_{t-k})^2$ leads directly to estimates $a_1, \ldots, a_k$ which are efficient and easy to compute. The sampling properties of $a_1, \ldots, a_k$ were first investigated by Mann and Wald [8], who showed that they are the same asymptotically as those of least-squares estimates of regression coefficients in multivariate normal systems. (Actually Mann and Wald's model contained a constant $\alpha_0$ and therefore differs slightly from (1), but this does not substantially affect the conclusions.)

It is sometimes preferable to work with the asymptotically equivalent values $a_1', \ldots, a_k'$ obtained from the equations

(6) $a_1' r_{|i-1|} + a_2' r_{|i-2|} + \cdots + a_k' r_{|i-k|} = -r_i \qquad (i = 1, \ldots, k;\ r_0 = 1),$

where $r_i$ is the $i$th sample serial correlation coefficient. If desired the $r_i$'s can be replaced by estimates of the serial covariances. For all except small values of $k$ the equations (6) are easier to solve than the least-squares equations, owing to their possession of a more symmetric structure. It is found that a pivotal reduction of (6) reduces to the recurrence relations

(7) $a_{ss} = -\dfrac{r_s + a_{s-1,1}\, r_{s-1} + a_{s-1,2}\, r_{s-2} + \cdots + a_{s-1,s-1}\, r_1}{1 + a_{s-1,1}\, r_1 + a_{s-1,2}\, r_2 + \cdots + a_{s-1,s-1}\, r_{s-1}} \qquad (s = 1, \ldots, k),$

(8) $a_{sr} = a_{s-1,r} + a_{ss}\, a_{s-1,s-r} \qquad (r = 1, \ldots, s-1),$

using $a_{11} = -r_1$ as the starting value. The quantities $a_{s1}, \ldots, a_{ss}$ are the coefficients of the best-fitting autoregressive model of order $s$, while $-a_{22}, \ldots, -a_{kk}$ are estimates of the partial correlation coefficients between observations $2, \ldots, k$ time periods apart with intermediate observations held fixed. Apart from yielding this information of incidental interest, the use of (7) and (8) is decidedly more expeditious than a direct solution of (6). The final coefficients $a_{k1}, \ldots, a_{kk}$ are identically equal to the values $a_1', \ldots, a_k'$ obtained from (6).
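As a concrete illustration of the recurrence relations (7) and (8), the following minimal sketch (Python with numpy; the code and its function names are ours and form no part of the original exposition) computes the fitted coefficients from the sample serial correlations exactly as the pivotal reduction prescribes.

```python
import numpy as np

def serial_correlations(u, k):
    """Sample serial correlations r_1, ..., r_k of a zero-mean series u."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    c0 = u @ u / n
    return np.array([u[:-i] @ u[i:] / (n * c0) for i in range(1, k + 1)])

def durbin_levinson(r):
    """Solve equations (6) via the recurrence relations (7)-(8).

    r holds r_1, ..., r_k.  Returns a_k1, ..., a_kk, the coefficients of the
    best-fitting model u_t + a_1 u_{t-1} + ... + a_k u_{t-k} = e_t; the
    successive pivots a_ss estimate (minus) the partial correlations.
    """
    a = np.array([-r[0]])                            # a_11 = -r_1, the starting value
    for s in range(2, len(r) + 1):
        num = r[s - 1] + a @ r[s - 2::-1]            # numerator of (7)
        den = 1.0 + a @ r[:s - 1]                    # denominator of (7)
        a_ss = -num / den
        a = np.append(a + a_ss * a[::-1], a_ss)      # equation (8)
    return a

# Demo on a simulated series from u_t - 0.6 u_{t-1} + 0.08 u_{t-2} = e_t.
rng = np.random.default_rng(0)
eps = rng.standard_normal(5000)
u = np.zeros(5000)
for t in range(2, 5000):
    u[t] = eps[t] + 0.6 * u[t - 1] - 0.08 * u[t - 2]
print(durbin_levinson(serial_correlations(u, 2)))    # roughly [-0.6, 0.08]
```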
3. Regression on fixed x's and lagged y's

The extension of the autoregressive model to include fixed x's appears to have been first considered about 1945 by Cowles Commission writers in connection with the study of simultaneous regression systems (see Koopmans et al. [7]). In this paper we are concerned only with the single-equation model

$y_t + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} = \beta_1 x_{1t} + \cdots + \beta_q x_{qt} + \varepsilon_t \qquad (t = 1, \ldots, n),$

where $\{x_{1t}\}, \ldots, \{x_{qt}\}$ are sequences of constants. Putting $x_{q+i,t} = y_{t-i}$ and $\alpha_i = -\beta_{q+i}$ $(i = 1, \ldots, p)$, the model can be written in the form

$y_t = \beta_1 x_{1t} + \cdots + \beta_{q+p}\, x_{q+p,t} + \varepsilon_t \qquad (t = 1, \ldots, n),$

or, in an obvious matrix notation, $y = X\beta + \varepsilon$, where $X$ is an $n \times (p+q)$ matrix.

The least-squares estimators of $\beta_1, \ldots, \beta_{q+p}$ are the elements of the vector $b = (X'X)^{-1} X'y$. If $X$ were a matrix of constants, least-squares theory would tell us that the vector of discrepancies $b - \beta$ has vector mean zero and variance matrix $\sigma^2 (X'X)^{-1}$. For the present model these results do not hold, since some of the elements of $X$ are random variables. However, we can obtain analogous results by introducing the matrix $T = [E(X'X)]^{-1} X'X$, where $E(X'X)$ denotes the matrix whose elements are the expected values of the corresponding elements of $X'X$. The matrix $T$ will usually converge stochastically to the unit matrix as $n \to \infty$. It is shown in [4] that $T(b - \beta)$ has vector mean zero and variance matrix $\sigma^2 [E(X'X)]^{-1}$; it is also shown that when the $\varepsilon$'s are normal this variance matrix is minimal in a certain sense. Letting

(9) $s^2 = \dfrac{1}{n-p-q} \sum_{t=1}^n \big(y_t - b_1 x_{1t} - \cdots - b_{q+p}\, x_{q+p,t}\big)^2,$

it is shown further that under certain assumptions $E(s^2) = \sigma^2 + O(1/n)$, and that $b_1, \ldots, b_{q+p}$ are asymptotically multinormal. The implication of these results is that least-squares theory applies asymptotically to the model (2).
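To make the matrix formulation concrete, here is a minimal sketch (Python/numpy; the code is ours, and the handling of the $p$ starting values and the divisor in $s^2$ are assumptions of the sketch) of the least-squares fit of model (2) together with a variance estimate in the spirit of (9).

```python
import numpy as np

def fit_lagged_regression(y, X, p):
    """Least squares for model (2):
    y_t + a_1 y_{t-1} + ... + a_p y_{t-p} = b_1 x_1t + ... + b_q x_qt + e_t.

    X is n-by-q (the fixed x's); the first p observations serve as starting
    values.  Returns (b, a, s2), s2 being the residual mean square.
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    n, q = X.shape
    # Design matrix: the fixed x's, followed by -y_{t-1}, ..., -y_{t-p}.
    Z = np.column_stack([X[p:]] + [-y[p - j:n - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(Z, y[p:], rcond=None)
    resid = y[p:] - Z @ coef
    s2 = resid @ resid / (len(resid) - p - q)        # divisor as in (9)
    return coef[:q], coef[q:], s2

# Demo: y_t - 0.5 y_{t-1} = 1 + 2 sin(0.3 t) + e_t.
rng = np.random.default_rng(1)
n = 4000
X = np.column_stack([np.ones(n), np.sin(0.3 * np.arange(n))])
y = np.zeros(n)
eps = rng.standard_normal(n)
for t in range(1, n):
    y[t] = 0.5 * y[t - 1] + X[t] @ np.array([1.0, 2.0]) + eps[t]
b, a, s2 = fit_lagged_regression(y, X, p=1)          # a roughly [-0.5]
```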
4. Regression on fixed x's with autoregressive errors

Of greater relevance in many investigations is the model

(10) $y_t = \beta_1 x_{1t} + \cdots + \beta_q x_{qt} + u_t \qquad (t = 1, \ldots, n),$

where the $u_t$'s are autocorrelated. It is well known that a simple least-squares analysis of data from such a model can be seriously misleading, owing to the inefficiency of the least-squares estimators of the $\beta$'s and to the biasedness of their estimates of variance (see, for example, the discussions by Cochrane and Orcutt [3], Watson [10] and Anderson [1]). It is true that for certain special cases, including regressions on polynomial trends and seasonal constants, the least-squares coefficients have been shown to be asymptotically efficient (Grenander and Rosenblatt [6] and Anderson and Anderson [2]). Nevertheless the least-squares estimator of variance remains biased, and the use of analysis-of-variance methods for testing hypotheses and setting confidence limits can be expected to give incorrect results.

It is often reasonable to assume that the $u_t$'s have the autoregressive structure

(11) $u_t + \alpha_1 u_{t-1} + \cdots + \alpha_p u_{t-p} = \varepsilon_t.$

(10) may then be transformed to the form

(12) $y_t + \alpha_1 y_{t-1} + \cdots + \alpha_p y_{t-p} = \sum_{i=1}^q \beta_i \big(x_{it} + \alpha_1 x_{i,t-1} + \cdots + \alpha_p x_{i,t-p}\big) + \varepsilon_t.$

This now has the same structure as (2) except that relations exist between the coefficients. Conceptually, the simplest approach to the estimation problem would be to minimise $\sum_{t=1}^n \varepsilon_t^2$, when this is expressed in terms of the y's and x's, with respect to the $\alpha$'s and $\beta$'s. However, since some of the coefficients in (12) are quadratic in the unknowns, this procedure results in non-linear estimating equations which are usually unmanageable for practical use.

Before going on to discuss general methods we draw attention to an important special case in which simple methods based on least-squares theory do give satisfactory results. This arises when each $x_{i,t-j}$ $(i = 1, \ldots, q;\ j = 1, \ldots, p)$ can be expressed as a linear function of $x_{1t}, \ldots, x_{qt}$. Examples are polynomial trends, seasonal constants and periodic regressions. (12) then reduces to the form (2) with functionally independent coefficients, which can be legitimately estimated by least squares. We illustrate the procedure by considering the case of regression on a pure periodic function with first-order autoregressive disturbances, i.e.

$y_t = \beta_1 \cos \lambda t + \beta_2 \sin \lambda t + u_t$, where $u_t + \alpha u_{t-1} = \varepsilon_t$

and where $\lambda$ is known. We have

$y_t + \alpha y_{t-1} = \beta_1 \cos \lambda t + \alpha \beta_1 \cos \lambda(t-1) + \beta_2 \sin \lambda t + \alpha \beta_2 \sin \lambda(t-1) + \varepsilon_t = \gamma_1 \cos \lambda t + \gamma_2 \sin \lambda t + \varepsilon_t,$

where $\gamma_1 = \beta_1 + \alpha \beta_1 \cos \lambda - \alpha \beta_2 \sin \lambda$ and $\gamma_2 = \beta_2 + \alpha \beta_1 \sin \lambda + \alpha \beta_2 \cos \lambda$. Efficient estimates $a$, $c_1$ and $c_2$ of $\alpha$, $\gamma_1$ and $\gamma_2$ are then obtained by minimising

$\sum_{t=1}^n \big(y_t + a y_{t-1} - c_1 \cos \lambda t - c_2 \sin \lambda t\big)^2,$

whence estimates $b_1$, $b_2$ of $\beta_1$, $\beta_2$ result from the equations

$(1 + a \cos \lambda)\, b_1 - a \sin \lambda \, b_2 = c_1,$
$a \sin \lambda \, b_1 + (1 + a \cos \lambda)\, b_2 = c_2.$

$\sigma^2$ is estimated by

$s^2 = \dfrac{1}{n-3} \sum_t \big(y_t + a y_{t-1} - c_1 \cos \lambda t - c_2 \sin \lambda t\big)^2.$

For simplicity of exposition, our discussion of the general problem will be based mainly on the two-coefficient model

(13) $y_t = \beta x_t + u_t$, where
(14) $u_t + \alpha u_{t-1} = \varepsilon_t \qquad (t = 1, \ldots, n).$

It was pointed out by Cochrane and Orcutt [3] that if $\alpha$ were known we could employ the autoregressive transformation $y_t' = y_t + \alpha y_{t-1}$, $x_t' = x_t + \alpha x_{t-1}$ to put the model in the form

$y_t' = \beta x_t' + \varepsilon_t,$

to which least squares can be applied quite validly. For the case of unknown $\alpha$ they suggested that one should insert in the autoregressive transformation either a value of $\alpha$ guessed on a priori grounds or a value estimated from the residuals of a fitted least-squares regression, further iterations being carried out if desired.
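A minimal sketch of this iterated procedure for the two-coefficient model (13)-(14) follows (Python/numpy; the code, and the fixed iteration count standing in for a convergence test, are ours).

```python
import numpy as np

def cochrane_orcutt(y, x, iterations=10):
    """Iterated Cochrane-Orcutt fit of y_t = b x_t + u_t, u_t + a u_{t-1} = e_t."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    b = x @ y / (x @ x)                              # ordinary least-squares slope
    a = 0.0
    for _ in range(iterations):
        u = y - b * x                                # current residuals
        a = -(u[1:] @ u[:-1]) / (u[:-1] @ u[:-1])    # AR(1) coefficient of residuals
        yt = y[1:] + a * y[:-1]                      # autoregressive transformation
        xt = x[1:] + a * x[:-1]
        b = xt @ yt / (xt @ xt)                      # least squares on transformed data
    return a, b
```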
The first of these suggestions is computationally attractive, though inefficient, while the second, though efficient, is computationally burdensome. An approach will now be outlined which leads to estimates which are efficient and which are not too onerous to compute.

From (13) and (14) we have

(15) $y_t + \alpha y_{t-1} = \beta x_t + \gamma x_{t-1} + \varepsilon_t,$

where $\gamma = \alpha\beta$. If we were to ignore the restriction $\gamma = \alpha\beta$ and regard $\gamma$ as a free parameter, (15) would have the form (2), so that the least-squares estimators $a$, $b$, $c$ obtained by minimising $\sum_{t=1}^n (y_t + a y_{t-1} - b x_t - c x_{t-1})^2$ would be efficient estimators of $\alpha$, $\beta$, $\gamma$. To obtain efficient estimators of $\alpha$ and $\beta$ we need only therefore consider the joint distribution of $a$, $b$, $c$. Since $y_{t-1} = \beta x_{t-1} + u_{t-1}$, the quantities $a$, $b$ and $c - a\beta$ are the least-squares coefficients of the regression of $y_t$ on $-u_{t-1}$, $x_t$ and $x_{t-1}$. The corresponding true coefficients are $\alpha$, $\beta$ and zero, in virtue of the relation

(16) $y_t = -\alpha u_{t-1} + \beta x_t + 0 \cdot x_{t-1} + \varepsilon_t.$

By a slight extension of the results of the previous section we know that least-squares regression theory applies asymptotically to (16). Consequently the quantities $a - \alpha$, $b - \beta$ and $c - a\beta$ are asymptotically normally distributed with zero means and variance matrix $\sigma^2 A^{-1}$, where $A$ is the expected value of the matrix

$\begin{bmatrix} \sum u_{t-1}^2 & \sum u_{t-1} x_t & \sum u_{t-1} x_{t-1} \\ \sum u_{t-1} x_t & \sum x_t^2 & \sum x_t x_{t-1} \\ \sum u_{t-1} x_{t-1} & \sum x_t x_{t-1} & \sum x_{t-1}^2 \end{bmatrix}.$

The distribution of $a$, $b$ and $c$ accordingly has a density which is the limit as $n \to \infty$ of the expression

(17) $\text{constant} \times \exp\Big[-\dfrac{1}{2\sigma^2}\Big\{(a-\alpha)^2 E\big(\textstyle\sum u_{t-1}^2\big) + (b-\beta)^2 \sum x_t^2 + 2(b-\beta)(c-a\beta) \sum x_t x_{t-1} + (c-a\beta)^2 \sum x_{t-1}^2\Big\}\Big].$

On maximising the exponent of (17) with respect to $\alpha$ and $\beta$ we find that their efficient estimates are $\hat\alpha = a$ and

(18) $\hat\beta = \dfrac{\sum (x_t + a x_{t-1})(y_t + a y_{t-1})}{\sum (x_t + a x_{t-1})^2}.$

It is remarkable that $\hat\beta$ is precisely the same estimator as is obtained by using $a$ as an estimator of $\alpha$ in an autoregressive transformation. The same procedure was arrived at earlier by the author [4] using a rather different approach. Ignoring the difference between $\sum x_t^2$ and $\sum x_{t-1}^2$, as is legitimate to the order of accuracy considered here, we find that (18) reduces to

(19) $\hat\beta = \dfrac{(1 + ar)\, b + (a + r)\, c}{1 + 2ar + a^2},$

where $r = \sum x_t x_{t-1} / \sum x_t^2$. Note that $\hat\alpha$ and $\hat\beta$ are asymptotically independently distributed.
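The whole computation is short in practice. The sketch below (Python/numpy, our code) obtains $a$, $b$, $c$ from the unrestricted regression (15), forms $\hat\beta$ by (18), and also evaluates the asymptotically equivalent expression (19) for comparison.

```python
import numpy as np

def durbin_efficient_fit(y, x):
    """Efficient estimates of (alpha, beta) in y_t = beta x_t + u_t,
    u_t + alpha u_{t-1} = e_t, via the unrestricted regression (15)
    followed by the transformation estimator (18)."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    # Regress y_t on -y_{t-1}, x_t, x_{t-1}: coefficients a, b, c.
    Z = np.column_stack([-y[:-1], x[1:], x[:-1]])
    a, b, c = np.linalg.lstsq(Z, y[1:], rcond=None)[0]
    xt = x[1:] + a * x[:-1]                          # autoregressive transformation
    yt = y[1:] + a * y[:-1]
    beta18 = xt @ yt / (xt @ xt)                     # equation (18)
    r = x[1:] @ x[:-1] / (x @ x)
    beta19 = ((1 + a * r) * b + (a + r) * c) / (1 + 2 * a * r + a * a)  # equation (19)
    return a, beta18, beta19
```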
The treatment of the general model (3) follows along similar lines. We shall confine ourselves here to the presentation of a brief summary of the computing routine, referring the reader to [4] for further theoretical discussion. For simplicity of exposition let us suppose that the variables $x_{1t}, \ldots, x_{qt}$ and their lagged values are linearly independent. (The outstanding case to the contrary is the common one in which the model contains a constant term, i.e. $x_{it}$ equals unity for some $i$ and all $t$; this, however, is easily dealt with by working throughout with deviations from sample means, as in ordinary regression analysis. Modifications for other cases are easily worked out ad hoc.)

Suppose that the normal equations for the least-squares fitting of $y_t$ on $x_{1t}, x_{1,t-1}, \ldots, x_{1,t-p}, \ldots, x_{qt}, \ldots, x_{q,t-p}, -y_{t-1}, \ldots, -y_{t-p}$, the variables being taken in this order, are denoted by

$A_1 b_1 = c_1,$

where $A_1$ is the $(p+q+pq) \times (p+q+pq)$ matrix of sums of squares and products of the variables $x_{1t}, \ldots, x_{q,t-p}, -y_{t-1}, \ldots, -y_{t-p}$, where $c_1$ is the vector of sums of products of these variables with $y_t$, and where $b_1$ is the vector of regression coefficients. If the equations are solved by a method such as the abbreviated Doolittle method, note that it is only necessary to carry the back solution far enough to give the coefficients $a_1, \ldots, a_p$ of $-y_{t-1}, \ldots, -y_{t-p}$.

Form a new matrix $A_2$ whose first row is obtained by multiplying the first $p+1$ rows of $A_1$ by $1, a_1, \ldots, a_p$ respectively and adding, and whose second row is obtained by taking the second group of $p+1$ rows of $A_1$, multiplying by $1, a_1, \ldots, a_p$ respectively and adding. Continue in this way until $A_2$ has $q$ rows. Repeat the process on $c_1$ to give a new vector $c_2$ containing $q$ elements. Repeat the process on the columns of $A_2$ to give a new matrix $A_3$ with $q$ rows and columns. Let $c_3$ be the vector obtained by subtracting from $c_2$ the sum of $a_1$ times the last column of $A_2$, $a_2$ times the second last column, $\ldots$, $a_p$ times the $p$th column of $A_2$ counting backwards from the last column. Then the solution $\hat\beta$ of the equation

(20) $A_3 \hat\beta = c_3$

is the vector of efficient estimators of $\beta_1, \ldots, \beta_q$. Its estimated variance matrix is $s^2 A_3^{-1}$, where $s^2$ is the residual mean square.

5. The moving-average model

The special problems of fitting moving-average models can be appreciated from a consideration of the first-order model

(21) $u_t = \varepsilon_t + \beta \varepsilon_{t-1} \qquad (t = 1, \ldots, n).$

A simple estimator of $\beta$ can be obtained by equating the theoretical value of the first serial correlation, namely $\beta/(1+\beta^2)$, to the sample value $r_1$. However, the estimator was shown to be inefficient by Whittle [11, 12], who proposed the use of an approximate maximum-likelihood estimator equivalent to the solution of the equation

$\dfrac{\partial}{\partial\beta}\left[\dfrac{1 - 2\beta r_1 + 2\beta^2 r_2 - \cdots}{1 - \beta^2}\right] = 0,$

where $r_i$ is the $i$th sample serial correlation. Although efficient, this estimator is difficult to calculate, and the method does not easily extend to higher-order models. The author [5] has therefore suggested a different method in which a $k$th-order autoregressive model is first fitted to the data, $k$ being taken to be large. It is shown in [5] that the fitted coefficients $a_1, \ldots, a_k$ have an asymptotic distribution with the approximate density

$\text{constant} \times (1-\beta^2)^{-1/2} \exp\Big[-\dfrac{n}{2} \sum_{i=0}^{k-1} (a_{i+1} + \beta a_i)^2\Big] \qquad (a_0 = 1).$

Neglecting the factor $(1-\beta^2)^{-1/2}$, since this is of small order in $n$ compared with the remainder, and maximising the exponent with respect to $\beta$, we obtain as our estimator of $\beta$

(22) $b = -\dfrac{\sum_{i=0}^{k-1} a_i a_{i+1}}{\sum_{i=0}^{k} a_i^2} \qquad (a_0 = 1),$

the efficiency of which can be made as close to unity as desired by taking $k$ sufficiently large.

For the general model

$u_t = \varepsilon_t + \delta_1 \varepsilon_{t-1} + \cdots + \delta_h \varepsilon_{t-h} \qquad (t = 1, \ldots, n)$

the same approach yields estimators $b_1, \ldots, b_h$, which are obtained as the solution of the linear equations

(23) $\sum_{j=1}^h A_{|r-j|}\, b_j = -A_r \qquad (r = 1, \ldots, h),$

where $A_r = \sum_{i=0}^{k-r} a_i a_{i+r}$, the $a_i$'s being as before.
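As an illustration, the following sketch (Python/numpy, our code; the autoregressive order k = 25 is an arbitrary illustrative choice) fits the long autoregression by least squares and then solves the linear equations (23); for h = 1 this reduces to (22).

```python
import numpy as np

def fit_moving_average(u, h, k=25):
    """Durbin's estimator for u_t = e_t + d_1 e_{t-1} + ... + d_h e_{t-h}."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    # Step 1: fit a long autoregression u_t + a_1 u_{t-1} + ... + a_k u_{t-k} = e_t.
    Z = np.column_stack([-u[k - j:n - j] for j in range(1, k + 1)])
    a = np.concatenate([[1.0], np.linalg.lstsq(Z, u[k:], rcond=None)[0]])  # a_0 = 1
    # Step 2: A_r = sum_{i=0}^{k-r} a_i a_{i+r}, then solve equations (23).
    A = np.array([a[:k + 1 - r] @ a[r:] for r in range(h + 1)])
    M = np.array([[A[abs(r - j)] for j in range(1, h + 1)] for r in range(1, h + 1)])
    return np.linalg.solve(M, -A[1:h + 1])

# Demo on the first-order model (21) with beta = 0.5.
rng = np.random.default_rng(2)
eps = rng.standard_normal(20001)
u = eps[1:] + 0.5 * eps[:-1]
print(fit_moving_average(u, h=1))                    # roughly [0.5]
```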
6. The autoregressive model with moving-average errors

This model, which has the generating equation

(24) $u_t + \gamma_1 u_{t-1} + \cdots + \gamma_p u_{t-p} = \varepsilon_t + \delta_1 \varepsilon_{t-1} + \cdots + \delta_q \varepsilon_{t-q} \qquad (t = 1, \ldots, n),$

has greater theoretical importance than the attention paid to it in the time-series literature would appear to indicate. Firstly, it is the general model of which the autoregressive and moving-average models are special cases. Secondly, when $q = p-1$ it is the only one of the three models whose structure is invariant under changes in the time-period between successive observations, a fact pointed out by Quenouille [9]. Thirdly, equi-spaced observations from a continuous stochastic process generated by a linear stochastic differential equation, or having a rational spectral density, conform to a discrete model (24) with $q = p-1$. Consequently a solution to the problem of efficient fitting of (24) also gives, as a by-product, the solution to the problems of fitting stochastic differential equation models and of estimating rational spectral densities from discrete data. Yet in spite of the theoretical importance of the model, only Quenouille [9] appears to have considered the fitting problem; Quenouille did not, however, attempt to discuss efficient methods of estimation.

Two methods of fitting will now be described. The first is non-iterative but is not fully efficient. The second is an iterative method in which the autoregressive and moving-average parameters are estimated alternately. It is hoped to investigate the performance of both methods by means of sampling experiments and to publish the results later.

Let us begin by considering the first-order model

(25) $u_t + \gamma u_{t-1} = \varepsilon_t + \delta \varepsilon_{t-1} \qquad (t = 1, \ldots, n).$

Let $a_1, \ldots, a_k$ denote the coefficients of a fitted autoregressive model of large order $k$ and let $e_t$ denote the residual $u_t + a_1 u_{t-1} + \cdots + a_k u_{t-k}$. In the first method of estimation we replace $\varepsilon_{t-1}$ in (25) by $e_{t-1}$ and estimate $\gamma$ and $\delta$ by the values of $c$ and $d$ which minimise $\sum (u_t + c u_{t-1} - d e_{t-1})^2$. This leads to a pair of equations whose solution is asymptotically equivalent to the expressions

(26) $c = -\dfrac{a_1 r_2 + a_2 r_3 + \cdots + a_k r_{k+1}}{a_1 r_1 + a_2 r_2 + \cdots + a_k r_k},$

(27) $d = c + \dfrac{r_1 + a_1 r_2 + a_2 r_3 + \cdots + a_k r_{k+1}}{1 + a_1 r_1 + a_2 r_2 + \cdots + a_k r_k}.$

The method is readily extended to cover higher-order systems and can be used to give starting values for the second method, which we now describe.

First let us consider the estimation of $\delta$ for a given value of $\gamma$. Suppose that the true values of the first $k$ autoregressive coefficients are $\alpha_1, \ldots, \alpha_k$, the fitted values being denoted by $a_1, \ldots, a_k$ as before. Using Mann and Wald's results [8], we know that for large $n$ the asymptotic distribution of $a_1, \ldots, a_k$ is normal with density

(28) $\text{constant} \times \exp\Big[-\dfrac{n}{2\sigma^2} \sum_{i,j=1}^k (a_i - \alpha_i)(a_j - \alpha_j)\, E(u_{t-i} u_{t-j})\Big].$

The quadratic form in the exponent can be represented operationally as

(29) $Q = \mathcal{E}\big[\{(a_1 - \alpha_1) u_{t-1} + \cdots + (a_k - \alpha_k) u_{t-k}\}^2\big],$

where $\mathcal{E}$ denotes the operation of taking an expectation over variation of the u's, the a's being regarded as fixed constants. Suppose that the true value of $\gamma$ were known and the following transformations from $a_1, \ldots, a_k$ to $l_1, \ldots, l_k$ and from $\alpha_1, \ldots, \alpha_k$ to $\lambda_1, \ldots, \lambda_k$ were made:

$a_1 = l_1 + \gamma, \quad a_2 = l_2 + \gamma l_1, \quad \ldots, \quad a_k = l_k + \gamma l_{k-1};$
$\alpha_1 = \lambda_1 + \gamma, \quad \alpha_2 = \lambda_2 + \gamma \lambda_1, \quad \ldots, \quad \alpha_k = \lambda_k + \gamma \lambda_{k-1}.$

Substituting in (29) we have

(30) $Q = \mathcal{E}\big[\{(l_1 - \lambda_1)(u_{t-1} + \gamma u_{t-2}) + \cdots + (l_{k-1} - \lambda_{k-1})(u_{t-k+1} + \gamma u_{t-k}) + (l_k - \lambda_k) u_{t-k}\}^2\big].$

Now $u_t + \gamma u_{t-1} = \varepsilon_t + \delta \varepsilon_{t-1}$. Consequently, on putting $z_t = u_t + \gamma u_{t-1}$, we see that $z_t$ satisfies the moving-average process $z_t = \varepsilon_t + \delta \varepsilon_{t-1}$. Moreover $u_{t-k} = z_{t-k} - \gamma z_{t-k-1} + \gamma^2 z_{t-k-2} - \cdots$. Consequently substitution in (30) gives, to the order of approximation required,

(31) $Q = \mathcal{E}\big[\{(l_1 - \lambda_1) z_{t-1} + (l_2 - \lambda_2) z_{t-2} + \cdots\}^2\big].$

From the fact that the true autoregressive coefficients are generated by the relation $1 + \alpha_1 z + \alpha_2 z^2 + \cdots = (1 + \gamma z)(1 + \delta z)^{-1}$, it follows that for large $k$, $\lambda_i \doteq (-\delta)^i$ $(i = 1, \ldots, k)$ as accurately as desired. Since $\lambda_k$ can be made arbitrarily small, the error committed by taking $\lambda_{k+r} = (-\delta)^r \lambda_k$ in (30) in place of $\lambda_{k+r} = (-\gamma)^r \lambda_k$ $(r = 1, 2, \ldots)$ can be made arbitrarily small. Thus to a high degree of accuracy (31) holds with $z_{t-1}, z_{t-2}, \ldots$ corresponding to a moving-average model for which $\lambda_1, \lambda_2, \ldots$ are the true autoregressive coefficients. Comparing (31) with (29), we see that $l_1, l_2, \ldots$ behave like autoregressive coefficients fitted to data corresponding to the model

$z_t = \varepsilon_t + \delta \varepsilon_{t-1} \qquad (t = 1, \ldots, n).$

Consequently it follows from (22) that the efficient estimator of $\delta$ is

(32) $d = -\dfrac{\sum_{i \ge 0} l_i\, l_{i+1}}{\sum_{i \ge 0} l_i^2} \qquad (l_0 = 1).$

In practice it will probably suffice to terminate the summations in this expression at about $i = k$. Note that we need not take explicit account of the fact that the "constant" in (28) depends on the covariance determinant of $a_1, \ldots, a_k$, which in turn depends on the unknown parameter $\delta$, since the determinant is of small order in $n$ compared with the exponent of (28).
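In code, the estimation of $\delta$ for given $\gamma$ is a one-line transformation followed by (32). The sketch below (Python/numpy, ours) assumes the long-autoregression coefficients have already been fitted, for example by least squares as in the previous sketches.

```python
import numpy as np

def delta_step(a, gamma):
    """Estimate delta in (25), given gamma and the fitted long-autoregression
    coefficients a_1, ..., a_k.

    The a's are transformed to l's by inverting a_i = l_i + gamma l_{i-1}
    (with l_0 = 1), and delta is then estimated by (32)."""
    l = [1.0]
    for a_i in np.asarray(a, dtype=float):
        l.append(a_i - gamma * l[-1])                # invert a_i = l_i + gamma l_{i-1}
    l = np.array(l)
    return -(l[:-1] @ l[1:]) / (l @ l)               # equation (32)
```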
Let us now consider the converse problem, i.e. the estimation of $\gamma$ given the true value of $\delta$. Define $w_1, \ldots, w_n$ by the relation

$u_t = w_t + \delta w_{t-1},$

where $w_0$ is either defined arbitrarily or taken equal to $u_0 - \delta u_{-1} + \delta^2 u_{-2} - \cdots$. The w's then satisfy the autoregressive model

$w_t + \gamma w_{t-1} = \varepsilon_t.$

Consequently $\gamma$ is efficiently estimated by the expression

(33) $c = -\sum{}' w_t w_{t+1} \Big/ \sum{}' w_t^2,$

where $\sum'$ denotes summation over the range of possible values of $t$, divided by the number of terms summed. Let

$s_r = \sum{}' u_t u_{t+r}, \qquad p_r = \sum{}' u_t w_{t+r}, \qquad q_r = \sum{}' w_t w_{t+r}.$

It is easy to verify that to a good approximation we have the relations

(34) $p_r = s_r - \delta p_{r-1} \qquad (r = -k+1, -k+2, \ldots, k),$

(35) $q_r = p_r - \delta q_{r+1} \qquad (r = k-1, k-2, \ldots, 0).$

Applying (34) recursively, taking $p_{-k} = s_{-k}$, and then (35) recursively, taking $q_k = p_k$, we obtain $q_1$ and $q_0$, from which we obtain the estimate of $\gamma$ as

(36) $c = -q_1 / q_0.$

By applying (32) and (36) alternately we obtain an iterative method of estimating $\gamma$ and $\delta$; (26) and (27) can be used to provide a starting point.

The treatment of the general model follows similar lines. The fitted autoregressive constants $a_1, \ldots, a_k$ are transformed to $l_1, \ldots, l_k$ by the relations

$a_1 = l_1 + \gamma_1,$
$a_2 = l_2 + \gamma_1 l_1 + \gamma_2,$

and so on, the further l's being obtainable from the expression

$a_r = l_r + \gamma_1 l_{r-1} + \cdots + \gamma_p l_{r-p}$

(with $l_0 = 1$ and $l_j = 0$ for $j < 0$). The l's behave approximately like autoregressive coefficients fitted to data generated by the moving-average model

$u_t = \varepsilon_t + \delta_1 \varepsilon_{t-1} + \cdots + \delta_q \varepsilon_{t-q}.$

$\delta_1, \ldots, \delta_q$ can therefore be estimated from equations (23), taking $A_r = \sum_{i=0} l_i l_{i+r}$ $(l_0 = 1)$. In practice the summation can probably be truncated at about $i = k$. To estimate $\gamma_1, \ldots, \gamma_p$ for given $\delta_1, \ldots, \delta_q$ we use, in place of (34) and (35), the expressions

(37) $p_r = s_r - \delta_1 p_{r-1} - \cdots - \delta_q p_{r-q} \qquad (r = -k+q, \ldots, k),$

(38) $q_r = p_r - \delta_1 q_{r+1} - \cdots - \delta_q q_{r+q} \qquad (r = k-q, \ldots, 0).$

Estimates $c_1, \ldots, c_p$ are then obtained from the equations

$\sum_{j=1}^p q_{|r-j|}\, c_j = -q_r \qquad (r = 1, \ldots, p).$
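The $\gamma$-step and the full alternating iteration for the first-order model (25) may be sketched as follows (Python/numpy, our code; the order k = 25, the fixed iteration count, and the crude starting value $\gamma = 0$ used in place of (26) and (27) are all assumptions of the sketch).

```python
import numpy as np

def gamma_step(u, delta, k=25):
    """Estimate gamma in (25), given delta, via the recursions (34)-(35)
    and the ratio (36); the w's are never formed explicitly."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    # s_r: mean lagged products of u with itself (s_{-r} = s_r).
    s = {r: u[:n - abs(r)] @ u[abs(r):] / (n - abs(r)) for r in range(-k, k + 1)}
    p = {-k: s[-k]}                                  # starting value p_{-k} = s_{-k}
    for r in range(-k + 1, k + 1):
        p[r] = s[r] - delta * p[r - 1]               # equation (34)
    q = {k: p[k]}                                    # starting value q_k = p_k
    for r in range(k - 1, -1, -1):
        q[r] = p[r] - delta * q[r + 1]               # equation (35)
    return -q[1] / q[0]                              # equation (36)

def fit_arma_1_1(u, k=25, iterations=10):
    """Alternate the delta-step (32) and the gamma-step (36) for model (25)."""
    u = np.asarray(u, dtype=float)
    n = len(u)
    Z = np.column_stack([-u[k - j:n - j] for j in range(1, k + 1)])
    a = np.linalg.lstsq(Z, u[k:], rcond=None)[0]     # long autoregression
    gamma = 0.0
    for _ in range(iterations):
        l = [1.0]                                    # delta-step: invert a_i = l_i + gamma l_{i-1}
        for a_i in a:
            l.append(a_i - gamma * l[-1])
        l = np.array(l)
        delta = -(l[:-1] @ l[1:]) / (l @ l)          # equation (32)
        gamma = gamma_step(u, delta, k)              # equations (34)-(36)
    return gamma, delta

# Demo: u_t - 0.7 u_{t-1} = e_t + 0.5 e_{t-1}, i.e. gamma = -0.7, delta = 0.5.
rng = np.random.default_rng(3)
eps = rng.standard_normal(20001)
u = np.zeros(20000)
u[0] = eps[1] + 0.5 * eps[0]
for t in range(1, 20000):
    u[t] = 0.7 * u[t - 1] + eps[t + 1] + 0.5 * eps[t]
print(fit_arma_1_1(u))                               # roughly (-0.7, 0.5)
```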
References

[1] Anderson, R. L.: "The Problem of Autocorrelation in Regression Analysis," Journal of the American Statistical Association, Vol. 49 (1954), 113-129.

[2] Anderson, R. L. and Anderson, T. W.: "Distribution of the Circular Serial Correlation Coefficient for Residuals from a Fitted Fourier Series," Annals of Mathematical Statistics, Vol. 21 (1950), 59-81.

[3] Cochrane, D. and Orcutt, G. H.: "Application of Least-Squares Regression to Relationships Containing Autocorrelated Error Terms," Journal of the American Statistical Association, Vol. 44 (1949), 32-61.

[4] Durbin, J.: "Estimation of Parameters in Time-Series Regression Models," Journal of the Royal Statistical Society, Series B (1960). In the press.

[5] Durbin, J.: "Efficient Estimation of Parameters in Moving-Average Models," Biometrika, Vol. 46 (1959). In the press.

[6] Grenander, U. and Rosenblatt, M.: Statistical Analysis of Stationary Time Series. New York: John Wiley and Sons, 1957.

[7] Koopmans, T. (Editor): Statistical Inference in Dynamic Economic Models. New York: John Wiley and Sons, 1950.

[8] Mann, H. B. and Wald, A.: "On the Statistical Treatment of Linear Stochastic Difference Equations," Econometrica, Vol. 11 (1943), 173-220.

[9] Quenouille, M. H.: "Discrete Autoregressive Schemes with Varying Time-Intervals," Metrika, Vol. 1 (1958), 21-27.

[10] Watson, G. S.: "Serial Correlation in Regression Analysis I," Biometrika, Vol. 42 (1955), 327-341.

[11] Whittle, P.: Hypothesis Testing in Time Series Analysis. Uppsala: Almqvist and Wiksells, 1951.

[12] Whittle, P.: "Estimation and Information in Stationary Time Series," Arkiv för Matematik, Vol. 2 (1953), 423-434.