(Previously presented at SAS Global Forum, Dallas, 2015)

Adventures in Forecasting
David A. Dickey, NC State University

Learning objectives:
__ Understand ARIMA models
__ Interpret ARIMA output from PROC ARIMA
__ Forecast with forecast intervals
__ Understand when to difference data
__ Understand advantages/disadvantages of deterministic vs. stochastic inputs
__ Compare forecasts from deterministic versus stochastic input models
__ Incorporate trends
__ Incorporate seasonality
__ (optional) Introduce cointegration

Note: Examples here are run in SAS(TM). SAS and its products are registered trademarks of SAS Institute, Cary NC, USA.

Brief index of topics: Overview; Autoregressive Models; Model Checking; Prediction Intervals; Moving Average Models; Model Selection - AIC; Stationarity - Unit Roots; Determining Lag Differences for Unit Root Tests; Models with Inputs (PROC AUTOREG); Detecting Outliers; Seasonal Models; (optional) Nonlinear Trends; (optional) St. Petersburg Visitor Example; (optional) Seasonal Unit Roots; (optional) Cointegrated Series.

Real data examples: Silver, Iron & Steel, Brewers, Corn Yields, Harley, NCSU Energy, Deer Crash, Ice Classic, St. Pete Visits, T-bill rates.

Overview of Time Series and Forecasting:

Data taken over time (usually equally spaced).
Yt = data at time t, μ = mean (constant over time).

Simplest model: Yt = μ + et, where et ~ N(0, σ²) independent.
Forecast: Ybar ± 1.96·S

Example 0: Yt = μ + Zt, corr(Zt, Zt-1) = 0.8.
Model: (Yt - μ) = ρ(Yt-1 - μ) + et, et ~ N(0, σ²) independent.

ARIMA Models:

AR(p): "Autoregressive"
(Yt - μ) = α1(Yt-1 - μ) + α2(Yt-2 - μ) + ... + αp(Yt-p - μ) + et
et independent, constant variance: "White Noise"

How to find p? Regress Y on lags: the PACF, Partial Autocorrelation Function.
(1) Regress Yt on Yt-1; then Yt on Yt-1 and Yt-2; then Yt on Yt-1, Yt-2, Yt-3; etc.
(2) Plot the last lag coefficient from each regression versus lag.
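The handout does everything in SAS, but the PACF recipe above (successive lag regressions, keeping each regression's last coefficient) is easy to sketch directly. This Python illustration (not from the handout) uses the Durbin-Levinson recursion, which is algebraically equivalent to running those successive regressions on the sample autocorrelations.

```python
def sample_acf(y, nlags):
    """Sample autocorrelations r(1), ..., r(nlags) about the sample mean."""
    n = len(y)
    m = sum(y) / n
    c0 = sum((v - m) ** 2 for v in y)
    return [sum((y[t] - m) * (y[t - j] - m) for t in range(j, n)) / c0
            for j in range(1, nlags + 1)]

def pacf_from_acf(r):
    """Durbin-Levinson recursion: pacf(k) equals the lag-k coefficient when
    Y(t) is regressed on lags 1..k, computed from the autocorrelations r."""
    phi, out = {}, []
    for k in range(1, len(r) + 1):
        if k == 1:
            pk = r[0]
        else:
            num = r[k - 1] - sum(phi[(k - 1, j)] * r[k - 1 - j] for j in range(1, k))
            den = 1.0 - sum(phi[(k - 1, j)] * r[j - 1] for j in range(1, k))
            pk = num / den
        phi[(k, k)] = pk
        for j in range(1, k):
            phi[(k, j)] = phi[(k - 1, j)] - pk * phi[(k - 1, k - j)]
        out.append(pk)
    return out

def pacf(y, nlags):
    """PACF of a data series: ACF first, then Durbin-Levinson."""
    return pacf_from_acf(sample_acf(y, nlags))
```

For the exact AR(1) autocorrelations ρ(j) = 0.5^j, `pacf_from_acf([0.5, 0.25, 0.125])` gives a spike of 0.5 at lag 1 and zeros after it, the AR signature described above.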
Red series example: Partial Autocorrelations (plot).

Example 1: Supplies of silver in the NY commodities exchange. Getting the PACF (and other identifying plots).

SAS code:
PROC ARIMA data=silver plots(unpack)=all;
 IDENTIFY var=silver;
run;

PACF: "spikes" outside the 2-standard-error bands are statistically significant.
Two spikes => p=2: (Yt - 668) form below.
(Yt - μ) = α1(Yt-1 - μ) + α2(Yt-2 - μ) + et

How to estimate μ and the α's? PROC ARIMA's ESTIMATE statement. Use maximum likelihood (ml option):

PROC ARIMA data=silver plots(unpack)=all;
 IDENTIFY var=silver;
 ESTIMATE p=2 ml;

Maximum Likelihood Estimation
Parameter  Estimate   Standard Error  t Value  Approx Pr>|t|  Lag
MU         668.29592  38.07935        17.55    <.0001         0
AR1,1      1.57436    0.10186         15.46    <.0001         1
AR1,2      -0.67483   0.10422         -6.48    <.0001         2

Fitted model: (Yt - 668) = 1.57(Yt-1 - 668) - 0.67(Yt-2 - 668) + et

Backshift notation: B(Yt) = Yt-1, B²(Yt) = B(B(Yt)) = Yt-2
(1 - 1.57B + 0.67B²)(Yt - 668) = et

SAS output (uses backshift):
Autoregressive Factors
Factor 1: 1 - 1.57436 B**(1) + 0.67483 B**(2)

Checks:
(1) Overfit (try AR(3)): ESTIMATE p=3 ml;

Maximum Likelihood Estimation
Parameter  Estimate   Standard Error  t Value  Approx Pr>|t|  Lag
MU         664.88129  35.21080        18.88    <.0001         0
AR1,1      1.52382    0.13980         10.90    <.0001         1
AR1,2      -0.55575   0.24687         -2.25    0.0244         2
AR1,3      -0.07883   0.14376         -0.55    0.5834         3

(2) Residual autocorrelations. Residual rt; residual autocorrelation at lag j: Corr(rt, rt-j) = ρ(j).

Box-Pierce Q statistic: estimate, square, and sum k of these; multiply by the sample size n. PROC ARIMA reports k in sets of 6. The limit distribution is chi-square if the errors are independent.
Later modification: the Box-Ljung statistic for H0: residuals uncorrelated:

Q = n(n+2) · Σ(j=1..k) rj² / (n-j)

SAS output:
Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       3.49        4   0.4794    -0.070 -0.049 -0.080  0.100 -0.112  0.151
12      5.97        10  0.8178     0.026 -0.111 -0.094 -0.057  0.006 -0.110
18      10.27       16  0.8522    -0.037 -0.105  0.128 -0.051  0.032 -0.150
24      16.00       22  0.8161    -0.110  0.066 -0.039  0.057  0.200 -0.014

Residuals uncorrelated => residuals are white noise => residuals are unpredictable.

SAS computes Box-Ljung on the original data too:
Autocorrelation Check for White Noise
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       81.84       6   <.0001     0.867  0.663  0.439  0.214 -0.005 -0.184
12      142.96      12  <.0001    -0.314 -0.392 -0.417 -0.413 -0.410 -0.393

Data autocorrelated => predictable!
Note: All p-values are based on an assumption called "stationarity," discussed later.

How to predict? (Yt - μ) = α1(Yt-1 - μ) + α2(Yt-2 - μ) + et
One-step prediction: Ŷt+1 = μ + α1(Yt - μ) + α2(Yt-1 - μ); future error et+1.
Two-step prediction: Ŷt+2 = μ + α1(Ŷt+1 - μ) + α2(Yt - μ); error et+2 + α1·et+1.
Prediction error variances (σ² = variance(et)): σ², (1 + α1²)σ², ...

From the prediction error variances, get 95% prediction intervals. The variance of et can be estimated from past data. SAS PROC ARIMA does it all for you!

Moving Average, MA(q), and ARMA(p,q) models
MA(1): Yt   = μ + et   - θ·et-1     Variance: (1 + θ²)σ²
       Yt-1 = μ + et-1 - θ·et-2     ρ(1) = -θ/(1 + θ²)
       Yt-2 = μ + et-2 - θ·et-3     ρ(2) = 0/(1 + θ²) = 0

The autocorrelation function "ACF" (ρ(j)) is 0 after lag q for an MA(q). The PACF is useless for identifying q in an MA(q).

PACF drops to 0 after lag 3 => AR(3), p=3
ACF drops to 0 after lag 2 => MA(2), q=2
Neither drops => ARMA(p,q), p=___ q=____
(Yt - μ) = α1(Yt-1 - μ) + ... + αp(Yt-p - μ) + et - θ1·et-1 - ... - θq·et-q

Example 2: Iron and Steel Exports.
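The Box-Ljung statistic is simple enough to compute directly; here is a small Python sketch of the Q formula above (the handout itself relies on PROC ARIMA's built-in check).

```python
def ljung_box(residuals, k):
    """Box-Ljung Q = n(n+2) * sum_{j=1..k} r_j^2 / (n-j).
    Compare to a chi-square with k minus (number of estimated ARMA
    parameters) degrees of freedom; a large Q means autocorrelated
    residuals."""
    n = len(residuals)
    m = sum(residuals) / n
    c0 = sum((e - m) ** 2 for e in residuals)
    q = 0.0
    for j in range(1, k + 1):
        rj = sum((residuals[t] - m) * (residuals[t - j] - m)
                 for t in range(j, n)) / c0
        q += rj * rj / (n - j)
    return n * (n + 2) * q
```

A strongly trending "residual" series such as 0, 1, ..., 99 produces a huge Q, echoing the white-noise check on the original silver data above, where the data (unlike the residuals) are clearly autocorrelated.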
PROC ARIMA plots(unpack)=all;
 IDENTIFY var=EXPORT;

ACF: could be MA(1) (spikes at lags 0, 1).
PACF: could be AR(1) (no spike displayed at lag 0).

ESTIMATE P=1 ML;  ESTIMATE Q=2 ML;  ESTIMATE Q=1 ML;

Maximum Likelihood Estimation
Parameter  Estimate  t Value  Approx Pr>|t|  Lag
MU         4.42129   10.28    <.0001         0
AR1,1      0.46415   3.42     0.0006         1

MU         4.43237   11.41    <.0001         0
MA1,1      -0.54780  -3.53    0.0004         1
MA1,2      -0.12663  -0.82    0.4142         2

MU         4.42489   12.81    <.0001         0
MA1,1      -0.49072  -3.59    0.0003         1

How to choose? AIC - smaller is better.
AIC: -2 ln(Lmax) + 2(# parameters), where Lmax = max of the likelihood function.

AIC 165.8342 (MA(1))   166.3711 (AR(1))   167.1906 (MA(2))

FORECAST lead=5 out=out1 id=date interval=year;

Example 3: Brewers' Proportion Won
Mean of working series 0.478444, standard deviation 0.059934.

Autocorrelations ("." marks two standard errors):
Lag  Correlation  Std Error
0    1.00000      0
1    0.52076      0.149071
2    0.18663      0.185136
3    0.11132      0.189271
4    0.11490      0.190720
5    -.00402      0.192252
6    -.14938      0.192254
7    -.13351      0.194817
8    -.06019      0.196840
9    -.05246      0.197248
10   -.20459      0.197558
11   -.22159      0.202211
12   -.24398      0.207537

Could be MA(1).

Autocorrelation Check for White Noise
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       17.27       6   0.0084     0.521  0.187  0.111  0.115 -0.004 -0.149
12      28.02       12  0.0055    -0.134 -0.060 -0.052 -0.205 -0.222 -0.244

NOT white noise!
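For Gaussian models fit to the same data, -2 ln(Lmax) equals n·ln(RSS/n) plus a constant that cancels in comparisons, so the AIC ranking above can be sketched as follows (a Python illustration, not what PROC ARIMA prints).

```python
import math

def gaussian_aic(rss, n, k):
    """Comparative AIC for a Gaussian model: n*ln(RSS/n) + 2k, where k is
    the number of estimated parameters. The omitted additive constant
    n*(ln(2*pi) + 1) is the same for every model fit to the same n points,
    so the "smaller is better" ranking is unchanged."""
    return n * math.log(rss / n) + 2 * k
```

With equal fits, the model with fewer parameters wins; a richer model must lower the RSS enough to pay its 2-per-parameter penalty, which is why MA(2) loses to MA(1) above despite nesting it.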
SAS code:
PROC ARIMA data=brewers;
 IDENTIFY var=Win_Pct nlag=12;
run;
 ESTIMATE q=1 ml;

Maximum Likelihood Estimation
Parameter  Estimate  Standard Error  t Value  Approx Pr>|t|  Lag
MU         0.47791   0.01168         40.93    <.0001         0
MA1,1      -0.50479  0.13370         -3.78    0.0002         1

AIC -135.099

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq
6       3.51        5   0.6219
12      11.14       11  0.4313
18      13.54       17  0.6992
24      17.31       23  0.7936
(individual residual autocorrelations all between -0.25 and 0.17)

Estimated mean 0.477911
Moving Average Factors
Factor 1: 1 + 0.50479 B**(1)

Partial Autocorrelations
Lag  Correlation
1    0.52076
2    -0.11603
3    0.08801
4    0.04826
5    -0.12646
6    -0.12989
7    0.01803
8    0.01085
9    -0.02252
10   -0.20351
11   -0.03129
12   -0.18464

OR ... could be AR(1):
ESTIMATE p=1 ml;

Maximum Likelihood Estimation
Parameter  Estimate  Standard Error  t Value  Approx Pr>|t|  Lag
MU         0.47620   0.01609         29.59    <.0001         0
AR1,1      0.53275   0.12750         4.18     <.0001         1

AIC -136.286 (vs. -135.099)

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq
6       3.57        5   0.6134
12      8.66        11  0.6533
18      10.94       17  0.8594
24      13.42       23  0.9423
(individual residual autocorrelations all small)

Model for variable Win_pct: estimated mean 0.476204
Autoregressive Factors
Factor 1: 1 - 0.53275 B**(1)

Conclusions for Brewers:
Both models have statistically significant parameters.
Both models are sufficient (no lack of fit).
Predictions from MA(1): the first one uses the correlations; the rest are on the mean.
Predictions from AR(1): converge exponentially fast toward the mean.
Not much difference, but AIC prefers AR(1).

Stationarity:
(1) Mean constant (no trends)
(2) Variance constant
(3) Covariance γ(j) and correlation ρ(j) = γ(j)/γ(0) between Yt and Yt-j depend only on j.

ARMA(p,q) model:
(Yt - μ) = α1(Yt-1 - μ) + ... + αp(Yt-p - μ) + et - θ1·et-1 - ... - θq·et-q

Stationarity is guaranteed whenever the solutions of the equation (roots of the "characteristic polynomial")
X^p - α1·X^(p-1) - α2·X^(p-2) - ... - αp = 0
are all <1 in magnitude.

Examples:
(1) Yt = .8(Yt-1) + et    X - .8 = 0, X = .8: stationary
(2) Yt = 1.00(Yt-1) + et  nonstationary. Note: Yt = Yt-1 + et is a random walk.
(3) Yt = 1.6(Yt-1) - 0.6(Yt-2) + et
 "characteristic polynomial" X² - 1.6X + 0.6 = 0, X=1 or X=0.6: nonstationary (unit root X=1)
 (Yt - μ) - (Yt-1 - μ) = 0.6[(Yt-1 - μ) - (Yt-2 - μ)] + et
 (Yt - Yt-1) = 0.6(Yt-1 - Yt-2) + et
 First differences form a stationary AR(1) process! No mean - no mean reversion - no gravity pulling toward the mean.
(4) Yt = 1.60(Yt-1) - 0.63(Yt-2) + et
 X² - 1.60X + 0.63 = 0, X=0.9 or X=0.7, |roots|<1: stationary
 (Yt - μ) - (Yt-1 - μ) = -0.03(Yt-1 - μ) + 0.63[(Yt-1 - μ) - (Yt-2 - μ)] + et
 Yt - Yt-1 = -0.03(Yt-1 - μ) + 0.63(Yt-1 - Yt-2) + et *

Unit Root testing (H0: series has a unit root):
Regress Yt - Yt-1 on Yt-1 and (Yt-1 - Yt-2). Look at the t test for Yt-1. If it is significantly negative, then stationary.

*Note: Plugging X=1 into -(X² - 1.60X + 0.63) gives -0.03, which always equals the lag Y coefficient; that coefficient is 0 exactly when there is a unit root.

Problem: the distribution of the "t statistic" on Yt-1 is not a t distribution under the unit root hypothesis. The distribution looks like this histogram (1 million random walks of length n=100). Overlays: N(sample mean & variance) and N(0,1). Correct distribution: the Dickey-Fuller test in PROC ARIMA.
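The root calculations in examples (3) and (4) above are just the quadratic formula applied to the characteristic polynomial; a small Python sketch (not part of the handout):

```python
import cmath

def ar2_roots(a1, a2):
    """Roots of X^2 - a1*X - a2 = 0, the characteristic polynomial of
    Y(t) = a1*Y(t-1) + a2*Y(t-2) + e(t)."""
    disc = cmath.sqrt(a1 * a1 + 4 * a2)
    return (a1 + disc) / 2, (a1 - disc) / 2

def stationary_ar2(a1, a2):
    """Stationary iff both roots lie strictly inside the unit circle."""
    return all(abs(r) < 1 for r in ar2_roots(a1, a2))
```

`ar2_roots(1.6, -0.6)` returns the roots 1 and 0.6 of example (3) (a unit root), `ar2_roots(1.60, -0.63)` returns 0.9 and 0.7 of example (4) (stationary), and the random walk a1=1, a2=0 has a root at exactly 1.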
-2.89 is the correct (left) 5th percentile; 46% of the t statistics are less than -1.645 (the normal 5th percentile).

Example 3 revisited: Brewers
PROC ARIMA data=brewers;
 IDENTIFY var=Wins nlag=12 stationarity=(ADF=0);
run;

Dickey-Fuller Unit Root Tests
Type         Lags  Rho       Pr<Rho  Tau    Pr<Tau
Zero Mean    0     -0.1803   0.6376  -0.22  0.6002
Single Mean  0     -21.0783  0.0039  -3.75  0.0062
Trend        0     -21.1020  0.0287  -3.68  0.0347

Why "single mean"? The series has a nonzero mean and no trend.
Conclusion: reject H0 (unit root), so the Brewers series is stationary (mean reverting).
0 lags => no lagged differences needed in the model (just regress Yt - Yt-1 on Yt-1).

Example 1 revisited: Stocks of silver
Needed AR(2) (2 lags), so regress Yt - Yt-1 (D_Silver) on Yt-1 (L_Silver) and Yt-1 - Yt-2 (D_Silver_1).

PROC REG:
Variable    DF  Parameter Estimate  t Value  Pr>|t|
Intercept   1   75.58073            2.76     0.0082
L_Silver    1   -0.11703            -2.78    0.0079  wrong distn.
D_Silver_1  1   0.67115             6.21     <.0001  OK

PROC ARIMA:
Augmented Dickey-Fuller Unit Root Tests
Type         Lags  Rho       Pr<Rho  Tau    Pr<Tau
Zero Mean    1     -0.2461   0.6232  -0.28  0.5800
Single Mean  1     -17.7945  0.0121  -2.78  0.0689  OK
Trend        1     -15.1102  0.1383  -2.63  0.2697

Same t statistic, corrected p-value!
Conclusion: unit root => difference the series.
1 lag => need 1 lagged difference in the model (regress Yt - Yt-1 on Yt-1 and Yt-1 - Yt-2).

PROC ARIMA data=silver;
 IDENTIFY var=silver(1) stationarity=(ADF=(0));
 ESTIMATE p=1 ml;
 FORECAST lead=24 out=outN id=date interval=month;

Unit root forecast & forecast interval.

HOW MANY LAGGED DIFFERENCES?
Regression: (Yt - Yt-1) = b0 + b1·Yt-1 + b2(Yt-1 - Yt-2) + ... + bp(Yt-p+1 - Yt-p)
b0 and b1: nonstandard distributions (Dickey & Fuller, 1979); b2, ..., bp: standard distributions.

The lagged difference coefficients b2 ... bp have standard (asymptotically normal) distributions: trust their t test p-values in PROC REG. b0 and b1 have t statistics with the same nonstandard limit distributions as in the AR(1) model. Implication: just use PROC REG to determine the appropriate number of lagged differences.
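The "regress Yt - Yt-1 on Yt-1" recipe can be sketched with ordinary least squares; the handout does this with PROC REG and PROC ARIMA. Below is a Python illustration on hypothetical data (a deterministic quasi-noise stand-in replaces random errors so the result is reproducible). As stressed above, the resulting "t statistic" must be referred to the Dickey-Fuller table, not the t table.

```python
import math

def ols(X, y):
    """Least squares via the normal equations, solved by Gauss-Jordan
    elimination on [X'X | I | X'y]; returns (coefficients, std errors)."""
    n, p = len(X), len(X[0])
    A = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    c = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    aug = [A[i] + [1.0 if i == j else 0.0 for j in range(p)] + [c[i]] for i in range(p)]
    for i in range(p):
        piv = aug[i][i]
        aug[i] = [v / piv for v in aug[i]]
        for r in range(p):
            if r != i:
                f = aug[r][i]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[i])]
    b = [aug[i][-1] for i in range(p)]
    resid = [y[i] - sum(X[i][j] * b[j] for j in range(p)) for i in range(n)]
    s2 = sum(e * e for e in resid) / (n - p)
    se = [math.sqrt(s2 * aug[j][p + j]) for j in range(p)]
    return b, se

def adf_t(y):
    """t statistic on the lagged level in: Y(t)-Y(t-1) ~ intercept + Y(t-1)
    (the 0-lagged-differences case); significantly negative => stationary."""
    X = [[1.0, y[t - 1]] for t in range(1, len(y))]
    d = [y[t] - y[t - 1] for t in range(1, len(y))]
    b, se = ols(X, d)
    return b[1] / se[1]

# mean-reverting toy series: the statistic comes out strongly negative
y = [0.0]
for t in range(1, 300):
    y.append(0.5 * y[-1] + math.sin(3.0 * t))
```

On a mean-reverting series the statistic is far below the Dickey-Fuller 5th percentile of -2.89 quoted above; on a genuine random walk it tends to hover near zero.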
o Too few => invalid tests
o Too many => loss of power

Said & Dickey (1984, 1985) prove that the methods work even if moving average terms are present. Chang and Dickey (1993) show that the Inverse Autocorrelation Function (IACF) can be used to check for overdifferencing.

Model: Yt - μ = α(Yt-1 - μ) + et - θ·et-1  (|α|<1, |θ|<1): autoregressive and moving average parts.
"Dual" model (AR and MA parameters swapped): Yt - μ = θ(Yt-1 - μ) + et - α·et-1
Definition: the Inverse Autocorrelation Function is the autocorrelation function of the dual model.

IACF estimation: (a) fit a long autoregression, (b) move the coefficients to the moving average (MA) side, (c) calculate the ACF as if the estimated MA model were true.

Chang (1993): a moving average unit root (e.g. θ = 1) => slow decay in the IACF.

Differencing whenever you see a trend is NOT appropriate:
Yt = α + βt + et,  et ~ independent N(0, σ²)
Yt-1 = α + β(t-1) + et-1
Yt - Yt-1 = β + et - et-1  (MA coefficient 1: "non-invertible moving average")

IACF from a generated linear trend plus white noise: slow decay (plot).

Example 4: Corn yields in the U.S. (bushels per acre, 1866-1942 and 1943-2014). Analysis of post-1942 yields.

Levels data:
PROC ARIMA;
 IDENTIFY var=Yield stationarity=(adf=0);

Dickey-Fuller Unit Root Tests
Type         Lags  Tau    Pr<Tau
Zero Mean    0     0.61   0.8454
Single Mean  0     -1.23  0.6559
Trend        0     -8.65  <.0001

Autocorrelation Check of Residuals (from linear trend plus white noise)
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       4.27        6   0.6403    -0.05 ...  0.20 -0.06
12      10.82       12  0.5445    -0.02 ... -0.21 -0.08
18      15.54       18  0.6247     0.03 ...  0.04 -0.08
24      22.43       24  0.5535     0.19 ...  0.08  0.09

Suppose we difference anyway: the IACF decays slowly (plot), a sign of overdifferencing.

Example 1 revisited again: Silver series
DATA CHECK; SET SILVER;
 Lag_silver = LAG(silver);
 Diff = silver - Lag_silver;
 OUTPUT; RETAIN;
 Diff5=Diff4; Diff4=Diff3; Diff3=Diff2; Diff2=Diff1; Diff1=Diff;
PROC REG;
 MODEL Diff = Lag_silver Diff1-Diff5;
 REMOVE_2andup: TEST Diff2=0, Diff3=0, Diff4=0, Diff5=0;
run;

Variable    DF  Parameter Estimate  Standard Error  t Value  Pr>|t|
Intercept   1   124.70018           45.46720        2.74     0.0092   X
Lag_silver  1   -0.19292            0.07144         -2.70    0.0102   X
Diff1       1   0.62085             0.14427         4.30     0.0001   OK **
Diff2       1   0.06292             0.17499         0.36     0.7211   OK
Diff3       1   0.00293             0.17224         0.02     0.9865   OK
Diff4       1   0.21501             0.17392         1.24     0.2238   OK
Diff5       1   0.08100             0.16209         0.50     0.6201   OK
(X: nonstandard null distribution; OK: standard t distribution applies.)

Test REMOVE_2andup Results for Dependent Variable Diff
Source       DF  Mean Square
Numerator    4   803.50398
Denominator  39  903.76157
F Value 0.89, Pr > F 0.4796  OK

What actually happened next in the Silver series?
(1) Fit stationary (AR(2)) and nonstationary (differences ~ AR(1)) models to the data.
(2) Compute forecasts, stationary and nonstationary.

PROC AUTOREG:
 Fits a regression model (least squares).
 Fits a stationary autoregressive model to the error terms.
 Refits, accounting for the autoregressive errors.

Example 5-A: AUTOREG, Harley-Davidson closing stock prices, 2009-present.
PROC AUTOREG data=Harley;
 MODEL close=date / nlag=15 backstep;
run;

One by one, AUTOREG eliminates insignificant lags, then:
Estimates of Autoregressive Parameters
Lag  Coefficient  Standard Error  t Value
1    -0.975229    0.006566        -148.53

Final model:
Parameter Estimates
Variable   DF  Estimate   Standard Error  t Value  Approx Pr>|t|
Intercept  1   -412.1128  35.2646         -11.69   <.0001
Date       1   0.0239     0.001886        12.68    <.0001

In PROC AUTOREG the model is written Zt + a·Zt-1 = et rather than Zt - a·Zt-1 = et, so with a = -0.975229 the error term Zt satisfies Zt - 0.97·Zt-1 = et.

ARIMA: Harley-Davidson closing stock prices 01/01/2009 through 05/13/2013 (vs. AUTOREG).

Apparent upward movement: linear trend or nonstationary?
Regress Yt - Yt-1 on 1, t, Yt-1 (& lagged differences).
H0: Yt = α + Yt-1 + et  ("random walk with drift")
H1: Yt = α + βt + Zt with Zt stationary AR(p)
* New distribution for the t test on Yt-1 when the trend term is included. *

Yt = α + βt + Zt is the general model, so Yt - Yt-1 = β + Zt - Zt-1, and if Zt = Zt-1 + et (unit root) then Yt - Yt-1 = β + et, i.e. Yt = β + Yt-1 + et: a "random walk with drift β."

Null distributions without trend and with trend (histograms): 1 million simulations - runs in 7 seconds!

SAS code for the Harley stock closing price:
PROC ARIMA data=Harley;
 IDENTIFY var=close stationarity=(adf) crosscor=(date) noprint;
 ESTIMATE input=(date) p=1 ml;
 FORECAST lead=120 id=date interval=weekday out=out1;
run;

Stationarity test (0, 1, 2 lagged differences):
Augmented Dickey-Fuller Unit Root Tests
Type         Lags  Rho       Pr<Rho  Tau    Pr<Tau
Zero Mean    0     0.8437    0.8853  1.14   0.9344
Zero Mean    1     0.8351    0.8836  1.14   0.9354
Zero Mean    2     0.8097    0.8786  1.07   0.9268
Single Mean  0     -2.0518   0.7726  -0.87  0.7981
Single Mean  1     -1.7772   0.8048  -0.77  0.8278
Single Mean  2     -1.8832   0.7925  -0.78  0.8227
Trend        0     -27.1559  0.0150  -3.67  0.0248
Trend        1     -26.9233  0.0158  -3.64  0.0268
Trend        2     -29.4935  0.0089  -3.80  0.0171

Conclusion: stationary around a linear trend.
Estimates: trend + AR(1)
Maximum Likelihood Estimation
Parameter  Estimate    Standard Error  t Value  Approx Pr>|t|  Lag  Variable  Shift
MU         -412.08104  35.45718        -11.62   <.0001         0    Close     0
AR1,1      0.97528     0.0064942       150.18   <.0001         1    Close     0
NUM1       0.02391     0.0018961       12.61    <.0001         0    Date      0

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       3.20        5   0.6694    -0.005  0.044 -0.023  0.000  0.017  0.005
12      6.49        11  0.8389    -0.001  0.019  0.003 -0.010  0.049 -0.003
18      10.55       17  0.8791     0.041 -0.026 -0.022 -0.023  0.007 -0.011
24      16.00       23  0.8553     0.014 -0.037  0.041 -0.020 -0.032  0.003
30      22.36       29  0.8050     0.013 -0.026  0.028  0.051  0.036  0.000
36      24.55       35  0.9065     0.037  0.016 -0.012  0.002 -0.007  0.001
42      29.53       41  0.9088    -0.007 -0.021  0.029  0.030 -0.033  0.030
48      49.78       47  0.3632     0.027 -0.009 -0.097 -0.026 -0.074  0.026

What actually happened? (plot)

Example 6 (with inputs): NCSU Energy Demand
Type of day: class days; work days (no classes); holidays & weekends.
Temperature. Season of year.

Step 1: Make some plots of energy demand vs. temperature and season, using type of day as color.
Seasonal terms: S = A·sin(2πt/365), C = B·cos(2πt/365).

Step 2: PROC AUTOREG with all inputs:
PROC AUTOREG data=energy;
 MODEL demand = temp tempsq class work s c / nlag=15 backstep dwprob;
 OUTPUT out=out3 predicted=p predictedm=pm residual=r residualm=rm;
run;

Estimates of Autoregressive Parameters
Lag  Coefficient  Standard Error  t Value
1    -0.559658    0.043993        -12.72
5    -0.117824    0.045998        -2.56
7    -0.220105    0.053999        -4.08
8    0.188009     0.059577        3.16
9    -0.108031    0.051219        -2.11
12   0.110785     0.046068        2.40
14   -0.094713    0.045942        -2.06

Autocorrelation at lags 1, 7, 14, and others. After the autocorrelation adjustments, trust the t tests etc.
Parameter Estimates
Variable   DF  Estimate   Standard Error  t Value  Approx Pr>|t|
Intercept  1   6076       296.5261        20.49    <.0001
TEMP       1   28.1581    3.6773          7.66     <.0001
TEMPSQ     1   0.6592     0.1194          5.52     <.0001
CLASS      1   1159       117.4507        9.87     <.0001
WORK       1   2769       122.5721        22.59    <.0001
S          1   -764.0316  186.0912        -4.11    <.0001
C          1   -520.8604  188.2783        -2.77    0.0060

Need a better model? Big negative residual on Jan. 2.
(Plot: residuals rm from the regression part, 01JUL79 through 01JUL80, colored by day type: non-work, work, work-class.) Large residual on workday January 2: add a dummy variable.

Same idea: PROC ARIMA
Step 1: Graphs.
Step 2: Regress on inputs, diagnose residual autocorrelation: not white noise (bottom right panel); activity (bars) at lags 1, 7, 14.
Step 3: Estimate the resulting model from the diagnostics plus trial and error:

ESTIMATE input=(temp tempsq class work s c) p=1 q=(1,7,14) ml;

Maximum Likelihood Estimation
Parameter  Estimate    Standard Error  t Value  Approx Pr>|t|  Lag  Variable  Shift
MU         6183.1      300.87297       20.55    <.0001         0    DEMAND    0
MA1,1      0.11481     0.07251         1.58     0.1133         1    DEMAND    0
MA1,2      -0.18467    0.05415         -3.41    0.0006         7    DEMAND    0
MA1,3      -0.13326    0.05358         -2.49    0.0129         14   DEMAND    0
AR1,1      0.73980     0.05090         14.53    <.0001         1    DEMAND    0
NUM1       26.89511    3.83769         7.01     <.0001         0    TEMP      0
NUM2       0.64614     0.12143         5.32     <.0001         0    TEMPSQ    0
NUM3       912.80536   122.78189       7.43     <.0001         0    CLASS     0
NUM4       2971.6      123.94067       23.98    <.0001         0    WORK      0
NUM5       -767.41131  174.59057       -4.40    <.0001         0    S         0
NUM6       -553.13620  182.66142       -3.03    0.0025         0    C         0

Note: class days get the class effect 913 plus the work effect 2972.
Note 2: The lags are sensible.
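The S and C inputs above are a first-order Fourier pair; over one full 365-day cycle they are orthogonal to each other and to the intercept, so the fitted seasonal amplitude can be recovered by simple projections. A Python sketch on a made-up demand-like series (not the NCSU data):

```python
import math

def fit_annual_cycle(y):
    """Fit y(t) = mu + a*sin(2*pi*t/365) + b*cos(2*pi*t/365), t = 1..365.
    Over a complete year the sine and cosine regressors are (numerically)
    orthogonal to each other and to the constant, so least squares reduces
    to three separate projections."""
    n = len(y)  # intended: 365 points, one full cycle
    s = [math.sin(2 * math.pi * t / 365) for t in range(1, n + 1)]
    c = [math.cos(2 * math.pi * t / 365) for t in range(1, n + 1)]
    mu = sum(y) / n
    a = sum(yt * st for yt, st in zip(y, s)) / sum(st * st for st in s)
    b = sum(yt * ct for yt, ct in zip(y, c)) / sum(ct * ct for ct in c)
    return mu, a, b

# toy series with known seasonal structure
y = [5000 + 700 * math.sin(2 * math.pi * t / 365)
          - 500 * math.cos(2 * math.pi * t / 365) for t in range(1, 366)]
```

The fit recovers mu = 5000, a = 700, b = -500 exactly (up to rounding); the seasonal swing has amplitude sqrt(a² + b²), which is what the S and C coefficients jointly describe in the energy model.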
Step 4: Check model fit (stats look OK):
Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       2.86        2   0.2398    -0.001 -0.009 -0.053 -0.000  0.050  0.047
12      10.71       8   0.2188     0.001 -0.034  0.122  0.044 -0.039 -0.037
18      13.94       14  0.4541    -0.056  0.013 -0.031  0.048 -0.006 -0.042
24      16.47       20  0.6870    -0.023 -0.028  0.039 -0.049  0.020 -0.029
30      24.29       26  0.5593     0.006  0.050 -0.098  0.077 -0.002  0.039
36      35.09       32  0.3239    -0.029 -0.075  0.057 -0.001  0.121 -0.047
42      39.99       38  0.3817     0.002 -0.007  0.088  0.019 -0.004  0.060
48      43.35       44  0.4995    -0.043  0.043 -0.027 -0.047 -0.019 -0.032

Looking for "outliers" that can be explained: PROC ARIMA's OUTLIER statement.
Available types:
(1) Additive (single outlier)
(2) Level shift (sudden change in mean)
(3) Temporary change (level shift for k contiguous time points - you specify k)

NCSU energy: tested every point - 365 tests. Adjust for multiple testing: require p < 0.05/365 = .0001369863 (Bonferroni).

OUTLIER type=additive alpha=.0001369863 id=date;
FORMAT date weekdate.;
run;

/*****************************************
January 2, 1980, Wednesday: Hangover Day :-)
March 3, 1980, Monday: On the afternoon and evening of March 2, 1980, North Carolina experienced a major winter storm with heavy snow across the entire state and near-blizzard conditions in the eastern part of the state. Widespread snowfall totals of 12 to 18 inches were observed over Eastern North Carolina, with localized amounts ranging up to 22 inches at Morehead City and 25 inches at Elizabeth City, and unofficial reports of up to 30 inches at Emerald Isle and Cherry Point (Figure 1). This was one of the great snowstorms in Eastern North Carolina history. What made this storm so remarkable was the combination of snow, high winds, and very cold temperatures.
May 10, 1980, Saturday: Graduation!
*****************************************/;

Outlier Details
Obs  Time ID                Type      Estimate  Chi-Square  Approx Prob>ChiSq
186  Wednesday 02-JAN-1980  Additive  -3250.9   87.76       <.0001
315  Saturday  10-MAY-1980  Additive  1798.1    28.19       <.0001
247  Monday    03-MAR-1980  Additive  -1611.8   22.65       <.0001

Outliers: Jan 2 (hangover day!), March 3 (snowstorm), May 10 (graduation day). The AR(1) produces 3 'rebound' outlying next-day residuals.

Add dummy variables for the explainable outliers:
data next;
 merge outarima energy; by date;
 hangover = (date="02Jan1980"d);
 storm = (date="03Mar1980"d);
 graduation = (date="10May1980"d);

PROC ARIMA data=next;
 IDENTIFY var=demand crosscor=(temp tempsq class work s c hangover graduation storm) noprint;
 ESTIMATE input=(temp tempsq class work s c hangover graduation storm) p=1 q=(7,14) ml;
 FORECAST lead=0 out=outARIMA2 id=date interval=day;
run;

Maximum Likelihood Estimation
Parameter  Estimate    Standard Error  t Value  Approx Pr>|t|  Lag  Variable    Shift
MU         6127.4      259.43918       23.62    <.0001         0    DEMAND      0
MA1,1      -0.25704    0.05444         -4.72    <.0001         7    DEMAND      0
MA1,2      -0.10821    0.05420         -2.00    0.0459         14   DEMAND      0
AR1,1      0.76271     0.03535         21.57    <.0001         1    DEMAND      0
NUM1       27.89783    3.15904         8.83     <.0001         0    TEMP        0
NUM2       0.54698     0.10056         5.44     <.0001         0    TEMPSQ      0
NUM3       626.08113   104.48069       5.99     <.0001         0    CLASS       0
NUM4       3258.1      105.73971       30.81    <.0001         0    WORK        0
NUM5       -757.90108  181.28967       -4.18    <.0001         0    S           0
NUM6       -506.31892  184.50221       -2.74    0.0061         0    C           0
NUM7       -3473.8     334.16645       -10.40   <.0001         0    hangover    0
NUM8       2007.1      331.77424       6.05     <.0001         0    graduation  0
NUM9       -1702.8     333.79141       -5.10    <.0001         0    storm       0

Model looks fine.
Comparison:
Workday = Non-workday + 3258
Jan 2 = Workday - 3473 = Non-workday + 3258 - 3473 => Jan 2 is like a non-workday.
Class day = Non-workday + 3258 + 626

AUTOREG: regression with AR(p) errors. ARIMA: regression with differencing and ARMA(p,q) errors.

SEASONALITY
Many economic and environmental series show seasonality, either
(1) very regular ("deterministic") or
(2) slowly changing ("stochastic").

Example 7: NC accident reports involving deer.

Method 1: regression
PROC REG data=deer;
 MODEL deer=date X11;
run;
(X11: 1 in November, 0 otherwise)

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr>|t|
Intercept  1   1181.09091          78.26421        15.09    <.0001
X11        1   2578.50909          271.11519       9.51     <.0001

Looks like December and October need dummies too!

PROC REG data=deer;
 MODEL deer=date X10 X11 X12;
run;

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr>|t|
Intercept  1   929.40000           39.13997        23.75    <.0001
X10        1   1391.20000          123.77145       11.24    <.0001
X11        1   2830.20000          123.77145       22.87    <.0001
X12        1   1377.40000          123.77145       11.13    <.0001

The average of January through September is 929 crashes per month. Add 1391 in October, 2830 in November, 1377 in December.

Try dummies for all but one month (we need an "average of the rest," so we must leave out at least one):
PROC REG data=deer;
 MODEL deer=X1-X11;
 OUTPUT out=out1 predicted=p residual=r;
run;

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr>|t|
Intercept  1   2306.80000          81.42548        28.33    <.0001
X1         1   -885.80000          115.15301       -7.69    <.0001
X2         1   -1181.40000         115.15301       -10.26   <.0001
X3         1   -1220.20000         115.15301       -10.60   <.0001
X4         1   -1486.80000         115.15301       -12.91   <.0001
X5         1   -1526.80000         115.15301       -13.26   <.0001
X6         1   -1433.00000         115.15301       -12.44   <.0001
X7         1   -1559.20000         115.15301       -13.54   <.0001
X8         1   -1646.20000         115.15301       -14.30   <.0001
X9         1   -1457.20000         115.15301       -12.65   <.0001
X10        1   13.80000            115.15301       0.12     0.9051
X11        1   1452.80000          115.15301       12.62    <.0001

The "average of the rest" is just the December mean, 2307. Subtract 886 in January, add 1452 in November.
October (X10) is not significantly different from December.

Residuals for the deer crash data: looks like a trend - add a trend (date):
PROC REG data=deer;
 MODEL deer=date X1-X11;
 OUTPUT out=out1 predicted=p residual=r;
run;

Variable   DF  Parameter Estimate  Standard Error  t Value  Pr>|t|
Intercept  1   -1439.94000         547.36656       -2.63    0.0115
X1         1   -811.13686          82.83115        -9.79    <.0001
X2         1   -1113.66253         82.70543        -13.47   <.0001
X3         1   -1158.76265         82.60154        -14.03   <.0001
X4         1   -1432.28832         82.49890        -17.36   <.0001
X5         1   -1478.99057         82.41114        -17.95   <.0001
X6         1   -1392.11624         82.33246        -16.91   <.0001
X7         1   -1525.01849         82.26796        -18.54   <.0001
X8         1   -1618.94416         82.21337        -19.69   <.0001
X9         1   -1436.86982         82.17106        -17.49   <.0001
X10        1   27.42792            82.14183        0.33     0.7399
X11        1   1459.50226          82.12374        17.77    <.0001
date       1   0.22341             0.03245         6.88     <.0001

The trend is 0.22 more accidents per day (about 1 per 5 days) and is significantly different from 0. What about autocorrelation?

Method 2: PROC AUTOREG
PROC AUTOREG data=deer;
 MODEL deer=date X1-X11 / nlag=13 backstep;
run;

Backward Elimination of Autoregressive Terms
Lag  Estimate   t Value  Pr>|t|
6    -0.003105  -0.02    0.9878
11   0.023583   0.12     0.9029
4    -0.032219  -0.17    0.8641
9    -0.074854  -0.42    0.6796
5    0.064228   0.44     0.6610
13   -0.081846  -0.54    0.5955
12   0.076075   0.56     0.5763
8    -0.117946  -0.81    0.4205
10   -0.127661  -0.95    0.3489
7    0.153680   1.18     0.2458
2    0.254137   1.57     0.1228
3    -0.178895  -1.37    0.1781

Preliminary MSE 10421.3

Estimates of Autoregressive Parameters
Lag  Coefficient  Standard Error  t Value
1    -0.459187    0.130979        -3.51

Parameter Estimates
Variable   DF  Estimate   Standard Error  t Value  Approx Pr>|t|
Intercept  1   -1631      857.3872        -1.90    0.0634
date       1   0.2346     0.0512          4.58     <.0001
X1         1   -789.7592  64.3967         -12.26   <.0001
X2         1   -1100      74.9041         -14.68   <.0001
X3         1   -1149      79.0160         -14.54   <.0001
X4         1   -1424      80.6705         -17.65   <.0001
X5         1   -1472      81.2707         -18.11   <.0001
X6         1   -1386      81.3255         -17.04   <.0001
X7         1   -1519      80.9631         -18.76   <.0001
X8         1   -1614      79.9970         -20.17   <.0001
X9         1   -1432      77.8118         -18.40   <.0001
X10        1   31.3894    72.8112         0.43     0.6684
X11        1   1462       60.4124         24.20    <.0001

Method 3: PROC ARIMA
PROC ARIMA plots=(forecast(forecast));
 IDENTIFY var=deer crosscor=(date X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11);
 ESTIMATE p=1 ML input=(date X1 X2 X3 X4 X5 X6 X7 X8 X9 X10 X11);
 FORECAST lead=12 id=date interval=month;
run;

Maximum Likelihood Estimation
Parameter  Estimate    Standard Error  t Value  Approx Pr>|t|  Lag  Variable
MU         -1640.9     877.41683       -1.87    0.0615         0    deer
AR1,1      0.47212     0.13238         3.57     0.0004         1    deer
NUM1       0.23514     0.05244         4.48     <.0001         0    date
NUM2       -789.10728  64.25814        -12.28   <.0001         0    X1
NUM3       -1099.3     74.93984        -14.67   <.0001         0    X2
NUM4       -1148.2     79.17135        -14.50   <.0001         0    X3
NUM5       -1423.7     80.90397        -17.60   <.0001         0    X4
NUM6       -1471.6     81.54553        -18.05   <.0001         0    X5
NUM7       -1385.4     81.60464        -16.98   <.0001         0    X6
NUM8       -1518.9     81.20724        -18.70   <.0001         0    X7
NUM9       -1613.4     80.15788        -20.13   <.0001         0    X8
NUM10      -1432.0     77.82871        -18.40   <.0001         0    X9
NUM11      31.46310    72.61873        0.43     0.6648         0    X10
NUM12      1462.1      59.99732        24.37    <.0001         0    X11

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       5.57        5   0.3504     0.042 -0.175  0.146  0.178  0.001 -0.009
12      10.41       11  0.4938    -0.157 -0.017  0.102  0.115 -0.055 -0.120
18      22.18       17  0.1778     0.158  0.147 -0.183 -0.160  0.189 -0.008
24      32.55       23  0.0893    -0.133  0.013 -0.095  0.005  0.101 -0.257

Autoregressive Factors
Factor 1: 1 - 0.47212 B**(1)

Method 4: Differencing
Compute and model Dt = Yt - Yt-12.
 Removes seasonality.
 Removes linear trend.
 Use (at least) q=(12): et - θ·et-12
 (A) if θ is near 1, you've overdifferenced
 (B) if 0 < θ < 1, this is the seasonal exponential smoothing model.
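The leave-one-out dummy parameterization in Method 1 has a simple interpretation: with no trend, the intercept is the omitted month's mean and each dummy coefficient is that month's mean minus the omitted month's mean. A Python sketch on toy counts (hypothetical numbers, not the NC deer data):

```python
def month_dummy_coefs(counts, months, base=12):
    """No-trend seasonal dummy model with month `base` omitted:
    intercept = mean of the omitted month; coefficient for month m =
    mean(m) - mean(base). counts[i] is the count observed in months[i]."""
    by_month = {}
    for y, m in zip(counts, months):
        by_month.setdefault(m, []).append(y)
    means = {m: sum(v) / len(v) for m, v in by_month.items()}
    intercept = means[base]
    return intercept, {m: means[m] - intercept for m in means if m != base}

# two years of toy monthly counts: quiet months 900, November 3700, December 2300
counts = ([900.0] * 10 + [3700.0, 2300.0]) * 2
months = list(range(1, 13)) * 2
```

Here the intercept is 2300 (the December mean) with +1400 for November and -1400 for the quiet months, mirroring how the deer-table intercept 2307 is read as the December mean.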
Seasonal exponential smoothing. Start from
Yt = Yt-12 + et - θ·et-12
and substitute et-12 = Yt-12 - Yt-24 + θ·et-24 repeatedly:
Yt = Yt-12 + et - θ(Yt-12 - Yt-24 + θ·et-24)
Yt = (1-θ)[Yt-12 + θ·Yt-24 + θ²·Yt-36 + θ³·Yt-48 + ...] + et

The forecast is a weighted (exponentially smoothed) average of past values:
Ŷt = (1-θ)[Yt-12 + θ·Yt-24 + θ²·Yt-36 + θ³·Yt-48 + ...]

IDENTIFY var=deer(12) nlag=25;
ESTIMATE p=1 q=(12) ml;
run;

Maximum Likelihood Estimation
Parameter  Estimate  Standard Error  t Value  Approx Pr>|t|  Lag
MU         85.73868  16.78380        5.11     <.0001         0
MA1,1      0.89728   0.94619         0.95     0.3430         12
AR1,1      0.46842   0.11771         3.98     <.0001         1

Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq
6       4.31        4   0.3660
12      7.47        10  0.6801
18      18.02       16  0.3226
24      24.23       22  0.3355
(individual residual autocorrelations all small)

The lag-12 MA estimate 0.897 is somewhat close to 1 with a large standard error: the model is OK but not the best. Variance estimate 15,122 (vs. 13,431 for the dummy variable model). Forecasts are similar 2 years out.

OPTIONAL (time permitting): Trend Breaks - accounting for changes in trend.

Example 8: Nenana Ice Classic data (trend break).
Exact time (day and time) of thaw of the Tanana River at Nenana, Alaska:
1917  Apr 30  11:30 a.m.
1918  May 11  9:33 a.m.
1919  May 3   2:33 p.m.
(more data)
2010  Apr 29  6:22 p.m.
2011  May 04  4:24 p.m.
2012  Apr 23  7:39 p.m.

When the tripod moves downstream, that is the unofficial start of spring.

Get a "ramp" with PROC NLIN: X = ____/ (0 up to the break year, then 1, 2, 3, ... after it: 00000000000000012345678...)

PROC NLIN data=all;
 PARMS point=1960 int=126 slope=-.2;
 X = (year-point)*(year>point);
 MODEL break = int + slope*X;
 OUTPUT out=out2 predicted=p residual=r;

Parameter  Estimate  Approx Std Error  Approximate 95% Confidence Limits
point      1965.4    11.2570           1943.0   1987.7
int        126.0     0.7861            124.5    127.6
slope      -0.1593   0.0592            -0.2769  -0.0418

PROC SGPLOT data=out2;
 SERIES Y=break X=year;
 SERIES Y=p X=year / lineattrs=(color=red thickness=2);
 REFLINE 1965.4 / axis=X;
run; quit;

What about autocorrelation?
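The exponentially smoothed forecast derived above can be written out directly; a Python sketch (an illustration, not the deer fit):

```python
def seasonal_smooth_forecast(y, theta, s=12):
    """Forecast from the seasonal model Y(t) = Y(t-s) + e(t) - theta*e(t-s):
    (1 - theta) * [ Y(t-s) + theta*Y(t-2s) + theta^2*Y(t-3s) + ... ],
    an exponentially weighted average of the same month in earlier years."""
    total, w, k = 0.0, 1.0, 1
    while k * s <= len(y):
        total += (1 - theta) * w * y[-k * s]
        w *= theta
        k += 1
    return total
```

The weights (1-θ), (1-θ)θ, (1-θ)θ², ... sum toward 1, so if the same month is always 100 the forecast approaches 100 as history accumulates. With θ near 1 the weights flatten out and old years matter almost as much as recent ones, which is why an estimate like 0.897 above points toward a nearly deterministic seasonal pattern.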
Final ramp: Xt = (year-1965.4)*(year>1965);
PROC ARIMA;
 IDENTIFY var=break crosscor=(ramp) noprint;
 ESTIMATE input=(ramp);
 FORECAST lead=5 id=year out=out1;
run;

PROC ARIMA generates diagnostic plots:
Autocorrelation Check of Residuals
To Lag  Chi-Square  DF  Pr>ChiSq  Autocorrelations
6       6.93        6   0.3275    -0.025  0.067 -0.027 -0.032 -0.152 -0.192
12      14.28       12  0.2834     0.086 -0.104 -0.074  0.184 -0.041 -0.091
18      20.30       18  0.3164    -0.086  0.111  0.041 -0.073  0.142 -0.066
24      21.37       24  0.6166    -0.015 -0.020 -0.036 -0.059 -0.047 -0.030

Optional: more on seasonal models, time permitting.
Example 9: Visitors to St. Petersburg/Clearwater
http://www.pinellascvb.com/statistics/2013-04-VisitorProfile.pdf

Model 1: Seasonal dummy variables + trend + AR(p) (REG: R² > 97%)
PROC REG data=aaem.stpete;
 MODEL visitors=t m1-m11;
 OUTPUT out=out4 predicted=P residual=R UCL=u95 LCL=l95;
run;
PROC SGPLOT data=out3;
 BAND lower=l95 upper=u95 X=date;
 SERIES Y=P X=date;
 SCATTER Y=visitors X=date / datalabel=month datalabelattrs=(color=red size=0.3 cm);
 SERIES Y=U95 X=date / lineattrs=(color=red thickness=0.8);
 SERIES Y=L95 X=date / lineattrs=(color=red thickness=0.8);
 REFLINE "01apr2013"d / axis=x;
 where 2011<year(date)<2015;
run;

PROC SGPLOT data=out4;
 NEEDLE Y=r X=date;
run;

Definitely autocorrelated. Slowly changing mean? Try a seasonal span-difference model:
PROC ARIMA data=StPete plots=forecast(forecast);
 IDENTIFY var=visitors(12);

A typical ACF for ARMA(1,1) has an initial dropoff followed by exponential decay - try ARMA(1,1) on the span-12 differences.
ESTIMATE P=1 Q=1 ml;
FORECAST lead=44 id=date interval=month out=outARIMA;
run;

Parameter   Estimate    Standard Error   t Value   Approx Pr > |t|   Lag
MU          0.0058962   0.0029586          1.99    0.0463              0
MA1,1       0.32635     0.11374            2.87    0.0041              1
AR1,1       0.79326     0.07210           11.00    <.0001              1

Autocorrelation Check of Residuals
To Lag   Chi-Square   DF   Pr > ChiSq   ----------------Autocorrelations----------------
   6        1.81       4     0.7699     -0.022 -0.055 -0.000  0.011 -0.028 -0.071
  12       14.87      10     0.1367      0.109  0.044  0.003  0.060  0.211 -0.064
  18       16.41      16     0.4248      0.010 -0.012  0.036  0.014 -0.066  0.038
  24       17.30      22     0.7468     -0.056  0.020  0.004  0.018 -0.009 -0.017
  30       34.15      28     0.1961     -0.131 -0.052 -0.144  0.134  0.026 -0.133
  36       45.08      34     0.0969     -0.102  0.051  0.135 -0.107  0.014 -0.074

70

PROC SGPLOT data=outARIMA;
SERIES X=date Y=residual;
run;

71

We have seen deterministic (dummy variable) and dynamic (seasonal difference) models for the St. Pete visitors. How do we decide between them? Seasonal unit root test!

Optional: Seasonal Multiplicative Model. Accommodates slowly changing seasonal patterns.
Contrast - indicator (dummy) variables assume the same effect for every January (etc.) no matter what year.

Simplest model – seasonal random walk:

Yt = Yt-s + et    Yt − Yt-s = et    (1 − B^s)Yt = et

Seasonal AR: Yt = ρ Yt-s + et

72

Yt − Yt-s = (ρ − 1)Yt-s + et

Idea: Regress Yt − Yt-s on Yt-s to estimate (ρ − 1) and get a t test. Distribution of t is far from standard normal. Modification: replace Yt with deviations yt from seasonal means.

Visitors data: Estimate of ρ: 1 − 0.1556 = 0.8444

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr>|t|
Intercept    1   5772.30818           1099.20049         5.25    <.0001
Y_12         1     -0.15560              0.03191        -4.88    <.0001 XX

Is t significant? What is the (null) distribution?

73

Empirical (left) and N(0,1) (right) densities with means. Area to the left of the leftmost line (t = −4.50, see next model) is 0.4061. Critical value for 5% is −5.84, median is −4.27 (Dickey, Hasza & Fuller, 1984, Table 7).

DHF conclusion: insufficient evidence against the seasonal dynamic model (seasonal unit roots).
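The simple (no extra lags) version of the test can be sketched in Python. This fragment simulates a seasonal random walk (so the null is true) rather than using the visitor data, and `dhf_t` is a hypothetical helper: it removes seasonal means, then regresses Yt − Yt-12 on Yt-12 with no intercept to get the t statistic.

```python
import numpy as np

def dhf_t(y, s=12):
    """t statistic from regressing (y_t - y_{t-s}) on y_{t-s}, after replacing
    y with deviations from its seasonal means (no-intercept regression)."""
    dev = np.asarray(y, float).copy()
    for j in range(s):
        dev[j::s] -= dev[j::s].mean()      # subtract each season's mean
    d = dev[s:] - dev[:-s]                 # dependent variable: span-s difference
    x = dev[:-s]                           # regressor: lagged level
    b = np.sum(x * d) / np.sum(x * x)      # estimate of (rho - 1)
    resid = d - b * x
    s2 = np.sum(resid ** 2) / (len(d) - 1)
    return b / np.sqrt(s2 / np.sum(x * x))

# Simulate under the null hypothesis: a seasonal random walk (rho = 1)
rng = np.random.default_rng(7)
n = 480
e = rng.normal(size=n)
y = np.zeros(n)
for t in range(n):
    y[t] = (y[t - 12] if t >= 12 else 0.0) + e[t]

t_stat = dhf_t(y, s=12)    # typically near the DHF median, around -4
```

The point of the DHF table is that this t statistic should be compared with −5.84 (5% point) and −4.27 (median), not with N(0,1) percentiles.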
What if it’s seasonal with more lags? The multiplicative seasonal model is popular.

74

(1 − ρB^s)(1 − α1B − α2B² − … − αpB^p)(Yt − μs) = et,   μs = seasonal mean

Step 1: Regress Dt = Yt − Yt-s on Dt-1, Dt-2, …, Dt-p to estimate the α’s. Residuals estimate et. Using these estimates, compute the filtered lag Ft-s = (1 − α1B − α2B² − … − αpB^p)(Yt-s − μs) (note the t−s subscript).

Step 2: Regress Dt on Ft-s, Dt-1, Dt-2, …, Dt-p to get the test statistic and improved α estimates.

For the visitor data with p=2:

Parameter Estimates
Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1   -137.31542           978.60547         -0.14    0.8886
Filter       1     -0.18020             0.04007         -4.50    <.0001
D1           1      0.01129             0.07204          0.16    0.8756
D2           1      0.00207             0.07218          0.03    0.9771

75

From the normal curve graph above, the p-value is about 0.4061, not <.0001.

Optional (time permitting) Cointegration

Two unit root processes Xt and Yt (nonstationary) are said to be cointegrated if there is a linear combination St = aXt + bYt such that St is stationary. Usually a and b are unknown, but sometimes the situation suggests values; in that case all that is needed to show cointegration is a unit root test on St.

Cointegration analogy: drunk man walking a dog
Man -> unit root process
Dog -> unit root process
Distance from man to dog -> stationary!
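The analogy is easy to simulate. In this hypothetical Python sketch the “man” is a random walk and the “dog” stays a stationary distance from him, so each level series wanders while their difference does not (here a and b are known by construction):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000
man = np.cumsum(rng.normal(size=n))   # unit root process (random walk)
dog = man + rng.normal(size=n)        # also a unit root process: wanders with the man
spread = dog - man                    # S_t with (a, b) = (1, -1): stationary

level_sd = float(np.std(man))         # large: the level wanders over the sample
spread_sd = float(np.std(spread))     # near 1: the spread is mean reverting
```

With real data, a and b are either suggested by the problem (as with the log rate spread in the T-bill example that follows) or must be estimated before testing St for a unit root.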
76

Red path = man. Blue path = dog.

Example 10: T-bill rates for two maturities
Data: 10 year T-bill yields and 30 year T-bill yields (in logarithms)

77

Unit root tests show:
(1) Both series are nonstationary (can’t reject unit roots using trend tests)
(2) Neither requires more than 1 difference

Let St = log(30 year) − log(10 year) = log(30 year rate / 10 year rate)

78

Graph of St. Unit root test on St:

Augmented Dickey-Fuller Unit Root Tests
Type          Lags   Rho        Pr < Rho   Tau     Pr < Tau
Zero Mean      0       0.0028   0.6832      0.01   0.6844
               1       0.0243   0.6882      0.05   0.6991
               2       0.0324   0.6900      0.07   0.7060
               3       0.0204   0.6872      0.04   0.6970
Single Mean    0     -27.6337   0.0017     -3.69   0.0048
               1     -24.9002   0.0033     -3.42   0.0113
               2     -22.8252   0.0054     -3.19   0.0216
               3     -26.3394   0.0024     -3.36   0.0136
Trend          0     -28.5986   0.0100     -3.71   0.0227
               1     -25.8595   0.0186     -3.44   0.0486
               2     -23.7128   0.0298     -3.20   0.0866
               3     -27.6531   0.0124     -3.37   0.0574

79

Final model: log(30 yr. rate / 10 yr. rate) is stationary with estimated mean 0.2805 and autoregressive order 1 structure with α = 0.92.

Summary:
Use ACF, PACF to identify p = # autoregressive lags and q = # moving average lags.
Stationarity – mean-reverting models versus unit roots (random walk type models). Graphics and DF test (and others) available.
Diagnostics – errors should be white noise – Ljung-Box test to check.
Regression with autoregressive or ARMA errors.
Nonlinear regression – to estimate slope changes (least squares).
Seasonal models – dynamic or deterministic.

80

References for unit roots:

Chang, M. C. and D. A. Dickey (1993). “Recognizing Overdifferenced Time Series,” Journal of Time Series Analysis, 15, 1-8.

Dickey, D. A. and W. A. Fuller (1979). “Distribution of the Estimators for Autoregressive Time Series with a Unit Root,” Journal of the American Statistical Association, 74, 427-431.

Dickey, D. A. and W. A. Fuller (1981). “Likelihood Ratio Statistics for Autoregressive Time Series with a Unit Root,” Econometrica, 49, 1057-1072.

Dickey, D. A., D. P. Hasza, and W. A. Fuller (1984).
“Testing for Unit Roots in Seasonal Time Series,” Journal of the American Statistical Association, 79, 355-367.

Said, S. E. and D. A. Dickey (1984). “Testing for Unit Roots in Autoregressive-Moving Average Models of Unknown Order,” Biometrika, 71, 599-607.

Said, S. E. and D. A. Dickey (1985). “Hypothesis Testing in ARIMA (p, 1, q) Models,” Journal of the American Statistical Association, 80, 369-374.

Dickey, D. A., W. R. Bell and R. B. Miller (1986). “Unit Roots in Time Series Models: Tests and Implications,” American Statistician, 40, 12-26.

81