International Journal of Modern Mathematical Sciences, 2014, 11(1): 40-48
International Journal of Modern Mathematical Sciences
ISSN: 2166-286X
Florida, USA
Journal homepage: www.ModernScientificPress.com/Journals/ijmms.aspx

Article

ARIMA Methods of Detecting Outliers in Time Series Periodic Processes

T. A. Lasisi 1 and D. K. Shangodoyin 2,*

1 Department of Mathematics and Statistics, The Polytechnics Ibadan, Nigeria
2 Department of Statistics, University of Botswana, Botswana
* Author to whom correspondence should be addressed. E-mail: [email protected]

Article history: Received 4 March 2014, Received in revised form 27 May 2014, Accepted 24 June 2014, Published 28 July 2014.

Abstract: We utilized the periodic likelihood ratio test statistic to assess the evidence that any estimated outlier for a given period is an outlier. The conditions are that the timing $t = T$ of the occurrence of an outlier is known or assumed and that the magnitude of the outlier has been estimated or specified. In the February period, the AO and TC outlier models each confirmed as significant 30% of the weights of the outliers injected into the series, whereas the LS model confirmed 40%, showing greater power in capturing the time points of outliers. In the October period, only the IO outlier model confirmed as significant 20% of the weights of the outliers injected into the series. Our findings reveal that the AO, LS and TC models perform better in terms of their ability to capture the timing of outliers.

Keywords: Likelihood ratio test statistic, ARMA models, Periodic processes

2000 Mathematical Subject Classification: C15, C22, C52

Copyright © 2014 by Modern Scientific Press Company, Florida, USA

1. Introduction

In practice, outliers may need to be detected when they are influential in model fitting. Fox (1972) appears to have been the first to address the detection of outliers in time series, assuming an autoregressive process on the outlier-free series with Gaussian noise. The approach is a likelihood ratio criterion for testing the existence of additive and innovative outliers under the condition that serial correlation exists, a good condition for analyzing periodic series. A number of research works have addressed both least squares and maximum likelihood methods of detecting outliers assuming known process models (Bianco et al. 2001, Tsay 1996). Bruce and Martin (1989), commenting on Cook’s distance statistic (Cook, 1977), observe that the test statistic based on the influence of the i-th observation on the parameters of the regression model is well established, but that the dependency relations which exist in time series give rise to a “smearing” effect when the test statistic for a model coefficient is calculated; hence the need for a test statistic specified for time series ARMA models. The identification of influential observations in the context of ARIMA models has been developed (Chang and Tiao 1983, Pena 1984, Tsay 1986 and 1996).

In this paper, we utilize the statistic derived in Lasisi et al. (2013), Section 3.1 (3.1.1-3.1.4), p. 92, to assess the evidence that any estimated outlier for a given period is an outlier. The conditions are that the timing $t = T$ of the occurrence of an outlier is known or assumed and that the magnitude of the outlier has been estimated or specified. Consideration is given to ARIMA models on periodic processes using the likelihood ratio test statistic developed by Tsay (1986), which is extended in this study to level shift and transitory change outliers for periodic series.

2. Outlier Detection Test Statistic

If we assume a single outlier model:

$$\pi^{-1}(B^v)\, Z_{t(r,m)} = \pi^{-1}(B^v)\, Y_{t(r,m)} + \omega(B^v)\, D^{(T)}_{t(r,m)}\, \varepsilon_t \qquad (1)$$

where $\omega(B^v)$ is the weight attached to the magnitude of outliers.
Suppose that the ARIMA models

$$\phi(B^v : I)\, Y_{t(r,m)} = \theta(B^v : I)\, e_{t(r,m)} \quad \text{and} \quad \phi(B^v : I)\, Z_{t(r,m)} = \theta(B^v : I)\, e_{t(r,m)} \qquad (2)$$

where $\pi(B^v) = \phi(B^v)\,\theta^{-1}(B^v)$, are fitted on the outlier-free series and the outlier-infested series respectively. $\varepsilon_{t(r,m)}$ is a sequence of independent Gaussian variates with mean zero and variance one. In model (1), it is assumed that all the zeros of $\phi^{(m)}(B^v : I)$ and $\theta^{(m)}(B^v : I)$ are on or outside the unit circle and that $\phi^{(m)}(B^v : I)$ and $\theta^{(m)}(B^v : I)$ have no common factors; if some of the zeros of $\phi^{(m)}(B^v : I)$ are on the unit circle, then it is assumed that the process starts at a fixed time point with known or given starting values. The white noise process $e_t$ has mean zero and variance $\sigma_e^2$. Applying least squares theory, we have the following results:

(i) ADDITIVE OUTLIER CASE: The estimated magnitude of the outlier $\hat D_{T(r,m)}$ is $P_{T(r,m)} = \pi(B)\, e^{(1)}_{T(r,m)}$ with variance $\pi^{-2}(B^v)\, \sigma^2_{e_{T(r,m)}}$.

(ii) INNOVATIONAL OUTLIER CASE: The estimate of the outlier magnitude is $P_{T(r,m)} = e^{(2)}_{t(r,m)}$ with variance $\sigma^2_{e_{T(r,m)}}$.

(iii) LEVEL SHIFT CASE: Given that the weight is $(1 - B^v)^{-1}$, the estimated magnitude of the outlier is $(1 - B^v)^{-1}\, \pi(B^v)\, e_{T(r,m)}$ with variance $(1 - B^v)^{-2}\, \pi^{-2}(B^v)\, \sigma^2_{e_{t(r,m)}}$.

(iv) TRANSITORY CHANGE CASE: The weight coefficient of the outlier is $(1 - \delta^v B^v)^{-1}$, the estimate of the outlier is $(1 - \delta^v B^v)^{-1}\, \pi(B)\, e_{t(r,m)}$ and its variance is $(1 - \delta^v B^v)^{-2}\, \pi^{-2}(B)\, \sigma^2_{e_{t(r,m)}}$.

Based on the results in (i)-(iv), we may construct the test statistics for testing the existence of an outlier at time point $t = T$. The null hypothesis is that there is no outlier at time point $t = T$; under the assumption that the time series parameters and the time of occurrence of the outliers are known, the following test statistics are distributed as $N(0, \sigma^2)$. Chang and Tiao (1983) suggested the critical values 3.0, 3.5 and 4.0.
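For orientation, the AO and IO results in (i) and (ii) have well-known non-periodic counterparts in the literature cited above (Chang and Tiao 1983, Tsay 1986). The sketch below computes those classical statistics, not the periodic statistics of this paper. It assumes residuals from an already fitted model, weights written as $\pi(B) = 1 - \pi_1 B - \pi_2 B^2 - \dots$, and a known residual standard deviation; the function name `ao_io_statistics` and the AR(1) example values are illustrative assumptions.

```python
import numpy as np

def ao_io_statistics(resid, pi, T, sigma):
    """Classical non-periodic AO and IO statistics at 0-based index T.

    resid : residuals e_t of the fitted model
    pi    : pi[j] = pi_j in pi(B) = 1 - pi_1 B - pi_2 B^2 - ... (pi[0] is ignored)
    sigma : residual standard deviation (assumed known here)
    """
    resid = np.asarray(resid, dtype=float)
    pi = np.asarray(pi, dtype=float)
    k = len(resid) - T                  # residuals available from T onward
    coeffs = np.zeros(k)                # coefficients of pi(F) = 1 - pi_1 F - ...
    coeffs[0] = 1.0
    m = min(k, len(pi))
    coeffs[1:m] = -pi[1:m]
    rho2 = 1.0 / np.sum(coeffs ** 2)    # rho^2 = 1 / (1 + pi_1^2 + pi_2^2 + ...)
    omega_ao = rho2 * np.dot(coeffs, resid[T:])   # AO magnitude estimate
    omega_io = resid[T]                           # IO magnitude estimate
    return {
        "omega_ao": omega_ao,
        "lambda_ao": omega_ao / (np.sqrt(rho2) * sigma),
        "omega_io": omega_io,
        "lambda_io": omega_io / sigma,
    }

# Hypothetical AR(1) example with phi = 0.5: a noise-free additive outlier of
# size 10 at T = 4 leaves residuals 10 at T and -0.5 * 10 at T + 1.
resid = np.zeros(10)
resid[4], resid[5] = 10.0, -5.0
out = ao_io_statistics(resid, pi=[1.0, 0.5], T=4, sigma=1.0)
print(out)
```

In this noise-free example the AO estimate recovers the injected magnitude, while the IO estimate is simply the residual at T. A periodic version would apply the same idea season by season with $B^v$ in place of $B$.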
In practical time series analysis, however, we suggest the use of $P(\lambda_{(i),T} > c) = \alpha$ for a specified value of $\alpha$ under the normality assumption (Tsay, 1988). Assume that $\lambda_{(\cdot)} \sim N(0,1)$ and suppose that the ARIMA$(p, d, v, q)$ model parameters and the timing and magnitude of the outliers are known; then, with $\pi(B^v)$ as defined in (2), the test statistic for each scenario is as follows:

(i) Existence of AO:

$$\lambda_{A,T} = \frac{\hat D_{T(r,m)}}{\pi^{-1}(B^v)\, \hat\sigma_{e_{t(r,m)}}} \qquad (3)$$

(ii) Existence of IO:

$$\lambda_{I,T} = \frac{\hat D_{T(r,m)}}{\hat\sigma_{e_{t(r,m)}}} \qquad (4)$$

(iii) Existence of LS:

$$\lambda_{L,T} = \frac{\hat D_{T(r,m)}}{(1 - B^v)^{-1}\, \pi^{-1}(B^v)\, \hat\sigma_{e_{t(r,m)}}} \qquad (5)$$

(iv) Existence of TC:

$$\lambda_{C,T} = \frac{\hat D_{T(r,m)}}{(1 - \delta^v B^v)^{-1}\, \pi^{-1}(B^v)\, \hat\sigma_{e_{t(r,m)}}} \qquad (6)$$

2.1. Outliers Detection Algorithms (ODA)

The criteria proposed in (i) through (iv) are useful for detecting outliers in periodic processes. The outlier magnitudes are estimated with the Outlier Detection Algorithm (ODA) in Figure 1, starting from a particular time point T; outlier-free points are assumed to have zero magnitude, and the criteria are used to check whether this holds. The result in this section is used only in the intermediate steps of the outlier detection procedure. The final estimates of the outliers come from the model incorporating all the outliers, in which all parameters of the ARIMA$(p, d, v, q)$ model are estimated. The following flow chart demonstrates how automatic outlier detection works. Let $\hat D_t$ be the magnitude of the outliers and $t = T$ the time at which an outlier occurs.

Figure 1: Outlier Detection Algorithm

3. Empirical Illustration

The computation follows the algorithm below for detecting outliers using the methodology described above. The Outlier Detection Algorithm (ODA) follows these steps:

(i) Read the periodic observations $Y_t$, $t = 1, \ldots, m$.
(ii) Estimate the parameters of the ARMA process using the SPSS or SYSTAT program.
(iii) Obtain $\pi(B) = \phi(B)\, \theta^{-1}(B)$ from the estimated ARMA processes in step (ii).
(iv) Read the values of $\hat D_T$ to compute $\hat\sigma^2_{e_T}$.
(v) Read the critical values $C$ as 1.645 (10%) and 1.96 (5%).
The program for computing the existence of outliers using the $\lambda$'s is:
(vi) Do:
    Calculate $\hat\sigma^2_{e_T}$ from the values of $\hat D_T$.
    Calculate $\lambda_{i,T}$, $i$ = AO, IO, LS and TC, using the expression $\hat D^{\,i}_{T(r,m)} / \sqrt{\operatorname{Var}(e_t)}$.
    If $\lambda_{i,T} > C$, display $\lambda_{i,T}$; otherwise report no outlier and stop.
    If $\lambda_{AO} > \lambda_{IO}, \lambda_{LS}, \lambda_{TC}$, then display $\lambda_{AO}$; otherwise recheck using a different $\lambda_{i,T}$.
(vii) End Do.
(viii) Go and read new periodic observations from the file and perform the algorithm again.

We have estimated the time series ARMA parameters from the data collected on Maun Airport precipitation and concentrated on three outlier-free periods. The parameters shown in Table 1 are used for estimating $\pi(B^v)$ and $\sigma^2_{e_t}$ in equation (2) for all the outlier models considered. Under the null hypothesis that there is no outlier, the statistics $\lambda_{A,T}$, $\lambda_{I,T}$, $\lambda_{L,T}$ and $\lambda_{C,T}$ have the standard normal distribution, which makes the statistics $\lambda_{(\cdot),T}$ readily useful in practical modeling (Tsay, 1986). For practical purposes of detecting the existence and significance of an outlier at time point T: if $\lambda_{(\cdot),T}$ is significantly greater than the chosen critical value and $\lambda_{A,T}$ is greater than each of $\lambda_{I,T}$, $\lambda_{L,T}$ and $\lambda_{C,T}$, then the additive outlier is more pronounced at this point than any other outlier type. We assume the standard normal distribution with critical value $Z_{1-\alpha/2} = 1.96$ at the 5% significance level.

In Table 1, AO is significantly greater than the critical value at time points 11, 16, 18, 24, 26 and 30. IO is statistically significant at time points 11 and 30. Both LS and TC are significant at time points 11, 16, 18, 24, 26 and 30. The presence of LS and TC is prominent at these time points, because the $\lambda$-values of LS and TC are greater than those of AO and IO.
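The decision rule described above can be sketched in a few lines using the $\lambda$ values reported in Table 1 for the January period. This is a minimal illustration, not the authors' program: the names `CRITICAL`, `flagged`, `results` and `share` are hypothetical, and the comparison uses the absolute value of $\lambda$, an assumption based on the text treating negative statistics (for example IO at time point 18 in February) as significant.

```python
# 5% two-sided standard normal critical value used in the text
CRITICAL = 1.96

# Time points and lambda values transcribed from Table 1 (January)
timing = [2, 6, 11, 16, 18, 21, 24, 26, 27, 30]
lam = {
    "AO": [0.4633344, 0.8123395, 2.6716945, 2.0955351, 2.2595074,
           0.511473, 2.2038471, 2.0955351, 0.8439305, 3.3501484],
    "IO": [0.428922, 0.785461, 2.137311, 1.440965, -1.8171,
           -1.28727, 1.829186, -0.21212, -0.97134, 3.117334],
    "LS": [0.4882487, 0.85285, 2.8153562, 2.2082158, 2.3810051,
           0.5389759, 2.3239371, 2.2082158, 0.8893102, 3.5302924],
    "TC": [0.4881966, 0.8532472, 2.8159087, 2.2107953, 2.3829591,
           0.5412991, 2.3247078, 2.2103044, 0.8936314, 3.5308067],
}

def flagged(values, times, c=CRITICAL):
    """Return the time points whose |lambda| exceeds the critical value."""
    return [t for t, v in zip(times, values) if abs(v) > c]

# Time points flagged by each outlier model
results = {model: flagged(values, timing) for model, values in lam.items()}

# Share of the ten injected outliers that each model confirms as significant
share = {model: len(points) / len(timing) for model, points in results.items()}

for model in lam:
    print(model, results[model], f"{share[model]:.0%}")
```

Replacing the `lam` dictionary with the values of Table 2 or Table 3 applies the same rule to the February and October periods.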
For the January period, the AO, LS and TC outlier models each confirmed as significant 60% of the weights of the outliers injected into the series.

In Table 2, AO is significantly greater than the critical value at time points 16, 26 and 30. IO is statistically significant only at time points 18 and 24. LS is significant at time points 11, 16, 26 and 30, while TC is significant at time points 16, 26 and 30. LS is the most prominent model here: AO and TC each flag one time point fewer than LS. For the February period, the AO and TC outlier models each confirmed as significant 30% of the weights of the outliers injected into the series, while LS confirmed 40% of the injected weights, showing greater power in capturing the outliers.

As shown in Table 3, IO is significantly greater than the critical value at time points 2 and 6, while AO, LS and TC are statistically significant at time point 2 only. The likely reason is that this regime responds differently to the presence of outliers, although all the models capture the outlier at time point 2. The presence of IO is prominent because it is significant at both time points 2 and 6; at time point 6 its $\lambda$-value is greater than those of AO, LS and TC. For the October period, only the IO outlier model confirmed as significant 20% of the weights of the outliers injected into the series.

Table 1: The Values of Likelihood Ratio Test Statistic for January

TIMING  D̂_AO (Est.)   λ_AO         D̂_IO (Est.)   λ_IO         D̂_LS (Est.)   λ_LS         D̂_TC (Est.)   λ_TC
2       23.1          0.4633344    23.1          0.428922     23.1231       0.4882487    23.1          0.4881966
6       40.5          0.8123395    40.5          0.785461     40.39035      0.85285      40.35         0.8532472
11      133.2         2.6716945    133.2         2.137311     133.3332      2.8153562    133.2         2.8159087
16      104.475       2.0955351    104.475       1.440965     104.5795      2.2082158    104.475       2.2107953
18      112.65        2.2595074    16.533        -1.8171      112.7627      2.3810051    112.65        2.3829591
21      25.5          0.511473     25.5          -1.28727     25.5255       0.5389759    25.5          0.5412991
24      109.875       2.2038471    111.734       1.829186     110.06        2.3239371    109.9727      2.3247078
26      104.475       2.0955351    3.321         -0.21212     104.5795      2.2082158    104.475       2.2103044
27      42.075        0.8439305    50.224        -0.97134     42.11708      0.8893102    42.17948      0.8936314
30      167.025       3.3501484    167.025       3.117334     167.1921      3.5302924    167.025       3.5308067

Table 2: The Values of Likelihood Ratio Test Statistic for February

TIMING  D̂_AO (Est.)   λ_AO         D̂_IO (Est.)   λ_IO         D̂_LS (Est.)   λ_LS         D̂_TC (Est.)   λ_TC
2       12.75         0.1877402    12.75         0.1206609    12.76275      0.283500128  12.75         0.19339279
6       52.5          0.77304787   52.5          0.4994937    52.5525       1.167353469  52.5          0.79651665
11      100.425       1.47873014   100.425       0.8430651    100.5254      2.232979867  100.425       1.52404896
16      195.425       2.87757867   195.425       1.3834307    195.6455      4.345891312  195.425       2.96574179
18      13.95         0.20540986   -177.2        -2.5676378   13.96395      0.310182493  13.95         0.21455869
21      77.85         1.14631956   77.85         -1.1125875   77.92755      1.731011766  77.85         1.18104523
24      75.975        1.1187107    75.975        2.378617     76.05098      1.689327346  75.975        1.15357435
26      185.325       2.72885898   185.325       1.0476535    185.5103      4.120757191  185.325       2.81217351
27      33.525        0.49364628   33.55853      -0.3484482   33.55853      0.745438684  33.71033      0.5141314
30      136.5         2.00992446   136.5         -0.4199967   136.6365      3.03511902   136.5         2.0709518

Table 3: The Values of Likelihood Ratio Test Statistic for October

TIMING  D̂_AO (Est.)   λ_AO         D̂_IO (Est.)   λ_IO         D̂_LS (Est.)   λ_LS         D̂_TC (Est.)   λ_TC
2       75.825        3.2903016    75.825        3.257088     75.901        3.2912153    75.825        3.2903016
6       37.2          1.614233     37.2          2.05393      37.237        1.6146689    37.2          1.6175233
11      21.375        0.9275331    21.375        -1.58691     21.396        0.9277723    21.375        0.9291473
16      2.625         0.1139076    2.625         -1.09745     2.678         0.1161233    2.625         0.1148351
18      1.275         0.0553265    -0.925        -0.79319     1.276         0.0553298    1.275         0.0554404
21      16.725        0.725754     16.725        0.618397     16.742        0.7259658    16.725        0.7258093
24      3.15          0.1366891    3.36          0.278199     3.153         0.1367202    3.1515        0.1374799
26      15.15         0.6574094    12.513        -0.04419     15.165        0.657584     15.15         0.6575462
27      16.8          0.7290085    18.921        0.767088     16.817        0.7292179    16.815        0.7303168
30      0.15          0.006509     0.15          -0.33009     0.15          0.0065043    0.15          0.0072387

4. Conclusion

The periodic likelihood ratio test statistic was utilized to assess the evidence that any estimated outlier for a given period is an outlier. The conditions are that the timing $t = T$ of the occurrence of an outlier is known or assumed and that the magnitude of the outlier has been estimated or specified. For the Maun Airport data, it is observed that for the January period the AO, LS and TC outlier models each confirm as significant 60% of the weights of the outliers injected into the series, while IO detects only two time points. In the February period, the AO and TC outlier models each confirmed as significant 30% of the weights of the outliers injected into the series, whereas LS confirmed 40% of the injected weights, showing greater power in capturing the time points of outliers. In the October period, only the IO outlier model confirmed as significant 20% of the weights of the outliers injected into the series. Our findings reveal that the AO, LS and TC models perform better in terms of their ability to capture the timing of outliers.

References

[1] Bianco, A. M., Gracia, B. M., Martinez, E. J., and Yohai, V. J., Outliers detection in regression models with ARIMA errors using robust estimations, Journal of Forecasting, 20(2001): 565-579.
[2] Bruce, A. G., and Martin, R. D., Leave-k-out diagnostics for Time Series, J. R. Statist. Soc. B, 51(1989): 363-424.
[3] Chang, I., and Tiao, G. C., Estimation of Time Series Parameters in the presence of outliers, Technical Report 8, University of Chicago, Statistics Research Center, 1983.
[4] Cook, R. D., Detection of Influence Observations in Linear Regression, Technometrics, 19(1977): 15-18.
[5] Fox, A. J., Outliers in time series, Journal of Royal Stat. Soc., 34(1972): 350-363.
[6] Pena, D., Influence Observations in Time Series, Technical Report 2178, Mathematics Research Centre, University of Wisconsin, Madison, 1984.
[7] Tsay, R. S., Time Series Model Specification in the presence of Outliers, J.A.S.A., 81(1986): 132-141.
[8] Lasisi, T. A., Shangodoyin, D. K., and Moeng, S. R. T., Specification of Periodic Autocovariance Structures in the Presence of Outliers, Studies in Mathematical Sciences, 6(2)(2013): 83-95.