
International Journal of Modern Mathematical Sciences, 2014, 11(1): 40-48
International Journal of Modern Mathematical Sciences ISSN:2166-286X
Florida, USA
Journal homepage:www.ModernScientificPress.com/Journals/ijmms.aspx
Article
ARIMA Methods of Detecting Outliers in Time Series Periodic Processes
T. A. Lasisi¹ and D. K. Shangodoyin²,*
¹ Department of Mathematics and Statistics, The Polytechnic, Ibadan, Nigeria
² Department of Statistics, University of Botswana, Botswana
* Author to whom correspondence should be addressed. E-mail: [email protected]
Article history: Received 4 March 2014; received in revised form 27 May 2014; accepted 24 June 2014; published 28 July 2014.
Abstract: We use the periodic likelihood ratio test statistic to assess the evidence that any estimated outlier in a given period is genuinely an outlier. The analysis requires that the time t = T at which an outlier occurs be known or assumed, and that the magnitude of the outlier be estimated or specified. In the February period, the additive outlier (AO) and transitory change (TC) models each confirmed 30% of the injected outlier weights as significant, whereas the level shift (LS) model confirmed 40%, showing greater power in capturing the time points of the outliers. In the October period, only the innovational outlier (IO) model confirmed a significant share, 20%, of the injected outlier weights. Our findings indicate that the AO, LS and TC models perform better at capturing the timing of outlier occurrence.
Keywords: Likelihood ratio test statistic, ARMA models, Periodic processes
2000 Mathematical Subject Classification: C15, C22, C52
1. Introduction
In practice, outliers need to be detected when they are influential in model fitting. Fox (1972) appears to have been the first to study outlier detection in time series, assuming an autoregressive process with Gaussian noise for the outlier-free series. His approach uses likelihood ratio criteria to test for the existence of additive and innovational outliers when serial correlation is present, a setting well suited to the analysis of periodic series.
Several studies have developed least squares and maximum likelihood methods for detecting outliers under known process models (Bianco et al., 2001; Tsay, 1996). Commenting on Cook's distance statistic (Cook, 1977), Bruce and Martin (1989) observe that test statistics based on the influence of the i-th observation on the parameters of a regression model are well established, but that the dependence present in time series gives rise to a "smearing" effect when the test statistic for a model coefficient is calculated; hence the need for test statistics specified under a time series ARMA model. The identification of influential observations within the ARIMA framework has been developed by Chang and Tiao (1983), Pena (1984) and Tsay (1986, 1996).
In this paper, we use the statistic derived in Lasisi et al. (2013, Section 3.1, subsections 3.1.1-3.1.4, p. 92) to assess the evidence that any estimated outlier in a given period is an outlier. The time t = T at which an outlier occurs is taken as known or assumed, and its magnitude as estimated or specified. We consider ARIMA models for periodic processes and use the likelihood ratio test statistic developed by Tsay (1986), which this study extends to level shift and transitory change outliers in periodic series.
2. Outlier Detection Test Statistic
If we assume a single outlier model:
$$\pi^{-1}(B^v)\,Z_{t(r,m)} = \pi^{-1}(B^v)\,Y_{t(r,m)} + \xi(B^v)\,D^{(T)}_{t(r,m)} + \varepsilon_{t(r,m)} \qquad (1)$$
where $\xi(B^v)$ is the weight attached to the magnitude of the outliers.
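As an illustration of what the weight ξ(B^v) does, the sketch below injects an outlier of size ω at index T into an outlier-free series. It uses the additive form Z_t = Y_t + ω ξ(B^v) I_t^{(T)} commonly used in the outlier literature rather than the filtered form in (1), with the conventional dynamic patterns: AO is a single spike, LS a permanent step, TC a decay at rate δ, and IO a propagation through the ψ-weights of the model. The function name `add_outlier`, the default δ = 0.7 and the argument layout are our assumptions.

```python
def add_outlier(y, kind, T, omega, delta=0.7, psi=None, v=1):
    """Return a copy of y with an outlier of size omega injected at index T.

    kind: 'AO' (single spike), 'IO' (propagates through the psi-weights),
          'LS' (permanent shift), 'TC' (decays geometrically at rate delta).
    Effects after T act only every v observations (v = period lag).
    """
    z = list(y)
    n = len(z)
    for k in range(0, n - T):
        if k % v != 0:
            continue  # the pattern acts only at multiples of the period lag
        j = k // v  # number of periodic lags elapsed since T
        if kind == "AO":
            w = 1.0 if j == 0 else 0.0
        elif kind == "LS":
            w = 1.0
        elif kind == "TC":
            w = delta ** j
        elif kind == "IO":
            weights = psi if psi is not None else [1.0]
            w = weights[j] if j < len(weights) else 0.0
        else:
            raise ValueError(f"unknown outlier type: {kind}")
        z[T + k] += omega * w
    return z
```

For example, `add_outlier([0.0] * 6, "TC", 2, 10.0, delta=0.5)` returns `[0.0, 0.0, 10.0, 5.0, 2.5, 1.25]`, while the LS version keeps the shift of 10 from index 2 onwards.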
Suppose that the ARIMA models
$$\phi(B^v : I)\,Y_{t(r,m)} = \theta(B^v : I)\,e_{t(r,m)} \quad\text{and}\quad \phi(B^v : I)\,Z_{t(r,m)} = \theta(B^v : I)\,e_{t(r,m)} \qquad (2)$$
where $\pi(B^v) = \phi(B^v)\,\theta^{-1}(B^v)$, are fitted to the outlier-free series and the outlier-infested series, respectively, and that $\varepsilon_{t(r,m)}$ is a sequence of independent Gaussian variates with mean zero and variance one.
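Equation (2) defines π(B^v) = φ(B^v)θ^{-1}(B^v), whose coefficients (the π-weights) can be obtained by power-series division. A minimal sketch, assuming the sign convention φ(B^v) = 1 - φ_1B^v - φ_2B^{2v} - ... and θ(B^v) = 1 - θ_1B^v - ... (the paper does not state one) and using the hypothetical helper names `expand_lags` and `pi_weights`:

```python
def expand_lags(coeffs, v):
    """Turn [c1, c2, ...] of the polynomial 1 - c1*B^v - c2*B^(2v) - ...
    into a coefficient list indexed by lag: [1, 0, ..., -c1, 0, ..., -c2]."""
    out = [0.0] * (len(coeffs) * v + 1)
    out[0] = 1.0
    for i, c in enumerate(coeffs, start=1):
        out[i * v] = -c
    return out


def pi_weights(phi, theta, n, v=1):
    """First n coefficients of pi(B^v) = phi(B^v) / theta(B^v)."""
    a = expand_lags(phi, v)    # numerator phi(B^v)
    b = expand_lags(theta, v)  # denominator theta(B^v), with b[0] == 1
    pi = []
    for k in range(n):
        ak = a[k] if k < len(a) else 0.0
        # theta(B) * pi(B) = phi(B)  =>  pi_k = a_k - sum_{j>=1} b_j * pi_{k-j}
        s = sum(b[j] * pi[k - j] for j in range(1, min(k, len(b) - 1) + 1))
        pi.append(ak - s)
    return pi
```

For instance, `pi_weights([0.5], [], 4)` gives `[1.0, -0.5, 0.0, 0.0]`, and with `v=12` the same AR coefficient sits at lag 12 instead of lag 1.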
In model (1), it is assumed that all the zeros of $\phi^{(m)}(B^v : I)$ and $\theta^{(m)}(B^v : I)$ are on or outside the unit circle and that $\phi^{(m)}(B^v : I)$ and $\theta^{(m)}(B^v : I)$ have no common factors. If some of the zeros of $\phi^{(m)}(B^v : I)$ are on the unit circle, the process is assumed to start at a fixed time point with known or given starting values. The white noise process $e_t$ has mean zero and variance $\sigma_e^2$. Applying least squares theory, we obtain the following results.
(i) Additive outlier case:
The estimated magnitude of the outlier, $\hat D_{T(r,m)}$, is $P_{T(r,m)} = \pi(B^v)\,e_{T(r,m)}$, with variance $\pi^{-2}(B^v)\,\hat\sigma^2_{e_{T(r,m)}}$.
(ii) Innovational outlier case:
The estimate of the outlier magnitude is $P_{T(r,m)} = e_{T(r,m)}$, with variance $\sigma^2_{e_{T(r,m)}}$.
(iii) Level shift case:
Given the weight $(1 - B^v)^{-1}$, the estimated magnitude of the outlier is $(1 - B^v)^{-1}\pi(B^v)\,e_{T(r,m)}$, with variance $(1 - B^v)^{-2}\,\pi^{-2}(B^v)\,\sigma_e^2$.
(iv) Transitory change case:
The weight coefficient of the outlier is $(1 - \delta_v B^v)^{-1}$, and the estimate of the outlier is $(1 - \delta_v B^v)^{-1}\pi(B^v)\,e_{t(r,m)}$, with variance $(1 - \delta_v B^v)^{-2}\,\pi^{-2}(B^v)\,\sigma_e^2$.
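The four estimates above differ only in the weight ξ(B^v) that is filtered through π(B^v). A minimal sketch of one standard way to compute them, the least-squares regression of the residuals ê_T, ê_{T+1}, ... on the coefficients x_k of π(B^v)ξ(B^v), is given below; it is not necessarily the paper's exact computation. It assumes the π-weights are already available as a list indexed by lag (π_0 = 1) and that δ_v is known, and the function names `outlier_regressor` and `estimate_magnitude` are hypothetical.

```python
def outlier_regressor(kind, pi, delta=0.0, v=1):
    """Coefficients x_k of pi(B) * xi(B) for one outlier type, k = 0..len(pi)-1.

    AO: xi = 1                    -> x = pi
    IO: xi = 1 / pi               -> x = [1, 0, 0, ...]
    LS: xi = 1 / (1 - B^v)        -> x_k = pi_k + x_{k-v}
    TC: xi = 1 / (1 - delta*B^v)  -> x_k = pi_k + delta * x_{k-v}
    """
    n = len(pi)
    if kind == "AO":
        return list(pi)
    if kind == "IO":
        return [1.0] + [0.0] * (n - 1)
    if kind not in ("LS", "TC"):
        raise ValueError(f"unknown outlier type: {kind}")
    d = 1.0 if kind == "LS" else delta
    x = []
    for k in range(n):
        carry = d * x[k - v] if k >= v else 0.0  # recursive division by (1 - d*B^v)
        x.append(pi[k] + carry)
    return x


def estimate_magnitude(resid, T, x):
    """Least-squares outlier size at index T and its variance factor
    1 / sum(x_k^2) (multiply by sigma_e^2 to obtain the variance)."""
    m = min(len(x), len(resid) - T)
    sxx = sum(x[k] ** 2 for k in range(m))
    sxe = sum(x[k] * resid[T + k] for k in range(m))
    return sxe / sxx, 1.0 / sxx
```

For LS the coefficients are cumulative sums of the π-weights taken every v lags, and for TC the sums are damped by δ_v; with δ_v equal to an AR(1) coefficient the TC regressor collapses to the IO one, which is a quick sanity check.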
Based on the results in (i)-(iv), we can construct test statistics for the existence of an outlier at time point t = T. The null hypothesis is that there is no outlier at t = T. Assuming that the time series parameters and the time of occurrence of the outliers are known, the following test statistics are distributed as $N(0, \sigma^2)$. Chang and Tiao (1983) suggested the critical values 3.0, 3.5 and 4.0; in practical time series analysis, however, we suggest choosing $c$ such that $P(|\lambda_{(i),T}| > c) = \alpha$ for a specified significance level $\alpha$ under the normality assumption (Tsay, 1988). Assume that $\lambda_{(\cdot)} \sim N(0, 1)$ and that the ARIMA(p, d, v, q) model parameters and the timing and magnitude of the outliers are known; then the test statistic for each scenario is as follows:
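Choosing c so that P(|λ_{i,T}| > c) = α under the standard normal assumption amounts to taking the two-sided quantile c = Z_{1-α/2}. A short sketch using the standard-library `statistics.NormalDist` class (the helper name `critical_value` is ours):

```python
from statistics import NormalDist


def critical_value(alpha):
    """Two-sided critical value c with P(|Z| > c) = alpha for Z ~ N(0, 1)."""
    return NormalDist().inv_cdf(1.0 - alpha / 2.0)


for alpha in (0.10, 0.05, 0.01):
    print(f"alpha = {alpha:.2f}: c = {critical_value(alpha):.3f}")
```

This prints c ≈ 1.645, 1.960 and 2.576, so a critical value of 1.645 corresponds to a two-sided test at α = 10%, and 1.96 to α = 5%.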
(i) Existence of AO:
$$\lambda_{A,T} = \frac{\hat D_{T(r,m)}}{\pi^{-1}(B^v)\,\hat\sigma_{e_{t(r,m)}}} \qquad (3)$$
(ii) Existence of IO:
$$\lambda_{I,T} = \frac{\hat D_{T(r,m)}}{\hat\sigma_{e_{t(r,m)}}} \qquad (4)$$
(iii) Existence of LS:
$$\lambda_{L,T} = \frac{\hat D_{T(r,m)}}{(1 - B^v)^{-1}\,\pi^{-1}(B^v)\,\hat\sigma_{e_{t(r,m)}}} \qquad (5)$$
(iv) Existence of TC:
$$\lambda_{C,T} = \frac{\hat D_{T(r,m)}}{(1 - \delta_v B^v)^{-1}\,\pi^{-1}(B^v)\,\hat\sigma_{e_{t(r,m)}}} \qquad (6)$$
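Read in least-squares terms, each of (3)-(6) is an estimated magnitude divided by its standard error, λ = D̂ √(Σ_k x_k²)/σ̂_e, where x_k are the coefficients of π(B^v)ξ(B^v). The self-contained sketch below computes all four statistics at one index T from residuals and π-weights under that reading; the function name `outlier_statistics`, the default δ_v = 0.7 and the caller-supplied σ̂_e are assumptions for illustration.

```python
import math


def outlier_statistics(resid, T, pi, sigma, delta=0.7, v=1):
    """Return {'AO': lam, 'IO': lam, 'LS': lam, 'TC': lam} at index T.

    resid: residuals e_t of the fitted model; pi: pi-weights by lag (pi[0] == 1);
    sigma: estimate of the residual standard deviation; delta: TC decay rate.
    """
    n = min(len(pi), len(resid) - T)

    def regressor(d):
        # coefficients of pi(B) / (1 - d * B^v); d = 0 returns pi itself
        x = []
        for k in range(n):
            x.append(pi[k] + (d * x[k - v] if k >= v else 0.0))
        return x

    patterns = {
        "AO": regressor(0.0),
        "IO": [1.0] + [0.0] * (n - 1),
        "LS": regressor(1.0),
        "TC": regressor(delta),
    }
    stats = {}
    for kind, x in patterns.items():
        sxx = sum(xk * xk for xk in x)
        omega = sum(x[k] * resid[T + k] for k in range(n)) / sxx  # LS estimate
        stats[kind] = omega * math.sqrt(sxx) / sigma  # estimate / standard error
    return stats
```

In practice σ̂_e is often replaced by a robust estimate, such as a scaled median absolute deviation of the residuals, so that the outliers being tested do not inflate it.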
2.1. Outlier Detection Algorithm (ODA)
The criteria (i)-(iv) are useful for detecting outliers in periodic processes. The outlier magnitudes at a particular time point T are estimated using the Outlier Detection Algorithm (ODA) in Figure 1; outlier-free points are assumed to have zero magnitude, and the test statistics are used to check whether this holds. The results of this section are used only in the intermediate steps of the outlier detection procedure; the final estimates of the outliers come from the model that incorporates all the outliers, in which all parameters of the ARIMA(p, d, v, q) model are estimated. The flow chart in Figure 1 shows how the automatic outlier detection works. Let $\hat D_t$ denote the magnitude of the outliers and t = T the time at which an outlier occurs.
Figure 1: Outlier Detection Algorithm
3. Empirical Illustration
The computations follow the algorithm below, which detects outliers using the methodology described above. The Outlier Detection Algorithm (ODA) consists of the following steps:
(i) Read the periodic observations $Y_t$, $t = 1, \dots, m$.
(ii) Estimate the parameters of the ARMA process using SPSS or SYSTAT.
(iii) Obtain $\pi(B) = \phi(B)\,\theta^{-1}(B)$ from the ARMA process estimated in step (ii).
(iv) Read the values of $\hat D_T$ to compute $\hat\sigma^2_{e_T}$.
(v) Read the critical values $C$ as 1.645 (10%) and 1.96 (5%). The program for testing the existence of outliers using the $\lambda$'s is:
(vi) Do:
   Calculate $\hat\sigma^2_{e_T}$ from the values of $\hat D_T$.
   Calculate $\lambda_{i,T}$, $i$ = AO, IO, LS and TC, using the expression $\hat D_{T_i(r,m)} / \sqrt{\operatorname{Var}(e_t)}$.
   If $|\lambda_{i,T}| > C$, display $\lambda_{i,T}$; otherwise there is no outlier, so stop.
   If $\lambda_{AO} > \lambda_{IO}, \lambda_{LS}, \lambda_{TC}$, then display $\lambda_{AO}$; otherwise recheck using a different $\lambda_{i,T}$.
(vii) End Do.
(viii) Read new periodic observations from the file and repeat the algorithm.
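Step (vi) can be sketched as follows, taking the four λ values at each time point as already computed (for instance, the entries of Tables 1-3). A time point is flagged when some |λ_{i,T}| exceeds C, and the type with the largest |λ| is reported as dominant, which is one reading of the "display λ_AO if it dominates, otherwise recheck" rule; the function name `detect_outliers` and the use of absolute values are our assumptions.

```python
def detect_outliers(lambdas_by_time, c=1.96):
    """Step (vi) sketch: for each time point, return (dominant type, sorted list
    of significant types), or (None, []) when no |lambda| exceeds c."""
    results = {}
    for t, lams in sorted(lambdas_by_time.items()):
        significant = [k for k, lam in lams.items() if abs(lam) > c]
        if not significant:
            results[t] = (None, [])  # no outlier at this time point
            continue
        dominant = max(significant, key=lambda k: abs(lams[k]))
        results[t] = (dominant, sorted(significant))
    return results


# Two rows of Table 1 (January): time points 11 and 18.
january = {
    11: {"AO": 2.6716945, "IO": 2.137311, "LS": 2.8153562, "TC": 2.8159087},
    18: {"AO": 2.2595074, "IO": -1.8171, "LS": 2.3810051, "TC": 2.3829591},
}
print(detect_outliers(january))
```

Both rows are flagged, with TC reported as dominant because its λ is marginally the largest; the IO statistic at time point 18 is not significant.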
We estimated the ARMA parameters from precipitation data collected at Maun Airport, concentrating on three outlier-free periods. The parameters shown in Table 1 are used to estimate $\pi(B^v)$ and $\sigma^2_{e_t}$ in equation (2) for all the outlier models considered. Under the null hypothesis of no outlier, the statistics $\lambda_{A,T}$, $\lambda_{I,T}$, $\lambda_{L,T}$ and $\lambda_{C,T}$ have a standard normal distribution, which makes the statistics $\lambda_{(\cdot),T}$ readily usable in practical modeling (Tsay, 1986). For practical detection of an outlier at time point T: if $\lambda_{(\cdot),T}$ is significantly greater than the chosen critical value and $\lambda_{A,T}$ exceeds each of $\lambda_{I,T}$, $\lambda_{L,T}$ and $\lambda_{C,T}$, then an additive outlier is more pronounced at that point than any other type of outlier. We assume the standard normal distribution, with critical value $Z_{1-\alpha/2} = 1.96$ at the $\alpha = 5\%$ significance level.
In table 1, AO is significantly greater than the critical value at time points 11, 16, 18, 24, 26 and
30. The IO is statistically significant at time points 11 and 30. Both LS and TC are significant at time
points 11, 16, 18, 24, 26, and 30. The presence of LS and TC are prominent at these time points, because
the  -values of LS and TC are greater than those of AO and IO. For this January period, the AO, LS
and TC outlier models confirmed significantly 60% of the weights of the outliers injected into the series.
In table2, AO is significantly greater than the critical value at time points 16, 26 and 30. The IO is only
statistically significant at time points 18 and 24.While LS is significant at time point 11, 16, 26 and 30,
TC is significant at time points 16, 26, and 30. The presence of LS is prominent at these time points,
implying that the  -values of AO and TC are one point each less than that of LS. For this February
period, while the AO and TC outlier models are respectively confirmed to be 30% significant of the
Copyright © 2014 by Modern Scientific Press Company, Florida, USA
Int. J. Modern Math. Sci. 2014, 11(1): 40-48
46
weights of the outliers injected into the series, the LS is 40% significant of the injected weights showing
a more powerful feat in capturing the outliers.
In Table 3 (October), the IO statistic is significantly greater than the critical value at time points 2 and 6, while the AO, LS and TC statistics are significant at time point 2 only. The apparent reason is that this regime responds differently to the presence of outliers, since all the models capture the outlier at time point 2. IO is prominent at time points 2 and 6, and at time point 6 its λ-value is greater than those of AO, LS and TC. For the October period, only the IO outlier model confirmed a significant share, 20%, of the injected outlier weights.
Table 1: The Values of the Likelihood Ratio Test Statistic for January

Timing | D̂_AO (Est.) | λ_AO | D̂_IO (Est.) | λ_IO | D̂_LS (Est.) | λ_LS | D̂_TC (Est.) | λ_TC
---|---|---|---|---|---|---|---|---
2 | 23.1 | 0.4633344 | 23.1 | 0.428922 | 23.1231 | 0.4882487 | 23.1 | 0.4881966
6 | 40.5 | 0.8123395 | 40.5 | 0.785461 | 40.39035 | 0.85285 | 40.35 | 0.8532472
11 | 133.2 | 2.6716945 | 133.2 | 2.137311 | 133.3332 | 2.8153562 | 133.2 | 2.8159087
16 | 104.475 | 2.0955351 | 104.475 | 1.440965 | 104.5795 | 2.2082158 | 104.475 | 2.2107953
18 | 112.65 | 2.2595074 | 16.533 | -1.8171 | 112.7627 | 2.3810051 | 112.65 | 2.3829591
21 | 25.5 | 0.511473 | 25.5 | -1.28727 | 25.5255 | 0.5389759 | 25.5 | 0.5412991
24 | 109.875 | 2.2038471 | 111.734 | 1.829186 | 110.06 | 2.3239371 | 109.9727 | 2.3247078
26 | 104.475 | 2.0955351 | 3.321 | -0.21212 | 104.5795 | 2.2082158 | 104.475 | 2.2103044
27 | 42.075 | 0.8439305 | 50.224 | -0.97134 | 42.11708 | 0.8893102 | 42.17948 | 0.8936314
30 | 167.025 | 3.3501484 | 167.025 | 3.117334 | 167.1921 | 3.5302924 | 167.025 | 3.5308067
Table 2: The Values of the Likelihood Ratio Test Statistic for February

Timing | D̂_AO (Est.) | λ_AO | D̂_IO (Est.) | λ_IO | D̂_LS (Est.) | λ_LS | D̂_TC (Est.) | λ_TC
---|---|---|---|---|---|---|---|---
2 | 12.75 | 0.1877402 | 12.75 | 0.1206609 | 12.76275 | 0.283500128 | 12.75 | 0.19339279
6 | 52.5 | 0.77304787 | 52.5 | 0.4994937 | 52.5525 | 1.167353469 | 52.5 | 0.79651665
11 | 100.425 | 1.47873014 | 100.425 | 0.8430651 | 100.5254 | 2.232979867 | 100.425 | 1.52404896
16 | 195.425 | 2.87757867 | 195.425 | 1.3834307 | 195.6455 | 4.345891312 | 195.425 | 2.96574179
18 | 13.95 | 0.20540986 | -177.2 | -2.5676378 | 13.96395 | 0.310182493 | 13.95 | 0.21455869
21 | 77.85 | 1.14631956 | 77.85 | -1.1125875 | 77.92755 | 1.731011766 | 77.85 | 1.18104523
24 | 75.975 | 1.1187107 | 75.975 | 2.378617 | 76.05098 | 1.689327346 | 75.975 | 1.15357435
26 | 185.325 | 2.72885898 | 185.325 | 1.0476535 | 185.5103 | 4.120757191 | 185.325 | 2.81217351
27 | 33.525 | 0.49364628 | 33.55853 | -0.3484482 | 33.55853 | 0.745438684 | 33.71033 | 0.5141314
30 | 136.5 | 2.00992446 | 136.5 | -0.4199967 | 136.6365 | 3.03511902 | 136.5 | 2.0709518
Table 3: The Values of the Likelihood Ratio Test Statistic for October

Timing | D̂_AO (Est.) | λ_AO | D̂_IO (Est.) | λ_IO | D̂_LS (Est.) | λ_LS | D̂_TC (Est.) | λ_TC
---|---|---|---|---|---|---|---|---
2 | 75.825 | 3.2903016 | 75.825 | 3.257088 | 75.901 | 3.2912153 | 75.825 | 3.2903016
6 | 37.2 | 1.614233 | 37.2 | 2.05393 | 37.237 | 1.6146689 | 37.2 | 1.6175233
11 | 21.375 | 0.9275331 | 21.375 | -1.58691 | 21.396 | 0.9277723 | 21.375 | 0.9291473
16 | 2.625 | 0.1139076 | 2.625 | -1.09745 | 2.678 | 0.1161233 | 2.625 | 0.1148351
18 | 1.275 | 0.0553265 | -0.925 | -0.79319 | 1.276 | 0.0553298 | 1.275 | 0.0554404
21 | 16.725 | 0.725754 | 16.725 | 0.618397 | 16.742 | 0.7259658 | 16.725 | 0.7258093
24 | 3.15 | 0.1366891 | 3.36 | 0.278199 | 3.153 | 0.1367202 | 3.1515 | 0.1374799
26 | 15.15 | 0.6574094 | 12.513 | -0.04419 | 15.165 | 0.657584 | 15.15 | 0.6575462
27 | 16.8 | 0.7290085 | 18.921 | 0.767088 | 16.817 | 0.7292179 | 16.815 | 0.7303168
30 | 0.15 | 0.006509 | 0.15 | -0.33009 | 0.15 | 0.0065043 | 0.15 | 0.0072387
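The percentages quoted for each period appear to be the share of the ten tabulated time points at which a model's |λ| exceeds 1.96. As a check under that assumption, the sketch below recomputes them from the February statistics in Table 2; the helper name `detection_rate` is hypothetical, and absolute values are used because some IO statistics are negative.

```python
# Lambda values from Table 2 (February), in time order 2, 6, 11, 16, 18, 21, 24, 26, 27, 30.
february = {
    "AO": [0.1877402, 0.77304787, 1.47873014, 2.87757867, 0.20540986,
           1.14631956, 1.1187107, 2.72885898, 0.49364628, 2.00992446],
    "IO": [0.1206609, 0.4994937, 0.8430651, 1.3834307, -2.5676378,
           -1.1125875, 2.378617, 1.0476535, -0.3484482, -0.4199967],
    "LS": [0.283500128, 1.167353469, 2.232979867, 4.345891312, 0.310182493,
           1.731011766, 1.689327346, 4.120757191, 0.745438684, 3.03511902],
    "TC": [0.19339279, 0.79651665, 1.52404896, 2.96574179, 0.21455869,
           1.18104523, 1.15357435, 2.81217351, 0.5141314, 2.0709518],
}


def detection_rate(lams, c=1.96):
    """Percentage of time points whose |lambda| exceeds the critical value c."""
    return 100.0 * sum(abs(x) > c for x in lams) / len(lams)


for kind, lams in february.items():
    print(f"{kind}: {detection_rate(lams):.0f}%")
```

It prints 30% for AO, 20% for IO, 40% for LS and 30% for TC, in line with the February figures reported above.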
4. Conclusion
The periodic likelihood ratio test statistic was used to assess the evidence that any estimated outlier in a given period is an outlier, under the conditions that the time t = T at which an outlier occurs is known or assumed and that its magnitude has been estimated or specified. For the Maun Airport data, in the January period the AO, LS and TC outlier models each confirmed 60% of the injected outlier weights as significant, while IO detected only two time points. In the February period, the AO and TC outlier models each confirmed 30% of the injected outlier weights as significant, whereas LS confirmed 40%, showing greater power in capturing the time points of the outliers. In the October period, only the IO outlier model confirmed a significant share, 20%, of the injected outlier weights. Our findings indicate that the AO, LS and TC models perform better at capturing the timing of outlier occurrence.
References
[1] Bianco, A. M., Garcia Ben, M., Martinez, E. J., and Yohai, V. J., Outlier detection in regression models with ARIMA errors using robust estimates, Journal of Forecasting, 20(2001): 565-579.
[2] Bruce, A. G., and Martin, R. D., Leave-k-out diagnostics for time series, J. R. Statist. Soc. B, 51(1989): 363-424.
[3] Chang, I., and Tiao, G. C., Estimation of time series parameters in the presence of outliers, Technical Report 8, Statistics Research Center, University of Chicago, 1983.
[4] Cook, R. D., Detection of influential observation in linear regression, Technometrics, 19(1977): 15-18.
[5] Fox, A. J., Outliers in time series, J. R. Statist. Soc. B, 34(1972): 350-363.
[6] Pena, D., Influential observations in time series, Technical Report 2178, Mathematics Research Center, University of Wisconsin, Madison, 1984.
[7] Tsay, R. S., Time series model specification in the presence of outliers, J. Amer. Statist. Assoc., 81(1986): 132-141.
[8] Lasisi, T. A., Shangodoyin, D. K., and Moeng, S. R. T., Specification of periodic autocovariance structures in the presence of outliers, Studies in Mathematical Sciences, 6(2)(2013): 83-95.
Copyright © 2014 by Modern Scientific Press Company, Florida, USA