A Significance Test for Time Series Analysis

Author(s): W. Allen Wallis and Geoffrey H. Moore
Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp. 401-409
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2279616 .
Accessed: 19/02/2013 17:28
This content downloaded on Tue, 19 Feb 2013 17:28:43 PM
All use subject to JSTOR Terms and Conditions
A SIGNIFICANCE TEST FOR TIME SERIES ANALYSIS*

BY W. ALLEN WALLIS AND GEOFFREY H. MOORE
Stanford University; Rutgers University and National Bureau of Economic Research

NO KNOWN SIGNIFICANCE TEST is entirely appropriate to economic time series. One shortcoming of tests in common use is that they ignore sequential or temporal characteristics; that is, they take no account of order. The standard error of estimate, for example, implicitly throws all residuals into a single frequency distribution from which to estimate a variance. Furthermore, the usual tests cannot be applied when series are analyzed by moving averages, free-hand curves, or similar devices frequently resorted to in economics for want of more adequate tools. This paper presents a test of an opposite kind, one depending solely on order. Its principal advantages are speed and simplicity, absence of assumptions about the form of population, and freedom from dependence upon "mathematically efficient" methods, such as least squares. This test is based on sequences in direction of movement, that is, upon sequences of like sign in the differences between successive observations (or some derived quantities, e.g., residuals from a fitted curve). In essence, it tests the randomness of the distribution of these sequences by length.
Each point at which the series under analysis (either the original or a derived series) ceases to decline and starts to rise, or ceases to rise and starts to decline, is called a turning point. A turning point is a "peak" if it is a (relative) maximum or a "trough" if a (relative) minimum. The interval between consecutive turning points is called a "phase." A phase is an "expansion" or a "contraction" according to whether it starts from a trough and ends at a peak, or starts from a peak and ends at a trough. For the purposes of the present test, the incomplete phase preceding the first turning point and that following the last turning point are ignored. The length or duration of a phase is the number of intervals (hereafter referred to as "years," though they may represent any system of denoting sequence) between its initial and terminal turning points. Thus, a series of N observations may contain as few as zero or as many as N - 2 turning points; and a phase may be as short as one year (when two consecutive observations are turning points) or as long as N - 3 years (when only the second and penultimate observations are turning points).

* Presented (in slightly different wording) before the Nineteenth Annual Conference of the Pacific Coast Economic Association, Stanford University, December 28, 1940, and based on research carried out at the National Bureau of Economic Research, 1939-40, under Research Associateships provided by the Carnegie Corporation of New York. A fuller account of the method and its uses will be published soon by the National Bureau of Economic Research as the first of its new series of Technical Papers.
The greater the number of consecutive rises in a series drawn at random from a stable population, the less is the probability of an additional rise; for the higher any observation may be the smaller is the chance of drawing one which exceeds it. To calculate the expected frequency distribution of phase durations, only one weak assumption need be made about the population from which the observations come, namely that the probability of two consecutive observations being identical is infinitesimal, a condition met by all continuous populations, hence by virtually all metric data.
Without further postulates about the form of the population, it is possible to conceive a mathematical transformation of it leading to a known population, but leaving unaltered the pattern of rises and falls of the original observations. For example, if each observation is replaced by its rank according to magnitude within the entire series, the ranks have exactly the same pattern of expansions and contractions as the original observations; and their distribution is simple and definite, each integer from 1 to N having a relative frequency of 1/N. The distribution of phase durations expected among random arrangements of the digits 1 to N is, therefore, comparable with the distribution observed in any set of data. A little mathematical manipulation reveals that in random arrangements of N different items the expected number of completed phases of duration d is

$$\frac{2(d^2+3d+1)(N-d-2)}{(d+3)!}.$$

The expected mean duration of a phase is essentially 1½ [the exact expression, a ratio with denominator 2N - 7, is illegible in this copy].
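The expected-frequency formula can be evaluated directly. The following is a minimal sketch, not the authors' own computation; the function name `expected_phases` is an assumption introduced here for illustration, and exact fractions are used only to avoid rounding.

```python
from fractions import Fraction
from math import factorial


def expected_phases(d: int, n: int) -> Fraction:
    """Expected number of completed phases of duration d when n distinct
    items are arranged at random: 2(d^2 + 3d + 1)(n - d - 2) / (d + 3)!.

    `expected_phases` is a hypothetical helper name, not from the paper.
    """
    if n < 4 or d < 1 or d > n - 3:
        # A completed phase lies between two turning points, so its
        # duration cannot exceed n - 3 (see the definition of a phase).
        return Fraction(0)
    return Fraction(2 * (d * d + 3 * d + 1) * (n - d - 2), factorial(d + 3))


if __name__ == "__main__":
    # Expected frequencies for a series of N = 10 items.
    for d in range(1, 5):
        print(d, float(expected_phases(d, 10)))
```

For d = 1 and d = 2 the formula reduces to 5(N - 3)/12 and 11(N - 4)/60, which is a convenient check on any implementation.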
To test the randomness of a series with respect to phase durations, the first step is to list in order the signs of the differences between successive items. Thus the sequence 0, 2, 1, 5, 7, 9, 8, 7, 9, 8 becomes +, -, +, +, +, -, -, +, -. The signs are, of course, one fewer than the observations. The second step is to make a frequency distribution of the lengths of runs in the signs. There are four completed runs in the example just given (the first and last being ignored as incomplete), of lengths 1, 3, 2, and 1. The frequency distribution thus shows two phases of one year, one of two years, and one of three years. In case consecutive items are equal but it can be assumed that sufficiently refined measurement would reveal at least a slight difference (an assumption valid whenever the test is applicable), the phase lengths are tabulated separately for each possible sequence of signs of differences between tied items; and the resultant distributions are averaged, each being weighted by the probability of securing that distribution if each difference observed as zero is equally likely to be positive or negative. Third, the expected frequency for each length of phase is calculated from the formula above, taking as N the number of items in the sequence being tested (in this case, 10). Next, the observed and expected frequency distributions are compared by computing chi-square in the usual way for testing goodness-of-fit: that is, by squaring the differences between actual frequencies and corresponding theoretical frequencies, dividing these squares by the respective theoretical frequencies, and summing the resultant ratios. In nearly all applications of the present test, avoidance of expected frequencies that are too small necessitates restricting the distribution of phase durations to three frequency classes, namely one year's duration, two years' duration, and over two years' duration, the theoretical frequencies for these classes being 5(N-3)/12, 11(N-4)/60, and (4N-21)/60, respectively.
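The steps just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions rather than the authors' procedure in full: the names `phase_lengths` and `chi_square_p` are invented here, and tied consecutive items are rejected instead of being averaged over sign assignments as the text prescribes.

```python
def phase_lengths(series):
    """Lengths of the completed runs of like sign in the first differences.

    Hypothetical helper: ties are not handled (the averaging rule for tied
    items described in the text is deliberately left out of this sketch).
    """
    signs = []
    for a, b in zip(series, series[1:]):
        if a == b:
            raise ValueError("tied consecutive items are not handled here")
        signs.append(1 if b > a else -1)
    if not signs:
        return []
    runs, count = [], 1
    for prev, cur in zip(signs, signs[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    # The first and last runs are incomplete phases and are ignored.
    return runs[1:-1]


def chi_square_p(series):
    """Chi-square for phase durations, using the three frequency classes."""
    n = len(series)
    if n < 6:
        raise ValueError("need N >= 6 so that every expected frequency is positive")
    lengths = phase_lengths(series)
    observed = [
        sum(1 for length in lengths if length == 1),
        sum(1 for length in lengths if length == 2),
        sum(1 for length in lengths if length > 2),
    ]
    expected = [5 * (n - 3) / 12, 11 * (n - 4) / 60, (4 * n - 21) / 60]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


if __name__ == "__main__":
    example = [0, 2, 1, 5, 7, 9, 8, 7, 9, 8]
    print(phase_lengths(example))          # completed phases of the text's example
    print(round(chi_square_p(example), 3))
```

For the ten-item sequence used in the text, the sketch recovers the four completed phases of lengths 1, 3, 2, and 1, and the resulting statistic is about 1.772.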
The sum of the three ratios of squared deviations to expectations is, then, similar to chi-square for two degrees of freedom, one degree of freedom being lost because a single linear constraint is imposed on the theoretical frequencies by taking the value of N from the observations. It is advisable, however, to distinguish chi-square for phase durations by a subscript p (denoting phase), because it does not quite conform to the Pearsonian distribution function ordinarily associated with the symbol $\chi^2$. The phase lengths in a single series are not entirely independent of one another; as a result, very large and very small values of $\chi_p^2$ are a little more likely than is shown by the $\chi^2$ distribution, and the mean and variance of $\chi_p^2$ generally exceed those of $\chi^2$. We have not determined the sampling distribution of $\chi_p^2$ mathematically, but have secured empirically a substitute that appears satisfactory. In the first place, a recursion formula enabled us to calculate the exact distribution of $\chi_p^2$ for small values of N. Table I gives the exact probability of obtaining a value as large as or larger than each possible value of $\chi_p^2$ for values of N from 6 to 12, inclusive. As a second step toward determining the sampling distribution, an empirical distribution of $\chi_p^2$ was secured from 700 random series, 200 for N = 25, 300 for N = 50, and 200 for N = 75. The three distributions for separate values of N were not homogeneous with one another nor with the exact distribution for N = 12; but the differences among them were unimportant for the present purposes, occurring chiefly at the higher probabilities rather than at the tail (the important region for a test of significance), and representing local irregularities rather than basic differences in form. It appears, therefore, that when N is as large as 12 a single sampling distribution of $\chi_p^2$ is sufficient.

TABLE I
DISTRIBUTIONS OF $\chi_p^2$: EXACT FOR N = 6 TO 12, AND APPROXIMATE FOR LARGER VALUES OF N
(P represents the probability that an observed $\chi_p^2$ will equal or exceed the specified value)

N = 6
  $\chi_p^2$      P
    .467    1.000
    .867     .869
   1.194     .675
   1.667     .453
   2.394     .367
   2.867     .222
  19.667     .053

N = 7
  $\chi_p^2$      P
    .552    1.000
    .733     .789
    .752     .703
    .933     .536
   1.733     .493
   2.152     .370
   2.333     .302
   3.933     .277
   5.606     .169
   7.504     .117
   8.904     .055

N = 8
  $\chi_p^2$      P
    .284    1.000
    .684     .843
    .844     .665
    .920     .590
   1.320     .560
   1.480     .506
   2.364     .495
   2.680     .471
   2.935     .392
   3.000     .299
   4.375     .293
   4.455     .235
   4.935     .194
   5.000     .133
   5.819     .064
   6.455     .033

N = 9
  $\chi_p^2$      P
    .358    1.000
   1.158     .798
   1.267     .631
   1.630     .605
   2.067     .489
   2.430     .452
   2.758     .381
   3.158     .374
   3.267     .321
   3.667     .215
   4.030     .164
   4.067     .144
   4.758     .110
   5.667     .078
   6.067     .064
   7.485     .020
  15.666     .005

N = 10
  $\chi_p^2$      P
    .328    1.000
    .614     .941
    .728     .917
   1.055     .813
   1.341     .693
   1.419     .606
   1.585     .601
   1.705     .594
   1.772     .592
   1.814     .526
   1.819     .419
   2.313     .407
   2.577     .374
   2.676     .327
   2.743     .327
   2.863     .274
   2.905     .242
   2.977     .220
   3.242     .181
   3.834     .179
   3.970     .165
   4.333     .158
   4.400     .158
   4.676     .139
   4.858     .107
   5.128     .072
   5.491     .059
   6.515     .054
   7.133     .042
  11.308     .014
  12.965     .006

N = 11
  $\chi_p^2$      P
    .479    1.000
    .579     .980
    .817     .934
    .917     .844
    .979     .730
   1.088     .723
   1.279     .655
   1.317     .576
   1.588     .537
   1.700     .473
   1.800     .472
   2.079     .468
   2.200     .467
   2.309     .466
   2.409     .440
   2.417     .403
   2.500     .392
   2.579     .384
   2.688     .304
   2.809     .274
   3.026     .261
   3.109     .230
   3.213     .201
   3.300     .147
   3.779     .147
   3.800     .147
   3.909     .133
   4.117     .128
   4.313     .126
   4.388     .099
   4.726     .091
   5.000     .077
   5.609     .077
   5.700     .076
   6.013     .055
   8.200     .050
   8.635     .032
   9.468     .022
   9.735     .018
  10.214     .009
  11.435     .004

N = 12
  $\chi_p^2$      P
   0.615    1.000
   0.661     .984
   0.748     .896
   0.794     .891
   0.837     .850
   0.971     .786
   1.015     .720
   1.061     .685
   1.415     .585
   1.461     .583
   1.637     .569
   1.683     .533
   1.933     .487
   1.948     .486
   2.067     .428
   2.156     .427
   2.203     .407
   2.289     .344
   2.333     .333
   2.556     .331
   2.615     .303
   2.661     .303
   2.733     .300
   2.837     .300
   2.870     .287
   2.883     .246
   2.956     .216
   3.267     .211
   3.415     .207
   3.489     .149
   3.933     .127
   4.070     .127
   4.156     .114
   4.348     .113
   4.394     .113
   4.571     .112
   4.616     .109
   4.733     .101
   5.667     .092
   5.803     .092
   5.889     .090
   6.025     .090
   6.733     .085
   6.842     .072
   6.956     .060
   7.504     .050
   7.622     .041
   8.576     .029
   8.822     .026
   9.237     .019
   9.267     .014
  10.556     .003
  19.667     .000

N > 12
  $\chi_p^2$      P
   5.448     .10
   5.50      .098
   5.674     .09
   5.75      .087
   5.927     .08
   6.00      .077
   6.163     .07
   6.25      .069
   6.50      .061
   6.541     .06
   6.75      .054
   6.898     .05
   7.00      .048
   7.25      .043
   7.401     .04
   7.50      .038
   7.75      .034
   8.00      .030
   8.009     .03
   8.25      .027
   8.50      .024
   8.75      .021
   8.886     .02
   9.00      .019
   9.25      .017
   9.50      .015
   9.75      .013
  10.00      .012
  10.25      .010
  10.312     .01
  10.50      .009
  10.75      .008
  11.00      .007
  11.25      .006
  11.50      .006
  11.756     .005
  12.00      .004
  13.00      .003
  14.00      .002
  15.085     .001
The mean of the 700 values of $\chi_p^2$ is 2.3049, and the variance 5.0458. As an approach to the distribution of $\chi_p^2$, it seems reasonable simply to reduce $\chi_p^2$ by approximately one-seventh and refer it to the $\chi^2$ distribution for two degrees of freedom, which has a mean of two and tables for which are readily available; and such a comparison does indicate good conformity. That the variance of the observed values is less than that of $(7/6)\chi^2$ for two degrees of freedom suggests, however, that a more satisfactory fit at the tails can be secured by using a distribution having a variance of 5, e.g., $\chi^2$ for "two and one-half degrees of freedom." For $\chi_p^2$ above about 5.5 and P below about .10, the agreement of this distribution with the observations is very satisfactory. In the main body of the distribution the function whose mean value is equated to the sample mean gives a somewhat better fit.
In practice, therefore, the procedure for interpreting $\chi_p^2$, assumed always to be calculated from three frequency classes, is as follows: If $\chi_p^2$ is less than 6.3 (the point of intersection between the ogives of $(7/6)\chi^2$ for two degrees of freedom and $\chi^2$ for two and one-half degrees of freedom), reduce it by one-seventh and refer to the usual $\chi^2$ tables for two degrees of freedom. This procedure is satisfactory for all values of $\chi_p^2$, but for values above 6.3 somewhat more accurate results are apparently secured by referring $\chi_p^2$ to the last column of Table I, which gives the distribution of $\chi^2$ for two and one-half degrees of freedom. When N < 13 the exact distributions should, of course, be used.
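The interpretation rule for N > 12 can be sketched numerically. The sketch below is an assumption-laden illustration, not the authors' tables: `approximate_p` and `chi2_upper_tail` are names invented here, the two-degree-of-freedom tail is written in its closed form exp(-x/2), and the tail for two and one-half degrees of freedom is obtained by Simpson's rule rather than read from Table I.

```python
import math


def chi2_upper_tail(x: float, df: float, width: float = 80.0, steps: int = 4000) -> float:
    """Upper-tail probability of chi-square with (possibly fractional) df,
    integrating the density from x to x + width by Simpson's rule.

    Hypothetical helper; intended for x > 0, where the integrand is smooth.
    """
    k = df / 2.0
    norm = 2.0 ** k * math.gamma(k)

    def density(t: float) -> float:
        return t ** (k - 1.0) * math.exp(-t / 2.0) / norm

    h = width / steps  # steps must be even for Simpson's rule
    total = density(x) + density(x + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(x + i * h)
    return total * h / 3.0


def approximate_p(chi_p2: float) -> float:
    """Approximate P for N > 12, following the rule stated in the text."""
    if chi_p2 < 6.3:
        # Reduce by one-seventh, then use chi-square with 2 d.f.,
        # whose upper tail is exp(-x / 2).
        return math.exp(-(6.0 / 7.0) * chi_p2 / 2.0)
    # Otherwise refer to chi-square with "two and one-half" d.f.
    return chi2_upper_tail(chi_p2, 2.5)


if __name__ == "__main__":
    for value in (1.141, 1.857, 6.898, 10.312):
        print(value, round(approximate_p(value), 3))
```

The two branches nearly agree at the crossover point of 6.3, which is why the text places the switch there.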
A simpler test of the same nature may be based on the fact that in a random sequence of N observations (where N is not too small, not less than 10, say) the total number of completed phases is normally distributed about a mean of (2N-7)/3 with variance of (16N-29)/90. (In using this test, the difference between the observed and expected numbers of phases should be reduced in absolute value by one-half unit, in order to allow for discontinuity.) This test of the total number of phases, which is essentially equivalent to a test of the mean phase duration, is normally less sensitive than the $\chi_p^2$ test, which takes account of the lengths of the phases, though the superiority of the $\chi_p^2$ test in this respect is limited by the necessity of confining the frequency distribution to three classes. Advantages of the test of the total number of phases are that it is even simpler to apply than the $\chi_p^2$ test, that its sampling distribution is known exactly and is readily available, and that it is adaptable to cases where the hypothesis alternative to the null hypothesis is either that the phases are abnormally long or that they are abnormally short.
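A minimal sketch of the test of the total number of phases follows. The function name `total_phases_test` is an assumption made for illustration. The figures in the usage lines (55 completed phases against an expected 43) are taken from the first row of Table II; the series length N = 68 is inferred here from that expected total, since (2N-7)/3 = 43 gives N = 68.

```python
import math


def total_phases_test(n_obs: int, n_phases: int):
    """Normal-approximation test of the total number of completed phases.

    Returns (z, two_sided_p). Hypothetical helper name; the mean, variance,
    and half-unit continuity correction are those stated in the text.
    """
    mean = (2 * n_obs - 7) / 3
    variance = (16 * n_obs - 29) / 90
    diff = n_phases - mean
    # Reduce the absolute difference by one-half unit, never below zero.
    corrected = max(abs(diff) - 0.5, 0.0)
    z = math.copysign(corrected, diff) / math.sqrt(variance)
    two_sided_p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, two_sided_p


if __name__ == "__main__":
    z, p = total_phases_test(68, 55)
    print(round(z, 2), p)
```

A positive z indicates more (hence shorter) phases than expected in a random series; a negative z indicates fewer (hence longer) phases, so a one-sided reading is available when the alternative hypothesis is directional.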
Application of the $\chi_p^2$ test to an economic problem may be illustrated by an analysis of sweet potato production, yield per acre, and acreage harvested in the United States, 1868-1937, as recorded on page 243 of Agricultural Statistics, 1939. The frequency distributions of phase durations in these series have been compared with those to be expected in a random sequence. From the values of $\chi_p^2$ and their corresponding probabilities, it appears that the fluctuations in production conform with what would be expected in a random series; while of the two components of total production, yield per acre conforms well and acreage harvested does not conform at all.
The figures on total production do not, of course, constitute a random series, for there is a marked upward trend in the data. In general, the method here presented is not very sensitive to primary trend. The removal of trend from a series, or its introduction into a trendless series, can change the sign of the difference between consecutive items only if the trend factor for a single year is greater than the difference between the items in trend-adjusted form. If, therefore, the residuals from trend are such that their first differences are rarely as small as the trend factor for a single year, as is frequently the case in economic time series, the distribution of phase lengths will not be much affected by the presence or absence of trend. Another factor tending to minimize the effect of trend on the test is that expansions are lengthened and contractions shortened if the trend is upward, and vice versa if it is downward, leaving the total number of phases of a given duration relatively unaffected. In such cases the existence of trend may be revealed by separate distributions for expansions and contractions. In the case of sweet potato production, both distributions conform well to the chance distribution and to one another; but for acreage the two distributions differ markedly, suggesting that the non-randomness evidenced in the acreage series may be at least partly attributable to trend.
Lack of sensitivity to primary trend is a limitation of the technique from the point of view of detecting the existence of such a trend. On the other hand, it is not difficult to determine by other methods whether a primary trend exists; the rank correlation between the variate and the date often affords a convenient test. And for determining whether the systematic variation contains secondary components, e.g., cyclical or seasonal variations, it is a decided advantage of the present method that it frequently gives satisfactory results regardless of the presence of trend, thus avoiding the complexities of trend elimination. It is possible, of course, for secondary fluctuations also to be concealed if their year to year magnitude is definitely less than year to year random movements; this is not so likely as in the case of a primary trend, but is a real possibility in the case of gradual movements, e.g., long waves.
A second example illustrates the use of $\chi_p^2$ as a criterion of the fit of moving averages, and for selecting the proper period for a moving average. If a moving average, or any other curve, describes adequately the systematic variation in a series, the residuals should constitute a random sequence. If the period is too long, waves or cycles may appear in the residuals, and if it is too short the residuals will cluster too closely about the line. To illustrate this application, ten moving averages having spans from 2 to 11 years have been fitted to the data on sweet potato acreage, and the residuals tested for randomness. Each moving average uses equal weights; the necessity of centering each average at an observation, however, means an implicit increase of one year in the span of averages based on an even number of points, with the first and last observations receiving half weight. The last two columns of Table II show the values of $\chi_p^2$, and the corresponding probabilities, obtained by testing the residuals for randomness. The moving averages based on even numbers of years give notably better results than those based on the corresponding odd numbers of items; the tapering of the weight diagram implicit in the even averages evidently improves the fit. Another striking feature is that the probabilities first rise and then decline. Thus, the odd averages attain a maximum probability of .24 at seven years while the even averages give the best result at six years, when the probability is .61. Had other weight diagrams been tested still better results might have been obtained.
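The weight diagrams described above can be written out explicitly. The sketch below is an illustration under stated assumptions, not the authors' worksheet: the names `centered_weights` and `moving_average_residuals` are invented here, and an even span s is centered by using s + 1 points with the two end points at half weight, as the text describes.

```python
def centered_weights(span: int):
    """Weights of an equal-weight moving average centered on an observation.

    Odd span s: s equal weights of 1/s.
    Even span s: s + 1 points, the first and last at half weight.
    (Hypothetical helper name.)
    """
    if span < 2:
        raise ValueError("span must be at least 2")
    if span % 2 == 1:
        return [1.0 / span] * span
    return [0.5 / span] + [1.0 / span] * (span - 1) + [0.5 / span]


def moving_average_residuals(series, span: int):
    """Residuals of the series from its centered moving average.

    Observations too near either end to be centered are dropped.
    """
    weights = centered_weights(span)
    half = len(weights) // 2
    residuals = []
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        fitted = sum(w * x for w, x in zip(weights, window))
        residuals.append(series[i] - fitted)
    return residuals


if __name__ == "__main__":
    print(centered_weights(2))   # [0.25, 0.5, 0.25]
    print(centered_weights(3))   # three equal weights of 1/3
    # A 70-item series leaves 68 residuals for spans 2 and 3, 66 for spans 4 and 5.
    print(len(moving_average_residuals(list(range(70)), 2)))
```

One consequence worth noting: the residuals from the 2-year and 3-year averages are both proportional to twice the middle item minus its two neighbors, so their sign patterns, and hence their phase distributions, coincide.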
It should be noted that a "better" result is not necessarily one in which the curve gives a closer fit, but one in which the residuals behave more like a series of independent, random observations, as judged by sequences in signs of first differences. The closest fits to the original observations are given by the shortest moving averages; but these describe not only the systematic variation but also a portion of the random fluctuations. If the moving average is either too short or too long, $\chi_p^2$ will be significantly large; but in the former case its magnitude results from an excess of short phases and a deficiency of long ones, while in the latter case the reverse is true. Table II includes the actual frequency distributions of residuals from the ten curves.
TABLE II
FREQUENCY DISTRIBUTIONS OF PHASE DURATIONS IN RESIDUALS FROM MOVING AVERAGES FITTED TO SWEET POTATO ACREAGE HARVESTED, UNITED STATES, 1868-1937

Span of moving                    Duration of phase (frequency)
average (years)               One year   Two years   Over two years   Total (frequency)   $\chi_p^2$      P
       2         Expected      27.083     11.733         4.183             43
                 Observed      46          8             1                 55             16.823     .0004
       3         Expected      27.083     11.733         4.183             43
                 Observed      46          8             1                 55             16.823     .0004
       4         Expected      26.25      11.367         4.05              41.667
                 Observed      31.25       8.25          4.5               44              1.857     .45
       5         Expected      26.25      11.367         4.05              41.667
                 Observed      19.25       9             7.75              36              5.740     .09
       6         Expected      25.417     11             3.917             40.333
                 Observed      25.5        8.25          5.25              39              1.141     .61
       7         Expected      25.417     11             3.917             40.333
                 Observed      28.5        6             5.5               40              3.287     .24
       8         Expected      24.583     10.633         3.783             39
                 Observed      25          7             6                 38              2.547     .34
       9         Expected      24.583     10.633         3.783             39
                 Observed      18.75       7.5           6.75              33              4.634     .14
      10         Expected      23.75      10.267         3.65              37.667
                 Observed      23          5.5           5.5               34              3.175     .26
      11         Expected      23.75      10.267         3.65              37.667
                 Observed      22          2             7                 31              9.861     .01

In order to compare this new test with a more elaborate procedure frequently employed in time series analysis, a power series $y = a + bx + cx^2 + dx^3 + \cdots$ was fitted by the method of least squares to the series on sweet potato acreage harvested. The calculations were carried as far as the ninth degree term, using the technique of orthogonal polynomials, but none beyond the third effected a significant reduction in the residual variance. According to the usual criterion, therefore, a third degree curve would be regarded as fitting adequately. The residuals from the third degree curve were then submitted to the present test. There were 24 one-year phases, 3 two-year phases, and 9 phases of more than two years, producing a $\chi_p^2$ of 12.47, from which it is clear that the fit of the third degree polynomial is quite inadequate. The fault, of course, lies in inferring that a third degree power series gives an adequate fit because no other power series gives a significantly better fit as judged by the standard error of estimate.
$\chi_p^2$ can also be used to test the independence of two variates, and in some circumstances is superior for this purpose to the rank correlation coefficient. The procedure is to arrange the pairs according to the order of magnitude of one variate and tabulate the distribution of phase durations in the other variate. If the two series are independent, the resulting value of $\chi_p^2$ will not be significant. A difficulty, however, is that the conclusion occasionally depends upon which variate is chosen for arranging in order and which for counting the phase durations.
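The arrangement step of the independence test can be sketched as follows. This is a hedged illustration only: `phases_after_ordering` is a name invented here, ties in either variate are assumed absent, and the toy data are constructed for the example rather than drawn from the paper.

```python
def phases_after_ordering(xs, ys):
    """Arrange the pairs by the first variate, then return the lengths of the
    completed phases in the second variate.

    Hypothetical helper; assumes no ties among xs or among consecutive ys.
    """
    ordered_ys = [y for _, y in sorted(zip(xs, ys), key=lambda pair: pair[0])]
    signs = [1 if b > a else -1 for a, b in zip(ordered_ys, ordered_ys[1:])]
    if not signs:
        return []
    runs, count = [], 1
    for prev, cur in zip(signs, signs[1:]):
        if cur == prev:
            count += 1
        else:
            runs.append(count)
            count = 1
    runs.append(count)
    return runs[1:-1]  # the incomplete first and last phases are ignored


if __name__ == "__main__":
    # Toy pairs, listed in scrambled order (illustrative data, not from the paper).
    y_by_rank = [0, 2, 1, 5, 7, 9, 8, 7, 9, 8]
    order = [5, 2, 8, 0, 9, 3, 7, 1, 6, 4]
    xs = order
    ys = [y_by_rank[i] for i in order]
    print(phases_after_ordering(xs, ys))
```

Because the result can differ when the roles of the two variates are exchanged, it may be prudent to run the arrangement both ways and compare, in line with the difficulty noted in the text.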
It is perhaps advisable to emphasize explicitly that the present test by no means utilizes all of the information in the data. In particular, it ignores the magnitude of the year to year fluctuations, treating the smallest as equivalent to the largest. A serial correlation coefficient computed from ranks may retrieve some of this information on magnitude. Another consideration in interpreting the test is that a set of phase durations which appears random when viewed only as a frequency distribution may not have been arranged at random in time. An additional point, obvious but worthy of mention, is that the time unit used may affect conclusions derived from the $\chi_p^2$ test; for example, year to year movements may appear random and month to month movements non-random, or vice versa.