A Significance Test for Time Series Analysis Author(s): W. Allen Wallis and Geoffrey H. Moore Reviewed work(s): Source: Journal of the American Statistical Association, Vol. 36, No. 215 (Sep., 1941), pp. 401409 Published by: American Statistical Association Stable URL: http://www.jstor.org/stable/2279616 . Accessed: 19/02/2013 17:28 Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at . http://www.jstor.org/page/info/about/policies/terms.jsp . JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms of scholarship. For more information about JSTOR, please contact [email protected]. . American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal of the American Statistical Association. http://www.jstor.org This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions A SIGNIFICANCE TEST FOR TIME SERIES ANALYSIS* N BY W. ALLEN WALLIS AND GEOFFREY H. MooRE and RutgersUniversity StanfordUniversity NationalBureau ofEconomicResearch SIGNIFICANCE TEST is entirelyappropriateto economic time series.One shortcomingof testsin commonuse is that they ignoresequentialor temporalcharacteristics;that is, theytake no account of order.The standarderrorof estimate,forexample,implicitly throwsall residualsinto a singlefrequencydistributionfromwhichto estimate a variance. Furthermore,the usual tests cannot be applied when series are analyzed by moving averages, free-handcurves, or similardevices frequentlyresortedto in economicsforwant of more adequate tools. This paper presentsa test of an oppositekind,one depending solely on order. Its principal advantages are speed and simplicity,absence of assumptionsabout the formof population,and freedomfromdependenceupon "mathematicallyefficient"methods, such as least squares. This test is based on sequences in directionof movement,that is, upon sequences of like sign in the differences between successive observations(or some derivedquantities,e.g., residuals froma fittedcurve). In essence, it tests the randomnessof the distributionof thesesequencesby length. Each point at which the series under analysis (eitherthe original or a derived series) ceases to decline and starts to rise, or ceases to riseand startsto decline,is called a turningpoint.A turningpointis a "peak" if it is a (relative)maximumor a "trough"if a (relative)minimum. The interval between consecutive turningpoints is called a "phase." A phase is an "expansion" or a "contraction"accordingto whetherit startsfroma troughand endsat a peak, or startsfroma peak and ends at a trough.For the purposes of the presenttest, the incompletephase precedingthe firstturningpointand that followingthe last turningpointare ignored.The lengthor durationof a phase is the numberof intervals(hereafterreferred to as "years,"thoughtheymay representany system of denotingsequence) between its initial and terminalturningpoints.Thus, a seriesof N observationsmay contain as fewas zerooras manyas N- 2 turningpoints;and a phase may be as short as one year (when two consecutive observationsare turning O KNOWN * Presented(in slightlydifferent wording)beforethe NineteenthAnnualConferenceof the Pacific Coast Economic Association,StanfordUniversity,December 28, 1940, and based on researchcarried out at the National Bureau of Economic Research,1939-40,underResearchAssociateshipsprovided by theCarnegieCorporationofNew York. A fulleraccountofthe methodand its uses willbe published soon by the National Bureau of Economic Researchas the firstof its new seriesof TechnicalPapers. 401 This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions ASSOCIATION STATISTICAL AMERICAN 402 points)or as longas N -3 years (whenonly the second and penultimate observationsare turningpoints). The greaterthe number of consecutiverises in a series drawn at randomfroma stable population,the less is the probabilityof an additional rise; forthe higherany observationmay be the smalleris the chance of drawingone whichexceeds it. To calculate the expectedfrequencydistributionofphase durations,onlyone weak assumptionneed be made about the population from which the observations come, namely that the probabilityof two consecutive observationsbeing identical is infinitesimal-a conditionmet by all continuouspopulations,henceby virtuallyall metricdata. Without furtherpostulates about the formof the population,it is of it leading to a possible to conceive a mathematicaltransformation knownpopulation,but leaving unalteredthe patternof rises and falls of the original observations.For example, if each observationis replaced by its rank accordingto magnitudewithinthe entireseries,the rankshave exactlythe same patternof expansionsand contractionsas the originalobservations;and theirdistributionis simpleand definite, each integerfrom1 to N having a relativefrequencyof 1/N. The distributionof phase durationsexpectedamong randomarrangementsof the digits 1 to N is, therefore,comparable with the distribution observedin any set of data. A littlemathematicalmanipulationreveals itemsthe expectednumber that in randomarrangementsof N different 2(d2+3d+1)(N-d-2) of completed phases of d iS(d+3)! mean durationof a phase 'is . The expected - 6 2N 7 - essentially12. To test the randomnessof a serieswith respectto phase durations, between the firststep is to list in order the signs of the differences successiveitems. Thus the sequence 0, 2, 1, 5, 7, 9, 8, 7, 9, 8 becomes +,-, +, +, +, -, -, +, -. The signs are, of course, one fewerthan the observations.The second step is to make a frequencydistribution of the lengthsofrunsin the signs.There are fourcompletedrunsin the example just given (the firstand last being ignoredas incomplete),of lengths,1, 3, 2, and 1. The frequencydistributionthus shows two phases of one year, one of two years, and one of threeyears. In case consecutive items are equal but it can be assumed that sufficiently refinedmeasurementwould reveal at least a slightdifference(an assumptionvalid wheneverthe test is applicable), the phase lengthsare This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions *A SIGNIFICANCE TEST FOR TIME SERIES ANALYSIS 403 tabulated separatelyforeach possible sequence of signs of differences betweentied items; and the resultantdistributionsare averaged, each being weightedby the probabilityof securingthat distributionif each difference observedas zero is equally likelyto be positive or negative. Third, the expected frequencyfor each length of phase is calculated fromthe formulaabove, taking as N the numberof items in the sequence beingtested-in this case, 10. Next, the observedand expected frequencydistributionsare compared by computingchi-squarein the that is, by squaring the differusual way for testinggoodness-of-fit: ences between actual frequenciesand correspondingtheoreticalfrequencies, dividing these squares by the respective theoretical frequencies, and summingthe resultantratios. In nearlyall applications ofthe presenttest,avoidance ofexpectedfrequenciesthat are too small necessitates restrictingthe distributionof phase durations to three frequencyclasses,namelyone year's duration,two years' duration,and over two years' duration,the theoreticalfrequenciesfor these classes and (4N-21)/60, respectively. being5(N-3)/12, ll(N-4)760, The sum of the threeratios of squared deviationsto expectationsis, then,similarto chi-squarefor two degrees of freedom,one degree of freedombeing lost because a singlelinear constraintis imposedon the theoreticalfrequenciesby takingthe value of N fromthe observations. It is advisable, however,to distinguishchi-squareforphase duratiolns by a subscriptp (denotingphase), because it does not quite conformto the Pearsonian distributionfunction ordinarilyassociated with the symbol X2. The phase lengthsin a single series are not entirelyindependentof one another;as a result,verylarge and verysmall values of and the xp2 are a littlemorelikelythan is shownby the x2 distribution, mean and variance of Xp2 generallyexceed those of X2. We have not determinedthe samplingdistributionof Xp2mathematically,but have secured empiricallya substitutethat appears satisfactory.In the first place, a recursionformulaenabled us to calculate the exact distribution of xp2 forsmall values of N. Table I gives the exact probabilityof obtaininga value as large as or largerthan each possible value of xp2 for values ofN from6 to 12, inclusive.As a secondstep towarddetermining the samplingdistribution,an empiricaldistributionof xp2 was secured from700 random series, 200 for N = 25, 300 for N = 50, and 200 for forseparate values ofN werenot homoN = 75. The threedistributions geneous with one anothernor with the exact distributionfor N=12; but the differencesamong them were unimportantfor the present purposes,occurringchieflyat the higherprobabilitiesratherthan at the tail (the importantregionfora test of significance),and representing This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions 404 AMERICAN STATISTICAL ASSOCIATION- TABLE I DISTRIBUTIONS OF xP2: EXACT FOR N=6 TO 12, AND APPROXIMATE FOR LARGER VALUES OF N (P representsthe probabilitythat an observedXp2will equal or exceed the specifiedvalue) N=9 N=6 P Xp2 .467 .867 1.194 1.667 2.394 2.867 19.667 1.000 .869 .675 .453 .367 .222 .053 N=7 Xp2 .552 .733 .752 .933 1.733 2.152 2.333 3.933 5.606 7.504 8.904 P 1.000 .789 .703 .536 .493 .370 .302 .277 .169 .117 .055 N=8 Xp2 .284 .684 .844 .920 1.320 1.480 2.364 2.680 2.935 3.000 4.375 4.455 4.935 5.000 5.819 6.455 P 1.000 .843 .665 .590 .560 .506 .495 .471 .392 .299 .293 .235 .194 .133 .064 .033 N=11 P Xp2 .358 1.158 1.267 1.630 2.067 2.430 2.758 3.158 3.267 3.667 4.030 4.067 4.758 5.667 6.067 7.485 15.666 1.000 .798 .631 .605 .489 .452 .381 .374 .321 .215 .164 .144 .110 .078 .064 .020 .005 N=10 Xp2 .328 .614 .728 1.055 1.341 1.419 1.585 1.705 1.772 1.814 1.819 2.313 2.577 2.676 2.743 2.863 2.905 2.977 3.242 3.834 3.970 4.333 4.400 4.676 4.858 5.128 5.491 6.515 7.133 11.308 12.965 P 1.000 .941 .917 .813 .693 .606 .601 .594 .592 .526 .419 .407 .374 .327 .327 .274 .242 .220 .181 .179 .165 .158 .158 .139 .107 .072 .059 .054 .042 .014 .006 Xp2 .479 .579 .817 .917 .979 1.088 1.279 1.317 1.588 1.700 1.800 2.079 2.200 2.309 2.409 2.417 2.500 2.579 2.688 2.809 3.026 3.109 3.213 3.300 3.779 3.800 3.909 4.117 4.313 4.388 4.726 5.000 5.609 5.700 6.013 8.200 8.635 9.468 9.735 10.214 11.435 N=12 P 1.000 .980 .934 .844 .730 .723 .655 .576 .537 .473 .472 .468 .467 .466 .440 .403 .392 .384 .304 .274 .261 .230 .201 .147 .147 .147 .133 .128 .126 .099 .091 .077 .077 .076 .055 .050 .032 .022 .018 .009 .004 Xp2 0.615 0.661 0.748 0.794 0.837 0.971 1.015 1.061 1.415 1.461 1.637 1.683 1.933 1.948 2.067 2.156 2.203 2.289 2.333 2.556 2.615 2.661 2.733 2.837 2.870 2.883 2.956 3.267 3.415 3.489 3.933 4.070 4.156 4.348 4.394 4.571 4.616 4.733 5.667 5.803 5.889 6.025 6.733 6.842 6.956 7.504 7.622 8.576 8.822 9.237 9.267 10.556 19.667 This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions N>12 P 1.000 .984 .896 .891 .850 .786 .720 .685 .585 .583 .569 .533 .487 .486 .428 .427 .407 .344 .333 .331 .303 .303 .300 .300 .287 .246 .216 .211 .207 .149 .127 .127 .114 .113 .113 .112 .109 .101 .092 .092 .090 .090 .085 .072 .060 .050 .041 .029 .026 .019 .014 .003 .000 Xp2 6.448 5.50 6.674 5.75 P .10 .098 .09 .087 5.927 .08 6.163 .07 6.541 .06 6.898 .05 7.401 .04 6.00 6.25 6.50 6.75 7.00 7.25 7.50 7.75 8.00 8.009 .077 .069 .061 .054 .048 .043 .038 .034 .030 .03 8.25 8.50 8.75 .027 .024 .021 8.886 .02 9.00 9.25 9.50 9.75 10.00 10.25 10.312 10.50 10.75 11.00 11.25 11.50 .019 .017 .015 .013 .012 .010 .01 .009 .008 .007 .006 .006 11.756 .005 15.085 .001 12.00 13.00 14.00 .004 .003 .002 *A SIGNIFICANCE TEST FOR TIME SERIES ANALYSIS 405 local irregularitiesrather than basic differences in form.It appears, therefore,that when N is as large as 12 a singlesamplingdistribution Of Xp2is sufficient. The mean of the 700 values of Xp2is 2.3049, and the variance 5.0458. As an approach to the distributionof Xp2, it seems reasonable simply to reduce Xp2 by approximatelyone-seventhand referit to the x2 distributionfor two degrees of freedom,which has a mean of two and tables forwhichare readilyavailable; and such a comparisondoes indicate good conformity.That the variance of the observed values is less than that of (7/6)X2fortwo degreesof freedomsuggests,however, that a moresatisfactoryfit at the tails can be securedby using a distributionhaving a varianceof 5, e.g., x2 for "two and one-halfdegrees offreedom."For xp2above about 5.5 and P below about .10, the agreement of this distributionwith the observationsis very satisfactory.In the main body of the distributionthe functionwhose mean value is equated to the sample mean gives a somewhatbetterfit. In practice,therefore,the procedurefor interpretingXp2, assumed always to be calculated fromthree frequencyclasses, is as follows: If xp2 is less than 6.3 (the point of intersectionbetweenthe ogives of (7/6)x2fortwo degreesof freedomand x2fortwo and one-halfdegrees of freedom),reduce it by one-seventhand referto the usual x2 tables fortwo degreesof freedom.This procedureis satisfactoryforall values Of xp2,but forvalues above 6.3 somewhatmoreaccurate resultsare apparentlysecuredby referring Xp2 to the last column of Table I, which gives the distributionof x2 for two and one-halfdegrees of freedom. When N < 13 the exact distributionsshould,of course,be used. A simplertest of the same naturemay be based on the fact that in a randomsequence ofN observations(whereN is not too small-not less than 10, say) the total numberof completedphases is normallydistributedabout a mean of (2N-7)/3 withvarianceof (16N-29)/90. (In betweenthe observedand expectednumusing this test,the difference bers of phases should be reducedin absolute value by one-halfunit,in order to allow for discontinuity.)This test of the total number of phases, which is essentiallyequivalent to a test of the mean phase duration,is normallyless sensitivethan the xp2 test, which takes accountof the lengthsofthe phases,thoughthe superiorityof the xp2test in this respectis limitedby the necessityof confiningthe frequency distributionto threeclasses. Advantagesofthe test ofthe total number of phases are that it is even simplerto apply than the xp2test,that its sampling distributionis known exactly and is readily available, and that it is adaptable to cases wherethe hypothesisalternativeto the null This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions 406 AMERICAN STATISTICAL ASSOCIATION* hypothesisis eitherthat the phases are abnormallylong or that they are abnormallyshort. Applicationofthe x%2testto an economicproblemmay be illustrated by an analysis of sweet potato production,yield per acre, and acreage harvestedin the United States, 1868-1937,as recordedon page 243 of Agricultural Statistics,1939. The frequencydistributionsof phase durationsin these serieshave been comparedwiththose to be expectedin a random sequence. From the values of Xp2and their corresponding probabilities,it appears that the fluctuationsin productionconform withwhat would be expectedin a randomseries;whileof the two componentsof total production,yield per acre conformswell and acreage harvesteddoes not conformat all. The figureson total productiondo not, of course, constitutea random series,forthereis a markedupward trendin the data. In general, the methodhere presentedis not verysensitiveto primarytrend.The removal of trend from a series, or its introductioninto a trendless series,can changethe sign of the difference betweenconsecutiveitems only if the trendfactorfora singleyear is greaterthan the difference between the items in trend-adjustedform.If, therefore,the residuals fromtrendare such that theirfirstdifferences are rarelyas small as the trendfactorfora singleyear,as is frequentlythe case in economictime series,the distributionofphase lengthswillnot be muchaffectedby the presenceor absence of trend.Anotherfactortendingto minimizethe effectof trendon the test is that expansionsare lengthenedand contractionsshortenedif the trendis upward,and vice versa if it is downward,leaving the total numberof phases of a given durationrelatively unaffected.In such cases the existenceof trend may be revealed by separate distributionsfor expansionsand contractions.In the case of sweetpotato production,both distributionsconformwell to the chance distributionand to one another;but foracreage the two distributions differmarkedly,suggestingthat the non-randomnessevidencedin the acreage seriesmay be at least partlyattributableto trend. Lack of sensitivityto primarytrendis a limitationof the technique fromthe pointofview ofdetectingthe existenceofsuch a trend.On the otherhand, it is not difficult to determineby othermethodswhethera primarytrendexists-the rank correlationbetweenthe variate and the date oftenaffordsa convenienttest. And fordeterminingwhetherthe systematicvariation contains secondarycomponents,e.g., cyclical or seasonal variations,it is a decided advantage of the presentmethod that it frequentlygives satisfactoryresultsregardlessof the presenceof trend,thusavoidingthe complexitiesoftrendelimination.It is possible, This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions *A SIGNIFICANCE TEST FOR TIME SERIES ANALYSIS 407 of course,forsecondaryfluctuationsalso to be concealed if theiryear to year magnitudeis definitelyless than year to year random movements; this is not so likelyas in the case of a primarytrend,but is a real possibilityin the case of gradual movements-e.g., long waves. A second exampleillustratesthe use of Xp2as a criterionof the fitof moving averages, and for selecting the proper period for a moving average. If a movingaverage, or any othercurve,describesadequately the systematicvariation in a series,the residuals should constitutea randomsequence. If the periodis too long,waves or cyclesmay appear in the residuals,and ifit is too shortthe residualswill clustertoo closely about the line. To illustrate this application, ten moving averages having spans from2 to 11 years have been fittedto the data on sweet potato acreage,and the residualstested forrandomness.Each moving average uses equal weights;the necessityof centeringeach average at an observation,however,means an implicitincreaseof one year in the span of averages based on an even numberof points,withthe firstand last observationsreceivinghalfweight.The last two columnsof Table II show the values of Xp2,and the corresponding probabilities,obtained by testingthe residuals for randomness.The moving averages based on even numbersof years give notablybetterresultsthan those based on the correspondingodd numbersof items; the taperingof the weight diagramimplicitin the even averages evidentlyimprovesthe fit.Anotherstrikingfeatureis that the probabilitiesfirstriseand thendecline. Thus, the odd averages attain a maximumprobabilityof .24 at seven years while the even averages give the best resultat six years, when the probabilityis .61. Had other weight diagrams been tested still betterresultsmighthave been obtained. It should be noted that a "better" result is not necessarilyone in whichthe curvegives a closerfit,but one in whichthe residualsbehave more like a series of independent,random observations,as judged by The closest fitsto the original sequences in signs of firstdifferences. observationsare given by the shortestmoving averages; but these describe not only the systematicvariation but also a portion of the random fluctuations.If the movingaverage is eithertoo short or too large; but in the formercase its magnitude long,Xr2will be significantly resultsfroman excess of short phases and a deficiencyof long ones, whilein the lattercase the reverseis true. Table II includesthe actual frequencydistributionsof residualsfromthe ten curves. In orderto compare this new test with a more elaborate procedure frequentlyemployedin time series analysis, a power series y=a+bx was fittedby the methodof least squares to the series +cx2+dx3+ ... This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions 408 AMERICAN STATISTICAL ASSOCIATION* TABLE II FREQUENCY DISTRIBUTIONS AVERAGES FITTED OF PHASE DURATIONS IN RESIDUALS FROM MOVING TO SWEET POTATO ACREAGE HARVESTED UNITED STATES, 1868-1937 Duration of phase Span of moving average (years) Over Total One Two Oe To two (frequency) year years years (frequency) (frequency) (frequency) x2 p X 2 Expected Observed 27.083 46 11.733 8 4.183 1 43 55 16.823 .0004 3 Expected Observed 27.083 46 11.733 8 4.183 1 43 55 16.823 .0004 4 Expected Observed 26.25 31.25 11.367 8.25 4.05 4.5 41.667 44 1.857 .45 5 Expected Observed 26.25 19.25 11.367 9 4.05 7.75 41.667 36 5.740 .09 6 Expected Observed 25.417 25.5 11 8.25 3.917 5.25 40.333 39 1.141 .61 7 Expected Observed 25.417 28.5 11 6 3.917 5.5 40.333 40 3.287 .24 8 Expected Observed 24.583 25 10.633 7 3.783 6 39 38 2.547 .34 9 Expected Observed 24.583 18.75 10.633 7.5 3.783 6.75 39 33 10 Expected Observed 23.75 23 10.267 5.5 3.65 5.5 37.667 34 3.175 .26 11 Expected Observed 23.75 22 10.267 2 3.65 7 37.667 31 9.861 .01 4.634 .14 on sweetpotato acreage harvested.The calculationswere carriedas far as the ninthdegreeterm,usingthetechniqueoforthogonalpolynomials, but none beyond the third effecteda significantreduction in the residual variance. Accordingto the usual criterion,therefore,a third degree curve would be regarded as fittingadequately. The residuals fromthe thirddegree curve were then submittedto the presenttest. There were24 one-yearphases,3 two-yearphases,and 9 phases ofmore than two years,producinga Xp2of 12.47,fromwhichit is clear that the fit of the third degree polynomialis quite inadequate. The fault, of course,lies in inferring that a thirddegreepower series gives an adequate fitbecause no otherpower series gives a significantly betterfit as judged by the standard errorof estimate. This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions *A SIGNIFICANCE TEST FOR TIME SERIES ANALYSIS 409 xp2can also be used to test the independenceof two variates,and in some circumstancesis superiorforthis purposeto the rank correlation coefficient. The procedureis to arrangethe pairs accordingto the order of magnitude of one variate and tabulate the distributionof phase durationsin the other variate. If the two series are independent,the resultingvalue of xp2 will not be significant.A difficulty, however,is that the conclusionoccasionallydepends upon whichvariate is chosen forarrangingin orderand whichforcountingthe phase durations. It is perhaps advisable to emphasizeexplicitlythat the presenttest by no means utilizesall of the information in the data. In particular,it ignoresthe magnitude of the year to year fluctuations,treatingthe smallest as equivalent to the largest. A serial correlationcoefficient computedfromranksmay retrievesome of this information on magnitude. Another considerationin interpretingthe test is that a set of phase durations which appears random when viewed only as a frequency distributionmay not have been arrangedat random in time. An additional point, obvious but worthyof mention,is that the time unit used may affectconclusionsderivedfromthe Xp2test; forexample, year to year movementsmay appear random and month to month movementsnon-random,or vice versa. This content downloaded on Tue, 19 Feb 2013 17:28:43 PM All use subject to JSTOR Terms and Conditions
© Copyright 2026 Paperzz