THE STATISTICAL SIGN TEST*

W. J. DIXON, University of Oregon
A. M. MOOD, Iowa State College

Source: Journal of the American Statistical Association, Vol. 41, No. 236 (Dec., 1946), pp. 557-566. Published by: American Statistical Association. Stable URL: http://www.jstor.org/stable/2280577

This paper presents and illustrates a simple statistical test for judging whether one of two materials or treatments is better than the other. The data to which the test is applied consist of paired observations on the two materials or treatments. The test is based on the signs of the differences between the pairs of observations.
It is immaterial whether all the pairs of observations are comparable or not. However, when all the pairs are comparable, there are more efficient tests (the t test, for example) which take account of the magnitudes as well as the signs of the differences. Even in this case, the simplicity of the sign test makes it a useful tool for a quick preliminary appraisal of the data.
In this paper the results of previously published work on the sign test have been included, together with a table of significance levels and illustrative examples.

* This paper is an adaptation of a memorandum submitted to the Applied Mathematics Panel by the Statistical Research Group, Princeton University. The Statistical Research Group operated under a contract with the Office of Scientific Research and Development, and was directed by the Applied Mathematics Panel of the National Defense Research Committee.

INTRODUCTION

In experimental investigations, it is often desired to compare two materials or treatments under various sets of conditions. Pairs of observations (one observation for each of the two materials or treatments) are obtained for each of the separate sets of conditions. For example, in comparing the yield of two hybrid lines of corn, A and B, one might have a few results from each of several experiments carried out under widely varying conditions. The experiments may have been performed on different soil types, with different fertilizers, and in different years with consequent variations in seasonal effects such as rainfall, temperature, amount of sunshine, and so forth. It is supposed that both lines appeared equally often in each block of each experiment so that the observed yields occur in pairs (one yield for each line) produced under quite similar conditions.

The above example illustrates the circumstances under which the sign test is most useful:

(a) There are pairs of observations on two things being compared.
(b) Each of the two observations of a given pair arose under similar conditions.
(c) The different pairs were observed under different conditions.

This last condition generally makes the t test invalid.
If this were not the case (that is, if all the pairs of observations were comparable), the t test would ordinarily be employed unless there were other reasons, for example, obvious non-normality, for not using it.

Even when the t test is the appropriate technique many statisticians like to use the sign test because of its extreme simplicity. One merely counts the number of positive and negative differences and refers to a table of significance values. Frequently the question of significance may be settled at once by the sign test without any need for calculations.

It should be pointed out that, strictly speaking, the methods of this paper are applicable only to the case in which no ties in paired comparisons occur. In practice, however, even when ties would not occur if measurements were sufficiently precise, ties do occur because measurements are often made only to the nearest unit or tenth of a unit, for example. Such ties should be included among the observations with half of them being counted as positive and half negative.

Finally, it is assumed that the differences between paired observations are independent, that is, that the outcome of one pair of observations is in no way influenced by the outcome of any other pair.

PROCEDURE

Let A and B represent two materials or treatments to be compared. Let x and y represent measurements made on A and B. Let the number of pairs of observations be n. The n pairs of observations and their differences may be denoted by:

\[ (x_1, y_1),\ (x_2, y_2),\ \ldots,\ (x_n, y_n) \]

and

\[ x_1 - y_1,\ x_2 - y_2,\ \ldots,\ x_n - y_n. \]

The sign test is based on the signs of these differences. The letter r will be used to denote the number of times the less frequent sign occurs. If some of the differences are zero, half of them will be given a plus sign and half a minus sign.

As an example of the type of data for which the sign test is appropriate, we may consider the following yields of two hybrid lines of corn obtained from several different experiments. In this example n = 28 and r = 7.
YIELDS OF TWO HYBRID LINES OF CORN

Experiment   Yield of   Yield of   Sign of   |   Experiment   Yield of   Yield of   Sign of
  Number        A          B        x - y    |     Number        A          B        x - y
    1          47.8       46.1        +      |       4          40.8       41.3        -
               48.6       50.1        -      |                  39.8       40.8        -
               47.6       48.2        -      |                  42.2       42.0        +
               43.0       48.6        -      |                  41.4       42.5        -
               42.1       43.4        -      |       5          38.9       39.14       -
               41.0       42.9        -      |                  39.0       39.4        -
    2          28.9       38.8        -      |                  37.5       37.3        +
               29.0       31.1        -      |       6          36.8       37.5        -
               27.4       28.0        -      |                  35.9       37.3        -
               28.1       27.5        +      |                  33.6       34.0        -
               28.0       28.7        -      |       7          39.2       40.1        -
               28.3       28.8        -      |                  39.1       42.6        -
               26.4       26.3        +      |
               26.8       26.1        +      |
    3          33.3       32.4        +      |
               30.6       31.7        -      |

If there is no difference in the yielding ability of the two lines, the positive and negative signs should be distributed according to the binomial distribution with p = 1/2. The null hypothesis here is that each difference has a probability distribution (which need not be the same for all differences) with median equal to zero. This null hypothesis will obtain, for instance, if each difference is symmetrically distributed about a mean of zero, although such symmetry is not necessary. The null hypothesis will be rejected when the numbers of positive and negative signs differ significantly from equality.

Table 1 gives the critical values of r for the 1, 5, 10, and 25 per cent levels of significance. A discussion of how these values are computed may be found in the appendix. A value of r less than or equal to that in the table is significant at the given per cent level.

Thus in the example above where n = 28 and r = 7, there is significance at the 5% level, as shown by Table 1. That is, the chances are no more than 1 in 20 of obtaining a value of r equal to or less than 8 when there is no real difference in the yields of the two lines of corn. It is concluded, therefore, at the 5% level of significance, that the two lines have different yields.
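The counting rule and the significance statement above can be checked with a short computation. The following Python sketch is illustrative only. The function names `sign_counts`, `two_sided_level`, and `critical_r` are hypothetical and do not come from the paper; the yields are transcribed from the table of corn yields; and the tail probability is doubled on the assumption that the tabled levels count a deficiency of either sign. Ties are split half and half as described under PROCEDURE. The paper does not say what to do with an odd number of ties, so the sketch simply allows a fractional count in that case.

```python
from math import comb

# Yields of lines A and B, transcribed from the table of corn yields (28 pairs).
A = [47.8, 48.6, 47.6, 43.0, 42.1, 41.0,
     28.9, 29.0, 27.4, 28.1, 28.0, 28.3, 26.4, 26.8,
     33.3, 30.6,
     40.8, 39.8, 42.2, 41.4,
     38.9, 39.0, 37.5,
     36.8, 35.9, 33.6,
     39.2, 39.1]
B = [46.1, 50.1, 48.2, 48.6, 43.4, 42.9,
     38.8, 31.1, 28.0, 27.5, 28.7, 28.8, 26.3, 26.1,
     32.4, 31.7,
     41.3, 40.8, 42.0, 42.5,
     39.14, 39.4, 37.3,
     37.5, 37.3, 34.0,
     40.1, 42.6]

def sign_counts(x, y):
    """Return (n, r): number of pairs and count of the less frequent sign."""
    plus = sum(a > b for a, b in zip(x, y))
    minus = sum(a < b for a, b in zip(x, y))
    ties = len(x) - plus - minus
    # Ties are split half plus, half minus. An odd number of ties yields a
    # fractional r; the paper gives no rule for that case (assumption).
    return len(x), min(plus, minus) + ties / 2

def two_sided_level(n, r):
    """Chance that the less frequent sign occurs r times or fewer when p = 1/2.

    Valid for whole-number r < n/2, where the two tails do not overlap.
    """
    return 2 * sum(comb(n, j) for j in range(r + 1)) / 2**n

def critical_r(n, alpha):
    """Largest r whose two-sided level does not exceed alpha, or None."""
    best, tail = None, 0
    for i in range((n - 1) // 2 + 1):
        tail += comb(n, i)
        if 2 * tail > alpha * 2**n:
            break
        best = i
    return best

n, r = sign_counts(A, B)
r = int(r)  # there are no ties in these data, so r is a whole number
print(n, r)                              # 28 7
print(round(two_sided_level(n, r), 4))   # 0.0125
print(critical_r(n, 0.05))               # 8
```

Under these assumptions the sketch gives n = 28 and r = 7 from the data. The value r = 7 lies below the computed 5% critical value of 8 but above the computed 1% critical value of 6, in agreement with the conclusion drawn in the text.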
In general, there are no values of r which correspond exactly to the levels of significance 1, 5, 10, 25 per cent. The values given are such that they result in a level of significance as close as possible to, but not exceeding, 1, 5, 10, 25 per cent. Thus the test is a little more strict, on the average, than the level of significance which is indicated. For small samples the test is considerably more strict in some cases. For example, the value of r for n = 12 for the 10 per cent level of significance actually corresponds to a per cent level less than 5.

TABLE 1
TABLE OF CRITICAL VALUES OF r FOR THE SIGN TEST

         Per Cent Level of     |          Per Cent Level of
           Significance        |            Significance
   n     1     5    10    25   |    n     1     5    10    25
   1     -     -     -     -   |   51    15    18    19    20
   2     -     -     -     -   |   52    16    18    19    21
   3     -     -     -     0   |   53    16    18    20    21
   4     -     -     -     0   |   54    17    19    20    22
   5     -     -     0     0   |   55    17    19    20    22
   6     -     0     0     1   |   56    17    20    21    23
   7     -     0     0     1   |   57    18    20    21    23
   8     0     0     1     1   |   58    18    21    22    24
   9     0     1     1     2   |   59    19    21    22    24
  10     0     1     1     2   |   60    19    21    23    25
  11     0     1     2     3   |   61    20    22    23    25
  12     1     2     2     3   |   62    20    22    24    25
  13     1     2     3     3   |   63    20    23    24    26
  14     1     2     3     4   |   64    21    23    24    26
  15     2     3     3     4   |   65    21    24    25    27
  16     2     3     4     5   |   66    22    24    25    27
  17     2     4     4     5   |   67    22    25    26    28
  18     3     4     5     6   |   68    22    25    26    28
  19     3     4     5     6   |   69    23    25    27    29
  20     3     5     5     6   |   70    23    26    27    29
  21     4     5     6     7   |   71    24    26    28    30
  22     4     5     6     7   |   72    24    27    28    30
  23     4     6     7     8   |   73    25    27    28    31
  24     5     6     7     8   |   74    25    28    29    31
  25     5     7     7     9   |   75    25    28    29    32
  26     6     7     8     9   |   76    26    28    30    32
  27     6     7     8    10   |   77    26    29    30    32
  28     6     8     9    10   |   78    27    29    31    33
  29     7     8     9    10   |   79    27    30    31    33
  30     7     9    10    11   |   80    28    30    32    34
  31     7     9    10    11   |   81    28    31    32    34
  32     8     9    10    12   |   82    28    31    33    35
  33     8    10    11    12   |   83    29    32    33    35
  34     9    10    11    13   |   84    29    32    33    36
  35     9    11    12    13   |   85    30    32    34    36
  36     9    11    12    14   |   86    30    33    34    37
  37    10    12    13    14   |   87    31    33    35    37
  38    10    12    13    14   |   88    31    34    35    38
  39    11    12    13    15   |   89    31    34    36    38
  40    11    13    14    15   |   90    32    35    36    39
  41    11    13    14    16   |   91    32    35    37    39
  42    12    14    15    16   |   92    33    36    37    39
  43    12    14    15    17   |   93    33    36    38    40
  44    13    15    16    17   |   94    34    37    38    40
  45    13    15    16    18   |   95    34    37    38    41
  46    13    15    16    18   |   96    34    37    39    41
  47    14    16    17    19   |   97    35    38    39    42
  48    14    16    17    19   |   98    35    38    40    42
  49    15    17    18    19   |   99    36    39    40    43
  50    15    17    18    20   |  100    36    39    41    43

For n > 100, approximate values of r may be found by taking the nearest integer less than $\tfrac{1}{2}n - k\sqrt{n}$, where k = 1.3, 1, .82, .58 for the 1, 5, 10, 25 per cent values respectively. A closer approximation to the values of r is obtained from $\tfrac{1}{2}(n-1) - k\sqrt{n+1}$ and the more exact values of k, 1.2879, .9800, .8224, .5752.

The critical values of r in Table 1 for the various levels of significance were computed for the cases where either the +'s or -'s occur a significantly small number of times. Sometimes the interest may be in only one of the signs. For example, in testing two treatments, A and B, A may be identical with B except for certain additions which can only have the effect of improving B. In this case one would be interested only in whether the deficiency of minus signs (for differences in the direction A minus B) were significant or not. In cases of this kind the per cent levels of significance in Table 1 would be divided by two. Thus, 8 minus signs in a sample of 28 would correspond to the 2.5% level of significance.

SIZE OF SAMPLE

Even though there is no real difference, a sample of four or even five with all signs alike will occur by chance more than 5% of the time. Four signs alike will occur by chance 12.5% of the time and five signs alike will occur by chance 6.25% of the time. Therefore, at the 5% level of significance, it is necessary to have at least six pairs of observations even if all signs are alike before any decision can be made.

As in most statistical work, more reliable results are obtained from a larger number of observations. One would not ordinarily use the sign test for samples as small as 10 or 15, except for rough or preliminary work.

The question may be raised as to the minimum sample size necessary to detect a given difference in two materials. Suppose that in an indefinitely large number of observations 30% +'s and 70% -'s are to be expected and that we wish the sample to be large enough to detect this difference at the 1% level of significance. Although no sample, however large, will make it absolutely certain that a significant difference will be found, the sample size can be chosen to make the probability of finding a significant result as near to certainty as is desired. In Table 2, this probability has been chosen as 95%; the minimum values of n (sample size) and the corresponding critical values of r to insure a decision 95% of the time are given for various actual percentages $p_0$ and levels of significance $\alpha$.

The sign test merely measures the significance of departures from a 50-50 distribution. If the signs are actually distributed 45-55, then the departure from 50-50 is not likely to be significant unless the sample is quite large. Table 2 shows that if the signs are actually distributed 45-55, then one must take samples of 1,297 pairs in order to get a significant departure from a 50-50 distribution at the 5% level of significance. The number 1,297 is selected to give the desired significance 95% of the time; that is, if a large number of samples of 1,297 each were drawn from a 45-55 distribution, then 95% of those samples could be expected to indicate a significant departure (at the 5% level) from a 50-50 distribution.
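The sample-size criterion just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' computing procedure. The names `critical_r`, `lower_tail`, and `minimum_sample` are hypothetical; the critical value is taken as two-sided; the chance of detection is counted in the lower tail only; and the search simply tries n = 1, 2, 3, ... in turn up to an arbitrary limit `n_max`.

```python
from math import comb

def critical_r(n, alpha):
    """Largest r with 2 * P(X <= r | p = 1/2) <= alpha, or None if none exists."""
    best, tail = None, 0
    for i in range((n - 1) // 2 + 1):
        tail += comb(n, i)
        if 2 * tail > alpha * 2**n:
            break
        best = i
    return best

def lower_tail(n, i, p):
    """P(X <= i) for X binomial(n, p)."""
    return sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(i + 1))

def minimum_sample(p, alpha, power=0.95, n_max=500):
    """Smallest n (with its critical r) that detects proportion p with the
    requested probability; p is the smaller of the two sign proportions.
    Returns None if no n up to n_max (an arbitrary search limit) suffices."""
    for n in range(1, n_max + 1):
        r = critical_r(n, alpha)
        if r is not None and lower_tail(n, r, p) >= power:
            return n, r
    return None

print(minimum_sample(0.05, 0.25))  # (6, 1)
print(minimum_sample(0.05, 0.05))  # (12, 2)
```

Both results agree with the entries of Table 2 for proportions of .05(.95). Entries for proportions nearer one half call for a larger `n_max`, and the sketch may differ slightly from the table wherever the tabled values are approximate.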
TABLE 2
MINIMUM VALUES OF n NECESSARY TO FIND SIGNIFICANT DIFFERENCES 95% OF THE TIME FOR VARIOUS GIVEN PROPORTIONS

                             n                                   r
   p_0        α=1%      5%     10%     25%        α=1%      5%     10%     25%
 .45(.55)    1,777   1,297   1,080     780         833     612     512     373
 .40(.60)      442     327     267     193         193     145     119      87
 .35(.65)      193     143     118      86          78      59      49      37
 .30(.70)      106      79      67      47          39      30      26      19
 .25(.75)       66      49      42      32          22      17      15      12
 .20(.80)       44      35      28      21          13      11       9       7
 .15(.85)       32      23      18      14           8       6       5       4
 .10(.90)       24      17      13      11           5       4       3       3
 .05(.95)       15      12      11       6           2       2       2       1

The italicized values are approximate. The maximum error is about 5 for the value of n, and 2 for the value of r. The values of n and r for 5% were taken from MacStewart (reference 1), who gives a table of values of n and r for a range of confidence coefficients (the above table uses only 95%) and a single value α = 5%.

Of course, in practice one would not do any testing if he knew in advance the expected distribution of signs (that it was 45-55, for example). The practical significance of Table 2 is of the following nature: In comparing two materials one is interested in determining whether they are of about equal or of different value. Before the investigation is begun, a decision must be made as to how different the materials must be in order to be classed as different. Expressed in another way, how large a difference may be tolerated in the statement that "the two materials are of about equal value"? This decision, together with Table 2, determines the sample size. If one is interested in detecting a difference so small that the signs may be distributed 45-55, he must be prepared to take a very large sample. If, however, one is interested only in detecting larger differences (for example, differences represented by a 70-30 distribution of signs), a smaller sample will suffice.

In many investigations, the sample size can be left undetermined, and only as much data accumulated as is needed to arrive at a decision.
In such cases, the sign test could be used in conjunction with methods of sequential analysis. These methods provide a desired amount of information with the minimum amount of sampling on the average. A complete exposition of the theory and practice of sequential analysis may be found in references 3 and 4.

MODIFICATIONS OF THE SIGN TEST

When the data are homogeneous (measurements are comparable between pairs of observations), the sign test can be used to answer questions of the following kind:

1. Is material A better than B by P per cent?
2. Is material A better than B by Q units?

The first question would be tested by increasing the measurement on B by P per cent and comparing the results with the measurements on A. Thus, let $(x_1, y_1)$, $(x_2, y_2)$, $(x_3, y_3)$, etc. be pairs of measurements on A and B, and suppose one wished to test the hypothesis that the measurements, x, on A were 5% higher than the measurements, y, on B. The sign test would simply be applied to the signs of the differences

\[ x_1 - 1.05y_1,\quad x_2 - 1.05y_2,\quad x_3 - 1.05y_3,\ \text{etc.} \]

In the case of the second question the sign test would be applied to the differences

\[ x_1 - (y_1 + Q),\quad x_2 - (y_2 + Q),\quad x_3 - (y_3 + Q),\ \text{etc.} \]

In either case, if the resulting distribution of signs is not significantly different from 50-50, the data are not inconsistent with a positive answer to the question. Usually there will be a range of values of P (or Q) which will produce a non-significant distribution of signs. If one determines such a range, using the 5% level of significance for example, then that range will be a 95% confidence interval for P (or Q).

Even when the data are not homogeneous, it may be possible to frame questions of the above kind, or it may be possible to change the scales of measurement so that such questions would be meaningful.

MATHEMATICAL APPENDIX

A. Assumptions. Let observations on two materials or treatments A and B be denoted by x and y, respectively. It is assumed that for any pair of observations $(x_i, y_i)$ there is a probability p (0 < p < 1) that $x_i > y_i$ (i = 1, 2, ..., n); p is assumed to be unknown.^1 It is also assumed that the n pairs of observations $(x_i, y_i)$, (i = 1, 2, ..., n) are independent; i.e., the outcome (+ or -) for $(x_i, y_i)$ is independent of the outcome for $(x_j, y_j)$ (i ≠ j).

B. The Observations. The purpose of obtaining observations $(x_i, y_i)$ is to make an inference regarding p. The observed quantity upon which an inference is to be based is r, the number of +'s or -'s (whichever occur in fewer numbers) obtained from n paired observations $(x_i, y_i)$. On the basis of the assumption above it follows that the probability of obtaining exactly r as the minimum number of +'s or -'s is:

\[
\binom{n}{r}\left[p^{r}(1-p)^{n-r} + p^{n-r}(1-p)^{r}\right],
\qquad
\begin{cases}
r = 0, 1, 2, \ldots, \dfrac{n-1}{2}; & n \text{ odd} \\[1ex]
r = 0, 1, 2, \ldots, \dfrac{n-2}{2}; & n \text{ even}
\end{cases}
\]

\[
\binom{n}{n/2}\, p^{n/2}(1-p)^{n/2}, \qquad r = \frac{n}{2};\ n \text{ even.}
\]

C. The Inference. In the sign test the hypothesis being tested is that p = 1/2; in other words that the distributions of the differences $x_i - y_i$ (i = 1, 2, ..., n) have zero medians. For the more general tests discussed in Section 5, the hypothesis is that the differences $x_i - f(y_i)$ (i = 1, 2, ..., n) have zero medians. The function f(y) may be Py or Q + y (where P and Q are the constants mentioned in Section 5) or any other function appropriate for comparison with x in the problem at hand.

The hypothesis that p = 1/2 is tested by dividing the possible values of r into two classes and accepting or rejecting the hypothesis according as r falls in one or the other class.
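The probability formula of Section B can be checked numerically. The Python sketch below is illustrative only, and the function name `prob_r` is hypothetical. It evaluates the probability that the less frequent sign occurs exactly r times, treating separately the case of n even with r = n/2, and confirms that the probabilities over all possible r add up to one.

```python
from math import comb

def prob_r(n, r, p):
    """Probability that the less frequent sign occurs exactly r times
    among n independent pairs, each giving a + with probability p."""
    if r < 0 or 2 * r > n:
        return 0.0                      # r cannot exceed n/2
    if 2 * r == n:                      # n even: both signs occur n/2 times
        return comb(n, r) * p**r * (1 - p)**r
    return comb(n, r) * (p**r * (1 - p)**(n - r) + p**(n - r) * (1 - p)**r)

for n in (7, 8):
    total = sum(prob_r(n, r, 0.3) for r in range(n // 2 + 1))
    print(n, round(total, 12))          # the total is 1 for n odd and n even

# With p = 1/2 the cumulative probability gives the two-sided level of r.
level = sum(prob_r(28, r, 0.5) for r in range(8))
print(round(level, 4))                  # 0.0125 for n = 28, r = 7
```

With p = 1/2 the two terms in the bracket are equal, so the cumulative probability is twice a single binomial tail. This is why a critical value of r for a two-sided level α is found from one tail held to α/2.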
The classes are chosen so as to make small (say ≤ α) the chance of rejecting the hypothesis when it is true and also to make small the chance of accepting the hypothesis when it is untrue. It can be shown that in a certain sense, the best set of rejection values for r is 0, 1, ..., R where R depends on α and n. R can be determined by solving for R = maximum i in the inequality:

\[ \sum_{j=0}^{i} \binom{n}{j} \left(\tfrac{1}{2}\right)^{n} = I_{1/2}(n-i,\ i+1) \le \tfrac{1}{2}\alpha \]

where $I_x(a, b)$ is the incomplete beta function. Table 1 was computed in this way.

^1 An additional assumption is that the probability that $x_i = y_i$ is 0; thus the probability that $y_i > x_i$ is (1 - p).

D. Sample Sizes. When the sample size is small the sign test is likely to reject the hypothesis, p = 1/2, only if p is near zero or one. If p is near, but not equal to 1/2, the test is likely to reject the hypothesis, p = 1/2, only when the sample is large.

The sample size required to reject the hypothesis p = 1/2 at the α level of significance, 100λ% of the time, may be determined by finding the largest i and smallest n which satisfy:

\[ \sum_{j=0}^{i} \binom{n}{j} \left(\tfrac{1}{2}\right)^{n} \le \tfrac{1}{2}\alpha \qquad \text{and} \qquad \sum_{j=0}^{i} \binom{n}{j}\, p^{j}(1-p)^{n-j} \ge \lambda. \]

n and i are given in Table 2 for various values of p and α; λ was taken to be .95 in all cases. The tabular values for 1 - p are the same as those for p because of the symmetry of the binomial distribution.

E. Efficiency of the Sign Test. Let z = x - y. Assume z is normally distributed with mean a and variance $\sigma^2$. The probability of obtaining a + on a particular $z_i$ is:

\[ p = \frac{1}{\sqrt{2\pi}} \int_{-a/\sigma}^{\infty} e^{-\frac{1}{2}u^{2}}\, du. \]

An estimate of p involving only the signs of $z_i$ (i = 1, 2, ..., n) yields an estimate of $(a/\sigma)$. Cochran (reference 2) has shown that in large samples the variance of this estimate of $(a/\sigma)$ is

\[ 2\pi p q\, e^{(a/\sigma)^{2}} / n, \]

where q = 1 - p. We shall denote $a/\sigma$ by c.

The efficiency of an estimate based on n independent observations is defined as the limit (as n → ∞) of the ratio of the variance of an efficient estimate to that of the given estimate. An efficient estimate of c is:

\[ \frac{t}{\sqrt{n}} = \frac{\bar{z}}{\sqrt{\sum (z_i - \bar{z})^{2} / (n-1)}} \]

where t is Student's t and $\bar{z} = \sum z_i / n$.
The variance of this estimate is 1/(n - 2); thus the efficiency, E, of the sign test is $e^{-c^{2}}/(2\pi p q)$. If c = 0, then p = 1/2 and the efficiency is $2/\pi$ = 63.7%.

The preceding discussion pertains to large values of n; for small values of n, the efficiency is a little better than 63.7%. Computations were made for several smaller values of n, namely, for n = 18, 30, 44 pairs of observations at the 10% level of significance. It was found that the sign test using 18 pairs of observations is approximately equivalent to the t-test using 12 pairs of observations; for 30 pairs the equivalent t-test requires between 20 and 21 pairs; and for 44 pairs the equivalent t-test requires between 28 and 29 pairs. Cochran shows that the efficiency of r/n for estimating c decreases as |c| increases.

REFERENCES

[1] W. MacStewart, "A note on the power of the sign test," Annals of Mathematical Statistics, Vol. 12 (1941), pp. 236-238.
[2] W. G. Cochran, "The efficiencies of the binomial series test of significance of a mean and of a correlation coefficient," Journal Royal Statistical Society, Vol. C, Part I (1937), pp. 69-73.
[3] A. Wald, "Sequential method of sampling for deciding between two courses of action," Journal American Statistical Association, Vol. 40 (1945), pp. 277-306.
[4] Statistical Research Group, Columbia University, Sequential Analysis of Statistical Data: Applications (1945), Columbia University Press, New York.