Dixon and Mood 1946 - The Statistical Sign Test

The Statistical Sign Test
Author(s): W. J. Dixon and A. M. Mood
Reviewed work(s):
Source: Journal of the American Statistical Association, Vol. 41, No. 236 (Dec., 1946), pp. 557566
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2280577 .
Accessed: 19/02/2013 17:39
Your use of the JSTOR archive indicates your acceptance of the Terms & Conditions of Use, available at .
http://www.jstor.org/page/info/about/policies/terms.jsp
.
JSTOR is a not-for-profit service that helps scholars, researchers, and students discover, use, and build upon a wide range of
content in a trusted digital archive. We use information technology and tools to increase productivity and facilitate new forms
of scholarship. For more information about JSTOR, please contact [email protected].
.
American Statistical Association is collaborating with JSTOR to digitize, preserve and extend access to Journal
of the American Statistical Association.
http://www.jstor.org
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
THE STATISTICAL SIGN TEST*
W. J. DIXON
of Oregon
University
A. M. MOOD
Iowa State College
This paper presentsand illustratesa simplestatisticaltest
forjudgingwhetherone oftwo materialsor treatmentsis betterthantheother.The data to whichthetestis appliedconsist
of paired observationson the two materialsor treatments.
betweenthe
The test is based on the signs of the differences
pairs of observations.
It is immaterialwhetherall the pairs of observationsare
comparableor not. However,when all the pairs are compartests (the t test, for example)
able, there are more efficient
whichtake account of the magnitudesas well the signsof the
Even in this case, the simplicityof the sign test
differences.
appraisal ofthe
makesit a usefultool fora quick preliminary
data.
In thispapertheresultsofpreviouslypublishedworkon the
sign test have been included,togetherwith a table of significance levels and illustrativeexamples.
IN
INTRODUCTION
EXPERIMENTAL
investigations,it is oftendesired to compare two
materials or treatmentsunder various sets of conditions.Pairs of
observations(one observationforeach of the two materials or treatments) are obtained for each of the separate sets of conditions.For
example,in comparingthe yieldof two hybridlines of corn,A and B,
one mighthave a few results fromeach of several experimentscarried out under widelyvaryingconditions.The experimentsmay have
and
fertilizers,
been performedon differentsoil types, with different
yearswithconsequentvariationsin seasonal effectssuch as
in different
rainfall,temperature,amount of sunshine,and so forth.It is supposed
that both linesappeared equally oftenin each block ofeach experiment
so that the observedyields occurin pairs (one yield foreach line) produced under quite similarconditions.
The above example illustratesthe circumstancesunder which the
signtestis mostuseful:
(a) There are pairs of observationson two thingsbeing compared.
* This paper is an adaptation of a memorandumsubmittedto the Applied MathematicsParel by
the StatisticalResearch Group,PrincetonUniversity.The StatisticalResearch Group operatedunder
a contractwiththe Officeof ScientificResearch and Development,and was directedby the Applied
Mathematic Panel ofthe National DefenseResearchCommittee.
557
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
558
AMERICAN
STATISTICAL
ASSOCIATION
(b) Each ofthe two observationsof a given pair arose undersimilar
conditions.
(c) The different
pairs were observedunder different
conditions.
This last conditiongenerallymakes the t test invalid. If this were not
the case (that is, if all the pairs of observationswere comparable),the
ttest would ordinarilybe employedunlesstherewereotherreasons,for
example,obvious non-normality,
fornot usingit.
Even whenthe t test is the appropriatetechniquemanystatisticians
like to use the sign test because of its extremesimplicity.One merely
counts the numberof positiveand negativedifferences
and refersto a
table of significancevalues. Frequently the question of significance
may be settled at once by the sign test withoutany need forcalculations.
It should be pointed out that, strictlyspeaking,the methodsofthis
paper are applicable only to the case in which no ties in paired comparisons occur. In practice,however,even when ties would not occur
if measurementswere sufficiently
precise,ties do occur because measurementsare oftenmade only to the nearest unit or tenth of a unit
forexample. Such ties should be includedamong the observationswith
half ofthembeing counted as positiveand halfnegative.
Finally, it is assumed that the differences
between paired observations are independent,that is, that the outcome of one pair of observations is in no way influencedby the outcome of any otherpair.
PROCEDURE
Let A and B representtwo materialsor treatmentsto be compared.
Let x and y representmeasurementsmade on A and B. Let the number of pairs of observationsbe n. The n pairs of observationsand their
differences
may be denoted by:
and
(XI,
X1 -
Yl1),
Yl,
(X2,
X2 -
y2),
Y2
Yn)
***,(X.,
...*
Xn -
Yn.
The sign test is based on the signs of these differences.
The letter r
willbe used to denotethe numberoftimestheless frequentsign occurs.
If some ofthe differences
are zero,halfofthemwillbe givena plus sign
and halfa minussign.
As an example ofthe type ofdata forwhichthe sign test is appropriate, we may considerthe followingyields of two hybridlines of corn
obtained fromseveral differentexperiments.In this example n =28
and r=7.
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
THE STATISTICAL
559
SIGN TEST
in theyieldingabilityofthetwolines,the
If thereis no difference
by thebinomialdispositiveand negativesignsshouldbe distributed
tributionwith p= . The null hypothesishere is that each difference
distribution
has a probability
(whichneednotbe thesameforall difwithmedianequal to zero.This nullhypothesis
willobtain,
ferences)
is symmetrically
about a
distributed
forinstance,if each difference
is notnecessary.The nullhymeanofzero,althoughsuchsymmetry
pothesiswillbe rejectedwhenthe numbersof positiveand negative
fromequality.
signsdiffer
significantly
YIELDS
Experiment
Number
1
2
8
LINES OF CORN
B
Sign of
x-y
Experiment
Number
47.8
48.6
47.6
43.0
42.1
41.0
46.1
50.1
48.2
48.6
+
-
4
43.4
_
28.9
29.0
27.4
28.1
28.0
28.3
26.4
26.8
38.8
31.1
28.0
27.5
28.7
28.8
26.3
26.1
+
_
33.3
30.6
32.4
31.7
+
_
A
Yield of
OF TWO HYBRID
42.9
-
-
+
+
A
Yield of
B
Sign of
x-v
_
_
+
40.8
39.8
42.2
41.4
41.3
40.8
42.0
42.5
38.9
39.0
37.5
39.14
39.4
37.3
+
6
36.8
35.9
33.6
37.5
37.3
34.0
-
7
39.2
39.1
40.1
42.6
-
5
-
-
Table 1 givesthecriticalvaluesofr forthe 1, 5, 10,and 25 percent
A discussionofhowthesevaluesare computed
levelsofsignificance.
maybe foundin theappendix.A value ofr lessthanorequal to that
at thegivenpercentlevel.
in thetableis significant
Thus in the example above where n = 28 and r = 7, there is sig-
at the 5% level,as shownby Table 1. That is, the chances
nificance
are only1 in 20 ofobtaining
a value ofr equal to orlessthan8 when
thereis no real difference
in the yieldsofthe two linesof corn.It is
at the5% levelof significance,
thatthetwolines
concluded,
therefore,
havedifferent
yields.
In general,thereare no valuesofr whichcorrespond
exactlyto the
levelsofsignificance
1, 5, 10, 25 per cent.The values givenare such
thattheyresultin a level of significance
as closeas possibleto, but
notexceeding1,5, 10,25 percent.Thus?thetestis a littlemorestrict,
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
TABLE 1
TABLE OF CRITICAL
VALUES OF r FOR THE SIGN TEST
Per Cent Level of
Significance
n
1
5
Per Cent Level of
Significance
10
25
n
1
5
10
25
-
-
0
0
0
0
51
52
53
54
55
15
16
16
17
17
18
18
18
19
19
19
19
20
20
20
20
21
21
22
22
0
0
0
0
0
0
1
1
0
0
1
1
1
1
1
1
2
2
56
57
58
59
60
17
18
18
19
19
20
20
21
21
21
21
21
22
22
23
23
23
24
24
25
11
12
13
14
15
0
1
1
1
2
1
2
2
2
3
2
2
3
3
3
3
3
3
4
4
61
62
63
64
65
20
20
20
21
21
22
22
23
23
24
23
24
24
24
25
25
25
26
26
27
16
17
18
19
20
2
2
3
3
3
3
4
4
4
5
4
4
5
5
5
5
5
6
6
6
66
67
68
69
70
22
22
22
23
23
24
25
25
25
26
25
26
26
27
27
27
28
28
29
29
21
22
23
24
25
4
4
4
5
5
5
5
6
6
7
6
6
7
7
7
7
7
8
8
9
71
72
73
74
75
24
24
25
25
25
26
27
27
28
28
28
28
28
29
29
30
30
31
31
32
26
27
28
29
30
6
6
6
7
7
7
7
8
8
9
8
8
9
9
10
9
10
10
10
11
76
77
78
79
80
26
26
27
27
28
28
29
29
30
30
30
30
31
31
32
32
32
33
33
34
31
32
33
34
35
7
8
8
9
9
9
9
10
10
11
10
10
11
11
12
11
12
12
13
13
81
82
83
84
85
28
28
29
29
30
31
31
32
32
32
32
33
33
33
34
34
35
35
36
36
36
37
38
39
40
9
10
10
11
11
11
12
12
12
13
12
13
13
13
14
14
14
14
15
15
86
87
88
89
90
30
31
31
31
32
33
33
34
34
35
34
35
35
36
36
37
37
38
38
39
41
42
43
44
45
11
12
12
13
13
13
14
14
15
15
14
15
15
16
16
16
16
17
17
18
91
92
93
94
95
32
33
33
34
34
35
36
36
37
37
37
37
38
38
38
39
39
40
40
41
46
47
48
49
50
13
14
14
15
15
15
16
16
17
17
16
17
17
18
18
18
19
19
19
20
96
97
98
99
100
34
35
35
36
36
37
38
38
39
39
39
39
40
40
41
41
42
42
43
43
1
2
3
4
5
6
7
8
9
10
_
-
-
-
_
_
_
-
-
-
For n > 100,approximatevalues of r may be foundby takingthenearestintegerlessthan in -k v n,
wherek =1.3, 1, .82, .58 forthe 1, 5,10, 25 per centvalues respectively.A closerapproximationto the
values of r is obtained fromi(n -1)-k -4 In+1 and the more exact values of k, 1.2879, .9800, .8224,
.5752.
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
THE STATISTICAL
561
SIGN TEST
whichis indicated.For
on the average,thanthe level ofsignificance
morest-ictin somecases. For
smallsamplesthetestis considerably
thevalueofr forn= 12forthe10percentlevelofsignificance
example,
to a percentlevellessthan5.
actuallycorresponds
The criticalvaluesofr inTable 1 forthevariouslevelsofsignificance
werecomputedforthecases whereeitherthe +'s or -'s occura sigsmallnumberof times.Sometimesthe interestmay be in
nificantly
A and B,
onlyoneofthesigns.Forexample,intestingtwotreatments,
which
can only
B
certain
additions
with
for
except
A maybe identical
of
In
this
be
interested
would
B.
case
one
have the effect improving
in the
of minussigns(fordifferences
onlyin whetherthe deficiency
or not. In cases ofthiskindthe
A minusB) weresignificant
direction
inTable 1 wouldbe dividedbytwo.Thus,
percentlevelsofsignificance
to the2.5% levelof
8 minussignsin a sampleof28 wouldcorrespond
significance.
SIZE
OF SAMPLE
a sampleoffourorevenfive
Even thoughthereis no realdifference,
withall signs alikewilloccurby chancemorethan 5% of the time.
Foursignsalikewilloccurby chance12.5% ofthetimeand fivesigns
at the 5%
alikewilloccurby chance6.25% of the time.Therefore,
to haveat leastsixpairsofobservaitis necessary
levelofsignificance,
tionseven if all signsare alike beforeany decisioncan be made.
As in moststatisticalwork,morereliableresultsare obtainedfrom
use thesign
Onewouldnotordinarily
a largernumberofobservations.
testforsamplesas smallas 10 or 15, exceptforroughor preliminary
work.
samplesizenecessary
The questionmaybe raisedas to theminimum
Supposethatin an indefiintwomaterials.
to detecta givendifference
nitelylargenumberofobservations
30% +'s and 70% -'s are to be
expectedand thatwewishthesampleto be largeenoughto detectthis
Althoughno sample,howat the 1% level ofsignificance.
difference
difference
everlarge,willmakeit absolutelycertainthata significant
willbe found,the samplesize can be chosento makethe probability
a significant
of finding
resultas near to certaintyas is desired.In
Table 2, this probability
has been chosenas 95%; the minimum
criticalvalues of r
values of n (samplesize) and the corresponding
to insurea decision95% ofthetimeare givenforvariousactual pera.
centagespoand levelsofsignificance
froma
ofdepartures
The signtestmerelymeasuresthesignificance
45-55, then
50-50 distribution.
If the signsare actuallydistributed
unlessthe
the departurefrom50-50 is not likelyto be significant
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
562
AMERICAN
STATISTICAL
ASSOCIATION
sample is quite large. Table 2 shows that if the signs are actually
distributed45-55, then one must take samples of 1,297 pairs in order
to get a significantdeparture froma 50-50 distributionat the 5%
level of significance.The number1,297 is selected to give the desired
significance95% of the time; that is, if a large numberof samples of
1,297 each were drawn froma 45-55 distribution,then 95% of those
samples could be expected to indicate a significantdeparture (at the
5% level) froma 50-50 distribution.
TABLE 2
MINIMUM VALUES OF n NECESSARY TO FIND SIGNIFICANT
95% OF THE TIME FOR VARIOUS
DIFFERENCES
GIVEN PROPORTIONS
Pe
.45(.55)
.40(.60)
.35(.65)
.30(.70)
.25(.75)
.20(.80)
.15(.85)
.10(.90)
.05(.95)
r
nt
=1%
1,777
442
193
106
66
44
32
24
15
5%
10%
25%
a=1%
5%
10%
25%
1,297
327
143
79
49
35
23
17
12
1,080
267
118
67
42
28
18
13
11
780
193
86
47
32
21
14
11
6
833
193
78
39
22
13
8
5
2
612
145
59
30
17
11
6
4
2
612
373
87
37
19
12
7
4
3
1
119
49
26
15
9
5
3
2
The italicizedvalues are approximate.The maximumerroris about 5 forthe value of n, and 2
forthe value of r. The values of n and r for5% weretaken fromMacStewart (reference1) who gives
a table ofvalues of n and r fora rangeof confidencecoefficients
(the above table uses only 95%) and a
singlevalue a ' 5%.
Of course,in practice one would not do any testingif he knew in
advance the expected distributionof signs (that it was 45-55, for
example). The practical significanceof Table 2 is of the following
nature: In comparingtwo materials one is interestedin determining
value. Before the inwhetherthey are of about equal or of different
vestigationis begun, a decision must be made as to how different
the materials must be in order to be classed as different.Expressed
may be toleratedin the statein anotherway, how large a difference
mentthat "the two materialsare ofabout equal value?" This decision,
togetherwithTable 2, determinesthe sample size. If one is interested
in detectinga differenceso small that the signs may be distributed
45-55, he must be preparedto take a very large sample. If, however,
one is interestedonlyin detectinglargerdifferences,
(forexample,differencesrepresentedby a 70-30 distributionofsigns),a smallersample
willsuffice.
In many investigations,the sample size can be leftundetermined,
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
THE STATISTICAL
563
SIGN TEST
and onlyas muchdata accumulatedas is needed to arriveat a decision.
In such cases, the signtest could be used in conjunctionwithmethods
of sequential analysis. These methods provide a desiredamount of
informationwith the minimumamount of sampling on the average.
A completeexpositionofthe theoryand practiceofsequential analysis
may be foundin references3 and 4.
MODIFICATIONS
OF THE SIGN TEST
When the data are homogeneous (measurementsare comparable
between pairs of observations),the sign test can be used to answer
questionsofthe followingkind:
1. Is materialA betterthan B by P per cent?
2. Is materialA betterthan B by Q units?
The firstquestion would be tested by increasingthe measurementon
B by P per cent and comparingthe resultswiththe measurementson
A. Thus, let
(X1, y,), (X2, Y2), (X3, Y3),
etc.
be pairs of measurementson A and B, and suppose one wishedto test
the hypothesisthat the measurements,x, on A were 5% higherthan
the measurements,y, on B. The sign test would simplybe applied to
the signs of the differences
xl-
1.05y1, X2 - 1.05Y2,
X3 -
1.05y3, etc.
In the case ofthe second questionthe sign test would be applied to the
differences
x-
(Yl + Q), x2 - (Y2 + Q), X3 -
(Y3 +
Q),etc.
In eithercase, if the resultingdistributionof signs is not significantly
differentfrom 50-50, the data are not inconsistentwith a positive
answer to the question. Usually there will be a range of values of P
distributionof signs. If
(or Q) which will produce a non-significant
a
one determinessuch range,using the 5% level ofsignificanceforexample, then that range will be a 95% confidenceintervalforP (or Q).
Even when the data are not homogeneous,it may be possible to
framequestionsofthe above kind, or it may be possible to change the
scales of measurementso that such questions would be meaningful.
MATHEMATICAL
APPENDIX
A. Assumptions.Let observationson two materialsor treatmentsA
and B be denoted by x and y, respectively.It is assumed that forany
that
pair of observations (xi, yi) there is a probabilityp(O<p<1)
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
564
AMERICAN
STATISTICAL
ASSOCIATION
*, n); p is assumed to be unknown.' It is also
xi>yi (i=l, 2,
assumed that the n pairs of observations (xi, yi), (i=l, 2,
, n)
are independent;i.e., the outcome (+ or -) for(xi, y1)is independent
of the outcomefor (xi, y,) (isj).
B. The Observations.
The purpose of obtainingobservations(xi, yj)
is to make an inferenceregardingp. The observedquantityupon which
an inferenceis to be based is r, the numberof +'s or -'s (whichever
occur in fewernumbers) obtained fromn paired observations(xi, yi).
On the basis ofthe assumptionabove it followsthat the probabilityof
obtainingexactlyr as the minimumnumberof +'s or -'s is:
In\
(n)
r
[pr(1
-
p) n-r
+ pn-r(l
-
p)r]
r = 0, 1, 2,
n -i
,
X
; n odd
~~~~~~~~~~~~2
n-2
r-OJ1 2}},
-
;neven
2~~~
r = 2; n even.
Vpnlp)in
(~~
C. The Inference.In the sign test the hypothesisbeing testedis
that p=J; in other words that the distributionsof the differences
Xi-yi (i= 1, 2, * , n) have zero medians. For the more general
tests discussedin Section5, the hypothesisis that the differences
xi-f(yi) (i= 1, 2, * * , n) have zero medians. The functionf(y) may
be Py or Q+y (whereP and Q are the constantsmentionedin Section
5) or any other functionappropriate for comparisonwith x in the
problemat hand.
The hypothesisthat p= 2 is tested by dividingthe possiblevalues of
r into two classes and accepting or rejectingthe hypothesisaccording
as r fallsin one or the otherclass. The classes are chosenso as to make
small (say < a) the chance of rejecting the hypothesis when it is
true and also to make small the chance of accepting the hypothesis
when it is untrue.It can be shown that in a certainsense, the best set
of rejectionvalues forr is 0, 1, * *, 1,
R whereR depends on a and n.
R can be determinedby solvingforR =maximum i in the inequality:
( )(42)=
I&n-
i, i+ 1) _
c
whereI. (a, b) is the incompletebeta function.Table 1 was computed
in this way.
1 An additional assumptionis that the probabilityAi -Bi is 0; thus the probabilityBj >A, is
(1-p).
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
THE STATISTICAL
565
SIGN TEST
D. SampleSizes.Whenthesamplesizeis smallthesigntestis likely
to rejectthehypothesis,
p= 4,onlyifp is nearzeroorone.If p is near,
butnotequalto 4,thetestis likelyto rejectthehypothesis,
p= 4,only
whenthesampleis large.
The samplesize requiredto rejectthehypothesis
p = 4at thea level
ofsignificance,
by finding
the
100X%ofthetime,maybe determined
largesti and smallestn whichsatisfy:
-a
~0C)(~Y
and
j=0
-
(.pi(
p)
f-i
>\
n andi aregivenin Table II forvariousvaluesofp and a; Xwas taken
to be .95in all cases.The tabularvaluesfor1-p arethesameas those
forp becauseofthesymmetry
ofthebinomialdistribution.
E. Efficiency
of theSign Test. Let z = x-y. Assume z is normally
distributed
withmeana and variancecr2.The probabality
ofobtaining
a + on a particular
zi is:
1
p \=2r
00?
e-lu'du.
An estimateof p involvingonlythe signs of zi (i 1, 2,
an estimateof
* n) yields
Cochran(reference
2) has shownthat in large
(a/co).
samplesthevariance
ofthisestimate of (a/cr)is 2rpq e(aU)2 /n., We shall
denotea/crby c.
The efficiency
ofan estimatebased on n independent
observations
is defined
as thelimit(as n-> oo) oftheratioofthevarianceofan ef-
estimateto thatofthe givenestimate.An efficient
ficient
estimateof
c is:
t
wheret is Student's t and z -zi/n.
The varianceof this estimateis 1/(n-2); thus the efficiency,
E,
is
of the signtestis e_c2/27rpq.If c = 0, thenp= and the efficiency
2/7r=63.7%.
The precedingdiscussionpertainsto largevalues of n; forsmall
is a littlebetterthan63.7%. Computations
valuesofn, theefficiency
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions
566
AMERICAN STATISTICAL ASSOCIATION
were made forseveral smaller values of n, namely,forn = 18, 30, 44
pairs of observationsat the 10% level of significance.It was found
that the sign test using 18 pairs of observations is approximately
equivalent to the t-testusing 12 pairs of observations; for 30 pairs
the equivalent t-testrequires between 20 and 21 pairs; and for 44
pairs the equivalent t-testrequiresbetween 28 and 29 pairs. Cochran
shows that the efficiencyof r/n for estimating c decreases as Ic|
increases.
REFERENCES
[1] W. MacStewart, 'A note on the power of the sign test," Annals of MathematicalStatistics,Vol. 12 (1941), pp. 236-238.
of
of the binomialseriestest of significance
[2] W. G. Cochran,"The efficiencies
Journal Royal StatisticalSociety,
a mean and of a correlationcoefficient,"
Vol. C, Part I (1937), pp. 69-73.
[3] A. Wald, "Sequentialmethodofsamplingfordecidingbetweentwo coursesof
action,"JournalAmericanStatisticalAssociation,Vol. 40 (1945), pp. 277-306.
[4] Statistical Research Group, Columbia University,Sequential Analysis of
StatisticalData: Applications(1945), Columbia UniversityPress,New York.
This content downloaded on Tue, 19 Feb 2013 17:39:22 PM
All use subject to JSTOR Terms and Conditions