Chapter 9 Nonparametric Statistics

Statistical analyses that do not depend on knowledge of the
distribution or the parameters of the population are called
nonparametric or distribution-free methods.
9.1 Sign Test
The simplest of all nonparametric methods is the sign test, which
is usually used to test the significance of the difference between two
population means in a paired experiment.
It is particularly suitable when the various pairs are observed under
different conditions, a case in which the assumption of normality
may not hold. However, because of its simplicity, the sign test is
often used even when the populations are normally distributed. As
its name implies, the test uses only the sign of the difference
between the paired variates.
If the two population means are equal, the signs "+" and "-" are
equally likely. Let D denote the sign of $X_1 - X_2$.
If p denotes the probability of a difference D being positive and
q the probability of its being negative, the null hypothesis is
$p = 1/2$. The appropriate test statistic is X, the number of plus
signs, with $X \sim B(n, p)$.
To test the hypothesis
$$H_0: \mu_1 - \mu_2 = 0, \qquad H_1: \mu_1 - \mu_2 < 0,$$
we will reject $H_0$ in favor of $H_1$ only if the proportion of plus
signs is sufficiently less than 1/2, that is, when the value x of our
random variable is small. Hence, if the computed P-value
$$P = P(X \le x \text{ when } p = 1/2)$$
is less than or equal to the significance level $\alpha$, we reject
$H_0$ in favor of $H_1$.
To test the hypothesis
$$H_0: \mu_1 - \mu_2 = 0, \qquad H_1: \mu_1 - \mu_2 \ne 0,$$
we reject $H_0$ in favor of $H_1$ when the proportion of plus signs is
significantly less than or significantly greater than 1/2. This, of
course, is equivalent to x being sufficiently small or sufficiently
large, respectively. Therefore, if $x < n/2$ and the computed P-value
$$P = 2P(X \le x \text{ when } p = 1/2)$$
is less than or equal to $\alpha$, or if $x > n/2$ and the computed
P-value
$$P = 2P(X \ge x \text{ when } p = 1/2)$$
is less than or equal to $\alpha$, we reject $H_0$ in favor of $H_1$.
Example 9.1.1
A taxi company is trying to decide whether the use of radial tires
instead of regular belted tires improves fuel economy. Sixteen cars
are equipped with radial tires and driven over a prescribed test
course. Without changing drivers, the same cars are then equipped
with the regular belted tires and driven once again over the test
course. The gasoline consumption, in kilometers per liter, was
recorded as follows:
Car    Radial tires    Belted tires    D
 1         4.2             4.1         +
 2         4.7             4.9         -
 3         6.6             6.2         +
 4         7.0             6.9         +
 5         6.7             6.8         -
 6         4.5             4.4         +
 7         5.7             5.7         0
 8         6.0             5.8         +
 9         7.4             6.9         +
10         4.9             4.9         0
11         6.1             6.0         +
12         5.2             4.9         +
13         5.7             5.3         +
14         6.9             6.5         +
15         6.8             7.1         -
16         4.9             4.8         +
Can we conclude using the 5% level of significance that cars
equipped with radial tires have better fuel economy than those
equipped with regular belted tires?
Solution. Let $\mu_1$ and $\mu_2$ represent the mean kilometers per
liter for cars equipped with radial and belted tires, respectively.
1. $H_0: \mu_1 - \mu_2 = 0$, $H_1: \mu_1 - \mu_2 > 0$.
2. Test statistic: binomial variable X with p = 1/2.
3. $\alpha = 0.05$.
4. Calculations: After replacing each positive difference by a "+"
symbol and each negative difference by a "-" symbol, and then
discarding the two zero differences, we obtain the sequence
+ - + + - + + + + + + + - +
for which $n = 14$ and $x = 11$. Using the normal-curve approximation
with $\mu = np = 7$ and $\sigma = \sqrt{npq} = \sqrt{14}/2$, we find
$$P = P(X \ge 11) \approx P\left(Z > \frac{10.5 - 7}{\sqrt{14}/2}\right) = P(Z > 1.87) = 0.0307.$$
5. Decision: Since $P = 0.0307 < 0.05$, we reject $H_0$ and conclude
that, on the average, radial tires do improve fuel economy.
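The P-value in this example can also be checked numerically. The following is a minimal sketch, assuming only the Python standard library; the function names are illustrative and not part of the text. It computes the exact binomial tail probability alongside the normal approximation with continuity correction.

```python
from math import comb, erfc, sqrt

def sign_test_upper_p(n, x):
    """Exact P(X >= x) for X ~ B(n, 1/2)."""
    return sum(comb(n, k) for k in range(x, n + 1)) / 2 ** n

def sign_test_upper_p_normal(n, x):
    """Normal approximation to P(X >= x) with continuity correction."""
    mu = n / 2               # np with p = 1/2
    sigma = sqrt(n) / 2      # sqrt(npq) with p = q = 1/2
    z = (x - 0.5 - mu) / sigma
    return 0.5 * erfc(z / sqrt(2))   # P(Z > z) for standard normal Z

n, x = 14, 11                # 14 nonzero differences, 11 plus signs
p_exact = sign_test_upper_p(n, x)
p_normal = sign_test_upper_p_normal(n, x)
print(round(p_exact, 4), round(p_normal, 4))
```

The exact tail probability is 470/16384, about 0.0287, slightly below the approximate value 0.0307; both lead to the same decision at the 0.05 level.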
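Advantages and limitations of the sign test
The sample size n must be reasonably large. With n = 5, for example,
a two-sided test at the 0.05 level can never reject the hypothesis
that the population means are equal: even in the most extreme case,
where all five signs are plus, $P = 2(1/2)^5 = 0.0625 > 0.05$.
A two-sided test therefore needs n to be at least 6, and the larger
the better.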
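The sample-size restriction follows from the smallest P-value a two-sided sign test can produce, which occurs when all n signs agree. A small sketch of that calculation, with hypothetical helper names:

```python
def min_two_sided_p(n):
    """Smallest two-sided sign-test P-value for n nonzero differences."""
    return min(1.0, 2 * 0.5 ** n)   # all n signs alike

def smallest_n(alpha):
    """Smallest n for which a two-sided sign test can reject at level alpha."""
    n = 1
    while min_two_sided_p(n) > alpha:
        n += 1
    return n

for n in range(4, 9):
    print(n, min_two_sided_p(n))
print("smallest n at the 0.05 level:", smallest_n(0.05))
```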
9.2 Rank-Sum Test
The Wilcoxon rank-sum test, or Wilcoxon two-sample test, is an
appropriate alternative to the two-sample t-test.
Suppose we have a sample $(x_1, x_2, \dots, x_{n_1})$ from the
population of X and a sample $(y_1, y_2, \dots, y_{n_2})$ from the
population of Y.
Hypothesis: $\mu_1 = \mu_2$ (the two population means are equal).
Let $n = n_1 + n_2$.
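In Wilcoxon's test, the n variates are ranked according to
increasing size, for example

variates:  x1   y1   x2   x3   y2   ...
ranks:      1    2    3    4    5   ...

The rank sums of the two samples are then w1 = 1 + 3 + 4 + ... and
w2 = 2 + 5 + ... . If w1 < w2, find
$$P = P\{\textstyle\sum R_i \le w_1\},$$
the probability under $H_0$ that the ranks $R_i$ assigned to the first
sample sum to at most $w_1$. If $P < \alpha$ (the significance level),
then reject $H_0$; otherwise, we accept $H_0$.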
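Example
Test whether a new method of manufacturing cement has increased the
compressive strength.
New method:      148, 143, 138, 145, 141
Standard method: 139, 136, 142, 133, 140
Does this indicate that the new method has increased the compressive
strength (5% level)?
Solution.
$H_0: \mu_{standard} = \mu_{new}$
$H_1: \mu_{standard} < \mu_{new}$
The variates belonging to the sample with the smaller mean (the
standard method) are marked with an asterisk.
Original data: 133* 136* 138  139* 140* 141  142* 143  145  148
Ranks:           1    2    3    4    5    6    7    8    9   10
w1 = 1 + 2 + 4 + 5 + 7 = 19
w2 = 3 + 6 + 8 + 9 + 10 = 36
The total of all ranks is w1 + w2 = 19 + 36 = 55.
(If the new method had no significant effect, w1 and w2 should be
about the same; indeed, under $H_0$ any group of 5 of the 10 ranks is
as likely as any other to belong to the standard method.)
There are 12 ways of choosing 5 ranks whose sum is less than or equal
to 19, that is, $N(\sum R_i \le w_1) = 12$. Since $\binom{10}{5} = 252$,
$$P = P\{\textstyle\sum R_i \le 19\} = \frac{12}{252} = 0.0476 < 0.05 = \alpha,$$
so we reject $H_0$. (The probability is computed with the classical
definition, as a ratio of equally likely cases.)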
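The count of 12 favourable cases out of 252 can be reproduced by brute-force enumeration. The sketch below assumes nothing beyond the data of the example and the Python standard library; the variable names are illustrative.

```python
from itertools import combinations

new = [148, 143, 138, 145, 141]
standard = [139, 136, 142, 133, 140]

# Rank the pooled observations from 1 (smallest); these data have no ties.
pooled = sorted(new + standard)
rank = {value: i + 1 for i, value in enumerate(pooled)}

w1 = sum(rank[v] for v in standard)   # rank sum of the standard method
w2 = sum(rank[v] for v in new)        # rank sum of the new method

# Under H0 every set of 5 ranks out of 10 is equally likely for 'standard'.
n = len(pooled)
subsets = list(combinations(range(1, n + 1), len(standard)))
favourable = sum(1 for s in subsets if sum(s) <= w1)
p_value = favourable / len(subsets)

print(w1, w2, favourable, len(subsets), round(p_value, 4))
```

Full enumeration is practical only for small samples, since the number of subsets grows quickly with n.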
If there is no significant difference between the two sample means,
the total of the ranks corresponding to the first sample and the total
corresponding to the second sample should be about the same. If,
however, the total of the ranks for one sample is appreciably less
than that of the other, we calculate, under the hypothesis of
equal population means, the probability of obtaining by chance
alone a sum of ranks less than or equal to that obtained in the given
experiment. If this probability is less than the significance level, we
reject the hypothesis; otherwise, we accept it.
The procedure can be better understood through an example.
Example 9.2.1
The following data give the results of tests on two
preparations of a fly spray, in terms of the
percentage of mortality.
Sample A: 68, 69, 59, 72, 64, 67, 70, 74
Sample B: 60, 67, 61, 62, 67, 63, 56, 58
The ranks are determined as follows:
Original data: 56 58 59 60 61 62 63 64 67 67 67 68 69 70 72 74
Ranks:          1  2  3  4  5  6  7  8 10 10 10 12 13 14 15 16
In the case of ties (identical observations), we replace the
observations by the mean of the ranks that the observations would
have if they were distinguishable. Here the ninth, tenth, and
eleventh observations are identical, so we assign rank 10 to each of
the three observations.
Here $n_1 = 8$, $n_2 = 8$, $n = n_1 + n_2 = 16$. Taking $w_1$ as the
sum of the ranks of Sample B,
$$w_1 = 1 + 2 + 4 + 5 + 6 + 7 + 10 + 10 = 45.$$
Since $w_1 + w_2 = \dfrac{n(n+1)}{2}$,
$$w_2 = \frac{(16)(17)}{2} - 45 = 91.$$
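The tie rule and the two rank sums can be sketched in a few lines of Python. The helper `midranks` is a hypothetical name, and the sixteen observations are entered as they appear in the ranked list (Sample A containing one 68 and one 69), which is an assumption about the intended data.

```python
def midranks(values):
    """Map each distinct value to its rank; ties get the mean of their ranks."""
    first, count = {}, {}
    for position, v in enumerate(sorted(values), start=1):
        first.setdefault(v, position)      # first rank position of this value
        count[v] = count.get(v, 0) + 1     # how many times it occurs
    return {v: first[v] + (count[v] - 1) / 2 for v in first}

sample_a = [68, 69, 59, 72, 64, 67, 70, 74]
sample_b = [60, 67, 61, 62, 67, 63, 56, 58]

rank = midranks(sample_a + sample_b)
w_b = sum(rank[v] for v in sample_b)   # w1 in the text
w_a = sum(rank[v] for v in sample_a)   # w2 in the text

n = len(sample_a) + len(sample_b)
print(rank[67], w_b, w_a, n * (n + 1) / 2)
```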
In choosing repeated samples of size $n_1$ and $n_2$, we would
expect $w_1$, and therefore $w_2$, to vary. Thus we may think of $w_1$
and $w_2$ as values of the random variables $W_1$ and $W_2$,
respectively. The null hypothesis $\mu_1 = \mu_2$ will be rejected in
favor of the alternative $\mu_1 < \mu_2$ only if $w_1$ is sufficiently
small; the alternative $\mu_1 > \mu_2$ is accepted if $w_2$ is
sufficiently small; and the alternative $\mu_1 \ne \mu_2$ is accepted
if the minimum of $w_1$ and $w_2$ is sufficiently small.
In actual practice we usually base our decision on the value
$$u_1 = w_1 - \frac{n_1(n_1 + 1)}{2} \quad \text{or} \quad u_2 = w_2 - \frac{n_2(n_2 + 1)}{2}$$
of the related statistic $U_1$ or $U_2$, respectively, or on the value
u of the statistic U, the minimum of $U_1$ and $U_2$. These statistics
simplify the construction of tables of critical values, since both
$U_1$ and $U_2$ have symmetric sampling distributions and assume
values in the interval from 0 to $n_1 n_2$ such that
$u_1 + u_2 = n_1 n_2$.
From the formulas for $u_1$ and $u_2$ we see that $u_1$ will be small
when $w_1$ is small and $u_2$ will be small when $w_2$ is small.
Consequently, the null hypothesis will be rejected whenever the
appropriate statistic $U_1$, $U_2$, or U assumes a value less than or
equal to the desired critical value given in Table A. The various test
procedures are summarized in Table 9.1.
Table 9.1 Rank-Sum Test
null hypothesis      alternative hypothesis      calculate
$\mu_1 = \mu_2$      $\mu_1 < \mu_2$             $u_1$
                     $\mu_1 > \mu_2$             $u_2$
                     $\mu_1 \ne \mu_2$           $u$
Table F gives critical values of $U_1$ and $U_2$ for some levels of
significance. If the observed value of $u_1$, $u_2$, or u is less than
or equal to the tabled critical value, the null hypothesis is rejected
at the level of significance indicated by the table.
Suppose, for the example above, that we wish to test whether there is
a significant difference (5% level) between the two preparations. In
other words, we wish to test the null hypothesis that $\mu_1 = \mu_2$
against the two-tailed alternative that $\mu_1 \ne \mu_2$ at the 0.05
level of significance for random samples of size $n_1 = 8$ and
$n_2 = 8$ that yield the value $w_1 = 45$. It follows that
$$u_1 = w_1 - \frac{1}{2} n_1 (n_1 + 1) = 45 - \frac{1}{2}(8)(9) = 9.$$
Our two-tailed test is based on the statistic $U_1$, since $u_1$ is
the smaller of $u_1$ and $u_2$. Using Table F, we reject the null
hypothesis of equal means when $u_1 \le 13$ (the rejection region has
the form $\{u \le u_0\}$).
Since $u_1 = 9$ falls in the rejection region, the null hypothesis can
be rejected and we conclude that there is a significant difference
between the two preparations.
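The arithmetic of this decision can be traced with a short sketch. The function name `u_statistics` is hypothetical, and the critical value 13 is simply the figure quoted from the table in the text, not something the code derives.

```python
def u_statistics(w1, n1, n2):
    """Return (u1, u2) from the rank sum w1 of the first sample."""
    n = n1 + n2
    w2 = n * (n + 1) / 2 - w1            # the rank sums add up to n(n+1)/2
    u1 = w1 - n1 * (n1 + 1) / 2
    u2 = w2 - n2 * (n2 + 1) / 2
    return u1, u2

n1, n2, w1 = 8, 8, 45
u1, u2 = u_statistics(w1, n1, n2)
u = min(u1, u2)                          # statistic for the two-tailed test

critical_value = 13                      # quoted in the text for n1 = n2 = 8
reject = u <= critical_value
print(u1, u2, u1 + u2 == n1 * n2, reject)
```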
Under the null hypothesis that the two samples come from identical
populations, it can be shown that the mean and the variance of the
sampling distribution of $U_1$ are
$$\mu_{U_1} = \frac{n_1 n_2}{2} \quad \text{and} \quad \sigma_{U_1}^2 = \frac{n_1 n_2 (n + 1)}{12}.$$
If there are ties in rank, these formulas provide only approximations,
but if the number of ties is small, these approximations will
generally be good.
Since numerical studies have shown that the sampling
distribution of $U_1$ can be approximated closely by a normal
distribution when $n_1$ and $n_2$ are both greater than 8, the test of
the null hypothesis that the two samples come from identical
populations can be based on
$$Z = \frac{U_1 - \mu_{U_1}}{\sigma_{U_1}},$$
which is a random variable having approximately the standard
normal distribution.
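A sketch of the large-sample statistic follows, using only the standard library. The function names are illustrative, and the numbers plugged in (u1 = 9 with n1 = n2 = 8) come from the fly-spray example purely to show the arithmetic; those sample sizes sit at the boundary of the stated condition, so the resulting P-value should be read as an illustration rather than a recommended analysis.

```python
from math import erfc, sqrt

def u_normal_z(u1, n1, n2):
    """Standardize U1 with mean n1*n2/2 and variance n1*n2*(n1+n2+1)/12."""
    mean = n1 * n2 / 2
    variance = n1 * n2 * (n1 + n2 + 1) / 12
    return (u1 - mean) / sqrt(variance)

def two_sided_p(z):
    """P(|Z| >= |z|) for a standard normal Z."""
    return erfc(abs(z) / sqrt(2))

z = u_normal_z(9, 8, 8)
p = two_sided_p(z)
print(round(z, 3), round(p, 4))
```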
The use of the Wilcoxon rank-sum test is not restricted to
nonnormal populations. It can be used in place of the two-sample
t-test when the populations are normal, although the power will be
smaller. The Wilcoxon rank-sum test is always superior to the t-test
for decidedly nonnormal populations.
9.3 Signed-Rank Test
The Wilcoxon signed-rank test combines the ideas of the sign test and
the rank-sum test.
The reader should note that the sign test utilizes only the plus and
minus signs of the differences between the paired observations in the
two-sample case, or between the observations and $\mu_0$ in the
one-sample case. It does not take into consideration the magnitudes of
these differences. A test utilizing both direction and magnitude was
proposed in 1945 by Frank Wilcoxon and is called the Wilcoxon
signed-rank test.
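1. Principle
One-sample case: $H_0: \mu = \mu_0$, with $D_i = x_i - \mu_0$.
Paired case: $H_0: \mu_1 - \mu_2 = 0$, with $D_i = x_i - y_i$.
Rank the nonzero $|D_i|$ and let
$w_+ = \sum R_i^{(+)}$, the sum of the ranks of the positive differences,
$w_- = \sum R_i^{(-)}$, the sum of the ranks of the negative differences.
When $H_0$ is true, $w_+$ and $w_-$ should be nearly equal.
$H_1: \mu < \mu_0$ is supported when $w_+$ is small and $w_-$ is large.
$H_1: \mu > \mu_0$ is supported when $w_+$ is large and $w_-$ is small.
Two-tailed $H_1$: let $w = \min(w_+, w_-)$; $H_1$ is supported when w
is sufficiently small.
For a one-tailed test: if P-value $= P\{W \le w\} \le \alpha$, reject
$H_0$, where w is the observed value of the appropriate statistic.
For a two-tailed test: if P-value $= 2P\{W \le w\} \le \alpha$, reject
$H_0$.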
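Example (comparing the effects of two wheat-seed treatments)
Is there a significant difference between the two seed
treatments?
Solution
Let X denote the variates of the first sample, Y those of the second,
and D = X - Y.
D = X - Y:    58   32   30    5   -7    6   11   10
rank of |D|:   8    7    6    1    3    2    5    4
n = 8,
w- = 3 (the sum of the ranks of the negative differences),
w+ = n(n+1)/2 - w- = 33.
There are a total of $2^8 = 256$ equally likely possibilities for
the sequence of signs. Listing the signs in order of rank, those with
a negative rank sum of at most 3 are:
No negative D-value:   1 case  (++++++++)
One negative D-value:  3 cases (-+++++++), (+-++++++), (++-+++++)
Two negative D-values: 1 case  (--++++++)
$P\{\sum R_i \le w_-\} = 5/256 = 0.0195$
$P = 2P\{\sum R_i \le w_-\} = 0.0391 < 0.05$
(reject $H_0$).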
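The five favourable sign patterns can be confirmed by listing all 256 of them. The sketch below uses only the differences given in the example; the variable names are illustrative.

```python
from itertools import product

d = [58, 32, 30, 5, -7, 6, 11, 10]

# Rank the absolute differences (these data have no ties and no zeros).
ordered = sorted(abs(v) for v in d)
rank = {a: i + 1 for i, a in enumerate(ordered)}
w_minus = sum(rank[abs(v)] for v in d if v < 0)
w_plus = sum(rank[abs(v)] for v in d if v > 0)

# Under H0 each rank is attached to a '-' sign with probability 1/2.
n = len(d)
favourable = 0
for signs in product([0, 1], repeat=n):          # 1 marks a negative sign
    negative_sum = sum(r for r, s in zip(range(1, n + 1), signs) if s)
    if negative_sum <= w_minus:
        favourable += 1

p_one_tailed = favourable / 2 ** n
p_two_tailed = 2 * p_one_tailed
print(w_minus, w_plus, favourable, p_one_tailed, p_two_tailed)
```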
Table 9-16 Signed-Rank Test
To test $H_0$        versus $H_1$         calculate
$\mu = \mu_0$        $\mu < \mu_0$        $w_+$
                     $\mu > \mu_0$        $w_-$
                     $\mu \ne \mu_0$      $w$
$\mu_1 = \mu_2$      $\mu_1 < \mu_2$      $w_+$
                     $\mu_1 > \mu_2$      $w_-$
                     $\mu_1 \ne \mu_2$    $w$
2. Table lookup (critical value $w_0$)
It should be noted that whenever $n < 5$ and the level of
significance does not exceed 0.05 for a one-tailed test or 0.10 for a
two-tailed test, all possible values of $w_+$, $w_-$, or w will lead
to the acceptance of the null hypothesis.
When $5 \le n \le 30$, Appendix Table G shows approximate critical
values of $W_+$, $W_-$, and W for some levels.
For example, when $n = 10$, Table G shows that a value of
$w_+ \le 5$ ($w_0 = 5$) is required for the one-tailed alternative
$\mu < \mu_0$ to be significant at the 0.01 level.
In the example above, n = 8, and for a two-tailed test at the 0.05
level the table gives $w_0 = 4$.
Since $w = w_- = 3 < 4 = w_0$, we reject $H_0$.
Example 9.3.1
The following data represent the number of hours
that a rechargeable hedge trimmer operates before a recharge is
required: 1.5, 2.2, 0.9, 1.3, 2.0, 1.6, 1.8, 1.5, 2.0, 1.2, and 1.7. Use
the signed-rank test to test the hypothesis at the 0.05 level of
significance that this particular trimmer operates with mean 1.8
hours before requiring a recharge.
Solution
1. $H_0: \mu = 1.8$, $H_1: \mu \ne 1.8$.
2. $\alpha = 0.05$.
3. Critical region: Since n = 10 after discarding the one
measurement that equals 1.8, Table G (Table 1.16) shows the critical
region to be $w \le 8$.
4. Calculation: Subtracting 1.8 from each measurement and
then ranking the differences by their absolute values, we have
d_i:    -0.3   0.4  -0.9  -0.5   0.2  -0.2  -0.3   0.2  -0.6  -0.1
Ranks:   5.5     7    10     8     3     3   5.5     3     9     1
Now $w_+ = 13$ and $w_- = 42$, so that $w = 13$, the smaller of $w_+$
and $w_-$.
5. Decision: Do not reject $H_0$ and conclude that the average
operating time is not significantly different from 1.8 hours.
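The signed-rank sums for the hedge-trimmer data can be reproduced with a short sketch. The helper `signed_rank_sums` is a hypothetical name. The measurements are entered in tenths of an hour, an implementation choice made here so that tied differences compare exactly instead of being split by floating-point rounding.

```python
def signed_rank_sums(diffs):
    """Return (w_plus, w_minus); zero differences dropped, ties get midranks."""
    nonzero = [d for d in diffs if d != 0]
    first, count = {}, {}
    for position, a in enumerate(sorted(abs(d) for d in nonzero), start=1):
        first.setdefault(a, position)
        count[a] = count.get(a, 0) + 1
    rank = {a: first[a] + (count[a] - 1) / 2 for a in first}
    w_plus = sum(rank[abs(d)] for d in nonzero if d > 0)
    w_minus = sum(rank[abs(d)] for d in nonzero if d < 0)
    return w_plus, w_minus

# Operating times in tenths of an hour; hypothesized mean 1.8 h -> 18.
hours_tenths = [15, 22, 9, 13, 20, 16, 18, 15, 20, 12, 17]
diffs = [h - 18 for h in hours_tenths]

w_plus, w_minus = signed_rank_sums(diffs)
w = min(w_plus, w_minus)
print(w_plus, w_minus, w, w <= 8)   # critical region from the text: w <= 8
```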
Example 9.3.2
It is claimed that a college senior can increase his score in the major
field area of the graduate record examination by at least 50 points if
he is provided with sample problems in advance. To test this claim,
18 college seniors are divided into 9 pairs such that each matched
pair has almost the same overall quality point average for their first 3
years in college. Sample problems and answers are provided at
random to one member of each pair 1 week prior to the examination.
The following examination scores were recorded:
Pair:                       1    2    3    4    5    6    7    8    9
With sample problems:     531  621  663  579  451  660  591  719  543
Without sample problems:  509  540  688  502  424  683  568  748  530
Test the null hypothesis at the 0.05 level of significance that sample
problems increase the scores by 50 points against the alternative
hypothesis that the increase is less than 50 points.
Solution
Let $\mu_1$ and $\mu_2$ represent the mean score of all students
taking the test in question with and without sample problems,
respectively.
1. $H_0: \mu_1 - \mu_2 = 50$, $H_1: \mu_1 - \mu_2 < 50$.
2. $\alpha = 0.05$.
3. Critical region: Since n = 9, Table 1.16 shows the critical region
to be $w_+ \le 8$.
4. Calculation:
Pair:         1    2    3    4    5    6    7    8    9
d_i:         22   81  -25   77   27  -23   23  -29   13
d_i - d_0:  -28   31  -75   27  -23  -73  -27  -79  -37
Ranks:        4    5    8  2.5    1    7  2.5    9    6
Now we find that $w_+ = 5 + 2.5 = 7.5$.
5. Decision: Reject $H_0$ and conclude that sample problems do not,
on the average, increase one's graduate record score by as much as
50 points.
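The critical region $w_+ \le 8$ for n = 9 can be cross-checked against the exact null distribution of $W_+$. The sketch below counts, by dynamic programming, how many subsets of the ranks 1 to n have each possible sum. It assumes untied, nonzero differences, so for data with ties such as this example (two differences of absolute value 27) it is only a guide; `signed_rank_cdf` is a hypothetical name.

```python
def signed_rank_cdf(n, w):
    """Exact P(W+ <= w) under H0 for n untied, nonzero differences."""
    max_sum = n * (n + 1) // 2
    ways = [0] * (max_sum + 1)     # ways[s]: subsets of {1..n} with rank sum s
    ways[0] = 1
    for r in range(1, n + 1):
        # Go downwards so that each rank is used at most once per subset.
        for s in range(max_sum, r - 1, -1):
            ways[s] += ways[s - r]
    return sum(ways[: int(w) + 1]) / 2 ** n

print(signed_rank_cdf(9, 8))     # tail probability at the critical value
print(signed_rank_cdf(9, 9))     # one step beyond the critical value
print(signed_rank_cdf(9, 7.5))   # tail probability at the observed w+
```

For n = 9 the tail probability is 25/512 (about 0.049) at 8 and 33/512 (about 0.064) at 9, which is consistent with 8 being the largest value significant at the one-tailed 0.05 level.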
When $n \ge 15$, the sampling distribution of $W_+$ (or $W_-$)
approaches the normal distribution with mean
$$\mu_{W_+} = \frac{n(n + 1)}{4}$$
and variance
$$\sigma_{W_+}^2 = \frac{n(n + 1)(2n + 1)}{24}.$$
Therefore, when n exceeds the largest value in Table A.16, the
statistic
$$Z = \frac{W_+ - \mu_{W_+}}{\sigma_{W_+}}$$
can be used to determine the critical region for our test.
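A sketch of this large-sample statistic, using only the standard library. The function names are illustrative, and the inputs n = 20 and w+ = 60 are made-up numbers chosen only to show the calculation; they do not come from the text.

```python
from math import erfc, sqrt

def signed_rank_z(w_plus, n):
    """Standardize W+ with mean n(n+1)/4 and variance n(n+1)(2n+1)/24."""
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24
    return (w_plus - mean) / sqrt(variance)

def lower_tail_p(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * erfc(-z / sqrt(2))

# Hypothetical illustration: n = 20 nonzero differences, observed w+ = 60.
z = signed_rank_z(60, 20)
p = lower_tail_p(z)
print(round(z, 3), round(p, 4))
```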
Homework, Chapter 9: 9.1, 9.3, 9.4, 9.10