Location-scale tests for non-negative data with skewed distribution,
with focus on parasitology research
Shaimaa Yehia, Tanta University, Faculty of Science, Tanta, Egypt
Jenő Reiczigel, Szent István University, Faculty of Veterinary Science, Budapest, Hungary
Non-negative data with skewed distributions arise in various research areas (income per household, treatment costs of a disease, and so on). Our motivation comes from parasitology, where skewed infection intensity data are typical. In that field such data are called “aggregated”: most hosts (those with stronger defense mechanisms against parasites, e.g. a strong immune system) are uninfected or only lightly infected, while a relatively small proportion of hosts harbor the majority of the parasite population.
Define the shift function S(x) as
S(x) = x + Φ⁻¹(x)·D,
where Φ⁻¹ denotes the inverse of the normal distribution function with mean = (XL+XU)/2 and sd = (XU−XL)/6.
This definition results in a progressively increasing shift, so that below XL the shift is negligible and from XU upward practically the full shift D is in operation. Note that the model includes no shift when D=0, and also the simple shift when L=U=0 and D≠0.
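As a minimal sketch, the progressive shift can be written as a small Python function. One assumption needs flagging: the code weights D by the normal cumulative distribution function with the stated mean and sd, because that is what yields a negligible shift below XL and practically the full shift from XU upward. The function name `progressive_shift`, the example quantiles, and the handling of the degenerate case XL = XU are illustrative choices, not taken from the poster.

```python
from statistics import NormalDist


def progressive_shift(x, x_l, x_u, d):
    """Return S(x) = x + w(x) * d for a single value x.

    Assumption: w is the normal CDF with mean (x_l + x_u) / 2 and
    sd (x_u - x_l) / 6, so w(x_l) is about 0.001 and w(x_u) about 0.999.
    """
    mean = (x_l + x_u) / 2
    sd = (x_u - x_l) / 6
    if sd == 0:
        # Degenerate case x_l == x_u (e.g. L = U = 0): treated here as a
        # simple shift by d for every x at or above that point (assumption).
        weight = 1.0 if x >= x_l else 0.0
    else:
        weight = NormalDist(mean, sd).cdf(x)
    return x + weight * d


# Example with hypothetical quantiles x_l = 2, x_u = 8 and shift d = 4.
for x in (0, 2, 5, 8, 12):
    print(x, round(progressive_shift(x, 2, 8, 4), 3))
```

With d = 0 the function returns x unchanged, which matches the no-shift case of the model.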
[Figure panel: Cummingsiella aurea]
Power depended on the distribution and sample size. For most sample sizes Neuhäuser’s test had the highest power and the Mann-Whitney test the lowest.
Average power (over all distributions)

n1, n2     Perm-t   M-W    Boot-W   Cucc   Neuh
Average power for theoretical distributions
10,10      96%      54%    75%      73%    78%
30,30      73%      45%    71%      76%    78%
100,100    72%      54%    72%      81%    84%
10,30      73%      40%    66%      66%    72%
30,100     72%      48%    68%      79%    81%
10,100     66%      37%    55%      60%    69%
Average power for parasite distributions
10,10      89%      62%    75%      76%    82%
30,30      70%      57%    68%      64%    76%
100,100    24%      39%    24%      42%    76%
10,30      84%      55%    73%      68%    79%
30,100     51%      51%    44%      55%    78%
10,100     68%      50%    52%      59%    80%
average    69%      49%    62%      67%    78%

On the poster, the highest power in each row is marked bold and the lowest in red.

[Figure panels: Columbicola col. bac.; Philopterus ocellatus]
For the simulation we generated a large finite population (n=100000) from each distribution of interest. Alpha was assessed by drawing both samples from this population. For the power simulation a second population was generated by applying a progressive shift with L=0.1 and U=0.7 to the first one. In each power simulation run we set the shift parameter D so that at least one of the tests would reach about 80% power. For unequal sample sizes we made two simulation runs, taking first the smaller and then the larger sample from the shifted population.
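The sampling design described here can be sketched in Python as follows. Several details are assumptions made for illustration: the exponential population, the reduced population size, the value d=6, sampling without replacement, the nearest-rank quantile rule, the use of the normal CDF as the shift weight, and all function names.

```python
import random
from statistics import NormalDist


def quantile(sorted_values, p):
    """Empirical p-quantile of an already sorted list (nearest-rank style)."""
    index = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def shifted_population(population, lower, upper, d):
    """Apply the progressive shift with quantile levels lower and upper."""
    ordered = sorted(population)
    x_l, x_u = quantile(ordered, lower), quantile(ordered, upper)
    # Normal CDF weight (assumption); requires x_u > x_l.
    weight = NormalDist((x_l + x_u) / 2, (x_u - x_l) / 6)
    return [x + weight.cdf(x) * d for x in population]


rng = random.Random(2014)
# Finite population; the poster used n = 100000, a smaller one keeps this quick.
population = [rng.expovariate(0.1) for _ in range(20000)]
shifted = shifted_population(population, lower=0.1, upper=0.7, d=6)

# Alpha run: both samples come from the unshifted population.
alpha_x, alpha_y = rng.sample(population, 10), rng.sample(population, 30)
# Power runs for unequal sizes: the shifted sample is first the smaller,
# then the larger one.
power_run_1 = (rng.sample(shifted, 10), rng.sample(population, 30))
power_run_2 = (rng.sample(population, 10), rng.sample(shifted, 30))
```

Each pair of samples would then be passed to every test under comparison, and the rejection rate over many repetitions gives the estimated alpha or power.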
[Figure panels: Chi-squared on 5 df; Shifted (L=0.1, U=0.7, D=2)]
The theoretical distributions were a chi-squared distribution on 5 df, an exponential distribution with λ=0.1, and a gamma distribution with shape=0.5 and scale=20. The parasite distributions were generated from 3 parasite samples reported in Rózsa et al. (2000).
Sample sizes varied from 10 to 100. Alpha was compared assuming that the two samples came from identical distributions. Power was compared assuming that differences would appear at higher infection levels, reflecting the observation that even in heavily infected populations many hosts (those with good defence) remain free or almost free of parasites.
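For readers who want to reproduce the three theoretical distributions, the following sketch draws from them with Python's standard library. It assumes that λ=0.1 is the rate of the exponential distribution, and it uses the fact that a chi-squared distribution on 5 df is a gamma distribution with shape 2.5 and scale 2. The sample size and seed are arbitrary.

```python
import random
from statistics import mean

rng = random.Random(1)
n = 50000

# Chi-squared on 5 df is a gamma distribution with shape 5/2 and scale 2.
chi2_5 = [rng.gammavariate(2.5, 2.0) for _ in range(n)]
# Exponential with rate 0.1 (assumed to be the meaning of lambda = 0.1).
expo = [rng.expovariate(0.1) for _ in range(n)]
# Gamma with shape 0.5 and scale 20.
gamma = [rng.gammavariate(0.5, 20.0) for _ in range(n)]

for name, values in (("chi-squared(5)", chi2_5),
                     ("exponential", expo),
                     ("gamma", gamma)):
    print(name, round(mean(values), 2))
```

Under these parameterisations the theoretical means are 5, 10, and 10, so the printed sample means should lie close to those values.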
Formally, we chose three values, L, U, and D (0 ≤ L ≤ U ≤ 1), and defined the probit-based progressive shift S(x) as follows. Let XL and XU denote the L and U quantiles of the distribution.
[Figure panel: Shifted (L=0.1, U=0.7, D=6)]
• For skewed data and a progressive shift alternative, Neuhäuser’s test had the highest power.
• If interest lies in pure location differences, only the bootstrap test is applicable. We found that it maintains the alpha error rate for balanced or moderately unbalanced designs (sample size ratio ≤ 2.5).
• For parasite infection data, the probit-based progressive shift offers a more realistic alternative than the conventional shift or scale alternatives. We believe this also holds for the analysis of other skewed data, such as treatment cost data.
• As the result of a method comparison study may depend on the alternative hypothesis, comparisons must be carried out assuming the alternative that is most realistic in the field of interest.
References
Results
The permutation t-test, the Mann-Whitney test, and both location-scale tests had acceptable alpha (under 6% at nominal 5%) up to a sample size ratio of 1:10. However, alpha of the Welch t-test and the bootstrap test was too high if the ratio of sample sizes exceeded 2 (or 2.5).
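Alpha values of this kind come from repeated sampling under the null hypothesis. A rough sketch of how such an estimate can be obtained for a permutation t-test is given here. The Welch form of the statistic, the exponential population, the run and permutation counts, and all function names are assumptions made for illustration, not details from the poster.

```python
import random


def welch_t(x, y):
    """Welch's t statistic for two samples (0.0 if both have zero variance)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    vx = sum((v - mx) ** 2 for v in x) / (len(x) - 1)
    vy = sum((v - my) ** 2 for v in y) / (len(y) - 1)
    se2 = vx / len(x) + vy / len(y)
    return (mx - my) / se2 ** 0.5 if se2 > 0 else 0.0


def permutation_p(x, y, rng, n_perm=199):
    """Two-sided Monte Carlo permutation p-value based on |t|."""
    observed = abs(welch_t(x, y))
    pooled = list(x) + list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(x)], pooled[len(x):])) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def estimated_alpha(n1, n2, rng, runs=200, level=0.05):
    """Share of runs rejecting at `level` when both samples share one distribution."""
    rejections = 0
    for _ in range(runs):
        x = [rng.expovariate(0.1) for _ in range(n1)]
        y = [rng.expovariate(0.1) for _ in range(n2)]
        if permutation_p(x, y, rng) <= level:
            rejections += 1
    return rejections / runs


# Sample size ratio 1:3 with a fixed seed for reproducibility.
print(estimated_alpha(10, 30, random.Random(7)))
```

With only a few hundred runs the estimate is noisy; a serious study would use many more runs and permutations.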
Highest alpha (over all distributions & sample sizes), by ratio of the sample sizes
• Location-scale tests may have a role in parasitology research.
[Figure panels: Shifted (L=0.1, U=0.7, D=8); Gamma with shape=0.5, scale=20]
We compared Cucconi’s and Neuhäuser’s location-scale tests (for details see Marozzi, 2013) to four commonly used location tests (Welch t-test, Mann-Whitney test, permutation t-test, and bootstrap t-test) for 3 right-skewed theoretical distributions and 3 empirical parasite distributions.
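To make the idea of a location-scale test concrete, here is a hedged sketch of Cucconi's statistic with a Monte Carlo permutation p-value. The poster itself gives no formula, so the code follows the form in which the statistic is usually stated in the nonparametric literature: two standardised sums of squared ranks and contrary ranks, combined through their correlation. The mid-rank treatment of ties, the number of permutations, and all function names are assumptions. Neuhäuser's test is not shown.

```python
import random


def midranks(values):
    """Ranks 1..N of values, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def cucconi_from_ranks(ranks_x, n_total):
    """Cucconi's C computed from the pooled-sample ranks of the first sample."""
    n1, n = len(ranks_x), n_total
    n2 = n - n1
    centre = n1 * (n + 1) * (2 * n + 1)
    scale = (n1 * n2 * (n + 1) * (2 * n + 1) * (8 * n + 11) / 5) ** 0.5
    u = (6 * sum(r ** 2 for r in ranks_x) - centre) / scale
    v = (6 * sum((n + 1 - r) ** 2 for r in ranks_x) - centre) / scale
    rho = 2 * (n ** 2 - 4) / ((2 * n + 1) * (8 * n + 11)) - 1
    return (u ** 2 + v ** 2 - 2 * rho * (u * v)) / (2 * (1 - rho ** 2))


def cucconi_test(x, y, n_perm=1999, seed=0):
    """Return (C, Monte Carlo permutation p-value) for samples x and y."""
    ranks = midranks(list(x) + list(y))
    n1, n = len(x), len(ranks)
    observed = cucconi_from_ranks(ranks[:n1], n)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(ranks)  # random reassignment of ranks to the two samples
        if cucconi_from_ranks(ranks[:n1], n) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Large values of C indicate a difference in location, in scale, or in both, which is why a test of this type can gain power when a higher mean comes together with a higher SD.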
Methods
[Figure panel: Exponential with lambda=0.1]
Two-sample comparison of parasite infection data is usually made by location tests. As more heavily infected samples have both a higher mean and a higher SD, we expected that location-scale tests would be more powerful.
Conclusions
Ratio    Welch    M-W     Perm-t   Boot-W
1:1      5.5%     5.1%    5.7%     5.5%
1:2      5.8%     5.1%    6.0%     5.4%
1:2.5    7.3%     5.5%    5.5%     6.0%
1:3      9.4%     5.1%    5.0%     7.2%
1:10     15.7%    5.1%    5.3%     13.7%

On the poster, red numbers indicate too liberal tests.
Welch BL (1938) The Significance of the Difference Between Two Means When the Population Variances are Unequal. Biometrika, 29, 350–362.
Mann HB, Whitney DR (1947) On a Test of Whether One of Two Random Variables is Stochastically Larger Than the Other. Annals of Mathematical Statistics, 18, 50–60.
Marozzi M (2013) Nonparametric Simultaneous Tests for Location and Scale Testing: A Comparison of Several Methods. Communications in Statistics - Simulation and Computation, 42, 1298–1317.
Efron B, Tibshirani RJ (1993) An Introduction to the Bootstrap. New York: Chapman & Hall.
Rózsa L, Reiczigel J, Majoros G (2000) Quantifying parasites in samples of hosts. Journal of Parasitology, 86, 228–232.
This research was supported by the Hungarian National
Research Fund (OTKA K 108571) and by the Research
Faculty Grant 2014 of the Szent István University, Faculty
of Veterinary Science.
ISCB2014 – 35th Annual Conference of the International Society for Clinical Biostatistics, 24-28 August 2014, Vienna, Austria