Location-scale tests for non-negative data with skewed distribution, with focus on parasitology research

Shaimaa Yehia, Tanta University, Faculty of Science, Tanta, Egypt
Jenő Reiczigel, Szent István University, Faculty of Veterinary Science, Budapest, Hungary

Non-negative data with skewed distributions arise in various research areas (income per household, treatment costs of a disease, and so on). Our motivation comes from parasitology, where skewed infection intensity data are also typical. In that field such distributions are called “aggregated”: most hosts (those with stronger defense mechanisms against parasites, e.g. a strong immune system) are uninfected or only slightly infected, while a relatively small fraction of hosts harbors the majority of the parasite population.

Two-sample comparison of parasite infection data is usually made by location tests. As more heavily infected samples have both a higher mean and a higher SD, we expected that location-scale tests would be more powerful.

[Figure: parasite distributions. Panel titles: Cummingsiella aurea; Columbicola col. bac.; Philopterus ocellatus]

Methods

We compared Cucconi’s and Neuhäuser’s location-scale tests (for details see Marozzi, 2013) to 4 commonly used location tests (Welch t-test, Mann-Whitney test, permutation t-test, and bootstrap t-test) for 3 right-skewed theoretical distributions and 3 empirical parasite distributions.

The theoretical distributions were a chi-squared distribution on 5 df, an exponential distribution with $\lambda = 0.1$, and a gamma distribution with shape=0.5 and scale=20. The parasite distributions were generated from 3 parasite samples reported in Rózsa et al. (2000). Sample sizes varied from 10 to 100.

Alpha was compared assuming that the two samples came from identical distributions. Power was compared assuming that differences would appear at higher infection levels, reflecting the observation that even in heavily infected populations many hosts (those with good defense) remain free or almost free of parasites.

Formally, we chose three values, L, U, and D ($0 \le L \le U \le 1$), and defined the probit-based progressive shift S(x) as follows. Let $X_L$ and $X_U$ denote the L and U quantiles of the distribution, and define the shift function as

$$S(x) = x + \mathrm{probit}^{-1}(x) \cdot D$$

where $\mathrm{probit}^{-1}$ denotes the inverse of the probit, that is, the normal distribution function with mean $= (X_L + X_U)/2$ and sd $= (X_U - X_L)/6$. This definition results in a progressively increasing shift: below $X_L$ the shift is negligible, and from $X_U$ upward practically the full shift D is in operation. Note that the model includes no shift when $D = 0$, and also the simple shift when $L = U = 0$ and $D \ne 0$.

For the simulation we generated a large finite population (n=100000) from each distribution of interest. Alpha was assessed by taking both samples from this population. For the power simulation another population was generated by applying a progressive shift with L=0.1 and U=0.7 to this population. In each power simulation run we set the shift parameter D so that at least one of the tests would reach about 80% power. For unequal sample sizes we made two simulation runs, taking first the smaller, then the larger sample from the shifted population.

[Figure: theoretical distributions and their shifted versions. Panel titles: Chi-squared on 5 df; Exponential with lambda=0.1; Gamma with shape=0.5, scale=20; Shifted (L=0.1, U=0.7, D=2); Shifted (L=0.1, U=0.7, D=6); Shifted (L=0.1, U=0.7, D=8)]

Results

The permutation t-test, the Mann-Whitney test, and both location-scale tests had acceptable alpha (under 6% at nominal 5%) up to a sample size ratio of 1:10. However, alpha was too high for the Welch t-test if the ratio of sample sizes exceeded 2, and for the bootstrap test if it exceeded 2.5.

Highest alpha (over all distributions & sample sizes)

Ratio of the sample sizes   Welch    M-W     Perm-t   Boot-W
1:1                          5.5%    5.1%     5.7%     5.5%
1:2                          5.8%    5.1%     6.0%     5.4%
1:2.5                        7.3%    5.5%     5.5%     6.0%
1:3                          9.4%    5.1%     5.0%     7.2%
1:10                        15.7%    5.1%     5.3%    13.7%

Power depended on the distribution and sample size. For most sample sizes Neuhäuser’s test had the highest power, and the Mann-Whitney test the lowest.

Average power (over all distributions)

n1, n2     Perm-t   M-W    Boot-W   Cucc   Neuh
Average power for theoretical distributions
10,10       96%     54%     75%     73%    78%
30,30       73%     45%     71%     76%    78%
100,100     72%     54%     72%     81%    84%
10,30       73%     40%     66%     66%    72%
30,100      72%     48%     68%     79%    81%
10,100      66%     37%     55%     60%    69%
Average power for parasite distributions
10,10       89%     62%     75%     76%    82%
30,30       70%     57%     68%     64%    76%
100,100     24%     39%     24%     42%    76%
10,30       84%     55%     73%     68%    79%
30,100      51%     51%     44%     55%    78%
10,100      68%     50%     52%     59%    80%
average     69%     49%     62%     67%    78%

Conclusions

Location-scale tests may have a role in parasitology research. For skewed data and a progressive shift alternative, Neuhäuser’s test had the highest power.

If interest lies in pure location differences, only the bootstrap test is applicable. We found that it maintains the alpha error rate for balanced or moderately unbalanced designs (sample size ratio up to 2.5).

For parasite infection data, the probit-based progressive shift offers a more realistic alternative than the conventional shift or scale alternatives. We feel that this also holds for the analysis of other skewed data, such as treatment cost data.

As the result of a method comparison study may depend on the alternative hypothesis, comparisons must be carried out assuming the alternative that is most realistic in the field of interest.

References

Welch BL (1938) The Significance of the Difference Between Two Means When the Population Variances are Unequal. Biometrika, 29, 350–362.
Mann HB, Whitney DR (1947) On a Test of Whether One of Two Random Variables is Stochastically Larger Than the Other. Annals of Mathematical Statistics, 18, 50–60.
Marozzi M (2013) Nonparametric Simultaneous Tests for Location and Scale Testing: A Comparison of Several Methods. Communications in Statistics - Simulation and Computation, 42, 1298–1317.
Efron B, Tibshirani RJ (1993) An Introduction to the Bootstrap. New York: Chapman & Hall.
Rózsa L, Reiczigel J, Majoros G (2000) Quantifying parasites in samples of hosts. Journal of Parasitology, 86, 228–232.

This research was supported by the Hungarian National Research Fund (OTKA K 108571) and by the Research Faculty Grant 2014 of the Szent István University, Faculty of Veterinary Science.

ISCB2014 – 35th Annual Conference of the International Society for Clinical Biostatistics, 24-28 August 2014, Vienna, Austria
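
The probit-based progressive shift described in Methods can be illustrated with a short Python sketch. This is not the authors' code. The function names `empirical_quantile` and `make_progressive_shift` are hypothetical, the quantile rule is a simple order-statistic choice made for illustration, and the weight applied to D is assumed to be the normal distribution function with mean (XL+XU)/2 and sd (XU−XL)/6, so that the shift is negligible below XL and practically complete from XU. The exponential population with rate 0.1 and the values L=0.1, U=0.7 follow the poster; D=6 and the sample sizes are picked only as an example.

```python
import random
from statistics import NormalDist


def empirical_quantile(sorted_values, p):
    """Return the p-quantile of an ascending list (simple order-statistic rule)."""
    index = min(int(p * len(sorted_values)), len(sorted_values) - 1)
    return sorted_values[index]


def make_progressive_shift(population, L, U, D):
    """Build S(x) = x + w(x) * D, with w a normal CDF centred between X_L and X_U."""
    ordered = sorted(population)
    x_l = empirical_quantile(ordered, L)
    x_u = empirical_quantile(ordered, U)
    mean = (x_l + x_u) / 2
    sd = (x_u - x_l) / 6
    if sd == 0:
        # Degenerate case (e.g. L = U): the weight becomes a step at the
        # common quantile; with L = U = 0 this is the simple shift by D.
        return lambda x: x + (D if x >= mean else 0.0)
    weight = NormalDist(mu=mean, sigma=sd).cdf
    return lambda x: x + weight(x) * D


# Large finite population from an exponential distribution with rate 0.1.
random.seed(1)
population = [random.expovariate(0.1) for _ in range(100_000)]

# Shifted population for power simulation (D = 6 chosen for illustration).
shift = make_progressive_shift(population, L=0.1, U=0.7, D=6)
shifted_population = [shift(x) for x in population]

# One simulated pair of samples: the smaller from the original population,
# the larger from the shifted population.
sample_a = random.sample(population, 10)
sample_b = random.sample(shifted_population, 30)
```

Because the sd is one sixth of the distance between the two quantiles, XL and XU lie three standard deviations below and above the mean of the weight function, which is why the weight is about 0.001 at XL and about 0.999 at XU.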