Nonparametric predictive inference for order statistics of future observations

Frank P.A. Coolen and Tahani A. Maturi

Dept. of Mathematical Sciences, Durham University, Durham, United Kingdom
[email protected] · [email protected]

Abstract Nonparametric predictive inference (NPI) is a powerful frequentist statistical framework which uses only a few assumptions. Based on a post-data exchangeability assumption, precise probabilities are defined for some events involving one or more future observations, from which lower and upper probabilities can be derived for all other events of interest. We present NPI for the $r$-th order statistic of $m$ future real-valued observations and its use for the comparison of two groups of data.

1 Introduction

Nonparametric predictive inference (NPI) [3, 5] is a statistical framework which uses few modelling assumptions, with inferences explicitly in terms of future observations. For real-valued random quantities, attention has thus far been mostly restricted to a single future observation, although multiple future observations have been considered for some NPI methods for statistical process control [1, 2]. For Bernoulli quantities, NPI has also been presented for $m \ge 1$ future observations [4], with explicit study of the influence of the choice of $m$ for the comparison of groups of proportions data [6]. In this paper, we consider $m$ future real-valued observations, given $n$ observations, and as main contribution we focus on the $r$-th ordered observation of these $m$ future observations, including the comparison of two groups of data via comparison of their corresponding $r$-th ordered future observations. Without making further assumptions, these inferences require the use of lower and upper probabilities for several events of interest; as such, this work fits within the theory of imprecise probability [12] and interval probability [13].

Assume that we have real-valued ordered data $x_1 < x_2 < \ldots < x_n$, with $n \ge 1$. We assume that ties do not occur; in Example 2 in Section 3 we explain how to deal with ties. For ease of notation, define $x_0 = -\infty$ and $x_{n+1} = \infty$. The $n$ observations partition the real line into $n+1$ intervals $I_j = (x_{j-1}, x_j)$ for $j = 1, \ldots, n+1$. If we wished to allow ties between past and future observations explicitly, we could use closed intervals $[x_{j-1}, x_j]$ instead of these open intervals $I_j$; the difference is rather minimal and, to keep the presentation simple, we have opted not to do so here. We are interested in $m \ge 1$ future observations, $X_{n+i}$ for $i = 1, \ldots, m$. We link the data and future observations via Hill's assumption $A_{(n)}$ [10], or, more precisely, via $A_{(n+m-1)}$ (which implies $A_{(n+k)}$ for all $k = 0, 1, \ldots, m-2$; we will refer to these generically as 'the $A_{(n)}$ assumptions'), which can be considered as a post-data version of a finite exchangeability assumption for $n+m$ random quantities. $A_{(n+m-1)}$ implies that all possible orderings of the $n$ data observations and the $m$ future observations are equally likely, where the $n$ data observations are not distinguished among each other, and neither are the $m$ future observations. Let $S_j = \#\{X_{n+i} \in I_j,\ i = 1, \ldots, m\}$; then, assuming $A_{(n+m-1)}$, we have

$$P\left(\bigcap_{j=1}^{n+1} \{S_j = s_j\}\right) = \binom{n+m}{n}^{-1} \qquad (1)$$

where the $s_j$ are non-negative integers with $\sum_{j=1}^{n+1} s_j = m$.
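To make the counting behind (1) concrete, the following minimal sketch (Python; the values $n = 4$, $m = 3$ and the function name are illustrative assumptions, not from the paper) verifies that the equally likely orderings of $n$ past and $m$ future observations correspond one-to-one to the vectors $(s_1, \ldots, s_{n+1})$, so each such vector indeed has probability $\binom{n+m}{n}^{-1}$ under $A_{(n+m-1)}$.

```python
from itertools import combinations
from math import comb

n, m = 4, 3  # illustrative sample sizes, not from the paper

# An ordering of n past and m future observations is determined by which
# n of the n+m positions hold the past observations.
orderings = list(combinations(range(n + m), n))
assert len(orderings) == comb(n + m, n)

def composition(past_positions):
    """Map an ordering to (s_1, ..., s_{n+1}): s_j counts the future
    observations that fall in interval I_j = (x_{j-1}, x_j)."""
    s, j = [0] * (n + 1), 0
    for pos in range(n + m):
        if pos in past_positions:
            j += 1        # we just passed the j-th past observation
        else:
            s[j] += 1     # a future observation lands in I_{j+1}
    return tuple(s)

# Orderings and compositions are in one-to-one correspondence, so each
# composition has probability 1 / comb(n + m, n), as in (1).
assert len({composition(o) for o in orderings}) == len(orderings)
print(f"{len(orderings)} equally likely orderings, probability 1/{comb(n + m, n)} each")
```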
Another convenient way to interpret the $A_{(n+m-1)}$ assumption, with $n$ data observations and $m$ future observations, is to think that $n$ randomly chosen observations out of all $n+m$ real-valued observations are revealed, following which one wishes to make inferences about the $m$ unrevealed observations. The $A_{(n+m-1)}$ assumption then implies that one has no information about whether the specific values of neighbouring revealed observations make it less or more likely that a future observation falls in between them. For any event involving the $m$ future observations, (1) implies that we can count the number of orderings for which this event holds. Generally in NPI, a lower probability for the event of interest is derived by counting all orderings for which this event has to hold, while the corresponding upper probability is derived by counting all orderings for which this event can hold [3, 5].

NPI is close in nature to predictive inference for the low structure stochastic case as briefly outlined by Geisser [9], which is in line with many earlier nonparametric test methods where the interpretation of the inferences is in terms of confidence intervals. In NPI the $A_{(n)}$ assumptions justify the use of these inferences directly as probabilities. Using only precise probabilities or confidence statements, such inferences cannot be provided for many events of interest, but in NPI we use the fact, in line with De Finetti's Fundamental Theorem of Probability [7], that corresponding optimal bounds can be derived for all events of interest [3]. NPI provides exactly calibrated frequentist inferences [11], and it has strong consistency properties in the theory of interval probability [3]. In NPI the $n$ observations are explicitly used through the $A_{(n)}$ assumptions, yet as there is no conditioning as in the Bayesian framework, we do not use an explicit notation to indicate this use of the data. It is important to emphasize that there is no assumed population from which the $n$ observations were randomly drawn, and hence also no assumptions about the sampling process. NPI is fully based on the $A_{(n)}$ assumptions, which should however be considered with care, as they imply, for example, that the specific order in which the data appeared is irrelevant, so accepting $A_{(n)}$ implies an exchangeability judgement for the $n$ observations. It is attractive that the appropriateness of this approach can be decided upon after the $n$ observations have become available. NPI is always in line with inferences based on empirical distributions, which is an attractive property when aiming at objectivity [5].

2 NPI for order statistics

Let $X_{(r)}$, for $r = 1, \ldots, m$, be the $r$-th ordered future observation, so $X_{(r)} = X_{n+i}$ for one $i = 1, \ldots, m$ and $X_{(1)} < X_{(2)} < \ldots < X_{(m)}$. The following probabilities are derived by counting the relevant orderings, and hold for $j = 1, \ldots, n+1$ and $r = 1, \ldots, m$:

$$P(X_{(r)} \in I_j) = \binom{j+r-2}{j-1}\binom{n-j+1+m-r}{n-j+1}\binom{n+m}{n}^{-1} \qquad (2)$$

For this event NPI provides a precise probability, as each of the $\binom{n+m}{n}$ equally likely orderings of $n$ past and $m$ future observations has the $r$-th ordered future observation in precisely one interval $I_j$. As an example, suppose that one is interested in the minimum $X_{(1)}$ of the $m$ future observations. Formula (2) gives $P(X_{(1)} \in I_j) = \binom{n-j+m}{n-j+1}\binom{n+m}{n}^{-1}$, with for example $P(X_{(1)} \in I_1) = \frac{m}{n+m}$. Clearly, the event $X_{(1)} \in I_1$ occurs if the smallest of all $n+m$ observations, so the $n$ data observations and $m$ future observations together, is not in the data set, which occurs with probability $\frac{m}{n+m}$.
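As a quick numerical check of (2), the following minimal sketch (Python; the helper name and the values $n = 10$, $m = 5$ are illustrative assumptions) computes $P(X_{(r)} \in I_j)$ for all $j$, confirms that these probabilities sum to one, and checks the special cases for the minimum $X_{(1)}$ noted here and immediately below.

```python
from math import comb

def order_stat_pmf(n, m, r):
    """P(X_(r) in I_j) for j = 1, ..., n+1, following formula (2)."""
    return [comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1)
            / comb(n + m, n) for j in range(1, n + 2)]

n, m = 10, 5  # illustrative sample sizes
for r in range(1, m + 1):
    # X_(r) falls in exactly one interval, so the probabilities sum to 1.
    assert abs(sum(order_stat_pmf(n, m, r)) - 1) < 1e-12

p1 = order_stat_pmf(n, m, 1)
assert abs(p1[0] - m / (n + m)) < 1e-12           # P(X_(1) in I_1) = m/(n+m)
assert abs(p1[-1] - 1 / comb(n + m, n)) < 1e-12   # P(X_(1) in I_{n+1})
```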
A further special case of interest is $P(X_{(1)} \in I_{n+1}) = \binom{n+m}{n}^{-1}$, following from the fact that there is only one ordering for which all $n$ data observations occur before all $m$ future observations.

In the theory of mathematical statistics and probability, much attention is paid to limit results. Many popular statistical methods are justified through limit properties, with limits taken with regard to the number $n$ of data observations, leading to 'large-sample' methods that are often applied in cases with relatively small samples, without due consideration of the quality of the approximations involved and lacking clear foundational justification. Considering limits for $n$ going to infinity is not very exciting in NPI, as one just ends up with the empirical distribution function and corresponding inferences. However, in NPI it might be of some interest to consider the limiting behaviour of the predictive probabilities (2) if $m$ goes to infinity, hence if we consider an ever increasing future. Defining $\theta \in [0,1]$ through the relationship $r = \theta m$ (of course, this only makes sense when $\theta m$, and therefore also $(1-\theta)m$, is an integer; we only sketch the argument here without giving the detailed mathematical presentation), the following limiting result is easily proven, for $j = 1, \ldots, n+1$:

$$\lim_{m \to \infty} P(X_{(\theta m)} \in I_j) = \binom{n}{j-1}\, \theta^{j-1} (1-\theta)^{n-j+1} \qquad (3)$$

It is important to emphasize the difference with established statistical methods. The $\theta$ in (3) is not a characteristic of an assumed population from which the data are sampled; indeed, no population assumption is made. Furthermore, (3) is neither a probability distribution nor a likelihood function for $\theta$. Instead, $\theta$ only serves as notation for this event of interest, and indicates the specific relative (with regard to $m$) future order statistic of interest. Of course, the actual $A_{(n)}$ assumptions required for this limit imply infinite exchangeability of the future observations, hence De Finetti's Representation Theorem [7] indicates that a parametric representation can be assumed, yet this is different from the explicitly predictive use in NPI, most noticeably through the absence of a probability distribution for $\theta$. The limiting probability (3) can be understood from the consideration that, for the event $X_{(\theta m)} \in I_j$ to hold, precisely $j-1$ of the $n$ data observations must be smaller than $X_{(\theta m)}$. It must be emphasized again that (3) specifies probabilities for $X_{(\theta m)}$, not for any aspect of the observed data, for which no concept of randomness, e.g. as following from assumed sampling from a population, is used in NPI. In NPI the data are given; all randomness is explicitly with regard to the future observations, which nicely reflects where the uncertainty really is in applications.

Analysis of the probability (2) leads to some interesting results, including the obvious symmetry $P(X_{(r)} \in I_j) = P(X_{(m+1-r)} \in I_{n+2-j})$. For all $r$, the probability for $X_{(r)} \in I_j$ is unimodal in $j$, with the maximum probability assigned to interval $I_{j^*}$ with $\frac{r-1}{m-1}(n+1) \le j^* \le \frac{r-1}{m-1}(n+1) + 1$. This carries through to the limiting situation (3), where for given $\theta$ the maximum probability is assigned to interval $I_{j^*}$ with $(n+1)\theta \le j^* \le (n+1)\theta + 1$.
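The limit (3) can also be checked numerically; the sketch below (Python; the values $n = 5$, $\theta = 0.4$ and the function name are illustrative assumptions) compares the finite-$m$ probabilities from (2), with $r = \theta m$, against the binomial-form limit.

```python
from math import comb

def p_interval(n, m, r, j):
    """P(X_(r) in I_j), formula (2)."""
    return (comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1)
            / comb(n + m, n))

n, theta = 5, 0.4  # illustrative choices; theta*m must be an integer
for j in range(1, n + 2):
    limit = comb(n, j - 1) * theta ** (j - 1) * (1 - theta) ** (n - j + 1)
    finite = [p_interval(n, m, round(theta * m), j) for m in (10, 100, 10000)]
    print(f"j={j}: m=10,100,10000 -> "
          + ", ".join(f"{p:.6f}" for p in finite) + f"; limit {limit:.6f}")
```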
It is worth commenting on extreme values, in particular inference involving $X_{(1)}$ or $X_{(m)}$ for $m$ large compared to the value of $n$. In these cases, NPI assigns large probabilities to the intervals $I_1$ or $I_{n+1}$, respectively, which are outside the range of the observed data and unbounded unless the random quantities of interest are logically bounded (e.g. zero as a lower bound for lifetime data). This indicates that, for such inferences, very little can be concluded without further assumptions on the probability masses within these end intervals beyond the observed data. This will be illustrated in the examples in Section 3. There are several inferential problems where one is explicitly interested in such a future order statistic $X_{(r)}$. It may be of explicit interest to compare different groups or treatments by comparing particular future order statistics; this is presented in Section 3.

3 Comparing two groups

Suppose we have two independent groups of real-valued observations, $X$ and $Y$, with ordered observed values $x_1 < x_2 < \ldots < x_{n_x}$ and $y_1 < y_2 < \ldots < y_{n_y}$. For ease of notation, let $x_0 = y_0 = -\infty$ and $x_{n_x+1} = y_{n_y+1} = \infty$, and let $I^x_{j_x} = (x_{j_x-1}, x_{j_x})$ and $I^y_{j_y} = (y_{j_y-1}, y_{j_y})$. We are interested in $m \ge 1$ future observations from each group (i.e. $m_x = m_y = m$), so in $X_{n_x+i}$ and $Y_{n_y+i}$ for $i = 1, \ldots, m$. We wish to compare the $r$-th future order statistics from these two groups by considering the event $X_{(r)} < Y_{(r)}$, for which the NPI lower and upper probabilities, based on the $A_{(n_x)}$ and $A_{(n_y)}$ assumptions per group, are

$$\underline{P}(X_{(r)} < Y_{(r)}) = \sum_{j_x=1}^{n_x+1} \sum_{j_y=1}^{n_y+1} \mathbf{1}\{x_{j_x} < y_{j_y-1}\}\, P(X_{(r)} \in I^x_{j_x})\, P(Y_{(r)} \in I^y_{j_y}) \qquad (4)$$

$$\overline{P}(X_{(r)} < Y_{(r)}) = \sum_{j_x=1}^{n_x+1} \sum_{j_y=1}^{n_y+1} \mathbf{1}\{x_{j_x-1} < y_{j_y}\}\, P(X_{(r)} \in I^x_{j_x})\, P(Y_{(r)} \in I^y_{j_y}) \qquad (5)$$

where $\mathbf{1}\{E\}$ is an indicator function which is equal to 1 if event $E$ occurs and 0 otherwise. This NPI lower (upper) probability follows by putting all probability masses for $Y_{(r)}$ corresponding to the intervals $I^y_{j_y} = (y_{j_y-1}, y_{j_y})$, $j_y = 1, \ldots, n_y+1$, at the left (right) end points of these intervals, and by putting all probability masses for $X_{(r)}$ corresponding to the intervals $I^x_{j_x} = (x_{j_x-1}, x_{j_x})$, $j_x = 1, \ldots, n_x+1$, at the right (left) end points of these intervals. We illustrate this NPI method for the comparison of two groups based on the $r$-th future order statistic in two examples: first a small artificial example, followed by one considering a real data set; a computational sketch of (4) and (5) is given below.
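Before turning to the examples, here is a minimal computational sketch of (4) and (5) (Python; the helper names order_stat_pmf and npi_lower_upper are illustrative assumptions, not from the paper). The $\pm\infty$ endpoints handle the end intervals: probability mass pushed to an infinite endpoint never supports the event under the lower probability, but can support it under the upper probability.

```python
from math import comb, inf

def order_stat_pmf(n, m, r):
    """P(X_(r) in I_j) for j = 1, ..., n+1, formula (2)."""
    return [comb(j + r - 2, j - 1) * comb(n - j + 1 + m - r, n - j + 1)
            / comb(n + m, n) for j in range(1, n + 2)]

def npi_lower_upper(x, y, m, r):
    """NPI lower and upper probabilities (4) and (5) for X_(r) < Y_(r)."""
    x = [-inf] + sorted(x) + [inf]   # x[j] = x_j, with x_0 = -inf, x_{n_x+1} = inf
    y = [-inf] + sorted(y) + [inf]
    nx, ny = len(x) - 2, len(y) - 2
    px = order_stat_pmf(nx, m, r)    # px[jx-1] = P(X_(r) in I^x_{jx})
    py = order_stat_pmf(ny, m, r)
    lower = sum(px[jx - 1] * py[jy - 1]
                for jx in range(1, nx + 2) for jy in range(1, ny + 2)
                if x[jx] < y[jy - 1])        # indicator in (4)
    upper = sum(px[jx - 1] * py[jy - 1]
                for jx in range(1, nx + 2) for jy in range(1, ny + 2)
                if x[jx - 1] < y[jy])        # indicator in (5)
    return lower, upper
```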
Example 1. To get a basic feeling for these inferences, we consider three small artificial data sets (cases A, B, C) as given in Table 1. For $m = 5, 25, 200$, the NPI lower and upper probabilities for the events $X_{(r)} < Y_{(r)}$, for all $r = 1, \ldots, m$, are displayed in Fig. 1, with rows 1, 2, 3 corresponding to cases A, B, C. The plotted lines per value of $r$ represent the intervals bounded by the corresponding lower and upper probabilities, so the length of each line is the imprecision for that event.

[Table 1 Data sets, Example 1: the X and Y observations for cases A, B and C.]

[Fig. 1 NPI lower and upper probabilities, Example 1: one panel per case A, B, C (rows) and per $m = 5, 25, 200$ (columns), plotting the interval bounded by the lower and upper probabilities for $X_{(r)} < Y_{(r)}$ against $r$.]

These results illustrate clearly the effect of increased sample sizes, leading to decreasing imprecision for future order statistics that are most likely to fall within the observed data range. For extreme future order statistics, imprecision remains high, as no assumptions are made about the spread of probability mass within any interval $I^x_{j_x}$ or $I^y_{j_y}$, so also not in the end intervals. This makes clear that, without additional assumptions, no strong inferences can be achieved for events involving extreme future order statistics if $m$ is substantially larger than $n$.

Example 2. We consider the data set of a study of the effect of an ozone environment on the growth of rats [8, p. 170]. One group of 22 rats was kept in an ozone environment and a second group of 23 similar rats was kept in an ozone-free environment. Both groups were kept for 7 days and their weight gains are given in Table 2.

Table 2 Rats' weight gains data, Example 2
Ozone group (X): -15.9, -14.7, -12.9, -9.9, -9.0, -9.0, 6.1, 6.6, 6.8, 7.3, 10.1, 12.1, 14.0, 14.3, 15.5, 15.7, 17.9, 20.4, 28.2, 39.9, 44.1, 54.6
Ozone-free group (Y): -16.9, 13.1, 15.4, 17.4, 17.7, 18.3, 19.2, 21.4, 21.8, 21.9, 22.4, 22.7, 24.4, 25.9, 26.0, 26.0, 26.6, 27.3, 27.4, 28.5, 29.4, 38.4, 41.0

The NPI lower and upper probabilities (4) and (5) for the events $X_{(r)} < Y_{(r)}$, $r = 1, \ldots, m$, are displayed in Fig. 2, where the first row gives figures corresponding to the full data for the cases with $m = 5, 25, 200$, while the second row gives the corresponding figures but with the observation -16.9 removed from group Y. This is done as this value could perhaps be considered an outlier, so it might be interesting to see its influence on these inferences.

Note that the data for group X and for group Y both contain two tied observations, at -9.0 and 26.0, respectively. As these tied observations are within the same group, we just add a very small amount to one of them, which affects neither their ranking within the group nor their position relative to the combined data of both groups, and therefore does not affect the inferences. This can be interpreted as assuming that these values actually differ in a further decimal, not reported due to rounding. If observations were tied between the two groups, the same breaking of ties could be performed, with the NPI method presented in this paper applied to all possible ways of doing so, and the smallest (largest) of the corresponding lower (upper) probabilities for the event of interest would be used as the NPI lower (upper) probability. The possibility to break ties in this manner is an attractive feature of statistical methods using lower and upper probabilities, as it does not require further assumptions for such tied values.

This example shows that these data strongly support the event $X_{(r)} < Y_{(r)}$ for future order statistics that are likely to be in the middle area of the data ranges, with the values of the NPI lower and upper probabilities reflecting the amount of overlap in the observed data for groups X and Y.
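As a usage illustration, the sketch below applies the hypothetical npi_lower_upper helper from the earlier sketch to the Table 2 data, breaking the within-group ties at -9.0 and 26.0 by a tiny perturbation as described above; the choices m = 5 and the values of r are illustrative.

```python
# Rats' weight gains from Table 2; ties within a group are broken by adding
# a tiny amount to one of the tied values, as described in the text.
x = [-15.9, -14.7, -12.9, -9.9, -9.0, -9.0 + 1e-9, 6.1, 6.6, 6.8, 7.3,
     10.1, 12.1, 14.0, 14.3, 15.5, 15.7, 17.9, 20.4, 28.2, 39.9, 44.1, 54.6]
y = [-16.9, 13.1, 15.4, 17.4, 17.7, 18.3, 19.2, 21.4, 21.8, 21.9, 22.4,
     22.7, 24.4, 25.9, 26.0, 26.0 + 1e-9, 26.6, 27.3, 27.4, 28.5, 29.4,
     38.4, 41.0]
for r in (1, 3, 5):  # a few future order statistics, with m = 5
    low, up = npi_lower_upper(x, y, m=5, r=r)
    print(f"r = {r}: lower {low:.3f}, upper {up:.3f}")
```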
For extreme future order statistics the imprecision is again very large, and the effect of deleting the smallest Y value from the data causes quite a difference between the inferences for small values of $r$, as the lower ends of the plots in rows 1 and 2 of Fig. 2 clearly illustrate.

[Fig. 2 NPI lower and upper probabilities, Example 2: one panel per $m = 5, 25, 200$ (columns), plotting the interval bounded by the lower and upper probabilities for $X_{(r)} < Y_{(r)}$ against $r$; row 1 uses the full data, row 2 the data with -16.9 removed from group Y.]

References

1. Arts GRJ, Coolen FPA, van der Laan P (2004) Nonparametric predictive inference in statistical process control. Quality Technology and Quantitative Management 1:201-216
2. Arts GRJ, Coolen FPA (2008) Two nonparametric predictive control charts. Journal of Statistical Theory and Practice 2:499-512
3. Augustin T, Coolen FPA (2004) Nonparametric predictive inference and interval probability. Journal of Statistical Planning and Inference 124:251-272
4. Coolen FPA (1998) Low structure imprecise predictive inference for Bayes' problem. Statistics & Probability Letters 36:349-357
5. Coolen FPA (2006) On nonparametric predictive inference and objective Bayesianism. Journal of Logic, Language and Information 15:21-47
6. Coolen FPA, Coolen-Schrijner P (2007) Nonparametric predictive comparison of proportions. Journal of Statistical Planning and Inference 137:23-33
7. De Finetti B (1974) Theory of probability (2 volumes). Wiley, London
8. Desu MM, Raghavarao D (2004) Nonparametric statistical methods for complete and censored data. Chapman and Hall, Boca Raton
9. Geisser S (1993) Predictive inference: an introduction. Chapman and Hall, London
10. Hill BM (1968) Posterior distribution of percentiles: Bayes' theorem for sampling from a population. Journal of the American Statistical Association 63:677-691
11. Lawless JF, Fredette M (2005) Frequentist prediction intervals and predictive distributions. Biometrika 92:529-542
12. Walley P (1991) Statistical reasoning with imprecise probabilities. Chapman and Hall, London
13. Weichselberger K (2001) Elementare Grundbegriffe einer allgemeineren Wahrscheinlichkeitsrechnung I. Intervallwahrscheinlichkeit als umfassendes Konzept. Physika, Heidelberg