An investigator sometimes obtains a numerical sample and wants to determine whether it is plausible that it came from a certain distribution. This may be necessary because many procedures for statistical inference are based on the assumption that the population distribution is of a specific type. Knowing the distribution can also give insight into the physical mechanism that generates the data.

An effective way to check a distributional assumption is to construct what is called a probability plot. The essence of such a plot is that if the distribution on which the plot is based is correct, the points in the plot should fall close to a straight line. The details of constructing the plot differ a bit from source to source.

Recall that the (100p)th percentile of a continuous distribution with cdf $F$ is the number $\eta(p)$ such that $F(\eta(p)) = p$. Sample percentiles are defined in roughly the same way: the (100p)th sample percentile should have (100p)% of the data to its left. This cannot hold exactly for every percentile because a sample contains only finitely many data points. For example, if n = 10, what is the 50th percentile? With ten points, no single observation has exactly half of the data on each side of it.

Order the n sample observations from smallest to largest. Then the jth smallest observation in the list is taken to be the $[100(j - .5)/n]$th sample percentile.

We then plot in the x-y plane the pairs

$\big([100(j - .5)/n]\text{th population percentile},\ j\text{th smallest observation}\big), \quad j = 1, \dots, n.$

If the sample percentiles are close to the population percentiles, the points will fall close to the 45° line, and the distribution is plausible.

An investigator is typically interested not in whether one specified probability distribution is a plausible model, but in whether some member of a family of distributions supplies a plausible model. For example, one may want to know whether some normal distribution is a good fit.
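The construction above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the helper names (`plotting_positions`, `probability_plot_pairs`, `exponential_inv_cdf`) are hypothetical, and the exponential distribution with rate 1 and the five data values are made-up examples of a "specified distribution" and a sample. The only assumption about the target distribution is that we can evaluate its percentile function $\eta(p)$.

```python
import math
from typing import Callable, List, Tuple


def plotting_positions(n: int) -> List[float]:
    """Return p_j = (j - .5)/n for j = 1..n, the convention used in the text."""
    return [(j - 0.5) / n for j in range(1, n + 1)]


def probability_plot_pairs(
    data: List[float], inv_cdf: Callable[[float], float]
) -> List[Tuple[float, float]]:
    """Pair the [100(j - .5)/n]th population percentile with the jth smallest observation."""
    ordered = sorted(data)
    positions = plotting_positions(len(ordered))
    return [(inv_cdf(p), x) for p, x in zip(positions, ordered)]


def exponential_inv_cdf(lam: float) -> Callable[[float], float]:
    # Hypothetical example distribution: exponential with rate lam.
    # F(x) = 1 - exp(-lam * x), so eta(p) = -ln(1 - p) / lam.
    return lambda p: -math.log(1.0 - p) / lam


if __name__ == "__main__":
    sample = [0.9, 0.1, 2.3, 0.4, 1.2]  # made-up illustrative data
    for theoretical, observed in probability_plot_pairs(sample, exponential_inv_cdf(1.0)):
        # If the model is right, these pairs should lie near the 45-degree line.
        print(f"{theoretical:.3f}  {observed:.3f}")
```

For n = 10, `plotting_positions(10)` assigns the 5th smallest observation to p = 0.45 and the 6th smallest to p = 0.55, which is how this convention sidesteps the question of which single point is "the" 50th percentile.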
The standard normal percentiles and the percentiles of an arbitrary normal distribution are related by

$(100p)\text{th percentile of } N(\mu, \sigma^2) = \mu + \sigma \cdot \big[\text{corresponding } (100p)\text{th percentile of } N(0, 1)\big].$

A plot of the pairs

$\big([100(j - .5)/n]\text{th } z \text{ percentile},\ j\text{th smallest observation}\big), \quad j = 1, \dots, n$

is a normal probability plot. If the sample observations are drawn from a normal distribution with mean $\mu$ and standard deviation $\sigma$, the points should fall close to a line with slope $\sigma$ and intercept $\mu$.

Characteristic departures from linearity indicate how the population distribution may differ from a normal one:

- Symmetric with "lighter tails" than the normal distribution: points below the line at the right end and above the line at the left end (see Figure 4.34).
- Symmetric with heavier tails than the normal distribution: points above the line at the right end and below the line at the left end (see Figure 4.37(a)).
- Positively skewed (a short left tail and a long right tail): both the smallest and the largest observations fall above the line (see Figure 4.37(b)).

Even when the population distribution is normal, the sample percentiles will not coincide exactly with the theoretical percentiles because of sampling variability. How much can the points in the probability plot deviate from a straight-line pattern before the assumption of population normality is no longer plausible?

There is typically greater variation in the appearance of a probability plot for sample sizes smaller than 30, and only for much larger sample sizes does a linear pattern generally predominate. When a plot is based on a small sample, only a very substantial departure from linearity should be taken as conclusive evidence of non-normality.
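A normal probability plot needs only the standard normal percentile function, which Python's standard library provides as `statistics.NormalDist().inv_cdf`. The sketch below builds the plot pairs and, as one simple way to read off the slope and intercept, fits an ordinary least-squares line through them; the text only says the points should fall near a line with slope $\sigma$ and intercept $\mu$, so the least-squares fit is an assumption of this sketch, not part of the method described. The helper names `normal_plot_pairs` and `least_squares_line` are hypothetical, and the demo data are constructed to sit exactly at the percentiles of a normal distribution with $\mu = 10$, $\sigma = 2$.

```python
from statistics import NormalDist
from typing import List, Tuple


def normal_plot_pairs(data: List[float]) -> List[Tuple[float, float]]:
    """Pairs ([100(j - .5)/n]th z percentile, jth smallest observation)."""
    n = len(data)
    std_normal = NormalDist()  # N(0, 1)
    return [
        (std_normal.inv_cdf((j - 0.5) / n), x)
        for j, x in enumerate(sorted(data), start=1)
    ]


def least_squares_line(pairs: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Slope and intercept of the ordinary least-squares line y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


if __name__ == "__main__":
    # Constructed data sitting exactly at the N(10, 2^2) plotting-position percentiles,
    # i.e. mu + sigma * (standard normal percentile), so the line has slope ~2, intercept ~10.
    n = 20
    data = [NormalDist(10, 2).inv_cdf((j - 0.5) / n) for j in range(1, n + 1)]
    slope, intercept = least_squares_line(normal_plot_pairs(data))
    print(f"slope ~ {slope:.4f} (sigma), intercept ~ {intercept:.4f} (mu)")
```

With real data the points scatter around the line, and the tail patterns listed above show up as systematic bends at the ends rather than random scatter.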