Probability Plots

An investigator sometimes obtains a numerical
sample and wants to determine if it is plausible
that it came from a certain distribution.
This may be necessary because many
procedures for statistical inference are based on
the assumption that the population distribution
is of a specific type.
Knowing the distribution can sometimes give
insight into the physical mechanism that
generates the data.

An effective way to check a distributional
assumption is to construct what is called a
probability plot.
The essence of such a plot is that if the
distribution on which the plot is based is correct,
the points in the plot should fall on a straight line.
The details in constructing the plot differ a bit
from source to source.


Recall that the (100p)th percentile of a continuous
distribution with cdf F is the number $\eta(p)$ such that
$F(\eta(p)) = p$.
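As a small illustration of this definition, the sketch below computes a percentile by inverting a cdf and then checks that applying the cdf returns p. The choice of a normal distribution with mean 10 and standard deviation 2 is an assumption made only for the example; the helper name `percentile` is likewise hypothetical.

```python
from statistics import NormalDist

# Assumption for illustration: F is the cdf of a normal distribution
# with mean 10 and standard deviation 2.
dist = NormalDist(mu=10, sigma=2)

def percentile(p):
    """Return the (100p)th percentile, i.e. the value eta with F(eta) = p."""
    return dist.inv_cdf(p)

eta_90 = percentile(0.90)

# Applying the cdf to the percentile should recover p (up to rounding error).
print(round(eta_90, 3), round(dist.cdf(eta_90), 6))
```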

Sample percentiles are handled in roughly the same
way, e.g. the (100p)th sample percentile should have
(100p)% of the data to the left of the data point.

This can’t happen exactly for all percentiles because of
the discrete number of data points. For example, if n =
10, what is the 50th percentile?

Order the n sample observations from
smallest to largest. Then the j-th smallest
observation in the list is taken to be the
$[100(j - 0.5)/n]$th sample percentile.

We then plot in the x-y plane the pairs
($[100(j - 0.5)/n]$th population percentile, jth smallest data point)

If the sample percentiles are close to the
population percentiles, the points will fall close
to the 45° line, and the distribution is
plausible.
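The construction can be sketched as follows for a fully specified candidate distribution. Everything concrete here is an assumption for illustration: the data values are made up, the candidate is taken to be the standard normal, and `probability_plot_pairs` is a hypothetical helper name. The code pairs each ordered observation with the matching population percentile and reports how far each point lies from the 45° line y = x.

```python
from statistics import NormalDist

# Hypothetical sample (made-up values, for illustration only).
data = [0.35, -1.25, 0.87, -0.53, 1.56, 0.20, -1.91, 1.40, -0.75, 0.72]

# Assumption: the fully specified candidate distribution is the standard normal.
candidate = NormalDist(mu=0, sigma=1)

def probability_plot_pairs(sample, dist):
    """Pair the [100(j - 0.5)/n]th percentile of dist with the j-th smallest value."""
    ordered = sorted(sample)
    n = len(ordered)
    return [(dist.inv_cdf((j - 0.5) / n), x)
            for j, x in enumerate(ordered, start=1)]

pairs = probability_plot_pairs(data, candidate)

# Vertical distance of each point from the 45-degree line y = x.
gaps = [x - q for q, x in pairs]

for (q, x), gap in zip(pairs, gaps):
    print(f"{q:7.3f} {x:7.2f} {gap:7.3f}")
```

Small gaps across the whole range are what "close to the 45° line" means in practice; how small is small enough remains a judgment call, especially for a sample this size.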

An investigator is typically not interested in
knowing whether a specified probability
distribution is a plausible model, but whether
some member of a family of distributions
supplies a plausible model.

As an example, one may be interested in
whether some normal distribution is a good
fit.

The standard normal percentiles and
percentiles of an arbitrary normal distribution
are related by:


$\text{percentile for normal}(\mu, \sigma^2) = \mu + \sigma \cdot [\text{corresponding normal}(0, 1) \text{ percentile}]$
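The relation between the two sets of percentiles (a normal percentile equals the mean plus the standard deviation times the matching standard normal percentile) can be checked numerically. In this sketch the parameter values 50 and 4 are arbitrary assumptions, and `percentile_via_z` is a hypothetical helper name.

```python
from statistics import NormalDist

mu, sigma = 50.0, 4.0            # hypothetical parameter values
std_normal = NormalDist()        # normal(0, 1)
general = NormalDist(mu, sigma)  # normal with mean mu, standard deviation sigma

def percentile_via_z(p, mu, sigma):
    """Percentile of normal(mu, sigma^2) computed as mu + sigma * z percentile."""
    return mu + sigma * std_normal.inv_cdf(p)

for p in (0.05, 0.25, 0.50, 0.75, 0.95):
    direct = general.inv_cdf(p)
    via_z = percentile_via_z(p, mu, sigma)
    print(f"p={p:.2f}  direct={direct:.4f}  mu + sigma*z={via_z:.4f}")
```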

A plot of the pairs
($[100(j - 0.5)/n]$th z percentile, jth smallest data point)
is a normal probability plot. If the sample
observations are drawn from a normal($\mu$, $\sigma^2$)
distribution, the points should fall close to a
line with slope $\sigma$ and intercept $\mu$.
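The slope and intercept claim can be illustrated with an idealized "sample" made of the exact percentiles of a normal distribution, so that no sampling noise is involved. This is only a sketch: the parameter values 100 and 15 are assumptions, the helper names are hypothetical, and the least-squares fit is used here as a convenient stand-in for judging the line by eye.

```python
from statistics import NormalDist, mean

def normal_plot_pairs(sample):
    """Pair the [100(j - 0.5)/n]th z percentile with the j-th smallest value."""
    ordered = sorted(sample)
    n = len(ordered)
    z = NormalDist()
    return [(z.inv_cdf((j - 0.5) / n), x)
            for j, x in enumerate(ordered, start=1)]

def least_squares_line(pairs):
    """Return (slope, intercept) of the least-squares line through the pairs."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    xbar, ybar = mean(xs), mean(ys)
    sxx = sum((x - xbar) ** 2 for x in xs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, ybar - slope * xbar

# Idealized sample: exact percentiles of a normal distribution with
# mean 100 and standard deviation 15 (assumed values, no sampling noise).
mu, sigma = 100.0, 15.0
ideal = [NormalDist(mu, sigma).inv_cdf((j - 0.5) / 20) for j in range(1, 21)]

slope, intercept = least_squares_line(normal_plot_pairs(ideal))
print(round(slope, 6), round(intercept, 6))
```

For this noise-free input the fitted slope and intercept should match the standard deviation and the mean up to rounding. A real sample would scatter around that line rather than fall exactly on it.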
A non-normal population distribution often shows one of the
following patterns in a normal probability plot.

Symmetric with "lighter tails" than the normal
distribution: points fall below the line on the right end and
above the line on the left end (see Figure 4.34).

Symmetric with heavier tails than the normal
distribution: points fall above the line on the right end and
below the line on the left end (see Figure 4.37(a)).

Positively skewed (short left tail and long right tail):
both the smallest and largest observations fall above
the line (see Figure 4.37(b)).
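These three patterns can be reproduced with idealized samples built from exact percentiles of non-normal distributions. The sketch below rests on several assumptions made only for illustration: the uniform, logistic, and exponential distributions stand in for the light-tailed, heavy-tailed, and positively skewed cases; a least-squares line stands in for the line judged by eye; and `end_residuals` is a hypothetical helper name.

```python
import math
from statistics import NormalDist, mean

N = 20
PROBS = [(j - 0.5) / N for j in range(1, N + 1)]
Z = [NormalDist().inv_cdf(p) for p in PROBS]  # z percentiles (x-axis)

def end_residuals(quantile):
    """Signed distances of the smallest and largest points from a
    least-squares line fitted to the normal probability plot."""
    ys = [quantile(p) for p in PROBS]
    zbar, ybar = mean(Z), mean(ys)
    slope = (sum((z - zbar) * (y - ybar) for z, y in zip(Z, ys))
             / sum((z - zbar) ** 2 for z in Z))
    intercept = ybar - slope * zbar
    left = ys[0] - (intercept + slope * Z[0])
    right = ys[-1] - (intercept + slope * Z[-1])
    return left, right

# Stand-in distributions, given by their percentile (inverse cdf) functions.
shapes = {
    "light tails (uniform)": lambda p: p,
    "heavy tails (logistic)": lambda p: math.log(p / (1 - p)),
    "positive skew (exponential)": lambda p: -math.log(1 - p),
}

for name, q in shapes.items():
    left, right = end_residuals(q)
    print(f"{name:28s} left end {left:+.3f}  right end {right:+.3f}")
```

A positive value means the point lies above the fitted line and a negative value means it lies below, so the signs at the two ends can be compared directly with the three descriptions above.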


Even when the population distribution is
normal, the sample percentiles will not
coincide exactly with the theoretical
percentiles because of sampling variability.

How much can the points in the probability
plot deviate from a straight-line pattern
before the assumption of population
normality is no longer plausible?

There is typically greater variation in the
appearance of a probability plot for sample
sizes smaller than 30, and only for a much
larger sample size does a linear pattern
generally predominate.

When a plot is based on a small sample, only
a very substantial departure from linearity
should be taken as conclusive evidence of
non-normality.