A NOTE ON REGIONS OF GIVEN PROBABILITY
OF THE SKEW-NORMAL DISTRIBUTION
A. Azzalini
Department of Statistical Sciences
University of Padua, Italy
e-mail: [email protected]
January 2000
revision June 2001
1 INTRODUCTION AND SUMMARY
One of the nice mathematical properties of the multivariate normal distribution is the availability of a simple method for constructing regions of assigned probability p and minimum
geometric measure. This feature is useful in a number of theoretical and practical problems;
the construction of tolerance regions is an example of the latter type.
If Z is a d-dimensional random variable with distribution $N_d(0, \Omega)$, where $\Omega$ represents a covariance matrix, it is well known that the appropriate region is given by

$$R_N = \{x : x^\top \Omega^{-1} x \le c_p\} \qquad (1)$$

where $c_p$ is the p-th quantile of the $\chi^2_d$ distribution.
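As a numerical illustration of (1), the following Python sketch (standard library only) takes d = 2, where the $\chi^2_2$ distribution function is $1 - e^{-c/2}$ and hence $c_p = -2\log(1-p)$ in closed form, and estimates by simulation the probability of $R_N$ when $\Omega$ is a 2 × 2 correlation matrix with off-diagonal element $\omega$. The function names, sample size and seed are illustrative choices, not taken from the note.

```python
import math
import random


def chi2_2_quantile(p):
    """p-th quantile c_p of the chi-square distribution with 2 degrees of freedom.

    For d = 2 the distribution function is 1 - exp(-c/2), which inverts in closed form.
    """
    return -2.0 * math.log(1.0 - p)


def normal_region_coverage(p, omega, n=100_000, seed=1):
    """Monte Carlo estimate of P{Z in R_N} for Z ~ N_2(0, Omega), where Omega is the
    2x2 correlation matrix with off-diagonal element omega."""
    rng = random.Random(seed)
    c_p = chi2_2_quantile(p)
    det = 1.0 - omega * omega  # |Omega|
    hits = 0
    for _ in range(n):
        u1, u2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        # Correlated pair via the Cholesky factor of Omega
        x1, x2 = u1, omega * u1 + math.sqrt(det) * u2
        # Quadratic form x' Omega^{-1} x for a 2x2 correlation matrix
        q = (x1 * x1 - 2.0 * omega * x1 * x2 + x2 * x2) / det
        if q <= c_p:
            hits += 1
    return hits / n


print(normal_region_coverage(0.9, -0.5))  # should be close to 0.9
```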
The present note addresses the same problem as before, namely the construction of a region with given probability p and minimum volume, when the assumption on the distribution of Z is replaced by that of skew-normality. By this term we mean that the density function of Z at $x \in \mathbb{R}^d$ is

$$f(x) = 2\,\phi_d(x; \Omega)\,\Phi(\alpha^\top x)$$

where $\phi_d(x; \Omega)$ denotes the $N_d(0, \Omega)$ density function at x, $\Phi$ is the N(0, 1) distribution function and $\alpha$ is a vector of shape parameters. This expression of the skew-normal density in fact refers to the special case in which the location parameter is the null vector and $\Omega$ is a correlation matrix. Since the stated problem is location and scale equivariant, this assumption involves no loss of generality, and it simplifies the notation. For a systematic treatment of the skew-normal distribution, see Azzalini & Dalla Valle (1996); further results are given by Azzalini & Capitanio (1999).
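For d = 2 the density above can be evaluated directly. The sketch below assumes, as in the text, null location and a correlation matrix $\Omega$ with off-diagonal element $\omega$; `sn2_density` is a hypothetical helper name, and $\Phi$ is obtained from `math.erfc`. The final lines check numerically that the density integrates to about 1 for the parameter values of Figure 1.

```python
import math


def sn2_density(x, alpha, omega):
    """Bivariate skew-normal density f(x) = 2 phi_2(x; Omega) Phi(alpha' x),
    with null location and Omega = [[1, omega], [omega, 1]]."""
    x1, x2 = x
    det = 1.0 - omega * omega                                # |Omega|
    q = (x1 * x1 - 2.0 * omega * x1 * x2 + x2 * x2) / det    # x' Omega^{-1} x
    phi2 = math.exp(-0.5 * q) / (2.0 * math.pi * math.sqrt(det))
    t = alpha[0] * x1 + alpha[1] * x2                        # alpha' x
    Phi = 0.5 * math.erfc(-t / math.sqrt(2.0))               # N(0, 1) distribution function
    return 2.0 * phi2 * Phi


# Sanity check: a Riemann sum over a fine grid on [-8, 8]^2 should be close to 1.
step = 0.05
grid = [-8.0 + step * i for i in range(321)]
total = step * step * sum(sn2_density((a, b), (2.0, 6.0), -0.5) for a in grid for b in grid)
print(total)
```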
Since the quadratic form $Z^\top \Omega^{-1} Z$ has a $\chi^2_d$ distribution also in the case of a skew-normal variate, (1) is still a region of exact probability p, but it does not have minimum volume, because it does not correspond to the set of points with the highest values of the density function. Clearly, the appropriate set, $R_{SN}$ say, is of the form

$$R_{SN} = \{x : f(x) \ge f_0\},$$

for a suitable value $f_0$, which depends on p, $\Omega$ and $\alpha$, such that the condition $P\{R_{SN}\} = p$ holds.
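The first half of the claim above can be checked by simulation. The sketch generates bivariate skew-normal variates through the conditioning representation $Z = \delta|U_0| + W$, with $U_0 \sim N(0,1)$ independent of $W \sim N_2(0, \Omega - \delta\delta^\top)$ and $\delta = \Omega\alpha/(1 + \alpha^\top\Omega\alpha)^{1/2}$; this representation comes from the general skew-normal literature and is an assumption of the example, not a procedure described in this note. The observed frequency of $Z^\top\Omega^{-1}Z \le c_p$ should be close to p. The second half follows because $\phi_d(x;\Omega) = \phi_d(-x;\Omega)$ while $\Phi(\alpha^\top x) \neq \Phi(-\alpha^\top x)$ whenever $\alpha^\top x \neq 0$, so f is not constant on the boundary of (1). Function names are hypothetical.

```python
import math
import random


def sn2_sample(n, alpha, omega, seed=0):
    """Draw n bivariate skew-normal vectors via Z = delta*|U0| + W, where
    W ~ N_2(0, Omega - delta delta') and delta = Omega alpha / sqrt(1 + alpha' Omega alpha)."""
    rng = random.Random(seed)
    a1, a2 = alpha
    oa1, oa2 = a1 + omega * a2, omega * a1 + a2   # Omega alpha
    s = math.sqrt(1.0 + a1 * oa1 + a2 * oa2)       # sqrt(1 + alpha' Omega alpha)
    d1, d2 = oa1 / s, oa2 / s                      # delta
    # Cholesky factor of the 2x2 matrix Omega - delta delta'
    l11 = math.sqrt(1.0 - d1 * d1)
    l21 = (omega - d1 * d2) / l11
    l22 = math.sqrt(1.0 - d2 * d2 - l21 * l21)
    draws = []
    for _ in range(n):
        u0 = abs(rng.gauss(0.0, 1.0))
        v1, v2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        draws.append((d1 * u0 + l11 * v1, d2 * u0 + l21 * v1 + l22 * v2))
    return draws


def quad_form(z, omega):
    """z' Omega^{-1} z for the 2x2 correlation matrix with off-diagonal omega."""
    z1, z2 = z
    return (z1 * z1 - 2.0 * omega * z1 * z2 + z2 * z2) / (1.0 - omega * omega)


alpha, omega, p = (2.0, 6.0), -0.5, 0.9
c_p = -2.0 * math.log(1.0 - p)
zs = sn2_sample(100_000, alpha, omega)
coverage = sum(quad_form(z, omega) <= c_p for z in zs) / len(zs)
print(coverage)  # should be close to 0.9, as in the normal case
```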
An exact solution of this problem does not seem feasible, and one must look for an approximate one. The main part of this note deals with the case d = 2, which is the most relevant one after the basic case d = 1, and describes a solution which has been found to be satisfactory for practical purposes. The same sort of approximation used for d = 2 has been considered for a few other values of d, and it turned out to work well in these cases too.
2 THE BIVARIATE CASE
Since $\log \phi_d(x; \Omega) = -\tfrac{1}{2} x^\top \Omega^{-1} x - \tfrac{d}{2} \log(2\pi) - \tfrac{1}{2} \log|\Omega|$, the quadratic form $x^\top \Omega^{-1} x$ in (1) for the normal case can be rewritten as

$$-2 \log \phi_d(x; \Omega) - d \log(2\pi) - \log |\Omega|,$$

and it is quite natural to consider the analogous expression for the skew-normal case, simply replacing the density $\phi_d(x; \Omega)$ by f(x). This idea leads us to consider the region

$$\{x : 2 \log f(x) \ge -c_p - d \log(2\pi) - \log |\Omega|\} \qquad (2)$$

as a candidate solution to our problem.
We have examined the empirical performance of this simple rule in a set of simulation experiments, starting with the case d = 2. Since $\Omega$ is a correlation matrix, $|\Omega| = 1 - \omega^2$, where $\omega$ is the off-diagonal element of the matrix. Recalling that $c_p = -2 \log(1 - p)$ for d = 2, the inequality in (2) becomes

$$2 \log f(x) \ge 2 \log(1 - p) - 2 \log(2\pi) - \log(1 - \omega^2). \qquad (3)$$
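Rule (3) is easy to code. In the minimal sketch below (hypothetical helper names, standard library only), `two_log_f` computes $2\log f(x) = 2\log\phi_2(x;\Omega) + 2\log\{2\Phi(\alpha^\top x)\}$ and `rule3_rhs` the right-hand side of (3). With $\alpha = 0$ the rule reduces exactly to region (1), since then $2\log\{2\Phi(0)\} = 0$; with $\alpha \neq 0$ it separates points x and −x that (1) treats alike.

```python
import math


def rule3_rhs(p, omega):
    """Right-hand side of inequality (3)."""
    return 2.0 * math.log(1.0 - p) - 2.0 * math.log(2.0 * math.pi) - math.log(1.0 - omega * omega)


def two_log_f(x, alpha, omega):
    """2 log f(x) for the bivariate skew-normal with null location and correlation omega.
    Assumes alpha' x is not so negative that Phi(alpha' x) underflows to 0."""
    x1, x2 = x
    det = 1.0 - omega * omega
    q = (x1 * x1 - 2.0 * omega * x1 * x2 + x2 * x2) / det
    t = alpha[0] * x1 + alpha[1] * x2
    Phi = 0.5 * math.erfc(-t / math.sqrt(2.0))
    # 2 log f = 2 log phi_2 + 2 log(2 Phi), with 2 log phi_2 = -q - 2 log(2 pi) - log|Omega|
    return -q - 2.0 * math.log(2.0 * math.pi) - math.log(det) + 2.0 * math.log(2.0 * Phi)


def in_rule3_region(x, p, alpha, omega):
    """True when x satisfies inequality (3)."""
    return two_log_f(x, alpha, omega) >= rule3_rhs(p, omega)


# With alpha = 0 the rule coincides with region (1): x' Omega^{-1} x <= -2 log(1 - p).
for x in [(0.0, 0.0), (1.0, 1.0), (3.0, 0.0), (-2.0, 1.5)]:
    print(x, in_rule3_region(x, 0.9, (0.0, 0.0), 0.3))
```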
In these simulation experiments, various combinations of the parameters $\alpha_1, \alpha_2, \omega$ have been selected. For each choice of the parameters, $10^6$ replicated samples have been generated from the given distribution, and the rule (2) has been applied to a set of p values, namely

p = (0.99, 0.975, 0.95, 0.90, 0.80, 0.70, 0.50, 0.30, 0.20, 0.10, 0.05, 0.025, 0.01),

obtaining a corresponding vector of observed relative frequencies, $\tilde p$ say. The pseudo-random variates have been generated with the aid of software provided by Azzalini (1998).
Figure 1 summarizes the main features of one of these experiments, having $\alpha_1 = 2$, $\alpha_2 = 6$, $\omega = -0.5$. The circles indicate the points $(c_p, c_{\tilde p})$, where $c_{\tilde p}$ denotes the quantile function of the $\chi^2_2$ distribution evaluated at $\tilde p$. Ideally, these points should lie on the dashed line, which corresponds to the identity function. This is not the case here, but the points are clearly very well aligned along some line, and this line is almost perfectly parallel to the identity line. In this specific example, the slope of the line fitted to the points was 0.9966 and the sample correlation was 0.999996. The meaning of the crosses shown on the plot will be explained shortly.
The features mentioned above are not specific to the case to which Figure 1 refers. Many other simulation experiments have been performed, and the same features were consistently present: the points were invariably well aligned along a line, and the slope of the fitted line was extremely close to 1. In other words, the relationship

$$c_{\tilde p} = h + c_p \qquad (4)$$

summarized the empirical data remarkably accurately in all cases considered. The only ingredient depending on the parameters of the distribution was the displacement amount h. Generally h > 0; it was 0 only when $\alpha_1 = \alpha_2 = 0$, which corresponds to the normal distribution.

Figure 1: Actual versus nominal values of the probability p, transformed to the quantile scale (horizontal axis c(p), vertical axis c(p̂)), in the case when the parameters of the distribution are $\alpha_1 = 2$, $\alpha_2 = 6$, $\omega = -0.5$
The next step was then to interpolate h as a function of $\alpha_1, \alpha_2, \omega$. It turned out that h is a monotone function of $\alpha^* = (\alpha^\top \Omega \alpha)^{1/2}$, a quantity which has already emerged as a summary measure of skewness of this distribution (Azzalini & Capitanio, 1999). This fact is illustrated by Figure 2, which shows the points $(\alpha^*, h)$ for all simulation experiments. For instance, the example described earlier, for which $\alpha^\top \Omega \alpha = 2^2 + 6^2 + 2(-0.5)(2)(6) = 28$, corresponds to the point (5.29, 1.111).
To interpolate the points of Figure 2, we notice that the relationship between $\alpha^*$ and $\{\log(e^{h/2} - 1)\}^{-1}$ is close to proportionality with ratio −0.6478; see Figure 3 for an illustration. Writing $1/\log(e^{h/2} - 1) = -\alpha^*/b$ and solving for h, this fact translates into the interpolatory function

$$h = 2 \log\{1 + \exp(-b/\alpha^*)\} \qquad (5)$$

where b = 1.544 (approximately 1/0.6478). This expression produces the continuous line plotted in Figure 2, which interpolates the observed points satisfactorily.
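The quantities entering (5) can be computed as follows, a sketch for d = 2 with the value b = 1.544 taken from the text (helper names hypothetical). For the example of Figure 1, (5) should give a value of h close to the observed 1.111; the last line checks that (5) is the exact inverse of the proportionality $1/\log(e^{h/2}-1) = -\alpha^*/b$.

```python
import math

B_D2 = 1.544  # value of b given in the text for d = 2


def alpha_star(alpha, omega):
    """alpha* = (alpha' Omega alpha)^{1/2} for the 2x2 correlation matrix with off-diagonal omega."""
    a1, a2 = alpha
    return math.sqrt(a1 * a1 + 2.0 * omega * a1 * a2 + a2 * a2)


def h_interp(a_star, b=B_D2):
    """Interpolating function (5); returns 0 in the normal case alpha* = 0."""
    if a_star == 0.0:
        return 0.0  # exp(-b/alpha*) -> 0 as alpha* -> 0
    return 2.0 * math.log1p(math.exp(-b / a_star))


a = alpha_star((2.0, 6.0), -0.5)  # sqrt(28), the example of Figure 1
h = h_interp(a)
print(a, h)
# (5) inverts the proportionality: 1/log(e^{h/2} - 1) equals -alpha*/b
print(1.0 / math.log(math.exp(h / 2.0) - 1.0), -a / B_D2)
```

The ratio $-1/b$ implied by b = 1.544 is about −0.6477, consistent with the reported −0.6478.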
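In practical terms, the above discussion leads to the following simple modification of the initial procedure. By (4), rule (3) attains the quantile $c_p + h$ rather than $c_p$, so $c_p$ is replaced by $c_p - h$, with h given by (5); namely (3) is replaced by

$$2 \log f(x) \ge 2 \log(1 - p) - 2 \log(2\pi) - \log(1 - \omega^2) + 2 \log[1 + \exp(-b/\alpha^*)]. \qquad (6)$$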
If we denote by $\hat p$ the estimated actual probabilities obtained by (6), the crosses in Figure 1 indicate the points $(c_p, c_{\hat p})$. All new points lie essentially on the identity line. Notice that the axis scale used in Figure 1 emphasizes the behaviour for large values of p, which is the most relevant case in practice.

Table 1 gives numerical details for the same case as Figure 1, comparing the nominal values of p with the actual probabilities $\hat p$ on the natural scale rather than the quantile scale.
Figure 2: Observed values of h plotted versus $\alpha^*$ and interpolating function (panel title "Relationship between h and alpha* (case d=2)"; horizontal axis alpha*, vertical axis h)

Figure 3: Observed values of $1/\{\log(e^{h/2} - 1)\}$ plotted versus $\alpha^*$ and interpolating line (panel title "Relationship between h and alpha* (case d=2)"; horizontal axis alpha*, vertical axis 1/log(exp(h/2)−1))
Table 1: Nominal and actual values of the coverage probability in the case $\alpha_1 = 2$, $\alpha_2 = 6$, $\omega = -0.5$

p (nominal)    p̂ (actual)
0.01           0.043
0.025          0.056
0.05           0.077
0.1            0.121
0.2            0.212
0.3            0.306
0.5            0.500
0.7            0.698
0.8            0.797
0.9            0.898
0.95           0.949
0.975          0.974
0.99           0.990
The agreement between p and $\hat p$ is fully satisfactory for moderate and large p. Appreciable differences appear only when p is close to 0; however, this discrepancy is of little practical relevance, because p is usually larger than 1/2, and often substantially larger.
The behaviour shown in Figure 1 and in Table 1 is not specific to the parameter values considered there: almost identical results have been observed in all cases considered.
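The simulation behind Table 1 can be sketched in a few lines. The code below is not the author's procedure, which used the software of Azzalini (1998); it is a stand-alone Python version that assumes the conditioning representation $Z = \delta|U_0| + W$ of the skew-normal for sampling, and it estimates $\tilde p$ (rule (3)) and $\hat p$ (rule (6), with b = 1.544). Sample size, seed and function names are illustrative. With $\alpha_1 = 2$, $\alpha_2 = 6$, $\omega = -0.5$, the uncorrected frequencies should exceed the nominal p, while the corrected ones should be close to the values of Table 1, up to Monte Carlo error.

```python
import math
import random


def sn2_sample(n, alpha, omega, rng):
    """Yield n bivariate skew-normal draws via Z = delta*|U0| + W,
    W ~ N_2(0, Omega - delta delta'), delta = Omega alpha / sqrt(1 + alpha' Omega alpha)."""
    a1, a2 = alpha
    oa1, oa2 = a1 + omega * a2, omega * a1 + a2
    s = math.sqrt(1.0 + a1 * oa1 + a2 * oa2)
    d1, d2 = oa1 / s, oa2 / s
    l11 = math.sqrt(1.0 - d1 * d1)  # Cholesky factor of Omega - delta delta'
    l21 = (omega - d1 * d2) / l11
    l22 = math.sqrt(1.0 - d2 * d2 - l21 * l21)
    for _ in range(n):
        u0 = abs(rng.gauss(0.0, 1.0))
        v1, v2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        yield d1 * u0 + l11 * v1, d2 * u0 + l21 * v1 + l22 * v2


def two_log_f(z, alpha, omega):
    """2 log f(z) for the bivariate skew-normal density with null location."""
    z1, z2 = z
    det = 1.0 - omega * omega
    q = (z1 * z1 - 2.0 * omega * z1 * z2 + z2 * z2) / det
    Phi = 0.5 * math.erfc(-(alpha[0] * z1 + alpha[1] * z2) / math.sqrt(2.0))
    if Phi == 0.0:  # density underflows: the point lies outside any region considered here
        return -math.inf
    return -q - 2.0 * math.log(2.0 * math.pi) - math.log(det) + 2.0 * math.log(2.0 * Phi)


def coverage(p_values, alpha, omega, corrected, b=1.544, n=100_000, seed=2001):
    """Simulated probability of region (3) (corrected=False) or region (6) (corrected=True)."""
    a1, a2 = alpha
    a_star = math.sqrt(a1 * a1 + 2.0 * omega * a1 * a2 + a2 * a2)
    shift = 2.0 * math.log1p(math.exp(-b / a_star)) if corrected and a_star > 0 else 0.0
    rng = random.Random(seed)
    values = [two_log_f(z, alpha, omega) for z in sn2_sample(n, alpha, omega, rng)]
    result = []
    for p in p_values:
        rhs = (2.0 * math.log(1.0 - p) - 2.0 * math.log(2.0 * math.pi)
               - math.log(1.0 - omega * omega) + shift)
        result.append(sum(v >= rhs for v in values) / n)
    return result


ps = [0.5, 0.9, 0.95, 0.99]
print(coverage(ps, (2.0, 6.0), -0.5, corrected=False))  # p-tilde: above the nominal p
print(coverage(ps, (2.0, 6.0), -0.5, corrected=True))   # p-hat: compare with Table 1
```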
3 OTHER NUMBERS OF DIMENSIONS
The basic criterion (2) can also be adopted for other values of d, provided $c_p$ now refers to the appropriate $\chi^2_d$ distribution.
Some additional work has been done for the cases d = 1, d = 3 and d = 4. The case d = 1 is special, since the required region is an interval and it is not difficult to obtain an exact numerical solution, by finding two points, $x_1$ and $x_2$ say, such that $f(x_1) = f(x_2)$ and $P\{x_1 < Z < x_2\} = p$. The numerical computations can be carried out easily with the aid of the software mentioned earlier (Azzalini, 1998) to compute the distribution function, that is, the integral of the density. Although the availability of a numerically exact solution removes the need for additional rules like (6), we can still proceed as in the case d = 2, to examine whether a similar behaviour is present.
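The note does not detail the d = 1 computation. One possible approach, sketched below under our own choices, nests two bisections: for a trial density level, locate the two points where f equals that level on either side of the mode, integrate f between them, and adjust the level until the probability equals p. Simpson's rule on the density replaces the distribution function of Azzalini (1998), so the sketch needs only the Python standard library; all names are hypothetical. For a unimodal density, equal density at the endpoints is what makes the interval the shortest one with probability p.

```python
import math


def sn1_pdf(x, alpha):
    """Univariate skew-normal density 2 phi(x) Phi(alpha x), null location, unit scale."""
    phi = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return phi * math.erfc(-alpha * x / math.sqrt(2.0))  # equals 2 * phi * Phi(alpha x)


def bisect(g, lo, hi, iters=100):
    """Root of g in [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    glo = g(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        gmid = g(mid)
        if (gmid > 0.0) == (glo > 0.0):
            lo, glo = mid, gmid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def simpson(f, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    odd = sum(f(a + (2 * k - 1) * h) for k in range(1, n // 2 + 1))
    even = sum(f(a + 2 * k * h) for k in range(1, n // 2))
    return (f(a) + f(b) + 4.0 * odd + 2.0 * even) * h / 3.0


def sn1_interval(p, alpha):
    """Shortest interval (x1, x2) with f(x1) = f(x2) and P{x1 < Z < x2} = p."""
    f = lambda x: sn1_pdf(x, alpha)
    # Mode by golden-section search (the skew-normal density is unimodal).
    a, b, g = -3.0, 3.0, (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        c, d = b - g * (b - a), a + g * (b - a)
        if f(c) > f(d):
            b = d
        else:
            a = c
    mode = 0.5 * (a + b)

    def endpoints(level):
        # f increases to the left of the mode and decreases to its right
        x1 = bisect(lambda x: f(x) - level, mode - 12.0, mode)
        x2 = bisect(lambda x: f(x) - level, mode, mode + 12.0)
        return x1, x2

    def excess(level):
        x1, x2 = endpoints(level)
        return simpson(f, x1, x2) - p  # decreases as the level rises

    level = bisect(excess, 1e-12, f(mode) * (1.0 - 1e-9), iters=60)
    return endpoints(level)


x1, x2 = sn1_interval(0.95, 0.0)
print(x1, x2)  # normal case: about -1.96 and 1.96
```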
It turned out that the patterns observed with d = 2 were still present for all values of d which have been examined. Not only were the points aligned similarly to those of Figure 1, but an interpolatory formula of type (5) also worked well, after a suitable change of the value of b, with an interpolation quality similar to the one observed in Figure 2.
The conclusion is then as follows: by modifying the region (2) similarly to (6), we obtain the approximation

$$R_{SN} \approx \{x : 2 \log f(x) \ge -c_p - d \log(2\pi) - \log|\Omega| + 2 \log[1 + \exp(-b/\alpha^*)]\} \qquad (7)$$

to the required region; here b = 1.854, 1.544, 1.498, 1.396 when d varies from 1 to 4, respectively. Over the range of cases considered, (7) works well in practice to obtain regions of given probability p, provided p is not close to 0.
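Rule (7) needs only $c_p$, $\log|\Omega|$, $\alpha^*$ and the tabulated b. A minimal sketch follows; since the Python standard library has no chi-square quantile function, it inverts the $\chi^2_d$ distribution function (a power series for the regularized lower incomplete gamma function) by bisection, which is our own choice. With SciPy available, `scipy.stats.chi2.ppf(p, d)` would serve the same purpose. The caller supplies $\log|\Omega|$ and $\alpha^*$; the function names are hypothetical.

```python
import math

# Values of b reported in the text for d = 1, 2, 3, 4
B_BY_DIM = {1: 1.854, 2: 1.544, 3: 1.498, 4: 1.396}


def chi2_cdf(x, d):
    """Chi-square(d) distribution function, via the power series of the regularized
    lower incomplete gamma function P(d/2, x/2)."""
    if x <= 0.0:
        return 0.0
    s, y = 0.5 * d, 0.5 * x
    term = math.exp(s * math.log(y) - y - math.lgamma(s + 1.0))
    total, k = term, 0
    while term > 1e-17 * total:
        k += 1
        term *= y / (s + k)
        total += term
    return min(total, 1.0)


def chi2_quantile(p, d):
    """p-th quantile c_p of chi-square(d), by bisection on chi2_cdf."""
    lo, hi = 0.0, 1.0
    while chi2_cdf(hi, d) < p:
        hi *= 2.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_cdf(mid, d) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def region7_threshold(p, d, log_det_omega, alpha_star):
    """Right-hand side of (7): x belongs to the approximate region when 2 log f(x) >= this value."""
    shift = 2.0 * math.log1p(math.exp(-B_BY_DIM[d] / alpha_star)) if alpha_star > 0 else 0.0
    return -chi2_quantile(p, d) - d * math.log(2.0 * math.pi) - log_det_omega + shift


# Example: the d = 2 case of Figure 1 (|Omega| = 0.75, alpha* = sqrt(28)), p = 0.95
print(region7_threshold(0.95, 2, math.log(0.75), math.sqrt(28.0)))
```

For d = 2 the returned value coincides with the right-hand side of (6), as it should.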
Clearly, it would be welcome to have some theoretical understanding of why the proposed formula works so well numerically. However, even in its present form, the result can be of interest, at least for practical usage.
ACKNOWLEDGMENTS
I would like to thank an anonymous referee for insightful remarks which led to substantial
improvement of the paper. This research was supported partly by ‘Consiglio Nazionale delle
Ricerche’ (grant No. 98.01532.CT10) and partly by MURST (grant PRIN 2000), Italy.
REFERENCES
Azzalini, A. (1998). The library sn for S-plus. Available on the WWW at URL: http://azzalini.stat.unipd.it/SN
Azzalini, A. & Capitanio, A. (1999). Statistical applications of the multivariate skew-normal distribution. J. Roy. Statist. Soc. B 61, 579–602.
Azzalini, A. & Dalla Valle, A. (1996). The multivariate skew-normal distribution. Biometrika 83, 715–726.