Homogeneity testing: How homogeneous do heterogeneous

Journal of Hydrology (2008) 360, 67– 76
available at www.sciencedirect.com
journal homepage: www.elsevier.com/locate/jhydrol
Homogeneity testing: How homogeneous do
heterogeneous cross-correlated regions seem?
A. Castellarin a,*, D.H. Burn b, A. Brath a
a DISTART, School of Civil Engineering, University of Bologna, Viale Risorgimento 2, I-40136 Bologna, Italy
b Department of Civil and Environmental Engineering, University of Waterloo, 200 University Avenue West, Waterloo, ON, Canada N2L 3G1
Received 30 November 2007; received in revised form 8 July 2008; accepted 11 July 2008
KEYWORDS
Regional flood frequency analysis; Probability weighted moments (PWM); L-moments; Variance of sample estimators; Hosking and Wallis heterogeneity test
Summary The homogeneity of the flood frequency regime for a given pooling-group of
sites is a fundamental assumption for many regional flood frequency analysis techniques.
Assessing regional homogeneity is a critical step, which may be complicated by the presence of cross-correlation among flood sequences. The scientific literature proposes a
number of statistical homogeneity tests and documents that inter-site correlation of
floods is normally not negligible, but does not specifically address the impact of cross-correlation on such statistical tests. This paper analyzes the effectiveness of a well-known
homogeneity test proposed in the scientific literature in the presence of inter-site
cross-correlation through a series of Monte Carlo experiments. The numerical experiments enable us to comment on a possible theoretical correction for the test and to identify an empirical tool that accounts for the impact of inter-site cross-correlation of
floods.
© 2008 Elsevier B.V. All rights reserved.
Introduction
A crucial task in designing, constructing and operating river
engineering works or hydraulic structures is flood risk
assessment, which is usually quantified for a given site as
the flood magnitude associated with the recurrence interval
T (the so-called T-year flood). Regional (or pooled) flood
frequency analysis is widely employed in the estimation of
the flooding potential to avoid unreliable extrapolation
when dealing with data record lengths that are short as
compared to the recurrence interval of interest.
* Corresponding author. Tel.: +39 051 209 3365.
E-mail address: [email protected] (A. Castellarin).
The traditional approach to regional flood frequency
analysis involves the identification of regions, or pooling-groups, of sites that are homogeneous in terms of flood frequency regime (see e.g., Dalrymple, 1960; Burn, 1990). The
homogeneity of the group of sites is a fundamental requirement in order to perform an effective regional estimation of
the T-year quantile (e.g., Lettenmaier et al., 1987; Stedinger and Lu, 1995).
doi:10.1016/j.jhydrol.2008.07.014
The literature proposes a number of homogeneity tests;
a few examples are summarised here. Ever since the first statistical tests for assessing the degree of homogeneity of a given group of sites were introduced, hydrologists have
recognised the need for effective measures of regional heterogeneity. Dalrymple (1949, 1960)
proposed a test, described in several classical textbooks
(e.g., Chow, 1964; Singh, 1992), that judges homogeneity
by analysing the variability of the sample coefficients of
variation (Cv) and/or skewness (Cs) among sites. The Dalrymple test became very popular among practitioners, until
Wiltshire (1986a,b,c) highlighted its limited discriminatory
power and proposed two alternatives. Lu (1991) and Lu
and Stedinger (1992) recommended a homogeneity test
based upon the variability of normalised at-site Generalized Extreme Value (GEV, see e.g. Jenkinson, 1955) distribution flood quantiles. Hosking and Wallis (1993, 1997)
proposed a heterogeneity measure that is a standardised
measure of the intersite variability of L-moment ratios
(see e.g., Hosking, 1990). The Hosking and Wallis (1993,
1997) heterogeneity measures are now routinely used by
hydrologists to test regional homogeneity (see e.g., Viglione et al., 2007).
Even though a plethora of homogeneity tests have been
proposed, the literature presents only a few studies that
compare systematically the power of tests (see e.g., Fill
and Stedinger, 1995; Viglione et al., 2007). The recent analysis by Viglione et al. (2007) compares the performance of
L-moment based tests (e.g., Hosking and Wallis, 1993) and
non-parametric rank tests (Scholz and Stephens, 1987; Durbin and Knott, 1971) and shows that rank based tests outperform L-moment based tests for highly skewed flood
frequency regimes.
Classical studies document that intersite correlation
among flood flows observed at different sites is typically
not negligible (see e.g., Matalas and Langbein, 1962; Stedinger, 1983). The impact of cross-correlation on trend
tests is clearly pointed out by Douglas et al. (2000). Research by Madsen and Rosbjerg (1997) and Madsen et al.
(2002) indicates that cross-correlation can affect homogeneity testing. Recently it has been shown how a probabilistic
statement can be attached to regional envelope curves
(RECs) identified for homogeneous regions (Castellarin
et al., 2005; Castellarin, 2007; Vogel et al., 2007). The
authors present a probabilistic interpretation of RECs and
propose an empirical estimator of the exceedance probability of a REC that takes into account the effect of intersite
correlation. Regional homogeneity is a fundamental prerequisite for applying the estimator proposed by the authors,
so homogeneity testing in the presence of cross-correlation
becomes a critical step.
We assess how intersite correlation impacts the homogeneity test proposed by Hosking and Wallis (1993, 1997). This
issue is discussed through theoretical considerations of the
variance of a regional estimator of L-moments for cross-correlated annual sequences and a series of numerical experiments. The results of the study enable us to: (1) show and
quantify the loss of performance of the test associated with
the presence of intersite correlation; (2) comment on a possible theoretical correction of the test; and (3) identify an
empirical tool for adjusting the original test for cross-correlated regions.
Hosking and Wallis homogeneity test
Hosking and Wallis (1993, 1997) proposed a statistical test
for assessing the homogeneity of a group of basins at three
different levels by focusing on three measures of dispersion
for different orders of the sample L-moment ratios (see Hosking, 1990, for an explanation of L-moments).
1. A measure of dispersion for the L-Cv,

$$V_1 = \sum_{i=1}^{R} n_i \left(t_{2(i)} - \bar{t}_2\right)^2 \Bigg/ \sum_{i=1}^{R} n_i. \tag{1}$$

2. A measure of dispersion for both the L-Cv and the L-Cs coefficients in the L-Cv–L-Cs space,

$$V_2 = \sum_{i=1}^{R} n_i \left[\left(t_{2(i)} - \bar{t}_2\right)^2 + \left(t_{3(i)} - \bar{t}_3\right)^2\right]^{1/2} \Bigg/ \sum_{i=1}^{R} n_i. \tag{2}$$

3. A measure of dispersion for both the L-Cs and the L-kurtosis coefficients in the L-Cs–L-kurtosis space,

$$V_3 = \sum_{i=1}^{R} n_i \left[\left(t_{3(i)} - \bar{t}_3\right)^2 + \left(t_{4(i)} - \bar{t}_4\right)^2\right]^{1/2} \Bigg/ \sum_{i=1}^{R} n_i, \tag{3}$$

where $\bar{t}_2$, $\bar{t}_3$, and $\bar{t}_4$ are the group means of L-Cv, L-Cs, and L-kurtosis, respectively; $t_{2(i)}$, $t_{3(i)}$, $t_{4(i)}$, and $n_i$ are the values of L-Cv, L-Cs, L-kurtosis and the sample size for site i; and R is the number of sites in the pooling group.
The underlying concept of the test is to measure the
sample variability of the L-moment ratios and compare it
to the variation that would be expected in a homogeneous
group. The expected mean value and standard deviation
of these dispersion measures for a homogeneous group,
namely $\mu_{V_k}$ and $\sigma_{V_k}$, are assessed through repeated simulations, by generating homogeneous groups of basins having
the same record lengths as those of the observed data. To
avoid any undue commitment to a particular three-parameter distribution, the authors recommend the four-parameter kappa distribution to generate the synthetic groups of
flood sequences. The kappa distribution includes as special
cases several well known two- and three-parameter distributions (see e.g., Hosking and Wallis, 1997; Castellarin
et al., 2007). The heterogeneity measures are then evaluated using the following expression
$$H_k = \frac{V_k - \mu_{V_k}}{\sigma_{V_k}}, \quad \text{for } k = 1, 2, 3. \tag{4}$$
Hosking and Wallis suggest that a group of sites may be
regarded as "acceptably homogeneous" if Hk < 1, "possibly
heterogeneous" if 1 ≤ Hk < 2, and "definitely heterogeneous" if Hk ≥ 2.
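As an illustration of Eqs. (1) and (4) and of the classification thresholds, the following minimal Python sketch may help. It assumes that the V1 values simulated for the homogeneous region are already available (the kappa-distribution simulation itself is not shown), that the group mean of L-Cv is weighted by record length, and that the simulated standard deviation is the sample (n − 1) estimate. The function names are hypothetical and are not taken from the original study.

```python
import statistics


def v1(record_lengths, t2_values):
    """Eq. (1): record-length-weighted dispersion of the at-site L-Cv values.

    Assumption: the group mean of L-Cv is itself weighted by record length.
    """
    total = sum(record_lengths)
    t2_bar = sum(n * t for n, t in zip(record_lengths, t2_values)) / total
    return sum(n * (t - t2_bar) ** 2
               for n, t in zip(record_lengths, t2_values)) / total


def heterogeneity(v_observed, v_simulated):
    """Eq. (4): standardise the observed dispersion with the mean and
    standard deviation of the dispersion simulated for a homogeneous region."""
    mu = statistics.mean(v_simulated)
    sigma = statistics.stdev(v_simulated)  # sample standard deviation (assumption)
    return (v_observed - mu) / sigma


def classify(h):
    """Reference values suggested by Hosking and Wallis."""
    if h < 1:
        return "acceptably homogeneous"
    if h < 2:
        return "possibly heterogeneous"
    return "definitely heterogeneous"
```

In practice `v_simulated` would hold one V1 value per synthetic homogeneous region, each generated with the same record lengths as the observed data.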
According to Hosking and Wallis (1993, p. 277–278), "if
H were used as a significance test, then the criterion for
rejection of the hypothesis of homogeneity at significance
level 10%, assuming normality for the distribution of V
would be H = 1.28. In comparison, a criterion H = 1 may
seem very strict, but as noted above, we do not seek to
use H in a significance test". The authors regard the reference values as guidelines instead, treating for instance the
value H = 1 as the borderline at which a redefinition of
the region may lead to a meaningful increase in the accuracy of the quantile estimate.
Concerning the possible effects of cross-correlation on
the test, the authors state that positive correlation among
sites is the most likely cause of negative values of Hk,
and that large negative values, say Hk < −2, are likely to be associated with a large amount of cross-correlation (Hosking and
Wallis, 1997, p. 71). Since the synthetic sequences are
uncorrelated by definition, the sample variability of L-moment ratios for the synthetic group of sites is expected to
be higher than the sample variability for the original sequences when cross-correlation is present, even when real
and synthetic regions are characterised by the same degree
of heterogeneity. Therefore, cross-correlation may result in
large negative values of Hk when the group of sites is homogeneous as suggested by the authors. More importantly,
lower (rather than negative) values of Hk may cause a miscategorisation of the group of sites, so that a
heterogeneous group of sites is regarded as possibly homogeneous.
Fig. 1 schematically illustrates three possible categorisation
errors. The figure reports the Hk values computed as recommended by Hosking and Wallis for an uncorrelated region
on the x-axis, and the values of the same measure for a
cross-correlated region with the same degree of heterogeneity in terms of L-moments on the y-axis. Practically
speaking, the y-axis reports the Hk values that are actually
obtained from the application of the test to the real group
of sequences, whereas the x-axis reports the Hk values for a hypothetical uncorrelated group of sites with the same degree
of heterogeneity. The homogeneity testing should be based
upon these latter Hk values but, unfortunately, they are unknown. Depending on the amount of cross-correlation of the
real group of sequences, the scheme identifies the following
categorisation errors (the darker the colour, the worse the
error): Error 1 – possibly heterogeneous cross-correlated
region categorized as acceptably homogeneous; Error 2 –
definitely heterogeneous region categorized as possibly heterogeneous; Error 3 – definitely heterogeneous region categorized as acceptably homogeneous. The remainder of our
paper analyzes the possibility of occurrence of these errors
from both a theoretical and an empirical perspective.
Information content of regional moments
It is well known that inter-site correlation is generally not
negligible for flood sequences (see e.g., Matalas and Langbein, 1962; Stedinger, 1983; Troutman and Karlinger,
2003; Rosbjerg, 2007 and Fig. 2) and leads to increases in
the variance of regional flood statistics (see for instance
Hosking and Wallis, 1988). For the case of R spatially correlated flood series with constant population mean and variance, each with record length n, Yule (1945) and Matalas
and Langbein (1962, Eq. (16)) document that the variance
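of a regional mean is inflated by a factor that depends on
the average correlation among the sites, $\bar{\rho}$,

$$\mathrm{Var}\left[\bar{x} \mid \bar{\rho}\right] = \frac{\sigma_X^2}{R n}\left[1 + \bar{\rho}\,(R - 1)\right], \tag{5}$$

where $\bar{x}$ indicates the regional sample mean and $\sigma_X^2$ the population variance of the R series, each of length n; if $\bar{\rho} = 0$ the
variance of $\bar{x}$ reduces to $\sigma_X^2/(Rn)$. Matalas and Langbein
(1962) defined the relative information content of R spatially correlated flood series, each of length n, as the ratio,

$$I = \frac{\mathrm{Var}\left[\bar{x}\right]}{\mathrm{Var}\left[\bar{x} \mid \bar{\rho}\right]} = \left[1 + \bar{\rho}\,(R - 1)\right]^{-1}. \tag{6}$$

Figure 1 Possible categorization errors due to the presence of cross-correlation. (Schematic: H values for the uncorrelated region on the x-axis and for the cross-correlated region on the y-axis, with the thresholds 1 and 2 delimiting the areas Err. 1, Err. 2 and Err. 3.)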
The information content of the mean, I, in Eq. (6), is measured relative to the variance of the mean associated with
spatially and serially uncorrelated flows. Hence I = 1 when
$\bar{\rho} = 0$ and I < 1 when $\bar{\rho} > 0$. Values of I < 1 reflect the fact
that intersite correlation reduces the overall information
content of the regional sample. The effective number of
regional samples associated with estimation of the regional
mean, $R_{\bar{x}}$, is then,

$$R_{\bar{x}} = R\,I. \tag{7}$$

Figure 2 Empirical cross-correlation coefficients for a group of 32 Italian and 226 US annual flood sequences (see Castellarin, 2007
and Vogel et al., 2007).

Stedinger (1983, Eq. (35)) derived the variance of an estimate of the regional variance of R cross-correlated and normally distributed series, each of length n, as,

$$\mathrm{Var}\left[s_X^2 \mid \overline{\rho^2}\right] = \frac{2\sigma_X^4}{R(n - 1)}\left[1 + \overline{\rho^2}\,(R - 1)\right] = \mathrm{Var}\left[s_X^2\right]\left[1 + \overline{\rho^2}\,(R - 1)\right], \tag{8}$$

where $s_X^2$ stands for the estimator of the regional variance
that, for samples of different length, is a weighted average
in which each sample variance is weighted proportionally to
the record length of the corresponding site, and $\overline{\rho^2}$ is the
average squared correlation of concurrent flows. Analogous
to the effective number of regional samples (sites) for estimation of the regional mean given in Eq. (7), the effective
number of regional samples (sites) for the estimation of
the regional variance, $R_{s_x^2}$, can be computed as follows,

$$R_{s_x^2} = R\left[1 + \overline{\rho^2}\,(R - 1)\right]^{-1}. \tag{9}$$

In general, $\overline{\rho^2}$ is smaller than $\bar{\rho}$ and therefore Eq. (9) returns
a number of effective sites that is higher than the number
returned by Eq. (7).

Figure 3 Variance of regional average PWMs, L-moments and
L-moment ratios for cross-correlated regions obtained from
Monte Carlo simulation (20,000 replicates) and computed
through the information content of the mean and conventional
moments (see Matalas and Langbein, 1962; Stedinger, 1983). (Three log-log panels: Probability Weighted Moments b0 to b3 against the information content of the mean; L-moments l1 to l4 and L-moment ratios t2 to t4 against the information content of conventional moments; x-axis: Monte Carlo experiments.)

Figure 4 Schematic of the simulation options: 1HETSITE,
single discordant site, and BIMODAL, half discordant series
(regular catchments: dark grey; discordant catchments: light
grey).

The indexes Hk (with k = 1, 2 and 3) proposed by Hosking
and Wallis (1993) measure the inter-group variability of
sample L-moments and therefore are impacted by the presence of intersite correlation, which inflates the variance of
regional moments (as well as L-moments and L-moment ratios). The derivation of the relative information content of
regional L-moments and L-moment ratios is critical to the
quantification of the impact of cross-correlation on the Hosking and Wallis (1993) homogeneity test.
To derive indications of the information content of regional L-moments and L-moment ratios one may refer to the
Probability Weighted Moments (PWM, see Greenwood et al.,
1979), of which L-moments are linear combinations. We considered the class of PWMs for which the moment of order r
reads,

$$\beta_r = E\left\{X\left[F_X(x)\right]^r\right\}, \tag{10}$$

where $F_X(x)$ is the cumulative distribution function (CDF) of
the random variable X and E{·} is the expectation. The unbiased estimator of $\beta_r$ (see e.g., Greenwood et al., 1979)
reads,
$$b_r = n^{-1}\binom{n-1}{r}^{-1}\sum_{j=r+1}^{n}\binom{j-1}{r}\,x_{j:n}, \tag{11}$$
in which n is the length of the series and xj:n is the jth order
statistic, that is the jth value of a sample of length n arranged in ascending order (b0 is the sample mean). An unbiased sample estimator of the L-moment of order r + 1 is
then defined as,
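$$\ell_{r+1} = \sum_{k=0}^{r} p_{r,k}\,b_k, \quad \text{where } p_{r,k} = (-1)^{r-k}\binom{r}{k}\binom{r+k}{k}, \quad r = 0, 1, \ldots, n-1 \tag{12}$$

($\ell_1$ is the sample mean). L-moment ratios of order 2 and
r + 1 > 2 can then be estimated as,

$$t_2 = \ell_2/\ell_1 \quad \text{and} \quad t_{r+1} = \ell_{r+1}/\ell_2, \tag{13}$$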
where we adopt the same notation used in Eqs. (1)–(3), that
is t2 is the sample estimator of L-Cv, t3 of L-Cs, and t4 of
L-kurtosis.
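The estimators of Eqs. (11)–(13) can be sketched in a few lines of Python. This is only an illustrative implementation using the standard library; the function names are hypothetical, and the sample is assumed to contain at least four observations so that L-kurtosis is defined.

```python
from math import comb


def sample_pwm(x, r):
    """Unbiased sample PWM b_r of Eq. (11)."""
    xs = sorted(x)          # xs[j - 1] is the j-th order statistic x_{j:n}
    n = len(xs)
    total = sum(comb(j - 1, r) * xs[j - 1] for j in range(r + 1, n + 1))
    return total / (n * comb(n - 1, r))


def sample_l_moment(x, order):
    """Sample L-moment of the given order (order = r + 1), Eq. (12)."""
    r = order - 1
    return sum((-1) ** (r - k) * comb(r, k) * comb(r + k, k) * sample_pwm(x, k)
               for k in range(r + 1))


def l_moment_ratios(x):
    """Sample L-Cv (t2), L-Cs (t3) and L-kurtosis (t4), Eq. (13)."""
    l1, l2, l3, l4 = (sample_l_moment(x, m) for m in (1, 2, 3, 4))
    return l2 / l1, l3 / l2, l4 / l2
```

For the evenly spaced sample 1, 2, 3, 4, 5 the sketch returns a sample mean of 3, a second L-moment of 1 (so L-Cv equals 1/3), and zero L-Cs, as expected for a symmetric sample.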
The regional average PWM or L-moment can in general be
written as (see e.g. Hosking and Wallis, 1997),

$$M_s^{\mathrm{Reg.}} = \frac{\sum_{i=1}^{R} n_i M_{s,i}}{\sum_{i=1}^{R} n_i}, \tag{14}$$

where $M_s^{\mathrm{Reg.}}$ is the sample estimator of the regional moment
of order s (i.e., PWM, L-moment, L-moment ratio), $M_{s,i}$ is
the at-site sample estimate of the same moment for site
i, $n_i$ is the length of the sequence at site i and R is the number of sites in the region (or pooling group). For the sake of
simplicity, the remainder of the paper considers regional
samples consisting of concurrent annual sequences of equal
length n.
Sampling properties of L-moments are analysed by Sankarasubramanian and Srinivasan (1999), whereas Elamir
and Seheult (2004) present the exact variance structure of
sample PWMs and L-moments, but the sampling properties
of regional PWMs and L-moments for cross-correlated sequences have not been addressed yet. We analysed this issue
through Monte Carlo simulation experiments. We generated
20,000 synthetic regional samples from the multivariate
normal distribution with constant mean (i.e., 10) and variance (i.e., 1). Regional samples consist of R = 10, 20, 30
concurrent and cross-correlated sequences of length
n = 10, 25, 50 years and cross-correlation ρ ranging from 0
to 0.8 with step 0.2. We then computed the values of the
empirical variance of regional moments (PWMs up to order
3, L-moments up to order 4 and corresponding L-moment ratios) for the 20,000 replicates of each set of R, n and ρ values and we adopted the same form of the regional
information content derived by Matalas and Langbein
(1962) and Stedinger (1983) to express these values as a
function of: (i) the empirical variance of the corresponding
regional moment for the uncorrelated case (R, n, ρ = 0); (ii)
R; and (iii) $\rho$ or $\rho^{r+1}$, depending on the considered moment.
The results can be summarised as follows,

$$\begin{aligned}
\widehat{\mathrm{Var}}\left[b_r^{\mathrm{Reg.}} \mid \rho\right] &\approx \widehat{\mathrm{Var}}\left[b_r^{\mathrm{Reg.}}\right]\left[1 + \rho\,(R - 1)\right] && \text{for } r = 0, 1, 2 \text{ and } 3; \\
\widehat{\mathrm{Var}}\left[\ell_{r+1}^{\mathrm{Reg.}} \mid \rho\right] &\approx \widehat{\mathrm{Var}}\left[\ell_{r+1}^{\mathrm{Reg.}}\right]\left[1 + \rho^{r+1}(R - 1)\right] && \text{for } r = 0, 1, 2 \text{ and } 3; \\
\widehat{\mathrm{Var}}\left[t_{r+1}^{\mathrm{Reg.}} \mid \rho\right] &\approx \widehat{\mathrm{Var}}\left[t_{r+1}^{\mathrm{Reg.}}\right]\left[1 + \rho^{r+1}(R - 1)\right] && \text{for } r = 1, 2 \text{ and } 3;
\end{aligned} \tag{15}$$

where $\widehat{\mathrm{Var}}[\cdot]$ indicates the empirical variance resulting from
Monte Carlo simulations. The scatterplots of Fig. 3 report on
the x-axis the left term of Eq. (15) and the right terms on
the y-axis. Each point refers to a particular set of R, n,
and ρ values.
This evidence can be summarised by saying that the
information content of regional PWMs coincides with the
information content of the mean, regardless of the order
of the moment, whereas the information content of regional L-moments and L-moment ratios coincides with the
information content of conventional moments of the same
order. This result is consistent with the fact that sample
PWMs of any order are linear combinations of observations,
and therefore, when a first-order Taylor series approximation to the variance of sample PWMs is employed (see e.g., Castellarin et al., 2005), all covariance terms are equal to zero. The
same consideration does not apply to sample L-moments,
which are linear combinations of sample PWMs of different
orders.
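The first relation of Eq. (15), for r = 0 (the regional mean), can be checked with a small Monte Carlo sketch. The code below is not the authors' experiment: it uses fewer replicates (4,000 instead of 20,000), a single (R, n, ρ) combination, and a one-factor construction (a common yearly term plus a site-specific term) that is an assumed shortcut for sampling equicorrelated normal series. All names are hypothetical.

```python
import random
import statistics


def regional_mean(rng, n_sites, n_years, rho, mean=10.0):
    """Regional sample mean of one synthetic region of equicorrelated
    normal series with unit variance and pairwise correlation rho."""
    total = 0.0
    for _ in range(n_years):
        common = rng.gauss(0.0, 1.0)       # term shared by all sites in a year
        for _ in range(n_sites):
            own = rng.gauss(0.0, 1.0)      # site-specific term
            total += mean + rho ** 0.5 * common + (1.0 - rho) ** 0.5 * own
    return total / (n_sites * n_years)


def empirical_variance(n_sites, n_years, rho, replicates=4000, seed=1):
    """Empirical variance of the regional mean over many replicates."""
    rng = random.Random(seed)
    means = [regional_mean(rng, n_sites, n_years, rho) for _ in range(replicates)]
    return statistics.variance(means)


R, n, rho = 10, 10, 0.4
var_uncorrelated = empirical_variance(R, n, 0.0, seed=1)
var_correlated = empirical_variance(R, n, rho, seed=2)

# Empirical inflation factor against the theoretical factor 1 + rho (R - 1)
print(var_correlated / var_uncorrelated, 1.0 + rho * (R - 1))
```

The two printed numbers should be close to each other, up to the sampling noise that comes with the reduced number of replicates.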
Figure 5 Monte Carlo simulations: average H1 values computed from 10,000 replicates with different degrees of cross-correlation
and heterogeneity (1HETSITE: single discordant site with Cv = Cv*, BIMODAL: half discordant series with Cv = Cv*). (Panels for 20 sites and 25 years; x-axis: Cv* (discordant series); y-axis: H value; curves: uncorrelated case and cross-correlation 0.2, 0.4, 0.6, 0.8.)
Figure 6 Monte Carlo simulations, average H1 values: uncorrelated vs. cross-correlated regions for different degrees of cross-correlation and heterogeneity. (Six panels: 1HETSITE and BIMODAL for 10, 20 and 30 sites, each with series of 10, 25 and 50 years; x-axis: H values, uncorrelated region; y-axis: H values, cross-correlated region; series: uncorrelated case and cross-correlation 0.2, 0.4, 0.6, 0.8.)
The relative information content of regional L-moment
ratios is critical to the interpretation of how cross-correlation among flood sequences may impact Hosking and Wallis
(1993) heterogeneity measures. For example, H1 measures
the dispersion of sample L-Cv values for the group of series.
If cross-correlation is present, the expected variance of regional L-Cv for the original R series corresponds to the variance of $R_{s_x^2} < R$ independent sequences, with $R_{s_x^2}$ expressed
by Eq. (9). Using as reference in Eq. (4) $\mu_{V_1}$ and $\sigma_{V_1}$ values
estimated for R synthetic and independent series may severely diminish the significance of the test. Analogous considerations hold for measures H2 and H3.
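To give a sense of the magnitudes involved, Eqs. (7) and (9) can be evaluated directly. The sketch below is illustrative only; the function names are hypothetical, and the example values (20 sites with a constant correlation of 0.4, hence an average squared correlation of 0.16) are assumptions chosen within the ranges explored in this paper.

```python
def effective_sites_mean(n_sites, rho_bar):
    """Eq. (7): effective number of sites for the regional mean, R * I,
    with the information content I of Eq. (6)."""
    return n_sites / (1.0 + rho_bar * (n_sites - 1))


def effective_sites_variance(n_sites, rho_sq_bar):
    """Eq. (9): effective number of sites for the regional variance,
    based on the average squared correlation."""
    return n_sites / (1.0 + rho_sq_bar * (n_sites - 1))


# Hypothetical region: 20 sites, constant correlation 0.4 among all pairs
R = 20
rho = 0.4
print(effective_sites_mean(R, rho))          # about 2.3 effective sites
print(effective_sites_variance(R, rho ** 2)) # about 5.0 effective sites
```

Under these assumptions the 20 correlated series carry roughly the information of five independent series as far as second-order statistics such as L-Cv are concerned, which is the reduction the reference values in Eq. (4) do not account for.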
Monte Carlo experiments
We assessed the sensitivity of H1 to cross-correlation by
adopting a Monte Carlo simulation algorithm similar to the
algorithm used by Hosking and Wallis (1988) and by Castellarin et al. (2005) (see Appendix). We repeatedly generated
10,000 synthetic regions with given degrees of regional heterogeneity and cross-correlation. Each synthetic region consists of R spatially correlated flood series of length n, with
cross-correlation among the sequences equal to ρ. The following values were adopted for the experiments: R = 10, 20,
30 series; n = 10, 25, 50 years and ρ = 0.2, 0.4, 0.6, 0.8 and
0.0 (uncorrelated case). We selected as regional parent distribution for generating the sequences of regular sites the
Gumbel distribution with unit mean and Cv equal to 0.4,
EV1(1, 0.4) (see e.g., Hosking and Wallis, 1988), whereas
we generated the discordant series from an EV1(1, Cv*), with
Cv* = 0.1, 0.2, . . . , 0.7.

Figure 7 Monte Carlo simulations, average H1 values: uncorrelated vs. cross-correlated regions for different degrees of cross-correlation, heterogeneity and parent distributions (black markers: GEV; white markers: EV1; average correlation: 0.2 triangles; 0.4
circles; 0.6 diamonds; length of the series 10 and 50 years). (Two panels: 10 sites and 30 sites; x-axis: H values, uncorrelated region; y-axis: H values, cross-correlated region.)
The study adopts two different generation options. The
first option (1HETSITE) considers synthetic regions in which
all series but one are regular, and only one discordant series
is present. The second option (BIMODAL) generates R/2 sequences from an EV1(1, 0.4) and R/2 from an EV1(1, Cv*).
Fig. 4 illustrates a schematic of the two options, reporting
regular (dark grey) and discordant (light grey) catchments.
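The two generation options can be sketched as follows, in the spirit of the algorithm outlined in the Appendix (correlated standard normal variates mapped to the target marginal distribution). This is a simplified illustration, not the authors' code: the one-factor construction of equicorrelated normals, the clamping of probabilities, and all function and parameter names are assumptions, and the correlation ρ is imposed in the normal space rather than on the flood values themselves.

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329


def gumbel_quantile(p, mean, cv):
    """Quantile of an EV1 (Gumbel) distribution with given mean and Cv."""
    scale = cv * mean * math.sqrt(6.0) / math.pi
    location = mean - EULER_GAMMA * scale
    return location - scale * math.log(-math.log(p))


def generate_region(n_sites, n_years, rho, cv_regular, cv_discordant,
                    n_discordant, seed=0):
    """Return n_sites series of length n_years with EV1(1, Cv) marginals.

    n_discordant = 1 mimics 1HETSITE; n_discordant = n_sites // 2 mimics BIMODAL.
    """
    rng = random.Random(seed)
    phi = NormalDist().cdf
    cvs = [cv_discordant] * n_discordant + [cv_regular] * (n_sites - n_discordant)
    region = [[] for _ in range(n_sites)]
    for _ in range(n_years):
        common = rng.gauss(0.0, 1.0)  # term shared by all sites in a given year
        for site, cv in enumerate(cvs):
            z = math.sqrt(rho) * common + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            # Clamp to keep the double logarithm finite for extreme variates
            p = min(max(phi(z), 1e-12), 1.0 - 1e-12)
            region[site].append(gumbel_quantile(p, 1.0, cv))
    return region


# Hypothetical examples: 20 sites, 25 years, rho = 0.4, Cv* = 0.7
one_het_site = generate_region(20, 25, 0.4, 0.4, 0.7, n_discordant=1)
bimodal = generate_region(20, 25, 0.4, 0.4, 0.7, n_discordant=10)
```

Each synthetic region produced this way could then be passed through the at-site L-moment estimators and the H1 computation to obtain one replicate of the experiment.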
For both options, 1HETSITE and BIMODAL, and all considered cases (R, n, ρ and Cv*), we averaged the 10,000 H1 values. Fig. 5 illustrates for R = 20 series and n = 25 years the
relationship between average H1 and Cv* for different degrees of cross-correlation, showing the strong control of
cross-correlation on H1.
Fig. 6 reports the results of the simulation experiments
for 1HETSITE and BIMODAL options. The figure illustrates
the relationship between average H1 values for uncorrelated
and cross-correlated regions with the same degree of heterogeneity, identified by Cv*.
Since the results presented in Figs. 5 and 6 may be
dependent on the particular parent distribution adopted
for the numerical experiment, we repeated the 1HETSITE
simulations for R = 10, 30 series, n = 10, 50 years and
ρ = 0.0, 0.2, 0.4, 0.6 using the GEV distribution instead of
the EV1 distribution. The parameters of the distribution
were set to obtain a skewness coefficient γ = 2.25, which
is almost twice as high as γ for an EV1 distribution, a Cv equal
to 0.4 for regular series and Cv* = 0.2, 0.4, 0.6 for discordant
series and a unit mean. Fig. 7 compares the results obtained
for the EV1 and GEV regional parents in terms of average H1
values.
Discussion
The results reported in Figs. 5–7 lead to a number of considerations. We present below the most significant ones.
The analysis of Figs. 5–7 suggests that the effects of
cross-correlation on the discriminatory power of the Hosking and Wallis (1993) homogeneity test (i.e., H1 heterogeneity measure) are not negligible for plausible values of the
cross-correlation coefficient among annual flood sequences
(ρ = 0.2–0.6, see e.g., Matalas and Langbein, 1962; Stedinger, 1983; Hosking and Wallis, 1988; Vogel et al., 2001; Castellarin, 2007). In particular, Figs. 6 and 7 show that, due to
the presence of cross-correlation, categorization errors of
type 1 and 2 (see Fig. 1) occur frequently, and categorization errors of type 3 may also occur.
Fig. 6 clearly shows that the impact of cross-correlation
in terms of the relationship between H1 values for correlated and uncorrelated regions is associated with the degree
of cross-correlation and the number of sequences. Results
for 1HETSITE and BIMODAL options show nearly coincident
patterns, regardless of the length of the series n. Additionally, Fig. 7 points out that the results are independent of the
regional parent distribution (EV1 or GEV) as long as the heterogeneity degree is of the same nature, as for the simulations performed here in which parent distributions of
regular and discordant series differ in terms of Cv (or equivalently L-Cv) only.
Finally, it is interesting to observe that the results reported in Figs. 6 and 7 for a given degree of cross-correlation among series show a significant linearity between
average H1 values for cross-correlated and uncorrelated
synthetic regions, at least for the range of H1 values considered in the figures.
These considerations suggest identifying an empirical
corrector of the homogeneity test based upon H1 for providing an approximate indication of the actual degree of heterogeneity of the region in the presence of cross-correlation.
The form of the selected empirical corrector reads as
follows,

$$H_{1,\mathrm{adj}} = H_1 + C\,\overline{\rho^2}\,(R - 1), \tag{16}$$

where $H_{1,\mathrm{adj}}$ is the adjusted value of the heterogeneity measure, $H_1$ is the value resulting from the homogeneity test, C
is the empirical coefficient of the corrector that is assumed
to be constant, $\overline{\rho^2}$ is the average squared correlation of
concurrent flows and R the number of sequences in the region. The expression selected for the corrector reflects:
(1) the discussion of Section 3 on the information content
of regional L-Cv; and (2) the marked linearity between average H1 values for cross-correlated and uncorrelated synthetic regions obtained via Monte Carlo simulation.

Figure 8 Average H1 values for uncorrelated regions having the same heterogeneity degree of original cross-correlated regions
obtained from simulation (Monte Carlo) and by applying Eq. (16) (black markers: GEV; white markers: EV1). (Two panels: 1HETSITE and BIMODAL; x-axis: Monte Carlo; y-axis: Empirical Corrector.)
We identified the value of C in Eq. (16) from the average
H1 values from simulation 1HETSITE – EV1 (i.e., option – regional parent distribution) through an ordinary least squares
regression procedure. The identified value, C = 0.122, is
associated with a Nash and Sutcliffe (1970) efficiency measure E = 0.981, with E ∈ (−∞, 1] and E = 1 for a perfect fit.
The same C value applied to the average H1 values obtained
for BIMODAL – EV1 returns E = 0.997, and the application for
1HETSITE – GEV results in E = 0.961. Fig. 8 shows the scatterplots resulting from the application of Eq. (16) for 1HETSITE
– EV1 and 1HETSITE – GEV (left panel) and for BIMODAL –
EV1 (right panel), highlighting the reduction of categorization errors due to the application of the empirical corrector.
It is important to note that the nature of Eq. (16) is empirical and the proposed value of the coefficient C is inevitably
associated with the Monte Carlo simulation experiments performed in this study. Also, all simulations performed here
refer to hypothetical regions (i.e., group of flood sequences)
with constant cross-correlation coefficients among sequences and concurrent series of equal length. Generalisation of these aspects (see e.g., Castellarin, 2007) is an
open problem for future analyses.
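For illustration, the empirical corrector of Eq. (16) and the Nash and Sutcliffe efficiency used to assess it can be written as below. This is a minimal sketch: the function names and the numerical example are hypothetical, the default C = 0.122 is the value identified in this study, and the efficiency is written in its usual form, one minus the ratio of the sum of squared errors to the sum of squared deviations of the reference values from their mean.

```python
def adjusted_h1(h1, rho_sq_bar, n_sites, c=0.122):
    """Eq. (16): empirical adjustment of H1 for intersite correlation.

    rho_sq_bar is the average squared correlation of concurrent flows.
    """
    return h1 + c * rho_sq_bar * (n_sites - 1)


def nash_sutcliffe(reference, estimated):
    """Nash and Sutcliffe efficiency: 1 for a perfect fit, unbounded below."""
    mean_ref = sum(reference) / len(reference)
    sse = sum((r - e) ** 2 for r, e in zip(reference, estimated))
    spread = sum((r - mean_ref) ** 2 for r in reference)
    return 1.0 - sse / spread


# Hypothetical region: H1 = 0.5 from the test, 20 sites, average squared
# correlation 0.16 (e.g., constant correlation 0.4 among all pairs)
print(adjusted_h1(0.5, 0.16, 20))
```

In this hypothetical case the adjustment raises H1 from 0.5 to about 0.87, still below the threshold of 1; with a larger number of sites or stronger correlation the same raw value could move into the "possibly heterogeneous" range.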
Conclusions
The main objective of the present study is to show that the
presence of cross-correlation among annual flood sequences,
which generally cannot be ignored in practice, may significantly reduce the power of statistical homogeneity tests.
The study refers explicitly to the heterogeneity measure proposed by Hosking and Wallis (1993) based on L-moments, but
analogous considerations hold for all parametric and nonparametric tests proposed in the scientific literature.
First, the limitations of the considered test are examined
from a theoretical viewpoint by considering the effects of
sampling properties of L-moments for cross-correlated
regions. We adopted the concept of information content of
regional moments (see e.g. Matalas and Langbein, 1962; Stedinger, 1983) to express the impact of cross-correlation on
the variance of regional L-moments, and we showed via a
numerical experiment that the information content of regional L-moments corresponds to the information content of regional estimators of conventional moments.
Second, we assessed the effects of cross-correlation on
the power of the homogeneity test through a series of Monte
Carlo experiments. In particular, we quantified the impact
of cross-correlation on one of the heterogeneity measures
proposed by Hosking and Wallis, the measure that quantifies
the regional heterogeneity in terms of the dispersion of
sample L-Cv. We considered a number of different hypotheses on the size of the regional sample, heterogeneity degree, and regional parent distribution. The results indicate
that cross-correlation exerts a strong control on the considered homogeneity test and may lead to mis-categorization
of the groups of sequences (e.g., the group of sequences
may be regarded as possibly homogeneous, when it should
be regarded as heterogeneous).
Finally, we proposed an empirical corrector of the test
that provides an approximate indication of the actual degree of heterogeneity of the region (or group of sequences)
in the presence of cross-correlation.
Our study is approximate and represents a preliminary
effort towards a comprehensive insight into the effects of cross-correlation on homogeneity tests. We are persuaded that this
is an important issue, fundamental to regional frequency
analysis, which has unjustifiably received very limited attention
from the scientific community. Further aspects should be addressed by future numerical analyses to generalise
our study. Some relevant examples are: (i) the utilization
of a plausible intersite correlation model instead of using a
constant theoretical correlation coefficient among all sequences; (ii) a realistic variability of the series lengths within
the region (e.g., missing data at some gauges, installation of
new gauges, dismantlement of obsolete gauges, etc.) instead
of considering concurrent sequences of equal length; and (iii)
the assessment of the effects of cross-correlation on other
homogeneity tests proposed in the scientific literature, parametric and non-parametric.
We refer in our study to one particular homogeneity test,
by virtue of its renown and widespread use. Nevertheless, the conclusions we draw are general and suggest a
scrupulous revision, or at least an informed application, of
all homogeneity tests in the presence of significant intersite
dependence.
Acknowledgements
The research was partially supported by the Italian MIUR
(Ministry of Education, University and Research) through
the research grant titled “Characterisation of average
and extreme flows in ungaged basins by integrated use of
data-based methods and hydrological modelling”. The suggestions and comments of two anonymous reviewers are
gratefully acknowledged.
Appendix A. Simulation algorithm
The study adopts a simulation algorithm analogous to the
algorithm introduced by Hosking and Wallis (1988) to generate a large number of cross-correlated synthetic regions.
The algorithm assumes that if each site’s flood frequency
distribution were transformed to normality using the transformation F, then the joint distribution for all sites in the
region would be multivariate normal. The simulation algorithm involves two main steps: (1) generation of a vector y having a multivariate normal distribution with
a given correlation matrix P; (2) application of the inverse
transformation $F^{-1}$ to obtain data with the required marginal distribution. A brief description of the simulation algorithm is given below:
1. Assume a region with R sites having record length n and
parent distribution of the annual flood denoted as $F_X$
for the $R - R^*$ regular sites, and $F_X^*$ for the $R^*$ discordant sites.
Also assume that the cross-correlation coefficient among all
normalized floods is $\rho$, that is, the diagonal elements of
the R-by-R matrix P are equal to one while the non-diagonal
elements are all equal to $\rho$.
2. Generate the regional sample: (1) Generate a matrix
$z = [z_1, z_2, \ldots, z_R]$ with R columns and n rows. Each column $z_j$, $j = 1, 2, \ldots, R$, contains the n normal
deviates for site j, with zero mean and unit variance; the R deviates in each row are jointly normal with covariance
matrix P. (2) Transform the elements of the matrix z
belonging to regular sites into a realization from the correct
marginal distribution by setting $x_j^i = F_X^{-1}(\Phi(z_j^i))$,
with $i = 1, 2, \ldots, n$ and $j = 1, 2, \ldots, R - R^*$, and where $\Phi$
is the cdf of the standard normal distribution. (3) Analogously, transform the elements of the matrix z belonging
to discordant sites into a realization from their marginal
distribution by setting $x_k^i = {F_X^*}^{-1}(\Phi(z_k^i))$, where $i = 1,
2, \ldots, n$ and k varies from $(R - R^* + 1)$ to R.
3. Calculate the heterogeneity measure H1 for the synthetic
region as indicated by Hosking and Wallis (1993, 1997).
4. Repeat steps 2 and 3 10,000 times and calculate the
average of all H1 values.
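Steps 1 and 2 above can be illustrated with a minimal Python sketch. It is not the code used in the study: the function names (`gumbel_quantile`, `simulate_region`, `sample_lcv`, `lcv_dispersion`) are hypothetical, the parents are assumed to be Gumbel with unit mean, and the equicorrelated normal deviates are obtained from a single shared factor per year, a shortcut that is only valid for a constant correlation between 0 and 1. The sketch stops at the weighted dispersion of the sample L-Cv values; the standardisation that turns this dispersion into H1 (Hosking and Wallis, 1993, 1997) requires a further simulation from a fitted regional distribution and is omitted here.

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
PHI = NormalDist().cdf  # cdf of the standard normal distribution


def gumbel_quantile(p, mean, cv):
    """Inverse cdf of a Gumbel (EV1) distribution with given mean and CV."""
    scale = cv * mean * math.sqrt(6.0) / math.pi
    loc = mean - EULER_GAMMA * scale
    return loc - scale * math.log(-math.log(p))


def simulate_region(n_sites, n_discordant, n_years, rho,
                    cv_regular, cv_discordant, rng):
    """Return n_sites series of n_years cross-correlated annual floods.

    The last n_discordant sites use cv_discordant, the others cv_regular.
    Assumption: equicorrelation is imposed through one shared normal factor
    per year, which requires 0 <= rho <= 1.
    """
    if not 0.0 <= rho <= 1.0:
        raise ValueError("this shortcut needs 0 <= rho <= 1")
    a, b = math.sqrt(rho), math.sqrt(1.0 - rho)
    sites = [[] for _ in range(n_sites)]
    for _ in range(n_years):
        common = rng.gauss(0.0, 1.0)
        for j in range(n_sites):
            z = a * common + b * rng.gauss(0.0, 1.0)
            # Keep p strictly inside (0, 1) so the logarithms are defined.
            p = min(max(PHI(z), 1e-12), 1.0 - 1e-12)
            cv = cv_regular if j < n_sites - n_discordant else cv_discordant
            sites[j].append(gumbel_quantile(p, 1.0, cv))
    return sites


def sample_lcv(x):
    """Sample L-Cv (l2 / l1) from unbiased probability weighted moments."""
    xs = sorted(x)
    n = len(xs)
    b0 = sum(xs) / n
    b1 = sum(i * v for i, v in enumerate(xs)) / (n * (n - 1))
    return (2.0 * b1 - b0) / b0


def lcv_dispersion(sites):
    """Record-length-weighted standard deviation of the at-site L-Cv values."""
    weights = [len(s) for s in sites]
    t = [sample_lcv(s) for s in sites]
    total = sum(weights)
    t_reg = sum(w * ti for w, ti in zip(weights, t)) / total
    return math.sqrt(
        sum(w * (ti - t_reg) ** 2 for w, ti in zip(weights, t)) / total)


if __name__ == "__main__":
    rng = random.Random(2008)
    # One synthetic region: 10 sites, 1 discordant, 25 years, rho = 0.4.
    region = simulate_region(10, 1, 25, 0.4, 0.4, 0.7, rng)
    print(lcv_dispersion(region))
```

Repeating the call to `simulate_region` many times and averaging a heterogeneity statistic over the replicates would mirror steps 3 and 4, with the caveat stated above that the full H1 computation is not reproduced in this sketch.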
The algorithm described above was applied by considering two different options that we termed 1HETSITE and
BIMODAL. Option 1HETSITE refers to a number of discordant
sites $R^* = 1$, while $R^* = R/2$ for option BIMODAL. R was arbitrarily set to 10, 20 and 30 sites, n was set to 10, 25 and 50
years, and $\rho = 0.0, 0.2, \ldots, 0.8$.
A Gumbel distribution with unit mean and Cv = 0.4,
EV1(1, 0.4), was used as parent distribution of regular annual floods, $F_X$. The same distribution was used for generating cross-correlated annual floods in homogeneous regions
by Hosking and Wallis (1988; see Section 4, p. 591). The discordant sequences were generated from an EV1(1, Cv*) regional parent with Cv* = 0.1, 0.2, . . . , 0.7.
The algorithm was applied for option 1HETSITE a second
time by referring to a different set of regional parents. In
particular, a Generalized Extreme Value (GEV) distribution
with skewness coefficient $\gamma = 2.25$, Cv = 0.4 and unit mean,
GEV(1, 0.4, 2.25), was considered as parent distribution for
regular sites. A GEV(1, Cv*, 2.25) with Cv* = 0.2, 0.4, 0.6
was used for discordant sites. This second group of 1HETSITE simulations considered R = 10, 30 sites and n = 10, 50
years.
It should be noted that the transformation from the multivariate normal deviates to a set of realizations from the
correct marginal distribution does not exactly preserve the cross-correlation structure. We performed a series of test runs
generating a number of cross-correlated pairs of sequences
of length $10^6$, under various hypotheses for the marginal distributions. We found that the differences between the theoretical cross-correlation coefficient for the multivariate
normal deviates and the empirical cross-correlation coefficient after the transformation were always negligible for
practical purposes (maximum absolute difference 0.012).
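A check of this kind can be sketched in a few lines of Python. The sketch below is illustrative only and is not the procedure used for the test runs: the names `gumbel_quantile`, `pearson` and `correlation_shift` are hypothetical, both marginals are assumed to be Gumbel with unit mean and Cv = 0.4, and the default sequence length is much shorter than $10^6$ to keep the pure-Python run time low, so the sampling noise of the result is correspondingly larger.

```python
import math
import random
from statistics import NormalDist

EULER_GAMMA = 0.5772156649015329
PHI = NormalDist().cdf  # cdf of the standard normal distribution


def gumbel_quantile(p, mean=1.0, cv=0.4):
    """Inverse cdf of a Gumbel (EV1) distribution with given mean and CV."""
    scale = cv * mean * math.sqrt(6.0) / math.pi
    return mean - EULER_GAMMA * scale - scale * math.log(-math.log(p))


def pearson(x, y):
    """Pearson product-moment correlation coefficient of two sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_shift(rho, length=100_000, seed=0):
    """Empirical correlation after the transformation minus the target rho."""
    rng = random.Random(seed)
    x, y = [], []
    for _ in range(length):
        # Bivariate normal pair with correlation rho.
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + math.sqrt(1.0 - rho * rho) * rng.gauss(0.0, 1.0)
        # Keep probabilities strictly inside (0, 1) before taking logs.
        p1 = min(max(PHI(z1), 1e-12), 1.0 - 1e-12)
        p2 = min(max(PHI(z2), 1e-12), 1.0 - 1e-12)
        x.append(gumbel_quantile(p1))
        y.append(gumbel_quantile(p2))
    return pearson(x, y) - rho


if __name__ == "__main__":
    for rho in (0.2, 0.4, 0.6, 0.8):
        print(rho, correlation_shift(rho))
```

The printed differences combine the systematic effect of the transformation with sampling error, so longer sequences are needed before they can be compared with the maximum difference reported above.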
References
Burn, D.H., 1990. Evaluation of regional flood frequency analysis
with a region of influence approach. Water Resources Research
26 (10), 2257–2265.
Castellarin, A., 2007. Probabilistic envelope curves for design flood
estimation at ungauged sites. Water Resources Research 43,
W04406. doi:10.1029/2005WR004384.
Castellarin, A., Camorani, G., Brath, A., 2007. Predicting annual
and long-term flow-duration curves in ungauged basins.
Advances in Water Resources 30 (4), 937–953.
doi:10.1016/j.advwatres.2006.08.006.
Castellarin, A., Vogel, R.M., Matalas, N.C., 2005. Probabilistic
behavior of a regional envelope curve. Water Resources
Research 41, W06018. doi:10.1029/2004WR003042.
Chow, V.T. (Editor in Chief), 1964. Handbook of Applied Hydrology,
Section 8-1. McGraw Hill, New York.
Dalrymple, T., 1949. In: Regional Flood Frequency: Presentation at
the 29th Annual Meeting of the Highway Research Board,
Washington, DC, 22 p., December 13.
Dalrymple, T., 1960. Flood frequency analyses, US Geology Survey
on Water Supply Paper 1543-A, Reston, VA.
Douglas, E.M., Vogel, R.M., Kroll, C.N., 2000. Trends in floods and
low flows in the United States: impact of spatial correlation.
Journal of Hydrology 240 (1-2), 90–105.
Durbin, J., Knott, M., 1971. Components of Cramér – von Mises
Statistics, London School of Economy and Political Science, UK.
Elamir, E.A.H., Seheult, A.H., 2004. Exact variance structure of
sample L-moments. Journal of Statistical Planning and Inference
124, 337–359.
Fill, H.D., Stedinger, J.R., 1995. Homogeneity tests based upon
Gumbel distribution and a critical appraisal of Dalrymple’s test.
Journal of Hydrology 166, 81–105.
Greenwood, J.A., Landwehr, J.M., Matalas, N.C., Wallis, J.R.,
1979. Probability weighted moments: definition and relation to
parameters of several distributions expressible in inverse form.
Water Resources Research 15, 1049–1054.
Hosking, J.R.M., 1990. L-moments: analysis and estimation of
distributions using linear combination of order statistics. Journal
of Royal Statistical Society, Series B 52 (1), 105–124.
Hosking, J.R.M., Wallis, J.R., 1988. The effect of intersite dependence on regional flood frequency-analysis. Water Resources
Research 24 (4), 588–600.
Hosking, J.R.M., Wallis, J.R., 1993. Some useful statistics in regional
frequency analysis. Water Resources Research 29 (2), 271–281.
Hosking, J.R.M., Wallis, J.R., 1997. Regional frequency analysis –
an approach based on L-moments. Cambridge University Press,
New York, p. 224.
Jenkinson, A.F., 1955. The frequency distribution of the annual
maximum (or minimum) of meteorological elements. Quarterly
Journal Royal Meteorological Society 81, 158–171.
Lettenmaier, D.P., Wallis, J.R., Wood, E.F., 1987. Effect of regional
heterogeneity on flood frequency estimation. Water Resources
Research 23 (2), 313–323.
Lu, L.H., 1991. Statistical Methods for Regional Flood Frequency
Investigations. Ph.D. Dissertation, Cornell University, Ithaca, NY.
Lu, L., Stedinger, J.R., 1992. Sampling variance of normalized GEV/
PWM quantile estimators and a regional homogeneity test.
Journal of Hydrology 138, 223–245.
Madsen, H., Rosbjerg, D., 1997. Generalized least squares and
empirical Bayes estimation in regional partial duration series
index-flood modeling. Water Resources Research 33 (4), 771–
781.
Madsen, H., Mikkelsen, P.S., Rosbjerg, D., Harremoes, P., 2002.
Regional estimation of rainfall intensity-duration-frequency
curves using generalized least squares regression of partial
duration series statistics. Water Resources Research 38 (11),
1239. doi:10.1029/2001WR001125.
Matalas, N.C., Langbein, W.B., 1962. Information content of the
mean. Journal of Geophysical Research 67 (9), 3441–3448.
Nash, J.E., Sutcliffe, J.E., 1970. River flow forecasting through
conceptual models, Part 1-A discussion of principles. Journal of
Hydrology 10 (3), 282–290.
Rosbjerg, D., 2007. Regional flood frequency analysis. In: Vasiliev,
O.F. et al. (Eds.), Extreme Hydrological Events: New Concepts
for Security, pp. 151–171.
Sankarasubramanian, A., Srinivasan, K., 1999. Investigation and
comparison of sampling properties of L-moments and conventional moments. Journal of Hydrology 218, 13–34.
Scholz, F.W., Stephens, M.A., 1987. K-sample Anderson–Darling
tests. Journal of American Statistical Association 82, 918–924.
Singh, V.P., 1992. Elementary Hydrology. Prentice-Hall, Englewood
Cliffs, NJ, pp. 824–829.
Stedinger, J.R., 1983. Estimating a regional flood frequency
distribution. Water Resources Research 19, 503–510.
Stedinger, J.R., Lu, L., 1995. Appraisal of regional and index flood
quantile estimators. Stochastic Hydrology and Hydraulics 9 (1),
49–75.
Troutman, B.M., Karlinger, M.R., 2003. Regional flood probabilities.
Water Resources Research 39 (4), 1095.
doi:10.1029/2001WR001140.
Viglione, A., Laio, F., Claps, P., 2007. A comparison of homogeneity
tests for regional frequency analysis. Water Resources Research
43, W03428. doi:10.1029/2006WR005095.
Vogel, R.M., Zafirakou-Koulouris, A., Matalas, N.C., 2001. Frequency of record breaking floods in the United States. Water
Resources Research 37 (6), 1723–1731.
Vogel, R.M., Matalas, N.C., England, J.F., Castellarin, A., 2007. An
assessment of exceedance probabilities of envelope curves.
Water Resources Research 43, W07403.
doi:10.1029/2006WR005586.
Wiltshire, S.E., 1986a. Regional flood frequency analysis I: homogeneity statistics. Hydrological Science Journal 31, 321–333.
Wiltshire, S.E., 1986b. Regional flood frequency analysis II: multivariate classification of drainage basins in Britain. Hydrological
Science Journal 31, 335–346.
Wiltshire, S.E., 1986c. Identification of homogeneous regions for
flood frequency analysis. Journal of Hydrology 84, 287–302.
Yule, G.U., 1945. A method of studying time series based on their
internal correlations. Journal of Royal Statistical Society 108,
208.