The formation of groups for regional flood

Hydrological Sciences-Journal~des Sciences Hydrologiques, 45( Î ) February 2000
The formation of groups for regional flood
frequency analysis
DONALD H. BURN
Department of Civil Engineering, University of Waterloo, Waterloo, Ontario N2L 3G I, Canada
e-mail: [email protected]
N. K. GOEL
Department of Hydrology, University ofRoorkee, Roorkee 247 667, India
Abstract A new technique is developed for identifying groups for regional flood
frequency analysis. The technique uses a clustering algorithm as a starting point for
partitioning the collection of catchments. The groups formed using the clustering
algorithm are subsequently revised to improve the regional characteristics based on
three requirements that are defined for effective groups. The result is overlapping
groups that can be used to estimate extreme flow quantiles for gauged or ungauged
catchments. The technique is applied to a collection of catchments from India and the
results indicate that regions with the desired characteristics can be identified using the
technique. The use of the groups for estimating extreme flow quantiles is
demonstrated for three example sites.
La formation de groupes pour l'estimation régionale de la fréquence
des crues
Résumé Nous avons développé une nouvelle technique afin de déterminer des
groupes de bassins pour estimer la fréquence des crues à l'échelle régionale. Le point
de départ de cette technique est un algorithme d'agrégation permettant de réaliser une
partition de l'ensemble des bassins considérés. Afin d'améliorer la régionalisation les
groupes ainsi formés sont ensuite modifiés en s'appuyant sur trois exigences définies
de telle sorte que les groupes soient utilisables avec efficacité. Le résultat consiste en
groupes pouvant se recouvrir qui peuvent être utilisés pour estimer les quantiles de
crues extrêmes de stations jaugées ou non. Cette technique a été appliquée à un
ensemble de bassins versants de l'Inde et les résultats montrent qu'elle permet
d'estimer les caractéristiques désirées. Trois sites de démonstration ont fait l'objet de
l'utilisation des groupes pour l'estimation des quantiles des crues extrêmes.
INTRODUCTION
Regional flood frequency analysis can be used to improve the estimation of extreme
flow quantiles at sites that have data record lengths that are short relative to the return
period of interest. Improved quantile estimates are obtained by using an increased
spatial characterization of the flow regime to compensate for an inadequate temporal
representation of the extreme flows at a given location. Either gauged or ungauged
locations can be analysed using regional flood frequency analysis. For gauged
locations, the aim is to augment the available, but generally limited, at-site information
for estimating extreme flow quantiles. Regional information becomes increasingly
important as the return period of interest increases. For an ungauged location,
information from hydrologically similar gauged catchments is used to characterize the
extreme flow regime for the ungauged catchment.
Open for discussion until 1 August 2000
97
98
Donald H. Burn & N. K. Goel
Regional flood frequency analysis involves the formation of a region that is
required to effect the spatial transfer of information. A region, in this context, has
come to mean a collection of catchments, not necessarily geographically contiguous,
that can be considered to be similar in terms of hydrological response. The goal of the
regionalization process is the identification of groupings of catchments that are sufficiently similar to warrant the combination of extreme flow information from sites
within the region. The regions defined should thus be homogeneous with respect to
extreme flow characteristics.
The present work addresses the identification of regions for regional flood
frequency analysis with a particular focus on the formation of regions that can be used
for estimating extreme flow quantiles for ungauged catchments. The approach uses, as
a starting point, a clustering technique that employs physiographic catchment characteristics to define similarity between catchments. The initially formed clusters
(regions) are subsequently modified, following a heuristic process, to improve upon
the regional characteristics.
An overview is provided of regionalization approaches that have been applied in
previous work, followed by a presentation of the region-forming process developed
herein. Application of the technique to a collection of catchments from India is
presented. The application describes the characteristics of both the geographical
regions currently in use and also the regions that result from applying the technique
developed in the research. The final section summarizes the important research conclusions and identifies avenues for further investigation.
REGIONALIZATION APPROACHES
Regional requirements
Regional flood frequency analysis typically begins with the definition of regions.
Regions are defined as subsets of the entire collection of sites for which extreme flow
information is available or required. A region can be considered to comprise a group of
sites from which extreme flow information can be combined for improving the
estimation of extreme flows at any site in the region. Hosking & Wallis (1997)
describe the benefits of regionalization for estimating extreme flow quantiles.
There are several characteristics that regions should possess to ensure effective
information transfer and therefore efficient estimation of extreme flow quantiles. The
first requirement is that the collection of catchments be hydrologically homogeneous.
This requirement arises from the need to ensure that the extreme flow information that
is transferred to a target site is similar to the extreme flow information at that site. The
importance of regional homogeneity for flood frequency analysis has been demonstrated (e.g. Lettenmaier et al., 1987; Stedinger & Lu, 1995). The second requirement
is that the region be identifiable, which implies that a regional home can be readily
determined for a new catchment which may be ungauged. The third requirement is that
the regions be sufficiently large. Larger regions imply that more extreme flow
information is incorporated into the estimation of extreme flow quantiles thus improving the estimates, provided that the extreme flow information is sufficiently
similar to that at the target site. It has been suggested (Jakob et al, 1999) that a region
The formation of groups for regionalfloodfrequency analysis
99
should ideally contain 5 T station-years of data in order to provide an effective estimate
for the flood event with a return period of T years. As the size of a region is increased,
there is a tendency for the homogeneity of the collection of catchments forming the
region to decrease. There is thus a trade-off between the first and third required
characteristics for a region, which requires the selection of an appropriate balancing
point (Reed et al, 1999).
Previous work
The delineation of regions was traditionally based on geographical, political,
administrative, or physiographic boundaries. However, regions formed in this manner
will not necessarily be homogeneous in terms of hydrological response, given the
potentially large amount of spatial variability in the physiographic or hydrological
characteristics of the catchments in this type of region.
Residuals from a regression model have been used to create geographically
contiguous regions based on grouping catchments with residuals of a similar sign and
magnitude (Wandle, 1977). Cluster analysis has also been used as a method of creating
regions (Tasker, 1982). Cluster analysis attempts to identify clusters (groups) of
catchments with the characteristic that the catchments within a cluster are similar,
while there is dissimilarity between the catchments in different clusters. Since different
types of variables can be used to define the similarity measure required to generate
clusters, the variables used should be carefully selected and weighted according to
their importance for the actual problem (Nathan & McMahon, 1990).
A novel regionalization approach avoids the use of fixed regions. This approach
allows each catchment to have a potentially unique set of catchments that constitutes
the region for the target site. This methodology involves the transfer of extreme flow
information from similar sites to the catchment of interest and was first suggested by
Acreman & Wiltshire (1987). The method was refined and referred to as the region of
influence (ROl) approach by Burn (1990). This approach requires the choice of a
threshold value that functions as a cut-off point for a dissimilarity measure. All sites
which have a dissimilarity measure with the target site that is greater than the threshold
value are excluded from the region of influence for that particular catchment.
The regionalization approaches that are most commonly used today require the
selection of variables that are used to define the similarity (or dissimilarity) for the
catchments. The two most common general types of variables used in similarity
measures are physiographic catchment characteristics (such as the drainage area, catchment slope, etc.) and flood statistics (such as the L-moment ratios, or other statistical
measures, calculated from the available flood series). Data for a sufficient number of
physiographic catchment characteristics are not always available, particularly for
applications in remote areas or in developing countries. As a result, the groupings
formed using the available physiographic variables may not be hydrologically
homogeneous.
The resulting regions are often hydrologically homogeneous when flood statistics
are used as the variables in the similarity measure. However with this approach, the
flood statistics are often used both to form the regions and to subsequently evaluate the
homogeneity of the collection of catchments in the region. This can result in regions
100
Donald H. Barn & N. K. Goel
that are homogeneous but not necessarily effective for regional flood frequency
analysis. This situation can occur if the homogeneity arises more by chance than as a
result of similarities in the hydrological regime for the catchments in the region. Burn
(1997) used seasonality statistics to avoid this dual use of statistics derived from flood
magnitudes. Seasonality statistics use data on the timing of flood events to define
catchment similarity, while reserving statistics derived from flood magnitudes for use
in evaluating the regional homogeneity. Seasonality statistics cannot be directly calculated for ungauged catchments.
REGION-FORMING PROCESS
Clustering technique
This work employs a clustering technique as the starting point for the formation of
regions that are homogeneous, identifiable, and sufficiently large. Clusters (groups) of
catchments are initially formed using the clustering algorithm outlined below and are
then modified, if necessary, to improve the homogeneity of the groups as well as the
size of the groups. The adjustments to the groups that are initially formed are accomplished using a heuristic process that is similar to that described by Burn et al. (1997).
The clustering technique used is the AT-means algorithm that can be used to
partition M objects (catchments) into K groups (regions) based on the values of /
features, or attributes, of the objects. The algorithm starts with an initial centroid, or
seed point, for each of the K clusters. Each of the objects is then assigned to the
"nearest" cluster centroid in terms of a similarity measure. Once all of the objects have
been assigned to a cluster, the centroid for each cluster is recalculated and the objects
may be reassigned to different clusters depending on the distance from the object to the
new centroid location. This process is repeated until no object experiences a change in
cluster membership. The dissimilarity measure that is used in this work to determine
the closeness of each catchment to each cluster centroid is defined as:
d„
D
(1)
l=—nr-
where D~ is the dissimilarity between catchment i and cluster j , Dy is the weighted
Euclidean distance (in attribute space) between catchment i and clustery, dy is the
geographical distance between catchment i and the geographic centroid of cluster j ,
and db and w are parameters to be estimated. Since equation (1) defines a dissimilarity
measure, lower values for Dy indicate catchments that are closer to the corresponding
cluster centroid.
The weighted Euclidean distance is defined as:
V.
D
,=
IJ
2
(2)
where Wk is the weight applied to attribute k, xik is the standardized value for attribute k
The formation ofgroups for regionalfloodfrequency analysis
101
for catchment i where the attribute values are standardized by dividing by the standard
deviation for the attribute, Xjk is the centroid value for attribute k for clustery, and m is
the number of attributes used to define similarity. The attributes used in equation (2)
are physiographic characteristics of the catchments. The weights in equation (2) are
selected to satisfy:
Zw*=1
(3)
The dissimilarity measure defined in equation (1) was originally proposed by Webster
& Burrough (1972) in an attempt to enhance the geographical continuity of clusters.
The two parameters, <4 and w, control the relative influence in the composite
dissimilarity measure of geographical distance vs the weighted Euclidean distance in
attribute space. Within the Euclidean distance measure, the Wk values reflect the
relative importance of each attribute in determining catchment dissimilarity. Initial
estimates for the weights and the two parameters in the distance function can be
selected using judgement and then be refined using either an optimization algorithm or
an ad hoc procedure. In this work, the parameters and weights were adjusted using a
search technique to enhance the homogeneity of the regions that are formed.
Regional homogeneity
The clusters formed are evaluated in terms of the homogeneity of the catchments
comprising the region. Regional homogeneity can be evaluated using one of a number
of tests that have been developed for this purpose. A commonly used approach is a
homogeneity test developed by Hosking & Wallis (1993). This test compares the
variability of L-moment ratios for the catchments in a region with the expected
variability, obtained from simulation, for a collection of catchments with the same
record lengths as those in the region. A statistic based on the weighted variance of the
L-coefficient of variation (L-CV) is derived from (Hosking & Wallis, 1993):
N
v =
±i
_
(4)
where V is the weighted variance of the L-CV for the region, m is the record length for
site i, t2(i) is the L-CV at site i, t2 is the group mean L-CV, and N is the number of sites
in the region. Calculation of the homogeneity measure requires an estimate of the
mean and standard deviation of V. This can be accomplished using simulation resulting
in estimates defined as \xv and ay, respectively. The homogeneity measure is then
defined as:
H =
YJZ^JL
(5)
where H is the homogeneity statistic. A region can be considered homogeneous if
H<\, possibly heterogeneous if 1 < H < 2, and definitely heterogeneous if H > 2
102
Donald H. Bum & N. K. Goel
(Hosking & Wallis, 1993). In this work, the goal is to define regions that result in a
homogeneity statistic of H< 1, although H values between one and two will be
considered acceptable.
Revisions to regions
The clustering procedure described above can be used to define initial groups of
catchments (regions). However, it is often found that the resulting groups do not meet
the three requirements for an effective region. Since the regions are formed based on
physiographic catchment characteristics, there is no concern with respect to identifying
a regional home for an ungauged catchment (i.e. the identifiable requirement).
However, there is no guarantee that the regions will be homogeneous and of a
sufficient size. Deficiencies in homogeneity and group size can be addressed through a
process of revising the regions that are initially formed by shifting catchments in a
process of homogeneity enhancement. This is outlined below.
Regional revision is a heuristic process in the sense that there is no set procedure
for how to move from one stage of the process to the next. Rather, the results from
each revision to the regions will often lead to insights as to what further changes would
prove to be most beneficial. The goals of the regional revision process are to increase
the homogeneity of the regions, as measured by the homogeneity test described in
equations (4) and (5), and to ensure each region is of a sufficient size, in accordance
with the 5T guideline. Each revision to the regions involves one of the following
options: (a) moving a catchment to a new region; (b) removing a catchment from its
current region and not assigning it to a new region; (c) replicating a catchment from
one region in one or more other regions (i.e. allowing a catchment to simultaneously
belong to more than one region); (d) merging two regions; and (e) splitting a region
into two (or more) new regions. Options (a), (b) and (e) are designed to improve the
homogeneity of a region that does not currently meet the homogeneity criterion, while
options (c) and (d) are designed to increase the size of regions that are considered to be
too small by the 5 7 guideline.
There are two fundamental steps in the process of determining if a catchment
should change regions (or be copied to an existing region). The first is to identify a
catchment that is a candidate for moving and the second is to evaluate the move. There
are two criteria used to determine if a catchment should be considered for moving from
its current region to a new region (or be copied to another region). The first is a discordancy measure that is calculated as a part of the homogeneity test described above.
The discordancy measure is useful to identify unusual flood series in terms of the
L-moment ratios. The discordancy measure is defined as (Hosking & Wallis, 1993):
D. = I jV( u . - u) T A"1 (u,. - u)
(6)
where A is the discordancy measure for catchment ;', N is the number of catchments in
the region, u,- is a vector containing the L-CV, L-skewness, and L-kurtosis for
catchment /', ïïis the regional average for u,-, and A is defined as:
A = f>,-u)(uy-ii)T
(7)
The formation ofgroups for regional flood frequency analysis
103
A large value for the discordancy measure indicates a catchment that could leave a
region in order to enhance the regional homogeneity. The second criterion is used to
identify potential new homes for a catchment. This criterion is the closeness of a
catchment to the centroid of each of the other clusters (regions), as defined through
equation (1). The acceptability of a proposed move is evaluated using the homogeneity
test. If moving a catchment from a region improves the homogeneity of the region,
then the removal of the catchment is accepted. If the homogeneity of the new region is
not adversely affected by the move, then the proposed new home is accepted. If a new
home for a catchment cannot be found, the catchment either remains in its current
region, or is removed from its current region and placed in a group of unallocated
catchments. Catchments in the group of unallocated catchments are periodically
considered for reintroduction into one of the existing regions. Following a successful
move (or the copying of a catchment to a new region), the regional centroids for all
affected regions are updated and a new revision is considered.
Region splitting and merging follow a similar process to that for catchment moves.
If a region is large, but not homogeneous, the splitting of the region is considered. The
region is divided, using the dissimilarity measure in equation (1) as a guide, to form
two or more new regions. If a region is very small, but homogeneous, then it may be
possible to consider merging the region with another region that is close in attribute
space (i.e. as measured by equation (1)). Proposed changes of this nature are again
evaluated using the homogeneity test. The process of region revision continues until no
further improvements to the regions can be identified.
Use of the regions
The process of copying catchments from one region to another means that any
catchment can potentially belong to more than one region. The region revision process
will thus result in a collection of potentially overlapping regions. If the regional
revision has been successful, the regions will be both homogeneous and of a sufficient
size to facilitate effective estimates for extreme flow quantiles. The regions can be
used to obtain quantile estimates either for gauged catchments that are members of one
or more of the regions or for ungauged catchments that must be assigned to an existing
region. For gauged catchments, the extreme flow information from all catchments in
the region is used to obtain a pooled estimate for the T-year flood event. This can be
done using the index flood procedure wherein a dimensionless regional growth curve
is scaled by an index flood value for the catchment to obtain estimates for extreme
flow quantiles for the target site. The regional growth curve is derived from extreme
flow information for the catchments in the region and the index flood value is calculated from the flood series for the target site. Traditionally, the mean of the annual
flood series has been used as the index flood. For catchments that belong to more than
one region, there are two options. An estimate of extreme flow quantiles for the
catchment could be obtained using extreme flow information from the region with
which the catchment has the greatest affinity, or an estimate could be obtained using a
weighted combination of the estimates from each of the regions to which the
catchment belongs. The latter approach would be preferred if the catchment was on the
border between two or more regions and was therefore not strongly associated with
any one region.
Donald H. Burn & N. K. Goel
104
For ungauged catchments, there are two changes to the procedure for gauged
catchments. First, as noted above, the catchment must be assigned to an existing
region. This can be done using the dissimilarity measure in equation (1) to determine
the distance from the catchment to each regional centroid. The catchment is then
assigned to the region to which it is closest. For ungauged catchments that are similar
to more than one region, it is possible to estimate extreme flow quantiles based on a
combination of information from more than one region. The second necessary revision
for ungauged catchments is the requirement to estimate, as opposed to calculate, the
index flood value for the ungauged catchment. This can be done from regression using
a relationship derived from physiographic catchment characteristics for the catchments
in the region to which the ungauged catchment is assigned.
APPLICATION
Description of study area
Economic constraints do not allow detailed hydrometric investigations at every new
site for the estimation of extreme flow quantiles for a large country such as India. The
regional flood frequency analysis approach and regional unit hydrograph based
approaches are in vogue in India for design flood estimation for ungauged catchments.
For this purpose, the country has been divided into seven zones and 26 sub-zones. For
preparing the flood estimation reports for these sub-zones, systematic and sustained
collection of hydrometric data at various representative catchments has been conducted
as a joint project by the Research, Design and Standards Organization of the Ministry
of Railways, Ministry of Transport, Central Water Commission and India
Meteorological Department. The available hydrometric data consist of an annual
maximum flood series for select locations within each sub-zone.
The focus in the present work is on Zone 3, which covers the river basins of
central India, as shown in Fig. 1. This zone has been further divided into the nine subzones described in Table 1. Parida et al. (1998) previously evaluated the homogeneity
of one sub-zone from Zone 3 while Kumar et al. (1999) developed flood formulae for
the sub-zones in Zone 3. Sub-zones 3(g) and 3(i) have not been considered in the
present study because of the lack of hydrometric data for these sub-zones.
Furthermore, some of the hydrometric gauging locations have been excluded from
Table 1 Summary of data available for the case study area.
Sub-zone
River basin
3(a)
3(b)
3(c)
3(d)
3(e)
3(f)
3(g)
3(h)
Mahi and Sabarmati
Lower Narmada and Tapi
Upper Narmada and Tapi
Mahanadi
Upper Godavari
Lower Godavari
Indravati
Krishna and Penner
Kaveri
3(i)
Number of sites
9
10
8
13
11
16
0
10
0
Catchment area
(km2)
30.1-1090
53.1-828
41.5-2110
58.0-1150
31.3-2230
50.0-824
Record length
(years)
14-25
12-26
19-30
13-30
14-32
14-29
64.8-1690
14-32
The formation ofgroups for regionalfloodfrequency analysis
.
>
(
i
070°E
-35°N
,
*
,
I
,
.
i
075°E
,
!
,
0J0°E
.
i
.
,
0S5°E
,
.
,
.
105
I ,
090°E
-30°N
Fig. 1 Location map for the case study area.
analysis either because of very short data records or because screening of the extreme
flow data revealed data that were suspect. The summary information presented in
Table 1 is for the 77 sites that are analysed in this work.
Geographic region alization
Table 2 presents the results of analysing the geographically defined sub-zones within
Zone 3 for homogeneity and group size. An evaluation of the acceptability of the
groups requires the selection of a target return period. If the 50-year event is selected,
then the 5T guideline implies that 250 station-years of record are required for an
effective quantile estimate. Table 2 reveals that only Sub-zones 3(d) and 3(f) meet this
criterion. Sub-zone 3(d) is heterogeneous while Sub-zone 3(f) is considered to be
possibly heterogeneous. Of the remaining sub-zones, two are too small by the 5T
guideline and three are clearly heterogeneous with H values much in excess of two.
Table 2 Characteristics of the geographic sub-zones.
Sub-zone number
3(a)
3(b)
3(c)
3(d)
3(e)
3(f)
3(h)
Number of sites
9
10
8
13
11
16
10
Station-years of record
174
195
180
279
229
363
234
Homogeneity statistic, H
L29
2.94
2.20
4.21
3.88
1.77
0.24
Donald H. Burn & N. K. Goel
106
Thus only one of the seven geographically defined sub-zones can be considered to be
acceptable for estimating extreme flow quantiles with a return period of the order of 50
years.
Clustering approach
Three physiographic catchment characteristics were used as attributes in the clustering
algorithm. The attributes used were the catchment area, the length of the main stream
of the river, and the slope of the main stream of the river. In addition to these three
attributes, the geographical position of each catchment is used in equation (1) to define
geographical separation. The weights that were used in equation (1) were determined
through a search technique. The search technique involved selecting initial estimates
for the parameters and then revising the values so as to improve upon the homogeneity
and size characteristics of the regions. The resulting values were 0.633, 0.316, and
0.051 for catchment area, main stream length, and main stream slope, respectively. The
values for db and w were 1270 and 19, respectively. The clustering algorithm was
applied using the weights noted above and the catchments were divided into seven
clusters. Seven clusters were initially selected to be consistent with the results from the
definition of geographic sub-zones. This number of clusters was maintained when
investigation of other numbers of clusters indicated no improvement in the regional
characteristics. The results are displayed in Table 3.
Table 3 Characteristics for the groups from the clustering algorithm.
Group number
1
2
3
4
5
6
7
Number of sites
13
12
5
11
7
17
12
Station-years of record
287
257
110
205
162
387
246
Homogeneity statistic, H
4~55
4.06
-1.16
1.53
-0.47
1.14
3.44
If the 50-year event is again selected as the target return period, then only one of
the regions (Group 6) can be considered acceptable in that it has a homogeneity
statistic close to one and contains more than 5 T station-years of data. A second region
(Group 4) can be considered marginally acceptable with slightly less than the required
number of station-years of record and a homogeneity statistic that is at the mid point of
the transition range from definitely heterogeneous to homogeneous. Two of the
remaining groups are too small (Groups 3 and 5) while three groups (Groups 1, 2
and 7) are heterogeneous. As a result, the regional revision process was implemented.
As noted above, the region revision process is heuristic in nature implying that
judgement is required throughout the process to aid in the determination of the next
step in the analysis. The regional revision process started with Group 7, the heterogeneous group with the lowest H value. This was selected as the starting point since it
was anticipated that minor revisions to this group would result in a homogeneous
collection of catchments. Two catchment removals from this group resulted in a homo-
The formation of groups for regional flood frequency analysis
107
Table 4 Characteristics for the final groups.
Group number
1
2
4
5
6
7
Number of sites
17
21
19
13
23
25
Station-years of record
379
445
384
292
507
527
Homogeneity statistic, H
L48
1.34
0.88
1.19
1.44
0.28
geneity value of less than two, which was considered to be acceptable. Several
catchment moves from Group 2 and from Group 1 also resulted in an acceptable
homogeneity for these two groups. At this stage, all groups were considered to be
acceptably homogeneous (H < 2) although several groups were small in size. The next
step was to improve upon the size of the groups through copying catchments from one
group to one or more of the other groups. The size of Group 2 was increased by
copying catchments from Groups 6 and 1 to Group 2. Catchments from Group 3,
which was a very small group, were dispersed to Groups 5 and 6 (one catchment each)
and Group 7 (the remaining three catchments). This left Groups 4 and 5 as small
groups. These two groups were increased in size by copying catchments from other
groups. The final step was to search for catchments that could be copied into any of the
groups in order to increase further the group size. Several catchments were identified
that could be copied to other groups giving the results displayed in Table 4. Note that
there are only six groups in Table 4 since the catchments from Group 3 were dispersed
to other groups. The group numbers from Table 3 are maintained for continuity
purposes. In addition to the catchments in the six groups, there are two catchments that
have not been allocated to any of the groups. These two catchments will be discussed
further below.
The final groups in Table 4 can be considered to be homogeneous, for practical
purposes, with all l v a l u e s less than 1.5. The number of station-years of record for the
groups varies from a low of 292 for Group 5 to a high of 527 for Group 7. In
accordance with the 5T guideline, these values imply that an estimate for the 60-year
flood could be obtained for any catchment in Group 5 while an estimate of the
100-year flood could be obtained for any catchment in Group 7. The number of
station-years of record for the collection of groups in Table 4 is 2534 vs 1654 stationyears in the non-overlapping groups defined in Table 3. Using overlapping groups
clearly represents a more effective use of the available extreme flow information.
An evaluation of the membership of each group reveals that the catchments belong
to 1.55 groups, on average, with 48 catchments belonging to only one group,
17 catchments belonging to two groups, ten catchments belonging to three groups and
two catchments each being a member of four groups. The two catchments belonging to
four groups belong to the same four groups while there are different combinations of
group memberships for the catchments that belong to three groups and for those that
belong to two groups. All groups have at least some catchments that are shared with
other groups, although the fraction of shared catchments varies with the group. Group 5,
the smallest group, has the least number (and smallest fraction) of shared catchments.
Figure 2 shows the location of the catchments in each of the groups along with the
two unallocated catchments. This figure shows that Group 1 is primarily in the central
108
Donald H. Burn & N. K. Goel
i
n
07S°E
,
,
,
.
|
,
,
,
,
080°E
|
085°E
O
o
*
<n>c m
8
/Legend
° Grcxpl
o Grcnp2
o Gr<xp4
A Grcxp5
v Grap6
> Grcxp7
* Unallocated
Fig. 2 Location of the catchments in the final groups.
part of the study area with roughly half of the catchments being shared catchments. This
group shares catchments with each of the other groups. Group 2 catchments are located
primarily in the northeastern part of the study area with a majority of the catchments
being shared catchments. Group 2 shares catchments with all groups except Group 5.
Group 4 catchments are located in the northwestern part of the study area with less than
half of the catchments being shared catchments. Group 4 shares catchments with Groups
1, 2, and 7. Group 5 catchments are located in the southern portion of the study area with
about 25% of the catchments shared with other groups. Group 5 catchments are shared
with Groups 1 and 7. Group 6 catchments are located in the north central part of the
study area with roughly half of the catchments shared with other groups. This group
shares catchments with Groups 1, 2, and 7. The catchments in Group 7 are centrally
located in the study area with the majority of the catchments being shared catchments.
Group 7 catchments are shared with each of the other groups.
The two unallocated catchments are also shown in Fig. 2. Interestingly, one of
these catchments (the one located closer to the western boundary of the study area) is
geographically close to several catchments that are members of multiple groups.
Further examination of the flood series for each of the unallocated catchments revealed
no unusual results. However, the catchments were sufficiently distinct in L-moment
space to prevent their successful inclusion in one of the defined groups.
Use of groups
This section demonstrates the use of the defined groups for estimating extreme flow
quantiles. Three examples are presented representing different situations in terms of
The formation of groups for regional flood frequency analysis
109
21 @ 3 (b)
3200
Distributions
1 . P3 - LMOM
2880-
User distributions
2. Group 4
2560tO 22400
E 1920O
1600
1280
960H
640
320H
Return period (years)
25
0
-1
0
~i
1
1
2
i
3
50
100
i
4
200
500
r
5
Gumbel reduced variate, y
Fig. 3 Extreme flow relationship for an "ungauged" catchment.
the information available and the characteristics of the target site. In the first example,
the estimation of extreme flow quantiles is simulated for the case of an ungauged site.
This is done by selecting a site that was not included in the original analysis and
estimating extreme flow quantiles following the procedures previously outlined. The
target site is a part of geographic Sub-zone 3(b) and is in fact gauged with a gauging
record of 17 years. A gauging record of this length is of limited use for estimating
extreme flow quantiles through at-site analysis, but can be expected to provide a
realistic estimate for the index flood value. If the catchment had actually been an
ungauged catchment, the index flood value could have been estimated using a
regression relationship in conjunction with the physiographic catchment characteristics. The catchment is assigned to Group 4 based on the proximity defined through
equation (1). The results of estimating the extreme flow relationship for this site are
displayed in Fig. 3. This figure shows the extreme flow relationship based on both the
at-site data and on the growth curve for Group 4. In both cases, the distribution
selected is the Pearson Type III (P3) distribution. This distribution was selected since it
is the distribution that provides the best fit to the data for Group 4 based on the
regional goodness of fit measure defined in Hosking & Wallis (1993). The curves in
Fig. 3 reveal that the regional fit is reasonable for the return periods of primary
interest. This curve differs substantially from the at-site curve, particularly for the
return periods generally used as the basis for design. For example, the estimate for the
100-year flood event is 1740 m3 s"1 using at-site data, and 2420 m 3 s"1 based on the
regional growth curve.
110
Donald H. Burn&N. K. Goel
66 @ 3 (e)
1350-
Distribution s
1.
1215-
P3 - LMOM
User
1080«
distributions
2 . Group 4
945-
CD
Ë
810-
o
675
540-I
405
270135-
Return period (years)
50
100 200
0
Gumbel reduced variate, y
Fig. 4 Extreme flow relationship for a catchment that was unallocated after the
regionalization process.
The second example examines one of the two catchments that were not allocated
to any of the regions. The first step was to assign the catchment to the region to which
it is closest, which is Group 4. Both regional and at-site extreme flow relationships
were then developed with the results displayed in Fig. 4. The target site has 16 years of
record, which is a very short record from which to estimate extreme flow quantiles.
The regional curve provides a reasonable fit to the at-site data given the limited record
length available. However, the regional curve and the at-site curve provide very
different results, which is most noticeable for the more extreme events. The estimate
for the 100-year event is 922 m 3 s"1 using at-site data and 520 m3 s~' based on the
regional growth curve. The results for this site reveal that even though extreme flow
information from the target site cannot be effectively used for Group 4, information
from the catchments in Group 4 can be used for effectively estimating quantiles at the
target site.
Similar analysis was conducted for the other catchment that was not allocated to
any of the groups. This catchment, which has 30 years of record, has an extreme flow
response that is quite different from that for the region to which it is closest (Group 2)
and also different from the other groups as well. It is concluded that this catchment is
not suitable for regional analysis, perhaps as a result of unique conditions that influence the extreme flows at this site.
The final example considers a target site that is included in multiple groups. The
catchment has Group 1 as its primary region but is also a member of Groups 5 and 7.
The estimation of extreme flow quantiles for this site could proceed using the regional
The formation ofgroups for regional flood frequency analysis
111
346 @ 3 (e)
1000-
3
4
Distributions
1. P3 - LMOM
900-
g
800-
2
700-
1
Dser distributions
2. Group 1
3. Group 5
4. Group 7
(D
E
Z3
O
600500-
-
400300200100-
Return period
'+
2
0
^~;^
I
I
5
I
10
1
25
1
50
100
1
(years)
200
1
500
1
i
-r -1
Gumbel reduced variate, y
Fig. 5 Extreme flow relationship for a catchment that is a member of multiple groups.
growth curve for Group 1 or could involve combining information from the growth
curves for the three regions. The extreme flow relationships, Fig. 5, include the at-site
curve as well as the three regional growth curves. The Group 1 curve is seen to provide
a very good fit to the 23 years of at-site data. The other two regional curves provide
fits that are reasonable, but that deviate more from the at-site curve than is the case for
Group 1. It can be concluded that the Group 1 growth curve can be used to estimate
extreme flow quantiles for this site.
CONCLUSIONS AND RECOMMENDATIONS
The regionalization process developed in this work has been demonstrated to result in
effective groups of catchments for regional flood frequency analysis. The use of a
clustering algorithm with limited physiographic information can still lead, through the
region revision process, to groups that are homogeneous, identifiable, and of a
sufficient size. Although the region revision component of the procedure is a heuristic
process, it can be used to effectively alter the original groups to enhance the regional
characteristics. The overlapping regions that result from the region revision process
represent a more effective use of the available extreme flow information than is the
case for distinct (non-overlapping) regions, particularly when the total number of
catchments is not large.
The application of the technique to data from rivers in India reveals that, as
expected, the procedure results in substantial improvements in the regional characteris-
112
Donald H. Bum & N. K. Goel
tics in comparison to the geographical regions. The procedure developed can be used
effectively as the basis for estimating extreme flow quantités for either gauged or
ungauged catchments.
Future work could explore the application of the methodology to data from other
locations to ascertain the generality of the results obtained with catchment data from
Zone 3 of India. It would also be useful to evaluate the performance of the technique
under conditions with differing amounts of available data. In this context, the impact
on the results of variations in the amount of both the hydrometric data and the physiographic catchment characteristic data could be examined.
Acknowledgements The research described has been partially supported by a grant
from the Natural Sciences and Engineering Research Council of Canada (NSERC).
This contribution is gratefully acknowledged. This work was commenced while the
second author was a visiting scholar in the Department of Civil Engineering,
University of Waterloo, Ontario, Canada. Figures 3-5 were produced using the
WINFAP-FEH software package developed by the Institute of Hydrology in the UK.
The authors are grateful to Dr Duncan Reed for providing a beta version of this
software. The present version of the paper has benefited greatly from the useful
comments provide by two anonymous reviewers.
REFERENCES
Acreman, M. C. & Wiltshire, S. E. (1987) Identification of regions for regional flood frequency analysis (abstract). EOS
68(44), 1262.
Burn, D. H. (1990) Evaluation of regional flood frequency analysis with a region of influence approach. Wat. Resour. Res.
26(10), 2257-2265.
Burn, D. H. (1997) Catchment similarity for regional flood frequency analysis using seasonality measures. J. Hydrol. 202,
212-230.
Burn, D. H.. Zrinji, Z. & Kowalchuk, M. (1997) Regionalization of catchments for regional flood frequency analysis.
J. Hydrol. Engng ASCE 2(2), 76-82.
Hosking, J. R. M. & Wallis, J. R. (1993) Some statistics useful in regional frequency analysis. Wat. Resour. Res. 29(2),
271-281.
Hosking, J. R. M. & Wallis, J. R. (1997) Regional Frequency Analysis: An Approach Based on L-Moments. Cambridge
University Press, Cambridge, UK.
Jakob, D., Reed, D. W. & Robson, A. J. (1999) Choosing a pooling-group, chapter C6. In: Flood Estimation Handbook,
vol. 3, Statistical Procedures for Flood Frequency Estimation. Institute of Hydrology, Wallingford, UK.
Kumar, R., Singh, R. D. & Seth, S. M. (1999). Regional flood formulas for seven subzones of Zone 3 of India. J. Hydrol.
Engng ASCE 4(3), 240-244.
Lettenmaier, D. P., Wallis, J. R. & Wood, E. F. (1987) Effect of regional heterogeneity on flood frequency estimation.
Wat. Resour. Res. 23(2), 313-323.
Nathan, R. J. & McMahon, T. A. (1990) Identification of homogeneous regions for the puipose of regionalization.
J. Hydrol. 121,217-238.
Parida, B. P., Kachroo, R. K. & Shrestha, D. B. (1998) Regional flood frequency analysis of Mahi-Sabarmati basin
(Subzone 3-a) using index flood procedure with L-moments. Wat. Resour. Manage. 12, 1-12.
Reed, D. W., Jakob, D., Robson, A. J., Faulkner, D. S. & Stewart, E. J. (1999) Regional frequency analysis: a newvocabulary. In: Hydrological Extremes: Understanding, Predicting. Mitigating (Proc. Birmingham Symp., July
1999) (ed. by L. Gottschalk, J.-C. Olivry, D. Reed & D. Rosbjerg), 237-243. IAHS Publ. no. 255.
Stedinger, J. R. & Lu, L. (1995) Appraisal of regional and index flood quantile estimators. Stochastic Hvdrol. Hydraul.
9(1), 49-75.
Tasker, G. D. (1982) Comparing methods of hydrologie regionalization. Wat. Resour. Bull. 18(6), 965-970.
Wandle, S. W. (1977) Estimating the magnitude and frequency of flood on natural streams in Massachusetts. USGS Water
Resources Investigations 77-39.
Webster, R. & Burrough, P. A. (1972) Computer-based soil mapping of small areas from sample data, II: classification
Received
30 JuneJ. 1999;
accepted
November 1999
smoothing.
Soil Sci.
23(2), 2222-234.