Hydrological Sciences-Journal~des Sciences Hydrologiques, 45( Î ) February 2000 The formation of groups for regional flood frequency analysis DONALD H. BURN Department of Civil Engineering, University of Waterloo, Waterloo, Ontario N2L 3G I, Canada e-mail: [email protected] N. K. GOEL Department of Hydrology, University ofRoorkee, Roorkee 247 667, India Abstract A new technique is developed for identifying groups for regional flood frequency analysis. The technique uses a clustering algorithm as a starting point for partitioning the collection of catchments. The groups formed using the clustering algorithm are subsequently revised to improve the regional characteristics based on three requirements that are defined for effective groups. The result is overlapping groups that can be used to estimate extreme flow quantiles for gauged or ungauged catchments. The technique is applied to a collection of catchments from India and the results indicate that regions with the desired characteristics can be identified using the technique. The use of the groups for estimating extreme flow quantiles is demonstrated for three example sites. La formation de groupes pour l'estimation régionale de la fréquence des crues Résumé Nous avons développé une nouvelle technique afin de déterminer des groupes de bassins pour estimer la fréquence des crues à l'échelle régionale. Le point de départ de cette technique est un algorithme d'agrégation permettant de réaliser une partition de l'ensemble des bassins considérés. Afin d'améliorer la régionalisation les groupes ainsi formés sont ensuite modifiés en s'appuyant sur trois exigences définies de telle sorte que les groupes soient utilisables avec efficacité. Le résultat consiste en groupes pouvant se recouvrir qui peuvent être utilisés pour estimer les quantiles de crues extrêmes de stations jaugées ou non. Cette technique a été appliquée à un ensemble de bassins versants de l'Inde et les résultats montrent qu'elle permet d'estimer les caractéristiques désirées. Trois sites de démonstration ont fait l'objet de l'utilisation des groupes pour l'estimation des quantiles des crues extrêmes. INTRODUCTION Regional flood frequency analysis can be used to improve the estimation of extreme flow quantiles at sites that have data record lengths that are short relative to the return period of interest. Improved quantile estimates are obtained by using an increased spatial characterization of the flow regime to compensate for an inadequate temporal representation of the extreme flows at a given location. Either gauged or ungauged locations can be analysed using regional flood frequency analysis. For gauged locations, the aim is to augment the available, but generally limited, at-site information for estimating extreme flow quantiles. Regional information becomes increasingly important as the return period of interest increases. For an ungauged location, information from hydrologically similar gauged catchments is used to characterize the extreme flow regime for the ungauged catchment. Open for discussion until 1 August 2000 97 98 Donald H. Burn & N. K. Goel Regional flood frequency analysis involves the formation of a region that is required to effect the spatial transfer of information. A region, in this context, has come to mean a collection of catchments, not necessarily geographically contiguous, that can be considered to be similar in terms of hydrological response. The goal of the regionalization process is the identification of groupings of catchments that are sufficiently similar to warrant the combination of extreme flow information from sites within the region. The regions defined should thus be homogeneous with respect to extreme flow characteristics. The present work addresses the identification of regions for regional flood frequency analysis with a particular focus on the formation of regions that can be used for estimating extreme flow quantiles for ungauged catchments. The approach uses, as a starting point, a clustering technique that employs physiographic catchment characteristics to define similarity between catchments. The initially formed clusters (regions) are subsequently modified, following a heuristic process, to improve upon the regional characteristics. An overview is provided of regionalization approaches that have been applied in previous work, followed by a presentation of the region-forming process developed herein. Application of the technique to a collection of catchments from India is presented. The application describes the characteristics of both the geographical regions currently in use and also the regions that result from applying the technique developed in the research. The final section summarizes the important research conclusions and identifies avenues for further investigation. REGIONALIZATION APPROACHES Regional requirements Regional flood frequency analysis typically begins with the definition of regions. Regions are defined as subsets of the entire collection of sites for which extreme flow information is available or required. A region can be considered to comprise a group of sites from which extreme flow information can be combined for improving the estimation of extreme flows at any site in the region. Hosking & Wallis (1997) describe the benefits of regionalization for estimating extreme flow quantiles. There are several characteristics that regions should possess to ensure effective information transfer and therefore efficient estimation of extreme flow quantiles. The first requirement is that the collection of catchments be hydrologically homogeneous. This requirement arises from the need to ensure that the extreme flow information that is transferred to a target site is similar to the extreme flow information at that site. The importance of regional homogeneity for flood frequency analysis has been demonstrated (e.g. Lettenmaier et al., 1987; Stedinger & Lu, 1995). The second requirement is that the region be identifiable, which implies that a regional home can be readily determined for a new catchment which may be ungauged. The third requirement is that the regions be sufficiently large. Larger regions imply that more extreme flow information is incorporated into the estimation of extreme flow quantiles thus improving the estimates, provided that the extreme flow information is sufficiently similar to that at the target site. It has been suggested (Jakob et al, 1999) that a region The formation of groups for regionalfloodfrequency analysis 99 should ideally contain 5 T station-years of data in order to provide an effective estimate for the flood event with a return period of T years. As the size of a region is increased, there is a tendency for the homogeneity of the collection of catchments forming the region to decrease. There is thus a trade-off between the first and third required characteristics for a region, which requires the selection of an appropriate balancing point (Reed et al, 1999). Previous work The delineation of regions was traditionally based on geographical, political, administrative, or physiographic boundaries. However, regions formed in this manner will not necessarily be homogeneous in terms of hydrological response, given the potentially large amount of spatial variability in the physiographic or hydrological characteristics of the catchments in this type of region. Residuals from a regression model have been used to create geographically contiguous regions based on grouping catchments with residuals of a similar sign and magnitude (Wandle, 1977). Cluster analysis has also been used as a method of creating regions (Tasker, 1982). Cluster analysis attempts to identify clusters (groups) of catchments with the characteristic that the catchments within a cluster are similar, while there is dissimilarity between the catchments in different clusters. Since different types of variables can be used to define the similarity measure required to generate clusters, the variables used should be carefully selected and weighted according to their importance for the actual problem (Nathan & McMahon, 1990). A novel regionalization approach avoids the use of fixed regions. This approach allows each catchment to have a potentially unique set of catchments that constitutes the region for the target site. This methodology involves the transfer of extreme flow information from similar sites to the catchment of interest and was first suggested by Acreman & Wiltshire (1987). The method was refined and referred to as the region of influence (ROl) approach by Burn (1990). This approach requires the choice of a threshold value that functions as a cut-off point for a dissimilarity measure. All sites which have a dissimilarity measure with the target site that is greater than the threshold value are excluded from the region of influence for that particular catchment. The regionalization approaches that are most commonly used today require the selection of variables that are used to define the similarity (or dissimilarity) for the catchments. The two most common general types of variables used in similarity measures are physiographic catchment characteristics (such as the drainage area, catchment slope, etc.) and flood statistics (such as the L-moment ratios, or other statistical measures, calculated from the available flood series). Data for a sufficient number of physiographic catchment characteristics are not always available, particularly for applications in remote areas or in developing countries. As a result, the groupings formed using the available physiographic variables may not be hydrologically homogeneous. The resulting regions are often hydrologically homogeneous when flood statistics are used as the variables in the similarity measure. However with this approach, the flood statistics are often used both to form the regions and to subsequently evaluate the homogeneity of the collection of catchments in the region. This can result in regions 100 Donald H. Barn & N. K. Goel that are homogeneous but not necessarily effective for regional flood frequency analysis. This situation can occur if the homogeneity arises more by chance than as a result of similarities in the hydrological regime for the catchments in the region. Burn (1997) used seasonality statistics to avoid this dual use of statistics derived from flood magnitudes. Seasonality statistics use data on the timing of flood events to define catchment similarity, while reserving statistics derived from flood magnitudes for use in evaluating the regional homogeneity. Seasonality statistics cannot be directly calculated for ungauged catchments. REGION-FORMING PROCESS Clustering technique This work employs a clustering technique as the starting point for the formation of regions that are homogeneous, identifiable, and sufficiently large. Clusters (groups) of catchments are initially formed using the clustering algorithm outlined below and are then modified, if necessary, to improve the homogeneity of the groups as well as the size of the groups. The adjustments to the groups that are initially formed are accomplished using a heuristic process that is similar to that described by Burn et al. (1997). The clustering technique used is the AT-means algorithm that can be used to partition M objects (catchments) into K groups (regions) based on the values of / features, or attributes, of the objects. The algorithm starts with an initial centroid, or seed point, for each of the K clusters. Each of the objects is then assigned to the "nearest" cluster centroid in terms of a similarity measure. Once all of the objects have been assigned to a cluster, the centroid for each cluster is recalculated and the objects may be reassigned to different clusters depending on the distance from the object to the new centroid location. This process is repeated until no object experiences a change in cluster membership. The dissimilarity measure that is used in this work to determine the closeness of each catchment to each cluster centroid is defined as: d„ D (1) l=—nr- where D~ is the dissimilarity between catchment i and cluster j , Dy is the weighted Euclidean distance (in attribute space) between catchment i and clustery, dy is the geographical distance between catchment i and the geographic centroid of cluster j , and db and w are parameters to be estimated. Since equation (1) defines a dissimilarity measure, lower values for Dy indicate catchments that are closer to the corresponding cluster centroid. The weighted Euclidean distance is defined as: V. D ,= IJ 2 (2) where Wk is the weight applied to attribute k, xik is the standardized value for attribute k The formation ofgroups for regionalfloodfrequency analysis 101 for catchment i where the attribute values are standardized by dividing by the standard deviation for the attribute, Xjk is the centroid value for attribute k for clustery, and m is the number of attributes used to define similarity. The attributes used in equation (2) are physiographic characteristics of the catchments. The weights in equation (2) are selected to satisfy: Zw*=1 (3) The dissimilarity measure defined in equation (1) was originally proposed by Webster & Burrough (1972) in an attempt to enhance the geographical continuity of clusters. The two parameters, <4 and w, control the relative influence in the composite dissimilarity measure of geographical distance vs the weighted Euclidean distance in attribute space. Within the Euclidean distance measure, the Wk values reflect the relative importance of each attribute in determining catchment dissimilarity. Initial estimates for the weights and the two parameters in the distance function can be selected using judgement and then be refined using either an optimization algorithm or an ad hoc procedure. In this work, the parameters and weights were adjusted using a search technique to enhance the homogeneity of the regions that are formed. Regional homogeneity The clusters formed are evaluated in terms of the homogeneity of the catchments comprising the region. Regional homogeneity can be evaluated using one of a number of tests that have been developed for this purpose. A commonly used approach is a homogeneity test developed by Hosking & Wallis (1993). This test compares the variability of L-moment ratios for the catchments in a region with the expected variability, obtained from simulation, for a collection of catchments with the same record lengths as those in the region. A statistic based on the weighted variance of the L-coefficient of variation (L-CV) is derived from (Hosking & Wallis, 1993): N v = ±i _ (4) where V is the weighted variance of the L-CV for the region, m is the record length for site i, t2(i) is the L-CV at site i, t2 is the group mean L-CV, and N is the number of sites in the region. Calculation of the homogeneity measure requires an estimate of the mean and standard deviation of V. This can be accomplished using simulation resulting in estimates defined as \xv and ay, respectively. The homogeneity measure is then defined as: H = YJZ^JL (5) where H is the homogeneity statistic. A region can be considered homogeneous if H<\, possibly heterogeneous if 1 < H < 2, and definitely heterogeneous if H > 2 102 Donald H. Bum & N. K. Goel (Hosking & Wallis, 1993). In this work, the goal is to define regions that result in a homogeneity statistic of H< 1, although H values between one and two will be considered acceptable. Revisions to regions The clustering procedure described above can be used to define initial groups of catchments (regions). However, it is often found that the resulting groups do not meet the three requirements for an effective region. Since the regions are formed based on physiographic catchment characteristics, there is no concern with respect to identifying a regional home for an ungauged catchment (i.e. the identifiable requirement). However, there is no guarantee that the regions will be homogeneous and of a sufficient size. Deficiencies in homogeneity and group size can be addressed through a process of revising the regions that are initially formed by shifting catchments in a process of homogeneity enhancement. This is outlined below. Regional revision is a heuristic process in the sense that there is no set procedure for how to move from one stage of the process to the next. Rather, the results from each revision to the regions will often lead to insights as to what further changes would prove to be most beneficial. The goals of the regional revision process are to increase the homogeneity of the regions, as measured by the homogeneity test described in equations (4) and (5), and to ensure each region is of a sufficient size, in accordance with the 5T guideline. Each revision to the regions involves one of the following options: (a) moving a catchment to a new region; (b) removing a catchment from its current region and not assigning it to a new region; (c) replicating a catchment from one region in one or more other regions (i.e. allowing a catchment to simultaneously belong to more than one region); (d) merging two regions; and (e) splitting a region into two (or more) new regions. Options (a), (b) and (e) are designed to improve the homogeneity of a region that does not currently meet the homogeneity criterion, while options (c) and (d) are designed to increase the size of regions that are considered to be too small by the 5 7 guideline. There are two fundamental steps in the process of determining if a catchment should change regions (or be copied to an existing region). The first is to identify a catchment that is a candidate for moving and the second is to evaluate the move. There are two criteria used to determine if a catchment should be considered for moving from its current region to a new region (or be copied to another region). The first is a discordancy measure that is calculated as a part of the homogeneity test described above. The discordancy measure is useful to identify unusual flood series in terms of the L-moment ratios. The discordancy measure is defined as (Hosking & Wallis, 1993): D. = I jV( u . - u) T A"1 (u,. - u) (6) where A is the discordancy measure for catchment ;', N is the number of catchments in the region, u,- is a vector containing the L-CV, L-skewness, and L-kurtosis for catchment /', ïïis the regional average for u,-, and A is defined as: A = f>,-u)(uy-ii)T (7) The formation ofgroups for regional flood frequency analysis 103 A large value for the discordancy measure indicates a catchment that could leave a region in order to enhance the regional homogeneity. The second criterion is used to identify potential new homes for a catchment. This criterion is the closeness of a catchment to the centroid of each of the other clusters (regions), as defined through equation (1). The acceptability of a proposed move is evaluated using the homogeneity test. If moving a catchment from a region improves the homogeneity of the region, then the removal of the catchment is accepted. If the homogeneity of the new region is not adversely affected by the move, then the proposed new home is accepted. If a new home for a catchment cannot be found, the catchment either remains in its current region, or is removed from its current region and placed in a group of unallocated catchments. Catchments in the group of unallocated catchments are periodically considered for reintroduction into one of the existing regions. Following a successful move (or the copying of a catchment to a new region), the regional centroids for all affected regions are updated and a new revision is considered. Region splitting and merging follow a similar process to that for catchment moves. If a region is large, but not homogeneous, the splitting of the region is considered. The region is divided, using the dissimilarity measure in equation (1) as a guide, to form two or more new regions. If a region is very small, but homogeneous, then it may be possible to consider merging the region with another region that is close in attribute space (i.e. as measured by equation (1)). Proposed changes of this nature are again evaluated using the homogeneity test. The process of region revision continues until no further improvements to the regions can be identified. Use of the regions The process of copying catchments from one region to another means that any catchment can potentially belong to more than one region. The region revision process will thus result in a collection of potentially overlapping regions. If the regional revision has been successful, the regions will be both homogeneous and of a sufficient size to facilitate effective estimates for extreme flow quantiles. The regions can be used to obtain quantile estimates either for gauged catchments that are members of one or more of the regions or for ungauged catchments that must be assigned to an existing region. For gauged catchments, the extreme flow information from all catchments in the region is used to obtain a pooled estimate for the T-year flood event. This can be done using the index flood procedure wherein a dimensionless regional growth curve is scaled by an index flood value for the catchment to obtain estimates for extreme flow quantiles for the target site. The regional growth curve is derived from extreme flow information for the catchments in the region and the index flood value is calculated from the flood series for the target site. Traditionally, the mean of the annual flood series has been used as the index flood. For catchments that belong to more than one region, there are two options. An estimate of extreme flow quantiles for the catchment could be obtained using extreme flow information from the region with which the catchment has the greatest affinity, or an estimate could be obtained using a weighted combination of the estimates from each of the regions to which the catchment belongs. The latter approach would be preferred if the catchment was on the border between two or more regions and was therefore not strongly associated with any one region. Donald H. Burn & N. K. Goel 104 For ungauged catchments, there are two changes to the procedure for gauged catchments. First, as noted above, the catchment must be assigned to an existing region. This can be done using the dissimilarity measure in equation (1) to determine the distance from the catchment to each regional centroid. The catchment is then assigned to the region to which it is closest. For ungauged catchments that are similar to more than one region, it is possible to estimate extreme flow quantiles based on a combination of information from more than one region. The second necessary revision for ungauged catchments is the requirement to estimate, as opposed to calculate, the index flood value for the ungauged catchment. This can be done from regression using a relationship derived from physiographic catchment characteristics for the catchments in the region to which the ungauged catchment is assigned. APPLICATION Description of study area Economic constraints do not allow detailed hydrometric investigations at every new site for the estimation of extreme flow quantiles for a large country such as India. The regional flood frequency analysis approach and regional unit hydrograph based approaches are in vogue in India for design flood estimation for ungauged catchments. For this purpose, the country has been divided into seven zones and 26 sub-zones. For preparing the flood estimation reports for these sub-zones, systematic and sustained collection of hydrometric data at various representative catchments has been conducted as a joint project by the Research, Design and Standards Organization of the Ministry of Railways, Ministry of Transport, Central Water Commission and India Meteorological Department. The available hydrometric data consist of an annual maximum flood series for select locations within each sub-zone. The focus in the present work is on Zone 3, which covers the river basins of central India, as shown in Fig. 1. This zone has been further divided into the nine subzones described in Table 1. Parida et al. (1998) previously evaluated the homogeneity of one sub-zone from Zone 3 while Kumar et al. (1999) developed flood formulae for the sub-zones in Zone 3. Sub-zones 3(g) and 3(i) have not been considered in the present study because of the lack of hydrometric data for these sub-zones. Furthermore, some of the hydrometric gauging locations have been excluded from Table 1 Summary of data available for the case study area. Sub-zone River basin 3(a) 3(b) 3(c) 3(d) 3(e) 3(f) 3(g) 3(h) Mahi and Sabarmati Lower Narmada and Tapi Upper Narmada and Tapi Mahanadi Upper Godavari Lower Godavari Indravati Krishna and Penner Kaveri 3(i) Number of sites 9 10 8 13 11 16 0 10 0 Catchment area (km2) 30.1-1090 53.1-828 41.5-2110 58.0-1150 31.3-2230 50.0-824 Record length (years) 14-25 12-26 19-30 13-30 14-32 14-29 64.8-1690 14-32 The formation ofgroups for regionalfloodfrequency analysis . > ( i 070°E -35°N , * , I , . i 075°E , ! , 0J0°E . i . , 0S5°E , . , . 105 I , 090°E -30°N Fig. 1 Location map for the case study area. analysis either because of very short data records or because screening of the extreme flow data revealed data that were suspect. The summary information presented in Table 1 is for the 77 sites that are analysed in this work. Geographic region alization Table 2 presents the results of analysing the geographically defined sub-zones within Zone 3 for homogeneity and group size. An evaluation of the acceptability of the groups requires the selection of a target return period. If the 50-year event is selected, then the 5T guideline implies that 250 station-years of record are required for an effective quantile estimate. Table 2 reveals that only Sub-zones 3(d) and 3(f) meet this criterion. Sub-zone 3(d) is heterogeneous while Sub-zone 3(f) is considered to be possibly heterogeneous. Of the remaining sub-zones, two are too small by the 5T guideline and three are clearly heterogeneous with H values much in excess of two. Table 2 Characteristics of the geographic sub-zones. Sub-zone number 3(a) 3(b) 3(c) 3(d) 3(e) 3(f) 3(h) Number of sites 9 10 8 13 11 16 10 Station-years of record 174 195 180 279 229 363 234 Homogeneity statistic, H L29 2.94 2.20 4.21 3.88 1.77 0.24 Donald H. Burn & N. K. Goel 106 Thus only one of the seven geographically defined sub-zones can be considered to be acceptable for estimating extreme flow quantiles with a return period of the order of 50 years. Clustering approach Three physiographic catchment characteristics were used as attributes in the clustering algorithm. The attributes used were the catchment area, the length of the main stream of the river, and the slope of the main stream of the river. In addition to these three attributes, the geographical position of each catchment is used in equation (1) to define geographical separation. The weights that were used in equation (1) were determined through a search technique. The search technique involved selecting initial estimates for the parameters and then revising the values so as to improve upon the homogeneity and size characteristics of the regions. The resulting values were 0.633, 0.316, and 0.051 for catchment area, main stream length, and main stream slope, respectively. The values for db and w were 1270 and 19, respectively. The clustering algorithm was applied using the weights noted above and the catchments were divided into seven clusters. Seven clusters were initially selected to be consistent with the results from the definition of geographic sub-zones. This number of clusters was maintained when investigation of other numbers of clusters indicated no improvement in the regional characteristics. The results are displayed in Table 3. Table 3 Characteristics for the groups from the clustering algorithm. Group number 1 2 3 4 5 6 7 Number of sites 13 12 5 11 7 17 12 Station-years of record 287 257 110 205 162 387 246 Homogeneity statistic, H 4~55 4.06 -1.16 1.53 -0.47 1.14 3.44 If the 50-year event is again selected as the target return period, then only one of the regions (Group 6) can be considered acceptable in that it has a homogeneity statistic close to one and contains more than 5 T station-years of data. A second region (Group 4) can be considered marginally acceptable with slightly less than the required number of station-years of record and a homogeneity statistic that is at the mid point of the transition range from definitely heterogeneous to homogeneous. Two of the remaining groups are too small (Groups 3 and 5) while three groups (Groups 1, 2 and 7) are heterogeneous. As a result, the regional revision process was implemented. As noted above, the region revision process is heuristic in nature implying that judgement is required throughout the process to aid in the determination of the next step in the analysis. The regional revision process started with Group 7, the heterogeneous group with the lowest H value. This was selected as the starting point since it was anticipated that minor revisions to this group would result in a homogeneous collection of catchments. Two catchment removals from this group resulted in a homo- The formation of groups for regional flood frequency analysis 107 Table 4 Characteristics for the final groups. Group number 1 2 4 5 6 7 Number of sites 17 21 19 13 23 25 Station-years of record 379 445 384 292 507 527 Homogeneity statistic, H L48 1.34 0.88 1.19 1.44 0.28 geneity value of less than two, which was considered to be acceptable. Several catchment moves from Group 2 and from Group 1 also resulted in an acceptable homogeneity for these two groups. At this stage, all groups were considered to be acceptably homogeneous (H < 2) although several groups were small in size. The next step was to improve upon the size of the groups through copying catchments from one group to one or more of the other groups. The size of Group 2 was increased by copying catchments from Groups 6 and 1 to Group 2. Catchments from Group 3, which was a very small group, were dispersed to Groups 5 and 6 (one catchment each) and Group 7 (the remaining three catchments). This left Groups 4 and 5 as small groups. These two groups were increased in size by copying catchments from other groups. The final step was to search for catchments that could be copied into any of the groups in order to increase further the group size. Several catchments were identified that could be copied to other groups giving the results displayed in Table 4. Note that there are only six groups in Table 4 since the catchments from Group 3 were dispersed to other groups. The group numbers from Table 3 are maintained for continuity purposes. In addition to the catchments in the six groups, there are two catchments that have not been allocated to any of the groups. These two catchments will be discussed further below. The final groups in Table 4 can be considered to be homogeneous, for practical purposes, with all l v a l u e s less than 1.5. The number of station-years of record for the groups varies from a low of 292 for Group 5 to a high of 527 for Group 7. In accordance with the 5T guideline, these values imply that an estimate for the 60-year flood could be obtained for any catchment in Group 5 while an estimate of the 100-year flood could be obtained for any catchment in Group 7. The number of station-years of record for the collection of groups in Table 4 is 2534 vs 1654 stationyears in the non-overlapping groups defined in Table 3. Using overlapping groups clearly represents a more effective use of the available extreme flow information. An evaluation of the membership of each group reveals that the catchments belong to 1.55 groups, on average, with 48 catchments belonging to only one group, 17 catchments belonging to two groups, ten catchments belonging to three groups and two catchments each being a member of four groups. The two catchments belonging to four groups belong to the same four groups while there are different combinations of group memberships for the catchments that belong to three groups and for those that belong to two groups. All groups have at least some catchments that are shared with other groups, although the fraction of shared catchments varies with the group. Group 5, the smallest group, has the least number (and smallest fraction) of shared catchments. Figure 2 shows the location of the catchments in each of the groups along with the two unallocated catchments. This figure shows that Group 1 is primarily in the central 108 Donald H. Burn & N. K. Goel i n 07S°E , , , . | , , , , 080°E | 085°E O o * <n>c m 8 /Legend ° Grcxpl o Grcnp2 o Gr<xp4 A Grcxp5 v Grap6 > Grcxp7 * Unallocated Fig. 2 Location of the catchments in the final groups. part of the study area with roughly half of the catchments being shared catchments. This group shares catchments with each of the other groups. Group 2 catchments are located primarily in the northeastern part of the study area with a majority of the catchments being shared catchments. Group 2 shares catchments with all groups except Group 5. Group 4 catchments are located in the northwestern part of the study area with less than half of the catchments being shared catchments. Group 4 shares catchments with Groups 1, 2, and 7. Group 5 catchments are located in the southern portion of the study area with about 25% of the catchments shared with other groups. Group 5 catchments are shared with Groups 1 and 7. Group 6 catchments are located in the north central part of the study area with roughly half of the catchments shared with other groups. This group shares catchments with Groups 1, 2, and 7. The catchments in Group 7 are centrally located in the study area with the majority of the catchments being shared catchments. Group 7 catchments are shared with each of the other groups. The two unallocated catchments are also shown in Fig. 2. Interestingly, one of these catchments (the one located closer to the western boundary of the study area) is geographically close to several catchments that are members of multiple groups. Further examination of the flood series for each of the unallocated catchments revealed no unusual results. However, the catchments were sufficiently distinct in L-moment space to prevent their successful inclusion in one of the defined groups. Use of groups This section demonstrates the use of the defined groups for estimating extreme flow quantiles. Three examples are presented representing different situations in terms of The formation of groups for regional flood frequency analysis 109 21 @ 3 (b) 3200 Distributions 1 . P3 - LMOM 2880- User distributions 2. Group 4 2560tO 22400 E 1920O 1600 1280 960H 640 320H Return period (years) 25 0 -1 0 ~i 1 1 2 i 3 50 100 i 4 200 500 r 5 Gumbel reduced variate, y Fig. 3 Extreme flow relationship for an "ungauged" catchment. the information available and the characteristics of the target site. In the first example, the estimation of extreme flow quantiles is simulated for the case of an ungauged site. This is done by selecting a site that was not included in the original analysis and estimating extreme flow quantiles following the procedures previously outlined. The target site is a part of geographic Sub-zone 3(b) and is in fact gauged with a gauging record of 17 years. A gauging record of this length is of limited use for estimating extreme flow quantiles through at-site analysis, but can be expected to provide a realistic estimate for the index flood value. If the catchment had actually been an ungauged catchment, the index flood value could have been estimated using a regression relationship in conjunction with the physiographic catchment characteristics. The catchment is assigned to Group 4 based on the proximity defined through equation (1). The results of estimating the extreme flow relationship for this site are displayed in Fig. 3. This figure shows the extreme flow relationship based on both the at-site data and on the growth curve for Group 4. In both cases, the distribution selected is the Pearson Type III (P3) distribution. This distribution was selected since it is the distribution that provides the best fit to the data for Group 4 based on the regional goodness of fit measure defined in Hosking & Wallis (1993). The curves in Fig. 3 reveal that the regional fit is reasonable for the return periods of primary interest. This curve differs substantially from the at-site curve, particularly for the return periods generally used as the basis for design. For example, the estimate for the 100-year flood event is 1740 m3 s"1 using at-site data, and 2420 m 3 s"1 based on the regional growth curve. 110 Donald H. Burn&N. K. Goel 66 @ 3 (e) 1350- Distribution s 1. 1215- P3 - LMOM User 1080« distributions 2 . Group 4 945- CD Ë 810- o 675 540-I 405 270135- Return period (years) 50 100 200 0 Gumbel reduced variate, y Fig. 4 Extreme flow relationship for a catchment that was unallocated after the regionalization process. The second example examines one of the two catchments that were not allocated to any of the regions. The first step was to assign the catchment to the region to which it is closest, which is Group 4. Both regional and at-site extreme flow relationships were then developed with the results displayed in Fig. 4. The target site has 16 years of record, which is a very short record from which to estimate extreme flow quantiles. The regional curve provides a reasonable fit to the at-site data given the limited record length available. However, the regional curve and the at-site curve provide very different results, which is most noticeable for the more extreme events. The estimate for the 100-year event is 922 m 3 s"1 using at-site data and 520 m3 s~' based on the regional growth curve. The results for this site reveal that even though extreme flow information from the target site cannot be effectively used for Group 4, information from the catchments in Group 4 can be used for effectively estimating quantiles at the target site. Similar analysis was conducted for the other catchment that was not allocated to any of the groups. This catchment, which has 30 years of record, has an extreme flow response that is quite different from that for the region to which it is closest (Group 2) and also different from the other groups as well. It is concluded that this catchment is not suitable for regional analysis, perhaps as a result of unique conditions that influence the extreme flows at this site. The final example considers a target site that is included in multiple groups. The catchment has Group 1 as its primary region but is also a member of Groups 5 and 7. The estimation of extreme flow quantiles for this site could proceed using the regional The formation ofgroups for regional flood frequency analysis 111 346 @ 3 (e) 1000- 3 4 Distributions 1. P3 - LMOM 900- g 800- 2 700- 1 Dser distributions 2. Group 1 3. Group 5 4. Group 7 (D E Z3 O 600500- - 400300200100- Return period '+ 2 0 ^~;^ I I 5 I 10 1 25 1 50 100 1 (years) 200 1 500 1 i -r -1 Gumbel reduced variate, y Fig. 5 Extreme flow relationship for a catchment that is a member of multiple groups. growth curve for Group 1 or could involve combining information from the growth curves for the three regions. The extreme flow relationships, Fig. 5, include the at-site curve as well as the three regional growth curves. The Group 1 curve is seen to provide a very good fit to the 23 years of at-site data. The other two regional curves provide fits that are reasonable, but that deviate more from the at-site curve than is the case for Group 1. It can be concluded that the Group 1 growth curve can be used to estimate extreme flow quantiles for this site. CONCLUSIONS AND RECOMMENDATIONS The regionalization process developed in this work has been demonstrated to result in effective groups of catchments for regional flood frequency analysis. The use of a clustering algorithm with limited physiographic information can still lead, through the region revision process, to groups that are homogeneous, identifiable, and of a sufficient size. Although the region revision component of the procedure is a heuristic process, it can be used to effectively alter the original groups to enhance the regional characteristics. The overlapping regions that result from the region revision process represent a more effective use of the available extreme flow information than is the case for distinct (non-overlapping) regions, particularly when the total number of catchments is not large. The application of the technique to data from rivers in India reveals that, as expected, the procedure results in substantial improvements in the regional characteris- 112 Donald H. Bum & N. K. Goel tics in comparison to the geographical regions. The procedure developed can be used effectively as the basis for estimating extreme flow quantités for either gauged or ungauged catchments. Future work could explore the application of the methodology to data from other locations to ascertain the generality of the results obtained with catchment data from Zone 3 of India. It would also be useful to evaluate the performance of the technique under conditions with differing amounts of available data. In this context, the impact on the results of variations in the amount of both the hydrometric data and the physiographic catchment characteristic data could be examined. Acknowledgements The research described has been partially supported by a grant from the Natural Sciences and Engineering Research Council of Canada (NSERC). This contribution is gratefully acknowledged. This work was commenced while the second author was a visiting scholar in the Department of Civil Engineering, University of Waterloo, Ontario, Canada. Figures 3-5 were produced using the WINFAP-FEH software package developed by the Institute of Hydrology in the UK. The authors are grateful to Dr Duncan Reed for providing a beta version of this software. The present version of the paper has benefited greatly from the useful comments provide by two anonymous reviewers. REFERENCES Acreman, M. C. & Wiltshire, S. E. (1987) Identification of regions for regional flood frequency analysis (abstract). EOS 68(44), 1262. Burn, D. H. (1990) Evaluation of regional flood frequency analysis with a region of influence approach. Wat. Resour. Res. 26(10), 2257-2265. Burn, D. H. (1997) Catchment similarity for regional flood frequency analysis using seasonality measures. J. Hydrol. 202, 212-230. Burn, D. H.. Zrinji, Z. & Kowalchuk, M. (1997) Regionalization of catchments for regional flood frequency analysis. J. Hydrol. Engng ASCE 2(2), 76-82. Hosking, J. R. M. & Wallis, J. R. (1993) Some statistics useful in regional frequency analysis. Wat. Resour. Res. 29(2), 271-281. Hosking, J. R. M. & Wallis, J. R. (1997) Regional Frequency Analysis: An Approach Based on L-Moments. Cambridge University Press, Cambridge, UK. Jakob, D., Reed, D. W. & Robson, A. J. (1999) Choosing a pooling-group, chapter C6. In: Flood Estimation Handbook, vol. 3, Statistical Procedures for Flood Frequency Estimation. Institute of Hydrology, Wallingford, UK. Kumar, R., Singh, R. D. & Seth, S. M. (1999). Regional flood formulas for seven subzones of Zone 3 of India. J. Hydrol. Engng ASCE 4(3), 240-244. Lettenmaier, D. P., Wallis, J. R. & Wood, E. F. (1987) Effect of regional heterogeneity on flood frequency estimation. Wat. Resour. Res. 23(2), 313-323. Nathan, R. J. & McMahon, T. A. (1990) Identification of homogeneous regions for the puipose of regionalization. J. Hydrol. 121,217-238. Parida, B. P., Kachroo, R. K. & Shrestha, D. B. (1998) Regional flood frequency analysis of Mahi-Sabarmati basin (Subzone 3-a) using index flood procedure with L-moments. Wat. Resour. Manage. 12, 1-12. Reed, D. W., Jakob, D., Robson, A. J., Faulkner, D. S. & Stewart, E. J. (1999) Regional frequency analysis: a newvocabulary. In: Hydrological Extremes: Understanding, Predicting. Mitigating (Proc. Birmingham Symp., July 1999) (ed. by L. Gottschalk, J.-C. Olivry, D. Reed & D. Rosbjerg), 237-243. IAHS Publ. no. 255. Stedinger, J. R. & Lu, L. (1995) Appraisal of regional and index flood quantile estimators. Stochastic Hvdrol. Hydraul. 9(1), 49-75. Tasker, G. D. (1982) Comparing methods of hydrologie regionalization. Wat. Resour. Bull. 18(6), 965-970. Wandle, S. W. (1977) Estimating the magnitude and frequency of flood on natural streams in Massachusetts. USGS Water Resources Investigations 77-39. Webster, R. & Burrough, P. A. (1972) Computer-based soil mapping of small areas from sample data, II: classification Received 30 JuneJ. 1999; accepted November 1999 smoothing. Soil Sci. 23(2), 2222-234.
© Copyright 2026 Paperzz