A SMOOTHED BOOTSTRAP APPROACH FOR MONTE CARLO GENERATION OF N SPATIALLY CORRELATED GRAPHS OF ANNUAL FLOW DURATION OR CUMULATIVE DISCHARGE

W.E. BARDSLEY
Department of Earth Sciences, University of Waikato, Private Bag 3105, Hamilton, New Zealand.

It is common for river discharges in the same or adjacent river basins to be spatially correlated because the flows are generated from the same precipitation systems passing over the region. This correlation needs to be incorporated when simulating general seasonal flow patterns at N different sites for input to models of multi-tributary or multi-river engineering schemes. However, it is often difficult to generate sets of correlated discharge variables which maintain a good description of the correlation relations seen in the flow record. As a means around this difficulty, a smoothed bootstrap approach is advocated by which the original record of correlated discharge observations is interpreted as a sequence of realizations from an unknown N-variate distribution. A nonparametric estimate of this unknown distribution is obtained via a multivariate kernel approach by which N-variate normal distributions are located over each of the original data vectors. Simulation then reduces to a two-step process for a given realization: (i) one of the multivariate normal distributions is selected at random, and (ii) a set of N correlated discharges is simulated from that distribution. This two-step process is particularly simple and requires only the generation of multivariate normal random variables. The method has the potential for extension to large values of N, regardless of the complexity of the correlation structure in the original data. This represents an improvement over the copula approach, which can have difficulty in representing complex correlations as N increases.
The smoothed bootstrap approach is illustrated with simulation of correlated annual cumulative inflows into two adjacent hydro storage lakes in the South Island of New Zealand. A similar approach is applicable to simulating correlated annual flow duration curves or indeed to simulating any set of correlated variables for which there is an existing data record.

INTRODUCTION

It is common for large irrigation, hydro power, and flood control schemes to be impacted by the discharge variations of multiple tributaries or multiple rivers. Setting the design parameters of such large and complex schemes is aided by simulating many years of discharges which seek to mimic the patterns of existing discharges without simply repeating them. Simple repetition of past records is of course always possible by way of bootstrap resampling of the data record. However, such tight control given to past data may not be desirable. In particular, simulated discharge extremes can then never exceed the recorded extreme observations.

The next logical step in allowing greater flexibility in the simulated values is to view the existing record from N different discharge recording sites as equivalent to random variables generated from N different statistical distributions fitted to site data. A feature which must be incorporated in this process is the spatial correlation of site discharges, which arises as a consequence of the rivers experiencing the same sequence of runoff-generating events in the form of precipitation systems and/or snowmelts. The specific causal mechanism of these correlations varies with the degree of averaging of the discharge variable of interest. For example, instantaneous discharges may be correlated through experiencing common storm events, while mean annual flows will be correlated by experiencing the same sequence of wet and dry years.
Discharge spatial correlations present a problem for simulations in that it is not a simple process to generate sets of correlated random variables from N-variate distributions which are flexible enough to accurately represent the existing N-variate discharge data set. Typically such data are highly skewed, with complex correlation structures far removed from standard multivariate distributions such as the multivariate normal distribution.

One potential solution to the multivariate discharge simulation problem is through the use of copulas [2]. In essence, copulas are a flexible means of representing multivariate distributions in terms of independently defined marginal distributions and correlation structures. However, the correlation structure is fully determined by the choice of copula, and it seems probable that as N increases any given copula is unlikely to have the capacity to accurately reflect in all dimensions the correlation structures evident in the data record. The purpose of this paper is to present briefly a more robust but simple alternative to the copula approach to simulation, through the use of a multivariate smoothed bootstrap.

AN N-VARIATE SMOOTHED BOOTSTRAP

The smoothed bootstrap approach adopted here utilises a kernel technique [4] for estimating the unknown N-variate probability density. The respective sets of correlated data are first rescaled to unit standard deviation and an N-variate normal distribution is placed on each vector of correlated rescaled data, such that each of the N distribution means corresponds to a data value. The component standard deviations of any given N-variate normal distribution are set to a constant common value. The magnitude of this single value is free to be varied to influence the simulated data pattern, as is the associated matrix of correlation coefficients.
Thus defined, a given set of K N-variate normal distributions will display as K bivariate normal distributions when viewed in any 2-variate plane, with the long axis gradient of all the bivariate distributions being +1.0 or -1.0 depending on the sign of the corresponding bivariate correlation coefficient [3, p. 255]. A data point will appear in the centre of each bivariate distribution.

In setting up the simulation process, the user visits a display of each possible 2-variate plane and by trial and error adjusts the bivariate correlation coefficients and the single standard deviation value per N-variate normal distribution. This process of adjustment inevitably involves a degree of subjectivity, but the data display does provide constraint on the simulation pattern. The respective correlation coefficients can be adjusted independently in each 2-variate plane, but a change in a standard deviation value will influence simulation patterns in all 2-variate planes because this single value is common to all planes for a given N-variate normal distribution.

In setting up the desired parameter values it is helpful to display the 99% bivariate confidence ellipses about each data point in the various planes. Increasing the magnitude of a correlation coefficient increases the elongation of the associated confidence ellipse, and increasing the standard deviation increases the area of the ellipse. As will be shown in the example, a reasonable first approximation can be achieved by giving all the N-variate distributions a common correlation matrix and a common standard deviation.

The simulation can proceed once the N-variate normal distribution parameter values have been set. A given simulation realization is achieved by selecting one of the N-variate normal distributions at random and then generating a vector of N correlated variables from that distribution.
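The two-step realization described above can be sketched in a few lines. The following Python fragment is a minimal illustration only, not the code used for this paper (which was written in MATLAB): it assumes a common correlation matrix and a common standard deviation for all kernels, and it generates the multivariate normal deviates with a hand-written Cholesky factorisation rather than a library routine. The function names and the small two-site data set are hypothetical.

```python
import math
import random

def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(a[i][i] - s)
            else:
                L[i][j] = (a[i][j] - s) / L[j][j]
    return L

def smoothed_bootstrap(data, corr, sd, n_draws, rng):
    """Draw n_draws vectors from equally weighted normal kernels centred on the data.

    data : list of N-vectors, already rescaled to unit standard deviation
    corr : common N x N correlation matrix of the kernels (assumed shared by all)
    sd   : common kernel standard deviation (sd = 0 gives the classical bootstrap)
    """
    n = len(corr)
    chol = cholesky(corr)
    draws = []
    for _ in range(n_draws):
        centre = rng.choice(data)                    # step (i): pick one kernel at random
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]  # independent standard normals
        # step (ii): correlate via the Cholesky factor, scale by sd, shift to the centre
        draws.append([centre[i] + sd * sum(chol[i][k] * z[k] for k in range(i + 1))
                      for i in range(n)])
    return draws

# Hypothetical rescaled data for N = 2 sites (not values from the paper).
data = [[2.1, 3.0], [3.4, 4.2], [4.0, 5.1], [5.2, 5.9]]
corr = [[1.0, 0.7], [0.7, 1.0]]
sims = smoothed_bootstrap(data, corr, sd=0.15, n_draws=2000, rng=random.Random(1))
```

Calling the same function with sd=0.0 returns only the original data vectors, which is a convenient check on an implementation before the kernel width is increased.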
This process is a smoothed bootstrap technique because the simulated data can be near or far from the data points, depending on the extent to which the parameter specification results in the probability density being concentrated around the data points. In the limit as the standard deviations tend to zero, the simulation process reverts to classical bootstrap resampling of the multivariate data set. In this way the user can determine the extent to which their simulations are constrained by the data record.

EXAMPLE

The simulation method described above is general and can be applied to any data set of correlated values, which need not necessarily arise from an observation process. This point is illustrated in the following example, which simulates cumulative annual inflow plots for Lakes Tekapo (88 km^2) and Pukaki (169 km^2), located in the South Island of New Zealand. These two lakes provide most of New Zealand's hydroelectric storage and their adjacent location results in a considerable degree of correlation in their seasonal water inflows.

The simulations were not intended to capture the fine structure of inflow variations, so the base data used were simply cumulative lake inflow volumes at the end of each month. The simulation method could in principle be applied to generate correlated sets of the 24 inflow values for the two lakes for each year of simulation. However, a degree of reduction in dimensionality is desirable, if only to avoid the tedious process of inspecting 276 2-variate planes. To this end, the following equations were employed to give 3-parameter approximations to the respective 12-value inflow plots per year:

$$Pc_{im} = w_{1i}\, m^{1.23} + w_{2i}\, m^{1.92} + w_{3i}\, m^{3.14} \qquad (1)$$

$$Tc_{im} = \omega_{1i}\, m^{1.11} + \omega_{2i}\, m^{1.93} + \omega_{3i}\, m^{3.86} \qquad (2)$$

which respectively give, for Lakes Pukaki (P) and Tekapo (T), the cumulative inflow volume at the end of the mth month of the ith year of record.
These expressions are in the form of weighted additive power functions, with each year requiring three weight values, symbolised as $w$ for Lake Pukaki and by a second symbol, written here as $\omega$, for Lake Tekapo, to emphasise that the numerical weight values are different for the respective lakes. These 3-parameter functions give a reasonable approximation to the monthly inflows. Two example fits for Lake Pukaki are shown in Figure 1, and the observed and fitted values for all months of record are shown in Figure 2.

Figure 1. Example fitted function (1) to Lake Pukaki inflows for 1992 and 1993. [Axes: cumulative inflow (10^8 m^3) against fraction of year.]

Figure 2. All observed and fitted monthly cumulative inflows for (a) Lake Pukaki (1926-2000) and (b) Lake Tekapo (1940-2000). Fitted functions are eqs. (1) and (2) respectively. Inflow volumes are in units of 10^8 cubic metres. [Axes: values from fitted function against observed monthly cumulative inflows.]

The weighted power expressions (1) and (2) are not unique in any way and no doubt numerous other functions could describe the data at least as well. Nor is the goodness of fit shown in Figure 2 of any particular surprise, because those plots represent multiple sets of 12 variables being approximated by 3-parameter functions. The functions are simply a utilitarian means of reducing the dimensionality of the simulation process from 24 correlated inflow variables to 6 correlated weight variables.

The simulation process in this case generates random (scaled) weight variables via suitably parameterised 6-variate normal distributions as described earlier. For a given realization the resulting six simulated correlated variables are then rescaled back to serve as the weight values in eqs. (1) and (2), defining a single realization of a correlated pair of annual inflow hydrographs.
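The step from simulated standardised weights to an inflow curve can be sketched as follows. This Python fragment is illustrative only: every number in it is a hypothetical placeholder rather than a fitted value, the exponents are arbitrary, and it assumes that m is expressed as the fraction of the year elapsed at each month end. It also assumes that "rescaling back" means multiplying each standardised weight by the standard deviation of the corresponding fitted weights, which is the inverse of rescaling to unit standard deviation.

```python
def rescale_weights(scaled, sds):
    """Undo the unit-standard-deviation scaling: multiply each weight by its original sd."""
    return [s * sd for s, sd in zip(scaled, sds)]

def cumulative_inflow(weights, exponents, m):
    """Weighted additive power function: sum over k of w_k * m**p_k."""
    return sum(w * m ** p for w, p in zip(weights, exponents))

# All numbers below are hypothetical placeholders, not fitted values from the paper.
exponents = [1.0, 2.0, 3.0]    # assumed exponents for one lake
weight_sds = [4.0, 2.0, 0.5]   # assumed standard deviations of the fitted weights
scaled = [5.0, -3.0, 4.0]      # one simulated set of standardised weights

weights = rescale_weights(scaled, weight_sds)
# Assumption: m is the fraction of the year elapsed at the end of each month.
curve = [cumulative_inflow(weights, exponents, month / 12) for month in range(1, 13)]
```

Applying the same two functions to the three Pukaki weights and the three Tekapo weights of one simulated 6-vector would give one correlated pair of annual curves.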
For the purposes of this paper a simple multivariate parameterisation was adopted, with all the component 6-variate normal distributions having a common standard deviation of 0.15 and a common correlation matrix (Table 1). Assignment of the 15 correlation coefficient values was made on the basis of inspection of the 15 possible 2-variate plots of the standardised weight values. Where a higher degree of correlation of the weight data plots was evident, a higher correlation coefficient was assigned to extend the probability density along the 1.0 gradient, corresponding to a situation of narrower bivariate confidence ellipses for the 2-variate plane concerned. More broadly scattered data points suggest a more diffuse probability density, and lower correlation coefficients were assigned. Figures 3 and 4 respectively illustrate the effect of the application of higher and lower correlation coefficients. Pukaki weight values from 1940 onwards were employed to ensure no missing values in the utilised multivariate weight set.

Table 1. Common correlation matrix of the three standardised weights for Lakes Pukaki (P) and Tekapo (T), which was applied to all the multivariate normal distributions used in the simulations.

        PW1     PW2     PW3     TW1     TW2     TW3
PW1     1      -0.85    0.70    0.70   -0.70    0.60
PW2             1      -0.85   -0.70    0.70   -0.60
PW3                     1       0.70   -0.70    0.70
TW1                             1      -0.85    0.70
TW2                                     1      -0.85
TW3                                             1

The patterns of simulated weight values shown in Figures 3 and 4 represent two of the 15 possible 2-variate projections of 5,000 simulated sets of 6 correlated weight values. The data display and simulation process was carried out using MATLAB code, utilising the mvnrnd function in the Statistics Toolbox to generate the multivariate normal random variables. The simulated data show a tendency to cluster around the actual weight values, indicating that the assigned 0.15 common standard deviation value gives a fairly strong emphasis to the weights originally obtained in the fitting process. The next refinement might be to assign higher standard deviations to the outlying points to give a more even spread of probability density through to the main mass of the data values.

Figure 3. Standardised actual (black) and simulated (grey) weight values in the PW1 TW1 plane, showing the elongation effect of the 0.7 correlation coefficient (Table 1). [Axes: Lake Tekapo Weight 1 against Lake Pukaki Weight 1.]

Figure 4. Standardised actual (black) and simulated (grey) weight values in the PW1 TW3 plane, showing the dispersive effect of the 0.6 correlation coefficient (Table 1). [Axes: Lake Tekapo Weight 3 against Lake Pukaki Weight 1.]

Figure 5 shows just four of the 5,000 pairs of simulated correlated cumulative inflow plots. It is interesting to note how the random variation from the multivariate normal distributions can sometimes generate a similar pair of inflow curves for the two lakes, while markedly different curves arise in other simulated years. Further analysis of the simulated cumulative inflows could be carried out to establish joint probabilities of low inflows to both lakes as part of hydro power risk analysis. However, this is not pursued further here.

Figure 5. Selected pairs of simulated correlated cumulative inflow curves for Lakes Pukaki (upper curves) and Tekapo (lower curves). Inflow volumes are in units of 10^8 m^3. [Axes: cumulative inflow against fraction of year.]

ANNUAL FLOW DURATION CURVES

The above methodology of simulating annual cumulative discharge curves can be extended to annual flow duration curves. This would give a nonparametric alternative to deriving annual flow duration variability using parametric distributions of river discharge variations [1].
Allowance needs to be made for the minimum point on the curve being the annual discharge minimum rather than zero, so an additive weighted power function approximation to the annual flow duration curve could be written:

$$Q_P = w_1 + w_2 P^{a} + w_3 P^{b} + w_4 P^{c} + \dots \qquad (3)$$

where P is the proportion of time that flow is less than the discharge $Q_P$, and the w values are correlated weights. The annual minimum and maximum are respectively the first weight (P = 0) and the sum of all weights (P = 1), so the annual maximum is now obtained from a multivariate realization.

CONCLUSION

It might be argued that simulating correlated hydrological variables using the smoothed bootstrap approach is weakened by the subjective allocation of the multivariate normal standard deviations and correlation coefficients. However, subjectivity is unavoidable in any data-fitting process involving distribution selection. For example, copulas require subjective specification of the marginal distributions, and the inevitable error introduced in this selection cannot be offset by the application of formal estimation of parameter values. The selection of the utilised copula is also itself a subjective decision, guided by how well the simulated points appear to fall among the data points in the various projection planes.

The smoothed bootstrap method would therefore appear to be no better or worse in this respect than other means of simulating multivariate hydrological variables. However, it does have a significant advantage in terms of flexibility and ease of use, and therefore has potential as a practical tool for application in a wide range of water resource simulations in a correlated multivariate context.

REFERENCES

[1] Castellarin A., Vogel R. M. and Brath A., "A stochastic index flow model of flow duration curves", Water Resour. Res., Vol. 40, (2004), W03104, doi: 10.1029/2003WR002524.
[2] Favre A-C., El Adlouni S., Perreault L., Thiémonge N. and Bobée B., "Multivariate hydrological frequency analysis using copulas", Water Resour. Res., Vol. 40, (2004), W01101, doi: 10.1029/2003WR002456.
[3] Kotz S., Balakrishnan N. and Johnson N. L., "Continuous Multivariate Distributions Volume 1", 2nd edition, Wiley, (2000).
[4] Scott D.W., "Multivariate Density Estimation", 1st edition, Wiley, (1992).