Quantification and Reduction of Predictive Uncertainty for Sustainable Water Resources Management (Proceedings of Symposium HS2004 at IUGG2007, Perugia, July 2007). IAHS Publ. 313, 2007.

Adjusting ensemble forecast probabilities to reflect several climate forecasts

JERY R. STEDINGER1 & YOUNG-OH KIM2
1 School of Civil and Environmental Engineering, Cornell University, Hollister Hall, Ithaca, New York 14853-3501, USA [email protected]
2 School of Civil, Urban, and Geosystem Engineering, Seoul National University, Seoul 151-742, Korea

Abstract An activity of growing importance is the use of forecast information to update meteorological and hydrological series and their associated probabilities so as to describe the distribution of future events of interest. A simple and flexible pdf-ratio method generates a consistent and smooth set of probabilities for climate series across the entire range of the key variable reflecting the change in the likelihood of each individual climate series. This paper addresses the use of the pdf-ratio method with several forecasts, which could represent different forecast periods, different variables, or different basins. Examples demonstrate that if separate and independent adjustments are adopted to capture the conditional probabilities of different variables, such as temperature, precipitation, or seasonal flow for different forecast periods or different basins, then the resulting joint distribution of such variables can be grossly distorted.

Key words ensemble forecasting; forecast information; multivariate forecast; pdf-ratio method; probability adjustment; scenarios

INTRODUCTION

A recent National Research Council report stressed the need to reflect forecast uncertainty in weather forecasts (NRC, 2006, Summary): “Uncertainty is thus a fundamental characteristic of weather, seasonal climate, and hydrologic prediction, and no forecast is complete without a description of uncertainty”.
The report's authors observe that the US National Weather Service (NWS) often uses weather and climate ensembles to evaluate and represent the weather uncertainty in its forecasts. The generation and use of ensemble forecasts is indeed an active research topic in meteorology and hydrology. At the National Center for Environmental Prediction, the ensemble approach has been applied operationally with both medium- and extended-range forecasts (Tracton & Kalnay, 1993; Toth & Kalnay, 1993). Ensemble streamflow prediction currently serves as a key component of the 21st Century Advanced Hydrologic Prediction System for the US National Weather Service (Schaake et al., 2004).

Here we consider a situation wherein an analysis has resulted in a specification of the conditional distribution of some selected set of climate variables X, which we denote as X = g(v), where a set of historical climate series {vi} is available that contains daily or weekly values of climate variables, at perhaps several sites. A forecast variable H may not be explicitly specified, but we are given a distribution D[g(v)|H]. The challenge is then to adjust the probabilities initially assigned to each series vi so that the climate series/probability pairs {(vi, qi)} together provide a good representation of D[g(v)|H].

Copyright © 2007 IAHS Press

Recently Croley (2000, 2003) and Wilks (2002) addressed this problem wherein D[g(v)|H] is summarized by the probability that each of the selected set of variables g(v) falls in three specific intervals. Their algorithms adjust the probabilities assigned to the different series so as to achieve the target probabilities, using the values of the selected variables g(vi) only to the extent that they determine whether a given climate series vi is in the below-normal, normal, or above-normal range for g(vi).
This approach can be considered a block adjustment because the probabilities assigned to all series in the same block or category are the same. Stedinger & Kim (2002) discussed the disadvantages and problems associated with the Croley-Wilks block-adjustment method wherein the resulting discrete distributions achieve the target interval probabilities, but may provide a poor description of the continuous conditional distributions of interest. Croley (2003) considered reproduction of a mean and variance, or other statistics, rather than three probabilities.

More generally, Stedinger & Kim (2002) proposed a simple and general approach called the pdf-ratio method. It makes use of the entire D[g(v)|H] distribution, as well as the individual values of the selected variables g(vi), to better approximate the entire D[g(v)|H] distribution, rather than just reproducing the three interval probabilities. Their examples showed that the pdf-ratio method generally provides an adequate description of the target conditional distribution, provided the initial sample is large enough and thus covers a sufficient range with an adequate density to support a good description of the desired conditional distribution. However, the examples provided in Stedinger & Kim (2002) were limited to a univariate forecast variable.

This study illustrates use of the pdf-ratio method with several forecasts reflecting different forecast periods or different basins. The examples also illustrate the problems that can arise if a reasonable description of the joint distribution of the forecasts is not adopted.

THEORY OF THE PDF-RATIO METHOD

Consider a set of historical climate series or scenarios {vi} for i = 1, 2,…, N. To each is assigned a value xi = g(vi) which reflects the general character of the more detailed and complete series. Here xi may be the monthly flow, mean temperature, seasonal rainfall depth, or the principal component score for several such variables (Wilks, 2006, Chapter 11).
One also has quantiles xb and xa such that F(xb) = F(xa) – F(xb) = 1 – F(xa) = 0.333 where F(x) is a cumulative distribution function, and xb and xa are the lower and upper terciles, respectively, defining below-normal and above-normal ranges. If there is no additional information, one usually assigns an equal weight, 1/N, to each climate series. Now for the terciles xa and xb, additional climate information is given as F(xb) = pb, F(xa) – F(xb) = pn, and 1 – F(xa) = pa where pb + pn + pa = 1. The three interval probabilities, pb, pn, and pa are called the below-normal, normal, and above-normal probabilities, respectively. To reflect this climate information, one wishes to update the prior probabilities {1/N} on the {xi} to new values {qi}. This is the problem addressed by Croley (2000, 2003) and Wilks (2002).

Stedinger & Kim (2002) viewed the problem as shifting an empirical distribution that represents a continuous initial pdf (probability density function) f0(x) corresponding to the distribution D0 to one that represents a new pdf f1(x) corresponding to distribution D1. The purpose of these new probabilities is to allow for the computation of the expectation of any function G(X) conditioned on forecast information denoted as H:

$$E\{G(X) \mid H\} = \int G(x)\, f_1(x)\, dx = \int G(x)\, \frac{f_1(x)}{f_0(x)}\, f_0(x)\, dx \tag{1}$$

Appropriate revised probabilities for each xi corresponding to vi are simply,

$$q_i = \frac{f_1(x_i)}{f_0(x_i)} \cdot \frac{1}{N} \tag{2}$$

Clearly there will be numerical implications of this substitution when N is finite: the qi from equation (2) will not in general sum exactly to one, as probabilities must, so they are rescaled by their sum. Stedinger & Kim (2002) called this simple method, with normalization, the pdf-ratio method, and showed how it could be derived as a Bayesian updating scheme following Kelman et al. (1990).
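The univariate adjustment in equation (2), followed by the normalization step, can be sketched in a few lines. This is only an illustrative sketch: the nine sample values, the choice of normal densities for f0 and f1, and the names `normal_pdf` and `pdf_ratio_weights` are assumptions made for the example and do not come from the paper.

```python
import math

def normal_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and std dev sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def pdf_ratio_weights(x, f0, f1):
    """Equation (2), then rescaling so the probabilities sum to one."""
    n = len(x)
    raw = [f1(xi) / f0(xi) / n for xi in x]
    total = sum(raw)
    return [r / total for r in raw]

# Hypothetical sample of N = 9 values x_i = g(v_i); its plain mean is 3.0.
x = [1.2, 1.9, 2.4, 2.8, 3.0, 3.2, 3.6, 4.1, 4.8]

# Assumed initial and target densities (illustrative only).
f0 = lambda v: normal_pdf(v, 3.0, 1.0)
f1 = lambda v: normal_pdf(v, 2.5, 0.5)

q = pdf_ratio_weights(x, f0, f1)

# Weighted estimate of E{G(X) | H} for G(x) = x, in the spirit of equation (1).
mean_q = sum(qi * xi for qi, xi in zip(q, x))
print([round(qi, 3) for qi in q], round(mean_q, 3))
```

Any other G(x) can be substituted in the last weighted sum. With a sample this small the weighted moments only roughly match the target, which is consistent with the requirement that the initial sample be large enough to cover the target distribution.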
MULTIVARIATE PDF-RATIO ADJUSTMENT

As illustrated by examples in Croley (2000), the multivariate case is very important because watershed dynamics depend on the values of precipitation, temperature, and other climate variables in different time periods, resulting in a problem of potentially large dimensions. Use of multiple simultaneous forecasts poses little significant conceptual challenge to the pdf-ratio method, provided adequate descriptions of f0(x) and f1(x) can be constructed. Generally f0(x) is not the problem because the available climate series {vi} provide a multivariate sample for X with which a representation of f0(x) can be developed. However, f1(x) can be problematic, depending upon how the forecasting information is communicated. If only probability triplets are provided separately for different variables, then little or no forecast information describes the joint distribution of the components of X.

Let the true initial, the adopted initial, and the adopted target distributions of our vector of random variables be f0(x), f0A(x), and f1A(x), respectively, which we take to be multivariate normal with mean vectors and covariance matrices denoted μ* and Σ*, where * stands for the corresponding subscript (0, 0A, or 1A). If one normalizes by choice of the constant k so that the resulting pdf:

$$f_{1R}(x) = k\, \frac{f_{1A}(x)}{f_{0A}(x)}\, f_0(x) \tag{3}$$

is a legitimate density function, it is easy to show that the resultant approximation f1R(x) of f1(x) is asymptotically multivariate normal with:

$$X \sim N[\mu_{1R}, \Sigma_{1R}] \tag{4}$$

where:

$$\Sigma_{1R}^{-1} = \Sigma_0^{-1} + \Sigma_{1A}^{-1} - \Sigma_{0A}^{-1}, \qquad \Sigma_{1R}^{-1}\mu_{1R} = \Sigma_0^{-1}\mu_0 + \Sigma_{1A}^{-1}\mu_{1A} - \Sigma_{0A}^{-1}\mu_{0A} \tag{5}$$

To illustrate the distortions that occur if Σ0A or Σ1A is misspecified, we now consider the case with two variates that have a joint normal distribution. For simplicity we assume that for both variables their initial and final variances are correctly specified: σ0 = σ0A and σ1 = σ1A.
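Equation (5) is straightforward to evaluate numerically for the two-variate case. The sketch below is an illustration under assumed inputs, not the authors' code: it takes a true initial cross-correlation of 0.8, equal means of 3, σ0 = 1, σ1 = 0.25, and adopted initial and target covariance matrices that are diagonal (no cross-correlation). The helper names are hypothetical.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as ((a, b), (c, d))."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return ((d / det, -b / det), (-c / det, a / det))

def matvec(m, v):
    """Product of a 2x2 matrix and a length-2 vector."""
    return tuple(m[i][0] * v[0] + m[i][1] * v[1] for i in range(2))

def adjusted_moments(mu0, s0, mu0a, s0a, mu1a, s1a):
    """Equation (5): mean vector and covariance matrix of f1R."""
    p0, p0a, p1a = inv2(s0), inv2(s0a), inv2(s1a)
    # Inverse of Sigma_1R; it must be positive definite for f1R to be a density.
    p1r = tuple(
        tuple(p0[i][j] + p1a[i][j] - p0a[i][j] for j in range(2))
        for i in range(2)
    )
    s1r = inv2(p1r)
    a, b, c = matvec(p0, mu0), matvec(p1a, mu1a), matvec(p0a, mu0a)
    rhs = tuple(a[i] + b[i] - c[i] for i in range(2))
    return matvec(s1r, rhs), s1r

def correlation(s):
    """Cross-correlation implied by a 2x2 covariance matrix."""
    return s[0][1] / (s[0][0] * s[1][1]) ** 0.5

# Assumed illustrative values (not prescribed by the method itself).
rho0, sigma0, sigma1 = 0.8, 1.0, 0.25
mu = (3.0, 3.0)
sigma_0 = ((sigma0**2, rho0 * sigma0**2), (rho0 * sigma0**2, sigma0**2))
sigma_0a = ((sigma0**2, 0.0), (0.0, sigma0**2))   # adopted initial, diagonal
sigma_1a = ((sigma1**2, 0.0), (0.0, sigma1**2))   # adopted target, diagonal

mu_1r, sigma_1r = adjusted_moments(mu, sigma_0, mu, sigma_0a, mu, sigma_1a)
rho_1r = correlation(sigma_1r)
print(mu_1r, round(rho_1r, 3))
```

Under these assumptions the resulting cross-correlation is about 0.125, far below the true value of 0.8, while the mean is unchanged because all three mean vectors were taken to be equal.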
The three remaining parameters are the initial true cross-correlation between the two variables ρ0, the initial cross-correlation assumed in the approximation ρ0A, and the cross-correlation assumed for the conditional distribution ρ1A. A concern is that some researchers have implicitly or explicitly assumed that variables are independent so that ρ0A = ρ1A = 0, and have hoped that an adjustment like the pdf-ratio will still yield reasonable results. The analysis here demonstrates that a lot can go wrong.

With the assumption that ρ0A = ρ1A = 0, and that the two initial and two conditional variances are equal to σ0² and σ1², the diagonal elements of Σ0A and Σ1A, respectively, equation (5) yields (Stedinger & Kim, 2007):

$$\rho_{1R} = \frac{\rho_0}{1 - (1 - \rho_0^2)(\sigma_1^2 - \sigma_0^2)/\sigma_1^2} \tag{6}$$

Figure 1 provides a plot of this relationship further assuming for simplicity, but no loss of generality, that σ0 = σ0A = 1. Given an initial cross-correlation ρ0, and adopting ρ0A = ρ1A = 0, we can compute the asymptotic cross-correlation ρ1R of the two adjusted variables. Suppose for our exploration we assume that ρ1 = ρ0; that is, forecast information may change the variances of the two variables equally, but it will not affect their cross-correlation. Then Fig. 1 illustrates how the degree of the distortion in the posterior cross-correlation depends on the forecast accuracy. Suppose that ρ0 = 0.80; then if we seek to increase the variance of the variables so that σ1/σ0 = 1.2, the cross-correlation of the resultant variables will increase from 0.80 to 0.90. On the other hand, if we seek to reduce the variance so that σ1/σ0 = 0.25, then the resultant cross-correlation drops to 0.13, which is essentially zero. When the forecast is precise, i.e.
σ1/σ0 = 0.25, the approximate cross-correlation is very much less than the target cross-correlation, ρ1 = ρ0, whereas increasing the variance of the variables increases their cross-correlation from the initial value, which was thought to be the final value as well, ρ1 = ρ0. Figures 2 and 3 provide examples that illustrate why this occurs.

[Figure 1: plot of ρ1R (vertical axis, 0.0 to 1.0) against the true initial cross-correlation ρ0 (horizontal axis, 0 to 1), with curves for σ1 = 0.25, 0.50, 0.75, 0.90 and 1.20.]

Fig. 1 Distortions in the posterior cross-correlation caused by the assumption of independence ρ1A = ρ0A = 0. The graph shows the cross-correlation ρ1R that results from an adjustment assuming independence as a function of the true initial cross-correlation ρ0, for different values of σ1 when σ0 = 1; the dotted line corresponds to ρ1R = ρ0.

[Figure 2: two scatter plots of Yi against Xi (both axes 0 to 6). Upper panel: "Assuming Series are Independent". Lower panel: "Assuming Series have Cross-correlation = 0.80".]

Fig. 2 The pdf ratios calculated for a bivariate normal distribution with {μ0x, μ0y} = {μ1x, μ1y} = {3,3} and initial {σ0x, σ0y} = {1,1}, versus target {σ1x, σ1y} = {1.5,1.5}. The area of a circle is proportional to the probability of each point. Lower figure adopts correct assumption that ρ1 = ρ0 = 0.8.

Figures 2 and 3 consider a situation wherein one has two variables X and Y that have a joint normal distribution with a cross-correlation of 0.80. The actual sample values for both of the variables are the constructed sets employed earlier in Stedinger & Kim (2002), and those discrete values have been paired following a joint normal distribution to obtain a sample that has a cross-correlation of 0.80. Figure 2 shows how the pdf-ratio changes the probabilities assigned to the points so as to accommodate a forecast that increased the standard deviation from 1 to 1.5, while the mean remains unchanged.
Figure 3 shows how the pdf-ratio changes the probabilities if one seeks to decrease the mean from 3 to 2.5 (resembling a drought) while decreasing the standard deviation from 1 to 0.5. The figures show the adjusted probabilities when one assumes that the variables are initially and conditionally independent, ρ0A = ρ1A = 0, even though that is false because ρ0 = 0.8; the figures also show the adjusted probabilities when the assumed cross-correlations match the true value: ρ0 = ρ0A = ρ1A = 0.80.

[Figure 3: two scatter plots of Yi against Xi (both axes 0 to 6). Upper panel: "Assuming Series are Independent". Lower panel: "Assuming Series have Cross-correlation = 0.80".]

Fig. 3 The pdf ratios calculated for a bivariate normal distribution with initially {μ0x, μ0y} = {3,3} and {σ0x, σ0y} = {1,1}, versus {μ1x, μ1y} = {2.5,2.5} and {σ1x, σ1y} = {0.5,0.5}. The area of a circle is proportional to the probability of each point. Lower figure adopts correct assumption that ρ1 = ρ0 = 0.8.

In Fig. 2, the differences between the two sets of probabilities employing ρ0A = ρ1A = 0, and ρ0 = ρ0A = ρ1A = 0.80 are remarkable. If one expands the variance assuming independence, almost all of the probability shifts to the points with the largest values; the other points almost disappear in the figure. On the other hand, correctly reflecting the initial and conditional cross-correlation of 0.80 between the variables results in probabilities that yield a cross-correlation of 0.78 in this instance, using relatively similar weights on a wide range of the points. The two representations of possible future events are dramatically different.

In the second case, displayed in Fig. 3, we seek to describe a potential drought situation in what may be two large basins which conditionally have a decreased mean flow of 2.5 and a standard deviation of 0.5.
In this case, assuming independence places almost all the probability symmetrically on a small circular cluster of points, as is appropriate if x and y are independent. On the other hand, correctly including the cross-correlation in the adjustment results in a set of probabilities that sweep across the data set, yielding a cross-correlation of 0.80 for the discrete approximation. These characteristics of the solution become even more extreme if the standard deviations are to be decreased to 0.25 with a mean of 2.0; then incorrectly assuming independence resulted in a cross-correlation of –0.06 (essentially zero), whereas correctly assuming a cross-correlation of 0.80 yielded an approximation with a cross-correlation of 0.75.

CONCLUSIONS

The pdf-ratio method is reasonably simple and flexible. It can be used with different univariate and multivariate families of distributions describing the initial and target distributions for climate variables. It deals with the multivariate forecast case without difficulty. Examples clearly demonstrate the need in the multivariate forecast case to be careful in the specification of the initial and target conditional multivariate distributions of the climate variables. We conclude that if separate and independent adjustments are adopted to capture the conditional probabilities of different variables for different forecast periods or different basins, as was done in Croley (2000, pp. 110–112), then the resulting joint distribution of such variables can be grossly distorted.

REFERENCES

Croley II, T. E. (2000) Using Meteorology Probability Forecasts in Operational Hydrology. ASCE Press, Reston, Virginia, USA.
Croley II, T. E. (2003) Weighted-climate parametric hydrologic forecasting. J. Hydrol. Engng 8(4), 171–180.
Kelman, J., Stedinger, J. R., Cooper, L. A., Hsu, E. & Yuan, S. (1990) Sampling stochastic dynamic programming applied to reservoir operation. Water Resour. Res. 26(3), 447–454.
National Research Council (2006) Completing The Forecast: Characterizing and Communicating Uncertainty for Better Decisions Using Weather and Climate Forecasts. National Academy Press, Washington DC, USA.
Schaake, J. C., Hartman, R., Demargne, J., Mullusky, M., Welles, E., Wu, L. & Fan, X. (2004) Ensemble streamflow prediction by the National Weather Service (NWS) Advanced Hydrologic Prediction Services (AHPS). In: 2004 Joint Assembly of AGU, CGU, SEG, and EEGS. American Geophysical Union, Montreal, Canada.
Stedinger, J. R. & Kim, Y.-O. (2002) Updating ensemble probabilities based on climate forecast. In: Proceedings 2002 Conference on Water Resources Planning and Management (ed. by D. F. Kibler), paper C-2-109. ASCE, Reston, Virginia, USA.
Stedinger, J. R. & Kim, Y.-O. (2007) Probabilities for ensemble forecasts reflecting climate information. Water Resour. Res. (submitted).
Toth, Z. & Kalnay, E. (1993) Ensemble forecasting at NMC: The generation of perturbations. Bull. Am. Met. Soc. 74, 2317–2330.
Tracton, M. S. & Kalnay, E. (1993) Operational ensemble prediction at the National Meteorological Center: practical aspects. Weather Forecast 8, 378–398.
Wilks, D. S. (2002) Realizations of daily weather in forecast seasonal climate. J. Hydromet. 3(2), 195–207.
Wilks, D. S. (2006) Statistical Methods in the Atmospheric Sciences, second edn. Academic Press/Elsevier, Burlington, Massachusetts, USA.