Quantification and Reduction of Predictive Uncertainty for Sustainable Water Resources Management
(Proceedings of Symposium HS2004 at IUGG2007, Perugia, July 2007). IAHS Publ. 313, 2007.
Adjusting ensemble forecast probabilities to
reflect several climate forecasts
JERY R. STEDINGER¹ & YOUNG-OH KIM²
¹ School of Civil and Environmental Engineering, Cornell University, Hollister Hall, Ithaca,
New York 14853-3501, USA
² School of Civil, Urban, and Geosystem Engineering, Seoul National University, Seoul 151-742,
Korea
Abstract An activity of growing importance is the use of forecast information
to update meteorological and hydrological series and their associated
probabilities so as to describe the distribution of future events of interest. A
simple and flexible pdf-ratio method generates a consistent and smooth set of
probabilities for climate series across the entire range of the key variable,
reflecting the change in the likelihood of each individual climate series. This
paper addresses the use of the pdf-ratio method with several forecasts, which
could represent different forecast periods, different variables, or different
basins. Examples demonstrate that if separate and independent adjustments are
adopted to capture the conditional probabilities of different variables, such as
temperature, precipitation, or seasonal flow for different forecast periods or
different basins, then the resulting joint distribution of such variables can be
grossly distorted.
Key words ensemble forecasting; forecast information; multivariate forecast; pdf-ratio method;
probability adjustment; scenarios
INTRODUCTION
A recent National Research Council report stressed the need to reflect forecast
uncertainty in weather forecasts (NRC, 2006, Summary): “Uncertainty is thus a
fundamental characteristic of weather, seasonal climate, and hydrologic prediction,
and no forecast is complete without a description of uncertainty”. The authors observe
that weather and climate ensembles are often how the US National Weather Service (NWS)
evaluates and represents the uncertainty in its forecasts.
The generation and use of ensemble forecasts is indeed an active research topic in
meteorology and hydrology. At the National Center for Environmental Prediction, the
ensemble approach has been applied operationally to both medium- and extended-range
forecasts (Tracton & Kalnay, 1993; Toth & Kalnay, 1993). Ensemble streamflow prediction
currently serves as a key component of the 21st Century Advanced Hydrologic Prediction
System for the US National Weather Service (Schaake et al., 2004).
Here we consider a situation in which a set of historical climate series {vi} is available,
containing daily or weekly values of climate variables at perhaps several sites, and an
analysis has specified the conditional distribution of some selected set of climate
variables X, which we denote as X = g(v). The forecast variable H may not be explicitly
specified, but we are given a distribution D[g(v)|H]. The challenge is then to adjust the
probabilities initially assigned to each series vi so that the climate series/probability
pairs {(vi, qi)} together provide a good representation of D[g(v)|H].

Copyright © 2007 IAHS Press
Recently Croley (2000, 2003) and Wilks (2002) addressed this problem wherein
D[g(v)|H] is summarized by the probability that each of the selected set of variables
g(v) falls in three specific intervals. Their algorithms adjust the probabilities assigned
to the different series so as to achieve the target probabilities using the values of the
selected variables g(vi) only to the extent that they determine whether a given climate
series vi is in the below-normal, normal, or above-normal range for g(vi). This
approach can be considered as a block adjustment because the probabilities assigned to
all series in the same block or category are the same.
Stedinger & Kim (2002) discussed the disadvantages and problems associated with
the Croley-Wilks block-adjustment method wherein the resulting discrete distributions
achieve the target interval probabilities, but may provide a poor description of the
continuous conditional distributions of interest. Croley (2003) considered reproduction
of a mean and variance, or other statistics, rather than three probabilities. More
generally, Stedinger & Kim (2002) proposed a simple and general approach called the
pdf-ratio method. It uses the entire D[g(v)|H] distribution, together with the individual
values g(vi) of the selected variables, to approximate that whole distribution rather than
just reproduce the three interval probabilities.
Their examples showed that the pdf-ratio method generally provides an adequate
description of the target conditional distribution, provided the initial sample is large
enough and thus covers a sufficient range with an adequate density to support a good
description of the desired conditional distribution. However, the examples provided in
Stedinger & Kim (2002) were limited to a univariate forecast variable. This study
illustrates the use of the pdf-ratio method with several forecasts reflecting different
forecast periods or different basins. The examples also illustrate the problems that can
arise if a reasonable description of the joint distribution of the forecasts is not adopted.
THEORY OF THE PDF-RATIO METHOD
Consider a set of historical climate series or scenarios {vi} for i = 1, 2,…, N. To each is
assigned a value xi = g(vi) which reflects the general character of the more detailed and
complete series. Here xi may be the monthly flow, mean temperature, seasonal rainfall
depth, or the principal component score for several such variables (Wilks, 2006,
Chapter 11). One also has quantiles xb and xa such that F(xb) = F(xa) – F(xb) =
1 – F(xa) = 1/3, where F(x) is the initial cumulative distribution function, and xb and xa
are the lower and upper terciles, respectively, defining the below-normal and above-normal
ranges. If there is no additional information, one usually assigns an equal weight, 1/N,
to each climate series. Now suppose that additional climate information specifies new
probabilities for the same three intervals: Pr[X ≤ xb] = pb, Pr[xb < X ≤ xa] = pn, and
Pr[X > xa] = pa, where pb + pn + pa = 1. The three interval probabilities pb, pn, and pa
are called the below-normal, normal, and above-normal probabilities, respectively. To
reflect this climate information, one wishes to update the prior probabilities {1/N} on
the {xi} to new values {qi}. This is the problem addressed by Croley (2000, 2003) and
Wilks (2002).
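The block adjustment of Croley (2000, 2003) and Wilks (2002) can be sketched in a few lines of Python. Because every series in a category receives the same probability, the only way to hit the target interval probabilities exactly is qi = pk/nk, where nk is the number of series falling in category k. This is an illustrative reconstruction under that assumption, not the authors' published algorithms; the function name `block_adjust` and the use of sample terciles computed with `statistics.quantiles` are choices made here for illustration.

```python
from statistics import quantiles


def block_adjust(x, pb, pn, pa):
    """Hypothetical block (tercile) adjustment: all series in a category share
    one probability, chosen so each category total equals its target."""
    if abs(pb + pn + pa - 1.0) > 1e-9:
        raise ValueError("pb + pn + pa must equal 1")
    # Assumption: the terciles xb, xa are taken from the sample x_i = g(v_i) itself.
    xb, xa = quantiles(x, n=3, method="inclusive")
    cats = ["b" if xi <= xb else "n" if xi <= xa else "a" for xi in x]
    counts = {c: cats.count(c) for c in "bna"}
    target = {"b": pb, "n": pn, "a": pa}
    for c in "bna":
        if counts[c] == 0 and target[c] > 0:
            raise ValueError(f"category {c!r} is empty but has a positive target probability")
    # q_i = p_k / n_k: identical weights within each block.
    return [target[c] / counts[c] for c in cats]


# Usage: nine equally likely series, forecast shifted toward below-normal.
x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
q = block_adjust(x, pb=0.5, pn=0.3, pa=0.2)
```

The weights change in steps at the tercile boundaries: series just below and just above xb can receive very different probabilities, which is the lack of smoothness the pdf-ratio method is meant to avoid.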
Stedinger & Kim (2002) viewed the problem as shifting an empirical distribution that
represents a continuous initial pdf (probability density function) f0(x), corresponding to
the distribution D0, to one that represents a new pdf f1(x), corresponding to the
distribution D1. The purpose of these new probabilities is to allow computation of the
expectation of any function G(X) conditioned on the forecast information, denoted H:
$$E[G(X) \mid H] = \int G(x)\, f_1(x)\, dx = \int G(x) \left\{ \frac{f_1(x)}{f_0(x)} \right\} f_0(x)\, dx \tag{1}$$

Appropriate revised probabilities for each xi corresponding to vi are simply

$$q_i = \left\{ \frac{f_1(x_i)}{f_0(x_i)} \right\} \frac{1}{N} \tag{2}$$
When N is finite, the qi given by equation (2) will not in general sum exactly to one, so
they are rescaled to do so: qi = [f1(xi)/f0(xi)] / Σj [f1(xj)/f0(xj)]. Stedinger & Kim (2002)
called this simple method, with normalization, the pdf-ratio method, and showed how
it could be derived as a Bayesian updating scheme following Kelman et al. (1990).
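Equations (1) and (2), with the normalization just described, can be sketched in Python using only the standard library. Purely for illustration, the sketch assumes that the initial distribution D0 is N(3, 1) and the forecast distribution D1 is N(2.5, 0.5), and it places the xi at equally spaced quantiles of D0 in place of real climate series; the function name `pdf_ratio_weights` is hypothetical.

```python
from statistics import NormalDist


def pdf_ratio_weights(x, f0, f1):
    """pdf-ratio method: q_i proportional to f1(x_i)/f0(x_i) (equation 2),
    rescaled so that the probabilities sum to one."""
    ratios = [f1(xi) / f0(xi) for xi in x]
    total = sum(ratios)
    return [r / total for r in ratios]


# Illustrative assumption: D0 = N(3, 1) and D1 = N(2.5, 0.5), with the x_i placed
# at equally spaced quantiles of D0 instead of real climate series.
initial = NormalDist(mu=3.0, sigma=1.0)
forecast = NormalDist(mu=2.5, sigma=0.5)
N = 500
x = [initial.inv_cdf((i - 0.5) / N) for i in range(1, N + 1)]
q = pdf_ratio_weights(x, initial.pdf, forecast.pdf)

# Discrete version of equation (1): E[G(X) | H] is approximated by sum_i q_i G(x_i).
mean_q = sum(qi * xi for qi, xi in zip(q, x))
sd_q = sum(qi * (xi - mean_q) ** 2 for qi, xi in zip(q, x)) ** 0.5
print(f"adjusted mean {mean_q:.3f}, adjusted standard deviation {sd_q:.3f}")
```

With an initial sample that covers the target range densely, the weighted mean and standard deviation should come out close to the target values 2.5 and 0.5; a sparse or narrow initial sample would degrade this, as noted for the univariate examples of Stedinger & Kim (2002).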
MULTIVARIATE PDF-RATIO ADJUSTMENT
As illustrated by examples in Croley (2000), the multivariate case is very important
because watershed dynamics depend on the values of precipitation, temperature, and
other climate variables in different time periods, resulting in a problem of potentially
large dimensions. Use of multiple simultaneous forecasts poses little conceptual challenge
to the pdf-ratio method, provided adequate descriptions of f0(x)
and f1(x) can be constructed. Generally f0(x) is not the problem because the available
climate series {vi} provide a multivariate sample for X with which a representation of
f0(x) can be developed. However, f1(x) can be problematic, depending upon how the
forecasting information is communicated. If only probability triplets are provided
separately for different variables, then little or no forecast information describes the
joint distribution of the components of X.
Let the true initial, the adopted initial, and the adopted target distributions of our
vector of random variables be f0(x), f0A(x), and f1A(x), respectively, which we take to be
multivariate normal with mean vectors and covariance matrices denoted μ* and Σ*, where *
stands for 0, 0A, or 1A. If one normalizes, by choice of the constant k, so that the
resulting function

$$f_{1R}(x) = k \left[ \frac{f_{1A}(x)}{f_{0A}(x)} \right] f_0(x) \tag{3}$$

is a legitimate density function, it is easy to show that the resultant approximation
f1R(x) of f1(x) is asymptotically multivariate normal:

$$X \sim N[\mu_{1R}, \Sigma_{1R}] \tag{4}$$

where:

$$\begin{aligned} \Sigma_{1R}^{-1} &= \Sigma_0^{-1} + \Sigma_{1A}^{-1} - \Sigma_{0A}^{-1} \\ \Sigma_{1R}^{-1}\mu_{1R} &= \Sigma_0^{-1}\mu_0 + \Sigma_{1A}^{-1}\mu_{1A} - \Sigma_{0A}^{-1}\mu_{0A} \end{aligned} \tag{5}$$
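For the bivariate case used in the examples that follow, equation (5) can be evaluated directly. The standard-library Python sketch below (all function names are hypothetical) computes μ1R and Σ1R from the true initial distribution and the adopted initial and target distributions, and flags the case where Σ1R⁻¹ is not positive definite, in which f1R in equation (3) is not a proper density. The usage lines assume the Fig. 3 setting: means 3 and standard deviations 1 initially, a target with means 2.5 and standard deviations 0.5, and a true cross-correlation of 0.8.

```python
def inv2(m):
    """Inverse of a 2x2 matrix [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular matrix")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """2x2 matrix times a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]


def cov2(sd, rho):
    """Covariance matrix of two variables with common standard deviation sd and correlation rho."""
    return [[sd * sd, rho * sd * sd], [rho * sd * sd, sd * sd]]


def adjusted_normal(mu0, S0, mu0A, S0A, mu1A, S1A):
    """Mean mu_1R and covariance Sigma_1R of f1R according to equation (5)."""
    P0, P0A, P1A = inv2(S0), inv2(S0A), inv2(S1A)
    # Sigma_1R^{-1} = Sigma_0^{-1} + Sigma_1A^{-1} - Sigma_0A^{-1}
    P1R = [[P0[i][j] + P1A[i][j] - P0A[i][j] for j in range(2)] for i in range(2)]
    if P1R[0][0] <= 0 or P1R[0][0] * P1R[1][1] - P1R[0][1] * P1R[1][0] <= 0:
        raise ValueError("Sigma_1R^{-1} is not positive definite; f1R is not a proper density")
    # Sigma_1R^{-1} mu_1R = Sigma_0^{-1} mu_0 + Sigma_1A^{-1} mu_1A - Sigma_0A^{-1} mu_0A
    rhs = [a + b - c for a, b, c in zip(mat_vec(P0, mu0), mat_vec(P1A, mu1A), mat_vec(P0A, mu0A))]
    S1R = inv2(P1R)
    return mat_vec(S1R, rhs), S1R


def corr(S):
    """Correlation implied by a 2x2 covariance matrix."""
    return S[0][1] / (S[0][0] * S[1][1]) ** 0.5


mu0, S0 = [3.0, 3.0], cov2(1.0, 0.8)      # true initial distribution
mu1 = [2.5, 2.5]                          # target means
# Correct specification: f0A = f0, and f1A carries the same correlation 0.8.
mu_ok, S_ok = adjusted_normal(mu0, S0, mu0, S0, mu1, cov2(0.5, 0.8))
# Independence wrongly assumed: rho_0A = rho_1A = 0.
mu_ind, S_ind = adjusted_normal(mu0, S0, mu0, cov2(1.0, 0.0), mu1, cov2(0.5, 0.0))
print(corr(S_ok), corr(S_ind), mu_ind)
```

Under these assumptions the correctly specified case returns the target distribution, as equation (5) implies when Σ0A = Σ0 and μ0A = μ0. The independence case gives a correlation of about 0.38, the value equation (6) gives for ρ0 = 0.8, σ0 = 1 and σ1 = 0.5, and it also pulls the adjusted means to 2.4375 rather than the intended 2.5.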
To illustrate the distortions that occur if Σ0A or Σ1A is misspecified, we now consider
the case of two variates that have a joint normal distribution. For simplicity we
assume that for both variables the initial and final variances are correctly specified:
σ0 = σ0A and σ1 = σ1A. The three remaining parameters are the true initial
cross-correlation between the two variables, ρ0; the initial cross-correlation assumed in
the approximation, ρ0A; and the cross-correlation assumed for the conditional distribution,
ρ1A. A concern is that some researchers have implicitly or explicitly assumed that
variables are independent, so that ρ0A = ρ1A = 0, and have hoped that an adjustment like
the pdf-ratio method will still yield reasonable results. The analysis here demonstrates
that a lot can go wrong. With the assumption that ρ0A = ρ1A = 0, and with the two initial
and two conditional variances equal to σ0² and σ1², the diagonal elements of Σ0A and Σ1A,
respectively, equation (5) yields (Stedinger & Kim, 2007):
$$\rho_{1R} = \frac{\rho_0}{1 - (1 - \rho_0^2)(\sigma_1^2 - \sigma_0^2)/\sigma_1^2} \tag{6}$$
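The step from equation (5) to equation (6) can be filled in as follows. This is a sketch of one route to the result under the stated assumptions (ρ0A = ρ1A = 0, so that Σ0A = σ0²I and Σ1A = σ1²I), not a reproduction of the derivation in Stedinger & Kim (2007); the means do not enter the correlation, and a and c are shorthand introduced here.

```latex
\[
\Sigma_0 = \sigma_0^2\begin{pmatrix} 1 & \rho_0 \\ \rho_0 & 1 \end{pmatrix}, \quad
\Sigma_0^{-1} = a\begin{pmatrix} 1 & -\rho_0 \\ -\rho_0 & 1 \end{pmatrix}, \quad
a = \frac{1}{\sigma_0^2(1-\rho_0^2)}, \quad
\Sigma_{1A}^{-1} - \Sigma_{0A}^{-1} = c\,I, \quad
c = \frac{1}{\sigma_1^2} - \frac{1}{\sigma_0^2}
\]
\[
\Sigma_{1R}^{-1} = \begin{pmatrix} a + c & -a\rho_0 \\ -a\rho_0 & a + c \end{pmatrix}
\quad\Longrightarrow\quad
\Sigma_{1R} = \frac{1}{(a+c)^2 - a^2\rho_0^2}
\begin{pmatrix} a + c & a\rho_0 \\ a\rho_0 & a + c \end{pmatrix}
\]
\[
\rho_{1R} = \frac{a\rho_0}{a + c} = \frac{\rho_0}{1 + c/a}, \qquad
\frac{c}{a} = \left(\frac{\sigma_0^2}{\sigma_1^2} - 1\right)(1 - \rho_0^2)
= -\frac{(1 - \rho_0^2)(\sigma_1^2 - \sigma_0^2)}{\sigma_1^2}
\]
```

The result requires a + c > a|ρ0|, that is, a denominator in equation (6) larger than |ρ0|; otherwise Σ1R⁻¹ is not positive definite and f1R is not a proper density.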
Figure 1 plots this relationship, further assuming, for simplicity but without loss of
generality, that σ0 = σ0A = 1. Given an initial cross-correlation ρ0, and adopting
ρ0A = ρ1A = 0, we can compute the asymptotic cross-correlation ρ1R of the two adjusted
variables. For this exploration, suppose that ρ1 = ρ0; that is, the forecast information
may change the variances of the two variables equally, but it does not affect their
cross-correlation. Figure 1 then illustrates how the degree of distortion in the
posterior cross-correlation depends on the forecast accuracy. Suppose that ρ0 = 0.80. If
we seek to increase the variance of the variables so that σ1/σ0 = 1.2, the
cross-correlation of the resultant variables increases from 0.80 to 0.90. On the other
hand, if we seek to reduce the variance so that σ1/σ0 = 0.25, the resultant
cross-correlation drops to 0.13, which is close to zero. Thus, when the forecast is precise
(σ1/σ0 = 0.25), the approximate cross-correlation is far below the target
cross-correlation ρ1 = ρ0, whereas increasing the variance of the variables raises their
cross-correlation above the initial value, which was supposed to be the final value as
well (ρ1 = ρ0). Figures 2 and 3 provide examples that illustrate why this occurs.
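Equation (6) is simple enough to evaluate directly. The Python sketch below (the function name `rho_1r` is hypothetical) reproduces, to rounding, the two values quoted above for ρ0 = 0.80 and tabulates ρ1R for the σ1 values shown in Fig. 1, assuming σ0 = 1 as in the figure. The rejection of inputs for which the adjusted covariance matrix is not positive definite is a check added here, not something taken from the paper.

```python
def rho_1r(rho0, sigma0, sigma1):
    """Asymptotic cross-correlation after a pdf-ratio adjustment that assumes
    rho_0A = rho_1A = 0 when the true initial correlation is rho0 (equation 6)."""
    denom = 1.0 - (1.0 - rho0 ** 2) * (sigma1 ** 2 - sigma0 ** 2) / sigma1 ** 2
    # Sigma_1R^{-1} is positive definite only if denom > |rho0|.
    if denom <= abs(rho0):
        raise ValueError("adjusted distribution is not a proper density for these inputs")
    return rho0 / denom


# Values quoted in the text for rho0 = 0.80 and sigma0 = 1.
print(f"{rho_1r(0.80, 1.0, 1.20):.3f} {rho_1r(0.80, 1.0, 0.25):.3f}")

# One row per Fig. 1 curve, evaluated at a few values of rho0.
for s1 in (0.25, 0.50, 0.75, 0.90, 1.20):
    row = [rho_1r(r0, 1.0, s1) for r0 in (0.2, 0.4, 0.6, 0.8)]
    print(f"sigma1 = {s1:.2f}:", " ".join(f"{v:.3f}" for v in row))
```

Setting σ1 = σ0 leaves ρ0 unchanged, since the adjustment then has no effect; the further σ1 moves from σ0, the larger the distortion.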
[Fig. 1: curves of ρ1R (vertical axis, 0 to 1.0) versus the true initial cross-correlation ρ0 (horizontal axis, 0 to 1) for σ1 = 0.25, 0.50, 0.75, 0.90 and 1.20.]
Fig. 1 Distortions in the posterior cross-correlation caused by the assumption of
independence, ρ1A = ρ0A = 0. The graph shows the cross-correlation ρ1R that results
from an adjustment assuming independence as a function of the true initial
cross-correlation ρ0, for different values of σ1 when σ0 = 1; the dotted line corresponds
to ρ1R = ρ0.
[Fig. 2: two scatter panels of Yi versus Xi; upper panel "Assuming Series are Independent", lower panel "Assuming Series have Cross-correlation = 0.80".]
Fig. 2 The pdf ratios calculated for a bivariate normal distribution with {μ0x, μ0y} =
{μ1x, μ1y} = {3, 3} and initial {σ0x, σ0y} = {1, 1}, versus target {σ1x, σ1y} = {1.5, 1.5}.
The area of each circle is proportional to the probability of the point. The lower panel
adopts the correct assumption that ρ1 = ρ0 = 0.8.
Figures 2 and 3 consider a situation in which one has two variables X and Y that have a
joint normal distribution with a cross-correlation of 0.80. The sample values for both
variables are the constructed sets employed earlier in Stedinger & Kim (2002), and those
discrete values have been paired following a joint normal distribution to obtain a sample
with a cross-correlation of 0.80. Figure 2 shows how the pdf-ratio method changes the
probabilities assigned to the points so as to accommodate a forecast that increases the
standard deviation from 1 to 1.5 while the mean remains unchanged. Figure 3 shows how the
pdf-ratio method changes the probabilities if one seeks to decrease the mean from 3 to 2.5
(a drought-like forecast) while decreasing the standard deviation from 1 to 0.5. The
figures show the adjusted probabilities when one assumes that the variables are initially
and conditionally independent (ρ0A = ρ1A = 0), even though that is false because ρ0 = 0.8;
they also show the adjusted probabilities when the assumed cross-correlations match the
true value: ρ0 = ρ0A = ρ1A = 0.80.

[Fig. 3: two scatter panels of Yi versus Xi; upper panel "Assuming Series are Independent", lower panel "Assuming Series have Cross-correlation = 0.80".]
Fig. 3 The pdf ratios calculated for a bivariate normal distribution with initially
{μ0x, μ0y} = {3, 3} and {σ0x, σ0y} = {1, 1}, versus {μ1x, μ1y} = {2.5, 2.5} and
{σ1x, σ1y} = {0.5, 0.5}. The area of each circle is proportional to the probability of the
point. The lower panel adopts the correct assumption that ρ1 = ρ0 = 0.8.
In Fig. 2, the differences between the two sets of probabilities, one employing ρ0A = ρ1A
= 0 and the other ρ0 = ρ0A = ρ1A = 0.80, are remarkable. If one expands the variance
assuming independence, almost all of the probability shifts to the points with the largest
values; the other points almost disappear in the figure. On the other hand, correctly
reflecting the initial and conditional cross-correlation of 0.80 between the variables
results in probabilities that yield a cross-correlation of 0.78 in this instance, with
relatively similar weights on a wide range of the points. The two representations of
possible future events are dramatically different.
In the second case, displayed in Fig. 3, we seek to describe a potential drought situation
in what may be two large basins, which conditionally have a decreased mean flow of 2.5
and a standard deviation of 0.5. In this case, assuming independence places almost all
the probability symmetrically on a small circular cluster of points, as would be
appropriate if X and Y were independent. On the other hand, correctly including the
cross-correlation in the adjustment results in a set of probabilities that sweep across the
data set, yielding a cross-correlation of 0.80 for the discrete approximation. These
characteristics of the solution become even more extreme if the standard deviations are
decreased to 0.25 with a mean of 2.0: then incorrectly assuming independence results in a
cross-correlation of −0.06 (essentially zero), whereas correctly assuming a
cross-correlation of 0.80 yields an approximation with a cross-correlation of 0.75.
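The Fig. 3 experiment can be imitated without the constructed data set of Stedinger & Kim (2002). The standard-library Python sketch below draws a hypothetical random sample of 20 000 pairs from a bivariate normal distribution with means 3, standard deviations 1 and ρ0 = 0.8, computes pdf-ratio weights for the Fig. 3 target (means 2.5, standard deviations 0.5) under both assumptions, and reports the cross-correlation of each weighted sample. The sample size, the seed and the function names are illustrative choices; the printed values depend on the random sample, but for large N the independence case should approach the value of about 0.38 that equation (6) gives for σ0 = 1, σ1 = 0.5 and ρ0 = 0.8, while the correctly specified case should stay near 0.8.

```python
import math
import random


def log_kernel(x, y, mu, sd, rho):
    """Log bivariate normal density with common mean mu, common standard deviation sd
    and correlation rho, up to an additive constant (it cancels on normalization)."""
    zx, zy = (x - mu) / sd, (y - mu) / sd
    return -(zx * zx - 2.0 * rho * zx * zy + zy * zy) / (2.0 * (1.0 - rho * rho))


def pdf_ratio_weights_2d(points, mu0, sd0, mu1, sd1, rho_assumed):
    """q_i proportional to f1A/f0A at each point, with rho_0A = rho_1A = rho_assumed."""
    logs = [log_kernel(x, y, mu1, sd1, rho_assumed) - log_kernel(x, y, mu0, sd0, rho_assumed)
            for x, y in points]
    top = max(logs)  # subtract the maximum before exponentiating to avoid overflow
    w = [math.exp(v - top) for v in logs]
    total = sum(w)
    return [wi / total for wi in w]


def weighted_corr(points, q):
    """Cross-correlation of the discrete distribution {(x_i, y_i), q_i}."""
    mx = sum(qi * x for qi, (x, _) in zip(q, points))
    my = sum(qi * y for qi, (_, y) in zip(q, points))
    sxx = sum(qi * (x - mx) ** 2 for qi, (x, _) in zip(q, points))
    syy = sum(qi * (y - my) ** 2 for qi, (_, y) in zip(q, points))
    sxy = sum(qi * (x - mx) * (y - my) for qi, (x, y) in zip(q, points))
    return sxy / math.sqrt(sxx * syy)


# Hypothetical sample (not the paper's data): N pairs with means 3,
# standard deviations 1 and true cross-correlation rho0 = 0.8.
rng = random.Random(2007)
rho0, N = 0.8, 20000
points = []
for _ in range(N):
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    points.append((3.0 + z1, 3.0 + rho0 * z1 + math.sqrt(1.0 - rho0 ** 2) * z2))

# Fig. 3 target: means 2.5 and standard deviations 0.5 for both variables.
q_ind = pdf_ratio_weights_2d(points, 3.0, 1.0, 2.5, 0.5, rho_assumed=0.0)
q_cor = pdf_ratio_weights_2d(points, 3.0, 1.0, 2.5, 0.5, rho_assumed=0.8)
print("assuming independence:", round(weighted_corr(points, q_ind), 3))
print("assuming rho = 0.8:   ", round(weighted_corr(points, q_cor), 3))
```

Because only the ratio f1A/f0A matters after normalization, the sketch works with log-density kernels and drops the normalizing constants, which are the same for every point.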
CONCLUSIONS
The pdf-ratio method is reasonably simple and flexible. It can be used with different
univariate and multivariate families of distributions describing the initial and target
distributions for climate variables. It deals with the multivariate forecast case without
difficulty. Examples clearly demonstrate the need in the multivariate forecast case to
be careful in the specification of the initial and target conditional multivariate
distributions of the climate variables. We conclude that if separate and independent
adjustments are adopted to capture the conditional probabilities of different variables
for different forecast periods or different basins, as was done in Croley (2000, pp. 110–
112), then the resulting joint distribution of such variables can be grossly distorted.
REFERENCES
Croley II, T. E. (2000) Using Meteorology Probability Forecasts in Operational Hydrology. ASCE Press, Reston,
Virginia, USA.
Croley II, T. E. (2003) Weighted-climate parametric hydrologic forecasting. J. Hydrol. Engng 8(4), 171–180.
Kelman, J., Stedinger, J. R., Cooper, L. A., Hsu, E. & Yuan, S. (1990) Sampling stochastic dynamic programming applied
to reservoir operation. Water Resour. Res. 26(3), 447–454.
National Research Council (2006) Completing the Forecast: Characterizing and Communicating Uncertainty for Better
Decisions Using Weather and Climate Forecasts. National Academy Press, Washington DC, USA.
Schaake, J. C., Hartman, R., Demargne, J., Mullusky, M., Welles, E., Wu, L. & Fan, X. (2004) Ensemble streamflow
prediction by the National Weather Service (NWS) Advanced Hydrologic Prediction Services (AHPS). In: 2004
Joint Assembly of AGU, CGU, SEG, and EEGS. American Geophysical Union, Montreal, Canada.
Stedinger, J. R. & Kim, Y.-O. (2002) Updating ensemble probabilities based on climate forecast. In: Proceedings 2002
Conference on Water Resources Planning and Management (ed. by D. F. Kibler), paper C-2-109. ASCE, Reston,
Virginia, USA.
Stedinger, J. R. & Kim, Y.-O. (2007) Probabilities for ensemble forecasts reflecting climate information. Water Resour.
Res. (submitted).
Toth, Z. & Kalnay, E. (1993) Ensemble forecasting at NMC: The generation of perturbations. Bull. Am. Met. Soc. 74,
2317–2330.
Tracton, M. S. & Kalnay, E. (1993) Operational ensemble prediction at the National Meteorological Center: practical
aspects. Weather Forecast 8, 378–398.
Wilks, D. S. (2002) Realizations of daily weather in forecast seasonal climate. J. Hydromet. 3(2), 195–207.
Wilks, D. S. (2006) Statistical Methods in the Atmospheric Sciences, second edn. Academic Press/Elsevier, Burlington,
Massachusetts, USA.