
Fuzzy verification of fake cases
Beth Ebert
Center for Australian Weather and Climate Research
Bureau of Meteorology
NCAR, 15 April 2008
Fuzzy (neighborhood) verification
• Look in a space / time neighborhood around the point of interest
[Figure: a point observation at time t compared with forecast neighborhoods at t-1, t, t+1, with frequency distributions of observed and forecast values]
– Evaluate using categorical, continuous, or probabilistic scores / methods
– Only the spatial neighborhood is considered for the fake cases
Fuzzy verification framework
Fuzzy methods use one of two approaches to compare forecasts and observations:
• single observation – neighborhood forecast (user-oriented)
• neighborhood observation – neighborhood forecast (model-oriented)
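A minimal NumPy sketch of these two comparison strategies (the grid sizes, event probability, and the helper name `neighborhood_fraction` are illustrative assumptions, not from the talk):

```python
import numpy as np

def neighborhood_fraction(event_field, i, j, n):
    # Fraction of event points in the n x n box centered on (i, j),
    # clipped at the domain edges.
    h = n // 2
    box = event_field[max(i - h, 0):i + h + 1, max(j - h, 0):j + h + 1]
    return box.mean()

rng = np.random.default_rng(0)
obs = rng.random((100, 100)) > 0.9   # binary "event observed" field
fcst = rng.random((100, 100)) > 0.9  # binary "event forecast" field

i, j, n = 50, 50, 5
p_fcst = neighborhood_fraction(fcst, i, j, n)

# User-oriented: single observation vs. neighborhood forecast
print("obs at point:", obs[i, j], " forecast fraction nearby:", p_fcst)

# Model-oriented: neighborhood observation vs. neighborhood forecast
p_obs = neighborhood_fraction(obs, i, j, n)
print("obs fraction:", p_obs, " forecast fraction:", p_fcst)
```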
Fuzzy verification framework
[Figure: example fields illustrating good vs. poor performance under the fuzzy framework]
Upscaling
Neighborhood observation - neighborhood forecast
Average the forecast and observations to successively larger grid resolutions, then verify as usual (sketched below).
[Figure: % change in ETS with increasing upscaling]
Weygandt et al. (2004)
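A sketch of the upscaling procedure with NumPy block-averaging (the synthetic gamma-distributed rain fields and the 1 mm threshold are assumptions for illustration; Weygandt et al.'s data and grids are not reproduced here):

```python
import numpy as np

def ets(fcst, obs):
    # Equitable threat score from binary forecast/observed fields.
    hits = np.sum(fcst & obs)
    misses = np.sum(~fcst & obs)
    fa = np.sum(fcst & ~obs)
    hits_rand = (hits + misses) * (hits + fa) / fcst.size
    den = hits + misses + fa - hits_rand
    return (hits - hits_rand) / den if den else np.nan

def upscale(field, factor):
    # Block-average a 2-D field onto a grid 'factor' times coarser.
    ny, nx = field.shape
    trimmed = field[:ny - ny % factor, :nx - nx % factor]
    return trimmed.reshape(ny // factor, factor,
                           nx // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(1)
rain_fcst = rng.gamma(0.5, 2.0, (128, 128))  # stand-in rain fields
rain_obs = rng.gamma(0.5, 2.0, (128, 128))

# Verify as usual (here with ETS) at successively coarser resolutions
for factor in (1, 2, 4, 8, 16):
    f = upscale(rain_fcst, factor) >= 1.0  # 1 mm threshold
    o = upscale(rain_obs, factor) >= 1.0
    print(f"{factor:3d}x coarser: ETS = {ets(f, o):.3f}")
```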
Fractions skill score
Neighborhood observation - neighborhood forecast
Compare forecast fractions with observed fractions (radar) in a probabilistic way over different sized neighbourhoods:
$$\mathrm{FSS} = 1 - \frac{\dfrac{1}{N}\sum_{i=1}^{N}\left(P_{\mathrm{fcst}} - P_{\mathrm{obs}}\right)^{2}}{\dfrac{1}{N}\sum_{i=1}^{N}P_{\mathrm{fcst}}^{2} + \dfrac{1}{N}\sum_{i=1}^{N}P_{\mathrm{obs}}^{2}}$$
[Figure: observed (radar) and forecast precipitation fields]
Roberts and Lean (2008)
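A sketch of the FSS calculation above, using `scipy.ndimage.uniform_filter` to compute the neighborhood event fractions P_fcst and P_obs (the synthetic fields, threshold, and edge handling are assumptions, not Roberts and Lean's exact setup):

```python
import numpy as np
from scipy.ndimage import uniform_filter

def fss(fcst, obs, threshold, n):
    # Event fractions in n x n neighborhoods around every grid point
    p_fcst = uniform_filter((fcst >= threshold).astype(float), size=n)
    p_obs = uniform_filter((obs >= threshold).astype(float), size=n)
    # FSS = 1 - MSE(fractions) / worst-case MSE (no overlap at all)
    mse = np.mean((p_fcst - p_obs) ** 2)
    mse_worst = np.mean(p_fcst ** 2) + np.mean(p_obs ** 2)
    return 1.0 - mse / mse_worst if mse_worst > 0 else np.nan

rng = np.random.default_rng(2)
rain_fcst = rng.gamma(0.5, 2.0, (128, 128))
rain_obs = rng.gamma(0.5, 2.0, (128, 128))

# FSS generally rises as the neighborhood length n grows
for n in (1, 5, 15, 45):
    print(f"n = {n:2d}: FSS = {fss(rain_fcst, rain_obs, 1.0, n):.3f}")
```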
Spatial multi-event contingency table
Single observation - neighborhood forecast
Measure how close the forecast is to the place / time / magnitude of interest.
Vary decision thresholds:
• magnitude (ex: 1 mm h-1 to 20 mm h-1)
• distance from point of interest (ex: within 10 km, ..., within 100 km)
• timing (ex: within 1 h, ..., within 12 h)
• anything else that may be important in interpreting the forecast
[Figure: ROC curve traced out by the varying thresholds, with a single-threshold point marked]
Fuzzy methodology – compute the Hanssen and Kuipers score, HK = POD – POFD (a sketch follows below)
Atger (2001)
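A sketch of the single observation - neighborhood forecast idea behind this table: a point counts as a hit if the event is forecast anywhere within a given radius (here a square box in grid points; the magnitude and radius values are illustrative assumptions):

```python
import numpy as np
from scipy.ndimage import maximum_filter

def hk(fcst, obs, magnitude, radius):
    # A forecast "close call": the event is forecast somewhere in the
    # (2*radius+1)^2 box around each point.
    fcst_near = maximum_filter((fcst >= magnitude).astype(np.uint8),
                               size=2 * radius + 1) > 0
    obs_event = obs >= magnitude
    hits = np.sum(obs_event & fcst_near)
    misses = np.sum(obs_event & ~fcst_near)
    fa = np.sum(~obs_event & fcst_near)
    cn = np.sum(~obs_event & ~fcst_near)
    pod = hits / (hits + misses) if hits + misses else np.nan
    pofd = fa / (fa + cn) if fa + cn else np.nan
    return pod - pofd  # HK = POD - POFD

rng = np.random.default_rng(3)
rain_fcst = rng.gamma(0.5, 2.0, (128, 128))
rain_obs = rng.gamma(0.5, 2.0, (128, 128))

# Sweep the decision thresholds, as in the ROC construction above
for magnitude in (1.0, 5.0):
    for radius in (0, 2, 5, 10):
        print(f"mag >= {magnitude:4.1f} mm/h, within {radius:2d} pts: "
              f"HK = {hk(rain_fcst, rain_obs, magnitude, radius):.3f}")
```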
Practically perfect hindcasts
Single observation - neighborhood forecast
Q: If the forecaster had all of the observations in advance, what would the "practically perfect" forecast look like?
– Apply a smoothing function to the observations to get probability contours; choose the yes/no threshold that maximizes CSI when verified against the obs
– Did the actual forecast look like the practically perfect forecast?
– How did the performance of the actual forecast compare to the performance of the practically perfect forecast?
Fuzzy methodology – compute the ratio ETS_forecast / ETS_PracPerf (see the sketch below)
[Figure: forecast (CSI_forecast = 0.34) vs. practically perfect hindcast (CSI_PracPerf = 0.48)]
Kay and Brooks (2000)
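A sketch of the practically perfect procedure (the Gaussian smoother, its width, the normalization of the smoothed field, and the threshold search grid are all assumptions; Kay and Brooks' exact smoothing function is not specified here):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def csi(fcst, obs):
    # Critical success index = hits / (hits + misses + false alarms)
    hits = np.sum(fcst & obs)
    den = hits + np.sum(~fcst & obs) + np.sum(fcst & ~obs)
    return hits / den if den else np.nan

def practically_perfect(obs_event, sigma=3.0):
    # Smooth the observed events into a probability-like field...
    prob = gaussian_filter(obs_event.astype(float), sigma=sigma)
    prob /= prob.max()  # normalize so thresholds span (0, 1]; a simplification
    # ...then pick the yes/no threshold that maximizes CSI vs. the obs
    best_csi, best_t = -1.0, 0.5
    for t in np.linspace(0.05, 0.95, 19):
        score = csi(prob >= t, obs_event)
        if score > best_csi:
            best_csi, best_t = score, t
    return prob >= best_t, best_csi

rng = np.random.default_rng(4)
obs_event = rng.random((128, 128)) > 0.95  # stand-in observed events

pp_forecast, csi_pp = practically_perfect(obs_event)
print("CSI of practically perfect hindcast:", round(csi_pp, 3))
# The actual forecast is then judged by the ratio ETS_forecast / ETS_PracPerf
```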
1st geometric case
50 pts to the right
[Figure: forecast and observed objects (contours at 12.7 mm and 25.4 mm); fuzzy scores shaded from good to bad]
2nd geometric case
200 pts to the right
[Figure: fuzzy scores shaded from good to bad]
5th geometric case
125 pts to the right and huge
[Figure: fuzzy scores shaded from good to bad]
1st case vs. 5th case
[Figure: difference in fuzzy scores across scales and intensities, showing where Case 1 is better, where the two are ~same, and where Case 5 is better]
Perturbed cases
Which forecast is better?
[Figure: "Observed" field on a ~1000 km domain, alongside perturbed forecasts (4) shifted 24 pts right and 40 pts down, and (6) shifted 12 pts right and 20 pts down with intensity*1.5]
4th perturbed case
24 pts right, 40 pts down
[Figure: fuzzy scores shaded from good to bad]
6th perturbed case
12 pts right, 20 pts down, intensity*1.5
[Figure: fuzzy scores shaded from good to bad]
Difference between cases 6 and 4
Case 4 - Shift 24 pts right, 40 pts down
Case 6 - Shift 12 pts right, 20 pts down, intensity*1.5
[Figure: fuzzy score matrices for Case 6 and Case 4, and their difference (Case 6 – Case 4)]
How do fuzzy results for shift + amplification compare to results for the case of shifting only?
Case 6 - Shift 12 pts right, 20 pts down, intensity*1.5
Case 3 - Shift 12 pts right, 20 pts down, no intensity change
[Figure: fuzzy score matrices for Case 6 and Case 3, and their difference (Case 6 – Case 3)]
Why does the case with incorrect amplitude sometimes perform better?
Baldwin and Kain (2005): when the forecast is offset from the observations, most scores can be improved by overestimating the rain area, provided rain is less common than "no rain". A worked example follows below.
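A minimal 1-D illustration of Baldwin and Kain's point (the domain size, band positions, and the ETS helper are constructed for this example, not taken from their paper):

```python
import numpy as np

def ets(fcst, obs):
    # Equitable threat score for binary 1-D forecast/observed arrays
    hits = np.sum(fcst & obs)
    misses = np.sum(~fcst & obs)
    fa = np.sum(fcst & ~obs)
    hits_rand = (hits + misses) * (hits + fa) / fcst.size
    den = hits + misses + fa - hits_rand
    return (hits - hits_rand) / den if den else np.nan

# Observed rain: a 10-cell band in a mostly dry 100-cell domain
obs = np.zeros(100, dtype=bool)
obs[10:20] = True

# Shifted forecast: same-size band, displaced past the observed band
shifted = np.zeros(100, dtype=bool)
shifted[25:35] = True

# Shifted AND enlarged forecast: overestimates rain area, so it
# reaches back far enough to overlap the observations
enlarged = np.zeros(100, dtype=bool)
enlarged[15:40] = True

print("shifted only:       ETS =", round(ets(shifted, obs), 3))   # negative
print("shifted + enlarged: ETS =", round(ets(enlarged, obs), 3))  # positive
```

Here the shifted-only forecast has no hits at all, so its ETS is slightly negative, while the shifted-and-enlarged forecast recovers some hits and scores better despite its wrong amplitude.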
Some observations about methods
Traditional
• Measures direct correspondence of forecast and observed values at grid scale
• Hard to score well unless forecast is ~perfect
• Requires overlap of forecasts and obs

Entity-based (CRA)
• Measures location error and properties of blobs (size, mean/max intensity, etc.)
• Scores well if forecast looks similar to observations
• Does not require much overlap to score well

Fuzzy
• Measures scale- and intensity-dependent similarity of forecast to observations
• Forecast can score well at some scales and not at others
• Does not require overlap to score well
Some final thoughts…
Object-based and fuzzy verification seem to have different aims
Object-based methods
• Focus on describing the error
• What is the error in this forecast?
• What is the cause of this error (wrong location, wrong size, wrong intensity, etc.)?
Fuzzy neighborhood methods
• Focus on skill quantification
• What is the forecast skill at small scales? Large scales? Low/high intensities?
• What scales and intensities have reasonable skill?
• Different fuzzy methods emphasize different aspects of skill
Some final thoughts…
When can each type of method be used?
Object-based methods
• When rain blobs are well defined (organized systems, longer rain accumulations)
• When it is important to measure how well the forecast predicts the properties of systems
• When size of domain >> size of rain systems
Fuzzy neighborhood methods
• Whenever high density observations are available over a reasonable domain
• When knowing scale- and intensity-dependent skill is important
• When comparing forecasts at different resolutions