
INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. 25: 581–610 (2005)
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/joc.1143
TOWARDS ICE-CORE-BASED SYNOPTIC RECONSTRUCTIONS OF WEST
ANTARCTIC CLIMATE WITH ARTIFICIAL NEURAL NETWORKS
DAVID B. REUSCH,a,* BRUCE C. HEWITSONb and RICHARD B. ALLEYa
a Department of Geosciences and EMS Environment Institute, The Pennsylvania State University, University Park, PA 16802, USA
b Department of Environmental and Geographical Sciences, University of Cape Town, Private Bag, Rondebosch 7701, South Africa
Received 9 March 2004
Revised 22 August 2004
Accepted 12 November 2004
ABSTRACT
Ice cores have, in recent decades, produced a wealth of palaeoclimatic insights over widely ranging temporal and
spatial scales. Nonetheless, interpretation of ice-core-based climate proxies is still problematic due to a variety of issues
unrelated to the quality of the ice-core data. Instead, many of these problems are related to our poor understanding
of key transfer functions that link the atmosphere to the ice. This study uses two tools from the field of artificial
neural networks (ANNs) to investigate the relationship between the atmosphere and surface records of climate in West
Antarctica. The first, self-organizing maps (SOMs), provides an unsupervised classification of variables from the mid-troposphere (700 hPa temperature, geopotential height and specific humidity) into groups of similar synoptic patterns.
An SOM-based climatology at annual resolution (to match ice-core data) has been developed for the period 1979–93
based on the European Centre for Medium-Range Weather Forecasts (ECMWF) 15-year reanalysis (ERA-15) dataset.
This analysis produced a robust mapping of years to annual-average synoptic conditions as generalized atmospheric
patterns or states. Feed-forward ANNs, our second ANN-based tool, were then used to upscale from surface data to
the SOM-based classifications, thereby relating the surface sampling of the atmosphere to the large-scale circulation of
the mid-troposphere. Two recorders of surface climate were used in this step: automatic weather stations (AWSs) and
ice cores. Six AWS sites provided 15 years of near-surface temperature and pressure data. Four ice-core sites provided
40 years of annual accumulation and major ion chemistry. Although the ANN training methodology followed standard principles, limited training data and noise in the ice-core data reduced the effectiveness of the
upscaling predictions. Despite these shortcomings, which might be expected to preclude successful analyses, we find
that the combined techniques do allow ice-core reconstruction of annual-average synoptic conditions with some skill.
We thus consider the ANN-based approach to upscaling to be a useful tool, but one that would benefit from additional
training data. Copyright © 2005 Royal Meteorological Society.
KEY WORDS: ice cores; synoptic reconstruction; artificial neural networks; self-organizing maps; West Antarctica
1. INTRODUCTION
This work seeks to use ice-core proxy datasets to reconstruct 40 years (1954–93) of West Antarctic annual
climate as seen in the mid-tropospheric circulation using the well-known but poorly understood link between
the atmosphere and ice cores. Ice-core proxy data are calibrated to atmospheric circulation data for the period
1979–93 and then used to predict the latter for the period 1954–78. As an additional test of our method,
automatic weather station (AWS) data are also used for a limited reconstruction in the 1979–93 period. As
described in further detail in the following sections, our approach to this problem consists of four steps.
1. Simplify the atmosphere: self-organizing maps (SOMs) extract patterns of variability by classifying the circulation into generalized states. These patterns are useful both for studying the recent atmosphere
and as components of the subsequent reconstruction.
* Correspondence to: David B. Reusch, Department of Geosciences and EMS Environmental Institute, The Pennsylvania State
University, University Park, PA 16802, USA; e-mail: [email protected]
2. Link ice-core proxy and AWS datasets to the atmosphere: a feed-forward artificial neural network (FF
ANN) is trained to predict the patterns from (1) using either ice-core or AWS data. This step calibrates
the upscaling tool.
3. Reconstruct earlier climate: ice-core data outside the calibration period are used with the trained FF ANN
to predict the associated atmospheric patterns for the rest of the ice-core period. Given sufficient confidence
in these predictions, they would be used to develop a full time series of reconstructed climate.
4. Evaluate the methodology: confidence in a climate reconstruction is tied to the data and steps involved in
creating it. We thus evaluate the reliability of the SOM-based analysis and FF ANN-based upscaling steps
and comment on issues associated with the ice-core data that reduce skill in the upscaling step.
1.1. Ice cores and climate
Decades of research have shown ice cores to be extremely valuable records of the Earth’s climate from
subannual to millennial time scales and beyond (e.g. Wolff and Peel, 1985; Legrand et al., 1988; Mayewski
et al., 1988, 1997; Zielinski et al., 1994; White et al., 1999). As with marine sediment cores, tree rings and
other climate proxies, interpretation of the ice-core record of palaeoclimate is not always straightforward.
In many cases, although the data are of unquestionably high quality and temporal resolution, our poor
knowledge of the relevant transfer functions can greatly reduce the value of the record. This is a recognized
problem (Waddington, 1996) and, although process studies and field work have helped greatly in some areas
(e.g. in improving our understanding of how δ18Oice records temperature), there are still many gaps in the
knowledge we need to understand the proxy records fully. This is particularly true in the Antarctic, where
direct observational data for the atmosphere are hard to obtain and, when available, tend to be relatively short
(by climatological standards) and spatially limited.
Ice cores record many different aspects of the climate system, sometimes in multiple ways. Each proxy
captures one or more climate features in a way that will likely differ in both space and time to varying
degrees. For example, the relationship between δ18Oice and temperature can be different at different places and times (Alley and Cuffey, 2001; Jouzel et al., 2003) as other influences on δ18Oice vary in their relative
effect. Furthermore, many proxies are only captured during precipitation events. This can lead to biases in
the proxy when precipitation is seasonally variable. For example, if wet deposition is the dominant capture
process for a chemical species and snow only falls during the summer, then the ice-core record for that
species will be biased towards a picture of the summer atmosphere. Unfortunately, the subannual character
of precipitation is typically not very well known at West Antarctic ice-core sites, and we are often forced
to assume that snow falls uniformly throughout the year. Subannual sampling is still possible, and indeed
necessary to reconstruct annual cycles of chemical species, but it must be remembered that these data are
projected onto an underlying assumption about uniform snowfall. Thus, unless detailed subannual process
data are available (e.g. Kreutz et al., 1999), we are limited to studies of ice-core proxies at annual resolution.
High-resolution meteorological data (reanalysis and/or observational) provide a means to study relationships
between ice-core proxies and the atmosphere over subannual intervals, but we remain limited to annual (or
possibly semiannual)-resolution climate reconstructions from the proxies.
1.2. The meteorological record and reanalysis datasets
The best meteorological datasets in the Antarctic are typically from two areas: the coastal stations, such as
McMurdo, Mawson and Halley Bay, and the two long-term interior plateau stations, South Pole and Vostok
(Figure 1). Records from elsewhere in the Antarctic interior are limited, with few exceptions, to AWSs and
short-term data collection during traverses and ice-core drilling operations (e.g. Siple Dome). The latter often
only represent the summer field season (when the sites are occupied). AWSs provide year-round data, apart
from instrument problems, but only measure the near-surface environment in a limited manner. The AWS
network, nonetheless, provides an invaluable sampling of the West Antarctic atmosphere.
The shortage of direct observational data has, in turn, increased the importance and utility of numerical
forecast/data assimilation/analysis products for Antarctic climate research. The two most widely used datasets
of this type are from the National Centers for Environmental Prediction–National Center for Atmospheric
Figure 1. Site map showing AWS (solid circles) and ice-core sites (squares) of this study, and other sites (open circles) and regions
mentioned in the text. CWA (central West Antarctica) collectively describes four ice-core sites (A, B, C, D) published in Reusch et al.
(1999)
Research (NCEP–NCAR) in the USA and the European Centre for Medium-Range Weather Forecasts
(ECMWF) in the UK. Both forecast models have problems in the Antarctic (e.g. Bromwich et al., 1995;
Genthon and Braun, 1995; Cullather et al., 1997), and the shortage of observations tends to produce analyses that resemble the forecast more closely than do analyses in areas with more available observations. Nonetheless, despite their
shortcomings (e.g. Marshall, 2002; Bromwich and Fogt, 2004), these products are still much better options
than having only the observational data. ECMWF and NCEP–NCAR have each produced so-called reanalysis
versions of their model predictions (Kalnay et al., 1996; ECMWF, 2000; Kistler et al., 2001). A reanalysis
uses one version of the forecast model for the duration of the study period and thus removes changes to the
model as a source of changes in the forecasts. Other factors, such as addition and removal of observational
data over time, remain as variables affecting the skill of the reanalyses, but these are external to the model.
In short, the reanalysis datasets provide a realistic, 6 h picture of the atmosphere with reasonable horizontal
and vertical resolution in an otherwise data-sparse region.
1.3. Classification and upscaling: towards synoptic reconstructions
The focus of our study has thus been the link between annually resolved ice-core proxies, e.g. major
ion chemistry and accumulation, and the annually resolved atmosphere using two tools from the field of
ANNs. First, SOMs (Kohonen, 1990, 1995) provide a classification of meteorological variables from the mid-troposphere (from the ECMWF 15-year reanalysis, ERA-15) into groups of similar synoptic patterns. SOMs have been used for climate downscaling (Crane and Hewitson, 1998), as a mechanism for climate classification (Cavazos, 1999, 2000), and for examining changes in climate over time (Hewitson and Crane, 2002), all in mid-latitude settings. In this study, we have developed an SOM-based climatology of mid-tropospheric variables
for annually averaged ERA-15 data. Second, FF ANNs allow us to find a possibly nonlinear relationship
between a set of predictors (e.g. upper air variables) and a set of targets (e.g. AWS surface observations).
In particular, we have used FF ANNs to upscale from AWS and ice-core predictor data to SOM-based
synoptic classification targets. In this way, past atmospheric conditions can be predicted from proxy data with
a confidence based on the quantity and nature of the training data. Unfortunately, the short training period
(15 years) and limited spatial extent of the selected ice-core data appear to prohibit a simple, deterministic
reconstruction from high-confidence, well-defined predictions. Instead, a more probabilistic approach is needed
to make up for the shortcomings in these datasets. Pending development of a more robust methodology and
improved datasets, we are limited to an evaluation of what can be done with the current data.
In Section 2 we describe the AWS, ECMWF and ice-core datasets. An overview of the ANN tools used
is given in Section 3 (also see Hewitson and Crane (2002) and Reusch and Alley (2002)). Section 4 presents
the SOM-based annual synoptic climatology and upscaling results/atmospheric reconstruction from AWS and
ice-core data. Issues related to the methodologies and input data are covered in Section 5.
2. DATA
2.1. ECMWF data
The ECMWF 15-year reanalysis data product (ERA-15) provided global-scale meteorological data for
the period 1979–93 (fully described in Gibson et al. (1999)). The original ERA-15 production system used
spectral T106 resolution with 31 vertical hybrid levels (terrain following and higher resolution in the lower
troposphere, lower resolution in the stratosphere). The lower resolution product used here was derived from
the production system and provides 2.5° horizontal resolution for the surface and 17 upper air pressure levels.
Six-hourly data are available at 0, 6, 12 and 18 UTC. Annual averages of the 6-h data were normalized
to the 1979–93 baseline (by respectively subtracting and dividing by the full dataset mean and standard
deviation) prior to the SOM analysis (Section 3.1). Figure 2 is an example of the annual-average 700 hPa
temperatures used.
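The annual averaging and baseline normalization just described can be sketched as follows. This is a minimal NumPy illustration with synthetic stand-in data; the array names and sizes are ours and are not part of the actual ERA-15 processing:

```python
import numpy as np

# Synthetic stand-in for 15 years of 6-h 700 hPa temperature on a small grid
rng = np.random.default_rng(0)
t700 = rng.normal(-25.0, 5.0, size=(15 * 1460, 10, 12))  # ~1460 6-h steps/year

# Annual averages of the 6-h data
annual = t700.reshape(15, 1460, 10, 12).mean(axis=1)     # (year, lat, lon)

# Normalize to the 1979-93 baseline: subtract the full-dataset mean and
# divide by the full-dataset standard deviation
annual_norm = (annual - annual.mean()) / annual.std()
```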
Potential problems have been noted with ECMWF (re)analysis data over Antarctica. (Comments in this
section refer to ERA-15 and operational analyses prior to development of the ERA-40 dataset, which was
unavailable for this project. Some of these issues have been resolved in ERA-40, but other problems remain,
many of which are related simply to the lack of observational data in this region prior to the satellite era
(Bromwich and Fogt, 2004).) The first relates to the flawed surface elevation dataset used by ECMWF for this
region (Genthon and Braun, 1995). Elevation errors exceeding 1000 m exist in some areas of Queen Maud
Land and the Antarctic Peninsula (e.g. Genthon and Braun, 1995: figure 3). Topography in West Antarctica
is generally much better, but errors from outside our study area will still have an influence on the reanalysis
data (e.g. an elevation error for Vostok station has broad effects on geopotential heights). The horizontal
resolution of the model also introduces unavoidable elevation errors in areas where the relief is high relative
to grid spacing (e.g. Bromwich and Fogt, 2004). The ECMWF model also suffers from two issues affecting
skill in the near-surface region: relatively low vertical resolution near the surface (even with the hybrid levels)
and specification of ice-shelf regions as permanent sea ice. Both lead to possible errors in the surface energy
balance due to unresolved katabatic flows and incorrect physics over the ice-shelf regions (Bromwich et al.,
2004). The latter leads to large errors in ERA-15 surface temperatures when compared with available AWS
data (Reusch and Alley, 2002, 2004). Evaluations of several operational products (e.g. Bromwich et al., 1995,
2000; Cullather et al., 1998) and discussions with experienced polar meteorologists (D. Bromwich, J. Turner,
personal communications) suggest that the ECMWF analyses are the best data sets currently available for
Figure 2. Annual average 700 hPa temperature for ERA-15 period (1979–93) as grid-point anomalies from the grid-wide average after
normalizing with the full-period, grid-wide average and standard deviation. Temperature generally decreases poleward; thus, darkest
shades represent warmest (coldest) temperatures in the north (south). Zero contour is in bold and values close to zero are not shaded
Antarctica (see also Bromwich et al. (1998)), although the ECMWF 40-year reanalysis (ECMWF, 2001) may
set a new standard when it is readily available.
2.2. AWS data
The main source of direct meteorological data in West Antarctica is the network of AWSs maintained
by the University of Wisconsin-Madison since 1980 (Lazzara, 2000). All stations provide near-surface air
temperature, pressure, and wind speed and direction; some stations also report relative humidity and multiple
vertical temperatures (e.g. for vertical temperature differences). The main instrument cluster is nominally
within 3 m above the snow surface. This distance changes with snow accumulation and removal. Pressure
is calibrated to ±0.2 hPa with a resolution of approximately 0.05 hPa. Temperature accuracy is 0.25–0.5 °C, with lowest accuracy at −70 °C, i.e. accuracy decreases with decreasing temperature (M. Lazzara, personal
communication). The data used here are from the 3 h quality-controlled datasets available at the University
of Wisconsin-Madison FTP site (ice.ssec.wisc.edu). A 6 h subset of these data (for 0, 6, 12 and 18 UTC) is
used to match ECMWF time steps (see below).
The AWSs used in this study are shown in Figure 1 and summarized in Table I. Two of the sites, Byrd and
Ferrell, represent the oldest two AWSs still in operation. Siple was installed at around the same time as these
[Figure 3: seven panels of monthly bars (0–1), 1979–94, for Siple, Byrd, Lettau, Marilyn, Elaine, Ferrell and the suite Average]
Figure 3. Relative proportions of observations (grey) and predictions (white) in AWS temperature and pressure records on a monthly
basis. See Table I for site installation dates
Table I. AWS locations and other useful data

Station        Latitude   Longitude   Elevation (m)   Date Installed   Distance^a (km)
Byrd Station   80.01°S    119.40°W    1530            February 1980     11.5
Elaine         83.13°S    174.17°E      60            January 1986      71
Ferrell        77.91°S    170.82°E      45            December 1980     49.5
Lettau         82.52°S    174.45°W      55            January 1986       8.3
Marilyn        79.95°S    165.13°E      75            January 1987       6.1
Siple          75.90°S     84.00°W    1054            January 1982^b   103.9

^a Distance to the nearest ERA-15 gridpoint.
^b Siple AWS was removed in April 1992.
AWSs but was removed in 1992 due to logistical problems. Siple’s remoteness from McMurdo-based field
support and the high accumulation rates in this region were the main reasons that this station was removed.
The remaining sites were installed in 1986 and 1987. All sites are within the south/southeast Pacific sector
of West Antarctica. Figure 3 summarizes the availability of each AWS for the study period (1979–93) as the
fraction of observations recorded each month. An average for the suite of AWSs is also shown. In general,
availability is either quite high or quite low, with few monthly values in between. The absence of data from
Siple in 1985–87 was not directly related to failure of the meteorological instruments but to other factors
(primarily power related, C. Stearns, personal communication, 2002). Otherwise, most data loss is related to
winter-season failures and the subsequent wait until the austral summer field season for repair.
Because all AWS sites had periods of missing observations (due to failures or just not being active for
the full study period), we have developed an ANN-based technique to supply the missing data (Reusch and
Alley, 2002). The AWS records used in this study are thus a merger of observations and predictions based
on our technique. Numerous sources are available for ANN theory and practice (e.g. Hewitson and Crane,
1994; Gardner and Dorling, 1998; Haykin, 1999; Demuth and Beale, 2000); thus, we will provide only a
short overview of our approach here. Briefly, upper air data from ERA-15 provide predictors for available
AWS temperature and pressure observations. The training methodology explores numerous predictors, ANN
parameters and ensembles of FF ANNs to develop the most skilful network (within the search space). Because
of variable (i.e. temperature and pressure) and site differences, a separate ANN is used for each variable at
each site (for a total of 12 ANNs) to predict the missing observations from upper air data at the corresponding
time steps. The average root-mean-square errors from ANN training (i.e. calibration of the prediction tool)
for all six AWS sites were 2.9 ° C and 1.9 hPa for monthly average temperature and pressure respectively.
The average correlation r of monthly average ANN training predictions and AWS observations was 0.96.
The AWS-prediction ANNs were implemented with the MATLAB Neural Network Toolbox (Haykin, 1999;
Demuth and Beale, 2000). The climatological/statistical properties of this dataset are described in more detail
elsewhere (Reusch and Alley, 2004).
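As an illustration of the gap-filling step, the sketch below trains a single-hidden-layer feed-forward network by gradient descent. The study itself used the MATLAB Neural Network Toolbox with ERA-15 predictors and AWS observations; the data, names and parameter values here are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical stand-ins: upper-air predictors X, AWS temperature target y
X = rng.normal(size=(500, 4))
true_w = np.array([0.5, -1.0, 0.3, 0.8])
y = np.tanh(X @ true_w)[:, None] + 0.1 * rng.normal(size=(500, 1))

# One hidden layer with tanh activation (a common FF ANN configuration)
n_hidden = 8
W1 = rng.normal(scale=0.5, size=(4, n_hidden)); b1 = np.zeros(n_hidden)
W2 = rng.normal(scale=0.5, size=(n_hidden, 1)); b2 = np.zeros(1)

losses, lr = [], 0.05
for _ in range(500):
    h = np.tanh(X @ W1 + b1)           # hidden layer
    pred = h @ W2 + b2                 # linear output
    err = pred - y
    losses.append(float((err ** 2).mean()))
    # Gradients of the half-MSE loss, backpropagated through both layers
    gW2 = h.T @ err / len(X); gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1 - h ** 2)   # tanh derivative
    gW1 = X.T @ dh / len(X); gb1 = dh.mean(axis=0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

# The trained network would then fill missing AWS values from ERA-15 input
```

In the study, one such network was calibrated per variable per site (12 in all), with predictors taken from ERA-15 upper-air fields at the matching time steps.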
2.3. Ice-core data
Four shallow firn/ice cores (<90 m) from central West Antarctica (Reusch et al., 1999) provided high-resolution glaciochemical and annual accumulation data for comparison with the ERA-15 data (Figure 1). Each
core was originally sampled at high resolution (continuously every 3 cm or 10–12 samples per year) to capture
the annual signals in the major soluble ions of atmospheric chemistry: Na+, K+, Mg2+, Ca2+, Cl−, NO3− and SO42−. Annual averages of the chemistry time series were created from the original subannual resolution time
series for this study. Reusch et al. (1999) provide full details on the dating and development of these records.
A number of other ice-core datasets are available, but most fail one or both requirements for being useful
in a study such as this: availability to the community and sufficient overlap with the calibration period
(1979–93). For example, the Byrd Station dataset of Langway et al. (1994) is not readily available and has
only a 10 year overlap with the calibration period. Overlap is also an issue with the Siple Station cores of
Mosley-Thompson et al. (1991). This situation will improve with time as datasets become available from
projects such as the US ITASE program (US ITASE Steering Committee, 1996). Future work will incorporate
these datasets and others with sufficient overlap with whatever calibration period is used.
3. METHODS
Our overall methodology breaks down into two main areas: development of a synoptic classification of the
atmospheric circulation using SOMs and training/application of FF ANNs to do upscaling from ice-core proxy
and AWS datasets to the classified atmospheric patterns.
3.1. Synoptic classification and SOMs
3.1.1. SOMs. SOMs (Kohonen, 1990, 1995) provide a means to do unsupervised classification of large,
multivariate data sets into a fixed number of distinct generalized states or modes. Hewitson and Crane
(2002) reviewed SOM applications and issues in climatology, so only a brief introduction will be given here.
SOM analysis effectively quantizes a continuous input space to a finite-state output space (the SOM map).
Multidimensional data are projected onto a plane as a rectangular grid (or map) of discrete, generalized states
extracted from the data through the training process. The size of the grid (number of states) directly influences
the amount of generalization: smaller (larger) maps have fewer (more) available states in the grid, so the final
states developed during training will tend to do more (less) generalization of the input. Each map state, or
node, is associated with a reference vector that represents the projection from multidimensional space to the
two-dimensional SOM space for that state. For example, in an analysis of input containing 10 variables,
each reference vector will have length 10. At the start of SOM training, reference vectors are initialized
either randomly (distributed across the input data space) or based on the first two principal eigenvectors of
the training data (Kohonen et al., 1996). The latter often has the advantage of faster subsequent training.
At the end of SOM training, each reference vector represents a generalized state extracted from the input
space. States in the SOM grid are usually identified by an (x, y) column, row coordinate pair, where x = 0
to xmax − 1 and y = 0 to ymax − 1.
SOM training is based on iterative adjustment of the reference vectors during SOM training phases. Each
phase alternates between mapping input records and adjusting reference vectors. Each input record is matched
to the closest reference vector (normally determined via Euclidean distance). The value of the matching
reference vector is then adjusted towards the value of the input record by an amount determined by the
current learning rate. The learning rate is a dimensionless parameter used to promote stability of the reference
vectors during training. The difference between an input record and its closest reference vector is scaled
by the learning rate, with the result used to adjust the reference vector. A learning rate of zero makes no
adjustments and a value of one applies the complete difference. Typical values in this work range from 0.01
to 0.05. A key feature of SOMs is that the reference vectors of a neighbourhood of nodes adjacent to the best
match are also updated, but to a lesser degree. The size of the neighbourhood and the learning rate (amount
of adjustment of the reference vector) are both reduced as training progresses. This process produces vectors
representing distinct portions of the input space. Nodes will also be most similar to adjacent nodes that each
represent a nearby region of the input space. Similarity between map nodes thus decreases with increasing
internode distance. That is, adjacent nodes have the greatest similarity and diagonally opposite corners will
have the largest dissimilarity. This is a direct result of the SOM training process (Kohonen, 1995).
In turn, SOM analysis typically involves two logical training phases: ordering and refinement. During the
ordering phase, the general shape of the map is determined. The adjustment neighbourhood starts at the
value of the larger map dimension (e.g. 5 for a 5 × 3 SOM) so that all nodes will be initially affected. The
learning rate starts at a relatively high value (e.g. 0.05, or 5% of the difference between an input record and
its closest reference vector is applied to the reference vector) so that changes are initially relatively large.
In the refinement phase (or phases), the initial size of the adjustment neighbourhood and learning rate are
reduced (e.g. to 2 and 0.02 respectively) to attempt finer adjustments over smaller subareas. This enables
further separation of related nodes from less-related neighbours (assuming a stable configuration does not
already exist). Whereas the ordering phase may produce slightly different classifications depending on the
number of training iterations, a properly applied refinement phase will lead to a convergent solution.
Once training is complete, each reference vector is an abstraction of a portion of the input data space and
each input vector maps to one reference vector, i.e. a node of the SOM map and the data mapping to that
node share similar characteristics. Because of the quantizing nature of the SOM generalization, input data
are generally not identical to their mapped reference vectors and a residual difference remains. Depending on
the application, the residuals may be useful, e.g. to compare records mapping to the same reference vector.
The quantizing process may also lead to SOM nodes that have no mapped input vectors because there are no
training data in the region of the input space represented by the SOM node. Such SOM nodes are perfectly
valid; they just represent states intermediate between those seen in the training process. It is possible that
data not present in the training set may map to one of these nodes in the future. This is an example of the
robust nature of the SOM classification process.
A SOM classification differs from more traditional linear analysis in a number of ways that give SOMs
additional power over nonlinear datasets. For example, empirical orthogonal function (EOF) analysis, or
principal component analysis (PCA), and its variants, has been widely and successfully used for many years in
the climate and atmospheric sciences to simplify large, multivariate datasets for the purposes of interpretation
and understanding (e.g. Smith et al., 1996; Mayewski et al., 1997; Sinclair et al., 1997; Reusch et al., 1999;
von Storch and Zwiers, 1999; Schneider and Steig, 2002). Because an EOF analysis by definition produces a
linear combination of orthogonal variables, it may not always be appropriate to apply this technique to data
known to have nonlinear characteristics. The resultant EOF components may not represent realistic groups of
variables (Barry and Carleton, 2001). EOF analysis also has other pitfalls less related to nonlinearity, such
as sample size issues and determining which components to retain (e.g. North et al., 1982; von Storch and
Zwiers, 1999; Barry and Carleton, 2001), that require significant experience with the technique and knowledge
of the input data. Although the SOM technique is not free of a learning curve, it does not force the data into
orthogonal linear combinations. SOM states are not linearly spaced, rather they represent an even coverage
of the probability density function (PDF) of the input data (Hewitson and Crane, 2002). Distances between
SOM states vary with the magnitude of the difference between the generalized patterns of each state. For
example, a subset of similar states is likely to be well separated from the remaining states in the grid but be
relatively close to one another. SOMs also interpolate into regions of the PDF without input data (Hewitson
and Crane, 2002). Lastly, the sum of the reference vectors from an SOM analysis will not reconstruct the
original input data, as would a summation of EOF components, because each input record has a residual with
its closest matching reference vector. The SOM states are generalized patterns from the input data, not the
data themselves.
3.1.2. Synoptic classification. Three ERA-15 variables from the middle troposphere (700 hPa) were used
in the SOM-based analyses: temperature T , geopotential height Z and specific humidity q. The 700 hPa level
was selected, somewhat subjectively, to get above ERA-15 near-surface problems and the physical surface
for (most of) West Antarctica. (A continent-wide analysis would require us to move to at least 600 hPa
because this is the first pressure level fully above the surface for the full continent.) These three variables
were selected to capture the synoptic circulation and moisture transports. (Full characterization of moisture
transport, as in Bromwich et al. (2000), also requires u and v wind components.) Because of the converging
lines of longitude in polar regions, and so that grid points represented similar spatial areas, the ERA-15 data
were first resampled to an equal-area grid. Both 250 km and 125 km versions of the National Snow and
Ice Data Center EASE-Grid (Armstrong and Brodzik, 1995) were tested with the 125 km grid being used
for this study. Grid-scale means and standard deviations were then calculated from 6 h values for annual,
semiannual and seasonal time scales for each variable from the regridded data (but only annual data have
been fully analysed). Because T , Z and q have widely different mean and extreme values, each variable was
standardized to avoid scale problems in the SOM. Point-wise anomalies from the 15-year grid-point means
of the standardized values were calculated for each grid point to highlight patterns of variability. Finally, the
six variables (anomalies of the mean and standard deviation of T , Z and q) were combined for input to the
SOM software. With each input variable representing spatial data, the SOM grid is effectively a map of maps.
Furthermore, since each input record contains six variables, each SOM node is actually six separate maps
representing the generalized spatial state for each of the variables. For simplicity, we will typically show only
one variable at a time.
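As a concrete illustration, the preprocessing described above (standardization of each variable, point-wise anomalies from the multi-year grid-point means, and concatenation of the six fields into one record per year) can be sketched as follows. This is a schematic reconstruction, not the original processing code; the function name and field names are illustrative.

```python
import numpy as np

def prepare_som_inputs(fields):
    """Build one SOM input vector per year from gridded annual fields.

    fields: dict mapping a variable name (e.g. 'T_mean', 'T_std', ...; names
    are illustrative) to an array of shape (n_years, n_gridpoints).
    """
    blocks = []
    for name in sorted(fields):
        x = fields[name].astype(float)
        # Standardize each variable over all years and grid points so that
        # T, Z and q contribute on comparable scales.
        x = (x - x.mean()) / x.std()
        # Point-wise anomalies from the per-grid-point multi-year mean
        # highlight patterns of variability.
        x = x - x.mean(axis=0)
        blocks.append(x)
    # Concatenate the six variables into one record per year.
    return np.hstack(blocks)
```

Each row of the result is one year's record; with six variables it is effectively six maps laid side by side.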
The SOM analyses were performed using SOM-PAK software (Kohonen et al., 1996). We have used three
map sizes in this study: 3 × 2, 4 × 3 and 5 × 3. Although this was not the primary goal of this work, the use of
multiple sizes allows us to look at how data grouping changes with varying numbers of SOM states available
for generalization. Smaller maps have fewer states to which the input data can be mapped, so it is expected
that some grouping of input years will occur. The 5 × 3 map has 15 states and matches the size of our input
data (15 years); thus, it is possible, if not highly probable, for each year to map to a unique SOM state. That
is, it is expected that a 5 × 3 map will only group records with significant similarity, whereas smaller maps
will have some degree of forced association due to having fewer states. Thus, the 5 × 3 maps (15 nodes)
are preferred for our purposes. Although we are interested in the SOM’s ability to generalize, we are also
interested in its ability to classify and order without supervision. Even without substantial generalization, an
SOM analysis provides useful information about the similarities and differences in the input data through the
spatial mapping of the data on the SOM grid. Each SOM analysis produces a classification of input records
Copyright  2005 Royal Meteorological Society
Int. J. Climatol. 25: 581–610 (2005)
(calendar years) grouped by the similarity of their meteorological data (with one or more years per group).
AWS and ice-core data from each group were then compared for similarity within groups and differences
between groups before being used as predictors of the SOM classifications.
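The SOM analyses here used the SOM-PAK package, but the core algorithm is compact enough to sketch. The following minimal self-organizing map, written in plain numpy rather than SOM-PAK, illustrates the sequential training rule (find the best-matching unit, then pull it and its neighbours toward the input with a shrinking Gaussian neighbourhood); all parameter values and function names are illustrative assumptions, not SOM-PAK's.

```python
import numpy as np

def train_som(X, rows=3, cols=5, iters=5000, lr0=0.05, sigma0=2.0, seed=0):
    """Minimal sequential SOM with a Gaussian neighbourhood (a sketch)."""
    rng = np.random.default_rng(seed)
    # Grid coordinates of the rows*cols nodes (x = column, y = row).
    coords = np.array([(c, r) for r in range(rows) for c in range(cols)], float)
    # Initialize reference vectors from randomly chosen input records.
    W = X[rng.integers(0, len(X), rows * cols)].astype(float)
    for t in range(iters):
        x = X[rng.integers(0, len(X))]
        bmu = np.argmin(((W - x) ** 2).sum(axis=1))   # best-matching unit
        frac = t / iters
        lr = lr0 * (1 - frac)                         # decaying learning rate
        sigma = sigma0 * (1 - frac) + 0.5             # shrinking neighbourhood
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))            # neighbourhood weights
        W += lr * h[:, None] * (x - W)
    return W, coords

def classify(X, W):
    """Map each input record to the index of its closest reference vector."""
    return np.argmin(((X[:, None, :] - W[None]) ** 2).sum(-1), axis=1)
```

Records with similar data map to the same or neighbouring nodes, which is exactly the grouping behaviour exploited in the classifications below.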
3.1.3. Assessment of results. As with many statistical analysis methods for grouping similar data, including
cluster analysis, it can be difficult to determine a ‘perfect’ classification. Assessment of an SOM classification
is both a subjective and quantitative process, especially with smaller datasets (details of individual mappings
are often less important as the size of the input dataset grows). As always, it is important to bear in mind
the question being asked of the analysis method when trying to determine whether the method has given
a reliable answer. Ideally, the refinement phase produces one convergent solution from dissimilar ordering-phase classifications. In the event that this is not possible, other criteria are required. For example, training
might be considered ‘done’ (subjectively) once mappings to the diagonal corners of the map have stabilized,
since these nodes are often the most distinct (although continuing beyond this stage is usually recommended).
Sammon maps (Sammon, 1969), which provide two-dimensional projections of multidimensional vectors, are
a common method for this type of evaluation. Quantization error, the difference between the input data and
the reference vectors, can be useful as a quantitative measure of SOM ‘error’, but it is unlikely to be the only
useful metric. For example, there may be little need to continue reducing the error if the mappings have
stabilized across all nodes. Similarly, there is no guarantee that continued efforts to reduce the error will lead
to stable mappings if some data are right on the boundary between quantized states. However, in theory,
even those data records can be separated successfully if enough time is spent on training, though this is often
unnecessary. Examination of residuals provides another quantitative assessment approach.
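Of the quantitative measures mentioned, quantization error is the simplest to state: the mean distance from each input record to its closest reference vector. A minimal sketch, with `X` holding the input records row-wise and `W` the reference vectors (names are ours):

```python
import numpy as np

def quantization_error(X, W):
    """Mean Euclidean distance from each input record (rows of X) to its
    closest reference vector (rows of W)."""
    d = np.sqrt(((X[:, None, :] - W[None, :, :]) ** 2).sum(axis=-1))
    return float(d.min(axis=1).mean())
```

Training can reduce this value, but, as noted above, stable mappings may matter more than squeezing out the last of the error.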
After much testing and evaluation, we settled on a combination of input-data mapping stability (i.e. are
groups still changing significantly) and quantization error to determine how much training was enough. With
the small number of input records, as few as 1000 iterations were sufficient to provide initial ordering of the
input data, although 5000 were preferred. (A run of 5000 iterations typically required only a few minutes
to complete on an 867 MHz Apple Macintosh G4 Powerbook laptop.) A refinement run of up to 20 000
iterations often produced useful further separation of the modes; longer run lengths had only minor benefits.
3.2. Climate upscaling
3.2.1. Overview. FF ANNs were also used for our climate upscaling studies, but with the freely available
NevProp package (Goodman, 2002). Climate upscaling uses surface data to predict corresponding SOM-classified synoptic states. In this way, synoptic conditions can be reconstructed from a surface sampling
of the atmosphere. More surface sites and greater record length in the calibration period both improve the
quality of such reconstructions. Two surface records were used as predictors: AWS observations (six sites,
15 years) and ice-core climate proxies (four sites, 40 years). Each dataset allowed us to evaluate the surface-to-mid-troposphere relationship, but with different levels of dating uncertainty and, thus, different temporal
resolutions. AWS observations are effectively perfectly dated records and thus support comparisons at up to 6 h resolution. Upscaling from AWS data therefore allows us to test the approach under best-case conditions
for a given number of AWS sites. Because of the shortness of the AWS record, however, only very limited
testing of data not used in training is possible. As noted previously, ice-core records are generally limited
to annual or lower resolution because of assumptions about the rate of deposition and dating uncertainties.
This restricted temporal resolution is offset by the potentially much greater length of the ice-core records.
Calibration is still done for the 15 year overlap with ERA-15 data, but upscaling can be done further into
the past using the ice-core data. In this study, the four ice cores provided an additional 25 years at all sites,
taking the upscaling back to 1954.
3.2.2. Data coding. Training of an upscaling ANN requires an encoding of the predictor and target data.
Full details of how this was done are presented in Appendix A. Briefly, the predictor data are simply the
measurements at all sites as standardized anomalies from the study period mean (15 years for AWS, 40 years
for the ice cores) at each site. To match the ice-core accumulation data (and SOM analysis data), the AWS
and ice-core chemistry data were averaged to annual values prior to calculating the anomalies. The target
SOM classifications were encoded as x, y grid coordinates from the 5 × 3 SOM map (after testing three
possible encoding schemes). Because SOM grid coordinate values are predicted as real numbers (our ANNs
use floating-point outputs), issues arise regarding how best to convert the predictions to usable values. We
have chosen only to map predictions to the closest SOM grid coordinates (based on shortest distance). See
Appendix A for further details related to the pros and cons of this approach.
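A minimal sketch of this closest-node mapping, assuming the 5 × 3 grid with nodes at integer coordinates one unit apart (the helper name is ours, not NevProp's); it returns both the snapped coordinate and the distance moved, i.e. the quantization error of the mapping:

```python
import numpy as np

def snap_to_grid(pred, cols=5, rows=3):
    """Map a real-valued (x, y) prediction onto the nearest node of a
    cols x rows SOM grid whose nodes sit at integer coordinates one unit
    apart. Returns the snapped (x, y) and the distance moved."""
    nodes = np.array([(c, r) for r in range(rows) for c in range(cols)], float)
    d = np.sqrt(((nodes - np.asarray(pred, float)) ** 2).sum(axis=1))
    i = int(np.argmin(d))
    return (int(nodes[i, 0]), int(nodes[i, 1])), float(d[i])
```

For instance, a prediction of (2.34, 1.79) snaps to node (2, 2), with the Euclidean distance between the two serving as the mapping error.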
3.2.3. Generalization. Because of the limited number of input records (15), further limited by grouping,
it is more challenging than in typical ANN practice to train an ANN that does not overfit the data. A fraction
of the input can, of course, be withheld for validation, but the benefit of this is likely to be outweighed by
the severe reduction in the number of training cases. Standard practice would have us withhold ∼30% of the
input, which would cut the training set down to 10 or 11 records without necessarily bringing much benefit
of generalization skill. Our approach to avoiding this problem is to generate groups of new input records
by adding small amounts of noise to each original record. In this way, the input data set is enlarged with
prediction vectors that are close to the original vectors and map to the same targets, thus improving network
generalization (Haykin, 1999). A scientific rationale for this approach comes from the hypothesis that any
local meteorological measurement is a function of synoptic/regional effects, fixed local effects (e.g. orography)
and variable local effects (e.g. soil moisture, in the general case). Upscaling takes a local measurement and
maps it to a synoptic/regional value. Because there are local effects not captured by the synoptic/regional
state, it is reasonable to suppose that values close to the local value will map to the same state. In this case,
we simply add small amounts of normally distributed noise to reproduce this effect. The distance from the
original to the new vectors was constrained to be within 0.03 standard deviations to keep the new vectors
in a small cluster around the original vector. Obviously, other noise distributions could be used, as well as
other approaches to adding the noise, but this approach has the advantage of conceptual simplicity. With the
additional input records (10 to 20 for each original record), traditional techniques for robust ANN training
again become viable.
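The augmentation procedure can be sketched as follows. The 0.03 standard-deviation radius is from the text; the function name, the choice of a Gaussian draw, and rescaling any draw that exceeds the radius are our illustrative assumptions.

```python
import numpy as np

def augment_with_noise(X, y, copies=10, max_dist=0.03, seed=0):
    """Enlarge a standardized training set with noisy copies of each record.
    Every copy keeps its source record's target and lies within max_dist
    (in standard-deviation units) of that record."""
    rng = np.random.default_rng(seed)
    Xs, ys = [X], [y]
    for _ in range(copies):
        noise = rng.normal(0.0, max_dist / 3.0, size=X.shape)
        # Rescale any perturbation that would leave the allowed radius.
        norms = np.linalg.norm(noise, axis=1, keepdims=True)
        noise = np.where(norms > max_dist, noise * (max_dist / norms), noise)
        Xs.append(X + noise)
        ys.append(y)
    return np.vstack(Xs), np.concatenate(ys)
```

With 10 copies per record, the 15 original records become 165 training cases, all clustered tightly around their sources and sharing their targets.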
We have used two widely accepted techniques, cross-validation and bootstrapping, to attempt to improve
generalization and avoid overfitting of the training data. Cross-validation splits the input data into training
and testing (holdout) subsets. The training subset is used to adjust the network weights iteratively and ‘learn’
the data as in standard training. The testing subset is used to check whether the ANN has overfit the training
data. Increasing errors from the testing subset strongly suggest that training has gone too far and should be
stopped. For this reason, this approach is also known as early stopping. NevProp implements early stopping
as a two-phase process. In phase one, multiple ANNs are trained using random splits of the input data based
on a user-defined percentage holdout level (e.g. set aside 30% for testing). Training of each ANN continues
until the testing error starts to rise, at which point the ANN is saved as the best version for that data split.
Phase two uses the complete input dataset for training and the mean error of the ANNs trained in phase one
as the target to stop training. With this approach, NevProp determines an unbiased estimate of the mean error
value at which to stop training by doing early stopping training on multiple splits (up to 10) of the input data.
The final model (ANN) is produced by phase two.
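Phase one of this two-phase scheme can be sketched schematically. Here `train_fn` stands in for an actual ANN training loop (NevProp's internals are not reproduced); it is assumed to yield one (training error, testing error) pair per epoch.

```python
import numpy as np

def early_stopping_error(train_fn, X, y, holdout=0.3, splits=10, seed=0):
    """Phase one of NevProp-style early stopping (a schematic sketch).

    train_fn(Xtr, ytr, Xte, yte) -> iterable of (train_err, test_err) per
    epoch; it is a user-supplied training loop (an assumption here).
    Returns the mean training error at the point where each split's test
    error first started to rise; phase two would then train on all data
    down to this error level.
    """
    rng = np.random.default_rng(seed)
    stop_errs = []
    n = len(X)
    for _ in range(splits):
        idx = rng.permutation(n)
        k = int(n * holdout)
        te, tr = idx[:k], idx[k:]
        best_test, stop_train = np.inf, None
        for train_err, test_err in train_fn(X[tr], y[tr], X[te], y[te]):
            if test_err > best_test:      # test error rising: stop this split
                break
            best_test, stop_train = test_err, train_err
        stop_errs.append(stop_train)
    return float(np.mean(stop_errs))
```

The key idea is that the holdout splits provide a target training-error level that phase two can aim for without itself needing a holdout set.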
The second technique for improving generalization, bootstrapping (e.g. Efron and Tibshirani, 1993), is
logically a level higher than the early-stopping technique. NevProp’s version creates a user-defined number
of ‘booted’ datasets by sampling with replacement from the original input data (thus, some samples may
be replicated and others omitted entirely). Each of these datasets is then used as input for training with
early stopping. Results from each booted dataset are then used to adjust overall performance statistics to
account for cross-validation training being based on a smaller subset of the input space and, thus, possibly
producing overly optimistic performance statistics. Bootstrapping as implemented by NevProp also provides
95% confidence intervals on predicted data.
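The resampling step itself is simple to sketch; each ‘booted’ dataset below would then feed an early-stopping training run (the generator name is ours, not NevProp's).

```python
import numpy as np

def bootstrap_datasets(X, y, n_boots=20, seed=0):
    """Yield 'booted' copies of (X, y) by sampling records with replacement,
    so some records repeat and others are omitted entirely."""
    rng = np.random.default_rng(seed)
    n = len(X)
    for _ in range(n_boots):
        idx = rng.integers(0, n, size=n)
        yield X[idx], y[idx]
```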
3.2.4. ANN configurations and ensembles. Table II summarizes configurations used to predict the SOM
classifications from ice-core data. To address situations where the early-stopping criteria were not satisfied,
and to provide another means to avoid overtraining, two different maximum iteration stopping points were
Copyright  2005 Royal Meteorological Society
Int. J. Climatol. 25: 581–610 (2005)
592
D. B. REUSCH, B. C. HEWITSON AND R. B. ALLEY
Table II. Summary of ice-core-based upscaling configurations

Group^a   Hidden nodes^b   Iteration limit (×10⁻³)^c   Predictors
1a        4                10                          Accumulation
1b        4                20                          Accumulation
2         5                20                          Na+
3a        4                10                          SO4^2−
3b        4                20                          SO4^2−
4a        3                10                          Accumulation and Na+
4b        3                20                          Accumulation and Na+
5a        4                10                          Na+, SO4^2−
5b        4                20                          Na+, SO4^2−
6         5                20                          Accumulation, Na+, NO3^−, SO4^2−

^a All versions used 10 extra training records per data record, added using the noise methodology (Section 3.2). All versions also used the x, y grid coordinate from the SOM directly as the ANN target. Groups with an alphabetic suffix are results from an ensemble of 20 individual ANNs.
^b Number of nodes in the hidden layer of the ANN.
^c The maximum iteration count used in training.
used for each configuration. To test ANN skill further, we also created ANN ensembles by training multiple
instances using the same configuration of predictors and hidden nodes and repeating the above training steps.
This produced an ensemble of 20 ANNs for each configuration in which each instance was trained from
slightly different starting conditions and with different subsets and ordering of the training data.
4. RESULTS
Before describing our results, it is worthwhile reviewing the main assumptions involved in these analyses:
• ERA-15 provides a reasonably valid representation of the free atmosphere over West Antarctica.
• AWS observations are representative of the near-surface environment.
• Synthesized AWS data are valid and capture the natural variability in the system.
• Ice-core dating is accurate enough for the annual values to be valid.
4.1. Synoptic classifications
4.1.1. Generalization and SOM grid size. The characteristics of an SOM-based climatology are influenced
by the grid size of the SOM, since this affects the level of generalization. The strongest associations, i.e.
groupings of years, will persist as the SOM size increases. Weaker associations will shift as more SOM nodes
become available to differentiate the data. We have used three sizes of SOM specifically to examine this
behaviour. Figure 4 and Table III summarize the grouping of years from annual analyses by our three grid
sizes (3 × 2, 4 × 3 and 5 × 3). Groups classified by each SOM are listed by the rows in Figure 4. Each row
shows the year groups (grey boxes) identified by each SOM. Shading and black boxes around groups show
the history of each group from the largest (5 × 3) to smallest (3 × 2) SOM. In particular, three groups from
the 5 × 3 SOM (1980, 1988; 1983, 1993; 1991, 1992) are seen to be quite stable at the three generalization
levels. Each of these groups starts as part of a larger group in the smallest SOM, but the additional years
move to new groups as the SOM size increases. For example, 1980 and 1988 are grouped with 1979 and
1981 in the smallest SOM. In the 4 × 3 SOM, 1980 and 1988 are grouped separately and the other years
have moved to two new groupings. Robust year groups suggest that those years are highly separable from
the rest of the data. This is particularly true for the 1980, 1988 and the 1983, 1993 groups since these are
mapped to the corners of the 5 × 3 SOM.
Figure 4. Generalization and grouping by annual SOMs. Each row represents groups determined by a given size SOM. Sizes are noted
at the left side. Groups classified by a given SOM indicated by grey boxes in each row. Shades, lines and black boxes indicate the
history in the smaller SOMs of the groups in the largest (5 × 3) SOM. Numbers at the bottom indicate the 5 × 3 SOM group number.
Forced generalization decreases from top to bottom as the SOM size increases and more states become available. This figure is available
in color online at http://www.interscience.wiley.com/ijoc
Table III. Summary of SOM-based classifications of annual data, by SOM size

Group   SOM grid coordinate   Years

3 × 2 SOM
1       0, 0                  1983–84, 1989, 1993
2       1, 0                  1985
3       2, 0                  1979–81, 1988
4       0, 1                  1982
5       1, 1                  1986–87, 1990–92

4 × 3 SOM
1       0, 0                  1980, 1988
2       1, 0                  1979
3       2, 0                  1985, 1989
4       1, 1                  1981, 1987
5       3, 1                  1983, 1993
6       0, 2                  1990–92
7       2, 2                  1982, 1984, 1986

5 × 3 SOM
1       0, 0                  1980, 1988
2       2, 0                  1985–87
3       4, 0                  1983, 1993
4       3, 1                  1982, 1984
5       0, 2                  1979, 1981
6       2, 2                  1991–92
7       4, 2                  1989–90
Together with Table III, the Sammon map (a two-dimensional projection of the SOM reference vectors; Sammon, 1969) for the 3 × 2 SOM (Figure 5(a)) shows that the annual data map almost entirely into three well-separated groups on nodes (0, 0), (2, 0) and (1, 1). At this level of generalization, only 1982 (node 0, 1) and 1985 (node 1, 0) are distinct and ungrouped with other years. The Sammon distances between node (2, 0)
and its neighbours also suggest that this group of years (1979, 1980, 1981 and 1988) is well separated from
the remainder of the data.
The next larger size SOM grid (4 × 3, Figure 5(b)) shows the three large groups of the 3 × 2 SOM splitting
into smaller groups. Year associations also change as the new groups are formed. For example, 1989 splits
Figure 5. Sammon mappings of various size SOM grids: (a) 3 columns × 2 rows; (b) 4 × 3; (c) 5 × 3. Axes represent distance in the
two-dimensional projection space and are effectively without units. Each SOM node is labelled with its coordinates in the SOM column,
row grid with columns increasing left-to-right and rows increasing top-to-bottom. Values on graph edges are the distances between the
projected SOM nodes rounded to an integer
from its four-member group to join the singleton 1985. Four of the final (5 × 3 SOM) seven groups (1980,
1988; 1983, 1993; 1982, 1984; 1991, 1992) have been identified although two of these (1982, 1984; 1991,
1992) still have an extra year in the 4 × 3 group (1986 and 1990 respectively). Each corner group remains
well separated from its neighbours.
The largest size SOM grid (5 × 3, Figure 5(c)) shows the final year groups. Two corners retain their 4 × 3
grid groups (1980, 1988; 1983, 1993) and are somewhat more differentiated in the larger grid. Three years,
1979, 1986 and 1987, have rejoined years with which they were originally associated in the 3 × 2 grid.
Lastly, the 5 × 3 grid provides eight states unmapped by the input data. The result is that all mapped states
have unmapped states as their neighbours. Thus, the SOM has both grouped similar input years and provided
intermediate states between all the mapped groups.
4.1.2. Generalized patterns. Figure 6 presents the 15 generalized patterns for 700 hPa average annual
temperature as extracted by the 5 × 3 SOM analysis and expressed as anomalies from the standardized grid-wide annual average (as described in Section 2.1). It is also important to remember that this is only one figure
from a set of six (one each for the six variables analysed) and that the complete analysis provides generalized
patterns for all six variables (e.g. Figure 7). The annual values map to seven distinct groups (Table III) in
Figure 6. The 5 × 3 SOM classification map for 700 hPa temperature anomalies. Each node is a generalized pattern extracted from the
original data by the SOM analysis as described in the text. Values are grid-point anomalies from the grid-wide mean. Zero contour is
in bold, and relative highs and lows are labelled as H and L respectively. Labels above selected nodes identify the input years that most
closely match the pattern shown in the node. Nodes with no label represent intermediate states not seen in the input data. Figure 5(c)
shows distances in SOM space between the nodes
this SOM analysis, as shown by the year labels over the maps in Figure 6. This effectively says that, with
this size SOM grid, the data primarily cluster in pairs with one group of three. That is, within the 15 year
period, there is still a fair amount of variability between the years. For later reference, these groups will be
numbered in left-to-right, top-to-bottom order (also see Figure 4).
To aid the interpretation of the SOM analysis results, the generalized patterns for the six variables analysed
are presented in separate figures, of which three are shown here (Figure 7). Two of these groups are from
the corners of the SOM grid (Figure 6) to emphasize differences. Group 1 (Figure 7(a), 1980 and 1988)
shows warmer temperatures, higher geopotential height and increased moisture over a broad region centred
along the Amundsen Sea and Marie Byrd Land. Positive height anomalies exceed 40 m at the centre of the
pattern and are at least 10 m over all of West Antarctica and the Antarctic Peninsula. The temperature
anomalies of up to 2 ° C at 700 hPa are also seen at similar magnitude in AWS surface records from
across West Antarctica for both these years (Reusch and Alley, 2004). The pattern of increased moisture
Figure 7. Generalized patterns as anomalies from mean values for specific year groups: (a) 1980 and 1988, (b) 1989 and 1990, (c) 1983
and 1993. Contour limits and interval, in standard deviations, shown beneath each map with the zero contour in bold. Relative highs
and lows are labelled as H and L respectively. Variables as described in text
over the western Amundsen Sea and Ross Sea is at least partly explained by the warmer air, as seen in
temperature and increased 700 hPa heights. Group 1 also shows increased stability (i.e. reduced variability)
in the 700 hPa temperature and height fields (negative anomalies in the standard deviation fields) with more
variability (positive anomalies) in the moisture field. The latter is at least partly due to the higher absolute
values for specific humidity in the generally very dry Antarctic atmosphere. Although the relationship is
not universal, higher variability is often associated with higher absolute specific humidity in this region.
The generalized patterns for 1980 and 1988 thus describe a warmer, wetter and generally less variable
mid-troposphere.
Diagonally opposite to Group 1 on the SOM grid (Figure 6) is Group 7 (Figure 7(b), 1989 and 1990).
A distinguishing feature of this generalized pattern is the large negative anomaly (more than 20 m over
the Amundsen and Bellingshausen Seas) and above-average variability in geopotential height. The former
is related to an eastward extension and overall deepening of the Amundsen Sea low from a centre over the
eastern Ross Sea, especially in 1989. The latter likely represents a shift of storm tracks into the southern
Antarctic Peninsula/Bellingshausen Sea region due to the height anomaly. The area of higher moisture over the
Weddell Sea is similarly explained. Except for this region, moisture is close to average values, with moderate
departures from average variability in the Ross Sea (positive) and Victoria Land (negative). Temperature
variability is moderately above average over much of the region.
From the upper right corner of the SOM grid, the main feature of Group 3 (Figure 7(c), 1983 and 1993)
is the large positive anomaly in the standard deviations of all three variables. Geopotential height variability
exceeds 15 m above average values over much of the Ross Sea. Strong temperature and moisture variability
anomalies fit within the maxima of the height variability anomaly, with peak values over Victoria Land and the
western Ross Sea. In the mean, temperatures are close to average values (±0.5 ° C), geopotential heights are
moderately reduced, and moisture is generally below normal, apart from the Victoria Land anomaly. Thus,
the generalized patterns for 1983 and 1993 describe mean values close to the study period average, with
enhanced variability over the Ross Sea and Victoria Land. The map of average geopotential height suggests
that a source for the variability may lie in the central South Pacific outside the study area.
4.2. Climate upscaling
4.2.1. Data stratification. Before attempting to upscale to the SOM classifications, it is useful to examine
surface data associated with each group. Figure 8 shows AWS temperatures and accumulation rate data
for each SOM-classified year group. In the ideal case, each group of surface data would be distinct from
all other groups. This would indicate an unambiguous (if not necessarily simple) relationship between the
surface record and atmospheric conditions. A nearly perfect example is seen in the temperature data for 1980
and 1988 (Figure 8(a)). Both years have the same characteristics at all six AWS sites, i.e. unambiguously
warm temperatures. That the other patterns are less clean (e.g. Figure 8(b)) does not make them unusable, but
hinders their use as predictors. Given enough training data, an ANN should be able to determine a relationship
between a set of noisy predictors and the SOM classifications. That is, we should be able to train an ANN to
Figure 8. Surface data (normalized) for each SOM-classified year group, by site: (a) AWS temperatures; (b) ice-core accumulation rates.
Each plot shows the surface or proxy data associated with each group of years identified in the SOM analysis. AWS data are shown
in geographic order by longitude, east to west (see Figure 1). Ice-core data are ordered site A to site D. Labels on x axis denote the
years in each SOM group and each year’s data are grouped by shade. For example, the data shown for 1980 (light grey) and 1988 (dark
grey) are associated with the generalized 700 hPa patterns in Figure 7. The data are thus ‘stratified’ into groups based on the SOM
classification. Values are normalized to allow for different ranges at the different stations/sites
take possibly noisy (or ‘smudged’) fingerprints that the atmosphere has imprinted on surface data and relate
them back to our set of classified states and thus reconstruct past conditions in the atmosphere.
4.2.2. AWS-based upscaling. Although the AWS records currently provide no long-term opportunity to
reconstruct the atmosphere from the SOM patterns (the records start only in 1979), we still considered it
useful to explore upscaling with the AWS data as predictors. This served as an alternate way to test the ANN-based upscaling methodology. The AWS data are also inherently cleaner (less noisy), because their dating is far more accurate than that of the ice-core records. The issue of circularity needs consideration
when using the reconstructed AWS records to predict ERA-15 data, since ERA-15 upper air variables were
in fact used in the reconstruction process. Three facts support our belief that circularity can be ignored. First,
the upscaling ANN is attempting to predict a pair of integer grid coordinates, not the actual atmospheric data.
Second, variables from multiple upper air levels and a limited spatial extent were used in the reconstructions,
not just the 700 hPa level. Third, the original AWS data do not appear to have been used during the ERA-15
reanalysis itself and, thus, do not influence the ERA-15 dataset.
Because our emphasis is on reconstructions prior to 1979, only one scenario, with one year held out of
the training set, was run using AWS data. A thorough evaluation of AWS-based upscaling would call for
additional testing with different years held back for validation and different sets of predictors. Our one scenario
used the temperature records for all six sites as predictors for two ensembles (maximum iteration limits of
10 000 and 20 000) of 20 ANN models each. Three hidden-layer nodes were used for all ANNs. 1988 was
held back from the training set to provide a completely independent test of the ANN (beyond the normal
splits during training). Table IV summarizes the results from this testing. For this configuration, additional
training iterations (ensemble 2) appear to result in ANNs with reduced skill (fewer correct predictions during
training and testing). However, even the more skilled ensemble still mispredicted 1988 in 25% of its members.
Table V provides an alternate evaluation of skill during training based on differences between predicted
and expected x, y coordinate values (after mapping to the grid). Errors are broken down by just x or y being
incorrect or both values being wrong. The ANNs appear to be better able to predict y than x, for both single
and double errors. Overall, ∼68% (∼57%) of predictions for x (y) are within two (one) unit(s) of the correct
Table IV. Summary of AWS-based upscaling ensembles

Ensemble^a   Skill^b          # Wrong^c        Correct^d    Prediction error^e
             μ      σn−1      μ      σn−1                   μ      Min    Max
1 (10)       0.95   0.02      4.6    2.0       15 (75%)     0.8    0.1    4.1
2 (20)       0.95   0.02      5.1    1.7       11 (55%)     1.1    0.5    5.0

^a Value in parentheses is the maximum iteration count used in training, in thousands.
^b Skill is the R² value as reported by NevProp for the training ANNs. This is not the version of R² from traditional statistics, but a scaling of the mean square error to the range 0–1 (Goodman, 1996). It has also been adjusted based on the bootstrapping methodology, which attempts to assess how the ANN will perform on data outside the training set.
^c Number of incorrect predictions during training, out of 14 known targets (years 1979–93 less 1988).
^d Number of correct predictions during testing, out of 20 ensemble members. A correct prediction is x, y values that match the SOM x, y values for 1988, i.e. 0, 0, after being mapped to the nearest SOM x, y coordinates.
^e Quantization error related to mapping an ANN x, y prediction to the nearest SOM x, y coordinates. SOM nodes are spaced one unit apart from their vertical and horizontal neighbour(s).
Table V. Summary of mapping errors from AWS-based upscaling ensembles. Mean and standard deviation are for the difference between the prediction and the correct value for the x or y coordinate

Ensemble   Total wrong^a   Just x wrong             Just y wrong             Both x and y wrong^c
                           n^b      μ      σ        n^b      μ      σ        n^b      μ          σ
1          102 (20)        20 (20)  −1.7   1.3      40 (39)  0.3    1.0      42 (41)  2.8, −0.1  1.3, 1.8
2          91 (18)         21 (23)  −1.8   0.7      29 (32)  0.1    1.0      41 (45)  2.8, 1.2   −0.1, 1.9

^a Percentage of total predictions shown in parentheses, out of 500 predictions.
^b Percentage of prediction errors shown in parentheses, out of the total wrong.
^c Mean and standard deviations are for both x and y.
Copyright  2005 Royal Meteorological Society
Int. J. Climatol. 25: 581–610 (2005)
value. That only one independent target is being predicted in these ensembles makes it difficult to come to
any strong conclusions about the value of AWS data for upscaling. However, with correct prediction of the
unseen year (1988) 75% of the time and mapping errors within one or two units (in y and x respectively) for
60–70% of the training data, the results are definitely promising. It is therefore reasonable to conclude that a limited suite of AWS sites can be used effectively to estimate the large-scale circulation patterns of the atmosphere.
4.2.3. Ice-core-based upscaling. Characteristics of 10 ANN configurations used to predict SOM
classifications from ice-core data are summarized in Table II. After training, each of these ANNs was used to
predict the SOM classifications for 1954–78 using corresponding ice-core data. Six sets of predictors were
used with two 20-member ensemble runs for four of the six predictor sets. Specifically, the predictors tested
are three singletons (accumulation, Na⁺, SO₄²⁻), two pairs (accumulation, Na⁺; Na⁺, SO₄²⁻) and one ‘kitchen-sink’ (accumulation, Na⁺, NO₃⁻, SO₄²⁻). In all cases, data from all four sites were used for each variable in
the predictor set resulting in 4, 8 and 16 ANN inputs respectively for the three predictor categories. Table VI
summarizes leading statistics for the 10 configurations tested. Based on NevProp’s adjusted R 2 and the
number of incorrect predictions in the training set, prediction skill of all the networks is high. Unfortunately,
the prediction errors suggest a different story, i.e. that the networks may be overtrained despite all attempts
to avoid this. The prediction error represents errors related to quantization of real-valued x, y predictions to
the integer-valued SOM x, y grid. For example, an ANN prediction of (2.34, 1.79) would be mapped to (2,
2) with a Euclidean distance prediction error of 0.4. The prediction errors for the training data are small,
but values for the testing data (1954–78) are substantially larger (typically an order of magnitude). Mean
prediction errors for the testing data (last column), and examination of the raw prediction values (before
mapping to the grid range), shows that many predictions actually lie outside the grid and are being mapped
into corner or edge nodes from more than the average within-grid internode half-distance (i.e. 0.71 since
nodes are on a one-unit grid). An alternative explanation to the ANNs being overtrained is that the prediction
error may indicate that the training data are insufficient for representing the data space covered by the testing
data. We will return to this topic in Section 5.
Table VI. Summary of statistics for ice-core-based upscaling ensembles

                                           Prediction error^c
          Skill^a          # Wrong^b       Training            Testing
Group     µ      σn−1      µ      σn−1     µ(Σ)    µ(µ)        µ(Σ)    µ(µ)
1a        0.98   0.01      1.1    1.4      2.53    0.16        28.1    1.10
1b        0.98   0.01      0.4    0.6      2.36    0.16        33.8    1.36
2^d       0.98   –         3      –        2.70    0.20        28.9    1.20
3a        0.97   0.01      0.9    1.0      3.26    0.20        52.6    2.11
3b        0.97   0.01      0.8    1.2      3.35    0.21        60.7    2.42
4a        0.97   0.01      1.2    1.3      2.97    0.19        46.8    1.88
4b        0.98   0.01      1.2    1.6      2.09    0.15        46.3    1.85
5a        0.95   0.01      1.8    1.8      3.43    0.23        48.9    1.95
5b        0.95   0.02      2.0    1.7      3.16    0.22        55.9    2.23
6^d       1.00   –         0      –        0.00    0.00        17.4    0.70

a As in Table IV.
b Number of incorrect predictions during training, out of 15 known targets (years 1979–93).
c As in Table IV, the values represent the quantization error associated with mapping a real-valued x, y prediction to the integer-valued SOM x, y grid. Two statistics are shown for the training and testing predictions. Each is the mean value over the ensemble (or the actual value for non-ensembles). µ(Σ) is the mean of the total quantization error and µ(µ) is the mean of the mean quantization error, i.e. the sum divided by the number of predictions (15).
d Results are from a single ANN, not an ensemble; therefore, standard deviation does not apply and the mean values are the actual results.
Table VII. Predictions of SOM grid x, y coordinates from ice-core data for 1954–78, the period outside the training set. Predictions from an ensemble group are from a representative ensemble member, i.e. one of high but not necessarily highest skill. Most common prediction, if any, is highlighted in bold for each year

       SOM x, y for group
Year   1a    1b    2     3a    3b    4a    4b    5a    5b    6
1954   2,2   2,1   1,2   4,1   4,2   2,2   0,2   4,1   4,0   2,1
1955   3,2   3,2   1,2   0,2   2,2   2,2   2,2   0,2   0,0   3,2
1956   4,2   4,2   0,1   0,2   0,1   2,2   2,0   0,1   0,1   1,2
1957   3,2   3,2   3,2   3,2   0,1   2,2   3,1   0,2   0,1   3,2
1958   1,0   1,0   2,0   2,0   2,2   0,1   0,0   0,2   1,2   1,1
1959   2,2   4,2   1,0   4,1   4,2   1,1   3,1   4,1   4,1   4,1
1960   0,2   4,0   1,2   1,0   0,2   0,0   2,0   0,0   0,0   2,0
1961   1,1   2,2   0,0   1,1   4,2   2,2   0,0   1,2   2,1   3,1
1962   0,2   0,2   0,0   2,0   2,2   0,0   3,2   1,1   4,2   3,2
1963   0,0   0,0   0,0   4,2   4,2   0,0   0,0   4,2   2,1   2,1
1964   4,0   4,1   3,1   0,0   1,2   4,1   4,2   4,1   1,0   4,2
1965   2,1   2,2   0,1   0,0   4,2   2,2   0,1   4,2   4,2   3,0
1966   1,0   1,0   4,0   4,1   4,2   1,0   0,1   0,0   1,0   2,0
1967   2,2   1,2   0,0   1,2   0,1   4,1   3,2   3,2   0,1   3,1
1968   2,1   2,2   1,2   1,2   4,2   4,1   2,0   4,1   2,1   1,0
1969   2,2   2,2   2,0   3,1   2,0   2,2   4,0   4,1   2,0   3,0
1970   2,2   1,0   4,0   2,0   1,2   2,2   1,2   4,1   0,1   1,1
1971   1,0   1,0   4,2   0,2   0,1   0,0   0,1   0,1   0,0   0,2
1972   4,2   4,2   0,0   1,1   0,2   0,0   0,2   0,0   1,2   2,2
1973   1,0   1,0   2,0   3,1   2,2   2,2   0,0   2,2   0,1   1,2
1974   0,0   1,0   0,0   3,1   4,2   1,2   0,0   4,2   3,0   4,1
1975   1,2   2,2   4,2   4,1   2,1   2,1   3,1   0,2   0,2   4,2
1976   2,2   1,0   3,1   2,2   0,1   4,0   0,0   0,2   0,2   0,2
1977   2,2   4,1   2,0   1,2   4,2   2,2   4,0   4,0   4,2   3,2
1978   2,2   4,0   3,2   0,2   0,1   2,2   1,0   1,2   0,0   1,2
Table VII summarizes predicted SOM grid coordinates for the testing period (1954–78) based on the various
predictors. For those groups that are ensembles (all but groups 2 and 6), the predictions listed are from a
representative ensemble member. Determining which ensemble member has the highest skill is somewhat
subjective because of the variety of metrics being used. Thus, although the ANNs listed in Table VII may
not have the highest overall skill, they are still among the best in their ensembles. For years that have any
agreement in the predictions, the most common prediction is highlighted (ties are not highlighted). At first
look, Table VII suggests that the ANN-based approach is not very successful, since there is so little agreement
on predictions between the various ANNs. As further explained below, this is only partly true.
5. DISCUSSION
The ANN-based upscaling results based on ice-core datasets as predictors (Table VII) suggest that the
available training data are not sufficient to produce reliable predictions and, thus, preclude development of a
deterministic, ice-core-based climate reconstruction. Instead, it appears that a more probabilistic approach is
needed for determining the appropriate circulation patterns from the ice-core predictors. This approach awaits
better (i.e. longer) atmospheric and ice-core datasets from a broader spatial region. Meanwhile, possible
sources of error affecting the ANN’s skill, robustness of the main methods, and potential refinements to the
methodology are discussed in the following sections.
5.1. General issues and limitations
Despite following established practice for ANN training and applying multiple safeguards against
overtraining, ANN predictions on non-training data vary considerably from one model to the next. With
ANN training methodology ruled out as the source of this variability, the other main possible explanations
are inadequate predictor data, dominance of local forcing and the absence of a climate record in the ice-core
data. Although ice-core datasets are known to be noisy to varying degrees (e.g. Benoist et al., 1982; White
et al., 1997), a climate signal has been solidly established (e.g. Lorius, 1989; Delmas et al., 1992; Legrand
and Mayewski, 1997; Shuman et al., 1998; Reusch et al., 1999); thus, we turn our attention to the data. One
of the keys to applying ANNs successfully for prediction is to ensure that the training data cover the full
range of the input space (Haykin, 1999). In other words, if an ANN is given input data from outside the range seen in training, its predictive skill will likely drop. Figure 9 summarizes the individual training and testing
predictors for the accumulation rate data. Taken by individual site, only Site A has significant testing data in
a range not covered by the training data. All ANNs using accumulation rate as a predictor use the data from
all four sites; so, collectively (and subjectively), the input space appears to be reasonably well covered by the
training data, at least when the predictors are examined individually in one dimension. A different conclusion
may be drawn in the native four-dimensional space in which the data reside.
Evaluating the representativeness of training data rapidly becomes very difficult as the size of the input
vector increases, although Sammon maps can help. SOMs may also be useful for the assessment of upscaling
training data coverage. A SOM trained on the complete set of data used as predictor input to the upscaling
[Figure 9. Distribution of normalized accumulation rate data by training (dark grey) and testing (light grey) predictors. For ideal ANN training, the range and coverage of the training data should match that of the prediction data. Verification is made more difficult by the fact that all four sites are used together as predictors. The x axis is in standard deviations; the y axis is a frequency count.]
ANN can be used to characterize the input space. Mapping the upscaling training and testing data separately
to the SOM should reveal the coverage of each data set relative to the SOM classifications space. If the
upscaling training and testing data map to all the same SOM nodes, then there is strong evidence that the
training and testing data both cover the same portions of the input space. Nodes that are only mapped by the
upscaling testing data suggest (but do not prove) that the training data are not capturing the complete input
space. Unfortunately, it is difficult to say that an SOM analysis of the upscaling predictors is definitive, since
the result depends on the amount of generalization. That is, a small SOM will generalize more than a larger
one and give a different picture of how well the upscaling training set covers the input space. Nonetheless,
SOM analysis of the upscaling predictors can still provide useful clues. For example, preliminary analysis of
the accumulation rate dataset suggests that there are, in fact, gaps in the training set that may account for at
least some of the testing results based on this predictor.
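The SOM-based coverage check suggested here can be illustrated with a small sketch: map the upscaling training and testing samples to their best-matching SOM nodes and look for nodes hit only by the testing data. The node weights and data below are toy values for illustration, not the study's data:

```python
import numpy as np

def bmu_indices(samples, nodes):
    """Best-matching unit (nearest node, Euclidean distance) per sample.
    samples: (n, d) array; nodes: (k, d) array of SOM weight vectors."""
    d2 = ((samples[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def coverage_gap(train, test, nodes):
    """Nodes hit only by the testing data -- evidence (not proof) that
    the training set does not capture the complete input space."""
    return set(bmu_indices(test, nodes)) - set(bmu_indices(train, nodes))

# Toy 1-D example: three nodes at -1, 0, +1; training never reaches +1.
nodes = np.array([[-1.0], [0.0], [1.0]])
train = np.array([[-0.9], [-0.1], [0.1]])
test = np.array([[-0.8], [0.95]])
print(coverage_gap(train, test, nodes))  # node index 2 (+1) is hit only in testing
```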
Even if data space coverage is complete, however, this does not exclude the possibility of the training
period being too short, since noise in the data is another factor in how well an ANN can perform (Haykin,
1999). For example, even a very long record of a very noisy (random) dataset may not contain the information
needed for reliable prediction. In our case, the training record is brief (15 years) and noise is definitely present
in the ice-core data (Reusch et al., 1999). It is also possible that the geographic coverage of the predictor
data is not broad enough to capture all of the potential variability of the atmosphere over West Antarctica.
The ice-core sites span only a 200 km transect in the central part of West Antarctica. The variability of this
region may be different enough from the larger region that it reduces the skill of upscaling ANNs based on
just these ice-core sites. In this respect, the AWSs should be better upscaling predictors; unfortunately, the
record is too short to know for sure. Based on the limited results from AWS-based predictions (Section 4.2.2),
the ANN-based methodology is capable of skillful prediction for testing data within the range of the training
data. Given that the AWS data are not noise free, limited noise should also not preclude reasonable ANN
performance.
5.2. Noise and aliasing
Few, if any, climate records are free of noise. Certain aspects of the climate system
as a whole may rely on the system being noisy to explain their existence and behaviour. For example, there
is evidence supporting a relationship between noise and the highly regular spacing (∼1500 years) of the
Dansgaard–Oeschger oscillations seen in Greenland ice-core proxy records of temperature and dust, as well
as various North Atlantic marine records (Alley et al., 2001). How much of an interpretation problem results
from the presence of noise varies widely by dataset and time scale. AWS records at an annual resolution
should be quite robust and relatively noise free, in part because of the large number of observations that
form the annual average. It is also possible, though, that going to an annual average smooths out useful signals in the data, not just the noise. Ice-core records are more subject to noise, in part because of the
larger uncertainties in dating. AWS records are time-stamped; ice-core dating often depends on assumptions
about poorly observed precipitation processes. These assumptions have enough observational history that we
feel confident in using them to develop annual resolution records, but the uncertainties do not disappear.
The two main uncertainty components in multiparameter, ice-core-chemistry-based dating relate to issues
in peak identification and assumptions about timing of chemical species deposition (Reusch et al., 1999).
Noise is a factor in peak identification, since no chemical species follows a pure annual cycle in the real
world. Peaks may be lost or added through various noise-contributing processes (e.g. redeposition, removal).
Multiparameter techniques reduce, but do not eliminate, this problem.
Timing of deposition is broadly known for Antarctic ice cores (e.g. Legrand, 1987; Legrand and Mayewski,
1997), but, like the climate system itself, it is not invariant from year to year. The dating of the ice-core data
used here is based on multiple parameters (annual cycles in major ion chemistry, gross β activity from bomb
fallout horizons and excess SO₄²⁻ peaks from volcanic events), but it depends strongly on the assumption that Na⁺ and SO₄²⁻ peak in the austral winter and summer respectively (Reusch et al., 1999). Variability in the actual timing of these peaks can produce an aliasing of the annual data, where values from one year move into (or out of) another. Shifting of the Na⁺ peak is less important at the annual scale, since the calendar year is being used and the austral winter season falls in the middle of the averaging period. Interannual variability of the timing of the SO₄²⁻ peak will affect dating and the accuracy of all the other ice-core records, since it
determines the start/end of the calendar year. This ultimately affects the upscaling process by adding noise to
the predictors. Because the SOM-based classifications are generalized patterns, the same atmospheric state,
as represented by the SOM, could readily produce multiple, unique ice-core (and AWS) records. Noise will
expand the number of (nearly) unique surface patterns mapping to the same atmospheric state. In theory,
learning this many-to-one relationship is well within the power of ANNs, but it is likely that more training
records will be needed compared with the noise-free, unaliased case. Thus, although Table VI suggests enough
training data are available, noise is a definite factor in the predictive skill of the upscaling ANNs, and a larger
training set is needed to improve the results.
5.3. Robustness of method
The two distinct methods involved, SOM-based classification and ANN-based upscaling, have different
robustness characteristics and respond to the length of training data (15 years) in different ways.
5.3.1. SOM classifications. Despite the shortness of record, the SOM analyses are robust and reproducible.
The same year groups are found during the 5 × 3 analysis regardless of network initialization, ordering of the input data, or the number of training iterations. A number of years (e.g. 1980, 1988) are classified together in
all three grid sizes studied. The SOM has clearly found a reliable relationship between such years. Although
our interpretation of the generalized patterns has been limited, partly due to our focus on upscaling, these
results have value and will help to improve our understanding of West Antarctic climate.
5.3.2. ANN-based upscaling. The ANN-based upscaling has a less robust response to the size of the training
set, particularly for the ice-core data. A factor contributing to the different results for the two datasets is simply
that the AWSs are likely sampling a substantially larger area of the West Antarctic ice sheet (Figure 1). Thus,
along with factors such as noise, the geographic coverage of the predictor dataset needs to be considered in
evaluating the upscaling results. As discussed previously, the individual ANNs have high predictive skill with
the training data but much lower performance with the testing data. Figure 10 summarizes the predictions for
four ensembles and displays the extent to which predictions fall inside versus outside the SOM grid. The inner,
darker black rectangle indicates the domain of the SOM grid. The outer, lighter black rectangle indicates the
area within the ‘normal’ mapping domain of the x, y space. ‘Normal’ is defined as falling within the SOM
grid or no more than one-half the internode distance outside the grid. SOM nodes are in a rectangular grid
and spaced one unit apart. The mapping domain for each node is thus 0.5 units in the vertical and horizontal
directions and ∼0.7 units on the diagonal. Thus, predictions that fall within that distance of a node will be
mapped to that node. Predictions outside the outer rectangle (Figure 10) are more than this distance from the
corner and edge nodes. Since we are just mapping to the nearest node, SOM nodes at the corners and on the
edges (e.g. 4, 2) may include predictions from well outside the SOM grid extent. This is a side-effect of using
the shortest distance to map predictions to the grid coordinates: corners will be the closest coordinate to all
predictions outside the grid in that region. An alternative approach would be simply not to map predictions
that fall outside the outer rectangle. This would provide useful information about predictive skill, but it would
not solve the problem by itself. That the same year is mapped to adjacent nodes in different models is not
in and of itself an indication of poor predictive skill. It is also important to consider the distance between
map nodes when assessing differences between model predictions. Two nodes may have fairly similar spatial
maps (generalized patterns) if they are adjacent; or they could be quite distinct, and an off-by-one error in
the prediction leads to quite different results. This highlights the importance of Sammon maps (e.g. Figure 5)
when evaluating upscaling predictions.
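The alternative of not mapping far-outside predictions could be implemented by classifying each raw prediction against the two rectangles before quantization. A minimal sketch (the function name is ours; the half-unit margin follows the mapping domain described above):

```python
def classify_prediction(pred, x_max=4, y_max=2, margin=0.5):
    """Classify a raw (x, y) prediction against a 5 x 3 SOM grid with
    unit node spacing: 'inside' the grid proper, within the half-unit
    'margin' that still maps normally to an edge or corner node, or
    'outside' -- a candidate for rejection instead of forced mapping."""
    px, py = pred
    if 0 <= px <= x_max and 0 <= py <= y_max:
        return 'inside'
    if -margin <= px <= x_max + margin and -margin <= py <= y_max + margin:
        return 'margin'
    return 'outside'

# e.g. (2.3, 1.8) -> 'inside'; (4.4, 2.2) -> 'margin'; (6.1, 0.5) -> 'outside'
```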
Although not necessarily justifiable, it is also worth considering how prediction results might be improved
by relaxing the criteria for a match. A simple change would be to consider predictions to adjacent nodes also
to be correct. This would be quite reasonable for nodes that form a fairly similar cluster separated from the
rest of the grid. Relaxing the criteria in this way would also make more sense for larger SOM grids (e.g.
5 × 7). For the sizes used in this study, it is hard to justify making this change, since the new range of correct
answers would potentially cover over half the grid nodes (9 of 15). A less black-and-white approach might
[Figure 10. Contouring of upscaling predictions from ice-core data for 1954–78. The 20 members of each ensemble shown made a total of 500 x, y coordinate predictions (20 ensemble members × 25 predictions each). Each figure contours the positions of these predictions with respect to the 5 × 3 SOM grid (shown as the heavy black rectangle). The light black rectangle represents an additional half-unit distance outside the 5 × 3 grid; points outside this area are farther from the grid than half the distance between the grid nodes. Under ideal circumstances, all predictions would lie within the heavy black lines. The near-zero range has been made white for clarity. Note: an average of 5% of the predictions lie outside the axis limits of the plots.]
be useful for this size grid, i.e. allowing predictions to adjacent nodes to be ‘sort of’ correct rather than just
wrong. It is more likely that the simplest approach is just to train with more data, particularly since any
relaxation of the criteria is also going to help the random solution (described below).
As an alternative way to assess the ANN predictions outside the ice-core training set, we applied Monte
Carlo techniques to the suite of predictions from the eight ensembles to test whether the yearly predictions
were doing better than chance. An approach such as this is necessary due to the shortage of independent test
data. Synoptic reconstructions do exist for years within the 1954–78 period (e.g. Rastorguev and Alvarez,
1958; Phillpot, 1997, and references cited therein), but comparisons with our work have not been attempted
because of differing time scales, pressure levels, data availability, etc. In our test, a random sample is defined
as a set of 25 x, y pairs (one pair for each year) drawn without replacement from the 4000 x, y predictions
available in the eight ensembles of 20 ANNs each (with each ANN providing 25 predictions). The x, y values
are used in real-value form to avoid the biases introduced by mapping them to integer grid coordinates. Each
of these random samples has a standard deviation σRS measuring the spread of the predicted values (σRS is, in
fact, calculated separately for x and y). Random skill is then defined by calculating the average and standard
deviation (µ(σRS ) and σ (σRS ) respectively) of 1000 such random samples. This process was repeated 20 times
to get average values for µ(σRS ) and σ (σRS ), i.e. an average random skill. For a particular year’s upscaling
predictions to have skill relative to chance, the standard deviation of the prediction set σP needs to be less than
µ(σRS ) − σ (σRS ). Table VIII summarizes results from comparisons for the ice-core-based upscaling for each
ensemble. Values are the number of years in the ensemble that had σP less than our definition of random chance,
µ(σRS ) − σ (σRS ). ANN skill is most tightly defined by the predictions for both x and y, since that determines
the grid location. Because there is also information in how well the ANN predicts the individual coordinates
(the ANNs are predicting the two outputs independently), Table VIII includes these data as well. To complete
this analysis, we recognized that an ensemble could have a certain number of more skilful years purely by
chance. Counts above this threshold suggest that more than just chance is involved and that there is some
amount of skill in the ensemble’s predictions. Table VIII allows for a conservative 10% of years to have higher
skill by chance and highlights those counts that exceed this threshold. Under these criteria, six of the eight
ensembles have skill greater than chance in predicting the x, y values for 1954–78 (and, thus, the atmospheric
patterns for these years). All ensembles have skill with one or the other grid coordinate (x or y), with skill
being noticeably higher in predicting x (six of eight ensembles) versus y (only two of eight ensembles). These
results strongly suggest that the ANNs are more skilful than random chance, although this is not directly
apparent from the individual year predictions (Table VII) or the distribution of all predictions (Figure 10).
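As a sketch of this Monte Carlo test for a single coordinate (function names are ours; the study pools 4000 predictions, draws 1000 samples of 25 years, and repeats the threshold estimate 20 times):

```python
import random
import statistics

def random_skill_threshold(all_preds, n_years=25, n_samples=1000, seed=0):
    """Spread expected by chance for one coordinate: draw many random
    sets of n_years predictions (without replacement) from the pooled
    ensemble predictions, then return mean(sigma_RS) - sd(sigma_RS)."""
    rng = random.Random(seed)
    sigmas = [statistics.stdev(rng.sample(all_preds, n_years))
              for _ in range(n_samples)]
    return statistics.mean(sigmas) - statistics.stdev(sigmas)

def skilful_years(preds_by_year, threshold):
    """Count years whose ensemble spread sigma_P beats the threshold."""
    return sum(1 for preds in preds_by_year
               if statistics.stdev(preds) < threshold)
```

A year whose ensemble members cluster tightly around one coordinate value then counts as more skilful than chance, while a year whose predictions scatter as widely as random draws from the pool does not.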
5.3.3. Refinements. As mentioned previously, noise is present in all climate records. We have tried to make
use of this in a positive way by adding small amounts of noise to the original data to create additional training
records. Although the idea is sound, the value of this approach has not been shown indisputably. Predictive
skill tended to be more a function of the number of predictors than the size of the training set, although the
latter was definitely a factor. Additional testing with a wider variety of artificial noise would likely be useful.
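The augmentation idea can be sketched as follows; the noise level and number of copies are illustrative choices, not the values used in the study:

```python
import numpy as np

def augment_with_noise(records, n_copies=4, noise_sd=0.1, seed=0):
    """Create additional training records by adding small Gaussian noise
    to the original predictor records (standardized anomalies); each
    noisy copy keeps the original record's target. Returns the original
    (n, d) records stacked with n_copies noisy versions."""
    rng = np.random.default_rng(seed)
    noisy = [records + rng.normal(0.0, noise_sd, records.shape)
             for _ in range(n_copies)]
    return np.vstack([records] + noisy)
```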
Table VIII. Summary of ANN-based upscaling predictions of SOM x, y coordinates versus chance. See text for explanation of entries. Bold indicates counts higher than expected by chance at the 10% level (rounded to 3 out of 25)

Ensemble   Just x   Just y   x or y   x and y
1a         0        7        7        18
1b         7        0        7        15
3a         8        0        8        8
3b         3        3        6        5
4a         7        5        12       6
4b         4        3        7        10
5a         8        1        9        3
5b         11       0        11       3
With respect to noise reduction, additional preprocessing steps may prove useful for improving predictive
skill. Input data could be quantized into a small set of values, e.g. high, neutral, low, to reduce the range
of cases that the ANN needs to relate to the target data. We have avoided traditional dimension reduction
techniques, such as PCA, in order to focus on the simplest version of the prediction problem, i.e. predicting
from raw values. Both of these preprocessing methods are worth further investigation, even with a larger
input set for training.
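Such input quantization might look like the following sketch; the ±0.5σ cut-offs are arbitrary illustrative thresholds:

```python
def quantize_anomaly(z, cut=0.5):
    """Quantize a standardized anomaly into a coarse three-way class,
    reducing the range of cases the ANN must relate to the targets."""
    if z > cut:
        return 'high'
    if z < -cut:
        return 'low'
    return 'neutral'

# [quantize_anomaly(z) for z in (-1.2, 0.1, 0.8)] -> ['low', 'neutral', 'high']
```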
Ice-core-based upscaling is currently done to annual atmospheric patterns because of the annual resolution
of the ice cores. However, there are strong annual cycles in the ice-core chemistry records. Although the
timing of these cycles may not be known well enough to create unambiguous subannual chemistry records,
enough is known to believe that relationships between the chemistry and the atmosphere will be better for
certain subannual periods. For example, because of the Na+ peak in winter, it is reasonable to expect at least
as good, and probably better, correlation between annual Na+ and the winter-season atmosphere. Thus, it
would be useful to do SOM analyses on subannual atmospheric data and repeat the ice-core-based upscaling.
Another approach to improving the upscaling results is to use predictor data that more fully sample the
atmosphere over West Antarctica. A suite of ice-core data that provides broad geographical coverage would
be one example of an improved predictor set. The four cores used so far cover only a limited spatial area
and, thus, may not be entirely well suited to predicting the larger domain. Ice cores are, of course, not the
only route to palaeoatmospheric reconstructions with upscaling. Extensive records from manned stations are
available, although mostly coastal, and it would be useful to take advantage of this resource. These data could
be used as independent predictors to compare with the ice-core-based results, or in combination with the ice
cores (where appropriate) to join the information provided by each source.
6. CONCLUSIONS
SOM-based classification of annually averaged reanalysis data for West Antarctica produces robust groupings
that provide insight into atmospheric climatology. ANN-based upscaling from a limited set of ice-core data
is skilful at identifying the main SOM states extracted in the classification analysis. Unfortunately, the short
training period (15 years) and limited spatial extent of the selected ice-core data appear to prohibit, in this
case, a simple, deterministic reconstruction from high-confidence, well-defined predictions. Instead, a more
probabilistic approach is needed to make up for the shortcomings in these datasets. This is clearly less desirable
than simple, deterministic, high-confidence reconstructions and, thus, we will continue to pursue this goal
in the future. Nonetheless, recognizing the short period of analysis, this is not a particular shortcoming in
respect of advancing the methodological basis for researching climate and palaeoclimates. We have chosen
to publish a progress report now because of the great potential we see for the analytical path followed here.
Pending enhancements to our already robust methodology and availability of improved datasets, we are
limited to an evaluation of what can be done with the current data. Careful evaluation of the upscaling analysis
path suggests that, with longer reanalysis datasets and greater spatial coverage from ice-core data, improved
reconstructions of histories of climate states should be possible. To this end, we plan to acquire the ERA-40
dataset (despite its shortcomings it still provides an improved dataset for this region of the globe) and obtain
access to a larger suite of recent ice-core datasets. Despite the unfavourably short-term and spatially restricted
dataset used to date, which might be expected to preclude successful analyses, we find that the combined
techniques do allow ice-core reconstruction of annual-average synoptic conditions with some skill. We believe
that this skill in the face of the difficulties justifies much wider testing of the techniques.
ACKNOWLEDGEMENTS
This research was supported by the Office of Polar Programs of the National Science Foundation through
grants OPP 94-18622, OPP 95-26374, OPP 96-14927 and OPP 00-87380 to R. B. Alley. We are also grateful
to the Antarctic Meteorological Research Center, University of Wisconsin, for their archive of Antarctic AWS
data and to NCAR’s Visualization and Enabling Technologies Section, Scientific Computing Division, for
their tireless support of NCL.
APPENDIX A: DATA CODING AND INTERPRETATION OF ANN PREDICTIONS
As outlined in Section 3.3.2, predictor and target data must be encoded prior to training of the upscaling
ANNs. This is straightforward for the predictor data, as they are simply the measurements at all sites as
standardized anomalies from the 15 year study period mean at each site. The AWS and ice-core chemistry
data were averaged to annual values prior to calculating the anomalies to match the annual time scale of
this study.
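As a sketch of this predictor encoding (the use of the sample standard deviation is our assumption; the text does not specify the normalization denominator):

```python
import numpy as np

def standardized_anomalies(series):
    """Annual series at one site -> anomalies from the study-period
    mean, scaled by the study-period standard deviation at that site."""
    x = np.asarray(series, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)
```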
Encoding of the targets is potentially more complex and, thus, we tested three coding schemes for the
target SOM classifications: two versions of 1-of-N mapping and as x, y grid coordinates. The latter simply
uses the x and y grid coordinates of each ERA-15 year in the 5 × 3 SOM map. Thus, each prediction vector
is being mapped to two integer values, and two output values are needed for the prediction ANN. 1-of-N
coding translates the x, y grid coordinates into either an integer or a bit vector. When using integers, the
SOM nodes are simply numbered sequentially 1 to N (where N = x_max × y_max) from grid 0, 0 to the maximum
x, y coordinate in row order. With this scheme, only one ANN output is required. Bit vector coding attempts
to take advantage of the fact that the SOM classification is a mapping from a prediction vector to a group
of years identified by the SOM as having similar characteristics. For example, suppose a particular SOM
classifies the data into six separate groups. The upscaling ANN then maps the input prediction
vector to one of six groups (a one-of-six mapping), and six outputs are required. In general, for N target groups, a binary
vector of length N is created with the position of the target group set to 1, and N output nodes are needed.
For example, if group 3 of seven groups is the target classification for a sample, then the corresponding
target vector will be 0010000 and seven output nodes are required. Bit vector coding thus has the highest
complexity in the output layer, since it requires the most output nodes. It also has a further drawback:
it cannot represent SOM nodes to which no input was mapped during training. Integer 1-of-N coding has the
lowest complexity (only one output node), but because the sequential row-order numbering wraps between rows,
a small error (off by only one position) can move the prediction to the opposite side of the SOM grid. The x, y grid coordinate coding scheme
does not have the problems of either 1-of-N integer coding or bit vector coding, at the small expense of
a slightly more complex output layer. Thus, after evaluating all three schemes, we have selected x, y grid
coordinates as the target format for ANN training.
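The three target codings can be sketched as follows. This is a Python illustration, not the study's actual implementation; the function names are ours, and grid coordinates are assumed 0-based as in the row-order numbering described above:

```python
import numpy as np

XMAX, YMAX = 5, 3  # dimensions of the 5 x 3 SOM used in this study
N = XMAX * YMAX

def xy_coding(x, y):
    """x, y grid-coordinate coding: two ANN outputs."""
    return np.array([x, y])

def integer_coding(x, y):
    """Integer 1-of-N coding: nodes numbered 1..N in row order
    from grid (0, 0); a single ANN output."""
    return y * XMAX + x + 1

def bitvector_coding(x, y):
    """Bit-vector 1-of-N coding: N ANN outputs, with a 1 at the
    target node's position and 0 elsewhere."""
    v = np.zeros(N, dtype=int)
    v[integer_coding(x, y) - 1] = 1
    return v
```

For node (2, 1) of the 5 × 3 grid, for instance, the integer coding gives node number 8 of 15, and the bit vector has its single 1 in the eighth position; the wrap-around hazard of the integer scheme is visible in that nodes 5 and 6 are numerically adjacent but sit at opposite ends of consecutive rows.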
Because coordinate values are predicted as real numbers, post-processing is required to map predicted
values to the SOM grid for comparison with known targets and other analysis steps. The simplest approach
is just to map to the closest SOM grid coordinates (based on shortest distance). This takes advantage of the
power of ANNs to make predictions of targets not seen during training (i.e. SOM nodes not mapped during
the original SOM training), but it is also open to some drawbacks. For example, predictions that actually lie
well outside the coordinate range of the SOM grid will be collapsed into the corners and edges of the grid.
For a 5 × 3 SOM, all upscaling ANN predictions with x greater than five and y greater than three will map to
the lower right node of the SOM. This becomes a problem when the distance between the prediction and the
nearest x, y grid coordinate grows larger than one or two units, the maximum internode spacing within the
grid itself. (See Section 4.2.3 for the impact of this error.) Future work with the post-processing steps will
look at alternative ways to map the ANN predictions back to the SOM grid.
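The nearest-node mapping, including the edge-collapse behaviour just described, can be sketched as follows. This is a hypothetical Python fragment with 0-based grid coordinates assumed; on a rectangular grid, rounding each coordinate independently yields the shortest-distance node:

```python
import numpy as np

XMAX, YMAX = 5, 3  # the 5 x 3 SOM; coordinates assumed 0-based here

def snap_to_grid(pred):
    """Map a real-valued (x, y) prediction onto the nearest SOM node.
    Each coordinate is rounded to the nearest integer, then clipped
    to the grid's coordinate range, so predictions falling well outside
    the grid collapse onto its edges and corners -- the drawback
    discussed in the text."""
    x = int(np.clip(np.rint(pred[0]), 0, XMAX - 1))
    y = int(np.clip(np.rint(pred[1]), 0, YMAX - 1))
    return x, y

snap_to_grid([4.3, 1.8])  # in range: snaps to node (4, 2)
snap_to_grid([7.2, 5.0])  # far out of range: collapses to corner (4, 2)
```

Note that both an in-range prediction near (4, 2) and a wildly out-of-range one map to the same corner node, which is why the distance between the raw prediction and its snapped node is worth monitoring.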