INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. 23: 769–791 (2003)
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/joc.914
A HISTORICAL UPPER AIR-DATA SET FOR THE 1939–44 PERIOD
STEFAN BRÖNNIMANN a,b,*
a Lunar and Planetary Laboratory, University of Arizona, Tucson, AZ 85721, USA
b Institute of Geography, University of Bern, Switzerland
Received 16 December 2002
Revised 4 March 2003
Accepted 15 March 2003
ABSTRACT
Historical upper-air data from radiosonde ascents and weather flights were re-evaluated in order to study the circulation
of the upper troposphere and lower stratosphere during the 1939–44 period. Temperature and geopotential height data
from 132 records, comprising around 26 500 single atmospheric profiles and 1750 monthly mean profiles, were compiled,
digitized, controlled, adjusted, and assessed. The data stem from a number of countries in the extratropical Northern
Hemisphere, including Germany and occupied areas, Sweden, the UK, the former Soviet Union, and the USA. In this
paper, the principal procedures used to correct the historical radiosonde data for the effects of lag and radiation errors are
presented and ways of assessing the accuracy and precision of the data are discussed. The results show that the specified
quality criteria are generally met by most data records. Some of the records had to be corrected for systematic errors; a
few were rejected. Many series, however, were too short for a reasonable assessment. Also, individual ascents may have
a larger error. Monthly anomaly maps of upper-level temperature and geopotential height based on the re-evaluated data
show clear spatial patterns that are consistent with each other and with corresponding anomaly fields from the Earth’s
surface. The re-evaluated data can be used to study synoptic to interannual variability, but they are not suitable for
long-term trend analysis. Copyright © 2003 Royal Meteorological Society.
KEY WORDS:
radiosonde data; aircraft data; aerological data; historical data; Northern Hemisphere; geopotential height; air temperature
1. INTRODUCTION
Exceptional total ozone values at several sites, as well as unusual climatic conditions at the Earth’s surface,
point to a possible anomaly of the circulation of the stratosphere and troposphere of the extratropical Northern
Hemisphere in the early 1940s (Labitzke and van Loon, 1999). This supposed anomaly seems worth studying;
however, this is not possible with the currently available upper-air data or gridded data sets above the Earth’s
surface, which start around 1948. Nevertheless, data from pilot balloons, radiosonde ascents, and weather
flights since the 1930s can still be found today on paper in various meteorological archives. These data could
enable a detailed study of the circulation of the early 1940s, if one is willing to take the trouble of compiling,
digitizing, controlling, assessing plausibility, correcting and validating them.
This effort was undertaken for a small fraction of the available radiosonde and weather flight data for the
1939 to 1944 period. In total, temperature and geopotential height from around 26 500 single atmospheric
profiles and 1750 monthly mean profiles were digitized and processed. This paper presents the new upper-air
data set, with a special focus on the procedures used for correcting the data, an assessment of the quality, and
a brief presentation of results. For a full description of the data sources, instruments, data series and station
information, procedures used for re-evaluating the data, and detailed results of the quality assessment, the
reader is referred to Brönnimann (2003), hereafter termed the full report. The data can be downloaded from
the Website given at the end of the article.
* Correspondence to: Stefan Brönnimann, Lunar and Planetary Laboratory, University of Arizona, P.O. Box 210092, Tucson, AZ
85721, USA; e-mail: [email protected]
2. AIMS AND QUALITY REQUIREMENTS
The final data should suit the needs of different planned analyses. On the one hand, daily time series from
certain sites will be analysed with respect to their variability and, for example, compared with total ozone
time series. On the other hand, monthly mean values from a large number of sites will be analysed in
their spatial context and used in statistical procedures to reconstruct, together with station data from the
Earth’s surface, monthly mean fields of geopotential height and temperature at various levels up into the
lowermost stratosphere. Both of these analyses will be based on anomalies from the long-term mean seasonal
cycle, where the National Centers for Environmental Prediction (NCEP)–National Center for Atmospheric
Research (NCAR) reanalysis data set is chosen as the long-term reference (Kalnay et al., 1996; see Section 3.6
for a discussion of the possible problems). The individual profiles could eventually be used to supplement
observations from the Earth’s surface and from pilot balloons in a data assimilation and modelling procedure
to construct a ‘reanalysis’ data set for this time period. It is important to note that the focus of all these
applications is on time scales from synoptic to interannual variability. There is no intention to obtain a data
set suitable for trend analyses.
The applications sketched above dictate a certain range of accuracy and precision of the data set. After a
preliminary analysis of the available data for that time period (see Section 8) with respect to the expected
signal, I defined the target precision as ±4 °C for single ascents and ±1.6 °C for monthly mean values, meaning
that 90% of the data points in each case should be within the given limits. For geopotential height, the target
precision depends on the pressure level (around 700 to 100 hPa) and is estimated as ±70–160 gpm for single
ascents and ±30–80 gpm for monthly means. The bias with respect to the reference, i.e. the NCEP–NCAR
reanalysis data set, should be less than ±0.7 °C (±15–30 gpm). A data set that meets these targets could also
be useful for certain other climatological analyses.
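The 90% criterion can be made concrete with a short check. The following is a minimal Python sketch under stated assumptions, not the procedure used in this study: the function name `meets_target` and the sample differences are hypothetical and serve only to illustrate how a target such as ±4 °C for single ascents could be tested against a reference.

```python
def meets_target(differences, limit, required_fraction=0.9):
    """Return True if at least `required_fraction` of the absolute
    differences (observation minus reference) lie within +/- `limit`."""
    if not differences:
        raise ValueError("no data points to assess")
    within = sum(1 for d in differences if abs(d) <= limit)
    return within / len(differences) >= required_fraction


# Hypothetical single-ascent temperature differences (degC) against a reference.
sample = [0.5, -1.2, 3.9, -4.5, 2.0, 0.1, -0.8, 1.5, -2.2, 0.3]

# Nine of the ten values lie within +/-4 degC, six within +/-1.6 degC.
print(meets_target(sample, limit=4.0))
print(meets_target(sample, limit=1.6))
```

The same function can be applied to geopotential height differences by passing the level-dependent limit in gpm.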
3. DATA ARCHIVES
The data presented in this paper stem from five archival sources. A brief overview is given in this section,
and more detailed descriptions are included in the full report. The bibliographical details of these archival
sources are listed in Section 10.
3.1. The Lindenberg compilation
An estimated 20 000 radiosounding ascents from about 60 sites in Germany and occupied areas in 1939 to
1944 were later compiled at the Observatory of Lindenberg by Beelitz and Robitzsch (1949). I digitized data
from around 9500 ascents from 13 sites in Europe and three sites in North Africa.
Data for geopotential height (in dynamic metres; least significant digit: 1), temperature (°C, 0.1), and
humidity (%, 1) on pressure levels (1000 to 100 hPa in steps of 100 hPa, and 50 hPa) are printed in tables.
A cover sheet for each station gives additional information, followed by an assessment of the quality of
the corresponding data and signed by the authors. Obviously, the soundings had been worked through and
implausible ascents were excluded. The Lindenberg compilation consists of thin A3-sized sheets: carbon
copies, where the colour has become faint and blurred over time. As a consequence, the legibility is a
problem (see full report for details). Parts of the Lindenberg compilation were photocopied in Lindenberg.
3.2. Täglicher Wetterbericht der Deutschen Seewarte
The upper-air section of the daily weather report from Germany, issued by the Deutsche Seewarte, contains
hand-written tabulated data from weather flights and radiosonde ascents from a large number of stations for
the period of interest. The data are given in different forms; I used the pressure level data (in dynamic metres,
least significant digit 1 or 10, and °C, 0.1 or 1) for the levels 700 to 100 hPa (in 100 hPa steps). From January
1942 on, the data above 300 hPa were given on different levels and not digitized.
Around 8000 profiles from the period January 1939 to February 1942 were digitized, mainly from the
former Soviet Union (until 4 July 1941), but also from other locations in Europe and North Africa. Because
of the large amount of information in this report (soundings, upper-level charts), the Wetterbericht became
indispensable for assessing the plausibility or cross-checking information from the Lindenberg compilation
(e.g. indecipherable numbers). The data sheets for the period January 1939 to June 1943 were photocopied
at the archive of the Bundesamt für Seeschiffahrt und Hydrographie in Hamburg, Germany.
3.3. UK daily weather report, upper-air section
The upper-air section of the UK daily weather report contains radiosonde and weather flight data in small
hand-written tables. The data are primarily from the UK, and irregularly from other countries. There were
many changes in layout and reporting practices during the 1939 to 1944 period, which made the work difficult.
For instance, radiosoundings were first given as significant points, then as significant points plus standard
levels, later on only three pressure levels (700, 500, 250 hPa), and again later on pressure levels in 50 hPa
steps. Altitude (in feet, least significant digit 10 or 100 ft) and temperature (in °F, 0.5 or 1) were digitized
from around 4350 weather flights and 3850 radiosoundings from nine sites in the UK. However, because
of the changes in reporting practices and restricted digitizing time, only data from one or two levels were
digitized in most cases (see full report for details).
The UK Daily Weather Report was obtained from the archive of the UK Met Office in Bracknell, UK.
The data up to 1941 were digitized in the archive, whereas the data sheets for the 1942 to 1944 period were
photocopied.
3.4. Yearbooks of the Swedish Meteorological and Hydrological Institute
Radiosonde data from Sweden, mainly from one site, can be found in Part 6 of the Meteorological Yearbook
of the Swedish Meteorological and Hydrological Institute in the form of printed tables. The data are given on
geopotential levels, on pressure levels and as significant points. A full description is given in the yearbook
1936–1937. The data chosen for this study are pressure-level data (900 to 100 hPa in 100 hPa steps), which are
given in dynamic metres (least significant digit 1) and °C (0.1), from 733 ascents. The Swedish Meteorological
Service kindly sent us photocopies of the data.
3.5. Monthly Weather Review (USA)
The aerological network of the USA has its roots in the 1920s and was in relatively good shape from
around 1938 onwards (Hughes and Gedzelman, 1995; Moninger et al., 2003). In this study, an attempt was
made to work directly with the monthly mean values that can be found in the Monthly Weather Review for the
time period of interest. The data are printed in tables with the number of observations, pressure (in hPa, least
significant digit 1), temperature (°C, 0.1) and humidity for fixed altitude levels (every 0.5 km up to 3 km,
then every 1 km). Data from around 1750 monthly mean profiles from 36 sites were digitized, consisting
mainly of radiosonde ascents and some weather flights.
3.6. Data used for correction and validation
For several steps in the re-evaluation procedure, a reference climatology is needed. This also holds for many
of the validation experiments and, as mentioned above, for the final analysis. I chose the NCEP–NCAR
reanalysis data set (Kalnay et al., 1996) as a reference. Note that this data set has inaccuracies and
inhomogeneities (e.g. see Santer et al. (1999) and Randel et al. (2000)). Nevertheless, it is the best currently
available long-term global meteorological data set and the reported problems mainly concern the tropics
rather than the northern extratropics. In view of the specified quality targets, I considered it safe to use the
NCEP–NCAR reanalysis data set as a reference.
For the re-evaluation procedure, the long-term means provided by the NOAA–CDC via their Website
were used, i.e. long-term mean values of geopotential height and temperature for each calendar month, level,
and variable as well as four-times-daily long-term mean anomalies for the months January, April, July, and
October. The reference period for the long-term means is 1968 to 1996. The anomalies were interpolated to
fill in the other months and added to the monthly mean values. The diurnal cycle was then interpolated in time
between the standard times of observation, yielding a ‘climatology’ for any given time of day, month, level,
location, and variable. Note that the diurnal cycle in the NCEP–NCAR data is model generated (radiosonde
ascents are normally performed twice per day) and may deviate from the true diurnal cycle. However, in
view of the specified quality targets, this error is assumed to be tolerable.
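The construction of this reference climatology can be sketched in a few lines. The following Python example is a minimal illustration under stated assumptions rather than the code used in this study: linear interpolation is assumed both between the four anchor months and between the four synoptic hours, and the function names and the data layout (dictionaries keyed by month) are hypothetical.

```python
SYNOPTIC_HOURS = (0, 6, 12, 18)   # UTC times of the four-times-daily fields
ANCHOR_MONTHS = (1, 4, 7, 10)     # months for which diurnal anomalies exist


def lerp(a, b, w):
    """Linear interpolation between a (w = 0) and b (w = 1)."""
    return a + (b - a) * w


def diurnal_anomaly_for_month(month, anomalies):
    """Interpolate the four-times-daily anomalies to any calendar month.

    `anomalies` maps each anchor month to four values, one per synoptic
    hour. The interpolation is cyclic over the year (assumed linear).
    """
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    for i, m0 in enumerate(ANCHOR_MONTHS):
        m1 = ANCHOR_MONTHS[(i + 1) % 4]
        offset = (month - m0) % 12
        if offset < 3:                      # month lies in [m0, m1)
            w = offset / 3.0
            return [lerp(a, b, w) for a, b in zip(anomalies[m0], anomalies[m1])]


def climatology(month, hour_utc, monthly_mean, anomalies):
    """Climatological value for one level, location and variable."""
    anom = diurnal_anomaly_for_month(month, anomalies)
    hour = hour_utc % 24
    i = int(hour // 6)                      # preceding synoptic hour
    w = (hour - SYNOPTIC_HOURS[i]) / 6.0
    return monthly_mean[month] + lerp(anom[i], anom[(i + 1) % 4], w)


# Hypothetical temperatures (K) at one level and grid point.
monthly_mean = {m: 220.0 for m in range(1, 13)}
anomalies = {1: [0.0, 0.0, 0.0, 0.0], 4: [-1.0, 0.0, 1.0, 0.0],
             7: [-2.0, 0.0, 2.0, 0.0], 10: [-1.0, 0.0, 1.0, 0.0]}
print(climatology(4, 9, monthly_mean, anomalies))
```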
For the validation, the NCEP–NCAR reanalysis data (1948–97) were used, as well as monthly temperature
data from 111 sites from the NASA–GISS station data set (Peterson and Vose, 1997; homogenized version,
see full report for details) and from Pic du Midi (Dessens, 1991). Three-times-daily temperature and pressure
data from two mountain sites in Switzerland, Säntis (2500 m a.s.l.) and Jungfraujoch (3580 m) were provided
by MeteoSwiss. In addition, gridded sea-level pressure (SLP) data (Trenberth and Paolino, 1980) and monthly
mean 300 hPa geopotential height fields over Europe, which were reconstructed by Schmutz et al. (2001)
from 1901 to 1947 based on station measurements at the surface, were used for the validation. The preliminary
analysis makes use of the gridded surface air temperature data set HadCRUTv (Jones et al., 2001).
4. DATA RECORDS AND DIGITIZING PROCEDURE
In total, 132 records from 116 sites in the Northern Hemisphere were digitized (Figure 1). Each record consists
of data from 1 to 2165 vertical profiles. The following is a brief summary of the digitizing procedure, which
is described in detail in the full report. Not all data were digitized. For instance, only the levels that are also
standard pressure levels today were chosen, i.e. mostly from 700 to 100 hPa in 100 hPa steps. Nocturnal
ascents were preferred. In the case of twice-daily soundings, the daytime data were normally not digitized,
or if so, only from 300 hPa upward.
Figure 1. Map showing the location and size of the records used in this study. For the US records, the size refers to the total number
of ascents on which the monthly mean values are based. Legend (number of observations, shown separately for aircraft and radiosonde
records): <50, 50–200, 200–500, 500–1000, 1000–2000, >2000
The data were digitized using speech recognition (Dragon Systems, Dragon NaturallySpeaking 5.00
Preferred Deutsch) followed by a three-step plausibility screening based on redundant information and on
hydrostatic considerations. Each outlying data point was checked on the original sheets and the plausibility
was assessed from comparison with data from neighbouring levels, earlier or later ascents, surrounding stations
(often non-digitized data), meteorological stations or daily upper-level weather charts. Data points considered
implausible were flagged (for all analyses shown in this study, flagged data points were excluded). This
procedure is manual, and hence very time consuming, but it is considered essential for obtaining a high
quality. The error rate of the digitizing procedure was 0.1–0.8% (range of 20 random samples of 1000
numbers each) and the errors mostly concerned the least significant digit. This error rate was considered
acceptable. Then, the units were converted to standard units (°C, gpm, UTC) and, if necessary, the data were
interpolated to pressure levels.
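The unit conversions involved are simple but easy to get wrong. The Python sketch below illustrates them under stated assumptions; it is not the code used for the data set. The function names are hypothetical, and the conversion of altitudes in feet is purely geometric, so treating the result as geopotential metres is an approximation made by this sketch.

```python
G0 = 9.80665       # m s^-2, standard gravity defining the geopotential metre
FT_TO_M = 0.3048   # metres per international foot


def dynamic_metres_to_gpm(z_dyn):
    """One dynamic metre is 10 m^2 s^-2; one geopotential metre is G0 m^2 s^-2."""
    return z_dyn * 10.0 / G0


def fahrenheit_to_celsius(t_f):
    """Convert a temperature from degF to degC."""
    return (t_f - 32.0) * 5.0 / 9.0


def feet_to_metres(z_ft):
    """Geometric conversion only (assumption: no gravity correction applied)."""
    return z_ft * FT_TO_M


# Hypothetical values as they might appear in the source tables.
print(dynamic_metres_to_gpm(5400.0))   # height in gpm
print(fahrenheit_to_celsius(-4.0))     # temperature in degC
print(feet_to_metres(18000.0))         # altitude in m
```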
5. INSTRUMENTS AND ERRORS
This section gives a brief discussion of the instruments used and the errors to be expected in the original data.
The focus is on the radiosonde data; aircraft measurements, at that time, were believed to be more reliable
than radiosondes and were often used as a reference (e.g. Diamond et al., 1938), although they, too, had
errors (Scherhag, 1948).
There are various types of error that affect the accuracy and precision of radiosonde data. Some of them
are related to the ambient conditions during the ascent, such as the lag and radiation errors, the temperature
dependence of the pressure element or insufficient ventilation of the sonde in the balloon wake. Others are not
related to the ascent, and include the ground correction, the calculation of geopotential height from pressure
and temperature, as well as the instrument-specific errors of the temperature and pressure measurements.
Most of the data sources do not indicate the type of sonde used. However, by studying the literature it was
possible to find a very likely sonde type for each country (Table I). Unfortunately, relatively little is known
about the systematic errors of each of these sondes. The following is a summary of what can be found in
the literature. For the German sonde, Scherhag (1948) assumes an error of the temperature measurement of
0.5 °C, and from this estimates errors in geopotential height of 10 gpm, 20 gpm, 35 gpm, and 50 gpm at
500 hPa, 200 hPa, 100 hPa, and 40 hPa respectively (he does not discuss the pressure error). Diamond et al.
(1938) compared the US sonde with aircraft measurements and found an agreement within 2 °C for 90% of
the observations and estimated the error as ±1 °C. For the pressure measurement an error of ±1 hPa was
anticipated (Diamond et al., 1937, 1938).
More information is available from the first international radiosonde intercomparison under the auspices
of the WMO, which was performed in 1950 in Payerne, Switzerland (OMI, 1951; OMM, 1952). Among
others, sondes from Germany, Finland, UK, and the USA participated in this campaign. Some of the sonde
types were improved between the early 1940s and 1950. Nevertheless, the results provide information about
the magnitude of the errors to be expected, and they explicitly address the different sources of error. The
method of calculating geopotential height from the raw data (by numerical or graphical methods or look-up
tables) revealed maximum differences from around 5 gpm at 700 hPa to 30 gpm at 100 hPa. The mean lag
error was estimated as 0.4 °C (at 700 hPa) and 0.6 °C (at 400 hPa) for the Finnish, UK, and German sondes,
but was 0.1 °C and 0.2 °C respectively for the US sonde. The radiation error averaged over all ascents was
estimated to be around 1.5 °C at 200 hPa for the UK and Finnish sondes but was much less for the German
and US sondes. The uncorrected data from nocturnal flights revealed systematic errors (mean deviation of
one sonde type from all others) in the range of −0.3 to +0.7 °C at pressures larger than 300 hPa and −1.5
to +0.6 °C at higher levels for the four sonde types discussed here. The corresponding systematic errors in
geopotential height were −15 to +10 gpm at 500 hPa and −30 to +15 gpm at 200 hPa. The standard deviations
of the differences were around 1.5 °C for temperature and 10 to 90 gpm (at 700 to 100 hPa) for geopotential
height. Interestingly, the reports point on several occasions to the importance of well-trained staff as one of
the most important quality factors.
Table I. Types of radiosonde used in the early 1940s. λ0 is the lag coefficient (s), α is the pressure dependence of the
radiation error, and S0 (h) is the radiation error (see text for details)

Sonde type | Year | Developed/produced | Country | Sensors (a) | Reference | λ0 | α | S0 (h)
-----------------------------------------------------------------------------------------------------------------
H-38 | 1938 | Dr Graw GmbH | Germany | A/B | Hesse (1961); personal communication J. Thieme, Dr Graw GmbH | 20 | 0.75 | Raunio (1950) scaled with 0.78
RS-11 | 1936 | Väisälä | Finland, Sweden (b) | A/B | Väisälä (1941, 1949), Raunio (1950) | – (b) | – (b) | – (b)
Kew Pattern MK-I | 1937 | National Physical Laboratory | UK | A/B | Lander (1946) | 15 | 0.78 | Tweles and Finger (1960)
Comb sonde (RZ-049?) | 1930s | Molchanow | USSR | A or T/B | Gaffen (1993), Zaitseva (1993) | 10 | 0.82 | Tweles and Finger (1960)
Diamond–Hinman, version unknown | 1937 | National Bureau of Standards | USA (Navy, Weather Bureau) | A/E | Diamond et al. (1937, 1938), Gaffen (1993) | – | – | –
Diamond–Hinman, version unknown | 1943 | Friez & Sons; National Bureau of Standards | USA (Navy, Weather Bureau) | A/R | Moore and Neiburger (1945), Gaffen (1993) | – | – | –
Blue Hill | 1935 | Harvard University | USA (Boston) | A or T/B | Lange (1937) | – | – | –
a A: aneroid box pressure sensor; B: bimetal thermometer; E: electrolyte thermometer in glass tube; R: ceramic thermometer based on
resistance principle; T: Bourdon tube.
b This sonde type was also assumed for the records from Reykjavik, Tromsø 1939, and Rabat, but in these cases the standard correction
was used, i.e. λ0 = 15 s, α = 0.852, S0 (h) from Raunio (1950).
The reports show that the expected errors are in the same range as the quality targets. Hence, it is necessary
to make corrections. Instrument-specific corrections (other than statistical) are possible only for the lag and
radiation errors. For this purpose, specific information is needed for each sonde, such as the dependence of
the radiation error on the solar zenith angle and pressure and an estimation of the lag of the thermometer.
6. CORRECTION PROCEDURE
Today, radiosonde data are corrected for the radiation and lag errors via numerical models of the heat balance
of the sonde during ascent (Luers and Eskridge, 1995, 1998; Durre et al., 2002). Apart from the ascent velocity
and the solar elevation, these models include a wide range of environmental parameters and model the heat
transfer between the different parts of the sonde. It is not possible to apply such elaborate corrections to
historical data, where none of these parameters, except for the solar elevation, is known and the properties
of the sonde are not known in detail either. Rather, I decided to start from the information that can be found
in the old literature.
6.1. Radiation error
In order to correct the data for the radiation error, I adopted the general framework by Väisälä (1941, 1949)
and Raunio (1950) who suggest the following formula for the radiation error ΔTR (K) as a function of solar
elevation h and pressure p:
$$\Delta T_R(h, p) = \frac{I(h, p)}{I_0}\, S_0(h)\, \frac{v_0\, T(p)}{v(p)\, T_0} \left(\frac{p_0}{p}\right)^{\alpha} \qquad (1)$$
where v is the ascent velocity, T (K) temperature, and the subscript 0 refers to a reference (see below). The
first term I (h, p)/I0 represents the radiation reaching the sonde normalized to the solar constant I0 (1367 W
m−2 ). The second term S0 (h) (K) gives the reference radiation error of a given sonde as a function of the
solar zenith angle. The third term represents the ventilation of the sonde and the last term is the pressure
dependence. In Väisälä (1941), the constants v0 = 5 m s−1 , p0 = 100 hPa, and T0 = 216.7 K are used to
normalize the formula to a typical ascent velocity and to the 100 hPa level. For these conditions, S0 (h) and
α were then empirically estimated for the Finnish sonde. The functions S0 (h) and I (h, p)/I0 are given in
figure 2 of Väisälä (1949) and figure 1 of Raunio (1950) respectively.
According to Väisälä (1941), this formula is valid for the 200 hPa level and higher altitudes. He suggested
assuming zero error at 500 hPa and scaling the error linearly with altitude for the levels between 500 and
200 hPa. I changed the zero level of the interpolation to 700 hPa (which is normally the lowest level) because
it is known from historical (Flohn, 1944, 1947) and current (Luers and Eskridge, 1995, 1998) literature that
the radiation error may also be effective in the middle troposphere.
I assumed that this general formula can be used for all radiosonde data if the function S0 (h) and the
parameter α are known for each sonde type. A constant value of 5 m s−1 was used for v(p) for all radiosonde
ascents due to lack of specific information, T (p) was taken from the sonde data or (if missing) from the
climatology and T0 was taken from Väisälä (1941). ΔTR (K) is then only a function of h, which was calculated
from the start time.
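Equation (1) and the tapering towards lower levels can be illustrated with a short Python sketch. This is a minimal example under stated assumptions, not the implementation used for the data set: S0 (h) and I (h, p)/I0 are passed in as numbers because the published curves are not reproduced here, the function names are hypothetical, and the taper uses ln(p) as a stand-in for altitude.

```python
import math

V0 = 5.0     # m s^-1, reference ascent velocity
P0 = 100.0   # hPa, reference pressure level
T0 = 216.7   # K, reference temperature


def radiation_error(p_hpa, temp_k, s0_k, rel_irradiance, alpha, v=V0):
    """Equation (1): radiation error in K.

    s0_k           -- S0(h), reference radiation error (K) at solar elevation h
    rel_irradiance -- I(h, p)/I0, irradiance relative to the solar constant
    Both must be supplied by the caller (e.g. read off published curves).
    """
    ventilation = (V0 * temp_k) / (v * T0)
    pressure_term = (P0 / p_hpa) ** alpha
    return rel_irradiance * s0_k * ventilation * pressure_term


def taper_weight(p_hpa, p_zero=700.0, p_full=200.0):
    """Weight rising from 0 at p_zero to 1 at p_full.

    Linear in ln(p), used here as a proxy for altitude (an assumption
    of this sketch, not a statement about the original procedure).
    """
    if p_hpa >= p_zero:
        return 0.0
    if p_hpa <= p_full:
        return 1.0
    return math.log(p_zero / p_hpa) / math.log(p_zero / p_full)


# Hypothetical inputs: S0(h) = 2.0 K, I/I0 = 0.9, alpha = 0.852.
print(radiation_error(100.0, 216.7, 2.0, 0.9, 0.852))
print(radiation_error(200.0, 216.7, 2.0, 0.9, 0.852))
print(taper_weight(500.0))
```

At the reference conditions (100 hPa, 216.7 K, 5 m s−1) the ventilation and pressure terms are both 1, so the error reduces to the product of the relative irradiance and S0 (h).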
Table I gives an overview of S0 (h) and α adopted for each sonde type. This paragraph briefly describes
how they were derived. As mentioned above, both S0 (h) and α (0.852) are known for the Finnish sonde type.
However, an analysis of the raw data suggests that these data were already corrected for the radiation error,
as was partly expected from the literature (OMI, 1951; see also Väisälä (1941, 1949)). Hence no correction
was applied. For the UK and Soviet sondes, S0 (h) and α were determined from the correction curves given
in Tweles and Finger (1960) for various sonde types and pressure levels. S0 (h) was taken from the curve
referring to the 100 hPa level, divided by I (h, p)/I0 . The parameter α was estimated by comparing S0 (h)
with the corresponding function at 200 hPa. Values of around 0.78 and 0.82 were found for the UK and
Soviet sondes respectively. Note, however, that the curves for the UK sonde in Tweles and Finger (1960)
refer to type MKII-B, whereas the data probably stem from type MKI, and for the Soviet sonde they refer to
type RZ-049, which may or may not have been used for the soundings re-evaluated in this study.
Figure 2. Radiation error in geopotential height (gpm) as a function of solar elevation (°) and pressure level (225, 96, and 41 hPa).
Dashed lines with crosses: error estimated by Scherhag (1948). Thick lines: fit to these curves using the standard correction and
parameters as described in Table I
For estimating S0 (h) and α for the German sonde, I started from curves of the radiation error in geopotential
height (Figure 2), which were empirically estimated for three pressure levels (225, 96, 41 hPa) as a function
of h by Scherhag (1948). The curves are based on a large amount of data, but are crude in the sense that they
were determined from monthly mean values. Scherhag (1948) points out that they are probably wrong for low
solar elevations. Since the curves are for geopotential height, they largely depend on the temperature errors
of the levels below. As a first guess, I chose the same S0 (h) as for the Finnish sonde, scaled with a factor
f , and iteratively tested combinations of f and α until a best match with the original curves was found. The
temperature error was converted into geopotential height using the thickness equation as described below. A
very good agreement for the 225 hPa and 96 hPa curves was found using f = 0.78 and α = 0.75 (Figure 2).
The agreement is acceptable also for the 41 hPa level up to 30° solar elevation, whereas large discrepancies
are found for very high solar elevations.
There are several indications that the radiation error of the US sonde was small. According to Diamond
et al. (1937), the connection between the temperature element and the body of the sonde was minimal. OMM
(1952) mentions a small radiation error. For later sonde types, Tweles and Finger (1960) also find a very
small error. Because the US data were used only up to the 300 hPa level and because most ascents were
during the night, I assumed that no correction was necessary.
6.2. Lag error
The lag error ΔTL can be approximated by the product of the lag coefficient λ, the ascent velocity v, and
the temperature lapse rate Γ = −dT/dz:
$$\Delta T_L(v, p) \approx \lambda(v, p)\, v(p)\, \Gamma(p) \qquad (2)$$
The lag coefficient is a property of the instrument and additionally depends on air pressure and ascent
velocity (see Knowles Middleton et al. (1938)):
$$\lambda(v, p) = \lambda_0\, v(p)/v_0\, f(p) \qquad (3)$$
where λ0 is the lag coefficient at sea level for the ascent velocity v0 and f (p) is a function describing
the pressure dependence. I again used v(p) = v0 = 5 m s−1. The function f (p) was taken from the values
tabulated by Scrase (1954) for the UK Met Office sonde for different altitude levels (interpolated to pressure
levels using the US76 standard atmosphere). The lapse rate Γ was taken from the (uncorrected) temperature
of the two neighbouring levels (or the given level and the neighbouring level at the top and bottom levels)
during the ascent. In the case of missing data, Γ was taken from the climatology.
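A short Python sketch may help to make Equations (2) and (3) concrete. It is an illustration under stated assumptions, not the code used in this study: because v(p) = v0 was used throughout, the velocity ratio in Equation (3) equals 1 and is omitted, the pressure factor f (p) is passed in as a number (the tabulated values are not reproduced here), and all function names and sample values are hypothetical.

```python
V0 = 5.0   # m s^-1, ascent velocity assumed for all radiosonde ascents


def lapse_rate(t_lower, t_upper, z_lower, z_upper):
    """Lapse rate (K per m) between two neighbouring levels: -dT/dz."""
    return -(t_upper - t_lower) / (z_upper - z_lower)


def lag_coefficient(lambda0, f_p):
    """Equation (3) for v(p) = v0, where the velocity ratio equals 1."""
    return lambda0 * f_p


def lag_error(lambda0, f_p, gamma, v=V0):
    """Equation (2): lag error (K) from lag coefficient (s),
    ascent velocity (m/s) and lapse rate (K/m)."""
    return lag_coefficient(lambda0, f_p) * v * gamma


# Hypothetical example: temperatures (degC) and heights (gpm) at two levels.
gamma = lapse_rate(t_lower=-20.0, t_upper=-39.5, z_lower=5500.0, z_upper=8500.0)
print(gamma)                                   # lapse rate in K per m
print(lag_error(lambda0=15.0, f_p=1.0, gamma=gamma))
```

With a lag coefficient of 15 s, an ascent velocity of 5 m s−1 and a lapse rate of 6.5 K km−1, the lag error is roughly 0.5 K, which is of the same order as the values reported for the bimetal sondes.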
The lag coefficient λ0 is known only approximately. Adjusting the values from Scrase (1954) for the UK
sonde to v0 gives λ0 = 8 s at sea level. This value is not in agreement with the lag errors mentioned in
OMM (1952), which correspond to a lag coefficient λ0 of around 15 to 20 s for the Finnish, UK and German
sondes. In fact, Väisälä (1941) mentions a lag coefficient of the Finnish sonde of 15 s. The US sonde has a
substantially lower temperature lag because of the different sensor used. The numbers given by Diamond et al.
(1937) suggest a lag coefficient of around 4.5 s, which is in good agreement with the lag error mentioned
by OMM (1952). These considerations suggest values around 15 s for the bimetal sensor sondes (Finnish,
German, UK, and Soviet sondes) and 5 s for the US sonde. However, comparisons with reference series
(see Section 7, Figure 5), averaged over all data from one sonde type revealed better results when using
lag coefficients of 10 and 20 s for the Soviet and German sondes, respectively. These values are physically
plausible and were adopted as standard corrections, although the small systematic differences found might
also be due to causes other than an inaccurate lag coefficient (e.g. wrong ascent velocity, pressure offset).
Similar comparisons also revealed that no lag correction was necessary for the data from the Finnish and
US sondes (see Section 7). Table I summarizes the standard corrections applied for all radiosonde records.
Aircraft data were not corrected.
6.3. Error in geopotential height
After correcting the temperature by a total amount ΔT, the geopotential height Z was corrected using the
thickness equation:
$$\Delta Z_1 = \Delta Z_0 + \frac{R_d}{g}\, \frac{\Delta T_{v0} + \Delta T_{v1}}{2}\, \ln\frac{p_0}{p_1} \qquad (4)$$
where Rd is the gas constant for dry air, g is the acceleration due to gravity, Tv the virtual temperature,
p pressure, and Δ denotes the correction to the respective quantity. The subscripts 0 and 1 denote two
neighbouring pressure levels. Since humidity data were not digitized
from the soundings, specific humidity from the reference climatology was used to calculate Tv. ΔZ0 was set
to zero at 1000 hPa and ΔZ was then calculated upwards to the highest level. For a given start time, the
temperature correction can be calculated for all levels, even if no data are available. Similarly, the geopotential
height correction can be calculated even if, for example, only data for the 250 hPa level are available.
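The upward integration of the thickness equation can be sketched as follows. This Python example is a minimal illustration under stated assumptions, not the code used for the data set: it reads Equation (4) as acting on the corrections (temperature correction in, height correction out), uses the common approximation Tv ≈ T(1 + 0.608 q) for the virtual temperature, and all function names and sample values are hypothetical.

```python
import math

RD = 287.05    # J kg^-1 K^-1, gas constant for dry air
G = 9.80665    # m s^-2, acceleration due to gravity


def virtual_temperature_correction(dt, q):
    """Temperature correction (K) expressed as virtual temperature,
    using Tv ~ T * (1 + 0.608 q) with specific humidity q in kg/kg."""
    return dt * (1.0 + 0.608 * q)


def height_corrections(pressures_hpa, dtv):
    """Integrate the thickness equation upwards.

    pressures_hpa -- levels in descending pressure, starting at 1000 hPa
    dtv           -- virtual-temperature correction (K) at each level
    Returns the height correction (gpm) per level, zero at the first level.
    """
    dz = [0.0]
    for i in range(1, len(pressures_hpa)):
        mean_dtv = 0.5 * (dtv[i - 1] + dtv[i])
        layer = RD / G * mean_dtv * math.log(pressures_hpa[i - 1] / pressures_hpa[i])
        dz.append(dz[-1] + layer)
    return dz


# Hypothetical temperature corrections (K) growing with height, dry air.
levels = [1000.0, 700.0, 500.0, 300.0, 200.0, 100.0]
dt = [0.0, 0.0, 0.3, 0.6, 1.0, 1.5]
dtv = [virtual_temperature_correction(x, q=0.0) for x in dt]
for p, dz in zip(levels, height_corrections(levels, dtv)):
    print(p, round(dz, 1))
```

Because the integration starts at 1000 hPa, a height correction is obtained at every level once the temperature corrections are known, independently of which levels actually carry observations.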
6.4. Adjusting for changes in sonde start time
The start times of the sondes changed frequently within each record and were different for different
locations. For some of the analyses, it is desirable to correct for these changes in start time in order to
make the soundings comparable to each other. In this case, the daily mean value was used as a reference
and the soundings were adjusted by subtracting the corresponding difference determined from the reference
climatology which, however, introduces some additional uncertainty (see Section 3.6).
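The adjustment amounts to removing the climatological diurnal departure at the time of the ascent. The Python lines below are a minimal sketch under stated assumptions: the function name and the sample values are hypothetical, and the two climatological values are assumed to come from the reference climatology described in Section 3.6.

```python
def adjust_to_daily_mean(value, clim_at_obs_time, clim_daily_mean):
    """Shift an observation to the daily-mean reference by subtracting the
    climatological difference between the observation time and the daily mean."""
    return value - (clim_at_obs_time - clim_daily_mean)


# Hypothetical example: a sounding at a time of day that is climatologically
# 0.8 K warmer than the daily mean at this level.
print(adjust_to_daily_mean(-50.0, clim_at_obs_time=-49.2, clim_daily_mean=-50.0))
```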
6.5. Final data products
The procedure to obtain a final data product from the raw data involves many steps, as discussed in
this section. Some of them are well supported; others have a more preliminary character because sufficient
information is currently not available. The final product is called Version 1.0. When further data are re-evaluated and more information becomes available, some of the procedures might be reassessed and lead
to a new version of the data set. Therefore, it is important that all intermediate products are archived and
described.
The digitized and controlled, but uncorrected, data are termed DC. A data set is then produced in which
only the changes in start time are adjusted for, but not the radiation/lag error (DCA). The term DCR is used
for a data set where only radiation/lag error is accounted for but not the changes in start time. The data set
where both corrections are applied is termed DCRA. These are the main products, from which the following
are derived. DCRD is a data set with daily averages after removing flagged values, and DCRM are monthly
mean values formed from DCRD if, for a given level, variable, and month, one of the two following applies:
the number of observations is at least 13, or no gaps are longer than 7 days.
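The criterion for forming a monthly mean can be written as a small check. The sketch below is an interpretation, not the original implementation: it treats the two conditions as alternatives (following the phrase "one of the two following applies"), and it assumes that a gap is a run of consecutive days without a daily value, counted within the month including its first and last days. Function and parameter names are hypothetical.

```python
def longest_gap(obs_days, days_in_month):
    """Longest run of consecutive days without a daily value (month edges included)."""
    edges = [0] + sorted(set(obs_days)) + [days_in_month + 1]
    return max(later - earlier - 1 for earlier, later in zip(edges, edges[1:]))


def qualifies_for_monthly_mean(obs_days, days_in_month, min_obs=13, max_gap=7):
    """Return True if a monthly mean would be formed under the assumed reading."""
    n = len(set(obs_days))
    if n == 0:
        return False
    return n >= min_obs or longest_gap(obs_days, days_in_month) <= max_gap


# Ascents roughly twice per week: fewer than 13 values, but no gap longer than 7 days
twice_weekly = [2, 5, 9, 12, 16, 19, 23, 26, 30]
print(qualifies_for_monthly_mean(twice_weekly, 30))
```

Under this reading, a record with regular but infrequent ascents still yields monthly means, whereas a month with only a few clustered ascents does not.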
For each of these data sets, three versions are available: R, C, and S (e.g. R.DCA, C.DCRD, S.DCRM).
‘R’ refers to the standard correction described in this section and summarized in Table I. ‘C’ refers to the
data sets after additional corrections during the assessment procedure (see Section 7). Finally, ‘S’ refers to
the ‘C’ data set after records from neighbouring sites were combined. In this case, one of the two records was
adjusted for the climatological difference between the two sites determined from the NCEP–NCAR reference
for each month, level, and variable (see full report for details).
7. QUALITY ASSESSMENT
The quality of historical upper-air data needs special attention (see Section 5); a thorough assessment is very
important. However, it is difficult to find independent data series that could be used as a reference. In this
section, ways of assessing the data quality are discussed, some results are shown and a summarized assessment
is given. The main procedures were applied to 92 records (around 850 series); further tests were performed
for 22 records. More detailed results can be found in the full report.
7.1. Method
The assessment is based on statistical tests, most of which involve a comparison between the candidate
series and a reference series. In most cases, the latter was statistically reconstructed by using information from
the Earth’s surface. Meteorological series from nearby high-elevation sites can also be used as a reference;
however, mountain sites may show a different behaviour than the free atmosphere, depending on the time
of day and season (Barry, 1992). Under certain assumptions, which will be discussed below, a set of tests
for both accuracy and precision can be derived from the difference between the candidate series and the
reference series. If a significant bias is found in a record and the decision is taken to correct the data, then
the corresponding reference series is used to guide the correction and the assessment is repeated in order to
check the consistency of the corrections. In the following, the procedure is discussed in detail.
Monthly reference series were statistically reconstructed separately for each of the roughly 850 candidate series,
based on predictor series that are available both in the historical period and in a long period (∼40
years) overlapping the NCEP–NCAR data set (calibration period). In the calibration period, the NCEP–NCAR
data, interpolated to the location of the sounding site, were assumed to represent the true value at that location.
First, the mean seasonal cycle of all variables was determined in the calibration period and was subtracted
from all data, including the historical radiosoundings. A multiple regression model was then fitted to the
anomalies in the calibration period and was used to reconstruct a reference series in the historical period.
Different sets of predictor variables were used for each record (see full report for details). In most cases I
used the time series of the first three principal components of a pressure field (300 hPa geopotential height
over Europe, which itself is a reconstruction based only on data from the Earth’s surface, or continental-sized
SLP fields) and two to six temperature time series from surrounding sites, if possible mountain sites. The
skill of the reconstructions varies strongly with location and altitude. In general, I considered a reconstructed
series ‘useful’ if the explained variance $R^2$ in the calibration period was at least 60%. This was normally the
case up to around 400 or 300 hPa.
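Two of the steps described above, removing the mean seasonal cycle and checking the explained variance against the 60% threshold, can be sketched in Python. The regression fit itself is left out; the fitted values are assumed to come from whatever multiple regression model is used. All names are hypothetical and the sketch is only an illustration of the bookkeeping.

```python
from collections import defaultdict


def anomalies(values, months):
    """Subtract the mean seasonal cycle (mean per calendar month) from a series."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for value, month in zip(values, months):
        sums[month] += value
        counts[month] += 1
    return [value - sums[month] / counts[month] for value, month in zip(values, months)]


def explained_variance(truth, fitted):
    """Explained variance R^2 of fitted values with respect to the calibration data."""
    mean_truth = sum(truth) / len(truth)
    ss_res = sum((t - f) ** 2 for t, f in zip(truth, fitted))
    ss_tot = sum((t - mean_truth) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


def is_useful(truth, fitted, threshold=0.60):
    """Apply the 60% explained-variance threshold quoted in the text."""
    return explained_variance(truth, fitted) >= threshold
```

Here `truth` would be the anomalies of the interpolated NCEP–NCAR data in the calibration period and `fitted` the regression estimate for the same months.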
The general statistics for assessing the accuracy and precision of the data using statistically reconstructed
reference series are as follows. The bias, i.e. the mean difference between the observed value (subscript O)
and the true value (T) in the historical period, is estimated from the difference between the observed and the
reconstructed (R) values:
$$\bar{x}_O - \bar{x}_T = (\bar{x}_O - \bar{x}_R) + (\bar{x}_R - \bar{x}_T) \approx \bar{x}_O - \bar{x}_R \qquad (5)$$
which assumes that the reconstructions are unbiased. The standard error of the bias, $SE_{O-T}$, can be calculated
from the sum of the variances of $x_O - x_R$ (error of observations with respect to reconstructions) and $x_R - x_T$
(error of reconstructions), assuming independence:
$$SE_{O-T} = \sqrt{\frac{s^2_{O-R} + s^2_{R-T}}{n}} \qquad (6)$$
where $n$ is the number of differences in the historical period. The error of the reconstruction in the historical
period, $s^2_{R-T}$, is not known and is approximated by the corresponding error in the calibration period, i.e. the
variance of the residuals. A bias is normally termed significant if
$$|\bar{x}_O - \bar{x}_T| > 2\,SE_{O-T} \qquad (7)$$
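Equations (5) to (7) translate into a short calculation, sketched below in Python. The sketch takes a bias as significant when its magnitude exceeds two standard errors and assumes that the residual variance from the calibration period is supplied by the caller; the function and argument names are hypothetical.

```python
import math
from statistics import mean, variance


def bias_test(observed, reconstructed, recon_error_var):
    """Estimate the bias, its standard error, and whether it is significant.

    observed, reconstructed: monthly values in the historical period.
    recon_error_var: variance of the regression residuals in the
    calibration period, used as a stand-in for the reconstruction error.
    """
    diffs = [o - r for o, r in zip(observed, reconstructed)]
    n = len(diffs)
    bias = mean(diffs)                                        # Equation (5)
    se = math.sqrt((variance(diffs) + recon_error_var) / n)   # Equation (6)
    significant = abs(bias) > 2.0 * se                        # Equation (7)
    return bias, se, significant
```

The sample variance requires at least two differences; in the assessment described here, a bias was in general only considered for series with at least five monthly values.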
The precision can be assessed similarly by assuming that the variance of the differences between observed
and reconstructed series is composed of the observation error and the reconstruction error, again assumed
independent:
$$s^2_{O-R} = s^2_{O-T} + s^2_{R-T} \qquad (8)$$
For the observation error ($s^2_{O-T}$), I have defined a target precision referring to a 90% interval (Section 2).
Dividing the target by 1.644 gives an approximate target standard deviation $s_{Targ}$. Accordingly, one can assess
a series as precise if
$$s^2_{O-R} \le s^2_{Targ} + s^2_{R-T} \qquad (9)$$
where $s^2_{R-T}$ is again approximated by the variance of the residuals. The variance $s^2_{O-R}$ is estimated from
a small sample, which introduces an additional uncertainty. Hence I used the lower 95% confidence limit
instead, approximated by (von Storch and Zwiers, 1999)
$$\frac{(n-1)\,s^2_{O-R}}{\psi_L} \qquad (10)$$
where $\psi_L$ indicates the lower critical value of the $\chi^2$ distribution at a probability of 0.05 ($n-1$ degrees of
freedom). In principle, the same procedure can also be applied when comparing series from two very close
sites A and B:
$$s^2_{A-B} \le 2\,s^2_{Targ} \qquad (11)$$
This relation can also be used to assess the quality of daily data. Note, however, that the difference in
space and time between observations A and B introduces a significant additional error term, which cannot be
estimated. Using this relation, therefore, leads to too frequent rejections. To some extent this also applies to
Equations (6) and (9), where $s^2_{R-T}$ was approximated by the variance of the residuals. Because of the problem
of overfitting, the true $s^2_{R-T}$ might be underestimated.
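The precision criteria of Equations (9) and (11) can be sketched in Python as follows. The sketch uses the plain sample variance and deliberately omits the confidence-limit refinement of Equation (10); the target values passed in are placeholders, since the actual targets are defined in Section 2. All names are hypothetical.

```python
from statistics import variance


def target_std(target_90):
    """Approximate target standard deviation from a target given as a 90% interval."""
    return target_90 / 1.644


def meets_precision_target(observed, reconstructed, recon_error_var, target_90):
    """Equation (9): variance of the differences versus target plus reconstruction error."""
    diffs = [o - r for o, r in zip(observed, reconstructed)]
    return variance(diffs) <= target_std(target_90) ** 2 + recon_error_var


def neighbours_meet_target(series_a, series_b, target_90):
    """Equation (11): comparison of two very close sites A and B."""
    diffs = [a - b for a, b in zip(series_a, series_b)]
    return variance(diffs) <= 2.0 * target_std(target_90) ** 2
```

Because the additional variance caused by the separation of the sites in space and time is not accounted for, `neighbours_meet_target` is a strict test, in line with the remark above that the relation leads to too frequent rejections.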
There are also more fundamental problems when using statistically reconstructed reference series. First,
the predictor series might be inhomogeneous and cause a bias in the reconstructions. Second, the underlying
assumption that the relation between the predictors and the predictand is the same in the historical and the
calibration period is not necessarily true, especially because the early 1940s probably represent an anomalous
period and because climatic trends were registered over the last 50 years. Note that a trend in the calibration
period does not necessarily bias the reconstructions if its regional spatial structure is similar to that of month-to-month variability. Third, one has to keep in mind that the final data set will not be fully independent of
the different data sets used for the reconstruction of the reference series. Fourth, the ‘true values’ (i.e. the
NCEP–NCAR data) used for the calibration are complete series with no gaps, whereas the monthly mean
values in the historical data set are sometimes based on only a few data points. Hence, a too large variability of
the monthly mean values does not necessarily imply that the daily data are not precise. Fifth, the reconstructed
series have less skill when moving to higher levels and, if the skill drops to very low values, do not have
more information than the long-term mean value. Therefore, any test is but one of several arguments on
which a decision is based. It is reasonable to consult other statistics, such as the amount of the bias, $R^2$,
$n$, the shape of the difference profile, and comparisons with neighbouring stations. Generally, I addressed a
bias as significant only for $n \ge 5$ and if $R^2$ was at least 60%. Because of these shortcomings, the validation
procedure is not a universally applicable test, and in some cases it lacks power. Nevertheless, it turned out
to be a useful guideline for decisions and for the assessment of the quality of the data series.
It should be noted that most of the series in the ‘R’ data set are independent from the reference series.
The radiosonde data from Germany and the Soviet Union, however, are not independent, because the lag
coefficient was adjusted based on a comparison with the reference data (although for any individual series the
effect on the statistics is small). If a bias in a series is detected, then it can be corrected based on information
from the reference series. Hence, the two series are no longer independent with respect to the bias. Still, it is
necessary to assess the corrected series in the same way in order to check the consistency of the corrections.
As far as the tests for the precision are concerned, the candidate series can be considered independent from
the reference series because the corrections do not (or only to a negligible extent) affect the variance of the
differences between the candidate and the reference series.
7.2. Corrections
If a significant bias is found, then there are three possible procedures. First, the record can be rejected.
Normally this concerns the entire period, except if there are clear inhomogeneities, in which case the record
can be split into parts. Second, the record can be left uncorrected, despite the significant difference. This can
be the case if the bias is small compared with the target accuracy and concerns only one series, whereas
corrections would cause another series to be significantly biased. Third, if the precision of the series is good,
then it can be corrected.
Whenever possible, I did not use a purely statistical correction but one that assumes an underlying physical
mechanism. Assuming that lag and radiation corrections are adequate, possible mechanisms include offsets
in the temperature or pressure element. Both lead to an offset in temperature that is constant with altitude in
the former case and strongly increases with altitude up to the upper troposphere and then sharply decreases
at the tropopause level in the latter case.
A constant offset in temperature can simply be added to the radiation and lag error terms. The effect of an
offset in pressure $\Delta P$ on the temperature can be calculated as
$$\Delta T_p(p) = \gamma(p)\,\Delta Z(p) \qquad (12)$$
with
$$\Delta Z(p) = -\frac{R_d}{g}\,T_v(p)\,\ln\frac{p + \Delta P}{p} \qquad (13)$$
which is then added to the radiation and lag error terms in order to calculate the effect on geopotential height.
The offset amount was determined so as to minimize the temperature difference between the candidate series
and the reference series averaged over all levels for which the latter were considered reliable. Correcting for
an offset in a physically consistent manner is expected to reduce the offset simultaneously at all levels for
both temperature and geopotential height by changing just the value for the offset. This was mostly the case,
and the correction was then accepted. If this was not the case, then the record was rejected or left uncorrected
(see above).
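The effect of a pressure offset described by Equations (12) and (13) can be sketched in Python. The sketch assumes that the coefficient multiplying the height offset in Equation (12) is the local vertical temperature gradient, supplied here as `dT_dz` in K per metre; the constants are standard values and all names are hypothetical.

```python
import math

R_D = 287.05   # gas constant for dry air [J kg^-1 K^-1] (assumed standard value)
G = 9.80665    # acceleration due to gravity [m s^-2] (assumed standard value)


def height_offset(p_hpa, tv_k, dp_hpa):
    """Equation (13): height offset corresponding to a pressure offset dp_hpa."""
    return -(R_D / G) * tv_k * math.log((p_hpa + dp_hpa) / p_hpa)


def temperature_offset(p_hpa, tv_k, dp_hpa, dT_dz):
    """Equation (12): temperature offset, with dT_dz the assumed vertical gradient [K/m]."""
    return dT_dz * height_offset(p_hpa, tv_k, dp_hpa)


# Invented example: 500 hPa, Tv = 250 K, offset of -30 hPa, gradient of -6.5 K/km
dz = height_offset(500.0, 250.0, -30.0)
dt = temperature_offset(500.0, 250.0, -30.0, -0.0065)
```

Because the vertical temperature gradient is strong in the troposphere and changes sharply at the tropopause, the resulting temperature offset has the altitude dependence described in the text, in contrast to the constant profile produced by an offset in the temperature element.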
7.3. Radiation correction and adjustment of start time
In order to assess the radiation correction, pairs of soundings performed on the same day were studied. It
is expected that the uncorrected data (R.DC) show higher temperatures and geopotential heights during the
day than at night, because of the true diurnal cycle and the effect of radiation. In data sets corrected for one
or both of these effects (R.DCA and R.DCRA), the difference should be smaller. Note, however, that it is
not possible to distinguish between an inaccurate radiation correction and an inaccurate adjustment for the
start time because the true diurnal cycle is not known.
This analysis was performed for 502 pairs of ascents from Freiburg i. B. from the German network and
76 pairs from Lerwick (Shetland Islands) from the UK network. Note that, apart from these 76 pairs, almost
only the 250 hPa level was digitized for all UK soundings. Only pairs were chosen where the solar elevation
of one or both ascents was larger than −5° at the 200 hPa level. The ascent with the lower solar elevation
angle was then subtracted from the other ascent. Figure 3 shows the mean differences (95% confidence intervals) for both sites for geopotential height and temperature. Weather changes can occur within a few hours;
therefore, the error bars are large. Nevertheless, in both cases a strong, statistically significant error is found
in the uncorrected data. The adjustment for the start time reduces the difference, but it is still significant.
After the radiation and lag correction, the differences are no longer statistically significant in the case of the
German data.
In the case of the UK data, there remains a significant difference between daytime and nocturnal ascents
after the standard correction. This could be due to the fact that the function $S_0(h)$ refers to an updated
(MK-IIB) version of the UK Met Office sonde. Additional evidence for a too weak radiation correction
comes from the comparison of 250 hPa temperature with the reconstructed reference series, although the
latter results were not statistically significant. I decided to accept the standard correction for the ‘R’ data set,
but to scale the function $S_0(h)$ for the alternative ‘C’ data set. A factor of 1.5 was chosen, which eliminates
the difference at the tropopause and leaves no significant differences elsewhere. The correction is crude and
preliminary and will be re-assessed in future versions of the data set if further soundings from the UK network
are digitized. Note that the scaling affects the accuracy rather than the precision. This is because there are
almost exclusively early morning ascents for the entire UK network (apart from the 76 pairs). The effect on
temperature at 250 hPa (for all UK sites) is less than 0.55 °C in 97% of the cases (the maximum is 0.81 °C),
whereas the mean value changes by 0.25 °C.

Figure 3. Mean differences of temperature and geopotential height between high and low solar elevation for pairs of ascents made on
the same day (see text) at different levels for the records from Freiburg i. B. (a, b) and Lerwick, UK (c, d). Open diamonds: R.DC
data; dashed lines without symbols: R.DCA data; filled diamonds: R.DCRA data; open squares: C.DCRA data for Lerwick. The error
bars for the R.DCRA data refer to the 95% confidence interval
7.4. Accuracy
Figure 4 shows the results of the assessment of the accuracy for Freiburg i. B., Germany. It shows the
mean difference between the candidate series and the reconstructed reference series, for the data sets DCA
and DCRA respectively, along with the standard errors of the reconstructed mean anomalies (grey bars around
the zero line) and the standard errors of the mean differences (confidence intervals with whiskers). Freiburg
represents at the same time the best and by far the worst outcome of all validation results. A visual comparison
of the observed and the reconstructed time series revealed a possible offset after the first 4 months. First,
only the later part of the series, from May 1940 to April 1942 (Figures 4(a) and 4(b)), was investigated.
The reconstructed reference series for Freiburg i. B. are among the best and are reliable up to 200 hPa for
geopotential height ($R^2$ = 95%) and up to 300 hPa for temperature. There is a highly significant positive
offset of temperature and geopotential height in the DCA data that confirms the need for radiation and lag
corrections. The remaining differences after the standard correction (R.DCRA) are small and insignificant
(the difference in 100 hPa temperature cannot be assessed). This figure represents the desired outcome of
the assessment, and the most frequent: around 70% of all series tested fall into this category, although the
differences and errors were normally larger than in Figure 4(a) and (b).

Figure 4. Mean differences (error bars give ±1 standard error) between observations and reconstructions of temperature and geopotential
height at different levels for radiosonde data from Freiburg i. B., May 1940 to April 1942 (a, b) and February to April 1940 (c, d).
Open diamonds: R.DCA data; filled diamonds: R.DCRA data; open squares: C.DCRA data. The shaded error bars give ±1 standard
error of the reconstructed mean anomaly
The same does not apply to the first part of the Freiburg series. Figure 4(c) and (d) shows the differences for
February to April 1940. Note that the scale is extended by a factor of two compared with Figure 4(a) and (b).
Although $n < 5$, I addressed these extremely large differences as significant. A comparison with three-times-daily data from the nearby mountain sites Jungfraujoch and Säntis suggests that the break possibly occurred
around 15 April 1940. The 36 pairs of ascents in the first part of the record give no evidence for an inaccurate
radiation correction. Rather, the vertical profile of the temperature difference is typical for a pressure offset.
Because the precision of the first part of the record is very good in the lower and middle troposphere (see
Section 7.6), I decided to correct for a possible pressure offset. I estimated $\Delta P$ in the soundings prior to 15
April so as to cancel out the error in temperature at the lowest four levels. The corresponding profile (open
squares) fits very well with the reconstructions for both temperature and geopotential height. Note, however,
that this correction ($\Delta P$ = −30 hPa) is extraordinary: the second largest correction applied to any series was
7 hPa. Because of the precision of the data and the reliability of the reference series, the correction was
accepted and a ‘C’ data set was formed by combining the corrected first part with the second part.
For the other records from western and central Europe (data from the Lindenberg compilation, Wetterbericht,
UK Weather Report, and Torslanda) the reconstructions were reliable up to around 300 hPa and most of the
series were sufficiently long, so that the 95% confidence intervals for the bias were mostly small (±0.3 to ±0.6 °C
for temperature). Around 40% of the records were corrected because a significant bias was found.
The validation was more difficult for radiosondes from the former Soviet Union. On the one hand, reliable
reference series could often be obtained only up to 500 hPa. On the other hand, most of the records are
short. In addition, there is a sampling error in the monthly mean values because the data were not reported
regularly in the data source (Wetterbericht). The 95% confidence interval for the bias was often in the range
of ±1 to ±1.5 °C, i.e. clearly larger than the target accuracy, which drastically reduces the power of the test.
Possible problems were identified for eight records. In four cases, corrections were applied, and in one case,
the beginning and ending of the record were rejected. In the other three cases, a correction was not possible
or not appropriate. The data from the former Soviet Union and the corrections need to be re-examined if
more data become available. However, at least the procedure can be used to assess the standard correction
adopted for the radiosonde data when pooling the differences from seven records that were considered reliable.
Figure 5 shows the mean differences and 95% confidence intervals (two standard errors) for the R.DCA and
R.DCRA data. Although very large offsets are found in the R.DCA data, the standard correction leaves only
small, insignificant differences. The standard correction is also slightly better than an alternative correction
using λ0 = 15 s (squares, see Section 6.2), which would result in significant differences.
The procedure works much better for the US data. The reference series were mostly reliable up to 400 hPa
(temperature) and 300 hPa (geopotential height), with the exception of the tropical sites. The number of
monthly mean values is high for most of the series, so that the 95% confidence interval for the bias was
normally in the range of ±0.3 to ±0.5 °C. The results clearly show that there is no need for any radiation and
lag correction. In general, the series were found to be accurate. Nine records were corrected for temperature
offsets, mostly around 0.5 °C (no pressure offsets were found). As an example, the results for Charleston are
shown in Figure 6. The temperature offset is almost constant with altitude and easy to correct for, whereas
there remains a small but significant offset in geopotential height (no additional correction was performed).
In five other records, small but slightly significant offsets were accepted without correction, one record was
entirely rejected and one was partly rejected. In many records, the lowest level (850 hPa) was rejected. This is
probably related to the structure of the planetary boundary layer, either due to the interpolation from altitude
levels to pressure levels or because the topography and the boundary layer are not represented accurately in
the reference data set (NCEP–NCAR).

Figure 5. Mean differences between observations and reconstructions of temperature (a) and geopotential height (b) at different levels
for radiosonde data from seven sites from the former Soviet Union pooled (n is between 9 and 80). Open diamonds: R.DCA data; filled
diamonds: R.DCRA data; open squares: alternative DCRA data when using λ0 = 15 s. Error bars for R.DCA and R.DCRA data give
the 95% confidence interval

Figure 6. Mean differences (error bars give ±1 standard error) between observations and reconstructions of temperature (a) and
geopotential height (b) at different levels for radiosonde data from Charleston, USA. Filled diamonds: R.DCRA data; open squares:
C.DCRA data. The shaded error bars give ±1 standard error of the reconstructed mean anomaly
A summarized assessment of the accuracy is given in Table II and Figure 7. Averaged over all series from
one data source, the bias is very small in the ‘C’ data set. Some small systematic errors, such as too high
temperatures in the data from the USA and Sweden or too low a geopotential height in the Lindenberg
data, are possible. The comparison with the ‘R’ data set shows that the corrections, on average, led to lower
temperatures and geopotential height. Figure 7 shows the mean bias and 95% confidence interval for each
series in the form of a map. The 400 hPa level was chosen because it is the highest level for which an
assessment is possible for more than just a few series. The figure for geopotential height shows a possible
tendency towards a negative bias in western and central Europe and a positive bias for the Alaskan stations.
No spatial patterns of the bias can be found in the map for temperature. Most of the estimated biases in North
America, as well as in western and central Europe, are small and relatively well determined (small confidence
intervals). However, the differences between the candidate and the reference series are larger and less well
determined in the case of the former Soviet Union because of short candidate series and unreliable reference
series. The large circles do not indicate that there is a bias or that the quality is low. Rather, they indicate

Figure 7. Map of the mean bias and its 95% confidence interval ($2\,SE_{O-T}$) for temperature and geopotential height at 400 hPa in the
data set ‘C’. Dashed circles indicate series that were corrected during the assessment procedure. Only series with n ≥ 5 are displayed

Table II. Mean difference between the candidate series and the reference series in the data sets ‘C’ and ‘R’ (without
rejected series) sorted by archival source, variable, and pressure level
Level (hPa)
Data set ‘C’
Linden.
Temperature (° C)
700
−0.05
600
0.09
500
0.04
400
0.00
300
0.14
250
Data set ‘R’ without rejected series
Wetterb.
UK-WR
Torslanda
MWR
Linden.
Wetterb.
UK-WR
Torslanda
MWR
0.01
−0.15
−0.21
0.11
−0.19
0.17
0.14
0.21
0.02
0.03
0.20
0.05
0.09
0.05
0.02
0.29
0.68
0.94
1.07
1.20
1.19
0.23
0.06
−0.05
0.20
0.10
0.88
0.14
0.21
0.02
0.03
0.20
0.15
0.21
0.17
0.14
0.40
Geopotential height (gpm)
700
−2.1
1.1
600
−3.0
−0.2
500
−2.6
−1.6
400
−4.6
0.7
300
−1.8
12.3
250
0.12
−0.15
−3.2
−1.7
−4.6
1.02
0.11
−0.4
0.1
−0.7
−2.0
1.1
−3.0
−1.1
1.6
−0.1
−0.9
4.9
7.6
13.1
18.3
30.9
3.8
3.0
1.2
2.4
20.7
2.2
44.9
−0.4
0.1
−0.7
−2.0
1.1
−2.1
0.8
4.1
3.4
3.2
−0.1
that the assessment significantly lacks power in these cases. In fact, as was shown in Figure 5, only a small
bias is found when these short series are pooled together. In summary, the bias is well within the pre-defined
target for most of the sites. However, the series from the former Soviet Union cannot be assessed with the
same explanatory power, especially at higher levels. The systematic errors are small.
7.5. Precision of monthly mean values
The assessment of the precision was performed for the ‘C’ data. Only 14 series from seven sites (see full
report for details) violated the assumption that the standard deviation is not larger than the target standard
deviation, which corresponds to around 2% of all variables tested. However, the test is not very powerful at
upper levels. Here, the method of comparing the differences between the series of two neighbouring stations
is used for five station pairs in Europe and the USA. For each pair, the data from one station were adjusted
for the climatological difference between the sites. These comparisons revealed no case with too high a
variability except for the pair Kjeller/Torslanda (255 km distance), where five series (four temperatures) were
outside the specified target. This could point to a possible precision problem. Because at Torslanda, ascents
were performed only twice per week, the larger variance could be due to the less frequent sampling. On
the other hand, the variance was only slightly larger than the critical value, and because the test does not
account for the true variance (not caused by measurement errors) of the differences, I did not consider this
as sufficient evidence for rejecting any of the series. A better data precision for monthly mean values (at
the price of less data) would be obtained by defining more rigorous criteria for monthly averages. This
could be more appropriate for certain applications. Note, also, that a low fraction of ascents reaching high
levels can cause a ‘fair weather’ bias in the monthly mean values if the burst of the balloon depends on the
meteorological conditions.
7.6. Precision of single ascents
For the case of Freiburg i. B., the mountain sites Säntis and Jungfraujoch can be used to assess the
precision of daily temperature and geopotential height data at the 700 hPa and 600 hPa levels respectively.
The comparison was restricted to pairs of measurements that were at most 3 h apart. The data that were not
adjusted for the diurnal cycle (C.DCR) were used. Figure 8 shows the corresponding scatter plots. One can
make the simple assumption that a linear function of the data from Jungfraujoch and Säntis (i.e. a least-squares
regression line) represents the true values for Freiburg i. B. and that the entire variability found in the scatter
plot is due to deviations of the radiosonde data. Note that mountain sites do not accurately represent the
surrounding free troposphere. Also, there is an error of the meteorological measurements, and the distance
of 200 km and time difference of 3 h also explain some variability. Hence, the variability of the radiosonde
data is largely overestimated. Still, even under this extreme assumption, the targets for the precision specified
for individual ascents or daily data are met in all four cases.

Figure 8. Scatter plots of geopotential height (a, b) and temperature (c, d) measured with the radiosonde at 700 and 600 hPa at Freiburg
(C.DCR data) versus pressure and temperature measured at the nearby mountain sites of Säntis and Jungfraujoch within at most 3 h
from the sonde ascent
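The comparison with the mountain sites amounts to fitting a least-squares line and attributing all of the scatter about it to the radiosonde. A minimal Python sketch of that calculation is given below; the function name is hypothetical, and the division by n − 2 (the usual residual degrees of freedom for a straight-line fit) is an assumption of this sketch rather than a detail taken from the text.

```python
import math
from statistics import mean


def residual_std(mountain_values, sonde_values):
    """Standard deviation of sonde values about the least-squares regression line."""
    mx, my = mean(mountain_values), mean(sonde_values)
    sxx = sum((x - mx) ** 2 for x in mountain_values)
    sxy = sum((x - mx) * (y - my) for x, y in zip(mountain_values, sonde_values))
    slope = sxy / sxx
    intercept = my - slope * mx
    residuals = [y - (intercept + slope * x) for x, y in zip(mountain_values, sonde_values)]
    # n - 2 degrees of freedom: slope and intercept are both estimated from the data
    return math.sqrt(sum(r * r for r in residuals) / (len(residuals) - 2))
```

The returned value can be compared with the target standard deviation for individual ascents (the target divided by 1.644). Since the scatter also contains the measurement error at the mountain site and real differences in space and time, meeting the target under this assumption is a conservative result.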
For the upper levels, series from neighbouring sites can be compared on a daily scale. Here, the amount of
data does not allow for a maximum time difference to be set, and the comparison was performed for the daily
mean values (C.DCRD) from eight station pairs. As for the assessment of monthly mean values, one of the
records was adjusted for the climatological differences between the two locations and it was assumed that the
entire variability is due to imprecise data. Despite this restrictive assumption, most of the series were found
to be within the specified targets (see full report for details). There were some exceptions, which concerned
temperature more often than geopotential height and which were mostly caused by only one or two very large
differences, pointing to remaining outliers. I did not reject any of the series, but note that more rigorous outlier
screening would be desirable, especially for the data from the former Soviet Union.
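A rough sketch of such a pairwise check is given below. It removes the median difference between two daily series (a simple stand-in for the climatological adjustment, whose details are not given here), computes the standard deviation of the remaining differences, and flags days with unusually large differences using a median-based threshold. The function name, the threshold factor k, and the sample values are illustrative assumptions.

```python
import numpy as np


def compare_pair(series_a, series_b, k=5.0):
    """Compare two daily series from neighbouring sites.

    Returns the spread of the offset-adjusted differences with and
    without flagged days, and the indices of the flagged days.
    """
    a = np.asarray(series_a, dtype=float)
    b = np.asarray(series_b, dtype=float)
    valid = ~(np.isnan(a) | np.isnan(b))  # days present in both series
    diff = a[valid] - b[valid]
    diff = diff - np.median(diff)  # remove the systematic offset
    # Robust scale estimate from the median absolute deviation.
    # If this is zero (many identical differences), every nonzero
    # difference is flagged, so the result needs manual inspection.
    robust_sigma = 1.4826 * np.median(np.abs(diff))
    is_outlier = np.abs(diff) > k * robust_sigma
    flagged_days = np.flatnonzero(valid)[is_outlier]
    spread_all = float(np.std(diff, ddof=1))
    spread_clean = float(np.std(diff[~is_outlier], ddof=1))
    return spread_all, spread_clean, flagged_days


# Made-up daily mean temperatures (degrees C) at two sites; day 6 is
# a gross error in the first series and day 7 is missing in the second.
site_1 = [-50.0, -52.0, -48.0, -51.0, -49.0, -50.0, -30.0, -51.0]
site_2 = [-51.0, -52.5, -49.5, -52.0, -50.5, -51.0, -51.5, np.nan]
spread_all, spread_clean, flagged_days = compare_pair(site_1, site_2)
```

In this toy case a single large difference dominates the overall spread, which mirrors the situation described above where one or two values decide whether a series meets the target.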
7.7. Summarized quality assessment
Based on all results, the following summarized quality assessment can be made:
• 132 upper-air records from 112 sites were digitized, of which 92 records (around 850 series) could be
assessed using statistically reconstructed reference series on a monthly mean basis. Further tests, involving
the comparison with neighbouring sites or nearby mountain sites, were performed for 22 records. A total of 25
records (or parts thereof) were corrected; for eight records, significant differences were not corrected, and
nine records or parts thereof were rejected. In several cases, the tests used for the validation lack power
because of too short series or unreliable reference series.
• The accuracy is probably the quality criterion most difficult to meet. For most of the series it is within
the specified targets up to the level where validations could be performed (500 to 200 hPa). However, for
many records, this was only the case after corrections additional to the standard correction.
• The precision of monthly mean values is, in almost all cases, within the specified targets; exceptions concern
mostly temperature. However, the tests lack power at upper levels.
• Only a few tests could be performed for the precision of daily mean data or individual soundings. The
results generally indicate a good precision in the lower and middle troposphere. Again, there are some
exceptions for temperature that point to remaining outliers in the data in some cases.
• The data set is released as Version 1.0. When more upper-air data become available for that time period,
it will be possible to improve the quality of the existing records by repeating the plausibility screening
and the validation experiments and reassessing the corrections. In particular, compiling more background
information about the existing records would be of great importance.
8. ANOMALY MAPS FOR THE EARLY 1940s
The data were compiled in order to study a supposed anomaly of atmospheric circulation during the early
1940s. A thorough analysis of this period is outside the scope of this paper, especially since the data set
presented here is in many respects an intermediate product. Nevertheless, some preliminary results are
presented in this section in order to test the consistency of the re-evaluated data.
Figure 9 presents anomaly maps with respect to the 1961–90 mean seasonal cycle for three months: August
1940, February 1941, and January 1942. These months were chosen because they represent typical anomaly
patterns, include summer and winter, and have sufficient upper-air data. The top row shows the anomalies
in surface air temperature; the second row shows the 700 hPa temperature anomalies from the upper-air data
(circles), supplemented with anomalies from several mountain sites between around 2350 and 3600 m a.s.l.
(triangles, see Table III). The third row shows temperature anomalies at 400 and 200 hPa (inset) and the
fourth row anomalies of geopotential height at 400 hPa and SLP. Note that the intervals used for plotting the
data correspond to half the specified target precision for monthly mean data (1.6 °C, 50 gpm).
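To make the plotting convention concrete, the sketch below computes an anomaly relative to a monthly climatology and assigns it to a class whose width equals the plotting interval. The interval of 1.6 °C is taken from the text; centring the classes on zero, the function names, and the sample climatology are assumptions for illustration only.

```python
import numpy as np


def anomaly(value, climatology, month):
    """Departure of a monthly mean from the reference-period mean of
    the same calendar month (month runs from 1 to 12)."""
    return value - climatology[month - 1]


def class_edges(width, n_side):
    """Edges of 2 * n_side + 1 equal-width classes centred on zero."""
    return width / 2.0 + width * np.arange(-n_side - 1, n_side + 1)


def class_index(anom, width, n_side):
    """Signed class number: 0 is the central class around zero."""
    edges = class_edges(width, n_side)
    return int(np.digitize(anom, edges)) - (n_side + 1)


# Hypothetical 12-month climatology (degrees C) for one station and level.
clim = [-4.4, -4.0, -2.1, 0.5, 4.8, 8.0, 10.1, 9.9, 7.2, 3.3, -0.9, -3.5]

jan_anom = anomaly(-7.5, clim, month=1)          # about -3.1 degrees C
jan_class = class_index(jan_anom, width=1.6, n_side=5)
edges = class_edges(1.6, 5)                      # runs from -8.8 to 8.8
```

With a width of 1.6 °C and five classes on either side of the central one, the edges span the −9 to +9 °C range of surface anomalies reported for this period.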
Table III. Meteorological mountain sites used for supplementing the 700 hPa temperature data displayed in Figure 9
Station         Latitude (°N)  Longitude (°E)  Altitude (m a.s.l.)  Type(a)  Source
Jungfraujoch    46.55          7.98            3582                 MT       MeteoSwiss
Sonnblick       47.05          12.95           3107                 MT       NASA–GISS
Mussala         42.2           23.6            2927                 MT       NASA–GISS
Pic Du Midi     43.07          0.15            2862                 MT       Dessens, 1991
Dillon, CO      39.63          −106.03         2763                 MV       NASA–GISS
Hermit          37.8           −107.1          2743                 MV       NASA–GISS
Lomnicky Stitt  49.2           22.22           2635                 MT       NASA–GISS
Vf. Omu         45.5           25.4            2509                 MT       NASA–GISS
Säntis          47.23          9.35            2490                 MT       MeteoSwiss
Izaña           28.3           −16.5           2368                 MT       NASA–GISS
(a) MT: mountain top; MV: mountain valley.
[Figure 9: map panels for August 1940, February 1941, and January 1942 (columns) showing surface air temperature, 700 hPa temperature, 400 hPa temperature with 200 hPa temperature inset, and 400 hPa GPH with SLP (rows). Colour scales: temperature anomaly [°C] with class edges from −8.8 to 8.8 in steps of 1.6; geopotential height anomaly [gpm] with class edges from −175 to 175 in steps of 50.]
The anomalies of temperature at 700 hPa show consistent spatial patterns; those are similar to the surface
air temperature anomalies, although there are slight differences in the magnitudes. The agreement between the
upper-air data and the mountain sites is good; the only obvious outlier is a surface station. The main spatial
features are also similar at the 400 hPa level, but the magnitude of the anomalies is generally smaller. The
strong warm anomaly at Jakutsk could be an outlier. Relatively consistent spatial patterns, but partly different
from the ones at 400 hPa, are also found for the 200 hPa temperature. Here, the variability seems to be larger,
which is to be expected because fewer ascents reach that level. Geopotential height at 400 hPa also
shows consistent spatial patterns. Some of them are different from, but not inconsistent with, the patterns in
the SLP anomaly field. Clearly, there is insufficient spatial information in the data at 400 hPa (or any higher
level) for direct conclusions about the hemispheric circulation, but there could be enough spatial information
for a statistical reconstruction approach that makes use of all available data on all levels, including data from
the Earth’s surface.
The period from around winter 1939–40 to spring 1942 was characterized, at the Earth’s surface, by a
strong Aleutian low, frequent low temperatures over central Europe and frequent high temperatures over Arctic
Alaska. This pattern appears most pronounced in January 1942. The anomalies in monthly mean surface air
temperature spanned a range of −9 to +9 °C. The negative anomaly over Europe was also pronounced in the
lower troposphere, but weaker at 400 hPa. The sign of the temperature anomaly changes around the tropopause, and
distinct positive anomalies are found in the stratosphere (200 and 100 hPa levels).
The preliminary analysis suggests that the historical data capture the signal, i.e. the spatial pattern and
amplitude of month-to-month variability, relatively well. This is especially the case in the mid-latitudes and
subpolar regions and in the winter season. The signal is presumably smaller and more difficult to detect in
the subtropics and tropics.
9. CONCLUSIONS
A large amount of historical upper-air data for the extratropical Northern Hemisphere for the 1939–44 period
was compiled from several meteorological archives and digitized. Although the available information about
the observations is very limited and independent data for use as reference series do not exist, methods were
developed that allow for checking the plausibility, correcting, and assessing the data. The same concepts
could be useful for re-evaluating other historical upper-air data.
The results suggest that the specified targets for the accuracy and precision of the data are generally met,
although there are remaining uncertainties, especially for the short series and at upper levels. A preliminary
analysis of anomaly maps reveals distinct spatial patterns in the upper-level data that are consistent with each
other and with corresponding anomalies at the Earth’s surface. It is suggested that the data can be used to
study synoptic to interannual variability. However, they should not be used to study long-term trends.
The final data product is termed UA39_44, Version 1.0, and can be downloaded from the website
http://www.giub.unibe.ch/~broenn/UA39_44/, together with a detailed description of the data set, the re-evaluation procedure, and validation results (Brönnimann, 2003). The raw data set had several problems, the solutions to which must be considered preliminary in some cases. When more information
becomes available, the corrections and procedures will be re-assessed for future versions of the data set.
Figure 9. Maps of anomalies of temperature and geopotential height at different levels in August 1940, February 1941,
and January 1942 with respect to the 1961–90 mean seasonal cycle. First row: surface air temperature (HadCRUTv; Jones et al.,
2001). Second row: temperature at 700 hPa (S.DCRM, circles) and at several mountain sites (triangles, see Table III). Third row:
400 hPa temperature (S.DCRM) and 200 hPa temperature (inset, S.DCRM). Fourth row: 400 hPa geopotential height (S.DCRM) and
SLP (Trenberth and Paolino, 1980).
10. DATA SOURCES
In addition to Beelitz and Robitzsch (1949) listed in the References section, the following archives were
consulted:
American Meteorological Society. 1939–1944. Meteorological and Climatological Data — Aerological
Observations, Monthly Weather Review January 1939 to November 1942 and January to December 1944.
Deutsche Seewarte. 1939–1944. Täglicher Wetterbericht, Übersicht über die Höhenaufstiege, Januar 1939 —
Juni 1943. Bibliothek des Bundesamts für Seeschifffahrt und Hydrographie, Hamburg.
Meteorological Office. 1939–1944. Daily Weather Report of the UK, Upper Air Section, January 1939 to
December 1944. Archive of the UK Met Office, Bracknell, UK.
Statens Meteorologisk-Hydrografiska Anstalt. 1939. Årsbok 18–19 (1936–1937), VI. Aerologiska iakttageleser i Sverige 1936/1937. Stockholm.
Statens Meteorologisk-Hydrografiska Anstalt. 1942–1947. Årsbok 22–26 (1940–1944), VI. Aerologiska
iakttageleser i Sverige. Stockholm.
ACKNOWLEDGEMENTS
I greatly appreciated the support I received during my visits to several meteorological archives, namely from W. Adam
(Aerological Observatory Lindenberg), A. Lück (Bundesamt für Seeschiffahrt und Hydrographie, Hamburg),
I. McGregor (UK Met Office, Bracknell), and Mr Vogel and G. Stork (MeteoSwiss, Zurich). The Swedish
Meteorological Service kindly sent photocopies of the Torslanda data. Andrea Kaiser digitized large parts
of the UK and Swedish data, and Nicolas Curien digitized parts of the US data. The comments of two anonymous
referees are gratefully acknowledged. This work was funded by the Swiss National Science Foundation and
the Holderbank Foundation.
The data presented in this paper are based on information supplied by the UK Met Office and the Deutscher
Wetterdienst.
REFERENCES
Barry RG. 1992. Mountain Weather and Climate, 2nd edn. Routledge: New York.
Beelitz P, Robitzsch M. 1949. Arbeiten aus dem Aerologischen Archiv des Observatoriums Lindenberg Kr. Beeskow. Zusammengestellt
von Direktor Dr. Paul Beelitz und Professor Dr. Max Robitzsch. Zusammenstellung der Radiosonden-Aufstiege von Mitteleuropa
1939–1944 (Nach dem Archiv des Observatoriums Lindenberg). I. Teil, Oktober 1949. II. Teil, Dezember 1949.
Brönnimann S. 2003. Description of the 1939–1944 upper air data set (UA39_44) Version 1.0. University of Arizona: Tucson.
Dessens J. 1991. Secular trend of surface temperature at an elevated observatory in the Pyrenees. Journal of Climate 4: 859–868.
Diamond H, Hinman Jr WS, Dunmore FW. 1937. The development of a radio-meteorograph system for the Navy Department. Bulletin
of the American Meteorological Society 18: 73–99.
Diamond H, Hinman Jr WS, Lapham EG. 1938. Comparisons of soundings with radio-meteorographs, aerographs, and meteorographs.
Bulletin of the American Meteorological Society 19: 129–141.
Durre I, Peterson TC, Vose RS. 2002. Evaluation of the effect of the Luers–Eskridge radiation adjustments on radiosonde temperature
homogeneity. Journal of Climate 15: 1335–1347.
Flohn H. 1944. Zum Klima der freien Atmosphäre über Sibirien, I. Temperatur und Luftdruck in der Troposphäre über Jakutsk.
Meteorologische Zeitschrift 61: 50–57.
Flohn H. 1947. Zum Klima der freien Atmosphäre über Sibirien, II. Die regionale winterliche Inversion. Meteorologische Rundschau
1: 75–79.
Gaffen DJ. 1993. Historical changes in radiosonde instruments and practices. WMO Instruments and Observing Methods Report No.
50. WMO/TD-No. 541.
Hesse W. 1961. Handbuch der Aerologie. Akademische Verlagsgesellschaft Geest & Portig: Leipzig.
Hughes P, Gedzelman D. 1995. The new meteorology. Weatherwise 48(3): 26–36.
Jones PD, Osborn TJ, Briffa KR, Folland CK, Horton EB, Alexander LV, Parker DE, Rayner NA. 2001. Adjusting for sampling density
in grid box land and ocean surface temperature time series. Journal of Geophysical Research 106: 3371–3380.
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Chelliah M,
Ebisuzaki W, Higgins W, Janowiak J, Mo KC, Ropelewski C, Wang J, Leetmaa A, Reynolds R, Jenne R, Joseph D. 1996. The
NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society 77: 437–471.
Knowles Middleton I, Edwards HW, Johnson H. 1938. The lag coefficient of some meteorological thermometers. Bulletin of the
American Meteorological Society 19: 321–326.
Labitzke K, van Loon H. 1999. The Stratosphere: Phenomena, History, and Relevance. Springer: Berlin.
Lander AJ. 1946. The British radiosonde. Weather 1: 21–24.
Lange KO. 1937. The 1936 radio-meteorographs of Blue Hill Observatory. Bulletin of the American Meteorological Society 18: 107–126.
Luers JK, Eskridge RE. 1995. Temperature correction for the VIZ and Vaisala radiosondes. Journal of Applied Meteorology 34:
1241–1253.
Luers JK, Eskridge RE. 1998. Use of radiosonde temperature data in climate studies. Journal of Climate 11: 1002–1019.
Moninger WR, Mamrosh MD, Pauley PM. 2003. Automated meteorological reports from commercial aircraft. Bulletin of the American
Meteorological Society 84: 203–216.
Moore CA, Neiburger M. 1945. Accuracy of radiosonde data (comment and reply). Journal of Meteorology 2: 80–81.
OMI (Organisation Météorologique Internationale). 1951. Comparaison Mondiale des Radiosondes. Acte Final. Vol. I. Station Centrale
Suisse de Météorologie.
OMM (Organisation Météorologique Mondiale). 1952. Comparaison Mondiale des Radiosondes. World Comparison of Radiosondes.
Acte Final. Vol. III. Station Centrale Suisse de Météorologie.
Peterson TC, Vose RS. 1997. An overview of the Global Historical Climatology Network temperature data base. Bulletin of the American
Meteorological Society 78: 2837–2849.
Randel WJ, Wu F, Gaffen DJ. 2000. Interannual variability of the tropical tropopause from radiosonde data and NCEP reanalysis.
Journal of Geophysical Research 105: 15 509–15 523.
Raunio N. 1950. Amendments to the computation of the radiation error of the Finnish (Väisälä) radiosonde. Geophysica 4: 14–20.
Santer BD, Hnilo JJ, Wigley TML, Boyle JS, Doutriaux C, Fiorino M, Parker DE, Taylor KE. 1999. Uncertainties in observationally
based estimates of temperature change in the free atmosphere. Journal of Geophysical Research 104: 6305–6333.
Scherhag R. 1948. Neue Methoden der Wetteranalyse und Wetterprognose. Springer: Berlin.
Schmutz C, Gyalistras D, Luterbacher J, Wanner H. 2001. Reconstruction of monthly 700, 500 and 300 hPa geopotential height fields
in the European and eastern North Atlantic region for the period 1901–1947. Climate Research 18: 181–193.
Scrase FJ. 1954. Radiation and lag errors of the Meteorological Office radiosonde and the diurnal variation of upper-air temperature.
Quarterly Journal of the Royal Meteorological Society 80: 565–578.
Trenberth KE, Paolino DA. 1980. The Northern Hemisphere sea level pressure data set: trends, errors, and discontinuities. Monthly
Weather Review 108: 855–872.
Teweles S, Finger FG. 1960. Reduction of diurnal variation in the reported temperatures and heights of stratospheric constant-pressure
surfaces. Journal of Meteorology 17: 177–194.
Väisälä V. 1941. Der Strahlungsfehler der finnischen Radiosonde. Mitteilungen des Meteorologischen Instituts der Universität Helsinki
No. 47.
Väisälä V. 1949. Solar radiation intensity at the ascending radiosonde. Geophysica 3: 37–55.
Von Storch H, Zwiers FW. 1999. Statistical Analysis in Climate Research. Cambridge University Press: Cambridge.
Zaitseva NA. 1993. Historical developments in radiosonde systems in the former Soviet Union. Bulletin of the American Meteorological
Society 74: 1893–1900.