INTERNATIONAL JOURNAL OF CLIMATOLOGY
Int. J. Climatol. 23: 769–791 (2003)
Published online in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/joc.914

A HISTORICAL UPPER AIR-DATA SET FOR THE 1939–44 PERIOD

STEFAN BRÖNNIMANN a,b,*
a Lunar and Planetary Laboratory, University of Arizona, Tucson, AZ 85721, USA
b Institute of Geography, University of Bern, Switzerland

Received 16 December 2002
Revised 4 March 2003
Accepted 15 March 2003

ABSTRACT

Historical upper-air data from radiosonde ascents and weather flights were re-evaluated in order to study the circulation of the upper troposphere and lower stratosphere during the 1939–44 period. Temperature and geopotential height data from 132 records, comprising around 26 500 single atmospheric profiles and 1750 monthly mean profiles, were compiled, digitized, controlled, adjusted, and assessed. The data stem from a number of countries in the extratropical Northern Hemisphere, including Germany and occupied areas, Sweden, the UK, the former Soviet Union, and the USA. In this paper, the principal procedures used to correct the historical radiosonde data for the effects of lag and radiation errors are presented and ways of assessing the accuracy and precision of the data are discussed. The results show that the specified quality criteria are generally met by most data records. Some of the records had to be corrected for systematic errors; a few were rejected. Many series, however, were too short for a reasonable assessment. Also, individual ascents may have a larger error. Monthly anomaly maps of upper-level temperature and geopotential height based on the re-evaluated data show clear spatial patterns that are consistent with each other and with corresponding anomaly fields from the Earth’s surface. The re-evaluated data can be used to study synoptic to interannual variability, but they are not suitable for long-term trend analysis. Copyright 2003 Royal Meteorological Society.
KEY WORDS: radiosonde data; aircraft data; aerological data; historical data; Northern Hemisphere; geopotential height; air temperature 1. INTRODUCTION Exceptional total ozone values at several sites, as well as unusual climatic conditions at the Earth’s surface, point to a possible anomaly of the circulation of the stratosphere and troposphere of the extratropical Northern Hemisphere in the early 1940s (Labitzke and van Loon, 1999). This supposed anomaly seems worth studying; however, this is not possible with the currently available upper-air data or gridded data sets above the Earth’s surface, which start around 1948. Nevertheless, data from pilot balloons, radiosonde ascents, and weather flights since the 1930s can still be found today on paper in various meteorological archives. These data could enable a detailed study of the circulation of the early 1940s, if one is willing to take the trouble of compiling, digitizing, controlling, assessing plausibility, correcting and validating them. This effort was undertaken for a small fraction of the available radiosonde and weather flight data for the 1939 to 1944 period. In total, temperature and geopotential height from around 26 500 single atmospheric profiles and 1750 monthly mean profiles were digitized and processed. This paper presents the new upper-air data set, with a special focus on the procedures used for correcting the data, an assessment of the quality, and a brief presentation of results. For a full description of the data sources, instruments, data series and station information, procedures used for re-evaluating the data, and detailed results of the quality assessment, the reader is referred to Brönnimann (2003), hereafter termed the full report. The data can be downloaded from the Website given at the end of the article. * Correspondence to: Stefan Brönnimann, Lunar and Planetary Laboratory, University of Arizona, P.O. 
Box 210092, Tucson, AZ 85721, USA; e-mail: [email protected]

2. AIMS AND QUALITY REQUIREMENTS

The final data should suit the needs of different planned analyses. On the one hand, daily time series from certain sites will be analysed with respect to their variability and, for example, compared with total ozone time series. On the other hand, monthly mean values from a large number of sites will be analysed in their spatial context and used in statistical procedures to reconstruct, together with station data from the Earth’s surface, monthly mean fields of geopotential height and temperature at various levels up into the lowermost stratosphere. Both of these analyses will be based on anomalies from the long-term mean seasonal cycle, where the National Centers for Environmental Prediction (NCEP)–National Center for Atmospheric Research (NCAR) reanalysis data set is chosen as the long-term reference (Kalnay et al., 1996; see Section 3.6 for a discussion of the possible problems). The individual profiles could eventually be used to supplement observations from the Earth’s surface and from pilot balloons in a data assimilation and modelling procedure to construct a ‘reanalysis’ data set for this time period. It is important to note that the focus of all these applications is on time scales from synoptic to interannual variability. There is no intention to obtain a data set suitable for trend analyses.

The applications sketched above dictate a certain range of accuracy and precision of the data set. After a preliminary analysis of the available data for that time period (see Section 8) with respect to the expected signal, I defined the target precision as ±4 °C for single ascents and ±1.6 °C for monthly mean values, meaning that 90% of the data points in each case should be within the given limits.
For geopotential height, the target precision depends on the pressure level (around 700 to 100 hPa) and is estimated as ±70–160 gpm for single ascents and ±30–80 gpm for monthly means. The bias with respect to the reference, i.e. the NCEP–NCAR reanalysis data set, should be less than ±0.7 ° C (±15–30 gpm). A data set that meets these targets could also be useful for certain other climatological analyses. 3. DATA ARCHIVES The data presented in this paper stem from five archival sources. A brief overview is given in this section, and more detailed descriptions are included in the full report. The bibliographical details of these archival sources are listed in Section 10. 3.1. The Lindenberg compilation An estimated 20 000 radiosounding ascents from about 60 sites in Germany and occupied areas in 1939 to 1944 were later compiled at the Observatory of Lindenberg by Beelitz and Robitzsch (1949). I digitized data from around 9500 ascents from 13 sites in Europe and three sites in North Africa. Data for geopotential height (in dynamic metres; least significant digit: 1), temperature (° C, 0.1), and humidity (%, 1) on pressure levels (1000 to 100 hPa in steps of 100 hPa, and 50 hPa) are printed in tables. A cover sheet for each station gives additional information, followed by an assessment of the quality of the corresponding data and signed by the authors. Obviously, the soundings had been worked through and implausible ascents were excluded. The Lindenberg compilation consists of thin A3-sized sheets: carbon copies, where the colour has become faint and blurred over time. As a consequence, the legibility is a problem (see full report for details). Parts of the Lindenberg compilation were photocopied in Lindenberg. 3.2. 
Täglicher Wetterbericht der Deutschen Seewarte

The upper-air section of the daily weather report from Germany, issued by the Deutsche Seewarte, contains hand-written tabulated data from weather flights and radiosonde ascents from a large number of stations for the period of interest. The data are given in different forms; I used the pressure level data (in dynamic metres, least significant digit 1 or 10, and °C, 0.1 or 1) for the levels 700 to 100 hPa (in 100 hPa steps). From January 1942 on, the data above 300 hPa were given on different levels and not digitized. Around 8000 profiles from the period January 1939 to February 1942 were digitized, mainly from the former Soviet Union (until 4 July 1941), but also from other locations in Europe and North Africa. Because of the large amount of information in this report (soundings, upper-level charts), the Wetterbericht became indispensable for assessing the plausibility or cross-checking information from the Lindenberg compilation (e.g. indecipherable numbers). The data sheets for the period January 1939 to June 1943 were photocopied at the archive of the Bundesamt für Seeschiffahrt und Hydrographie in Hamburg, Germany.

3.3. UK daily weather report, upper-air section

The upper-air section of the UK daily weather report contains radiosonde and weather flight data in small hand-written tables. The data are primarily from the UK, and irregularly from other countries. There were many changes in layout and reporting practices during the 1939 to 1944 period, which made the work difficult. For instance, radiosoundings were first given as significant points, then as significant points plus standard levels, later on only three pressure levels (700, 500, 250 hPa), and again later on pressure levels in 50 hPa steps.
Altitude (in feet, least significant digit 10 or 100 ft) and temperature (in ° F, 0.5 or 1) were digitized from around 4350 weather flights and 3850 radiosoundings from nine sites in the UK. However, because of the changes in reporting practices and restricted digitizing time, only data from one or two levels were digitized in most cases (see full report for details). The UK Daily Weather Report was obtained from the archive of the UK Met Office in Bracknell, UK. The data up to 1941 were digitized in the archive, whereas the data sheets for the 1942 to 1944 period were photocopied. 3.4. Yearbooks of the Swedish Meteorological and Hydrological Institute Radiosonde data from Sweden, mainly from one site, can be found in Part 6 of the Meteorological Yearbook of the Swedish Meteorological and Hydrological Institute in the form of printed tables. The data are given on geopotential levels, on pressure levels and as significant points. A full description is given in the yearbook 1936–1937. The data chosen for this study are pressure-level data (900 to 100 hPa in 100 hPa steps), which are given in dynamic metres (least significant digit 1) and ° C (0.1), from 733 ascents. The Swedish Meteorological Service kindly sent us photocopies of the data. 3.5. Monthly Weather Review (USA) The aerological network of the USA has its roots in the 1920s and was in relatively good shape from around 1938 onwards (Hughes and Gedzelman, 1995; Moninger et al., 2003). In this study, an attempt was made to work directly with the monthly mean values that can be found in the Monthly Weather Review for the time period of interest. The data are printed in tables with the number of observations, pressure (in hPa, least significant digit 1), temperature (° C, 0.1) and humidity for fixed altitude levels (every 0.5 km up to 3 km, then every 1 km). Data from around 1750 monthly mean profiles from 36 sites were digitized, consisting mainly of radiosonde ascents and some weather flights. 3.6. 
Data used for correction and validation

For several steps in the re-evaluation procedure, a reference climatology is needed. This also holds for many of the validation experiments and, as mentioned above, for the final analysis. I chose the NCEP–NCAR reanalysis data set (Kalnay et al., 1996) as a reference. Note that this data set has inaccuracies and inhomogeneities (e.g. see Santer et al. (1999) and Randel et al. (2000)). Nevertheless, it is the best currently available long-term global meteorological data set and the reported problems mainly concern the tropics rather than the northern extratropics. In view of the specified quality targets, I considered it safe to use the NCEP–NCAR reanalysis data set as a reference.

For the re-evaluation procedure, the long-term means provided by the NOAA–CDC via their Website were used, i.e. long-term mean values of geopotential height and temperature for each calendar month, level, and variable as well as four-times-daily long-term mean anomalies for the months January, April, July, and October. The reference period for the long-term means is 1968 to 1996. The anomalies were interpolated to fill in the other months and added to the monthly mean values. The diurnal cycle was then interpolated in time between the standard times of observation, yielding a ‘climatology’ for any given time of day, month, level, location, and variable. Note that the diurnal cycle in the NCEP–NCAR data is model generated (radiosonde ascents are normally performed twice per day) and may deviate from the true diurnal cycle. However, in view of the specified quality targets, this error is assumed to be tolerable.
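The time-of-day step of this climatology can be illustrated with a short sketch. It is not the procedure actually used for the data set: it assumes that the four-times-daily anomalies are valid at 00, 06, 12 and 18 UTC and that linear interpolation wrapping around midnight is adequate. The function name and the sample numbers are hypothetical.

```python
import numpy as np

# Assumed standard times (UTC hours) of the four-times-daily anomalies.
STANDARD_HOURS = np.array([0.0, 6.0, 12.0, 18.0])


def climatology_at_hour(monthly_mean, diurnal_anomalies, hour_utc):
    """Return the monthly mean plus the diurnal anomaly at hour_utc.

    diurnal_anomalies holds four anomalies (same unit as monthly_mean)
    at STANDARD_HOURS. Linear, periodic interpolation is an assumption
    of this sketch, not a documented property of the source procedure.
    """
    anomaly = np.interp(hour_utc % 24.0, STANDARD_HOURS,
                        diurnal_anomalies, period=24.0)
    return monthly_mean + anomaly


# Hypothetical 500 hPa temperature climatology (deg C) for one site and month,
# evaluated for a sounding started at 15 UTC.
example = climatology_at_hour(-20.0, [-0.4, -0.2, 0.4, 0.2], 15.0)
```

The same interpolation idea could be applied across months to fill in the anomalies between January, April, July and October; that step is omitted here to keep the example short.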
For the validation, the NCEP–NCAR reanalysis data (1948–97) were used, as well as monthly temperature data from 111 sites from the NASA–GISS station data set (Peterson and Vose, 1997; homogenized version, see full report for details) and from Pic du Midi (Dessens, 1991). Three-times-daily temperature and pressure data from two mountain sites in Switzerland, Säntis (2500 m a.s.l.) and Jungfraujoch (3580 m), were provided by MeteoSwiss. In addition, gridded sea-level pressure (SLP) data (Trenberth and Paolino, 1980) and monthly mean 300 hPa geopotential height fields over Europe, which were reconstructed by Schmutz et al. (2001) from 1901 to 1947 based on station measurements at the surface, were used for the validation. The preliminary analysis makes use of the gridded surface air temperature data set HadCRUTv (Jones et al., 2001).

4. DATA RECORDS AND DIGITIZING PROCEDURE

In total, 132 records from 116 sites in the Northern Hemisphere were digitized (Figure 1). Each record consists of data from 1 to 2165 vertical profiles. The following is a brief summary of the digitizing procedure, which is described in detail in the full report.

Figure 1. Map showing the location and size of the records used in this study. For the US records, the size refers to the total number of ascents on which the monthly mean values are based. (Legend categories: Aircraft, Radiosonde; number of observations: <50, 50–200, 200–500, 500–1000, 1000–2000, >2000.)

Not all data were digitized. For instance, only the levels that are also standard pressure levels today were chosen, i.e. mostly from 700 to 100 hPa in 100 hPa steps. Nocturnal ascents were preferred. In the case of twice-daily soundings, the daytime data were normally not digitized, or if so, only from 300 hPa upward.
The data were digitized using speech recognition (Dragon Systems, Dragon NaturallySpeaking 5.00 Preferred Deutsch) followed by a three-step plausibility screening based on redundant information and on hydrostatic considerations. Each outlying data point was checked on the original sheets and the plausibility was assessed from comparison with data from neighbouring levels, earlier or later ascents, surrounding stations (often non-digitized data), meteorological stations or daily upper-level weather charts. Data points considered implausible were flagged (for all analyses shown in this study, flagged data points were excluded). This procedure is manual, and hence very time consuming, but it is considered essential for obtaining a high quality. The error rate of the digitizing procedure was 0.1–0.8% (range of 20 random samples of 1000 numbers each) and the errors mostly concerned the least significant digit. This error rate was considered acceptable. Then, the units were converted to standard units (° C, gpm, UTC) and, if necessary, the data interpolated to pressure levels. 5. INSTRUMENTS AND ERRORS This section gives a brief discussion of the instruments used and the errors to be expected in the original data. The focus is on the radiosonde data; aircraft measurements, at that time, were believed to be more reliable than radiosondes and were often used as a reference (e.g. Diamond et al., 1938), although they, too, had errors (Scherhag, 1948). There are various types of error that affect the accuracy and precision of radiosonde data. Some of them are related to the ambient conditions during the ascent, such as the lag and radiation errors, the temperature dependence of the pressure element or insufficient ventilation of the sonde in the balloon wake. 
Others are not related to the ascent, and include the ground correction, the calculation of geopotential height from pressure and temperature, as well as the instrument-specific errors of the temperature and pressure measurements. Most of the data sources do not indicate the type of sonde used. However, by studying the literature it was possible to find a very likely sonde type for each country (Table I). Unfortunately, relatively little is known about the systematic errors of each of these sondes. The following is a summary of what can be found in the literature. For the German sonde, Scherhag (1948) assumes an error of the temperature measurement of 0.5 ° C, and from this estimates errors in geopotential height of 10 gpm, 20 gpm, 35 gpm, and 50 gpm at 500 hPa, 200 hPa, 100 hPa, and 40 hPa respectively (he does not discuss the pressure error). Diamond et al. (1938) compared the US sonde with aircraft measurements and found an agreement within 2 ° C for 90% of the observations and estimated the error as ±1 ° C. For the pressure measurement an error of ±1 hPa was anticipated (Diamond et al., 1937, 1938). More information is available from the first international radiosonde intercomparison under the auspices of the WMO, which was performed in 1950 in Payerne, Switzerland (OMI, 1951; OMM, 1952). Among others, sondes from Germany, Finland, UK, and the USA participated in this campaign. Some of the sonde types were improved between the early 1940s and 1950. Nevertheless, the results provide information about the magnitude of the errors to be expected, and they explicitly address the different sources of error. The method of calculating geopotential height from the raw data (by numerical or graphical methods or look-up tables) revealed maximum differences from around 5 gpm at 700 hPa to 30 gpm at 100 hPa. 
The mean lag error was estimated as 0.4 °C (at 700 hPa) and 0.6 °C (at 400 hPa) for the Finnish, UK, and German sondes, but was 0.1 °C and 0.2 °C respectively for the US sonde. The radiation error averaged over all ascents was estimated to be around 1.5 °C at 200 hPa for the UK and Finnish sondes but was much less for the German and US sondes. The uncorrected data from nocturnal flights revealed systematic errors (mean deviation of one sonde type from all others) in the range of −0.3 to +0.7 °C at pressures larger than 300 hPa and −1.5 to +0.6 °C at higher levels for the four sonde types discussed here. The corresponding systematic errors in geopotential height were −15 to +10 gpm at 500 hPa and −30 to +15 gpm at 200 hPa. The standard deviations of the differences were around 1.5 °C for temperature and 10 to 90 gpm (at 700 to 100 hPa) for geopotential height. Interestingly, the reports point on several occasions to well-trained staff as one of the most important quality factors.

Table I. Types of radiosonde used in the early 1940s. λ0 is the lag coefficient (s), α is the pressure dependence of the radiation error, and S0(h) is the radiation error (see text for details)

| Sonde type | Year | Developed/produced | Country | Sensors^a | Reference | λ0 | α | S0(h) |
|---|---|---|---|---|---|---|---|---|
| H-38 | 1938 | Dr Graw GmbH | Germany | A/B | Hesse (1961); personal communication J. Thieme, Dr Graw GmbH | 20 | 0.75 | Raunio (1950) scaled with 0.78 |
| RS-11 | 1936 | Väisälä | Finland, Sweden^b | A/B | Väisälä (1941, 1949), Raunio (1950) | –^b | –^b | –^b |
| Kew Pattern MK-I | 1937 | National Physical Laboratory | UK | A/B | Lander (1946) | 15 | 0.78 | Tweles and Finger (1960) |
| Comb sonde (RZ-049?) | 1930s | Molchanow | USSR | A or T / B | Gaffen (1993), Zaitseva (1993) | 10 | 0.82 | Tweles and Finger (1960) |
| Diamond–Hinman, version unknown | 1937 | National Bureau of Standards; Friez & Sons | USA (Navy, Weather Bureau) | A/E | Diamond et al. (1937, 1938), Gaffen (1993) | – | – | – |
| Diamond–Hinman, version unknown | 1943 | National Bureau of Standards | USA (Navy, Weather Bureau) | A/R | Moore and Neiburger (1945), Gaffen (1993) | – | – | – |
| Blue Hill | 1935 | Harvard University | USA (Boston) | A or T / B | Lange (1937) | – | – | – |

a A: aneroid box pressure sensor; B: bimetal thermometer; E: electrolyte thermometer in glass tube; R: ceramic thermometer based on resistance principle; T: Bourdon tube.
b This sonde type was also assumed for the records from Reykjavik, Tromsø 1939, and Rabat, but in these cases the standard correction was used, i.e. λ0 = 15 s, α = 0.852, S0(h) from Raunio (1950).

The reports show that the expected errors are in the same range as the quality targets. Hence, it is necessary to make corrections. Instrument-specific corrections (other than statistical) are possible only for the lag and radiation errors. For this purpose, specific information is needed for each sonde, such as the dependence of the radiation error on the solar zenith angle and pressure and an estimation of the lag of the thermometer.

6. CORRECTION PROCEDURE

Today, radiosonde data are corrected for the radiation and lag errors via numerical models of the heat balance of the sonde during ascent (Luers and Eskridge, 1995, 1998; Durre et al., 2002). Apart from the ascent velocity and the solar elevation, these models include a wide range of environmental parameters and model the heat transfer between the different parts of the sonde. It is not possible to apply such elaborate corrections to historical data, where none of these parameters, except for the solar elevation, is known and the properties of the sonde are not known in detail either. Rather, I decided to start from the information that can be found in the old literature.

6.1.
Radiation error

In order to correct the data for the radiation error, I adopted the general framework of Väisälä (1941, 1949) and Raunio (1950), who suggest the following formula for the radiation error ΔTR (K) as a function of solar elevation h and pressure p:

$$\Delta T_R(h,p) = \frac{I(h,p)}{I_0}\, S_0(h)\, \frac{v_0\, T(p)}{v(p)\, T_0} \left(\frac{p_0}{p}\right)^{\alpha} \qquad (1)$$

where v is the ascent velocity, T (K) is temperature, and the subscript 0 refers to a reference (see below). The first term, I(h, p)/I0, represents the radiation reaching the sonde normalized to the solar constant I0 (1367 W m−2). The second term, S0(h) (K), gives the reference radiation error of a given sonde as a function of the solar zenith angle. The third term represents the ventilation of the sonde and the last term is the pressure dependence. In Väisälä (1941), the constants v0 = 5 m s−1, p0 = 100 hPa, and T0 = 216.7 K are used to normalize the formula to a typical ascent velocity and to the 100 hPa level. For these conditions, S0(h) and α were then empirically estimated for the Finnish sonde. The functions S0(h) and I(h, p)/I0 are given in figure 2 of Väisälä (1949) and figure 1 of Raunio (1950) respectively.

According to Väisälä (1941), this formula is valid for the 200 hPa level and higher altitudes. He suggested assuming a zero error at 500 hPa and scaling the error linearly with altitude for the levels between 500 and 200 hPa. I changed the zero level of the interpolation to 700 hPa (which is normally the lowest level) because it is known from historical (Flohn, 1944, 1947) and current (Luers and Eskridge, 1995, 1998) literature that the radiation error may also be effective in the middle troposphere. I assumed that this general formula can be used for all radiosonde data if the function S0(h) and the parameter α are known for each sonde type.
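Equation (1) and the scaling between 700 and 200 hPa can be sketched as follows. This is an illustration under stated assumptions rather than the implementation used for the data set: S0(h) and I(h, p)/I0 have to be read from the published curves and are passed in as plain numbers, the scaling below 200 hPa is read as a weight between 0 and 1 that multiplies the formula value, and ln(p) stands in for altitude. The function and parameter names are hypothetical.

```python
import math

V0 = 5.0     # reference ascent velocity (m/s), Vaisala (1941)
P0 = 100.0   # reference pressure (hPa)
T0 = 216.7   # reference temperature (K)


def radiation_error(s0_h, i_ratio, p_hpa, t_k, alpha, v=V0,
                    p_full=200.0, p_zero=700.0):
    """Radiation error (K) following Equation (1).

    s0_h    : reference radiation error S0(h) in K (from published curves)
    i_ratio : normalized radiation I(h, p)/I0 (from published curves)
    p_hpa   : pressure level (hPa); t_k: temperature at that level (K)
    Between p_zero and p_full the value is damped linearly in ln(p),
    used here as a stand-in for altitude (an assumption of this sketch).
    """
    if p_hpa >= p_zero:
        return 0.0
    full = i_ratio * s0_h * (V0 * t_k) / (v * T0) * (P0 / p_hpa) ** alpha
    if p_hpa <= p_full:
        return full
    weight = math.log(p_zero / p_hpa) / math.log(p_zero / p_full)
    return weight * full


# Hypothetical inputs: S0(h) = 2.0 K and I/I0 = 0.9 at the 100 hPa level,
# with alpha = 0.75 as listed for the German sonde in Table I.
example = radiation_error(s0_h=2.0, i_ratio=0.9, p_hpa=100.0,
                          t_k=216.7, alpha=0.75)
```

At the reference conditions (100 hPa, T = T0, v = v0) the ventilation and pressure terms equal one, so the result reduces to the product of I(h, p)/I0 and S0(h).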
A constant value of 5 m s−1 was used for v(p) for all radiosonde ascents owing to a lack of specific information, T(p) was taken from the sonde data or (if missing) from the climatology, and T0 was taken from Väisälä (1941). For a given pressure level, ΔTR (K) is then only a function of h, which was calculated from the start time.

Table I gives an overview of S0(h) and α adopted for each sonde type. The following briefly describes how they were derived. As mentioned above, both S0(h) and α (0.852) are known for the Finnish sonde type. However, an analysis of the raw data suggests that these data were already corrected for the radiation error, as was partly expected from the literature (OMI, 1951; see also Väisälä (1941, 1949)). Hence no correction was applied.

For the UK and Soviet sondes, S0(h) and α were determined from the correction curves given in Tweles and Finger (1960) for various sonde types and pressure levels. S0(h) was taken from the curve referring to the 100 hPa level, divided by I(h, p)/I0. The parameter α was estimated by comparing S0(h) with the corresponding function at 200 hPa. Values of around 0.78 and 0.82 were found for the UK and Soviet sondes respectively. Note, however, that the curves for the UK sonde in Tweles and Finger (1960) refer to type MKII-B, whereas the data probably stem from type MKI, and for the Soviet sonde they refer to type RZ-049, which may or may not have been used for the soundings re-evaluated in this study.

Figure 2. Radiation error in geopotential height as a function of solar elevation and pressure level. Dashed lines with crosses: error estimated by Scherhag (1948). Thick lines: fit to these curves using the standard correction and parameters as described in Table I. (Axes: solar elevation [°] against error in GPH [gpm]; curves for 41, 96 and 225 hPa.)

For estimating S0(h) and α for the German sonde, I started from curves of the radiation error in geopotential height (Figure 2), which were empirically estimated for three pressure levels (225, 96, 41 hPa) as a function of h by Scherhag (1948). The curves are based on a large amount of data, but are crude in the sense that they were determined from monthly mean values. Scherhag (1948) points out that they are probably wrong for low solar elevations. Since the curves are for geopotential height, they largely depend on the temperature errors of the levels below. As a first guess, I chose the same S0(h) as for the Finnish sonde, scaled with a factor f, and iteratively tested combinations of f and α until a best match with the original curves was found. The temperature error was converted into geopotential height using the thickness equation as described below. A very good agreement for the 225 hPa and 96 hPa curves was found using f = 0.78 and α = 0.75 (Figure 2). The agreement is acceptable also for the 41 hPa level up to 30° solar elevation, whereas large discrepancies are found for very high solar elevations.

There are several indications that the radiation error of the US sonde was small. According to Diamond et al. (1937), the connection between the temperature element and the body of the sonde was minimal. OMM (1952) mentions a small radiation error. For later sonde types, Tweles and Finger (1960) also find a very small error. Because the US data were used only up to the 300 hPa level and because most ascents were during the night, I assumed that no correction was necessary.

6.2. Lag error

The lag error ΔTL can be approximated by the product of the lag coefficient λ, the ascent velocity v, and the temperature lapse rate Γ = −dT/dz:

$$\Delta T_L(v,p) \approx \lambda(v,p)\, v(p)\, \Gamma(p) \qquad (2)$$

The lag coefficient is a property of the instrument and additionally depends on air pressure and ascent velocity (see Knowles Middleton et al.
(1938)):

$$\lambda(v,p) = \lambda_0\, \frac{v(p)}{v_0}\, f(p) \qquad (3)$$

where λ0 is the lag coefficient at sea level for the ascent velocity v0 and f(p) is a function describing the pressure dependence. I again used v(p) = v0 = 5 m s−1. The function f(p) was taken from the values tabulated by Scrase (1954) for the UK Met Office sonde for different altitude levels (interpolated to pressure levels using the US76 standard atmosphere). The lapse rate Γ was taken from the (uncorrected) temperature of the two neighbouring levels (or the given level and the neighbouring level at the top and bottom levels) during the ascent. In the case of missing data, Γ was taken from the climatology.

The lag coefficient λ0 is known only approximately. Adjusting the values from Scrase (1954) for the UK sonde to v0 gives λ0 = 8 s at sea level. This value is not in agreement with the lag errors mentioned in OMM (1952), which correspond to a lag coefficient λ0 of around 15 to 20 s for the Finnish, UK and German sondes. In fact, Väisälä (1941) mentions a lag coefficient of the Finnish sonde of 15 s. The US sonde has a substantially lower temperature lag because of the different sensor used. The numbers given by Diamond et al. (1937) suggest a lag coefficient of around 4.5 s, which is in good agreement with the lag error mentioned by OMM (1952). These considerations suggest values around 15 s for the bimetal sensor sondes (Finnish, German, UK, and Soviet sondes) and 5 s for the US sonde. However, comparisons with reference series (see Section 7, Figure 5), averaged over all data from one sonde type, revealed better results when using lag coefficients of 10 and 20 s for the Soviet and German sondes, respectively. These values are physically plausible and were adopted as standard corrections, although the small systematic differences found might also be due to causes other than an inaccurate lag coefficient (e.g. wrong ascent velocity, pressure offset).
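The lag error of Equations (2) and (3) can be sketched for the case used here, v(p) = v0 = 5 m s−1, in which the velocity factor of Equation (3) equals one. The sketch is illustrative only: the pressure function f(p) tabulated by Scrase (1954) is not reproduced in this paper, so it is passed in as a number with a placeholder default of 1.0, and the function names are hypothetical.

```python
V0 = 5.0  # ascent velocity (m/s); v(p) = v0 is assumed throughout


def lapse_rate(t_lower, t_upper, z_lower, z_upper):
    """Lapse rate Gamma = -dT/dz (K/m) from two neighbouring levels."""
    return -(t_upper - t_lower) / (z_upper - z_lower)


def lag_error(lambda0_s, gamma_k_per_m, f_p=1.0):
    """Lag error (K) from Equations (2) and (3) with v(p) = v0.

    lambda0_s     : lag coefficient at sea level (s), e.g. from Table I
    gamma_k_per_m : lapse rate (K/m), positive when temperature falls
                    with height
    f_p           : pressure dependence f(p); 1.0 is a placeholder, not
                    a value taken from Scrase (1954)
    """
    lag_coefficient = lambda0_s * f_p   # Equation (3) with v(p)/v0 = 1
    return lag_coefficient * V0 * gamma_k_per_m


# Hypothetical layer: 250.0 K at 5000 m and 243.5 K at 6000 m,
# with the 20 s lag coefficient listed for the German sonde.
gamma = lapse_rate(250.0, 243.5, 5000.0, 6000.0)
example = lag_error(20.0, gamma)
```

With a lapse rate of 6.5 K per km and f(p) set to one, a 20 s lag coefficient gives an error of about 0.65 K, which is of the same order as the mean lag errors quoted from the Payerne intercomparison.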
Comparisons with reference series also revealed that no lag correction was necessary for the data from the Finnish and US sondes (see Section 7). Table I summarizes the standard corrections applied for all radiosonde records. Aircraft data were not corrected.

6.3. Error in geopotential height

After correcting the temperature by a total amount ΔT, the geopotential height Z was corrected using the thickness equation:

$$Z_1 = Z_0 + \frac{R_d}{g}\, \frac{T_{v0} + T_{v1}}{2}\, \ln\frac{p_0}{p_1} \qquad (4)$$

where Rd is the gas constant for dry air, g is the acceleration due to gravity, Tv the virtual temperature, and p pressure. The subscripts denote two neighbouring pressure levels. Since humidity data were not digitized from the soundings, specific humidity from the reference climatology was used to calculate Tv. Because Equation (4) is linear in Tv, the same relation links the height correction ΔZ to the correction in virtual temperature. ΔZ0 was set to zero at 1000 hPa and ΔZ was then calculated upwards to the highest level. For a given start time, the temperature correction can be calculated for all levels, even if no data are available. Similarly, the geopotential height correction can be calculated even if, for example, only data for the 250 hPa level are available.

6.4. Adjusting for changes in sonde start time

The start times of the sondes changed frequently within each record and were different for different locations. For some of the analyses, it is desirable to correct for these changes in start time in order to make the soundings comparable to each other. In this case, the daily mean value was used as a reference and the soundings were adjusted by subtracting the corresponding difference determined from the reference climatology which, however, introduces some additional uncertainty (see Section 3.6).

6.5. Final data products

The procedure to obtain a final data product from the raw data involves many steps, as discussed in this section.
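One of these steps, the height correction of Section 6.3, can be sketched as follows. The sketch assumes that the correction in virtual temperature equals the temperature correction (the humidity contribution is ignored) and uses common values for Rd and g; the function name and the sample corrections are hypothetical.

```python
import math

R_D = 287.05   # gas constant for dry air (J kg-1 K-1), assumed value
G = 9.80665    # standard acceleration due to gravity (m s-2), assumed value


def height_corrections(levels_hpa, delta_tv):
    """Propagate temperature corrections (K) into height corrections (gpm).

    levels_hpa : pressure levels ordered from high to low pressure,
                 starting at the level where the correction is zero
                 (1000 hPa in the text)
    delta_tv   : correction in virtual temperature at each level (K);
                 taken equal to the temperature correction in this sketch
    """
    corrections = [0.0]
    for i in range(1, len(levels_hpa)):
        mean_dtv = 0.5 * (delta_tv[i - 1] + delta_tv[i])
        layer = R_D / G * mean_dtv * math.log(levels_hpa[i - 1] / levels_hpa[i])
        corrections.append(corrections[-1] + layer)
    return corrections


# Hypothetical uniform temperature correction of 0.5 K at all levels.
levels = [1000.0, 700.0, 500.0, 300.0, 200.0, 100.0]
example = height_corrections(levels, [0.5] * len(levels))
```

For a uniform correction of 0.5 K the accumulated height correction at 500 hPa is about 10 gpm, the same order as the estimate attributed to Scherhag (1948) in Section 5.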
Some of these steps are well supported; others have a more preliminary character because sufficient information is currently not available. The final product is called Version 1.0. When further data are re-evaluated and more information becomes available, some of the procedures might be reassessed and lead to a new version of the data set. Therefore, it is important that all intermediate products are archived and described.

The digitized and controlled, but uncorrected, data are termed DC. A data set is then produced in which only the changes in start time are adjusted for, but not the radiation/lag error (DCA). The term DCR is used for a data set where only the radiation/lag error is accounted for but not the changes in start time. The data set where both corrections are applied is termed DCRA. These are the main products, from which the following are derived. DCRD is a data set with daily averages after removing flagged values, and DCRM are monthly mean values formed from DCRD if, for a given level, variable, and month, one of the two following applies: the number of observations is at least 13, and no gaps are longer than 7 days.

For each of these data sets, three versions are available: R, C, and S (e.g. R.DCA, C.DCRD, S.DCRM). ‘R’ refers to the standard correction described in this section and summarized in Table I. ‘C’ refers to the data sets after additional corrections during the assessment procedure (see Section 7). Finally, ‘S’ refers to the ‘C’ data set after records from neighbouring sites were combined. In this case, one of the two records was adjusted for the climatological difference between the two sites determined from the NCEP–NCAR reference for each month, level, and variable (see full report for details).

7. QUALITY ASSESSMENT

The quality of historical upper-air data needs special attention (see Section 5); a thorough assessment is very important. However, it is difficult to find independent data series that could be used as a reference.
In this section, ways of assessing the data quality are discussed, some results are shown and a summarized assessment is given. The main procedures were applied to 92 records (around 850 series); further tests were performed for 22 records. More detailed results can be found in the full report.

7.1. Method

The assessment is based on statistical tests, most of which involve a comparison between the candidate series and a reference series. In most cases, the latter was statistically reconstructed by using information from the Earth’s surface. Meteorological series from nearby high-elevation sites can also be used as a reference; however, mountain sites may show a different behaviour than the free atmosphere, depending on the time of day and season (Barry, 1992). Under certain assumptions, which will be discussed below, a set of tests for both accuracy and precision can be derived from the difference between the candidate series and the reference series. If a significant bias is found in a record and the decision is taken to correct the data, then the corresponding reference series is used to guide the correction and the assessment is repeated in order to check the consistency of the corrections. In the following, the procedure is discussed in detail.

Monthly reference series were statistically reconstructed for each of the around 850 candidate series separately based on predictor series that are available in the historical period, as well as in a long period (∼40 years) overlapping the NCEP–NCAR data set (calibration period). In the calibration period, the NCEP–NCAR data, interpolated to the location of the sounding site, were assumed to represent the true value at that location. First, the mean seasonal cycle of all variables was determined in the calibration period and was subtracted from all data, including the historical radiosoundings.
A multiple regression model was then fitted to the anomalies in the calibration period and was used to reconstruct a reference series in the historical period. Different sets of predictor variables were used for each record (see full report for details). In most cases I used the time series of the first three principal components of a pressure field (300 hPa geopotential height over Europe, which itself is a reconstruction based only on data from the Earth’s surface, or continental-sized SLP fields) and two to six temperature time series from surrounding sites, if possible mountain sites. The skill of the reconstructions varies strongly with location and altitude. In general, I considered a reconstructed series ‘useful’ if the explained variance $R^2$ in the calibration period was at least 60%. This was normally the case up to around 400 or 300 hPa.

The general statistics for assessing the accuracy and precision of the data using statistically reconstructed reference series are as follows. The bias, i.e. the mean difference between the observed value (subscript O) and the true value (T) in the historical period, is estimated from the difference between the observed and the reconstructed (R) values:

$$\bar{x}_O - \bar{x}_T = (\bar{x}_O - \bar{x}_R) + (\bar{x}_R - \bar{x}_T) \approx \bar{x}_O - \bar{x}_R \qquad (5)$$

which assumes that the reconstructions are unbiased. The standard error for the bias, $SE_{O-T}$, can be calculated from the sum of the variances of $x_O - x_R$ (error of observations with respect to reconstructions) and $x_R - x_T$ (error of reconstructions), assuming independence:

$$SE_{O-T} = \sqrt{\frac{s^2_{O-R} + s^2_{R-T}}{n}} \qquad (6)$$

where $n$ is the number of differences in the historical period. The error of the reconstruction in the historical period, $s^2_{R-T}$, is not known and is approximated by the corresponding error in the calibration period, i.e. the variance of the residuals.
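The estimates in Equations (5) and (6) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and all numbers below are hypothetical, and the residual variance of the regression is simply passed in as a known quantity rather than derived from a fitted model.

```python
import numpy as np


def bias_and_standard_error(observed, reconstructed, residual_variance):
    """Estimate the bias (Eq. 5) and its standard error (Eq. 6).

    observed, reconstructed : monthly anomalies in the historical period.
    residual_variance       : variance of the regression residuals in the
                              calibration period, used in place of the
                              unknown reconstruction error variance.
    """
    diff = np.asarray(observed, dtype=float) - np.asarray(reconstructed, dtype=float)
    n = diff.size
    bias = diff.mean()                    # mean of observed minus reconstructed
    var_o_r = diff.var(ddof=1)            # sample variance of the differences
    se = np.sqrt((var_o_r + residual_variance) / n)
    return bias, se


# Hypothetical monthly temperature anomalies (degrees C) for six months.
obs = [0.8, 1.1, 0.4, 0.9, 1.3, 0.7]
rec = [0.1, 0.3, -0.2, 0.2, 0.5, 0.0]
bias, se = bias_and_standard_error(obs, rec, residual_variance=0.25)
```

In this made-up case the reconstruction error dominates the standard error, which mirrors the situation described in the text for short series and weakly constrained reference series.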
A bias is normally termed significant if

$$|\bar{x}_O - \bar{x}_T| > 2\,SE_{O-T} \qquad (7)$$

The precision can be assessed similarly by assuming that the variance of the differences between observed and reconstructed series is composed of the observation error and the reconstruction error, again assumed independent:

$$s^2_{O-R} = s^2_{O-T} + s^2_{R-T} \qquad (8)$$

For the observation error ($s^2_{O-T}$), I have defined a target precision referring to a 90% interval (Section 2). Dividing the target by 1.644 gives an approximate target standard deviation $s_{Targ}$. Accordingly, one can assess a series as precise if

$$s^2_{O-R} \le s^2_{Targ} + s^2_{R-T} \qquad (9)$$

where $s^2_{R-T}$ is again approximated by the variance of the residuals. The variance $s^2_{O-R}$ is estimated from a small sample, which introduces an additional uncertainty. Hence I used the lower 95% confidence limit instead, approximated by (von Storch and Zwiers, 1999)

$$\frac{(n-1)\,s^2_{O-R}}{\psi_L} \qquad (10)$$

where $\psi_L$ indicates the lower critical value of the $\chi^2$ distribution at a probability of 0.05 ($n-1$ degrees of freedom). In principle, the same procedure can also be applied when comparing series from two very close sites A and B:

$$s^2_{A-B} \le 2\,s^2_{Targ} \qquad (11)$$

This relation can also be used to assess the quality of daily data. Note, however, that the difference in space and time between observations A and B introduces a significant additional error term, which cannot be estimated. Using this relation, therefore, leads to too frequent rejections. To some extent this also applies to Equations (6) and (9), where $s^2_{R-T}$ was approximated by the variance of the residuals. Because of the problem of overfitting, the true $s^2_{R-T}$ might be underestimated.

There are also more fundamental problems when using statistically reconstructed reference series. First, the predictor series might be inhomogeneous and cause a bias in the reconstructions.
Second, the underlying assumption that the relation between the predictors and the predictand is the same in the historical and the calibration period is not necessarily true, especially because the early 1940s probably represent an anomalous period and because climatic trends were registered over the last 50 years. Note that a trend in the calibration period does not necessarily bias the reconstructions if its regional spatial structure is similar to that of month-to-month variability. Third, one has to keep in mind that the final data set will not be fully independent of the different data sets used for the reconstruction of the reference series. Fourth, the ‘true values’ (i.e. the NCEP–NCAR data) used for the calibration are complete series with no gaps, whereas the monthly mean values in the historical data set are sometimes based on only a few data points. Hence, a too large variability of the monthly mean values does not necessarily imply that the daily data are not precise. Fifth, the reconstructed series have less skill when moving to higher levels and, if the skill drops to very low values, do not have more information than the long-term mean value.

Therefore, any test is but one of several arguments on which a decision is based. It is reasonable to consult other statistics, such as the amount of the bias, $R^2$, $n$, the shape of the difference profile, and comparisons with neighbouring stations. Generally, I addressed a bias as significant only for $n \ge 5$ and if $R^2$ was at least 60%. Because of these shortcomings, the validation procedure is not a universally applicable test, and in some cases it lacks power. Nevertheless, it turned out to be a useful guideline for decisions and for the assessment of the quality of the data series.

It should be noted that most of the series in the ‘R’ data set are independent of the reference series.
The radiosonde data from Germany and the Soviet Union, however, are not independent, because the lag coefficient was adjusted based on a comparison with the reference data (although for any individual series the effect on the statistics is small). If a bias in a series is detected, then it can be corrected based on information from the reference series. Hence, the two series are no longer independent with respect to the bias. Still, it is necessary to assess the corrected series in the same way in order to check the consistency of the corrections. As far as the tests for the precision are concerned, the candidate series can be considered independent of the reference series because the corrections do not (or only to a negligible extent) affect the variance of the differences between the candidate and the reference series.

7.2. Corrections

If a significant bias is found, then there are three possible procedures. First, the record can be rejected. Normally this concerns the entire period, except if there are clear inhomogeneities, in which case the record can be split into parts. Second, the record can be left uncorrected, despite the significant difference. This can be the case if the bias is small compared with the target accuracy and concerns only one series, whereas corrections would cause another series to be significantly biased. Third, if the precision of the series is good, then it can be corrected. Whenever possible, I did not use a purely statistical correction but one that assumes an underlying physical mechanism. Assuming that lag and radiation corrections are adequate, possible mechanisms include offsets in the temperature or pressure element.
Both lead to an offset in temperature that is constant with altitude in the former case and strongly increases with altitude up to the upper troposphere and then sharply decreases at the tropopause level in the latter case. A constant offset in temperature can simply be added to the radiation and lag error terms. The effect of an offset in pressure $\Delta P$ on the temperature can be calculated as

$$\Delta T_p(p) = \Gamma(p)\,\Delta Z(p) \qquad (12)$$

with

$$\Delta Z(p) = -\frac{R_d}{g}\,T_v(p)\,\ln\frac{p + \Delta P}{p} \qquad (13)$$

which is then added to the radiation and lag error terms in order to calculate the effect on geopotential height. The offset amount was determined such as to minimize the temperature difference between the candidate series and the reference series averaged over all levels for which the latter were considered reliable.

Correcting for an offset in a physically consistent manner is expected to reduce the offset simultaneously at all levels for both temperature and geopotential height by changing just the value for the offset. This was mostly the case, and the correction was then accepted. If this was not the case, then the record was rejected or left uncorrected (see above).

7.3. Radiation correction and adjustment of start time

In order to assess the radiation correction, pairs of soundings performed on the same day were studied. It is expected that the uncorrected data (R.DC) show higher temperatures and geopotential heights during the day than at night, because of the true diurnal cycle and the effect of radiation. In data sets corrected for one or both of these effects (R.DCA and R.DCRA), the difference should be smaller. Note, however, that it is not possible to distinguish between an inaccurate radiation correction and an inaccurate adjustment for the start time because the true diurnal cycle is not known. This analysis was performed for 502 pairs of ascents from Freiburg i. B. from the German network and 76 pairs from Lerwick (Shetland Islands) from the UK network.
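The pressure-offset relations of Section 7.2 (Equations (12) and (13)) can be illustrated with a small numerical sketch. It rests on several assumptions: the factor multiplying the height error in Equation (12) is read here as the local vertical temperature gradient, the function and argument names are hypothetical, and the input values (500 hPa, 255 K, an offset of −30 hPa, a gradient of −6.5 K per km) are chosen only for illustration.

```python
import math

R_D = 287.05   # gas constant for dry air, J kg^-1 K^-1 (standard value, assumed)
G = 9.80665    # acceleration due to gravity, m s^-2 (standard value, assumed)


def pressure_offset_effect(p_hpa, tv_kelvin, dp_hpa, dtdz_k_per_m):
    """Evaluate Equations (12) and (13) at a single pressure level.

    p_hpa        : pressure level (hPa).
    tv_kelvin    : virtual temperature at that level (K).
    dp_hpa       : assumed offset of the pressure element (hPa).
    dtdz_k_per_m : vertical temperature gradient (K per m); interpreting the
                   factor in Equation (12) this way is an assumption.
    Returns (height effect in gpm, temperature effect in K).
    """
    dz = -(R_D / G) * tv_kelvin * math.log((p_hpa + dp_hpa) / p_hpa)  # Eq. (13)
    dt = dtdz_k_per_m * dz                                            # Eq. (12)
    return dz, dt


# Hypothetical mid-tropospheric example.
dz_m, dt_k = pressure_offset_effect(500.0, 255.0, -30.0, -0.0065)
```

Because the logarithmic term grows as the pressure decreases for a fixed offset, the temperature effect increases with altitude as long as the temperature gradient is large, and it collapses where the gradient vanishes, which matches the profile shape described in the text for a pressure offset.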
Note that, apart from these 76 pairs, almost only the 250 hPa level was digitized for all UK soundings. Only pairs were chosen where the solar elevation of one or both ascents was larger than −5° at the 200 hPa level. The ascent with the lower solar elevation angle was then subtracted from the other ascent. Figure 3 shows the mean differences (95% confidence intervals) for both sites for geopotential height and temperature. Weather changes can occur within a few hours; therefore, the error bars are large. Nevertheless, in both cases a strong, statistically significant error is found in the uncorrected data. The adjustment for the start time reduces the difference, but it is still significant. After the radiation and lag correction, the differences are no longer statistically significant in the case of the German data. In the case of the UK data, there remains a significant difference between daytime and nocturnal ascents after the standard correction. This could be due to the fact that the function $S_0(h)$ refers to an updated (MK-IIB) version of the UK Met Office sonde. Additional evidence for a too weak radiation correction comes from the comparison of 250 hPa temperature with the reconstructed reference series, although the latter results were not statistically significant.

I decided to accept the standard correction for the ‘R’ data set, but to scale the function $S_0(h)$ for the alternative ‘C’ data set. A factor of 1.5 was chosen, which eliminates the difference at the tropopause and leaves no significant differences elsewhere. The correction is crude and preliminary and will be re-assessed in future versions of the data set if further soundings from the UK network are digitized. Note that the scaling affects the accuracy rather than the precision. This is because there are almost exclusively early morning ascents for the entire UK network (apart from the 76 pairs). The effect on temperature at 250 hPa (for all UK sites) is less than 0.55 °C in 97% of the cases (the maximum is 0.81 °C), whereas the mean value changes by 0.25 °C.

Figure 3. Mean differences of temperature and geopotential height between high and low solar elevation for pairs of ascents made on the same day (see text) at different levels for the records from Freiburg i. B. (a, b) and Lerwick, UK (c, d). Open diamonds: R.DC data; dashed lines without symbols: R.DCA data; filled diamonds: R.DCRA data; open squares: C.DCRA data for Lerwick. The error bars for the R.DCRA data refer to the 95% confidence interval. [Panels: pressure level versus temperature difference (°C) and geopotential height difference (gpm).]

7.4. Accuracy

Figure 4 shows the results of the assessment of the accuracy for Freiburg i. B., Germany. It shows the mean difference between the candidate series and the reconstructed reference series, for the data sets DCA and DCRA respectively, along with the standard errors of the reconstructed mean anomalies (grey bars around the zero line) and the standard errors of the mean differences (confidence intervals with whiskers). Freiburg represents at the same time the best and by far the worst outcome of all validation results. A visual comparison of the observed and the reconstructed time series revealed a possible offset after the first 4 months. First, only the later part of the series, from May 1940 to April 1942 (Figures 4(a) and 4(b)), was investigated. The reconstructed reference series for Freiburg i. B. are among the best and are reliable up to 200 hPa for geopotential height ($R^2$ = 95%) and up to 300 hPa for temperature.
There is a highly significant positive offset of temperature and geopotential height in the DCA data that confirms the need for radiation and lag corrections. The remaining differences after the standard correction (R.DCRA) are small and insignificant (the difference in 100 hPa temperature cannot be assessed). This figure represents the desired outcome of the assessment, and the most frequent: around 70% of all series tested fall into this category, although the differences and errors were normally larger than in Figure 4(a) and (b).

Figure 4. Mean differences (error bars give ±1 standard error) between observations and reconstructions of temperature and geopotential height at different levels for radiosonde data from Freiburg i. B., May 1940 to April 1942 (a, b) and February to April 1940 (c, d). Open diamonds: R.DCA data; filled diamonds: R.DCRA data; open squares: C.DCRA data. The shaded error bars give ±1 standard error of the reconstructed mean anomaly. [Panels: pressure level versus temperature difference (°C) and geopotential height difference (gpm).]

The same does not apply to the first part of the Freiburg series. Figure 4(c) and (d) shows the differences for February to April 1940. Note that the scale is extended by a factor of two compared with Figure 4(a) and (b). Although $n < 5$, I addressed these extremely large differences as significant. A comparison with three-times-daily data from the nearby mountain sites Jungfraujoch and Säntis suggests that the break possibly occurred around 15 April 1940. The 36 pairs of ascents in the first part of the record give no evidence for an inaccurate radiation correction.
Rather, the vertical profile of the temperature difference is typical for a pressure offset. Because the precision of the first part of the record is very good in the lower and middle troposphere (see Section 7.6), I decided to correct for a possible pressure offset. I estimated $\Delta P$ in the soundings prior to 15 April so as to cancel out the error in temperature at the lowest four levels. The corresponding profile (open squares) fits very well with the reconstructions for both temperature and geopotential height. Note, however, that this correction ($\Delta P$ = −30 hPa) is extraordinary: the second largest correction applied to any series was 7 hPa. Because of the precision of the data and the reliability of the reference series, the correction was accepted and a ‘C’ data set was formed by combining the corrected first part with the second part.

For the other records from western and central Europe (data from the Lindenberg compilation, Wetterbericht, UK Weather Report, and Torslanda) the reconstructions were reliable up to around 300 hPa and most of the series were sufficiently long, so that the 95% confidence intervals for the bias were mostly small (±0.3 to ±0.6 °C for temperature). Around 40% of the records were corrected because a significant bias was found.

The validation was more difficult for radiosondes from the former Soviet Union. On the one hand, reliable reference series could often be obtained only up to 500 hPa. On the other hand, most of the records are short. In addition, there is a sampling error in the monthly mean values because the data were not reported regularly in the data source (Wetterbericht). The 95% confidence interval for the bias was often in the range of ±1 to ±1.5 °C, i.e. clearly larger than the target accuracy, which drastically reduces the power of the test.
Possible problems were identified for eight records. In four cases, corrections were applied, and in one case, the beginning and ending of the record were rejected. In the other three cases, a correction was not possible or not appropriate. The data from the former Soviet Union and the corrections need to be re-examined if more data become available. However, at least the procedure can be used to assess the standard correction adopted for the radiosonde data when pooling the differences from seven records that were considered reliable. Figure 5 shows the mean differences and 95% confidence intervals (two standard errors) for the R.DCA and R.DCRA data. Although very large offsets are found in the R.DCA data, the standard correction leaves only small, insignificant differences. The standard correction is also slightly better than an alternative correction using $\lambda_0$ = 15 s (squares, see Section 6.2), which would result in significant differences.

Figure 5. Mean differences between observations and reconstructions of temperature (a) and geopotential height (b) at different levels for radiosonde data from seven sites from the former Soviet Union pooled ($n$ is between 9 and 80). Open diamonds: R.DCA data; filled diamonds: R.DCRA data; open squares: alternative DCRA data when using $\lambda_0$ = 15 s. Error bars for R.DCA and R.DCRA data give the 95% confidence interval. [Panels: pressure level versus temperature difference (°C) and geopotential height difference (gpm).]

The procedure works much better for the US data. The reference series were mostly reliable up to 400 hPa (temperature) and 300 hPa (geopotential height), with the exception of the tropical sites. The number of monthly mean values is high for most of the series, so that the 95% confidence interval for the bias was normally in the range of ±0.3 to ±0.5 °C. The results clearly show that there is no need for any radiation and lag correction. In general, the series were found to be accurate. Nine records were corrected for temperature offsets, mostly around 0.5 °C (no pressure offsets were found). As an example, the results for Charleston are shown in Figure 6. The temperature offset is almost constant with altitude and easy to correct for, whereas there remains a small but significant offset in geopotential height (no additional correction was performed). In five other records, small but slightly significant offsets were accepted without correction, one record was entirely rejected and one was partly rejected. In many records, the lowest level (850 hPa) was rejected. This is probably related to the structure of the planetary boundary layer, either due to the interpolation from altitude levels to pressure levels or because the topography and the boundary layer are not represented accurately in the reference data set (NCEP–NCAR).

Figure 6. Mean differences (error bars give ±1 standard error) between observations and reconstructions of temperature (a) and geopotential height (b) at different levels for radiosonde data from Charleston, USA. Filled diamonds: R.DCRA data; open squares: C.DCRA data. The shaded error bars give ±1 standard error of the reconstructed mean anomaly. [Panels: pressure level versus temperature difference (°C) and geopotential height difference (gpm).]

A summarized assessment of the accuracy is given in Table II and Figure 7. Averaged over all series from one data source, the bias is very small in the ‘C’ data set. Some small systematic errors, such as too high temperatures in the data from the USA and Sweden or too low a geopotential height in the Lindenberg data, are possible.
The comparison with the ‘R’ data set shows that the corrections, on average, led to lower temperatures and geopotential height.

Figure 7 shows the mean bias and 95% confidence interval for each series in the form of a map. The 400 hPa level was chosen because it is the highest level for which an assessment is possible for more than just a few series. The figure for geopotential height shows a possible tendency towards a negative bias in western and central Europe and a positive bias for the Alaskan stations. No spatial patterns of the bias can be found in the map for temperature. Most of the estimated biases in North America, as well as in western and central Europe, are small and relatively well determined (small confidence intervals). However, the differences between the candidate and the reference series are larger and less well determined in the case of the former Soviet Union because of short candidate series and unreliable reference series. The large circles do not indicate that there is a bias or that the quality is low. Rather, they indicate that the assessment significantly lacks power in these cases. In fact, as was shown in Figure 5, only a small bias is found when these short series are pooled together.

In summary, the bias is well within the pre-defined target for most of the sites. However, the series from the former Soviet Union cannot be assessed with the same explanatory power, especially at higher levels. The systematic errors are small.

Figure 7. Map of the mean bias and its 95% confidence interval ($2\,SE_{O-T}$) for temperature and geopotential height at 400 hPa in the data set ‘C’. Dashed circles indicate series that were corrected during the assessment procedure. Only series with $n \ge 5$ are displayed. [Panels: 400 hPa geopotential height; 400 hPa temperature. Legend: target, mean bias, 95% confidence interval; symbol classes 4, 7.5, 12, 18, 25, 35, 45 gpm and 0.11, 0.22, 0.35, 0.55, 0.75, 1, 1.3 °C.]

Table II. Mean difference between the candidate series and the reference series in the data sets ‘C’ and ‘R’ (without rejected series) sorted by archival source, variable, and pressure level

Columns: Level (hPa); data set ‘C’: Linden., Wetterb., UK-WR, Torslanda, MWR; data set ‘R’ without rejected series: Linden., Wetterb., UK-WR, Torslanda, MWR

Temperature (°C): 700 −0.05 600 0.09 500 0.04 400 0.00 300 0.14 250 0.01 −0.15 −0.21 0.11 −0.19 0.17 0.14 0.21 0.02 0.03 0.20 0.05 0.09 0.05 0.02 0.29 0.68 0.94 1.07 1.20 1.19 0.23 0.06 −0.05 0.20 0.10 0.88 0.14 0.21 0.02 0.03 0.20 0.15 0.21 0.17 0.14 0.40

Geopotential height (gpm): 700 −2.1 1.1 600 −3.0 −0.2 500 −2.6 −1.6 400 −4.6 0.7 300 −1.8 12.3 250 0.12 −0.15 −3.2 −1.7 −4.6 1.02 0.11 −0.4 0.1 −0.7 −2.0 1.1 −3.0 −1.1 1.6 −0.1 −0.9 4.9 7.6 13.1 18.3 30.9 3.8 3.0 1.2 2.4 20.7 2.2 44.9 −0.4 0.1 −0.7 −2.0 1.1 −2.1 0.8 4.1 3.4 3.2 −0.1

7.5. Precision of monthly mean values

The assessment of the precision was performed for the ‘C’ data. Only 14 series from seven sites (see full report for details) violated the assumption that the standard deviation is not larger than the target standard deviation, which corresponds to around 2% of all variables tested. However, the test is not very powerful at upper levels. Here, the method of comparing the differences between the series of two neighbouring stations is used for five station pairs in Europe and the USA. For each pair, the data from one station were adjusted for the climatological difference between the sites. These comparisons revealed no case with too high a variability except for the pair Kjeller/Torslanda (255 km distance), where five series (four temperatures) were outside the specified target. This could point to a possible precision problem. Because at Torslanda, ascents were performed only twice per week, the larger variance could be due to the less frequent sampling.
On the other hand, the variance was only slightly larger than the critical value, and because the test does not account for the true variance (not caused by measurement errors) of the differences, I did not consider this as sufficient evidence for rejecting any of the series.

A better data precision for monthly mean values (at the price of less data) would be obtained by defining more rigorous criteria for monthly averages. This could be more appropriate for certain applications. Note, also, that a low fraction of ascents reaching high levels can cause a ‘fair weather’ bias in the monthly mean values if the burst of the balloon depends on the meteorological conditions.

7.6. Precision of single ascents

For the case of Freiburg i. B., the mountain sites Säntis and Jungfraujoch can be used to assess the precision of daily temperature and geopotential height data at the 700 hPa and 600 hPa levels respectively. The comparison was restricted to pairs of measurements that were at most 3 h apart. The data that were not adjusted for the diurnal cycle (C.DCR) were used. Figure 8 shows the corresponding scatter plots.

Figure 8. Scatter plots of geopotential height (a, b) and temperature (c, d) measured with the radiosonde at 700 and 600 hPa at Freiburg (C.DCR data) versus pressure and temperature measured at the nearby mountain sites of Säntis and Jungfraujoch within at most 3 h from the sonde ascent. [Panels: Freiburg 700 hPa GPH (gpm) versus Säntis p (hPa); Freiburg 600 hPa GPH (gpm) versus Jungfraujoch p (hPa); Freiburg 700 hPa T (°C) versus Säntis T (°C); Freiburg 600 hPa T (°C) versus Jungfraujoch T (°C).]

One can make the simple assumption that a linear function of the data from Jungfraujoch and Säntis (i.e. a least-squares regression line) represents the true values for Freiburg i. B.
and that the entire variability found in the scatter plot is due to deviations of the radiosonde data. Note that mountain sites do not accurately represent the surrounding free troposphere. Also, there is an error of the meteorological measurements, and the distance of 200 km and time difference of 3 h also explain some variability. Hence, the variability of the radiosonde data is largely overestimated. Still, even under this extreme assumption, the targets for the precision specified for individual ascents or daily data are met in all four cases.

For the upper levels, series from neighbouring sites can be compared on a daily scale. Here, the amount of data does not allow for a maximum time difference to be set, and the comparison was performed for the daily mean values (C.DCRD) from eight station pairs. As for the assessment of monthly mean values, one of the records was adjusted for the climatological differences between the two locations and it was assumed that the entire variability is due to imprecise data. Despite this restrictive assumption, most of the series were found to be within the specified targets (see full report for details). There were some exceptions, which concerned temperature more often than geopotential height and which were mostly caused by only one or two very large differences, pointing to remaining outliers. I did not reject any of the series, but note that a better outlier screening would be desired, especially for the data from the former Soviet Union.

7.7. Summarized quality assessment

Based on all results, the following summarized quality assessment can be made:

• 132 upper-air records from 112 sites were digitized, of which 92 records (around 850 series) could be assessed using statistically reconstructed reference series on a monthly mean basis. Further tests, involving the comparison with neighbouring sites or nearby mountain sites, were performed for 22 records. 25 records (or parts thereof) were corrected; for eight records, significant differences were not corrected, and nine records or parts thereof were rejected. In several cases, the tests used for the validation lack power because of too short series or unreliable reference series.
• The accuracy is probably the quality criterion most difficult to meet. For most of the series it is within the specified targets up to the level where validations could be performed (500 to 200 hPa). However, for many records, this was only the case after corrections additional to the standard correction.
• The precision of monthly mean values is, in almost all cases, within the specified targets; exceptions concern mostly temperature. However, the tests lack power at upper levels.
• Only a few tests could be performed for the precision of daily mean data or individual soundings. The results generally indicate a good precision in the lower and middle troposphere. Again, there are some exceptions for temperature that point to remaining outliers in the data in some cases.
• The data set is released as Version 1.0. When more upper-air data become available for that time period, it will be possible to improve the quality of the existing records by repeating the plausibility screening and the validation experiments and reassessing the corrections. In particular, compiling more background information about the existing records would be of great importance.

8. ANOMALY MAPS FOR THE EARLY 1940s

The data were compiled in order to study a supposed anomaly of atmospheric circulation during the early 1940s. A thorough analysis of this period is outside the scope of this paper, especially since the data set presented here is in many respects an intermediate product.
Nevertheless, some preliminary results are presented in this section in order to test the consistency of the re-evaluated data. Figure 9 presents anomaly maps with respect to the 1961–90 mean seasonal cycle for three months: August 1940, February 1941, and January 1942. These months were chosen because they represent typical anomaly patterns, include summer and winter, and have sufficient upper-air data. The top row shows the anomalies in surface air temperature; the second row shows the 700 hPa temperature anomalies from the upper-air data (circles), supplemented with anomalies from several mountain sites between around 2350 and 3600 m a.s.l. (triangles, see Table III). The third row shows temperature anomalies at 400 and 200 hPa (inset) and the fourth row anomalies of geopotential height at 400 hPa and SLP. Note that the intervals used for plotting the data correspond to half the specified target precision for monthly mean data (1.6 °C, 50 gpm).

[Figure 9: maps for August 1940, February 1941, and January 1942, with rows for surface air temperature, 700 hPa temperature, 400 hPa temperature (200 hPa temperature as inset), and 400 hPa geopotential height and SLP. Colour scales: temperature anomaly (°C) from −8.8 to 8.8 in steps of 1.6; geopotential height anomaly (gpm) from −175 to 175 in steps of 50.]

Table III. Meteorological mountain sites used for supplementing the 700 hPa temperature data displayed in Figure 9

Station          Latitude  Longitude  Altitude (m a.s.l.)  Type(a)  Source
Jungfraujoch     46.55     7.98       3582                 MT       MeteoSwiss
Sonnblick        47.05     12.95      3107                 MT       NASA–GISS
Mussala          42.2      23.6       2927                 MT       NASA–GISS
Pic Du Midi      43.07     0.15       2862                 MT       Dessens, 1991
Dillon, CO       39.63     −106.03    2763                 MV       NASA–GISS
Hermit           37.8      −107.1     2743                 MV       NASA–GISS
Lomnicky Stitt   49.2      22.22      2635                 MT       NASA–GISS
Vf. Omu          45.5      25.4       2509                 MT       NASA–GISS
Säntis           47.23     9.35       2490                 MT       MeteoSwiss
Izaña            28.3      −16.5      2368                 MT       NASA–GISS

(a) MT: mountain top; MV: mountain valley.

The anomalies of temperature at 700 hPa show consistent spatial patterns; these are similar to the surface air temperature anomalies, although there are slight differences in the magnitudes. The agreement between the upper-air data and the mountain sites is good; the only obvious outlier is a surface station. The main spatial features are also similar at the 400 hPa level, but the magnitude of the anomalies is generally smaller. The strong warm anomaly at Jakutsk could be an outlier. Relatively consistent spatial patterns, but partly different from the ones at 400 hPa, are also found for the 200 hPa temperature. Here, the variability seems to be larger, which is also expected from the fact that fewer ascents reach that level. Geopotential height at 400 hPa also shows consistent spatial patterns. Some of them are different from, but not inconsistent with, the patterns in the SLP anomaly field. Clearly, there is insufficient spatial information in the data at 400 hPa (or any higher level) for direct conclusions about the hemispheric circulation, but there could be enough spatial information for a statistical reconstruction approach that makes use of all available data on all levels, including data from the Earth’s surface.

The period from around winter 1939–40 to spring 1942 was characterized, at the Earth’s surface, by a strong Aleutian low, frequent low temperatures over central Europe and frequent high temperatures over Arctic Alaska. This pattern appears most pronounced in January 1942. The anomalies in monthly mean surface air temperature spanned a range of −9 to +9 °C. The negative anomaly over Europe was also pronounced in the lower troposphere, but weaker at 400 hPa. The sign of the temperature anomaly changes around the tropopause, and distinct positive anomalies are found in the stratosphere (200 and 100 hPa levels).

The preliminary analysis suggests that the historical data capture the signal, i.e.
the spatial pattern and amplitude of month-to-month variability, relatively well. This is especially the case in the mid-latitudes and subpolar regions and in the winter season. The signal is presumably smaller and more difficult to detect in the subtropics and tropics.

Figure 9. Maps of anomalies of temperature and geopotential height at different levels in August 1940, February 1941, and January 1942 with respect to the 1961–90 mean seasonal cycle. First row: surface air temperature (HadCRUTv; Jones et al., 2001). Second row: temperature at 700 hPa (S.DCRM, circles) and at several mountain sites (triangles, see Table III). Third row: 400 hPa temperature (S.DCRM) and 200 hPa temperature (inset, S.DCRM). Fourth row: 400 hPa geopotential height (S.DCRM) and SLP (Trenberth and Paolino, 1980).

9. CONCLUSIONS

A large amount of historical upper-air data for the extratropical Northern Hemisphere for the 1939–44 period was compiled from several meteorological archives and digitized. Although the available information about the observations is very limited and independent data for use as reference series do not exist, methods were developed that allow for checking the plausibility, correcting, and assessing the data. The same concepts could be useful for re-evaluating other historical upper-air data.

The results suggest that the specified targets for the accuracy and precision of the data are generally met, although there are remaining uncertainties, especially for the short series and at upper levels. A preliminary analysis of anomaly maps reveals distinct spatial patterns in the upper-level data that are consistent with each other and with corresponding anomalies at the Earth’s surface. It is suggested that the data can be used to study synoptic to interannual variability. However, they should not be used to study long-term trends.

The final data product is termed UA39_44, Version 1.0, and can be downloaded from the Website http://www.giub.unibe.ch/~broenn/UA39_44/, together with a detailed description of the data set, the re-evaluation procedure, and validation results (Brönnimann, 2003). There were several problems in the raw data set, the solution to which must be considered preliminary in some cases. When more information becomes available, the corrections and procedures will be re-assessed for future versions of the data set.

10. DATA SOURCES

In addition to Beelitz and Robitzsch (1949) listed in the References section, the following archives were consulted:

American Meteorological Society. 1939–1944. Meteorological and Climatological Data – Aerological Observations, Monthly Weather Review January 1939 to November 1942 and January to December 1944.
Deutsche Seewarte. 1939–1944. Täglicher Wetterbericht, Übersicht über die Höhenaufstiege, Januar 1939 – Juni 1943. Bibliothek des Bundesamts für Seeschifffahrt und Hydrographie, Hamburg.
Meteorological Office. 1939–1944. Daily Weather Report of the UK, Upper Air Section, January 1939 to December 1944. Archive of the UK Met Office, Bracknell, UK.
Statens Meteorologisk-Hydrografiska Anstalt. 1939. Årsbok 18–19 (1936–1937), VI. Aerologiska iakttageleser i Sverige 1936/1937. Stockholm.
Statens Meteorologisk-Hydrografiska Anstalt. 1942–1947. Årsbok 22–26 (1940–1944), VI. Aerologiska iakttageleser i Sverige. Stockholm.

ACKNOWLEDGEMENTS

I greatly appreciated the support received during my visits to several meteorological archives, namely from W. Adam (Aerological Observatory Lindenberg), A. Lück (Bundesamt für Seeschifffahrt und Hydrographie, Hamburg), I. McGregor (UK Met Office, Bracknell), Mr Vogel and G. Stork (MeteoSwiss, Zurich). The Swedish Meteorological Service kindly sent photocopies of the Torslanda data. Andrea Kaiser digitized large parts of the UK and Swedish data; Nicolas Curien digitized parts of the US data.
The comments of two anonymous referees are gratefully acknowledged. This work was funded by the Swiss National Science Foundation and the Holderbank Foundation. The data presented in this paper are based on information supplied by the UK Met Office and the Deutscher Wetterdienst.

REFERENCES

Barry RG. 1992. Mountain Weather and Climate, 2nd edn. Routledge: New York.
Beelitz P, Robitzsch M. 1949. Arbeiten aus dem Aerologischen Archiv des Observatoriums Lindenberg Kr. Beeskow. Zusammengestellt von Direktor Dr. Paul Beelitz und Professor Dr. Max Robitzsch. Zusammenstellung der Radiosonden-Aufstiege von Mitteleuropa 1939–1944 (Nach dem Archiv des Observatoriums Lindenberg). I. Teil, Oktober 1949. II. Teil, Dezember 1949.
Brönnimann S. 2003. Description of the 1939–1944 upper air data set (UA39_44) Version 1.0. University of Arizona: Tucson.
Dessens J. 1991. Secular trend of surface temperature at an elevated observatory in the Pyrenees. Journal of Climate 4: 859–868.
Diamond H, Hinman Jr WS, Dunmore FW. 1937. The development of a radio-meteorograph system for the Navy Department. Bulletin of the American Meteorological Society 18: 73–99.
Diamond H, Hinman Jr WS, Lapham EG. 1938. Comparisons of soundings with radio-meteorographs, aerographs, and meteorographs. Bulletin of the American Meteorological Society 19: 129–141.
Durre I, Peterson TC, Vose RS. 2002. Evaluation of the effect of the Luers–Eskridge radiation adjustments on radiosonde temperature homogeneity. Journal of Climate 15: 1335–1347.
Flohn H. 1944. Zum Klima der freien Atmosphäre über Sibirien, I. Temperatur und Luftdruck in der Troposphäre über Jakutsk. Meteorologische Zeitschrift 61: 50–57.
Flohn H. 1947. Zum Klima der freien Atmosphäre über Sibirien, II. Die regionale winterliche Inversion. Meteorologische Rundschau 1: 75–79.
Gaffen DJ. 1993. Historical changes in radiosonde instruments and practices. WMO Instruments and Observing Methods Report No. 50. WMO/TD-No. 541.
Hesse W. 1961. Handbuch der Aerologie. Akademische Verlagsgesellschaft Geest & Portig: Leipzig.
Hughes P, Gedzelman D. 1995. The new meteorology. Weatherwise 48(3): 26–36.
Jones PD, Osborn TJ, Briffa KR, Folland CK, Horton EB, Alexander LV, Parker DE, Rayner NA. 2001. Adjusting for sampling density in grid box land and ocean surface temperature time series. Journal of Geophysical Research 106: 3371–3380.
Kalnay E, Kanamitsu M, Kistler R, Collins W, Deaven D, Gandin L, Iredell M, Saha S, White G, Woollen J, Zhu Y, Chelliah M, Ebisuzaki W, Higgins W, Janowiak J, Mo KC, Ropelewski C, Wang J, Leetmaa A, Reynolds R, Jenne R, Joseph D. 1996. The NCEP/NCAR 40-year reanalysis project. Bulletin of the American Meteorological Society 77: 437–471.
Knowles Middleton I, Edwards HW, Johnson H. 1938. The lag coefficient of some meteorological thermometers. Bulletin of the American Meteorological Society 19: 321–326.
Labitzke K, van Loon H. 1999. The Stratosphere: Phenomena, History, and Relevance. Springer: Berlin.
Lander AJ. 1946. The British radiosonde. Weather 1: 21–24.
Lange KO. 1937. The 1936 radio-meteorographs of Blue Hill Observatory. Bulletin of the American Meteorological Society 18: 107–126.
Luers JK, Eskridge RE. 1995. Temperature correction for the VIZ and Vaisala radiosondes. Journal of Applied Meteorology 34: 1241–1253.
Luers JK, Eskridge RE. 1998. Use of radiosonde temperature data in climate studies. Journal of Climate 11: 1002–1019.
Moninger WR, Mamrosh MD, Pauley PM. 2003. Automated meteorological reports from commercial aircraft. Bulletin of the American Meteorological Society 84: 203–216.
Moore CA, Neiburger M. 1945. Accuracy of radiosonde data (comment and reply). Journal of Meteorology 2: 80–81.
OMI (Organisation Météorologique Internationale). 1951. Comparaison Mondiale des Radiosondes. Acte Final. Vol. I. Station Centrale Suisse de Météorologie.
OMM (Organisation Météorologique Mondiale). 1952. Comparaison Mondiale des Radiosondes. World Comparison of Radiosondes. Acte Final. Vol. III. Station Centrale Suisse de Météorologie.
Peterson TC, Vose RS. 1997. An overview of the Global Historical Climatology Network temperature data base. Bulletin of the American Meteorological Society 78: 2837–2849.
Randel WJ, Wu F, Gaffen DJ. 2000. Interannual variability of the tropical tropopause from radiosonde data and NCEP reanalysis. Journal of Geophysical Research 105: 15 509–15 523.
Raunio N. 1950. Amendments to the computation of the radiation error of the Finnish (Väisälä) radiosonde. Geophysica 4: 14–20.
Santer BD, Hnilo JJ, Wigley TML, Boyle JS, Doutriaux C, Fiorino M, Parker DE, Taylor KE. 1999. Uncertainties in observationally based estimates of temperature change in the free atmosphere. Journal of Geophysical Research 104: 6305–6333.
Scherhag R. 1948. Neue Methoden der Wetteranalyse und Wetterprognose. Springer: Berlin.
Schmutz C, Gyalistras D, Luterbacher J, Wanner H. 2001. Reconstruction of monthly 700, 500 and 300 hPa geopotential height fields in the European and eastern North Atlantic region for the period 1901–1947. Climate Research 18: 181–193.
Scrase FJ. 1954. Radiation and lag errors of the Meteorological Office radiosonde and the diurnal variation of upper-air temperature. Quarterly Journal of the Royal Meteorological Society 80: 565–578.
Trenberth KE, Paolino DA. 1980. The Northern Hemisphere sea level pressure data set: trends, errors, and discontinuities. Monthly Weather Review 108: 855–872.
Teweles S, Finger FG. 1960. Reduction of diurnal variation in the reported temperatures and heights of stratospheric constant-pressure surfaces. Journal of Meteorology 17: 177–194.
Väisälä V. 1941. Der Strahlungsfehler der finnischen Radiosonde. Mitteilungen des Meteorologischen Instituts der Universität Helsinki No. 47.
Väisälä V. 1949. Solar radiation intensity at the ascending radiosonde. Geophysica 3: 37–55.
Von Storch H, Zwiers FW. 1999. Statistical Analysis in Climate Research. Cambridge University Press: Cambridge.
Zaitseva NA. 1993. Historical developments in radiosonde systems in the former Soviet Union. Bulletin of the American Meteorological Society 74: 1893–1900.