ENSEMBLE regional forecasting system and performances

MACC-II Deliverables D_102.1_8.1 & D_106.1_8.1
ENSEMBLE regional forecasting system and performances
Date: 06/2012
Lead Beneficiary: MF-CNRM (#23)
Nature: R
Dissemination level: PU
Grant agreement n° 283576
File: D_ENS_ENSEMBLE_Dossier1_finalv.doc/.pdf
Work-package: 102 and 106 (ENS, model forecasts and verification)
Deliverable: D_102.1 and D_106.1
Title: Dossiers documenting regional forecasting systems and their performances & verification part of model dossiers, commenting in particular on skill scores by trimestrial periods
Nature: R
Dissemination: PU
Lead Beneficiary: MF-CNRM (#23)
Date: 06/2012
Status: Final version
Authors: Philippe Moinat (MF-CNRM, #23), Virginie Marécal (MF-CNRM, #23)
Approved by: Virginie Marécal
Contact: [email protected]
This document has been produced in the context of the MACC-II project (Monitoring Atmospheric Composition and Climate - Interim Implementation). The research leading to these results has received funding from the European Community's Seventh Framework Programme (FP7 THEME [SPA.2011.1.5-02]) under grant agreement n° 283576. All information in this document is provided "as is" and no guarantee or warranty is given that the information is fit for any particular purpose. The user thereof uses the information at its sole risk and liability. For the avoidance of all doubt, the European Commission has no liability in respect of this document, which merely represents the authors' view.
Executive Summary / Abstract

The MACC-II project (Monitoring Atmospheric Composition and Climate - Interim Implementation, www.gmes-atmosphere.eu) is establishing the core global and regional atmospheric environmental service delivered as a component of Europe's GMES (Global Monitoring for Environment and Security) initiative. The regional forecasting service provides daily 3-day forecasts of the main air quality species (ozone, NO2 and PM10) from 7 state-of-the-art atmospheric chemistry models and from the median ensemble calculated from the 7 model forecasts. This report documents the ENSEMBLE regional forecasting system and its statistical performance against in-situ surface observations for quarter #11 (December 2011, January and February 2012) and quarter #12 (March, April and May 2012). It follows the last MACC dossier on the ENSEMBLE regional forecasting system and performances, which covered September/October/November 2011. The improvement of the ENSEMBLE forecasting system achieved during quarters #11 and #12 relies on the general improvement of the skills of the 7 individual models. In MACC, all available data were used for the verification statistics in the model dossiers. From quarter #11 onwards, a new procedure is used, based on only the representative sites selected from the objective classification proposed by Joly and Peuch (Atmos. Env., 2012). For the first two quarters of MACC-II presented here, we show the scores not only against representative sites but also against all available data, in order to discriminate between the improvements coming from model updates and those coming from the verification dataset used. Verified against all station data, the ensemble median shows significantly better skill for quarters #11 and #12 than for the previous winter and spring. This is directly related to the improvements of the 7 individual models and to the use, now in quarters #11 and #12, of recent emission inventories in all models.
The use of data from representative sites for verification provides an overall improvement of the statistical indicators (bias, root mean square error, correlation) for ozone, NO2 and PM10. Since a fair verification requires using only data representative of the model resolution, this is a very significant step forward for the verification procedure.
Table of Contents
1. ENSEMBLE facts sheet
   1.1. Products portfolio
   1.2. Performance statistics
   1.3. Availability statistics
   1.4. Assimilation and forecast system: synthesis of main characteristics
2. Evolution in the ENSEMBLE suite
3. ENSEMBLE background information
   3.1. Method
   3.2. Individual models
   3.3. Air Quality EPSgrams
ANNEX A: Verification report for quarter #11
ANNEX B: Verification report for quarter #12
1. ENSEMBLE facts sheet

1.1. Products portfolio

Name | Description                                                                  | Freq. available | Species                    | Time span
FRC  | Forecast at the surface and at 500 m, 1000 m, 3000 m and 5000 m above ground | Daily, 7 UTC    | O3, NO, NO2, CO, SO2, PM10 | 0-72 h, hourly
EPS  | EPSgrams from surface forecasts for 40 large European cities                 | Daily, 7 UTC    | O3, NO2, SO2, PM10         | 0-72 h, 3-hourly
ANA  | Analysis at the surface                                                      | Daily, 17 UTC   | O3                         | 0-24 h of the day before, hourly

1.2. Performance statistics

See annexes.

1.3. Availability statistics

Quarter 11 (December 2011, January 2012, February 2012): 100% of the ENSEMBLE forecasts and 100% of the ENSEMBLE analyses were provided on time.

Quarter 12 (March 2012, April 2012, May 2012): 100% of the ENSEMBLE forecasts and 92% of the ENSEMBLE analyses were provided on time. ENSEMBLE analyses were missing for 6, 7, 8, 9 and 10 April 2012 and 12 and 13 May 2012. From 6 to 10 April 2012, no observations were retrieved by MF-CNRM, so no analyses could be produced. On 12 and 13 May 2012, the production server at MF-CNRM suffered a severe hardware problem. Production resumed on 14 May using the newly installed back-up server, which was not yet fully operational at that time.

1.4. Assimilation and forecast system: synthesis of main characteristics
Ensemble forecasts (analyses are not yet available)
Horizontal resolution: 0.1° regular lat-lon grid
Domain: (15°E-35°W, 35°N-70°N)
Ensemble method: median model; for each grid cell, the value corresponds to the median of the different model values.
Individual models considered: 7 models: CHIMERE, EMEP, EURAD, LOTOS-EUROS, MATCH, MOCAGE, SILAM (voluntary contribution)
2. Evolution in the ENSEMBLE suite

2009/06/01: start of MACC pre-operational ensemble forecasts, in continuation from GEMS.
2011/03/07: start of MACC pre-operational ensemble analyses.
2011/11/01: continuation in MACC-II of the MACC pre-operational ensemble forecasts and analyses.
3. ENSEMBLE background information

3.1 Method

The ENSEMBLE is currently based upon a median value approach. For each timestep of the daily forecasts, the surface fields of the individual models (see Section 3.2) are interpolated onto a common regular 0.1° x 0.1° grid over the MACC European domain (15°E-35°W, 35°N-70°N). For each point of this grid, the ENSEMBLE value is simply defined as the median of all the individual model forecasts at this point, i.e. the value with 50% of the individual models above it and 50% below. This method is thus rather insensitive to outliers in the forecasts, which is a useful property in the current pre-operational set-up, and it is also little affected when a particular model forecast is occasionally missing.

Other ensemble processing approaches were tested in the context of MACC, considering in particular weights that depend on each individual model's skill. They have not yet proven more successful than the ensemble median and have therefore not been implemented for routine daily production. The report D_R-ENS_3.2 describes some of the results obtained: using weights that reflect the seasonal/climatological skill of the individual models does not provide better results than the ensemble median. Another method has been tested in which all the individual model forecasts enter a weighted sum, the weights being computed locally at each grid point from the root mean square difference between the forecast obtained the previous day and the corresponding median ensemble analysis. These new ensemble forecasts gave better scores than the median ensemble forecasts over the 1-month period in summer 2011 that we investigated. In the next period of MACC-II, this method will therefore be tested further (and over a longer period) before being implemented in daily operations.
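The per-grid-point median described above can be sketched in a few lines of Python. This is an illustrative sketch, not the operational code; the sample concentrations and the convention of passing a missing model forecast as None are assumptions made here:

```python
import statistics

def ensemble_median(model_values):
    """ENSEMBLE value at one grid point: the median of the individual
    model forecasts, skipping any model whose forecast is missing."""
    available = [v for v in model_values if v is not None]
    return statistics.median(available)

# Seven hypothetical O3 forecasts (ug/m3) at one grid point; the outlier
# at 90.0 barely moves the result, illustrating robustness to outliers.
full = [62.0, 58.5, 71.2, 64.0, 65.0, 90.0, 60.1]
print(ensemble_median(full))  # 64.0

# A missing model is simply skipped; note that with an even count,
# statistics.median averages the two central values.
print(ensemble_median([62.0, 58.5, 71.2, None, 65.0, 90.0, 60.1]))  # 63.5
```

The robustness properties noted in the text follow directly: an extreme forecast shifts the ranking by at most one position, and dropping one member changes the result only marginally.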
3.2 Individual models

During months 1 to 7 of MACC-II (December 2011 to May 2012), seven individual models have been used to compute the ensemble products:

Model       | Institutes                  | Horizontal resolution (Europe)
CHIMERE     | INERIS, CNRS (France)       | 25 km
EMEP        | met.no (Norway)             | 0.25°
EURAD       | FRIUUK (Germany)            | 15 km
LOTOS-EUROS | KNMI, TNO (The Netherlands) | 0.25° (lon) x 0.125° (lat)
MATCH       | SMHI (Sweden)               | 0.2°
MOCAGE      | METEO-FR, CERFACS (France)  | 0.2°
SILAM       | FMI (Finland, unfunded)     | 0.2°
For more information on the MACC-II ENS participating models (CHIMERE, EMEP, EURAD, LOTOS-EUROS, MATCH, MOCAGE, SILAM), please refer to each individual model's dossier.

3.3 Air Quality EPSgrams

"EPSgrams" for 40 major European cities are produced daily. Such graphics are common for presenting ensemble meteorological forecast products but, to our knowledge, this is the first experimental implementation worldwide in the field of air quality; it started within the GEMS project. Figure 1 presents an example of an AQ EPSgram for the area of Amsterdam (The Netherlands), for the 72 h forecast based on Tuesday 31 March 2009. For the 4 main pollutants (ozone, NO2, SO2 and PM10), forecasts are plotted every three hours as bars that indicate the range of the individual ensemble members' forecasts (minimum, maximum and percentiles 10, 25, 50, 75 and 90). This presentation allows users to assess the dispersion within the ensemble for each species and each 3-hourly forecast horizon at the given location of the EPSgram. Note that the 40 selected sites are actually large cities: European capitals and the largest conurbations. The forecasts are based upon models with resolutions of ~15 km to 50 km, which is too coarse to account for very local and urban effects (high primary pollutant levels, titration of ozone, ...). The AQ EPSgrams thus have to be taken with caution; the forecast does not correspond to city-centre values, but rather to values representative of the background in the area of the city. A message informs users of the MACC-II platforms of this caveat.
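The statistics behind each EPSgram bar (minimum, maximum and the 10th/25th/50th/75th/90th percentiles of the member forecasts) can be sketched as follows. This is an illustrative computation with invented member values, using linear interpolation between ranked members; the operational plotting code may define percentiles differently:

```python
def percentile(vals, p):
    """Percentile p (0-100) of a list of values, computed by linear
    interpolation between the ranked values."""
    s = sorted(vals)
    k = (len(s) - 1) * p / 100.0
    lo = int(k)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (k - lo)

def epsgram_bar(members):
    """Statistics drawn as one 3-hourly EPSgram bar for one species."""
    bar = {"min": min(members), "max": max(members)}
    for p in (10, 25, 50, 75, 90):
        bar["p%d" % p] = percentile(members, p)
    return bar

# Seven hypothetical ensemble-member NO2 forecasts (ug/m3) at one step:
bar = epsgram_bar([31.0, 40.0, 44.0, 47.0, 52.0, 55.0, 63.0])
print(bar["min"], bar["p50"], bar["max"])  # 31.0 47.0 63.0
```

The spread between `p10` and `p90` is what gives users a direct visual measure of the ensemble dispersion at each forecast step.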
Figure 1: Example of an Air Quality EPSgram.
ANNEX A: Verification report for quarter #11

This verification report covers the period December 2011/January/February 2012. Average skill scores (bias, root mean square error, correlation) for the ENSEMBLE are successively presented for three pollutants: ozone, NO2 and PM10. The skill is shown for the entire forecast horizon, 0 to 72 h (3-hourly values), allowing evaluation of the entire diurnal cycle and of the evolution of performance from day 1 to day 3. In MACC, verifications were done against all available Near-Real-Time (NRT) data. For this first period of MACC-II, verifications are performed against a selection of the available NRT data for the following countries: Belgium, Czech Republic, Denmark, Finland, France, Germany, Greece (Athens area only), Italy (no longer available since 15 December 2011), Netherlands, Norway, Poland, Spain, Sweden and the United Kingdom. The total number of sites is typically up to 900 for ozone, 1200 for NO2, 550 for SO2, 300 for CO and 900 for PM10. As an example, the data coverage for 14 December 2011 at 10 UTC is depicted below.

Near-Real-Time data coverage (December 14th 2011, 10 UTC)
As proposed in the last MACC dossier (quarter 10), the typology of sites is now taken into account when selecting the NRT data used for forecast verification. This is necessary because no uniform, reliable metadata currently exist for all regions and countries, which each take different approaches to this documentation. Skill scores are now computed using data selected following the work carried out in MACC [Joly and Peuch, 2012] to
build an objective classification of sites, based on the past measurements available in Airbase (EEA); see MACC D_R-ENS_5.1 for more details. This classification is now used to restrict verification to the sites that have sufficient spatial representativeness. About half of the sites received in NRT are declared "urban" in the Airbase metadata and would in principle not be representative enough for comparison against forecast models with horizontal resolutions of approximately 0.2°. However, our findings indicate that a significant fraction of the sites labelled "urban" or "suburban" actually has little local character (as seen in past time series of measurements) and may in fact be more representative than the metadata would suggest. The figure below shows scores computed using all MACC data available in NRT and using only data from sites in classes 1-5 of our objective classification (the "background" fraction).

Sample model verification for the period 15/09/2010 to 28/02/2011 using all NRT sites (left) and using only the most representative sites (classes 1-5) (right). Values are for NO2 (top) and PM10 (bottom) at 0300 UTC.

The statistical approach using only representative sites, according to the objective classification, is clearly the way forward (it does not thin the available NRT data too much), leading to a general, significant improvement of the overall skill scores for ozone, NO2 and PM10, as illustrated here. For this quarter, we present the scores not only against representative sites but also against all available data, in order to discriminate the improvements coming from model updates from those coming from the verification dataset used.
The approach of using only representative stations also helps alleviate the issue of the large variation in site density from one country to another: the countries with higher densities of observational sites, particularly Germany and France, actually "lose" many more sites than the others (also in proportion). Yet the issue remains that the overall skill scores are largely governed by the behaviour of the models in the "data-intensive" countries. In the frame of MACC-II, we will work towards more segmented verification procedures (by country and by large continental region: Northern Europe, Eastern Europe, Mediterranean, ...), which will allow increasingly specific descriptions of the quality of the AQ products. At the same time, we acknowledge that the amount of
information provided has to remain within reasonable bounds, in order not to confuse general users. Within MACC, the D-INSITU subproject worked with the European Environment Agency (EEA) to set up a new, more extensive and robust Near-Real-Time dataflow. An EEA European air quality data stream has been available since spring 2011; it was checked by D-INSITU (in particular NILU) in MACC, and this work is continued by the OBS partners in MACC-II, against the data currently received in NRT, which result from ad hoc bilateral agreements with environment agencies in 14 different countries. In MACC, the D-INSITU work showed a considerable number of differences, the EEA dataset reporting more sites for ozone and fewer for the other species. Efforts were made to merge the two data sources, by helping EEA to gain access to data in the countries with which GEMS-MACC has been in contact and which do not provide all their NRT data to EEA. The general objective of moving to an EEA-based dataflow was not yet feasible for the last periods of MACC. In the first period of MACC-II, the work to merge the EEA data and the data currently received in NRT was finalized. The next step is to set up the procedure for using the EEA dataset in the regional production chain. To ensure that the same data are used by all models for assimilation, the EEA dataset will be retrieved by MF-CNRM, with a back-up at ECMWF. Each regional team will then get from MF-CNRM the dataset to be assimilated. This will also make it possible to add information on the data to assimilate (or not to assimilate), such as representativeness information, and to randomly flag some sites so that verification against independent data becomes possible. The procedure for the use of the EEA dataset will be set up and tested in the next MACC-II period (m8 to m13). A further advantage is that data arrive sooner through EEA, typically less than 3 hours after measurement.
Currently, hourly data from the day before become available to MACC-II between 2 and 10 UTC every day, depending on the country: this delays the possible start of the daily verification calculations and makes it impossible to base the daily forecasts upon the analysis of the day before, as forecasts have to be delivered early enough every morning to serve users' needs.

Joly, M. and V.-H. Peuch, 2012: Objective classification of air quality monitoring sites over Europe, Atmos. Env., 47, 111-123.
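The three scores reported throughout these annexes (bias, root mean square error and correlation) are standard statistics on paired forecast/observation series. A minimal Python sketch of how they are computed is given below; the sample numbers are invented for illustration and are not MACC data:

```python
import math

def skill_scores(forecast, observed):
    """Bias, RMSE and Pearson correlation of paired
    forecast/observation time series (e.g. one species, one hour)."""
    n = len(forecast)
    errors = [f - o for f, o in zip(forecast, observed)]
    bias = sum(errors) / n                           # mean forecast - obs
    rmse = math.sqrt(sum(e * e for e in errors) / n) # root mean square error
    mf = sum(forecast) / n
    mo = sum(observed) / n
    cov = sum((f - mf) * (o - mo) for f, o in zip(forecast, observed))
    var_f = sum((f - mf) ** 2 for f in forecast)
    var_o = sum((o - mo) ** 2 for o in observed)
    corr = cov / math.sqrt(var_f * var_o)            # Pearson correlation
    return bias, rmse, corr

# Invented ozone values (ug/m3): a positive bias indicates overestimation.
bias, rmse, corr = skill_scores([50.0, 60.0, 70.0, 80.0],
                                [48.0, 55.0, 72.0, 77.0])
print(round(bias, 2), round(rmse, 2), round(corr, 2))  # 2.0 3.24 0.98
```

A positive bias thus reads directly as an overestimation by the ENSEMBLE, as for ozone in the plots that follow, and a negative bias as an underestimation, as for NO2 and PM10.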
ENSEMBLE: ozone skill scores against all available data, period #11 (December 2011, January and February 2012)

For this indicator, the skill is improved compared to last year (winter 2010-2011) by about 5 µg/m3, but ozone is still overestimated, particularly around 9 and 18 UTC. The RMSE is largely improved compared to the previous winter, by about 8 µg/m3. The correlation coefficients are largely improved compared to the previous winter, ranging from 0.53 to 0.62 this winter against 0.29 to 0.39 the previous winter.
ENSEMBLE: ozone skill scores against data from representative sites, period #11 (December 2011, January and February 2012)

Ozone biases are still positive but with a significantly lower overestimation when using data from representative sites for verification (reduction of about 4 µg/m3). The shape is similar for both verification datasets. RMSEs are also reduced, with maxima around 19 µg/m3 for representative data instead of 22 µg/m3 for all data. The shape is similar for both verification datasets. Correlations are only slightly higher around noon when using only representative data.
ENSEMBLE: NO2 skill scores against all available data, period #11 (December 2011, January and February 2012)

For this indicator, the skill is improved compared to last year (winter 2010-2011) by about 3-4 µg/m3, but NO2 is still underestimated, particularly around 9 and 18 UTC. The RMSE is also improved compared to the previous winter, by about 3-4 µg/m3. The correlation coefficients are largely improved compared to the previous winter, ranging from 0.375 to 0.52 this winter against 0.19 to 0.36 the previous winter.
ENSEMBLE: NO2 skill scores against data from representative sites, period #11 (December 2011, January and February 2012)

NO2 biases are still negative but with a significantly lower underestimation when using data from representative sites for verification. Biases now range from -3.5 to -11 µg/m3 instead of -6 to -20 µg/m3 when using all available data. RMSEs are also largely reduced, with maxima around 20 µg/m3 for representative data instead of 31.5 µg/m3 for all data. Results around noon are particularly improved. Correlations change mainly around noon and are significantly higher when using only representative data.
ENSEMBLE: PM10 skill scores against all available data, period #11 (December 2011, January and February 2012)

For this indicator, the skill is improved compared to last year (winter 2010-2011) by about 3 µg/m3, but PM10 is still underestimated, particularly during daytime. The RMSE is also improved compared to the previous winter, by about 4 µg/m3. The correlation coefficients are improved compared to the previous winter, ranging from 0.35 to 0.39 this winter against 0.28 to 0.33 the previous winter. The shape of the correlation curve differs from the previous winter, with poorer correlations earlier in the night.
ENSEMBLE: PM10 skill scores against data from representative sites, period #11 (December 2011, January and February 2012)

PM10 biases are still negative but with a lower underestimation when using data from representative sites for verification. Biases now range from -6.7 to -10 µg/m3 instead of -8.8 to -12.5 µg/m3 when using all available data. RMSEs are also reduced, by about 2 µg/m3, when using representative data instead of all available data. Correlations change mainly around noon, with significantly larger correlations when using only representative data.
Analysis of ENSEMBLE scores against the previous winter's scores, based on verification with all stations

Skill scores for NO2 are significantly improved compared to the previous winter, with a reduction of the bias and RMSE and a large increase of the correlation. For the bias and RMSE, this can partly be attributed to the general improvement of the individual models (mainly MOCAGE, LOTOS-EUROS, CHIMERE and EMEP). The use of updated emission inventories, now in all models, is also expected to improve performance, since NO2 depends closely on surface emissions. This likely also explains the improvement of the correlation.

All skill scores for ozone are significantly improved compared to the previous winter. As for NO2, the correlation is largely improved and is always above 0.5. This can clearly be linked to the NO2 skill improvements, since NO2 is a very important ozone precursor. It indicates that the 7 models used to produce the ensemble forecasts are, on average, able to provide a consistent improvement of NO2 and ozone, as expected.

Skill scores for PM10 are significantly improved compared to the previous winter, but the improvement in the correlations is smaller than for ozone and NO2. PM10 correlations are still poor and exhibit a different variation with forecast time than in the previous winter. PM10 is very sensitive to the variation of emissions during the day, which is likely not well enough represented in the models. The negative bias is linked to components and processes that are missing in the models. This bias and the RMSE are nevertheless significantly better this winter, thanks to improvements in the aerosol representation itself (as in the EMEP model) and in other model features acting indirectly on PM10.

The comparison with the scores from the previous winter shows a clear overall improvement of the ENSEMBLE skill. In quarter #11, the correlations exhibit a stronger tendency to decrease with the forecast day. This may be related to the meteorological forcings used to constrain the chemistry models, which deteriorate with the forecast day.

Analysis of ENSEMBLE scores using representative sites against using all data

When using all available data for verification, no sorting is done between urban and other stations. In urban areas, pollutant concentrations vary rapidly in time and space. The use of urban station data is expected to degrade the apparent performance of the individual models, and thus of the ENSEMBLE, since urban station measurements are not representative at the model resolution (10-20 km). This is particularly true for primary pollutants such as NO2 and PM10. From quarter #11 onwards, we can discuss this issue by comparing, for each species, the verification plots calculated from all available data with those from the data of representative sites only.

For quarter #11, the impact on NO2 scores of using only representative data is very important, with a very large reduction of the negative bias (nearly a factor of 2) and a decrease of the RMSE of about 40%. All skill scores for NO2 are improved around noon/early
afternoon, indicating that removing urban stations is particularly important for the ENSEMBLE performance at this time of day. The impact of using only representative data for ozone verification is also very significant for biases and RMSEs. The mean bias is down to ~4 µg/m3 on average and the mean RMSE down to ~19 µg/m3. This means that the selection of representative stations also has a positive impact on ozone. Correlations are only slightly improved. The effect on ozone is smaller than on NO2, since ozone is not a primary pollutant and has a longer lifetime than NO2. The impact on PM10 of using a representative dataset is similar to NO2: it is positive for biases and RMSEs, and correlations for PM10 are slightly improved around noon. Thus, removing urban stations is also important for the ENSEMBLE performance for PM10. The decrease of the correlations with the forecast day also appears when using the representative dataset, showing that this feature does not depend on the verification dataset.
ANNEX B: Verification report for quarter #12

This verification report covers the period March/April/May 2012. Average skill scores (bias, root mean square error, correlation) for the ENSEMBLE model are successively presented for three pollutants: ozone, NO2 and PM10. The skill is shown for the entire forecast horizon, 0 to 72 h (3-hourly values), allowing evaluation of the entire diurnal cycle and of the evolution of performance from day 1 to day 3. In MACC, verifications were done against all available Near-Real-Time (NRT) data. As for the first period of MACC-II (quarter 11), verifications are performed against a selection of the available NRT data for the following countries: Belgium, Czech Republic, Denmark, Finland, France, Germany, Greece (Athens area only), Netherlands, Norway, Poland, Spain, Sweden and the United Kingdom. The total number of sites is typically up to 900 for ozone, 1200 for NO2, 550 for SO2, 300 for CO and 900 for PM10. As an example, the data coverage for 5 April 2012 at 10 UTC is depicted below.

Near-Real-Time data coverage (April 5th 2012, 10 UTC)
As proposed in the last MACC dossier (quarter 10), the typology of sites is now taken into account when selecting the NRT data used for forecast verification. This is necessary because no uniform, reliable metadata currently exist for all regions and countries, which each take different approaches to this documentation. Skill scores are now computed using data
selected following the work carried out in MACC [Joly and Peuch, 2012] to build an objective classification of sites, based on the past measurements available in Airbase (EEA); see MACC D_R-ENS_5.1 for more details. This classification is now used to restrict verification to the sites that have sufficient spatial representativeness. About half of the sites received in NRT are declared "urban" in the Airbase metadata and would in principle not be representative enough for comparison against forecast models with horizontal resolutions of approximately 0.2°. However, our findings indicate that a significant fraction of the sites labelled "urban" or "suburban" actually has little local character (as seen in past time series of measurements) and may in fact be more representative than the metadata would suggest. The figure below shows scores computed using all MACC data available in NRT and using only data from sites in classes 1-5 of our objective classification (the "background" fraction).

Sample model verification for the period 15/09/2010 to 28/02/2011 using all NRT sites (left) and using only the most representative sites (classes 1-5) (right). Values are for NO2 (top) and PM10 (bottom) at 0300 UTC.

The statistical approach using only representative sites, according to the objective classification, is clearly the way forward (it does not thin the available NRT data too much), leading to a general, significant improvement of the overall skill scores for ozone, NO2 and PM10, as illustrated here. For this quarter, we present the scores not only against representative sites but also against all available data, in order to discriminate the improvements coming from model updates from those coming from the verification dataset used.
The approach of using only representative stations also helps alleviate the issue of the large variation in site density from one country to another: the countries with higher densities of observational sites, particularly Germany and France, actually "lose" many more sites than the others (also in proportion). Yet the issue remains that the overall skill scores are largely governed by the behaviour of the models in the "data-intensive" countries. In the frame of MACC-II, we will work towards more segmented verification procedures (by country and by large continental region: Northern Europe, Eastern Europe, Mediterranean, ...), which will allow increasingly specific descriptions of the quality of the AQ products. At the same time, we acknowledge that the amount of
information provided has to remain within reasonable bounds, in order not to confuse general users. Within MACC, the D-INSITU subproject worked with the European Environment Agency (EEA) to set up a new, more extensive and robust Near-Real-Time dataflow. An EEA European air quality data stream has been available since spring 2011; it was checked by D-INSITU (in particular NILU) in MACC, and this work is continued by the OBS partners in MACC-II, against the data currently received in NRT, which result from ad hoc bilateral agreements with environment agencies in 14 different countries. In MACC, the D-INSITU work showed a considerable number of differences, the EEA dataset reporting more sites for ozone and fewer for the other species. Efforts were made to merge the two data sources, by helping EEA to gain access to data in the countries with which GEMS-MACC has been in contact and which do not provide all their NRT data to EEA. The general objective of moving to an EEA-based dataflow was not yet feasible for the last periods of MACC. In the first period of MACC-II, the work to merge the EEA data and the data currently received in NRT was finalized. The next step is to set up the procedure for using the EEA dataset in the regional production chain. To ensure that the same data are used by all models for assimilation, the EEA dataset will be retrieved by MF-CNRM, with a back-up at ECMWF. Each regional team will then get from MF-CNRM the dataset to be assimilated. This will also make it possible to add information on the data to assimilate (or not to assimilate), such as representativeness information, and to randomly flag some sites so that verification against independent data becomes possible. The procedure for the use of the EEA dataset will be set up and tested in the next MACC-II period (m8 to m13). A further advantage is that data arrive sooner through EEA, typically less than 3 hours after measurement.
Currently, hourly data from the day before become available to MACC‐II between 2 and 10 UT every day, depending on the country: this delays the possible start of the daily verification calculations and makes it impossible to base the daily forecasts upon the analysis of the day before, as forecasts have to be delivered early enough every morning in order to serve users’ needs.

Reference: Joly, M. and V.‐H. Peuch, 2012: Objective classification of air quality monitoring sites over Europe, Atmos. Env., 47, 111‐123.
ENSEMBLE: ozone skill scores against all available data, period #12 (March, April, May 2012)

The general pattern is similar to the previous spring. The bias is reduced by 1‐2 µg/m3, mainly around 06 UTC. The improvement in RMSE (a reduction of 5‐6 µg/m3) is larger than that in bias compared to the previous spring; the variability with time is unchanged, with the largest RMSEs around 06 UTC. The correlations are largely improved compared to the previous spring, ranging from 0.43 to 0.6 this spring against 0.17 to 0.36 the previous spring; the 0.5 level is thus often reached, mainly during daytime. The correlation tends to decrease more on average with the forecast day than previously.
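The three hourly skill scores discussed throughout this dossier (bias, RMSE and correlation per hour of day, pooled over the days of the quarter) can be computed as in the following sketch; the `(n_days, 24)` array layout for one pollutant at one (or pooled) station(s) is an assumption for illustration, not the actual verification code.

```python
import numpy as np

def hourly_scores(forecast, observed):
    """Bias, RMSE and Pearson correlation per hour of day.

    `forecast` and `observed` are arrays of shape (n_days, 24),
    e.g. surface ozone in ug/m3 (illustrative layout). Each score
    is computed across the days, yielding 24 values per score.
    """
    err = forecast - observed
    bias = err.mean(axis=0)                  # mean error per hour
    rmse = np.sqrt((err ** 2).mean(axis=0))  # RMSE per hour
    # Pearson correlation per hour, across the days of the quarter
    f = forecast - forecast.mean(axis=0)
    o = observed - observed.mean(axis=0)
    corr = (f * o).sum(axis=0) / np.sqrt(
        (f ** 2).sum(axis=0) * (o ** 2).sum(axis=0)
    )
    return bias, rmse, corr
```

Computing the scores hour by hour is what makes diurnal features such as the 06 UTC RMSE maximum visible.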
ENSEMBLE: ozone skill scores against data from representative sites, period #12 (March, April, May 2012)

Ozone biases are still positive but significantly lower when using representative data (a reduction of 3 µg/m3); the bias falls below 3 µg/m3 in the afternoon, and the diurnal shape is similar for the two verification datasets. RMSEs are also reduced, mainly around 06 UTC, again with a similar shape for the two datasets. Correlations are slightly larger compared to verification against all data, mainly at night, and show the same tendency to decrease with the forecast day.
ENSEMBLE: NO2 skill scores against all available data, period #12 (March, April, May 2012)

The ENSEMBLE bias is reduced by 2‐3 µg/m3 compared to the previous spring, but NO2 is still underestimated, particularly around 9 and 18 UTC. RMSE is also improved compared to the previous spring, by about 4 µg/m3; the lowest RMSE is found at night. The correlation coefficients are largely improved compared to the previous spring, ranging from 0.32 to 0.52 this spring against 0.18 to 0.36 the previous spring.
ENSEMBLE: NO2 skill scores against data from representative sites, period #12 (March, April, May 2012)

The ENSEMBLE bias is largely reduced when using representative stations instead of all available data: biases now range from ‐3.5 to ‐12.5 µg/m3, instead of ‐6 to ‐22 µg/m3 when using all data. RMSEs are also largely reduced, with a mean around 14 µg/m3 for representative data against 23 µg/m3 for all data; results around noon are particularly improved. Correlations are larger by about 0.06 on average, and by more around noon.
ENSEMBLE: PM10 skill scores against all available data, period #12 (March, April and May 2012)

The ENSEMBLE has a negative bias, which is slightly reduced during daytime compared to the previous spring. The bias tends to deteriorate with the forecast day, whereas it tended to improve in the previous spring statistics. The RMSE on PM10 has decreased by 3‐4 µg/m3 compared to the previous spring, with the largest values around 8‐9 UTC. Correlations have increased by about 0.1 compared to the previous spring. As for the bias and RMSE, the skill degrades with the forecast day.
ENSEMBLE: PM10 skill scores against data from representative sites, period #12 (March, April and May 2012)

The bias is reduced by 1‐2 µg/m3 when using the representative dataset, and the RMSE by 1‐3 µg/m3; both tend to deteriorate with the forecast day, as in the verification against all available data. Correlations are larger by about 0.08 when using the representative dataset, and their diurnal shape is different, with the best performance around 9 UTC.
Analysis of ENSEMBLE scores against the previous spring scores based on all‐stations verification

Skill scores for NO2 are significantly improved compared to the previous spring, with a reduction of the bias and RMSE and an increase of the correlation; the performances for quarter#12 are very similar to those obtained for quarter#11. This can be partly attributed to the general improvement of the individual models (mainly MOCAGE, LOTOS‐EUROS, CHIMERE and EMEP) in terms of bias and RMSE. The use of updated emission inventories, now in all models, is also expected to improve performances, since NO2 depends closely on surface emissions; this likely also explains the improvement of the correlation. All skill scores for ozone are also significantly improved compared to the previous spring. The correlation is largely improved and is now around 0.5. This can clearly be linked to the NO2 skill improvements, since NO2 is a very important ozone precursor. It indicates that the 7 models used to produce the ensemble forecasts are on average able to provide a consistent improvement of NO2 and ozone, as expected. The improvement of the ozone skill scores for quarter#12 is only slightly smaller than for quarter#11. RMSE and correlations for PM10 are significantly improved compared to the previous spring. PM10 correlations are still generally poor but are significantly improved at night compared to the previous spring. As for quarter#11, the negative bias is linked to components and processes that are missing in the models. The RMSE is nevertheless significantly better, thanks to improvements in the representation of the aerosols themselves (as in the EMEP model) or in other model features acting indirectly on PM10. This indicates that the progress of the ENSEMBLE skills is robust to the change of season.
This is not only true for primary pollutants but also for ozone, which is produced photochemically and is therefore sensitive to the stronger sunlight conditions prevailing in spring compared to winter. The comparison with scores from the previous spring shows a clear overall improvement of the ENSEMBLE skills. The correlations exhibit in quarter#12, as in quarter#11, a stronger tendency to decrease with the forecast day. This is possibly related to the meteorological forcings used to constrain the chemistry models, which deteriorate with the forecast day.

Analysis of ENSEMBLE scores using representative sites against using all data

When using all available data for verification, no sorting is done between urban and other stations. In urban areas, pollutant concentrations vary rapidly in time and space. The use of urban station data is therefore expected to degrade the individual model, and thus the ENSEMBLE, performances, since urban measurements are not representative at the model resolution (10‐20 km). This is particularly true for primary pollutants such as NO2 and PM10. As expected, the impact on NO2 scores of using only representative data is very important, with a very large reduction of the negative bias (nearly a factor of 2) and a decrease of RMSE
of about 50%. All skill scores for NO2 are improved around noon/early afternoon, indicating that removing urban stations matters most for the ENSEMBLE performances at this time of the day. The impact of using only representative data for verification is also very significant for ozone biases and RMSEs: the mean bias is down to ~6 µg/m3 on average and the mean RMSE to ~19 µg/m3. This means that the sorting of representative stations also has a positive impact on the ozone scores. Correlations are only slightly improved. The effect on ozone is smaller than on NO2, since ozone is not a primary pollutant and has a longer lifetime than NO2. The impact on PM10 of using a representative dataset is similar to NO2, i.e. very positive on biases and RMSEs; correlations for PM10 are slightly improved around noon. Thus, removing urban stations is particularly important for the ENSEMBLE performances for PM10. As for quarter#11, the decrease of the correlations with the forecast day also appears when using the representative data, showing that this feature does not depend on the verification dataset.
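As stated in the abstract, the ENSEMBLE product evaluated throughout this dossier is the median of the individual model forecasts. A minimal grid‐point‐wise sketch follows; the 2‐D field shape is purely illustrative of the idea, not of the operational production chain.

```python
import numpy as np

def ensemble_median(model_fields):
    """Grid-point-wise median of the individual model forecasts.

    `model_fields` is a list of concentration fields, one per model,
    all interpolated onto the same regular grid (illustrative shape
    (ny, nx)). A model that failed to deliver on a given day can
    simply be left out of the list.
    """
    stacked = np.stack(model_fields, axis=0)  # (n_models, ny, nx)
    return np.median(stacked, axis=0)         # robust to single-model outliers
```

The median, unlike the mean, is insensitive to a single outlying model at any grid point, which is one reason the ensemble skill tracks the general improvement of the 7 contributing models so directly.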