Transactions on the Built Environment vol 64, © 2003 WIT Press, www.witpress.com, ISSN 1743-3509
Short-term forecasting of ozone and NO2 levels
using traffic data in Bilbao (Spain)
G. Ibarra-Berastegi, I. Madariaga, E. Agirre & J. Uria
University of the Basque Country, Spain
Basurto Hospital. Bilbao, Spain
Abstract
Bilbao is a city with a population of half a million people located in North-Central Spain. In Bilbao, as in many other cities in the world, pollution is mainly
due to photochemical smog and carbon monoxide. These pollutants
originate from traffic. In this work, models based on Multiple Linear Regression
have been built to forecast ozone and NO2 levels up to 8 hours ahead using
current and past values, up to 15 hours back, of ozone, meteorology and traffic
measured in the area. The models were built for four locations in the area of
Bilbao. Traffic variables were calculated as the mean values of all the sensors in
the central area of Bilbao.
First, the models were fitted with data from 1993; the coefficients obtained
were then applied to 1994, whose data were used to test the goodness
of the models. One feature of the models is that future levels of ozone and NO2
are predicted jointly with a system of two equations with two unknowns. The
Multiple Linear Regression models were built using stepwise regression and
tolerance filtering to choose the most meaningful variables. In all cases, traffic-related
variables accounted for between 5% and 10% of the overall variability in
the explanatory models.
The results of the models improve on persistence of levels and are as good as, or even
better than, those obtained with much more sophisticated models. The results are
given according to the set of statistical parameters included in the Model
Validation Kit (R, correlation coefficient; NMSE, normalized mean square
error; FA2, factor of two; FB, fractional bias; and FS, fractional variance) so
that they can be compared with future models developed in the area.
1 Introduction
Bilbao is a city with a population of half a million people located in North-Central Spain. In Bilbao, as in many other cities in the world, pollution is mainly
due to photochemical smog and carbon monoxide. Ozone is not emitted directly
into the atmosphere but is produced by the interaction of precursors and
meteorological effects. In the case of Bilbao the precursors are due to traffic [1].
NO2 is involved in the production of ozone and its levels are related to
traffic.
In the area of Bilbao an air pollution and meteorological network is operated by the
regional authorities (Basque Government). The local municipality also operates a
traffic network. In this work, using real-time and past data measured in both
networks, short-term (up to 8 hours ahead) forecasts are obtained for ozone and
NO2.
2 Air pollution modelling
There have been two classical approaches for air pollution modelling and
forecasting:
1. Models based on an attempt to describe step by step the physical and
chemical mechanisms describing the fate of pollutants in the
atmosphere (cause-effect models).
2. Models based on a "black box" approach in which, if historical data are
available, the relationship between emissions and immissions can be
"learnt" from past records for a given location. The obtained
relationships describing the behaviour of pollutants (emissions and
immissions) in the past can be used to estimate how they will behave in
the future. Models built in this way are known as statistical models.
Years ago, the field of application of each group of models was not very clear.
However, in recent years a clear trend can be detected in the purpose for which
models are built. While cause-effect models are now being used mainly for
evaluation of the long-term effects of the application of different policies in a
given area, the statistical models based on multiple linear regression or neural
networks are now being used for short-term, real time forecasting. This work is
based on this approach and uses current and past values of air pollution,
meteorology and traffic to forecast forthcoming ozone and NO2 levels in the area
of Bilbao. The statistical tool used has been Multiple Linear Regression.
3 Methodology
Historical records of ozone, NO2, meteorology and traffic were available
for the years 1993 and 1994. Data from 1993 were used to fit the
equations of the model and data from 1994 were used to test the model.
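As an illustration of how such records can be arranged for regression, the following is a minimal sketch, not the authors' code. It assumes hourly series stored as NumPy arrays; the function name `make_lagged_design` and its arguments are hypothetical. Each row corresponds to a time T, the columns hold predictor values at T-k for k = 0, ..., 15, and the target is the pollutant level at T+H.

```python
import numpy as np

def make_lagged_design(target, predictors, horizon, max_lag=15):
    """Build (X, y) for a forecast `horizon` hours ahead.

    target     : 1-D array, hourly pollutant series to be forecast
    predictors : list of 1-D arrays of the same length (may include target)
    Each row of X holds every predictor at T, T-1, ..., T-max_lag;
    y holds the target at T+horizon. (Hypothetical helper, for illustration.)
    """
    target = np.asarray(target, dtype=float)
    predictors = [np.asarray(p, dtype=float) for p in predictors]
    n = len(target)
    # Times T for which both the full lag window and T+horizon exist.
    times = np.arange(max_lag, n - horizon)
    cols = [p[times - k] for p in predictors for k in range(max_lag + 1)]
    X = np.column_stack(cols)
    y = target[times + horizon]
    return X, y
```

In this arrangement the 1993 series would produce the matrix used for fitting and the 1994 series the matrix used for testing.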
The core of the model is a set of equations relating, through multiple linear
regression, forthcoming hourly levels of ozone and NO2 on the one hand to
current and past values of ozone, NO2, meteorology and traffic on the other.
The forecasts were obtained from 1 hour up to 8 hours ahead for ozone and NO2
and for four locations in the area of Bilbao. The core of the model is therefore
a set of 64 equations (8 horizons, 2 pollutants and 4 locations). The names of the
locations are Elorrieta, Deusto, Mazarredo and Txurdinaga, labelled as 1, 2, 3
and 4 in table 1. The equations for
the calculation of the ozone levels H hours ahead at location L include as an
independent variable, apart from current and past values of ozone, NO2,
meteorology and traffic, the forecast of NO2 H hours ahead. Likewise, the equations
for the calculation of NO2
levels H hours ahead at location L include as an independent variable the
forecast of ozone H hours ahead, with H ranging from 1 to 8. Therefore, the
prediction of ozone and NO2 levels H hours ahead is made by jointly solving a system of two
equations with two unknowns (the ozone and NO2 levels H hours ahead). The
current time is denoted T, predictions are made H hours ahead with H = 1, 2, ..., 8, and
past values up to K hours back are used, with K = 0, 1, 2, ..., 15.
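The joint solution of the two-equation system can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name and arguments are hypothetical, and all terms that depend only on current and past data (intercept plus lagged ozone, NO2, meteorology and traffic) are assumed to have been summed beforehand into one "known" value per equation.

```python
import numpy as np

def solve_joint_forecast(known_o3, coef_no2_in_o3, known_no2, coef_o3_in_no2):
    """Solve the pair of equations

        O3  = known_o3  + coef_no2_in_o3 * NO2
        NO2 = known_no2 + coef_o3_in_no2 * O3

    for the two unknowns (O3 and NO2 at time T+H).
    """
    # Rewrite as A @ [O3, NO2] = b.
    A = np.array([[1.0, -coef_no2_in_o3],
                  [-coef_o3_in_no2, 1.0]])
    b = np.array([known_o3, known_no2], dtype=float)
    if abs(np.linalg.det(A)) < 1e-12:
        raise ValueError("the two equations are not independent")
    o3, no2 = np.linalg.solve(A, b)
    return float(o3), float(no2)
```

With 4 locations and 8 horizons, one such 2x2 system would be solved for each location and each horizon at every forecast time.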
Equations 1 and 2 represent the pair of equations used
to jointly forecast ozone and NO2 levels at a given location L, H hours ahead.
MET represents the meteorological variables used (temperature, radiation,
humidity) and TRAF the traffic variables. In Bilbao, 140 sensors located under the streets
measure traffic and yield two values every ten minutes at each sensor:
the number of vehicles circulating (NV) and the percentage of time that a vehicle
is occupying the fraction of street above the sensor, that is, the occupation percentage
(OP). This gives an idea of the fluency of traffic. The ratio between the number of
vehicles and the occupation percentage gives an idea of the mean traffic speed in
Bilbao (TS). For this work, hourly values of NV, OP and TS have been used;
in equations 1 and 2 they are referred to generically as TRAF.
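A general form consistent with the description above can be written as follows. This is a sketch of the structure only, under the assumption of a plain linear combination of lagged terms. The coefficient symbols are hypothetical, they depend on the location L and the horizon H, and many of them would be zero after variable selection. Each of the MET and TRAF terms stands for a sum over the individual variables (temperature, radiation, humidity; NV, OP, TS).

```latex
\begin{align}
\mathrm{O_3}_{,L}(T+H) &= a_0 + a_1\,\mathrm{NO_2}_{,L}(T+H)
  + \sum_{k=0}^{15}\Big[ b_k\,\mathrm{O_3}_{,L}(T-k) + c_k\,\mathrm{NO_2}_{,L}(T-k)
  + d_k\,\mathrm{MET}(T-k) + e_k\,\mathrm{TRAF}(T-k) \Big] \\
\mathrm{NO_2}_{,L}(T+H) &= a'_0 + a'_1\,\mathrm{O_3}_{,L}(T+H)
  + \sum_{k=0}^{15}\Big[ b'_k\,\mathrm{O_3}_{,L}(T-k) + c'_k\,\mathrm{NO_2}_{,L}(T-k)
  + d'_k\,\mathrm{MET}(T-k) + e'_k\,\mathrm{TRAF}(T-k) \Big]
\end{align}
% Traffic speed indicator as described in the text:
\[
\mathrm{TS} = \frac{\mathrm{NV}}{\mathrm{OP}}
\]
```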
The equations were built with data from 1993 (validation data set) and used to
forecast data from 1994 (test data set). In all cases, stepwise regression and
tolerance filtering were used so that only meaningful variables were chosen to build
the equations. In previous works [2], [3], [4] it was seen that using one year of data
to fit the equations that will be used the following year yielded better results than
recalculating the coefficients at each step with the most recent data and using the new
equations to estimate the next forecasts. The reason is that the model needs to
"learn" from at least one year's data to capture the most relevant features of the
past behaviour of pollutants. The obtained coefficients and chosen variables were
all congruent with known mechanisms involved in the formation of
photochemical smog. The meteorological variables chosen were in all cases
current and past values of temperature and radiation. The traffic variables were
responsible for approximately 5% of the overall variability in the equations. In
most cases, NO2 at time T+H turned out to be the most relevant variable in the
prediction of ozone at time T+H, and likewise ozone at time T+H was the most
relevant variable in the prediction of NO2 at time T+H.
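The variable selection described in this section can be illustrated with a simplified sketch. It is not the authors' procedure: it performs forward selection only (no removal step and no F-tests), and the two thresholds `min_tolerance` and `min_gain` are hypothetical values chosen for illustration. Tolerance is taken here as 1 minus the R-squared of a candidate variable regressed on the variables already selected, so a low tolerance flags a nearly redundant variable.

```python
import numpy as np

def rss(X, y):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def forward_select(X, y, min_tolerance=0.1, min_gain=0.05):
    """Forward selection with a tolerance filter (simplified sketch).

    Returns the column indices of X in the order they were selected.
    """
    n, p = X.shape
    ones = np.ones((n, 1))
    selected = []
    current = rss(ones, y)  # intercept-only model
    while len(selected) < p and current > 0.0:
        base = np.hstack([ones, X[:, np.array(selected, dtype=int)]])
        best_j, best_rss = None, current
        for j in range(p):
            if j in selected:
                continue
            xj = X[:, j]
            total = float(((xj - xj.mean()) ** 2).sum())
            if total == 0.0:
                continue  # constant column carries no information
            # Tolerance filter: skip candidates nearly collinear with the model.
            if rss(base, xj) / total < min_tolerance:
                continue
            cand = rss(np.hstack([base, xj[:, None]]), y)
            if cand < best_rss:
                best_j, best_rss = j, cand
        # Stop when the best candidate no longer reduces the RSS enough.
        if best_j is None or (current - best_rss) / current < min_gain:
            break
        selected.append(best_j)
        current = best_rss
    return selected
```

The selected columns would then be used to fit the final equation for a given pollutant, location and horizon by ordinary least squares.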
4 Results
As mentioned before, the obtained equations were applied to year 1994. All the
results have been compared with the simplest forecast, persistence of levels, and
can be seen in tables 1a-1b-1c.
To estimate the goodness of the model, the classical parameters R, NMSE, FA2,
FB and FS were used, following the recommendations of the EU [5], [6]. Except
for NO2 at time T+1 at two locations, the models perform significantly better
than persistence of levels.
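For reference, the five parameters and the persistence baseline can be computed as in the sketch below. The definitions follow common usage for these statistics, but conventions vary between sources (for example the sign of FB, or whether FS is based on variances or on standard deviations), so the formulas here are assumptions rather than the exact ones used for the tables. The function names are hypothetical.

```python
import numpy as np

def validation_stats(obs, pred):
    """R, NMSE, FA2, FB and FS for observed and predicted series.

    Assumes strictly positive observations (needed for the FA2 ratio).
    FB and FS are taken as observed minus predicted, and FS uses variances.
    """
    obs = np.asarray(obs, dtype=float)
    pred = np.asarray(pred, dtype=float)
    mo, mp = obs.mean(), pred.mean()
    vo, vp = obs.var(), pred.var()
    ratio = pred / obs
    return {
        "R": float(np.corrcoef(obs, pred)[0, 1]),
        "NMSE": float(np.mean((obs - pred) ** 2) / (mo * mp)),
        "FA2": float(np.mean((ratio >= 0.5) & (ratio <= 2.0))),
        "FB": float((mo - mp) / (0.5 * (mo + mp))),
        "FS": float((vo - vp) / (0.5 * (vo + vp))),
    }

def persistence_forecast(series, horizon):
    """Persistence baseline: the forecast for T+horizon is the value at T.

    Returns (target, prediction) as aligned arrays.
    """
    series = np.asarray(series, dtype=float)
    return series[horizon:], series[:-horizon]
```

A model forecast and the persistence forecast for the same horizon can both be passed to `validation_stats` against the same observed target to obtain paired rows like those in the tables.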
Table 1a. Forecasting results in Bilbao.
Table 1c. Forecasting results in Bilbao.
Location  Pollutant  Hours ahead  Forecast      R      NMSE   FA2    FB      FS
2         O3         8            Model         0.418  0.66   0.489   0.040   0.373
2         O3         8            Persistence   0.166  1.15   0.433  -0.066  -0.035
3         NO2        8            Model         0.426  0.19   0.836  -0.058   0.366
3         NO2        8            Persistence   0.150  0.40   0.707  -0.001  -0.061
3         O3         8            Model         0.386  0.78   0.515   0.155   0.357
3         O3         8            Persistence   0.158  1.04   0.497  -0.123  -0.010
4         NO2        8            Model         0.472  0.19   0.817  -0.095   0.128
4         NO2        8            Persistence   0.262  0.32   0.740   0.051   0.023
4         O3         8            Model         0.326  0.60   0.550   0.204   0.518
4         O3         8            Persistence   0.165  0.79   0.499  -0.003   0.004
A graphical application has been developed to display the results
using spatial interpolation. After analysing the data, kriging was chosen as the
interpolation technique. The program automatically reads the data, interpolates them and
displays them graphically (figures 1-2-3).
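The interpolation step can be illustrated with a minimal ordinary kriging sketch. It is not the authors' program: the linear variogram is an assumption made here for simplicity (the variogram model actually used is not stated), and the function name and any station coordinates passed to it are hypothetical.

```python
import numpy as np

def ordinary_kriging(points, values, targets, variogram=lambda h: h):
    """Ordinary kriging of `values` observed at `points` onto `targets`.

    points  : (n, 2) station coordinates
    values  : (n,) forecasts at the stations
    targets : (m, 2) coordinates where estimates are wanted
    variogram : function of distance; linear by default (an assumption)
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    targets = np.asarray(targets, dtype=float)
    n = len(points)
    # Kriging matrix: station-to-station variogram bordered by the
    # unbiasedness constraint (weights sum to one).
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[:n, n] = 1.0
    A[n, :n] = 1.0
    # Right-hand sides: station-to-target variogram values, one column per target.
    d0 = np.linalg.norm(targets[:, None, :] - points[None, :, :], axis=-1)
    B = np.ones((n + 1, len(targets)))
    B[:n, :] = variogram(d0).T
    W = np.linalg.solve(A, B)  # weights plus Lagrange multiplier
    return W[:n, :].T @ values
```

Evaluating the function on a regular grid of target points covering the area gives the field that a plotting routine would then display for each forecast hour.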
Figures 1-2-3. Forecasting sequence. Panel titles: "Ozone forecasting 4 hours ahead. Bilbao. Saturday 1st of January. 02:00" and "Ozone forecasting 4 hours ahead. Bilbao. Saturday 1st of January. 01:00".
5 Conclusions
Application of the models to the test sample (year 1994) led to the results of
tables 1a-1b-1c. Given joint measurements of O3, NO2, meteorological variables
and traffic at a given location, it is possible to use MLR to provide the network
with forecasting capabilities for O3 and NO2 at that location. MLR equations can be
easily calculated and incorporated into the network management activities.
Although the coefficients of the equations are likely to be influenced by local
conditions, the methodology is easily applicable to any air pollution network and
can be used as a simple, easy-to-use tool. Owing to the large number of cases
used to draw these conclusions, they can be considered robust. The
quality of the predictions is at least as good as that from much more
sophisticated and expensive models. Computational needs and implementation
costs are small, since the model can be run on a PC and calculation time is low. The
network gains forecasting capabilities up to 8 hours ahead only at the locations
where the mentioned variables are measured jointly. The model is intended to
forecast photochemical smog levels 8 hours ahead. The interest of the
predictions between 1 and 7 hours ahead is to give, in addition to the estimate of the levels
8 hours ahead, an hourly description of the whole episode. Graphical
display of a whole episode can be implemented.
Intensive application of this strategy to all the locations of interest in the
network can spatially cover several areas of interest for prognostic purposes
through spatial interpolation, kriging being an appropriate technique. This
approach can be used as an inexpensive and useful element in the air quality
management of an area where a network exists.
Acknowledgements
This work was carried out with the financial support of the University of the
Basque Country, UPV Euskal Herriko Unibertsitatea. The authors wish to thank
the Environmental Department of the Basque Government and the local
municipality of Bilbao for providing the data for this work.
References
[1] Ibarra-Berastegi G., Madariaga I., Elias A., Agirre E., Uria J. (2001). Long-term changes of ozone and traffic in Bilbao. Atmospheric Environment 35,
5581-5592.
[2] Ibarra-Berastegi G., Elias A., Albizu M.V., Agirre E. (2000). Multiple Linear
Regression modelling for short-term real-time prediction of hourly ozone,
NO2 and NO levels in the area of Bilbao. Application of Computer
Techniques to Environmental Studies, pp. 17-26. WIT Press.
ENVIROSOFT 2000. Bilbao.
[3] Ibarra-Berastegi G., Madariaga I., Elias A., Agirre E., Uria J. (2001). Short-term forecasting of hourly ozone, NO2 and NO levels by means of multiple
linear regression modelling. Environmental Science & Pollution Research
4, 250.
[4] Ibarra-Berastegi G., Madariaga I., Elias A., Agirre E., Uria J. (2001). Short-term forecasting of hourly ozone, NO2 and NO levels by means of multiple
linear regression modelling. Gate to Environmental Health and Science.
2001. June, 1-7.
[5] Hanna, S.R., Strimaitis, D.G. and Chang, J.C. (1991). User's guide for
software for evaluating hazardous gas dispersion models. American
Petroleum Institute. 1220 L. Street, Northwest. Washington. D.C. 20005
[6] European Commission (1994). The Evaluation of Models of Heavy Gas
Dispersion. Model Evaluation Group Seminar. Office for Official
Publications of the European Communities. L-2985. Luxembourg.