openair News
The openair Project newsletter
Issue 17, January 2015
Contents
Recent news from the openair R package
Development of openair on GitHub
General updates to openair
Map projections for trajectory analysis and scatterPlot
Other openair developments
Recent news from the openair R package
This issue covers lots of new developments and changes to the openair R package. This version of openair brings some new functionality, bug fixes and refinements. Users often ask how to adjust the font size of plots, and this is now possible in all openair plot functions. A significant change relates to trajectory plotting, where different map projections are now possible.
openair development has now moved to GitHub for version control and to allow users to contribute, ask questions and report problems.
This newsletter was produced using R version 3.1.2 and openair version 1.1-4. To update to the most recent version of openair, type update.packages() and make sure you are using the most recent version of R. If you have difficulty, start a new R session (please use the most recent version of R, 3.1.2) and type install.packages("openair", dependencies = TRUE).
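To see what is currently installed before updating, something like the following can be run in a fresh R session. It is only a sketch using base R functions; the comparison against 3.1.2 simply reflects the R version mentioned in this newsletter.

```r
# Report the running R version and whether it is at least 3.1.2
r_version <- getRversion()
r_recent_enough <- r_version >= "3.1.2"
message("R version: ", as.character(r_version),
        if (r_recent_enough) "" else " (older than 3.1.2, consider updating)")

# Report the installed openair version, if the package is present
if (requireNamespace("openair", quietly = TRUE)) {
  message("openair version: ", as.character(utils::packageVersion("openair")))
} else {
  message("openair is not installed; use install.packages() as described above")
}
```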
Development of openair on GitHub
The development of openair has now moved to GitHub. The main page for openair is https://github.com/davidcarslaw/openair. GitHub (and specifically git) is used for version control. Together they make it much easier to develop R packages and, importantly, make it possible for others to contribute.
One of the advantages of GitHub is that it has an Issues page where bugs can be reported, suggestions made and questions asked (see top right of the web page).
The Issues page of GitHub will now be the main way in which to ask questions, raise bug reports and make suggestions in openair.
The Issues page can be found here: https://github.com/davidcarslaw/openair/issues.
A further advantage of this approach is that all users can see the issues raised and get a clear idea of if and when they will be dealt with.
For those familiar with GitHub and R, please feel free to contribute!
General updates to openair
Easier font size control
One of the most requested features of openair is the ability to control the size of the font. This can be difficult because plots often comprise many subcomponents. However, a new option fontsize can be used with all openair functions that produce a plot.
polarPlot(mydata, pollutant = "so2", limits = c(0, 5),
          col = "jet", fontsize = 18)
Figure 1: Polar plot with scales restricted to 0–5 and the font size set to 18 pt.
More robust handling of time-averaging
Time-averaging data is a very common task when dealing with atmospheric composition data. However, it can be tricky to work with some data sets, particularly those with non-regular time intervals, e.g. from a gas chromatograph. In addition, we might want to average data only when a certain amount of it is available (such as 75% data capture in one day).
The timeAverage function has been refined with a few more options added. In particular, the new interval option has been added. The timeAverage function tries to determine the interval of the original time series (e.g. hourly) by calculating the most common interval between time steps. The interval is needed for calculations where data.thresh > 0, because calculating the proportion of missing data requires knowing how much data should be there in the first place.
For the vast majority of regular time series this works fine. However, for data with very poor data capture or irregular time series the automatic detection may not work. Also, for time series such as monthly time series, where there is a variable difference in time between months, users should specify the time interval explicitly, e.g. interval = "month". Users can also supply a time interval to force on the time series.
If you seem to get strange results from timeAverage, try setting the interval first.
In the example below (which is rather extreme), imagine we have an hourly time series with only three measurements in one year. It is not possible for timeAverage to know the time series was originally hourly because there isn't enough data to work out a common interval. In this case we can tell timeAverage that the original time series was indeed hourly by setting interval = "hour". In addition, we might also want a full year to be shown, so we also set the start.date and end.date.
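Separate from the worked example that follows, the idea of detecting the most common interval and then checking data capture can be sketched in a few lines of base R. This is only an illustration of the description above, not the code timeAverage actually uses; the dates are made up, and the 75% threshold is the one mentioned earlier.

```r
# Sketch of interval detection and a data capture check in base R.
# This follows the description in the text, not timeAverage's source code.

# One day of an hourly series with several hours missing (15 of 24 present)
dates <- as.POSIXct("2015-01-01 00:00", tz = "GMT") +
  3600 * c(0:5, 8:12, 20:23)

# Differences between consecutive time stamps, in seconds
steps <- as.numeric(diff(dates), units = "secs")

# The most common difference is taken as the native interval (3600 s here)
interval_secs <- as.numeric(names(which.max(table(steps))))
interval_secs

# Data capture for the day: observed points / points expected at that interval
expected <- 24 * 3600 / interval_secs
capture <- 100 * length(dates) / expected
capture   # 62.5 (per cent)

# With a 75% threshold this day would not be averaged
capture >= 75
```

With only three measurements in a year, as in the extreme example, the table of differences has no meaningful mode, which is why the interval then has to be supplied by hand.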
First, make some data
dat <- data.frame(date = c(as.POSIXct("2015-03-21", "GMT"),
                           as.POSIXct("2015-03-29", "GMT"),
                           as.POSIXct("2015-10-10", "GMT")),
                  x = 1:3)
# when combining data like this the tz is dropped, set it to GMT
attr(dat$date, "tzone") <- "GMT"
dat
##         date x
## 1 2015-03-21 1
## 2 2015-03-29 2
## 3 2015-10-10 3
Now average it to monthly
timeAverage(dat, avg.time = "month",
            start.date = "2015-01-01",
            interval = "hour",
            end.date = "2015-12-01")
##          date   x
## 1  2015-01-01 NaN
## 2  2015-02-01 NaN
## 3  2015-03-01 1.5
## 4  2015-04-01 NaN
## 5  2015-05-01 NaN
## 6  2015-06-01 NaN
## 7  2015-07-01 NaN
## 8  2015-08-01 NaN
## 9  2015-09-01 NaN
## 10 2015-10-01 3.0
## 11 2015-11-01 NaN
## 12 2015-12-01 NaN
Categorical scales for trendLevel
trendLevel can also be used with user-defined discrete colour scales as shown in Figure 2. In this case the default x and y variables are chosen (month and hour) split by type (year).
trendLevel(mydata, pollutant = "no2",
           border = "white", statistic = "max",
           breaks = c(0, 50, 100, 500),
           labels = c("low", "medium", "high"),
           cols = c("forestgreen", "yellow", "red"),
           key.position = "top")
Figure 2: trendLevel plot for maximum NO2 concentrations using a user-defined discrete colour scale.
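To see how a set of breaks and labels turns concentrations into categories, the same values can be tried with cut() in base R. Whether trendLevel uses cut() internally is an assumption; the sketch below only shows the general idea with made-up concentrations.

```r
# Illustrative only: map concentrations onto the categories used above.
# Whether trendLevel bins data exactly like this is an assumption.
no2_max <- c(12, 49.9, 50, 87, 100, 340)

# cut() intervals are closed on the right by default: (0,50], (50,100], (100,500]
category <- cut(no2_max,
                breaks = c(0, 50, 100, 500),
                labels = c("low", "medium", "high"))

# One colour per category, in the same order as the labels
cols <- c(low = "forestgreen", medium = "yellow", high = "red")

data.frame(no2_max, category,
           colour = unname(cols[as.character(category)]))
```

Note that n breaks define n - 1 categories, so the number of labels and colours should be one fewer than the number of breaks.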
More flexibility adding reference lines
It is now possible to add multiple reference
lines to timePlot, scatterPlot, timeVariation and
smoothTrend — together with full control of their properties. These functions now take options ref.x and
ref.y as lists. A simple example is shown below for
the timePlot function. The help files of these functions
provide further details on usage.
timePlot(mydata, avg.time = "month", pollutant = "no2",
         ref.y = list(h = c(35, 42, 60),
                      lty = c(1, 3, 5),
                      col = c("grey30", "forestgreen", "blue"),
                      lwd = c(4, 2, 4)))
Figure 3: Example of the timePlot function with user-defined reference lines.
Dendrogram for corPlot
Dendrograms provide additional information to help with visualising how groups of variables are related to one another. Note that dendrograms can only be plotted for type = "default" i.e. for a single panel plot.
corPlot(mydata, dendrogram = TRUE)
Figure 4: Example of a correlation matrix showing the relationships between variables, together with the dendrogram.
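For readers curious about what such a dendrogram represents, the general technique can be sketched with base R alone. The details here are assumptions rather than a description of corPlot itself: the built-in airquality data set stands in for monitoring data, and 1 minus the correlation is used as the distance between variables.

```r
# Illustrative only: cluster variables by their correlation using base R.
# The distance measure (1 - r) is an assumption, not necessarily corPlot's.
vars <- airquality[, c("Ozone", "Solar.R", "Wind", "Temp")]

# Pairwise correlations, dropping missing values pair by pair
cor_mat <- cor(vars, use = "pairwise.complete.obs")

# Convert similarity to dissimilarity and cluster hierarchically
dis <- as.dist(1 - cor_mat)
clust <- hclust(dis)

# Variable names in the order they appear along the dendrogram
clust$labels[clust$order]

# Draw the tree when working interactively; strongly correlated
# variables join low down in the tree
if (interactive()) {
  plot(as.dendrogram(clust), main = "Variables clustered by correlation")
}
```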
NERC Advanced Short Course on openair
and R
We recently ran a 5-day intensive course for NERC
PhD students hosted at the Wolfson Atmospheric
Chemistry Laboratories at the University of York. The
focus was on analysing atmospheric composition data.
We are seeking more funding from NERC to run the
course again because it was heavily oversubscribed.
In addition, we will now develop plans for other
courses that will be available more widely.
Map projections for trajectory analysis and scatterPlot
Up until now openair has plotted back trajectories on a rectangular grid. Recent updates have, however, made it possible to use different map projections using the mapproj package (which is now required by openair). By default the map projection used is Lambert, but there are many others available (see ?mapproj). Some of these projections require other parameters to be set. If no parameters are required then the option parameters = NULL should be used. Other map projections might require two parameters, e.g. two latitudes, in which case something like parameters = c(40, 50) should be used.
In addition there is also an orientation option that takes three numbers c(latitude, longitude, rotation), which describe where the "North Pole" should be when computing the projection. The setting of these options will depend on the location of interest but there is now much more flexibility for plotting large scale maps.
As mentioned above, by default the projection used is Lambert conformal, which is a conic projection best used for mid-latitude areas. The hysplit model itself will use any one of three different projections depending on the latitude of the origin. If the latitude is greater than 55.0 (or less than −55.0) then a polar stereographic projection is used; if the latitude is greater than −25.0 and less than 25.0 the Mercator projection is used; and elsewhere (the mid-latitudes) the Lambert projection is used. All these projections (and many others) are available in the mapproj package.
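The latitude rule described above can be written down as a small helper. The function name hysplit_projection is hypothetical and the returned strings are descriptive labels only, not necessarily the projection names mapproj expects; the thresholds are the ones quoted in the text.

```r
# Hypothetical helper: which projection HYSPLIT would pick for a given
# origin latitude, following the thresholds quoted in the text.
hysplit_projection <- function(lat) {
  stopifnot(is.numeric(lat), all(abs(lat) <= 90))
  ifelse(abs(lat) > 55, "polar stereographic",
         ifelse(abs(lat) < 25, "mercator", "lambert"))
}

# Roughly London, Hong Kong and Svalbard
hysplit_projection(c(51.5, 22.3, 78.2))
# "lambert" "mercator" "polar stereographic"
```

A latitude of exactly 25 or 55 falls into the Lambert case here, because the text uses strict inequalities for the other two projections.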
Here are a couple of examples.
traj <- importTraj(site = "london", year = 2010)
trajLevel(traj, statistic = "frequency", col = "increment")
Figure 1: Gridded back trajectory frequencies using the default Lambert map projection.
In the next example we use the Mercator projection, set the parameters to NULL (Mercator does not need extra parameters) and adjust the orientation to get a sensible plot.
# Import trajectories for Hong Kong
traj <- importTraj("hk", 2013)
trajLevel(traj, col = "increment", method = "hexbin",
          orientation = c(90, 90, 0),
          projection = "mercator", parameters = NULL)
Figure 2: Gridded back trajectory frequencies using the Mercator map projection, with hexagonal binning.
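To get a feel for what the projection and parameters options do without downloading any trajectories, a few points can be projected with mapproj directly. This is a sketch only: it assumes the mapproj package is installed (the code is skipped otherwise), the coordinates are approximate, and it is an assumption that openair hands these options to mapproj in the same way.

```r
# Sketch only: project a few lon/lat points with mapproj directly.
lon <- c(-0.1, 2.35, 13.4)    # London, Paris, Berlin (approximate)
lat <- c(51.5, 48.86, 52.5)

if (requireNamespace("mapproj", quietly = TRUE)) {
  # Albers needs two standard latitudes, supplied through `parameters`
  albers <- mapproj::mapproject(lon, lat, projection = "albers",
                                parameters = c(40, 50))
  # Mercator needs no extra parameters
  merc <- mapproj::mapproject(lon, lat, projection = "mercator",
                              parameters = NULL)
  # Compare the projected x coordinates side by side
  print(data.frame(lon, lat, albers_x = albers$x, mercator_x = merc$x))
}
```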
Other openair developments
Below is a summary of the developments to openair since version 1.0. Note that a full record of all changes can be found here: https://github.com/davidcarslaw/openair/blob/master/NEWS.md.
• Fix regression for openair methods e.g. affected plot method for timeVariation subsets
• Check data are numeric before applying running mean (would crash R if not)
• Begin transition to GitHub, more details to follow
• Add option dist to scatterPlot for surface modelling
• Make trendLevel colour scaling consistent with other functions and allow missing data to be shown in a different colour
• Allow categorical scales in trendLevel
• Fix type = "season" in trajLevel (winter period not properly calculated)
• Allow multiple reference lines to be added to timePlot, scatterPlot, timeVariation and smoothTrend, together with full control of their properties. Note: ref.x and ref.y must now be lists; see help file for details.
• Don't open graphics window in aqStats
• Sort out package dependencies etc. to make maps easier to load
• Add more flexibility to timeAverage for irregular time intervals
• Add fontsize option to all openair plot functions
• Add 12-hour interval points on back trajectory lines
• Add ref.y option to timeVariation for y reference line(s)
• Fix bug in percentileRose with stat = "cpf" and non-default type (now uses single percentile based on all data, not each panel)
• Fix type = "wd" labelling in corPlot
• Refine airbaseStats to include site type and city by default
• Check if date is in POSIXlt format and throw error if TRUE
• Improve date checks in selectByDate
• Add dendrogram option to corPlot (thanks to James Durant for the suggestion)
• Remove strip in corPlot when type = "default"
• Remove statistic description in pollutionRose when annotate = FALSE
• Fix colour scaling bug in scatterPlot/trajPlot when user limits supplied
• Check period = "years" or "months" in summaryPlot; some users supplied "year" resulting in incorrect statistics
• Give mean and percent calm in pollutionRose when statistic = "prop.mean" (was erroneously percentage)
• getMet function for downloading Hysplit met files in manual did not download as binary files; now corrected
• Add name.pol argument to smoothTrend for more control over names used for plotting
• Show first few dates when import fails to apply correct date format (helps to provide a clue as to actual date format)
• Updates to trajectory plots to allow for different map projections using the mapproj package (new dependency)
• Fix trajectory frequency calculation (frequencies were underestimated)
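Several of the items above are defensive checks on user input (numeric data before a running mean, POSIXlt dates). The kind of check involved can be sketched as follows; check_input is a hypothetical helper written for illustration and is not openair's own implementation.

```r
# Hypothetical helper illustrating the kind of defensive checks listed above;
# it is not openair's actual implementation.
check_input <- function(mydata, pollutant) {
  if (inherits(mydata$date, "POSIXlt")) {
    stop("date is in POSIXlt format; convert it with as.POSIXct() first")
  }
  if (!is.numeric(mydata[[pollutant]])) {
    stop(pollutant, " must be numeric before a running mean is applied")
  }
  invisible(TRUE)
}

good <- data.frame(date = as.POSIXct("2015-01-01", tz = "GMT") + 3600 * 0:2,
                   no2 = c(10, 12, 15))
check_input(good, "no2")        # passes silently

bad <- good
bad$no2 <- as.character(bad$no2)
try(check_input(bad, "no2"))    # reports the non-numeric column
```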
openair citation information
The main citation for the openair package is:
Carslaw, D.C. and K. Ropkins, (2012). openair — an R
package for air quality data analysis. Environmental
Modelling & Software. Volume 27–28, 52–61.
Details of how to cite the manual are given in the
manual.
openair
An R package for air pollution data analysis