Scandinavian Journal of Statistics, Vol. 38: 726–747, 2011
doi: 10.1111/j.1467-9469.2011.00751.x
© 2011 Board of the Foundation of the Scandinavian Journal of Statistics. Published by Blackwell Publishing Ltd.
Non-stationary Cross-Covariance Models
for Multivariate Processes on a Globe
MIKYOUNG JUN
Department of Statistics, Texas A&M University
ABSTRACT. In geophysical and environmental problems, it is common to have multiple variables
of interest measured at the same location and time. These multiple variables typically have dependence over space (and/or time). As a consequence, there is a growing interest in developing models for
multivariate spatial processes, in particular, the cross-covariance models. On the other hand, many
data sets these days cover a large portion of the Earth such as satellite data, which require valid
covariance models on a globe. We present a class of parametric covariance models for multivariate
processes on a globe. The covariance models are flexible in capturing non-stationarity in the data
yet computationally feasible and require moderate numbers of parameters. We apply our covariance model to surface temperature and precipitation data from an NCAR climate model output.
We compare our model to the multivariate version of the Matérn cross-covariance function and
models based on coregionalization and demonstrate the superior performance of our model in terms
of AIC (and/or maximum loglikelihood values) and predictive skill. We also present some challenges
in modelling the cross-covariance structure of the temperature and precipitation data. Based on
the fitted results using full data, we give the estimated cross-correlation structure between the two
variables.
Key words: cross-covariance model, linear model of coregionalization, multivariate process,
non-stationary process, process on a globe
1. Introduction
Geophysical or environmental problems routinely involve multiple variables measured at the
same spatial location and time point. Often, the main interest is to study the relationships
between the multiple variables, which would include an accounting of any spatial and/or temporal correlations. On the other hand, with the advance of science and technology, it is common to have data with global coverage. One good example is the study about the relationship
between surface temperature and precipitation (Trenberth & Shea, 2005; Tebaldi & Lobell,
2008; Tebaldi & Sansó, 2009). As stated in Tebaldi & Lobell (2008), from the climate impact
research point of view, studying the joint distribution of surface temperature and precipitation is more interesting than studying each variable separately. Trenberth & Shea (2005) estimate the empirical (spatial) cross-correlation between surface temperature and precipitation
using the numerical model outputs from the Community Climate System Model version 3
(CCSM3) developed by the National Center for Atmospheric Research (NCAR). However,
their estimates are based on sample correlations and, as several authors point out including Bishop & Hodyss (2007), sample correlations often give spurious correlations when the
dimension of the system is much larger than the sample size. Therefore, it is essential to
develop joint models for the variables that can account for not only the marginal but also
the cross-covariance structures and that are valid over the whole globe.
A number of authors have developed cross-covariance models for multivariate spatial processes. One of the most traditional methods is the linear model of coregionalization (LMC)
(Goulard & Voltz, 1992; Wackernagel, 2003) and the key idea is to represent each process as
a linear combination of latent, independent and stationary (often isotropic) processes.
Schmidt & Gelfand (2003) present a Bayesian stationary cross-covariance model based on
the idea of the LMC. Gelfand et al. (2004) provide a good review of the history of the
methods for multivariate processes and extend the model of Schmidt & Gelfand (2003) by
using a spatially varying LMC to account for non-stationarity. Majumdar & Gelfand (2007)
present an approach to modelling stationary processes based on convolving covariance
functions. This model is then extended to non-stationary processes in Majumdar et al. (2010).
A semiparametric approach to modelling multivariate spatial processes is proposed in Reich
& Fuentes (2007). Their covariance model is non-stationary but has a separable structure;
the cross-covariance is factored into a multivariate component and a spatial component,
which may be limiting in some situations. Choi et al. (2009) use a spatio-temporal version of
the LMC model to deal with speciated fine particles over the US with separable covariance
functions.
In terms of developing parametric classes of covariance models for multivariate spatial processes, other than those based on LMC, there have been only a few papers. Apanasovich &
Genton (2010) propose using latent dimensions to create a valid covariance model for multivariate processes from a covariance model for a univariate process. They present the model for
spatio-temporal processes. Their method is convenient to produce valid covariance models
for multivariate processes but they assume stationarity (although in principle their method
can be used for non-stationary processes). They introduce a concept of ‘distance’ between
different processes, which is different from the usual spatial distances or temporal lags and
it is not clear what this distance actually means and how it compares with physical distances
in the space and time domains. Under their setting, one needs to estimate this distance along
with covariance parameters. Gneiting et al. (2010) present a Matérn type covariance model
for multivariate processes. Their model is for isotropic processes only. One of the nice
features of their model is that it allows different smoothness for different processes in the multivariate setting, which can be useful for some data. Note though that their cross-covariance
model is symmetric. That is, if we consider a bivariate process, (Z1 , Z2 ), on locations s1
and s2 , their model implies cov{Z1 (s1 ), Z2 (s2 )} = cov{Z2 (s1 ), Z1 (s2 )}, for all s1 and s2 , which
may not be the case in many geophysical and environmental data sets. The co-located
correlation parameter ρ, which controls the strength of cross-correlation between the two
variables, is constant over the entire domain and this may be too restrictive for some data
sets. As demonstrated in section 4, for the data set that we consider in this article, this limitation leads to an estimate of ρ̂ ≈ 0, even though there is a clear dependence between the
two variables. Furthermore, the above covariance functions are designed for stationary (or
isotropic) processes and none are for processes on a globe. To the best of our knowledge,
there is no such flexible non-stationary cross-covariance function for spatial processes on a
sphere.
Our focus in this article is to develop cross-covariance functions for multivariate processes
on a globe. Moreover, the covariance model is flexible enough to capture non-stationarity
in the data and other complex covariance patterns. Our covariance models require moderate
numbers of covariance parameters and are thus computationally feasible. The remainder of
the article is organized as follows. In section 2, we discuss some properties of non-stationary
covariance structure. Section 3 presents the construction of our covariance model and discusses some computational issues. The application to the joint modelling of global surface
temperature and precipitation data is presented in section 4, which also shows a comparison
of our model with some models proposed in Gneiting et al. (2010) and the LMC models. We
also show our estimated cross-correlation between the surface temperature and precipitation
and we compare it with the result of Trenberth & Shea (2005). We conclude the article with
some discussion in section 5.
2. Non-stationary covariance structure
In this section, we explore some prevalent features of non-stationarity in the covariance structure of geophysical processes on a globe. We discuss those properties for both marginal and
cross-covariances in parallel and show some empirical figures from our data (for details of the
data, see section 4.1). Throughout the section, we denote the variable of interest as (Zi (L, l),
i = 1, 2, . . ., N), a multivariate process on the surface of a globe S² (the surface of a sphere
in R³ with radius R) and we illustrate for the case when N = 2. Note that L and l denote
latitude and longitude, respectively.
2.1. Dependence on latitude
It is common for geophysical processes on a globe to have covariance structure depending on
latitude (Stein, 2007). In particular, the local variation of the process usually changes with
latitude and in fact, Jun & Stein (2008) show that variances of several linear combinations
of total column ozone level exhibit strong dependence on latitude. Surface temperature and
precipitation data that we consider in this article possess this kind of non-stationarity for
both marginal and cross-covariances. Figures 1–3 in Trenberth & Shea (2005) show that the
standard deviations for both variables as well as their cross-correlations have patterns depending on latitude.
Figure 1 of this article displays the standard deviations and cross-correlations of surface
temperature and precipitation, averaged over November to March each year and then averaged over 1970 to 1999. Figures (A), (C) and (E) give each quantity with respect to latitude,
that is, at each latitude, the standard deviation or cross-correlation of the data across all
longitude values is calculated. Figures (B), (D) and (F) give the standard deviation or cross-correlation of the data calculated across all latitude values, at each longitude. See the Appendix
for more details on how these values are calculated. Notice that although we see some dependence of standard deviation and cross-correlation with respect to longitude, the dependence is
more obvious with respect to latitude. This may suggest the processes are reasonably modelled
as axially symmetric (Jones, 1963) both marginally and jointly; the covariance structure is
stationary with respect to longitude and non-stationary with respect to latitude.
2.2. Longitudinal reversibility
We say a univariate process Z1 is longitudinally reversible if
cov{Z1 (L1 , l1 ), Z1 (L2 , l2 )} = cov{Z1 (L1 , l2 ), Z1 (L2 , l1 )}
for all L1 , L2 , l1 , l2 (Stein, 2007). Stein (2007) and Jun & Stein (2008) show that the total
column ozone process is longitudinally irreversible; we find that it is also the case for both
temperature and precipitation data marginally (not shown). This concept can be applied to
cross-covariances as well. Call the cross-covariance of the two processes Z1 and Z2 longitudinally reversible if cov{Z1 (L1 , l1 ), Z2 (L2 , l2 )} = cov{Z1 (L1 , l2 ), Z2 (L2 , l1 )} for all L1 , L2 , l1 , l2 .
For some data sets, however, longitudinally irreversible cross-covariances may be hard to
estimate. Figure 2(A) shows the empirical estimate of the difference of the cross-correlations
at a latitude band,
cor{Z1 (L, l), Z2 (L, l + φ)} − cor{Z1 (L, l + φ), Z2 (L, l)},
against latitude (x-axis) and longitude lag, φ (y-axis). The empirical estimate for the above
quantity is based on temporally averaged data, a 30-year average (see section 4.1 for details
on how we aggregate the data temporally). To get these empirical estimates, first we bin the
latitude with the bin size roughly 7°. Then within each bin, for each φ, we calculate the correlations between the two variables at the longitude lag of φ. To assess the uncertainty of the
empirical longitudinal irreversibility, we also split the total 30-year period into 30 intervals
of one year and instead of calculating irreversibility using a 30-year average, we calculate the
irreversibility based on 30 annual averages. The mean and the standard deviation of these irreversibility quantities based on these 30 data points (annual averages) are given in (B) and (C),
respectively. Note (A) and (B) give quite similar patterns although the range of irreversibility
in (B) is narrower. Although there seems to be some strong longitudinal irreversibility in the
cross-correlation near the poles and mid latitude of Southern Hemisphere at large longitudinal lags, the uncertainty associated with it (especially near the poles) is high. It might be hard
to fit the pattern with any smooth function of latitude and longitude lags due to the complex
nature of the empirical irreversibility surface. See section 4 for more discussion on this issue.

[Figure 1: six panels. (A), (B): standard deviation of Z1 against latitude and against longitude; (C), (D): standard deviation of Z2 against latitude and against longitude; (E), (F): cross correlation of Z1 & Z2 against latitude and against longitude.]
Fig. 1. Comparison of empirical covariances/correlations (dots) and corresponding fitted values (solid
lines) from the model NMG2. Fitted values are calculated using the covariance parameter estimates
from section 4.4. Temperature process is denoted by Z1 and precipitation process is denoted by Z2.
For details on how the empirical and fitted values are calculated, see Appendix.

[Figure 2: five panels (A)–(E), each showing values over latitude (x-axis) and longitude lag (y-axis).]
Fig. 2. (A): Empirical estimate of a longitudinal irreversibility, cor{Z1 (L, l), Z2 (L, l + φ)} − cor{Z1 (L, l + φ),
Z2 (L, l)}, based on a 30 year average; (B): similar quantity as (A) except that here longitudinal irreversibility is based on 30 annual averages and the mean of these 30 irreversibility quantities is displayed;
(C): standard deviation of the 30 irreversibility quantities; (D): Fitted longitudinal irreversibility using
OLS; and (E): Fitted longitudinal irreversibility using MLE. Here Z1 is the temperature process
and Z2 is the precipitation process. The units for both axes are degrees and the fitted values in (E)
are from the estimates given in Table 5. For details about how we get the values in (A)–(C), see
section 2.2. For details about how we get the OLS estimates, see section 4.4.
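The binning procedure of section 2.2 can be sketched in a few lines. This is only an illustration under stated assumptions, not the code behind Fig. 2: it assumes each field is stored as a list of rows on a regular, periodic longitude grid, pools all rows that fall in one latitude bin, and measures the longitude lag in grid steps. The function names `pearson`, `lagged_cor` and `irreversibility` are hypothetical, and the data are simulated noise.

```python
import math
import random


def pearson(x, y):
    """Sample correlation between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_cor(z1_rows, z2_rows, lag):
    """cor{Z1(L, l), Z2(L, l + lag)}, pooled over the rows of one latitude bin."""
    x, y = [], []
    for r1, r2 in zip(z1_rows, z2_rows):
        n = len(r1)
        for j in range(n):
            x.append(r1[j])
            y.append(r2[(j + lag) % n])  # longitudes wrap around the globe
    return pearson(x, y)


def irreversibility(z1_rows, z2_rows, lag):
    """cor{Z1(L, l), Z2(L, l + lag)} - cor{Z1(L, l + lag), Z2(L, l)} for one bin."""
    # On a periodic grid, cor{Z1(l + lag), Z2(l)} equals the correlation at lag -lag.
    return lagged_cor(z1_rows, z2_rows, lag) - lagged_cor(z1_rows, z2_rows, -lag)


# Toy data (made up): 3 latitude rows in one bin, 72 longitudes (5 degree spacing).
rng = random.Random(0)
z1 = [[rng.gauss(0.0, 1.0) for _ in range(72)] for _ in range(3)]
shift = 4
# Z2 is Z1 displaced by `shift` grid cells, so that Z2(l + shift) = Z1(l).
z2 = [[row[(j - shift) % 72] for j in range(72)] for row in z1]

same = irreversibility(z1, z1, shift)      # identical fields are reversible
shifted = irreversibility(z1, z2, shift)   # a displaced field is not
```

With identical fields the two lagged correlations coincide, so the estimate is zero up to rounding; with a displaced copy the correlation at the matching lag is one and the reversed one is close to zero.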
2.3. Asymmetry
We now define a general concept of asymmetry for multivariate spatial processes. We call the
cross-covariance of Z1 and Z2 symmetric if
cov{Z1 (L1 , l1 ), Z2 (L2 , l2 )} = cov{Z1 (L2 , l2 ), Z2 (L1 , l1 )}
for all L1 , L2 , l1 , l2 . If the cross-covariance structure is asymmetric, then the cross-covariance
matrix of some set of observations will generally be asymmetric. Note that the model
proposed in Gneiting et al. (2010) is always symmetric. The model from the linear model
of coregionalization is also symmetric unless the coefficients vary spatially. Apanasovich &
Genton (2010) present cross-covariance models that are asymmetric in the space-time domain,
which is somewhat different from the asymmetry discussed in this article.

[Figure 3: eight panels (A)–(H) of irreversibility against longitudinal lag (degree), 0 to 40; each row has one panel per combination b1 = ±1, b2 = ±0.1; curves are drawn for a2 = 0, 0.1, 1 and 10.]
Fig. 3. Longitudinal irreversibility in the correlation scale, cor{Z1 (L1 , l1 ), Z2 (L2 , l2 )} − cor{Z1 (L1 , l2 ),
Z2 (L2 , l1 )}, against l2 − l1 . We set L1 = 0°, L2 = 10°N, α = 1, β = 2000 (km) and a1 = 1. Solid line gives
the irreversibility value when a2 ≥ 0 and dotted line gives the value when a2 < 0 (same colour gives the
same magnitude of a2 ). Top row (Figs (A)–(D)) gives the value when ν = 5 and the bottom row (Figs
(E)–(H)) gives the value when ν = 1.5. Grey line gives the horizontal zero line.
3. Methodology
In this section, we develop joint covariance models that can exhibit the non-stationary properties discussed in section 2 for not only marginal but also cross-covariance structure. We also
discuss computational methods that enable us to compute full likelihoods efficiently when we
have large global data sets on a regular grid for multivariate processes.
3.1. Model
Jun & Stein (2007) proposed an approach to produce non-stationary covariance models for
a univariate process on a globe to capture space-time asymmetry, which is commonly found
in environmental data (Gneiting, 2002; Jun & Stein, 2004; Li et al., 2008). The key idea is
to apply differential operators with respect to latitude, longitude and time to an isotropic
spatio-temporal process and section 4 of Jun & Stein (2007) demonstrates the effectiveness
of the approach in capturing such space-time asymmetry. Jun & Stein (2008) further explore
the idea of applying differential operators with respect to latitude and longitude to an isotropic spatial process to represent various non-stationary properties of a univariate process on a
globe. They demonstrate that their model captures the small scale variation of a univariate
process well. The key to the model is its flexibility, which results from the products of first order differential operators with respect to latitude and longitude, applied to
an underlying process. We extend this idea to multivariate spatial processes on a globe.
Now we show how the idea of applying differential operators to the processes can be
applied to multivariate isotropic spatial processes to create non-stationary cross-covariance
structure, in particular, asymmetry and longitudinal irreversibility, which depends on latitude in a flexible way. Suppose we have a multivariate spatial process (Z1 (L, l), . . ., ZN (L, l)),
defined on a globe, S 2 and we are interested in modelling the joint distribution of Zi s. We
will focus on the case that N = 2 here; for the case that N > 2, the method extends in a natural
way. Let us assume (Z1 , Z2 ) is a bivariate process with mean zero. We also assume that Zi s
are axially symmetric both marginally and jointly. Let us write Y = G(α, β, ν) if the process
Y defined on S² has mean zero and its covariance is given by a Matérn covariance function:
$$\mathrm{cov}\{Y(L_1, l_1), Y(L_2, l_2)\} = K(L_1, L_2, l_1 - l_2) = \alpha \left(\frac{d}{\beta}\right)^{\nu} \mathcal{K}_{\nu}\!\left(\frac{d}{\beta}\right). \tag{1}$$
Here, K denotes the covariance function of Y , Li s are latitude values and li s are longitude
values (i = 1, 2). The parameters, α, β, ν > 0, are the sill, spatial range, and the smoothness
parameters for a Matérn class, respectively, $\mathcal{K}_{\nu}$ is the modified Bessel function, and
$$d = d(L_1, L_2, l_1 - l_2) = 2R\left\{\sin^2\left(\frac{L_1 - L_2}{2}\right) + \cos L_1 \cos L_2 \sin^2\left(\frac{l_1 - l_2}{2}\right)\right\}^{1/2} \tag{2}$$
denotes the chordal distance between the two locations, (L1 , l1 ) and (L2 , l2 ). To ensure the
positive definiteness of (1) on S², we need to use chordal distance as a spatial metric instead
of a geodesic distance (see Jun & Stein (2007) for a detailed discussion).
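As an illustration of (1) and (2), the following sketch evaluates the chordal distance and the Matérn covariance with the Python standard library only. It is a sketch under stated assumptions: the radius value 6371 km and all function names are choices made here, and the Bessel function is approximated by a simple trapezoid rule on its integral representation (a library routine such as `scipy.special.kv` would normally be preferred). Note that in the unnormalised form (1) the value at distance zero is α 2^(ν−1) Γ(ν).

```python
import math

R_EARTH_KM = 6371.0  # assumed value for the radius R


def chordal_distance(lat1, lat2, dlon, radius=R_EARTH_KM):
    """Chordal distance (2); latitudes and dlon = l1 - l2 in radians."""
    s = (math.sin((lat1 - lat2) / 2.0) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2.0) ** 2)
    return 2.0 * radius * math.sqrt(max(s, 0.0))


def bessel_k(nu, x, steps=2000):
    """Modified Bessel function of the second kind for x > 0, from the integral
    K_nu(x) = int_0^inf exp(-x cosh t) cosh(nu t) dt, by the trapezoid rule."""
    if x <= 0.0:
        raise ValueError("x must be positive")
    t_max = 1.0
    # Extend the range until the integrand is negligible (below about e^-50).
    while x * math.cosh(t_max) - abs(nu) * t_max < 50.0:
        t_max += 0.5
    step = t_max / steps
    total = 0.5 * (math.exp(-x)
                   + math.exp(-x * math.cosh(t_max)) * math.cosh(nu * t_max))
    for i in range(1, steps):
        t = i * step
        total += math.exp(-x * math.cosh(t)) * math.cosh(nu * t)
    return total * step


def matern_cov(lat1, lat2, dlon, alpha, beta, nu):
    """Covariance (1): alpha * (d / beta)^nu * K_nu(d / beta)."""
    x = chordal_distance(lat1, lat2, dlon) / beta
    if x == 0.0:
        # Limit of x^nu K_nu(x) as x -> 0 is 2^(nu - 1) * Gamma(nu).
        return alpha * 2.0 ** (nu - 1.0) * math.gamma(nu)
    return alpha * x ** nu * bessel_k(nu, x)


# Example: equator versus 10 degrees N, 20 degrees apart in longitude,
# with alpha = 1, beta = 2000 km and nu = 1.5.
value = matern_cov(0.0, math.radians(10.0), math.radians(20.0), 1.0, 2000.0, 1.5)
```

For ν = 1.5 the result can be checked against the closed form √(π/2) (1 + x) e^(−x) with x = d/β.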
Let us first consider a simple setting:
$$Z_i(L, l) = a_i\{Y(L + \varepsilon, l) - Y(L, l)\} + b_i\{Y(L, l + \varepsilon) - Y(L, l)\}, \quad i = 1, 2, \tag{3}$$
with ai , bi constants and ε > 0. When ε ≈ 0, (3) is essentially equivalent to the model with
differential operators with respect to latitude and longitude applied to the process Y in the L²
sense (instead of taking differences).
It is easy to see from (3) that when ε → 0, the cross-covariance of Z1 and Z2 can be written
as
$$\begin{aligned}
\mathrm{cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} ={}& a_1 a_2 \frac{\partial}{\partial L_1}\frac{\partial}{\partial L_2} K(L_1, L_2, l) - b_1 b_2 \frac{\partial^2}{\partial l^2} K(L_1, L_2, l) \\
& - a_1 b_2 \frac{\partial}{\partial L_1}\frac{\partial}{\partial l} K(L_1, L_2, l) + b_1 a_2 \frac{\partial}{\partial L_2}\frac{\partial}{\partial l} K(L_1, L_2, l),
\end{aligned} \tag{4}$$
with l = l1 − l2 . Note that the second order partial derivatives of K in (4) originate from the
limits of the second order differences of the covariance K. For example,
$$\begin{aligned}
\frac{\partial}{\partial L_1}\frac{\partial}{\partial L_2} K(L_1, L_2, l) = \lim_{\varepsilon \to 0} \frac{1}{\varepsilon^2} \{& K(L_1 + \varepsilon, L_2 + \varepsilon, l) - K(L_1 + \varepsilon, L_2, l) \\
& - K(L_1, L_2 + \varepsilon, l) + K(L_1, L_2, l)\}.
\end{aligned}$$
Furthermore, to have the limit properly defined, we need to have ν > 1. For more details
on this condition, see section 2 of Stein (1999) and the result in Jun & Stein (2007).
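The limit above can be checked numerically. The sketch below is an illustration under assumptions made here, not part of the article: it takes α = 1, β = 2000 km, an assumed radius R = 6371 km, and ν = 5/2, chosen only because the Matérn terms then have closed forms, namely x^(5/2) K_(5/2)(x) = √(π/2)(x² + 3x + 3)e^(−x), x^(3/2) K_(3/2)(x) = √(π/2)(1 + x)e^(−x) and x^(1/2) K_(1/2)(x) = √(π/2)e^(−x). The exact mixed derivative is obtained with the chain rule applied to h = (d/β)².

```python
import math

R, BETA = 6371.0, 2000.0  # km; R is an assumed Earth radius
C = 4.0 * R ** 2 / BETA ** 2
C0 = math.sqrt(math.pi / 2.0)


def h(L1, L2, l):
    """h = (d / beta)^2 with d the chordal distance (2); angles in radians."""
    return C * (math.sin((L1 - L2) / 2.0) ** 2
                + math.cos(L1) * math.cos(L2) * math.sin(l / 2.0) ** 2)


def K(L1, L2, l):
    """Matern covariance (1) with alpha = 1 and nu = 5/2 in closed form."""
    x = math.sqrt(h(L1, L2, l))
    return C0 * (x * x + 3.0 * x + 3.0) * math.exp(-x)


def second_difference(L1, L2, l, eps):
    """Second order difference of K in (L1, L2), scaled by 1 / eps^2."""
    return (K(L1 + eps, L2 + eps, l) - K(L1 + eps, L2, l)
            - K(L1, L2 + eps, l) + K(L1, L2, l)) / eps ** 2


def mixed_partial(L1, L2, l):
    """Exact d^2 K / dL1 dL2 = M_{1/2} h1 h2 / 4 - M_{3/2} h12 / 2."""
    s2 = math.sin(l / 2.0) ** 2
    x = math.sqrt(h(L1, L2, l))
    h1 = C * (0.5 * math.sin(L1 - L2) - math.sin(L1) * math.cos(L2) * s2)
    h2 = C * (-0.5 * math.sin(L1 - L2) - math.cos(L1) * math.sin(L2) * s2)
    h12 = C * (-0.5 * math.cos(L1 - L2) + math.sin(L1) * math.sin(L2) * s2)
    m_half = C0 * math.exp(-x)
    m_three_half = C0 * (1.0 + x) * math.exp(-x)
    return 0.25 * m_half * h1 * h2 - 0.5 * m_three_half * h12


L1, L2, l = math.radians(10.0), math.radians(30.0), math.radians(20.0)
approx = second_difference(L1, L2, l, 1e-5)
exact = mixed_partial(L1, L2, l)
```

The two values agree closely for small ε; very small ε should be avoided in floating point because the division by ε² amplifies rounding error.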
Now let us consider the longitudinal irreversibility and the asymmetry discussed in sections
2.2 and 2.3. It is straightforward from (4) that when ε → 0, the longitudinal irreversibility is
given by
$$\begin{aligned}
& \mathrm{cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} - \mathrm{cov}\{Z_1(L_1, l_2), Z_2(L_2, l_1)\} \\
& \quad = -2 a_1 b_2 \frac{\partial}{\partial L_1}\frac{\partial}{\partial l} K(L_1, L_2, l) + 2 b_1 a_2 \frac{\partial}{\partial L_2}\frac{\partial}{\partial l} K(L_1, L_2, l),
\end{aligned} \tag{5}$$
and the asymmetry is given by
$$\begin{aligned}
& \mathrm{cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} - \mathrm{cov}\{Z_1(L_2, l_2), Z_2(L_1, l_1)\} \\
& \quad = (-a_1 b_2 + b_1 a_2) \left\{ \frac{\partial}{\partial L_1}\frac{\partial}{\partial l} K(L_1, L_2, l) + \frac{\partial}{\partial L_2}\frac{\partial}{\partial l} K(L_1, L_2, l) \right\}.
\end{aligned} \tag{6}$$
From (5) and (6), it is clear that the types of non-stationarity in the cross-covariance discussed in sections 2.2 and 2.3 are achieved mainly from the interactions between the first (latitudinal) and
second (longitudinal) differences of Y in (3). If a1 = a2 and b1 = b2 , the asymmetry in (6) reduces to
zero, although the longitudinal irreversibility in (5) may not be zero.
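Equations (4) to (6) can be checked numerically with a short sketch. The assumptions are made here for illustration only: α = 1, β = 2000 km, an assumed radius R = 6371 km, and ν = 5/2 so that the terms x^(ν−2) K_(ν−2)(x) and x^(ν−1) K_(ν−1)(x) have the closed forms √(π/2)e^(−x) and √(π/2)(1 + x)e^(−x). The second derivatives of K follow from the chain rule applied to h = (d/β)². All function names are hypothetical.

```python
import math

R, BETA = 6371.0, 2000.0   # km; R is an assumed Earth radius
C = 4.0 * R ** 2 / BETA ** 2
C0 = math.sqrt(math.pi / 2.0)


def second_partials(L1, L2, l):
    """Second derivatives of K (alpha = 1, nu = 5/2) with respect to L1, L2 and l.

    Uses d^2 K / dx_p dx_q = M_{1/2} h_p h_q / 4 - M_{3/2} h_pq / 2, h = (d / beta)^2.
    """
    s2 = math.sin(l / 2.0) ** 2
    h = C * (math.sin((L1 - L2) / 2.0) ** 2 + math.cos(L1) * math.cos(L2) * s2)
    x = math.sqrt(h)
    m_lo, m_hi = C0 * math.exp(-x), C0 * (1.0 + x) * math.exp(-x)
    h1 = C * (0.5 * math.sin(L1 - L2) - math.sin(L1) * math.cos(L2) * s2)
    h2 = C * (-0.5 * math.sin(L1 - L2) - math.cos(L1) * math.sin(L2) * s2)
    h3 = 0.5 * C * math.cos(L1) * math.cos(L2) * math.sin(l)
    h12 = C * (-0.5 * math.cos(L1 - L2) + math.sin(L1) * math.sin(L2) * s2)
    h33 = 0.5 * C * math.cos(L1) * math.cos(L2) * math.cos(l)
    h13 = -0.5 * C * math.sin(L1) * math.cos(L2) * math.sin(l)
    h23 = -0.5 * C * math.cos(L1) * math.sin(L2) * math.sin(l)

    def d2(hp, hq, hpq):
        return 0.25 * m_lo * hp * hq - 0.5 * m_hi * hpq

    return {"L1L2": d2(h1, h2, h12), "ll": d2(h3, h3, h33),
            "L1l": d2(h1, h3, h13), "L2l": d2(h2, h3, h23)}


def cross_cov(L1, L2, l, a1, b1, a2, b2):
    """Equation (4) with l = l1 - l2."""
    k = second_partials(L1, L2, l)
    return (a1 * a2 * k["L1L2"] - b1 * b2 * k["ll"]
            - a1 * b2 * k["L1l"] + b1 * a2 * k["L2l"])


def irreversibility(L1, L2, l, *coef):
    """Left-hand side of (5): swapping the two longitudes maps l to -l."""
    return cross_cov(L1, L2, l, *coef) - cross_cov(L1, L2, -l, *coef)


def asymmetry(L1, L2, l, *coef):
    """Left-hand side of (6): swapping the two locations."""
    return cross_cov(L1, L2, l, *coef) - cross_cov(L2, L1, -l, *coef)


L1, L2, l = 0.0, math.radians(10.0), math.radians(-20.0)  # l = l1 - l2
a1, b1, a2, b2 = 1.0, 1.0, 1.0, 0.1
k = second_partials(L1, L2, l)
irr_direct = irreversibility(L1, L2, l, a1, b1, a2, b2)
irr_formula = -2 * a1 * b2 * k["L1l"] + 2 * b1 * a2 * k["L2l"]
asym_direct = asymmetry(L1, L2, l, a1, b1, a2, b2)
asym_formula = (-a1 * b2 + b1 * a2) * (k["L1l"] + k["L2l"])
asym_equal_coef = asymmetry(L1, L2, l, 1.0, 0.5, 1.0, 0.5)  # a1 = a2, b1 = b2
```

The direct differences reproduce the right-hand sides of (5) and (6), and the asymmetry vanishes when a1 = a2 and b1 = b2.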
We show some plots of the cross-covariance structure for various values of ai s and bi s to
further demonstrate the behaviour of the proposed covariance model in a simple setting. We
work with correlation scale instead of the covariance scale. We fix α = 1, a1 = 1 and β = 2000
(km). We vary a2 , b1 , b2 and ν to explore the non-stationarities of the covariance model of
(3) when ε ≈ 0.
Figure 3 gives the longitudinal irreversibility in the cross-correlation structure given in (5).
The irreversibility in the covariance scale should have the same shape except for the scale. Notice
that different ν values give different shapes of the irreversibility. From (5), the irreversibility
depends on ai s and bi s through a1 b2 and b1 a2 . If the signs of a1 b2 and b1 a2 change together,
the sign of irreversibility should also change. Therefore, we see that several sets of the ai s
and bi s give either the same irreversibility or the same magnitude of irreversibility with different signs. For example, when a2 = 0, the pairs, (A) and (B), (C) and (D), (E) and (F), and
(G) and (H), give the same irreversibility. Moreover, when a2 = 0, the irreversibilities in (A)
and (B) and those in (C) and (D) have the same magnitudes but different signs. When a2 ≠ 0,
the pairs, (A) and (D), (B) and (C), (E) and (H), and (F) and (G), give the same magnitude
of the irreversibility with different signs. It may then appear that there are some identifiability problems in ai s and bi s since some sets of these coefficients give the same irreversibility curves. However, this is not the case for the covariance structure of the bivariate process.
As long as we fix the sign of only one of the four coefficients (ai s and bi s), we can avoid the
identifiability problem (see (4)).
Figure 4 gives the asymmetry against longitudinal lags in the cross-correlation structure
given in (6). Unlike the longitudinal irreversibility, the asymmetry in the covariance scale
may have different shape than the asymmetry in the correlation scale. This is because the
asymmetry in the correlation scale is
$$\begin{aligned}
& \mathrm{cor}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} - \mathrm{cor}\{Z_1(L_2, l_2), Z_2(L_1, l_1)\} \\
& \quad = \frac{\mathrm{cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\}}{\sqrt{\mathrm{var}\{Z_1(L_1, l_1)\}\,\mathrm{var}\{Z_2(L_2, l_2)\}}} - \frac{\mathrm{cov}\{Z_1(L_2, l_2), Z_2(L_1, l_1)\}}{\sqrt{\mathrm{var}\{Z_1(L_2, l_2)\}\,\mathrm{var}\{Z_2(L_1, l_1)\}}},
\end{aligned} \tag{7}$$
and the terms in each denominator are in general not the same except the case such as
b2 = a2 b1 (a2 ≠ 0), for any L1 , L2 , l1 , l2 . Therefore, the asymmetries in the correlation scale in
general do not simply depend on the coefficients through a1 b2 − b1 a2 but in a more complex
manner. For example, when a2 = 0, even if (A) and (B) have the same b2 value, their asymmetries are different. When a2 = ±0.1 and b2 = a2 b1 , the asymmetry is zero. When a2 = ±0.1
and b2 = −a2 b1 , then we may get the same magnitude of the asymmetry but with different
signs. For example, the pairs, (B) and (C) with a2 = 0.1 and (A) and (D) with a2 = −0.1,
give the same magnitude of the asymmetry with different signs. It is interesting to note
that even if l1 − l2 = 0, the asymmetry for some combinations of the coefficients is not zero
(because L1 ≠ L2 ).
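The statements about the correlation-scale asymmetry (7) can be illustrated numerically. The following sketch rests on assumptions made here and not in the article: α = 1, β = 2000 km, an assumed radius R = 6371 km, and ν = 5/2 so that closed forms of the Matérn terms are available. The variances in (7) are obtained from (4) by taking both processes equal at zero lag. Function names are hypothetical.

```python
import math

R, BETA = 6371.0, 2000.0   # km; R is an assumed Earth radius
C = 4.0 * R ** 2 / BETA ** 2
C0 = math.sqrt(math.pi / 2.0)


def cross_cov(L1, L2, l, a1, b1, a2, b2):
    """Equation (4) for alpha = 1, nu = 5/2, with l = l1 - l2 (radians)."""
    s2 = math.sin(l / 2.0) ** 2
    x = math.sqrt(C * (math.sin((L1 - L2) / 2.0) ** 2
                       + math.cos(L1) * math.cos(L2) * s2))
    m_lo, m_hi = C0 * math.exp(-x), C0 * (1.0 + x) * math.exp(-x)
    h1 = C * (0.5 * math.sin(L1 - L2) - math.sin(L1) * math.cos(L2) * s2)
    h2 = C * (-0.5 * math.sin(L1 - L2) - math.cos(L1) * math.sin(L2) * s2)
    h3 = 0.5 * C * math.cos(L1) * math.cos(L2) * math.sin(l)
    h12 = C * (-0.5 * math.cos(L1 - L2) + math.sin(L1) * math.sin(L2) * s2)
    h33 = 0.5 * C * math.cos(L1) * math.cos(L2) * math.cos(l)
    h13 = -0.5 * C * math.sin(L1) * math.cos(L2) * math.sin(l)
    h23 = -0.5 * C * math.cos(L1) * math.sin(L2) * math.sin(l)

    def d2(hp, hq, hpq):  # chain rule for K = M_{5/2}(sqrt(h))
        return 0.25 * m_lo * hp * hq - 0.5 * m_hi * hpq

    return (a1 * a2 * d2(h1, h2, h12) - b1 * b2 * d2(h3, h3, h33)
            - a1 * b2 * d2(h1, h3, h13) + b1 * a2 * d2(h2, h3, h23))


def variance(L, a, b):
    """var{Z_i(L, l)}: (4) with both processes equal and zero lag."""
    return cross_cov(L, L, 0.0, a, b, a, b)


def cor(L1, L2, l, a1, b1, a2, b2):
    """cor{Z1(L1, l1), Z2(L2, l2)} with l = l1 - l2."""
    return (cross_cov(L1, L2, l, a1, b1, a2, b2)
            / math.sqrt(variance(L1, a1, b1) * variance(L2, a2, b2)))


def cor_asymmetry(L1, L2, l, a1, b1, a2, b2):
    """Equation (7)."""
    return cor(L1, L2, l, a1, b1, a2, b2) - cor(L2, L1, -l, a1, b1, a2, b2)


# Latitudes as in Fig. 4 (L1 = 0, L2 = 50 degrees); lag l2 - l1 = 20 degrees.
L1, L2, l = 0.0, math.radians(50.0), math.radians(-20.0)
zero_case = cor_asymmetry(L1, L2, l, 1.0, 1.0, 0.1, 0.1)    # b2 = a2 * b1
panel_a = cor_asymmetry(L1, L2, l, 1.0, 1.0, 0.0, 0.1)      # a2 = 0, b1 = 1
panel_b = cor_asymmetry(L1, L2, l, 1.0, -1.0, 0.0, 0.1)     # a2 = 0, b1 = -1
pair_b = cor_asymmetry(L1, L2, l, 1.0, -1.0, 0.1, 0.1)      # b2 = -a2 * b1
pair_c = cor_asymmetry(L1, L2, l, 1.0, 1.0, 0.1, -0.1)      # b2 = -a2 * b1
```

Here `zero_case` vanishes, `panel_a` and `panel_b` differ although they share the same b2, and `pair_b` and `pair_c` have equal magnitude and opposite signs.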
Figure 5 (A)–(D) display the asymmetry against latitude. Note the asymmetries are zero
when L2 = 0 and the pairs, (A) and (D) or (B) and (C), give symmetric asymmetry values
around L2 = 0. The fact that the asymmetries are zero when L2 = 0 can also be easily
explained by (11) (we will discuss this further when we introduce (11) later). Note that the
asymmetries are not necessarily symmetric about the equator, which is realistic for most
real data sets.

[Figure 4: eight panels (A)–(H) of asymmetry against longitudinal lag (degree), 0 to 40; each row has one panel per combination b1 = ±1, b2 = ±0.1; curves are drawn for a2 = 0, 0.1, 1 and 10.]
Fig. 4. Asymmetry in the correlation scale, cor{Z1 (L1 , l1 ), Z2 (L2 , l2 )} − cor{Z1 (L2 , l2 ), Z2 (L1 , l1 )},
against l2 − l1 . We set L1 = 0°, L2 = 50°, α = 1, β = 2000 (km) and a1 = 1. Solid line gives the asymmetry
value when a2 ≥ 0 and dotted line gives the value when a2 < 0 (same colour gives the same magnitude
of a2 ). Top row (Figs (A)–(D)) gives the value when ν = 5 and the bottom row (Figs (E)–(H)) gives the
value when ν = 1.5. Grey line gives the horizontal zero line.
We now generalize the model in (3) in the sense that the coefficients are functions of latitude
values. That is, we write:
$$Z_i(L, l) = \sum_{k=1}^{n} \left\{ A_{i,k}(L) \frac{\partial}{\partial L} + B_{i,k}(L) \frac{\partial}{\partial l} \right\} Y_k(L, l) + C_i(L) Y_0(L, l). \tag{8}$$
Here, the partial derivatives are defined in the L² sense and $Y_k = G(\alpha_k, \beta_k, \nu_k)$ ($\alpha_k, \beta_k > 0$,
k = 0, . . ., n, $\nu_0 > 0$ and $\nu_k > 1$, k = 1, . . ., n). If $\nu_k \le 1$ for k = 1, . . ., n, then the mean square
derivatives of Yk are not properly defined. We assume the Yk ’s (k = 0, . . ., n) are independent
of each other. A useful simplification is to assume Yk share the same covariance parameters
for k = 1, . . ., n, but we generally let Y0 have different covariance parameters than Y1 , . . ., Yn
to allow sufficient flexibility in the local behaviour of the model. Note it is not necessary to
include Y0 in (8). We may then let Ci = 0 for parsimony.
The functions Ai, k , Bi, k and Ci in (8) are non-random functions and we model these functions as linear combinations of Legendre polynomials. For instance, we let
$$A_{i,k}(L) = \sum_{j=0}^{m} a_{ikj} P_j(\sin L), \tag{9}$$
where Pj denotes the Legendre polynomial of order j. Then aikj ∈ R are additional covariance
parameters to be estimated along with other covariance parameters. The maximum order of
Legendre polynomials used here, m, is chosen arbitrarily and we expect that a modest value of
m should be able to produce flexible covariance functions. The values of m for Ai, k , Bi, k and
Ci may be different. Larger m will obviously give more flexibility to the covariance structure and we may let m = 0 for a parsimonious model. We compare the different possibilities
discussed here in section 4.
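A coefficient function of the form (9) is simple to evaluate. The sketch below is illustrative only; the function names are hypothetical, and the Legendre polynomials are computed with Bonnet's recursion (j + 1) P_(j+1)(x) = (2j + 1) x P_j(x) − j P_(j−1)(x). As an example it uses the latitude-dependent coefficient b2(L) = b20 (5 + 5 P2(sin L)) from the caption of Fig. 5, which corresponds to m = 2 with coefficients (5 b20, 0, 5 b20).

```python
import math


def legendre(j, x):
    """Legendre polynomial P_j(x) via Bonnet's recursion."""
    p_prev, p = 1.0, x
    if j == 0:
        return p_prev
    for n in range(1, j):
        p_prev, p = p, ((2 * n + 1) * x * p - n * p_prev) / (n + 1)
    return p


def coefficient_function(coefs, lat):
    """Equation (9): sum over j of coefs[j] * P_j(sin(lat)), lat in radians."""
    x = math.sin(lat)
    return sum(a * legendre(j, x) for j, a in enumerate(coefs))


# b2(L) = b20 * (5 + 5 * P2(sin L)) written in the form (9).
b20 = 0.1
b2_coefs = [5 * b20, 0.0, 5 * b20]
b2_equator = coefficient_function(b2_coefs, 0.0)        # P2(0) = -1/2
b2_pole = coefficient_function(b2_coefs, math.pi / 2)   # P2(1) = 1
```

With b20 = 0.1 the coefficient ranges from 0.25 at the equator to 1 at the poles, which shows how even a low order m lets the coefficients vary substantially with latitude.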
[Figure 5: eight panels (A)–(H) of asymmetry against L2 (degree), −50 to 50; panels (A)–(D) correspond to the combinations b1 = ±1, b2 = ±0.1 and panels (E)–(H) to b1 = ±1, b20 = ±0.1; curves are drawn for a2 = 0, 0.1, 1 and 10.]
Fig. 5. (A)–(D): Asymmetry in the correlation scale, cor{Z1 (L1 , l1 ), Z2 (L2 , l2 )} − cor{Z1 (L2 , l2 ), Z2 (L1 , l1 )},
against L2 , when L1 = 0°, l1 = 0° and l2 = 30°. We set α = 1, β = 2000 (km), ν = 5 and a1 = 1. Solid
line gives the asymmetry value when a2 ≥ 0 and dotted line gives the value when a2 < 0 (same
colour gives the same magnitude of a2 ). (E)–(H): Same as (A)–(D) except that we now set
b2 = b2 (L) = b20 (5 + 5P2 (sin L)) for latitude L. Here P2 denotes the Legendre polynomial of order 2.
Grey line gives the horizontal zero line.

Although the Yk s are independent of each other, the Zi s have non-zero cross-covariance
and we can get explicit expressions for the cross-covariance of the Zi s. For instance, suppose
n = 1, Y1 = G(1, β, ν) (ν > 1) and Ci = 0 (i = 1, 2). Set h = h(L1 , L2 , l1 − l2 ) = (d/β)² for d defined
in (2), hp = hp (L1 , L2 , l1 − l2 ) = ∂h/∂xp and hpq = hpq (L1 , L2 , l1 − l2 ) = ∂²h/∂xp ∂xq , where x1 = L1 ,
x2 = L2 and x3 = l1 − l2 (see appendix A of Jun & Stein (2007) for explicit expressions for hp
and hpq ). Also let $M_{\nu}(x) = x^{\nu} \mathcal{K}_{\nu}(x)$. Then the cross-covariance function of Z1 and Z2 is given
by,
$$\mathrm{cov}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\} = \gamma_1 M_{\nu-2}(\sqrt{h}) + \gamma_2 M_{\nu-1}(\sqrt{h}), \tag{10}$$
where γ1 and γ2 are
$$\gamma_1 = \frac{1}{4} \{ A_{1,1}(L_1) A_{2,1}(L_2) h_1 h_2 - B_{1,1}(L_1) B_{2,1}(L_2) h_3^2 - A_{1,1}(L_1) B_{2,1}(L_2) h_1 h_3 + B_{1,1}(L_1) A_{2,1}(L_2) h_2 h_3 \},$$
and
$$\gamma_2 = -\frac{1}{2} \{ A_{1,1}(L_1) A_{2,1}(L_2) h_{12} - B_{1,1}(L_1) B_{2,1}(L_2) h_{33} - A_{1,1}(L_1) B_{2,1}(L_2) h_{13} + B_{1,1}(L_1) A_{2,1}(L_2) h_{23} \}.$$
The cross product terms of A1, k or B1, k and A2, k or B2, k in γ1 and γ2 come from the
covariance of processes with ∂/∂L and ∂/∂l applied in (8). Through the linear combination terms for Ai, k and Bi, k as in (9), the resulting cross-covariance model in (10) achieves
great flexibility and can capture complex non-stationary structure in the data. In particular,
it can be easily shown that for any latitude L, longitude l and longitudinal lag φ,
$$\begin{aligned}
& \mathrm{cov}\{Z_1(L, l), Z_2(L, l + \varphi)\} - \mathrm{cov}\{Z_1(L, l + \varphi), Z_2(L, l)\} \\
& \quad = \{ B_{1,1}(L) A_{2,1}(L) - A_{1,1}(L) B_{2,1}(L) \} \cdot 4 \beta^{-2} R^2 \sin L \cos L \sin\frac{\varphi}{2} \cos\frac{\varphi}{2} \\
& \qquad \times \left[ \frac{1}{2} M_{\nu-2}(\sqrt{h}) \, 4 \beta^{-2} R^2 \cos^2 L \sin^2\frac{\varphi}{2} - M_{\nu-1}(\sqrt{h}) \right],
\end{aligned} \tag{11}$$
where $h = 4 \beta^{-2} R^2 \cos^2 L \sin^2(\varphi/2)$. Therefore, by letting the functions Ai, k and Bi, k depend on the
latitude, L, the proposed covariance model can produce flexible longitudinal irreversibility.
Equation (11) can also be used to prove the fact that the asymmetries are zero when L2 = 0 in
Fig. 5 (note that when L1 = L2 , the asymmetry reduces to the longitudinal irreversibility).
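The closed form (11) can be compared with a direct evaluation of (10). The sketch below makes several assumptions for illustration: β = 2000 km, an assumed radius R = 6371 km, ν = 5/2 (so that M_(ν−2) and M_(ν−1) have the closed forms √(π/2)e^(−x) and √(π/2)(1 + x)e^(−x)), explicit expressions for hp and hpq obtained by differentiating h = (d/β)² directly, and example coefficient functions that include the latitude-dependent b2(L) of Fig. 5. Function names are hypothetical.

```python
import math

R, BETA = 6371.0, 2000.0          # km; R is an assumed Earth radius
C = 4.0 * R ** 2 / BETA ** 2      # the factor 4 beta^-2 R^2
C0 = math.sqrt(math.pi / 2.0)


def m_lo(x):
    """M_{nu-2} = M_{1/2} for nu = 5/2."""
    return C0 * math.exp(-x)


def m_hi(x):
    """M_{nu-1} = M_{3/2} for nu = 5/2."""
    return C0 * (1.0 + x) * math.exp(-x)


def cross_cov(L1, L2, l, A1, B1, A2, B2):
    """Equation (10) for n = 1 and C_i = 0; l = l1 - l2, coefficients are functions."""
    s2 = math.sin(l / 2.0) ** 2
    x = math.sqrt(C * (math.sin((L1 - L2) / 2.0) ** 2
                       + math.cos(L1) * math.cos(L2) * s2))
    h1 = C * (0.5 * math.sin(L1 - L2) - math.sin(L1) * math.cos(L2) * s2)
    h2 = C * (-0.5 * math.sin(L1 - L2) - math.cos(L1) * math.sin(L2) * s2)
    h3 = 0.5 * C * math.cos(L1) * math.cos(L2) * math.sin(l)
    h12 = C * (-0.5 * math.cos(L1 - L2) + math.sin(L1) * math.sin(L2) * s2)
    h33 = 0.5 * C * math.cos(L1) * math.cos(L2) * math.cos(l)
    h13 = -0.5 * C * math.sin(L1) * math.cos(L2) * math.sin(l)
    h23 = -0.5 * C * math.cos(L1) * math.sin(L2) * math.sin(l)
    a1, b1, a2, b2 = A1(L1), B1(L1), A2(L2), B2(L2)
    g1 = 0.25 * (a1 * a2 * h1 * h2 - b1 * b2 * h3 * h3
                 - a1 * b2 * h1 * h3 + b1 * a2 * h2 * h3)
    g2 = -0.5 * (a1 * a2 * h12 - b1 * b2 * h33 - a1 * b2 * h13 + b1 * a2 * h23)
    return g1 * m_lo(x) + g2 * m_hi(x)


def irreversibility_closed(L, phi, A1, B1, A2, B2):
    """Right-hand side of (11)."""
    s, c = math.sin(phi / 2.0), math.cos(phi / 2.0)
    h = C * math.cos(L) ** 2 * s ** 2
    x = math.sqrt(h)
    lead = (B1(L) * A2(L) - A1(L) * B2(L)) * C * math.sin(L) * math.cos(L) * s * c
    return lead * (0.5 * m_lo(x) * h - m_hi(x))


def irreversibility_direct(L, phi, A1, B1, A2, B2):
    """cov{Z1(L, l), Z2(L, l + phi)} - cov{Z1(L, l + phi), Z2(L, l)} from (10)."""
    return (cross_cov(L, L, -phi, A1, B1, A2, B2)
            - cross_cov(L, L, phi, A1, B1, A2, B2))


# Example coefficient functions (illustrative values only).
A1 = lambda L: 1.0
A2 = lambda L: 0.5
B1 = lambda L: 1.0
B2 = lambda L: 0.1 * (5.0 + 5.0 * (3.0 * math.sin(L) ** 2 - 1.0) / 2.0)

L, phi = math.radians(40.0), math.radians(20.0)
direct = irreversibility_direct(L, phi, A1, B1, A2, B2)
closed = irreversibility_closed(L, phi, A1, B1, A2, B2)
at_equator = irreversibility_direct(0.0, phi, A1, B1, A2, B2)
```

The two evaluations agree, and the irreversibility at the equator is zero because of the factor sin L in (11).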
In fact, under the current covariance model, longitudinal irreversibility at the equator is always zero due to the term $\sin L$ in (11). Figure 5 (E)–(H) display the asymmetry when $A_{1,1}(L) = 1$, $A_{2,1}(L) = a_2$, $B_{1,1}(L) = b_1$ and $B_{2,1}(L) = b_2(L) = b_{20}(5 + 5P_2(\sin L))$. These plots demonstrate that by allowing the coefficients $A_{i,k}$ and $B_{i,k}$ to depend on the latitude, we get more flexibility in the resulting covariance structure. It is shown in Jun & Stein (2008) that the marginal correlation from the model in (8) (with $C_i = 0$) can be as small as −1. For cross-correlations, we also achieve the range of −1 to 1. For an extreme example, in (8), suppose $n = 1$, $A_{1,1} = -A_{2,1} = 1$, $B_{1,1} = B_{2,1} = 0$ and $C_1 = C_2 = 0$. Then it is easy to see that the cross-correlation between $Z_1$ and $Z_2$ is −1 everywhere.
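As a numerical illustration of (11), the following sketch evaluates its right-hand side for constant coefficients. It is only a sketch under stated assumptions: the smoothness is fixed at 5/2 so that the modified Bessel functions have the closed forms $M_{1/2}(x) = \sqrt{\pi/2}\,e^{-x}$ and $M_{3/2}(x) = \sqrt{\pi/2}\,(1 + x)e^{-x}$, and the function name, the range value `beta` and the Earth radius `R` are illustrative choices rather than quantities taken from the fitted models.

```python
import math

def irreversibility(lat_deg, lag_deg, a1, b1, a2, b2, beta=1000.0, R=6371.0):
    """Right-hand side of (11) for constant coefficients and smoothness 5/2.

    a1, b1, a2, b2 stand for A_{1,1}, B_{1,1}, A_{2,1}, B_{2,1}.
    beta (range, km) and R (radius, km) are illustrative values only.
    """
    L = math.radians(lat_deg)
    half_lag = math.radians(lag_deg) / 2.0
    scale = 4.0 * R ** 2 / beta ** 2
    h = scale * math.cos(L) ** 2 * math.sin(half_lag) ** 2
    x = math.sqrt(h)
    c = math.sqrt(math.pi / 2.0)
    m_half = c * math.exp(-x)                    # M_{1/2}(x)
    m_three_half = c * (1.0 + x) * math.exp(-x)  # M_{3/2}(x)
    front = ((b1 * a2 - a1 * b2) * scale * math.sin(L) * math.cos(L)
             * math.sin(half_lag) * math.cos(half_lag))
    return front * (0.5 * m_half * h - m_three_half)

# The factor sin(L) removes the irreversibility at the equator,
# and changes its sign between the two hemispheres.
at_equator = irreversibility(0.0, 20.0, 1.0, 0.5, 2.0, 0.3)
north = irreversibility(30.0, 20.0, 1.0, 0.5, 2.0, 0.3)
south = irreversibility(-30.0, 20.0, 1.0, 0.5, 2.0, 0.3)
```

With constant coefficients the irreversibility is an odd function of latitude, which is why latitude-dependent coefficients are needed for more general patterns.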
The proposed method is similar in spirit to the LMC model in the sense that each process $Z_i$ is modelled as a linear combination of latent processes. However, the model proposed in this article differs from the LMC in several significant respects. In the model, the cross-covariance structure is characterized by the first term of (8), and even when $n = 1$ with $C_i = 0$, we achieve fairly flexible cross-covariance models that we cannot obtain with LMC models having more covariance parameters (see section 4.3). The expression in the summation of (8) may appear to be a linear combination of latent processes, but in fact the differential operators are defined in the $L^2$ sense. The variations of LMC models that give non-stationary covariance models, such as in Gelfand et al. (2004), achieve the non-stationarity quite differently from the way the models in (8) achieve it. One of the fundamental differences between the approach proposed in this article and the LMC model is that the differential operators with respect to latitude and longitude are applied to the same process ($Y_k$ in (8)). Suppose we have $n = 1$ in (8). The LMC models consider linear combinations of independent processes, whereas the differential operators in (8) effectively evaluate covariances of differences of the same process, $Y_1$, with small latitudinal or longitudinal lags (see (3)). Hence, the non-stationarity with respect to latitude comes not only from the coefficients of the partial differential operators, $A_{i,k}$ and $B_{i,k}$, but also from the differential operators $\partial/\partial L$ and $\partial/\partial l$ themselves, and from the covariance between the processes to which each differential operator is applied.
We choose to use Legendre polynomials in modelling the coefficients of the differential operators. It is not clear how we could get empirical estimates of these coefficients from the data, and thus we instead model them through orthogonal polynomials of the latitude. Legendre polynomials are in that sense a natural choice, since they are orthogonal over the interval [−1, 1] and thus the $P_j(\sin L)$ for −90° ≤ $L$ ≤ 90° are orthogonal.
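The orthogonality can be checked numerically. The sketch below uses Gauss–Legendre quadrature in the variable $x = \sin L$; since $dx = \cos L\, dL$, orthogonality in $x$ over [−1, 1] corresponds to orthogonality in latitude with weight $\cos L$. The function name and the number of quadrature nodes are illustrative choices.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_gram_matrix(max_degree, n_nodes=32):
    """Matrix of integrals of P_j(x) P_k(x) over [-1, 1], with x = sin(L)."""
    x, w = legendre.leggauss(n_nodes)        # quadrature nodes and weights
    P = legendre.legvander(x, max_degree)    # column j holds P_j at the nodes
    return P.T @ (w[:, None] * P)

gram = legendre_gram_matrix(4)
# Off-diagonal entries vanish; the diagonal equals 2 / (2j + 1).
expected = np.diag(2.0 / (2.0 * np.arange(5) + 1.0))
```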
There are possible limitations of the model in (8). The first limitation is that all the $Z_i$ may have the same spatial range and smoothness parameter, since they consist of the same processes (the $Y_k$). One easy fix is either to let $Y_0$ have different covariance parameters from the $Y_k$ ($k > 0$) or to add more terms (processes) in (8) and let these terms have different covariance parameters. We will explore this issue further for the climatological application in section 4.
3.2. Computational issues
It is common to estimate the covariance parameters as well as the mean parameters using maximum likelihood estimation, and for that purpose we assume from now on that the process is multivariate Gaussian. Many spatial data sets these days are of large dimension, and it can often be quite challenging to compute the full likelihood efficiently. For regularly spaced data, however, which is usually the case for satellite data and numerical model outputs, the exact likelihood can be computed quite efficiently. Jun & Stein (2008) demonstrate such a method using the discrete Fourier transform (DFT) for a univariate spatial process. The key idea is the following: since the covariance model is axially symmetric
and we have regularly spaced longitude values covering the full range, the resulting covariance matrix can be written in a block circulant form. Then, using the fact that a block circulant matrix can be block diagonalized by applying the DFT, we can calculate the inverse and the determinant of the covariance matrix efficiently (see Jun & Stein (2008) for more details on how this works). The same idea can be applied to the cross-covariance matrix. As long as the multiple spatial processes are on the same longitudinal grid, cover the full longitude range, and have an axially symmetric cross-covariance structure (note that the model in (8) does give an axially symmetric cross-covariance structure), both the marginal covariance matrix for each process and the cross-covariance matrix can be block diagonalized by applying the DFT. Note that Chan & Wood (1999) consider a multivariate stationary Gaussian random field defined on a rectangular grid in $\mathbb{R}^d$ and apply circulant embedding of a block Toeplitz matrix (the Toeplitz structure comes from the stationarity of the random field) to create a block circulant covariance matrix. They then perform the DFT to block diagonalize the covariance matrix. Since in our domain the block circulant structure of the covariance matrix arises naturally from regularly spaced longitudinal points with 360° coverage, we do not need the circulant embedding step.
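The block circulant idea can be illustrated on a toy grid. The sketch below is not the covariance model of this article: as a stand-in it assumes an exponential covariance in chordal distance on a unit sphere (which is axially symmetric), with an illustrative range value and hypothetical function names. It compares the log determinant computed from the full matrix with the one obtained from the $q$ small blocks produced by the DFT over longitude.

```python
import numpy as np

def covariance_blocks(lats_deg, q, range_par=0.5):
    """C[i, j, d]: covariance between latitudes i and j at longitude lag index d."""
    lat = np.deg2rad(np.asarray(lats_deg, dtype=float))
    lag = 2.0 * np.pi * np.arange(q) / q
    L1, L2, dl = lat[:, None, None], lat[None, :, None], lag[None, None, :]
    h = np.sin((L1 - L2) / 2) ** 2 + np.cos(L1) * np.cos(L2) * np.sin(dl / 2) ** 2
    chord = 2.0 * np.sqrt(h)                 # chordal distance on a unit sphere
    return np.exp(-chord / range_par)        # stand-in covariance (assumption)

def full_covariance(blocks):
    """Assemble the (p q) x (p q) block circulant matrix, longitude-major order."""
    p, _, q = blocks.shape
    full = np.empty((p * q, p * q))
    for a in range(q):
        for b in range(q):
            full[a * p:(a + 1) * p, b * p:(b + 1) * p] = blocks[:, :, (a - b) % q]
    return full

def logdet_via_dft(blocks):
    """Sum of log determinants of the p x p blocks after a DFT over the lag index."""
    spectral = np.fft.fft(blocks, axis=2)
    q = blocks.shape[2]
    return sum(np.linalg.slogdet(spectral[:, :, k])[1] for k in range(q))

blocks = covariance_blocks([-60.0, -20.0, 20.0, 60.0], q=8)
direct = np.linalg.slogdet(full_covariance(blocks))[1]
fast = logdet_via_dft(blocks)
```

The direct computation factorizes one matrix of size $pq \times pq$, while the DFT route only factorizes $q$ matrices of size $p \times p$.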
Suppose we consider a bivariate process $(Z_1, Z_2)$ observed on a regular grid with $p$ latitude points and $q$ longitude points (the longitudinal points must be equally spaced over the full longitude range). We denote $Z_i(L_j) = \{Z_i(L_j, l_1), \ldots, Z_i(L_j, l_q)\}^T$ for $j = 1, \ldots, p$, and let $F Z_i(L_j)$ be the DFT (with respect to longitude) of $Z_i(L_j)$. Then it is well known that the covariance matrix of the complex normal vector $F Z_i(L_j)$ is a diagonal matrix. Although $F Z_i(L_j)$ is a complex normal random vector, its likelihood can be obtained simply by calculating as if it were a real normal random vector with the appropriate covariance matrix (Wooding, 1956). Therefore, if we denote
$$Z_i^* = \{Z^*_{i,1,1}, \ldots, Z^*_{i,p,1}, Z^*_{i,1,2}, \ldots, Z^*_{i,p,2}, \ldots, Z^*_{i,p,q}\}^T,$$
where $Z^*_{i,j,k}$ is the $k$th element of $F Z_i(L_j)$, then the covariance matrix of $\{Z_1^{*T}, Z_2^{*T}\}^T$ can be written as
$$\Sigma = \begin{pmatrix} D_1 & D_{12} \\ D_{12}^* & D_2 \end{pmatrix},$$
where $D_1$, $D_2$ and $D_{12}$ are complex block diagonal matrices with $p \times p$ diagonal blocks and $D_{12}^*$ is the conjugate transpose of $D_{12}$. The determinant of $\Sigma$ can be calculated using $\det(\Sigma) = \det(D_1 - D_{12}D_2^{-1}D_{12}^*)\det(D_2)$, and the quadratic form in the likelihood can be efficiently calculated using the fact that
$$\Sigma^{-1} = \begin{pmatrix} (D_1 - D_{12}D_2^{-1}D_{12}^*)^{-1} & -D_1^{-1}D_{12}(D_2 - D_{12}^*D_1^{-1}D_{12})^{-1} \\ -D_2^{-1}D_{12}^*(D_1 - D_{12}D_2^{-1}D_{12}^*)^{-1} & (D_2 - D_{12}^*D_1^{-1}D_{12})^{-1} \end{pmatrix}.$$
Note that the lower off-diagonal block is the conjugate transpose of the upper off-diagonal block, and the inverses of $D_1$ and $D_2$ can be calculated efficiently since they are block diagonal matrices with block size $p \times p$. In our application, we have $p = 128$.
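The determinant and inverse identities above can be verified numerically. The following sketch uses a small random Hermitian positive definite matrix with dense blocks as a stand-in; in the application the blocks would be block diagonal, so each solve would decouple further. The matrix size and random seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
G = rng.normal(size=(2 * n, 2 * n)) + 1j * rng.normal(size=(2 * n, 2 * n))
Sigma = G @ G.conj().T + 2 * n * np.eye(2 * n)   # Hermitian positive definite

D1, D12, D2 = Sigma[:n, :n], Sigma[:n, n:], Sigma[n:, n:]
D12_star = D12.conj().T

# Schur complements appearing in the determinant and inverse formulas.
S1 = D1 - D12 @ np.linalg.solve(D2, D12_star)
S2 = D2 - D12_star @ np.linalg.solve(D1, D12)

# log det(Sigma) = log det(S1) + log det(D2)
logdet_blocks = np.linalg.slogdet(S1)[1] + np.linalg.slogdet(D2)[1]
logdet_direct = np.linalg.slogdet(Sigma)[1]

# Block form of the inverse; the lower-left block is the conjugate
# transpose of the upper-right block.
S1_inv, S2_inv = np.linalg.inv(S1), np.linalg.inv(S2)
upper_right = -np.linalg.solve(D1, D12) @ S2_inv
Sigma_inv = np.block([[S1_inv, upper_right],
                      [upper_right.conj().T, S2_inv]])
```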
4. Application
4.1. Data
As noted in section 1, the relationship between precipitation and surface temperature has received a lot of attention from scientists, and it is important in climate impact research. We apply the covariance functions developed here to build a joint model for surface temperature and precipitation; the data originate from one of the numerical model outputs
used in Trenberth & Shea (2005), the NCAR CCSM3. The NCAR CCSM3 is one of the climate models developed by NCAR. Jun et al. (2008) give a more detailed background on this and other climate models. We look at the 5-month average for Northern winter (November to March), averaged over 1970–99 (we call it NDJFM from now on). The temperature output from this model, CCSM3, has also been analysed by Jun et al. (2008), but they consider the differences between the observations and numerical model outputs, and they only look at the latitude range 50°S–50°N on a coarser grid (5° × 5°). We use the numerical model output only (no observations) for the entire globe (full longitude and latitude ranges) at the original resolution of 256 × 128 (1.40625° in both longitude and latitude). It is common in climate studies to use numerical model outputs rather than observations, since observations usually have a large fraction of missing values (especially near the poles). For example, Trenberth & Shea (2005) used numerical model outputs only to study the relationship between temperature and precipitation. Note that the unit for temperature is K and the unit for precipitation is kg/(m² s) (kilograms per square metre per second).
Tebaldi & Sansó (2009) deal with multiple numerical model outputs along with observations to build a joint model between surface temperature and precipitation, but their approach
is relatively simple in terms of modelling the cross-covariance structure of the two variables.
In their approach, cross-correlations between the two variables only come from the mean of
the processes, that is, they let the mean of the precipitation be a linear function of the surface temperature. They consider spatio-temporal processes, but in this work, we focus on the
spatial component of the process.
4.2. Model
Model for the mean. The first row of Fig. 6 gives the NDJFM averages of the temperature and precipitation data. Since the precipitation data are of order $10^{-5}$, from now on we multiply the precipitation data by $10^5$ to make them comparable to the surface temperature data. For temperature, it is clear that the mean structure of the field depends mainly on the latitude. For precipitation, this dependence is not as strong as for temperature, and there are places with large amounts of precipitation around the equator. We first filter out the spatial mean structure using spherical harmonics and work with the residuals. Specifically, we regress each variable (surface temperature and precipitation) separately on the spherical harmonics up to order 12, $\{Y_r^s(\sin L, l) \mid r = 0, 1, \ldots, 12,\ s = -r, \ldots, r\}$. The second row of Fig. 6 gives the estimated mean structure and the third row gives the residuals. Overall, the estimated mean field removes most of the large-scale spatial patterns in the data.
Model for the covariance. Since our main interest in this article is estimating covariance structure, we focus on fitting the covariance models using the residuals. We fit several covariance
models to the data. We consider a Matérn model in Gneiting et al. (2010), a version of the LMC
model, and a couple of variations of our covariance model developed in section 3.1. Here, Z1
denotes the surface temperature process and Z2 denotes the precipitation process. Note that
these processes are the residuals after filtering out the mean as explained in section 4.2.1.
Fig. 6. Surface temperature and precipitation data used in the analysis (panels: Temperature (NDJFM) and Precipitation (NDJFM); Mean; Residual). The first row gives original NDJFM averages, the second row gives the estimated means, and the third row gives the residuals. The unit for temperature is K and the unit for precipitation is kg/(m² s). Note that we multiplied the original precipitation data by $10^5$ to make the scale comparable to that of temperature.

(i) Matérn model (MAT): we use the parsimonious bivariate Matérn model in Gneiting et al. (2010). In particular, we let $Z_i = G(\alpha_i, \beta, \nu_i)$, $i = 1, 2$. The parameter $\nu_3$ gives the smoothness for the cross-covariance and, by construction, $\nu_3 = (\nu_1 + \nu_2)/2$. We also have the co-located correlation coefficient $\rho$.
(ii) LMC model (LMC): we use a version of the LMC model, that is, we set $Z_i(L, l) = a_i W_1(L, l) + b_i W_2(L, l) + c_i U_i(L, l)$, where $a_i$, $b_i$ and $c_i$ are constants, $W_j = G(1, \beta, \nu_j)$ ($j = 1, 2$) and $U_i = G(1, \beta, \omega_i)$. We also assume the $W_j$ are independent, the $U_i$ are independent, and the $W_j$ and $U_i$ are independent of each other. Therefore $U_i$ does not contribute to the cross-covariance structure of the $Z_i$.
(iii) Our covariance model (Non-stationary Multivariate Global model):
(a) NMG1: we set
$$Z_i(L, l) = \left(a_i \frac{\partial}{\partial L} + b_i \frac{\partial}{\partial l}\right) Y(L, l) + c_i U_i(L, l).$$
Here, $Y = G(1, \beta, \nu)$, $U_i = G(1, \beta, \omega_i)$, and we assume the $U_i$ are independent of $Y$. Note that $\nu > 1$.
All of the above models have the property that the processes $Z_1$ and $Z_2$ have the same spatial range parameter. The properties of each model discussed in section 2 are summarized in Table 1. These models are intentionally kept relatively simple since they will be used in section 4.3 with the data over a subregion. We use the following more complex model to fit the full data in section 4.4.
(b) NMG2: we let
$$Z_i(L, l) = \left(A_i(L) \frac{\partial}{\partial L} + B_i(L) \frac{\partial}{\partial l}\right) Y(L, l) + C_i(L) W(L, l) + d_i U_i(L, l)$$
with $A_i$, $B_i$ and $C_i$ being defined as in (9) (for instance, $A_i(L) = \sum_{j=0}^{m} a_{ij} P_j(\sin L)$). We let $Y = G(1, \beta_1, \nu_1)$ ($\nu_1 > 1$), $W = G(1, \beta_2, \nu_2)$ and $U_1 = G(1, \beta_3, \nu_3)$ (note $\nu_2, \nu_3 > 0$). For the choices of $m$ and $d_i$, see section 4.4.
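To make the latitude-dependent coefficients concrete, the sketch below evaluates $A_i(L) = \sum_j a_{ij} P_j(\sin L)$ at a few latitudes. The coefficient values are the three estimates per function reported later in Table 5 (so three terms per sum); the function name and the choice of latitudes are illustrative.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_coefficient(lat_deg, coefs):
    """Evaluate sum_j coefs[j] * P_j(sin L) at latitude(s) given in degrees."""
    x = np.sin(np.deg2rad(lat_deg))
    return legendre.legval(x, coefs)

a1 = [-0.0030, -0.0008, -0.0038]   # a_10, a_11, a_12 from Table 5
a2 = [-0.0135, -0.0008, 0.0196]    # a_20, a_21, a_22 from Table 5

lats = np.array([-90.0, -45.0, 0.0, 45.0, 90.0])
A1 = legendre_coefficient(lats, a1)
A2 = legendre_coefficient(lats, a2)
# At the equator P_1 = 0 and P_2 = -1/2, so A_2(0) = a_20 - a_22 / 2.
```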
For the models LMC, NMG1 and NMG2, we may have an identifiability problem if we let all the coefficients $a_i$, $b_i$, $c_i$ and the coefficients of the linear combinations in $A_i$, $B_i$ and $C_i$ vary over $\mathbb{R}$. To avoid the problem, for the LMC model we take the signs of $a_1$, $b_1$, $c_1$ and $c_2$ to be positive. For the NMG1 model, we take the signs of $a_1$, $c_1$ and $c_2$ to be positive. For the NMG2 model, we take the signs of $b_{10}$, $c_{20}$ and $d_1$ to be positive.
4.3. Fit over subdomains
Even if we use the DFT-based computational technique described in section 3.2, using the full data set for both variables to estimate the covariance structure takes quite some time (the total data size is 2 × 128 × 256 = 65,536). Therefore, we first choose a subregion over part of North America and fit several covariance models for a quick comparison in terms of likelihood and prediction accuracy. We perform the prediction on a region, also over North America, that is disjoint from the estimation sites. Figure 7 shows the locations of the estimation sites and the prediction sites. Note that there are 774 estimation sites and 172 prediction sites. For the fit in this section, since the data are of manageable size, we do not use the DFT technique. In fact, it cannot be used here, since it requires the data to cover the entire longitude range.
We compare the covariance models listed in section 4.2.2 ((i), (ii) and (iii)(a)) and estimate the covariance parameters by maximum likelihood. We used numerical optimization (the nlm and optim functions in R; for optim, we use the default Nelder–Mead algorithm) and tried several starting points. The optimization procedures reached the same maximum for all of the starting points that were tried. Table 2 gives the estimated covariance parameter values along with their asymptotic standard errors, which are obtained from the inverse of the Hessian matrix. The maximized loglikelihood values for each model and the corresponding AIC values are also
Table 1. Some covariance properties for models fitted in section 4.3

Property                              MAT   LMC   NMG1
Isotropic                             Y     Y     N
Covariance independent of latitude    Y     Y     N
Longitudinally reversible             Y     Y     N
Symmetric                             Y     Y     N
Same smoothness                       N     N     N
Fig. 7. Locations of estimation sites and prediction sites for the fit over North America in section 4.3.
Table 2. Maximum likelihood estimates along with their asymptotic standard errors (in parentheses) for some covariance parameters from the fit over North America in section 4.3. The unit for the spatial range parameter ($\beta$) is km. The estimate for the co-located correlation parameter for MAT is $\hat{\rho}$ = −5.5e-06

Parameter    MAT               LMC                   NMG1
$\beta$      218.67 (14.73)    191.23 (12.30)        259.90 (11.94)
$\nu$        –                 –                     2.07 (0.064)
$\nu_1$      1.18 (0.058)      0.93 (0.078)          –
$\nu_2$      2.11 (0.073)      2.27 (0.13)           –
$\omega_1$   –                 2.50 (–)              1.07 (0.045)
$\omega_2$   –                 2.25 (0.20)           2.43 (0.14)
$a_1$        1.41 (0.11)       0.79 (6.05e-03)       6.47e-04 (3.41e-05)
$a_2$        0.58 (0.11)       −0.019 (3.13e-03)     0.017 (2.01e-03)
$b_1$        –                 0.016 (7.46e-05)      0.0032 (2.06e-03)
$b_2$        –                 −0.54 (0.11)          0.018 (2.21e-03)
$c_1$        –                 0.27 (0.052)          1.57 (0.11)
$c_2$        –                 0.11 (0.36)           0.17 (0.027)
Loglik       965.26            973.29                1052.641
AIC          −1918.518         −1924.575             −2085.282
given. The first thing to note is that the NMG1 model gives a significantly larger loglikelihood value than the other models with a comparable number of covariance parameters (the LMC model has the most covariance parameters). The AIC value for the NMG1 model is the smallest among the three. The fact that the LMC model, despite having the most covariance parameters, gives a much smaller loglikelihood value than NMG1 may be a sign that the non-stationarity, in particular the dependence of the covariance structure on latitude, the longitudinal irreversibility, and the asymmetry in the data (for the marginal and/or cross-covariance structure), is rather strong and that the differential operator term in (8) helps to explain these properties better.
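The AIC values in Table 2 follow from AIC = −2 × loglikelihood + 2 × (number of parameters). In the short check below, the parameter counts (6 for MAT, 11 for LMC and 10 for NMG1) are an assumption: they are not stated explicitly in the text, but counting the estimated entries of Table 2 (plus the co-located correlation for MAT) gives these numbers, and they reproduce the tabulated AIC values up to rounding.

```python
def aic(loglik, n_params):
    """Akaike information criterion: smaller values indicate a better trade-off."""
    return -2.0 * loglik + 2.0 * n_params

# (maximized loglikelihood from Table 2, assumed number of covariance parameters)
fits = {"MAT": (965.26, 6), "LMC": (973.29, 11), "NMG1": (1052.641, 10)}

aic_values = {name: aic(loglik, k) for name, (loglik, k) in fits.items()}
best_model = min(aic_values, key=aic_values.get)
```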
In terms of the difference in smoothness between the two variables, it seems that the precipitation process is smoother than the temperature process. In the LMC model, the smallest smoothness parameter, $\nu_1$, is shared by the two processes, temperature and precipitation, but the coefficient $a_1$ is much larger in magnitude than the coefficient $a_2$, and $b_2$ is larger in magnitude than $b_1$. Therefore, the roughest process, $W_1$, mostly contributes to the temperature process and the smoother process, $W_2$, mostly contributes to the precipitation process. In the NMG1 model, first of all, $\omega_1$ is smaller than $\omega_2$. Note that the effective smoothness of the process $Y$ is $\nu - 1 = 1.07$, and thus $Y$ has a similar amount of smoothness to the process $U_1$. In that sense it is not clear whether the precipitation process is smoother than the temperature process or not. Nevertheless, the result in Tebaldi & Sansó (2009) shows that the precipitation process is smoother than the temperature process.
For the LMC model, the estimate of $\omega_1$ reached the upper boundary of the parameter space, and thus we could not obtain its asymptotic standard error. It is common practice to set the range for the smoothness parameter to (0, 2.5), since the covariance model is not valid for zero or negative values and for large values we often run into numerical instability. This poor fit may imply that the data do not provide enough information on this parameter for the particular LMC model. The NMG1 model does not have this problem. The estimate for the co-located correlation parameter of the MAT model is almost zero ($\hat{\rho}$ = −5.5e-06), while the empirical cross-covariance estimate is around 0.3.
Figure 8 shows the prediction errors for the covariance models MAT, LMC and NMG1
at the prediction sites. We display the difference between the true and the predicted values
at the prediction sites against latitude. Note that the three models do not show much difference in terms of prediction accuracy for the temperature variable, although we see significant
difference for the precipitation. The superiority of the NMG1 model is apparent for the prediction of precipitation. For the precipitation variable, the predictive skills of the MAT and
LMC models are similar.
To make a fair comparison of the predictive performance of the three models, we now repeat the above procedure over 24 disjoint subdomains, $S_1, \ldots, S_{24}$, that together cover most of the globe. For $n = 1, \ldots, 12$, $S_{2n}$ is a subdomain in the Northern Hemisphere with latitude range 0°–60°N and $S_{2n-1}$ is a subdomain in the Southern Hemisphere with latitude range 0°–60°S. For each $n$, $S_{2n}$ and $S_{2n-1}$ cover the longitude range $30(n-1)$° to $30n$°, and in each of them we set aside the data in the longitude range $\{30(n-1)+10\}$° to $\{30(n-1)+15\}$° for the validation of prediction. Table 3 shows the maximum loglikelihood values for the three models from the fits over the 24 subdomains. Except for $S_{12}$, $S_{16}$ and $S_{24}$, NMG1 achieves the largest maximum loglikelihood value, and the differences between the loglikelihood values of NMG1 and those of the other two models are large in most of the subdomains. Table 4 summarizes the prediction performance of the three models over the 24 subdomains. It provides the median, mean and maximum of the mean squared errors (MSEs) from the prediction over the 24 subdomains. Except for the mean for temperature and the maximum for precipitation, NMG1 gives the smallest MSE values for all the summary statistics. Along with the result in Fig. 8, this demonstrates that the NMG1 model indeed outperforms the other two models not only in terms of the maximum loglikelihood values (and AIC) but also in predictive performance.
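The layout of the 24 subdomains described above can be written down directly. The sketch below only encodes the latitude and longitude bounds given in the text; the function name and the dictionary layout are illustrative, and whether the interval end points are inclusive is not specified in the text.

```python
def subdomains():
    """Bounds (in degrees) of S_1, ..., S_24 and of their validation strips."""
    domains = {}
    for n in range(1, 13):
        lon = (30 * (n - 1), 30 * n)
        validation_lon = (30 * (n - 1) + 10, 30 * (n - 1) + 15)
        # Even index: Northern Hemisphere; odd index: Southern Hemisphere.
        domains[2 * n] = {"lat": (0, 60), "lon": lon,
                          "validation_lon": validation_lon}
        domains[2 * n - 1] = {"lat": (-60, 0), "lon": lon,
                              "validation_lon": validation_lon}
    return domains

layout = subdomains()
# For example, layout[24] is the northern subdomain over longitudes 330 to 360.
```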
4.4. Fit over full domain
We now fit the full data set through the computational technique described in section 3.2. From the fitted results in section 4.3, it is clear that there is strong non-stationarity in the data and that the covariance models listed in section 4.2.2 (except NMG2) are not flexible enough to capture it. On the other hand, from Tables 2–4 and Fig. 8, it is clear that the NMG1 model outperforms the MAT and LMC models given a comparable number of covariance parameters. Hence, we fit the full data to estimate the cross-correlation between the
Fig. 8. Differences between the true and the predicted values (truth − prediction) at the prediction sites from the fit over North America in section 4.3, for the MAT, LMC and NMG1 models (panels: Temperature and Precipitation). The differences are displayed with respect to latitude.
Table 3. Maximum loglikelihood values of the three models fitted in section 4.3 over 24 disjoint subdomains

Domain   MAT        LMC        NMG1        Domain   MAT        LMC        NMG1
S1       1228.16    1232.06    1318.36*    S13      4098.32    4137.95    4242.89*
S2       898.42     915.18     924.17*     S14      2050.49    2143.39    2271.37*
S3       966.33     1004.27    1081.54*    S15      3379.78    3970.56    4133.68*
S4       958.74     963.9      1009.75*    S16      1324.41    1367.61*   1356.9
S5       3991.38    4131.18    4235.07*    S17      4011       4039.39    4277.94*
S6       672.35     749.61     762.47*     S18      1235.51    1241.37    1317.76*
S7       1139.7     1139.9     1221.59*    S19      −775.02    −754.28    −600.95*
S8       425.93     428.5      489.47*     S20      905.21     935.11     955.05*
S9       508.7      511.37     560.65*     S21      946.68     966.36     990.96*
S10      751.73     789.97     798.83*     S22      1895.64    1944.9     1946.37*
S11      1381.94    1408.25    1430.17*    S23      3879.35    4032.49    4168.3*
S12      1765.53    1939.04*   1916.41     S24      1115.2     1145.51*   1140.53

* indicates the maximum value across the three models for each subdomain (bold face in the original table).
temperature and precipitation data using an extended version of NMG1, NMG2, described
in section 4.2.2. We set d1 ∈ (0, ∞) and d2 = 0. This is an attempt to capture the difference
in smoothness for the two variables. We keep d1 positive to avoid the identifiability problem.
We could let d1 = 0 instead of d2 , but since the parameter estimates in Table 2 and the study
by Tebaldi & Sansó (2009) suggest that surface temperature data is less smooth than precipitation data, by adding the term d1 U1 , we hope to capture the ‘roughness’ in the temperature
Table 4. Summary of mean squared errors from the prediction of the three models over the 24 disjoint subdomains in section 4.3

         Temperature                  Precipitation
Model    Median   Mean     Max        Median    Mean     Max
MAT      0.208    0.491    2.838      0.0638    0.230    1.724
LMC      0.222    0.488*   2.908      0.0653    0.236    1.552*
NMG1     0.199*   0.498    2.798*     0.0629*   0.220*   1.643

* indicates the minimum value across the three models (bold face in the original table).
data. We also fitted the model with $d_1, d_2 \in (0, \infty)$, but the improvement over NMG2 was not significant (that is, the loglikelihood values do not increase significantly and the fitted values do not change noticeably). We also let $Y$, $W$ and $U_1$ have different spatial range parameters. That is, we set $Y = G(1, \beta_1, \nu_1)$, $W = G(1, \beta_2, \nu_2)$ and $U_1 = G(1, \beta_3, \nu_3)$. With a data size of over 30,000 for each variable, we have sufficient information to let these parameters differ.
The estimated covariance parameter values along with their asymptotic standard errors are given in Table 5. It is interesting to note that all three spatial range parameters are different, although for both variables the maximum spatial range parameter estimate is given by the process $W$ ($\hat{\beta}_2$). The smoothness parameter estimates for $Y$ and $U_1$ are comparable to the corresponding estimates for the fit in section 4.3 in Table 2. Also, the smallest estimate of the spatial range parameters, $\hat{\beta}_1$, is similar to the estimate of the corresponding parameter, $\beta$, in Table 2. The estimates for the remaining parameters differ significantly from those in Table 2, as we expected given the difference between the two models, NMG1 and NMG2.
As explained in section 2.1, Fig. 1 gives a comparison between the empirical and fitted variances and cross-correlations for the temperature and precipitation variables (NDJFM). For Fig. 1 (A), (C) and (E), the solid line gives the fitted values with the parameters in Table 5. For Fig. 1 (B), (D) and (F), it is not obvious how to display the corresponding fitted values since, as described in the Appendix, the empirical quantities are calculated through the sum across latitudes; thus, at each longitude value, the corresponding fitted value is not a single number, but rather a different fitted value for each combination of latitudes. Figure 1 (A)–(D) show the standard deviation for the univariate processes, and (E) and (F) show cross-correlations. Overall, the fitted values do a reasonable job of capturing the pattern of the empirical values. The fitted variance for temperature is rather flat with respect to latitude, since $d_1$ has a relatively large estimate and the terms with $A_{1,1}$ and $B_{1,1}$ in NMG2 therefore receive smaller weight for the temperature process. On the other hand, the fitted variance for precipitation captures the pattern in the data well. It may be interesting to see if increasing $m$ for the temperature process would improve the fit. Fitted values for the cross-covariance structure are problematic in some places, but this may be partly due to the complex nature of the cross-covariance structure of the data.
Table 5. Maximum likelihood estimates along with their asymptotic standard errors (in parentheses) for covariance parameters and the maximum loglikelihood values in section 4.4. The unit for the spatial range parameters ($\beta_i$, $i = 1, 2, 3$) is km

Parameter   Estimate              Parameter   Estimate
$\beta_1$   256.6205 (2.3040)     $a_{10}$    −0.0030 (0.0003)
$\beta_2$   836.505 (46.1053)     $a_{11}$    −0.0008 (0.0001)
$\beta_3$   462.0350 (12.9407)    $a_{12}$    −0.0038 (0.0009)
$\nu_1$     2.5979 (0.0028)       $a_{20}$    −0.0135 (0.0004)
$\nu_2$     0.9121 (0.0059)       $a_{21}$    −0.0008 (0.0001)
$\nu_3$     0.8071 (0.0034)       $a_{22}$    0.0196 (0.0006)
$d_1$       2.2630 (0.0859)

Parameter   Estimate              Parameter   Estimate
$b_{10}$    0.0018 (0.0005)       $c_{10}$    −0.2255 (0.0213)
$b_{11}$    0.0038 (0.0009)       $c_{11}$    −0.3520 (0.0287)
$b_{12}$    0.0031 (0.0013)       $c_{12}$    0.0559 (0.0298)
$b_{20}$    −0.0083 (0.0003)      $c_{20}$    0.7182 (0.0317)
$b_{21}$    0.0230 (0.0007)       $c_{21}$    −0.1532 (0.0073)
$b_{22}$    0.0401 (0.0012)       $c_{22}$    −1.4644 (0.0644)
Figure 2 compares the longitudinal irreversibility, that is, $r(L, L, \Delta) = \mathrm{cor}\{Z_1(L, l), Z_2(L, l + \Delta)\} - \mathrm{cor}\{Z_1(L, l + \Delta), Z_2(L, l)\}$, for $L$ (x-axis) and $\Delta$ (y-axis) in degrees: (A)–(B) empirical, (D) fitted using OLS and (E) fitted using MLE. As explained in section 2.2, we bin the latitude with a bin size of roughly 7°. For the OLS fit, we obtain another set of parameter estimates by minimizing the sum of squared differences between the empirical and the model-fitted irreversibility across the latitude bins (fitted values are evaluated at the centres of the latitude bins) and longitude lags $\Delta$ up to 180°. We also tried weighted least squares using the reciprocals of the number of data points at each latitude bin and longitude lag as weights, but the results were quite similar to the OLS fit. For (E), we use the covariance parameter estimates in Table 5. The OLS fit captures the empirical pattern better than the MLE fit, although irreversibility is overestimated near the North Pole. The MLE fit captures the positive irreversibility values near the North Pole, but overall the estimates are much smaller in magnitude than the empirical values.
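For intuition about the quantity being fitted, the sketch below computes a simplified empirical longitudinal irreversibility along a single latitude band of a full-longitude grid. It is not the binned estimator used for Fig. 2: the function names are hypothetical, the data are synthetic, and the correlations are plain sample correlations over longitude with circular shifts.

```python
import numpy as np

def lagged_cor(x, y, lag):
    """Sample correlation between x(l) and y(l + lag) on a circular longitude grid."""
    return np.corrcoef(x, np.roll(y, -lag))[0, 1]

def empirical_irreversibility(z1, z2, lag):
    """cor{Z1(l), Z2(l + lag)} - cor{Z1(l + lag), Z2(l)} along one latitude band."""
    return lagged_cor(z1, z2, lag) - lagged_cor(z2, z1, lag)

# Synthetic band: z2 is z1 displaced by three grid cells plus a little noise,
# so the two lagged cross-correlations differ strongly at lag 3.
rng = np.random.default_rng(1)
z1 = rng.normal(size=256)
z2 = np.roll(z1, 3) + 0.1 * rng.normal(size=256)

r_plus = empirical_irreversibility(z1, z2, 3)
r_minus = empirical_irreversibility(z1, z2, -3)
r_same = empirical_irreversibility(z1, z1, 3)
```

By construction the quantity is antisymmetric in the lag and vanishes when the two variables are identical, so any non-zero pattern reflects a directional shift between the two fields.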
The fact that the OLS fit captures the empirical pattern quite well suggests that the model in (8) is indeed flexible. However, the fitted irreversibility from MLE differs from the empirical values by a factor of 10. The misfit of the MLE estimates may be somewhat disappointing at first sight, but considering that there are not many covariance models that can produce such irreversibility, it is encouraging to develop covariance models in this direction. Also note that, as shown in Fig. 2 (C), the uncertainty in the empirical longitudinal irreversibility is quite large. We also calculated the standard errors for the fitted irreversibility in (E) using the asymptotic standard errors of the fitted covariance parameters in Table 5, but the magnitude of the standard error is almost the same as the magnitude of the fitted irreversibility in (E).
Now let us compare our estimated cross-correlation (in Fig. 1 (E) and (F)) to the one in Trenberth & Shea (2005). Note that, due to the axial symmetry assumption, our fitted cross-covariance is identical along each latitude band. The high correlation level in the high-latitude area of the Northern Hemisphere matches well with the result in Trenberth & Shea (2005). Our estimated correlation values are close to the average correlation levels across longitude at each latitude in Trenberth & Shea (2005), except in the South Pole area; in this region, our estimated levels are slightly negative whereas Trenberth & Shea (2005) give high correlation levels. The estimated cross-correlation in Trenberth & Shea (2005) shows a clear distinction between land and sea. Unlike in Trenberth & Shea (2005), our model can give estimated cross-correlations between the two processes at any two distinct locations (note that Trenberth & Shea (2005) can only provide cross-correlation estimates between the two variables at the same location). For the particular data set that we used in this article, the estimated cross-correlations, $\hat{r}(L_1, L_2, l_1 - l_2) = \widehat{\mathrm{cor}}\{Z_1(L_1, l_1), Z_2(L_2, l_2)\}$, are close to $\hat{r}(L_1, L_2, 0)$ for relatively small $|l_1 - l_2|$.
5. Discussion
As noted at the end of section 4.4, the estimated cross-correlations in Trenberth & Shea (2005) are high in polar areas, low over the land, and modest over the ocean. Although we suspect this clear distinction may be due to the lack of filtering of the spatial mean in calculating sample correlations, we could perform our analysis with our covariance models over the polar area, land and sea separately. If we find clear differences between these individual analyses, it may suggest that there is a non-stationary component that depends on whether the domain is over the pole, the land or the sea. Currently our model is not able to distinguish them, and we may need to extend our model to incorporate this property.
On the other hand, if we perform the analysis over the land and sea separately, we can no longer take
advantage of the fact that the covariance matrix is block circulant. In that circumstance, we may resort to covariance tapering (Kaufman et al., 2008; Furrer & Sain, 2009) or
other likelihood approximation methods such as Stein et al. (2004) and Fuentes (2007). The
results in section 4.4 suggest that there is still considerable room to improve the fit of our covariance
model to the data. The complex nature of the cross-covariance structure of global
data provides numerous challenges in developing flexible non-stationary cross-covariance
models for global data. However, the superior performance of our approach over some of the
existing multivariate models demonstrated in section 4.3 and the flexibility of the cross-covariance
structure demonstrated in the OLS fit of longitudinal irreversibility (Fig. 2) encourage us to
pursue the direction presented in this article.
Acknowledgements
The author acknowledges the support by the National Science Foundation. The author acknowledges the modelling groups for making their simulations available for analysis, the Program
for Climate Model Diagnosis and Intercomparison (PCMDI) for collecting and archiving the
CMIP3 model output, and the World Climate Research Programme (WCRP)’s Working Group
on Coupled Modelling (WGCM) for organizing the model data analysis activity. The WCRP
CMIP3 multi-model dataset is supported by the Office of Science, U.S. Department of Energy.
The author thanks Professor Michael L. Stein, the Editor, the Associate Editor and two anonymous referees for their detailed and constructive comments that improved the paper significantly.
References
Apanasovich, T. V. & Genton, M. G. (2010). Cross-covariance functions for multivariate random fields
based on latent dimensions. Biometrika 97, 15–30.
Bishop, C. H. & Hodyss, D. (2007). Flow adaptive moderation of spurious ensemble correlations and
its use in ensemble-based data assimilation. Quart. J. Roy. Meteorol. Soc. 133, 2029–2044.
Chan, G. & Wood, A. (1999). Simulation of stationary Gaussian vector fields. Statist. Comput. 9, 265–268.
Choi, J., Reich, B., Fuentes, M. & Davis, J. (2009). Multivariate spatial-temporal modeling and prediction of speciated fine particles. J. Statist. Theory Pract. 3, 407–418.
Fuentes, M. (2007). Approximate likelihood for large irregularly spaced spatial data. J. Amer. Statist.
Assoc. 102, 321–331.
Furrer, R. & Sain, S. R. (2009). Spatial model fitting for large datasets with applications to climate &
microarray problems. Statist. Comput. 19, 113–128.
Gelfand, A. E., Schmidt, A. M., Banerjee, S. & Sirmans, C. F. (2004). Nonstationary multivariate process
modeling through spatially varying coregionalization. Test 13, 263–312.
Gneiting, T. (2002). Nonseparable stationary covariance functions for space-time data. J. Amer. Statist.
Assoc. 97, 590–600.
Gneiting, T., Kleiber, W. & Schlather, M. (2010). Matérn cross-covariance functions for multivariate
random fields. J. Amer. Statist. Assoc. 105, 1167–1177.
Goulard, M. & Voltz, M. (1992). Linear coregionalization model: tools for estimation and choice of
cross-variogram matrix. Math. Geol. 24, 269–282.
Jones, R. H. (1963). Stochastic processes on a sphere. Ann. Math. Stat. 34, 213–218.
Jun, M., Knutti, R. & Nychka, D. W. (2008). Spatial analysis to quantify numerical model bias and
dependence: how many climate models are there? J. Amer. Statist. Assoc. 103, 934–947.
Jun, M. & Stein, M. L. (2004). Statistical comparison of observed and CMAQ modeled daily sulfate
levels. Atmos. Environ. 38, 4427–4436.
Jun, M. & Stein, M. L. (2007). An approach to producing space-time covariance functions on spheres.
Technometrics 49, 468–479.
Jun, M. & Stein, M. L. (2008). Nonstationary covariance models for global data. Ann. Appl. Statist. 2,
1271–1289.
Kaufman, C., Schervish, M. & Nychka, D. (2008). Covariance tapering for likelihood-based estimation
in large datasets. J. Amer. Statist. Assoc. 103, 1556–1569.
Li, B., Genton, M. G. & Sherman, M. (2008). Testing the covariance structure of multivariate random
fields. Biometrika 95, 813–829.
Majumdar, A. & Gelfand, A. E. (2007). Multivariate spatial modeling for geostatistical data using convolved covariance functions. Math. Geol. 39, 225–245, doi:10.1007/s11004-006-9072-6.
Majumdar, A., Paul, D. & Bautista, D. (2010). A generalized convolution model for multivariate nonstationary spatial processes. Statist. Sinica 20, 675–695.
Reich, B. J. & Fuentes, M. (2007). A multivariate semiparametric Bayesian spatial modeling framework
for hurricane surface wind fields. Ann. Appl. Statist. 1, 249–264.
Schmidt, A. M. & Gelfand, A. E. (2003). A Bayesian coregionalization approach for multivariate
pollutant data. J. Geophys. Res. 108, 8783, doi:10.1029/2002JD002905.
Stein, M. L. (1999). Interpolation of spatial data: some theory for kriging. Springer-Verlag, New York.
Stein, M. L. (2007). Spatial variation of total column ozone on a global scale. Ann. Appl. Statist. 1,
191–210.
Stein, M. L., Chi, Z. & Welty, L. J. (2004). Approximating the likelihood for irregularly observed
Gaussian random fields. J. Roy. Statist. Soc. B 66, 275–296.
Tebaldi, C. & Lobell, D. B. (2008). Towards probabilistic projections of climate change impacts on global
crop yields. Geophys. Res. Lett. 35, L08705, doi:10.1029/2008GL033423.
Tebaldi, C. & Sansó, B. (2009). Joint projections of temperature and precipitation change from multiple
climate models: a hierarchical Bayesian approach. J. Roy. Statist. Soc. Series A 172, 83–106.
Trenberth, K. E. & Shea, D. J. (2005). Relationships between precipitation and surface temperature.
Geophys. Res. Lett. 32, L14703, doi:10.1029/2005GL022760.
Wackernagel, H. (2003). Multivariate geostatistics, 3rd edn. Springer-Verlag, Berlin.
Wooding, R. A. (1956). The multivariate distribution of complex normal variables. Biometrika 43, 212–215.
Received September 2009, in final form May 2011
Mikyoung Jun, Department of Statistics, Texas A&M University, 3143 TAMU, College Station, TX
77843-3143, USA.
E-mail: [email protected]
Appendix
This Appendix gives details on how the quantities in Fig. 1 are calculated.
Below, Z can be either the temperature or the precipitation process.
(A), (C): For empirical values, at each $L_j$ ($j = 1, \ldots, p$), we get
$$\sum_{i=1}^{q} \{Z(L_j, l_i) - \bar{Z}(L_j)\}^2 / (q - 1),$$
where $\bar{Z}(L_j) = \sum_{i=1}^{q} Z(L_j, l_i)/q$. The corresponding fitted value is given by $\mathrm{var}\{Z(L_j, l)\}$ for
any $l$, and this value is the same across all values of $l$ since the model is axially symmetric.
(B), (D): For empirical values, at each $l_i$ ($i = 1, \ldots, q$), we get
$$\sum_{j=1}^{p} \{Z(L_j, l_i) - \bar{Z}(l_i)\}^2 / (p - 1),$$
where $\bar{Z}(l_i) = \sum_{j=1}^{p} Z(L_j, l_i)/p$.
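The empirical quantities in (A)–(D) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used for Fig. 1: the grid is assumed to be stored as a list of rows with `Z[j][i]` holding Z(Lj, li), the function names are hypothetical, and the toy numbers are invented. The standard-library `statistics.variance` uses the same (n − 1) divisor as the formulas above.

```python
from statistics import variance


def variance_by_latitude(Z):
    """Empirical values for (A), (C): one sample variance per latitude L_j.

    Z[j][i] is assumed to hold Z(L_j, l_i); each row has q longitudes,
    so the divisor inside statistics.variance is q - 1.
    """
    return [variance(row) for row in Z]


def variance_by_longitude(Z):
    """Empirical values for (B), (D): one sample variance per longitude l_i.

    zip(*Z) transposes the grid, so each column has p latitudes and the
    divisor is p - 1.
    """
    return [variance(col) for col in zip(*Z)]


# Toy grid (invented numbers): p = 3 latitudes, q = 4 longitudes.
Z = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 2.0, 2.0, 2.0],
    [0.0, 4.0, 0.0, 4.0],
]

var_lat = variance_by_latitude(Z)   # length p
var_lon = variance_by_longitude(Z)  # length q
```

The fitted counterparts are not computed here, since they require the fitted covariance model, which this sketch does not reproduce.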
Now, $Z_1$ denotes the temperature process and $Z_2$ denotes the precipitation process.
(E): For empirical values, at each latitude $L_j$, we calculate the sample correlation coefficient of
the two vectors, $\{Z_1(L_j, l_1), \ldots, Z_1(L_j, l_q)\}^T$ and $\{Z_2(L_j, l_1), \ldots, Z_2(L_j, l_q)\}^T$. Fitted values are
calculated similarly to (A) and (C).
(F): For empirical values, at each longitude $l_i$, we calculate the sample correlation coefficient
of the two vectors, $\{Z_1(L_1, l_i), \ldots, Z_1(L_p, l_i)\}^T$ and $\{Z_2(L_1, l_i), \ldots, Z_2(L_p, l_i)\}^T$.
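The empirical cross-correlations in (E) and (F) can be sketched in the same way. Again, this is a minimal illustration under assumptions rather than the code behind Fig. 1: both fields are assumed to be stored as lists of rows on the same grid (one row per latitude, one column per longitude), the function names are hypothetical, and the toy values are invented.

```python
from math import sqrt


def sample_correlation(x, y):
    """Pearson sample correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_by_latitude(Z1, Z2):
    """Empirical values for (E): correlate the two fields along each latitude row."""
    return [sample_correlation(r1, r2) for r1, r2 in zip(Z1, Z2)]


def correlation_by_longitude(Z1, Z2):
    """Empirical values for (F): correlate the two fields along each longitude column."""
    return [sample_correlation(c1, c2) for c1, c2 in zip(zip(*Z1), zip(*Z2))]


# Toy fields (invented numbers): 3 latitudes by 4 longitudes.
Z1 = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 3.0, 2.0, 5.0]]
Z2 = [[2.0, 4.0, 6.0, 8.0], [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]]

cor_lat = correlation_by_latitude(Z1, Z2)   # one value per latitude
cor_lon = correlation_by_longitude(Z1, Z2)  # one value per longitude
```

A row or column that is constant in either field has zero sample variance, so the correlation is undefined there and this sketch would raise a `ZeroDivisionError`; real data would need that case handled explicitly.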