arXiv:1705.05618v1 [stat.ME] 16 May 2017

Robust functional regression model for marginal mean and subject-specific inferences

Chunzheng Cao¹, Jian Qing Shi², and Youngjo Lee*³

¹ School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
² School of Mathematics & Statistics, Newcastle University, UK
³ Department of Statistics, Seoul National University, Seoul, Korea

May 17, 2017
Abstract

We introduce flexible robust functional regression models using various heavy-tailed processes, including a Student t-process. We propose efficient algorithms for estimating parameters for marginal mean inference and for predicting conditional means, as well as interpolation and extrapolation, for subject-specific inference. We develop bootstrap prediction intervals for conditional mean curves. Numerical studies show that the proposed model provides robust analysis against data contamination or distribution misspecification, and that the proposed prediction intervals maintain the nominal confidence levels. A real data application is presented as an illustrative example.

Keywords: Bootstrap, dose-response, EM algorithm, functional data analysis, outliers, prediction interval, robust, heavy-tailed process
* Corresponding author: [email protected]
1 Introduction
Functional regression models are often used to analyze medical data. However, conclusions from functional regression models can be sensitive to the presence of outliers or misspecification of model assumptions. Moreover, prediction intervals (PIs) for subject-specific inferences have not been well developed. In this paper, we propose robust functional regression models and study predictions of individual response curves and their prediction intervals.
The Gaussian process (GP) has been widely used to fit an unknown regression function describing the nonparametric relationship between a response variable and a set of multi-dimensional covariates.(Rasmussen and Williams, 2006) A variety of covariance functions provides flexibility in fitting data with different degrees of smoothness and nonlinearity. Shi et al. (2007) introduced GP functional regression (GPFR) models, which fit the mean structure and the covariance structure simultaneously. This makes it possible to estimate a subject-specific regression curve consistently. The idea was further extended to model nonlinear random-effects and applied to construct patient-specific dose-response curves.(Shi et al., 2012) Recent applications of GPFR models include the single-index model (Gramacy and Lian, 2012) and models of non-Gaussian functional data.(Wang and Shi, 2014)
In GPFR models, the mean structure describes marginal mean curves using information from all subjects, while the covariance structure captures subject-specific characteristics. However, as we shall see, GPFR models are not robust: they are sensitive to misspecification of distributions and to the presence of outliers. In this paper, we propose to use heavy-tailed processes (HPs) to overcome these drawbacks of GPs. Specifically, we will use scale mixtures of GP (SMGP),(Rasmussen and Williams, 2006) an extension of the scale mixtures of normal (SMN) distribution.(Andrews and Mallows, 1974) The latter is a subclass of the elliptical distribution family, including the Student-t, slash, exponential power, contaminated-normal and other distributions. SMN distributions have been used in various models, including nonlinear mixed-effects models,(Lachos et al., 2011; Meza et al., 2012) measurement error models,(Cao et al., 2015; Blas et al., 2016) and functional models,(Zhu et al., 2011; Osorio, 2016) and have been extended to double hierarchical generalized linear models.(Lee et al., 2006) Similarly, the SMGP includes many different heavy-tailed stochastic processes such as the Student t-process. The SMGP inherits most of the good features of the GP: its covariance structure is easy to understand, and it can use many different covariance functions to allow a flexible model.
The SMGP includes the GP as a special case. Moreover, all SMGPs except the GP are HPs, and they provide a robust analysis against distributional misspecification or data contamination.
In this paper, we extend the GPFR model (Shi et al., 2007, 2012) to a HP functional regression (HPFR) model for robust analysis. For maximum-likelihood estimators (MLEs), we develop an efficient EM algorithm for the proposed HPFR models. McCulloch and Neuhaus (2011) investigated the impact of distribution misspecification in linear and generalized linear models, and showed that the overall accuracy of prediction is not heavily affected by mild-to-moderate violations of the distribution assumptions. However, the improvement from using HPFR models becomes significant when the data are contaminated by outliers. A comprehensive simulation study is given in Section 5.
Existing PIs for subject-specific inferences are often liberal, as we shall show later. Lee and Kim (2016) studied PIs for random-effect models, and we extend them to HPFR models. We show by numerical studies that they maintain the nominal confidence level (NCL).
The rest of this paper is organized as follows. In Section 2, we define the HPFR model along with a brief introduction to the GPFR model. In Section 3, we apply the HPFR model to analyze the renal anaemia data. We demonstrate that the proposed HPFR model provides a robust analysis against outliers and therefore avoids a misleading conclusion on the dose-response relation. In Section 4, the estimation and prediction procedures are described and the information consistency of subject-specific response prediction is shown. In Section 5, a simulation study is presented to evaluate the performance of the HPFR model and the proposed procedures. Concluding remarks and discussion are given in Section 6. All technical details are provided in the supplementary materials.
2 Model

2.1 The GPFR model

Consider a mixed-effects concurrent GPFR model,(Shi et al., 2012) defined as:
\[
\begin{aligned}
y_m(t) &= \mu_m(t) + \tau_m(t) + \varepsilon_m(t),\\
\mu_m(t) &= v_m^\top(t)\gamma + u_m^\top \beta(t),\\
\tau_m(t) &= w_m^\top(t) b_m + \zeta_m(x_m(t)),
\end{aligned}
\tag{1}
\]
where y_m(t) is the response curve for the m-th subject, depending on covariates u_m of dimension p_u and functional covariates v_m(t), w_m(t) and x_m(t) of dimensions p_v, p_w and p_x, respectively. The model is composed of three parts: the marginal mean (related to the so-called population average) (Diggle et al., 1996)
\[
E[y_m(t)] = \mu_m(t),
\]
the random-effect τ_m(t) for the m-th subject, giving the conditional (or so-called subject-specific) mean
\[
E[y_m(t) \mid b_m, \zeta_m(x_m(t))] = \mu_m(t) + \tau_m(t),
\]
and the random error ε_m(t). In this paper, we show how to make marginal and conditional inferences based on functional models such as the above.(Lee and Nelder, 2004)
In the marginal mean μ_m(t), v_m^⊤(t)γ is an ordinary linear regression model with functional covariates v_m(t) and regression coefficients γ, while u_m^⊤ β(t) is proposed (Ramsay and Silverman, 2005) for nonparametric functional estimation using covariates u_m with unknown functional coefficients β(t). Here the p_u-dimensional functional coefficient β(t) can be approximated by a set of basis functions. In this paper, we use B-splines, i.e., β(t) = B^⊤ Φ(t), in which B is a D × p_u matrix with elements β_ij, and Φ(t) = (Φ_1(t), ..., Φ_D(t))^⊤ are the B-spline basis functions.
In the random-effects τ_m(t), w_m^⊤(t) b_m is an ordinary linear random-effects model with functional covariates w_m(t) and random-effects b_m ∼ N_{p_w}(0, Σ_b), with, for simplicity, Σ_b a diagonal matrix with elements φ_b = (φ_1, ..., φ_{p_w})^⊤, while ζ_m(x_m(t)) is a functional (non-linear) random-effect modeled by a GP with zero mean and covariance kernel C(x_m(t), x_m(t'); θ).
A common choice for this kernel is the squared exponential function
\[
C(x_m(t), x_m(t'); \theta) = v_0 \exp\Big\{-\frac{1}{2}\sum_{k=1}^{p_x} w_k \big(x_{m,k}(t) - x_{m,k}(t')\big)^2\Big\}.
\tag{2}
\]
Other choices of kernel have been studied.(Rasmussen and Williams, 2006; Shi and Choi, 2011) Linear random-effects can provide a clear physical explanation of the relation between the response and the covariates, and can indicate which variables cause the variation among different subjects. The unexplained part can be modeled by the functional random-effects. Note that μ_m(t) is a single function describing the marginal means of all subjects, while the random functions τ_m(t) allow different functional and nonparametric effects for each subject, capturing subject-specific characteristics.
The random error ε_m(t) follows N(0, φ_ε) and is independent at different t. For convenience, we can absorb it into the random-effects via
\[
\tilde{\tau}_m(t) = \tau_m(t) + \varepsilon_m(t).
\]
Then τ̃_m(t) is a GP with zero mean and covariance kernel
\[
\tilde{C}(x_m(t), x_m(t'); w_m(t), w_m(t'); \theta, \phi_b, \phi_\varepsilon)
= C(x_m(t), x_m(t'); \theta) + \sum_{k=1}^{p_w} \phi_k\, w_{m,k}(t)\, w_{m,k}(t') + \phi_\varepsilon\, \delta(t, t'),
\tag{3}
\]
where δ(·, ·) is the Kronecker delta function. Thus, under the GPFR model, y_m(t) is a GP with mean μ_m(t) and covariance kernel C̃.
2.2 The HPFR model
In this paper, we show that the GPFR model is sensitive to the presence of outliers and to misspecification of distributional assumptions, so inferences based on the GPFR model can be misleading. We show how to perform robust analysis for marginal mean and subject-specific inferences using HPFR models.
Given a latent variable r_m, suppose that the conditional process τ̃_m(t) follows a GP with zero mean, but with covariance
\[
\kappa(r_m)\, \tilde{C}(x_m(t), x_m(t'); w_m(t), w_m(t'); \theta, \phi_b, \phi_\varepsilon),
\tag{4}
\]
where C̃(·, ·) is the covariance function given in (3), κ(·) is a strictly positive function, and the latent random variable r_m takes positive values with cumulative distribution function (cdf) H(·; ν) and probability density function (pdf) h(·; ν), with ν being the degrees of freedom. The properties of this process depend on the choice of κ(·) and H(·; ν). When κ(·) ≡ 1, it degenerates to the GP.
Lange and Sinsheimer (1993) studied the SMN distribution with κ(r_m) = 1/r_m. Rasmussen and Williams (2006) called the process with conditional covariance kernel (4) an SMGP; in this paper we call it an HP to highlight that it can be applied to a wider class of data using double hierarchical generalized linear models.(Lee et al., 2006) If r ∼ Gamma(ν/2, ν/2), it becomes a Student-t (T) process; if r ∼ Beta(ν, 1), it becomes a slash (SL) process; and if P(r = γ) = ν, P(r = 1) = 1 − ν with 0 < ν ≤ 1, 0 < γ ≤ 1, it is a contaminated-normal (CN) process. We call model (1) under this HP structure a mixed-effects HPFR model (abbreviated as 'HPFR'). HPFR models cover existing heavy-tailed mixed-effects models,(Pinheiro et al., 2001; Savalli et al., 2006) a robust P-splines model (Osorio, 2016), extended T-process regression models,(Wang et al., 2016) etc.
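Under κ(r) = 1/r, the three heavy-tailed processes differ only in how the latent scale r_m is drawn; a path of τ̃_m can then be sampled as a GP path rescaled by 1/√r_m. A minimal sketch (helper names are ours; the small jitter term is a numerical convenience, not part of the model):

```python
import numpy as np

def draw_scale(process, nu, gamma=0.1, rng=None):
    """Draw the latent mixing variable r for one subject.
    With kappa(r) = 1/r, the covariance given r is C_tilde / r (equation (4))."""
    rng = rng or np.random.default_rng()
    if process == "T":        # Student-t: r ~ Gamma(nu/2, rate nu/2), so E[r] = 1
        return rng.gamma(nu / 2.0, 2.0 / nu)
    if process == "SL":       # slash: r ~ Beta(nu, 1)
        return rng.beta(nu, 1.0)
    if process == "CN":       # contaminated normal: r = gamma with prob. nu
        return gamma if rng.random() < nu else 1.0
    return 1.0                # "N": Gaussian process

def sample_hp_path(C_tilde, process="T", nu=4.0, rng=None):
    """Sample one subject's tau_tilde at the points underlying C_tilde."""
    rng = rng or np.random.default_rng()
    r = draw_scale(process, nu, rng=rng)
    n = C_tilde.shape[0]
    L = np.linalg.cholesky(C_tilde / r + 1e-10 * np.eye(n))  # jitter for stability
    return L @ rng.standard_normal(n)
```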
3 An illustrative example: the renal anaemia data
West et al. (2007) studied the renal anaemia data, which contain 74 dialysis patients who received Darbepoetin alfa (DA) to treat their renal anaemia. The Hemoglobin (Hb) levels and epoetin doses were recorded over the original study period of 9 months with a further 3-month extension. The doses of epoetin DA were determined by a strict clinical decision support system (Tolman et al., 2005; Will et al., 2007) to maintain the Hb level around 11.8 g/dl, the target for dialysis patients. A lower Hb level leads the patient to anaemia, while a higher level increases the risk of prothrombotic and other problems. The experiment is a typical dose-response study evaluating the control of Hb levels with the agent DA. The GPFR model has been used to analyze the data.(Shi et al., 2012) The response is the Hb level, and the dose of DA and time are considered as predictors.
Figure 1(1) displays the Hb levels measured for each patient over a period of 12 months, and Figure 1(3) gives the histogram of the dosages of DA for all patients. Since the distribution of dosage dose_m(t) (for the m-th patient at month t) is quite skewed, we use the log-transformation log[10 dose_m(t) + 1] to reduce the impact of extreme values. As we see in Figure 1(4), the transformed dose values, apart from the zeros, are nearly normal. Taking into account a certain lag in the drug reaction, Shi et al. (2012) chose dose_m(t − 2), i.e., the dosage of DA taken two months before, as a key covariate. Following Shi et al. (2012), we use a linear regression model for the marginal mean
\[
\mu_m(t) = \gamma_0 + \gamma_1 t + \gamma_2\, \mathrm{dose}_m(t-2) + \gamma_3\, \mathrm{dose}_m(t-2)^2,
\]
and random-effects
\[
\tau_m(t) = b_m + \zeta_m(x_m(t)),
\]
with x_m(t) = [t, dose_m(t − 2)]^⊤. Figure 1(2) shows the index plot of the Mahalanobis distance (defined later) of each patient under the GPFR model, which theoretically follows a χ²-distribution. Four patients (Nos. 32, 52, 58 and 62) have large Mahalanobis distances beyond the χ²₀.₀₁ quantile cutoff line; these are recognized as potential outliers.
Figure 1: Renal data: (1) the Hb levels for the 74 patients; (2) Mahalanobis distance under the GPFR model; (3) histogram of dose values before transformation; (4) histogram of dose values after transformation.
3.1 The population-average inferences
Table 1 reports the MLEs of the coefficients for the marginal mean and their standard errors. Here 'N' stands for the GPFR model. The degrees-of-freedom parameters for the three heavy-tailed models are estimated together with the other parameters. In Table 1, the negative coefficient of t means that the patients show a decreasing trend in Hb level over time when dose effects are not considered. Note that the values of the Bayesian information criterion (BIC) (Schwarz, 1978) and the standard errors (the second value for each covariate in the fixed effects) for the three heavy-tailed models are often smaller than those for the GPFR model, and the model using the T-process (TPFR) is the best choice under BIC. We also report the root mean squared error (RMSE) of the predictions for all patients in Table 1. The smaller RMSE values under the three HPFR models confirm the better performance of the heavy-tailed models in fitting the data. The third value for each covariate in the fixed effects in Table 1 is the relative change ratio (%) of the estimate after removing the data from the four potential outlier patients. We find that the change ratios of the estimates under the three HPFR models are much smaller than those under the GPFR model, which indicates the robustness of parameter estimation using the heavy-tailed models.
Table 1: Parameter estimation of the renal anaemia data.

Model  Degree                   BIC   RMSE   Constant             t                    dose_m(t−2)        dose_m(t−2)²
N      /                        2136  0.416  11.500/0.310/−1.4    −0.036/0.022/−32.9   0.863/0.313/−6.1   −0.196/0.103/−22.1
T      ν̂ = 7.106               2088  0.393  11.426/0.297/−0.4    −0.035/0.021/−14.7   0.749/0.304/−0.6   −0.097/0.108/2.8
SL     ν̂ = 1.723               2092  0.391  11.451/0.299/−0.5    −0.035/0.021/−15.2   0.746/0.308/0.3    −0.106/0.109/2.9
CN     ν̂ = 0.424, γ̂ = 0.326   2101  0.392  11.464/0.285/−0.6    −0.040/0.020/−17.8   0.744/0.267/−3.3   −0.094/0.099/−11.7

RMSE: root mean squared error of the predictions.
The three values for each covariate in the fixed effects are, in turn, the estimate, its standard error, and the relative change ratio (%) of the estimate after removing the outliers.
Figure 2 plots the marginal mean curve of the Hb level as a function of dose under the four models. The solid and dashed lines are based, respectively, on the complete data from all 74 patients and on the data with the four outlying patients removed. For the GPFR model, the Hb level increases when the dose is less than 0.8 and decreases slowly thereafter; however, this turning point changes from 0.8 to 1.3 after deleting the data from the four patients. Under all the HPFR models, the marginal mean keeps increasing over the whole interval, with the rate of increase slowing at large dosages. This is a more realistic dose-response curve given the current understanding of the dose-response relation. Compared with the GPFR model, the curve shapes under the three HPFR models change much less after deleting the outliers, which demonstrates the robustness of the heavy-tailed models for population-average inferences.
Figure 2: Mean curve of dose response based on the original data (solid line) and the data without outliers (dashed line), under the four models: (1) N; (2) T; (3) SL; (4) CN.
3.2 The subject-specific inferences
We now study the patient-specific dose-response curves. We use the data from the first 12 months and predict (extrapolate) the Hb level at the 14-th month for dosages from 0 to 2.5. Figure 3 shows the predicted patient-specific dose-response curves under the GPFR and TPFR models for two typical patients. For the first patient, the dose-response curve under the GPFR model does not seem realistic, because it cannot achieve the target value of 11.8 even if we increase the dosage. The TPFR curve, however, reaches the target when the dosage is increased to 1.75: this is the dosage the patient should take at Month 12 to maintain the Hb level around 11.8 at Month 14. For the second patient, the dose-response curve under the GPFR model suggests that the Hb level reaches the target at dose 1 and stays at the same level even when the dosage increases. The TPFR model achieves a more reasonable dose-response relation: the Hb level reaches the target at dose 0.75 and keeps increasing as the dosage increases. We also plotted the PIs for the patient-specific curves. Note that the narrowest interval occurs around the actual dosages the patients took in the previous month. This is because the covariance kernel used in the stochastic process depends on the difference in dosages between consecutive months, so the prediction has less uncertainty in the neighbourhood of the previous dosage. We will show later how to construct the PIs and show that the proposed PIs maintain the NCLs.
Figure 3: Dose-response of patient-specific prediction for the first selected patient (upper panels) and the second patient (lower panels) under the GPFR (N) and TPFR (T) models. The solid lines are predictions; the dashed lines are 95% predictive bands.
4 Inference methods
Suppose that we have data from M different subjects, and all functional covariates are observed at points t_m = (t_{m1}, ..., t_{m,n_m})^⊤ for the m-th subject. Let y_m = (y_{m1}, ..., y_{m,n_m})^⊤, let V_m be the n_m × p_v matrix with i-th row v_m^⊤(t_{mi}), and let X_m and W_m be matrices defined in the same way as V_m. Consider the following HPFR model
\[
y_m = \Phi_m B u_m + V_m \gamma + W_m b_m + \zeta_m + \varepsilon_m,
\tag{5}
\]
where Φ_m is an n_m × D matrix with i-th row Φ^⊤(t_{mi}), and
\[
\begin{aligned}
\varepsilon_m \mid r_m &\overset{\text{ind}}{\sim} N_{n_m}\!\big(0, \kappa(r_m)\phi_\varepsilon I_{n_m}\big),\\
b_m \mid r_m &\overset{\text{ind}}{\sim} N_{p_w}\!\big(0, \kappa(r_m)\Sigma_b\big),\\
\zeta_m \mid r_m &\overset{\text{ind}}{\sim} N_{n_m}\!\big(0, \kappa(r_m) C_m\big),\\
r_m &\overset{\text{iid}}{\sim} H(r; \nu), \quad m = 1, \ldots, M,
\end{aligned}
\tag{6}
\]
where C_m is an n_m × n_m covariance matrix with (i, j)-th element C(x_m(t_i), x_m(t_j); θ). Then y_m follows an SMN distribution,(Andrews and Mallows, 1974; Fang et al., 1990) i.e., SMN(μ_m, Σ_m; H), where μ_m = Φ_m B u_m + V_m γ and
\[
\Sigma_m = C_m + W_m \Sigma_b W_m^\top + \phi_\varepsilon I_{n_m}.
\]
4.1 Parameter estimation
Denote the parameter set {B, γ, θ, φ_b, φ_ε, ν} by Θ. Let D_c = {D_m, r_m, m = 1, ..., M} be the complete data, where D_m = {y_m, u_m, V_m, W_m, X_m, t_m} is the observed data for the m-th subject. For convenience, we separate Θ into β = (Vec(B)^⊤, γ^⊤)^⊤, ψ = (θ^⊤, φ_b^⊤, φ_ε)^⊤ and ν, where ν is the degrees-of-freedom parameter.
The log-likelihood of the complete data is given by
\[
l(\Theta \mid D_c) = -\frac{1}{2}\sum_{m=1}^{M} \log|\Sigma_m| - \frac{1}{2}\sum_{m=1}^{M} n_m \log \kappa(r_m) + \sum_{m=1}^{M} \log h(r_m; \nu)
- \frac{1}{2}\sum_{m=1}^{M} \kappa^{-1}(r_m)\,(y_m - A_m\beta)^\top \Sigma_m^{-1} (y_m - A_m\beta),
\tag{7}
\]
where A_m = [u_m^⊤ ⊗ Φ_m, V_m] satisfies μ_m = A_m β. For the MLEs of Θ, we propose the EM algorithm below.

E-step: Given the current value Θ^{(k)}, we calculate the Q-function E[l(Θ | D_c) | Θ^{(k)}, D_m, m = 1, ..., M], which is proportional to
\[
Q(\Theta \mid \Theta^{(k)}) = Q_1(\beta, \psi \mid \Theta^{(k)}) + Q_2(\nu \mid \Theta^{(k)}),
\tag{8}
\]
with
\[
Q_1(\beta, \psi \mid \Theta^{(k)}) = -\frac{1}{2}\sum_{m=1}^{M} \log|\Sigma_m| - \frac{1}{2}\sum_{m=1}^{M} \pi_m^{(k)}\,(y_m - A_m\beta)^\top \Sigma_m^{-1} (y_m - A_m\beta),
\tag{9}
\]
\[
Q_2(\nu \mid \Theta^{(k)}) = \sum_{m=1}^{M} E\big[\log h(r_m; \nu) \mid \Theta^{(k)}, D_m\big],
\tag{10}
\]
where
\[
\pi_m^{(k)} = E\big[\kappa^{-1}(r_m) \mid \Theta^{(k)}, D_m\big].
\tag{11}
\]
In the HPs, the weight π_m is inversely proportional to the Mahalanobis distance d_m = (y_m − μ_m)^⊤ Σ_m^{-1} (y_m − μ_m) (see supplementary materials). In the presence of outliers or model misspecification, d_m increases, leading to a smaller weight and hence a robust analysis.
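For the Student-t process with κ(r) = 1/r and r ∼ Gamma(ν/2, ν/2), this conditional expectation has the well-known closed form π_m = (ν + n_m)/(ν + d_m), so the weight shrinks as the Mahalanobis distance grows. A sketch (helper names are ours):

```python
import numpy as np

def mahalanobis(y, mu, Sigma):
    """d_m = (y - mu)^T Sigma^{-1} (y - mu)."""
    resid = y - mu
    return float(resid @ np.linalg.solve(Sigma, resid))

def t_weight(y, mu, Sigma, nu):
    """E-step weight pi_m = E[r_m | y_m] for the Student-t process
    (r ~ Gamma(nu/2, nu/2)): pi_m = (nu + n_m) / (nu + d_m)."""
    d = mahalanobis(y, mu, Sigma)
    return (nu + len(y)) / (nu + d)
```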
The likelihood (7) is the h-likelihood of Lee and Nelder (1996), so we can estimate the parameters by the h-likelihood method. Given Θ, maximizing the conditional likelihood p_Θ(r_m | y_m) with respect to r_m is equivalent to maximizing the h-likelihood L(Θ, r_m; y_m, r_m), which has the explicit form (7). To obtain the MLEs of Θ in the h-likelihood approach, the Laplace approximation has been proposed.(Lee et al., 2006) However, the EM algorithm is more efficient for the HPFR model, because the E-step (11) is straightforward. Following ECM (Meng and Rubin, 1993) and ECME,(Liu and Rubin, 1994) we implement the EM algorithm as follows.
CM-step: We maximize (8) with respect to Θ to obtain the updated parameters given Θ^{(k)}. We propose the following sub-iterative process:

1. Set the least squares estimate of β as the initial value β^{(0)}, under the assumption Σ_m = I_m and π_m = 1 for m = 1, ..., M;

2. Obtain the initial value ψ^{(0)} by maximizing the Q-function in equation (9) under β^{(0)} and π_m = 1;

3. Choose a sensible initial value ν^{(0)};

4. Calculate the weights π_m (m = 1, ..., M) under the current values of β, ψ and ν;

5. Update β under the current values of π_m and ψ by
\[
\beta^{(k+1)} = \Big(\sum_{m=1}^{M} \pi_m^{(k)} A_m^\top \Sigma_m^{-1} A_m\Big)^{-1} \sum_{m=1}^{M} \pi_m^{(k)} A_m^\top \Sigma_m^{-1} y_m \Big|_{\psi^{(k)}};
\tag{12}
\]

6. Update ψ by maximizing the Q-function in equation (9) under the current values of β and π_m;

7. Given the current values of β and ψ, update ν by maximizing the constrained actual marginal log-likelihood function (Liu and Rubin, 1994) of {y_m, m = 1, ..., M} over ν.
Ideally, steps 4 to 7 are repeated until convergence. In practice, we can set a threshold and stop the sub-iteration when ‖Θ^{(k+1)} − Θ^{(k)}‖ is smaller than the threshold. Step 6 is analogous to its counterpart under GP assumptions, after transforming the residual e_m = y_m − A_m β into ẽ_m = √π_m (y_m − A_m β). Following the ECME algorithm (Liu and Rubin, 1994), step 7 does not maximize the expected log-likelihood (10) but rather the actual log-likelihood over ν, which yields much faster convergence. The actual log-likelihood of {y_m, m = 1, ..., M} as well as the score function of Θ under some SMN distributions are given in the supplementary materials.
Asymptotic confidence intervals for the MLEs of this model can be obtained through the observed information matrix J(Θ̂) or the expected information matrix I(Θ̂) (see supplementary materials). Here we estimate the degrees of freedom ν together with β and ψ. When the sample size is small, the estimate of ν may not be reliable. In this case, we may fix its value, e.g., choose ν = 4 for the Student-t,(Lange et al., 1989) or choose ν by BIC or by cross-validation.
4.2 Prediction
Suppose that we want to predict the responses y* = (y*_1, ..., y*_n)^⊤ of a new (M + 1)-th subject at points t* = (t*_1, ..., t*_n)^⊤. Besides the data from the M subjects, suppose that we have n_{M+1} observations for the new (M + 1)-th subject at points t_{M+1} = (t_{M+1,1}, ..., t_{M+1,n_{M+1}})^⊤. Let D_{M+1} = {y_{M+1}, u_{M+1}, V_{M+1}, W_{M+1}, X_{M+1}, t_{M+1}} and D = {D_m | m = 1, ..., M + 1} be the observed data.

We rewrite model (5) as
\[
y_m = \mu_m + \tilde{\tau}_m,
\tag{13}
\]
with τ̃_m ind∼ SMN(0, Σ_m; H) for m = 1, ..., M + 1. For the (M + 1)-th subject, let τ̃* be the random-effects at the unobserved points t*, and τ̃_{M+1} the random-effects at the observed points t_{M+1}. Thus, we have
\[
\big(\tilde{\tau}_{M+1}^\top, \tilde{\tau}^{*\top}\big)^\top \,\big|\, r_{M+1} \sim N_{n_{M+1}+n}\big(0, \kappa(r_{M+1})\,\tilde{\Sigma}_{M+1}\big),
\tag{14}
\]
where
\[
\tilde{\Sigma}_{M+1} = \begin{pmatrix} \Sigma_{M+1} & \Sigma_{M+1}^* \\ \Sigma_{M+1}^{*\top} & \Sigma^* \end{pmatrix},
\tag{15}
\]
with Σ* and Σ*_{M+1}, respectively, the covariance matrices of the new subject with elements evaluated from the covariance function (2) at point pairs (t*_j, t*_k) and (t_i, t*_k) for i ∈ {1, ..., n_{M+1}} and j, k ∈ {1, ..., n}. Then, the conditional expectation of y* given D_{M+1} and Θ is
\[
\begin{aligned}
E(y^* \mid D_{M+1}; \Theta) &= E(\mu^* + \tilde{\tau}^* \mid D_{M+1}; \Theta)\\
&= \mu^* + E\big[E(\tilde{\tau}^* \mid r_{M+1}, D_{M+1}; \Theta)\big]\\
&= \mu^* + \Sigma_{M+1}^{*\top} \Sigma_{M+1}^{-1} (y_{M+1} - \mu_{M+1}),
\end{aligned}
\tag{16}
\]
where μ* and μ_{M+1} are, respectively, the marginal mean curves of the new subject at points t* and t_{M+1}.
The conditional variance of y* given D_{M+1} and Θ is
\[
\begin{aligned}
\mathrm{Var}(y^* \mid D_{M+1}; \Theta) &= E(y^* y^{*\top} \mid D_{M+1}; \Theta) - E(y^* \mid D_{M+1}; \Theta)\,E(y^{*\top} \mid D_{M+1}; \Theta)\\
&= E\big[\mathrm{Var}(y^* \mid r_{M+1}, D_{M+1}; \Theta)\big]\\
&= E[\kappa(r_{M+1}) \mid D_{M+1}; \Theta]\,\big(\Sigma^* - \Sigma_{M+1}^{*\top} \Sigma_{M+1}^{-1} \Sigma_{M+1}^*\big).
\end{aligned}
\tag{17}
\]
For GP models, PIs have been studied assuming a normal distribution with the first two moments (16) and (17).(Wang and Shi, 2014) However, the predictive distribution p(y* | D_{M+1}; Θ̂) for the conditional mean under HPFR models may not be normal; for instance, for the T-process with low degrees of freedom, the normal approximation may not be appropriate. Thus, we should compute the quantiles based on the true predictive distribution for better prediction. However, the resulting PIs would still be liberal, because they do not account for the uncertainty in estimating the parameters.(Bjørnstad, 1990, 1996) We propose the following parametric bootstrap method based on a finite-sample adjustment to remedy this drawback:
1. Generate Θ*_j (j = 1, ..., J) from the asymptotic distribution N(Θ̂, J^{-1}(Θ̂));

2. Approximate the predictive distribution of y* by
\[
p(y^* \mid D) = \int p(y^* \mid D_{M+1}; \hat{\Theta})\, p(\hat{\Theta} \mid D)\, d\hat{\Theta}
\approx \frac{1}{J}\sum_{j=1}^{J} p(y^* \mid D_{M+1}; \Theta_j^*).
\tag{18}
\]
Note that Step 2 involves generating data from p(y* | D_{M+1}; Θ*_j), which may not be straightforward in some cases. This problem can be addressed by augmenting the latent variable r_{M+1} in the process. We first generate r_{M+1} from the conditional distribution p(r_{M+1} | D_{M+1}; Θ*_j), which is given in the supplementary materials for some distributions. Given D_{M+1} and r_{M+1}, it is straightforward to generate y*, since its conditional distribution is Gaussian:
\[
y^* \mid r_{M+1}, D_{M+1}, \Theta_j^* \sim N_n\big(\tilde{\mu}^*, \kappa(r_{M+1})\,\tilde{\Sigma}^*\big),
\tag{19}
\]
with μ̃* = μ* + Σ*^⊤_{M+1} Σ^{-1}_{M+1} (y_{M+1} − μ_{M+1}) and Σ̃* = Σ* − Σ*^⊤_{M+1} Σ^{-1}_{M+1} Σ*_{M+1}, evaluated at Θ*_j. Hence, we can obtain the prediction as well as pointwise PIs of y* using sample quantiles calculated from the bootstrap replications.
The bootstrap method can also be used to calculate other quantities, such as the subject-specific random-effects. It can be considered an extension of the PIs of Lee and Kim (2016) for random-effects to curve predictions.
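The two bootstrap steps can be sketched as follows; `predictive_draw` is a hypothetical user-supplied callback that returns one draw of y* from p(y* | D_{M+1}; Θ*_j), e.g. via the latent-variable augmentation in (19):

```python
import numpy as np

def bootstrap_pi(theta_hat, cov_theta, predictive_draw, J=2000, level=0.95,
                 rng=None):
    """Parametric bootstrap PI: draw Theta*_j ~ N(theta_hat, J^{-1}(theta_hat)),
    draw y* from p(y* | D_{M+1}; Theta*_j) via predictive_draw, and take
    pointwise sample quantiles, approximating equation (18)."""
    rng = rng or np.random.default_rng()
    draws = []
    for _ in range(J):
        theta_j = rng.multivariate_normal(theta_hat, cov_theta)
        draws.append(predictive_draw(theta_j, rng))   # one y* curve per draw
    draws = np.asarray(draws)
    lo, hi = np.quantile(draws, [(1 - level) / 2, (1 + level) / 2], axis=0)
    return lo, hi
```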
4.3 Information consistency
Seeger et al. (2008) proved information consistency, i.e., that the prediction based on a GP model can be consistent with the true curve; this has been extended to the generalized GPFR model (Wang and Shi, 2014) and the T-process regression model.(Wang et al., 2016) We now discuss information consistency for the HPFR model.
For convenience, we omit the subscript m here and denote the data by y_n = (y_1, ..., y_n)^⊤ at points t_n = (t_1, ..., t_n)^⊤, with all the corresponding covariates in the random-effects denoted X_n = (x_1, ..., x_n)^⊤, where the x_i ∈ X ⊂ R^{p_x} are independently drawn from a distribution U(x). Let p(y_n | τ_0, X_n) be the density function of y_n given τ_0 and X_n, where τ_0(·) is the true underlying function, so that the true mean of y_i is μ(t_i) + τ_0(x_i). Let
\[
p_\theta(y_n \mid X_n) = \int_{\mathcal{F}} p(y_n \mid \tau, X_n)\, dp_\theta(\tau)
\]
be the density of y_n given X_n based on the HPFR model, where p_θ(τ) is a measure of the stochastic process τ(·) on the space F = {τ(·): X → R}, and θ contains all the parameters in the covariance function of τ(·). Denoting by D[p_1, p_2] = ∫ (log p_1 − log p_2) dp_1 the Kullback-Leibler divergence between p_1 and p_2, we have the following result.
Theorem 1. Assume the following conditions:

(C1) the covariance kernel function C(·, ·; θ) is continuous in θ, and θ̂ → θ almost surely as n → ∞;

(C2) the reproducing kernel Hilbert space (RKHS) norm ‖τ_0‖_c is bounded;

(C3) the expected regret term satisfies E_{X_n}(log |I_n + φ^{-1} C_n|) = o(n).

Then
\[
\frac{1}{n} E_{X_n}\Big( D\big[p(y_n \mid \tau_0, X_n),\, p_{\hat{\theta}}(y_n \mid X_n)\big] \Big) \to 0 \quad \text{as } n \to \infty,
\tag{20}
\]
where p_{θ̂}(y_n | X_n) is the estimated density of y_n under the HPFR model and E_{X_n} denotes expectation under the distribution of X_n.
The proof is in the supplementary materials. Note that p(y_n | τ_0, X_n) is the density of the true model and p_{θ̂}(y_n | X_n) is that of the working model, so the result establishes information consistency, i.e., the density of the working model converges to that of the true model. The estimation of the mean structure μ_m(t) is based on all independent subjects and has been proved consistent in many functional models.(Ramsay and Silverman, 2005; Yao et al., 2005; Li and Hsing, 2010) Under the working model (5) for the observations, the consistency of the estimators Θ̂ holds easily, since they are essentially MLEs of a longitudinal model. The expected regret term E_{X_n}(log |I_n + φ^{-1} C_n|) is of order o(n) for some widely used covariance kernels.(Seeger et al., 2008)
5 Simulation studies
We now investigate the performance of the HPFR model in terms of accuracy and robustness for both estimation and prediction. We compare models with the following three types of process: N (GP), T with ν = 4, and SL with ν = 1.3.

The mean curve of the true model is μ(t) = 0.8 sin((0.5t)³) with t equally spaced in [−4, 4]. The random terms τ̃_m(t) are generated from an SMGP by setting x_m(t) = 2.5t with θ^⊤ = (v_0, w) = (0.04, 1) for the nonlinear random-effects, w_m(t) = 0.5t with φ_b = 0.01 for the linear random-effects, and φ_ε = 0.01 for the random error.
To compare the performance of the different models, we generate data for twenty independent subjects under one of the following five schemes: (I) GPFR; (II) TPFR with ν = 4; (III) GPFR with a curve disturbance in the 5-th subject (the amplitude of μ_5(t) is increased from 0.8 to 4); (IV) GPFR with outliers in the 10-th subject (the region {y_10(t) | t ∈ [−1, 1]} is shifted upward by 2 units); (V) the combination of III and IV. We then use the three models (N, T, SL) to fit the data. We estimate the unknown parameters and then calculate the marginal mean curve and the random terms for each subject. The functional coefficients in the fixed mean term are approximated by cubic B-splines with 18 knots equally spaced in [−4, 4], which is suitable for our simulations. In practice, the number of basis functions can be chosen by BIC or other methods.
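The five schemes can be generated as follows; this is our own sketch of the data-generating process described above (function and argument names are ours), using the t-process mixing r ∼ Gamma(ν/2, ν/2) for Scheme II:

```python
import numpy as np

def simulate_subject(n=31, scheme="I", subject_id=1, rng=None, nu=4.0):
    """Generate one subject's curve under the setup of Section 5:
    mu(t) = 0.8 sin((0.5 t)^3); nonlinear GP effect on x(t) = 2.5 t with
    (v0, w) = (0.04, 1); linear effect on w(t) = 0.5 t with phi_b = 0.01;
    noise phi_eps = 0.01."""
    rng = rng or np.random.default_rng()
    t = np.linspace(-4.0, 4.0, n)
    amp = 4.0 if (scheme in ("III", "V") and subject_id == 5) else 0.8
    mu = amp * np.sin((0.5 * t) ** 3)
    x = 2.5 * t
    C = 0.04 * np.exp(-0.5 * np.subtract.outer(x, x) ** 2)           # equation (2)
    cov = C + 0.01 * np.outer(0.5 * t, 0.5 * t) + 0.01 * np.eye(n)   # equation (3)
    r = rng.gamma(nu / 2.0, 2.0 / nu) if scheme == "II" else 1.0     # t mixing
    y = mu + np.linalg.cholesky(cov / r) @ rng.standard_normal(n)
    if scheme in ("IV", "V") and subject_id == 10:
        y[np.abs(t) <= 1.0] += 2.0                                   # outlier jump
    return t, y
```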
5.1 Estimation for population-average inferences
We first study the performances of HPFR models for inferences about the
marginal mean E[y(t)] = µ(t). Table 2 reports the RMSE between µ(t) and
µ
b(t) for nm = 31 and 61 based on fifty replications. Under ‘T’ and ‘SL’ we use
the t-process and slash process with fixed values of ν, while under ‘T1’ and
‘SL1’ we estimate it as well as other parameters. The results under Scheme I
show little difference among three models when the data are generated from
the GPFR model. However, if the data are generated from the TPFR model
(Scheme II), the GPFR model gives a poor fitting. This indicates that HPFR
models provide robust fittings against a misspecification of distribution.
We added two types of outliers respectively in Schemes III and IV, and
both of them in Scheme V. The results obtained from the HPFR models
with heavy-tailed processes are much better than those from the GPFR model,
indicating that the HPFR models are robust against outliers. The performance
of the two HPFR models when ν is estimated (T1 and SL1) is very close to
that of the models with ν fixed (T and SL). Moreover, the choice between T
and SL is not crucial, since their results are quite similar.
Table 2: RMSE between true mean curve and its estimation.

                         nm = 31                                  nm = 61
Scheme    N      T      T1     SL     SL1       N      T      T1     SL     SL1
I         0.055  0.057  0.055  0.056  0.055     0.053  0.055  0.054  0.054  0.053
II        0.076  0.056  0.056  0.056  0.057     0.073  0.055  0.055  0.056  0.056
III       0.111  0.058  0.058  0.057  0.056     0.105  0.056  0.056  0.056  0.055
IV        0.076  0.059  0.059  0.059  0.059     0.075  0.056  0.056  0.056  0.056
V         0.117  0.057  0.057  0.056  0.055     0.116  0.055  0.055  0.055  0.055

T and SL: the t-process and slash process with fixed degrees;
T1 and SL1: the t-process and slash process with estimated degrees.
Figure 4 shows the estimated mean curve together with its 95%
confidence interval under Scheme V with nm = 61, using the GPFR
and TPFR models on one simulated data set. It shows clearly that TPFR
achieves a smaller bias and narrower but accurate confidence intervals for the
marginal mean. Asymptotic normality of µ̂(t) is well established, so these
confidence intervals will be asymptotically correct.
Figure 4: Estimation of the mean curve under Scheme V by using: (1) GPFR; (2) TPFR. Solid line: the
true mean curve; dotted line: the estimated mean curve; shaded area: 95% confidence interval.
5.2 Prediction for subject-specific inferences
We now study inferences about the conditional mean, which involves prediction
of the random-effects τm(t) under the conditional model (1).(Lee and Nelder, 2004)
Table 3 lists the RMSE between the true random-effects τm(t) and their predictions
τ̂m(t) for all subjects under Schemes I and II. Under Schemes III-V, we
calculate the RMSE for all subjects except the one with outliers. The
RMSEs become smaller in all cases as nm increases. This is because the
information about the random-effects is mainly provided by each individual
subject, so the accuracy depends mainly on the sample size of each
individual subject. The two HPFR models outperform the GPFR model
when the data come from TPFR or from GPFR with outliers, which is consistent
with the previous findings.
Table 3: RMSE between true random terms and their predictions.

                  nm = 31                   nm = 61
Scheme    N      T      SL         N      T      SL
I         0.084  0.084  0.084      0.073  0.073  0.073
II        0.120  0.113  0.113      0.105  0.092  0.092
III       0.128  0.084  0.084      0.116  0.073  0.072
IV        0.097  0.087  0.086      0.086  0.073  0.073
V         0.140  0.087  0.087      0.126  0.075  0.075
It is important to make interval statements about the prediction of an
individual subject. PIs based on a normal assumption with the moments (16)
and (17) are called 'PL0' (plug-in method with a normal approximation), and
those based on quantiles of the predictive distribution pΘ̂(τm(t)|D) are called
'PL1' (plug-in method with the predictive distribution). In this paper, we
propose the bootstrap (BTS) PIs of Section 4.3.

We study the PIs for τm(t). Table 4 shows the coverage probabilities
(CPs) of pointwise PIs for the random terms, and Table 5 shows their average
lengths. For BTS, in each replication, we generate 1000 bootstrap samples
from the predictive distribution. The value of J in (18) is set to 50, which
is large enough to approximate the integration accurately.
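The pointwise percentile construction underlying such intervals can be sketched as follows; this illustrates the generic percentile method applied to bootstrap draws, not the exact BTS algorithm of Section 4.3.

```python
import numpy as np

def percentile_pi(draws, level=0.95):
    """Pointwise percentile PI from bootstrap/predictive draws.
    draws: (n_draws, n_points) array of sampled curves."""
    alpha = 1.0 - level
    lo = np.quantile(draws, alpha / 2.0, axis=0)
    hi = np.quantile(draws, 1.0 - alpha / 2.0, axis=0)
    return lo, hi
```

Because each bootstrap draw is generated with re-estimated parameters, these intervals reflect parameter uncertainty that the plug-in PL0/PL1 intervals ignore.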
Note that the results for PL0 and PL1 are quite similar because the normal
approximation works well in these cases. However, the CPs for both
of them are smaller than the nominal confidence levels (NCLs) since they
do not take account of the uncertainty in estimating the parameters. We
see that BTS overcomes this drawback and maintains the NCLs. When the
data are generated from a GPFR model under Scheme I, all three models
provide good results and the differences are negligible. Under Scheme II the
Table 4: CPs (%) of the PIs for random terms.

                          PL0                    PL1                    BTS
nm  NCL (%)  Scheme   N     T     SL        N     T     SL        N     T     SL
31  80       I        70.5  71.2  71.1      70.3  70.6  70.5      79.4  80.0  81.5
             II       73.9  70.0  69.9      73.8  69.4  69.4      82.2  78.5  78.1
             V        56.0  68.4  68.5      55.9  67.8  68.0      83.6  79.3  80.5
    90       I        82.1  82.4  82.5      81.9  82.2  82.2      89.4  89.8  90.9
             II       83.9  81.5  81.5      83.8  81.2  81.3      90.6  88.9  88.8
             V        69.3  80.0  80.0      69.1  79.8  79.9      93.6  89.6  90.4
    95       I        88.9  88.9  88.9      88.8  89.0  89.0      94.5  94.8  95.7
             II       89.9  88.0  88.3      89.8  88.2  88.4      94.5  94.2  94.1
             V        78.8  87.3  87.4      78.6  87.4  87.5      97.5  94.7  95.2
61  80       I        64.2  64.5  64.5      64.2  64.3  64.2      78.7  78.7  81.8
             II       66.5  65.4  65.8      66.4  65.1  65.5      80.4  79.1  78.1
             V        45.4  62.5  62.7      45.3  62.2  62.4      83.7  80.3  82.4
    90       I        76.4  76.5  76.6      76.3  76.4  76.4      89.0  88.8  91.4
             II       77.8  77.3  77.5      77.8  77.1  77.4      89.5  89.3  88.5
             V        57.4  74.4  74.5      57.4  74.2  74.3      93.4  90.2  91.5
    95       I        84.1  84.2  84.2      83.9  84.2  84.2      94.1  94.2  95.9
             II       84.9  84.9  85.1      84.8  84.9  85.1      94.0  94.5  94.0
             V        67.0  82.2  82.3      67.0  82.1  82.4      97.4  95.0  95.8

CP: coverage probability; PI: prediction interval; NCL: nominal confidence level;
PL0: plug-in method with a normal approximation;
PL1: plug-in method with the predictive distribution;
BTS: bootstrap prediction.
data are generated from the TPFR model. The advantage of the HPFR models
is reflected in their accurate CPs together with tighter PIs than the GPFR
model (see the lengths of the BTS PIs from the GPFR and HPFR models in
Table 5). Schemes II and V represent, respectively, a distribution
misspecification and the presence of outliers; the reduction in
interval length from using HPFR is much bigger in the presence of outliers than
under a distribution misspecification. Under the GPFR model the PI becomes
wider and the CP exceeds the nominal level, while the two HPFR models
maintain their robustness advantage, reflected in narrow PIs with accurate
CPs. This is further illustrated in Figure 5, which exhibits the results for
all subjects under Scheme V with nm = 61 from one simulated data set. It
vividly shows that the BTS-based PIs (dark grey) are slightly wider than the
PL-based PIs (light grey), and the TPFR-based PIs (even rows) are narrower
Table 5: Lengths of the PIs for random terms.

                          PL0                       PL1                       BTS
nm  NCL (%)  Scheme   N      T      SL         N      T      SL         N      T      SL
31  80       I        0.176  0.179  0.178      0.176  0.177  0.176      0.214  0.218  0.226
             II       0.246  0.225  0.225      0.246  0.223  0.223      0.301  0.263  0.261
             V        0.234  0.177  0.176      0.233  0.175  0.174      0.398  0.223  0.227
    90       I        0.225  0.230  0.228      0.225  0.229  0.228      0.275  0.281  0.291
             II       0.316  0.289  0.289      0.316  0.288  0.288      0.386  0.339  0.336
             V        0.300  0.228  0.226      0.300  0.227  0.225      0.510  0.287  0.292
    95       I        0.269  0.274  0.272      0.268  0.275  0.273      0.328  0.336  0.347
             II       0.376  0.344  0.344      0.376  0.346  0.346      0.460  0.405  0.402
             V        0.357  0.271  0.269      0.357  0.273  0.270      0.605  0.343  0.349
61  80       I        0.134  0.136  0.135      0.134  0.135  0.135      0.182  0.184  0.197
             II       0.187  0.170  0.170      0.187  0.169  0.169      0.256  0.216  0.212
             V        0.169  0.134  0.134      0.169  0.133  0.133      0.360  0.196  0.206
    90       I        0.172  0.174  0.174      0.172  0.174  0.174      0.234  0.236  0.253
             II       0.240  0.218  0.218      0.240  0.218  0.217      0.328  0.278  0.272
             V        0.217  0.172  0.172      0.217  0.172  0.171      0.462  0.252  0.265
    95       I        0.205  0.208  0.207      0.205  0.208  0.208      0.278  0.281  0.301
             II       0.286  0.260  0.259      0.286  0.260  0.260      0.390  0.332  0.324
             V        0.258  0.205  0.204      0.258  0.206  0.205      0.549  0.300  0.315
yet cover the true random terms (solid lines) more accurately than the
GPFR-based PIs (odd rows).
5.3 Prediction for a new subject
We have studied the prediction of subject-specific curves for individuals who
are observed. Now we assess the performance of the prediction of yM+1(t) for a
new subject. We generate data for twenty-one (M + 1) subjects with nm = 61
and take the responses from the last (21st) subject as our test target.
Here we consider Schemes I, II and V as described before, along with a new
one, Scheme VI. Under the new scheme the data are generated from GPFR,
the same as Scheme I, but with larger random errors in the test subject
(φε is increased from 0.01 to 0.05). We use half (30 observations) of the data from the
test subject, together with the data from the other twenty subjects, as observed
data, and try to predict the other half (31 observations) of the responses (test data) of
the test subject. We consider two types of test data: one is randomly chosen,
and the other is chosen from the second half, i.e., all the data with ti ∈ [0, 4].
They represent interpolation and extrapolation, respectively. The latter is more
challenging than the former. Tables 6-8 give in turn the RMSEs, CPs and
lengths of the PIs of the BTS-based predictions, calculated from fifty replications.
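The interpolation/extrapolation split described above can be sketched as follows; this is an illustrative helper, assuming the test subject's time grid is sorted on [−4, 4] with nm = 61.

```python
import numpy as np

def split_test(t, y, mode="interpolation", n_test=31, rng=None):
    """Hold out half of the test subject's responses.
    interpolation: a random half; extrapolation: all points with t in [0, 4]."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(t)
    if mode == "interpolation":
        test_idx = np.sort(rng.choice(n, size=n_test, replace=False))
    else:  # extrapolation: the second half of the time domain
        test_idx = np.where(t >= 0)[0]
    train_idx = np.setdiff1d(np.arange(n), test_idx)
    return train_idx, test_idx
```

With t on an equally spaced 61-point grid over [−4, 4], the extrapolation split leaves 30 observed and 31 held-out responses, matching the setup above.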
The values of RMSE are reported in Table 6. This can be used to measure
Figure 5: Predictions of all twenty random terms under Scheme V. Odd rows: GPFR-based prediction;
even rows: TPFR-based prediction. Solid line: true underlying random term; dark grey: 95% BTS-based
PI; light grey: 95% PL1-based PI.
the performance of prediction. For interpolation, all three models give very
similar results under Schemes I, II and VI, meaning that all of them perform
well and are robust when the distribution is misspecified. However, the two
HPFR models perform more robustly than the GPFR model in the presence
of outliers under Scheme V. The results for extrapolation tell much the
same story. Overall, the errors for extrapolation are larger than those for
interpolation.
We report the CPs and the lengths of the PIs in Tables 7 and 8. We
have some interesting findings here. First of all, all three models give similar
results under Scheme I, and the CPs are all close to the nominal level.
This shows good robustness of the two HPFR models, since the distribution is
misspecified for those two models under Scheme I. The CPs of the two
HPFR models are still close to the NCLs under Schemes II and V, but the
performance of GPFR is very unreliable. For example, its CPs are 88.3% for
interpolation and 98.5% for extrapolation when the NCL is 80% under Scheme V.
For Scheme VI, the PIs under GPFR are quite narrow, leading to CPs below the
nominal level: for example, the CPs under GPFR are 69.3% for interpolation and
84.9% for extrapolation when the NCL is 95%. The GPFR model is very sensitive
to possible fluctuations in a new subject. In contrast, the two HPFR models give
better results, although they also suffer from both the distribution
misspecification and the high fluctuation in the test subject.
Table 6: RMSE between true responses and their predictions of a new subject.

            Interpolation             Extrapolation
Scheme    N      T      SL         N      T      SL
I         0.138  0.137  0.138      0.242  0.243  0.243
II        0.198  0.197  0.197      0.311  0.305  0.306
V         0.146  0.137  0.137      0.277  0.249  0.249
VI        0.277  0.277  0.277      0.315  0.315  0.316

Interpolation: the test data are randomly chosen from the new subject;
Extrapolation: the test data are chosen from the second half of the new
subject.
Table 7: CPs (%) of the PIs for responses of a new subject.

                     Interpolation           Extrapolation
NCL (%)  Scheme   N     T     SL          N     T     SL
80       I        81.1  80.6  82.1        76.4  77.0  78.4
         II       85.9  79.6  79.9        82.5  80.6  79.8
         V        88.3  79.5  80.7        98.5  79.7  80.4
         VI       49.9  73.9  74.2        65.6  90.5  90.4
90       I        90.0  90.3  90.6        87.5  88.1  88.5
         II       92.3  90.8  90.7        91.0  91.2  90.5
         V        96.1  90.2  90.8        99.7  90.4  91.0
         VI       60.7  84.8  85.4        76.8  96.5  96.6
95       I        94.7  94.5  95.5        92.8  94.0  94.1
         II       94.9  94.9  95.0        94.8  95.7  95.9
         V        98.6  94.6  95.5        99.9  94.8  95.7
         VI       69.3  91.2  91.0        84.9  98.5  98.5
Table 8: Lengths of the PIs for responses of a new subject.

                     Interpolation              Extrapolation
NCL (%)  Scheme   N      T      SL           N      T      SL
80       I        0.353  0.356  0.365        0.584  0.598  0.606
         II       0.494  0.429  0.428        0.787  0.735  0.729
         V        0.487  0.353  0.359        1.390  0.642  0.650
         VI       0.365  0.623  0.624        0.589  1.100  1.108
90       I        0.453  0.461  0.471        0.750  0.774  0.784
         II       0.634  0.555  0.553        1.010  0.950  0.943
         V        0.625  0.456  0.463        1.787  0.831  0.840
         VI       0.468  0.806  0.809        0.755  1.423  1.435
95       I        0.541  0.554  0.565        0.894  0.929  0.941
         II       0.756  0.667  0.665        1.203  1.142  1.133
         V        0.746  0.548  0.555        2.131  0.998  1.007
         VI       0.558  0.969  0.973        0.900  1.710  1.726
The predictions with a 95% NCL from one simulated data set are plotted
in Figure 6 under Schemes V (upper panel) and VI (lower panel) with nm = 61.
For Scheme V, the PIs under the TPFR model (light grey) are narrower than
those under the GPFR model (dark grey), most notably for
extrapolation (upper-right). Bear in mind that the coverage probability of the
former is also closer to the NCL of 95% than that of the latter. For Scheme VI, the
PIs under the GPFR model are too narrow to cover a large proportion of the
observations, leading to a very small coverage probability.
Overall, we have shown that the HPFR models perform better than the GPFR
model when the distribution is misspecified or when there are outliers. The proposed
BTS-based PIs under HPFR models are quite effective for subject-specific
predictions.
6 Concluding remarks and discussion
GP models are widely used for the analysis of medical data. The GPFR
model can describe subject-specific characteristics nonparametrically through
a GP with a flexible covariance structure, and can cope with multidimensional
covariates. However, as we demonstrated in the simulation studies, it is
sensitive to distribution misspecification and outliers. To
overcome this drawback we proposed the HPFR model.

Figure 6: Bootstrap prediction for a new subject. Upper-left: interpolation under Scheme V; upper-right:
extrapolation under Scheme V; lower-left: interpolation under Scheme VI; lower-right: extrapolation
under Scheme VI. Dark grey: 95% BTS-based PI under the GPFR model; light grey: 95% BTS-based PI
under the TPFR model. Dot: true observed response; circle: true test response.

We have shown via a
comprehensive simulation study that the proposed model has good robustness
properties, giving accurate results when the distribution is misspecified or
when the data are contaminated by outliers. PIs for the subject-specific
curves have not been well developed; we propose the use of bootstrap PIs.
The simulation study shows that the proposed bootstrap PIs improve on existing
PIs with respect to both the CP and the length of the PI. Matlab and R code
can be provided upon request.
The model is highly flexible, since it includes a class of process
regression models: for example, the model with a GP,(Shi et al., 2012) the
T-process regression model,(Wang et al., 2016) and models with a slash
process, mixtures of GPs, an exponential power process, etc. Although our
experience is limited, we prefer the HP model to the GP model because
the HP model improves on the GP model considerably in the presence of outliers. In
complex or big data, identification of all outliers would be difficult. It is
reasonable to use the HP model over the GP model since it gives robust
inference and does not lose much efficiency even when the data truly come from
a GP. As we demonstrated, criteria such as the BIC, the predictive
RMSE, the standard errors of the estimators and the relative change ratios
obtained by deleting potential outliers can be used to select a specific HP model.
All computations were carried out in Matlab 2015b using a 2.4 GHz
Intel i5 processor with 8.0 GB RAM. The EM algorithm is very efficient
because the E-step is straightforward in our HPFR model. The computation
time to achieve convergence is about 12 seconds for each replication of the
simulation (Scheme V and nm = 61) and 6 seconds for the example in Section
3 under the T-process model. However, we can face problems with
high-dimensional covariates or high-frequency data. For high-dimensional
covariates, a penalized likelihood framework will be helpful to select the crucial
functional covariates.(Yi et al., 2011) For high-frequency data, the HPFR
model suffers from the massive computation of inverse covariance matrices. A
variety of numerical methods, such as the Nyström method, active set and filtering
approaches,(Shi and Choi, 2011) may be applied to solve this problem.
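As an illustration of how the Nyström method reduces the cost of large covariance matrices, the sketch below forms the standard rank-m approximation K ≈ K_nm K_mm⁻¹ K_nm⊤ from m randomly chosen inducing points; it is a generic construction, not specific to the HPFR model.

```python
import numpy as np

def nystrom(kernel_fn, X, m, rng=None):
    """Nystrom low-rank approximation K ~ K_nm K_mm^{-1} K_nm^T
    using m randomly selected inducing points from X."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(X), size=m, replace=False)
    K_nm = kernel_fn(X, X[idx])          # n x m cross-covariance
    K_mm = kernel_fn(X[idx], X[idx])     # m x m inducing-point covariance
    # pseudo-inverse guards against a near-singular K_mm
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T
```

Downstream solves then use the Woodbury identity on the low-rank factor, reducing the O(n³) inversion to O(nm²).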
One crucial issue in FDA is the modelling of the mean function and covariance
structure.(Guo, 2002; Antoniadis and Sapatinas, 2007) For the mean function
µm(t), we used a nonparametric function u⊤m β(t) combined with a parametric
(linear) function v⊤m(t)γ. Other mean structures can also be considered.
The functional coefficients β(t) were approximated by cubic B-spline basis
functions with equally spaced knots. It is known that the number of knots
tunes the bias and variance of the resulting estimator, so the choice of the
number of knots is important for the performance of the B-spline approximation.
Guidance on this issue can be found in Wu and Zhang (2006). In practice,
the number of knots can be determined by generalized cross-validation or
BIC; however, this is time consuming for high-dimensional and big
data. Low-rank smoothing methods, such as penalized spline smoothing,
(Ruppert et al., 2003) can be used to decrease the computational burden.
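Selecting the number of basis functions by BIC can be sketched as follows. This uses a plain least-squares fit, so it ignores the within-subject correlation of the full model; it is an illustrative simplification, and basis_fn is a user-supplied basis constructor.

```python
import numpy as np

def bic_select(t, y, basis_fn, candidates):
    """Pick the number of basis functions minimizing BIC for a
    least-squares fit of y on the basis (illustrative only)."""
    n = len(y)
    best, best_bic = None, np.inf
    for q in candidates:
        B = basis_fn(t, q)                          # n x q design matrix
        beta, *_ = np.linalg.lstsq(B, y, rcond=None)
        rss = np.sum((y - B @ beta) ** 2)
        bic = n * np.log(rss / n) + q * np.log(n)   # Gaussian-likelihood BIC
        if bic < best_bic:
            best, best_bic = q, bic
    return best
```

In the full model the BIC would be computed from the (marginal) likelihood at the EM solution rather than a residual sum of squares.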
For the covariance structure, we combined a parametric (linear) random-effects
model w⊤m(t)bm with nonparametric (nonlinear) random-effects
ζm(xm(t)) by using HPs. The parametric random-effects model w⊤m(t)bm can
provide explanatory information between the response and some covariates,
and characterizes the heterogeneity among subjects. A diagonal matrix Σb
means that the latent variables act independently within subjects, while a
non-diagonal positive-definite matrix Σb implies that there are correlations among the
latent variables. The nonparametric process-based random-effects ζm(xm(t))
can handle flexible subject-specific deviations. The covariance kernel allows
us to consider multi-dimensional functional covariates to capture the nonlinear
serial correlation within a subject. The kernel may represent some symmetric
(mirror) effects over time if we only consider time in the process. More
discussion of the covariance kernel can be found in Rasmussen and Williams
(2006) and Shi and Choi (2011).
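The combined covariance structure described above can be written out directly. The sketch below assumes the squared-exponential kernel used in the simulations; the function and argument names are illustrative.

```python
import numpy as np

def combined_cov(t, w_t, x_t, Sigma_b, v0=0.04, w=1.0, phi_eps=0.01):
    """Covariance implied by linear + nonlinear random effects plus noise:
    Cov(y(t), y(t')) = w(t)^T Sigma_b w(t')
                       + v0 exp(-w (x(t) - x(t'))^2 / 2)
                       + phi_eps * 1{t = t'}."""
    linear = w_t @ Sigma_b @ w_t.T              # parametric random-effects part
    d = x_t[:, None] - x_t[None, :]
    nonlinear = v0 * np.exp(-0.5 * w * d**2)    # GP/HP random-effects part
    return linear + nonlinear + phi_eps * np.eye(len(t))
```

Here w_t is the n × q design matrix of linear random-effect covariates and x_t the covariate driving the nonlinear process; under an HP, the whole matrix is further scaled by the mixing variable.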
Model diagnosis is also an important issue in FDA. Our HPFR model assumes
homogeneity, in the sense that different subjects share the same location
parameters, scale parameters and degrees of freedom. However, this assumption
is doubtful when the data are collected from different groups or sources.
Recently, Fang et al. (2016) proposed two tests for evaluating variance
homogeneity in mixed-effects models; their approaches may also be adapted
to FDA models. For a discussion of model-checking plots for various kinds of
model misspecification, see Lee et al. (2006). If the assumption of homogeneity
does not hold, a model-based method of clustering or classification could be
developed, which we leave for future work.
Acknowledgement
The authors thank Prof. West for providing us with the renal anaemia data. The
authors are grateful to the Editor, the Associate Editor and two referees,
whose questions and insightful comments have greatly improved the work.
The author(s) disclosed receipt of the following financial support for the
research, authorship, and/or publication of this article: Dr Cao’s work was
supported by the National Science Foundation of China (Grant No. 11301278)
and the MOE (Ministry of Education in China) Project of Humanities and
Social Sciences (Grant No. 13YJC910001).
Technical details on the conditional distribution properties of SMN distributions, the information matrix and the proof of information consistency are
presented in an additional document, which is distributed with the paper as
Supplementary Material.
References
Rasmussen CE and Williams CKI. Gaussian Processes for Machine Learning.
Cambridge, MA: MIT Press, 2006.
Shi JQ, Wang B, Murray-Smith R, et al. Gaussian process functional regression
modelling for batch data. Biometrics 2007; 63(3): 714-723.
Shi JQ, Wang B, Will EJ, et al. Mixed-effects GPFR models with application
to dose-response curve prediction. Stat Med 2012; 31(26): 3165-3177.
Gramacy R and Lian H. Gaussian process single-index models as emulators
for computer experiments. Technometrics 2012; 54(1): 30-41.
Wang B and Shi JQ. Generalized Gaussian process regression model for non-Gaussian functional data. J Am Stat Assoc 2014; 109(507): 1123-1133.
Andrews DF and Mallows CL. Scale mixtures of normal distributions. J R
Soc Series B Stat Methodol 1974; 36(1): 99-102.
Lachos VH, Bandyopadhyay D and Dey DK. Linear and nonlinear mixed-effects models for censored HIV viral loads using normal/independent distributions. Biometrics 2011; 67(4): 1594-1604.
Meza C, Osorio F and Cruz R. Estimation in nonlinear mixed-effects models
using heavy-tailed distributions. Stat Comput 2012; 22(1): 121-139.
Cao CZ, Lin JG., Shi JQ, et al. Multivariate measurement error models for
replicated data under heavy-tailed distributions. J Chemometr 2015; 29(8):
457-466.
Blas B, Bolfarine H and Lachos VH. Heavy tailed calibration model with
Berkson measurement errors for replicated data. Chemometr Intell Lab
2016; 156: 21-35.
Zhu H, Brown PJ and Morris JS. Robust, adaptive functional regression in functional mixed model framework. J Am Stat Assoc 2011; 106(495): 1167-1179.
Osorio F. Influence diagnostics for robust P-splines using scale mixture of normal distributions. Ann Inst Stat Math 2016; 68(3): 589-619.
Lee Y, Nelder JA and Pawitan Y. Generalized linear models with random effects: unified analysis via H-likelihood. London: Chapman and Hall, 2006.
McCulloch CE and Neuhaus JM. Prediction of random effects in linear and
generalized linear models under model misspecification. Biometrics 2011;
67(1): 270-279.
Lee Y and Kim G. H-likelihood predictive intervals for unobservables. Int
Stat Rev 2016; Online.
Diggle PJ, Liang KY and Zeger SL. Analysis of longitudinal data. New York:
Oxford Univ. Press, 1996.
Lee Y and Nelder JA. Conditional and marginal models: Another view (with
discussion). Stat Sci 2004; 19(2): 219-238.
Ramsay JO and Silverman BW. Functional data analysis, 2nd edn. New York:
Springer, 2005.
Shi JQ and Choi T. Gaussian process regression analysis for functional data.
London: Chapman and Hall, 2011.
Lange K and Sinsheimer JS. Normal/independent distributions and their
applications in robust regression. J Comput Graph Stat 1993; 2(2): 175-198.
Pinheiro JC, Liu C and Wu YN. Efficient algorithms for robust estimation in
linear mixed-effect models using the multivariate t distribution. J Comput
Graph Stat 2001; 10(2): 249-276.
Savalli C, Paula GA and Cysneiros FJA. Assessment of variance components
in elliptical linear mixed models. Stat Model 2006; 6(1): 59-76.
Wang Z, Shi JQ and Lee Y. Extended T-process regression models. 2016;
arXiv: 1511.03402.
West RM, Harris K, Gilthorpe MS, et al. Functional data analysis applied to
a randomized controlled clinical trial in hemodialysis patients describes the
variability of patient responses in the control of renal anemia. J Am Soc
Nephrol 2007; 18(8): 2371-2376.
Tolman C, Richardson D, Bartlett C, et al. Structured conversion from thrice
weekly to weekly erythropoietic regimens using a computerized decision-support system: a randomized clinical study. J Am Soc Nephrol 2005;
16(5): 1463-1470.
Will EJ, Richardson D, Tolman C, et al. Development and exploitation of
a clinical decision support system for the management of renal anaemia.
Nephrology, Dialysis and Transplantation 2007; 22(suppl 4): iv31-iv36.
Schwarz G. Estimating the dimension of a model. Ann Stat 1978; 6(2):
461-464.
Fang KT, Kotz S and Ng KW. Symmetrical multivariate and related distributions. London: Chapman and Hall, 1990.
Lee Y, Nelder JA. Hierarchical Generalized Linear Models (with discussion).
J R Soc Series B Stat Methodol 1996; 58(4): 619-678.
Meng XL and Rubin DB. Maximum likelihood estimation via the ECM
algorithm: A general framework. Biometrika 1993; 80(2): 267-278.
Liu CH and Rubin DB. The ECME algorithm: a simple extension of EM and
ECM with faster monotone convergence. Biometrika 1994; 81(4): 633-648.
Lange K, Little R and Taylor J. Robust statistical modeling using the t
distribution. J Am Stat Assoc 1989; 84(408): 881-896.
Bjørnstad JF. Predictive likelihood: A review. Stat Sci 1990; 5(1): 242-265.
Bjørnstad JF. On the generalization of the likelihood function and likelihood
principle. J Am Stat Assoc 1996; 91(434): 791-806.
Seeger MW, Kakade SM and Foster DP. Information consistency of nonparametric Gaussian process methods. IEEE T Inform Theory 2008; 54(5):
2376-2382.
Yao F, Müller HG and Wang JL. Functional data analysis for sparse longitudinal data. J Am Stat Assoc 2005; 100(470): 577-590.
Li Y and Hsing T. Uniform convergence rates for nonparametric regression
and principal component analysis in functional/longitudinal data. Ann Stat
2010; 38(6): 3321-3351.
Yi G, Shi JQ and Choi T. Penalized Gaussian process regression and classification for high-dimensional nonlinear data. Biometrics 2011; 67(4):
1285-1294.
Guo W. Functional mixed effects models. Biometrics 2002; 58(1): 121-128.
Antoniadis A and Sapatinas T. Estimation and inference in functional mixed-effects models. Comput Stat Data Anal 2007; 51(10): 4793-4813.
Wu H and Zhang JT. Nonparametric regression methods for longitudinal data
analysis: mixed-effects modeling approaches. New Jersey: Wiley, 2006.
Ruppert D, Wand MP and Carroll RJ. Semiparametric regression. Cambridge:
Cambridge University Press, 2003.
Fang X, Li J, et al. Detecting the violation of variance homogeneity in mixed models. Stat Methods Med Res 2016; 25(6): 2506-2520.