arXiv:1705.05618v1 [stat.ME] 16 May 2017

Robust functional regression model for marginal mean and subject-specific inferences

Chunzheng Cao(1), Jian Qing Shi(2), and Youngjo Lee(3,*)

(1) School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing, China
(2) School of Mathematics & Statistics, Newcastle University, UK
(3) Department of Statistics, Seoul National University, Seoul, Korea
(*) Corresponding author: [email protected]

May 17, 2017

Abstract

We introduce flexible robust functional regression models using various heavy-tailed processes, including the Student t-process. We propose efficient algorithms for estimating the parameters of the marginal mean and for predicting conditional means, as well as interpolation and extrapolation, for subject-specific inferences. We develop bootstrap prediction intervals for conditional mean curves. Numerical studies show that the proposed model provides robust analysis against data contamination or distribution misspecification, and that the proposed prediction intervals maintain the nominal confidence levels. A real data application is presented as an illustrative example.

Keywords: Bootstrap, dose-response, EM algorithm, functional data analysis, outliers, prediction interval, robust, heavy-tailed process

1 Introduction

Functional regression models are often used to analyze medical data. However, conclusions from such models can be sensitive to the presence of outliers or to misspecification of the model assumptions. Moreover, prediction intervals (PIs) for subject-specific inferences have not been well developed. In this paper, we propose robust functional regression models and study the prediction of individual response curves together with their prediction intervals.
The Gaussian process (GP) has been widely used to fit an unknown regression function describing the nonparametric relationship between a response variable and a set of multi-dimensional covariates (Rasmussen and Williams, 2006). A variety of covariance functions provides flexibility in fitting data with different degrees of smoothness and nonlinearity. Shi et al. (2007) introduced GP functional regression (GPFR) models, which fit the mean structure and the covariance structure simultaneously; this enables a subject-specific regression curve to be estimated consistently. The idea was further extended to model nonlinear random-effects and applied to construct patient-specific dose-response curves (Shi et al., 2012). Recent applications of GPFR models include the single-index model (Gramacy and Lian, 2012) and models for non-Gaussian functional data (Wang and Shi, 2014). In GPFR models, the mean structure describes the marginal mean curves using information from all subjects, while the covariance structure captures subject-specific characteristics. However, as we shall see, GPFR models are not robust: they are sensitive to misspecification of distributions and to the presence of outliers. In this paper, we propose to use heavy-tailed processes (HPs) to overcome these drawbacks of GPs. Specifically, we will use scale mixtures of GPs (SMGP) (Rasmussen and Williams, 2006), an extension of the scale mixtures of normal (SMN) distributions (Andrews and Mallows, 1974). The latter is a subclass of the elliptical distribution family, including the Student-t, slash, exponential power, contaminated-normal and other distributions.
SMN distributions have been used in various models, including nonlinear mixed-effects models (Lachos et al., 2011; Meza et al., 2012), measurement error models (Cao et al., 2015; Blas et al., 2016) and functional models (Zhu et al., 2011; Osorio, 2016), and have been extended to double hierarchical generalized linear models (Lee et al., 2006). Similarly, the SMGP class includes many different heavy-tailed stochastic processes, such as the Student t-process. The SMGP inherits most of the good features of the GP: its covariance structure is easy to understand, and it can use many different covariance functions, allowing a flexible model. The SMGP class includes the GP as a special case. Moreover, all SMGPs except the GP are HPs, and they provide a robust analysis against distributional misspecification or data contamination.

In this paper, we extend the GPFR model (Shi et al., 2007, 2012) to a HP functional regression (HPFR) model for robust analysis. For maximum-likelihood estimators (MLEs), we develop an efficient EM algorithm for the proposed HPFR models. McCulloch and Neuhaus (2011) investigated the impact of distribution misspecification in linear and generalized linear models, and showed that the overall accuracy of prediction is not heavily affected by mild-to-moderate violations of the distributional assumptions. However, the improvement from using HPFR models becomes significant when the data are contaminated by outliers. A comprehensive simulation study is given in Section 5. Existing PIs for subject-specific inferences are often liberal, as we shall show later. Lee and Kim (2016) studied PIs for random-effect models, and we extend them to HPFR models; we show by numerical studies that they maintain the nominal confidence level (NCL).

The rest of this paper is organized as follows. In Section 2, we define the HPFR model along with a brief introduction to the GPFR model. In Section 3, we apply the HPFR model to analyze the renal anaemia data.
We demonstrate that the proposed HPFR model provides a robust analysis against outliers and therefore avoids a misleading conclusion on the dose-response relation. In Section 4, the estimation and prediction procedures are described, and the information consistency of the subject-specific response prediction is shown. In Section 5, a simulation study is presented to evaluate the performance of the HPFR model and the proposed procedures. Concluding remarks and discussion are given in Section 6. All technical details are provided in the supplementary materials.

2 Model

2.1 The GPFR model

Consider a mixed-effects concurrent GPFR model (Shi et al., 2012), defined as

    y_m(t) = μ_m(t) + τ_m(t) + ε_m(t),
    μ_m(t) = v_m^⊤(t) γ + u_m^⊤ β(t),                                    (1)
    τ_m(t) = w_m^⊤(t) b_m + ζ_m(x_m(t)),

where y_m(t) is the response curve for the m-th subject, depending on covariates u_m of dimension p_u and functional covariates v_m(t), w_m(t) and x_m(t) of dimensions p_v, p_w and p_x, respectively. The model is composed of three parts: the marginal mean (related to the so-called population average (Diggle et al., 1996)) E[y_m(t)] = μ_m(t); the random-effect τ_m(t) for the m-th subject, which gives the conditional (so-called subject-specific) mean E[y_m(t) | b_m, ζ_m(x_m(t))] = μ_m(t) + τ_m(t); and the random error ε_m(t). In this paper, we show how to make marginal and conditional inferences based on such functional models (Lee and Nelder, 2004).

In the marginal mean μ_m(t), v_m^⊤(t) γ is an ordinary linear regression model with functional covariates v_m(t) and regression coefficients γ, while u_m^⊤ β(t) was proposed (Ramsay and Silverman, 2005) for nonparametric functional estimation using covariates u_m with unknown functional coefficients β(t). The p_u-dimensional functional coefficient β(t) can be approximated by a set of basis functions. In this paper, we use B-splines, i.e., β(t) = B^⊤ Φ(t), in which B is a D × p_u matrix with elements β_ij and Φ(t) = (Φ_1(t), …, Φ_D(t))^⊤ are the B-spline basis functions.

In the random-effects τ_m(t), w_m^⊤(t) b_m is an ordinary linear random-effect model with functional covariates w_m(t) and random-effects b_m ~ N_{p_w}(0, Σ_b), where for simplicity Σ_b is a diagonal matrix with elements φ_b = (φ_1, …, φ_{p_w})^⊤, while ζ_m(x_m(t)) is a functional (nonlinear) random-effect modeled by a GP with zero mean and covariance kernel C(x_m(t), x_m(t′); θ). A common choice for this kernel is the squared exponential function

    C(x_m(t), x_m(t′); θ) = v_0 exp{ −(1/2) Σ_{k=1}^{p_x} w_k (x_{m,k}(t) − x_{m,k}(t′))² }.    (2)

Other choices of kernel have been studied (Rasmussen and Williams, 2006; Shi and Choi, 2011). Linear random-effects can provide a clear physical explanation of the relation between the response and the covariates, and can indicate which variables cause the variation among different subjects; the unexplained part can be modeled by the functional random-effects. Note that μ_m(t) is a single function describing the marginal means of all subjects, while the random functions τ_m(t) allow different functional and nonparametric effects for each subject, capturing subject-specific characteristics.

The random error ε_m(t) follows N(0, φ_ε) and is independent across t. For convenience, we can absorb it into the random-effects by setting τ̃_m(t) = τ_m(t) + ε_m(t). Then τ̃_m(t) is a GP with zero mean and covariance kernel

    C̃(x_m(t), x_m(t′); w_m(t), w_m(t′); θ, φ_b, φ_ε)
        = C(x_m(t), x_m(t′); θ) + Σ_{k=1}^{p_w} φ_k w_{m,k}(t) w_{m,k}(t′) + φ_ε δ(t, t′),    (3)

where δ(·, ·) is the Kronecker delta function. Thus, under the GPFR model, y_m(t) is a GP with mean μ_m(t) and covariance kernel C̃.

2.2 The HPFR model

In this paper we show that the GPFR model is sensitive to the presence of outliers and to misspecification of the distributional assumptions, so that inferences based on the GPFR model can be misleading.
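The kernel computations in (2)-(3), and the scale-mixture construction that defines the HPFR model, are straightforward to implement. The following is a minimal numpy sketch (function names are ours, not from the paper); with κ(r) = 1/r and Gamma(ν/2, ν/2) mixing, the sampled path is a Student-t process path.

```python
import numpy as np

rng = np.random.default_rng(0)

def sq_exp_kernel(x, x2, v0, w):
    """Squared exponential kernel (2): v0 * exp(-0.5 * sum_k w_k (x_k - x'_k)^2).
    x: (n, p_x), x2: (m, p_x), w: (p_x,) length-scale weights."""
    d2 = ((x[:, None, :] - x2[None, :, :]) ** 2 * w).sum(axis=-1)
    return v0 * np.exp(-0.5 * d2)

def combined_kernel(x, w_lin, v0, w, phi_b, phi_eps):
    """Covariance (3): nonlinear GP part + linear random-effects + noise.
    w_lin: (n, p_w) linear random-effect covariates, phi_b: (p_w,) variances."""
    C = sq_exp_kernel(x, x, v0, w)
    C += (w_lin * phi_b) @ w_lin.T        # sum_k phi_k w_{m,k}(t) w_{m,k}(t')
    return C + phi_eps * np.eye(len(x))   # phi_eps * delta(t, t')

def sample_path(K, nu=None):
    """Draw tau | r ~ N(0, K / r): r ~ Gamma(nu/2, nu/2) gives a t-process path;
    r = 1 (nu=None) degenerates to the GP."""
    r = 1.0 if nu is None else rng.gamma(nu / 2.0, 2.0 / nu)
    L = np.linalg.cholesky(K + 1e-10 * np.eye(len(K)))   # jitter for stability
    return (L @ rng.standard_normal(len(K))) / np.sqrt(r)
```

The same sampler with other mixing distributions for r (Beta for the slash process, two-point for the contaminated normal) covers the other heavy-tailed processes discussed below.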
We show how to make robust marginal mean and subject-specific inferences using HPFR models. Given a latent variable r_m, suppose that the conditional process of τ̃_m(t) follows a GP with zero mean but with covariance

    κ(r_m) C̃(x_m(t), x_m(t′); w_m(t), w_m(t′); θ, φ_b, φ_ε),    (4)

where C̃(·, ·) is the covariance function given in (3), κ(·) is a strictly positive function, and the latent random variable r_m takes positive values with cumulative distribution function (cdf) H(·; ν) and probability density function (pdf) h(·; ν), ν being the degree of freedom. The properties of this process depend on the choice of κ(·) and H(·; ν). When κ(·) ≡ 1, it degenerates to the GP. Lange and Sinsheimer (1993) studied SMN distributions with κ(r_m) = 1/r_m. Rasmussen and Williams (2006) called the process with conditional covariance kernel (4) an SMGP; in this paper we call it an HP, to highlight that it can be applied to a wider class of data using double hierarchical generalized linear models (Lee et al., 2006). If r ~ Gamma(ν/2, ν/2), it becomes a Student-t (T) process; if r ~ Beta(ν, 1), it becomes a slash (SL) process; and if P(r = γ) = ν, P(r = 1) = 1 − ν with 0 < ν ≤ 1 and 0 < γ ≤ 1, it is a contaminated-normal (CN) process. We call model (1) under this HP structure a mixed-effects HPFR model (abbreviated 'HPFR'). HPFR models cover existing heavy-tailed mixed-effects models (Pinheiro et al., 2001; Savalli et al., 2006), a robust P-splines model (Osorio, 2016), extended T-process regression models (Wang et al., 2016), etc.

3 An illustrative example: the renal anaemia data

West et al. (2007) studied the renal anaemia data, which contain 74 dialysis patients who received Darbepoetin alfa (DA) to treat their renal anaemia. The hemoglobin (Hb) levels and epoetin doses were recorded over the original study period of 9 months plus a 3-month extension.
The doses of epoetin DA were determined by a strict clinical decision support system (Tolman et al., 2005; Will et al., 2007) to maintain the Hb level around 11.8 g/dl, the target for dialysis patients. A lower Hb level leads the patient to anaemia, while a higher level increases the risk of prothrombotic and other problems. The experiment is a typical dose-response study evaluating the control of Hb levels with the agent DA. The GPFR model has been used to analyze these data (Shi et al., 2012). The response is the Hb level, and the dose of DA and time are the predictors. Figure 1(1) displays the Hb levels measured for each patient over a period of 12 months, and Figure 1(3) gives the histogram of DA dosages for all patients. Since the distribution of the dosage dose_m(t) (for the m-th patient at month t) is quite skewed, we use the log-transformation log[10 dose_m(t) + 1] to reduce the impact of extreme values. As Figure 1(4) shows, the transformed dose values, apart from the zeros, are nearly normal. Taking into account a certain lag-time in the drug reaction, Shi et al. (2012) chose dose_m(t − 2), i.e., the dosage of DA taken two months before, as a key covariate. Following Shi et al. (2012), we use a linear regression model for the marginal mean,

    μ_m(t) = γ_0 + γ_1 t + γ_2 dose_m(t − 2) + γ_3 dose_m(t − 2)²,

and random-effects τ_m(t) = b_m + ζ_m(x_m(t)) with x_m(t) = [t, dose_m(t − 2)]^⊤. Figure 1(2) shows the index plot of the Mahalanobis distance (defined later) of each patient under the GPFR model, which theoretically follows a χ² distribution. Four patients (Nos. 32, 52, 58 and 62) have large Mahalanobis distances lying beyond the χ²_{0.01} quantile cutoff line; these four patients are recognized as potential outliers.
Figure 1: Renal data: (1) the Hb levels for the 74 patients; (2) Mahalanobis distances under the GPFR model; (3) histogram of dose values before transformation; (4) histogram of dose values after transformation.

3.1 The population-average inferences

Table 1 reports the MLEs of the coefficients for the marginal mean and their standard errors. Here 'N' stands for the GPFR model. The degree parameters of the three heavy-tailed models are estimated together with the other parameters. In Table 1, the negative coefficient of t means that the patients show a decreasing trend in Hb level over time when dose effects are not considered. Note that the values of the Bayesian information criterion (BIC) (Schwarz, 1978) and the standard errors for the three heavy-tailed models are often smaller than those for the GPFR model, and the T-process model (TPFR) is the best choice under BIC. We also report the root mean squared error (RMSE) of the predictions for all patients in Table 1; the smaller RMSEs of the three HPFR models confirm the better performance of the heavy-tailed models in fitting the data. Table 1 also gives the relative change ratio (%) of each fixed-effects estimate after removing the data from the four potential outlier patients. The change ratios of the estimates under the three HPFR models are much smaller than those under the GPFR model, indicating the robustness of parameter estimation using the heavy-tailed models.

Table 1: Parameter estimation of the renal anaemia data.
Model  Degree                   BIC    RMSE   Constant            t                    dose_m(t−2)        dose²_m(t−2)
N      –                        2136   0.416  11.500/0.310/−1.4   −0.036/0.022/−32.9   0.863/0.313/−6.1   −0.196/0.103/−22.1
T      ν̂ = 7.106               2088   0.393  11.426/0.297/−0.4   −0.035/0.021/−14.7   0.749/0.304/−0.6   −0.097/0.108/2.8
SL     ν̂ = 1.723               2092   0.391  11.451/0.299/−0.5   −0.035/0.021/−15.2   0.746/0.308/0.3    −0.106/0.109/2.9
CN     ν̂ = 0.424, γ̂ = 0.326   2101   0.392  11.464/0.285/−0.6   −0.040/0.020/−17.8   0.744/0.267/−3.3   −0.094/0.099/−11.7

RMSE: root mean squared error of the predictions. The three values for each fixed-effects covariate are, in turn, the estimate, its standard error, and the relative change ratio (%) of the estimate after removing the outliers.

We plot the marginal mean curves of the Hb level against dose under the four models in Figure 2. The solid lines are based on the complete data for the 74 patients, and the dashed lines on the data with the four outlier patients removed. Under the GPFR model, the Hb level increases while the dose is below 0.8 and decreases slowly thereafter; however, this turning point moves from 0.8 to 1.3 after deleting the data from the four patients. Under all the HPFR models, the marginal mean continues to increase over the whole interval, with the rate of increase slowing at large dosages. This is a more realistic dose-response curve, given the current understanding of the dose-response relation. Compared with the GPFR model, the curve shapes under the three HPFR models change much less after deleting the outliers, which again demonstrates the robustness of the heavy-tailed models for population-average inferences.

Figure 2: Mean curves of the dose response based on the original data (solid lines) and the data without the outliers (dashed lines), under the four models: (1) N; (2) T; (3) SL; (4) CN.
3.2 The subject-specific inferences

We now study the patient-specific dose-response curves. We use the data from the first 12 months and predict (extrapolate) the Hb level at the 14th month for dosages from 0 to 2.5. Figure 3 shows the predicted patient-specific dose-response curves under the GPFR and TPFR models for two typical patients. For the first patient, the dose-response curve under the GPFR model does not seem realistic, because the target value of 11.8 cannot be achieved even if the dosage is increased; the TPFR curve, however, reaches the target when the dosage is increased to 1.75. This is the dosage the patient should take at month 12 in order to maintain the Hb level around 11.8 at month 14. For the second patient, the dose-response curve under the GPFR model suggests that the Hb level reaches the target at dose 1 and then stays at the same level as the dosage increases. The TPFR model gives a more reasonable dose-response relation: the Hb level reaches the target at dose 0.75 and keeps increasing with the dosage. We also plot the PIs for the patient-specific curves. Note that the narrowest intervals occur around the actual dosages the patients took in the previous month; this is because the covariance kernel used in the stochastic process depends on the difference between dosages in consecutive months, so the prediction has less uncertainty in the neighbourhood of the previous dosage. We show below how to construct the PIs, and that the proposed PIs maintain the NCLs.
Figure 3: Dose-response of the patient-specific predictions for the first selected patient (upper panels) and the second patient (lower panels) under the GPFR (N) and TPFR (T) models. Solid lines: predictions; dashed lines: 95% predictive bands.

4 Inference methods

Suppose that we have data from M different subjects, and that all functional covariates are observed at points t_m = (t_{m1}, …, t_{m,n_m})^⊤ for the m-th subject. Let y_m = (y_{m1}, …, y_{m,n_m})^⊤, let V_m be the n_m × p_v matrix with i-th row v_m^⊤(t_{mi}), and let X_m and W_m be defined in the same way as V_m. Consider the following HPFR model

    y_m = Φ_m B u_m + V_m γ + W_m b_m + ζ_m + ε_m,    (5)

where Φ_m is an n_m × D matrix with i-th row Φ^⊤(t_{mi}), and

    ε_m | r_m ~ N_{n_m}(0, κ(r_m) φ_ε I_{n_m})   independently,
    b_m | r_m ~ N_{p_w}(0, κ(r_m) Σ_b)           independently,
    ζ_m | r_m ~ N_{n_m}(0, κ(r_m) C_m)           independently,    (6)
    r_m ~ H(r; ν)   i.i.d.,  m = 1, …, M,

with C_m the n_m × n_m covariance matrix with (i, j)-th element C(x_m(t_i), x_m(t_j); θ). Then y_m follows an SMN distribution (Andrews and Mallows, 1974; Fang et al., 1990), i.e., SMN(μ_m, Σ_m; H), where μ_m = Φ_m B u_m + V_m γ and Σ_m = C_m + W_m Σ_b W_m^⊤ + φ_ε I_{n_m}.

4.1 Parameter estimation

Denote the parameter set {B, γ, θ, φ_b, φ_ε, ν} by Θ. Let D_c = {D_m, r_m, m = 1, …, M} be the complete data, where D_m = {y_m, u_m, V_m, W_m, X_m, t_m} is the observed data for the m-th subject. For convenience, we separate Θ into β = (Vec(B)^⊤, γ^⊤)^⊤, ψ = (θ^⊤, φ_b^⊤, φ_ε)^⊤ and ν, where ν is the degree-of-freedom parameter.
The log-likelihood of the complete data is given by

    l(Θ | D_c) = −(1/2) Σ_{m=1}^M log|Σ_m| − (1/2) Σ_{m=1}^M n_m log κ(r_m) + Σ_{m=1}^M log h(r_m; ν)
                 − (1/2) Σ_{m=1}^M κ^{−1}(r_m) (y_m − A_m β)^⊤ Σ_m^{−1} (y_m − A_m β),    (7)

where A_m = [u_m^⊤ ⊗ Φ_m, V_m] satisfies μ_m = A_m β. To obtain the MLEs of Θ, we propose the following EM algorithm.

E-step: Given the current value Θ^(k), we calculate the Q-function E[l(Θ | D_c) | Θ^(k), D_m, m = 1, …, M], which is proportional to

    Q(Θ | Θ^(k)) = Q_1(β, ψ | Θ^(k)) + Q_2(ν | Θ^(k)),    (8)

with

    Q_1(β, ψ | Θ^(k)) = −(1/2) Σ_{m=1}^M log|Σ_m| − (1/2) Σ_{m=1}^M π_m^(k) (y_m − A_m β)^⊤ Σ_m^{−1} (y_m − A_m β),    (9)

    Q_2(ν | Θ^(k)) = Σ_{m=1}^M E[log h(r_m; ν) | Θ^(k), D_m],    (10)

where

    π_m^(k) = E[κ^{−1}(r_m) | Θ^(k), D_m].    (11)

In the HPs, the weight π_m is inversely proportional to the Mahalanobis distance d_m = (y_m − μ_m)^⊤ Σ_m^{−1} (y_m − μ_m) (see supplementary materials). In the presence of outliers or model misspecification, d_m increases, so the corresponding subject receives a smaller weight, yielding a robust analysis.

The likelihood (7) is the h-likelihood of Lee and Nelder (1996), so the parameters can also be estimated by the h-likelihood method: given Θ, maximizing the conditional likelihood p_Θ(r_m | y_m) with respect to r_m is equivalent to maximizing the h-likelihood, which has the explicit form (7). To obtain MLEs of Θ in the h-likelihood approach, the Laplace approximation has been proposed (Lee et al., 2006). However, the EM algorithm is more efficient in the HPFR model, because the E-step (11) is straightforward. Following ECM (Meng and Rubin, 1993) and ECME (Liu and Rubin, 1994), we implement the EM algorithm as follows.

CM-step: We maximize (8) with respect to Θ to obtain the updated parameters given Θ^(k), using the following sub-iterative process:

1. Set the least squares estimate of β as the initial value β^(0), under the assumption Σ_m = I_m and π_m = 1 for m = 1, …, M;

2. Obtain the initial value ψ^(0) by maximizing the Q-function (9) under β^(0) and π_m = 1;

3.
Choose a sensible initial value ν^(0);

4. Calculate the weights π_m (m = 1, …, M) under the current values of β, ψ and ν;

5. Update β under the current values of π_m and ψ by

    β^(k+1) = ( Σ_{m=1}^M π_m^(k) A_m^⊤ Σ_m^{−1} A_m )^{−1} Σ_{m=1}^M π_m^(k) A_m^⊤ Σ_m^{−1} y_m |_{ψ^(k)};    (12)

6. Update ψ by maximizing the Q-function (9) under the current values of β and π_m;

7. Given the current values of β and ψ, update ν by maximizing the constrained actual marginal log-likelihood (Liu and Rubin, 1994) of {y_m, m = 1, …, M} over ν.

Ideally we would repeat steps 4 to 7 until convergence; in practice we can set a threshold and stop the sub-iteration when ||Θ^(k+1) − Θ^(k)|| falls below it. Step 6 is analogous to its counterpart under GP assumptions, obtained by transforming the residual e_m = y_m − A_m β into ẽ_m = √π_m (y_m − A_m β). Following the ECME algorithm (Liu and Rubin, 1994), step 7 maximizes not the expected log-likelihood (10) but the actual log-likelihood over ν, which converges much faster. The actual log-likelihood of {y_m, m = 1, …, M}, together with the score function of Θ under some SMN distributions, is given in the supplementary materials.

Asymptotic confidence intervals for the MLEs can be obtained through the observed information matrix J(Θ̂) or the expected information matrix I(Θ̂) (see supplementary materials). Here we estimate the degree parameter ν together with β and ψ. When the sample size is small, the estimate of ν may be unreliable; in that case we may fix its value, e.g., choose ν = 4 for the Student-t (Lange et al., 1989), or choose ν by BIC or by cross-validation.

4.2 Prediction

Suppose that we want to predict the responses y* = (y*_1, …, y*_n)^⊤ of a new, (M + 1)-th subject at points t* = (t*_1, …, t*_n)^⊤. Besides the data on the M subjects, suppose that we have n_{M+1} observations for the new subject at points t_{M+1} = (t_{M+1,1}, …, t_{M+1,n_{M+1}})^⊤.
Let D_{M+1} = {y_{M+1}, u_{M+1}, V_{M+1}, W_{M+1}, X_{M+1}, t_{M+1}} and let D = {D_m | m = 1, …, M + 1} be the observed data. We rewrite model (5) as

    y_m = μ_m + τ̃_m,    (13)

with τ̃_m ~ SMN(0, Σ_m; H) independently for m = 1, …, M + 1. For the (M + 1)-th subject, let τ̃* be the random-effects at the unobserved points t* and τ̃_{M+1} the random-effects at the observed points t_{M+1}. Thus, we have

    (τ̃_{M+1}^⊤, τ̃*^⊤)^⊤ | r_{M+1} ~ N_{n_{M+1}+n}(0, κ(r_{M+1}) Σ̃_{M+1}),    (14)

where

    Σ̃_{M+1} = [ Σ_{M+1}     Σ*_{M+1} ]
               [ Σ*_{M+1}^⊤  Σ*       ],    (15)

and Σ* and Σ*_{M+1} are, respectively, the covariance matrices of the new subject with elements evaluated from the covariance function (2) at the point pairs (t*_j, t*_k) and (t_i, t*_k), for i ∈ {1, …, n_{M+1}} and j, k ∈ {1, …, n}. Then the conditional expectation of y* given D_{M+1} and Θ is

    E(y* | D_{M+1}; Θ) = E(μ* + τ̃* | D_{M+1}; Θ)
                        = μ* + E[E(τ̃* | r_{M+1}, D_{M+1}; Θ)]
                        = μ* + Σ*_{M+1}^⊤ Σ_{M+1}^{−1} (y_{M+1} − μ_{M+1}),    (16)

where μ* and μ_{M+1} are the marginal mean curves of the new subject at the points t* and t_{M+1}, respectively. The conditional variance of y* given D_{M+1} and Θ is

    Var(y* | D_{M+1}; Θ) = E(y* y*^⊤ | D_{M+1}; Θ) − E(y* | D_{M+1}; Θ) E(y*^⊤ | D_{M+1}; Θ)
                          = E[Var(y* | r_{M+1}, D_{M+1}; Θ)]
                          = E[κ(r_{M+1}) | D_{M+1}; Θ] (Σ* − Σ*_{M+1}^⊤ Σ_{M+1}^{−1} Σ*_{M+1}).    (17)

For GP models, PIs have been constructed by assuming a normal distribution with the first two moments (16) and (17) (Wang and Shi, 2014). However, the predictive distribution p(y* | D_{M+1}; Θ̂) of the conditional mean under HPFR models may not be normal; for instance, for a T-process with a low degree of freedom, the normal approximation may be inappropriate. Thus, for a better prediction, we should compute the quantiles from the true predictive distribution.
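The moments (16)-(17) are the familiar GP conditioning formulas, rescaled by E[κ(r_{M+1}) | D_{M+1}; Θ]; a numpy sketch (names ours):

```python
import numpy as np

def predict_conditional(y_obs, mu_obs, mu_new, S_obs, S_cross, S_new, Ekappa=1.0):
    """Conditional mean (16) and variance (17) of y* given the new subject's data.
    S_obs = Sigma_{M+1}, S_cross = Sigma*_{M+1} (observed x new), S_new = Sigma*;
    Ekappa = E[kappa(r_{M+1}) | data] (equal to 1 for a GP)."""
    resid = np.linalg.solve(S_obs, y_obs - mu_obs)
    mean = mu_new + S_cross.T @ resid
    cov = Ekappa * (S_new - S_cross.T @ np.linalg.solve(S_obs, S_cross))
    return mean, cov
```

For a GP (`Ekappa=1`) this is exactly the usual kriging predictor; the heavy-tailed processes only inflate or shrink the predictive covariance through the posterior expectation of κ(r).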
However, the resulting PIs would still be liberal, because they do not account for the uncertainty in estimating the parameters (Bjørnstad, 1990, 1996). We propose the following parametric bootstrap method, based on a finite-sample adjustment, to remedy this drawback:

1. Generate Θ*_j (j = 1, …, J) from the asymptotic distribution N(Θ̂, J^{−1}(Θ̂));

2. Approximate the predictive distribution of y* by

    p(y* | D) = ∫ p(y* | D_{M+1}; Θ̂) p(Θ̂ | D) dΘ̂ ≈ (1/J) Σ_{j=1}^J p(y* | D_{M+1}; Θ*_j).    (18)

Note that step 2 involves generating data from p(y* | D_{M+1}; Θ*_j), which may not be straightforward in some cases. This problem can be addressed by augmenting the latent variable r_{M+1} in the process. We first generate r_{M+1} from the conditional distribution p(r_{M+1} | D_{M+1}; Θ*_j), which is given in the supplementary materials for some distributions. Given D_{M+1} and r_{M+1}, it is straightforward to generate y*, since its conditional distribution is Gaussian:

    y* | r_{M+1}, D_{M+1}, Θ*_j ~ N_n(μ̃*, κ(r_{M+1}) Σ̃*),    (19)

with μ̃* = μ* + Σ*_{M+1}^⊤ Σ_{M+1}^{−1} (y_{M+1} − μ_{M+1}) and Σ̃* = Σ* − Σ*_{M+1}^⊤ Σ_{M+1}^{−1} Σ*_{M+1}, evaluated at Θ*_j. Hence, we can obtain the prediction as well as pointwise PIs of y* from the sample quantiles of the bootstrap replications. The bootstrap method can also be used to compute other quantities, such as the subject-specific random-effects; it can be considered an extension of the PIs of Lee and Kim (2016) for random-effects to curve predictions.

4.3 Information consistency

Seeger et al. (2008) proved information consistency, namely that the prediction based on a GP model can be consistent for the true curve; this has been extended to the generalized GPFR model (Wang and Shi, 2014) and to T-process regression models (Wang et al., 2016). We now discuss information consistency for the HPFR model. For convenience, we omit the subscript m and denote the data by y_n = (y_1, …, y_n)^⊤ at points t_n = (t_1, …
, t_n)^⊤, and all the corresponding covariates in the random-effects by X_n = (x_1, …, x_n)^⊤, where the x_i ∈ X ⊂ R^{p_x} are independently drawn from a distribution U(x). Let p(y_n | τ_0, X_n) be the density of y_n given τ_0 and X_n, where τ_0(·) is the true underlying function, so that the true mean of y_i is μ(t_i) + τ_0(x_i). Let

    p_θ(y_n | X_n) = ∫_F p(y_n | τ, X_n) dp_θ(τ)

be the density of y_n given X_n based on the HPFR model, where p_θ(τ) is a measure on the stochastic process τ(·) on the space F = {τ(·): X → R}, and θ contains all the parameters in the covariance function of τ(·). Denoting by D[p_1, p_2] = ∫ (log p_1 − log p_2) dp_1 the Kullback-Leibler divergence between p_1 and p_2, we have the following result.

Theorem 1. Under the conditions

(C1) the covariance kernel C(·, ·; θ) is continuous in θ and θ̂ → θ almost surely as n → ∞;

(C2) the reproducing kernel Hilbert space (RKHS) norm ||τ_0||_C is bounded;

(C3) the expected regret term satisfies E_{X_n}(log|I_n + φ_ε^{−1} C_n|) = o(n);

we have

    (1/n) E_{X_n}( D[ p(y_n | τ_0, X_n), p_θ̂(y_n | X_n) ] ) → 0   as n → ∞,    (20)

where p_θ̂(y_n | X_n) is the estimated density of y_n under the HPFR model and E_{X_n} denotes the expectation under the distribution of X_n.

The proof is given in the supplementary materials. Note that p(y_n | τ_0, X_n) is the density of the true model and p_θ̂(y_n | X_n) is that of the working model, so the theorem establishes information consistency: the density of the working model converges to that of the true model. The estimation of the mean structure μ_m(t) is based on all of the independent subjects and has been proved consistent in many functional models (Ramsay and Silverman, 2005; Yao et al., 2005; Li and Hsing, 2010). Under the working model (5), the consistency of the estimators Θ̂ holds easily, since Θ̂ is essentially the MLE of a longitudinal model.
The expected regret term E_{X_n}(log|I_n + φ_ε^{−1} C_n|) is of order o(n) for many widely used covariance kernels (Seeger et al., 2008).

5 Simulation studies

We now investigate the performance of the HPFR model in terms of accuracy and robustness for both estimation and prediction. We compare models with the following three types of process: N (GP), T with ν = 4, and SL with ν = 1.3. The mean curve of the true model is μ(t) = 0.8 sin((0.5t)³), with t equally spaced in [−4, 4]. The random terms τ̃_m(t) are generated under an SMGP by setting x_m(t) = 2.5t with θ^⊤ = (v_0, w) = (0.04, 1) for the nonlinear random-effects, w_m(t) = 0.5t with φ_b = 0.01 for the linear random-effects, and φ_ε = 0.01 for the random error. To compare the performance of the different models, we generate data on twenty independent subjects under one of the following five schemes:

(I) GPFR;
(II) TPFR with ν = 4;
(III) GPFR with a curve disturbance in the 5th subject (the amplitude of μ_5(t) is increased from 0.8 to 4);
(IV) GPFR with outliers in the 10th subject (the region {y_10(t) | t ∈ [−1, 1]} is shifted upward by 2 units);
(V) the combination of III and IV.

We then fit the data with the three models (N, T, SL). We estimate the unknown parameters and then calculate the marginal mean curve and the random terms for each subject. The functional coefficients in the fixed mean term are approximated by cubic B-splines with 18 knots equally spaced in [−4, 4], which is adequate for our simulations. In practice, the number of basis functions can be chosen by BIC or other methods.

5.1 Estimation for population-average inferences

We first study the performance of the HPFR models for inferences about the marginal mean E[y(t)] = μ(t). Table 2 reports the RMSE between μ(t) and μ̂(t) for n_m = 31 and 61, based on fifty replications. Under 'T' and 'SL' we use the t-process and slash process with fixed values of ν, while under 'T1' and 'SL1' we also estimate ν along with the other parameters.
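Under the stated design, one subject's data can be generated as follows (a sketch under our reading of the setup; for the t-process, the latent scale is drawn as r ~ Gamma(ν/2, ν/2), and any other value of `process` gives the GP):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_subject(nm=31, nu=4.0, process="t"):
    """One subject under the Section 5 design: mean mu(t) = 0.8 sin((0.5 t)^3),
    x(t) = 2.5 t with (v0, w) = (0.04, 1), w(t) = 0.5 t with phi_b = 0.01,
    and phi_eps = 0.01."""
    t = np.linspace(-4.0, 4.0, nm)
    mu = 0.8 * np.sin((0.5 * t) ** 3)
    x = 2.5 * t
    C = 0.04 * np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)      # nonlinear part
    Sigma = C + 0.01 * np.outer(0.5 * t, 0.5 * t) + 0.01 * np.eye(nm)
    r = rng.gamma(nu / 2.0, 2.0 / nu) if process == "t" else 1.0  # GP otherwise
    L = np.linalg.cholesky(Sigma + 1e-10 * np.eye(nm))
    return t, mu + (L @ rng.standard_normal(nm)) / np.sqrt(r)
```

The outlier schemes III-V can then be imposed on the generated curves directly, e.g. by rescaling the mean amplitude of one subject or shifting one subject's responses on t ∈ [−1, 1] upward by 2 units.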
The results under Scheme I show little difference among the three models when the data are generated from the GPFR model. However, when the data are generated from the TPFR model (Scheme II), the GPFR model gives a poor fit; this indicates that the HPFR models provide robust fits against a misspecification of the distribution. We added two types of outliers in Schemes III and IV, respectively, and both of them in Scheme V. The results from the HPFR models with heavy-tailed processes are much better than those from the GPFR model, indicating that the HPFR models are robust against outliers. The performance of the two HPFR models with estimated ν (T1 and SL1) is very close to that of the models with ν fixed (T and SL). Moreover, the choice between T and SL is not crucial, since their results are quite similar.

Table 2: RMSE between the true mean curve and its estimate.

                    nm = 31                              nm = 61
Scheme    N      T      T1     SL     SL1      N      T      T1     SL     SL1
I         0.055  0.057  0.055  0.056  0.055    0.053  0.055  0.054  0.054  0.053
II        0.076  0.056  0.056  0.056  0.057    0.073  0.055  0.055  0.056  0.056
III       0.111  0.058  0.058  0.057  0.056    0.105  0.056  0.056  0.056  0.055
IV        0.076  0.059  0.059  0.059  0.059    0.075  0.056  0.056  0.056  0.056
V         0.117  0.057  0.057  0.056  0.055    0.116  0.055  0.055  0.055  0.055

T and SL: the t-process and slash process with fixed degrees; T1 and SL1: the t-process and slash process with estimated degrees.

Figure 4 shows the estimate of the mean curve together with its 95% confidence interval under Scheme V with n_m = 61, using the GPFR and TPFR models on one simulated data set. It shows clearly that TPFR achieves a smaller bias and narrower yet accurate confidence intervals for the marginal mean. The asymptotic normality of μ̂(t) is well established, so these confidence intervals are asymptotically correct.

Figure 4: Estimate of the mean curve under Scheme V using: (1) GPFR; (2) TPFR.
Solid line: the true mean curve; dotted line: the estimated mean curve; shaded area: 95% confidence interval.

5.2 Prediction for subject-specific inferences

We now study inferences about the conditional mean, which involve the prediction of the random-effects τ_m(t) under the conditional model (1) (Lee and Nelder, 2004). Table 3 lists the RMSE between the true random-effects τ_m(t) and their predictions for all subjects under Schemes I and II. Under Schemes III-V, we calculate the RMSE for all subjects except the one with outliers. The RMSEs become smaller in all cases as n_m increases. This is because the information about the random-effects is mainly provided by each individual subject, so the accuracy depends mainly on the sample size of each individual subject. The two HPFR models outperform the GPFR model when the data come from TPFR or from GPFR with outliers, which is consistent with the previous findings.

Table 3: RMSE between the true random terms and their predictions.

                nm = 31                nm = 61
Scheme     N      T      SL       N      T      SL
I        0.084  0.084  0.084    0.073  0.073  0.073
II       0.120  0.113  0.113    0.105  0.092  0.092
III      0.128  0.084  0.084    0.116  0.073  0.072
IV       0.097  0.087  0.086    0.086  0.073  0.073
V        0.140  0.087  0.087    0.126  0.075  0.075

It is important to make interval statements about the prediction for an individual subject. PIs based on the normal assumption with the moments (16) and (17) are called 'PL0' (plug-in method with a normal approximation), and those based on quantiles of the predictive distribution p_Θ̂(τ_m(t)|D) are called 'PL1' (plug-in method with the predictive distribution). In this paper, we propose the bootstrap (BTS) PIs of Section 4.3. We study the PIs for τ_m(t). Table 4 shows the coverage probabilities (CPs) of the pointwise PIs for the random terms, and Table 5 shows their average lengths. For BTS, in each replication we generate 1000 bootstrap samples from the predictive distribution.
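The PL0 construction and the empirical coverage computation can be sketched as follows; `tau_hat` and `var_hat` stand in for the predictive moments (16) and (17), which are not reproduced in this section, and the function names are ours.

```python
import numpy as np
from scipy.stats import norm

def pl0_interval(tau_hat, var_hat, level=0.95):
    """PL0: plug-in PI for tau_m(t) using a normal approximation (sketch).

    tau_hat, var_hat play the roles of the predictive mean and variance,
    i.e. the moments (16) and (17) of the paper.
    """
    z = norm.ppf(0.5 + level / 2.0)
    half = z * np.sqrt(var_hat)
    return tau_hat - half, tau_hat + half

def coverage(lo, hi, truth):
    """Empirical pointwise coverage probability of intervals [lo, hi]."""
    return np.mean((truth >= lo) & (truth <= hi))
```

PL1 would replace the normal quantiles by quantiles of the fitted heavy-tailed predictive distribution; both ignore parameter-estimation uncertainty, which is why their CPs fall below the nominal levels in Table 4.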
The value of J in (18) is set to 50, which is large enough to approximate the integral accurately. Note that the results for PL0 and PL1 are quite similar, because the normal approximation works well in these cases. However, the CPs of both fall below the nominal confidence levels (NCLs), since they do not take account of the uncertainty in estimating the parameters. The BTS method overcomes this drawback and maintains the NCLs. When the data are generated from a GPFR model under Scheme I, all three models provide good results and the differences are negligible. Under Scheme II the data are generated from the TPFR model.

Table 4: CPs (%) of the PIs for the random terms.

                            PL0                  PL1                  BTS
nm  NCL(%)  Scheme     N     T     SL       N     T     SL       N     T     SL
31    80      I      70.5  71.2  71.1    70.3  70.6  70.5    79.4  80.0  81.5
              II     73.9  70.0  69.9    73.8  69.4  69.4    82.2  78.5  78.1
              V      56.0  68.4  68.5    55.9  67.8  68.0    83.6  79.3  80.5
      90      I      82.1  82.4  82.5    81.9  82.2  82.2    89.4  89.8  90.9
              II     83.9  81.5  81.5    83.8  81.2  81.3    90.6  88.9  88.8
              V      69.3  80.0  80.0    69.1  79.8  79.9    93.6  89.6  90.4
      95      I      88.9  88.9  88.9    88.8  89.0  89.0    94.5  94.8  95.7
              II     89.9  88.0  88.3    89.8  88.2  88.4    94.5  94.2  94.1
              V      78.8  87.3  87.4    78.6  87.4  87.5    97.5  94.7  95.2
61    80      I      64.2  64.5  64.5    64.2  64.3  64.2    78.7  78.7  81.8
              II     66.5  65.4  65.8    66.4  65.1  65.5    80.4  79.1  78.1
              V      45.4  62.5  62.7    45.3  62.2  62.4    83.7  80.3  82.4
      90      I      76.4  76.5  76.6    76.3  76.4  76.4    89.0  88.8  91.4
              II     77.8  77.3  77.5    77.8  77.1  77.4    89.5  89.3  88.5
              V      57.4  74.4  74.5    57.4  74.2  74.3    93.4  90.2  91.5
      95      I      84.1  84.2  84.2    83.9  84.2  84.2    94.1  94.2  95.9
              II     84.9  84.9  85.1    84.8  84.9  85.1    94.0  94.5  94.0
              V      67.0  82.2  82.3    67.0  82.1  82.4    97.4  95.0  95.8

CP: coverage probability; PI: prediction interval; NCL: nominal confidence level; PL0: plug-in method with a normal approximation; PL1: plug-in method with the predictive distribution; BTS: bootstrap prediction.
The advantage of the HPFR models is reflected in their accurate CPs combined with tighter PIs than those of GPFR (see the lengths of the BTS-based PIs under the GPFR and HPFR models in Table 5). Schemes II and V correspond respectively to a distribution misspecification and the presence of outliers; the reduction in interval length from using HPFR is much larger in the presence of outliers than under a distribution misspecification. Under the GPFR model the PI becomes wider and the CP exceeds the nominal level, while the two HPFR models retain the robustness advantage, reflected in narrow PIs with accurate CPs. This is further illustrated in Figure 5, which exhibits the results for all subjects under Scheme V with nm = 61 from one simulated data set. It vividly shows that the BTS-based PIs (dark grey) are slightly wider than the PL-based PIs (light grey), and the TPFR-based PIs (even rows) are narrower but more accurate in covering the true random terms (solid lines) than the GPFR-based PIs (odd rows).

Table 5: Lengths of the PIs for the random terms.

                            PL0                      PL1                      BTS
nm  NCL(%)  Scheme     N      T      SL        N      T      SL        N      T      SL
31    80      I      0.176  0.179  0.178    0.176  0.177  0.176    0.214  0.218  0.226
              II     0.246  0.225  0.225    0.246  0.223  0.223    0.301  0.263  0.261
              V      0.234  0.177  0.176    0.233  0.175  0.174    0.398  0.223  0.227
      90      I      0.225  0.230  0.228    0.225  0.229  0.228    0.275  0.281  0.291
              II     0.316  0.289  0.289    0.316  0.288  0.288    0.386  0.339  0.336
              V      0.300  0.228  0.226    0.300  0.227  0.225    0.510  0.287  0.292
      95      I      0.269  0.274  0.272    0.268  0.275  0.273    0.328  0.336  0.347
              II     0.376  0.344  0.344    0.376  0.346  0.346    0.460  0.405  0.402
              V      0.357  0.271  0.269    0.357  0.273  0.270    0.605  0.343  0.349
61    80      I      0.134  0.136  0.135    0.134  0.135  0.135    0.182  0.184  0.197
              II     0.187  0.170  0.170    0.187  0.169  0.169    0.256  0.216  0.212
              V      0.169  0.134  0.134    0.169  0.133  0.133    0.360  0.196  0.206
      90      I      0.172  0.174  0.174    0.172  0.174  0.174    0.234  0.236  0.253
              II     0.240  0.218  0.218    0.240  0.218  0.217    0.328  0.278  0.272
              V      0.217  0.172  0.172    0.217  0.172  0.171    0.462  0.252  0.265
      95      I      0.205  0.208  0.207    0.205  0.208  0.208    0.278  0.281  0.301
              II     0.286  0.260  0.259    0.286  0.260  0.260    0.390  0.332  0.324
              V      0.258  0.205  0.204    0.258  0.206  0.205    0.549  0.300  0.315

5.3 Prediction for a new subject

We have studied the prediction of subject-specific curves for individuals who are observed. We now assess the performance of the prediction of y_{M+1}(t) for a new subject. We generate data for twenty-one (M + 1) subjects with nm = 61 and take the responses of the last (21st) subject as our test target. Here we consider Schemes I, II and V as described before, along with a new one, Scheme VI, under which the data are generated from GPFR as in Scheme I but with larger random errors in the test subject (φ_ε is increased from 0.01 to 0.05). We use half (30 points) of the data from the test subject, together with the data from the other twenty subjects, as observed data, and try to predict the other half (31 points) of the responses (test data) of the test subject. We consider two types of test data: one is chosen randomly, and the other is the second half, i.e., all the data with t_i ∈ [0, 4]. They represent interpolation and extrapolation respectively; the latter is more challenging than the former. Tables 6-8 give in turn the RMSEs, CPs, and lengths of PIs of the BTS-based predictions calculated from fifty replications. The values of RMSE are reported in Table 6.
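The observed/test split for the new subject can be sketched as follows (the function name and index bookkeeping are our own; the paper only describes the split in words):

```python
import numpy as np

def split_test_subject(t, mode="interpolation", rng=None):
    """Split a new subject's 61 points into ~30 observed / 31 test (sketch).

    'interpolation': the 31 test points are chosen at random;
    'extrapolation': the test points are the second half, t in [0, 4].
    Returns (observed indices, test indices).
    """
    n = len(t)
    if mode == "interpolation":
        rng = rng or np.random.default_rng(0)
        test_idx = np.sort(rng.choice(n, size=31, replace=False))
    else:
        test_idx = np.arange(n - 31, n)   # last 31 points, t in [0, 4]
    obs_idx = np.setdiff1d(np.arange(n), test_idx)
    return obs_idx, test_idx

t = np.linspace(-4.0, 4.0, 61)
obs, test = split_test_subject(t, mode="extrapolation")
```

Extrapolation is harder because none of the observed points from the test subject lie in the test region, so the prediction there leans entirely on the population-level mean and the covariance structure.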
Figure 5: Predictions of all twenty random terms under Scheme V. Odd rows: GPFR-based prediction; even rows: TPFR-based prediction. Solid line: true underlying random term; dark grey: 95% BTS-based PI; light grey: 95% PL1-based PI.

The RMSE can be used to measure the performance of the prediction. For interpolation, all three models give very similar results under Schemes I, II and VI, meaning that all of them perform well and are robust when the distribution is misspecified. However, the two HPFR models are more robust than the GPFR model in the presence of outliers under Scheme V. The results for extrapolation tell much the same story, although overall the errors for extrapolation are larger than those for interpolation. We report the CPs and the lengths of the PIs in Tables 7 and 8, with some interesting findings. First of all, all three models give similar results under Scheme I and the CP values are all close to the nominal level. This shows good robustness for the two HPFR models, since the distribution is misspecified for those two models under Scheme I.
The CP values of the two HPFR models remain close to the NCLs under Schemes II and V, but the performance of GPFR is very unreliable: for example, its CPs are 88.3% for interpolation and 98.5% for extrapolation when the NCL is 80% under Scheme V. For Scheme VI, the PIs under GPFR are too narrow, leading to low CPs; for example, the CPs under GPFR are 69.3% for interpolation and 84.9% for extrapolation when the NCL is 95%. The GPFR model is thus very sensitive to possible fluctuations in a new subject. In contrast, the two HPFR models give better results, even though they also suffer from the distribution misspecification and the high fluctuation in the test subject.

Table 6: RMSE between the true responses of a new subject and their predictions.

              Interpolation            Extrapolation
Scheme     N      T      SL        N      T      SL
I        0.138  0.137  0.138    0.242  0.243  0.243
II       0.198  0.197  0.197    0.311  0.305  0.306
V        0.146  0.137  0.137    0.277  0.249  0.249
VI       0.277  0.277  0.277    0.315  0.315  0.316

Interpolation: the test data are chosen randomly from the new subject; extrapolation: the test data are the second half of the new subject's data.

Table 7: CPs (%) of the PIs for the responses of a new subject.

                   Interpolation           Extrapolation
NCL(%)  Scheme     N      T      SL       N      T      SL
80        I      81.1   80.6   82.1    76.4   77.0   78.4
          II     85.9   79.6   79.9    82.5   80.6   79.8
          V      88.3   79.5   80.7    98.5   79.7   80.4
          VI     49.9   73.9   74.2    65.6   90.5   90.4
90        I      90.0   90.3   90.6    87.5   88.1   88.5
          II     92.3   90.8   90.7    91.0   91.2   90.5
          V      96.1   90.2   90.8    99.7   90.4   91.0
          VI     60.7   84.8   85.4    76.8   96.5   96.6
95        I      94.7   94.5   95.5    92.8   94.0   94.1
          II     94.9   94.9   95.0    94.8   95.7   95.9
          V      98.6   94.6   95.5    99.9   94.8   95.7
          VI     69.3   91.2   91.0    84.9   98.5   98.5

Table 8: Lengths of the PIs for the responses of a new subject.
                   Interpolation               Extrapolation
NCL(%)  Scheme     N      T      SL         N      T      SL
80        I      0.353  0.356  0.365     0.584  0.598  0.606
          II     0.494  0.429  0.428     0.787  0.735  0.729
          V      0.487  0.353  0.359     1.390  0.642  0.650
          VI     0.365  0.623  0.624     0.589  1.100  1.108
90        I      0.453  0.461  0.471     0.750  0.774  0.784
          II     0.634  0.555  0.553     1.010  0.950  0.943
          V      0.625  0.456  0.463     1.787  0.831  0.840
          VI     0.468  0.806  0.809     0.755  1.423  1.435
95        I      0.541  0.554  0.565     0.894  0.929  0.941
          II     0.756  0.667  0.665     1.203  1.142  1.133
          V      0.746  0.548  0.555     2.131  0.998  1.007
          VI     0.558  0.969  0.973     0.900  1.710  1.726

The predictions with a 95% NCL from one simulated data set are plotted in Figure 6 under Scheme V (upper panels) and Scheme VI (lower panels) with nm = 61. For Scheme V, the PIs under the TPFR model (light grey) are narrower than those under the GPFR model (dark grey), especially for extrapolation (upper-right); bear in mind that the coverage probability of the former is also closer to the 95% NCL than that of the latter. For Scheme VI, the PIs under the GPFR model are too narrow to cover a large proportion of the observations, leading to a very low coverage probability. Overall, we have shown that the HPFR models perform better than the GPFR model when the distribution is misspecified or when there are outliers, and that the proposed BTS-based PIs under the HPFR models are quite effective for subject-specific predictions.

Figure 6: Bootstrap prediction for a new subject. Upper-left: interpolation under Scheme V; upper-right: extrapolation under Scheme V; lower-left: interpolation under Scheme VI; lower-right: extrapolation under Scheme VI. Dark grey: 95% BTS-based PI under the GPFR model; light grey: 95% BTS-based PI under the TPFR model. Dot: observed response; circle: true test response.

6 Concluding remarks and discussion

GP models are widely used for the analysis of medical data. The GPFR model can describe subject-specific characteristics nonparametrically by using a GP with a flexible covariance structure, and can cope with multidimensional covariates in a nonparametric way. However, as we demonstrated in the simulation studies, it is sensitive to distribution misspecification and outliers. To overcome this drawback we proposed the HPFR model. We have shown via a comprehensive simulation study that the proposed model has good robustness properties, giving accurate results when the distribution is misspecified or when the data are contaminated by outliers. PIs for subject-specific curves have not been well developed; we proposed the use of bootstrap PIs, and the simulation study shows that they improve on existing PIs with respect to both the CP and the length of the PI. Matlab or R code can be provided upon request. The model is very flexible, since it includes a class of process regression models: for example, the model with a GP (Shi et al., 2012), the T-process regression model (Wang et al., 2016), and models with the slash process, mixtures of GPs, the exponential power process, etc. Although our experience is limited, we prefer the HP model to the GP model because the HP model improves greatly on the GP model in the presence of outliers. In complex or big data, identifying all outliers would be difficult. It is then reasonable to use the HP model over the GP model, since it gives robust inference and does not lose much efficiency even when the data truly come from a GP. As we demonstrated, criteria such as the BIC, the predictive RMSE, the standard errors of the estimators, and the relative change ratios obtained by deleting potential outliers can be used to select a specific HP model. All computations were carried out in Matlab 2015b using a 2.4 GHz Intel i5 processor with 8.0 GB RAM. The EM algorithm is very efficient because the E-step is straightforward in our HPFR model.
The computation time to achieve convergence is about 12 seconds for each replication of the simulation (Scheme V with nm = 61) and 6 seconds for the example in Section 3 under the T-process model. However, we may face problems with high-dimensional covariates or high-frequency data. For high-dimensional covariates, a penalized likelihood framework will be helpful for selecting the crucial functional covariates (Yi et al., 2011). For high-frequency data, the HPFR model suffers from the massive computation of inverse covariance matrices; a variety of numerical methods, such as the Nyström method, active set and filtering approaches (Shi and Choi, 2011), may be applied to solve this problem.

One crucial issue in FDA is the modelling of the mean function and the covariance structure (Guo, 2002; Antoniadis and Sapatinas, 2007). For the mean function μ_m(t), we used a nonparametric function u_m^T β(t) combined with a parametric (linear) function v_m^T(t)γ. Other mean structures can also be considered. The functional coefficients β(t) were approximated by cubic B-spline basis functions with equally spaced knots. It is known that the number of knots tunes the bias and variance of the resulting estimator, so the choice of the number of knots is important for the performance of the B-spline approximation; guidance on this issue can be found in Wu and Zhang (2006). In practice, the number of knots can be determined by generalized cross-validation or BIC, but this is time consuming for high-dimensional and big data. Some low-rank smoothing methods, such as penalized spline smoothing (Ruppert et al., 2003), can be used to decrease the computational burden. For the covariance structure, we combined a parametric (linear) random-effects model w_m^T(t)b_m with a nonparametric (non-linear) random-effects term ζ_m(x_m(t)) by using HPs.
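A cubic B-spline design matrix with equally spaced knots, as used here for β(t), might be built as follows; the exact knot construction in the paper is not specified, so the clamped (repeated boundary) knot choice is an assumption on our part.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(t, n_knots=18, lo=-4.0, hi=4.0, degree=3):
    """Cubic B-spline design matrix on equally spaced knots in [lo, hi].

    A sketch: n_knots points are placed uniformly on [lo, hi]; the
    boundary knots are repeated degree+1 times (a clamped spline).
    Returns an array of shape (len(t), number of basis functions).
    """
    interior = np.linspace(lo, hi, n_knots)[1:-1]          # interior knots
    knots = np.concatenate([[lo] * (degree + 1), interior, [hi] * (degree + 1)])
    n_basis = len(knots) - degree - 1
    # identity coefficient matrix -> each column evaluates one basis function
    return BSpline(knots, np.eye(n_basis), degree)(t)
```

Each functional coefficient in β(t) is then estimated as a linear combination of the columns of this matrix, reducing the functional problem to a finite-dimensional one.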
The parametric random-effects model w_m^T(t)b_m can provide explanatory information relating the response to some covariates and can characterize the heterogeneity among subjects. A diagonal matrix Σ_b means that a few latent variables act independently within subjects, while a non-diagonal positive-definite matrix Σ_b implies that there are correlations among the latent variables. The nonparametric process-based random-effects ζ_m(x_m(t)) can handle flexible subject-specific deviations. The covariance kernel allows us to consider multi-dimensional functional covariates to capture the nonlinear serial correlation within subjects. The kernel may represent symmetric (mirror) effects over time if we consider time alone in the process. More discussion of the covariance kernel can be found in Rasmussen and Williams (2006) and Shi and Choi (2011).

Model diagnosis is also an important issue in FDA. Our HPFR model assumes homogeneity, which means that different subjects share the same location parameters, scale parameters and degrees of freedom. However, this assumption is questionable when the data are collected from different groups or sources. Recently, Fang et al. (2016) proposed two tests for evaluating variance homogeneity in mixed-effects models; their approaches may also be adapted to FDA models. For a discussion of model-checking plots for various model misspecifications, see Lee et al. (2006). If the assumption of homogeneity does not hold, we will develop a model-based method of clustering or classification in future work.

Acknowledgement

The authors thank Prof. West for providing the renal anaemia data. The authors are grateful to the Editor, the Associate Editor and two referees, whose questions and insightful comments have greatly improved the work.
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Dr Cao's work was supported by the National Science Foundation of China (Grant No. 11301278) and the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (Grant No. 13YJC910001).

Technical details on the conditional distribution properties of SMN distributions, the information matrix and the proof of information consistency are presented in an additional document, which is distributed with the paper as Supplementary Material.

References

Rasmussen CE and Williams CKI. Gaussian processes for machine learning. Cambridge, MA: MIT Press, 2006.
Shi JQ, Wang B, Murray-Smith R, et al. Gaussian process functional regression modelling for batch data. Biometrics 2007; 63(3): 714-723.
Shi JQ, Wang B, Will EJ, et al. Mixed-effects GPFR models with application to dose-response curve prediction. Stat Med 2012; 31(26): 3165-3177.
Gramacy R and Lian H. Gaussian process single-index models as emulators for computer experiments. Technometrics 2012; 54(1): 30-41.
Wang B and Shi JQ. Generalized Gaussian process regression model for non-Gaussian functional data. J Am Stat Assoc 2014; 109(507): 1123-1133.
Andrews DF and Mallows CL. Scale mixtures of normal distributions. J R Stat Soc Series B Stat Methodol 1974; 36(1): 99-102.
Lachos VH, Bandyopadhyay D and Dey DK. Linear and nonlinear mixed-effects models for censored HIV viral loads using normal/independent distributions. Biometrics 2011; 67(4): 1594-1604.
Meza C, Osorio F and Cruz R. Estimation in nonlinear mixed-effects models using heavy-tailed distributions. Stat Comput 2012; 22(1): 121-139.
Cao CZ, Lin JG, Shi JQ, et al. Multivariate measurement error models for replicated data under heavy-tailed distributions. J Chemometr 2015; 29(8): 457-466.
Blas B, Bolfarine H and Lachos VH. Heavy tailed calibration model with Berkson measurement errors for replicated data.
Chemometr Intell Lab 2016; 156: 21-35.
Zhu H, Brown PJ and Morris JS. Robust, adaptive functional regression in functional mixed model framework. J Am Stat Assoc 2011; 106(495): 1167-1179.
Osorio F. Influence diagnostics for robust P-splines using scale mixture of normal distributions. Ann Inst Stat Math 2016; 68(3): 589-619.
Lee Y, Nelder JA and Pawitan Y. Generalized linear models with random effects: unified analysis via H-likelihood. London: Chapman and Hall, 2006.
McCulloch CE and Neuhaus JM. Prediction of random effects in linear and generalized linear models under model misspecification. Biometrics 2011; 67(1): 270-279.
Lee Y and Kim G. H-likelihood predictive intervals for unobservables. Int Stat Rev 2016; Online.
Diggle PJ, Liang KY and Zeger SL. Analysis of longitudinal data. New York: Oxford University Press, 1996.
Lee Y and Nelder JA. Conditional and marginal models: another view (with discussion). Stat Sci 2004; 19(2): 219-238.
Ramsay JO and Silverman BW. Functional data analysis, 2nd edn. New York: Springer, 2005.
Shi JQ and Choi T. Gaussian process regression analysis for functional data. London: Chapman and Hall, 2011.
Lange K and Sinsheimer JS. Normal/independent distributions and their applications in robust regression. J Comput Graph Stat 1993; 2(2): 175-198.
Pinheiro JC, Liu C and Wu YN. Efficient algorithms for robust estimation in linear mixed-effects models using the multivariate t distribution. J Comput Graph Stat 2001; 10(2): 249-276.
Savalli C, Paula GA and Cysneiros FJA. Assessment of variance components in elliptical linear mixed models. Stat Model 2006; 6(1): 59-76.
Wang Z, Shi JQ and Lee Y. Extended T-process regression models. 2016; arXiv: 1511.03402.
West RM, Harris K, Gilthorpe MS, et al. Functional data analysis applied to a randomized controlled clinical trial in hemodialysis patients describes the variability of patient responses in the control of renal anemia. J Am Soc Nephrol 2007; 18(8): 2371-2376.
Tolman C, Richardson D, Bartlett C, et al. Structured conversion from thrice weekly to weekly erythropoietic regimens using a computerized decision-support system: a randomized clinical study. J Am Soc Nephrol 2005; 16(5): 1463-1470.
Will EJ, Richardson D, Tolman C, et al. Development and exploitation of a clinical decision support system for the management of renal anaemia. Nephrol Dial Transplant 2007; 22(suppl 4): iv31-iv36.
Schwarz G. Estimating the dimension of a model. Ann Stat 1978; 6(2): 461-464.
Fang KT, Kotz S and Ng KW. Symmetrical multivariate and related distributions. London: Chapman and Hall, 1990.
Lee Y and Nelder JA. Hierarchical generalized linear models (with discussion). J R Stat Soc Series B Stat Methodol 1996; 58(4): 619-678.
Meng XL and Rubin DB. Maximum likelihood estimation via the ECM algorithm: a general framework. Biometrika 1993; 80(2): 267-278.
Liu CH and Rubin DB. The ECME algorithm: a simple extension of EM and ECM with faster monotone convergence. Biometrika 1994; 81(4): 633-648.
Lange K, Little R and Taylor J. Robust statistical modeling using the t distribution. J Am Stat Assoc 1989; 84(408): 881-896.
Bjørnstad JF. Predictive likelihood: a review. Stat Sci 1990; 5(1): 242-265.
Bjørnstad JF. On the generalization of the likelihood function and likelihood principle. J Am Stat Assoc 1996; 91(434): 791-806.
Seeger MW, Kakade SM and Foster DP. Information consistency of nonparametric Gaussian process methods. IEEE T Inform Theory 2008; 54(5): 2376-2382.
Yao F, Müller HG and Wang JL. Functional data analysis for sparse longitudinal data. J Am Stat Assoc 2005; 100(470): 577-590.
Li Y and Hsing T. Uniform convergence rates for nonparametric regression and principal component analysis in functional/longitudinal data. Ann Stat 2010; 38(6): 3321-3351.
Yi G, Shi JQ and Choi T. Penalized Gaussian process regression and classification for high-dimensional nonlinear data. Biometrics 2011; 67(4): 1285-1294.
Guo W.
Functional mixed effects models. Biometrics 2002; 58(1): 121-128.
Antoniadis A and Sapatinas T. Estimation and inference in functional mixed-effects models. Comput Stat Data Anal 2007; 51(10): 4793-4813.
Wu H and Zhang JT. Nonparametric regression methods for longitudinal data analysis: mixed-effects modeling approaches. New Jersey: Wiley, 2006.
Ruppert D, Wand MP and Carroll RJ. Semiparametric regression. Cambridge: Cambridge University Press, 2003.
Fang X, Li J, et al. Detecting the violation of variance homogeneity in mixed models. Stat Methods Med Res 2016; 25(6): 2506-2520.