Supporting Web Materials
Supplementary Web materials for “Impact of the model building strategy on the inference about
time-dependent and non-linear covariate effects in survival analysis”.
The current document presents supplementary materials that, because of space limitation, were
not included in the manuscript.
A.1. Alternative tests for NL and TD effects and for association with hazard
There are two alternative tests which may be used for each of the NL and TD effects.
Indeed, we can test for the NL effect of Xi conditional on (i) the PH assumption (“test of linearity
assuming PH” in Figure 1 in the manuscript), or (ii) a TD effect (“test of linearity assuming TD”
in Figure 1 in the manuscript). Similarly, we can test for the TD effect of Xi by (i) assuming its
linear effect (“test of PH assuming linearity” in Figure 1 in the manuscript) or (ii) assuming a NL
effect (“test of PH assuming NL” in Figure 1 in the manuscript). For example, testing the TD
effect of Xi, while adjusting for its NL effect, consists of three steps. First, we fit the model with
both TD and NL effects of Xi, ln(HR)= βi(t)*gi(xi). This model uses 7 degrees of freedom (df): 4
df for the TD effect and 3 df for the NL effect. Then, we fit the 3-df model ln(HR)= gi(xi), in which
Xi is assumed to have an NL effect but to be consistent with the PH assumption. Finally, we
compare the deviances of the two models. Under the null hypothesis of the PH effect for Xi, the
resulting likelihood ratio test (LRT) statistic has a chi-square distribution with 4 df (7-3=4).
Similar LRTs can be constructed to test the linearity assumption [1].
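The deviance comparison described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the deviances passed in are hypothetical numbers, not results from the manuscript, and the helper names `chi2_sf` and `lrt_p_value` are introduced here for convenience. The chi-square tail probability is computed with the standard library only.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square with integer df >= 1."""
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)               # Q(1, z) = exp(-z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))    # Q(1/2, z) = erfc(sqrt(z))
    # Recurrence for the regularized upper incomplete gamma function:
    # Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q

def lrt_p_value(deviance_reduced, deviance_full, df_reduced, df_full):
    """LRT statistic and p-value for two nested models."""
    stat = deviance_reduced - deviance_full
    return stat, chi2_sf(stat, df_full - df_reduced)

# Hypothetical deviances (-2 log partial likelihood), for illustration only:
# the reduced model is ln(HR)=gi(xi) (3 df), the full model is βi(t)*gi(xi) (7 df).
stat, p = lrt_p_value(2110.4, 2099.1, df_reduced=3, df_full=7)
print(f"LRT = {stat:.1f} on 4 df, p = {p:.3f}")
```

The same function would apply to the tests of linearity, with the df of the two nested models changed accordingly.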
Finally, Figure 1 in the manuscript shows that the H0 of no association between a
continuous predictor and the hazard can be tested using different approaches. Specifically, the
conventional ‘constrained’ 1-df LRT compares the ‘null’ model, which excludes the predictor, with
the model that imposes both PH and LL constraints, while a more complex ‘flexible’ test
(indicated by a curvilinear arrow at the right edge of Figure 1 in the manuscript) uses 7 df to
compare the ‘null’ model against the flexible model that estimates both TD and NL effects of the
predictor of interest.
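As a rough bookkeeping aid, the df of each test in this section can be derived as the difference between the df of the two nested models being compared. The sketch below is an assumption-laden illustration: the values 0, 1, 3 and 7 df are taken from the text, whereas the 4 df assigned to the TD-only model (and therefore the df of the three tests that involve it or are not quoted in the text) is an assumption made here, and the names `MODEL_DF`, `TESTS` and `lrt_df` are hypothetical.

```python
# Degrees of freedom of the nested models for one continuous predictor Xi.
# 0, 1, 3 and 7 df are stated in the text; 4 df for the TD-only model
# (TD effect with Xi entering linearly) is an ASSUMPTION of this sketch.
MODEL_DF = {"null": 0, "PH+LL": 1, "NL only": 3, "TD only": 4, "TD+NL": 7}

# Each test compares a reduced model against a fuller model.
TESTS = {
    "constrained test of association": ("null", "PH+LL"),
    "flexible test of association": ("null", "TD+NL"),
    "test of PH assuming linearity": ("PH+LL", "TD only"),
    "test of PH assuming NL": ("NL only", "TD+NL"),
    "test of linearity assuming PH": ("PH+LL", "NL only"),
    "test of linearity assuming TD": ("TD only", "TD+NL"),
}

def lrt_df(test_name):
    """df of the LRT = df of the fuller model minus df of the reduced model."""
    reduced, full = TESTS[test_name]
    return MODEL_DF[full] - MODEL_DF[reduced]

for name in TESTS:
    print(f"{name}: {lrt_df(name)} df")
```

Only the 1-df constrained test, the 7-df flexible test and the 4-df test of PH assuming NL are quoted in the text; the remaining values follow from the assumption above and should be checked against the manuscript.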
A.2. Preliminary simulations: methods
We first simulated a simple hypothetical prognostic study of the effects of three
continuous covariates (X1-X3) on cancer mortality. The study included 300 subjects with a
censoring rate of 33% (i.e. about 200 observed events). X1 and X2 were normally distributed and
were negatively correlated (r=-0.7), while X3 was generated independently of X1 and X2 and was
log-normally distributed. Only X1 was assumed to have TD and NL effects. Accordingly, the
hazard conditional on the covariates was defined as:
λ(t|X1,X2,X3)=λ0(t)*exp[β1(t)*g1(X1)+β2*X2+β3*X3]   (W1)
where β2= β3=ln(1.5). The functional forms considered for β1(t) and g1(X1) are shown in Web
Figure A1. We generated survival times according to model (W1), using the permutational
algorithm [2-4], designed specifically to generate event times conditional on TD effects and/or
TD covariates. We first generated 300 observed event times from a log-normal distribution, then
simulated 300 prognostic factor vectors (values of covariates X1, X2 and X3). Next, we matched
the 300 covariate vectors with the 300 survival times based on the probability derived from the
partial likelihood in (W1).
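The matching step can be sketched as follows. This is a minimal illustration of the general idea of a permutational algorithm, not the authors' implementation: the functional forms chosen for β1(t) and g1(X1), the standardized covariate scales, the log-normal parameters and the way censoring indicators are drawn are all assumptions made for this example, and `permutational_match` is a hypothetical name.

```python
import math
import random

def permutational_match(times, events, covariates, log_hr, rng):
    """Assign one covariate vector to each (time, event) pair.

    Times are processed in increasing order. At an event time, a vector is
    drawn from those not yet assigned with probability proportional to
    exp(log HR at that time), i.e. its partial-likelihood contribution.
    At a censoring time, a vector is drawn uniformly.
    """
    order = sorted(range(len(times)), key=lambda i: times[i])
    remaining = list(range(len(covariates)))      # vectors still at risk
    assigned = [None] * len(times)
    for i in order:
        t = times[i]
        if events[i]:
            weights = [math.exp(log_hr(covariates[j], t)) for j in remaining]
            pick = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        else:
            pick = rng.randrange(len(remaining))
        assigned[i] = covariates[remaining.pop(pick)]
    return assigned

rng = random.Random(2024)
n = 300
covs = []
for _ in range(n):
    x1 = rng.gauss(0.0, 1.0)                      # hypothetical scale
    x2 = -0.7 * x1 + math.sqrt(1 - 0.7 ** 2) * rng.gauss(0.0, 1.0)  # r = -0.7
    x3 = math.exp(rng.gauss(0.0, 1.0))            # log-normal, independent
    covs.append((x1, x2, x3))

times = [rng.lognormvariate(0.0, 1.0) for _ in range(n)]        # hypothetical parameters
events = [1 if rng.random() < 2 / 3 else 0 for _ in range(n)]   # about 33% censored

B = math.log(1.5)                                 # β2 = β3 = ln(1.5), as in (W1)

def log_hr(x, t):
    beta1_t = math.exp(-t)                        # hypothetical TD effect β1(t)
    g1 = x[0] ** 2                                # hypothetical NL effect g1(X1)
    return beta1_t * g1 + B * x[1] + B * x[2]

matched = permutational_match(times, events, covs, log_hr, rng)
```

Each covariate vector is used exactly once, so the marginal distributions of the covariates and of the observed times are preserved while their association follows the assumed hazard model.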
A.3. Preliminary simulations: results
In the univariate Cox’s PH model that only included X1 and imposed the linearity and PH
assumptions, the power to detect a significant effect of X1 was only 11.3% (“constrained test of
association” in Figure 1 in the manuscript). In contrast, after having accounted for both NL and
TD effects of X1 in the univariate model, the power increased to 83.3% (“flexible test of
association” in Figure 1 in the manuscript). Similarly, in the multivariable model that adjusted
for X2 and X3, the power for testing the effect of X1 increased from 31.7% for constrained test to
99.3% for the flexible test.
We explored the impact of mismodeling the confounders on the type I error rate, estimated as
the proportion of the simulated samples in which a given test incorrectly rejects a true
null hypothesis. To this end, we tested for the spurious NL and TD effects of X2 (using α=0.05),
whose true effect was consistent with both linearity and PH assumptions. The type I error rates for
testing both the NL (15.3% or 14.7%) and the TD (26.7% or 28.7%) effects of X2 were
seriously inflated, by a factor of about 3 to 6, when we, respectively, did not adjust for X1 or
incorrectly adjusted only for its PH/LL effects. In contrast, for both effects of X2, we observed
the correct type I error rates of 4-5%, very close to the nominal significance level of 5%, when
adjusting for both (true) NL and TD effects of X1. These results show the importance of correctly
specifying the effects of a confounder when testing for the effects of the predictor of interest.
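The inflation of the type I error rates quoted above, and the Monte Carlo uncertainty of an estimated rejection proportion, can be checked with a short calculation. The number of simulated samples is not stated in this section, so `N_SIMS = 300` below is purely an assumption; the function names are hypothetical.

```python
import math

NOMINAL_PCT = 5.0
N_SIMS = 300   # ASSUMPTION: the number of simulated samples is not stated here

def inflation(rate_pct, nominal_pct=NOMINAL_PCT):
    """Ratio of the empirical type I error rate to the nominal level."""
    return rate_pct / nominal_pct

def mc_standard_error(rate, n_sims):
    """Binomial standard error of a rejection proportion over n_sims samples."""
    return math.sqrt(rate * (1.0 - rate) / n_sims)

# Empirical type I error rates (in %) for the tests of X2, as quoted in the text.
reported = {
    "NL, X1 not adjusted for": 15.3,
    "NL, X1 adjusted as PH/LL": 14.7,
    "TD, X1 not adjusted for": 26.7,
    "TD, X1 adjusted as PH/LL": 28.7,
}
for label, pct in reported.items():
    se_pct = 100.0 * mc_standard_error(pct / 100.0, N_SIMS)
    print(f"{label}: {pct}% (x{inflation(pct):.1f} nominal, MC SE about {se_pct:.1f}%)")
```

Under the assumed number of samples, a rate of 4-5% would be well within Monte Carlo error of the nominal 5% level, whereas the rates above would not.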
A.4. Main simulations: relevance of the TD and NL effects chosen
To enhance the practical relevance of our simulations, we attempted to select curves that
could be considered clinically plausible for some factors often investigated in prognostic studies
of cancer mortality. For example, we assumed that the dichotomous covariate X1 represented the
indicator of a surgery, right after the cancer diagnosis (time 0). Usually, mortality is high just
after the surgery because of post-surgical complications, but later the subjects who survived this
critical period have a lower risk than those who had not been operated on. Accordingly, the TD
effect of X1 illustrated in Figure 2a in the manuscript shows ‘crossing hazards’ [1], with a
harmful effect of surgery in the first 6 months of follow-up (log HR>0) and a protective effect
later on. X2 represented the cumulative smoking intensity (assuming all subjects were smokers),
with a constant-over-time (PH) but non-linear effect (Figure 3a in the manuscript), similar to
previous studies of smoking-cancer associations [2, 3]. X3 represented the age at cancer
diagnosis. Similar to previous studies of various cancers, age was assumed to have a U-shaped
effect (Figure 3b in the manuscript), with patients diagnosed at both very young and old ages
having higher mortality [4, 5]. Furthermore, the TD effect of age was also non-monotone over
time (Figure 2c in the manuscript), with a high impact on early, post-surgical mortality, followed
by a much smaller but gradually increasing impact on later, cancer-related mortality [6]. To
increase variation in the true shapes of the NL and TD effects, X4 was assumed to represent a
hypothetical early biomarker of cancer progression, which was especially discriminative in both
tails of its distribution (Figure 3c in the manuscript) and had a short-term lagged effect, often
observed in clinical studies of biomarkers [7, 8], i.e. it predicted mortality 1 to 2 years after
cancer diagnosis (Figure 2d in the manuscript). Finally, X5 represented the tumor diameter at
diagnosis, with a linear association with the log hazard (Figure 3d in the manuscript) that becomes
gradually weaker with increasing time since diagnosis, as reflected by the log HR gradually
decaying toward 0 in Figure 2e in the manuscript. This is consistent with empirical findings
that the effects of baseline clinical measures of several prognostic factors decrease over time
[6].
A.5. Main simulations: sensitivity analyses
The results of section 3.2 were based on the following equation:
λ(t|X1,X2,X3,X4,X5)=λ0(t)*exp[β1(t)*X1+g2(X2)+β3(t)*g3(X3)+β4(t)*g4(X4)+β5(t)*X5]
We considered similar simulation studies but allowing the number of true TD and NL effects
to vary. More particularly, we considered the two models described by the following equations:
λ(t|X1,X2,X3,X4,X5)=λ0(t)*exp[β1*X1+g2(X2)+β3*X3+β4(t)*g4(X4)+β5(t)*X5]   (W2)
λ(t|X1,X2,X3,X4,X5)=λ0(t)*exp[β1*X1+β2*X2+β3*X3+β4*X4+β5(t)*X5]   (W3)
The simulation study defined by (W2) assumes TD effects for X4 and X5 and NL effects for
X2 and X4. Compared to the main simulations, it additionally imposes linearity for
X3 and PH for X1 and X3.
In contrast, the simulation study defined by (W3) assumes only a TD effect for X5.
Compared to the main simulations, it additionally imposes linearity for X2, X3 and
X4, and PH for X1, X3 and X4.
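The three data-generating scenarios can be summarized by listing which covariates carry a true TD or NL effect, as read off the three equations above. The sketch below only encodes that reading; the dictionary layout and function names are choices made for this illustration.

```python
# True TD and NL effects in each scenario, read from the hazard equations:
# βk(t) marks a TD effect of Xk and gk(Xk) marks an NL effect of Xk.
SCENARIOS = {
    "main": {"TD": {"X1", "X3", "X4", "X5"}, "NL": {"X2", "X3", "X4"}},
    "W2": {"TD": {"X4", "X5"}, "NL": {"X2", "X4"}},
    "W3": {"TD": {"X5"}, "NL": set()},
}

def n_true_effects(name):
    """Total number of true TD and NL effects in a scenario."""
    s = SCENARIOS[name]
    return len(s["TD"]) + len(s["NL"])

def constraints_added(name):
    """Covariates that become PH or linear relative to the main simulations."""
    main, s = SCENARIOS["main"], SCENARIOS[name]
    return {"PH": sorted(main["TD"] - s["TD"]),
            "linear": sorted(main["NL"] - s["NL"])}

for name in SCENARIOS:
    print(name, n_true_effects(name), constraints_added(name))
```

The counts (seven, four and one true effects) match the descriptions of the main simulations, (W2) and (W3), respectively.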
The results for the constrained multivariable model, the full flexible multivariable model and
the backward elimination are given in Web Tables A2 and A3 for the simulations defined by (W2)
and (W3), respectively. The results for the univariate model are not presented because univariate
analyses are known to be inaccurate in the multivariable setting.
A.6. Detailed results of the backward elimination procedure in the main simulations
As expected, the frequency with which the proposed backward procedure detected a given true
TD or NL effect did depend on the strength and the form of the effect (column 6 of Table 2 in
the manuscript). For example, the very low power for the TD effect of X3 is mostly due to the
fact that most changes in its effect are limited to only the first 5 months of follow-up (Figure 2c
in the manuscript), and even during this period, the changes are smaller than those simulated for
the TD effects of other covariates. Similarly, the NL effect of X3 is also detected infrequently
because (i) the differences in the log HR across the values of X3 are less important than for other
covariates (Figure 3b in the manuscript) and, in addition, (ii) these modest differences are
multiplied by relatively small values of the corresponding TD effect. Indeed, the mean LRT
values identify both TD and NL effects of X3 as much weaker than for other covariates (last
column of Table 2 in the manuscript). Furthermore, results in Table 2 in the manuscript indicate
that, in a multivariable setting, the ability of the backward elimination to identify true effect of a
given predictor may also depend on its correlations with other covariates and on the true effects
of the correlated covariates. For example, the fact that the lowest power (21.0%) was obtained
for the TD effect of X3 may be partly due to its strong positive correlation (Pearson r=+0.7) with
X5 whose TD effect is stronger but, similar to X3, generally decreases during the follow-up
(Figures 2c versus 2e in the manuscript). On the other hand, it should be noted that for all true
TD/NL effects the power based on the model selected by the backward procedure (column 6 of
Table 2 in the manuscript) is as high as that based on the true model (column 7) and higher than
that based on alternative strategies considered. In fact, Sauerbrei et al also reported that
backward elimination performed better than other selection procedures in their study of flexible
multivariable modeling of the effects of continuous covariates [5].
A.7. Application to lung cancer: description of the cohort and the study variables
The study subjects were all treated with chemotherapy between April 9, 2002 and September
18, 2008, with a median follow-up duration of 8.6 months. Vital status and, where applicable,
date of death were obtained from clinical charts [6]. The follow-up was continued until March
15, 2009, when all patients still alive were censored. Furthermore, because mortality observed
more than 3 years after the first chemotherapy was (i) too low for reliable estimation, and (ii)
unlikely to be due to NSCL cancer [6], all patients who survived for 3 years after time 0 were
censored at that time.
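The administrative censoring rule at 3 years can be written as a one-line transformation of each subject's follow-up time and event indicator. The sketch below assumes that times are measured in months from time 0; the function name and the example values are illustrative only.

```python
def censor_at(time, event, horizon):
    """Censor follow-up at `horizon`: later deaths no longer count as events."""
    if time > horizon:
        return horizon, 0
    return time, event

HORIZON_MONTHS = 36.0   # 3 years, assuming follow-up is recorded in months

# Hypothetical subjects: (follow-up in months, 1 = died, 0 = censored)
subjects = [(8.6, 1), (40.2, 1), (51.0, 0), (12.3, 0)]
print([censor_at(t, d, HORIZON_MONTHS) for t, d in subjects])
```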
Baseline data, collected at the date of the start of the chemotherapy, included several
quantitative biomarkers as well as some categorical variables [6]. Given the number of observed
events (206 deaths) and the need to respect the critical ratio of at least 10 events per each df of
the estimated model [2], we had to restrict our multivariable analyses to a smaller subset of
covariates of primary interest. In particular, we present the analyses that focus on the potential
prognostic utility of the neutrophil count (mean: 7.09*10^9 l^-1, standard deviation (SD): 3.55*10^9 l^-1),
a continuous variable whose predictive ability in NSCL has so far been relatively little
investigated [7-8]. In the same analyses, we adjusted for, and re-assessed, the effects of two
other continuous variables, albumin (mean: 40.2 g l^-1, SD: 4.1 g l^-1) and logarithm of C-reactive
protein (log(CRP), mean: 3.8 mg l^-1, SD: 2.2 mg l^-1), that are considered important prognostic
factors in NSCL [9-10] and were both found to have important TD or NL effects in a previous
analysis [6]. Given the major impact of smoking on lung cancer, all models adjusted also for the
binary indicator of smoking status (ever/never, with 84.8% of all subjects classified as “ever
smokers”).
A.8. Assessment of the robustness of the findings in lung cancer application
Two approaches were used to assess the robustness of the covariate effects estimated in the
flexible model selected by the backward elimination procedure. Firstly, we implemented a
bootstrap investigation of the stability of the estimates [11-12], based on 300 bootstrap resamples of the original NSCL dataset. Then, we estimated the 90% pointwise confidence
intervals (CI) around the TD or NL estimates at selected values of, respectively, follow-up time
(t) or the continuous covariate (x), using the 5th and the 95th percentiles of the distribution of the
300 corresponding bootstrap estimates of β(t) or g(x). Pointwise 90% CI’s are useful to assess the
precision of the estimated effects, e.g. for NL effect of a covariate X, of the log HR’s associated
with specific values of X, relative to the reference value, such as the sample mean of X.
However, at each value x, the corresponding pointwise CI’s are estimated independently of CI’s
for the other values of X, which implies that they do not allow assessing the robustness of the
estimated shape of the NL curve g(X) [2]. Therefore, for selected NL effects, we have also
plotted and visually assessed the 300 estimates, obtained from individual bootstrap re-samples.
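The percentile-based pointwise intervals described above can be sketched as follows. The bootstrap curves used here are simulated stand-ins (a hypothetical linear shape plus random noise), not re-estimated spline models, and the interpolation rule used for the percentiles is one common convention among several; `percentile` and `pointwise_ci` are names introduced for this example.

```python
import random

def percentile(values, p):
    """Percentile with linear interpolation between order statistics, p in [0, 100]."""
    s = sorted(values)
    pos = (len(s) - 1) * p / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def pointwise_ci(boot_curves, lower=5.0, upper=95.0):
    """boot_curves[b][k] is the estimate from resample b at grid point k.

    Returns one (lower, upper) pair per grid point; each pair is computed
    independently of the other grid points.
    """
    n_grid = len(boot_curves[0])
    columns = [[curve[k] for curve in boot_curves] for k in range(n_grid)]
    return [(percentile(col, lower), percentile(col, upper)) for col in columns]

# Stand-in for 300 bootstrap estimates of g(x) on a grid of x values.
rng = random.Random(1)
grid = [30.0, 35.0, 40.0, 45.0, 50.0]
boot_curves = [[-0.05 * (x - 40.0) + rng.gauss(0.0, 0.1) for x in grid]
               for _ in range(300)]
ci = pointwise_ci(boot_curves)
for x, (low, high) in zip(grid, ci):
    print(f"x = {x}: 90% CI ({low:.2f}, {high:.2f})")
```

Because each interval is computed separately at each grid point, the band says nothing about whether individual bootstrap curves share the same shape, which is why plotting the individual curves is a useful complement.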
Secondly, to further explore the robustness of the possible NL and/or TD effects of the three
continuous predictors, we used the method proposed by Perme et al [13] to obtain smoothed
pseudo-residual plots.
Web Figure A2 shows smoothed pseudo-residual plots for the three continuous variables,
at times corresponding to the 20th, 40th, 60th and 80th percentiles of the distribution of uncensored event times to further assess their possible NL and/or TD effects. For log(CRP),
neutrophil count and albumin, the pseudo-residuals show a clear departure from linearity. In
addition, the spread and the mean values of the pseudo-residuals vary across the four times,
confirming the TD effects of the predictors.
A.9. Some explanations for the selection of TD effect of albumin but neither the TD nor the
NL effects of neutrophil count by the forward procedure
First, notice that the TD effect of albumin is highly significant (p<10^-3) in the univariate
un-adjusted analyses (Strategy 1 in Table 3 of the main document) and in the ‘constrained’
multivariable model (Strategy 2 in Table 3 of the main document) that adjusts only for the
PH/LL effects of the three other covariates, but not in the full flexible multivariable model
(p=0.123). Indeed, the TD effect of albumin becomes non-significant in all models that adjust for
the TD/NL effects of both log(CRP) and neutrophil count, which can be explained partly by
moderately strong correlations of albumin with both predictors (respectively, r=-0.51 and r=0.30). Thus, in the simpler Strategies 1 and 2 that fail to adjust for TD effects of other covariates,
correlated with albumin, residual confounding might have induced its spurious TD effect. This
conjecture is supported by the inflated type I error rates found for Strategies 1 and 2 in our
simulations (see e.g. tests of TD effect of X2 in Table 2 of the main document). A similar
phenomenon could explain the significance of the TD effect of albumin in the model selected by
the forward procedure which does not include the TD and NL effects of neutrophil count (Table
3 of the main document). Indeed, the high statistical significance of the TD effect of albumin in
the constrained multivariable model (Strategy 2) explains why this effect was selected at the
initial steps of the forward procedure.
We will now attempt to explain why no TD or NL effects of neutrophil count could enter
the model after, in addition to the TD effect of albumin, the TD and NL effects of log(CRP) were
added, at the consecutive steps of the forward selection. This occurred because of two
phenomena. (i) Firstly, because of the correlations between the three continuous covariates, none
of the TD/NL effects is significant when adjusted for the TD and NL of all other covariates
(Strategy 3 in Table 3 of the main document). (ii) Secondly, the significance of the NL or TD
effects of neutrophil count is enhanced when adjusted for, respectively, the TD or NL effect of
the same variable. This is reflected in the results of Strategies 1 and 2 in Table 3 of the main
document, where such adjustments (1st approach) helped identify the NL and/or TD effects of
neutrophil count as significant, in contrast to the 2nd approach that imposed the PH or linearity
constraints. A similar phenomenon was observed in our simulations, where e.g. the power for
testing both the TD and NL effects of X4 increased substantially if, in the true flexible
multivariable models, we adjusted for, respectively, the NL and TD effect of X4 (column 7 of
Table 2 of the main document). Because the forward selection starts with the constrained PH/LL
multivariable model, the TD effect of a continuous variable X is assessed without adjusting for
the NL effect of X, and vice versa, until at least one of these effects is selected. Thus, if each of
the two effects of X becomes ‘significant’ only when adjusted for the other effect, neither TD
nor NL effect will be selected by the forward procedure.
To further support the above explanation, we carried out additional tests of the effects of
neutrophil count, when adjusting for all effects selected by the forward procedure, i.e. when
added to the model identified by Strategy 5 in Table 3 of the main document. The separate tests
of (i) TD effect of neutrophil count, under the linearity constraint, and (ii) its NL effect, under
the PH constraints, yielded non-significant results (p=0.14 and p=0.15, respectively). In contrast,
even when adjusted for the TD effect of albumin, both the TD and the NL effects of neutrophil
count approached significance when they were mutually adjusted for each other (p=0.06 and
p=0.07, respectively). Furthermore, the 6-df LRT that compared the deviance of the model
selected by the forward procedure, with PH/LL effect of neutrophil count, against the model that
included both its TD and NL effects yielded p=0.04. This indicates that the simultaneous
inclusion of both effects results in a significant improvement in the fit to data. Because the
backward elimination procedure starts with the full flexible model (Table 1 of the main
document), the separate tests of the TD or NL effect of neutrophil count were carried out while
adjusting for its NL or TD, which prevented the elimination of either effect.
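The contrast between the two procedures can be illustrated with a toy example in which each of the two effects of a variable is 'significant' only when the other effect is already in the model. This is a deliberately simplified sketch: the significance rule is hard-coded rather than derived from any fitted model, and the function names are hypothetical.

```python
def jointly_only(effect, others):
    """Toy rule: TD is significant only given NL in the model, and vice versa."""
    partner = "NL" if effect == "TD" else "TD"
    return partner in others

def forward(candidates, is_significant):
    """Start from the constrained model and add significant effects one at a time."""
    model = set()
    while True:
        addable = [e for e in sorted(candidates - model) if is_significant(e, model)]
        if not addable:
            return model
        model.add(addable[0])

def backward(candidates, is_significant):
    """Start from the full flexible model and drop non-significant effects."""
    model = set(candidates)
    while True:
        removable = [e for e in sorted(model)
                     if not is_significant(e, model - {e})]
        if not removable:
            return model
        model.remove(removable[0])

effects = {"TD", "NL"}
print("forward selects:", sorted(forward(effects, jointly_only)))
print("backward selects:", sorted(backward(effects, jointly_only)))
```

In this toy setting the forward procedure never adds either effect, because each is tested without the other, whereas the backward procedure retains both, because each is tested while the other is still in the model.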
A.10. Results about albumin in the application on non-small-cell lung cancer
The TD effect of albumin was selected as significant by the forward procedure (Strategy
5 in Table 3 of the main document) and was the last effect eliminated, with a marginally non-significant p=0.08, by the backward procedure. Therefore, Web Figure A3 shows the point
estimates of the TD and NL effects of albumin, adjusted for all other effects selected by the
backward procedure (Strategy 4 in Table 3 of the main document). The NL estimate shows that
mortality decreases monotonically with increasing baseline albumin value. In fact, the estimate is
very close to a line over a wide range of low to middle albumin values, explaining why the NL
effect is completely non-significant (p=0.83). The TD effect indicates a sharp decrease in the
impact of low albumin with increasing follow-up duration, implying that its effect is limited to
about the first six months after the initial measurement. This change in time appears to be clinically
important. Thus, further research is necessary to assess whether its marginal non-significance (p=0.08)
may simply reflect inadequate power of the test, based on the flexible multivariable model with
22 df (7 df for the NL and TD effects of each of the three continuous covariates and a 1-df effect of
smoking status) fitted to data with only 206 deaths.
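The df budget behind this remark can be reproduced with simple arithmetic, using only the counts quoted in the text (206 deaths, 7 df per continuous covariate, 1 df for smoking status) and the rule of at least 10 events per df mentioned in section A.7. Variable names are illustrative.

```python
N_EVENTS = 206             # observed deaths
DF_PER_CONTINUOUS = 4 + 3  # 4 df for the TD effect + 3 df for the NL effect
N_CONTINUOUS = 3           # log(CRP), neutrophil count, albumin
DF_BINARY = 1              # smoking status
MIN_EVENTS_PER_DF = 10     # rule of thumb cited in section A.7

total_df = N_CONTINUOUS * DF_PER_CONTINUOUS + DF_BINARY
events_per_df = N_EVENTS / total_df
max_df = N_EVENTS // MIN_EVENTS_PER_DF

print(f"full flexible model: {total_df} df, {events_per_df:.1f} events per df")
print(f"largest model respecting the rule: {max_df} df")
```

The full flexible model therefore sits slightly below 10 events per df, which is consistent with the concern about limited power expressed above.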
A.11. Discussion on the models selected by the MFP-like algorithm and the
backward selection procedure
Even if the MFP-like algorithm [14] and our backward procedure selected the same
model in our lung cancer analyses (Table 3 of the main document), the differences between the
properties of the two algorithms may result in somewhat different selections in some other
applications. For example, the 1st step of the MFP-like algorithm starts with assessing the NL
effects under the PH constraints, and then step 2 re-assesses if additional NL effects of some
covariates may be identified if a shorter follow-up is assumed, by censoring later events [14].
This approach may be less effective in identifying the NL effects of those covariates whose
effects are lagged and occur only in the later phase of follow-up, such as effects of some biopsy
variables on the risk of renal failure in lupus [15]. Furthermore, sensitivity analyses reported by
Sauerbrei et al indicated that the results of step 3 may depend on what cut-off time is used to
define the “shorter follow-up” [14].
The MFP-like strategy involves (a) initial assessment of NL effects under the PH constraints,
followed by (b) assessment of TD effects conditional on the previously estimated NL effects.
This feature of the 3-step MFP-like algorithm may be seen as both a limitation and an advantage,
relative to our approach, which involves simultaneous estimation of TD and NL effects [1]. On
the one hand, by not accounting for possible TD effects while assessing the NL effects, the 3-step
algorithm may be somewhat more susceptible to the potential residual confounding discussed in
section A.9. Indeed, Buchholz and Sauerbrei recognize that assessment of TD effects
may depend on the way the NL effect of the same covariate is modeled [16]. On the other hand,
by avoiding the need for simultaneous modeling and testing of TD and NL effects, the MFP-based
model proposed in [14] reduces the risk of non-convergence problems that require
complex numerical procedures to estimate our spline-based model [1].
REFERENCES
1. Abrahamowicz M, MacKenzie TA. Joint estimation of time-dependent and non-linear effects of
continuous covariates on survival. Statistics in Medicine 2007; 26: 392-408.
2. Abrahamowicz M, MacKenzie T, Esdaile JM. Time-Dependent Hazard Ratio: Modeling and
Hypothesis Testing With Application in Lupus Nephritis. Journal of the American Statistical
Association 1996; 91: 1432-1439.
3. Mackenzie T, Abrahamowicz M. Marginal and hazard ratio specific random data generation:
Applications to semi-parametric bootstrapping. Statistics and Computing 2002; 12: 245-252.
4. Sylvestre M-P, Abrahamowicz M. Comparison of algorithms to generate event times conditional
on time-dependent covariates. Statistics in Medicine 2008; 27: 2618-2634.
5. Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of
functional form for continuous predictors in multivariable model building. Statistics in Medicine
2007; 26: 5512-5528.
6. Gagnon B, Abrahamowicz M, Xiao Y, Beauchamp ME, MacDonald N, Kasymjanova G, Kreisman
H, Small D. Flexible modeling improves assessment of prognostic value of C-reactive protein in
advanced non-small cell lung cancer. Br J Cancer 2010; 102: 1113-1122.
7. Luo J, Chen Y-J, Narsavage G, Ducatman A. Predictors of survival in patients with non-small cell
lung cancer. Oncology nursing forum 2012; 39: 609-616.
8. Teramukai S, Kitano T, Kishida Y, Kawahara M, Kubota K, Komuta K, Minato K, Mio T, Fujita Y,
Yonei T, Nakano K, Tsuboi M, Shibata K, Furuse K, Fukushima M. Pretreatment neutrophil count
as an independent prognostic factor in advanced non-small-cell lung cancer: an analysis of Japan
Multinational Trial Organisation LC00-03. European Journal of Cancer (Oxford, England : 1990)
2009; 45: 1950-1958.
9. Koch A, Fohlin H, Sorenson S. Prognostic significance of C-reactive protein and smoking in
patients with advanced non-small cell lung cancer treated with first-line palliative
chemotherapy. Journal of thoracic oncology 2009; 4: 326-332.
10. Maeda T, Ueoka H, Tabata M, Kiura K, Shibayama T, Gemba K, Takigawa N, Hiraki A, Katayama
H, Harada M. Prognostic factors in advanced non-small cell lung cancer: elevated serum levels of
neuron specific enolase indicate poor prognosis. Japanese journal of clinical oncology 2000; 30:
534-541.
11. Royston P, Sauerbrei W. Stability of multivariable fractional polynomial models with selection of
variables and transformations: a bootstrap investigation. Statistics in Medicine 2003; 22: 639-659.
12. Sauerbrei W. The Use of Resampling Methods to Simplify Regression Models in Medical
Statistics. Journal of the Royal Statistical Society. Series C, Applied statistics 1999; 48: 313-329.
13. Perme M, Andersen P. Checking hazard regression models using pseudo-observations. Statistics
in Medicine 2008; 27: 5309-5328.
14. Sauerbrei W, Royston P, Look M. A New Proposal for Multivariable Modelling of Time‐Varying
Effects in Survival Data Based on Fractional Polynomial Time‐Transformation. Biometrical
Journal 2007; 49: 453.
15. Esdaile JM, Abrahamowicz M, Mackenzie T, Kashgarian M, Hayslett JP. The time-dependence of
long-term prediction in lupus nephritis. Arthritis & Rheumatism 1994; 37: 359-368.
16. Buchholz A, Sauerbrei W. Comparison of procedures to assess non-linear and time-varying
effects in multivariable models for survival data. Biometrical Journal 2011; 53: 308-331.
Web Table A1. Distribution of the continuous covariates in the main simulations,
according to the value of the binary variable X1.

Continuous prognostic factor    Value of binary covariate X1
                                X1=0           X1=1
X2                              N(2,1)         N(2,1)
X3                              N(0.5,1)       N(0,1)
X4                              N(-0.75,1)     N(0,1)
X5                              N(1,1)         N(0,1)
Web Table A2. Proportions of Simulated Samples in Which a Given Test Rejected the
Corresponding Null Hypothesis, at α=0.05, for the Tests for the Different Effects of Each
Covariate for all Strategies in the Case Where There Were Four “True” TD or NL Effects.
Prognostic
factor
X1
X2
X3
X4
X5
Test†
Constrained
multivariable
model
Full flexible
multivariable
model
Backward
selection§
Constrained test of association
Test PH
Flexible test of association¶
45.7*
6.3*
5.0*
Constrained test of association
Test PH assuming linearity*
Test of PH assuming NL*
Test of linearity assuming PH
Test of linearity assuming TD
Flexible test of association
12.7*
21.3*
76.0
78.0
17.7*
7.3*
75.7
71.3
Constrained test of association
Test PH assuming linearity*
Test of PH assuming NL*
Test of linearity assuming PH*
Test of linearity assuming TD*
Flexible test of association
36.7*
45.3*
14.0*
19.3*
10.0*
10.3*
9.3*
7.0*
Constrained test of association
Test PH assuming linearity
Test of PH assuming NL
Test of linearity assuming PH
Test of linearity assuming TD
Flexible test of association
51.3
60.7
59.3
66.0
27.3
49.0
39.0
58.3
Constrained test of association
Test PH assuming linearity
Test of PH assuming NL
Test of linearity assuming PH*
Test of linearity assuming TD*
Flexible test of association
91.3
79.0
12.3*
7.3*
63.3
57.0
11.0*
6.0*
True
model‡
9.0*
84.3
84.7
197
6.0*
6.3*
51.0
58.3
35.7
52.7
44.7
63.3
93.7
94.3
4.3*
Abbreviations: PH, proportional hazards; NL, non-linear; TD, time-dependent
† As defined in Figure 1.
§ For backward elimination, we show the proportion of samples in which a given effect was
selected into the final model, i.e. was statistically significant at α=0.05 in the final model.
Depending on whether the TD effect of the same variable was also included or not, the test of
linearity is adjusted or not for the TD effect and vice versa.
‡ Model in which we only included the true effects for each prognostic factor.
¶ For a binary variable, no NL effects are available. Therefore, the test of association based on
the flexible model compares the model with the TD effect of the variable
against the model which excludes the variable.
* indicates that the corresponding H0 is true, so that the reported proportion represents the type I
error rate.
Web Table A3. Proportions of Simulated Samples in Which a Given Test Rejected the
Corresponding Null Hypothesis, at α=0.05, for the Tests for the Different Effects of Each
Covariate for all Strategies in the Case Where There Was Only One “True” TD Effect.
Prognostic
factor
X1
X2
X3
X4
X5
Test†
Constrained
multivariable
model
Full flexible
multivariable
model
Backward
selection§
Constrained test of association
Test PH
Flexible test of association¶
52.7*
8.7*
8.0*
Constrained test of association
Test PH assuming linearity*
Test of PH assuming NL*
Test of linearity assuming PH
Test of linearity assuming TD
Flexible test of association
20.7*
11.3*
5.0*
2.3*
6.7*
5.0*
4.7*
2.0*
Constrained test of association
Test PH assuming linearity
Test of PH assuming NL
Test of linearity assuming PH
Test of linearity assuming TD
Flexible test of association
58.0*
44.0*
19.3*
2.7*
7.0*
5.7*
6.7*
6.0*
Constrained test of association
Test PH assuming linearity
Test of PH assuming NL
Test of linearity assuming PH
Test of linearity assuming TD
Flexible test of association
52.0*
37.3*
3.7*
2.3*
6.0*
5.7*
4.0*
3.0*
Constrained test of association
Test PH assuming linearity
Test of PH assuming NL
Test of linearity assuming PH*
Test of linearity assuming TD*
Flexible test of association
94.3
87.7
32.3*
2.7*
68.7
44.3
12.7*
4.7*
True
model‡
5.7*
3.0*
9.0*
7.3*
7.0*
3.7*
96.7
84.0
3.7*
Abbreviations: PH, proportional hazards; NL, non-linear; TD, time-dependent
† As defined in Figure 1.
§ For backward elimination, we show the proportion of samples in which a given effect was
selected into the final model, i.e. was statistically significant at α=0.05 in the final model.
Depending on whether the TD effect of the same variable was also included or not, the test of
linearity is adjusted or not for the TD effect and vice versa.
‡ Model in which we only included the true effects for each prognostic factor.
¶ For a binary variable, no NL effects are available. Therefore, the test of association based on
the flexible model compares the model with the TD effect of the variable
against the model which excludes the variable.
* indicates that the corresponding H0 is true, so that the reported proportion represents the type I
error rate.
[Figure: two panels. β1(t): TD Effect, log(HR) plotted against time; g1(X1): NL Effect, log(HR) plotted against X1.]
Web Figure A1. Time-Dependent (TD) effect and Non-Linear (NL) effect of X1 in preliminary
simulations.
[Figure: grid of panels. Rows: log(CRP), Albumin, Neutrophil count. Columns: t=108 days, t=186 days, t=260 days, t=411 days.]
Web Figure A2. The smoothed average pseudo-residuals with respect to the three continuous
covariates at four different points in time (t=108 days, t=186 days, t=260 days and t=411 days,
corresponding to the 20th, 40th, 60th and 80th percentiles of event times). Each row represents a
continuous variable.
[Figure: two panels. (a) log(HR) plotted against time (months); (b) log(HR) plotted against Albumin.]
Web Figure A3. Estimates of the TD and NL effects of Albumin adjusting for NL and TD effects
of log(CRP) and Neutrophil count as well as for the PH effect of Smoking. (a) TD effect of
albumin, (b) the NL effect of albumin, which shows how the log hazard ratio, relative to the
mean value of 40 g l^-1, changes with increasing value of albumin. The empirical distributions of
the observed event times (panel a) and albumin values (panel b) are shown by ‘rug plots’, at the
bottom of the respective graphs.