Supporting Web Materials

Supplementary Web materials for “Impact of the model building strategy on the inference about time-dependent and non-linear covariate effects in survival analysis”.

This document presents supplementary materials that, because of space limitations, were not included in the manuscript.

A.1. Alternative tests for NL and TD effects and for association with hazard

Two alternative tests may be used for each of the non-linear (NL) and time-dependent (TD) effects. We can test for the NL effect of Xi conditional on (i) the proportional hazards (PH) assumption (“test of linearity assuming PH” in Figure 1 in the manuscript), or (ii) a TD effect (“test of linearity assuming TD” in Figure 1 in the manuscript). Similarly, we can test for the TD effect of Xi by (i) assuming its linear effect (“test of PH assuming linearity” in Figure 1 in the manuscript) or (ii) assuming a NL effect (“test of PH assuming NL” in Figure 1 in the manuscript).

For example, testing the TD effect of Xi, while adjusting for its NL effect, proceeds as follows. First, we fit the model with both TD and NL effects of Xi, ln(HR)=βi(t)*gi(xi). This model uses 7 degrees of freedom (dfs): 4 dfs for the TD effect and 3 dfs for the NL effect. Then, we fit the 3-df model ln(HR)=gi(xi), in which Xi is assumed to have a NL effect but to be consistent with the PH assumption. Finally, we compare the deviances of the two models. Under the null hypothesis of the PH effect for Xi, the resulting likelihood ratio test (LRT) statistic has a chi-square distribution with 4 dfs (7-3=4). Similar LRTs can be constructed to test the linearity assumption [1].

Finally, Figure 1 in the manuscript shows that the H0 of no association between a continuous predictor and the hazard can be tested using different approaches.
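The deviance comparison described above for the TD effect of Xi can be sketched as follows. This is only an illustration under stated assumptions: the two deviances (-2 times the log partial likelihood) are taken as already computed from fitted models, the numerical values are hypothetical, and the helper names `chi2_sf` and `lrt` are not from the manuscript.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-y) * sum_{i < df/2} y^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd df: erfc(sqrt(y)) + sum_{j=1}^{(df-1)/2} exp(-y) * y^(j-1/2) / Gamma(j+1/2)
    total = math.erfc(math.sqrt(y))
    for j in range(1, (df - 1) // 2 + 1):
        total += math.exp(-y + (j - 0.5) * math.log(y) - math.lgamma(j + 0.5))
    return total


def lrt(deviance_reduced, deviance_full, df_reduced, df_full):
    """Likelihood ratio test comparing two nested models through their deviances."""
    stat = deviance_reduced - deviance_full
    df = df_full - df_reduced
    return stat, df, chi2_sf(stat, df)


# Hypothetical deviances: 3-df model ln(HR)=gi(xi) versus 7-df model ln(HR)=βi(t)*gi(xi)
stat, df, p = lrt(deviance_reduced=2110.4, deviance_full=2098.1, df_reduced=3, df_full=7)
print(f"LRT statistic = {stat:.1f} on {df} df, p = {p:.4f}")
```

The same helper applies to the tests of linearity by swapping which effect is dropped from the full 7-df model, so that the reduced model keeps the 4-df TD effect and the LRT has 3 dfs.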
Specifically, the conventional ‘constrained’ 1-df LRT compares the ‘null’ model, which excludes the predictor, with the model that imposes both the PH and log-linearity (LL) constraints, while a more complex ‘flexible’ test (indicated by a curvilinear arrow at the right edge of Figure 1 in the manuscript) uses 7 dfs to compare the ‘null’ model against the flexible model that estimates both TD and NL effects of the predictor of interest.

A.2. Preliminary simulations: methods

We first simulated a simple hypothetical prognostic study of the effects of three continuous covariates (X1-X3) on cancer mortality. The study included 300 subjects with a censoring rate of 33% (i.e. about 200 observed events). X1 and X2 were normally distributed and negatively correlated (r=-0.7), while X3 was generated independently of X1 and X2 and was log-normally distributed. Only X1 was assumed to have TD and NL effects. Accordingly, the hazard conditional on the covariates was defined as:

λ(t|X1,X2,X3)=λ0(t)*exp[β1(t)*g1(X1)+β2*X2+β3*X3] (W1)

where β2=β3=ln(1.5). The functional forms considered for β1(t) and g1(X1) are shown in Web Figure A1.

We generated survival times according to model (W1), using the permutational algorithm [2-4], designed specifically to generate event times conditional on TD effects and/or TD covariates. We first generated 300 observed event times from a log-normal distribution, then simulated 300 prognostic factor vectors (values of covariates X1, X2 and X3). Next, we matched the 300 covariate vectors with the 300 survival times based on the probability derived from the partial likelihood in (W1).

A.3. Preliminary simulations: results

In the univariate Cox’s PH model that only included X1 and imposed the linearity and PH assumptions, the power to detect a significant effect of X1 was only 11.3% (“constrained test of association” in Figure 1 in the manuscript).
In contrast, after accounting for both NL and TD effects of X1 in the univariate model, the power increased to 83.3% (“flexible test of association” in Figure 1 in the manuscript). Similarly, in the multivariable model that adjusted for X2 and X3, the power for testing the effect of X1 increased from 31.7% for the constrained test to 99.3% for the flexible test.

We explored the impact of mismodeling the confounders on the type I error rate, estimated as the proportion of simulated samples in which a given test incorrectly rejects a true null hypothesis. To this end, we tested (using α=0.05) for spurious NL and TD effects of X2, whose true effect was consistent with both the linearity and PH assumptions. The type I error rates for testing both the NL (15.3% or 14.7%) and the TD (26.7% or 28.7%) effects of X2 were seriously inflated, by a factor of 3 to 5, when we, respectively, did not adjust for X1 or incorrectly adjusted only for its PH/LL effects. In contrast, for both effects of X2, we observed correct type I error rates of 4-5%, very close to the nominal significance level of 5%, when adjusting for both (true) NL and TD effects of X1. These results show the importance of correctly specifying the effects of a confounder when testing for the effects of the predictor of interest.

A.4. Main simulations: relevance of the TD and NL effects chosen

To enhance the practical relevance of our simulations, we attempted to select curves that could be considered clinically plausible for some factors often investigated in prognostic studies of cancer mortality. For example, we assumed that the dichotomous covariate X1 represented the indicator of a surgery performed right after the cancer diagnosis (time 0). Usually, mortality is high just after the surgery because of post-surgical complications, but later the subjects who survived this critical period have a lower risk than those who had not been operated on.
Accordingly, the TD effect of X1 illustrated in Figure 2a in the manuscript shows ‘crossing hazards’ [1], with a harmful effect of surgery in the first 6 months of follow-up (log HR>0) and a protective effect later on. X2 represented the cumulative smoking intensity (assuming all subjects were smokers), with a constant-over-time (PH) but non-linear effect (Figure 3a in the manuscript), similar to previous studies of smoking-cancer associations [2, 3]. X3 represented the age at cancer diagnosis. Similar to previous studies of various cancers, age was assumed to have a U-shaped effect (Figure 3b in the manuscript), with patients diagnosed at both very young and old ages having higher mortality [4, 5]. Furthermore, the TD effect of age was also non-monotone over time (Figure 2c in the manuscript), with a high impact on early, post-surgical mortality, followed by a much smaller but gradually increasing impact on later, cancer-related mortality [6]. To increase variation in the true shapes of the NL and TD effects, X4 was assumed to represent a hypothetical early biomarker of cancer progression, which was especially discriminative in both tails of its distribution (Figure 3c in the manuscript) and had a short-term lagged effect, often observed in clinical studies of biomarkers [7, 8], i.e. it predicted mortality 1 to 2 years after cancer diagnosis (Figure 2d in the manuscript). Finally, X5 represented the tumor diameter at diagnosis, with a linear association with the log hazard (Figure 3d in the manuscript) that becomes gradually weaker with increasing time since diagnosis, as reflected by the log HR gradually decaying toward 0 in Figure 2e in the manuscript. This is consistent with empirical findings that the effects of baseline clinical measures of several prognostic factors decrease over time [6].

A.5.
Main simulations: sensitivity analyses

The results of section 3.2 were based on the following equation:

λ(t|X1,X2,X3,X4,X5)=λ0(t)*exp[β1(t)*X1+g2(X2)+β3(t)*g3(X3)+β4(t)*g4(X4)+β5(t)*X5]

We considered similar simulation studies in which the number of true TD and NL effects was allowed to vary. In particular, we considered the two models described by the following equations:

λ(t|X1,X2,X3,X4,X5)=λ0(t)*exp[β1*X1+g2(X2)+β3*X3+β4(t)*g4(X4)+β5(t)*X5] (W2)

λ(t|X1,X2,X3,X4,X5)=λ0(t)*exp[β1*X1+β2*X2+β3*X3+β4*X4+β5(t)*X5] (W3)

The simulation study defined by (W2) assumes TD effects for X4 and X5 and NL effects for X2 and X4. Unlike the main simulations, it imposes linearity for X3 and PH for X1 and X3. In contrast, the simulation study defined by (W3) assumes only a TD effect for X5. Unlike the main simulations, it imposes linearity for X2, X3 and X4, and PH for X1, X3 and X4.

The results for the constrained multivariable model, the full flexible multivariable model and the backward elimination are given in Web Tables A2 and A3 for (W2) and (W3), respectively. The results for the univariate model are not presented because univariate models are known to be inaccurate in a multivariable setting.

A.6. Detailed results of the backward elimination procedure in the main simulations

As expected, the frequency with which the proposed backward procedure detected a given true TD or NL effect depended on the strength and the form of the effect (column 6 of Table 2 in the manuscript). For example, the very low power for the TD effect of X3 is mostly due to the fact that most changes in its effect are limited to only the first 5 months of follow-up (Figure 2c in the manuscript), and even during this period, the changes are smaller than those simulated for the TD effects of other covariates.
Similarly, the NL effect of X3 is also detected infrequently because (i) the differences in the log HR across the values of X3 are smaller than for other covariates (Figure 3b in the manuscript) and, in addition, (ii) these modest differences are multiplied by relatively small values of the corresponding TD effect. Indeed, the mean LRT values identify both TD and NL effects of X3 as much weaker than those of other covariates (last column of Table 2 in the manuscript).

Furthermore, results in Table 2 in the manuscript indicate that, in a multivariable setting, the ability of the backward elimination to identify the true effect of a given predictor may also depend on its correlations with other covariates and on the true effects of the correlated covariates. For example, the fact that the lowest power (21.0%) was obtained for the TD effect of X3 may be partly due to its strong positive correlation (Pearson r=+0.7) with X5, whose TD effect is stronger but, similar to that of X3, generally decreases during the follow-up (Figures 2c versus 2e in the manuscript). On the other hand, it should be noted that for all true TD/NL effects the power based on the model selected by the backward procedure (column 6 of Table 2 in the manuscript) is as high as that based on the true model (column 7) and higher than that based on the alternative strategies considered. In fact, Sauerbrei et al also reported that backward elimination performed better than other selection procedures in their study of flexible multivariable modeling of the effects of continuous covariates [5].

A.7. Application to lung cancer: description of the cohort and the study variables

The study subjects were all treated with chemotherapy between April 9, 2002 and September 18, 2008, with a median follow-up duration of 8.6 months. Vital status and, where applicable, date of death were obtained from clinical charts [6]. The follow-up was continued until March 15, 2009, when all patients still alive were censored.
Furthermore, because mortality observed more than 3 years after the first chemotherapy was (i) too low for reliable estimation, and (ii) unlikely to be due to NSCL cancer [6], all patients who survived for 3 years after time 0 were censored at that time. Baseline data, collected at the start date of the chemotherapy, included several quantitative biomarkers as well as some categorical variables [6].

Given the number of observed events (206 deaths) and the need to respect the critical ratio of at least 10 events per df of the estimated model [2], we had to restrict our multivariable analyses to a smaller subset of covariates of primary interest. In particular, we present the analyses that focus on the potential prognostic utility of the Neutrophil Count (mean: 7.09*10^9 l^-1, standard deviation (SD): 3.55*10^9 l^-1), a continuous variable whose predictive ability in NSCL has so far been relatively little investigated [7-8]. In the same analyses, we adjusted for, and re-assessed, the effects of two other continuous variables, albumin (mean: 40.2 g.l^-1, SD: 4.1 g.l^-1) and the logarithm of C-reactive protein (log(CRP), mean: 3.8 mg.l^-1, SD: 2.2 mg.l^-1), that are considered important prognostic factors in NSCL [9-10] and were both found to have important TD or NL effects in a previous analysis [6]. Given the major impact of smoking on lung cancer, all models also adjusted for the binary indicator of smoking status (ever/never, with 84.8% of all subjects classified as “ever smokers”).

A.8. Assessment of the robustness of the findings in lung cancer application

Two approaches were used to assess the robustness of the covariate effects estimated in the flexible model selected by the backward elimination procedure. Firstly, we implemented a bootstrap investigation of the stability of the estimates [11-12], based on 300 bootstrap resamples of the original NSCL dataset.
Then, we estimated the 90% pointwise confidence intervals (CIs) around the TD or NL estimates at selected values of, respectively, follow-up time (t) or the continuous covariate (x), using the 5th and the 95th percentiles of the distribution of the 300 corresponding bootstrap estimates of β(t) or g(x). Pointwise 90% CIs are useful for assessing the precision of the estimated effects; e.g. for the NL effect of a covariate X, they indicate the precision of the log HRs associated with specific values of X, relative to a reference value such as the sample mean of X. However, at each value x, the corresponding pointwise CI is estimated independently of the CIs for the other values of X, which implies that pointwise CIs do not allow assessing the robustness of the estimated shape of the NL curve g(X) [2]. Therefore, for selected NL effects, we also plotted and visually assessed the 300 estimates obtained from the individual bootstrap resamples.

Secondly, to further explore the robustness of the possible NL and/or TD effects of the three continuous predictors, we used the method proposed by Perme et al [13] to obtain smoothed pseudo-residual plots. Web Figure A2 shows smoothed pseudo-residual plots for the three continuous variables, at times corresponding to the 20th, 40th, 60th and 80th percentiles of the distribution of uncensored event times. For log(CRP), neutrophil count and albumin, the pseudo-residuals show a clear departure from linearity. In addition, the spread and the mean values of the pseudo-residuals vary across the four times, confirming the TD effects of the predictors.

A.9.
Some explanations for the selection of TD effect of albumin but neither the TD nor the NL effects of neutrophil count by the forward procedure

First, notice that the TD effect of albumin is highly significant (p<10^-3) in the univariate unadjusted analyses (Strategy 1 in Table 3 of the main document) and in the ‘constrained’ multivariable model (Strategy 2 in Table 3 of the main document) that adjusts only for the PH/LL effects of the three other covariates, but not in the full flexible multivariable model (p=0.123). Indeed, the TD effect of albumin becomes non-significant in all models that adjust for the TD/NL effects of both log(CRP) and neutrophil count, which can be explained partly by the moderately strong correlations of albumin with both predictors (respectively, r=-0.51 and r=0.30). Thus, in the simpler Strategies 1 and 2, which fail to adjust for the TD effects of other covariates correlated with albumin, residual confounding might have induced its spurious TD effect. This conjecture is supported by the inflated type I error rates found for Strategies 1 and 2 in our simulations (see e.g. tests of TD effect of X2 in Table 2 of the main document). A similar phenomenon could explain the significance of the TD effect of albumin in the model selected by the forward procedure, which does not include the TD and NL effects of neutrophil count (Table 3 of the main document). Indeed, the high statistical significance of the TD effect of albumin in the constrained multivariable model (Strategy 2) explains why this effect was selected at the initial steps of the forward procedure.

We will now attempt to explain why no TD or NL effects of neutrophil count could enter the model after the TD and NL effects of log(CRP) were added, in addition to the TD effect of albumin, at the consecutive steps of the forward selection. This occurred because of two phenomena.
(i) Firstly, because of the correlations between the three continuous covariates, none of the TD/NL effects is significant when adjusted for the TD and NL effects of all other covariates (Strategy 3 in Table 3 of the main document).

(ii) Secondly, the significance of the NL or TD effects of neutrophil count is enhanced when adjusted for, respectively, the TD or NL effect of the same variable. This is reflected in the results of Strategies 1 and 2 in Table 3 of the main document, where such adjustments (1st approach) helped identify the NL and/or TD effects of neutrophil count as significant, in contrast to the 2nd approach that imposed the PH or linearity constraints. A similar phenomenon was observed in our simulations, where e.g. the power for testing both the TD and NL effects of X4 increased substantially if, in the true flexible multivariable models, we adjusted for, respectively, the NL and TD effect of X4 (column 7 of Table 2 of the main document).

Because the forward selection starts with the constrained PH/LL multivariable model, the TD effect of a continuous variable X is assessed without adjusting for the NL effect of X, and vice versa, until at least one of these effects is selected. Thus, if each of the two effects of X becomes ‘significant’ only when adjusted for the other effect, neither the TD nor the NL effect will be selected by the forward procedure.

To further support the above explanation, we carried out additional tests of the effects of neutrophil count, when adjusting for all effects selected by the forward procedure, i.e. when added to the model identified by Strategy 5 in Table 3 of the main document. The separate tests of (i) the TD effect of neutrophil count, under the linearity constraint, and (ii) its NL effect, under the PH constraint, yielded non-significant results (p=0.14 and p=0.15, respectively).
In contrast, even when adjusted for the TD effect of albumin, both the TD and the NL effects of neutrophil count approached significance when they were mutually adjusted for each other (p=0.06 and p=0.07, respectively). Furthermore, the 6-df LRT that compared the deviance of the model selected by the forward procedure, with the PH/LL effect of neutrophil count, against the model that included both its TD and NL effects yielded p=0.04. This indicates that the simultaneous inclusion of both effects results in a significant improvement in the fit to the data. Because the backward elimination procedure starts with the full flexible model (Table 1 of the main document), the separate tests of the TD or NL effect of neutrophil count were carried out while adjusting for its NL or TD effect, respectively, which prevented the elimination of either effect.

A.10. Results about albumin in the application on non-small-cell lung cancer

The TD effect of albumin was selected as significant by the forward procedure (Strategy 5 in Table 3 of the main document) and was the last effect eliminated, with a marginally non-significant p=0.08, by the backward procedure. Therefore, Web Figure A3 shows the point estimates of the TD and NL effects of albumin, adjusted for all other effects selected by the backward procedure (Strategy 4 in Table 3 of the main document). The NL estimate shows that mortality decreases monotonically with increasing baseline albumin value. In fact, the estimate is very close to a straight line over a wide range of low to middle albumin values, explaining why the NL effect is completely non-significant (p=0.83). The TD effect indicates a sharp decrease in the impact of low albumin with increasing follow-up duration, implying that its effect is limited to about the first six months after the initial measurement. This change over time appears to be clinically important.
Thus, further research is necessary to assess whether its marginal non-significance (p=0.08) may simply reflect inadequate power of the test, based on the flexible multivariable model with 22 dfs (7 dfs for the NL and TD effects of each of the three continuous covariates and 1 df for the effect of smoking status) fitted to data with only 206 deaths.

A.11. Discussion on the models selected by the MFP-like algorithm and the backward selection procedure

Even if the MFP-like algorithm [14] and our backward procedure selected the same model in our lung cancer analyses (Table 3 of the main document), the differences between the properties of the two algorithms may result in somewhat different selections in some other applications. For example, the 1st step of the MFP-like algorithm starts with assessing the NL effects under the PH constraints, and then step 2 re-assesses whether additional NL effects of some covariates may be identified if a shorter follow-up is assumed, by censoring later events [14]. This approach may be less effective in identifying the NL effects of those covariates whose effects are lagged and occur only in the later phase of follow-up, such as effects of some biopsy variables on the risk of renal failure in lupus [15]. Furthermore, sensitivity analyses reported by Sauerbrei et al indicated that the results of step 3 may depend on what cut-off time is used to define the “shorter follow-up” [14].

The MFP-like strategy involves (a) an initial assessment of NL effects under the PH constraints, followed by (b) an assessment of TD effects conditional on the previously estimated NL effects. This feature of the 3-step MFP-like algorithm may be seen as both a limitation and an advantage, relative to our approach, which involves simultaneous estimation of TD and NL effects [1].
On one hand, by not accounting for possible TD effects while assessing the NL effects, the 3-step algorithm may be somewhat more susceptible to the potential residual confounding discussed in the previous paragraph. Indeed, Buchholz and Sauerbrei recognize that the assessment of TD effects may depend on the way the NL effect of the same covariate is modeled [16]. On the other hand, by avoiding the need for simultaneous modeling and testing of TD and NL effects, the MFP-based model proposed in [14] reduces the risk of non-convergence problems that require complex numerical procedures to estimate our spline-based model [1].

REFERENCES

1. Abrahamowicz M, MacKenzie TA. Joint estimation of time-dependent and non-linear effects of continuous covariates on survival. Statistics in Medicine 2007; 26: 392-408.
2. Abrahamowicz M, MacKenzie T, Esdaile JM. Time-Dependent Hazard Ratio: Modeling and Hypothesis Testing With Application in Lupus Nephritis. Journal of the American Statistical Association 1996; 91: 1432-1439.
3. Mackenzie T, Abrahamowicz M. Marginal and hazard ratio specific random data generation: Applications to semi-parametric bootstrapping. Statistics and Computing 2002; 12: 245-252.
4. Sylvestre M-P, Abrahamowicz M. Comparison of algorithms to generate event times conditional on time-dependent covariates. Statistics in Medicine 2008; 27: 2618-2634.
5. Sauerbrei W, Royston P, Binder H. Selection of important variables and determination of functional form for continuous predictors in multivariable model building. Statistics in Medicine 2007; 26: 5512-5528.
6. Gagnon B, Abrahamowicz M, Xiao Y, Beauchamp ME, MacDonald N, Kasymjanova G, Kreisman H, Small D. Flexible modeling improves assessment of prognostic value of C-reactive protein in advanced non-small cell lung cancer. Br J Cancer 2010; 102: 1113-1122.
7. Luo J, Chen Y-J, Narsavage G, Ducatman A. Predictors of survival in patients with non-small cell lung cancer. Oncology Nursing Forum 2012; 39: 609-616.
8. Teramukai S, Kitano T, Kishida Y, Kawahara M, Kubota K, Komuta K, Minato K, Mio T, Fujita Y, Yonei T, Nakano K, Tsuboi M, Shibata K, Furuse K, Fukushima M. Pretreatment neutrophil count as an independent prognostic factor in advanced non-small-cell lung cancer: an analysis of Japan Multinational Trial Organisation LC00-03. European Journal of Cancer 2009; 45: 1950-1958.
9. Koch A, Fohlin H, Sorenson S. Prognostic significance of C-reactive protein and smoking in patients with advanced non-small cell lung cancer treated with first-line palliative chemotherapy. Journal of Thoracic Oncology 2009; 4: 326-332.
10. Maeda T, Ueoka H, Tabata M, Kiura K, Shibayama T, Gemba K, Takigawa N, Hiraki A, Katayama H, Harada M. Prognostic factors in advanced non-small cell lung cancer: elevated serum levels of neuron specific enolase indicate poor prognosis. Japanese Journal of Clinical Oncology 2000; 30: 534-541.
11. Royston P, Sauerbrei W. Stability of multivariable fractional polynomial models with selection of variables and transformations: a bootstrap investigation. Statistics in Medicine 2003; 22: 639-659.
12. Sauerbrei W. The Use of Resampling Methods to Simplify Regression Models in Medical Statistics. Journal of the Royal Statistical Society, Series C (Applied Statistics) 1999; 48: 313-329.
13. Perme M, Andersen P. Checking hazard regression models using pseudo-observations. Statistics in Medicine 2008; 27: 5309-5328.
14. Sauerbrei W, Royston P, Look M. A New Proposal for Multivariable Modelling of Time-Varying Effects in Survival Data Based on Fractional Polynomial Time-Transformation. Biometrical Journal 2007; 49: 453.
15. Esdaile JM, Abrahamowicz M, Mackenzie T, Kashgarian M, Hayslett JP. The time-dependence of long-term prediction in lupus nephritis. Arthritis & Rheumatism 1994; 37: 359-368.
16. Buchholz A, Sauerbrei W. Comparison of procedures to assess non-linear and time-varying effects in multivariable models for survival data. Biometrical Journal 2011; 53: 308-331.

Web Table A1. Distribution of the continuous covariates in the main simulations, according to the value of the binary variable X1.

Continuous prognostic factor    Value of binary covariate X1
                                X1=0          X1=1
X2                              N(2,1)        N(2,1)
X3                              N(0.5,1)      N(0,1)
X4                              N(-0.75,1)    N(0,1)
X5                              N(1,1)        N(0,1)

Web Table A2. Proportions (%) of simulated samples in which a given test rejected the corresponding null hypothesis, at α=0.05, for the tests of the different effects of each covariate, for all strategies, in the case where there were four “true” TD or NL effects.

Columns: Prognostic factor; Test†; Constrained multivariable model; Full flexible multivariable model; Backward selection§; True model‡

X1
  Tests: Constrained test of association; Test PH; Flexible test of association¶
  Values: 45.7* 6.3* 5.0*
X2
  Tests: Constrained test of association; Test PH assuming linearity*; Test of PH assuming NL*; Test of linearity assuming PH; Test of linearity assuming TD; Flexible test of association
  Values: 12.7* 21.3* 76.0 78.0 17.7* 7.3* 75.7 71.3
X3
  Tests: Constrained test of association; Test PH assuming linearity*; Test of PH assuming NL*; Test of linearity assuming PH*; Test of linearity assuming TD*; Flexible test of association
  Values: 36.7* 45.3* 14.0* 19.3* 10.0* 10.3* 9.3* 7.0*
X4
  Tests: Constrained test of association; Test PH assuming linearity; Test of PH assuming NL; Test of linearity assuming PH; Test of linearity assuming TD; Flexible test of association
  Values: 51.3 60.7 59.3 66.0 27.3 49.0 39.0 58.3
X5
  Tests: Constrained test of association; Test PH assuming linearity; Test of PH assuming NL; Test of linearity assuming PH*; Test of linearity assuming TD*; Flexible test of association
  Values: 91.3 79.0 12.3* 7.3* 63.3 57.0 11.0* 6.0*
True model‡
  Values: 9.0* 84.3 84.7 197 6.0* 6.3* 51.0 58.3 35.7 52.7 44.7 63.3 93.7 94.3 4.3*

Abbreviations: PH, proportional hazards; NL, non-linear; TD, time-dependent
† As defined in Figure 1.
§ For backward elimination, we show the proportion of samples in which a given effect was selected into the final model, i.e.
was statistically significant at α=0.05 in the final model. Depending on whether the TD effect of the same variable was also included or not, the test of linearity is adjusted or not for the TD effect, and vice versa.
‡ Model in which we only included the true effects for each prognostic factor.
¶ For a binary variable, no NL effects are available. Therefore, the flexible test of association compares the model with the TD effect of the variable against the model which excludes the variable.
* indicates that the corresponding H0 is true, so that the reported proportion represents the type I error rate.

Web Table A3. Proportions (%) of simulated samples in which a given test rejected the corresponding null hypothesis, at α=0.05, for the tests of the different effects of each covariate, for all strategies, in the case where there was only one “true” TD effect.

Columns: Prognostic factor; Test†; Constrained multivariable model; Full flexible multivariable model; Backward selection§; True model‡

X1
  Tests: Constrained test of association; Test PH; Flexible test of association¶
  Values: 52.7* 8.7* 8.0*
X2
  Tests: Constrained test of association; Test PH assuming linearity*; Test of PH assuming NL*; Test of linearity assuming PH; Test of linearity assuming TD; Flexible test of association
  Values: 20.7* 11.3* 5.0* 2.3* 6.7* 5.0* 4.7* 2.0*
X3
  Tests: Constrained test of association; Test PH assuming linearity; Test of PH assuming NL; Test of linearity assuming PH; Test of linearity assuming TD; Flexible test of association
  Values: 58.0* 44.0* 19.3* 2.7* 7.0* 5.7* 6.7* 6.0*
X4
  Tests: Constrained test of association; Test PH assuming linearity; Test of PH assuming NL; Test of linearity assuming PH; Test of linearity assuming TD; Flexible test of association
  Values: 52.0* 37.3* 3.7* 2.3* 6.0* 5.7* 4.0* 3.0*
X5
  Tests: Constrained test of association; Test PH assuming linearity; Test of PH assuming NL; Test of linearity assuming PH*; Test of linearity assuming TD*; Flexible test of association
  Values: 94.3 87.7 32.3* 2.7* 68.7 44.3 12.7* 4.7*
True model‡
  Values: 5.7* 3.0* 9.0* 7.3* 7.0* 3.7* 96.7 84.0 3.7*

Abbreviations: PH, proportional hazards; NL, non-linear; TD, time-dependent
† As defined in Figure 1.
§ For backward elimination, we show the proportion of samples in which a given effect was selected into the final model, i.e. was statistically significant at α=0.05 in the final model. Depending on whether the TD effect of the same variable was also included or not, the test of linearity is adjusted or not for the TD effect, and vice versa.
‡ Model in which we only included the true effects for each prognostic factor.
¶ For a binary variable, no NL effects are available. Therefore, the flexible test of association compares the model with the TD effect of the variable against the model which excludes the variable.
* indicates that the corresponding H0 is true, so that the reported proportion represents the type I error rate.

[Web Figure A1: two panels. β1(t): TD effect, log(HR) versus time; g1(X1): NL effect, log(HR) versus X1.]

Web Figure A1. Time-Dependent (TD) effect and Non-Linear (NL) effect of X1 in preliminary simulations.

[Web Figure A2: grid of panels. Rows: log(CRP), Albumin, Neutrophil count; columns: t=108 days, t=186 days, t=260 days, t=411 days.]

Web Figure A2. The smoothed average pseudo-residuals with respect to the three continuous covariates at four different points in time (t=108 days, t=186 days, t=260 days and t=411 days, corresponding to the 20th, 40th, 60th and 80th percentiles of event times). Each row represents a continuous variable.

[Web Figure A3: two panels. (a) log(HR) versus time (months); (b) log(HR) versus Albumin.]

Web Figure A3. Estimates of the TD and NL effects of albumin, adjusted for the NL and TD effects of log(CRP) and neutrophil count as well as for the PH effect of smoking. (a) TD effect of albumin; (b) NL effect of albumin, which shows how the log hazard ratio, relative to the mean value of 40 g l^-1, changes with increasing value of albumin. The empirical distributions of the observed event times (panel a) and albumin values (panel b) are shown by ‘rug plots’ at the bottom of the respective graphs.