ONLINE APPENDIX (Hudson et al., “A Structural Approach to the Familial Coaggregation of Disorders”)

Appendix 1. Principles of Graphical Analysis

Technically, DAGs are a graphic representation of a recursive nonparametric structural equation model.32 They originated in the area of machine learning, where they are used to encode Bayesian belief networks. DAGs have since been adopted for encoding causal relations, as well as the causal determinants of statistical associations, because they provide a rigorous, yet intuitive, framework for examining causality.32-34 Causal DAGs have proven useful in many epidemiologic applications.33,35-39 DAGs are closely related to path diagrams,40,41 but differ in that path diagrams additionally imply a parametric linear statistical relationship between variables. A DAG implies statistical independencies that can be read from the DAG using graphical analysis. Therefore, any statistical model that fits the observed data and is consistent with the independencies implied by the DAG can be used to estimate a causal effect of interest. We present here a brief summary of the principles of graphical analysis used in this paper; overviews of graphical analysis in the epidemiologic literature are provided elsewhere.33,35,37,39

A directed acyclic graph consists of a set of variables (X1, …, Xn) and directed edges among those variables. Directed edges are line segments with an arrow in a single direction. The edges must be arranged so that the graph has no cycles (i.e., one cannot begin at a variable and follow the directed edges back to the same variable). The diagram is said to be a causal directed acyclic graph if the directed edges represent direct causal effects. A variable V is a parent of Xi on the causal directed acyclic graph if there is an arrow from V to Xi. Formally, each variable on the graph is defined by a nonparametric structural equation Xi = fi(pai, εi). Here pai denotes the parents of Xi on the graph and εi is the error term for Xi. The errors ε1, …, εn are mutually independent.
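The structural-equation definition can be made concrete with a small simulation. The Python sketch below is purely illustrative. The three-node graph (U → A, U → Y, A → Y), the functional forms, and the names PARENTS, EQUATIONS and sample_unit are hypothetical and are not part of the coaggregation analysis. The sketch evaluates each Xi = fi(pai, εi) only after the node's parents have been computed, which means walking the DAG in topological order. It draws a fresh, independent error εi for every node. The threshold and interaction terms show that the fi need not be linear.

```python
import random
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical causal DAG, written as node -> list of parents (pa_i).
PARENTS = {"U": [], "A": ["U"], "Y": ["A", "U"]}

# Hypothetical structural functions f_i(pa_i, eps_i); any functional form is allowed.
EQUATIONS = {
    "U": lambda pa, eps: eps,
    "A": lambda pa, eps: 1 if pa["U"] + eps > 0 else 0,  # binary, threshold in U
    "Y": lambda pa, eps: pa["A"] * abs(pa["U"]) + eps,   # interaction of A and U
}


def sample_unit(rng: random.Random) -> dict:
    """Draw one unit by evaluating each structural equation after its parents."""
    values = {}
    for node in TopologicalSorter(PARENTS).static_order():
        parent_values = {p: values[p] for p in PARENTS[node]}
        eps = rng.gauss(0.0, 1.0)  # a fresh, independent error for every node
        values[node] = EQUATIONS[node](parent_values, eps)
    return values


if __name__ == "__main__":
    rng = random.Random(2024)
    for _ in range(3):
        print(sample_unit(rng))
```

Replacing any entry of EQUATIONS with another function changes the joint distribution but not the graph. The conditional independencies read off the DAG hold for every such choice, as long as the errors remain mutually independent.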
Unlike linear structural equations, the nonparametric equations Xi = fi(pai, εi) are entirely general; Xi may depend on any function of its parents and εi. The requirement that the εi be mutually independent essentially means that every common cause of any two variables on the graph must also be on the graph.

A path is any sequence of variables connected by edges, regardless of arrowhead direction. A directed path is a path that follows the edges in the direction indicated by the graph's arrows. A collider is a variable on a path such that both the preceding and the subsequent variables on the path have directed edges going into it (i.e., both the edge to and the edge from that variable have arrowheads pointing into the variable). A path between A and B is said to be blocked given some set of variables Z if either of two conditions holds. The first is that a variable in Z lies on the path and is not a collider. The second is that the path contains a collider such that neither the collider itself nor any of its descendants is in Z. When all paths between A and B are blocked given Z, A and B are conditionally independent given Z.32,34,42,43

The causal DAG framework has proven particularly useful in determining whether conditioning on a particular set of variables, or on none at all, is sufficient to control for confounding. Pearl32 showed how to do this when estimating the causal effect of treatment A on outcome Y. Conditioning on a set of variables Z suffices to control for confounding if Z blocks all “back-door paths” from A to Y and no variable in Z is a descendant of A. Back-door paths are all paths that begin with a directed edge into A. The DAGs for coaggregation presented below involve a particular situation, so we give some additional results pertinent to it. In this situation we wish to control for confounding by conditioning on a variable that is on a back-door path from A to Y and is also a collider on a separate path from A to Y.
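The blocking rule and the back-door criterion can be checked mechanically. The Python sketch below is an illustration, not part of the original analysis. It enumerates the paths between two nodes of a small hypothetical DAG and applies the blocking definition above to each back-door path. It then reports the paths that a conditioning set Z leaves open. The graph (edges U1 → A, U1 → C, U2 → C, U2 → Y, C → A, C → Y, A → Y) is an assumed example of the situation just described, not the coaggregation DAG. In it, C lies on the back-door paths A ← C → Y, A ← U1 → C → Y and A ← C ← U2 → Y, and it is also a collider on A ← U1 → C ← U2 → Y.

```python
# Hypothetical DAG: C confounds A -> Y and is also a collider between U1 and U2.
EDGES = {("U1", "A"), ("U1", "C"), ("U2", "C"), ("U2", "Y"),
         ("C", "A"), ("C", "Y"), ("A", "Y")}


def neighbors(node, edges):
    """Nodes adjacent to `node`, ignoring arrowhead direction."""
    return {b for a, b in edges if a == node} | {a for a, b in edges if b == node}


def descendants(node, edges):
    """All nodes reachable from `node` along directed edges."""
    found, frontier = set(), [node]
    while frontier:
        current = frontier.pop()
        for a, b in edges:
            if a == current and b not in found:
                found.add(b)
                frontier.append(b)
    return found


def all_paths(start, end, edges):
    """All paths without repeated nodes from start to end, ignoring edge direction."""
    paths, stack = [], [(start,)]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in neighbors(path[-1], edges):
            if nxt not in path:
                stack.append(path + (nxt,))
    return paths


def is_blocked(path, z, edges):
    """Blocked if a non-collider is in Z, or a collider has neither itself nor a descendant in Z."""
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edges and (nxt, node) in edges
        if collider and node not in z and not (descendants(node, edges) & z):
            return True
        if not collider and node in z:
            return True
    return False


def open_backdoor_paths(treatment, outcome, z, edges):
    """Back-door paths (first edge points into the treatment) left unblocked by Z."""
    backdoor = [p for p in all_paths(treatment, outcome, edges) if (p[1], p[0]) in edges]
    return sorted(p for p in backdoor if not is_blocked(p, set(z), edges))


if __name__ == "__main__":
    for z in [set(), {"C"}, {"C", "U1"}]:
        print(sorted(z), open_backdoor_paths("A", "Y", z, EDGES))
```

Conditioning on C alone blocks three back-door paths but opens A ← U1 → C ← U2 → Y, which is the collider problem discussed in the text. Adding U1 or U2 to Z closes that path. The sketch checks only the blocking condition. The other part of Pearl's criterion, that no member of Z is a descendant of A, would need a separate check, for example with descendants("A", EDGES).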
Although conditioning on such a variable removes bias due to confounding, it also introduces “collider-stratification bias”.19 This is a form of selection bias44 because conditioning on a collider induces a statistical association between the parents of the collider. In the instances considered below, we are interested in determining the direction of the collider-stratification bias. Doing so is not straightforward, however. It is difficult to make general statements about the direction (positive or negative) of the association between two variables conditional on a third variable that is a collider on a path between them. In general, the conditional covariance may be either positive or negative. The magnitude of collider-stratification bias has been quantified for binary variables by Greenland19 and for linear SEMs by Spirtes.45 VanderWeele and Robins46 introduced conditions under which one can draw conclusions about the covariance between two binary variables conditional on a collider. These conditions, however, are often too strong for family studies of coaggregation. Below we introduce conditions that are plausible in many coaggregation applications and still strong enough to permit conclusions about the sign of the covariance between two binary variables conditional on a collider.

Appendix 2. Direct effects of A on B, and of B on A, within individuals

We evaluate here the case where there are direct effects of A on B, and of B on A, within individuals. This case is depicted in the DAG in Appendix Figure 1a and, under the null hypothesis of no coaggregation, in Appendix Figure 1b. Because DAGs do not permit two variables to cause each other simultaneously, we represent A and B at two time points. We allow B at time 1 to cause A at time 2, and A at time 1 to cause B at time 2. This representation can be extended to represent relationships over multiple time points.
For models involving two or more time points of observation, we use a third subscript t to denote the time point, t = 1, …, nt, where nt is the number of time points. For example, the disorder A outcome at the second time point for the proband is YA12. For simplicity and without loss of generality, we treat a path from a latent variable through a disorder outcome at time 1 to the outcome at time 2 as the same path as the one from the latent variable to the outcome at time 2. For example, we consider FA-C – YA11 – YA12 to be the same as FA-C – YA12.

Returning to our analysis, under the null hypothesis there are two unblocked paths from YA12 to YB22: YA12 – FA-C – YA21 – YB22, and YA12 – YB11 – FB-C – YB22. Under the alternative hypothesis, three paths from YA12 to YB22 go through FC, other than the path for the coaggregation effect: YA12 – FC – YA21 – YB22, YA12 – YB11 – FC – YB22, and YA12 – YB11 – FC – YA21 – YB22. These unblocked paths result from the direct effect of disorder A on disorder B and the direct effect of disorder B on disorder A, so YA12 and YB22 are not independent. As in case 2, we assume that the unblocked paths represent positive associations. The statistical association in model 1 is therefore greater than the association explained by coaggregation.

Under the null hypothesis we can condition on YA21 and YB11 (DAG in Appendix Figure 1c). Doing so blocks the previously unblocked paths. It also conditions on two colliders and thereby unblocks two previously blocked paths, YA12 – EC1 – YB11 – FB-C – YB22 and YA12 – FA-C – YA21 – EC2 – YB22. Under the alternative hypothesis, conditioning unblocks three more previously blocked paths: YA12 – FC – YA21 – EC2 – YB22, YA12 – EC1 – YB11 – FC – YB22, and YA12 – EC1 – YB11 – FC – YA21 – EC2 – YB22. A logistic regression model incorporating this conditioning is:

$$\operatorname{logit} P(Y_{Bj2} = 1 \mid Y_{A12}, Y_{B11}, Y_{Aj1}) = \beta_0 + \beta_1 Y_{A12} + \beta_2 Y_{B11} + \beta_3 Y_{Aj1} \qquad \text{(A1)}$$

We make two assumptions. First, a single measure of disorder status, such as current disorder or presence of disorder at any time up to the present (lifetime diagnosis), is an acceptable proxy for status at all time points. Second, the latent variables are normally distributed and have linear additive effects. Under these assumptions, the results of the simulation experiment presented in Appendix 3 can be used to assess the direction of bias. Conditioning has removed the influence of the direct paths from disorder A to disorder B within individuals, and vice versa. For all parameter settings, the odds ratio corresponding to β1 from model A1 is less than the causal coaggregation odds ratio (see Appendix 3). This finding indicates that β1 from model A1 is biased downward.

In some situations it is plausible to use a single measure of disorder status as an acceptable proxy for status at previous time points. The odds ratio for the association between YA12 and YB22 is then the same as the odds ratio for the association between YB12 and YA22. We can therefore use a single bivariate logistic regression model with a common coaggregation parameter to assess coaggregation.1,26 The model combines the following two logistic regression equations:

$$
\begin{aligned}
\operatorname{logit} P(Y_{Aj} = 1 \mid Y_{A1}, Y_{B1}, Y_{Bj}) &= \beta_{01} + \beta_1 Y_{B1} + \beta_2 Y_{A1} + \beta_3 Y_{Bj} \\
\operatorname{logit} P(Y_{Bj} = 1 \mid Y_{A1}, Y_{B1}, Y_{Aj}) &= \beta_{02} + \beta_1 Y_{A1} + \beta_5 Y_{B1} + \beta_3 Y_{Aj}
\end{aligned}
\qquad \text{(A2)}
$$

Appendix 3. Results from a simulation experiment investigating the bias in the estimated coaggregation odds ratios from models 1 and 2 in the presence of direct effects from disorder B to disorder A
We conducted simulations in which data were generated from a graph (Appendix Figure 2) that extends the DAG in Figure 1b in two ways. First, it includes the unique environmental factors specific to disorder A (EA-C 1 and EA-C 2) and specific to disorder B (EB-C 1 and EB-C 2). Second, it replaces FA-C, FB-C, and FC with correlated family-member-specific factors: FA-C 1 and FA-C 2, FB-C 1 and FB-C 2, and FC 1 and FC 2, respectively. Strictly speaking, the resulting graph is not a causal DAG because it contains undirected edges. It could, however, be rewritten as a causal DAG by replacing each undirected edge with a common parent that has directed edges pointing into the two correlated family-member-specific factors. We also imposed several additional assumptions to generate the data:

1) Familiality (F) represents genetic effects only.
2) Relatives 1 and 2 are first-degree relatives (i.e., they share 50% of their genes on average).
3) The joint distribution of the latent variables for a relative pair is multivariate normal. More specifically, for each relative pair i = 1, …, n, where n is the number of relative pairs, the vector
$$(F_{A-C,1}, F_{B-C,1}, F_{C,1}, F_{A-C,2}, F_{B-C,2}, F_{C,2}, E_{A-C,1}, E_{B-C,1}, E_{C,1}, E_{A-C,2}, E_{B-C,2}, E_{C,2})$$
has a multivariate normal distribution. Without loss of generality, the means and variances of the latent variables are set to 0 and 1, respectively.
4) The correlations between the latent variables are zero, with the exception of cor(FA-C,1, FA-C,2) = cor(FB-C,1, FB-C,2) = cor(FC,1, FC,2) = 0.5.
5) The observed variables are produced by coarsening a linear function of the relevant latent variables and, possibly, other observed variables. The linear function has no interactions between the latent variables, and all coefficients are nonnegative. More specifically:
a. For j = 1, 2, YAj = 1 if and only if Y*Aj > tA. Here tA is the cutoff corresponding to the (1 − prevA) quantile of the Y*Aj, and
$$Y^*_{Aj} = \beta_{(F_{A-C} \to Y_A)} F_{A-C,j} + \beta_{(F_C \to Y_A)} F_{C,j} + \beta_{(E_{A-C} \to Y_A)} E_{A-C,j} + \beta_{(E_C \to Y_A)} E_{C,j} + \beta_{(Y_B \to Y_A)} Y_{Bj},$$
with all β ≥ 0.
b. For j = 1, 2, YBj = 1 if and only if Y*Bj > tB. Here tB is the cutoff corresponding to the (1 − prevB) quantile of the Y*Bj, and
$$Y^*_{Bj} = \beta_{(F_{B-C} \to Y_B)} F_{B-C,j} + \beta_{(F_C \to Y_B)} F_{C,j} + \beta_{(E_{B-C} \to Y_B)} E_{B-C,j} + \beta_{(E_C \to Y_B)} E_{C,j},$$
with all β ≥ 0.

Assumption 4 corresponds to assumptions commonly made in quantitative genetics models for complex disorders. These are that mating is random with respect to the phenotype in question, that genetic effects are additive with no epistasis, and that genes and environment are uncorrelated. Assumption 5 corresponds to the liability-threshold model12 used to relate binary phenotypes to underlying liabilities influenced by genes and environment. With regard to Assumption 5, β(FC → YA) = β(FC → YB) = 0 corresponds to the null hypothesis of interest, that there are no familial effects common to both disorders (i.e., the true coaggregation odds ratio is one). Likewise, β(EC → YA) = β(EC → YB) = 0 corresponds to having no unique environmental effects common to both disorders. Finally, β(YB → YA) = 0 corresponds to having no direct effects from B to A within an individual. Direct effects between individuals, and from A to B within individuals, are always assumed to be absent.

Using R version 2.4.1,47 we simulated 1000-2000 datasets. Each dataset contained four variables (YA1, YB1, YA2, and YB2) observed for n = 100,000 pairs. For each pair, we first used assumptions 3 and 4 to generate [FA-C 1, FB-C 1, FC 1, FA-C 2, FB-C 2, FC 2, EA-C 1, EB-C 1, EC 1, EA-C 2, EB-C 2, EC 2]. We then used assumption 5 to calculate [YA1, YB1, YA2, YB2] for each pair.
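Assumptions 3-5 rest on two building blocks. The first is correlated family-member-specific factors. The second is a liability that is coarsened at a threshold. The Python sketch below illustrates both; the authors worked in R, and this is not their code. It builds each pair of correlated factors from a common parent, which is the rewriting of the undirected edges described above, so that cor(F1, F2) = 0.5 as in assumption 4. It then applies the liability-threshold step of assumption 5 by placing the cutoff at the (1 − prev) quantile. The loadings 0.6 and 0.8, the seed, the sample size, and the names correlated_pair, threshold and correlation are hypothetical.

```python
import math
import random


def correlated_pair(rng, rho=0.5):
    """Two standard-normal factors with correlation rho, generated from a common parent."""
    shared = rng.gauss(0.0, 1.0)  # common parent that replaces the undirected edge
    w_shared, w_own = math.sqrt(rho), math.sqrt(1.0 - rho)
    return (w_shared * shared + w_own * rng.gauss(0.0, 1.0),
            w_shared * shared + w_own * rng.gauss(0.0, 1.0))


def threshold(liabilities, prevalence):
    """Cutoff t such that round(prevalence * n) liabilities lie above it."""
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    ordered = sorted(liabilities)
    n_below = len(ordered) - round(prevalence * len(ordered))
    return ordered[n_below - 1] if n_below > 0 else ordered[0] - 1.0


def correlation(xs, ys):
    """Pearson correlation, written out to keep the sketch dependency-free."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(11)
    pairs = [correlated_pair(rng) for _ in range(20_000)]
    f1 = [p[0] for p in pairs]
    print("cor(F1, F2):", round(correlation(f1, [p[1] for p in pairs]), 3))
    # Hypothetical liability for relative 1: 0.6 * familial factor + 0.8 * unique factor.
    liability = [0.6 * f + 0.8 * rng.gauss(0.0, 1.0) for f in f1]
    t = threshold(liability, 0.10)
    status = [1 if y > t else 0 for y in liability]
    print("prevalence:", sum(status) / len(status))  # 0.1 by construction
```

Because the cutoff is taken from the simulated liabilities rather than from a theoretical normal quantile, the simulated prevalence matches the target exactly, apart from rounding. This seems consistent with defining tA and tB as percentiles of the Y*.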
We used the following settings for parameter values: 1) prevalence of disorder A of 0.02 and 0.10; 2) prevalence of disorder B of 0.02 and 0.10; 3) values of [β(FA-C → YA) + β(FC → YA)] and of [β(EA-C → YA) + β(EC → YA)] that correspond to heritabilities of 0.40 and 0.70 for disorder A; 4) values of [β(FB-C → YB) + β(FC → YB)] and of [β(EB-C → YB) + β(EC → YB)] that correspond to heritabilities of 0.40 and 0.70 for disorder B; 5) values of β(FC → YA) and β(FC → YB) that correspond to proportions of 0 and 0.8 of familial (i.e., genetic) factors being common to disorders A and B; 6) a value of β(EC → YA) and β(EC → YB) that corresponds to a proportion of 0.3 of unique environmental factors being common to disorders A and B; and 7) values of β(YB → YA) that correspond to no direct effect of YB on YA and to a direct effect equal to the effect of familial factors on YA.

For each dataset, we then used logistic regression to estimate the coaggregation odds ratio under model 1 (marginal) and under model 2 (conditional on YB1). The conditional odds ratio corresponding to β1 in model 2 cannot be interpreted as a stratum-specific odds ratio. This is because the two stratum-specific odds ratios (for YB1 = 0 and YB1 = 1) cannot be assumed to be equal. Instead, the conditional odds ratio is a pooled version of the stratum-specific odds ratios. Conveniently, it has a uniformly negative bias under the assumptions of positive, additive linear effects of the latent variables. The results of the simulation study are presented in Appendix Tables 1-4. Appendix Table 3 contains the parameter values that correspond most closely to the example of BED and bipolar disorder: prevalence of disorder A equal to 0.10, prevalence of disorder B equal to 0.02, and heritability of both disorders equal to 0.40. Under the null hypothesis, the causal coaggregation odds ratio is one, and thus any departure from one indicates bias.
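The Python sketch below makes the contrast between model 1 and model 2 concrete. It is not the authors' R implementation, and everything in it is an assumption made for illustration. It simulates relative pairs under the null hypothesis (FC has no effect) with a direct effect of B on A within individuals. It then estimates two quantities: the marginal odds ratio OR(YA1,YB2) for model 1, and exp(β1) from a logistic regression of YB2 on YA1 and YB1 for model 2. That form of model 2 is assumed here by analogy with model (A1). The coefficient values are hypothetical. They put a large share of each liability on the shared unique-environment factor EC so that the collider-stratification effect is easy to see, and they do not reproduce any row of Appendix Tables 1-4. The logistic regression is fitted by Newton-Raphson on the four (YA1, YB1) covariate patterns.

```python
import math
import random

# Hypothetical loadings; squared loadings sum to 1 for each liability before the direct effect.
# F_C is omitted because this sketch simulates the null hypothesis of no coaggregation.
B_A = {"F": math.sqrt(0.3), "E": math.sqrt(0.2), "EC": math.sqrt(0.5)}
B_B = {"F": math.sqrt(0.6), "E": math.sqrt(0.05), "EC": math.sqrt(0.35)}
DIRECT_B_TO_A = 1.0  # beta(YB -> YA); set to 0.0 for "no direct effects"
PREV_A = PREV_B = 0.10


def cutoff(values, prevalence):
    """Value with round(prevalence * n) of the values lying above it."""
    ordered = sorted(values)
    return ordered[len(ordered) - round(prevalence * len(ordered)) - 1]


def simulate(n_pairs, seed=2024):
    """Return (ya, yb): for each relative pair, 0/1 tuples (relative 1, relative 2)."""
    rng = random.Random(seed)
    draw = lambda: rng.gauss(0.0, 1.0)
    root = math.sqrt(0.5)  # common-parent weights giving cor(F_1, F_2) = 0.5
    lat_a, lat_b = [], []
    for _ in range(n_pairs):
        parent_a, parent_b = draw(), draw()  # common parents of F_{A-C,j} and F_{B-C,j}
        ec = (draw(), draw())                # E_C for relatives 1 and 2
        lat_a.append(tuple(B_A["F"] * root * (parent_a + draw()) + B_A["E"] * draw()
                           + B_A["EC"] * ec[j] for j in range(2)))
        lat_b.append(tuple(B_B["F"] * root * (parent_b + draw()) + B_B["E"] * draw()
                           + B_B["EC"] * ec[j] for j in range(2)))
    t_b = cutoff([v for pair in lat_b for v in pair], PREV_B)
    yb = [tuple(int(v > t_b) for v in pair) for pair in lat_b]
    # Direct effect of B on A within each individual: add beta(YB -> YA) * YB_j.
    star_a = [tuple(v + DIRECT_B_TO_A * b for v, b in zip(la, bj))
              for la, bj in zip(lat_a, yb)]
    t_a = cutoff([v for pair in star_a for v in pair], PREV_A)
    ya = [tuple(int(v > t_a) for v in pair) for pair in star_a]
    return ya, yb


def model1_or(ya, yb):
    """Model 1: marginal odds ratio between YA1 (proband) and YB2 (relative)."""
    n = [[0, 0], [0, 0]]
    for a, b in zip(ya, yb):
        n[a[0]][b[1]] += 1
    return n[1][1] * n[0][0] / (n[1][0] * n[0][1])


def solve3(m, v):
    """Solve the 3 x 3 linear system m x = v by Cramer's rule."""
    det = lambda a: (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
                     - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
                     + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))
    d = det(m)
    return [det([[v[r] if c == k else m[r][c] for c in range(3)] for r in range(3)]) / d
            for k in range(3)]


def model2_or(ya, yb, iterations=25):
    """Assumed model 2: exp(b1) in logit P(YB2=1 | YA1, YB1) = b0 + b1*YA1 + b2*YB1."""
    cells = {}  # (YA1, YB1) -> [pairs in cell, pairs with YB2 = 1]
    for a, b in zip(ya, yb):
        cell = cells.setdefault((a[0], b[0]), [0, 0])
        cell[0] += 1
        cell[1] += b[1]
    beta = [0.0, 0.0, 0.0]
    for _ in range(iterations):  # Newton-Raphson on the grouped binomial likelihood
        grad = [0.0] * 3
        info = [[0.0] * 3 for _ in range(3)]
        for (x1, x2), (n, y) in cells.items():
            x = (1.0, x1, x2)
            p = 1.0 / (1.0 + math.exp(-sum(bk * xk for bk, xk in zip(beta, x))))
            for r in range(3):
                grad[r] += x[r] * (y - n * p)
                for c in range(3):
                    info[r][c] += n * p * (1.0 - p) * x[r] * x[c]
        beta = [bk + step for bk, step in zip(beta, solve3(info, grad))]
    return math.exp(beta[1])


if __name__ == "__main__":
    ya, yb = simulate(100_000)
    print("model 1 OR(YA1,YB2):", round(model1_or(ya, yb), 2))
    print("model 2 OR(YA1,YB2|YB1):", round(model2_or(ya, yb), 2))
```

With these hypothetical settings one would expect model 1 to exceed 1, because the direct-effect path YA1 ← YB1 ← FB-C,1 – FB-C,2 → YB2 is open. One would expect model 2 to fall below 1, because conditioning on the collider YB1 induces a negative association between EC,1 and FB-C,1. Setting DIRECT_B_TO_A = 0.0 should bring model 1 back to about 1 while model 2 stays below 1, the same qualitative pattern as the null-hypothesis rows of the appendix tables.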
Consistent with theoretical results, under the null hypothesis model 1 was unbiased in the absence of direct effects but positively biased in their presence. Of most interest, model 2 was negatively biased, and the bias increased when direct effects were present. Under the alternative hypothesis, the causal coaggregation odds ratio for a given set of parameters is obtained from model 1 when direct effects are absent. As expected from theory, model 1 is positively biased in the presence of direct effects. Most importantly, we found that the bias for model 2 was negative.

We have shown that conditioning on YB1 yields an estimated coaggregation odds ratio that is negatively biased under a range of plausible parameter values. We cannot exclude the possibility that some relevant combination of parameter values we did not examine would yield different results. Nor can we exclude that relaxing one or more of assumptions 3-5 would do so. Given various theoretical results demonstrating that the bias is negative in scenarios similar to ours,19,45 however, we expect that this is not the case. We also performed a range of exploratory analyses (not reported). More extreme parameter settings still produced a negative bias. In addition, we tried including positive and negative linear terms for the interaction between familial factors and unique environmental factors common to both disorders. This sometimes produced a positive bias in one stratum, but the pooled conditional odds ratio always had a negative bias. Nevertheless, the covariance between two factors induced by conditioning on a common effect cannot be assumed to be negative in general. It is therefore likely that other functional forms (i.e., other than additive linear) for the interaction between the two latent factors would produce a positive bias.

Additional Appendix References (see main text for references 1-31)

32. Pearl J. Causal diagrams for empirical research. Biometrika 1995;82:669-688.
33. Robins JM, Smoller JW, Lunetta KL. On the validity of the TDT test in the presence of comorbidity and ascertainment bias. Genet Epidemiol 2001;21:326-336.
34. Spirtes P, Glymour C, Scheines R. Causation, Prediction and Search. New York: Springer-Verlag, 1993.
35. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48.
36. Cole SR, Hernan MA. Fallibility in estimating direct effects. Int J Epidemiol 2002;31:163-165.
37. Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: An application to birth defects epidemiology. Am J Epidemiol 2002;155:176-184.
38. Hernan MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004;15:615-625.
39. Glymour MM, Weuve J, Berkman LF, Kawachi I, Robins JM. When is baseline adjustment useful in analyses of change? An example with education and cognitive change. Am J Epidemiol 2005;162:267-278.
40. Cox DR, Wermuth N. Linear dependencies represented by chain graphs. Stat Sci 1993;8:204-218.
41. Pearl J. Causality. Cambridge, England: Cambridge University Press, 2000.
42. Geiger D, Verma TS, Pearl J. Identifying independence in Bayesian networks. Networks 1990;20:507-534.
43. Lauritzen SL, Dawid AP, Larsen BN, Leimer HG. Independence properties of directed Markov fields. Networks 1990;20:491-505.
44. Brumback BA, Hernan MA, Haneuse S, Robins JM. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Stat Med 2004;23:749-767.
45. Spirtes P. Presented at: WNAR/IMS Meeting. Los Angeles, CA, 2002.
46. VanderWeele TJ, Robins JM. Directed acyclic graphs, sufficient causes and the properties of conditioning on a common effect. Am J Epidemiol 2007;166:1096-1104.
47. R Development Core Team. R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing, 2006. ISBN 3-900051-07-0, URL http://www.R-project.org.

Appendix Table 1.
Results of simulation experiment with prevalence of disorder A = 0.02 and prevalence of disorder B = 0.02 (a)

Heritability  Heritability  Common        Direct       Causal        Model 1       Model 1   Model 2           Model 2
of A          of B          familial (b)  effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)   bias
0.4           0.4           0             0            1.00          1.00          None      0.95              Negative
0.4           0.4           0             1            1.00          1.16          Positive  0.93              Negative
0.4           0.4           0.8           0            2.42          2.42          None      1.98              Negative
0.4           0.4           0.8           1            2.42 (d)      2.67          Positive  1.88              Negative
0.4           0.7           0             0            1.00          1.00          None      0.93              Negative
0.4           0.7           0             1            1.00          1.32          Positive  0.91              Negative
0.4           0.7           0.8           0            3.13          3.13          None      1.91              Negative
0.4           0.7           0.8           1            3.13 (d)      4.69          Positive  1.65              Negative
0.7           0.4           0             0            1.00          1.00          None      0.97              Negative
0.7           0.4           0             1            1.00          1.09          Positive  0.96              Negative
0.7           0.4           0.8           0            3.12          3.12          None      2.54              Negative
0.7           0.4           0.8           1            3.12 (d)      3.26          Positive  2.48              Negative
0.7           0.7           0             0            1.00          1.00          None      0.95              Negative
0.7           0.7           0             1            1.00          1.46          Positive  0.94              Negative
0.7           0.7           0.8           0            4.33          4.33          None      2.46              Negative
0.7           0.7           0.8           1            4.33 (d)      5.35          Positive  2.29              Negative

(a) The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
(b) Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
(c) Values correspond to direct effects (of B on A) that are zero and one times the size of the effect of familial factors for disorder B.
(d) Determined from the corresponding model without direct effects, which is unbiased.

Appendix Table 2.
Results of simulation experiment with prevalence of disorder A = 0.02 and prevalence of disorder B = 0.10 (a)

Heritability  Heritability  Common        Direct       Causal        Model 1       Model 1   Model 2           Model 2
of A          of B          familial (b)  effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)   bias
0.4           0.4           0             0            1.00          1.00          None      0.92              Negative
0.4           0.4           0             1            1.00          1.25          Positive  0.91              Negative
0.4           0.4           0.8           0            2.05          2.05          None      1.56              Negative
0.4           0.4           0.8           1            2.05 (d)      2.29          Positive  1.42              Negative
0.4           0.7           0             0            1.00          1.00          None      0.90              Negative
0.4           0.7           0             1            1.00          2.02          Positive  0.89              Negative
0.4           0.7           0.8           0            2.54          2.54          None      1.45              Negative
0.4           0.7           0.8           1            2.54 (d)      3.44          Positive  1.25              Negative
0.7           0.4           0             0            1.00          1.00          None      0.95              Negative
0.7           0.4           0             1            1.00          1.15          Positive  0.94              Negative
0.7           0.4           0.8           0            2.55          2.55          None      1.90              Negative
0.7           0.4           0.8           1            2.55 (d)      2.68          Positive  1.75              Negative
0.7           0.7           0             0            1.00          1.00          None      0.93              Negative
0.7           0.7           0             1            1.00          1.65          Positive  0.92              Negative
0.7           0.7           0.8           0            3.38          3.38          None      1.74              Negative
0.7           0.7           0.8           1            3.38 (d)      3.96          Positive  1.50              Negative

(a) The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
(b) Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
(c) Values correspond to direct effects (of B on A) that are zero and one times the size of the effect of familial factors for disorder B.
(d) Determined from the corresponding model without direct effects, which is unbiased.

Appendix Table 3.
Results of simulation experiment with prevalence of disorder A = 0.10 and prevalence of disorder B = 0.02 (a)

Heritability  Heritability  Common        Direct       Causal        Model 1       Model 1   Model 2           Model 2
of A          of B          familial (b)  effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)   bias
0.4           0.4           0             0            1.00          1.00          None      0.96              Negative
0.4           0.4           0             1            1.00          1.08          Positive  0.95              Negative
0.4           0.4           0.8           0            2.04          2.04          None      1.86              Negative
0.4           0.4           0.8           1            2.04 (d)      2.13          Positive  1.89              Negative
0.4           0.7           0             0            1.00          1.00          None      0.94              Negative
0.4           0.7           0             1            1.00          1.37          Positive  0.94              Negative
0.4           0.7           0.8           0            2.54          2.54          None      2.00              Negative
0.4           0.7           0.8           1            2.54 (d)      2.95          Positive  2.15              Negative
0.7           0.4           0             0            1.00          1.00          None      0.97              Negative
0.7           0.4           0             1            1.00          1.06          Positive  0.97              Negative
0.7           0.4           0.8           0            2.54          2.54          None      2.34              Negative
0.7           0.4           0.8           1            2.54 (d)      2.44          Positive  2.23              Negative
0.7           0.7           0             0            1.00          1.00          None      0.96              Negative
0.7           0.7           0             1            1.00          1.23          Positive  0.95              Negative
0.7           0.7           0.8           0            3.38          3.38          None      2.69              Negative
0.7           0.7           0.8           1            3.38 (d)      3.63          Positive  2.87              Negative

(a) The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
(b) Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
(c) Values correspond to direct effects (of B on A) that are zero and one times the size of the effect of familial factors for disorder B.
(d) Determined from the corresponding model without direct effects, which is unbiased.

Appendix Table 4.
Results of simulation experiment with prevalence of disorder A = 0.10 and prevalence of disorder B = 0.10 (a)

Heritability  Heritability  Common        Direct       Causal        Model 1       Model 1   Model 2           Model 2
of A          of B          familial (b)  effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)   bias
0.4           0.4           0             0            1.00          1.00          None      0.94              Negative
0.4           0.4           0             1            1.00          1.16          Positive  0.93              Negative
0.4           0.4           0.8           0            1.78          1.78          None      1.51              Negative
0.4           0.4           0.8           1            1.78 (d)      1.93          Positive  1.49              Negative
0.4           0.7           0             0            1.00          1.00          None      0.92              Negative
0.4           0.7           0             1            1.00          1.65          Positive  0.91              Negative
0.4           0.7           0.8           0            2.13          2.13          None      1.47              Negative
0.4           0.7           0.8           1            2.13 (d)      2.91          Positive  1.42              Negative
0.7           0.4           0             0            1.00          1.00          None      0.96              Negative
0.7           0.4           0             1            1.00          1.10          Positive  0.96              Negative
0.7           0.4           0.8           0            2.13          2.13          None      1.81              Negative
0.7           0.4           0.8           1            2.13 (d)      2.21          Positive  1.81              Negative
0.7           0.7           0             0            1.00          1.00          None      0.94              Negative
0.7           0.7           0             1            1.00          1.41          Positive  0.94              Negative
0.7           0.7           0.8           0            2.69          2.69          None      1.79              Negative
0.7           0.7           0.8           1            2.69 (d)      3.19          Positive  1.78              Negative

(a) The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
(b) Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
(c) Values correspond to direct effects (of B on A) that are zero and one times the size of the effect of familial factors for disorder B.
(d) Determined from the corresponding model without direct effects, which is unbiased.

Appendix Figure 1. DAGs allowing for direct effect of B on A and A on B: (a) when coaggregation is present (alternative hypothesis); (b) when coaggregation is absent (null hypothesis); (c) when coaggregation is absent (null hypothesis), conditional on YB11 and YA21.

Appendix Figure 2. Graph used to generate data for the ith relative pair in the simulation.
Coefficients from a linear model have been added. Correlations of familial factors between family members, which are fixed by design at 0.5, are indicated by arcs without arrows between these variables.