ONLINE APPENDIX (Hudson et al., “A Structural Approach to the Familial
Coaggregation of Disorders”)
Appendix 1. Principles of Graphical Analysis
Technically, DAGs are a graphic representation of a recursive nonparametric
structural equation model.32 They originated in the area of machine learning, where they
are used to encode Bayesian belief networks. DAGs have since been adopted for
encoding causal relations, as well as the causal determinants of statistical associations,
because they provide a rigorous, yet intuitive, framework for examining causality.32-34
Causal DAGs have proven useful in many epidemiologic applications.33,35-39 DAGs are
closely related to path diagrams,40,41 but differ in that path diagrams additionally imply a
parametric linear statistical relationship between variables.
A DAG implies statistical independencies that can be read from the DAG using
graphical analysis. Therefore, any statistical model that fits the observed data and is
consistent with the independencies implied by the DAG can be used to estimate a causal
effect of interest.
We present here a brief summary of the principles of graphical analysis used in
this paper; overviews of graphical analysis in the epidemiologic literature are provided
elsewhere.33,35,37,39 A directed acyclic graph consists of a set of variables (X1,…, Xn) and
directed edges (edges, represented as line segments, with arrows in a single direction)
among variables such that the graph has no cycles (i.e., one cannot begin at a variable and
follow the directed edges back to the same variable). The diagram is said to be a causal
directed acyclic graph if the directed edges represent direct causal effects. A variable V is
a parent of Xi on the causal directed acyclic graph if there is an arrow from V to Xi.
Formally, each of the variables on the graph is defined by a non-parametric structural
equation Xi = fi(pai, εi), where pai denotes the parents of Xi on the graph and εi is the
error term for Xi; the εi are mutually independent. Unlike linear structural equations, these
non-parametric equations are entirely general; Xi may depend on any function of its parents
and εi. The requirement that the εi be mutually independent is essentially that every common
cause of any two variables on the graph must also be on the graph.
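As an illustration of what a recursive non-parametric structural equation model permits, the following minimal Python sketch draws samples from a small hypothetical graph (X1 → X2, X1 → X3, X2 → X3). The graph, the function names, and the particular functional forms are assumptions chosen for illustration; they are not taken from the analyses in this paper. The only structural requirements encoded are that each variable is a function of its parents and its own error, and that the errors are drawn independently.

```python
import random

def sample_sem(equations, order, rng):
    """Draw one joint sample from a recursive structural equation model.

    equations maps each variable name to (parents, f), where
    f(parent_values, error) may be any function. One independent
    standard normal error is drawn per variable.
    """
    values = {}
    for name in order:  # order must list every parent before its children
        parents, f = equations[name]
        error = rng.gauss(0.0, 1.0)
        values[name] = f([values[p] for p in parents], error)
    return values

# Hypothetical three-variable graph: X1 -> X2, X1 -> X3, X2 -> X3.
equations = {
    "X1": ([], lambda pa, e: e),
    "X2": (["X1"], lambda pa, e: 1 if pa[0] + e > 0 else 0),     # thresholded
    "X3": (["X1", "X2"], lambda pa, e: pa[0] * pa[1] + abs(e)),  # non-linear
}

rng = random.Random(0)
draws = [sample_sem(equations, ["X1", "X2", "X3"], rng) for _ in range(1000)]
```

Nothing in the sampler restricts the form of each function, which is the sense in which the equations are entirely general.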
A path is defined as any sequence of variables connected by edges regardless of
arrowhead direction; a directed path is a path which follows the edges in the direction
indicated by the graph's arrows; a collider is a particular variable on a path such that both
the preceding and subsequent variables on the path have directed edges going into that
variable (i.e., both the edge to and the edge from that variable have arrowheads into the
variable). A path between A and B is said to be blocked given some set of variables Z if
either there is a variable in Z on the path that is not a collider or if there is a collider on
the path such that neither the collider itself nor any of its descendants are in Z. When all
paths between A and B are blocked given Z, then A and B are conditionally independent
given Z.32,34,42,43
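The blocking rule can be applied mechanically. The sketch below, written in Python for illustration, enumerates every path between two variables in a small graph and applies the rule to each. The five-variable graph (E → A, E → C, F → C, F → B, C → A) is a hypothetical example, and the function names are ours. Enumerating all paths is practical only for small graphs.

```python
def descendants(children, node):
    """All variables reachable from node by following directed edges."""
    seen, stack = set(), [node]
    while stack:
        for child in children.get(stack.pop(), ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def all_paths(edges, start, end):
    """Every simple path from start to end, ignoring arrowhead direction."""
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    paths, stack = [], [[start]]
    while stack:
        path = stack.pop()
        if path[-1] == end:
            paths.append(path)
            continue
        for nxt in neighbours.get(path[-1], ()):
            if nxt not in path:
                stack.append(path + [nxt])
    return paths

def is_blocked(edges, path, z):
    """Apply the blocking rule to one path, given the conditioning set z."""
    edge_set = set(edges)
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    for prev, node, nxt in zip(path, path[1:], path[2:]):
        collider = (prev, node) in edge_set and (nxt, node) in edge_set
        if collider:
            # A collider blocks unless it or one of its descendants is in z.
            if node not in z and not (descendants(children, node) & z):
                return True
        elif node in z:
            # A non-collider blocks when it is conditioned on.
            return True
    return False

def d_separated(edges, a, b, z):
    """True when every path between a and b is blocked given z."""
    z = set(z)
    return all(is_blocked(edges, p, z) for p in all_paths(edges, a, b))

# Hypothetical graph: C is a collider on A <- E -> C <- F -> B
# and a non-collider on A <- C <- F -> B.
edges = [("E", "A"), ("E", "C"), ("F", "C"), ("F", "B"), ("C", "A")]
unconditional = d_separated(edges, "A", "B", [])
given_c = d_separated(edges, "A", "B", ["C"])
given_c_and_e = d_separated(edges, "A", "B", ["C", "E"])
```

In this example A and B are connected unconditionally through C. Conditioning on C blocks that path but opens the path on which C is a collider, so A and B become separated only when E (or F) is added to the conditioning set.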
The causal DAG framework has proven to be particularly useful in determining
whether conditioning on a particular set of variables, or none at all, is sufficient to control
for confounding. Pearl32 showed that, when estimating the causal effect of treatment A on
outcome Y, conditioning on a set of variables Z suffices to control for confounding if Z
blocks all “back-door paths” from A to Y (i.e., all paths with directed edges into A) and no
variable in Z is a descendant of A.
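The back-door condition can be checked with a short program. The Python sketch below is one way to do so, under two stated assumptions: it uses the equivalent formulation that Z must separate A from Y in the graph with all edges out of A removed, and it tests separation with the standard moralized ancestral graph construction rather than by listing paths. The example graph and all function names are hypothetical.

```python
def ancestral_set(parents, nodes):
    """The given nodes together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def d_separated(edges, a, y, z):
    """Separation test using the moralized ancestral graph of {a, y} and z."""
    parents = {}
    for u, v in edges:
        parents.setdefault(v, set()).add(u)
    keep = ancestral_set(parents, {a, y} | z)
    adjacent = {node: set() for node in keep}
    for child in keep:
        pa = sorted(parents.get(child, ()))
        for i, p in enumerate(pa):
            adjacent[p].add(child)
            adjacent[child].add(p)
            for q in pa[i + 1:]:  # connect parents that share a child
                adjacent[p].add(q)
                adjacent[q].add(p)
    seen, stack = {a}, [a]
    while stack:  # search for y without passing through z
        for nxt in adjacent[stack.pop()]:
            if nxt == y:
                return False
            if nxt not in seen and nxt not in z:
                seen.add(nxt)
                stack.append(nxt)
    return True

def satisfies_back_door(edges, a, y, z):
    """Check the back-door condition for conditioning set z."""
    z = set(z)
    children = {}
    for u, v in edges:
        children.setdefault(u, set()).add(v)
    found, stack = set(), [a]
    while stack:  # collect the descendants of a
        for child in children.get(stack.pop(), ()):
            if child not in found:
                found.add(child)
                stack.append(child)
    if found & z:
        return False
    without_out_edges = [(u, v) for u, v in edges if u != a]
    return d_separated(without_out_edges, a, y, z)

# Hypothetical graph: L confounds A and Y, M mediates A -> Y,
# and C is a collider on the path A <- U1 -> C <- U2 -> Y.
edges = [("L", "A"), ("L", "Y"), ("A", "Y"), ("A", "M"), ("M", "Y"),
         ("U1", "A"), ("U1", "C"), ("U2", "C"), ("U2", "Y")]
```

For this graph, `satisfies_back_door(edges, "A", "Y", {"L"})` holds, whereas adding the collider C to the conditioning set opens a new back-door path unless U1 or U2 is also included, and adding the mediator M fails because M is a descendant of A.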
Because of their relevance to the DAGs for coaggregation presented below, we
give some additional results pertinent to the situation where we wish to control for
confounding by conditioning on a variable that is not only on a back-door path from A to
Y, but is also a collider on a separate path from A to Y. Although conditioning on this
variable removes bias due to confounding, it also introduces “collider-stratification
bias”,19 which is a form of selection bias44 because conditioning on a collider induces a
statistical association between the parents of the collider. In the instances considered
below, we will be interested in determining the direction of the collider-stratification bias.
However, doing so is not straightforward because it is difficult to make general
statements about the direction (positive or negative) of the association between two
variables conditional on a third variable that is a collider on a path between them. In
general, the conditional covariance may be either positive or negative. Work on
quantifying the magnitude of collider-stratification bias has been done for binary
variables by Greenland19 and linear SEMs by Spirtes.45 VanderWeele and Robins46
introduce conditions that make it possible to draw conclusions about the covariance
between two binary variables conditional on a collider, but these conditions are often too
strong for family studies of coaggregation. We introduce below conditions that, though
plausible in many coaggregation applications, are strong enough to permit conclusions
about the sign of the covariance between two binary variables conditional on a collider.
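A small exact calculation illustrates why the sign depends on how the causes combine. If binary A and B are marginally independent and C is their common effect, then P(a, b | C = c) is proportional to P(a) P(b) P(C = c | a, b), so the marginal probabilities cancel from the odds ratio within a stratum of C, leaving only the four conditional probabilities of C. The Python sketch below evaluates that odds ratio for two hypothetical mechanisms: an additive liability-threshold mechanism with standard normal noise, and a synergistic mechanism in which C is likely only when both causes are present. The coefficient values are arbitrary assumptions.

```python
import math

def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def stratum_odds_ratio(p_c1, stratum):
    """Odds ratio between A and B within the stratum C = stratum.

    Assumes binary A and B that are marginally independent;
    p_c1(a, b) must return P(C = 1 | A = a, B = b).
    """
    def p(a, b):
        return p_c1(a, b) if stratum == 1 else 1.0 - p_c1(a, b)
    return p(1, 1) * p(0, 0) / (p(1, 0) * p(0, 1))

def additive(a, b):
    # C = 1 when 1.0*A + 1.0*B + noise exceeds 1.5, with noise ~ N(0, 1).
    return phi(1.0 * a + 1.0 * b - 1.5)

def synergistic(a, b):
    # C is common only when both causes are present.
    return 0.1 + 0.8 * a * b

or_additive_c1 = stratum_odds_ratio(additive, 1)
or_additive_c0 = stratum_odds_ratio(additive, 0)
or_synergistic_c1 = stratum_odds_ratio(synergistic, 1)
```

Under the additive mechanism the odds ratio is below one in both strata of C, whereas under the synergistic mechanism it exceeds one in the stratum C = 1. This is consistent with the statement that the conditional association may be of either sign in general.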
Appendix 2. Direct effects of A on B, and B on A, within individuals
We evaluate here the case where there are direct effects of A on B, and of B on A,
within individuals, as depicted in the DAG in Appendix Figure 1a, and under the null
hypothesis of no coaggregation, in Appendix Figure 1b. Note that because DAGs do not
permit two variables to cause each other simultaneously, we represent A and B at two
time points and allow that B at time 1 can cause A at time 2, and that A at time 1 can
cause B at time 2; this representation can be extended to represent relationships over
multiple time points. For models involving two or more time points of observation, we
use a third subscript t to denote the time points, t = 1…nt, where nt is the number of time
points; e.g., the disorder A outcome at the second time point for the proband is YA12.
For simplicity and without loss of generality, we consider for analysis the paths
from latent variables through disorder outcomes at time 1 to outcomes at time 2 to be the
same path as from the latent variable to the outcome at time 2 (e.g., we consider FA-C –
YA11 – YA12 to be the same as FA-C – YA12). Returning to our analysis, under the null
hypothesis there are two unblocked paths from YA12 to YB22 (YA12 – FA-C – YA21 – YB22, and
YA12 – YB11 – FB-C – YB22). Under the alternative hypothesis, there are three paths (YA12 – FC –
YA21 – YB22, YA12 – YB11 – FC – YB22, and YA12 – YB11 – FC – YA21 – YB22) from YA12 to YB22
that go through FC, other than the path for the coaggregation effect.
Because of the unblocked paths that result from the direct effect of disorder A on
disorder B, and the direct effect of disorder B on disorder A, YA1 and YB2 are not
independent. However, as in case 2, since we assume that the unblocked paths represent
positive associations, the statistical association in model 1 is greater than the association
explained by coaggregation.
When we condition on YA21 and YB11 under the null hypothesis (DAG in Appendix
Figure 1c), we block the previously unblocked paths, but also open two colliders and
thereby unblock two previously blocked paths (YA12 – EC1 – YB11 – FB-C – YB22 and YA12 –
FA-C – YA21 – EC2 – YB22). Under the alternative hypothesis, we unblock three more
previously blocked paths (YA12 – FC – YA21 – EC2 – YB22 , YA12 – EC1 – YB11 – FC – YB22,
and YA12 – EC1 – YB11 – FC – YA21 – EC2 – YB22).
A logistic regression model incorporating this conditioning is:
\[
\operatorname{logit} P(Y_{Bj2} = 1 \mid Y_{A12}, Y_{B11}, Y_{Aj1}) = \beta_0 + \beta_1 Y_{A12} + \beta_2 Y_{B11} + \beta_3 Y_{Aj1} \tag{A1}
\]
If we assume that a single measure of disorder status—such as current disorder or
presence of disorder at any time up to the present (lifetime diagnosis)—is an acceptable
proxy for status at all time points, and if we make the assumption of linear additive
effects of normally distributed latent variables, then the results of the simulation
experiment presented in Appendix 3 can be used to assess the direction of bias. Since the
influence of the direct path from disorder A to disorder B within individuals and vice
versa has been removed by conditioning, the finding that the odds ratio corresponding to
β1 from model A1 is less than the causal coaggregation odds ratio for all parameter
settings (see Appendix 3) indicates that β1 from model A1 is biased downward.
In situations where it is plausible to use a single measure of disorder status as an
acceptable proxy for status at previous time points, the odds ratio for the association
between YA12 and YB22 is the same as the odds ratio for the association between YB12 and
YA22 and we therefore can use a single bivariate logistic regression model with a common
coaggregation parameter to assess coaggregation.1,26 The model combines the following
two logistic regression equations:
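\[
\begin{aligned}
\operatorname{logit} P(Y_{Aj} = 1 \mid Y_{A1}, Y_{B1}, Y_{Bj}) &= \beta_{01} + \beta_1 Y_{B1} + \beta_2 Y_{A1} + \beta_3 Y_{Bj} \\
\operatorname{logit} P(Y_{Bj} = 1 \mid Y_{A1}, Y_{B1}, Y_{Aj}) &= \beta_{02} + \beta_1 Y_{A1} + \beta_5 Y_{B1} + \beta_3 Y_{Aj}
\end{aligned}
\tag{A2}
\]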
Appendix 3. Results from a simulation experiment investigating the bias in the
estimated coaggregation odds ratios from models 1 and 2 in the presence of direct
effects from disorder B to disorder A.
We conducted simulations in which data were generated from a graph (Appendix
Figure 2) that extends the DAG in Figure 1b by including the unique environmental
factors specific to disorder A (EA-C 1 and EA-C 2) and specific to disorder B (EB-C 1 and
EB-C 2) and by replacing FA-C, FB-C, and FC with correlated family-member-specific factors
(FA-C 1 and FA-C 2, FB-C 1 and FB-C 2, and FC 1 and FC 2, respectively). Although in strict terms the
resulting graph does not represent a causal DAG since it contains undirected edges, it
could be rewritten as a causal DAG by replacing each undirected edge with a common
parent that has directed edges pointing into the two correlated family-member-specific
factors.
We also imposed several additional assumptions to generate the data:
1) Familiality (F) represents genetic effects only.
2) Relatives 1 and 2 are first-degree relatives (i.e., they share 50% of their genes on
average).
3) The joint distribution of the latent variables for a relative pair is multivariate normal.
More specifically, [FA–C 1, FB–C 1, FC 1, FA–C 2, FB–C 2, FC 2, EA–C 1, EB–C 1, EC 1,
EA–C 2, EB–C 2, EC 2] has a multivariate normal distribution for i = 1,...,n, where i
indexes relative pairs and n is the number of relative pairs. The means and variances
of the latent variables are set to 0 and 1, respectively, without loss of generality.
4) The correlations between the latent variables are zero with the exception of
cor(FA–C 1, FA–C 2) = cor(FB–C 1, FB–C 2) = cor(FC 1, FC 2) = 0.5.
5) The observed variables are produced by coarsening a linear function of the (relevant)
latent variables and possibly other observed variables. Further, the linear function has
no interactions between the latent variables, and all coefficients are positive. More
specifically,
a. For j=1,2, YA j=1 if and only if Y*A j > tA, where tA is the cutoff
corresponding to the (1-prevA) percentile of the Y*A js and where
Y*A j = β(FA-C → YA) FA–C j + β(FC → YA) FC j + β(EA-C → YA) EA–C j +
β(EC → YA) EC j + β(YB → YA) YB j for all β ≥0.
b. For j=1,2, YB j=1 if and only if Y*B j > tB, where tB is the cutoff
corresponding to the (1-prevB) percentile of the Y*B js and where
Y*B j = β(FB-C → YB) FB–C j + β(FC → YB) FC j + β(EB-C → YB) EB–C j +
β(EC → YB) EC j for all β ≥0.
Assumption 4 corresponds to assumptions commonly made in quantitative genetics
models for complex disorders: that mating is random with respect to the phenotype in
question; that the genetic effects are additive with no epistasis; and that there is no
correlation between genes and environment. Assumption 5 corresponds to the
liability-threshold model12 used to relate binary phenotypes to underlying liabilities influenced by
genes and environment. With regard to Assumption 5, note that β(FC → YA) = β(FC → YB) = 0
corresponds to the null hypothesis of interest, that there are no familial effects common to
both disorders (i.e., the true coaggregation odds ratio is one). Also, note that β(EC → YA)
= β(EC → YB) = 0 corresponds to having no unique environmental effects common to both disorders.
Finally, note that β(YB → YA) = 0 corresponds to having no direct effects from B to A
within an individual. Direct effects between individuals, and from A to B within
individuals, are always assumed to be absent.
Using R version 2.4.1,47 we simulated 1000-2000 datasets, each containing four
variables (YA1, YB1, YA2, and YB2) observed for n = 100,000 pairs. To do so, we first used
assumptions 3 and 4 to generate [FA–C 1, FB–C 1, FC 1, FA–C 2, FB–C 2, FC 2, EA–C 1, EB–C 1,
EC 1, EA–C 2, EB–C 2, EC 2] for each pair, and then used assumption 5 to calculate [YA1, YB1,
YA2, YB2] for each pair. We used the following settings for parameter values: 1)
prevalence for disorder A of 0.02 and 0.10; 2) prevalence for disorder B of 0.02 and 0.10;
3) values of [β(FA-C → YA) + β(FC → YA)] and of [β(EA-C → YA) + β(EC → YA)] that
correspond to heritabilities of 0.40 and 0.70 for disorder A; 4) values of [β(FB-C → YB) +
β(FC → YB)] and of [β(EB-C → YB) + β(EC → YB)] that correspond to heritabilities of 0.40
and 0.70 for disorder B; 5) values of β(FC → YA) and β(FC → YB) that correspond to proportions
of 0 and 0.8 of familial (i.e., genetic) factors being common to disorders A and B; 6) a
value of β(EC → YA) and β(EC → YB) that corresponds to a proportion of 0.3 of unique
environmental factors being common to disorders A and B; and 7) values of β(YB → YA)
that correspond to no direct effects of YB on YA and to direct effects equal to the effect of
familial factors on YA.
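The data-generating steps described above can be sketched in a few lines. The following Python code is an illustrative re-expression of assumptions 3 to 5, not the R code used for the reported simulations. The coefficient values, the dictionary keys, the sample size, and the function names are all assumptions chosen for the example, and the cutoffs are taken as empirical quantiles of the simulated liabilities pooled over both relatives.

```python
import math
import random

def upper_cutoff(values, prevalence):
    """Cutoff exceeded by roughly a fraction `prevalence` of the values."""
    ranked = sorted(values)
    return ranked[int(round((1.0 - prevalence) * len(ranked))) - 1]

def simulate_pairs(n, coef, prev_a, prev_b, seed=0):
    """Return n rows (YA1, YB1, YA2, YB2) for simulated relative pairs."""
    rng = random.Random(seed)
    root = math.sqrt(1.0 - 0.5 ** 2)
    latent = []
    for _ in range(n):
        # Familial factors (F_A-C, F_B-C, F_C): correlation 0.5 between relatives.
        fam1 = [rng.gauss(0.0, 1.0) for _ in range(3)]
        fam2 = [0.5 * f + root * rng.gauss(0.0, 1.0) for f in fam1]
        # Unique environmental factors (E_A-C, E_B-C, E_C): independent.
        env1 = [rng.gauss(0.0, 1.0) for _ in range(3)]
        env2 = [rng.gauss(0.0, 1.0) for _ in range(3)]
        latent.append(((fam1, env1), (fam2, env2)))

    def liability_b(fam, env):
        return (coef["FB"] * fam[1] + coef["FC_B"] * fam[2]
                + coef["EB"] * env[1] + coef["EC_B"] * env[2])

    def liability_a(fam, env, y_b):
        return (coef["FA"] * fam[0] + coef["FC_A"] * fam[2]
                + coef["EA"] * env[0] + coef["EC_A"] * env[2]
                + coef["YB_A"] * y_b)  # direct effect of B on A

    liab_b = [[liability_b(f, e) for f, e in pair] for pair in latent]
    t_b = upper_cutoff([v for row in liab_b for v in row], prev_b)
    y_b = [[int(v > t_b) for v in row] for row in liab_b]

    liab_a = [[liability_a(f, e, yb) for (f, e), yb in zip(pair, ybs)]
              for pair, ybs in zip(latent, y_b)]
    t_a = upper_cutoff([v for row in liab_a for v in row], prev_a)
    y_a = [[int(v > t_a) for v in row] for row in liab_a]
    return [(a[0], b[0], a[1], b[1]) for a, b in zip(y_a, y_b)]

def odds_ratio(rows, i, j):
    """Crude odds ratio between columns i and j of the simulated rows."""
    counts = [[0, 0], [0, 0]]
    for row in rows:
        counts[row[i]][row[j]] += 1
    return counts[1][1] * counts[0][0] / (counts[1][0] * counts[0][1])

# Illustrative coefficients only: no common familial factors (null hypothesis).
null_coef = {"FA": 0.63, "FC_A": 0.0, "EA": 0.65, "EC_A": 0.42,
             "FB": 0.63, "FC_B": 0.0, "EB": 0.65, "EC_B": 0.42, "YB_A": 0.0}
direct_coef = dict(null_coef, YB_A=1.5)  # add a direct effect of B on A

or_null = odds_ratio(simulate_pairs(60000, null_coef, 0.10, 0.10), 0, 3)
or_direct = odds_ratio(simulate_pairs(60000, direct_coef, 0.10, 0.10), 0, 3)
```

With these illustrative settings the marginal odds ratio between YA1 and YB2 should be close to one under the null hypothesis without direct effects, and above one once a direct effect of B on A is added.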
For each dataset, we then used logistic regression to estimate the coaggregation
odds ratio under model 1 (marginal) and under model 2 (conditional on YB1). Note that
the conditional odds ratio corresponding to β1 in model 2 cannot be interpreted as the
stratum-specific odds ratio because the two stratum-specific odds ratios cannot be
assumed to be equal. Instead, the conditional odds ratio is a pooled version of stratum-specific odds ratios that, conveniently, has a uniformly negative bias under the
assumptions of positive and additive linear effects of latent variables.
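To make the distinction between stratum-specific and pooled odds ratios concrete, the Python sketch below computes both from a pair of 2x2 tables. Note the substitution: it pools with the Mantel-Haenszel estimator, which is a different pooled estimator from the logistic regression coefficient used for model 2, and is shown only because it can be written in a few lines. The counts are invented for illustration and are not simulation output.

```python
def odds_ratio(n11, n10, n01, n00):
    """Odds ratio from one 2x2 table of counts."""
    return n11 * n00 / (n10 * n01)

def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of 2x2 tables.

    Each table is (n11, n10, n01, n00), where the first index is YA1
    and the second index is YB2.
    """
    num = sum(n11 * n00 / (n11 + n10 + n01 + n00)
              for n11, n10, n01, n00 in strata)
    den = sum(n10 * n01 / (n11 + n10 + n01 + n00)
              for n11, n10, n01, n00 in strata)
    return num / den

# Hypothetical counts, stratified by YB1 = 0 and YB1 = 1.
strata = [(80, 920, 800, 8200), (30, 170, 150, 650)]

stratum_ors = [odds_ratio(*table) for table in strata]
pooled_or = mantel_haenszel_or(strata)
crude_or = odds_ratio(*[sum(cell) for cell in zip(*strata)])
```

The pooled value is a weighted average of the stratum-specific odds ratios and therefore lies between them, while the crude value from the collapsed table need not.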
We present in Appendix Tables 1-4 the results of the simulation study. The values
of the parameters that correspond most closely to the example of BED and bipolar
disorder are found in Table 3 (prevalence of disorder A equal to 0.10; prevalence of
disorder B equal to 0.02; heritability of disorder A and disorder B equal to 0.40).
Under the null hypothesis, the causal coaggregation odds ratio is one, and thus
any departures from one indicate bias. In conformance with theoretical results, model 1
was unbiased in the absence of direct effects, but biased in a positive direction in the
presence of direct effects. Of most interest is that model 2 was negatively biased, with
increased bias when there were direct effects.
Under the alternative hypothesis, the causal coaggregation odds ratio for a given
set of parameters is obtained from model 1 when direct effects are absent. In
conformance with theoretical results, model 1 is positively biased in the presence of
direct effects. Most importantly, we found that the direction of the bias for model 2 was
negative.
We have shown that conditioning on YB1 yields an estimated coaggregation odds
ratio that is negatively biased under a range of plausible parameter values. However, we
cannot exclude the possibility that we have missed some relevant combination of
parameter values that would yield different results, or that relaxing one or more of
assumptions 3-5 would yield different results. But given various theoretical results
demonstrating that the bias is negative in scenarios similar to ours,19,45 we expect that this
is not the case.
We also performed a range of exploratory analyses (not reported). More extreme
parameter settings still produced a negative bias. In addition, although inclusion of
positive and negative linear terms for the interaction between familial factors and unique
environmental factors common to both disorders sometimes produced a positive bias in
one stratum, the pooled conditional odds ratio always had a negative bias. However,
because the covariance between two factors that is induced by conditioning on a common
effect cannot be assumed to be negative in general, it is likely that other functional forms
(i.e., other than additive linear) for the interaction between the two latent factors would
produce a positive bias.
Additional Appendix References (see main text for references 1-31)
32. Pearl J. Causal diagrams for empirical research. Biometrika 1995;82:669-688.
33. Robins JM, Smoller JW, Lunetta KL. On the validity of the TDT test in the presence of comorbidity and ascertainment bias. Genet Epidemiol 2001;21:326-336.
34. Spirtes P, Glymour C, Scheines R. Causation, Prediction and Search. New York: Springer-Verlag, 1993.
35. Greenland S, Pearl J, Robins JM. Causal diagrams for epidemiologic research. Epidemiology 1999;10:37-48.
36. Cole SR, Hernan MA. Fallibility in estimating direct effects. Int J Epidemiol 2002;31:163-165.
37. Hernan MA, Hernandez-Diaz S, Werler MM, Mitchell AA. Causal knowledge as a prerequisite for confounding evaluation: An application to birth defects epidemiology. Am J Epidemiol 2002;155:176-184.
38. Hernan MA, Hernandez-Diaz S, Robins JM. A structural approach to selection bias. Epidemiology 2004;15:615-625.
39. Glymour MM, Weuve J, Berkman LF, Kawachi I, Robins JM. When is baseline adjustment useful in analyses of change? An example with education and cognitive change. Am J Epidemiol 2005;162:267-278.
40. Cox DR, Wermuth N. Linear dependencies represented by chain graphs. Stat Sci 1993;8:204-218.
41. Pearl J. Causality. Cambridge, England: Cambridge University Press, 2000.
42. Geiger D, Verma TS, Pearl J. Identifying independence in Bayesian networks. Networks 1990;20:507-534.
43. Lauritzen SL, Dawid AP, Larsen BN, Leimer HG. Independence properties of directed Markov fields. Networks 1990;20:491-505.
44. Brumback BA, Hernan MA, Haneuse S, Robins JM. Sensitivity analyses for unmeasured confounding assuming a marginal structural model for repeated measures. Stat Med 2004;23:749-767.
45. Spirtes P. Presented at: WNAR/IMS Meeting. Los Angeles, CA, 2002.
46. VanderWeele TJ, Robins JM. Directed acyclic graphs, sufficient causes and the properties of conditioning on a common effect. Am J Epidemiol 2007;166:1096-1104.
47. R Development Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0, URL http://www.R-project.org. 2006.
Appendix Table 1. Results of simulation experiment with prevalence of disorder A = 0.02 and prevalence of disorder B = 0.02 (a)

Heritability  Heritability  Common familial  Direct       Causal        Model 1       Model 1   Model 2            Model 2
of A          of B          factors (b)      effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)    bias
0.4           0.4           0                0            1.00          1.00          None      0.95               Negative
0.4           0.4           0                1            1.00          1.16          Positive  0.93               Negative
0.4           0.4           0.8              0            2.42          2.42          None      1.98               Negative
0.4           0.4           0.8              1            2.42 (d)      2.67          Positive  1.88               Negative
0.4           0.7           0                0            1.00          1.00          None      0.93               Negative
0.4           0.7           0                1            1.00          1.32          Positive  0.91               Negative
0.4           0.7           0.8              0            3.13          3.13          None      1.91               Negative
0.4           0.7           0.8              1            3.13 (d)      4.69          Positive  1.65               Negative
0.7           0.4           0                0            1.00          1.00          None      0.97               Negative
0.7           0.4           0                1            1.00          1.09          Positive  0.96               Negative
0.7           0.4           0.8              0            3.12          3.12          None      2.54               Negative
0.7           0.4           0.8              1            3.12 (d)      3.26          Positive  2.48               Negative
0.7           0.7           0                0            1.00          1.00          None      0.95               Negative
0.7           0.7           0                1            1.00          1.46          Positive  0.94               Negative
0.7           0.7           0.8              0            4.33          4.33          None      2.46               Negative
0.7           0.7           0.8              1            4.33 (d)      5.35          Positive  2.29               Negative

a- The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
b- Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
c- Values correspond to direct effects that are zero and one times the size of the effect of familial factors for disorder B.
d- Determined from the corresponding model without direct effects, which is unbiased.
Appendix Table 2. Results of simulation experiment with prevalence of disorder A = 0.02 and prevalence of disorder B = 0.10 (a)

Heritability  Heritability  Common familial  Direct       Causal        Model 1       Model 1   Model 2            Model 2
of A          of B          factors (b)      effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)    bias
0.4           0.4           0                0            1.00          1.00          None      0.92               Negative
0.4           0.4           0                1            1.00          1.25          Positive  0.91               Negative
0.4           0.4           0.8              0            2.05          2.05          None      1.56               Negative
0.4           0.4           0.8              1            2.05 (d)      2.29          Positive  1.42               Negative
0.4           0.7           0                0            1.00          1.00          None      0.90               Negative
0.4           0.7           0                1            1.00          2.02          Positive  0.89               Negative
0.4           0.7           0.8              0            2.54          2.54          None      1.45               Negative
0.4           0.7           0.8              1            2.54 (d)      3.44          Positive  1.25               Negative
0.7           0.4           0                0            1.00          1.00          None      0.95               Negative
0.7           0.4           0                1            1.00          1.15          Positive  0.94               Negative
0.7           0.4           0.8              0            2.55          2.55          None      1.90               Negative
0.7           0.4           0.8              1            2.55 (d)      2.68          Positive  1.75               Negative
0.7           0.7           0                0            1.00          1.00          None      0.93               Negative
0.7           0.7           0                1            1.00          1.65          Positive  0.92               Negative
0.7           0.7           0.8              0            3.38          3.38          None      1.74               Negative
0.7           0.7           0.8              1            3.38 (d)      3.96          Positive  1.50               Negative

a- The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
b- Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
c- Values correspond to direct effects that are zero and one times the size of the effect of familial factors for disorder B.
d- Determined from the corresponding model without direct effects, which is unbiased.
Appendix Table 3. Results of simulation experiment with prevalence of disorder A = 0.10 and prevalence of disorder B = 0.02 (a)

Heritability  Heritability  Common familial  Direct       Causal        Model 1       Model 1   Model 2            Model 2
of A          of B          factors (b)      effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)    bias
0.4           0.4           0                0            1.00          1.00          None      0.96               Negative
0.4           0.4           0                1            1.00          1.08          Positive  0.95               Negative
0.4           0.4           0.8              0            2.04          2.04          None      1.86               Negative
0.4           0.4           0.8              1            2.04 (d)      2.13          Positive  1.89               Negative
0.4           0.7           0                0            1.00          1.00          None      0.94               Negative
0.4           0.7           0                1            1.00          1.37          Positive  0.94               Negative
0.4           0.7           0.8              0            2.54          2.54          None      2.00               Negative
0.4           0.7           0.8              1            2.54 (d)      2.95          Positive  2.15               Negative
0.7           0.4           0                0            1.00          1.00          None      0.97               Negative
0.7           0.4           0                1            1.00          1.06          Positive  0.97               Negative
0.7           0.4           0.8              0            2.54          2.54          None      2.34               Negative
0.7           0.4           0.8              1            2.54 (d)      2.44          Positive  2.23               Negative
0.7           0.7           0                0            1.00          1.00          None      0.96               Negative
0.7           0.7           0                1            1.00          1.23          Positive  0.95               Negative
0.7           0.7           0.8              0            3.38          3.38          None      2.69               Negative
0.7           0.7           0.8              1            3.38 (d)      3.63          Positive  2.87               Negative

a- The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
b- Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
c- Values correspond to direct effects that are zero and one times the size of the effect of familial factors for disorder B.
d- Determined from the corresponding model without direct effects, which is unbiased.
Appendix Table 4. Results of simulation experiment with prevalence of disorder A = 0.10 and prevalence of disorder B = 0.10 (a)

Heritability  Heritability  Common familial  Direct       Causal        Model 1       Model 1   Model 2            Model 2
of A          of B          factors (b)      effects (c)  OR(YA1,YB2)   OR(YA1,YB2)   bias      OR(YA1,YB2|YB1)    bias
0.4           0.4           0                0            1.00          1.00          None      0.94               Negative
0.4           0.4           0                1            1.00          1.16          Positive  0.93               Negative
0.4           0.4           0.8              0            1.78          1.78          None      1.51               Negative
0.4           0.4           0.8              1            1.78 (d)      1.93          Positive  1.49               Negative
0.4           0.7           0                0            1.00          1.00          None      0.92               Negative
0.4           0.7           0                1            1.00          1.65          Positive  0.91               Negative
0.4           0.7           0.8              0            2.13          2.13          None      1.47               Negative
0.4           0.7           0.8              1            2.13 (d)      2.91          Positive  1.42               Negative
0.7           0.4           0                0            1.00          1.00          None      0.96               Negative
0.7           0.4           0                1            1.00          1.10          Positive  0.96               Negative
0.7           0.4           0.8              0            2.13          2.13          None      1.81               Negative
0.7           0.4           0.8              1            2.13 (d)      2.21          Positive  1.81               Negative
0.7           0.7           0                0            1.00          1.00          None      0.94               Negative
0.7           0.7           0                1            1.00          1.41          Positive  0.94               Negative
0.7           0.7           0.8              0            2.69          2.69          None      1.79               Negative
0.7           0.7           0.8              1            2.69 (d)      3.19          Positive  1.78               Negative

a- The proportion of unique environmental factors common to disorders A and B within individuals is set equal to 0.3 in all simulations.
b- Proportion of familial factors common to disorders A and B. The value zero corresponds to the null hypothesis of no coaggregation.
c- Values correspond to direct effects that are zero and one times the size of the effect of familial factors for disorder B.
d- Determined from the corresponding model without direct effects, which is unbiased.
Appendix Figure 1. DAGs allowing for direct effect of B on A and A on B: (a) when
coaggregation is present (alternative hypothesis); (b) when coaggregation is absent (null
hypothesis); (c) when coaggregation is absent (null hypothesis), conditional on YB11 and YA21
Appendix Figure 2. Graph used to generate data for the ith relative pair in the simulation.
Note that coefficients from a linear model have been added, and correlations of familial
factors between family members that are fixed by design at 0.5 are indicated by arcs without
arrows between these variables.