The Next Step Forward Is to Take a Step Back

Diabetes Volume 65, October 2016
Patrick F. McArdle
COMMENTARY
Diabetes 2016;65:2824–2825 | DOI: 10.2337/dbi16-0044
Understanding the causes of diabetes is at the heart of most diabetes research. The field of causal inference has made dramatic methodological advances that allow causal inference from observational rather than experimental data (1). One such advance is the use of genetic markers as instrumental variables, so-called Mendelian randomization (MR) (2). MR relies on the laws governing genetic inheritance, using genotype to address potential confounding or reverse causation and thereby gain insight into a modifiable risk factor’s causal effect on a disease.
Three crucial assumptions are required for traditional MR analyses (3). The genetic markers must 1) be associated with the risk factor, 2) be independent of factors that confound the relationship between the risk factor and the disease, and 3) be independent of the disease conditional on the risk factor and the confounding factors.
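For a single genetic marker j, these assumptions justify the ratio, or Wald, estimate of the causal effect β of the risk factor on the disease, provided one also assumes linear effects. That linearity assumption is added here for illustration and is not stated in the commentary. In the expressions below, \hat\gamma_j is the marker's estimated association with the risk factor, \hat\Gamma_j is its estimated association with the disease (on the log-odds scale for T2D), and \sigma_j is the standard error of \hat\Gamma_j. The inverse-variance weighted (IVW) estimate pools the ratios across markers. This notation is introduced here and is not the commentary's own.

```latex
\hat\beta_j = \frac{\hat\Gamma_j}{\hat\gamma_j},
\qquad
\hat\beta_{\mathrm{IVW}}
  = \frac{\sum_j \hat\gamma_j \hat\Gamma_j \, \sigma_j^{-2}}
         {\sum_j \hat\gamma_j^{2} \, \sigma_j^{-2}}
```

The IVW form is simply a weighted average of the per-marker ratios \hat\beta_j, with weights \hat\gamma_j^{2}\sigma_j^{-2}.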
In this issue of Diabetes, Corbin et al. (4) use MR to explore the causal relationship between BMI and type 2 diabetes (T2D). A composite instrument was developed from 97 genetic variants identified in previous genome-wide association studies. Gene–BMI and gene–T2D associations taken from two independent samples were combined using three estimation approaches. Traditional MR estimates are valid only if the assumptions specified in Fig. 1A hold. When multiple genetic markers are combined into a composite instrument, as in this case, Corbin et al. argue that those assumptions are likely violated. Specifically, at least some of the genetic variants may either themselves affect T2D or be in linkage disequilibrium with other variants that affect T2D through pathways that do not involve BMI (Fig. 1B). Violations of this type are referred to as “horizontal pleiotropy” (5). As an interesting aside, Fig. 1B also represents the scenario of gene–environment interaction, a situation that almost certainly exists in relation to the effect of BMI on T2D (6).
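Horizontal pleiotropy can be written as a direct effect α_j of marker j on the disease that bypasses the risk factor. This illustrative notation is ours, not the commentary's. Here γ_j and Γ_j are the marker's true associations with BMI and T2D, and β is the causal effect of BMI on T2D. Each ratio estimate is then biased by α_j/γ_j.

MR-Egger regression (8) addresses this by regressing \hat\Gamma_j on \hat\gamma_j with an intercept. The regression is weighted by \sigma_j^{-2}, where \sigma_j is the standard error of \hat\Gamma_j, after alleles are recoded so that every \hat\gamma_j is positive.

```latex
\Gamma_j = \beta\,\gamma_j + \alpha_j
\;\Longrightarrow\;
\frac{\Gamma_j}{\gamma_j} = \beta + \frac{\alpha_j}{\gamma_j},
\qquad
\text{MR-Egger: } \hat\Gamma_j = \beta_0 + \beta_E\,\hat\gamma_j + \varepsilon_j
```

The slope β_E estimates β under the additional assumption that the α_j are distributed independently of the γ_j. The intercept β_0 estimates the average pleiotropic effect, so a nonzero intercept signals directional pleiotropy.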
Corbin et al. (4) use three MR estimators, two of which are novel applications of methods that relax the assumption of no horizontal pleiotropy (7–9). The authors note that the three approaches are broadly consistent, in that they return similar estimates of a positive effect of increased BMI on the odds of T2D. Using sensitivity analyses, Corbin et al. investigate potential bias due to the inclusion of genetic variants with pleiotropic effects. They nicely demonstrate the influence of two genes, TCF7L2 and FTO, on each of the three estimates of the causal effect. Investigating biases that result from violated assumptions is an important step in any analysis (10), and one that warrants further exploration in this case.
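The three kinds of estimator can be sketched from summary statistics as follows. This is a minimal illustration, not Corbin et al.'s code. It assumes, consistent with refs. 7–9, that the three estimators are IVW, MR-Egger, and the weighted median. It also assumes first-order inverse-variance weights, uncorrelated variants, and nonzero gene–BMI associations.

The function names `ivw`, `mr_egger`, and `weighted_median` are hypothetical. So are the inputs:

- `bx`: each variant's estimated association with BMI
- `by`: each variant's estimated association with T2D (log-odds)
- `se_by`: the standard error of `by`

The simulated data below are invented for illustration.

```python
import numpy as np


def ivw(bx, by, se_by):
    """Inverse-variance weighted (IVW) estimate: weighted regression of by on bx
    through the origin with weights 1 / se_by**2 (first-order weights)."""
    w = 1.0 / se_by**2
    return np.sum(w * bx * by) / np.sum(w * bx**2)


def mr_egger(bx, by, se_by):
    """MR-Egger: the same weighted regression but with an intercept, after
    recoding alleles so that every bx is positive.
    Returns (intercept, slope); the slope is the causal estimate and the
    intercept estimates the average directional pleiotropic effect."""
    sign = np.sign(bx)
    x, y = bx * sign, by * sign
    sw = 1.0 / se_by  # square roots of the weights 1 / se_by**2
    design = np.column_stack([np.ones_like(x), x]) * sw[:, None]
    coef, *_ = np.linalg.lstsq(design, y * sw, rcond=None)
    return coef[0], coef[1]


def weighted_median(bx, by, se_by):
    """Weighted median of the per-variant ratio estimates by / bx, weighted by
    bx**2 / se_by**2, interpolating linearly between neighbouring ratios."""
    ratio = by / bx
    w = bx**2 / se_by**2
    order = np.argsort(ratio)
    b, w = ratio[order], w[order] / w.sum()
    s = np.cumsum(w) - 0.5 * w      # standardized weight at each sorted ratio
    i = np.searchsorted(s, 0.5)     # first position with s >= 0.5
    if i == 0:
        return b[0]
    return b[i - 1] + (b[i] - b[i - 1]) * (0.5 - s[i - 1]) / (s[i] - s[i - 1])


# Hypothetical summary statistics: 20 variants, simulated causal effect 0.8,
# with variant 0 given an extra (pleiotropic) direct effect on T2D.
bx = np.linspace(0.02, 0.10, 20)
se_by = np.full(20, 0.01)
by = 0.8 * bx
by[0] += 0.05

print("IVW:            ", ivw(bx, by, se_by))
print("MR-Egger:       ", mr_egger(bx, by, se_by))
print("Weighted median:", weighted_median(bx, by, se_by))

# Sensitivity analysis: drop the suspect variant and re-estimate.
keep = np.arange(20) != 0
print("IVW without variant 0:", ivw(bx[keep], by[keep], se_by[keep]))
```

In this made-up example, the single pleiotropic variant pulls the IVW estimate above the simulated value of 0.8. The weighted median stays at 0.8, because it remains consistent as long as more than half of the weight comes from valid variants. Dropping a suspect variant and re-estimating is a crude version of the kind of sensitivity analysis described above.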
The causal relationship between BMI and T2D is not a simple one. BMI and T2D are both time-varying factors that likely exhibit a complex time-dependent signature (Fig. 1C). As BMI changes over time, it both influences future BMI levels and T2D status and is influenced by past BMI and T2D status. The picture becomes more complicated still when the diagnosis of T2D, and not merely its presence, is considered. While methodologists continue to work on the problem of causal inference in the presence of time-varying factors, and in particular on how MR can be used in that setting (11), it may be instructive to take a step back and ask, “What is intended to be estimated?”
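One way to make that question precise uses potential-outcomes notation, which the commentary does not itself use; all symbols here are introduced for illustration. The target is written as a contrast in T2D risk at a chosen follow-up time t between two specified interventions on the BMI trajectory.

```latex
\psi(t;\bar a,\bar a') \;=\; \Pr\{Y_t(\bar a)=1\} \;-\; \Pr\{Y_t(\bar a')=1\},
\qquad \bar a = (a_0, a_1, \dots, a_{t-1})
```

Here Y_t(\bar a) denotes T2D status at time t had BMI followed the trajectory \bar a. A trajectory may also be a rule, such as subtracting 1 kg/m² from each person's BMI, rather than fixed values. Changing t, or changing whether a BMI reduction is applied once or sustained, changes ψ. Each such choice therefore defines a distinct causal effect.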
The task of causal inference relies on first defining what is meant by “the causal effect” (12). In the case of BMI and T2D, one could ask many sensible questions. For example: If everyone in the population were to reduce their BMI by 1 kg/m² tomorrow, what would the burden of T2D be in 1 year? Or, if everyone in the population were to reduce their BMI by 1 kg/m² tomorrow, what would the burden of T2D be in 5 years? Or, if everyone were to reduce their BMI by 1 kg/m² tomorrow and remain at that new BMI for the remainder of their life, what would the burden of T2D be in 5 years? It seems reasonable to assume that the numeric answer to each of
those questions would be quite different. Further, each of those questions would require a different data collection strategy. Regardless of study design or estimation strategy, a single estimate of “the causal effect” of BMI on T2D is semantically vague when so many causal effects are definable.
Valid causal inference requires a well-defined causal effect and the identification of data that can be used to estimate that effect (13). Judea Pearl (14) stated this idea succinctly when he summarized the practice of causal inference in three steps: “define first, identify second, estimate last.” While great advances have been made in leveraging instrumental variable estimators and in specifying the identification assumptions that underlie them, these advances provide little value if the causal effect is not first defined. To take the next step forward in understanding the risk factors causally responsible for T2D, it is imperative to take a step back. Important causal questions must first be asked and translated into statistical targets, prior to estimation. In some cases, published data may be sufficient to aid in estimation (15), but many questions will require more granular data, such as longitudinal time series data. Novel methods, particularly those that leverage the growing genomic knowledge base in time-varying scenarios, are needed to fulfill the promise of elucidating the causal effect of BMI on T2D. For that promise to be fully realized, the novel methods need to address well-defined causal questions and be paired with appropriately collected data.

Figure 1—Three sets of causal assumptions for the estimation of the effect of BMI on T2D using genetic markers (G) as an instrumental variable, so-called Mendelian randomization. A: Traditional assumptions. B: Relaxation of the horizontal pleiotropy assumption. C: Time-varying scenario. Common causes of BMI and T2D (dotted lines) have been omitted from C for readability but should be assumed present.

Division of Endocrinology, Diabetes and Nutrition, Department of Medicine, and Program in Epidemiology and Human Genetics, University of Maryland School of Medicine, Baltimore, MD
Corresponding author: Patrick F. McArdle, [email protected].
See accompanying article, p. 3002.
© 2016 by the American Diabetes Association. Readers may use this article as long as the work is properly cited, the use is educational and not for profit, and the work is not altered. More information is available at http://www.diabetesjournals.org/content/license.

Duality of Interest. No potential conflicts of interest relevant to this article were reported.
References
1. Pearl J. Causality: Models, Reasoning and Inference. 2nd ed. New York,
Cambridge University Press, 2009
2. Davey Smith G, Hemani G. Mendelian randomization: genetic anchors
for causal inference in epidemiological studies. Hum Mol Genet 2014;23:
R89–R98
3. Lawlor DA, Harbord RM, Sterne JAC, Timpson N, Davey Smith G. Mendelian
randomization: using genes as instruments for making causal inferences in
epidemiology. Stat Med 2008;27:1133–1163
4. Corbin LJ, Richmond RC, Wade KH, et al. BMI as a modifiable risk factor for
type 2 diabetes: refining and understanding causal estimates using Mendelian
randomization. Diabetes 2016;65:3002–3007
5. Burgess S, Timpson NJ, Ebrahim S, Davey Smith G. Mendelian randomization:
where are we now and where are we going? Int J Epidemiol 2015;44:379–388
6. Rampersaud E, Mitchell BD, Pollin TI, et al. Physical activity and the association of common FTO gene variants with body mass index and obesity. Arch
Intern Med 2008;168:1791–1797
7. Bowden J, Davey Smith G, Haycock PC, Burgess S. Consistent estimation in
Mendelian randomization with some invalid instruments using a weighted median estimator. Genet Epidemiol 2016;40:304–314
8. Bowden J, Davey Smith G, Burgess S. Mendelian randomization with invalid
instruments: effect estimation and bias detection through Egger regression. Int J
Epidemiol 2015;44:512–525
9. Burgess S, Butterworth A, Thompson SG. Mendelian randomization analysis
with multiple genetic variants using summarized data. Genet Epidemiol 2013;37:
658–665
10. Greenland S, Lash TL. Bias analysis. In Modern Epidemiology. 3rd ed.
Philadelphia, LWW, 2012, p. 345–380
11. VanderWeele TJ, Tchetgen Tchetgen EJ, Cornelis M, Kraft P. Methodological
challenges in Mendelian randomization. Epidemiology 2014;25:427–435
12. Maldonado G, Greenland S. Estimating causal effects. Int J Epidemiol 2002;
31:422–429
13. Petersen ML, van der Laan MJ. Causal models and learning from data:
integrating causal modeling and statistical estimation. Epidemiology 2014;25:
418–426
14. Pearl J. Foreword. In Targeted Learning: Causal Inference for Observational
and Experimental Data. New York, Springer, 2011, p. vii–x
15. Burgess S, Scott RA, Timpson NJ, Davey Smith G, Thompson SG; EPIC-InterAct Consortium. Using published data in Mendelian randomization: a blueprint for efficient identification of causal risk factors. Eur J Epidemiol 2015;30:543–552