
BRIEFINGS IN BIOINFORMATICS. VOL 15. NO 1. 30–42
Advance Access published on 27 August 2012
doi:10.1093/bib/bbs049
Towards a comprehensive picture of the
genetic landscape of complex traits
Zhong Wang, Yaqun Wang, Ningtao Wang, Jianxin Wang, Zuoheng Wang, C. Eduardo Vallejos and Rongling Wu
Submitted: 21st May 2012; Received (in revised form): 9th July 2012
Abstract
The formation of phenotypic traits, such as biomass production, tumor volume and viral abundance, undergoes a
complex process in which interactions between genes and developmental stimuli take place at each level of biological
organization from cells to organisms. Traditional studies emphasize the impact of genes by directly linking
DNA-based markers with static phenotypic values. Functional mapping, derived to detect genes that control developmental processes using growth equations, has proven powerful for addressing questions about the roles of
genes in development. By treating phenotypic formation as a cohesive system using differential equations, a different
approach, systems mapping, dissects the system into interconnected elements and then maps genes that determine
a web of interactions among these elements, facilitating our understanding of the genetic machinery of phenotypic development. Here, we argue that genetic mapping can play a more important role in studying the genotype–phenotype relationship by filling the gaps in the biochemical and regulatory processes from DNA to end-point
phenotype. We describe a new framework, named network mapping, to study the genetic architecture of complex
traits by integrating the regulatory networks that cause a high-order phenotype. Network mapping makes use of
a system of differential equations to quantify the rule by which transcriptional, proteomic and metabolomic components interact with each other to organize into a functional whole. The synthesis of functional mapping, systems
mapping and network mapping provides a novel avenue to decipher a comprehensive picture of the genetic landscape of complex phenotypes that underlie economically and biomedically important traits.
Keywords: network mapping; complex traits; differential equations; DNA polymorphism; systems biology
GENETIC MAPPING
One of the major tasks in modern biology is to map
and characterize the genetic architecture of complex
traits and use this knowledge to predict the response
of biological structure, organization and function to
changing environment [1, 2]. One important challenge in achieving this task is posed by the multifactorial and architectural complexity of biological
traits. To infer and establish the genotype–phenotype
relationship at the level of individual genes, known
Corresponding author. Rongling Wu, Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA 17033, USA.
Tel: +717-531-2037; Fax: +717-531-0480; E-mail: [email protected]
Zhong Wang is a computer scientist and statistical geneticist in the Center for Computational Biology at Beijing Forestry University.
He also works in the Center for Statistical Genetics at The Pennsylvania State University. He develops computer algorithms for
statistical genetic models.
Yaqun Wang is a PhD student in statistics at The Pennsylvania State University. The focus of his thesis is the dynamic modeling of
Bayesian networks.
Ningtao Wang is a PhD student in statistics at The Pennsylvania State University. The focus of his thesis is the modeling and dissection
of complex phenotypes using differential geometry.
Jianxin Wang is an Associate Professor in information science at Beijing Forestry University, interested in algorithmic development for
complex systems.
Zuoheng Wang is an Assistant Professor in biostatistics at Yale University. Her research interest is to develop statistical models for
genetic mapping and genome-wide association studies.
C. E. Vallejos is an Associate Professor in plant genetics at the University of Florida. His research interest centers on the molecular
genetics of disease resistance in crops, such as Phaseolus vulgaris.
Rongling Wu is a Professor in biostatistics and bioinformatics and the Director of the Center for Statistical Genetics at The
Pennsylvania State University. He founded the Center for Computational Biology at Beijing Forestry University. He is interested
in quantitative and statistical genetics.
© The Author 2012. Published by Oxford University Press. For Permissions, please email: [email protected]
Genetic landscape of complex traits
as quantitative trait loci (QTLs), genetic mapping
pioneered by Lander and Botstein [3] has been
used in a diversity of traits and organisms [4–6].
The past two decades have seen an explosion of
interest in developing various statistical approaches
for genetic mapping of complex phenotypes [7–13].
The defining principle of QTL mapping is based
on the identification of significant associations
between DNA-based polymorphic markers and
phenotypic variation in order to link the genotype
to the phenotype. This approach allows us to dissect
a complex phenotypic trait into its underlying DNA
variants distributed in various regions of the genome.
For example, the difference in stem length between
two plants may be attributed to their different DNA
genotypes at particular loci (Figure 1). Quantitative
genetics has gained revolutionary insights due to the
widespread use of gene mapping [14]. In fact, this
newly acquired knowledge has been successfully
applied in breeding programs [5] and medicine [6].
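As a minimal sketch of the marker-phenotype association principle described above, the following Python snippet compares mean stem length between two marker genotype groups using Welch's t statistic. The data values, the genotype labels and the helper name `welch_t` are hypothetical and chosen only for illustration; practical QTL mapping relies on interval or mixture models rather than a single-marker test.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    va, vb = variance(a), variance(b)   # sample variances (n - 1 denominator)
    return (mean(a) - mean(b)) / sqrt(va / len(a) + vb / len(b))

# Hypothetical stem lengths (cm) grouped by marker genotype at one locus.
stem_length = {
    "AT": [10.0, 12.0, 14.0],
    "CG": [20.0, 22.0, 24.0],
}

effect = mean(stem_length["CG"]) - mean(stem_length["AT"])
t_stat = welch_t(stem_length["CG"], stem_length["AT"])
print(f"mean difference = {effect:.1f} cm, Welch t = {t_stat:.2f}")
```

A large absolute t value at a marker suggests that a locus linked to that marker contributes to the phenotypic difference; it says nothing about how the difference arose during development.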
Although useful to varying extents, QTL analyses
based on simple statistical tests to identify the direct
relationship between genotype and end-point
phenotype do not reveal a complete picture of the
genetic architecture of complex traits. Focusing on
end-point phenotypes leaves out of the analysis all
the developmental and genetic factors that shape the
phenotype. For instance, consider the acquisition of
certain height or weight over different lengths of
time. The same end-point phenotype could be
attained through different developmental trajectories
each controlled by a different set of genes. This
example underscores the limitations of bulking similar end-point phenotypes for standard QTL analysis.
It is widely recognized that almost all organ traits
undergo a developmental process during which their
level of complexity increases [15]. Thus, development is a dynamic process in which genes play a
critical role in regulating the pattern of trait development through their interactions with morphogens
[16]. In addition, DNA variants can affect phenotype
formation by altering to some degree biochemical
pathways and regulatory networks directly involved
in the shaping of a particular phenotype [17–19].
The developmental pattern of genetic control can
now be documented at both the spatial and temporal
levels by two complementary and potentially synergistic approaches: global analysis of gene expression,
and functional and systems mapping; the latter are
two statistical models for mapping dynamic traits. In
this Opinion article, we present a new conceptual
framework for mapping complex phenotypes that incorporates knowledge of how genetic information is
integrated, coordinated and ultimately transmitted
through regulatory networks to enable biological
functions that shape high-order traits. We will also
argue how a comprehensive genotype–phenotype
map can be charted through the merger of this
framework and functional and systems mapping.
FUNCTIONAL MAPPING
Figure 1: Traditional genetic mapping that links DNA
variation (AT versus CG) in a region of the plant
genome to phenotypic variation in static plant traits,
such as leaf display, through a statistical test. This approach simply considers the biochemical pathways
from DNA to the phenotype as a black box, unable to
address the biological complexity of phenotypic formation and development.
Conventional QTL mapping experiments require
three basic steps: (i) genotyping a segregating population with a relatively large number of DNA-based
markers, (ii) phenotyping the population at a specific point in time, usually the end-point (final size,
weight, etc.), and (iii) identifying significant associations between the genotype and the phenotype.
This approach, however, has a major limitation
because it fails to analyze the genetic components
Wang et al.
of trait development that lead to phenotypic variation over time. This limitation can be overcome
by functional mapping [12, 20–23], a new QTL
mapping approach that takes into account the dynamic nature of trait development. It accomplishes
this task through a statistical analysis of the mathematical function that best represents the growth or
developmental trajectory of a segregating trait using
parametric or nonparametric models [21] (Figure 2).
Thus, functional mapping has the ability to identify
QTL, which may play critical roles in trait development at a specific period in time. Overall, functional
mapping displays increased statistical power and stability by reducing the number of parameters to be
estimated.
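The idea that a whole growth trajectory can be summarized by a few curve parameters may be illustrated with a short Python sketch. The logistic growth equation is used here as one common choice of growth equation; the parameter values, the genotype labels `QQ` and `qq`, and the function name `logistic` are assumptions made for illustration and are not taken from the studies cited above.

```python
from math import exp

def logistic(t, a, b, r):
    """Logistic growth curve with asymptote a, position parameter b and rate r."""
    return a / (1.0 + b * exp(-r * t))

# Hypothetical curve parameters (a, b, r) for two QTL genotypes.
genotype_params = {
    "QQ": (30.0, 20.0, 0.8),
    "qq": (30.0, 20.0, 0.5),
}

times = list(range(0, 21))   # 21 measurement times

# Each genotype's whole trajectory is generated by only three parameters.
curves = {g: [logistic(t, *p) for t in times] for g, p in genotype_params.items()}

# Time-varying genetic effect: difference between the genotypic mean curves.
effect = [hi - lo for hi, lo in zip(curves["QQ"], curves["qq"])]
t_peak = times[max(range(len(times)), key=lambda i: abs(effect[i]))]
print(f"end-point difference = {effect[-1]:.3f}")
print(f"largest genotype difference occurs at t = {t_peak}")
```

In this sketch the two genotypes share the same asymptote, so an end-point comparison would see almost no difference, whereas comparing the curves exposes the period in which the hypothetical QTL acts. Estimating three parameters per genotype instead of one mean per time point is the source of the parsimony mentioned above.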
From a statistical standpoint, functional mapping
strives to jointly model mean-covariance structures
for a longitudinal trait, i.e. a trait measured repeatedly over a period of time. However, in contrast to
the general treatment of longitudinal problems,
functional mapping integrates parameter estimation
and the testing process within a mixture-based
framework in which each mixture component is
assigned a biological rationale. Unlike traditional
mapping models that normally do not consider biological processes, functional mapping yields biologically relevant genetic information because it embeds
the mathematical aspects of biological processes
(e.g. growth equations) within the estimation procedure of QTL parameters. Thus, results obtained by
functional mapping provide a mechanistic and realistic perspective of biological systems. Specifically,
functional mapping has been shown to address the
following fundamental biological questions:
(i) How does a specific QTL affect the pattern of development? Functional mapping can determine the
timing at which the QTL is switched on and
Figure 2: Biological networks that are involved in trait formation and development. Genetic mapping is moving
from a direct genotype–phenotype correspondence (Figure 1) to the regulatory networks that underlie the formation
of a phenotypic trait. Functional mapping associates DNA variants with the developmental process (early, middle
and late) of a trait, such as plant height, facilitating the elucidation of the developmental and genetic basis of trait
variation. Systems mapping extends the dynamic idea of functional mapping to dissect the phenotype into its interrelated components based on design principles and then map QTLs that determine each of the components and
component–component interactions and coordination. In this example, systems mapping dissects plant height into
its leaf and root components over a time scale and studies how leaf and root traits coordinate and compete with
each other to affect height growth. Now, network mapping combines the advantages of functional mapping and systems mapping to identify eQTLs (for gene expression), pQTLs (for protein expression) and mQTLs (for metabolic expression) and their genetic interactions that control the regulatory networks that cause the final phenotype. Network
mapping entails the integrative and innovative use of different tools from diverse disciplines, such as genetics, mathematics, development, statistics, computer science and engineering.
off and the period of how long the QTL is at
work [21].
(ii) Are there any distinct temporal patterns of QTL effects in
a time course? When used to study the genetic
control of thermal performance, for example,
functional mapping can identify whether hotter–colder, faster–slower or generalist–specialist
QTLs exist to control thermal performance
curves of growth rates [24].
(iii) How do developmental and environmental stimuli impact
on the expression of a QTL? Genes affect phenotypic
development through internal (developmental)
and external environments. Functional mapping
can characterize how different environmental stimuli activate the expression of new
QTLs and guide the direction of their expression [25, 26].
(iv) What is the overall picture of the genetic architecture of
dynamic traits? Functional mapping can test
whether pleiotropic QTLs contribute to genetic
correlations between vegetative growth and
reproductive behavior [22]. In particular, it can
also test how different QTLs interact to form a
dynamic epistatic network that regulates
growth and development.
SYSTEMS MAPPING
Functional mapping is a dynamic model for mapping
QTLs because it treats trait development as a process.
Indeed, any complex trait is formed in a dynamic
system that consists of various interacting parts or
gene networks, each with different levels of complexity and defined spatial and temporal patterns of
expression. The behavior and outcome of this
system, i.e. the resulting final phenotype, can be
changed when one or more key components of
the system are replaced by a variant [27]. However,
to understand the basis of genotype-dependent
changes of the phenotype, we need to understand
how different system parts are coordinated and organized, as well as the genetic mechanisms that underlie their coordination [28].
A fundamental principle and approach for mapping
QTL associated with a complex dynamic system has
been formulated by linking different parts of the
system with robust mathematical equations and integrating these equations into a statistical framework for
genetic mapping. This approach, called systems mapping [29], deploys a system of differential equations to
describe and quantify the dynamic behavior of a
biological system and manipulate its high complexity
(Figure 2). Systems mapping allows for the quantitative testing of biologically interesting hypotheses
about the genetic mechanisms that control the interaction patterns among integral parts of the system
and the emergent properties of the whole system.
During plant growth, biomass allocation to different organs (stem, branches, leaves, seeds, coarse roots
and fine roots) tends to produce an optimal architecture that maximizes the capture of nutrients, light,
water and carbon dioxide through a specific developmental program [30]. Thus, plant growth is a
highly coordinated activity in which biomass is partitioned to the various organs according to the need
while maintaining a physiological balance among
them dictated by basic biological principles. A
system of ordinary differential equations (ODEs)
has been derived to model the dynamic allocation
of carbon to different organs during growth in relation to carbon and water/nitrogen supply by integrating coordination theory [31] and allometric
scaling [32, 33]. Systems mapping can quantitatively
and precisely identify genes that govern the mechanistic relationships and interactions among system
components. This task can be accomplished through
various tests for individual ODE parameters, or combinations of them. Some mechanistically meaningful
relationships that have been examined include:
(i) Size-shape relationship: Systems mapping can
analyze whether a big plant is due to a big
stem with sparse leaves or dense leaves with a
small stem [34].
(ii) Structural–functional relationship: Systems mapping
can detect whether a plant tends to allocate
more carbon to its leaves for CO2 uptake or
to roots for water and nutrient uptake when
the environment changes [35].
(iii) Cause–effect relationship: Systems mapping can
determine whether more roots are due to
more leaves or more leaves produce more
roots in a particular environment [36, 37].
(iv) Pleiotropic relationship: Systems mapping can
quantify how some QTLs pleiotropically control morphological integration in which different traits with a similar function tend to
integrate into developmental modularity [38].
A better understanding of these relationships
will help us gain additional insights into the mechanistic response of plant size and shape to
developmental and environmental signals and, also,
provide guidance to select an ideotype of crop cultivars with optimal shape and structure suited to a
particular environment.
A BIG PICTURE OF LIFE
Existing genetic mapping methods are intrinsically
reductionist because they usually do not
take into account the high-dimensional nature of
gene expression (spatio-temporal patterns, and transcriptional and post-transcriptional controls) in the
context of regulatory gene networks involved in
trait formation. The emergence and development
of any phenotype are carried out within a living
system in which the ‘Central Dogma’ of biology, as
a fundamental rule, controls and directs the life
process according to: DNA → mRNA → enzyme (inactive) → enzyme (active) → metabolite(s) → metabolism → cellular
physiology → phenotype (Figure 2). The pathways and
processes from DNA to phenotype can be viewed as
a biological system comprised of large populations of
elements that function together within the system
according to unique design principles. Thus, the
phenotype can be defined by the structure and
dynamics of the system. These two characteristics
jointly establish the essential property of a biological
system, robustness, with which single perturbations
will rarely cause functional failure [39, 40]. There are
two sets of digital information that play a central role
in assembling different parts of the system; together,
these sets coordinate with each other and with the
surrounding environment. The first set comprises
genes that encode proteins, which play specific structural and functional roles. The second comprises
regulatory networks in which the genes interact
with each other and with their products in a
manner that reveals a complex web of dynamic
co-expression in time and space [41]. While genetic
information can operate over a wide range of
time scales, from tens of years to hours (life cycles) and from
weeks to milliseconds (physiology) [42], the biological phenotype it determines can be expressed at
multiple levels of organization, from the molecular
to the organismal level [40].
Current biotechnologies allow the high-throughput measurement of gene expression, a process by which
inheritable information from the DNA sequence is
converted into a functional gene product (such as
RNA or protein). It has become
increasingly feasible to measure the whole set of
genes of a biological system (genome), the entire
set of RNA transcripts of the system (transcriptome),
the entire collection of proteins (proteome) and the
entire range of metabolites taking part in a biological
process (metabolome). Another technological development has facilitated the measurement of organismal
phenotypes, such as embryo growth trajectories,
heart 3D shape and plant root architecture. It is
also possible to measure a complete set of phenotypes
(phenome) for some organisms, such as humans [19]
and Arabidopsis [43]. The availability of these omics
data entails the development of computing models to
analyze the interactome (complete set of interactions
between proteins or between these and other molecules) and localizome (localization of transcripts,
proteins, etc.) and ultimately relate these to the
organization and function of the biological system
[41, 44, 45].
NETWORK MAPPING
Many studies have been carried out to map QTLs that
affect gene expression (eQTLs), proteomic (pQTLs)
and metabolomic profiles (mQTLs) (Figure 2) by dissecting the phenotype into its genetic regulatory networks [19, 46–48]. However, most of these studies
associate DNA polymorphic markers with expression
profiles of individual genes, proteins or metabolites,
without considering intrinsic gene–protein–metabolite interaction networks. Such single-level analysis
cannot extract information related to the control of
gene expression between the transcriptional and
post-translational levels. What is needed is an approach that can provide insight into interaction networks of genes during cell cycle, cell differentiation,
signal transduction and metabolism, all processes that
form a phenotypic trait.
Modeling dynamic regulatory networks
The information available from omics research can
enable us to assess multiple features of complex systems at various levels of biological organization, from
the cell to the whole organism within a defined and
fully delineated framework. The challenge is therefore to decode this information in the context of the
physiology of the system [49, 50].
It is imperative to have a good understanding of
the pattern and control of gene expression and the
physiology of a biological system if we are going to
construct a model to assess the capabilities of such
system [49]. Classic correlation methods or multivariate statistics have been used to correlate gene
expression data with proteomic and metabolomic
data [47, 48], but they fail to capture the true quantitative variation and relationship between activity of
thousands of gene–protein couples or protein–metabolite couples in a cellular system. To do so, we
need to consider several critical factors, i.e. the time
displacement of the genetic and protein synthetic
and post-translational events, their different timescales and their half-lives. ODEs that have been
widely used to model electronic networks in engineering may play a part in describing regulatory networks of a biological system [51–56]. These ODE
methods were applied to model the regulatory network of Halobacterium salinarum, suggesting that the
model could predict mRNA levels of 2000 out of
a total 2400 genes found in the genome [49]. A
similar application was implemented to model
human regulatory networks (TLR-5–mediated
stimulation of macrophages) and several other microbial networks [54].
Mapping dynamic regulatory networks
It is feasible to derive a dynamic model for mapping
the biochemical pathways of trait formation and
development, and identifying the mechanistic
networks involved in the generation of a high-order
phenotype by integrating the basic principles of functional mapping and systems mapping. Although a
regulatory network usually has a complex structure,
robust mathematical models like differential equations have proven a powerful means for simplifying
and summarizing this complexity. We use a simplified example of gene expression to show how network mapping is developed.
Four processes determine the expression of a gene
into its protein: transcription, translation, mRNA
degradation and protein degradation, as depicted
below [57].
The dynamics of gene expression in a time course
may be described by two ODEs incorporating these
four reactions [57], i.e.
$$
\begin{aligned}
\frac{dM}{dt} &= k_1 - d_1 M \\
\frac{dP}{dt} &= k_2 M - d_2 P
\end{aligned}
\tag{1}
$$
where M and P are the time courses of mRNA and
protein expression, respectively, k1 is the rate of
mRNA transcription from DNA, d1 is the decay
rate of mRNA degradation, k2 is the rate of translation from mRNA to protein, and d2 is the decay rate
of protein degradation. By using this set of ODEs,
we can quantify the dynamic properties of a gene
expression system (1) based on parameters
Θ = (k1, d1; k2, d2).
By integrating ODEs (1) and DNA polymorphic
markers through a mixture model framework, a new
dynamic model, called network mapping, can be
derived. In Box 1, a statistical procedure of deriving
network mapping is described. Network mapping is
equipped to estimate and test ODE parameters for
different QTL genotypes inferred from DNA marker
information based on QTL-marker linkage or linkage disequilibrium [58]. Consider an ODE parameter
set Θ1 = (k11, d11; k12, d12) = (0.2, 0.9; 0.9, 0.25), with
which we plot the dynamic behavior of mRNA
and protein expression (Figure 3A). If a QTL affects
the dynamic system of gene–protein expression (1)
by changing one or more parameters, we
obtain different behavioral dynamics of expression
profiles as shown in Figure 3B with Θ2 =
(k21, d21; k22, d22) = (0.25, 0.9; 0.9, 0.35) and Figure 3C
with Θ3 = (k31, d31; k32, d32) = (0.3, 0.9; 0.9, 0.45). Pronounced differences in curve form show a high sensitivity of gene and protein expression to parameter
changes and, therefore, genotype changes. The genetic control of the correlation dynamics of gene and
protein expression can be visualized by displaying
genotype-specific phase planes (Figure 3D).
Phase-plane analysis reveals the equilibrium points
of the gene–protein expression system, showing the
pattern of how a QTL controls the equilibrium
point. Network mapping embeds a procedure of
testing whether a QTL pleiotropically controls
gene expression and protein expression. Computer
simulation has provided basic information about the
power of QTL detection and the precision of parameter estimation by network mapping given different sample sizes, measurement errors and the number
of time points [30].
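The way a QTL shifts the equilibrium of system (1) can be sketched in a few lines of Python. Because system (1) is linear with constant rates, it has a closed-form solution, which is used here in place of numerical integration. The parameter sets are those quoted above for Figure 3A–C, read in the order (k1, d1, k2, d2); the zero initial conditions, the genotype labels and the function name `expression` are assumptions made for illustration, since the text does not state the initial values used for the figure.

```python
from math import exp

def expression(t, k1, d1, k2, d2, m0=0.0, p0=0.0):
    """Closed-form solution of dM/dt = k1 - d1*M, dP/dt = k2*M - d2*P (needs d1 != d2)."""
    m_eq = k1 / d1                 # mRNA level at which dM/dt = 0
    p_eq = k2 * m_eq / d2          # protein level at which dP/dt = 0
    a = m0 - m_eq
    b = k2 * a / (d2 - d1)
    c = p0 - p_eq - b
    m = m_eq + a * exp(-d1 * t)
    p = p_eq + b * exp(-d1 * t) + c * exp(-d2 * t)
    return m, p

# Parameter sets (k1, d1, k2, d2) quoted in the text for three QTL genotypes.
genotypes = {
    "A": (0.2, 0.9, 0.9, 0.25),
    "B": (0.25, 0.9, 0.9, 0.35),
    "C": (0.3, 0.9, 0.9, 0.45),
}

equilibria = {}
for name, (k1, d1, k2, d2) in genotypes.items():
    equilibria[name] = (k1 / d1, k2 * k1 / (d1 * d2))
    m_late, p_late = expression(50.0, k1, d1, k2, d2)   # a late time point
    print(f"{name}: M* = {equilibria[name][0]:.3f}, P* = {equilibria[name][1]:.3f}, "
          f"M(50) = {m_late:.3f}, P(50) = {p_late:.3f}")
```

Under these assumptions the equilibrium mRNA level rises from genotype A to C while the equilibrium protein level falls, which is the kind of genotype-specific shift a phase-plane plot would display.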
Network mapping described in Box 1 can be readily extended to characterize the effects of
genetic interactions between different QTLs by
Figure 3: Dynamic changes of gene and protein expression in a time course, with the pattern and behavior depending on different QTL genotypes (A, B and C). In (D), a phase plane analysis is conducted for the three genotypes, showing their different patterns of gene–protein expression dynamics.
incorporating multiple QTLs into likelihood (B1). If
there exist an eQTL and a pQTL, co-located at a
given genomic region, which control gene and protein expression dynamics, respectively, in a manner
that is specified by system (1), then network mapping
allows the overall genetic architecture of transcriptional regulation to be elucidated as follows:
From the above graphic presentation, we can address several important questions: (i) does the eQTL
pleiotropically control protein expression as shown
by the dashed line with an arrow? Similarly, does the
pQTL pleiotropically control gene expression?
(ii) do the eQTL and pQTL interact with each
other to determine gene expression, protein expression or both? (iii) is the correlation between gene and
protein expression due to either the pleiotropic
control of these two QTLs or their genetic linkage
in the genome? (iv) can we identify how each QTL
affects different processes, transcription, translation,
mRNA degradation and protein degradation, by
testing individual parameters in Θ = (k1, d1; k2, d2)?
In practice, a biological system comprises multiple genes and multiple proteins that interact with
each other and are co-expressed across a time-space
scale [41, 57]. Box 2 provides such an example of
multiple-gene interactions. The basic procedure of
network mapping given in Box 1 can well be used
to map the regulatory network of this system by
applying or deriving a high-dimensional system of
differential equations with more complex structure
(Box 2). Towards a fuller understanding of the genetic causes and consequences of transcriptional regulation networks that lead to a final phenotype, this
approach allows testing a series of hypotheses about
the pattern of expression of a gene set, a protein set
or a gene–protein co-expression.
COMPREHENDING GENETIC
ARCHITECTURE
Phenotypic traits are usually determined by many
genes, acting with various effects and manners and
Box 1: Statistical models for network mapping
Genetic mapping is based on an experimental or natural population of size n in which segregating individuals are genotyped for DNA markers and phenotyped for gene and protein expression profiles at a series
of T time points. If specific QTLs exist to affect the dynamic system (1), the parameters that specify the
system should be different among QTL genotypes. Genetic mapping uses a mixture model-based likelihood
to estimate QTL genotype-specific parameters. This likelihood is expressed as
$$
L(y_M, y_P) = \prod_{i=1}^{n} \left[ \omega_{1|i} f_1(y_{Mi}, y_{Pi}) + \cdots + \omega_{J|i} f_J(y_{Mi}, y_{Pi}) \right]
\tag{B1}
$$
where $y_{Mi} = (y_{Mi}(t_1), \ldots, y_{Mi}(t_T))$ and $y_{Pi} = (y_{Pi}(t_1), \ldots, y_{Pi}(t_T))$ are the expression profiles of gene and protein
measured at T different time points, $\omega_{j|i}$ is the conditional probability of QTL genotype j (j = 1, ..., J) given
the marker genotype of individual i, $f_j(y_{Mi}, y_{Pi})$ is a multivariate normal distribution with expected mean
vector for individual i that belongs to QTL genotype j,
$$
(u_{Mj|i}; u_{Pj|i}) = \left( u_{Mj|i}(t_1), \ldots, u_{Mj|i}(t_T); u_{Pj|i}(t_1), \ldots, u_{Pj|i}(t_T) \right)
\tag{B2}
$$
and covariance matrix
$$
\Sigma = \begin{pmatrix} \Sigma_M & \Sigma_{MP} \\ \Sigma_{PM} & \Sigma_P \end{pmatrix},
\tag{B3}
$$
with $\Sigma_M$ and $\Sigma_P$ being (T × T) covariance matrices of time-dependent mRNA and protein expression,
respectively, and $\Sigma_{MP} = \Sigma_{PM}$ being a (T × T) covariance matrix between the two variables.
In network mapping, we incorporate ODEs (1) into mixture model (B1) to estimate genotypic means (B2)
specified by ODE parameters for different QTL genotypes, expressed as $(k_{1j}, d_{1j}; k_{2j}, d_{2j})$ for j = 1, ..., J [72,
73]. Since mRNA and protein expression profiles obey dynamic system (1), the derivatives of genotypic
means can be expressed in a similar way. Let $g_{kj|i}(t, u_{kj|i})$ denote the genotypic derivative for variable k (k = M
or P), i.e.
$$
g_{kj|i}(t, u_{kj|i}) = \frac{du_{kj|i}}{dt}.
$$
We use $u_{kj|i}$ to denote the genotypic mean of variable k for individual i belonging to QTL genotype j at an
arbitrary point in a time course. Based on the Runge–Kutta scheme, the value of $u_{kj|i}$ in iteration l + 1 is
determined by the present value plus the weighted average of four deltas (where each delta is the product of
the size of the interval and an estimated slope), expressed as
$$
u^{\{l+1\}}_{kj|i} = u^{\{l\}}_{kj|i} + \frac{1}{6}\left( R_{1kj|i} + 2R_{2kj|i} + 2R_{3kj|i} + R_{4kj|i} \right),
$$
where $R_{1kj|i} = h\, g_{kj|i}\left(t^{\{l\}}, u^{\{l\}}_{kj|i}\right)$ is the delta based on the slope at the beginning of the interval, using $u^{\{l\}}_{kj|i}$;
$R_{2kj|i} = h\, g_{kj|i}\left(t^{\{l\}} + \tfrac{1}{2}h,\; u^{\{l\}}_{kj|i} + \tfrac{1}{2}R_{1kj|i}\right)$ is the delta based on the slope at the midpoint of the interval, using
$u^{\{l\}}_{kj|i} + \tfrac{1}{2}R_{1kj|i}$; $R_{3kj|i} = h\, g_{kj|i}\left(t^{\{l\}} + \tfrac{1}{2}h,\; u^{\{l\}}_{kj|i} + \tfrac{1}{2}R_{2kj|i}\right)$ is again the delta based on the slope at the midpoint, but
now using $u^{\{l\}}_{kj|i} + \tfrac{1}{2}R_{2kj|i}$; and $R_{4kj|i} = h\, g_{kj|i}\left(t^{\{l\}} + h,\; u^{\{l\}}_{kj|i} + R_{3kj|i}\right)$ is the delta based on the slope at the end of
the interval.
The Runge–Kutta fourth order algorithm with step size h = 0.1 is used to approximate the solution with high
accuracy given a trial set of parameter values and initial conditions.
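The fourth-order Runge–Kutta update described above can be sketched in Python as follows. The function name `rk4_step`, the variable names `r1` to `r4` for the four deltas, the rate values and the initial condition M(0) = 0 are assumptions made for illustration; only the step size h = 0.1 comes from the text. The sketch integrates the mRNA equation of system (1) and compares the result with its analytic solution.

```python
from math import exp

def rk4_step(g, t, u, h):
    """Advance u by one step of size h for du/dt = g(t, u)."""
    r1 = h * g(t, u)                       # delta from the slope at the start
    r2 = h * g(t + h / 2, u + r1 / 2)      # delta from the midpoint slope, using r1
    r3 = h * g(t + h / 2, u + r2 / 2)      # delta from the midpoint slope, using r2
    r4 = h * g(t + h, u + r3)              # delta from the slope at the end
    return u + (r1 + 2 * r2 + 2 * r3 + r4) / 6

k1, d1 = 0.2, 0.9                          # illustrative transcription and decay rates

def mrna_rate(t, m):
    """Right-hand side of dM/dt = k1 - d1 * M."""
    return k1 - d1 * m

h, t, m = 0.1, 0.0, 0.0                    # step size from the text; M(0) = 0 assumed
for _ in range(50):                        # 50 steps of 0.1 reach t = 5
    m = rk4_step(mrna_rate, t, m, h)
    t += h

exact = (k1 / d1) * (1 - exp(-d1 * 5.0))   # analytic solution for M(0) = 0
print(f"RK4: {m:.6f}  exact: {exact:.6f}")
```

In network mapping the same update would be applied to each genotypic mean curve for every trial set of ODE parameters, so that the likelihood can be evaluated without a closed-form solution.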
Next, we need to model the covariance structure by using a parsimonious and flexible approach such as an
autoregressive, antedependence, autoregressive moving average or nonparametric and semiparametric
approaches [74]. In likelihood (B1), the conditional probabilities of QTL genotypes given marker genotypes
can be expressed as a function of recombination fractions for an experimental cross population or linkage
disequilibria for a natural population [58]. The estimation of the recombination fractions or linkage disequilibria can be implemented with the EM algorithm.
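The structure of the mixture likelihood (B1) can be made concrete with a small Python sketch. It departs from the model in this box in one loudly stated way: the residuals are assumed independent with a common variance `sigma2`, instead of following a structured covariance matrix, so that only the standard library is needed. All data values, the function names and the genotype probabilities are hypothetical.

```python
from math import exp, log, pi

def log_normal_iid(y, mu, sigma2):
    """Log density of y under independent normals with means mu and variance sigma2."""
    return sum(-0.5 * log(2 * pi * sigma2) - (yi - mi) ** 2 / (2 * sigma2)
               for yi, mi in zip(y, mu))

def mixture_loglik(profiles, omega, means, sigma2):
    """Log of a (B1)-type mixture likelihood under the independence assumption."""
    total = 0.0
    for y_i, w_i in zip(profiles, omega):
        # log(probability of genotype j * density of y_i under genotype j)
        logs = [log(w) + log_normal_iid(y_i, mu_j, sigma2)
                for w, mu_j in zip(w_i, means) if w > 0]
        top = max(logs)                    # log-sum-exp for numerical stability
        total += top + log(sum(exp(v - top) for v in logs))
    return total

# Hypothetical genotypic mean curves at T = 3 time points for J = 2 genotypes.
means = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
# Hypothetical expression profiles of n = 2 individuals.
profiles = [[1.1, 2.1, 2.9], [2.2, 3.9, 6.1]]
# Hypothetical conditional genotype probabilities given each individual's markers.
omega = [[0.9, 0.1], [0.1, 0.9]]

matched = mixture_loglik(profiles, omega, means, sigma2=0.25)
mismatched = mixture_loglik(profiles, omega[::-1], means, sigma2=0.25)
print(f"log-likelihood: matched = {matched:.3f}, mismatched = {mismatched:.3f}")
```

In a full implementation the mean curves would come from solving the ODEs for each genotype's parameters, and the likelihood would be maximized over those parameters, the covariance parameters and the QTL position.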
Box 2: Kinetic analysis and mapping of gene–gene interactions
In a regulatory network, it is common that different genes interact and coordinate to determine an intermediate step towards phenotypic formation. Network mapping allows gene–gene interactions to be tested
in a quantitative way. Consider three genes that operate in a network [41] depicted graphically as below:
Gene 1 is constitutively expressed, and is repressed by gene 3. Therefore, its level may reach a maximal rate
of increase (k1s where s stands for synthesis) when the level of gene 3 is 0, in which case k1s will be
multiplied by 1. When the level of gene 3 is non-zero, the level of gene 1 rises more slowly than k1s.
Transcription of gene 2 is activated by gene 1, with the expression level of gene 2 rising as a Michaelis–
Menten function of the level of gene 1. Similarly, transcription of gene 3 is activated when both gene 1 and
gene 2 levels are non-zero.
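The regulatory relations of the three genes can be described by a triple of ODEs [41], expressed as
$$
\begin{aligned}
\frac{dM_1}{dt} &= k_{1s}\,\frac{1}{1 + k_{13} M_3} - k_{1d} M_1 \\
\frac{dM_2}{dt} &= k_{2s}\,\frac{k_{21} M_1}{1 + k_{21} M_1} - k_{2d} M_2 \\
\frac{dM_3}{dt} &= k_{3s}\,\frac{k_{31} M_1\, k_{32} M_2}{(1 + k_{31} M_1)(1 + k_{32} M_2)} - k_{3d} M_3
\end{aligned}
\tag{B4}
$$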
where degradation is modeled as a first-order reaction with rate constants k1d, k2d and k3d for the three genes,
respectively. This formulation assumes that every transcript is immediately translated, and therefore the
synthesis constants k1s, k2s and k3s refer to both transcription and translation.
Each equation in system (B4) expresses the change in the level of a gene as the difference between its synthesis and
degradation through a set of 10 ODE parameters (k1s, k2s, k3s, k1d, k2d, k3d, k21, k31, k32, k13). By testing these
parameters singly or in combination, we can determine how an eQTL affects the pattern of co-expression
of different genes. For example, the test of how genes 1 and 2 are co-expressed is based on the null
hypothesis H0: k21j ≡ k21 (for j = 1, ..., J).
To show the effect of eQTLs on the dynamic system of gene co-expression, we assume three groups of
parameters (k1s, k2s, k3s, k1d, k2d, k3d, k21, k31, k32, k13) = (2, 2, 15, 1, 1, 1, 1, 1, 1, 100) for QTL genotype 1,
(2, 2, 15, 0.8, 0.8, 0.8, 1, 1, 1, 100) for QTL genotype 2, and (2, 2, 15, 1, 1, 1, 0.8, 0.8, 0.8, 100) for QTL genotype
3, which produce different patterns of trajectories of gene expression in a time course as shown below:
Network mapping can be extended to map eQTLs and their interactions for more than three genes
expressed on a time scale. A particular mathematical treatment is needed to assure the stability of the
estimates of parameters for a high-dimensional system of ODEs.
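The genotype-specific trajectories of system (B4) can be approximated numerically. The sketch below integrates (B4) with the fourth-order Runge–Kutta scheme for the three parameter sets assumed in this box. It is an illustration only: the initial levels of zero, the 100 steps of size 0.1 and all function and variable names are assumptions that do not come from the text.

```python
import math

PARAM_NAMES = ("k1s", "k2s", "k3s", "k1d", "k2d", "k3d", "k21", "k31", "k32", "k13")

# Parameter sets assumed in the box for the three QTL genotypes.
GENOTYPES = {
    1: (2, 2, 15, 1, 1, 1, 1, 1, 1, 100),
    2: (2, 2, 15, 0.8, 0.8, 0.8, 1, 1, 1, 100),
    3: (2, 2, 15, 1, 1, 1, 0.8, 0.8, 0.8, 100),
}


def b4_rhs(m, p):
    """Right-hand side of system (B4): synthesis minus first-order degradation."""
    m1, m2, m3 = m
    dm1 = p["k1s"] / (1 + p["k13"] * m3) - p["k1d"] * m1
    dm2 = p["k2s"] * p["k21"] * m1 / (1 + p["k21"] * m1) - p["k2d"] * m2
    dm3 = (p["k3s"] * p["k31"] * m1 * p["k32"] * m2
           / ((1 + p["k31"] * m1) * (1 + p["k32"] * m2))
           - p["k3d"] * m3)
    return [dm1, dm2, dm3]


def rk4_step(m, p, h):
    """One fourth-order Runge-Kutta step for the autonomous system (B4)."""
    def shifted(r, c):
        return [mi + c * ri for mi, ri in zip(m, r)]

    r1 = [h * v for v in b4_rhs(m, p)]
    r2 = [h * v for v in b4_rhs(shifted(r1, 0.5), p)]
    r3 = [h * v for v in b4_rhs(shifted(r2, 0.5), p)]
    r4 = [h * v for v in b4_rhs(shifted(r3, 1.0), p)]
    return [mi + (a + 2 * b + 2 * c + d) / 6
            for mi, a, b, c, d in zip(m, r1, r2, r3, r4)]


def simulate(values, m0=(0.0, 0.0, 0.0), h=0.1, n_steps=100):
    """Return the list of states [M1, M2, M3]; m0, h and n_steps are assumptions."""
    p = dict(zip(PARAM_NAMES, values))
    traj = [list(m0)]
    for _ in range(n_steps):
        traj.append(rk4_step(traj[-1], p, h))
    return traj


trajectories = {j: simulate(v) for j, v in GENOTYPES.items()}
for j, traj in trajectories.items():
    print("QTL genotype", j, [round(x, 3) for x in traj[-1]])
```

Comparing the three trajectories shows how a change in the degradation constants (genotype 2) or in the activation constants (genotype 3) alters the time course of all three genes, which is the kind of difference the hypothesis tests on the ODE parameters are designed to detect.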
interacting with one another, and also with a capacity to adjust genetic expression in response to environmental and developmental signals [14]. These
activities of genes accumulate to form a complex
network of actions and interactions. The genetic
architecture of a complex trait describes the structure
and dynamics of this network. The more gene activities are characterized, the more comprehensively the genetic architecture is elucidated. Network mapping can
incorporate all the activities that constitute genetic architecture.
In recent years, new sources of genetic variation
have been recognized and studied, including single
nucleotide polymorphisms (SNPs), insertions or
deletions (indels) and copy number variants [59].
With the increasing availability of these data,
genome-wide association studies (GWAS), which aim
to identify a complete set of genes affecting a complex phenotype or disease, have become one of the
most important tools in genetic research [59].
Incorporating network mapping into GWAS
can greatly renew our understanding of how and
where a phenotype originates.
Another important source, genomic imprinting,
has been increasingly studied. This phenomenon results from epigenetic marks, which cause different patterns
of expression of certain genes depending on their
parental origin [60–62]. These so-called imprinted
genes violate classical Mendelian inheritance:
they are expressed either only from the allele of
the mother, such as H19 or CDKN1C [63, 64], or only
from the allele of the father, such as IGF-2 [65].
From a quantitative genetic perspective, genomic
imprinting may provide organisms with evolutionary advantages by activating additional genetic variation and conferring a fitness benefit when the
environment changes [66, 67]. Currently, different
forms of genomic imprinting have been detected in a
variety of species and are thought to play an important
role in regulating crucial aspects of embryonic
growth and development as well as pathogenesis
[61]. Recent bioinformatic analyses suggest that the
number of imprinted genes may be higher than
previously thought, although this remains to be
demonstrated experimentally [62].
Genomic imprinting can also be incorporated into
network mapping through a family design in which
both parents and their progeny are genotyped while
the phenotype is measured on the progeny. Li et al.
[68] proposed a sampling strategy to estimate genetic
imprinting expressed at the individual gene level by
sampling nuclear families at random from a natural population. Using reciprocal crosses, Wang
et al. [69, 70] formulated a model to compute different types of imprinting effects and their interactions
with other genetic effects; this model allows a test
of whether imprinting effects are reprogrammed in
the process of embryonic development or transmitted to the next generation. These designs can be
readily incorporated into network mapping,
facilitating the complete elucidation of the genetic
control of traits.
CONCLUDING REMARKS
One of the major challenges in systems biology, as far
as modeling is concerned, is the construction of
models capable of integrating processes across contrasting scales of time and space. The suite of mapping approaches that includes functional mapping,
systems mapping and network mapping offers the
opportunity to meet this challenge, because these approaches complement one another to provide comprehensive coverage
of the genotype–phenotype map. Functional mapping and systems mapping make it possible to detect
QTLs and their epistatic interactions, which are
responsible for phenotypic variation in trait formation and development. These two strategies provide
a dynamic view of the genotype–phenotype connection, but do not reveal the regulatory mechanisms
behind this connection. By implementing network
biology into systems mapping, network mapping
promises to connect the phenotype to the interacting
web of genetic regulatory networks at the transcriptional, translational and metabolic levels (Figure 2),
which may lead to breakthroughs in understanding biological signals. Network mapping offers
unprecedented resolution of fundamental biological questions about the interplay between genetic
actions/interactions and the origin, properties and
function of life defined as a dynamic system.
The uniqueness and novelty of network mapping
rest on its ability to integrate genetics, genomics,
proteomics and metabolomics by mathematical
equations to model the genetic, biochemical and
dynamic mechanisms of trait formation. The capacity
of differential equations to handle complex systems
makes them uniquely suitable for the identification
of key structural features of the system, quantification
of functional correlations of nodes and pathways and
for providing fundamental insights into the mechanisms by which network structure determines
network dynamics. Network mapping is flexible in
incorporating various types of differential equations
to map eQTLs, pQTLs and mQTLs and their genetic interactions across scales of time and space. Network
mapping thus has the capacity to unravel the
mystery of the black box that constructs the genotype–phenotype map.
Several issues, including the analysis of rare variants
and of RNA-seq data generated by next-generation
sequencing technologies, should be addressed in
order to make network mapping a broadly useful
tool. Furthermore, although network mapping was
introduced in the context of plant genetics, it
also has implications for human genetics.
Barabási et al. [71] have discussed a new network-based medicine approach for studying human complex diseases. This approach can be integrated with
our network mapping to better elucidate the genetic
architecture and landscape of complex human
diseases.
The application of network mapping relies critically upon differential equations and their mathematical, statistical and computational solutions. To
capture and understand the structure, organization
and function of any biological system, even a single
cell, we need to build sophisticated differential equations that can reflect the emergent properties and dimensions of the system. Almost all such equations
useful to biology and biomedicine are likely to be
high-dimensional, nonlinear, stochastic and multifactorial. Just as deriving these biologically meaningful
equations entails the participation of biologists,
solving them requires in-depth technical help from mathematicians, statisticians and
computational scientists. Thus, to make network
mapping a practical tool, synergistic collaboration
across multiple disciplines is crucial.
Although many funding agencies have sensed the
potential of integrating mathematics and biology,
they have yet to realize the value of initiating a
more comprehensive experiment based entirely
on well-justified statistical designs, such as network
mapping. In the short run, biologists and mathematicians may work independently to solve in depth the questions and problems specific to their own
areas. In the long run, however, because of its quantitative nature, network mapping, which bridges mathematics and biology, merits investment; after all, it
provides a new vision for precision biology and precision medicine that cannot be achieved by any statistically unwarranted design.
Key Points
• A phenotype is genetically complex in terms of its underlying genetic architecture involving many genes that display a web of interactions with other genes and with environmental factors.
• The formation of any phenotype undergoes a series of developmental events and biological alterations that lead to cell growth, differentiation and morphogenesis.
• DNA sequences determine variation in a phenotype by perturbing transcripts, metabolites and proteins that construct transcriptional and regulatory networks.
• A comprehensive picture of the genetic landscape of a phenotype can now be elucidated through dissecting it into its underlying genetic, developmental and regulatory components using robust mathematical models.
FUNDING
This work is partially supported by NSF/IOS-0923975, NIH/UL1RR0330184, the
Changjiang Scholars Award and ‘Thousand-person Plan’ Award.
References
1. Frazer KA, Murray SS, Schork NJ, et al. Human genetic
variation and its contribution to complex traits. Nat Rev
Genet 2009;10:241–51.
2. Lin WD, Liao YY, Yang TJ, et al. Coexpression-based clustering of Arabidopsis root genes predicts functional modules
in early phosphate deficiency signaling. Plant Physiol 2011;
155:1383–402.
3. Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics
1989;121:185–99.
4. Mackay TF, Stone EA, Ayroles JF. The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 2009;
10:565–77.
5. Miura K, Ashikari M, Matsuoka M. The role of QTLs in
the breeding of high-yielding rice. Trends Plant Sci 2012;16:
319–26.
6. Lu AT, Yoon J, Geschwind DH, et al. QTL replication and
targeted association highlight the nerve growth factor gene
for nonverbal communication deficits in autism spectrum
disorders. Mol Psychiatry 2011 Nov 22. doi: 10.1038/
mp.2011.155. [Epub ahead of print].
7. Zeng Z-B. Theoretical basis of separation of multiple linked
gene effects on mapping quantitative trait loci. Proc Natl Acad
Sci USA 1993;90:10972–6.
8. Jansen RC, Stam P. High resolution of quantitative traits
into multiple loci via interval mapping. Genetics 1994;136:
1447–55.
9. Kao CH, Zeng ZB, Teasdale RD. Multiple interval mapping for quantitative trait loci. Genetics 1999;152:1203–16.
10. Jannink J-L, Jansen RC. Mapping epistatic QTL with
one-dimensional genome searches. Genetics 2001;157:
445–54.
11. Xu S. Estimating polygenic effects using markers of the
entire genome. Genetics 2003;163:789–801.
12. Wu RL, Lin M. Functional mapping – how to map and
study the genetic architecture of dynamic complex traits.
Nat Rev Genet 2006;7:229–37.
13. Wang Z, Pang XM, Lv YF, et al. A dynamic framework for
quantifying the genetic architecture of phenotypic plasticity.
Brief Bioinform 2013;14:82–95.
14. Lynch M, Walsh B. Genetics and Analysis of Quantitative Traits.
Sunderland, MA: Sinauer, 1998.
15. West GB, Brown JH, Enquist BJ. A general model for
ontogenetic growth. Nature 2001;413:628–31.
16. Le Goff L, Lecuit T. Gradient scaling and growth. Science
2011;331:1141–2.
17. Wu S, Yap JS, Li Y, et al. Network models for dissecting
plant development by functional mapping. Curr Bioinform
2009;4:183–7.
18. Nadeau JH, Dudley AM. Systems genetics. Science 2011;
331:1015–6.
19. Dermitzakis ET. Cellular genomics for complex traits. Nat
Rev Genet 2012;13:215–20.
20. Ma CX, Casella G, Wu RL. Functional mapping
of quantitative trait loci underlying the character
process: A theoretical framework. Genetics 2002;161:
1751–62.
21. Wu RL, Wang ZH, Zhao W, et al. A mechanistic model for
genetic machinery of ontogenetic growth. Genetics 2004;
168:2383–94.
22. He QL, Berg A, Li Y, et al. Modeling genes for plant structure, development and evolution: Functional mapping
meets plant ontology. Trends Genet 2010;26:39–46.
23. Li Y, Wu RL. Functional mapping of growth and development. Biol Rev 2010;85:207–16.
24. Yap JS, Wang CG, Wu RL. A computational approach
for functional mapping of quantitative trait loci that
regulate thermal performance curves. PLoS One 2007;2(6):
e554.
25. Zhao W, Zhu J, Gallo-Meagher M, et al. A unified statistical
model for functional mapping of genotype environment
interactions for ontogenetic development. Genetics 2004;
168:1751–62.
26. Zhao W, Ma CX, Cheverud JM, et al. A unifying statistical model for QTL mapping of genotype-sex interaction for developmental trajectories. Physiol Genom 2004;
19:218–27.
27. Nicholson JK, Holmes E, Lindon JC, et al. The challenges of
modeling mammalian biocomplexity. Nat Biotechnol 2004;22:
1268–74.
28. Lander AD. Pattern, growth, and control. Cell 2011;144:
955–69.
29. Wu RL, Cao JG, Huang ZW, et al. Systems mapping: How
to improve the genetic mapping of complex traits through
design principles of biological systems. BMC Syst Biol 2011;
5:84.
30. Niklas KJ, Enquist BJ. On the vegetative biomass partitioning of seed plant leaves, stems, and roots. Am Nat 2002;159:
482–97.
31. Chen J, Reynolds J. A coordination model of carbon
allocation in relation to water supply. Ann Bot 1997;80:
45–55.
32. West GB, Brown JH, Enquist BJ. A general model for the
origin of allometric scaling laws in biology. Science 1997;
276:122–6.
33. West GB, Brown JH, Enquist BJ. The fourth dimension of
life: Fractal geometry and allometric scaling of organisms.
Science 1999;284:1677–9.
34. Poorter H, Niklas KJ, Reich PB, et al. Biomass allocation to
leaves, stems and roots: meta-analyses of interspecific
variation and environmental control. New Phytol 2012;193:
30–50.
35. Niklas KJ, Enquist BJ. An allometric model for seed plant
reproduction. Evol Ecol Res 2003;5:79–88.
36. Niklas KJ, Enquist BJ. Canonical rules for plant organ biomass partitioning and annual allocation. Am J Bot 2004;89:
812–9.
37. Niklas KJ, Enquist BJ. Biomass Allocation and Growth Data of
Seeded Plants. Data set. Oak Ridge, TN: Oak Ridge National
Laboratory Distributed Active Archive Center, 2004.
38. Klingenberg CP. Morphological integration and developmental modularity. Annu Rev Ecol Evol Syst 2008;39:115–32.
39. Ideker T, Galitski T, Hood L. A new approach to decoding
life: systems biology. Annu Rev Genomics Hum Genet 2001;2:
343–72.
40. de Hoog CL, Mann M. Proteomics. Annu Rev Genomics
Hum Genet 2004;5:267–93.
41. Karlebach G, Shamir R. Modelling and analysis of gene
regulatory networks. Nat Rev Mol Cell Biol 2008;9:770–80.
42. Chong L, Ray LB. Whole-istic biology. Science 2002;295:
1661.
43. Boyes DC, Zayed AM, Ascenzi R, et al. Growth stage-based
phenotypic analysis of Arabidopsis: a model for high
throughput functional genomics in plants. Plant Cell 2001;
13:1499–510.
44. Ideker T, Thorsson V, Ranish JA, et al. Integrated genomic
and proteomic analyses of a systematically perturbed metabolic network. Science 2001;292:929–34.
45. Kitano H. Systems biology: a brief overview. Science 2002;
295:1662–4.
46. Jansen RC, Nap JP. Genetical genomics: the added value
from segregation. Trends Genet 2001;17:388–91.
47. Rockman MV, Kruglyak L. Genetics of global gene expression. Nat Rev Genet 2006;7:862–72.
48. Cookson W, Liang L, Abecasis G, et al. Mapping complex
disease traits with global gene expression. Nat Rev Genet
2009;10:184–94.
49. Bonneau R. Learning biological networks: from modules to
dynamics. Nat Chem Biol 2008;4:658–64.
50. Nicholson JK, Holmes E, Lindon JC, et al. The challenges of
modeling mammalian biocomplexity. Nat Biotechnol 2004;
22:1268–74.
51. Gardner TS, di Bernardo D, Lorenz D, et al. Inferring genetic networks and identifying compound mode of action via
expression profiling. Science 2003;301:102–5.
52. Tegner J, Yeung MK, Hasty J, et al. Reverse engineering
gene networks: integrating genetic perturbations with
dynamical modeling. Proc Natl Acad Sci USA 2003;100:
5944–9.
53. Yeung MK, Tegner J, Collins JJ. Reverse engineering gene
networks using singular value decomposition and robust
regression. Proc Natl Acad Sci USA 2002;99:6163–8.
54. Gilchrist M, Thorsson V, Li B, et al. Systems biology
approaches identify ATF3 as a negative regulator of
Toll-like receptor 4. Nature 2006;441:173–8.
55. Bonneau R, Reiss DJ, Shannon P, et al. The Inferelator: an
algorithm for learning parsimonious regulatory networks
from systems-biology data sets de novo. Genome Biol 2006;
7:R36.
56. Bonneau R, Facciotti MT, Reiss DJ, et al. A predictive
model for transcriptional control of physiology in a free
living cell. Cell 2007;131:1354–65.
57. Legewie S, Herzel H, Westerhoff HV, et al. Recurrent
design patterns in the feedback regulation of the mammalian
signalling network. Mol Syst Biol 2008;4:190.
58. Wu RL, Ma CX, Casella G. Statistical Genetics of Quantitative
Traits: Linkage, Maps, and QTL. New York: Springer, 2007.
59. Morrell PL, Buckler ES, Ross-Ibarra J. Crop genomics:
advances and applications. Nat Rev Genet 2012;13:85–96.
60. Reik W. The Wellcome Prize Lecture. Genetic imprinting:
the battle of the sexes rages on. Exp Physiol 1996;81:161–72.
61. Reik W, Dean W, Walter J. Epigenetic reprogramming in
mammalian development. Science 2001;293:1089–93.
62. Luedi PP, Dietrich FS, Weidman JR, et al. Computational
and experimental identification of novel human imprinted
genes. Genome Res 2007;17:1723–30.
63. Leibovitch MP, Nguyen VC, Gross MS, et al. The human
ASM (adult skeletal muscle) gene: expression and chromosomal assignment to 11p15. Biochem Biophys Res Commun
1991;180:1241–50.
64. Matsuoka S, Edwards MC, Bai C, et al. p57KIP2, a structurally distinct member of the p21CIP1 Cdk inhibitor
family, is a candidate tumor suppressor gene. Genes Dev
1995;9:650–62.
65. O’Dell SD, Day IN. Insulin-like growth factor II (IGF-II).
Int J Biochem Cell Biol 1998;30:767–71.
66. Moore T, Haig D. Genomic imprinting in mammalian development: a parental tug-of-war. Trends Genet 1991;7:
45–9.
67. Wilkins JF, Haig D. What good is genomic imprinting: the
function of parent-specific gene expression. Nat Rev Genet
2003;4:359–68.
68. Li Y, Guo YQ, Hou W, et al. A statistical design for testing
transgenerational genomic imprinting in natural human
populations. PLoS One 2011;6(2):e16858.
69. Wang CG, Wang Z, Luo JT, et al. A model for transgenerational imprinting variation in complex traits. PLoS One
2010;5(7):e11396.
70. Wang CG, Wang Z, Prows DR, et al. A computational
framework for the inheritance of genomic imprinting for
complex traits. Brief Bioinform 2012;13:34–45.
71. Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a
network-based approach to human disease. Nat Rev Genet
2011;12:56–68.
72. Fu GF, Luo J, Berg A, et al. A dynamic model for
functional mapping of biological rhythms. J Biol Dyn
2010;4:1–10.
73. Luo JT, Hager WW, Wu RL. A differential equation model
for functional mapping of a virus-cell dynamic system. J
Math Biol 2010;65:1–15.
74. Yap J, Fan J, Wu RL. Nonparametric modeling of covariance structure in functional mapping of quantitative trait
loci. Biometrics 2009;65:1068–77.