B RIEFINGS IN BIOINF ORMATICS . VOL 15. NO 1. 30 ^ 42 Advance Access published on 27 August 2012 doi:10.1093/bib/bbs049 Towards a comprehensive picture of the genetic landscape of complex traits Zhong Wang, Yaqun Wang, Ningtao Wang, Jianxin Wang, Zuoheng Wang, C. Eduardo Vallejos and Rongling Wu Submitted: 21st May 2012; Received (in revised form) : 9th July 2012 Abstract The formation of phenotypic traits, such as biomass production, tumor volume and viral abundance, undergoes a complex process in which interactions between genes and developmental stimuli take place at each level of biological organization from cells to organisms. Traditional studies emphasize the impact of genes by directly linking DNA-based markers with static phenotypic values. Functional mapping, derived to detect genes that control developmental processes using growth equations, has proven powerful for addressing questions about the roles of genes in development. By treating phenotypic formation as a cohesive system using differential equations, a different approachçsystems mappingçdissects the system into interconnected elements and then map genes that determine a web of interactions among these elements, facilitating our understanding of the genetic machineries for phenotypic development. Here, we argue that genetic mapping can play a more important role in studying the genotype ^ phenotype relationship by filling the gaps in the biochemical and regulatory process from DNA to end-point phenotype. We describe a new framework, named network mapping, to study the genetic architecture of complex traits by integrating the regulatory networks that cause a high-order phenotype. Network mapping makes use of a system of differential equations to quantify the rule by which transcriptional, proteomic and metabolomic components interact with each other to organize into a functional whole. The synthesis of functional mapping, systems mapping and network mapping provides a novel avenue to decipher a comprehensive picture of the genetic landscape of complex phenotypes that underlie economically and biomedically important traits. Keywords: network mappin; complex traits; differential equations; DNA polymorphism; systems biology GENETIC MAPPING One of the major tasks in modern biology is to map and characterize the genetic architecture of complex traits and use this knowledge to predict the response of biological structure, organization and function to changing environment [1, 2]. One important challenge in achieving this task is posed by the multifactorial and architectural complexity of biological traits. To infer and establish the genotype–phenotype relationship at the level of individual genes, known Corresponding author. Rongling Wu, Center for Statistical Genetics, The Pennsylvania State University, Hershey, PA 17033, USA. Tel: þ717-531-2037; Fax: þ717-531-0480; E-mail: [email protected] ZhongWang is a computer scientist and statistical geneticist in the Center for Computational Biology at Beijing Forestry University. He also works in the Center for Statistical Genetics at The Pennsylvania State University. He develops computer algorithms for statistical genetic models. YaqunWang is a PhD student in statistics at The Pennsylvania State University. The focus of his thesis is the dynamic modeling of Bayesian network. NingtaoWang is a PhD student in statistics at The Pennsylvania State University. The focus of his thesis is the modeling and dissection of complex phenotypes using differential geometry. JianxinWang is an Associate Professor in information science at Beijing Forestry University, interested in algorithmic development for complex systems. Zuoheng Wang is an Assistant Professor in biostatistics at Yale University. Her research interest is to develop statistical models for genetic mapping and genome-wide association studies. C. E.Vallejos is an Associate Professor in plant genetics at the University of Florida. His research interest centers on the molecular genetics of disease resistance in crops, such as Phaseolus vulgaris. Rongling Wu is a Professor in biostatistics and bioinformatics and the Director of the Center for Statistical Genetics at The Pennsylvania State University. He founded the Center for Computational Biology at Beijing Forestry University. He is interested in quantitative and statistical genetics. ß The Author 2012. Published by Oxford University Press. For Permissions, please email: [email protected] Genetic landscape of complex traits as quantitative trait loci (QTLs), genetic mapping pioneered by Lander and Botstein [3] has been used in a diversity of traits and organisms [4–6]. The past two decades have seen an explosion of interest in developing various statistical approaches for genetic mapping of complex phenotypes [7–13]. The defining principle of QTL mapping is based on the identification of significant associations between DNA-based polymorphic markers and phenotypic variation in order to link the genotype to the phenotype. This approach allows us to dissect a complex phenotypic trait into its underlying DNA variants distributed in various regions of the genome. For example, the difference in stem length between two plants may be attributed to their different DNA genotypes at particular loci (Figure 1). Quantitative genetics has gained revolutionary insights due to the widespread use of gene mapping [14]. In fact, this newly acquired knowledge has been successfully applied in breeding programs [5] and medicine [6]. 31 Although useful to varying extents, QTL analyses based on simple statistical tests to identify the direct relationship between genotype and end-point phenotype do not reveal a complete picture of the genetic architecture of complex traits. Focusing on end-point phenotypes leaves out of the analysis all the developmental and genetic factors that shape the phenotype. For instance, consider the acquisition of certain height or weight over different lengths of time. The same end-point phenotype could be attained through different developmental trajectories each controlled by a different set of genes. This example underscores the limitations of bulking similar end-point phenotypes for standard QTL analysis. It is widely recognized that almost all organ traits undergo a developmental process during which their level of complexity increases [15]. Thus, development is a dynamic process in which genes play a critical role in regulating the pattern of trait development through their interactions with morphogens [16]. In addition, DNA variants can affect phenotype formation by altering to some degree biochemical pathways and regulatory networks directly involved in the shaping of a particular phenotype [17–19]. The developmental pattern of genetic control can now be documented at both the spatial and temporal levels by two complementary and potentially synergistic approaches: global analysis of gene expression, and functional and systems mapping; the latter are two statistical models for mapping dynamic traits. In this Opinion article, we present a new conceptual framework of mapping complex phenotypes by integrating knowledge of how genetic information is integrated, coordinated and, ultimately transmitted through regulatory networks to enable biological functions that shape high-order traits. We will also argue how a comprehensive genotype–phenotype map can be charted through the merger of this framework and functional and systems mapping. FUNCTIONAL MAPPING Figure 1: Traditional genetic mapping that links DNA variation (AT versus CG) in a region of the plant genome to phenotypic variation in static plant traits, such as leaf display, through a statistical test. This approach simply considers the biochemical pathways from DNA to the phenotype as a black box, unable to address the biological complexity of phenotypic formation and development. Conventional QTL mapping experiments require three basic steps: (i) genotyping a segregating population with a relatively large number of DNA-based markers, (ii) phenotyping the populations at a specific point in time, usually the end-point (final size, weight, etc.), and (iii) identifying significant associations between the genotype and the phenotype. This approach, however, has a major limitation because it fails to analyze the genetic components 32 Wang et al. of trait development that lead to phenotypic variation over time. This limitation can be overcome by functional mapping [12, 20–23], a new QTL mapping approach that takes into account the dynamic nature of trait development. It accomplishes this task through a statistical analysis of the mathematical function that best represents the growth or developmental trajectory of a segregating trait using parametric or nonparametric models [21] (Figure 2). Thus, functional mapping has the ability to identify QTL, which may play critical roles in trait development at a specific period in time. Overall, functional mapping displays increased statistical power and stability by reducing the number of parameters to be estimated. From a statistical standpoint, functional mapping strives to jointly model mean-covariance structures for a longitudinal trait, i.e. a trait measured repeatedly over a period of time. However, in contrast to the general treatment of longitudinal problems, functional mapping integrates parameter estimation and the testing process within a mixture-based framework in which each mixture component is assigned a biological rationale. Unlike traditional mapping models that normally do not consider biological processes, functional mapping yields biologically relevant genetic information because it embeds the mathematical aspects of biological processes (e.g. growth equations) within the estimation procedure of QTL parameters. Thus, results obtained by functional mapping provide a mechanistic and realistic perspective of biological systems. Specifically, functional mapping has been shown to address the following fundamental biological questions: (i) How does a specific QTL affect the pattern of development? Functional mapping can determine the timing at which the QTL is switched on and Figure 2: Biological networks that are involved in trait formation and development. Genetic mapping is moving from a direct genotype ^ phenotype correspondence (Figure 1) to regulatory networks that underlies the formation of a phenotypic trait. Functional mapping associates DNA variants with the developmental process (early, middle and late) of a trait, such as plant height, facilitating the elucidation of the developmental and genetic basis of trait variation. Systems mapping extends the dynamic idea of functional mapping to dissect the phenotype into its interrelating components based on design principles and then map QTLs that determine each of the components and component^ component interactions and coordination. In this example, systems mapping dissects plant height into its leaf and root components over a time scale and study how leaf and root traits coordinate and compete with each other to affect height growth. Now, network mapping combines the advantages of functional mapping and systems mapping to identify eQTLs (for gene expression), pQTLs (for protein expression) and mQTLs (for metabolic expression) and their genetic interactions that control regulatory networks that cause the final phenotype. Network mapping entails the integrative and innovative use of different tools from diverse disciplines, such as genetics, mathematics, development, statistics, computer science and engineering. Genetic landscape of complex traits off and the period of how long the QTL is at work [21]. (ii) Are there any distinct temporal patterns of QTL effects in a time course? When used to study the genetic control of thermal performance, for example, functional mapping can identify whether hotter–colder, faster–slower or generalist–specialist QTLs exist to control thermal performance curves of growth rates [24]. (iii) How do developmental and environmental stimuli impact on theexpression ofa QTL? Genes affect phenotypic development through internal (developmental) and external environments. Functional mapping can characterize how different environmental stimuli activate the expression of new QTLs and guide the direction of their expression [25, 26]. (iv) What is the overall picture of the genetic architecture of dynamic traits? Functional mapping can test whether pleiotropic QTLs contribute to genetic correlations between vegetative growth and reproductive behavior [22]. In particular, it can also test how different QTLs interact to form a dynamic web of epistatic network to regulate growth and development. SYSTEMS MAPPING Functional mapping is a dynamic model for mapping QTLs because it treats trait development as a process. Indeed, any complex trait is formed in a dynamic system that consists of various interacting parts or gene networks, each with different levels of complexity and defined spatial and temporal patterns of expression. The behavior and outcome of this system, i.e. the resulting final phenotype, can be changed when one or more key components of the system are replaced by a variant [27]. However, to understand the basis of genotype-dependent changes of the phenotype, we need to understand how different system parts are coordinated and organized, as well as the genetic mechanisms that underlie their coordination [28]. A fundamental principle and approach for mapping QTL associated with a complex dynamic system has been formulated by linking different parts of the system with robust mathematical equations and integrating these equations into a statistical framework for genetic mapping. This approach, called systems mapping [29], deploys a system of differential equations to describe and quantify the dynamic behavior of a 33 biological system and manipulate its high complexity (Figure 2). Systems mapping allows for the quantitative testing of biologically interesting hypotheses about the genetic mechanisms that control the interactions patterns among integral parts of the system and the emerging properties of the whole system. During plant growth, biomass allocation to different organs (stem, branches, leaves, seeds, coarse roots and fine roots) tends to produce an optimal architecture that maximizes the capture of nutrients, light, water and carbon dioxide through a specific developmental program [30]. Thus, plant growth is a highly coordinated activity in which biomass is partitioned to the various organs according to the need while maintaining a physiological balance among them dictated by basic biological principles. A system of ordinary differential equations (ODEs) has been derived to model the dynamic allocation of carbon to different organs during growth in relation to carbon and water/nitrogen supply by integrating coordination theory [31] and allometric scaling [32, 33]. Systems mapping can quantitatively and precisely identify genes that govern the mechanistic relationships and interactions among system components. This task can be accomplished through various tests for individual ODE parameters, or combinations of them. Some mechanistically meaningful relationships that have been examined include: (i) Size-shape relationshipçSystems mapping can analyze whether a big plant is due to a big stem with sparse leaves or dense leaves with a small stem [34]. (ii) Structural^functionalrelationshipçSystems mapping can detect whether a plant tends to allocate more carbon to its leaves for CO2 uptake or to roots for water and nutrient uptake when the environment changes [35]. (iii) Cause^effect relationshipçSystems mapping can determine whether more roots are due to more leaves or more leaves produce more roots in a particular environment [36, 37]. (iv) Pleiotropic relationshipçSystems mapping can quantify how some QTLs pleiotropically control morphological integration in which different traits with a similar function tend to integrate into developmental modularity [38]. A better understanding of these relationships will help us gain additional insights into the mechanistic response of plant size and shape to 34 Wang et al. developmental and environmental signals and, also, provide guidance to select an ideotype of crop cultivars with optimal shape and structure suited to a particular environment. A BIG PICTURE OF LIFE Existing genetic mapping methods are an intrinsic reductionist approach because they usually do not take into account the high-dimensional nature of gene expression—spatio-temporal patterns, transcriptional and post-transcriptional controls in the context of regulatory gene networks involved in trait formation. The emergence and development of any phenotype are carried out within a living system in which the ‘Central Dogma’ of biology, as a fundamental rule, controls and directs the life process according to: DNA ! mRNA ! enzyme (inactive) ! enzyme (active) ! metabolite(s) ! metabolism ! cellular physiology ! phenotype (Figure 2). The pathways and processes from DNA to phenotype can be viewed as a biological system comprised of large populations of elements that function together within the system according to unique design principles. Thus, the phenotype can be defined by the structure and dynamics of the system. These two characteristics jointly establish the essential property of a biological system, robustness, with which single perturbations will rarely cause functional failure [39, 40]. There are two sets of digital information that play a central role in assembling different parts of the system; together, these sets coordinate with each other and with the surrounding environment. The first set comprises genes that encode proteins, which play specific structural and functional roles. The second comprises regulatory networks in which the genes interact with each other and with their products in a manner that reveals a complex web of dynamic co-expression in time and space [41]. While genetic information can operate over such a wide range of time scales: tens of years to hours (life cycles) and weeks to milliseconds (physiology) [42], the biological phenotype it determines can be expressed at multiple levels of organization, from the molecular to the organismal level [40]. Current biotechnologies allow the high-throughput measure of gene expression, a process by which inheritable information from the DNA sequence is converted into a functional gene product (such as RNA or protein). Nowadays, it has been increasingly feasible to measure the whole set of genes of a biological system (genome), the entire set of RNA transcripts of the system (transcriptome), the entire collection of proteins (proteome) and the entire range of metabolites taking part in a biological process (metabolome). Another technological development has facilitated the measure of organismal phenotypes, such as embryo growth trajectories, heart 3D shape and plant root architecture. It is also possible to measure a complete set of phenotypes (phenome) for some organisms, such as humans [19] and Arabidopsis [43]. The availability of these omics data entails the development of computing models to analyze the interactome (complete set of interactions between proteins or between these and other molecules) and localizome (localization of transcripts, proteins, etc.) and ultimately relate these to the organization and function of the biological system [41, 44, 45]. NETWORK MAPPING Many studies have been carried out to map QTLs that affect gene expression (eQTLs), proteomic (pQTLs) and metabolomic profiles (mQTLs) (Figure 2) by dissecting the phenotype into its genetic regulatory networks [19, 46–48]. However, most of these studies associate DNA polymorphic markers with expression profiles of individual genes, proteins or metabolites, without considering intrinsic gene–protein–metabolite interaction networks. Such single-level analysis cannot extract information related to the control of gene expression between the transcriptional and post-translational levels. What is needed is an approach that can provide insight into interaction networks of genes during cell cycle, cell differentiation, signal transduction and metabolism, all processes that form a phenotypic trait. Modeling dynamic regulatory networks The information available from omics research can enable us to assess multiple features of complex systems at various levels of biological organization, from the cell to the whole organism within a defined and fully delineated framework. The challenge is therefore to decode this information in the context of the physiology of the system [49, 50]. It is imperative to have a good understanding of the pattern and control of gene expression and the physiology of a biological system if we are going to construct a model to assess the capabilities of such Genetic landscape of complex traits system [49]. Classic correlation methods or multivariate statistics have been used to correlate gene expression data with proteomic and metabolomic data [47, 48], but they fail to capture the true quantitative variation and relationship between activity of thousands of gene–protein couples or protein–metabolite couples in a cellular system. To do so, we need to consider several critical factors, i.e. the time displacement of the genetic and protein synthetic and post-translational events, their different timescales and their half-lives. ODEs that have been widely used to model electronic networks in engineering may play a part in describing regulatory networks of a biological system [51–56]. These ODE methods were applied to model the regulatory network of Halobacterium salinarum, suggesting that the model could predict mRNA levels of 2000 out of a total 2400 genes found in the genome [49]. A similar application was implemented to model human regulatory networks (TLR-5–mediated stimulation of macrophages) and several other microbial networks [54]. Mapping dynamic regulatory networks It is feasible to derive a dynamic model for mapping the biochemical pathways of trait formation and development, and identifying the mechanistic networks involved in the generation of a high-order phenotype by integrating the basic principles of functional mapping and systems mapping. Although a regulatory network usually has a complex structure, robust mathematical models like differential equations have proven a powerful means for simplifying and summarizing this complexity. We use a simplified example of gene expression to show how network mapping is developed, Four processes determine the expression of a gene into its protein: transcription, translation, mRNA degradation and protein degradation, as depicted below [57]. The dynamics of gene expression in a time course may be described by two ODEs incorporating these four reactions [57], i.e. dM ¼ k1 d1 M dt dP ¼ k2 M d2 P dt 35 ð1Þ where M and P are the time courses of mRNA and protein expression, respectively, k1 is the rate of mRNA transcription from DNA, d1 is the decay rate of mRNA degradation, k2 is the rate of translation from mRNA to protein, and d2 is the decay rate of protein degradation. By using this set of ODEs, we can quantify the dynamic properties of a gene expression system (1) based on parameters ? ¼ (k1,d1;k2,d2). By integrating ODEs (1) and DNA polymorphic markers through a mixture model framework, a new dynamic model, called network mapping, can be derived. In Box 1, a statistical procedure of deriving network mapping is described. Network mapping is equipped to estimate and test ODE parameters for different QTL genotypes inferred from DNA marker information based on QTL-marker linkage or linkage disequilibrium [58]. Consider an ODE parameter set ?1 ¼ (k11,d11;k12,d12) (0.2,0.9;0.9,0.25), with which we plot the dynamic behavior of mRNA and protein expression (Figure 3A). If a QTL affects the dynamic system of gene–protein expression (1) by changing one or more parameters, we obtain different behavioral dynamics of expression profiles as shown in Figure 3B with ?2 ¼ (k21,d21;k22,d22) (0.25,0.9;0.9,0.35) and Figure 3C with ?3 ¼ (k31,d31;k32,d32) (0.3,0.9;0.9,0.45). Pronounced differences in curve form show a high sensitivity of gene and protein expression to parameter changes and, therefore, genotype changes. The genetic control of the correlation dynamics of gene and protein expression can be visualized by displaying genotype-specific phase planes (Figure 3D). Phase-plane analysis reveals the equilibrium points of the gene–protein expression system, showing the pattern of how a QTL controls the equilibrium point. Network mapping embeds a procedure of testing whether a QTL pleiotropically controls gene expression and protein expression. Computer simulation has provided basic information about the power of QTL detection and the precision of parameter estimation by network mapping given different sample sizes, measurement errors and the number of time points [30]. Network mapping described in Box1 can be readily extended to characterize the effects of genetic interactions between different QTLs by 36 Wang et al. Figure 3: Dynamic changes of gene and protein expression in a time course, with the pattern and behavior depending on different QTL genotypes (A, B and C). In (D), a phase plane analysis is conducted for the three genotypes, showing their different patterns of gene ^ protein expression dynamics. incorporating multiple QTLs into likelihood (B1). If there exist an eQTL and a pQTL, co-located at a given genomic region, which control gene and protein expression dynamics, respectively, in a manner that is specified by system (1), then network mapping allows the overall genetic architecture of transcriptional regulation to be elucidated as follows: From the above graphic presentation, we can address several important questions: (i) does the eQTL pleiotropically control protein expression as shown by the dash line with an arrow? Similarly, does the pQTL pleitropically controls the gene expression? (ii) do the eQTL and pQTL interact with each other to determine gene expression, protein expression or both? (iii) is the correlation between gene and protein expression due to either the pleiotropic control of these two QTLs or their genetic linkage in the genome? (iv) can we identify how each QTL affects different processes, transcription, translation, mRNA degradation and protein degradation, by testing individual parameters in ? ¼ (k1,d1;k2,d2)? In practice, a biological system comprises of multiple genes and multiple proteins that interact with each other and are co-expressed across a time-space scale [41, 57]. Box 2 provides such an example of multiple-gene interactions. The basic procedure of network mapping given in Box 1 can well be used to map the regulatory network of this system by applying or deriving a high-dimensional system of differential equations with more complex structure (Box 2). Towards a fuller understanding of the genetic causes and consequences of transcriptional regulation networks that lead to a final phenotype, this approach allows testing a series of hypotheses about the pattern of expression of a gene set, a protein set or a gene–protein co-expression. COMPREHENDING GENETIC ARCHITECTURE Phenotypic traits are usually determined by many genes, acting with various effects and manners and Genetic landscape of complex traits 37 Box 1: Statistical models for network mapping Genetic mapping is based on an experimental or natural population of size n in which segregating individuals are genotyped for DNA markers and phenotyped for gene and protein expression profiles at a series of T time points. If specific QTLs exist to affect the dynamic system (1), the parameters that specify the system should be different among QTL genotypes. Genetic mapping uses a mixture model-based likelihood to estimate QTL genotype-specific parameters. This likelihood is expressed as n Y L yM ; yP ¼ o1ji f1 ðyMi ,yPi Þ þ þ oJji fJ ðyMi ,yPi Þ ðB1Þ i¼1 where yM ¼ (yMi(t1), . . . , yMi(tT)) and yP ¼ (yPi(t1), . . . , yPi(tT)) are the expression profiles of gene and protein measured at T different time points, ojji is the conditional probability of QTL genotype j (j ¼ 1, . . . ,J) given the marker genotype of individual i, fj(yMi,yPi) is a multivariate normal distribution with expected mean vector for individual i that belongs to QTL genotype j, ðuMjji ; uPjji Þ ¼ uMjji t1 , . . . ,uMjji tT ; uPjji t1 , . . . ,uPjji tT ðB2Þ and covariance matrix R¼ RM RPM RMP , RP ðB3Þ with RM and RP being (T T) covariance matrices of time-dependent mRNA and protein expression, respectively, and RMP ¼ RPM being a (T T) covariance matrix between the two variables. In network mapping, we incorporate ODEs (1) into mixture model (B1) to estimate genotypic means (B2) specified by ODE parameters for different QTL genotypes, expressed as (k1j,d1j;k2j,d2j) for j ¼ 1, . . . ,J [72, 73]. Since mRNA and protein expression profiles obey dynamic system (1), the derivatives of genotypic means can be expressed in a similar way. Let gkjji(t,ukjji) denote the genotypic derivative for variable k (k ¼ M or P), i.e. gkjji ðt,ukjji Þ ¼ dukjji : dt We use ukjji to denote the genotypic mean of variable j for individual i belonging to QTL genotype j at an arbitrary point in a time course. Based on the Runge–Kutta scheme, the value of ukjji in iteration l þ 1 is determined by the present value plus the weighted average of four deltas (where each delta is the product of the size of the interval and an estimated slope), expressed as flg 1 uflþ1g kjji ¼ ukjji þ 6 R1kjji þ 2R2kjji þ 2R3kjji þ R4kjji , is the delta based on the slope at the beginning of the interval, using uflg where R1kjji ¼ hgkjji tflg ,uflg kjji kjji ; flg 1 flg 1 R2kjji ¼ hgkjji t þ 2h,ukjji þ 2R1kjji is the delta based on the slope at the midpoint of the interval, using flg 1 flg 1 1 uflg based on the slope at the midpoint, but kjji þ 2R1kjji , R3kjji ¼ hgkjji t þ 2h,ukjji þ 2R2kjji is again the delta flg 1 flg now using uflg t the delta based on the slope at the end of þ R , and R ¼ hg þ h,u þ R kjji 3kjji 4kjji kjji kjji 2 2kjji the interval. The Runge–Kutta fourth order algorithm with step size h ¼ 0.1 is used to approximate the solution in high accuracy given a trial set of parameter values and initial conditions. Next, we need to model the covariance structure by using a parsimonious and flexible approach such as an autoregressive, antedependence, autoregressive moving average or nonparametric and semiparametric approaches [74]. In likelihood (B1), the conditional probabilities of QTL genotypes given marker genotypes can be expressed as a function of recombination fractions for an experimental cross population or linkage disequilibria for a natural population [58]. The estimation of the recombination fractions or linkage disequilibria can be implemented with the EM algorithm. Box 2: Kinetic analysis and mapping of gene ^ gene interactions In a regulatory network, it is common that different genes interact and coordinate to determine an intermediate step towards phenotypic formation. Network mapping allows gene–gene interactions to be tested in a quantitative way. Consider three genes that operate in a network [41] depicted graphically as below: Gene 1 is constitutively expressed, and is repressed by gene 3. Therefore, its level may reach a maximal rate of increase (k1s where s stands for synthesis) when the level of gene 3 is 0, in which case k1s will be multiplied by 1. When the level of gene 3 is non-zero, the level of gene 1 rises more slowly than k1s. Transcription of gene 2 is activated by gene 1, with the expression level of gene 2 rising as a Michaelis– Menten function of the level of gene 1. Similarly, transcription of gene 3 is activated when both gene 1 and gene 2 levels are non-zero. The regulatory relations of the three genes can be described by a triple of ODEs [41], expressed as dM1 1 ¼ k1s k1d M1 1 þ k13 M3 dt dM2 k21 M1 ¼ k2s k2d M2 dt 1 þ k21 M1 dM3 k31 M1 k32 M2 ¼ k3s k3d M3 dt ð1 þ k31 M1 Þð1 þ k32 M2 Þ ðB4Þ where degradation is modeled as a first-order reaction with rate constants k1d, k2d and k3d for three genes, respectively. This formulation assumes that every transcript is immediately translated, and therefore the synthesis constants k1s, k2s and k3s refer to both transcription and translation. Each equation in system (B4) shows the change in the level of a gene as a difference of its synthesis and degradation through a set of 10 ODE parameters (k1s,k2s,k3s,k1d,k2d,k3d, k21,k31,k32,k13). By testing these parameters singly or in combination, we can determine how an eQTL affects the pattern of co-expression of different genes. For example, the test of how gene 1 and 2 are co-expressed is based on the null hypothesis H0: k21j k21 (for j ¼ 1, . . . ,J). To show the effect of eQTLs on the dynamic system of gene co-expression, we assume three groups of parameters (k1s,k2s,k3s,k1d,k2d,k3d, k21,k31,k32,k13) ¼ (2,2,15,1,1,1,1,1,1,100) for QTL genotype 1, (2,2,15,0.8,0.8,0.8,1,1,1,100) for QTL genotype 2, and (2,2,15,1,1,1,0.8,0.8,0.8,100) for QTL genotype 3, which produce different patterns of trajectories of gene expression in a time course as shown below: Network mapping can be extended to map eQTLs and their interactions for more than three genes expressed on a time scale. A particular mathematical treatment is needed to assure the stability of the estimates of parameters for a high-dimensional system of ODEs. Genetic landscape of complex traits interacting with one another, and also with a capacity to adjust genetic expression in response to environmental and developmental signals [14]. These activities of genes are cumulated to form a complex network of actions and interactions. The genetic architecture of a complex trait describes the structure and dynamics of this network. The more gene activities listed, the more comprehensive genetic architecture elucidated. Network mapping enjoys the incorporation of all the activities that constitute genetic architecture. In recent years, new sources for genetic variation have been recognized and studied, including single nucleotide polymorphisms (SNPs), insertions or deletions ‘indels’ and copy number variants [59]. With the increasing availability of these data, genome-wide association studies (GWAS), aimed to identify a complete set of genes affecting a complex phenotype or disease, have become one of the most important tools in genetic research [59]. The implementation of network mapping into GWAS can greatly refresh our understanding of how and where a phenotype is originated. Another important source, genomic imprinting, has been increasingly studied. This phenomenon results from epigenetic marks, causing different patterns of expression of certain genes depending on their parental origin [60–62]. These so-called imprinted genes violate the classical Mendelian inheritance, which are either expressed only from the allele of the mother, such as H19 or CDKN1C [63, 64], or from the allele of the father, such as IGF-2 [65]. From a quantitative genetic perspective, genomic imprinting may provide the organisms with evolutionary merits by activating additional genetic variation and conferring a fitness benefit when the environment changes [66, 67]. Currently, different forms of genomic imprinting have been detected in a variety of species and thought to play an important role in regulating crucial aspects of embryonic growth and development as well as pathogenesis [61]. Recent bioinformatic analyses suggest that the number of imprinted genes may be higher than we thought previously, although this remains to be demonstrated experimentally [62]. Genomic imprinting can also be incorporated into network mapping through a family design in which both parents and their progeny are genotyped while the phenotype is measured on the progeny. Li et al. [68] proposed a sampling strategy to estimate genetic imprinting expressed at the individual gene level by 39 sampling nuclear families at random from a natural population. Using reciprocal crosses, Wang et al. [69, 70] formulated a model to compute different types of imprinting effects and their interactions with other genetic effects; this model allows the test of whether imprinting effects are reprogrammed in the process of embryonic development or transmitted to next generations. There is no difficulty in implementing these designs into network mapping, facilitating the complete elucidation of trait genetic control. CONCLUDING REMARKS One of the major challenges in systems biology, as far as modeling is concerned, is the construction of models capable of integrating processes across contrasting scales of time and space. The suite of mapping approaches that include functional mapping, systems mapping and network mapping offers the opportunity to meet this challenge given their complementation to provide a comprehensive coverage of the genotype–phenotype map. Functional mapping and systems mapping make it possible to detect QTLs and their epistatic interactions, which are responsible for phenotypic variation in trait formation and development. These two strategies provide a dynamic view of the genotype–phenotype connection, but do not reveal regulatory mechanisms behind this connection. By implementing network biology into systems mapping, network mapping promises to connect the phenotype to the interacting web of genetic regulatory networks at the transcriptional, translational and metabolic levels (Figure 2), likely making scientific breakthroughs in understanding biological signals. Network mapping provides an unprecedented resolution into fundamental biological questions about the interplay between genetic actions/interactions and the origin, properties and function of life defined as a dynamic system. The uniqueness and novelty of network mapping rest on its ability to integrate genetics, genomics, proteomics and metabolomics by mathematical equations to model the genetic, biochemical and dynamic mechanisms of trait formation. The capacity of differential equations to handle complex systems makes them uniquely suitable for the identification of key structural features of the system, quantification of functional correlations of nodes and pathways and for providing fundamental insights into the mechanisms by which network structure determines 40 Wang et al. network dynamics. Network mapping is flexible in incorporating various types of differential equations to map eQTLs, pQTLs and mQTLs and their genetic interactions on a time and space scale. Network mapping is equipped with a capacity to unleash the mystery of the black box that constructs the genotype–phenotype map. Several issues, including analysis of rare variants and RNA-seq data generated by next-generation sequencing technologies, should be addressed, in order to make network mapping a broadly useful tool. Furthermore, although network mapping was introduced in the context of plant genetics, it can also have an implication for human genetics. Barabasi et al. [71] have discussed a new networkbased medicine approach for studying human complex diseases. This approach can be integrated with our network mapping to better elucidate the genetic architecture and landscape of complex human diseases. The application of network mapping relies critically upon differential equations and their mathematical, statistical and computational solutions. To capture and understand the structure, organization and function of any biological system, even a single cell, we need to build sophisticated differential equations that can reflect the emergent property and dimension of the system. Almost all such equations useful to biology and biomedicine are likely to be high-dimensional, nonlinear, stochastic and multifactorial. Just as deriving these biologically meaningful equations entails the participation of biologists, the solution of these equations requires in-depth technical helps from mathematicians, statisticians and computational scientists. Thus, to make network mapping a practical tool, synergic collaboration of various tools from multiple disciplines is crucial. Although many funding agencies have sensed the potential of integrating mathematics and biology, they yet have to realize the value of initiating a more comprehensive experiment completely based on well-justified statistical designs, such as network mapping. In a short run, biologists and mathematicians may work independently to deeply solve various questions and problems specific to their own areas. However, in a long run, because of its quantitative nature, network mapping that bridges mathematics and biology should be invested; after all, it provides a new vision for precision biology and precision medicine that cannot be achieved by any statistically unwarranted design. Key Points A phenotype is genetically complex in terms of its underlying genetic architecture involving many genes that display a web of interactions with other genes and with environmental factors. The formation of any phenotype undergoes a series of developmental events and biological alterations that lead to cell growth, differentiation and morphogenesis. DNA sequences determine variation in a phenotype by perturbing transcripts, metabolites and proteins that construct transcriptional and regulatory networks. A comprehensive picture of the genetic landscape of a phenotype can now be elucidated through dissecting it into its underlying genetic, developmental and regulatory components using robust mathematical models. FUNDING This work is partially supported by NSF/ IOS-0923975, NIH/UL1RR0330184, the Changjiang Scholars Award and ‘Thousand-person Plan’ Award. References 1. Frazer KA, Murray SS, Schork NJ, et al. Human genetic variation and its contribution to complex traits. Nat Rev Genet 2009;10:241–51. 2. Lin WD, Liao YY, Yang TJ, et al. Coexpression-based clustering of Arabidopsis root genes predicts functional modules in early phosphate deficiency signaling. Plant Physiol 2011; 155:1383–402. 3. Lander ES, Botstein D. Mapping Mendelian factors underlying quantitative traits using RFLP linkage maps. Genetics 1989;121:185–99. 4. Mackay TF, Stone EA, Ayorles JF. The genetics of quantitative traits: challenges and prospects. Nat Rev Genet 2009; 10:565–77. 5. Miura K, Ashikari M, Matsuoka M. The role of QTLs in the breeding of high-yielding rice. Trends Plant Sci 2012;16: 319–26. 6. Lu AT, Yoon J, Geschwind DH, et al. QTL replication and targeted association highlight the nerve growth factor gene for nonverbal communication deficits in autism spectrum disorders. Mol Psychiatry 2011 Nov 22. doi: 10.1038/ mp.2011.155. [Epub ahead of print]. 7. Zeng Z-B. Theoretical basis of separation of multiple linked gene effects on mapping quantitative trait loci. Proc Natl Acad Sci USA 1993;90:10972–6. 8. Jansen RC, Stam P. High resolution of quantitative traits into multiple loci via interval mapping. Genetics 1994;136: 1447–55. 9. Kao CH, Zeng ZB, Teasdale RD. Multiple interval mapping for quantitative trait loci. Genetics 1999;152:1203–16. 10. Jannink J-L, Jansen RC. Mapping epistatic QTL with one-dimensional genome searches. Genetics 2001;157: 445–54. 11. Xu S. Estimating polygenic effects using markers of the entire genome. Genetics 2003;163:789–801. Genetic landscape of complex traits 12. Wu RL, Lin M. Functional mapping – how to map and study the genetic architecture of dynamic complex traits. Nat Rev Genet 2006;7:229–37. 13. Wang Z, Pang XM, Lv YF, et al. A dynamic framework for quantifying the genetic architecture of phenotypic plasticity. Brief Bioinform 2013;14:82–95. 14. Lynch M, Walsh B. Genetics and Analysis of QuantitativeTraits. Sunderland, MA: Sinauer, 1998. 15. West GB, Brown JH, Enquist BJ. A general model for ontogenetic growth. Nature 2001;413:628–31. 16. Le Goff L, Lecuit T. Gradient scaling and growth. Science 2011;331:1141–2. 17. Wu S, Yap JS, Li Y, et al. Network models for dissecting plant development by functional mapping. Curr Bioinform 2009;4:183–7. 18. Nadeau JH, Dudley AM. Systems genetics. Science 2011; 331:1015–6. 19. Dermitzakis ET. Cellular genomics for complex traits. Nat Rev Genet 2012;13:215–20. 20. Ma CX, Casella G, Wu RL. Functional mapping of quantitative trait loci underlying the character process: A theoretical framework. Genetics 2002;161: 1751–62. 21. Wu RL, Wang ZH, Zhao W, etal. A mechanistic model for genetic machinery of ontogenetic growth. Genetics 2004; 168:2383–94. 22. He QL, Berg A, Li Y, et al. Modeling genes for plant structure, development and evolution: Functional mapping meets plant ontology. Trends Genet 2010;26:39–46. 23. Li Y, Wu RL. Functional mapping of growth and development. Biol Rev 2010;85:207–16. 24. Yap JS, Wang CG, Wu RL. A computational approach for functional mapping of quantitative trait loci that regulate thermal performance curves. PLoS One 2007;2(6): e554. 25. Zhao W, Zhu J, Gallo-Meagher M, et al. A unified statistical model for functional mapping of genotype environment interactions for ontogenetic development. Genetics 2004; 168:1751–62. 26. Zhao W, Ma CX, Cheverud JM, et al. A unifying statistical model for QTL mapping of genotype-sex interaction for developmental trajectories. Physiol Genom 2004; 19:218–27. 27. Nicholson JK, Holmes E, Lindon JC, etal. The challenges of modeling mammalian biocomplexity. Nat Biotech 2004;22: 1268–74. 28. Lander AD. Pattern, growth, and control. Cell 2011;144: 955–69. 29. Wu RL, Cao JG, Huang ZW, et al. Systems mapping: How to improve the genetic mapping of complex traits through design principles of biological systems. BMC Syst Biol 2011; 5:84. 30. Niklas KJ, Enquist BJ. On the vegetative biomass partitioning of seed plant leaves, stems, and roots. Am Nat 2002;159: 482–97. 31. Chen J, Reynolds J. A coordination model of carbon allocation in relation to water supply. Ann Bot 1997;80: 45–55. 32. West GB, Brown JH, Enquist BJ. A general model for the origin of allometric scaling laws in biology. Science 1997; 276:122–6. 41 33. West GB, Brown JH, Enquist BJ. The fourth dimension of life: Fractal geometry and allometric scaling of organisms. Science 1999;284:1677–9. 34. Poorter H, Niklas KJ, Reich PB, et al. Biomass allocation to leaves, stems and roots: meta-analyses of interspecific variation and environmental control. New Phytol 2012;193: 30–50. 35. Niklas KJ, Enquist BJ. An allometric model for seed plant reproduction. Evol Ecol Res 2003;5:79–88. 36. Niklas KJ, Enquist BJ. Canonical rules for plant organ biomass partitioning and annual allocation. Am J Bot 2004;89: 812–9. 37. Niklas KJ, Enquist BJ. Biomass Allocation and Growth Data of Seeded Plants. Data set. Oak Ridge, TN: Oak Ridge National Laboratory Distributed Active Archive Center, 2004. 38. Klingenberg CP. Morphological integration and developmental modularity. Annu Rev Ecol Evol Syst 2008;39:115–32. 39. Ideker T, Galitski T, Hood L. A new approach to decoding life: systems biology. Annu Rev Genomics Hum Genet 2001;2: 343–72. 40. de Hoog CL, Mann M. Proteomics. Annu Rev Genomics Hum Genet 2004;5:267–93. 41. Karlebach G, Shamir R. Modelling and analysis of gene regulatory networks. Nat Rev Mol Cell Biol 2008;9:770–80. 42. Chong L, Ray LB. Whole-istic biology. Science 2002;295: 1661. 43. Boyes DC, Zayed AM, Ascenzi R, etal. Growth stage-based phenotypic analysis of Arabidopsis: a model for high throughput functional genomics in plants. Plant Cell 2001; 13:1499–510. 44. Ideker T, Thorsson V, Ranish JA, et al. Integrated genomic and proteomic analyses of a systematically perturbed metabolic network. Science 2001;292:929–34. 45. Kitano H. Systems biology: a brief overview. Science 2002; 295:1662–4. 46. Jansen RC, Nap JP. Genetical genomics: the added value from segregation. Trends Genet 2001;17:388–91. 47. Rockman MV, Kruglyak L. Genetics of global gene expression. Nat Rev 2006;7:862–72. 48. Cookson W, Liang L, Abecasis G, et al. Mapping complex disease traits with global gene expression. Nat Rev Genet 2009;10:184–94. 49. Bonneau R. Learning biological networks: from modules to dynamics. Nat Chem Biol 2008;4:658–64. 50. Nicholson JK, Holmes E, Lindon JC, etal. The challenges of modeling mammalian biocomplexity. Nat Biotechnol 2004; 22:1268–74. 51. Gardner TS, di Bernardo D, Lorenz D, et al. Inferring genetic networks and identifying compound mode of action via expression profiling. Science 2003;301:102–5. 52. Tegner J, Yeung MK, Hasty J, et al. Reverse engineering gene networks: integrating genetic perturbations with dynamical modeling. Proc Natl Acad Sci USA 2003;100: 5944–9. 53. Yeung MK, Tegner J, Collins JJ. Reverse engineering gene networks using singular value decomposition and robust regression. Proc Natl Acad Sci USA 2002;99:6163–8. 54. Gilchrist M, Thorsson V, Li B, et al. Systems biology approaches identify ATF3 as a negative regulator of Toll-like receptor 4. Nature 2006;441:173–8. 42 Wang et al. 55. Bonneau R, Reiss DJ, Shannon P, et al. The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol 2006; 7:R36. 56. Bonneau R, Facciotti MT, Reiss DJ, et al. A predictive model for transcriptional control of physiology in a free living cell. Cell 2007;131:1354–65. 57. Legewie S, Herzel H, Westerhoff HV, et al. Recurrent design patterns in the feedback regulation of the mammalian signalling network. Mol Syst Biol 2008;4:190. 58. Wu RL, Ma CX, Casella G. Statistical Genetics of Quantitative Traits: Linkage, Maps, and QTL. New York: Springer, 2007. 59. Morrell PL, Buckler ES, Ross-Ibarra J. Crop genomics: advances and applications. Nat Rev Genet 2012;13:85–96. 60. Reik W. The Wellcome Prize Lecture. Genetic imprinting: the battle of the sexes rages on. Exp Physiol 1996;81:161–72. 61. Reik W, Dean W, Walter J. Epigenetic reprogramming in mammalian development. Science 2001;293:1089–93. 62. Luedi PP, Dietrich FS, Weidman JR, et al. Computational and experimental identification of novel human imprinted genes. Genome Res 2007;17:1723–30. 63. Leibovitch MP, Nguyen VC, Gross MS, et al. The human ASM (adult skeletal muscle) gene: expression and chromosomal assignment to 11p15. Biochem Biophys Res Commun 1991;180:1241–50. 64. Matsuoka S, Edwards MC, Bai C, et al. p57KIP2, a structurally distinct member of the p21CIP1 Cdk inhibitor family, is a candidate tumor suppressor gene. Genes Dev 1995;9:650–62. 65. O’Dell SD, Day IN. Insulin-like growth factor II (IGF-II). IntlJ Biochem Cell Biol 1998;30:767–71. 66. Moore T, Haig D. Genomic imprinting in mammalian development: a parental tug-of-war. Trends Genet 1991;7: 45–9. 67. Wilkins JF, Haig D. What good is genomic imprinting: the function of parent-specific gene expression. Nat Rev Genet 2003;4:359–68. 68. Li Y, Guo YQ, Hou W, et al. A statistical design for testing transgenerational genomic imprinting in natural human populations. PLoS One 2011;6(2):e16858. 69. Wang CG, Wang Z, Luo JT, et al. A model for transgenerational imprinting variation in complex traits. PLoS One 2010;5(7):e11396. 70. Wang CG, Wang Z, Prows DR, et al. A computational framework for the inheritance of genomic imprinting for complex traits. Brief Bioinform 2012;13:34–45. 71. Barabási AL, Gulbahce N, Loscalzo J. Network medicine: a network-based approach to human disease. Nat Rev Genet 2011;12:56–68. 72. Fu GF, Luo J, Berg A, et al. A dynamic model for functional mapping of biological rhythms. J Biol Dyn 2010;4:1–10. 73. Luo JT, Hager WW, Wu RL. A differential equation model for functional mapping of a virus-cell dynamic system. J Math Biol 2010;65:1–15. 74. Yap J, Fan J, Wu RL. Nonparametric modeling of covariance structure in functional mapping of quantitative trait loci. Biometrics 2009;65:1068–77.
© Copyright 2024 Paperzz