Vol. 20 no. 16 2004, pages 2662–2675
doi:10.1093/bioinformatics/bth306
BIOINFORMATICS
DBRF–MEGN method: an algorithm for deducing
minimum equivalent gene networks from
large-scale gene expression profiles of gene
deletion mutants
Koji Kyoda1,3 , Kotaro Baba2,4 , Shuichi Onami1,2,3, ∗ and
Hiroaki Kitano1,2,3,5
1 Kitano Symbiotic Systems Project, ERATO, Japan Science and Technology
Corporation and 2 The Systems Biology Institute, M31 6A, 6-31-15 Jingumae,
Shibuya, Tokyo 150-0001, Japan, 3 Graduate School of Science and Technology,
Keio University, 3-14-1 Hiyoshi, Kohoku, Yokohama 223-8522, Japan, 4 National
Institute of Agrobiological Sciences, 2-1-2 Kannondai, Tsukuba, Ibaraki 305-8602,
Japan and 5 Sony Computer Science Laboratories, Inc., 3-14-13 Higashi-Gotanda,
Shinagawa, Tokyo 141-0022, Japan
Received on September 15, 2003; revised on March 5, 2004; accepted on April 28, 2004
Advance Access publication May 14, 2004
ABSTRACT
Motivation: Large-scale gene expression profiles measured
in gene deletion mutants are invaluable sources for identifying
gene regulatory networks. Signed directed graph (SDG) is the
most common representation of gene networks in genetics
and cell biology. However, no practical procedure that deduces
SDGs consistent with such profiles has been developed.
Results: We developed the DBRF–MEGN (difference-based
regulation finding–minimum equivalent gene network) method
in which an algorithm deduces the most parsimonious SDGs
consistent with expression profiles of gene deletion mutants.
Positive (or negative) directed edges representing positive
(or negative) gene regulations are deduced by comparing
the gene expression level between the wild-type and mutant.
The most parsimonious SDGs are deduced using graph theoretical procedures. Compensation for excess removal of
edges by restoring a minimum number of edges makes the
method applicable to cyclic gene networks. Use of independent groups of edges greatly reduces the computational cost,
thus making the method applicable to large-scale expression profiles. We confirmed the applicability of our method
by applying it to the gene expression profiles of 265 Saccharomyces cerevisiae deletion mutants, and we confirmed
our method’s validity by comparing the pheromone response
pathway, general amino acid control system, and copper and
iron homeostasis system deduced by our method with those
reported in the literature. Interpretation of the gene network
deduced from the S. cerevisiae expression profiles by using
our method led to the prediction of 132 transcriptional targets
and modulators of transcriptional activity of 18 transcriptional
regulators.
Availability: The software is available on request.
Contact: [email protected]
Supplementary information: http://www.so.bio.keio.ac.jp/dbrf-megn/
∗ To whom correspondence should be addressed.
INTRODUCTION
Identification of gene regulatory networks (hereafter called
gene networks) is essential for understanding cellular functions. Large-scale gene deletion projects (Liu et al., 1999;
Winzeler et al., 1999; Hamer et al., 2001; Giaever et al.,
2002) and DNA microarrays (Schena et al., 1995; Lockhart
et al., 1996) have enabled large-scale gene expression profiles of gene deletion mutants (deletants); these large-scale
profiles comprise the expression levels of thousands of genes
measured in deletants of those genes. Hughes et al. (2000)
reported gene expression profiles of more than 6300 genes
corresponding to 265 single-gene deletants in Saccharomyces
cerevisiae. Such profiles are invaluable sources for identifying
gene networks.
Many procedures, such as those by Ideker et al. (2000),
Kyoda et al. (2000), Pe’er et al. (2001), and Wagner (2001),
infer gene networks from large-scale expression profiles
of gene deletants. In these procedures, gene networks are
modeled using various mathematical representations. Ideker
et al. (2000) modeled gene networks as acyclic Boolean networks and inferred a network consistent with profiles by using
a combinatorial optimization technique. Kyoda et al. (2000)
modeled gene networks as signed directed graphs (SDGs) and
deduced the most parsimonious graph consistent with profiles
by using a graph theoretical procedure. Pe’er et al. (2001)
modeled gene networks as Bayesian networks and inferred
gene networks by using machine learning technology. Wagner
(2001) modeled gene networks as directed acyclic graphs and
deduced the most parsimonious graph consistent with profiles
by using a graph theoretical procedure.
Bioinformatics vol. 20 issue 16 © Oxford University Press 2004; all rights reserved.
The SDG is a desirable representation of gene networks
because it is the most common representation of gene networks in genetics and cell biology. In such graphs, a regulation
between two genes is represented as a signed directed edge
(SDE) whose sign—positive or negative—represents whether
the effect of the regulation is activation or inhibition and whose
direction represents which gene regulates which other gene.
Because SDGs are so widely used in genetics and cell
biology, SDGs consistent with large-scale gene expression
profiles will provide useful information for understanding
cellular function; such graphs can be directly compared with
gene networks identified through classical small-scale experiments and can then be interpreted in the same manner as those
small-scale gene networks.
Kyoda et al. (2000) previously developed the DBRF
(difference-based regulation finding) method, which deduces
the most parsimonious SDG consistent with the expression
profiles of gene deletants. However, the method is not applicable to cyclic gene networks. This is the critical drawback of
the DBRF method, because real gene networks contain many
feedback regulations (Ferrell, 2002; Guelzim et al., 2002; Lee
et al., 2002). Therefore, an algorithm that is applicable to
cyclic gene networks needed to be developed.
In this study, we developed the DBRF–MEGN (minimum
equivalent gene network) method, an improved algorithm for
deducing the most parsimonious SDG consistent with the
expression profiles of gene deletants. This method is applicable not only to acyclic but also to cyclic gene networks.
To show the applicability of this method, we applied it to
large-scale expression profiles of S. cerevisiae (Hughes et al.,
2000). The method successfully deduced the most parsimonious SDG consistent with these profiles. To evaluate the
validity of the method, we then compared the deduced graph
with gene networks reported in the literature and interpreted
this graph to predict the transcriptional targets and modulators
of transcriptional activity of known transcriptional regulators.
METHODS
DBRF–MEGN method
Four key concepts of the DBRF–MEGN method are described
here. The first two concepts, difference-based deduction of
edges and removal of redundant edges, were already implemented in the DBRF method (Kyoda et al., 2000). The last
two, compensation for excess removal of edges by restoring a minimum number of non-essential edges and the use of
independent groups, were originally developed for the DBRF–
MEGN method and thus are improvements on the DBRF
method.
The first concept is difference-based deduction of edges.
The gene expression profiles of gene deletants consist of the
expression levels of genes measured in deletants for each of
these genes. To deduce the SDG that is consistent with these
profiles, we used an assumption that is commonly used in
genetics and cell biology as is done in the DBRF method
(Kyoda et al., 2000), i.e. there exists a positive (negative)
regulation from gene A to gene B when the expression level
of gene B in the deletant of gene A is lower (higher) than in
the wild-type (Fig. 1A). For each possible pair of genes, we
determined whether positive (negative) regulations between
those genes exist and deduced all SDEs consistent with both
the stated assumption and the profiles; we call these edges
‘initially deduced edges’ (Fig. 1B). This computation required
$n^2$ iterations, where $n$ represents the number of genes in the
profiles.
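The assumption above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `deduce_initial_edges`, the `changes` dictionary and the toy genes are hypothetical, and the input is assumed to be already reduced to +1 (higher than wild-type), -1 (lower) or 0 (no significant change).

```python
from typing import Dict, Set, Tuple

# (regulator, target, sign): sign +1 = positive regulation, -1 = negative regulation
Edge = Tuple[str, str, int]


def deduce_initial_edges(changes: Dict[str, Dict[str, int]]) -> Set[Edge]:
    """Deduce signed directed edges from deletant profiles.

    changes[a][b] is +1 / -1 / 0 when the expression level of gene b in the
    deletant of gene a is higher / lower / unchanged relative to the wild-type.
    (Hypothetical input format chosen for this sketch.)
    """
    edges: Set[Edge] = set()
    for deleted_gene, row in changes.items():      # n deletants ...
        for measured_gene, delta in row.items():   # ... times n measured genes
            if deleted_gene == measured_gene or delta == 0:
                continue  # self-regulation cannot be deduced; no change means no edge
            # Lower in the deletant -> positive regulation; higher -> negative.
            edges.add((deleted_gene, measured_gene, -delta))
    return edges


# Toy profiles: deleting a lowers b and c, deleting b lowers c.
changes = {
    "a": {"a": 0, "b": -1, "c": -1},
    "b": {"a": 0, "b": 0, "c": -1},
    "c": {"a": 0, "b": 0, "c": 0},
}
initial = deduce_initial_edges(changes)
```

In this toy case the edge from a to c may be direct or may merely reflect the chain a to b to c, which is the ambiguity the second concept addresses.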
The second concept is removal of redundant edges. The initially deduced edges consist not only of those representing
direct gene regulations but also those representing indirect
gene regulations. We define the regulation from gene A to
gene B as direct when gene A regulates gene B independently of other gene regulations, e.g. a transcription factor A
binds to upstream regulatory regions of gene B and increases
the transcription of gene B. On the other hand, we define gene
regulation as indirect when gene A regulates gene B as a result
of other regulations, e.g. a transcription factor A increases the
transcription of transcription factor C, which then increases
the transcription of gene B. A desirable gene network consists
only of direct gene regulations, because indirect regulations
do not correspond to molecular mechanisms of gene regulation. To choose edges representing direct gene regulations
from the initially deduced edges, all edges that are deductively explained by two other edges are removed (Fig. 1C), as
is done in the DBRF method (Kyoda et al., 2000), because
an indirect regulation is explained by direct regulations. To
reduce the computational cost, we implemented this removal
process by modifying Warshall’s algorithm (Warshall, 1962).
The resulting algorithm required $n^3$ iterations.
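As an illustration of the removal step, the following Python sketch uses a naive triple loop instead of the modified Warshall's algorithm, whose details are given only in the Supplementary information. It assumes that an edge from A to C is "explained" by edges A to B and B to C when the product of their signs equals the sign of the A to C edge; the function name and toy data are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str, int]  # (regulator, target, sign), sign is +1 or -1


def essential_edges(initial: Set[Edge]) -> Set[Edge]:
    """Remove every edge that two other initially deduced edges explain."""
    genes = {g for a, b, _ in initial for g in (a, b)}
    redundant: Set[Edge] = set()
    for a, c, sign in initial:
        for b in genes:
            if b == a or b == c:
                continue
            for first in (1, -1):
                second = sign * first  # signs are +/-1, so first * second == sign
                if (a, b, first) in initial and (b, c, second) in initial:
                    redundant.add((a, c, sign))
    # Redundancy is judged against the full initial set, not sequentially.
    return initial - redundant


# a activates b, b activates c, and a activates c (explained by the chain).
chain = {("a", "b", 1), ("b", "c", 1), ("a", "c", 1)}
kept = essential_edges(chain)

# Two inhibitions in series explain a positive edge.
double_negative = {("a", "b", -1), ("b", "c", -1), ("a", "c", 1)}
kept_negative = essential_edges(double_negative)
```

Because every edge is tested against the complete set of initially deduced edges, all edges of a cycle can be removed at once, which is the excess removal discussed next.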
Fig. 1. Example of the deduction of MEGNs from the gene expression profiles of gene deletants. (A) Assumption used in the DBRF–MEGN
method. (B) Deduction of initially deduced edges. The matrix represents a set of expression profiles, and the schematic represents a set
of initially deduced edges. In the matrix, a, b, ... represent expression levels of genes a, b, ..., and a, b, ... represent deletants of genes
a, b, .... The up (down) arrows indicate that the gene expression levels are higher (lower) in the deletant than in the wild-type. (C) Essential
edges. Non-essential edges are light-colored. Dotted edges represent unexplained edges, which cannot be explained by essential edges. Either
d → e or d → f represents a direct gene regulation, and either h → i, i → g, and g → h or h → g, g → i, and i → h represent direct
gene regulations. (D) Four MEGNs of the profiles. (E) Independent groups of unexplained edges. Combination of the minimum numbers of
edges of two independent groups (G1 and G2) produces all four MEGNs.
The third concept is compensation for excess removal of
edges by restoring a minimum number of non-essential edges.
The edges chosen in the removal process, called ‘essential edges’, sometimes fail to explain the initially deduced
edges. This is the problem of the DBRF method when it is
applied to cyclic gene networks. Some edges represent direct
gene regulations even when they are explained by two other
edges (Fig. 1C). Therefore, the removal process sometimes
removes edges representing direct gene regulations, resulting in excess removal of edges (Fig. 1C). It is difficult to
know whether an edge represents direct or indirect gene regulation when the edge is explained by two other edges and
when only expression profiles of single-gene deletants are
available. Therefore, instead of looking for edges representing direct regulations among the removed edges (hereafter we
call the removed edges ‘non-essential edges’), the DBRF–
MEGN method compensates for excessively removed edges
by restoring a minimum number of non-essential edges so that
the resulting edges (essential edges and the minimum number
of non-essential edges) can explain the initially deduced edges
(Fig. 1D). Often, several sets of such non-essential edges exist,
and the DBRF–MEGN method deduces all sets. The resulting graphs are the most parsimonious SDGs consistent with
given profiles and are called ‘the minimum equivalent gene
networks’ (MEGNs) of the profiles.
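The restoration step can be sketched as a brute-force search in Python. This is only an illustration under stated assumptions, not the algorithm in the Supplementary information: an initially deduced edge is taken to be "explained" by an edge set when that set contains a directed walk from regulator to target whose sign product matches, and the names `explains` and `minimum_equivalent_networks` are hypothetical. The example reuses the three-gene cycle of Fig. 1C, with all signs assumed positive.

```python
from collections import deque
from itertools import combinations
from typing import List, Set, Tuple

Edge = Tuple[str, str, int]  # (regulator, target, sign)


def explains(edges: Set[Edge], target: Edge) -> bool:
    """True if `edges` contains a walk from regulator to target with matching sign."""
    start, goal, sign = target
    successors = {}
    for u, v, s in edges:
        successors.setdefault(u, []).append((v, s))
    seen = {(start, 1)}          # search states are (gene, accumulated sign)
    queue = deque(seen)
    while queue:
        gene, acc = queue.popleft()
        for nxt, s in successors.get(gene, []):
            state = (nxt, acc * s)
            if state == (goal, sign):
                return True
            if state not in seen:
                seen.add(state)
                queue.append(state)
    return False


def minimum_equivalent_networks(initial: Set[Edge], essential: Set[Edge]) -> List[Set[Edge]]:
    """Return all edge sets that add the fewest non-essential edges to the
    essential edges while still explaining every initially deduced edge."""
    non_essential = sorted(initial - essential)
    for k in range(len(non_essential) + 1):       # try the smallest sizes first
        found = []
        for extra in combinations(non_essential, k):
            candidate = essential | set(extra)
            if all(explains(candidate, e) for e in initial):
                found.append(candidate)
        if found:
            return found
    return []


# Fully connected cycle among g, h and i: every edge is explained by two others,
# so the removal step leaves no essential edges at all.
initial = {(x, y, 1) for x in "ghi" for y in "ghi" if x != y}
megns = minimum_equivalent_networks(initial, essential=set())
```

For this cycle the search restores three edges and finds the two directed three-cycles named in the Fig. 1C caption. The exhaustive enumeration is what makes the cost grow so quickly with the number of non-essential edges.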
The fourth concept is the use of independent groups. The
computation of the described process of deducing MEGNs is
bounded by $n^3 \sum_{i=0}^{m} {}_{(I-E)}C_i \cdot (I - E - i)$, where $m$ is the
number of non-essential edges to be restored, $I$ is the number
of initially deduced edges and $E$ is the number of essential
edges. This computation is impractical, however, because
${}_{(I-E)}C_m$ increases rapidly as $I - E$ and/or $m$ increase. To
reduce the computational cost, non-essential edges are separated into ‘independent groups’ so that edges to be restored
can be deduced independently for each group (Fig. 1E). Edges
that are not explained by essential edges are chosen from non-essential edges, and these edges are divided into independent
groups so that the edges in one group do not explain those
in other groups. For each group, the minimum number of
edges with which essential edges can explain all edges in the
group are deduced. All sets of such edges are deduced for
each group, and all possible combinations of these sets are
computed to generate the MEGNs of the profiles. The
computation is bounded by
$\sum_{j=0}^{G} \sum_{i=0}^{m_j} {}_{R_j}C_i \cdot (R_j - i) \cdot n_j^3$,
where $G$ is the number of groups, $R_j$ is the number of edges in group
$j$, $n_j$ is the number of genes in group $j$ and $m_j$ is the number
of edges to be restored in group $j$.
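A short Python sketch may help to show why independent groups reduce the cost. It covers only the final combination step and the two bounds quoted above; how groups are actually identified is described in the Supplementary information and is not reproduced here. All function names, the placeholder essential edges, the edge signs and the numbers passed to the bound functions are hypothetical.

```python
from itertools import product
from math import comb
from typing import List, Set, Tuple

Edge = Tuple[str, str, int]  # (regulator, target, sign)


def combine_groups(essential: Set[Edge],
                   group_solutions: List[List[Set[Edge]]]) -> List[Set[Edge]]:
    """Build every MEGN by picking one minimum edge set from each group."""
    return [set(essential).union(*choice) for choice in product(*group_solutions)]


def bound_without_groups(n: int, I: int, E: int, m: int) -> int:
    """n^3 * sum_{i=0}^{m} C(I-E, i) * (I-E-i)."""
    return n ** 3 * sum(comb(I - E, i) * (I - E - i) for i in range(m + 1))


def bound_with_groups(groups: List[Tuple[int, int, int]]) -> int:
    """Sum over groups (R_j, n_j, m_j) of sum_{i=0}^{m_j} C(R_j, i) * (R_j-i) * n_j^3."""
    return sum(comb(R, i) * (R - i) * n ** 3
               for R, n, m in groups
               for i in range(m + 1))


# Groups modeled on Fig. 1E: G1 offers d->e or d->f, G2 offers either 3-cycle.
g1 = [{("d", "e", 1)}, {("d", "f", 1)}]
g2 = [{("h", "i", 1), ("i", "g", 1), ("g", "h", 1)},
      {("h", "g", 1), ("g", "i", 1), ("i", "h", 1)}]
essential = {("a", "b", 1), ("b", "c", 1)}   # placeholder essential edges
megns = combine_groups(essential, [g1, g2])

# Hypothetical sizes: 9 genes, 8 non-essential edges, 4 edges to restore overall,
# versus two groups with (R_j, n_j, m_j) = (2, 3, 1) and (6, 3, 3).
ungrouped = bound_without_groups(n=9, I=10, E=2, m=4)
grouped = bound_with_groups([(2, 3, 1), (6, 3, 3)])
```

With two alternatives in each of two groups, the combination step yields four MEGNs, as in Fig. 1D, while each group is searched separately over far fewer candidate subsets.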
A detailed description of the algorithm of the DBRF–
MEGN method is included in the Supplementary information.
Software implementation of this algorithm can be obtained
from the authors on request.
RESULTS
Applicability to large-scale expression profiles
obtained for real organisms
To evaluate the applicability of the DBRF–MEGN method,
we applied it to a subset of large-scale expression profiles
obtained for S. cerevisiae (Hughes et al., 2000). The set of
profiles comprises expression levels of 265 genes measured
in 265 gene deletants corresponding to those genes (see Supplementary information). Each expression level is accompanied by
a P-value, which corresponds to the significance of the difference from the expression level in the wild-type (Hughes
et al., 2000). We considered that the expression level in
the deletant is increased (decreased) when the level significantly differed from that in wild-type at P ≤ 0.01, which
is the same P-value used by Hughes et al. (2000). With
this P-value threshold, the DBRF–MEGN method deduced
829 initially deduced edges and 675 essential edges (see
Supplementary information). These essential edges deductively explained the initially deduced edges. Therefore, the
method deduced a unique MEGN of the profiles, and this
MEGN consisted only of those essential edges (Supplementary Figure S1 is a graphical representation of the MEGN).
The computation took ∼0.02 s on an Intel Pentium 4 PC
(2.8 GHz, 1 GB RAM).
The application we just described was a case in which essential edges deductively explained initially deduced edges. To
evaluate the applicability when essential edges fail to explain
initially deduced edges, we increased the number of initially
deduced edges by increasing the P-value threshold from 0.01
to 0.05. Essential edges failed to explain initially deduced
edges when the threshold was 0.03 and 0.05. In these two
cases, the DBRF–MEGN method successfully deduced 2 and
16 384 MEGNs, respectively (see Supplementary information), and the computation took ∼0.02 and 0.75 s, respectively.
These results show the applicability of the DBRF–MEGN
method to actual large-scale expression profiles.
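The effect of the threshold on the number of initially deduced edges can be sketched as follows. This is an assumption-laden illustration: it supposes that each measurement is available as a log expression ratio (deletant versus wild-type) together with its P-value, and the function name `call_change` and the example numbers are hypothetical.

```python
def call_change(log_ratio: float, p_value: float, threshold: float = 0.01) -> int:
    """Classify one measurement as increased (+1), decreased (-1) or unchanged (0).

    A change is called only when the difference from the wild-type is
    significant at P <= threshold.
    """
    if p_value > threshold or log_ratio == 0:
        return 0
    return 1 if log_ratio > 0 else -1


# The same hypothetical measurement is ignored at T = 0.01 but produces a
# 'decreased' call, and hence a positive edge, at T = 0.05.
strict = call_change(log_ratio=-0.2, p_value=0.03, threshold=0.01)
lenient = call_change(log_ratio=-0.2, p_value=0.03, threshold=0.05)
```

Raising the threshold therefore can only add calls, which is why the number of initially deduced edges grows and essential edges eventually fail to explain all of them.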
Validity of gene networks deduced by the
DBRF–MEGN method: pheromone response
pathway
The pheromone response pathway is one of the best characterized cellular cascades in S. cerevisiae. The 265 genes of
the expression profiles we used include most genes of this
pathway. Therefore, to evaluate the validity of the DBRF–
MEGN method, we first compared the MEGN deduced from
the expression profiles for the 265 S. cerevisiae genes with the
known gene network in this pathway reported in the literature
(Fig. 2).
First, we focused on transcriptional regulations by Ste12p,
which is the central transcription factor in the pheromone
response pathway. Because the expression profiles applied to
the DBRF–MEGN method were a collection of mRNA levels
in gene deletants, an edge directing from STE12 was expected
to represent a transcriptional regulation by Ste12p. Among the
265 genes in the profiles, 6 genes are transcriptional targets of
Ste12p (Fig. 2A) (Errede and Ammerer, 1989; Sprague and
Thorner, 1992; Oehlen et al., 1996; Oehlen and Cross, 1998;
Ren et al., 2000; Roberts et al., 2000). The method cannot
deduce self-regulations because of its assumption (see Fig. 1A
for a schematic of this assumption). Therefore, five positive
edges directing from STE12 to FAR1, FUS3, SST2, STE2 and
TEC1 were expected to be deduced (Fig. 2B). As expected,
the method deduced five edges directing from STE12, all five
of which were positive edges directing to each of those five
genes (Fig. 2C).
Next, we focused on the post-transcriptional regulation
cascade that regulates Ste12p activity. Deletion of a single
gene in this cascade increases (decreases) Ste12p activity,
which then increases (decreases) the STE12 mRNA level
because Ste12p self-increases its own transcription (Ren et al.,
2000). The applied expression profiles were a collection of
mRNA levels in gene deletants. Therefore, an edge directing from a gene to STE12 was expected to indicate the
existence of a post-transcriptional regulation cascade from
this gene to Ste12p, unless the gene is a transcriptional
regulator. Among the 265 genes, 11 are involved in the
post-transcriptional regulation cascade that regulates Ste12p
activity (Fig. 2A) (Tedford et al., 1997; Roberts et al., 2000;
Elion, 2001). However, deletion of 6 of those 11 genes was
not expected to affect the STE12 mRNA level for the following
three reasons. First, STE2 encodes the receptor for α-factor
(Jenness et al., 1983); the receptor would not be activated
in any of the experiments in which gene expression profiles
were measured because MATa cells, which do not secrete
α-factor (Herskowitz et al., 1992), were used in these experiments. Second, deletion of STE20 does not completely block
pheromone-induced Ste12p activation, suggesting unidentified pathways that bypass Ste20p activity (Ramer and Davis,
1993). Third, FUS3 and KSS1 (Elion et al., 1991) and DIG1
and DIG2 (Tedford et al., 1997) are functionally redundant. Therefore, five positive edges directing to STE12 from
STE4, STE5, STE7, STE11 and STE18 were expected to be
deduced by the DBRF–MEGN method (Fig. 2B). As expected, the method deduced five edges directing to STE12, all
five of which were positive edges directed from each of those
five genes (Fig. 2C). These results show the validity of the
DBRF–MEGN method.
In addition to the 10 edges just described, the DBRF–
MEGN method deduced 2 unexpected edges (STE20 to STE2
and DIG1 to SST2) in this pathway.
Fig. 2. Validation of MEGN in the S. cerevisiae pheromone response
pathway. (A) Known pheromone response pathway. Six transcriptional regulations (red edges) and 14 post-transcriptional regulations
(light-green edges) were reported previously (Errede and Ammerer,
1989; Sprague and Thorner, 1992; Oehlen et al., 1996; Tedford et al.,
1997; Oehlen and Cross, 1998; Ren et al., 2000; Roberts et al., 2000;
Elion, 2001). (B) Expected edges in the pheromone response pathway. Five edges from STE12 to transcriptional targets (red edges)
and five edges from post-transcriptional regulators to STE12 (green
edges) were expected. Six edges from post-transcriptional regulators to STE12 (dotted green edges) were not expected because of
the experimental conditions or the redundancy of gene regulation.
(C) The MEGN in the pheromone response pathway. Five expected
edges from STE12 (red edges), five expected edges to STE12 (green
edges) and two unexpected edges (blue edges) were deduced.
Validity of gene networks deduced by the
DBRF–MEGN method: general amino acid
control system
The general amino acid control system is a cross-pathway regulatory system that regulates many genes encoding amino acid
biosynthesis enzymes and increases their expression under
conditions of amino acid starvation in S. cerevisiae. We next
evaluated the validity of the DBRF–MEGN method in this
system.
First, we focused on transcriptional regulations by Gcn4p,
which is the central transcriptional regulator in the general amino acid control system. Gcn4p is required for full
induction of 539 genes in response to histidine starvation,
suggesting transcriptional regulation of these genes by Gcn4p
(Natarajan et al., 2001). As with the transcriptional
regulations by Ste12p, an edge directing from GCN4 was expected to represent a transcriptional regulation by Gcn4p (see
the second paragraph of the previous section). The DBRF–
MEGN method deduced two edges directing from GCN4.
Consistent with the expectation, both edges were positive, each directing to one of the suggested 539 transcriptional
targets of Gcn4p (GCN4 to ALD5 and GCN4 to HIS1).
Although the 265 genes in the profiles include 18 of those 539
transcriptional targets, the method deduced only two edges
directing from GCN4 to those targets. However, this finding does not indicate low sensitivity of the method, because
Gcn4p is not always required for the basal expression of
genes whose induction in response to amino acid starvation
depends on Gcn4p (Pellman et al., 1990; Hinnebusch et al.,
1992).
Next, we focused on modulations of Gcn4p activity. Deletion of a gene that encodes a modulator of Gcn4p activity
increases (decreases) Gcn4p activity, which then increases
(decreases) the mRNA level of transcriptional targets of
Gcn4p. As described in the previous paragraph, ALD5 and
HIS1 are putative transcriptional targets of Gcn4p. Therefore,
we expected an edge directing from a gene to ALD5 or HIS1
to indicate the existence of modulation of Gcn4p activity by
this gene unless the gene is a transcriptional regulator. The
265 genes in the profiles include 12 genes (CKA2, CKB2,
MED2, RAD6, RPL12A, RPL20A, RPL27A, RPL6B, RPL8A,
RPS24A, RTS1 and UBR1) that encode modulators of Gcn4p
activity (Feng et al., 1994; Kornitzer et al., 1994; van den
Heuvel et al., 1995; Hinnebusch, 1997; Planta and Mager,
1998; Myers et al., 1999; Cherkasova and Hinnebusch, 2003;
Wang and Jiang, 2003; Mewes et al., 2004). We found 19
genes (ADE2, ASE1, CKB2, ERG2, FKS1, IMP2’, MED2,
RML2, RPL27A, RPS24A, RTG1, RTS1, SIR4, SOD1, UBR1,
VPS8, YHL029C, YMR014W and YMR293C) from which
edges deduced by the DBRF–MEGN method directed to either
ALD5 or HIS1 or both. Consistent with the expectation, these
19 genes included 6 (CKB2, MED2, RPL27A, RPS24A, RTS1
and UBR1) that encode modulators of Gcn4p activity. Three
of these six genes (CKB2, RTS1 and UBR1) have edges directing to both ALD5 and HIS1, whereas the remaining three
have a single edge directing to either ALD5 or HIS1 (MED2
and RPS24A to ALD5, and RPL27A to HIS1), suggesting
modulation of specific gene transcription or crosstalk among
modulators of different transcriptional regulators.
As just described, the 265 genes in the profiles included
12 genes encoding modulators of Gcn4p activity, and edges
directing to ALD5 or HIS1 were deduced from 6 of those
12 genes. Deletion of the remaining six genes (CKA2,
RAD6, RPL12A, RPL20A, RPL6B and RPL8A) was expected not to affect the mRNA levels of ALD5 and HIS1 for
the following reasons. CKA2 is functionally redundant to
CKA1 (Padmanabha et al., 1990). RPL12A, RPL20A, RPL6B
and RPL8A encode ribosomal proteins, many of which are
duplicated in S. cerevisiae (Planta and Mager, 1998; Hughes
et al., 2000). Absence of RAD6, which encodes a specific
ubiquitin conjugating enzyme required for Gcn4p degradation, mildly inhibits Gcn4p degradation (Kornitzer et al.,
1994). Among the 19 genes from which edges directed to
either ALD5 or HIS1 or both, 13 (ADE2, ASE1, ERG2,
FKS1, IMP2’, RML2, RTG1, SIR4, SOD1, VPS8, YHL029C,
YMR014W and YMR293C) do not encode known modulators
of Gcn4p activity. However, this finding does not indicate low
specificity of the deduction by the DBRF–MEGN method,
because activity of cellular processes involving these genes
may influence Gcn4p activity. For example, IMP2’ is involved
in carbohydrate metabolism, and glucose limitation stimulates translation of Gcn4p (Donnini et al., 1992; Yang et al.,
2000). These results support the validity of the DBRF–MEGN
method.
Validity of gene networks deduced by the
DBRF–MEGN method: copper and iron
homeostasis system
To evaluate the validity of the DBRF–MEGN method in the
copper and iron homeostasis system, we first focused on transcriptional regulations by Mac1p and Aft1p, both of which
play key roles in this system. An edge directing from MAC1
was expected to represent either a direct transcriptional regulation by Mac1p or an indirect transcriptional regulation by
Mac1p through Aft1p for the following two reasons. First,
the absence of MAC1 increases the expression of AFT1 and
its transcriptional targets (De Freitas et al., 2004). Second, the
265 genes in the profiles include MAC1 but not AFT1. Among
the 265 genes in the profiles, 3 (ERG3, FRE6 and MNN1)
are transcriptionally regulated by Mac1p (Georgatsou and
Alexandraki, 1999; De Freitas et al., 2004) and 2 (ERG3 and
FRE6) are transcriptionally regulated by Aft1p (Martins et al.,
1998; Rutherford et al., 2003). The DBRF–MEGN method
deduced two edges directing from MAC1. Consistent with the
expectation, these two edges were negative edges directing to
ERG3 and FRE6. The edge directing from MAC1 to MNN1
was deduced when the P-value threshold was set to 0.02,
which is consistent with the imperfect reproducibility of the
reduction of MNN1 expression in mac1 cells (De Freitas
et al., 2004).
Next, we focused on modulations of Mac1p and Aft1p activities. As described in the previous paragraph, ERG3 and FRE6
are transcriptionally regulated by Mac1p and Aft1p. Therefore, we expected that an edge directing from a gene to ERG3
or FRE6 would indicate the existence of a modulation of
Mac1p or Aft1p by this gene unless the gene is a transcriptional regulator, similar to the situation involving modulators
of Gcn4p activity (see the third paragraph of the previous
section). Genes crucial for vacuolar functions are involved
in modulations of Mac1p and Aft1p activities, whereas those
crucial for mitochondrial functions are involved in modulations of Aft1p for the following two reasons. First, Mac1p
activity is downregulated by its direct binding to copper ions
(Jensen and Winge, 1998), whereas Aft1p activity is regulated
by its nuclear localization in response to cellular iron status
(Yamaguchi-Iwai et al., 2002). Second, vacuoles are crucial
for copper and iron homeostasis, whereas mitochondria are
crucial for iron homeostasis (De Freitas et al., 2003). We found
13 genes (AEP2, AFG3, CUP5, ERG2, ERG28, MRPL33,
RML2, RSM18, SSN6, VMA8, YEL044W, YMR031W-A and
YMR293C) from which edges deduced by the DBRF–MEGN
method directed to either ERG3 or FRE6 or both. Consistent
with the above expectation, these 13 genes included 8 that are
crucial for either vacuolar (CUP5 and VMA8) or mitochondrial (AFG3, AEP2, MRPL33, RML2, RSM18 and YMR293C)
functions (Kang et al., 1991; Finnegan et al., 1995; Paul and
Tzagoloff, 1995; Arlt et al., 1996; Pan and Mason, 1997;
Forgac, 1999; Hughes et al., 2000). For the remaining five
genes, the method deduced five edges directing to ERG3 or
FRE6 (ERG2 to ERG3, ERG28 to ERG3, SSN6 to ERG3,
YEL044W to FRE6 and YMR031W-A to FRE6). Two of these
five edges are consistent with the previously reported gene
regulations, although it is unknown whether Mac1p or Aft1p
plays a role in these regulations (ERG2 to ERG3 and SSN6 to
ERG3; Arthington-Skaggs et al., 1996; Vik and Rine, 2001;
Kwast et al., 2002). It is possible that those edges represent
modulations of transcriptional regulators other than Mac1p
and Aft1p because the 265 genes in the profiles do not include
all transcriptional regulators. The remaining three edges were directed from recently characterized genes with little information
(ERG28; Hughes et al., 2000) or from uncharacterized open
reading frames (YEL044W, YMR031W-A). These results again
support the validity of the DBRF–MEGN method.
Table 1. Performance of the DBRF–MEGN method
| System | Measure | Edge set | T = 0.005 | T = 0.01 | T = 0.02 | T = 0.03 | T = 0.04 | T = 0.05 |
|---|---|---|---|---|---|---|---|---|
| Pheromone response pathway | Sensitivity | IDE | 0.90 (9/10) | 1.00 (10/10) | 1.00 (10/10) | 1.00 (10/10) | 1.00 (10/10) | 1.00 (10/10) |
| Pheromone response pathway | Sensitivity | MEGN | 0.90 (9/10) | 1.00 (10/10) | 1.00 (10/10) | 1.00 (10/10) | 0.90 (9/10) | 0.90 (9/10) |
| Pheromone response pathway | Specificity | IDE | 0.90 (9/10) | 0.91 (10/11) | 0.83 (10/12) | 0.71 (10/14) | 0.71 (10/14) | 0.63 (10/16) |
| Pheromone response pathway | Specificity | MEGN | 0.90 (9/10) | 1.00 (10/10) | 1.00 (10/10) | 0.91 (10/11) | 0.90 (9/10) | 0.69 (9/13) |
| General amino acid control system | Sensitivity | IDE | 0.18 (7/38) | 0.29 (11/38) | 0.34 (13/38) | 0.37 (14/38) | 0.39 (15/38) | 0.42 (16/38) |
| General amino acid control system | Sensitivity | MEGN | 0.18 (7/38) | 0.29 (11/38) | 0.34 (13/38) | 0.32 (12/38) | 0.26 (10/38) | 0.29 (11/38) |
| General amino acid control system | Specificity | IDE | 0.23 (7/31) | 0.25 (11/44) | 0.23 (13/57) | 0.20 (14/69) | 0.20 (15/75) | 0.19 (16/83) |
| General amino acid control system | Specificity | MEGN | 0.27 (7/26) | 0.42 (11/26) | 0.39 (13/33) | 0.41 (12/29) | 0.33 (10/30) | 0.38 (11/29) |
Sensitivity and specificity of the DBRF–MEGN method for the pheromone response pathway and the general amino acid control system, with the actual numbers given in parentheses.
A total of 10 gene regulations in the pheromone response pathway were used as the ‘gold standard’ for calculating sensitivity and specificity (STE12 to FAR1, FUS3, SST2, STE2
and TEC1, and STE4, STE5, STE7, STE11 and STE18 to STE12). A total of 38 gene regulations in the general amino acid control system were used as the ‘silver standard’ (GCN4
to ADE1, ALD5, ARG80, CBP2, ECA39, GLN3, HIS1, IMP2’, PET117, STB4, STE11, YAL004W, YEL059W, YER024W, YER033C, YHL045W, YIL037C and YOR072W, and CKB2,
MED2, RPL12A, RPL20A, RPL27A, RPL6B, RPL8A, RPS24A, RTS1 and UBR1 to ALD5 or HIS1). Four gene regulations that represent modulations of Gcn4p activity by Cka2p or
Rad6p (CKA2 to ALD5 and HIS1, and RAD6 to ALD5 and HIS1) were excluded because these regulations were not expected to be deduced (Padmanabha et al., 1990; Kornitzer et al.,
1994). All edges that were included in at least one MEGN were used in calculating sensitivity and specificity for T = 0.03 and 0.05. Note that sensitivity for the general amino acid
control system was underestimated because not all gene regulations of the silver standard are necessarily expected to be deduced (see the section Validity of gene networks deduced
by the DBRF–MEGN method: general amino acid control system). Specificity for the general amino acid control system was also underestimated because of the possible existence
of uncharacterized gene regulations. T, P-value threshold; IDE, initially deduced edges.
Optimal P-value threshold for the DBRF–MEGN
method
To determine the optimal P-value threshold for the DBRF–
MEGN method, we examined the sensitivity and specificity
of the method at various P-value thresholds (Table 1). For calculating the sensitivity and specificity, the 10 gene regulations
that were expected to be deduced in the pheromone response
pathway were used as the ‘gold standard’ (see the section
Validity of gene networks deduced by the DBRF–MEGN
method: pheromone response pathway). The 38 gene regulations in the general amino acid control system were used as the
‘silver standard’, because those regulations were not necessarily expected to be deduced (see the section Validity of gene
networks deduced by the DBRF–MEGN method: general
amino acid control system). As expected, sensitivity increased
as the threshold increased. For the pheromone response pathway, sensitivity was highest when the threshold was between
0.01 and 0.03. Interestingly, a decrease in sensitivity was
observed as the threshold increased to ≥0.03. This decrease is
a result of the removal of true-positive edges that are explained
by two false-positive edges and those that are explained by
a combination of false-positive and true-positive edges during the removal of redundant edges, indicating that
increased thresholds (≥0.03) do not provide the highest sensitivity because of the increased number of false-positive edges.
In contrast, specificity decreased as the threshold increased.
Notably, decreased thresholds (≤0.005) did not provide
the highest specificity because the number of deduced edges
was too small for efficient removal of false-positive edges during the removal of redundant edges. In light of these
results, we concluded that 0.01 is an optimal threshold that
provides a good balance of sensitivity and specificity.
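As a rough illustration of how sensitivity and specificity can be computed for a set of deduced edges, the sketch below uses the textbook definitions (true-positive rate and true-negative rate) over all ordered gene pairs. Whether Table 1 uses exactly these definitions is an assumption, and the function name `sensitivity_specificity` and the three-gene example are hypothetical.

```python
from itertools import permutations


def sensitivity_specificity(deduced, gold, genes):
    """Return (sensitivity, specificity) for directed edges.

    deduced, gold: iterables of (source, target) tuples.
    genes: all genes under consideration; every ordered pair of
    distinct genes is treated as a candidate edge (an assumption).
    """
    deduced, gold = set(deduced), set(gold)
    candidates = set(permutations(genes, 2))  # ordered pairs, no self-loops
    tp = len(deduced & gold)                  # expected edges that were deduced
    fn = len(gold - deduced)                  # expected edges that were missed
    fp = len(deduced - gold)                  # deduced edges that were not expected
    tn = len(candidates - deduced - gold)     # pairs neither expected nor deduced
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec


# Hypothetical toy data, not taken from the yeast profiles.
toy_genes = ["A", "B", "C"]
toy_gold = {("A", "B"), ("B", "C")}
toy_deduced = {("A", "B"), ("A", "C")}
print(sensitivity_specificity(toy_deduced, toy_gold, toy_genes))
```

Repeating such a calculation for the edges deduced at each P-value threshold would give one sensitivity/specificity pair per threshold, which is the kind of comparison summarized in Table 1.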
Prediction of transcriptional targets and
modulators of transcriptional activity from
MEGNs
As described in the previous three sections, the DBRF–MEGN
method successfully deduced expected edges directing from
Ste12p, Gcn4p and Mac1p to their transcriptional targets,
those directing from its modulators (post-transcriptional regulators) to Ste12p, and those directing from their modulators to
transcriptional targets of Gcn4p and Mac1p. These successful
deductions suggest that transcriptional targets and modulators of a given transcriptional regulator can be predicted from
MEGNs by interpreting edges directing from the transcriptional regulator and those directing to the regulator or its
transcriptional targets. We examined such possible predictions
as follows.
Fig. 3. Schemes to predict transcriptional targets and modulators of transcriptional activity from MEGNs. (A) Scheme for a transcriptional
regulator that self-regulates its transcription. A gene is predicted to be a transcriptional target when an edge directs from this regulator to the
gene (red edge). A gene is predicted to be a modulator of transcriptional activity when an edge directs from the gene to the regulator (green
edge). (B) Scheme for a transcriptional regulator that does not self-regulate its own transcription. A gene is predicted to be a transcriptional
target when an edge directs from this regulator to the gene (red edge). A gene is predicted to be a modulator of transcriptional activity when
an edge directs from the gene to transcriptional targets of this regulator (green edge).
First, we considered the transcriptional targets of a given
transcriptional regulator. An edge is expected to represent
a transcriptional regulation by the transcriptional regulator
when it directs from this regulator (Fig. 3), as was discussed
for the transcriptional regulations by Ste12p (see the second
paragraph of the section Validity of gene networks deduced by
the DBRF–MEGN method: pheromone response pathway).
Therefore, a gene is predicted to be a transcriptional target of
a given transcriptional regulator when an edge directs from
this regulator to the gene.
Next, we considered modulators of transcriptional activity
of a given transcriptional regulator. In this case, two alternative prediction schemes (schematically represented in Fig. 3A
and B for prediction of modulators of transcriptional activity)
should be used, depending on whether the given regulator self-regulates its own transcription or not. In the first alternative
scheme, when the regulator self-regulates its transcription,
deletion of a gene that encodes a modulator of the activity
of this regulator increases (decreases) the mRNA level of
this regulator, as was discussed for the post-transcriptional
regulation cascade of Ste12p (see the third paragraph of the
section Validity of gene networks deduced by the DBRF–
MEGN method: pheromone response pathway). Therefore, a
gene is predicted to be a modulator of transcriptional activity
of a given transcriptional regulator when an edge directs from
the gene to the given regulator (Fig. 3A).
In the second alternative scheme, when the given transcriptional regulator does not self-regulate its own transcription,
deletion of a gene that encodes a modulator of the activity of this regulator is expected not to influence the mRNA
level of this regulator. Such deletion increases (decreases) the
activity of this regulator, which then increases (decreases) the
mRNA level of its transcriptional targets, as was discussed
for the modulations of Gcn4p activity (see the third paragraph of the section Validity of gene networks deduced by
the DBRF–MEGN method: general amino acid control system). Therefore, a gene is predicted to be a modulator of the
activity of a given transcriptional regulator when an edge directs from the gene to transcriptional targets of this regulator
(Fig. 3B).
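The two prediction schemes can be sketched as simple queries on a list of signed directed edges. This is a minimal sketch under stated assumptions: the edge representation, the function names `predict_targets` and `predict_modulators`, and the gene names `R`, `T1`, `T2`, `M1` and `M2` are hypothetical, and the sketch returns gene names only, without assigning a positive or negative effect.

```python
def predict_targets(edges, regulator):
    """Genes that receive an edge from the regulator (Fig. 3A and B)."""
    return {dst for src, dst, _sign in edges
            if src == regulator and dst != regulator}


def predict_modulators(edges, regulator, self_regulates):
    """Genes predicted to modulate the regulator's transcriptional activity.

    self_regulates=True  -> edges pointing at the regulator itself (Fig. 3A).
    self_regulates=False -> edges pointing at the regulator's targets (Fig. 3B).
    """
    if self_regulates:
        return {src for src, dst, _sign in edges
                if dst == regulator and src != regulator}
    targets = predict_targets(edges, regulator)
    return {src for src, dst, _sign in edges
            if dst in targets and src != regulator}


# Hypothetical toy network: (source, target, sign).
toy_edges = [
    ("R", "T1", "+"),
    ("R", "T2", "-"),
    ("M1", "R", "+"),
    ("M2", "T1", "+"),
]
print(predict_targets(toy_edges, "R"))
print(predict_modulators(toy_edges, "R", self_regulates=True))
print(predict_modulators(toy_edges, "R", self_regulates=False))
```

In this toy network, `M1` is predicted as a modulator under the first scheme and `M2` under the second, which mirrors the green edges of Fig. 3A and 3B respectively.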
Of the 265 genes in the profiles, 18 are listed as ‘transcriptional regulators’ in the Saccharomyces Genome Database
(Cherry et al., 1998; see gene list in Supplementary information). Based on the above two schemes for transcriptional
targets and modulators of transcriptional activity, we predicted transcriptional targets and modulators of transcriptional
activity of those 18 genes from the MEGN deduced by the
DBRF–MEGN method from the expression profiles of the
265 S. cerevisiae genes (Table 2). Nearly half (132) of the
265 genes were thus predicted as transcriptional targets or
modulators of transcriptional activity or both.
An important feature of a gene regulatory network is
crosstalk between cellular processes (Schwikowski et al.,
2000; Hinnebusch and Natarajan, 2002; Brun et al., 2003;
Pawson and Nash, 2003). Because of crosstalk, it is expected that the activity of a single transcriptional regulator is
modulated by genes involved in diverse cellular processes
and that genes involved in a single cellular process modulate the activity of several different transcriptional regulators.
To confirm the capability of our prediction schemes to predict
such crosstalk-dependent modulations, we compared modulators predicted by our schemes with gene clusters generated
by hierarchical clustering using the full set of profiles in the
Rosetta Compendium (Hughes et al., 2000); genes belonging
to the same cluster are likely to function in the same cellular process (Eisen et al., 1998). As expected, the predicted
modulators of most transcriptional regulators involved genes
belonging to several different clusters, and genes belonging
to the same cluster were involved in the modulators of several different transcriptional regulators (Table 2). The results
indicate that our prediction scheme can predict modulators
that modulate activity of transcriptional regulators through
crosstalk of cellular processes.
DISCUSSION
We developed the DBRF–MEGN method, an algorithm for
deducing the most parsimonious SDGs consistent with large-scale expression profiles of gene deletants. One key feature of
this method is compensation for excessively removed edges
by restoring a minimum number of non-essential edges. This
makes the method applicable not only to acyclic gene networks but also to cyclic gene networks. Our previous method,
the DBRF method, fails to deduce the most parsimonious SDG
when the target network is cyclic (Kyoda et al., 2000). This
prevents the DBRF method from being widely used in the
analysis of large-scale gene expression profiles, because real
gene networks contain many feedback loops (Ferrell, 2002;
Guelzim et al., 2002; Lee et al., 2002). The applicability of
the DBRF–MEGN method to cyclic gene networks most probably will greatly improve the effectiveness of large-scale gene
expression profiles.
Another key feature of the DBRF–MEGN method is the
implementation of independent groups of non-essential edges.
This feature makes the method applicable to large-scale gene
expression profiles by greatly reducing the computational
cost of the process that deduces the minimum number of
non-essential edges for the compensation. The method successfully deduced MEGNs from the large-scale expression
profiles of 265 genes in 0.75 s, even when 16 384 different MEGNs exist. Without implementation of independent
groups, such a deduction would take 3.8×10^15 years to obtain
the same results. Despite the great reduction in computational
cost by the use of independent groups, there is no guarantee
that the method will deduce MEGNs from any given expression profiles in an acceptable time. The cost depends on the
maximum number of edges among all independent groups,
and this number corresponds to the modularity of the gene
network. Gene networks are predicted to be highly modular
(Hartwell et al., 1999; Ravasz et al., 2002; Rives and Galitski,
2003). Therefore, the DBRF–MEGN method will most probably
deduce MEGNs from most sets of expression profiles in an
acceptable time.
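The combinatorial argument behind this cost reduction can be illustrated with a back-of-the-envelope count. The sketch assumes, purely for illustration, that the search examines every subset of non-essential edges; the actual search procedure of the method is not reproduced here, and the group sizes are hypothetical.

```python
def subsets_without_groups(group_sizes):
    """Subsets to examine if all non-essential edges are searched jointly."""
    return 2 ** sum(group_sizes)


def subsets_with_groups(group_sizes):
    """Subsets to examine if each independent group is searched separately."""
    return sum(2 ** n for n in group_sizes)


# Hypothetical sizes of three independent groups of non-essential edges.
sizes = [3, 4, 5]
print(subsets_without_groups(sizes))  # joint search over 12 edges
print(subsets_with_groups(sizes))     # separate searches over 3, 4 and 5 edges
```

Under this assumption the joint search grows exponentially with the total number of non-essential edges, whereas the grouped search is dominated by the largest group, which is consistent with the statement that the cost depends on the maximum number of edges among all independent groups.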
A major advantage of the DBRF–MEGN method is the
representation of gene networks. A gene network deduced
by this method is represented by SDG, the most common
representation of gene networks in genetics and cell biology. This commonness allows the deduced gene networks
to be compared with those identified through classical small-scale experiments and to be interpreted in the same way as
those small-scale gene networks. We compared the pheromone response pathway, general amino acid control system,
and copper and iron homeostasis system deduced by the
DBRF–MEGN method with those reported in the literature,
and we predicted the transcriptional targets and modulators of
transcriptional activity of 18 transcriptional regulators
from the MEGN of the expression profiles of 265
gene deletants. MEGNs probably will provide effective links
between large- and small-scale gene network analyses and will
provide important clues to understanding cellular function.
Another advantage of the DBRF–MEGN method is the
removal of redundant edges. The method removes as many
non-essential edges as possible from the initially deduced
edges. This makes the deduced graph simpler and lets it represent the molecular mechanisms of gene networks more directly. By
the DBRF–MEGN method, ∼20% of edges (154 of 829)
were removed from the edges initially deduced
from S. cerevisiae gene expression profiles. In the pheromone response pathway, ∼65% of edges (23 out of 35)
were removed. All these removed edges represent indirect
regulations from post-transcriptional gene regulators to transcriptional targets of Ste12p. Therefore, removal of these
edges simplified interpretation of the MEGN in the pheromone
response pathway.
The DBRF–MEGN method has wider applicability and
higher performance than the predictor method (Ideker
et al., 2000), Pe’er et al.’s method (Pe’er et al., 2001)
and Wagner’s method (Wagner, 2001). The predictor method
(Ideker et al., 2000) is applicable only to acyclic gene networks, and its performance is lower than that of the DBRF
method because of the Boolean modeling of gene networks
in the predictor method (Kyoda et al., 2000). The DBRF–
MEGN method is applicable to both acyclic and cyclic gene
networks and is an improvement over the DBRF method. Pe’er
et al.’s method estimates a Bayesian network that models gene
networks, whereas the DBRF–MEGN method computes the
Table 2. Predicted transcriptional targets and modulators of transcriptional activity
Columns: Transcriptional regulators; Effect*; Modulators of transcriptional activity; Transcriptional targets
ADE2 CKB2 MED2 RML2 RPS24A RTG1 RTS1 SOD1 VPS8
ALD5 HIS1
GCN4
P
YMR014W YMR293C
GLN3
N
ASE1 ERG2 FKS1 IMP2' RPL27A SIR4 UBR1 YHL029C
P
ASE1 BIM1 CEM1 CLB2 DOT4 ISW2 MRT4 NPR2 PET117 RNR1
N
AEP2 PFD2 RTG1 YOR051C
P
AEP2 AFG3 CUP5 ERG2 ERG28 MRPL33 RML2 RSM18 SSN6
RPL20A SBH2 SPF1 SSN6 SST2 VMA8 YHR011W YMR031W-A
MAC1
AQY2A AQY2B YER024W
VMA8 YMR031W-A YMR293C
MBP1
N
YEL044W
P
YHR031C YOR080W
ERG3 FRE6
N
ERG2 ERG3 ERG28 MED2 RAD6
STE12
P
STE4 STE5 STE7 STE11 STE18
FAR1 FUS3 SST2 STE2 TEC1
RNR1
SWI4
P
AEP2 AFG3 CEM1 CUP5 DIG1 ERG2 ERG3 ERG28 FKS1 FUS3
CLB2 CLB6 ERG6 HST3 MNN1 MRT4
GAS1 HPT1 MED2 MSU1 QCR2 RML2 RPL12A RPL27A RPS24A
SWI5
RPS27B RSM18 SCS7 SGS1 SHE4 SIN3 SIR4 SPF1 SSN6 UBR1
VMA8 YEL044W YER083C YHL029C YHR011W YMR014W
YMR031W-A YMR293C YOR078W
N
CKB2 CUP5 DOT4 ERG2 ERG3 MED2 RML2 RPD3 RPL8A
ADE1 ADE2 ALD5 ARG80 ERG4
RPL12A RPL27A RPS24A RPS27B RTG1 RTS1 SIR4 SSN6 VMA8
YEL047C YHL029C YML011C YMR009W
VPS8 YEL044W YMR014W YMR293C YOR078W
SWI5
P
AEP2 ASE1 CLB2 CUP5 DIG1 FKS1 FUS3 GAS1 GYP1 IMP2
JNM1 KIM4 KIN3 MSU1 OST3 QCR2 RAD57 RPS27B RSM18
SCS7 SGS1 SIR2 SST2 VMA8 YAR014C
TEC1
N
YER030W
YIL117C
P
YER030W
YIL117C
N
AEP2 ASE1 CLB2 CUP5 DIG1 FKS1 FUS3 GAS1 GYP1 IMP2
JNM1 KIM4 KIN3 MSU1 OST3 QCR2 RAD57 RPS27B RSM18
SCS7 SGS1 SIR2 SST2 VMA8 YAR014C
TUP1
P
N
ADE2 AFG3 ERG28 MRT4 QCR2 RML2 RPD3 RRP6 RSM18 RTS1
ARD1 ERG2 FUS3 GYP1 HST3 ISW1
SHE4 SIN3 SIR4 SSN6 TOP3 VMA8 YEL033W YEL044W YHR011W
KIN3 KSS1 RAD27 RPS24A SIR2
YMR031W-A YOR078W YOR080W
YER083C YMR258C YOR015W
AEP2 CUP5 ECM18 ERG3 ERG4 FUS3 HMG1 HOG1 RAD6
AQY2A BUB2 CAT8 CIN5 ECM34 GPA2
RML2 RPD3 RPS27B SIN3 SST2 YER083C
MAC1 MAK10 NTA1 PEP12 PHD1 STE4
SWI4 UTR4 VPS21 YEL044W YEL059W
YEL067C YER033C YHL029C YHR022C
YHR031C YMR031C YOR009W
YOR051C
YAP1
P
ASE1 BIM1 CEM1 CLB2 ISW2 NPR2 PET117 RNR1 RPL20A SBH2
N
PFD2 RTG1 YOR051C
SOD1
SPF1 SST2 VMA8
YER024W
A total of 18 transcriptional regulators that are both listed as ‘transcriptional regulators’ in the Saccharomyces Genome Database (Cherry
et al., 1998) and included in the 265 genes in the profiles were analyzed (see gene list in Supplementary information). Transcriptional
targets and modulators of transcriptional activity of these 18 transcriptional regulators were predicted from the MEGN deduced from the
expression profiles of 265 S. cerevisiae gene deletants based on the two prediction schemes shown in Figure 3. All predicted transcriptional
targets and modulators of transcriptional activity are shown. Gene clusters reported by Hughes et al. (2000) are represented by different
colors: mitochondrial function (yellow), cell wall (brown), protein synthesis (sky blue), ergosterol biosynthesis (orange), mating (violet),
MAPK activation (turquoise), rnr1 HU (red), histone deacetylase (blue), isw (purple), vacuolar ATPase/iron regulation (bright pink), sir
(grey), tup1 ssn6 (light green), Gcn4 down (green) and Gcn4 up (bright green).
*Indicates the positive (P) or negative (N) effect of transcriptional regulation or modulation of transcriptional activity.
exact solution of the most parsimonious SDG that is consistent with expression profiles. Pe’er et al.’s method failed
to infer all gene regulations relating to STE12 (Pe’er et al.,
2001), whereas the DBRF–MEGN method deduced 10 such
regulations from the same expression profiles. Gene networks deduced by Wagner’s method (Wagner, 2001) have
less information than those deduced by the DBRF–MEGN
method. An edge deduced by Wagner’s method represents
the direction of gene regulation but lacks information about
whether the regulation is activation or inhibition, whereas
an edge deduced by the DBRF–MEGN method has all this
information. Wagner’s method avoids deducing the cycle
structures of gene networks, whereas the DBRF–MEGN
method deduces all candidates of such structures. Therefore,
we conclude that the DBRF–MEGN method is better than all
three of these methods.
The success of gene network deduction by the DBRF–
MEGN method depends on the experimental conditions under
which the expression profiles were obtained. The expression
profiles used in the present study were obtained from asynchronous culture of deletant strains (Hughes et al., 2000).
Therefore, the method could not deduce cell-cycle-specific
gene regulations, such as MBP1 to CLB6 (Koch et al., 1993)
and SWI6 to CLB6 (Dirick et al., 1998). The method also could
not deduce diploid- or haploid-specific gene regulations, such
as TUP1 to STE5 (Mukai et al., 1993) and SIN3 to STE2
(Vidal et al., 1991), because some profiles were obtained in
diploid cells and others were obtained in haploid cells. To
deduce these types of regulations, expression profiles would
need to be obtained through more controlled experiments,
such as inactivation of gene function at a specific period of
the cell cycle in synchronized culture and measurement of
all the expression profiles in either diploid or haploid cells.
Improvements in technologies that more accurately control
experiments, such as real-time monitoring of the cell cycle
and drug-induced rapid inactivation of gene function, will
increase the effectiveness of the DBRF–MEGN method.
A major drawback of the DBRF–MEGN method is its
inability to deduce redundant gene regulations. The method
deduces a gene regulation only when deletion of a single gene
affects the expression level of another gene. Therefore, when
two or more genes redundantly regulate a gene, the method
cannot deduce any of these regulations. In the pheromone
response pathway, the method could not deduce five redundant regulations from STE20, FUS3, KSS1, DIG1 and DIG2 to
STE12. Importantly, this is not a drawback of our algorithm
but of the expression profiles of single-gene deletants. One
possible solution is to generate expression profiles of multiple
gene deletants, although such generation might entail enormous experimental costs. We are now developing an algorithm
applicable to such expression profiles.
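The limitation can be reproduced in a toy setting. The sketch below assumes a hypothetical target gene that stays fully expressed as long as at least one of its activators is present, and it deduces signed edges by comparing each single-gene deletant with the wild type. The function names and the numerical tolerance `tol` are assumptions; the method itself relies on a P-value threshold rather than a fixed tolerance.

```python
def redundant_target_level(present_regulators):
    """Hypothetical rule: the target is on if any activator is present."""
    return 1.0 if present_regulators else 0.0


def deduce_edges(regulators, level_fn, tol=1e-9):
    """Deduce signed edges to one target from single-gene deletions.

    A deletion that lowers the target level implies positive regulation;
    a deletion that raises it implies negative regulation.
    """
    wild_type = level_fn(set(regulators))
    edges = {}
    for gene in regulators:
        mutant = level_fn(set(regulators) - {gene})
        if mutant < wild_type - tol:
            edges[gene] = "+"
        elif mutant > wild_type + tol:
            edges[gene] = "-"
    return edges


# Two redundant activators: neither single deletion changes the target.
print(deduce_edges(["A", "B"], redundant_target_level))
# A single, non-redundant activator is detected as a positive regulator.
print(deduce_edges(["A"], redundant_target_level))
```

With two redundant activators no edge is deduced, because each single deletion leaves the target level unchanged; only a double deletion would reveal the regulation, which is why profiles of multiple-gene deletants are suggested as a remedy.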
Transcriptional targets and modulators of transcriptional
activity of given transcriptional regulators can be predicted from MEGNs. This prediction is an example of the
interpretation of MEGNs. In this prediction, nearly half of
the 265 S. cerevisiae genes were predicted as transcriptional
targets or modulators of transcriptional activity or both. The
remaining genes likely are transcriptional targets or modulators of transcriptional activity of transcriptional regulators that
are not included in the 265 genes. In the pheromone response
pathway, the DBRF–MEGN method deduced two unexpected edges, STE20 to STE2 and DIG1 to SST2. In light of the
prediction schemes described in the previous section, Ste20p
is predicted to be a modulator of the transcriptional activity of
some transcriptional regulator that is not included in the 265
genes and that regulates the transcription of STE2. Similarly,
Dig1p is predicted to be a modulator of the transcriptional
activity of some transcriptional regulator that is not included
in the 265 genes and that regulates the transcription of SST2.
One possible approach to predicting the functions of those
remaining genes is to generate MEGNs from the expression
profiles of all the S. cerevisiae genes. Giaever et al. (2002) generated single deletants of almost all S. cerevisiae genes; the
expression profiles of all those deletants are highly anticipated.
Many direct gene regulations are represented in MEGNs but
not in the most parsimonious unsigned directed graphs consistent with expression profiles. The redundancy of edges is
determined by both accessibility and effect of three edges in
the DBRF–MEGN method, whereas it is determined only by
accessibility when the most parsimonious unsigned directed
graph is deduced. Therefore, a gene regulation whose accessibility is explained by two other regulations but whose effect is
not explained is represented in the MEGN but not in the most
parsimonious unsigned directed graph (e.g. regulation from
gene h to gene a in Fig. 1B). Approximately 15% of edges (104
out of 675) in the MEGN deduced from S. cerevisiae expression
profiles represent such regulations. Regulation from STE12 to
FUS3 (Roberts et al., 2000) is one such regulation. Maki et al.
(2001) proposed a combination approach, in which the most
parsimonious unsigned directed graph is deduced from the
expression profiles of gene deletants and then functions of its
edges are inferred from time-series expression profiles. Integration of the DBRF–MEGN method probably will improve the
performance of such combination approaches.
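The difference between the signed and the unsigned notion of redundancy can be sketched for the three-edge case described above. This is an illustrative sketch only: it checks two-edge paths, encodes signs as +1 and -1, and uses hypothetical gene names `x`, `y` and `z`; it is not the full redundancy-removal procedure.

```python
def two_step_signs(edge, edges):
    """Yield the sign product of every path src -> mid -> dst that avoids `edge`."""
    src, dst, _ = edge
    others = [e for e in edges if e != edge]
    for a, mid, s1 in others:
        if a != src:
            continue
        for b, c, s2 in others:
            if b == mid and c == dst:
                yield s1 * s2


def explained_by_accessibility(edge, edges):
    """Unsigned criterion: some two-edge path connects the same genes."""
    return any(True for _ in two_step_signs(edge, edges))


def explained_by_effect(edge, edges):
    """Signed criterion: some two-edge path also reproduces the edge's sign."""
    return any(sign == edge[2] for sign in two_step_signs(edge, edges))


# Hypothetical network: x activates y, y activates z, x represses z.
toy_edges = [("x", "y", +1), ("y", "z", +1), ("x", "z", -1)]
direct = ("x", "z", -1)
print(explained_by_accessibility(direct, toy_edges))
print(explained_by_effect(direct, toy_edges))
```

Here the direct edge from `x` to `z` is reachable through `y`, so an unsigned analysis would drop it, but the path through `y` has a positive net effect and cannot account for the negative direct edge, so a signed analysis keeps it.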
Large-scale gene expression profiles of gene deletants are
invaluable sources for understanding cellular functions. Clustering has been the only method widely used in the analysis
of these profiles. Although clustering can predict cellular
processes that involve the target gene, it provides no overt
information about the gene regulations, which make up gene
networks. The DBRF–MEGN method deduces gene regulations from large-scale expression profiles of gene deletants.
The method is applicable not only to expression profiles measured by using DNA microarrays but also to those measured by
using other technologies, such as 2D-PAGE-MS (Gygi et al.,
1999) and protein chips (Zhu et al., 2000). The DBRF–MEGN
method will provide fruitful information for understanding
cellular functions.
ACKNOWLEDGEMENTS
We thank K. Oka for his support and valuable discussion. We
also thank A. Kimura, S. Hamahashi and M. Morohashi for
critical comments on this manuscript. This work was supported in part by a grant from the Ministry of Agriculture, Forestry
and Fisheries of Japan (Rice Genome Project SY1106) to
H.K. and S.O.; a grant from Special Coordination Funds for
Promoting Science and Technology (to H.K. and S.O.) and
Grant-in-Aid for the 21st Century Center of Excellence (COE)
Program entitled ‘Understanding and Control of Life’s Function via Systems Biology (Keio University)’ (to H.K.), the
Ministry of Education, Culture, Sports, Science and Technology, the Japanese Government; and a grant from Institute
for Bioinformatics Research and Development (BIRD), Japan
Science and Technology Agency to S.O.
REFERENCES
Arlt,H., Tauer,R., Feldmann,H., Neupert,W. and Langer,T. (1996)
The YTA10-12 complex, an AAA protease with chaperone-like activity in the inner membrane of mitochondria. Cell, 85,
875–885.
Arthington-Skaggs,B.A., Crowell,D.N., Yang,H., Sturley,S.L. and
Bard,M. (1996) Positive and negative regulation of a sterol biosynthetic gene (ERG3) in the post-squalene portion of the yeast
ergosterol pathway. FEBS Lett., 392, 161–165.
Brun,C., Chevenet,F., Martin,D., Wojcik,J., Guénoche,A. and
Jacq,B. (2003) Functional classification of proteins for the prediction of cellular function from a protein–protein interaction
network. Genome Biol., 5, R6.
Cherkasova,V.A. and Hinnebusch,A.G. (2003) Translational control
by TOR and TAP42 through dephosphorylation of eIF2α kinase
GCN2. Genes Dev., 17, 859–872.
Cherry,J.M., Adler,C., Ball,C., Chervitz,S.A., Dwight,S.S.,
Hester,E.T., Jia,Y., Juvik,G., Roe,T., Schroeder,M. et al. (1998)
SGD: Saccharomyces Genome Database. Nucleic Acids Res., 26,
73–79.
De Freitas,J., Wintz,H., Kim,J.H., Poynton,H., Fox,T. and Vulpe,C.
(2003) Yeast, a model organism for iron and copper metabolism
studies. Biometals, 16, 185–197.
De Freitas,J.M., Kim,J.H., Poynton,H., Su,T., Wintz,H., Fox,T.,
Holman,P., Loguinov,A., Keles,S., Van der Laan,M. et al. (2004)
Exploratory and confirmatory gene expression profiling mac1.
J. Biol. Chem., 279, 4450–4458.
Dirick,L., Goetsch,L., Ammerer,G. and Byers,B. (1998) Regulation
of meiotic S phase by Ime2 and a Clb5,6-associated kinase in
Saccharomyces cerevisiae. Science, 281, 1854–1857.
Donnini,C., Lodi,T., Ferrero,I. and Puglisi,P.P. (1992) IMP2, a nuclear gene controlling the mitochondrial dependence of galactose,
maltose and raffinose utilization in Saccharomyces cerevisiae.
Yeast, 8, 83–93.
Eisen,M.B., Spellman,P.T., Brown,P.O. and Botstein,D. (1998)
Cluster analysis and display of genome-wide expression patterns.
Proc. Natl Acad. Sci., USA, 95, 14863–14868.
Elion,E.A. (2001) The Ste5p scaffold. J. Cell Sci., 114, 3967–3978.
Elion,E.A., Brill,J.A. and Fink,G.R. (1991) FUS3 represses CLN1
and CLN2 and in concert with KSS1 promotes signal transduction.
Proc. Natl Acad. Sci., USA, 88, 9392–9396.
Errede,B. and Ammerer,G. (1989) STE12, a protein involved in cell-type-specific transcription and signal transduction in yeast, is part
of protein–DNA complexes. Genes Dev., 3, 1349–1361.
Feng,L., Yoon,H. and Donahue,T.F. (1994) Casein kinase II mediates
multiple phosphorylation of Saccharomyces cerevisiae eIF-2α
(encoded by SUI2), which is required for optimal eIF-2 function
in S. cerevisiae. Mol. Cell. Biol., 14, 5139–5153.
Ferrell,J.E. (2002) Self-perpetuating states in signal transduction:
positive feedback, double-negative feedback and bistability. Curr.
Opin. Cell Biol., 14, 140–148.
Finnegan,P.M., Ellis,T.P., Nagley,P. and Lukins,H.B. (1995) The
mature AEP2 gene product of Saccharomyces cerevisiae, required
for the expression of subunit 9 of ATP synthase, is a 58 kDa
mitochondrial protein. FEBS Lett., 368, 505–508.
Forgac,M. (1999) Structure and properties of the vacuolar (H+)-ATPases. J. Biol. Chem., 274, 12951–12954.
Georgatsou,E. and Alexandraki,D. (1999) Regulated expression
of the Saccharomyces cerevisiae Fre1p/Fre2p Fe/Cu reductase
related genes. Yeast, 15, 573–584.
Giaever,G., Chu,A.M., Ni,L., Connelly,C., Riles,L., Véronneau,S.,
Dow,S., Lucau-Danila,A., Anderson,K., André,B. et al. (2002)
Functional profiling of the Saccharomyces cerevisiae genome.
Nature, 418, 387–391.
Guelzim,N., Bottani,S., Bourgine,P. and Képès,F. (2002) Topological and causal structure of the yeast transcriptional regulatory
network. Nat. Genet., 31, 60–63.
Gygi,S.P., Rochon,Y., Franza,B.R. and Aebersold,R. (1999) Correlation between protein and mRNA abundance in yeast. Mol. Cell.
Biol., 19, 1720–1730.
Hamer,L., Adachi,K., Montenegro-Chamorro,M.V., Tanzer,M.M.,
Mahanty,S.K., Lo,C., Tarpey,R.W., Skalchunes,A.R.,
Heiniger,R.W., Frank,S.A. et al. (2001) Gene discovery
and gene function assignment in filamentous fungi. Proc. Natl
Acad. Sci., USA, 98, 5110–5115.
Hartwell,L.H., Hopfield,J.J., Leibler,S. and Murray,A.W. (1999)
From molecular to modular cell biology. Nature, 402,
C47–C52.
Herskowitz,I., Rine,J. and Strathern,J. (1992) Mating-type
determination and mating-type interconversion in Saccharomyces cerevisiae. In Jones,E.W., Pringle,J.R. and Broach,J.R.
(eds), The Molecular and Cellular Biology of the Yeast
Saccharomyces. Cold Spring Harbor Laboratory Press, New York,
pp. 583–656.
Hinnebusch,A.G. (1992) General and pathway-specific regulatory
mechanisms controlling the synthesis of amino acid biosynthetic
enzymes in Saccharomyces cerevisiae. In Jones,E.W., Pringle,J.R.
and Broach,J.R. (eds), The Molecular and Cellular Biology of
the Yeast Saccharomyces. Cold Spring Harbor Laboratory Press,
New York, pp. 319–414.
Hinnebusch,A.G. (1997) Translational regulation of yeast GCN4.
A window on factors that control initiator-tRNA binding to the
ribosome. J. Biol. Chem., 272, 21661–21664.
Hinnebusch,A.G. and Natarajan,K. (2002) Gcn4p, a master regulator
of gene expression, is controlled at multiple levels by diverse
signals of starvation and stress. Eukaryot. Cell, 1, 22–32.
Hughes,T.R., Marton,M.J., Jones,A.R., Roberts,C.J., Stoughton,R.,
Armour,C.D., Bennett,H.A., Coffey,E., Dai,H., He,Y.D. et al.
(2000) Functional discovery via a compendium of expression
profiles. Cell, 102, 109–126.
Ideker,T.E., Thorsson,V. and Karp,R.M. (2000) Discovery of regulatory interactions through perturbation: inference and experimental
design. Pac. Symp. Biocomput., 305–316.
Jenness,D.D., Burkholder,A.C. and Hartwell,L.H. (1983) Binding
of α-factor pheromone to yeast a cells: chemical and genetic
evidence for an α-factor receptor. Cell, 35, 521–529.
Jensen,L.T. and Winge,D.R. (1998) Identification of a copper-induced intramolecular interaction in the transcription factor
Mac1 from Saccharomyces cerevisiae. EMBO J., 17, 5400–5408.
Kang,W., Matsushita,Y., Grohmann,L., Graack,H.R., Kitakawa,M.
and Isono,K. (1991) Cloning and analysis of the nuclear gene
for YmL33, a protein of the large subunit of the mitochondrial ribosome in Saccharomyces cerevisiae. J. Bacteriol., 173,
4013–4020.
Koch,C., Moll,T., Neuberg,M., Ahorn,H. and Nasmyth,K. (1993) A
role for the transcription factors Mbp1 and Swi4 in progression
from G1 to S phase. Science, 261, 1551–1557.
Kornitzer,D., Raboy,B., Kulka,R.G. and Fink,G.R. (1994) Regulated degradation of the transcription factor Gcn4. EMBO J., 13,
6021–6030.
Kwast,K.E., Lai,L.C., Menda,N., James,D.T.,III, Aref,S. and
Burke,P.V. (2002) Genomic analyses of anaerobically induced
genes in Saccharomyces cerevisiae: functional roles of Rox1 and
other factors in mediating the anoxic response. J. Bacteriol., 184,
250–265.
Kyoda,K.M., Morohashi,M., Onami,S. and Kitano,H. (2000) A gene
network inference method from continuous-value gene expression
data of wild-type and mutants. Genome Inform. Ser. Workshop
Genome Inform., 11, 196–204.
Lee,T.I., Rinaldi,N.J., Robert,F., Odom,D.T., Bar-Joseph,Z.,
Gerber,G.K., Hannett,N.M., Harbison,C.T., Thompson,C.M.,
Simon,I. et al. (2002) Transcriptional regulatory networks in
Saccharomyces cerevisiae. Science, 298, 799–804.
Liu,L.X., Spoerke,J.M., Mulligan,E.L., Chen,J., Reardon,B.,
Westlund,B., Sun,L., Abel,K., Armstrong,B., Hardiman,G. et al.
(1999) High-throughput isolation of Caenorhabditis elegans deletion mutants. Genome Res., 9, 859–867.
Lockhart,D.J., Dong,H., Byrne,M.C., Follettie,M.T., Gallo,M.V.,
Chee,M.S., Mittmann,M., Wang,C., Kobayashi,M., Horton,H.
et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14, 1675–1680.
Maki,Y., Tominaga,D., Okamoto,M., Watanabe,S. and Eguchi,Y.
(2001) Development of a system for the inference of large scale
genetic networks. Pac. Symp. Biocomput., 446–458.
Martins,L.J., Jensen,L.T., Simons,J.R., Keller,G.L. and Winge,D.R.
(1998) Metalloregulation of FRE1 and FRE2 homologs in Saccharomyces cerevisiae. J. Biol. Chem., 273, 23716–23721.
Mewes,H.W., Amid,C., Arnold,R., Frishman,D., Güldener,U.,
Mannhaupt,G., Münsterkötter,M., Pagel,P., Strack,N., Stümpflen,V. et al. (2004) MIPS: analysis and annotation of proteins
from whole genomes. Nucleic Acids Res., 32, D41–D44.
Mukai,Y., Harashima,S. and Oshima,Y. (1993) Function of the
ste signal transduction pathway for mating pheromones sustains
MATα1 transcription in Saccharomyces cerevisiae. Mol. Cell.
Biol., 13, 2050–2060.
Myers,L.C., Gustafsson,C.M., Hayashibara,K.C., Brown,P.O. and
Kornberg,R.D. (1999) Mediator protein mutations that selectively
abolish activated transcription. Proc. Natl Acad. Sci., USA, 96,
67–72.
Natarajan,K., Meyer,M.R., Jackson,B.M., Slade,D., Roberts,C.,
Hinnebusch,A.G. and Marton,M.J. (2001) Transcriptional profiling shows that Gcn4p is a master regulator of gene expression
during amino acid starvation in yeast. Mol. Cell. Biol., 21,
4347–4368.
Oehlen,L. and Cross,F.R. (1998) The mating factor response pathway
regulates transcription of TEC1, a gene involved in pseudohyphal
differentiation of Saccharomyces cerevisiae. FEBS Lett., 429,
83–88.
Oehlen,L.J., McKinney,J.D. and Cross,F.R. (1996) Ste12 and Mcm1
regulate cell cycle-dependent transcription of FAR1. Mol. Cell.
Biol., 16, 2830–2837.
Padmanabha,R., Chen-Wu,J.L., Hanna,D.E. and Glover,C.V. (1990)
Isolation, sequencing, and disruption of the yeast CKA2 gene:
casein kinase II is essential for viability in Saccharomyces
cerevisiae. Mol. Cell. Biol., 10, 4089–4099.
Pan,C. and Mason,T.L. (1997) Functional analysis of ribosomal
protein L2 in yeast mitochondria. J. Biol. Chem., 272, 8165–8171.
Paul,M.F. and Tzagoloff,A. (1995) Mutations in RCA1 and AFG3
inhibit F1-ATPase assembly in Saccharomyces cerevisiae. FEBS
Lett., 373, 66–70.
Pawson,T. and Nash,P. (2003) Assembly of cell regulatory systems
through protein interaction domains. Science, 300, 445–452.
Pe’er,D., Regev,A., Elidan,G. and Friedman,N. (2001) Inferring
subnetworks from perturbed expression profiles. Bioinformatics,
17(Suppl. 1), S215–S224.
Pellman,D., McLaughlin,M.E. and Fink,G.R. (1990) TATA-dependent and TATA-independent transcription at the HIS4 gene
of yeast. Nature, 348, 82–85.
Planta,R.J. and Mager,W.H. (1998) The list of cytoplasmic ribosomal
proteins of Saccharomyces cerevisiae. Yeast, 14, 471–477.
Ramer,S.W. and Davis,R.W. (1993) A dominant truncation allele
identifies a gene, STE20, that encodes a putative protein kinase
necessary for mating in Saccharomyces cerevisiae. Proc. Natl
Acad. Sci., USA, 90, 452–456.
Ravasz,E., Somera,A.L., Mongru,D.A., Oltvai,Z.N. and
Barabási,A.L. (2002) Hierarchical organization of modularity in
metabolic networks. Science, 297, 1551–1555.
Ren,B., Robert,F., Wyrick,J.J., Aparicio,O., Jennings,E.G.,
Simon,I., Zeitlinger,J., Schreiber,J., Hannett,N., Kanin,E. et al.
(2000) Genome-wide location and function of DNA binding
proteins. Science, 290, 2306–2309.
Rives,A.W. and Galitski,T. (2003) Modular organization of cellular
networks. Proc. Natl Acad. Sci., USA, 100, 1128–1133.
Roberts,C.J., Nelson,B., Marton,M.J., Stoughton,R., Meyer,M.R.,
Bennett,H.A., He,Y.D., Dai,H., Walker,W.L., Hughes,T.R. et al.
(2000) Signaling and circuitry of multiple MAPK pathways
revealed by a matrix of global gene expression profiles. Science,
287, 873–880.
Rutherford,J.C., Jaron,S. and Winge,D.R. (2003) Aft1p and Aft2p
mediate iron-responsive gene expression in yeast through related
promoter elements. J. Biol. Chem., 278, 27636–27643.
Schena,M., Shalon,D., Davis,R.W. and Brown,P.O. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467–470.
Schwikowski,B., Uetz,P. and Fields,S. (2000) A network of protein–
protein interactions in yeast. Nat. Biotechnol., 18, 1257–1261.
Sprague,G.F.,Jr and Thorner,J.W. (1992) Pheromone response and
signal transduction during the mating process of Saccharomyces
cerevisiae. In Jones,E.W., Pringle,J.R. and Broach,J.R. (eds), The
Molecular and Cellular Biology of the Yeast Saccharomyces, Cold
Spring Harbor Laboratory Press, New York, pp. 657–744.
Tedford,K., Kim,S., Sa,D., Stevens,K. and Tyers,M. (1997) Regulation of the mating pheromone and invasive growth responses in
yeast by two MAP kinase substrates. Curr. Biol., 7, 228–238.
van den Heuvel,J., Lang,V., Richter,G., Price,N., Peacock,L.,
Proud,C. and McCarthy,J.E. (1995) The highly acidic C-terminal
region of the yeast initiation factor subunit 2α (eIF-2α) contains
casein kinase phosphorylation sites and is essential for maintaining normal regulation of GCN4. Biochim. Biophys. Acta, 1261,
337–348.
Vidal,M., Strich,R., Esposito,R.E. and Gaber,R.F. (1991) RPD1
(SIN3/UME4) is required for maximal activation and repression
of diverse yeast genes. Mol. Cell. Biol., 11, 6306–6316.
Vik,Å. and Rine,J. (2001) Upc2p and Ecm22p, dual regulators of
sterol biosynthesis in Saccharomyces cerevisiae. Mol. Cell. Biol.,
21, 6395–6405.
Wagner,A. (2001) How to reconstruct a large genetic network from
n gene perturbations in fewer than n² easy steps. Bioinformatics,
17, 1183–1197.
Wang,H. and Jiang,Y. (2003) The Tap42-protein phosphatase type
2A catalytic subunit complex is required for cell cycle-dependent
distribution of actin in yeast. Mol. Cell. Biol., 23, 3116–3125.
Warshall,S. (1962) A theorem on Boolean matrices. J. Assoc.
Comput. Mach., 9, 11–12.
Winzeler,E.A., Shoemaker,D.D., Astromoff,A., Liang,H., Anderson,K., Andre,B., Bangham,R., Benito,R., Boeke,J.D., Bussey,H.
et al. (1999) Functional characterization of the S. cerevisiae
genome by gene deletion and parallel analysis. Science, 285,
901–906.
Yamaguchi-Iwai,Y., Ueta,R., Fukunaka,A. and Sasaki,R. (2002)
Subcellular localization of Aft1 transcription factor responds to
iron status in Saccharomyces cerevisiae. J. Biol. Chem., 277,
18914–18918.
Yang,R., Wek,S.A. and Wek,R.C. (2000) Glucose limitation induces
GCN4 translation by activation of Gcn2 protein kinase. Mol. Cell.
Biol., 20, 2706–2717.
Zhu,H., Klemic,J.F., Chang,S., Bertone,P., Casamayor,A.,
Klemic,K.G., Smith,D., Gerstein,M., Reed,M.A. and Snyder,M.
(2000) Analysis of yeast protein kinases using protein chips. Nat.
Genet., 26, 283–289.