A gene coexpression network for bovine skeletal muscle inferred

Physiol Genomics 28: 76 – 83, 2006.
First published September 19, 2006; doi:10.1152/physiolgenomics.00105.2006.
CALL FOR PAPERS
2nd International Symposium on Animal Functional Genomics
A gene coexpression network for bovine skeletal muscle inferred from
microarray data
Antonio Reverter,1 Nicholas J. Hudson,1 Yonghong Wang,1 Siok-Hwee Tan,1 Wes Barris,1
Keren A. Byrne,1 Sean M. McWilliam,1 Cynthia D. K. Bottema,2 Adam Kister,2
Paul L. Greenwood,3 Gregory S. Harper,1 Sigrid A. Lehnert,1 and Brian P. Dalrymple1
Submitted 1 June 2006; accepted in final form 10 September 2006
Reverter A, Hudson NJ, Wang Y, Tan SH, Barris W, Byrne
KA, McWilliam SM, Bottema CD, Kister A, Greenwood PL,
Harper GS, Lehnert SA, Dalrymple BP. A gene coexpression
network for bovine skeletal muscle inferred from microarray data.
Physiol Genomics 28: 76 – 83, 2006. First published September 19,
2006; doi:10.1152/physiolgenomics.00105.2006.—We present the
application of large-scale multivariate mixed-model equations to the
joint analysis of nine gene expression experiments in beef cattle
muscle and fat tissues with a total of 147 hybridizations, and we
explore 47 experimental conditions or treatments. Using a correlation-based method, we constructed a gene network for 822 genes. Modules
of muscle structural proteins and enzymes, extracellular matrix, fat
metabolism, and protein synthesis were clearly evident. Detailed
analysis of the network identified groupings of proteins on the basis of
physical association. For example, expression of three components of
the z-disk, MYOZ1, TCAP, and PDLIM3, was significantly correlated.
In contrast, expression of these z-disk proteins was not highly correlated with the expression of a cluster of thick (myosins) and thin (actin
and tropomyosins) filament proteins or of titin, the third major
filament system. However, expression of titin was itself not significantly correlated with the cluster of thick and thin filament proteins
and enzymes. Correlation in expression of many fast-twitch muscle
structural proteins and enzymes was observed, but slow-twitch-specific proteins were not correlated with the fast-twitch proteins or with
each other. In addition, a number of significant associations between
genes and transcription factors were also identified. Our results not
only recapitulate the known biology of muscle but have also started to
reveal some of the underlying associations between and within the
structural components of skeletal muscle.
IN RECENT YEARS, gene expression profiling has become part of
the suite of technologies used by animal geneticists investigating livestock traits (7, 12, 16, 28). While some progress has
been made in the identification of differentially expressed
genes between two (or more) experimental conditions, it is also
thought that the analysis of large data sets generated from
expression profiles may allow the development of predictive
models of the systems underpinning the genetics of complex
traits (1, 36).1
Article published online before print. See web site for date of publication
(http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: A. Reverter, Bioinformatics Group, CSIRO Livestock Industries, Queensland Bioscience Precinct, 306 Carmody Road, St. Lucia, QLD 4067, Australia (e-mail: [email protected]).
Gene coexpression networks (GCN) can be determined from
expression experiments where a number of different perturbations have been profiled. These networks rely on the “guilt-by-association” heuristic, widely invoked in genomics and with
universal applicability (35). These networks represent the integration of the various regulatory processes impacting on the
expression of the genes in the system being studied. However,
for maximum utility, the input data should be derived from
clearly defined systems and optimally designed experiments.
To explore regulatory pathways that control gene expression
in bovine skeletal muscle and to study the relationships between gene expression and structural compartments, we have
undertaken a series of gene expression experiments, mainly
using samples of the longissimus dorsi muscle (LD) from adult
animals. For this project, we constructed a bovine microarray
with 9,600 elements printed in duplicate and comprising
~2,000 expressed sequence tags (ESTs) and ~7,600 anonymous cDNA clones from muscle- and fat-derived libraries
(12).
Using this platform, researchers have conducted a number of
experiments to identify genes that are differentially expressed
across a variety of conditions, and the results have been
reported (13, 14, 18, 22, 30, 33, 34). Also, a multivariate
mixed-model method has been proposed to jointly analyze
independent experiments conducted on this platform (19). The
method was further implemented to show the optimality of
mixed models for data normalization in gene coexpression
studies (20). Based on these results, a network for bovine
skeletal muscle with 102 genes was constructed (21).
Here, we present the successful data-driven reconstruction
of a GCN in bovine skeletal muscle and adipose tissue, based
on gene expression profile data with 822 genes from 147
hybridizations across 9 experiments and 47 conditions. To
derive the network, we have applied a large-scale mixed model
to normalize gene coexpression measurements. We employed
partial correlations and data transmission theory to isolate
significant associations and explored the relationships between
transcription factors (TF) and their potential targets.
1 The 2nd International Symposium on Animal Functional Genomics was
held May 16–19, 2006 at Michigan State University in East Lansing, Michigan
and was organized by Jeanne Burton of Michigan State University and
Guilherme J. M. Rosa of University of Wisconsin-Madison (see meeting report
by Drs. Burton and Rosa, Physiol Genomics 28: 1–4, 2006).
1094-8341/06 $8.00 Copyright © 2006 the American Physiological Society
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.3 on June 14, 2017
Cooperative Research Centre for Cattle and Beef Quality, 1Commonwealth Scientific and Industrial
Research Organisation (CSIRO) Livestock Industries, St. Lucia, Queensland; 2Agriculture and
Animal Science, University of Adelaide, Roseworthy, South Australia; and 3New South Wales
Department of Primary Industries, University of New England, Armidale, New South Wales, Australia
MATERIALS AND METHODS
Bovine fat and muscle cDNA library and microarray construction.
The development of the bovine cDNA microarray platform used in
this study and related quality control analyses have been described
previously in detail (12). In brief, total RNA was prepared from the
LD muscle and subcutaneous fat of a 24-mo-old Bos taurus Angus
steer. The tissue was dissected, immediately frozen in liquid nitrogen,
disrupted with a hammer, and homogenized in TRIzol (Invitrogen,
Carlsbad, CA) using an ultrasonic homogenizer (IKA-Ikasonic,
Staufen, Germany). Poly(A+) RNA was separated from total RNA
using PolyATtract (Promega, Madison, WI) magnetic streptavidin
beads and biotinylated oligo-dT probe according to the manufacturer’s protocol.
Selected cDNA inserts from the two cDNA libraries were amplified
by PCR in a 96-well plate format. Before spotting on microarray glass
slides, all the PCR products were resuspended in 50% DMSO and
transferred to a 384-well plate format. A total of 9,600 elements were
printed in duplicate onto glass slides. The array consisted of 9,222
cattle probes, comprising 7,291 anonymous cDNAs from the bovine
skeletal muscle and subcutaneous fat cDNA libraries and 1,915
bovine ESTs selected from the muscle and fat libraries, Commonwealth Scientific and Industrial Research Organisation (CSIRO) cattle
skin library (32), the USDA Meat Animal Research Center (MARC)
1–4 BOV libraries (25), and collaborating laboratories. In addition, 16
EST-derived oligonucleotides (60-mer) and the Lucidea Microarray
Scorecard v1.1 (Amersham Bioscience, Piscataway, NJ) were printed
on the array.
The main classes of genes represented on the microarray were as
follows: contractile proteins (e.g., myosin isoforms), muscle metabolic enzymes (e.g., enolase, mitochondrial enzymes), extracellular
matrix (ECM) genes (e.g., collagens), and lipogenic enzymes (e.g.,
lipoprotein lipase). These functional classes are associated with the
main cellular components of muscle tissue as follows: the muscle fibers
themselves, connective tissue, and intramuscular fat islands. The
microarray also includes a collection of candidate genes from the
adipogenesis and protein turnover literature.
Data sets and gene coexpression measures. We used nine microarray studies in genetic improvement of beef cattle spanning a total of
147 hybridizations (GEO accession numbers GPL4196 and
GPL4197), with mRNA from muscle and adipose tissues in 47
experimental conditions representing a broad dynamic range of perturbations of the cellular systems (Fig. 1). All animal experimentation
was approved by the Animal Ethics Committees of New South Wales
Agriculture, CSIRO, and University of Adelaide, South Australia.
For the present study, we did not preprocess the data with any
filtering criteria based on differential or absolute level of
expression. We edited out readings with foreground signal ≤ background signal, as well as clones not observed in all 47 conditions.
These criteria resulted in a total of 4,059,807 fluorescent signals from
7,898 clones, of which 1,947 contained accurate (P < 0.01/7,898 =
1.27E-6) functional BLAST annotation for 822 unique genes determined by searching the National Center for Biotechnology Information human reference sequence (RefSeq) collection of mRNA sequences. These included 67 genes that were reported as muscle-specific genes (“MSG”) in a massively parallel signature sequencing
(MPSS) study (10) and/or a serial analysis of gene expression (SAGE)
database (3).
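The annotation threshold quoted above has the form of a Bonferroni correction: a nominal level of 0.01 divided by the 7,898 clones tested. The following minimal sketch reproduces that arithmetic; the function name is hypothetical and is not taken from the original analysis.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold: alpha divided by the number of tests."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests

# Nominal level and number of clones as reported in the text.
threshold = bonferroni_threshold(0.01, 7898)
print(f"{threshold:.2E}")  # 1.27E-06
```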
Normalized measures of gene coexpression were obtained from a
statistical model that considers all sources of uncertainty at once and was previously shown to be optimal for this purpose (20). In brief, the following nine-variate (one for each
experiment) mixed model was fitted:
\[ y_E = X_E \beta_E + Z_{E1} c_E + Z_{E2} a_E + Z_{E3} d_E + Z_{E4} t_E + e_E, \tag{1} \]
where yE is the vector of intensity signals from the E-th experiment;
XE is the incidence matrix relating signals in yE with systematic fixed
effects in βE including array slide, printing block, and dye channel;
ZE1 is the incidence matrix relating yE with random effects in cE
corresponding to the clones; ZE2 is the matrix relating yE with random
effects in aE corresponding to the three-way interaction of clone by
array and printing block; ZE3 is the matrix relating yE with random
effects in dE corresponding to the interaction of clone by dye channel;
Fig. 1. Design of the 9 microarray experiments spanning 147 hybridizations
and 47 experimental conditions. Experiment 1, two breeds by two diets; experiment 2, three diets; experiment 3, two
diets at three postnatal developmental
stages; experiment 4, two breeds at three
postnatal developmental stages; experiment 5, two fat treatments at two stages
with three time points each; experiment
6, two breeds at four fetal developmental
stages; experiment 7, two fat treatments
at three time points; experiment 8, two
breeds by two sexes; and experiment 9,
two breeds at three time points. Experiments 1, 2, 3, 4, 6, and 9 were performed
with mRNA extracted from biopsies of
the longissimus dorsi (LD) muscle. Experiment 5 studied the mechanisms underlying adipogenesis in vitro using cell
cultures. Experiments 7 and 8 were performed using biopsies from subcutaneous dorsal fat. Arrows indicate hybridizations from the sample labeled with red
to the sample labeled with green fluorescent dye.
level equated to 0.689, and the rxy was set to zero if |rxy| ≤ 0.689 |rxz|
and |rxy| ≤ 0.689 |ryz|. Otherwise, the association was assessed as
significant, and the connection between the pair of genes was established in the reconstruction of the network.
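The trio-based criterion can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the tolerance is taken to be the per-trio ratio averaged over all trios, and a pair is assumed to be dropped as soon as a single third gene z satisfies both inequalities.

```python
import math
from itertools import combinations

def partial_corr(r_xy: float, r_xz: float, r_yz: float) -> float:
    """First-order partial correlation between x and y given z."""
    denom = math.sqrt((1.0 - r_xz ** 2) * (1.0 - r_yz ** 2))
    return (r_xy - r_xz * r_yz) / denom

def trio_ratio(r_xy: float, r_xz: float, r_yz: float) -> float:
    """Average ratio of partial to direct correlation for one trio."""
    return (partial_corr(r_xy, r_xz, r_yz) / r_xy
            + partial_corr(r_xz, r_xy, r_yz) / r_xz
            + partial_corr(r_yz, r_xy, r_xz) / r_yz) / 3.0

def filter_edges(r, tol=0.689):
    """Return the gene pairs (x, y), x < y, that are kept as connections.

    r is a symmetric matrix (list of lists) of zero-order correlations.
    Assumption: a pair is dropped when ANY third gene z gives both
    |r_xy| <= tol * |r_xz| and |r_xy| <= tol * |r_yz|.
    """
    n = len(r)
    kept = set()
    for x, y in combinations(range(n), 2):
        a = abs(r[x][y])
        dropped = any(
            a <= tol * abs(r[x][z]) and a <= tol * abs(r[y][z])
            for z in range(n) if z != x and z != y
        )
        if not dropped:
            kept.add((x, y))
    return kept

# Toy matrix (made-up values): genes 0 and 1 are both strongly tied
# to gene 2 and only moderately to each other.
r = [
    [1.0, 0.4, 0.8],
    [0.4, 1.0, 0.7],
    [0.8, 0.7, 1.0],
]
edges = filter_edges(r)
p_01_given_2 = partial_corr(r[0][1], r[0][2], r[1][2])
```

In this toy case the 0-1 pair is removed because its correlation is small relative to both correlations with gene 2, while the two stronger pairs are retained.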
Network connectivity and regulatory modules. The above-mentioned criteria resulted in 42,673 connections out of a possible
337,431 that could have been established. Thus, the clustering coefficient was 12.6%. The clustering coefficient captures how many
neighbors of a given gene are connected to each other. The number of
connections per gene averaged 103.8 and ranged from 49 for GHSR
(growth hormone secretagogue receptor) to 217 for MYBPC2 (myosin
binding protein C, fast type).
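The counts quoted in this section follow from simple combinatorics, which the short check below reproduces. It uses only the figures given in the text; the variable names are illustrative.

```python
from math import comb

n_genes = 822
n_connections = 42_673

possible_pairs = comb(n_genes, 2)            # all gene pairs
density = n_connections / possible_pairs     # share of pairs that are connected
mean_degree = 2 * n_connections / n_genes    # each connection involves two genes
possible_trios = comb(n_genes, 3)            # trios screened with partial correlations

print(possible_pairs)        # 337431
print(f"{density:.1%}")      # 12.6%
print(f"{mean_degree:.1f}")  # 103.8
print(possible_trios)        # 92231140
```

Computed this way, the 12.6% figure is the fraction of all possible gene pairs that ended up connected.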
For the construction of subnetworks, we concentrated on 26 TF that
were represented in our data sets and treated as the hubs in the
subnets. A gene x in the data set was included in the j-th TF-based hub
if the r value between gene x and the j-th TF was larger than the r
value between gene x and any other TF. To assess the optimality of the
resulting TF-based hubs, the probability of the observed connectivity
was evaluated by permutation test using 10,000 random permutations.
Furthermore, the number of MSG in each subnetwork was registered
and the hypergeometric test was used to estimate the chance probability of capturing at least the observed number of MSG. Finally, we
used the Onto-Tools (8) to identify overrepresented gene ontology
(GO) terms and pathways within each TF-based hub.
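The hub assignment and the MSG enrichment test described above can be illustrated as follows. This is a minimal sketch, assuming that the signed r value is compared (as the wording "larger" suggests); the function names and the toy correlations are hypothetical, and the enrichment call uses the hub size and MSG count listed for ANKRD1 in Table 1.

```python
from math import comb

def assign_to_hubs(gene_tf_r):
    """Map each gene to the TF with which it has the largest r value.

    gene_tf_r: dict gene -> dict TF -> correlation coefficient.
    Assumption: the signed r is compared; use abs() instead if the
    strongest association of either sign is wanted.
    """
    hubs = {}
    for gene, r_by_tf in gene_tf_r.items():
        best_tf = max(r_by_tf, key=r_by_tf.get)
        hubs.setdefault(best_tf, []).append(gene)
    return hubs

def hypergeom_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` items are drawn without replacement."""
    upper = min(successes, draws)
    total = comb(population, draws)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total

# Made-up correlations of three genes against two TF.
toy = {
    "geneA": {"TF1": 0.62, "TF2": 0.10},
    "geneB": {"TF1": -0.20, "TF2": 0.45},
    "geneC": {"TF1": 0.55, "TF2": 0.50},
}
hubs = assign_to_hubs(toy)

# Chance of at least 7 MSG in a hub of 40 genes, given 67 MSG among 822 genes.
p_msg = hypergeom_tail(population=822, successes=67, draws=40, observed=7)
```

Exact integer arithmetic via `math.comb` keeps the tail probability free of overflow or rounding problems for populations of this size.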
RESULTS
Hierarchical scale-free behavior of the inferred network.
The mixed model accounted for 84.3–96.9% of the total
variance. The vast majority of this variation was attributed to
the random effect of clone (~80%), followed by the clone by
array-block interaction (~15%). Gene expression correlation
coefficients (r) were calculated for all the pair-wise comparisons of the 822 genes, including 67 MSG and 26 TF that were
surveyed across the 47 conditions of the 9 experiments (Fig. 1).
Of the 822 genes, 269 had detectable expression in human
skeletal muscle in a large survey of human tissues (27). The
bovine genes were biased toward the more highly expressed of
the 932 genes detectable in human skeletal muscle. Correlations were assessed to be significant with a probability proportional to magnitude and relative to the partial correlations in all
trio-wise comparisons (Fig. 2, left). The connectivity of the
inferred network revealed a power-law distribution of genes
with a specific number of interactions on a log-log scale (Fig.
2, right). Using a nominal false discovery rate of <1%, we
identified 123 genes involved in 312 connections for which |r|
Fig. 2. Left: empirical density distribution
of all (gray) and significant (black) interactions (i.e., correlation coefficients)
among the 822 genes. Right: distribution
of genes with a specific number of interactions on a log-log scale. The straight line
indicates the theoretical power-law (r² = 84.7%) extrapolated from the experimental
data for connectivity greater than 5 (i.e.,
genes with more than 5 interactions).
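A straight line on a log-log plot of the kind described in the Fig. 2 legend can be obtained by ordinary least squares on the logarithms of connectivity and gene count. The sketch below is one way to do this and is not necessarily the procedure used for the figure; the function name is hypothetical and the input is synthetic, not the data of this study.

```python
import math

def fit_power_law(degree_counts, min_degree=5):
    """Least-squares line through (log k, log count) for k > min_degree.

    degree_counts: dict mapping connectivity k -> number of genes with k links.
    Returns (exponent, intercept, r_squared) for
    log(count) = intercept + exponent * log(k).
    """
    pts = [(math.log(k), math.log(c)) for k, c in degree_counts.items()
           if k > min_degree and c > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    syy = sum((y - mean_y) ** 2 for _, y in pts)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Synthetic counts that follow count = 1000 * k**-2 exactly.
toy_counts = {k: 1000.0 * k ** -2 for k in range(6, 60)}
exponent, intercept, r2 = fit_power_law(toy_counts)
```

With real connectivity counts the fitted r² would fall below 1, as with the 84.7% reported in the legend.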
ZE4 is the matrix relating yE with random effects in tE corresponding
to the interaction of clones by each of the experimental treatment
conditions in the E-th experiment; and eE is the random error associated with signals in yE.
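To make the role of the incidence matrices in model 1 concrete, the following toy sketch builds one-hot incidence matrices for a fixed dye effect and a random clone effect and evaluates the systematic part of the model. All names and values are made up for illustration; estimating the variance components by restricted maximum likelihood is outside the scope of this sketch.

```python
def incidence_matrix(labels):
    """One-hot matrix: one row per observation, one column per factor level."""
    levels = sorted(set(labels))
    col = {level: j for j, level in enumerate(levels)}
    rows = [[1 if col[lab] == j else 0 for j in range(len(levels))]
            for lab in labels]
    return rows, levels

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

# Four hypothetical signals from two clones read on two dye channels.
clone_of_signal = ["cloneA", "cloneA", "cloneB", "cloneB"]
dye_of_signal = ["red", "green", "red", "green"]

X, dye_levels = incidence_matrix(dye_of_signal)        # fixed effect: dye channel
Z1, clone_levels = incidence_matrix(clone_of_signal)   # random effect: clone

beta = [8.0, 8.5]   # made-up effects, ordered as dye_levels (green, red)
c = [1.2, -1.2]     # made-up effects, ordered as clone_levels (cloneA, cloneB)

# Systematic part of y = X*beta + Z1*c, leaving out the other terms and the residual.
y_hat = [xb + zc for xb, zc in zip(matvec(X, beta), matvec(Z1, c))]
```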
It is important to emphasize that model 1 operates at the clone level
(i.e., the probe on the microarray comprises the experimental unit),
and the clone was the main hierarchy in the variance decomposition
with the total variance being decomposed by the components of clone,
clone by array block, clone by dye, clone by treatment, and residual.
Once assembled, model 1 contained 1,762,338 equations and 81
(co)variance components that were estimated by restricted maximum
likelihood using the VCE software (http://w3.tzv.fal.de/~eg/vce4/vce4.html).
It took 248 h and 24 min of CPU time on a Dell server
with dual 3.6-GHz processors and 8 GB of RAM, running a 64-bit
version of Red Hat Linux, for model 1 to be solved. Following
Reverter and colleagues (20), for each clone in c, a normalized vector
of observations was obtained from (t̂Ec − μc/E)/σE, where t̂Ec is the
vector containing the best linear unbiased prediction (BLUP) of tE;
μc/E is the vector containing the average BLUP of tE for the c-th clone
in the E-th experiment; and σE is the standard deviation of all the
BLUP of tE. Finally, vectors with normalized expressions for each
gene were obtained by averaging element by element those vectors
corresponding to clones that were annotated to a single gene.
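The normalization and the clone-to-gene averaging can be written compactly. The sketch below handles a single experiment and assumes that σE is the population standard deviation of all clone-by-treatment BLUPs in that experiment; the function names and the input values are hypothetical.

```python
import statistics

def normalize_experiment(blup_by_clone):
    """Apply (t_hat - clone mean) / sigma_E within one experiment.

    blup_by_clone: dict clone -> list of BLUPs of the clone-by-treatment
    effect, one value per condition of the experiment.
    Assumption: sigma_E is the population standard deviation of all BLUPs.
    """
    all_blups = [v for values in blup_by_clone.values() for v in values]
    sigma = statistics.pstdev(all_blups)
    normalized = {}
    for clone, values in blup_by_clone.items():
        mu = statistics.fmean(values)
        normalized[clone] = [(v - mu) / sigma for v in values]
    return normalized

def average_by_gene(normalized, gene_of_clone):
    """Average, element by element, the vectors of clones annotated to one gene."""
    grouped = {}
    for clone, vector in normalized.items():
        grouped.setdefault(gene_of_clone[clone], []).append(vector)
    return {gene: [statistics.fmean(col) for col in zip(*vectors)]
            for gene, vectors in grouped.items()}

# Made-up BLUPs for three clones over three conditions of one experiment.
blups = {
    "clone1": [0.2, -0.1, -0.1],
    "clone2": [0.4, 0.0, -0.4],
    "clone3": [-0.3, 0.3, 0.0],
}
genes = {"clone1": "GENE_A", "clone2": "GENE_A", "clone3": "GENE_B"}
normalized = normalize_experiment(blups)
profiles = average_by_gene(normalized, genes)
```

For the joint analysis, the per-experiment vectors of a clone would be concatenated across the nine experiments before averaging by gene.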
The averaging of vectors for clones associated with a gene was
justified by the monotonic increasing pattern that was observed for the
average correlation among all pair-wise clones for genes highly
represented in the microarray. The number of genes with more than 1,
more than 5, and more than 10 clones was 268, 58, and 29, respectively. The average correlation among their clones was 0.394, 0.592,
and 0.637, respectively, for the same three categories of highly
represented genes. This increasing pattern was attributed to the optimality of the model-based normalization method as well as to the
severity of the criterion used for annotating clones to genes, resulting
in only 25% of clones (1,947 out of 7,898) being utilized in the
network reconstruction.
Identification of significant coexpressions. Partial correlation coefficients and an information theory approach were used to identify
meaningful gene-to-gene associations. Although not simultaneously,
both strategies have been applied in the reconstruction of gene
networks (2, 6). For every trio of genes in x, y, and z (having
92,231,140 combinations of 822 genes taken 3 at a time), we computed the first-order partial correlation coefficients in rxy.z, rxz.y, and
ryz.x. The partial correlation coefficient between x and y given z (rxy.z)
indicates the strength of the linear relationship between x and y that is
independent of (uncorrelated with) z. Again for every trio of genes,
and to ascertain a tolerance level for meaningful association, the
average ratio of partial to direct (or zero-order) correlation was
computed as follows: 1/3 (rxy.z/rxy + rxz.y/rxz + ryz.x/ryz). This tolerance
result. To try to overcome the limitations of the single TF hub
approach, we clustered genes based on their pattern of correlations to all of the TF (Fig. 4).
DISCUSSION
Tomczak and colleagues (31) clustered mouse genes based
on changes in expression during differentiation of C2C12 cells
from myoblasts to myotubes. Allowing for differences in
inclusion of genes in the two data sets, our module 4 overlapped extensively with their group III and in particular with
cluster 10. Group III contains those genes which are significantly induced during differentiation. Clustering of genes
within the groups in the C2C12 data was based on expression
ratios in the time course. In contrast to the C2C12 differentiation data set, the ECM genes were not correlated with the
main cluster of muscle structural protein and enzymes. This is
perhaps not surprising given the different systems analyzed. In
fact, in the cattle data set, ECM genes were negatively correlated with the module 4 genes.
The muscle module: fiber-type associations. Our analysis
was designed to identify some of the key underlying associa-
Fig. 3. Gene network developed with the 123 genes that were involved in 312 connections with r < −0.75 (dotted lines) or r > 0.75 (solid lines). Four modules
were clearly distinguishable; gene ontology (GO) analyses revealed the following ontologies as being overrepresented: module 1, skeletal
development, structural constituent of extracellular matrix; module 2, regulation of cell growth/cycle, fatty acid biosynthesis; module 3, protein biosynthesis,
structural constituent of ribosome, RNA binding; module 4, muscle development, tropomyosin binding, magnesium ion binding, calcium ion binding.
> 0.75. Four distinct modules were identified by visual inspection (Fig. 3) prior to gene ontology (GO) analyses. The four
modules and their significant GO terms were as follows:
module 1, skeletal development, structural constituent of ECM;
module 2, regulation of cell growth/cycle, fatty acid biosynthesis; module 3, protein biosynthesis, structural constituent of
ribosome, RNA binding; module 4, muscle development, tropomyosin binding, magnesium ion binding, calcium ion binding. In this part of the GCN, 29 of the 67 MSG lay in module
4 (P < 0.0001).
TF analysis. Not unexpectedly, in our data set few correlations between genes and TF were significant. To try to identify
potential associations between genes and TF, each gene in the
GCN was allocated to the TF-based hub corresponding to the
most highly correlated TF for that particular gene. A number of
TF-defined subnetworks, including ANKRD1, ATF4, CSRP3,
and HMGA1, were significantly enriched for MSG (Table 1).
However, GO analysis of the TF-defined subnetworks did not
always identify significant associations between members of a
network and a function. In light of the small number of TF
included in this analysis and the fact that many if not all genes
are regulated by multiple TF, this is perhaps not a surprising
Table 1. Significant transcription factors
TF        g    MSG   k       c(k)   r       GO*
ANKRD1    40   7     264     0.34   0.483   0003779
ATF4      73   11    1,279   0.49   0.545   0003735
CSRP3     38   8     238     0.34   0.438   0001558
HMGA1     37   6     247     0.37   0.474   0000287
MYF6      54   6     426     0.30   0.459   0005509
MYOG      28   2     140     0.37   0.482   0005578
NFIX      33   2     219     0.41   0.488   0003954
NFKB2     35   0     190     0.32   0.438   0003700
SREBF1    32   0     185     0.37   0.483   0003774
ZFP36L1   36   3     205     0.32   0.474   0003723
characteristics that could relate to either type I or type IIa
fibers. In fact, the connection of both SLC25A4 and CKMT2
with MYH2 (myosin isoform IIa) appears to support the latter.
These fibers exhibit fast contractile characteristics but are
highly aerobic.
In addition to the fiber-type-specific proteins, many structural proteins and enzymes that are components of all fiber
types were identified (e.g., z-disk proteins as discussed in the
following section). The expression of the thick filament protein
titin (TTN), the thin filament protein nebulin (NEB), and the
muscle-specific class III intermediate filament protein desmin
(DES) was not significantly correlated with the large cluster of
tions between muscle structural proteins and enzymes in a
production animal. Skeletal muscle in bovids is compartmentalized into discrete type I, IIa, IIx/d, and IIb fibers. These
fibers exhibit specific functional and metabolic properties relating to contractile speed, fatigue resistance, and substrate use.
Each fiber type expresses predictable suites of genes that help
define these functional differences. Such genes range from the
particular myosin heavy chain isoform through to the preferred
pathways of energy metabolism (aerobic vs. anaerobic) and
preferred substrate for combustion (triglycerides, glycogen,
and/or phosphocreatine). Under various treatment conditions,
the fiber types will respond uniquely. For example, in starving
mammals the faster type IIb fibers show a more rapid atrophy
than the type I fibers. Our muscle samples were from the LD
muscle, predominantly composed of type IIa (fast oxidative-glycolytic) and IIb (fast glycolytic) fibers, and many of the
gene expression clusters we have identified reflect such compartmentalization (Fig. 3). For example, MYBPC2 (fast-twitch
myosin binding protein) clusters with the carbohydrate metabolizing enzymes FBP2, PYGM, PGM1, and CKM. The expression of these enzymes reflects the physiological coordination
required to store and sequester glycogen and phosphocreatine,
the preferred storage substrates for the faster type IIa/x and b
fibers, but not the slower type I fibers. Furthermore, MYBPC2
also clusters with other key components of the fast-twitch
phenotype, including CASQ1 (fast-twitch calcium cycling) and
various sarcomeric, contractile proteins such as ACTN3 (high
velocity forceful contractions), MYH2 (IIa myosin heavy
chain), and MYH1 (IIx/d myosin heavy chain). However, the
fast muscle regulatory myosin light chain, MYLPF, was not
part of the large cluster, suggesting more independent expression of this subunit.
In contrast, there is no strong evidence for clustering of
slow-twitch (type I) proteins. For example, TNNT1, TNNC1,
MYOZ2, MYH7, MYL3, MYBPC1, and ATP2A2 are present in
the set of 822 genes, but with the exception of MYH7, their
expression is not significantly correlated with each other. This
may reflect the low abundance of the slower fibers in the LD
muscle. Nevertheless, SLC25A4, CKMT2, and MDH2 are clustered to some extent. As components of the mitochondria, they
may reflect the adaptations required for the more oxidative
Number of total genes (g), muscle-specific genes (MSG), connections (k),
clustering coefficient (c(k)), and average correlation (r) among genes in each
transcription factor (TF)-based hub, and most represented gene ontology term (GO).
*0003779, actin binding; 0003735, structural constituent of ribosome;
0001558, regulation of cell growth; 0000287, magnesium ion binding;
0005509, calcium ion binding; 0005578, extracellular matrix; 0003954,
NADH dehydrogenase activity; 0003700, transcription factor activity;
0003774, motor activity; 0003723, RNA binding.
Fig. 4. Hierarchical clustering of selected genes from the four modules
(modules 1 to 4, top to bottom) identified in Fig. 3 against the 26 TF included
in our study. Cells represent correlation coefficients from negative (red) to zero
(black) to positive (green).
these, only MEF2C and MYF6 show significantly altered
expression levels during C2C12 differentiation. However, in
addition to MYF6 and MEF2C, MYOG has been shown to
positively regulate the expression of ANKRD1 and ATF4. Thus
the analysis of this set of data has identified TF already known
to play a role in muscle development through the role of
MYOG. The results also suggest that NSEP1 (YBX1) may play
a significant role in regulating gene expression in muscle cells.
Indeed, NSEP1 has been proposed to be involved in the
activation of transcription in a subset of fast-twitch muscles
(26). In addition, myogenic regulatory factors MYF6 and
MYOG were negatively correlated with each other. There is
experimental evidence that this association may be a reflection
of biological reality. MYF6 −/− mice show only a subtle
phenotype but have a threefold increased level of MYOG
transcription (37), suggesting that MYOG may compensate for
the absence of MYF6 and that MYF6 may act as a negative
regulator for MYOG.
The differences between the expression of TF and muscle
proteins observed in C2C12 cells during differentiation and our
data may reflect differences between the in vitro differentiation
system and muscles in live animals. There appears to be a
negative r value between TF such as CSRP3 and NFE2L1 and
matrix protein gene expression. Neither of these proteins has
been reported to be involved in regulation of matrix protein
genes. CSRP3 is a positive regulator of myogenesis (potentially as a nuclear TF) and may also play a role in mechanical
stretch sensing in the z-disk. Other TF associated with the
matrix molecule cluster were NSEP1, NFKB2, and MYF6; of
these, NSEP1 has been reported to activate the transcription of
COL1A1 (29), but in our data, NSEP1 is negatively correlated
with COL1A1.
Ontology analyses reveal association between magnesium
and protein synthesis. Linkages among the most represented
GO terms (Fig. 5) revealed a substantial number of obvious
associations (e.g., hormone activity is associated with the
regulation of cell growth). These serve to validate the reverse-engineering method. However, some of the other associations
have only become apparent in the more recent literature. For
example, the association between magnesium binding with
both the structural constituents of the ribosome and RNA
binding is compelling. Magnesium is possibly the missing
element in molecular interpretation of cell proliferation and
protein synthesis (23). These results suggest there is much
correspondence between the associations we have uncovered
computationally and those associations that are supported experimentally. The novel associations revealed here might provide a useful hypothesis-generating tool for future laboratory
research in the agricultural sciences.
Conclusions. We report the integration of nine microarray
gene expression experiments in beef cattle. Combined, these
represent the largest and most comprehensive bovine muscle
gene profiling data reported to date. We applied partial correlation coefficients and an information theoretic approach to
isolate significant gene-to-gene coexpression measures. Many
of our associations are consistent with the known biology of
muscle structure and development. We have also identified
previously unknown associations pointing to new relationships
between the components of muscle. The overall impression is
of a core of correlated expression of fast-twitch actin and
myosin filament proteins and enzymes, with the expression of
fast-twitch proteins, or with each other. However, a number of
different splice variants of the TTN and NEB proteins have
been identified in humans, and it is possible that these may
contribute to the lack of correlation in expression.
The muscle module: relationships between structural components. The fibers in skeletal muscle are highly organized and
are composed of a number of discrete structural components
that contain different sets of proteins (5). The relationships
between the expression of proteins within a component, such
as the z-disk, or the thick or thin filaments have not been
examined in any detail. By matching the nodes in the GCN
with the physical locations of the proteins in the muscle, we
have investigated the relationships between expression and
location of the proteins. The expression of MYOZ1 is significantly correlated with the expression of TCAP. These two
proteins are both located in the z-disk and bind to each other
(9), and they are located in the same cluster in the C2C12
differentiation data set (31). In addition, the expression of
TCAP is correlated with the expression of PDLIM3, another
component of the z-disk (11). The expression of both TCAP
and PDLIM3 is also positively correlated with the expression
of PFKM (a muscle cytoplasmic phosphofructokinase) and
negatively correlated with the expression of SPP1 (osteopontin). The specific role, if any, of PFKM and SPP1 in muscle
z-disk is currently unknown; however, PDLIM3 and PFKM lay
in the same cluster in the C2C12 data set (31).
CSRP3 (muscle LIM protein) is also located in the z-disk,
but its expression is not correlated with the MYOZ1 cluster.
However, the expression of CRYAB and MUSTN1 is significantly correlated with the expression of CSRP3. Muscles of
individuals with mutations in CRYAB have been reported to
exhibit myofibrillar disintegration that begins in the z-disk
(24). MUSTN1 is a nuclear protein expressed during musculoskeletal development and regeneration and in adult muscle and
tendons (15). The correlation with CSRP3 suggests that
MUSTN1 may be involved in the same signaling pathway. A
number of other z-disk proteins were included in the analysis.
TTID expression was correlated with the expression of
SLC25A4 (a muscle mitochondrial adenine nucleotide translocator), and ACTN3 expression was highly correlated with the
expression of a number of major muscle structural proteins,
including TPM2, TMOD4, MYH1, and MYBPC2, and enzymes
such as FBP2, ENO3, and CKMT2. In contrast, ACTN2 expression was not as highly correlated with the large cluster in
module 4. In humans, ACTN2 is expressed in all muscle fibers,
while ACTN3 is restricted to a subset of type II fibers (17). In
mice, ACTN2 is expressed in all type I, IIa, and IIx/d fibers and
some but not all IIb fibers, while ACTN3 is restricted to type
IIb fibers. The stronger correlation of ACTN3 may reflect the
preponderance of IIb fibers in LD muscle.
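The fiber-type argument above can be made concrete with a toy calculation. The sketch below assumes, purely for illustration, that the signal measured in a bulk muscle sample is the fiber-fraction-weighted sum of per-fiber expression. All numbers (fiber fractions, expression levels) and variable names are hypothetical; the sketch is not derived from the data analyzed here and shows only one way such a pattern could arise.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical fraction of fibers of the restricted type in each sample.
fast_fraction = [0.2, 0.4, 0.5, 0.7, 0.9]

# Bulk signal of a gene expressed only in that fiber type (arbitrary units).
restricted = [10.0 * f for f in fast_fraction]

# Bulk signal of a structural gene assumed to be high in that fiber type
# (8 units) and low in all other fibers (1 unit).
structural = [8.0 * f + 1.0 * (1.0 - f) for f in fast_fraction]

# Bulk signal of a gene expressed at a similar level in all fibers; its
# made-up sample-to-sample variation is unrelated to fiber composition.
ubiquitous = [5.3, 4.8, 5.4, 4.7, 5.1]

r_restricted = pearson(restricted, structural)
r_ubiquitous = pearson(ubiquitous, structural)
print(f"restricted vs structural: r = {r_restricted:.2f}")
print(f"ubiquitous vs structural: r = {r_ubiquitous:.2f}")
```

Under these assumptions the fiber-type-restricted gene tracks the structural gene almost perfectly, because both follow the fiber fraction, whereas the ubiquitously expressed gene shows only a weak correlation.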
TF analysis. Figure 4 shows negative correlations (r) between the
expression of MYOG, HMGA1, and ZFP36L1 and a cluster of
proteins that includes a number of muscle structural proteins.
Expression of both MYOG and ZFP36L1 is induced by insulin-like growth factor I (IGF1), which promotes muscle growth,
and both are induced during differentiation of mouse C2C12
cells and hence are positively correlated with the major muscle
structural proteins and enzymes in that data set (31). Additional TFs that correlated with the muscle structural proteins are ANKRD1, ATF4,
NSEP1 (YBX1), MYF6 (all positively correlated), and, more
weakly, NFKB2 and MEF2C (both negatively correlated). Of
Fig. 5. Correlation structure among the 22 gene ontologies with ≥5 genes and
representing 263 genes (or 32% of the total). Black and gray cells indicate
positive and negative relationships, respectively. White cells indicate noncorrelated ontologies. Ontologies are as follows: 1, regulation of cell cycle; 2,
ubiquitin ligase complex; 3, magnesium ion binding; 4, nuclear mRNA
splicing, via spliceosome; 5, skeletal development; 6, regulation of cell growth;
7, DNA binding; 8, transcription factor activity; 9, RNA binding; 10, structural
constituent of ribosome; 11, actin binding; 12, GTPase activity; 13, cytochrome-c oxidase activity; 14, protein serine/threonine kinase activity; 15,
hormone activity; 16, binding; 17, ATP binding; 18, GTP binding; 19,
extracellular matrix; 20, integral to nucleus; 21, integral to plasma membrane;
and 22, integral to membrane.
slow-twitch-muscle-specific proteins, and some components
common to most or all fiber types, being less correlated to
each other and the fast-twitch proteins. We have identified a
number of clusters of z-disk proteins marginally correlated to
the fast-twitch cluster. Clearly, the exact nature of the relationships between the different types of protein molecules in the
different components and types of muscle fibers is complex.
We have compared our results, predominantly from the LD
muscle in adult cattle in a range of production and nonproduction environments, with results from the in vitro differentiation
of mouse C2C12 cells. The comparisons have highlighted the
shortcomings of gene expression correlation analyses. The
very restricted nature of the C2C12 data does not allow fine
details to be distinguished. At the other end of the spectrum,
undertaking correlation analyses across diverse tissues will
focus on very basic cellular processes. We have attempted to
include a range of conditions from a very limited range of
tissues from B. taurus to find the balance between the overall
very highly correlated networks in C2C12 cells and the inclusion of a very diverse set of tissues. However, we acknowledge
that the bovine transcriptomes analyzed here are limited in
coverage and were derived from microarray data from at least
two cell types (although predominantly skeletal muscle and
adipose tissue with a smaller proportion of fibroblasts and endothelial cells), a total of 47 experimental manipulations (including development and nutritional intervention), and both in vivo
and in vitro conditions. We view this experimental breadth as
a strength because only the most fundamental transcriptional
associations will clearly emerge. On the other hand, we cannot
reject the possibility that some real associations have been
overlooked. There are some distinct differences between the
network from this analysis and a previous analysis of the first
five sets of data (21). The future challenge is to explore the
network that results from the integration of various sets of data
to construct the hierarchy of the relationships between the
components of the network, and from this, to determine the
regulatory nature that lies behind the coexpression network.
REFERENCES
1. Andersson L, Georges M. Domestic-animal genomics: deciphering the genetics of complex traits. Nat Rev Genet 5: 202–212, 2004.
2. Basso K, Margolin AA, Stolovitzky G, Klein U, Dalla-Favera R, Califano A. Reverse engineering of regulatory networks in human B cells. Nat Genet 37: 382–390, 2005.
3. Boon K, Osorio EC, Greenhut SF, Schaefer CF, Shoemaker J, Polyak K, Morin PJ, Buetow KH, Strausberg RL, De Souza SJ, Riggins GJ. An anatomy of normal and malignant gene expression. Proc Natl Acad Sci USA 99: 11287–11292, 2002.
4. Byrne KA, Wang YH, Lehnert SA, Harper GS, McWilliam SM, Bruce SL, Reverter A. Gene expression profiling of muscle tissue in Brahman steers during nutritional restriction. J Anim Sci 83: 1–12, 2005.
5. Clark KA, McElhinny AS, Beckerle MC, Gregorio CC. Striated muscle cytoarchitecture: an intricate web of form and function. Annu Rev Cell Dev Biol 18: 637–706, 2002.
6. de la Fuente A, Bing N, Hoeschele I, Mendes P. Discovery of meaningful associations in genomic data using partial correlation coefficients. Bioinformatics 20: 3565–3574, 2004.
7. Donaldson L, Vuocolo T, Gray C, Strandberg Y, Reverter A, McWilliam SM, Wang YH, Byrne KA, Tellam R. Construction and validation of a Bovine innate immune microarray. BMC Genomics 6: 135, 2005.
8. Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA. Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res 31: 3775–3781, 2003.
9. Faulkner G, Pallavicini A, Comelli A, Salamon M, Bortoletto G, Ievollela C, Trevisan S, Kojic S, Dalla Vecchia F, Laveder P, Valle G, Lanfranchi G. FATZ, a filamin-, actinin-, and telethonin-binding protein of the Z-disc of skeletal muscle. J Biol Chem 275: 41234–41242, 2000.
10. Jongeneel CV, Delorenzi M, Iseli C, Zhou D, Haudenschild CD, Khrebtukova I, Kuznetsov D, Stevenson BJ, Strausberg RL, Simpson AJ, Vasicek TJ. An atlas of human gene expression from massively parallel signature sequencing (MPSS). Genome Res 15: 1007–1014, 2005.
11. Klaavuniemi T, Kelloniemi A, Ylanne J. The ZASP-like motif in actinin-associated LIM protein is required for interaction with the alpha-actinin rod and for targeting to the muscle Z-line. J Biol Chem 279: 26402–26410, 2004.
12. Lehnert SA, Wang YH, Byrne KA. Development and application of a bovine cDNA microarray for expression profiling of muscle and adipose tissue. Aust J Exp Agric 44: 1127–1133, 2004.
13. Lehnert SA, Wang YH, Tan SH, Reverter A. Gene expression-based approaches to beef quality research. Aust J Exp Agric 46: 165–172, 2006.
14. Lehnert SA, Byrne KA, Reverter A, Nattrass GS, Greenwood PL, Wang YH, Hudson NJ, Harper GS. Gene expression profiling of bovine skeletal muscle in response to and during recovery from chronic and severe under-nutrition. J Anim Sci 84: 3239–3250, 2006.
15. Lombardo F, Komatsu D, Hadjiargyrou M. Molecular cloning and characterization of Mustang, a novel nuclear protein expressed during skeletal development and regeneration. FASEB J 18: 52–61, 2004.
16. Nobis W, Ren X, Suchyta SP, Suchyta TR, Zanella AJ, Coussens PM. Development of a porcine brain cDNA library, EST database, and microarray resource. Physiol Genomics 16: 153–159, 2003.
17. North KN, Beggs AH. Deficiency of a skeletal muscle isoform of alpha-actinin (alpha-actinin-3) in merosin-positive congenital muscular dystrophy. Neuromuscul Disord 6: 229–235, 1996.
18. Reverter A, Byrne KA, Bruce HL, Wang YH, Dalrymple BP, Lehnert SA. A mixture model-based cluster analysis of DNA microarray gene expression data on Brahman and Brahman composite steers fed high-, medium-, and low-quality diets. J Anim Sci 81: 1900–1910, 2003.
19. Reverter A, Wang YH, Byrne KA, Tan SH, Harper GS, Lehnert SA. Joint analysis of multiple cDNA microarray studies via multivariate mixed models applied to genetic improvement of beef cattle. J Anim Sci 82: 3430–3439, 2004.
20. Reverter A, Barris W, McWilliam SM, Byrne KA, Wang YH, Tan SH, Hudson NJ, Dalrymple BP. Validation of alternative methods of data normalization in gene co-expression studies. Bioinformatics 21: 1112–1120, 2005.
21. Reverter A, Barris W, Moreno-Sánchez N, McWilliam SM, Wang YH, Harper GS, Lehnert SA, Dalrymple BP. Construction of gene interaction and regulatory networks in bovine skeletal muscle from expression data. Aust J Exp Agric 45: 821–829, 2005.
22. Reverter A, Ingham A, Lehnert SA, Tan SH, Wang YH, Ratnakumar A, Dalrymple BP. Simultaneous identification of differential gene expression and connectivity in inflammation, adipogenesis and cancer. Bioinformatics July 24, 2006; doi:10.1093/bioinformatics/btl392.
23. Rubin H. Magnesium: the missing element in molecular views of cell proliferation control. Bioessays 27: 311–320, 2005.
24. Selcen D, Engel AG. Myofibrillar myopathy caused by novel dominant negative alpha B-crystallin mutations. Ann Neurol 54: 804–810, 2003.
25. Smith TPL, Grosse WM, Freking BA, Roberts AJ, Stone RT, Casas E, Wray JE, White J, Cho J, Fahrenkrug SC, Bennet GL, Heaton MP, Laegreid WW, Rohrer GA, Chitko-McKown CG, Pertea G, Holt I, Karamycheva S, Liang F, Quackenbush J, Keele JW. Sequence evaluation of four pooled-tissue normalized bovine cDNA libraries and construction of a gene index for cattle. Genome Res 11: 626–630, 2001.
26. Spitz F, Salminen M, Demignon J, Kahn A, Daegelen D, Maire P. A combination of MEF3 and NFI proteins activates transcription in a subset of fast-twitch muscles. Mol Cell Biol 17: 656–666, 1997.
27. Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB. A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA 101: 6062–6067, 2004.
28. Suchyta SP, Sipkovsky S, Kruska R, Jeffers A, McNulty A, Coussens MJ, Tempelman RJ, Halgren RG, Saama PM, Bauman DE, Boisclair YR, Burton JL, Collier RJ, DePeters EJ, Ferris TA, Lucy MC, McGuire MA, Medrano JF, Overton TR, Smith TP, Smith GW, Sonstegard TS, Spain JN, Spiers DE, Yao J, Coussens PM. Development and testing of a high-density cDNA microarray resource for cattle. Physiol Genomics 15: 158–164, 2003.
29. Sun W, Hou F, Panchenko MP, Smith BD. A member of the Y-box protein family interacts with an upstream element in the alpha1 (I) collagen gene. Matrix Biol 20: 527–541, 2001.
30. Tan SH, Reverter A, Wang YH, Byrne KA, McWilliam SM, Lehnert SA. Gene expression profiling of bovine in vitro adipogenesis using a cDNA microarray. Funct Integr Genomics Feb. 10: 1–15, 2006.
31. Tomczak KK, Marinescu VD, Ramoni MF, Sanoudou D, Montanaro F, Han J, Kunkel LM, Kohane IS, Beggs AH. Expression profiling and identification of novel genes involved in myogenic differentiation. FASEB J 18: 403–405, 2004.
32. Wang YH, McWilliam SM, Barendse W, Kata SR, Womack JE, Moore SS, Lehnert SA. Mapping of 12 bovine ribosomal protein genes using a bovine radiation hybrid panel. Anim Genet 322: 69–73, 2001.
33. Wang YH, Reverter A, Mannen H, Taniguchi M, Harper GS, Oyama K, Byrne KA, Oka A, Tsuji S, Lehnert SA. Transcriptional profiling of muscle tissue in growing Japanese Black cattle to identify genes involved with the development of intramuscular fat. Aust J Exp Agric 45: 809–820, 2005.
34. Wang YH, Byrne KA, Reverter A, Harper GS, Taniguchi M, McWilliam SM, Mannen H, Oyama K, Lehnert SA. Transcriptional profiling of skeletal muscle tissue from two breeds of cattle. Mamm Genome 16: 201–210, 2005.
35. Wolfe CJ, Kohane IS, Butte AJ. Systematic survey reveals general applicability of “guilt-by-association” within gene coexpression networks. BMC Bioinformatics 6: 227, 2005.
36. Womack JE. Advances in livestock genomics: opening the barn door. Genome Res 15: 1699–1705, 2005.
37. Zhang W, Behringer RR, Olson EN. Inactivation of the myogenic bHLH gene MRF4 results in up-regulation of myogenin and rib anomalies. Genes Dev 9: 1388–1399, 1995.