brief communication
Physiol Genomics 14: 199–207, 2003.
First published June 10, 2003; 10.1152/physiolgenomics.00143.2002.
Quantitative assessment of the importance of dye switching
and biological replication in cDNA microarray studies
Mingyu Liang,1 Amy G. Briggs,1 Elizabeth Rute,1
Andrew S. Greene,1,2 and Allen W. Cowley, Jr.1
1Department of Physiology and 2Biotechnology and Bioengineering Center,
Medical College of Wisconsin, Milwaukee, Wisconsin 53226
Submitted 24 October 2002; accepted in final form 3 June 2003
experimental design; Pearson correlation coefficient; outlier
concordance; Northern blot; gene expression
Article published online before print. See web site for date of
publication (http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: M. Liang,
Dept. of Physiology, Medical College of Wisconsin, 8701 Watertown
Plank Road, Milwaukee, WI 53226 (E-mail: [email protected]).
cDNA microarray has become an increasingly important
technique for high-throughput measurement of mRNA
expression. A cDNA microarray experiment typically
involves labeling mRNAs from two samples being compared with different fluorescent dyes such as Cy3 and
Cy5. The two samples are then hybridized together to
a microarray containing cDNA probes for thousands of
genes. The ratio between the fluorescent intensities of
Cy3 and Cy5 at each spot provides a measure of the
relative expression level of this gene between the samples examined. Although several variations of DNA
microarray techniques have been introduced and the
application of them has been diversified, cDNA microarrays utilizing such a two-color hybridization
method, as described originally by Schena et al. (16),
remain one of the most widely used methods.
The potentially enormous power of the cDNA microarray technique and its inherent complexity have
motivated a large number of experiments and analyses
studying various aspects of the technique, particularly
the preparation of arrays and samples and the analysis
of data (14). As cDNA microarray is being incorporated
into more physiologically oriented studies involving
multiple factors and naturally existing variability, experimental design also needs to be rigorously addressed (2, 19). Two particularly important issues in
experimental design are the use of dye switching and
biological replication. Due to the physicochemical differences between fluorescent dyes Cy3 and Cy5, it is
suspected that they might cause systematic bias in the
ratios generated. Random variations in the handling of
the two samples or the scanning of the two fluorescent
channels could also result in ratio bias. In addition to
normalization between the two dyes (14, 15), a commonly used approach to correct any residual dye bias is
to repeat the hybridization, with Cy3 and Cy5 switched
between the two samples being compared. Biological
replication, in which several independent individuals
are analyzed in a study, is a standard practice in
physiological experiments because of the well-known
variability between individuals. Although both dye
switching and biological replication are intuitively beneficial for cDNA microarray studies, one of the drawbacks is that these procedures substantially increase
the costs of these already expensive experiments, further limiting the ability of a laboratory to use cDNA
microarrays. These procedures also further increase
the complexity of the experimental design and the data
structure, posing even greater challenges for data analysis. The practical question therefore becomes to what
extent a cDNA microarray experiment can benefit
from dye switching and/or biological replication, i.e.,
whether the benefits are great enough to justify the
additional costs and the increased complexity.
1094-8341/03 $5.00 Copyright © 2003 the American Physiological Society
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 15, 2017
Liang, Mingyu, Amy G. Briggs, Elizabeth Rute, Andrew S. Greene, and Allen W. Cowley, Jr. Quantitative
assessment of the importance of dye switching and biological
replication in cDNA microarray studies. Physiol Genomics
14: 199–207, 2003. First published June 10, 2003; 10.1152/
physiolgenomics.00143.2002.—Dye switching and biological
replication substantially increase the cost and the complexity
of cDNA microarray studies. The objective of the present
analysis was to quantitatively assess the importance of these
procedures to provide a quantitative basis for decision-making in the design of microarray experiments. Taking advantage of the unique characteristics of a published data set, the
impact of these procedures on the reliability of microarray
results was calculated. Adding a second microarray with dye
switching substantially increased the correlation coefficient
between observed and predicted ln(ratio) values from 0.38 ±
0.06 to 0.62 ± 0.04 (n = 12) and the outlier concordance from
21 ± 3% to 43 ± 4%. It also increased the correlation with the
entire set of microarrays from 0.60 ± 0.04 to 0.79 ± 0.04 and
the outlier concordance from 31 ± 6% to 58 ± 5% and tended
to improve the correlation with Northern blot results. Adding
a second microarray to include biological replication also
improved the performance of these indices but often to a
lesser degree. Inclusion of both procedures in the second
microarray substantially improved the consistency with the
entire set of microarrays but had minimal effect on the
consistency with predicted results. Analysis of another data
set generated using a different cDNA labeling method also
supported a significant impact of dye switching. In conclusion, both dye switching and biological replication substantially increased the reliability of microarray results, with dye
switching likely having even greater benefits. Recommendations regarding the use of these procedures were proposed.
DYE SWITCHING AND BIOLOGICAL REPLICATION IN MICROARRAYS
In the present analysis, we took advantage of the
unique characteristics of a published microarray data
set that was generated in a physiologically oriented
context (9), and we developed several algorithms to
quantitatively assess the importance of dye switching
and biological replication. Guidelines for designing
cDNA microarray experiments were proposed based on
this analysis.
METHODS
Characteristics of the Data Set Used
Fig. 1. A: experimental design of the data set utilized in the present study. Four groups, each containing three
individual rats, were subjected to four comparisons. Each comparison involved three different pairs of rats, each
examined by both forward and reverse labeling. SSLS, Dahl salt-sensitive rats on a low-salt diet; SSHS, Dahl
salt-sensitive rats on a high-salt diet; 13LS, consomic SS.BN13 rats on a low-salt diet; 13HS, consomic SS.BN13
rats on a high-salt diet. B: subsets of microarrays with different combinations used in the present analysis. The
three numbers in the abbreviation shown in parentheses represent number of arrays, number of pairs of rats, and
number of labeling directions (forward or reverse), respectively. Data subsets generated from one of the four
comparisons are shown as examples.
Physiol Genomics • VOL
14 •
www.physiolgenomics.org
A previously published microarray data set (9) was utilized
for this analysis, in which cDNA microarrays were used to
identify genes in the rat renal medulla associated with the
development of Dahl salt-sensitive hypertension. A custom-made microarray containing cDNA probes for ~2,000 genes,
representing ~80% of all currently known rat genes, was
used. Microarray hybridization was carried out with the
widely used direct, two-color, Cy3 and Cy5, labeling method.
A custom-designed data analysis method was used to screen
for reliable data points and to adjust signal intensity, correct
background, and calculate and normalize natural log-transformed ratios. Details of these procedures were described
previously (9). Renal medullary mRNA expression profiles
were compared in four groups of rats, Dahl salt-sensitive rats
on a low-salt (SSLS) or high-salt (SSHS) diet, and consomic,
salt-insensitive SS.BN13 rats on a low-salt (13LS) or high-salt (13HS) diet, using a loop-like, four-way comparison experimental design with a total of 24 microarrays. As depicted
in Fig. 1A, three pairs of individual rats were compared in
each comparison between two groups of rats (i.e., biological
replication), and each pair of rats was examined with both
forward and reverse labeling (i.e., dye switching). This design
enabled the evaluation of contributions of biological replication and dye switching separately or in combination. Moreover, 20 randomly selected genes were further analyzed with
Northern blots, providing one of the largest sets of validation
data in the microarray literature, although still limited from
a data analysis point of view.
A second data set from a study by Yuan et al. (21) was used
to test whether the conclusions drawn from the main data set
described above could be applied to other cDNA labeling
methods. The study by Yuan et al. (21) used arrays similar to
those used in the study described above but utilized a different labeling method using the commercially available TSA
Labeling and Detection Kit (MICROMAX; NEN Life Science
Products, Boston, MA). The TSA method (21) involved labeling the reverse transcription products generated from total
RNA with fluorescein or biotin and subsequent antibody-mediated deposition of Cy3 and Cy5. Data from eight arrays
(designated A1 to A8) examining eight control samples (designated C1 to C8) and eight treated samples (designated T1
to T8), each extracted from an individual rat, were analyzed.
The dye-labeling pattern was as follows: in A1 to A4, the
control samples (i.e., C1 to C4) were labeled with Cy3, and
the treated samples (i.e., T1 to T4) were labeled with Cy5; in
A5 to A8, the labeling was reversed, i.e., the control samples
were labeled with Cy5, and the treated samples were labeled
with Cy3.
Identification of Outliers Using an Intensity-Dependent,
Continuous Curve of Threshold
Generation of Data Subsets to Separate the Impact of Dye
Switching and Biological Replication
To evaluate the impact of dye switching and/or biological
replication, we divided data from each of the four comparisons into several subsets of data in six different combinations
as shown in Fig. 1B. The combination of “1 array, 1 pair of
rats, 1 way of labeling” (1-1-1) constituted a baseline condition where neither dye switching nor biological replication
was utilized. The combination of “2 arrays, 1 pair of rats, both
ways of labeling” (2-1-2) utilized dye switching when the
second array was added, whereas the combination of “2
arrays, 2 pairs of rats, 1 way of labeling” (2-2-1) utilized
biological replication. Any changes in the reliability of microarray results in combinations “2-1-2” and “2-2-1” compared with “1-1-1” would reflect the impact of dye switching
and biological replication, respectively, in addition to the
impact of adding a second array itself. The combination of “2
arrays, 2 pairs of rats, 2 ways of labeling” (2-2-2) would
reflect the impact of simultaneous addition of dye switching
and biological replication in the second array. The combination of “4 arrays, 2 pairs of rats, both ways of labeling” (4-2-2)
reflected the impact of dye switching and biological replication when they were added sequentially, but also reflected
the impact of increasing the number of arrays to four. The
combination of “6 arrays, 3 pairs of rats, both ways of labeling” (6-3-2) added to the combination of “4-2-2” another
biological replicate with both ways of labeling. The ln(ratio)
values were averaged for each gene in each subset and used
for subsequent analyses. Note that in some combinations
such as “2-2-1” and “2-2-2,” a pair of rats had to be used in
more than one data subset to take advantage of a more
complete coverage of the available data. As a result, not all
individual subsets of data in these combinations were completely independent of each other. Accordingly, conventional
statistical significance was not tested. Similar trends were
seen when only independent subsets were examined.
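The averaging step for a data subset can be illustrated with a minimal Python sketch. It assumes (this is not stated in the text) that the ln(ratio) of every array has already been expressed in the same sample orientation, e.g., ln(SSHS/SSLS), regardless of which dye labeled which sample, and that a dye bias adds to the forward array and subtracts from the reverse array. The function name, gene name, and numbers are hypothetical.

```python
from statistics import mean

def average_ln_ratios(arrays):
    """Average ln(ratio) gene by gene across the arrays in one subset.

    `arrays` is a list of dicts mapping gene name -> ln(ratio); only genes
    present on every array in the subset are averaged.
    """
    shared = set(arrays[0]).intersection(*arrays[1:])
    return {gene: mean(a[gene] for a in arrays) for gene in sorted(shared)}

# Hypothetical gene with a true ln(ratio) of 0.50 and a dye bias of 0.20.
forward = {"geneA": 0.50 + 0.20}  # forward labeling: bias adds (assumption)
reverse = {"geneA": 0.50 - 0.20}  # reverse labeling: bias subtracts (assumption)

# A "2-1-2" subset: same pair of rats, both labeling directions.
subset_2_1_2 = average_ln_ratios([forward, reverse])
```

Under these assumptions the opposite-signed bias terms cancel in the average, which is the rationale for dye switching; a bias that is not symmetric between the two directions would cancel only partly.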
Quantification of the Impact of Dye Switching and/or
Biological Replication
Three indices were examined to assess the reliability of
results obtained from each combination described in Fig. 1B
and, thereby, to quantify the importance of dye switching
and/or biological replication.
Index 1: Consistency between observed ln(ratio) values and
ln(ratio) values predicted on the basis of the loop-like, four-way comparison design. With the loop-like, four-way comparison design, ln(ratio) values for any given comparison could
be predicted based on ln(ratio) values from the other three
comparisons using the following formulas:
Predicted ln(13HS/SSHS)
= ln(13HS/13LS) + ln(13LS/SSLS) − ln(SSHS/SSLS)
Predicted ln(SSHS/SSLS)
= ln(13HS/13LS) + ln(13LS/SSLS) − ln(13HS/SSHS)
Predicted ln(13HS/13LS)
= ln(SSHS/SSLS) + ln(13HS/SSHS) − ln(13LS/SSLS)
Predicted ln(13LS/SSLS)
= ln(SSHS/SSLS) + ln(13HS/SSHS) − ln(13HS/13LS)
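The four prediction formulas and the Pearson correlation used to compare predicted with observed values can be sketched in Python as follows. This is an illustrative sketch, not the original analysis code: the dictionary layout, the function names, and the expression levels in the example are hypothetical.

```python
from math import log, sqrt

# Each target comparison is predicted from the other three (signs as in the formulas).
FORMULAS = {
    "13HS/SSHS": (("13HS/13LS", +1), ("13LS/SSLS", +1), ("SSHS/SSLS", -1)),
    "SSHS/SSLS": (("13HS/13LS", +1), ("13LS/SSLS", +1), ("13HS/SSHS", -1)),
    "13HS/13LS": (("SSHS/SSLS", +1), ("13HS/SSHS", +1), ("13LS/SSLS", -1)),
    "13LS/SSLS": (("SSHS/SSLS", +1), ("13HS/SSHS", +1), ("13HS/13LS", -1)),
}

def predict_ln_ratio(target, observed):
    """Predict ln(ratio) of `target` for one gene from the other three comparisons."""
    return sum(sign * observed[name] for name, sign in FORMULAS[target])

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical noise-free expression levels of one gene in the four groups.
level = {"SSLS": 1.0, "SSHS": 2.0, "13LS": 1.5, "13HS": 1.2}
observed = {
    name: log(level[name.split("/")[0]] / level[name.split("/")[1]])
    for name in FORMULAS
}
predicted = {name: predict_ln_ratio(name, observed) for name in FORMULAS}
```

With noise-free values the predicted and observed ln(ratio) values coincide for all four comparisons; in real data the correlation across genes, `pearson(predicted_values, observed_values)`, falls below 1 because measurement error from three comparisons accumulates in each prediction.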
For each combination of arrays shown in Fig. 1B, the
Pearson correlation coefficient and the concordance of outliers were calculated as measures of the consistency between
predicted and observed data. The Pearson correlation coefficient was calculated based on predicted ln(ratio) values and
observed ln(ratio) values of all available genes. The outlier
concordance, expressed as percentage, was calculated as [2 ×
M/(A + B)] × 100, in which A and B represented the numbers
of outliers identified from two data subsets being compared
(the predicted and the observed data in this case), and M
represented the number of overlapping outliers. The number
of outliers varied from one data subset to another but was
generally within the range of 30 to 60. Ideally, predicted
ln(ratio) values should be identical to observed ln(ratio) values. However, technical variance exists between any two
microarrays. Since the predicted ln(ratio) values were essentially the sum of ln(ratio) values from three microarrays (or
three sets of microarrays), the variance between predicted
ln(ratio) values and observed ln(ratio) values would be
greater than the variance that can be expected between any
two sets of microarrays. The ability of dye switching and/or
biological replication to reduce this composite variance,
therefore, provided a sensitive measure of their benefits.
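The outlier concordance defined above, [2 × M/(A + B)] × 100, can be computed as in the following minimal Python sketch. The function name and the gene identifiers are hypothetical, and raising an error when neither subset has outliers is a choice made here, since the text does not define that case.

```python
def outlier_concordance(outliers_a, outliers_b):
    """Return [2 x M / (A + B)] x 100 for two collections of outlier gene IDs."""
    a, b = set(outliers_a), set(outliers_b)
    if not a and not b:
        raise ValueError("concordance is undefined when neither subset has outliers")
    m = len(a & b)  # M: outliers found in both subsets
    return 100.0 * 2 * m / (len(a) + len(b))

# Hypothetical example: 40 observed outliers, 50 predicted outliers, 18 shared.
observed_outliers = [f"g{i}" for i in range(40)]       # g0 .. g39
predicted_outliers = [f"g{i}" for i in range(22, 72)]  # g22 .. g71
concordance = outlier_concordance(observed_outliers, predicted_outliers)
```

In this example the concordance is 2 × 18/(40 + 50) × 100 = 40%. The index is 100% only when the two outlier lists are identical and 0% when they share no genes.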
Index 2: Consistency between results from subsets of microarrays and the entire set of microarrays. The Pearson
correlation coefficient of ln(ratio) values and the concordance
of outliers were calculated for each combination of arrays
shown in Fig. 1B (except the combination of “6-3-2”) compared with the entire set of arrays (i.e., the combination of
“6-3-2”). The ability of dye switching and/or biological replication to increase this consistency was used as a measure of
their benefits.
Index 3: Consistency between results from microarrays
and Northern blots. The Pearson correlation coefficient between microarray and Northern blot ln(ratio) values of 20
genes was calculated for each combination of arrays (Fig. 1B)
as another index of the reliability of microarray results.
A criterion of two times the standard deviation of the
entire set of ln(ratio) values was used as the threshold to
identify differentially expressed genes (i.e., outliers) in the
original study (9). This criterion assumed that expressions of
the majority of genes remained unchanged under the experimental conditions examined. However, a large dispersion of
ln(ratio) values has been noticed at the lower range of signal
intensity, which gradually decreases as intensity increases.
Similar dispersion patterns were seen when identical samples were hybridized against each other (1), indicating that it
was a systematic technical artifact, rather than a biological
phenomenon. With data dispersed in this manner, when a
constant threshold such as two times the standard deviation
of the entire set of ln(ratio) values is applied, genes with
lower intensities have a higher probability of being identified
as outliers. To avoid this bias, an algorithm was developed to
generate an intensity-dependent, continuous threshold curve.
Genes were ranked according to their intensities and divided
into consecutive groups, each containing 50 genes. The average of normalized ln(ratio) values in each group was confirmed to be close to 0. The standard deviation of ln(ratio)
values as well as the average of ln(intensity) values in each
group was calculated. An equation was identified to describe
the relationship between two times the ln(ratio) standard
deviation of each 50-gene group and the corresponding average of ln(intensity). This equation was then used to calculate a ln(ratio) threshold at the ln(intensity) level of any
given gene. If the actual ln(ratio) of a gene exceeded the
calculated ln(ratio) threshold, then the gene was considered
an outlier. This threshold curve was refitted for each subset
of arrays (see Generation of Data Subsets to Separate the Impact of Dye
Switching and Biological Replication) since each data subset might
contain a different number of microarrays.
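The intensity-dependent threshold procedure described in METHODS (rank genes by intensity, form consecutive 50-gene groups, regress two times the ln(ratio) standard deviation of each group on the group's mean ln(intensity), then compare each gene with its own threshold) can be sketched in Python as follows. This is a sketch under stated assumptions rather than the original implementation: a straight-line fit is used (as reported in RESULTS), an incomplete final group is dropped, the threshold is applied to the absolute value of ln(ratio), and the data are synthetic.

```python
from statistics import mean, stdev

def fit_threshold_curve(ln_intensity, ln_ratio, bin_size=50):
    """Fit 2 x SD(ln ratio) of each bin against the bin's mean ln(intensity).

    Returns (slope, intercept) of an ordinary least-squares line.
    Assumption: a trailing group with fewer than `bin_size` genes is ignored.
    """
    order = sorted(range(len(ln_intensity)), key=lambda i: ln_intensity[i])
    xs, ys = [], []
    for start in range(0, len(order) - bin_size + 1, bin_size):
        idx = order[start:start + bin_size]
        xs.append(mean(ln_intensity[i] for i in idx))
        ys.append(2 * stdev(ln_ratio[i] for i in idx))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar

def find_outliers(ln_intensity, ln_ratio, slope, intercept):
    """Indices of genes whose |ln(ratio)| exceeds their intensity-specific threshold."""
    return [i for i, (x, r) in enumerate(zip(ln_intensity, ln_ratio))
            if abs(r) > slope * x + intercept]

# Synthetic, deterministic data (hypothetical): dispersion shrinks as intensity rises.
ln_intensity = [0.05 * i for i in range(200)]
ln_ratio = [(1 if i % 2 else -1) * (1.0 - 0.08 * x)
            for i, x in enumerate(ln_intensity)]
slope, intercept = fit_threshold_curve(ln_intensity, ln_ratio)
outliers = find_outliers(ln_intensity, ln_ratio, slope, intercept)
```

Because the curve is refitted from whatever arrays make up a subset, the same two functions can be applied unchanged to subsets that contain different numbers of microarrays.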
RESULTS
Fig. 2. A: an example of the intensity-dependent dispersion of ln(ratio). Yellow curves
indicate the global, intensity-independent threshold of differential expression determined by
two times the ln(ratio) standard deviation of the entire array. Purple curves indicate the intensity-dependent, continuous
thresholds of differential expression determined by the linear regression equation obtained in B. B: two times the ln(ratio)
standard deviation of each 50-gene bin was
linearly correlated with the averaged ln(intensity) of each bin.
In the study by Liang et al. (9), microarray data were
normalized by shifting the mean ln(ratio) of an array to
0. Other normalization methods, such as intensity-dependent normalization (LOWESS correction), pin-by-pin normalization, and scaling, have been described
and shown to be beneficial (20). Therefore, the necessity of applying these methods to the data set used in
this analysis was examined. The plots of ln(ratio) vs.
averaged ln(intensity) were created for eight microarrays (two from each of the four comparisons). None of
these arrays exhibited the typical, intensity-dependent
deviation of ln(ratio) from 0 (the “Nike swoop” shape)
that would constitute the basis for the LOWESS correction. The plots were generally symmetrical around
the horizontal axis. An example of this plot can be
found in Fig. 2A. Variations between the four printing
pins also appeared to be small. The pin-to-pin coefficient of variation of the number of outliers (based on a
threshold set for the entire array) was 0.19 ± 0.06 (n =
8), and that of the standard deviation of ln(ratio) [reflecting the range of ln(ratio) in each pin] was 0.10 ±
0.01 (n = 8). The array-to-array coefficient of variation
of the standard deviation of ln(ratio) was 0.23. Therefore, it appeared that the benefit of applying additional
normalization methods would be minimal in this particular data set, especially if the tradeoff of applying
additional normalizations (i.e., the potential to compromise the validity of the assumptions underlying normalizations) was taken into consideration. These results support the notion that substantial differences
exist among various array platforms and that normalization methods should be chosen based on the characteristics of specific data sets.
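The two quantities used in this check, mean-centering of ln(ratio) and the coefficient of variation across printing pins, can be sketched in Python as follows. The function names and all numbers are hypothetical, and the use of the sample standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, stdev

def mean_center(ln_ratios):
    """Normalize one array by shifting its mean ln(ratio) to 0."""
    m = mean(ln_ratios)
    return [v - m for v in ln_ratios]

def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean (assumed estimator)."""
    return stdev(values) / mean(values)

# Hypothetical raw ln(ratio) values from one array.
raw = [0.35, -0.10, 0.22, 0.05, -0.27, 0.41]
centered = mean_center(raw)

# Hypothetical ln(ratio) standard deviations in the four printing pins of one array.
pin_sd = [0.28, 0.31, 0.33, 0.30]
pin_cv = coefficient_of_variation(pin_sd)
```

A small pin-to-pin coefficient of variation, as in this made-up example, is the kind of evidence used above to argue that pin-by-pin normalization would add little for this data set.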
Figure 2A depicts a typical distribution of ln(ratio)
values over ln(intensity) values in a cDNA microarray
hybridization. The dispersion of ln(ratio) values decreased as the ln(intensity) increased. If two times the
standard deviation of the entire set of ln(ratio) values,
0.588, was used as the threshold of differential expression
(the yellow lines in Fig. 2A), then a disproportionally
large number of genes at the lower intensity range would
be identified as differentially expressed. When two times
the standard deviation of ln(ratio) values in each 50-gene
bin was plotted against the averaged ln(intensity) of the
bin, a linear relationship was revealed (Fig. 2B). The
linear regression equation, ln(ratio) = −0.10 × ln(intensity) + 0.36, with a Pearson correlation coefficient of
−0.78, was then used to calculate a threshold ln(ratio) for
each gene based on its ln(intensity). These threshold
ln(ratio) values formed continuous lines shown in purple
in Fig. 2A. Differentially expressed genes (i.e., outliers)
identified in this way were used in the following calculation of outlier concordance.
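How the reported regression equation translates into a per-gene threshold, and how it compares with the global threshold of 0.588, can be sketched in Python as follows. The coefficients are those given above; the function names are hypothetical, comparing the absolute value of ln(ratio) with the threshold is an assumption, and the equation is only meaningful over the ln(intensity) range of the original data, which is not given here.

```python
GLOBAL_THRESHOLD = 0.588  # two times the SD of the entire set of ln(ratio) values

def reported_threshold(ln_intensity):
    """Threshold ln(ratio) from the reported line: -0.10 x ln(intensity) + 0.36."""
    return -0.10 * ln_intensity + 0.36

def is_outlier(ln_ratio, ln_intensity):
    """Assumption: a gene is an outlier when |ln(ratio)| exceeds its threshold."""
    return abs(ln_ratio) > reported_threshold(ln_intensity)

# ln(intensity) at which the intensity-dependent and global thresholds coincide.
crossover = (0.36 - GLOBAL_THRESHOLD) / 0.10
```

Below the crossover value the intensity-dependent threshold is larger than the global one, so fewer low-intensity genes are called outliers; above it the threshold is smaller, so modest changes in high-intensity genes can still be detected.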
The Pearson correlation coefficient (r) between observed ln(ratio) values and predicted ln(ratio) values
based on subsets of microarrays, each containing a
single microarray (the combination “1-1-1,” Fig. 1)
was 0.38 ± 0.06 (n = 12, Fig. 3A), and the outlier
concordance was 21 ± 3% (n = 12, Fig. 3B). Adding
a second array examining the same pair of rats, but
with a reverse labeling (the combination “2-1-2”),
substantially increased the correlation coefficient to
0.62 ± 0.04 (n = 12) and the outlier concordance to
43 ± 4% (n = 12). When a second array was added to
examine a different pair of rats with the same way of
labeling (the combination “2-2-1”), the correlation
coefficient was similarly increased to 0.62 ± 0.03
(n = 12), while the outlier concordance increased to
35 ± 3% (n = 12). Adding a second array examining
a different pair of rats with a reverse labeling (the
combination “2-2-2”) did not increase the correlation
coefficient (0.38 ± 0.08, n = 12) and only slightly
increased the outlier concordance to 26 ± 4% (n =
12). Increasing the number of arrays to four or six to
include two or three pairs of rats, each examined
with forward and reverse labeling (combinations “4-2-2” or “6-3-2,” n = 4 each), resulted in greater
increases in the correlation coefficient that reached
0.69 ± 0.04 or 0.79 ± 0.03. In addition, the outlier
concordance was increased to 52 ± 4% or 56 ± 4%.
An example of the correlation for each combination is
shown in Fig. 3, C–H.
As shown in Fig. 4, A–G, the correlation coefficient
and outlier concordance between subsets of arrays and
the entire set of six arrays followed the same trend of
changes as the predictability described above. The exception to this was the combination of two arrays
examining two pairs of rats with one forward and the
other reverse labeling (“2-2-2”). The correlation coefficient and outlier concordance between the combination
“2-2-2” and the entire set of six arrays reached a level
similar to or slightly higher than that achieved when
two arrays examining one pair of rats with both ways
of labeling (“2-1-2”) were evaluated (Fig. 4).
The correlation coefficients between array ln(ratio) values and Northern blot ln(ratio) values of 20 randomly
selected genes also followed a similar trend (Fig. 5).
Fig. 3. Effects of dye switching and/or biological replication on the consistency between observed and predicted
results. See METHODS for the calculation of predicted results. A: correlation coefficient (means ± SE of data subsets
generated from the entire study; see the text for n numbers) between observed ln(ratio) values and predicted
ln(ratio) values. B: concordance (means ± SE; see the text for n numbers) between the outliers identified from
observed ln(ratio) values and predicted ln(ratio) values. C–H: representative examples of correlation between
observed and predicted ln(ratio) values obtained from each combination of arrays; 1-1-1 = 1 array, 1 pair of rats,
1 way of labeling; 2-1-2 = 2 arrays, 1 pair of rats, forward and reverse labeling; 2-2-1 = 2 arrays, 2 pairs of rats,
1 way of labeling; 2-2-2 = 2 arrays, 2 pairs of rats, one with forward and one with reverse labeling; 4-2-2 = 4 arrays,
2 pairs of rats, forward and reverse labeling for each pair of rats; 6-3-2 = 6 arrays, 3 pairs of rats, forward and
reverse labeling for each pair of rats.
To test whether these results were only associated
with the particular cDNA labeling method used in the
study by Liang et al. (9), a second data set (21) generated
using a different labeling method was analyzed (see
METHODS). Only the second consistency index (i.e., the
consistency between data subsets and the entire data
set) was calculated for this analysis due to the lack of a
loop-like design needed for the first index and the limited
number of Northern blots. Two types of data subsets were
created, 2-2-1 (2 arrays, 2 pairs of rats, 1 way of labeling)
and 2-2-2 (2 arrays, 2 pairs of rats, one with forward and
the other with reverse labeling). Four individual subsets
were created for each type. An example of 2-2-1 would be
the combination of arrays A1 and A2, whereas the combination of A1 and A5 would be an example of 2-2-2. So
the only difference between 2-2-1 and 2-2-2 was that
2-2-2 contained dye switching, whereas 2-2-1 did not.
Variations between different pairs of rats were random
and therefore should, on average, have equal impact on
“2-2-1” and “2-2-2”. When these subsets were compared
with the entire set of arrays, it was found that the
correlation coefficient was 0.76 ± 0.05 for 2-2-1, and it
was 0.89 ± 0.02 for 2-2-2. The outlier concordance was
31 ± 6% for 2-2-1 and 47 ± 2% for 2-2-2.
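The distinction between the two subset types in the second data set follows directly from the labeling pattern given in METHODS (control samples labeled with Cy3 on A1 to A4 and with Cy5 on A5 to A8). A minimal Python sketch of that classification is shown below; the names are hypothetical, and the sketch enumerates every possible pair of arrays, whereas the analysis above used four subsets of each type without specifying which ones.

```python
from itertools import combinations

# Dye used for the control sample on each array (from the labeling pattern in METHODS).
CONTROL_DYE = {f"A{i}": ("Cy3" if i <= 4 else "Cy5") for i in range(1, 9)}

def subset_type(array_a, array_b):
    """Classify a two-array subset as 2-2-1 (same labeling) or 2-2-2 (dye-switched)."""
    same_direction = CONTROL_DYE[array_a] == CONTROL_DYE[array_b]
    return "2-2-1" if same_direction else "2-2-2"

# Enumerate all possible two-array subsets by type.
pairs = {"2-2-1": [], "2-2-2": []}
for a, b in combinations(sorted(CONTROL_DYE), 2):
    pairs[subset_type(a, b)].append((a, b))
```

Each array examined a different pair of rats, so both types include biological replication and differ only in whether the two arrays were labeled in opposite directions.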
DISCUSSION
Indices of Benefits
Fig. 4. Effects of dye switching and/or biological replication on the consistency between results from subsets of
microarrays and the entire set of six microarrays. A: correlation coefficient (means ± SE; see the text for n
numbers) between ln(ratio) values from subsets of microarrays and the entire set of microarrays. B: concordance
(means ± SE; see the text for n numbers) between the outliers identified from subsets of microarrays and the entire
set of microarrays. C–G: representative examples of correlation between ln(ratio) values obtained from each subset
of microarrays and the entire set of microarrays.
To quantitatively assess the benefits of dye switching and biological replication, one would ideally want
to compare microarray results (with or without dye
switching and/or biological replication) to a “gold standard” mRNA measurement method. This is, however,
practically difficult. None of the mRNA measurement
techniques currently available has been accepted as a
“gold standard” by all biologists. Moreover, techniques
that have been used to validate microarray results,
such as Northern blotting and real-time PCR, are difficult to use at a throughput level high enough to allow
large-scale comparisons with microarray.
In the absence of a “gold standard,” it is still possible
to decompose sources of variation and assess the relative contribution of each source to the overall variation (7, 18). The purpose of the present study, however, was to assess the impact of dye switching and
biological replication on the reliability of microarray
results. Reliability may or may not be equivalent to
reproducibility measured by variation, depending on
how these terms are defined. In the setting of biological
experiments such as the one analyzed in the present
study, reliability can be further defined as precision
(i.e., how precisely the data reflect the subjects being
measured) and “generalizability” (i.e., how well the
conclusions derived from the measurement of a limited
number of subjects can be extrapolated to a larger
population).
In the present analysis, we took advantage of the
unique characteristics of a published data set (9) and
used the combination of three indices to assess the
impact of dye switching and biological replication on
the precision and/or generalizability of microarray results. Each index has advantages and disadvantages.
The predictability index was used to assess precision,
because if each measurement were a precise representation of the subject, then the measured and the predicted data would be identical. One could argue that
this index was in fact reflecting reproducibility in repeated measurements of a subject, which may or may
not be equivalent to precision. The measured and the
predicted data would be identical so long as the repeated measurements were reproducible, even though
they might not be precise. However, in the absence of a
“gold standard,” reproducibility in repeated measurements does provide a reasonable indication of precision. An advantage of this index is that it is free of any
assumptions regarding the benefits of dye switching or
biological replication. The disadvantage is that it does
not reflect generalizability. The use of the consistency
with the entire set of arrays had the disadvantage of
assuming qualitative benefits of dye switching and
biological replication because both procedures were
utilized in the entire set of arrays. However, so long as
this assumption was acceptable, the relative ability of
dye switching and/or biological replication in each subset of arrays to bring the results closer to the entire set
of arrays would provide a straightforward measure of
the quantitative benefits of dye switching and/or biological replication in extrapolating the results to the
whole population, i.e., the generalizability of the results. The obvious advantage of the comparison with
Northern blots was the use of an independent second
technique, and it could reflect both precision and generalizability. The disadvantage was that the number of
genes for which both microarray and Northern blot
data were available was limited, reducing the power of
this index. In addition, because of the lack of a “gold
standard,” one could always question the relative reliability of microarray vs. Northern blot. Therefore, despite the limitations of each index, the three indices
appear to complement each other. Consistent trends
observed in more than one of them would provide a
strong indication of improvements in data reliability.
Relative Benefits of Dye Switching and
Biological Replication
One of these consistent trends was the improvement
of all three indices when a second array was added
using the reverse labeling to examine the same pair of
rats (i.e., dye switching). A 63% increase in correlation
coefficient and a doubling of outlier concordance between observed and predicted data were obtained. Similar improvements were found when comparing between subsets of arrays and the entire set of arrays.
The data set available did not allow quantitative distinction between the effect of dye switching and the
effect of simply adding a second array. However, the
improvement in consistency that was observed very
likely involved the benefits of dye switching, because
other combinations containing two arrays did not
achieve the same level of improvement. In fact, the
improvement achieved by adding a second array labeled in the same way but to examine a different pair
of rats (i.e., biological replication) was often less than
that obtained by dye switching. These results indicated
that both dye switching and biological replication improved the reliability of microarray results, with dye
switching likely having even greater benefits.
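One way to see why a dye-switched pair of arrays could outperform a pair labeled in the same way is a small numerical illustration. The Python sketch below is not the analysis performed in this study. It assumes that the observed ln(ratio) of each gene equals the true ln(ratio) plus an additive, gene-specific dye bias; the function names `combine_dye_swap` and `combine_same_labeling` and all numbers are hypothetical.

```python
# Illustrative sketch only (not the authors' analysis). Assumption: each
# gene's observed ln(Cy5/Cy3) equals the true ln(ratio) plus an additive,
# gene-specific dye bias. All values below are made up.


def combine_dye_swap(forward, reverse):
    """Combine a forward-labeled array with a dye-switched array.

    In the switched array the two samples trade dyes, so the true ln(ratio)
    changes sign while the dye bias keeps its sign. Halving the difference
    therefore cancels the bias under the additive assumption.
    """
    return [(f - r) / 2 for f, r in zip(forward, reverse)]


def combine_same_labeling(first, second):
    """Average two arrays labeled the same way; the dye bias is retained."""
    return [(a + b) / 2 for a, b in zip(first, second)]


true_ln_ratio = [0.50, -0.25, 0.00, 1.00]  # hypothetical sample A vs. sample B
dye_bias = [0.25, 0.25, -0.50, 0.00]       # hypothetical per-gene dye bias

# Forward labeling: A with Cy5, B with Cy3. Reverse labeling: dyes traded.
forward = [t + b for t, b in zip(true_ln_ratio, dye_bias)]
reverse = [-t + b for t, b in zip(true_ln_ratio, dye_bias)]

swapped = combine_dye_swap(forward, reverse)
repeated = combine_same_labeling(forward, forward)

print("dye switching: ", swapped)
print("same labeling: ", repeated)
```

In this idealized setting the dye-switched combination recovers the true ln(ratio) values, whereas repeating the same labeling leaves the bias untouched. Real arrays also carry measurement noise and biological variability, which this sketch ignores.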
The ln(ratio) data used in these analyses had been
normalized by adjusting the mean ln(ratio) of each
array to 0 (9). It therefore appears that normalization
alone was not sufficient to remove the influence of the
dye difference. This was consistent with the remarkably strong effect of the dye difference on microarray
results, such as that reported by Jin et al. (5), supporting the notion that dye switching is required for obtaining reliable microarray results. The exact nature
and the mechanism underlying dye biases are not clear
at present. Further experiments and a deeper understanding of the physicochemical characteristics of the dyes and their binding kinetics are needed to address these questions.
Fig. 5. Effects of dye switching and/or biological replication on the
correlation coefficient (mean ± SE; see the text for n numbers)
between ln(ratio) values of 20 genes obtained from microarrays and
Northern blots.
Determining Thresholds of Differential Expression
Determining the threshold of differential expression
is a major issue in microarray studies. A fixed fold
change was widely used in earlier studies, often without a convincing rationale. A standard deviation-based
threshold (9), predetermined P value threshold (5, 21),
corrected P values (17) and “null distribution”-based
approaches (4, 10) have also been applied. The intensity-dependent dispersion of ratios has been noted
previously (11). The intensity-dependent, continuous
threshold curve utilized in the present analysis was
similar to that developed by Mutch et al. (11), except
that a logarithmic function, instead of an inverse function, was used in the present analysis. Genes identified
as differentially expressed using this equation contained a more consistent representation of genes across
the entire range of ln(intensity) as shown in Fig. 2A.
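As a rough illustration of how an intensity-dependent, continuous threshold behaves, the following Python sketch applies a cutoff on |ln(ratio)| that shrinks as ln(intensity) rises. It is a sketch under stated assumptions, not the equation used in the present analysis: the functional form, the coefficients `A`, `B`, and `T_MIN`, and the function names are all hypothetical placeholders.

```python
# Illustrative sketch only. The functional form and the coefficients below
# are placeholders, NOT the threshold equation fitted in this study.
A = 2.5       # hypothetical intercept of the threshold curve
B = 0.25      # hypothetical slope with respect to ln(intensity)
T_MIN = 0.5   # hypothetical floor so the threshold never reaches zero


def threshold(ln_intensity):
    """Return the |ln(ratio)| cutoff for a spot with the given ln(intensity).

    The cutoff is linear in ln(intensity), i.e., logarithmic in the raw
    intensity, and is wider at low intensity where ratios are more dispersed.
    """
    return max(T_MIN, A - B * ln_intensity)


def is_differentially_expressed(ln_ratio, ln_intensity):
    """Flag a gene whose |ln(ratio)| exceeds the cutoff at its intensity."""
    return abs(ln_ratio) > threshold(ln_intensity)


# The same ln(ratio) is flagged at high intensity but not at low intensity.
print(is_differentially_expressed(1.0, 4.0))
print(is_differentially_expressed(1.0, 8.0))
```

The point of such a curve, compared with a fixed fold change, is that a weakly expressed gene must show a larger ratio than a strongly expressed gene before it is called differentially expressed.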
It is important to point out that although dye biases
and intensity-dependent effects could be partially related, they are in essence two distinct problems. Furthermore, two types of intensity-dependent effects
need to be distinguished. One is the intensity-dependent dispersion of log-transformed ratios (i.e., a wider
dispersion of ratios at lower intensity levels), which we
observed in our data set and addressed by using the
threshold curve. The other is the intensity-dependent
deviation of log-transformed ratios from 0, i.e., the
“Nike swoop” shape (20), which we did not observe in
our data set.
Summary and Recommendations
The present analysis indicated that both dye switching and biological replication improved the reliability
of microarray results. Dye switching appears to yield
greater benefits. The selection of experimental design
is governed by scientific logic but can also be influenced
by practical issues such as the availability of materials
or resources. The results of this analysis argue against
sacrificing dye switching and biological replication for
the sake of reducing costs or experimental complexity.
Based on these analyses, we propose the following
guidelines for designing cDNA microarray experiments
when only a small, fixed number of microarrays is
available for a particular study. If the main purpose of
the experiment is to obtain estimates of the whole
population, then each array should be used to examine
a different pair of samples, with dyes reversed in half
of the pairs. If obtaining accurate measurements for
the samples examined is the main concern, then two
arrays with dye switching should be used to examine
each pair of samples. If both the generalizability and
the precision are desired, then the second design is
preferred because, compared with the first design, the
gain of precision appears quantitatively much greater
than the loss of generalizability. It is important to note
that these guidelines were developed based on physiologically oriented experiments using cDNA microarray
techniques described in the two studies analyzed (9,
21). Caution should be taken when applying these
guidelines to experiments with drastically different
characteristics or using other types of microarray techniques.
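The guidelines above can be restated as a simple allocation rule. The Python sketch below is only one possible encoding of them; the function name `plan_arrays`, the goal labels, and the "forward"/"reverse" tags are hypothetical conveniences and carry no meaning beyond this example.

```python
# Illustrative sketch only: one way to encode the proposed guidelines.
# Function name, goal labels, and labeling tags are hypothetical.


def plan_arrays(n_arrays, goal):
    """Return a list of (sample_pair, labeling) tuples, one per array.

    goal = "generalizability": each array examines a different pair of
        samples, with dyes reversed in half of the pairs.
    goal = "precision" or "both": each pair of samples is examined by two
        arrays with dye switching. With an odd number of arrays, one array
        is left unassigned by this sketch.
    """
    if goal == "generalizability":
        return [
            (pair, "forward" if pair % 2 == 1 else "reverse")
            for pair in range(1, n_arrays + 1)
        ]
    if goal in ("precision", "both"):
        plan = []
        for pair in range(1, n_arrays // 2 + 1):
            plan.append((pair, "forward"))
            plan.append((pair, "reverse"))
        return plan
    raise ValueError(f"unknown goal: {goal!r}")


print(plan_arrays(4, "generalizability"))
print(plan_arrays(4, "both"))
```

With four arrays, the first call spreads the arrays over four pairs of samples, whereas the second examines two pairs twice each with dye switching.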
We gratefully acknowledge Meredith Skelton for critical review of
the manuscript, and the Microarray Group in the Department of
dyes and their binding kinetics are needed to address
these questions.
The importance of replication in microarray experiments has been emphasized (6, 8, 12). The present
analysis showed that biological replication, even when
applied in the absence of dye switching, also appeared
to have substantial benefits. The magnitude of the
impact of biological replication depends strongly on the
level of naturally existing individual variability in each
specific experimental setting. To determine exactly
how many replicates are needed for a specific experiment, one would have to determine the variability level
of each gene of interest, the magnitude of expression
differences expected, and the statistical power desired.
Several studies have examined the “normal” variability of gene expression levels (3, 4, 10, 13), providing a
prototype of this kind of assessment.
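To make the dependence on variability, expected difference, and power concrete, the following Python sketch uses the standard normal-approximation formula for a one-sample test of whether the mean ln(ratio) of a gene differs from 0. It is an assumption-laden illustration rather than a procedure used in this study: it treats each array as contributing one independent ln(ratio) per gene, considers a single gene without any correction for multiple testing, and tends to underestimate the number needed when that number is small. The function name `arrays_needed` and the example values are hypothetical.

```python
import math
from statistics import NormalDist

# Illustrative sketch only. Assumptions: one independent ln(ratio) per array
# for the gene of interest, a two-sided one-sample test of mean ln(ratio)
# against 0, a normal approximation, and no multiple-testing correction.


def arrays_needed(sd_ln_ratio, min_ln_ratio, alpha=0.05, power=0.80):
    """Approximate number of biological replicates (arrays) for one gene.

    sd_ln_ratio  -- standard deviation of ln(ratio) across animals
    min_ln_ratio -- smallest |mean ln(ratio)| one wishes to detect
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)
    z_power = standard_normal.inv_cdf(power)
    n = ((z_alpha + z_power) * sd_ln_ratio / min_ln_ratio) ** 2
    return math.ceil(n)


# Hypothetical example: detect a 1.5-fold change at two variability levels.
print(arrays_needed(0.3, math.log(1.5)))
print(arrays_needed(0.6, math.log(1.5)))
```

Doubling the standard deviation roughly quadruples the number of arrays required, which is why the naturally existing individual variability in a given experimental setting matters so much.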
When dye switching and biological replication were
included simultaneously in the second array added
(the combination of “2-2-2”), consistency with the entire set of arrays was substantially improved to a level
similar to or slightly higher than that achieved by the
combination of “2-1-2” (i.e., dye switching without biological replication). However, the predictability was
only minimally increased compared with a single array. This was perhaps a result of the different nature of
these two indices. The consistency with the entire set of
arrays essentially reflects the generalizability of the
results, that is, the ability to extrapolate the results to
the whole population. The predictability, on the other
hand, was an index of precision, that is, the accuracy in
the measurements of the samples being examined.
Compared with a single array, adding a second array
with reverse labeling and examining a different pair of
rats enhanced the resemblance of the combination
structure with the entire set of arrays. It thereby
increased the generalizability of the results. However,
because the second array, although reverse labeled, examined a second pair of
rats, it did not improve the precision of the measurement of mRNA
levels in either pair of rats involved and did not substantially improve the predictability.
The improvement of index 2 by the inclusion of dye
switching was observed in a second data set generated
using a different cDNA labeling method (21). It is
important to keep in mind that this index assumes that
dye switching is qualitatively beneficial. Therefore, we
cannot use this index alone to draw conclusions regarding the benefit of dye switching. However, the fact that
this index performed differently for combinations 2-2-1
(without dye switching) and 2-2-2 (with dye switching)
supported the notion that dye-labeling patterns had an
effect on the results obtained using this labeling
method, which is consistent with the conclusion drawn
from the analysis of the data from Liang et al. (9).
Physiology at the Medical College of Wisconsin for helpful discussion.
DISCLOSURES
This study was supported by National Heart, Lung, and Blood
Institute Grants HL-66579, HL-54998, and HL-29587.
Editor S. R. Gullans served as the review editor for this manuscript submitted by Editor A. W. Cowley, Jr.
REFERENCES
1. Amaral SL, Liang M, Rute E, Cowley AW Jr, and Greene AS. cDNA microarray analysis of gene expression in skeletal muscle angiogenesis after chromosomal substitution in Dahl S rats (Abstract). Hypertension 40: 396, 2002.
2. Churchill GA. Fundamentals of experimental design for cDNA microarrays. Nat Genet 32: 490–495, 2002.
3. Hsiao LL, Dangond F, Yoshida T, Hong R, Jensen RV, Misra J, Dillon W, Lee KF, Clark KE, Haverty P, Weng Z, Mutter GL, Frosch MP, Macdonald ME, Milford EL, Crum CP, Bueno R, Pratt RE, Mahadevappa M, Warrington JA, Stephanopoulos G, Stephanopoulos G, and Gullans SR. A compendium of gene expression in normal human tissues. Physiol Genomics 7: 97–104, 2001. First published October 2, 2001; 10.1152/physiolgenomics.00040.2001.
4. Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, and Friend SH. Functional discovery via a compendium of expression profiles. Cell 102: 109–126, 2000.
5. Jin W, Riley RM, Wolfinger RD, White KP, Passador-Gurgel G, and Gibson G. The contributions of sex, genotype and age to transcriptional variance in Drosophila melanogaster. Nat Genet 29: 389–395, 2001.
6. Kerr MK and Churchill GA. Bootstrapping cluster analysis: assessing the reliability of conclusions from microarray experiments. Proc Natl Acad Sci USA 98: 8961–8965, 2001.
7. Kerr MK, Martin M, and Churchill GA. Analysis of variance for gene expression microarray data. J Comput Biol 7: 819–837, 2000.
8. Lee ML, Kuo FC, Whitmore GA, and Sklar J. Importance of replication in microarray gene expression studies: statistical methods and evidence from repetitive cDNA hybridizations. Proc Natl Acad Sci USA 97: 9834–9839, 2000.
9. Liang M, Yuan B, Rute E, Greene AS, Zou AP, Soares P, McQuestion GD, Slocum GR, Jacob HJ, and Cowley AW Jr. Renal medullary genes in salt-sensitive hypertension: a chromosomal substitution and cDNA microarray study. Physiol Genomics 8: 139–149, 2002. First published January 2, 2002; 10.1152/physiolgenomics.00083.2001.
10. Liang M, Yuan B, Rute E, Greene AS, Olivier M, and Cowley AW Jr. Insights into Dahl salt-sensitive hypertension revealed by temporal patterns of renal medullary gene expression. Physiol Genomics 12: 229–237, 2003. First published December 10, 2002; 10.1152/physiolgenomics.00089.2002.
11. Mutch DM, Berger A, Mansourian R, Rytz A, and Roberts MA. The limit fold change model: a practical approach for selecting differentially expressed genes from microarray data. BMC Bioinformatics 3: 17, 2002.
12. Oleksiak MF, Churchill GA, and Crawford DL. Variation in gene expression within and among natural populations. Nat Genet 32: 261–266, 2002.
13. Pritchard CC, Hsu L, Delrow J, and Nelson PS. Project normal: defining normal variance in mouse gene expression. Proc Natl Acad Sci USA 98: 13266–13271, 2001.
14. Quackenbush J. Computational analysis of microarray data. Nat Rev Genet 2: 418–427, 2001.
15. Quackenbush J. Microarray data normalization and transformation. Nat Genet 32: 496–501, 2002.
16. Schena M, Shalon D, Davis RW, and Brown PO. Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science 270: 467–470, 1995.
17. Slonim DK. From patterns to pathways: gene expression data analysis comes of age. Nat Genet 32: 502–508, 2002.
18. Wang X, Ghosh S, and Guo SW. Quantitative quality control in microarray image processing and data acquisition. Nucleic Acids Res 29: E75, 2001.
19. Yang YH and Speed T. Design issues for cDNA microarray experiments. Nat Rev Genet 3: 579–588, 2002.
20. Yang YH, Dudoit S, Luu P, and Speed TP. Normalization for cDNA Microarray Data (Technical Report no. 589). Berkeley, CA: Dept. of Statistics, Univ. of California at Berkeley, 2000.
21. Yuan B, Liang M, Yang Z, Rute E, Taylor N, Olivier M, and Cowley AW Jr. Gene expression reveals vulnerability to oxidative stress and interstitial fibrosis of the renal outer medulla to non-hypertensive elevations of angiotensin II. Am J Physiol Regul Integr Comp Physiol 284: R1219–R1230, 2003. First published January 23, 2003; 10.1152/ajpregu.00257.2002.