The human reticulocyte transcriptome

Physiol Genomics 30: 172–178, 2007.
First published April 3, 2007; doi:10.1152/physiolgenomics.00247.2006.
The human reticulocyte transcriptome
Sung-Ho Goh,1 Matthew Josleyn,1 Y. Terry Lee,1 Robert L. Danner,3
Robert B. Gherman,4 Maggie C. Cam,2 and Jeffery L. Miller1
1
Molecular Medicine Branch, 2Microarray Core Facility, National Institute of Diabetes,
Digestive and Kidney Diseases; 3Critical Care Medicine Department, Clinical Center,
National Institutes of Health; and 4National Naval Medical Center, Bethesda, Maryland
Submitted 9 November 2006; accepted in final form 26 March 2007
microarray; erythropoiesis; hemoglobin; malaria; iron
ERYTHROPOIESIS IS DEFINED by lineage commitment and proliferation of hematopoietic stem cells followed by terminal differentiation of erythroblasts into mature erythrocytes. After the
final cell division, enucleation marks the complete cessations
of erythroid transcription. For several days after enucleation,
the cells retain some RNA in their cytoplasm and enter the
circulation. Until that residual RNA is fully degraded, the cells
are referred to as reticulocytes.
Reticulocytes have been utilized for a variety of research
assays and other studies pertaining to erythroid biology (16).
As a laboratory reagent, reticulocyte lysate continues to be
useful for studies of in vitro translation (10). Reticulocytes also
provide an important reagent to study the effects of erythroid
gene manipulation in mice (7, 17). In the clinic, reticulocyte
enumerations are regularly performed in patients with anemia
or erythrocytosis. In some patients, erythroid diseases are
Article published online before print. See web site for date of publication
(http://physiolgenomics.physiology.org).
Address for reprint requests and other correspondence: J. L. Miller, Molecular Medicine Branch, National Institute of Diabetes, Digestive and Kidney
Diseases, National Institutes of Health, 10 Center Dr., Bldg. 10, Rm. 9B17,
Bethesda, MD 20892 (e-mail: [email protected]).
172
caused by dysregulated or abnormal production of proteins
during terminal differentiation. Terminal erythroid differentiation is manifest by many specialized processes including protection from oxidant damage, accumulation of hemoglobin,
cytoskeleton formation, and autophagocytosis of organelles.
After the final cell division and enucleation, reticulocytes
complete the phenotypic specialization required for transporting oxygen and survival in the absence of new gene activity.
Due to their unique position at the final stage of the erythroid
developmental pathway, reticulocytes are one of the most
functionally specialized cellular populations in humans.
The remnant mRNA contained in reticulocytes is hypothesized to encode a reservoir of information regarding in vivo
erythropoiesis as well as more general information regarding
postmitotic maturation of human cells in the absence of apoptosis. With the development of functional genomics, it has
become plausible to analyze gene expression among cellular
populations as entire transcriptomes rather than individual
transcripts. Hematology is ideally suited for this type of genome-based research based upon the relative ease with which
purified populations of hematopoietic cells may be isolated (5).
Unfortunately, the amount of information gained from the
reticulocyte transcriptome has been largely limited by the
abundance of globin gene transcripts. According to serial
analysis of gene expression among adult human reticulocytes,
⬎70% of sequenced transcript tags encoded the adult betaglobin gene even in the absence of alpha-globin transcripts (3).
Separate attempts to perform high-throughput sequencing of
reticulocyte mRNA were thwarted by the high percentage of
globin transcripts contained within reticulocyte expression libraries (J. L. Miller; unpublished observations). In this study,
oligonucleotide arrays were utilized to overcome this problem
to generate more robust profiles of the genes encoded by
reticulocyte mRNA. Reticulocyte transcriptomes at the newborn and adult stages of human development are presented and
compared.
MATERIALS AND METHODS
Preparation of reticulocyte RNA. All cells were collected according
to approved human subjects’ guidelines. The research studies were
performed at the National Institutes of Health under an exemption
from 45 CFR 46, which is in accord with the principles of Helsinki.
Acid citrate dextrose anticoagulated blood was collected from 14
healthy adult donors by venipuncture and 14 placental umbilical cords
at term delivery. All the samples were stored at 4°C for ⬍48 h prior
to RNA extraction. After the centrifugation at 200 g for 10 min, the
plasma and the buffy-coat layers were removed by aspiration. Next,
the packed red blood cells were diluted with 4 volumes of 1⫻ PBS
and filtered through two RCXL2 high-efficiency leukocyte reduction
filters connected in series (Pall). Platelets were removed by duplicate
low-speed centrifugation at 200 g at 4°C for 10 min. Special attention
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 18, 2017
Goh S-H, Josleyn M, Lee YT, Danner RL, Gherman RB,
Cam MC, Miller JL. The human reticulocyte transcriptome.
Physiol Genomics 30: 172–178, 2007. First published April 3,
2007; doi:10.1152/physiolgenomics.00247.2006.—RNA from circulating blood reticulocytes was utilized to provide a robust description
of genes transcribed at the final stages of erythroblast maturation.
After depletion of leukocytes and platelets, Affymetrix HG-U133
arrays were hybridized with probe generated from the reticulocyte
total RNA (blood obtained from 14 umbilical cords and 14 healthy
adult humans). Among the cord and adult reticulocyte profiles, 698
probe sets (488 named genes) were detected in each of the 28 samples.
Among the highly expressed genes, promoter analyses revealed a
subset of transcription factor binding motifs encoded at higher than
expected frequencies including the hypoxia-related arylhydrocarbon
receptor repressor family. Over 100 probe sets demonstrated differential expression between the cord and adult reticulocyte samples. For
verification, the array expression patterns for 21 genes were confirmed
by real-time PCR (correlation coefficient 0.98). Only four transcripts
(MAP17, FLJ32009, ARRB2, and FLJ27365) were identified as being
upregulated in the adult blood transcriptome. Further analysis revealed that the lipid-regulating protein MAP17 was present in the
membrane fraction of adult erythrocytes, but not detected in cord
blood erythrocytes. Combined with other clinical and experimental
data, these reticulocyte transcriptome profiles should be useful to
better understand the molecular bases of terminal erythroid differentiation, hemoglobin switching, iron metabolism and malarial pathogenesis.
173
RETICULOCYTE TRANSCRIPTOME
Table 1. Highly represented genes in cord and adult blood reticulocytes
Rank AB
Probe Set
Gene Title
1
2
3
4
5
6
7
8
9
10
11
1
2
8
20
4
3
5
17
19
27
18
211745_x_at
209116_x_at
204848_x_at
231078_at
200633_at
223649_s_at
200077_s_at
204466_s_at
240336_at
211560_s_at
223266_at
12
13
10
22
55705_at
221479_s_at
14
15
16
17
6
212869_x_at
226811_at
224752_at
221700_s_at
hemoglobin, alpha
hemoglobin, beta
hemoglobin, gamma A/gamma G
mitochondrial solute carrier protein, Mitoferrin
ubiquitin B
CGI-69 protein
ornithine decarboxylase antizyme 1
synuclein, alpha
hemoglobin, mu
aminolevulinate, delta-, synthase 2
amyotrophic lateral sclerosis 2 (juvenile)
chromosome region, candidate 2
hypothetical protein MGC16353
BCL2/adenovirus E1B 19 kDa interacting protein
3-like
tumor protein, translationally-controlled 1
hypothetical protein FLJ20202
Homo sapiens, clone IMAGE:5285814
ubiquitin A-52 residue ribosomal protein fusion
product 1
ferritin, light polypeptide
F-box only protein 7
nuclear receptor coactivator 4
18
19
20
42
12
25
14
213187_x_at
201178_at
210774_s_at
Gene Symbol
CB Geo Mean
AB Geo Mean
HBA
HBB
HBG1/ HBG2
MSCP (MFRN)
UBB
CGI-69
OAZ1
SNCA
HBM
ALAS2
ALS2CR2
440,019
352,016
339,713
292,261
267,557
263,117
231,649
214,052
190,356
171,838
171,426
574,210
459,172
232,207
124,047
356,773
359,287
253,334
130,634
126,459
89,844
127,352
MGC16353
BNIP3L
171,352
164,496
191,717
115,566
TPT1
FLJ20202
IMAGE:5285814
UBA52
147,376
145,293
136,496
135,863
250,092
20,949
58,932
180,789
FTL
FBXO7
NCOA4
130,600
125,467
124,879
108,991
156,995
13,373
CB, cord blood; AB: adult blood; GeoMean: geographic mean.
was made to process all samples in an equivalent manner. To check
the purification quality, complete blood counts with reticulocyte
before and after purification process (Supplemental Table S1) using
CELL-DYN 4000 (Abbott Diagnostics) were performed.1 Contaminating white blood cells were not seen on Giemsa-stained blood
smears. For RNA extraction, the purified samples were thoroughly
mixed with three volumes of TRIzol-LS (Invitrogen) and stored at
⫺80°C prior to processing. Batched samples were treated with
DNaseI to degrade residual genomic DNA, and the extracted RNA
was purified using RNeasy Mini columns (Qiagen). Prior to array
analysis, a consistent pattern of RNA molecular weight distribution
with distinct 28S and 18S rRNA peaks was detected using Agilent
Technologies 2100 Bioanalyzer (Agilent Technologies) in each of the
28 samples used for this study.
Microarray data analysis. Affymetrix HG-U133A and HG-U133B
microarray hybridizations were performed using 5 ␮g of total RNA
from each samples with one cycle of complementary RNA amplification according to the manufacturer’s protocol. After hybridization
and washing, the microarrays were scanned using MicroArray
Suite 5.0 software. The data were analyzed using Partek Pro 6.0
software (Partek) for statistical evaluation. Mean signal intensities
and the P value measurement of significance were calculated for each
probe set according to the software default settings. The probe sets
encoding highly expressed genes were compiled according to the
geometric mean values of signal intensities (Table 1 and Supplemental Table S2). In general, the genes identified in this manuscript are
defined according to Refseq reference terminology (http://www.
ncbi.nlm.nih.gov/projects/RefSeq).
The raw data for 28 samples were also imported into GeneSpring
7.2 (Agilent Technologies) according to the default settings. Genes
were scored as “present” or “absent” according to the MicroArray
Suite 5.0 software. For this study, differentially expressed genes were
identified according to the following strict criteria: 1) raw signal
intensity of the mean adult blood or cord blood ⬎1,000, 2) normalized
intensity difference between adult blood and cord blood of more than
fivefold difference, 3) P value ⬍0.0001, 4) standard deviation of
normalized intensities for each probe set within sample group
1
The online version of this article contains supplemental material.
Physiol Genomics • VOL
30 •
⬍0.0625, 5) gene distinguished as present in at least 7 of 14 adult
blood or cord blood samples according to the MicroArray Suite 5.0
software.
Other bioinformatic analyses. To analyze transcription factor binding site motif frequencies, a web-based Gene2Promoter software
application was used (http://www.genomatix.de). For this purpose, the
Affymetrix probe sets listed in Table 1 were queried to predict 67
human promoter sequences. Among those 67 sequences, 26 sequences
were identified from RefSeq gene entries or full-length open reading
frames (ORFs). The remaining 41 sequences (derived from expressed
sequence tags or comparative genomics predictions) were discarded.
Those 26 promoter sequences were searched for common transcription factor (TF) motifs (see Supplemental Table S5). The frequencies
of shared TF motifs among the 26 promoter sequences were calculated (%frequency ⫽ number of promoters containing the TF motif/
26 ⫻ 100). For comparison, the frequencies of TF motifs among all
the vertebrate promoters contained in the database were utilized. For
investigators wishing to perform analytical approaches that are not
described here, the original array data were deposited in the Gene
Expression Omnibus (http://www.ncbi.nlm.nih.gov/geo/) (accession
numbers: GSM143572–143599, GSM143671–143682, GSM143703,
GSM143706 –143716, GSM143718 –143721).
Quantitative real-time PCR. For confirmation of the microarray
results, quantitative PCR (QPCR) was performed on selected genes.
Pooled RNA samples from five adult vs. five cord blood donors were
utilized for the synthesis of cDNA. We amplified 50 ng of cDNA from
adult and cord blood reticulocytes with the 2⫻ TaqMan master mix
and 20⫻ Assay-on-Demand kit (Applied Biosystems), and the signal
was detected by ABI 7700 real-time PCR machine. Copy numbers
were derived from comparisons with the standard curve that was
obtained by the simultaneous amplification of a positive control
plasmid harboring a targeted amplicon using Sequence Detection
Systems 1.7a (Applied Biosystems). Each reaction was performed in
triplicate.
MAP17 protein expression. Protein from the membrane fraction of
erythrocytes was prepared according to published methods (21). For
the Western analysis, 15 ␮g of the erythrocyte membrane fraction was
electrophoresed, transferred to a membrane, and hybridized with
polyclonal MAP17 antibody (Novus Biologicals). The protein band
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 18, 2017
Rank CB
174
RETICULOCYTE TRANSCRIPTOME
was visualized using anti-rabbit-horseradish peroxidase secondary
antibody (Amersham, Piscataway, NJ). Human kidney lysate (ProSci)
was used as the positive control.
RESULTS
Physiol Genomics • VOL
30 •
Fig. 1. The expression levels of genes within and near the alpha- (A) and
beta-globin (B) gene loci. The raw signal intensities ⫻ 10(5) (y-axis) for probes
that hybridize to genes (defined here as genomic regions that are identified by
RefSeq-named sequences; x-axis). A full description of the genes, RefSeq ID,
Affymetrix probes, and their genomic locations is provided in Supplemental
Table S4. The expression levels are indicated according to the geometric mean
of 14 cord blood samples (CB, open bar) and 14 adult blood samples (AB,
solid bar) of microarray data.
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 18, 2017
Purification of reticulocytes. A negative selection strategy
was utilized to obtain purified populations of erythroid elements circulating in peripheral blood. Leukocytes were removed by filtration, and platelets were depleted by repeated
low-speed centrifugation. Depletion of leukocytes and platelets
was confirmed to levels near the sensitivity limits of automated
counting as shown in Supplemental Table S1. While erythroid
cells were the only population visualized on the blood smears,
contaminating white blood cells were estimated as 0.001% of
the total cellular populations. Density gradient purification was
attempted separately, but contamination by white blood cells
was more pronounced (data not shown). Despite the depletion
of leukocytes and platelets, the percentage of reticulocytes in
the purified populations remained stable. As predicted (2), a
higher percentage of reticulocytes were detected in the cord
blood samples (3.3 ⫾ 0.9% for cord blood after purification vs.
2.2 ⫾ 1.2% for adult blood after purification, t-test P value ⫽
0.01) (Supplemental Table S1). Additional purification or separation of reticulocytes into subpopulations was not performed.
Transcripts detected at high levels. To generate a subset of
genes with transcripts detected at the highest levels among
remnant mRNA from cord and adult blood reticulocytes, the
mean signals were sorted for the 44,854 probe sets contained
on the microarrays. The 27 gene transcripts detected at the
highest levels (ranked among the top 20 according to mean
signal intensity from either cord or adult profiles) were compiled as shown in Table 1. A broader list containing the 100
transcripts with the highest mean signal intensities in reticulocyte mRNA from cord vs. adult blood is provided in Supplemental Table S2. For gene transcripts hybridized with multiple
probe sets, the probe set with the highest mean intensity is
shown. A complete list of probe sets encoding the globin genes
is provided in Supplemental Table S3. The alpha1 globin gene
was detected at the highest levels in both of the adult and cord
blood groups. Other globin gene transcripts identified as being
very highly represented in the transcriptome profiles included
beta-globin, gamma-globins and mu-globin. The high level of
beta-globin mRNA in cord blood reflects the relatively advanced stage of globin gene switching in reticulocytes at the
time of birth (11). Even though the gamma-globin genes were
detected at lower levels in adult blood, their signals were still
ranked 8th among the highly expressed genes. Interestingly,
the newly discovered alpha-like globin gene, mu-globin was
ranked 9th in cord blood and 19th in adult blood. Delta-globin
was ranked 24th and 48th in adult and cord blood, respectively
(Supplemental Table S2). Eight other genes (CGI-69, OAZ1,
ALAS2, BNIP3L, FTL, MSCP, TPT1, and GYPC) known to
be expressed during terminal erythroid differentiation or play
key roles in erythroid specialization were also identified here.
Several ribosomal proteins were also noted on the broader list.
FBXO7, UBB, and UBA52 encode genes that function in
protein ubiquitination. The polyamine regulator OAZ1, the
Parkinson’s disease-associated gene SNCA, the transcription
related factor NSEP1, and three uncharacterized transcripts
were also identified.
Genomic overview of the highly expressed genes. It is well
known that the globin transcripts are expressed from two
developmentally regulated groups of genes on chromosomes
11 and 16. The very high level of gene activity within those
genes may be related to insulated or protected chromosomal
domains (12, 13). To determine whether genes in the vicinity
of the globin loci possess similarly high-level gene expression,
the signal intensities for probes that hybridize to adjacent genes
(defined here as genomic regions defined by RefSeq ID) were
examined (Fig. 1). The developmentally predicted patterns of
high-level expression were observed for the alpha- and betaglobin gene clusters in the cord and adult blood samples.
However, comparably high levels of transcripts encoding the
surrounding genes were not detected. Analyses of other highly
abundant nonglobin transcripts from Table 1 demonstrated
similarly low probe intensity levels from adjacent genes (Supplemental Table S4).
Promoter motif comparison among highly expressed genes.
Based upon the shared pattern of very high level transcripts
among those genes shown in Table 1, it was hypothesized that
175
RETICULOCYTE TRANSCRIPTOME
these genes may possess shared mechanism(s) of transcriptional regulation. As an initial study in this regard, transcription
factor binding motifs were determined for the promoter regions
of the genes shown in Table 1. As described in MATERIALS AND
METHODS, 26 promoter sequences were identified using promoter prediction software from the RefSeq or full-length ORFs
corresponding to the Table 1 gene list. The frequencies of
particular transcription factor binding motifs within these promoters (number of promoters containing the TF motif/26 ⫻
100) were compared with the frequency of those motifs within
all the vertebrate promoters contained in the database (Genomatix MatBase vertebrate TF motif frequency). 143 TF families were determined to possess motifs within the highly
expressed gene promoters. Among these, the 15 TF families
with the highest frequencies within the 26 promoters are shown
in Fig. 2 (for a complete list, see Supplemental Table S5). With
the exception of the TATA binding motif (TBPF), the most
commonly identified motifs were overrepresented within this
group of 26 promoters compared with the frequencies predicted by the software for all vertebrate promoters. Among the
TF group, both EKLF and Tal1 (EBOX) binding motifs were
included. The most overrepresented TF motif was the AHRR
consensus binding motif (73.1% of the promoters from highly
expressed genes versus vs. 36.6% of vertebrate promoters).
The AHRR (arylhydrocarbon receptor repressor) family members include AHR, a transcription factor that may be involved
in hypoxia-related signaling through binding to arylhydrocarbon nuclear translocator (8). GATA binding motif was identified in 58% of the promoters of the genes described in Fig. 1
compared with 73% of vertebrate promoters identified by the
database (Supplemental Table S6).
Comparison of the cord and adult blood transcriptomes. In
addition to those transcripts detected at very high levels,
analyses of the entire data set were also performed according to
the MAS 5.0 software-defined absent or present algorithm;
1,285 and 806 probe sets were annotated as present in all cord
blood or in adult blood samples, respectively. The probe set
IDs are provided in Supplemental Table S7 and summarized by
the Venn diagram shown in Fig. 3. In total, 698 probe sets (488
Physiol Genomics • VOL
30 •
named genes) were identified as present in all 28 samples. An
additional 587 probe sets (454 named genes) were identified in
each of the cord blood samples but not among the entire group
of adult blood samples. In contrast, 108 probe sets (98 named
genes) were identified in each of the adult blood samples but
not consistently among the cord blood samples.
Based upon the large number of differences in the profiles
determined by absent or present annotations, a more quantitative approach was explored to determine genes that may be
differentially represented by the cord and adult blood transcriptomes. For this purpose, GeneSpring 7.2 software was utilized
to compare the signal intensities for each probe set. With the
application of filtering to identify probe sets with distinct
signals between the cord and adult blood samples (see
MATERIALS AND METHODS), a list of 107 probe sets encoding 94
genes was generated from 44,854 probe sets (Supplemental
Table S8). Those probe sets were clustered into a conditional
tree by unsupervised hierarchical clustering as shown in Fig. 4.
Only four probe sets demonstrated higher level expression
Fig. 3. Venn diagram of probe sets annotated as “present” among the 14 cord
blood and/or 14 adult blood samples. The number of probe sets are 1,285 for
cord blood-all present (gray ⫹ black) and 806 for adult blood-all present
(black ⫹ white). Among them, 698 probe sets were shared between the 2
groups and present in all 28 samples (black). A detailed probe list is provided
in Supplemental Table S7.
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 18, 2017
Fig. 2. Frequencies of transcription factor (TF) binding
motifs among the promoters of the highly expressed genes.
26 promoter sequences identified for the highly expressed
genes shown in Table 1 were analyzed. The 15 most highly
represented motifs are shown (reticulocyte, black bars) vs.
all vertebrate promoters (gray bars). The frequencies of
those promoter motifs are represented as percentages on the
y-axis. The transcription factor families are shown on the
x-axis (see also Supplemental Table S5).
176
RETICULOCYTE TRANSCRIPTOME
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 18, 2017
Fig. 4. Heat map and conditional tree of 107 differentially
expressed genes between cord blood and adult blood reticulocytes generated by unsupervised clustering. Four genes were
upregulated (red) in adult blood, and the remainder were upregulated in cord blood. The normalized expression level scale
is shown at the bottom left of the heat map.
in the adult reticulocytes (219630_at-MAP17, 242957_atFLJ32009, 203388_at-ARRB2, and 238058_at-FLJ27365).
The vast majority of differentially expressed transcripts were
detected at higher levels in the cord blood samples. However,
the distinctly higher levels noted for those genes in Fig. 4 were
not detected by pangenomic comparisons. When all 44,854
probe sets were compared, the geometric means of the raw
signal intensities were 1,078 ⫾ 38 (mean ⫾ SE) for cord blood
and 1,037 ⫾ 43 (mean ⫾ SE) for adult blood. Functional
classifications of 107 differentially expressed probe sets were
performed using GoMiner (22) software to demonstrate that
Physiol Genomics • VOL
30 •
the encoded genes are associated with a wide range of cellular
and molecular functions (data not shown).
Validation of microarray data. For validation, 21 probe sets
were chosen according to the ratio of signals from cord blood
and adult blood arrays. The ratios were calculated using array
signal intensities, and these ratios were compared with the
copy number/ng cDNA ratio from real-time PCR (Fig. 5).
Three genes (ADIPOR1, CFTR, and ATP6V0C) with adult
blood/cord blood ratios between 1 and 2 demonstrated disparate results between the two measuring methods. The remaining
17 genes with array signal intensities ratios of ⬍0.5 or ⬎2
www.physiolgenomics.org
RETICULOCYTE TRANSCRIPTOME
demonstrated comparable ratios according to PCR analyses.
Among those 17 differentially expressed genes, the overall
correlation coefficient between adult blood/cord blood ratio
from microarray and real-time PCR was r2 ⫽ 0.978.
To further validate the transcriptome data at the protein level
and demonstrate the utility of this transcriptome data for
identifying novel proteins in circulating human erythrocytes, a
Western analysis of MAP17 was performed. MAP17 is a small
membrane protein thought to be involved in lipid regulation
and signal transduction (19). Membrane fractions were prepared from purified erythrocytes from six separate human
donors (3 cord, 3 adult). As shown, the increased level of
MAP17 gene expression detected by array and QPCR analyses
is also detected at the protein level. MAP17 protein was not
detected in cord blood erythrocyte membranes, but a MAP17
band was present in each of the adult erythrocyte membrane
samples (Fig. 6).
DISCUSSION
The data derived from this study provide the first genomewide comparison of the remnant mRNA contained in circulating reticulocytes at the fetal and adult stages of human development. Experimental approaches were chosen to minimize
nonerythroid RNA contamination. In addition, the use of 14
samples from each group (cord blood and adult blood) provided replicate profiles for analysis. The large percentage of
globin-encoded mRNA in reticulocytes may have reduced the
detection of genes expressed at very low levels. However,
PCR-based validations of the array-based data support the
notion that array-based patterns of gene activity provide an
accurate description of the reticulocyte transcriptome. Whether
reticulocyte transcriptome profiling represents a new clinical
tool for studying erythroid biology among individual patient
remains to be determined. Bioinformatics and analytical approaches were chosen here to furnish general descriptions of
the 1.26 million data points compiled in this initial study.
Physiol Genomics • VOL
30 •
Differences between the cord and adult reticulocyte transcriptomes were clearly seen. Interestingly, increased representation of differentially expressed genes was highly biased
toward the cord blood transcriptome. Since the methods used
for collection, storage, and processing of the 28 samples were
consistent, no obvious experimental or technical artifacts were
identified for these differences. Instead, biological differences
between cord and adult blood erythropoiesis and reticulocyte
release into the circulation may be involved. It is well recognized that cord blood contains less-mature reticulocytes than
adult blood (15). Therefore, the increased representation of
genes in cord blood reticulocytes may be related to the presence of those less-mature reticulocyte populations. Beyond
differences in the reticulocyte populations, the differential
representation transcripts may also reflect developmental
changes in erythroid gene expression (11) or mRNA stability.
Curiously, two poly(A) binding proteins (215823-x_atPABPC1, 222984_at-PAIP2) were identified as being more
abundant in cord blood.
Despite the validation of differential protein expression of
MAP17 among cord and adult human erythrocytes, the transcriptome profiles are not expected to completely and quantitatively reflect the protein content of reticulocytes. Differential
levels of mRNA and protein stability make the degree of
overlap between reticulocyte transcriptome and proteome profiles difficult to predict. For instance, the differential representation of glycophorin A among cord and adult blood profiles is
not borne out at the protein level (4). Furthermore, there are
transcripts present at very high levels in all the samples for
which the encoded proteins are difficult to detect. Perhaps the
best example of this phenomenon is portrayed by the newly
discovered mu-globin gene. Mu-globin mRNA is clearly detected in erythroid cells by array, Northern, and RT-PCR
assays. However, mu-globin protein chains or assembled hemoglobin molecules are not expressed at sufficient levels for
identification by standard chromatography techniques (5). Interestingly, mass spectroscopy techniques used for mature
erythrocyte proteome profiling recently provided the first evidence that the mu-globin protein exists in vivo (14). In these
and other examples, the data suggest that transcriptome and
proteome profiling provide complementary, rather than duplicate, descriptions of genome expression. Comparisons of transcriptome and proteome data gathered concurrently from the
same cellular samples may be necessary to most accurately
determine the level of overlap between these two approaches.
Ultimately, these transcriptome data should be integrated with
proteome and other high-throughput studies of human reticulocytes.
From the onset, the primary goal of this project was the
identification of a genetic treasure trove (available at http://
Fig. 6. Detection of MAP17 protein in adult erythrocytes. Erythrocyte membrane proteins from three cord blood (lanes 1–3) and three adult blood (lanes
4 – 6) samples were analyzed. Kidney lysate was used as a positive control (C).
Each lane was loaded with 15 ␮g of erythrocyte membrane protein lysate.
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 18, 2017
Fig. 5. Validation of microarray data using quantitative real-time PCR
(Q-PCR). We selected 21 genes for real-time PCR for comparison with the
array-based data. The y-axis shows Log (AB/CB ratio) values for array signals
(array, black bars) vs. Q-PCR copy numbers (Q-PCR, gray bars). The gene
annotations are shown on the x-axis.
177
178
RETICULOCYTE TRANSCRIPTOME
ACKNOWLEDGMENTS
The Department of Transfusion Medicine at National Institutes of Health
(NIH) is acknowledged for assistance in the collection of blood samples, as
well as members of the National Institute of Diabetes and Digestive and
Kidney Diseases (NIDDK) Microarray Core Facility for assistance.
Physiol Genomics • VOL
30 •
GRANTS
This research was supported by the Intramural Research Program of the
NIH, NIDDK.
REFERENCES
1. Aerbajinai W, Giattina M, Lee YT, Raffeld M, Miller JL. The
proapoptotic factor Nix is coexpressed with Bcl-xL during terminal
erythroid differentiation. Blood 102: 712–717, 2003.
2. Bertil G. The anemias. In: Nelson Textbook of Pediatrics (17th ed.), edited
by Nelson WE. Philadelphia, PA: Saunders, 2004, p. 1605.
3. Bonafoux B, Lejeune M, Piquemal D, Quere R, Baudet A, Assaf L,
Marti J, Aquilar-Martinez P, Commes T. Analysis of remnant reticulocyte mRNA reveals new genes and antisense transcripts expressed in the
human erythroid lineage. Hematologica 89: 1434 –1440, 2004.
4. Gahmberg CG. External labeling of human erythrocyte glycoproteins.
J Biol Chem 251: 510 –515, 1976.
5. Goh SH, Lee YT, Bhanu NV, Cam MC, Desper R, Martin BM,
Moharram R, Gherman RB, Miller JL. A newly discovered human
alpha-globin gene. Blood 106: 1466 –1472, 2005.
6. Hanel P, Andreani P, Graler MH. Erythrocytes store and release
sphingosine 1-phosphate in blood. FASEB J 21: 1202–1209, 2007.
7. Koury MJ, Koury ST, Kopsombut P, Kopsombut P, Bondurant MC.
In vitro maturation of nascent reticulocytes to erythrocytes. Blood 105:
2168 –2174, 2005.
8. Lee K, Burgoon LD, Lamb L. Identification and characterization of
genes susceptible to transcriptional cross-talk between the hypoxia and
dioxin signaling cascade. Chem Res Toxicol 19: 1284 –1293, 2006.
9. Nagasaka H, Chiba H, Kikuta H, Akita H, Takahashi Y, Yanai H, Hui
SP, Fuda H, Fujiwara H, Kobayashi K. Unique character and metabolism of high density lipoprotein (HDL) in fetus. Atherosclerosis 161:
215–223, 2002.
10. Nakano T, Matsushima-Hibiya Y, Yamamoto M, Enomoto S, Matsumoto Y, Totsuka Y, Watanabe M, Sugimura T, Wakabayashi K.
Purification and molecular cloning of a DNA ADP-ribosylating protein,
CARP1, from the edible clam Meretrix lamarckii. Proc Natl Sci Acad USA
103: 13652–13657, 2006.
11. Oneal PA, Gantt NM, Schwartz JD, Bhanu NV, Lee YT, Moroney
JW, Reed CH, Schechter AN, Luban NL, Miller JL. Fetal hemoglobin
silencing in humans. Blood 108: 2081–2086, 2006.
12. Orkin SH. Globin gene regulation and switching: circa 1990. Cell 63:
665– 672, 1990.
13. Palstra RJ, Tolhuis B, Splinter E, Nijmeijer R, Grosveld F, de Laat W.
The beta-globin nuclear compartment in development and erythroid differentiation. Nat Genet 35: 190 –194, 2003.
14. Pasini EM, Kirkegaard M, Mortesen P, Lutz HU, Thomas AW, Mann
M. In-depth analysis of the membrane and cytosolic proteome of red blood
cells. Blood 108: 791– 801, 2006.
15. Paterakis GS, Lykopoulou L, Papassotiriou J, Stamulakatou A, Kattamis C, Loukopoulos D. Flow-cytometric analysis of reticulocytes in
normal cord blood. Acta Haematol 90: 182–185, 1993.
16. Rapoport SM. The Reticulocyte, Boca Raton, FL: CRC, 1986.
17. Rhodes MM, Kopsombut P, Bondurant MC, Price JO, Koury MJ.
Bcl-x(L) prevents apoptosis of late-stage erythroblasts but does not mediate
the antiapoptotic effect of erythropoietin. Blood 106: 1857–1863, 2005.
18. Sanna MG, Correia JS, Luo Y, Chuang B, Paulson LM, Nguyen B,
Deveraux QL, Ulevitch RJ. ILPIP, a novel anti-apoptotic protein that
enhances XIAP-mediated activation of JNK1 and protection against apoptosis. J Biol Chem 277: 30454 –30462, 2002.
19. Shaw GC, Cope JJ, Li L, Corson K, Hersey C, Ackermann GE,
Gwynn B, Lambert AJ, Wingert RA, Traver D, Trede NS, Barut BA,
Zhou Y, Minet E, Donovan A, Brownlie A, Balzan R, Weiss MJ,
Peters LL, Kaplan J, Zon LI, Paw BH. Mitoferrin is essential for
erythroid iron assimilation. Nature 440: 96 –100, 2006.
20. Silver DL, Wang N, Vogel S. Identification of small PDZK1-associated
protein, DD96/MAP17, as a regulator of PDZK1 and plasma high density
lipoprotein levels. J Biol Chem 278: 28528 –28532, 2003.
21. Takeuchi M, Miyamoto H, Sako Y, Komizu H, Kusumi H. Structure of
the erythrocyte membrane skeleton as observed by atomic force microscopy. Biophys J 74: 2171–2183, 1998.
22. Zeeberg BR, Feng W, Wang G, Wang MD, Fojo AT, Sunshine M,
Narasimhan S, Kane DW, Reinhold WC, Lababidi S, Bussey KJ, Riss
J, Barrett JC, Weinstein JN. GoMiner: a resource for biological interpretation of genomic and proteomic data. Genome Biol 4: R28. 2003.
www.physiolgenomics.org
Downloaded from http://physiolgenomics.physiology.org/ by 10.220.33.4 on June 18, 2017
www.ncbi.nlm.nih.gov/geo/) that might be used by the global
community to better understand the molecular bases of terminal erythroid differentiation, hemoglobin switching, iron metabolism, and malarial pathogenesis. It is predicted that the
gene expression patterns that underlie chromatin condensation,
enucleation, autophagocytosis, mitochondrial integrity, and
membrane specialization are contained within the transcriptome profiles. Several nonglobin transcripts were detected at
very high levels in reticulocytes from cord and adult blood
including ALAS2, ferritin, and glycophorin C. ALS2CR2 (also
known as ILPIP) is an apoptosis-related gene identified within
this select group of highly expressed genes (18). High-level
expression of another apoptosis related gene BNIP3L (Nix)
was also detected. Nix expression in erythroblast mitochondria
was reported previously (1). Another mitochondrial gene identified among the highly represented transcripts is MSCP (mitochondrial solute carrier protein, also known as mitoferrin).
Very recently, mitoferrin was determined to be essential for
erythropoiesis in the zebrafish model system (19). The presence of mitoferrin transcripts present at such high levels in
human reticulocytes suggests the importance of this gene for
erythropoiesis may be conserved in humans. The promoter
allocation of particular TF motifs including that of EKLF (Fig.
2) also suggests that these genes may also share a common
mode(s) of transcriptional regulation. The relative overrepresentation of AHRR binding motifs in the promoter regions of
these genes is of particular interest. Since hypoxia is of fundamental importance for the regulation of erythropoiesis, further exploration of the AHRR transcription factor in erythroid
cells is warranted.
Preliminary studies of MAP17 RNA and protein expression
were provided as an example of how this large body of novel
information may be useful for future studies of erythroid
biology. Despite decades of research regarding the composition of erythrocyte membranes, MAP17 expression in
erythrocytes was not reported. Remarkably, MAP17 is the
first erythrocyte membrane protein with a fetal-to-adult
pattern of increased expression in humans. Originally, MAP17
was identified as a small protein associated with PDZK1, and
it is thought to be involved in the regulation of plasma
high-density lipoprotein levels (20). The developmentally increased MAP17 expression in adult erythroid cells is an especially interesting finding, since the human fetal-to-adult transition is also associated with a major shift in plasma lipid levels
in humans (9). As such, this discovery provides an intriguing
clue that erythrocytes may be involved in plasma lipid regulation. Erythrocytes were also very recently shown to regulate
levels of sphingosine 1-phosphates in blood (6). Thus, our
discovery of regulated MAP17 expression during human ontogeny provides an excellent example of how this massive
dataset may facilitate the progress of new avenues of research.
It is predicted that many other novel and unexpected discoveries will be forthcoming as this clinical dataset is integrated
with experimental model systems and other sources of information regarding erythroid biology and disease.