Scalable, High-Throughput, Single-Cell RNA-Seq

A scalable, high-throughput method for RNA-Seq analysis of thousands of single cells
Kevin Taylor¹, Lisa Watson¹, Lucas Frenz², Doug Greiner², Ronald Lebofsky², Duc Do², Pallavi Shah², Pengchin Chen², Mary Ma², Bin Zhang², Preeti Pattamatta²,
Leanne Javier², Joshua Mopas², Jennifer Chew², Sean Cater², Carolyn Reifsnyder², Felix Schlesinger¹, Irina Khrebukova¹, Jay Patel¹, Charles Lin¹, Jeff Tsai¹,
Valerie Montel¹, Mariko Kellogg¹, Andrej Hartnett¹, Allison Yunghans¹, Gary P. Schroth¹, Jeremy Agresti²
¹ Illumina, Inc.
² Digital Biology Center, Bio-Rad Laboratories, Inc.
Abstract
Complex biological systems are fundamentally determined by
the coordinated functions of individual cells. The transcriptional
heterogeneity that drives this complexity is often masked by
conventional technologies that provide only bulk transcriptome
data. Although RNA-Seq has enabled high-dimensional gene
expression analysis, generating thousands of single-cell NGS
libraries in an affordable, high-throughput, and user-friendly
manner remains a challenge. To truly deliver on the promise of
single-cell biology, a robust technology is required that enables
controlled experiments with multiple samples, treatment
conditions, and time points.
Here, we present the Illumina Bio-Rad Single-Cell Sequencing
Solution. This new platform pairs Bio-Rad’s Droplet Digital™
Technology with Illumina NGS library preparation and analysis
technology to provide a comprehensive workflow for single-cell
analysis. Single cells are individually partitioned into
subnanoliter droplets using a disposable cartridge on the
one-touch ddSEQ™ Single-Cell Isolator. The cartridge can
accommodate multiple samples, and multiple cartridges can
be processed in parallel to isolate tens of thousands of cells in
a matter of minutes. Cell lysis and cell barcoding occur inside
individual droplets, and single-cell-barcoded RNA-Seq libraries
are subsequently prepared using Nextera® Technology. Data
analysis is conducted via BaseSpace Sequence Hub®, the
Illumina cloud-based genomics computing environment.
This droplet-based method is agnostic to mammalian cell
size, enabling unbiased profiling of diverse cell populations.
Additionally, because the time from culture to lysis is on the
order of a few minutes, transcriptional signatures are not
affected by lengthy experimental workflows, allowing acute
transcriptional responses to be detected and tracked over a time
course. This combination of a cost-effective, simple, and
fast workflow reveals new types of single-cell information
by allowing users to analyze multiple samples in
parallel, under multiple treatment conditions and at multiple
time points. We demonstrate reproducible interrogation
of single-cell transcriptomes from multiple cell types. This
scalable, robust single-cell NGS sample prep methodology will
enable more researchers to apply the sensitivity and precision
of RNA-Seq to questions in single-cell biology.
Illumina Bio-Rad Single-Cell Sequencing Solution
Fig. 1. Sample-to-answer workflow. The workflow combines proven cell isolation using Droplet Digital technology on the ddSEQ Single-Cell Isolator, the SureCell™ WTA 3’
Library Prep Kit with Nextera technology, Illumina sequencing, and BaseSpace NGS analysis.
Overview of 3’ RNA-Seq Assay
[Figure 2 diagram not reproduced. Steps shown: second strand synthesis; direct cDNA Nextera tagmentation; 3’ enrichment and sample indexing; sequencing-ready fragment. Labeled elements: TTTT, Bar Code, Read 1, Read 2, P5, P7, Index.]
Fig. 2. Overview of the 3’ RNA-Seq assay using the SureCell WTA 3’ Library Prep Kit. Lysis and cell barcoding take place in each droplet. Droplets are then disrupted and the cDNA is pooled
for second strand synthesis in bulk. Libraries are generated by direct cDNA tagmentation followed by 3’ enrichment and sample indexing.
Methods
Unless otherwise noted, HEK293 and NIH3T3 cells were mixed at a 1:1 ratio and loaded across four sample chambers of a single
ddSEQ M Cartridge, then encapsulated and barcoded on the ddSEQ Single-Cell Isolator. Barcoded transcripts were processed for single-cell
sequencing using the SureCell™ WTA 3’ Library Prep Kit for the ddSEQ System and sequenced on the Illumina NextSeq® 550
sequencer. Sequencing results were analyzed using the SureCell™ RNA Single-Cell App.
Detection of Genes in a Heterogeneous Population of Cells
RNA-Seq Analysis of 1,384 Cells Using BaseSpace Single Cell App
Cells: 1,384
Doublets: 5.8%
Purity: 99.1%
[Figure 3A plots not reproduced. Left panel: number of unique transcripts aligned to hg19 genes (x-axis) versus number of unique transcripts aligned to mm10 genes (y-axis), both from 0 to 50,000. Middle panel: cumulative fraction of genic UMI count (y-axis, 0.0 to 1.0) versus cells in descending order by genic UMI count (x-axis, 0 to 8,000). Right panel: the same axis labels on a log scale (tick labels 1e+0 to 1e+5 and 1e+0 to 1e+6).]
Fig. 3A. Two-species cell mixture demonstrates low crosstalk and high purity. Number of unique transcripts aligned to the mouse (red) and human (blue) genomes for each
cell barcode. Cell barcodes with unique transcripts mapping to both human and mouse are shown in purple and represent doublets (left panel). Cumulative fraction of unique transcripts assigned
to cell barcodes in linear scale (middle panel) and log scale (right panel). The inflection point (knee) is used to determine the number of barcoded cells detected in the run.
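The two ideas in this caption, calling cells at the knee of the cumulative curve and labeling barcodes by species, can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the function names, the 0.9 purity threshold, the "farthest from the diagonal" knee heuristic, and the definitions of doublet rate and purity are not taken from the SureCell RNA Single-Cell App, whose actual algorithm is not described here.

```python
from typing import Dict, Iterable, List, Tuple


def knee_cell_count(umi_counts: Iterable[int]) -> int:
    """Estimate the number of cell barcodes at the knee of the cumulative curve.

    Heuristic (an assumption, not the app's method): sort barcodes by UMI count,
    build the cumulative fraction curve, and pick the point that lies farthest
    above the diagonal joining (0, 0) and (1, 1).
    """
    counts = sorted(umi_counts, reverse=True)
    total = sum(counts)
    n = len(counts)
    if n < 3 or total == 0:
        return n
    best_index, best_gap, running = 0, -1.0, 0
    for i, count in enumerate(counts):
        running += count
        gap = running / total - (i + 1) / n  # height above the diagonal
        if gap > best_gap:
            best_gap, best_index = gap, i
    return best_index + 1


def classify_barcode(human_umis: int, mouse_umis: int, min_purity: float = 0.9) -> str:
    """Label one barcode as 'human', 'mouse', 'doublet', or 'empty'."""
    total = human_umis + mouse_umis
    if total == 0:
        return "empty"
    if human_umis / total >= min_purity:
        return "human"
    if mouse_umis / total >= min_purity:
        return "mouse"
    return "doublet"  # substantial counts from both species


def summarize_mixture(pairs: List[Tuple[int, int]], min_purity: float = 0.9) -> Dict[str, float]:
    """Summarize (human_umis, mouse_umis) pairs for the called cell barcodes."""
    labels = [classify_barcode(h, m, min_purity) for h, m in pairs]
    called = [label for label in labels if label != "empty"]
    singles = [p for p, label in zip(pairs, labels) if label in ("human", "mouse")]
    on_species = sum(max(h, m) for h, m in singles)
    all_umis = sum(h + m for h, m in singles)
    return {
        "cells": len(called),
        "doublet_rate": called.count("doublet") / len(called) if called else 0.0,
        # Purity here: share of UMIs matching the assigned species (assumed definition).
        "purity": on_species / all_umis if all_umis else 0.0,
    }
```

In practice the knee would be found first, and only the barcodes above it would be passed to `summarize_mixture`.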
[Figure 3B plots not reproduced. Three PCA panels with axes PC1 (66.22%) and PC2 (2.62%): HEK293 / NIH3T3 (human/mouse), hg19 RPL13, and mm10 Rpl13.]
Fig. 3B. PCA clustering of a 1:1 mixture of mouse and human cells detects distinct populations. (Left) PCA of 1,384 cells from a 1:1 mixture of HEK293 and
NIH3T3 cells using the Illumina BaseSpace single-cell application. Cells are color-coded by expression of the human RPL13 gene (middle) or the mouse Rpl13 gene (right).
[Figure 3C plots not reproduced. Three t-SNE panels with axes tSNE axis 1 and tSNE axis 2: HEK293 / NIH3T3 (human/mouse), hg19 RPL13, and mm10 Rpl13.]
Total cells: 602
mm10 cells: 558
hg19 cells: 43
Sub-population: 7%
Fig. 3C. t-SNE analysis identifies a sub-population in a heterogeneous cell mixture. Mouse cells were spiked with human cells. The human cells (representing 7%
of the total cell population) are identified as a distinct cluster in t-SNE analysis based on gene expression profile (left). Cells color-coded by expression of the human RPL13
gene (middle) or the mouse Rpl13 gene (right) confirm the identity of the sub-population.
Cell Cycle Analysis of Single Cells
[Figure 4 heat maps not reproduced. Individual cells are labeled by cell barcode sequence and grouped under the cell cycle phase labels G1/S, S, G2/M, M, and M/G1. Expression color scale: Low, Average, High.]
Fig. 4. Heat map of cell cycle state based on counts of unique cell cycle transcripts. Cell cycle state is based on the unique transcript counts of the genes associated with each cell cycle phase,
normalized by the total count for each cell, for a mixture of HEK293 and NIH3T3 cell lines. Expression is centered by the median and scaled by the median absolute deviation for
each cell cycle phase.
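The normalization described in this caption can be sketched as follows. This is only an illustration under stated assumptions: the function name, the input layout, and the placeholder gene names are hypothetical, and the phase-specific gene sets actually used for the figure are not listed in this document.

```python
from statistics import median
from typing import Dict, List


def cell_cycle_scores(
    cell_counts: Dict[str, Dict[str, int]],
    phase_genes: Dict[str, List[str]],
) -> Dict[str, Dict[str, float]]:
    """Score each cell for each cell cycle phase.

    cell_counts maps cell barcode -> {gene: unique transcript count}.
    phase_genes maps phase name -> list of genes (hypothetical gene sets).
    """
    # Step 1: per cell, sum the phase genes and divide by the cell's total count.
    raw: Dict[str, Dict[str, float]] = {}
    for cell, counts in cell_counts.items():
        total = sum(counts.values())
        raw[cell] = {
            phase: (sum(counts.get(g, 0) for g in genes) / total if total else 0.0)
            for phase, genes in phase_genes.items()
        }

    # Step 2: per phase, center by the median and scale by the median
    # absolute deviation (MAD) across all cells.
    scaled: Dict[str, Dict[str, float]] = {cell: {} for cell in raw}
    for phase in phase_genes:
        values = [raw[cell][phase] for cell in raw]
        center = median(values)
        mad = median([abs(v - center) for v in values])
        for cell in raw:
            # A zero MAD means no spread; report 0.0 rather than divide by zero.
            scaled[cell][phase] = (raw[cell][phase] - center) / mad if mad else 0.0
    return scaled
```

A cell with a large positive score for one phase expresses that phase's genes well above the population median, which is what the High end of the color scale would represent under these assumptions.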
Sensitivity of Gene Detection Across Varied Cell Lines
Sensitivity of gene detection across varied cell lines and at varied read depths
[Figure 5A plots not reproduced. Two panels, each showing Rep 1 and Rep 2: "NIH3T3 Genes vs Reads Per Cell" (y-axis: mouse genes detected, 0 to 7,000) and "HEK293 Genes vs Reads Per Cell" (y-axis: human genes detected, 0 to 7,000). x-axis in both panels: Reads Per Cell (x1000), 0 to 250.]
Fig. 5A. Replicate samples processed on a single cartridge were sequenced on the NextSeq 550, and the sequencing reads were sub-sampled to depths
ranging from 25,000 to 200,000 reads per cell. The median number of genes detected per cell is plotted at each sequencing depth.
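A toy version of this sub-sampling analysis is sketched below. It assumes each read has already been assigned to a cell and a gene, and it samples a fixed number of reads from every cell; the function name and input layout are hypothetical, and the actual procedure (for example, sub-sampling raw reads before alignment so that the average depth per cell reaches the target) may well differ.

```python
import random
from statistics import median
from typing import Dict, List


def median_genes_at_depth(
    reads_by_cell: Dict[str, List[str]],
    reads_per_cell: int,
    seed: int = 0,
) -> float:
    """Median number of genes detected per cell after sub-sampling.

    reads_by_cell maps cell barcode -> list of gene names, one entry per read.
    Cells with fewer reads than reads_per_cell keep all of their reads.
    """
    rng = random.Random(seed)  # fixed seed so the sub-sample is repeatable
    genes_detected = []
    for reads in reads_by_cell.values():
        k = min(reads_per_cell, len(reads))
        sampled = rng.sample(reads, k)  # sample reads without replacement
        genes_detected.append(len(set(sampled)))
    return median(genes_detected)
```

Calling the function over a range of depths (for example 25,000 to 200,000) and plotting the results would give a saturation curve of the kind described in the caption.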
[Figure 5B plot not reproduced. Scatter plot of Gene count - Rep 1 (x-axis) versus Gene count - Rep 2 (y-axis), both axes on a log scale from 10^1 to 10^6. Annotations: y = -0.0015 + 0.98x; R2: 0.975; NumGenes: 10.911.]
Fig. 5B. Reproducibility of gene expression for two replicates. A linear regression
fit of gene counts (for genes with ≥ 50 counts), summed across all HEK293 cells
from two samples processed on a single chip, shows high reproducibility.
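The replicate comparison can be sketched with an ordinary least-squares fit. Two choices below are assumptions rather than facts from the figure: the fit is done on log10-transformed counts (suggested only by the log-scaled axes), and the ≥ 50 count filter is applied to both replicates. The function name and input layout are also hypothetical.

```python
import math
from typing import Dict, Tuple


def replicate_fit(
    rep1: Dict[str, int],
    rep2: Dict[str, int],
    min_count: int = 50,
) -> Tuple[float, float, float, int]:
    """Fit log10(rep2) = intercept + slope * log10(rep1) over shared genes.

    rep1 and rep2 map gene -> count summed across all cells of a replicate.
    Returns (slope, intercept, r_squared, number_of_genes_used).
    """
    genes = [
        g for g in rep1
        if g in rep2 and rep1[g] >= min_count and rep2[g] >= min_count
    ]
    n = len(genes)
    if n < 2:
        raise ValueError("need at least two genes passing the count filter")
    xs = [math.log10(rep1[g]) for g in genes]
    ys = [math.log10(rep2[g]) for g in genes]
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all replicate 1 counts are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy) if syy else 1.0
    return slope, intercept, r_squared, n
```

A slope near 1, an intercept near 0, and an R² near 1 would indicate that the two replicates report nearly identical gene counts.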
[Figure 5C plot not reproduced. Median genes detected (y-axis, 0 to 7,000) versus average cell diameter in µm (x-axis, 0 to 25) for four cell lines: A20, 3T3, HEK, and BJ.]
Fig. 5C. Sensitivity of gene detection across a panel of cells of varied
diameter. The median genes detected per cell versus the cell diameter shows that
recovery of transcripts is not limited by cell size.
Conclusions
• The Illumina Bio-Rad Single-Cell Sequencing Solution can reproducibly partition and analyze thousands of single cells from multiple cell lines in sub-nanoliter droplets, in minutes, with a simple protocol and without pre-amplification.
• Analysis of human and mouse cell line mixing experiments demonstrates the ability of this platform to distinguish cells in a heterogeneous population by their gene expression profiles.
• Robust chemistry allows a high percentage of transcripts to be assigned to single-cell barcodes in multiple cell lines.
• Transcriptional variation can be measured in single cells by analyzing changes in cell cycle gene expression.
• Gene detection sensitivity is high across a number of cell types and is not impacted by cell diameter.
Visit www.bio-rad.com/ddSEQ for more information.
For Research Use Only. Not for use in diagnostic procedures.
© 2017 Illumina, Inc. | Bio-Rad Laboratories, Inc. All rights reserved.
Pub No. 1070-2016-013
Bulletin 6883 Ver B 16-1124 1116