Detection of Fetal Subchromosomal

Papers in Press. Published February 20, 2015 as doi:10.1373/clinchem.2014.233312
The latest version is at http://hwmaint.clinchem.org/cgi/doi/10.1373/clinchem.2014.233312
Clinical Chemistry 61:4
000 – 000 (2015)
Molecular Diagnostics and Genetics
Detection of Fetal Subchromosomal Abnormalities by
Sequencing Circulating Cell-Free DNA from Maternal
Plasma
Chen Zhao,1 John Tynan,1 Mathias Ehrich,2 Gregory Hannum,1 Ron McCullough,1 Juan-Sebastian Saldivar,1
Paul Oeth,1 Dirk van den Boom,2* and Cosmin Deciu1*
BACKGROUND:
The development of sequencing-based
noninvasive prenatal testing (NIPT) has been largely focused on whole-chromosome aneuploidies (chromosomes 13, 18, 21, X, and Y). Collectively, they account
for only 30% of all live births with a chromosome abnormality. Various structural chromosome changes, such as
microdeletion/microduplication (MD) syndromes are
more common but more challenging to detect. Recently,
several publications have shown results on noninvasive
detection of MDs by deep sequencing. These approaches
demonstrated the proof of concept but are not economically feasible for large-scale clinical applications.
METHODS: We present a novel approach that uses lowcoverage whole genome sequencing (approximately
0.2⫻) to detect MDs genome wide without requiring
prior knowledge of the event’s location. We developed a
normalization method to reduce sequencing noise. We then
applied a statistical method to search for consistently increased or decreased regions. A decision tree was used to
differentiate whole-chromosome events from MDs.
RESULTS: We demonstrated via a simulation study that the
sensitivity difference between our method and the theoretical limit was ⬍5% for MDs ⱖ9 Mb. We tested the performance in a blinded study in which the MDs ranged from 3
to 40 Mb. In this study, our algorithm correctly identified
17 of 18 cases with MDs and 156 of 157 unaffected cases.
CONCLUSIONS: The limit of detection for any given MD
syndrome is constrained by 4 factors: fetal fraction, MD size,
coverage, and biological and technical variability of the
event region. Our algorithm takes these factors into account
and achieved 94.4% sensitivity and 99.4% specificity.
© 2015 American Association for Clinical Chemistry
1
Sequenom Laboratories, 2 Sequenom Inc., San Diego, CA.
* Address correspondence to: C.D. at Sequenom Laboratories, 3595 John Hopkins Ct., San
Diego, CA, 92121. Fax 858-523-8719; e-mail [email protected]. D.V. at: •••.
E-mail [email protected].
Received September 25, 2014; accepted January 30, 2015.
Previously published online at DOI: 10.1373/clinchem.2014.233312
The discovery of circulating cell-free (ccf)3 fetal DNA in
maternal plasma (1, 2 ) has greatly enhanced research in
noninvasive prenatal testing (NIPT). In recent years,
several studies have demonstrated the performance of
fetal aneuploidy detection using massively parallel sequencing (MPS) (3– 8 ). Although detection of wholechromosome aneuploidy such as trisomy 13, 18, and 21
is becoming routine practice (8 ), subchromosomal abnormality detection remains challenging. A study published by Jensen et al. (9 ) demonstrated detection of a
3-Mb deletion on chr22q11.2 that causes the DiGeorge
syndrome. The results showed a statistically significant
reduction of standardized read count (z ⬍ ⫺3) in the
DiGeorge region for the affected samples. However, to
achieve statistical significance, relatively high coverage
was required with approximately 200 million reads per
sample. A study published by Srinivasan et al. (10 )
showed that deep sequencing allowed detection of several
small fetal copy number variations (CNVs). These
researchers partitioned the genome into 1-Mb bins and
tested statistical significance for individual bins to identify a CNV. A similar publication by Yu et al. (11 ) used
locally weighted scatterplot smoothing (LOESS) for GC
correction (median counts 144 million per sample) and
they required 3 consecutive bins having 兩z兩 ⬎ 3 to identify
a CNV. A major drawback of individual bin methods is
that they tend to produce many false positives (FPs) and
false negatives (FNs) (12 ) and the rules requiring that
multiple bins be significant are study specific. Recent
work by Straver et al. demonstrated improvements over
the individual bin method by applying a within-sample
sliding window approach (12 ). Their method detected
aberrations down to 20 Mb at relatively shallow sequencing coverage (0.15–1.66x). However, an attempt to further increase detection resolution resulted in an increased
number of FPs. Another recent study by Chen et al.
proposed a binary segmentation and dynamic threshold
3
Nonstandardabbreviations:ccf,circulatingcell-free;NIPT,noninvasiveprenataltesting;MPS,massively parallel sequencing; CNV, copy number variation; LOESS, locally weighted scatterplot
smoothing; FP, false positive; FN, false negative; MDs, microdeletions or microduplications; PCA,
principal component analysis; CBS, circular binary segmentation; CGH, comparative genomic hybridization; LOR, log odds ratio; NPP, nonpregnant female plasma; gDNA, genomic DNA; FISH,
fluorescence in situ hybridization; TP, true positive; LoD, limit of detection.
1
Copyright (C) 2015 by The American Association for Clinical Chemistry
strategy to detect CNVs ⬎10 Mb using low-coverage
sequencing data (13 ). Out of the 1311 tested cases, 3
affected samples with known G-banding karyotype results were correctly identified. Although the method
showed promising results on large CNVs, it is unclear
how it will perform on CNVs smaller than 10 Mb. Most
recently, Rampasek et al. presented a probabilistic
method for detecting fetal CNVs from maternal plasma
using a unified Hidden Markov model (14 ). With use of
deep sequencing and a trio analysis, the authors of this
study estimated that CNVs bigger than 400 kb could be
detected at 90% sensitivity if the fetal fraction is sufficiently high (13%).
In this report, we present a novel approach which
uses shallow whole-genome sequencing (approximately
0.2⫻ coverage) data to detect genome-wide microdeletions or microduplications (MDs) (also referred to in this
report as “events”) without requiring a priori knowledge
of an event’s location. Unlike previously published approaches, our focus is on improving the normalization
method to remove as much sequencing “noise” as possible upfront. Following that, a rigorous statistical method
is applied to detect consistently increased or decreased
regions in the normalized data without any individual
bin-level z-score rules. We show that our method
achieves the theoretical limit of detection for large MDs
by “spiking in” synthetic MDs in real sequencing data.
Finally, we test the performance of our algorithm in a
blinded study and demonstrate its performance.
The scripts and data files used in this paper are available at: https://sourceforge.net/projects/mddetection.
Methods
DATA NORMALIZATION
To remove sequencing bias, we developed a 2-step normalization approach. Briefly, reads were aligned to hg19,
allowing for only perfect matches within the seed sequence using Bowtie 2 (15 ). The genome was then partitioned into 50-kbp nonoverlapping bins and the raw
count for each bin was determined. After binning, regions with high variability or low mappability were excluded according to a previously established method (8 ).
To normalize the 50-kbp raw bin count, GC bias was
removed by employing a LOESS-based correction for a
sample-wise correction, similar to the one described in
Alkan et al. (16 ). Then, principal component analysis
(PCA) was used to remove higher-order artifacts for a
population-based correction (17, 18 ). To train the PCA
normalization step, we used a data matrix comprising
LOESS normalized bin counts from 1000 female pregnancies and transformed the data matrix into the principal component space. Then for a new sample, we built a
linear regression model between its LOESS normalized
bin count and the top principal components from the
2
Clinical Chemistry 61:4 (2015)
training set. The residuals of this model serve as final
normalized values for this given sample. Intuitively, the
top principal components represent noise commonly
seen in euploid samples, and therefore removing them
can effectively improve normalization. The combination
of the last 2 steps replaces the older, region-based normalization we previously published (8 ).
CIRCULAR BINARY SEGMENTATION
Our goal is to develop an algorithm for detection of MDs
regardless of their genomic coordinates. As such, the proposed algorithm needs to search for such events in a nontargeted fashion. To do so, we apply the circular binary
segmentation (CBS) method which has been extensively
used by the array comparative genomic hybridization
(CGH) community for CNV detection (19, 20 ). CBS
works by iteratively partitioning a chromosome into equal
copy number regions using the likelihood ratio statistic, and
it can pinpoint the change point precisely. Many studies
have demonstrated its superior performance over other
methods (21, 22 ). Although powerful, CBS tends to overly
partition the genome when the signal-to-noise ratio is low.
To compensate for oversegmentation we further applied a
segment-merging algorithm similar to the one described by
Willenbrock and Fridlyand (22 ).
QUANTIFYING THE STATISTICAL SIGNIFICANCE OF DETECTED
EVENTS
After CBS, each chromosome was partitioned into regions of equal copy number. To quantify the statistical
significance of the detected event, we developed 2 approaches. First, for each chromosome, the segment with
the largest area was treated as a potential MD, and its
z-score, zCBS, was calculated by comparing the summed
bin count with respect to a reference set of samples in the
same region (8 ). The whole-chromosome z score, zCHR,
was also calculated in the same way. Second, we calculated the log odds ratio (LOR) to quantify the likelihood
of the event being true at the measured fetal fraction, f
(see the LOR Section 1 in the Data Supplement that
accompanies the online version of this article at http://
www.clinchem.org/content/vol61/issue4
DECISION TREE FOR GENOME-WIDE ANEUPLOIDY OR MD
DETECTION
For each chromosome, 4 statistics can be inferred:
1. The z score and LOR for the largest segment partitioned by the CBS process: zCBS and LORCBS.
2. The whole-chromosome z score and LOR: zCHR and
LORCHR.
Note that the largest segment is defined to be the
segment with the largest area (height ⫻ length) out of all
the segments on a given chromosome. For example, in
Fig. 1B, there are 2 segments, and the second segment has
Noninvasive Detection of Fetal Microdeletions
Fig. 1. Bin count profiles (50 kb) for a sample with microduplication before and after normalization.
Blue color is the diploid region and maroon color is the microduplication region. The karyogram is also shown for each trace, where the coloring
scheme follows the UCSC genome browser coloring convention. Due to bin filtering, very few bins locate at the centromere regions (red
triangles). The x-axis is the coordinate of each bin in Mb. (A), Scaled raw bin count. (B), Normalized bin count. After normalization, the expected
bin count for the diploid region is 1.
the largest area. The latter represents the most significant
finding on the chromosome of interest. On the basis of
these 4 statistics, we propose the following decision tree
method for aneuploidy or MD detection:
where C is a predefined z-score cutoff that controls the
tradeoff between sensitivity and specificity. In this report,
we set C to be 3 for chromosome 21 and 3.95 for all the
other chromosomes. The threshold was selected on the
basis of our previous study on trisomy detection, in
which more than 99.9% sensitivity and specificity were
achieved for all 3 major aneuploidies (8 ). The comparison
兩zCHR兩 ⱖ a兩zCBS兩 allows one to distinguish a whole chromosome event from a subchromosomal event; in practice, we
find that a between 0.6 and 0.8 works well. The value was
determined by assessing its impact on the detection of in
silico– created MDs at different sizes and fetal fractions. See
the online Supplemental Material for detailed descriptions
of the selection of a. Because there might be multiple MD
events on the same chromosome, the decision tree process
moves to the next-largest segment on the chromosome until
there are no further significant segments.
libraries were created as previously described by Jensen
et al. (8 ). gDNA mixture models were created by normalizing gDNA and nonpregnant female plasma (NPP)
DNA libraries to 1.6 nmol/L. A starting mix of 20%
gDNA library was created in a background of NPP library
and subsequently used to create 17.5%, 15%, 12.5%, 10,
7.5%, and 5% mixtures by serial dilution. These were then
multiplexed, clustered, and sequenced on the HiSeq2000
for 36 ⫹ 7 cycles in a 12-plex format (see online Supplemental Material for detailed protocols).
A total of 183 clinical plasma samples were collected
using Investigational Review Board–approved clinical
protocols. Samples with abnormal G-band karyotype results collected through these clinical protocols were selected for sequencing to determine detection rates for
MDs. Euploid samples were chosen on the basis of normal karyotype reports using either fluorescence in situ
hybridization (FISH) or G-banding methods. Library
preparation and sequencing were conducted similarly to
that of gDNA samples.
Because the karyotype outcome was based on FISH
or G-banding, we anticipated that some of the “euploid”
samples contained MD events. In case of putative FP results
at 12-plex sequencing, we further sequenced the library at
uniplex to confirm its validity. If the results were confirmed
by uniplex sequencing, we counted the results as true positives (TPs) instead of FPs. Previous results have described
the limitations of traditional karyotyping and also compared
it with sequencing-based results (10 ).
GENOMIC AND PLASMA DNA SAMPLE PREPARATION
Results
1. A chromosome is classified to be trisomy or monosomy if:
兩zCHR兩 ⱖ C, LORCHR ⬎ 0, and 兩zCHR兩 ⱖ a兩zCBS兩;
2. A chromosome is classified to have microdeletion/
microduplication if:
It is not a trisomy or monosomy, and 兩zCBS兩 ⱖ C,
LORCBS ⬎ 0,
We conducted a blinded clinical evaluation study to test
the performance of the proposed MD detection algorithm. The study comprised 2 parts: an analytical validation using genomic DNA (gDNA) mixtures and a clinical validation using plasma samples.
Forty-five gDNA samples (130 ␮L of 30 ng/␮L)
were processed for library preparation, and sequencing
NORMALIZATION
Sequencing bias can cause nonuniform coverage in data,
which complicates downstream CNV analysis (23, 24 ).
Fig. 1 shows the normalization result on a 12-plex sequencing sample. Without normalization, many regions
would aberrantly appear as depletions or duplications
Clinical Chemistry 61:4 (2015) 3
Fig. 2. Four factors affect the detectability of microdeletions.
The plots follow the same convention as Fig. 1. The orange line represents the median value of the respective regions. (A), Low fetal fraction
(6.5%) vs high fetal fraction (21%). (B), Small-size MD (7.5 Mb) vs large-size MD (31.75 Mb). (C), Low coverage (15 M reads) vs high coverage
(180 M reads). (D), Bin count SD differs from region to region.
(Fig. 1A). After normalization, the microduplication
event became more evident (Fig. 1B).
MD LIMIT OF DETECTION
For subchromosomal abnormality detection, it is of fundamental importance to understand the limit of detection (LoD) for MDs. Four factors affect the LoD: fetal
fraction (f), size of the event, coverage, and the biological
and technical variability of the event region. The first 3
factors are easy to understand: it is easier to detect an
event with higher fetal fraction, larger size, and higher
sequencing coverage. The fourth factor is related to the
fact that certain regions are more variable than others due
to various factors (such as GC bias, repetitive elements,
and mapping ability) and are harder to detect. Fig. 2
4
Clinical Chemistry 61:4 (2015)
illustrates the impact of the 4 factors. Through a simulation study, we observed that at 0.2⫻ coverage, the sensitivity difference between the proposed decision tree
method and the theoretical limit was ⬍5% for MDs ⱖ9
Mb (see online Supplemental Fig. 1).
ANALYTICAL VALIDATION USING GENOMIC DNA MIXTURES
A blinded gDNA study was conducted to test the performance of the proposed decision tree method on several
selected syndromes with high clinical relevance. Due to
the rarity of these syndromes, we created a gDNA mixture model system for analytical validation. Forty-five
gDNA samples from individuals with DiGeorge, Cridu-chat, Prader-Willi, Angelman, or 1p36 deletion syndromes were obtained from Coriell Cell Repositories or
Noninvasive Detection of Fetal Microdeletions
Fig. 3. Predicted sensitivity (red cycles), 95% CI (red bars), and observed sensitivity (black crosses) for Cri-du-chat, Prader-Willi/
Angelman, and DiGeorge at the observed titration levels.
The plot is not shown for 1p36 due to limited sample size (n = 1), which is detectable until 9.3% titration.
from CombiMatrix Diagnostics. The source, disease, and
karyotype results of the gDNAs are provided in online
Supplemental Table 1. In total, 17 were positive for
Prader-Willi or Angelman syndrome, 14 were positive
for Cri-du-chat, 13 were positive for DiGeorge, and 1
was positive for 1p36. All 45 gDNA samples were sequenced once from a 20%–7.5% mixture, and they were
sequenced twice at a 5% mixture. A total of 360 gDNA
mixtures were sequenced.
Fetal fraction plays a crucial role in the detectability
of any given MD. We measured the observed fetal fraction by chromosome Y on male samples to confirm the
planned titration levels. Online supplemental Fig. 2
shows that the measured values agreed well with the
planned titrations, with slight overdilution. The median
measured titration concentrations were 18.3%, 16.0%,
14.0%, 11.6%, 9.3%, 7.1%, and 4.7% respectively.
Before the gDNA unblinding, we predicted the sensitivity to detect each of the syndromes using the LoD
framework. Briefly, the genomic location and size definition of each syndrome is acquired from the DECIPHER
database (25 ). A syndrome-specific microdeletion or microduplication is created by sampling the observed titration values shown in online Supplemental Fig. 2. After
unblinding, we found that the observed sensitivity for
detection of each of these 3 syndromes fell within the
95% CI, with the exception of the 4.7% gDNA fractions
(Fig. 3). Such high concordance suggests that the LoD
framework is predictive of the actual performance. We
note that the discrepancy at 4.7% can be mainly attributed to the size difference between the database size and
the tested cases, and that such differences are most obvious at low fetal fractions. For example, the Cri-du-chat
syndrome has a size of 12.52 Mb according to the
DECIPHER database (25 ), and our testing cases ranged
from 8 to 20 Mb, which made the detection easier at low
fetal fractions.
For the 1p36 sample, we were able to detect at 9.3%
titration and above. An example is shown in online Supplemental Fig. 3. We note that this 1p36 sample has a size
of only approximately 3 Mb, which is much smaller than
its database definition of 12.83 Mb (25 ). In real life
applications, we expect the overall sensitivity to be
higher.
Within this set of gDNA mixtures each gDNA had a
combination of karyotype and microarray results confirming the presence of an MD event for 1 of the 5
syndromes queried in this study. We used this information to determine specificity for the nonaffected syndrome regions in each of the gDNA samples. There was
no false-positive result by the decision tree algorithm for
any of the 360 gDNA mixtures. An overall specificity of
100% (95% CI, 98.7%–100%) was achieved in the
gDNA study. The high specificity value is paramount for
good positive predictive values, given the low prevalence
of these conditions.
CLINICAL VALIDATION USING PLASMA SAMPLES
To further test the performance of the proposed decision
tree method in a genome-wide fashion, we conducted a
blinded study using maternal plasma DNAs with
matched karyotype. For this study, 183 samples were
analyzed using 12-plex sequencing. Fetal fractions were
measured by the same method used in our current clinical
practice (26, 27 ).
On the completion of sequencing, 1 (1 ) affected
sample and 4 (4 ) euploid samples failed QC and were
subsequently excluded from analysis. Hence, we had a
total of 178 plasma samples with a median fetal fraction
of 9.2% (range, 2.8%–24.9%). We did not impose any
QC cutoff for fetal fraction. According to the karyotype
table, the expected MD sizes ranged from 3 to 40 Mb (see
online Supplemental Table 2).
Clinical Chemistry 61:4 (2015) 5
The algorithm detected 13 of the 16 affected samples. Of the 3 supposed FNs, one was expected to show
trisomy 8, one was expected to show a deletion of chromosome 4q34, and the third one was expected to show
unbalanced translocation between chromosome 12 and
chromosome 19. Subsequent detailed review of the first
karyotype reports indicated the trisomy 8 sample was a
low-level mosaic sample (approximately 8% cells affected). Because this low level of mosaicism would not
have met the sample inclusion criteria, we excluded this
sample from the remaining analysis. Review of the second karyotype report indicated that the mother also carried the same chromosome 4q34 deletion as the fetus.
Our algorithm did successfully detect the event and correctly called it as a maternal event. Because we focused
only on de novo fetal events, this sample was also excluded from the analysis. Review of the third sample did
not reveal any complex karyotype information, and the
negative algorithm outcome was probably a result of a
low fetal fraction of 4.8%. We note that this FN sample
was still nondetectable at uniplex sequencing.
For the remaining 162 presumed euploid samples, 5
MD events were detected. Additionally, in 2 of the 16
affected samples, we detected 2 events, whereas the
karyotype report indicated only 1 event. Because the
study collection protocol provided only the karyotyping
result of amniocytes for these samples, no pure fetal tissue
was available for sequencing or array confirmation.
Therefore, to confirm the validity of our findings, we
resequenced at higher coverage (uniplex) all the putative
FP samples. Of these, 1 sample led to a different outcome
based on the higher coverage sequencing and was deemed
an FP result. The remaining 4 samples produced the
same outcome when sequenced in uniplex. The MD
events detected in these 4 samples included MDs of 2.85,
3.25, 22.15, 23.55, 25, 32.45, and 42.3 Mb. Fig. 4 illustrates the comparisons between 12-plex and uniplex results for (A) a confirmed TP sample by uniplex, (B) a
confirmed FP by uniplex, and (C) the maternal deletion
detected by both 12-plex and uniplex.
Table 1 summarizes the performance of our algorithm on 176 plasma samples (excluding the mosaic trisomy 8 and maternal chr4q34 deletion). Our algorithm
attained a 94.4% sensitivity (95% CI, 70.6%–99.7%),
and a 99.4% specificity (95% CI, 95.96%–99.97%) for
the plasma samples. The CI was calculated on the basis of
the proportion test with continuity correction (28 ). We
want to emphasize again that 4 out of 5 putative FP
samples were confirmed by uniplex sequencing and subsequently counted as TP samples. Because we did not
have pure tissue samples for further invasive arrayCGH
confirmation, the interpretation of specificity was limited
to the signal detection perspective. We note that other
researchers have also observed discrepancies between sequencing and the traditional karyotyping methods (10 ).
6
Clinical Chemistry 61:4 (2015)
Discussion
The methodology described in this report for noninvasive detection of fetal subchromosomal abnormalities relies on a simple yet fundamental assumption: namely,
that any plasma sample pertaining to a mixture of maternal DNA and fetal DNA in which the latter component
has a genomic abnormality will lead to sequencing results
similar to those of euploid samples over the entire genome except for the region where the fetus is affected.
Moreover, for the affected region, the sequencing results
will show a deviation in sequenced reads that is directly
proportional to the fetal fraction but described by the
same variability as observed in a euploid population. We
have tested this hypothesis over a set of 16 plasma samples from women carrying affected fetuses and have
proven it to be consistent with these assumptions. This
result has warranted the exploration of the performance
of this methodology over a specific set of genomic abnormalities that were experimentally created to mimic the
fetal and maternal ccf DNA mixture as present in maternal plasma.
Compared to our previous clinical validation efforts
(4, 6 – 8 ), in this study we have performed an analytical,
genome-wide validation of this bioinformatics algorithm. None of the abnormalities investigated were statistically powered individually in a classical sense. Given
the low prevalence of many of these abnormalities and
given the fact this is a genome-wide assay, it would have
been impossible to collect a sufficiently large number of
affected sample for each possible abnormality. This kind
of difficulty is not unlike the one encountered in the
development of genome-wide chromosomal array
CGH invasive prenatal diagnosis (29 ). The performance of the chromosomal microarray analysis has
been evaluated through larger cohort studies, such as
reported by Wapner et al. in 2012 (30 ). Based on the
findings of this particular study, the American College
of Obstetricians and Gynecologists and the Society for
Maternal-Fetal medicine have recently issued the recommendation that in cases where in which one or
more major fetal structural abnormalities have been
identified by ultrasonography and the prospective
mother is undergoing invasive prenatal diagnosis,
chromosomal microarray analysis can replace the need
for fetal karyotyping (31 ).
A complexity associated with the reporting of
genome-wide abnormalities is the increased prevalence of
maternal-only and maternal-and-fetal abnormalities.
The analyte queried by noninvasive testing is always a
mixture of maternal and fetal DNA and although the
maternal copy number for chromosomes 21, 13, and 18
is highly likely to be known in advance, this is not necessarily the case for other genomic abnormalities. A recently published study (32 ) demonstrated how a signifi-
Noninvasive Detection of Fetal Microdeletions
Fig. 4. Comparison between 12-plex (left panel) and uniplex (right panel) results.
The plots follow the same convention as Fig. 1. The uniplex results show significantly improved signal to noise ratio compared to the 12-plex
results. (A), A TP event detected at 12-plex and confirmed by uniplex sequencing. (B), A FP event detected at 12-plex but not by uniplex
sequencing. (C), The maternal deletion detected by both the 12-plex and uniplex sequencing.
cant fraction of the NIPT FP results for chromosome X
are due to mosaic maternal abnormalities. The solution
proposed in the aforementioned report is to process the
maternal-only component (buffy coat) for many if not all
Table 1. Contingency table for the plasma study.a
Known genome
abnormality
a
No known genome
abnormality
TP
17
1
TN
1
156
Not reported
0
1
The decision tree algorithm correctly detected 17/18 positives and 156/157 negatives. Note 4 out of 5 putative FP samples at 12-plex were subsequently confirmed
at uniplex and were counted as TP samples here.
the samples undergoing NIPT. Currently our algorithms
detect and report any genomic abnormality (of sufficient
signal-to-noise ratio), regardless of whether its origin is
fetal, maternal, or a combination. An issue common to
both the genome-wide array CGH invasive testing as well
as sequencing-based NIPT is that both have the potential
for findings of unknown significance. A conservative
approach is to limit the findings to a subset of categories of high clinical relevance. This was indeed the
strategy behind the selection of the 6 conditions we
have investigated in our study by means of sheared
gDNA ⫹ ccf DNA mixtures. This is only a temporary
and intentional limitation: going forward, wholegenome sequencing– based noninvasive testing will
have the benefit of the improved knowledge of the
human genome as derived from the findings of microarray analysis.
Clinical Chemistry 61:4 (2015) 7
Author Contributions: All authors confirmed they have contributed to
the intellectual content of this paper and have met the following 3 requirements: (a) significant contributions to the conception and design, acquisition of data, or analysis and interpretation of data; (b) drafting or revising
the article for intellectual content; and (c) final approval of the published
article.
Authors’ Disclosures or Potential Conflicts of Interest: Upon manuscript submission, all authors completed the author disclosure form. Disclosures and/or potential conflicts of interest:
Employment or Leadership: C. Zhao, Sequenom Laboratories; J.
Tynan, Sequenom; M. Ehrich, Sequenom; G. Hannum, Sequenom; R.
McCullough, Sequenom; J.-S. Saldivar, Sequenom Laboratories; P.
Oeth, Sequenom Center for Molecular Medicine; D. van den Boom,
Sequenom; C. Deciu, Sequenom Laboratories and Illumina.
Consultant or Advisory Role: C. Deciu, Sequenom Laboratories.
Stock Ownership: C. Zhao, Sequenom Laboratories; J. Tynan, Sequenom; M. Ehrich, Sequenom; G. Hannum, Sequenom; R. McCullough,
Sequenom; J.-S. Saldivar, Sequenom; P. Oeth, Sequenom; D. van den
Boom, Sequenom; C. Deciu, Sequenom Laboratories and Illumina.
Honoraria: None declared.
Research Funding: None declared.
Expert Testimony: C. Deciu, Sequenom.
Patents: C. Zhao, patent number SEQ-6068-PC; C. Deciu, patent
numbers US 8,688,388, 20130338933, 20130325360, 20130310260,
20130309666, 20130304392, 20130288244, 20130261983,
20130150253, 20130130923, 20130103320, and 20130085681.
Role of Sponsor: No sponsor was declared.
Acknowledgments: The authors thank Dongyan Liu and Zhanyang
Zhu for performing the sequencing data processing; Amin R. Mazloom
and Sung K. Kim for providing critical discussions during the project;
Sarah L. Kinnings for assistance with the visualization of the genomic
data; Jason Chibuk for interpreting the plasma sample karyotype information; and Rick Hockett for providing several of the genomic DNA
samples used in the analytical validation. We thank Graham McLennan
for his assistance in the management of the clinical samples. Finally, we
thank Margo Maeder for all her project management support.
References
1. Lo YM, Corbetta N, Chamberlain PF, Rai V, Sargent IL,
Redman CW, Wainscoat JS. Presence of fetal DNA in
maternal plasma and serum. Lancet 1997;350:485–7.
2. Lo YM, Tein MS, Lau TK, Haines CJ, Leung TN, Poon PM,
et al. Quantitative analysis of fetal DNA in maternal
plasma and serum: Implications for noninvasive prenatal diagnosis. Am J Hum Genet 1998;62:768 –75.
3. Chiu RW, Chan KC, Gao Y, Lau VY, Zheng W, Leung TY, et
al. Noninvasive prenatal diagnosis of fetal chromosomal aneuploidy by massively parallel genomic sequencing of DNA in maternal plasma. Proc Natl Acad Sci
U S A 2008;105:20458 – 63.
4. Ehrich M, Deciu C, Zwiefelhofer T, Tynan JA, Cagasan L,
Tim R, et al. Noninvasive detection of fetal trisomy 21 by
sequencing of DNA in maternal blood: A study in a clinical setting. Am J Obstet Gynecol 2011;204:205 e1–11.
5. Fan HC, Blumenfeld YJ, Chitkara U, Hudgins L, Quake
SR. Noninvasive diagnosis of fetal aneuploidy by shotgun sequencing DNA from maternal blood. Proc Natl
Acad Sci U S A 2008;105:16266 –71.
6. Palomaki GE, Deciu C, Kloza EM, Lambert-Messerlian
GM, Haddow JE, Neveux LM, et al. DNA sequencing of
maternal plasma reliably identifies trisomy 18 and trisomy 13 as well as down syndrome: An international
collaborative study. Genet Med 2012;14:296 –305.
7. Palomaki GE, Kloza EM, Lambert-Messerlian GM, Haddow JE, Neveux LM, Ehrich M, et al. DNA sequencing of
maternal plasma to detect down syndrome: An international clinical validation study. Genet Med 2011;13:
913–20.
8. Jensen TJ, Zwiefelhofer T, Tim RC, Dzakula Z, Kim SK,
Mazloom AR, et al. High-throughput massively parallel
sequencing for fetal aneuploidy detection from maternal plasma. PLoS One 2013;8:e57381.
9. Jensen TJ, Dzakula Z, Deciu C, van den Boom D, Ehrich
M. Detection of microdeletion 22q11.2 in a fetus by
next-generation sequencing of maternal plasma. Clin
Chem 2012;58:1148 –51.
10. Srinivasan A, Bianchi DW, Huang H, Sehnert AJ, Rava
RP. Noninvasive detection of fetal subchromosome abnormalities via deep sequencing of maternal plasma.
Am J Hum Genet 2013;92:167–76.
11. Yu SC, Jiang P, Choy KW, Chan KC, Won HS, Leung WC,
8
Clinical Chemistry 61:4 (2015)
12.
13.
14.
15.
16.
17.
18.
19.
20.
21.
22.
23.
et al. Noninvasive prenatal molecular karyotyping from
maternal plasma. PLoS One 2013;8:e60968.
Straver R, Sistermans EA, Holstege H, Visser A, Oudejans CB, Reinders MJ. Wisecondor: detection of fetal
aberrations from shallow sequencing maternal plasma
based on a within-sample comparison scheme. Nucleic
Acids Res 2014;42:e31.
Chen S, Lau TK, Zhang C, Xu C, Xu Z, Hu P, et al. A
method for noninvasive detection of fetal large deletions/duplications by low coverage massively parallel
sequencing. Prenatal Diagnosis 2013;33:584 –90.
Rampášek L, Arbabi A, Brudno M. Probabilistic method
for detecting copy number variation in a fetal genome
using maternal plasma sequencing. Bioinformatics
2014;30:i212– 8.
Langmead B, Salzberg SL. Fast gapped-read alignment
with bowtie 2. Nat Methods 2012;9:357–9.
Alkan C, Kidd JM, Marques-Bonet T, Aksay G, Antonacci
F, Hormozdiari F, et al. Personalized copy number and
segmental duplication maps using next-generation sequencing. Nat Genet 2009;41:1061–7.
Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects
for stratification in genome-wide association studies.
Nat Genet 2006;38:904 –9.
Leek JT, Storey JD. Capturing heterogeneity in gene
expression studies by surrogate variable analysis. PLoS
Genet 2007;3:1724 –35.
Olshen AB, Venkatraman ES, Lucito R, Wigler M. Circular binary segmentation for the analysis of array-based
DNA copy number data. Biostatistics 2004;5:557–72.
Venkatraman ES, Olshen AB. A faster circular binary
segmentation algorithm for the analysis of array cgh
data. Bioinformatics 2007;23:657– 63.
Lai WR, Johnson MD, Kucherlapati R, Park PJ. Comparative analysis of algorithms for identifying amplifications and deletions in array cgh data. Bioinformatics
2005;21:3763–70.
Willenbrock H, Fridlyand J. A comparison study: Applying segmentation to array CGH data for downstream
analyses. Bioinformatics 2005;21:4084 –91.
Dohm JC, Lottaz C, Borodina T, Himmelbauer H. Substantial biases in ultra-short read data sets from high-
24.
25.
26.
27.
28.
29.
30.
31.
32.
throughput DNA sequencing. Nucleic Acids Res 2008;
36:e105.
Benjamini Y, Speed TP. Summarizing and correcting
the GC content bias in high-throughput sequencing.
Nucleic Acids Res 2012;40:e72.
Firth HV, Richards SM, Bevan AP, Clayton S, Corpas M,
Rajan D, et al. Decipher: Database of chromosomal imbalance and phenotype in humans using ensembl resources. Am J Hum Genet 2009;84:524 –33.
Geis J, Kinnings S, Almarsri E, Wang H, Guan X, McCullough R, et al. Fetal fraction distributions in noninvasive prenatal testing (NIPT) for multiple gestations.
2014 ACMG Annual Clinical Genetics Meeting, 2014
Mar 25–29; Nashville, TN.
Kinnings S, Almarsri E, Bombard A, Deciu C, Geis J,
Guan X, et al. Factors affecting levels of circulating cellfree fetal DNA in maternal plasma and their implications for noninvasive prenatal testing. ACMG Annu
Clinical Genetics Meeting, Vol. Nashville, Tennessee.,
2014.
Newcombe RG. Interval estimation for the difference
between independent proportions: Comparison of
eleven methods. Statistics in Medicine 1998;17:873–
90.
Baldwin EL, Lee JY, Blake DM, Bunke BP, Alexander CR,
Kogan AL, et al. Enhanced detection of clinically relevant genomic imbalances using a targeted plus whole
genome oligonucleotide microarray. Genet Med 2008;
10:415–29.
Wapner RJ, Martin CL, Levy B, Ballif BC, Eng CM, Zachary JM, et al. Chromosomal microarray versus karyotyping for prenatal diagnosis. N Engl J Med 2012;367:
2175– 84.
American College of Obstetricians and Gynecologists.
The use of chromosomal microarray analysis in prenatal
diagnosis. Committee opinion no. 581. Obstet Gynecol
2013;122:1374 –7.
Wang Y, Chen Y, Tian F, Zhang J, Song Z, Wu Y, et al.
Maternal mosaicism is a significant contributor to discordant sex chromosomal aneuploidies associated with
noninvasive prenatal testing. Clinical Chemistry 2014;
60:251–9.