Supplementary Appendix S1

Theoretical approximations of three algorithms: z-score, NCV, and GWNS
1. Theoretical approximations of the iso-quality curves
We first compare the three detection methods by deriving theoretical approximations of their
iso-quality curves. The derivations are based on the following assumptions:
(1) A fetus with T21 possesses three copies of chromosome 21, whereas a normal fetus possesses
only two.
(2) The mother carries a normal karyotype.
(3) The MPSS reads are from either maternal or fetal DNA. The relative proportion of reads from
each origin depends on the relative quantities of maternal and fetal DNA.
(4) Conditioned on the origin (maternal or fetal), each MPSS read is independently sampled from
all chromosomes, and the probability of sampling each chromosome is fixed.
The statistics of MPSS reads are described below:
Suppose $N$ reads of a fixed length are randomly sampled from fetal chromosomes. Denote $t$ as the
chromosome identity of a read, $l_k$ as the effective length of chromosome $k$, and $L$ as the total
effective length of all chromosomes. Then the following equalities hold for a T21 sample:
(s1)
Denote $N_k$ as the number of reads from chromosome $k$. In normal and T21 samples, $N_{21}$ follows
a multinomial distribution with probability $p_{21}$ and $\tfrac{3}{2}p_{21}$,
respectively. When $p_{21}$ (or $\tfrac{3}{2}p_{21}$) is small
and $N$ is large, $N_{21}$ can be approximated with a Poisson distribution with rate
$\lambda_{21} = N\,p(t = 21) = N p_{21}$ (or $\tfrac{3}{2}\lambda_{21}$).
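The quality of the Poisson approximation can be checked numerically. The following is a minimal sketch, not part of the original analysis: it computes the total variation distance between a binomial count and its Poisson approximation. The read number `N = 20_000` and the probability `P21 = 0.0125` are hypothetical illustrative values, and the function names are our own.

```python
import math

def binom_logpmf(k, n, p):
    """Log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))

def poisson_logpmf(k, lam):
    """Log of the Poisson(lam) probability mass at k."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def total_variation(n, p):
    """Total variation distance between Binomial(n, p) and Poisson(n * p)."""
    lam = n * p
    abs_diff = 0.0
    poisson_mass = 0.0
    for k in range(n + 1):
        b = math.exp(binom_logpmf(k, n, p))
        q = math.exp(poisson_logpmf(k, lam))
        abs_diff += abs(b - q)
        poisson_mass += q
    # Poisson mass above n has no binomial counterpart.
    tail = max(0.0, 1.0 - poisson_mass)
    return 0.5 * (abs_diff + tail)

N = 20_000     # hypothetical total read number
P21 = 0.0125   # hypothetical chromosome 21 sampling probability

tv_normal = total_variation(N, P21)        # normal sample
tv_t21 = total_variation(N, 1.5 * P21)     # T21 sample, rate scaled by 3/2

print(f"TV distance, normal: {tv_normal:.5f}")
print(f"TV distance, T21:    {tv_t21:.5f}")
```

A small distance in both cases supports replacing the count distribution with a Poisson distribution at these parameter values.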
Maternal plasma contains both fetal and maternal DNA. Denote $f$ as the proportion of fetal
DNA in the sample. Accordingly, the total read count of chromosome 21 is
$N_{21} = N_{21}^{\text{fetal}} + N_{21}^{\text{maternal}} = f N_{21} + (1 - f) N_{21}$
(s2)
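For a normal fetus, $N_{21}^{\text{fetal}}$, $N_{21}^{\text{maternal}}$ and $N_{21}$ all follow the same Poisson distribution with rate
$\lambda_{21}$. For a T21 fetus, $N_{21}^{\text{fetal}}$ and $N_{21}^{\text{maternal}}$ follow Poisson distributions with rates
$\tfrac{3}{2}\lambda_{21}$ and $\lambda_{21}$, respectively. The cumulative distribution function of $N_{21}$
in this case is used in the power calculations below.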
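As a worked check of how the fetal fraction enters the chromosome 21 read count, the expected count for a T21 sample can be written out. This is a sketch under the assumption that fetal and maternal reads contribute in proportions $f$ and $1 - f$, with rates $\tfrac{3}{2}\lambda_{21}$ and $\lambda_{21}$ respectively; the value $f = 0.1$ is purely illustrative.

```latex
\begin{aligned}
E[N_{21} \mid \text{T21}]
  &= f \cdot \tfrac{3}{2}\lambda_{21} + (1 - f)\,\lambda_{21}
   = \left(1 + \tfrac{f}{2}\right)\lambda_{21}, \\
f = 0.1 \;&\Rightarrow\; E[N_{21} \mid \text{T21}] = 1.05\,\lambda_{21}.
\end{aligned}
```

Under this reading, a fetal fraction of 10% shifts the expected chromosome 21 count by only 5%, which is why both the fetal fraction and the total read number matter for detection.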
1.1. Iso-quality curve of z-scores
There is no simple analytic form for the iso-quality curve of z-scores. Instead we fixed the
significance level (p-value), varied the fetal DNA proportion and total read number, and calculated
the corresponding power. The parameter configurations that give rise to the same power are on the
same iso-quality curve.
For z-scores in equation 1, $z_{21} \sim N(0,1)$. The p-value of an observed $z_{21} = z_q$ is thus
obtained by plugging $z_q$ into the zero-mean, unit-variance normal distribution $N(0,1)$:
$q = P(z_{21} \ge z_q) = 1 - \Phi(z_q)$
(s3)
where $\Phi(\cdot)$ is the cumulative distribution function of $N(0,1)$. Conversely, $z_q = \Phi^{-1}(1-q)$ when the
significance level $q$ is given a priori. Given the significance level $q$, the power of z-scores is
$P(z_{21} \ge z_q \mid \text{T21})$
(s4)
In T21 samples, $N_{21}$ has the cumulative distribution function introduced above for the T21 case.
Therefore,
(s5)
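The procedure of fixing the significance level and solving for a given power can be sketched numerically. The following is a minimal sketch, not the authors' implementation. It assumes that the z-score standardizes $N_{21}$ by the null Poisson mean and standard deviation, that the T21 read count is Poisson with rate $(1 + f/2)\lambda_{21}$, and that Poisson tails are well approximated by a normal distribution. The value `P21 = 0.0125` and all function names are hypothetical.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()
P21 = 0.0125  # hypothetical chromosome 21 sampling probability

def z_power(n_reads, f, q=0.05, p21=P21):
    """Approximate power of the z-score test at significance level q."""
    lam0 = n_reads * p21                # null rate, lambda_21
    lam_t = lam0 * (1.0 + f / 2.0)      # assumed T21 rate
    z_q = STD_NORMAL.inv_cdf(1.0 - q)   # z_q = Phi^{-1}(1 - q)
    threshold = lam0 + z_q * math.sqrt(lam0)
    # Normal approximation to P(N21 >= threshold | T21).
    return 1.0 - STD_NORMAL.cdf((threshold - lam_t) / math.sqrt(lam_t))

def reads_for_power(f, power, q=0.05, p21=P21):
    """Total read number at which z_power equals the requested power."""
    z_q = STD_NORMAL.inv_cdf(1.0 - q)
    z_b = STD_NORMAL.inv_cdf(power)
    sqrt_lam0 = (z_q + z_b * math.sqrt(1.0 + f / 2.0)) / (f / 2.0)
    return sqrt_lam0 ** 2 / p21

# Points on two iso-quality curves (power 0.8 and 0.9) at q = 0.05.
for f in (0.02, 0.05, 0.10, 0.20):
    n80 = reads_for_power(f, 0.8)
    n90 = reads_for_power(f, 0.9)
    print(f"f={f:.2f}  N(power 0.8)={n80:,.0f}  N(power 0.9)={n90:,.0f}")

# Log-log slope of the power-0.8 curve between f = 0.05 and f = 0.10.
slope = (math.log(reads_for_power(0.10, 0.8)) - math.log(reads_for_power(0.05, 0.8))) \
        / (math.log(0.10) - math.log(0.05))
```

Under these assumptions the required read number scales roughly as $1/f^2$, so the iso-quality curve has a log-log slope close to -2, and the curve for the higher power lies above the curve for the lower power.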
1.2. Iso-quality curves of NCV
The ratio of read counts between the target and reference chromosomes has a more complicated
distribution since it is the ratio of two Poisson random variables. Rather than deriving the exact
form of this statistic, we treated the denominator (read count of the reference
chromosome) as a constant and used a simplified approximation.
For the NCV in equation 2, we proceeded as for equation s3: by setting the p-value significance
level to $q$, we obtained the corresponding threshold values. Accordingly, the power
of NCVs gives rise to the same formula as z-scores:
(s6)
1.3. Iso-quality curves of GWNS
We evaluated the p-value of an observed normalized score by counting the fraction of normalized
scores from all chromosomes in all control samples that exceeded the observed value $r_q$. This step
treats the normalized scores as samples from a mixture of 22 random variables $r_k$, one for each
chromosome:
$q = \frac{1}{22} \sum_{k=1}^{22} P(r_k \ge r_q \mid \text{normal control})$
(s7)
where
$P(r_k \ge r_q \mid \text{normal control}) = P\!\left(N_k \ge \frac{N l_k r_q}{L} \,\middle|\, \text{Poisson with } \lambda = N p_k\right)$.
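The mixture p-value described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes equal weights for the 22 chromosomes and $p_k = l_k / L$, and it uses rounded chromosome sizes in Mb as stand-ins for the effective lengths $l_k$, which the text does not list. The read number and all function names are hypothetical.

```python
import math

# Rounded sizes (Mb) of chromosomes 1..22, used only as stand-ins for l_k.
LENGTHS = [248, 242, 198, 190, 182, 171, 159, 145, 138, 134, 135,
           133, 114, 107, 102, 90, 83, 80, 59, 64, 47, 51]
L_TOTAL = sum(LENGTHS)

def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), summing the pmf in log space."""
    if k <= 0:
        return 1.0
    cdf_below = sum(math.exp(i * math.log(lam) - lam - math.lgamma(i + 1))
                    for i in range(k))
    return max(0.0, 1.0 - cdf_below)

def gwns_p_value(r_q, n_reads):
    """Equal-weight mixture of P(N_k >= N * l_k * r_q / L) over chromosomes."""
    total = 0.0
    for l_k in LENGTHS:
        lam_k = n_reads * l_k / L_TOTAL                      # lambda = N * p_k
        k_min = math.ceil(n_reads * l_k * r_q / L_TOTAL)     # smallest count with r_k >= r_q
        total += poisson_sf(k_min, lam_k)
    return total / len(LENGTHS)

N_READS = 50_000  # hypothetical total read number
for r in (1.00, 1.05, 1.10):
    print(f"r_q={r:.2f}  mixture p-value={gwns_p_value(r, N_READS):.4f}")
```

Because the smaller chromosomes have fewer expected reads, their normalized scores vary more, and they contribute most of the mixture's upper tail.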
Similar to the analysis on z-scores, we fixed the significance level and used the cumulative
distribution of this mixture to evaluate the corresponding normalized score $r_q$
according to the null model:
(s8)
We further evaluated the power of T21 detection according to $r_q$:
$P(r_{21} \ge r_q \mid \text{T21})$
(s9)
Here, the chromosome 21 read count $N_{21}$ is attributed to both fetal and
maternal DNA with the fetal DNA proportion $f$ and follows the T21 distribution introduced above.
Therefore,
(s10)
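The threshold and power calculation for GWNS can be sketched alongside the z-score for comparison. This is a minimal illustration under several assumptions that are ours, not the authors': normalized scores are approximated as normal with mean 1 and variance $1/\lambda_k$, the T21 read count is Poisson with rate $(1 + f/2)\lambda_{21}$, the mixture weights are equal, and rounded chromosome sizes in Mb stand in for the effective lengths. All names and parameter values are hypothetical.

```python
import math
from statistics import NormalDist

PHI = NormalDist()

# Rounded sizes (Mb) of chromosomes 1..22, used only as stand-ins for l_k.
LENGTHS = [248, 242, 198, 190, 182, 171, 159, 145, 138, 134, 135,
           133, 114, 107, 102, 90, 83, 80, 59, 64, 47, 51]
L_TOTAL = sum(LENGTHS)
CHR21 = 20  # zero-based index of chromosome 21

def rates(n_reads):
    """Null Poisson rates lambda_k = N * l_k / L for all chromosomes."""
    return [n_reads * l_k / L_TOTAL for l_k in LENGTHS]

def mixture_sf(r, lams):
    """Equal-weight mixture of P(r_k >= r), with r_k ~ Normal(1, 1 / lambda_k)."""
    return sum(1.0 - PHI.cdf((r - 1.0) * math.sqrt(lam)) for lam in lams) / len(lams)

def gwns_threshold(q, lams):
    """Bisection for the score r_q at which the mixture tail equals q."""
    lo, hi = 1.0, 1.0 + 10.0 / math.sqrt(min(lams))
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if mixture_sf(mid, lams) > q:
            lo = mid
        else:
            hi = mid
    return hi

def power_at(r_q, lam21, f):
    """Normal approximation to P(N21 >= r_q * lambda_21 | T21)."""
    lam_t = lam21 * (1.0 + f / 2.0)
    return 1.0 - PHI.cdf((r_q * lam21 - lam_t) / math.sqrt(lam_t))

N_READS, F, Q = 200_000, 0.10, 0.05   # hypothetical settings
lams = rates(N_READS)
lam21 = lams[CHR21]

r_gwns = gwns_threshold(Q, lams)
r_z = 1.0 + PHI.inv_cdf(1.0 - Q) / math.sqrt(lam21)  # z-score threshold on the same scale
power_gwns = power_at(r_gwns, lam21, F)
power_z = power_at(r_z, lam21, F)

print(f"threshold: GWNS={r_gwns:.5f}  z-score={r_z:.5f}")
print(f"power:     GWNS={power_gwns:.4f}  z-score={power_z:.4f}")
```

In this sketch the pooled null distribution is narrower than that of chromosome 21 alone, so the GWNS threshold is lower and the resulting power is higher at the same nominal significance level.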
2. Comparison of the iso-quality curves among three detection methods
Figure S1.1 displays the log-log plot of iso-quality curves of the three methods according to
theoretical approximations. We fixed the p-value threshold to 0.05 and considered two levels of
detection power: 0.8 and 0.9. The iso-quality curves of NCV are omitted because their approximation
of statistical power (equation s6) is identical to that of the z-score (equation s5).
The theoretical approximations of the iso-quality curves conform to our expectations in three
respects. First, all the iso-quality curves have negative slopes at almost all data
points, reflecting the complementary contributions of total read number and fetal DNA fraction to
the quality of prediction. The few “kinks” in the curves are attributable to numerical errors in
evaluating the inverse cumulative distribution functions. Second, for each method the iso-quality
curve for the lower power (0.8, red curves) lies below the curve for the higher power (0.9, blue
curves). This is sensible, as a higher fetal DNA proportion or total read number is required to
achieve a higher power. Third, for both statistical power levels the iso-quality curves of GWNS
(dashed lines) always lie below those of z-scores (solid lines). With the same parameter values and
significance level, GWNS yields a higher power than z-scores since the former incorporates the data
from all chromosomes and thus requires a lower threshold value for a fixed significance level.
Figure S1.1 Theoretical approximations of iso-quality curves of three detection methods for
trisomy 21 (T21): GWNS (dashed lines), z-score (solid lines), and NCV (not displayed because its
approximation coincides with that of z-scores). With a fixed significance level (p-value 0.05) and two
detection power levels (0.8 and 0.9), an iso-quality curve indicates the values of fetal DNA fraction
(x-axis) and total DNA read number (y-axis) that yield the specified power. Both
parameters are displayed on a log scale.