Appendix S1. Theoretical approximations of three algorithms: z-score, NCV, and GWNS

1. Theoretical approximations of the iso-quality curves

We first compare the three detection methods by deriving theoretical approximations of their iso-quality curves. The derivations rest on the following assumptions:
(1) A fetus with T21 carries three copies of chromosome 21, whereas a normal fetus carries two.
(2) The mother has a normal karyotype.
(3) Each MPSS read originates from either maternal or fetal DNA. The relative proportion of reads from each origin depends on the relative quantities of maternal and fetal DNA.
(4) Conditioned on its origin (maternal or fetal), each MPSS read is sampled independently from all chromosomes, and the probability of sampling each chromosome is a fixed ratio.

The statistics of MPSS reads are described below. Suppose reads of a fixed length are randomly sampled from fetal chromosomes. Denote $t$ as the chromosome identity of a read, $l_k$ as the effective length of chromosome $k$, $L = \sum_j l_j$ as the total effective length, and $p_k = P(t = k) = l_k / L$ as the probability of sampling chromosome $k$ in a normal sample. Then the following equalities hold for a T21 sample:

$$\tilde{p}_k = \frac{2\,l_k}{2L + l_{21}} \quad (k \neq 21), \qquad \tilde{p}_{21} = P(t = 21 \mid \text{T21}) = \frac{3\,l_{21}}{2L + l_{21}} \approx \frac{3}{2}\,p_{21}. \tag{s1}$$

Denote $N_k$ as the number of reads from chromosome $k$ and $N$ as the total number of reads. The read counts follow a multinomial distribution, so in normal and T21 samples $N_{21}$ is binomial with success probability $p_{21}$ and $\tilde{p}_{21}$, respectively. When $p_{21}$ (or $\tilde{p}_{21}$) is small and $N$ is large, $N_{21}$ can be approximated by a Poisson distribution with rate $\lambda_{21} = N P(t = 21) = N p_{21}$ (or $\tilde{\lambda}_{21} = N \tilde{p}_{21} \approx \frac{3}{2}\lambda_{21}$).

Maternal plasma contains both fetal and maternal DNA. Denote $f$ as the proportion of fetal DNA in the sample. Accordingly, the total read count of chromosome 21 is

$$N_{21} = f\,N_{21}^{\text{fetal}} + (1 - f)\,N_{21}^{\text{maternal}}. \tag{s2}$$

For a normal fetus, $N_{21}^{\text{fetal}}$, $N_{21}^{\text{maternal}}$, and $N_{21}$ all follow the same Poisson distribution with rate $\lambda_{21}$. For a T21 fetus, $N_{21}^{\text{fetal}}$ and $N_{21}^{\text{maternal}}$ follow Poisson distributions with rates $\tilde{\lambda}_{21}$ and $\lambda_{21}$, respectively. We denote the cumulative distribution function of $N_{21}$ in this case as $G(x) = P(N_{21} \le x \mid \text{T21})$.

1.1. Iso-quality curve of z-scores

There is no simple analytic form for the iso-quality curve of z-scores.
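Instead, we fixed the significance level (p-value), varied the fetal DNA proportion and the total read number, and calculated the corresponding power. Parameter configurations that yield the same power lie on the same iso-quality curve.

For z-scores in equation 1, $z_{21} \sim N(0,1)$ (approximately) in normal samples. Under the Poisson approximation, the mean and standard deviation of $N_{21}$ are $\mu_{21} = \lambda_{21}$ and $\sigma_{21} = \sqrt{\lambda_{21}}$, so $z_{21} = (N_{21} - \lambda_{21})/\sqrt{\lambda_{21}}$. The p-value of an observed $z_{21} = z_q$ is thus obtained by plugging $z_q$ into the zero-mean, unit-variance normal distribution $N(0,1)$:

$$q = P(z_{21} \ge z_q \mid \text{normal}) = 1 - \Phi(z_q), \tag{s3}$$

where $\Phi(\cdot)$ is the cumulative distribution function of $N(0,1)$. Conversely, $z_q = \Phi^{-1}(1 - q)$ when the significance level $q$ is given a priori. Given the significance level $q$, the power of z-scores is

$$\text{power} = P(z_{21} \ge z_q \mid \text{T21}) = P\left(N_{21} \ge \lambda_{21} + z_q\sqrt{\lambda_{21}} \mid \text{T21}\right). \tag{s4}$$

In T21 samples, $N_{21}$ (equation s2) has the cumulative distribution function $G(x) = P(N_{21} \le x \mid \text{T21})$. Therefore,

$$\text{power} = 1 - G\left(\lambda_{21} + z_q\sqrt{\lambda_{21}}\right). \tag{s5}$$

1.2. Iso-quality curves of NCV

The ratio of read counts between the target and reference chromosomes has a more complicated distribution, since it is the ratio of two Poisson random variables. Rather than deriving the exact form of this statistic, we treated the denominator (the read count of the reference chromosome, $N_{\text{ref}}$) as a constant and derived a simplified approximation. In equation 2, the ratio $R_{21} = N_{21}/N_{\text{ref}}$ then has mean $\mu_R = \lambda_{21}/N_{\text{ref}}$ and standard deviation $\sigma_R = \sqrt{\lambda_{21}}/N_{\text{ref}}$, and $\text{NCV}_{21} = (R_{21} - \mu_R)/\sigma_R \sim N(0,1)$ (approximately) in normal samples. Similar to equation s3, by setting the significance level to $q$ we obtained the threshold values $\text{NCV}_q = \Phi^{-1}(1 - q) = z_q$ and $R_q = \mu_R + z_q\,\sigma_R$. Accordingly, the power of NCVs is given by the same formula as for z-scores:

$$\text{power} = P(R_{21} \ge R_q \mid \text{T21}) = P\left(N_{21} \ge \lambda_{21} + z_q\sqrt{\lambda_{21}} \mid \text{T21}\right) = 1 - G\left(\lambda_{21} + z_q\sqrt{\lambda_{21}}\right). \tag{s6}$$

1.3. Iso-quality curves of GWNS

We evaluated the p-value of an observed normalized score $r_{21}$ by counting the fraction of normalized scores, from all chromosomes in all control samples, that exceeded the observed value $r_q$. This step treats the normalized scores as samples from an equal-weight mixture of the 22 random variables $r_k$ of the corresponding chromosomes:

$$P(r \ge r_q \mid \text{normal control}) = \frac{1}{22}\sum_{k=1}^{22} P(r_k \ge r_q \mid \text{normal control}), \tag{s7}$$

where $r_k = L N_k/(N l_k)$ is the read count of chromosome $k$ normalized by the total read number and the relative effective length ($L = \sum_j l_j$), and

$$P(r_k \ge r_q \mid \text{normal control}) = P\left(\frac{L N_k}{N l_k} \ge r_q \;\middle|\; N_k \sim \text{Poisson}(\lambda_k = N p_k)\right).$$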
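To make equation s7 concrete, the following Python sketch evaluates the null tail probability of the pooled normalized scores as an equal-weight average of per-chromosome tails. It is a sketch under stated assumptions, not the study's implementation: the effective lengths in `LENGTHS` are hypothetical placeholders (roughly the GRCh37 autosome sizes in Mb, not the effective lengths used in the study), the normalized score is assumed to be scaled as $r_k = N_k/\lambda_k$ so that it centers at 1, and, as a swapped-in simplification, each Poisson tail is approximated by a normal tail with mean and variance $\lambda_k$. All function names are illustrative.

```python
import math

# Hypothetical effective lengths (Mb) of autosomes 1..22: roughly the GRCh37
# chromosome sizes, used only as placeholders for the unknown effective lengths l_k.
LENGTHS = [249.3, 243.2, 198.0, 191.2, 180.9, 171.1, 159.1, 146.4, 141.2, 135.5, 135.0,
           133.9, 115.2, 107.3, 102.5, 90.4, 81.2, 78.1, 59.1, 63.0, 48.1, 51.3]


def poisson_rates(n_reads, lengths=LENGTHS):
    """Null rates lambda_k = N * p_k, with p_k = l_k / L."""
    total = sum(lengths)
    return [n_reads * l / total for l in lengths]


def upper_normal_tail(x):
    """1 - Phi(x), computed via erfc so it stays accurate far into the tail."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def gwns_null_tail(r_q, rates):
    """Equation s7: P(r >= r_q | normal control) as an equal-weight mixture.

    With r_k = N_k / lambda_k (assumed scaling), P(r_k >= r_q) = P(N_k >= r_q * lambda_k).
    Each Poisson tail is approximated by a normal tail with mean and variance lambda_k.
    """
    tails = [upper_normal_tail((r_q - 1.0) * math.sqrt(lam)) for lam in rates]
    return sum(tails) / len(tails)


rates = poisson_rates(1e5)
print(gwns_null_tail(1.0, rates))  # prints 0.5: every component tail equals 1 - Phi(0)
```

Because larger chromosomes have larger $\lambda_k$, their component tails shrink faster as $r_q$ grows, which is what makes the pooled tail narrower than the tail of chromosome 21 alone.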
Define the cumulative distribution function of this mixture as $H(r)$. Similar to the analysis of z-scores, we fixed the significance level $q$ and evaluated the corresponding normalized-score threshold under the null model:

$$r_q = H^{-1}(1 - q). \tag{s8}$$

We then evaluated the power of T21 detection according to $r_q$:

$$\text{power} = P(r_{21} \ge r_q \mid \text{T21}) = P\left(N_{21} \ge \frac{r_q N l_{21}}{L} \;\middle|\; \text{T21}\right) = P\left(N_{21} \ge r_q \lambda_{21} \mid \text{T21}\right). \tag{s9}$$

Here, the chromosome 21 read count is attributed to both fetal and maternal DNA with fetal DNA proportion $f$ (equation s2) and follows the distribution $G(x) = P(N_{21} \le x \mid \text{T21})$. Therefore,

$$\text{power} = 1 - G\left(r_q \lambda_{21}\right). \tag{s10}$$

2. Comparison of the iso-quality curves among three detection methods

Figure S1.1 displays a log-log plot of the iso-quality curves of the three methods according to the theoretical approximations. We fixed the p-value threshold at 0.05 and considered two levels of detection power: 0.8 and 0.9. The iso-quality curves of NCV are omitted because their approximation of statistical power (equation s6) is identical to that of the z-score (equation s5).

The theoretical approximations of the iso-quality curves conform to our expectations in the following respects. First, the iso-quality curves have negative slopes at almost all points, reflecting the complementary contributions of total read number and fetal DNA fraction to prediction quality: a lower fetal DNA fraction can be compensated by a larger total read number. The few "kinks" in the curves are attributable to errors in evaluating the inverse cumulative distribution functions. Second, for each method the iso-quality curve for the lower power (0.8, red curves) lies below the curve for the higher power (0.9, blue curves). This is sensible, as a higher fetal DNA proportion or total read number is required to achieve a higher power. Third, at both power levels the iso-quality curves of GWNS (dashed lines) always lie below those of z-scores (solid lines). With the same parameter values and significance level, GWNS yields a higher power than z-scores because it incorporates data from all chromosomes and thus requires a lower threshold value for a fixed significance level: the larger chromosomes have higher read counts and hence less variable normalized scores, so the pooled null distribution is narrower than that of chromosome 21 alone.
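The comparison above can be checked numerically with a rough Python sketch that computes the z-score power (equations s3 to s5, and hence s6 for NCV) and the GWNS power (equations s8 to s10) at the same fetal fraction, read number, and significance level. It rests on assumptions that are not part of the original derivation: hypothetical placeholder lengths in `LENGTHS` (roughly GRCh37 autosome sizes in Mb), $\tilde{\lambda}_{21} = 1.5\,\lambda_{21}$, a moment-matched normal approximation of $G$ for the weighted sum in equation s2, normal tails in place of Poisson tails, and bisection to invert the mixture CDF $H$. All function names are hypothetical.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)

# Hypothetical effective lengths (Mb) of autosomes 1..22 (placeholders, roughly GRCh37 sizes).
LENGTHS = [249.3, 243.2, 198.0, 191.2, 180.9, 171.1, 159.1, 146.4, 141.2, 135.5, 135.0,
           133.9, 115.2, 107.3, 102.5, 90.4, 81.2, 78.1, 59.1, 63.0, 48.1, 51.3]
CHR21 = 20  # zero-based position of chromosome 21 in LENGTHS


def null_rates(n_reads):
    """lambda_k = N * l_k / L for a normal karyotype."""
    total = sum(LENGTHS)
    return [n_reads * l / total for l in LENGTHS]


def t21_cdf(x, lam21, f):
    """Moment-matched normal approximation of G(x) = P(N_21 <= x | T21).

    Follows equation s2: N_21 = f * X + (1 - f) * Y with X ~ Poisson(1.5 * lam21)
    and Y ~ Poisson(lam21), using the approximation lambda~_21 = 1.5 * lambda_21.
    """
    mean = f * 1.5 * lam21 + (1.0 - f) * lam21
    var = f * f * 1.5 * lam21 + (1.0 - f) ** 2 * lam21
    return STD.cdf((x - mean) / math.sqrt(var))


def zscore_power(f, n_reads, q):
    """Equations s3-s5 (and s6 for NCV): power = 1 - G(lam21 + z_q * sqrt(lam21))."""
    lam21 = null_rates(n_reads)[CHR21]
    z_q = STD.inv_cdf(1.0 - q)
    return 1.0 - t21_cdf(lam21 + z_q * math.sqrt(lam21), lam21, f)


def gwns_threshold(q, rates):
    """Equation s8: r_q such that the mixture tail P(r >= r_q) = q (assumes q < 0.5)."""
    def tail(r):
        return sum(1.0 - STD.cdf((r - 1.0) * math.sqrt(lam)) for lam in rates) / len(rates)

    lo, hi = 1.0, 2.0  # tail(1) = 0.5 > q
    while tail(hi) > q:  # widen the bracket for very small read numbers
        hi = 1.0 + 2.0 * (hi - 1.0)
    for _ in range(100):  # bisection: the tail decreases as r grows
        mid = 0.5 * (lo + hi)
        if tail(mid) > q:
            lo = mid
        else:
            hi = mid
    return hi


def gwns_power(f, n_reads, q):
    """Equations s9-s10: power = 1 - G(r_q * lam21)."""
    rates = null_rates(n_reads)
    r_q = gwns_threshold(q, rates)
    return 1.0 - t21_cdf(r_q * rates[CHR21], rates[CHR21], f)


for f in (0.05, 0.1):
    print(f, round(zscore_power(f, 1e5, 0.05), 3), round(gwns_power(f, 1e5, 0.05), 3))
```

In this sketch, chromosome 21 is the smallest placeholder chromosome, so every other mixture component has a narrower null tail and `gwns_threshold` returns a value below the z-score threshold $1 + z_q/\sqrt{\lambda_{21}}$ on the normalized-score scale; that lower threshold is what yields the higher GWNS power.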
Figure S1.1 Theoretical approximations of the iso-quality curves of three detection methods for trisomy 21 (T21): GWNS (dashed lines), z-score (solid lines), and NCV (not displayed, as its approximation coincides with that of the z-score). With a fixed significance level (p-value 0.05) and two detection power levels (0.8 and 0.9), each iso-quality curve indicates the combinations of fetal DNA fraction (x-axis) and total read number (y-axis) that yield the specified power. Both parameters are displayed on a log scale.
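An iso-quality curve of this kind can be traced by fixing the significance level and the target power and, for each fetal DNA fraction, searching for the total read number at which the power reaches the target. The Python sketch below does this for the z-score only, using a normal approximation of equation s5 with $N_{21}$ modeled as in equation s2 and a hypothetical $p_{21}$ taken from placeholder lengths; its numbers depend on that placeholder and are not the values plotted in Figure S1.1. The function names are illustrative, and a GWNS curve could be traced the same way by substituting a GWNS power function.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal N(0, 1)
# Hypothetical p_21 = l_21 / L from placeholder lengths (48.1 Mb out of ~2881 Mb of autosomes).
P21 = 48.1 / 2881.0


def zscore_power(f, n_reads, q, p21=P21):
    """Normal approximation of equation s5, with N_21 modeled as in equation s2."""
    lam21 = n_reads * p21
    mean = f * 1.5 * lam21 + (1.0 - f) * lam21
    sd = math.sqrt(f * f * 1.5 * lam21 + (1.0 - f) ** 2 * lam21)
    threshold = lam21 + STD.inv_cdf(1.0 - q) * math.sqrt(lam21)
    return 1.0 - STD.cdf((threshold - mean) / sd)


def iso_quality_reads(f, power, q, log10_lo=2.0, log10_hi=10.0):
    """Total read number N at which the power reaches `power` for fetal fraction f.

    Bisection on log10(N); assumes power increases with N (true for this
    approximation when f > 0) and that the root lies in [10**log10_lo, 10**log10_hi].
    """
    lo, hi = log10_lo, log10_hi
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if zscore_power(f, 10.0 ** mid, q) < power:
            lo = mid
        else:
            hi = mid
    return 10.0 ** hi


# Points on two z-score iso-quality curves at a p-value threshold of 0.05.
for target in (0.8, 0.9):
    curve = [(frac, iso_quality_reads(frac, target, 0.05)) for frac in (0.02, 0.05, 0.1, 0.2)]
    print(target, [(frac, f"{n:.3g}") for frac, n in curve])
```

Searching on $\log_{10} N$ rather than $N$ keeps the bisection well conditioned across the several orders of magnitude spanned by the y-axis of the figure.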