Supplementary Information

Genomic landscape established by allelic imbalance in the cancerization field of a normal appearing airway

Yasminka A. Jakubek, Wenhua Lang, Selina Vattathil, Melinda Garcia, Li Xu, Lili Huang, Suk-Young Yoo, Li Shen, Wei Lu, Chi-Wan Chow, Zachary Weber, Gareth Davies, Jing Huang, Carmen Behrens, Neda Kalhor, Cesar Moran, Junya Fujimoto, Reza Mehran, Randa El-Zein, Stephen G. Swisher, Jing Wang, Jerry Fowler, Avrum E. Spira, Erik A. Ehli, Ignacio I. Wistuba, Paul Scheet, Humam Kadara

Supplementary Methods

Genome-wide high-density array profiling. Genomic DNA samples (200 ng) were processed using the Illumina Infinium HD assay protocol. DNA samples were denatured, isothermally amplified overnight (to minimize amplification bias) and then fragmented. Fragmented DNA samples were then hybridized to the BeadChip arrays. BeadChip images were captured using an Illumina iScan system, and raw genotyping data were generated using the Illumina GenomeStudio Genotyping (GT) Module Software. As a quality control measure, we determined the concordance between the germline genotype and SNP genotype calls for each set of patient samples. When blood cells were unavailable, DNA from white blood cells or normal lung tissue was used as the designated normal sample for each individual, from which to infer patient germline genotypes. Relatively high concordance rates (> 98%) were observed between non-tumor and germline genotypes, and > 88% concordance was observed between NSCLC tumor/CNB and germline genotypes. Importantly and in contrast, the concordance rates for all samples when compared to an incorrectly matched germline sample were less than 70%.

Quality control of event boundaries identified by hapLOH. HapLOH was applied to the entire genome, agnostic to chromosomal boundaries. Although individual AI events are naturally restricted to separate chromosomes, running the algorithm in this way allows for an orthogonal confirmation of detected events, i.e.,
when the boundary of a called AI event corresponds in genomic coordinates to the end of a chromosome or its centromere. Indeed, most of the called events with sizes on the order of a small chromosome were observed to end at chromosome boundaries, validating their existence. However, in highly aberrant genomes, AI events may reside on “adjacent” (by number) chromosomes, e.g., all of 1q and near the terminus of 2p. In this boundary-agnostic mode, a false observation of a small event next to a large one on a different chromosome can result from the algorithm slowly lowering the probability below 50% over many markers (essentially “bleeding” into adjacent markers). This is especially likely in “low cellularity” settings, since regions of AI will not look very different from normal regions. To separate this “bleeding” phenomenon from true multiple (adjacent) events, hapLOH was performed four times for each sample, each time permuting the order of the chromosomes in the genome (sampling chromosomal orderings without replacement). The data were then combined by averaging the posterior probabilities across the four experiments. This approach did not dramatically affect the number of AI events detected; however, it did improve identification of event boundaries. AI was also analyzed in the X chromosome of female patients. To do so, hapLOH was performed two times per sample, placing the X chromosome markers at the beginning and at the end of the genome, after which the data of both experiments were combined by averaging their posterior probabilities.

Classification of events as gains, losses or copy neutral LOH. AI events were classified as gain, loss or cn-LOH using the average log R ratio (LRR) and BAF deviations of the markers within each event (Supplementary Figure 4). LRR thresholds were set at +/- 0.05 for gains and losses respectively. Events with LRR between 0.1 and -0.1 and BAF deviations > 0.1 were classified as cn-LOH.
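The permute-and-average step described earlier in this section for sharpening event boundaries can be sketched in a few lines. This is only an illustration of the bookkeeping, not the hapLOH implementation: `call_posteriors` is a hypothetical stand-in for a genome-wide caller that takes one flat, ordered list of markers and returns one posterior probability per marker, and the function name, seed, and data layout are assumptions made for this example.

```python
import math
import random


def average_over_orderings(markers_by_chrom, call_posteriors, n_runs=4, seed=0):
    """Average per-marker posteriors over several distinct chromosome orderings.

    markers_by_chrom: dict mapping chromosome label -> list of per-marker inputs.
    call_posteriors:  hypothetical caller (an assumption of this sketch); it takes
                      one flat list of markers and returns one posterior per marker.
    """
    chroms = list(markers_by_chrom)
    if n_runs > math.factorial(len(chroms)):
        raise ValueError("not enough distinct chromosome orderings")

    # Draw n_runs distinct orderings (sampling orderings without replacement).
    rng = random.Random(seed)
    orderings = []
    while len(orderings) < n_runs:
        order = chroms[:]
        rng.shuffle(order)
        if order not in orderings:
            orderings.append(order)

    totals = {c: [0.0] * len(markers_by_chrom[c]) for c in chroms}
    for order in orderings:
        # Concatenate chromosomes in this order and call the whole genome at once.
        flat = [m for c in order for m in markers_by_chrom[c]]
        posteriors = call_posteriors(flat)
        # Map the flat output back to each chromosome's own marker positions.
        start = 0
        for c in order:
            size = len(markers_by_chrom[c])
            for i in range(size):
                totals[c][i] += posteriors[start + i]
            start += size

    return {c: [t / n_runs for t in totals[c]] for c in chroms}
```

Because each run places different chromosomes next to one another, any spill-over from a large event into a neighbouring chromosome is expected to appear in only some runs and to be diluted in the average.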
Airway events (n = 179) that exhibited BAF and LRR deviations that were too subtle for classification (BAF < 0.1 and LRR between 0.1 and -0.1) were deemed “undeterminable”. To classify those undeterminable events in airway samples that had a positive BAF correlation with a paired tumor event, the events between paired airway and tumor samples were compared using a 50% reciprocal overlap rule. An undeterminable airway event was inferred to be the same type as the tumor event if the airway event matched an event in the tumor/CNB. In cases where the airway sample event matched multiple tumor/CNB samples from that same patient, a majority rule among the tumor events was used to label the airway event. Using this approach, 61 subtle airway events were classified. Also, AI events were classified as focal or arm events, with an arm event spanning 90% or more of the markers on a chromosome arm (Supplementary Figure 6).

Identification of anti-correlated BAF signals as indicators of secondary mutations in airways and tumors. For individuals with multiple samples exhibiting AI at a given region, we attempted to infer whether the imbalance of alleles occurred in the same or different directions among samples. An observation of the same direction would be consistent with the same mutational event, whereas the opposite direction implies an independent (or additional) mutation. Shifts in the BAFs at heterozygous markers were modeled to determine whether two samples from the same patient showed the same pattern of AI (for example, both exhibit the pattern shown in Supplementary Figure 5C) or an opposite pattern (for example, the tumor exhibits the pattern shown in Supplementary Figure 5C and the airway shows the pattern in Supplementary Figure 5D). A quantitative assessment of these deviations was performed for each airway AI event by calculating the correlation between the heterozygous BAFs in the airway event and the BAFs for the same set of markers in the tumor event.
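A minimal sketch of this correlation is given here. It also applies the smoothing of tumor BAFs to 0.9 and 0.1 that the methods use to limit probe-bias artifacts. The function names are illustrative, the Pearson coefficient is an assumption (the text says only "correlation"), and dropping markers whose tumor BAF is exactly 0.5 is likewise an assumption of this sketch.

```python
from math import sqrt


def smooth_tumor_bafs(tumor_bafs, high=0.9, low=0.1):
    """Replace tumor BAFs above 0.5 with `high` and below 0.5 with `low`.

    Markers at exactly 0.5 carry no direction and become None (an assumption).
    """
    smoothed = []
    for baf in tumor_bafs:
        if baf > 0.5:
            smoothed.append(high)
        elif baf < 0.5:
            smoothed.append(low)
        else:
            smoothed.append(None)
    return smoothed


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def airway_tumor_correlation(airway_bafs, tumor_bafs):
    """Correlate airway BAFs with smoothed tumor BAFs at the same markers."""
    smoothed = smooth_tumor_bafs(tumor_bafs)
    pairs = [(a, t) for a, t in zip(airway_bafs, smoothed) if t is not None]
    return pearson([a for a, _ in pairs], [t for _, t in pairs])
```

A clearly positive value suggests that the same haplotype is in excess in both samples, while a clearly negative value suggests that opposite haplotypes are in excess.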
In order to remove possible artifacts from systematic probe biases, tumor/CNB BAFs were divided into two groups, increased BAF (values above 0.5) and decreased BAF (values below 0.5), after which the BAFs were smoothed to 0.9 and 0.1 for the increased group and decreased group respectively. Further, we evaluated only events where the mean BAF deviations were at least 0.05 from the expected value of 0.5. Correlations between observed BAF values in the airway samples and the corresponding smoothed BAFs from the tumor samples were computed within a region of interest or putative AI. These regions and events are reported in Supplementary Table 3.

The main manuscript reports the rate of whole-arm secondary independent mutations on chromosome 9 as 0.8, with a standard error of 0.15. These data were derived by the following logic. The second mutation can occur in one of two directions: the same as the first mutation or the opposite. If the second mutation occurs in the same direction (same haplotype in relative excess) as the first, we will not be able to detect it, since our test is based on the presence of a negative correlation (above); such a mutation is thus a latent process (the mutation occurs but we cannot observe it). Only those second mutations that happen in a different direction from the first one can be observed. Assuming there are a total of n pairs of identically positioned AI events between airway sample and tumor from the same patient, let x denote the number of these in which we observed the second mutation in the opposite (BAF) direction to the first, and let y denote the number of pairs in which a second mutation happens but in the same direction as the first mutation and is thus not observed. We assume that the secondary mutation is an independent event. Then x and y follow binomial distributions as follows: $x \sim \mathrm{Binom}(n, p_x)$ and $y \sim \mathrm{Binom}(n, p_y)$, where $p_x$ and $p_y$ are the rates of each type of mutation.
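The latent-direction model can be illustrated with a small simulation. This is a sketch rather than part of the original analysis: the function name, the seed, and the number of simulated pairs are illustrative, and assigning each pair to exactly one of three outcomes (opposite direction, same direction, or no secondary mutation) is an assumption of the example.

```python
import random


def simulate_pairs(n, p_x, p_y, seed=0):
    """Simulate n airway-tumor pairs and return the counts (x, y).

    x: pairs with a secondary mutation in the opposite direction (observable).
    y: pairs with a secondary mutation in the same direction (latent).
    """
    if p_x < 0 or p_y < 0 or p_x + p_y > 1:
        raise ValueError("p_x and p_y must be non-negative and sum to at most 1")
    rng = random.Random(seed)
    x = y = 0
    for _ in range(n):
        u = rng.random()
        if u < p_x:
            x += 1          # opposite direction: detectable as a negative correlation
        elif u < p_x + p_y:
            y += 1          # same direction: occurs but cannot be observed
    return x, y


# Illustrative values only: equal rates for the two directions.
n_pairs = 20000
x, y = simulate_pairs(n_pairs, p_x=0.4, p_y=0.4)
observed_fraction = x / n_pairs       # only x is visible in real data
total_fraction = (x + y) / n_pairs    # includes the latent same-direction events
```

When the two rates are equal, twice the observed fraction tracks the total fraction closely, even though the same-direction events are never seen directly.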
We further assume the direction of the mutation is random with equal probability and thus $p_x = p_y$, and the overall probability (or rate) of secondary mutation is $p = p_x + p_y$. Given observed data n and x, we can obtain a point estimate for p as twice the point estimate for $p_x$, or 2(x/n). To obtain a standard error of this estimate, we use the square root of the variance of the estimator, $\mathrm{Var}(\hat{p})$, which in turn is given by $\mathrm{Var}(\hat{p}_x) + \mathrm{Var}(\hat{p}_y) = [\hat{p}_x(1 - \hat{p}_x) + \hat{p}_y(1 - \hat{p}_y)]/n$, or $2\hat{p}_x(1 - \hat{p}_x)/n$. On chromosome 9, 6 opposite-direction mutations were observed out of a possible 15 and thus our estimate for p, the rate of secondary independent mutations, is 12/15 or 0.8, with a standard error of approximately 0.15.

References

1. Kadara H, Fujimoto J, Yoo SY, Maki Y, Gower AC, Kabbout M, et al. Transcriptomic architecture of the adjacent airway field cancerization in non-small cell lung cancer. Journal of the National Cancer Institute. 2014;106:dju004.
2. Vattathil S, Scheet P. Haplotype-based profiling of subtle allelic imbalance with SNP arrays. Genome Research. 2013;23:152-8.
3. Staaf J, Lindgren D, Vallon-Christersson J, Isaksson A, Goransson H, Juliusson G, et al. Segmentation-based detection of allelic imbalance and loss-of-heterozygosity in cancer cells using whole genome SNP arrays. Genome Biology. 2008;9:R136.
4. Scheet P, Stephens M. A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. American Journal of Human Genetics. 2006;78:629-44.

Supplementary Figure 1. HapLOH results for matched normal large airway and NSCLC tumor from case 18

[Figure: panels A and B; y-axis: BAF; x-axis: markers ordered by chromosome (CHR)]

Supplementary Figure 1. B-allele frequencies (BAF) (y-axis) are plotted for each marker on the array. Markers are ordered along the x-axis by their genomic position. Red lines indicate hapLOH output, the posterior probability that markers are in a region of allelic imbalance (AI).
Blue dashed lines indicate chromosome boundaries and blue dots indicate centromeres. Plots for normal-appearing large airway brushing (A) and NSCLC tumor (B) from case 18.

Supplementary Figure 2. Distribution of posterior probabilities for airway events called by hapLOH

[Figure: histogram; x-axis: Posterior Probability (0.5 to 1.0); y-axis: Number of Events]

Supplementary Figure 2. Distribution of the posterior probabilities (x-axis) for the events identified by hapLOH in the normal-appearing airway field of cancerization.

Supplementary Figure 3: Allelic imbalance events detected in normal-appearing airway brushings

[Figure: one panel per chromosome (CHR 1 to CHR 22 and CHR X), each with an annotation track and rows labeled by airway sample]
Supplementary Figure 3. Plots of somatic AI events detected in the airway samples from NSCLC patients. Events are shown as bars: red (gain), blue (loss), green (cnLOH), and gray (undeterminable). Labels indicate the case number and sample type.

Supplementary Figure 4. Event classification using B-allele frequencies and log R ratios

[Figure: panels A, B and C; x-axis: BAF deviation (log scale); y-axis: LRR deviation; legend: CNLOH, loss, gain, undeterminable]

Supplementary Figure 4. Plots of AI events with BAF deviation plotted on the x-axis and log R ratio (LRR) deviation plotted on the y-axis. Panel A depicts the events in the normal-appearing airways and panel B displays the same events but with the NSCLC tumor event type designation. Panel C depicts all events identified in the tumors.

Supplementary Figure 5. Effect of allelic imbalance on B-allele frequencies

[Figure: panels A, B, C and D]

To assess AI, we first phased heterozygous genotypes in order to identify the two haplotypes, labeled as the maternal and paternal haplotypes (A). Regions with no AI showed patterns that are consistent with an equal proportion of maternal and paternal haplotypes (B). We detected AI in regions where BAF deviations indicated an abundance of one of the parental haplotypes (C-D). In addition, we calculated BAF correlations between airway and tumor samples in order to determine if AI in the samples was the result of both samples having the same haplotype or opposite haplotypes in excess. Positive correlations indicated AI occurring in the same direction (both exhibiting pattern C or D).
Negative correlations between paired airway and tumor samples indicated that samples exhibited opposite haplotypes in excess (for example, C in tumor and D in airway), pointing to independent AI events.

Supplementary Figure 6: Distribution of focal and arm airway events

[Figure: panels A (All Events), B (In Tumor Events) and C (Not in Tumor Events); x-axis: Event Markers / Markers in Chromosome Arm; y-axis: Number of Events]

Supplementary Figure 6. Relative size distribution of normal-appearing airway events. Size is measured as the number of markers in the airway event divided by the number of markers on the chromosome arm (x-axis). The dashed line at 0.9 indicates the threshold used to differentiate between arm and focal events. Histograms for all events detected in the airway samples and for airway events with a matching tumor event are displayed in panels A and B, respectively. Panel C depicts the distribution of airway events that did not match a tumor event.