A PIPELINE FOR DIAGNOSING SICK KIDS: AN EXAMPLE OF COMPOSABLE BIOCOMPUTE OBJECTS

Genomic analysis for diagnosing sick children
• Many different circumstances and different options:
  • May be testing just the child
  • May be testing the parents also
  • May not have both parents
    • But may have extended family members
  • May have evidence of heredity
    • Might eliminate searching for de novo variants
  • Whole genome? Whole exome? Panel? RNA? Epigenomic tests?

Sick kids pedigree pipeline(s): components in use
• Standard germline variant calling pipeline on each genome
  • Same for WGS and WES
  • Best-practices pre-processing
• Trio/quad analysis when available
• Extended pedigree analysis if needed
• Mitochondrial DNA analysis on WGS
• De novo variant discovery and filtering
  • Without filtering, a very high percentage of de novo calls are false
  • 70-120 true, 1000s false positive
• Annotation
• Interpretation and diagnosis (not automated)

Overall Pipeline: Components in Use
Diagram:
• Main stages: BAM Generation, Variant Calling, Annotation, Manual Interpretation & Diagnosis
• Additional components: Pedigree Analysis, Mitochondrial Analysis, De Novo Analysis

CURRENT BEST PRACTICES: Variant Discovery on each genome
Diagram:
• Preprocessing: produces the analysis-ready BAM
• SNV/Indel Discovery: GATK HaplotypeCaller, GATK Joint Genotyping, GATK Variant Quality Score Recalibration (VQSR); produces analysis-ready SNV/indel calls
• SV/CNV Discovery: Genome STRiP SV/CNV Discovery, Genome STRiP Joint Calling, Genome STRiP Genotyping, SV/CNV Filtering; produces analysis-ready SV/CNV calls
• Mitochondrial DNA Analysis: Variant Quality Filtering, Heteroplasmy Estimation; produces analysis-ready mtDNA variant calls

Pedigree Analysis Workflow
Diagram:
• Inputs: analysis-ready SNV/indel calls, analysis-ready SV/CNV calls, analysis-ready mtDNA variant calls, clinical information
• Steps: Pedigree Analysis, De Novo Filtering, Extended Pedigree Analysis, Functional Annotation
• Output: Results/Interpretation

Pedigree Analysis - Quad
Diagram:
• Inputs: Proband gVCF, Father gVCF, Mother gVCF, Unaffected Sibling gVCF
• Joint genotyping: produces a multi-sample VCF
• Variant Quality Score Recalibration
• Phase genotypes
• Calculate genotype posteriors
• MALEFACTOR (NYGC software)
  • Inputs: family members' affection status, penetrance model, mode of inheritance
• De novo identification
• De novo validation + filtration
• Variant filtration
• Functional annotation + interpretation

Mitochondrial DNA Analysis
Diagram:
• Two reference sequences: rCRS (positions 1 to 16569) and rCRS-alt
• Alignment to GRCh37 with rCRS (BWA aligner)
  • Heteroplasmy discovery
  • Homoplasmy discovery
  • mtDNA copy number relative to nDNA
  • Unified VCF
• Alignment to GRCh37 with rCRS-alt (BWA aligner)
  • Heteroplasmy discovery
  • Homoplasmy discovery
  • mtDNA copy number relative to nDNA
  • Unified VCF
• Reconstructed VCF derived from rCRS and rCRS-alt
• Variant annotation: MitoMap, ClinVar, OMIM
• Variant effect scoring for prioritization
• Tumor/normal comparison
• Variant interpretation

Ancestry analysis: apply population-specific allele frequencies
Name: Steven Walerstein
Project: Project_CLIN_11377_B01_SOM_WGS
Sample ID: ONC15-50N-D
Analysis: Whole genome sequencing data generated as part of participation in the General Population Research Study were used for the ancestry analysis. 1000 Genomes Project data were used as the reference for population stratification.
• Central Asian: 2.1%
• Italian / Balkan: 12.1%
• Middle Eastern: 0.7%
• European Jewish or East Mediterranean: 85.1%
Note: These proportions show the approximate locations of your ancestors ~500 years ago, as determined by a comparison of your DNA to that of a set of reference

Composability
• These individual compute components can be used in many contexts.
  • Variant calling is the same for WGS and WES
  • Structural variant calling is more accurate on whole genomes
  • Mitochondrial DNA analysis is often neglected in standard WGS analysis workflows. It can be performed on any whole genome without the need for a dedicated mitochondrial panel.
• Different combinations may be needed for research or trials on cohorts than for individual clinical samples, e.g. joint genotyping.
• Pedigree vs. extended pedigree is a choice per proband
  • How is the right pipeline selected?
  • Should these be combined?
• Ancestry analysis is helpful for evaluating population-specific variant allele frequencies

Versioning
• These are clinical pipelines
• We change our research pipelines frequently
• And we periodically validate the clinical pipelines and submit them to the state for re-certification.
  • How will that work with a bio-compute repository?
  • Is that validation sufficient?
  • Is another required?
  • What if the two don't agree?
• Standard operating procedures for validating any changes in sequencing or bioinformatics analysis workflows

Questions
• How do we handle version management in a bio-compute repository?
  • Different projects need different versions
  • Different projects need different references
  • Where are standard data like references stored?
• What tools will we provide for composability?
  • Do we need to?
• How do we validate pipelines in a repository?
  • What could change to invalidate a pipeline?
• Is there a risk that composing many bio-compute objects from the repository could make re-identification easier?
  • Do we deal with that in any technical way?
• Reproducibility:
  • Data isn't always there

Acknowledgements
• Avinash Abhyankar
• Belinda Cornes
• Anne-Katrin Emde
• Giuseppe Narzise
• Bo-Juen Chen
• Jimmy Lin
• Christian Stolte
• Shailu Gargeya
• Clint Howarth
• Manisha Kher
• Terry Dontje
• Uday Evani
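
Illustration: de novo candidate filtering in a trio
The slides list de novo variant discovery and filtering as a pipeline component and note that most unfiltered de novo calls are false. The Python sketch below shows one simple way such a step could look. It is not the pipeline's actual implementation, which the slides do not describe. The `Call` record, the function names, and the depth and genotype-quality thresholds are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Call:
    """One sample's genotype at one site (hypothetical minimal record)."""
    alleles: tuple  # allele indices, e.g. (0, 1) for a heterozygous call
    depth: int      # read depth at the site
    gq: int         # genotype quality


def is_candidate_de_novo(child, father, mother):
    """True if the child carries a non-reference allele and both parents are called homozygous reference."""
    child_has_alt = any(a != 0 for a in child.alleles)
    parents_all_ref = all(a == 0 for a in father.alleles + mother.alleles)
    return child_has_alt and parents_all_ref


def passes_quality(child, father, mother, min_depth=10, min_gq=20):
    """Require minimum depth and genotype quality in all three samples.

    The thresholds are illustrative assumptions, not validated values.
    A poorly covered parent can hide an inherited allele, so parents
    are checked as well as the child.
    """
    return all(c.depth >= min_depth and c.gq >= min_gq
               for c in (child, father, mother))


def filter_de_novo(sites, min_depth=10, min_gq=20):
    """Return the site IDs that look de novo and pass the quality filters."""
    kept = []
    for site_id, (child, father, mother) in sites.items():
        if (is_candidate_de_novo(child, father, mother)
                and passes_quality(child, father, mother, min_depth, min_gq)):
            kept.append(site_id)
    return kept


# Made-up example sites: (child, father, mother)
example_sites = {
    "chr1:1000": (Call((0, 1), 35, 99), Call((0, 0), 30, 99), Call((0, 0), 32, 99)),
    "chr2:2000": (Call((0, 1), 4, 12), Call((0, 0), 30, 99), Call((0, 0), 28, 99)),
    "chr3:3000": (Call((0, 1), 40, 99), Call((0, 1), 33, 99), Call((0, 0), 31, 99)),
    "chr4:4000": (Call((0, 1), 40, 99), Call((0, 0), 5, 15), Call((0, 0), 31, 99)),
}

print(filter_de_novo(example_sites))
```

In this made-up data only the first site survives: the second has a low-depth child call, the third is inherited from the father, and the fourth has a poorly covered father whose reference call is unreliable. A production filter would likely use further evidence, such as allele balance and population frequency. The slides do not say which criteria are used.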