Readings in Statistics and Probability: Analysis of Copy-Number Variation
University of Kentucky
STA 715, Spring 2012

Credit: 3.0
Meeting time: 1:00 p.m. - 2:30 p.m., Wednesdays
Room 205D, Multidisciplinary Sciences Building (MDS 205D)

Instructor: Patrick Breheny, Ph.D.
Office: MDS 205D
Phone: 218-2077
e-mail: [email protected]
Office hours: Whenever I’m in my office, or by appointment

Course description: We will be reading prominent articles pertaining to the analysis of copy-number variation data. In addition to reading articles specifically dealing with copy-number variation, we will also read some background material on some of the statistical methods involved. My goal is that at the end of the class, you will:
• Be familiar with the key articles and concepts in the analysis of copy-number variation data, as well as have a sense of the unanswered questions and current research directions
• Gain exposure to ideas and tools which can be applied to copy-number data, such as multiple testing corrections and false discovery rates, hidden Markov models, and the fused lasso
• Be more comfortable reading and discussing journal articles in statistics

Readings: I will provide .pdf copies of all the articles and textbook chapters we will be reading via Dropbox.

Prerequisite: STA 701, STA 703

Grading: Your grade will be based entirely on classroom participation and discussion. You should be fine as long as you actually read the articles and attempt to understand them. It is understandable and expected that you will still have some questions after reading the articles – this is the purpose of discussing them, after all. If I feel that your effort and participation are not at the “A” level, I will let you know.

Attendance: Obviously, since 100% of your grade is based on participation in discussion, attendance is very important. If you cannot attend one of our meetings, please let me know in advance.
To make up the absence, you will be asked to play a larger role in the following week’s discussion. If you repeatedly miss class (more than 3 absences), it will be reflected in your grade.

Electronic communication: I will occasionally send e-mails to the class (to the account listed for you in the campus directory), so please check that account regularly.

Complaints: Students with suggestions or complaints should see me first, and if we cannot come to an agreement, I will direct you to the head of the department.

Disabilities: If anyone has a disability requiring special accommodations, please let me know as soon as possible, so that these arrangements can be made.

Copy-number variation is a phenomenon that has only recently been discovered, and there are many interesting aspects of its analysis that have not yet been explored. I hope that you will find both this specific topic interesting and also gain a sense of how new statistical methods are developed in response to the collection of new types of data and new inferential questions.

Course outline:

1. Fundamental aspects of copy-number variation
• Eichler, E. (2008). Copy number variation and human disease. Nature Education, 1.
• McCarroll, S. and Altshuler, D. (2007). Copy-number variation and association studies of human disease. Nature Genetics, 39 S37–S42.
• Zöllner, S. and Teslovich, T. (2009). Using GWAS data to identify copy number variants contributing to common complex diseases. Statistical Science, 24 530–546.
• McCarroll, S. (2008). Extending genome-wide association studies to copy-number variation. Human Molecular Genetics, 17 R135.
• Ginsburg, G. S. and Willard, H. (eds.) (2010). Essentials of Genomic and Personalized Medicine. Academic Press.
• Sharp, A., Cheng, Z. and Eichler, E. (2006). Structural variation of the human genome. Annual Review of Genomics and Human Genetics, 7 407–442.
• Cooper, G., Zerr, T., Kidd, J., Eichler, E. and Nickerson, D. (2008).
Systematic assessment of copy number variant detection via genome-wide SNP genotyping. Nature Genetics, 40 1199–1203.
• Itsara, A., Cooper, G., Baker, C., Girirajan, S., Li, J., Absher, D., Krauss, R., Myers, R., Ridker, P., Chasman, D. et al. (2009). Population analysis of large copy number variants and hotspots of human genetic disease. The American Journal of Human Genetics, 84 148–161.

2. Landmark papers
• Iafrate, A., Feuk, L., Rivera, M., Listewnik, M., Donahoe, P., Qi, Y., Scherer, S. and Lee, C. (2004). Detection of large-scale variation in the human genome. Nature Genetics, 36 949–951.
• Sharp, A., Locke, D., McGrath, S., Cheng, Z., Bailey, J., Vallente, R., Pertz, L., Clark, R., Schwartz, S., Segraves, R. et al. (2005). Segmental duplications and copy-number variation in the human genome. The American Journal of Human Genetics, 77 78–88.
• Redon, R., Ishikawa, S., Fitch, K., Feuk, L., Perry, G., Andrews, T., Fiegler, H., Shapero, M., Carson, A., Chen, W. et al. (2006). Global variation in copy number in the human genome. Nature, 444 444–454.

3. Multiple comparisons and kernel methods
• Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. Journal of the Royal Statistical Society Series B, 57 289–300.
• Yang, H., Hsieh, H. and Fann, C. (2008). Kernel-based association test. Genetics, 179 1057–1068.
• Efron, B. (2010). Large-Scale Inference: Empirical Bayes Methods for Estimation, Testing, and Prediction. Cambridge University Press.
• Efron, B. and Zhang, N. (2011). False discovery rates and copy number variation. Biometrika, 98 251.

4. Circular binary segmentation
• Olshen, A., Venkatraman, E., Lucito, R. and Wigler, M. (2004). Circular binary segmentation for the analysis of array-based DNA copy number data. Biostatistics, 5 557–572.
• Venkatraman, E. and Olshen, A. (2007). A faster circular binary segmentation algorithm for the analysis of array CGH data.
Bioinformatics, 23 657–663.

5. Hidden Markov models
• Press, W., Teukolsky, S., Vetterling, W. and Flannery, B. (2007). Numerical Recipes: The Art of Scientific Computing. 3rd ed. Cambridge University Press.
• Scott, S. (2002). Bayesian methods for hidden Markov models: Recursive computing in the 21st century. Journal of the American Statistical Association, 97 337–352.
• Fridlyand, J., Snijders, A., Pinkel, D., Albertson, D. and Jain, A. (2004). Hidden Markov models approach to the analysis of array CGH data. Journal of Multivariate Analysis, 90 132–153.
• Wang, K., Li, M., Hadley, D., Liu, R., Glessner, J., Grant, S., Hakonarson, H. and Bucan, M. (2007). PennCNV: An integrated hidden Markov model designed for high-resolution copy number variation detection in whole-genome SNP genotyping data. Genome Research, 17 1665–1674.

6. Fused lasso
• Tibshirani, R., Saunders, M., Rosset, S., Zhu, J. and Knight, K. (2005). Sparsity and smoothness via the fused lasso. Journal of the Royal Statistical Society Series B, 67 91–108.
• Tibshirani, R. and Wang, P. (2008). Spatial smoothing and hot spot detection for CGH data using the fused lasso. Biostatistics, 9 18–29.
• Nowak, G., Hastie, T., Pollack, J. and Tibshirani, R. (2011). A fused lasso latent feature model for analyzing multi-sample aCGH data. Biostatistics, 12 776–791.

7. Integrated analysis of SNP and CNV data
• Barnes, C., Plagnol, V., Fitzgerald, T., Redon, R., Marchini, J., Clayton, D. and Hurles, M. (2008). A robust statistical method for case-control association testing with copy number variation. Nature Genetics, 40 1245–1252.
• Korn, J., Kuruvilla, F., McCarroll, S., Wysoker, A., Nemesh, J., Cawley, S., Hubbell, E., Veitch, J., Collins, P., Darvishi, K. et al. (2008). Integrated genotype calling and association analysis of SNPs, common copy number polymorphisms and rare CNVs. Nature Genetics, 40 1253.