A Quantitative Overview to Gene Expression Profiling in Animal Genetics
Analysis of (cDNA) Microarray Data: Part V. Mixtures of Distributions
Model-Based Clustering via Mixtures of Distributions
Armidale Animal Breeding Summer Course, UNE, Feb. 2006

Mixtures of Distributions: Definition

• The mixture model assumes that each cluster (or component) of the data is generated by an underlying normal distribution.
• Each observation in y is assumed to be an independent draw from a mixture density with k (possibly unknown but finite) components and probability density function:

$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$

where $\phi(y; \mu_i, V_i)$ is the normal density function and the $\pi_i$ are the mixing proportions (which add to 1).

Mixtures of Distributions: Introduction

$$f(y; \Psi_k) = \sum_{i=1}^{k} \pi_i \, \phi(y; \mu_i, V_i)$$

Mixtures of Distributions: The Guru

http://www.maths.uq.edu.au/~gjm

Mixtures of Distributions: Software and Resources

Mixtures of Distributions: EM Algorithm

The EM algorithm obtains the maximum likelihood estimate of $\Psi$ by iteration. In the (m+1)th iteration, the estimates of the parameters of interest are updated by:

$$\pi_i^{(m+1)} = \frac{1}{n} \sum_{j=1}^{n} \tau_{ij}^{(m)}$$

$$\mu_i^{(m+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(m)} y_j}{\sum_{j=1}^{n} \tau_{ij}^{(m)}}$$

$$V_i^{(m+1)} = \frac{\sum_{j=1}^{n} \tau_{ij}^{(m)} \, (y_j - \mu_i^{(m+1)})(y_j - \mu_i^{(m+1)})^{T}}{\sum_{j=1}^{n} \tau_{ij}^{(m)}}$$

where

$$\tau_{ij}^{(m)} = \frac{\pi_i^{(m)} \, \phi(y_j; \mu_i^{(m)}, V_i^{(m)})}{f(y_j; \Psi^{(m)})}$$

is the posterior probability that $y_j$ belongs to the i-th component of the mixture (…with a very elegant link to False Discovery Rate).
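The EM updates above can be sketched for the univariate case as follows. This is a minimal illustration rather than the EMMIX implementation: the function names, the starting values, the seed and the two-component test data (70% N(0,1), 30% N(6,4)) are all hypothetical choices made for the example.

```python
import math
import random


def normal_pdf(y, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def e_step(y, pi, mu, var):
    """Return tau[i][j] = P(y[j] belongs to component i) and the log-likelihood."""
    k = len(pi)
    tau = [[0.0] * len(y) for _ in range(k)]
    loglik = 0.0
    for j, yj in enumerate(y):
        weighted = [pi[i] * normal_pdf(yj, mu[i], var[i]) for i in range(k)]
        f_yj = sum(weighted)  # mixture density at y_j
        loglik += math.log(f_yj)
        for i in range(k):
            tau[i][j] = weighted[i] / f_yj
    return tau, loglik


def m_step(y, tau):
    """Update mixing proportions, means and variances from the posteriors."""
    n = len(y)
    pi, mu, var = [], [], []
    for tau_i in tau:
        total = sum(tau_i)
        mean_i = sum(t * yj for t, yj in zip(tau_i, y)) / total
        var_i = sum(t * (yj - mean_i) ** 2 for t, yj in zip(tau_i, y)) / total
        pi.append(total / n)
        mu.append(mean_i)
        var.append(var_i)
    return pi, mu, var


def fit_em(y, pi, mu, var, max_iter=500, tol=1e-8):
    """Alternate E and M steps until the log-likelihood gain falls below tol."""
    previous = -math.inf
    for _ in range(max_iter):
        tau, loglik = e_step(y, pi, mu, var)
        if loglik - previous < tol:
            break  # tau and loglik match the current parameters
        pi, mu, var = m_step(y, tau)
        previous = loglik
    return pi, mu, var, tau, loglik


# Hypothetical data: 2,100 draws from N(0,1) and 900 from N(6,4) (variance 4).
random.seed(1)
y = [random.gauss(0.0, 1.0) for _ in range(2100)]
y += [random.gauss(6.0, 2.0) for _ in range(900)]

pi, mu, var, tau, loglik = fit_em(y, [0.5, 0.5], [-1.0, 5.0], [1.0, 1.0])
print("proportions:", pi)
print("means:", mu)
print("variances:", var)
```

The sketch has no safeguard against a component collapsing onto a single observation (variance tending to zero), so sensible starting values matter.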
Mixtures of Distributions: EM Algorithm (model selection)

• We fit the mixture for k = 1, 2, 3, … components.
• Criteria for model selection include the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC):

$$\mathrm{AIC} = -2 \log L(\hat{\Psi}_k) + 2 \nu_k$$

$$\mathrm{BIC} = -2 \log L(\hat{\Psi}_k) + \nu_k \log(n)$$

where $\nu_k = 3k - 1$ is the number of independent parameters in the mixture (k means, k variances and k − 1 free mixing proportions).
• Alternatively, the distribution of the likelihood ratio test (LRT) statistic can be estimated by bootstrapping, and P-values obtained to contrast a model with k components against a model with k + 1 components.

Mixtures of Distributions: Simulation 1

Consider these distributions and simulate (the second parameter of N(·,·) is the variance):

Distribution    Records
N(1,5)          10,000
N(5,10)          5,000

The mixture becomes:

$$f(y; \hat{\Psi}) = \tfrac{2}{3} N(1, 5) + \tfrac{1}{3} N(5, 10)$$

Posterior probability:

$$\tau_{ij} = \frac{\pi_i \, \phi(y_j; \mu_i, V_i)}{f(y_j; \Psi)}$$

where the denominator $f(y_j; \Psi)$ is the average of the component likelihoods weighted by the mixing proportions.

Likelihood of selected values of y under each component:

y           -1      0      1      5      7
N(1,5)     0.120  0.161  0.178  0.036  0.005
N(5,10)    0.021  0.036  0.056  0.126  0.103

Mixtures of Distributions: Simulation 2

Consider these distributions and simulate:

Distribution    Records    Microarray
N(0,1)          9,000      Non-DE genes
N(0,10)         1,000      DE genes

1. Simulate:

$$f(y; \hat{\Psi}) = 0.9 \, N(0, 1) + 0.1 \, N(0, 10)$$

2. Ask EMMIX to fit mixtures with up to 5 components and…
3. EMMIX model of best fit:

$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$
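The AIC and BIC criteria above can be illustrated on data simulated as in Simulation 2. This is a sketch under stated assumptions: the helper names and the seed are hypothetical, the one-component model uses its closed-form maximum likelihood estimates, and the two-component log-likelihood is evaluated at the EMMIX fit reported on the slide rather than re-estimated on the newly simulated data, so it slightly understates the best achievable fit.

```python
import math
import random


def normal_pdf(y, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_loglik(y, pi, mu, var):
    """Log-likelihood of the data under a normal mixture."""
    return sum(
        math.log(sum(p * normal_pdf(yj, m, v) for p, m, v in zip(pi, mu, var)))
        for yj in y
    )


def n_params(k):
    """Independent parameters in a univariate k-component mixture: 3k - 1."""
    return 3 * k - 1


def aic(loglik, k):
    return -2.0 * loglik + 2.0 * n_params(k)


def bic(loglik, k, n):
    return -2.0 * loglik + n_params(k) * math.log(n)


# Simulation 2: 9,000 non-DE genes from N(0,1), 1,000 DE genes from N(0,10).
# random.gauss takes a standard deviation, hence sqrt(10) for variance 10.
random.seed(2006)
y = [random.gauss(0.0, 1.0) for _ in range(9000)]
y += [random.gauss(0.0, math.sqrt(10.0)) for _ in range(1000)]
n = len(y)

# k = 1: the maximum likelihood estimates are the sample mean and variance.
mean1 = sum(y) / n
var1 = sum((yj - mean1) ** 2 for yj in y) / n
loglik1 = mixture_loglik(y, [1.0], [mean1], [var1])

# k = 2: evaluated at the fit reported on the slide (assumption: not refitted).
loglik2 = mixture_loglik(y, [0.903, 0.097], [0.006, 0.010], [0.993, 10.805])

for k, ll in ((1, loglik1), (2, loglik2)):
    print(k, "components: AIC =", round(aic(ll, k), 1), "BIC =", round(bic(ll, k, n), 1))
```

The model with the smaller criterion value is preferred; BIC penalises each extra parameter by log(n) rather than 2, so it favours smaller k than AIC when n is large.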
Mixtures of Distributions: Simulation 2 (posterior probabilities)

For the EMMIX best fit,

$$f(y; \hat{\Psi}) = 0.903 \, N(0.006, 0.993) + 0.097 \, N(0.010, 10.805)$$

the posterior probabilities act as a “decision function” that changes at |y| ≈ 2.75.

[Figure: frequency distribution of the simulated data with the posterior probabilities (Post Prob) overlaid]

Mixtures of Distributions: Linking Posterior Probabilities with False Discovery Rate

[Figure: Not-DE and DE components]

Select the N most extreme genes; the FDR is the average posterior probability of not being in the cluster of extremes.

Mixtures of Distributions: Simulation 2 (FDR)

[Figure: FDR by N Genes, with posterior probabilities (Post Prob)]

Mixtures of Distributions: Example “Diets” (only REFERENCE components of the design)

[Equation: $y_i^{HvL}$ defined from $g_i$, $r_i$ and the corresponding terms with index offset 8; operators not recoverable from the slide]

Fitted mixture:

$$f(y; \hat{\Psi}) = 0.044 \, N(0.87, 67.46) + 0.590 \, N(2.30, 10.42) + 0.366 \, N(2.41, 2.32)$$

[Figure: FDR by N Genes]

In Reverter et al. ‘03 (JAS 81:1900), 27 genes were reported as having a posterior probability (PP) > 0.95 of being in the extreme cluster. We can now assess that this set of 27 genes carries an FDR < 10%.
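The link between posterior probabilities and FDR stated earlier (select the N most extreme genes; the FDR is the average posterior probability of not being in the cluster of extremes) can be sketched with the Simulation 2 fit. Several things here are assumptions made for illustration: the function names, the seed, the use of freshly simulated data, and the choice to rank genes by their posterior probability of being non-DE (smallest first) as the measure of "most extreme". The Diets data are not reproduced.

```python
import math
import random

# EMMIX best fit reported for Simulation 2: component 0 = non-DE, 1 = DE.
PI = (0.903, 0.097)
MU = (0.006, 0.010)
VAR = (0.993, 10.805)


def normal_pdf(y, mu, var):
    """Univariate normal density with mean mu and variance var."""
    return math.exp(-0.5 * (y - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def posterior_not_de(y):
    """Posterior probability that a gene with value y is in the non-DE component."""
    w0 = PI[0] * normal_pdf(y, MU[0], VAR[0])
    w1 = PI[1] * normal_pdf(y, MU[1], VAR[1])
    return w0 / (w0 + w1)


def fdr_by_n(values):
    """fdr[N - 1] is the estimated FDR when the N most extreme genes are selected."""
    tau0 = sorted(posterior_not_de(v) for v in values)  # most extreme first
    fdr, running = [], 0.0
    for n_selected, t in enumerate(tau0, start=1):
        running += t
        fdr.append(running / n_selected)  # average posterior of being non-DE
    return fdr


# Simulated genes: 9,000 from N(0,1) and 1,000 from N(0,10) (variance 10).
random.seed(2006)
genes = [random.gauss(0.0, 1.0) for _ in range(9000)]
genes += [random.gauss(0.0, math.sqrt(10.0)) for _ in range(1000)]

fdr = fdr_by_n(genes)
for n_selected in (10, 100, 500):
    print(n_selected, "genes: estimated FDR =", round(fdr[n_selected - 1], 4))
```

Because genes are added in order of increasing non-DE posterior probability, the estimated FDR can only grow as N increases, which is what makes an "FDR by N Genes" curve usable for choosing a cut-off.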