From reference genes to global mean normalization Jo Vandesompele professor, Ghent University co-founder and CEO, Biogazelle EMBL Master Class - Following MIQE recommendation Heidelberg, July 4, 2012 Critical elements contributing to successful qPCR results “normalization is the single most important factor contributing to (more) accurate qPCR results” Derveaux et al., Methods, 2010 Three types of normalization will be discussed n reference genes: gold standard for normalization n global mean normalization and selection of stable references n expressed repeat elements http://genomebiology.com/2002/3/7/research/0034.1 Open Access et al. Mestdagh 2009 Volume 10, Issue 6, Article R64 Method Research !"#$%&'()"*+(,(-#.%/,((�(#12(/(2-#34,4+#1%//5&-#6278(#1"++(9%'4&( $%&#:"5-#;&&(#0(#1%(+(#%&'#32%&<#=+(,(*%& comment Accurate normalization of real-time quantitative RT-PCR data by geometric averaging of multiple internal control genes A novel and universal method for microRNA RT-qPCR data normalization Pieter Mestdagh*, Pieter Van Vlierberghe*, An De Weer*, Daniel Muth†, Frank Westermann†, Frank Speleman* and Jo Vandesompele* Addresses: *Center for Medical Genetics, Ghent University Hospital, De Pintelaan 185, Ghent, Belgium. †Department of Tumour Genetics, German Cancer Center, Im Neuenheimer Feld 280, Heidelberg, Germany. ?"22()+"&'(&8(>#32%&<#=+(,(*%&N#OJ*%4,>#@2%&<4N)+(,(*%&P27MN%8NQ( Correspondence: Jo Vandesompele. Email: [email protected] Published: 18 June 2002 Genome Biology 2002, 3(7):research0034.1–0034.11 Received: 20 December 2001 Revised: 10 April 2002 Accepted: 7 May 2002 reviews ;''2())>#?(&/(2#@"2#A('48%,#B(&(/48)-#BC(&/#D&4E(2)4/5#F")+4/%,#G.H-#0(#14&/(,%%&#GIH-#6JKLLL#BC(&/-#6(,M47*N Published: 16 June 2009 Genome Biology 2009, 10:R64 (doi:10.1186/gb-2009-10-6-r64) The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2002/3/7/research/0034 reports © 2002 Vandesompele et al., licensee BioMed Central Ltd (Print ISSN 1465-6906; Online ISSN 1465-6914) Abstract %&%,5)4)# "@# /C"7)%&')# "@# M(&()# 4&# /S"# '4@@(2(&/4%,,5# ,%Q(,(' :9;# +"+7,%/4"&)# VGW-# SC4,(# 2(%,J/4*(# :TJ1?:# +2"E4'()# /C( )4*7,/%&("7)#*(%)72(*(&/#"@#M(&(#(R+2())4"&#*%&5#'4@J @(2(&/# )%*+,()# @"2# %# ,4*4/('# &7*Q(2# "@# M(&()-# %&'# 4)# ()+(J 84%,,5# )74/%Q,(# SC(&# "&,5# %# )*%,,# &7*Q(2# "@# 8(,,)# %2( %E%4,%Q,(# VXJYWN# 6"/C# /(8C&4U7()# C%E(# /C(# %'E%&/%M(# "@ )+(('-#/C2"7MC+7/#%&'#%#C4MC#'(M2((#"@#+"/(&/4%,#%7/"*%/4"& 8"*+%2('# /"# 8"&E(&/4"&%,# U7%&/4@48%/4"&# *(/C"')-# )78C# %) &"2/C(2&JQ,"/# %&%,5)4)-# 24Q"&78,(%)(# +2"/(8/4"&# %))%5-# "2 information B(&(J(R+2())4"&#%&%,5)4)#4)#4&82(%)4&M,5#4*+"2/%&/#4&#*%&5 @4(,')# "@# Q4","M48%,# 2()(%28CN# D&'(2)/%&'4&M# +%//(2&)# "@ (R+2())('#M(&()#4)#(R+(8/('#/"#+2"E4'(#4&)4MC/#4&/"#8"*+,(R 2(M7,%/"25#&(/S"2<)#%&'#S4,,#*")/#+2"Q%Q,5#,(%'#/"#/C(#4'(&J /4@48%/4"&# "@# M(&()# 2(,(E%&/# /"# &(S# Q4","M48%,# +2"8())()-# "2 4*+,48%/('# 4&# '4)(%)(N# TS"# 2(8(&/,5# '(E(,"+('# *(/C"')# /" *(%)72(#/2%&)824+/#%Q7&'%&8(#C%E(#M%4&('#*78C#+"+7,%24/5 %&'# %2(# @2(U7(&/,5# %++,4('N# A482"%22%5)# %,,"S# /C(# +%2%,,(, interactions Conclusions: The normalization strategy presented here is a prerequisite for accurate RT-PCR expression profiling, which, among other things, opens up the possibility of studying the biological relevance of small expression differences. refereed research Results: We outline a robust and innovative strategy to identify the most stably expressed control genes in a given set of tissues, and to determine the minimum number of genes required to calculate a reliable normalization factor. We have evaluated ten housekeeping genes from different abundance and functional classes in various human tissues, and demonstrated that the conventional use of a single gene for normalization leads to relatively large errors in a significant proportion of samples tested. The geometric mean of multiple carefully selected housekeeping genes was validated as an accurate normalization factor by analyzing publicly available microarray data. deposited research Background: Gene-expression analysis is increasingly important in biological research, with realtime reverse transcription PCR (RT-PCR) becoming the method of choice for high-throughput and accurate expression profiling of selected genes. Given the increased sensitivity, reproducibility and large dynamic range of this methodology, the requirements for a proper internal control gene for normalization have become increasingly stringent. Although housekeeping gene expression has been reported to vary considerably, no systematic survey has properly determined the errors related to the common practice of using only one control gene, nor presented an adequate way of working around this problem. Background Received: 2 April 2009 Revised: 2 April 2009 Accepted: 16 June 2009 The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2009/10/6/R64 © 2009 Mestdagh et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. <p>The Normalization iments.</p> mean expression of microRNA value: RT-qPCR a new method for accurate and reliable normalization of microRNA expression data from RT-qPCR exper- Abstract Gene expression analysis of microRNA molecules is becoming increasingly important. In this study we assess the use of the mean expression value of all expressed microRNAs in a given sample as a normalization factor for microRNA real-time quantitative PCR data and compare its performance to the currently adopted approach. We demonstrate that the mean expression value outperforms the current normalization strategy in terms of better reduction of technical variation and more accurate appreciation of biological changes. Background MicroRNAs (miRNAs) are an important class of gene regulators, acting on several aspects of cellular function such as differentiation, cell cycle control and stemness. Not surprisingly, deregulated miRNA expression has been implicated in a wide variety of diseases, including cancer [1]. Moreover, miRNA expression profiling of different tumor entities resulted in the identification of miRNA signatures correlating with patient diagnosis, prognosis and response to treatment [2]. Despite the small size of miRNA molecules, several technologies have been developed that enable high-throughput and sensitive miRNA profiling, such as microarrays [3-8], real-time quantitative PCR (RT-qPCR) [9,10] and bead-based flow cytometry [2]. In terms of accuracy and specificity, RTqPCR has become the method of choice for measuring gene expression levels, both for coding and non-coding RNAs. However, the accuracy of the results is largely dependent on proper data normalization. As numerous variables inherent to an RT-qPCR experiment need to be controlled for in order to differentiate experimentally induced variation from true biological changes, the use of multiple reference genes is generally accepted as the gold standard for RT-qPCR data normalization [11]. Typically, a set of candidate reference genes is evaluated in a pilot experiment with representative samples from the experimental condition(s). Ideally these candidate reference genes belong to different functional classes, significantly reducing the possibility of confounding co-regulation. In case of miRNA profiling, only few candidate reference miRNAs have been reported [12]. Generally, other small noncoding RNAs are used for normalization. These include both small nuclear RNAs (for example, U6) and small nucleolar RNAs (for example, U24, U26). Strategies for normalization of high-dimensional expression profiling experiments (using, for example, microarray technology, but recently also transcriptome sequencing) generally take advantage of the huge amount of data generated and often use (almost) all available data points. These strategies range from a straightforward approach based on the mean or median expression value to more complex algorithms such as Genome Biology 2009, 10:R64 Why do we need normalization? n 2 sources of variation in gene expression results n biological variation (true fold changes) n experimentally induced variation (noise and bias) n purpose of normalization is removal or reduction of the experimental variation n input quantity: RNA quantity, cDNA synthesis efficiency, … n (input quality: RNA integrity, RNA purity, …) Various normalisation strategies have been proposed Huggett et al., Genes and Immunity, 2005 Various normalisation strategies have been proposed n sample size or volume n total RNA n rRNA genes (e.g. 18S rRNA) n spike-in molecules n reference genes (mRNA) (‘housekeeping genes’) The use of reference genes (housekeeping genes) is the most universal and most appropriate method for normalization. The problem of using a single non-validated reference gene Cq values GOI ACTB GAPDH T 21.0 U 23.0 T 18.0 U 19.0 T 19.0 U 22.0 normalized relative quantities GOIACTB GOIGAPDH T 2 U 1 T 1 U 2 4-fold difference different directions the geNorm solution to the normalization problem n framework for qPCR gene expression normalisation using the reference gene concept n quantified errors related to the use of a single reference gene (> 3 fold in 25% of the cases; > 6 fold in 10% of the cases) n developed a robust algorithm for assessment of expression stability of candidate reference genes n proposed the geometric mean of at least 3 reference genes for accurate and reliable normalisation n Vandesompele et al., Genome Biology, 2002 Candidate reference genes in normal blood samples n RT-qPCR analysis of 5 candidate reference genes (belonging to different functional and abundance classes) on 7 normal blood samples 4 3 ACTB HMBS 2 HPRT1 TBP 1 UBC 0 A B C D E F G 15 fold difference between A and B if normalized by only one gene (ACTB or HMBS) geNorm expression stability parameter n pairwise variation V (between any 2 candidate reference genes) gene A gene B sample 1 a1 b1 log2(a1/b1) sample 2 a2 b2 log2(a2/b2) sample 3 a3 b3 log2(a3/b3) … … … … sample n an bn log2(an/bn) standard deviation = V n gene stability measure M: average pairwise variation V of a gene with all other tested candidate reference genes n iterative procedure of removing the worst reference gene followed by recalculation of M-values geNorm analysis is straightforward and helps you on 2 levels n ranking of candidate reference genes according to their stability n determination of how many genes are required for reliable normalization http://www.genorm.info Calculation of a sample specific normalization factor as the geometric mean of the reference genes n geometric mean of 3 reference gene expression levels geometric mean = (a x b x c) arithmetic mean = 1/3 a+b+c 3 n controls for outliers n compensates for differences in expression level between the reference genes Geometric averaging is robust (insensitive to outliers) ACTB HMBS HPRT1 TBP UBC NF The use of multiple reference genes for normalization gives statistically more significant results Kaplan-Meier cancer patient survival curve log rank statistics NF4 0.003 NF1 0.006 0.021 0.023 0.056 Hoebeeck et al., Int J Cancer, 2006 The use of multiple reference genes for normalization enables accurate measurement of small expression differences LEMD3 levels drop to 50% in patient with NMD patient / control n 3 independent experiments n 95% confidence intervals n Hellemans et al., Nature Genetics, 2004 geNorm is the de facto standard for reference gene validation and normalization n > 4,500 citations of the geNorm technology n > 15,000 geNorm software downloads in 100 countries Large and active geNorm discussion community > 1000 members, almost 2000 posts http://tech.groups.yahoo.com/group/genorm/ genormPLUS is a much improved version built into qbasePLUS classic geNorm improved geNorm Excel Windows qbasePLUS Win, Mac, Linux 1x 20x expert interpretation + report - + ranking best 2 genes - + handling missing data - + raw data (Cq) as input - + platform speed genormPLUS result interpretation is easy n expert report without need to understand formulas n time saver n higher confidence in the results other methods for selection stable reference genes reference Alternative approaches tooffind stably expressed genes n n n n n n n Global Pattern Recognition (Akilesh et al., Genome Research, 2003) BestKeeper (Pfaffl et al., Biotechnology Letters, 2004) Equivalence test (Haller et al., Analytical Biochemistry, 2004) ANOVA test (Brunner et al., BMC Plant Biology, 2004) Normfinder (Andersen et al., Cancer Research, 2004) Szabo et al., Genome Biology, 2004 Abruzzo et al., Biotechniques, 2005 present mathematical (linear mixed-effects) models to further analyze candidate reference genes log yij = µ + Ti + Gj + εij n Vandesompele, Kubista & Pfaffl Reference gene validation software for improved normalization book chapter in “Real-time PCR: an essential guide”, Horizon Bioscience, 2nd edition (2009) Misconceptions and tips for normalization n expression levels of reference gene and gene of interest do not need to be similar n no need to measure reference genes in the same run (plate) as genes of interest n geNorm pilot experiment can fit in a single 96-well plates n 10 representative samples x 8 candidate reference genes (no replicates, no controls, no standard curve) n results that can be used in the future if experimental conditions do not change n at least 2 reference genes are needed n quality control of the expression stability and normalization factor n more accurate results Large scale gene expression studies benefit from different normalization strategy n service applications at Biogazelle n 755 microRNAs (OpenArray) n 1718 long non-coding RNAs (SmartChip) n gene panels (96 or 384-well plates) A new normalization method: global mean normalization n hypothesis: when a large set of genes are measured, the average expression level reflects the input amount and could be used for normalization n microarray normalization (lowess, mean ratio, …) n RNA-sequencing read counts n the set of genes must be sufficiently large and unbiased n we test this hypothesis using genome-wide microRNA data from experiments in which Biogazelle quantified a large number of miRNAs in different studies n cancer biopsies & serum o neuroblastoma, T-ALL, EVI1 leukemia, retinoblastoma n pool of normal tissues, normal bone marrow set n induced sputum of smokers vs. non-smokers How to validate a new normalization method? n n n n geNorm ranking global mean vs. candidate reference genes reduction of experimental noise balancing of expression differences (up vs. down) identification of truly differentially expressed genes n original global mean (Mestdagh et al., 2009) n improved global mean (D’haene et al., 2012) n mean center the data > equal weight to each gene n allow PCR efficiency correction n improved global mean on common targets (D’haene et al., 2012) n improved global mean n average only genes that are expressed in all samples MicroRNA normalization strategies n small-RNA controls n frequently used normalization strategy n small nuclear RNAs, small nucleolar RNAs n 18 available from Applied Biosystems n global mean normalization n method applied for microarray data n universal: applicable for all large miRNA datasets n many datapoints needed (megaplex vs. multiplex) n miRNAs/small RNA controls that resemble the mean n minimal standard deviation when comparing miRNA expression with mean ( geNorm V value, standard deviation of log transformed ratios) n compatible with multiplex assays n need to determine mean expression level first Frequently used small RNA controls for miR normalization n How ‘stable’ is the global mean compared to controls? n geNorm analysis using controls and mean as input variables n exclusion of potentially co-regulated controls HY3 RNU19 RNU24 RNU38B RNU43 RNU44 RNU48 RNU49 RNU58A RNU58B RNU66 RNU6B U18 U47 U54 U75 Z30 RPL21 7q36 5q31.2 9q34 1p34.1-p32 22q13 1q25.1 6p21.32 17p11.2 18q21 18q21 1p22.1 10p13 15q22 1q25.1 8q12 1q25.1 17q12 13q12.2 T-ALL geNorm ranking 1,8 expression stability 1,6 1,4 1,2 1 0,8 0,6 0,4 0,2 0 geNorm ranking neuroblastoma leukemia EVI1 overexpression bone marrow pool normal tissues Cumulative variance plot demonstrates most pronounced removal of variation when using global mean (neuroblastoma) 120 100 80 not normalised stable controls 60 mean miRNAs 40 20 0 0 50 100 150 200 250 300 Removal of variation in other datasets T-ALL bone marrow leukemia EVI1 overexpression pool normal tissues Biological validation of normalization strategy n choice of normalization strategy influences differential miRNA expression n miR-17-92 expression in 2 subgroups of neuroblastoma 3,5 3 2,5 2 1,5 1 0,5 0 stable controls mean miRNAs Global mean trategy also works for microarray data n each sample is measured by RT-qPCR and microarray n global mean normalization n standardization per method n hierarchical clustering n samples cluster by sample (and NOT by method) Global mean strategy also works for mRNA data n 4 MAQC samples n 201 MAQC consensus genes are measured n geNorm analysis n 9 classic reference genes n 1 Alu-Sq assay n global mean of 201 mRNAs 0.7 expression stability 0.6 0.5 0.4 0.3 0.2 0.1 0 Global mean normalization method is preferred when many genes are simultaneously measured n novel and powerful normalization strategy n only useful when measuring a large and unbiased set of genes n e.g. microRNA gene expression profiling of 755 human miRs n e.g. measuring panels of pathway genes n maximal reduction of technical noise n improved identification of differentially expressed genes n Mestdagh et al., Genome Biology, 2009 (original global mean) n D’haene et al., Methods Mol Biol, 2012 (improved global mean) n integrated in qbasePLUS Novel strategy for normalization n need for something new n reference gene validation requires (extensive) experimental work n sometimes not possible (lack of sample material, funding, time or devotion) n there’s a need for an alternative n EAR normalization (Expressed Alu Repeat) using a repetitive sequence in the human transcriptome as a measure for the mRNA fraction EAR normalization - principle rationale: repeat sequences are present in the UTR of many genes, and the differential expression of a small number of genes won’t influence the overall repeat abundance in the transcriptome Alu repeat elements n by far the most abundant repeats in the human genome n 1 million copies (10% of the genome), 31 subfamilies (well conserved) n short interspersed elements (SINE) replicating via retrotransposition n ~280 bp long, followed by a variable poly-A tail n no known biological function n implicated in human disease (unequal recombination) n roughly 1500 human genes contain one or more Alu repeats geNorm analysis of a very heterogeneous experiment confirm Alu content as good normalizer 3rd EMBO qPCR course, Heidelberg, 2006 Good correlation between geometric mean of 4 stably expressed reference genes (NF) and Alu-Sq signal Alu repeat normalization results in more significant genes n 59 prognostic marker genes n WT-Ovation amplified cDNA from 30 primary neuroblastoma tumors belonging to 2 different risk groups (high risk vs. low risk) (Vermeulen et al., Lancet Oncology, 2009) n multiple reference gene normalization (geometric mean of 4 stably WSB1 ULK2 TYMS TNFRSF25 SNAPC1 SLC25A5 SLC6A8 SCG2 QPCT PTPRN2 PTPRH PTPRF PTN PRKCZ PRKACB PRDM2 PRAME PMP22 PLAT PLAGL1 PIK3R1 PDE4DIP PAICS ODC1 NTRK1 NRCAM NM23A NHLH2 MYCN MTSS1 MRPL3 MCM2 MAPT MAP7 MAP2K4 INPP1 IGSF4 HIVEP2 GNB1 FYN EPN2 EPHA5 EPB41L3 ELAVL4 ECEL1 DPYSL3 DDC CPSG3 CLSTN1 CDKN3 CDH5 CDCA5 CD44 CAMTA2 CAMTA1 BIRC5 ARHGEF7 AKR1C1 AHCY Alu NF NF Alu expressed reference genes) (NF) n expressed Alu repeat normalization n more significant genes (45 vs. 35) n lower p-values (41 out of 59 have lower p-value) Good correlation between arithmetic mean Cq of 4 stably expressed reference genes (X) and Alu repeat signal (Y) 18 16 14 12 ALUsq 20 22 n cancer cell lines treated with therapeutic compounds n Spearman r = 0.849, p = 2.2E-16 22 24 26 28 NF4 30 Normalization in practice n most powerful, flexible and user-friendly real-time PCR data-analysis software n based on Ghent University’s geNorm and qBase technology n state of the art normalization procedures o one or more classic reference genes o global mean normalization o expressed repeat normalization o no normalization n detection and correction of inter-run variation n dedicated error propagation n fully automated analysis; no manual interaction required http://www.qbaseplus.com qbasePLUS is the only 3rd party qPCR data-analysis software that is MIQE and RDML compliant n RDML universal file format compatible n peer-reviewed data-analysis algorithms n multiple reference gene normalization n global mean normalization n inter-run calibration n error propagation n assay information n data quality control n reference gene stability analysis n NTC analysis n PCR efficiency calculation n replicate variability assessment Conclusions n proper normalization has a major impact on your results n provides statistically more significant results n enables accurate assessment of small expression differences n gold standard for RT-qPCR gene expression analysis n (geNorm) evaluation of candidate reference genes n geometric mean of multiple stably expressed reference genes n global mean normalization and subsequent geNorm based selection of reference genes that resemble the mean is a valid option when measuring a large and unbiased set of genes (e.g. all miRNAs) n expressed Alu repeat elements emerges as an alternative normalization strategy (Witt, Pattyn, et al., in preparation) Weekly qPCR tips and tricks via Twitter https://twitter.com/#!/Biogazelle @Biogazelle Acknowledgments n Ghent University n Pieter Mestdagh n Filip Pattyn n Steve Lefever n Jan Hellemans n Stefaan Derveaux n Katleen De Preter n Sarah De Brouwer n Frank Speleman n Vladimir Benes, Nina Witt, Tania Nolan, Jim Huggett, …
© Copyright 2026 Paperzz