Gene expression
Statistics 246, Week 3

Thesis: the analysis of gene expression data is going to be big in 21st-century statistics.

Many different technologies, including
• High-density nylon membrane arrays
• Serial analysis of gene expression (SAGE)
• Short oligonucleotide arrays (Affymetrix)
• Long oligo arrays (Agilent)
• Fibre optic arrays (Illumina)
• cDNA arrays (Brown/Botstein)*

Total microarray articles indexed in Medline
[Figure: number of papers per year, 1995 to 2001 (2001 projected); y-axis from 0 to 600.]

Common themes
• Parallel approach to the collection of very large amounts of data (by biological standards)
• Sophisticated instrumentation, which requires some understanding
• Systematic features of the data are at least as important as the random ones
• Often more like an industrial process than single-investigator lab research
• Integration of many data types: clinical, genetic, molecular… databases

Biological background

Transcription
DNA:    G T A A T C C T C
        | | | | | | | | |
        C A T T A G G A G
RNA polymerase → mRNA:  G U A A U C C

Idea: measure the amount of mRNA to see which genes are being expressed in (used by) the cell. Measuring protein might be better, but is currently harder.

Reverse transcription
Clone cDNA strands, complementary to the mRNA.
mRNA:   G U A A U C C U C
        ↓ reverse transcriptase
cDNA:   C A T T A G G A G

cDNA microarray experiments
mRNA levels compared in many different contexts:
• Different tissues, same organism (brain v. liver)
• Same tissue, same organism (treatment v. control, tumor v. non-tumor)
• Same tissue, different organisms (wild type v. knockout, transgenic, or mutant)
• Time course experiments (effect of treatment, development)
• Other special designs (e.g. to detect spatial patterns)

cDNA microarrays
[Figure: cDNA clones]

cDNA microarrays
Compare gene expression in two samples of cells.
• PRINT: cDNA from one gene on each spot.
• SAMPLES: cDNA labelled red/green, e.g.
  treatment / control, normal / tumor tissue.
• HYBRIDIZE: add equal amounts of the labelled cDNA samples to the microarray.
• SCAN: a laser excites the dyes and a detector records each channel.

[Figure: the analysis pipeline]
Biological question (differentially expressed genes, sample class prediction, etc.)
→ Experimental design
→ Microarray experiment
→ 16-bit TIFF files
→ Image analysis: (Rfg, Rbg), (Gfg, Gbg)
→ Normalization: R, G
→ Estimation, testing, clustering, discrimination
→ Biological verification and interpretation

Some statistical questions
• Image analysis: addressing, segmenting, quantifying
• Normalisation: within and between slides
• Quality: of images, of spots, of (log) ratios
• Which genes are (relatively) up- or down-regulated?
• Assigning p-values to tests and confidence to results

Some statistical questions, ctd.
• Planning of experiments: design, sample size
• Discrimination and allocation of samples
• Clustering, classification: of samples, of genes
• Selection of genes relevant to any given analysis
• Analysis of time course, factorial and other special experiments
… and much more.

Some bioinformatic questions
• Connecting spots to databases, e.g. to sequence, structure, and pathway databases
• Discovering short sequences regulating sets of genes: direct and inverse methods
• Relating expression profiles to structure and function, e.g. protein localisation
• Identifying novel biochemical or signalling pathways
… and much more.

[Figure: part of the image of one channel, false-coloured on a scale running from white (very high) and red (high) through yellow and green (medium) to blue (low) and black.]

Does one size fit all?

Segmentation: limitation of the fixed circle method
[Figure: spot boundaries from SRG (seeded region growing) compared with a fixed circle]
Inside the boundary is spot (foreground); outside is not.

Some local backgrounds
[Figure: single channel, grey scale]
We use something different again: a smaller, less variable value.
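Once image analysis has produced the foreground and background values (Rfg, Rbg), (Gfg, Gbg) for a spot, turning them into expression measures is simple arithmetic. The sketch below is a minimal illustration, not code from any particular image-analysis package: it subtracts background from each channel, then forms the log ratio M = log2(R/G) and the average log intensity A = (log2 R + log2 G)/2 used in intensity plots. The function name `spot_m_a` and the choice to return `None` when a corrected intensity is not positive are assumptions of this sketch.

```python
import math


def spot_m_a(r_fg, r_bg, g_fg, g_bg):
    """Return (M, A) for one spot, or None if a corrected intensity is not positive.

    Hypothetical helper: M = log2(R/G) and A = (log2 R + log2 G) / 2, where
    R = Rfg - Rbg and G = Gfg - Gbg are the background-subtracted intensities.
    """
    red = r_fg - r_bg      # red intensity
    green = g_fg - g_bg    # green intensity
    if red <= 0 or green <= 0:
        # log2 is undefined here; flagging the spot (rather than, say,
        # flooring the intensity) is an assumption of this sketch.
        return None
    m = math.log2(red / green)
    a = (math.log2(red) + math.log2(green)) / 2
    return m, a


if __name__ == "__main__":
    # Made-up spot: after background subtraction, red is twice green, so M = 1.
    print(spot_m_a(r_fg=1200, r_bg=200, g_fg=700, g_bg=200))
```

Working on the log scale makes a two-fold increase (M = 1) and a two-fold decrease (M = −1) symmetric about zero, which is why the log ratio rather than the raw ratio R/G is the quantity carried forward.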
Quantification of expression
For each spot on the slide we calculate
Red intensity $R = R_{fg} - R_{bg}$ and Green intensity $G = G_{fg} - G_{bg}$ (fg = foreground, bg = background),
and combine them in the log (base 2) ratio $\log_2(R/G)$.

Gene Expression Data
On p genes for n slides: p is O(10,000); n is O(10–100), but growing.

Gene   slide 1   slide 2   slide 3   slide 4   slide 5   …
1        0.46      0.30      0.80      1.51      0.90    …
2       -0.10      0.49      0.24      0.06      0.46    …
3        0.15      0.74      0.04      0.10      0.20    …
4       -0.45     -1.03     -0.79     -0.56     -0.32    …
5       -0.06      1.06      1.35      1.09     -1.09    …
…

Each entry is a log ratio; for example, the expression level of gene 5 in slide 4 (1.09) is $\log_2(R/G)$ for that gene's spot on slide 4. These values are conventionally displayed on a red (> 0), yellow (0), green (< 0) scale.

The red/green ratios can be spatially biased
[Figure: slide image with the top 2.5% of ratios shown red and the bottom 2.5% green]

The red/green ratios can be intensity-biased
[Figure: $M = \log_2(R/G) = \log_2 R - \log_2 G$ plotted against $A = (\log_2 R + \log_2 G)/2$]
Values should scatter about zero.

Normalization: how we "fix" the previous problem
The curved line becomes the new zero line: each spot's M is replaced by $M - c(A)$, where $c(A)$ is the fitted lowess curve evaluated at that spot's A.
[Figure: M vs. A before normalization, and normalised M vs. A after normalization (A from 6 to 16, M from −4 to 2). Orange: Schadt-Wong rank invariant set; yellow: GAPDH, tubulin; light blue: MSP pool / titration; red line: lowess smooth.]

From a study of the mouse olfactory system
[Figure: main (auxiliary) olfactory bulb, vomeronasal organ, olfactory epithelium. From Buck (2000).]

Axonal connectivity between the nose and the mouse olfactory bulb
[Figure labels: >2M, ~1,800 types; neocortex]
Two principles: "zone-to-zone projection" and "glomerular convergence".

Of interest: the hardwiring of the vertebrate olfactory system
• Expression of a specific odorant receptor gene by an olfactory neuron.
• Targeting and convergence of like axons to specific glomeruli in the olfactory bulb.

The biological question in this case
Are there genes with spatially restricted expression patterns within the olfactory bulb?
Layout of the cDNA Microarrays
• Sequence-verified mouse cDNAs
• 19,200 spots in two print groups (pg1, pg2) of 9,600 each
  – 4 × 4 grids, each with 25 × 24 spots (16 × 600 = 9,600 per print group)
  – Controls on the first 2 rows of each grid

Design: How We Sliced Up the Bulb
[Figure: the bulb divided along three axes: anterior/posterior (A, P), dorsal/ventral (D, V), lateral/medial (L, M)]

Design: Two Ways to Do the Comparisons
Goal: a 3-D representation of gene expression.
• Compare all samples to a common reference sample (e.g., whole bulb)
• Multiple direct comparisons between different samples (no common reference)
[Figure: both designs drawn as graphs on the samples A, P, D, V, L, M; the first also includes the reference R]

An Important Aspect of Our Design
Different ways of estimating the same contrast, e.g. A compared to P:
Direct = $A - P$
Indirect = $(A - M) + (M - P)$, or $(A - D) + (D - P)$, or $-(L - A) - (P - L)$
How do we combine these?
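The slide leaves the question open. One standard way to combine all of these estimates, assumed here rather than taken from the slides, is to treat each slide's log ratio M as a noisy measurement of the difference between the two samples hybridized on it, and to fit all slides jointly by ordinary least squares. Only differences are estimable, so one sample (here P) is fixed at zero as a baseline. The function names, the (red, green) tuple encoding, and the use of `numpy.linalg.lstsq` are illustrative choices, and the slide list is just the seven comparisons named in the direct and indirect paths above, not the full experimental design.

```python
import numpy as np


def design_matrix(comparisons, samples, baseline):
    """Hypothetical design matrix: one row per slide, one column per non-baseline sample.

    A slide hybridizing sample `red` against sample `green` measures
    effect[red] - effect[green], so its row holds +1 and -1 in those columns.
    """
    params = [s for s in samples if s != baseline]
    X = np.zeros((len(comparisons), len(params)))
    for row, (red, green) in enumerate(comparisons):
        if red != baseline:
            X[row, params.index(red)] += 1.0
        if green != baseline:
            X[row, params.index(green)] -= 1.0
    return X, params


def estimate_effects(comparisons, m_values, samples, baseline):
    """Ordinary least-squares estimate of each sample's effect relative to the baseline."""
    X, params = design_matrix(comparisons, samples, baseline)
    y = np.asarray(m_values, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    effects = dict(zip(params, beta))
    effects[baseline] = 0.0
    return effects


# Slides from the direct and indirect paths for A vs. P, as (red, green) pairs.
COMPARISONS = [("A", "P"), ("A", "M"), ("M", "P"), ("A", "D"),
               ("D", "P"), ("L", "A"), ("P", "L")]
SAMPLES = ["A", "P", "M", "D", "L"]

if __name__ == "__main__":
    # Hypothetical noise-free effects, used only to check the arithmetic.
    truth = {"A": 1.0, "P": 0.0, "M": 0.4, "D": -0.3, "L": 0.2}
    m = [truth[r] - truth[g] for r, g in COMPARISONS]
    est = estimate_effects(COMPARISONS, m, SAMPLES, baseline="P")
    print("A - P estimate:", est["A"] - est["P"])
```

For exactly these seven slides, the least-squares estimate of A − P works out to 0.4 × (the direct slide) + 0.2 × (each of the three two-slide indirect paths). If slide errors are independent with equal variance σ², this combined estimate has variance 0.4σ², compared with σ² for the direct slide alone and 2σ² for any single indirect path, which is one way to see why using every path pays off.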