Estimating gene networks by Bayesian networks from microarrays

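2003 Japanese Joint Statistical Meeting
Organized session: Statistical issues in DNA array data analysis
An overview of DNA array data
Seiya Imoto¹, Tomoyuki Higuchi²
¹Human Genome Center, Institute of Medical Science, University of Tokyo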
[email protected]
²The Institute of Statistical Mathematics
[email protected]
(C) Copyright 2003 Seiya Imoto, Human Genome Center, University of Tokyo
Joint Statistical Meetings 2003 in San Francisco
August 4
  8:30-10:20   Analysis of gene expression data (p.96)
  10:30-12:20  Bayesian and mixture method in genomics data (p.125)
  10:30-12:20  Data analysis of microarray data (p.133)
August 5
  10:30-12:20  Classification of gene expression data (p.246)
  14:00-15:50  Microarray data analysis (p.276)
August 6
  8:30-10:20   Statistical issues in image analysis, microarrays, and machine learning (p.305)
  10:30-12:20  Bayesian methods for microarray data analysis (p.342)
  10:30-12:20  Statistics and genomics (p.345)
  10:30-12:20  Analysis of genetic data II (p.370)
August 7
  8:30-10:20   Statistics and microarrays (p.422)
  8:30-10:20   Normalization of microarray data (p.445)
  10:30-12:20  Multivariate approaches to gene expression data (p.465)
Gene expression data
- cDNA microarray data
- Oligonucleotide arrays (Affymetrix, GeneChip®)
- Macroarrays (radioisotope)
Red means Cell A < Cell B
Green means Cell A > Cell B
Yellow means Cell A = Cell B
The transfer of information from DNA to protein
Gene (DNA), e.g. AGGTTCAGCGC
  ↓ Transcription
mRNA
  Splicing: a process that removes introns and joins exons in RNA.
    exon: coding region
    intron: noncoding region
  ↓ Translation
Protein
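As a toy illustration of transcription and splicing, the sketch below treats the example sequence AGGTTCAGCGC as the coding (sense) strand. Its mRNA therefore has the same sequence with U in place of T. Splicing is modeled as keeping a set of exon intervals. The exon coordinates are hypothetical and used only for illustration, since real splice sites are determined by sequence signals rather than given as coordinates.

```python
def transcribe(coding_strand: str) -> str:
    """mRNA sequence for a coding (sense) DNA strand: identical except T -> U."""
    return coding_strand.upper().replace("T", "U")


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Remove introns by concatenating exon intervals.

    `exons` holds hypothetical 0-based, half-open (start, end) coordinates;
    everything not covered by an exon is treated as intron and dropped.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


pre_mrna = transcribe("AGGTTCAGCGC")
mature = splice(pre_mrna, [(0, 3), (6, 11)])  # hypothetical exons: bases 0-2 and 6-10
print(pre_mrna)  # AGGUUCAGCGC
print(mature)    # AGGAGCGC
```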
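cDNA microarray
1. Extract mRNA (covering all genes) from the reference cell and from the experimental cell.
2. Convert it into colored (fluorescently labeled) cDNA.
3. Hybridize the labeled cDNA to the chip.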
Example: gene X is more highly expressed in Cell B than in Cell A.
Labeled cDNA from gene X (from Cell A and Cell B) is hybridized to the chip, where it binds to the spot of gene X, which carries the sequence complementary to the colored cDNA. This spot appears red.
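The color rule (red: Cell A < Cell B, green: Cell A > Cell B, yellow: Cell A = Cell B) can be written as a comparison of the two labeled samples. The sketch below is a simplification, not the scanner's actual rendering, which overlays two fluorescence images. It computes log2(Cell B / Cell A) for one spot and uses an arbitrary tolerance `tol` (hypothetical, in log2 units) to decide when the two channels count as equal.

```python
import math


def spot_color(cell_a: float, cell_b: float, tol: float = 0.5) -> str:
    """Classify a two-color spot from its Cell A and Cell B intensities.

    red    : Cell A < Cell B   (log2(B/A) >  tol)
    green  : Cell A > Cell B   (log2(B/A) < -tol)
    yellow : Cell A = Cell B   (|log2(B/A)| <= tol)
    `tol` is an illustrative threshold, not a standard value.
    """
    if cell_a <= 0 or cell_b <= 0:
        raise ValueError("intensities must be positive")
    m = math.log2(cell_b / cell_a)
    if m > tol:
        return "red"
    if m < -tol:
        return "green"
    return "yellow"


print(spot_color(100.0, 400.0))  # red: gene X is higher in Cell B
```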
cDNA microarray
This machine can print 48 microarrays simultaneously (in one day).
The cDNAs to be spotted are picked up on the tips of the needles.
A 384-well plate holds 384 cDNAs.
Yeast has over 6,000 genes, so 16 plates are needed (6,000 / 384 ≈ 15.6), i.e. the 384-well plate is changed 16 times.
The head dips 32 needles at once, printing 32 spots at a time.
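The plate count follows from rounding up the number of genes divided by the number of wells. This is a minimal sketch. It also assumes something the slide does not state: that each dip of the 32-needle head picks up from 32 different wells, so one plate needs 384 / 32 = 12 dips.

```python
import math


def plates_needed(n_genes: int, wells_per_plate: int = 384) -> int:
    """Number of plates needed to hold one cDNA per gene (rounded up)."""
    return math.ceil(n_genes / wells_per_plate)


def dips_per_plate(wells_per_plate: int = 384, needles: int = 32) -> int:
    """Dips needed to spot every well of one plate.

    Assumes (hypothetically) that each needle takes a distinct well per dip.
    """
    return math.ceil(wells_per_plate / needles)


n_genes = 6000  # yeast: "over 6,000 genes"; 6,000 used as a round figure
print(plates_needed(n_genes))  # 16
print(dips_per_plate())        # 12
```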
Example scanner output (spots A_1_1 to A_1_26)
Key: Red = red intensity (labeled Data) and Green = green intensity (labeled Ctrl), both reported as "D x A - PSL"; Red bg and Green bg = background (Bkgd); Red corr. and Green corr. = background-corrected intensities (sDxA); R/G = Ratio (sDxA): Data / Ctrl = (R. b.g.-c)/(G. b.g.-c). Gene functions are reproduced as given (some are truncated; none is listed for YAL014C).

Spot     Red        Red bg     Green      Green bg  Red corr.  Green corr.  R/G       Name      Gene function
A_1_1    50953.13   1779.913   59358.75   512.92    49173.22   58845.83     0.835628  YAL003W   translation elongation factor eef1beta
A_1_2    2522.345   1779.913   1209.19    512.92    742.4323   696.271      1.066298  YAR053W   hypothetical protein
A_1_3    3100.152   1779.913   1948.2     512.92    1320.239   1435.28      0.919848  YBL078C   essential for autophagy
A_1_4    6670.604   1779.913   4940.806   512.92    4890.691   4427.886     1.104521  YAL008W   protein of unknown function
A_1_5    2916.086   1779.913   1485.59    512.92    1136.173   972.671      1.168096  YAR062W   putative pseudogene
A_1_6    42304.13   1779.913   32642.03   512.92    40524.22   32129.11     1.261293  YBL087C   60s large subunit ribosomal protein l23.e
A_1_7    8540.246   1779.913   6919.441   512.92    6760.333   6406.521     1.055227  YAL014C
A_1_8    4314.47    1779.913   2698.301   512.92    2534.557   2185.382     1.159778  YAR068W   strong similarity to hypothetical protein yhr214w
A_1_9    7379.286   1779.913   7167.958   512.92    5599.373   6655.038     0.841374  YBL100C   questionable orf
A_1_10   6953.799   1779.913   5470.062   512.92    5173.886   4957.142     1.043724  YAL025C   nuclear viral propagation protein
A_1_11   33746.9    1779.913   27879.49   512.92    31966.99   27366.57     1.168103  YBL002W   histone h2b.2
A_1_12   4385.568   1779.913   2589.613   512.92    2605.655   2076.693     1.254713  YBL107C   hypothetical protein
A_1_13   8840.475   1779.913   6196.245   512.92    7060.562   5683.326     1.242329  YDR044W   coproporphyrinogen iii oxidase
A_1_14   36129.62   1779.913   34737.1    512.92    34349.7    34224.18     1.003668  YDR134C   strong similarity to flo1p, flo5p, flo9p and ylr110c
A_1_15   27128.53   1779.913   34035.35   512.92    25348.62   33522.43     0.756169  YDR233C   similarity to hypothetical protein ydl204w
A_1_16   2988.042   1779.913   1638.381   512.92    1208.129   1125.461     1.073453  YDR048C   questionable orf
A_1_17   4955.141   1779.913   3873.718   512.92    3175.228   3360.799     0.944784  YDR139C   ubiquitin-like protein
A_1_18   3502.406   1779.913   2433.625   512.92    1722.493   1920.706     0.896802  YDR252W   strong similarity to egd1p and to human btf3 pro
A_1_19   3011.855   1779.913   1800.736   512.92    1231.942   1287.816     0.956613  YDR053W   questionable orf
A_1_20   2636.549   1779.913   1296.689   512.92    856.6356   783.77       1.092968  YDR149C   questionable orf
A_1_21   4968.026   1779.913   3453.24    512.92    3188.113   2940.32      1.084274  YDR260C   hypothetical protein
A_1_22   9307.246   1779.913   10731.55   512.92    7527.333   10218.63     0.736629  YDR056C   hypothetical protein
A_1_23   8808.398   1779.913   6191.309   512.92    7028.485   5678.39      1.23776   YDR152W   weak similarity to c.elegans hypothetical protein
A_1_24   4420.744   1779.913   3589.998   512.92    2640.831   3077.078     0.858227  YDR269C   questionable orf
A_1_25   20856.2    1779.913   27568.34   512.92    19076.29   27055.42     0.705082  YGL189C   40s small subunit ribosomal protein s26e.c7
A_1_26   3150.716   1779.913   1956.182   512.92    1370.803   1443.262     0.949795  YGL261C   strong similarity to members of the srp1/tip1 fam
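The derived columns of the scanner output come from two subtractions and one division. Each channel has its background subtracted, and the ratio is red-corrected over green-corrected. The sketch below recomputes them for the first three spots (A_1_1 to A_1_3). It should reproduce the listed corrected intensities and ratios up to rounding of the displayed raw values. The dictionary field names are hypothetical.

```python
# Raw values copied from spots A_1_1 to A_1_3 of the scanner output.
# Field names are hypothetical; "red" is the Data channel, "green" the Ctrl channel.
spots = {
    "YAL003W": {"red": 50953.13, "red_bg": 1779.913, "green": 59358.75, "green_bg": 512.92},
    "YAR053W": {"red": 2522.345, "red_bg": 1779.913, "green": 1209.19, "green_bg": 512.92},
    "YBL078C": {"red": 3100.152, "red_bg": 1779.913, "green": 1948.2, "green_bg": 512.92},
}


def corrected_ratio(spot: dict[str, float]) -> tuple[float, float, float]:
    """Background-subtract each channel, then return (red_c, green_c, red_c / green_c)."""
    red_c = spot["red"] - spot["red_bg"]
    green_c = spot["green"] - spot["green_bg"]
    return red_c, green_c, red_c / green_c


for name, spot in spots.items():
    red_c, green_c, ratio = corrected_ratio(spot)
    print(f"{name}: red {red_c:.2f}, green {green_c:.2f}, R/G {ratio:.6f}")
```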
Data
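1. $\{Cy3_{ij}, Cy5_{ij}\}$: expression data of the $j$-th gene observed on the $i$-th array.
2. $\{Cy3^{B}_{ij}, Cy5^{B}_{ij}\}$: intensities corrected for the background intensity.
3. $\{Cy3^{BN}_{ij}, Cy5^{BN}_{ij}\}$: normalized intensities.
4. $x_{ij} = \log_2 \left( Cy5^{BN}_{ij} / Cy3^{BN}_{ij} \right)$: log transformation.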
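Steps 1 to 4 can be chained for one array as shown below. This is a sketch under stated assumptions. The normalization in step 3 is passed in as a function (identity by default, since the method is chosen separately). Spots whose background-corrected intensity is not positive are returned as NaN, a convention assumed here because the log ratio is undefined for them. Cy3 is taken as the green channel and Cy5 as the red channel.

```python
import math
from typing import Callable, Sequence

Normalizer = Callable[[list[float], list[float]], tuple[list[float], list[float]]]


def identity(cy3: list[float], cy5: list[float]) -> tuple[list[float], list[float]]:
    """Placeholder normalization that leaves both channels unchanged."""
    return cy3, cy5


def log_ratios(
    cy3: Sequence[float],
    cy5: Sequence[float],
    cy3_bg: Sequence[float],
    cy5_bg: Sequence[float],
    normalize: Normalizer = identity,
) -> list[float]:
    """x_j = log2(Cy5^BN_j / Cy3^BN_j) for every gene j on one array."""
    # Step 2: subtract the background intensity from each channel.
    cy3_b = [s - b for s, b in zip(cy3, cy3_bg)]
    cy5_b = [s - b for s, b in zip(cy5, cy5_bg)]
    # Step 3: normalization (supplied by the caller).
    cy3_bn, cy5_bn = normalize(cy3_b, cy5_b)
    # Step 4: log transformation of the ratio; undefined ratios become NaN.
    return [
        math.log2(r / g) if g > 0 and r > 0 else math.nan
        for g, r in zip(cy3_bn, cy5_bn)
    ]


x = log_ratios(cy3=[1209.19, 100.0], cy5=[2522.345, 50.0],
               cy3_bg=[512.92, 200.0], cy5_bg=[1779.913, 10.0])
print(x)  # the second spot is NaN because its Cy3 signal is below background
```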
Normalization 1 (global normalization)
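The slide does not specify the global method. One common choice, assumed here, subtracts a single constant from every log ratio on the array: the median of all log ratios, so that the median becomes 0. On the ratio scale, this is the same as rescaling one channel by a single factor for the whole array.

```python
import math
import statistics


def global_median_normalize(x: list[float]) -> list[float]:
    """Shift all log ratios on one array by their median (NaNs ignored, kept as NaN)."""
    finite = [v for v in x if not math.isnan(v)]
    centre = statistics.median(finite)
    return [v - centre for v in x]


print(global_median_normalize([0.5, 1.0, float("nan"), 2.0]))
```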
Normalization 2 (local normalization)
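Only the title of the local normalization slide survives. One widely used local scheme, assumed here rather than taken from the slide, is print-tip normalization. Median centering (subtracting the median log ratio) is applied separately within each print-tip group, meaning the spots printed by the same needle, to remove needle-to-needle differences. The group labels are hypothetical inputs.

```python
import math
import statistics
from collections import defaultdict
from typing import Hashable, Sequence


def local_median_normalize(x: Sequence[float], groups: Sequence[Hashable]) -> list[float]:
    """Median-center log ratios separately within each group (e.g. print-tip group)."""
    by_group: dict[Hashable, list[float]] = defaultdict(list)
    for v, g in zip(x, groups):
        if not math.isnan(v):
            by_group[g].append(v)
    centre = {g: statistics.median(vs) for g, vs in by_group.items()}
    # Groups with no finite values get no shift (their entries are all NaN anyway).
    return [v - centre.get(g, 0.0) for v, g in zip(x, groups)]


print(local_median_normalize([1.0, 3.0, 10.0, 12.0], [1, 1, 2, 2]))
```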
Available microarray data 1
- Stanford (SMD database)
  - Human (Homo sapiens)
  - Baker's yeast (Saccharomyces cerevisiae)
  - Nematode (Caenorhabditis elegans)
  - Abstracts of the papers
  - Descriptions of the data
http://genome-www5.stanford.edu/MicroArray/SMD/
Available microarray data 2
- KEGG database
  - Cyanobacterium (Synechocystis sp. PCC6803)
  - Hay bacillus (Bacillus subtilis)
  - Escherichia coli (K-12 W3110)
  - Abstracts of the papers
http://www.genome.ad.jp/
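Available microarray data 3
- Golub et al. (1999). Science.
  Classification of the blood cancers acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL)
  38 patients (training data), 34 patients (test data)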
http://contest.genome.ad.jp/problem2.html
Lecture notes on microarray data analysis
Speed, T. (ed.) (2002). Statistical Analysis of Gene Expression Microarray Data. Chapman & Hall/CRC.
Draghici, S. (2003). Data Analysis Tools for DNA Microarrays. Chapman & Hall/CRC.
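Other resources
- KAKENHI (Grant-in-Aid for Scientific Research) symposium "Mathematical Foundations of Biostatistics" (December 2002, Graduate School of Mathematical Sciences, University of Tokyo)
  Tutorial: An introduction to gene expression data analysis.
  Tetsutaro Hamano, Yoichi Ito, Seiya Imoto
  http://www.ms.u-tokyo.ac.jp/~nakahiro/sympo14/tu1
- Biometric Society of Japan, 2003 Symposium, special session
  "Development of statistical methodology for microarray data analysis"
  Seiya Imoto, Megu Ohtaki
  http://bonsai.ims.u-tokyo.ac.jp/~imoto/imoto_biometrics2003.pdf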