Biological Computing

Multiple Sequence Alignment
Department of Computer Science and Information Engineering, National Chi Nan University
黃光璿
2004/05/31
What is a multiple alignment?
An alignment of ten I-set immunoglobulin superfamily domains
Motivation
A multiple alignment may suggest
  a common structure of the protein products;
  a common function;
  a common evolutionary source.
Issues
How to define a meaningful scoring function for an alignment?
  evolutionarily correct alignment: more difficult!
  structural alignment
How to find the best alignment?
  by algorithms
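
One widely used scoring function is the sum-of-pairs (SP) score that appears later in these slides. The sketch below is illustrative only: the function names and the match, mismatch, and gap values are assumptions chosen for the example, not values taken from the lecture.

```python
from itertools import combinations

def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Score one pair of aligned symbols; '-' denotes a gap.

    The numeric values are illustrative assumptions, not from the slides.
    """
    if a == "-" and b == "-":
        return 0  # gap-gap pairs contribute nothing (a common convention)
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch

def sum_of_pairs(alignment, **params):
    """Sum pair_score over every column and every pair of rows."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all rows of an alignment must have equal length")
    total = 0
    for column in zip(*alignment):
        for a, b in combinations(column, 2):
            total += pair_score(a, b, **params)
    return total

# Three aligned rows of equal width; '-' marks a gap.
example = ["ACGT", "A-GT", "ACGA"]
print(sum_of_pairs(example))
```

A real protein scoring scheme would replace the match/mismatch constants with a substitution matrix; the column-by-column, pair-by-pair structure stays the same.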
Three types of alignment problems
DNA
protein
  joined by disulfide bonds
RNA
  more difficult due to long-range correlations
We focus on alignment problems for DNA and protein sequences.

To prove that a computational problem is NP-hard, we need
  to reduce a known NP-complete (or NP-hard) problem to this problem in polynomial time.
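
The direction of the reduction is easy to get backwards, so it may help to state it in standard notation. The symbols A (the known hard problem), B (the problem under study), and f (the reduction) are introduced here for illustration only.

```latex
A \le_p B
\;\iff\;
\exists f \text{ computable in polynomial time such that }
\forall x:\; x \in A \Leftrightarrow f(x) \in B .
\qquad
\text{If } A \text{ is NP-hard and } A \le_p B, \text{ then } B \text{ is NP-hard.}
```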

When a computational problem is NP-hard, we deal with it by
  heuristics: convince other people by experiments
  approximation: how to analyze the performance?
  randomization: how to design a reasonable algorithm?
Branch & bound heuristic for the DP algorithm for the sum-of-pairs score
  Carrillo & Lipman (1988)
The idea was implemented in the well-known program MSA.
  Lipman, Altschul, Kececioglu, 1989
MSA can align 6 sequences of length ~200 in reasonable time.
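
The bounding idea can be sketched as follows, assuming linear gap costs and a zero score for gap-gap pairs. Under those assumptions, the projection of any multiple alignment onto two of its sequences scores no more than the optimal pairwise alignment of those two. If a lower bound on the best SP score is known (for example from a quick heuristic alignment), each pair's projection must then reach a minimum score, and DP cells that cannot reach it can be skipped. The function names and scoring values below are hypothetical and are not taken from the MSA program.

```python
from itertools import combinations

def nw_score(x, y, match=1, mismatch=-1, gap=-2):
    """Optimal global pairwise alignment score (Needleman-Wunsch, linear gaps).

    Scoring values are illustrative assumptions.
    """
    prev = [j * gap for j in range(len(y) + 1)]
    for i in range(1, len(x) + 1):
        curr = [i * gap]
        for j in range(1, len(y) + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align x[i-1] with y[j-1]
                            prev[j] + gap,     # gap in y
                            curr[j - 1] + gap))  # gap in x
        prev = curr
    return prev[-1]

def pairwise_thresholds(seqs, lower_bound, **params):
    """Return (upper bound on SP score, per-pair minimum projection scores).

    Any multiple alignment with SP score >= lower_bound must project onto
    each pair (k, l) with a score of at least the returned threshold.
    """
    best = {(k, l): nw_score(seqs[k], seqs[l], **params)
            for k, l in combinations(range(len(seqs)), 2)}
    upper = sum(best.values())
    thresholds = {pair: lower_bound - (upper - b) for pair, b in best.items()}
    return upper, thresholds

upper, thresholds = pairwise_thresholds(["ACGT", "AGT", "ACGA"], lower_bound=2)
print(upper, thresholds)
```

Without such pruning, the full DP table for N sequences of length L has (L+1)^N cells, about 6.6 × 10^13 for 6 sequences of length 200, which is why restricting the search region matters.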
References and figure sources
1. R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, 1998.