Sequence alignments

Genetic sequences change over time
Over time: LRGGD → deletion → LRGD → mutation → LRCD → mutation → ARCD
Relationship between original and final sequence:
LRGGD
AR-CD
or
LRGGD
ARC-D
In practice: we only know sequences from
extant organisms
[Tree: an ancestor with two extant descendants]
human: LRGDDC
mouse: LGDCC
We need to align these sequences to
compare them
mouse: LGDCC
human: LRGDDC

LRGDDC
L-GDCC

LRGDDC-
L-GD-CC

LRGDDC
-LGDCC

Which alignment is correct?
We need to score the alignment
Example:
• match = +1
• mismatch = -1
• gap = 0

LRGDDC
L-GDCC
score = 1 + 0 + 1 + 1 - 1 + 1 = 3

LRGDDC-
L-GD-CC
score = 1 + 0 + 1 + 1 + 0 + 1 + 0 = 4

LRGDDC
-LGDCC
score = 0 - 1 + 1 + 1 - 1 + 1 = 1
We need to score the alignment
Example:
• match = +1
• mismatch = -1
• gap = -2

LRGDDC
L-GDCC
score = 1 - 2 + 1 + 1 - 1 + 1 = 1

LRGDDC-
L-GD-CC
score = 1 - 2 + 1 + 1 - 2 + 1 - 2 = -2

LRGDDC
-LGDCC
score = -2 - 1 + 1 + 1 - 1 + 1 = -1
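
The column-by-column scoring above can be sketched in a few lines of Python. The function name `score_alignment` and its parameter names are illustrative choices, not something defined in these slides.

```python
def score_alignment(top, bottom, match=1, mismatch=-1, gap=0):
    """Sum the per-column scores of a pairwise alignment.

    top and bottom are the two aligned rows, with '-' marking a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            total += gap        # one of the two rows has a gap here
        elif x == y:
            total += match      # identical residues
        else:
            total += mismatch   # different residues
    return total


# The three candidate human/mouse alignments from the slides.
alignments = [
    ("LRGDDC", "L-GDCC"),
    ("LRGDDC-", "L-GD-CC"),
    ("LRGDDC", "-LGDCC"),
]

for gap in (0, -2):
    scores = [score_alignment(t, b, gap=gap) for t, b in alignments]
    print("gap =", gap, "scores =", scores)
# gap = 0 scores = [3, 4, 1]
# gap = -2 scores = [1, -2, -1]
```

Note how the ranking changes with the gap penalty: with gap = 0 the alignment with three gaps scores best, while with gap = -2 it scores worst.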
We often score by amino-acid similarity
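BLOSUM62 matrix
$$\text{score}(i, j) = \log \frac{p_{ij}}{p_i \, p_j}$$
where p_ij is the frequency with which amino acids i and j are aligned to each other, and p_i and p_j are their individual frequencies.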
http://commons.wikimedia.org/wiki/File:BLOSUM62.gif
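
A minimal sketch of the log-odds idea behind such a matrix. The frequencies below are made-up numbers chosen for easy arithmetic, not values from BLOSUM62, and the base-2 logarithm is an assumption (the slide does not state a base). Published BLOSUM matrices additionally scale and round the values to integers.

```python
import math


def log_odds_score(p_ij, p_i, p_j):
    """Log-odds score: observed pair frequency versus the frequency
    expected if residues i and j were paired purely by chance."""
    return math.log2(p_ij / (p_i * p_j))


# Hypothetical frequencies, for illustration only.
p_i, p_j = 0.1, 0.1                 # expected by chance: 0.1 * 0.1 = 0.01

more_often = log_odds_score(0.02, p_i, p_j)    # pair seen twice as often as chance
less_often = log_odds_score(0.005, p_i, p_j)   # pair seen half as often as chance

print(round(more_often, 6), round(less_often, 6))
```

Pairs that are aligned more often than chance predicts get a positive score; pairs that are aligned less often get a negative score.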
Gaps in alignments are called “indels”
LRGDDC
L-GDCC
indel
Can you guess why?
How do we find the best alignment given
a scoring system?
Global alignment: Needleman-Wunsch algorithm
Example: align GCAT and GAT
Scoring: match = 1, mismatch = -1, gap = -1
       -    G    A    T
  -
  G
  C
  A
  T

       -    G    A    T
  -    0
  G
  C
  A
  T
Alignment:
-
-

       -    G    A    T
  -    0
  G   -1
  C
  A
  T
Alignment:
-G
--

       -    G    A    T
  -    0
  G   -1
  C   -2
  A
  T
Alignment:
-GC
---

       -    G    A    T
  -    0
  G   -1
  C   -2
  A   -3
  T   -4
Alignment:
-GCAT
-----

       -    G    A    T
  -    0   -1
  G   -1
  C   -2
  A   -3
  T   -4
Alignment:
--
-G

       -    G    A    T
  -    0   -1   -2   -3
  G   -1
  C   -2
  A   -3
  T   -4
Alignment:
----
-GAT
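
       -    G    A    T
  -    0   -1   -2   -3
  G   -1    ?
  C   -2
  A   -3
  T   -4
This cell can be reached in three ways: from the left, from above, or along the diagonal. We keep the best of the three scores.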

       -    G    A    T
  -    0   -1   -2   -3
  G   -1   -2
  C   -2
  A   -3
  T   -4
Alignment:
-G-
--G

       -    G    A    T
  -    0   -1   -2   -3
  G   -1   -2
  C   -2
  A   -3
  T   -4
Alignment:
--G
-G-

       -    G    A    T
  -    0   -1   -2   -3
  G   -1    1
  C   -2
  A   -3
  T   -4
Alignment:
-G
-G
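
The three options tried for this cell (score -2 when coming from the left, -2 from above, 1 along the diagonal) can be sketched in Python. The function name `cell_score` and its argument names are illustrative, not part of the slides.

```python
def cell_score(diag, up, left, x, y, match=1, mismatch=-1, gap=-1):
    """Score one matrix cell from its three already-filled neighbours.

    diag, up, left: scores of the diagonal, upper, and left neighbours.
    x, y: the row letter and the column letter of this cell.
    Returns the best score and the move that produced it.
    """
    candidates = {
        "diagonal": diag + (match if x == y else mismatch),  # align x with y
        "up": up + gap,      # x is aligned against a gap
        "left": left + gap,  # y is aligned against a gap
    }
    move = max(candidates, key=candidates.get)
    return candidates[move], move


# Cell for row G, column G: neighbours are 0 (diagonal), -1 (up), -1 (left).
print(cell_score(0, -1, -1, "G", "G"))  # (1, 'diagonal')
```

Every remaining cell of the matrix is filled with the same rule, using whichever neighbours are already known.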

       -    G    A    T
  -    0   -1   -2   -3
  G   -1    1
  C   -2    0
  A   -3
  T   -4
Alignment:
-GC
-G-

       -    G    A    T
  -    0   -1   -2   -3
  G   -1    1    0
  C   -2    0
  A   -3
  T   -4
Alignment:
-G-
-GA

       -    G    A    T
  -    0   -1   -2   -3
  G   -1    1    0
  C   -2    0   -1
  A   -3
  T   -4
Alignment:
-GC-
-G-A

       -    G    A    T
  -    0   -1   -2   -3
  G   -1    1    0
  C   -2    0   -1
  A   -3
  T   -4
Alignment:
-G-C
-GA-

       -    G    A    T
  -    0   -1   -2   -3
  G   -1    1    0
  C   -2    0    0
  A   -3
  T   -4
Alignment:
-GC
-GA

       -    G    A    T
  -    0   -1   -2   -3
  G   -1    1    0   -1
  C   -2    0    0   -1
  A   -3   -1    1    0
  T   -4   -2    0    2
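
Filling the whole matrix for GCAT and GAT can be sketched as follows. This is a minimal illustration under the scoring used in the example (match = 1, mismatch = -1, gap = -1); the function name `fill_matrix` is an assumption made for this sketch.

```python
def fill_matrix(a, b, match=1, mismatch=-1, gap=-1):
    """Fill the Needleman-Wunsch score matrix.

    Rows follow sequence a, columns follow sequence b. Row 0 and
    column 0 correspond to aligning a prefix against gaps only.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]

    # Borders: each extra letter aligned against a gap costs one gap penalty.
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap

    # Interior: best of diagonal (match or mismatch), up (gap), left (gap).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)
    return F


for row in fill_matrix("GCAT", "GAT"):
    print(row)
# [0, -1, -2, -3]
# [-1, 1, 0, -1]
# [-2, 0, 0, -1]
# [-3, -1, 1, 0]
# [-4, -2, 0, 2]
```

The bottom-right value, 2, is the score of the best global alignment.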
Alignment:
-GCAT
-G-AT
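
Reading the alignment off the finished matrix is done by walking back from the bottom-right cell to the top-left cell. Below is a minimal sketch of that traceback, with the matrix from the example typed in by hand. The function name `traceback` and the tie-breaking order (diagonal first, then up, then left) are choices made for this sketch; when several alignments share the best score, a different order may return a different one. The result omits the leading placeholder column shown on the slides.

```python
def traceback(F, a, b, match=1, mismatch=-1, gap=-1):
    """Recover one best alignment from a filled score matrix F.

    Rows of F follow sequence a, columns follow sequence b.
    """
    i, j = len(a), len(b)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            if F[i][j] == F[i - 1][j - 1] + s:
                # Diagonal move: the two letters are aligned with each other.
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and F[i][j] == F[i - 1][j] + gap:
            # Move up: a letter of a is aligned against a gap.
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            # Move left: a letter of b is aligned against a gap.
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


# Score matrix for GCAT (rows) and GAT (columns), copied from the example.
F = [
    [0, -1, -2, -3],
    [-1, 1, 0, -1],
    [-2, 0, 0, -1],
    [-3, -1, 1, 0],
    [-4, -2, 0, 2],
]

top, bottom = traceback(F, "GCAT", "GAT")
print(top)     # GCAT
print(bottom)  # G-AT
```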
Now try on your own
Align ATGCT and ATTACA
Scoring: match = 1, mismatch = -1, gap = -1
       -    A    T    G    C    T
  -
  A
  T
  T
  A
  C
  A
Multiple sequence alignment (MSA)
Software to generate MSAs
• MAFFT
(very good, very fast)
http://mafft.cbrc.jp/alignment/software/
• Clustal Omega
(very good, very fast)
http://www.ebi.ac.uk/Tools/msa/clustalo/
• PRANK
(extremely good, very slow)
http://wasabiapp.org/software/prank/
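
As a rough sketch, one of these tools can be called from Python. This assumes MAFFT is installed locally and available on the PATH as `mafft`; the file name `sequences.fasta` and the helper names are hypothetical. MAFFT writes the alignment to standard output, and `--auto` lets it pick an alignment strategy based on the input size.

```python
import shutil
import subprocess


def build_mafft_command(fasta_path):
    """Command line for a default MAFFT run on a FASTA file."""
    return ["mafft", "--auto", fasta_path]


def run_mafft(fasta_path):
    """Run MAFFT and return the aligned sequences as FASTA text.

    Raises RuntimeError if no mafft executable is found on the PATH.
    """
    if shutil.which("mafft") is None:
        raise RuntimeError("mafft executable not found on PATH")
    result = subprocess.run(
        build_mafft_command(fasta_path),
        capture_output=True,  # alignment goes to stdout, progress to stderr
        text=True,
        check=True,           # raise if MAFFT exits with an error
    )
    return result.stdout


# Hypothetical input file; only run this where MAFFT is installed.
# aligned = run_mafft("sequences.fasta")
print(" ".join(build_mafft_command("sequences.fasta")))
# mafft --auto sequences.fasta
```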