Algorithms research

Tandy Warnow
UT-Austin
“Algorithms group”
• UT-Austin: Warnow, Hunt
• UCB: Rao, Karp, Papadimitriou, Russell, Myers
• UCSD: Huelsenbeck
• UNM: Moret, Bader, Williams
• External participants: Mossel (UCB), Huson (Germany), Steel (NZ), and others
Main research foci
• Solving maximum parsimony (MP) and maximum likelihood (ML) more effectively
• “Fast converging methods”
• Gene order and content phylogeny
• Reticulate evolution
• Multiple sequence alignment at the genomic level
GRAPPA (Genome Rearrangement Analysis under Parsimony and other Phylogenetic Algorithms)
http://www.cs.unm.edu/~moret/GRAPPA/
• Heuristics for NP-hard optimization problems
• Fast polynomial time distance-based methods
• Contributors: U. New Mexico, U. Texas at Austin, Università di Bologna, Italy
• Poster: Jijun Tang
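The slide does not say which distance-based methods GRAPPA provides. As an illustration of a fast, polynomial-time distance method, the sketch below implements standard neighbor joining (Saitou and Nei, 1987) in its naive O(n^3) form; it is not GRAPPA's code, and the function name, the "n0", "n1", ... internal-node labels, and the edge-list output are illustrative choices. In a gene-order setting the input matrix would hold pairwise rearrangement distances.

```python
def neighbor_joining(names, dist):
    """Build an unrooted tree from a distance matrix by neighbor joining.

    names: taxon labels (at least two); dist: symmetric matrix as a list of lists.
    Returns (node_a, node_b, branch_length) edges; internal nodes are "n0", "n1", ...
    """
    d = {(a, b): dist[i][j]
         for i, a in enumerate(names) for j, b in enumerate(names)}
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # Net divergence: total distance from each active node to all others.
        r = {a: sum(d[a, b] for b in active if b != a) for a in active}
        # Choose the pair minimizing Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = active[x], active[y]
                q = (n - 2) * d[a, b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        u = f"n{next_id}"
        next_id += 1
        # Branch lengths from the new internal node u to the joined pair.
        la = d[a, b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        edges.append((u, a, la))
        edges.append((u, b, d[a, b] - la))
        # Distances from u to every other active node.
        for c in active:
            if c not in (a, b):
                d[u, c] = d[c, u] = (d[a, c] + d[b, c] - d[a, b]) / 2
        active = [c for c in active if c not in (a, b)] + [u]
    # Join the last two remaining nodes.
    a, b = active
    edges.append((a, b, d[a, b]))
    return edges


# Additive example: A:2, B:3 on one side, C:1, D:5 on the other, internal edge 4.
M = [[0, 5, 7, 11],
     [5, 0, 8, 12],
     [7, 8, 0, 6],
     [11, 12, 6, 0]]
for edge in neighbor_joining(["A", "B", "C", "D"], M):
    print(edge)
```

Because the example matrix is additive, neighbor joining recovers the branch lengths 2, 3, 1, 5 and the internal edge of length 4.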
Maximum Parsimony on Rearranged Genomes (MPRG)
• The leaves are rearranged genomes.
• Find the tree that minimizes the total number of rearrangement events
[Figure: example tree with genomes labelled A through F and edge lengths 3, 6, 2, 3, 4; total length = 18]
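The slide's example tree cannot be recovered from the text, so the genomes below are hypothetical. A minimal sketch of how the length of a fully labelled tree can be scored, assuming signed linear gene orders over genes 1..n with identical gene content (a simplification) and using breakpoint distance as the per-edge cost: the tree length is the sum of the distances along its edges. The function names and the dict/list tree encoding are illustrative.

```python
def adjacencies(genome):
    """Signed adjacencies of a linear gene order over genes 1..n.

    The order is framed by 0 and n + 1. Reading the genome backwards turns
    adjacency (a, b) into (-b, -a), so each pair is stored in a canonical form.
    """
    framed = [0] + list(genome) + [len(genome) + 1]
    return {min((a, b), (-b, -a)) for a, b in zip(framed, framed[1:])}


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 missing from g2 (same gene content assumed)."""
    return len(adjacencies(g1) - adjacencies(g2))


def tree_length(labels, edges):
    """Sum of breakpoint distances over the edges of a fully labelled tree."""
    return sum(breakpoint_distance(labels[u], labels[v]) for u, v in edges)


# Hypothetical example: leaves A-D, ancestral (internal) genomes E and F.
labels = {
    "A": [-1, 2, 3, 4],
    "B": [1, -3, -2, 4],
    "C": [1, 2, -4, -3],
    "D": [-2, -1, -4, -3],
    "E": [1, 2, 3, 4],
    "F": [1, 2, -4, -3],
}
edges = [("A", "E"), ("B", "E"), ("E", "F"), ("C", "F"), ("D", "F")]
print(tree_length(labels, edges))  # 2 + 2 + 2 + 0 + 2 = 8
```

MPRG then asks for the tree, with its ancestral genomes, that makes this total as small as possible.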
Benchmark gene order dataset: Campanulaceae
• 12 genomes + 1 outgroup (Tobacco), 105 gene segments
• NP-hard optimization problems: breakpoint and inversion phylogenies
1997: BPAnalysis (Blanchette and Sankoff): 200 years (est.)
2000: Using GRAPPA v1.1 on the 512-processor Los Lobos Supercluster machine: 2 minutes (200,000-fold speedup per processor)
2003: Using the latest version of GRAPPA: 2 minutes on a single processor (1-billion-fold speedup per processor)
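An inversion reverses a contiguous segment of a signed gene order and flips the sign of every gene in it. The small sketch below, assuming linear signed gene orders framed by 0 and n + 1 (helper names are illustrative), checks that every single inversion of the five-gene identity creates exactly two breakpoints. In general an inversion alters only the two adjacencies at its ends, so each inversion changes the breakpoint count by at most two, and the breakpoint distance is therefore at most twice the inversion distance.

```python
def invert(genome, i, j):
    """Invert positions i..j (0-based, inclusive): reverse the segment and flip signs."""
    segment = [-g for g in reversed(genome[i:j + 1])]
    return genome[:i] + segment + genome[j + 1:]


def adjacencies(genome):
    """Canonical signed adjacencies of a linear gene order framed by 0 and n + 1."""
    framed = [0] + list(genome) + [len(genome) + 1]
    return {min((a, b), (-b, -a)) for a, b in zip(framed, framed[1:])}


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 missing from g2."""
    return len(adjacencies(g1) - adjacencies(g2))


def breakpoints_after_one_inversion(n):
    """Breakpoint distance from the identity after every possible single inversion."""
    identity = list(range(1, n + 1))
    return [breakpoint_distance(identity, invert(identity, i, j))
            for i in range(n) for j in range(i, n)]


print(invert([1, 2, 3, 4, 5], 1, 3))            # [1, -4, -3, -2, 5]
print(set(breakpoints_after_one_inversion(5)))  # {2}
```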
Reticulate Evolution
• Group leader: Randy Linder
• Software: (1) producing random networks, (2) simulating sequences down networks, (3) performance evaluation of methods, (4) inferring reticulate networks
• Current reconstruction methods are limited to one reticulation event
• Poster: Luay Nakhleh
[Figure: 20-taxon network with one hybrid node; scaling factor 0.1]
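The slide does not name the substitution model or network encoding used by the group's simulation software. The sketch below covers only the tree case, simulating DNA down a rooted tree under the Jukes-Cantor (JC69) model. One common way to extend it to a network with a single hybrid is to let each simulated gene evolve down one of the two trees the network displays; that extension is an assumption here, not a description of the group's software. The node names and the dict-based tree encoding are hypothetical.

```python
import math
import random

BASES = "ACGT"


def evolve(seq, t, rng):
    """Evolve seq along a branch of length t (expected substitutions per site) under JC69.

    With probability 1 - exp(-4t/3) a site is redrawn uniformly from ACGT
    (possibly to the same base); this reproduces the JC69 transition probabilities.
    """
    p = 1.0 - math.exp(-4.0 * t / 3.0)
    return "".join(rng.choice(BASES) if rng.random() < p else base for base in seq)


def simulate(tree, root, length, seed=0):
    """Simulate sequences down a rooted tree and return the leaf sequences.

    tree maps each internal node to a list of (child, branch_length) pairs.
    """
    rng = random.Random(seed)
    seqs = {root: "".join(rng.choice(BASES) for _ in range(length))}
    stack = [root]
    while stack:
        node = stack.pop()
        for child, t in tree.get(node, []):
            seqs[child] = evolve(seqs[node], t, rng)
            stack.append(child)
    # Leaves are nodes with no children in the tree dict.
    return {node: s for node, s in seqs.items() if not tree.get(node)}


# Hypothetical 3-taxon tree ((A:0.05, B:0.05):0.1, C:0.2).
tree = {"r": [("x", 0.1), ("C", 0.2)], "x": [("A", 0.05), ("B", 0.05)]}
leaves = simulate(tree, "r", length=50, seed=1)
for name in sorted(leaves):
    print(name, leaves[name])
```

Seeding a private `random.Random` instance keeps each simulated replicate reproducible, which matters when the same datasets are used to compare several reconstruction methods.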
MP/ML heuristics
• Disk-Covering Methods (DCMs): divide-and-conquer strategies that boost the performance of base methods for MP/ML (Warnow)
• MrBayes (Huelsenbeck)
• The new I-DCM3 technique improves upon the Ratchet and TBR
• Poster: Usman Roshan (DCM-MP)
Gutell dataset: 854 rRNA sequences
Iterative-DCM3 trials find trees of MP score 103210 in 30 hours, whereas ratchet500 trials take 45 hours to find trees of the same score
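The MP score quoted above is the minimum number of character-state changes a tree needs to explain the aligned sequences. For one fixed rooted binary tree it can be computed with Fitch's algorithm, sketched below assuming unordered, equally weighted characters; the tree encoding and example sequences are hypothetical. Scoring a single tree is cheap; the hard part, which the ratchet and I-DCM3 address, is searching the space of trees.

```python
def fitch_site(tree, node, leaf_states):
    """Fitch's algorithm for one site: return (candidate state set, changes) for node's subtree."""
    if node in leaf_states:
        return {leaf_states[node]}, 0
    left, right = tree[node]
    left_set, left_changes = fitch_site(tree, left, leaf_states)
    right_set, right_changes = fitch_site(tree, right, leaf_states)
    common = left_set & right_set
    if common:
        return common, left_changes + right_changes
    # Disjoint sets: one extra change is needed on one of the two child edges.
    return left_set | right_set, left_changes + right_changes + 1


def parsimony_score(tree, root, seqs):
    """Total Fitch (MP) score of a rooted binary tree over all aligned sites."""
    length = len(next(iter(seqs.values())))
    return sum(
        fitch_site(tree, root, {name: s[i] for name, s in seqs.items()})[1]
        for i in range(length)
    )


# Hypothetical 4-taxon example: ((A, B), (C, D)).
tree = {"r": ("x", "y"), "x": ("A", "B"), "y": ("C", "D")}
seqs = {"A": "AAG", "B": "AAT", "C": "CAG", "D": "CTG"}
print(parsimony_score(tree, "r", seqs))  # 3
```

Rescoring the same sequences on ((A, C), (B, D)) gives 4, so MP would prefer ((A, B), (C, D)) here.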
Other planned projects (partial list)
• Multiple Sequence Alignment (Myers and Williams)
• Steiner Tree algorithms - error bounds and new heuristics (Rao)
• MCMC methods (Russell and Huelsenbeck)
• Symbolic representation of data (Hunt)
• Parallel algorithms (Bader and Williams)
Questions for group
• How should we measure performance?
• How should we use simulated data?
• How should we use real datasets?
• How can we study criteria (MP, ML, etc.) as opposed to methods?
• Should we sponsor DIMACS-style challenges?
• Others? (Please bring questions, comments, and answers to the break-out session.)