This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

CS 312: Algorithm Design & Analysis
Lecture #24: Optimality, Gene Sequence Alignment
Slides by: Eric Ringger, with contributions from Mike Jones, Eric Mercer, Sean Warnick

Announcements
Homework #15: due now.
Project #5 (Gene Sequence Alignment): kick-off today; read the directions now. Whiteboard experience due Monday; early due date Monday after the mid-term exam; final due date Wednesday after the mid-term exam.
Mid-term exam: start preparing your one page of notes. The notes must be prepared by you; no cutting and pasting.

Objectives
Revisit the main ideas behind Dynamic Programming.
Define the optimality property for DP.
Develop (or at least begin) the algorithm for gene sequence alignment.
Prepare for Project #5.

Dynamic Programming: The Six Steps
1. Ask: am I solving an optimization problem?
2. Devise a minimal description (address) for any problem instance and sub-problem.
3. Divide problems into sub-problems: define the recurrence that specifies the relationship of problems to sub-problems.
4. Check that the optimality property holds: an optimal solution to a problem is built from optimal solutions to sub-problems.
5. Store results, typically in a table, and re-use the solutions to sub-problems in the table as you build up to the overall solution.
6. Back-trace / analyze the table to extract the composition of the final solution.

Optimality Property
An optimal solution to a problem is built from optimal solutions to sub-problems.
The optimality property is a necessary condition for solving an optimization problem by DP! It allows us to store and re-use optimal results to sub-problems.

Optimality
$$\text{optimalsolution}(\text{parent}) = \min \text{ (or } \max\text{)} \left\{ f_1(\text{optimalsolution}(\text{child}_1)),\ f_2(\text{optimalsolution}(\text{child}_2)),\ \ldots,\ f_n(\text{optimalsolution}(\text{child}_n)) \right\}$$
[Figure: a tree of problems and sub-problems, nodes A through K.]

Shortest Path
[Figure: road map connecting American Fork, Sundance, Orem, Geneva, and Provo; each road is labeled with its length.]
Goal: the shortest path from AF to Provo. Does this problem exhibit the optimality property?
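To make steps 2 through 6 concrete, here is a minimal sketch of a shortest-path DP written as a memoized recursion. It assumes a small HYPOTHETICAL one-way road map (nodes s, a, b, c, t with made-up lengths), not the map on the slide, and the names GRAPH, GOAL, dist, and route are illustrative only. The recursion requires an acyclic map; for a road map with cycles, Dijkstra's algorithm is the usual tool.

```python
import math
from functools import lru_cache

# HYPOTHETICAL one-way road map (not the map on the slide):
# node -> {next node: road length}. It must be acyclic for this recursion.
GRAPH = {
    "s": {"a": 4, "b": 1},
    "a": {"t": 1},
    "b": {"a": 2, "c": 5},
    "c": {"t": 1},
    "t": {},
}
GOAL = "t"


@lru_cache(maxsize=None)  # step 5: each sub-problem is solved once, then reused
def dist(v):
    """Length of the shortest route from v to GOAL; the address is just v."""
    if v == GOAL:
        return 0
    # Optimality property: the best route from v is one road (v, u) followed
    # by the best route from u, so only optimal sub-solutions are needed.
    return min((w + dist(u) for u, w in GRAPH[v].items()), default=math.inf)


def route(v):
    """Step 6: back-trace by re-picking the road that achieved each minimum."""
    if dist(v) == math.inf:
        return None  # GOAL is unreachable from v
    path = [v]
    while path[-1] != GOAL:
        here = path[-1]
        path.append(min(GRAPH[here], key=lambda u: GRAPH[here][u] + dist(u)))
    return path


print(dist("s"), route("s"))  # 4 ['s', 'b', 'a', 't']
```

In this made-up map the shortest route from s passes through b, and its b-to-t portion (length 3) is exactly the shortest route from b: the stored answer for b is reused rather than recomputed.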
Pair up. Discuss.

Questions
[Figure: the same road map from American Fork to Provo.]
Q: In general, do you know in advance which sub-problem solutions to use?
A: No. So a naively greedy algorithm is not an option. (But Dijkstra's algorithm is.)
Q: How does having a table of intermediate shortest-path results help find the shortest path from AF to Provo?
A: You reuse those results for intermediate destinations as you try different routes.
Q: Do you have to reconsider alternative, sub-optimal solutions for the intermediate destinations?
A: No.
Thus, the optimality property holds. Therefore, the shortest path problem can be solved by DP.

Optimality in Driving
The shortest route from American Fork to Provo passes through Orem. Assume we have found this route. Then what can we say about the shortest route from AF to Orem? It coincides with the AF-to-Orem portion of that optimal route from AF to Provo. Could it be otherwise?

A Related Problem
Now suppose you drive from AF to Orem as fast as you can on your way to Provo, but you are limited by the gas in your tank. Does the optimality property hold?
[Figure: roads from AF to Orem and from Orem to Provo, labeled minutes/gallons: 5/9, 10/5, 20/1. Start with 10 gallons. "20/1" means the road "takes 20 minutes using 1 gallon of gas".]
Goal: get to Provo in as little time as possible, with no refueling. Does this problem (formulation) satisfy the optimality property or not? Why?

Problem Solving Advice
Start by asking: which sub-problems should be solved?
If you know how to choose in advance using local information only, then greedy might work.
Else, if sub-problems don't overlap, then divide and conquer would be a good choice.
Else, if the optimality property holds, then DP is a good choice.
Else, the optimality property does NOT hold, so apply another strategy. (Stay tuned for more guidance.)
Important!

Gene Sequence Alignment
x = ACGCTGA
y = ACTGT

Virtually Identical Problems
Edit distance (aka Levenshtein distance)
Sequence alignment, e.g., gene sequence alignment
Fundamentally the same thing! We're focusing on gene sequence alignment.
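The gas-limited driving problem above can be checked by brute force. This is a minimal sketch that rests on an ASSUMED reading of the figure: each leg (AF to Orem, Orem to Provo) offers three roads labeled 5/9, 10/5, and 20/1 (minutes/gallons), and the tank starts with 10 gallons. The names ROADS, TANK, fastest_to_orem, and fastest_to_provo are hypothetical.

```python
from itertools import product

# ASSUMED reading of the slide's figure: each leg (AF->Orem, Orem->Provo)
# offers three roads, each labeled (minutes, gallons).
ROADS = [(5, 9), (10, 5), (20, 1)]
TANK = 10  # gallons at the start; no refueling


def fastest_to_orem():
    """Fastest AF->Orem road, ignoring what happens afterwards."""
    return min(ROADS)  # tuples compare by minutes first


def fastest_to_provo():
    """Try every pair of roads; keep the fastest trip that fits in the tank."""
    best = None
    for leg1, leg2 in product(ROADS, ROADS):
        minutes = leg1[0] + leg2[0]
        gallons = leg1[1] + leg2[1]
        if gallons <= TANK and (best is None or minutes < best[0]):
            best = (minutes, leg1, leg2)
    return best


print(fastest_to_orem())   # (5, 9)
print(fastest_to_provo())  # (20, (10, 5), (10, 5))
```

Under this reading, the fastest road to Orem (5 minutes, 9 gallons) is not part of the fastest feasible trip to Provo: taking it leaves 1 gallon, which forces the 20-minute road afterwards (25 minutes total), while two 10-minute roads finish in 20. The optimal solution is therefore not built from the optimal "fastest to Orem" sub-solution. One common remedy is to enlarge the sub-problem address to include the gas remaining, e.g., "fastest route to Orem using at most g gallons".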
Edit Distance / Sequence Alignment Problem
Given: 2 strings, x and y, with length(x) = m and length(y) = n.
Return: the smallest cost to transform string x into string y (or vice versa).
Another perspective: the smallest cost of aligning x to y (or vice versa).
Contrast the two perspectives.

Alignment example (the '-' is a "gap"):
x: ACGCT-C
y: A--CTGT

Divide the alignment into pairs. Each pair has a type and a cost:
Match: cost cmatch
Insertion into x (= deletion from y), aka "indel": cost cindel
Insertion into y (= deletion from x): cost cindel
Substitution of x into y (or from y into x): cost csub

How would you solve this problem?

Solution Ideas
Enumerate all alignments and score each. Pro: easy to code. Pro: optimal. Con: exponential time.
Greedy: work from left to right, gobbling up matches and inserting gaps or allowing substitutions as necessary. Pro: easy. Pro: linear time, so fast / efficient. Con: not optimal.
DP. Prerequisites: the optimality property; addressable sub-problems; the relationship between the problem and its sub-problems. Pro: optimal. Con: ?
Divide and conquer?

Designing the DP Algorithm for Gene Sequence Alignment
DP? Define each sub-problem S(i, j) to be the best score for aligning the first i bases of sequence x with the first j bases of sequence y.
Does that suffice as a minimal description?
In those terms, what is our objective function?
Minimize S(m, n), where m = |x| and n = |y|.
Can we divide this problem into sub-problems? How many? Hint: how many sub-problems are one step away from S(i, j)?

Example: Sub-problems
x = ACGCTGA
y = ACTGT

To be continued in Lecture #25.

Assignment
HW #16
Read Section 6.3, if you haven't done so already.
Thursday: Screencast & Quiz
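The lecture stops at the sub-problem definition; the recurrence itself is developed in Lecture #25. As a minimal sketch of where this is heading, assume the three sub-problems one step away from S(i, j) are S(i-1, j-1) (match or substitution), S(i-1, j), and S(i, j-1) (indels). Also assume illustrative unit costs cmatch = 0 and cindel = csub = 1, which make S(m, n) the Levenshtein distance; Project #5 may use different values. The function names pair_cost, alignment_cost, and align are hypothetical.

```python
def pair_cost(p, q, c_match, c_indel, c_sub):
    """Cost of one aligned pair (column); '-' marks a gap."""
    if p == "-" or q == "-":
        return c_indel
    return c_match if p == q else c_sub


def alignment_cost(top, bottom, c_match=0, c_indel=1, c_sub=1):
    """Score an alignment given as two equal-length strings with gaps."""
    return sum(pair_cost(p, q, c_match, c_indel, c_sub) for p, q in zip(top, bottom))


def align(x, y, c_match=0, c_indel=1, c_sub=1):
    """Return (S(m, n), aligned x, aligned y), where S[i][j] is the minimum
    cost of aligning the first i bases of x with the first j bases of y."""
    m, n = len(x), len(y)
    S = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        S[i][0] = i * c_indel  # x[:i] against gaps only
    for j in range(1, n + 1):
        S[0][j] = j * c_indel  # gaps only against y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            S[i][j] = min(
                S[i - 1][j - 1] + pair_cost(x[i - 1], y[j - 1], c_match, c_indel, c_sub),
                S[i - 1][j] + c_indel,  # x[i-1] paired with a gap
                S[i][j - 1] + c_indel,  # y[j-1] paired with a gap
            )
    # Back-trace from S[m][n], re-deriving which neighbor produced each entry.
    top, bottom = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and S[i][j] == S[i - 1][j - 1] + pair_cost(
            x[i - 1], y[j - 1], c_match, c_indel, c_sub
        ):
            top.append(x[i - 1])
            bottom.append(y[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and S[i][j] == S[i - 1][j] + c_indel:
            top.append(x[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(y[j - 1])
            j -= 1
    return S[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(alignment_cost("ACGCT-C", "A--CTGT"))  # 4
print(align("ACGCTGA", "ACTGT")[0])          # 3
```

With these costs, the slide's example alignment x: ACGCT-C / y: A--CTGT scores 4 (three indels and one substitution), and x = ACGCTGA, y = ACTGT can be aligned at cost 3. Filling the (m+1) by (n+1) table takes O(mn) time and space; the back-trace adds only O(m + n).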