answers to HW 2

CS431 homework 2
8 June 2011
Question 1 (page 54, problem 2.3). Is lg n = O(n)? Is lg n = Ω(n)? Is
lg n = Θ(n)?
Answer. Recall the definition of big-O: for functions f and g, f(n) = O(g(n))
if there exist constants c > 0 and N such that f(n) ≤ c · g(n) for all n > N. For
big-omega, replace the ≤ with ≥ in this definition. A function f(n) is
big-theta of g(n) if f(n) = O(g(n)) and f(n) = Ω(g(n)).
If we choose c = 1 and N = 0, then lg n ≤ 1 · n for all n > 0: since n < 2^n for
all n ≥ 1 and lg is increasing, lg n < lg 2^n = n. Therefore lg n = O(n).
On the other hand, lg n grows more slowly than any fixed multiple of n. Since
lg n / n → 0 as n → ∞, for every constant c > 0 there is a large enough n₀ such
that lg n < c · n for all n > n₀. So no choice of c > 0 and N can make
lg n ≥ c · n hold for all n > N, which means that lg n ≠ Ω(n).
Since lg n is not Ω(n), it is not Θ(n).
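The Ω argument can be illustrated numerically (an illustration, not a proof). The Python sketch below uses a helper name of our own, `crossover_point`, which finds for a given c > 0 the point beyond which lg n < c · n. Shrinking c pushes that point further out but never makes it disappear, which is why no pair (c, N) satisfies the Ω definition. It relies on the fact that lg n / n is decreasing for integers n ≥ 3.

```python
import math


def crossover_point(c: float) -> int:
    """Return the smallest integer n >= 3 with lg n < c * n.

    For n >= 3 the ratio lg(n) / n is decreasing (the real function ln(x)/x
    decreases for x > e), so once lg n < c * n holds, it holds for every
    larger n as well.
    """
    if c <= 0:
        raise ValueError("c must be positive")
    n = 3
    while math.log2(n) >= c * n:
        n += 1
    return n


for c in (1.0, 0.5, 0.1, 0.01):
    print(c, crossover_point(c))
```

For example, crossover_point(0.5) is 5, because lg 4 = 2 = 0.5 · 4 while lg 5 ≈ 2.32 < 2.5.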
Question 2. Assume you have a DNA base sequence P of n bases and are given
a sequence S of m bases. The problem is to find the place in S where P first
occurs, if there is one.
1. Describe a simple exhaustive search algorithm to solve this problem.
2. Assume
P = TCGATG
and
S = TCGACTCGAATTGCCTCGATGGATCCATATCG.
How many comparisons does your algorithm take before it finds where P
occurs?
3. Assume P does not occur in S. How many comparisons does your algorithm take before it discovers this?
Answer.
1. On input P = p1 p2 · · · pm and S = s1 s2 · · · sn (so in this answer m is the
length of P and n is the length of S, the reverse of the labels in the problem
statement), the naïve exhaustive search algorithm is as follows:
• for i = 1 up to n − m + 1
  – for j = 1 up to m
    ∗ if s_{i+j−1} ≠ p_j, break out of this inner loop and continue with the
      next iteration of the outer loop
  – if the inner loop finishes without breaking (all m bases of P matched),
    output i and stop
• if the outer loop finishes, then P does not occur in S
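2. Each number in the following sum is the number of comparisons made when
the match attempt starts at position 1, position 2, and so on, of the string S:
5 + 1 + 1 + 1 + 1 + 5 + 1 + 1 + 1 + 1 + 2 + 2 + 1 + 1 + 1 + 6 = 31.
(At position 12, S reads TG. . ., so that attempt fails on its second comparison.)
P is found at position 16, after 31 comparisons.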
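3. If P does not occur in S, the algorithm tries all n − m + 1 starting positions
in S and makes between 1 and m comparisons at each, so it performs between
n − m + 1 and m(n − m + 1) comparisons. The upper bound is attained, for
example, when P consists of m − 1 As followed by a C and S consists only of As.
Therefore the worst-case running time of this algorithm when P does not occur
in S is O(m(n − m + 1)).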
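A minimal Python sketch of the exhaustive search above, instrumented to count base comparisons so that the totals in parts 2 and 3 can be checked. The function name `naive_search` is our own. It uses 0-based indexing internally but reports 1-based positions, as in the pseudocode.

```python
from typing import Optional, Tuple


def naive_search(p: str, s: str) -> Tuple[Optional[int], int]:
    """Return (first 1-based position of p in s, or None; number of base comparisons)."""
    m, n = len(p), len(s)
    comparisons = 0
    for i in range(n - m + 1):          # 0-based start position in s
        j = 0
        while j < m:
            comparisons += 1            # one comparison of s[i + j] with p[j]
            if s[i + j] != p[j]:
                break                   # mismatch: try the next start position
            j += 1
        if j == m:                      # inner loop finished without a mismatch
            return i + 1, comparisons
    return None, comparisons


print(naive_search("TCGATG", "TCGACTCGAATTGCCTCGATGGATCCATATCG"))  # (16, 31)
print(naive_search("AAAC", "A" * 10))  # (None, 28), i.e. m(n - m + 1) = 4 * 7
```

The second call is the worst case from part 3: every one of the 7 starting positions matches three As before failing on the C.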
Question 3 (page 212, problem 6.10). A rook stands on the upper left square
of a chessboard. Two players take turns moving the rook either horizontally to
the right or vertically downward (as many squares as they want). The player
who can place the rook on the lower right square of the chessboard wins. Who
will win? Describe the winning strategy.
Answer. Recall the analysis of the game we saw in class, in which two players
took turns moving a king from the upper left corner of the chessboard to the
bottom right corner. We again start analyzing from the bottom right corner and
note that if player one were to start at any position on the bottom row or the
right-most column (other than the corner itself), he or she could certainly win
by simply moving the rook to the bottom right corner. See Figure 1a.
We also notice that from position g2, player one must lose, since he or she
will move either down one or right one, allowing player two to move to the final
position. See Figure 1b.
Now from any position in column g above g2, and from any position in row
2 to the left of g2, player one can force player two to lose by moving directly
to position g2. See Figure 1c.
At position f3, player one must lose, because moving either right or down by
one or two allows player two to win. See Figure 1d.
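This pattern continues along the diagonal all the way back up to the top left
position. See Figure 1e. The losing squares are exactly the diagonal a8, b7, c6,
d5, e4, f3, g2, on which the rook is as many rows above the bottom row as it is
columns to the left of the right-most column. This means that player one will
lose if the rook starts in the top left position and player two plays correctly.
Therefore player two will win. The winning strategy for player two follows.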
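On the first turn, player one must move the rook either right along the top
row or down along the left-most column. Wherever player one moves the rook,
player two should move it back onto the diagonal (where all the Ls are), that
is, to the square that is equally far from the bottom row and from the
right-most column. This is always possible in one move: player one's move
shortened only one of the two distances, so player two shortens the other one
to match. This leaves player one in a losing position for the next turn. By
repeating this strategy after each move by player one, player two eventually
reaches the bottom right corner and always wins.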
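The backward analysis of Figure 1 can be automated. In the Python sketch below (function names are our own), a square is described by how many squares remain below it and to its right, so g2 is (1, 1) and the top left corner a8 is (7, 7). Positions are labeled from the bottom right corner outward, exactly as in the argument above: a position is losing if every move from it reaches a winning position for the opponent.

```python
from typing import Optional, Set, Tuple

Position = Tuple[int, int]  # (squares left to move down, squares left to move right)


def rook_losing_positions(rows: int, cols: int) -> Set[Position]:
    """Positions from which the player about to move loses with correct play.

    (0, 0) is the bottom right corner: the player to move there has already
    lost, because the opponent just placed the rook on it.
    """
    losing: Set[Position] = set()
    for down in range(rows):
        for right in range(cols):
            moves = [(down - k, right) for k in range(1, down + 1)]
            moves += [(down, right - k) for k in range(1, right + 1)]
            # Losing iff no move reaches a position that is losing for the opponent.
            if not any(move in losing for move in moves):
                losing.add((down, right))
    return losing


def rook_winning_move(down: int, right: int) -> Optional[Position]:
    """Move back to the diagonal (equal distances), or None if already on it."""
    if down == right:
        return None  # on the diagonal: every move loses against correct play
    target = min(down, right)
    return (target, target)


losing = rook_losing_positions(8, 8)
print(sorted(losing))            # [(0, 0), (1, 1), ..., (7, 7)]
print((7, 7) in losing)          # True: top left corner, so player two wins
print(rook_winning_move(7, 3))   # (3, 3)
```

In effect this is two-pile Nim: the two distances play the role of the piles, and the diagonal is the set of positions with equal piles.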
Figure 1: Analyzing the rook problem described in Question 3.
(a) Player one can win from the bottom row or the right-most column.
(b) Player one must lose from position g2.
(c) Player one can force player two to lose by moving directly to g2.
(d) Player one loses at f3 because any move allows player two to win.
(e) The complete analysis of losing and winning starting positions for player one.
Question 4 (page 213, problem 6.14). Two players play the following game
with two sequences of length n and m nucleotides. At every turn a player must
delete two nucleotides from one sequence (either the first or the second) and one
nucleotide from the other. The player who cannot move loses. Who will win?
Describe the winning strategy for each n and m.
Answer. Let the string of length m be called u and the string of length n be
called v. Notice that either m or n (or both) may be zero, representing an
empty string.
We can view this game as a two-player game on a chessboard of m + 1 rows
and n + 1 columns with a knight in the top left corner, similar to the game
described in Question 3. The only valid moves for the knight in this game are
two squares right and one square down, or two squares down and one square
right. The two players take turns moving the knight, and the one who cannot
make a move loses.
Why is this a faithful model of the original game? Every move to the right
represents the deletion of a letter from v, and every move down represents the
deletion of a letter from u. Placing the knight in the right-most column means
that all n characters have been deleted from v, so v is then the empty string.
Similarly, placing the knight in the bottom row means that all m characters
have been deleted from u, so u is then the empty string. The board has size
(m + 1) × (n + 1) to accommodate the possibility that u or v is the empty string.
As before, we start by examining the bottom right region of the board. Every
position along the bottom row or the right-most column is a losing position,
because one of the strings is already empty and every move must delete at least
one letter from each string. In addition, no move can be made from g2 (each
string has a single letter left), so it is also a losing position. See
Figure 2a (which shows the game on an 8 × 8 board, so m = n = 7).
From most positions just outside this region, however, player one can move the
knight onto a losing square and thereby force player two to lose. See Figure 2b.
Continuing the analysis outward gives the pattern seen in Figure 2c, and on a
board of arbitrary size the pattern continues outward in the same way.
Now we just need to determine under what conditions (that is, board sizes)
player one will win or lose if the knight starts in the top left corner. It is a
little simpler to determine when player one will lose than when player one will
win. If the board has m + 1 rows and n + 1 columns, then the top left corner is
a losing position (an L) exactly when
m = 0 and n ≥ 0, m = 3 and n ≥ 3, m = 6 and n ≥ 6, . . .
n = 0 and m ≥ 0, n = 3 and m ≥ 3, n = 6 and m ≥ 6, . . .
m = n = 1, m = n = 4, m = n = 7, . . .
Expressed more concisely,
    m = 3k and n ≥ 3k for some k ∈ N,      (1)
    n = 3k and m ≥ 3k for some k ∈ N,      (2)
    m = n = 3k + 1 for some k ∈ N.         (3)
Equivalently, player one loses exactly when min(m, n) is a multiple of 3, or
when m = n ≡ 1 (mod 3).
Figure 2: Analyzing the knight problem described in Question 4 for an 8 × 8
board.
(a) The bottom row and the right-most column are losing positions in the knight game.
(b) Player one can force player two into a losing position from the next layer out.
(c) The complete analysis of winning and losing starting positions for player one.
(Here, N is the set of natural numbers including 0, {0, 1, 2, 3, . . .}.) Therefore,
if a board with m + 1 rows and n + 1 columns meets any of these three
conditions, player one will lose (provided player two plays correctly); under
the complementary set of conditions, player one will win. Translating this back
to the original problem: if the strings have lengths m and n, player one loses
exactly when m and n meet one of these conditions. The winning strategy, for
whichever player has it, is to move the knight onto one of these “strips” of
losing positions on every turn, that is, to delete letters so that the remaining
lengths again satisfy condition (1), (2), or (3). This always forces the next
player into a losing position.
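A brute-force check of conditions (1)–(3). The Python sketch below (names are our own) works directly with the remaining string lengths (a, b) instead of board squares: a move deletes two letters from one string and one from the other. It labels positions by the same backward reasoning used for the board and compares the result with the closed form; `knight_winning_move` returns a move onto a losing “strip” when one exists.

```python
from functools import lru_cache
from typing import Optional, Tuple


@lru_cache(maxsize=None)
def mover_loses(a: int, b: int) -> bool:
    """True if the player to move loses when the strings have lengths a and b.

    A move goes (a, b) -> (a - 2, b - 1) or (a - 1, b - 2), provided neither
    length goes negative. With no legal move, all() of nothing is True, so
    the player to move loses.
    """
    options = [(a - 2, b - 1), (a - 1, b - 2)]
    return all(not mover_loses(x, y) for x, y in options if x >= 0 and y >= 0)


def predicted_loss(m: int, n: int) -> bool:
    """Conditions (1)-(3): min(m, n) divisible by 3, or m == n with remainder 1."""
    return min(m, n) % 3 == 0 or (m == n and m % 3 == 1)


def knight_winning_move(a: int, b: int) -> Optional[Tuple[int, int]]:
    """A move to a losing position for the opponent, or None if there is none."""
    for x, y in [(a - 2, b - 1), (a - 1, b - 2)]:
        if x >= 0 and y >= 0 and mover_loses(x, y):
            return (x, y)
    return None


# Compare the closed form with brute force for small lengths.
assert all(mover_loses(m, n) == predicted_loss(m, n) for m in range(30) for n in range(30))
print(mover_loses(7, 7))          # True: the 8 x 8 board of Figure 2, player one loses
print(knight_winning_move(5, 4))  # (3, 3)
```

Memoizing with `lru_cache` keeps the recursion linear in the number of distinct (a, b) pairs, which is the same table Figure 2 fills in by hand.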
Question 5. Describe how RNA carries information from DNA and helps in
making cell proteins. In particular,
1. What are the different forms of RNA and how is their structure different?
2. What are the different functions of the different forms of RNA?
3. What do ribosomes do and where are they located?
Note: Don’t write a book on this, just a page or so will do. This process is
discussed on pages 65–67 of the text but you may need a bit more research than
that here.
Answer. There are several different forms of ribonucleic acid (RNA). The three
main types involved in making proteins (and the ones to know about for a
bioinformatics class) are messenger RNA (mRNA), ribosomal RNA (rRNA), and
transfer RNA (tRNA).
Messenger RNA is a transcript of a portion of DNA that codes for a protein.
In the nucleus, an enzyme called RNA polymerase builds strands of mRNA that
are complementary to the corresponding portions of DNA. The mRNA is a
sequence of nucleotides, and each triple of nucleotides (called a “codon”)
codes for an amino acid or signals the end of the protein (a stop codon),
according to a fixed translation table, the genetic code. The table has
4³ = 64 entries, since there are four possible nucleotide bases in RNA (A, G, C,
and U; uracil takes the place of the thymine, T, found in DNA).
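A toy Python sketch of the two facts above: there are 4³ = 64 three-base codons over the RNA alphabet, and mRNA is built by complementary base pairing with a DNA template strand. This is a simplification under stated assumptions: strand direction (5′ to 3′) is ignored, the names are hypothetical, and `PARTIAL_CODE` is a deliberately tiny, hand-picked slice of the standard genetic code, not the full table.

```python
from itertools import product
from typing import Dict, List

RNA_BASES = "ACGU"

# Every three-base word over the RNA alphabet: 4 ** 3 = 64 codons.
ALL_CODONS = ["".join(triple) for triple in product(RNA_BASES, repeat=3)]

# Pairing between a DNA template base and the RNA base laid down opposite it
# (RNA uses U where DNA would use T).
TEMPLATE_TO_RNA: Dict[str, str] = {"A": "U", "T": "A", "G": "C", "C": "G"}

# A tiny subset of the standard genetic code (NOT the full table).
PARTIAL_CODE: Dict[str, str] = {
    "AUG": "Met",  # methionine, also the usual start codon
    "UUU": "Phe",
    "GGG": "Gly",
    "UAA": "Stop",
    "UAG": "Stop",
    "UGA": "Stop",
}


def transcribe(template_strand: str) -> str:
    """Build mRNA by complementary pairing (strand direction is ignored)."""
    return "".join(TEMPLATE_TO_RNA[base] for base in template_strand)


def translate(mrna: str) -> List[str]:
    """Read codons left to right until a stop codon, using PARTIAL_CODE."""
    chain: List[str] = []
    for i in range(0, len(mrna) - 2, 3):
        residue = PARTIAL_CODE[mrna[i:i + 3]]  # KeyError if outside the toy table
        if residue == "Stop":
            break
        chain.append(residue)
    return chain


mrna = transcribe("TACAAACCCATT")
print(len(ALL_CODONS), mrna, translate(mrna))  # 64 AUGUUUGGGUAA ['Met', 'Phe', 'Gly']
```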
Ribosomal RNA, together with a number of proteins, makes up the ribosome, the
structure that synthesizes the proteins coded by mRNA. Ribosomes are located in
the cytoplasm, outside the nucleus, either free or attached to the rough
endoplasmic reticulum. A ribosome can be likened to a factory: the mRNA is the
template along which the ribosome moves, and as the ribosome reads each codon,
it appends the amino acid delivered by the matching transfer RNA to a growing
chain of amino acids, called a polypeptide chain, which will eventually become
part of a protein.
Each transfer RNA has an “anticodon”, a three-base sequence complementary to a
codon, and a site that binds the amino acid to which that codon corresponds.
Every amino acid has at least one tRNA, and some tRNAs recognize more than one
codon because the third base of the codon can pair loosely (“wobble” pairing);
the stop codons have no tRNA. When the ribosome reads a codon, it allows a tRNA
with the matching anticodon to bind, and then catalyzes a bond joining that
tRNA's amino acid to the growing polypeptide chain, which was built using the
tRNAs matched to the previous codons read from the mRNA.
Table 1: One possible dynamic programming table for computing the longest
common subsequence of GAGTACA and GCTAGGA. Columns follow v = GAGTACA and
rows follow w = GCTAGGA; each cell gives the backtracking arrow and the LCS
length, and cells on the backtracking path (bold in the original) are marked
with *.

      G     A     G     T     A     C     A
  G   ↖1*   ←1    ↖1    ←1    ←1    ←1    ←1
  C   ↑1*   ↑1    ↑1    ↑1    ↑1    ↖2    ←2
  T   ↑1*   ↑1    ↑1    ↖2    ←2    ←2    ←2
  A   ↑1    ↖2*   ←2    ←2    ↖3    ←3    ←3
  G   ↖1    ↑2    ↖3*   ←3*   ←3*   ←3*   ←3
  G   ↑1    ↑2    ↑3    ↑3    ↑3    ↑3*   ↑3
  A   ↑1    ↑2    ↑3    ↑3    ↖4    ←4    ↖4*
Question 6. Use the LCS method in section 6.5 of the textbook to find the
longest common subsequence of the two sequences v = GAGTACA and w =
GCTAGGA.
Answer. See Table 1 for one possible dynamic programming table, filled from
the top left corner outward using the algorithm shown in the textbook, given
the two strings GAGTACA and GCTAGGA. Once the table is filled in, we follow
the arrows from the bottom right corner back to the top left corner (the path
highlighted in Table 1) to get the alignment of the two strings. This results in
the alignment

    G--AGTAC-A
    |  ||    |
    GCTAG---GA

giving the longest common subsequence GAGA, of length four.
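A minimal Python sketch of the LCS dynamic program with backtracking (the function name `lcs` is our own). Rows follow w and columns follow v, as in Table 1. On a match the backtrack takes the diagonal; otherwise it steps left when that keeps the score and up when it does not. This tie-breaking is an assumption of the sketch, so its path can differ from the one highlighted in Table 1, although for these two strings it recovers the same subsequence.

```python
def lcs(v: str, w: str) -> str:
    """Return one longest common subsequence of v and w.

    s[i][j] is the LCS length of the prefixes w[:i] and v[:j].
    """
    rows, cols = len(w), len(v)
    s = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            if w[i - 1] == v[j - 1]:
                s[i][j] = s[i - 1][j - 1] + 1  # diagonal step on a match
            else:
                s[i][j] = max(s[i - 1][j], s[i][j - 1])

    # Backtrack from the bottom right corner to recover one LCS.
    out = []
    i, j = rows, cols
    while i > 0 and j > 0:
        if w[i - 1] == v[j - 1]:
            out.append(w[i - 1])
            i -= 1
            j -= 1
        elif s[i][j - 1] >= s[i - 1][j]:
            j -= 1  # prefer a left step on ties
        else:
            i -= 1
    return "".join(reversed(out))


print(lcs("GAGTACA", "GCTAGGA"))  # GAGA
```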
