Lecture 18
IST 4
Challenge Problem 1
The first coding challenge is to implement a function that retrieves the minimum
number of steps it takes to generate a string using only tandem duplications
from one of the following seed strings [0, 1, 01, 10, 101, 010] as listed in
problem 1a, HW1.
Once you have completed this, you will notice that for long strings your
program takes a long time to run, because the search space grows rapidly with
string length. The challenge is to think of ways to optimize the search and to
argue why your solution is good.
In your solution, please describe your algorithm and approach. Discuss pros and
cons and what things you like/dislike about your algorithm.
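A minimal sketch of the unoptimized baseline, assuming strings are stored as Python str values. The names SEEDS, tandem_duplications, and forward_distance are illustrative and do not come from the homework. It runs a breadth-first search forward from the seeds and stops at the first level that contains the target.

```python
from collections import deque

# Seed strings listed in problem 1a, HW1.
SEEDS = ["0", "1", "01", "10", "101", "010"]

def tandem_duplications(s):
    """Return every string obtained from s by one tandem duplication."""
    children = set()
    n = len(s)
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Copy the block s[i:j] and insert the copy right after it.
            children.add(s[:j] + s[i:j] + s[j:])
    return children

def forward_distance(target):
    """Breadth-first search from the seeds; returns the minimum step count."""
    frontier = deque((seed, 0) for seed in SEEDS)
    seen = set(SEEDS)
    while frontier:
        s, steps = frontier.popleft()
        if s == target:
            return steps
        for child in tandem_duplications(s):
            # Duplications only lengthen a string, so anything longer
            # than the target is a dead end.
            if len(child) <= len(target) and child not in seen:
                seen.add(child)
                frontier.append((child, steps + 1))
    return None  # target is not reachable from any seed

print(forward_distance("001001"))
```

The number of strings visited grows quickly with the target length, which is the slowdown the challenge asks you to address.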
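Duplications as a tree
[Figure: tree rooted at the seed 01, in which each child is obtained from its parent by one tandem duplication. Nodes shown: 001, 0101, 011, 0001, 00001, 001001, 00101, 0011.]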
Searching algorithms
• Breadth first search
https://en.wikipedia.org/wiki/Breadth-first_search
• Depth first search
https://en.wikipedia.org/wiki/Depth-first_search
Duplication Distance Problem
Seed = 01, Sequence = 01011001
Path 1 (3 duplications): 01 → 0101 → 0101101 → 01011001
Path 2 (4 duplications): 01 → 0101 → 01001 → 011001 → 01011001
Duplication Distance Problem
Think Reverse!
Sequence = 01011001, Seed = 01
Path 1: 01011001 → 0101101 → 0101 → 01
Path 2: 01011001 → 011001 → 01001 → 0101 → 01
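One way to read "Think Reverse!" is to start from the sequence and repeatedly delete one copy of an adjacent repeat until no repeat is left. The sketch below assumes that reading; the names reductions and reverse_distance are illustrative. It searches breadth first, so the first repeat-free string it reaches is the seed, found at the minimum depth.

```python
from collections import deque

def reductions(s):
    """Return every string obtained from s by undoing one tandem duplication."""
    parents = set()
    n = len(s)
    for length in range(1, n // 2 + 1):
        for i in range(n - 2 * length + 1):
            # Two adjacent equal blocks: drop the second copy.
            if s[i:i + length] == s[i + length:i + 2 * length]:
                parents.add(s[:i + length] + s[i + 2 * length:])
    return parents

def reverse_distance(sequence):
    """Return (steps, seed) by searching backwards from the sequence."""
    frontier = deque([(sequence, 0)])
    seen = {sequence}
    while frontier:
        s, steps = frontier.popleft()
        parents = reductions(s)
        if not parents:
            return steps, s  # no repeats left, so s is the seed
        for p in parents:
            if p not in seen:
                seen.add(p)
                frontier.append((p, steps + 1))

print(reverse_distance("01011001"))
```

Working backwards only ever visits strings that can actually lead to the sequence, whereas a forward search from the seed also generates many strings that cannot.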
Duplication Distance Problem
Thue-Morse Sequence (tk)
t0 = 0
t1 = 01
t2 = 0110
t3 = 01101001
t4 = 0110100110010110
t5 = 01101001100101101001011001101001
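The listed terms follow the usual construction in which each term is the previous term followed by its bitwise complement. A short sketch of that rule, with thue_morse as an illustrative name:

```python
def thue_morse(k):
    """Return t_k, built by appending the complement of the string k times."""
    flip = str.maketrans("01", "10")
    t = "0"
    for _ in range(k):
        t += t.translate(flip)  # append the bitwise complement
    return t

for k in range(6):
    print(k, thue_morse(k))
```

Each step doubles the length, so t_k has 2^k symbols.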
Duplication Distance Problem
Special Sequences
Thue-Morse Sequence (tk)
|tk| = 2^k = n
Duplication Distance = Θ(log n)
t5 = 01101001100101101001011001101001
[N. Alon, J. Bruck, F. Farnoud, S. Jain, ISIT’16]
Duplication Distance Problem
Question
Given a sequence s ∈ {0,1}^n, let f_s(n) be the duplication
distance from its seed. What can we say about F(n) =
max_s f_s(n)?
Theorem D1
0.045 ≤ lim_{n→∞} F(n)/n ≤ 0.4
LINEAR!
[N. Alon, J. Bruck, F. Farnoud, S. Jain, ISIT’16]
Duplication Distance Problem
De-Bruijn sequence of length n
00000100011001010011101011011111
Duplication Distance = Ω(n/log n)
[N. Alon, J. Bruck, F. Farnoud, S. Jain, ISIT’16]
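As a quick check of the example, a binary De Bruijn sequence of order k has length 2^k and contains every k-bit word exactly once when read cyclically. The sketch below tests that property; is_de_bruijn is an illustrative name, and the order 5 is inferred from the length 32 of the sequence shown.

```python
def is_de_bruijn(seq, order):
    """Check that every binary word of the given order occurs once, cyclically."""
    n = len(seq)
    if n != 2 ** order:
        return False
    wrapped = seq + seq[:order - 1]  # let windows wrap around the end
    windows = {wrapped[i:i + order] for i in range(n)}
    return len(windows) == n

print(is_de_bruijn("00000100011001010011101011011111", 5))
```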
Nikhil Poole
Challenge Problem 1
IST 4
June 1, 2017
Concept
• Bottom-up algorithm: start with the seed and work up to the final
string
• Overview: build a tree of possible duplications and traverse its branches
with a recursive, pruning depth-first search that uses a stack.
Function Overview
• Parameters: binary string passed in as an array of chars, denoted
BIN_STR
• Return value: integer value representing the minimum tandem
duplication distance
• Methods: depth first search, implement a stack
• Each time we go down a branch we push the new branch onto the
stack; once we are finished dealing with this branch, we pop it off the
top of the stack
Choosing the Starting Seed
• If BIN_STR[0] is 0, and BIN_STR[last] is 0, and sum(digits) is 0 (i.e. BIN_STR
has only 0’s): seed is 0
• If BIN_STR[0] is 0, and BIN_STR[last] is 0, and sum(digits) does not equal 0
(i.e. BIN_STR has 1’s in between a start and end 0): seed is 010
• If BIN_STR[0] is 0, and BIN_STR[last] is 1: seed is 01
• If BIN_STR[0] is 1, and BIN_STR[last] is 1, and sum(digits) = len(BIN_STR)
(i.e. BIN_STR has only 1’s): seed is 1
• If BIN_STR[0] is 1, and BIN_STR[last] is 1, and sum(digits) < len(BIN_STR)
(i.e. BIN_STR has 0’s between a start and end 1): seed is 101
• If BIN_STR[0] is 1, and BIN_STR[last] is 0: seed is 10
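The six cases above collapse to a few comparisons. A minimal sketch, assuming BIN_STR is a non-empty Python string of 0s and 1s; choose_seed is an illustrative name:

```python
def choose_seed(bin_str):
    """Pick the seed from the first digit, last digit, and digit content."""
    first, last = bin_str[0], bin_str[-1]
    if first != last:
        return first + last            # 01 or 10
    other = "1" if first == "0" else "0"
    if other in bin_str:
        return first + other + first   # 010 or 101
    return first                       # all 0s or all 1s

print(choose_seed("001001"))
```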
Algorithm
• CURR_STR = seed
• Keep a counter value to track the number of duplication steps
• Depth-first search: traverse the tree rooted at the seed by exploring
each branch, where a branch duplicates one block of consecutive digits
• Explore branches in which we take 1 digit at a time, branches in which
we take 2 digits at a time, 3 at a time, etc.
• Each depth level of the tree consists of all possibilities for taking a
certain number of digits at a time, up to NMAX
• Optimize solution: start with the greatest number of digits, i.e. NMAX
= length of CURR_STR
Algorithm
• Duplicate each particular tandem combination of numbers (in this case, we
duplicate all NMAX digits)
• increment the counter value at each deeper branch.
• Update CURR_STR with tandem duplication and check size of CURR_STR.
• If len(CURR_STR) > len(BIN_STR), then return back up the tree (subtract 1 from
counter value) and proceed to take (NMAX – 1) digits at a time.
• If len(CURR_STR) = len(BIN_STR), then check if strings match. If strings
match, set the variable TAND_DIST to counter value.
• If len(CURR_STR) < len(BIN_STR), repeat tandem duplication of all digits,
going down one level deeper and incrementing the counter variable (a recursive call)
Algorithm
• (NMAX – 1) branch: already explored all possible branches for taking
NMAX digits at a time at current tree level
• Take (NMAX – 1) digits at a time
• Recall: if we return up a branch in the tree, we will return to the
branch’s root representation of CURR_STR and counter value
Algorithm
• Set up a loop with iterator i = 0, for i ≤ k, where k = 1 in this case
(a block of NMAX – k digits has k + 1 possible starting positions).
• For each i, take the next (NMAX – 1) digits, starting with this
particular index in the string
• Perform a tandem duplication by going down one level in the tree and
incrementing the counter variable
• Repeat tests for NMAX digits.
Algorithm
• If len(CURR_STR) > len(BIN_STR), then return back up the tree one level
(subtract 1 from counter value), break from the loop, and proceed to take
(NMAX – 2) digits at a time.
• If len(CURR_STR) = len(BIN_STR), then check if strings match. If strings
match and counter variable < TAND_DIST, set the variable TAND_DIST to
counter value.
• If len(CURR_STR) < len(BIN_STR), then go down a level and repeat the
tandem duplication, starting again from a duplication of all NMAX
digits and proceeding down the ladder as before
• Use recursive algorithm: if at any point, counter value > TAND_DIST, break
from further exploration of this particular branch and return to root branch
at the previous level
Algorithm
• (NMAX – k – 1) branch: already explored all possible branches for
taking NMAX – k digits at a time
• continue by taking (NMAX – k – 1) digits at a time in CURR_STR
Algorithm
• return the value of TAND_DIST, representing the optimal duplication
distance to the given BIN_STR
• TAND_DIST is a global variable, independent of tree level; it persists
as we recurse farther and farther down into the tree
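The steps above can be sketched roughly as follows. This is an approximation under stated assumptions, not the presenter's code: it uses Python recursion in place of an explicit stack, keeps TAND_DIST in a closure variable rather than a true global, and skips block lengths that would overshoot instead of breaking out of the loop. The name dfs_distance is illustrative.

```python
def dfs_distance(bin_str, seed):
    """Depth-first search from the seed, pruned by the best distance so far."""
    target_len = len(bin_str)
    tand_dist = None  # best distance found so far (None until a match)

    def explore(curr_str, counter):
        nonlocal tand_dist
        # Prune: this branch can no longer beat the best known distance.
        if tand_dist is not None and counter >= tand_dist:
            return
        if curr_str == bin_str:
            tand_dist = counter
            return
        nmax = len(curr_str)
        # Longest blocks first, as described on the slides.
        for block in range(nmax, 0, -1):
            if nmax + block > target_len:
                continue  # duplicating this many digits would overshoot
            for i in range(nmax - block + 1):
                end = i + block
                child = curr_str[:end] + curr_str[i:end] + curr_str[end:]
                explore(child, counter + 1)

    explore(seed, 0)
    return tand_dist

print(dfs_distance("001001", "01"))
```

Because the same intermediate string can be reached along several branches, this version may revisit strings; caching results is one possible improvement.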
Example: 001001
• Find minimum duplication distance for 001001
• Start with seed
• String starts with 0, ends in 1:
• Seed = 01
Example: 001001
• CURR_STR = 01:
• Counter = 0
• Duplicate all digits
• Go down a level
• Counter = 1
• New CURR_STR = 0101
• Length less than 6: go down another level
Example: 001001
• CURR_STR = 0101
• Counter = 1
• Duplicate all digits
• Go down a level
• Counter = 2
• New CURR_STR = 01010101
• Length greater than 6
• Done exploring 4-digit combinations: return back up a level to
CURR_STR = 0101
Example: 001001
• CURR_STR = 0101
• Counter = 1
• i = 0: duplicate first three digits
• Go down a level
• Counter = 2
• New CURR_STR = 0100101
• Length greater than 6
• Done exploring 3-digit combinations: return back up a level to
CURR_STR = 0101
Example: 001001
• CURR_STR = 0101
• Counter = 1
• i = 0: duplicate first two digits
• Go down a level
• Counter = 2
• New CURR_STR = 010101
• Length = 6, but strings don’t match
• Go to next branch under the 2-digit node
Example: 001001
• CURR_STR = 0101
• i = 1: duplicate middle two digits
• Go down a level
• Counter = 2
• New CURR_STR = 010101
• Length = 6, but strings don’t match
• Go to next branch under the 2-digit node
Example: 001001
• CURR_STR = 0101
• i = 2: duplicate last two digits
• Go down a level
• Counter = 2
• New CURR_STR = 010101
• Length = 6, but strings don’t match
• Done exploring 2-digit combinations: return back up a level to
CURR_STR = 0101
Example: 001001
• CURR_STR = 0101
• Counter = 1
• i = 0: duplicate first digit
• Go down a level
• Counter = 2
• New CURR_STR = 00101
• Length < 6
• Go to next branch under the 1-digit node
Example: 001001
• This process continues, starting again with duplicating all digits under
this node
• Eventually get to 001001 by duplicating the fourth digit of the node
00101, giving TAND_DIST = 3
• For time’s sake, skip these branches and return to the seed 01:
done exploring the 2-digit combinations
Example: 001001
• CURR_STR = 01:
• Counter = 0
• Duplicate first digit
• Go down a level
• Counter = 1
• New CURR_STR = 001
• Length less than 6: go down another level
Example: 001001
• CURR_STR = 001:
• Counter = 1
• Duplicate all digits
• Go down a level
• Counter = 2
• New CURR_STR = 001001
• STRINGS MATCH!!!
• Counter < TAND_DIST = 3, so set TAND_DIST = 2
Example: 001001
• Would eventually explore rest of branch for the 1 digit combinations,
starting from seed 01
• TAND_DIST at end = 2
Conclusion: Optimization
• Recursive depth-first search of the possibility tree
• Depth-first rather than breadth-first search of the possibility tree:
once one solution is found, we can eliminate branches that would raise
the counter value over TAND_DIST
• Breadth-first search would have required implementation of a queue and
would consider all possible digit combinations at a given level
before proceeding to the next level
• We judged that less efficient, since it traverses branches that pruning would skip.
Conclusion: Optimization
• Optimize: start by checking the case in which we take all NMAX digits
of CURR_STR at a time
• ensures that, if, for example, BIN_STR was simply two copies of
CURR_STR, we would obtain this tandem duplication distance
immediately
• if there is a solution requiring repeated tandem duplications of all
NMAX digits, this would be the first solution we find.
• This gives an early upper bound on TAND_DIST, which may eliminate
branches that go deeper than the tree level represented by TAND_DIST
Cons
• Quite exhaustive
• Prunes only some branches; could probably be simplified further
• Alternative ideas: top-down approach
• Could also put conditional statements checking difference in length
between current string and desired string, so we do not always have
to start by duplicating the maximum number of digits
Umesh Padia
Challenge Problem 1
By Dessie DiMino and Muhammad Younis
Our Solution
• Find the seed
• Work backwards to find the number of steps
• Recursion
• Memoization
Why Work Backwards?
• By working backwards, we can ensure that we have the correct solution by visiting every
potential reduction branch
• The function starts with one string; iteration ensures that all repeats are tested
individually until a base case is found
• The solution is guaranteed to be correct because all paths of string reduction are exhausted
Examples
0110110101 → 0110101 → 01101 → 0101 → 01
11111110 → 11110 → 110 → 10
01101001 → 0101001 → 01001 → 0101 → 01
Why Memoization?
Some reductions lead to the same string, and redoing the recursion for that string is unnecessary
if its result can be stored in a dictionary and looked up.
0110110101 → 0110101 → 01101
0110110101 → 01101101 → 01101
Finding the Seed
• Seed can be found by testing 4 cases, two of which have subcases
• 0 at beginning and 1 at the end => 01 seed
• 1 at beginning and 0 at the end => 10 seed
• 0 at beginning and end
  • If 1 is found anywhere in the middle => 010 seed
  • If no 1s are found => 0 seed
• 1 at beginning and end
  • If 0 is found anywhere in the middle => 101 seed
  • If no 0s are found => 1 seed
Finding the Steps
• Recurse on the entire string being computed (work backwards)
• Base Case 1: the seed is found; return 0
• Base Case 2: the string has been solved previously; return the value stored in the hash table
  • Reduces the number of steps that need to be computed if they have already been tested
• Loop through all possible pairs of adjacent matching substrings, based on how long the string is,
testing longer repeats first
• Each recursive call adds one to the number of steps
Recursive Function
Function (as written):
solver(n, seed, solvedkeys, currentsmallest)
n is the string being computed
seed is the seed (found from the previous part)
solvedkeys is a dictionary of strings already computed
currentsmallest is the smallest number of steps found so far
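A minimal sketch of the recursion with memoization, assuming a simplified signature. It is not the solver function above: the hypothetical min_steps takes only the string, uses functools.lru_cache in place of the solvedkeys dictionary, and leaves out the currentsmallest pruning so that every cached value is an exact minimum.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def min_steps(s):
    """Minimum number of reductions needed to bring s down to its seed."""
    best = None
    n = len(s)
    # Test longer repeats first, as the slides describe.
    for length in range(n // 2, 0, -1):
        for i in range(n - 2 * length + 1):
            if s[i:i + length] == s[i + length:i + 2 * length]:
                reduced = s[:i + length] + s[i + 2 * length:]
                steps = 1 + min_steps(reduced)  # cached after first call
                if best is None or steps < best:
                    best = steps
    # No adjacent repeat left means s is a seed (base case).
    return 0 if best is None else best

print(min_steps("01010101"))
```

Combining the cache with pruning needs care: a value computed under a cutoff is only a bound, so storing it as if it were exact could give wrong answers later.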
Testing Longer Repeats
• The code tests the longest possible repeat first
• Example:
  • String: 01110101
  • Starting index: 0
  • First compares: 0111 with 0101 (repeat length 4)
  • Then compares: 011 with 101 (length 3), and so on until 0 with 1 (length 1)
  • Repeats this for every starting index
• Since we cut off branches that take too long, and longer repeats lead to shorter strings, this allows for less
computing
Why is that Important?
The program prunes branches that would take more steps than the best solution found so far
This avoids computation that isn’t necessary
Certain “branches” of the tree can be cut off
String: 01010101
Each cell shows a potential reduction path as string → reduced string.

Path   Reduction 1          Reduction 2        Reduction 3   Minimum String Length
1      01010101 → 0101      0101 → 01          N/A           2
2-7    01010101 → 010101    010101 → 0101      0101 → 01     3
Pros and Cons
• Pro: Fast on small, nice-looking strings
• Con: Not so fast on longer strings; runtime increases exponentially
  • More branches, more computation
  • Cuts down some branches with memoization
  • Still checks more branches than is necessary
Things to Improve
Decrease the number of branches tested even more
Lowers runtime
Through a global dictionary that is updated (e.g. if 01010101 is input, it remembers its
length to be 8 for all future function calls)
Better method to stop computing certain branches
Competition Time
List of sequences
s1 = 10010011
s2 = 000001000110010100111010110111110000
s3 = 01101001100101101001011001101001
s4 = 11111111111100111
s5 = 11111111111011000
s6 = 0000100110101111000
Challenge Problem 2
In homework 3, you found a circuit for computing parity of 2 and 3 variables with 4
and 8 m-boxes respectively.
By using the same idea, you can compute parity of 4 variables with 12 m-boxes.
Is 12 the smallest size of a circuit of m-boxes for computing the parity of
4 variables?
What is the smallest circuit for parity of 4 variables?
You can use a formal proof or an algorithm/code
to check all possible circuit configurations.
[Figure: three circuits. Parity(a, b) from inputs a and b uses 4 m-boxes. Parity(a,b,c) = Parity(c, Parity(a, b)) uses 8 m-boxes. Parity(a,b,c,d) = Parity(d, Parity(a,b,c)) uses 12 m-boxes.]
Changnan Peng
2017.6.1
Outline
Tell the computer what XOR and all other binary functions are
Tell the computer what an m-box is
Tell the computer what we want to do
Breadth-first search (BFS)
Depth-first search (DFS)
The code
Results
Tell the computer what binary functions are
What is a binary function?
Tell the computer what binary functions are
A binary function is defined by its syntax box.
a  b  a⊕b
0  0  0
0  1  1
1  0  1
1  1  0
a  b  c  d  a⊕b⊕c⊕d
0  0  0  0  0
0  0  0  1  1
0  0  1  0  1
0  0  1  1  0
0  1  0  0  1
0  1  0  1  0
0  1  1  0  0
0  1  1  1  1
1  0  0  0  1
1  0  0  1  0
1  0  1  0  0
1  0  1  1  1
1  1  0  0  0
1  1  0  1  1
1  1  1  0  1
1  1  1  1  0
Tell the computer what an m-box is
What is an m-box?
X  Y  m(X, Y)
0  0  1
0  1  1
1  0  1
1  1  0
Tell the computer what an m-box is
Input binary functions?
a  b  c  d  X(a,b,c,d)  Y(a,b,c,d)  m(X, Y)
0  0  0  0  X0          Y0          m(X0,Y0)
0  0  0  1  X1          Y1          m(X1,Y1)
0  0  1  0  X2          Y2          m(X2,Y2)
0  0  1  1  X3          Y3          m(X3,Y3)
0  1  0  0  X4          Y4          m(X4,Y4)
0  1  0  1  X5          Y5          m(X5,Y5)
0  1  1  0  X6          Y6          m(X6,Y6)
0  1  1  1  X7          Y7          m(X7,Y7)
1  0  0  0  X8          Y8          m(X8,Y8)
1  0  0  1  X9          Y9          m(X9,Y9)
1  0  1  0  X10         Y10         m(X10,Y10)
1  0  1  1  X11         Y11         m(X11,Y11)
1  1  0  0  X12         Y12         m(X12,Y12)
1  1  0  1  X13         Y13         m(X13,Y13)
1  1  1  0  X14         Y14         m(X14,Y14)
1  1  1  1  X15         Y15         m(X15,Y15)
Tell the computer what an m-box is
Example: m(a, a) = NOT a
Example: m(a, b) = NOT(a AND b) = (NOT a) + (NOT b), where + is OR
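One possible way to tell the computer what an m-box is: store each binary function as its syntax-box output column, written as a string of 0s and 1s, and apply the m-box row by row. The representation and the name m_box are assumptions for illustration; by the table above, the m-box outputs 0 only when both inputs are 1.

```python
def m_box(x, y):
    """Apply the m-box row by row to two syntax-box columns."""
    # The output row is 0 only when both input rows are 1.
    return "".join("0" if p == "1" and q == "1" else "1"
                   for p, q in zip(x, y))

a, b = "0011", "0101"   # columns for the two input variables
print(m_box(a, a))      # column for NOT a
print(m_box(a, b))      # column for NOT(a AND b)
```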
Tell the computer what we want to do
Example: a⊕b
[Figure: circuit of four m-boxes computing a⊕b]
Tell the computer what we want to do
Step by step:
We have two functions: f0 = a = 0011 and f1 = b = 0101
We use an m-box on f0 and f1 and get a new function: f2 = m(0011, 0101) = 1110
We use an m-box on f0 and f2 and get a new function: f3 = m(0011, 1110) = 1101
We use an m-box on f1 and f2 and get a new function: f4 = m(0101, 1110) = 1011
We use an m-box on f3 and f4 and get the target function: f5 = m(1101, 1011) = 0110
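The step-by-step construction can be replayed mechanically. In this sketch a circuit is a list of index pairs, where pair (i, j) means "apply an m-box to functions f_i and f_j and append the result". The pair encoding mirrors the lists shown in the Results slides; the names m_box and run_circuit are illustrative.

```python
def m_box(x, y):
    """m-box on two syntax-box columns: output 0 only where both are 1."""
    return "".join("0" if p == "1" and q == "1" else "1"
                   for p, q in zip(x, y))

def run_circuit(inputs, wiring):
    """Return the list of all functions f0, f1, ... built by the circuit."""
    functions = list(inputs)
    for i, j in wiring:
        functions.append(m_box(functions[i], functions[j]))
    return functions

steps = run_circuit(["0011", "0101"], [(0, 1), (0, 2), (1, 2), (3, 4)])
for index, column in enumerate(steps):
    print(f"f{index} = {column}")
```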
Tell the computer what we want to do
If we do not know the right path, we can try every possible way at each
step:
Tell the computer what we want to do
Tree structure.
Tell the computer what we want to do
The list with the target function is located somewhere in the tree, and we
need to find the path to it.
Breadth-first search (BFS)
The BFS algorithm searches layer by layer.
Depth-first search (DFS)
The DFS algorithm searches branch by branch.
A maximum depth limit is required.
Increase the maximum depth until a solution is found.
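A minimal sketch of depth-limited DFS with an increasing depth limit, written independently of the presenter's code (which is not reproduced in this transcript). The names input_columns, search, and find_circuit are hypothetical, and the argument order only loosely follows the find(...) calls in the Results slides.

```python
from itertools import combinations_with_replacement

def m_box(x, y):
    """m-box on two syntax-box columns: output 0 only where both are 1."""
    return "".join("0" if p == "1" and q == "1" else "1"
                   for p, q in zip(x, y))

def input_columns(num_vars):
    """Syntax-box columns of the input variables, e.g. 0011 and 0101."""
    rows = 2 ** num_vars
    return ["".join(str((r >> (num_vars - 1 - v)) & 1) for r in range(rows))
            for v in range(num_vars)]

def search(functions, wiring, target, boxes_left):
    """Depth-limited DFS: try every pair of known functions as the next m-box."""
    if target in functions:
        return wiring
    if boxes_left == 0:
        return None
    for i, j in combinations_with_replacement(range(len(functions)), 2):
        new = m_box(functions[i], functions[j])
        found = search(functions + [new], wiring + [(i, j)],
                       target, boxes_left - 1)
        if found is not None:
            return found
    return None

def find_circuit(num_vars, target, max_boxes):
    """Raise the depth limit one box at a time until a circuit is found."""
    for limit in range(max_boxes + 1):
        found = search(input_columns(num_vars), [], target, limit)
        if found is not None:
            return found
    return None

print(find_circuit(2, "0110", 4))
```

Since the limit grows one box at a time, the first circuit returned uses the fewest m-boxes. The search is still exponential in the number of boxes, which is consistent with the long running times reported for four variables.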
The code
Results
find(2, 0110, 4) gives [(0, 1), (0, 2), (1, 2), (3, 4)]
find(2, 0110, 3) gives no output
Results
find(4, 0110100110010110, 10)
Find whether a⊕b⊕c⊕d can be constructed with 10 m-boxes.
Searching the branch starting with [(0, 0), …] took 18 hours, and
searching the branch starting with [(0, 1), …] took 20 hours. Neither
produced any output.
Since a, b, c, and d are symmetric, every other choice of first m-box is
equivalent to one of these two, so we can conclude that a⊕b⊕c⊕d
cannot be constructed with 10 m-boxes.
Results
find(4, 0110100110010110, 11) is still running. No output so far.
It is unknown whether a⊕b⊕c⊕d can be constructed with 11 m-boxes.
find(4, 0110100110010110, 12) correctly recognized the solution [(0, 1), (0, 4),
(1, 4), (5, 6), (2, 7), (2, 8), (7, 8), (9, 10), (3, 11), (3, 12), (11, 12), (13, 14)].
Thank you!