Notation: (p.#, s.#) means (PDF page number, slide number as written on the slides).

Scaled Pattern Matching

(p.1, s.0.1) Motivation: Searching for templates in aerial photographs
Input: Aerial photo/image
Template: The pattern we are looking for, a tank for example
Task: Search for locations where the template appears in the image

The problems ahead of us:
1. Rotation: What if the template is rotated relative to the one we are looking for?
2. Error: What if there is an error in part of the template (partial match)?
3. Size: What if the template is scaled relative to the one we are looking for?

(p.4, s.3) If there is no need for exact matching (that is, some error is acceptable), algorithms such as suffix trees & LCA can deal with the problems of local error and orientation by approximation.

(p.7) Let's look at the problem of digitizing newspaper stories from the point of view of size only (we are not searching for matches with errors or rotations). We keep a dictionary of fonts and search for appearances in all sizes.

(p.8, s.6) Problem: The problem is inherently inexact. What if the appearance is 1.5 times bigger? What is half a pixel?
Solution until now: Natural scales only. Consider 1, 2, 3, 4, 5, … as the only scales we look for (discrete scales).

(p.9, s.6b) Definition:
Text of size $n \times n$: $T = (a_{ij})$, $1 \le i, j \le n$
Pattern of size $m \times m$: $P = (a_{ij})$, $1 \le i, j \le m$
Find all occurrences of the pattern in the text in all discrete sizes.
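To make the definition concrete, here is a minimal sketch of discrete scaling and a brute-force search over all natural scales. The names scale_up and find_scaled are hypothetical, matrices are assumed to be lists of equal-length strings, and this naive search is only an illustration, not the linear-time algorithm referred to in these notes.

```python
def scale_up(pattern, s):
    """Replace every symbol of the pattern by an s x s block of that symbol."""
    return ["".join(ch * s for ch in row) for row in pattern for _ in range(s)]


def find_scaled(text, pattern):
    """Brute force: return all (row, col, s) where the pattern scaled by s occurs.

    Rows and columns are 0-based; s ranges over 1 .. n // m.
    """
    n, m = len(text), len(pattern)
    hits = []
    for s in range(1, n // m + 1):
        scaled = scale_up(pattern, s)
        k = m * s  # side length of the scaled pattern
        for i in range(n - k + 1):
            for j in range(n - k + 1):
                if all(text[i + a][j:j + k] == scaled[a] for a in range(k)):
                    hits.append((i, j, s))
    return hits


# Hypothetical 4 x 4 text and 2 x 2 pattern, chosen only for illustration.
text = ["XXOO",
        "XXOO",
        "OOXX",
        "OOXX"]
pattern = ["XO",
           "OX"]
print(find_scaled(text, pattern))  # [(1, 1, 1), (0, 0, 2)]
```

In this toy input the pattern appears once at scale 1 (in the middle of the text) and once at scale 2 (the whole text).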
(p.10, s.5-6) Our problem: Discrete Exact Scaled Matching
Input: a text T and a pattern P.
[Figure: a text T and a pattern P drawn as grids of X and O symbols; the grid layout is not recoverable here.]
In the example above we can see matches at scales 1, 2 and 3.

Example of scaling by 3:

Pattern:
Z Y X
U V W
T S R

Pattern scaled by 3:
Z Z Z Y Y Y X X X
Z Z Z Y Y Y X X X
Z Z Z Y Y Y X X X
U U U V V V W W W
U U U V V V W W W
U U U V V V W W W
T T T S S S R R R
T T T S S S R R R
T T T S S S R R R

There is a linear-time algorithm, $O(n^2)$ for an $n \times n$ text, that finds scaled matches in the dictionary problem (AC-96 (p.6, s.4)).

(p.11, s.9) Idea: Fix a scale $s$ and divide every text dimension into $n/s$ parts. Every square is of size $s \times s$, so there are $n^2/s^2$ squares in the text. There is a constant amount of work for each square ($s$-block).

(p.12, s.10) How many scales do we need to check? How many valid scales are there?
The highest scale we need to check is $n/m$; beyond that the scaled pattern exceeds the text.
Time for searching for a match of the pattern scaled by $s$: $n^2/s^2$.
Total time for searching for the pattern at any scale:
$\sum_{s=1}^{n/m} \frac{n^2}{s^2} = n^2 \sum_{s=1}^{n/m} \frac{1}{s^2} = O(n^2)$,
since the series $\sum_{s \ge 1} \frac{1}{s^2}$ converges to a constant.

(p.13, s.11) Problem: Real scales are an open problem even for strings…
How do we define scaling in one dimension?
Let's look at the pattern 'aabcccbb'.
Scaled by 2 it looks like 'aaaabbccccccbbbb': every item is doubled.
But what about scaling by 1.5? What will the pattern look like then?
Every run whose length is not an integer after scaling is truncated to an integer (a version with rounding is also possible).
Scaled by 1.5 the pattern looks like 'aaabccccbbb': a 'half' 'b' (from the left b run) and a 'half' 'c' were truncated.
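The truncation rule above can be sketched in a few lines. This is only an illustration under the assumption that scaling acts on maximal runs of equal symbols; the function names run_lengths and scale_string are hypothetical, and exact fractions are used to avoid floating-point surprises in the floor.

```python
from fractions import Fraction
from itertools import groupby
from math import floor


def run_lengths(s):
    """Return [(symbol, run_length), ...] for the maximal runs of the string s."""
    return [(ch, len(list(group))) for ch, group in groupby(s)]


def scale_string(s, alpha):
    """Scale every run of s by alpha, truncating each scaled length to an integer."""
    alpha = Fraction(alpha)
    return "".join(ch * floor(alpha * r) for ch, r in run_lengths(s))


print(scale_string("aabcccbb", 2))               # aaaabbccccccbbbb
print(scale_string("aabcccbb", Fraction(3, 2)))  # aaabccccbbb
```

Both outputs agree with the two scaled strings given in the text.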
(p.14, s.12) FORMALLY:
Denote by $a^r$ the symbol $a$ repeated $r$ times: $a^r = aa \cdots a$ ($r$ times).
We treat a run of an item as an instance of a single item.

PROBLEM DEFINITION 1:
Input: Pattern $P = a_1^{r_1} a_2^{r_2} \cdots a_j^{r_j}$, where $|P| = \sum_i r_i = m$
Text $T$, where $|T| = n$
Output: All text locations where $a_1^{c_1} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{c_j}$ appears, for some $\alpha \ge 1$, $c_1 \ge \lfloor \alpha r_1 \rfloor$, $c_j \ge \lfloor \alpha r_j \rfloor$.
This definition includes appearances of the pattern in the text with any number of extra $a_1$'s before it and any number of extra $a_j$'s after it, beyond the respective truncated numbers.

(p.15, s.13) Remark: $\alpha \ge 1$ means we only scale up.
Reason: We need to avoid conceptual problems of loss of resolution. The conceptual problem is that from "far enough" away everything looks the same, and we cannot determine mismatches. Indeed, if scaling down ($\alpha < 1$) were allowed, then for a small enough scale (for example $\alpha < 1/m$, where every $\lfloor \alpha r_i \rfloor$ is 0) there would be a match at every text location.

(p.16, s.14) PROBLEM DEFINITION 2 (SIMPLIFIED DEFINITION):
Look for $a_1^{\lfloor \alpha r_1 \rfloor} a_2^{\lfloor \alpha r_2 \rfloor} \cdots a_{j-1}^{\lfloor \alpha r_{j-1} \rfloor} a_j^{\lfloor \alpha r_j \rfloor}$ in the text.
Example: $P$ = aabcccbbbb, $\alpha = \frac{3}{2}$:
$a^{\lfloor \frac{3}{2} \cdot 2 \rfloor} b^{\lfloor \frac{3}{2} \cdot 1 \rfloor} c^{\lfloor \frac{3}{2} \cdot 3 \rfloor} b^{\lfloor \frac{3}{2} \cdot 4 \rfloor} = a^3 b^1 c^4 b^6$
d aaa b cccc bbbbbb e: in this text there is a match by both Definition 1 and Definition 2.
daaabccccbbbbbbbbe: in this text there is a match only by Definition 1 but not by Definition 2 (the last run has 8 b's, more than the 6 of the scaled pattern, which Definition 1 allows for the last symbol).

(p.17, s.15) WHY ARE THE DEFINITIONS EQUIVALENT:
Split the text and the pattern into a symbol part $T^S$, $P^S$ and a length part $T^L$, $P^L$.
Example:
$P$ = aabcccbbbb, $P^S$ = abcb, $P^L$ = 2134
$T$ = daaabccccbbbbbbe, $T^S$ = dabcbe, $T^L$ = 131461
Time for the split: $O(n + m)$
Finding $P^S$ in $T^S$: $O(n + m)$ (e.g. KMP)
The hard part: finding $P^L$ in $T^L$

(p.18, s.16) Claim: Solving Definition 2 in time $O(f(n))$ $\Rightarrow$ solving Definition 1 in time $O(f(n))$.
Why?
Find $a_2^{r_2} \cdots a_{j-1}^{r_{j-1}}$ by Definition 2, i.e. find the matches of the $m - 2$ inner items (all but the first and the last). Time: $O(f(n))$.
For each match, verify in constant time that the first & last symbols of the pattern match in $T^S$ and $T^L$. Time for verifying: $O(n)$, since there can be at most $n$ matches.
Total time: $O(f(n) + n) = O(f(n))$.

(p.19, s.17) Naive Algorithm for Matching $P^L$ in $T^L$:
Before we start, remember that $t$, $p$ are numeric values in $T^L$ and $P^L$ respectively.
Each run in the pattern and in the text is represented by its length in $P^L$ and $T^L$. We are trying to find a scale $\alpha$ such that every pattern value, scaled by $\alpha$, equals the corresponding text value at some position. We do that for every position by finding the interval that $\alpha$ needs to lie in.
For each text location, position the pattern starting at that location and calculate the interval $\left[\frac{t}{p}, \frac{t+1}{p}\right)$ for each resulting <text, pattern> pair.
This is the interval of possible scales, since:
For every $\alpha < \frac{t}{p}$: $\lfloor \alpha p \rfloor < t$, and there is no match with that $\alpha$.
For every $\alpha \ge \frac{t+1}{p}$: $\lfloor \alpha p \rfloor \ge t + 1$, and there is no match with that $\alpha$.

(p.20, s.18) If the intersection of all intervals is non-empty, then there is a match.
Example:
[Table: for each text index, the intervals obtained from the aligned ($T^L$, $P^L$) pairs; the individual values are not recoverable here. At the first location the intersection is already empty after the first pairs, so there is no need to check the other pairs. At location 2 the intersection is non-empty, so there is a match at location 2.]
Time: $O(mn)$

(p.21, s.19) Improvement: Parameterized Matching
Introduced: Baker 1994
Motivation: Trying to reveal "copied" code
$P$ m-matches $T$ at location $i$ if there exists a bijection $\pi: \Sigma \to \Sigma$ such that $\pi(P) = \pi(p_1)\pi(p_2) \cdots \pi(p_m) = t_i t_{i+1} \cdots t_{i+m-1}$.

(p.22, s.20) Example:
$P$ = abaccbba
$T$ = badadbbaadcd
At the third location we have an m-match, with $a \to d$, $b \to a$, $c \to b$.
Claim (AFM-94): For $\Sigma$ that can be sorted in linear time (e.g. $\Sigma = \{1, \ldots, n\}$), parameterized matching can be done in time $O(n)$.

(p.23, s.21) Lemma: There exists $\alpha \ge 1$ for which $P^L$ matches $T^L$ at location $i$ scaled by $\alpha$ only if $P^L$ m-matches $T^L$ at $i$.
Proof: Assume $P^L$ does not m-match $T^L$ at location $i$. Let us look at the possible reasons for this m-mismatch and thereby prove the lemma.

Situation (i):
$T^L$: … a … c ≠ a …
$P^L$: … b … b …
W.L.O.G. $c \ge a + 1$.
Let us check whether a scaled match is possible. We check it for the smallest possible $c$ (the closest numerator, since the denominator is the same), which gives the best chance for a scaled match. This possibility is $c = a + 1$.
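The two intervals are $\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a+1}{b}, \frac{a+2}{b}\right)$, and they are disjoint: $\left[\frac{a}{b}, \frac{a+1}{b}\right) \cap \left[\frac{a+1}{b}, \frac{a+2}{b}\right) = \emptyset$. A larger $c$ only moves the second interval further to the right, so no scale fits both pairs.

(p.24, s.22) Situation (ii):
$T^L$: … a … a …
$P^L$: … b … c ≠ b …
W.L.O.G. $c \ge b + 1$.
Let us check whether a scaled match is possible. We check it for the smallest possible $c$ (the denominator closest to $b$, since the numerator is the same), which gives the best chance for a scaled match. This possibility is $c = b + 1$.
The two intervals are $\left[\frac{a}{b}, \frac{a+1}{b}\right)$ and $\left[\frac{a}{b+1}, \frac{a+1}{b+1}\right)$.
The intersection is non-empty only if
$\frac{a+1}{b+1} > \frac{a}{b} \iff ab + b > ab + a \iff b > a \iff \frac{a}{b} < 1$.
In that case every $\alpha$ in the intersection satisfies $\alpha < \frac{a+1}{b+1} \le 1$. But this can never happen if we are looking for scaling up only and not scaling down, i.e. $\alpha \ge 1$.

(p.25, s.23) Algorithm for Real Scaled String Matching
Let $p_{i_1}, p_{i_2}, \ldots, p_{i_l}$ be the different numbers in $P^L$.
1. m-match $P^L$ in $T^L$
2. For each match, check the intersection of the intervals between $p_{i_1}, \ldots, p_{i_l}$ & the corresponding symbols in $T^L$
End

Example: $P^L$ = 2 3 2 3 2, $p_{i_1} = 2$, $p_{i_2} = 3$
Note: it does not matter which symbol the first 2 stands for and which symbol another 2 stands for; the values are needed only to generate the intervals for the intersection check (the second step of the algorithm).

Index:  1   2   3   4   5   6   7   8   9   10  11  12
$T^L$:  5   6   5   6   5   6   10  6   10  6   10  7

m-match (index no.)   Scale match (intervals)
1                     $[2\tfrac{1}{2}, 3)$, $[2, 2\tfrac{1}{3})$
2                     $[3, 3\tfrac{1}{2})$, $[1\tfrac{2}{3}, 2)$
6                     $[3, 3\tfrac{1}{2})$, $[3\tfrac{1}{3}, 3\tfrac{2}{3})$
7                     $[5, 5\tfrac{1}{2})$, $[2, 2\tfrac{1}{3})$

Only at index 6 do the two intervals intersect, in $[3\tfrac{1}{3}, 3\tfrac{1}{2})$, so that is the only scaled match.

(p.26, s.26) Important Fact
$\sum_{j=1}^{l} p_{i_j} \le m$
So there are at most $O(\sqrt{m})$ different $p_{i_j}$'s.
Algorithm Time
$O(n)$: Parameterized matching (for $\Sigma = \{1, \ldots, n\}$, as claimed in (p.22, s.20))
$O(n \sqrt{m})$: Verification of the interval intersection for each location of a parameterized match (no more than $O(\sqrt{m})$ different $p_{i_j}$'s, at most $n$ locations to check)
Total: $O(n \sqrt{m})$

(p.27, s.27) TIGHTER ANALYSIS: limit on the number of possible m-matches
Lemma: Let $|P| = m$, $|T| = n$, and let $p_{i_1}, \ldots, p_{i_l}$ be the different numbers in $P^L$. Then there are at most $\frac{2n}{l}$ m-matches of $P^L$ in $T^L$.
MEANING: Since verification, as seen above, is $O(l)$ per m-match, the lemma implies verification time $O\left(l \cdot \frac{2n}{l}\right) = O(n)$.

(p.28, s.28) Proof: Recall that every $p_{i_j}$ denotes a number of consecutive occurrences (a run length), taken at the first symbol with that number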
of occurrences. Any later run with the same length, for any symbol, does not appear as a new $p_{i_j}$.
Now let us look at a place in the text where there is an m-match, restricted to the positions of the first appearances of $p_{i_1}, \ldots, p_{i_l}$:
$P^L$: $p_{i_1}$ $p_{i_2}$ … $p_{i_l}$
$T^L$: $a_1$ $a_2$ … $a_l$
Every $a_i$ is the value of $T^L$ aligned with the $i$-th of these pattern values.
Since we know that there is an m-match at this position and all the $p_{i_j}$ are different, all the $a_i$ must be different; otherwise there would be no m-match at this position.
The $a_i$ are therefore $l$ distinct positive integers, so their sum is
$\sum_{i=1}^{l} a_i \ge 1 + 2 + \cdots + l \ge \frac{l^2}{2}$.

(p.29, s.29) Let $x$ be the total number of m-matches in the text. Our goal is to bound $x$.
We bound $x$ with the following trick: the sum, over all m-matches, of the text elements that match the first occurrences of the $p_{i_j}$'s in the pattern is at least $\frac{x l^2}{2}$.
BUT: This sum counts overlapping matches too; an m-match can start in the middle of another m-match, so some text elements may be counted more than once.
HOW MANY OVERLAPS CAN THERE BE?

(p.30, s.30) Each text location is counted by at most $l$ m-matches, since each m-match counts only the $l$ text elements $a_1, \ldots, a_l$ aligned with first occurrences in the pattern.
Dividing the sum by the maximum possible overlap gives the total count without overlaps:
$\ge \frac{x l^2}{2} \cdot \frac{1}{l} = \frac{x l}{2}$
Clearly, when nothing is counted twice (overlaps are left out), the count is at most the sum of all the elements of $T^L$, which is the text length $n$:
$\frac{x l}{2} \le n$
This gives the limit on the maximum number of parameterized matches in the text:
$x \le \frac{2n}{l}$
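The two-step algorithm from (p.25, s.23) can be sketched as follows. This is a minimal illustration under several assumptions: it works on the length parts only, in the sense of Definition 2 (no symbol-part check and no relaxed first/last run), the m-match test is a naive per-location check rather than the linear-time algorithm claimed for AFM-94, and the names scale_interval, m_matches and real_scaled_matches are hypothetical.

```python
from fractions import Fraction


def scale_interval(t, p):
    """Scales alpha with floor(alpha * p) == t form the interval [t/p, (t+1)/p)."""
    return Fraction(t, p), Fraction(t + 1, p)


def m_matches(pl, tl, i):
    """Naive check that a bijection maps pl onto tl[i:i+len(pl)] (0-based i)."""
    forward, backward = {}, {}
    for p, t in zip(pl, tl[i:i + len(pl)]):
        if forward.setdefault(p, t) != t or backward.setdefault(t, p) != p:
            return False
    return True


def real_scaled_matches(pl, tl):
    """Return {location: (lo, hi)}: 1-based locations with a scale in [lo, hi), lo >= 1."""
    result = {}
    for i in range(len(tl) - len(pl) + 1):
        if not m_matches(pl, tl, i):  # step 1: parameterized match
            continue
        lo, hi = Fraction(1), None    # alpha >= 1: scaling up only
        seen = set()
        for p, t in zip(pl, tl[i:]):  # step 2: one interval per distinct pattern value
            if p in seen:
                continue
            seen.add(p)
            a, b = scale_interval(t, p)
            lo = max(lo, a)
            hi = b if hi is None else min(hi, b)
        if lo < hi:                   # non-empty intersection
            result[i + 1] = (lo, hi)
    return result


# The example from (p.25, s.23).
PL = [2, 3, 2, 3, 2]
TL = [5, 6, 5, 6, 5, 6, 10, 6, 10, 6, 10, 7]
print(real_scaled_matches(PL, TL))  # only location 6, with alpha in [10/3, 7/2)
```

Checking one interval per distinct pattern value is enough after an m-match, because equal pattern values are then aligned with equal text values and produce the same interval.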