Rules for Approximate String Matching
R.C.T. Lee

Rule 1
Consider two substrings A1 and A2, where A1 consists of P1 followed by S1 and A2 consists of P2 followed by S2.
If ed(A1, A2) ≦ k and S1 = S2, then ed(P1, P2) ≦ k.
• Rule 1: [AKLLLR2000], [H2005], [HHLS2006], [JB2000], [LV89], [NB99], [NB2000], [S80], [TU93], and [WM92].

Rule 2
Let A and B be two strings and let m be the length of B.
If ed(A, B) ≦ k, then the length of A must be between m-k and m+k.
• Rule 2: [FN2004], [NB99], [NB2000] and [TU93].

Rule 3
If S1 contains S1’ completely and the distance between S1’ and any substring of P is larger than k, then ed(S1, P) > k.
• Rule 3: [ALP2004].

Rule 4
[Figure: T with substring S1; P with substring S2, before and after the move]
For any substring S1 in T, if there exists a substring S2 in P to the left of S1 such that ed(S1, S2) ≦ k and S2 is the rightmost such substring, then move P to align S1 and S2.
• Rule 4: [ALP2004].

Based upon Rule 3 and Rule 2, we have Rule 5.

Rule 5
If the window size is m-k and there exists a substring S1 in the window such that the distance between S1 and any substring of P is larger than k, then we can safely move P as follows:
[Figure: T, the window of size m-k containing S1, and P after the move]

If Rule 5 is not satisfied, it means the following: for every substring S1 in T, there exists a substring S2 in P such that ed(S1, S2) ≦ k.

Rule 5-1
If Rule 5 is not satisfied, we can only move P by 1 step as follows:
[Figure: T, the window of size m-k containing S1, and P after the move]
• Rule 5: [HN2005].

Rule 6
Hamming Distance(A, B) ≧ Edit Distance(A, B).
• Rule 6: [AKLLLR2000], [FN2004] and [TU93].

Rule 7
For strings A and B, if there are k+1 characters in A which do not appear in B, then ed(A, B) > k.

Rule 7-1
Let A and B be two strings. Let there be k+1 characters a1, a2, …, ak+1 in A, where ai is aligned with bi in B. If no ai appears in B[i-k, i+k], then ed(A, B) > k.
• Rule 7: [TU93].

Rule 8
Let there be two strings A and B. Let B be divided into j pieces B1, B2, …, Bj. If ed(A, B) ≦ k, there is at least one substring Ai in A such that $ed(A_i, B_i) \le \lfloor k/j \rfloor$.

Rule 8-1
Let A and B be two strings. Let B be divided into j pieces B1, B2, …, Bj.
If for every Bi and every substring S of A, $ed(S, B_i) > \lfloor k/j \rfloor$, then ed(A, B) > k.

Rule 8-2
Let A and B be two strings. Let the lengths of A and B be m+k and m respectively. Let B be divided into j pieces B1, B2, …, Bj. Let AP be a prefix of A. If for every Bi and every substring S of A, $ed(S, B_i) > \lfloor k/j \rfloor$, then ed(AP, B) > k.
• Rule 8: [NB99] and [NB2000].

Rule 9
Let A and B be two strings with lengths m+k and m respectively. Let A’ be the prefix of A with length m-k. Let there be j characters a1, a2, …, aj in A’. Let the number of times that ai appears in A’ and B be N(A’, ai) and N(B, ai) respectively. Let Ci = N(A’, ai) - N(B, ai). Let AP be any prefix of A. If $\sum_{C_i > 0} C_i > k$, then ed(AP, B) > k.

Rule 9-1
Let A and B be two strings with lengths m+k and m respectively. Let there be j characters a1, a2, …, aj in A. Let the number of times that ai appears in A and B be N(A, ai) and N(B, ai) respectively. Let Ci = N(B, ai) - N(A, ai). Let AP be any prefix of A. If $\sum_{C_i > 0} C_i > k$, then ed(AP, B) > k.

Rule 10
[Figure: P’ occurs in T at position i; the window T[i-k, i+m+k] has length m+2k]
Let P and T be two strings with lengths m and n respectively. If P matches a substring P’ of T at position i, then any substring S of T[i-k, i+m+k] may satisfy ed(S, P) ≦ k.
• Rule 10: [NB99].

Rule 11
Let P and Q be two strings. Let P be divided into pieces P1, P2, …, Pn. For each Pi, let Qi be the substring of Q for which ed(Pi, Qi) is the smallest.
If $\sum_{i=1}^{n} ed(P_i, Q_i) > k$, then ed(P, Q) > k.

Application of Rule 11
Let W be a window in T. For each piece Pi of P, let ti be the substring of W for which ed(ti, Pi) is the smallest.
If for some n, $\sum_{i=1}^{n} ed(t_i, P_i) > k$, then ed(W, P) > k.

References
• [AKLLLR2000] Text Indexing and Dictionary Matching with One Error, Amir, A., Keselman, D., Landau, G. M., Lewenstein, M., Lewenstein, N. and Rodeh, M., Journal of Algorithms, Vol. 37, 2000, pp. 309-325.
• [ALP2004] Faster Algorithms for String Matching with k Mismatches, Amir, A., Lewenstein, M. and Porat, E., Journal of Algorithms, Vol. 50, 2004, pp. 257-275.
• [FN2004] Average-Optimal Multiple Approximate String Matching, Kimmo Fredriksson and Gonzalo Navarro, ACM Journal of Experimental Algorithmics, Vol. 9, Article No. 1.4, 2004, pp. 1-47.
• [GG86] Improved String Matching with k Mismatches, Galil, Z. and Giancarlo, R., SIGACT News, Vol. 17, No. 4, 1986, pp. 52-54.
• [H2005] Bit-parallel approximate string matching algorithms with transposition, Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229.
• [HHLS2006] Approximate String Matching Using Compressed Suffix Arrays, Trinh N. D. Huynh, W. K. Hon, T. W. Lam and W. K. Sung, Theoretical Computer Science, Vol. 352, 2006, pp. 240-249.
• [HN2005] Bit-parallel Witnesses and their Applications to Approximate String Matching, Heikki Hyyrö and Gonzalo Navarro, Algorithmica, Vol 4, No. 3, 2005, pp. 203-231.
• [JB2000] Approximate string matching using factor automata, Jan Holub and Borivoj Melichar, Theoretical Computer Science, Vol. 249, 2000, pp. 305-311.
• [LV86] String Matching with k Mismatches by Using Kangaroo Method, Landau, G. M. and Vishkin, U., Theoretical Computer Science, Vol. 43, 1986, pp. 239-249.
• [LV89] Fast Parallel and Serial Approximate String Matching, G. Landau and U. Vishkin, Journal of Algorithms, Vol. 10, 1989, pp. 157-169.
• [NB99] Very fast and simple approximate string matching, G. Navarro and R. Baeza-Yates, Information Processing Letters, Vol. 72, 1999, pp. 65-70.
• [NB2000] A Hybrid Indexing Method for Approximate String Matching, Gonzalo Navarro and Ricardo Baeza-Yates, Vol. 1, No. 1, 2000, pp. 205-239.
• [S80] String Matching with Errors, Sellers, P. H., Journal of Algorithms, Vol. 20, No. 1, 1980, pp. 359-373.
• [TU93] Approximate Boyer-Moore String Matching, J. Tarhio and E. Ukkonen, SIAM Journal on Computing, Vol. 22, No. 2, 1993, pp. 243-260.
• [WM92] Fast Text Searching: Allowing Errors, Sun Wu and Udi Manber, Communications of the ACM, Vol. 35, 1992, pp. 83-91.
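
Illustration of Rules 2, 6, 7 and 8-1
Several of the rules above can serve as cheap filters that reject a pair of strings before the full edit distance is computed. The Python sketch below is one possible illustration, not code from the slides: the function names (edit_distance, hamming_distance, rule2_may_match, rule7_may_match, rule8_may_match, within_k) are hypothetical, ed is assumed to be the Levenshtein distance, and Rule 8-1 is applied with the specific choice j = k+1, so that ⌊k/j⌋ = 0 and each piece only needs an exact substring search.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance by dynamic programming (assumed meaning of ed)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def hamming_distance(a: str, b: str) -> int:
    """Number of mismatching positions; defined for equal lengths only."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs strings of equal length")
    return sum(x != y for x, y in zip(a, b))


def rule2_may_match(a: str, b: str, k: int) -> bool:
    """Rule 2: ed(a, b) <= k requires len(b)-k <= len(a) <= len(b)+k."""
    return abs(len(a) - len(b)) <= k


def rule7_may_match(a: str, b: str, k: int) -> bool:
    """Rule 7: k+1 characters of a that never appear in b force ed(a, b) > k."""
    chars_in_b = set(b)
    missing = sum(1 for ch in a if ch not in chars_in_b)
    return missing <= k


def rule8_may_match(a: str, b: str, k: int) -> bool:
    """Rule 8-1 with j = k+1 pieces, so each piece must occur exactly in a."""
    j = k + 1
    base, extra = divmod(len(b), j)
    start = 0
    for i in range(j):
        size = base + (1 if i < extra else 0)
        piece = b[start:start + size]
        start += size
        if piece in a:  # an empty piece always occurs, so nothing is pruned
            return True
    return False


def within_k(a: str, b: str, k: int) -> bool:
    """Decide ed(a, b) <= k, trying the cheap filters first."""
    if not rule2_may_match(a, b, k):
        return False
    if not rule7_may_match(a, b, k):
        return False
    if not rule8_may_match(a, b, k):
        return False
    return edit_distance(a, b) <= k


if __name__ == "__main__":
    print(within_k("kitten", "sitting", 3))
    print(within_k("kitten", "sitting", 2))
    # Rule 6 on equal-length strings: Hamming distance is an upper bound.
    print(hamming_distance("abcde", "bcdea") >= edit_distance("abcde", "bcdea"))
```

The filters only ever answer "cannot match" or "may match", so within_k still falls back to the full dynamic program whenever no rule rejects the pair. Choosing j = k+1 is a design choice made for simplicity; other values of j would need an approximate search for each piece.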