String Matching
Finite automata based algorithm
R. Inkulu
http://www.iitg.ac.in/rinkulu/
(Finite automata based string matching algo)
σ function
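• The suffix function σ maps Σ∗ to {0, 1, . . . , m}, where m = |P|: for a string x, σ(x) is the length of the longest prefix of pattern P that is also a suffix of x. In particular, σ(Ti) is the length of the longest prefix of P that is also a suffix of Ti¹.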
[Figure: the text prefix Ti = abababa with the pattern P = ababaca aligned against it; the alignments marked × fail, and the longest prefix of P that is also a suffix of Ti is ababa.]
Hence, σ(Ti) = 5.
• If σ(Ti) = |P|, then P is a suffix of Ti, i.e., P occurs in T ending at position i (at shift i − m).
¹ The first i (resp. j) characters of T (resp. P) are denoted Ti (resp. Pj).
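A minimal Python sketch of σ, assuming ordinary Python strings and using a hypothetical helper name `sigma`: it tries the prefixes of P from longest to shortest and returns the length of the first one that is a suffix of x. It is quadratic in m and only meant to illustrate the definition.

```python
def sigma(x: str, pattern: str) -> int:
    """Suffix function: length of the longest prefix of `pattern`
    that is also a suffix of `x`."""
    # Try prefix lengths from the longest possible down to 0; pattern[:0] == ""
    # is a suffix of every string, so the loop always returns.
    for k in range(min(len(x), len(pattern)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0


if __name__ == "__main__":
    print(sigma("abababa", "ababaca"))  # 5
```

With Ti = abababa and P = ababaca this reproduces σ(Ti) = 5 from the example.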
δ function: construct DFA by preprocessing the pattern
• The DFA transition function δ : Q × Σ → Q, with Q = {0, 1, . . . , m} and m = |P|, is defined as δ(q, a) = σ(Pq a), where Pq a denotes Pq followed by the character a; 0 is the start state and m is the final state.
If δ(q, w) is not drawn in the DFA for some q ∈ Q and w ∈ Σ, then δ(q, w) = 0.
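[Figure: DFA with states 0 to 7. Forward edges 0 -a→ 1 -b→ 2 -a→ 3 -b→ 4 -a→ 5 -c→ 6 -a→ 7; back edges on a from states 1, 3, 5 and 7 to state 1, on b from state 5 to state 4, and on b from state 7 to state 2.]
DFA for the pattern P = ababaca with Σ = {a, b, c}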
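The definition δ(q, a) = σ(Pq a) can be applied literally to tabulate the DFA. The sketch below (hypothetical names `sigma` and `build_dfa_naive`; δ stored as one dict per state) does exactly that; its non-zero entries correspond to the edges drawn for P = ababaca, and every other entry is 0.

```python
def sigma(x: str, pattern: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of `x`."""
    for k in range(min(len(x), len(pattern)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0


def build_dfa_naive(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Transition table with delta[q][a] = sigma(P_q a) for q = 0, ..., m."""
    m = len(pattern)
    return [
        {a: sigma(pattern[:q] + a, pattern) for a in alphabet}
        for q in range(m + 1)
    ]


if __name__ == "__main__":
    for q, row in enumerate(build_dfa_naive("ababaca", "abc")):
        print(q, row)
    # Rows (a, b, c): 0:(1,0,0) 1:(1,2,0) 2:(3,0,0) 3:(1,4,0)
    #                 4:(5,0,0) 5:(1,4,6) 6:(7,0,0) 7:(1,2,0)
```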
Objective: after the DFA processes Ti, starting from state 0,
• the DFA is in state j ⇔ σ(Ti) = j;
• in particular, the DFA is in state m ⇔ P is a suffix of Ti.
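To illustrate the objective, the sketch below (hypothetical helper names; the sample text T = abababacaba is an assumption chosen for illustration) runs the DFA over T and asserts, after every prefix Ti, that the current state equals σ(Ti).

```python
def sigma(x: str, pattern: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of `x`."""
    for k in range(min(len(x), len(pattern)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0


def build_dfa(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = sigma(P_q a), straight from the definition."""
    return [{a: sigma(pattern[:q] + a, pattern) for a in alphabet}
            for q in range(len(pattern) + 1)]


def trace_states(text: str, pattern: str, alphabet: str) -> list[int]:
    """DFA state after each prefix T_0, T_1, ..., T_n of `text`."""
    delta = build_dfa(pattern, alphabet)
    states = [0]  # T_0 is empty; the DFA starts in state 0
    for ch in text:
        states.append(delta[states[-1]][ch])
    return states


if __name__ == "__main__":
    P, T = "ababaca", "abababacaba"  # sample text chosen for illustration
    states = trace_states(T, P, "abc")
    for i, q in enumerate(states):
        assert q == sigma(T[:i], P)  # state after T_i equals sigma(T_i)
    print(states)  # [0, 1, 2, 3, 4, 5, 4, 5, 6, 7, 2, 3]
```

State 7 = m is reached after T9 = abababaca, i.e., at shift 9 − 7 = 2.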
Preprocessing algorithm
naively match P against itself while considering every possible next character: for each state q and each a ∈ Σ, find the longest prefix of P that is a suffix of Pq a
• takes O(m³|Σ|) time:
- there are m + 1 states q, i.e., m + 1 strings Pq
- each Pq can be extended by any of the |Σ| characters a
- every Pq a needs to be aligned with up to m + 1 prefixes of P
- checking for a match in any one alignment takes O(m) time
• can be improved to O(m|Σ|) time (not presented in class)
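The O(m|Σ|) construction is not presented in class. One known way, sketched below with a hypothetical function name and δ stored as one dict per state, is the restart-state construction used for the KMP automaton in some textbooks: for q ≥ 1, every mismatching transition out of q behaves like the transition out of the state x that the DFA reaches on Pq with its first character removed, so row q copies row x and only the matching transition is overridden. Each row costs O(|Σ|), giving O(m|Σ|) overall.

```python
def build_dfa_fast(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Transition table delta[q][a] for q = 0, ..., m in O(m * |alphabet|) time."""
    m = len(pattern)
    delta = [dict.fromkeys(alphabet, 0) for _ in range(m + 1)]
    if m == 0:
        return delta  # the only state is 0, and every transition stays there
    delta[0][pattern[0]] = 1
    x = 0  # restart state: state reached on pattern[1:q] (P_q minus its first char)
    for q in range(1, m + 1):
        delta[q] = dict(delta[x])          # mismatches behave like state x
        if q < m:
            delta[q][pattern[q]] = q + 1   # matching character advances to q + 1
            x = delta[x][pattern[q]]       # now the state reached on pattern[1:q + 1]
    return delta


if __name__ == "__main__":
    for q, row in enumerate(build_dfa_fast("ababaca", "abc")):
        print(q, row)
```

For a pattern over Σ, its rows should coincide with those obtained from δ(q, a) = σ(Pq a) directly; only the running time differs.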
Matching using DFA
(1) q = 0
(2) for i = 1 to n
(a) q = δ(q, T[i])
(b) if q == m
(i) print: i − m is a valid shift
• each character of T is examined exactly once, with one lookup of δ per character, so matching takes O(n) time, where n = |T| (excluding preprocessing)
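A runnable Python version of the matching loop, as a sketch: it assumes a non-empty pattern over the given alphabet, builds δ from the definition (hypothetical helpers `sigma` and `build_dfa`), and collects the valid shifts in a list instead of printing them. A text character outside Σ cannot end any non-empty prefix of P, so it sends the DFA to state 0.

```python
def sigma(x: str, pattern: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of `x`."""
    for k in range(min(len(x), len(pattern)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0


def build_dfa(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = sigma(P_q a), straight from the definition."""
    return [{a: sigma(pattern[:q] + a, pattern) for a in alphabet}
            for q in range(len(pattern) + 1)]


def dfa_match(text: str, pattern: str, alphabet: str) -> list[int]:
    """Valid shifts of a non-empty `pattern` in `text`, found with the DFA."""
    delta = build_dfa(pattern, alphabet)    # preprocessing
    m = len(pattern)
    q = 0                                    # (1) q = 0
    shifts = []
    for i, ch in enumerate(text, start=1):  # (2) for i = 1 to n
        # (a) q = delta(q, T[i]); characters outside the alphabet lead to state 0
        q = delta[q].get(ch, 0)
        if q == m:                           # (b) final state reached
            shifts.append(i - m)             # (i) i - m is a valid shift
    return shifts


if __name__ == "__main__":
    print(dfa_match("abababacaba", "ababaca", "abc"))  # [2]
```

Overlapping occurrences are reported too, because after reaching state m the DFA continues from δ(m, ·) rather than restarting at 0.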
φ function
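• Let φ : Σ∗ → Q be the function induced by δ such that φ(w) is the state the DFA is in after scanning the string w, starting from state 0; thus φ(ε) = 0 for the empty string ε, and φ(wa) = δ(φ(w), a) for w ∈ Σ∗ and a ∈ Σ.
• If φ(Ti) = m, the DFA has reached the final state m, which (as the correctness argument shows) means that P is a suffix of Ti.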
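Since φ(ε) = 0 and φ(wa) = δ(φ(w), a), φ is a left fold of δ over w starting from state 0. A minimal sketch using `functools.reduce`, with the transition table for P = ababaca hard-coded (the name `DELTA` is hypothetical; row q lists δ(q, ·) for a, b, c):

```python
from functools import reduce

# Transition table for P = ababaca over Σ = {a, b, c}: DELTA[q][a] = δ(q, a).
# Transitions not drawn in the DFA are the zero entries.
DELTA = [
    {"a": 1, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
    {"a": 3, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 0},
    {"a": 5, "b": 0, "c": 0},
    {"a": 1, "b": 4, "c": 6},
    {"a": 7, "b": 0, "c": 0},
    {"a": 1, "b": 2, "c": 0},
]


def phi(w: str, delta: list[dict[str, int]]) -> int:
    """State the DFA is in after scanning w from state 0:
    phi("") = 0 and phi(w + a) = delta[phi(w)][a]."""
    return reduce(lambda q, a: delta[q][a], w, 0)


if __name__ == "__main__":
    print(phi("", DELTA), phi("ababab", DELTA), phi("abababaca", DELTA))  # 0 4 7
```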
Correctness of matching algo: ∀ i = 0, . . . , n, φ(Ti) = σ(Ti)
Proof by induction on i.
• basis: φ(T0) = 0 = σ(T0), since T0 is the empty string and the DFA starts in state 0
• induction hypothesis: assume that φ(Ti) = σ(Ti) = q
• inductive step: let a = T[i + 1], so that Ti+1 = Ti a; then
φ(Ti+1)
= φ(Ti a)
= δ(φ(Ti), a)   (from the definition of φ)
= δ(q, a)   (by the induction hypothesis)
= σ(Pq a)   (from the definition of δ)
= σ(Ti a)   (by the Observation, since σ(Ti) = q)
= σ(Ti+1)
Concluding: the DFA is in the final state after scanning Ti, i.e., φ(Ti) = m, exactly when σ(Ti) = |P| = m, that is, exactly when P is a suffix of Ti; so every reported shift is valid and no occurrence is missed.
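The theorem can be sanity-checked (not proved) by brute force: for every string w over Σ up to a chosen length, compare the DFA's state φ(w) with σ(w) computed directly. Every Ti is such a string, so this covers all text prefixes up to that length. The helper names and the length bounds are assumptions of this sketch.

```python
from itertools import product


def sigma(x: str, pattern: str) -> int:
    """Length of the longest prefix of `pattern` that is a suffix of `x`."""
    for k in range(min(len(x), len(pattern)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0


def build_dfa(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """delta[q][a] = sigma(P_q a), straight from the definition."""
    return [{a: sigma(pattern[:q] + a, pattern) for a in alphabet}
            for q in range(len(pattern) + 1)]


def invariant_holds(pattern: str, alphabet: str, max_len: int) -> bool:
    """Check phi(w) == sigma(w) for every string w over `alphabet`
    with |w| <= max_len (a brute-force sanity check, not a proof)."""
    delta = build_dfa(pattern, alphabet)
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            w = "".join(chars)
            q = 0
            for ch in w:  # after the loop, q == phi(w)
                q = delta[q][ch]
            if q != sigma(w, pattern):
                return False
    return True


if __name__ == "__main__":
    print(invariant_holds("ababaca", "abc", 7))  # True
```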
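Observation
Given that σ(Ti) = q, we need to show that σ(Pq a) = σ(Ti a). Since σ(Ti) = q, Pq is a suffix of Ti, so Pq a is a suffix of Ti a:
• supposing σ(Pq a) > σ(Ti a): the longest prefix of P that is a suffix of Pq a is then also a suffix of Ti a, so σ(Ti a) ≥ σ(Pq a), a contradiction
• similarly, supposing σ(Pq a) < σ(Ti a) = r (so r ≥ 1): Pr is a suffix of Ti a, so the prefix of P of length r − 1 is a suffix of Ti, giving r − 1 ≤ σ(Ti) = q; hence Pr and Pq a are both suffixes of Ti a with |Pr| ≤ |Pq a|, so Pr is a suffix of Pq a and σ(Pq a) ≥ r, again a contradiction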
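A worked instance of the Observation, reusing the example Ti = abababa (so q = σ(Ti) = 5 and P5 = ababa) with P = ababaca; in each row the two sides agree, and they also equal the DFA's transitions δ(5, c) = 6, δ(5, b) = 4 and δ(5, a) = 1.

```latex
% P = ababaca (m = 7), T_i = abababa, q = \sigma(T_i) = 5, P_5 = ababa
\begin{align*}
a = \mathtt{c}:\quad & \sigma(P_5\,\mathtt{c}) = \sigma(\mathtt{ababac}) = 6,
  & \sigma(T_i\,\mathtt{c}) &= \sigma(\mathtt{abababac}) = 6,\\
a = \mathtt{b}:\quad & \sigma(P_5\,\mathtt{b}) = \sigma(\mathtt{ababab}) = 4,
  & \sigma(T_i\,\mathtt{b}) &= \sigma(\mathtt{abababab}) = 4,\\
a = \mathtt{a}:\quad & \sigma(P_5\,\mathtt{a}) = \sigma(\mathtt{ababaa}) = 1,
  & \sigma(T_i\,\mathtt{a}) &= \sigma(\mathtt{abababaa}) = 1.
\end{align*}
```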