Notation
Definition (Alphabets and words)
An alphabet is a nonempty finite set; the members of an alphabet
are called symbols (or letters). An alphabet is k-ary if it has size k.
A word over an alphabet Σ is a finite sequence of symbols in Σ.
The length |w| of a word w is the length of the sequence w.
The unique word of length 0 is called the empty word and is denoted by λ.
Usually we write {1} for the unary alphabet, {0, 1} for the binary
alphabet, and {a1, ..., ak} for the k-ary alphabet in general.
Every alphabet Σ comes with an implicitly and canonically defined
strict order <, e.g., we assume 0 < 1 for the binary alphabet.
This order may be written as <Σ in order to avoid ambiguities.

Remark
Alphabets with the same size can be identified via an appropriate
order isomorphism. E.g., the alphabets {a1, a2, a3} and {a, b, c}
with orderings a1 < a2 < a3 and a < b < c can be identified via
the order isomorphism a1 ↦ a, a2 ↦ b, a3 ↦ c.

Definition (Words)
For a given alphabet Σ we let for every natural number n
Σ∗ := {w : w is a word over Σ},
Σ+ := {w : w is a word over Σ and |w| > 0},
Σ^n := {w : w is a word over Σ and |w| = n},
Σ^{≤n} := {w : w is a word over Σ and |w| ≤ n},
and sets of words such as Σ^{<n}, Σ^{≥n}, Σ^{>n} are defined likewise.
In the literature, words are also called strings, and the empty word
is also denoted by ε.
For any alphabet Σ the empty word is a member of Σ∗.
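The sets Σ^n and Σ^{≤n} defined above are finite and can be enumerated directly. The following is a minimal Python sketch, assuming that symbols are single characters and that words are represented as Python strings; the function names are chosen here for illustration only.

```python
from itertools import product

def words_of_length(alphabet, n):
    """Return all words of length n over the alphabet, i.e. the set Σ^n."""
    return ["".join(t) for t in product(alphabet, repeat=n)]

def words_up_to(alphabet, n):
    """Return all words of length at most n, i.e. the set Σ^{≤n}."""
    return [w for m in range(n + 1) for w in words_of_length(alphabet, m)]

binary = ["0", "1"]
print(words_of_length(binary, 2))   # the binary words of length 2
print(words_of_length(binary, 0))   # only the empty word
print(len(words_up_to(binary, 3)))  # 1 + 2 + 4 + 8 words
```

Note that Σ^0 contains exactly the empty word, in line with the statement that the empty word is a member of Σ∗ for every alphabet Σ.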
Definition (Concatenation of words)
Let u1, ..., un and v1, ..., vm be symbols in appropriate alphabets
and let u = u1 ... un and v = v1 ... vm. The concatenation of the
two words u and v is uv = u1 ... un v1 ... vm, also written as u ◦ v.
For any natural number t ≥ 0 and word u, we let u^t be the t-fold
concatenation of u with itself, i.e., u^0 = λ, u^1 = u, u^2 = uu, ....

Remark
For every alphabet Σ the operation concatenation is associative
on Σ∗ and the empty word λ is left- and right-neutral, i.e., for all
words u, v, and w
(u ◦ v) ◦ w = u ◦ (v ◦ w)
and
u = λ ◦ u = u ◦ λ.
This can be rephrased by saying that the structure (Σ∗, ◦) is a
monoid, i.e., a semigroup with neutral element.

Definition (Prefix, infix, suffix)
Let u and w be words. Then
u is a prefix of w in case w = uv for some word v,
u is an infix of w in case w = v1 u v2 for some words v1, v2,
u is a suffix of w in case w = vu for some word v.
A prefix u of w is a proper prefix of w if u is different from w.
The notions proper infix and proper suffix are defined likewise.
We write ⊑ for the prefix relation, and ⊏ for the proper prefix
relation, e.g., 01 ⊏ 011.
If u is an infix of a word w, we may also say that u is a
(contiguous) subword of w.
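The notions of prefix, infix, and suffix translate directly into string operations. The following Python sketch assumes that words are represented as Python strings; the function names are illustrative and not part of the notation introduced above.

```python
def is_prefix(u, w):
    """u is a prefix of w if w = u v for some word v."""
    return w[:len(u)] == u

def is_suffix(u, w):
    """u is a suffix of w if w = v u for some word v."""
    return len(u) <= len(w) and w[len(w) - len(u):] == u

def is_infix(u, w):
    """u is an infix (contiguous subword) of w if w = v1 u v2."""
    return u in w

def is_proper_prefix(u, w):
    """A proper prefix of w is a prefix of w that differs from w."""
    return is_prefix(u, w) and u != w

print(is_proper_prefix("01", "011"))   # the example from the text
print(is_proper_prefix("011", "011"))  # a word is not a proper prefix of itself
print(is_infix("", "abc"))             # the empty word is an infix of every word
```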
Definition (Orderings on words)
Let Σ be an alphabet. The lexicographical ordering ≤lex on Σ∗ is
defined in the usual way, i.e., for all words
u = u1 ... un
and
v = v1 ... vm
where all ui and vi are in Σ, we have
u ≤lex v ⇔ u ⊑ v or
[j = min{i : i ≤ n, i ≤ m, ui ≠ vi} exists and uj <Σ vj].
The length-lexicographical ordering ≤ℓℓ on Σ∗ is defined by
u ≤ℓℓ v ⇔ |u| < |v| or [|u| = |v| and u ≤lex v].
We write <lex and <ℓℓ for the corresponding strict orderings.

Definition (Language)
Let Σ be an alphabet. A language over Σ is a subset of Σ∗.
Given a language L over Σ, the characteristic function χL of L is
defined for all words w over Σ by
χL(w) = 1 in case w ∈ L, and χL(w) = 0 otherwise.
We will usually write L(w) instead of χL(w).
If we let z0 <ℓℓ z1 <ℓℓ z2 <ℓℓ ··· be the words over Σ in increasing
length-lexicographic order, a language L over Σ can be identified
with its characteristic sequence
L(z0)L(z1)L(z2) ....

Deterministic finite automata

Definition (Deterministic finite automaton)
A deterministic finite automaton, or DFA for short, is a 5-tuple
A = (Q, Σ, δ, s, F),
where
Q is a finite set of states,
Σ is an alphabet, the input alphabet,
δ : Q × Σ → Q is the transition function,
s ∈ Q is the initial state,
F ⊆ Q is the set of accepting states.
Finite automata are also called finite state machines; accepting
states may also be called final states.

Example 1 (Search for symbol b)
The figure below shows the transition diagram of a DFA
A = ({qno, qyes}, {a, b}, δ, qno, {qyes})
that accepts the language
L(A) = {w ∈ {a, b}∗ : w contains some b}.
[Figure: transition diagram of A. Initial state qno with a loop labeled a, an edge labeled b from qno to qyes, and accepting state qyes with a loop labeled a,b.]
The incoming arrow marks the initial state, the other arrows
specify the transition function, accepting states are shown as
double circles.
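The language of Example 1 can also be written down as a characteristic sequence in the sense of the definition of languages given earlier. The following Python sketch enumerates the words over {a, b} in length-lexicographic order and prints the first bits of the sequence. It assumes that the alphabet is passed as a list sorted according to <Σ; the function names are illustrative.

```python
from itertools import count, islice, product

def length_lex_words(alphabet):
    """Yield all words over the alphabet in length-lexicographic order.
    The alphabet must be given in increasing order of its symbols."""
    for n in count(0):
        # product yields the tuples of length n in lexicographic order
        for t in product(alphabet, repeat=n):
            yield "".join(t)

def characteristic_prefix(member, alphabet, length):
    """Return the first `length` bits L(z0)L(z1)... of the characteristic sequence."""
    words = islice(length_lex_words(alphabet), length)
    return "".join("1" if member(z) else "0" for z in words)

def contains_b(w):
    return "b" in w

print(list(islice(length_lex_words(["a", "b"]), 7)))
print(characteristic_prefix(contains_b, ["a", "b"], 7))
```

The first seven words are λ, a, b, aa, ab, ba, bb, so the printed prefix of the characteristic sequence is 0010111.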
The DFA reads the symbols of the input successively and on each
symbol follows the corresponding transition.
Each state has exactly one outgoing edge for each symbol in Σ,
hence each input w determines a corresponding sequence of visited
states, which has length |w| + 1.
An input w is accepted if its processing ends in an accepting state.

The transition function of a DFA can also be specified by its
transition table.

Example 1 (Search for symbol b)
Transition table of the DFA A that accepts the language
L(A) = {w ∈ {a, b}∗ : w contains some b}.

state     a      b
→ qno     qno    qyes
qyes ∗    qyes   qyes

In the transition table, the symbol → indicates the initial state,
final states are marked by the symbol ∗.
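The processing of an input by the DFA of Example 1 can be sketched in Python as follows. The sketch stores the transition table as a dictionary keyed by (state, symbol) pairs; this representation and the function names are assumptions made here for illustration.

```python
def run(delta, start, word):
    """Return the sequence of states visited while reading word."""
    states = [start]
    for symbol in word:
        states.append(delta[(states[-1], symbol)])
    return states

def accepts(delta, start, accepting, word):
    """An input is accepted if its processing ends in an accepting state."""
    return run(delta, start, word)[-1] in accepting

# Transition table of the DFA from Example 1
delta = {
    ("qno", "a"): "qno",   ("qno", "b"): "qyes",
    ("qyes", "a"): "qyes", ("qyes", "b"): "qyes",
}

print(run(delta, "qno", "aab"))                 # |w| + 1 = 4 visited states
print(accepts(delta, "qno", {"qyes"}, "aab"))   # the word contains a b
print(accepts(delta, "qno", {"qyes"}, "aaa"))   # the word contains no b
```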
Example 2 (Modulo counting)
The following DFA A = ({q0, ..., q4}, {0, 1}, δ, q0, {q0}) accepts
the language L(A) = {w ∈ {0, 1}∗ : |w| is divisible by 5}.
[Figure: transition diagram of A. The states q0, q1, q2, q3, q4 form a cycle q0 → q1 → q2 → q3 → q4 → q0 in which every edge is labeled 0,1; q0 is both the initial state and the only accepting state.]

Example 3 (Search for subword abac)
The following DFA A = (Q, {a, b, c}, δ, qλ, {qabac}) with state set
Q = {qλ, qa, qab, qaba, qabac} accepts the language
L(A) = {w ∈ {a, b, c}∗ : w contains the subword abac}.
[Figure: transition diagram of A over the states qλ, qa, qab, qaba, qabac.]
This automaton corresponds to an efficient algorithm for pattern
matching where a given short word is searched in a long word.
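The connection to pattern matching can be illustrated by building such a DFA from the pattern. The following Python sketch is one possible construction and not necessarily the one behind the figure. It assumes that state k (an integer from 0 to the pattern length, standing for qλ, qa, ..., qabac) records the length of the longest prefix of the pattern that is a suffix of the input read so far. The function names are illustrative.

```python
def pattern_dfa(pattern, alphabet):
    """Build the transition table of a DFA that searches for pattern."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for c in alphabet:
            if q == m:
                delta[(q, c)] = m  # pattern already found: stay accepting
                continue
            seen = pattern[:q] + c
            k = len(seen)          # here len(seen) = q + 1 <= m
            # longest prefix of the pattern that is a suffix of seen
            while not seen.endswith(pattern[:k]):
                k -= 1
            delta[(q, c)] = k
    return delta

def contains(pattern, alphabet, text):
    """Decide whether text (a word over alphabet) contains pattern as a subword."""
    delta, q = pattern_dfa(pattern, alphabet), 0
    for c in text:
        q = delta[(q, c)]
    return q == len(pattern)

print(contains("abac", "abc", "cababacb"))  # contains abac
print(contains("abac", "abc", "abab"))      # does not contain abac
```

Once the table is built, the text is scanned in a single pass with one table lookup per symbol, which is the sense in which the automaton yields an efficient search.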
Definition (Extended transition function δ̂)
The transition function δ of a DFA A = (Q, Σ, δ, s, F) extends
canonically to an (extended) transition function δ̂ : Q × Σ∗ → Q.
The values δ̂(q, w) are defined by induction over the length of w
simultaneously for all states q as follows.
Basis w = λ:
For all q ∈ Q let δ̂(q, λ) = q.
Induction step w = ua for some a ∈ Σ:
For all q ∈ Q let
δ̂(q, ua) = δ(δ̂(q, u), a).
The function δ̂ can be viewed as the transitive closure of the
transition function δ. We will usually refer to δ̂ as transition
function and will often write δ instead of δ̂.
Note that the transition function δ is contained in δ̂ in the sense
that we have δ(q, a) = δ̂(q, a) for all q ∈ Q and a ∈ Σ.

Regular languages

Definition (Accepted words and recognized languages)
Let A = (Q, Σ, δ, s, F) be a DFA. The DFA A accepts a word w if
the (uniquely determined) state that is reached after having read
the input w is accepting, i.e., if it holds that δ(s, w) ∈ F.
The language recognized by the DFA A is
L(A) = {w ∈ Σ∗ : A accepts w} = {w ∈ Σ∗ : δ(s, w) ∈ F},
where we consider L(A) as a language over the alphabet Σ.
We also say that L(A) is the language decided or accepted by A.

Definition (Regular language)
A language L is regular if L is recognized by some DFA A, i.e., if
L = L(A).

Example (Regular languages)
We have already seen that the following languages L1, L2, and L3
are regular:
L1 = {w ∈ {a, b}∗ : w contains some b},
L2 = {w ∈ {0, 1}∗ : |w| is divisible by 5},
L3 = {w ∈ {a, b, c}∗ : w contains the subword abac}.
We will later demonstrate that, for example, the following
languages L4 and L5 are not regular:
L4 = {0^n 1^n : n ≥ 0},
L5 = {w#w^R : w ∈ {0, 1}∗},
where w^R = wn ... w1 is the mirror word of w = w1 ... wn.
That language L2 is regular but L4 is not can be summarized by
saying that DFAs can count modulo a constant but cannot count.

Definition (Symbol sets of words and languages)
For a word w, let symb(w) be the set of symbols that occur in w.
For a given language L over some alphabet Σ, the alphabet of all
symbols that occur in L is Σ(L) = ⋃_{w∈L} symb(w).
(Note that the set Σ(L) is finite because it is a subset of Σ.)
The function symb can be defined formally by induction over the
length of its argument w.

Remark (Alphabet change)
By definition, a language L over some alphabet Σ is regular if
(i) There is a DFA of the form (Q, Σ, ...) that recognizes L.
It can be shown that (i) is equivalent to each of the statements
(ii) and (iii) below.
(ii) There is a DFA of the form (Q, Σ(L), ...) that recognizes L.
(iii) There is a DFA of the form (Q, Σ′, ...) that recognizes L such
that Σ(L) is a subset of the alphabet Σ′.
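One direction of the remark on alphabet change, namely enlarging the input alphabet of a DFA without changing the recognized language, can be sketched in Python as follows. The sketch uses a standard idea that is not spelled out in the text: every transition on a new symbol leads to a fresh nonaccepting sink state. The tuple representation of DFAs, the function names, and the state name "sink" are assumptions made here.

```python
def extend_alphabet(dfa, new_alphabet, sink="sink"):
    """Return a DFA over new_alphabet that recognizes the same language.
    `sink` is a hypothetical fresh state name that must not occur in the DFA."""
    states, alphabet, delta, start, accepting = dfa
    assert sink not in states and set(alphabet) <= set(new_alphabet)
    new_states = set(states) | {sink}
    # Old transitions are kept; all other transitions lead to the rejecting sink.
    new_delta = {(q, a): delta.get((q, a), sink)
                 for q in new_states for a in new_alphabet}
    return new_states, set(new_alphabet), new_delta, start, set(accepting)

def accepts(dfa, word):
    _, _, delta, start, accepting = dfa
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting

# The DFA for "w contains some b" over {a, b}, extended to {a, b, c}
example1 = ({"qno", "qyes"}, {"a", "b"},
            {("qno", "a"): "qno", ("qno", "b"): "qyes",
             ("qyes", "a"): "qyes", ("qyes", "b"): "qyes"},
            "qno", {"qyes"})
wider = extend_alphabet(example1, {"a", "b", "c"})
print(accepts(wider, "aab"))  # still accepted
print(accepts(wider, "acb"))  # rejected: not a word over {a, b}
```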
Definition
The (relative) complement of a language L with respect to an
alphabet Σ is
L̄ = Σ∗ \ L = {w ∈ Σ∗ : w ∉ L}.
In case it is clear from the context that a language L is explicitly or
implicitly considered as a language over a certain alphabet, usually
we speak simply of the complement of L in order to refer to the
complement of L with respect to this alphabet.

Definition (Boolean functions)
A k-ary Boolean function is a function α : {0, 1}^k → {0, 1}.
In the context of Boolean functions the values 0 and 1 have the
intended meanings of being false and being true, respectively.
Accordingly, for example the k-ary function that attains the
value 1 exactly for the argument 1^k is called the k-ary AND function.

Definition (Set-theoretical operations)
Let α be a k-ary Boolean function for some k ≥ 0 and let Σ be
some alphabet. Then α induces a k-ary function Lα on the
languages over Σ such that for all languages L1, ..., Lk over Σ
and all words w over Σ it holds that
(Lα(L1, ..., Lk))(w) = α(L1(w), ..., Lk(w)).
A set-theoretical operation is a function of the form Lα for some
Boolean function α.
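The definition of Lα can be mirrored almost literally in code. The following Python sketch assumes that a language is represented by its characteristic function, i.e., by a function mapping words to 0 or 1; the name `induced` and the sample languages are chosen here for illustration.

```python
def induced(alpha):
    """Map a k-ary Boolean function alpha to the language operation L_alpha.
    Languages are represented by their characteristic functions word -> 0/1."""
    def operation(*languages):
        return lambda w: alpha(*(L(w) for L in languages))
    return operation

# Two sample languages over {a, b}, given by characteristic functions
contains_a = lambda w: int("a" in w)
contains_b = lambda w: int("b" in w)

intersection = induced(lambda x, y: x & y)  # binary AND
complement = induced(lambda x: 1 - x)       # unary negation
everything = induced(lambda: 1)()           # 0-ary constant 1, i.e. Σ∗

both = intersection(contains_a, contains_b)
print(both("ab"), both("aa"))
print(complement(contains_a)("bb"))
print(everything("abba"))
```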
Example (Set-theoretical operations)
We give examples of set-theoretical operations with respect to
some fixed alphabet Σ.
The 0-ary set-theoretical operations are the constant
mappings with values ∅ and Σ∗.
The unary set-theoretical operations include identity L ↦ L and
complement L ↦ L̄ = Σ∗ \ L.
Binary set-theoretical operations include set-theoretical union,
intersection, and difference, where for example the union of
two languages L1 and L2 is
L1 ∪ L2 = {w : w ∈ L1 or w ∈ L2}.
Ternary set-theoretical operations include ternary union and
intersection, where for example the union of L1, L2, and L3
can be defined as (L1 ∪ L2) ∪ L3.

Theorem (Closure under complementation)
The class of regular languages is closed under complementation.
That is, if L is a regular language over some alphabet Σ, then also
the complement of L is regular.
Proof. The proof works by swapping accepting and nonaccepting
states in a DFA that recognizes a given regular language.
Given a regular language L, i.e., L = L(A) for some DFA A, where
A = (Q, Σ, δ, s, F),
let
Ā = (Q, Σ, δ, s, Q \ F).
Note that A and Ā differ exactly in their sets of accepting states.
It follows that L(Ā) = Σ∗ \ L(A), i.e., L(Ā) is the complement
of L(A), because by construction we have for all words w over Σ
w ∈ L(A) ⇔ δ(s, w) ∈ F        (∗)
         ⇔ δ(s, w) ∉ Q \ F
         ⇔ w ∉ L(Ā).          (∗)
The equivalences (∗) hold by definition of recognized language. □
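The construction used in the proof of closure under complementation amounts to a one-line change of the accepting states. The following Python sketch assumes that a DFA is represented as a tuple (states, alphabet, delta, start, accepting) with Python sets and a dictionary for δ; this representation is an assumption made here.

```python
def accepts(dfa, word):
    _, _, delta, start, accepting = dfa
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting

def complement(dfa):
    """Swap accepting and nonaccepting states, i.e. replace F by Q \\ F."""
    states, alphabet, delta, start, accepting = dfa
    return states, alphabet, delta, start, states - accepting

# The DFA for "w contains some b" over {a, b}
A = ({"qno", "qyes"}, {"a", "b"},
     {("qno", "a"): "qno", ("qno", "b"): "qyes",
      ("qyes", "a"): "qyes", ("qyes", "b"): "qyes"},
     "qno", {"qyes"})
A_bar = complement(A)

for w in ["", "aa", "ab", "bab"]:
    print(repr(w), accepts(A, w), accepts(A_bar, w))  # the two answers always differ
```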
Theorem (Closure under union)
The class of regular languages is closed under union. That is, if the
languages L1 and L2 are regular, their union L1 ∪ L2 is regular, too.
Proof. For given regular languages L1 and L2 over alphabets Σ1
and Σ2, respectively, choose DFAs
A1 = (Q1, Σ, δ1, s1, F1) where L1 = L(A1),
A2 = (Q2, Σ, δ2, s2, F2) where L2 = L(A2).
By the remark on alphabet change, we can assume Σ = Σ1 ∪ Σ2.
The union of L1 and L2 will be recognized by a DFA that
simulates A1 and A2 in parallel, which is referred to as product
automaton.
A product automaton of A1 and A2 has the form
A = (Q1 × Q2, Σ, δ, (s1, s2), F),
where for all q1 ∈ Q1, q2 ∈ Q2, and a ∈ Σ it holds that
δ((q1, q2), a) = (δ1(q1, a), δ2(q2, a)).
This choice of δ corresponds to running A1 and A2 in parallel.
Formally, this means that for all words w over Σ we have
δ̂((s1, s2), w) = (δ̂1(s1, w), δ̂2(s2, w)).        (∗)
If we let F = (F1 × Q2) ∪ (Q1 × F2), then (∗) implies that the
product automaton A recognizes the language
L(A) = L(A1) ∪ L(A2) = L1 ∪ L2.
It remains to demonstrate (∗), which we do by induction over the
length of w.
Basis w = λ: δ̂((s1, s2), λ) = (s1, s2) = (δ̂1(s1, λ), δ̂2(s2, λ)).
Induction step w = ua for some a ∈ Σ:
δ̂((s1, s2), ua) = δ(δ̂((s1, s2), u), a)                      (1)
                = δ((δ̂1(s1, u), δ̂2(s2, u)), a)             (2)
                = (δ1(δ̂1(s1, u), a), δ2(δ̂2(s2, u), a))     (3)
                = (δ̂1(s1, ua), δ̂2(s2, ua)),                (4)
where the equations hold by (1) the inductive definition of δ̂,
(2) the induction hypothesis, (3) the definition of δ, and (4) the
inductive definition of δ̂1 and δ̂2. □

Theorem (Closure under set-theoretical operations)
The class of regular languages is closed under set-theoretical
operations.
Proof: We have already seen that the class of regular languages is
closed under complementation and union.
Variant 1. The proof of closure under union via product automata
extends naturally to all other set-theoretical operations.
Variant 2. The operations complementation and union are
complete in the sense that they can represent all set-theoretical
operations. For example, by De Morgan's rules we have
L1 ∩ L2 = Σ∗ \ (L̄1 ∪ L̄2),
hence if L1 and L2 are regular, then so are the complements L̄1
and L̄2, and then in turn also their union and its complement
L1 ∩ L2. □
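The product automaton, together with Variant 1 of the preceding proof, can be sketched in Python as follows. The sketch assumes that both DFAs share the same alphabet and are given as tuples (states, alphabet, delta, start, accepting). The parameter `alpha` is a binary Boolean function that decides which pairs of states are accepting, so `alpha = or` yields F = (F1 × Q2) ∪ (Q1 × F2). All names are illustrative.

```python
from itertools import product

def product_dfa(dfa1, dfa2, alpha):
    """Run dfa1 and dfa2 in parallel; accept according to alpha."""
    q1s, sigma, d1, s1, f1 = dfa1
    q2s, _, d2, s2, f2 = dfa2
    states = set(product(q1s, q2s))
    delta = {((p, q), a): (d1[(p, a)], d2[(q, a)])
             for (p, q) in states for a in sigma}
    accepting = {(p, q) for (p, q) in states if alpha(p in f1, q in f2)}
    return states, sigma, delta, (s1, s2), accepting

def accepts(dfa, word):
    _, _, delta, start, accepting = dfa
    q = start
    for a in word:
        q = delta[(q, a)]
    return q in accepting

sigma = {"a", "b"}
# A1: w contains some b;  A2: |w| is even (a small hypothetical second DFA)
A1 = ({"qno", "qyes"}, sigma,
      {("qno", "a"): "qno", ("qno", "b"): "qyes",
       ("qyes", "a"): "qyes", ("qyes", "b"): "qyes"}, "qno", {"qyes"})
A2 = ({0, 1}, sigma, {(i, c): 1 - i for i in (0, 1) for c in sigma}, 0, {0})

union = product_dfa(A1, A2, lambda x, y: x or y)
intersection = product_dfa(A1, A2, lambda x, y: x and y)
print(accepts(union, "a"), accepts(union, "b"), accepts(union, "aa"))
print(accepts(intersection, "ab"), accepts(intersection, "b"))
```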
Nondeterministic finite automata

Definition (Nondeterministic finite automaton)
A (nondeterministic) finite automaton, or FA for short, is a 5-tuple
A = (Q, Σ, δ, s, F),
where
Q is a finite set of states,
Σ is an alphabet, the input alphabet,
δ : Q × Σ → 2^Q is the transition relation,
s ∈ Q is the initial state,
F ⊆ Q is the set of accepting states.
FAs are also called (nondeterministic) finite state machines.

Example 4 (Nondeterministic search for subword abac)
The following FA A = (Q, {a, b, c}, δ, qλ, {qabac}) with state set
Q = {qλ, qa, qab, qaba, qabac} accepts the language
L(A) = {w ∈ {a, b, c}∗ : w contains the subword abac}.
[Figure: transition diagram of A. A chain of edges qλ → qa → qab → qaba → qabac labeled a, b, a, c, respectively, with loops labeled a,b,c at the initial state qλ and at the accepting state qabac.]
For example, we have δ(qλ, a) = {qλ, qa} and δ(qab, b) = ∅.
Definition (Extended transition relation δ̂)
The transition relation δ of an FA A = (Q, Σ, δ, s, F) extends
canonically to an (extended) transition relation δ̂ : Q × Σ∗ → 2^Q.
The values δ̂(q, w) are defined by induction over the length of w
simultaneously for all states q as follows.
Basis w = λ:
For all q ∈ Q let δ̂(q, λ) = {q}.
Induction step w = ua for some a ∈ Σ:
For all q ∈ Q let
δ̂(q, ua) = ⋃_{q′ ∈ δ̂(q, u)} δ(q′, a).
Similar to the case of DFAs, we will often refer to δ̂ as transition
relation and will write δ instead of δ̂.
Again the transition relation δ is contained in δ̂ in the sense that
we have δ(q, a) = δ̂(q, a) for all q ∈ Q and a ∈ Σ.

Remark
By definition, for a word w = w1 ... wn the set δ̂(q, w) contains
exactly the states that can be reached on having processed all of w
when starting in state q and following successively for i = 1, ..., n
a transition with label wi.

Definition (Acceptance by an FA)
Let A = (Q, Σ, δ, s, F) be an FA. The FA A accepts a word w if
some accepting state can be reached after having processed the
input w, i.e., if the set δ(s, w) intersects the set F.
The language recognized by the FA A is
L(A) = {w ∈ Σ∗ : A accepts w} = {w ∈ Σ∗ : δ(s, w) ∩ F ≠ ∅},
where we consider L(A) as a language over the alphabet Σ.
We also say that L(A) is the language decided or accepted by A.
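The extended transition relation and the acceptance condition of an FA can be sketched in Python as follows, using the FA of Example 4 as input. The sketch assumes that δ is stored as a dictionary from (state, symbol) pairs to sets of states and that missing entries stand for the empty set; the state names and function names are chosen here.

```python
def delta_hat(delta, q, word):
    """Extended transition relation: the set of states reachable from q on word."""
    current = {q}                      # basis: reading λ yields {q}
    for a in word:                     # induction step: union over all current states
        current = set().union(*(delta.get((p, a), set()) for p in current))
    return current

def accepts(delta, start, accepting, word):
    """An FA accepts if the reachable set intersects the accepting states."""
    return bool(delta_hat(delta, start, word) & accepting)

# The FA of Example 4; entries that are absent denote the empty set.
delta = {("ql", c): {"ql"} for c in "abc"}
delta[("ql", "a")] = {"ql", "qa"}
delta[("qa", "b")] = {"qab"}
delta[("qab", "a")] = {"qaba"}
delta[("qaba", "c")] = {"qabac"}
delta.update({("qabac", c): {"qabac"} for c in "abc"})

print(delta_hat(delta, "ql", "a"))    # the states ql and qa
print(delta_hat(delta, "qab", "b"))   # the empty set
print(accepts(delta, "ql", {"qabac"}, "cabacb"))
```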
Remark
DFAs can be viewed as a special form of FA and accordingly every
regular language is recognized by an FA.
Given a regular language L where L = L(A) for some DFA
A = (Q, Σ, δD, s, F), consider the FA N = (Q, Σ, δN, s, F) where
each set δN(q, a) is the singleton {δD(q, a)}.
By construction, the FA N recognizes the same language as A.

Theorem (FAs recognize exactly the regular languages)
The class of languages recognized by FAs coincides with the
class of regular languages.
By the preceding remark, the theorem follows if we can show that
for every FA N the language recognized by N is regular.

Lemma (Power set construction)
For every FA N there is a DFA D such that L(N) = L(D).
Proof. Given an FA N = (QN, Σ, δN, sN, FN), let
D = (2^QN, Σ, δD, {sN}, FD),
where for all H ⊆ QN and a ∈ Σ
δD(H, a) = ⋃_{q ∈ H} δN(q, a).
We will show below that the definition of δD implies for all w ∈ Σ∗
δ̂D({sN}, w) = δ̂N(sN, w),        (∗)
i.e., the state that D reaches after having read an input w is just
the set of states that N can nondeterministically reach after having
read the input w.
In order to achieve L(N) = L(D), exploiting the equation (∗),
it suffices to define the set of accepting states of D as
FD = {H ⊆ QN : H ∩ FN ≠ ∅}.
Indeed, this way we obtain for all words w over Σ
w ∈ L(D) ⇔ δ̂D({sN}, w) ∈ FD           (1)
         ⇔ δ̂D({sN}, w) ∩ FN ≠ ∅       (2)
         ⇔ δ̂N(sN, w) ∩ FN ≠ ∅         (3)
         ⇔ w ∈ L(N),                   (4)
where the equivalences hold (1) by definition of L(D), (2) by
definition of FD, (3) by (∗), and (4) by definition of L(N).
It remains to demonstrate that (∗) is true, i.e., that
δ̂D({sN}, w) = δ̂N(sN, w),
which we show by induction over the length of w.
Basis w = λ:
We have δ̂D({sN}, λ) = {sN} = δ̂N(sN, λ).
Induction step w = ua for some a ∈ Σ:
We have
δ̂D({sN}, ua) = δD(δ̂D({sN}, u), a)              (1)
             = δD(δ̂N(sN, u), a)                 (2)
             = ⋃_{q ∈ δ̂N(sN, u)} δN(q, a)       (3)
             = δ̂N(sN, ua),                      (4)
where the equations hold (1) by the inductive definition of δ̂D, (2)
by the induction hypothesis, (3) by the definition of δD, and (4) by
the inductive definition of δ̂N. □
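The power set construction from the preceding proof can be sketched in Python as follows. In contrast to the proof, which uses all of 2^QN as state set, the sketch only builds the subsets that are reachable from {sN}; this does not change the recognized language. The dictionary representation of δN (missing entries stand for the empty set) and all names are assumptions made here.

```python
def determinize(alphabet, delta_n, start_n, accepting_n):
    """Build the reachable part of the power set DFA of an FA."""
    start = frozenset({start_n})
    delta_d, todo, seen = {}, [start], {start}
    while todo:
        H = todo.pop()
        for a in alphabet:
            # delta_D(H, a) is the union of delta_N(q, a) over all q in H
            target = frozenset().union(*(delta_n.get((q, a), set()) for q in H))
            delta_d[(H, a)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    accepting_d = {H for H in seen if H & accepting_n}
    return seen, delta_d, start, accepting_d

def accepts_d(delta_d, start, accepting_d, word):
    H = start
    for a in word:
        H = delta_d[(H, a)]
    return H in accepting_d

# The FA for "w contains the subword abac" over {a, b, c}
delta_n = {("ql", c): {"ql"} for c in "abc"}
delta_n[("ql", "a")] = {"ql", "qa"}
delta_n[("qa", "b")] = {"qab"}
delta_n[("qab", "a")] = {"qaba"}
delta_n[("qaba", "c")] = {"qabac"}
delta_n.update({("qabac", c): {"qabac"} for c in "abc"})

states, delta_d, start, accepting_d = determinize("abc", delta_n, "ql", {"qabac"})
print(accepts_d(delta_d, start, accepting_d, "cababacb"))
print(accepts_d(delta_d, start, accepting_d, "abab"))
```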
Example (kth last symbol)
For any k ≥ 1, consider the set
Lk = {w ∈ {0, 1}∗ : w = w1 ... wn for some n ≥ k and w_{n−k+1} = 1}
of binary words such that the kth last symbol is a 1. Each set Lk
is regular and is recognized by an FA with k + 1 states. For the
case k = 4, the corresponding FA has the following diagram.
[Figure: transition diagram of the FA for k = 4. Initial state q0 with a loop labeled 0,1, an edge labeled 1 from q0 to q1, and edges labeled 0,1 from q1 to q2, from q2 to q3, and from q3 to q4, where q4 is the accepting state.]

Remark (Exponential increase of number of states)
There are FAs N such that every DFA D where L(N) = L(D) has
an exponentially larger number of states than N.
For a proof, recall from the example above that each language Lk
is recognized by an FA with k + 1 states. We show that for any
DFA D = (Q, Σ, δ, s, F) that recognizes Lk we have |Q| ≥ 2^k.
Assuming otherwise, by the pigeonhole principle there are two
distinct words u and v of length k such that D is in the same
state q after having read u and v, i.e., δ(s, u) = δ(s, v) = q.
Let j ∈ {1, ..., k} be the least position where u and v differ and
consider the two words u0^{j−1} and v0^{j−1}.
Both words have length k + j − 1, so the kth last symbol of each is
the symbol at position j of u and of v, respectively.
Then exactly one of the two latter words is in Lk, whereas these
words are either both in L(D) or both not in L(D) because of
δ(s, u0^{j−1}) = δ(s, v0^{j−1}) = δ(q, 0^{j−1}).
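The key step of this argument, namely that padding with j − 1 zeros separates any two distinct words of length k with respect to Lk, can be checked mechanically for small k. The following Python sketch is only such a sanity check under the definitions above, not a replacement for the proof; the function names are illustrative.

```python
from itertools import product

def in_Lk(w, k):
    """w is in Lk if it has length at least k and its kth last symbol is 1."""
    return len(w) >= k and w[-k] == "1"

def check_distinguishing_suffixes(k):
    """For all distinct u, v of length k, exactly one of u 0^(j-1), v 0^(j-1)
    must be in Lk, where j is the least position at which u and v differ."""
    words = ["".join(t) for t in product("01", repeat=k)]
    for u in words:
        for v in words:
            if u == v:
                continue
            j = next(i for i in range(k) if u[i] != v[i]) + 1  # 1-based position
            pad = "0" * (j - 1)
            if in_Lk(u + pad, k) == in_Lk(v + pad, k):
                return False
    return True

print(all(check_distinguishing_suffixes(k) for k in range(1, 6)))
```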