Formal Languages applied to Linguistics

Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Formal Languages applied to Linguistics
Pascal Amsili
Laboratoire de Linguistique Formelle, Université Paris Diderot
U. São Carlos, september 2014
1 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Overview
1
Formal Languages
Base notions
Definition
Problem
2
Formal Grammars
3
Regular Languages
4
Formal complexity of Natural Languages
2 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Alphabet, word
Def. 1 (Alphabet)
An alphabet ⌃ is a finite set of symbols (letters). The size of the
alphabet is the cardinal of the set.
Def. 2 (Word)
A word on the alphabet ⌃ is a finite sequence of letters from ⌃.
Formally, let [p] = (1, 2, 3, 4, ..., p) (ordered integer sequence).
Then a word is a mapping
u : [p] ! ⌃
p, the length of u, is noted |u|.
3 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Alphabet
Words
Alphabet
Words
{0,1,2,3,4,5,6,7,8,9, · }
235 · 29
007 · 12
·1 · 1 · 00 · ·
3 · 1415962 . . . (⇡)
...
{, }
...
4 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples II
Alphabet
Words
Alphabet
Words
{
,
,
,
,
, ... }
...
{a, man, loves, woman }
a
a man loves a woman
man man a loves woman loves a
...
5 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Monoid
Def. 3 (⌃⇤ )
Let ⌃ be an alphabet.
The set of all the words that can be formed with any number of
letters from ⌃ is noted ⌃⇤
It comprises a word with no letter, noted "
Example: ⌃
⌃⇤
= {a, b, c}
= {", a, b, c, aa, ab, ac, ba, . . . , bbb, . . .}
N.B.: ⌃⇤ is always infinite, except. . .
6 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Monoid
Def. 3 (⌃⇤ )
Let ⌃ be an alphabet.
The set of all the words that can be formed with any number of
letters from ⌃ is noted ⌃⇤
It comprises a word with no letter, noted "
Example: ⌃
⌃⇤
= {a, b, c}
= {", a, b, c, aa, ab, ac, ba, . . . , bbb, . . .}
N.B.: ⌃⇤ is always infinite, except. . .
if ⌃ = ;. Then ⌃⇤ = {"}.
6 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Structure of ⌃⇤
Let k be the size of the alphabet k = |⌃|.
Then ⌃⇤ contains : k 0 = 1
k1 = k
k2
...
kn
word(s) of 0 letters (")
word(s) of 1 letters
word(s) of 2 letters
words of n letters, 8n
0
7 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Representation of ⌃⇤
⌃ = {a, b, c}
"
HH
H
a
H
HH
ab ac
aaa
aa
HH
H
aab aac
ba
HH
H
b
H
HH
bb bc
H
c
H
HH
ca cb cc
...
Words can be enumerated according to different orders
⌃⇤ is a countable set
8 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Concatenation
⌃⇤ can be equipped with a binary operation: the concatenation
Def. 4 (Concatenation)
u
w
Let [p] ! X , [q] ! X . The concatenation of u and w , noted
uw (u.w ) is thus defined:
uw : [p + q]⇢ ! X
ui
for i 2 [1, p]
uwi =
wi p for i 2 [p + 1, p + q]
9 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Concatenation
⌃⇤ can be equipped with a binary operation: the concatenation
Def. 4 (Concatenation)
u
w
Let [p] ! X , [q] ! X . The concatenation of u and w , noted
uw (u.w ) is thus defined:
uw : [p + q]⇢ ! X
ui
for i 2 [1, p]
uwi =
wi p for i 2 [p + 1, p + q]
Example : u
v
bacba
cca
9 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Concatenation
⌃⇤ can be equipped with a binary operation: the concatenation
Def. 4 (Concatenation)
u
w
Let [p] ! X , [q] ! X . The concatenation of u and w , noted
uw (u.w ) is thus defined:
uw : [p + q]⇢ ! X
ui
for i 2 [1, p]
uwi =
wi p for i 2 [p + 1, p + q]
Example : u
v
uv
bacba
cca
bacbacca
9 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Factor
Def. 5 (Factor)
A factor w of u is a subset of adjascent letters in u.
–w is a factor of u
, 9u1 , u2 s.t. u = u1 wu2
–w is a left factor (prefix) of u
, 9u2 s.t. u = wu2
–w is a right factor (suffix) of u , 9u1 s.t. u = u1 w
Def. 6 (Factorization)
We call factorization the decomposition of a word in factors.
10 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Role of concatenation
1
2
Words have been defined on ⌃.
If one takes two such words, it’s always possible to form a new
word by concatenating them.
Any word can be factorised in many different ways:
abac c ab
11 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Role of concatenation
1
2
Words have been defined on ⌃.
If one takes two such words, it’s always possible to form a new
word by concatenating them.
Any word can be factorised in many different ways:
abac c ab
(a b a)(c c a b)
11 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Role of concatenation
1
2
Words have been defined on ⌃.
If one takes two such words, it’s always possible to form a new
word by concatenating them.
Any word can be factorised in many different ways:
abac c ab
(a b)(a c c)(a b)
11 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Role of concatenation
1
2
Words have been defined on ⌃.
If one takes two such words, it’s always possible to form a new
word by concatenating them.
Any word can be factorised in many different ways:
abac c ab
(a b a c c)(a b)
11 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Role of concatenation
1
2
Words have been defined on ⌃.
If one takes two such words, it’s always possible to form a new
word by concatenating them.
Any word can be factorised in many different ways:
abac c ab
(a)(b)(a)(c)(c)(a)(b)
11 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Role of concatenation
1
2
3
4
Words have been defined on ⌃.
If one takes two such words, it’s always possible to form a new
word by concatenating them.
Any word can be factorised in many different ways:
abac c ab
(a)(b)(a)(c)(c)(a)(b)
Since all letters of ⌃ form a word of length 1
(this set of words is called the base),
any word of ⌃⇤ can be seen as a (unique) sequence of
concatenations of length 1 words :
abac c ab
((((((ab)a)c)c)a)b)
((((((a.b).a).c).c).a).b)
11 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Properties of concatenation
1
Concatenation is non commutative
2
Concatenation is associative
3
Concatenation has an identity (neutral) element: "
1
2
Notation : a.a.a = a3
3
uv .w 6= w .uv
(u.v ).w = u.(v .w )
u." = ".u = u
12 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Overview
1
Formal Languages
Base notions
Definition
Problem
2
Formal Grammars
3
Regular Languages
4
Formal complexity of Natural Languages
13 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Language
Def. 7 ((Formal) Language)
Let ⌃ be an alphabet.
A language on ⌃ is a set of words on ⌃.
14 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Language
Def. 7 ((Formal) Language)
Let ⌃ be an alphabet.
A language on ⌃ is a set of words on ⌃.
or, equivalently,
A language on ⌃ is a subset of ⌃⇤
14 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
L1 = {aa, ab, bac}
finite language
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
L1 = {aa, ab, bac}
L2 = {a, aa, aaa, aaaa . . .}
finite language
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
L1 = {aa, ab, bac}
L2 = {a, aa, aaa, aaaa . . .}
or L2 = {ai / i
1}
finite language
infinite language
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
L1 = {aa, ab, bac}
L2 = {a, aa, aaa, aaaa . . .}
or L2 = {ai / i
1}
L3 = {"}
finite language
infinite language
finite language,
reduced to a singleton
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
L1 = {aa, ab, bac}
L2 = {a, aa, aaa, aaaa . . .}
or L2 = {ai / i
1}
L3 = {"}
finite language
infinite language
finite language,
reduced to a singleton
6=
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
L1 = {aa, ab, bac}
L2 = {a, aa, aaa, aaaa . . .}
or L2 = {ai / i
1}
L3 = {"}
L4 = ;
finite language
infinite language
finite language,
reduced to a singleton
6=
“empty” language
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples I
Let ⌃ = {a, b, c}.
L1 = {aa, ab, bac}
L2 = {a, aa, aaa, aaaa . . .}
or L2 = {ai / i
1}
L3 = {"}
L4 = ;
L5 = ⌃⇤
finite language
infinite language
finite language,
reduced to a singleton
6=
“empty” language
15 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples II
Let ⌃ = {a, man, loves, woman}.
16 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples II
Let ⌃ = {a, man, loves, woman}.
L = { a man loves a woman, a woman loves a man }
16 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples II
Let ⌃ = {a, man, loves, woman}.
L = { a man loves a woman, a woman loves a man }
Let ⌃0 = {a, man, who, saw, fell}.
16 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Examples II
Let ⌃ = {a, man, loves, woman}.
L = { a man loves a woman, a woman loves a man }
Let ⌃0 = {a, man, who, saw, fell}.
8
9
a
man
fell,
>
>
>
>
<
=
a
man
who
saw
a
man
fell,
0
L =
a man who saw a man who saw a man fell, >
>
>
>
:
;
...
16 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Set operations
Since a language is a set, usual set operations can be defined:
union
intersection
set difference
17 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Set operations
Since a language is a set, usual set operations can be defined:
union
intersection
set difference
) One may describe a (complex) language as the result of set
operations on (simpler) languages:
{a2k / k 1} = {a, aa, aaa, aaaa, . . .} \ {ww / w 2 ⌃⇤ }
17 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Additional operations
Def. 8 (product operation on languages)
One can define the language product and its closure the Kleene star
operation:
The product of languages is thus defined:
L1 .L2 = {uv / u 2 L1 & v 2 L2 }
k times
z }| {
Notation: L.L.L . . . L = Lk ; L0 = {"}
The Kleene star of a language is thus defined:
L⇤ =
S
n 0L
n
18 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Regular expressions
It is common to use the 3 rational operations:
union
product
Kleene star
to characterize certain languages...
19 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Regular expressions
It is common to use the 3 rational operations:
union
product
Kleene star
to characterize certain languages...
({a} [ {b})⇤ .{c} = {c, ac, abc, bc, . . . , baabaac, . . .}
(simplified notation (a|b)⇤ c — regular expressions)
19 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Regular expressions
It is common to use the 3 rational operations:
union
product
Kleene star
to characterize certain languages...
({a} [ {b})⇤ .{c} = {c, ac, abc, bc, . . . , baabaac, . . .}
(simplified notation (a|b)⇤ c — regular expressions)
... but not all languages can be thus characterized.
19 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Overview
1
Formal Languages
Base notions
Definition
Problem
2
Formal Grammars
3
Regular Languages
4
Formal complexity of Natural Languages
20 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Back to “Natural” Languages
English as a formal language:
alphabet morphemes (often simplified to words —depending on
your view on flexional morphology)
) Finite at a time t by hypothesis
words well formed English sentences
) English sentences are all finite by hypothesis
language English, as a set of an infinite number of well formed
combinations of “letters” from the alphabet
21 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Discussion I
1
is the alphabet finite?
closed class morphemes obviously
open class morphemes what about “new words”?
morphological derivations can be seen as
produced from an unchanged
inventory (1)
other words loan words (rare)
lexical inventions (rare)
change of category (2) (bounded)
) negligable
(1)
motherese = mother+ese
(2)
americanA ! americanN
22 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Discussion II
2
is English infinite ?
It is supposed that you can always profer a longuer sentence
than the previous one by adding linguistic material preserving
well-formedness.
Compatible with the working memory limit
(Langendoen & Postal, 1984)
3
is language discrete ?
Well, that’s another story
23 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
About infinity
Linguists sometimes have trouble with infinity:
In order for there to be an infinite number of sentences in a
language there must either be an infinite number of words
in the language (clearly not true) or there must be the possibility
of infinite length sentences. The product of two finite numbers
is always a finite number.
(Mannell, 1999)
and many others
24 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
About infinity
Linguists sometimes have trouble with infinity:
In order for there to be an infinite number of sentences in a
language there must either be an infinite number of words
in the language (clearly not true) or there must be the possibility
of infinite length sentences. The product of two finite numbers
is always a finite number.
(Mannell, 1999)
and many others
!! WRONG !!
24 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
About infinity
Linguists sometimes have trouble with infinity:
In order for there to be an infinite number of sentences in a
language there must either be an infinite number of words
in the language (clearly not true) or there must be the possibility
of infinite length sentences. The product of two finite numbers
is always a finite number.
(Mannell, 1999)
and many others
!! WRONG !!
The whole point of formal languages is that they are infinite
sets
::::::
of ::::::
finite words on a finite
alphabet.
:::::
24 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
About infinity
Linguists sometimes have trouble with infinity:
In order for there to be an infinite number of sentences in a
language there must either be an infinite number of words
in the language (clearly not true) or there must be the possibility
of infinite length sentences. The product of two finite numbers
is always a finite number.
(Mannell, 1999)
and many others
!! WRONG !!
The whole point of formal languages is that they are infinite
sets
::::::
of ::::::
finite words on a finite
alphabet.
:::::
von Humbolt: language is an infinite use of finite means
(quoted by Chomsky)
24 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Base notions
Definition
Problem
Good questions
Why would one consider natural language as a formal language?
it allows to describe the language in a
formal/compact/elegant way
it allows to compare various languages (via classes of
languages established by mathematicians)
it give algorithmic tools to recognize and to analyse words
of a language.
recognize u : decide whether u 2 L
analyse u : show the internal structure of u
25 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Overview
1
Formal Languages
2
Formal Grammars
Definition
Language classes
3
Regular Languages
4
Formal complexity of Natural Languages
26 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Introduction
Formal grammars have been proposed by Chomsky as one of the
available means to characterize a formal language.
Other means include :
Turing machines (automata)
-terms
...
27 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Formal grammar
Def. 9 ((Formal) Grammar)
A formal grammar is defined by h⌃, N, S, Pi where
⌃ is an alphabet
N is a disjoint alphabet non-terminal vocabulary)
S 2 V is a distinguished elemnt of N, called the axiom
P is a set of « production rules », namely a subset of the
cartesian product (⌃ [ N)⇤ N(⌃ [ N)⇤ ⇥ (⌃ [ N)⇤ .
28 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
h⌃, N, S, Pi
G0 =
*
29 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
h⌃, N, S, Pi
G0 =
*
{joe, sam, sleeps},
29 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
h⌃, N, S, Pi
G0 =
*
{joe, sam, sleeps}, {N, V , S},
29 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
h⌃, N, S, Pi
G0 =
*
{joe, sam, sleeps}, {N, V , S}, S,
29 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
h⌃, N, S, Pi
G0 =
*
8
(N, joe)
>
>
<
(N, sam)
{joe, sam, sleeps}, {N, V , S}, S,
(V , sleeps)
>
>
:
(S, N V )
9
>
+
>
=
>
>
;
}
29 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
h⌃, N, S, Pi
G0 =
*
8
N ! joe
>
>
<
N ! sam
{joe, sam, sleeps}, {N, V , S}, S,
V ! sleeps
>
>
:
S !NV
9
>
+
>
=
>
>
;
}
29 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples (cont’d)
8
S ! SN SV
>
>
>
*
>
< SN ! Np
SV ! V
G1 = {jean, dort}, {Np, SN, SV , V , S}, S,
>
>
> Np ! jean
>
:
V ! dort
G2 = h{(, )}, {S}, S, {S ! " | (S)S}i
9
>
>
>
+
>
=
>
>
>
>
;
}
30 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Notation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
31 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Notation
G3 : E
! E +E
|
E ⇥E
|
(E )
|
F
F
! 0|1|2|3|4|5|6|7|8|9
G3 = h{+, ⇥, (, ), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {E , F }, E , {. . .}i
31 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Notation
G3 : E
! E +E
|
E ⇥E
|
(E )
|
F
F
! 0|1|2|3|4|5|6|7|8|9
G3 = h{+, ⇥, (, ), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9}, {E , F }, E , {. . .}i
G4 = E ! E + T | T , T ! T ⇥ F | F , F ! (E ) | a
31 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Immediate Derivation
Def. 10 (Immediate derivation)
Let G = hX , V , S, Pi a grammar, (f , g ) 2 (X [ V )⇤ two “words”,
r 2 P a production rule, such that r : A ! u (u 2 (X [ V )⇤ ).
• f derives into g (immediate derivation) with the rule r
r
(noted f ! g ) iff
9v , w s.t. f = vAw and g = vuw
• f derives into g (immediate derivation) in the grammar G
G
(noted f ! g ) iff
r
9r 2 P s.t. f ! g .
32 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Derivation
Def. 11 (Derivation)
f
G⇤
! g if f = g
9f0 , f1 , f2 , ..., fn s.t. f0 = f
fn = g
8i 2 [1, n] : fi
or
1
G
! fi
An example with G0 :
N V joe N
33 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Derivation
Def. 11 (Derivation)
f
G⇤
! g if f = g
9f0 , f1 , f2 , ..., fn s.t. f0 = f
fn = g
8i 2 [1, n] : fi
or
1
G
! fi
An example with G0 :
N V joe N ! sam V joe N
33 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Derivation
Def. 11 (Derivation)
f
G⇤
! g if f = g
9f0 , f1 , f2 , ..., fn s.t. f0 = f
fn = g
8i 2 [1, n] : fi
An example with G0 :
N V joe N ! sam V joe N
or
1
G
! fi
! sam V joe joe
or
33 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Derivation
Def. 11 (Derivation)
f
G⇤
! g if f = g
9f0 , f1 , f2 , ..., fn s.t. f0 = f
fn = g
8i 2 [1, n] : fi
An example with G0 :
N V joe N ! sam V joe N
or
1
G
! fi
! sam V joe joe
sam V joe sam
or
or
33 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Derivation
Def. 11 (Derivation)
f
G⇤
! g if f = g
9f0 , f1 , f2 , ..., fn s.t. f0 = f
fn = g
8i 2 [1, n] : fi
An example with G0 :
N V joe N ! sam V joe N
or
1
G
! fi
! sam V joe joe
sam V joe sam
sam sleeps joe N
...
or
or
or
33 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥E
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥E
! F ⇥E
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥E
! F ⇥E
! 3⇥E
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥E
! F ⇥E
! 3⇥E
! 3 ⇥ (E )
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥E
! F ⇥E
! 3⇥E
! 3 ⇥ (E ) ! 3 ⇥ (E + E )
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥E ! F ⇥E
3 ⇥ (E + F )
! 3⇥E
! 3 ⇥ (E ) ! 3 ⇥ (E + E ) !
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥E ! F ⇥E ! 3⇥E
3 ⇥ (E + F ) ! 3 ⇥ (E + 4)
! 3 ⇥ (E ) ! 3 ⇥ (E + E ) !
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥ E ! F ⇥ E ! 3 ⇥ E ! 3 ⇥ (E ) ! 3 ⇥ (E + E ) !
3 ⇥ (E + F ) ! 3 ⇥ (E + 4) ! 3 ⇥ (F + 4)
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥ E ! F ⇥ E ! 3 ⇥ E ! 3 ⇥ (E ) ! 3 ⇥ (E + E ) !
3 ⇥ (E + F ) ! 3 ⇥ (E + 4) ! 3 ⇥ (F + 4) ! 3 ⇥ (5 + 4)
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Endpoint of a derivation
G3 : E
F
!
|
|
|
!
E +E
E ⇥E
(E )
F
0|1|2|3|4|5|6|7|8|9
An example with G3 :
E ⇥ E ! F ⇥ E ! 3 ⇥ E ! 3 ⇥ (E ) ! 3 ⇥ (E + E ) !
3 ⇥ (E + F ) ! 3 ⇥ (E + 4) ! 3 ⇥ (F + 4) ! 3 ⇥ (5 + 4) !|
34 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 :
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
as well as ((())), ()()(), ((()()())). . .
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
as well as ((())), ()()(), ((()()())). . .
but )()( 62 LG2 , even though the following is a licit derivation :
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
as well as ((())), ()()(), ((()()())). . .
but )()( 62 LG2 , even though the following is a licit derivation :
)S( !
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
as well as ((())), ()()(), ((()()())). . .
but )()( 62 LG2 , even though the following is a licit derivation :
)S( ! )(S)S( !
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
as well as ((())), ()()(), ((()()())). . .
but )()( 62 LG2 , even though the following is a licit derivation :
)S( ! )(S)S( ! )()S( !
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
as well as ((())), ()()(), ((()()())). . .
but )()( 62 LG2 , even though the following is a licit derivation :
)S( ! )(S)S( ! )()S( ! )()(
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Engendered language
Def. 12 (Language engendered by a word)
Let f 2 (⌃ [ N)⇤ .
LG (f ) = {g 2 X ⇤ /f
G⇤
! g}
Def. 13 (Language engendered by a grammar)
The language engendered by a grammar G is the set of words of ⌃⇤
derived from the axiom.
LG = LG (S)
For instance () 2 LG2 : S ! (S)S ! ()S ! ()
as well as ((())), ()()(), ((()()())). . .
but )()( 62 LG2 , even though the following is a licit derivation :
)S( ! )(S)S( ! )()S( ! )()(
for there is no way to arrive at )S( starting with S.
35 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Example
G4 = E ! E + T | T , T ! T ⇥ F | F , F ! (E ) | a
a + a, a + (a ⇥ a), ...
36 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Proto-word
Def. 14 (Proto-word)
A proto-word (or proto-sentence) is a word on (⌃ [ N)⇤ N(⌃ [ N)⇤
(that is, a word containing at least one letter of N) produced by a
derivation from the axiom.
E !E +T !E +T ⇤F !T +T ⇤F !T +F ⇤F !
T + a ⇤ F ! F + a ⇤ F ! a + a ⇤ F ! ///////////
a+a⇤a
37 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Multiple derivations
A given word may have several derivations:
E ! E +E ! F +E ! F +F ! 3+F ! 3+4
38 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Multiple derivations
A given word may have several derivations:
E ! E +E ! F +E ! F +F ! 3+F ! 3+4
E ! E +E ! E +F ! E +4 ! F +4 ! 3+4
38 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Multiple derivations
A given word may have several derivations:
E ! E +E ! F +E ! F +F ! 3+F ! 3+4
E ! E +E ! E +F ! E +4 ! F +4 ! 3+4
... but if the grammar is not ambiguous, there is only one left
derivation:
38 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Multiple derivations
A given word may have several derivations:
E ! E +E ! F +E ! F +F ! 3+F ! 3+4
E ! E +E ! E +F ! E +4 ! F +4 ! 3+4
... but if the grammar is not ambiguous, there is only one left
derivation:
E ! E +E ! F +E ! 3+E ! 3+F ! 3+4
38 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Multiple derivations
A given word may have several derivations:
E ! E +E ! F +E ! F +F ! 3+F ! 3+4
E ! E +E ! E +F ! E +4 ! F +4 ! 3+4
... but if the grammar is not ambiguous, there is only one left
derivation:
E ! E +E ! F +E ! 3+E ! 3+F ! 3+4
parsing : trying to find the/a left derivation (resp. right)
38 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Derivation tree
For context-free languages, there is a way to represent the set of
equivalent derivations, via a derivation tree which shows all the
derivation independantly of their order.
S
Grammar G2 : S
! "
|
(S)S
⇣⇣P
@PPP
⇣⇣
⇣
@
PP
⇣
PP
⇣⇣
@
(
S
⇣P
⇣
P
⇣⇣ @ PP
(
S
)
"
S
)
S
"
"
S ! (S)S ! ((S)S)S ! ((S)S) ! ((S)) ! (())
39 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Structural analysis
Syntactic trees are precious to give access to the semantics
E
E
HH
H
H
+
T
T
T
F
F
a
a
HH
⇤
F
a
40 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Ambiguity
When a grammar can assign more than one derivation tree to a
word w 2 L(G ) (or more than one left derivation), the grammar is
ambiguous.
For instance, G3 is ambiguous, since it can assign the two follwing
trees to 1 + 2 ⇥ 3:
E
E
H
HH
E
+
H
E
H
F
E
1
⇥
H
HH
H
E
H
H
H
⇥
E
E
F
F
F
3
1
2
E
E
F
F
2
3
+
41 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
About ambiguity
Ambiguity is not desirable for the semantics
Useful artificial languages are rarely ambiguous
There are context-free languages that are intrinsequely
ambiguous (3)
Natural languages are notoriously ambiguous...
(3)
{an bam bap baq |(n
q^m
p) _ (n
m^p
q)}
42 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Comparison of grammars
different languages generated ) different grammars
same language generated by G and G 0 :
) same weak generative power
same language generated by G and G 0 , and same structural
decomposition :
) same strong generative power
43 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Overview
1
Formal Languages
2
Formal Grammars
Definition
Language classes
3
Regular Languages
4
Formal complexity of Natural Languages
44 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Principle
Define language families on the basis of properties of the
grammars that generate them :
1
Four classes are defined, they are included one in another
2
A language is of type k if it can be recognized by a type k
grammar (and thus, by definition, by a type k 1 grammar) ;
and cannot be recognized by a grammar of type k + 1.
45 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Chomsky’s hierarchy
type 0 No restriction on
P ⇢ (X [ V )⇤ V (X [ V )⇤ ⇥ (X [ V )⇤ .
type 1 (context-sensitive grammars) All rules of P are of the
shape (u1 Su2 , u1 mu2 ), where u1 and u2 2 (X [ V )⇤ ,
S 2 V and m 2 (X [ V )+ .
type 2 (context-free grammar) All rules of P are of the
shape (S, m), where S 2 V and m 2 (X [ V )⇤ .
type 3 (regular grammars) All rules of P are of the shape
(S, m), where S 2 V and m 2 X .V [ X [ {"}.
46 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
type
S
B
A
3:
! aS | aB | bB | cA
! bB | b
! cS | bB
47 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Examples
type
S
B
A
3:
! aS | aB | bB | cA
! bB | b
! cS | bB
type 2:
E ! E + T | T , T ! T ⇥ F | F , F ! (E ) | a
47 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Example 1 type 0
Type 0:
S ! SABC AC ! CA
S !"
CA ! AC
AB ! BA
BC ! CB
BA ! AB
CB ! BC
generated language :
A !a
B !b
C !c
48 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Example 1 type 0
Type 0:
S ! SABC AC ! CA
S !"
CA ! AC
AB ! BA
BC ! CB
BA ! AB
CB ! BC
generated language : words
A !a
B !b
C !c
with an equal number of a, b, and c.
48 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Example 2: type 0
Type 0: S
S0
S0
S0
! $S 0 $
! aAS 0
! bBS 0
!"
Aa
Ab
Ba
Bb
! aA
! bA
! aB
! bB
$a ! a$
$b ! b$
A$ ! $a
B$ ! $b
$$ ! #
49 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Example 2: type 0 (cont’d)
S
H
HH
H
S0
$
a
HH
H
$
HH
H
S0
A
b
HH
0
B
S
"
$
a
a
a
a
a
a
a
$
$
$
b
b
b
A b
A b
A b
b A
$ A
$ $
#
B
B
$
$
$
a
a
$
$
b
b
b
b
b
50 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Language families
recursive
finite
regular
3
recursively enumerable
context−sensitive
2
1
context−free
formal
0
no constraint
Turing−decidable
Turing−recognizable
51 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
Definition
Language classes
Remarks
There are others ways to classify languages,
either on other properties of the grammars;
or on other properties of the languages
Nested structures are preferred, but it’s not necessary
When classes are nested, it is expected to have a growth of
complexity/expressive power
52 / 74
Formal Languages
Formal Grammars
Regular Languages
Formal complexity of Natural Languages
References
References I
Aho, Alfred, & Ullman, Jeffrey. 1993. Concepts fondamentaux de l’informatique. Dunod. Traduction de
Foundations of Computer Science, 1992, W.H. Freeman and Company, New York.
Langendoen, D Terence, & Postal, Paul Martin. 1984. The vastness of natural languages. Basil
Blackwell Oxford.
Mannell, Robert. 1999. Infinite number of sentences. part of a set of class notes on the Internet.
74 / 74