Formal Languages, Coinductively Formalized

Andreas Abel
Department of Computer Science and Engineering
Chalmers and Gothenburg University
Departmental Seminar
Department of Computer and Information Sciences
Strathclyde University, Glasgow, Scotland, UK
19 April 2016
Andreas Abel (GU)
Languages coinductively
Strathclyde 2016
Contents
1. Formal Languages
2. Coinductive Types and Copatterns
3. Bisimilarity
4. Sized Coinductive Types
5. Conclusions
Formal Languages
A language is a set of strings over some alphabet A.
Real life examples:
Orthographically and grammatically correct English texts (infinite set).
Orthographically correct English texts (even bigger set).
List of university employees plus their phone extensions:
AbelAndreas1731,CoquandThierry1030,DybjerPeter1035,...
Programming language examples:
The set of grammatically correct Java programs.
The set of decimal numbers.
The set of well-formed string literals.
Languages can describe protocols, e.g. file access.
A = {o, r, w, c} (open, read, write, close)
Read-only access: orc, oc, orrrc, orcorrrcoc, ...
Illegal sequences: c, rr, orr, oco, ...
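The read-only protocol can be sketched as a two-state automaton. This is only an illustration: the state names "closed" and "open" and the function name are assumptions, and the sketch also accepts the empty word, which the slide does not settle.

```python
# Hypothetical automaton for read-only file access over A = {o, r, w, c}.
# A missing entry means the letter is illegal in that state.
TRANSITIONS = {
    ("closed", "o"): "open",
    ("open", "r"): "open",
    ("open", "c"): "closed",
}

def is_read_only_session(word: str) -> bool:
    """Accept words that open, read any number of times, and close, repeatedly."""
    state = "closed"
    for letter in word:
        state = TRANSITIONS.get((state, letter))
        if state is None:          # illegal letter in this state
            return False
    return state == "closed"       # every opened file must be closed again
```

The words listed as read-only access are accepted, and the words listed as illegal are rejected.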
Running Example: Even binary numbers
Even binary numbers: 0, 10, 100, 110, 1000, 1010, ...
Excluded: 00, 010 (non-canonical); 1, 11 (odd); ...
Alphabet A = {a, b} where a is zero and b is one.
So E = {a, ba, baa, bba, baaa, baba, ...}.
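Membership in E can be stated directly as a predicate on words. The function name is an assumption; the conditions restate the slide: the word is either a (zero) or starts with b (canonical) and ends with a (even).

```python
def is_even_binary(word: str) -> bool:
    """Membership in E over the alphabet {a, b}, where a = 0 and b = 1."""
    if any(ch not in "ab" for ch in word):
        return False
    if word == "a":                      # the number 0
        return True
    # canonical: leading digit is 1 (b); even: last digit is 0 (a)
    return word.startswith("b") and word.endswith("a")
```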
Tries
An infinite trie is a node-labeled A-branching tree.
I.e., each node has one branch for each letter a ∈ A.
A language can be represented by an infinite trie.
To check whether the word a₁ ⋯ aₙ is in the language:
We start at the root.
At step i, we choose branch aᵢ.
At the final node, the label tells us whether the word is in the language or not.
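The lookup procedure can be sketched with a lazily built trie. Python has no coinductive types, so infinite depth is simulated here by creating child nodes only on demand; the names Trie, trie_of and contains are assumptions for illustration.

```python
from typing import Callable

class Trie:
    """Node of an infinite trie: a Boolean label and one child per letter."""
    def __init__(self, label: bool, child: Callable[[str], "Trie"]):
        self.label = label
        self.child = child          # called only when a branch is followed

def trie_of(member: Callable[[str], bool], prefix: str = "") -> Trie:
    """Trie whose node reached via `prefix` is labeled member(prefix)."""
    return Trie(member(prefix), lambda a: trie_of(member, prefix + a))

def contains(trie: Trie, word: str) -> bool:
    node = trie                     # start at the root
    for letter in word:             # at step i, choose branch a_i
        node = node.child(letter)
    return node.label               # the final label decides membership

# Example language (chosen for illustration): words over {a, b} ending in a.
ends_in_a = trie_of(lambda w: w.endswith("a"))
```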
Trie of E
[Figure: the infinite trie of E over {a, b}; each node has an a-branch and a b-branch.]
Regular Languages
A trie is regular if it has only finitely many different subtrees.
Each node of the trie corresponds to one of these languages:
E    even binary numbers
Z    strings ending in a
N    strings not ending in b
ε    the empty string
∅    nothing (empty language)
[Figure: the trie of E with each node labeled by the language of its subtree (E, ε, ∅, Z, or N).]
Cutting duplications at depth 3
[Figure: the labeled trie of E cut off at depth 3, with nodes E, ε, ∅, Z, N.]
Bending branches ...
[Figure: the cut trie with its dangling branches bent back to nodes carrying the same label, giving a finite graph over E, ε, ∅, Z, N.]
Finite Automata
We have arrived at a familiar object: a finite automaton.
Depending on what we cut, we get different automata for E.
If we cut all duplicate subtrees, we get the minimal automaton.
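Merging states whose subtrees coincide can be sketched with Moore-style partition refinement. This is a swapped-in standard technique, not necessarily what the talk uses, and the seven-state automaton below is hypothetical: it adds duplicates Z2 and N2 of Z and N to an automaton for E.

```python
def minimize(states, alphabet, accepting, delta):
    """Partition refinement; returns a map from each state to its class number."""
    block = {s: int(s in accepting) for s in states}
    while True:
        # Signature: own class plus the classes of the successors.
        sig = {s: (block[s], tuple(block[delta[s][a]] for a in alphabet))
               for s in states}
        numbering = {v: n for n, v in enumerate(sorted(set(sig.values())))}
        refined = {s: numbering[sig[s]] for s in states}
        if len(numbering) == len(set(block.values())):
            return refined          # no class was split: partition is stable
        block = refined

# Hypothetical non-minimal automaton for E ("eps" and "empty" stand for ε and ∅).
DELTA = {
    "E":     {"a": "eps",   "b": "Z"},
    "eps":   {"a": "empty", "b": "empty"},
    "empty": {"a": "empty", "b": "empty"},
    "Z":     {"a": "N",     "b": "Z2"},
    "N":     {"a": "N2",    "b": "Z"},
    "Z2":    {"a": "N2",    "b": "Z2"},
    "N2":    {"a": "N",     "b": "Z2"},
}
classes = minimize(list(DELTA), "ab", {"eps", "N", "N2"}, DELTA)
```

In this example Z and Z2 fall into one class, as do N and N2, leaving five classes.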
Removing duplicate subtrees II ...
[Figure: the labeled trie of E with all duplicate subtrees removed; nodes E, ε, ∅, Z, N.]
Bending branches II ...
[Figure: the resulting automaton for E with states E, ε, ∅, Z, N.]
Extensional Equality of Automata
All automata for E unfold to the same trie.
This gives an extensional notion of automata equality:
1. Recognizing the same language.
2. I.e., unfolding to the same trie.
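For finite automata, "unfolding to the same trie" can be checked by walking both automata in lockstep and comparing acceptance at every reachable pair of states. The sketch below assumes automata given as dictionaries; both example automata are hypothetical and recognize the words ending in a.

```python
def same_language(nu1, delta1, s1, nu2, delta2, s2, alphabet="ab"):
    """True if the tries unfolded from s1 and s2 carry the same labels."""
    seen = set()
    todo = [(s1, s2)]
    while todo:
        p, q = todo.pop()
        if (p, q) in seen:
            continue
        seen.add((p, q))
        if nu1[p] != nu2[q]:        # labels differ at corresponding nodes
            return False
        for a in alphabet:          # follow the same branch in both tries
            todo.append((delta1[p][a], delta2[q][a]))
    return True

# Two hypothetical automata for "words ending in a"; the second has a redundant state.
NU_SMALL = {0: False, 1: True}
DELTA_SMALL = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 0}}
NU_BIG = {"p": False, "q": True, "r": True}
DELTA_BIG = {"p": {"a": "q", "b": "p"},
             "q": {"a": "r", "b": "p"},
             "r": {"a": "q", "b": "p"}}
```

The set `seen` plays the role of a bisimulation: the loop terminates because there are only finitely many state pairs.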
Automata, Formally
An automaton consists of
1. A set of states S.
2. A function ν : S → Bool singling out the accepting states.
3. A transition function δ : S → A → S.
s ∈ S    E    ε    ∅    Z    N
ν s      ✗    ✓    ✗    ✗    ✓
δ s a    ε    ∅    ∅    N    N
δ s b    Z    ∅    ∅    Z    Z
Language automaton
1. State = language ℓ accepted when starting from that state.
2. ν ℓ: Is language ℓ nullable (does it accept the empty word)?
3. δ ℓ a = {w | aw ∈ ℓ}: Brzozowski derivative.
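The transition table of this slide can be written down as data and run on a word. The ASCII state names "eps" and "empty" stand for ε and ∅; the function name accepts is an assumption.

```python
# ν: which states (languages) are nullable.
NU = {"E": False, "eps": True, "empty": False, "Z": False, "N": True}

# δ: the derivative of each state by each letter.
DELTA = {
    "E":     {"a": "eps",   "b": "Z"},
    "eps":   {"a": "empty", "b": "empty"},
    "empty": {"a": "empty", "b": "empty"},
    "Z":     {"a": "N",     "b": "Z"},
    "N":     {"a": "N",     "b": "Z"},
}

def accepts(state: str, word: str) -> bool:
    """A word is accepted if the iterated derivative is nullable."""
    for letter in word:
        state = DELTA[state][letter]
    return NU[state]
```

Starting from state E, this accepts the even binary numbers of the running example.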
Differential equations
Language E and friends can be specified by differential equations:
ν gives the initial value.
ν ∅ = false    δ ∅ x = ∅
ν ε = true     δ ε x = ∅
ν E = false    δ E a = ε    δ E b = Z
ν N = true     δ N a = N    δ N b = Z
ν Z = false    δ Z a = N    δ Z b = Z
For these simple forms, solutions always exist.
What is the general story?
Coinductive Types and Copatterns
Final Coalgebras
(Weakly) final coalgebra.

              f
      S ----------> F(S)
      |               |
    coit f         F(coit f)
      |               |
      v               v
     νF ----------> F(νF)
           force

Coiteration = finality witness.
force ◦ coit f = F(coit f) ◦ f
Copattern matching defines coit by corecursion:
force (coit f s) = F(coit f) (f s)
Automata as Coalgebra
Arbib & Manes (1986), Rutten (1998), Traytel (2016).
Automaton structure over set of states S:
o : S → Bool         “output”: acceptance
t : S → (A → S)      transition
Automaton is coalgebra with F(S) = Bool × (A → S).
⟨o, t⟩ : S → Bool × (A → S)
Formal Languages as Final Coalgebra
             ⟨o,t⟩
      S ------------> Bool × (A → S)
      |                    |
  ℓ := coit⟨o,t⟩     id × (coit⟨o,t⟩ ◦ _)
      |                    |
      v                    v
     Lang ----------> Bool × (A → Lang)
             ⟨ν,δ⟩

ν: “nullable”
ν ◦ ℓ = o
ν (ℓ s) = o s
δ: (Brzozowski) derivative
δ ◦ ℓ = (ℓ ◦ _) ◦ t
δ (ℓ s) = ℓ ◦ (t s)
δ (ℓ s) a = ℓ (t s a)
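The two equations ν (ℓ s) = o s and δ (ℓ s) a = ℓ (t s a) translate almost literally into a lazy unfold. This is a sketch only: Python has no coinductive types, so the class Lang simulates a trie with a delayed δ, and the example automaton is an assumption.

```python
class Lang:
    """Simulated coinductive trie: observation ν and a delayed observation δ."""
    def __init__(self, nu, delta):
        self.nu = nu            # bool
        self.delta = delta      # letter -> Lang, computed on demand

def coit(o, t):
    """Unfold the automaton (o, t) into the language accepted from each state."""
    def lang(s):
        # ν (ℓ s) = o s   and   δ (ℓ s) a = ℓ (t s a)
        return Lang(o(s), lambda a: lang(t(s)(a)))
    return lang

def member(l, word):
    for a in word:
        l = l.delta(a)
    return l.nu

# Hypothetical automaton: the state records whether the last letter was a.
ends_in_a = coit(lambda s: s, lambda s: lambda a: a == "a")(False)
```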
Languages – Rule-Based
Coinductive tries Lang defined via observations/projections ν and δ:
Lang is the greatest type consistent with these rules:
 l : Lang          l : Lang    a : A
 -----------       ------------------
 ν l : Bool        δ l a : Lang

Empty language ∅ : Lang.
Language of the empty word ε : Lang defined by copattern matching:
ν ε   = true : Bool
δ ε a = ∅    : Lang
Corecursion
Empty language ∅ : Lang defined by corecursion:
ν ∅ = false
δ∅a = ∅
Language union k ∪ l is pointwise disjunction:
ν (k ∪ l) = ν k ∨ ν l
δ (k ∪ l) a = δ k a ∪ δ l a
Language composition k · l à la Brzozowski:
ν (k · l)   = ν k ∧ ν l
δ (k · l) a = (δ k a · l) ∪ δ l a    if ν k
            = (δ k a · l)            otherwise
Not accepted by the guardedness checker: the corecursive call sits under ∪, which is not a constructor.
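The equations for ∅, ε, union and composition can be transcribed with explicit delays. Python performs no guardedness or productivity check, so this sketch shows only that the equations compute; it says nothing about acceptance by a type checker. The helper char (a single-letter language) is an assumption added for the example.

```python
class Lang:
    def __init__(self, nu, delta):
        self.nu = nu            # bool: nullable?
        self.delta = delta      # letter -> Lang, evaluated on demand

def empty():
    return Lang(False, lambda a: empty())

def eps():
    return Lang(True, lambda a: empty())

def char(c):                    # hypothetical helper: the language {c}
    return Lang(False, lambda a: eps() if a == c else empty())

def union(k, l):
    return Lang(k.nu or l.nu, lambda a: union(k.delta(a), l.delta(a)))

def concat(k, l):
    def delta(a):
        left = concat(k.delta(a), l)
        return union(left, l.delta(a)) if k.nu else left
    return Lang(k.nu and l.nu, delta)

def member(l, word):
    for a in word:
        l = l.delta(a)
    return l.nu

optional_a_then_b = concat(union(eps(), char("a")), char("b"))
```

The last line denotes the language {b, ab}.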
Bisimilarity
Equality of infinite tries is defined coinductively.
_≅_ is the greatest relation consistent with

   l ≅ k                  l ≅ k    a : A
 ----------- ≅ν         ----------------- ≅δ
 ν l ≡ ν k               δ l a ≅ δ k a

Equivalence relation via provable ≅refl, ≅sym, and ≅trans.

≅trans : (p : l ≅ k) → (q : k ≅ m) → l ≅ m
≅ν (≅trans p q)   = ≡trans (≅ν p) (≅ν q)       : ν l ≡ ν m
≅δ (≅trans p q) a = ≅trans (≅δ p a) (≅δ q a)   : δ l a ≅ δ m a

Congruence for language constructions.

   k ≅ k′    l ≅ l′
 --------------------- ≅∪
 (k ∪ l) ≅ (k′ ∪ l′)
Proving bisimilarity
Composition distributes over union.
dist : ∀ k l m. k · (l ∪ m) ≅ (k · l) ∪ (k · m)
Proof. Observation δ _ a, case k nullable, l not nullable.
  δ (k · (l ∪ m)) a
= δ k a · (l ∪ m) ∪ δ (l ∪ m) a                  by definition
≅ (δ k a · l ∪ δ k a · m) ∪ (δ l a ∪ δ m a)      by coind. hyp. (wish)
≅ (δ k a · l ∪ δ l a) ∪ (δ k a · m ∪ δ m a)      by union laws
= δ ((k · l) ∪ (k · m)) a                        by definition
Formal proof attempt.
≅δ dist a = ≅trans (≅∪ dist ...) ...
Not coiterative / guarded by constructors!
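The distributivity law can at least be tested on examples by comparing ν along every word up to a fixed length. This is a bounded check, not a proof of bisimilarity; the class and helpers below are a self-contained sketch, and the three sample languages are assumptions.

```python
class Lang:
    def __init__(self, nu, delta):
        self.nu, self.delta = nu, delta     # delta: letter -> Lang, on demand

def empty():
    return Lang(False, lambda a: empty())

def eps():
    return Lang(True, lambda a: empty())

def char(c):                                # hypothetical helper: the language {c}
    return Lang(False, lambda a: eps() if a == c else empty())

def union(k, l):
    return Lang(k.nu or l.nu, lambda a: union(k.delta(a), l.delta(a)))

def concat(k, l):
    def delta(a):
        left = concat(k.delta(a), l)
        return union(left, l.delta(a)) if k.nu else left
    return Lang(k.nu and l.nu, delta)

def agree_up_to(k, l, depth, alphabet="ab"):
    """Compare ν along every word of length at most `depth`."""
    if k.nu != l.nu:
        return False
    if depth == 0:
        return True
    return all(agree_up_to(k.delta(a), l.delta(a), depth - 1, alphabet)
               for a in alphabet)

# Sample instance of dist, with k nullable and l not nullable.
k = union(eps(), char("a"))
l = char("b")
m = concat(char("a"), char("b"))
lhs = concat(k, union(l, m))
rhs = union(concat(k, l), concat(k, m))
```

Here `agree_up_to(lhs, rhs, 4)` compares the two sides on all words of length at most 4.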
Sized Coinductive Types
Construction of greatest fixed-points
Iteration to greatest fixed-point.
⊤ ⊇ F(⊤) ⊇ F²(⊤) ⊇ ⋯ ⊇ F^ω(⊤) = ⋂_{n<ω} F^n(⊤)
Naming ν^i F = F^i(⊤).
ν^0 F     = ⊤
ν^(n+1) F = F(ν^n F)
ν^ω F     = ⋂_{n<ω} ν^n F
Deflationary iteration.
ν^i F = ⋂_{j<i} F(ν^j F)
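On a finite powerset the descending chain ⊤ ⊇ F(⊤) ⊇ F²(⊤) ⊇ ⋯ stabilizes after finitely many steps, so the greatest fixed point can be computed by plain iteration. The sketch assumes F is monotone and maps subsets of ⊤ to subsets of ⊤; the example graph and the operator F are hypothetical.

```python
def greatest_fixed_point(F, top):
    """Iterate F from `top` until the descending chain stabilizes."""
    current = frozenset(top)
    while True:
        nxt = frozenset(F(current))
        if nxt == current:
            return current
        current = nxt

# Hypothetical example: states of a graph that start an infinite path.
SUCC = {1: [2], 2: [1], 3: [4], 4: []}

def F(X):
    # Keep a state if some successor is still in X.
    return {s for s in SUCC if any(t in X for t in SUCC[s])}

infinite_path_states = greatest_fixed_point(F, SUCC.keys())
```

The iteration passes through {1, 2, 3, 4}, {1, 2, 3}, and {1, 2}, which is the fixed point.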
Sized coinductive types
Add to syntax of type theory
Size        type of ordinals
i           ordinal variables
ν^i F       sized coinductive type
Size< i     type of ordinals below i
Bounded quantification ∀j<i. A = (j : Size< i) → A.
Well-founded recursion on ordinals, roughly:

 f : ∀ i. (∀ j<i. ν^j F) → ν^i F
 --------------------------------
 fix f : ∀ i. ν^i F
Sized coinductive type of languages
Lang i ≅ Bool × (∀j<i. A → Lang j)

 l : Lang i        l : Lang i    j < i    a : A
 -----------       -----------------------------
 ν l : Bool        δ l {j} a : Lang j

∅ : ∀i. Lang i by copatterns and induction on i:
ν (∅ {i})       = false  : Bool
δ (∅ {i}) {j} a = ∅ {j}  : Lang j
Note j < i.
On the right-hand side, ∅ : ∀j<i. Lang j (coinductive hypothesis).
Type-based guardedness checking
Union preserves size/guardedness:

 k : Lang i    l : Lang i
 ------------------------
 k ∪ l : Lang i

ν (k ∪ l)       = ν k ∨ ν l
δ (k ∪ l) {j} a = δ k {j} a ∪ δ l {j} a

Composition is accepted and also guardedness-preserving:

 k : Lang i    l : Lang i
 ------------------------
 k · l : Lang i

ν (k · l)       = ν k ∧ ν l
δ (k · l) {j} a = (δ k {j} a · l) ∪ δ l {j} a    if ν k
                = (δ k {j} a · l)                otherwise
Guardedness-preserving bisimilarity proofs
Sized bisimilarity ≅ is the greatest family of relations consistent with

  l ≅^i k                 l ≅^i k    j < i    a : A
 ----------- ≅ν         ---------------------------- ≅δ
 ν l ≡ ν k               δ l a ≅^j δ k a

Equivalence and congruence rules are guardedness preserving.

≅trans : (p : l ≅^i k) → (q : k ≅^i m) → l ≅^i m
≅ν (≅trans p q)     = ≡trans (≅ν p) (≅ν q)           : ν l ≡ ν m
≅δ (≅trans p q) j a = ≅trans (≅δ p j a) (≅δ q j a)   : δ l a ≅^j δ m a

Coinductive proof of dist accepted.
≅δ dist j a = ≅trans j (≅∪ (dist j) (≅refl j)) ...
Conclusions
Tracking guardedness in types allows
natural, modular corecursive definitions
natural bisimilarity proofs using equation chains
Implemented in Agda (ongoing)
Abel et al. (POPL 2013): Copatterns
Abel/Pientka (ICFP 2013): Well-founded recursion with copatterns
Related work
Hagino (1987): Coalgebraic types
Cockett et al.: Charity
Dmitriy Traytel (PhD TU Munich, 2015): Languages coinductively in Isabelle
Kozen, Silva (2016): Practical coinduction
Hughes, Pareto, Sabry (POPL 1996)
Papers on sized types (1998–2015): e.g. Sacchini (LICS 2013)