Notes on the Lambda-Calculus and Types

Notes on Foundations of Programming Languages
Sandra Alves
DCC-FCUP
(revised on September 25, 2016)
Chapter 1
Induction
Induction is a fundamental proof technique for the topics discussed in these notes. We will present several
forms of induction: induction on natural numbers, structural induction, and induction on proofs.
Although both structural induction and induction on proofs can be seen as special cases of induction on the
natural numbers (taking as measure the size of the term in the case of structural induction, and the
length of the proof for induction on proofs), we will present all of these forms of induction as instances of a
more general notion: well-founded induction.
1.1
Induction on Natural Numbers
We start by defining the principle of mathematical induction in its most common form.
Definition 1.1.1 (Induction on natural numbers) Given a property P(n), with n ∈ N. Suppose the following hold:
– P(0);
– for every k ∈ N, if P(k), then P(k + 1).
Then ∀n. P(n) is true.
We refer to the proof of P(0) as the base case, and to the proof that P(k) implies P(k + 1) as the induction
step. We assume P(k) to be true in order to prove P(k + 1), and refer to this assumption as the induction hypothesis.
It is clear that, having proved the base case and the induction step, from P(0) and the induction
step one can infer P(1). Using P(1) and the same argument, we obtain a proof of P(2), and so on. Therefore,
it is possible in this way to build a proof for any given n. We will now see an example of a simple proof by
induction.
Example 1.1.2 Consider the property "Every power of 3 is odd". We want to show
that for every n ∈ N, it holds that 3^n = 2l + 1, for some l ∈ N. We proceed by induction.
– Base case: for n = 0 we have 3^0 = 1 = 2·0 + 1, so the property holds.
– Induction step: suppose 3^k is odd (that is, 3^k = 2l + 1 for some l ∈ N); we want to prove that 3^(k+1) is also odd.
    3^(k+1) = 3 · 3^k
            = 3(2l + 1)        (by induction hypothesis)
            = 2(3l + 1) + 1
Therefore 3^(k+1) is odd, which concludes the proof.
Definition 1.1.3 (Strong induction on natural numbers) Given a property P(n), with n ∈ N. Suppose the following hold:
– P(0);
– for every k ∈ N, if P(0) ∧ P(1) ∧ · · · ∧ P(k), then P(k + 1).
Then ∀n. P(n) is true.
Example 1.1.4 Consider the property "Any positive integer greater than one can be written as a
product of prime numbers". We proceed by strong induction.
– Base case: for n = 2, since 2 is a prime number the property holds. (Note that the base case is
2 and not 0, since we want to prove the claim for all positive integers greater than or equal to 2.)
– Induction step: let us assume the claim for all positive integers between 2 and k. If k + 1 is
a prime number, then the property holds trivially. Otherwise, k + 1 has a positive divisor other
than 1 and itself. Therefore k + 1 = a · b, with 2 ≤ a, b ≤ k. By the induction hypothesis, both a
and b can be written as a product of primes, therefore so can their product, which concludes the
proof.
Note that, although we use the term "strong" to refer to this second form of induction, both
forms are in fact equivalent with respect to the properties one can prove. That is, one can express the first form
using the second and vice versa.
Let P(n) be a property on the natural numbers and assume the conditions of the induction principle:
1) P(0)
2) For every k ∈ N, if P(k), then P(k + 1).
Let us prove by strong induction that ∀n. P(n). From (1) we get P(0). Let us assume P(0) ∧ P(1) ∧ · · · ∧ P(k),
which implies P(k). Therefore, by (2), we get P(k + 1). Thus, by strong induction, ∀n. P(n).
Now, let us assume the conditions of the strong induction principle:
1) P(0)
2) For every k ∈ N, if P(0) ∧ P(1) ∧ · · · ∧ P(k), then P(k + 1).
Let us prove by weak induction that ∀n. P(n).
Let Q(m) be the property "P(n) holds for all n such that 0 ≤ n ≤ m", and apply mathematical induction to Q(m). Since
Q(0) is just P(0), we have the base case. Now suppose Q(m) holds and we wish to show Q(m + 1). Notice
that Q(m) is the same as P(0) ∧ P(1) ∧ · · · ∧ P(m). The hypothesis of strong induction tells us that this
implies P(m + 1). Adding P(m + 1) to Q(m), we get P(0) ∧ P(1) ∧ · · · ∧ P(m) ∧ P(m + 1), which is just
Q(m + 1). So, using mathematical induction, Q(n) holds for all natural numbers n. But Q(n)
implies P(n), so we have the conclusion of strong induction, namely that P(n) holds for all natural numbers
n.
We can also use induction on natural numbers to prove properties of other sets (for example, properties
of trees, or of terms of a given language), by defining a suitable measure function into the natural numbers.
Example 1.1.5 Consider the following data type dening binary trees:
data Arv a = Empty | Leaf a | No (Arv a) (Arv a)
Consider the following property of binary trees:
P(t) = "tree t has at most one more leaf than internal nodes".
One can define a function height : Trees → N, and rewrite P(t) as the following property Q(n) on natural
numbers:
Q(n) = "for all trees t, if height(t) = n then P(t)".
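For concreteness, the measure can be sketched in Haskell as follows (the names height, leaves and internalNodes are illustrative, not part of the notes); P(t) then reads as leaves t ≤ internalNodes t + 1, and Q(n) is proved by induction on the value of height.

data Arv a = Empty | Leaf a | No (Arv a) (Arv a)

-- the measure used to reduce induction on trees to induction on N
height :: Arv a -> Int
height Empty    = 0
height (Leaf _) = 0
height (No l r) = 1 + max (height l) (height r)

-- the quantities compared in P(t)
leaves, internalNodes :: Arv a -> Int
leaves Empty           = 0
leaves (Leaf _)        = 1
leaves (No l r)        = leaves l + leaves r
internalNodes Empty    = 0
internalNodes (Leaf _) = 0
internalNodes (No l r) = 1 + internalNodes l + internalNodes r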
1.2
Induction on Expressions and Proofs
We now look at two particular kinds of induction, used to prove properties of expressions generated by a grammar
or of proofs in a particular proof system.
Definition 1.2.1 (Structural induction) Let e be an expression generated by a particular grammar
and P(e) a property of e. If
1. the property holds for every atomic expression;
2. for every compound expression e′ with immediate sub-expressions e1, . . . , ek, if P(e1), . . . , P(ek) hold
then P(e′) holds;
then P(e) holds for every expression e.
A stronger version of the above principle requires, for any non-atomic expression e′, that P(e′) holds whenever it
holds for every sub-expression of e′ (immediate or not). These two principles are related, respectively, to the weak
and strong principles of induction on natural numbers (note that any proof by structural induction can be
written as a proof by induction on the natural numbers, over the size of the expression).
The other induction principle we will consider is induction on proofs in a Hilbert proof system.
Definition 1.2.2 A Hilbert proof system consists of a set of axioms and a set of inference rules:
– an axiom is a statement that is provable by definition;
– an inference rule determines that, if a list of statements (called premises) is provable, then so is
the conclusion of the rule. Inference rules are usually written as
    A1  · · ·  An
    ―――――――――――――
         B
meaning that from the premises A1, . . . , An one may conclude B.
Definition 1.2.3 Let π be a proof in some proof system, and P(π) a property of π. If
1. the property holds for every axiom;
2. whenever the property holds for (shorter) proofs π1, . . . , πk, it also holds for any proof π′ that ends
by extending one or more of the proofs π1, . . . , πk with an inference rule;
then P(π) holds for every proof π of the proof system.
1.3
Well-founded Induction
We can see all these different kinds of induction as instances of a general form of induction on what are
called "well-founded relations".
Definition 1.3.1 (Well-founded relation) A well-founded relation on a set A is a binary relation ≺
on A such that there is no infinite descending sequence a0 ≻ a1 ≻ a2 ≻ · · · .
Form of Induction                        Well-founded relation
Natural number induction (weak)          m ≺ n if m + 1 = n
Natural number induction (strong)        m ≺ n if m < n
Structural induction (weak)              e ≺ e′ if e is an immediate subexpression of e′
Structural induction (strong)            e ≺ e′ if e is a subexpression of e′
Induction on proofs                      π ≺ π′ if π is the subproof of some antecedent of the last inference rule in proof π′

Table 1.1: Well-founded relations for common forms of induction
Lemma 1.3.2 Let ≺ be a binary relation on A. Then ≺ is well-founded if and only if every nonempty
subset of A has a minimal element (an element a such that no b in the subset satisfies b ≺ a).
Proof: We prove the two directions separately.
(⇒) Suppose ≺ is a well-founded relation on A and let B ⊆ A be any nonempty subset. We show, by
contradiction, that B has a minimal element. If B does not have a minimal element, then for any a ∈ B
there exists a′ ∈ B such that a′ ≺ a. But then we can build an infinite sequence a0 ≻ a1 ≻ a2 ≻ · · · , starting with any a0 ∈ B and using the fact that no ai can be minimal, since B has no minimal element.
(⇐) Suppose that every nonempty subset has a minimal element. Then there can be no infinite decreasing sequence
a0 ≻ a1 ≻ a2 ≻ · · · , since such a sequence would give us a set {a0, a1, a2, . . .} without a minimal
element. This completes the proof.
Proposition 1.3.3 (Well-founded induction) Let ≺ be a well-founded binary relation on a set A and
let P be a property on A. If P(a) holds whenever P(b) holds for all b ≺ a, then P(a) is true for
all a ∈ A.
Proof: Suppose ∀a.(∀b.(b ≺ a ⇒ P(b)) ⇒ P(a)). We show, by contradiction, that P(a) holds for all
a ∈ A. Suppose there exists x ∈ A such that ¬P(x). Then the set B = {a ∈ A | ¬P(a)} is nonempty,
so B has a minimal element a ∈ B. Since a is minimal in B, P(b) holds for all b ≺ a; hence, by the
assumption ∀b.(b ≺ a ⇒ P(b)) ⇒ P(a), we get P(a), contradicting a ∈ B. This concludes the proof.
Exercises
1.1 Consider the following data type defining binary trees:
data Arv a = Empty | Leaf a | No (Arv a) (Arv a)
Using induction, prove that any tree of type Arv a has at most one more leaf than internal nodes.
1.2 Let V be an infinite set of variables. Consider the following grammar for expressions:
e ::= 0 | 1 | v | e + e | e ∗ e      (v ∈ V)
Prove that, for any list of variables v0, . . . , vn containing all the variables in a given expression e,
there exists a polynomial p = c · v0^k0 · v1^k1 · · · vn^kn such that, for all possible values of v0, . . . , vn > 0, the
value of e is less than p.
1.3 Consider the following relation ≺ on N²:
(n, m) ≺ (n′, m′) iff n < n′ or (n = n′ and m < m′)
Prove that ≺ is a well-founded relation.
1.4 Let L be the set of expressions containing only the symbols ( and ), defined in the following way:
– () is an expression;
– if α is an expression, then (α) is an expression;
– if α and β are expressions, then αβ is an expression;
where αβ denotes the concatenation of the two expressions. Show that for every expression in L:
(a) the number of ( is equal to the number of );
(b) in every prefix of an expression the number of ) does not exceed the number of (.
1.5 Consider the set of words containing only the letters M, I, U, defined in the following way:
– MI is a word;
– if xI is a word, then xIU is a word;
– if Mx is a word, then Mxx is a word;
– if xIIIy is a word, then xUy is a word;
– if xUUy is a word, then xy is a word;
where x, y are any sequences of M's, I's and U's.
(a) Is MU a word?
(b) Prove by induction that the number of occurrences of I in any word is never a multiple of 3.
What can you conclude?
1.6 Let L be the set of words defined in the following way:
– a and b are words;
– if β is a word, then aaβ and bbβ are also words.
(a) Prove that any word of L has either an even number of a's and an odd number of b's, or an even
number of b's and an odd number of a's.
(b) Show that L can be described as the set of words built of blocks of a's and/or b's, where the
number of letters in each block is even, except in the last block.
Chapter 2
The Lambda-Calculus Reduction System
2.1
Reduction Systems
In this section we present basic notions on reduction systems. For a more detailed study see [Klop, 1992,
Dershowitz and Jouannaud, 1990].
Definition 2.1.1 A reduction system is a pair ⟨A, →R⟩, where A is a set of terms and →R is a relation
on A. We call R a notion of reduction; we will sometimes refer to →R as just R.
A notion of reduction can be introduced as a set of contraction rules:
R : M → N if . . .
This corresponds to the following relation on A:
R = {(M, N) | . . .}
We will always present notions of reduction as contraction rules.
Definition 2.1.2 Let A be a set of terms. A context (denoted by C[ ]) is a 'term' containing one or
more occurrences of [ ], denoting holes, such that if M ∈ A, then replacing the holes in C[ ] by
M yields a term C[M] ∈ A.
A relation R is compatible if it can be lifted to contexts.
Definition 2.1.3
1. A binary relation R on a set of terms A is compatible if
(M, N) ∈ R ⇒ (C[M], C[N]) ∈ R
for all M, N ∈ A and all contexts C[ ] with one hole.
2. A compatible, reflexive and transitive relation on A is called a reduction relation on A.
Definition 2.1.4 Let R be a notion of reduction on A. Then R induces the following binary relations
on A:
– The one-step R-reduction, denoted by →R. The →R relation is the compatible closure of R, and
is inductively defined as follows:
(M, N) ∈ R ⇒ M →R N
M →R N ⇒ C[M] →R C[N]
– The R-reduction, denoted by −→R. The −→R relation is the reflexive, transitive closure of →R,
and is inductively defined as follows:
M →R N ⇒ M −→R N
M −→R M
M −→R N, N −→R P ⇒ M −→R P
The relation →R is, by denition, a compatible relation. The relation −→R is the reexive transitive closure
of →R and therefore a reduction relation.
We will sometimes omit R, when it is clear from the context which notion of reduction R represents.
Definition 2.1.5 Let ⟨A, →R⟩ be a reduction system.
– A term M in A is called an R-redex if (M, N) ∈ R for some N in A. The term N is called an
R-contractum of M.
– A term M is said to be in R-normal form (R-nf) if M does not contain (as a subterm) any
R-redex.
– A term M has an R-nf if M R-reduces to N, and N is an R-nf.
We write NFR to denote the set of terms in R-normal form.
We now discuss the property of confluence, which will be required of the reduction systems we consider.
Definition 2.1.6 Let R be a notion of reduction.
– R satisfies the diamond property (see Figure 2.1) if:
∀M, N1, N2. (M R N1 ∧ M R N2 ⇒ ∃N. (N1 R N ∧ N2 R N))
[Figure 2.1: Diamond property — if M reduces to N1 and to N2, then N1 and N2 both reduce to a common N.]
– R is said to be Church-Rosser (CR) if −→R satisfies the diamond property.
If a reduction is Church-Rosser then it is not possible to reduce a λ-term to two distinct normal forms.
Therefore, if a term has a normal form, the normal form is unique.
Definition 2.1.7
– Let ∆ be an R-redex with contractum ∆′. We write M −→R N by contracting ∆ if M ≡ C[∆] and N ≡ C[∆′].
– An R-reduction (path) is a sequence (possibly infinite)
M0 −→R M1 −→R M2 −→R · · ·
contracting redexes ∆0, ∆1, ∆2, . . . respectively. We will sometimes leave out the redexes ∆i when denoting a reduction sequence.
Definition 2.1.8 Let ⟨A, R⟩ be a reduction system. The R-reduction graph of a term M ∈ A (denoted
by GR(M)) is the set
{N ∈ A | M −→R N}
directed by →R. This defines a multigraph since, if several redexes give rise to M0 →R M1, then that
many directed arcs connect M0 to M1 in GR(M).
Definition 2.1.9 The reduction system ⟨A, R⟩ is strongly normalising (SN) if, for every M0 ∈ A, every
reduction sequence
M0 →R M1 →R · · ·
reaches a normal form.
In terms of the graph representation of reductions, if a reduction system is strongly normalising, then the
reduction graph of every term in the system is both acyclic and finite.
Reduction graphs represent all the possible ways to reduce a term. A reduction strategy defines a way
to travel through the reduction graphs, thus providing a choice of how to reduce a term. We will now define
this notion and discuss some of its properties.
Definition 2.1.10 Let ⟨A, R⟩ be a reduction system. A reduction strategy F is a map
F : A → A
such that, for all M ∈ A, M →R F(M) if M is not in normal form.
Definition 2.1.11 A strategy F is normalising if
M has a normal form ⇒ ∃n. F^n(M) is a normal form.
Definition 2.1.12 A strategy F is maximal if the minimum number of applications of F needed to reach
the normal form is equal to the length of the longest nite reduction path.
2.2
The λ-calculus
In this section we briefly present the type-free λ-calculus. For a more detailed reference, see [Barendregt, 1984].
2.2.1
Syntax
Definition 2.2.1 Let V be an infinite set of variables. The set of λ-terms, Λ, is inductively defined
from V in the following way:
x ∈ V ⇒ x ∈ Λ
M, N ∈ Λ ⇒ (MN) ∈ Λ      (Application)
M ∈ Λ, x ∈ V ⇒ (λx.M) ∈ Λ      (Abstraction)
We use the symbol ≡ to denote syntactic equality between terms.
We consider application to be left associative and abstraction to be right associative, and use the following
abbreviations to simplify notation:
Notation:
(M1 M2 . . . Mn) ≡ (. . . (M1 M2) . . . Mn)
(λx1 x2 . . . xn.M) ≡ (λx1(λx2(. . . (λxn.M) . . .)))
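As an illustration, the set Λ can be represented concretely as a Haskell data type; this is only a sketch and the constructor names are our own, not part of the notes.

data Term = Var String        -- x
          | App Term Term     -- (M N)
          | Lam String Term   -- (λx.M)
          deriving Show

-- the term λxy.x, that is, λx.(λy.x) with the abbreviations above
k :: Term
k = Lam "x" (Lam "y" (Var "x"))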
We now define formally the notion of context in the λ-calculus.
Definition 2.2.2 A context C[ ] in the λ-calculus is inductively defined in the following way:
– x is a context;
– [ ] is a context;
– if C1[ ] and C2[ ] are contexts, then C1[ ]C2[ ] and λx.C1[ ] are also contexts.
2.2.2
Variables and Substitutions
A variable x occurs free in a term M if x is not in the scope of an abstraction λx in M. Otherwise x occurs
bound in M.
Definition 2.2.3 Let M be in Λ. The set fv(M) of free variables of M is inductively defined as follows:
fv(x) = {x}
fv(MN) = fv(M) ∪ fv(N)
fv(λx.M) = fv(M) \ {x}
Definition 2.2.4 Let M be in Λ. The set bv(M) of bound variables of M is inductively defined as follows:
bv(x) = ∅
bv(MN) = bv(M) ∪ bv(N)
bv(λx.M) = bv(M) ∪ {x}
A λ-term M is closed if and only if fv(M) = ∅. The set of closed λ-terms is denoted by Λ0 ⊆ Λ.
Note that the sets of free and bound variables of a term are not necessarily disjoint: x occurs both free
and bound in x(λxy.x).
Definition 2.2.5 (Substitution) The result of substituting the free occurrences of x by L in M (denoted by M[L/x]) is defined as:
y[L/x] ≡ L if x ≡ y; y otherwise
(MN)[L/x] ≡ (M[L/x])(N[L/x])
(λy.M)[L/x] ≡ λy.M if x ≡ y; λy.(M[L/x]) otherwise
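Definitions 2.2.3 and 2.2.5 translate directly into Haskell; the sketch below (with illustrative names) implements the naive substitution exactly as stated, so it relies on the variable convention discussed in the next subsection rather than renaming bound variables.

import qualified Data.Set as Set

data Term = Var String | App Term Term | Lam String Term

fv :: Term -> Set.Set String
fv (Var x)   = Set.singleton x
fv (App m n) = fv m `Set.union` fv n
fv (Lam x m) = Set.delete x (fv m)

-- subst l x m computes M[L/x], without renaming bound variables
subst :: Term -> String -> Term -> Term
subst l x (Var y)   | x == y    = l
                    | otherwise = Var y
subst l x (App m n) = App (subst l x m) (subst l x n)
subst l x (Lam y m) | x == y    = Lam y m
                    | otherwise = Lam y (subst l x m)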
2.2.3
The ⟨Λ, β⟩ Reduction System
We will now present a reduction relation on Λ. See [Barendregt, 1984] for more details on this and other
notions of reduction in the λ-calculus. This notion, together with Λ, defines a reduction system for which
some properties will be discussed.
Definition 2.2.6 (β-reduction) The notion of β-reduction on Λ is defined by the following contraction rule:
β : (λx.M)N → M[N/x],   M, N ∈ Λ
A λ-term of the form (λx.M)N is called a β-redex and M[N/x] is its β-contractum.
Substitution and α-equivalence
Some care is needed with substitution, to avoid the problem of variable capture.
Consider the term first ≡ (λxy.x). For any given λ-terms M and N, we would expect
(λxy.x)MN −→ M.
But if we take M ≡ y then
(λxy.x)yN −→ N.
The problem arises when the free variable y in M enters the scope of the λy in (λxy.x). To avoid variable
capture, a substitution M[N/x] should only be allowed (in which case we say that x is substitutable by N in
M) if x does not occur free in any subterm of M of the form λy.P with y ∈ fv(N). This condition is ensured
if the set of bound variables of M is disjoint from the set of free variables of N:
bv(M) ∩ fv(N) = ∅.
The previous condition can always be ensured by renaming, when necessary, the bound variables in M.
This operation is called α-conversion.
Definition 2.2.7 A change of bound variable x in a term M is the substitution of a subterm of M of
the form λx.N by λy.(N[y/x]), where y does not occur in N.
A change of bound variables preserves the meaning of the term, in the sense that it represents the same
function. This gives rise to the notion of α-congruence:
Definition 2.2.8 (α-congruence) The terms M and N are α-congruent (notation M ≡α N) if N
can be obtained from M by a series of changes of bound variables, and vice versa.
In any given context, we will always assume the sets of free and bound variables of any term to be disjoint
(this is known as the Barendregt variable convention). Therefore any substitution M[N/x] is valid. Moreover,
we do not distinguish terms that are α-congruent (for instance λx.x ≡ λy.y).
Definition 2.2.9 Let β be the notion of reduction in Definition 2.2.6, and let →β and −→β be the binary
relations induced by β as described in Definition 2.1.4. We say that:
– M β-reduces to N in one step, and write M →β N;
– M β-reduces to N, and write M −→β N.
Based on the relations defined above, we can build reduction graphs for terms in Λ.
Remark 2.2.10 Note that the fact that a term M has a β-normal form neither implies nor is implied by
Gβ(M) being finite.
(λx.M)N = M[N/x]
M = M
M = N ⇒ N = M
M = N, N = L ⇒ M = L
M = N ⇒ ML = NL
M = N ⇒ LM = LN
M = N ⇒ λx.M = λx.N
Figure 2.2: The notion of equality of λ-terms
Equality of λ-terms
Based on the notion of β-reduction, we define equality between λ-terms. Informally, two λ-terms M and N are
equal if M can be transformed into N by a series of reductions and expansions (M expands to N if N −→ M).
Pictorially, M and N are equal when they are connected by a zig-zag of reductions
M −→ N1 ←− M1 −→ N2 ←− M2 · · · Mk−1 −→ Nk ←− N
Definition 2.2.11 Let M, N ∈ Λ. The notion of equality M = N is formalised by the rules in Figure 2.2.
2.2.4
Confluence
The β-reduction is Church-Rosser [Barendregt, 1984, Church and Rosser, 1936]. We will show a proof of
confluence for the λ-calculus with the β notion of reduction, due to W. Tait and P. Martin-Löf.
We first recall the following result from [Barendregt, 1984].
Lemma 2.2.12 If a binary relation satisfies the diamond property, then the transitive closure of that
relation also satisfies the diamond property.
Proof: As suggested by the usual tiling diagram: a peak of the transitive closure is completed square by
square, repeatedly using the diamond property of the original relation.
We will show that −→β satisfies the diamond property, by defining the parallel reduction relation →1 and
proving that →1 satisfies the diamond property and that −→β is the transitive closure of →1.
Definition 2.2.13 We define a binary relation →1 on the set of λ-terms inductively as indicated in
Figure 2.3.
Lemma 2.2.14 If M →1 M′ and N →1 N′, then M[N/x] →1 M′[N′/x].
M →1 M
M →1 M′ ⇒ λx.M →1 λx.M′
M →1 M′, N →1 N′ ⇒ MN →1 M′N′
M →1 M′, N →1 N′ ⇒ (λx.M)N →1 M′[N′/x]
Figure 2.3: The →1 reduction relation
Proof: By induction on the definition of →1.
– M →1 M′ is M →1 M. Then one has to show that M[N/x] →1 M[N′/x]. This follows easily by
induction on the structure of M.
– M →1 M′ is λy.P →1 λy.P′, and is a consequence of P →1 P′. By the induction hypothesis P[N/x] →1
P′[N′/x]. Then λy.(P[N/x]) →1 λy.(P′[N′/x]) by Definition 2.2.13, thus (λy.P)[N/x] →1
(λy.P′)[N′/x], by the definition of substitution.
– M →1 M′ is PQ →1 P′Q′, and is a consequence of P →1 P′ and Q →1 Q′. Then by the induction
hypothesis P[N/x] →1 P′[N′/x] and Q[N/x] →1 Q′[N′/x]. Therefore (PQ)[N/x] ≡ P[N/x]Q[N/x] →1
P′[N′/x]Q′[N′/x] ≡ (P′Q′)[N′/x].
– M →1 M′ is (λy.P)Q →1 P′[Q′/y], and is a consequence of P →1 P′ and Q →1 Q′. Then
M[N/x] ≡ (λy.P[N/x])Q[N/x] →1 P′[N′/x][Q′[N′/x]/y] ≡ (P′[Q′/y])[N′/x] ≡ M′[N′/x].
Lemma 2.2.15 →1 satisfies the diamond property.
Proof: By induction on the definition of →1, one can show that if M →1 M1 and M →1 M2, then there
is a term M3 such that M1 →1 M3 and M2 →1 M3.
– M →1 M1 is M →1 M. Then take M3 ≡ M2.
– M →1 M1 is λx.P →1 λx.P′ and is a consequence of P →1 P′. Then M2 ≡ λx.P″. By the induction
hypothesis there is a term P‴ with P′ →1 P‴ and P″ →1 P‴; thus take M3 ≡ λx.P‴.
– M →1 M1 is PQ →1 P′Q′ and is a consequence of P →1 P′ and Q →1 Q′. Then we have two subcases:
  – M2 ≡ P″Q″, with P →1 P″ and Q →1 Q″. By the induction hypothesis there is a term P‴ with
    P′ →1 P‴ and P″ →1 P‴, and similarly a term Q‴ for Q; thus take M3 ≡ P‴Q‴.
  – P ≡ λx.P1 and M2 ≡ P1″[Q″/x], with P1 →1 P1″ and Q →1 Q″. Then P′ ≡ λx.P1′ with
    P1 →1 P1′. By the induction hypothesis and Lemma 2.2.14 one can take M3 ≡ P1‴[Q‴/x].
– M →1 M1 is (λx.P)Q →1 P′[Q′/x] and is a consequence of P →1 P′ and Q →1 Q′. Then we have two
subcases:
  – M2 ≡ (λx.P″)Q″. By the induction hypothesis and Lemma 2.2.14 one can take M3 ≡ P‴[Q‴/x].
  – M2 ≡ P″[Q″/x], with P →1 P″ and Q →1 Q″. By the induction hypothesis and Lemma 2.2.14 one
    can take M3 ≡ P‴[Q‴/x].
Lemma 2.2.16 −→β is the transitive closure of →1.
Proof: Representing each reduction as a relation (thus as a set of pairs), we have
→β= ⊆ →1 ⊆ −→β
where →β= is the reflexive closure of →β. Since −→β is the transitive closure of →β=, it is also the transitive
closure of →1.
Theorem 2.2.17 (Church-Rosser) Let M be a λ-term. If M −→β N1 and M −→β N2, then there is a
λ-term N such that N1 −→β N and N2 −→β N. Thus −→β is Church-Rosser.
Proof: By Lemma 2.2.15, →1 satisfies the diamond property. By Lemma 2.2.16, −→β is the transitive
closure of →1. The result follows by Lemma 2.2.12.
The notion of η-reduction
We now consider another important notion of reduction, called η-reduction.
Definition 2.2.18
1. The notion of η-reduction is given by the contraction rule
η : λx.Mx −→ M,   with x ∉ fv(M).
2. βη = β ∪ η.
Theorem 2.2.19 βη is Church-Rosser.
2.2.5
Normalisation
The ⟨Λ, β⟩ reduction system is not strongly normalising. Consider Ω ≡ (λx.xx)(λx.xx) and M ≡ (λxy.y)Ω.
The reduction path
M ≡ (λxy.y)Ω −→β λy.y
ends with the normal form of the term, erasing Ω. However, if we instead keep contracting the redex Ω inside M,
we obtain the infinite reduction sequence
M ≡ (λxy.y)Ω −→β (λxy.y)Ω −→β (λxy.y)Ω −→β · · ·
A λ-term M that has a normal form but also admits an infinite reduction sequence reaches its normal form
only if all the subterms of M that do not have normal forms are erased. This is achieved by the following
strategy.
Definition 2.2.20 The strategy FL is defined as follows:
FL(M) = M    if M is in normal form,
FL(M) = M′   if M −→β M′ by contracting the leftmost redex ∆ in M.
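A minimal Haskell sketch of FL follows (repeating the Term and naive subst definitions from the sketch after Definition 2.2.5 so the fragment is self-contained); stepL contracts the leftmost redex, and iterating it gives normal-order reduction, which may of course diverge when no normal form exists.

data Term = Var String | App Term Term | Lam String Term deriving Show

subst :: Term -> String -> Term -> Term     -- naive M[L/x], as before
subst l x (Var y)   = if x == y then l else Var y
subst l x (App m n) = App (subst l x m) (subst l x n)
subst l x (Lam y m) = if x == y then Lam y m else Lam y (subst l x m)

-- one step of leftmost (normal-order) reduction; Nothing means normal form
stepL :: Term -> Maybe Term
stepL (App (Lam x p) q) = Just (subst q x p)
stepL (App p q)         = case stepL p of
                            Just p' -> Just (App p' q)
                            Nothing -> App p <$> stepL q
stepL (Lam x p)         = Lam x <$> stepL p
stepL (Var _)           = Nothing

-- F_L iterated until a normal form is reached
normalOrder :: Term -> Term
normalOrder m = maybe m normalOrder (stepL m)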
The following was proved in [Curry and Feys, 1958].
Theorem 2.2.21 The reduction strategy FL is normalising.
FL is called the "normal order" strategy, and it is related to the call-by-name strategy in programming
languages.
The following strategy is due to Barendregt et al. [Barendregt et al., 1976] and was proved to be maximal
in [van Raamsdonk et al., 1999].
Definition 2.2.22 F∞ is defined as follows:
F∞(x ~P Q ~R) = x ~P F∞(Q) ~R        if ~P ∈ NFβ and Q ∉ NFβ
F∞(λx.P) = λx.F∞(P)
F∞((λx.P)Q ~R) = P[Q/x] ~R           if x ∈ fv(P), or Q ∈ NFβ
F∞((λx.P)Q ~R) = (λx.P) F∞(Q) ~R     if x ∉ fv(P) and Q ∉ NFβ
Theorem 2.2.23 The reduction strategy F∞ is maximal.
2.2.6
Subsystems of the lambda calculus
Several subsystems of the λ-calculus can be obtained by restricting the set of terms. Here we present three of
those systems: the λI-calculus, the affine λ-calculus and the linear λ-calculus. These systems are obtained
by imposing restrictions on variable occurrences in terms.
In the λI-calculus, in every term of the form λx.M, x occurs free in M. Therefore β-reduction in λI never
erases terms.
Definition 2.2.24 Let V be an infinite set of variables. The set of λI-terms, ΛI, is inductively defined
from V in the following way:
x ∈ V ⇒ x ∈ ΛI
M, N ∈ ΛI ⇒ (MN) ∈ ΛI      (Application)
M ∈ ΛI, x ∈ fv(M) ⇒ (λx.M) ∈ ΛI      (Abstraction)
In the affine λ-calculus, β-reduction never duplicates terms: in every term M, every variable occurs
free at most once in any subterm of M.
Definition 2.2.25 Let V be an infinite set of variables. The set of affine λ-terms, ΛA, is inductively
defined from V in the following way:
x ∈ V ⇒ x ∈ ΛA
M, N ∈ ΛA, fv(M) ∩ fv(N) = ∅ ⇒ (MN) ∈ ΛA      (Application)
M ∈ ΛA, x ∈ V ⇒ (λx.M) ∈ ΛA      (Abstraction)
The linear λ-calculus is the intersection of the λI-calculus and the affine λ-calculus: in every
term M, every variable occurs free exactly once in any subterm of M.
Definition 2.2.26 Let V be an infinite set of variables. The set of linear λ-terms, ΛL, is inductively
defined from V in the following way:
x ∈ V ⇒ x ∈ ΛL
M, N ∈ ΛL, fv(M) ∩ fv(N) = ∅ ⇒ (MN) ∈ ΛL      (Application)
M ∈ ΛL, x ∈ fv(M) ⇒ (λx.M) ∈ ΛL      (Abstraction)
All the notions defined for Λ are defined in an analogous way for ΛI, ΛA, and ΛL. The sets of terms ΛI,
ΛA, and ΛL, with the β-reduction notion, respectively define the reduction systems ⟨ΛI, β⟩, ⟨ΛA, β⟩, and
⟨ΛL, β⟩.
2.2.7
The de Bruijn notation
We recall a notation for representing terms in the λ-calculus which eliminates the need for variable names.
This notation is due to Nicolaas Govert de Bruijn [Bruijn, 1972].
Definition 2.2.27
1. The set of nameless terms Λ* has the following alphabet:
λ, (, ), 1, 2, 3, . . .
2. Λ* is defined inductively in the following way:
n ∈ N \ {0} ⇒ n ∈ Λ*
A, B ∈ Λ* ⇒ (AB) ∈ Λ*
A ∈ Λ* ⇒ λA ∈ Λ*
Each De Bruijn index is a natural number that represents an occurrence of a variable in a λ-term, and
denotes the number of binders that are in scope between that occurrence and its corresponding binder.
Definition 2.2.28 The notion of β-reduction for nameless terms is defined by the following reduction
rule:
(β) : (λP)Q −→ P[Q/1]
Substitution has to be defined in an appropriate way. In the β-reduction of (λM)N, three aspects need to be
considered:
1. find the variables n1, n2, . . . , nk in M that are bound by the λ in λM;
2. decrease the free variables of M, to take into account the removal of the outer binder;
3. replace n1, n2, . . . , nk with N, suitably increasing the free variables occurring in N each time, to match
the number of λ-binders the corresponding variable occurs under when substituted.
Definition 2.2.29 Let M, N ∈ Λ* and n ∈ N \ {0}. Substitution M[N/n] is inductively defined as:
m[N/n] ≡ m                    if m < n
m[N/n] ≡ m − 1                if m > n
m[N/n] ≡ rename(n,1)(N)       if m = n
(M1M2)[N/n] ≡ (M1[N/n])(M2[N/n])
(λM)[N/n] ≡ λ(M[N/n + 1])
and rename(n,i)(M) is inductively defined as:
rename(n,i)(j) ≡ j                if j < i
rename(n,i)(j) ≡ j + n − 1        if j ≥ i
rename(n,i)(M1M2) ≡ (rename(n,i)(M1))(rename(n,i)(M2))
rename(n,i)(λM) ≡ λ(rename(n,i+1)(M))
We can define a translation DB from Λ to Λ*:
DB(x)(x1, . . . , xn) = i, where i is the least index such that x ≡ xi
DB(λx.M)(x1, . . . , xn) = λ(DB(M)(x, x1, . . . , xn))
DB(MN)(x1, . . . , xn) = (DB(M)(x1, . . . , xn))(DB(N)(x1, . . . , xn))
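A small Haskell sketch of the translation DB is given below (constructor and function names are illustrative); the list of variables plays the role of (x1, . . . , xn), innermost binder first, and indices start at 1 as in the notes.

import Data.List (elemIndex)

data Term   = Var String | App Term Term | Lam String Term
data DBTerm = DVar Int | DApp DBTerm DBTerm | DLam DBTerm deriving Show

db :: Term -> [String] -> DBTerm
db (Var x)   ctx = case elemIndex x ctx of
                     Just i  -> DVar (i + 1)      -- least i such that x ≡ x_i
                     Nothing -> error ("variable not in context: " ++ x)
db (Lam x m) ctx = DLam (db m (x : ctx))
db (App m n) ctx = DApp (db m ctx) (db n ctx)

-- example: λx.λy.xy is translated to λλ(2 1)
example :: DBTerm
example = db (Lam "x" (Lam "y" (App (Var "x") (Var "y")))) []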
Exercises
2.1 Reduce each of the following λ-terms to β-normal form (if possible):
(a) xy
(b) (λx.x)(yz)
(c) (λx.xy)(λz.z)
(d) (λxy.xy)
(e) (λx.xλx.xy)(yy)
(f) (λx.xx)(λx.xx)
(g) (λx.xxx)(λx.xxx)
(h) (λxy.y)((λx.xx)(λx.xx))b
(i) M = AAx where A = λaxz.z(aax)
2.2 What is wrong with the following reductions?
(a) (λxy.yx)y −→β λy.yy
(b) (λxx.xx)y −→β λx.yy
(c) (λx.xx)(λy.ay)k −→β (λx.xx)ak −→β akak
2.3 Let S = λxyz.xz(yz), K = λxy.x, B = λxyz.x(yz) and I = λx.x. Show that
(a) S(KI) −→βη I
(b) BI −→βη I
(c) SKK −→βη I
(d) S(KS)K −→βη B
2.4 Let T1 = (λx.xx)(λx.xx) and T2 = (λxy.yx)(λx.xx)(λx.x). Can T1 and T2 be reduced to the same
λ-term T?
2.5 If M and N are two λ-terms, then M ≡α N if they are the same λ-term or if one can be obtained from
the other by a change of bound variables. For example, (λx.x)z ≡α (λy.y)z and (λx.x)z ≢α (λx.y)z.
Prove, by structural induction on a λ-term M, that if x ≢ y and x ∉ FV(L), then:
M[N/x][L/y] ≡α M[L/y][N[L/y]/x]
2.6 Show that if x does not occur free in the λ-term M, then ∀N. M[N/x] ≡ M.
2.7 Consider the relation →1 on Λ, inductively defined in the following way:
M →1 M
M →1 M′ ⇒ λx.M →1 λx.M′
M →1 M′, N →1 N′ ⇒ MN →1 M′N′
M →1 M′, N →1 N′ ⇒ (λx.M)N →1 M′[N′/x]
Prove, by induction on M, that if N →1 N′, then M[N/x] →1 M[N′/x].
2.8 Prove, by induction on the definition of →1, the following properties:
(a) λx.M →1 N implies that N ≡ λx.M′ and M →1 M′;
(b) MN →1 L implies that
– L ≡ M′N′ with M →1 M′ and N →1 N′, or
– M ≡ λx.P, L ≡ P′[N′/x] with P →1 P′ and N →1 N′.
2.9 Write the following λ-terms using the de Bruijn notation:
(a) (λxy.y)((λx.xx)(λx.xx))(λy.y)
(b) (λfx.f(f(x)))(λzw.z(z(zw)))
(c) M = λx.AAx where A = λaxz.z(aax)
2.10 Reduce the following de Bruijn terms to normal form:
(a) (λ(λ2)1)(λ321)
(b) (λλ2(2(21)))(λλ2(21))
(c) (λλλ31(21))(λλ2)(λ1)
2.11 Consider the set Λ* of de Bruijn terms. Define a function BL : Λ* −→ Λ that converts terms in
de Bruijn notation to λ-terms with named variables.
Chapter 3
Computability in the λ-calculus
The λ-calculus is Turing complete, which means that every computable function can be represented by a term of
the λ-calculus. In this chapter we define encodings of some useful data structures (such as boolean values,
pairs, natural numbers, etc.), which are useful in relating the λ-calculus with other theories of recursive
functions.
3.1
Booleans and conditionals
The boolean values are represented as functions of two arguments that evaluate to one or the other of their
arguments. The values true and false are defined in the λ-calculus as:
true ≡ λxy.x
false ≡ λxy.y
Note that
true M N ≡ (λxy.x)MN −→ M
false M N ≡ (λxy.y)MN −→ N
Therefore, an appropriate encoding of if is such that if L M N −→ LMN. Thus if can be encoded as:
if ≡ λpxy.pxy
Having defined the truth values and a conditional, other operations on booleans can be easily encoded:
and ≡ λpq.if p q false
or ≡ λpq.if p true q
not ≡ λp.if p false true
There are other possible encodings. For example, and can be encoded in a more direct way as λmn.mnm.
Boolean values are useful to define other data structures, such as pairs. Pairs and the usual projections fst
and snd can be encoded as:
pair ≡ λxyz.zxy
fst ≡ λp.p true
snd ≡ λp.p false
Applying pair to M and N reduces to λf.fMN (this is called packing). When λf.fMN is applied to a
two-argument function λxy.L, it reduces to L[M/x][N/y] (this is called unpacking).
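The behaviour of these encodings can be checked directly by writing them as Haskell lambda terms; this is only a sketch, and the primed names are chosen to avoid clashing with Prelude functions.

true'  x _ = x                 -- λxy.x
false' _ y = y                 -- λxy.y
if' p x y  = p x y             -- if L M N −→ L M N

and' p q = if' p q false'
or'  p q = if' p true' q
not' p   = if' p false' true'

pair' x y z = z x y            -- λxyz.zxy
fst' p = p true'
snd' p = p false'

main :: IO ()
main = do
  print (if' (and' true' false') "yes" "no")    -- "no"
  print (fst' (pair' 1 2), snd' (pair' 1 2))    -- (1,2)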
3.2
Natural numbers and arithmetic
The first encoding of natural numbers in the λ-calculus was defined by Alonzo Church and is known as
the Church numerals. The Church numerals 0, 1, 2, . . . are defined as follows:
0 ≡ λfx.x
1 ≡ λfx.fx
2 ≡ λfx.f(fx)
. . .
n ≡ λfx.f^n x
. . .
That is, the natural number n is represented by the Church numeral n, which has the property that, for any
λ-terms F and X,
n F X =β F^n X
Arithmetic functions that work on Church numerals can also be encoded as λ-terms:
add ≡ λmnfx.mf(nfx)
mult ≡ λmnfx.m(nf)x
exp ≡ λmnfx.nmfx
It is easy to show that the functions defined above behave as expected. We show the case of add:
add m n −→ λfx.mf(nfx)
        −→ λfx.f^m(f^n x)
        ≡ λfx.f^(m+n) x
        ≡ m + n
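These encodings can likewise be tested as Haskell terms; the following sketch uses illustrative names (church, toInt) and checks add and mult by converting numerals back to machine integers.

zero f x  = x                  -- λfx.x
suc n f x = f (n f x)          -- λnfx.f(nfx)

church :: Int -> (a -> a) -> a -> a
church 0 = zero
church k = suc (church (k - 1))

add  m n f x = m f (n f x)     -- λmnfx.mf(nfx)
mult m n f   = m (n f)         -- λmnfx.m(nf)x

toInt n = n (+ 1) (0 :: Int)   -- apply a numeral to successor and 0

main :: IO ()
main = do
  print (toInt (add  (church 2) (church 3)))   -- 5
  print (toInt (mult (church 2) (church 3)))   -- 6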
Other basic operations on numerals:
succ ≡ λnfx.f(nfx)
iszero ≡ λn.n(λx.false)true
Encoding the predecessor function is not so straightforward. Given f, we consider a function g working
on pairs, such that g(x, y) = (f(x), x); thus
g^(n+1)(x, x) = (f^(n+1)(x), f^n(x))
We can encode the predecessor function as:
pred ≡ λnfx.snd (n (prefn f) (pair x x))
with
prefn ≡ λfp.pair (f(fst p)) (fst p)
Subtraction can then be defined as:
sub ≡ λmn.n pred m.
The encodings defined above are sometimes not very intuitive, or even efficient. This is somewhat related to
the fact that the encodings carry their own control structure.
3.2.1
Lists
Similarly to what happens with Church numerals, a list [x1, x2, . . . , xn] can be represented by the term
λfy.f x1 (f x2 (. . . (f xn y) . . .))
Alternatively, lists can be represented using pairs (as done in ML or Lisp). It is possible to represent a list
[x1, x2, . . . , xn] as (x1, (x2, (. . . (xn, nil) . . .))). Lists can be encoded as the following λ-terms:
nil ≡ pair true true
cons ≡ λxy.pair false (pair x y)
The encoding of a list contains a boolean flag, which indicates whether or not the list is empty. The usual
functions on lists are thus defined as:
null ≡ fst
hd ≡ λz.fst (snd z)
tl ≡ λz.snd (snd z)
A simpler encoding of the empty list is λz.z.
3.3
Recursion in the λ-calculus
We now look at a key aspect of computation: recursion. Church numerals allow us to encode
a very large class of functions on natural numbers (due to the fact that a Church numeral can be used
to define bounded iteration). The use of higher-order functions gives Church numerals the ability to encode even
functions outside the scope of primitive recursion (for example, Ackermann's function can be encoded as
λm.m(λfn.nf(f1))succ).
In recursion theory, general recursive functions are obtained through the use of minimisation. In the
λ-calculus we will define general recursion with the use of a fixed-point combinator, which is a term Y
such that YF = F(YF), for all terms F.
Definition 3.3.1 A fixed point of the function F is any X such that F(X) = X. In the case above,
X = YF.
To encode recursion through the use of a fixed-point combinator, we take F to be the body of the
recursive function and use the law YF = F(YF) to unfold F as many times as needed.
Example 3.3.2 Consider the following recursive definitions:
fact n = if (iszero n) 1 (mult n (fact (pre n)))
append x y = if (null x) y (cons (hd x) (append (tl x) y))
zeroes = cons 0 zeroes
Their definitions in the λ-calculus are:
fact ≡ Y(λfn.if (iszero n) 1 (mult n (f(pre n))))
append ≡ Y(λfxy.if (null x) y (cons (hd x) (f (tl x) y)))
zeroes ≡ Y(λf.cons 0 f)
It is easy to verify that
zeroes ≡ Y(λf.cons 0 f)
       = (λf.cons 0 f)(Y(λf.cons 0 f))
       = (λf.cons 0 f)zeroes
       −→ cons 0 zeroes
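The same idea can be seen in Haskell, where recursion in the host language lets us define a fixed-point operator directly; fix below plays the role of Y, and fact and zeroes are obtained as fixed points of non-recursive bodies (the names are illustrative; Data.Function already provides an equivalent fix).

fix :: (a -> a) -> a
fix f = f (fix f)              -- cf. YF = F(YF)

fact :: Integer -> Integer
fact = fix (\f n -> if n == 0 then 1 else n * f (n - 1))

zeroes :: [Integer]
zeroes = fix (0 :)             -- the infinite list cons 0 zeroes

main :: IO ()
main = print (fact 5, take 3 zeroes)   -- (120,[0,0,0])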
We now formalise the general usage of Y in the definition of recursive functions. Any recursive equation
Mx1 . . . xn = PM, representing an n-argument function, where P is any λ-term, is solved by the λ-term
M ≡ Y(λgx1 . . . xn.Pg).
It is easy to see that
Mx1 . . . xn ≡ Y(λgx1 . . . xn.Pg)x1 . . . xn
            = (λgx1 . . . xn.Pg)Mx1 . . . xn
            −→ PM
For mutually recursive definitions M and N such that
M = PMN
N = QMN
we consider the fixed point of a function F on pairs such that F(X, Y) = (P X Y, Q X Y). Using the
encoding of pairs, we define
L ≡ Y(λz.pair (P (fst z) (snd z)) (Q (fst z) (snd z)))
M ≡ fst L
N ≡ snd L
3.3.1
Some well-known fixed-point combinators
We present some fixed-point combinators which are well known in the literature. The first one is Haskell
Curry's Y combinator, which is defined by:
Y = λf.(λx.f(xx))(λx.f(xx))
We can easily verify the fixed-point property:
YF −→ (λx.F(xx))(λx.F(xx))
   −→ F((λx.F(xx))(λx.F(xx)))
   = F(YF)
This combinator is also known as Curry's paradoxical combinator. Consider Russell's paradox, where
one defines the set R ≡ {x | x ∉ x}; then R ∈ R if and only if R ∉ R. Representing sets in the λ-calculus
as predicates, M ∈ N is encoded as N(M), and {x | P} is encoded as λx.P. One can then derive Russell's
paradox with R ≡ λx.not(xx), which implies RR = not(RR), which is logically a contradiction. Curry's
fixed-point combinator is obtained by replacing not in RR = not(RR) by an arbitrary term F.
Note that no reduction YF −→ F(YF) is possible: the property is verified by two β-reductions followed
by a β-expansion. A stronger requirement is to have a combinator M such that ∀F. MF −→ F(MF). A
combinator that satisfies this property is Alan Turing's Θ, defined as
Θ ≡ AA where A ≡ λxy.y(xxy)
It is easy to verify that:
ΘF ≡ (λxy.y(xxy))AF
   −→ (λy.y(AAy))F
   −→ F(AAF)
   ≡ F(ΘF)
Another fixed-point combinator that satisfies this stronger requirement is Klop's $ combinator:
$ = ££££££££££££££££££££££££££     (26 occurrences of £)
£ = λabcdefghijklmnopqstuvwxyzr.r(thisisafixedpointcombinator)
3.3.2
Head normal form
We now introduce a new notion of normal form, which is related to the notion of result in functional
programming. Note that, if we consider a recursive definition M = PM then, unless P is a constant function
or the identity, M does not have a normal form. In the same way, anything defined through the use of a
fixed-point combinator is not likely to have a normal form, although it can still be used to compute with.
We formalise this with the definition of head normal form.
Definition 3.3.3 A λ-term M is a head normal form (hnf) if and only if it is of the form
λx1 . . . xn.yM1 . . . Mm      (m, n ≥ 0)
The variable y is called the head variable.
Example 3.3.4 Some examples of hnfs are λx.yΩ, xM and λz.z(λx.x), but not λy.(λx.a)y. In fact the
redex (λx.a)y in the last term is called a head redex.
It is obvious that a normal form is also a head normal form. Also, if
λx1 . . . xn.yM1 . . . Mm −→ N
then the term N must be of the form
λx1 . . . xn.yN1 . . . Nm
with Mi −→ Ni. This, in a sense, means that head normal forms fix the outer structure of the "result".
Also, the reductions Mi −→ Ni do not interfere with each other, therefore they can be done in parallel.
The notion of head normal form is related to the notion of definability.
Definition 3.3.5 A term is defined if and only if it can be reduced to a head normal form; otherwise it
is undefined.
Example 3.3.6 The term Ω is an example of an undefined term, whereas xΩ is defined.
This notion is related to solvability in the λ-calculus:
Definition 3.3.7 A term M is solvable if and only if there exist x1, . . . , xm, N1, . . . , Nn, such that
(λx1 . . . xm.M)N1 . . . Nn = I
Example 3.3.8 Take the defined term xΩ. Then (λx.xΩ)(λxy.y)I −→ I.
Exercises
3.1 Write other (more direct) encodings of or, not and xor.
3.2 Write other encodings of add, mult and exp.
3.3 Verify the correctness of succ and iszero:
succ n −→ n + 1
iszero 0 −→ true
iszero (n + 1) −→ false
3.4 Recall the encoding of pre. Show that:
pre 0 −→ 0
pre (n + 1) −→ n
3.5 Recall the following encoding of the Ackermann function:
ack = λm.m(λfn.nf(f1)) succ
Show that it is possible to derive the following:
ack 0 n = n + 1
ack (m + 1) 0 = ack m 1
ack (m + 1) (n + 1) = ack m (ack (m + 1) n)
3.6 Recall Klop's fixed-point combinator $:
$ = ££££££££££££££££££££££££££     (26 occurrences of £)
£ = λabcdefghijklmnopqstuvwxyzr.r(thisisafixedpointcombinator)
Show that $F −→ F($F).
3.7 A term is called defined if and only if it can be reduced to head normal form. Which of the following
terms are defined? (Consider K = λxy.x.)
Y    YI    YK    Y not K    xΩ    Y(Kx)    n
3.8 A term is said to be solvable if and only if there exists variables x1 , . . . , xm and terms N1 , . . . , Nn
such that
(λx1 . . . xm .M)N1 . . . Nn = I
Which of the terms from the previous exercise are solvable?
Chapter 4
The typed λ-calculus
So far we have discussed the type-free λ-calculus. In this chapter we discuss type systems for the λ-calculus. The initial motivation for defining typed versions of the λ-calculus was to avoid paradoxical uses of
the untyped calculus [Church, 1940].
4.1
Simple Types
The Curry type system was first studied in [Curry, 1934] for the theory of combinators. In [Curry and Feys, 1958]
this system was modified for the λ-calculus. The definitions and proofs of the results in this section can be found
in [Barendregt, 1992].
We start by defining the set of types for this system.
Definition 4.1.1 Let V be an infinite set of type variables. The set of simple types, TC, is inductively
defined from V in the following way:
α ∈ V ⇒ α ∈ TC
τ, τ′ ∈ TC ⇒ (τ → τ′) ∈ TC
Notation: if τ1, . . . , τn ∈ TC, then τ1 → τ2 → · · · → τn represents (τ1 → (τ2 → · · · → (τn−1 → τn) . . .));
that is, the type constructor → is right associative.
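Concretely, the set TC can be represented by a Haskell data type such as the following sketch (constructor names are illustrative):

data SimpleType = TVar String                   -- type variables α, β, ...
                | Arrow SimpleType SimpleType   -- τ → τ'
                deriving (Show, Eq)

-- α → β → α, with → associating to the right
example :: SimpleType
example = Arrow (TVar "a") (Arrow (TVar "b") (TVar "a"))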
Definition 4.1.2 If x is a term variable in V and τ is a type in TC, then:
– a statement is of the form M : τ, where the type τ is called the predicate and the term M is called
the subject of the statement;
– a declaration is a statement whose subject is a term variable;
– a basis Γ is a set of declarations where all the subjects are distinct; a basis where the subjects are
pairwise distinct is also called monovalent (or consistent).
Definition 4.1.3 If Γ = {x1 : τ1, . . . , xn : τn} is a basis, then:
– Γ is a partial function, with domain dom(Γ) = {x1, . . . , xn} and Γ(xi) = τi;
– we define Γx as Γ \ {x : τ};
– let V0 be a set of variables; then Γ↾V0 = {x : Γ(x) | x ∈ V0}.
Definition 4.1.4 In the Curry type system, we say that M has type τ given the basis Γ, and write
Γ ⊢C M : τ,
if Γ ⊢C M : τ can be obtained from the following derivation rules:
Γ ∪ {x : τ} ⊢C x : τ      (Axiom)
Γx ∪ {x : τ1} ⊢C M : τ2  ⇒  Γx ⊢C λx.M : τ1 → τ2      (→ Intro)
Γ ⊢C M : τ1 → τ2,  Γ ⊢C N : τ1  ⇒  Γ ⊢C MN : τ2      (→ Elim)
Example 4.1.5 For the λ-term (λxy.x)(λx.x), the following derivation is obtained in the Curry simple
type system:
{x : α → α, y : β} ⊢C x : α → α
{x : α → α} ⊢C λy.x : β → α → α      (→ Intro)
⊢C λxy.x : (α → α) → β → α → α      (→ Intro)
{x : α} ⊢C x : α
⊢C λx.x : α → α      (→ Intro)
⊢C (λxy.x)(λx.x) : β → α → α      (→ Elim)
Proposition 4.1.6 (Basic lemmas) Let Γ be a basis.
– Let Γ′ be a basis such that Γ ⊆ Γ′; then Γ ⊢C M : τ ⇒ Γ′ ⊢C M : τ.
– If Γ ⊢C M : τ, then fv(M) ⊆ dom(Γ).
– If Γ ⊢C M : τ, then Γ↾fv(M) ⊢C M : τ.
Definition 4.1.7 (Substitution) A type-substitution is a map
S = [τ1/α1, . . . , τn/αn]
where α1, . . . , αn are distinct type variables and τ1, . . . , τn are types in TC. If τ is a type in TC, then
S(τ) is the type obtained by simultaneously substituting each αi by τi, with 1 ≤ i ≤ n, in τ.
The type S(τ) is called an instance of the type τ. The notion of substitution can be extended to
bases in the following way:
S(Γ) = {x1 : S(τ1), . . . , xn : S(τn)}    if Γ = {x1 : τ1, . . . , xn : τn}
The basis S(Γ) is called an instance of the basis Γ.
Next we will present some standard properties of the Simple Type System. Details and proofs can be found
in [Hindley, 1997].
Lemma 4.1.8 (Substitution lemmas)
1. If Γ ⊢C M : τ, then S(Γ) ⊢C M : S(τ).
2. If Γ ∪ {x : τ1} ⊢C M : τ and Γ ⊢C N : τ1, then Γ ⊢C M[N/x] : τ.
Theorem 4.1.9 (Subject reduction) Let M be a λ-term with M −→β M′. Then
Γ ⊢C M : τ ⇒ Γ ⊢C M′ : τ.
The implication in the other direction is called subject expansion, and does not hold for this system. For
example, (λxy.y)(λz.zz) −→β λy.y, where λy.y is typable in this system and (λxy.y)(λz.zz) is not.
Subject reduction also holds for the three sub-calculi defined before. The property of subject expansion
holds in the λI-calculus and in the linear and affine λ-calculi.
Theorem 4.1.10 (Strong normalisation) Let M be a λ-term. Then
Γ ⊢C M : τ ⇒ M is strongly normalisable.
Notice that the implication in the other direction does not hold: there are many strongly normalisable
λ-terms that are not typable in this system. For example, the term λx.xx is in normal form, therefore
strongly normalisable, but it is not typable in the Curry simple type system (notice that, to type the
subterm xx, the variable x would have to have both type α and type α → β).
4.1.1
Type-checking and Typability
In the Curry type system, as well as in other type systems, the following questions arise:
1. Given a term M in Λ, a type τ and a basis Γ, do we have Γ ⊢C M : τ?
2. Given a term M in Λ, are there a type τ and a basis Γ such that Γ ⊢C M : τ?
The first question concerns type checking and the second typability; both will be discussed in this section.
Type checking and typability in the Curry type system are both decidable problems. Moreover, for the
typability problem there exists a function that, for any typable term M, returns the most general type for
M in this system. Such a type is called the principal type of the term.
We first introduce the notions of principal pair and principal type.
Definition 4.1.11 (Principal pair) Let M be a term in Λ. Then (Γ, τ) is called a principal pair for M
if:
1. Γ ⊢C M : τ;
2. if Γ′ ⊢C M : τ′, then ∃S. (S(Γ) ⊆ Γ′ and S(τ) ≡ τ′).
Note that, if (Γ, τ) is a principal pair for a term M, then fv(M) = dom(Γ).
Definition 4.1.12 (Principal type) Let M be a closed term in Λ. Then τ is called a principal type
for M if:
1. ⊢C M : τ;
2. if ⊢C M : τ′, then ∃S. (S(τ) ≡ τ′).
The principal type of a term M is a characterisation of the set of types that can be assigned to M:
every type that can be assigned to M can be obtained from the principal type of M by applying
a type-substitution.
The following result is independently due to Curry [Curry, 1969], Hindley [Hindley, 1969], and Milner
[Milner, 1978] (see [Barendregt, 1992] for more details).
Theorem 4.1.13 (Principal type theorem)
1. Let M be a term in Λ. There exists a total function pp such that:
M has a type ⇒ pp(M) = (Γ, τ), where (Γ, τ) is a principal pair for M;
M has no type ⇒ pp(M) = fail.
2. Let M be a closed term in Λ. There exists a total function pt such that:
M has a type ⇒ pt(M) = τ, where τ is a principal type of M;
M has no type ⇒ pt(M) = fail.
Based on the functions defined above, decidability of type-checking and typability in the Curry type system
can be stated. Notice that for Γ = {x1 : τ1, . . . , xn : τn} we have
Γ ⊢C M : τ  if and only if  ⊢C λx1 . . . xn.M : τ1 → · · · → τn → τ.
Therefore one can state the questions of type-checking and type-assignment taking Γ to be the empty set.
Corollary 4.1.14 Type checking and typability are decidable problems in the Curry type system.
Decidability of type checking is proved by noticing that, given M and τ:
⊢C M : τ ⇐⇒ ∃S. (S(pt(M)) = τ).
The type-substitution S is found using Robinson's first-order unification [Robinson, 1965], which is decidable.
As for typability, notice that
M is typable ⇐⇒ pt(M) ≠ fail.
4.2
Type-inference
We now present two principal type algorithms for the Curry type system [Wand, 1987, Hindley, 1997], based
on Robinson's unification algorithm [Robinson, 1965], that, given a term M, return its principal typing.
To type-check a given type for a given term, one can use the type inference algorithms to compute the
principal type of the term and then verify whether the given type is an instance of the principal type.
4.2.1
Unification
Robinson's unification algorithm [Robinson, 1965] plays a key role in type inference. We present a
definition of the unification algorithm for the particular case where the terms we want to unify are simple
types.
Definition 4.2.1 (Unifier) A unifier of two types τ1 and τ2 is a type-substitution S such that
S τ1 = S τ2. If two types have a unifier, we say that they are unifiable. We call S τ1 (or S τ2) a
common instance.
Example 4.2.2 The types (α → β → α) and ((γ → γ) → δ) are unifiable. For the substitution
S = [(γ → γ)/α, (β → (γ → γ))/δ], the common instance is ((γ → γ) → β → (γ → γ)).
Definition 4.2.3 (Most general unifier) S is a most general unifier (mgu) of τ1 and τ2 if, for any
other unifier S1 of τ1 and τ2, there is a substitution S2 such that S1 = S2 ∘ S.
Example 4.2.4 Consider the types τ1 = (α → α) and τ2 = (β → γ). The substitution S′ = [(α1 →
α2)/α, (α1 → α2)/β, (α1 → α2)/γ] is a unifier of τ1 and τ2, but it is not an mgu.
The mgu of τ1 and τ2 is S = [α/β, α/γ]. The common instance of τ1 and τ2 by S′, (α1 → α2) →
(α1 → α2), is an instance of (α → α).
We now present the unification algorithm that we will use to define type inference. The function UNIFY, given
two types τ1 and τ2, returns the mgu of τ1 and τ2 if it exists, and fails otherwise.
Definition 4.2.5 (Unification Algorithm) Let τ1 and τ2 be two types. The unification function
UNIFY(τ1, τ2) is inductively defined as:
UNIFY(α, τ) = [τ/α]   if α ∉ FV(τ)
            = Id      (the identity substitution) if τ = α
            = fail    otherwise
UNIFY(τ1 → τ2, α) = UNIFY(α, τ1 → τ2)
UNIFY(σ1 → σ2, τ1 → τ2) = let S = UNIFY(σ2, τ2) in UNIFY(S σ1, S τ1) ∘ S
Note that let S = UNIFY(σ2, τ2) in UNIFY(S σ1, S τ1) ∘ S fails if one of the calls of UNIFY fails.
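The UNIFY function above can be sketched in Haskell as follows; this is only an illustration (Type, Subst and the helper names are our own), with Nothing playing the role of fail and the empty substitution playing the role of Id.

data Type = TVar String | Arrow Type Type deriving (Show, Eq)

type Subst = [(String, Type)]        -- substitution as an association list

applySubst :: Subst -> Type -> Type
applySubst s (TVar a)    = maybe (TVar a) id (lookup a s)
applySubst s (Arrow t u) = Arrow (applySubst s t) (applySubst s u)

occurs :: String -> Type -> Bool
occurs a (TVar b)    = a == b
occurs a (Arrow t u) = occurs a t || occurs a u

-- compose s2 s1 behaves like applying s1 first and then s2
compose :: Subst -> Subst -> Subst
compose s2 s1 = [ (a, applySubst s2 t) | (a, t) <- s1 ] ++ s2

unify :: Type -> Type -> Maybe Subst
unify (TVar a) t
  | t == TVar a = Just []                -- Id
  | occurs a t  = Nothing                -- fail (occurs check)
  | otherwise   = Just [(a, t)]          -- [τ/α]
unify t (TVar a) = unify (TVar a) t
unify (Arrow s1 s2) (Arrow t1 t2) = do
  s  <- unify s2 t2
  s' <- unify (applySubst s s1) (applySubst s t1)
  return (compose s' s)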
4.2.2
Milner’s Type-Inference Algorithm
The following algorithm, presented in [Milner, 1978], defines a function that, given a term M, returns a basis
Γ and a type τ such that Γ ⊢ M : τ is the principal typing of M.
Definition 4.2.6 Let Γ be a basis, M a term and τ a type. Let UNIFY be the unification function
defined above. The function T(M) = (Γ, τ) defines a type inference algorithm for the simply typed
λ-calculus in the following way:
1. If M is a variable x, then Γ = {x : α} and τ = α, where α is a new variable.
2. If M ≡ M1M2, T(M1) = (Γ1, τ1) and T(M2) = (Γ2, τ2), then let {v1, . . . , vn} = FV(M1) ∩ FV(M2), with
v1 : δ1, . . . , vn : δn ∈ Γ1 and v1 : δ1′, . . . , vn : δn′ ∈ Γ2. Then
T(M) = (S ∘ Sn ∘ · · · ∘ S1(Γ1 ∪ Γ2), Sα), where
S1 = UNIFY(δ1, δ1′);
S2 = UNIFY(S1(δ2), S1(δ2′));
. . .
Sn = UNIFY(Sn−1 ∘ · · · ∘ S1(δn), Sn−1 ∘ · · · ∘ S1(δn′));
S = UNIFY(Sn ∘ · · · ∘ S1(τ1), Sn ∘ · · · ∘ S1(τ2) → α), where α is a new variable.
3. If M ≡ λx.N and T(N) = (ΓN, σ), then:
a) if x ∉ dom(ΓN), then T(M) = (ΓN, α → σ), where α is a new variable;
b) if {x : τ} ⊆ ΓN, then T(M) = (ΓN \ {x : τ}, τ → σ).
4.2.3
Wand’s Algorithm
The following algorithm [Wand, 1987], given a basis, a term, and a type, returns a set of equations that,
after applying unification, gives the principal type of the term.
Definition 4.2.7 Let Γ be a basis, M ∈ Λ and σ ∈ TC. The set of equations E(Γ, M, σ) is defined
by:
E(Γ, x, σ) = {σ = Γ(x)}
E(Γ, MN, σ) = E(Γ, M, α → σ) ∪ E(Γ, N, α), where α is a new variable;
E(Γ, λx.M, σ) = E(Γ ∪ {x : α}, M, β) ∪ {α → β = σ}, where α, β are new variables.
Applying unification to E gives a substitution S such that S(Γ) ⊢ M : S(σ).
4.3
The Damas-Milner Polymorphic Type System
We now present the Damas-Milner [Damas and Milner, 1982] type system for the λ-calculus with parametric
polymorphism, which forms the basis of the type inference algorithms of functional languages such as ML and
Haskell.
4.3.1
The term language
The set of terms is an extension of the λ-calculus with a let constructor. Given an infinite set of term
variables V, the term language is given by the following grammar:
M ::= x | MM′ | λx.M | let x = M in M′
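As a small illustration, the term language can be represented in Haskell as follows (a sketch with our own constructor names):

data MLTerm = Var String                -- x
            | App MLTerm MLTerm         -- M M'
            | Lam String MLTerm         -- λx.M
            | Let String MLTerm MLTerm  -- let x = M in M'
            deriving Show

-- let i = λx.x in i i
example :: MLTerm
example = Let "i" (Lam "x" (Var "x")) (App (Var "i") (Var "i"))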
4.3.2
Type Schemes
The definition of type schemes allows us to represent the set of types one can infer for a term M, given a
basis Γ.
Definition 4.3.1 We say that σ is a type scheme if σ is a simple type τ or a term of the form
∀α1, . . . , αn.τ, where α1, . . . , αn are called generic type variables.
Definition 4.3.2 If τ is a type and σ a type scheme, we say that τ is a generic instance of σ if and
only if σ = τ, or σ = ∀α1, . . . , αn.η and there exist τ1, . . . , τn such that τ = [τi/αi]η.
4.3.3
Types
The set of types of this system is given by the following grammar:
τ ::= α | τ′ → τ″
σ ::= τ | ∀α.σ′
where α belongs to an infinite set of type variables, τ, τ′ and τ″ are simple types, and σ and σ′ are type
schemes.
4.3.4
The type system
Let x be a term variable, α a type variable, M and M′ terms, τ and τ′ simple types, and σ and σ′ type
schemes. The Damas-Milner type system is defined by the following inference rules:
(Ax)    Γ ∪ {x : σ} ⊢ML x : σ
(Gen)   Γ ⊢ML M : σ and α ∉ fv(Γ)  ⇒  Γ ⊢ML M : ∀α.σ
(Inst)  Γ ⊢ML M : ∀α.σ  ⇒  Γ ⊢ML M : σ[τ/α]
(App)   Γ ⊢ML M : τ′ → τ and Γ ⊢ML M′ : τ′  ⇒  Γ ⊢ML (MM′) : τ
(Abs)   Γx ∪ {x : τ′} ⊢ML M : τ  ⇒  Γx ⊢ML λx.M : (τ′ → τ)
(Let)   Γ ⊢ML M : σ and Γ ∪ {x : σ} ⊢ML M′ : σ′  ⇒  Γ ⊢ML let x = M in M′ : σ′
Example 4.3.3 Consider the derivation for λx.x in this system:
{x : α} ⊢ML x : α
⊢ML λx.x : α → α      (Abs)
⊢ML λx.x : ∀α.α → α      (Gen)
4.3.5
Type Inference
We will define the notion of closure of a type, which is used in the definition of the algorithm.
Definition 4.3.4 Let V be a set of type variables, Γ a basis and τ a type. The closure of τ with respect
to V, written V(τ), is the type scheme ∀α1, . . . , αn.τ, where α1, . . . , αn are all the type variables that occur
in τ and do not belong to V. We write Γ(τ) for the closure of τ with respect to the set of type variables
that occur in Γ.
We can now define the type-inference algorithm for this system.
Definition 4.3.5 Let Γ be a basis, M a term, S a substitution and τ a type. Let UNIFY be the unification
function defined above. The function W(Γ, M) = (S, τ) is defined in the following way:
1. If M is a variable x and x : ∀α1, . . . , αn.τ′ ∈ Γ, then S is the identity substitution and τ = [βi/αi]τ′,
where each βi is a new variable (1 ≤ i ≤ n).
2. If M ≡ M1M2, let:
W(Γ, M1) = (S1, τ1);
W(S1Γ, M2) = (S2, τ2);
S3 = UNIFY(S2 τ1, τ2 → β), where β is a new variable;
then S = S3 ∘ S2 ∘ S1 and τ = S3 β.
3. If M ≡ λx.N, let β be a new variable and W((Γ \ {x : α}) ∪ {x : β}, N) = (S1, τ1); then S = S1 and
τ = S1(β → τ1).
4. If M ≡ let x = M1 in M2, let:
W(Γ, M1) = (S1, τ1) and
W(S1(Γ \ {x : α}) ∪ {x : S1Γ(τ1)}, M2) = (S2, τ2), where S1Γ(τ1) is the closure of τ1 with respect to S1Γ;
then S = S2 ∘ S1 and τ = τ2.
4.3.6
Correctness and Completeness
Proposition 4.3.6 If W(Γ, M) = (S, τ), then S(Γ) ⊢ML M : S(τ).
Definition 4.3.7 Let Γ be a basis and M a term. We say that σP is the principal type-scheme of M
under Γ if and only if:
– Γ ⊢ML M : σP;
– any other σ such that Γ ⊢ML M : σ is a generic instance of σP.
Theorem 4.3.8 Given Γ and M, let Γ′ be an instance of Γ and σ a type scheme such that Γ′ ⊢ML M : σ.
Then:
– W(Γ, M) succeeds;
– if W(Γ, M) = (S, τ), then there exists S′ such that Γ′ = S′SΓ and σ is a generic instance of S′SΓ(τ).
Exercises
4.1 Use the simple type system to infer a type for:
(a) λxy.x
(b) λx.y, assuming y : τ
(c) λxy.xyy
4.2 Use the simple type system to verify the following typings:
(a) ⊢C λxyz.x(yz) : (α −→ β) −→ (γ −→ α) −→ (γ −→ β)
(b) ⊢C λxyz.xzy : (α −→ β −→ γ) −→ β −→ α −→ γ
(c) ⊢C λxy.xyy : (α −→ α −→ β) −→ α −→ β
(d) ⊢C λxyz.xz(yz) : (α −→ (β −→ γ)) −→ (α −→ β) −→ α −→ γ
4.3 Using the type inference algorithm T for the simple type system, infer a principal type for:
(a) λxyz.xzy
(b) λxy.(λz.x)(yx)
(c) λxy.y(λz.zxx)x
(d) λxy.xy(λz.yz)
4.4 Implement the principal type inference algorithm T.
4.5 Prove the following basic lemmas:
(a) Let Γ be a basis. If Γ′ is a basis such that Γ ⊆ Γ′, then Γ ⊢C M : τ implies Γ′ ⊢C M : τ.
(b) Let Γ be a basis. If Γ ⊢C M : τ, then fv(M) ⊆ dom(Γ).
(c) Let Γ be a basis. If Γ ⊢C M : τ, then Γ↾fv(M) ⊢C M : τ.
4.6 Show that:
(a) If Γ ⊢C M : τ, then S(Γ) ⊢C M : S(τ).
(b) If Γ ∪ {x : σ} ⊢C M : τ and Γ ⊢C N : σ, then Γ ⊢C M[N/x] : τ.
4.7 Using Wand's type inference algorithm for the simple type system, determine the principal type for:
(a) λxyz.xzy
(b) λxy.(λz.x)(yx)
(c) λxy.y(λz.zxx)x
(d) λxy.xy(λz.yz)
4.8 Using the type inference algorithm W for the Damas-Milner type system, infer a type for:
(a) let i = λx.xy in ii
(b) let x = λzy.zyy in xx
(c) let x = λzw.zw in λz.z(xx)
4.9 Show that:
(a) If Γ ⊢ML M : σ, then S(Γ) ⊢ML M : S(σ), and the size of the derivation tree for S(Γ) ⊢ML M : S(σ)
is less than or equal to the size of that of Γ ⊢ML M : σ.
(b) If σ′ is a generic instance of σ, and Γx ∪ {x : σ′} ⊢ML M : σ″, then Γx ∪ {x : σ} ⊢ML M : σ″.
Bibliography
[Barendregt, 1984] Barendregt, H. P. (1984). The Lambda Calculus: Its Syntax and Semantics, volume 103 of Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, second, revised edition.
[Barendregt, 1992] Barendregt, H. P. (1992). Lambda Calculi with Types. In Abramsky, S., Gabbay, D. M., and Maibaum, T., editors, Handbook of Logic in Computer Science, volume 2. Clarendon Press, Oxford.
[Barendregt et al., 1976] Barendregt, H. P., Bergstra, J., Klop, J. W., and Volken, H. (1976). Degrees, reductions and representability in the lambda calculus. Technical Report Preprint no. 22, University of Utrecht, Department of Mathematics.
[Bruijn, 1972] Bruijn, N. G. de (1972). Lambda calculus notation with nameless dummies, a tool for automatic formula manipulation, with application to the Church-Rosser theorem. Indagationes Mathematicae, 34:381-392.
[Church, 1940] Church, A. (1940). A Formulation of the Simple Theory of Types. Journal of Symbolic Logic, 5(2):56-68.
[Church and Rosser, 1936] Church, A. and Rosser, J. B. (1936). Some Properties of Conversion. Transactions of the American Mathematical Society, 39(3):472-482.
[Curry, 1969] Curry, H. (1969). Modified basic functionality in combinatory logic. Dialectica, 23:83-92.
[Curry and Feys, 1958] Curry, H. and Feys, R. (1958). Combinatory Logic, volume 1. North-Holland, Amsterdam.
[Curry, 1934] Curry, H. B. (1934). Functionality in Combinatory Logic. In Proceedings of the National Academy of Sciences, U.S.A., volume 20, pages 584-590. National Academy of Sciences.
[Damas and Milner, 1982] Damas, L. and Milner, R. (1982). Principal type-schemes for functional programs. In Proceedings of the 9th ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, POPL '82, pages 207-212, New York, NY, USA. ACM.
[Dershowitz and Jouannaud, 1990] Dershowitz, N. and Jouannaud, J.-P. (1990). Rewrite Systems. In van Leeuwen, J., editor, Handbook of Theoretical Computer Science, volume B: Formal Models and Semantics, chapter 6, pages 243-320. North-Holland, Amsterdam.
[Hindley, 1969] Hindley, J. R. (1969). The Principal Type-scheme of an Object in Combinatory Logic. Transactions of the American Mathematical Society, 146:29-60.
[Hindley, 1997] Hindley, J. R. (1997). Basic Simple Type Theory, volume 42 of Cambridge Tracts in Theoretical Computer Science. Cambridge University Press.
[Klop, 1992] Klop, J. W. (1992). Term Rewriting Systems. In Abramsky, S., Gabbay, D. M., and Maibaum, T. S. E., editors, Handbook of Logic in Computer Science: Background - Computational Structures (Volume 2), pages 1-116. Clarendon Press, Oxford.
[Milner, 1978] Milner, R. (1978). A Theory of Type Polymorphism in Programming. Journal of Computer and System Sciences, 17(3):348-375.
[Robinson, 1965] Robinson, J. A. (1965). A machine-oriented logic based on the resolution principle. Journal of the Association for Computing Machinery (ACM), 12:23-41.
[van Raamsdonk et al., 1999] van Raamsdonk, F., Severi, P., Sørensen, M. H. B., and Xi, H. (1999). Perpetual reductions in λ-calculus. Information and Computation, 149(2):173-225.
[Wand, 1987] Wand, M. (1987). A Simple Algorithm and Proof for Type Inference. Fundamenta Informaticae, 10:115-121.