Pattern Matching for Scheme
Andrew K. Wright and Bruce F. Duba
Department of Computer Science
Rice University
Houston, TX 77251-1892
Version 1.12, May 24, 1995
Please direct questions or bug reports regarding this software to [email protected].
The most recent version of this software can be obtained by anonymous FTP from site
ftp.nj.nec.com in file pub/wright/match.tar.gz.
1 Pattern Matching for Scheme
Pattern matching allows complicated control decisions based on data structure to be expressed in
a concise manner. Pattern matching is found in several modern languages, notably Standard ML,
Haskell and Miranda. This document describes several pattern matching macros for Scheme, and
an associated mechanism for defining new forms of structured data.
The basic form of a pattern matching expression is:
(match exp [pat body] ...)
where exp is an expression, pat is a pattern, and body is one or more expressions (like the body
of a lambda-expression).¹ The match form matches its first subexpression against a sequence of
patterns, and branches to the body corresponding to the first pattern successfully matched. For
example, the following code defines the usual map function:
(define map
  (lambda (f l)
    (match l
      [() '()]
      [(x . y) (cons (f x) (map f y))])))
The first pattern () matches the empty list. The second pattern (x . y) matches a pair, binding x
to the first component of the pair and y to the second component of the pair.
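For comparison, the same function can be written by hand without match. The sketch below is in R7RS Scheme; the name my-map is hypothetical, chosen so as not to redefine the built-in map. It shows roughly the tests the two patterns stand for: () becomes a null? test, and (x . y) becomes a pair? test followed by car and cdr. The final error call is only a stand-in for the match:error behavior described in Section 1.3.

```scheme
(import (scheme base))

;; Hand-written equivalent of the match-based map above, without match.
;; my-map is a hypothetical name that avoids redefining the built-in map.
(define (my-map f l)
  (cond ((null? l) '())                          ; pattern ()
        ((pair? l)                               ; pattern (x . y)
         (let ((x (car l))
               (y (cdr l)))
           (cons (f x) (my-map f y))))
        (else (error "no clause matched" l))))   ; stands in for match:error
```

For example, (my-map (lambda (n) (* n n)) '(1 2 3)) returns (1 4 9).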
1.1 Pattern Matching Expressions
The complete syntax of the pattern matching expressions follows:
exp ::= (match exp clause ...)
     |  (match-lambda clause ...)
     |  (match-lambda* clause ...)
     |  (match-let ([pat exp] ...) body)
     |  (match-let* ([pat exp] ...) body)
     |  (match-letrec ([pat exp] ...) body)
     |  (match-let var ([pat exp] ...) body)
     |  (match-define pat exp)
clause ::= [pat body] | [pat (=> identifier) body]

¹ The notation "⟨thing⟩ ..." indicates that ⟨thing⟩ is repeated zero or more times. The notation "⟨thing1 | thing2⟩"
means an occurrence of either thing1 or thing2. Brackets "[]" are extended Scheme syntax, equivalent to parentheses "()".
Figure 1 gives the full syntax for patterns. The next subsection describes the various patterns.
The match-lambda and match-lambda* forms are convenient combinations of match and
lambda, and can be explained as follows:
(match-lambda [pat body] ...) = (lambda (x) (match x [pat body] ...))
(match-lambda* [pat body] ...) = (lambda x (match x [pat body] ...))
where x is a unique variable. The match-lambda form is convenient when defining a single-argument function that immediately destructures its argument. The match-lambda* form constructs
a function that accepts any number of arguments; the patterns of match-lambda* should be lists.
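To make the match-lambda* equivalence concrete, here is a hand expansion in R7RS Scheme of a small hypothetical definition, describe-args, written with match-lambda*. Because (lambda x ...) receives all of its arguments as one list, each list pattern effectively selects an arity. This is a sketch of the idea, not the code the macro actually generates.

```scheme
(import (scheme base))

;; Hand expansion of the hypothetical definition
;;   (define describe-args
;;     (match-lambda*
;;       [(a) (list 'one a)]
;;       [(a b) (list 'two a b)]
;;       [_ 'many]))
;; following (match-lambda* [pat body] ...) = (lambda x (match x [pat body] ...)).
(define describe-args
  (lambda x                                              ; x is the list of all arguments
    (cond ((and (pair? x) (null? (cdr x)))               ; pattern (a)
           (let ((a (car x)))
             (list 'one a)))
          ((and (pair? x) (pair? (cdr x)) (null? (cddr x))) ; pattern (a b)
           (let ((a (car x))
                 (b (cadr x)))
             (list 'two a b)))
          (else 'many))))                                ; pattern _
```

With this definition, (describe-args 1 2) returns (two 1 2), while calls with zero or three arguments fall through to the _ clause.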
The match-let, match-let*, match-letrec, and match-define forms generalize Scheme's let,
let*, letrec, and define expressions to allow patterns in the binding position rather than just
variables. For example, the following expression:
(match-let ([(x y z) (list 1 2 3)]) body)
binds x to 1, y to 2, and z to 3 in body. These forms are convenient for destructuring the result of
a function that returns multiple values as a list or vector. As usual for letrec and define, pattern
variables bound by match-letrec and match-define should not be used in computing the bound
value.
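As a sketch of the multiple-values use case, the following R7RS code hand-expands a hypothetical match-let over a procedure that packages a quotient and remainder as a list. The names div-mod and rebuild are illustrative only; the explicit shape test shows that the pattern (q r) checks the value as well as binding its parts.

```scheme
(import (scheme base))

;; A procedure that returns two results packaged as a list.
(define (div-mod n d)
  (list (quotient n d) (remainder n d)))

;; Hand expansion of the hypothetical expression
;;   (match-let ([(q r) (div-mod n d)]) (+ (* q d) r))
;; The pattern (q r) first checks for a proper two-element list, then binds q and r.
(define (rebuild n d)
  (let ((v (div-mod n d)))
    (if (and (pair? v) (pair? (cdr v)) (null? (cddr v)))
        (let ((q (car v))
              (r (cadr v)))
          (+ (* q d) r))
        (error "no clause matched" v))))
```

For instance, (div-mod 17 5) returns (3 2), so (rebuild 17 5) reconstructs 17.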
The match, match-lambda, and match-lambda* forms allow the optional syntax (=> identifier) between the pattern and the body of a clause. When the pattern match for such a clause
succeeds, the identifier is bound to a failure procedure of zero arguments within the body. If this
procedure is invoked, it jumps back to the pattern matching expression and resumes the matching
process as if the pattern had failed to match. The body must not mutate the object being matched;
otherwise, unpredictable behavior may result.
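The behavior of (=> identifier) can be modeled with explicit clause procedures. In this R7RS sketch, run-clauses and classify are hypothetical names, and call/cc is used to model the jump back into the matching process: calling fail abandons the rest of the body and continues with the next clause. It illustrates the semantics only and is not how the package implements failure.

```scheme
(import (scheme base))

;; run-clauses is a hypothetical helper, not part of the match package.
;; Each clause is a pair (test . body): test maps the value to a list of
;; bindings or #f; body receives the bindings and a zero-argument failure
;; procedure, modelling (=> identifier).
(define (run-clauses value clauses)
  (if (null? clauses)
      (error "no clause matched" value)          ; stands in for match:error
      (let* ((clause (car clauses))
             (bindings ((car clause) value)))
        (if bindings
            (call/cc
             (lambda (k)
               ;; Calling fail abandons the body and returns the result of
               ;; matching the remaining clauses instead.
               ((cdr clause)
                bindings
                (lambda () (k (run-clauses value (cdr clauses)))))))
            (run-clauses value (cdr clauses))))))

;; Models the hypothetical expression
;;   (match n
;;     [(? integer? k) (=> fail) (if (even? k) 'even (fail))]
;;     [_ 'other])
(define (classify n)
  (run-clauses
   n
   (list (cons (lambda (v) (and (integer? v) (list v)))
               (lambda (bindings fail)
                 (let ((k (car bindings)))
                   (if (even? k) 'even (fail)))))
         (cons (lambda (v) '())                  ; _ matches, binding nothing
               (lambda (bindings fail) 'other)))))
```

Here (classify 3) matches the first clause, calls fail, and ends up in the _ clause, returning other.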
1.2 Patterns
Figure 1 gives the full syntax for patterns. Explanations of these patterns follow.
identifier (excluding the reserved names ?, $, =, _, and, or, not, set!, get!, ..., and ..k for non-negative integers k): matches anything, and binds a variable of this name to the matching value
in the body.
_: matches anything, without binding any variables.
(), #t, #f, string, number, character, 's-expression: These constant patterns match themselves,
i.e., the corresponding value must be equal? to the pattern.
(pat1 ... patn): matches a proper list of n elements that match pat1 through patn.
Pattern:                                Matches:
pat ::= identifier                      anything, and binds identifier as a variable
     |  _                               anything
     |  ()                              itself (the empty list)
     |  #t                              itself
     |  #f                              itself
     |  string                          an equal? string
     |  number                          an equal? number
     |  character                       an equal? character
     |  's-expression                   an equal? s-expression
     |  'symbol                         an equal? symbol (special case of s-expression)
     |  (pat1 ... patn)                 a proper list of n elements
     |  (pat1 ... patn . patn+1)        a list of n or more elements
     |  (pat1 ... patn patn+1 ..k)      a proper list of n + k or more elements (a)
     |  #(pat1 ... patn)                a vector of n elements
     |  #(pat1 ... patn patn+1 ..k)     a vector of n + k or more elements
     |  #&pat                           a box
     |  ($ struct pat1 ... patn)        a structure
     |  (= field pat)                   a field of a structure
     |  (and pat1 ... patn)             if all of pat1 through patn match
     |  (or pat1 ... patn)              if any of pat1 through patn match
     |  (not pat1 ... patn)             if none of pat1 through patn match
     |  (? predicate pat1 ... patn)     if predicate true and pat1 through patn all match
     |  (set! identifier)               anything, and binds identifier as a setter
     |  (get! identifier)               anything, and binds identifier as a getter
     |  `qp                             a quasipattern

Quasipattern:                           Matches:
qp  ::= ()                              itself (the empty list)
     |  #t                              itself
     |  #f                              itself
     |  string                          an equal? string
     |  number                          an equal? number
     |  character                       an equal? character
     |  identifier                      an equal? symbol
     |  (qp1 ... qpn)                   a proper list of n elements
     |  (qp1 ... qpn . qpn+1)           a list of n or more elements
     |  (qp1 ... qpn qpn+1 ..k)         a proper list of n + k or more elements
     |  #(qp1 ... qpn)                  a vector of n elements
     |  #(qp1 ... qpn qpn+1 ..k)        a vector of n + k or more elements
     |  #&qp                            a box
     |  ,pat                            a pattern
     |  ,@pat                           a pattern, spliced

Figure 1: Pattern Syntax

(a) In this figure, ..k stands for a keyword consisting of three consecutive dots ("..."), two dots followed by a non-negative integer (e.g., "..1", "..2"), three consecutive underscores ("___"), or two underscores followed by a non-negative integer. The keywords "..k" and "__k" are equivalent, and the keywords "...", "___", "..0", and "__0" are equivalent.
(pat1 ... patn . patn+1): matches a (possibly improper) list of at least n elements that ends in
something matching patn+1.
(pat1 ... patn patn+1 ...): matches a proper list of n or more elements, where each element of the
tail matches patn+1. Each pattern variable in patn+1 is bound to a list of the matching values. For
example, the expression:
(match '(let ([x 1][y 2]) z)
  [('let ((binding values) ...) exp) body])
binds binding to the list '(x y), values to the list '(1 2), and exp to 'z in the body of the match-expression. For the special case where patn+1 is a pattern variable, the list bound to that variable
may share with the matched value.
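The following hand-written R7RS matcher for the pattern in the example above shows how the ellipsis collects one list per pattern variable. The names match-let-form and two-element-list? are hypothetical, and the matcher returns a list of the three bindings, or #f on failure, instead of evaluating a body.

```scheme
(import (scheme base))

(define (two-element-list? x)
  (and (pair? x) (pair? (cdr x)) (null? (cddr x))))

;; Hand-written matcher for ('let ((binding values) ...) exp).
;; On success it returns (list binding values exp), where binding and values
;; are lists collected from the ellipsis; otherwise it returns #f.
(define (match-let-form v)
  (and (pair? v)
       (eq? (car v) 'let)                      ; 'let
       (pair? (cdr v))
       (pair? (cddr v))
       (null? (cdr (cddr v)))                  ; exactly three elements
       (let ((bindings (cadr v))
             (exp (car (cddr v))))
         (and (list? bindings)
              ;; Each element must match (binding values); each variable
              ;; accumulates its own list of matched values.
              (let loop ((bs bindings) (names '()) (vals '()))
                (cond ((null? bs)
                       (list (reverse names) (reverse vals) exp))
                      ((two-element-list? (car bs))
                       (loop (cdr bs)
                             (cons (car (car bs)) names)
                             (cons (cadr (car bs)) vals)))
                      (else #f)))))))
```

Applied to '(let ((x 1) (y 2)) z), it returns ((x y) (1 2) z), mirroring the bindings described above.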
(pat1 ... patn patn+1 ___): This pattern means the same thing as the previous pattern.
(pat1 ... patn patn+1 ..k): This pattern is similar to the previous pattern, but the tail must be at
least k elements long. The pattern keywords ..0 and ... are equivalent.
(pat1 ... patn patn+1 __k): This pattern means the same thing as the previous pattern.
#(pat1 ... patn): matches a vector of length n, whose elements match pat1 through patn.
#(pat1 ... patn patn+1 ...): matches a vector of length n or more, where each element beyond n
matches patn+1.
#(pat1 ... patn patn+1 ..k): matches a vector of length n + k or more, where each element beyond
n matches patn+1.
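A hand-written R7RS sketch of a vector pattern with a ..k tail, here the hypothetical pattern #(a (? number? xs) ..2): the vector must have at least n + k = 3 elements, every element beyond the first must be a number, and xs is bound to a list. The name match-tagged-numbers is illustrative only.

```scheme
(import (scheme base))

;; Hand-written matcher for #(a (? number? xs) ..2).
;; Returns (list a xs) on success, or #f.
(define (match-tagged-numbers v)
  (let ((n 1)    ; number of fixed leading patterns
        (k 2))   ; minimum length of the repeated tail
    (and (vector? v)
         (>= (vector-length v) (+ n k))
         (let ((tail (list-tail (vector->list v) n)))
           ;; Every tail element must satisfy number?; xs collects them all.
           (and (let all-numbers? ((t tail))
                  (or (null? t)
                      (and (number? (car t)) (all-numbers? (cdr t)))))
                (list (vector-ref v 0) tail))))))
```

So '#(point 1 2) yields (point (1 2)), while '#(point 1) is rejected as too short.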
#&pat: matches a box containing something matching pat.
($ struct pat1 ... patn): matches a structure declared with define-structure or define-const-structure. See Section 2.
(= field pat): is intended for selecting a field from a structure. "field" may be any expression; it
is applied to the value being matched, and the result of this application is matched against pat.
(and pat1 ... patn): matches if all of the subpatterns match. At least one subpattern must be
present. This pattern is often used as (and x pat) to bind x to the entire value that matches pat
(cf. "as-patterns" in ML or Haskell).
(or pat1 ... patn): matches if any of the subpatterns match. At least one subpattern must be
present. All subpatterns must bind the same set of pattern variables.
(not pat1 ... patn): matches if none of the subpatterns match. At least one subpattern must be
present. The subpatterns may not bind any pattern variables.
(? predicate pat1 ... patn): In this pattern, predicate must be an expression evaluating to a single-argument function. This pattern matches if predicate applied to the corresponding value is true,
and the subpatterns pat1 ... patn all match. The predicate should not have side effects, as the code
generated by the pattern matcher may invoke predicates repeatedly in any order. The predicate
expression is bound in the same scope as the match expression, i.e., free variables in predicate are
not bound by pattern variables.
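To illustrate how the and, or, not, and ? patterns combine, the following R7RS sketch assumes a toy representation in which a pattern is a procedure from a value to an association list of bindings, or #f on failure. All names are hypothetical; the package itself compiles patterns into if-expressions (Section 3) rather than building closures like these. The example rebuilds the pattern (and s (? symbol?) (not 'lambda)) used by the parser in Section 4.

```scheme
(import (scheme base))

;; Toy pattern representation: value -> alist of bindings, or #f.
(define (p-var name) (lambda (v) (list (cons name v))))   ; identifier
(define p-any (lambda (v) '()))                           ; _
(define (p-const c) (lambda (v) (and (equal? v c) '())))  ; constant

(define (p-and . ps)            ; all subpatterns must match; bindings combine
  (lambda (v)
    (let loop ((ps ps) (acc '()))
      (if (null? ps)
          acc
          (let ((b ((car ps) v)))
            (and b (loop (cdr ps) (append acc b))))))))

(define (p-or . ps)             ; the first subpattern that matches wins
  (lambda (v)
    (let loop ((ps ps))
      (and (pair? ps)
           (or ((car ps) v) (loop (cdr ps)))))))

(define (p-not . ps)            ; no subpattern may match; binds nothing
  (lambda (v)
    (and (not ((apply p-or ps) v)) '())))

(define (p-? pred . ps)         ; predicate must hold, then all subpatterns
  (lambda (v)
    (and (pred v) ((apply p-and ps) v))))

;; Toy version of (and s (? symbol?) (not 'lambda)).
(define var-pattern
  (p-and (p-var 's) (p-? symbol?) (p-not (p-const 'lambda))))
```

With these definitions, (var-pattern 'x) returns ((s . x)), while (var-pattern 'lambda) and (var-pattern 3) return #f.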
(set! identifier): matches anything, and binds identifier to a procedure of one argument that
mutates the corresponding field of the matching value. This pattern must be nested within a pair,
vector, box, or structure pattern. For example, the expression:
(define x (list 1 (list 2 3)))
(match x [(_ (_ (set! setit))) (setit 4)])
mutates the cadadr of x to 4, so that x is '(1 (2 4)).
(get! identifier): matches anything, and binds identifier to a procedure of zero arguments that
accesses the corresponding field of the matching value. This pattern is the complement to set!. As
with set!, this pattern must be nested within a pair, vector, box, or structure pattern.
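The setter and getter bound by set! and get! patterns behave like closures over the pair that holds the matched field. The R7RS sketch below builds them by hand for the set! example above; getit is a hypothetical name for a get! binding at the same position, and this is not the package's generated code.

```scheme
(import (scheme base))

(define x (list 1 (list 2 3)))

;; The matched field is the car of this pair, i.e. the cadadr of x.
(define cell (cdr (cadr x)))

(define setit (lambda (new) (set-car! cell new)))  ; like (set! setit)
(define getit (lambda () (car cell)))              ; like (get! getit)

(setit 4)
```

After (setit 4), x is (1 (2 4)) and (getit) returns 4, because both closures refer to the same pair inside x.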
Quasipatterns: Quasiquote introduces a quasipattern, in which identifiers are considered to be
symbolic constants. Like Scheme's quasiquote for data, unquote (,) and unquote-splicing (,@)
escape back to normal patterns.
1.3 Match Failure
If no clause matches the value, the default action is to invoke the procedure match:error with the
value that did not match. The default definition of match:error calls error with an appropriate
message:
> (match 1 [2 2])
Error: no clause matched 1.
For most situations, this behavior is adequate, but it can be changed either by redefining match:error,
or by altering the value of the variable match:error-control. Valid values for match:error-control are:
match:error-control:   error action:
'error (default)       call (match:error unmatched-value)
'match                 call (match:error unmatched-value '(match expression ...))
'fail                  call match:error or die in car, cdr, ...
'unspecified           return unspecified value
Setting match:error-control to 'match causes the entire match expression to be quoted and passed
as a second argument to match:error. The default definition of match:error then prints the match
expression before calling error; this can help identify which expression failed to match. This option
causes the macros to generate somewhat larger code, since each match expression includes a quoted
representation of itself.
Setting match:error-control to 'fail permits the macros to generate faster and more compact code
than 'error or 'match. The generated code omits pair? tests when the consequence is to fail in car
or cdr rather than call match:error.
Finally, if match:error-control is set to 'unspecified, non-matching expressions will either fail in
car or cdr, or return an unspecified value. This results in still more compact code, but is unsafe.
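A sketch of the difference between the 'error and 'fail settings, for a clause like (match v [(a . _) a]). The two R7RS procedures below are hand-written approximations, not the macros' actual output; match:error here is a local stand-in for the package's procedure of the same name, and first/error and first/fail are hypothetical names.

```scheme
(import (scheme base))

;; Local stand-in for the package's match:error, which calls error.
(define (match:error v . rest)
  (error "no clause matched" v))

;; Approximate shape under 'error: the pair? test is kept, so a
;; non-pair reaches match:error.
(define (first/error v)
  (if (pair? v)
      (let ((a (car v))) a)
      (match:error v)))

;; Approximate shape under 'fail: the pair? test is omitted, so a
;; non-pair fails inside car instead.
(define (first/fail v)
  (let ((a (car v))) a))
```

Both procedures return 1 for '(1 . 2). For a non-pair such as 5, first/error reaches match:error, while first/fail fails inside car, and how that failure is reported depends on the Scheme implementation.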
2 Data Definition
The ability to define new forms of data proves quite useful in conjunction with pattern matching.
This macro package includes a slightly altered² version of Chez Scheme's define-structure macro for
defining new forms of data [1], and a similar define-const-structure macro for defining immutable
data.
The following expression defines a new kind of data named struct:
(define-structure (struct arg1 ... argn))
A struct is a composite data structure with n fields named arg1 through argn. The define-structure
macro declares the following procedures for constructing and manipulating data of type struct:
Procedure Name:                            Function:
make-struct                                constructor requiring n arguments
struct?                                    predicate
struct-arg1, ..., struct-argn              named selectors
set-struct-arg1!, ..., set-struct-argn!    named mutators
struct-1, ..., struct-n                    numeric selectors
set-struct-1!, ..., set-struct-n!          numeric mutators
The field name _ (underscore) is special: no named selectors or mutators are defined for such a field.
Such unnamed fields can only be accessed through the numeric selectors or mutators, or through
pattern matching.
A second form of definition:
(define-structure (struct arg1 ... argn) ([init1 exp1] ... [initm expm]))
declares m additional fields init1 through initm with initial values exp1 through expm. The expressions exp1 through expm are evaluated in order each time make-struct is invoked.
Finally, the macro define-const-structure:
(define-const-structure (struct arg1 ... argn))
(define-const-structure (struct arg1 ... argn) ([init1 exp1] ... [initm expm]))
is similar to define-structure, but allows immutable fields. If a field name argi is simply a variable,
no (named or numeric) mutator is declared for that field. If a field name has the form (! x)
where x is a variable, then that field is mutable. Hence (define-structure (Foo a b)) abbreviates
(define-const-structure (Foo (! a) (! b))).
By default, structures are implemented as vectors whose first component is the name of the
structure as a symbol. Thus a Foo structure of one field will match both the patterns ($ Foo x)
and #('Foo x). Setting the variable match:structure-control to 'disjoint causes subsequent define-structure definitions to create structures that are disjoint from all other data, including vectors.
In this case, Foo structures will no longer match the pattern #('Foo x).³
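The default representation can be approximated by hand. The R7RS sketch below defines procedures with the names that (define-structure (Foo a b)) would declare, assuming the vector layout described above (tag symbol at index 0, field i at index i). The exact predicate and error checking used by the package are not specified here, so this Foo? is an assumption.

```scheme
(import (scheme base))

;; Hand-written approximation of (define-structure (Foo a b)) under the
;; default (non-disjoint) vector representation.
(define (make-Foo a b) (vector 'Foo a b))

;; Assumed predicate: a non-empty vector whose first component is the symbol Foo.
(define (Foo? x)
  (and (vector? x) (> (vector-length x) 0) (eq? (vector-ref x 0) 'Foo)))

(define (Foo-1 x) (vector-ref x 1))             ; numeric selectors
(define (Foo-2 x) (vector-ref x 2))
(define (set-Foo-1! x v) (vector-set! x 1 v))   ; numeric mutators
(define (set-Foo-2! x v) (vector-set! x 2 v))
(define Foo-a Foo-1)                            ; named selectors
(define Foo-b Foo-2)
(define set-Foo-a! set-Foo-1!)                  ; named mutators
(define set-Foo-b! set-Foo-2!)
```

Because a Foo value is an ordinary vector here, it can also be matched by a vector pattern such as #('Foo x y), as described above.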
² This macro generates additional numeric selector and mutator names for use by the pattern matcher, recognizes
_ as an unnamed field, and optionally allows structures to be disjoint from vectors. Chez Scheme does not provide
define-const-structure.
³ Disjoint structures are implemented as vectors whose first component is a unique symbol (an uninterned symbol
for Chez Scheme). The procedure vector? is modified to return false for such vectors (hence the 'disjoint option
cannot be used with Chez Scheme's optimize-level set higher than 1). For completeness, the other vector operations
(vector-ref, vector-set!, etc.) should also be modified to reject structures, but we don't bother.
3 Code Generation
Pattern matching macros are compiled into if-expressions that decompose the value being matched
with standard Scheme procedures, and test the components with standard predicates. Rebinding or
lexically shadowing the names of any of these procedures will change the semantics of the match
macros. The names that should not be rebound or shadowed are:
null? pair? number? string? symbol? boolean? char? procedure? vector? box? list?
equal?
car cdr cadr cdddr ...
vector-length vector-ref
unbox
reverse length call/cc
Additionally, the code generated to match a structure pattern like ($ Foo pat1 ... patn) refers to the
names Foo?, Foo-1 through Foo-n, and set-Foo-1! through set-Foo-n!. These names should not be
shadowed either.
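As a rough illustration of this kind of code, here is a hand-written R7RS sketch of what a clause such as (match v [((a . b) c) (list a b c)] [_ #f]) might compile into. The name match-nested is hypothetical, and the macros' real output may be organized differently; the point is that such code uses only standard procedures such as pair?, null?, car, and cdr, referenced by name.

```scheme
(import (scheme base))

;; Hypothetical compiled form of
;;   (match v [((a . b) c) (list a b c)] [_ #f])
;; using only standard predicates and selectors.
(define (match-nested v)
  (if (and (pair? v)              ; outer list has a first element
           (pair? (car v))        ; first element matches (a . b)
           (pair? (cdr v))        ; there is a second element c
           (null? (cddr v)))      ; and nothing after it
      (let ((a (car (car v)))
            (b (cdr (car v)))
            (c (cadr v)))
        (list a b c))
      #f))                        ; the _ clause
```

Since code like this refers to car, pair?, and the other names directly, a local rebinding of any of them around a match expression would change what the patterns test.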
4 Examples
This section illustrates the convenience of pattern matching with some examples. The following
function recognizes s-expressions that represent the standard Y operator:
(define Y?
  (match-lambda
    [('lambda (f1)
       ('lambda (y1)
         ((('lambda (x1) (f2 ('lambda (z1) ((x2 x3) z2))))
           ('lambda (a1) (f3 ('lambda (b1) ((a2 a3) b2)))))
          y2)))
     (and (symbol? f1) (symbol? y1) (symbol? x1) (symbol? z1) (symbol? a1) (symbol? b1)
          (eq? f1 f2) (eq? f1 f3) (eq? y1 y2)
          (eq? x1 x2) (eq? x1 x3) (eq? z1 z2)
          (eq? a1 a2) (eq? a1 a3) (eq? b1 b2))]
    [_ #f]))
Writing an equivalent piece of code in raw Scheme is tedious.
The following code defines abstract syntax for a subset of Scheme, a parser into this abstract
syntax, and an unparser.
(define-structure (Lam args body))
(define-structure (Var s))
(define-structure (Const n))
(define-structure (App fun args))
(define parse
  (match-lambda
    [(and s (? symbol?) (not 'lambda))
     (make-Var s)]
    [(? number? n)
     (make-Const n)]
    [('lambda (and args ((? symbol?) ...) (not (? repeats?))) body)
     (make-Lam args (parse body))]
    [(f args ...)
     (make-App
      (parse f)
      (map parse args))]
    [x (error x "invalid expression")]))
(define repeats?
  (lambda (l)
    (and (not (null? l))
         (or (memq (car l) (cdr l)) (repeats? (cdr l))))))
(define unparse
  (match-lambda
    [($ Var s) s]
    [($ Const n) n]
    [($ Lam args body) `(lambda ,args ,(unparse body))]
    [($ App f args) `(,(unparse f) ,@(map unparse args))]))
With pattern matching, it is easy to ensure that the parser rejects all incorrectly formed inputs with
an error message.
With match-define, it is easy to define several procedures that share a hidden variable. The
following code defines three procedures, inc, value, and reset, that manipulate a hidden counter
variable:
(match-define (inc value reset)
  (let ([val 0])
    (list
     (lambda () (set! val (+ 1 val)))
     (lambda () val)
     (lambda () (set! val 0)))))
Although this example is not recursive, the bodies could recursively refer to each other.
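The effect of the match-define example can be reproduced in plain R7RS Scheme by defining the three names first and then assigning them from the list of closures. This is a sketch of the effect only; it does not claim to be how match-define is expanded.

```scheme
(import (scheme base))

;; Define the names first, as letrec-style bindings, then fill them in.
(define inc #f)
(define value #f)
(define reset #f)

(let ((procs (let ((val 0))                     ; the hidden counter
               (list (lambda () (set! val (+ 1 val)))
                     (lambda () val)
                     (lambda () (set! val 0))))))
  ;; Destructure the three-element list, as the pattern (inc value reset) does.
  (set! inc (car procs))
  (set! value (cadr procs))
  (set! reset (car (cddr procs))))
```

After two calls to (inc), (value) returns 2; after (reset), it returns 0. The counter val is reachable only through the three procedures.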
The following code is taken from the macro package itself. The procedure match:validate-pattern
checks the syntax of match patterns, and converts quasipatterns into ordinary patterns.
(define match:validate-pattern
  (lambda (pattern)
    (letrec
        ([simple?
          (lambda (x)
            (or (string? x) (boolean? x) (char? x) (number? x) (null? x)))]
         [ordinary
          (match-lambda
            [(? simple? p) p]
            ['_ '_]
            [(? match:pattern-var? p) p]
            [('quasiquote p) (quasi p)]
            [(and p ('quote _)) p]
            [('? pred ps ...) `(? ,pred ,@(map ordinary ps))]
            [('and ps ..1) `(and ,@(map ordinary ps))]
            [('or ps ..1) `(or ,@(map ordinary ps))]
            [('not ps ..1) `(not ,@(map ordinary ps))]
            [('$ (? match:pattern-var? r) ps ...) `($ ,r ,@(map ordinary ps))]
            [(and p ('set! (? match:pattern-var?))) p]
            [(and p ('get! (? match:pattern-var?))) p]
            [(p (? match:dot-dot-k? ddk)) `(,(ordinary p) ,ddk)]
            [(x . y) (cons (ordinary x) (ordinary y))]
            [(? vector? p) (apply vector (map ordinary (vector->list p)))]
            [(? box? p) (box (ordinary (unbox p)))]
            [p (match:syntax-err pattern "syntax error in pattern")])]
         [quasi
          (match-lambda
            [(? simple? p) p]
            [(? symbol? p) `(quote ,p)]
            [('unquote p) (ordinary p)]
            [(('unquote-splicing p) . ()) (ordinary p)]
            [(('unquote-splicing p) . y) (append (ordlist p) (quasi y))]
            [(p (? match:dot-dot-k? ddk)) `(,(quasi p) ,ddk)]
            [(x . y) (cons (quasi x) (quasi y))]
            [(? vector? p) (apply vector (map quasi (vector->list p)))]
            [(? box? p) (box (quasi (unbox p)))]
            [p (match:syntax-err pattern "syntax error in pattern")])]
         [ordlist
          (match-lambda
            [() '()]
            [(x . y) (cons (ordinary x) (ordlist y))]
            [p (match:syntax-err pattern
                                 "invalid use of unquote-splicing in pattern")])])
      (ordinary pattern))))
5 Known Bugs
A structure pattern like ($ foo a b c) is not checked to ensure that there are enough fields present
for a foo object. This should be fixed in the future.
Acknowledgments
Several members of the Rice programming languages community exercised the implementation and
suggested enhancements. We thank Matthias Felleisen, Cormac Flanagan, Amit Patel, and Amr
Sabry for their contributions.
References
[1] Dybvig, R. K. The Scheme Programming Language. Prentice-Hall, Englewood Cliffs, New Jersey, 1987.