Adding preferences to CUF
Chris Brew
Language Technology Group
Human Communication Research Centre
Edinburgh University
February 23, 2000
Abstract
We discuss the problem of associating grammars written in CUF [DD93] with preference information of the sort which has proved useful in the development of large-scale grammars like the Alvey Natural Language Tools Grammar [GBCB89]. We show a way to embed a useful part of the functionality of the Xerox part-of-speech tagger [Kup92] into the framework of CUF, developing a design for a possible successor to CUF which has the appropriate facilities for the handling of numeric data. On the basis of this design we speculate about the prospects of including other preference-based mechanisms within grammars encoded in CUF, and draw preliminary conclusions about the nature of this enterprise. This chapter draws on material from two contributions [Bre93, Bre94] prepared for Esprit's Dyana-2 Basic Research Project.
1 Introduction
Highly declarative typed feature formalisms have not, to date, been much used for large-scale grammar development. In part this is because of the extra generality of formalisms like CUF: it is not immediately clear that established techniques for parse-ranking which were developed for finite-state automata or context-free grammars will be of assistance for developers working with the more elaborate formalisms. There are at least two ways to resolve this. Purists might argue that the extra expressivity demands the creation of entirely new techniques for its exploitation in practical tasks. An alternative, more pragmatic approach is to begin by constructing systems in which the existing techniques are embedded within the more complex formalisms.
* Most of this work was carried out while the author was supported by a U.K. SERC grant to Henry Thompson; other funding came from the ESRC Human Communication Research Centre and from the Dyana-2 grant. Special thanks to David McKelvie and Suresh Manandhar for very useful discussions, and to Jochen Dörre for extensive input, including a detailed review and numerous informal comments. I am also grateful to a forceful anonymous referee who re-iterated many of Jochen's points in an entirely different rhetorical style. Thanks also to Steve Finch for helping me to understand the Xerox tagger, and to Henry Thompson and Claire Grover for providing comments on drafts.
The main contribution of this paper is to provide a simple example of such an embedding.
Our secondary contribution is to
use the experience gained from designing and implementing this hybrid system
as the basis for a more speculative discussion of the way in which preference mechanisms can inter-operate with declarative grammar formalisms.
We focus on CUF because it is a widely disseminated and highly general exemplar of the constraint logic programming approach to natural language processing, within which large systems have been constructed¹. We rely on the fact that such systems adopt an execution strategy more flexible than that of Prolog, under which constraints can be delayed as needed.
In what follows we augment the usual execution mechanism with facilities for memoization of selected predicates. This seems to be a prerequisite for the efficient implementation of large classes of stochastic algorithms. Aside from this claim about memoization, we use this article to describe a model for preference-ranking which can co-exist with CUF. The significance of the model is mainly as a proxy for a much larger class of preference mechanisms.
Although we do present a specific proposal for preference ranking, this is done mainly for concreteness and expository clarity. We certainly do not intend to suggest that the proposal, which amounts to a ranking of parses according to the sequences of their preterminals, is sufficient to provide useful parse ranking in the general case.
Rather, the goal of the present work is to provide an explicit example of how preference mechanisms can be incorporated into a modern framework.
Since we are mainly concerned with demonstrating the interaction between statistical and symbolic models, the inadequacies of the particular model developed are not important except insofar as they demonstrate difficulties which we may expect to encounter in trying to integrate more empirically adequate preference mechanisms into CUF and similar systems.
If it turns out that judicious tuning of grammars and tagsets makes our system of more than expository interest, we would be delighted, but also surprised. The expository goal is important enough to pursue in its own right, since the development of effective hybrid NLP systems depends on a meeting of minds between the statistical and the symbolic camps.
It is an acknowledged problem that large-scale grammars associate more than one analysis with each sentence. In fact, there are often so many analyses that it is unreasonable to expect an analyst to inspect the complete set, making the debugging of a large grammar at best an extremely tedious and error-prone activity [BC93, p. 40].
If possible, we would like to use preference information² to provide a rank-ordering of the outputs of the system, in order that the analyst
¹ We inevitably introduce CUF-specific details into the discussion, but the substantive points we make are not dependent on the choice of a particular constraint logic programming mechanism.
² This terminology is used in preference to Uszkoreit's control information [Usz91], because CUF already makes a distinction between the declarative meaning of a specification and the control information which indicates how the declarative specification is to be exploited. For the sake of clarity we reserve the term control information for the non-numeric delay and indexing statements described by Dörre and Dorna [DD93]. But when we talk about preference information we are referring to Uszkoreit's numerical weights for options within the grammar.
should need to inspect, or otherwise process, only those analyses which are (in some suitable sense) the most likely. We think that the ability to rank parses is extremely important for the debugging and development of large coverage grammars.
This is a special case of a larger problem. In any system there comes a point when the explosion in number of parse results becomes so large that later stages of processing cannot be asked to handle them all. Now it is the capacity of the system rather than the patience of the analyst which is at issue. Given a reasonably suitable framework for preference information it may be
possible to alleviate this problem by structuring the computation in such a way
that a rank-ordered sequence of solutions is generated by lazy evaluation of the
input against the grammar. This would be a kind of best-first search³. The practicality of this approach would depend on the exact form of the calculus by which the preference information is combined, but we see no reason to doubt the theoretical possibility of such a scheme. This is essentially the programme described by Uszkoreit [Usz91].
A potential difficulty with this approach is the unconstrained nature of the preference formalism. At the best of times it is hard for human beings to reason correctly about the behaviour of non-deterministic programs. The addition of complex interactions between continuously variable numerical parameters is hardly going to help much.
If the burden of exploiting the preference formalism is supposed to be borne by the grammar writer, this role will become even more demanding than it already is.
There are several options to be considered when adding preference information to a system as highly structured as CUF. One option is to systematically add preference information to each and every construct of the language, and then to extend the system to allow processing strategies which are sensitive to this preference information. This is an interesting possibility, but not the one we pursue in this paper.
2 Background

2.1 Probabilistic language models
As the introductory example of a probabilistic language model we use the Hidden Markov Model (HMM), which is in extensive use in speech recognition [HAJ90] as well as in word-tagging [Kup92]. The basic model is a non-deterministic finite-state automaton. In order to generate language, an HMM begins in a start state, chooses a transition to another state, generates an output symbol, then chooses a transition once more. This process continues until an end-state is reached.
The process is doubly stochastic, since
the choice of a transition is made using a (state-dependent) vector of transition
probabilities and
the choice of a symbol to generate is controlled by a matrix tabulating the probability of generating a particular symbol when one is in a given state.
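To make the generative story concrete, here is a minimal sketch of such a doubly stochastic process in Python (ours, not part of the original CUF development; all states, symbols and numbers are invented for illustration):

import random

# Toy HMM, with invented numbers: state-dependent transition vectors
# and a state-dependent emission matrix.
TRANS = {                       # P(next state | state)
    "start": {"det": 1.0},
    "det":   {"adj": 0.3, "noun": 0.7},
    "adj":   {"noun": 1.0},
    "noun":  {"end": 1.0},
}
EMIT = {                        # P(symbol | state)
    "det":  {"the": 0.7, "a": 0.3},
    "adj":  {"red": 1.0},
    "noun": {"bottle": 0.6, "red": 0.4},
}

def sample(dist):
    """Draw a key from a {value: probability} dictionary."""
    r, total = random.random(), 0.0
    for value, p in dist.items():
        total += p
        if r <= total:
            return value
    return value                # guard against rounding error

def generate():
    """Begin in the start state; repeatedly choose a transition and
    generate an output symbol until the end state is reached."""
    state, output = "start", []
    while True:
        state = sample(TRANS[state])
        if state == "end":
            return output
        output.append(sample(EMIT[state]))

print(generate())               # e.g. ['the', 'red', 'bottle']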
³ Such a search regime will only be of benefit if we can abort the search process before the complete search space is explored, since any form of heuristic search involves extra effort in maintaining the necessary book-keeping information.
In the word-tagging application described by Kupiec [Kup92], which we shall from now on call the tagger, the underlying Markov process operates over part-of-speech tags.
The system has access to a lexicon which gives an ambiguity class for each word. The ambiguity class is the set of tags permitted for a word. Only one of these will be correct in context. In order to alleviate the sparse data problems which would arise if the words were treated individually, it is usual to take the ambiguity classes as the output symbols of the Markov model.

It is assumed that a training corpus is available. This need not be annotated, but it must be possible to map words to their ambiguity classes. In practice this is done by means of a previously prepared lexicon, and by rules which guess the word-class of any words not known to the lexicon. Training consists of a search for a combination of transition and emission probabilities which locally maximises the probability that the model would have generated the sequence of ambiguity classes observed in a large corpus of text [Bau72, Kup92]. There is no guarantee that this training procedure will converge to a global optimum, only to a local one, although in practice taggers trained in this way perform at close to state-of-the-art levels in many applications.

Once the model has been trained, tagging is a search for the optimal path through the network, looking for the sequence of states which provides the best account of the sequence of ambiguity classes which were observed. Once we know the best sequence of states, we can read off the disambiguated part-of-speech tags.

For example, the lexicon provided with the Xerox tagger includes the following analysis for the word "red", making the two input strings below mildly ambiguous:
Norman buys the red bottle
np vbz at (jj nn) nn

Norman is in the red
np bez in at (jj nn)
The task of the tagger is to use its statistical model to resolve the ambiguity, which it achieves, as witnessed by the following sample output:
Norman buys the red bottle
np vbz at jj=2 nn

Norman is in the red
np bez in at nn=2
Here the numerical suffixes indicate the size of the ambiguity class which has been considered for each tag. For a more realistic example, consider the first sentence of the abstract of Dörre and Dorna's article on CUF [DD93]:

We describe the formalism CUF
ppss vb at nn nps=25

(Comprehensive Unification Formalism)
jj nn nn

which has been designed as a tool for
wdt hvz ben vbn=2 cs=2 at nn=2 in

writing and making use of any
vbg=2 cc vbg=2 nn=2 in dti=3

kind of linguistic descriptions
nn=2 in jj nns

ranging from phonology to pragmatics.
vbg in nn in=2 nns=2
Note that many of the words have an unambiguous part-of-speech, but that the tagger is still choosing a good analysis from 25 × 2 × 2 × 2 × 2 × 2 × 2 × 3 × 2 × 2 × 2 = 38400 possibilities.
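(A quick check of the arithmetic in Python: the count is just the product of the ambiguity class sizes of the ambiguous words.)

from math import prod

# Ambiguity class sizes > 1 from the example above: 25 for "CUF",
# 3 for "any", and 2 for each of the nine words marked "=2".
print(prod([25, 2, 2, 2, 2, 2, 2, 3, 2, 2, 2]))   # 38400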
The 25 arises because the system has essentially given up on "CUF", assigning all possible tags. There is actually another word not present in the lexicon, namely "pragmatics", but the system is capable of making a fair (but wrong) guess on the basis of simple morphological analysis.
Compare:

Pragmatics is hard.
nn=2 bez rb=2

in which the tagger correctly regards "pragmatics" as a singular noun, with:

Pragmatics are hard.
nns=2 ber rb=2
in which it makes the same mistake as it did in the long sentence, and treats "pragmatics" as a plural⁴.

⁴ Which is even more reasonable in this case than it was earlier, since the sequence nns, ber is many times more likely than the sequence nn, bez. This behaviour is indicative of a robust and effective natural language processing system.
While acknowledging the deficiencies of the theoretical model, this remains a fairly impressive performance in coping with general text.
By its very nature the tagger is sensitive to preference information, which it uses to choose from a range of available analyses. But it is not yet obvious how the tagger can be reconciled with the CUF style.
3 Reconstructing the tagger within CUF
In the next section we show how ambiguity classes and tags can be re-interpreted within CUF's type discipline. We provide an account of the way in which CUF can be used to emulate the behaviour which the tagger shows at run-time. We do not display a corresponding emulation for the training phase of the tagger, since that can conveniently be carried out off-line, and does not need to be integrated within CUF.
Our objectives in carrying out this implementation work are the following:

- We aim to demonstrate how statistical and rationalist systems can be interleaved in CUF.

- We wish to show how simple program transformations can result in CUF systems having the same asymptotic complexity as their counterparts expressed in imperative languages.

- We want to provide a basis for future work on corpora.
For reasons of time and effort we restricted ourselves to the treatment of a single sentence. If this were the only sentence which we cared about, it would be legitimate and appropriate to design the system in a way which exploits contingent facts about the sentence, but since it is not, we have aimed for a design which is much more general. In the future we hope to extend the system to give more realistic coverage, as well as to perform systematic testing against interesting corpora.
Some of the implementation work which we describe involves the declarative encoding within CUF of the facilities which the tagger provides. This is necessarily specific to the particular model being encoded.
A second element of the work reported here is the prototyping of algorithms which depend on the use of search strategies other than those provided by the basic CUF implementation. This work requires us to implement part of the system in Prolog, both for reasons of efficiency and because of the extra expressiveness of Prolog. Once they have proved their worth, we envisage that the new facilities will be added as primitives of future successors of CUF.
3.1 Types and sorts
CUF makes a distinction between types and sorts. Sorts are a form of definite clauses defining relations over feature terms, while types are propositional constraints on feature terms. Sorts are more expressive, but make it harder for the implementation to adopt efficient representations: the type axioms of CUF are a decidable form, for which checking satisfiability at compile-time is possible, while sorts sacrifice this possibility, and a degree of run-time efficiency, in exchange for the extra expressivity. The relative simplicity of types makes it easier to see how they can be pressed into service for statistics, but even for the limited goals of this note we will turn out to need sorts as well.
To a first approximation CUF types can be seen as analogous to the types of a conventional programming language. But another perspective reveals that the types have the full power of propositional logic. Thus it is convenient to formulate constraints about the types in the form of propositional statements about subsumption and disjointness relations holding between particular types. Syntactic sugar makes such statements concise:

head_dtr < sign.
states that a head_dtr is necessarily also a sign, while
head_dtr < sign.
compl_dtr < sign.
states that there are (at least) two types which are signs, namely head_dtr and
compl_dtr.
However, it does not say anything about the relationships between
the subtypes: one might include the other, they might partially overlap or they might be disjoint. This is CUF's "open world assumption". On the other hand:

head_dtr | compl_dtr < sign.

explicitly specifies the disjointness of head_dtr and compl_dtr.
There is further syntactic sugar for the common case that a type is a disjoint enumeration of atomic feature structures (i.e. feature structures which themselves carry no features). In this case we write statements like:

major = {noun; verb; adj; prep; det; adv; part}.

which has the same meaning as

major = noun | verb | adj | prep | det | adv | part.
noun < afs.
verb < afs.
...
part < afs.
(In this specification afs is the predefined built-in type for atomic feature structures.)
We call the members of the disjoint enumeration constants, because they have the behaviour of atomic feature structures which we expect from constant symbols in Prolog or Lisp.

3.2 The simple encoding of tagger facilities
We now turn to the corresponding relationship which exists between the part-of-speech tags and the ambiguity classes. Recall that the lexicon assigns a unique ambiguity class (a set of possible tags) to every word, and that the tagger then picks an element from this set.
Shifting focus, this means that at the end of the
tagging process each word in the input string will be associated not only with a
tag but with an ambiguity class from which this tag was chosen by the tagger.
Just as there could be more than one way to pick a tag for the word "pragmatics" (it could be nns or nn), there is more than one way to get the tag nns (it could arise from any of several ambiguity classes; see below for the details). We therefore base the representation on a set of CUF constants standing for word-occurrences.
Word occurrences are the possible combinations of a specific ambiguity class occurring with a specific tag. These are therefore joint events, with two potentially independent dimensions of variation. All of the possibilities needed for describing the example sentence are actually covered below.

woc = {end0; ppss1; vb2; at3; nn4;
       nps5; vb5; nn5; nns5; np5;
       wdt6; hvz7; ben8; vbd9; vbn9;
       cs10; ql10; nn11; vb11;
       in12; nn13; vbg13; cc14;
       dti15; ql15; rb15; jj16; nn16;
       jj17; nns18; vbg19;
       in20; to20; nn21; nns21}.
We use a consistent naming scheme. For example, nn13 is the type associated with ambiguity class 13 and tag nn.
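Since the naming scheme is systematic, the decomposition of a word-occurrence constant into its two dimensions can be computed mechanically. A small illustration in Python (ours, not part of the CUF encoding):

import re

def split_woc(woc):
    """Split a word-occurrence constant such as 'nn13' into its
    tag and its ambiguity class number."""
    m = re.fullmatch(r"([a-z]+)([0-9]+)", woc)
    if m is None:
        raise ValueError("not a woc constant: " + woc)
    return m.group(1), int(m.group(2))

assert split_woc("nn13") == ("nn", 13)   # tag nn, ambiguity class 13
assert split_woc("ppss1") == ("ppss", 1)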
The type tag is defined by a disjoint union of the possible tags.

tag = end | ppss | vb | at | nn | nns | nps | np | wdt | hvz | ben | vbd |
      vbn | cs | ql | in | vbg | cc | dti | rb | jj | to.
The type ambig is defined by a disjoint union of the possible ambiguity classes.

ambig = ambig0 | ambig1 | ambig2 | ambig3 | ambig4 | ambig5 | ambig6 |
        ambig7 | ambig8 | ambig9 | ambig10 | ambig11 | ambig12 | ambig13 |
        ambig14 | ambig15 | ambig16 | ambig17 | ambig18 | ambig19 |
        ambig20 | ambig21.
Ambiguity classes are sub-sets of woc.

ambig0 = end0.
ambig1 = ppss1.
ambig2 = vb2.
ambig3 = at3.
ambig4 = nn4.
ambig5 = {nps5; vb5; nn5; nns5; np5}.
ambig6 = wdt6.
...

Tags are also subsets of woc, which partition it in a different way.

end = end0.
ppss = ppss1.
vb = {vb2; vb5; vb11}.
at = at3.
nn = {nn4; nn5; nn11; nn13; nn16; nn21}.
nns = {nns5; nns18; nns21}.
nps = nps5.
np = np5.
wdt = wdt6.
hvz = hvz7.
ben = ben8.
vbd = vbd9.
vbn = vbn9.
cs = cs10.
ql = {ql10; ql15}.
in = {in12; in20}.
vbg = {vbg13; vbg19}.
cc = cc14.
dti = dti15.
rb = rb15.
jj = {jj16; jj17}.
to = to20.
We provide a lexicon which associates each word with an ambiguity class. This is a deterministic CUF predicate (CUF uses a functional notation, but once again this is syntactic sugar for an underlying Prolog-style predicate, which can in general be non-deterministic for any or all of its possible calling modes).
tag_lex(string) -> ambig.
tag_lex("we") := ambig1.
tag_lex("describe") := ambig2.
tag_lex("the") := ambig3.
tag_lex("a") := ambig3.
tag_lex("formalism") := ambig4.
tag_lex("phonology") := ambig4.
tag_lex("CUF") := ambig5.
tag_lex("which") := ambig6.
tag_lex("has") := ambig7.
tag_lex("been") := ambig8.
tag_lex("designed") := ambig9.
tag_lex("as") := ambig10.
tag_lex("tool") := ambig11.
tag_lex("use") := ambig11.
tag_lex("for") := ambig12.
tag_lex("of") := ambig12.
tag_lex("from") := ambig12.
tag_lex("writing") := ambig13.
tag_lex("making") := ambig13.
tag_lex("and") := ambig14.
tag_lex("any") := ambig15.
tag_lex("kind") := ambig16.
tag_lex("linguistic") := ambig17.
tag_lex("descriptions") := ambig18.
tag_lex("ranging") := ambig19.
tag_lex("to") := ambig20.
tag_lex("pragmatics") := ambig21.
We provide a predicate which looks up strings of words in this lexicon.

tag_lookup(list) -> list.
tag_lookup([]) := [].
tag_lookup([W|Ws]) := [tag_lex(W) | tag_lookup(Ws)].
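For comparison, the same lexicon and lookup amount to a dictionary and a list comprehension in a conventional language; this Python fragment (ours, showing only part of the lexicon) mirrors tag_lex and tag_lookup:

# A fragment of the lexicon, mirroring tag_lex: word -> ambiguity class.
TAG_LEX = {
    "we": "ambig1", "describe": "ambig2", "the": "ambig3", "a": "ambig3",
    "formalism": "ambig4", "phonology": "ambig4", "CUF": "ambig5",
    "for": "ambig12", "of": "ambig12", "from": "ambig12",
}

def tag_lookup(words):
    """Map a list of words to their ambiguity classes (cf. tag_lookup)."""
    return [TAG_LEX[w] for w in words]

print(tag_lookup(["we", "describe", "the", "formalism", "CUF"]))
# ['ambig1', 'ambig2', 'ambig3', 'ambig4', 'ambig5']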
In the training phase the tagger picks up information from a corpus and records it as information about discrete probability distributions. Specifically, the information which the tagger learns is:

- Information about state transitions. This expresses information about the context in which tags occur. The type mechanism says nothing about this. To make progress with this we will turn to the use of definite clauses in a later section.

- Information about the probabilities that a particular tag will be expressed in the observable input as each of its possible ambiguity classes. The type system says which ambiguity classes are possible, but does no more than this.
If required, we can record numerical information in the grammar. This is our first step towards integrating the functionality of the tagger into CUF. In a later section we will demonstrate how the information recorded during training can be exploited in tandem with CUF's usual processing.
Recall that the tagger is nothing more nor less than an optimization procedure: as far as the tagger is concerned, the input is a sequence of ambiguity classes, the tags are the hidden states, and the elements of woc are therefore joint events relating tags to the ambiguity classes in which they appear in the input sequence. The preference labels which the tagger uses are conditional probabilities attached to these joint events, and we provide them in the form of a predicate over woc (this encoding was suggested to us by Jochen Dörre; a less elegant form, in which the conditional probabilities were attached to the lexicon, appeared in an earlier draft).
gprob(woc) -> number.
gprob(ppss1) := 1.0.

There is only one ambiguity class (ambig1) which can be generated from ppss, but there are three disjoint ambiguity classes which can be generated from vb.

gprob(vb2) := 0.6.
gprob(vb5) := 0.35.
gprob(vb11) := 0.05.
...
gprob(to20) := 1.0.

The particular numbers displayed here are arbitrary. The implementation uses the corresponding numbers from the models provided with the Xerox tagger.
So far, we have encoded only half the information which the tagger collects during the training phase, because we haven't said anything about the transitions between states. To make further use of this information we add definite clauses, using the relevant facilities of CUF. An extract from the predicate (too large to list fully here) follows:

tprob(tag, tag) -> number.
tprob(to, vb) := 0.898110.
tprob(ql, jj) := 0.762018.
tprob(ppss, vb) := 0.618695.
tprob(at, nn) := 0.553838.
tprob(nns, in) := 0.516793.
tprob(jj, nn) := 0.502235.
tprob(hvz, ben) := 0.479057.
tprob(vbd, in) := 0.411837.
...
We can now provide a clean description of how a score is assigned to a tag sequence.

tag_score(list, list) -> number.
tag_score([W], [T]) :=
    gprob(T & tag_lex(W)).
tag_score([W1, W2|Ws], [T1, T2|Ts]) :=
    mult(gprob(T1 & tag_lex(W1)), tprob(T1, T2), tag_score([W2|Ws], [T2|Ts])).
Recall that the sort tag_lex/1, having an implicit output argument (as do all sorts), is similar to the Prolog predicate tag_lex/2, so the idiom T&tag_lex(W) is like saying tag_lex(W,T) in a Prolog program, and gprob(T & tag_lex(W)) would be tag_lex(W,T), gprob(T,Output) in a Prolog program. Putting this together, the first clause would be

tag_score([W], [T], Output) :-
    tag_lex(W, T),
    gprob(T, Output).

in Prolog, losing the compactness of the CUF functional notation.
The second clause uses mult/3, which is in fact an indirect call to Prolog arithmetic. In Prolog syntax the second clause would be:

tag_score([W1, W2|Ws], [T1, T2|Ts], Output) :-
    tag_lex(W1, T1),
    gprob(T1, GProb),
    tprob(T1, T2, TProb),
    tag_score([W2|Ws], [T2|Ts], Output2),
    Output is GProb * TProb * Output2.

This translation is provided only as an aid to comprehension; no assertion of equivalence is intended, in particular because the operational semantics of CUF differs from that of Prolog.
While the CUF is already somewhat more concise than the Prolog, it would be better yet if CUF supported infix operators, allowing:
tag_score([W1, W2|Ws], [T1, T2|Ts]) :=
    gprob(T1 & tag_lex(W1)) * tprob(T1, T2) * tag_score([W2|Ws], [T2|Ts]).
For future use, it actually pays to rewrite tag_score as follows:

tag_score(list) -> number.
tag_score(Words) :=
    tag_score1(tag_lookup(Words)).

tag_score1(list) -> number.
tag_score1([T]) :=
    gprob(T).
tag_score1([T1, T2|Ts]) :=
    mult(gprob(T1), tprob(T1, T2), tag_score1([T2|Ts])).
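The recurrence is perhaps easiest to see outside CUF. The following Python sketch (ours; the probabilities are placeholders, not the trained Xerox numbers) enumerates the tag sequences for a list of ambiguity classes and scores each one with one emission factor per word and one transition factor per adjacent pair, exactly as tag_score1 does:

# Placeholder parameters: gprob as ambiguity class -> {tag: score},
# tprob as (tag, tag) -> score.
GPROB = {
    "ambig1": {"ppss": 1.0},
    "ambig2": {"vb": 0.6, "nn": 0.4},   # an ambiguous class, for illustration
    "ambig3": {"at": 1.0},
}
TPROB = {("ppss", "vb"): 0.618695, ("ppss", "nn"): 0.1,
         ("vb", "at"): 0.3, ("nn", "at"): 0.2}

def tag_score1(classes):
    """Yield (tags, score) for every tag sequence compatible with the
    given ambiguity classes, mirroring the two clauses of tag_score1."""
    first = classes[0]
    if len(classes) == 1:
        for tag, p in GPROB[first].items():
            yield [tag], p
        return
    for t1, p in GPROB[first].items():
        for rest, rest_score in tag_score1(classes[1:]):
            score = p * TPROB.get((t1, rest[0]), 0.0) * rest_score
            yield [t1] + rest, score

for tags, score in tag_score1(["ambig1", "ambig2", "ambig3"]):
    print(tags, score)
# ['ppss', 'vb', 'at'] followed by ['ppss', 'nn', 'at'], each with its score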
Note that gprob/1 has clauses which are defined in terms of joint events like vb2, vb5, ppss1. On entry to gprob the argument is an ambiguity class such as ambig5; gprob then non-deterministically refines the input ambiguity class by adding a specific tag (here the choice is between nps5, vb5, nn5, nns5, and np5). In choosing a tag it also obtains a score for that choice, which is returned as the value of gprob.
This decomposition makes it straightforward to define a second predicate which returns more information than tag_score/1:

score :: val: number;
         trace: list.

tag_trace(list) -> score.
tag_trace(X) :=
    val: tag_score1(T & tag_lookup(X)) &
    trace: T.

The CUF idiom T&tag_lookup(X) is simply a means of giving a name to the output variable of tag_lookup, in order that it can be referred to as the value of the feature trace.
The overall effect is to find a T which is a list of tags that have a score. One way to conceptualize this is to imagine the input to the scoring process as an underspecified list of ambiguity classes which are refined to a list of tags by the process of scoring. This conceptualization, while useful, is slightly over-specific: in point of fact CUF is allowed to evaluate goals in a variety of orders, and may not respect our intuitions about what is input and what output.
At this point we have a simple HMM part-of-speech tagger which produces the same solutions as an implementation in a more conventional language than CUF. In order to make further progress we need to do three related things:

- We must ensure that our system is acceptably efficient. We will not be greatly concerned if the system is slow, since CUF is in any case not tuned for speed, but we do not want to use algorithms whose asymptotic complexity is significantly worse than that of the best known methods.

- We must clarify how the system and its probabilistic mechanisms interact with other components. This is of course crucial to the demonstration that CUF is a flexible and potentially useful technology for parse-ranking.

- We must add a specialized search mechanism in order to find high-probability paths without necessarily enumerating all paths.
Our approach to all these problems is to make extensive use of CUF's facilities for meta-programming, building various tools which manipulate the programs already developed. In a companion paper we will describe the process of developing these tools in some detail, because we think that our experiences may be of interest not only to people concerned with statistical NLP but also to anyone who needs to build similar tools which exploit CUF. But for this paper we will use a broad brush. We skim over many details, including the crucial issue of using meta-programming to ensure that heavily used predicates are well indexed.
3.3 Improving the encoding of tagger facilities

There are several deficiencies in the tagging facilities as presented so far:
- There are exponentially many tag sequences for each input string, so it will hardly ever be appropriate to simply enumerate them.

- We do not have sufficient control over the search strategy employed by the tagger. Standard algorithms for tagging and training rely for their efficiency on the tabulation and reuse of solutions to sub-problems. The current strategy of chronological backtracking can require sub-problems to be solved exponentially many times.
- The evaluation of a syntactic analysis is frequently related to the sum of the probabilities of several tag sequences. Under chronological backtracking no more than one tag-sequence is considered at a time.

The alternative pursued here is to use CUF's Prolog interface to build a more appropriate search engine. The one which we develop does the following things:

- It replaces depth-first search with a breadth-first regime.

- It allows memoisation of the crucial tagging predicates (a sketch of the idea follows below).

The definitive integration of such facilities into CUF would be a large and complex task, which we defer. We aim only for a limited proof of concept.
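Memoisation of this kind is easy to convey outside CUF. A minimal sketch in Python (ours; the toy numbers are invented, and lru_cache stands in for the tabulation which the search engine provides):

from functools import lru_cache

# Toy parameters, invented for illustration.
STATES = ("at", "jj", "nn")
TPROB = {("at", "jj"): 0.4, ("at", "nn"): 0.6, ("jj", "nn"): 0.9}
GPROB = {("jj", "ambig16"): 0.5, ("nn", "ambig16"): 0.5,
         ("nn", "ambig4"): 1.0}
OBS = ("ambig16", "ambig4")     # the observed ambiguity classes

@lru_cache(maxsize=None)
def best_suffix(state, i):
    """Best score of any completion of the path from `state` over OBS[i:].
    The cache tabulates each (state, i) sub-problem so that it is solved
    once; plain chronological backtracking would re-solve it many times."""
    if i == len(OBS):
        return 1.0
    return max(TPROB.get((state, nxt), 0.0)
               * GPROB.get((nxt, OBS[i]), 0.0)
               * best_suffix(nxt, i + 1)
               for nxt in STATES)

print(best_suffix("at", 0))     # 0.4 * 0.5 * 0.9 * 1.0 = 0.18 (up to rounding)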
3.3.1 Breadth-first tagging

The first step towards a more convenient search strategy is to abandon CUF's default depth-first search strategy. We have currently defined the tagging process in terms of a recursive relation between a state and its predecessor, but we now need a slightly different formulation, under which we view the process as
driven by a probabilistic relation between sets of states and their predecessors.
This leads inter alia to a polynomial tagging algorithm which finds the best path. While breadth-first search is a standard technique, it is of interest to show how it can be encoded within CUF.
We begin by choosing a data structure for sets of states. Each stateset is characterized by:

1. A location within the input string.

2. A list of states, each of which is associated with a current tag, some cost information and some trace-back information.
There are going to be several variations on the nature of this data structure, with the cost and trace-back information being a dimension of variation. We therefore leave the CUF representation of this information unspecific. The CUF types for states and statesets are as follows:
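The CUF definitions themselves lie beyond this excerpt. Purely as an illustration of the intended data structure, here is a rough Python rendering (all names ours): a stateset pairs a location in the input string with a list of states, each carrying a current tag, cost information (stored here as a probability-style score) and a trace-back.

from dataclasses import dataclass, field

@dataclass
class State:
    tag: str             # the current tag
    cost: float          # cost information (here a probability-style score)
    trace: tuple = ()    # trace-back information: the tags chosen so far

@dataclass
class StateSet:
    location: int                            # position in the input string
    states: list = field(default_factory=list)

def step(stateset, ambig_class, tprob, gprob, tags):
    """One breadth-first step: extend every state in the set with every
    tag compatible with the next ambiguity class, keeping only the best
    way of reaching each new tag (the usual dynamic-programming move)."""
    nxt = StateSet(location=stateset.location + 1)
    for t2 in tags:
        emit = gprob.get((t2, ambig_class), 0.0)
        if emit == 0.0:
            continue
        best = max((s.cost * tprob.get((s.tag, t2), 0.0) * emit,
                    s.trace + (t2,))
                   for s in stateset.states)
        nxt.states.append(State(tag=t2, cost=best[0], trace=best[1]))
    return nxt

# A two-word example with invented parameters.
TAGS = ("at", "nn")
TPROB = {("start", "at"): 1.0, ("at", "nn"): 0.6}
GPROB = {("at", "ambig3"): 1.0, ("nn", "ambig4"): 1.0}

s = StateSet(location=0, states=[State(tag="start", cost=1.0)])
for c in ["ambig3", "ambig4"]:
    s = step(s, c, TPROB, GPROB, TAGS)
print(s.states[0].trace, s.states[0].cost)   # ('at', 'nn') 0.6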