
RESOLVING AUTOMATIC PREPOSITIONAL PHRASE ATTACHMENTS BY
NON-STATISTICAL MEANS
MICHAEL MANOOKIN
Brigham Young University
Prepositional-phrase attachment is a topic of active research in the field of computational
linguistics. Properly attaching prepositional phrases to their pertinent constituent proves
straightforward for humans, but inferring these attachments in a cognitive modeling
system becomes difficult. For example, in the sentence, ‘Ralph threw the frisbee to
John,’ the prepositional phrase ‘to John’ will attach to the verb phrase ‘threw’. In another
example, ‘Joe saw the dog with fur,’ the prepositional phrase ‘with fur’ will attach
directly to the noun phrase ‘the dog.’ Humans have little difficulty resolving these
examples, but for computers the task is difficult.
The literature is replete with attempts at resolving ambiguities in prepositional-phrase attachment, but the vast majority of these endeavors use purely statistical methods
(Hindle & Rooth 1993). However, statistical approaches are not adequate for modeling
the inference of prepositional-phrase attachments in cognitive modeling
systems, as human cognition is generally not a completely statistical process (Botterill &
Carruthers 1999:191-207).
How, then, can PP attachments be determined in a natural language processing
system based on cognitive modeling? This paper discusses three steps for accomplishing
this task: syntactic modeling, lexicon construction, and semantic modeling. Syntactic
modeling is achieved by establishing a syntactic representation. The second step
involves building a lexicon that contains subcategorization information (i.e. part of
speech, argument structure, etc.). This subcategorization information is then
bootstrapped to infer whether the prepositional phrase should be attached to the preceding
noun phrase or verb phrase. When increasing context shows an analysis untenable, it
can be reanalyzed subject to constraints described in the psycholinguistic literature
(Lewis 1993). Finally, a semantic model is created, which contains concept information
from the lexicon, along with semantic relationships between the concepts.
This paper describes techniques that the authors have used to train NL-Soar to infer
prepositional-phrase attachments during sentence processing. NL-Soar is a cognitive
modeling architecture applied to natural language, which uses WordNet as its lexicon.
WordNet is a machine-readable lexical database with over 100,000 entries, distributed by
Princeton University (WordNet). This lexicon has important subcategorization
information for most of its entries, which is very useful in fashioning an architecture
capable of ‘intelligent’ PP attachment. This paper discusses how the system performs PP
attachment as well as reanalysis in ‘garden path’ sentences.
2. OVERVIEW OF THE SOAR ARCHITECTURE. Newell and Simon presented the first version
of the Soar cognitive modeling architecture in 1982, and Newell later gave a detailed
description of the system in Unified Theories of Cognition (1990). Soar models human
processing, attention, and memory, even down to psychologically viable memory
distinctions between working, declarative, and procedural memory systems. Even so,
Newell decided that language was, at the time, too difficult a task to attempt: ‘Language
should be approached with caution and circumspection. A unified theory of cognition
must deal with it, but I will take it as something to be approached later rather than sooner’
(Newell 1990:16).
NL-Soar, the natural-language implementation of the Soar architecture, was
outlined in Rick Lewis’s (1993) thesis at Carnegie Mellon University, and was
subsequently employed for use in modeling language behavior in several tasks including
those of F-14 pilots in combat situations (Jones et al. 1999). The Soar research group at
BYU presently works on NL-Soar (NL-Soar), and the current (7.3) version of NL-Soar
represents syntactic parses as X-bar syntactic structures and semantic representations as
lexical conceptual structures (LCS).
3. PREVIOUS WORK ON PP-ATTACHMENT. The vast majority of work in prepositional-phrase attachment has been done using statistical approaches to the problem. These
statistical approaches generally involve analyzing large annotated corpora and
determining the probability of an unknown attachment. The Penn Treebank (Penn
Treebank) is an annotated corpus containing tags for part-of-speech along with skeleton
syntactic and semantic parses. Computational linguists commonly use this and other
corpora for training programs, which, in turn, provide a statistical probability for each
potential attachment. For example, for the sentence, ‘I saw the man with the telescope,’ a
statistical parser might predict that the prepositional phrase (PP) ‘with the telescope’ has
an 84% probability of attaching to the verb phrase (VP) ‘saw,’ and a 16% probability
of attaching to the determiner phrase (DP) ‘the man.’
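As a concrete illustration of the statistical approach, the following sketch computes attachment probabilities by simple relative frequency over (head, preposition) counts. The counts here are invented for illustration, and the method is a deliberately minimal stand-in for the lexical-association scores that real systems such as Hindle & Rooth’s estimate from large corpora.

```python
from collections import Counter

# Hypothetical counts of (verb, prep) and (noun, prep) co-occurrences,
# as might be gathered from an annotated corpus such as the Penn Treebank.
verb_prep_counts = Counter({("saw", "with"): 42})
noun_prep_counts = Counter({("man", "with"): 8})

def attachment_probability(verb, noun, prep):
    """Return (P(verb attachment), P(noun attachment)) by simple
    relative frequency -- an illustrative stand-in for a real model."""
    v = verb_prep_counts[(verb, prep)]
    n = noun_prep_counts[(noun, prep)]
    total = v + n
    if total == 0:
        return 0.5, 0.5  # no evidence: back off to an even split
    return v / total, n / total

p_verb, p_noun = attachment_probability("saw", "man", "with")
print(f"P(attach to VP) = {p_verb:.2f}, P(attach to NP) = {p_noun:.2f}")
```

With the invented counts above, the split comes out 0.84 to 0.16, matching the illustrative figures in the text.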
4. ASSUMPTIONS CONCERNING LANGUAGE & COGNITION. Our approach makes two major
assumptions about the nature of human language processing: (1) that the mental lexicon
contains explicit subcategorization information and (2) that humans use this
subcategorization information to prefer one syntactic attachment over another, making
such decisions using logical inference.
4.1. SUBCATEGORIZATION. The first of our major assumptions, that the mental lexicon
contains subcategorization, is based on the widely accepted notion of thematic roles (also
known as semantic roles, theta (θ) roles, etc.). According to Chomsky (1981), (1) verbs
(events) assign thematic roles to nouns (entities), and (2) these theta-role assignments are
predictable. For example, one sense of the transitive verb ‘prove’ assigns (subcategorizes
for) an actor theta role and a goal theta role, whereas a sense of the intransitive verb
‘vanish’ subcategorizes only for an actor theta role as illustrated in examples 1 and 2.
(1) The mathematics professor proved this theorem.
prove(Actor, Goal)
prove(the mathematics professor, this theorem)
(2) The book vanished.
vanish(Actor)
vanish(the book)
The WordNet lexicon applies the concept of subcategorization by assigning one or
more subcategorization ‘frames’ to each verb in the lexicon. Following are the verb
frames that deal with prepositional phrases (PP). Notice that verb frames 15, 16, 17, 18,
19, 27, and 31 subcategorize for particular prepositions.
4. Something is ---ing PP
15. Somebody ---s something to somebody
16. Somebody ---s something from somebody
17. Somebody ---s something with somebody
18. Somebody ---s something of somebody
19. Somebody ---s something on somebody
20. Somebody ---s somebody PP
21. Somebody ---s something PP
22. Somebody ---s PP
27. Somebody ---s to somebody
31. Somebody ---s something with something
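The frame strings above can be mined for the specific prepositions a verb selects. The following is a minimal sketch of that idea; the frame inventory is copied from the list above, but the verb-to-frame mapping is a hand-coded, illustrative stand-in for the actual WordNet database.

```python
# A minimal, hand-coded sketch of WordNet-style verb frames. The frame
# numbers follow the list above; the verb-to-frame mapping is invented
# for illustration, not extracted from the actual WordNet database.
FRAMES = {
    15: "Somebody ---s something to somebody",
    16: "Somebody ---s something from somebody",
    18: "Somebody ---s something of somebody",
    20: "Somebody ---s somebody PP",
    21: "Somebody ---s something PP",
}

VERB_FRAMES = {
    "accuse": [18, 20],
    "read": [21],
    "give": [15],
}

def subcategorized_preps(verb):
    """Collect the specific prepositions a verb's frames select for."""
    preps = set()
    for fid in VERB_FRAMES.get(verb, []):
        words = FRAMES[fid].split()
        # In frames 15-19 the fourth token is the selected preposition;
        # four-token frames like 20 and 21 license a generic PP instead.
        if len(words) == 5:
            preps.add(words[3])
    return preps

print(subcategorized_preps("accuse"))  # frame 18 selects 'of'
```

Frames that merely say ‘PP’ (such as 20 and 21) contribute no specific preposition, which is exactly the distinction the attachment inference below exploits.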
This type of information is valuable for inferring syntactic attachments. For example, the
verb ‘read’ subcategorizes for two complements as in the sentence, ‘The linguist reads
novels with those glasses.’ In WordNet, ‘read’ is annotated with verb frame number 21
(and a few others), which licenses a prepositional phrase as the second complement. The
verb ‘enjoy,’ on the other hand, subcategorizes for only one complement, which is
illustrated in the sentence, ‘The linguist enjoys novels with illustrations.’ Examples 3
and 4 show the two sentences just mentioned, their potential argument structures, and the
argument structure representations for each sentence.
(3) The linguist reads novels with those glasses.
reads(NP, NP, PP)
reads(the linguist, novels) & with(reads, those glasses)
(4) The linguist enjoys novels with illustrations.
enjoys(NP, NP)
enjoys(the linguist, novels) & with(novels, illustrations)
These argument structures are realized syntactically as thematic-role assignment, and
the contrasting syntactic structures are reflected in Figure 1.
Figure 1. Two contrasting syntactic parses with an N-attached PP (left) and a V-attached PP (right).
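The inference illustrated in examples 3 and 4 can be sketched as a simple decision rule: attach the PP to the verb when the verb’s frames license a PP complement (either the specific preposition or a generic PP slot), and to the preceding noun phrase otherwise. The frame data below is hypothetical, and the flat rule is of course a simplification of NL-Soar’s actual procedure.

```python
# A sketch of subcategorization-based PP attachment. The mapping from
# verbs to licensed prepositions is invented for illustration; "PP"
# stands for a generic prepositional-phrase slot.
VERB_PP_FRAMES = {
    "reads": {"PP"},      # e.g. WordNet-style 'Somebody ---s something PP'
    "enjoys": set(),      # one complement only: no PP slot
}

def attach(verb, prep):
    """Return 'VP' if the verb licenses the PP as a complement,
    otherwise 'NP' (adjoin to the preceding noun phrase)."""
    licensed = VERB_PP_FRAMES.get(verb, set())
    if prep in licensed or "PP" in licensed:
        return "VP"       # e.g. 'reads novels with those glasses'
    return "NP"           # e.g. 'enjoys novels with illustrations'

print(attach("reads", "with"))   # VP attachment
print(attach("enjoys", "with"))  # NP attachment
```

Defaulting to NP attachment when the verb is unknown is one reasonable policy among several; a fuller system would consult the noun’s properties as well.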
4.2. MENTATION AND INFERENCE. This approach also assumes that humans determine
syntactic attachments by rational inference, not statistically. This is certainly the more
controversial of the two claims, as it revisits the mentalism-connectionism debate; despite
recent trends in linguistics toward non-rule approaches to language, our
method sides with the mentalists. We take this theoretical stance for several reasons. (1)
The mentalist theory of cognition accounts for known causal relationships between belief,
desire, and semantics; non-rule approaches cannot. (2) Connectionism and other non-rule approaches are not entirely cognitively plausible. (3) Non-rule-based approaches do
not process at realistic rates to simulate the time course of human cognitive processing.
(4) Many non-rule-based approaches require a psychologically unrealistic amount of
training. (5) Connectionist approaches have difficulty accounting for cognitive
adaptation to a dynamic environment. (6) Non-rule-based approaches cannot simulate
sequential mental states, such as those required to bring about psychological affect. (7)
Non-rule approaches cannot handle psychological reanalysis, such as belief reanalysis
and syntactic reanalysis. I will briefly address cognitive plausibility, environmental
adaptability, sequential mental states, and psychological reanalysis.
4.2.1. COGNITIVE PLAUSIBILITY. Connectionist theory is based on the general notion that
nerve cells interconnect in a network or group of networks, and information is distributed
across these networks (Botterill & Carruthers 1999:197). Numerous neuroscience studies
illustrate, however, that cognitive processing in humans and other mammals is generally
local (not distributed) in nature (Corkin 1984; Hilts 1995; to name a few). The Parallel
Distributed Processing (PDP) theory, which inspired connectionism and neural network
approaches to cognition, is based on the notion that mental faculties are distributed across
a relatively large area (McClelland, et al. 1986); in other words, the theory contains no
arguments for cognitive modularity.
4.2.2. ENVIRONMENTAL ADAPTABILITY. An individual must possess mental representation
and cognitive structure to adapt to a constantly changing environment. ‘To get around
in the world, a cognizer must keep track of enduring individuals that have changing,
repeatable properties and relations. Doing this requires that mental predicates be applied
to mental subjects, and it requires the capacity to apply predicates to subjects on a vast
scale’ (Horgan & Tienson 1996:10-11). Put differently, ‘Humans (and other intelligent
creatures) need to collect, retain, update, and reason from a vast array of information …
There seems no way of making sense of this capacity except by supposing that it is
subserved by a system of compositionally structured representational states’ (Botterill &
Carruthers 1999:196). Environmental adaptability is a central tenet of cognitive
psychology, as human behavior depends on the ability of an individual to represent the
world (Chomsky 1959) and to revise those mental representations through reanalysis
(Peirce 1877).
The mentalist approach accounts for mental representation as a ‘language of thought’
(LoT) composed of mental propositions and rule-governed transitions between those
propositions. Such an LoT is vital to the field of cognitive modeling in general and, more
specific to this paper, the field of natural language modeling. In his seminal work,
Newell (1990) outlines how an artificial intelligence ‘agent’ can represent mental states
and move between those states.
Newell (1990:383) appeals to the Johnson-Laird theory (1983), which claims that
mental representation of a concrete situation takes place by means of syllogisms, as seen
below in a classic example.
(5) Socrates is a man.
∃x[Socrates(x) ∧ man(x)]
(6) All men are mortal.
∀y[man(y) ⊃ mortal(y)]
According to this paradigm, when an individual reads a syllogism s/he ‘constructs an
internal model of a concrete situation that the premises assert’ (Newell 1990:383).
Examples 5 and 6 contain two premises: the minor premise (5) and the major premise (6).
Several mental states are required for comprehension of how the major and minor
premises relate: (1) the human or AI unit must have a goal of understanding the
relationship between (5) and (6); and (2) once this goal state is realized, then subgoals are
used to learn how the constituents of premise (5) relate to the elements of premise (6).
These subgoals are described in the following section, SEQUENTIAL MENTAL STATES.
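The goal-driven progression over premises can be sketched as a tiny forward-chaining loop over predicate-argument facts. The representation below is an invented simplification, not Soar’s actual state format: from the facts of examples 5 and 6 and the rule man(y) ⊃ mortal(y), the agent derives mortal(s).

```python
# Forward chaining over simple (predicate, argument) facts. The fact
# and rule encodings are invented for illustration.
facts = {("Socrates", "s"), ("man", "s")}
rules = [(("man",), "mortal")]  # antecedent predicates -> consequent

def forward_chain(facts, rules):
    """Repeatedly apply rules until no new facts can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for antecedents, consequent in rules:
            for pred, arg in list(derived):
                if pred in antecedents and (consequent, arg) not in derived:
                    derived.add((consequent, arg))
                    changed = True
    return derived

print(("mortal", "s") in forward_chain(facts, rules))  # True
```

The loop corresponds, very loosely, to the subgoal structure described above: each rule application is a transition from one mental state to a richer one.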
4.2.3. SEQUENTIAL MENTAL STATES. A central requirement for a cognitive modeling
system is the ability to simulate sequential mental states. Many cognitive psychologists
and philosophers argue that cognition is goal-directed and presupposes a logical
progression between mental representations. The Soar architecture, as already
mentioned, represents states using syllogistic logic, so it can denote the conditions of and
associations between mental states. In the NL-Soar system morphology, syntax, and
semantics are represented as separate but connected mental states. In fact, NL-Soar maps
from the syntactic representation/state to a semantic representation/state, as illustrated in
the following example. I exclude the full syntactic parse in example 8 because of length
considerations.
(7) The linguist enjoys novels with illustrations.
(8) …[VP[V'[V[Venjoys]][NP[N'[Nnovels]]][PP[P'[Pwith][Nillustrations]…]]]]]…
(9) enjoys(the linguist, novels) & with(novels, illustrations)
NL-Soar uses logic operators to map between the syntax (8) and the semantics (9).
Representing the syntactic and semantic states syllogistically and categorically allows
NL-Soar to denote the transitions between those states.
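The syntax-to-semantics mapping of examples 8 and 9 can be sketched as a rule that walks a simplified parse and emits predicate-argument pairs. The nested-tuple parse format below is an invented simplification of NL-Soar’s X-bar structures, not its actual representation.

```python
# Map a simplified parse to a semantic (predicate-argument) form, in the
# spirit of examples 8 and 9. The nested-tuple format is invented.
parse_tree = ("VP", "enjoys",
              ("NP", "the linguist"),             # subject
              ("NP", "novels",
               ("PP", "with", "illustrations")))  # N-attached PP

def semantics(tree):
    """Emit the predicate-argument reading of a (VP, verb, subj, obj) tuple."""
    _, verb, subj, obj = tree
    props = [f"{verb}({subj[1]}, {obj[1]})"]
    if len(obj) > 2:                   # a PP adjoined to the object NP
        _, prep, np = obj[2]
        props.append(f"{prep}({obj[1]}, {np})")
    return " & ".join(props)

print(semantics(parse_tree))
```

For the tree above this prints `enjoys(the linguist, novels) & with(novels, illustrations)`, matching example 9.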
On the other hand, non-rule theories such as connectionism can represent separate
mental states, but cannot signify the transitions between them. Examples 8 and 9
demonstrate why. A neural network would represent the syntactic (8) and semantic (9)
representations as different patterns of nodal activation; and it should, as these are
distinct premises. But any mapping between these two distinct representations would be
purely accidental, as there is no intrinsic association between the syntax and semantics in
a non-rule system.
4.2.4. PSYCHOLOGICAL REANALYSIS. Generally, real-world premises are not clear-cut,
and, because of this, humans frequently reanalyze situations when a more complete
representation of the situation becomes available. Charles Sanders Peirce’s essay The
Fixation of Belief (1877) maintains that psychological reanalysis must proceed through
three basic states: (1) previous belief (stored in memory), (2) doubt cast upon state (1),
and (3) reanalysis of state (1) according to the new information in state (2) to arrive at a
new belief state. Peirce’s radical break from the long-held Cartesian view that decision
processes must start with belief gave birth to the field of pragmatism and inspired
psychologists and philosophers such as William James (especially in his classic essay The
Will to Believe), Chauncey Wright, John Dewey, and Josiah Royce.
Clark & Clark (1977) and other psycholinguistic researchers have established the
validity of psychological reanalysis in language. Rick Lewis (1993) outlines many of
these research studies and the ability of NL-Soar to deal with ambiguities through
reanalysis, especially with respect to garden-path sentences.
Psychological reanalysis is conceptually similar to environmental adaptability and
sequential mental states: because Soar represents these states as syllogisms, when new
information casts doubt on a previous belief state, Soar can use the new information to
reanalyze that state and generate an entirely new belief state.
Once again, since connectionist representation cannot intrinsically relate one logical
state to another (because there are no logical states), any reanalysis that might occur is
the product of pure chance; this is a problem for connectionist, nearest-neighbor, and
analogical modeling approaches.
The following example illustrates the process of syntactic reanalysis in NL-Soar.
NL-Soar parses the sentence, ‘The magistrate accuses the terrorists from downtown of
treason.’
NL-Soar parses ‘the’ and lexical access (from WordNet) returns ‘the’ annotated as a
determiner. The procedure continues with ‘magistrate,’ which returns from lexical access
annotated as a singular noun (morphology takes place by a separate process, which we will
not describe in this paper). With two lexical items and their categories, the agent must
decide how the items relate syntactically. It then draws upon phrase-structure rules,
encoded in the system, to determine the possible syntactic relations between determiners
and nouns, and the corresponding structure is built. Under X-bar syntactic theory, ‘the
magistrate’ is constructed under a noun phrase (NP) headed by the noun ‘magistrate,’ as
illustrated in the following diagram.
(10)
[NP[N'[detthe] [Nmagistrate]]]
After this NP is successfully built, NL-Soar waits for the next word, ‘accuses.’
WordNet stores ‘accuses’ unambiguously as a verb (the lemma being ‘accuse’), so ‘accuses’
is annotated as a verb (accuses.v). With this much information, the system builds a VP
for ‘accuses.’
(11)
[VP[V'[V[VPRESENTi][Vaccuses]]]]
This VP is then linked to the preceding NP under an IP node (and a CP node).
(12)
[CP[C'[IP[NP[N'[detthe][Nmagistrate]]][I'[Ii][VP[V'[V[VPRESENTi][Vaccuses]]]]]]]]
When lexical access occurs for ‘accuses,’ two of the verb frames that return from
WordNet are frames 18 and 20.
18. Somebody ---s something of somebody
20. Somebody ---s somebody PP
So, ‘accuses’ is annotated as a verb with two complements—the first complement will be
a noun phrase and the second complement a prepositional phrase headed by the
preposition ‘of’ or another preposition. After the structure in example 12 is built, and a
brief wait period, ‘the’ and then ‘terrorists’ are parsed into a noun phrase, similarly to ‘the
magistrate,’ and linked in as the first complement of ‘accuses,’ as shown in examples 13 and 14.
(13)
[NP[N'[detthe] [Nterrorists]]]
(14)
…[VP[V'[V[Vaccuses]][NP[N'[detthe][Nterrorists]]]]]…
Following ‘the terrorists,’ NL-Soar parses the preposition ‘from,’ which fits into the
general preposition slot in the second complement position. Since such a syntactic link is
acceptable, the link succeeds and the corresponding structure is built.
(15) …[VP[V'[V[Vaccuses]][NP[N'[detthe][Nterrorists]]] [PP[P'[Pfrom][Ndowntown]]]]]…
Notice that this construction is incorrect, but it is built at this point because the
subcategorization allows for it. Fortunately, the Soar system, as already described, is
capable of reanalysis, and this is precisely what happens when the next prepositional
phrase ‘of treason’ is parsed.
When ‘of treason’ enters working memory, as with all of the other words in the
sentence, a rule (operator) is proposed to learn what to do with this new phrase (‘of
treason’). There are two possible syntactic decisions at this point: (1) adjoin ‘of treason’
to the N' governing ‘downtown’ or (2) link ‘of treason’ into the second complement slot
of ‘accuses.’ In this situation, NL-Soar prefers the second choice (link this prepositional
phrase into the second complement slot of the verb) because, as already mentioned,
‘accuses’ specifically subcategorizes for the preposition ‘of’ but not ‘from.’ In order for
this to occur, the previous linkage between ‘from’ and ‘accuses’ must be ‘snipped’
and the syntactic structure remade: ‘from downtown’ becomes an adjunct of
‘terrorists’ and ‘of treason’ becomes the second complement of ‘accuses.’
(16)…[VP[V'[V[Vaccuses]][NP[N'[…[detthe][Nterrorists]]]][PP[P'[…[Pof][Ntreason]]]]]]…
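The snip-and-relink step just described can be sketched as follows. The flat attachment dictionary and the slot names are invented stand-ins for NL-Soar’s actual X-bar structures; the trigger is a verb frame that subcategorizes for the incoming preposition while the currently linked PP matched only the generic PP slot.

```python
# A sketch of 'snip and relink' reanalysis. The parse is a flat dict of
# attachments; slot names like 'accuses.comp2' are invented.
parse = {"accuses.comp2": "from downtown"}   # initial (incorrect) linkage
ACCUSES_SPECIFIC_PREPS = {"of"}              # from WordNet-style frame 18

def reanalyze(parse, incoming_pp):
    """If the incoming PP matches a specifically subcategorized
    preposition, snip the generic PP out of the complement slot,
    relink it as an NP adjunct, and install the new PP."""
    prep = incoming_pp.split()[0]
    if prep in ACCUSES_SPECIFIC_PREPS and "accuses.comp2" in parse:
        old_pp = parse.pop("accuses.comp2")       # snip the old link
        parse["terrorists.adjunct"] = old_pp      # relink as NP adjunct
        parse["accuses.comp2"] = incoming_pp      # new second complement
    return parse

reanalyze(parse, "of treason")
print(parse)
```

After the call, ‘from downtown’ sits under ‘terrorists’ and ‘of treason’ fills the second complement slot, mirroring example 16.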
As already mentioned, Soar learns in order to accomplish goals, and models working
memory. Figure 2 below illustrates the working memory processing that occurred in
comprehending the sentence, ‘The magistrate accuses the terrorists from downtown of
treason.’ The x-axis shows the time course for processing the sentence, and the y-axis
represents the number of active items in working memory. The peaks on the graph
correspond to syntactic linking of constituents into the tree, while the troughs are periods
when NL-Soar waits for the next word to enter the phonological buffer. Peak A reflects
the point at which the syntactic reanalysis takes place. Notice that this is the highest
point on the graph, meaning that working memory is taxed maximally at this point.
Figure 2. Working memory processing in NL-Soar.
The type of information in Figure 2 has been verified as hippocampal ‘population
spikes’ in research on rats, mice, and macaque monkeys. These ‘population spikes’ look
quite similar to what is observed in Figure 2. This type of pattern cannot be verified in
human working memory (generally, the medial-temporal lobe structures), however, as
this type of imaging is only available through insertion of electrodes, a practice that is
not considered ethical for human subjects.
Figure 3 reflects the learning that NL-Soar utilizes in parsing the same sentence as
the previous figure. The x-axis is the time course of processing, and the y-axis represents
the use of previously learned items in parsing the sentence. Once again, point A on
Figure 3 is the position at which the syntactic reanalysis occurs. Point Z (the highest
point on the graph), on the other hand, represents linking of ‘the terrorists’ into the
syntactic tree. The spike is high at this point because NL-Soar has previously learned
how to link in the noun phrase ‘the magistrate’ and this learning is utilized at this point,
which makes the process of linking ‘the terrorists’ faster than any other syntactic linkage.
Figure 3. Learning in NL-Soar.
4.3. THE PLACE OF NON-RULE THEORIES IN LANGUAGE PROCESSING. This is not to say that
these approaches (connectionism, nearest-neighbor, analogical modeling, and other non-rule theories) have no value in a cognitive theory or architecture. David Marr (1982)
described three possible levels of cognitive representation. The top level concerns itself
with reallocation of attention between mental processes. The middle level represents the
actual mental states (premises) and their transitions, which I have described in some
detail already. The lowest tier of representation physically implements state transitions.
He (and Botterill & Carruthers 1999:197) suggests that non-rule theories might be useful
at the lowest level, but should not be applied any higher.
Non-rule models might be implemented in Soar at this lower level by, for example,
using them to decide between two (or more) equally preferred rules. Another possible
application would be to use analogical modeling for morphological processing. In fact,
an interesting experiment would be to compare analogical modeling (a non-rule
approach) and finite state modeling (a rule-based approach) for morphological processing
in NL-Soar.
6. CONCLUSION AND FUTURE WORK. Resolution of prepositional-phrase attachment is still
an open issue in natural language processing. This paper has illustrated the usefulness of
using a cognitive modeling system that utilizes subcategorization information in order to
infer attachments. Using NL-Soar to model language comprehension and generation is a
step in the right direction to understanding how humans process language. The method
outlined in this paper—using subcategorization to infer syntactic prepositional phrase
attachment—is useful for deciding other types of syntactic attachment such as
complementizers, infinitivals, etc.
7. REFERENCES.
Botterill, George & Peter Carruthers. 1999. The philosophy of psychology. Cambridge
University Press: Cambridge, UK.
Chomsky, Noam. 1959. [Review of Skinner’s Verbal Behavior.] Language 35:26-58.
---. 1981. Lectures on government and binding. Foris: Dordrecht.
Clark, Herbert & Eve Clark. 1977. The psychology of language: An introduction to
psycholinguistics. Harcourt Brace Jovanovich: New York.
Corkin, Suzanne. 1984. Lasting consequences of bilateral medial temporal lobectomy:
Clinical course and experimental findings in H.M. Seminars in Neurology 4:249-259.
Hilts, Philip. 1995. Memory’s ghost: The strange tale of Mr. M. and the nature of
memory. Simon and Schuster: New York.
Hindle, Donald & Mats Rooth. 1993. Structural ambiguity and lexical relations.
Computational Linguistics 19(1):103-120.
Horgan, Terence & John Tienson. 1996. Connectionism and the philosophy of
psychology. MIT Press: Cambridge, MA.
Johnson-Laird, Philip. 1983. Mental models. Harvard University Press: Cambridge,
MA.
Jones, Randolph, John Laird, Paul Nielsen, Karen Coulter, Patrick Kenney & Frank Koss.
1999. Automated intelligent pilots for combat flight simulation. AI Magazine 20:27-41.
Lewis, Richard. 1993. An architecturally-based theory of human sentence
comprehension. PhD thesis, Carnegie Mellon University, School of Computer Science.
Marr, David. 1982. Vision. MIT Press: Cambridge, MA.
McClelland, James, David Rumelhart & the PDP Research Group. 1986. Parallel
distributed processing. MIT Press: Cambridge, MA.
Newell, Allen. 1990. Unified theories of cognition. Harvard University Press:
Cambridge, MA.
NL-Soar. http://linguistics.byu.edu/nlsoar/. (Accessed on 1 Sep. 2003).
Peirce, Charles. 1877. The fixation of belief. Popular Science Monthly 12:1-15.
Penn Treebank. http://www.cis.upenn.edu/~treebank/home.html. (Accessed on 1 Sep.
2003).
WordNet. http://www.cogsci.princeton.edu/~wn/. (Accessed on 1 Sep. 2003).