RESOLVING AUTOMATIC PREPOSITIONAL PHRASE ATTACHMENTS BY NON-STATISTICAL MEANS

MICHAEL MANOOKIN
Brigham Young University

Prepositional-phrase attachment is a topic of active research in the field of computational linguistics. Properly attaching a prepositional phrase to its pertinent constituent is straightforward for humans, but inferring these attachments in a cognitive modeling system is difficult. For example, in the sentence 'Ralph threw the frisbee to John,' the prepositional phrase 'to John' attaches to the verb phrase headed by 'threw.' In another example, 'Joe saw the dog with fur,' the prepositional phrase 'with fur' attaches directly to the noun phrase 'the dog.' Humans resolve these examples with little difficulty, but for computers the task is hard. The literature is replete with attempts at resolving ambiguities in prepositional-phrase attachment, but the vast majority of these endeavors use purely statistical methods (Hindle & Rooth 1993). However, statistical approaches are neither appropriate nor adequate for inferring prepositional-phrase attachments in cognitive modeling systems, as human cognition is generally not a completely statistical process (Botterill & Carruthers 1999:191-207). How, then, can PP attachments be determined in a natural language processing system based on cognitive modeling? This paper discusses three steps for accomplishing this task: syntactic modeling, lexicon construction, and semantic modeling. Syntactic modeling is achieved by establishing a syntactic representation. The second step involves building a lexicon that contains subcategorization information (i.e. part of speech, argument structure, etc.). This subcategorization information is then bootstrapped to infer whether a prepositional phrase should attach to the preceding noun phrase or verb phrase.
When increasing context shows an analysis to be untenable, it can be reanalyzed subject to constraints described in the psycholinguistic literature (Lewis 1993). Finally, a semantic model is created, which contains concept information from the lexicon along with semantic relationships between the concepts. This paper describes techniques that the authors have used to train NL-Soar to infer prepositional-phrase attachments during sentence processing. NL-Soar is a cognitive modeling architecture applied to natural language, which uses WordNet as its lexicon. WordNet is a machine-readable lexical database with over 100,000 entries, distributed by Princeton University (WordNet). This lexicon carries important subcategorization information for most of its entries, which is very useful in fashioning an architecture capable of 'intelligent' PP attachment. This paper discusses how the system performs PP attachment as well as reanalysis in 'garden path' sentences.

2. OVERVIEW OF THE SOAR ARCHITECTURE. Newell and Simon presented the first version of the Soar cognitive modeling architecture in 1982, and Newell later gave a detailed description of the system in Unified Theories of Cognition (1990). Soar models human processing, attention, and memory, down to psychologically viable distinctions between working, declarative, and procedural memory systems. Even so, Newell decided that language was, at the time, too difficult a task to attempt: 'Language should be approached with caution and circumspection. A unified theory of cognition must deal with it, but I will take it as something to be approached later rather than sooner' (Newell 1990:16). NL-Soar, the natural-language implementation of the Soar architecture, was outlined in Rick Lewis's (1993) thesis at Carnegie Mellon University and was subsequently used to model language behavior in several tasks, including those of F-14 pilots in combat situations (Jones et al. 1999).
The Soar research group at BYU presently works on NL-Soar (NL-Soar), and the current (7.3) version of NL-Soar represents syntactic parses as X-bar structures and semantic representations as lexical conceptual structures (LCS).

3. PREVIOUS WORK ON PP-ATTACHMENT. The vast majority of work in prepositional-phrase attachment has been done using statistical approaches to the problem. These approaches generally involve analyzing large annotated corpora and determining the probability of an unknown attachment. The Penn Treebank (Penn Treebank) is an annotated corpus containing part-of-speech tags along with skeleton syntactic and semantic parses. Computational linguists commonly use this and other corpora to train programs, which, in turn, provide a statistical probability for each potential attachment. For example, for the sentence 'I saw the man with the telescope,' a statistical parser might assign the prepositional phrase (PP) 'with the telescope' an 84% probability of attaching to the verb phrase (VP) 'saw' and a 16% probability of attaching to the determiner phrase (DP) 'the man.'

4. ASSUMPTIONS CONCERNING LANGUAGE & COGNITION. Our approach makes two major assumptions about the nature of human language processing: (1) that the mental lexicon contains explicit subcategorization information and (2) that humans use this subcategorization information to prefer one syntactic attachment over another, making such decisions by logical inference.

4.1. SUBCATEGORIZATION. The first of our major assumptions, that the mental lexicon contains subcategorization information, is based on the widely accepted notion of thematic roles (also known as semantic roles, theta (θ) roles, etc.). According to Chomsky (1981), (1) verbs (events) assign thematic roles to nouns (entities), and (2) these theta-role assignments are predictable.
For example, one sense of the transitive verb 'prove' assigns (subcategorizes for) an actor theta role and a goal theta role, whereas a sense of the intransitive verb 'vanish' subcategorizes only for an actor theta role, as illustrated in examples 1 and 2.

(1) The mathematics professor proved this theorem.
    prove(Actor, Goal)
    prove(the mathematics professor, this theorem)

(2) The book vanished.
    vanish(Actor)
    vanish(the book)

The WordNet lexicon applies the concept of subcategorization by assigning one or more subcategorization 'frames' to each verb in the lexicon. Following are the verb frames that deal with prepositional phrases (PP). Notice that verb frames 15, 16, 17, 18, 19, 27, and 31 subcategorize for particular prepositions.

4. Something is ---ing PP
15. Somebody ---s something to somebody
16. Somebody ---s something from somebody
17. Somebody ---s something with somebody
18. Somebody ---s something of somebody
19. Somebody ---s something on somebody
20. Somebody ---s somebody PP
21. Somebody ---s something PP
22. Somebody ---s PP
27. Somebody ---s to somebody
31. Somebody ---s something with something

This type of information is valuable for inferring syntactic attachments. For example, the verb 'read' subcategorizes for two complements, as in the sentence 'The linguist reads novels with those glasses.' In WordNet, for instance, 'entice' is annotated with verb frame 20 (among others), which requires a prepositional phrase as its second complement. The verb 'enjoy,' on the other hand, subcategorizes for only one complement, as illustrated in the sentence 'The linguist enjoys novels with illustrations.' Examples 3 and 4 show these two sentences, their potential argument structures, and their resolved argument-structure representations.

(3) The linguist reads novels with those glasses.
    reads(NP, NP, PP)
    reads(the linguist, novels) & with(reads, those glasses)

(4) The linguist enjoys novels with illustrations.
    enjoys(NP, NP)
    enjoys(the linguist, novels) & with(novels, illustrations)

These argument structures are realized syntactically as thematic-role assignment, and the contrasting syntactic structures are reflected in Figure 1.

Figure 1. Two contrasting syntactic parses with an N-attached PP (left) and a V-attached PP (right).

4.2. MENTATION AND INFERENCE. This approach also assumes that humans determine syntactic attachments by rational inference, not statistically. This is certainly the more controversial of the two claims, as it revisits the mentalism-connectionism debate; despite recent trends in linguistics toward non-rule approaches to language, our method sides with the mentalists. We take this theoretical stance for several reasons. (1) The mentalist theory of cognition accounts for known causal relationships between belief, desire, and semantics; non-rule approaches cannot. (2) Connectionism and other non-rule approaches are not entirely cognitively plausible. (3) Non-rule-based approaches do not process at realistic rates to simulate the time course of human cognitive processing. (4) Many non-rule-based approaches require a psychologically unrealistic amount of training. (5) Connectionist approaches have difficulty accounting for cognitive adaptation to a dynamic environment. (6) Non-rule-based approaches cannot simulate sequential mental states, such as those required to bring about psychological affect. (7) Non-rule approaches cannot handle psychological reanalysis, such as belief reanalysis and syntactic reanalysis. I will briefly address cognitive plausibility, environmental adaptability, sequential mental states, and psychological reanalysis.

4.2.1. COGNITIVE PLAUSIBILITY. Connectionist theory is based on the general notion that nerve cells interconnect in a network or group of networks and that information is distributed across these networks (Botterill & Carruthers 1999:197).
Numerous neuroscience studies illustrate, however, that cognitive processing in humans and other mammals is generally local (not distributed) in nature (Corkin 1984; Hilts 1995; to name a few). The Parallel Distributed Processing (PDP) theory, which inspired connectionist and neural network approaches to cognition, is based on the notion that mental faculties are distributed across a relatively large area (McClelland et al. 1986); in other words, the theory contains no arguments for cognitive modularity.

4.2.2. ENVIRONMENTAL ADAPTABILITY. An individual must possess mental representation and cognitive structure to adapt to a constantly changing environment. 'To get around in the world, a cognizer must keep track of enduring individuals that have changing, repeatable properties and relations. Doing this requires that mental predicates be applied to mental subjects, and it requires the capacity to apply predicates to subjects on a vast scale' (Horgan & Tienson 1996:10-11). Put differently, 'Humans (and other intelligent creatures) need to collect, retain, update, and reason from a vast array of information … There seems no way of making sense of this capacity except by supposing that it is subserved by a system of compositionally structured representational states' (Botterill & Carruthers 1999:196). Environmental adaptability is a central tenet of cognitive psychology, as human behavior depends on the ability of an individual to represent the world (Chomsky 1959) and to revise those mental representations through reanalysis (Peirce 1877). The mentalist approach accounts for mental representation as a 'language of thought' (LoT) comprising mental propositions and rule-governed transitions between those propositions. Such an LoT is vital to the field of cognitive modeling in general and, more specifically to this paper, to the field of natural language modeling.
In his seminal work, Newell (1990) outlines how an artificial intelligence 'agent' can represent mental states and move between those states. Newell (1990:383) appeals to the theory of Johnson-Laird (1983), which claims that mental representation of a concrete situation takes place by means of syllogisms, as seen in the classic example below.

(5) Socrates is a man.
    ∃x[Socrates(x) ∧ man(x)]

(6) All men are mortal.
    ∀y[man(y) ⊃ mortal(y)]

According to this paradigm, when an individual reads a syllogism s/he 'constructs an internal model of a concrete situation that the premises assert' (Newell 1990:383). The example contains two premises: the minor premise (5) and the major premise (6). Several mental states are required for comprehension of how the two premises relate: (1) the human or AI agent must have a goal of understanding the relationship between (5) and (6); and (2) once this goal state is realized, subgoals are used to learn how the constituents of premise (5) relate to the elements of premise (6). These subgoals are described in the following section, SEQUENTIAL MENTAL STATES.

4.2.3. SEQUENTIAL MENTAL STATES. A central requirement for a cognitive modeling system is the ability to simulate sequential mental states. Many cognitive psychologists and philosophers argue that cognition is goal-directed and presupposes a logical progression between mental representations. The Soar architecture, as already mentioned, represents states using syllogistic logic, so it can denote the conditions of and associations between mental states. In the NL-Soar system, morphology, syntax, and semantics are represented as separate but connected mental states. In fact, NL-Soar maps from the syntactic representation/state to a semantic representation/state, as illustrated in the following example. I exclude the full syntactic parse in example 8 because of length considerations.

(7) The linguist enjoys novels with illustrations.
(8) …[VP[V'[V[Venjoys]][NP[N'[Nnovels][PP[P'[Pwith][Nillustrations]]]]]]]…

(9) enjoys(the linguist, novels) & with(novels, illustrations)

NL-Soar uses logic operators to map between the syntax in 8 and the semantics in 9. Representing the syntactic and semantic states syllogistically and categorically allows NL-Soar to denote the transitions between those states. Non-rule theories such as connectionism, on the other hand, can represent separate mental states but cannot signify the transitions between them. Examples 8 and 9 demonstrate why. A neural network would represent the syntactic (8) and semantic (9) representations as different patterns of nodal activation, and it should, as these are distinct premises. But any mapping between these two distinct representations would be purely accidental, as there is no intrinsic association between the syntax and semantics in a non-rule system.

4.2.4. PSYCHOLOGICAL REANALYSIS. Generally, real-world premises are not clear-cut, and, because of this, humans frequently reanalyze situations when a more complete representation of the situation becomes available. Charles Sanders Peirce's essay The Fixation of Belief (1877) maintains that psychological reanalysis must proceed through three basic states: (1) previous belief (stored in memory), (2) doubt cast upon state (1), and (3) reanalysis of state (1) according to the new information in state (2) to arrive at a new belief state. Peirce's radical break from the long-held Cartesian view that decision processes must start with belief gave birth to the field of pragmatism and inspired psychologists and philosophers such as William James (especially in his classic essay The Will to Believe), Chauncey Wright, John Dewey, and Josiah Royce. Clark & Clark (1977) and other psycholinguistic researchers have established the validity of psychological reanalysis in language.
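The kind of rule-governed syntax-to-semantics transition described in 4.2.3 can be sketched in a few lines of code. This is a minimal, hypothetical illustration, not NL-Soar's actual implementation: the dictionary-based tree encoding and the function name are inventions for expository purposes, standing in for the system's X-bar structures and logic operators.

```python
# A minimal, hypothetical sketch of a rule-governed mapping from a
# syntactic state to a semantic (argument-structure) state, in the
# spirit of examples 7-9. The tree encoding and names are illustrative
# inventions, not NL-Soar's actual data structures.

def semantics(subject, vp):
    """Map a subject and a VP tree to predicate-argument formulas."""
    verb = vp["verb"]
    obj_head = vp["object"]["head"]
    pp = vp["object"].get("pp")
    formulas = [f"{verb}({subject}, {obj_head})"]
    # An N-attached PP contributes a relation on the object noun,
    # as in with(novels, illustrations) in example 9.
    if pp:
        formulas.append(f"{pp['prep']}({obj_head}, {pp['object']})")
    return " & ".join(formulas)

# (7) The linguist enjoys novels with illustrations.
vp = {"verb": "enjoys",
      "object": {"head": "novels",
                 "pp": {"prep": "with", "object": "illustrations"}}}

print(semantics("the linguist", vp))
# enjoys(the linguist, novels) & with(novels, illustrations)
```

Because the transition is an explicit rule over structured states, the same mapping applies to any parse of the same shape; this determinism is exactly what the argument above claims a pattern-of-activation representation cannot guarantee.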
Lewis (1993) outlines many of these research studies and the ability of NL-Soar to deal with ambiguities through reanalysis, especially with respect to garden-path sentences. Psychological reanalysis is conceptually similar to environmental adaptability and sequential mental states: because Soar represents these states as syllogisms, when new information casts doubt on a previous belief state, Soar can use this new information to reanalyze the previous belief state accordingly and generate an entirely new belief state. And, once again, since connectionist representation cannot intrinsically relate one logical state to another (because there are no logical states), any reanalysis that might occur is the product of absolute chance; this is a problem for connectionist, nearest-neighbor, and analogical modeling approaches. The following example illustrates the process of syntactic reanalysis in NL-Soar. NL-Soar parses the sentence 'The magistrate accuses the terrorists from downtown of treason.' NL-Soar parses 'the,' and lexical access (from WordNet) returns 'the' annotated as a determiner. The procedure continues with 'magistrate,' which returns from lexical access annotated as a singular noun (morphology takes place by a separate process, which we will not describe in this paper). With two lexical items and their categories, the agent must decide how the items relate syntactically. It then draws upon phrase-structure rules, encoded in the system, to determine the possible syntactic relations between determiners and nouns, and the corresponding structure is built. Under X-bar syntactic theory, 'the magistrate' is constructed as a noun phrase (NP) headed by the noun 'magistrate,' as illustrated in the following diagram.

(10) [NP[N'[detthe][Nmagistrate]]]

After this NP is successfully built, NL-Soar waits for the next word, 'accuses.' WordNet stores 'accuses' unambiguously as a verb (the lemma being 'accuse'), so 'accuses' is annotated as a verb (accuses.v).
With this much information, the system builds a VP for 'accuses.'

(11) [VP[V'[V[VPRESENTi][Vaccuses]]]]

This VP is then linked to the preceding NP under an IP node (and a CP node).

(12) [CP[C'[IP[NP[N'[detthe][Nmagistrate]]][I'[Ii][VP[V'[V[VPRESENTi][Vaccuses]]]]]]]]

When lexical access occurs for 'accuses,' two of the verb frames that return from WordNet are frames 18 and 20.

18. Somebody ---s something of somebody
20. Somebody ---s somebody PP

So, 'accuses' is annotated as a verb with two complements: the first complement will be a noun phrase and the second complement a prepositional phrase headed by the preposition 'of' or another preposition. After the structure in example 12 is built, and after a brief wait period, 'the' and then 'terrorists' are parsed into a noun phrase, similarly to 'the magistrate,' and linked in as the first complement of 'accuses.'

(13) [NP[N'[detthe][Nterrorists]]]

(14) …[VP[V'[V[Vaccuses]][NP[N'[detthe][Nterrorists]]]]]…

Following 'the terrorists,' NL-Soar parses the preposition 'from,' which fits into the general preposition slot in the second complement position. Since such a syntactic link is acceptable, the link succeeds and the corresponding structure is built.

(15) …[VP[V'[V[Vaccuses]][NP[N'[detthe][Nterrorists]]][PP[P'[Pfrom][Ndowntown]]]]]…

Notice that this construction is incorrect, but it is permitted at this point because the subcategorization allows it. Fortunately, the Soar system, as already described, is capable of reanalysis, and this is precisely what happens when the next prepositional phrase, 'of treason,' is parsed. When 'of treason' enters working memory, as with all of the other words in the sentence, a rule (operator) is proposed to learn what to do with this new phrase.
There are two possible syntactic decisions at this point: (1) adjoin 'of treason' to the N' governing 'downtown' or (2) link 'of treason' into the second complement slot of 'accuses.' In this situation, NL-Soar prefers the second choice (link the prepositional phrase into the second complement slot of the verb) because, as already mentioned, 'accuses' specifically subcategorizes for the preposition 'of' but not for 'from.' In order for this to occur, the previous linkage between 'from' and 'accuses' must be 'snipped' and the syntactic structure rebuilt: 'from downtown' becomes an adjunct of 'terrorists' and 'of treason' becomes the second complement of 'accuses.'

(16) …[VP[V'[V[Vaccuses]][NP[N'[N'[detthe][Nterrorists]][PP[P'[Pfrom][Ndowntown]]]]][PP[P'[Pof][Ntreason]]]]]…

As already mentioned, Soar learns in order to accomplish goals, and it models working memory. Figure 2 below illustrates the working-memory processing that occurred in comprehending the sentence 'The magistrate accuses the terrorists from downtown of treason.' The x-axis shows the time course for processing the sentence, and the y-axis represents the number of active items in working memory. The peaks on the graph correspond to syntactic linking of constituents into the tree, while the troughs are periods when NL-Soar waits for the next word to enter the phonological buffer. Peak A reflects the point at which the syntactic reanalysis takes place. Notice that this is the highest point on the graph, meaning that working memory is taxed maximally at this point.

Figure 2. Working memory processing in NL-Soar.

The type of pattern in Figure 2 closely resembles the hippocampal 'population spikes' observed in research on rats, mice, and macaque monkeys.
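The attachment preference and the snip-and-relink reanalysis just walked through can be sketched schematically. The mini-lexicon and function names below are illustrative assumptions, not NL-Soar code: a real system would draw the subcategorization frames from WordNet rather than a hand-built dictionary, and would operate over X-bar structures rather than strings.

```python
# A schematic sketch of the attachment-and-reanalysis procedure for
# 'The magistrate accuses the terrorists from downtown of treason.'
# LEXICON and attach_pps are hypothetical stand-ins: frame 18 of
# WordNet gives 'accuse' a second complement headed by 'of', and
# frame 20 licenses a general PP there.
LEXICON = {"accuses": {"specific_preps": {"of"}, "general_pp": True}}

def attach_pps(verb, pps):
    """Attach each incoming PP, snipping and relinking on reanalysis."""
    entry = LEXICON[verb]
    complement = None          # second complement slot of the verb
    adjuncts = []              # PPs demoted to adjuncts of the object NP
    for pp in pps:
        prep = pp.split()[0]
        if complement is None and entry["general_pp"]:
            complement = pp    # provisional link into the complement slot
        elif prep in entry["specific_preps"]:
            # Reanalysis: 'snip' the provisional link, demote the old PP
            # to an NP adjunct, and link the new PP as the complement.
            adjuncts.append(complement)
            complement = pp
        else:
            adjuncts.append(pp)
    return complement, adjuncts

comp, adj = attach_pps("accuses", ["from downtown", "of treason"])
print(comp)   # of treason
print(adj)    # ['from downtown']
```

As in the prose above, 'from downtown' is provisionally linked into the complement slot, then snipped and demoted when 'of treason' arrives, because the lexicon specifically licenses 'of' in that position.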
Such a pattern cannot be directly verified in human working memory (generally, the medial-temporal-lobe structures), however, as this type of recording is available only through the insertion of electrodes, a practice that is not considered ethical in humans. Figure 3 reflects the learning that NL-Soar utilizes in parsing the same sentence as in the previous figure. The x-axis is the time course of processing, and the y-axis represents the use of previously learned items in parsing the sentence. Once again, point A on Figure 3 is the position at which the syntactic reanalysis occurs. Point Z (the highest point on the graph), on the other hand, represents the linking of 'the terrorists' into the syntactic tree. The spike is high at this point because NL-Soar has previously learned how to link in the noun phrase 'the magistrate,' and this learning is reused here, which makes the process of linking 'the terrorists' faster than any other syntactic linkage.

Figure 3. Learning in NL-Soar.

4.3. THE PLACE OF NON-RULE THEORIES IN LANGUAGE PROCESSING. This is not to say that these approaches (connectionism, nearest neighbor, analogical modeling, and other non-rule theories) have no value in a cognitive theory/architecture. David Marr (1982) described three possible levels of cognitive representation. The top level concerns itself with reallocation of attention between mental processes. The middle level represents the actual mental states (premises) and their transitions, which I have described in some detail already. The lowest tier of representation physically implements state transitions. Marr (along with Botterill & Carruthers 1999:197) suggests that non-rule theories might be useful at the lowest level but should not be applied any higher. Non-rule models might be implemented in Soar at this lower level by, for example, using them to decide between two (or more) equally preferred rules.
Another possible application would be to use analogical modeling for morphological processing. In fact, an interesting experiment would be to compare analogical modeling (a non-rule approach) and finite-state modeling (a rule-based approach) for morphological processing in NL-Soar.

6. CONCLUSION AND FUTURE WORK. Resolution of prepositional-phrase attachment is still an open issue in natural language processing. This paper has illustrated how a cognitive modeling system can use subcategorization information to infer attachments. Using NL-Soar to model language comprehension and generation is a step in the right direction toward understanding how humans process language. The method outlined in this paper, using subcategorization to infer syntactic prepositional-phrase attachment, should also prove useful for deciding other types of syntactic attachment, such as complementizers, infinitivals, etc.

7. REFERENCES.

Botterill, George & Peter Carruthers. 1999. The philosophy of psychology. Cambridge University Press: Cambridge, UK.
Chomsky, Noam. 1959. [Review of Skinner's Verbal Behavior.] Language 35:26-58.
---. 1981. Lectures on government and binding. Foris: Dordrecht.
Clark, Herbert & Eve Clark. 1977. The psychology of language: An introduction to psycholinguistics. Harcourt Brace Jovanovich: New York.
Corkin, Suzanne. 1984. Lasting consequences of bilateral medial temporal lobectomy: Clinical course and experimental findings in H.M. Seminars in Neurology 4:249-259.
Hilts, Philip. 1995. Memory's ghost: The strange tale of Mr. M. and the nature of memory. Simon and Schuster: New York.
Hindle, Donald & Mats Rooth. 1993. Structural ambiguity and lexical relations. Computational Linguistics 19(1):103-120.
Horgan, Terence & John Tienson. 1996. Connectionism and the philosophy of psychology. MIT Press: Cambridge, MA.
Johnson-Laird, Philip. 1983. Mental models. Harvard University Press: Cambridge, MA.
Jones, Randolph, John Laird, Paul Nielsen, Karen Coulter, Patrick Kenney & Frank Koss. 1999. Automated intelligent pilots for combat flight simulation. AI Magazine 20:27-41.
Lewis, Richard. 1993. An architecturally-based theory of human sentence comprehension. PhD thesis, Carnegie Mellon University, School of Computer Science.
Marr, David. 1982. Vision. MIT Press: Cambridge, MA.
McClelland, James, David Rumelhart & the PDP Research Group. 1986. Parallel distributed processing. MIT Press: Cambridge, MA.
Newell, Allen. 1990. Unified theories of cognition. Harvard University Press: Cambridge, MA.
NL-Soar. http://linguistics.byu.edu/nlsoar/. (Accessed on 1 Sep. 2003).
Peirce, Charles. 1877. The fixation of belief. Popular Science Monthly 12:1-15.
Penn Treebank. http://www.cis.upenn.edu/~treebank/home.html. (Accessed on 1 Sep. 2003).
WordNet. http://www.cogsci.princeton.edu/~wn/. (Accessed on 1 Sep. 2003).