PSYCH 150 / LIN 155: Psychology of Language
Prof. Jon Sprouse, syn lab, UCI Cognitive Sciences
02.26.13: Introduction to Sentences

Sentence processing is an interesting puzzle...

The boy appeared to the girl to be nice. (“He looks kind.”)
The boy appealed to the girl to be nice. (“Please be kind.”)

Why is sentence processing so interesting?

[Diagram: three levels of processing]
sounds (sensory, phonetic): Extracting information from the signal
words (lexical, conceptual): Matching the signal to stored representations
sentences (syntactic): Composing a new object from smaller objects, e.g. [S John [VP bought [NP a car]]]

How do we know it is about composition (aka computation)?

There are only two choices: composition/computation or storage. And we can make a pretty compelling argument that it can’t be storage: namely, that the number of sentences in a given language is infinite. Since the storage space in brains is finite, it can’t be the case that we store an infinite number of sentences!

To see this infinity in action, we can ask three questions:

Question 1: Can you understand sentences you’ve never heard?
Question 2: How many sentences are there in the English language?
Question 3: What is the longest sentence in English?

Infinity 1: novel sentences

Question 1: Can you understand sentences you’ve never heard?

George Washington flew to Atlantis on his winged horse and devoured the redcoats.
The Tyrannosaurus told the leprechaun to follow him down to the center of the Earth.

The fact that you can say and understand sentences that you’ve never heard before is one reason to believe that sentences are not memorized.

Compare this to things that we know are memorized, like lexemes: Can you understand words that you’ve never heard before? What does subjacency mean?

Infinity 2: infinitely many sentences

Question 2: How many sentences are there in the English language?
We’ve already seen that we can create sentences that you’ve never heard before... and we can also easily prove that the number of sentences in English is infinite:

John bought one car.
John bought two cars.
John bought three cars.
John bought four cars.
John bought five cars.
John bought six cars.
...

We know that there are infinitely many numbers, so this simple manipulation shows that the number of sentences in English must also be infinite.

Infinity 3: infinitely long sentences

Question 3: What is the longest sentence in English?

Just like there is no limit to the number of sentences in English, there is also no limit to the length of sentences in English:

John bought a car.
Mary thinks that John bought a car.
Susan said that Mary thinks that John bought a car.
Bill hopes that Susan said that Mary thinks that John bought a car.
Clare claimed that Bill hopes that Susan said that Mary thinks that John bought a car.

Of course, there is an upper limit to this in practice: at a certain point, it becomes impossible to remember all of the people in the sentence. But that is a limit on your memory, not a limit on the length of sentences in English.

A theory of the construction of sentences

Some definitions

word string: A sequence of words.
sentence: A sequence of words that is well-formed in a given language. (This is what our theory needs to define!)

[Diagram labels: word strings, sentences]

Word strings vs sentences: they feel very different to you!

The boy appeared to the girl to be nice.
* The the boy appeared to the girl to be nice.

Why can “the” appear in the first position, but not a second time right after it? (The asterisk means this string of words does not form a sentence.)

The little boy appeared to the girl to be nice.

Why can “little” appear here when “the” can’t?

So what we want to do is build a theory of sentence construction that only builds possible sentences, and explains why it is that the impossible sentences are impossible.
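The embedding pattern from the "infinitely long sentences" examples (Mary thinks that..., Susan said that...) can be sketched as a small program. This is only an illustration, not part of the original slides: the function name `embed` and the choice to cycle through the four subject/verb pairs from the slide are assumptions made for the sketch. It shows how a finite rule can produce sentences of any length.

```python
def embed(depth):
    """Build a sentence with `depth` levels of clausal embedding.

    The subject/verb pairs are taken from the slide's examples; cycling
    through them past depth 4 is an assumption made for this sketch.
    """
    frames = [("Mary", "thinks"), ("Susan", "said"),
              ("Bill", "hopes"), ("Clare", "claimed")]
    sentence = "John bought a car"
    for level in range(depth):
        subject, verb = frames[level % len(frames)]
        # Each step wraps the previous sentence inside a new clause.
        sentence = f"{subject} {verb} that {sentence}"
    return sentence + "."


for depth in range(3):
    print(embed(depth))
```

Each level of embedding adds three words, so nothing in the rule itself sets a maximum length; any limit comes from the memory of whoever is reading or listening.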
The Semantic Soup Approach

They “don’t make sense”

One gut reaction many speakers have to word strings that aren’t sentences is that they “don’t make sense”. We can try to convert that intuition into a theory of sentence construction as follows:

The simplest possible theory of sentence construction: Put words together into a string. If the meanings of the words can combine to form a larger meaning that “makes sense”, then it is a sentence. If the word meanings can’t be combined in a way that “makes sense”, it is not a sentence.

This captures the distinction between these two word strings very nicely:

The cat is sleeping.
* The the the sleeps coughs.

This is called the Semantic Soup theory

The Semantic Soup theory: Put words together into a string. If the meanings of the words can combine to form a larger meaning that “makes sense”, then it is a sentence. If the word meanings can’t be combined in a way that “makes sense”, it is not a sentence.

Notice that the semantic soup theory says nothing about the order of the words. This means that the order of the words shouldn’t matter:

John ate soup.
* ate soup John.

What did John eat?
* eat did what John

But it is clear that the order of the words does matter: the same set of words can either form a sentence or a word string based on the order!

Another problem for semantic soup

The order of words not only impacts the status of a string as a sentence or not, it also impacts the meaning of the sentence!

John ate soup.
Soup ate John.
(different order, different meaning)

Not even 10 years ago you could buy a house for less than $100,000. (yes, less than $100K)
Not even 10 years ago could you buy a house for less than $100,000. (no, not less than $100K)
(different order, different meaning)

[Timeline diagrams for each sentence: 10 years ago ... today]

The Linear Order Approach

Semantic Soup vs Linear Order

Clearly the order of the words is an important part of the construction of sentences.
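The contrast between the two views can be made concrete with a short sketch. This is an illustration only, not the slides' own formalism: modeling Semantic Soup as an unordered bag of words (`collections.Counter`) and Linear Order as a plain ordered list is an assumption, and the helper names `soup` and `linear` are hypothetical.

```python
from collections import Counter


def soup(text):
    """Semantic Soup view (assumed model): an unordered bag of words."""
    return Counter(text.lower().rstrip(".?").split())


def linear(text):
    """Linear Order view (assumed model): the same words as an ordered list."""
    return text.lower().rstrip(".?").split()


a = "John ate soup."
b = "Soup ate John."

same_soup = soup(a) == soup(b)        # the bags of words are identical
same_order = linear(a) == linear(b)   # the ordered sequences differ
print(same_soup, same_order)  # True False
```

Under the bag-of-words model the two strings are indistinguishable, so any theory that sees only the bag cannot assign them different meanings or different statuses.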
We can use brackets to indicate that a group of words forms a set, and numbered subscripts to indicate the order of the words:

[Not_1 even_2 10_3 years_4 ago_5 you_6 could_7 buy_8 a_9 house_10 for_11 less_12 than_13 $100,000_14]
[Not_1 even_2 10_3 years_4 ago_5 could_6 you_7 buy_8 a_9 house_10 for_11 less_12 than_13 $100,000_14]

Semantic Soup: The construction of a sentence is simply mashing together a group of words.
Linear Order: The construction of a sentence is stringing together a group of words in a specific order.

So how can we define the specific order necessary for sentences?

One possibility is to look at the transitions between each word in a sentence. For sentences, each transition will be possible. For word strings, at least one transition will be impossible.

Did the boy eat the soup?
* Soup the the eat boy did

The transitions that we are looking at here are bigrams: they are the transition between two words.

Unfortunately, bigrams don’t work. This word string is composed of possible transitions, as you can see by the possible sentences that contain each of the transitions:

* Eat John what did?
[eat John]: Did he eat John?
[John what]: You gave John what?
[what did]: What did you do?

If bigrams don’t work, what about trigrams?

bigrams: look at the transition between one word and the next (two words)
trigrams: look at the transition between two words and the next (three words)

Let’s compare bigrams and trigrams for our word string and see what happens.

* Eat John what did?

bigrams        possibility     trigrams           possibility
[eat John]     possible        [eat John what]    impossible
[John what]    possible        [John what did]    impossible
[what did]     possible

Because at least one trigram is impossible, our trigram-based linear order theory can correctly predict that this is not a sentence.

Are there any word strings that trigrams can’t explain?

Trigrams are working well for us so far, but can we come up with a word string that is made up completely of possible trigrams?
Here is a sentence:

The guy who said he was great wouldn’t listen to anyone.

Here are the trigrams it is made up of:

[The guy who] [guy who said] [who said he] [said he was] [he was great]
[was great wouldn’t] [great wouldn’t listen] [wouldn’t listen to] [listen to anyone]

Each of these trigrams is possible because this is a sentence.

Here is another sentence:

The guy who said he was great wouldn’t listen to anyone who didn’t think he was great.

Here are the trigrams it is made up of:

[The guy who] [guy who said] [who said he] [said he was] [he was great]
[was great wouldn’t] [great wouldn’t listen] [wouldn’t listen to] [listen to anyone]
[to anyone who] [anyone who didn’t] [who didn’t think] [didn’t think he] [think he was]

This sentence adds 5 trigrams to the list of possible trigrams. One trigram, [he was great], is used twice!

Here is a word string:

* The guy who said he was great wouldn’t listen to anyone who didn’t think he was great wouldn’t listen to anyone.

It is made up of the exact same trigrams as the previous sentence!

[The guy who] [guy who said] [who said he] [said he was] [he was great]
[was great wouldn’t] [great wouldn’t listen] [wouldn’t listen to] [listen to anyone]
[to anyone who] [anyone who didn’t] [who didn’t think] [didn’t think he] [think he was]

These trigrams are used twice: [he was great] [was great wouldn’t] [great wouldn’t listen] [wouldn’t listen to] [listen to anyone]

We can prove that, in general, any N-Gram will fail...

These strings show that repeated material is problematic for any n-gram approach:

The guy who said he was great wouldn’t listen to anyone who didn’t think he was great.
* The guy who said he was great wouldn’t listen to anyone who didn’t think he was great wouldn’t listen to anyone.

The “n” can be replaced by any number! Although the strings above can be solved with a 7-gram approach, in principle you can extend these to any arbitrary number, proving that n-grams will never work. For example, these strings require a 9-gram:

The guy who said he was great at basketball wouldn’t listen to anyone who didn’t think he was great at basketball.
* The guy who said he was great at basketball wouldn’t listen to anyone who didn’t think he was great at basketball wouldn’t listen to anyone.

Engineering solutions vs. Neuroscience solutions

Google Translate uses an n-gram approach to syntax to translate between pairs of languages. To see the limitations of this approach, simply do the following:

1. Type in an English sentence (or multiple sentences).
2. Translate the sentence into a foreign language.
3. Copy and paste the translation back into the translate box.
4. Translate the translation back into English.
5. Laugh.

As a famous linguist once said: “It’s nice that they get most of it right, but even if they reach 90% accuracy, that’s not enough. If my child was only 90% accurate at speaking, I would take him to the doctor!”

In other words, there is a big difference between what counts as a good program and what counts as a good theory of the brain.

Where we stand: neither theory works

Semantic Soup: The construction of a sentence is simply mashing together a group of words.
This can’t distinguish strings and sentences that use the same words.

Linear Order: The construction of a sentence is stringing together a group of words in a specific order.
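The n-gram argument against the Linear Order theory can be checked mechanically. The sketch below is a minimal illustration and not part of the original slides: `tokens`, `ngrams`, and `all_attested` are hypothetical helper names, and "possible" transitions are approximated by the n-grams found in the few example sentences from the slides (an assumption; a real model would need a large corpus).

```python
def tokens(text):
    """Lowercase a string and split it into words, dropping . and ?"""
    return text.lower().replace(".", "").replace("?", "").split()


def ngrams(words, n):
    """Return every window of n consecutive words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def all_attested(string, sentences, n):
    """True if every n-gram of `string` occurs in at least one sentence."""
    attested = set()
    for s in sentences:
        attested.update(ngrams(tokens(s), n))
    return all(g in attested for g in ngrams(tokens(string), n))


# Assumption: these three sentences stand in for "possible" transitions.
examples = ["Did he eat John?", "You gave John what?", "What did you do?"]
bigram_ok = all_attested("Eat John what did?", examples, 2)
trigram_ok = all_attested("Eat John what did?", examples, 3)

good = ("The guy who said he was great wouldn't listen to anyone "
        "who didn't think he was great.")
bad = good.rstrip(".") + " wouldn't listen to anyone."
repeat_ok = all_attested(bad, [good], 3)

print(bigram_ok, trigram_ok, repeat_ok)  # True False True
```

The last value is True even though the longer string is not a sentence: every one of its trigrams already occurs in the grammatical sentence, so a trigram check has no way to reject it.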
The Linear Order theory can’t distinguish strings and sentences that use the same n-grams.

The Hierarchical Structure Approach

A new theory

Semantic Soup: The construction of a sentence is simply mashing together a group of words.
This can’t distinguish strings and sentences that use the same words.

Linear Order: The construction of a sentence is stringing together a group of words in a specific order.
This can’t distinguish strings and sentences that use the same n-grams.

Hierarchical Structure: The construction of a sentence is arranging groups of words in a hierarchical structure.
This is the theory we will develop from now on.

A taste of what “structure” means

Repeated n-grams are a problem for the linear order approach because there is no way to distinguish them from each other. If we could say that the repeated sequences were different, then we could explain why it is that some repetitions form sentences, but others do not:

[subject The guy who said he was great] [predicate wouldn’t listen to anyone who didn’t think he was great].

* [subject The guy who said he was great] [predicate wouldn’t listen to anyone who didn’t think he was great] [what is this??? wouldn’t listen to anyone].

Our game plan

What we need is a theory that:

1. Divides sentences up into smaller components
2. Labels those smaller components so that we can treat them differently
3. Tells us how those smaller components are related to each other

This is called a theory of Grammar, but it is also simply an answer to the question: How are sentences constructed?

Once we have an answer to that question (a theory of grammar), we can construct a theory of the processes that construct sentences during real-time comprehension.

grammatical: A sentence is grammatical inside a specific theory of grammar if it can be constructed by that grammar.

Step 1: Identify the smaller components of a sentence

The smallest component of a sentence is a word.
The first thing we can do is look at the properties of words to see if we can distinguish different categories of words.

Since our goal is to explain the distribution of words in a sentence, the first place to look for categories is at the distribution of words (syntax):

The ___ existed.

Fits: dog, homework, idea, island
Does not fit: *eat, *sleep, *the, *of, *quickly

All of the words that can fit in this position are the same syntactic category, which in this case we call nouns. The words that can’t fit in this position are a different syntactic category, i.e., not nouns.

Distribution tests for syntactic category

Nouns: The ___ existed.
Verbs: The cat will ___.
Prepositions: It died right ___ here.
Adjectives: They are very ___.
Adverbs (manner): She coughed ___.
Adverbs (sentential): ___, you are a liar.
Determiners: He wrote ___ other work(s).
Complementizers: I know ___ John is a liar.

We can define a set of syntactic categories by defining a set of distribution frames, and asking which words fit in which frames.

Syntactic Categories must be part of lexical entries!

Up until now, we’ve defined lexemes as items stored in the lexicon, consisting of a pair of representations: a phonetic representation and a semantic representation. Now we need to add a third piece of information to the entry: syntactic category.

lexeme:
Phonetic representation: [k æ t]
Semantic (meaning) representation: [figure]
Syntactic Category: noun

Step 1: Identify the smaller components of sentences

The hierarchical structure approach to sentences claims that there are units that are larger than words, but smaller than the full sentence.

We call the sub-units that make up the whole object constituents.

We call the constituents that are larger than words but smaller than the sentence phrases.
There are a number of tests we can use to figure out if a string of words forms a constituent/phrase or not:

Substitution test: If a string of words can be replaced by a single word, then it is a constituent/phrase
Ellipsis test: If a string of words can be replaced by silence, then it is a constituent/phrase
Movement test: If a string of words can be displaced, then it is a constituent/phrase

Constituency Tests

Substitution test: If a string of words can be replaced by a single word, then it is a constituent

[That bottle of water] might have cracked open.
[It] might have cracked open. (constituent)

[That [bottle of water]] might have cracked open.
[That [one]] might have cracked open. (constituent)

We call these replacements pro-forms (as in “pronoun”).

The limitation of the substitution test is that there are very few pro-forms. There are pronouns like the ones above, and at least one pro-verb: do so

[That [bottle of water]] [cracked open and spilled].
[That [bottle of water]] [did so] too. (did so replaces [might have cracked open]; constituent)

Constituency Tests

Ellipsis test: If a string of words can be replaced by silence, then it is a constituent/phrase

This bottle of wine didn’t crack open, but...
[That [bottle of water]] might have [cracked open].
[That [bottle of water]] might have ____________. (constituent)

It’s unlikely that this bottle of wine will have cracked open, but...
[That [bottle of water]] might [have [cracked open]].
[That [bottle of water]] might ________________. (constituent)

Notice that ellipsis generally requires context. This is one of many additional constraints on ellipsis that need to be taken into account during the test. In general, successful ellipsis indicates constituency, but unsuccessful ellipsis does not indicate non-constituency because the ellipsis could simply be violating one of the additional constraints.
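The bracket notation used in the substitution and ellipsis examples can be modeled as nested lists, which makes the two manipulations explicit. This is a rough sketch under stated assumptions, not the slides' own formalism: the nested-list encoding and the helper names `words` and `replace_at` are hypothetical, and the code only performs the replacement. Whether the result is a good sentence still has to be judged by a speaker.

```python
from typing import List, Union

# Assumed encoding: a word is a str, a bracketed group is a list.
Tree = Union[str, List["Tree"]]


def words(tree: Tree) -> List[str]:
    """Flatten a bracketed structure back into its word string."""
    if isinstance(tree, str):
        return [tree]
    out: List[str] = []
    for child in tree:
        out.extend(words(child))
    return out


def replace_at(tree: Tree, path: List[int], new: List[str]) -> Tree:
    """Copy `tree`, replacing the group at `path` with the words in `new`.

    An empty `new` models ellipsis (replacement by silence).
    """
    if not path:
        return list(new)
    if isinstance(tree, str):
        raise ValueError("path goes below a single word")
    head, rest = path[0], path[1:]
    return [replace_at(child, rest, new) if i == head else child
            for i, child in enumerate(tree)]


# [[That [bottle of water]] might have [cracked open]]
sentence = [["That", ["bottle", "of", "water"]],
            "might", "have", ["cracked", "open"]]

print(" ".join(words(replace_at(sentence, [0], ["It"]))))
print(" ".join(words(replace_at(sentence, [0, 1], ["one"]))))
print(" ".join(words(replace_at(sentence, [3], []))))
```

The three calls reproduce the slide's "It might have cracked open", "That one might have cracked open", and the elided "That bottle of water might have". Because `replace_at` can only target a bracketed group, a string of words that does not correspond to one group cannot be replaced in a single step, which mirrors the logic of the tests.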
Constituency Tests

Movement test: If a string of words can be displaced, then it is a constituent/phrase

There are actually quite a few different types of movement tests, simply because there are several different ways that you can displace a string of words. We can’t cover them all, so I will just give you two examples to illustrate the use of movement as a diagnostic for constituency:

Topicalization:
I don’t really care for most artwork, but...
I really like [that painting of the sky].
[That painting of the sky], I really like.

Clefting:
I don’t know about you, but...
I really like [that painting of the sky].
It is [that painting of the sky] that I really like.

Just like ellipsis, there are many additional constraints on movement that need to be taken into account during the test. So failure of movement tests should not be taken as evidence of non-constituency.

Building a hierarchical theory

What we need is a theory that:

1. Divides sentences up into smaller components
   syntactic category
   constituency tests (to identify phrases)
2. Labels those smaller components so that we can treat them differently
3. Tells us how those smaller components are related to each other

This is as far as we got today...