Capturing patterns of linguistic interaction in a parsed corpus
A methodological case study

Sean Wallis
Survey of English Usage
University College London
[email protected]

Capturing linguistic interaction...
• Parsed corpus linguistics
• Intra-structural priming
• Experiments
  – Attributive AJPs before a noun
  – Embedded postmodifying clauses
  – Sequential postmodifying clauses
  – Speech vs. writing
• Conclusions
• The handout explains the analytical method in more detail (so read it later!)

Parsed corpus linguistics
• An example tree from ICE-GB (spoken): S1A-006 #23

Parsed corpus linguistics
• Three kinds of evidence may be obtained from a parsed corpus
  – Frequency evidence of a particular known rule, structure or linguistic event
  – Coverage evidence of new rules, etc.
  – Interaction evidence of the relationship between rules, structures and events
• This evidence is necessarily framed within a particular grammatical scheme
  – How might we evaluate this grammar?

Intra-structural priming
• Priming effects within a structure
  – Study repeating an additive step in structures
• Consider
  – a phrase or clause that may (in principle) be extended ad infinitum
    • e.g. an NP with a noun head (N)
  – a single additive step applied to this structure
    • e.g. add an attributive AJP before the head
  – Q. What is the effect of repeatedly applying this operation to the structure?
    • N: ship
    • AJP N: tall ship
    • AJP AJP N: tall very green ship
    • AJP AJP AJP N: tall very green old ship

Experiment 1: analysis of results
• Sequential probability analysis
  – calculate probability of adding each AJP
  – error bars: Wilson intervals
  – probability falls
    • second < first
    • third < second
  – decisions interact
  – Every AJP added makes it harder to add another
[Figure: probability of adding each AJP; y axis: probability, 0.00 to 0.20; x axis: 0 to 5]

Experiment 1: explanations?
• Feedback loop: for each successive AJP, it is more difficult to add a further AJP
  – logical-semantic constraints
    • tend to say the tall green ship
    • do not tend to say tall short ship or green tall ship
  – communicative economy
    • once the speaker has said tall green ship, they tend to say only ship
  – memory/processing constraints
    • unlikely: this is a small structure, as are AJPs

Experiment 1: speech vs. writing
• Spoken vs. written subcorpora
  – Same overall pattern
  – Spoken data tends to have fewer attributive AJPs
    • Support for communicative economy or memory/processing hypotheses?
  – Significance tests
    • Paired 2x1 Wilson tests (Wallis 2011)
    • first and second observed spoken probabilities are significantly smaller than written
[Figure: y axis: probability, 0.00 to 0.25; x axis: 0 to 5; series: written, spoken]

Experiment 2: preverbal AVPs
• Consider adverb phrases before a verb
  – Results very different
    • Probability does not fall significantly between first and second AVP
    • Probability does fall between second and third AVP
  – Possible constraints
    • (weak) communicative
    • (weak) semantic
  – Further investigation needed
[Figure: y axis: probability, 0.00 to 0.10; x axis: 0 to 4]

Experiment 3: postmodifying clauses
• Another way to specify nouns in English
  – add clause after noun to explicate it
    • the ship [that was in the port]
    • the ship [called Ariadne]
  – may be embedded
    • the ship [that was in the port [we visited last week]]
  – or successively postmodified
    • the ship [called Ariadne][that was in the port]

Experiment 3: (i) embedding
• Probability of adding a further embedded postmodifying clause falls with size
  – All data
    • second < first
    • third < first
  – Spoken
    • second < first
  – Written
    • third < second
• Compare with effect of sequential postmodification of same head
[Figure: y axis: probability, 0.00 to 0.10; x axis: 0 to 4; series: written, spoken, all]

Experiment 3: (ii) sequential
• Probability of sequential postmodification falls; for spoken data it falls, then rises
  – All data
    • second < first
  – Spoken
    • third > second
  – Option: count conjoins separately or treat as single item
    • Either way, results show similar pattern
  – Negative feedback: the ‘in for a penny’ effect
[Figure: probability of sequential postmodification; y axis: probability, 0.00 to 0.15; x axis: 0 to 5; series: spoken, written]

Experiment 3: (iii) embed vs. seq
• Embedded vs. sequential postmodification
  – Second level
    • embedding > sequence
    • It is slightly easier to modify the latest head than a more remote one (semantic constraints? backtracking cost?)
  – Third level
    • embedding < sequence (if counting conjoins)
    • long sequences seem to be easier to construct than comparable layers of embedding
[Figure: y axis: probability, 0.00 to 0.15; x axis: 0 to 5; series: sequential, embedding]

Conclusions
• A method for evaluating interactions along grammatical axes
  – General purpose, robust, structural
  – More abstract than ‘linguistic choice’ experiments
  – Depends on a concept of grammatical distance along an axis, based on the chosen grammar
• Method has philosophical implications
  – Grammar viewed as outcome of linguistic choices
  – Linguistics as an evaluable observational science
    • Signature (trace) of language production decisions
  – A unification of theoretical and corpus linguistics?

Potential applications
• Corpus linguistics
  – Optimising existing grammatical framework
    • e.g. coordination, compound nouns
  – Comparing genres/languages/periods
• Theoretical linguistics
  – Comparing different grammars, same language
• Psycholinguistics
  – Search for evidence of language production constraints in spontaneous speech corpora
    • speech and language therapy
    • language acquisition and development

References
Nelson, G., Wallis, S. & Aarts, B. (2002) Exploring natural language. Benjamins.
Pickering, M. & Ferreira, V. (2008) Structural priming. Psychological Bulletin 134, 427–459.
Wallis, S.A. (2011) Comparing χ² tests for separability. Survey of English Usage.

• For explanation of the analysis method see the handout!
• For more detail and a draft of the full paper see http://corplingstats.wordpress.com
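
Worked sketch: sequential probability analysis
The sequential probability analysis used in Experiments 1 to 3 can be illustrated roughly as follows. This is a minimal sketch, not the handout's implementation. It assumes that the probability of adding the x-th item is estimated as p(x) = F(x) / F(x - 1), where F(x) is the number of structures containing at least x items, and that the error bars are Wilson score intervals at a 95% level (z ≈ 1.96). The function names and the frequency counts are hypothetical placeholders, not ICE-GB results.

```python
import math


def wilson_interval(p, n, z=1.959964):
    """Wilson score interval (lower, upper) for an observed proportion p out of n cases."""
    if n <= 0:
        raise ValueError("n must be positive")
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    spread = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - spread, centre + spread


def sequential_probabilities(at_least):
    """Compute the probability of each successive additive step.

    at_least[x] is the number of structures with at least x items (x = 0, 1, 2, ...).
    Assumption: p(x) = at_least[x] / at_least[x - 1].
    Returns a list of (x, p, lower, upper) tuples.
    """
    rows = []
    for x in range(1, len(at_least)):
        n = at_least[x - 1]
        if n == 0:
            break  # no structures left that could be extended
        p = at_least[x] / n
        lower, upper = wilson_interval(p, n)
        rows.append((x, p, lower, upper))
    return rows


# Hypothetical counts, for illustration only (NOT taken from ICE-GB):
# 10000 NPs, 2000 with at least one AJP, 200 with at least two, 10 with at least three.
HYPOTHETICAL_AT_LEAST = [10000, 2000, 200, 10]

for x, p, lo, hi in sequential_probabilities(HYPOTHETICAL_AT_LEAST):
    print(f"x={x}: p={p:.4f}  Wilson 95% interval=({lo:.4f}, {hi:.4f})")
```

If the intervals for two successive steps do not overlap, the fall (or rise) can be reported as significant. Overlapping intervals call for a proper test of the difference between two proportions, which this sketch does not attempt.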