
Capturing patterns of linguistic interaction in a parsed corpus
A methodological case study
Sean Wallis
Survey of English Usage
University College London
Capturing linguistic interaction...
• Parsed corpus linguistics
• Intra-structural priming
• Experiments
– Attributive AJPs before a noun
– Embedded postmodifying clauses
– Sequential postmodifying clauses
– Speech vs. writing
• Conclusions
• The handout explains the analytical method in more detail (so read it later!)
Parsed corpus linguistics
• An example tree from ICE-GB (spoken): S1A-006 #23
[Figure: parse tree for ICE-GB text S1A-006, sentence #23]
Parsed corpus linguistics
• Three kinds of evidence may be obtained from a parsed corpus
– Frequency evidence of a particular known rule, structure or linguistic event
– Coverage evidence of new rules, etc.
– Interaction evidence of the relationship between rules, structures and events
• This evidence is necessarily framed within a
particular grammatical scheme
– How might we evaluate this grammar?
Intra-structural priming
• Priming effects within a structure
– Study repeating an additive step in structures
• Consider
– a phrase or clause that may (in principle) be extended ad infinitum
• e.g. an NP with a noun head
– a single additive step applied to this structure
• e.g. add an attributive AJP before the head
– Q. What is the effect of repeatedly applying this operation to the structure?
• ship → tall ship → tall very green ship → tall very green old ship
[Figure: tree diagrams of the NP growing by one attributive AJP at a time before the head N]
Experiment 1: analysis of results
• Sequential probability analysis
– calculate probability of adding each AJP
– error bars: Wilson intervals
– probability falls
• second < first
• third < second
– decisions interact
– Every AJP added makes it harder to add another
[Figure: probability of adding each successive AJP, with Wilson-interval error bars (y-axis "probability" 0.00–0.20, x-axis 0–5)]
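A minimal Python sketch of the sequential probability analysis. It assumes (the slide does not say so) that the probability of adding the x-th AJP is p(x) = F(x)/F(x-1), where F(x) is the number of noun-headed NPs with at least x attributive AJPs, and that each error bar is the 95% Wilson score interval for that proportion. The function names and the sample data are hypothetical.

```python
import math

Z_95 = 1.959964  # two-tailed standard normal critical value for alpha = 0.05


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for the binomial proportion successes / n."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def at_least_counts(ajps_per_np):
    """F(x): number of NPs with at least x AJPs, for x = 0 .. max observed."""
    top = max(ajps_per_np, default=0)
    return [sum(1 for k in ajps_per_np if k >= x) for x in range(top + 1)]


def sequential_probabilities(f, z=Z_95):
    """Rows (x, p(x), lower, upper) where p(x) = F(x) / F(x-1), x = 1, 2, ..."""
    rows = []
    for x in range(1, len(f)):
        n = f[x - 1]
        if n == 0:
            break  # no NP reached x-1 AJPs, so p(x) is undefined
        lower, upper = wilson_interval(f[x], n, z)
        rows.append((x, f[x] / n, lower, upper))
    return rows


# Hypothetical data: number of attributive AJPs found in each NP.
sample = [0, 0, 0, 1, 0, 2, 1, 0, 0, 3, 0, 1]
for x, p, lo, hi in sequential_probabilities(at_least_counts(sample)):
    print(f"p({x}) = {p:.3f}  [{lo:.3f}, {hi:.3f}]")
```

Conditioning each step on F(x-1) is what lets the plot show whether one decision to add an AJP affects the next: if the steps were independent, p(x) would stay roughly flat.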
Experiment 1: explanations?
• Feedback loop: for each successive AJP, it is more difficult to add a further AJP
– logical-semantic constraints
• we tend to say the tall green ship
• we do not tend to say tall short ship or green tall ship
– communicative economy
• once a speaker has said tall green ship, they tend to say just ship thereafter
– memory/processing constraints
• unlikely: this is a small structure, as are AJPs
Experiment 1: speech vs. writing
• Spoken vs. written subcorpora
– Same overall pattern
– Spoken data tends to have fewer attributive AJPs
• Support for communicative economy or memory/processing hypotheses?
– Significance tests
• Paired 2x1 Wilson tests (Wallis 2011)
• first and second observed spoken probabilities are significantly smaller than written
[Figure: probability of adding each successive AJP, written vs. spoken (y-axis "probability" 0.00–0.25, x-axis 0–5)]
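The slide does not spell out the test procedure. As a stand-in (a substituted technique, not necessarily the exact "paired 2x1 Wilson test" used here), the sketch below compares one spoken and one written sequential probability, treated as independent samples, using Newcombe's Wilson-based interval for the difference between two proportions. The function names and counts are hypothetical.

```python
import math

Z_95 = 1.959964  # two-tailed standard normal critical value for alpha = 0.05


def wilson_interval(successes, n, z=Z_95):
    """Wilson score interval for the binomial proportion successes / n."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def newcombe_wilson_difference(f1, n1, f2, n2, z=Z_95):
    """Difference d = p1 - p2 with Newcombe's Wilson-based interval around d."""
    p1, p2 = f1 / n1, f2 / n2
    l1, u1 = wilson_interval(f1, n1, z)
    l2, u2 = wilson_interval(f2, n2, z)
    d = p1 - p2
    lower = d - math.sqrt((p1 - l1) ** 2 + (u2 - p2) ** 2)
    upper = d + math.sqrt((u1 - p1) ** 2 + (p2 - l2) ** 2)
    return d, lower, upper


def significantly_different(f1, n1, f2, n2, z=Z_95):
    """True if the interval for p1 - p2 excludes zero."""
    _, lower, upper = newcombe_wilson_difference(f1, n1, f2, n2, z)
    return lower > 0 or upper < 0


# Hypothetical counts: (NPs that gained a first AJP, NPs examined).
spoken = (900, 10000)
written = (1500, 10000)
print(significantly_different(*spoken, *written))
```

The same comparison would be repeated at each position x, which is how a claim such as "first and second spoken probabilities are smaller than written" can be checked point by point.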
Experiment 2: preverbal AVPs
• Consider adverb phrases before a verb
– Results very different from those for attributive AJPs
• Probability does not fall significantly between first and second AVP
• Probability does fall between second and third AVP
– Possible constraints
• (weak) communicative
• (weak) semantic
– Further investigation needed
[Figure: probability of adding each successive preverbal AVP (y-axis "probability" 0.00–0.10, x-axis 0–4)]
Experiment 3: postmodifying clauses
• Another way to specify nouns in English
– add clause after noun to explicate it
• the ship [that was in the port]
• the ship [called Ariadne]
– may be embedded
• the ship [that was in the port [we visited last week]]
– or successively postmodified
• the ship [called Ariadne][that was in the port]
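To make the two measures concrete, here is a hypothetical Python sketch on a toy tree representation (not ICE-GB's actual format). Embedding depth follows postmodifying clauses down through the NPs they contain; the sequential count only looks at clauses attached directly to one head. The split_conjoins flag mirrors the option, raised for the sequential experiment, of counting conjoined clauses separately or as a single item. All class and function names, and the example "registered in Piraeus", are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator, List, Union


@dataclass
class Clause:
    """A postmodifying clause; `nps` holds NPs inside it that may be postmodified in turn."""
    text: str
    nps: List[NP] = field(default_factory=list)


@dataclass
class Coordination:
    """Conjoined postmodifying clauses attached to the same head."""
    conjoins: List[Clause]


@dataclass
class NP:
    head: str
    postmods: List[Union[Clause, Coordination]] = field(default_factory=list)


def clauses(postmods) -> Iterator[Clause]:
    """Flatten coordinations into their individual conjoined clauses."""
    for pm in postmods:
        if isinstance(pm, Coordination):
            yield from pm.conjoins
        else:
            yield pm


def sequential_count(np_: NP, split_conjoins: bool = True) -> int:
    """Number of postmodifying clauses attached directly to this head."""
    if split_conjoins:
        return sum(1 for _ in clauses(np_.postmods))
    return len(np_.postmods)  # a whole coordination counts as one item


def embedding_depth(np_: NP) -> int:
    """Longest chain of clauses, each postmodifying a head inside the previous clause."""
    if not np_.postmods:
        return 0
    inner = [embedding_depth(n) for c in clauses(np_.postmods) for n in c.nps]
    return 1 + max(inner, default=0)


# the ship [that was in the port [we visited last week]]
embedded = NP("ship", [Clause("that was in the port",
                              [NP("port", [Clause("we visited last week")])])])
# the ship [called Ariadne][that was in the port]
sequence = NP("ship", [Clause("called Ariadne"),
                       Clause("that was in the port", [NP("port")])])
# hypothetical: the ship [called Ariadne and registered in Piraeus]
coordinated = NP("ship", [Coordination([Clause("called Ariadne"),
                                        Clause("registered in Piraeus")])])
```

Here embedded has depth 2 but only one clause on "ship", while sequence has two clauses on "ship" but depth 1; the coordinated example counts as 2 or 1 depending on split_conjoins.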
Experiment 3: (i) embedding
• Probability of adding a further embedded postmodifying clause falls with size
– All data
• second < first
• third < first
– Spoken
• second < first
– Written
• third < second
• Compare with effect of sequential postmodification of same head
[Figure: probability of adding a further embedded postmodifying clause, written, spoken and all data (y-axis "probability" 0.00–0.10, x-axis 0–4)]
Experiment 3: (ii) sequential
• Probability of adding a further sequential postmodifying clause falls; for spoken data it falls, then rises
– All data
• second < first
– Spoken
• third > second
– Option: count conjoins separately or treat them as a single item
• Either way, the results show a similar pattern
– Negative feedback: the ‘in for a penny’ effect
[Figure: probability of adding a further sequential postmodifying clause, spoken vs. written (y-axis "probability" 0.00–0.15, x-axis 0–5)]
Experiment 3: (iii) embed vs. seq
• Embedded vs. sequential postmodification
• embedding > sequence (second level)
– It is slightly easier to modify the latest head than a more remote one:
• semantic constraints?
• backtracking cost?
– Third level
• embedding < sequence (if counting conjoins)
• long sequences seem to be easier to construct than comparable layers of embedding
[Figure: probability of adding a further postmodifying clause, sequential vs. embedding (y-axis "probability" 0.00–0.15, x-axis 0–5)]
Conclusions
• A method for evaluating interactions along
grammatical axes
– General purpose, robust, structural
– More abstract than ‘linguistic choice’ experiments
– Depends on a concept of grammatical distance
along an axis, based on the chosen grammar
• Method has philosophical implications
– Grammar viewed as outcome of linguistic choices
– Linguistics as an evaluable observational science
• Signature (trace) of language production decisions
– A unification of theoretical and corpus linguistics?
Potential applications
• Corpus linguistics
– Optimising existing grammatical framework
• e.g. coordination, compound nouns
– Comparing genres/languages/periods
• Theoretical linguistics
– Comparing different grammars, same language
• Psycholinguistics
– Search for evidence of language production
constraints in spontaneous speech corpora
• speech and language therapy
• language acquisition and development
References
Nelson, G., Wallis, S. & Aarts, B. (2002) Exploring natural language. Benjamins.
Pickering, M. & Ferreira, V. (2008) Structural priming. Psychological Bulletin 134, 427–459.
Wallis, S.A. (2011) Comparing χ² tests for separability. Survey of English Usage.
• For explanation of the analysis method see the handout!
• For more detail and a draft of the full paper see
http://corplingstats.wordpress.com