Linguistic features of the texts produced by high- vs. low-scoring test takers

Mehdi Riazi & Jill Murray
Department of Linguistics, Macquarie University, Australia
11th annual EALTA Conference, May 2014
Warwick University
SIG for the Assessment of Writing & AAP
Project Description
• Part of a large-scale project which compares
performance on TOEFL-iBT writing vs. real-life academic writing tasks in terms of:
– Processes & strategies (qualitative stimulated
recall semi-structured interviews)
– Product (quantitative analysis of text features)
Research questions
• How are high- vs. low-scored texts differentiated in terms of linguistic features?
• Which of the two writing tasks (integrated vs. independent) better differentiates high- and low-scored texts in terms of linguistic features (lexical sophistication, syntactic complexity, and cohesion)?
Participants & data
• 30 postgraduate students studying at 4
major universities in New South Wales,
Australia
• TOEFL-iBT writing
– Task 1 (Integrated: Reading + Listening), producing a 150-word summary
– Task 2 (Independent: writing prompt), producing a 250-word argumentative essay
• Texts & scores (from ETS)
Scoring of texts
• Task 1: development, organization, grammar,
vocabulary, accuracy and completeness
• Task 2: overall writing quality, including
development, organization, grammar and
vocabulary
• Writing scales are available on the ETS website: http://www.ets.org/Media/Tests/TOEFL/pdf/Writing_Rubrics.pdf
• The two tasks are rated from 0 to 5 and the
sum is converted to a scaled score of 0 to 30.
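The 0-5-per-task to 0-30 conversion described above can be sketched as follows; a proportional (linear) mapping is assumed here for illustration, since the actual ETS conversion table is not reproduced in these slides.

```python
def writing_scaled_score(task1: float, task2: float) -> int:
    """Illustrative mapping from the two task ratings (0-5 each) to the
    0-30 writing section scale, assuming a simple proportional conversion
    of the sum (0-10). The real ETS conversion uses a published table."""
    total = task1 + task2          # 0..10
    return round(total / 10 * 30)  # 0..30

print(writing_scaled_score(4, 5))  # 27
```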
Text analysis
• Coh-Metrix 3.0 (Graesser, McNamara, Louwerse, & Cai, 2004)
• Surface features
– No. of paragraphs, No. of sentences, No. of words
• Deep features
– Syntactic complexity
– Lexical sophistication
– Cohesion
Index selection follows Guo et al. (2013): evidence from previous research (Engber, 1995; Ferris, 1994; Grant & Ginther, 2000) and correspondence to the scoring rubrics (Cumming et al., 2002)
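The surface features above (paragraph, sentence, and word counts) can be approximated with a short script. The splitting heuristics below are naive stand-ins for the proper parsers Coh-Metrix uses:

```python
import re

def surface_features(text: str) -> dict:
    """Naive counts of the surface features listed above.
    Paragraphs = blank-line-separated blocks; sentences = runs split
    on ., !, ? -- a rough heuristic, not a real sentence segmenter."""
    paragraphs = [p for p in re.split(r"\n\s*\n", text.strip()) if p.strip()]
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    return {"paragraphs": len(paragraphs),
            "sentences": len(sentences),
            "words": len(words)}

sample = "First point. Second point.\n\nA new paragraph here."
print(surface_features(sample))  # {'paragraphs': 2, 'sentences': 3, 'words': 8}
```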
Syntactic Complexity: 9 indices
1) Word length, number of syllables
2) Sentence length, number of words, mean
3) Left embeddedness, words before the main verb
4) Number of modifiers per noun phrase, mean
5) Sentence syntax similarity, adjacent sentences
6) Sentence syntax similarity, all combinations, across paragraphs, mean
7) Agentless passive voice density, incidence
8) Flesch Readability
9) Coh-Metrix L2 Readability
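Of the indices above, the Flesch Readability score (Flesch Reading Ease) has a published closed form; a minimal sketch, with syllable counts assumed to come from a dictionary or heuristic:

```python
def flesch_reading_ease(n_words: int, n_sentences: int, n_syllables: int) -> float:
    """Standard Flesch Reading Ease formula (higher = easier to read):
    206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)."""
    return (206.835
            - 1.015 * (n_words / n_sentences)
            - 84.6 * (n_syllables / n_words))

# e.g. a text with 100 words, 5 sentences, 150 syllables
print(round(flesch_reading_ease(100, 5, 150), 2))
```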
Lexical Sophistication: 12 indices
1) CELEX word frequency for content words, mean
2) CELEX log frequency for all words, mean
3) Familiarity for content words, mean
4) Concreteness for content words, mean
5) Imagability for content words, mean
6) Meaningfulness, Colorado norms, content words, mean
7) Polysemy for content words, mean
8) Hypernymy for nouns and verbs, mean
9) Lexical diversity, type-token ratio, content word lemmas
10) Lexical diversity, type-token ratio, all words
11) Lexical diversity, MTLD, all words
12) Lexical diversity, VOCD, all words
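Two of the lexical diversity indices above, TTR and MTLD, can be sketched in a few lines. The MTLD function is a simplified one-directional version of McCarthy & Jarvis's measure; the full index averages a forward and a backward pass:

```python
def ttr(tokens: list) -> float:
    """Type-token ratio: unique words / total words."""
    return len(set(tokens)) / len(tokens)

def mtld_forward(tokens: list, threshold: float = 0.72) -> float:
    """Simplified one-directional MTLD: count how many segments
    ('factors') it takes for the running TTR to drop below the
    threshold; MTLD = total tokens / number of factors."""
    factors, types, count = 0.0, set(), 0
    for tok in tokens:
        count += 1
        types.add(tok)
        if len(types) / count < threshold:
            factors += 1
            types, count = set(), 0
    if count:  # partial factor for the leftover segment
        factors += (1 - len(types) / count) / (1 - threshold)
    return len(tokens) / factors if factors else float(len(tokens))

words = "the cat sat on the mat and the dog sat on the rug".split()
print(ttr(words), mtld_forward(words))
```

A fully repetitive text drives MTLD toward its floor, while a text with no repeated words never completes a factor, so higher values reflect greater diversity.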
Cohesion: 10 indices
1) Noun overlap
2) Argument overlap
3) Stem overlap
4) Content word overlap, all sentences, binary, mean
5) LSA overlap, adjacent sentences, mean
6) LSA given/new, sentences, mean
7) All connectives incidence
8) Causal connectives incidence
9) Logical connectives incidence
10) Causal verb incidence (Situational Model)
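The binary overlap indices above can be illustrated with plain word overlap between sentences; Coh-Metrix's noun, argument, and stem variants additionally use POS tagging and lemmatization, which are omitted in this sketch:

```python
def binary_overlap(sent_a: list, sent_b: list) -> int:
    """1 if the two tokenized sentences share at least one word, else 0.
    (Coh-Metrix restricts the comparison to nouns, arguments, or stems;
    plain surface-word overlap is assumed here.)"""
    return int(bool(set(sent_a) & set(sent_b)))

def mean_adjacent_overlap(sentences: list) -> float:
    """Mean binary overlap over adjacent sentence pairs."""
    pairs = list(zip(sentences, sentences[1:]))
    return sum(binary_overlap(a, b) for a, b in pairs) / len(pairs)

sents = [["the", "study", "used", "tasks"],
         ["each", "task", "was", "scored"],
         ["the", "task", "scores", "came", "from", "ets"]]
print(mean_adjacent_overlap(sents))  # 0.5
```

Note that "tasks"/"task" in the first pair do not match at the surface level, which is exactly the gap the stem-overlap index closes.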
Data Analysis
• A set of Kruskal-Wallis tests (the non-parametric equivalent of one-way ANOVA) was run to compare the two groups on each task in terms of surface features and the three latent variables:
– Syntactic complexity (9 indices)
– Lexical sophistication (12 indices)
– Cohesion (10 indices)
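A Kruskal-Wallis comparison of two groups can be sketched in pure Python. The H statistic below omits the tie-correction factor for brevity and would be compared against a chi-square distribution with k-1 degrees of freedom; in practice `scipy.stats.kruskal` returns the p-value directly. The word counts are hypothetical:

```python
def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic: rank all observations together,
    then measure how far each group's rank sum departs from chance.
    Ties receive average ranks; the tie-correction factor is omitted."""
    pooled = sorted(x for g in groups for x in g)
    n = len(pooled)
    positions = {}
    for idx, val in enumerate(pooled):
        positions.setdefault(val, []).append(idx + 1)
    rank = {val: sum(pos) / len(pos) for val, pos in positions.items()}
    h = sum(sum(rank[x] for x in g) ** 2 / len(g) for g in groups)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)

# hypothetical word counts for low- vs. high-scored groups
low = [150, 162, 171, 180, 175]
high = [250, 301, 266, 280, 275]
print(round(kruskal_wallis_h(low, high), 3))  # 6.818
```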
Findings Task 1: Surface features
(mean & SD)
                    Task 1 (low)    Task 1 (high)    Sig.
No. of paragraphs   3.38 (1.40)     4.75 (1.14)      Ns
No. of sentences    10 (2.30)       12.41 (3.78)     Ns
No. of words        175 (35.55)     278.17 (46.59)   p < 0.001
(Gebril & Plakans, 2009; Watanabe, 2001; Guo et al., 2013)
Findings Task 1 : Deep features
Syntactic Complexity (9 indices)
• Two indices differentiated high- vs. low-scored texts:
– Sentence length, number of words,
mean
– Word length, number of syllables,
mean
Findings Task 1 : Deep features
Lexical Sophistication (12 indices)
• Two indices differentiated high- vs. low-scored texts:
– Lexical diversity, type-token ratio,
content word lemmas
– Lexical diversity, type-token ratio, all
words
Findings Task 1 : Deep features
Cohesion (10 indices)
• Four indices differentiated high- vs. low-scored texts:
– Noun overlap, all sentences, binary, mean
– Argument overlap, all sentences, binary,
mean
– Stem overlap, all sentences, binary, mean
– LSA given/new, sentences, mean
Findings Task 2: Surface features
(mean & SD)
                    Task 2 (low)    Task 2 (high)    Sig.
No. of paragraphs   5 (1.30)        5 (0.84)         Ns
No. of sentences    15.73 (5)       17.53 (3.62)     Ns
No. of words        284.80 (70.62)  344.80 (52.35)   p < 0.05
Findings Task 2 : Deep features
Syntactic Complexity (9 indices)
• No significant results
Lexical Sophistication (12 indices)
• Only one index differentiated high- vs. low-scored texts:
– Lexical diversity, type-token ratio, all words
Cohesion (10 indices)
• No significant results
Discussion
• Text length appears to be a functional differentiator of low- vs. high-scored texts in both tasks, in line with previous studies
• The integrated task (writing from sources) seems to differentiate high- vs. low-scored texts better than the independent task, which draws solely on test takers' prior knowledge and experience
• All three categories of linguistic features (syntactic complexity, lexical sophistication, and cohesion) differentiated high- vs. low-scored texts in the integrated task
• High- vs. low-scored independent-task texts were differentiated only by type-token ratio (TTR)
Conclusion
• Given the small sample size, no firm generalisations can be made
• We may, however, close with a hypothesis for further research:
• Integrated writing tasks differentiate test takers' L2 writing proficiency better than independent tasks in terms of the linguistic features of the texts produced
• We also need to compare the processes and
strategies these two task types elicit
References
Cumming, A., Kantor, R., & Powers, D. (2002). Decision making while scoring ESL/EFL compositions: A descriptive model. The Modern Language Journal, 86, 67-96.
Engber, C.A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4(2), 139-155.
Ferris, D. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28, 414-420.
Gebril, A., & Plakans, L. (2009). Investigating source use, discourse
features, and process in integrated writing tests. Spaan Fellow
Working Papers in Second or Foreign Language Assessment, 7,
47-84.
Graesser, A.C., McNamara, D.S., Louwerse, M.M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193-202.
Guo, L., Crossley, S., & McNamara, D.S. (2013). Predicting human
judgements of essay quality in both integrated and independent
second language writing samples: A comparison study.
Assessing Writing, 18, 218-238.
Watanabe, Y. (2001). Read-to-write tasks for the assessment of second language academic writing skills: Investigating text features and rater reactions (Unpublished doctoral dissertation). University of Hawaii.
Note
• This research was funded by the
Educational Testing Service (ETS) under a
Committee of Examiners and the Test of
English as a Foreign Language research
grant.
• ETS does not discount or endorse the
methodology, results, implications, or
opinions presented by the researchers.