Mehdi Riazi & Jill Murray
Department of Linguistics, Macquarie University, Australia
11th Annual EALTA Conference, May 2014, Warwick University
SIG for the Assessment of Writing & AAP

Project Description
• Part of a large-scale project comparing performance on TOEFL-iBT writing tasks with real-life academic writing tasks in terms of:
  – Processes & strategies (qualitative: stimulated-recall and semi-structured interviews)
  – Product (quantitative analysis of text features)

Research Questions
• How are high- vs. low-scored texts differentiated in terms of linguistic features?
• Which of the two writing tasks (integrated vs. independent) better differentiates high- and low-scored texts in terms of linguistic features (lexical sophistication, syntactic complexity, and cohesion)?

Participants & Data
• 30 postgraduate students studying at 4 major universities in New South Wales, Australia
• TOEFL-iBT writing:
  – Task 1 (integrated: reading + listening), producing a 150-word summary
  – Task 2 (independent: writing prompt), producing an argumentative essay of 250 words
• Texts & scores (from ETS)

Scoring of Texts
• Task 1: development, organization, grammar, vocabulary, accuracy, and completeness
• Task 2: overall writing quality, including development, organization, grammar, and vocabulary
• Writing scales on the ETS website: http://www.ets.org/Media/Tests/TOEFL/pdf/Writing_Rubrics.pdf
• The two tasks are rated from 0 to 5, and the sum is converted to a scaled score of 0 to 30.

Text Analysis
• Coh-Metrix 3.0 (Graesser, McNamara, Louwerse, & Cai, 2004)
• Surface features:
  – No. of paragraphs, no. of sentences, no. of words
• Deep features:
  – Syntactic complexity
  – Lexical sophistication
  – Cohesion
• Indices chosen on the evidence of previous research (Engber, 1995; Ferris, 1994; Grant & Ginther, 2000) and their correspondence to the scoring rubrics (Cumming et al., 2002), as in Guo et al. (2013)

Syntactic Complexity: 9 indices
1) Word length, number of syllables, mean
2) Sentence length, number of words, mean
3) Left embeddedness, words before the main verb
4) Number of modifiers per noun phrase, mean
5) Sentence syntax similarity, adjacent sentences
6) Sentence syntax similarity, all combinations, across paragraphs, mean
7) Agentless passive voice density, incidence
8) Flesch Readability
9) Coh-Metrix L2 Readability

Lexical Sophistication: 12 indices
1) CELEX word frequency for content words, mean
2) CELEX log frequency for all words, mean
3) Familiarity for content words, mean
4) Concreteness for content words, mean
5) Imageability for content words, mean
6) Meaningfulness, Colorado norms, content words, mean
7) Polysemy for content words, mean
8) Hypernymy for nouns and verbs, mean
9) Lexical diversity, type-token ratio, content word lemmas
10) Lexical diversity, type-token ratio, all words
11) Lexical diversity, MTLD, all words
12) Lexical diversity, MTLD, all words

Cohesion: 10 indices
1) Noun overlap
2) Argument overlap
3) Stem overlap
4) Content word overlap, all sentences, binary, mean
5) LSA overlap, adjacent sentences, mean
6) LSA given/new, sentences, mean
7) All connectives incidence
8) Causal connectives incidence
9) Logical connectives incidence
10) Causal verb incidence (Situational Model)

Data Analysis
• A set of Kruskal-Wallis tests (the non-parametric equivalent of ANOVA) was run to compare the two groups in each task in terms of surface features and the three latent variables:
  – Syntactic complexity (9 indices)
  – Lexical sophistication (12 indices)
  – Cohesion (10 indices)

Findings Task 1: Surface features (mean & SD)

                     Task 1 (low)      Task 1 (high)     Sig.
No. of paragraphs    3.38 (1.40)       4.75 (1.14)       Ns
No. of sentences     10 (2.30)         12.41 (3.78)      Ns
No. of words         175 (35.55)       278.17 (46.59)    p < 0.001

(Gebril & Plakans, 2009; Watanabe, 2001; Guo et al., 2013)

Findings Task 1: Deep features
Syntactic Complexity (9 indices)
• Two indices differentiated high- vs.
low-scored texts:
  – Sentence length, number of words, mean
  – Word length, number of syllables, mean

Findings Task 1: Deep features
Lexical Sophistication (12 indices)
• Two indices differentiated high- vs. low-scored texts:
  – Lexical diversity, type-token ratio, content word lemmas
  – Lexical diversity, type-token ratio, all words

Findings Task 1: Deep features
Cohesion (10 indices)
• Four indices differentiated high- vs. low-scored texts:
  – Noun overlap, all sentences, binary, mean
  – Argument overlap, all sentences, binary, mean
  – Stem overlap, all sentences, binary, mean
  – LSA given/new, sentences, mean

Findings Task 2: Surface features (mean & SD)

                     Task 2 (low)      Task 2 (high)     Sig.
No. of paragraphs    5 (1.30)          5 (0.84)          Ns
No. of sentences     15.73 (5)         17.53 (3.62)      Ns
No. of words         284.80 (70.62)    344.80 (52.35)    p < 0.05

Findings Task 2: Deep features
Syntactic Complexity (9 indices)
• No significant results
Lexical Sophistication (12 indices)
• Only one index differentiated high- vs. low-scored texts:
  – Lexical diversity, type-token ratio, all words
Cohesion (10 indices)
• No significant results

Discussion
• Text length seems to be a functional differentiator of low- vs. high-scored texts in both tasks (in line with previous studies)
• The integrated task (writing from sources) seems to differentiate high- vs. low-scored texts better than the independent task, which is based solely on test takers' prior knowledge & experience
• All three linguistic features (syntactic complexity, lexical sophistication & cohesion) differentiated high- vs. low-scored texts in the integrated task
• High- vs. low-scored independent-task texts were differentiated only in terms of type-token ratio (TTR)

Conclusion
• Given the small sample size, no strict generalisations can be made
• We may end with a hypothesis for further research:
  – Integrated writing tasks better differentiate test takers' L2 writing proficiency than independent tasks in terms of the linguistic features of the produced texts
• We also need to compare the processes and strategies these two task types elicit

References
Cumming, A., Kantor, R., & Powers, D. (2002). Decision making while scoring ESL/EFL compositions: A descriptive model. Modern Language Journal, 86, 67-96.
Engber, C. A. (1995). The relationship of lexical proficiency to the quality of ESL compositions. Journal of Second Language Writing, 4(2), 139-155.
Ferris, D. (1994). Lexical and syntactic features of ESL writing by students at different levels of L2 proficiency. TESOL Quarterly, 28, 414-420.
Gebril, A., & Plakans, L. (2009). Investigating source use, discourse features, and process in integrated writing tests. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 7, 47-84.
Graesser, A. C., McNamara, D. S., Louwerse, M. M., & Cai, Z. (2004). Coh-Metrix: Analysis of text on cohesion and language. Behavior Research Methods, Instruments, & Computers, 36, 193-202.
Guo, L., Crossley, S., & McNamara, D. S. (2013). Predicting human judgments of essay quality in both integrated and independent second language writing samples: A comparison study. Assessing Writing, 18, 218-238.
Watanabe, Y. (2001). Read-to-write tasks for the assessment of second language academic writing skills: Investigating text features and rater reactions. Unpublished doctoral dissertation, University of Hawaii.

Note
• This research was funded by the Educational Testing Service (ETS) under a Committee of Examiners and the Test of English as a Foreign Language research grant.
• ETS does not discount or endorse the methodology, results, implications, or opinions presented by the researchers.
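The two statistical building blocks of the analysis, a type-token ratio (the lexical diversity index that differentiated texts in both tasks) and the Kruskal-Wallis H test used to compare score groups, can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the study took its indices from Coh-Metrix 3.0, the tokenizer below is a deliberately naive stand-in, and the H statistic here omits the tie correction that standard statistical packages apply.

```python
# Editor's sketch of TTR and the Kruskal-Wallis H statistic.
# Simplified illustrations; the study itself used Coh-Metrix 3.0 indices.
import re


def type_token_ratio(text: str) -> float:
    """Lexical diversity: unique word forms / total word tokens."""
    tokens = re.findall(r"[a-z']+", text.lower())  # naive tokenizer
    return len(set(tokens)) / len(tokens)


def average_ranks(values):
    """Rank all values (1-based), averaging ranks over tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j + 2) / 2  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """H statistic for k independent groups (no tie correction)."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    h = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        h += rank_sum ** 2 / len(g)
        start += len(g)
    return 12 / (n * (n + 1)) * h - 3 * (n + 1)
```

In practice one would compare an index (word count, TTR, etc.) across the low- and high-scored groups, e.g. `kruskal_wallis_h(low_group_values, high_group_values)`, and look up the resulting H against a chi-square distribution; a library such as `scipy.stats.kruskal` also applies a tie correction and returns the p-value directly.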