Comparing Cognitive Strategic Reading Processes in IELTS and TOEFL iBT

Nathaniel Owen
www.le.ac.uk

This session

• Background
• Rationale
• Research Questions
• Cognitive Strategic Processing
• Methodology & Instruments
• Pilot studies & findings
• What next?

Rationale

TOEFL iBT    IELTS
0-8          0-1.0
9-18         1.0-1.5
19-29        2.0-2.5
30-40        3.0-3.5
41-52        4.0
53-64        4.5-5.0
65-78        5.5-6.0
79-95        6.5-7.0
96-120       7.5-9.0

Source: http://secure.vec.bc.ca/toefl-equivalency-table.cfm
Reproduced here: http://www.aber.ac.uk/en/media/departmental/internationalenglishcentre/englishlanguagerequirements/Compare-IELTS,-TOEFL-and-TOEIC.pdf

Background – Equivalence studies

• Score concordance
  – Prediction; scale aligning; equating (Holland & Dorans, 2006)
• Linking to an external criterion
  – Common European Framework of Reference (CEFR) (A1, A2, B1, B2, C1, C2)
  – ETS (2007, 2010); Pearson (2009)
• Content equivalence
  – Bachman et al. (1995)

Research Questions

• What cognitive strategic processes are elicited by the reading components of the IELTS and TOEFL tests? To what extent are they comparable?
• Does the elicitation of different processes reflect a differing understanding of the construct of interest?
• Are the strategic processing demands of the tests equally finely calibrated to distinguish between different proficiency levels?

Cognitive strategic processing in reading for academic purposes (Weir, 2005; Khalifa & Weir, 2009)

[Diagram: test-taker characteristics; context validity (setting, demands); cognitive validity (processing, knowledge)]

Cognitive strategic processing in reading for academic purposes

• Lack of clear distinctions among terms such as ‘processes,’ ‘skills,’ and ‘strategies’ (Grabe, 2000)
• Possibility of distinguishing individual reading skills (Rosenshine, 1980; Alderson, 2000)
• Identification of construct-relevant and construct-irrelevant strategies (Cohen & Upton, 2006; Field, 2008; Cohen, 2012).
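The TOEFL iBT–IELTS concordance shown under ‘Rationale’ amounts to a simple range lookup. A minimal sketch in Python, using the band boundaries copied from the table above; the function name and data layout are illustrative, not an official conversion tool:

```python
# Band boundaries copied from the published concordance table
# (secure.vec.bc.ca); illustrative only, not an official conversion.
CONCORDANCE = [
    (0, 8, "0-1.0"),
    (9, 18, "1.0-1.5"),
    (19, 29, "2.0-2.5"),
    (30, 40, "3.0-3.5"),
    (41, 52, "4.0"),
    (53, 64, "4.5-5.0"),
    (65, 78, "5.5-6.0"),
    (79, 95, "6.5-7.0"),
    (96, 120, "7.5-9.0"),
]

def toefl_to_ielts_band(score: int) -> str:
    """Return the IELTS band range concorded with a TOEFL iBT total (0-120)."""
    if not 0 <= score <= 120:
        raise ValueError("TOEFL iBT total scores range from 0 to 120")
    for low, high, band in CONCORDANCE:
        if low <= score <= high:
            return band
    raise AssertionError("unreachable: the table covers 0-120")

print(toefl_to_ielts_band(100))  # prints 7.5-9.0
```

Note how the non-overlapping score ranges make the lookup unambiguous: every total from 0 to 120 falls into exactly one band.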
• Encompasses careful & expeditious reading – cognitive demand (Khalifa & Weir, 2009).
• Innovative methodology – eye tracking (Bax, 2013).

Cognitive strategic processing in reading for academic purposes (Cohen and Upton, 2006; Cohen, 2011, 2012)

• Language learner strategies
  – “Thoughts and actions, consciously chosen and operationalized by language learners, to assist them in carrying out a multiplicity of tasks” (Cohen, 2011: 7).
  – Observable actions and unobservable cognitive processes which may be inferred via the observable actions.
  – Strategies employed by test takers may or may not be relevant to the construct of interest.
• Test management strategies (construct-relevant?)
  – “Strategies for responding meaningfully to test items and tasks” (Cohen, 2012: 263).
  – Test management strategies are those that operationalize the skills associated with reading (Cohen, 2011: 2).
• Test-wiseness strategies (construct-irrelevant)
  – “Test-wiseness refers to the use of peripheral information to answer items without going through the expected linguistic and cognitive processes” (Cohen, 2011: 2).
  – Represents construct-irrelevant behaviour: test takers use their knowledge of item types to select responses rather than engaging in domain-relevant cognitive processing.

Instruments (authentic test scripts & items)

• IELTS
  – Three texts & 40 questions
  – 14 item types
  – 1 hour
• TOEFL iBT
  – 3-5 texts; 12-14 questions per passage
  – 10 item types (inc. ‘reading to learn’ items)
  – 60-100 minutes
  – (N.B. The study does not incorporate integrated item types, e.g. reading into writing, but does include reading-to-learn items)

Methodology I: Stimulated-recall interviews

• Verbal protocols: a “window into the minds of learners” (Bowles, 2010: 2).
• Verbalisation of task completion procedure
  – Retrospective (non-metalinguistic)
• Advantages
  – Direct contribution to theory-building (Oxford, 1996)
  – Direct, not inferential
  – Identification of affective and construct-irrelevant processes (Pressley & Afflerbach, 1995)
• Caveats
  – Reactivity (impact of the methodology on findings)
  – Veridicality (truthfulness of claims) (Ericsson and Simon, 1993)
• Intervening variable
  – Nationality – student participants all Chinese L1 (Mandarin).

Methodology II: Addressing caveats

• Four pilot studies addressed methodological concerns
  – Pilot Study I – think-aloud vs. stimulated recall (reactivity)
  – Pilot Studies II & III – appropriateness of stimuli (annotated text & responses) (veridicality)
  – Pilot Study IV – appropriateness of stimulus (video prompt) (veridicality)

Pilot Study I

• IELTS (one text, thirteen questions, two item types, approx. 20 minutes)

Transcription:
“…I have no time control… I am told I need 20 minutes to finish the test, but I didn’t know how to manage the time well. I have done this kind of reading before… I first looked at the key words and each paragraph, there are key words linked to the headings…”

“…If I have problems with for example B, I go straight to C… the information isn’t familiar so it’s a bit difficult for me to read, even though I understand [the words] I don’t need to understand all the information… I used scanning, and I did not really read all the texts word by word, just found the key words… for question 14 to 19, I used strategies like crossing out.
For questions 20 to 26, I looked at the key words… so I located the place and I found the rest of the verse… if I find some of the information is not similar to this, I could say it’s ‘false’….”

• Concurrent think-aloud
• Reflective feedback
• Authenticity
• Training & familiarity with item types evident
• Individualised item feedback
• Metacognitive verbalisations

Pilot Studies II & III – Procedural changes

• Significant changes to the data collection procedure:
  – Test conducted under timed conditions (participants had a watch so they could manage their time)
  – Conducted under test conditions – immediate retrospective recall rather than think-aloud
  – Researcher present for, and directing, the stimulated-recall interview (ensuring the session progresses item by item)
  – Presents new challenges in terms of researcher participation
• The transcripts were passed to practising teachers of EAP
• Teachers were presented with the student verbalisations (transcribed), the student responses, the correct responses and the script that the student worked on.

Participant Information

Test taker 1 (Pilot Study 2)
• Overall IELTS score of 6.5 (test taken July 2012)
• Reading score of 6.0
• Currently studying on a Medical Physiology foundation course at the University of Leicester; the foundation course has a focus on English for medical purposes.
• Has not attended any courses at the ELTU.
Teacher: recently completed DELTA; holds an MA in Applied Linguistics from the University of Leicester.

Test taker 2 (Pilot Study 3)
• Overall IELTS score of 5.5 (test taken April 2012)
• No reading score available.
• Arrived in Leicester in October 2012; studied three courses at the ELTU (Courses B, C and D).
• Intending to undertake a Master’s degree in Financial/Mathematical Modelling in the Department of Economics.
Teacher: full-time at the English Language Teaching Unit (ELTU), University of Leicester.

Pilot Study II (Higher proficiency?)
Transcription (Q14):
Interviewer: So, we’ll go through it item by item. So, question 14, paragraph A, you chose title 1. Can you explain why you chose title 1?
Participant: Yeah, because when I read this, I first read the titles, and then went to find something… this way, paragraph A also said ‘what is the FAA,’ so generally speaking, it just introduced [it], so I chose one.
Interviewer: So, it’s matching ‘FAA’ in the text to the title…
Participant: Yeah, because it says the FAA regulations, so it [matched]…

Strategies used & comments:
The S probably tried reading for gist due to the time constraint. However, s/he was unable to deduce the meanings of key unknown words, e.g. ‘establishment’, which could have helped her to choose the best heading. Instead, s/he resorted to an easier strategy (matching a heading that contains the same word as in the paragraph).

Transcription (Q15):
Interviewer: OK. Number 15, paragraph C, you chose title 2.
Participant: Yeah because in this paragraph it says ‘new development’ so maybe it suggested improvement or more easy to work or more useful…

Strategies used & comments:
Perhaps the S was trained to read only topic sentences (presumably at the beginning of a paragraph) for exam purposes under timed conditions, hence missing important information that comes in the rest of the paragraph.

Pilot Study III (Lower proficiency?)

Transcription (Q14):
Interviewer: Question 14, paragraph A, you selected title 5, so can you tell me why you selected title 5 with reference to the test?
Participant: Because I think it’s just… er… background… about the FAA, that’s why I think it’s this. The opening sentence… [was the clue]
Interviewer: How did you know that this was about background information?
Participant: Because it talk[s] about why it [was] established… um… what this… um… helped to improve air traffic control.

Strategies/skills used & comments:
Perhaps attention is drawn to “oversimplified” as signalling a general statement opening (“background”).
Interestingly, the student identifies that the paragraph is about “why it was established” but perhaps didn’t know the word “prompts” in the correct answer.

Transcription (Q15):
Interviewer: OK, all right then. What about question 15, paragraph C, you said title 8. OK, can you tell me why?
Participant: Um… because it mention the safety here [paragraph C, line 6]. It talked about the rules… for… how to reduce the … chance to having accident.

Strategies/skills used & comments:
Word spotting – “safety” appears in both the heading and the text.

Feedback questions presented to teachers

How confident were you about each of your judgements?
• Teacher 1: “Quietly confident, but for Q.21. There are some deictic words used (e.g. I usually do 'this') which confused me a bit.”
• Teacher 2: “Where I’m not sure I’ve used cautious language”

How useful was the inclusion of the participant’s test script?
• Teacher 1: “Very useful, I can read the commentary and actually check which part of the text was highlighted.”
• Teacher 2: “Very. Difficult to infer anything without it.”

How long did the task take you?
• Teacher 1: “A little over 30 mins because I do not have a printer and I have to keep going back on forth the 2 files on my laptop.”
• Teacher 2: “50 mins.”

What other evidence would you like to see that could make you more confident about your judgements?
• Teacher 1: “Can't think of anything at the moment.”
• Teacher 2: “None that I know of.”

Findings from Pilot Studies 2 & 3

• The teachers identified eight specific strategies from the participants’ verbalisations of two item types:
  – Reading for gist
  – Identifying topic sentences
  – Matching words in the question stem to words in the text (‘word spotting’)
  – Using knowledge of synonyms to construct meaning
  – Deducing meaning from context
  – Eliminating options
  – Underlining/highlighting key words
  – Scanning

[Table: per-strategy frequency counts for Pilot Study 2 (Higher proficiency?) and Pilot Study 3 (Lower proficiency?)]
• Three types of strategic behaviour (Cohen and Upton, 2006; Cohen, 2011):

Language learner strategies:
• Scanning a text to identify key words
• Knowledge of syntax to determine the type of response required
• Lexical knowledge to determine parallel meanings between text and summary

Test-management strategies:
• Comparing the grammatical structure of the text to the summary
• Comparing the coherent structure of the text to the summary
• Ensuring the proposed response matches the item requirements

Test-wiseness strategies:
• Matching lexis from the summary to the text

Pilot Study IV: Video stimulus

• Test taker: successful Chinese PhD student from the School of Management; recently passed viva voce with minor corrections.
• Teacher: the same as in Pilot Study II
• Instrument: TOEFL iBT reading – timed (30 minutes)
• Participant had not previously taken TOEFL
• Participant recorded whilst completing the test
  – Video playback used as an additional stimulus: “what were you thinking at that moment?”

Pilot Study IV: Findings

Transcription:
“… I bring my memory back to doing this kind of test, I remember we should read the question first, and this will be much more helpful to me…”
“I was imagining they would ask me something about …”
15:52. Underlines “Ambulocetus swam like modern whales”
“…they may ask me how they swim, what they look like or something… because those sentences that I highlighted gave me some information rather than just passing [reading] the sentences”
[PAUSES] “Let me think about what I thought”

• Video able to isolate specific behaviour; automatically structures the interview.
• Participant highlighted key ideas, not just key words.
• Observable strategic behaviour highlights specific processes at specific moments.
• Participant is trying to remember their thoughts, not reinterpret them.
• Evidence that video aided cognition rather than metacognition.

Pilot Study IV: TOEFL Q7 (MCQ) & Q8 (Reading to learn)

7.
The hind leg of Basilosaurus was a significant find because it showed that Basilosaurus
• lived later than Ambulocetus natans
• lived at the same time as Pakicetus
• was able to swim well
• could not have walked on land

8. It can be inferred that Basilosaurus bred and gave birth in which of the following locations?
• on land
• both on land and at sea
• in shallow water
• in a marine environment

Transcription:
“… yes, it’s based on this sentence: ‘such legs would have been far too small to have supported the 50-foot long…’ so what I understand is that they couldn’t actually… that small feet couldn’t support them to walk on land…”
“They probably cannot give birth on the land to support themselves, and certainly not both on land and sea for still the same reason… I chose the last one because I didn’t see [any reference to] shallow water, but I did see the word ‘marine’ here”.

What next? (Stimulated-recall interview research design)

Test-taker participants (n=24); each box represents one participant.

             Test 1               Test 2
Coder 1   IELTS 1   IELTS 1   IELTS 1   IELTS 1   Coder 7
Coder 2   IELTS 2   IELTS 2   IELTS 2   IELTS 2   Coder 8
Coder 3   IELTS 3   IELTS 3   IELTS 3   IELTS 3   Coder 9
Coder 4   TOEFL 1   TOEFL 1   TOEFL 1   TOEFL 1   Coder 10
Coder 5   TOEFL 2   TOEFL 2   TOEFL 2   TOEFL 2   Coder 11
Coder 6   TOEFL 3   TOEFL 3   TOEFL 3   TOEFL 3   Coder 12

• The IELTS and TOEFL tests are divided into three parts; each part forms the stimulus for two stimulated-recall interviews.
• Two complete IELTS and two complete TOEFL reading tests are used.
• Twelve ‘coders’, each of whom analyses two transcripts from the same part of each test (e.g. Coder 1 analyses verbalisations from two participants who completed part 1 of IELTS test 1).

What next?

• Recruitment of ‘successful’ participants (enrolled on an academic programme at the University of Leicester)
• Collection of participant proficiency data (existing test scores as evidence of proficiency)
• All participants to share the same L1 (Chinese).
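As a check on the arithmetic of the participant–coder design described above (24 participants, 12 coders, two transcripts each), the grid can be sketched as a small assignment script. This is a hypothetical reconstruction, assuming coders 1-6 cover the Test 1 cells and coders 7-12 cover the matching Test 2 cells; all names and the coder-numbering formula are illustrative:

```python
from itertools import product

# Hypothetical reconstruction of the stimulated-recall design grid:
# 2 exams (IELTS, TOEFL) x 3 parts x 2 complete tests x 2 participants
# per cell = 24 participants; one coder per (exam, part, test) cell.
exams, parts, tests = ["IELTS", "TOEFL"], [1, 2, 3], [1, 2]

participants = [
    {"id": i, "exam": exam, "part": part, "test": test}
    for i, (exam, part, test, _) in enumerate(
        product(exams, parts, tests, range(2)), start=1
    )
]

# Coders 1-6 take Test 1 cells (rows IELTS parts 1-3, TOEFL parts 1-3);
# coders 7-12 take the corresponding Test 2 cells, as in the grid.
def coder_for(exam: str, part: int, test: int) -> int:
    return 1 + (part - 1) + 3 * exams.index(exam) + 6 * (test - 1)

assignments = {}
for p in participants:
    coder = coder_for(p["exam"], p["part"], p["test"])
    assignments.setdefault(coder, []).append(p["id"])

print(len(participants))                           # 24 participants
print(sorted(assignments))                         # coders 1..12
print({len(ids) for ids in assignments.values()})  # each coder: 2 transcripts
```

Under these assumptions the design balances exactly: every coder analyses two transcripts drawn from the same part of the same test, and no participant is coded twice.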
• Research Questions:
  – Do test takers use the strategies that test writers claim their items activate?
  – Do test takers use the same strategies across different item types?
  – Do test takers use the same strategies in both tests?
• Hypothesis formulation:
  – Scrutinise claims of test companies regarding the strategies they claim to activate
  – Scrutinise claims that particular strategies are associated with particular item types
• Fine-grained level of analysis vs. aggregated scores as an instrument of test comparability

References

• Alderson, J. C. (2000). Assessing Reading. New York: Cambridge University Press.
• Bachman, L. F., Davidson, F., Ryan, K. & Choi, I. C. (1995). An Investigation into the Comparability of Two Tests of English as a Foreign Language: The Cambridge-TOEFL Comparability Study. Cambridge: Cambridge University Press.
• Bax, S. (2013). The cognitive processing of candidates during reading tests: Evidence from eye-tracking. Language Testing 30(4), 441-465.
• Bowles, M. (2010). The Think-Aloud Controversy in Second Language Research. New York: Routledge.
• Cohen, A. D. (2011). Strategies in Learning and Using a Second Language (2nd ed.). Harlow: Longman.
• Cohen, A. D. & Upton, T. A. (2006). Strategies in Responding to New TOEFL® Reading Tasks. TOEFL iBT Report No. iBT-10. https://www.ets.org/research/policy_research_reports/publications/report/2006/hsjr (Accessed 25th June, 2013).
• Cohen, A. D. (2012). Test-taking strategies and task design. In Fulcher & Davidson (eds.), The Routledge Handbook of Language Testing. New York and London: Routledge, 262-277.
• Educational Testing Service (ETS) (2007). Mapping TOEFL® iBT on the Common European Framework of Reference (CEFR). http://www.ets.org/Media/Campaign/5394/rsc/pdf/5684_CEF%20Flyer_HR.pdf (Accessed 25th June, 2013).
• Educational Testing Service (ETS) (2010). Linking TOEFL iBT Scores to IELTS Scores – A Research Report. http://www.ets.org/s/toefl/pdf/linking_toefl_ibt_scores_to_ielts_scores.pdf (Accessed 25th June, 2013).
• Ericsson, K., & Simon, H. (1993).
Protocol Analysis: Verbal Reports as Data (2nd ed.). Cambridge, MA: MIT Press.
• Field, J. (2008). Listening in the Language Classroom. Cambridge: Cambridge University Press.
• Fulcher, G., Davidson, F. & Kemp, J. (2011). Effective rating scale development for speaking tests: Performance decision trees. Language Testing 28(1), 5-29.
• Grabe, W. (2000). Reading research and its implications for reading assessment. In A. Kunnan (Ed.), Fairness and Validation in Language Assessment (pp. 226-260). Cambridge: Cambridge University Press.
• Holland, P. W., & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (Ed.), Educational Measurement (4th ed., pp. 187-220). Westport, CT: Greenwood.
• Khalifa, H. & Weir, C. J. (2009). Examining Reading: Research and Practice in Assessing Second Language Reading. Cambridge: Cambridge University Press.
• Mislevy, R. J. (1992). Linking Educational Assessments: Concepts, Issues, Methods and Prospects. Princeton, NJ: Educational Testing Service.
• Pearson (2009). Preliminary Estimates of Concordance between Pearson Test of English Academic and other Measures of English Language Competencies. http://www.pearsonpte.com/SiteCollectionDocuments/PremliminaryEstimatesofConcordanceUS.pdf (Accessed 25th June, 2013).
• Pressley, M., & Afflerbach, P. (1995). Verbal Protocols of Reading: The Nature of Constructively Responsive Reading. Hillsdale, NJ: Erlbaum.
• Rosenshine, B. V. (1980). Skill hierarchies in reading comprehension. In R. J. Spiro, B. C. Bruce & W. F. Brewer (eds.), Theoretical Issues in Reading Comprehension. Hillsdale, NJ: Lawrence Erlbaum, 535-559.
• Weir, C. J. (2005). Language Testing and Validation: An Evidence-Based Approach. Basingstoke: Palgrave Macmillan.
• http://secure.vec.bc.ca/toefl-equivalency-table.cfm (Accessed 25th June 2013).