
Comparing Cognitive Strategic Reading
Processes in IELTS and TOEFL iBT
Nathaniel Owen
www.le.ac.uk
This session
• Background
• Rationale
• Research Questions
• Cognitive Strategic Processing
• Methodology & Instruments
• Pilot studies & findings
• What next?
Rationale
TOEFL iBT | IELTS
0 - 8     | 0 - 1.0
9 - 18    | 1.0 - 1.5
19 - 29   | 2.0 - 2.5
30 - 40   | 3.0 - 3.5
41 - 52   | 4.0
53 - 64   | 4.5 - 5.0
65 - 78   | 5.5 - 6.0
79 - 95   | 6.5 - 7.0
96 - 120  | 7.5 - 9.0
Source: http://secure.vec.bc.ca/toefl-equivalency-table.cfm
Reproduced here: http://www.aber.ac.uk/en/media/departmental/internationalenglishcentre/englishlanguagerequirements/Compare-IELTS,-TOEFL-and-TOEIC.pdf
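The concordance above is, in effect, a step lookup from a TOEFL iBT total score to an IELTS band range. A minimal sketch in Python, using the VEC cut-offs reproduced above; treating each TOEFL upper bound as inclusive is my assumption, as the table does not specify boundary handling:

```python
# TOEFL iBT -> IELTS band concordance (cut-offs from the VEC equivalency table).
# Each tuple is (inclusive upper bound of the TOEFL range, IELTS band range).
TOEFL_TO_IELTS = [
    (8, "0-1.0"),
    (18, "1.0-1.5"),
    (29, "2.0-2.5"),
    (40, "3.0-3.5"),
    (52, "4.0"),
    (64, "4.5-5.0"),
    (78, "5.5-6.0"),
    (95, "6.5-7.0"),
    (120, "7.5-9.0"),
]

def ielts_band(toefl_score: int) -> str:
    """Return the IELTS band range concorded with a TOEFL iBT total score."""
    if not 0 <= toefl_score <= 120:
        raise ValueError("TOEFL iBT total scores range from 0 to 120")
    for upper, band in TOEFL_TO_IELTS:
        if toefl_score <= upper:
            return band
    raise AssertionError("unreachable: 120 is the final upper bound")
```

For example, `ielts_band(80)` returns "6.5-7.0", matching the 79-95 row of the table.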
Background – Equivalence studies
• Score concordance
– Prediction; scale aligning; equating (Holland & Dorans,
2006)
• Linking to external criterion
– Common European Framework of Reference (CEFR)
(A1, A2, B1, B2, C1, C2)
– ETS (2007, 2010); Pearson (2009)
• Content equivalence
– Bachman et al. (1995)
Research Questions
• What cognitive strategic processes are elicited by
the reading components of the IELTS and TOEFL
tests? To what extent are they comparable?
• Does the elicitation of different processes reflect a
differing understanding of the construct of interest?
• Are the strategic processing demands of the tests
equally finely-calibrated to distinguish between
different proficiency levels?
Cognitive strategic processing in reading for
academic purposes (Weir, 2005; Khalifa &
Weir, 2009)
[Model diagram (after Weir, 2005; Khalifa & Weir, 2009): test-taker characteristics relate to context validity (setting, demands) and cognitive validity (processing, knowledge).]
Cognitive strategic processing in reading for
academic purposes
• Lack of clear distinctions among terms such as
‘processes,’ ‘skills,’ and ‘strategies’ (Grabe, 2000)
• Possibility of distinguishing individual reading skills
(Rosenshine, 1980; Alderson, 2000)
• Identification of construct relevant and irrelevant
strategies (Cohen & Upton, 2006; Field, 2008; Cohen,
2012).
• Encompasses careful & expeditious reading – cognitive
demand (Khalifa & Weir, 2009).
• Innovative methodology – eye tracking (Bax, 2013).
Cognitive strategic processing in reading for academic
purposes (Cohen and Upton, 2006; Cohen, 2011, 2012)
• Language learner strategies
– “Thoughts and actions, consciously chosen and operationalized by language learners,
to assist them in carrying out a multiplicity of tasks” (Cohen, 2011: 7).
– Observable actions and unobservable cognitive processes which may be inferred via
the observable actions.
– Strategies employed by test takers may or may not be relevant to the construct of
interest.
• Test management strategies (construct-relevant?)
– “Strategies for responding meaningfully to test items and tasks” (Cohen, 2012: 263).
– Test management strategies are those that operationalize the skills associated with
reading (Cohen, 2011: 2).
• Test-wiseness strategies (construct-irrelevant)
– “Test-wiseness refers to the use of peripheral information to answer items without
going through the expected linguistic and cognitive processes” (Cohen, 2011: 2).
– Represents construct-irrelevant behaviour by test takers using their knowledge of
item types to select responses rather than engage in domain-relevant cognitive
processing.
Instruments (authentic test scripts & items)
• IELTS
– Composed of three texts & 40 questions
– 14 item types
– 1 hour
• TOEFL iBT
– 3-5 texts; 12-14 questions per passage
– 10 item types (inc. ‘reading to learn’ items)
– 60-100 minutes
– (N.B. Study does not incorporate integrated-item
types e.g. reading into writing, but does include
reading for learning items)
Methodology I: Stimulated-recall interviews
• Verbal protocols a “window into the minds of learners” (Bowles, 2010: 2).
• Verbalisation of task completion procedure
– Retrospective (non-metalinguistic)
• Advantages
– Direct contribution to theory-building (Oxford, 1996)
– Direct, not inferential
– Identification of affective and construct-irrelevant processes (Pressley &
Afflerbach, 1995)
• Caveats
– Reactivity (impact of methodology on findings)
– Veridicality (truthfulness of claims)
(Ericsson and Simon, 1993)
• Intervening variable
– Nationality – student participants all Chinese L1 (Mandarin).
Methodology II: Addressing caveats
• Four pilot studies addressed methodological
concerns
– Pilot Study I – think-aloud vs. stimulated recall
(reactivity)
– Pilot Studies II & III – Appropriateness of stimuli
(annotated text & responses) (veridicality)
– Pilot Study IV – Appropriateness of stimulus
(video prompt) (veridicality)
Pilot Study I
• IELTS (one text, thirteen questions, two item types, approx. 20 minutes)
• Concurrent think-aloud
• Reflective feedback
• Authenticity
• Training & familiarity with item types evident
• Individualised item feedback
• Metacognitive verbalisations
Transcription
“…I have no time control… I am told I need 20 minutes to finish the test, but I didn’t know how to manage the time well. I have done this kind of reading before… I first looked at the key words and each paragraph, there are key words linked to the headings…”
“…If I have problems with for example B, I go straight to C… the information isn’t familiar so it’s a bit difficult for me to read, even though I understand [the words] I don’t need to understand all the information… I used scanning, and I did not really read all the texts word by word, just found the key words… for question 14 to 19, I used strategies like crossing out. For questions 20 to 26, I looked at the key words… so I located the place and I found the rest of the verse… if I find some of the information is not similar to this, I could say it’s ‘false’….”
Pilot Studies II & III – Procedural changes
• Significant changes to data collection procedure:
– Test conducted under timed conditions (participant should have a watch to be able to manage time)
– Conducted under test conditions
– Immediate retrospective rather than think-aloud
– Researcher present for, and directs, the stimulated-recall interview (ensuring the session progresses item by item)
– Presents new challenges in terms of researcher participation
• The transcripts were passed to practising teachers of EAP
• Teachers presented with student verbalisations (transcribed), student
responses, the correct responses and the script that the student worked
on.
Participant Information
Test taker 1 (Pilot Study 2)
• Overall IELTS score of 6.5 (test taken July 2012); reading score of 6.0.
• Currently studying on a Medical Physiology foundation course at the University of Leicester; the foundation course has a focus on English for medical purposes.
• Has not attended any courses at the ELTU.
• Teacher 1: recently completed DELTA; holds an MA in Applied Linguistics from the University of Leicester.
Test taker 2 (Pilot Study 3)
• Overall IELTS score of 5.5 (test taken April 2012); no reading score available.
• Arrived in Leicester in October 2012; studied three courses at the ELTU (Courses B, C and D).
• Intending to undertake a Master’s degree in Financial/Mathematical modelling in the Department of Economics.
• Teacher 2: full-time at the English Language Teaching Unit (ELTU), University of Leicester.
Pilot Study II (Higher proficiency?)
Transcription
Strategies used & comments
Interviewer: So, we’ll go through it item by item. So,
question 14, paragraph A, you chose title 1.
Can you explain why you chose title 1?
Participant: Yeah, because when I read this, I first read the
titles, and then went to find something… this
way, paragraph A also said ‘what is the FAA,’
so generally speaking, it just introduced [it],
so I chose one.
Interviewer: So, it’s matching ‘FAA’ in the text to the
title…
Participant: Yeah, because it says the FAA regulations, so
it [matched]…
Teacher comment: The S probably tried reading for gist due to the time constraint. However, s/he was unable to deduce the meanings of key unknown words, e.g. ‘establishment’, which could have helped her to choose the best heading. Instead, s/he resorted to an easier strategy (matching a heading that contains the same word as in the paragraph).
Interviewer: OK. Number 15, paragraph C, you chose title
2.
Participant: Yeah because in this paragraph it says ‘new
development’ so maybe it suggested
improvement or more easy to work or more
useful…
Teacher comment: Perhaps the S was trained to read only topic sentences (presumably at the beginning of a paragraph) for exam purposes under timed conditions, hence missing important information that comes in the rest of the paragraph.
Pilot Study III (Lower proficiency?)
Transcription
Strategies/skills used & comments
Interviewer: Question 14, paragraph A, you selected title
5, so can you tell me why you selected title 5
with reference to the test?
Participant: Because I think it’s just… er… background…
about the FAA, that’s why I think it’s this. The
opening sentence… [was the clue]
Interviewer: How did you know that this was about
background information?
Participant: Because it talk[s] about why it [was]
established… um… what this… um… helped to
improve air traffic control.
Teacher comment: Perhaps attention is drawn to “oversimplified” as signalling a general statement opening (“background”). Interestingly, the student identifies that the paragraph is about “why it was established” but perhaps didn’t know the word “prompts” in the correct answer.
Interviewer: OK, all right then. What about question 15,
paragraph C, you said title 8. OK, Can you tell
me why?
Participant: Um… because it mention the safety here
[paragraph C, line 6]. It talked about the
rules… for… how to reduce the … chance to
having accident.
Teacher comment: Word spotting – “safety” appears in heading and text.
Feedback questions presented to teachers
How confident were you about each of your judgements?
– Teacher 1: “Quietly confident, but for Q.21. There are some deictic words used (e.g. I usually do 'this') which confused me a bit.”
– Teacher 2: “Where I’m not sure I’ve used cautious language.”
How useful was the inclusion of the participant’s test script?
– Teacher 1: “Very useful, I can read the commentary and actually check which part of the text was highlighted.”
– Teacher 2: “Very. Difficult to infer anything without it.”
How long did the task take you?
– Teacher 1: “A little over 30 mins because I do not have a printer and I have to keep going back and forth between the 2 files on my laptop.”
– Teacher 2: “50 mins.”
What other evidence would you like to see that could make you more confident about your judgements?
– Teacher 1: “Can't think of anything at the moment.”
– Teacher 2: “None that I know of.”
Findings from Pilot Studies 2 & 3
• The teachers identified eight specific strategies from the participants’
verbalisations of two item types:
– Reading for gist
– Identifying topic sentences
– Matching words in the question stem to words in the text (‘word spotting’)
– Using knowledge of synonyms to construct meaning
– Deducing meaning from context
– Eliminating options
– Underlining/highlighting key words
– Scanning
[Table: frequency counts for each strategy in Pilot Study 2 (higher proficiency?) and Pilot Study 3 (lower proficiency?).]
• Three types of strategic behaviour (Cohen and Upton, 2006; Cohen, 2011):
Language learner strategies
Scanning a text to identify key words
Knowledge of syntax to determine the type of response required
Lexical knowledge to determine parallel meanings between text and summary.
Test-management strategies
Comparing the grammatical structure of the text to the summary
Comparing the coherent structure of the text to the summary
Ensuring the proposed response matches the item requirements.
Test-wiseness strategies
Matching lexis from the summary to the text.
Pilot Study IV: Video stimulus
• Test taker: Successful Chinese PhD student from
School of Management. Recently passed viva voce
with minor corrections.
• Teacher: the same as in Pilot Study II
• Instrument: TOEFL iBT reading
– Timed (30 minutes)
• Participant had not previously taken TOEFL
• Participant recorded whilst completing test.
– Video playback as additional stimulus
Pilot Study IV: Video stimulus – “what were you thinking at
that moment?”
Pilot Study IV: Findings
• Video able to isolate specific behaviour; automatically structures interview.
• Participant highlighted key ideas, not just key words.
• Observable strategic behaviour highlights specific processes at specific moments.
• Participant is trying to remember their thoughts, not reinterpret them.
• Evidence that video aided cognition rather than metacognition.
Transcription
“… I bring my memory back to doing this kind of test, I remember we should read the question first, and this will be much more helpful to me…”
“I was imagining they would ask me something about …”
15:52. Underlines “Ambulocetus swam like modern whales.”
“…they may ask me how they swim, what they look like or something… because those sentences that I highlighted gave me some information rather than just passing [reading] the sentences”
[PAUSES] “Let me think about what I thought”
Pilot Study IV: TOEFL Q7 (MCQ) & Q8 (Reading to learn)
7. The hind leg of Basilosaurus was a significant find because it showed that Basilosaurus
• lived later than Ambulocetus natans
• lived at the same time as Pakicetus
• was able to swim well
• could not have walked on land
8. It can be inferred that Basilosaurus bred and gave birth in which of the following locations?
• on land
• both on land and at sea
• in shallow water
• in a marine environment
Transcription
“… yes, it’s based on this sentence: ‘such legs would have been far too small to have supported the 50-foot long…’ so what I understand is that they couldn’t actually… that small feet couldn’t support them to walk on land…”
“They probably cannot give birth on the land to support themselves, and certainly not both on land and sea for still the same reason… I chose the last one because I didn’t see [any reference to] shallow water, but I did see the word ‘marine’ here.”
What next? (Stimulated-recall interview research design)
Test-taker Participants (n=24)
[Design diagram: six rows of four boxes – IELTS 1, IELTS 2, IELTS 3, TOEFL 1, TOEFL 2, TOEFL 3 – under the column headings Test 1 and Test 2, with Coders 1–6 on one side and Coders 7–12 on the other.]
Each box represents one participant
IELTS and TOEFL tests divided into three parts. Each part forms the stimulus for two
stimulated-recall interviews.
Two complete IELTS and two complete TOEFL reading tests are used.
Twelve ‘coders’, each of whom analyses two transcripts from the same part of each test (e.g. Coder 1 analyses verbalisations from two participants who completed part 1 of IELTS test 1).
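One plausible reading of this allocation can be sketched in code. The participant labels (P01–P24) and the split of transcripts between the two coder groups are my assumptions for illustration; the exact crossing of the Test 1 / Test 2 columns in the diagram is not fully recoverable here:

```python
# Sketch of the design: 24 participants, 6 test-part stimuli (3 IELTS parts,
# 3 TOEFL parts), four participants per part; 12 coders, each analysing two
# transcripts from the same part (e.g. Coder 1 takes two part-1 IELTS transcripts).
test_parts = [f"IELTS {i}" for i in (1, 2, 3)] + [f"TOEFL {i}" for i in (1, 2, 3)]
participants = [f"P{n:02d}" for n in range(1, 25)]

# Four participants per part (each box in the diagram is one participant).
assignment = {part: participants[i * 4:(i + 1) * 4]
              for i, part in enumerate(test_parts)}

# Coders 1-6 take the first two transcripts of each part, Coders 7-12 the other two.
coders = {}
for i, part in enumerate(test_parts):
    coders[f"Coder {i + 1}"] = assignment[part][:2]
    coders[f"Coder {i + 7}"] = assignment[part][2:]
```

Under this reading, every coder sees exactly two transcripts and every transcript is coded once, which is what makes the cross-part comparison of strategy codings possible.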
What next?
• Recruitment of ‘successful’ participants (enrolled on an academic programme at the University of Leicester)
• Collection of participant proficiency data (existing test scores as evidence of proficiency)
• All participants to share the same L1 (Chinese).
• Research Questions:
– Do test takers use the strategies that test writers claim their items activate?
– Do test takers use the same strategies across different item types?
– Do test takers use the same strategies in both tests?
• Hypothesis formulation:
– Scrutinise claims of test companies regarding the strategies they claim to activate
– Scrutinise claims that particular strategies are associated with particular item types
• Finely-grained level of analysis vs. aggregated scores as an instrument of test comparability
References
• Alderson, J. C. (2000). Assessing Reading. New York: Cambridge University Press.
• Bachman, L. F., Davidson, F., Ryan, K. & Choi, I. C. (1995). An Investigation into the Comparability of Two Tests of English as a Foreign Language: The Cambridge–TOEFL Comparability Study. Cambridge: Cambridge University Press.
• Bax, S. (2013). The cognitive processing of candidates during reading tests: Evidence from eye-tracking. Language Testing 30(4), 441-465.
• Bowles, M. (2010). The Think-Aloud Controversy in Second Language Research. New York: Routledge.
• Cohen, A. D. & Upton, T. A. (2006). Strategies in Responding to New TOEFL® Reading Tasks. TOEFL iBT Report No. iBT-10. https://www.ets.org/research/policy_research_reports/publications/report/2006/hsjr (Accessed 25th June, 2013).
• Cohen, A. D. (2012). Test-taking strategies and task design. In Fulcher, G. & Davidson, F. (eds.), The Routledge Handbook of Language Testing. New York and London: Routledge, 262-277.
• Educational Testing Service (ETS) (2007). Mapping TOEFL® iBT on the Common European Framework of Reference (CEFR). http://www.ets.org/Media/Campaign/5394/rsc/pdf/5684_CEF%20Flyer_HR.pdf (Accessed 25th June, 2013).
• Educational Testing Service (ETS) (2010). Linking TOEFL iBT Scores to IELTS Scores – A Research Report. http://www.ets.org/s/toefl/pdf/linking_toefl_ibt_scores_to_ielts_scores.pdf (Accessed 25th June, 2013).
• Ericsson, K. & Simon, H. (1993). Protocol Analysis: Verbal Reports as Data (2nd ed.). Boston: MIT Press.
• Field, J. (2008). Listening in the Language Classroom. Cambridge, UK: Cambridge University Press.
• Fulcher, G., Davidson, F. & Kemp, J. (2011). Effective rating scale development for speaking tests: Performance decision trees. Language Testing 28(1), 5-29.
• Grabe, W. (2000). Reading research and its implications for reading assessment. In A. Kunnan (ed.), Fairness and Validation in Language Assessment (pp. 226-260). Cambridge: Cambridge University Press.
• Holland, P. W. & Dorans, N. J. (2006). Linking and equating. In R. L. Brennan (ed.), Educational Measurement (4th ed., pp. 187-220). Westport, CT: Greenwood.
• Mislevy, R. J. (1992). Linking Educational Assessments: Concepts, Issues, Methods and Prospects. Princeton, NJ: Educational Testing Service.
• Pearson (2009). Preliminary Estimates of Concordance between Pearson Test of English Academic and Other Measures of English Language Competencies. http://www.pearsonpte.com/SiteCollectionDocuments/PremliminaryEstimatesofConcordanceUS.pdf (Accessed 25th June, 2013).
• Pressley, M. & Afflerbach, P. (1995). Verbal Protocols of Reading: The Nature of Constructively Responsive Reading. Hillsdale, NJ: Erlbaum.
• Rosenshine, B. V. (1980). Skill hierarchies in reading comprehension. In Spiro, R. J., Bruce, B. C. & Brewer, W. F. (eds.), Theoretical Issues in Reading Comprehension. Hillsdale, NJ: Lawrence Erlbaum, 535-559.
• TOEFL equivalency table. http://secure.vec.bc.ca/toefl-equivalency-table.cfm (Accessed 25th June, 2013).