Principles of Language Assessment

Practicality
• Cost
• time constraints
• administration issues
• specific and time-efficient scoring/ evaluation procedure
• Computerized tests: where can they be administered?
Reliability
• consistent
• dependable
• Same test same students, different occasions same results?
Student-related reliability
• illness
• fatigue
• anxiety
• other physical/ psychological factors
• observed score deviates from the ‘true score’ (see the note after this list)
• test-wiseness: strategies for efficient test taking
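A brief note (not on the original slide): in classical test theory the relationship between the observed score and the ‘true score’ is usually written as

$$X = T + E$$

where $X$ is the score the student actually obtains, $T$ is the hypothetical true score, and $E$ is the error contributed by factors such as illness, fatigue, anxiety or test-wiseness. The more reliable the test, the smaller and less systematic $E$ is.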
Rater reliability
• Inter-rater reliability (see the note below)
• Intra-rater reliability
• E.g. rater’s fatigue → different scoring on the first few papers vs. the last ones
Remedy:
• reading some papers before grading
• analytical scoring rubrics
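A hedged aside (not in the slides): both kinds of rater reliability are often estimated with a correlation coefficient, e.g. Pearson’s $r$ between two raters’ scores on the same set of papers (inter-rater) or between one rater’s scores on two occasions (intra-rater):

$$r = \frac{\operatorname{cov}(S_1, S_2)}{\sigma_{S_1}\,\sigma_{S_2}}$$

where $S_1$ and $S_2$ are the two sets of scores. Values close to 1 indicate consistent scoring; analytical rubrics and reading a sample of papers before grading help push $r$ upward.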
Test administration reliability
• street noise
• quality of photocopies
• light in the room
• temperature
• desks and chairs
Test reliability
• Nature of the test
• Too long / tightly timed / dependent on content knowledge / ambiguous test items / more than one correct answer
Validity
• Purpose of the test?
• Inferences made from assessment → appropriate, meaningful, useful
• Prior knowledge required?
• Test performance → course/unit
• How well a test measures whether students have reached the goals/level of competence
Content-related evidence
• Test samples the subject matter
• Test requires target performance
• Achievement clearly defined
Remedy:
• Direct testing
• Extensive scrutiny
Criterion-related evidence
• Comparison of results with another measure of the same criterion
• E.g. commercially produced tests
• Concurrent validity if the results are parallel
• Predictive validity → likelihood of future success
Construct-related evidence
• E.g. proficiency, communicative competence
• E.g. interview → pron., fluency, accuracy, vocab, sociolinguistic appropriateness
• Theoretical construct: e.g. believing that ‘These are major components of oral proficiency.’
• E.g. unit focus on communicative use of vocabulary → a test asking only for definitions misses the construct
• E.g. TOEFL → oral production in academic contexts
Consequential validity
• Intended and unintended impact of the test on test takers
• Social consequences
• E.g. YDS, KPSS, TOEFL for SUNY
• E.g. dershanes (private test-preparation courses), coaching
Face Validity
• Students perceive the test as fair, relevant, useful, important → VALID
• Looks right
• Appears to measure the right thing
Remedy:
• Expected format, familiar tasks
• Doable in allotted time
• Clear, uncomplicated items and directions
• Relevant to course work (content validity)
• Reasonable challenge
Authenticity
• Test task real language task
• Likely to be enacted in real world
Remedy:
• Language natural
• Items contextualized
• Thematic organization (episodic)
• Tasks represent/ approximate real-world tasks
• E.g. reading texts; listening with hesitations, white noise, interruptions
• Large-scale tests → productive skills?
Washback/ Backwash
• Effect of testing on teaching, instruction and learning
• after or before the test
• ‘Teaching to the test’
• Intrinsic motivation, autonomy, language ego, strategic investment
• Feedback, constructive criticism, hints → dialogue, cooperative learning
Scenarios
For each scenario below, consider its practicality, rater reliability, test reliability, content validity, face validity, and authenticity.

Scenario 1: Standardized multiple-choice proficiency test. No oral or written production. S receives a report form of scores for listening, grammar, proofreading, and reading comprehension.

Scenario 2: Timed impromptu test of written English (TWE). S receives a report form of scores ranging between 0 and 6.

Scenario 3: One-on-one oral interview to assess overall production ability. S receives one holistic score between 0 and 5.
Scenario 4: Multiple-choice listening quiz provided by a textbook, with taped prompts, covering the content of a three-week module of a course. S receives a total score from T with no indication of which items were correct/incorrect.

Scenario 5: S is given a sheet with 10 vocabulary items and directed to write 10 sentences using each word. T marks each item as acceptable/unacceptable; S receives the sheet back with items marked and a total score ranging from 0 to 10.
Scenario 6: S reads a passage of three paragraphs and responds to six multiple-choice general comprehension items. S receives a score report showing which items were correct and incorrect.

Scenario 7: S gives a 5-minute prepared oral presentation in class. T evaluates by filling in a rating sheet indicating S’s success in delivery, rapport, pronunciation, grammar, and content.

Scenario 8: S listens to a 15-minute video lecture and takes notes. T makes individual comments on each S’s notes.
Scenario 9: S writes a take-home (overnight) one-page essay on an assigned topic. T reads the paper, comments on organization and content only, and returns the essay to S for a subsequent draft.

Scenario 10: S creates multiple drafts of a three-page essay, peer- and T-reviewed, and turns in a final version. T comments on grammatical/rhetorical errors only and returns it to S.
Scenario 11: S assembles a portfolio of materials over a semester-long course; T conferences with S on the portfolio at the end of the semester.

Scenario 12: S writes a dialogue journal over the course of a semester. T comments on entries every two weeks.