Principles of Language Assessment

Practicality
• Cost
• Time constraints
• Administration issues
• Specific and time-efficient scoring/evaluation procedure
• Computerized tests: where?

Reliability
• Consistent
• Dependable
• Same test, same students, different occasions: same results?

Student-related reliability
• Illness
• Fatigue
• Anxiety
• Other physical/psychological factors
• Observed score deviates from ‘true score’
• Test-wiseness: strategies for efficient test taking

Rater reliability
• Inter-rater reliability
• Intra-rater reliability
• E.g.: rater’s fatigue leads to different scoring in the first few papers and the last ones
Remedy:
• Reading some papers before grading
• Analytical scoring rubrics

Test administration reliability
• Street noise
• Photocopies
• Light in the room
• Temperature
• Desks and chairs

Test reliability
• Nature of the test
• Too long / timed / content knowledge / ambiguous test items / more than one correct answer

Validity
• Purpose of the test?
• Inferences made from assessment are appropriate, meaningful, useful
• Former knowledge required?
• Test performance in relation to the course/unit
• How well a test measures whether students reached the goals/level of competence

Content-related evidence
• Test samples the subject matter
• Test requires target performance
• Achievement clearly defined
Remedy:
• Direct testing
• Extensive scrutiny

Criterion-related evidence
• Comparison of results with another measure of the same criterion
• E.g.: commercially produced tests
• Concurrent validity: the results are parallel
• Predictive validity: likelihood of future success

Construct-related evidence
• E.g.: proficiency, communicative competence
• E.g.: interview: pronunciation, fluency, accuracy, vocabulary, sociolinguistic appropriateness
• Theoretical construct, e.g.: believing that ‘These are major components of oral proficiency.’
• E.g.: unit focus: vocabulary, communicative use
• Definitions
• E.g.: TOEFL: oral production in academic contexts

Consequential validity
• Intended and unintended impact on the test takers
• Social consequences
• E.g.: YDS, KPSS, TOEFL for SUNY
• E.g.: dershanes (private tutoring centers), coaching

Face validity
• Students see the test as fair, relevant, useful, important: VALID
• Looks right
• Appears to measure the right thing
Remedy:
• Expected format, familiar tasks
• Doable in allotted time
• Clear, uncomplicated items and directions
• Relevant to course work (content validity)
• Reasonable challenge

Authenticity
• Test task corresponds to a real language task
• Likely to be enacted in the real world
Remedy:
• Language natural
• Items contextualized
• Thematic organization (episodic)
• Tasks represent/approximate real-world tasks
• E.g.: reading texts, listening with hesitations, white noise, interruptions
• Large scale: productive skills

Washback/Backwash
• Effect of testing on teaching, instruction and learning
• After or before the test
• ‘Teaching to the test’
• Intrinsic motivation, autonomy, language ego, strategic investment
• Feedback, constructive criticism, hints, dialogue, cooperative learning

Practicality

Scenario 1: Standardized multiple-choice proficiency test.
No oral or written production. S receives a report form of scores for listening, grammar, proofreading, and reading comprehension.

Scenario 2: Timed impromptu test of written English (TWE). S receives a report form of scores ranging between 0 and 6.

Scenario 3: One-on-one oral interview to assess overall production ability. S receives one holistic score between 0 and 5.

Rater reliability | Test reliability | Content validity | Face validity | Authenticity | Practicality

Scenario 4: Multiple-choice listening quiz provided by a textbook with taped prompts, covering the content of a three-week module of a course. S receives a total score from T with no indication of which items were correct/incorrect.

Scenario 5: S is given a sheet with 10 vocabulary items and directed to write 10 sentences, one using each word. T marks each item as acceptable/unacceptable, and S receives the test sheet back with items marked and a total score ranging from 0 to 10.

Rater reliability | Test reliability | Content validity | Face validity | Authenticity | Practicality

Scenario 6: S reads a passage of three paragraphs and responds to six multiple-choice general comprehension items. S receives a score report showing which items were correct and incorrect.

Scenario 7: S gives a 5-minute prepared oral presentation in class. T evaluates by filling in a rating sheet indicating S’s success in delivery, rapport, pronunciation, grammar, and content.

Scenario 8: S listens to a 15-minute video lecture and takes notes. T makes individual comments on each S’s notes.

Rater reliability | Test reliability | Content validity | Face validity | Authenticity | Practicality

Scenario 9: S writes a take-home (overnight) one-page essay on an assigned topic. T reads the paper and comments on organization and content only, and returns the essay to S for a subsequent draft.

Scenario 10: S creates multiple drafts of a three-page essay, peer- and T-reviewed, and turns in a final version. T comments on grammatical/rhetorical errors only, and returns it to S.

Rater reliability | Test reliability | Content validity | Face validity | Authenticity | Practicality

Scenario 11: S assembles a portfolio of materials over a semester-long course. T conferences with S on the portfolio at the end of the semester.

Scenario 12: S writes a dialogue journal over the course of a semester. T comments on entries every two weeks.

Rater reliability | Test reliability | Content validity | Face validity | Authenticity
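
Rater reliability, one of the criteria in the grid above, can be made concrete with an agreement statistic. The sketch below is not part of the original slides: it assumes two hypothetical raters who mark the same ten vocabulary sentences as acceptable (1) or unacceptable (0), and it computes simple percent agreement plus Cohen's kappa, which corrects that agreement for chance. The function names and the sample marks are illustrative assumptions only.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of items on which the two raters gave the same mark."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must mark the same non-empty set of items.")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    # Chance agreement: probability that both raters pick the same category
    # if each marked independently at their own observed rates.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category; agreement is trivial.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical marks for ten sentences (1 = acceptable, 0 = unacceptable).
marks_rater_1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
marks_rater_2 = [1, 0, 0, 1, 0, 1, 1, 1, 1, 1]

agreement = percent_agreement(marks_rater_1, marks_rater_2)
kappa = cohen_kappa(marks_rater_1, marks_rater_2)
print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

The same two functions could be applied to one rater's marks on two occasions as a rough check of intra-rater reliability; how high a value counts as acceptable is a judgment the slides do not specify.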