Missing vs. Zero
Zarko Vukmirovic
[email protected]
Community of Practice: Data and Metrics Workshop
January 8-9, 2013

Missing vs. Zero: What Is the Issue?
• Focus on: missing data originating from educational assessments (item responses)
• Not dealing with: missing group-membership and background-information data
• Consider the nature of missing data:
– Identify situations where missing data are valid and can be replaced by a certain value (zero or other)
– Identify situations where missing data indicate that a value is not known
• Discuss and reach agreement about solutions for both types of missing data

Missing vs. Zero: Consider "Stimulus-Response"
• Consider a definition of a test as a set of small experiments or "Stimulus-Response" (S-R) situations
• Evaluate all possible meanings of "blanks" in the data:
1. Student was exposed to "S" but failed to respond
2. Student was exposed to "S" but the response was lost
3. Student was not appropriately exposed to "S"
4. Student was not exposed to "S" – items not reached
5. Student was not exposed to "S" – items not presented

1. Missing Data Is Valid = Zero
• Student was exposed to "S" but failed to give a response
– Student does not know the answer
– Else?
• Missing is valid data:
– Assign ZERO points (or another conventional value)
• Most plausible solution for omitted item responses, except for items at the end of a timed "power" test

2. Missing Data – Value Was Lost
• Student was exposed to "S" but the response was lost
– Response was illegible
– Response was incorrectly recorded
– Response was not entered
– Else?
• Missing means a value is not known:
– Estimate what the values would have been had they not been lost

3. Missing Data – Value Invalidated
• Student was not appropriately exposed to "S"
– Erroneous testing material (e.g., bad print, missing pages)
– Environmental distractions (e.g., noise during some part of the test administration)
– Else?
• Data are invalidated (and converted to missing) because student responses presumably do not reflect the measured construct
– Estimate what the values would have been if the students had been appropriately exposed to "S"

4. Missing Data – Items Not Reached
• Student was not exposed to "S"
– Because of test timing, some items are not reached
– Consider the definition of the measured construct
• In "power" tests, not reached means not known
– Estimate what the values would have been if the student had had more time, or
– Determine the student score only from reached items
– Issue: how to accurately identify reached items
• In "speed" tests, or tests that tap both power and speed, not reached means not successfully done
– Assign ZERO points

5. Missing Data – Items Not Presented
• Student was not exposed to "S"
– In certain multi-level test designs, not all items need to be presented to examinees
• Some test designs allow skipping easy items that are below the examinee's level and not presenting difficult items that are above the examinee's level
– Individually administered tests
– Computer adaptive tests
• It is assumed that, among not-presented items, easy ones would be answered correctly and difficult ones would not
• Student scores are determined only from presented items, using special scaling procedures that consider the difficulty of the items that were presented

When Missing Is Missing
• Treatment of missing data traditionally involves two major strategies:
– Case elimination
– Value imputation
• These strategies may be implemented somewhat differently depending on the purpose of analysis:
– Evaluation of student characteristics (computation of student scores for individual or institutional reports)
– Evaluation of item and test characteristics (computation of item and test statistics, scaling, equating, etc.)
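The five "Stimulus-Response" blank types and their treatments described above can be sketched as a simple decision rule. This is an illustrative sketch only: all names (`BlankType`, `treatment`, the `speeded` flag) are hypothetical, not part of any real scoring system.

```python
# Hypothetical sketch: map the five "Stimulus-Response" blank types to a
# treatment decision. All names are illustrative, not from any real API.

from enum import Enum

class BlankType(Enum):
    OMITTED = 1          # exposed to "S", no response given
    LOST = 2             # exposed to "S", response lost/illegible/not entered
    INVALIDATED = 3      # not appropriately exposed (bad print, noise)
    NOT_REACHED = 4      # not reached because of test timing
    NOT_PRESENTED = 5    # not presented by design (multi-level/adaptive tests)

def treatment(blank: BlankType, speeded: bool = False) -> str:
    """Return 'zero' when missing is valid data, 'estimate' when the value
    is not known, 'exclude' when only presented items are scored."""
    if blank is BlankType.OMITTED:
        return "zero"                      # most plausible: assign zero points
    if blank in (BlankType.LOST, BlankType.INVALIDATED):
        return "estimate"                  # impute what the value would have been
    if blank is BlankType.NOT_REACHED:
        # power test: not known; speed (or mixed) test: not successfully done
        return "zero" if speeded else "estimate"
    return "exclude"                       # NOT_PRESENTED: score presented items only

print(treatment(BlankType.OMITTED))                    # zero
print(treatment(BlankType.NOT_REACHED, speeded=True))  # zero
```

For not-reached items in a power test, "estimate" stands in for either of the two options on the slide: imputing the value or scoring only the reached items.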
Calibration of Students
• In practice it is often necessary to generate student scores for all test takers (e.g., high-stakes tests); thus, case elimination may be undesirable.
• The following rules need to be defined:
– Valid case: specify the percentage of missing data that is still acceptable for generating a reliable and valid student score (e.g., max 25%)
– Missing as valid: specify when blanks will be assigned zero points (e.g., omitted responses for all reached items)
– Missing as not known: specify the imputation technique (if any) to be used for not-known values (e.g., replace missing with values computed by multiple regression)
– Computation of the student score (e.g., percentage correct out of reached items)

Calibration of Items and Tests
• Both case elimination and imputation strategies may be considered
• Case elimination rules:
– List-wise exclusion, when the sample size is large and missing data are random
– Pair-wise exclusion maximizes available information; however, item parameters are based on different samples
• Imputation techniques:
– Substitution by a mean
• Horizontal: the mean that a student has on other valid items
• Vertical: the mean that other students have on a particular item
– Substitution by a value predicted by regression using other items (and non-test data) as predictors
– Maximum likelihood
– Multiple imputation

Computation of Student Scores
• Percent correct (out of total):
– number correct / total number of points
• Percent correct (out of reached):
– number correct / number of points from reached items
– Note: this yields the same result as substitution with the horizontal mean
• Adjusted percent correct (out of reached), based on the ratio of the average difficulty of reached and non-reached items:
– (number correct / number of points from reached items) * adjustment factor
– Adjustment factor: (average P for non-reached items / average P for reached items)
– Rationale of the adjustment: if non-reached items are easier than reached ones, the actual percent correct on them would be higher; thus the total percent correct can be expected to be higher than the percent correct on reached items (and vice versa)

Summary
• The Missing vs. Zero decision is based on "S-R" paradigm considerations
• When missing is valid: assign zero
• When missing is not known: eliminate cases or impute values; consider possible bias in the missing data
• Consider the purpose of analysis: evaluation of students or of items & tests

Discussion
• Questions and answers
• Discuss the nature of the "S-R" paradigm in EGRA and EGMA tests
• Discuss the specific nature of missing data in EGRA and EGMA
• Agree on a strategy for the Missing vs. Zero decision
• Agree on a strategy and procedures for treating missing data in EGRA and EGMA
• THANK YOU
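The three score computations described under "Computation of Student Scores" can be sketched as follows. This is a minimal illustration assuming dichotomous (0/1) items, with `None` marking a not-reached item; the function names and example data are hypothetical.

```python
# Illustrative sketch of the three student-score computations, assuming
# dichotomous (0/1) items; None marks a not-reached item.

def pct_correct_total(responses):
    """Percent correct out of total: number correct / total number of points."""
    return sum(r or 0 for r in responses) / len(responses)

def pct_correct_reached(responses):
    """Percent correct out of reached items only. Yields the same result as
    imputing each not-reached item with the student's horizontal mean."""
    reached = [r for r in responses if r is not None]
    return sum(reached) / len(reached)

def pct_correct_adjusted(responses, p_values):
    """Adjusted percent correct: percent correct on reached items times
    (average P of non-reached items / average P of reached items),
    where P is the item's proportion correct in the sample."""
    reached_p = [p for r, p in zip(responses, p_values) if r is not None]
    missed_p = [p for r, p in zip(responses, p_values) if r is None]
    if not missed_p:                      # nothing to adjust for
        return pct_correct_reached(responses)
    factor = (sum(missed_p) / len(missed_p)) / (sum(reached_p) / len(reached_p))
    return pct_correct_reached(responses) * factor

# A student answers 4 of 6 items (3 correct) and does not reach the last two:
resp = [1, 1, 0, 1, None, None]
p = [0.6, 0.5, 0.4, 0.5, 0.65, 0.55]     # non-reached items are easier on average
print(pct_correct_total(resp))           # 0.5
print(pct_correct_reached(resp))         # 0.75
print(round(pct_correct_adjusted(resp, p), 3))  # 0.9
```

In the example, the adjustment factor is 0.6 / 0.5 = 1.2, so the adjusted score (0.9) lands above the reached-only score whenever the non-reached items are easier, matching the rationale on the slide.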