Practical Measurement: An Argument-Based Approach to Exploring Alternative Psychometric Validity Evidence
Jeff J. Kosovich, Chris S. Hulleman, University of Virginia; Jessica K. Flake, York University

Background

The Construct
• Achievement motivation: the factors that direct and energize individual behavior and choice in school.
• The Expectancy-Value framework of achievement motivation posits three components of motivation:
  – Expectancy: the degree to which students believe they have the ability, and can put forth the effort, to succeed.
  – Value: the importance, usefulness, and enjoyment an individual associates with a particular task.
  – Cost: the perceived psychological, physical, and temporal barriers that prevent the individual from succeeding.
• Expectancy, value, and cost relate to important educational outcomes such as interest in a domain, self-regulatory behaviors, and academic performance.
• Expectancy and value are typically positively correlated with each other and negatively related, or unrelated, to cost.

The Practical Problem
• Measurement experts have outlined extensive, rigorous standards by which to compile and judge scale quality (i.e., validation).
• Longer scales tend to display better reliability and arguably represent construct breadth more fully than short measures.
• In applied settings, however, there are time constraints and concerns about participants' responsiveness.
• These practical constraints conflict with best practices for validation.
• Scales are therefore often adapted or retrofitted to be practical in these settings.
• Contemporary argument-based validation approaches can be used to ensure quality measures that are also practical.
• Interpretation-and-use arguments identify the desired applications of a measure, the underlying assumptions of those uses, and the evidence needed to support those assumptions.

Purpose
• To examine the viability of interpretation-and-use arguments for high-quality measures that function within applied research constraints.
• To apply the interpretation-and-use framework to a measure of achievement motivation in educational interventions.

Intended Uses
1. Describe Classroom Motivation: capturing motivation in the classroom can help explain student struggles and can be leveraged to improve student success.
2. Predict Outcomes: motivation measures can be used to predict outcomes (e.g., interest), either as focal processes or as covariates.
3. Capture Intervention Impacts: measuring processes effectively after introducing classroom changes can signal whether the change makes a difference.
The three uses were tested across several samples for replication purposes. Each use below presents evidence from a separate sample, but all analyses were replicated in all samples.

Measures
Expectancy-Value-Cost items from Studies 1 and 2:
E1. How confident are you that you can learn the material in this class?
E2. How confident are you that you can be successful in this class?
E3. How well do you expect to do in this class?
E4. How confident are you that you can understand the material in this class?
V1. How relevant is the course material to your future career plans?
V2. How important is the course material to your future?
V3. How useful is the course material to your everyday life?
V4. How important is this class to you?
V5. How useful will this class be to your career?
V6. How valuable is this class to you?
V7. How useful is this class to you?
C2. How often does this class require too much of your time or effort?
C3. How often do obstacles (class-related or other) limit the effort you can put into this class?
C4. How often do you sacrifice too many things in order to do well in this class?
C5. How often does this class require too much time?
C6. How often do you feel that you don't have time to put into this class because of other things that you do?
C7. How often are you limited in the amount of effort that you can put into this class?
C8. How often do you feel that you have to sacrifice too much in order to do well in this class?
Note. E = Expectancy, V = Value, C = Cost. A "Selected by" column in the original table flagged items as expert-selected or randomly selected (values in table order: Expert, Expert, Random, Random, Expert, Random, Expert, Expert, Random, Random, Expert).

Use 1: Describe Classroom Motivation
Assumptions – Reliability and Construct Validity
• Scales should demonstrate evidence of reliability.
• Items should relate to other constructs in theoretically expected directions and magnitudes (see Use 2).
Evidence (Sample: N = 2,067, College Math)
• Expected correlations with other constructs can demonstrate response consistency and act as a substitute for reliability.
• If an item shares the majority of its variance with a composite, it arguably contains similar information.
Evidence of Reliability and Construct Validity (Correlations): r = .88, r = .88, r = .91.
Shared Variance (figure): each image shows the overlap (color) between an item and a composite. Items selected by motivation experts were compared to randomly chosen items. The three images represent expectancy (E), value (V), and cost (C) with their respective composite scores.
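The correlations above summarize analyses of the project's own data. As a rough illustration of the kind of computation behind this evidence, the Python sketch below scores an expectancy composite and computes an item-composite correlation and its shared variance (r²). The simulated responses, the 1-6 response range, and the column names (E1-E4, E_composite) are illustrative assumptions, not the project's actual data or code.

```python
import numpy as np
import pandas as pd

# Simulate item-level responses on a 1-6 scale (purely illustrative data).
rng = np.random.default_rng(0)
true_expectancy = rng.normal(4, 1, size=200)
data = pd.DataFrame({
    f"E{i}": np.clip(np.round(true_expectancy + rng.normal(0, 0.7, size=200)), 1, 6)
    for i in range(1, 5)
})

# Composite score = mean of the four expectancy items.
data["E_composite"] = data[["E1", "E2", "E3", "E4"]].mean(axis=1)

# Item-composite correlation and shared variance (r squared), the kind of
# values reported above (r = .88-.91 for the expert-chosen items).
r = data["E1"].corr(data["E_composite"])
print(f"item-composite r = {r:.2f}, shared variance = {r**2:.2f}")
```

Because the item is included in its own composite here, the correlation is somewhat optimistic; a corrected (item-excluded) composite is a common alternative when reporting this kind of evidence.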
Use 2: Predict Important Outcomes
Assumptions – Predictive Validity
• Items should relate to other constructs in theoretically expected directions and magnitudes.
• High-quality items should accurately reflect their underlying construct and maintain its predictive strength.
Evidence (Sample: N = 180, College Math)
• Expert knowledge can be leveraged to identify items that best represent a construct or that best predict another construct.
• A reduced set of items can explain similar amounts of outcome variance.
Evidence of Predictive Validity (Multiple Regression): reported R² values of .52, .36, and .56 (see figure description below).
Variance Explained (figure): the images represent the amount of interest variance accounted for by expectancy (E), value (V), and cost (C). The total variance (grid) is divided into unexplained variance (black) and explained variance (red, blue, purple). The 12-item full scale included 4 items per construct. The expert scale represents two models in which the top-ranked item from each construct was included (both yielded the same variance explained). The random scale represents one model in which a randomly chosen item from each construct was included.
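The poster reports only the resulting R² values; a minimal sketch of the comparison described above (a full multi-item scale versus one item per construct predicting interest) might look like the following. The simulated data, effect sizes, and the choice to enter individual items rather than composites are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300

# Simulate latent expectancy (E), value (V), and cost (C), plus an interest outcome.
E, V, C = rng.normal(size=(3, n))
interest = 0.5 * E + 0.6 * V - 0.3 * C + rng.normal(scale=0.8, size=n)

def items(latent, k=4, noise=0.6):
    """Simulate k observed items for one latent construct."""
    return np.column_stack([latent + rng.normal(scale=noise, size=n) for _ in range(k)])

E_items, V_items, C_items = items(E), items(V), items(C)

def r_squared(X, y):
    """R^2 from an ordinary least squares fit with an intercept."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

full_scale = np.column_stack([E_items, V_items, C_items])                      # 12 items
single_items = np.column_stack([E_items[:, 0], V_items[:, 0], C_items[:, 0]])  # 1 item each

print("full-scale R^2:  ", round(r_squared(full_scale, interest), 2))
print("single-item R^2: ", round(r_squared(single_items, interest), 2))
```

If the single-item model explains nearly as much interest variance as the full scale, the shorter measure retains most of the predictive information, which is the comparison the expert and random scales above are making.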
The Value Intervention in 5 Steps
• A value intervention: reflecting on the usefulness of course material can facilitate students' interest and performance in a class.
• Contextual constraints: for the study presented below, researchers were given 15 minutes to collect all data and administer the intervention.
1. Pre-survey: students responded to a brief measure of motivation.
2. Module: students read a series of quotes about the value (usefulness) of math for daily life, future career goals, and hobbies.
3. Intervention: students wrote a brief essay discussing the usefulness of the class material to their own lives.
4. Control: students in the control condition instead summarized course material.
5. Follow-up survey: students responded to brief measures of interest and value.

Use 3: Detect Intervention Impacts
Assumptions – Experimental Differences
• Items should show pre-post differences when interventions meant to affect their underlying constructs are introduced.
• Items most aligned with the underlying constructs should show the most drastic differences.
Evidence (Sample: N = 313, High School Algebra/Geometry)
• Expert knowledge can be leveraged to identify items that best represent a construct or that best predict another construct.
• Fewer items can explain similar amounts of outcome variance.
• Multiple regression controlling for baseline measures of motivation was used to estimate intervention effects (a sketch of this type of model appears at the end of this document).
Evidence of Experimental Differences (Multiple Regression, Unstandardized Regression Coefficients)
Usefulness Intervention Effect (figure): unstandardized intervention effects (y-axis from -0.15 to 0.3) on the 1-item usefulness and importance measures and on their composite (value), plotted separately for Algebra and Geometry. The value intervention in this study was aimed at changing perceptions of usefulness rather than general importance; a strong test of the intervention effects would therefore show differences in usefulness but not necessarily in importance.

Result Summary
• Several samples showed replicable evidence that single-item motivation measures demonstrate reliability and construct validity.
• Expert-chosen items maintained high-quality construct information despite drastic scale reduction.
• Single items were sensitive to interventions targeted at their underlying constructs.

Conclusions
• Expert knowledge can be leveraged to select representative measures and minimize quality loss.
• The use-and-interpretation framework can help identify what evidence is needed for measure validation.
• No single piece of validity evidence is sufficient, but multiple sources of evidence provide a stronger argument.
• Researchers and practitioners need to develop validity arguments based on their unique measures and contexts.
• Future research should examine applying use-and-interpretation arguments to other constructs.

Acknowledgements
This research was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant #R305B090002 to the University of Virginia, and by National Science Foundation Grant DRL 1228661. The opinions expressed are those of the authors and do not represent views of the Institute or the U.S. Department of Education.
Scale Developers: Kenn E. Barron, Steve Getty. Dissertation Committee: Sara Rimm-Kaufman, Bob Pianta, Karen Schmidt. Research Partners: Julie Phelps, Maryke Lee, Deborah Howard, Valencia College, Emily Rosenzweig, Allan Wigfield, Stacy Priniski, Judy Harackiewicz, Florida Virtual Schools.
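The intervention effects reported under Use 3 are unstandardized coefficients from multiple regressions that control for baseline motivation, presumably with the post-survey item regressed on condition and the baseline measure. A minimal sketch of that type of model, using simulated data and made-up variable names, is shown below.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300

# Simulated pre/post data: 'condition' is 1 for the value intervention, 0 for control.
condition = rng.integers(0, 2, size=n)
usefulness_pre = rng.normal(4, 1, size=n)
usefulness_post = usefulness_pre + 0.2 * condition + rng.normal(scale=0.5, size=n)

# Regress the post-survey usefulness item on condition, controlling for the
# baseline item; the coefficient on 'condition' is the unstandardized effect.
X = np.column_stack([np.ones(n), condition, usefulness_pre])
beta, *_ = np.linalg.lstsq(X, usefulness_post, rcond=None)
print(f"intervention effect (unstandardized b) = {beta[1]:.2f}")
```

Repeating the same model with the importance item (or the value composite) as the outcome gives the other bars in the figure; under the design described in Use 3, a clear effect would be expected for usefulness but not necessarily for importance.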