Practical Measurement: An Argument-Based Approach to Exploring Alternative Psychometric Validity Evidence
Jeff J. Kosovich, Chris S. Hulleman, University of Virginia; Jessica K. Flake, York University
Background
The Construct
• Achievement Motivation: factors that direct and energize individual behavior and choice in school
• The Expectancy-Value framework of achievement motivation posits three components of motivation:
• Expectancy – the degree to which students believe they have the ability and can put forth the effort to succeed.
• Value – the importance, usefulness, and enjoyment an individual associates with a particular task.
• Cost – perceived psychological, physical, and temporal barriers that prevent the individual from succeeding.
• Expectancy, value, and cost relate to important educational outcomes such as interest in a domain, self-regulatory behaviors, and academic performance.
• Expectancy and value are typically positively correlated with each other, and negatively correlated with or unrelated to cost.
Use 1: Describe Classroom Motivation
Assumptions – Reliability and Construct Validity
• Scales demonstrate evidence of reliability.
• Items should relate to other constructs in theoretically expected directions and magnitudes (see Use 2).
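The first assumption can be illustrated with a quick internal-consistency check. Below is a minimal sketch using simulated responses (the 4-item scale, loadings, and sample size are assumptions for illustration, not the poster's data) that computes Cronbach's alpha:

```python
import numpy as np

def cronbach_alpha(scores):
    """Cronbach's alpha for an (n_respondents, k_items) matrix of item scores."""
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)
    total_var = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars.sum() / total_var)

# Simulate a latent construct (e.g., expectancy) and four noisy items loading on it.
rng = np.random.default_rng(3)
latent = rng.normal(size=400)
scores = np.column_stack([latent + rng.normal(scale=0.7, size=400) for _ in range(4)])
print(f"alpha = {cronbach_alpha(scores):.2f}")
```

With reasonably reliable items such as these, alpha typically lands in the high .80s; real scales would of course be judged against their own data.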
Evidence (Sample: N = 2067, College Math)
• Expected correlations with other constructs can demonstrate response consistency and act as a substitute for reliability.
• If an item shares the majority of its variance with a composite, it arguably contains similar information.
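The shared-variance idea above can be sketched with simulated data (the loadings and sample size are assumptions, not the study's values): correlate a single item with the multi-item composite and square the correlation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulate a latent construct and four noisy items loading on it.
latent = rng.normal(size=n)
items = np.column_stack([latent + rng.normal(scale=0.6, size=n) for _ in range(4)])

composite = items.mean(axis=1)   # composite score across all items
single_item = items[:, 0]        # one candidate single-item measure

r = np.corrcoef(single_item, composite)[0, 1]
print(f"item-composite r = {r:.2f}, shared variance = {r**2:.2f}")
```

A correlation near the poster's reported r = .88 would imply the single item carries most of the composite's information.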
Evidence of Reliability and Construct Validity (Correlations)
[Figure: shared-variance overlap between single items and composites; r = .88, r = .88]
Shared Variance: Each image shows the overlap (color) between an item and a composite. Items were selected by motivation experts and compared to randomly-chosen items. The three images represent expectancy (E), value (V), and cost (C) with their respective composite scores.
Measures
Expectancy-Value-Cost Items from Studies 1 and 2
Item
E1 How confident are you that you can learn the material in this class?
E2 How confident are you that you can be successful in this class?
E3 How well do you expect to do in this class?
Intended Uses
1. Describe Classroom Motivation: Capturing motivation in the
classroom can help understand student struggles and be leveraged to
improve student success.
2. Predict Outcomes: Motivation measures can be used to predict
outcomes (e.g., interest) as focal processes or as covariates.
3. Capture Intervention Impacts: Measuring processes
effectively after introducing classroom changes can signal whether
the change makes a difference.
The three uses described above were tested across several samples for
replication purposes. Each use presented below shows evidence from
separate samples, but all analyses were replicated in all samples.
• A Value Intervention: Reflecting on the usefulness of course material can facilitate students' interest and performance in a class.
• Contextual constraints: For the study presented below, researchers were given 15 minutes to collect all data and administer the intervention.
The Value Intervention in 5 Steps
1. Pre-survey: Students responded to a brief measure of motivation.
2. Module: Students read a series of quotes about the value (usefulness) of math to daily life, future career goals, and hobbies.
3. Intervention: Students wrote a brief essay in which they discussed the usefulness of class material to their own lives.
4. Control: Students in the control condition summarized course material instead.
5. Follow-Up Survey: Students responded to brief measures of interest and value.
Use 2: Predict Important Outcomes
Assumptions – Predictive Validity
• Items should relate to other constructs in theoretically expected directions and magnitudes.
• High-quality items should accurately reflect their underlying construct and maintain its predictive strength.
Evidence (Sample: N = 180, College Math)
• Expert knowledge can be leveraged to identify items that best represent a construct or that predict another construct.
• A reduced set of items can explain similar amounts of outcome variance.
Evidence (Sample: N = 313, High School Algebra/Geometry)
• Multiple regression controlling for baseline measures of motivation.
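The regression comparison described under Use 2 can be sketched as follows, with simulated data (variable names, loadings, and effect sizes are assumptions, not the study's values): fit interest on the full item set versus a single "expert-chosen" item, each time controlling for a baseline measure, and compare R².

```python
import numpy as np

def r_squared(X, y):
    """R^2 from an OLS fit, with an intercept column added."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(1)
n = 313
latent = rng.normal(size=n)
items = np.column_stack([latent + rng.normal(scale=0.7, size=n) for _ in range(4)])
baseline = latent + rng.normal(scale=1.0, size=n)
interest = 0.8 * latent + rng.normal(scale=0.8, size=n)   # outcome driven by the construct

full = np.column_stack([items, baseline])            # all 4 items + baseline
reduced = np.column_stack([items[:, 0], baseline])   # one chosen item + baseline

print(f"full-scale R^2  = {r_squared(full, interest):.2f}")
print(f"single-item R^2 = {r_squared(reduced, interest):.2f}")
```

The full model can never explain less variance than its nested single-item counterpart, so the question is how small the gap is; the poster's argument is that well-chosen single items keep it negligible.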
Evidence of Predictive Validity (Multiple Regression)
[Figure: interest variance explained; R² = .52, R² = .36]
Variance Explained: These images represent the amount of interest variance accounted for by expectancy (E), value (V), and cost (C). The total variance (grid) is divided into unexplained variance (black) and explained variance (red, blue, purple). The 12-item full scale included 4 items for each construct. The expert scale represents two models in which the top-ranked item from each construct was included (both yielded the same variance explained). The random scale represents one model in which a random item from each construct was included.
Use 3: Detect Intervention Impacts
Assumptions – Experimental Differences
• Items should show pre-post differences when interventions meant to affect their underlying constructs are introduced.
• Items most aligned with the underlying constructs should show the most drastic differences.
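One way to operationalize Use 3's sensitivity check is a simple pre-post regression. Below is a minimal sketch with simulated data (the condition coding, effect size, and variable names are assumptions for illustration, not the study's actual analysis): regress the post-intervention item on treatment condition while controlling for the baseline response, so the condition coefficient is the unstandardized intervention effect.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 180
condition = rng.integers(0, 2, size=n)            # 0 = control, 1 = value intervention
pre_usefulness = rng.normal(loc=3.5, scale=0.8, size=n)

true_effect = 0.25                                # assumed effect size for the simulation
post_usefulness = (0.6 * pre_usefulness + true_effect * condition
                   + rng.normal(scale=0.5, size=n))

# OLS: post ~ intercept + condition + pre
X = np.column_stack([np.ones(n), condition, pre_usefulness])
beta, *_ = np.linalg.lstsq(X, post_usefulness, rcond=None)
print(f"estimated intervention effect (b for condition) = {beta[1]:.2f}")
```

If a single item tracks its construct well, this coefficient should recover an effect on usefulness but not on unrelated facets such as importance.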
[Figure annotations: R² = .56; r = .91]
Acknowledgements
This research was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant #R305B090002 to the University of Virginia, and by National Science Foundation Grant DRL 1228661. The opinions expressed are those of the authors and do not represent the views of the Institute or the U.S. Department of Education.
Purpose
• To examine the viability of interpretation-and-use arguments for high-quality measures that function within applied research constraints.
• To apply the interpretation-and-use framework to a measure of achievement motivation in educational interventions.
• In applied settings there are time constraints and concerns about participants' responsiveness.
• These practical constraints conflict with best practices for validation.
• Scales are often adapted or retrofitted to be practical in these settings.
Evidence of Experimental Differences (Multiple Regression; Unstandardized Regression Coefficients)
Scale Developers: Kenn E. Barron, Steve Getty; Dissertation Committee: Sara Rimm-Kaufman,
Bob Pianta, Karen Schmidt; Research Partners: Julie Phelps, Maryke Lee, Deborah Howard,
Valencia College, Emily Rosenzweig, Allan Wigfield, Stacy Priniski, Judy Harackiewicz,
Florida Virtual Schools
• Measurement experts have outlined extensive, rigorous standards by which to compile and judge scale quality (i.e., validation).
• Longer scales tend to display better reliability and arguably represent construct breadth more fully than short measures.
• However, contemporary argument-based validation approaches can be used to ensure quality measures that are also practical.
• Interpretation-and-use arguments identify desired applications of a measure, then identify the underlying assumptions of those uses as well as the evidence needed to support those assumptions.
[Figure: Usefulness Intervention Effect. Bar chart of unstandardized regression coefficients (y-axis from -0.15 to 0.3) for Usefulness, Importance, and the Composite.]
E4 How confident are you that you can understand the material in this class?
V1 How relevant is the course material to your future career plans?
V2 How important is the course material to your future?
V3 How useful is the course material to your everyday life?
V4 How important is this class to you?
V5 How useful will this class be to your career?
V6 How valuable is this class to you?
V7 How useful is this class to you?
C2 How often does this class require too much of your time or effort?
C3 How often do obstacles (class-related or other) limit the effort you can put into this class?
C4 How often do you sacrifice too many things in order to do well in this class?
C5 How often does this class require too much time?
C6 How often do you feel that you don't have enough time to put into this class because of other things that you do?
C7 How often are you limited in the amount of effort that you can put into this class?
C8 How often do you feel that you have to sacrifice too much in order to do well in this class?
Selected by: Expert, Expert, Random, Random, Expert, Random, Expert, Expert, Random, Random, Expert
Note. E = Expectancy, V = Value, C = Cost.
Result Summary
• Several samples showed replicable evidence that single-item motivation measures demonstrate reliability and construct validity.
• Expert-chosen items maintained high-quality construct information despite drastic scale reduction.
• Single items were sensitive to interventions targeted at their underlying constructs.
Conclusions
Intervention Effects (Algebra and Geometry groups): This graph shows changes in 1-item measures of usefulness, importance, and their composite (value) after the intervention. The value intervention in this study was aimed at changing perceptions of usefulness rather than general importance. Thus, a strong test of the intervention effects would show differences in usefulness and not necessarily importance.
• Expert knowledge can be leveraged to select representative measures and minimize quality loss.
• The interpretation-and-use framework can help identify what evidence is needed for measure validation.
• No single piece of validity evidence is sufficient, but multiple sources of evidence provide a stronger argument.
• Researchers and practitioners need to develop validity arguments based on their unique measures and contexts.
• Future research should examine applying interpretation-and-use arguments to other constructs.