Validity
Psych 818
DeShon

Definitions
●
Validation: the process of gathering and evaluating
information to support the desired inference
●
You don't validate a test!
●
Instead, you validate inferences or decisions based
on test scores
●
Validation determines the degree of confidence
that decision makers can place in the inferences we
make about people based on their test scores
General Issues
●
Validity statements should address particular
interpretations or types of decisions
–
Ex: a test of general intelligence (Wonderlic) may
discriminate well in the general population but not
very well among college grads
●
Validation is a process of hypothesis testing:
–
Someone who scores high on this measure will also do
well in situation A
●
In validity assessment the aim is inferential
–
Ex: a person who does well on a rheumatology exam can
be expected to know more about rheumatic disease, or
to manage patients with rheumatologic disease
appropriately.
●
Validity is the extent to which the inferences made
from test scores are accurate
–
Variation in the underlying construct causes variation
in the measurement process
–
Establishing causality in measurement is no different
than establishing causality for any research question
–
Must show evidence to support the inference
–
Must rule out alternative explanations
Examples
●
People with higher levels of depression will score
higher on the Beck Depression Inventory
●
A person with this score on the LSAT will do well
in law school.
●
A person with these scores on the SVIB will be
happy as an engineer.
●
A person with these scores on the Reid Report
will likely steal from an employer.
Sources of Validity Evidence
●
In the beginning, there were types of validity (e.g.,
content, criterion-related, construct)
●
Now, all validity evidence is construct validity
evidence
●
Sources of Validity Evidence
–
Content Evidence
–
Criterion-Related Evidence
–
Discriminant Groups Evidence
–
Multi-Trait, Multi-Method Correlations
–
Face Validity?
–
Consequential Evidence?
–
Internal Structure Evidence
Content Validity
Content validity as representativeness
●
Content validity is concerned with sample-population
measurement representation
–
Ex: The knowledge, skills, abilities, and personal
characteristics (KSAPs) covered by the test items
should be representative of the entire domain of
KSAPs
●
Content validity is a judgment concerning how
adequately a test samples behavior representative of
the universe of behavior the test was designed to sample
●
Content validity draws an inference from test scores
to a large domain of items similar to those on the test
Content experts
●
Content validity is usually established by content
or subject matter experts (SMEs).
●
In content validity, evidence is obtained by looking
for agreement in the judgments of an expert panel
●
A test that includes a more representative sample
of the target behavior lends itself to more accurate
inferences
–
that is, inferences which hold true under a wider range
of circumstances.
–
If important aspects of the outcome are missed by the
scales, then some inferences will prove to be
wrong; the inferences (not the tests) are invalid
Drawbacks:
–
Experts tend to take their knowledge for granted and
forget how little other people know. Some tests written
by content experts are extremely difficult.
–
Content experts often fail to identify the learning
objectives of a subject.
Quantification of Content validity?
(Lawshe)
●
Each member of a panel of experts responds to the
question “Is the skill or knowledge measured by
this item…
–
Essential versus
–
Useful but not essential versus
–
Not necessary
…to the behavioral domain?”
●
For each item, the number of panelists stating that
the item is essential is noted. If more than half the
panelists indicate that an item is essential, that
item has at least some content validity
●
Greater levels of content validity exist as larger
numbers of panelists agree that a particular item is
essential
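Lawshe's panel tally is commonly summarized as a content validity ratio, CVR = (n_e - N/2) / (N/2), where n_e is the number of panelists rating the item essential and N is the panel size. The slides do not state this formula, so treat the sketch below as an illustration under that assumption; the function name and the panel counts are hypothetical.

```python
def content_validity_ratio(n_essential: int, n_panelists: int) -> float:
    """Lawshe-style CVR: -1 (nobody says essential) to +1 (everyone does).

    A value above 0 means more than half the panel rated the item essential.
    """
    if n_panelists <= 0:
        raise ValueError("panel must have at least one member")
    if not 0 <= n_essential <= n_panelists:
        raise ValueError("n_essential must be between 0 and n_panelists")
    half = n_panelists / 2
    return (n_essential - half) / half


# Hypothetical tallies: item -> number of 10 panelists answering "essential".
essential_counts = {"item_1": 9, "item_2": 5, "item_3": 2}

for item, n_e in essential_counts.items():
    cvr = content_validity_ratio(n_e, 10)
    verdict = "some content validity" if cvr > 0 else "not supported"
    print(f"{item}: CVR = {cvr:+.2f} ({verdict})")
```

The ratio grows as more panelists agree an item is essential, which matches the "greater levels of content validity" bullet above.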
Content Relevance & Coverage
●
Messick (1980)
●
Content relevance: each item on the test should
relate to one dimension of the domain.
●
Content coverage: each domain dimension should
be represented by one or more items
Specification Chart
[Table: questions 1, 2, …, 20 (rows) by content area (columns: Physiology, Semiology, Diagnosis, ….., Treatment); a ◙ marks the content area each question covers]
Criterion-Related Validity
●
A judgment regarding how adequately a test
score can be used to infer an individual’s
most probable standing on a criterion (e.g.,
performance) of interest.
●
Indexed by the correlation between scores on
a measure of the construct and a measure of
the criterion of interest
–
Validity Coefficient
●
Correlation is estimated in one of two ways:
–
Concurrent validity estimate
–
Predictive validity estimate
Concurrent Validity
●
Concurrent validity refers to the form of
criterion-related validity that is an index of the degree to
which a test score is related to some criterion
measure obtained at the same time.
●
Statements of concurrent validity indicate the
extent to which test scores may be used to estimate
an individual’s present standing on a criterion.
●
Drawbacks:
–
Must involve current employees, which results in range
restriction & non-representative sample
–
Current employees will not be as motivated to do well
on the test as job seekers
Predictive Validity
●
Predictive validity refers to the form of
criterion-related validity that is an index of
the degree to which a test score predicts
some criterion measure obtained at a future
time.
–
Example: clerkship scores of a medical student
as predictor of physician performance after
graduation as criterion
●
Drawbacks:
–
Will have range restriction unless all applicants
are hired
–
Must wait several months for job performance
(criterion) data
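A small simulation can make the validity coefficient and the range-restriction drawback concrete. The sketch below computes a Pearson correlation between simulated test scores and later performance, then recomputes it using only the top-scoring half (as if only those applicants were hired). All data are simulated and the variable names, sample size, and 0.6 slope are assumptions chosen for illustration.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(818)  # fixed seed so the run is repeatable

# Simulated applicants: performance depends partly on the test score.
test_scores = [rng.gauss(0, 1) for _ in range(2000)]
performance = [0.6 * t + 0.8 * rng.gauss(0, 1) for t in test_scores]

# Validity coefficient if every applicant could be followed up.
r_all = pearson_r(test_scores, performance)

# Range restriction: criterion data exist only for the top half on the test.
cutoff = sorted(test_scores)[len(test_scores) // 2]
hired = [(t, p) for t, p in zip(test_scores, performance) if t >= cutoff]
r_hired = pearson_r([t for t, _ in hired], [p for _, p in hired])

print(f"validity coefficient, all applicants: {r_all:.2f}")
print(f"validity coefficient, hired only:     {r_hired:.2f}")
```

The restricted coefficient comes out smaller because selecting on the test shrinks the variance of the predictor, which is why both drawback lists mention range restriction.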
Criterion Related Validity
●
Must have a criterion measure to use this
form of validity
●
If a good criterion measure already exists,
why use another test?
●
Because in many situations the criterion
measurement
–
Is impractical
–
Is expensive
–
Is time consuming
–
Is associated with a delayed outcome
Expectancy Table
●
If you have a validity coefficient, you can
form a chart to communicate the expected
performance gains associated with basing
decisions on the predictor
●
Expectancy tables illustrate the likelihood
that the testtaker will score within some
interval of scores on a criterion measure:
–
Show the % of people within specified test-score
intervals who were placed in various categories of the
criterion.
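As a rough illustration of how such a table is built, the sketch below groups people into test-score intervals and reports the percentage of each interval that landed in each criterion category. The records, score bands, and category labels are all hypothetical.

```python
from collections import Counter


def expectancy_table(records, score_bands, categories):
    """Percent of people in each test-score band who fell in each criterion category.

    records: iterable of (test_score, criterion_category) pairs
    score_bands: list of (low, high) inclusive score intervals
    """
    counts = {band: Counter() for band in score_bands}
    for score, category in records:
        for low, high in score_bands:
            if low <= score <= high:
                counts[(low, high)][category] += 1
                break
    table = {}
    for band, tally in counts.items():
        total = sum(tally.values())
        table[band] = {
            cat: (100.0 * tally[cat] / total if total else 0.0)
            for cat in categories
        }
    return table


# Hypothetical data: (test score, later job-performance rating).
records = [
    (30, "unsatisfactory"), (42, "unsatisfactory"), (45, "satisfactory"),
    (48, "unsatisfactory"), (55, "satisfactory"), (60, "unsatisfactory"),
    (68, "satisfactory"), (72, "superior"), (78, "satisfactory"),
    (85, "superior"), (90, "superior"), (96, "superior"),
]
bands = [(0, 49), (50, 74), (75, 100)]
categories = ["unsatisfactory", "satisfactory", "superior"]

table = expectancy_table(records, bands, categories)
for (low, high), row in table.items():
    cells = "  ".join(f"{cat}: {pct:5.1f}%" for cat, pct in row.items())
    print(f"{low:3d}-{high:3d}  {cells}")
```

Each row reads as "of the people scoring in this interval, this percentage ended up in each criterion category".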
[Figure: Expectancy Table]
[Figure: Taylor-Russell Table]
Decision Theory Terminology
●
Base rate: the extent to which a particular characteristic or
attribute exists in the population.
●
Hit rate: the proportion of people a test accurately
identifies as possessing a particular characteristic or
attribute.
●
Miss rate: the proportion of people the test fails to
identify as having, or not having, a particular characteristic
or attribute.
●
False positive: a miss wherein the test predicted that the
testtaker did possess the particular characteristic or
attribute being measured.
●
False negative: a miss wherein the test predicted that the
testtaker did not possess the particular characteristic or
attribute being measured
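These terms can be tallied from paired test predictions and actual outcomes. The sketch below is one possible reading of the definitions: it treats every correct classification as a hit, so that hit rate and miss rate sum to one (a narrower reading would count only correctly identified positives as hits). The function name and the ten cases are hypothetical.

```python
def decision_summary(predicted, actual):
    """Tally decision-theory terms from paired booleans.

    predicted[i]: the test says person i has the attribute
    actual[i]:    person i really has the attribute
    """
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p and a)
    true_neg = sum(1 for p, a in pairs if not p and not a)
    false_pos = sum(1 for p, a in pairs if p and not a)  # predicted yes, actually no
    false_neg = sum(1 for p, a in pairs if not p and a)  # predicted no, actually yes
    return {
        "base_rate": (true_pos + false_neg) / n,  # share who truly have the attribute
        "hit_rate": (true_pos + true_neg) / n,    # assumption: all correct classifications
        "miss_rate": (false_pos + false_neg) / n,
        "false_positives": false_pos,
        "false_negatives": false_neg,
    }


# Hypothetical screening results for ten people.
predicted = [True, True, True, False, False, False, False, True, False, False]
actual = [True, True, False, False, False, True, False, True, False, False]

summary = decision_summary(predicted, actual)
for name, value in summary.items():
    print(f"{name}: {value}")
```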
Cronbach & Gleser Decision Theory
●
A classification of decision problems,
various selection strategies ranging from
single stage processes to sequential analysis;
●
A quantitative analysis of the relationship
between test utility, the selection ratio, cost
of the testing program, and expected value of
the outcome.
●
A recommendation that in some instances
job requirements be tailored to the
applicant’s ability instead of the other way
around
Decision Theory Graph
Patients with disease are shown in the upper distribution;
Patients without disease are shown in the lower distribution.
Construct Validity Evidence
●
Concerned with the theoretical relationships
among constructs
●
And
●
The corresponding observed relationships among
measures
Construct Validity
[Figure: Theory level (what you think): cause construct, cause-effect construct, effect construct. Observation level (what you do, what you see): program, program-outcome relationship, observations. Can we generalize to the constructs?]
What is the goal?
measure all of the construct and nothing else
[Figure: the construct surrounded by other constructs A, B, C, and D]
The Problem
●
concepts are not mutually exclusive
●
they exist in a web of overlapping meaning
●
to enhance construct validity, we must show
where the construct is in its broader network
of meaning
Constructs are interrelated
Could show that...
the construct is slightly related to the other
four...
...and, constructs A & C and constructs B & D
are related to each other...
...and, constructs A & C are not related to
constructs B & D
Example: Want to Measure...
self esteem
Example: Distinguish From...
self disclosure, self worth, confidence, openness
To Establish Construct Validity
●
Must set the construct within a semantic
(meaning) net
●
Evidence that you control the
operationalization of the construct (that your
theory has some correspondence with reality)
●
Must provide evidence that your data support
the theoretical structure
Convergent and Discriminant Validity
The Convergent Principle
measures of constructs that are
related to each other should be
strongly correlated
The Discriminant Principle
measures of different constructs
should not correlate highly with
each other
How it works
Theory
you theorize that the
items all reflect self
esteem
Observation

         item 1  item 2  item 3  item 4
item 1    1.00     .83     .89     .91
item 2     .83    1.00     .85     .90
item 3     .89     .85    1.00     .86
item 4     .91     .90     .86    1.00

the correlations provide evidence
that the items all converge
on the same construct
How it works
Theory
problem solving construct (PS1, PS2) and
factual knowledge construct (FK1, FK2)
Observation
r(PS1, FK1) = .12
r(PS1, FK2) = .09
r(PS2, FK1) = .04
r(PS2, FK2) = .11
the correlations provide
evidence that the items on
the two tests discriminate
Putting It All Together
Theory
we have two constructs
we want to measure, problem solving
and factual knowledge
for each construct we develop three
scale items; our theory is that items
within construct will converge,
across constructs will discriminate
Observation

        PS1    PS2    PS3    FK1    FK2    FK3
PS1    1.00    .83    .89    .02    .12    .09
PS2     .83   1.00    .85    .05    .11    .03
PS3     .89    .85   1.00    .04    .00    .06
FK1     .02    .05    .04   1.00    .84    .93
FK2     .12    .11    .00    .84   1.00    .91
FK3     .09    .03    .06    .93    .91   1.00

Convergent: correlations within the PS items and within the FK items
Divergent: correlations between PS items and FK items
the correlations support both
convergence and discrimination,
and therefore construct validity
The Nomological Network
●
nomological is derived from Greek and
means “lawful”
●
links interrelated theoretical ideas with
empirical evidence
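The convergent and discriminant pattern in the problem solving (PS) and factual knowledge (FK) item correlations from Putting It All Together can also be checked numerically. The sketch below splits the correlations into within-construct and across-construct sets and compares them; the decision rule (every within-construct r exceeds every across-construct r) is a simplifying assumption for illustration, not a formal test.

```python
from itertools import combinations

LABELS = ["PS1", "PS2", "PS3", "FK1", "FK2", "FK3"]

# Correlation matrix for the six items (rows and columns follow LABELS).
R = [
    [1.00, 0.83, 0.89, 0.02, 0.12, 0.09],
    [0.83, 1.00, 0.85, 0.05, 0.11, 0.03],
    [0.89, 0.85, 1.00, 0.04, 0.00, 0.06],
    [0.02, 0.05, 0.04, 1.00, 0.84, 0.93],
    [0.12, 0.11, 0.00, 0.84, 1.00, 0.91],
    [0.09, 0.03, 0.06, 0.93, 0.91, 1.00],
]


def construct_of(label: str) -> str:
    """Items are named <construct><number>, e.g. 'PS1' belongs to 'PS'."""
    return label[:2]


within, across = [], []
for i, j in combinations(range(len(LABELS)), 2):
    same_construct = construct_of(LABELS[i]) == construct_of(LABELS[j])
    (within if same_construct else across).append(R[i][j])

mean_within = sum(within) / len(within)
mean_across = sum(across) / len(across)

print(f"convergent (within construct): mean r = {mean_within:.3f}, min = {min(within):.2f}")
print(f"discriminant (across constructs): mean r = {mean_across:.3f}, max = {max(across):.2f}")
print("pattern supports the theory:", min(within) > max(across))
```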
What is the Nomological Net?
Developed by Cronbach, L. and Meehl, P.
(1955). Construct validity in psychological
tests, Psychological Bulletin, 52, 4, 281-302.
a representation of the concepts (constructs) of
interest in a study,
...their observable manifestations, and the
interrelationships among and between these
[Figure: constructs at the Theoretical Level (Concepts, Ideas) linked to observables (obs) at the Observed Level (Measures, Programs)]
Principles
Scientifically, to make clear what something is
means to set forth the laws in which it occurs.
This interlocking system of laws is the
Nomological Network.
The laws in a nomological network may relate...
–
observable properties or quantities to each other
–
different theoretical constructs to each other
–
theoretical constructs to observables
"Learning more about" a theoretical construct is a matter
of elaborating the nomological network in which it
occurs...
...or of increasing the definiteness
of its components
The Main Problem with the
Nomological Net
...it doesn't tell us how we can
assess the construct validity in a
study
The Multitrait-Multimethod Matrix
What is the MTMM Matrix?
•
An approach developed by Campbell, D.
and Fiske, D. (1959). Convergent and
discriminant validation by the multitrait-multimethod
matrix. Psychological Bulletin, 56, 2, 81-105.
•
A matrix (table) of correlations arranged to
facilitate the assessment of construct
validity
•
integrates both convergent and discriminant
validity
•
assumes that you measure each of several
concepts (traits) by more than one method
•
very restrictive - ideally you should
measure each concept by each method
•
arrange the correlation matrix by concepts
within methods
Principles
•
Convergence: Things which should
be related are
•
Divergence/Discrimination: Things
which shouldn't be related aren't
A Hypothetical MTMM Matrix

                   Method 1             Method 2             Method 3
Traits          A1     B1     C1     A2     B2     C2     A3     B3     C3
Method 1  A1  (.89)
          B1   .51   (.89)
          C1   .38    .37   (.76)
Method 2  A2   .57    .22    .09   (.93)
          B2   .22    .57    .10    .68   (.94)
          C2   .11    .11    .46    .59    .58   (.84)
Method 3  A3   .56    .22    .11    .67    .42    .33   (.94)
          B3   .23    .58    .12    .43    .66    .34    .67   (.92)
          C3   .11    .11    .45    .34    .32    .58    .58    .60   (.85)

Parts of the Matrix
–
the reliability diagonal: the values in parentheses
(each measure with itself)
–
validity diagonals: same trait, different methods
(e.g., A1 with A2 = .57)
–
monomethod heterotrait triangles: different traits,
same method (e.g., A1 with B1 = .51)
–
heteromethod heterotrait triangles: different traits,
different methods (e.g., A2 with B1 = .22)
–
monomethod blocks: all correlations among measures
that share a method
–
heteromethod blocks: all correlations between measures
from two different methods
Interpreting the MTMM Matrix
●
Reliability - should be
highest coefficients
●
Convergent - validity diagonals
should have strong r's
●
Discriminant - a validity diagonal should be higher than the other values in its
row and column within its own block (heteromethod)
●
Discriminant - a variable should have higher r with another measure
of the same trait than with different traits measured by the same
method
●
Convergent - the same pattern of trait interrelationship should occur in all
triangles (mono and heteromethod blocks)
Advantages
●
addresses convergent and discriminant validity
simultaneously
●
addresses the importance of method of
measurement
●
provides a rigorous standard for construct
validity
Disadvantages
●
hard to implement
●
no known overall statistical test for validity
●
requires judgment call on interpretation
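Although interpretation remains a judgment call, the two discriminant rules from Interpreting the MTMM Matrix can be applied mechanically to the hypothetical matrix. The sketch below does that; the helper names are hypothetical, and the pass/fail rules are one literal reading of those bullets, not an overall statistical test.

```python
from itertools import combinations

TRAITS = "ABC"
METHODS = (1, 2, 3)
LABELS = [f"{t}{m}" for m in METHODS for t in TRAITS]  # A1, B1, C1, A2, ...

# Lower triangle of the hypothetical matrix; diagonal entries are reliabilities.
LOWER = [
    [.89],
    [.51, .89],
    [.38, .37, .76],
    [.57, .22, .09, .93],
    [.22, .57, .10, .68, .94],
    [.11, .11, .46, .59, .58, .84],
    [.56, .22, .11, .67, .42, .33, .94],
    [.23, .58, .12, .43, .66, .34, .67, .92],
    [.11, .11, .45, .34, .32, .58, .58, .60, .85],
]


def r(x: str, y: str) -> float:
    """Correlation between two measures, looked up in the lower triangle."""
    i, j = LABELS.index(x), LABELS.index(y)
    return LOWER[max(i, j)][min(i, j)]


def trait(v: str) -> str:
    return v[0]


def method(v: str) -> int:
    return int(v[1])


# Validity diagonals: same trait measured by two different methods.
validity = {
    (x, y): r(x, y)
    for x, y in combinations(LABELS, 2)
    if trait(x) == trait(y) and method(x) != method(y)
}


def passes_heteromethod_check(x: str, y: str) -> bool:
    """Validity value beats the other values in its row and column of the block."""
    rivals = [r(x, f"{t}{method(y)}") for t in TRAITS if t != trait(x)]
    rivals += [r(y, f"{t}{method(x)}") for t in TRAITS if t != trait(y)]
    return all(r(x, y) > c for c in rivals)


def passes_monomethod_check(v: str) -> bool:
    """Same-trait correlations beat different-trait, same-method correlations."""
    same_trait = [r(v, f"{trait(v)}{m}") for m in METHODS if m != method(v)]
    same_method = [r(v, f"{t}{method(v)}") for t in TRAITS if t != trait(v)]
    return min(same_trait) > max(same_method)


hetero_ok = all(passes_heteromethod_check(x, y) for x, y in validity)
mono_ok = [v for v in LABELS if passes_monomethod_check(v)]

print("validity diagonal values:", sorted(validity.values()))
print("heteromethod discriminant rule holds for all pairs:", hetero_ok)
print("measures passing the monomethod discriminant rule:", mono_ok)
```

Under this reading only the Method 1 measures pass the monomethod rule, because the different-trait correlations inside Methods 2 and 3 are as large as the validity values. That is the kind of method effect the matrix is meant to expose.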
Additional Representations
of Validity
●
Face Validity – degree to which a test appears to
measure what it purports to measure; i.e., do the
test items appear to represent the domain being
evaluated?
–
important because a lack of it could contribute to a
lack of confidence with respect to perceived
effectiveness of the test.
●
Physical Fidelity – do physical characteristics of
test represent reality
●
Psychological Fidelity – do psychological
demands of test reflect real-life situation
Threats to
Construct Validity
Inadequate Preoperational
Explication of Constructs
•
preoperational = before translating constructs
into measures or treatments
•
in other words, you didn't do a good enough job
of defining (operationally) what you mean by the
construct
Mono-operation Bias
•
pertains to the treatment or program
•
used only one version of the treatment or
program
Mono-method Bias
•
pertains especially to the measures or outcomes
•
only operationalized measures in one way
•
for instance, only used paper-and-pencil tests
Restricted Generalizability Across
Constructs
•
you didn't measure your outcomes completely
•
or, you didn't measure some key affected
constructs at all (i.e., unintended effects)
Discriminant Groups Evidence
●
A measure of a construct should be sensitive to
differences between groups known to differ on the
construct
–
Alcoholism
–
Gender attitudes
–
MMPI
Consequential Validity
●
Messick (1995)
–
the aspect of construct validity that appraises the value
implications of score interpretation as a basis for action
as well as the actual and potential consequences of test
use, especially in regard to sources of invalidity related
to issues of bias, fairness, and distributive justice
Internal Structure Evidence
●
Factor Analysis! :)