Validity
● Psych 818
● DeShon

Definitions
● Validation: the process of gathering and evaluating information to support the desired inference
● You don't validate a test!
● Instead, you validate inferences or decisions based on test scores
● Validation determines the degree of confidence that decision makers can place in the inferences we make about people based on their test scores

General Issues
● Validity statements should address particular interpretations or types of decisions
– Ex: a test of general intelligence (Wonderlic) may discriminate well in the general population but not very well among college grads
● Validation is a process of hypothesis testing:
– Someone who scores high on this measure will also do well in situation A
● In validity assessment the aim is inferential
– Ex: a person who does well on a rheumatology exam can be expected to know more about rheumatic disease, or to manage patients with rheumatologic disease appropriately
● Validity is the extent to which the inferences made from test scores are accurate
– Variation in the underlying construct causes variation in the measurement process
– Establishing causality in measurement is no different than establishing causality for any research question
– Must show evidence to support the inference
– Must rule out alternative explanations

Examples
● People with higher levels of depression will score higher on the Beck Depression Inventory
● A person with this score on the LSAT will do well in law school.
● A person with these scores on the SVIB will be happy as an engineer.
● A person with these scores on the REID report will likely steal from an employer.
Sources of Validity Evidence
● In the beginning, there were types of validity (e.g., content, criterion-related, construct)
● Now, all validity evidence is construct validity evidence
● Sources of validity evidence
– Content Evidence
– Criterion-Related Evidence
– Discriminant Groups Evidence
– Multi-Trait, Multi-Method Correlations
– Face Validity?
– Consequential Evidence?
– Internal Structure Evidence

Content Validity

Content validity as representativeness
● Content validity is concerned with how well the sample of measurements represents the population (domain) of interest
– Ex: The knowledge, skills, abilities, and personal characteristics (KSAPs) covered by the test items should be representative of the entire domain of KSAPs
● Content validity is a judgment concerning how adequately a test samples behavior representative of the universe of behavior the test was designed to sample
● Content validity draws an inference from test scores to a large domain of items similar to those on the test

Content experts
● Content validity is usually established by content or subject matter experts (SMEs).
● Content validity evidence is obtained by looking for agreement in the judgments of an expert panel
● A test that includes a more representative sample of the target behavior lends itself to more accurate inferences, that is, inferences which hold true under a wider range of circumstances.
– If important aspects of the outcome are missed by the scales, then some inferences will prove to be wrong; it is then the inferences (not the tests) that are invalid

Quantification of Content validity? (Lawshe)
● Each member of a panel of experts responds to the question “Is the skill or knowledge measured by this item…
– Essential, versus
– Useful but not essential, versus
– Not necessary
● For each item, the number of panelists stating that the item is essential is noted.
● If more than half the panelists indicate that an item is essential, that item has at least some content validity
● Greater levels of content validity exist as larger numbers of panelists agree that a particular item is essential
● The question put to the panelists closes with “…to the behavioral domain?”

Drawbacks:
– Experts tend to take their knowledge for granted and forget how little other people know. Some tests written by content experts are extremely difficult.
– Content experts often fail to identify the learning objectives of a subject.

Content Relevance & Coverage
● Messick (1980)
● Content relevance: each item on the test should relate to one dimension of the domain.
● Content coverage: each domain dimension should be represented by one or more items

Specification Chart
[Table: specification chart crossing questions (1, 2, 3, 4, 5, 6, 7, 8, …, 20) with content areas (Physiology, …, Semiology, Diagnosis, …, Treatment); a ◙ marks the content area a question covers. The positions of the individual marks are not recoverable.]

Criterion-Related Validity
● A judgment regarding how adequately a test score can be used to infer an individual’s most probable standing on a criterion (e.g., performance) of interest.

Concurrent Validity
● Concurrent validity refers to the form of criterion-related validity that is an index of the degree to which a test score is related to some criterion measure obtained at the same time.
● Indexed by the correlation between scores on a measure of the construct and a measure of the criterion of interest
● Statements of concurrent validity indicate the extent to which test scores may be used to estimate an individual’s present standing on a criterion.

Predictive Validity
● Predictive validity refers to the form of criterion-related validity that is an index of the degree to which a test score predicts some criterion measure obtained at a future time.
● Criterion-related evidence therefore yields two kinds of estimate:
– Concurrent validity estimate
– Predictive validity estimate
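As a hedged illustration of how a criterion-related validity coefficient is computed, the sketch below correlates test scores with criterion scores using the Pearson formula. The eight score pairs and the function name `validity_coefficient` are hypothetical and do not come from the slides. Whether the result counts as a concurrent or a predictive estimate depends only on when the criterion scores were collected.

```python
import numpy as np

# Hypothetical data: one test score and one criterion score per person.
test_scores = np.array([52, 61, 47, 70, 58, 66, 43, 75], dtype=float)
criterion_scores = np.array([3.1, 3.6, 2.8, 4.2, 3.0, 3.9, 2.5, 4.4])


def validity_coefficient(x, y):
    """Pearson correlation between predictor scores x and criterion scores y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_dev = x - x.mean()  # deviations from the predictor mean
    y_dev = y - y.mean()  # deviations from the criterion mean
    return float((x_dev * y_dev).sum() / np.sqrt((x_dev**2).sum() * (y_dev**2).sum()))


r = validity_coefficient(test_scores, criterion_scores)
print(f"validity coefficient r = {r:.2f}")
```

The hand-written formula is used only to make the definition visible; `np.corrcoef(test_scores, criterion_scores)[0, 1]` gives the same number.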
Validity Coefficient
● The correlation is estimated in one of two ways: from a predictive design or from a concurrent design
● Example: clerkship scores of a medical student as the predictor, and the physician's performance after graduation as the criterion

Drawbacks of a predictive design:
– Will have range restriction unless all applicants are hired
– Must wait several months for job performance (criterion) data

Drawbacks of a concurrent design:
– Must involve current employees, which results in range restriction and a non-representative sample
– Current employees will not be as motivated to do well on the test as job seekers

Criterion Related Validity
● Must have a criterion measure to use this form of validity
● If a good criterion measure already exists, why use another test?
● Because in many situations the criterion measurement
– Is impractical
– Is expensive
– Is time consuming
– Is associated with a delayed outcome

Expectancy Table
● If you have a validity coefficient, you can form a chart to communicate the expected performance gains associated with basing decisions on the predictor
● Expectancy tables illustrate the likelihood that the testtaker will score within some interval of scores on a criterion measure:
– They show the % of people within specified test-score intervals who were placed in various categories of the criterion.

[Figure: Taylor-Russell table]
[Figure: expectancy table]

Decision Theory Terminology
● Base rate: the extent to which a particular characteristic or attribute exists in the population.
● Hit rate: the proportion of people a test accurately identifies as possessing a particular characteristic or attribute.
● Miss rate: the proportion of people the test fails to identify as having, or not having, a particular characteristic or attribute.
● False positive: a miss wherein the test predicted that the testtaker did possess the particular characteristic or attribute being measured, when in fact the testtaker did not.
● False negative: a miss wherein the test predicted that the testtaker did not possess the particular characteristic or attribute being measured, when in fact the testtaker did.

Cronbach & Gleser Decision Theory
● A classification of decision problems and of various selection strategies, ranging from single-stage processes to sequential analysis
● A quantitative analysis of the relationship between test utility, the selection ratio, the cost of the testing program, and the expected value of the outcome
● A recommendation that in some instances job requirements be tailored to the applicant’s ability instead of the other way around

Decision Theory Graph
[Figure: two score distributions. Patients with disease are shown in the upper distribution; patients without disease are shown in the lower distribution.]

Construct Validity Evidence

Construct Validity Theory
● Constructs are interrelated
● Concerned with the theoretical relationships among constructs
● and the corresponding observed relationships among measures

[Figure: two-level diagram. Theory level ("What you think"): cause construct, cause-effect construct relationship, effect construct. Observation level ("What you do" and "What you see"): program, program-outcome relationship, observations. Question posed between the levels: Can we generalize to the constructs?]

What is the goal?
● measure all of the construct and nothing else
[Figure: "the construct" surrounded by other constructs A, B, C, and D]

The Problem
● concepts are not mutually exclusive
● they exist in a web of overlapping meaning
● to enhance construct validity, we must show where the construct is in its broader network of meaning

Could show that...
● the construct is slightly related to the other four...
● ...and constructs A & C and constructs B & D are related to each other...
● ...and constructs A & C are not related to constructs B & D
[Figure: "the construct" and other constructs A, B, C, and D, repeated for each statement]

Example: Want to Measure...
● self esteem

Example: Distinguish From...
● self disclosure
● self worth
● confidence
● openness

To Establish Construct Validity
● Must set the construct within a semantic (meaning) net
● Evidence that you control the operationalization of the construct (that your theory has some correspondence with reality)
● Must provide evidence that your data support the theoretical structure

Convergent and Discriminant Validity

The Convergent Principle
● measures of constructs that are related to each other should be strongly correlated

The Discriminant Principle
● measures of different constructs should not correlate highly with each other

How it works (convergent)
● Theory: you theorize that the items all reflect self esteem
● Observation:

          item 1  item 2  item 3  item 4
item 1     1.00     .83     .89     .91
item 2      .83    1.00     .85     .90
item 3      .89     .85    1.00     .86
item 4      .91     .90     .86    1.00

● the correlations provide evidence that the items all converge on the same construct

How it works (discriminant)
● Theory: a problem solving construct (items PS1, PS2) and a factual knowledge construct (items FK1, FK2)
● Observation:
– r_{PS1,FK1} = .12
– r_{PS1,FK2} = .09
– r_{PS2,FK1} = .04
– r_{PS2,FK2} = .11
● the correlations provide evidence that the items on the two tests discriminate

Putting It All Together
● we have two constructs we want to measure, problem solving and factual knowledge
● for each construct we develop three scale items; our theory is that items within a construct will converge, and items across constructs will discriminate
● Theory: problem solving construct (PS1, PS2, PS3) and factual knowledge construct (FK1, FK2, FK3)
● Observation (the PS-PS and FK-FK blocks are the convergent correlations; the PS-FK block holds the divergent correlations):

         PS1     PS2     PS3     FK1     FK2     FK3
PS1     1.00     .83     .89     .02     .12     .09
PS2      .83    1.00     .85     .05     .11     .03
PS3      .89     .85    1.00     .04     .00     .06
FK1      .02     .05     .04    1.00     .84     .93
FK2      .12     .11     .00     .84    1.00     .91
FK3      .09     .03     .06     .93     .91    1.00

● the correlations support both convergence and discrimination, and therefore construct validity

The Nomological Network

What is it?
● Developed by Cronbach, L. and Meehl, P. (1955). Construct validity in psychological tests, Psychological Bulletin, 52, 4, 281-302.
● nomological is derived from Greek and means “lawful”
● links interrelated theoretical ideas with empirical evidence

What is the Nomological Net?
● a representation of the concepts (constructs) of interest in a study, their observable manifestations, and the interrelationships among and between these
[Figure: network of constructs at the theoretical level (concepts, ideas) linked to observations ("obs") at the observed level (measures, programs)]

Principles
● Scientifically, to make clear what something is means to set forth the laws in which it occurs.
● This interlocking system of laws is the Nomological Network.
● The laws in a nomological network may relate...
– observable properties or quantities to each other
– different theoretical constructs to each other
– theoretical constructs to observables
● "Learning more about" a theoretical construct is a matter of elaborating the nomological network in which it occurs...
● ...or of increasing the definiteness of its components

The Main Problem with the Nomological Net
● ...it doesn't tell us how we can assess the construct validity in a study

The Multitrait-Multimethod Matrix

What is the MTMM Matrix?
• An approach developed by Campbell, D. and Fiske, D. (1959). Convergent and discriminant validation by the multitrait-multimethod matrix. Psychological Bulletin, 56, 2, 81-105.
• A matrix (table) of correlations arranged to facilitate the assessment of construct validity
• integrates both convergent and discriminant validity
• assumes that you measure each of several concepts (traits) by more than one method
• very restrictive: ideally you should measure each concept by each method
• arrange the correlation matrix by concepts within methods

Principles
• Convergence: Things which should be related are
• Divergence/Discrimination: Things which shouldn't be related aren't

A Hypothetical MTMM Matrix
(traits A, B, C; methods 1, 2, 3; reliabilities in parentheses; lower triangle only)

                 Method 1              Method 2              Method 3
               A1     B1     C1     A2     B2     C2     A3     B3     C3
Method 1  A1  (.89)
          B1   .51   (.89)
          C1   .38    .37   (.76)
Method 2  A2   .57    .22    .09   (.93)
          B2   .22    .57    .10    .68   (.94)
          C2   .11    .11    .46    .59    .58   (.84)
Method 3  A3   .56    .22    .11    .67    .42    .33   (.94)
          B3   .23    .58    .12    .43    .66    .34    .67   (.92)
          C3   .11    .11    .45    .34    .32    .58    .58    .60   (.85)

Parts of the Matrix
• the reliability diagonal: the parenthesized values on the main diagonal
• validity diagonals: same trait measured by different methods (e.g., A1 with A2)
• monomethod heterotrait triangles: different traits measured by the same method (e.g., A1 with B1)
• heteromethod heterotrait triangles: different traits measured by different methods (e.g., A1 with B2)
• monomethod blocks: all correlations that share one method
• heteromethod blocks: all correlations between two different methods
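The parts of the matrix named above can also be picked out mechanically, which may help when the highlighting from the slides is not available. The following is a minimal sketch, assuming the hypothetical MTMM values are laid out as a lower triangle in the order A1, B1, C1, A2, B2, C2, A3, B3, C3; that cell layout is reconstructed from the flattened slide text and is an assumption, as are all variable names.

```python
import numpy as np

labels = ["A1", "B1", "C1", "A2", "B2", "C2", "A3", "B3", "C3"]

# Lower triangle of the hypothetical MTMM matrix (reliabilities on the diagonal).
# Assumption: rows and columns follow the order in `labels`.
lower = [
    [.89],
    [.51, .89],
    [.38, .37, .76],
    [.57, .22, .09, .93],
    [.22, .57, .10, .68, .94],
    [.11, .11, .46, .59, .58, .84],
    [.56, .22, .11, .67, .42, .33, .94],
    [.23, .58, .12, .43, .66, .34, .67, .92],
    [.11, .11, .45, .34, .32, .58, .58, .60, .85],
]

n = len(labels)
mtmm = np.zeros((n, n))
for i, row in enumerate(lower):
    mtmm[i, :len(row)] = row
# Mirror the strictly-lower triangle to obtain the full symmetric matrix.
mtmm = np.tril(mtmm) + np.tril(mtmm, -1).T

trait = np.array([i % 3 for i in range(n)])    # 0 = A, 1 = B, 2 = C
method = np.array([i // 3 for i in range(n)])  # 0, 1, 2 = methods 1, 2, 3
same_trait = trait[:, None] == trait[None, :]
same_method = method[:, None] == method[None, :]
below = np.tri(n, k=-1, dtype=bool)            # count each pair once

parts = {
    "reliability diagonal": np.diag(mtmm),
    "validity diagonals": mtmm[below & same_trait & ~same_method],
    "monomethod heterotrait triangles": mtmm[below & ~same_trait & same_method],
    "heteromethod heterotrait triangles": mtmm[below & ~same_trait & ~same_method],
}

for name, values in parts.items():
    print(f"{name}: n={values.size}, mean={values.mean():.2f}, "
          f"min={values.min():.2f}, max={values.max():.2f}")
```

Boolean masks are used here instead of hard-coded index lists so that the same sketch would apply to a different number of traits or methods by changing only the `trait` and `method` arrays.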
Interpreting the MTMM Matrix
● Reliability: the reliability coefficients should be the highest coefficients in the matrix
● Convergent: validity diagonals should have strong r's
● Discriminant: a validity diagonal should be higher than the other values in its row and column within its own (heteromethod) block
● Discriminant: a variable should have a higher r with another measure of the same trait than with different traits measured by the same method
● Convergent: the same pattern of trait interrelationship should occur in all triangles (mono- and heteromethod blocks)

Advantages
● addresses convergent and discriminant
validity simultaneously
● addresses the importance of method of measurement
● provides a rigorous standard for construct validity

Disadvantages
● hard to implement
● no known overall statistical test for validity
● requires judgment call on interpretation

Additional Representations of Validity
● Face Validity
– degree to which a test appears to measure what it purports to measure; i.e., do the test items appear to represent the domain being evaluated?
– important because a lack of it could contribute to a lack of confidence in the perceived effectiveness of the test.
● Physical Fidelity
– do the physical characteristics of the test represent reality?
● Psychological Fidelity
– do the psychological demands of the test reflect the real-life situation?

Threats to Construct Validity

Inadequate Preoperational Explication of Constructs
• preoperational = before translating constructs into measures or treatments
• in other words, you didn't do a good enough job of defining (operationally) what you mean by the construct

Mono-operation Bias
• pertains to the treatment or program
• used only one version of the treatment or program

Mono-method Bias
• pertains especially to the measures or outcomes
• only operationalized measures in one way
• for instance, only used paper-and-pencil tests

Restricted Generalizability Across Constructs
• you didn't measure your outcomes completely
• or, you didn't measure some key affected constructs at all (i.e., unintended effects)

Discriminant Groups Evidence
● A measure of a construct should be sensitive to differences between groups known to differ on the construct
– Alcoholism
– Gender attitudes
– MMPI

Consequential Validity
● Messick (1995)
– the aspect of construct validity that appraises the value implications of score interpretation as a basis for action, as well as the actual and potential consequences of test use, especially in regard to sources of invalidity related to issues of bias, fairness, and distributive justice

Internal Structure Evidence
●
Factor Analysis! :)
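As a rough illustration of internal structure evidence, the sketch below inspects the eigenvalues of the six-item problem solving / factual knowledge correlation matrix from the "Putting It All Together" example. This is a principal-components shortcut, not a full factor analysis, and the eigenvalue-greater-than-one retention rule (the Kaiser criterion) is a common heuristic chosen here for illustration; the slides do not prescribe it. Variable names are hypothetical.

```python
import numpy as np

items = ["PS1", "PS2", "PS3", "FK1", "FK2", "FK3"]

# Item correlation matrix from the problem solving / factual knowledge example.
R = np.array([
    [1.00, .83, .89, .02, .12, .09],
    [.83, 1.00, .85, .05, .11, .03],
    [.89, .85, 1.00, .04, .00, .06],
    [.02, .05, .04, 1.00, .84, .93],
    [.12, .11, .00, .84, 1.00, .91],
    [.09, .03, .06, .93, .91, 1.00],
])

# eigvalsh handles symmetric matrices and returns eigenvalues in ascending
# order, so reverse them to list the largest first.
eigenvalues = np.linalg.eigvalsh(R)[::-1]

# Kaiser heuristic (an assumption of this sketch): keep eigenvalues above 1.
n_retained = int((eigenvalues > 1.0).sum())
share = eigenvalues[:n_retained].sum() / eigenvalues.sum()

print("eigenvalues:", np.round(eigenvalues, 2))
print(f"dimensions retained: {n_retained}")
print(f"share of total variance in retained dimensions: {share:.2f}")
```

Two dominant dimensions would be consistent with the two-construct theory behind these items. Reading which items belong to which factor would additionally require a rotated factor solution, which this sketch does not attempt.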