Defining, Measuring, and Manipulating Variables

Operational definition of a construct
  Constructs: hunger, aggression, happiness, success, intelligence …
  Operational definition: How is the construct measured?
    Hunger: subjective feeling on a scale of 1-7
    Hunger: # hrs since last ate

Accuracy of operational definition
  Circular or tautological definitions
  Definitions may not match construct
  Definitions may differ between researchers

Caffeine consumption questionnaire
  How is caffeine consumption operationally defined?

Scales of measurement
  Nominal, ordinal, interval, ratio (p61)
  Also distinguished as discrete vs. continuous variables
  Or qualitative vs. quantitative

What scale of measurement is used?
  Item                                           Scale of measurement
  True-false test                                Nominal
  IQ test scores                                 Interval
  Political affiliation                          Nominal
  Top 10 basketball teams                        Ordinal
  Time to finish an exam                         Ratio
  List of favorite to least favorite teachers    Ordinal
  Zip code                                       Nominal
  Class rank                                     Ordinal

What scale of measurement is used?
  Indicate your attitude toward scientific research by placing a check mark on each scale
    Positive  __ __ __ __ __ __ __ Negative
    Worthless __ __ __ __ __ __ __ Valuable
    Unethical __ __ __ __ __ __ __ Ethical
  Circle your answer: Scientific research has produced many advances that have significantly enhanced the quality of human life.
    strongly agree   agree   neutral   disagree   strongly disagree
  Above examples use “Likert scale”
  Each response can be numbered from 1 – 7 = interval scale

Weinle (2003)
  Examine use of drawing to facilitate children’s narratives about emotional events.
  Participants: 6-, 7-, and 8-yr-olds
  Method: Interviewed about “mad” or “sad” events
    ½ asked to draw picture while talking; ½ just talked
  Results: Children who drew while talking provided significantly longer and richer narratives
  What are the scales of measurement for IVs and DV?
  IV: Age = ratio; Activity while talking = nominal; Emotion of event = nominal
  DV: Length of narrative = ratio; Richness of narrative = interval or ordinal

Caffeine consumption questionnaire
  What scales of measurement are used?
  What other questions could be asked that use other scales?
    Nominal
    Ordinal
    Interval
    Ratio

Types of measures (p65)
  What type of measurement is used? What scale of measurement?

Geriatric Depression Scale (GDS)
  Choose the best answer for how you have felt over the past week: YES / NO
  1. Are you basically satisfied with your life?
  2. Have you dropped many of your activities and interests?
  3. Do you feel that your life is empty?
  4. Do you often get bored?
  5. Are you in good spirits most of the time?
  6. Are you afraid that something bad is going to happen to you?
  7. Do you feel happy most of the time?
  8. Do you often feel helpless?
  9. Do you prefer to stay at home, rather than going out and doing new things?
  10. Do you feel you have more problems with memory than most?
  11. Do you think it is wonderful to be alive now?
  12. Do you feel pretty worthless the way you are now?
  13. Do you feel full of energy?
  14. Do you feel that your situation is hopeless?
  15. Do you think that most people are better off than you are?

Reliability
  “Consistency and stability of a measuring instrument” (p65)
  Is the scale free from random error?
  Observed score = true score + error
  “High reliability” = low error
  Types of errors:
    Method error (e.g. test situation, equipment error)
    Trait error (e.g. fatigue, health, truthfulness)

Theoretical reliability
  True score / (true score + error score)

Measured reliability
  Correlation coefficient: -1.0 to 0 to +1.0
    .70 – 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak
  Not all-or-none; a measure is more or less reliable

Correlational design
  Scatterplot: relationship between 2 quantitative variables
  How 1 variable relates to or influences another variable
  Individual = dot (X and Y data point)

Lexical decision task and measurement error
  Press “yes” button when you see a word (“crow”)
  Press “no” button when you see a non-word (“cwor”)
  IV: stimulus (word/non-word)
  DV: RT (time to press button)
  Types of measurement errors?
    Ss respond more slowly on later trials due to fatigue
    Ss respond more quickly b/c just saw the word before coming to the lab
    Ss respond more slowly because of sneezing during trial
    Ss perform poorly b/c can’t read words clearly on screen; b/c room is too warm; b/c thinking about other things…
  Something affects behavior other than the variable you are studying

Types of reliability
  How can you measure reliability?
  Test-retest reliability
    Compare same test on 2 occasions
  Alternative forms reliability
    Compare equivalent or similar tests
  Split-half reliability
    Compare performance on 2 halves of a test
  Inter-rater reliability
    Consistency/agreement between 2 judges
    (# agreements / # possible agreements) x 100
    What types of measures would use this?

Kazdin (1990): Automatic thoughts questionnaire
  “An examination of the internal consistency of the ATQ yielded a coefficient alpha of .96… These statistics suggest a high level of internal consistency.”
  Reliability statistic: Cronbach’s alpha
    Based on the average correlation among all items of scale
    .70 – 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak
  “Individual item-total score correlations, presented in Table 1, were in the moderate to high range (r’s = .39 to .81). The mean item-total correlation… was .69.”
  Reliability measurements: Inter-item correlation matrix
    All correlations should be positive

Validity
  Does measure provide info on what we really want to measure?
  Multiple types of validity
    Content validity
    Criterion validity
    Construct validity
  Validity is not all-or-none, but on a scale
  Can be high in 1 type of validity and low on others
  Later… (ch8)
    Internal validity: eliminated extraneous variables
    External validity: findings will generalize to other contexts

Content validity
  Does test have representative samples of behavior?
  Does content of test reflect what we want to measure?
  Are all aspects of content represented fairly?
  e.g. exams in courses
    Do test items match with what you’ve learned/studied?
  e.g. depression questionnaire
    Does test measure all behaviors that would be of interest?
  The more specific the variable, the easier it is to get good content validity
  Face validity: does it appear to be valid?
    Examine what test appears to measure on surface
    But, does not provide any real evidence

Criterion validity
  Extent to which measure predicts behavior or ability in an area
  Compare scores on measure with another criterion (area)
  Concurrent validity
    Test used to predict present performance
    e.g. pilot or driving test
  Predictive validity
    Test used to predict future performance
    e.g. SAT or GRE
  Convergent validity
    Significant (pos or neg) correlations found where expected
  Discriminant (or divergent) validity
    Zero correlations found between variables supposed to be unrelated

Construct validity
  Degree to which test accurately measures construct
  Examine if concept is being operationalized in a useful way
  e.g. depression questionnaire
    Is test measuring same construct in all populations that are tested (young – older adults; all cultures)?
  e.g. induce depression
    Ss read positive or negative statements to induce or diminish depressed mood, but does it resemble naturally occurring depression?
    Does method have construct validity?
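
The convergent and discriminant validity checks described above both come down to inspecting correlations: sizable correlations (positive or negative) where a relationship is expected, and near-zero correlations where none is expected. The sketch below is only an illustration under stated assumptions: the function names `pearson_r` and `strength`, the variable names, and all scores are hypothetical and are not taken from Kazdin (1990) or any real data set. The strong/moderate/weak cutoffs are the ones given on the reliability slides.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def strength(r):
    """Label |r| with the cutoffs from the slides (.70, .30)."""
    size = abs(r)
    if size >= 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


# Hypothetical scores for 8 participants (made up for illustration only).
negative_thoughts = [30, 45, 50, 62, 70, 85, 90, 95]
self_esteem = [40, 36, 35, 30, 27, 22, 20, 15]   # expected to be related
shoe_size = [8, 6, 9, 7, 8, 7, 9, 6]             # expected to be unrelated

r_convergent = pearson_r(negative_thoughts, self_esteem)
r_discriminant = pearson_r(negative_thoughts, shoe_size)

print(f"related measure:   r = {r_convergent:.2f} ({strength(r_convergent)})")
print(f"unrelated measure: r = {r_discriminant:.2f} ({strength(r_discriminant)})")
```

A strong negative correlation with the related measure would count toward convergent validity, since the sign matters less than whether the relationship appears where expected. A weak correlation with the unrelated measure would count toward discriminant validity.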
Kazdin (1990): ATQ
  “Criterion validity: Depressed versus nondepressed children” … A one-way ANOVA of total ATQ scores indicated that depressed children were significantly higher in negative thoughts (M = 82.8) than were nondepressed children (M = 52.9), F(1, 136) = 47.02, p < .001.
  Overall ATQ score is predicting which group children belong to.
  Another section examines which particular items (or statements on scale) distinguish groups
  “Convergent validity. … As shown in Table 2 performance on ATQ correlated significantly with other measures of cognitive processes related to depression. Children who indicated more negative thoughts showed lower self-esteem, greater hopelessness, and more external attribution of control. The correlations … support convergent validity of the ATQ.”

Kazdin (1990): ATQ
  “Discriminant validity. … As can be seen in Table 2, the ATQ did not correlate significantly with severity of impairment or social competence. These findings would seem to support the discriminant validity of the ATQ. However the absence of … correlations might have been due to the different raters (children vs parents).”
  “These results suggest that the ATQ tended to correlate more highly with other measures of cognitive processes and with depression than with measures of prosocial behavior and positive affective experience.”

Reliability and validity
  What is relationship between reliability and validity?
  Can measure be valid w/o being reliable? No
  Can measure be reliable w/o being valid? Yes
    Can give same score each time but not give any useful information!
    e.g. Measure height as estimate of intelligence

The SAT: What type of validity?
  Is the SAT useful in predicting how well students perform in college?
  Does SAT-math test concepts from math courses at high school level?
  Do SAT questions measure “academic strength”?
  Are SAT-math and SAT-verbal positively correlated?

The SAT
  Is the SAT useful in predicting how well students perform in college?
    Criterion or predictive validity
  Does SAT-math test concepts from math courses at high school level?
    Content validity
  Do SAT questions measure “academic strength”?
    Construct validity
  Are SAT-math and SAT-verbal positively correlated?
    Convergent validity
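
As a closing illustration of two reliability statistics from the “Types of reliability” and Kazdin (1990) slides, the sketch below computes inter-rater percent agreement and Cronbach’s alpha. It rests on assumptions: the slides only name Cronbach’s alpha, so the common variance-based formula is used here as one standard way to compute it. The function names and all ratings and responses are hypothetical.

```python
from statistics import variance


def percent_agreement(rater_a, rater_b):
    """(# agreements / # possible agreements) x 100 for two raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must code the same non-empty set of cases")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return agreements / len(rater_a) * 100


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of scores.

    scores: list of rows, one row per respondent, one column per item.
    Uses alpha = k/(k-1) * (1 - sum(item variances) / variance(total scores)).
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    item_columns = list(zip(*scores))               # one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical codes from two judges for six narratives.
judge_1 = ["mad", "sad", "mad", "sad", "mad", "sad"]
judge_2 = ["mad", "sad", "sad", "sad", "mad", "sad"]
print(f"percent agreement: {percent_agreement(judge_1, judge_2):.1f}")

# Hypothetical responses: 5 respondents x 4 items, each item scored 1-5.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [3, 3, 3, 4],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
]
print(f"Cronbach's alpha: {cronbach_alpha(responses):.2f}")
```

Percent agreement suits measures where judges assign categories, such as coding a narrative as “mad” or “sad”. Alpha suits multi-item questionnaires such as the ATQ, where the question is whether the items hang together.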