Defining, Measuring, and Manipulating Variables

Operational definition of a construct
 Constructs:
 Hunger, aggression, happiness, success, intelligence …
 Operational definition:
 How is the construct measured?
 Hunger: scale of 1-7 subjective feeling
 Hunger: # hrs since last ate
 Accuracy of operational definition
 Circular or tautological definitions
 Definitions may not match construct
 Definitions may differ between researchers
Caffeine consumption questionnaire
 How is caffeine consumption operationally defined?
Scales of measurement
 Nominal, ordinal, interval, ratio (p61)
 Also distinguished as discrete vs. continuous variables
 Or qualitative vs. quantitative
What scale of measurement is used?
Item                                            Scale of measurement
 True-false test                                Nominal
 IQ test scores                                 Interval
 Political affiliation                          Nominal
 Top 10 basketball teams                        Ordinal
 Time to finish an exam                         Ratio
 List of favorite to least favorite teachers    Ordinal
 Zip code                                       Nominal
 Class rank                                     Ordinal
What scale of measurement is used?
 Indicate your attitude toward scientific research by placing a
check mark on each scale
Positive __ __ __ __ __ __ __ Negative
Worthless __ __ __ __ __ __ __ Valuable
Unethical __ __ __ __ __ __ __ Ethical
 Circle your answer:
 Scientific research has produced many advances that have
significantly enhanced the quality of human life.
strongly agree agree neutral disagree strongly disagree
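 The first example is a semantic differential scale; the second is a “Likert scale”
 Each response can be numbered (1 – 7 for the first example, 1 – 5 for the second) and is commonly treated as an interval scale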
Weinle (2003)
 Examine use of drawing to facilitate children’s narratives about
emotional events.
 Participants: 6-, 7-, and 8-year-olds
 Method: Interviewed about “mad” or “sad” events
 ½ asked to draw picture while talking; ½ just talked
 Results: Children who drew while talking provided
significantly longer and richer narratives
 What are the scales of measurement for IVs and DV?
 IV:
 Age = ratio; Activity while talking = nominal; Emotion of
event = nominal
 DV:
 Length of narrative = ratio; Richness of narrative = interval or
ordinal
Caffeine consumption questionnaire
 What scales of measurement are used?
 What other questions could be asked that use other scales?
 Nominal
 Ordinal
 Interval
 Ratio
Types of measures (p65)
What type of measurement is used? What scale of measurement?
Geriatric Depression Scale (GDS)
 Choose the best answer for how you have felt over the past week: YES / NO
 1. Are you basically satisfied with your life?
 2. Have you dropped many of your activities and interests?
 3. Do you feel that your life is empty?
 4. Do you often get bored?
 5. Are you in good spirits most of the time?
 6. Are you afraid that something bad is going to happen to you?
 7. Do you feel happy most of the time?
 8. Do you often feel helpless?
 9. Do you prefer to stay at home, rather than going out and doing new things?
 10. Do you feel you have more problems with memory than most?
 11. Do you think it is wonderful to be alive now?
 12. Do you feel pretty worthless the way you are now?
 13. Do you feel full of energy?
 14. Do you feel that your situation is hopeless?
 15. Do you think that most people are better off than you are?
Reliability
 “Consistency and stability of a measuring instrument” (p65)
 Is the scale free from random error?
 Observed score = true score + error
 “High reliability” = low error
 Types of errors:
 Method error (e.g. test situation, equipment error)
 Trait error (e.g. fatigue, health, truthfulness)
 Theoretical reliability
 True score variance / (true score variance + error score variance)
 Measured reliability
 Correlation coefficient: -1.0 to 0 to +1.0
 .70 – 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak
 Not all-or-none; a more or less reliable measure
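The ratio for theoretical reliability can be illustrated with a small simulation. This is only a sketch: it assumes the ratio is taken over score variances (as in classical test theory), and the true-score mean of 100, the standard deviations, and the function name `simulated_reliability` are all made up for illustration.

```python
import random
from statistics import pvariance


def simulated_reliability(error_sd, n=5000, seed=1):
    """Return true-score variance / (true-score variance + error variance)."""
    rng = random.Random(seed)
    # Hypothetical true scores: mean 100, SD 15 (illustrative values only).
    true_scores = [rng.gauss(100, 15) for _ in range(n)]
    # Random error with mean 0; a larger error_sd means a noisier instrument.
    errors = [rng.gauss(0, error_sd) for _ in range(n)]
    true_var = pvariance(true_scores)
    error_var = pvariance(errors)
    return true_var / (true_var + error_var)


precise_scale = simulated_reliability(error_sd=5)
noisy_scale = simulated_reliability(error_sd=15)
print(f"low error:  {precise_scale:.2f}")
print(f"high error: {noisy_scale:.2f}")
```

With these assumed values the ratio should land near .90 for the low-error instrument and near .50 for the high-error one, which is the sense in which "high reliability" means low error.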
Correlational design
 Scatterplot: relationship between 2 quantitative variables
 How 1 variable relates to or influences another variable
 Individual = dot (X and Y data point)
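A correlation coefficient summarizes the pattern of dots in a scatterplot as a single number. The sketch below computes Pearson's r by hand and labels its strength with the cutoffs listed on the Reliability slide. The data (hours studied and exam scores) and the function names are hypothetical.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around each mean.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / sqrt(ss_x * ss_y)


def strength_label(r):
    """Label |r| using the slide's cutoffs: .70+, .30 to .69, below .30."""
    size = abs(r)
    if size >= 0.70:
        return "strong"
    if size >= 0.30:
        return "moderate"
    return "weak"


# Hypothetical data: each pair is one individual (one dot on the scatterplot).
hours_studied = [1, 2, 3, 4, 5, 6]
exam_score = [55, 60, 58, 70, 75, 80]
r = pearson_r(hours_studied, exam_score)
print(f"r = {r:.2f} ({strength_label(r)})")
```

The sign of r gives the direction of the relationship and its absolute size gives the strength, so the label is based on |r|.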
Lexical decision task and measurement error
 Press “yes” button when you see a word (“crow”)
 Press “no” button when you see a non-word (“cwor”)
 IV: stimulus (word/non-word)
 DV: RT (time to press button)
 Types of measurement errors?
 S responds more slowly on later trials due to fatigue
 S responds more quickly b/c just saw the word before coming to
the lab
 S responds more slowly because of sneezing during the trial
 S performs poorly b/c can’t read words clearly on screen; b/c room
is too warm; b/c thinking about other things…
 Something affects behavior other than the variable you are studying
Types of reliability
 How can you measure reliability?
 Test-retest reliability
 Compare same test on 2 occasions
 Alternative forms reliability
 Compare equivalent or similar tests
 Split-half reliability
 Compare performance on 2 halves of a test
 Inter-rater reliability
 Consistency/agreement between 2 judges
 (# agreements / # possible agreements) x 100
 What types of measures would use this?
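The inter-rater agreement formula can be sketched as a short function. The two judges' codes below are hypothetical (for example, categories assigned to ten observed behaviors), and `percent_agreement` is an illustrative name.

```python
def percent_agreement(rater_a, rater_b):
    """(# agreements / # possible agreements) x 100 for two raters."""
    if len(rater_a) != len(rater_b):
        raise ValueError("Both raters must code the same observations")
    if not rater_a:
        raise ValueError("Need at least one observation")
    # Count the observations that both judges placed in the same category.
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agreements / len(rater_a)


# Hypothetical codes from two judges for the same ten observations.
judge_1 = ["hit", "none", "hit", "yell", "none", "hit", "none", "yell", "hit", "none"]
judge_2 = ["hit", "none", "yell", "yell", "none", "hit", "none", "none", "hit", "none"]
print(percent_agreement(judge_1, judge_2))  # 8 of 10 codes match -> 80.0
```

Percent agreement applies when observers assign behavior to categories, so the number of possible agreements is simply the number of observations both judges coded.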
Kazdin (1990): Automatic thoughts questionnaire
 “An examination of the internal consistency of the ATQ
yielded a coefficient alpha of .96… These statistics suggest a
high level of internal consistency.”
 Reliability statistic: Cronbach’s alpha
 Based on the average correlation among all items of the scale (and on the number of items)
 .70 – 1.0 Strong; .30 - .69 Moderate; .00 - .29 Weak
 “Individual item-total score correlations, presented in
Table 1, were in the moderate to high range (r’s = .39 to .81).
The mean item-total correlation… was .69.”
 Reliability measurements: Inter-item correlation matrix
 All correlations should be positive
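Coefficient alpha can be computed from item-level responses with the standard variance form of the formula, alpha = k/(k-1) x (1 - sum of item variances / variance of total scores). The sketch below uses a made-up 4-item scale answered by five respondents; these are not ATQ data, and `cronbach_alpha` is an illustrative name.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of respondents and columns of items."""
    k = len(responses[0])  # number of items on the scale
    # Transpose so that each tuple holds all respondents' answers to one item.
    items = list(zip(*responses))
    sum_item_variances = sum(variance(item) for item in items)
    # Each respondent's total score across all items.
    totals = [sum(row) for row in responses]
    return (k / (k - 1)) * (1 - sum_item_variances / variance(totals))


# Hypothetical responses: 5 respondents x 4 items, each rated 1 to 5.
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [1, 2, 1, 2],
    [3, 3, 4, 3],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Alpha rises when items vary together (respondents who score high on one item score high on the others), which is why negative inter-item correlations signal a problem, such as an item that needs reverse scoring.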
Validity
 Does measure provide info on what we really want to
measure?
 Multiple types of validity
 Content validity
 Criterion validity
 Construct validity
 Validity is not all-or-none, but on a scale
 Can be high in 1 type of validity and low on others
 Later… (ch8)
 Internal validity: eliminated extraneous variables
 External validity: findings will generalize to other contexts
Content validity
 Does test have representative samples of behavior?
 Does content of test reflect what we want to measure?
 Are all aspects of content represented fairly?
 e.g. exams in courses
 Do test items match with what you’ve learned/studied?
 e.g. depression questionnaire
 Does test measure all behaviors that would be of interest?
 The more specific the variable, the easier it is to get good
content validity
 Face validity: does it appear to be valid
 Examine what test appears to measure on surface
 But, does not provide any real evidence
Criterion validity
 Extent to which measure predicts behavior or ability in a given area
 Compare scores on measure with another criterion (area)
 Concurrent validity
 Test used to predict present performance
 e.g. pilot or driving test
 Predictive validity
 Test used to predict future performance
 e.g. SAT or GRE
 Convergent validity
 Significant (pos or neg) correlations found where expected
 Discriminant (or divergent) validity
 Zero correlations found between variables supposed to be unrelated
Construct validity
 Degree to which test accurately measures the construct
 Examine if concept is being operationalized in a useful
way
 e.g. depression questionnaire
 Is test measuring same construct in all populations that
are tested (young – older adults; all cultures)?
 e.g. induce depression
 Ss read positive or negative statements to induce or
diminish depressed mood but does it resemble naturally
occurring depression? Does method have construct
validity?
Kazdin (1990): ATQ
 “Criterion validity: Depressed versus nondepressed children” … A
one-way ANOVA of total ATQ scores indicated that depressed
children were significantly higher in negative thoughts (M =
82.8) than were nondepressed children (M = 52.9), F(1, 136) =
47.02, p < .001.
 Overall ATQ score is predicting which group children belong to.
 Another section examines which particular items (or statements on
scale) distinguish groups
 “Convergent validity. … As shown in Table 2 performance on ATQ
correlated significantly with other measures of cognitive
processes related to depression. Children who indicated more
negative thoughts showed lower self-esteem, greater
hopelessness, and more external attribution of control. The
correlations … support convergent validity of the ATQ.”
Kazdin (1990): ATQ
 “Discriminant validity. … As can be
seen in Table 2, the ATQ did not
correlate significantly with severity
of impairment or social competence.
These findings would seem to
support the discriminant validity of
the ATQ. However the absence of …
correlations might have been due to
the different raters (children vs
parents).”
 “These results suggest that the ATQ
tended to correlate more highly with
other measures of cognitive
processes and with depression than
with measures of prosocial behavior
and positive affective experience.”
Reliability and validity
 What is relationship between reliability and validity?
 Can measure be valid w/o being reliable?
 No
 Can measure be reliable w/o being valid?
 Yes
 Can give same score each time but not give any useful
information!
 e.g. Measure height as estimate of intelligence
The SAT: What type of validity?
 Is the SAT useful in predicting how well students
perform in college?
 Does SAT-math test concepts from math courses at
high school level?
 Do SAT questions measure “academic strength”?
 Are SAT-math and SAT-verbal positively correlated?
The SAT
 Is the SAT useful in predicting how well students
perform in college?
 Criterion or predictive validity
 Does SAT-math test concepts from math courses at
high school level?
 Content validity
 Do SAT questions measure “academic strength”?
 Construct validity
 Are SAT-math and SAT-verbal positively correlated?
 Convergent validity