DEMOGRAPHIC VARIABLES AND INTELLIGENCE TEST SCORES IN
DISABILITY APPLICANTS
ROBERT BRUCE CLAPP JR.
Bachelor of Science
Ohio Dominican University
May 1981
Master of Arts
Cleveland State University
December 1989
Educational Specialist in Counseling
Cleveland State University
December 2002
Submitted in partial fulfillment of the requirements for the degree
DOCTOR OF PHILOSOPHY IN URBAN EDUCATION
at the
CLEVELAND STATE UNIVERSITY
MAY, 2014
© Copyright by Robert Bruce Clapp Jr. 2014
We hereby approve this dissertation of
Robert Bruce Clapp Jr.
Candidate for the Doctor of Philosophy in Urban Education degree
This Dissertation has been approved for the
Office of Doctoral Studies,
College of Education and Human Services
and
CLEVELAND STATE UNIVERSITY
College of Graduate Studies by:
____________________________________
Dr. Kathryn C. MacCluskie, Committee Chairperson
Counseling, Administration, Supervision, and Adult Learning
____________________________________
Dr. Graham B. Stead, Methodologist
Curriculum and Foundations
____________________________________
Dr. Sarah M. Toman, Committee Member
Counseling, Administration, Supervision, and Adult Learning
____________________________________
Dr. Aaron T. Ellington, Committee Member
Counseling, Administration, Supervision, and Adult Learning
____________________________________
Dr. Deborah Koricke, Committee Member
Center for Effective Living
April 28, 2014
Student’s Date of Defense
DEDICATION
Reverend Robert Clapp
(1924-2011)
He taught me how to believe in God and to believe in our fellow man.
ACKNOWLEDGEMENTS
To Vanessa, Hannah, and Robby Clapp, for their patience during this long, challenging journey.
To Drs. Sarah M. Toman and Kathryn C. MacCluskie, for amazing support and patient revising.
To Dr. Graham B. Stead, methodologist, for his guidance and great instruction in the process.
To Drs. Aaron T. Ellington and Deborah Koricke, for their great support in the writing process.
To the Center for Effective Living for providing the data and an excellent learning
experience.
ABSTRACT
The American Psychological Association (2012) has published Guidelines for the Assessment of and Intervention with Persons with Disabilities. These guidelines emphasize the importance of recognizing social and cultural diversity for persons with disabilities (guideline 8, p. 49) and the need to apply assessment approaches that are “psychometrically sound, fair, comprehensive, and appropriate for clients with disabilities” (guideline 14, p. 52). In addition, the American Association on Intellectual and Developmental Disabilities (AAIDD, 2011) has indicated that IQ testing is a major tool in assessing intellectual disability. The Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008) and the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003) have become two of the most widely used tests in the world for assessing intellectual disability (Chen & Zhu, 2008). Studies of both assessments have focused mainly on people functioning in the average range of cognitive ability. Among people whose level of functioning is lower, however, possible test bias is an even greater concern, because their scores may be more adversely affected by bias than those of test-takers whose ability is in the “average” range.
To illuminate possible psychometric bias against people with disabilities resulting from cultural differences, this study examines differences in WAIS-IV and WISC-IV scores by age, gender, and race (specifically Blacks and Whites), particularly among people being evaluated for disability benefits. This study seeks to investigate the possible presence of significant correlations between subtest, Index, and Full Scale scores and selected demographic variables. Because the data in this archival analysis will be derived from assessments of people who had been referred due to problems resulting from some form of cognitive impairment, the mean test scores fall below the “average” scores of the standardization sample, which is considered roughly representative of the general U.S. population. Analyses of subscale scores will be conducted to determine whether subtest variability has a significant relationship with any of three demographic variables: age, gender, and race. The purpose of this psychometric analysis is to implement the established practice guidelines in order to help ensure the best possible care for persons being assessed for intellectual disability.
TABLE OF CONTENTS
ABSTRACT
LIST OF TABLES
CHAPTER:
I. INTRODUCTION AND STATEMENT OF THE PROBLEM
Intelligence
Race
Fairness
Research Hypothesis
Summary
II. REVIEW OF THE LITERATURE
Disability
Age
Gender
History of Intelligence Testing and Racial Differences
Nature: The Hereditarian Model
Nurture: The Disparity is argued to be from Environmental Causes
Beyond Nature versus Nurture: Towards a More Complex Understanding
Determining Which Scales to Use
Subscales
Summary
III. RESEARCH METHODOLOGY
Research Hypothesis
Procedure
Participants
Instruments
WISC-IV
WAIS-IV
Analyses
Summary
IV. RESEARCH RESULTS
Introduction
Demographics
Analysis of the WISC-IV scores for Age, Race, and Gender: Research Hypothesis 1
Full Scale Intelligence Quotient (FSIQ)
Indices
Analysis of the WAIS-IV scores for Age, Race, and Gender: Research Hypothesis 2
Full Scale Intelligence Quotient (FSIQ)
Indices
Conclusion
V. RESULTS
Research Hypotheses
First Research Hypothesis
WISC-IV Results
WAIS-IV Results
Discussion
Limitations and Implications for Future Research
Implications for Practice
Summary
REFERENCES
LIST OF TABLES
1. WISC-IV: Indexes and Their Subtests
2. WAIS-IV: Indexes and Their Subtests
3. WISC-IV and WAIS-IV Demographics (Summary of participants)
4. WISC-IV Age (by range categories; as determined by The Psychological Corporation)
5. WAIS-IV Age (categories; as determined by The Psychological Corporation)
6. Means and Standard Deviations (in parentheses) of WISC-IV Scores for Disability Applicants
7. Univariate tests: WISC-IV: Dependent Variable FSIQ by Age, Race, and Gender
8. Multivariate tests for Age, Race, Gender, and WISC-IV
9. WISC source of significance for Age by VCI, PRI, WMI, and PSI
10. WISC source of significance for Gender by VCI, PRI, WMI, and PSI
11. Means and Standard Deviations (in parentheses) of WAIS-IV Scores for Disability Applicants
12. Univariate tests: WAIS-IV: Dependent Variable FSIQ by Age, Race, and Gender
13. Multivariate tests for Age, Race, Gender, and WAIS-IV
14. WAIS-IV source of significance for Age by VCI, PRI, WMI, and PSI
CHAPTER I
INTRODUCTION AND STATEMENT OF THE PROBLEM
We have eliminated the colored versus white factor by admitting at the outset, that
our norms cannot be used for the colored population of the United States. Though
we have tested a large number of colored persons, our standardization is based
upon white subjects only. We omitted the colored population from our first
standardization because we did not feel that norms derived by mixing the
populations could be interpreted without special provisions and reservations.
(David Wechsler, 1944, as cited in Williams, 1972, p. 4)
Chapter I introduces the rationale for conducting this dissertation research. The chapter discusses the importance of this research and the contribution it will make to the field. In addition, the chapter elaborates on the concepts of intelligence, race, and fairness as they apply to research on racial disparity in intelligence quotient (IQ) assessment among people with intellectual disabilities.
Standardized testing for intelligence began during an era when prejudice and discrimination against Black people were less subtle than they are today. While much has been written about the history of intelligence testing and its connections to racism, little recent research clarifies the status of the newest editions of these high-stakes tests. Specifically, there is a dearth of research on the Wechsler Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008) and the Wechsler Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003) related to the question of racial differences. This gap is concerning, given the relevance of this area of research from both psychological and social perspectives and to the process of assessing cognitive disability. Nisbett et al. (2012) stated, “IQ is also important because some group differences are large and predictive of performance in many domains. Much evidence indicates that it would be difficult to overcome racial disadvantage if IQ differences could not be ameliorated. IQ tests help us track the changes in intelligence of different groups and of entire nations and to measure the impact of interventions intended to improve intelligence” (p. 131).
Even less research exists on samples at the lower end of the IQ distribution, where these high-stakes tests may have their biggest impact on decisions about diagnoses, treatments, and educational opportunities. For example, Koocher (2003) explained that, as the Supreme Court's ruling in Atkins v. Virginia illustrates, intelligence tests play a role in death penalty enforcement across the country. When the decision to execute a person may depend on determining whether that person has the intelligence to understand the crime committed, our ability to measure intelligence is of major significance. However, as will be described later, the courts' views on intelligence testing have wavered and have not always been consistent.
Since 1932, it has been accepted that there is a difference of about one standard deviation between the intelligence test scores of White and Black test takers (Onwuegbuzie & Daley, 2001). While several authors (reviewed in Chapter II) have proposed diverse explanations for these differences, little research today investigates how current scales perform with current test takers, or which factors within the tests may continue to contribute to racial differences in scores on standardized tests of intelligence. This is especially true of research concerning persons with disabilities, women, and age differences.
When a person does not perform as well as others taking the same tests, assessors
have an ethical obligation to be concerned about the potential consequences for test
takers. Gasquoine (2009) expressed this ethical concern when he stated:
As clinical neuropsychologists can interpret low scores (or a high number of
errors) on neuropsychological tests as indicative of cognitive impairment from
structural brain injury these findings place minorities at a higher risk of
misdiagnosis than Whites. (p. 250)
In 2008, Suinn and Borrayo cited U.S. Census projections that ethnic minorities would make up 36% of the population in 2010 and 52.3% by 2050. They argued that research on effective assessment practices needs to increase, with minorities receiving as much research consideration as “Euro-Americans,” and noted a shortage of assessments that capture information about culture-specific syndromes. Actual data from the 2010 Census (Census.gov, 2011) indicate that 27.6% of the population did not identify as “white alone.” Of the total population, 12.6% identified as Black or African American alone, 0.9% as American Indian and Alaska Native alone, 4.8% as Asian alone, 0.2% as Native Hawaiian and Other Pacific Islander alone, 6.2% as some other race alone, and 2.9% as two or more races.
For individuals whose scores are below average, research on the effects of the score disparity is almost nonexistent. This neglect of low-end scores persists despite findings by Detterman and Daniel (1989) that correlations among mental tests, and between mental tests and other cognitive variables, are highest for low-IQ groups. They explained that “subtest intercorrelations are significantly larger for low relative to high IQ groups” (Detterman & Daniel, 1989, p. 352). Because these intercorrelations are higher, if the Black-White disparity were found to be greater at the lower end of the intelligence distribution, misdiagnosis would become an even larger concern. Regardless, it is important to determine whether any part of the tests contributes to racial differences and thereby increases the potential for a group to be misdiagnosed.
Sattler (2008) reported Black-White differences in WISC-IV scores on all four Indexes and on the Full Scale IQ. Sattler’s (2008) publication indicated the following findings by the Psychological Corporation: “Euro Americans (N = 1,402) had a mean Full Scale IQ (FSIQ) of 103.24 (SD = 14.52) while African Americans (N = 343) had a mean FSIQ of 91.72 (SD = 15.74)” (p. 280). This 11.52-point gap approximates the previously identified one standard deviation difference in FSIQ performance (Onwuegbuzie & Daley, 2001).
Index scores were also reported for Verbal Comprehension (VCI), Perceptual Reasoning (PRI), Working Memory (WMI), and Processing Speed (PSI) (Sattler, 2008, p. 280). Sattler (2008) reported that Euro Americans had a VCI mean score of 102.92 (SD = 13.80), while African Americans had a mean score of 91.86 (SD = 15.42). For PRI, Euro Americans had a mean score of 102.77 (SD = 14.36), while African Americans had a mean score of 91.43 (SD = 15.07). For WMI, Euro Americans had a mean score of 101.26 (SD = 14.55), while African Americans had a mean score of 96.12 (SD = 15.35). Finally, for PSI, Euro Americans had a mean score of 101.41 (SD = 14.70), while African Americans had a mean score of 95.00 (SD = 15.66). For more detailed explanations of the Indexes, the reader is referred to Chapter 3 of the Wechsler Intelligence Scale for Children – Fourth Edition (WISC-IV) Technical and Interpretive Manual (Wechsler, 2003).
While the differences in VCI and PRI (about 11 points each) appear larger than those in WMI and PSI (about 5 to 6 points), Sattler’s chapter did not set out to analyze which, if any, Index scores contribute most to the disparity. Nor did it address the contributions of the subscales themselves, which are a topic of interest for this dissertation. In keeping with the chapter's purpose, the reported data reflect the performance of individuals representative of the general population. Earlier in the text, Sattler (2008) cited Jensen (1975) in indicating that the disparity may be associated with g loading, but he later qualified this:
The present consensus is that it is not possible to make valid inferences about
genetic differences among races as long as there are relevant systematic
differences among races in socioeconomic status, cultural patterns, and
environments. These differences influence the development of cognitive skills in
complex ways, and no one has succeeded in either estimating or eliminating their
effects. Centuries of discrimination have made meaningless direct comparisons of
the mental ability of African Americans and Euro Americans. (p. 169)
Direct comparisons of mental ability may well be “meaningless,” but comparing the elements of a test that may contribute to misinterpretations of cognitive differences is not, if one objective of the research is to clarify what constitutes a measure of intellectual disability.
Sattler’s (2008) findings do not address the WAIS-IV, because his text focused on the testing of children. In addition, the research questions of this dissertation are beyond the scope of his text: Sattler’s results do not show how any of the subscales affect the disparity in Full Scale IQ, nor do they address racial differences among low scorers on either the WISC-IV or the WAIS-IV. Sattler’s text also does not address within-group differences.
Assessment disparity is not merely an academic concern. Psychologists and test administrators have an ethical obligation to attend to any perceived lack of cultural consideration in their work as practitioners. According to the American Psychological Association’s (APA, 2003) Guidelines on Multicultural Education, Training, Research, Practice, and Organizational Change for Psychologists, “Consistent with Standard 2.04 of the APA Ethics Code (APA, 1992), multiculturally sensitive practitioners are encouraged to be aware of the limitations of assessment practices, from intakes to the use of standardized assessment instruments” (p. 391).
Because of well-documented concerns about the historical misuse of these tests, we should not merely assume that the norming processes used by test developers have effectively eliminated bias. Simply put, ignoring the question of race differences in these new assessments could inadvertently perpetuate a myth of one race’s intellectual superiority. Such a myth is of particular concern given the recognition that the tests may require specific cognitive procedures that are used more often by Whites than by Blacks, as previous studies of similar tests have found (Helms, 2006).
If differences persist between the performance of Whites and Blacks, it would be helpful to determine whether they reflect overall test differences or whether some subtests contribute more than others. Subtest-level findings could inform modifications of future tests and a greater understanding of cultural differences in the cognitive experiences of both groups.
The “Flynn effect” (Flynn, 1987) refers to the rise in IQ scores observed over the past century. Much of this gain has occurred in the lower range of intelligence. If the disparity between Black and White IQ scores is to decrease, the decrease may appear first within this group; that is, if IQ gains are concentrated at the lower end of the distribution, reductions in the disparity are most likely to be found there first. Although Sattler reported that a gap of roughly one standard deviation was still present for children taking the WISC-IV (Sattler, 2008, p. 280), his data did not come from a sample representing Black-White differences at the lower end of the intelligence distribution. Sattler also does not provide information on testing adults with the most current scale (WAIS-IV). In addition, as mentioned, these studies do not explore within-group variability.
Many people over the course of history have sought to provide definitions and tests to determine a person’s intelligence. Just as there are many perspectives on the definition of intelligence, and consequently on how it might be tested, there are many personal backgrounds that could influence how an author defines and then tests intelligence. People’s views on race, their experiences as members of a race, and their ideas about the role of race in intelligence have been reported to play a part in how they define and test intelligence. Thus, another question that arises in the literature on intelligence testing and race is the issue of test fairness. The following sections briefly describe intelligence, race, and fairness. Although each topic is directly relevant to this research, the descriptions are limited by the scope of this dissertation.
Intelligence
Discussion of intelligence in the literature makes clear that its definition and conceptualization remain a topic of debate. Volumes have been written to describe intelligence and to present theories about what it involves and how individuals may or may not possess it. Boring commented that “intelligence is whatever the tests test” (cited in Beins, 2010, p. 89). Thus, we need to be thoughtful about which test we choose and what that test measures.
In 1958, Wechsler (cited in Onwuegbuzie & Daley, 2001) defined intelligence as “the aggregate or global capacity of the individual to act purposefully, to think rationally, and to deal effectively with the environment” (p. 211). However, Onwuegbuzie and Daley (2001) challenged Wechsler’s perspective, noting, “Yet even Alfred Binet, who devised the first IQ test in 1905, declared that intelligence is too complex to summarize with a single number and warned of the ‘brutal pessimism’ that would ensue if IQ tests ever were mistaken as a measure of fixed immutable intelligence” (p. 211).
Onwuegbuzie and Daley also referred to Greenfield (1998), noting that her explanation of intelligence was:
…the ability to acquire competence through learning, socialization, and
development from each of the following: (a) technology, (b) linguistic
communication, and (c) social organization, facets that vary from culture to
culture. Thus by Greenfield’s definition, intelligence varies from culture to
culture. In other words, cultures define intelligence by what is adaptive in their
particular social and cultural milieu. (p. 212)
Two additional contributions to the understanding of intelligence are Pinker’s (1997) work on artificial intelligence and Dweck’s (2000) work on self-theories. In How the Mind Works, Pinker (1997) compared artificial and human intelligence and suggested that a large part of intelligence involves the ability to attain goals. Factors that thwart an individual’s goal attainment can then be seen as affecting that individual’s intelligence. This view could be interpreted as supporting the idea that groups of people whose goals have been blocked may not score as well on intelligence tests. Persons of minority status or with disabilities may experience more obstacles to goal attainment. The tests they are given should not contain obstacles of their own; rather, they should serve as aids to overcoming them.
In Self-Theories, Dweck (2000) compared “entity” and “incremental” theories of intelligence. According to her research, individuals who view intelligence as a fixed entity are less likely to perform well than those who view it as something that can change from time to time and situation to situation (incrementally). One concern about standardized measures is their potential to perpetuate “entity” perspectives of intelligence. Sternberg (2000) described a related concept, the “implicit theory” of intelligence, noting that intelligence tests “are validated almost exclusively against the societally approved criteria, giving the tests the appearance of validity that they may not have within a given sociocultural group” (p. 159). Differing conceptualizations of intelligence likely reflect, and may be influenced by, the history of intelligence testing. More information about that history follows in Chapter II.
Advances in our understanding of “cognitive plasticity” (see Mercado III, 2009, for a more detailed explanation) have also helped identify critical periods for intervention, based on the remarkable adaptive abilities of the developing brain. From the perspective of cognitive neuroscience, intelligence involves a complex interplay of genetic, physiological, and environmental influences. Such sophisticated knowledge of brain development may help refine our understanding of the nuances of intelligence and contribute to more useful tools for assessing cognitive skills.
While alternative conceptualizations of intelligence offer promise for increasing fairness and broadening our view of intelligence, they currently and unfortunately lack empirically validated objective measures that lend themselves to research analyses. Suzuki and Valencia’s (1997) definition of intelligence best supports the purposes of this dissertation research: “Intelligence is operationally defined by scores on individually administered standardized intelligence tests” (p. 1103). Because the data available for this study had been operationalized as scores on the WAIS-IV and WISC-IV, we likewise accept the definition of intelligence provided by these scales. Sometimes, when our only tool is a hammer, the question becomes whether it is our best tool or whether it can at least be modified into a better one.
Race
The topic of racial differences raises the question: What is meant by race? The APA addresses this question by defining race in its Guidelines on Multicultural Education, Training, Research, Practice, and Organizational Change for Psychologists (APA, 2003). The APA indicated that the biological definition of race has led to a great deal of controversy. Citing Helms and Cook (1999), the APA explains that the controversy centers on the recognition that “biological racial categories and phenotypic characteristics have more within-group variation than between-group variation” (p. 380). In the guidelines (APA, 2003), race is viewed not as a biological construct but as a social construct: “Race, then, is the category to which others assign individuals on the basis of physical characteristics, such as skin color or hair type, and the generalizations and stereotypes made as a result” (p. 380). The guidelines quote Helms and Talleyrand (1997): “Thus, people are treated or studied as though they belong to biologically defined racial groups on the basis of such characteristics” (p. 374). For the purposes of this dissertation study, only participants who identified with, and were classified within, the social construct of either Whites or Blacks will be included. These racial constructs carry stereotypes that have resulted from past studies of these racial groups, but this dissertation research is conducted in the spirit of contributing to the eventual elimination of stereotypes as they relate to intelligence testing and disability determination. In addition, identifying race may be beneficial if it reveals that one group is being discriminated against, or even treated differently from another, in the assessment of intellectual disability.
Fairness
In light of the history of IQ tests and the potential misuse of IQ test results in
disability determination, and because that history makes an “illusory correlation”
or stereotypical conclusion not only possible but also hazardous, one of the
main reasons for conducting this dissertation research is to determine whether the newest
versions of the Wechsler tests are fair. These tests might be found fair if the
“Flynn effect” (Flynn, 1987), or modifications in the tests themselves, has rendered the
Black/White disparity obsolete or moved it in that direction, at least for persons with
intellectual disabilities. Helms (2006) defined ‘fairness’ in testing as “the removal from
test scores of systematic variance attributable to experiences of racial or cultural
socialization, and it is differentiated from test-score validity and cultural bias” (p. 845).
Even the legal system has struggled to decide on the fairness
of IQ testing as it relates to determining a disability. For example, in 1979 in the case of
Larry P. v. Wilson Riles, Judge Peckham of California ruled that IQ tests are culturally
biased when used to assess Black children for classes for the educable mentally retarded.
One year later, in the case brought forward by the conservative parent group (Parents in
Action), Parents in Action on Special Education v. Joseph P. Hannon, Judge Grady of
Illinois ruled that intelligence tests are not racially or culturally biased and do not
discriminate against Black children. However, if a racial disparity exists among
persons who are applying for disability determination and who score lower on the
newer test versions, it becomes more likely that the tests are not fair. Given debates within
psychology about the fairness of intelligence testing, it is not surprising that the courts
have not been able to agree on whether these tests are culturally biased.
While promoting an argument for genetic influence, Herrnstein and Murray
(1995) conceded that culture must, to some degree, be acknowledged in the results of
intelligence testing, and that the gap would be more likely to narrow if conditions
improved in the larger culture in which the disparity exists. They also appear to
support the hypothesis of this dissertation, that a change in disparity may first be observed
at the lower end of the score distribution.
In the past few decades, the gap between blacks and whites has narrowed by
perhaps 3 points. The narrowing appears to have been mainly caused by a
shrinking number of very low scores in the black population rather than an
increasing number of high scores. Improvements in the economic circumstances of
blacks, in the quality of the schools they attend, in better public health, and perhaps
in diminishing racism may be narrowing the gap. (Herrnstein & Murray, 1995, p.
269)
Beyond recognizing that improvements in environmental factors within the culture
may close the gap, this quote anticipates the prediction examined in this dissertation:
if the gap begins to close, the change is more likely to appear first at the lower end of
the score distribution. Sattler (2008) appeared to agree, citing the findings of Colom,
Lluis-Font, and Andrés-Pueyo (2005) that the gains made in performance among Blacks
on the WISC-IV were “on the lower portion of the IQ distribution” (p. 252).
Evidence of a reduced disparity in test scores (even if it appears only at the
lower end) could also be an early indication of progress toward greater
fairness in these cognitive assessments. Without a change in the degree of disparity,
IQ assessments cannot be considered completely fair tests. Perry, Satiani, Henze,
Mascher, and Helms (2008) explained:
Helms (2006) revisited the concept of cultural test bias by reframing the matter
under an Individual-Differences Fairness model. According to this approach, CAT
(cognitive ability test) scores should not be correlated with racial-cultural
constructs (e.g., racial identity). If they are correlated and result in mean
differences, then CATs are not ‘fair’ instruments. Helms (2006) suggested that
researchers replace the use of the construct of race with individual-difference
constructs based on socialization. In a hierarchical regression, fairness would enter
‘conceptual constructs in the first step of an analysis to predict scores and racial
groups in a second step’ (Helms, 2006, p. 852). If the second step fails to explain
variation in scores above and beyond the first step, the explanations based on racial
group would not be adequate models. (pp. 164-165)
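The two-step logic of Helms's (2006) Individual-Differences Fairness model, as summarized by Perry et al. (2008), can be sketched as a hierarchical regression. The Python sketch below is a hypothetical illustration only, not the procedure used by Helms or in this dissertation. It assumes a single continuous socialization construct (for example, a racial identity score), a racial-group code of 0/1, ordinary least squares, and simulated data; the function names `r_squared` and `hierarchical_fairness_test` are invented for the example.

```python
import numpy as np


def r_squared(X, y):
    """Proportion of variance in y explained by an OLS fit on X (X includes an intercept column)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot


def hierarchical_fairness_test(construct, group, scores):
    """Hypothetical helper: step 1 uses the construct only; step 2 adds a 0/1 group code.

    Returns (R^2 at step 1, R^2 at step 2, change in R^2, F statistic for the change).
    """
    n = len(scores)
    intercept = np.ones(n)
    X1 = np.column_stack([intercept, construct])         # step 1: conceptual construct only
    X2 = np.column_stack([intercept, construct, group])  # step 2: add racial-group code
    r2_step1 = r_squared(X1, scores)
    r2_step2 = r_squared(X2, scores)
    delta_r2 = r2_step2 - r2_step1
    k_added = X2.shape[1] - X1.shape[1]   # number of predictors added at step 2
    df_resid = n - X2.shape[1]            # residual degrees of freedom of the full model
    f_change = (delta_r2 / k_added) / ((1.0 - r2_step2) / df_resid)
    return r2_step1, r2_step2, delta_r2, f_change


# Simulated (hypothetical) data: group membership overlaps with the construct,
# but scores depend only on the construct.
rng = np.random.default_rng(0)
n = 500
construct = rng.normal(size=n)
group = (construct + rng.normal(size=n) > 0).astype(float)
scores = 100 + 10 * construct + rng.normal(scale=8, size=n)

r2_1, r2_2, delta, f_change = hierarchical_fairness_test(construct, group, scores)
print(f"Step 1 R^2 = {r2_1:.3f}, Step 2 R^2 = {r2_2:.3f}, delta R^2 = {delta:.4f}, F = {f_change:.2f}")
```

Under this sketch, a small change in R² at the second step (with a correspondingly small F, which a formal test would compare with the F distribution on 1 and n - 3 degrees of freedom) would suggest that racial group explains little variance beyond the socialization construct, which is the condition Helms describes for explanations based on racial group being inadequate.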
Although determining whether the revised IQ scales are completely fair or unfair falls
beyond the scope of this project, the results of this dissertation may indicate that
the scores are not biased for people who are applying for disability. If ethical practice is
to be maintained, assessments must be made as fair as possible for all people being
measured. If a measure continues to show performance differences for an entire group of
people, it cannot be concluded that the measure is fair, despite reports of validity and
reliability statistics. If disparity continues to exist, greater fairness may be achieved in
the future by discovering the factors that contribute to the disparity.
Research Hypotheses
To determine whether disparities exist in more current assessment tools for
intellectual disability, the psychometric properties of the WAIS-IV (Wechsler, 2008) and
the WISC-IV (Wechsler, 2003) could be explored for individuals applying for disability
benefits. It would be beneficial to determine whether the assessment tools indicate any
differences for Blacks or Whites, males or females, or whether any influence of age is
observed. The following research hypotheses provide a guide for answering
preliminary questions about whether the psychometric properties of these tools are affected
by race, gender, or age:
Hypothesis 1: Among people applying for disability, there is a relationship between age,
gender, race, and performance on the Wechsler Intelligence Scale for Children, Fourth
Edition (WISC-IV; Wechsler, 2003).
Hypothesis 2: Among people applying for disability, there is a relationship between age,
gender, race, and performance on the Wechsler Adult Intelligence Scale, Fourth Edition
(WAIS-IV; Wechsler, 2008).
Summary
This first chapter presented an argument for the importance of conducting this
dissertation research. Although several less biased models of intelligence have been
proposed, they have not been standardized. The assessment tools that have been
standardized have been challenged as being unfair due to differences measured with
people of differing race. This issue cannot be ignored because of the professional
mandate to practice ethically when determining the correct assessment methods for
determining intellectual disability. Additionally, the ethical challenge is coupled with
recognition of the potential for harm by labeling someone with a disability if a test may
be biased. If one group of people is at greater risk of misdiagnosis than another because
of a disparity in test scores, people may be inadvertently labeled as having a disability when
the issue is really about differences in the way people from different backgrounds
respond to the test. Black people have a far longer history of being exploited than
Whites, and the review of historical debates about racial differences in intelligence in
Chapter 2 underscores this.
Chapter 2, a review of the relevant literature, explores the meaning of
intellectual disability, the history of intelligence testing specific to racial differences
(including the expression of psychology’s nature versus nurture debate), the choice
of scales to be used, the potential role of age, gender, and racial differences in test
scores, and the potential contributions of the subscales.
CHAPTER II
REVIEW OF THE LITERATURE
Chapter 2 reviews the relevant research on disability, age, and gender in the
measurement of intelligence. It then briefly describes some of the extensively documented
history of intelligence testing as it relates to racial disparity in the determination of
disability, and continues by discussing how the nature/nurture debate has been expressed
in exploring the heritability of racial differences in IQ test scores. The chapter
concludes by explaining which scales were explored and why the
population of people applying for disability was chosen.
Disability
The American Association on Intellectual and Developmental Disabilities
(AAIDD, 2011) indicated that (1) IQ testing is a major tool in assessing intellectual
disability, (2) a score as high as 75 can indicate limitations in intellectual functioning,
and (3) the term “intellectual disability” means the same as “mental retardation.” Because
“intellectual disability” is the preferred term, the association changed its name from the
American Association on Mental Retardation to AAIDD in 2007. While identifying
multiple factors that may contribute to intellectual disability, the association
emphasized:
The overarching reason for evaluation and classifying individuals with intellectual
disabilities is to tailor supports for each individual, in the form of a set of
strategies and services provided over a sustained period. The goal is to enhance
people’s functioning within their own culture and environment in order to lead a
more successful and satisfying life. Some of this enhancement is thought of in
terms of self–worth, subjective well-being, pride, engagement in political action,
and other principles of ‘disability identity.’ (p. 2)
According to Foley-Nicpon and Lee (2012), “Now is the time for counseling
psychology to give theory, research, and practice with individuals with disability the
attention they deserve” (p. 397). Foley-Nicpon and Lee (2012) analyzed the content of
20 years of counseling psychology research in the area of disability.
Although disability is an important element of counseling psychology’s emphasis on
multiculturalism and diversity, and the field places great importance
on preventing marginalization and discrimination, disability research comprised “an
extremely small amount (from less than 1% to 2.7%) of the counseling psychology
literature” (p. 392). Citing 2005 Census Bureau reports, the authors indicated that this is of
great concern given that 19% of the U.S. population (about one in five) report
having one or more disabilities, making people with disabilities the largest minority
group in America. Foley-Nicpon and Lee (2012) indicated that the study of disability is
complex because the range of disabilities is enormous, and as a result disability research
becomes specialized by divisions such as Rehabilitation Psychology (APA Division 22) for
physical disability and Clinical Psychology (APA Division 12) for psychological
disorders. In a review of 55 articles in the counseling psychology literature, Foley-Nicpon
and Lee (2012) found that, among the few articles on disability, the largest share (38%)
were literature reviews or opinion, policy, or legal reports.
Fortunately, the number of empirical studies in disability research,
while still sparse, has increased slightly over the past 20 years, a trend the authors
encourage. However, another major concern is that the limited disability research that
is being conducted is “one dimensional and focused on the disability itself … but not on
how aspects of disability interact with our many other identities including race, class and
gender…Literature focusing on the intersection of multiple identities is recommended”
(p. 396).
Age
According to Rushton and Jensen (2005), the size of the average Black-White difference
does not change beyond age three. This dissertation research continues that exploration to
determine whether this remains the case for people applying for disability. Murray (2007)
observed fluctuations in age-related score differences that challenge Rushton and
Jensen’s conclusions. However, Murray found these differences based on testing with
the Woodcock-Johnson (Woodcock & Johnson, 1989) test of intelligence. Kaufman,
Johnson, and Liu (2008), using the Kaufman Brief Intelligence Test (Kaufman & Kaufman,
2004), found that IQ decreases with age, but this study did not report any effect of this
change on the lower end of intelligence test scores. Age may also influence differences
between subscale scores (e.g., crystallized versus fluid scales). Horn and Cattell (1966)
differentiated between fluid and crystallized intelligence. Fluid intelligence was defined
as problem solving and the ability to perceive relationships and similarities, and it does
not rely on specific instruction. In later intelligence tests this came to be known as
performance intelligence. At the same time, Horn and Cattell defined crystallized
intelligence as knowledge accumulated over time. Crystallized skills are affected by
knowledge and cultural experiences. In later tests this came to be known as verbal
intelligence. Crystallized intelligence can increase with age, while fluid intelligence can
decrease with age.
Nisbett et al. (2012) cited Flynn (2009) as linking aging to declines in fluid (gF)
versus crystallized (gC) ability for “brighter individuals” (p. 143): there appears to be an
“age tax” after age 65, with those of higher IQ showing the greatest decline. Age
differences for lower IQ scores were smaller and may be found in verbal skills. It should
be noted, however, that this research was conducted on previous versions of the WAIS.
After age 65, the brighter the person the greater the decline: Whereas those at the
median lose an extra 6.35 points (SD= 15) compared to those 1 SD below the
median, those 1 SD above the median lose an extra 6.20 points compared to those
at the median, and those 2 SDs above the median lose an extra 3.4 points
compared to those 1 SD above. Those who are very bright, rather than below
average, pay a total penalty of 16 IQ points. The reverse is true of verbal, where
there is a ‘bright bonus’ of 6.30 points.
(p.143)
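The 16-point total in the quoted passage can be checked by adding the three step-wise losses, moving from 1 SD below the median to 2 SDs above it; expressed in standard-deviation units (SD = 15), the total is roughly one SD. This is only arithmetic on the figures quoted by Nisbett et al. (2012), not an additional finding.

```latex
\[
6.35 + 6.20 + 3.40 = 15.95 \approx 16 \text{ IQ points}, \qquad \frac{15.95}{15} \approx 1.06 \text{ SD}
\]
```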
Nisbett et al. (2012) also referenced Blair’s (2006) claim that gF declines more
quickly than gC as we get older, because the prefrontal cortex deteriorates faster than the
remainder of the cortex. Because people of different ages also represent
different eras, comparing the racial disparity across age groups could
also provide information about different periods in history. Indeed,
Nisbett et al. (2012) reported a “dramatic decline of Black IQ with age,” noting that there
was a 5-point difference between Black and White 4-year-olds, but a 17-point difference
between Black and White 24-year-olds. They added, “this could be as it seems a loss with
age. But it could be that younger cohorts of Black (those born 5 years ago) have had more
favorable life histories than older cohorts of Blacks (born 24 years ago)” (p. 147).
This dissertation seeks to explore any fluctuations related to age within WISC-IV
and WAIS-IV scores, and, more specifically, within persons from the lower end of the
scale where the results carry higher stakes.
Gender
There is some discussion in the literature about the role gender has played in
WAIS-IV and WISC-IV scale results. Most studies have observed no differences, while
others have observed some. For example, in a sample of 2,200 children (half male, half
female) studied by Chen and Zhu (2008), “overall factor patterns, loadings,
unique variances, and factor co-variances of the WISC-IV generally did not vary with
gender” (p. 260).
Yet other research has found some gender differences. Jackson and Rushton (2006)
explored Scholastic Assessment Test (SAT; College Entrance Examination Board, 1992)
results for 100,000 participants and found gender differences. The g factor
measured by the verbal and mathematics sections of the SAT (which the authors report
ought to parallel g as measured by standardized intelligence tests) was significantly
higher for males than for females. Chen and Zhu (2008) indicated that gender invariance
is essential to demonstrate, and that there is a great deal of interest in understanding
whether there are any differences between the genders in cognitive abilities. The
practice of combining males’ and females’ test results implies that the scales and
subscales have the same meaning for both genders. Chen and Zhu (2008) indicated that,
“According to standard 7.8 of the ‘Standards for Educational and Psychological Testing’
(American Educational Research Association and National Council on Measurement in
Education, 1999), ‘Comparisons across groups are only meaningful if scores have
comparable meaning across groups. The standard is intended applicable to settings
where scores are implicitly or explicitly presented as comparable in meaning across
groups.’” (p. 83).
Most studies have found little or no gender difference in overall performance. Sattler
(2008) reported WISC-IV average Full Scale IQ scores as similar for boys (100.24) and
girls (99.78). This held true for mean differences on Verbal Comprehension, Perceptual
Reasoning, and Working Memory, but the average on the Processing Speed Index was
almost 5 points higher for boys (102.48) than for girls (97.63). However, these data were
based on the original Psychological Corporation norm group, so they do not indicate
whether this gender difference holds true for youth with cognitive disabilities (Wechsler,
2003, p. 280). In addition, Sattler (2008) did not address adult assessment, so these data
do not show gender differences among adult test takers.
Nisbett et al. (2012) did report gender differences (among participants from the general
population), with an advantage for males in “visuo-spatial abilities” (such as object
rotation) and for females in “verbal abilities” (fluency and memory). Males’ “verbal
scores may be decreased because of a higher prevalence of stuttering, dyslexia, and
autism in males.” They also noted that there are many more “mentally retarded” males
than females. “Males are more variable in their performance on some tests of
quantitative abilities which results in more males at both the high and low ends of the
distribution…This may indicate if we find a gender difference (not already adjusted for
by standardization) that it would not be because males are performing higher than
females on quantitative measures (which is not the case in median and above median
performances)” (p. 145). In general, there may be no gender differences in overall
performance, but there may be some in subtests, as indicated:
“Men and women apparently achieve similar IQ results with different brain
regions, suggesting that there is no singular underlying neuroanatomical structure to
general intelligence and that different types of brain designs may manifest equivalent
intellectual performance” (Haier et al., 2005, p. 145).
Researchers may ask whether the standard of comparable meaning across groups applies
not only to gender but also to age and race, and whether more research is warranted.
Review of the literature on gender differences raised some points that contributed to the
research questions of this dissertation: Which findings can be replicated? Do any of these
findings hold true for the WAIS-IV? Do they hold true for differences at the high or low
end of the scale? Do these differences in any way influence measured differences between
races? Does gender contribute to psychometric outcomes indicating intellectual disability,
and if so, which scales may be most influenced by gender?
History of Intelligence Testing and Racial Differences
Knowledge of the reported history of intelligence testing helps explain why
it remains important to stay apprised of issues around race in high-stakes testing. Although
this is not a full report on the entire history of IQ testing, I will highlight some of the
information in the literature about the historical factors that may have contributed to the
controversy about the differences observed in scores between White and Black test
takers. For a more extensive review of the history of intelligence testing, the reader is
referred to Beins (2010), Benjamin and Baker (2004), Boake (2002), Gould (1981),
Guthrie (2004), and Pickren and Dewsbury (2002).
As early as 1817, Gall and Spurzheim began the practice of phrenology,
concluding, and convincing the public, that the size, shape, and bumps of the skull
reflected its contents. This led to the establishment of multiple testing clinics within the
United States and increased public interest not only in the study of the skull and what it
contained, but also what findings about the skull might mean.
In 1849, Samuel George Morton filled skulls with packing material to measure
“endocranial volume.” He found that he could pack about five cubic inches more
material into the skulls of Whites than into those of Blacks (Rushton & Jensen, 2005).
However, as Gould (1981) reported, Morton may have altered his data and
even shaken the skulls differently to allow more material to be packed into selected
skulls. In 1869, Galton, Charles Darwin's cousin (who was also interested in genetic
influence), used head size and other “brass instruments” to measure intelligence. Galton
introduced the term eugenics to science and had a “lifelong obsession” with selective
mating to improve the stock of a race (Welch, 2006). In 1873, Paul Broca weighed the
brains of Blacks and Whites and found Whites’ brains to be heavier, with larger frontal
lobes and more “complex convolutions” (Rushton & Jensen, 2005). Thus, during the 1800s
there was early research interest in intelligence and specifically measuring the brain and
skull on the premise that the sizes reflected a parallel amount of intelligence.
People were interested in intelligence and in its relationship to race before Binet
generated his first intelligence test in 1881. Binet, commissioned by the French
government, was looking for “sub-normal children” and began objective testing in the
hope of determining which children could benefit most from school. Simon joined Binet
in 1905 and added tests of abstract reasoning and general mental ability, leading to the
Simon/Binet concept of a mental age (Weiten, 2010).
In 1897, Charles Cooley, one of the founders and a later president of the American
Sociological Association, argued for a “deprivation” model. Cooley pointed to slavery,
poverty, and racism to explain the Black-White disparity in intelligence.
Cooley used the metaphor of corn seeds grown in deprived versus normal environments
and the effects on their height (Rushton & Jensen, 2005). In 1904, Spearman discussed
the concept of “general intelligence,” concluding it could be objectively determined and
measured. Spearman observed that different measures of mental ability are positively
correlated and theorized that all of these tests tapped into a general mental ability that he
called “g.” Terman, at Stanford University, developed the Stanford-Binet in 1916, adding
a new scoring method based on the intelligence quotient (IQ), where IQ = (MA / CA) × 100
(mental age divided by chronological age, multiplied by 100). Guthrie (2004) expressed concern
about Terman and quotes Terman as having said: “[Mental retardation] represents the
level of intelligence which is very, very common among Spanish-Indians and Mexican
families of the Southwest and also among Negroes. Their dullness seems to be
racial” (p. 61). Guthrie added: “Terman further predicted that when future IQ testing of
these groups was done, ‘there will be discovered enormously significant racial
differences which cannot be wiped out by any scheme of mental culture’” (p. 61). Thus,
Guthrie demonstrated his concern that one of the early contributors to the field of
intelligence testing firmly believed that members of certain racial groups were less
intelligent than Whites.
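As a worked illustration of the ratio IQ formula attributed above to Terman's Stanford-Binet, consider a hypothetical 10-year-old whose test performance matches that of a typical 8-year-old (the ages are invented purely for the arithmetic):

```latex
\[
\text{IQ} = \frac{\text{MA}}{\text{CA}} \times 100 = \frac{8}{10} \times 100 = 80
\]
```

By contrast, current Wechsler scales, including the WISC-IV and WAIS-IV, report deviation IQs scaled to a mean of 100 and a standard deviation of 15 within each age group, rather than a ratio of mental to chronological age.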
Goddard introduced the study of the Kallikak family to the American people in
1912. This report served to illustrate the problems created by the potential breeding of
whites with “colored” people and how their lack of intelligence predisposed them to lives
of crime and poverty (Welch, 2006). Benjamin (2009) provided a review of Goddard’s
contributions to the American intelligence testing movement. While Benjamin
acknowledged the importance of Goddard’s contributions, he was also concerned about
Goddard’s comments about keeping cognitively-impaired children in institutions or
having them sterilized to prevent breeding. Benjamin was also concerned about
Goddard’s contributions to the eugenics movement. Benjamin described how Goddard’s
book, The Kallikak Family: A Study in the Heredity of Feeble-Mindedness (1912), helped to
perpetuate the argument against “feebleminded people” having children. Gould (1981)
claimed Goddard “altered photographs to suggest mental retardation in the Kallikaks,”
indicating that his efforts were conscious acts of “social prejudice” (p.59). Goddard
insisted that Binet’s test measured g, despite Binet’s insistence that his tests did not
measure intelligence. Goddard also insisted intelligence is innate and inherited from
one’s family (Welch, 2006).
Wechsler introduced the WISC (Wechsler Intelligence Scale for Children) in 1949 and
the WAIS (Wechsler Adult Intelligence Scale) in 1955. Both included performance scales
(assembling blocks, etc.) and were less verbal than the Stanford-Binet Intelligence Scale
(Terman, 1916). However, in the 1970s, these tests also generated controversy
regarding potential cultural bias. For example, one item asked what to
do if someone hits you and awarded more points for a culturally biased response: contact
the authorities. White, Anglo-Saxon test takers were more likely to provide this response
than members of other cultural groups, who might respond with “hit him back” (Weiten, 2010).
Another contribution to intelligence testing that has informed understanding of
the racial differences is the idea that some intelligence may be innate while some may be
learned. As noted earlier, Horn and Cattell (1966) expressed this distinction in terms of
fluid intelligence (problem solving and the perception of relationships and similarities,
not reliant on specific instruction, later known as performance intelligence) and
crystallized intelligence (knowledge accumulated over time and shaped by cultural
experience, later known as verbal intelligence).
Between 1968 and 1972, two figures emerged who challenged the alleged
racial bias of intelligence tests: Adrian Dove and Robert Williams, designers of the
Chitling (Dove, 1968) and the Black Intelligence Test of Cultural Homogeneity (BITCH;
Williams, 1972) tests. Both designed simple measures showing that tests written in
the language of Black people generated better performances by Black people. This
indicated that, to succeed on an IQ test, Blacks might need further exposure
to White language, essentially demonstrating bilingual abilities that are not expected of
White test takers. In 1968, Black sociologist Adrian Dove developed the Dove
Counterbalance Intelligence Test. It came to be known as the Chitling Test (from an
article in Newsweek, 1968) because one of the questions asked about the cooking time for
chitlings. According to Kaplan and Saccuzzo (2001), the Chitling test is not standardized
and has no predictive validity, only face validity. They added that it did not discriminate
between those who had been exposed to 1960s Black culture and those who had not. In
a telephone conversation with this researcher on June 11, 2008, Dr. Robert Williams
reported that he was influenced by the Chitling test in his construction of the BITCH
test (1972). This conversation indicated that Williams, like Dove, had a strong interest in
the contribution that crystallized skills (verbal intelligence) were making to the perceived
racial differences reported in standardized tests.
The BITCH test was developed using results from 100 Black and 100 White high
school students (ages 16-18) from St. Louis, half of low SES and half of middle SES.
Black participants appeared intensely interested in the test, while White participants
questioned its validity and appeared tense (sighed, showed signs of discomfort, etc.). The
test consisted of 100 multiple-choice items. Examinees were presented with a word, term,
or expression and asked to select the correct meaning. Blacks outscored Whites by 36
points on average. Both groups produced skewed distributions. Blacks’ scores were
negatively skewed (more high scores than low scores), while Whites’ were positively
skewed (more low scores than high scores), indicating the test was difficult for Whites but
easy for Blacks. When the distributions were combined, they produced a normal
distribution curve, with Blacks comprising the upper half and Whites the lower half.
Williams reported that these results demonstrated that tests derived from the
experience of one group, but used to determine the abilities of another, are inherently
unfair. The BITCH had only a .33 correlation with the language portion of the California
Achievement Test (CAT; California Test Bureau, 1973), a modest relationship in which
some low CAT scorers were among the highest BITCH scorers and vice versa.
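Because the sign convention for skew can seem counterintuitive (a test that is easy for a group piles scores near the ceiling and produces negative skew), a short Python sketch may help. It computes the Fisher-Pearson skewness coefficient g1 = m3 / m2^1.5 from population moments; the score lists are hypothetical values invented for illustration and are not Williams's data, and `sample_skewness` is an illustrative name.

```python
def sample_skewness(scores):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5, using population (divide-by-n) moments."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((x - mean) ** 2 for x in scores) / n
    m3 = sum((x - mean) ** 3 for x in scores) / n
    return m3 / m2 ** 1.5


# Hypothetical scores on a 100-item test (invented values, NOT Williams's data).
easy_test_group = [92, 94, 95, 96, 96, 97, 98, 98, 99, 70]  # bunched near the ceiling, long low tail
hard_test_group = [30, 32, 33, 34, 34, 35, 36, 36, 37, 60]  # bunched near the floor, long high tail

print(round(sample_skewness(easy_test_group), 2))  # negative skew
print(round(sample_skewness(hard_test_group), 2))  # positive skew
```

The sign follows the long tail: when most scores cluster high (an easy test), the few low scores form a tail to the left and the coefficient is negative, matching the pattern reported for Black participants on the BITCH.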
Additional research was conducted with the BITCH test. Long and Anthony
(1974) examined the relationship between WISC and BITCH scores for 30 Black high
school students enrolled in an EMR (educable mentally retarded) program. All 30
students had been classified as educable mentally retarded using the WISC. The goal
was to determine whether BITCH scores could be used to rule out mental retardation. They
found that Black EMR students obtained similar scores on the WISC and the
BITCH. All 30 students obtained scores on the BITCH that fell below the 1st percentile,
indicating that EMR students scored poorly on both instruments even with the
inclusion of culturally specific items. Matarazzo and Wiens (1977) compared WAIS and
BITCH scores for 17 Black and 66 White police applicants. They examined the utility of
the BITCH as a selection instrument for Black police officers and found that White
applicants outscored Black applicants on the WAIS, while Black applicants outscored White
applicants on the BITCH. WAIS and BITCH scores were completely uncorrelated. The
authors concluded that the BITCH was not useful as a selection instrument because it
lacked an adequate ceiling for Black applicants. Young and Rearden (1979) compared BITCH and
Shipley Institute of Living Scale scores (Shipley, 1940) for 45 Black Chicago youth.
They used the Vocabulary and Abstract Reasoning subtests of the Shipley Scale. Their
results indicated that BITCH scores correlated negatively with Vocabulary and Abstract
Reasoning on the Shipley Scale. Finally, Butler-Omololu, Doster, and Lahey (1984)
compared WISC-R, CAT (California Test Bureau, 1973), and BITCH scores for 16 Black
and 16 White high school students. The goal was to examine the influence of cultural
factors on test performance. Their results indicated that Whites scored significantly
better than Blacks on the CAT and the WISC-R, and Blacks scored significantly better
than Whites on the BITCH. The authors concluded that cultural factors need to be
considered during test construction.
Thus, the overall conclusion about the BITCH test has been that BITCH
performance is either unrelated or negatively correlated with performance on traditional
ability tests grounded in Euro-American culture (e.g., WAIS, WISC, CAT, Shipley).
Blacks outscored Whites on the BITCH, Whites outscored Blacks on tests reported
as favoring Euro-American culture, and cultural factors were reported as inseparable
from the test construction process.
Guthrie (2004) described the work of G.R. Stetson, who in 1897 compared 500
Black and 500 White American school children in an experiment that required them to
memorize and repeat a stanza of poetry. Stetson found that the Black children
outperformed the White children, but, as Guthrie (2004) indicated, little was done with
the results of this test, and it was also concluded that “the memory technique was not a
valid measure of intelligence” (p. 63). What Guthrie concluded about Stetson’s research
may be considered parallel, to some degree, to the conclusions drawn by more recent
intelligence theorists about the importance of findings like those of Williams and Dove.
The point is that some historical contributors to the field of intelligence testing may have
intentionally ignored information that could have contributed to the development of
fairer and more culturally sensitive tests. One could simply dismiss these findings because
neither the BITCH test nor the Chitling test (nor Stetson’s findings) was ever found to be
a measure of intelligence. But what was lost in that process is the point that Williams and
Dove made at the height of the American Civil Rights movement: although tests may be
standardized, they may not be fair, a point Helms takes up later.
In 1981, Gould published The Mismeasure of Man, which explores the history of
research efforts to demonstrate that whites are more intelligent than blacks. In this text,
Gould is critical of Darwin, Morton, and Broca for their work in craniometry because
they assumed that intelligence was a “single, innate, heritable, and measurable thing”
(p. 57). Thus, he believed, their research was not accurate: although they offered
correlations between brain size or cranial capacity and intelligence, they were able to do
so only because they had made this assumption about intelligence.
Gould (1981) stated that he had no doubt that IQ is to some degree inherited, but
argued that there is more to the “hereditarian fallacy” than the conclusion that
intelligence is inherited: “The hereditarian fallacy resides in two false implications drawn
from this basic fact: The equation of ‘heritable’ with ‘inevitable’ and the confusion of
within- and between-group heredity” (pp. 185-186). Gould identified three major early
contributors to bringing the hereditarian view of intelligence and intelligence testing to the
United States: (1) Goddard, who translated Binet’s work and brought the test back with
him from Europe; (2) Terman, developer of the Stanford-Binet; and (3) Yerkes, who
persuaded the army to adopt “objective testing,” which contributed to the Immigration
Restriction Act of 1924.
Murray (Herrnstein & Murray, 1996) rebutted Gould’s criticisms of early research
on cranial capacity, brain size, and intelligence. Murray indicated that research with
magnetic resonance imaging (MRI) has found that, once body size is accounted for, a
relationship between brain size and intelligence does exist, and that brain size is
distributed differently among races. According to Murray (1996), Gould’s work is very
wrong and largely ignores the research on g; Murray quotes Herrnstein as having said,
“You can make g hide, but you can’t make it go away” (p. 559). Murray (Herrnstein &
Murray, 1996) went on to argue that g has been empirically linked to IQ scores,
“neurophysiological functioning,” and genetics, adding that “the higher the g loading of
a subtest, the higher its heritability” (p. 560). Hence, the hereditarian explanation for
racial disparity is defined, to some degree, by Spearman’s concept of g, which is treated
as proof of a genetic contribution to intelligence. But do some subtests contribute more
strongly to the disparity than others? If so, another perspective might be that those
subtests are less fair.
In 1981, Sternberg proposed a Triarchic model of intelligence. Sternberg found
that, for most people, intelligence could be defined as one of the following: Verbal
Intelligence, an excellent ability to use language (for example, J.K. Rowling, author);
Practical Intelligence, exceptional problem-solving skills (for example, Bill Gates,
innovator and entrepreneur); or Social Intelligence, a strong ability to get along with and
even influence other people (for example, Oprah Winfrey, celebrity and social activist).
Sternberg elaborated that intelligence, in the Triarchic model, has multiple layers and
levels of influence. Similarly, Gardner (1983) argued that intelligence is more complex
than a single factor (like g) and, based on observations of people of achievement and
success, proposed what eventually became eight types of intelligence: (1) Linguistic,
(2) Logical, (3) Musical, (4) Spatial, (5) Kinesthetic, (6) Interpersonal, (7) Intrapersonal,
and (8) Naturalistic. Gardner also concluded that, in almost every studied relationship
between IQ and societal outcomes, IQ test scores explained at most 20% of the outcomes;
80% of the elements that contribute to socioeconomic status were factors that exist
beyond measured intelligence (Welch, 2006). Murray, in the Afterword to The Bell Curve
(1996), was critical of Gardner’s work. Murray argued that “linguistic intelligence” and
“logical-mathematical intelligence” (as measured by IQ tests) were better predictors of a
person’s success in life than Gardner’s other types of intelligence. Murray added that
Gardner had succeeded no better than anyone else in demonstrating that his types of
intelligence were statistically independent of each other; rather, these abilities actually
grouped with one another, as implied by the concept of g.
Mayer, Caruso, Panter, and Salovey (2012) described the growing research zeitgeist
toward alternative views of intelligence as the “hot intelligences” and advocated for their
continued inclusion in future research. These “hot intelligences” would include social,
emotional, and personal intelligences, encompassing people’s skill in interacting within a
social context, in contrast to “cool intelligences,” which measure only traditional
cognitive functions such as the ability to abstract and to manipulate information. Nisbett
et al. (2012) indicated agreement with Mayer et al. (2012) but added that, beyond finding
correlations between cool and hot intelligences, it has been hard to demonstrate a
significant contribution of hot intelligences to general behavior and performance.
In 1987, Flynn published his findings indicating IQ gains in 14 separate nations.
Flynn (1987) found what has come to be called the “Flynn effect”: IQ scores had
increased 3 points every 10 years in different nations, due largely to societal increases in
fluid (performance) intelligence. More recently, in concluding a discussion of the Flynn
effect, Hiscock (2007) stated,
A pervasive increase over time in performance on IQ tests is well established. The
magnitude of the increase is especially marked when culture-reduced tests of
general intelligence, such as Raven’s Matrices, are used. The Flynn effect also
raises scores from Wechsler and Stanford-Binet IQ tests, and the increases are
sufficiently large as to present an interpretive problem for practitioners who
administer IQ tests to their clients. Not only does the Flynn effect cause published
norms for Full Scale IQ to become progressively less appropriate over time, but it
also causes different subtest norms to change at different rates. Clinical
neuropsychologists who use old versions of IQ tests not only will overestimate IQ
but also will risk misinterpreting ipsative indicators such as Verbal-Performance
disparities and subtest profiles. (p. 526)
Thus, we can see that neuropsychologists and other testing researchers continued to
observe what Flynn reported and to use these findings to express concerns about test
interpretation. Rushton and Jensen (2010) were more critical of Flynn’s findings:
Rather than interpreting the secular gain of three IQ points a decade as evidence
that people become familiar with the test material over time, requiring periodic
updates of the test, Flynn took it to mean that ‘real’ intelligence levels have
increased at least in abstract reasoning. Where is there evidence of this
‘familiarity’? (p.217)
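To make Hiscock’s point about outdated norms concrete, the following Python sketch applies the gain of about 3 points per decade reported by Flynn (1987) to a score obtained with old norms. It assumes a constant, linear gain in Full Scale IQ; as Hiscock notes, real gains differ across tests and subtests, so the single rate and the function names (`norm_inflation`, `adjusted_iq`) are illustrative assumptions rather than a published correction procedure.

```python
# Illustrative sketch of norm obsolescence under a linear Flynn effect.
# Assumption: the population mean rises 3 IQ points per decade (0.3 per year),
# the rate summarized by Flynn (1987); actual gains vary by test, nation, and era.

FLYNN_RATE_PER_YEAR = 3.0 / 10.0  # IQ points per year (assumed constant)


def norm_inflation(norming_year: int, testing_year: int,
                   rate: float = FLYNN_RATE_PER_YEAR) -> float:
    """Approximate points by which old norms overstate IQ at testing time."""
    years_elapsed = testing_year - norming_year
    if years_elapsed < 0:
        raise ValueError("testing_year must not precede norming_year")
    return years_elapsed * rate


def adjusted_iq(obtained_iq: float, norming_year: int, testing_year: int,
                rate: float = FLYNN_RATE_PER_YEAR) -> float:
    """Subtract the estimated inflation from a score obtained with old norms."""
    return obtained_iq - norm_inflation(norming_year, testing_year, rate)


if __name__ == "__main__":
    # Hypothetical case: a score of 72 on a test normed 15 years earlier.
    print(round(norm_inflation(1995, 2010), 1))   # 4.5
    print(round(adjusted_iq(72, 1995, 2010), 1))  # 67.5
```

Under these assumptions, a 15-year gap between norming and testing inflates the obtained score by about 4.5 points, a difference that could matter for scores near diagnostic cut-offs.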
Another major contributing factor in the historical issues surrounding race and
intelligence was the 1995 publication of the best seller The Bell Curve. In The Bell Curve,
Herrnstein and Murray (1995) argued that intelligence is largely genetic and that racial
differences are an evolutionary by-product. They included provocative conclusions, such
as the idea that American Black intelligence was not affected by slavery, as evidenced by
research indicating that American Blacks performed better on IQ tests than African
Blacks. In 1995, Murray added an “Afterword” responding to criticisms of the text. He
argued that American social stratification is based on intelligence, and in particular on g,
which he argued is largely inherited. Murray summarized the conclusions of The Bell
Curve’s controversial 13th chapter:
Mental test scores are generally as predictive of academic and job performance
for blacks as for other ethnic groups. Insofar as the tests are biased at all, they
tend to over-predict, not under-predict, black performance. These factors are
useful in the quest to understand why (for example) occupational and wage
differences separate blacks and whites, or why aggressive affirmative action has
produced academic apartheid in our universities. (p.562)
Herrnstein and Murray (1995) reviewed more than 156 studies indicating a
one-standard-deviation difference between Black and White test scores, while arguing
that there is no evidence of external or internal test bias contributing to these differences.
As mentioned earlier, one year after the initial publication of The Bell Curve, Murray
wrote an “Afterword” chapter, which has been included in subsequent editions of the text.
Murray wrote alone because Herrnstein had died the year the original text was published,
and because he felt the need to defend the book against the attacks made on it. Murray
noted that he and Herrnstein accept the idea of g, a general factor of intelligence on which
people differ.
However, lest the reader mistakenly conclude that Murray (1995) is arguing strictly
from a genetic point of view, he added: “It is scientifically prudent at this point to assume
that both environment and genes are involved, in unknown proportions; and most
important, people are getting far too excited about the whole issue” (p. 563).
In criticism, Welch (2002) credited anthropologist K. Anthony Appiah with the
observation that Herrnstein and Murray’s work “exploited the economic insecurity that
many middle-class whites feel about their own futures” (p. 192). Welch (2002)
concluded, “Despite its timing, the ideas presented in The Bell Curve reflect the
continuity of racial bigotry and prejudice that have legitimized the socioeconomic,
cultural, and political oppression of people of African descent” (p. 195). We see, then, a
continuing debate not only about the role of g but also about the roles of genetics and
fairness in the racial differences found on these scales.
Nisbett et al. (2012) indicated that Blacks gained 5.5 IQ points on Whites between
1972 and 2002. Rushton (2012) rebutted this, claiming that there had been no narrowing
in the mean Black-White IQ difference, as predicted by heritable g. Rushton (2012)
challenged Nisbett et al.’s (2012) report, stating that they were incorrect in claiming a
5.5-point narrowing of the 15-point gap between Whites and Blacks from 1972 to 2002
based on educational gaps, and that they did not adequately describe “how heritable g
provides evidence of a significant genetic contribution to Black-White differences”
(p. 501). Nisbett et al. (2012) responded to Rushton’s challenge by indicating that his data
represented only mean scores rather than gap reductions expressed as effect sizes, and
that, viewed as effect sizes, the changes in the gaps are “substantial” (p. 503).
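Nisbett et al.’s response turns on expressing changes in the gap as effect sizes rather than as raw means. The minimal Python sketch below, assuming the conventional IQ metric in which the population standard deviation is 15, converts the figures cited above (a 15-point gap and a 5.5-point narrowing) into standard-deviation units. The helper names are hypothetical, and this is not the computation Nisbett et al. reported.

```python
# Convert IQ-point differences into standard-deviation (effect size) units.
# Assumption: IQ scores use the conventional metric with SD = 15.
IQ_SD = 15.0


def gap_in_sd_units(points: float, sd: float = IQ_SD) -> float:
    """Express a between-group difference in IQ points as SD units."""
    return points / sd


def share_of_gap_closed(initial_gap: float, narrowing: float) -> float:
    """Proportion of an initial gap removed by a given narrowing."""
    if initial_gap <= 0:
        raise ValueError("initial_gap must be positive")
    return narrowing / initial_gap


print(round(gap_in_sd_units(15.0), 2))           # 1.0
print(round(gap_in_sd_units(5.5), 2))            # 0.37
print(round(share_of_gap_closed(15.0, 5.5), 2))  # 0.37
```

On this metric, the 15-point gap corresponds to 1.0 SD and a 5.5-point narrowing to about 0.37 SD, or roughly 37% of the original gap; the exact percentage depends on the size of the baseline gap used.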
As reviewed, the history of intelligence testing includes contributions from people
who may have been compelled to emphasize a perceived difference between the
intelligence of Whites and Blacks based on test scores. Others have contributed efforts to
show that a problem existed and that the environment played a key part in the
differences. At the crux of the debate is the idea that one group may be intellectually
advantaged by nature; on the other side of this familiar debate is the idea that the
difference is due to environmental factors.
Nature: The Hereditarian Model
A review of the more recent literature reveals strong advocates of the hereditarian
model, which holds that genetics plays a strong role in influencing g and in the racial
differences found on these high-stakes tests.
Rushton and Ankney (2009) reviewed research on brain size (via brain imaging),
external head size, and general mental ability. They reported evidence of correlations
(r = .63) between brain size and g. In another study, Rushton (2010) argued that brain size
was correlated not only with IQ but also with longevity, parental care, intensity of
child-rearing practices, and delays in reproductive behavior. Rushton (2010) also used
brain size to explain differences in IQ between nations. He concluded,
“Central to answering the question of why nations differ in IQ, longevity, crime, and
economic ‘development status’, is heritable brain power that evolved in part as a
response to natural selection in the colder northern latitudes” (p.99).
It could be interjected here that an evolutionary explanation might also account for
how people of differing races may have been affected by years of oppression and
deprivation of intellectual stimulation. Testing this would require more research than
hypothesizing that the genetic explanation resides solely in adaptation to climatic
differences. It could also be objected that psychology, regrettably, was never able to
administer standardized tests to these early nomads; thus, conclusions about how the
climate affected their intelligence (as measured by the tools we now use) can only be
speculative.
Using large samples, Jensen (1995) reported differences between Blacks and
Whites in head size (controlling for age and body size). Jensen added that head size
correlated significantly with IQ not only within each racial group but also within families
(i.e., same-sex full siblings, with age partialed out), indicating a relationship between
brain size and IQ. In addition, Jensen found that larger brain size correlated with the g
component of IQ and that this “three-way pattern in IQ, brain size, and other traits” was
also found outside the United States (Rushton & Jensen, 2005).
Rushton and Jensen (2005) reviewed the past 30 years of literature on race
differences and IQ and described the “vexing” issue of a 15-point difference in
Black-White scores, which can be traced back to the mid-19th century. They also observed
that the size of the average Black-White difference does not change beyond age three.
Critical of what they perceived as a “tabula rasa” approach to IQ, Rushton and Jensen
(2005) argued that the social sciences and members of ethnic groups need to be more
receptive to the idea of genetic differences.
Some have suggested that we cannot expect members of ethnic groups to simply
accept the genetic component in mean-group differences in IQ and other traits.
Yet, with regard to individuals within families, we do acknowledge that some
siblings are more intelligent, more athletic, more physically attractive or more
socially charming than others. We also accept that some families are genetically
more gifted in certain areas than other families. We should, therefore, by
extension, be able to generalize to all the members of the human family. If
viewed against the backdrop that group differences are simply aggregated
individual differences, the former may be easier to accept than has hitherto been
thought. (p.282)
One reason some members of ethnic groups (and others) may have difficulty
simply accepting a genetic component could be the potential illusory correlation between
race and intellectual superiority or inferiority. Goldstein (2005) defined an “illusory
correlation” as what occurs when “a correlation between two events appears to exist, but
in reality the correlation doesn’t exist or is much weaker than you assume it to be”
(p. 460). He added that these correlations often appear in the form of stereotypical
thinking. It may be hard to read statements about genetic research without concluding
that the authors may be implying that the Black-White disparity is attributable to one race
being born smarter than another; this could be due to the history of eugenics associated
with testing and to the continuing dialogue about what constitutes intelligence.
As recently as 2010, Rushton and Jensen argued for the need for a hereditarian
perspective and continued to challenge the notion of a “Flynn effect.” Rushton and Jensen
(2010) reviewed empirical evidence indicating that the IQ gap between Blacks and Whites
has remained at least 15-20 points (1.1 standard deviations) since 1917, when mass
testing first started. “Flynn effect” advocates had argued that the average difference
between races decreased from the Army Alpha of World War I (1917), to the Army
General Classification Test of World War II (1946), to the Armed Forces Qualification
Test of the Vietnam era (1968), and that the gap closed by 5.5 points (35%) between 1970
and 1992 (p. 217). Rushton and Jensen (2010) were also critical of Nisbett’s claim that
Blacks had narrowed the gap in educational achievement by 35% on the National
Assessment of Educational Progress (NAEP) tests, as well as of his argument that
educational interventions eliminated the gap altogether. Rushton and Jensen (2010)
dismissed Flynn’s and Nisbett’s findings that the racial gap is shrinking as bad research,
concluding: “To the contrary, we find there is little or no evidence of narrowing. The
evidence presented in its favor rests mainly on insufficient sampling and selective
reporting” (p. 217). Rushton and Jensen (2010) continued to argue for the heritability of
IQ differences, adding, “we present analyses that demonstrate that over the last 54 years
there has been no narrowing of the Black-White gap in either IQ or educational
achievement.” These authors also predicted that “Black-White differences are greater on
more heritable and g-loaded tests” (p. 214). However, Rushton and Jensen (2010) were
referencing research on the WISC-III and the WAIS-R; thus, it would be beneficial to
explore research results on the more current scales.
Nurture: The Disparity is argued to be from Environmental Causes
Other recent articles offered equally passionate arguments that intelligence
differences could be explained by external or environmental factors, dismissing the
hereditarian perspective.
Onwuegbuzie and Daley (2001) identified and then challenged eight premises
held by the hereditarian theory of intelligence, citing studies relevant to challenging the
hereditarian position. These premises were:
(1) Intelligence is unidimensional and structural, with a dominant factor, g,
representing some core mental ability. (2) Intelligence is fixed within individuals
and across generations. (3) IQ tests accurately measure this fixed core mental
ability. (4) IQ tests are equally valid across racial, ethnic, and cultural groups.
(5) Intelligence determines individual’s professional and social standings. (6) The
environment plays little or no role in determining individual’s levels of
intelligence. (7) The intelligence of populations is deteriorating over time.
(8) Scores on IQ tests are consistent with classical statistical and measurement
theory. (p. 210)
As mentioned, the authors went on in their article to explore why each of these
premises may be challenged from a more environmental perspective.
Continuing to discuss their concerns about the lack of acceptance of a
hereditarian model, Rushton and Jensen (2005) indicated that the American Psychological
Association (APA) had taken an environmental position. In 1996, the APA established
an 11-person task force, which concluded that although White-Black IQ differences exist,
“There is certainly no support for a genetic interpretation” (p. 217).
Guthrie (2004), seeking to explore a more environmental explanation, was critical
of Jensen and his work on brain size, race, and intelligence, noting:
In 1969, Berkeley educational psychologist Arthur Jensen strode into prominence
just as the United States was establishing federally financed compensatory
programs designed to prepare disadvantaged children for increased learning
opportunities. His 1969 article ‘How much can we boost IQ and scholastic
achievement’ was aimed toward discrediting the purpose of such programs by
revitalizing the tired racist theme that inheritance accounts for 80 percent of the
variability in intelligence. Although Jensen attracted a following of supporters,
Illinois professor Jerry Hirsch examined Jensen’s research and ‘uncovered literal
misrepresentations of a kind and to an extent that erodes all confidence in it (and
in him) as a reliable source of information’. Hirsch further questioned the
derivation of Jensen’s formula for estimating broad heritability. ‘He did not say
how this formula is derived. It has no theoretical justification nor does it estimate
heritability broad or narrow.’ (p. 106)
Adding to the environmental argument, Manly (2005) stated that concluding that
differences between groups of people are based in heredity is not only erroneous but
harmful: “Normative data have been used by social scientists such as Richard Lynn,
Arthur Jensen, and Richard Herrnstein, whose research agendas lead to dangerous and
irresponsible biological and genetic interpretations” (p. 272).
Using age-appropriate IQ measures to explore an environmental explanation for
the disparity, Brooks-Gunn, Klebanov, and Duncan (1996) tested 483 Black and White
five-year-old children born with low birth weight. They found the traditional
one-standard-deviation IQ gap but determined that the disparity decreased when they
controlled for economic factors (poverty) and social factors (whether a learning
environment existed and whether there was “warmth” in the home). Also, from an
environmental perspective, as stated earlier, Gardner concluded that in almost every
studied relationship between IQ and societal outcomes, IQ tests explained at most 20% of
the outcomes, with 80% of the elements that contribute to socioeconomic status being
factors that exist beyond measured intelligence; this indicates Gardner’s endorsement of a
more environmentally based explanation for intelligence (Welch, 2006).
Scott (1994) noted a three-point increase in IQ scores, indicating that Blacks had
an increased ability to lead “productive lives in a complex society” (p. 56) despite the
increased impoverishment that occurred simultaneously.
Dickens and Flynn (2006) used the results from the standardization of the WISC
(WISC-R, WISC-III, and WISC-IV) in 1972, 1989, and 2002, and of the WAIS (WAIS-R
and WAIS-III) in 1978 and 1995. They also used standardization results from the
Stanford-Binet and the Armed Forces qualification tests. They argued that the Black-White
gap has not been constant, that its supposed constancy is a “myth,” and that it therefore
cannot serve as an argument for a genetic origin of IQ differences. They reported that their
analysis of the studies shows that Black children have made large IQ gains (relative to
Whites) since the 1960s.
Blacks have gained 4-7 IQ points on Whites over the past 30 years. Neither
change in the ancestry of the individuals classified as Black nor those who
identify themselves as Black can explain more than a fraction of the gain.
Therefore, the environment has been responsible. (p.917)
Even more recently, a speaker at the 2011 APA convention noted that we must still
concern ourselves with the potential role the environment plays in intellectual
development. The October 2011 APA Monitor (Winerman, 2011) reported that the
speaker, Frank Worrell, PhD, stated, “We have spent more than half a century trying
unsuccessfully to address the achievement gap” (p. 28). This was in reference to the
educational achievement gap between Whites and minorities. Worrell suggested several
environmentally based factors contributing to the achievement gap: (1) a lack of
diversity in schools, (2) a failure to support immigrant students, (3) too few leaders of
color, and (4) a failure to support ESL students. Achievement measures are often cited as
evidence of the predictive validity of intelligence tests. Thus, I would argue that Worrell
identified clear and specific environmental factors that contribute to the achievement gap
and that could relate to the disparity in test scores.
A review of the literature indicates that there are as many compelling arguments
for attributing the disparity in Black and White test scores to environmental factors as
there are for more genetically oriented theories. Although classifying the discussion as
another example of psychology’s nature-nurture debate helps simplify the question of
where the test score disparity originates, other research indicates that the issue is more
complex.
Beyond Nature versus Nurture: Towards a More Complex Understanding
While Rushton and Jensen are mostly known for advocating the hereditarian
position (that the disparities between Black and White test scores lie in genetic
differences), they appear to have taken a more moderate position when explaining their
model. Rushton and Jensen (2005) claimed that their “hereditarian model” is actually 50%
genetic and 50% environmental. Further, they cited Jensen’s work on twin studies and
concluded that an interactive model of both genetic and environmental factors best
explained the observed Black-White group differences in IQ, whereas both the
genetic-only and the environmental-only explanations were inadequate.
Sattler (2008) further emphasized an interactionist perspective, advocating:
The Flynn effect might be due to improvements in educational opportunities and
schooling, genetic factors, increased cross-ethnic mating, smaller family size, test
sophistication (i.e., improved ability in the population to take intelligence tests)
improvements in cognitive stimulation (e.g., availability of cognitively
stimulating toys, computers, books, and media) better nutrition, and improved
parental literacy. (p.252)
In addition, Sattler (2008) stated: “We believe that intelligence scores represent
interplay of biological factors, environmental factors, and past learning. If ethnic
minority children obtain low scores on intelligence tests, perhaps we need to improve the
educational system rather than abandoning standardized tests” (p. 162).
But rather than reducing conclusions like these to a concept such as interactionism,
a better and more sophisticated perspective has been proposed. According to Perry et al.
(2008), a Culturalist model was developed by Helms in 1992 to serve as an alternative
explanation for the Black-White test score gap. This model challenged both the
“implicit-biological” perspective and the “environmental” perspective, explaining that
“the culturalist point of view emphasizes racial group differences in CAT (cognitive
ability test) scores as a matter of cultural bias in the tests and the testing process itself”
(p. 156). The culturalist model accounts for the illusory correlation that may continue to
exist in the group differences, and it helps us acknowledge a history of cultural bias in
cognitive tests that parallels a known and acknowledged history of racial prejudice and
discrimination, which was far less implicit when the tests were first developed.
It is beyond the scope of this dissertation research to determine whether any
changes in the historical disparities on these tests are brought about by changes in the
genetics or the environment of the test takers. Indeed, studies of the complexities of
evolution have indicated that divorcing nature from nurture is almost impossible. This
research attempts to explore the current culture of testing and to determine whether the
disparity has changed (that is, whether the “Flynn effect” has closed any gaps). It is
especially important to see whether this disparity exists among test takers at the
below-average end of the score distribution.
Determining Which Scales to Use
The data for this dissertation were gathered at a testing site that used the
WAIS-IV and the WISC-IV more than other scales. Most of the participants had been
referred for testing by the Bureau of Disability Determination, and the WAIS-IV and
WISC-IV were most often the tests requested. Watkins, Wilson, Kotz, Carbone, and
Babula (2006) cited Prifitera, Saklofske, Weiss, and Rolfus (2005) in noting that the
WISC-IV has already surpassed the WISC-III as the most widely used test of cognitive
abilities in children. Chen and Zhu (2008) noted that “Wechsler tests are among the
most widely used in the world. Roughly 20 countries have standardized these tests so
far” (p. 206).
Subscales
Studies similar to this dissertation research have supported g factor theory and
have reported that examination of the subscales does not indicate that they make
independent contributions to the disparity. Watkins et al. (2006) analyzed the factor
structure of the WISC-IV among referred students. In their study, 432 Pennsylvania
students had been referred for evaluation for special education classes. Of the
participants, 176 were female and 256 were male. They ranged in age from 6 to 16 years
(M = 10.3, SD = 2.7). The participants’ racial background was 89.6% White, 2.5% Black,
and 1.6% Hispanic. The researchers found 65% eligible for special education services:
37% had learning disabilities, 5% mental retardation, 7% emotional disabilities, 8% were
gifted, 2% had speech disabilities, and 6% had multiple disabilities. Full Scale IQ scores
were found to be “slightly lower and somewhat more variable than the normative
sample” (p. 982), which the authors indicated paralleled other studies of referred
students, but the distribution of the scores was found to be normal. Their analysis did not
appear to explore what role, if any, age, race, or gender played in the findings, but the
results of their factor analysis indicated that g contributed more to the variance in the core
WISC-IV subtests than any other factor. This fit with the Carroll (1993) three-stratum
theory used to develop the test and indicated, according to the authors, that the same
model proposed for the general population also fits “referred students.”
Although it runs against the g factor perspective, research into the effects of the
subscales could be justified if such exploration aids in the development of culturally
sensitive and fair testing. Shuttleworth-Edwards et al. (2004), in addition to
demonstrating the important role that quality education plays in intelligence, referred to
multiple studies indicating that the Block Design, Digit Span, Vocabulary, and Arithmetic
subtests of the WAIS-III were sensitive to cultural differences. They added that this
sensitivity was often in a negative direction and was related to educational deprivation.
Whitaker (2008) found that children with lower IQ appeared to have difficulty following
instructions on certain subtests, especially Letter-Number Sequencing. He found evidence
that the WISC-IV may yield lower IQ scores than the WAIS-III for youth with low IQ.
Glass, Ryan, Chater, and Bartels (2009) advised that comparisons between subtests
should be limited to hypothesis testing and should not be used to draw conclusions for
clinical decision making. They cited research on acceptable standards for internal
consistency reliability: coefficients from .70 to .90 have generally been considered
acceptable; a coefficient of at least .80 is needed when a measure is used to generate
hypotheses; and coefficients should be even higher (.95) when important decisions
concerning treatment or diagnosis are to be based on a test score. Because their
results indicated that the internal consistency reliabilities for the subtests were not as
high as recommended for clinical decision making, clinicians need to be careful when
interpreting discrepancy scores. They also noted, “Test results for individuals at different
ability levels, as well as, a variety of clinical groups, including those with neurobiological
disorders (e.g., mental retardation and traumatic brain injury), and should also receive
attention” (p. 143). This dissertation was designed to include representation from this
understudied population of people with disabilities.
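Why discrepancy scores need particular caution can be made concrete with the classical test theory formula for the reliability of a difference between two scores on the same scale (assuming equal variances): the average of the two reliabilities minus their intercorrelation, divided by one minus that intercorrelation. The sketch below, a minimal illustration rather than the procedure Glass et al. (2009) used, applies it to the WISC-IV Block Design and Similarities reliabilities reported in Chapter III (.86 each) and checks the result against the .80 and .95 standards summarized above. The intercorrelation of .50 is a hypothetical value chosen only for illustration.

```python
# Reliability of a difference score X - Y (classical test theory,
# assuming X and Y are on the same scale with equal variances).

HYPOTHESIS_MIN = 0.80  # standard for generating hypotheses (Glass et al., 2009)
CLINICAL_MIN = 0.95    # standard for clinical decisions (Glass et al., 2009)


def difference_reliability(r_xx: float, r_yy: float, r_xy: float) -> float:
    """Return the reliability of X - Y.

    r_xx, r_yy: reliabilities of the two subtests
    r_xy: correlation between the two subtests (must be below 1)
    """
    if not r_xy < 1.0:
        raise ValueError("subtests must not be perfectly correlated")
    mean_reliability = (r_xx + r_yy) / 2.0
    return (mean_reliability - r_xy) / (1.0 - r_xy)


def permitted_use(reliability: float) -> str:
    """Classify a reliability coefficient against the two standards."""
    if reliability >= CLINICAL_MIN:
        return "clinical decision making"
    if reliability >= HYPOTHESIS_MIN:
        return "hypothesis generation"
    return "neither"


if __name__ == "__main__":
    r_bd, r_si = 0.86, 0.86  # WISC-IV Block Design and Similarities (Chapter III)
    r_between = 0.50         # HYPOTHETICAL intercorrelation, for illustration only
    r_diff = difference_reliability(r_bd, r_si, r_between)
    print(f"difference-score reliability = {r_diff:.2f}: {permitted_use(r_diff)}")
```

With these inputs the discrepancy reliability is .72, below even the .80 hypothesis-generation standard, although each subtest on its own exceeds .80; the more strongly two subtests correlate, the less reliable their difference becomes.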
Summary
IQ tests, especially the WAIS-IV and WISC-IV, remain a primary resource in
determining intellectual disability, yet disabled populations remain inadequately
researched. Age, gender, and race have each been considered, albeit to varying degrees,
as potential contributors to variations in intelligence test scores. A review of the
history of intelligence testing indicates controversy over racial differences in intelligence
test scores, including discussion of the role heredity may play in influencing outcomes
on these measures. Several of the authors cited have proposed alternative descriptions
of intelligence, yet the primary method for determining intellectual disability remains
standardized testing. Although explanations for differences in test results remain
debated, even the strongest and most widely read advocates of a hereditarian position,
Herrnstein and Murray (1995), acknowledged that if the disparity in scores between the
races narrows, the narrowing is most likely to appear first among people with lower test
scores.
Chapter 2 reviewed key literature on the topic of disability and the need for additional
research in this area; it also explored the research on age and gender as potential
contributing factors in the outcomes of testing for intellectual disability. The chapter also
reviewed the history of intelligence testing as it relates to racial disparity in test scores
and discussed how the nature/nurture debate was expressed in exploring the heritability of
racial differences in IQ test scores. The chapter concluded by exploring the importance of
research on age and gender, along with the subscales, in understanding which factors
may contribute to the racial disparity among people scoring at the lower end of the
range of cognitive functioning. Chapter 3 describes the methodology used to conduct
this research.
CHAPTER III
RESEARCH METHODOLOGY
It is important to take advantage of every opportunity to energize and encourage
research on culture and cognitive test performance. (Manly, 2005, p. 271)
Chapter 3 describes the design, method, and statistical details of this dissertation
research, including the participants, procedures, and research instruments. The chapter
closes with an explanation of the statistical procedures selected to analyze the following
research hypotheses:
Research Hypotheses
Hypothesis 1: Among people applying for disability there is a relationship
between age, gender, race, and performance on the WISC-IV (Wechsler, 2003).
Hypothesis 2: Among people applying for disability there is a relationship
between age, gender, race, and performance on the WAIS-IV (Wechsler, 2008).
Procedure
Following Institutional Review Board (IRB) approval from Cleveland State
University and permission to use scores from the Center for Effective Living, data were
collected from existing charts archived at the Center for Effective Living, a Midwestern
private practice in psychology and testing that has provided services by licensed
professionals for more than 30 years. The center is a private practice site dedicated to
working with forensic clients, disability cases, individual adults, teens, children, and
families. The data of interest were face sheets that included the assessors’ descriptions of
the clients’ race and gender, another sheet that included date of birth, and an additional
data sheet that contained the results on the subtests of either the WISC-IV or the WAIS-IV,
depending on the participant’s age. These data were obtained as part of psychological
testing to determine participants’ eligibility for disability, which included a specific
request for assessment of cognitive disability using one of the two studied scales. A
double-blind protocol was implemented. Separate parties employed by the center and
trained in HIPAA (Health Insurance Portability and Accountability Act of 1996)
compliance and the maintenance of confidentiality converted the data from confidential
to anonymous scores. They then entered the data about race, gender, age, and test scores
into an Excel spreadsheet. The spreadsheet was the only data to leave the center, and it
was used only for statistical analyses. No information that could link an actual person to
his or her scores, age, gender, or race was transferred to the researcher.
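The de-identification step described above can be sketched as follows: identifying fields are dropped, and age at testing is computed from date of birth before the date itself is discarded. This is a minimal illustration under stated assumptions, not the center’s actual procedure. The field names (`name`, `chart_id`, `date_of_birth`, `test_date`) are hypothetical, and CSV output stands in for the Excel spreadsheet the staff used.

```python
import csv
import io
from datetime import date

# HYPOTHETICAL names of fields that identify a person; none of them may
# appear in the exported spreadsheet.
IDENTIFYING_FIELDS = {"name", "chart_id", "date_of_birth", "test_date"}


def age_years_months(dob: date, tested: date) -> tuple[int, int]:
    """Chronological age at testing as (years, months), e.g. (7, 9) for 7.9."""
    months = (tested.year - dob.year) * 12 + (tested.month - dob.month)
    if tested.day < dob.day:
        months -= 1  # the current month of age is not yet complete
    return divmod(months, 12)


def deidentify(record: dict) -> dict:
    """Keep race, gender, scores, and computed age; drop identifying fields."""
    years, months = age_years_months(record["date_of_birth"], record["test_date"])
    row = {k: v for k, v in record.items() if k not in IDENTIFYING_FIELDS}
    row["age_years"] = years
    row["age_months"] = months
    return row


def to_csv(rows: list[dict]) -> str:
    """Serialize de-identified rows (a stand-in for the Excel spreadsheet)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    chart = {  # HYPOTHETICAL chart, not real data
        "name": "Jane Doe", "chart_id": "A-001",
        "date_of_birth": date(2001, 5, 20), "test_date": date(2009, 3, 10),
        "race": "Black", "gender": "Female", "FSIQ": 68,
    }
    print(to_csv([deidentify(chart)]))
```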
During testing, administrators occasionally determine, based on behavioral
observations, that a participant may be either malingering or too impaired by illness to
complete testing effectively. If either malingering or impairment was found, the data were
not used in the study. Examples include a person walking out of the test, failing to
complete sections, or becoming too anxious or agitated to respond.
Participants
Participants included children and adults who were referred either by themselves or
by the Social Security Bureau of Disability Determination to the Center for Effective
Living for testing for cognitive impairment and to determine eligibility for disability
income. The participant data existed in archival records. Staff removed identifying
information before separate, trained staff entered the data into Excel spreadsheets.
Participants were of mixed age and gender and had been identified as either White or
Black; participants of mixed race were excluded because of their small number.
Participants’ cognitive functioning was assessed by trained testers who held at least a
master’s degree and were supervised by a licensed psychologist. All assessments were
completed at the Center for Effective Living from November 2008 through August 2009.
These cases were chosen because the records were ready for filing, about to be disposed
of, and most accessible to the staff completing the spreadsheets.
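The inclusion rules described in this section (White or Black participants only, no administrations judged invalid because of malingering or impairment, and testing between November 2008 and August 2009) can be expressed as one filter over de-identified cases. This is a minimal sketch under assumptions: the keys `race`, `test_date`, and `invalid_administration` are hypothetical, and the study itself applied these rules by hand rather than in code.

```python
from datetime import date

STUDY_START = date(2008, 11, 1)  # November 2008
STUDY_END = date(2009, 8, 31)    # August 2009
INCLUDED_RACES = {"White", "Black"}


def is_included(case: dict) -> bool:
    """Apply the study's inclusion rules to one de-identified case.

    HYPOTHETICAL keys: 'race', 'test_date', and 'invalid_administration'
    (True when the examiner judged the client to be malingering or too
    impaired to complete testing).
    """
    if case["race"] not in INCLUDED_RACES:  # mixed race and other groups excluded
        return False
    if case.get("invalid_administration", False):
        return False
    return STUDY_START <= case["test_date"] <= STUDY_END


if __name__ == "__main__":
    cases = [  # HYPOTHETICAL cases, not study data
        {"race": "White", "test_date": date(2009, 1, 15)},
        {"race": "Mixed", "test_date": date(2009, 1, 15)},
        {"race": "Black", "test_date": date(2009, 2, 3), "invalid_administration": True},
        {"race": "Black", "test_date": date(2010, 1, 5)},
    ]
    print(sum(is_included(c) for c in cases), "of", len(cases), "cases included")
```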
Instruments
WISC-IV. According to the WISC-IV: Technical and interpretive manual
(Wechsler, 2003), the WISC-IV is “an individually administered clinical instrument for
assessing cognitive ability of children aged 6 years 0 months through 16 years 11
months” (p.1).
As summarized in Table 1, the WISC-IV measures ability in four areas: the Verbal
Comprehension Index (VCI), a measure of language command; the Perceptual Reasoning
Index (PRI), a measure of manipulation of concrete materials or processing of visual
stimuli to solve nonverbal problems; the Working Memory Index (WMI), a measure of
short-term memory; and the Processing Speed Index (PSI), a measure of how quickly and
correctly someone can think about things needed to complete a task. These four areas are
combined to provide a participant’s Full Scale Intelligence Quotient (FSIQ).
The overall reliabilities (average internal consistency coefficients) of these four
Indexes are reported in the WISC-IV Technical and Interpretive Manual (Wechsler,
2003) as follows: Verbal Comprehension (VCI) .94, Perceptual Reasoning (PRI) .92,
Working Memory (WMI) .92, Processing Speed (PSI) .88, and Full Scale .97. Flanagan
and Kaufman (2009, p. 41) report an overall validity coefficient of .89 in relation to the
WISC-III.
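A practical consequence of these reliabilities is the size of the measurement error around an individual score. The sketch below computes the standard error of measurement, SEM = SD × √(1 − r), on the Index metric (mean 100, SD 15) for the coefficients listed above, and builds an approximate 95% band around a hypothetical obtained FSIQ of 70. This is a simplified obtained-score band, an assumption of the sketch: confidence intervals printed in the manual are centered on the estimated true score, so they will differ slightly.

```python
import math

INDEX_SD = 15.0  # Index and FSIQ standard scores: mean 100, SD 15

# Average reliabilities reported in the WISC-IV manual (Wechsler, 2003)
WISC_IV_RELIABILITY = {"VCI": 0.94, "PRI": 0.92, "WMI": 0.92, "PSI": 0.88, "FSIQ": 0.97}


def sem(reliability: float, sd: float = INDEX_SD) -> float:
    """Standard error of measurement: SD * sqrt(1 - r)."""
    return sd * math.sqrt(1.0 - reliability)


def obtained_score_band(score: float, reliability: float, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% band centered on the obtained score (not the true-score estimate)."""
    half_width = z * sem(reliability)
    return score - half_width, score + half_width


if __name__ == "__main__":
    for scale, r in WISC_IV_RELIABILITY.items():
        print(f"{scale}: SEM = {sem(r):.2f}")
    low, high = obtained_score_band(70, WISC_IV_RELIABILITY["FSIQ"])  # hypothetical FSIQ of 70
    print(f"FSIQ 70, approximate 95% band: {low:.1f} to {high:.1f}")
```

For the FSIQ the band is roughly ±5 points, while for the less reliable PSI it is roughly ±10 points, which matters when a score falls near a cutoff used in disability determination.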
Each Index is composed of subtests, as follows. The Verbal Comprehension Index
(VCI) is composed of Similarities (SI), Vocabulary (VC), and Comprehension (CO). The
Perceptual Reasoning Index (PRI) is composed of Block Design (BD), Picture Concepts
(PCn), and Matrix Reasoning (MR). The Working Memory Index (WMI) is composed of
Digit Span (DS) and Letter-Number Sequencing (LN). The Processing Speed Index (PSI)
is composed of Coding (CD) and Symbol Search (SS).
The subtests, their abbreviations, and their descriptions are as follows. Block
Design (BD): “While viewing a constructed model, or a picture in the Stimulus Book,
the child uses red-and-white blocks to re-create the design within a specified time limit”
(p. 2). Similarities (SI): “The child is presented two words that represent common
objects or concepts and describes how they are similar” (p. 2). Digit Span (DS): for Digit
Span Forward, “the child repeats numbers in the same order as presented aloud by the
examiner” (p. 2); for Digit Span Backward, “the child repeats numbers in the reverse
order of that presented aloud by the examiner.” Picture Concepts (PCn): “The child is
presented with two or three rows of pictures and chooses one picture from each row to
form a group with a common characteristic” (p. 2). Coding (CD): “The child copies
symbols that are paired with simple geometric shapes or numbers. Using a key, the child
draws each symbol in its corresponding shape or box within a specified time limit” (p. 2).
Vocabulary (VC): “For Picture Items, the child names pictures that are displayed in the
Stimulus Book” (p. 2); “For Verbal Items: the child gives definitions for words that the
examiner reads aloud” (p. 2). Letter-Number Sequencing (LN): “The child is read a
sequence of numbers and letters and recalls the numbers in ascending order and the
letters in alphabetical order” (p. 3). Matrix Reasoning (MR): “the child looks at an
incomplete matrix and selects the missing portion from five response options” (p. 3).
Comprehension (CO): “The child answers questions based on his or her understanding of
general principles and social situations” (p. 3). Symbol Search (SS): “The child scans a
search group and indicates whether the target symbol(s) matches any of the symbols in
the search group within a specified time limit” (p. 3).
Table 1
WISC-IV: Indexes and Their Subtests
Index                            Subtests
Verbal Comprehension (VCI)       Similarities (SI), Vocabulary (VC), and Comprehension (CO).
Perceptual Reasoning (PRI)       Block Design (BD), Picture Concepts (PCn), and Matrix Reasoning (MR).
Working Memory (WMI)             Digit Span (DS) and Letter-Number Sequencing (LN).
Processing Speed (PSI)           Coding (CD) and Symbol Search (SS).
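The composition in Table 1 can be written as a lookup from each Index to its core subtests. Each Index begins as the sum of its subtests’ scaled scores, and the FSIQ as the sum of all ten core subtests; the manual’s norms tables then convert these sums into standard scores, a step not reproduced here. The scaled scores in the example are hypothetical. Because the Index means reported for this sample in Chapter IV lie between about 10 and 25 rather than near 100, they may be on this sum-of-scaled-scores metric; that reading is an inference from their magnitude, not something the text states.

```python
# Core-subtest composition of the WISC-IV Indexes (Table 1)
WISC_IV_CORE = {
    "VCI": ("SI", "VC", "CO"),
    "PRI": ("BD", "PCn", "MR"),
    "WMI": ("DS", "LN"),
    "PSI": ("CD", "SS"),
}


def index_sums(scaled: dict[str, int]) -> dict[str, int]:
    """Sum of scaled scores for each Index, plus the FSIQ sum of all ten core subtests.

    The manual's norms tables convert these sums into Index and FSIQ
    standard scores; that conversion is not reproduced here.
    """
    missing = [s for subs in WISC_IV_CORE.values() for s in subs if s not in scaled]
    if missing:
        raise ValueError(f"missing core subtests: {missing}")
    sums = {index: sum(scaled[s] for s in subs) for index, subs in WISC_IV_CORE.items()}
    sums["FSIQ"] = sum(sums.values())
    return sums


if __name__ == "__main__":
    example = {  # HYPOTHETICAL scaled scores (mean 10, SD 3), not study data
        "SI": 6, "VC": 5, "CO": 7, "BD": 8, "PCn": 7,
        "MR": 6, "DS": 5, "LN": 6, "CD": 7, "SS": 6,
    }
    print(index_sums(example))
```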
According to the WISC-IV Technical and Interpretive Manual (Wechsler, 2003), the
overall reliabilities (test-retest and average internal consistency coefficients) of the
subtests are BD = .86, SI = .86, DS = .87, PCn = .82, CD = .85, VC = .89, LN = .90,
MR = .89, CO = .81, and SS = .79.
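Checking these subtest coefficients against the standards summarized from Glass et al. (2009) in Chapter II shows directly why subtest-level comparisons call for caution. The sketch below is an illustration rather than an analysis performed in this study: it classifies each coefficient against the .80 (hypothesis generation) and .95 (clinical decision) standards and reports the standard error of measurement on the scaled-score metric (mean 10, SD 3).

```python
import math

SCALED_SD = 3.0        # subtest scaled scores: mean 10, SD 3
HYPOTHESIS_MIN = 0.80  # Glass et al. (2009), as summarized in Chapter II
CLINICAL_MIN = 0.95

# Subtest reliabilities from the WISC-IV manual (Wechsler, 2003)
WISC_IV_SUBTEST_RELIABILITY = {
    "BD": 0.86, "SI": 0.86, "DS": 0.87, "PCn": 0.82, "CD": 0.85,
    "VC": 0.89, "LN": 0.90, "MR": 0.89, "CO": 0.81, "SS": 0.79,
}


def standard_met(r: float) -> str:
    """Highest standard that a reliability coefficient reaches."""
    if r >= CLINICAL_MIN:
        return "clinical"
    if r >= HYPOTHESIS_MIN:
        return "hypothesis"
    return "below .80"


def summarize(reliabilities: dict[str, float]) -> dict[str, tuple[float, str]]:
    """Map each subtest to (SEM in scaled-score points, standard met)."""
    return {
        name: (SCALED_SD * math.sqrt(1.0 - r), standard_met(r))
        for name, r in reliabilities.items()
    }


if __name__ == "__main__":
    for name, (error, level) in summarize(WISC_IV_SUBTEST_RELIABILITY).items():
        print(f"{name:>3}: SEM = {error:.2f} scaled-score points, standard met: {level}")
```

None of the ten core subtests reaches the .95 standard for clinical decisions, and Symbol Search (.79) falls just short of the .80 standard for hypothesis generation.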
Additional supplemental subtests (Picture Completion, Cancellation, Information,
Arithmetic, and Word Reasoning) are available as part of the WISC-IV but are not
described here because they were not included in this study. They were excluded because
the testing requests for these participants specified that only the primary subtests be used.
Future research could explore the potential gains and losses of the supplemental subtests
in relation to the hypotheses.
Flanagan and Kaufman (2004) reported that, according to the test manual, there is
validity evidence based on test content, response processes, internal structure,
relationships with other variables, and consequences of testing. According to Pearson’s
website (Pearsonpsychcorp.com, 2010), the norms were sampled to be representative of
the current population of children in the United States. The WISC-IV normative sample
consisted of 2,200 children between the ages of 6:00 and 16:11 years, with 200 children
selected for each of 11 age groups. The sample was stratified on age, sex, parent
education level, region, and race/ethnicity.
WAIS-IV. The WAIS-IV (Wechsler, 2008) is intended for use with adults aged
16 to 90. As summarized in Table 2, it measures cognitive ability using a core battery of
10 subtests that focus on four domains of intelligence: verbal comprehension, perceptual
reasoning, working memory, and processing speed. The WAIS-IV normative sample of
2,200 adults was stratified by age, gender, education level, ethnicity, and region so that
the norms would be representative of the U.S. adult population. Lichtenberger and
Kaufman (2009) also report thirteen special-group studies conducted with specific
clinical populations.
The WAIS-IV measures ability in four areas: the Verbal Comprehension Index (VCI),
which measures command of language; the Perceptual Reasoning Index (PRI), which
measures manipulation of concrete materials or processing of visual stimuli to solve
nonverbal problems; the Working Memory Index (WMI), which measures short-term
memory; and the Processing Speed Index (PSI), which measures how quickly and
correctly someone can think about things needed to complete a task. These four areas are
combined to provide a participant’s Full Scale Intelligence Quotient (FSIQ).
According to Lichtenberger and Kaufman (2009), the overall reliabilities (test-retest and
average internal consistency coefficients) of these four Indexes are Verbal
Comprehension (VCI) = .96, Perceptual Reasoning (PRI) = .87, Working Memory (WMI)
= .88, Processing Speed (PSI) = .87, and Full Scale = .96. Lichtenberger and Kaufman
(2009, p. 32) report an overall validity coefficient of .94 in relation to the WAIS-III.
The Indexes are composed of subtests as follows: VCI is composed of Similarities
(SI), Vocabulary (VC), and Information (IN); PRI is composed of Block Design (BD),
Matrix Reasoning (MR), and Visual Puzzles (VP); WMI is composed of Digit Span (DS)
and Arithmetic (AR); and PSI is composed of Symbol Search (SS) and Coding (CD).
Lichtenberger and Kaufman (2009) provided the following descriptions of the subtests:
Similarities (SI), the examinee is presented two words that represent common
objects or concepts and describes how they are similar, Vocabulary (VC), for
picture items, the examinee names the object presented visually. For verbal items,
the examinee defines words that are presented visually and orally. Information
(IN): the examinee answers questions that address a broad range of general
knowledge topics. Block Design (BD), working within a specified time limit, the
examinee views a model and a picture or a picture only and uses red and white
blocks to recreate the design. Matrix Reasoning (MR), the examinee views an
incomplete matrix or series and selects the response option that completes the
matrix or series. Visual Puzzles (VP) working within a specified time limit, the
examinee views a completed puzzle and selects three response options that when
combined, reconstruct the puzzle. Digit Span (DS), For Digit Span Forward, the
examinee is read a sequence of numbers and recalls the numbers
in the same order. For Digit Span Backward, the examinee is read a sequence of
numbers and recalls the numbers in reverse order. Arithmetic (AR), working
within a specified time limit, the examinee mentally solves a series of arithmetic
problems. Symbol Search (SS), working within a specified time limit, the
examinee scans a search group and indicates whether one of the symbols in the
target group matches, and Coding (CD), using a key, the examinee copies
symbols that are paired with numbers within a specified time limit. (p. 25)
According to Lichtenberger and Kaufman (2009), the subtests have the following
test-retest reliability coefficients: “SI=.87, VC=.89, IN=.90, BD=.80, MR=.74, VP=.74,
DS=.83, AR=.83, SS=.81, and CD=.86” (p. 35).
Table 2
WAIS-IV: Indexes and Their Subtests
Index                            Subtests
Verbal Comprehension (VCI)       Similarities (SI), Vocabulary (VC), and Information (IN).
Perceptual Reasoning (PRI)       Block Design (BD), Matrix Reasoning (MR), and Visual Puzzles (VP).
Working Memory (WMI)             Digit Span (DS) and Arithmetic (AR).
Processing Speed (PSI)           Coding (CD) and Symbol Search (SS).
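The two hypotheses are tested with the same four Index labels, but Tables 1 and 2 show that the core subtests behind three of those Indexes differ between the scales, so Index-level results for the WISC-IV and WAIS-IV may not be strictly like-for-like. A minimal sketch that makes the overlap explicit with set operations:

```python
# Core subtests per Index, from Tables 1 and 2
WISC_IV_CORE = {"VCI": {"SI", "VC", "CO"}, "PRI": {"BD", "PCn", "MR"},
                "WMI": {"DS", "LN"}, "PSI": {"CD", "SS"}}
WAIS_IV_CORE = {"VCI": {"SI", "VC", "IN"}, "PRI": {"BD", "MR", "VP"},
                "WMI": {"DS", "AR"}, "PSI": {"CD", "SS"}}


def compare_indexes(first: dict[str, set[str]],
                    second: dict[str, set[str]]) -> dict[str, dict[str, set[str]]]:
    """For each Index, list subtests shared by both scales and unique to each."""
    return {
        index: {
            "shared": first[index] & second[index],
            "first_only": first[index] - second[index],
            "second_only": second[index] - first[index],
        }
        for index in first
    }


if __name__ == "__main__":
    for index, parts in compare_indexes(WISC_IV_CORE, WAIS_IV_CORE).items():
        print(index, {k: sorted(v) for k, v in parts.items()})
```

Only the PSI is built from identical subtests on both scales; the VCI, PRI, and WMI each swap one subtest (CO for IN, PCn for VP, and LN for AR).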
Additional supplemental subtests (Comprehension [CO], Figure Weights [FW], Picture
Completion [PCm], Letter-Number Sequencing [LN], and Cancellation [CA]) are
available as part of the WAIS-IV but are not described here because they were not
included in this study. They were excluded because the testing requests for these
participants specified that only the primary subtests be used. Future research could
explore the potential gains and losses of the supplemental subtests in relation to the
hypotheses.
Analyses
The Kolmogorov-Smirnov test was used to check whether the scores were normally
distributed. To answer research Hypothesis 1, “Among people applying for disability
there is a relationship between age, gender, race, and performance on the WISC-IV
(Wechsler, 2003),” a MANOVA was run to determine any relationships between age,
gender, race, and performance on the WISC-IV. Because significance was found, a
post hoc MANOVA was run on the subtests. Because the Full Scale IQ is composed of
the Indexes, and the Indexes are composed of the subtests, separate analyses were
warranted.
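The one-sample Kolmogorov-Smirnov test compares the empirical distribution of the scores with a normal distribution and takes the largest vertical gap, D, as its statistic. The sketch below uses only the standard library, with an asymptotic p-value from the Kolmogorov distribution; the study’s values came from SPSS, and `scipy.stats.kstest` is the usual library equivalent. Two caveats apply. When the mean and SD are estimated from the same sample, as here, the standard p-value is conservative (the Lilliefors correction, available as `statsmodels.stats.diagnostic.lilliefors`, addresses this). A nonsignificant result also means only that no departure from normality was detected. The scores in the example are simulated.

```python
import math
import random


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic(sample: list[float], mu: float, sigma: float) -> float:
    """One-sample Kolmogorov-Smirnov D against Normal(mu, sigma)."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        # gap between F(x) and the empirical CDF just after and just before x
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def ks_pvalue(d: float, n: int, terms: int = 100) -> float:
    """Asymptotic p-value P(K > sqrt(n) * d) from the Kolmogorov distribution."""
    lam = math.sqrt(n) * d
    if lam < 0.2:  # the Kolmogorov CDF is effectively 0 here, so p is effectively 1
        return 1.0
    total = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    rng = random.Random(2014)
    scores = [rng.gauss(67.8, 18.6) for _ in range(257)]  # SIMULATED, not study data
    mu = sum(scores) / len(scores)
    sd = math.sqrt(sum((s - mu) ** 2 for s in scores) / (len(scores) - 1))
    d = ks_statistic(scores, mu, sd)
    print(f"D = {d:.3f}, asymptotic p = {ks_pvalue(d, len(scores)):.3f}")
```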
To answer research Hypothesis 2, “Among people applying for disability there is
a relationship between age, gender, race, and performance on the WAIS-IV (Wechsler,
2008),” a MANOVA was run to determine any relationships between age, gender, race,
and performance on the WAIS-IV. Because significance was found, a post hoc
MANOVA was run on the subtests. Because the Full Scale IQ is composed of the
Indexes, and the Indexes are composed of the subtests, separate analyses were warranted.
Summary
This third chapter outlined the research hypotheses, method, procedure,
instruments, and analyses used to answer the research questions. Chapter 4 reports
the findings of those analyses for each research hypothesis.
CHAPTER IV
RESEARCH RESULTS
Introduction
Chapter 4 presents the results of the analyses conducted to answer the research
questions of this dissertation. As stated in the prior chapter, these are Hypothesis 1:
Among people applying for disability there is a relationship between age, gender, race,
and performance on the Wechsler Intelligence Scale for Children, Fourth Edition
(WISC-IV; Wechsler, 2003); and Hypothesis 2: Among people applying for disability
there is a relationship between age, gender, race, and performance on the Wechsler Adult
Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008). The chapter contains the
results of the analyses completed for both the WISC-IV and the WAIS-IV scores in
relation to age, race, and gender among people applying for disability income. The data
were analyzed with SPSS (PASW Statistics version 18) using GLM (General Linear
Model) Multivariate Analysis.
GLM analysis is appropriate when the sample distributions are normal, as they
were found to be here, and allows investigation of the effects of both individual factors
(such as gender) and interacting factors (such as gender and age). In addition, GLM
allows analysis of unbalanced models (in which groups contain unequal numbers of
participants), as was the case with this sample. MANOVAs were preferred over multiple
ANOVAs to control for inflation of Type I error. Statistical significance was set at .05
because the research was exploratory.
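The Wilks’ λ reported for each multivariate effect in this chapter compares the error (within-groups) sum-of-squares-and-cross-products matrix E with the hypothesis (between-groups) matrix H: λ = det(E) / det(E + H), so values near 1 mean the groups barely differ on the set of Indices. The sketch below computes λ for a one-way layout only, to show what the statistic measures. It is not a reproduction of the study’s unbalanced factorial GLM, which requires a dedicated routine such as SPSS GLM (as used here) or `statsmodels.multivariate.manova.MANOVA.from_formula`. The data are simulated.

```python
import numpy as np


def wilks_lambda(y: np.ndarray, groups: np.ndarray) -> float:
    """Wilks' lambda for a ONE-WAY MANOVA: det(E) / det(E + H).

    y: (n, p) matrix of dependent variables (e.g., VCI, PRI, WMI, PSI)
    groups: length-n array of group labels
    """
    grand_mean = y.mean(axis=0)
    p = y.shape[1]
    within = np.zeros((p, p))   # E: within-groups SSCP matrix
    between = np.zeros((p, p))  # H: between-groups SSCP matrix
    for g in np.unique(groups):
        yg = y[groups == g]
        centered = yg - yg.mean(axis=0)
        within += centered.T @ centered
        diff = (yg.mean(axis=0) - grand_mean).reshape(-1, 1)
        between += len(yg) * (diff @ diff.T)
    return float(np.linalg.det(within) / np.linalg.det(within + between))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # SIMULATED Index scores for two hypothetical groups, not study data
    a = rng.normal([19, 22, 12, 13], 5, size=(100, 4))
    b = rng.normal([19, 22, 12, 15], 5, size=(60, 4))
    y = np.vstack([a, b])
    groups = np.array([0] * len(a) + [1] * len(b))
    print(f"Wilks' lambda = {wilks_lambda(y, groups):.3f}")
```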
Demographics
This section describes the participants who completed the WISC-IV and the
WAIS-IV by gender and race (Table 3) and by age (Tables 4 and 5), using the age
categories determined by the test publisher, The Psychological Corporation.
Table 3
WISC-IV and WAIS-IV Demographics (Summary of Participants)
Instrument   Gender    N      Race     N
WISC-IV      Male      161    White    83
             Female    96     Black    174
             Total     257    Total    257
WAIS-IV      Male      246    White    214
             Female    148    Black    180
             Total     394    Total    394
As indicated in Table 3, 161 males and 96 females completed testing with the WISC-IV;
of this group, 83 participants were White and 174 were Black. Table 3 also indicates that
246 males and 148 females completed testing with the WAIS-IV; of this group, 214 were
White and 180 were Black. Tables 4 and 5 report the distribution of participants across
the age categories provided by The Psychological Corporation.
Table 4
WISC-IV Age (by range categories; as determined by The Psychological Corporation)
Category   Age Range       N
1          6.0 – 7.11      45
2          8.0 – 9.11      42
3          10 – 11.11      64
4          12 – 13.11      48
5          14 – 15.11      48
6          16+             10
Total                      257
Table 5
WAIS-IV Age (categories; as determined by The Psychological Corporation)
Category   Age Range       N
1          16 – 17.11      25
2          18 – 19.11      127
3          20 – 24.11      48
4          25 – 29.11      22
5          30 – 34.11      25
6          35 – 44.11      42
7          45 – 54.11      65
8          55 – 64.11      40
Total                      394
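Because every band boundary in Tables 4 and 5 falls on a whole year (x.0 through y.11), a participant’s age category can be assigned from age in whole years with a binary search over each table’s lower bounds. The sketch below does that and also checks that the breakdowns in Tables 3 to 5 each sum to the same instrument total. The counts are copied from the tables; the function names are illustrative.

```python
from bisect import bisect_right
from typing import Optional

WISC_IV_LOWER = [6, 8, 10, 12, 14, 16]            # Table 4; category 6 is "16+"
WAIS_IV_LOWER = [16, 18, 20, 25, 30, 35, 45, 55]  # Table 5; category 8 ends at 64.11


def age_category(age_years: int, lower_bounds: list[int], max_age: Optional[int] = None) -> int:
    """Return the 1-based category whose band contains a whole-year age."""
    if age_years < lower_bounds[0] or (max_age is not None and age_years > max_age):
        raise ValueError(f"age {age_years} is outside the tabled categories")
    return bisect_right(lower_bounds, age_years)


# Counts copied from Tables 3, 4, and 5
TABLE_4 = {1: 45, 2: 42, 3: 64, 4: 48, 5: 48, 6: 10}
TABLE_5 = {1: 25, 2: 127, 3: 48, 4: 22, 5: 25, 6: 42, 7: 65, 8: 40}
TABLE_3 = {
    "WISC-IV": {"total": 257, "gender": (161, 96), "race": (83, 174)},
    "WAIS-IV": {"total": 394, "gender": (246, 148), "race": (214, 180)},
}


def margins_agree() -> bool:
    """True when every breakdown in Tables 3-5 sums to the same instrument total."""
    wisc, wais = TABLE_3["WISC-IV"], TABLE_3["WAIS-IV"]
    return (
        sum(TABLE_4.values()) == wisc["total"] == sum(wisc["gender"]) == sum(wisc["race"])
        and sum(TABLE_5.values()) == wais["total"] == sum(wais["gender"]) == sum(wais["race"])
    )


if __name__ == "__main__":
    print(age_category(9, WISC_IV_LOWER), age_category(47, WAIS_IV_LOWER, max_age=64))
    print("margins agree:", margins_agree())
```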
Analysis of the WISC-IV Scores for Age, Race, and Gender: Research Hypothesis 1
Among people applying for disability there is a relationship between age, gender,
race, and performance on the WISC-IV (Wechsler, 2003).
Table 6 provides the means and standard deviations of the scores of participants
applying for disability income who were assessed for cognitive disability using the
WISC-IV.
Table 6
Means and Standard Deviations (in parentheses) of WISC-IV Scores for Disability
Applicants
Scale                       FSIQ            VCI            PRI            WMI            PSI
Total sample (N = 257)      67.80 (18.58)   19.14 (6.39)   22.56 (6.74)   12.40 (4.53)   13.66 (5.36)
Gender
  Male (N = 161)            66.68 (19.58)   18.98 (7.01)   22.47 (7.18)   11.98 (4.52)   12.86 (5.71)
  Female (N = 96)           70.29 (15.83)   19.42 (5.21)   22.70 (5.97)   13.12 (4.48)   15.01 (4.41)
Race
  White (N = 83)            70.14 (16.12)   20.40 (6.14)   23.39 (6.23)   11.98 (4.36)   14.34 (5.52)
  Black (N = 174)           66.68 (19.58)   18.54 (6.43)   22.16 (6.95)   12.61 (4.61)   13.34 (5.26)
Age (Categorical)
  1  6.0 – 7.11 (45)        69.71 (19.68)   18.27 (7.59)   24.24 (6.92)   12.20 (4.55)   14.98 (5.48)
  2  8.0 – 9.11 (42)        70.95 (18.12)   20.45 (5.31)   22.76 (6.25)   13.17 (4.73)   14.64 (4.69)
  3  10 – 11.11 (64)        72.11 (19.78)   20.84 (6.86)   23.30 (7.32)   13.42 (4.29)   14.66 (5.48)
  4  12 – 13.11 (48)        66.69 (16.58)   18.88 (5.55)   21.94 (6.24)   12.48 (4.07)   13.10 (5.13)
  5  14 – 15.11 (48)        60.50 (15.05)   17.60 (5.56)   20.87 (6.43)   10.69 (4.42)   11.25 (5.05)
  6  16+ (10)               58.70 (20.81)   15.30 (6.16)   20.40 (6.77)   11.50 (6.09)   11.50 (5.54)
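Because the subgroup rows in Table 6 partition the same 257 participants, a sample-size-weighted average of a pair of subgroup means should reproduce the total-sample mean up to rounding. The sketch below performs that consistency check with values copied from the table.

```python
def pooled_mean(groups: list[tuple[float, int]]) -> float:
    """Weighted mean of subgroup means: sum(n_i * mean_i) / sum(n_i)."""
    total_n = sum(n for _, n in groups)
    return sum(mean * n for mean, n in groups) / total_n


if __name__ == "__main__":
    # Values taken from Table 6
    fsiq_by_race = [(70.14, 83), (66.68, 174)]   # White, Black
    psi_by_gender = [(12.86, 161), (15.01, 96)]  # Male, Female
    print(f"FSIQ pooled over race:  {pooled_mean(fsiq_by_race):.2f}")   # total sample: 67.80
    print(f"PSI pooled over gender: {pooled_mean(psi_by_gender):.2f}")  # total sample: 13.66
```

Both pooled values match the total-sample row (67.80 and 13.66). The same function applied to the race rows of Table 11 gives 58.98, matching that table’s total-sample FSIQ.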
Full Scale Intelligence Quotient (FSIQ). A univariate analysis of variance was
conducted to explore the effects of, and interactions among, gender, race, and age on
WISC-IV performance. As Table 7 shows, the only significant variable was categorical
age, and no significant effect was found when categorical age was examined in
interaction with any other variable. Because the Indices measure distinctly separate
factors that contribute to FSIQ, additional multivariate analyses were conducted on the
WISC-IV Indices: the Verbal Comprehension Index (VCI; which measures command of
language), the Perceptual Reasoning Index (PRI; which measures manipulation of
concrete materials or processing of visual stimuli to solve nonverbal problems), the
Working Memory Index (WMI; which measures short-term memory), and the Processing
Speed Index (PSI; which measures how quickly and correctly someone can think about
things needed to complete a task). These four areas are combined to provide a
participant’s FSIQ.
Table 7
Univariate tests: WISC-IV: Dependent Variable FSIQ by Age, Race, and Gender
Effect                    F        df    p
Age                       2.426    5     .036*
Race                      0.372    1     .542
Gender                    1.646    1     .201
Race X Gender             0.022    1     .882
Race X Age                0.353    5     .880
Gender X Age              0.052    5     .998
Race X Gender X Age       0.382    4     .821
*p < .05
Indices. A 6 (Age group) X 2 (Race) X 2 (Gender) factorial multivariate analysis of
variance (MANOVA) was conducted with the WISC-IV results to determine the extent to
which there were differences and interactions by age group, race, or gender in
performance on the four Indices (VCI, PRI, WMI, and PSI) of the WISC-IV. The
MANOVA was preferred over multiple ANOVAs to control for the inflation of Type I
error. Statistical significance was set at .05 because the research was exploratory. Before
the MANOVA was computed, a one-sample Kolmogorov-Smirnov test was run to check
the normality assumption; it indicated that the WISC-IV scores did not depart
significantly from a normal distribution (M = 67.798, SD = 18.576, p = .132). Table 8
shows that, with the Indices as dependent variables, the multivariate effects of both age
and gender were significant.
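The Type I error inflation that the MANOVA guards against can be quantified. If four separate ANOVAs (one per Index) were each run at α = .05, the chance of at least one false positive is 1 − (1 − .05)^4, about .185. This is a minimal illustration that assumes the four tests are independent, which correlated Indices are not, so the figure is an approximation.

```python
def familywise_error(alpha: float, k: int) -> float:
    """Probability of at least one Type I error across k independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** k


if __name__ == "__main__":
    for k in (1, 4, 8):
        print(f"{k} tests at .05: familywise error = {familywise_error(0.05, k):.3f}")
```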
Table 8
Multivariate tests for Age, Race, Gender, and WISC-IV
Effect                    Wilks’ λ   F        df (error df)   p
Age                       .892       2.229    20 (767)        .002*
Race                      .977       1.366    4 (231)         .247
Gender                    .956       2.629    4 (231)         .035*
Race X Gender             .992       0.469    4 (231)         .758
Race X Age                .925       0.918    20 (767)        .564
Gender X Age              .951       0.589    20 (767)        .922
Race X Gender X Age       .941       0.886    16 (706)        .586
*p < .05
The multivariate effect of age was significant, Wilks’ λ = .892, F(20, 767) = 2.23,
p = .002. As shown in Table 9, follow-up univariate tests located this effect on the PSI
(F = 4.06, p = .001): PSI performance declined with age in this sample.
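Wilks’ λ also yields an effect size. A commonly used multivariate partial eta squared is 1 − λ^(1/s), where s is the smaller of the number of dependent variables (here the four Indices) and the effect’s degrees of freedom. The sketch below applies that definition to the significant effects in Tables 8 and 13; it is a supplementary calculation, not one reported by the study.

```python
def partial_eta_squared_wilks(wilks: float, n_dvs: int, df_effect: int) -> float:
    """Multivariate partial eta squared: 1 - lambda ** (1 / s), s = min(p, df_effect)."""
    s = min(n_dvs, df_effect)
    return 1.0 - wilks ** (1.0 / s)


if __name__ == "__main__":
    # Wilks' lambda and effect df from Tables 8 and 13; p = 4 Indices
    effects = {
        "WISC-IV age": (0.892, 5),
        "WISC-IV gender": (0.956, 1),
        "WAIS-IV age": (0.799, 7),
    }
    for label, (lam, df) in effects.items():
        print(f"{label}: partial eta^2 = {partial_eta_squared_wilks(lam, 4, df):.3f}")
```

Under this definition the significant effects correspond to partial η² values of roughly .03 to .06, that is, a modest share of the multivariate variance in Index scores.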
Table 9
WISC-IV Source of Significance for Age by VCI, PRI, WMI, and PSI
Source    F        df    p
VCI       1.964    5     .085
PRI       1.395    5     .227
WMI       2.031    5     .075
PSI       4.060    5     .001*
*p < .05
The multivariate effect of gender was also significant, Wilks’ λ = .956, F(4, 231) =
2.63, p = .035, and Table 10 shows that it too was located on the PSI (F = 7.04, p =
.009). Males had lower PSI scores (M = 12.86) than females (M = 15.01).
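For an effect with one degree of freedom, such as gender, Wilks’ λ and the multivariate F are exactly related: λ = 1 / (1 + F·p / df₂), where p is the number of dependent variables and df₂ is the error df shown in the table. The sketch below uses this relation to cross-check the gender rows of Tables 8 and 13. For effects with more than one degree of freedom, such as age, the reported F is an approximation and this exact relation does not apply.

```python
def wilks_from_f(f: float, n_dvs: int, df_error: int) -> float:
    """Wilks' lambda from the exact F(p, df_error) when the effect has 1 df."""
    return 1.0 / (1.0 + f * n_dvs / df_error)


def f_from_wilks(wilks: float, n_dvs: int, df_error: int) -> float:
    """Exact F(p, df_error) from Wilks' lambda when the effect has 1 df."""
    return (1.0 - wilks) / wilks * df_error / n_dvs


if __name__ == "__main__":
    # Gender rows: Table 8 (WISC-IV) and Table 13 (WAIS-IV); p = 4, effect df = 1
    print(f"{wilks_from_f(2.629, 4, 231):.3f}")  # Table 8 reports .956
    print(f"{wilks_from_f(2.897, 4, 359):.3f}")  # Table 13 reports .969
```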
Table 10
WISC-IV Source of Significance for Gender by VCI, PRI, WMI, and PSI
Source    F        df    p
VCI       0.296    1     .587
PRI       0.104    1     .748
WMI       0.929    1     .336
PSI       7.036    1     .009*
*p < .05
Gender and age were the only significant sources of variance, and in this sample those
differences were evident only on the Processing Speed Index.
Analysis of the WAIS-IV Scores for Age, Race, and Gender: Research Hypothesis 2
Among people applying for disability there is a relationship between age, gender,
race, and performance on the WAIS-IV (Wechsler, 2008).
Table 11 provides the means and standard deviations of the scores of participants
applying for disability income who were tested for cognitive disability using the
WAIS-IV.
Table 11
Means and Standard Deviations (in parentheses) of WAIS-IV Scores for Disability
Applicants
Scale                       FSIQ            VCI             PRI             WMI            PSI
Total sample (N = 394)      58.98 (19.84)   18.14 (8.34)    19.82 (7.78)    11.14 (5.38)   10.98 (5.73)
Gender
  Male (N = 246)            58.35 (19.35)   17.81 (6.60)    19.92 (6.82)    10.83 (4.01)   10.25 (4.94)
  Female (N = 148)          60.03 (20.63)   18.70 (11.65)   19.67 (9.18)    11.65 (7.10)   12.21 (6.69)
Race
  White (N = 214)           61.06 (20.91)   19.43 (10.61)   20.66 (8.66)    11.72 (6.35)   11.25 (6.29)
  Black (N = 180)           56.50 (18.21)   16.61 (5.76)    18.83 (6.48)    10.44 (3.84)   10.67 (4.98)
Age (Categorical)
  1  16 – 17.11 (25)        58.88 (16.08)   17.68 (3.78)    18.96 (6.21)    10.68 (3.80)   11.56 (5.16)
  2  18 – 19.11 (127)       53.04 (16.59)   16.73 (10.12)   17.16 (5.83)    9.99 (3.87)    10.19 (4.37)
  3  20 – 24.11 (48)        59.35 (19.20)   16.96 (6.17)    19.67 (6.66)    11.46 (4.70)   11.54 (4.38)
  4  25 – 29.11 (22)        70.59 (21.08)   20.45 (5.65)    23.14 (7.56)    12.50 (4.13)   14.50 (6.58)
  5  30 – 34.11 (25)        52.60 (20.81)   18.12 (7.87)    18.48 (7.51)    9.44 (4.27)    8.80 (4.14)
  6  35 – 44.11 (42)        60.19 (19.63)   17.74 (7.89)    20.71 (7.34)    11.57 (4.75)   10.07 (4.49)
  7  45 – 54.11 (65)        61.34 (21.49)   18.75 (10.88)   22.11 (10.96)   12.34 (8.88)   12.03 (8.65)
  8  55 – 64.11 (40)        69.93 (21.26)   22.50 (7.41)    23.40 (7.10)    12.58 (4.70)   11.17 (6.05)
Full Scale Intelligence Quotient (FSIQ). A univariate analysis of variance was
conducted to explore the effects of, and interactions among, gender, race, and age on
WAIS-IV performance. As Table 12 shows, the only significant variable was categorical
age, and no significant effect was found when categorical age was examined in
interaction with any other variable.
Table 12
Univariate tests: WAIS-IV: Dependent Variable FSIQ by Age, Race, and Gender
Effect                    F        df    p
Age                       5.058    7     < .001*
Race                      2.717    1     .067
Gender                    1.235    1     .267
Race X Gender             0.022    1     .882
Race X Age                0.492    7     .840
Gender X Age              1.067    7     .384
Race X Gender X Age       0.971    7     .452
*p < .05
Indices. Because the Indices measure distinctly separate factors that contribute to
FSIQ, an additional multivariate analysis was conducted on the WAIS-IV Indices: the
Verbal Comprehension Index (VCI), Perceptual Reasoning Index (PRI), Working Memory
Index (WMI), and Processing Speed Index (PSI). These four areas are combined to
provide a participant’s FSIQ.
An 8 (Age group) X 2 (Race) X 2 (Gender) factorial multivariate analysis of variance
(MANOVA) was conducted with sample 2 (WAIS-IV results) to determine the extent to
which there were any differences or interactions by age group, race, or gender in
performance on the four Indices (VCI, PRI, WMI, and PSI) of the WAIS-IV. The
MANOVA was preferred over multiple ANOVAs to control for the inflation of Type I
error. Statistical significance was set at .05 because the research was exploratory.
Before the MANOVA was computed, a one-sample Kolmogorov-Smirnov test was run
to check the normality assumption; it indicated that the WAIS-IV scores did not depart
significantly from a normal distribution (M = 58.982, SD = 19.81, p = .757).
As shown in Table 13, the multivariate effect of categorical age was statistically
significant, Wilks’ λ = .799, F(28, 1296) = 2.97, p < .001. Table 14 further shows that
categorical age had a statistically significant effect on performance on the PRI, the
WMI, and the PSI.
Table 13
Multivariate tests for Age, Race, Gender, and WAIS-IV
Effect                    Wilks’ λ   F        df (error df)   p
Age                       .799       2.969    28 (1296)       < .001*
Race                      .978       1.983    4 (359)         .097
Gender                    .969       2.897    4 (359)         .022
Race X Gender             .999       0.086    4 (359)         .987
Race X Age                .917       1.120    28 (1296)       .305
Gender X Age              .938       0.833    28 (1296)       .716
Race X Gender X Age       .934       0.878    28 (1296)       .648
*p < .05
Table 14
WAIS-IV Source of Significance for Age by VCI, PRI, WMI, and PSI
Source    F        df    p
VCI       1.970    7     .058
PRI       5.178    7     < .001*
WMI       2.626    7     .012*
PSI       2.991    7     .005*
*p < .05
Conclusion
This chapter presented the results of the statistical analyses as related to the research
questions. Chapter 5 interprets the analyses and offers recommendations for further
research and practice.
CHAPTER V
RESULTS
This chapter interprets the analyses of the results and offers recommendations
for further research and practice. Overall, a few main effects, but no interaction effects,
were observed, and the general psychometric qualities of the scales did not appear to
demonstrate a significant cultural bias when used with this population. The
demographic information indicated that, among individuals applying for disability at the
time and location surveyed, more of the WISC-IV applicants (ages 6 to 16) were Black
(N = 174) than White (N = 83), whereas more of the WAIS-IV applicants (ages 16 and
older) were White than Black. Male applicants outnumbered female applicants for both
assessments. The largest age group represented was applicants aged 18 to 19 (N = 127),
followed by those aged 45 to 54 (N = 65) and 10 to 11 (N = 64).
Research Hypotheses
First Research Hypothesis. Among people applying for disability, there is a
relationship between age, gender, race, and performance on the Wechsler Intelligence
Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003).
WISC-IV Results. Multivariate analysis of the WISC-IV scores found statistically
significant effects (p < .05) for Age (Wilks’ λ = .892, F = 2.229, p = .002) and for
Gender (Wilks’ λ = .956, F = 2.629, p = .035). Closer exploration of these findings
indicated that the Processing Speed Index (PSI) was the only Index to reflect the
significant differences for Age (PSI F = 4.060, p = .001) and Gender (PSI F = 7.036,
p = .009). This indicated that, for the WISC-IV, the older participants demonstrated
slower processing speeds than the younger participants, and male applicants exhibited
slower processing speeds than female applicants. However, there did not appear to be an
interaction between age and gender: females were as likely as males to show a decline in
processing speed with age.
Second Research Hypothesis. Among people applying for disability, there is a
relationship between age, gender, race, and performance on the Wechsler Adult
Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008).
WAIS-IV Results. Multivariate analysis of the WAIS-IV scores found a statistically
significant effect (p < .05) for Age (Wilks’ λ = .799, F = 2.969, p < .001). Closer
exploration of these findings indicated that the Perceptual Reasoning Index (PRI)
(F = 5.178, p < .001), Working Memory Index (WMI) (F = 2.626, p = .012), and PSI
(F = 2.991, p = .005) were all significantly influenced by age. However, inspection of
the mean scores showed that the age groups with the highest and lowest average scores
differed for some Indices, and also for the Full Scale Intelligence Quotient (FSIQ),
which likewise varied with age. Thus, while there is a relationship between age and
performance on the WAIS-IV, no specific predictive pattern emerged. Indeed, these
findings are likely an artifact of how the age categories were assigned. For PRI, the
highest average (23.40) was at ages 55-65 and the lowest average (17.16) was at ages
18-20. While the highest WMI average (12.58) was also at ages 55-65, the lowest (9.44)
was at ages 30-35, and 18-20 year olds also had a low WMI mean (9.99). For PSI, the
highest mean was at ages 25-30 and the lowest at ages 30-35. The FSIQ mean for this
sample was highest for 25-30 year olds (70.59) and lowest for 30-35 year olds (52.60).
Discussion
It is noteworthy that, within the current samples, the largest group of people
applying for disability was young adults, at the age when many are finishing high school
and preparing for further education or a career. On the WISC-IV, a lower PSI could be an
indication of cognitive difficulties for males under the age of 16 applying for
disability. The age effects may also indicate that age played a role in some of the
slower processing speeds for young adults, but the data contain no measure of
developmental change within participants. The findings may equally reflect the age of
onset of the differing cognitive difficulties contributing to disability. Either way,
nothing in these findings raised serious concern about the cultural integrity of the
test. The test developers provide conversion tables that adjust for age differences, and
the differences in Processing Speed for males, while worthy of more detailed research,
did not lead the researchers to suspect a gender bias created by the test. In fact,
Sattler (2008) also found a 4-point difference in scores for males over scores for
females in the general population.
On the WAIS-IV, age effects were noted on three of the four Indices (PRI, WMI, and PSI).
As with the WISC-IV results, it is likely that these differences are adjusted for within
the age-based conversion tables of the test. Further research could explore whether age
differences reflect differences in the impact of distinct pathologies (for example, the
effects of anxiety disorders versus attention disorders on cognitive functioning).
Although age may affect the performance of applicants with disabilities differently than
it does in the general population, this dissertation research did not conclude that these
age differences represent a cultural bias of the test for this sample.
As previously mentioned, the American Psychological Association has published
Guidelines for the Assessment of and Intervention with Persons with Disabilities (2012).
These guidelines emphasize the importance of recognizing social and cultural diversity
for persons with disabilities (guideline 8, p. 49) and the need to apply assessment
approaches that are “psychometrically sound, fair, comprehensive, and appropriate for
clients with disabilities” (guideline 14, p. 52). The two hypotheses of this dissertation
explored two major assessment tools (WISC-IV and WAIS-IV) to determine whether they fall
within these guidelines.
The first Research Hypothesis stated: Among people applying for disability, there
is no relationship between age, gender, race, and performance on the Wechsler
Intelligence Scale for Children, Fourth Edition (WISC-IV; Wechsler, 2003).
These data indicated that, for the WISC-IV, the older participants had slower processing
speeds than the younger participants, and male applicants had slower processing speeds
than female applicants. However, there did not appear to be an interaction between age
and gender: females were as likely as males to show a decline with age.
The second Research Hypothesis stated: Among people applying for disability,
there is no relationship between age, gender, race, and performance on the Wechsler
Adult Intelligence Scale, Fourth Edition (WAIS-IV; Wechsler, 2008). The data indicated
that the Perceptual Reasoning Index (PRI) (F = 5.178, p < .001), Working Memory
Index (WMI) (F = 2.626, p = .012), and PSI (F = 2.991, p = .005) were all
significantly influenced by Age. Thus, while there was a relationship between Age and
performance on the WAIS-IV, no specific predictive pattern emerged. Again, these
findings were likely an artifact of how the age categories were assigned.
Therefore, for both Hypotheses 1 and 2, although some relationships between age and
performance, and between gender and performance, were noted, these variables did not
interact with each other or with other variables. There was not enough evidence to
conclude that either the WISC-IV or the WAIS-IV failed to meet the standards established
in the American Psychological Association’s Guidelines for the Assessment of and
Intervention with Persons with Disabilities (2012). These scales appeared to be
“psychometrically sound, fair, comprehensive, and appropriate for clients with
disabilities” (guideline 14, p. 52). In exploring two major assessment tools, the two
hypotheses of this dissertation helped to determine whether these tools fall within these
guidelines when assessing individuals applying for disability, and they demonstrate the
importance of continued research to ensure that assessment tools meet these guidelines.
Limitations and Implications for Future Research
This dissertation study did not identify the etiology of participants’ cognitive
impairments, or any differences in the progression of differing pathologies and the
impact of that progression on age differences in performance. For example, while age
and gender differences were observed on the PSI in WISC-IV testing, it is not clear
whether these differences were emotionally based or indicative of processing or
perceptual disorders within the participants. Also, there is no information regarding
the relative contribution of organic versus functional factors to the observed test
scores, including whether potentially confounding psychiatric disorders were present or
absent. These limitations indicate the need for further research that explores the role
of differing pathological etiologies in test outcomes among persons applying for
disability.
A related factor, also not accounted for in this dissertation research, was the
previously documented effect of chronic stress on the brain. Nisbett et al.
(2012) indicated that “Research suggests that part of the Black-White IQ gap may be
attributable to the fact that Blacks, on average, tend to live in more stressful environments
than do Whites. This is particularly the case in urban environments, where Black
children are exposed to multiple stressors” (p. 152). If this is the case, the absence of a
gap between the scores of Black and White applicants in this sample could indicate that
persons with disabilities experience a higher degree of stress than that observed among
persons without disabilities. Future research could compare the level of stress that
persons applying for disability experience with stress levels among persons without
disabilities, and examine the relationship between these factors and performance on
intelligence tests. Stress levels may also interact with age and could have played a
part in differences for participants who live in economic distress.
As mentioned earlier, the largest group of people applying for disability in this
sample was 18-20 year olds. Future research would be helpful in exploring whether this
age group reflects a larger trend, as that might support the need for more and longer
educational interventions for persons with disabilities leaving high school.
Since data were drawn from people applying for disability determination, scores
on standardized tests were anticipated to be lower than scores from people in the general
population. Although this focus was in some respects desirable, because the stakes are
even higher for individuals with below-average scores, the results of this dissertation
cannot be extrapolated to the general population. Further research would be needed to
determine the similarities and differences between findings for the population of this
research and for the general population.
Another limitation is that invalid individual test scores were excluded from the
analyses. Invalid scores occurred for multiple reasons (suspected malingering, or
cognitive or severe psychiatric impairments [for example, ADHD or Anxiety Disorder]),
indicating that the participant may not have been capable of responding at the level of
their anticipated cognitive ability. An additional study could explore the
characteristics of individuals who provide invalid test results during disability
determination. For example, such a study could explore the SES of individuals who
malinger during testing, who may be applying for disability income out of financial
desperation. Again, not enough is known about the multiple and complex factors that may
contribute to the etiology of the cognitive impairments experienced by the participants.
Future research is needed to explore this in greater depth.
This dissertation research did not account for socioeconomic status (SES).
Discussion needs to continue about the role SES plays in intelligence test performance.
One option considered in this research was to account for SES through participants’ zip
codes, since no information was requested about income. However, zip codes were not
included in the analyzed data because, given fair market housing regulations, it could
not be known for certain that a zip code was indicative of income range.
Age, as represented in this dissertation research, was cross-sectional. A longitudinal,
or even cross-sequential, study of the participants would have provided a more in-depth
understanding of the ways that intelligence may change over time for individuals and how
age may interact with different types of disability on cognitive performance measures
over time. An additional challenge to this dissertation research was the limitation on
retesting with the same instrument.
An additional limitation to the meaning of the results, and a potential extraneous
variable, is that this dissertation did not explore the relationship between the test
taker and the examiner, including any effect that the age, gender, or race of either the
examiner or the participant may have had on the performance of the other. While it is
known that testing conditions were similar for all the participants, it is not known
whether there were differences among the persons administering the tests. Although some
research has eliminated examiner characteristics as a potential variable, future research
could explore whether this finding remains true.
Another limitation of this dissertation was that it did not have information about
members of other minority groups or members of multiple ethnic backgrounds. Future
research focused on demonstrating the absence of bias will also be charged with providing
more ethnic categories, allowing greater clarity in describing the diverse groups that
comprise the population.
Even with the defined concept of race used for the purposes of this research, there
could well have been a great diversity of backgrounds among the people identified simply
as Black or White. Asian, Hispanic, biracial, and multiracial participants were
intentionally excluded because smaller numbers of these groups were tested at the time of
data collection. Further research will need to explore any performance differences for
these groups and within many cultural and ethnic groups.
This dissertation research was limited to one way of measuring intelligence. Although
both the WAIS-IV and the WISC-IV are popular measures of IQ, they are not the only scales,
and, given other theories of intelligence, not the only ways to define intelligence.
Future research could verify or contradict the results of this dissertation by including
the administration of other IQ assessments during the disability application process.
Our study failed, as do most in this area of research, to fully account for the
influence of racial socialization and stereotype threat. As Manly (2005) indicated, both
are under-researched and may strongly influence test results. Our study also lacks
qualitative histories of the participants, and it could be very valuable to hear more of
the participants’ histories. For example, as Manly (2005) stated, Blacks are a group from
whom it has been hard to obtain information; “African Americans have fears of
participation in medical research that are justified by a legacy of unethical use of human
subjects and skin-deep social science. As a result many well-meaning, but inexperienced,
neuropsychological researchers have considerable difficulty enrolling African American
participants in studies” (p. 271). It would be very helpful for future research to use
Matsumoto’s concept of “unpackaging,” as Perry et al. (2008, p. 165) suggested, to better
understand how participants ascribe meaning to the test and to the testing situation.
Because this dissertation study was archival in nature, we did not obtain information
about the test takers’ experience of the testing event. Authors such as Ryan (2001) have
found test-taking experience to be an important variable in test performance, and future
research may seek to gather more information from test takers about their experience
with the test.
Future researchers could look at additional races and cultures and explore further how
differing disabilities interact with age, gender, and race to affect intelligence test
performance (for example, do Hispanic females with ADHD perform differently than
Caucasian females with ADHD, and is either group affected by age or by the etiology of
the ADHD?). Mayer et al. (2012) described the growing research zeitgeist toward
alternative views of intelligence as the “hot intelligences” and advocated for their
continued inclusion in future research. These “hot intelligences” include social,
emotional, and personal intelligences, such as people’s skill in interacting within a
social context. They contrast with “cool intelligences,” which measure only such
traditional cognitive functions as the abilities to abstract and to manipulate
information. Nisbett et al. (2012) agreed with Mayer et al. (2012) but added that,
beyond finding correlations between cool and hot intelligences, it has been hard to
demonstrate a significant contribution of hot intelligences to general behavior and
performance.
Implications for Practice
Continued research on the scales used by psychologists assessing for disability will
remain critical. Ensuring that the tools used in the assessment of disability are
“psychometrically sound, fair, comprehensive, and appropriate for clients with
disabilities” (guideline 14, p. 52) is an extremely important consideration before
testing is provided.
Additionally, while the tests studied here were adequate for their purpose, the
profession's growing understanding of intellectual disability has led some to argue that
the cognitive assessments we have previously relied upon need additional empirical
augmentation to provide a complete picture of an individual being considered as having a
disability. The Vineland Adaptive Behavior Scales, Second Edition (Vineland II; Sparrow,
Cicchetti, & Balla, 2005) is considered a “gold standard” (Scattone, Donalaggio, & May,
2011, p. 626) in the measurement of adaptive behavior. These scales measure
Communication, Language Skills, Motor Skills, Daily Living Skills, and Socialization.
They provide the clinician with additional information about the individual’s ability to
function, beyond the scope of measuring cognitive impairment, and can also aid in meeting
the criteria established in the DSM-V. However, additional research may be needed to
ensure that the combined scales (WAIS-IV/WISC-IV and Vineland II) do not pose any
cultural bias.
Future research could help in advocating for the clear need for greater contextual
information about the men, women, and children who are applying for disability.
Developing knowledge about how to better assist this group of people would require better
integration of both qualitative and quantitative information about the multitude of
complex variables that directly contribute to individuals applying for, and potentially
having, a disability. This would include research providing more in-depth information
about factors contributing to and interfering with functioning and adaptation.
Summary
The Diagnostic and Statistical Manual of Mental Disorders (DSM-V; American Psychiatric
Association, 2013) classifies intellectual disability (Intellectual Developmental Disorder)
under the broader classification of Neurodevelopmental Disorders. This reflects the
impact of the AAIDD definition of intellectual disability. It also reflects a general
movement away from reliance on cognitive assessment alone, requiring that assessment also
include objective measures of difficulties in adaptive functioning. This is also in
keeping with Rosa’s Law (Public Law 111-256), a United States federal statute replacing
the term “mental retardation” with “intellectual disability.” Yet, while cognitive
measures will not play a sole role in diagnosis, it is highly likely that they will remain
a key part of it. The APA’s call for ethical, nonbiased testing remains and is especially
clearly articulated in the 2012 Guidelines for the Assessment of and Intervention with
Persons with Disabilities. While very limited, this dissertation study provided a reply to
the APA’s request for applied research directed at the actual implementation of
assessment measures, with purposeful exploration of bias in testing. Clearly, an enormous
amount of research remains before we can be sure that we are providing assessments that
meet the standards we have established for ourselves as a profession.
REFERENCES
AAIDD. (2011). FAQ on intellectual disability (pp. 1-3). Retrieved January 21, 2011,
from http://www.aaidd.org/content
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental
disorders (5th ed.). Washington, DC: American Psychiatric Publishing.
American Psychological Association. (2003). Guidelines on multicultural education,
training, research, practice, and organizational change for psychologists.
American Psychologist, 58, 377-402.
American Psychological Association. (2012). Guidelines for assessment of and
intervention with persons with disabilities. American Psychologist, 67(1), 43-62.
Beins, B. (2010). Teaching measurement through historical sources. History of
Psychology, 13(1), 89-94.
Benjamin, L. (2009). The birth of American intelligence testing. Monitor on Psychology,
40(1), 20-21.
Benjamin, L., & Baker, D. B. (2004). From séance to science: A history of the profession
of psychology in America. Belmont, CA: Thomson Wadsworth.
Boake, C. (2002). From the Binet-Simon to the Wechsler-Bellevue: Tracing the history of
intelligence testing. Journal of Clinical and Experimental Neuropsychology, 24(3), 383-405.
Brooks-Gunn, J. B., Klebanov, P. K., & Duncan, G. (1996). Ethnic differences in
children’s intelligence test scores: Role of economic deprivation, home
environment, and maternal characteristics. Child Development, 67, 396-408.
Butler-Omololu, C., Doster, J. A., & Lahey, B. (1984). Some implications for intelligence
test construction with children of different racial groups. The Journal of Black
Psychology, 10, 63-75.
Carroll, J. B. (1993). Human cognitive abilities. Cambridge: Cambridge University Press.
Census data. (2011). United States census 2010 data. Retrieved from
http://www.census.gov/2010census/data/
Chen, H., & Zhu, J. (2008). Factor invariance between genders of the Wechsler
Intelligence Scale for Children-Fourth Edition. Personality and Individual
Differences, 45, 260-266.
College Entrance Examination Board. (1992). Validity study sample of the 1991 SAT
administration. New York, NY: College Entrance Examination Board.
Colom, Lluis-Font, & Andres-Pueyo (2005). The generational intelligence gains are
caused by decreasing variance in the lower half of the distribution: Supporting
evidence for the nutrition hypothesis. Intelligence, 33, 83-91. In Sattler, J. M.
(Ed.). Assessment of children: Cognitive foundations (5th ed.). La Mesa, CA:
Jerome M. Sattler, Publisher, Inc.
Detterman, D. K., & Daniel, M. H. (1989). Correlations of mental tests with each other
and with cognitive variables are highest for low IQ groups. Intelligence, 13, 349-359.
Dickens, W. T., & Flynn, J. R. (2006). Black Americans reduce the racial IQ gap.
Psychological Science, 17(10), 913-920.
Dove, A. (1968). Taking the chitling test. Newsweek, 72, 51-52.
Dweck, C. (2000). Self-Theories: Their role in motivation, personality, and development.
Philadelphia, PA: Psychology Press.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis
program. Behavior Research Methods, Instruments, & Computers, 28, 1-11.
Flanagan, D. P., & Kaufman, A. S. (2009). Essentials of WISC-IV assessment (2nd ed.).
Hoboken, NJ: John Wiley & Sons, Inc.
Flynn, J. R. (1987). Massive IQ gains in 14 nations: What IQ tests really measure.
Psychological Bulletin, 101, 171-191.
Flynn, J. R. (1999). Searching for justice: The discovery of IQ gains over time.
American Psychologist, 54(1), 5-20.
Foley-Nicpon, M., & Lee, S. (2012). Disability research in counseling psychology
journals: A 20-year content analysis. Journal of Counseling Psychology, 59,
392-398.
Gardner, H. (1983). Frames of mind: The theory of multiple intelligences. New York:
Basic Books.
Gasquoine, P. (2009). Race-norming of neuropsychological tests. Neuropsychology
Review, 19, 250-262.
Glass, L. A., Ryan, J. J., Charter, R. A., & Bartels, J. D. (2009). Discrepancy score
reliabilities in the WISC-IV standardization sample. Journal of
Psychoeducational Assessment, 27(2), 138-144.
Goddard, H. H. (1912). The Kallikak family: A study in the heredity of
feeble-mindedness. New York: Macmillan.
Goldstein, E. B. (2005). Reasoning and decision making. In E. B. Goldstein (Ed.),
Cognitive psychology: Connecting mind, research, and everyday experience (pp.
427-478). Belmont, CA: Thomson Wadsworth.
Gould, S. J. (1981). The mismeasure of man. New York: W.W. Norton.
Guthrie, R. V. (2004). Even the rat was white: A historical view of psychology. Boston:
Pearson.
Health Insurance Portability and Accountability Act of 1996, Pub. L. No. 104-191, 110
Stat. 1936 (enacted August 21, 1996).
Helms, J. E. (2002). A remedy for the black-white test-score disparity. American
Psychologist, 57, 303-305.
Helms, J. E. (2006). Fairness is not validity or cultural bias in racial-group assessment:
The quantitative perspective. American Psychologist, 61(8), 845-859.
Helms, J. E., & Cook, D. A. (1999). Using race and culture in counseling and
psychotherapy: Theory and process. Boston: Allyn & Bacon.
Helms, J. E., & Talleyrand, R. M. (1997). Race is not ethnicity. American Psychologist,
52, 1246-1247.
Herrnstein, R. J., & Murray, C. (1995). The bell curve: Intelligence and class structure in
American life. New York: Simon and Schuster Inc.
Hiscock, M. (2007). The Flynn effect and its relevance to neuropsychology. Journal of
Clinical and Experimental Neuropsychology, 29(5), 514-529.
Howell, D. C. (2007). Statistical methods for psychology (6th ed.). Belmont, CA:
Thomson Wadsworth.
Jackson, D. N., & Rushton, J. P. (2006). Males have greater g: Sex differences in general
mental ability from 100,000 17- to 18-year-olds on the Scholastic Assessment
Test. Intelligence, 34(5), 479-486.
Jensen, A. R. (1995). Psychological research on race differences. American
Psychologist, 50(1).
Johnson, C. K., & Liu, X. (2008). A CHC theory based analysis of age differences on
cognitive abilities and academic skills at ages 22 and 90 years. Journal of
Psychoeducational Assessment, 26(4), 350-381.
Kaplan, R. M., & Saccuzzo, D. P. (2001). Psychological testing: Principles, applications,
and issues (5th ed.). Belmont, CA: Wadsworth.
Kaufman, A., Johnson, C. K., & Liu, X. (2008). A CHC theory based analysis of age
differences on cognitive abilities and academic skills at ages 22 and 90 years.
Journal of Psychoeducational Assessment, 26(4), 350-381.
Kaufman, A. S., & Kaufman, N. L. (2004). Kaufman brief intelligence test, second edition
(KBIT-2). Circle Pines, MN: American Guidance Service.
Koocher, G. P. (2003). IQ testing: A matter of life or death. Ethics & Behavior, 13(1), 1-2.
Lichtenberger, E. O., & Kaufman, A. S. (2009). Essentials of WAIS-IV assessment. Hoboken,
NJ: John Wiley and Sons, Inc.
Long, P. A., & Anthony, J. J. (1974). The measurement of mental retardation by a
culture-specific test. Psychology in the Schools, 11, 310-312.
Manly, J. J. (2005). Advantages and disadvantages of separate norms for African
Americans. The Clinical Neuropsychologist, 19, 270-275.
Matarazzo, J. D., & Wiens, A. N. (1977). Black intelligence test of cultural homogeneity
and Wechsler adult intelligence scale scores of black and white police applicants.
Journal of Applied Psychology, 62, 57-63.
Mayer, J. D., Caruso, D. R., Panter, A. T., & Salovey, P. (2012). The growing significance
of hot intelligences. American Psychologist, 67(2), 502.
Mercado III, E. (2009). Cognitive plasticity and cortical modules. Current Directions in
Psychological Science, 18(3), 153-158.
Murray, C. (2007). The magnitude and components of change in the black-white IQ
difference from 1920 to 1991: A birth cohort analysis of the Woodcock-Johnson
standardizations. Intelligence, 35, 305-318.
Nisbett, R. E., Aronson, J., Blair, C., Dickens, W., Flynn, J., Halpern, D. F., &
Turkheimer, E. (2012). Intelligence: New findings and theoretical developments.
American Psychologist, 67(2), 130-159.
Onwuegbuzie, A. J., & Daley, C. E. (2001). Racial differences in IQ revisited: A synthesis
of nearly a century of research. Journal of Black Psychology, 27(2), 209-220.
Perry, J. C., Satiani, A., Henze, K. T., Mascher, J., & Helms, J. E. (2008). Why is there
still no study of cultural equivalence in standardized cognitive ability tests?
Journal of Multicultural Counseling and Development, 36, 155-167.
Pickren, W. E., & Dewsbury, D. A. (2002). Evolving perspectives on the history of
psychology. Washington, DC: American Psychological Association.
Pinker, S. (1997). How the mind works. New York: W.W. Norton & Company.
Psychological Corporation. (2003). WISC-IV Scoring and Administration Manual. San
Antonio, TX: Author.
Rushton, J. P. (2010). Brain size as an explanation of national differences in IQ,
longevity, and other life-history variables. Personality and Individual Differences,
48, 97-99.
Rushton, J. P. (2012). No narrowing in mean black-white IQ differences predicted by
heritable g. American Psychologist, 67(2), 500-501.
Rushton, J. P., & Ankney, C. D. (2009). Whole brain size and general mental ability: A
review. International Journal of Neuroscience, 119(5), 691-731.
Rushton, J. P., & Jensen, A. R. (2005). Thirty years of research on race differences in
cognitive ability. Psychology, Public Policy, and Law, 11(2), 235-294.
Rushton, J. P., & Jensen, A. R. (2010). The rise and fall of the Flynn Effect as a reason to
expect narrowing of the Black-White IQ gap. Intelligence, 38, 213-219.
Sattler, J. M. (2008). Assessment of children: Cognitive foundations (5th ed.). La Mesa,
CA: Jerome M. Sattler, Publisher, Inc.
Scattone, D., Donalaggio, D., & May, W. (2011). Comparison of the Vineland Adaptive
Behavior Scales, Second Edition, and the Bayley Scales of Infant and Toddler
Development, Third Edition. Psychological Reports, 109(2), 626-634.
Scott, D. (1994). Cognitive conceit: A review of the bell curve. Social Policy, 25, 50-59.
Shipley, W. C. (1940). A self-administering scale for measuring impairment and
deterioration. Journal of Psychology, 9, 371-377.
Shuttleworth-Edwards, A., Kemp, R., Rust, A., Muirhead, J., Hartman, N., & Radloff, S.
(2004). Cross-cultural effects on IQ test performance: A review and preliminary
normative indications on WAIS-III test performance. Journal of Clinical and
Experimental Neuropsychology, 26(7), 903-920.
Sparrow, S., Cicchetti, D., & Balla, D. (2005). Vineland-II: Vineland adaptive behavior
scales survey forms manual. The Psychological Corporation.
Sternberg, R. J. (2000). Implicit theories of intelligence as exemplar stories of success:
Why intelligence test validity is in the eye of the beholder. Psychology, Public
Policy, and Law, 6(1), 159-167.
Suinn, R., & Borrayo. (2008). The ethnicity gap: The past, present, and future.
Professional Psychology: Research and Practice, 39(6), 646-651.
Suzuki, L., & Valencia, R. (1997). Race-ethnicity and measured intelligence:
Educational implications. American Psychologist, 52(10), 1103-1114.
Terman, L. M. (1916). The measurement of intelligence. Boston: Houghton Mifflin.
US Census Bureau. (2004). US interim reports by age, sex, race, and Hispanic origin.
Retrieved from http://www.census.gov/ipc/www/usinterimproj/
Watkins, M. W., Wilson, S. M., Kotz, K. M., Carbone, M. C., & Babula, T. (2006).
Factor structure of the Wechsler Intelligence Scale for Children-Fourth Edition
among referred students. Educational and Psychological Measurement, 66(6),
975-983.
Wechsler, D. (2003). WISC-IV: Technical and interpretive manual. San Antonio, TX:
The Psychological Corporation.
Weiten, W. (2011). Psychology: Themes and variations (8th ed.). Mason, OH:
Wadsworth.
Welch, K. C. (2002). The bell curve and the politics of negrophobia. In J. M. Fish (Ed.),
Race and intelligence: Separating science from myth (pp. 177-198). Mahwah, NJ:
Lawrence Erlbaum.
Whitaker, S. (2008). WISC-IV and low IQ: Review and comparison with the WAIS-III.
Educational Psychology in Practice, 24(2), 129-137.
Williams, R. L. (1972). The BITCH-100: A culture-specific test. Paper presented at the
American Psychological Association Annual Convention, Honolulu, Hawaii.
Winerman, L. (2011). Where’s the progress? Monitor on Psychology, 42(9), 28.
Woodcock, R. W., & Johnson, M. B. (1989). Woodcock-Johnson psycho-educational
battery-revised. Allen, TX: DLM/Teaching Resources.
www.pearsonpsychcorp.com
Young, A., & Rearden, J. (1979). Black intelligence test of cultural homogeneity and
Shipley-institute of living scale scores for black Chicago youths. Psychological
Reports, 45, 457-458.