The University of Toledo
The University of Toledo Digital Repository
Theses and Dissertations
2015
Aggregating form accuracy and percept frequency
to optimize Rorschach perceptual accuracy
Sandra L. Horn
University of Toledo
Follow this and additional works at: http://utdr.utoledo.edu/theses-dissertations
Recommended Citation
Horn, Sandra L., "Aggregating form accuracy and percept frequency to optimize Rorschach perceptual accuracy" (2015). Theses and Dissertations. 1989. http://utdr.utoledo.edu/theses-dissertations/1989
This Dissertation is brought to you for free and open access by The University of Toledo Digital Repository. It has been accepted for inclusion in Theses
and Dissertations by an authorized administrator of The University of Toledo Digital Repository. For more information, please see the repository's
About page.
A Dissertation
entitled
Aggregating Form Accuracy and Percept Frequency
to Optimize Rorschach Perceptual Accuracy
by
Sandra L. Horn
Submitted to the Graduate Faculty as partial fulfillment of the requirements for the
Doctor of Philosophy Degree in Clinical Psychology
__________________________________________
Gregory J. Meyer, Ph.D., Committee Chair
__________________________________________
Jeanne Brockmyer, Ph.D., Committee Member
__________________________________________
Joni L. Mihura, Ph.D., Committee Member
__________________________________________
Jason P. Rose, Ph.D., Committee Member
__________________________________________
Donald J. Viglione, Ph.D., Committee Member
__________________________________________
Patricia R. Komuniecki, Ph.D., Dean
College of Graduate Studies
The University of Toledo
December 2015
Copyright 2015, Sandra L. Horn
This document is copyrighted material. Under copyright law, no parts of this document
may be reproduced without the expressed permission of the author.
An Abstract of
Aggregating Form Accuracy and Percept Frequency
to Optimize Rorschach Perceptual Accuracy
by
Sandra L. Horn
Submitted to the Graduate Faculty as partial fulfillment of the requirements for the
Doctor of Philosophy Degree in Clinical Psychology
The University of Toledo
December 2015
Exner’s (2003) Comprehensive System and Meyer et al.’s (2011) Rorschach Performance
Assessment System use Form Quality scores as a method for assessing the accuracy of
perceptions on the Rorschach. However, Form Quality is a coarse classification
method, as it reduces a continuum of perceptual accuracy to just three options.
There is currently no fully dimensional Rorschach score that thoroughly and
efficiently taps into both the frequency with which particular objects are reported while
taking the test and the perceptual fit of those objects to the cards. This study explores
the structure of a fit variable, Form Accuracy, in combination with a frequency
variable, Percept Frequency, as a step toward a new dimensional method of scoring
perceptual accuracy, one intended to improve the identification of distorted perceptual
processes and impaired reality testing and thus to strengthen validity coefficients in the
Rorschach-based identification of psychosis. Percept Frequency tables were developed
from six internationally collected samples from Argentina, Brazil, Italy, Japan, Spain, and
the U.S. that quantified how often objects were reported while completing the Rorschach
task. Form Accuracy ratings were obtained from a database of 13,031 objects that had
been rated an average of 9.9 times by different judges from eleven countries who were
asked to rate the extent to which the object fit the contours of the inkblot at the location
where it was seen. A criterion database containing 159 protocols and 3,897 scorable
responses was then scored for Form Accuracy and Percept Frequency. Hierarchical
Linear Modeling was used to complete structural analyses of Form Accuracy and Percept
Frequency scores at the response level, and correlations of these variables were computed
at the protocol level with a criterion measure assessing severity of disturbance based on
psychiatric diagnoses. Across different levels of aggregation, there was resounding
evidence that the structure of each of the ten Rorschach cards and the sequence of first,
second, third, or fourth responses given to a card played a large role in determining Form
Accuracy and Percept Frequency scores. As such, these scores are strongly influenced by
structural features of the Rorschach task and cannot be attributed entirely to stable
characteristics of the test-taker. There were consistent clustering effects in the data due to
the card number and due to the response within a card. Predicted scores for Form
Accuracy and Percept Frequency were highest on Cards 5, 1, and 7, and they were lowest
on Cards 9 and 6; scores were also lowered with each subsequent response within a card.
Surprisingly, Percept Frequency scores did not correlate with the criterion measure of
diagnostic severity, though Form Accuracy did have small correlations. Understanding
the structural patterns of the fit and frequency data is an important undertaking in forming
the foundation for future research on a dimensional Rorschach perceptual accuracy
scoring system.
Dedicated to my parents, Leah and Terry, and to Grampy.
You have my utmost gratitude for your unconditional love and support. You taught me
math by helping with homework at the kitchen table and having me calculate
measurements in the shop; you helped me develop a love for reading by taking me to pick
out books at the library and pretending you didn’t realize I was reading under the covers
with a flashlight at night. Whatever the lesson for the day happened to be, you were
teaching me the value of hard work and instilling in me a deep appreciation and yearning
for education. You gave me the skills necessary to succeed and granted me the space to
determine my own path in life; for this I am forever grateful.
Acknowledgements
I would like to first acknowledge my advisor, Dr. Gregory Meyer. My graduate
education has been a long and emotional journey and I am forever grateful that I was able
to travel this road with him as my mentor. This dissertation is one of many
accomplishments that would not have been possible without his support and guidance.
His dedication to my education and professional growth has been unwavering, and I feel
incredibly lucky to have had an advisor who is so devoted to helping students learn,
grow, and find their path.
I would also like to thank my committee members, Dr. Jeanne Brockmyer, Dr.
Joni Mihura, Dr. Jason Rose, and Dr. Donald Viglione. They donated significant amounts
of time and energy to me on this dissertation, and their insights and suggestions were
spot-on, leading to a final product that I feel very proud of. It is an honor to have had
their encouragement, feedback, and support.
I have felt overwhelming support from so many friends, family members,
colleagues, and supervisors that it would be impossible to name everyone here. However, I
wholeheartedly thank each and every one of you for the hugs, laughs, talks over dinners
and beers, and phone calls that always seemed to come when I needed them most. I
deeply appreciate all of your love, support, and encouragement that constantly enveloped
me and kept me moving toward my goals. Without you all, I could not have endured
the unavoidable ups and downs of graduate school and a dissertation. Thank you.
Table of Contents
Acknowledgements ............................................................................................................ vi
Table of Contents .............................................................................................................. vii
List of Tables ..................................................................................................................... xi
List of Figures ................................................................................................................... xii
List of Abbreviations and Rorschach Scores ................................................................... xiv
I. Introduction .................................................................................................................... 1
II. Review of the Literature ................................................................................................ 9
Perceptual Accuracy, Reality Testing, and Psychosis .................................................... 9
Rorschach Form Quality (FQ) ...................................................................................... 10
History of the development of FQ ............................................................................ 10
Comprehensive System (CS) scoring of FQ ............................................................. 14
Rorschach Performance Assessment System (R-PAS) scoring of FQ ..................... 20
Review of FQ validity. .............................................................................................. 25
FQ validity – differentiation of clinical groups .................................................... 25
FQ validity – criterion validity ............................................................................. 29
FQ validity – SCZI, PTI, TP-Comp, & EII........................................................... 33
FQ validity – malingering ..................................................................................... 43
Limitations of FQ ...................................................................................................... 44
Rorschach Form Accuracy (FA) ................................................................................... 47
The development of FA ............................................................................................ 47
Scoring of FA ............................................................................................................ 50
Review of FA validity............................................................................................... 52
Rorschach Frequency of Perceptions ............................................................................ 56
Popular responses...................................................................................................... 57
Findings using Rorschach indices of response frequency ........................................ 58
Statement of the Problem .............................................................................................. 59
Purpose of the Present Study ........................................................................................ 61
Principle of Aggregation ............................................................................................... 63
Research Questions ....................................................................................................... 65
III. Method ....................................................................................................................... 66
Participants .................................................................................................................... 66
Percept Frequency samples ....................................................................................... 66
U.S. Sample .......................................................................................................... 66
Argentinean Sample .............................................................................................. 66
Italian Sample ....................................................................................................... 67
Spanish Sample ..................................................................................................... 67
Japanese Sample ................................................................................................... 67
Brazilian Sample ................................................................................................... 67
Criterion Database. ................................................................................................... 67
Measures ....................................................................................................................... 69
Percept Frequency samples measures ....................................................................... 69
Criterion Database measures..................................................................................... 73
Procedures ..................................................................................................................... 76
Frequency tables construction................................................................................... 76
Structure of the original FA and PF tables............................................................ 76
Coding the U.S. Sample ........................................................................................ 77
Updating and adding variables to the FA and PF tables ....................................... 79
Criterion Database coding......................................................................................... 81
Coder training and interrater reliability ................................................................ 81
Coding FA and PF ................................................................................................ 83
Statistical Analyses ....................................................................................................... 85
Overview of planned analyses .................................................................................. 85
Hierarchical Linear Modeling (HLM) ...................................................................... 86
Supplemental analysis strategies............................................................................... 93
IV. Results........................................................................................................................ 94
Interrater Reliability ...................................................................................................... 94
Frequency Tables: Descriptives .................................................................................... 94
Criterion Database: Descriptives .................................................................................. 95
Criterion Database: HLM ........................................................................................... 100
HLM models for FA ............................................................................................... 100
HLM models for PFM ............................................................................................ 111
HLM models for PFN1.5 ........................................................................................ 122
Supplemental Analysis Strategies ............................................................................... 132
V. Discussion ................................................................................................................. 152
Updating the PF Tables ............................................................................................... 155
Interrater Reliability .................................................................................................... 156
Modeling the Criterion Database ................................................................................ 157
Modeling the Structure of FA ..................................................................................... 158
Modeling the Structure of PFM .................................................................................. 161
Modeling the Structure of PFN1.5 .............................................................................. 164
Summary of Variable Structures Across Modeling Techniques................................. 166
Strengths and Limitations of the Study....................................................................... 169
Expected and Surprising Findings .............................................................................. 171
Conclusions ................................................................................................................. 174
References ....................................................................................................................... 177
List of Tables
Table 1. New Response Objects Derived From the U.S. Frequency Sample ................... 95
Table 2. Descriptive Statistics........................................................................................... 97
Table 3. Mean Values by Card Number and R_InCard .................................................... 99
Table 4. Statistical Summary of FA HLM Models ......................................................... 109
Table 5. Statistical Summary of PFM HLM Models ...................................................... 120
Table 6. Statistical Summary of PFN1.5 HLM Models.................................................. 130
Table 7. Protocol-Level Descriptive Statistics................................................................ 133
Table 8. Protocol-Level Cohen’s d by Card and R_InCard ............................................ 143
Table 9. Response-Level Cohen’s d by Card and R_InCard .......................................... 146
List of Figures
Figure 1. Card 3 location D3............................................................................................ 17
Figure 2. Image of a butterfly. ......................................................................................... 17
Figure 3. Image of a dumbbell. ........................................................................................ 17
Figure 4. Image of a dragonfly. ....................................................................................... 17
Figure 5. Card 3 location D2............................................................................................ 46
Figure 6. Image of an anchor ........................................................................................... 46
Figure 7. Images of fishhooks.......................................................................................... 47
Figure 8. Card 3 location D3............................................................................................ 49
Figure 9. Image of a bowtie ............................................................................................. 49
Figure 10. Image of an insect ........................................................................................... 50
Figure 11. Image of a werewolf ....................................................................................... 50
Figure 12. Card III location D2........................................................................................ 72
Figure 13. Protocol-Level FA Means by Card Number. ............................................... 135
Figure 14. Protocol-Level PFM Means by Card Number.............................................. 136
Figure 15. Protocol-Level PFN1.5 Means by Card Number. ........................................ 137
Figure 16. Protocol-Level FA Means by R_InCard....................................................... 138
Figure 17. Protocol-Level PFM Means by R_InCard. ................................................... 139
Figure 18. Protocol-Level PFN1.5 Means by R_InCard. .............................................. 140
Figure 19. Protocol-Level Cohen’s d by Card on FA, PFM, and PFN1.5. .................... 144
Figure 20. Protocol-Level Cohen’s d by R_InCard on FA, PFM, and PFN1.5. ............ 145
Figure 21. Response-Level Cohen’s d by Card on FA, PFM, and PFN1.5. .................. 147
Figure 22. Response-Level Cohen’s d by R_InCard on FA, PFM, and PFN1.5. .......... 148
List of Abbreviations and Rorschach Scores
Rorschach Acronyms, Codes, and Indices
CS          Comprehensive System
R-PAS       Rorschach Performance Assessment System
FQ          Form Quality (CS; R-PAS)
+           Ordinary-Elaborated Form Quality (CS)
o           Ordinary Form Quality (CS; R-PAS)
u           Unusual Form Quality (CS; R-PAS)
–           Minus Form Quality (CS; R-PAS)
W           Whole Location (CS; R-PAS)
D           Common Detail Location (CS; R-PAS)
Dd          Unusual Detail Location (CS; R-PAS)
PTI         Perceptual Thinking Index (CS)
SCZI        Schizophrenia Index (CS)
TP-Comp     Thought and Perception Composite (R-PAS)
WDA%        Percentage of responses given to common (W or D) locations that have appropriate form use (i.e., FQ coding of +, o, or u) (CS)
WD-%        Percentage of responses given to common (W or D) locations that have distorted form use (i.e., FQ coding of –) (R-PAS)
X+%; FQo%   Percentage of responses that are common and have appropriate form use (i.e., FQ coding of + or o) (CS; R-PAS)
XA%         Percentage of responses that have appropriate form use (i.e., FQ coding of +, o, or u) (CS)
Xu%; FQu%   Percentage of responses that are uncommon and have appropriate form use (i.e., FQ coding of u) (CS; R-PAS)
X-%; FQ-%   Percentage of responses that have distorted form use (i.e., FQ coding of –) (CS; R-PAS)
Rorschach Perceptual Accuracy
FA          Form Accuracy
PA          Perceptual Accuracy
PF          Percept Frequency
PFM         The response-level mean of the object-level averages of the 6 countries' percentage-based frequency values, for values greater than or equal to 1.5%
PFN1.5      The response-level mean of the object-level counts of countries (range 0–6) that had a percentage-based frequency value greater than or equal to 1.5%
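As a concrete illustration, the arithmetic behind the PFM and PFN1.5 definitions can be sketched as follows. The object names and frequency values here are hypothetical, and the handling of sub-threshold values reflects one plausible reading of the definitions, not the exact table-construction procedure (which is described in the Method chapter).

```python
# Illustrative sketch of the PFM and PFN1.5 definitions.
# Object names and frequency values are hypothetical.

THRESHOLD = 1.5  # percentage cutoff named in both definitions

# Per-object frequency values (%) from the six country samples
# (Argentina, Brazil, Italy, Japan, Spain, U.S.).
object_freqs = {
    "bat (whole blot)": [38.0, 41.5, 35.2, 30.1, 44.0, 39.8],
    "rare detail":      [0.0, 1.6, 0.4, 0.0, 2.1, 0.9],
}

def object_pfm(freqs):
    """Object-level average of the countries' percentage values,
    keeping only values at or above the cutoff (an assumed reading
    of 'for values greater than or equal to 1.5%')."""
    kept = [f for f in freqs if f >= THRESHOLD]
    return sum(kept) / len(kept) if kept else 0.0

def object_pfn15(freqs):
    """Object-level count of countries (0-6) at or above the cutoff."""
    return sum(1 for f in freqs if f >= THRESHOLD)

def response_mean(objects, object_fn):
    """Response-level mean over the object-level scores."""
    scores = [object_fn(object_freqs[name]) for name in objects]
    return sum(scores) / len(scores)

# A single response containing both objects:
pfm = response_mean(["bat (whole blot)", "rare detail"], object_pfm)
pfn15 = response_mean(["bat (whole blot)", "rare detail"], object_pfn15)
```

In this sketch the common object contributes a high object-level average (all six countries above the cutoff), while the rare object contributes a low one, so the response-level means fall in between; this is the sense in which both indices reward conventionally reported percepts.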
Chapter One
Introduction
The Rorschach Inkblot Task (commonly referred to as “the Rorschach”) was
introduced to the mental health professions by Hermann Rorschach (1921/1942), a
psychiatrist with an artistic bent. Finding inspiration in a popular game at the time,
Klecksographie (Blotto), in which the players made inkblots and then formed
associations or told stories about the images, Rorschach began to formulate sophisticated
hypotheses about how inkblot images could be used to investigate individual differences
on psychological constructs (Exner, 2003). According to Exner, the initial studies and
observations made by Rorschach and his colleagues pertained to the use of inkblots in
identifying psychosis. Although Rorschach began development of stimuli for his inkblot
experiments by designing 40+ images, he soon selected a set of 15–16 blots for his early
research, and then settled on a set of 12 images for later projects. However, when
Rorschach sent the 12 blots to press, limitations imposed by the publisher forced him to
reduce both their size and number, leaving a set of 10 inkblots. The 10 blots
designed and selected by Rorschach now comprise the standard set of Rorschach cards
used in current clinical research and practice.
The Rorschach is a commonly used psychological assessment method (e.g.,
Camara, Nathan, & Puente, 2000; Clemence & Handler, 2001; Sundberg, 1961) in which
a person is presented with the standard series of 10 inkblots and is asked to respond to
each, answering the question, “What might this be?” The Comprehensive System (CS;
Exner, 2003) has been the most commonly used administration and interpretation system
for Rorschach assessment for decades (Mihura, Meyer, Dumitrascu, & Bombel, 2013),
with 96% of a recently-surveyed international sample of clinicians reporting they use the
CS as their primary system when coding and interpreting the Rorschach (Meyer, Hsiao,
Viglione, Mihura, & Abraham, 2013). The new Rorschach Performance Assessment
System (R-PAS; Meyer, Viglione, Mihura, Erard, & Erdberg, 2011), with its primary
foundations in the CS and the current published literature, has also been gaining traction
since its publication.
The CS and R-PAS have roots in the work of other systematizers who have striven
over the years to develop, standardize, and validate various methods for obtaining and
scoring Rorschach protocols. After Exner carefully reviewed the existing Rorschach
systems, he published the basic CS foundations in 1974. Although he pulled a
combination of elements from existing Rorschach systems, Exner (1974) also included
some new methodological, scoring, and interpretation guidelines. Similarly, Meyer et al.
(2011) completed an extensive review of the CS, previous systems, and the published
literature when designing R-PAS. With many familiar CS components and procedures,
but also with some significant changes to the CS (e.g., a new normative sample; R-Optimized
administration; the way variables are calculated and presented), R-PAS is
presented as an evidence-based and internationally oriented system, with the authors
focused on “…enhancing the psychometric and international foundation of the test,
while allowing examiners to interpret the rich communication, imagery, and interpersonal
behavior within that strong psychometric foundation” (Meyer et al., 2011; see Meyer &
Eblin, 2012, for a brief overview). Many clinicians find value in the Rorschach as a
method of gathering information about an individual that cannot be obtained using other
popular assessment methods, and this is likely an important factor in the popularity of the
Rorschach in clinical settings (McGrath, 2008).
Historically, the Rorschach has been labeled as a projective test. Weiner (1998)
wrote that “The basic theory of projective mechanisms holds that the possibility and
probability of people attributing their internal characteristics to external objects and
events is directly proportional to the lack of structure in these objects and events.” Use of
objective/projective terminology for describing personality tests, including the Rorschach
(and other tests for that matter), has been challenged in recent years (e.g., Meyer & Kurtz,
2006; Viglione & Rivera, 2003, 2013). A strong argument for retiring the term
“projective” as a test descriptor is that the term carries various meanings and
connotations. One assumption that directly applies to many facets of Rorschach testing is
that the test stimuli are ambiguous and that the task is completely unstructured. As
pointed out by many (e.g., Weiner, 1998; Exner, 2003; Meyer et al., 2011), Rorschach
cards contain complex structural elements (e.g., form, color, shading) that do provide
some boundaries for the test-taker when completing the Rorschach task. However, the
presence of some structure does not preclude test-takers from making use of the stimulus
features in unique ways. Thus, the Rorschach stimuli offer clinicians and researchers an
opportunity to explore psychological constructs in a systematic and replicable manner but
without imposing strict regulations on the latitude of the test-taker.
Viglione and Rivera’s (2003, 2013) discussion of performance-based assessment
tests/methods explores the concept of the test-taker responding to the test stimuli (and
testing situation) with more freedom of response than would be encountered on typical
self-report measures, but with a variety of constraints and influences still present (e.g.,
critical bits of the inkblot, instructions, examiner variability, reason for referral,
individual differences in level of projection, level of defensiveness, etc.). They agree that
performance-based tasks such as the Rorschach are not purely “projective” in nature, but
argue that they still offer rich behavioral information, in the form of induced observable
(and oftentimes scorable) behavioral samples collected under controlled conditions, that
may not be available from other sources during the assessment process. These rich
samples of complex and real-life behaviors, which are initiated by the stimulus situation,
are mediated by the person’s personality. Ideally, the behavioral and personality samples
collected through the use of standardized performance-based assessment methods will
generalize outside the microcosm of the task, and interpretations can be made about the
person’s behavior and personality in daily life.
Hermann Rorschach (1921/1942) saw the Rorschach as an intellectual endeavor,
requiring the person to concentrate their attention on the inkblots, search their memory,
compare the Rorschach images to images in their memory, then verbalize a response that
matches the mental image to the blot features. Exner (2003) and Meyer et al. (2011) posit
that the Rorschach gives behavioral information about a person, but also contains
information about the psychological and cognitive processes that generate the behaviors.
Leichtman (1996) argues that most theorists have glossed over developing a
thorough conceptualization of Rorschach task demands; he instead describes the
Rorschach as a task of visual representation in which “…participants actively search for
what stimuli can be made into. What emerges is not any association, but an idea that
arises from the effort to find a referent and that, in turn, plays a major role in shaping the
medium further.” In other words, the test-taker performs on the Rorschach in a way that
is akin to an artist shaping clay — both begin with a raw material that is shaped or
described in a way that allows it to serve as a representation of the real object. A clay
form of a woman is an artistic expression or representation of a human; a woman seen on
Card 3 of the Rorschach does not look exactly like a human, but some test-takers
recognize a portion of that blot as woman-like and consider it a good representation of
the true object.
Balcetis and Dunning (2006, 2007) reviewed research findings and presented their
own new data showing that how people perceive the world around them is influenced by
their internal state (i.e., their wishes and preferences). Related to this, a series of studies
showed how stimuli that are meaningful to a person draw the person’s attention
automatically (Koivisto & Revonsuo, 2007). This area of research stems from the New
Look approach to perception (e.g., Bruner, 1957) in which a person’s needs, motivation,
and expectations were first seen as influences on perception. This line of research can be
used in understanding the Rorschach, as Rorschach perception is not a purely “cold” or
cognitive process; rather, it is also “hot”, influenced by various factors including the
dynamics, needs, and conflicts of an individual (Exner, 1989, 2003; Meyer et al., 2011).
In other words, “hot” perception refers to processing visual stimuli under the influence of
motivational or affective state.
The importance of recognizing the distinction between types of information
obtained through different methods is not limited to the field of personality assessment or
even clinical psychology; as pointed out by McGrath (2008), social/personality
psychologists make similar distinctions using terms such as “implicit” and “explicit,” and
“mental process” versus “mental experience.” Whether the Rorschach is described as a
“projective test,” “performance-based method,” “implicit test,” etc., the most important
matter for a user of the Rorschach is having a well-grounded understanding of how the
test functions and the strengths and limitations of its use.
As will be reviewed, research has established that the Rorschach can be used to
accurately identify psychosis in test-takers by employing scores that demonstrate the
accuracy of the test-takers’ perceptions (e.g., Mihura et al., 2013). These scores and
related indices have been constructed, evaluated, and revised over time and across
Rorschach systems. Within the CS and R-PAS, Form Quality (FQ) is used to assess
accuracy of perception on the Rorschach. However, FQ has some important limitations
that will be detailed in Chapter 2. The R-PAS version of FQ (Meyer et al., 2011) was
developed in an attempt to rectify some of the problems associated with the CS version of
FQ, but early validity studies demonstrate additional room for improvement in the
detection of psychosis using the Rorschach. Additionally, there is currently no fully
dimensional Rorschach score (within the CS, R-PAS, or otherwise) that can thoroughly
and efficiently tap into both the conventionality of response objects being spontaneously
reported by people completing the task and the perceptual fit of those response objects to
the cards at the location where they are perceived. It is believed that such a score could be
an important factor in identifying distorted perceptual processes and impaired reality
testing of the test-taker, and thus improve validity coefficients in Rorschach identification
of psychosis.
Prior to beginning work on R-PAS as a formal system, Meyer and Viglione
(2008) conceptualized and developed the Form Accuracy (FA) scoring category, which
captures the accuracy of perceptual fit between a response object and the features of the
inkblot in the location where the object was perceived; FA is assigned to responses by
consulting FA tables, with possible object-level FA scores ranging from 1-5 (Meyer et
al., 2011 presents an overview). Meyer et al. (2011) followed the development of FA
with the initial development of an additional type of scoring category, Percept Frequency
(PF), which indicates how frequently a perceived object is given as a response to the
inkblot location used by the respondent. In the current study, the table of PF variables and
values developed by Meyer et al. (2011) was expanded by adding data from a sixth
country (the U.S.) to the existing specific object frequencies from five other countries
(Argentina, Brazil, Italy, Japan, and Spain) in order to create international summary PF
indices. The PF tables represent a cross-cultural index of how frequently response objects
are identified in specific areas of specific cards on the Rorschach. An archival database
that includes Rorschach protocols and diagnostic information was then used as a criterion
database to explore the structure and predictive capabilities of a selection of FA and PF
variables, with the ultimate goal being to better understand the potential functionality of
the Rorschach as a method for gathering information about the accuracy of people’s
perceptions. This research was also intended to aid in the broader push to identify ideal
methods for combining FA and PF to form final Perceptual Accuracy (PA) indices and
lookup tables. This is an important issue to explore so that standardized methods of
scoring and interpreting PA scores can be applied to future research and ideally, to future
7
clinical practice, with the ultimate goal being to more accurately identify psychosis in
test-takers.
Chapter Two
Review of the Literature
Perceptual Accuracy, Reality Testing, and Psychosis
The Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5; APA, 2013) is a compendium of psychological constructs that are organized into and represented as discrete psychological disorders. Many DSM-5 disorders fall within the spectrum of "Schizophrenia Spectrum and Other Psychotic Disorders," the common thread being the manifestation of psychosis. Although the classification system in the DSM-5 (as well as in previous versions of the DSM) usefully facilitates communication about mental health and illness, such classification systems also pose problems for research on psychological constructs. As described
by van Os and Tamminga (2007), in discussing the DSM-IV (APA, 1994):
Although these categories are meant to refer to broadly defined
psychopathological syndromes rather than biologically defined diseases
that exist in nature, inevitably they undergo a process of reification and
come to be perceived by many as natural disease entities…. they may also
confuse the field by imposing arbitrary boundaries in genetic and
treatment research and classifying patients into categories that upon closer
examination have little to offer in terms of diagnostic specificity. (p. 861).
Rorschach research and clinical use are complicated by the classification problem
described above, as the Rorschach is a performance-based task that provides the user with
behavioral samples that are coded and interpreted as representations of psychological
constructs. Such psychological constructs manifest as observed real-world behaviors that
are labeled as symptoms, which are then organized into diagnostic categories. In other
words, the Rorschach provides information at the level of psychological constructs (e.g.,
accuracy of perception), as opposed to symptoms (e.g., impaired reality testing) and
diagnostic clusters (e.g., disorders involving psychosis).
Various Rorschach researchers have made a push for having clear and direct
intuitive links between Rorschach variables and the constructs they are hypothesized to
represent (e.g., McGrath, 2008; Meyer et al., 2011; Schafer, 1954; Weiner, 2003). Even
though the validity of the Rorschach has been intensely debated throughout the years
since its development, and some Rorschach scores are not believed to be valid for
interpretation, even the toughest critics of the Rorschach attest to the validity of
“perceptual accuracy” scores (e.g., Dawes, 1999; Wood, Nezworski, & Garb, 2003;
Wood, Garb, Nezworski, Lilienfeld, & Duke, 2015); these scores also serve as an
example of variables with a clear relationship to the construct they are intended to assess
(McGrath, 2008).
Rorschach Form Quality (FQ)
History of the development of FQ. Hermann Rorschach devised FQ as a way to
describe whether the response object was appropriate for the contours of the inkblot used
in the response (Exner, 2003). Rorschach, as well as many followers after his death,
believed that the manner in which form was used in constructing a response delivered
information about the person’s perceptual accuracy or “reality testing” ability (Exner,
2003). According to guidelines for scoring FQ using the CS (Exner, 2003) and R-PAS
(Meyer et al., 2011), an FQ score is assigned to every response that makes use of form. In
both systems, FQ scoring is guided by the use of published tables: Response objects are
organized by card and by location within a card, and each listed object has a
corresponding FQ score (e.g., Exner, 2003, Table A; Meyer et al., 2011, Chapter 6).
Prior to the development of R-PAS and the CS, the various Rorschach systematizers agreed on the importance of FQ as a Rorschach score, but there was disparity in how each felt FQ should be coded. Beck, Beck, Levitt, and Molish (1961) and
Hertz (1970) created two categories: “Good form” responses were indicated by “+” and
“poor form” responses were indicated by “–”, with the assignment of the + or – form
quality scores based on how frequently a response was given for a specific location. Beck
and Hertz published tables that, much like the current R-PAS and CS tables, indicated FQ
scores for lists of response objects at specified locations. However, the tables constructed
by Beck, Hertz, and Exner are not entirely the same; some location areas and FQ scores
for identical objects at identical locations differ between the tables.
In more recent years it has become apparent that many of Beck’s FQ + or – score
decisions were more subjective than originally thought (Kinder, Brubaker, Ingram, &
Reading, 1982). That is, they were probably based more on Beck's judgment than an
actual tally of how frequently a response was given to a specific location. Hertz’s table
appears to have been constructed more systematically and objectively; it includes every
unique response given by her large sample (n = 1,050) of children and adolescents.
Similar to Beck and Hertz, Klopfer also used + and – codes for FQ, though he did not publish frequency tables; he preferred that the scores be based on examiner judgment (Exner, 2003). Like Klopfer, Piotrowski (1957) and Rapaport, Gill, and Schafer
(1946) did not develop frequency tables, though they approved of the concept of using
frequencies of responses to determine the corresponding FQ scores.
When developing the CS, Exner (2003) considered interrater reliability of great
importance for each score included in the system, as only FQ scores with acceptable
interrater reliability were considered reasonable to use in validation studies. He also
wanted to ensure that FQ clearly demonstrated “reality testing operations” (p. 121).
However, Exner thought the two-category method of coding FQ as + or –, as used by Beck,
Hertz, Klopfer, and others, resulted in far more limited information than a more complex
system would; he saw meaningful variance in the quality of responses that received
identical scores.
When Mayman (1970) devised a six-category method of scoring, Exner hoped it
would prove more diagnostically useful than the existing two-category methods.
However, Exner’s pilot study of agreement between coders using the Mayman method
revealed discouraging results: Four trained coders independently coded 20 protocols for
Mayman’s FQ and agreement among the coders ranged from 41-83%. Not wanting to
discard the entire method, Exner revised the method by dropping two of the categories
and not having subcategories for one other (Exner, 2003). He also settled on a four-category system of ordinary–elaborated (+), ordinary (o), unusual (u), and minus (–) after deciding that FQ scores should be based on the frequency of a response.¹ Exner's simplified system resulted in higher agreement (87-95%) among the same four raters than was observed using Mayman's six-category approach.

¹ Exner's four-category system initially termed ordinary–elaborated FQ as "superior (+)" and unusual FQ as "weak (w)." In current CS scoring, ordinary–elaborated and ordinary FQ are typically combined.
In developing the R-PAS FQ tables, Meyer et al. (2011) wanted to retain the
essence of FQ as a measure of accuracy of perception that can be used to identify
distorted perceptual processes of the test-taker. Included in their operational definition of
FQ is the idea that perceptual accuracy encompasses two elements: Fit between the
perceived object and the form features of the inkblot where it is seen, and the frequency
with which that object is spontaneously reported by respondents completing the task.
Thus, they incorporated both elements into their development of the R-PAS FQ reference
tables. Working from an initial set of 13,031 unique response objects that were compiled
from previous FQ tables and sources, Meyer et al. (2011) developed the R-PAS FQ tables
in stages of iterative refinement. The fit and frequency data were used to determine
preliminary FQ designations, with the response objects that made use of form falling into
the categories of ordinary (o), unusual (u), and minus (–).
Determinations about fit were based on FA ratings that had been collected during
the Rorschach FA Project (Meyer & Viglione, 2008). Each of the 13,031 FA response
objects had been rated by five to 15 judges, and 129,230 ratings were obtained in total.
The FA judges had been asked to rate the objects by answering the question “Can you see
the response quickly and easily at the designated location?” Their ratings were made on a
5-point Likert-type scale, with the following answer categories:
1) "No. I can't see it at all. Clearly, it's a distortion."
2) “Not really. I don't really see that. Overall, it does not match the blot area.”
3) “A little. If I work at it, I can sort of see that.”
4) “Yes. I can see that. It matches the blot pretty well.”
5) "Definitely. I think it looks exactly or almost exactly like that.”
The objects were rated an average of 9.9 times by a pool of 569 judges who were from
Brazil, China, Finland, Israel, Italy, Japan, Portugal, Romania, Taiwan, Turkey, and the
United States. Meyer et al. (2011) had followed the development of FA with the initial
development of the PF tables, which indicate how frequently a perceived object is given
as a response to the inkblot location used by the respondent. The frequency data had been
culled from five international datasets (Argentina, Brazil, Italy, Japan, and Spain).
As a next step in their process, Meyer et al. (2011) reduced the number of objects to be classified in the R-PAS FQ tables to 5,060; each of the 5,060 objects was accompanied by an FA score, PF data, and the FQ scores assigned to the object by other systematizers (primarily Exner, though codes assigned by Beck and Hertz were also included). The final R-PAS FQ code determinations were made after careful examination
of all three sources of data. The authors first applied an algorithm to the data using the three sources of information, then individually reviewed the FQ code determinations for objects with apparent discrepancies between data sources (e.g., low FA but an FQ score of ordinary from Exner), making adjustments to the final FQ code determinations as
necessary. When the finalized R-PAS FQ tables were compared to the CS tables, using
tables that had been slightly revised by Exner’s Rorschach Research Council, 39.9% of
the objects had different FQ code designations (kappa = .375).
Comprehensive System (CS) scoring of FQ. According to CS guidelines, an
FQ score is assigned to each Rorschach response that incorporates the use of form. For
example, a response such as “A bunch of smoke” does not use form — the smoke can
take any shape, and there is no shape description included in the language of the
response. Therefore, such a response would not be assigned an FQ score. However, a
response such as “A bunch of smoke — it looks like it is originating from this point down
here, and it billows out as it rises” introduces form into the language of the response, and
thus an FQ score would be assigned. Similarly, responses that contain objects with
inherent form properties (e.g., “a bear”; “two women”; “a mountaintop”) are assigned
FQ scores.
According to Exner (2003), “The FQ coding provides information about the ‘fit’
of the response, that is, does the area of the blot being used conform to the form
requirements of the blot object specified?” (p. 120). The CS method of coding FQ is
considered a way of scoring the Rorschach for accuracy of perception (Exner, 2003).
Although the articulated definition of CS FQ does not reference additional factors influencing FQ, the scores are not exclusively based on, or indicative of, the objective accuracy of the test-taker's perception; they are in part determined by the frequency of the percept for the specified location, by whether lines are imposed on the inkblot in forming the percept, and at times even by word choice. One could argue, however, that factors like the frequency of perceptions on the Rorschach do relate to perceptual accuracy when considered from an ecological position: Is a person's objective misperception of a stimulus a misperception if it is normative within their culture? The frequency of perceptions may also serve as a proxy for accuracy of fit on the Rorschach.
CS FQ scores are assigned using published tables for guidance. Within the FQ
tables, response objects are organized by card and location within a card, and each listed
object has a corresponding FQ score (e.g., Exner, 2003, Table A). Exner’s (2003) CS FQ
tables are based on data from a sample of 9,500 Rorschach protocols, consisting of
205,701 individual responses (Exner, 2003). From these responses, 5,018 items or item
classes were reported in the tables.
The “o”, or ordinary FQ item, is defined by Exner (2003) as:
The common response in which general form features are easily
articulated to identify an object… If the item, or class of items, is
designated in Table A as ordinary (o), and involves a W or D area 2, this
signifies that the object was reported in at least 2% (190 or more) of the
9,500 records, and involves blot contours that do exist and are reasonably
consistent with the form of the reported object. There are 865 items or
item classes designated as o for W or D locations. If the item listed as o
involves a Dd location, this signifies that the area was used by at least 50
people (0.52%), that the object was reported by no fewer than two-thirds
of those using the area, and involves blot contours that do exist. Table A
includes 146 items classified as o for the Dd locations. (pp. 122-123).
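The frequency criteria in this definition amount to a mechanical decision rule over the normative counts. A minimal sketch, assuming hypothetical function and parameter names (the CS publishes only the finished tables, not a scoring algorithm):

```python
# Sketch of Exner's (2003) frequency criteria for the ordinary (o)
# designation. All names here are hypothetical illustrations.

TOTAL_RECORDS = 9_500  # protocols underlying the CS FQ tables

def qualifies_as_ordinary(location_type: str,
                          object_count: int,
                          area_user_count: int,
                          contours_fit: bool) -> bool:
    """Return True if an object's frequency data meet the 'o' criteria.

    location_type   -- "W", "D", or "Dd"
    object_count    -- records in which the object was reported at this area
    area_user_count -- records in which this area was used at all (Dd only)
    contours_fit    -- judged consistency of the blot contours with the object
    """
    if not contours_fit:
        return False
    if location_type in ("W", "D"):
        # At least 2% of the 9,500 records, i.e., 190 or more.
        return object_count >= TOTAL_RECORDS * 2 // 100
    # Dd: area used by at least 50 people, and the object reported by no
    # fewer than two-thirds of those using the area (integer arithmetic
    # avoids floating-point edge cases at exactly two-thirds).
    return area_user_count >= 50 and 3 * object_count >= 2 * area_user_count

print(qualifies_as_ordinary("D", 190, 0, True))   # meets the 2% cutoff
print(qualifies_as_ordinary("Dd", 30, 60, True))  # only half of area users
```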
An example of an ordinary FQ item given to the Card 3 location depicted in Figure 1 is
“butterfly” (see Figure 2 to compare actual object).
² W indicates a whole response, in which the person uses the entire blot in their response. D indicates a common detail response, in which an area of the blot is used that is common (i.e., used by at least 5% of subjects in the development sample, n = 3,000). Dd indicates an unusual detail response, in which an area of the blot is used that is uncommon (i.e., used by less than 5% of subjects in the standardization sample). See Exner (2003, pp. 76-79) for a review.
Figure 1. Card 3 location D3.
Figure 2. Image of a butterfly.
Figure 3. Image of a dumbbell.
Figure 4. Image of a dragonfly.
The “u”, or unusual FQ item, is defined as “A low frequency response in which
the basic contours involved are appropriate for the response. These are uncommon
answers that are seen quickly and easily by the observer" (p. 122). The u responses in the tables were given by fewer than 2% of persons for W and D areas; for Dd areas they were given by fewer than 50 people but were judged by at least three raters, who unanimously deemed the response objects quick and easy to see and appropriate for the contours of the blot (Exner, 2003). An example of an unusual FQ item is "dumbbell" (see
Figure 3) given to the location depicted in Figure 1.
The “–”, or minus FQ item, is defined as:
The distorted, arbitrary, unrealistic use of form in creating a response. The
answer is imposed on the blot structure with total, or near total disregard
for the contours of the area used. Often substantial arbitrary lines or
contours will be created where none exist. (p. 122).
An example of a minus FQ item is “dragonfly” (see Figure 4) given to the location
depicted in Figure 1.
Although “o,” “u,” and “–” are the three primary FQ designations within the CS,
Exner also included a code for a subcategory of the ordinary response: The “+”, or
ordinary-elaborated category, is defined by Exner (2003) as:
The unusually detailed articulation of form in responses that otherwise
would be scored ordinary. It is done in a manner that tends to enrich the
quality of the response without sacrificing the appropriateness of the form
use. The + answer is not necessarily original or creative but, rather, it
stands out by the manner in which form details are used and specified. (p.
122).
Ordinary-elaborated responses differ from ordinary responses in that they include extra
elaboration of articulated features; they do not necessarily have better fit with the blot,
and they occur with less frequency than do ordinary responses.
When using the CS, FQ is scored for a response by looking up the response object
verbalized by the test taker using the published FQ tables (see Exner, 2003, for review).
If the object is listed in the tables under the appropriate card and location then the
corresponding FQ score is assigned to the response. If the object is not listed in the FQ
tables then the examiner must attempt to extrapolate from the tables by looking for
similar objects that might be listed (e.g., “cherry” if “apple” is not listed), or by looking at
object listings for a location that is quite similar to that of the response. If no comparable
objects are listed, and there are no acceptable object listings in similar locations, then the
FQ score determination for the response relies on the examiner’s judgment. In cases
when an object is not listed and extrapolation from the tables is not possible, the response
is scored as unusual if it meets the following criteria: (1) The response can be quickly and
easily identified, (2) it does not involve distortion of the blot contours, (3) no arbitrary
lines — imagined lines imposed on the blot — are used in the formation of the response,
and (4) the person does not close a broken figure in the formation of the response. If any
of these four criteria are not met then the response would be scored as minus.
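When an unlisted object cannot be extrapolated, the four criteria above act as a conjunction: the response is coded unusual only if all four hold, and minus otherwise. A minimal sketch with hypothetical names; each boolean stands in for a judgment the examiner makes, not a mechanical check:

```python
# Sketch of the CS fallback rule for response objects absent from the FQ
# tables when no extrapolation is possible.

def fallback_fq(quickly_seen: bool,
                no_contour_distortion: bool,
                no_arbitrary_lines: bool,
                no_closed_broken_figure: bool) -> str:
    """Return 'u' only if all four examiner judgments are met, else '-'."""
    if all((quickly_seen, no_contour_distortion,
            no_arbitrary_lines, no_closed_broken_figure)):
        return "u"
    return "-"

print(fallback_fq(True, True, True, True))   # all criteria met -> unusual
print(fallback_fq(True, True, False, True))  # arbitrary lines used -> minus
```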
When a response is composed of more than one object, which is a common
occurrence, it is oftentimes the case that the FQ scores associated with the multiple
response objects will differ according to the FQ tables. Consider the following example
response to Card 3: "It looks like 2 people and there is a big butterfly flying in between them." The response has three distinct objects: the two people and the butterfly. When
responses contain more than one object that is an important part of the response, the
object with the lowest FQ score determines the FQ score assigned to the response; there
is never more than one FQ score assigned to a single response. The lowest-FQ-score rule
only applies to objects that are deemed important to the overall response; if a response
object is not important to the overall response, the FQ of that object is not used in
determining the FQ score of the response. When working from the CS materials,
sometimes the distinction between important and secondary objects within a response is
not clear, though published guidelines and example protocols can be helpful in learning
how to make such distinctions.
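The lowest-FQ-score rule amounts to taking a minimum over the important objects under the ordering minus < unusual < ordinary. A minimal sketch with hypothetical names (ordinary-elaborated is ranked above ordinary here, consistent with its status as a subcategory of ordinary):

```python
# Sketch of the CS rule: a multi-object response receives the lowest FQ
# code among its *important* objects (minus < unusual < ordinary).

FQ_ORDER = {"-": 0, "u": 1, "o": 2, "+": 3}

def response_fq(important_object_fqs: list[str]) -> str:
    """Return the response-level FQ given the important objects' codes."""
    return min(important_object_fqs, key=FQ_ORDER.__getitem__)

# Two people (o) and a butterfly (o): the response is coded ordinary.
print(response_fq(["o", "o", "o"]))
# Had the butterfly been a minus-level object, the response would be minus.
print(response_fq(["o", "o", "-"]))
```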
Rorschach Performance Assessment System (R-PAS) scoring of FQ. R-PAS
FQ is a function of how accurate the response is (i.e., how well the object or objects
included in the response fit the inkblot location that was used in constructing the response
based on shape), and how common the response is (i.e., how frequently the object or
objects reported by the test-taker occur in that particular location). The FQ scores are
assigned using published tables for guidance, which are contained within Chapter 6 of the
R-PAS manual (Meyer et al., 2011). The 5,060 objects included in the R-PAS tables are
organized into sections based on card number and location within the card. Each card
begins on a new page and is accompanied by a location chart, which identifies the
location numbers for the standard location areas on the card. Within each of the location
sections in the FQ tables, the objects are first arranged by card orientation (i.e., which
position the card was held in when the response was delivered). Within orientation the
objects are alphabetized within clusters that are based on five categories according to the
type of object (i.e., objects that are human/human-like; objects that could be either
human/human-like or animal/animal-like; objects that are animal/animal-like; objects that
are anatomical/biological; and all other types of objects). Many of the object listings also
contain clarifying information and elaborations that are intended to help orient the coder
to the listed percept. For example, the FQ listings for Card 3 location D3 (see Figure 1)
include an entry for “Hearts (Anatomical; 2 in Dd29)”. The appropriate FQ code for each
object is listed next to the object entry in the tables.
As in the CS, R-PAS has three FQ codes that can be assigned to responses that
incorporate the use of form: ordinary (o), unusual (u), and minus (-). There is an
additional category that is used when responses do not contain any objects that use form:
none (n). The ordinary FQ code is described as “form fit that is both relatively frequent
and accurate,” and in general the responses are “…quickly and easily seen” (Meyer et al.,
2011). There are a total of 1,078 ordinary objects in the R-PAS FQ tables. The unusual
FQ code is described as “form fit that is of intermediate frequency or accuracy or both,”
and although the unusual response objects are generally encountered less often than the
ordinary response objects and typically have less accurate fit, “…they are not grossly
inconsistent with blot contours. At times FQu responses fit a particular location well, but
the fit is not readily obvious so the object is not commonly reported” (Meyer et al.,
2011). There are a total of 2,377 unusual object listings. The minus FQ code is described
as “form fit that is infrequent and inaccurate”; these responses are “…infrequent, if not
rare, and also inaccurate, distorted, or arbitrary. They are difficult to see or only grossly
approximate the actual contours and shape of the blot areas” (Meyer et al., 2011). There
are 1,605 objects listed that have minus designations.
Before consulting the R-PAS FQ tables the coder must determine whether the
response contains form. The final R-PAS FQ code of none (n) is described as being
applied when a “response does not contain an object with definite form or outline”
(Meyer et al., 2011). Responses that are scored with the none designation are typically
impressionistic responses based on shading and/or color features of the inkblot that do
not include any objects that make use of form. As in the CS, a response such as "a lot of
blood” does not use form. The object itself – the blood – does not have inherent form
demand (i.e., it can take any shape), and the respondent did not introduce form into the
response language. However, if the respondent had instead reported (or had added to their
original response of “a lot of blood”) that the blood was “dripping down,” “smeared
across this section,” “splattered across the card”, etc., the response object would then be
considered to have form demand and would receive an FQ score based on the FQ table
listings. One might notice that the FQ tables have listings for a variety of objects that do
not have inherent form demand, but to which form can be injected, as demonstrated by
the example. Therefore, the listed FQ code is only applied when form is injected into the
response; if form is not specified by the response language, the none code is applied,
regardless of the object potentially being listed in the tables with an ordinary, unusual, or
minus FQ code assigned to it.
If it is determined that form is present in the response, R-PAS FQ is scored by
looking up the response object(s) verbalized by the test taker using the published FQ
tables (see Meyer et al., 2011 for complete instructions). If the response only contains
one object and that object is listed in the tables under the appropriate card and location,
then the corresponding FQ score is assigned to the response. If the object is not listed in
the FQ tables then the examiner must attempt to extrapolate from the tables. The R-PAS
manual offers three basic extrapolation principles to help guide coders: (1) Using the FQ
tables to perform systematic extrapolation is preferable to using independent judgments;
(2) Shape and spatial orientation of the listed object must be consistent with the response
object when extrapolating from the tables; and (3) ideal extrapolation coding captures the
entire response as a collective percept, not the various individual elements of the
response.
When responses contain only one object and extrapolation is needed, Meyer et al.
(2011) provide a procedure to follow that can involve up to four steps. Essentially, the
coder should first search within the appropriate location for objects with a similar shape
to the response object, emphasizing key perceptual features that help delineate the object.
If an extrapolation is not obvious at this point then the coder should proceed to the next
step. If the response object does not fit a location that is listed in the tables, the coder should extrapolate by consulting object listings in similar location areas (e.g., a larger area that subsumes the location area used in the response). As the next step, looking up subcomponents of the object in the appropriate sub-locations can help inform the extrapolated coding (e.g., looking up the FQ codes for wings and antennae in the sub-locations of the butterfly percept, if butterfly is not listed in the appropriate location and/or a near location). Finally, the accumulated information should be reviewed, with more weight given to the earlier as opposed to later steps, before deciding on the final extrapolated FQ judgment.
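The four steps can be viewed as an ordered fallback over increasingly indirect table evidence. A minimal sketch under loose assumptions: the lookup structure, helper parameters, and table entries below are illustrative only, and in practice each step involves perceptual judgment rather than a mechanical lookup:

```python
# Sketch of the R-PAS single-object extrapolation sequence as an ordered
# fallback. fq_tables maps (card, location) -> {object_name: fq_code};
# the entries shown are illustrative, not actual table contents.

from typing import Optional

fq_tables = {
    ("III", "D3"): {"butterfly": "o", "bow tie": "o", "dragonfly": "-"},
    ("III", "D9"): {"person": "o"},
}

def extrapolate_fq(card: str, location: str, response_object: str,
                   similar_objects: list[str],
                   subsuming_locations: list[str],
                   subcomponents: list[tuple[str, str]]) -> Optional[str]:
    """Walk the extrapolation steps, returning the first code found.

    similar_objects     -- shape-similar objects to try at the same location
    subsuming_locations -- similar/larger locations to consult next
    subcomponents       -- (sub_location, component_object) pairs, e.g.
                           wings and antennae for an unlisted butterfly
    """
    table = fq_tables.get((card, location), {})
    # Step 1: the object itself, then shape-similar objects, same location.
    for obj in [response_object] + similar_objects:
        if obj in table:
            return table[obj]
    # Step 2: listings in similar or subsuming location areas.
    for loc in subsuming_locations:
        code = fq_tables.get((card, loc), {}).get(response_object)
        if code is not None:
            return code
    # Step 3: subcomponents in their sub-locations (weakest evidence).
    for loc, part in subcomponents:
        code = fq_tables.get((card, loc), {}).get(part)
        if code is not None:
            return code
    # Step 4: no tabled evidence; the coder must judge directly.
    return None

print(extrapolate_fq("III", "D3", "moth", ["butterfly"], [], []))
```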
When a response is composed of more than one object, the FQ coding procedures
are slightly different and more complex. First, the coder must differentiate important
from unimportant objects in the response. Meyer et al. (2011) describe important objects
as “…the central or focal response objects of a multiple object response. Most often they
are mentioned first, and they are typically asserted with more commitment and
spontaneity than unimportant objects. It is rare for there to be more than three important
objects in a response.” The manual includes elaboration of this concept, as well as several
examples to help coders differentiate important and unimportant response objects. This
concept is crucial for coders to understand because it can have strong impact on the FQ
designations that are assigned to responses during the coding process.
As a first step in coding multiple-object responses, the coder should consult the
FQ tables to check if the percept is listed in its entirety in the overarching location area.
Typically though, the coder must employ additional coding steps. As a next step, the
important response objects should be looked up in the tables in the appropriate location
areas, and the lowest FQ code (minus < unusual < ordinary) from across the important
objects should be assigned to the response. If extrapolation is required because some or
all of the important objects are not listed, the general extrapolation process is the same as
for single-object responses. Once the listed and/or extrapolated scores have been
determined for each individual important object, the lowest object-level FQ code is
applied to the response. Beginning the coding of each multi-object response with the
seemingly poorest-fitting important object can save coding time; if that object is
determined to have a minus FQ code, the response will also be assigned a minus code due
to the coding rule that the lowest FQ code from across the important objects is the one
that is applied to the response.
Review of FQ validity.
FQ validity – differentiation of clinical groups. Over the years FQ scores have
been shown to significantly differ across groups, with non-psychotic clinical control
groups (e.g., Berkowitz & Levine, 1953; Knopf, 1956), non-clinical control groups (e.g.,
Friedman, 1953; Rickers-Ovsiankina, 1938; Sherman, 1952), and a mixed group (Beck,
1938) responding with better FQ than groups of people with various forms of
schizophrenia or psychotic disturbance; researchers have attributed this ability to
differentiate between groups to varying capacities for reality-testing. Since the
publication of Exner’s CS in 1974, researchers have continued to demonstrate the
effectiveness of FQ in identifying psychosis (e.g., Mihura et al., 2013).
In a study of perceptual and thought disturbance, data were collected from
individuals falling within categories along a schizophrenia spectrum: (1) Those with no
Axis I or II diagnosis following SCID-I and SCID-II assessment, and no first- or second-degree relative with a schizophrenia diagnosis (noDx); (2) first-degree relatives of
patients with diagnosed schizophrenia (relatives); (3) undergraduate students who scored
at least two standard deviations above the mean on the Perceptual Aberration Scale, Magical Ideation Scale, or Physical Anhedonia Scale (PerMag/PhysAn); (4) individuals
diagnosed with schizotypal personality disorder following assessment with the SCID-I
and the Structured Interview for DSM-III-R Personality Disorders (SPD); (5) outpatients
diagnosed with schizophrenia following SCID-I assessment (outpatients); and (6)
inpatients diagnosed with schizophrenia following SCID-I assessment (inpatients) (Perry,
Minassian, Cadenhead, Sprock, & Braff, 2003). Using CS procedures and scoring, the
groups differed on the protocol-level average proportion of minus responses (X-%, or
distorted form; percentage of responses with an FQ coding of – ), the means generally
falling in the pattern expected: the noDx (M = .25), relatives (M = .23), and SPD (M =
.28) groups had the lowest proportion of minus responses, while the PerMag/PhysAn (M
= .36), outpatients (M = .33), and inpatients (M = .37) had notably higher X-%.
In a meta-analytic review of 48 adult samples using the CS that were published in
the Journal of Personality Assessment from 1974-1985, X+% could differentiate clinical
and control samples with a large effect size (d = 1.05) (Meyer, 2001). X+%, or
conventional form use, refers to the percentage of responses that are common and have
appropriate form use (i.e. FQ coding of + or o) (Exner, 2003). In the sample of 9,500
protocols used to assemble the 2003 CS FQ tables, the average proportion of ordinary
(and ordinary-elaborated) responses (X+%) is .74 for nonpatients, .64 for outpatients, and
.52 for inpatients; the average proportion of unusual responses (Xu%, or unusual form
use; percentage of responses with an FQ coding of u) is .15 for nonpatients, .17 for
outpatients, and .20 for inpatients; and based on the values reported for X+% and Xu%,
the average proportion of minus responses (X-%, or distorted form; percentage of
responses with an FQ coding of – ) is approximately .11 for nonpatients, .19 for
outpatients, and .28 for inpatients (Exner, 2003). XA% was not used by Meyer
(2001), but is included in studies discussed later in this review; XA%, or Extended Form
Appropriate, refers to percentage of responses that have appropriate form use (i.e. FQ
coding of +, o, or u).
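The arithmetic relating these indices can be made concrete with a short sketch; the FQ codes below are invented for illustration and are not drawn from any CS dataset:

```python
# Illustrative sketch: protocol-level FQ proportions computed from a
# hypothetical list of response-level FQ codes (one code per response).

def fq_proportions(codes):
    """Return X+%, Xu%, X-%, and XA% for FQ codes '+', 'o', 'u', '-'."""
    n = len(codes)
    x_plus = sum(c in ('+', 'o') for c in codes) / n   # conventional form use
    x_u = codes.count('u') / n                         # unusual form use
    x_minus = codes.count('-') / n                     # distorted form
    x_a = x_plus + x_u                                 # extended form appropriate
    return x_plus, x_u, x_minus, x_a

# A hypothetical 20-response protocol: 14 ordinary, 3 unusual, 3 minus.
codes = ['o'] * 14 + ['u'] * 3 + ['-'] * 3
xp, xu, xm, xa = fq_proportions(codes)
# X+% = .70, Xu% = .15, X-% = .15; note that X-% = 1 - X+% - Xu%,
# which is how the approximate X-% values above can be recovered from
# the reported X+% and Xu%.
```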
It is held that X+% imparts information about how conventional the person is in
their responding, with exceptionally high scores indicating an unusual level of
commitment to conventionality or preoccupation with social acceptability, and low scores
indicating unconventional ways of understanding the inkblots (Exner, 1991). High X-%
scores are believed to indicate inaccurate and distorted perception of the blots. Thus, the
description of FQ as a measure of “perceptual accuracy” does not refer to a simple
measure of good eyesight, but rather to a measure that can detect a person’s ability to
recognize objects in a way that is both accurate and in line with social convention (Exner,
2003). Wagner (1998) used the term congruent to describe cases in which the response
percept, representing a real-world object, has good fit with the form present in the
inkblot.
Additional studies not included in Meyer’s (2001) meta-analysis are consistent
with his summary of FQ as scored using the CS. In a sample comprised of patients from
an inpatient psychiatric unit of a Veterans Affairs hospital, as well as from a private
practice, individuals were categorized as either psychotic or nonpsychotic based on DSM-III diagnoses made by staff psychiatrists and clinical psychologists (Peterson &
Horowitz, 1990). The groups differed on CS X+% (t = 6.33, p < .05) in the expected
direction, though descriptive statistics were not reported. Mason, Cohen, and Exner
(1985) observed significant group differences in CS X+% in a comparison of
schizophrenic inpatients (M = 0.52; SD = 0.17), inpatients with depressive disorders (M =
0.68; SD = 0.12), and individuals with no psychiatric history (M = 0.83; SD = 0.06), the
contrasts producing large effect sizes (|d| = 1.06 to 2.53). Interestingly, the authors noted
that some patients with schizophrenia had learned to conceal their symptoms, producing
barren Rorschach protocols. Some of these protocols lacked indicators of thought
disorder, but still tended to contain poor FQ scores indicating inaccurate perception.
Diagnostic category membership had been determined using the Research Diagnostic
Criteria, which lists sets of criteria for functional disorders (Spitzer, Endicott, & Robins,
1978).
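Effect sizes of the kind reported above can be approximated from published means and standard deviations. The sketch below uses a simple equal-n pooled standard deviation; because the exact pooling depends on the group sample sizes, the result will not reproduce the published figures exactly:

```python
from math import sqrt

def cohens_d(m1, sd1, m2, sd2):
    """Cohen's d using a simple (equal-n) pooled standard deviation."""
    pooled_sd = sqrt((sd1 ** 2 + sd2 ** 2) / 2)
    return (m1 - m2) / pooled_sd

# X+% means/SDs from Mason, Cohen, and Exner (1985), as cited above:
# nonpsychiatric group vs. schizophrenic inpatients.
d = cohens_d(0.83, 0.06, 0.52, 0.17)
# d is approximately 2.4 under equal-n pooling, close to the reported 2.53.
```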
In another study examining clinical samples, Kimhy et al. (2007) compared CS
Rorschach protocols of individuals considered to be at high risk for psychosis (assessed
using the Structured Interview for Prodromal Symptoms and the Scale of Prodromal
Symptoms), patients with recent-onset schizophrenia (assessed using the Diagnostic
Interview for Genetic Studies, having onset within past 2 years), and patients with
chronic schizophrenia (having onset 3+ years ago). The groups did not differ significantly
on X+%, Xu%, or X-%, though all three groups did show elevated levels of perceptual
distortion that are consistent with the diagnostic categories. X-% for the groups was
higher than would be expected for nonpatients (high risk M = .36; recent onset M = .31;
chronic M = .39), compared with a nonpatient M of .11 in Exner's (2007) sample of 450 adults.
Effect sizes between the nonpatient sample and Kimhy et al.’s samples were large (|d| =
2.76 to 3.77). Similarly, X+% was lower than for nonpatients: High risk M = .45; recent
onset M = .50; chronic M = .45; nonpatient M = .68. Large effect sizes were again
observed for contrasts comparing the nonpatient sample to Kimhy et al.’s samples (|d| =
1.64 to 2.10). Kimhy et al. interpreted the results as an indication that deficits in visual
processing might be evidenced prior to a person meeting full criteria for a psychotic
disorder, and that the FQ indices might have detected an endophenotype of risk for the
development of psychosis. In other words, FQ indices may capture a genetically-driven
observable trait that could be linked to psychosis proneness.
Mihura et al. (2013) completed a systematic and extensive review of the validity
of the 65 main CS variables, which included some of the validity studies mentioned
throughout this literature review. After thoroughly examining the peer-reviewed
published literature and coding and tabulating the information according to clearly
outlined procedures, the authors ended up with a total of 1,156 Rorschach validity
coefficients that targeted the CS scores’ core constructs. They screened the meta-analytic
data for publication and selection bias before presenting validity results, with no
concerning findings. In one set of analyses, Mihura et al. (2013) explored whether CS
variables could differentiate target psychiatric samples from nonpatient samples, but also
whether they could differentiate the target psychiatric samples from other diagnostic
samples. They found that both X+% and X-% could differentiate psychotic disorder
samples from nonpatient samples (r = .57, p = .01; r = .61, p < .01), as well as from other
diagnostic samples (r = .31, p < .01; r = .47, p < .01). However, in follow-up moderator
analyses, neither X+% nor X-% could differentiate the nonpatient samples from the
comparison psychiatric samples (p = .25; p = .06).
FQ validity – criterion validity. In evaluating the performance of FQ indices in
relation to other tasks using clinical samples, an important consideration is task difficulty
with regards to cognitive load. Minassian, Granholm, Verney, and Perry (2004) reviewed
literature indicating that pupil dilation can be used as a measure of attention allocation or
cognitive effort. They state that several researchers have observed deficits in pupillary
response by individuals with schizophrenia, but the deficit in pupillary response is only
observed when the schizophrenic individual is engaging in tasks requiring high levels of
cognitive effort; the deficits are not typically found in low-demand tasks, where they
have a normal pupillary response. Minassian et al. administered the Rorschach following
CS procedures and a 10-picture version of the Boston Naming Test, which is a task
requiring the person to name objects depicted by simple line drawings, to a sample of 24
patients with schizophrenia and 15 nonpatient participants (assessed using the SCID-IV).
Both groups showed less pupil dilation in response to the Boston Naming Test than the
Rorschach. Additionally, the groups did not differ in dilation for the Boston Naming Test,
but did differ in dilation during the Rorschach. Taken together, the results indicate that
the Rorschach required more cognitive effort from both groups than did the Boston
Naming Test, and the groups differed in cognitive load during the Rorschach but not
during the Boston Naming Test. In planned analyses, a negative moderate-sized
correlation was detected between level of pupillary dilation and X-% on Rorschach Cards
9 and 10 (r = – 0.37, p < .05; r = – 0.39, p < .05, respectively), though the trend in the
data was not significant across all 10 cards (r = – 0.31, p = .06). However, in accordance
with Cohen’s (1992) guidelines, the small sample size might have resulted in inadequate
power to detect a true effect on other cards. At least based on the correlations found in
Cards 9 and 10, results were described as consistent with existing hypotheses stating that
individuals with schizophrenia are not able to process complex and demanding stimuli at
an optimal level due to a limited fund of attentional resources, which is quickly taxed in
such a task. The authors further purport that attentional limitations combined with
cognitive overload can explain, at least partially, the fragmented thinking and thought
disturbance seen in people with schizophrenia.
In a criterion-validity study of FQ, X+% and X-% were assessed in conjunction
with scores on several criterion measures of perceptual accuracy (Neville, 1995). F+%, a
CS score that is no longer in use and is quite similar to X+%, was also examined; the
difference between the two scores is that F+% is based only on pure-form responses (i.e., responses that do not make use of any determinants other than form; other
determinants can include the use of shading, color, the perception of depth, or the
description of movement in the response). The normal group of participants was an
undergraduate and community sample (n = 42), and the clinical group consisted of people
in treatment for schizophrenia at a community support program of a mental health center
(n = 20). Each participant completed a Rorschach (CS procedures), the Gestalt
Completion Test, the Hidden Figures Test, and a signal detection task developed by
Neville. The Hidden Figures Test assesses a person’s ability to locate relatively simple
geometrical figures within a complex geometric design. The Gestalt Completion Test
assesses the ability to identify objects from incomplete figural information. The signal
detection task consisted of 75 figures that were each presented for two seconds. After the
presentation of an item, the participant was asked to identify which figure out of four
options, if any, was the figure previously presented. For this task, scores were the signal
detection sensitivity index (d'), a standardized measure of the separation between hit and false-alarm rates.
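In standard signal detection terms, d' is computed by standardizing the hit and false-alarm rates. The sketch below assumes the conventional formula; Neville's exact scoring procedure is not detailed here, and the rates shown are invented for illustration:

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Conventional signal detection sensitivity: d' = z(H) - z(FA)."""
    z = NormalDist().inv_cdf  # inverse standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# A participant who correctly identifies 80% of previously presented
# figures but false-alarms on 20% of foils (hypothetical rates):
sensitivity = d_prime(0.80, 0.20)
# sensitivity is approximately 1.68
```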
The only significant correlation within the normal group between the FQ
variables and criterion scores was X-% with d’ from the signal detection task (r = .31, p
<.05; Neville, 1995). For the clinical group, there were no significant correlations
between the Rorschach FQ variables and the criterion variables. Although these results
are not encouraging, there are two potentially important considerations. First, the normal
and clinical groups were assessed separately, whereas a combined sample would have led
to more score variability and increased statistical power. Second, aggregated criterion
scores were not used, which could have led to more stable and accurate scores (see
discussion of aggregation below); however, the tradeoff with aggregation is a loss of
specificity in interpretation of the findings. Given these considerations, the results may be
misleading and deserve to be followed up with more research.
In a criterion-validity Rorschach study using a child and adolescent sample
(Smith, Bistis, Zahka, & Blais, 2007), CS indices of FQ were compared to performance
on another task considered to represent perceptual accuracy, the Rey-Osterreith Complex
Figure (ROCF). The authors anticipated that good FQ on the Rorschach (high X+% and
WDA%; low X-%) would align with better performance copying the ROCF accurately.
The hypothesis was supported by the correlations observed between the ROCF and
WDA% (r = .56) as well as X-% (r = – 0.45); the correlation was not significant with
X+% (r = .26), though a small sample size (n = 27) could once again be a factor in the
non-significance (Cohen, 1992).
In the reviews of CS literature completed by Mihura et al. (2013), results of the
reviews were also categorized according to the type of method for the validity criterion
measure, with the categories being “introspectively assessed” or “externally assessed.”
Introspectively assessed criteria included only self-report questionnaires and fully
structured interviews with results that do not permit clinician judgment to alter the
results. Externally assessed criteria included DSM diagnosis, observer/chart ratings, and
various performance-based measures. Not entirely surprisingly, when the authors examined
the averages of the effect sizes included in the study across all CS variables, they found
stronger results when the criteria were externally assessed (Zr = .28, r = .27; 770 total
findings) than when the criteria were introspectively assessed (Zr = .08, r = .08; 386 total
findings). When the criteria were limited to those that were externally assessed, across
included studies there was excellent validity support for X+% (r = .48, p < .01; 29 total
findings), X-% (r = .49, p < .01; 34 total findings), and WDA% (r = .46, p < .01; 7 total
findings), and moderate support for Xu% (r = .32, p = .04; 7 total findings). WDA%
refers to the percentage of responses that have appropriate form use (i.e. FQ coding of +,
o, or u), and is calculated from only those responses that are given to common (W or D)
locations. Results were quite different when effects using introspectively assessed criteria
were aggregated: The FQ indices had either a zero correlation (X+% r = .00, p < .01; 6
total findings), no significant correlation (X-% r = .03, p = .68; 4 total findings), or no
findings in the literature that qualified for inclusion in the meta-analyses (WDA% and
Xu%).
FQ validity – SCZI, PTI, TP-Comp, & EII. Rorschach found that individuals with
psychotic disorders tended to have poor form quality, and as discussed above, this
finding has been replicated by numerous researchers throughout the years. In more recent
literature, studies have included indices partially comprised of FQ scores. Development
of the Schizophrenia Index (SCZI) began in the 1970s and the index was finalized as the
SCZI in 1984 (Exner, 1984; Exner, 1986). In 1991 the SCZI was modified to reduce the
occurrence of false-positives observed with some clinical groups (Exner, 1991). The
modified SCZI is comprised of 6 criteria used to determine a score that can range from
zero to six. X+%, X-%, and raw FQ sums are among the criteria (Exner, 2003). There is
an abundance of support for the ability of the SCZI to detect group differences between
psychotic and non-psychotic individuals (see Jorgensen, Andersen, & Dam, 2000; Exner,
2003; Hilsenroth, Fowler, & Padawar, 1998). However, continued problems with false
positives and a potentially misleading name for the index led to the development of an
alternate index, the Perceptual Thinking Index (PTI; Exner, 2000). The PTI has five
criteria that result in a score of zero to five (Exner, 2003). Among the criteria are XA%
and WDA%, as well as X-% and several indices of cognitive slippage. XA%, or
Extended Form Appropriate, refers to percentage of responses that have appropriate form
use (i.e. FQ coding of +, o, or u); WDA% is very similar to XA%, the difference being
that it is calculated from only those responses that are given to common (W or D)
locations (Exner, 2003). W (whole) location scores are assigned when the response is
given using the entire contents of the inkblot. D (common detail area) location scores are
given to responses that use a part of the inkblot, these locations each appearing in at least
5% of the protocols evaluated in establishing location codes (Exner, 2003).
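The distinction between XA% and WDA% amounts to a location filter, which can be sketched as follows; the response records below are hypothetical:

```python
# Illustrative sketch: XA% vs. WDA% from (location, FQ code) pairs.
# 'W' and 'D' are common locations; 'Dd' stands in for an unusual
# detail location. Records are invented, not CS data.

def xa_wda(responses):
    """XA% over all responses; WDA% over only W and D locations."""
    appropriate = ('+', 'o', 'u')
    xa = sum(fq in appropriate for _, fq in responses) / len(responses)
    wd = [(loc, fq) for loc, fq in responses if loc in ('W', 'D')]
    wda = sum(fq in appropriate for _, fq in wd) / len(wd)
    return xa, wda

responses = [('W', 'o'), ('D', 'o'), ('D', 'u'), ('Dd', '-'), ('W', '-')]
xa, wda = xa_wda(responses)
# XA% = 3/5 = .60; WDA% = 3/4 = .75 (the Dd minus response is excluded
# from the WDA% denominator)
```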
Viglione (1996) completed a criterion-validity study of the Rorschach using a
sample of inpatient, outpatient, and nonpatient children and adolescents. Participants
completed the Rorschach as well as a true-false interview designed to assess atypical
beliefs in children, the Child Unusual Belief Scale. X-% had a moderate correlation
with the criterion measure (Spearman rho = .45), as did the SCZI (Spearman rho = .36).
Archer and Gordon (1988) also used an adolescent sample to explore the diagnostic
validity of the Rorschach and MMPI in adolescent populations experiencing psychotic or
depressive symptoms, with DSM-III discharge diagnoses based on treatment team clinical
judgment (primarily determined by clinical history and the team’s behavioral
observations). Teens with schizophrenia had fewer accurate perceptions and more
distorted percepts present in their Rorschach protocols (X+% = .46; X-% = .34) than did
those with major depression (X+% = .52; X-% = .23), dysthymic disorder (X+% = .58;
X-% = .20), personality disorder (X+% = .50; X-% = .27), or conduct disorder (X+% =
.61; X-% = .21). The same trend was seen with SCZI scores.
The PTI has been shown to be effective in discriminating patients with psychotic disorders
from non-patients, as well as from patients who were diagnosed with a Cluster A, Cluster
C, or Borderline Personality Disorder (CA/BPD) (Hilsenroth, Eudell-Simmons, DeFife,
& Charnas, 2007). The psychotic-disordered group consisted of adult inpatients with
DSM-IV psychotic disorder intake diagnosis, which was based on treatment team
consensus following a review of all available data. Non-patient protocols were selected
from Exner’s (2003) sample of 450. The CA/BPD group were represented by archival
files from a university psychological clinic. Files with a personality disorder diagnosis
listed were masked and reviewed for presence/absence of a personality disorder by 4
doctoral students. Cases identified as having a personality disorder were reviewed again
and rated for all Cluster B symptom criteria using the DSM-IV. When the PTI was broken
down into its component scores, the mean FQ scores followed a pattern that would be
expected for these groups. The non-patient group had the healthiest scores (XA% = .85,
WDA% = .87, X-% = .14, X+% = .61), followed by the CA/BPD group (XA% = .69,
WDA% = .71, X-% = .29, X+% = .42), and the psychotic group had the most impaired
performance (XA% = .60, WDA% = .62, X-% = .36, X+% = .45). The mean X+% in the
CA/BPD group appears to be somewhat lower than in the psychotic group, contrary to
what we would expect, but the difference is not statistically significant. At a dimensional
level, lower XA%, WDA%, and X+% scores, and higher X-% scores were related to
greater diagnostic severity (i.e. higher levels of relative thought and perceptual
impairment), producing large effect sizes (|r| = .47 to .64).
Dao and Prevatt (2006) similarly found that the PTI was effective in
discriminating inpatient individuals with schizophrenia-spectrum disorders (SSD) from
inpatients with mood disorder without psychotic features (MD). Both groups had been
administered the SCID-CV and the SCID-II at intake, and the primary diagnosis was
assigned following consensus by the clinical social worker and psychiatrist after review
of their independent interviews with the patients and the SCIDs. As would be expected,
the SSD group had higher PTI scores (M = 2.9) than the MD group (M = 0.89). Perhaps
more interesting in the context of the current study, all FQ indices also differed by group
status. The SSD group (XA% = .55, WDA% = .57, X-% = .42) displayed poorer FQ than
did the MD group (XA% = .72, WDA% = .75, X-% = .26). The effect sizes for the FQ
indices and PTI were all of large magnitude (d = 1.07 to 1.62).
There is also an extensive literature (e.g., Archer & Gordon, 1988; Archer &
Krishnamurthy, 1997; Bannatyne, Gacono, & Greene, 1999; Blais, Hilsenroth,
Castlebury, Fowler, & Baity, 2001; Ganellen, 1996; Garb, 1984; Meyer, 2000; Ritsher,
2004) aimed at comparing the validity and clinical utility of the Rorschach and MMPI
(and MMPI-2), most of which is beyond the scope of this paper. However, Dao, Prevatt,
and Horne (2008) published a concise summary of important references and also
examined the clinical utility and possible incremental validity of the Rorschach and
MMPI-2 with regard to detection of psychosis. Group comparisons were reported for
inpatients with either primary psychotic disorder (PPD) or primary mood disorder
without psychotic features (PMD). The sample was comprised of 236 patients, and
analyses were completed using the primary admission diagnoses. In their comparison of
Rorschach and MMPI-2 protocols, the groups differed on mean PTI (PPD = 2.95, PMD =
1.13; d = 1.22) as well as on all three PTI criteria that involve FQ indices (d = 0.92 to
1.30). Additionally, the authors concluded that the PTI was better at psychosis group
discrimination than was the MMPI-2.
In the CS meta-analyses completed by Mihura et al. (2013), the PTI had excellent
validity support when externally assessed criteria (e.g., DSM diagnosis, observer/chart
ratings, and various performance-based measures) were used that related to disturbed
thinking and distorted perceptions (r = .39, p < .01; 30 total findings). When
introspectively assessed criteria were used, the PTI had a statistically significant but low
level of validity support (r = .10, p < .01; 23 total findings). Additionally, the PTI
differentiated target psychotic disorder samples from nonpatient samples (r = .72, p <
.01) as well as from other psychiatric samples (r = .47, p < .01). In follow-up moderator
analyses, the PTI was also able to differentiate the nonpatient samples from the
comparison psychiatric samples (p < .01).
Revisions were made to the PTI prior to publication of R-PAS to make the index
continuous, and with the hopes of improving its inter-rater reliability and validity
(Viglione, Giromini, Gustafson, & Meyer, 2014). In the new Thought and Perception
Composite (TP-Comp), the dichotomous PTI cut scores were substituted with a
regression-based model that produced continuous scores, as opposed to integer scores. To
calculate the regression model used in constructing TP-Comp, the authors used the
original PTI as the predicted variable, and loaded the individual variables that were used
to calculate the PTI into the regression model as predictors. The beta-weights of the
resulting regression equation were used to construct TP-Comp. PTI and TP-Comp were
highly correlated in the independent validation sample (r = .87), and TP-Comp was
shown to have higher inter-rater reliability and validity (Viglione et al., 2014).
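The regression-based weighting strategy described above can be sketched with simulated data. The component variables, sample size, and weights below are invented stand-ins, not the actual TP-Comp predictors:

```python
import numpy as np

# Sketch of the weighting strategy: regress an existing integer-valued
# composite (a stand-in for the PTI) on its component variables, then
# reuse the fitted coefficients as weights for a continuous index.

rng = np.random.default_rng(0)
components = rng.random((200, 3))          # three stand-in component scores
true_weights = np.array([0.5, 1.0, 2.0])   # hypothetical weights
old_index = np.round(components @ true_weights)  # integer scores, like PTI

X = np.column_stack([np.ones(200), components])  # add an intercept column
coefs, *_ = np.linalg.lstsq(X, old_index, rcond=None)

# Continuous index: apply the fitted weights without the final rounding,
# analogous to replacing dichotomous cut scores with regression weights.
continuous_index = X @ coefs
```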
The Ego Impairment Index (EII) is another index that is partially comprised of FQ
scores. The EII was developed (Perry & Viglione, 1991) as a Rorschach index of the
degree of general psychological impairment experienced by the test-taker, and has a
strong empirical foundation as a measure of psychopathology severity and thought
disturbance (Meyer et al., 2011). It is similar to the PTI, but also contains components
related to self and other representations, as well as indices of crude or disturbing thought
content and imagery. The original EII is comprised of 5 criteria, and the final EII scores
are determined by multiplying the scores for each of the criteria by weights that were
determined through factor analysis. Among the 5 criteria are the sum of FQ- scores and
M-, the number of responses containing human movement that were also
scored FQ-. The EII underwent slight revision and was renamed the EII-2 (Viglione,
Perry, & Meyer, 2003) when one of the component variables was revised (Viglione,
Perry, Jansak, Meyer, & Exner, 2003). However, despite the differences in calculation,
the EII and the EII-2 have extremely high correlation with each other (r = .99; Viglione,
Perry, & Meyer, 2003).
In a 2011 meta-analysis examining the EII or EII-2 and its relationship to general
psychological disturbance, 14 publications and a total of 13 samples met inclusion
criteria (Diener, Hilsenroth, Shaffer, & Sexton, 2011). As had been predicted, higher EII scores
were associated with greater levels of psychiatric severity (r = .29; p < .01). In the
moderator analyses, it was determined that the type of criterion variable impacted the
effect sizes. After breaking down the analyses by type of criterion variable, it became
clear that effect sizes were larger when the criterion variable was based on researcher
ratings (r = .45, p < .01) than when it was based on clinician ratings (r = .19, p < .01),
informant ratings (r = .18, p < .01), self-report ratings (r = .10, p = .07), or level of
treatment or placement status (r = .11, p = .08); however, criterion variables consisting of
performance-based measures (r = .37, p = .01) also had larger effect sizes than self-report
ratings, or level of treatment or placement status.
A more recent study of the EII-2 was completed using a child sample in Tehran,
Iran (Mohammadi, Hosseininasab, Borjali, & Mazandarani, 2013). The patient sample
consisted of children who had been hospitalized with a diagnosis of childhood-onset
schizophrenia (n = 10) and were under outpatient care at the time of the CS Rorschach
administration, and a comparison sample consisted of “normal” children (n = 10). The
diagnosis of childhood schizophrenia for members of the patient sample was verified by a
child psychiatrist with administration of the Structured Clinical Interview for the DSM-IV-TR. The authors broke the EII-2 down into its components and examined each
component individually. The sum of FQ- scores differed between the two samples (d = 2.62, p < .01), with a higher rate of FQ- scores in the patient group (M = 1.17, SD = 0.45)
than in the normal group (M = 0.25, SD = 0.21). The authors also found statistically
significant differences between groups on 3 of the other 4 remaining subcomponents of
the EII-2.
The EII-2 was also used in a study of functional and social skills capacity in adult
patients with schizophrenia or schizoaffective disorder (Moore, Viglione, Rosenfarb,
Patterson, & Mausbach, 2013). Patients had psychiatrist-assigned chart diagnoses based
on the DSM-IV and were considered stable at the time of the assessment. One to two
weeks after completing a series of questionnaires, the Rorschach was administered using
an early version of the R-PAS manual; FQ was coded using CS FQ tables, as the R-PAS
FQ tables were not yet available. Correlations with the EII-2 were observed in the
expected direction for structured interview-based indices of positive symptoms (PANSS
Positive r = .32, p < .05) and total symptoms (PANSS Total r = .31, p < .05), but there
was not a significant correlation with negative symptoms (PANSS Negative r = .01, p >
.05). Contrary to expectations, there was no correlation between the EII-2 and
performance-based measures of everyday living skills (UPSA r = -.10, p = .40) or social
skills capacity (SSPA r = -.00, p = .97). Interestingly, healthier EII-2 scores were
associated with higher global cognitive ability (RBANS r = -.33, p < .05). When the
number of FQ- responses was broken out from the EII-2 as a standalone score, there were
no significant correlations with any of the criterion measures.
In 2011, the EII-2 was again revised, becoming the EII-3 (Viglione, Perry,
Giromini, & Meyer, 2011), and it is the EII-3 that is included in R-PAS. The EII-3 is
based on three revisions: A change in the distribution of R due to R-Optimized
administration, removal of food content from the coding process, and transformations to
variables to make them follow (as closely as possible) a normal distribution. The same
regression procedure that was used to construct the EII-2 was again used to calculate the
weights applied to variables in the EII-3. The correlations between the EII-3 and the
previous versions are strong (EII-2 r = .98; EII r = .95; Viglione et al., 2011).
After publication of the R-PAS manual (Meyer et al., 2011), two studies were
published that compared earlier Rorschach indices (the PTI and EII-2) to the R-PAS
versions of the indices (the TP-Comp and EII-3). The first study examined the predictive
validity of both the older and newer versions of the indices, and explored whether TP-Comp and the EII-3 had incremental validity over the PTI and EII-2 (Dzamonja-Ignjatovic, Smith, Jocic, & Milanovic, 2013). The authors also explored the degree of
overlap observed between the TP-Comp and the EII-3. The samples of adult inpatients
were drawn from the archives of a psychiatry institute in Serbia, and diagnoses were
made by a psychiatrist and further vetted in case conferences prior to and independent of
the Rorschach administrations. The final psychotic or schizophrenic sample (n = 100)
was comprised of patients who were receiving antipsychotic medications at the time of
the Rorschach administration; the nonpsychotic sample (n = 111) was comprised of
patients with diagnoses of various anxiety states, depression without psychotic features,
or mixed depression and anxiety, and they were receiving anxiolytics, antidepressants, or
a combination of both at the time of testing. The Rorschach was administered and scored
for FQ following CS guidelines. All four indices were able to effectively discriminate the
two samples with p < .01 (PTI d = 1.77; TP-Comp d = 2.16; EII-2 d = 1.58; EII-3 d =
1.92). Using hierarchical logistic regression, the authors found that the TP-Comp and EII-3 provided a small increment over the PTI and EII-2 in predicting group
membership; the PTI and EII-2 did not contribute any predictive power to the model that
contained the TP-Comp and EII-3 as the first step. Dzamonja-Ignjatovic et al. (2013) also
found that TP-Comp had a small amount of incremental validity over the EII-3, and
inversely the EII-3 also had some incremental validity over TP-Comp in the prediction of
psychotic disorder group membership. The reality testing component (based on FQ) of
the TP-Comp (d = 1.33, p < .01) and EII-3 (d = 0.92, p < .01) were also able to
differentiate the patient groups as standalone indices.
The second study comparing the PTI and EII-2 to the TP-Comp and EII-3 was
conducted to investigate the international adaptability of R-PAS, and explored the
validity of the indices in Taiwan (Su et al., 2015). The sample consisted of culturally
Taiwanese adults who were classified as nonpatients (n = 15), outpatients (n = 37),
patients from a day-treatment program (n = 11), or inpatients (n = 27). The Rorschach
was administered in Taiwanese Mandarin using a translated set of R-PAS administration
instructions, and FQ was scored with the CS tables for the earlier indices (PTI and EII-2,
and FQ indices X-% and WDA%), and with the R-PAS tables for the more recent
versions (TP-Comp and EII-3, and FQ-% and WD-%). Each of the R-PAS indices was
highly correlated with its CS counterpart. As hypothesized, all eight Rorschach indices
were also correlated with the criterion measures, and in the expected directions.
Correlations with the Magical Ideation Scale, a self-report scale used to identify
proneness to psychosis, had a range of |r| = .23 to .54. Correlations with the Positive and
Negative Syndrome Scale, a semistructured interview used to evaluate schizophrenia
symptoms, had a range of |r| = .37 to .54. Correlations with the single-item Clinical
Global Impressions-Severity, which is used by clinicians to assess overall mental health
by answering a question about the patient’s mental health compared to others in the
population, had a range of |r| = .39 to .60. Finally, correlations with Diagnostic Severity,
a 1-5 scale indicating severity of the DSM-IV diagnosis(es) of each patient, had a range
of |r| = .34 to .50. Using hierarchical regressions, Su et al. (2015) found that the R-PAS
indices also incremented the CS indices in predicting each of the criterion measures, but
the CS indices did not increment the R-PAS indices in predicting any of the criterion
measures.
FQ validity – malingering. Netter and Viglione (1994) examined FQ as part of a
larger Rorschach malingering study. Malingering is an important consideration in
interpretation of FQ indices or other indices on the Rorschach containing FQ
components. The CS, unlike many other measures used in clinical practice, does not
include formal indicators to assess for valid responding. Thus, it is helpful to have such
studies to provide evidence regarding how trustworthy and impermeable Rorschach
indices are to malingering efforts of test-takers. Netter and Viglione gave 20 participants an external incentive to motivate successful malingering of schizophrenia (they were told they would receive their movie tickets only if they could fool the examiner) and compared their protocols to those of a nonpatient control group and a group of patients with diagnosed
schizophrenia. The malingering and nonpatient groups were screened with a demographics questionnaire and scored below the established cutoff on the Gorham Proverbs Test, which assesses thinking disturbances characteristic of schizophrenia. The schizophrenia group was diagnosed by two psychiatrists using the
DSM-III based on clinical interview and histories, and diagnostic status was validated by
scores exceeding the established cutoff on the Gorham Proverbs Test. Consistent with the authors' hypotheses, the schizophrenic group had more distorted FQ than the control
group (X-% = .32 and .20, respectively), but the malingerers (X-% = .27) did not differ
from the schizophrenic group. However, the modified version of X-% proposed and
tested by Netter and Viglione did discriminate between malingerers and the
schizophrenic group (Modified X-% = .22 and .30, respectively). The specific scoring
criteria for the modified scores were not published in the article, but they were broadly based on testing behaviors considered unusual or uncommon for truly psychotic individuals.
Ganellen, Wasyliw, and Haywood (1996) also explored the possibility that
psychosis could be malingered on the Rorschach. A sample of 48 forensic patients
completed assessments for fitness to stand trial and/or sanity at the time of the crime, with most of the crimes carrying either a long prison sentence or the death penalty. As part
of the assessment, each patient completed both the Minnesota Multiphasic Personality
Inventory (MMPI; Hathaway & McKinley, 1967) and the Rorschach. The MMPI validity
scales were used to assign each patient to either the honest or malingered group. When
compared to patient samples and profiles of identified malingerers in the criminal justice
system (using extra-test information), the malingered group produced MMPI profiles
containing highly elevated levels of psychosis indicators and thus appeared to be exerting
effort to produce pathological-looking records. However, the honest and malingered
groups did not differ on any of the FQ indices included in the study — X+%, Xu%, and
X-% — nor did they differ on SCZI or a frequency-based indicator of highly conventional responding known as the Popular variable. In essence, the malingering
group did not produce Rorschach protocols that contained psychotic indicators more
frequently than did the honest group.
Limitations of FQ. Although there is a strong base of literature supporting the
use of FQ in assessing reality testing, there are some noteworthy limitations. CS FQ is
largely based on the frequency of response objects within specific locations. Though it is
not inherently problematic that CS FQ is in part determined by location and frequency of
the response object, this does represent a departure from the CS conceptual definition of
FQ as indicating accuracy in perception on the Rorschach. Also, in Exner's CS, the FQ
score assigned to a response is based in part on the response verbiage rather than the
shape of the object. For example, consider the response of “anchor” to Card 3, the D2
location (see Figure 5): According to the CS FQ tables, “anchor” (see Figure 6 to
compare shape of actual objects) is a minus response, while “fishhook” (see Figure 7) is
an unusual response. Although the shapes of the actual objects are quite similar,
“fishhook” is assigned a higher CS FQ score than “anchor” when given to that Rorschach
location, and subjectively speaking, it does appear to have better fit with the blot. On
Card 3, location D3: “Eye glasses” is a minus response, while “sun glasses” is scored
unusual — this might be partially due to the fact that the D3 location is red, and that
sunglasses have colored or darker lenses than glasses. For Card 5, location W: “Crow” is
an ordinary response, while “bird” is scored as either ordinary or unusual, depending on
the location of the beak (Exner, 2003, Table A). The provided examples demonstrate that
some response objects have quite similar shapes but nonetheless receive different CS FQ
scores due to frequency of word choice and conventionality; using less conventional
words and/or more specific language when describing objects can result in lower CS FQ
scores. Additionally, some response objects differ in FQ score due to non–form stimulus
features of the inkblot (e.g., color of the blot). Thus, it can be observed that FQ scoring
following CS guidelines results in variation in scores due to factors other than the form fit
of the response object to the location. Again, these statements are not inherently
problematic, but they do indicate a departure in CS FQ scoring from the CS definition of
FQ. Such conceptual deviations were noted in the development of R-PAS, and the
authors of the R-PAS FQ tables went to great lengths to explore and document the
interplay of fit and frequency data. In the end, they incorporated both into the R-PAS FQ
tables as well as into the FQ variable descriptions (Meyer et al., 2011). They also looked
into seeming inconsistencies, such as the examples above, and attempted to bring more
consistency to the FQ tables in the R-PAS manual. However, the R-PAS FQ tables
retained the CS system of trichotomizing FQ into ordinary, unusual, and minus
responses. A logical next step in the advancement of FQ coding is to dimensionalize FQ, much like other indices have been dimensionalized (e.g., the PTI).
Figure 5. Card 3 location D2.
Figure 6. Image of an anchor.
Figure 7. Images of fishhooks.
Rorschach Form Accuracy (FA)
The development of FA. Rorschach FA was developed as one component of
Perceptual Accuracy (PA; Meyer & Viglione, 2008), the ultimate goal of the PA system
being to improve coding of perceptual accuracy on the Rorschach by addressing
limitations of CS FQ. FA was designed to quantify the goodness of fit between the
features of the inkblots and the response object; it can be combined with response object
frequency to form the PA scoring system, a planned alternative approach to FQ in
quantifying accuracy of perception on the Rorschach.
Development of FA began in 2001 (e.g., Meyer, Patton, & Henley, 2003) with the
creation of a database of response objects identified by various Rorschach systematizers
(including Exner, Beck, Hertz, Thomas, Beizmann, Rorschach, Bohm, Loosli-Usteri,
Binder, Klopfer, and others). Subsequently, these response objects were rated by an
international sample of judges on how accurately the objects can be perceived in the
specified inkblot locations (Meyer & Viglione, 2008; Viglione, Meyer, Ptucha, Horn, &
Ozbey, 2008). Once ratings were completed, they were averaged across judges for each
response object, resulting in a final numeric score for each response object.
Following are some of the details of the Rorschach FA Project (Meyer &
Viglione, 2008). Each of the response objects was rated by five to 15 judges. In the end,
ratings were obtained from a total of 569 judges, including undergraduate and graduate
students, professionals in the field, and student-recruited community members. In
addition, a convenience sampling approach generated ratings from non-university and
non-professional adult participants in Turkey, Romania, Japan, and Taiwan. In all, 13,031 objects were rated and 129,230 ratings were obtained.
Judges were asked to rate the objects by answering the question “Can you see the
response quickly and easily at the designated location?” The ratings were made on a 5-point Likert-type scale, with the following answer categories:
1) "No. I can't see it at all. Clearly, it's a distortion."
2) “Not really. I don't really see that. Overall, it does not match the blot area.”
3) “A little. If I work at it, I can sort of see that.”
4) “Yes. I can see that. It matches the blot pretty well.”
5) "Definitely. I think it looks exactly or almost exactly like that.”
Rating forms were provided, and also included items for indicating the
participant's background experience with the Rorschach, gender, age, and country of
residence. Rating forms were made available in Portuguese (with slightly different forms
for use in Portugal and Brazil), Japanese, Finnish, Traditional Chinese, and English. In
general, judges were asked to rate about 250 objects each; the final 13,031 objects were
rated an average of 9.9 times by judges who were from Brazil, China, Finland, Israel,
Italy, Japan, Portugal, Romania, Taiwan, Turkey, and the United States. Because judges
varied considerably in their use of the rating scale, raw scores were ipsatized by
conversion to z-scores on a per-rater basis before calculating the median score for each
object. Subsequently, the ipsatized median scores were re-expressed on the original 1 to 5
distribution.
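The per-rater standardization just described can be sketched as follows. This is a minimal illustration: the function name and data layout are hypothetical, and the re-expression back onto the 1–5 metric via the grand mean and SD of all raw ratings (step 3) is an assumption, since the exact transformation is not detailed here.

```python
from collections import defaultdict
from statistics import mean, median, pstdev

def ipsatize_ratings(ratings):
    """ratings: list of (rater_id, object_id, score) tuples on the 1-5 scale.
    Returns {object_id: ipsatized median score, re-expressed on 1-5}."""
    # Step 1: convert each rater's raw scores to z-scores (ipsatization),
    # removing individual differences in how raters used the scale.
    by_rater = defaultdict(list)
    for rater, _, score in ratings:
        by_rater[rater].append(score)
    rater_stats = {r: (mean(s), pstdev(s) or 1.0) for r, s in by_rater.items()}

    z_by_object = defaultdict(list)
    for rater, obj, score in ratings:
        m, sd = rater_stats[rater]
        z_by_object[obj].append((score - m) / sd)

    # Step 2: take the median z-score for each object.
    medians = {obj: median(zs) for obj, zs in z_by_object.items()}

    # Step 3 (assumed): rescale by the grand mean/SD of all raw ratings
    # and clamp to the original 1-5 range.
    all_scores = [s for _, _, s in ratings]
    gm, gsd = mean(all_scores), pstdev(all_scores)
    return {obj: min(5.0, max(1.0, gm + z * gsd)) for obj, z in medians.items()}
```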
An example of a relatively well-fitting FA item given to the Card 3 location
depicted in Figure 8 is “bowtie” (see Figure 9 to compare shape of actual object); an FA
score of 4.5 would be assigned to this response. A moderately well-fitting response object
is “insect” (see Figure 10; FA = 2.8). An example of a poor-fitting object for the depicted
location is “werewolf” (see Figure 11; FA = 1.3), which illustrates a clear violation of
form.
Figure 8. Card 3 location D3.
Figure 9. Image of a bowtie.
Figure 10. Image of an insect.
Figure 11. Image of a werewolf.
Scoring of FA. During the development of FA, various scoring procedures were
considered and piloted. In the end, final FA scoring guidelines were developed as
detailed here and in Horn (2009). Each response within a protocol is assigned an FA
score. Similar to scoring FQ using the CS or R-PAS tables, FA tables were developed
that allow for objective assignment of FA scores for most responses. The FA tables are
organized by card and by location within each card. The standard object locations are
included in the tables, as well as some locations that are not standard CS or R-PAS
locations but that occur with some frequency. Within a location section specific to one of
the 10 cards, all rated objects from that card location are listed along with the
corresponding FA score. As in the coding of FQ, when a response contains a single
object that is listed in the table, the corresponding FA score is assigned to the response.
When more than one object comprises the response, all objects important to the
response are considered in assigning the FA score. If the gestalt of the response is listed
in the table (e.g., “2 people” to the whole location of Card 7), the corresponding FA score
is applied to the response. However, it is sometimes the case that there is more than one
important object to the response and the gestalt is not listed. In such instances each
important object is looked up in the FA tables separately and the lowest FA score is
assigned to the response.
The decision to assign the lowest FA score from among the important response
objects is a decision that was debated throughout the development of FA. Another
consideration was to use the mean FA of the important objects within the response.
However, one important asset of the Rorschach is its ability to identify pathology;
pathology of perception is the construct of interest in the development of FA and it seems
most beneficial to identify perceptual lapses rather than ability averaged across the
response. Therefore, it was decided that using the lowest FA score from within a response
provides the most useful information when assessing for perceptual accuracy.
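The lookup logic described above can be sketched as below. The table layout and function name are hypothetical conveniences; the point is the decision rule: gestalt listing first, otherwise the lowest score among the important objects.

```python
# Hypothetical table layout: {(card, location): {object_name: fa_score}};
# the printed FA tables are organized by card and by location within card.
def score_response_fa(fa_table, card, location, objects, gestalt=None):
    """Assign a response-level FA score.

    If the gestalt of the whole response (e.g., "2 people") is listed,
    its score is used; otherwise each important object is looked up and
    the LOWEST score is assigned, so perceptual lapses are not averaged
    away. Returns None when nothing is listed (extrapolation or examiner
    judgment is then required)."""
    entries = fa_table.get((card, location), {})
    if gestalt is not None and gestalt in entries:
        return entries[gestalt]
    scores = [entries[obj] for obj in objects if obj in entries]
    return min(scores) if scores else None
```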
There are times when objects are not listed in the FA tables; such instances
require the examiner to attempt extrapolation from the tables when possible. If an
unlisted response object is quite similar in shape to a listed object within the same
location, the listed object can be used in assigning the FA score. Such decisions should be
based exclusively on the shape of the object (e.g., if “apple” was not listed, “cherry”
could be an appropriate extrapolation object), not on word choice or simple
categorization of objects (e.g., if “ostrich” was not listed, “hawk” would not be an
appropriate extrapolation object, even though they are both birds). Another technique for
extrapolation is to look up the response object (or objects quite similar in shape) in a
location that is very similar to that of the response location. If extrapolation is not
possible using the aforementioned techniques, then the examiner must make a judgment
about the FA score that seems appropriate for the response. To aid in this decision, the
examiner should make use of the criteria used by raters in the development of the tables
(i.e., descriptions of what warrants each FA score). Additionally, the examiner should
reference the listed objects and their corresponding FA scores for the location(s) used in
the response. This is done to acquire benchmarks for what raters considered well- versus
poor-fitting objects for that specific blot area.
Review of FA validity. FA was developed as one component of a new PA
system of scoring for quality of perceptual behaviors on the Rorschach, the goal at the
time being to improve upon the existing CS FQ system of scoring. It is now hoped that
FA can also be further used to improve the R-PAS FQ system of scoring. However, the
concept of scoring the Rorschach for pure accuracy (or inaccuracy) of perception is not a
new one; two previous Rorschach studies were found that examined very similar
approaches to quantifying perceptual accuracy of responses. Conducted more than 50
years ago, the studies seem to have been largely overlooked by researchers until recently.
In one study, 100 adults were provided a list of 329 response objects reported to the
whole area of all ten inkblots, and the adults rated the objects dichotomously as
perceptible and fitting the inkblot reasonably well, or not (Walker, 1953). These ratings
were aggregated to comprise reference tables for W responses. A total of 219 responses
given to the W location were scored using the Walker (1953) tables and the Beck FQ
tables. Using Chi-Square analyses to compare groups (normal responses vs.
schizophrenic responses), the average of the obtained form accuracy ratings significantly
differentiated responses given by a normal population (n = 122 responses) from those
given by patients with paranoid schizophrenia (n = 97 responses). However, Beck's
traditional method of scoring FQ did not. Kimball (1950) selected 10 W-location
responses for each card from the Beck and Hertz FQ tables to be rated. Using a 6-point
scale of goodness-of-fit, form accuracy ratings were given to each of the 100 responses
by 4 sets of raters (total n = 103), with the sets grouped by amount of training and
experience with the Rorschach. The form accuracy ratings varied widely across judges,
with much of the variation assumed to be due to judges’ projection and lack of clarity
about where components of the whole percept were located or how they were positioned
within the blot (Kimball, 1950). Key recommendations from Kimball’s (1950) form
accuracy methodology were applied to the FA Project (Meyer & Viglione, 2008) in that
many of the objects to be rated had parenthesized location aides to orient the judge and
ratings of each object were made by several judges with varied amounts of experience
with the Rorschach.
Several preliminary studies have provided more recent information about the
validity of FA compared to FQ. Findings were mixed in the original research exploring
the relative validity of CS FQ and FA scores against criteria that assess the ability to
correctly interpret the nature of interpersonal relationships and to correctly comprehend
nonverbal communication (Horn, Meyer, Viglione, & Ozbey, 2008). Criterion measures
included: (1) The Interpersonal Perception Task, in which the test-taker watches a 20-minute video of 15 conversation scenes and selects the type of interaction depicted in
each scene, (2) the Profile of Nonverbal Sensitivity–Face and Body, which consists of 40
two-second segments of the face or torso of a woman expressing a sentiment and a
multiple-choice question after each segment in which the test-taker is asked to describe
the depicted woman’s action, and (3) the Communication of Affect Receiving Ability
Test, during which the test-taker watches 32 videotaped instances of a person (the
“sender”) watching a scene (each scene either being sexual, scenic, unpleasant, or
unusual) that is not visible to the test-taker. From the sender’s reaction to each scene the
test-taker has to identify the type of scene being viewed. The relative independence of
these criterion measures introduced noise in the results, but both FA-derived and FQ-derived indices produced associations with the criterion measures; effect sizes ranged
from small to moderate.
In a second study, researchers examined how FA compared to CS FQ in a
validation study using positive psychotic symptomatology as a criterion (Ozbey, Meyer,
Viglione, Dean, & Horn, 2008). FA was slightly but nonsignificantly better than CS FQ
in predicting a composite measure of disordered thinking in a sample of 61 long-term
adult psychiatric patients. The protocol-level mean FA (the average of FA scores
assigned to each response within the protocol), the protocol-level mean of the lowest 25%
of response-level FA scores, and FQ each predicted the disordered thinking composite
with a moderate effect size (r = -.33; -.31; -.34, respectively). When FA ratings were
converted to FQ equivalents of -, u, and o at the response level and then converted into
analogs of X-%, X+%, XA%, and WDA% at the protocol level, the FA versions of these
scores showed slightly larger correlations with criterion measures than the FQ versions
(e.g., for X-% and WDA%, the rs were .39 and -.38 for the FA-based scores and .35 and -.33 for the FQ-based scores).
In a third study, researchers investigated the relative validity of CS FQ indices
and FA scores to differentiate patients based on their general degree of psychiatric
severity (Ptucha, Saltman, Filizetti, Viglione, & Meyer, 2008). Findings were mixed in
that FQ-derived variables predicted diagnostic group (i.e., no diagnosis, non-psychotic, or
psychotic) and diagnostic severity (a combination variable of diagnosis and patient
status); FA-derived variables predicted patient status (i.e., non-patient, out-patient, or in-patient). However, Ptucha et al. (2008) computed FA indices somewhat differently than
the other two aforementioned FA studies.
In a more recent study, Rorschach protocols and a variety of criterion measures
were collected from 114 adult college students in a comparison of FA and CS FQ validity
(Horn, 2009). Criterion variables represented a wide range of perceptual abilities
including elemental visual-spatial ability (the Judgment of Line Orientation; Benton,
Sivan, deS. Hamsher, Varney, & Spreen, 1983), ability to unite a disparate perceptual
field (Gestalt Completion Test and Snowy Pictures Test; Ekstrom, French, Harman, &
Dermen, 1976), the ability to make inferences about the mental states of others (Eyes
Test–Revised Version; Baron-Cohen, Wheelwright, Hill, Raste, & Plumb, 2001), as well
as a test of complex interpersonal perception in which participants rated the IQ and
personality characteristics of college students seen in 10 different videos (Carney, Colvin,
& Hall, 2007). From the evaluation of convergent validity, Horn (2009) observed that
protocol-level FA indices assessed basic perceptual processes, while CS FQ indices
aligned more with tasks of interpersonal perception; effect sizes ranged from small to
moderate. Results were interpreted as indicating that FA likely assesses perceptual
accuracy at a more basic and concrete cognitive-processing level, while FQ seems to
indicate a more complex, “warmer” type of processing in which the test-taker is accurate
in detecting nuanced interpersonal and personality cues. Such results may be early
indications of the importance of factors other than objective accuracy of fit when using
the Rorschach to assess such a nuanced construct as perceptual accuracy.
Rorschach Frequency of Perceptions
As described in the overview of CS FQ, Exner embedded frequency of percepts
into the development of the FQ tables: To be considered to have FQ of ordinary, a
percept must have been identified by at least 2% of persons in the FQ data pool for W and
D areas, or by at least 50 persons (0.52%) in the pool who responded to Dd areas. For an
item to have an FQ listing of unusual, the percept must have occurred in fewer than 2% of persons for W and D areas or, for Dd areas, in fewer than 50 persons
(Exner, 2003). This rough benchmark reflects the impact of percept frequency on CS FQ
determination, but frequency of percepts cannot be formally assessed separately from
perceptual accuracy using the CS FQ tables, and the same is true for the R-PAS FQ
tables.
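Exner's frequency thresholds can be expressed as a simple rule. The sketch below covers only the frequency benchmark; actual CS FQ assignment also involved judgments of fit, which a pure frequency rule cannot capture, and the function name is illustrative.

```python
def cs_fq_frequency_level(n_identifying, pool_size, area_type):
    """Exner's (2003) frequency benchmark: a percept qualifies for an
    'ordinary' listing when identified by at least 2% of the pool for W
    and D areas, or by at least 50 persons for Dd areas; below those
    thresholds the listing is 'unusual'."""
    if area_type in ("W", "D"):
        meets = n_identifying / pool_size >= 0.02
    else:  # Dd area
        meets = n_identifying >= 50
    return "ordinary" if meets else "unusual"
```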
In the R-PAS FQ tables, the process of balancing fit with frequency in the
determination of the final 3-level FQ code assignments was much more iterative and
nuanced. A table in the technical manual chapter (see p. 424) summarizes the final FQ
determinations with regard to fit and frequency levels, as well as with regard to CS FQ
classification. In general, higher fit and frequency values correspond to higher R-PAS FQ
codes, and most objects are classified as unusual when fit is high and frequency is low, or
when fit is poor but frequency is high.
Popular responses. Exner (2003) identified 13 percepts that are labeled as
Popular responses in the CS. Popular responses were “defined using a criterion that
requires the answer to appear at least once in every three protocols,” the sample
consisting of approximately 7,000 protocols (pp. 129-130). Within the category of
Popular, response frequencies still vary; Exner (p. 130) found that some Popular
responses barely meet the criteria used in development of the score (e.g., animal to D1
location of Card 2; crab to D1 location of Card 10), while others are seen in more than
4/5 of protocols (e.g., human figures to D9 location of Card 3; animal figure to D1
location of Card 8). The CS Populars were also retained in R-PAS (Meyer et al., 2011).
The Populars give a rough indication of the conventionality of a test-taker’s
perceptions but the index represents only how frequently the person identifies the
extreme end of what would be considered “common” responses. The score can also be
misleading for clinicians — Weiner (1998) points out that two individuals might have the
same number of Popular responses on a protocol, but could be quite different from each
other in capacity for recognizing conventional reality. Consider the case in which one
individual gives the four most frequently seen Popular responses, while another individual also gives four Popular responses but delivers the ones that are least common. On the surface, these two individuals appear to have the same
capacity for identifying conventional reality, but they fall at different places along the
spectrum of conventional responding.
Findings using Rorschach indices of response frequency. A number of
imaging studies have explored Rorschach perception in recent years with special
attention paid to frequency of responses. Asari et al. (2008) used fMRI scanning to
identify cognitive regions that are activated in the production of unique versus
conventional visual perceptions. The Rorschach was administered to 217 Japanese
participants to form a control sample of protocols. The response percepts were tabulated
and used to create frequency benchmarks: Responses were labeled “frequent” if they
occurred in at least 2% of the control protocols, “infrequent” if they occurred in the
control protocols but did not reach the 2% benchmark, and any responses not encountered
in the control group were classified as “unique.” Sixty-eight Japanese volunteers were
screened for psychiatric or neurological illness using a structured interview, then
delivered verbal responses to the Rorschach cards while in an fMRI scanner. They were
prompted to deliver as many responses as they could within the three-minute presentation
time for each card. The levels of activity in the right temporopolar regions of the brain
differed at the time the response was vocalized based on the response being “unique”
versus “frequent.” Asari et al. reviewed literature suggesting that the right temporal pole
is an important node that is used when perceptual and emotional signals converge, and
that it has also been implicated as part of the system that stores emotional and
autobiographical memories. Olson, Plotzker, and Ezzyat (2007) suggest that this brain
region is also an integral part of the activation of personal memories when a person
becomes emotionally aroused.
Asari et al. (2010) used the same experimental and control samples but examined
brain structure instead of brain function. They computed a Unique Response Ratio for
each experimental participant, the ratio consisting of the sum of “unique” responses in the
protocol divided by the total number of responses produced in the protocol. As in the
Asari et al. 2008 article, any responses not encountered in the control group were
classified as “unique.” Across participants, the mean of the protocol-level sum of
“unique” responses was 11.6 (SD = 8.6) and the average number of total responses per
protocol was 39.4 (SD = 17.6) (Asari et al., 2010). The volume concentration of both the
bilateral amygdalae and the bilateral cingulate gyri correlated with the Unique Response
Ratio (p < .05, p < .01, respectively). When examined unilaterally, the volume
concentrations of the right and left amygdala and the cingulate gyrus all had medium
effect sizes in their relationship with the Unique Response Ratio (r = .34, .30, and .37,
respectively). According to the literature review by Asari et al. (2010), the limbic region
is critical in the processing of perceptual information and more specifically in emotional
processing, and the frequency with which brain structures are activated determines, at
least in part, the degree to which that structure becomes enlarged. Thus, it was suggested
that the positive association between the volume of limbic structures and unique
perception supported the hypothesis that increased activity in the limbic system might
underlie unique perceptions.
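The classification benchmarks and the Unique Response Ratio used across the two Asari et al. studies reduce to a short computation; the sketch below follows the definitions given above, with function names and data shapes as assumptions.

```python
def classify_percept(percept, control_counts, n_control):
    """Asari et al. (2008) benchmark: 'frequent' if the percept appeared
    in at least 2% of control protocols, 'infrequent' if it appeared but
    below 2%, and 'unique' if it never appeared in the control sample."""
    count = control_counts.get(percept, 0)
    if count == 0:
        return "unique"
    return "frequent" if count / n_control >= 0.02 else "infrequent"

def unique_response_ratio(protocol, control_counts, n_control):
    """Asari et al. (2010): number of 'unique' responses in a protocol
    divided by the total number of responses in that protocol."""
    labels = [classify_percept(p, control_counts, n_control) for p in protocol]
    return labels.count("unique") / len(protocol)
```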
Statement of the Problem
Within the field of clinical psychology there are limited resources available for
identifying psychotic perception. Research has established that the Rorschach can
accurately identify psychosis in test-takers, but it is possible that new Rorschach scores
could extend the utility of the Rorschach in identifying such characteristics. Within the
CS (Exner, 2003) and R-PAS (Meyer et al., 2011), FQ is currently used to assess
perceptual accuracy on the Rorschach and has been demonstrated as a valid indicator in a
robust literature. However, FQ does have some important limitations. Perhaps most
importantly, FQ is not based solely on the shape of the response object and the frequency
of the percept. In the development of the CS tables, clinician judgments played a central
role. In the development of the R-PAS tables, the authors worked to remove
inconsistencies and base the tables more solidly in empirical data, but there were still
many decisions that were made by the development team when the fit and frequency data
did not clearly indicate a response object was both frequent and strong in fit, or both
infrequent and poor in fit. Part of the struggle in constructing the FQ tables is certainly
due to the trichotomization of FQ scores that is seen in both the CS and R-PAS, when in
actuality, the fit and frequency of objects on the Rorschach are much more nuanced than a 3-category system reflects. Dimensionalization of scores elsewhere on the test has been
completed (e.g., the PTI was dimensionalized and became the TP-Comp), but
dimensional FQ scores have not yet been determined and published.
FA was developed (Meyer & Viglione, 2008) in an attempt to rectify some of the
problems associated with scoring FQ, but there is not a Rorschach score — CS, R-PAS,
or otherwise — that can thoroughly and efficiently tap into the conventionality of a
protocol. It is believed that such a score could be an important factor in identifying
distorted perceptual processes of the test-taker. It is now time that development of an
alternative scoring system for Rorschach perceptual accuracy include a response
frequency score that can ultimately be combined with the accuracy of fit score (FA) to
form a dimensional Perceptual Accuracy (PA) score.
Purpose of the Present Study
It is anticipated that the overall PA project (Meyer & Viglione, 2008) will
advance our understanding of perception, as well as result in a validated method of
assessing reality testing using the Rorschach. Ideally, the final PA system will improve
diagnostic accuracy, helping to correctly recognize perceptual aberrations and lead to
fewer mistaken psychotic-disordered diagnostic inferences. FA was the first leg of the PA
scoring system to be developed and it has been included in some validity evaluations; FA
represents how well the shape of the response object fits the blot location. The present
study began with expansion and refinement of Meyer et al.’s (2011) frequency tables, and
re-calculation of international indices of PF. These indices indicate how frequently each
perceived object is given as a response to the location used by the respondent. The
preliminary tables of PF values (Meyer et al., 2011) were developed by examining and
averaging the specific object frequencies from five international datasets (i.e., Argentina,
Brazil, Italy, Japan, and Spain); the tables were expanded by adding data from a U.S.
sample, and the PF variables were re-calculated. The resulting international PF indices
were intended to represent a cross-culturally generalizable index of how frequently
response objects are identified in specific areas of specific cards on the Rorschach. It was
anticipated that PF would be an important component of the final PA scoring system
because it serves as an indicator of conventionality. It is possible that respondents may
see things in an accurate and typical way (i.e., high FA and high PF), an accurate but
unusual way (i.e., high FA and low PF), an inaccurate but typical way (i.e., low FA and
high PF), or in an inaccurate and unusual way (i.e., low FA and low PF). It was expected
that most people who take the Rorschach would have a mix of different types of
responses, but that PA would allow for more precise identification of true distortions that
are also atypical in the normal population.
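The four FA-by-PF patterns sketched above amount to a two-way classification. The cutoffs in this illustration are purely hypothetical; how FA and PF should actually be combined and thresholded is precisely the open question the study investigates.

```python
def pa_pattern(fa, pf, fa_cutoff=3.0, pf_cutoff=0.02):
    """Classify a response into the four FA-by-PF patterns: accurate or
    inaccurate (fit) crossed with typical or unusual (frequency). The
    cutoff values are illustrative placeholders only."""
    accuracy = "accurate" if fa >= fa_cutoff else "inaccurate"
    typicality = "typical" if pf >= pf_cutoff else "unusual"
    return f"{accuracy}/{typicality}"
```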
In addition to compiling frequency of response information to form indices of PF,
I explored the structure of FA and PF indices using an archival database that included
Rorschach protocols and diagnostic information, as well as a Diagnostic Severity score
that served as a criterion measure. Although the performance of FA alone had shown
some promise in earlier research, it was believed that if FA was combined with PF to
form PA, significant correlations with the criterion measure would be observed, and
future research might demonstrate the ability of PA to detect problems in accuracy of
perception across a wider range of the “accuracy of perception” spectrum than FQ. It is
also hoped that PA might eventually lead to more accurate assessment of the kind of
perceptual difficulties that impair functioning and interpersonal interactions. Before PA
can be developed, it is essential to understand how FA and PF function independently in
predicting constructs of interest, and to understand the structure of the various FA and PF
indices across responses and cards within the Rorschach test. Without addressing such
questions, it would remain unclear how to best combine FA and PF information within a
protocol to maximize the performance of PA. By clarifying the structure and performance
of FA and PF, it was hoped that standardized methods of scoring and interpreting PA
scores could then be developed and applied to future research and ideally, to future
clinical practice.
For the current study, a criterion database was selected for exploration of the FA
and PF indices. For each response within the database, the important response objects
were identified and both an FA score and PF scores were applied to each object, then
were averaged at the response level. The response-level FA and PF indices were then
explored by modeling how card number, response within card, and the criterion variable
contributed to the structure of each variable, and validity coefficients with the criterion
measure were calculated.
Principle of Aggregation
The principle of aggregation holds that single measurements are less stable, more
prone to measurement bias, and less accurate in portraying information than are summed
or aggregated collections of measurements. Rushton, Brainerd, and Pressley (1983)
hypothesize that the weak relationships between variables or measurements so commonly
found in the psychology literature are partly the product of failure to apply the principle
of aggregation to the methodology used in studies. Using aggregated data helps to
average out random error associated with measurement, and multiple non-redundant
measures of the same construct provide more substantial sampling of the behavior of
interest (Rushton et al., 1983).
Behavior can vary drastically as a product of the situation, and behavioral
measurements are often based on single sources of information (Epstein, 1979).
Therefore, the corresponding results are limited in terms of generalizability and
possibility of accurate replication. However, it is also important to consider the range of
the construct anticipated in participants as compared to the range of the construct
expected to be covered by the measures; weak relationships between variables could
result from simply having little variability in the sample compared to relatively great
variability in the situations they are exposed to (Epstein, 1979). After conducting four
studies addressing aggregation hypotheses, Epstein (1979) found that aggregating data
over a number of events led to estimates of personality traits being more stable, and also
revealed heteromethod convergent validity.
Epstein (1980) described four types of aggregation: aggregation over subjects,
over stimuli or stimulus situations, over time, and over modes of measurement.
Aggregation over subjects refers to testing many participants and averaging responses
over the sample. This can be accomplished by using an appropriate sample size and
through appropriate types of data analysis. Aggregation over stimuli or stimulus
situations refers to using a variety of stimuli and contexts in addressing the research
question, making sure to include a range of stimuli that the researcher is attempting to
generalize results to. This method helps to reduce the influence of situation-specific
effects in the data. Aggregation over time and over modes of measurement refers to
varying the trials or occasions of measurement, and to using multiple measures of the
same construct, respectively. These recommended aggregation practices were applied to
various components of the research design and data analyses in the following study. For
example, care was taken to select statistical analyses that can appropriately model both
the hierarchical, as well as the repeated measures nature of Rorschach data (i.e.,
responses within cards within protocol). The sample used in this study was fairly large,
and the Rorschach administrations were conducted over a period of years by two different well-trained examiners. The criterion measure was reliably coded, and each criterion data point
was based on an aggregation of diagnoses, which were in-turn based on the presenting
clinical picture. Finally, FQ is a rather coarse classification method as it is based on just
three options along a continuum of perceptual accuracy; this study is focused on
exploring the structure of FA and PF to make progress on a new dimensional method of
scoring perceptual accuracy that will be based on aggregation of dimensional fit and
frequency scores.
Research Questions
How frequently do various Rorschach responses occur? How can the frequency of
perceptions be aggregated at the response and protocol levels? What does the structure of
the FA and PF indices look like within card and across cards? How can PF indices be
combined with FA indices to produce a dimensional scoring system for perceptual
accuracy on the Rorschach?
Chapter Three
Method
Participants
Percept Frequency samples.
U.S. Sample. The Rorschach Performance Assessment System (R-PAS; Meyer et
al., 2011) is an effort to evaluate the empirical evidence of Rorschach variables, many of
them CS variables, and to develop a standard method of administration, scoring, and
interpretation that is based on retaining what has empirical support. Norms for R-PAS
were derived from an adult non-patient sample collected from multiple countries (Meyer
et al., 2007). The 145 verbatim English-language protocols that are part of the R-PAS
normative database comprised one of the data files used to identify the frequency with
which various percepts are delivered as responses. An additional data file was employed,
which contains Rorschach protocols for 127 college students from a university in Ohio
(Horn, 2009). This combined U.S. Sample, containing a total of 262 protocols, was
aggregated with five other samples during development of the final FA and PF tables.
Argentinean Sample. The individuals who comprise this sample were described
as 506 well-functioning adult nonpatients from the area of Gran La Plata, Argentina
(Lunazzi et al., 2007). This sample’s file provided specific frequencies for all objects
reported by subjects during the Rorschach test. Responses were aggregated along with
the other frequency samples to form PF tabulations.
Italian Sample. This sample consists of 800 non-patient adult Rorschach
protocols (Parisi, Pes, & Cicioni, 2005; Rizzo, Parisi, & Pes, 1980). The data file
provided specific frequencies for all objects identified by 2% of the subjects or more (i.e.,
seen by 16 or more people).
Spanish Sample. A sample of 470 Spanish adult outpatient Rorschachs (Miralles
Sangro, 1997) was also used in tabulating PF. The test-takers in this sample had
presented to the Interaction and Personal Dynamic Institute in Spain requesting
psychological evaluation. No sample members were of inpatient status at the time of the
evaluation or were recommended for inpatient treatment following the evaluation. In
total, the data file consisted of 10,562 responses and it provided specific frequencies for
all objects reported during Rorschach administration.
Japanese Sample. This sample’s data file includes the Rorschach protocols of
400 Japanese nonpatients (Takahashi, 2009). It provided specific frequencies for all
objects seen by at least 1% of the sample.
Brazilian Sample. A total of 600 Rorschach protocols are included in this
nonpatient Brazilian sample (Villemor-Amaral et al., 2008). The data file provided
specific frequencies for all objects that were used as Rorschach response objects by the
subjects.
Criterion Database. Data for this adult mixed-status database were collected in
Chicago through a hospital-based psychological testing program (Meyer, 1997; Meyer,
Riethmiller, Brooks, Benoit, & Handler, 2000; Hoelzle & Meyer, 2008). Valid Rorschach
protocols and MMPI-2 administrations were obtained from 362 patients as part of
treatment or evaluation. Of the patients with valid Rorschachs, “…52% were psychiatric
inpatients, 30% were psychiatric outpatients, 15% were general medical patients, and 3%
were drawn from other settings” (Meyer, 1997). Diagnostic categorization of the sampled
individuals was used to better understand the relative strengths and weaknesses of FA
and PF indices on the Rorschach.
For the current study, a subset of the database was used, consisting of 212
Rorschach protocols that met criteria for R-Optimized modeling. R-Optimized
administration instructs the test-taker to “give 2, or maybe 3 responses…” to each card; the test-taker is prompted for a second response if they give only one, and the card is removed after the test-taker delivers four responses to a card (Meyer et al., 2011). Given that the Criterion Database protocols were collected
using CS administration procedures instead of R-Optimized instructions, R-Optimized
modeling was applied to the database. Meyer et al. (2011) used a complex procedure for
determining which responses to retain so as to closely match the distribution of responses
in the Criterion Database to a target database that had been administered using R-Optimized administration procedures. They applied the same procedures described in the
R-PAS Manual, and “…wanted the distribution of first, second, third, and fourth
responses given to each card in our modeled sample to match the distribution of first,
second, third, and fourth responses to each card in the target sample….” Of the 212
subjects, the final dataset used for the present study consisted of data from the 159
subjects who also had Diagnostic Severity scores available. The decision to use the
Criterion Database was based on the sampled population: the Criterion Database contains data collected from a large clinical sample, and the criterion scores reflect the severity of each patient’s diagnosis(es), which closely approximates the clinical constructs of interest.
Measures
Percept Frequency samples measures.
Match numbers are used as an indexing aid within the FA and PF tables and for
coding Rorschach protocols. Each Rorschach object is assigned a unique match number
in the FA and PF tables, and Rorschach protocols can then be coded with match numbers
using the FA and PF tables. This process allows the researcher to then import other data
from the tables into the Rorschach coding file. The FA and PF tables also contain
Rorschach card numbers, location codes, angle of the card, and the object names. The
card number is used to identify which Rorschach card the test taker was responding to
when delivering each Rorschach response. Rorschach location numbers are assigned to
each card and are used to identify the various parts of the inkblot image, and these codes
indicate where the response object was located when it was used as part of a response.
The angle of the card indicates the orientation in which the test taker was holding the
Rorschach card when constructing and delivering their response. Each response object is
also named and listed in the FA and PF tables. The same object can be seen in different
ways and in different locations, but each unique perception has a unique match number.
For example, a butterfly could be seen on different cards, in different locations within the
same card, or even in the same location on the same card but in a different orientation.
Each unique type of “butterfly” response would have its own match number and listing in
the FA and PF tables.
An object-level FA score is associated with each unique response object in the FA
tables, with each object’s FA value having been derived from an average of 9.9 rater judgments (Meyer & Viglione, 2008). The PF tables also contain a variety of object-level variables. The first set of PF variables is contained within the non-consolidated PF
tables. These are within-country variables that indicate the percentage of protocols that
contained each unique object. In other words, within each country’s sample, the
percentage of people who gave responses containing each unique response object was
calculated and indexed within the lookup tables. The variable for each country indicates
the specific frequencies for all objects reported by subjects during the Rorschach
administration, with two exceptions: The Japanese Sample listings indicate specific
frequencies for all objects identified by at least 1% of the sample, and the Italian Sample
listings indicate specific frequencies for all objects identified by at least 2% of the
subjects. These percentage-based variables were also converted into country-specific
binary variables that indicate whether or not the percentage of protocols that contained
each match number is greater than or equal to 1.5% of the protocols from that country. In
other words, the PF tables indicate whether or not each unique percept (i.e., match
number) was given by at least 1.5% of the participants from each country’s sample.
A series of composite international PF variables were also computed at the object
level for the PF tables. The first variable is the international version of the percentage-based variable, which is computed as the average of the non-missing values for the six
country-specific percentage-based variables. More simply, it is the mean of the six
within-country variables that indicate the percentage of protocols that contained each
match number. It indicates on average how often a particular percept is reported across
samples when it was identified by at least 1.5% of the participants in at least one of the
samples. The binary PF variable was also converted to a composite international variable,
which equals the sum of the six country-specific binary variables. Thus, it is a count of
the number of samples in which the match number was found in at least 1.5% of the
protocols. It has a possible range of 0-6.
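The two composite computations just described can be illustrated with a brief Python sketch. The function name and example values are hypothetical; the actual tables were built in Excel and SPSS, and percentages below the 1.5% cutoff are stored as missing values.

```python
# Sketch of the two composite international PF variables; None represents
# a missing (below-cutoff or unlisted) country value.

def international_pf(country_pcts, threshold=1.5):
    """country_pcts: percentage of protocols containing the object in each
    of the six country samples; None marks a missing value."""
    observed = [p for p in country_pcts if p is not None]
    # Percentage-based composite: mean of the non-missing country values.
    pct_mean = sum(observed) / len(observed) if observed else None
    # Count-based composite: number of samples at or above the cutoff.
    count = sum(1 for p in observed if p >= threshold)
    return pct_mean, count

# Hypothetical object listed in three of the six samples.
mean_pct, n_samples = international_pf([2.4, None, 1.8, None, None, 3.1])
```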
In an effort to reduce the length and complexity of the FA and PF tables, Meyer et
al. (2011) had consolidated many of the response objects into response object categories
within the tables. The consolidation decisions had been based on careful consideration of
the response object properties; consolidations occurred when there were multiple objects
listed within a single card location and orientation, and the objects were similar to each
other in shape and content and had similar FA ratings. For example, the initial tables
contained separate listings for “anchor,” “fishhook,” and “hook” in the D2 location of
Card III. Upon consolidation, those three response objects were merged into a single
response object category listing: “hook or similar object (e.g., anchor).” The table
consolidation process is described in more detail in the Procedures section.
To account for the consolidations, object-level variables were computed for each
country based on the consolidated FA and PF tables. The first set of variables indicate the
percentage of protocols from each country (if the percentage was greater than or equal to
1.5%) that contained each match number from the consolidated FA and PF tables. In
order to compute each country’s percentage-based variable for the consolidated tables,
the percentage-based variable values from the unconsolidated tables were aggregated to match the consolidation of objects by summing across the various object listings included within each consolidated listing. For example, in the
unconsolidated tables, Card III location D2 (see Figure 12) contained separate listings for
“anchor,” “fishhook,” and “hook,” and each listing had a percentage-based PF score for
each of the six countries. Upon consolidation, the PF scores for those objects were
summed into the object listing for “hook or similar object (e.g., anchor).”
Figure 12. Card III location D2.
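The consolidation arithmetic can be sketched as follows, using the object names from the example above; the within-country percentage values are hypothetical.

```python
# Hypothetical within-country PF percentages for the three objects that
# were merged into the "hook or similar object (e.g., anchor)" category.
unconsolidated = {
    "anchor":   {"US": 1.1, "Italy": 0.4},
    "fishhook": {"US": 0.8, "Italy": 1.6},
    "hook":     {"US": 2.3, "Italy": 0.7},
}

countries = ("US", "Italy")

# Percentage-based score: sum the member objects' values within each country.
consolidated = {c: sum(obj[c] for obj in unconsolidated.values())
                for c in countries}

# Binary score: did any member object reach 1.5% in that country?
binary = {c: any(obj[c] >= 1.5 for obj in unconsolidated.values())
          for c in countries}
```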
Binary representations of the consolidated percentage-based variables were also
computed and listed within the consolidated PF tables. These within-country variables
indicate whether or not the percentage of protocols that contained any match number
within each consolidated category was greater than or equal to 1.5% of the protocols
from that country. In other words, the consolidated PF tables indicate whether or not,
within the given country’s sample, any object that contributed to a specific consolidated
listing was found in at least 1.5% of the protocols. Thus, this consolidated object-level
score was a zero or a one for each country.
Two consolidated international object-level PF variables were also computed for
the consolidated PF tables. The first variable is the mean of the six within-country
variables that indicate the percentage of protocols that contained each match number after
the listings were consolidated. Thus, it indicates on average how often a particular
consolidated percept category is reported across samples. The percentage-based variable
was also converted into a count-based composite international variable. It is a count of
the number of samples (out of the six countries) in which any match number that
contributed to the consolidated listing was found in at least 1.5% of the protocols from
each given country. In other words, it is the sum of the six country-specific binary consolidated variables. The count variable
has a possible range of 0-6.
Criterion Database measures.
Each patient within the Criterion Database was assigned an ID number, which
was used for indexing the data and merging data files after coding was completed. By
using an ID number as an indexing variable, Rorschach coding could be completed blind
to any information about the patient, including their diagnosis(es) or their Diagnostic
Severity score.
Patients were assigned diagnoses, which were used to later construct a Diagnostic
Severity indicator. Initial billing diagnoses were recorded for each individual before
testing began, and thus diagnoses were made independent of Rorschach data. The billing
diagnoses were assigned by the treating clinician or by a multi–disciplinary inpatient
treatment team. Diagnoses contained in the database include depressive disorders,
psychotic disorders, personality disorders, anxiety disorders, bipolar disorders, and
gender identity disorder. Medical patients with diabetes or pain-management concerns, as well as organ-transplant candidates, were also included.
The diagnostic severity criterion variable is based on the 1–3 diagnoses obtained
for each patient, which were then converted to a 5–point severity scale (Dawes, 1999;
Meyer & Resnick, 1996). The severity scale was conceptually derived to quantify the
degree of overall dysfunction associated with a diagnosis, with higher scores indicating
higher levels of dysfunction (e.g., 1 = Adjustment Disorder with Depressed Mood; 3 =
Major Depression, Recurrent, Severe, Non-psychotic; 5 = Schizoaffective Disorder).
When the scale was developed, independent raters showed good agreement on severity ratings for 141 diagnostic codes (r = .84; 97.9% of ratings were within one point
of each other). The highest diagnosis severity rating for each patient was used as the
criterion measure.
Patients were also administered the Rorschach as part of their treatment or
evaluation at the hospital. The Rorschach was administered using the CS, which was the
most commonly used administration and interpretation system for Rorschach assessment
(Exner, 2003). As is dictated by CS administration guidelines, each patient was
individually presented with the standard series of 10 inkblots and was asked to respond to
each, answering the question, “What might this be?” Their responses were written down,
and later transcribed into an Excel database. Clarifications were collected from each
patient after they completed the response phase of the Rorschach. Each patient’s response
within the database was accompanied by the response clarification, as well as the other
information that is part of a typical CS record: The response number, the card number,
the angle of the response, and response object location information.
Some additional Rorschach information was coded and calculated for this study.
The variable R_InCard was assigned to each patient’s responses to indicate the ordering
of their responses within each card. The possible range for R_InCard, using the R-Optimized protocols, was 1-4. Match numbers were assigned to the objects that were
included in each Rorschach response. By coding object-level match numbers into the
Criterion Database, other indexed information in the FA and PF tables could be pulled
from the tables into the Criterion Database through the use of syntax, saving manual
coding time and reducing human errors in coding. Five data columns allowed up to five
match numbers to be assigned to each response. However, not all response objects could
be assigned a match number because the look-up table of 11,352 consolidated objects
with corresponding match numbers is not exhaustive and people identify response objects
that are not listed.
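The lookup step might be sketched like this; the study used SPSS syntax, and the match numbers, field names, and values shown here are illustrative only.

```python
# A tiny stand-in for the consolidated FA and PF lookup tables, keyed by
# match number (the real tables index 11,352 consolidated objects).
fa_pf_table = {
    10231: {"card": 3, "location": "D2", "FA": 3.2},
    10232: {"card": 3, "location": "D3", "FA": 4.1},
}

def pull_fa_scores(match_numbers, table):
    """Return the object-level FA score for up to five coded objects;
    objects without a listed match number stay missing (None)."""
    return [table[m]["FA"] if m in table else None for m in match_numbers]

# The third object has no listing, so its score remains missing.
fa_scores = pull_fa_scores([10231, 10232, 99999], fa_pf_table)
```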
Object-level FA was pulled from the FA tables into the Criterion Database. It is
the FA score assigned to each response object. Each response had up to five FA scores,
corresponding to the match numbers that were coded to identify the various objects used
in each response. Response-level FA scores were assigned to each Rorschach response
within the Criterion Database by a human coder (rather than by match number).
Response-level FA was determined by coders after reading the response and clarification,
considering any of the object-level FA scores associated with match numbers, and
applying the coding rules discussed and practiced during the Rorschach coder training.
The protocol-level mean of the response-level FA scores within a patient’s protocol was
calculated as well.
The two international object-level PF variables were also pulled into the Criterion
Database from the consolidated PF tables. As with the object-level FA score assignments,
the object-level PF variables were applied to all response objects with a listed match
number, with a maximum of five objects and associated scores per response. Unlike FA,
there was no coder judgment in assigning PF values; the observed frequencies were used.
As a reminder, the first object-level PF variable is the percentage-based variable, which
represents the mean of the six within-country variables that indicate the percentage of
protocols that contained each match number after the listings were consolidated. The
count-based international variable is a count of the number of samples (out of the six
countries) in which the consolidated match number was found in at least 1.5% of the
protocols, with a possible range of 0-6.
Response-level PF variables were also calculated for the Criterion Database. PFM
(Percept Frequency Mean) is the response-level average of the object-level international
percentage-based PF scores that were coded for each object within the response. PFN1.5
(Percept Frequency Number of samples >= 1.5%) is the response-level average of the
object-level international count-based PF scores that were coded for each object within a
response.
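A minimal sketch of these two response-level averages follows; the object-level scores are hypothetical, and objects without a match number are simply skipped.

```python
# Each coded object carries an international percentage-based PF score and
# a count-based PF score; objects without a match number are None.

def response_pf(objects):
    """Return (PFM, PFN1.5) for one response."""
    scored = [o for o in objects if o is not None]
    if not scored:
        return None, None
    pfm = sum(pct for pct, _ in scored) / len(scored)    # mean percentage score
    pfn15 = sum(cnt for _, cnt in scored) / len(scored)  # mean count score
    return pfm, pfn15

# A response with two coded objects and one unlisted object.
pfm, pfn15 = response_pf([(2.05, 3), (8.9, 6), None])
```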
Procedures
Frequency tables construction.
Structure of the original FA and PF tables. The present study began with
expanding and updating the Rorschach response object PF tables, one step in developing
the new Rorschach PF scores. The preliminary Microsoft Excel file of FA and PF values
(Meyer et al., 2011) included specific object frequency information from five of the
proposed international samples: Argentina, Brazil, Italy, Japan, and Spain. The Excel FA
and PF tables were structured such that a row was assigned to each individual response
object, with columns indicating country-specific frequency information. Each row was
also assigned a unique match number, which functions as an index aid for the various
response objects included in the tables. As described earlier, there were two types of PF
values entered for each country within each response object listing (i.e., within each row):
(1) The percentage of protocols collected from the indicated country that contained the
response object, with indexed values representing frequencies that were greater than, or
equal to, 1.5% of the country’s protocols, and (2) a binary value indicating whether or not
the percentage-based frequency value for the indicated country was greater than or equal
to 1.5% of the protocols.
Coding the U.S. Sample. The preliminary FA and PF tables were expanded by
adding object-level PF information from the U.S. Sample. The U.S. Rorschach responses
were contained in a set of Excel files. The files were structured so that the full Rorschach
responses were represented in the rows; five columns contained the preliminary match
numbers (i.e., the response object identifiers) and response location information for up to
five response objects.
The U.S. frequency coding was accomplished using a simplified version of the
preliminary FA and PF tables as lookup tables. The only variables retained in the tables
for this step of the project were match numbers, object names, and the necessary percept
location information (i.e., Rorschach card number, percept location within the card, the
angle of the card when response was delivered, and whether the response included the
use of non-inked areas). All existing FA and PF information was removed from the tables
before using the tables for coding match numbers in the U.S. Sample.
A merged Excel file was created for the U.S. Sample that combined the 145
protocols that are part of the R-PAS normative database (Meyer et al., 2007) with the 127
protocols from college students from a university in Ohio (Horn, 2009). Initial match
numbers had been assigned to the responses to identify the response objects, with up to
five match numbers assigned to each response. SPSS was used to import object-level
information into the Excel file, based on match number. Syntax was written that scanned
the match numbers assigned to the U.S. responses, used the match numbers to locate the
corresponding object information listed in the preliminary FA and PF tables (i.e.,
Rorschach card number, percept location within the card, the angle of the card when
response was delivered), and embedded that object information into the U.S.
Excel file. All responses within the U.S. Excel file were then manually screened for
accuracy of the listed match numbers (i.e., did the newly-embedded response object
information match the actual card number, location, and response object that were
indicated in the Excel file), as well as for the presence of unique response objects that
were not already listed in the preliminary FA and PF tables. In rare instances, incorrect
match numbers had been assigned to response objects within the U.S. Sample; these errors were corrected by inserting the correct object match numbers into the Excel file. In
instances where a response contained an object that had not been assigned a match number, the correct match number was assigned if one existed in the FA and PF tables; otherwise, the response was flagged for further manual screening.
After the U.S. Sample Excel file was coded and checked for accuracy, responses
were extracted that had been flagged for further manual screening due to no available
match number for one or more objects used in the response. Each unlisted object used in
a response was assigned a unique match number, and the new match numbers were then
assigned to all additional instances of the unique response objects within the U.S.
Sample. Additionally, nine colleagues were asked to independently assign FA ratings to
each of the new objects that occurred in at least 1.5% of the U.S. protocols. Because there were not enough ratings per judge to form ipsatized scores for the newly rated objects, the median rating within each object was used to determine the final object-level FA
rating for each object that would be added to the FA and PF tables from the U.S. Sample.
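The median-based fallback amounts to the following; the nine ratings shown are hypothetical.

```python
from statistics import median

# Nine judges' raw FA ratings for one hypothetical new object; with too few
# ratings per judge to ipsatize, the within-object median becomes the final
# object-level FA rating.
ratings = [2, 3, 3, 4, 2, 3, 4, 3, 2]
fa_rating = median(ratings)
```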
Updating and adding variables to the FA and PF tables. Following the coding
of the U.S. Sample and the identification of the new objects to be added to the FA and PF
tables, the tables were updated to reflect the addition of the U.S. Sample’s data. As a first
step, the U.S. Sample’s data were imported from Excel to an SPSS database and a
variable was created to index all of the match numbers (i.e., response objects) used within
each protocol. Some protocols contained several similar responses (e.g., more than one
response that incorporated a “butterfly” to location D3 of Card III in the upright
orientation), and thus had more than one response with the same match numbers
assigned. In such instances, the duplicate match numbers within a protocol were filtered
out when tabulating frequency values; in other words, only the first instance of each
match number within a protocol was included in the frequency tabulations, which prevented individual protocols from over-contributing to the frequency variables. The
match numbers were then tabulated to create a count of U.S. protocols that used each
match number. The count variable was then converted into a new variable, which
represented the percentage of U.S. protocols that contained each response object. The
U.S. percentage-based variable was indexed in the FA and PF tables, and it was also used
to compute the U.S. binary variable, which indicates whether the U.S. percentage-based
frequency value was greater than, or equal to, 1.5% of the total U.S. protocols.
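The tabulation logic can be sketched as follows; the protocol contents are hypothetical, and the actual processing was done with SPSS syntax.

```python
from collections import Counter

# Each protocol is the list of match numbers coded for its responses.
protocols = [
    [10231, 10231, 10232],   # duplicate 10231 is counted only once
    [10232],
    [10231, 10233],
]

counts = Counter()
for protocol in protocols:
    counts.update(set(protocol))   # keep only the first instance per protocol

n = len(protocols)
pct = {m: 100.0 * c / n for m, c in counts.items()}  # % of protocols with object
binary = {m: p >= 1.5 for m, p in pct.items()}       # 1.5% cutoff flag
```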
After the new response objects and the U.S. frequency variables were added to the
FA and PF tables the tables contained the two frequency variables described earlier for
each of the six PF Samples: (1) The percentage of protocols collected from the indicated
country that contained the response object, with values representing specific frequencies
greater than or equal to 1.5% of the country’s protocols, and (2) a binary variable
indicating whether the percentage-based frequency value for the indicated country was
greater than, or equal to, 1.5% of the protocols. At this point, as described previously,
two international frequency variables were computed for each listed object: (1) The
average of the six countries’ percentage-based frequency values, and (2) The count of
countries (range 0-6) that had a percentage-based frequency value of greater than, or
equal to, 1.5%.
Given that response object listings had been consolidated in an attempt to simplify
the FA and PF tables, as discussed in the Measures section, the frequency data also
needed to be consolidated for the tables to function properly as lookup tables. Therefore,
SPSS syntax was written to calculate the new country-specific frequency variables for all
object listings that had been consolidated. The first variable to be calculated reflects the
sum of the individual percentage-based frequency values within a consolidated category.
For example, if the consolidated category contained three separate object listings (as in
the “anchor-fishhook-hook” example above), the consolidated variable equaled the sum
of the three objects’ frequency percentages (i.e., the sum of the three values that represent
the percentage of protocols from the indicated country that contained each of the three
response objects). The binary frequency variable was then calculated for each country to
represent whether any response object within a consolidated category had a frequency
greater than or equal to 1.5% of the country’s protocols; in other words, the variable
represented whether the consolidated category contained any object that was present
within at least 1.5% of protocols from the indicated country.
Finally, the two international frequency variables were computed for each
consolidated category listing: (1) The average of the six countries’ consolidated
percentage-based frequency values for all values greater than, or equal to, 1.5%, and (2)
The count of countries (range 0-6) that had a consolidated percentage-based frequency
value of greater than, or equal to, 1.5%. For object listings that did not get consolidated
into a category, the frequency values for the original object were retained. Within the FA
and PF tables, PF ratings are listed as missing values for percentage-based frequency
values that are less than 1.5%, and for counts of countries that are 0.
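The consolidation step sums the percentages within a category and separately flags whether any member object clears the threshold; note that the two can disagree, since several sub-threshold objects can sum past 1.5%. A minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical per-object percentages for one country, with a mapping of
# object listings to consolidated categories (cf. "anchor-fishhook-hook").
pct = pd.Series({"anchor": 0.9, "fishhook": 0.4, "hook": 0.5, "frog": 2.1})
category = {"anchor": "anchor-fishhook-hook",
            "fishhook": "anchor-fishhook-hook",
            "hook": "anchor-fishhook-hook",
            "frog": "frog"}  # unconsolidated objects keep their own values

# (1) Consolidated percentage: sum of the member objects' percentages
cat_pct = pct.groupby(category).sum()
# (2) Consolidated binary: is any member object at or above 1.5%?
cat_any = (pct >= 1.5).groupby(category).any().astype(int)
```

In this fragment the anchor-fishhook-hook category sums to 1.8% even though no single member reaches 1.5%, so its summed percentage clears the threshold while its binary variable is 0.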
Criterion Database coding.
Coder training and interrater reliability. Prior to the current study, the author (S.
Horn) was extensively trained in coding FA and PF, and also co-trained a research team
on FA coding under the supervision of Gregory J. Meyer. The most extensive coding
training utilized a database collected by Dean, Viglione, Perry, and Meyer (2007, 2008),
consisting of Rorschach protocols from 61 adults who were receiving long-term
residential treatment at the time of the assessment, either at a state psychiatric facility or
in a state prison. This was the primary database employed by G. Meyer and S. Horn for
calibrating their own scoring, as well as training a team of coders on FA coding
procedures, establishing coding reliability, and ensuring calibration across a full coding
team. The full coding team consisted of G. Meyer and S. Horn, fellow graduate student T.
Ozbey, and three undergraduate research assistants. Coder training included several
months of weekly team meetings where coding procedures were taught and reviewed,
practice protocols were collectively coded, independently-coded practice protocols were
reviewed as a team, and coding procedure questions were addressed. Coding reliability
for response-level FA was clearly established in this training database. Each of the 40
reliability protocols had been scored by at least two coders, and reliability was computed
using a 2-Way Random Effects Model ICC with an Absolute Agreement definition.
Across the 841 responses (40 protocols), the response-level single measure ICC = .74,
indicating good to excellent interrater reliability (Cicchetti, 1994). Coding
reliability was computed at the response level because it provided a conservative
assessment of the reliability of the coding rules.
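The reported coefficient is the single-measure, absolute agreement ICC from a two-way random effects model, i.e., ICC(A,1) in McGraw and Wong's (1996) taxonomy. A minimal NumPy sketch of that computation (the actual analyses used SPSS):

```python
import numpy as np

def icc_a1(ratings):
    """Two-way random effects, absolute agreement, single-measure ICC
    for an n_subjects x k_raters array of scores."""
    r = np.asarray(ratings, dtype=float)
    n, k = r.shape
    grand = r.mean()
    ms_rows = k * ((r.mean(axis=1) - grand) ** 2).sum() / (n - 1)  # subjects
    ms_cols = n * ((r.mean(axis=0) - grand) ** 2).sum() / (k - 1)  # raters
    ss_err = ((r - grand) ** 2).sum() \
        - (n - 1) * ms_rows - (k - 1) * ms_cols
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

# A constant offset between two raters lowers absolute agreement even
# though their rank ordering is identical:
icc_a1([[1, 2], [2, 3], [3, 4]])  # ≈ 0.667
```

Because the absolute agreement definition counts rater mean differences against the coefficient, it is the stricter choice for establishing coding calibration.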
Prior to the current study, S. Horn completed and conducted additional FA coding
training and subsequent interrater reliability analyses using a database that contained
complete Rorschach and criterion data for 110 college students at a small university in
Ohio (Horn, 2009). Within the college student database, agreement ratings for response-level
FA were obtained for 10 protocols coded by S. Horn and an independent coder, E.
Crawford, who completed one-on-one training with S. Horn and was co-supervised by S.
Horn and G. Meyer. Reliability for this training database was computed using a 2-Way
Random Effects Model ICC with an Absolute Agreement definition. Across the 245
responses (10 protocols), the response-level single measure ICC = .82, indicating
excellent interrater reliability (Cicchetti, 1994).
Given S. Horn’s clearly established coding reliability across the 50 protocols in
the training databases, less extensive reliability coding was deemed necessary for the
current study. The Criterion
Database was initially reviewed and had match numbers assigned to response objects by
an independent coding team. All Rorschach responses were reviewed by S. Horn, and the
associated match number coding within the database was revised, as necessary, by S.
Horn. To establish interrater reliability for the Criterion Database coding, individual
coder training was provided to graduate student coder N. Bromley by S. Horn prior to
coding the protocols. As N. Bromley was already familiar with the Rorschach, the
training consisted of two hours of intensive one-on-one coding training. After an
orientation to response-level FA coding and the FA and PF tables, practice responses
were collaboratively coded for response-level FA, allowing for practice and clarification
of concepts. After the training, N. Bromley felt comfortable independently completing
the 10 reliability protocols. After reliability was computed using a 2-Way Random
Effects Model ICC with an Absolute Agreement definition, all coding disagreements
were resolved between S. Horn and N. Bromley on the reliability protocols. All
remaining protocols then underwent match number coding review and revision by S.
Horn, followed by the assignment of object-level FA and PF codes through the use of
syntax, and response-level FA coding by S. Horn.
Coding FA and PF. As a first step in coding the Criterion Database, coders
assigned match numbers to the responses. The coders were provided with an Excel file
that contained all of the Rorschach responses, and they assigned up to five match
numbers to each response to identify the important response objects. As described earlier,
the criterion measure scores were not included in the Excel file and were not available to
the coders; the Excel file only contained the Rorschach responses and the information
needed to code them (e.g., card number, card orientation, location information), and
indexing numbers that could be used to match Rorschach response coding back to the full
Criterion Database. The coding was accomplished using a simplified version of the FA
and PF tables. The only data available in the FA and PF tables for this step of the project
were match numbers, object names, the necessary percept location information (i.e.,
Rorschach card number, percept location within the card, the angle of the card when
response was delivered, and whether the response included the use of non-inked areas),
and object-level FA ratings; no PF information was available in the tables for coding the
criterion database.
As described above, following the initial match number coding completed by the
coding team, all responses within the Criterion Database were manually screened by S.
Horn for accuracy of the match numbers (i.e., the response object identification), as well
as for the presence of unique response objects that were not already listed in the FA and
PF tables. SPSS syntax was written that imported information from the FA and PF tables
into the Criterion Database Excel file, with the assigned match numbers serving as the
index values for the various objects contained within the Rorschach responses. For each
Rorschach response in the Criterion Database Excel file, the imported information
included the object names, the object location information, the object-level FA ratings,
and the two object-level consolidated international PF scores. Each response was read in
full, followed by verification of the accuracy of the match numbers, the object names, the
orientation of the card, and the object locations. The imported object-level FA and PF
data for each response were also scanned for missing or unusual-looking values that
might indicate errors in the tables. Response-level FA scores were also assigned by S.
Horn during this stage.
In rare instances, incorrect match numbers had been assigned by coders to
response objects within the Criterion Database. Such coding errors were corrected by
inserting the correct object match numbers into the file, and correcting the associated
scores. A more common inaccuracy occurred when an important object was present in the
response language but the object was not accompanied by a match number. Sometimes
such instances of missing information resulted from a simple oversight when the data were
initially coded for match numbers; however, in most cases this type of missing
information occurred when a response contained an object that had been recently added
to the FA and PF tables and thus had not been a listed/indexed object when the Criterion
Database Excel file was initially coded. In such cases the correct match number was
assigned as long as the object and match number were available in the most recent FA
and PF tables; when a response contained an object without an associated match number
having been assigned, and the object was not listed in the most recent FA and PF tables,
then the response object was flagged for further manual screening and FA score
assignment, and the object-level PF ratings were left as missing values.
After the coding and verification stages were completed within the Criterion
Database Excel file, the Rorschach data were imported into SPSS and the response-level
PF scores (i.e., PFM and PFN1.5) were calculated for each Rorschach response. If a
response had no objects that were listed in the FA and PF tables (and thus missing values
for the object-level PF scores), PFM and PFN1.5 were assigned a value of 0.
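As a sketch of this final computation (the function name is hypothetical, and because the text does not specify how responses with a mix of listed and unlisted objects were averaged, this version simply averages the non-missing object-level values):

```python
import numpy as np

def response_level_pf(object_values):
    """Mean of the object-level PF values for one response. Objects not
    listed in the FA and PF tables carry np.nan; a response whose objects
    are all unlisted is assigned a score of 0."""
    vals = np.asarray(object_values, dtype=float)
    if np.all(np.isnan(vals)):
        return 0.0
    return float(np.nanmean(vals))
```

The same routine applies to both response-level scores: PFM averages the objects' international percentage-based values, and PFN1.5 averages their 0-6 country counts.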
Statistical Analyses
Overview of planned analyses. The goal was to understand the structure of the
FA and PF variables through the use of HLM modeling, and to determine their
relationship with the criterion measure. These steps were part of the process of exploring
how FA and PF indices might be combined to form PA scores. It was believed that once
FA was combined with PF to form PA, correlations with the criterion measure would be
stronger and PA would lead to more accurate assessment of the kind of perceptual
difficulties that impair functioning and interpersonal interactions. This was an important
issue to explore so that standardized methods of scoring and interpreting PA scores could
potentially be applied to future research and ideally, to future clinical practice.
Hierarchical Linear Modeling (HLM). Hierarchical Linear Modeling (HLM)
was the proposed method for exploring the optimally weighted structure of the FA and
PF variables relative to a Diagnostic Severity criterion. Using HLM, I planned to run
regression analyses that would concurrently model response-level information, with
Rorschach card, response number within the card, and person treated as higher hierarchical levels.
HLM was selected as the most appropriate statistical approach to the data due to the
nested nature of the variables, as well as the fact that consecutive responses to the
Rorschach cards can be conceptualized as repeated measures. It was anticipated that
R_InCard and card number would likely factor into the optimal weighting of FA and PF
scores to jointly predict Diagnostic Severity.
In a broad overview of HLM, Garson (2013) summarized that HLM is a type of
multilevel model, also broadly referred to as a Linear Mixed Model (LMM), and advised
that HLM/LMM is an appropriate way to model data that violate assumptions of
independent observations, as correlated error is accurately modeled in HLM.
Assumptions of independence are often violated in general linear models (e.g., analysis
of variance, correlation, regression) when observations are clustered by grouping
variables that can also cause correlated error terms. Garson warns that the standard errors
for prediction parameters that get computed through general linear modeling (e.g., beta
values for regression equations) are inaccurate when the error terms are clustered by a
grouping factor. Such inaccuracies in the computed standard errors (e.g., incorrect
magnitude or direction of beta values for the predictor variables in a regression equation)
can lead to very different conclusions about the relationships between variables than
when using HLM.
As described by Garson (2013), any time data are sampled, there could be a
random effect of the sampling unit as a grouping variable, violating the assumption of
independence of error terms in general linear modeling and OLS regression. Garson
summarized the difference between the models as follows:
… Unlike OLS regression, linear mixed models take into account the fact
that over many samples, different b coefficients for effects may be
computed, one for each group. Conceptually, mixed models treat b
coefficients as random effects drawn from a normal distribution of
possible b’s, whereas OLS regression treats the b parameters as if they
were fixed constants (albeit within a confidence interval)… In summary,
OLS regression and GLM assume error terms are independent and have
equal error variances, whereas when data are nested or cross-classified by
groups, individual-level observations from the same upper-level group
will not be independent but rather will be more similar due to such factors
as shared group history and group selection processes. While random
effects associated with upper-level random factors do not affect lower-level
population means, they do affect the covariance structure of the data.
Indeed, adjusting for this is a central point of LMM models and is why
linear mixed models are used instead of regression and GLM, which
assume independence. (pp. 5-6).
When this concept is tied back to the interpretation of research results, the effect
of inaccurate standard errors in general linear models is an inflation of the Type I
error rate (i.e., concluding that a relationship exists between variables when it does
not).
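This inflation is easy to demonstrate by simulation: below, a group-level predictor and a group-level random effect make observations within groups correlated, yet a naive OLS slope test that ignores the grouping is applied. All names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(42)
n_groups, group_size, n_sims = 20, 25, 500
rejections = 0
for _ in range(n_sims):
    # x and y are unrelated by construction, so any "significant" slope
    # from the naive test is a Type I error.
    x = np.repeat(rng.normal(size=n_groups), group_size)  # group-level predictor
    u = np.repeat(rng.normal(size=n_groups), group_size)  # shared group effect
    y = u + rng.normal(size=n_groups * group_size)        # clustered errors
    xc, yc = x - x.mean(), y - y.mean()
    b = (xc @ yc) / (xc @ xc)                             # OLS slope
    resid = yc - b * xc
    se = np.sqrt((resid @ resid) / (x.size - 2) / (xc @ xc))  # naive SE
    rejections += abs(b / se) > 1.96
rate = rejections / n_sims  # empirical Type I error rate; nominal is .05
```

With this degree of clustering, the empirical rejection rate lands far above the nominal 5%, which is exactly the failure mode HLM is designed to avoid.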
Linear mixed modeling techniques have a broad array of language and
terminology tied to them, with labels oftentimes varying by modeling technique, author,
and field of study. Garson (2013) noted how terms for LMM models currently used in
various disciplines include random intercept modeling, random coefficients modeling,
random coefficients regression, random coefficient regression modeling, random effects
modeling, mixed effects modeling, hierarchical linear modeling, linear mixed modeling,
growth modeling, and longitudinal modeling. According to Garson (2013), “In sociology,
‘multilevel modeling’ is common, alluding to the fact that regression intercepts and
slopes at the individual level may be treated as random effects of a higher (ex.,
organizational) level. And in statistics, the term ‘covariance components models’ is often
used, alluding to the fact that in linear mixed models one may decompose the covariance
into components attributable to within-groups versus between-groups effects.” What links
all of these models, despite the variety of names, is that each approach statistically
accounts for the clustering of scores at the lowest level by at least one grouping variable
when the prediction model is calculated.
Although it quickly became apparent that there are a wide variety of potential
applications for HLM in psychological research, HLM is still a relatively new statistical
approach. Therefore, guidelines for HLM are still in the process of being developed
(Beaubien, Hamman, Holt, & Boehm-Davis, 2001; Raudenbush & Bryk, 2002).
Additionally, there are few articles published within the social sciences that employ
HLM, and fewer that discuss the approach with the level of detail needed for HLM
novices to fully digest the method and results. Luke (2004) and Hox (2010) provide
comprehensive overviews of the statistical foundations and the technical details of HLM
approaches, though the texts are written for readers with intermediate to advanced
knowledge of statistics, and they do not discuss the application of HLM within SPSS
specifically. Heck, Thomas, and Tabata (2010) provide an excellent example-based resource
for conducting HLM within SPSS, especially with regard to understanding the menu
options and specifications that are specific to the SPSS software. They also include their
syntax in the text, and provide a copy of the database they use in the examples. Although
his book is not entirely specific to SPSS, Garson (2013) provides a comprehensive
overview of HLM written for those with intermediate-level knowledge of statistics, and he
includes straightforward summaries of output within the HLM example chapters.
Using HLM allowed for more accurate exploration of the Criterion Database
because it can correctly model error terms that are correlated (rather than independent of
each other) due to repeated measures (i.e., responses) occurring within the 10
sequentially administered specific cards. The modeling approach used for the current
study, as well as the terminology used in describing the models and the results, closely
follows conventions established by Garson (2013). I completed a series of linear mixed
models that included hierarchical (i.e., nested) data, and therefore are referred to as
hierarchical linear models. The models were used to explore differences between groups
as well as within them. With nested data, the variables can be conceptualized as falling
within different levels in the data. Level-1 is the lowest level of the data hierarchy, and
level-1 variables are nested within level-2 groupings, which are nested within level-3
groupings and so on. These grouping variables are also referred to as cluster variables or
subject variables.
Variables can be defined in a variety of ways within HLM. The models include a
dependent variable, also called the predicted variable. This variable must be a level-1
variable, meaning it occurs at the lowest level of measurement in the model. For the
Criterion Database, level-1 variables occur at the Rorschach response level. Predictor
variables can include level-1 variables, as well as variables from other levels of the
hierarchy. Predictor variables can be entered as fixed effects and/or random effects. Fixed
effects are effects that impact the intercept of the dependent variable (i.e., the mean of the
dependent variable when all other predictors are set at zero) in the model. Fixed effects
are generally thought of as variables whose values of interest are all represented in the
data file, and predictors of interest should be included as fixed effects. Random effects
are effects that impact the covariance structure of the data. Random effects are typically
modeled for variables whose values can be considered a random sample from a larger
population. They are useful for accounting for excess variability in the dependent
variable. An effect can also be specified as both fixed and random, if it contributes to
both the intercept and the covariance structure in the model.
Fixed effects are specified as factors or covariates. A factor is an independent
categorical variable that defines groups of cases, and each unique group is assigned a
fixed effect parameter estimate that indicates how each group membership impacts the
intercept of the dependent variable. Factors consist of different nominal levels, not to be
confused with the cluster/grouping levels of the hierarchy of data. The levels of a factor
equate to the data values of the factor, and each level can have a different linear effect on
the value of the dependent variable. For example, Rorschach card number could be
specified as a fixed effect factor at level-2 of the data hierarchy, with factor levels being
1-10, identifying the 10 cards that are the possible data values for the variable. A
covariate is an independent dimensional scale variable, and changes in the value of a
covariate should be linearly associated with changes in the value of the dependent
variable. Scale predictors should be selected as covariates in the model because within
combinations of factor levels, values of covariates are assumed to be linearly correlated
with values of the dependent variable. Fixed effect covariates are also assigned fixed
effect parameter estimates that indicate how the value of the covariate impacts the
intercept of the dependent variable.
Variables can also be defined as repeated effects. Repeated effects variables are
the variables that mark multiple observations of a single subject. Specification of
repeated effects (i.e., repeated measure variables) is a way to relax the assumption of
independence of the error terms. Subject variables (i.e., grouping variables) are used to
define the individual subjects of the repeated measurements, and by identifying subject
variables, the error terms for each individual specified by the subject variable are treated
as independent of those of other individuals in the model. The covariance structure that is
applied in the model is used to specify the relationship between the levels of the repeated
effects. There are a variety of covariance matrix structures available, allowing for
residual terms with a wide variety of variances and covariances.
To explore the structure of the Criterion Database, HLM modeling was used to
predict the response-level FA score and PF scores (PFM and PFN1.5). The predictor
variables included response number within a card (R_InCard), the specific card the
response was delivered to, Diagnostic Severity, and the individual subject (the subject ID
number). These were used as components in the various HLM models. Modeling of each
variable began with null models, which are random intercept models. In the 2-level null
model, the intercept of the dependent variable (level-1; i.e., FA, PFM, or PFN1.5) was
predicted as a random effect of the identified grouping variable (level-2; i.e., card number
or R_InCard), with no other predictors included in the model. For the 3-level null model,
a second grouping variable (level-3; i.e., Diagnostic Severity) was added as a random
effect predictor of the dependent variable’s intercept. These models can also be
considered one-way ANOVA models with random effects. The null models were used to
determine whether the data demonstrate a hierarchical structure with both between-group
and within-group variance. Following the null models, predictors can be added to
the model and additional structure can be specified. Various combinations and iterations
of the predictor variables R_InCard, card number, Diagnostic Severity, and ID number
were used to specify fixed effects (i.e., predictor equations) and random effects (i.e.,
variance not accounted for by the predictor equations). Fixed effects included both main
effects, as well as cross-level interaction effects. Revisions were also made to the
covariance matrices when appropriate.
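Since the null models can be viewed as one-way ANOVA models with random effects, their variance decomposition can be sketched with method-of-moments (ANOVA) estimators. This illustrative Python version assumes balanced groups for simplicity, whereas the actual analyses used SPSS with its own estimation routines:

```python
import numpy as np

def null_model_components(scores, groups):
    """ANOVA-style estimates of the variance components in a random
    intercept (null) model: tau00 (between-group) and sigma2 (within-group),
    assuming balanced groups for simplicity."""
    scores, groups = np.asarray(scores, dtype=float), np.asarray(groups)
    labels = np.unique(groups)
    g, n_per = labels.size, scores.size // labels.size
    grand = scores.mean()
    group_means = np.array([scores[groups == l].mean() for l in labels])
    ms_between = n_per * ((group_means - grand) ** 2).sum() / (g - 1)
    ss_within = sum(((scores[groups == l] - scores[groups == l].mean()) ** 2).sum()
                    for l in labels)
    ms_within = ss_within / (g * (n_per - 1))
    tau00 = max((ms_between - ms_within) / n_per, 0.0)
    return tau00, ms_within  # intraclass corr. = tau00 / (tau00 + sigma2)
```

A large tau00 relative to sigma2 corresponds to the significant between-group clustering that justifies building the multilevel model.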
Of note, SPSS does not allow for the reference category to be changed within the
HLM syntax or dropdown specifications. Therefore, the R_InCard variable and card
number variable were recoded, and these recoded variables were used in place of the
original variables for all HLM modeling. The recoding allowed for the reference
categories to be set to Card 1 (instead of Card 10) and response 1 within card (instead of
response 4 within card). By using the recoded variables, when interpreting the results,
comparisons were made to responses on the first card instead of the last card, and to the
first response within each card, instead of the last response within each card. This makes
for more intuitive interpretations of the results. It also makes the reference category the
most frequent category, as all patients gave at least one response to each card; most did
not give a fourth.
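If, as in SPSS's MIXED procedure, the highest-coded level of a factor serves as the reference category, the recoding amounts to reversing the codes so that the desired reference category sorts last. A sketch (the actual recode scheme used in the study is not specified in the text):

```python
def recode_card(card):
    """Reverse card codes 1..10 -> 10..1 so Card 1 sorts last and
    becomes the reference category under a last-category convention."""
    return 11 - card

def recode_r_in_card(r):
    """Reverse response-within-card codes 1..4 -> 4..1 so response 1
    becomes the reference category."""
    return 5 - r
```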
Supplemental analysis strategies. HLM was the initial approach used in
modeling the data. However, relationships between variables were smaller than expected,
limiting the usefulness of HLM. Therefore, supplemental strategies were employed to
further explore the structure and relationship between variables. The strategies included
simple correlation coefficients and tables, as well as graphical representations of the data.
The initial descriptive statistics, as well as the HLM analyses, made use of the
data at the response level. In completing the supplemental analyses, first the data were
aggregated at the protocol level. For the protocol level aggregation, each person ended up
with a protocol level score that was computed as a mean of the response level scores for
FA and both PF variables. These mean scores were computed overall, individually for
each card, and sequentially for each first through fourth response to a card.
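The protocol-level aggregation corresponds to a set of grouped means; a minimal pandas sketch with a hypothetical response-level fragment (column names are illustrative):

```python
import pandas as pd

# Hypothetical response-level fragment of the Criterion Database
resp = pd.DataFrame({
    "id":       [1, 1, 1, 1, 2, 2],
    "card":     [1, 1, 2, 2, 1, 2],
    "r_incard": [1, 2, 1, 2, 1, 1],
    "FA":       [4, 3, 3, 2, 5, 4],
})

# Protocol-level score: each person's mean response-level FA
overall = resp.groupby("id")["FA"].mean()
# The same means computed per card, and per response position within card
by_card = resp.groupby(["id", "card"])["FA"].mean()
by_pos = resp.groupby(["id", "r_incard"])["FA"].mean()
```

The same three groupings would be repeated for PFM and PFN1.5 to produce the full set of protocol-level scores.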
Chapter Four
Results
Interrater Reliability
There were 250 responses in the 10 protocols independently coded by S. Horn and
N. Bromley. The response-level single measure ICC was .75, indicating good to excellent
interrater reliability (Cicchetti, 1994).
Frequency Tables: Descriptives
Within the U.S. Sample (protocols n = 262), I identified Rorschach responses that
had no available match number for one or more objects in the response. Each of those
previously unlisted objects was assigned a unique match number and their frequency was
determined. FA ratings were assigned to each of those new objects that occurred in at
least 1.5% of the U.S. protocols. The five new objects and their associated data values are
provided in Table 1.
Table 1
New Response Objects Derived From the U.S. Frequency Sample

Card                                                Median FA   Frequency
Number  Location  Angle  Object                     Rating      (% U.S. Protocols)
1       W                Airplane                   4.00        1.53
3       D1        v      Frog                       3.00        1.53
4       D4        v      Penguin Head               4.00        3.44
5       W                Shoes (2; toes             2.00        1.91
                         pointing out)
9       W                Goblet or Trophy Cup       3.00        1.53

Note. The Angle identifier "v" indicates the card was held at a 180-degree rotation.
Criterion Database: Descriptives
The Criterion Database contained 159 valid Rorschach protocols with
accompanying Diagnostic Severity scores available. Of the 3,979 responses in the
database, there were 3,897 responses with complete response-level data. The 82
responses that were not assigned an FA score and also not matched with PF data most
often lacked any type of form, although some were verbalizations that were not
considered valid responses to the task (e.g., “Blue inkblots”; “Something on each side so
it's symmetrical. That's good enough for that.”).
Table 2 provides descriptive statistics for the primary variables used in the
primary analyses. As demonstrated in the table, Diagnostic Severity had the full range of
possible values represented in the sample, with a moderately high mean score (M = 3.52,
SD = 1.06). Response-level FA scores also covered the full possible range of values, with
the mean FA score being 3.32 (SD = 1.00).
As a reminder, PFM is the mean of the international percentage-based PF values
across all the objects in a response (and those PF values are themselves mean values
computed across all six of the PF databases when the frequencies were 1.5% or higher). It
is the response-level average of the object-level scores. The descriptive statistics indicate
that PFM scores ranged from 0 to 63.25. At the low end, participants gave responses that
contained only objects that did not show up in any of the six countries at a frequency of
1.5% or higher. Recall that a value of zero was applied to all objects that had a frequency
less than 1.5% in a given sample because it was impractical to have all objects in each
sample translated into English, and impossible in the case of the Italian sample. On the
high end, at least one participant gave a response with a PFM score of 63.25.
This score indicates that, within a single response, the response objects were present,
on average, in 63.25% of the protocols across samples. In other words, people were delivering
responses containing objects that more than half of people in the comparison samples
also saw. The mean of the response-level PFM score across all responses and protocols
(M = 8.80, SD = 14.37) indicates that, on average, people delivered responses with
objects that about 9% of people in the comparison samples also saw.
PFN1.5 is the mean of the international object-level count-based PF values within
a response. It is the response-level average of the object-level count-based scores and it
indicates on average how often the objects in a response appeared with a frequency of
1.5% or more across the six samples. The observed range for PFN1.5 was 0-6. At the low
end, people gave responses containing only objects that were observed in less than 1.5%
of protocols across all 6 comparison samples. At the high end of the range, people gave
responses containing objects that were present in all six samples at a frequency of 1.5%
or higher. This means that some people gave responses only containing an object or
objects that occurred with a frequency of 1.5% or higher in all 6 samples. The average of
the PFN1.5 variable was 2.37 (SD = 2.43). On average, people gave response objects that
were present in 2.37 of the six samples at a frequency of 1.5% or higher.
Table 2
Descriptive Statistics for the Criterion Database

                       M      SD     Min    Max    Skew   Kurtosis
Diagnostic Severity    3.52   1.06   1.00   5.00  -0.11   -0.96
FA                     3.32   1.00   1.00   5.00  -0.34   -0.69
PFM                    8.80  14.37   0.00  63.25   2.02    3.36
PFN1.5                 2.37   2.43   0.00   6.00   0.39   -1.51
Table 3 provides mean values of the primary Rorschach variables for the Criterion
Database, organized by card number and by R_InCard. As anticipated, the cards differ in
complexity and show different mean scores for the Rorschach variables reported in the
table. Additionally, the means vary according to
which response within a card a person is on. For some cards, the average response had a
fairly high level of fit (FA) and frequency (PFM and PFN1.5). For example, on Card 5,
the response-level FA scores (M = 3.79), as well as the PFM (M = 17.47) and PFN1.5 (M
= 3.37) scores are high compared to other cards. On Card 9, we see that the scores for FA
(M = 2.78), PFM (M = 1.32), and PFN1.5 (M = 1.05) are much lower. This example
indicates that, on average, people gave responses to Card 9 that had lower fit scores (FA
scores) and contained less-common response objects than the responses that people
tended to deliver on Card 5.
Of additional value, Table 3 demonstrates the variation in fit and object frequency
for responses as a function of which response within a card is being examined, both
within cards and across cards. Response-level FA decreases, on average, as a person
delivers each additional response within a card (R_InCard 1 M = 3.57; R_InCard 2 M =
3.23; R_InCard 3 M = 3.07; and R_InCard 4 M = 2.94). The same pattern holds true for
the variables PFM and PFN1.5, when examined according to the variable levels for
R_InCard. Within each card, the trend for fit and frequency scores decreasing with each
subsequent response is highly consistent as well.
Table 3
Mean Values by Card Number and R_InCard for the Criterion Database

                                           Card Number
        R_InCard      1      2      3      4      5      6      7      8      9     10  Total
FA      1          3.87   3.47   3.39   3.61   4.34   3.34   3.65   3.60   2.84   3.50   3.57
        2          3.50   3.38   3.45   3.12   3.48   3.03   3.18   3.07   2.86   3.22   3.23
        3          3.26   3.26   3.19   3.03   3.07   2.92   3.03   3.00   2.60   3.20   3.07
        4          2.99   3.06   3.05   2.80   2.99   2.72   2.93   3.00   2.53   3.09   2.94
        Total      3.56   3.38   3.36   3.29   3.79   3.13   3.35   3.26   2.78   3.29   3.32
PFM     1         15.13  10.16  27.35  13.35  28.85   6.29  12.62  19.39   1.51   4.78  14.04
        2          7.22   5.94  11.56   5.51  10.67   3.48   4.80   8.68   1.49   3.26   6.25
        3          5.63   5.56   6.85   3.22   4.33   2.10   2.35   7.34   0.65   3.08   4.18
        4          2.61   5.44   8.52   1.31   0.35   2.93   2.81   2.91   1.10   2.23   2.96
        Total      9.60   7.51  16.95   8.17  17.47   4.42   7.69  12.18   1.32   3.61   8.80
PFN1.5  1          4.12   3.05   3.54   3.10   4.68   1.92   3.96   2.85   1.17   2.99   3.15
        2          2.91   2.35   2.59   1.77   2.67   1.37   2.12   1.80   1.22   2.19   2.11
        3          1.85   2.16   1.88   1.39   1.92   0.77   1.54   1.14   0.52   2.14   1.56
        4          1.18   1.52   1.55   0.76   0.44   0.80   1.63   0.98   0.77   1.52   1.18
        Total      3.05   2.54   2.79   2.21   3.37   1.48   2.80   2.03   1.05   2.36   2.37
N       1           158    156    159    157    159    159    158    153    150    152   1561
        2           152    144    139    143    133    137    139    136    135    143   1401
        3            80     70     73     61     48     60     56     68     67     95    678
        4            28     25     20     19     16     20     16     32     24     57    257
        Total       418    395    391    380    356    376    369    389    376    447   3897
Criterion Database: HLM
HLM models for FA. FA Model 1 was the 2-level null model. The intercept of
the FA scores (level-1) was specified as a random function of card number (level-2
grouping variable). The only fixed effect specified was the level-1 intercept. The model
fit was -2LL = 10905.25 (see Table 4 for a statistical summary of all FA HLM Models),
and the SPSS Type III Test of Fixed Effects table indicated a significant card number
effect on FA scores (F = 1810.03, p < .05), signaling that constructing a multilevel model
was an appropriate way to explore the structure of the data. In other words, there was
significant between-card variation in FA. The SPSS Estimates of Covariance Parameters
table also indicated that the clustering of FA scores by card number (as a level-2 random
effect) accounted for a significant portion of the total variance (Estimate = 0.06, p < .05).
The residual component signaled that there was a significant amount of FA score variance
that was not accounted for by the model (Estimate = 0.95, p < .05). Thus, there was
evidence of unexplained within-card variation in FA scores.
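The two variance components reported for Model 1 imply an intraclass correlation, i.e., the share of total FA variance that lies between cards. A quick sketch of that arithmetic:

```python
# Sketch: the intraclass correlation implied by FA Model 1's variance
# components (card-level Estimate = 0.06, residual Estimate = 0.95).
# ICC = between-card variance / total variance.
between_card = 0.06
residual = 0.95

icc = between_card / (between_card + residual)
print(round(icc, 3))  # about 6% of FA variance lies between cards
```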
FA Model 2 was the 3-level null model. As in the 2-level null model, there were no
predictors at any level; the only specified fixed effect was the level-1 intercept. The
intercept of FA (level-1) was modeled with an accounting of the card number effect
(level-2) and the possible grouping effect by person (ID number at level-3). The model fit
statistic was slightly higher (-2LL = 11074.26), indicating a worse fit than the 2-level null
model. The test of fixed effects remained significant (F = 26502.56, p < .05) as expected,
indicating variance in the intercept attributable to higher-order effects. In the SPSS
Estimates of Covariance Parameters table, the within-person residual component
(Estimate = 0.98, p < .05) was higher than in the 2-level null model, indicating more
unexplained variance in FA than in Model 1. The between-card effects within person
component (card number*ID number Estimate = 0.01, p = .78) was reduced to a non-significant level in Model 2, due to adding the ID number component as a level-3
grouping variable. The between-person effects account for a small amount of variance in
FA (ID number component Estimate = .03, p < .05).
FA Model 3 was a 3-level Random Intercepts Model. A predictor variable,
R_InCard (level-1) was added as a fixed factor to Model 2 to account for possible trends
due to R_InCard. As compared to the null models, Model 3 had slightly better fit (-2LL =
10890.04). The test of fixed effects revealed significant main effects for the intercept (F =
18793.36, p < .05) and for R_InCard (F = 63.34, p < .05), indicating there was variance
in the intercept of FA attributable to higher-order effects as well as R_InCard. In the
SPSS Estimates of Covariance Parameters table, the components indicated there was still
variance in FA that was attributable to between-person effects (ID number component
Estimate = 0.02, p < .05) as well as unexplained within-person variance (residual
component Estimate = 0.92, p < .05). In this model, a regression equation for FA was
built for each group rather than having a single regression equation across all groups (i.e.,
people). Because R_InCard was modeled as a fixed effect predictor with no
corresponding R_InCard random effects, the slope coefficients were the same for each
regression line (i.e., each ID number). In other words, the regression lines were calculated
to indicate how FA scores were impacted by R_InCard, with that impact (i.e., the slope
coefficients) being consistent across groups (i.e., people), but with each person having a
different FA intercept (i.e., a different FA mean across the responses within their
protocol). The SPSS Estimates of Fixed Effects table gives estimates of individual
parameters, and was generated in response to having identified fixed effect predictors:
the FA intercept and R_InCard as a fixed factor (level-1). The parameter estimate for the
intercept of FA (Estimate = 3.57, p < .05) indicates the mean value of FA when all
predictors are set at zero. The remaining parameter estimates indicated that, when using
R_InCard = 1 as the reference category, predicted FA was highest on the first response
within a card (Estimate = 0.00; i.e., no change from 3.57), and slightly lower for each
subsequent response within a card (second response Estimate = -0.33, p < .05; third
response Estimate = -0.49, p < .05; fourth response Estimate = -0.61, p < .05).
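The Model 3 fixed effects translate directly into one predicted FA value per response position. A small sketch of that calculation:

```python
# Sketch: predicted FA per response position under Model 3, with
# R_InCard = 1 as the reference category (effect 0.00).
intercept = 3.57
r_in_card_effect = {1: 0.00, 2: -0.33, 3: -0.49, 4: -0.61}

predicted_fa = {r: round(intercept + e, 2) for r, e in r_in_card_effect.items()}
print(predicted_fa)  # {1: 3.57, 2: 3.24, 3: 3.08, 4: 2.96}
```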
FA Model 4 was a 3-level Random Intercepts Model with repeated measures. The
specification of card number as a level-2 grouping variable was removed, and R_InCard
(level-1) within card number (level-2) was specified as a repeated measure with a scaled-identity matrix covariance structure. This allowed for modeling the possible correlation
of residual errors due to R_InCard within card number being a repeated measure within
subject (ID number, at level-3). As compared to Model 3, Model 4 fit was essentially
unchanged (-2LL = 10891.88). The main effects for the intercept (F = 18851.17, p < .05)
and for R_InCard (F = 62.33, p < .05) remained significant. The Estimates of Fixed
Effects showed parameter estimates for the FA intercept and for R_InCard that were also
essentially unchanged. In the SPSS Estimates of Covariance Parameters table, the
components indicated variance in FA that was attributable to between-person effects (ID
number component Estimate = 0.02, p < .05) as well as significant within-person
repeated measures variance (repeated measures Estimate = 0.94, p < .05).
FA Model 5 was also a 3-level Random Intercepts Model with repeated measures.
As compared to Model 4, card number (level-2) was added as a fixed factor predictor
variable to account for possible predictive trends due to card number. As compared to the
previous FA models, Model 5 fit was substantially improved (-2LL = 10648.04). The
main effects for the intercept (F = 18981.03, p < .05) and for R_InCard (F = 64.37, p <
.05) remained significant, and card number entered the model as a main effect (F = 28.00,
p < .05). Within the SPSS Estimates of Fixed Effects table, the parameter estimates for
R_InCard displayed the same pattern as in Models 3 and 4, with predicted FA being
lower for later responses within a card. The parameter estimates for card number
indicated that, when using card number = 1 as the reference category, predicted FA also
differs for every card. In order, predicted FA was highest for Card 5 (Estimate = 0.19, p <
.05), followed by Card 1 (Estimate = 0.00), Card 2 (Estimate = -0.20, p < .05), Card 3
(Estimate = -0.22, p < .05), Card 10 (Estimate = -0.23, p < .05), Card 7 (Estimate = -0.23,
p < .05), Card 4 (Estimate = -0.29, p < .05), Card 8 (Estimate = -0.30, p < .05), Card 6
(Estimate = -0.45, p < .05), and Card 9 (Estimate = -0.79, p < .05). In the SPSS Estimates
of Covariance Parameters table, the between-person effects remained (ID number
component Estimate = 0.02, p < .05) as well as the slightly reduced repeated measures
variance (repeated measures Estimate = 0.88, p < .05).
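The card-number ordering reported for Model 5 can be recovered by sorting the parameter estimates. A small sketch (note that Cards 7 and 10 tie at -0.23, so their relative order within the sorted list is arbitrary):

```python
# Sketch: recovering the descending predicted-FA card ordering from
# Model 5's card-number estimates (Card 1 is the reference category).
card_effect = {1: 0.00, 2: -0.20, 3: -0.22, 4: -0.29, 5: 0.19,
               6: -0.45, 7: -0.23, 8: -0.30, 9: -0.79, 10: -0.23}

order = sorted(card_effect, key=card_effect.get, reverse=True)
print(order)  # highest predicted FA first: Card 5, then 1, 2, 3, ...
```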
FA Model 6, like Models 4 and 5, was also a 3-level Random Intercepts Model
with repeated measures. In Model 6, a factor-factor cross-level interaction term
(R_InCard*card number) was added to the list of fixed effects. The interaction term was
used to model the effect of card number (level-2) on R_InCard (level-1) in predicting FA.
More specifically, since both R_InCard and card number are specified as factors (not
covariates), the interaction term was used to explore the possibility that each unique
combination of factor levels might have a different linear effect on FA. Model fit was
again improved (-2LL = 10563.04). The main effects for the intercept (F = 18071.57, p <
.05), R_InCard (F = 68.08, p < .05), and card number (F = 11.53, p < .05) remained
significant, and the R_InCard*card number interaction term entered the model as a small
but statistically significant fixed effect (F = 3.18, p < .05). Within the SPSS Estimates of
Fixed Effects table, although the values changed slightly, the parameter estimates for
R_InCard displayed the same pattern as in Models 3, 4, and 5. Predicted FA was different
for each level of R_InCard, with the value being highest for the first response and lower
for each subsequent response within a card. Although card number also remained as a
main effect, and all cards still had a unique parameter estimate, the parameter estimates
demonstrated a slightly altered pattern as compared to Model 5. Predicted FA was still
highest for Card 5, followed by Card 1, and lowest for Cards 6 then 9. However, the
remaining cards had a slightly different pattern for the parameter estimates when placed
in descending order (i.e., Card 7, 4, and 8 were now higher than 10, 2, and 3). In
examining the interaction effect parameter estimates, predicted FA scores were set at the
FA mean intercept (Estimate = 0) for 13 of the 40 combinations of R_InCard*card
number. Eighteen others did not differ from that mean intercept at a statistically
significant level (p ≥ .05). The remaining nine interaction effect parameter estimates were
statistically different from the mean intercept. Relative to the baseline estimates provided
by the main effects, the interaction-based intercepts were higher for the 2nd response to
Cards 3 and 9; the 3rd response to Cards 2, 3, and 9; and the 4th response to Cards 9 and
10; and lower for the 2nd and 3rd response to Card 5. The interaction effect parameter
estimates can be interpreted as ways to adjust the main effects based on the exact
combination of factor levels. In general, FA did not decline as much as expected on
subsequent responses to Cards 3 and 9 but FA declined more than expected on Card 5,
with “expected” defined by the marginal means from card number and R_InCard. This
can be seen in the pattern of means in Table 3. In the SPSS Estimates of Covariance
Parameters table, the between-person effects remained (Estimate = 0.02, p < .05) as well
as the repeated measures variance (Estimate = 0.86, p < .05). The addition of the
interaction term would also be the reason the pattern of estimates for card number
changed. Though it may be a more precise model, it is also more complex to understand
at a conceptual level when examining parameter estimates due to the sheer number of
effects that are modeled, and the fact that the various fixed effects impact each other in
the modeling process.
In FA Model 7, the scaled-identity matrix covariance structure was replaced with
a diagonal matrix covariance structure for the repeated measures specification. With a
diagonal matrix, as with the scaled-identity matrix, residual covariances between
measurement occasions are assumed to be independent of each other (i.e., equal to 0.0);
the diagonal matrix is typically the default specification for repeated measures. The difference between
the scaled-identity matrix and the diagonal matrix is that the diagonal matrix permits
unequal variances and thus estimates a different FA variance for the 1st, 2nd, 3rd, and 4th
response to a card. The notably lowered model fit statistic in Table 4 (-2LL = 10398.06)
indicated an improvement in the model by allowing for unequal variances across
sequential responses within cards, with later responses within card having higher variance
estimates. As anticipated, the fixed effects (i.e., the main effects and the interaction term)
remained significant. Within the SPSS Estimates of Fixed Effects table, the main effect
of R_InCard retained the same pattern of estimates as in Models 3-6. The card number
main effect also retained the same pattern for the parameter estimates as seen in Model 6:
When placed in order of descending estimates, the factor levels were Card 5, 1, 7, 4, 8,
10, 2, 3, 6, then 9. Examination of the SPSS Estimates of Covariance Parameters table
confirmed that the between-person effects remained (Estimate = 0.02, p < .05) and that
the diagonal matrix specification was appropriate, as the covariance parameter estimates
for all R_InCard* card number combinations were statistically significant (p < .05),
supporting the assumption of no residual covariance between measurement occasions.
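The two covariance structures contrasted in Models 6 and 7 differ only on the diagonal. A minimal sketch of the two matrix forms; the Model 6 value is the reported repeated-measures estimate (0.86), while the four Model 7 diagonal entries are hypothetical placeholders, since the individual variance estimates are reported only as statistically significant:

```python
# Sketch: repeated-measures covariance structures for the four response
# positions within a card. Scaled identity forces one variance; the
# diagonal structure allows four distinct variances.
def scaled_identity(variance, n=4):
    return [[variance if i == j else 0.0 for j in range(n)] for i in range(n)]

def diagonal(variances):
    n = len(variances)
    return [[variances[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

same_var = scaled_identity(0.86)            # Model 6: one shared variance
free_vars = diagonal([0.8, 0.9, 1.0, 1.1])  # Model 7: hypothetical increasing variances
```

Off-diagonal entries are zero in both structures, matching the assumption of no residual covariance between measurement occasions.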
For FA Model 8, Diagnostic Severity was added as a fixed effect covariate (level-3). The model would still be classified as a 3-level Random Intercepts Model for FA with
Repeated Measures, but with an additional fixed effect predictor variable specified. The
model fit statistic was relatively unchanged (-2LL = 10391.13). The fixed effects
remained significant for the intercept (F = 2640.59, p < .05), R_InCard (F = 70.28, p <
.05), card number (F = 11.55, p < .05), and the R_InCard*card number interaction term
(F = 3.70, p < .05). Additionally, Diagnostic Severity entered the model as a main effect
(F = 7.17, p < .05). Within the SPSS Estimates of Fixed Effects table, the intercept is
slightly higher than in the beginning models (Estimate = 4.04, p < .05). The main effect
of R_InCard retained the same pattern of estimates as in Models 3-7, with predicted FA
being lower for later responses within a card (first response Estimate = 0; second
response Estimate = -0.37, p < .05; third response Estimate = -0.61, p < .05; fourth
response Estimate = -0.89, p < .05). The card number main effect also retained the same
pattern for the parameter estimates as seen in Models 6 and 7: When placed in order of
descending estimates, predicted FA was highest for Card 5 (Estimate = 0.47, p < .05),
followed by Card 1 (Estimate = 0.00), Card 7 (Estimate = -0.22, p < .05), Card 4
(Estimate = -0.26, p < .05), Card 8 (Estimate = -0.27, p < .05), Card 10 (Estimate = -0.37,
p < .05), Card 2 (Estimate = -0.40, p < .05), Card 3 (Estimate = -0.48, p < .05), Card 6
(Estimate = -0.53, p < .05), and Card 9 (Estimate = -1.03, p < .05). Diagnostic Severity
was specified as a fixed effect covariate (i.e., a linear variable, as opposed to a nominal
fixed effect factor), so the parameter estimate demonstrated a singular linear effect of
Diagnostic Severity on predicted FA scores (Estimate = -0.05, p < .05), with higher
Diagnostic Severity scores predicting slightly lower FA scores, on average. The SPSS
Estimates of Covariance Parameters table was essentially unchanged (ID number
Estimate = 0.02, p < .05; all R_InCard* card number covariance parameters had p < .05).
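Because all of Model 8's predictors enter additively, a predicted FA score is simply the sum of the intercept, the relevant position and card effects, and the Diagnostic Severity slope. A small sketch using the reported estimates:

```python
# Sketch: additive prediction under FA Model 8 (intercept 4.04,
# Diagnostic Severity slope -0.05 per unit). The response-position and
# card effects are taken from the estimates reported in the text.
def predicted_fa(r_effect, card_effect, dx_severity):
    return round(4.04 + r_effect + card_effect - 0.05 * dx_severity, 2)

# First response (effect 0.00) to Card 5 (effect 0.47), Diagnostic Severity 2:
print(predicted_fa(0.00, 0.47, 2))   # 4.41
# Fourth response (-0.89) to Card 9 (-1.03), Diagnostic Severity 0:
print(predicted_fa(-0.89, -1.03, 0)) # 2.12
```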
In FA Model 9, two factor-covariate cross-level interaction terms
(R_InCard*Diagnostic Severity and card number*Diagnostic Severity) were added to the
list of fixed effects. The interaction terms were used to model the possible effects of
R_InCard (level-1) and card number (level-2) on Diagnostic Severity (level-3) in
predicting FA. In other words, the linear relationship between Diagnostic Severity and
FA (i.e., the slope of Diagnostic Severity) could change for different levels of R_InCard
and card number. The model fit statistic was again relatively unchanged (-2LL =
10385.32). The fixed effects remained significant for the intercept (F = 1864.54, p < .05),
R_InCard (F = 7.10, p < .05), card number (F = 2.01, p < .05), and the R_InCard*card
number interaction term (F = 3.71, p < .05). However, Diagnostic Severity dropped out
of the model as a main effect (F = 3.62, p = .06), and neither of the new interaction terms
were significant (R_InCard*Diagnostic Severity F = 0.27, p = .85; card
number*Diagnostic Severity F = 0.56, p = .83). In essence, the slope of the previously seen linear relationship between Diagnostic Severity and FA was not altered according to
different levels of R_InCard or card number. The model also seems to be over-specified,
as a main effect was lost.
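The conclusion that Model 9 is over-specified can also be framed as a comparison of the -2LL values for the nested Models 8 and 9, a likelihood-ratio-style check. This is a sketch under the assumption that the deviances are ML-based (under REML, deviance comparisons across models with different fixed effects are not strictly valid); the df = 12 critical value is a standard chi-square table entry:

```python
# Sketch: deviance comparison of FA Models 8 and 9. Model 9 adds 12
# interaction parameters (3 for R_InCard*Dx Severity, 9 for card
# number*Dx Severity) to Model 8.
model8_deviance = 10391.13
model9_deviance = 10385.32
chi2_crit_df12 = 21.03  # chi-square critical value, df = 12, alpha = .05

delta = model8_deviance - model9_deviance
print(round(delta, 2))            # 5.81
print(delta < chi2_crit_df12)     # True: the 12 added parameters do not
                                  # significantly improve fit
```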
Table 4
Statistical Summary of FA HLM Models for the Criterion Database

                                           Type III Tests of Fixed Effects
Model / Effect                   -2LL    Num df   Denom df         F        p
Model 1                      10905.25
  Intercept                                   1       9.97   1810.03    < .01
Model 2                      11074.25
  Intercept                                   1     152.61  26502.56    < .01
Model 3                      10890.04
  Intercept                                   1     278.79  18793.36    < .01
  R_InCard                                    3    3079.55     63.34    < .01
Model 4                      10891.88
  Intercept                                   1     276.96  18851.17    < .01
  R_InCard                                    3    3844.18     62.33    < .01
Model 5                      10648.04
  Intercept                                   1     276.14  18981.03    < .01
  R_InCard                                    3    3841.59     64.37    < .01
  Card Number                                 9    3775.92     28.00    < .01
Model 6                      10563.04
  Intercept                                   1     300.05  18071.57    < .01
  R_InCard                                    3    3844.16     68.08    < .01
  Card Number                                 9    3833.64     11.53    < .01
  R_InCard * Card Number                     27    3789.97      3.18    < .01
Model 7                      10398.06
  Intercept                                   1     243.17  17916.73    < .01
  R_InCard                                    3     653.15     69.90    < .01
  Card Number                                 9      95.45     11.56    < .01
  R_InCard * Card Number                     27     141.23      3.68    < .01
Model 8                      10391.13
  Intercept                                   1     162.49   2640.59    < .01
  R_InCard                                    3     647.19     70.28    < .01
  Card Number                                 9      95.07     11.55    < .01
  R_InCard * Card Number                     27     142.79      3.70    < .01
  Dx Severity                                 1     151.85      7.17      .01
Model 9                      10385.32
  Intercept                                   1     241.87   1864.54    < .01
  R_InCard                                    3     971.77      7.10    < .01
  Card Number                                 9     604.47      2.01      .04
  R_InCard * Card Number                     27     140.36      3.71    < .01
  Dx Severity                                 1     244.28      3.62      .06
  R_InCard * Dx Severity                      3     957.52      0.27      .85
  Card Number * Dx Severity                   9     639.76      0.56      .83

Note. The identifier "Dx Severity" refers to Diagnostic Severity.
HLM models for PFM. PFM Model 1 was the 2-level null model. The intercept
of the PFM scores (level-1) was specified as a random function of card number (level-2
grouping variable). The only fixed effect specified was the level-1 intercept. The model
fit was -2LL = 31367.94 (see Table 5 for a statistical summary of all PFM HLM Models),
and the SPSS Type III Test of Fixed Effects table indicated a significant card number
effect on PFM scores (F = 30.54, p < .05), signaling that, like with FA, constructing a
multilevel model was an appropriate way to explore the structure of the PFM data. In
other words, there was significant between-card variation in PFM. The SPSS Estimates of
Covariance Parameters table also indicated that the clustering of PFM scores by card
number (as a level-2 random effect) accounted for a significant portion of the total
variance (Estimate = 25.41, p < .05). The residual component signaled that there was a
significant amount of PFM score variance that was not accounted for by the model
(Estimate = 181.47, p < .05). Thus, there was evidence of unexplained within-card
variation in PFM scores.
PFM Model 2 was the 3-level null model. As in the 2-level null model, there were
no predictors at any level; the only specified fixed effect was the level-1 intercept. The
intercept of PFM (level-1) was modeled with an accounting of the card number effect
(level-2) and the possible grouping effect by person (ID number at level-3). The model fit
statistic was higher (-2LL = 31825.05), indicating a worse fit than the 2-level null model.
The test of fixed effects remained significant (F = 1204.44, p < .05), as expected,
indicating variance in the intercept attributable to higher-order effects. In the SPSS
Estimates of Covariance Parameters table, the residual component (Estimate = 200.72, p
< .05) was higher than in the 2-level null model, indicating more unexplained within-person variance in PFM than in Model 1. The variance in PFM attributable to between-card effects within person (card number*ID number component Estimate = 3.99, p = .34)
was reduced to a non-significant level in Model 2, due to adding the ID number
component as a level-3 grouping variable. The variance attributable to between-person
effects was also non-significant (ID number component Estimate = 1.72, p = .17).
PFM Model 3 was a revised 2-level null model, in which the intercept of the PFM
scores (level-1) was specified as a random function of ID number (level-3 grouping
variable). The only fixed effect specified was the level-1 intercept. The model fit (-2LL =
31826.00) was essentially unchanged as compared to Model 2. The SPSS Type III Test of
Fixed Effects table indicated variance in the intercept attributable to higher-order effects
on PFM scores (F = 1197.99, p < .05), continuing to signal the need for a multi-level
model. However, the SPSS Estimates of Covariance Parameters table indicated that PFM
scores did not have significant variance accounted for by the component ID number as a
level-3 random effect (Estimate = 2.00, p = .10).
PFM Model 4 was a revision of Model 2 (i.e., the 3-level null model). Model 4
was designed as a 3-level Random Intercepts Model for PFM. A predictor variable,
R_InCard (level-1), was added as a fixed effect factor. As compared to Model 2, Model 4
had improved fit (-2LL = 31437.05). The test of fixed effects revealed significant main
effects for the intercept (F = 559.87, p < .05) and for R_InCard (F = 137.76, p < .05),
indicating there was variance in the intercept of PFM attributable to higher-order effects
as well as R_InCard. The SPSS Estimates of Fixed Effects table listed significant unique
estimates for each parameter. The parameter estimate for the intercept of PFM (Estimate
= 14.04, p < .05) indicates the mean value of PFM when all predictors are set at zero (i.e.,
on average, the objects seen in responses by these participants were seen by about 14% of
others). The remaining parameter estimates indicated that, when using R_InCard = 1 as
the reference category, predicted PFM was highest on the first response within a card
(Estimate = 0.00; i.e., no different from 14.04), and lower for each subsequent response
within a card (second response Estimate = -7.78, p < .05; third response Estimate = -9.79,
p < .05; fourth response Estimate = -10.89, p < .05). In the SPSS Estimates of Covariance
Parameters table, the components indicated that a small amount of the variance in PFM was
attributable to between-card effects within person (card number*ID number component
Estimate = 11.50, p < .05), but not between-person effects (ID number component
Estimate = 0.14, p = .90). The majority of the within-person variance in PFM remains
unexplained (residual component Estimate = 175.60, p < .05).
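As with the FA models, the PFM Model 4 fixed effects yield one predicted percept-frequency value per response position, interpretable as the percentage of other people who reported the same object. A small sketch:

```python
# Sketch: predicted PFM per response position under PFM Model 4, with
# R_InCard = 1 as the reference category (effect 0.00).
intercept = 14.04
r_in_card_effect = {1: 0.00, 2: -7.78, 3: -9.79, 4: -10.89}

predicted_pfm = {r: round(intercept + e, 2) for r, e in r_in_card_effect.items()}
print(predicted_pfm)  # {1: 14.04, 2: 6.26, 3: 4.25, 4: 3.15}
```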
PFM Model 5 was a 3-level Random Intercepts Model with repeated measures.
The specification of card number as a level-2 random effects grouping variable was
removed, and R_InCard (level-1) within card number (level-2) was specified as a
repeated measure with a scaled-identity matrix covariance structure. This allowed for
modeling the possible correlation of residual errors due to R_InCard within card number
being a repeated measure within subject (ID number, at level-3). As compared to Model
4, the Model 5 fit statistic was slightly higher (-2LL = 31446.13), indicating a very small
decline in model fit. The main effects for the intercept (F = 558.65, p < .05) and for
R_InCard (F = 133.28, p < .05) remained significant. The SPSS Estimates of Fixed
Effects table retained the same pattern as in Model 4. In the SPSS Estimates of
Covariance Parameters table, the components indicated no significant variance in PFM
that was attributable to between-person effects (ID number component Estimate = 0.91, p
= .37), but there was significant variance accounted for by the repeated measures
(repeated measures Estimate = 186.20, p < .05).
PFM Model 6 was a revised 3-level Random Intercepts Model with repeated
measures, in which card number (level-2) was added as a fixed effect factor. The model
fit was clearly improved as compared to Models 4 and 5 (-2LL = 30902.41 vs. ~31440).
The main effects for the intercept (F = 603.81, p < .05) and for R_InCard (F = 142.19, p
< .05) remained significant, and card number entered the model as a main effect (F =
65.12, p < .05). Within the SPSS Estimates of Fixed Effects table, the intercept was
slightly higher at 14.91 though the parameter estimates for R_InCard displayed the same
pattern as in Models 4 and 5, with predicted PFM being lower for each subsequent
response within a card. The parameter estimates for card number indicated that, when
using card number = 1 as the reference category, predicted PFM also differs for every
card. In order, predicted PFM was highest for Card 5 (Estimate = 7.26, p < .05, such that
responses to this card have objects seen by about 22% of other people [7.26 + 14.91 =
22.17]), followed by Card 3 (Estimate = 7.10, p < .05), Card 8 (Estimate = 2.51, p < .05),
Card 1 (Estimate = 0), Card 4 (Estimate = -1.76, p = .05), Card 2 (Estimate = -2.27, p <
.05), Card 7 (Estimate = -2.40, p < .05), Card 10 (Estimate = -5.48, p < .05), Card 6
(Estimate = -5.60, p < .05), and Card 9 (Estimate = -8.46, p < .05). In the SPSS Estimates
of Covariance Parameters table, the repeated measures variance component remained
(Estimate = 160.73, p < .05), and the between-person variance component became
statistically significant (Estimate = 2.30, p < .05).
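The in-text Card 5 calculation generalizes to every card: the predicted percept frequency for a first response is the intercept plus the card-number estimate. A minimal sketch using the Model 6 estimates:

```python
# Sketch: PFM Model 6's predicted percept frequency for a first response
# to each card (intercept 14.91 plus the card-number estimate, with
# Card 1 as the reference category).
intercept = 14.91
card_effect = {1: 0.00, 2: -2.27, 3: 7.10, 4: -1.76, 5: 7.26,
               6: -5.60, 7: -2.40, 8: 2.51, 9: -8.46, 10: -5.48}

predicted = {card: round(intercept + e, 2) for card, e in card_effect.items()}
print(predicted[5])  # 22.17 -- objects seen by about 22% of other people
print(predicted[9])  # 6.45 -- the least conventional card
```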
PFM Model 7 was another 3-level Random Intercepts Model with repeated
measures, and a factor-factor cross-level interaction term (R_InCard*card number) was
added to the list of fixed effects. The interaction term was used to model the effect of
card number (level-2) on R_InCard (level-1) in predicting PFM. Model fit was again
clearly improved (-2LL = 30649.22). The main effects for the intercept (F = 552.34, p <
.05), R_InCard (F = 155.92, p < .05), and card number (F = 21.03, p < .05) remained
significant, and the R_InCard*card number interaction term entered the model as a fixed
effect (F = 9.69, p < .05). Within the SPSS Estimates of Fixed Effects table, the
parameter estimates for R_InCard displayed the same pattern as in Models 4, 5, and 6,
with predicted PFM being lower for each subsequent response within a card. Although
card number also remained as a main effect, the parameter estimates demonstrated a very
slightly altered pattern as compared to Model 6: When placed in order of descending
estimates, the factor levels were Card 5, 3, 8, 1, 4, 7, 2, 6, 10, then 9. The 40 interaction
effect parameter estimates can be interpreted as ways to adjust the main effects based on
the exact combination of factor levels. As with the model predicting FA, 13 of these
estimates were set to zero because they were redundant. Eleven others did not differ
significantly from zero. Relative to the marginal means set by the Card Number and
R_InCard, the interaction coefficients increased for the 2nd response to Cards 6, 9, and 10
and for the 3rd and 4th responses to Cards 2, 6, 9, and 10; they decreased for the 2nd and
3rd responses to Cards 3 and 5 and for the 4th response to Card 5. This pattern is broader
and somewhat different from that observed for FA, with PF values declining more rapidly
than expected across responses to the two cards with the highest PF means (i.e., 5 and 3)
and less rapidly than expected across responses to the three cards with the lowest PF
means (9, 10, and 6). These trends can be seen in the means in Table 3. In the SPSS
Estimates of Covariance Parameters table, the between-person effects remained (Estimate
= 2.41, p < .05) as well as the repeated measures variance (Estimate = 150.43, p < .05).
In PFM Model 8, the scaled-identity matrix covariance structure was replaced
with a diagonal matrix covariance structure for the repeated measures specification. The
lowered model fit statistic (-2LL = 27815.62) indicated a notable improvement in the
model from allowing the PFM variances to differ by R_InCard. The main effects for the
intercept (F = 1114.10, p < .05), R_InCard (F = 155.65, p < .05), and card number (F =
79.62, p < .05), and the R_InCard*card number interaction term (F = 23.27, p < .05)
remained significant, but with increased F-values. Within the SPSS Estimates of Fixed
Effects table, the main effect of R_InCard retained the same pattern of estimates as in
Models 4-7. The card number parameter estimates also retained the same pattern as
compared to Model 7: When placed in order of descending estimates, the factor levels
were Card 5, 3, 8, 1, 4, 7, 2, 6, 10, then 9. Examination of the SPSS Estimates of
Covariance Parameters table revealed that the between-person effects returned to a non-significant level (Estimate = 0.07, p = .48) and that the diagonal matrix specification was
appropriate, as the covariance parameter estimates for all R_InCard* card number
combinations were statistically significant (p < .05), supporting the assumption of no
residual covariance between measurement occasions.
PFM Model 9 was used to explore whether Diagnostic Severity contributes to the
model as a fixed effect covariate (level-3). The model would still be classified as a 3-level Random Intercepts Model with Repeated Measures. The model fit statistic was
essentially unchanged (-2LL = 27814.09). The fixed effects remained significant for the
intercept (F = 492.38, p < .05), R_InCard (F = 155.82, p < .05), card number (F = 79.74,
p < .05), and the R_InCard*card number interaction term (F = 23.28, p < .05). However,
Diagnostic Severity did not enter the model as a main effect (F = 1.60, p = .21).
PFM Model 10 was another 3-level Random Intercepts Model with repeated
measures, but with added fixed effects specifications for two factor-covariate cross-level
interaction terms: R_InCard*Diagnostic Severity (level-1*level-3) and card
number*Diagnostic Severity (level-2*level-3). The model fit statistic indicated a slight
improvement in model fit (-2LL = 27794.10). Fixed effects remained significant for the
intercept (F = 167.75, p < .05), R_InCard (F = 59.80, p < .05), card number (F = 10.27, p
< .05), and the R_InCard*card number interaction term (F = 23.31, p < .05). As in Model
9, Diagnostic Severity did not enter the model as a main effect (F = 0.46, p = .50).
However, one of the two new cross-level interactions did enter the model as a small but
statistically significant fixed effect: R_InCard*Diagnostic Severity (F = 3.57, p < .05).
The interaction term was used to model the effect of R_InCard (level-1) on Diagnostic
Severity in predicting PFM. More specifically, the interaction term was used to explore the
possibility that within each unique factor level of R_InCard, Diagnostic Severity might
have a different linear effect on PFM (i.e., a change in slope). Although the overall
factor*covariate interaction term was statistically significant, none of the individual
factor level parameter estimates for the interaction are significant; for each level of
R_InCard, the R_InCard*Diagnostic Severity parameter estimate is not statistically
significant, and therefore does not differ from 0. The card number*Diagnostic Severity
interaction did not enter the model (F = 1.23, p = .28).
PFM Model 11 was a simplification of Model 10, in which the significant fixed
effects were retained but the non-significant effects were deleted from the model
specification. This model is identical to Model 8 except for one additional fixed effect
specification: The R_InCard*Diagnostic Severity interaction term. The model fit statistic
(-2LL = 27804.91) is almost identical to that of Model 8. All specified main effects were
significant (Intercept F = 475.49, p < .05; R_InCard F = 65.41, p < .05; card number F =
79.70, p < .05; R_InCard*card number F = 23.34, p < .05; R_InCard*Diagnostic Severity
F = 2.74, p < .05). Within the SPSS Estimates of Fixed Effects table (Intercept Estimate
= 15.34, p < .05), the parameter estimates for R_InCard retained the same pattern of
estimates as in Model 8, with each subsequent response within a card having a lower
predicted PFM (first response Estimate = 0; second response Estimate = -7.00, p < .05;
third response Estimate = -10.54, p < .05; fourth response Estimate = -12.02, p < .05).
The card number parameter estimates also retained the same pattern as was seen in Model
8: In order, predicted PFM was highest for Card 5 (Estimate = 13.72, p < .05), followed
by Card 3 (Estimate = 12.22, p < .05), Cards 8, 1, 4, and 7 (Estimate = 0), Card 2
(Estimate = -4.97, p < .05), Card 6 (Estimate = -8.84, p < .05), Card 10 (Estimate = -10.35, p < .05), and Card 9 (Estimate = -13.62, p < .05). The 40 R_InCard*card number
interaction effect parameter estimates can be interpreted as ways to adjust the main
effects based on the exact combination of factor levels. Of the four R_InCard*Diagnostic
Severity interaction effect parameter estimates, only one was significant (R_InCard =
2*Diagnostic Severity Estimate = -0.31, p < .05). It indicates that if the response was the
second response within a card, for each unit of increase on Diagnostic Severity, predicted
PFM is reduced by 0.31 units. Examination of the SPSS Estimates of Covariance
Parameters table revealed that the between-person effects remained at a non-significant
level (Estimate = 0.07, p = .46) and that the diagonal matrix specification was still
appropriate because the covariance parameter estimates for all R_InCard*card number
combinations were statistically significant (p < .05), supporting the assumption of no
residual covariance between measurement occasions.
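To make the fixed-effect estimates above concrete, they can be combined into a partial predicted PFM score. The sketch below is an illustrative pure-Python calculation using the estimates quoted in the text; it omits the 40 R_InCard*card number interaction adjustments (which are not reproduced here), treats the non-significant R_InCard*Diagnostic Severity terms as zero, and takes Card 10's estimate as -10.35 in keeping with the descending ordering of the card estimates. The result is therefore a simplified approximation rather than the full model prediction.

```python
# Partial predicted PFM from PFM Model 11 fixed effects (illustrative sketch).
INTERCEPT = 15.34
R_INCARD = {1: 0.00, 2: -7.00, 3: -10.54, 4: -12.02}     # reference: 1st response
CARD = {5: 13.72, 3: 12.22, 8: 0.00, 1: 0.00, 4: 0.00, 7: 0.00,
        2: -4.97, 6: -8.84, 10: -10.35, 9: -13.62}       # reference cards set to 0
R_INCARD_X_SEVERITY = {1: 0.00, 2: -0.31, 3: 0.00, 4: 0.00}  # only level 2 significant

def predicted_pfm(r_in_card, card, severity):
    """Sum the main effects plus the one significant factor*covariate term
    (the R_InCard*card number adjustments are omitted here)."""
    return (INTERCEPT
            + R_INCARD[r_in_card]
            + CARD[card]
            + R_INCARD_X_SEVERITY[r_in_card] * severity)

# Second response to Card 5 by a person with Diagnostic Severity = 3:
# 15.34 - 7.00 + 13.72 + (-0.31 * 3) = 21.13
print(predicted_pfm(2, 5, 3))
```

The -0.31 coefficient only enters for second responses within a card, which is exactly the conditional interpretation given in the text.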
Table 5

Statistical Summary of PFM HLM Models for the Criterion Database

                                              Type III Tests of Fixed Effects
Model / Effect                    -2LL      Num df  Denom df        F        p
Model 1                       31367.94
  Intercept                                    1       9.99     30.54    < .01
Model 2                       31825.05
  Intercept                                    1     143.79   1204.44    < .01
Model 3                       31826.00
  Intercept                                    1     143.42   1197.99    < .01
Model 4                       31437.05
  Intercept                                    1     282.44    559.87    < .01
  R_InCard                                     3    2876.11    137.76    < .01
Model 5                       31446.13
  Intercept                                    1     274.19    558.65    < .01
  R_InCard                                     3    3836.06    133.28    < .01
Model 6                       30902.41
  Intercept                                    1     275.38    603.81    < .01
  R_InCard                                     3    3841.87    142.19    < .01
  Card Number                                  9    3778.10     65.12    < .01
Model 7                       30649.22
  Intercept                                    1     297.55    552.34    < .01
  R_InCard                                     3    3842.13    155.92    < .01
  Card Number                                  9    3847.23     21.03    < .01
  R_InCard * Card Number                      27    3794.09      9.69    < .01
Model 8                       27815.62
  Intercept                                    1     277.25   1114.10    < .01
  R_InCard                                     3     330.75    155.65    < .01
  Card Number                                  9     160.35     79.62    < .01
  R_InCard * Card Number                      27     254.01     23.27    < .01
Model 9                       27814.09
  Intercept                                    1     275.69    492.38    < .01
  R_InCard                                     3     330.74    155.82    < .01
  Card Number                                  9     160.29     79.74    < .01
  R_InCard * Card Number                      27     254.34     23.28    < .01
  Dx Severity                                  1     136.98      1.60      .21
Model 10                      27794.10
  Intercept                                    1     841.43    167.75    < .01
  R_InCard                                     3     710.81     59.80    < .01
  Card Number                                  9     386.11     10.27    < .01
  R_InCard * Card Number                      27     252.60     23.31    < .01
  Dx Severity                                  1     769.61      0.46      .50
  R_InCard * Dx Severity                       3     422.54      3.57      .01
  Card Number * Dx Severity                    9     314.89      1.23      .28
Model 11                      27804.91
  Intercept                                    1     250.41    475.49    < .01
  R_InCard                                     3     524.76     65.41    < .01
  Card Number                                  9     160.47     79.70    < .01
  R_InCard * Card Number                      27     254.49     23.34    < .01
  R_InCard * Dx Severity                       4     162.93      2.74      .03

Note. The identifier "Dx Severity" refers to Diagnostic Severity.
HLM models for PFN1.5. PFN1.5 Model 1 was the 2-level null model. The
intercept of the PFN1.5 scores (level-1) was specified as a random function of card
number (level-2 grouping variable). The only fixed effect specified was the level-1
intercept. The model fit was -2LL = 17714.07 (see Table 6 for a statistical summary of all
PFN1.5 HLM Models), and the SPSS Type III Test of Fixed Effects table indicated a
significant card number effect on PFN1.5 scores (F = 124.09, p < .05), signaling that
multilevel modeling was an appropriate way to explore the structure of the PFN1.5 data.
The SPSS Estimates of Covariance Parameters table also indicated that the clustering of
PFN1.5 scores by card number (as a level-2 random effect) accounted for a significant
portion of the total variance (Estimate = 0.44, p < .05). The residual component signaled
that there was a significant amount of PFN1.5 score variance that was not accounted for by
the model (Estimate = 5.47, p < .05). Thus, there was evidence of unexplained within-card variation in PFN1.5 scores.
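As an illustration, the two covariance estimates from this null model can be combined into the proportion of total variance attributable to card-level clustering (an intraclass-correlation-style summary that is not itself part of the SPSS output):

```python
# Share of total PFN1.5 variance attributable to card-number clustering
# in the 2-level null model (sketch; estimates taken from the text above).
between_card = 0.44   # card-number random-effect variance estimate
residual = 5.47       # unexplained within-card variance estimate

icc = between_card / (between_card + residual)
print(round(icc, 3))  # -> 0.074, i.e., ~7% of variance lies between cards
```

The small but significant between-card share is what justified retaining card number as a clustering variable in the subsequent models.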
PFN1.5 Model 2 was the initial 3-level null model. As in the 2-level null model,
there were no predictors at any level; the only specified fixed effect was the level-1
intercept. The intercept of PFN1.5 (level-1) was modeled with an accounting of the card
number effect (level-2) and the possible grouping effect by person (ID number at level-3)
as random effects. The model failed to converge and produced a warning that the final
Hessian matrix was not positive definite even though all the convergence criteria were
satisfied. This warning means that the best estimate for the variance of the random
effect(s) is zero. A common cause of the Hessian matrix warning is a model specification
that involves redundant covariance parameters, and a typical recommendation is to try
using a simpler covariance structure specification. Failure to specify a “Subject” variable
on the “Random” subcommand line can also produce redundant covariance parameters,
though failure to specify a subject was not the cause of the problem in this model.
PFN1.5 Model 3 was a reattempted 3-level null model, but with a simplified
covariance structure specification. As in Model 2, there were no predictors at any level;
the only specified fixed effect was the level-1 intercept. The intercept of PFN1.5 (level-1)
was modeled with an accounting of the card number effect (level-2) within person (ID
number at level-3), but with no separate between-person effect specified as an individual
random effect. The model fit statistic was higher (-2LL = 17968.39) than in Model 1,
indicating a worse fit than the 2-level null model. The test of fixed effects remained
significant (F = 3602.85, p < .05), as expected, indicating variance in the intercept
attributable to higher-order effects. In the SPSS Estimates of Covariance Parameters
table, the residual component (Estimate = 5.77, p < .05) indicated unexplained within-person variance in PFN1.5. The card number*ID number component (Estimate = 0.12, p
= .25) was non-significant, indicating the clustering of PFN1.5 scores by card number
within person did not account for a significant portion of the total variance in PFN1.5
scores.
PFN1.5 Model 4 was a second 2-level null model. As compared to Model 1, the
intercept of the PFN1.5 scores (level-1) was specified as a random function of ID number
(level-3 grouping variable), instead of card number (level-2). Once again, the only fixed
effect specified was the level-1 intercept. The model fit was -2LL = 17949.48, and the
SPSS Type III Test of Fixed Effects table indicated a significant ID number effect on
PFN1.5 scores (F = 2381.55, p < .05), still signaling that multilevel modeling was an
appropriate way to explore the structure of the PFN1.5 data within person. The SPSS
Estimates of Covariance Parameters table also indicated that the clustering of PFN1.5
scores by ID number (as a level-3 random effect) accounted for a significant portion of
the total variance (Estimate = 0.14, p < .05). The residual component signaled that there
was a significant amount of PFN1.5 score variance that was not accounted for by person-level effects (Estimate = 5.75, p < .05). Thus, there was evidence of unexplained within-person variation in PFN1.5 scores.
PFN1.5 Model 5 was a revision of Models 2 and 3 (i.e., the 3-level null model).
Model 5 was designed as a 3-level Random Intercepts Model for PFN1.5. A predictor
variable, R_InCard (level-1), was added as a fixed effect factor. As compared to Model 3,
Model 5 had clearly improved fit (-2LL = 17629.52 vs. 17968.39). The test of fixed
effects revealed significant main effects for the intercept (F = 1389.12, p < .05) and for
R_InCard (F = 112.37, p < .05), indicating there was variance in the intercept of PFN1.5
attributable to higher-order effects as well as R_InCard. The Estimates of Fixed Effects
table listed significant unique estimates for each parameter. The parameter estimate for
the intercept of PFN1.5 (Estimate = 3.15, p < .05) indicated the mean value of PFN1.5
when all predictors are set at zero. As such, the average response contains objects that are
reported by 1.5% of the people in about 3 of the 6 international PF samples. The
remaining parameter estimates indicated that, when using R_InCard = 1 as the reference
category, predicted PFN1.5 was highest on the first response within a card (Estimate =
0.00), and lower for each subsequent response within a card (second response Estimate =
-1.04, p < .05; third response Estimate = -1.57, p < .05; fourth response Estimate = -1.93,
p < .05). Thus, by the fourth response to a card, the average response contains objects that
are reported by 1.5% of the people in about 1 of the 6 international PF samples (3.15 –
1.93 = 1.22). In the SPSS Estimates of Covariance Parameters table, the variance
components indicated a small amount of the variance in PFN1.5 was attributable to
between-person effects (ID number component Estimate = 0.09, p < .05), but not
between-card effects within person (card number*ID number component Estimate = 0.12,
p = .23). The majority of the within-person variance in PFN1.5 remained unexplained
(residual component Estimate = 5.21, p < .05).
PFN1.5 Model 6 was a 3-level Random Intercepts Model with repeated measures.
The specification of card number within person as a level-2 random effects grouping
variable was removed, and R_InCard (level-1) within card number (level-2) was
specified as a repeated measure with a scaled-identity matrix covariance structure. This
allowed for modeling the possible correlation of residual errors due to R_InCard within
card number being a repeated measure within subject (ID number, at level-3). As
compared to Model 5, the Model 6 fit statistic was slightly higher (-2LL = 17631.04),
indicating a very small decline in model fit. The main effects for the intercept (F =
1388.75, p < .05) and for R_InCard (F = 110.92, p < .05) remained significant. The
Estimates of Fixed Effects table retained the same pattern as in Model 5. In the SPSS
Estimates of Covariance Parameters table, the components indicated variance in PFN1.5
that was attributable to person-level effects (ID number component Estimate = 0.10, p <
.05), as well as significant variance accounted for by the repeated measures (repeated
measures Estimate = 5.32, p < .05).
PFN1.5 Model 7 was a revised 3-level Random Intercepts Model with repeated
measures, in which card number (level-2) was added as an additional fixed effect factor.
The model fit was notably improved as compared to Models 5 and 6 (-2LL = 17303.37
vs. ~17630). The main effects for the intercept (F = 1417.42, p < .05) and for R_InCard
(F = 118.77, p < .05) remained significant, and card number entered the model as a main
effect (F = 38.03, p < .05). Within the SPSS Estimates of Fixed Effects table, the
intercept was 3.86 and the parameter estimates for R_InCard displayed the same pattern
as in Models 5 and 6, with predicted PFN1.5 being lower for each subsequent response
within a card. The parameter estimates for card number were computed using card
number = 1 as the reference category. In order, estimates were highest for Cards 5, 1, and
3 (all three Estimates = 0.00 or p > .05), followed by Card 7 (Estimate = -0.34, p < .05),
Card 2 (Estimate = -0.54, p < .05), Card 10 (Estimate = -0.57, p < .05), Card 4 (Estimate
= -0.90, p < .05), Card 8 (Estimate = -1.02, p < .05), Card 6 (Estimate = -1.64, p < .05),
and Card 9 (Estimate = -2.02, p < .05). In the SPSS Estimates of Covariance Parameters
table, the repeated measures variance component remained significant (Estimate = 4.88, p
< .05), as did the between-person variance component (Estimate = 0.11, p < .05).
PFN1.5 Model 8 was another 3-level Random Intercepts Model with repeated
measures, and a factor-factor cross-level interaction term (R_InCard*card number) was
added to the list of fixed effects. The interaction term was used to model the effect of
card number (level-2) on R_InCard (level-1) in predicting PFN1.5. Model fit was again
clearly improved (-2LL = 17212.51). The main effects for the intercept (F = 1327.89, p <
.05), R_InCard (F = 122.32, p < .05), and card number (F = 13.86, p < .05) remained
significant, and the R_InCard*card number interaction term entered the model as a fixed
effect (F = 3.41, p < .05). Within the SPSS Estimates of Fixed Effects table, the
parameter estimates for R_InCard displayed the same pattern as in Models 5, 6, and 7,
with predicted PFN1.5 being lower for each subsequent response within a card. Although
card number also remained as a main effect, the parameter estimates demonstrated a very
slightly altered pattern as compared to Model 7: When placed in order of descending
estimates, the factor levels were Card 5, 1, 7, 3, 4, 2, 10, 8, 6, then 9. The 40 interaction
effect parameter estimates can be interpreted as ways to adjust the main effects based on
the exact combination of factor levels. As with the previous dependent variables, 13 of
these parameters were set to zero; an additional 17 were not significantly different from
zero. For the remaining 10, relative to the marginal means set by the card number and
R_InCard, the interaction coefficients increased for the 2nd response to Card 9 and for the
3rd and 4th responses to Cards 2, 6, 9, and 10; they decreased for the 2nd response to Card
5. This pattern is somewhat similar to that observed for PFM, with the number of data
sets having the response object present for at least 1.5% of the respondents declining
more rapidly than expected from the 1st to 2nd response on the card with the highest
dataset PFN1.5 mean (5) and less rapidly than expected for the 2nd, 3rd, and 4th responses
to the card with the lowest PFN1.5 mean (9) and for the 3rd and 4th responses to Cards 10,
2, and 6. These trends can be seen in the means in Table 3. In the SPSS Estimates of
Covariance Parameters table, the between-person effects remained significant (Estimate =
0.11, p < .05) as well as the repeated measures variance (Estimate = 4.76, p < .05).
In PFN1.5 Model 9, the scaled-identity matrix covariance structure was replaced
with a diagonal matrix covariance structure for the repeated measures specification. The
lowered model fit statistic (-2LL = 17048.34 vs. 17212.51) indicated a clear improvement
in the model from allowing variances to differ by R_InCard. The main effects for the
intercept (F = 1622.04, p < .05), R_InCard (F = 136.99, p < .05), and card number (F =
23.43, p < .05), as well as the R_InCard*card number interaction term (F = 4.68, p < .05)
remained significant. Within the SPSS Estimates of Fixed Effects table (Intercept
Estimate = 4.12, p < .05), the main effect of R_InCard retained the same pattern of
estimates as in Models 5-8 (first response Estimate = 0.00; second response Estimate = -1.21, p < .05; third response Estimate = -2.25, p < .05; fourth response Estimate = -2.92,
p < .05). The card number parameter estimates also retained the same pattern as
compared to Model 8: In order, predicted PFN1.5 was highest for Card 5 (Estimate =
0.56, p < .05), followed by Cards 1 and 7 (both Estimates = 0 or p > .05), Card 3
(Estimate = -0.58, p < .05), Card 4 (Estimate = -1.02, p < .05), Card 2 (Estimate = -1.07,
p < .05), Card 10 (Estimate = -1.13, p < .05), Card 8 (Estimate = -1.27, p < .05), Card 6
(Estimate = -2.19, p < .05), and Card 9 (Estimate = -2.94, p < .05). Examination of the
SPSS Estimates of Covariance Parameters table revealed that the between-person effects
remained (Estimate = 0.10, p < .05) and that the diagonal matrix specification was
appropriate because the covariance parameter estimates for all R_InCard*card number
combinations were statistically significant (p < .05), supporting the assumption of no
residual covariance between measurement occasions.
PFN1.5 Model 10 was used to explore whether Diagnostic Severity contributes to
the model as a fixed effect covariate (level-3). The model would still be classified as a 3-level Random Intercepts Model with Repeated Measures. The model fit statistic was
essentially unchanged (-2LL = 17047.30). The fixed effects for the intercept (F = 206.64,
p < .05), R_InCard (F = 137.21, p < .05), card number (F = 23.42, p < .05), and the
R_InCard*card number interaction term (F = 4.68, p < .05) remained significant.
However, Diagnostic Severity did not enter the model as a main effect (F = 1.05, p =
.31).
PFN1.5 Model 11 was another 3-level Random Intercepts Model with repeated
measures, but with added fixed effects specifications for two factor-covariate cross-level
interaction terms: R_InCard*Diagnostic Severity (level-1*level-3) and card
number*Diagnostic Severity (level-2*level-3). The model fit statistic was again
essentially unchanged (-2LL = 17041.27). The fixed effects for the intercept (F = 166.59,
p < .05), R_InCard (F = 12.43, p < .05), card number (F = 2.58, p < .05), and the
R_InCard*card number interaction term (F = 4.69, p < .05) once again remained
significant. However, Diagnostic Severity still did not enter the model as a main effect (F
= 0.64, p = .42), and neither did the newly-specified R_InCard*Diagnostic Severity (F =
0.42, p = .74) and card number*Diagnostic Severity (F = 0.53, p = .86) interaction terms.
Table 6

Statistical Summary of PFN1.5 HLM Models for the Criterion Database

                                              Type III Tests of Fixed Effects
Model / Effect                    -2LL      Num df  Denom df        F        p
Model 1                       17714.07
  Intercept                                    1       9.98    124.09    < .01
Model 2                             --
  Intercept                                   --         --        --       --
Model 3                       17968.39
  Intercept                                    1    1397.76   3602.85    < .01
Model 4                       17949.48
  Intercept                                    1     153.56   2381.55    < .01
Model 5                       17629.52
  Intercept                                    1     282.80   1389.12    < .01
  R_InCard                                     3    3050.93    112.37    < .01
Model 6                       17631.04
  Intercept                                    1     280.48   1388.75    < .01
  R_InCard                                     3    3845.49    110.92    < .01
Model 7                       17303.37
  Intercept                                    1     281.06   1417.42    < .01
  R_InCard                                     3    3843.50    118.77    < .01
  Card Number                                  9    3778.51     38.03    < .01
Model 8                       17212.51
  Intercept                                    1     305.37   1327.89    < .01
  R_InCard                                     3    3845.45    122.32    < .01
  Card Number                                  9    3839.19     13.86    < .01
  R_InCard * Card Number                      27    3793.25      3.41    < .01
Model 9                       17048.34
  Intercept                                    1     200.49   1622.04    < .01
  R_InCard                                     3     855.09    136.99    < .01
  Card Number                                  9     123.81     23.43    < .01
  R_InCard * Card Number                      27     168.16      4.68    < .01
Model 10                      17047.30
  Intercept                                    1     153.14    206.64    < .01
  R_InCard                                     3     854.94    137.21    < .01
  Card Number                                  9     123.79     23.42    < .01
  R_InCard * Card Number                      27     168.59      4.68    < .01
  Dx Severity                                  1     148.88      1.05      .31
Model 11                      17041.27
  Intercept                                    1     192.50    166.59    < .01
  R_InCard                                     3    1210.30     12.43    < .01
  Card Number                                  9     564.19      2.58      .01
  R_InCard * Card Number                      27     167.32      4.69    < .01
  Dx Severity                                  1     194.41      0.64      .42
  R_InCard * Dx Severity                       3    1183.67      0.42      .74
  Card Number * Dx Severity                    9     664.04      0.53      .86

Note. The identifier "Dx Severity" refers to Diagnostic Severity. Dashes indicate that Model 2 failed to converge, so no statistics are available.
Supplemental Analysis Strategies
The majority of the supplemental analyses were completed using data that were
aggregated at the protocol level, accomplished by calculating the mean of each response-level variable within each protocol. Previous descriptive statistics and the HLM analyses
were completed using data at the response level. The protocol-level aggregation leads to
descriptive statistics and analyses in which each person’s scores are equally represented
in the results; in the response-level analyses, a person with a greater number of responses
would have more data contributing to the descriptive statistics and models than a person
with fewer responses. Therefore, some minor changes are apparent in the descriptive
statistics as compared to the previously reported results.
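The aggregation step can be sketched as a simple group-and-average operation, which makes explicit why each person contributes equally at the protocol level regardless of response count. The records and field layout below are hypothetical illustrations, not the actual database structure:

```python
from collections import defaultdict

# Hypothetical response-level records: (person_id, FA score).
responses = [(1, 4.0), (1, 3.0), (1, 2.0), (1, 3.0),  # person 1: 4 responses
             (2, 5.0), (2, 4.0)]                       # person 2: 2 responses

by_person = defaultdict(list)
for pid, fa in responses:
    by_person[pid].append(fa)

# Protocol-level FA: the mean of each person's response-level scores,
# so each person is weighted equally in later statistics.
protocol_fa = {pid: sum(v) / len(v) for pid, v in by_person.items()}
print(protocol_fa)  # {1: 3.0, 2: 4.5}
```

At the response level, person 1's four responses would dominate person 2's two; after aggregation, both contribute a single protocol-level value.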
Consistent with the descriptive statistics reported earlier, the Criterion Database
contained 159 valid Rorschach protocols with accompanying Diagnostic Severity scores
(M = 3.53, SD = 1.07) available. Descriptive statistics for the new protocol-level
variables are reported in Table 7. As compared to response-level FA (M = 3.32, SD =
1.00), the protocol-level FA has a similar mean but much less dispersion of scores (M =
3.34, SD = 0.26). This signifies more score fluctuation in FA between responses than
between protocols. PFM (response-level M = 8.80, SD = 14.37; protocol-level M = 9.10,
SD = 3.41) and PFN1.5 (response-level M = 2.37, SD = 2.43; protocol-level M = 2.43, SD
= 0.63) have similar patterns.
Table 7

Protocol-Level Descriptive Statistics for the Criterion Database

                      FA              PFM            PFN1.5
                 M      SD        M       SD       M      SD       N
Total           3.34   0.26     9.10    3.41     2.43   0.63     159
Card 1          3.59   0.47    10.00    7.48     3.21   1.65     159
Card 2          3.40   0.62     8.02    6.17     2.60   1.51     158
Card 3          3.36   0.71    18.73   14.53     2.95   1.46     159
Card 4          3.34   0.63     8.81   10.74     2.32   1.70     159
Card 5          3.89   0.63    19.61   13.96     3.61   1.73     159
Card 6          3.18   0.69     4.63    5.48     1.56   1.53     159
Card 7          3.39   0.64     8.37    7.07     2.99   1.77     159
Card 8          3.31   0.81    13.67   13.32     2.22   1.42     159
Card 9          2.79   0.64     1.45    1.34     1.15   1.20     157
Card 10         3.27   0.65     3.65    2.52     2.42   1.40     158
R_InCard 1      3.57   0.33    14.14    5.34     3.16   0.83     159
R_InCard 2      3.24   0.36     6.33    4.30     2.13   0.87     159
R_InCard 3      3.10   0.60     4.58    6.35     1.75   1.45     150
R_InCard 4      2.96   0.65     3.25    7.66     1.17   1.33     101

Note. All variables listed in the table represent protocol-level means of response-level variables; the means and standard deviations listed are across all protocols in the Criterion Database with relevant data.
Table 7 also includes protocol-level descriptive statistics for the variables broken
down by card number and by R_InCard. The protocol-level means for FA, PFM, and
PFN1.5 are also displayed graphically in Figures 13-18. The protocol-level mean of FA is
highest for Cards 5 and 1, and lowest on Card 9, indicating that, on average, response
objects have the best perceptual fit to Cards 5 and 1, and worst fit on Card 9.
Examination of protocol-level PFM reveals that subjects, on average, delivered responses
containing the most popularly-reported response objects to Cards 5, 3, 8, and 1, while
delivering the least common response objects to Card 9. It is also noteworthy that Cards
5, 3, and 8 had the largest standard deviations for PFM, while Card 9 had the lowest. The
pattern of means and standard deviations indicates that people, on average, gave
conventional responses to Cards 5, 3, and 8, but it was on those cards that there also was
the most variation between people on the conventionality of their responses, at least with
regard to the response objects they used in constructing their responses. Protocol-level
PFN1.5 statistics demonstrated that subjects delivered responses containing objects
commonly used in the most countries to Cards 5, 1, 7, and 3, while delivering responses
containing objects that were common to the least number of countries to Card 9. In
examining Table 7, the PFN1.5 statistics showed less-pronounced patterns than the
statistics for PFM.
Figure 13. Protocol-Level FA Means by Card Number.

Figure 14. Protocol-Level PFM Means by Card Number.

Figure 15. Protocol-Level PFN1.5 Means by Card Number.

Figure 16. Protocol-Level FA Means by R_InCard.

Figure 17. Protocol-Level PFM Means by R_InCard.

Figure 18. Protocol-Level PFN1.5 Means by R_InCard.
When Figures 16-18 are examined with regard to patterns in protocol-level scores
averaged across people and organized according to R_InCard, a clear pattern emerges
across variables: for FA, PFM, and PFN1.5, the mean protocol-level score decreases
with each subsequent response within a card. With each subsequent response, the objects
used in constructing the responses have worse perceptual fit, are less commonly used, and
are commonly used in fewer countries.
Examination of the standard deviations reported in Table 7 reveals that FA scores also
seem to have more variation with each subsequent response within a card. For PFM and
PFN1.5, the pattern is similar, with R_InCard 1 and 2 having lower standard deviations
than R_InCard 3 and 4. However, this effect is partially driven by the reduced number of
responses in the sample for each subsequent R_InCard (e.g., there are more first
responses within a card than fourth responses within a card). Smaller samples yield less
stable standard deviation estimates, so with each subsequent R_InCard, the shrinking
sample size contributes to the larger estimates for the standard deviation.
To further explore differences in FA, PFM, and PFN1.5 based on card number
and response within card, the means and standard deviations were used to compute
Cohen’s d scores. At the protocol-level, means and standard deviations from Table 7
were used to compute the d-values that are listed below in Table 8 and displayed in
Figures 19 and 20. Card 1 and R_InCard 1 were used as the reference categories, and thus
they have d-values of 0. The d-values associated with Cards 2-10 and R_InCard 2-4
reflect differences relative to Card 1 and R_InCard 1. When FA is examined, relative to
the 1st response to a card, on average the 2nd response was 0.96 of a SD lower, the 3rd
response was about one full SD lower, and the 4th response had average FA that was
about 1.2 SDs lower than the 1st response. With regard to differences in FA based on
card number, Card 5 was about half a standard deviation higher in FA than the
reference value set by Card 1. Cards 2, 7, and 3 were about a third of a SD lower in FA
than Card 1, while Cards 8, 4, 10, and 6 were about half a SD lower in FA than Card 1.
Card 9 stood out by having average FA scores that were 1.4 SDs below Card 1 and thus
about two full SDs below Card 5. As can be seen in the table and figures, the same
patterns are fairly consistent across FA, PFM, and PFN1.5. The same general patterns
also hold true for the data when Cohen’s d scores are calculated based on response-level
FA, PFM, and PFN1.5 means and standard deviations. Response-level d-values are listed
below in Table 9 and displayed in Figures 21 and 22.
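The d-values in Tables 8 and 9 are consistent with the standard two-group Cohen's d in which the mean difference is divided by an equally weighted pooled standard deviation. The sketch below assumes that formula and uses the protocol-level FA values for R_InCard 1 and 2 from Table 7:

```python
import math

def cohens_d(m, sd, m_ref, sd_ref):
    """Mean difference divided by the pooled SD of the two groups
    (equal weighting, appropriate for groups of roughly equal size)."""
    pooled_sd = math.sqrt((sd ** 2 + sd_ref ** 2) / 2)
    return (m - m_ref) / pooled_sd

# Protocol-level FA for R_InCard 2 vs. the R_InCard 1 reference (Table 7):
d = cohens_d(3.24, 0.36, 3.57, 0.33)
print(round(d, 2))  # -> -0.96, matching the Table 8 entry
```

Small discrepancies in the last decimal place can arise because the tabled d-values were presumably computed from unrounded means and standard deviations.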
Table 8

Protocol-Level Cohen's d Comparing Each Card to Card 1 and Each R_InCard to
Response 1 for the Criterion Database

                 FA d    PFM d   PFN1.5 d
Card #
  1              0.00     0.00     0.00
  2             -0.35    -0.29    -0.39
  3             -0.39     0.79    -0.17
  4             -0.45    -0.13    -0.53
  5              0.55     0.90     0.24
  6             -0.71    -0.83    -1.04
  7             -0.36    -0.22    -0.13
  8             -0.44     0.35    -0.64
  9             -1.44    -1.94    -1.45
  10            -0.57    -1.27    -0.52
R_InCard
  1              0.00     0.00     0.00
  2             -0.96    -1.62    -1.21
  3             -1.01    -1.64    -1.24
  4             -1.24    -1.68    -1.84

Note. Card 1 and R_InCard 1 are used as the reference values. All values listed are based on protocol-level means and standard deviations of response-level variables; the means and standard deviations used to calculate d-scores are across all protocols in the Criterion Database with relevant data.
Figure 19. Protocol-Level Cohen's d Comparing Cards 2-10 to Card 1 on FA, PFM, and PFN1.5.

Figure 20. Protocol-Level Cohen's d Comparing R_InCard 2-4 to R_InCard 1 on FA, PFM, and PFN1.5.
Table 9

Response-Level Cohen's d Comparing Each Card to Card 1 and Each R_InCard to
Response 1 for the Criterion Database

                 FA d    PFM d   PFN1.5 d
Card #
  1              0.00     0.00     0.00
  2             -0.22    -0.20    -0.21
  3             -0.22     0.45    -0.11
  4             -0.31    -0.11    -0.33
  5              0.25     0.49     0.12
  6             -0.48    -0.51    -0.67
  7             -0.24    -0.17    -0.10
  8             -0.30     0.15    -0.41
  9             -0.92    -1.15    -0.94
  10            -0.29    -0.71    -0.28
R_InCard
  1              0.00     0.00     0.00
  2             -0.34    -0.54    -0.43
  3             -0.50    -0.73    -0.69
  4             -0.66    -0.89    -0.93

Note. Card 1 and R_InCard 1 are used as the reference values. All values listed are based on response-level means and standard deviations.
Figure 21. Response-Level Cohen's d Comparing Cards 2-10 to Card 1 on FA, PFM, and PFN1.5.

Figure 22. Response-Level Cohen's d Comparing R_InCard 2-4 to R_InCard 1 on FA, PFM, and PFN1.5.
In the HLM analyses, the predicted variables were FA, PFM, and PFN1.5.
Although Diagnostic Severity was introduced to the models for each of the predicted
variables, it was not possible to decipher the simple relationship between Diagnostic
Severity and each of the predicted variables, as other variables were part of the structural
models as well. Therefore, 2-tailed Pearson correlation coefficients were calculated
between Diagnostic Severity and each of the Rorschach variables at the protocol level
(i.e., overall protocol-level FA, PFM, and PFN1.5, as well as FA, PFM, and PFN1.5
broken out by card number and by R_InCard). There were hypothesized relationships
between variables, with increases in Diagnostic Severity hypothesized to correspond with
decreases in FA, PFM, and PFN1.5 scores.
There were very few statistically-significant correlations between the Rorschach
variables and Diagnostic Severity. Using Cohen's (1988) conventions and an alpha of .05,
there were small correlations between Diagnostic Severity and protocol-level FA over all
responses (r = -.16, p = .04) and for responses to Card 4 (r = -.16, p = .05). There were
correlation coefficients that neared significance for responses to Card 6 (r = -.13, p =
.09), and for the 1st (r = -.14, p = .08) and 2nd (r = -.15, p = .06) responses to each card.
The correlations were in the expected direction, with higher Diagnostic Severity scores
corresponding with lower FA scores. However, when the Holm, Larzelere, and Mulaik
alpha correction procedure was used (see Howell, 2010) to adjust for the number of card-specific correlations (and thus null hypotheses) being tested, the correlation between FA
scores on Card 4 and Diagnostic Severity was no longer significant (corrected alpha to
surpass = .002, based on 30 correlation tests). Surprisingly, there were no statistically-significant correlations between Diagnostic Severity scores and the protocol-level PFM
and PFN1.5 variables, even when using the more lenient alpha of .05.
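The Holm step-down procedure works by ordering the p-values and comparing the i-th smallest to an increasingly lenient threshold, stopping at the first failure; with 30 tests, the smallest p-value must beat .05/30 ≈ .0017 (the ".002" corrected alpha cited above) before anything is rejected. The sketch below illustrates the general procedure, not the exact implementation described by Howell (2010):

```python
def holm_rejections(p_values, alpha=0.05):
    """Holm (1979) step-down correction: sort p-values ascending and
    compare the i-th smallest (0-indexed) to alpha / (m - i); once a
    comparison fails, no further hypotheses are rejected."""
    m = len(p_values)
    rejected = set()
    for i, (p, idx) in enumerate(sorted((p, j) for j, p in enumerate(p_values))):
        if p <= alpha / (m - i):
            rejected.add(idx)
        else:
            break
    return rejected

# A p-value of .04, significant on its own, fails the .05/30 threshold
# when it is the smallest of 30 tests (hypothetical p-values):
print(holm_rejections([0.04] + [0.5] * 29))  # -> set(), nothing rejected
```

This is why the Card 4 correlation (p = .05) lost significance after correction: it would have needed to fall below roughly .002.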
However, there were moderate correlations between the Rorschach fit and
frequency variables, as would be anticipated. Response-level correlations between FA
and PFM (r = .51, p < .01), and FA and PFN1.5 (r = .59, p < .01) were slightly smaller
than the correlation between PFM and PFN1.5 (r = .69, p < .01). All correlations
referenced here were also recomputed using the nonparametric alternative, Spearman’s
Rho. Effect sizes and significance values were very similar to the Pearson correlation
coefficient values, direction of effects remained the same, and the same effects were
determined to meet the threshold for statistical significance.
Given that there were moderate correlations between FA, PFM, and PFN1.5, but
there were no correlations between the frequency variables and Diagnostic Severity and
only a small correlation between protocol-level FA and Diagnostic Severity, further
follow-up data exploration seemed warranted. Thus, a new approach was taken to
quantifying the fit and frequency information at the protocol level. If rank ordered and
quantified according to frequency, Rorschach response objects form a Zipf distribution:
Very few objects are extremely frequent (i.e., the populars), a small proportion of objects
occur with high enough frequency that they are assigned a PFM and PFN1.5 value in the
lookup tables (i.e., occur in at least 1.5% of protocols in at least 1 sample), and the
remaining objects are relatively unique objects and occur infrequently, thus creating a
long tail in the distribution of objects by frequency. It was believed that the right-hand
long tail of the object distribution might hold a lot of information that was not being well
represented in the previous analyses because of the way that Percept Frequency variables
were tabulated, with all objects that had frequencies of less than 1.5% being weighted
equally.
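The weighting problem can be illustrated with a simple frequency tally: once object frequencies are counted, everything under the 1.5% cutoff collapses into an undifferentiated tail, no matter how much the tail objects differ from one another in frequency. The object names and counts below are hypothetical:

```python
from collections import Counter

# Hypothetical response objects tallied across a set of protocols.
objects = (["bat"] * 400 + ["butterfly"] * 250 + ["moth"] * 30
           + ["crab"] * 2 + ["broken lamp"] + ["map of spain"])

counts = Counter(objects)
n = len(objects)
cutoff = 0.015 * n  # the 1.5% frequency threshold used for the lookup tables

tabled = {obj for obj, c in counts.items() if c >= cutoff}
tail = {obj for obj, c in counts.items() if c < cutoff}

# "crab" occurred twice as often as "broken lamp", yet both land in the
# tail and are weighted identically by the Percept Frequency variables.
print(sorted(tabled), sorted(tail))
```

In the real data this tail is far longer, which is the information loss the follow-up scores below were designed to probe.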
First, a new response-level variable was calculated: Form Inaccuracy (FI) was
computed by subtracting FA scores (range of 1-5) from 6.0, resulting in an inverse of FA
that still has a range of 1-5. Thus, higher FI scores are indicative of greater levels of
inaccuracy in perception. As expected, FA and FI had a perfect correlation of -1.0. The
Criterion Database was then sorted and filtered according to the response-level PFM
scores. If a response had a PFM score of 0, the response was included in the following
protocol-level computations: Mean of the response-level FA scores per protocol, and the
sum of the response-level FI scores per protocol. The mean FA score, calculated only
from responses with a PFM score of 0, was conceptualized as a way to represent the
accuracy of each test-taker’s perceptions on responses that contained only infrequent
objects; it is the average of a person’s FA scores from the right tail of the Zipf
distribution. The sum of FI scores, again calculated only from responses with a PFM
score of 0, was considered a way to quantify inaccurate fit while also inherently
accounting for how often the person gave responses that contained only objects that are
infrequent and reside in the tail of the Zipf distribution. In the end, neither of the new
scores had an association with Diagnostic Severity (mean FA r = -.02, p = .79; sum FI r =
.01, p = .87).
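The tail-focused protocol scores described above can be sketched as follows. This is a minimal illustration, not the study's actual data pipeline; the protocol IDs, scores, and flat-row layout are invented, but the formulas follow the text (FI = 6 − FA; aggregation restricted to responses with PFM = 0, i.e., the Zipf tail).

```python
# Sketch of the tail-focused protocol-level scores described in the text.
# The rows and column layout here are hypothetical, for illustration only.
responses = [
    # (protocol_id, FA on 1-5 scale, PFM percent; PFM == 0 marks tail objects)
    ("P1", 4, 12.0), ("P1", 2, 0.0), ("P1", 1, 0.0),
    ("P2", 5, 30.0), ("P2", 3, 0.0),
]

def tail_scores(rows):
    """Mean FA and summed FI (FI = 6 - FA) over responses with PFM == 0."""
    out = {}
    for pid, fa, pfm in rows:
        if pfm == 0:                     # keep only infrequent-object responses
            stats = out.setdefault(pid, {"fa": [], "fi": 0.0})
            stats["fa"].append(fa)
            stats["fi"] += 6 - fa        # FI inverts FA while keeping the 1-5 range
    return {pid: (sum(s["fa"]) / len(s["fa"]), s["fi"]) for pid, s in out.items()}

print(tail_scores(responses))  # {'P1': (1.5, 9.0), 'P2': (3.0, 3.0)}
```

Note that the summed-FI score grows both with the inaccuracy of each tail response and with how many tail responses a person gives, which is the dual sensitivity the text describes.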
Chapter Five
Discussion
FQ scoring systems and related indices have been constructed, evaluated, and
revised over time and across Rorschach systems. Hermann Rorschach devised FQ as a
way to describe whether the objects used in Rorschach responses were appropriate for the
contours of the inkblot used in constructing the response (Exner, 2003). It was
Rorschach’s belief, which is shared by many others who use and research the Rorschach,
that the manner in which form was used in constructing a response delivered information
about the person’s perceptual accuracy or “reality testing” ability (Exner, 2003). Though
the validity of the Rorschach has been the topic of intense debate since its development,
even the toughest current critics of the Rorschach attest to the validity of FQ (e.g.,
Dawes, 1999; Wood, Nezworski, & Garb, 2003; Wood, Garb, Nezworski, Lilienfeld, &
Duke, 2015). Adding to their appeal, these scores also serve as an example of variables
with a clear relationship to the construct they are intended to assess (McGrath, 2008). The
existing peer-reviewed literature has clearly and consistently demonstrated that the
Rorschach can be used to accurately identify psychosis in test-takers by employing FQ
scores and indices, as well as indices that are partially comprised of FQ information (e.g.,
Mihura et al., 2013), and this is attributed to FQ functioning as a gauge of the accuracy of
the test-takers’ perceptions.
Within the CS, the three primary FQ designations are ordinary (o), unusual (u),
and minus (-). CS FQ is composed of response goodness-of-fit with the inkblot,
frequency of the percept, selection of words used to describe the percept, and use of
arbitrary lines in forming the percept within the inkblot. However, goodness-of-fit is the
concept most emphasized in Exner’s (2003) descriptions of FQ. Partly due to the
disparities between CS definitions of FQ and the actual elements that contributed to the
FQ designations listed in the tables, researchers have revisited the topic of how to best
capture the FQ construct of interest: A person’s perceptual accuracy or “reality testing”
ability (e.g., Meyer et al., 2011). Over the past few years, the argument has been made
that factors like the frequency of perceptions on the Rorschach do in fact relate to the
accuracy of a person’s perceptions, when the concept of perceptual accuracy is
considered from an ecological position. Should a person’s objective misperception of a
stimulus (i.e., low fit) be considered a misperception if it is a highly common
misperception (i.e., high frequency)?
Prior to beginning work on R-PAS as a formal system, Meyer and Viglione
(2008) conceptualized and developed the FA scoring category, which is a dimensional
indicator of the accuracy of perceptual fit between a response object and the features of
the inkblot. Meyer et al. (2011) followed the development of FA with the initial
development of PF indices, which indicate the frequency with which the various response
objects are used on the Rorschach.
In developing the R-PAS FQ tables, Meyer et al. (2011) wanted to retain the
essence of FQ as a measure of accuracy of perception that can be used to identify
distorted perceptual processes of the test-taker. Following their existing line of research,
they included two distinct elements in their operational definition of FQ: Fit between the
perceived object and the features of the inkblot, and the frequency with which the
reported object occurs in the location used by the respondent. Consistent with their
conceptualization, in an iterative review process the fit and frequency elements were both
used in constructing the final R-PAS FQ reference tables. As in the CS, R-PAS
designates three FQ codes that can be assigned to responses that incorporate the use of
form: ordinary (o), unusual (u), and minus (-). Although the R-PAS version of FQ was
developed in an attempt to rectify some of the problems associated with the CS version of
FQ, early validity studies demonstrate additional room for improvement in the detection
of psychosis using the Rorschach.
There is currently no single fully dimensional Rorschach score (within the CS, R-PAS, or otherwise) that can thoroughly and efficiently tap into both the conventionality
or spontaneously given frequency of response objects and the perceptual fit of those
response objects to the cards. Researchers have anticipated that an empirically-developed
and dimensional score that is comprised of both goodness-of-fit information as well as
frequency information (i.e., PA) could substantially improve our ability to detect
distorted perceptual processes and impaired reality testing of the test-taker, and thus
improve validity coefficients in the Rorschach-based identification of psychosis. It
seemed a worthwhile investment to first explore how FA and PF function independently,
and to understand the structure of the various FA and PF indices across responses and
cards within the Rorschach. Without exploring the variables independently, it would be
difficult to determine how to best combine FA and PF information within a protocol to
maximize the performance of a new PA scoring system. By clarifying the structure and
performance of FA and PF, it was hoped that standardized methods of scoring and
interpreting PA scores could then be developed and applied to future research and ideally,
to future clinical practice.
Updating the PF Tables
In the current study, the preliminary lookup table of PF variables and values that
was developed by Meyer et al. (2011) was expanded by examining the existing specific
object frequencies from five international datasets (Argentina, Brazil, Italy, Japan, and
Spain), and by adding data from a sixth country (the U.S.) before creating international
summary PF indices. The two final PF indices serve as cross-cultural indicators of the
conventionality of response objects, and they were developed and inserted into the
lookup tables at the object level. The first PF variable is the mean of the six within-country variables that indicate the percentage of protocols that contained each match number. This variable is computed based on data from those countries that had objects reported by at least 1.5% of the participants in the sample. Thus, it roughly indicates on average how often a particular percept is reported across samples. When this percentage-based variable is applied to actual Rorschach responses and averaged across the objects within the response, it is referred to as PFM (Percept Frequency Mean). The object-level
percentage-based variable was also converted into the second PF variable, which is a
count of the number of samples (out of the six countries) in which the object was found
in at least 1.5% of the protocols from each country. In other words, it is a sum of the
binary country-specific variables that were used to indicate whether the response object
was found in at least 1.5% of the protocols from that country. When this count-based
variable is applied to actual Rorschach responses and averaged across the objects within
the response, it is referred to as PFN1.5 (Percept Frequency Number of samples ≥ 1.5%).
Object-level FA ratings were also retained in the lookup tables, with each object’s FA
value having been derived from an average of 9.9 rater judgments (Meyer & Viglione,
2008). After responses are coded for FA at the object level, response-level FA scores are
determined. If the gestalt of the response percept is listed in the lookup tables, the
corresponding FA score is applied to the response; if the gestalt is not listed, the lowest
FA score from across the important objects used in the response is assigned as the
response-level FA score.
After the frequency data were compiled for the U.S. Sample protocols, 5 unique
objects with a percentage-based frequency of ≥ 1.5% in the U.S. Sample were identified
that were not listed in the previous version of the lookup tables. Although it was initially surprising that more unique objects were not identified within the U.S. Sample at a frequency of ≥ 1.5%, this result can be interpreted as evidence that the
FA and PF projects nearly exhausted the list of objects that will be encountered on most
protocols.
Interrater Reliability
Like Exner (2003), Meyer et al. (2011) considered interrater reliability of great
importance for each score included in the R-PAS system. FQ has consistently
demonstrated high levels of interrater reliability in the published literature. When Meyer
and Viglione (2008) began developing the FA scoring system, they anticipated that FA
would have interrater reliability on par with, or better than, that typically encountered in
CS FQ scoring. In the existing FA research, interrater reliability has ranged from good to
excellent, and the same is true for the current study (ICC = .75). One might predict that
there would be more opportunity for disagreements on the proper FA code for objects and
responses than on the FQ code. However, the FA scoring steps and guidelines are very
similar to those used in FQ coding, decisions use lookup tables, and there are more
objects contained within the FA lookup tables than in the CS or R-PAS FQ tables.
Therefore, there would likely be fewer extrapolations and coder judgments required for
FA coding than for CS or R-PAS FQ coding across the average protocol. This is an
important consideration because if interrater reliability is low for a variable, more error is
introduced into the scores and the validity coefficients will be reduced as a result.
Though interrater reliability for PFM and PFN1.5 was not computed in this study, it can
also be safely assumed that it would mirror that of FA because the PF variable values
were assigned through the use of syntax following coding of match numbers, which was
part of the process of coding for FA. Coder judgment and extrapolation from the tables
was not involved in assigning PF variable scores; either the specific object is listed in the
lookup tables (with corresponding PF codes) or it is not. If an object is not listed, the PF
variable scores for those objects are set to zero. The response-level PF variables are then
also calculated through syntax as the mean of the object-level PF variables within the
response.
Modeling the Criterion Database
I explored the structure of the response-level and protocol-level FA and PF
indices using an archival database that included Rorschach protocols and a Diagnostic
Severity score that served as a criterion measure. Diagnostic Severity was expressed on a
5-point scale, with higher scores indicating a higher degree of overall dysfunction
associated with a diagnosis. The response-level FA and PF indices were explored by
modeling how card number, response within card, and the criterion variable contributed
to the structure of each variable, and protocol-level validity coefficients with the criterion
measure were calculated in follow-up analyses.
Modeling the Structure of FA
The descriptive statistics for the response-level FA scores indicate that, across the
Criterion Database sample, on average people gave responses that had a gestalt goodness-of-fit rating of 3.32 (SD = 1.00, range of 1-5). This means that if the average response
from the Criterion Database was shown to a group of judges who were asked, “Can you
see the response quickly and easily at the designated location,” the consensus would fall
between “A little. If I work at it, I can sort of see that” (a rating of 3), and “Yes. I can see
that. It matches the blot pretty well” (a rating of 4).
A series of nine HLM models were constructed for predicting FA scores at the
response level, following a modeling approach suggested by Garson (2013) in which a null model is fit first, followed by the addition of theory-based model terms if there is indication that multilevel modeling is needed. The FA
modeling began with two-level and three-level null models (“unconditional models”).
The null models are random intercept models in which the intercept of the predicted
variable (in this case, FA) is specified as a random effect of one or more grouping variables at higher levels, with no fixed effect predictors specified. Null modeling is used to
establish a baseline model, and also functions as a test of possible higher-order grouping
effects; if higher-order grouping effects are present in the data (i.e., the covariance
structure of the data is impacted by the grouping variable, due to clustering of effects,
which creates correlated error), mixed modeling (e.g., HLM) of the data is indicated. The
FA null modeling (FA Models 1 and 2) indicated clustering by card, supporting the
application of HLM procedures.
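The clustering check a null model performs can be illustrated with a one-way random-effects intraclass correlation (ICC) estimated from mean squares. This is only a simple approximation of the variance partition that full HLM software reports, and the data below are invented; it shows why group-level clustering (here, by "card") signals that a multilevel model is warranted.

```python
# Stdlib sketch of the clustering check behind a null (random-intercept) model:
# a one-way random-effects ICC(1) from between- and within-group mean squares.
def icc1(groups):
    """ICC(1) for a balanced design: groups is a list of equal-length score lists."""
    k = len(groups[0])                      # observations per group
    n = len(groups)                         # number of groups (e.g., cards)
    grand = sum(x for g in groups for x in g) / (n * k)
    means = [sum(g) / k for g in groups]
    msb = k * sum((m - grand) ** 2 for m in means) / (n - 1)          # between groups
    msw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Hypothetical scores clustered by "card": group means differ markedly, so the
# ICC is clearly above zero, the pattern that justifies multilevel modeling.
clustered = [[4, 4, 5, 4], [2, 1, 2, 2], [3, 3, 4, 3]]
print(round(icc1(clustered), 2))
```

When group means are identical, the same statistic falls to zero or below, and ordinary (single-level) regression would suffice.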
As a next step (Models 3-5), possible fixed effects (i.e., predictors that impact the
intercept of FA) were specified, as well as a repeated measures effect on the covariance
structure of the data. The FA intercept, R_InCard (level-1 fixed factor as a main effect),
and card number (level-2 fixed factor as a main effect) were all statistically significant
predictors in the modeling. Additionally, there was statistical support and a theory-based
rationale for including a repeated measures specification in the models. Repeated
measurements made on the same unit (e.g., the same person responding to a stimulus
multiple times) exhibit clustering effects. Within HLM, the level-1 repeated
measurements (i.e., the FA scores) can be modeled as clustered within higher-order
observation units (i.e., within R_InCard that is in turn within card number).
Next (Models 6-7), a possible fixed effect interaction term was added to the
model specifications and the scaled-identity matrix covariance structure was replaced
with a diagonal matrix covariance structure for the repeated measures specification. Fixed
effect interaction terms are used to model the possibility that each unique combination of
factor levels might have a different linear effect on the predicted variable. The
R_InCard*card number interaction term (cross-level factor*factor interaction) entered as
a small but statistically significant effect, and the decision to use a diagonal matrix
specification for the covariance structure that allowed FA variances to differ across the
first to fourth response had statistical support and led to improved model fit.
In the final models (Models 8-9), Diagnostic Severity was introduced into the
model specifications. In Model 8, Diagnostic Severity had a small but statistically
significant main effect (level-3 fixed effect covariate) on predicted FA intercept, and the
other model parameters remained significant (i.e., the fixed effects for FA intercept,
R_InCard, card number, and R_InCard*card number interaction term; the repeated
measures variance for R_InCard within card number). In Model 9, additional possible
fixed effect interaction terms were added to the model specifications: The
R_InCard*Diagnostic Severity and the card number*Diagnostic Severity interaction
terms (cross-level factor*covariate interactions) were not statistically significant.
Model 8 proved to be the best model for understanding the structure of FA. The
fit statistic was low compared to the other models (indicating it had improved fit), all
specified effects were statistically significant, and the patterns within the fixed effect
parameter estimates were largely consistent with the simpler models, indicating stability
in the effects. Based on R_InCard parameter estimates, predicted FA was lower for each
subsequent response within a card. Based on Cohen’s d values computed at the response
level, relative to the 1st response to a card, on average the 2nd response was about 3/10 of
a SD lower, the 3rd response was about 5/10 of a SD lower, and the 4th response had
average FA that was about 7/10 of a SD lower than the 1st response. Predicted FA was
also different for each factor level of card number: When placed in order of descending
parameter estimates, the factor levels were Card 5, 1, 7, 4, 8, 10, 2, 3, 6, and 9. Based on
Cohen’s d values computed at the response level, Card 5 was about a quarter of a
standard deviation higher in FA than the reference value set by Card 1. Cards 7, 4, 8, 10,
2, and 3 were about a third of a SD lower in FA than Card 1, and Card 6 was about half a
SD lower in FA than Card 1. Card 9 stood out by having average FA scores that were
almost a full SD below Card 1 and about 1.2 SDs below Card 5. The R_InCard*card
number interaction effect parameter estimates also contributed small adjustments to
predicted FA based on the exact combinations of R_InCard and card number factor
levels. Additionally, there was a small linear effect of Diagnostic Severity (fixed effect
covariate) on predicted FA scores, with higher Diagnostic Severity scores predicting
slightly lower FA scores, on average. Lastly, there was statistical support for specifying
repeated measurements of FA within R_InCard within card number.
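The response-position effect sizes reported above are pooled-SD Cohen's d values. A stdlib sketch follows, with invented FA scores for 1st versus 2nd responses within a card; the magnitudes reported in the text (e.g., roughly 3/10 of a SD for 2nd responses) come from the real Criterion Database, not from this toy sample.

```python
# Cohen's d for two independent groups, using the pooled standard deviation.
# The score lists are hypothetical, for illustration only.
from math import sqrt

def cohens_d(a, b):
    """d = (mean(b) - mean(a)) / pooled SD; negative when group b scores lower."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled = sqrt(((na - 1) * va + (nb - 1) * vb) / (na + nb - 2))
    return (mb - ma) / pooled

first  = [4, 4, 3, 5, 4, 3]   # hypothetical FA scores for 1st responses to a card
second = [3, 4, 3, 4, 3, 2]   # hypothetical FA scores for 2nd responses
print(cohens_d(first, second))  # negative: 2nd responses average lower FA
```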
Modeling the Structure of PFM
The mean of the response-level PFM score across all responses and protocols (M
= 8.80, SD = 14.37) indicates that, on average, people delivered responses with objects
that about 9% of people in the comparison samples also saw. Note, however, the SD is
larger than the M and the distribution has a floor of zero, indicating that this is a variable
with a positively skewed distribution (skew = 2.02). At the high end of the range of
observed scores (0 to 63.25), at least one person delivered a response in which the objects, on average, were present in 63.25% of the protocols across samples. In other words, some responses contained objects that more than half of the people in the comparison samples also saw.
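The shape described above (SD larger than the mean, with a floor at zero) is captured by the moment-based skewness statistic, g1 = m3 / m2^1.5. A stdlib sketch with an invented zero-floored sample mimicking PFM's shape (many zeros, a long right tail):

```python
# Fisher-Pearson coefficient of skewness, g1 = m3 / m2**1.5 (biased form).
# The sample is hypothetical, chosen to mimic PFM's zero floor and right tail.
def skewness(xs):
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment (variance)
    m3 = sum((x - mean) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

sample = [0, 0, 0, 0, 1, 1, 2, 3, 8, 25]  # floor at zero, long right tail
print(skewness(sample))  # positive, as with PFM (skew = 2.02 in the data)
```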
Response-level PFM was modeled with a series of 11 HLM models. The PFM
modeling began with two-level and three-level null models (PFM Models 1-3), and the
models collectively indicated that there were higher-order effects on PFM, that a multilevel model (HLM) was appropriate, and that there was clustering of PFM error variance by
card number. Possible fixed effects were specified in the next steps, as well as a repeated
measures effect on the covariance structure of the data (Models 4-6). The PFM intercept,
R_InCard (level-1 fixed factor as a main effect), and card number (level-2 fixed factor as
a main effect) were all statistically significant predictors in the modeling. There was also
statistical support for specifying repeated measurements of PFM within R_InCard within
card number. Next (Models 7-8), a fixed effect interaction term was specified as a
possible predictor of PFM in addition to the previous main effects. The R_InCard*card
number interaction term (cross-level factor*factor interaction) entered as a statistically
significant effect. Additionally, the scaled-identity matrix covariance structure was
replaced with a diagonal matrix covariance structure for the repeated measures
specification, which again led to an improved model fit. In the last PFM models (Models
9-11), Diagnostic Severity was introduced into the model specifications. In Model 9,
Diagnostic Severity failed to enter the model as a main effect (level-3 fixed effect
covariate). In Model 10, additional possible fixed effect interaction terms were added to
the model specifications: R_InCard*Diagnostic Severity (level-1*level-3) and card
number*Diagnostic Severity (level-2*level-3). Both terms are cross-level
factor*covariate interaction terms. Diagnostic Severity still failed to enter the model as a
main effect, and card number*Diagnostic Severity was not a significant interaction.
However, there was a small but statistically significant fixed effect for the interaction
term of R_InCard*Diagnostic Severity in predicting PFM. Model 11 was a simplification
of Model 10 in which the significant effects were retained in the specifications, but non-significant predictors were dropped from the model.
Model 11 was the best structural model for predicting response-level PFM scores.
The fit statistic was lower than previous models and all specified effects were statistically
significant. Also, like with FA, there was good consistency in fixed effect parameter
estimate patterns throughout the modeling of PFM. Entirely consistent with the prediction
models for FA, predicted PFM scores were lower for each subsequent response within a
card. When the response-level d values were assessed, relative to the 1st response to a
card, on average the 2nd response was about 1/2 of a SD lower in PFM, the 3rd response was about 3/4 of a SD lower, and the 4th response had average PFM that was about 9/10 of
a SD lower than the 1st response. Predicted PFM also varied based on the factor level of
card number. In order, predicted PFM was highest for Card 5, followed by Card 3, Cards
8, 1, 4 and 7, Card 2, Card 6, Card 10, and Card 9. When d was computed using
response-level data, PFM values for Cards 5 and 3 were about half a SD above the
reference value of Card 1; Cards 8, 4, 7, and 2 were within 2/10 a SD of Card 1; Card 6
was about a half a SD below Card 1; and Cards 10 and 9 were about 3/4 to 1 SD below
Card 1. The R_InCard*card number interaction effect parameter estimates also
contributed small adjustments to predicted PFM based on the exact combinations of
R_InCard and card number factor levels. Of the four R_InCard*Diagnostic Severity
interaction effect parameter estimates, only one was significant (R_InCard =
2*Diagnostic Severity). It indicates that if the response was the second response within a
card, for each unit of increase on Diagnostic Severity, predicted PFM is reduced by 0.31
units, which is a small change. In conjunction with the p value of .03 for this effect and
the fact that there was not a consistent pattern in the interaction across other response
positions, this small degree of change has to be considered tentative. Finally, there was
also statistical support for specifying repeated measurements of PFM within R_InCard
within card number.
Modeling the Structure of PFN1.5
The response-level PFN1.5 scores ranged from 0-6, with an average score of 2.37
(SD = 2.43). On average, people gave response objects that were present in 2.37 samples
at a frequency of 1.5% or higher. Modeling of the response-level PFN1.5 scores was
accomplished using 11 HLM models. The two-level and three-level null models (PFN1.5
Models 1-4) collectively indicated that there were higher-order effects on PFN1.5, that a
multi-level model (HLM) was appropriate, and there was clustering of PFN1.5 error
variance by card number. Possible fixed effects were specified in the next steps, as well
as a repeated measures effect on the covariance structure of the data (Models 5-7). The
PFN1.5 intercept, R_InCard (level-1 fixed factor as a main effect), and card number
(level-2 fixed factor as a main effect) all entered the modeling as significant predictors.
As with FA and PFM, there was also statistical support for specifying repeated
measurements of PFN1.5 within R_InCard within card number. Next (Models 8-9), a
R_InCard*card number interaction term was specified as a possible fixed effect predictor
of PFN1.5, and it entered as a statistically significant effect. Additionally, the scaled-identity matrix covariance structure was replaced with a diagonal matrix covariance
structure for the repeated measures specification, leading to better model fit. In the last
PFN1.5 models (Models 10-11), Diagnostic Severity was introduced into the model
specifications. In Model 10, Diagnostic Severity failed to enter the model as a main effect
(level-3 fixed effect covariate). In Model 11, the additional possible fixed effect
interaction terms were added to the model specifications: R_InCard*Diagnostic Severity
(level-1*level-3) and card number*Diagnostic Severity (level-2*level-3). Diagnostic
Severity still failed to enter the model as a main effect, and neither of the new interaction
terms was statistically significant in the prediction of response-level PFN1.5 scores.
Model 9 was the best structural model for predicting response-level PFN1.5
scores; the model fit statistic was lower than previous models and the model was not
over-specified in that all effects were statistically significant. There was once again good
consistency in fixed effect parameter estimate patterns throughout the modeling
sequence. As observed in the FA and PFM models, the predicted response-level PFN1.5
scores were lower for each subsequent response within a card, and the scores also varied
based on the factor level of card number. Based on response-level effect sizes, relative to
the 1st response to a card, on average the 2nd response had PFN1.5 values that were about
4/10 of a SD lower, the 3rd response was about 7/10 of a SD lower, and the 4th response
had an average PFN1.5 value that was about 9/10 of a SD lower than for the 1st response.
With respect to card number, in order, predicted PFN1.5 was highest for Card 5, followed
by Cards 1 and 7, Card 3, Card 4, Card 2, Card 10, Card 8, Card 6, and Card 9.
According to response-level Cohen’s d values, Card 5 was about a 1/10 of a SD higher
than the reference value for Card 1; Cards 7 and 3 were about 1/10 of a SD lower than
Card 1; Cards 4, 2, 10, and 8 were about 2/10 to 4/10 a SD lower than Card 1; Card 6 was
about 7/10 of a SD lower; and Card 9 was about 9/10 of a SD lower than Card 1. The
R_InCard*card number interaction effect parameter estimates also contributed small
adjustments to predicted PFN1.5 scores. Lastly, there was statistical support for
specifying repeated measurements of PFN1.5 within R_InCard within card number.
Summary of Variable Structures Across Modeling Techniques
Analyses of the Criterion Database were completed using a total of 159 protocols,
collectively containing 3,897 responses with form demand. At the response level across
the variables of Diagnostic Severity, FA, PFM, and PFN1.5, the scores covered the full
range of possible values (except for PFN1.5, which still had a large range), and also had
high degrees of variability across responses within the Criterion Database. The full
ranges and large standard deviations were useful to consider before examining results of
analyses because they indicate good spread of scores, and based on the theory behind the
current research, it was anticipated that FA, PFM, and PFN1.5 all relate to reality testing
ability and would correlate with Diagnostic Severity. Therefore, the lack of range
restriction increases the possible effect sizes and power of the analyses, and reduces the
chance of making Type II errors (i.e., failing to reject the null hypothesis when the null
hypothesis is false).
Across the various analyses completed at different levels of aggregation, there
was resounding evidence that the structure of the cards and the Rorschach task itself
produce deviations in goodness-of-fit and frequency scores that cannot be entirely
attributed to stable characteristics of the test-taker. When regression equations were
computed independently for each person when predicting FA, PFM, and PFN1.5
(specified through the random effects commands), there were very consistent clustering
effects in the data due to card number and due to response within card. Understanding the
structural patterns of the fit and frequency data is an important undertaking in forming the
foundation for future research on Rorschach perceptual accuracy scoring.
Card number and R_InCard main effects, as well as their interaction, accounted
for a significant portion of the score variance in FA, PFM, and PFN1.5. Across all 3
scores, the predicted scores are lowered with each subsequent response within card, as a
main effect across cards. Additionally, the factor level of card number (i.e., which card
the response is being given to) also impacts the intercept of the 3 variables, but the
pattern is a bit less consistent than is seen with R_InCard. Generally speaking, the main
effects of card number across the different sets of models indicate that FA, PFM, and
PFN1.5 predicted scores tend to be highest for Card 5, Card 1, and Card 7 (in decreasing
score order); they tend to be lowest on Card 9, followed by Card 6. PFM scores showed a
slight deviation in this pattern, with a spike in PFM scores on Card 3 due to the popular
response to locations D1, D9, and W, and the extremely common response object of bow
or butterfly to the D3 location. The patterns observed in the HLM modeling were also
evidenced in the response-level and protocol-level descriptive statistics. The protocol-level statistics reinforced the pattern of FA, PFM, and PFN1.5 scores decreasing with
each subsequent response within a card, and it can be interpreted to mean that with each
subsequent response within a card, the objects used in constructing the responses have
worse perceptual fit, are less commonly used, and are commonly used in fewer countries. Interestingly, the protocol-level standard deviations also
revealed that FA, PFM, and PFN1.5 have more variation on later responses within card;
R_InCard 1 and 2 have tighter distributions than R_InCard 3 and 4 across FA, PFM, and
PFN1.5. In other words, people’s scores tend to scatter more on later responses within
cards. However, reduced sample size with each subsequent response within card likely
accounts for a portion of this effect.
The HLM modeling accounts for clustering of scores within people, but the
descriptive statistics do not. Therefore, it can be concluded that there were structural
consistencies in the data across the FA and PF indices that were present within people as
well as across people. There was also substantially more unexplained within-person
variance in scores than between-person variance in scores in the HLM models. This is
evidence that the patterns of scores within person, organized by card number and by
R_InCard, were stable and pronounced. However, there is also substantial residual within-person variance that has not yet been accounted for by the modeling. It was anticipated
that Diagnostic Severity would correlate with mean FA, PFM, and PFN1.5 at the protocol
level, and that the variables might have stronger or weaker relationships with Diagnostic
Severity as an effect of card number or response within card. Surprisingly, Diagnostic
Severity had very little association with the fit and frequency scores. There were small
correlations between Diagnostic Severity and protocol-level FA over all responses, and
for responses to Card 4; correlation coefficients were near significance for FA on Card 6,
and for the 1st and 2nd responses to each card. The correlations were in the expected
direction, with higher Diagnostic Severity scores corresponding with lower FA scores.
However, when an adjusted alpha was used to account for running multiple tests of
exploratory correlations, the correlation with FA for responses to Card 4 no longer
reached significance. There were no statistically-significant correlations between
Diagnostic Severity scores and the protocol-level PFM and PFN1.5 variables. This lack
of correlation also occurred when mean FA and sum of FI were calculated for the tail of
the object distribution, thus isolating the variables for responses with only unique objects
that occur in less than 1.5% of protocols. This finding also coincides with the absence of
Diagnostic Severity as a significant effect in most of the PFM and PFN1.5 HLM models.
Within the HLM models, there was a small linear effect of Diagnostic Severity on
predicted response-level FA scores, with each unit of increase on Diagnostic Severity
scores predicting a slight reduction in FA score (-0.05 units). In the HLM modeling of
PFM, if the response was the second response within a card, predicted PFM was reduced
by a very small amount (-0.31 units) for each unit of increase on Diagnostic Severity.
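The multilevel structure described above, with responses at level 1 nested within persons at level 2 and fixed effects such as Diagnostic Severity and response position, can be sketched with a random-intercept model. This is a minimal illustration using simulated data and hypothetical variable names (fa, severity, r_in_card), not the study's actual HLM specification or dataset:

```python
# Minimal sketch of a two-level random-intercept model in the spirit of
# the HLM analyses: responses (level 1) nested within persons (level 2).
# All data and variable names here are simulated placeholders.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n_people = 80                                                # persons (level 2)
r_in_card = np.tile([1, 2], 10 * n_people)                   # 2 responses x 10 cards
person = np.repeat(np.arange(n_people), 20)
severity = np.repeat(rng.integers(1, 6, n_people), 20)       # 5-point person-level rating
u = np.repeat(rng.normal(0, 0.3, n_people), 20)              # person random intercepts
fa = (3.0 - 0.05 * severity - 0.2 * (r_in_card - 1) + u
      + rng.normal(0, 0.8, n_people * 20))                   # response-level FA score

df = pd.DataFrame({"fa": fa, "severity": severity,
                   "r_in_card": r_in_card, "person": person})

# Fixed effects for severity and response position; random intercept by person.
result = smf.mixedlm("fa ~ severity + C(r_in_card)", df, groups=df["person"]).fit()

# Intraclass correlation: share of variance lying between persons.
icc = result.cov_re.iloc[0, 0] / (result.cov_re.iloc[0, 0] + result.scale)
print(result.params["severity"], icc)
```

A small ICC with a large residual term mirrors the pattern reported here: most unexplained variance sits within persons rather than between them.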
Strengths and Limitations of the Study
Although HLM proved to be a very useful technique in exploring and
understanding the structure of FA, PFM, and PFN1.5, I was not able to specify the
models such that Diagnostic Severity was the dependent variable, with predictor variables
including FA, PFM, PFN1.5, card number, and R_InCard. This is because hierarchical linear modeling requires the dependent variable to be a level-1 variable (i.e., one that varies at the lowest level of the data structure).
However, HLM did prove useful and appropriate for the structural modeling of the fit and
frequency variables, and supplemental techniques were used to help broaden the data
exploration.
The Diagnostic Severity criterion measure has strong interrater reliability and is
an interesting approach to quantifying a clinical construct. However, it is a 5-point scale, which makes it a somewhat blunt criterion measure, and it is based on ratings of billing diagnoses assigned before patients underwent assessment. Billing diagnoses are based on chart review and are used to allow the hospital to bill for services. They are therefore quite tentative, and are often revised after the patient completes
the in-person evaluation. It should also be noted that the specific sample of patients used
comprised individuals who had complex clinical presentations, and that was the
reason they had been referred for assessment, making it quite possible that billing
diagnoses might have lower agreement with final diagnosis than is typically the case.
Diagnostic Severity is also not specific to reality testing ability; it was conceptually
derived to quantify the degree of overall dysfunction associated with a diagnosis, with
higher scores indicating higher levels of dysfunction. The scores cannot be parsed into a
domain of perceptual distortion. Although Diagnostic Severity scores were distributed
across the possible range, there were no non-patients in the sample, which would have
broadened the range of level of dysfunction, and thus increased power of the analyses.
Benefits of using the Criterion Database included its relatively large sample size, its clinically diverse sample spanning a variety of diagnoses, and the fact that the Rorschach protocols had been modeled for R-Optimization, with R-Optimized administration being the new standard for Rorschach assessment using R-PAS.
FA clearly produced more encouraging results than the PF variables with regard to aligning with the criterion variable. Coder judgment and extrapolation from the tables were not involved in assigning PF variable scores, though they are components of FA scoring. It is possible that the extrapolation and coder judgment process is an important element in the final fit scores (i.e., FA) aligning with Diagnostic Severity, which may help explain why FA clearly outperformed PFM and PFN1.5. Of note, the PFM variable was also a cruder measure than it may seem at first glance. Response-level PFM scores
averaged the PF object-level percentage-based scores only when the PF value for the
object was 1.5% or higher. If three important objects were included in a response, but
only two of the objects had PF values of 1.5% or higher, the response-level PFM score
was the mean of two of the three objects because the third object had a missing value for
the object-level score. If a response contained one or more objects that incorporated form,
but none of the objects had PF values of 1.5% or higher, the response-level PFM score
was assigned a value of zero. These coding decisions should lead to an upward bias of
PFM. However, this was done because it would not have been reasonable to assign a
value of 0 to all objects with PFs less than 1.5%. This concept also applies to the
computation of PFM scores in the FA and PF lookup tables. The final international
object-level score reflects the percentage of protocols containing each object, averaged
across countries. However, the country-specific scores are missing values in the tables if
the object occurs in less than 1.5% of protocols. Therefore, the calculation of the
international percentage-based score also has upward bias. Response-level PFM also
evidenced a positively skewed distribution (skew = 2.02) and the variable was not
transformed prior to being modeled. PFN1.5 may also be limited in that it is based on a
single dichotomy; each unique object was either present in at least 1.5% of the within-country samples or it was not. It is possible that some other cut-point (e.g., 1%, 2%, 4%, or 10%) would have been more discriminating.
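As a concrete restatement, the response-level PFM rule described above can be sketched as follows; the function name and threshold handling reflect my reading of the text, and the numbers are illustrative only:

```python
# Sketch of the response-level PFM rule described above: average the
# object-level PF percentages, but only over objects whose PF is at or
# above the 1.5% floor; if the response incorporates form yet no object
# reaches the floor, the response-level score is set to 0.
PF_FLOOR = 1.5  # percent of protocols containing the object

def response_pfm(object_pfs: list[float]) -> float:
    eligible = [pf for pf in object_pfs if pf >= PF_FLOOR]
    if not eligible:
        return 0.0           # form-based response with only sub-floor objects
    return sum(eligible) / len(eligible)

# Three objects, one below the floor: the mean uses only the two
# eligible objects, illustrating the upward bias discussed above.
print(response_pfm([4.0, 2.5, 0.9]))   # mean of 4.0 and 2.5 -> 3.25
print(response_pfm([0.5, 1.0]))        # no object reaches 1.5% -> 0.0
```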
Expected and Surprising Findings
Based on previous research and clinical use of the Rorschach, it was anticipated
that the Rorschach cards would differ to a degree on average FA, PFM, and PFN1.5
scores, as it is commonly professed that some cards are “easier” or “more difficult” than
others (e.g., Meyer & Viglione, 2008). It was also anticipated that R_InCard would have
an impact on the fit and frequency scores, with scores decreasing over responses within a
card. Prior to this research, one theory held by the FA and PF systematizers and their students was that, within each card, people might deliver their subjectively good and more obvious responses first, then deliver less obvious, more tentative responses that even the test-taker might recognize as lacking strong perceptual fit.
It was surprising that Diagnostic Severity did not have stronger associations with
FA scores, and that it had no association with PFM and PFN1.5. With regard to
R_InCard, though differences in scores were expected as an effect of response within
card, it was not anticipated that R_InCard 1 and 2 would stand out as the more important responses for differentiating between people on FA. This conclusion is
supported by the trends seen in the data, with near-significant small correlations between
FA and Diagnostic Severity that were observed for the first and second response within a
card, and the lack of correlation between FA and Diagnostic Severity for the third and
fourth response within a card. The near-significant correlations between FA and
Diagnostic Severity that were observed for Cards 4 and 6 were not anticipated either,
because they are mid-level cards in that they are not the obvious “easy” or “hard” cards to
do well on with regard to FA, PFM, and PFN1.5. However, the correlations did not reach significance when 2-tailed adjusted alphas were used, and they were also small, so it is not clear whether such results would replicate in other samples. Although the degree of association between Diagnostic Severity and response-level FA does not change as a function of card number or R_InCard (i.e., the interactions were not significant), the FA protocol-level analyses do suggest that Cards 4 and 6, and the first and second responses within a card, may help differentiate levels of diagnostic severity to a small degree.
Following the analyses, further attempts were made to understand the patterns
seen with the Diagnostic Severity scores. In the meta-analyses by Mihura et al. (2013),
CS X+% and X-% differentiated psychotic disorder samples from comparison psychiatric
samples with medium effect sizes (r = .31, p < .01; r = .47, p < .01), suggesting that it
would be possible to have similar effect sizes when exploring FA, PFM, and PFN1.5.
However, in Meyer et al. (2011), Diagnostic Severity scores did not have significant
correlations with R-PAS FQ indices (non-significant p values: FQo% r = -.14; FQ-% r =
.13; WD-% r = .15), but correlations were significant for 3 of 4 CS FQ indices
(significant p values: FQo% r = -.26; FQ-% r = .29; WD-% r = .28). This indicates that
Diagnostic Severity was not a strong criterion for FQ-%, which is the primary CS and R-PAS FQ variable of interest in studying reality testing and psychosis. If the moderate
effect sizes seen in the meta-analyses are considered to be a good indication of the effect
sizes that could be anticipated when differentiating psychotic disordered patients from
non-psychotic disordered patients using the Rorschach, it becomes apparent that the
criterion variable in validity studies must be highly correlated with the true construct of
interest to have enough power in the analyses to have significant results without using an
extremely large sample. Given that the CS FQ indices had small correlations with
Diagnostic Severity, and R-PAS FQ indices did not have significant correlations with
Diagnostic Severity (though the effect size magnitudes were .13-.15 for 3 of the 4
indices), it seems quite possible that the Diagnostic Severity scores were too rough and
non-specific to the construct of interest to result in higher associations with the FA, PFM,
and PFN1.5 variables. Finding a criterion measure that allows for accurate and
dimensional measurement of perceptual accuracy is an incredibly difficult task. In
previous Rorschach research, similar problems have arisen. For example, in Horn (2009),
across six performance-based measures of accuracy of perception that ranged from very
basic neuropsychological accuracy tests to very complex interpersonal perception tasks,
there was minimal association between criterion measure scores. This made it much more
difficult to interpret the moderate correlations observed between FA and FQ indices and a
few of the criterion measures. However, it was ultimately concluded that FA seems to
indicate a more emotionally-removed and colder cognitive style of perceptual accuracy,
while Form Quality seems to encompass a warmer and more emotionally-involved style
of accuracy.
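The power concern above can be made concrete with the standard Fisher-z sample-size approximation for detecting a correlation; this is a generic calculation applied to the effect sizes quoted above, not an analysis from the study:

```python
# Approximate N needed to detect a population correlation r with a
# two-tailed test at the given alpha and power, via the Fisher z
# transformation: n ~ ((z_{1-alpha/2} + z_{power}) / atanh(r))^2 + 3.
import math
from statistics import NormalDist

def n_for_correlation(r: float, alpha: float = 0.05, power: float = 0.80) -> int:
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_power) / math.atanh(r)) ** 2 + 3)

# A medium meta-analytic effect (r = .31) versus the small r = .15
# observed here: the small effect requires several times the sample.
print(n_for_correlation(0.31))   # ~80 participants
print(n_for_correlation(0.15))   # ~347 participants
```

The contrast illustrates why a criterion only weakly correlated with the construct of interest leaves an ordinary clinical sample underpowered.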
Conclusions
In essence, the Rorschach inkblots contain a great deal of perceptual structure, and people seem to attend to that structure to a high degree, regardless of their level of psychopathology. Therefore, there is much more consistency
in fit and frequency scores between people than within people when factors such as the
card number and which response people are on within the card are considered. That being
said, there is still a high level of unexplained residual variance in fit and frequency scores
within people. The structure of FA, PFM, and PFN1.5 by card number and by response
within card has been detailed. This explanation of structure can benefit researchers, and
eventually clinicians, as it will help inform expectations for fit and frequency
performance on the Rorschach. Knowing the structural contributions to the scores can
allow researchers to account for those effects in test-taker performance on the test, and in
the future this will ideally reduce erroneous assumptions about clinical constructs like
reality testing ability due to score variance that is explained in substantial part by
elements of the cards rather than characteristics of the test-taker. For instance, observing
relatively poor perceptual accuracy on Card 9 is to be expected and that says something
about that particular inkblot, not necessarily about the person generating a response.
Next steps in this line of research could include a follow-up criterion validity
study of the frequency scores to see if the lack of association between the criterion
measure and the Rorschach scores replicates. Additionally, there could be further
exploration of how to best adapt FA (and potentially PFM and/or PFN1.5, if future results
are highly promising) to improve the FQ system of scoring for perceptual accuracy using
the Rorschach. Interesting approaches to future research could include isolation of just
the first and second responses to each card, and investigating Form Accuracy and Percept
Frequency scores for those initial responses. Researchers could also explore the reaction
time of test-takers in the delivery of responses, and how reaction time might moderate
relationships between variables. For example, if reaction time is extremely fast, the
person may be delivering a very typical and obvious response. If reaction time is more
delayed, the person may be delivering more unique responses that might have stronger
association with a criterion variable. Interestingly, though, social and cognitive psychologists might posit that responses with longer reaction times also indicate a response search process that includes higher levels of social desirability.
Another possible approach is to remove the responses that contain objects that are spikes
in the frequency distribution (e.g., on Cards 1, 3, 5, and 7) from the analyses, which
would reduce the impact of extremely typical responses on the final analyses. Similarly, a
count variable could be created for the absence of those extremely typical responses
within a protocol.
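The reaction-time idea above amounts to a moderation hypothesis, which is commonly tested with an interaction term. The sketch below uses simulated data and hypothetical variable names (score, rt, criterion), not anything from this study:

```python
# Sketch of testing reaction time as a moderator: does the score-criterion
# association change with RT? Modeled as a score x rt interaction.
# Simulated placeholder data; a positive score:rt coefficient would mean
# the association strengthens for slower (more delayed) responses.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 300
rt = rng.gamma(shape=2.0, scale=3.0, size=n)        # reaction time in seconds
score = rng.normal(0.0, 1.0, n)                     # e.g., a fit/frequency score
# Simulate a criterion whose link to the score grows with reaction time.
criterion = 0.1 * score + 0.05 * score * rt + rng.normal(0.0, 1.0, n)

df = pd.DataFrame({"criterion": criterion, "score": score, "rt": rt})
fit = smf.ols("criterion ~ score * rt", df).fit()   # expands to score + rt + score:rt
print(fit.params["score:rt"])                       # moderation estimate
```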
The perceptual structure that is present in the Rorschach cards and the inherent
structure present in the sequential nature of the test-taking process contribute substantial
variance to final fit and frequency scores. Form Accuracy does demonstrate some
construct validity when modeled and correlated with Diagnostic Severity, and it is
hypothesized that the association is due to Form Accuracy functioning as an indicator of
reality testing ability, and thus psychosis. Despite having successfully modeled the
structure of FA, PFM, and PFN1.5, as well as their relationships to Diagnostic Severity
with consideration of structure within the cards and within the Rorschach test-taking
process, the high levels of unexplained residual variance indicate that there is substantial
person-specific information that contributes to meaningful score differences between
people. Knowing the structural contributions to the scores will allow researchers to
account for effects produced by the structure of the Rorschach task, and differentiate
those effects from the effects produced by person-specific variables, ultimately moving
the field closer to improved detection of reality testing abilities and psychosis.
References
American Psychiatric Association. (1994). Diagnostic and statistical manual of mental
disorders (4th ed.). Washington, DC: Author.
American Psychiatric Association. (2013). Diagnostic and statistical manual of mental
disorders (5th ed.). Washington, DC: Author.
Archer, R. P., & Gordon, R. A. (1988). MMPI and Rorschach indices of schizophrenic
and depressive diagnoses among adolescent inpatients. Journal of Personality
Assessment, 52, 276-287. doi: 10.1207/s15327752jpa5202_9
Archer, R. P., & Krishnamurthy, R. (1997). MMPI-A and Rorschach indices related to
depression and conduct disorder: An evaluation of the incremental validity
hypothesis. Journal of Personality Assessment, 69, 517-533. doi:
10.1207/s15327752jpa6903_7
Asari, T., Konishi, S., Jimura, K., Chikazoe, J., Nakamura, N., & Miyashita, Y. (2010).
Amygdalar enlargement associated with unique perception. Cortex, 46, 94-99.
doi: 10.1016/j.cortex.2008.08.001
Asari, T., Konishi, S., Jimura, K., Chikazoe, J., Nakamura, N., & Miyashita, Y. (2008).
Right temporopolar activation associated with unique perception. NeuroImage,
41, 145-152. doi: 10.1016/j.neuroimage.2008.01.059
Balcetis, E., & Dunning, D. (2006). See what you want to see: Motivational influences on
visual perception. Journal of Personality and Social Psychology, 91, 612-625.
doi: 10.1037/0022-3514.91.4.612
Balcetis, E., & Dunning, D. (2007). Cognitive dissonance and the perception of natural
environments. Psychological Science, 18, 917-921. doi: 10.1111/j.1467-9280.2007.02000.x
Bannatyne, L. A., Gacono, C. B., & Greene, R. L. (1999). Differential patterns of
responding among three groups of chronic, psychotic, forensic outpatients.
Journal of Clinical Psychology, 55, 1553-1565. doi: 10.1002/(SICI)1097-4679(199912)55:12<1553::AID-JCLP12>3.0.CO;2-1
Baron-Cohen, S., Wheelwright, S., Hill, J., Raste, Y., & Plumb, I. (2001). The “Reading
the Mind in the Eyes” Test Revised Version: A study with normal adults, and
adults with Asperger Syndrome or high-functioning autism. Journal of Child
Psychology and Psychiatry, 42, 241-251. doi: 10.1111/1469-7610.00715
Beaubien, J. M., Hamman, W. R., Holt, R. W., & Boehm-Davis, D. A. (2001). The
application of hierarchical linear modeling (HLM) techniques to commercial
aviation research. Proceedings of the 11th annual symposium on aviation
psychology, Columbus, OH: The Ohio State University Press.
Beck, S. J. (1938). Personality structure in schizophrenia: A Rorschach investigation in
81 patients and 64 controls. Nervous & Mental Disorders Monograph Series, 63.
Beck, S. J., Beck, A., Levitt, E., & Molish, H. (1961). Rorschach’s test. I: Basic processes
(3rd ed.). New York, NY: Grune & Stratton.
Benton, A. L., Sivan, A. B., deS. Hamsher, K., Varney, N. R., & Spreen, O. (1983).
Benton Judgment of Line Orientation (Forms H & V and record forms). Lutz, FL:
Psychological Assessment Resources.
Berkowitz, M., & Levine, J. (1953). Rorschach scoring categories as diagnostic “signs.”
Journal of Consulting Psychology, 17, 110-112. doi: 10.1037/h0062113
Blais, M. A., Hilsenroth, M. J., Castlebury, F., Fowler, C. J., & Baity, M. R. (2001).
Predicting DSM-IV cluster B personality disorder criteria from MMPI-2 and
Rorschach data: A test of incremental validity. Journal of Personality Assessment,
76, 150-168. doi: 10.1207/S15327752JPA7601_9
Bruner, J. S. (1957). On perceptual readiness. Psychological Review, 64, 123-152. doi:
10.1037/h0043805
Camara, W. J., Nathan, J. S., & Puente, A. E. (2000). Psychological test usage:
Implications in professional psychology. Professional Psychology: Research and
Practice, 31, 141-154. doi: 10.1037/0735-7028.31.2.141
Carney, D. R., Colvin, C. R., & Hall, J. A. (2007). A thin slice perspective on the
accuracy of first impressions. Journal of Research in Personality, 41, 1054-1072.
doi: 10.1016/j.jrp.2007.01.004
Cicchetti, D. V. (1994). Guidelines, criteria, and rules of thumb for evaluating normed
and standardized assessment instruments in psychology. Psychological
Assessment, 6, 284-290. doi: 10.1037/1040-3590.6.4.284
Clemence, A. J., & Handler, L. (2001). Psychological assessment on internship: A survey
of training directors and their expectations for students. Journal of Personality
Assessment, 76, 18-47. doi: 10.1207/S15327752JPA7601_2
Cohen, J. (1992). A power primer. Psychological Bulletin, 112, 155-159. doi:
10.1037/0033-2909.112.1.155
Dao, T. K., & Prevatt, F. (2006). A psychometric evaluation of the Rorschach
Comprehensive System’s Perceptual Thinking Index. Journal of Personality
Assessment, 86, 180-189. doi: 10.1207/s15327752jpa8602_07
Dao, T. K., Prevatt, F., & Horne, H. L. (2008). Differentiating psychotic patients from
nonpsychotic patients with the MMPI-2 and Rorschach. Journal of Personality
Assessment, 90, 93-101. doi: 10.1080/00223890701693819
Dawes, R. M. (1999). Two methods for studying the incremental validity of a Rorschach
variable. Psychological Assessment, 11(3), 297-302. doi: 10.1037/1040-3590.11.3.297
Dean, K. L., Viglione, D. J., Perry, W., & Meyer, G. J. (2007). A method to optimize the
response range while maintaining Rorschach Comprehensive System validity.
Journal of Personality Assessment, 89, 149-161. doi:
10.1080/00223890701468543
Dean, K. L., Viglione, D. J., Perry, W., & Meyer, G. J. (2008). Correction to: “A method
to optimize the response range while maintaining Rorschach Comprehensive
System validity”. Journal of Personality Assessment, 90, 2. doi:
10.1080/00223890701845542
Diener, M. J., Hilsenroth, M. J., Shaffer, S. A., & Sexton, J. E. (2011). A meta‐analysis of
the relationship between the Rorschach Ego Impairment Index (EII) and
psychiatric severity. Clinical Psychology & Psychotherapy, 18, 464-485. doi:
10.1002/cpp.725
Dzamonja-Ignjatovic, T., Smith, B. L., Jocic, D. D., & Milanovic, M. (2013). A
comparison of new and revised Rorschach measures of schizophrenic functioning
in a Serbian clinical sample. Journal of Personality Assessment, 95, 471-478. doi:
10.1080/00223891.2013.810153
Ekstrom, R. B., French, J. W., Harman, H. H., & Dermen, D. (1976). Manual for kit of
factor-referenced cognitive tests. Princeton, NJ: Educational Testing Service.
Epstein, S. (1979). The stability of behavior: I. On predicting most of the people much of
the time. Journal of Personality and Social Psychology, 37, 1097-1126. doi:
10.1037/0022-3514.37.7.1097
Epstein, S. (1980). The stability of behavior: II. Implications for psychological research.
American Psychologist, 35, 790-806. doi: 10.1037/0003-066X.35.9.790
Exner, J. E. (1974). The Rorschach: A comprehensive system: Vol 1. New York, NY:
Wiley & Sons.
Exner, J. E. (1984). More on the schizophrenia index. Alumni newsletter. Bayville, NY:
Rorschach Workshops.
Exner, J. E. (1986). The Rorschach: A comprehensive system: Vol 1. Basic foundations
(2nd ed.). New York, NY: Wiley & Sons.
Exner, J. E., Jr. (1989). Searching for projection in the Rorschach. Journal of Personality
Assessment, 53, 520-536. doi: 10.1207/s15327752jpa5303_9
Exner, J. E., Jr. (1991). The Rorschach: A comprehensive system: Vol. 2. Interpretation
(2nd ed.). New York, NY: Wiley & Sons.
Exner, J. E., Jr. (2000). A primer for Rorschach interpretation. Asheville, NC: Rorschach
Workshops.
Exner, J. E., Jr. (2003). The Rorschach: A comprehensive system: Vol. 1. Basic
foundations and principles of interpretation (4th ed.). New York, NY: Wiley & Sons.
Exner, J. E., Jr. (2007). A new U.S. adult nonpatient sample. Journal of Personality
Assessment, 89(S1), S154-S158. doi: 10.1080/00223890701583523
Friedman, H. (1953). Perceptual regression in schizophrenia: An hypothesis suggested by
the use of the Rorschach test. Journal of Projective Techniques, 17, 171-185. doi:
10.1080/08853126.1953.10380477
Ganellen, R. J. (1996). Comparing the diagnostic efficiency of the MMPI, MCMI-II, and
Rorschach: A review. Journal of Personality Assessment, 67, 219-243. doi:
10.1207/s15327752jpa6702_1
Ganellen, R. J., Wasyliw, O. E., & Haywood, T. W. (1996). Can psychosis be malingered
on the Rorschach? An empirical study. Journal of Personality Assessment, 66, 65-80. doi: 10.1207/s15327752jpa6601_5
Garb, H. N. (1984). The incremental validity of information used in personality
assessment. Clinical Psychology Review, 4, 641-655. doi: 10.1016/0272-7358(84)90010-2
Garson, G. D. (Ed.). (2013). Hierarchical linear modeling: Guide and applications. Los
Angeles, CA: Sage Publications.
Hathaway, S. R., & McKinley, J. C. (1967). MMPI manual (revised ed.). New York, NY:
Psychological Corporation.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2010). Multilevel and longitudinal
modeling with IBM SPSS. New York: Routledge.
Hertz, M. R. (1970). Frequency tables for scoring Rorschach responses with code charts,
normal and rare details, F+ and F– responses, and popular and original
responses (5th ed.). Cleveland, OH: The Press of Case Western Reserve
University.
Hilsenroth, M. J., Eudell-Simmons, E. M., DeFife, J. A., & Charnas, J. W. (2007). The
Rorschach Perceptual-Thinking Index (PTI): An examination of reliability,
validity, and diagnostic efficiency. International Journal of Testing, 7(3), 269-291. doi: 10.1080/15305050701438033
Hilsenroth, M. J., Fowler, J. C., & Padawer, J. R. (1998). The Rorschach Schizophrenia
Index (SCZI): An examination of reliability, validity, and diagnostic efficiency.
Journal of Personality Assessment, 70, 514-534. doi:
10.1207/s15327752jpa7003_9
Hoelzle, J. B., & Meyer, G. J. (2008). The factor structure of the MMPI-2 Restructured
Clinical (RC) Scales. Journal of Personality Assessment, 90, 443-455. doi:
10.1080/00223890802248711
Horn, S. L. (2009). Rorschach perception: Multimethod validation of Form Accuracy.
Unpublished master’s thesis, University of Toledo, Ohio.
Horn, S. L., Meyer, G. J., Viglione, D. J., & Ozbey, G. T. (2008, March). The validity of
the Rorschach Human Representational Variable using Form Accuracy. In G. J.
Meyer (Chair), Assessing perceptual accuracy on the Rorschach using Form
Accuracy ratings versus Form Quality scores. Symposium conducted at the
annual meeting of the Society for Personality Assessment, New Orleans, LA.
Howell, D. C. (2010). Statistical methods for psychology (7th ed.). Belmont, CA:
Wadsworth.
Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd ed.). New York:
Routledge.
Jorgensen, K., Andersen, T. J., & Dam, H. (2000). The diagnostic efficiency of the
Rorschach Depression Index and Schizophrenia Index: A review. Assessment, 7,
259-280. doi: 10.1177/107319110000700306
Kimball, A. J. (1950). Evaluation of form-level in the Rorschach. Journal of Projective
Techniques, 14, 219-244. doi: 10.1080/08853126.1950.10380327
Kimhy, D., Corcoran, C., Harkavy-Friedman, J. M., Ritzler, B., Javitt, D. C., &
Malaspina, D. (2007). Visual form perception: A comparison of individuals at
high risk for psychosis, recent onset schizophrenia and chronic schizophrenia.
Schizophrenia Research, 97, 25-34. doi: 10.1016/j.schres.2007.08.022
Kinder, B., Brubaker, R., Ingram, R., & Reading, E. (1982). Rorschach form quality: A
comparison of the Exner and Beck systems. Journal of Personality Assessment,
46, 131-138. doi: 10.1207/s15327752jpa4602_4
Knopf, I. J. (1956). Rorschach summary scores in differential diagnosis. Journal of
Consulting Psychology, 20, 99-104. doi: 10.1037/h0049120
Koivisto, M., & Revonsuo, A. (2007). How meaning shapes seeing. Psychological
Science, 18, 845-849. doi: 10.1111/j.1467-9280.2007.01989.x
Leichtman, M. (1996). The nature of the Rorschach task. Journal of Personality
Assessment, 67, 478-493. doi: 10.1207/s15327752jpa6703_4
Luke, D. (2004). Multilevel modeling. Thousand Oaks, CA: Sage Publications.
Lunazzi, H. A., Urrutia, M. I., de La Fuente, M. G., Elias, D., Fernandez, F., & de La
Fuente, S. (2007). Rorschach Comprehensive System data for a sample of 506
adult nonpatients from Argentina. Journal of Personality Assessment, 89(S1), S7-S12. doi: 10.1080/00223890701582806
Mason, B. J., Cohen, J. B., & Exner, J. E., Jr. (1985). Schizophrenic, depressive, and
nonpatient personality organizations described by Rorschach factor structures.
Journal of Personality Assessment, 49, 295-305. doi:
10.1207/s15327752jpa4903_16
Mayman, M. (1970). Reality contact, defense effectiveness, and psychopathology in
Rorschach form level scores. In B. Klopfer, M. Meyer, & F. Brawer (Eds.),
Developments in the Rorschach technique. III: Aspects of personality structure
(pp. 11-46). New York, NY: Harcourt Brace Jovanovich.
McGrath, R. E. (2008). The Rorschach in the context of performance-based personality
assessment. Journal of Personality Assessment, 90, 465-475. doi:
10.1080/00223890802248760
Meyer, G. J. (1997). On the integration of personality assessment methods: The
Rorschach and MMPI. Journal of Personality Assessment, 68, 297-330. doi:
10.1207/s15327752jpa6802_5
Meyer, G. J. (2000). Incremental validity of the Rorschach prognostic rating scale over
the MMPI ego strength scale and IQ. Journal of Personality Assessment, 74, 356-370. doi: 10.1207/S15327752JPA7403_2
Meyer, G. J. (2001). Evidence to correct misperceptions about Rorschach norms. Clinical
Psychology: Science and Practice, 8, 389-396. doi: 10.1093/clipsy/8.3.389
Meyer, G. J., & Eblin, J. J. (2012). An overview of the Rorschach Performance
Assessment System (R-PAS). Psychological Injury and Law, 5, 107-121. doi:
10.1007/s12207-012-9130-y
Meyer, G. J., Erdberg, P., & Shaffer, T. W. (2007). Toward international normative
reference data for the Comprehensive System. Journal of Personality Assessment,
89(S1), S201-S216. doi: 10.1080/00223890701629342
Meyer, G. J., Hsiao, W., Viglione, D. J., Mihura, J. L., & Abraham, L. M. (2013).
Rorschach scores in applied clinical practice: A survey of perceived validity by
experienced clinicians. Journal of Personality Assessment, 95, 351-365. doi:
10.1080/00223891.2013.770399
Meyer, G. J., & Kurtz, J. E. (2006). Advancing personality assessment terminology: Time
to retire “Objective” and “Projective” as personality test descriptors. Journal of
Personality Assessment, 87, 223-225. doi: 10.1207/s15327752jpa8703_01
Meyer, G. J., Patton, W. M., & Henley, C. (2003, March). A comparison of form quality
tables from Exner, Beck, and Hertz. Paper presented at the annual meeting of the
Society for Personality Assessment, San Francisco, CA.
Meyer, G. J., & Resnick, G. D. (1996). Assessing ego impairment: Do scoring
procedures make a difference? Paper presented at the 15th International Research
Conference, Boston, MA.
Meyer, G. J., Riethmiller, R. J., Brooks, R. D., Benoit, W. A., & Handler, L. (2000). A
replication of Rorschach and MMPI-2 convergent validity. Journal of Personality
Assessment, 74, 175-215. doi: 10.1207/S15327752JPA7402_3
Meyer, G. J., Viglione, D. J., Mihura, J. L., Erard, R. E., & Erdberg, P. (2011).
Rorschach Performance Assessment System: Administration, coding,
interpretation, and technical manual. Toledo, OH: Rorschach Performance
Assessment System.
Meyer, G. J., & Viglione, D. J. (2008, March). Overview of the Form Accuracy rating
project and general findings. In G. J. Meyer (Chair), Assessing perceptual
accuracy on the Rorschach using Form Accuracy ratings versus Form Quality
scores. Symposium conducted at the annual meeting of the Society for Personality
Assessment, New Orleans, LA.
Mihura, J. L., Meyer, G. J., Dumitrascu, N., & Bombel, G. (2013). The validity of
individual Rorschach variables: Systematic reviews and meta-analyses of the
Comprehensive System. Psychological Bulletin, 139, 548-605. doi:
10.1037/a0029406
Minassian, A., Granholm, E., Verney, S., & Perry, W. (2004). Pupillary dilation to simple
vs. complex tasks and its relationship to thought disturbance in schizophrenia
patients. International Journal of Psychophysiology, 52, 53-62. doi:
10.1016/j.ijpsycho.2003.12.008
Miralles Sangro, F. (1997). Location tables, Form Quality, and Popular responses in a
Spanish sample of 470 subjects. In I. B. Weiner (Ed.), Rorschachiana XXII:
Yearbook of the International Rorschach Society (pp. 38-66). Ashland, OH:
Hogrefe & Huber.
Mohammadi, M. R., Hosseininasab, A., Borjali, A., & Mazandarani, A. A. (2013).
Reality testing in children with childhood-onset schizophrenia and normal
children: A comparison using the Ego Impairment Index on the Rorschach.
Iranian Journal of Psychiatry, 8, 44–50.
Moore, R. C., Viglione, D. J., Rosenfarb, I. S., Patterson, T. L., & Mausbach, B. T.
(2013). Rorschach measures of cognition relate to everyday and social
functioning in schizophrenia. Psychological Assessment, 25, 253-263. doi:
10.1037/a0030546
Netter, B. E. C., & Viglione, D. J., Jr. (1994). An empirical study of malingering
schizophrenia on the Rorschach. Journal of Personality Assessment, 62, 45-57.
doi: 10.1207/s15327752jpa6201_5
Neville, J. W. (1995). Validating the Rorschach measures of perceptual accuracy.
(Doctoral dissertation, University of Arkansas, 1993). Dissertation Abstracts
International, 55, 4128B.
Olson, I. R., Plotzker, A., & Ezzyat, Y. (2007). The enigmatic temporal pole: A review of
findings on social and emotional processing. Brain, 130, 1718-1731. doi:
10.1093/brain/awm052
Ozbey, G. T., Meyer, G. J., Viglione, D. J., Dean, K., & Horn, S. L. (2008, March). The
validity of Rorschach Perceptual Thinking Index and Ego Impairment Index-2
using Form Accuracy. In G. J. Meyer (Chair), Assessing perceptual accuracy on
the Rorschach using Form Accuracy ratings versus Form Quality scores.
Symposium conducted at the annual meeting of the Society for Personality
Assessment, New Orleans, LA.
Parisi, S., Pes, P., & Cicioni, R. (2005). Tavole di localizzazione Rorschach, Volgari e
R+ statistiche [Rorschach location tables, Populars, and R+ statistics]. Available from the Institute.
Perry, W., Minassian, A., Cadenhead, K., Sprock, J., & Braff, D. (2003). The use of the
Ego Impairment Index across the schizophrenia spectrum. Journal of Personality
Assessment, 80, 50-57. doi: 10.1207/S15327752JPA8001_13
Perry, W., & Viglione, D. J. (1991). The Ego Impairment Index as a predictor of outcome
in melancholic depressed patients treated with tricyclic antidepressants. Journal of
Personality Assessment, 56, 487-501. doi: 10.1207/s15327752jpa5603_10
Peterson, C. A., & Horowitz, M. (1990). Perceptual robustness of the nonrelationship
between psychopathology and popular responses on the Hand Test and the
Rorschach. Journal of Personality Assessment, 54, 415-418. doi:
10.1207/s15327752jpa5401&2_38
Piotrowski, Z. (1957). Perceptanalysis; a fundamentally reworked, expanded, and
systematized Rorschach method. New York, NY: Macmillan.
Ptucha, K., Saltman, C., Filizetti, K., Viglione, D. J., & Meyer, G. J. (2008, March).
Differentiating Psychiatric Severity using Form Accuracy and Form Quality. In
G. J. Meyer (Chair), Assessing perceptual accuracy on the Rorschach using Form
Accuracy ratings versus Form Quality scores. Symposium conducted at the
annual meeting of the Society for Personality Assessment, New Orleans, LA.
Rapaport, D., Gill, M., & Schafer, R. (1946). Diagnostic psychological testing: The
theory, statistical evaluation, and diagnostic application of a battery of tests (Vol.
2). Chicago, IL: The Yearbook Publishers.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: Applications and
data analysis methods (2nd ed.). Thousand Oaks, CA: Sage Publications.
Rickers-Ovsiankina, M. (1938). The Rorschach test as applied to normal and
schizophrenic subjects. British Journal of Medical Psychology, 17, 227-257. doi:
10.1111/j.2044-8341.1938.tb00296.x
Ritsher, J. B. (2004). Association of Rorschach and MMPI psychosis indicators and
schizophrenia spectrum diagnoses in a Russian clinical sample. Journal of
Personality Assessment, 83, 46-63. doi: 10.1207/s15327752jpa8301_05
Rizzo, d. C., Parisi, S., & Pes, P. (1980). Manuale per la raccolta, localizzazione e
siglatura delle interpretazioni Rorschach [Manual for the collection, location, and coding of Rorschach interpretations]. Rome: Kappa.
Rorschach, H. (1942). Psychodiagnostics: A diagnostic test based on perception. Bern,
Switzerland: Hans Huber. (Original work published in German in 1921).
Rushton, J. P., Brainerd, C. J., & Pressley, M. (1983). Behavioral development and
construct validity: The principle of aggregation. Psychological Bulletin, 94, 18-38.
doi: 10.1037/0033-2909.94.1.18
Schafer, R. (1954). Psychoanalytic interpretation in Rorschach testing. New York, NY:
Grune & Stratton.
Sherman, M. (1952). A comparison of formal and content factors in the diagnostic testing
of schizophrenia. Genetic Psychology Monographs, 46, 183-234.
Smith, S. R., Bistis, K., Zahka, N. E., & Blais, M. A. (2007). Perceptual-organizational
characteristics of the Rorschach task. The Clinical Neuropsychologist, 21, 789-799.
doi: 10.1080/13854040600800995
Spitzer, R. L., Endicott, J., & Robins, E. (1978). Research Diagnostic Criteria: Rationale
and reliability. Archives of General Psychiatry, 35, 773-782. doi:
10.1001/archpsyc.1978.01770300115013
Su, W., Viglione, D. J., Green, E. E., Tam, W. C., Su, J., & Chang, Y. (2015). Cultural
and linguistic adaptability of the Rorschach Performance Assessment System as a
measure of psychotic characteristics and severity of mental disturbance in
Taiwan. Psychological Assessment. Advance online publication. doi: 10.1037/pas0000144
Sundberg, N. D. (1961). The practice of psychological testing in clinical services in the
United States. American Psychologist, 16, 79-83. doi: 10.1037/h0040647
Takahashi (2009). [English translation of Rorschach object frequency counts in
nonpatients from Japan]. Unpublished raw data.
van Os, J., & Tamminga, C. (2007). Deconstructing psychosis. Schizophrenia Bulletin,
33, 861-862. doi: 10.1093/schbul/sbm066
Viglione, D. J., Jr. (1996). Data and issues to consider in reconciling self-report and the
Rorschach. Journal of Personality Assessment, 67, 579-587. doi:
10.1207/s15327752jpa6703_12
Viglione, D. J., Giromini, L., Gustafson, M., & Meyer, G. J. (2014). Developing
continuous variable composites for Rorschach measures of thought problems,
vigilance, and suicide risk. Assessment, 21, 42-49. doi:
10.1177/1073191112446963
Viglione, D. J., Meyer, G. J., Ptucha, K., Horn, S. L., & Ozbey, G. T. (2008, July). Initial
validity data for the form accuracy project from three studies. In G. J. Meyer
(Chair), Advancing the assessment of perceptual accuracy using form quality and
form accuracy, Part 2: Validity Data for the Brazilian and U.S. projects.
Symposium presented at the XIXth Congress of the International Rorschach
Society, Leuven, Belgium.
Viglione, D., Perry, W., Giromini, L., & Meyer, G. (2011). Revising the Rorschach Ego
Impairment Index to accommodate recent recommendations about improving
Rorschach validity. International Journal of Testing, 11, 349-364. doi:
10.1080/15305058.2011.589019
Viglione, D. J., Perry, W., Jansak, D., Meyer, G., & Exner, J. E. (2003). Modifying the
Rorschach human experience variable to create the human representational
variable. Journal of Personality Assessment, 81, 64-73. doi:
10.1207/S15327752JPA8101_06
Viglione, D. J., Perry, W., & Meyer, G. (2003). Refinements in the Rorschach Ego
Impairment Index incorporating the Human Representational variable. Journal of
Personality Assessment, 81, 149-156. doi: 10.1207/S15327752JPA8102_06
Viglione, D. J., & Rivera, B. (2003). Assessing personality and psychopathology with
projective tests. In J. R. Graham & J. A. Naglieri (Eds.), Handbook of
Psychology: Vol. 10. Assessment psychology (1st ed., pp. 531-553). New York:
Wiley & Sons.
Viglione, D. J., & Rivera, B. (2013). Performance assessment of personality and
psychopathology. In J. R. Graham & J. A. Naglieri (Eds.), Handbook of
Psychology: Vol. 10. Assessment psychology (2nd ed., pp. 600-621). Hoboken, NJ:
Wiley & Sons.
Villemor-Amaral, Yazigi, Nascimento, Primi, Semer, & Petrini (2008). [English
translation of Rorschach object frequency counts in nonpatients from Brazil].
Unpublished raw data.
Wagner, E. E. (1998). Perceptual integrations and the “normal” Rorschach percept.
Perceptual and Motor Skills, 86, 296-298. doi: 10.2466/pms.1998.86.1.296
Walker, R. G. (1953). An approach to standardization of Rorschach form-level. Journal
of Projective Techniques, 17, 426-436. doi: 10.1080/08853126.1953.10380508
Weiner, I. B. (1998). Principles of Rorschach interpretation. Mahwah, NJ: Lawrence
Erlbaum Associates.
Weiner, I. B. (2003). Principles of Rorschach interpretation (2nd ed.). Mahwah, NJ:
Lawrence Erlbaum Associates.
Wood, J. M., Garb, H. N., Nezworski, M. T., Lilienfeld, S. O., & Duke, M. C.
(2015). A second look at the validity of widely used Rorschach indices: Comment
on Mihura, Meyer, Dumitrascu, and Bombel (2013). Psychological Bulletin, 141,
236-249. doi: 10.1037/a0036005
Wood, J. M., Nezworski, M. T., & Garb, H. N. (2003). What’s right with the Rorschach?
The Scientific Review of Mental Health Practice, 2, 142-146. doi:
10.1037/t03306-000