THE VALIDITY OF THE JOINT STORY RETELL
AS A MEASURE OF YOUNG CHILDREN'S COMPREHENSION
OF FAMILIAR STORIES
Lynn F. Dempsey
School of Communication Sciences and Disorders
Submitted in partial fulfillment
of the requirements for the degree of
Master of Science
Faculty of Graduate Studies
The University of Western Ontario
London, Ontario
August, 1999
© Lynn F. Dempsey 1999
The author has granted a nonexclusive licence allowing the
National Library of Canada to
reproduce, loan, distribute or sell
copies of this thesis in microform,
paper or electronic formats.
The author retains ownership of the
copyright in this thesis. Neither the
thesis nor substantial extracts from it
may be printed or otherwise
reproduced without the author's
permission.
Abstract
This study investigated the construct validity of the Joint Story Retell (JSR), a
newly developed measure of young children's oral discourse comprehension adapted from
the cloze procedure. Thirty-eight children between 30 and 50 months of age were
presented with the JSR and several additional language comprehension and production
measures in order to investigate the developmental sensitivity, concurrent validity, and
content relevance of the JSR. Results revealed that older children performed significantly
more accurately than younger children on the JSR, indicating that this measure is sensitive
to age differences. In addition, results revealed a moderately strong relationship between
the JSR and traditional comprehension questions, suggesting that the JSR demonstrates
concurrent validity with an accepted measure of discourse comprehension. Finally, results
indicated that the JSR minimizes enabling factors such as memory and language
production, and thus has content relevance. These findings suggested that the JSR may
provide a valid measure of young children's discourse comprehension.
Key words: Joint Story Retell, cloze, language, comprehension, discourse, measurement,
validity, children, stories
ACKNOWLEDGEMENTS
I would like to thank my chief advisor, Dr. Elizabeth Skarakis-Doyle, for her
assistance and support. Under her guidance my understanding of the research process and
my interest in the measurement of early language comprehension developed. I would also
like to thank my colleague, Tania Perfetti, for her assistance during the experimental
testing phase of the investigation. Thank you to the members of my advisory committee
and examining board for their thoughtful comments on this research project. Thanks also
to Dr. Philip Doyle for his support during the preparation for my defense.
I would like to extend my sincere appreciation to the parents and children who
participated in this investigation. I am also grateful to the individuals and community
facilities who helped me to locate these families. This research was supported by the
Harmonize for Speech Fund, Ontario District.
TABLE OF CONTENTS
CERTIFICATE OF EXAMINATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF APPENDICES
INTRODUCTION
METHOD
    Participants
    Materials and Experimental Test Stimuli
    Procedure
    Data Analysis
DISCUSSION
APPENDICES
REFERENCES
VITA
LIST OF TABLES
Table 1. Mean Ages, and Mean Raw Scores and Standard Deviations for Two Groups of Children on Pre-Experiment Tests
Table 2. Correlation Coefficients for Age, JSR, and Comprehension Questions (n = 38)
Table 3. Mean Number of Accurate Responses and Standard Deviations on the JSR and Comprehension Questions for Two Age Groups
Table 4. Correlation Coefficients for Age, JSR and Comprehension Questions for Children Between 42 and 50 Months (n = 23)
Table 5. Correlation Coefficients for Age, JSR, Comprehension Questions and EVDT (n = 24)
LIST OF APPENDICES
Appendix A. Vocabulary pre-test
Appendix B. Early Literacy Questionnaire
Appendix C. Original Version of "Splish Splash"
Appendix D. Cloze (Joint Story Retell) Version of "Splish Splash"
Appendix E. Comprehension Questions: Form A
Appendix F. Comprehension Questions: Form B
Appendix G. Expectancy Violation Detection Task Version of "Splish Splash"
Appendix H. Joint Story Retell Scoring Key
Appendix I. Wh-Question Scoring Key
Appendix J. Certification of Approval of Human Research
Introduction
The nature of childhood language impairment and its long-term prognosis differ
when both expressive and receptive abilities rather than solely expressive abilities are
impaired (Thal, 1991; Thal, Tobias, & Morrison, 1991). A study of late talkers by Thal et
al. (1991) suggested that young children who have both expressive and receptive language
delays have a much poorer chance of catching up with their age-matched peers than late
talkers with normally developing comprehension. As Bates (1993) states, "what a child
knows is ultimately a better predictor of language ability than what a child does" (p. 233)
when he or she is very young. Given that impaired comprehension is associated with a
different long-term prognosis than expressive-only impairments, reliable and valid
measurement of young children's comprehension is essential. Early differential
identification of the language impairment most certainly depends on it. Unfortunately, the
development of comprehensive measurement of language comprehension in young
children has been limited by the influence of syntactically based theories of language
acquisition, the emphasis on decontextualized linguistic comprehension, and by the
inherent difficulties in reliably assessing young children (Bates, 1993).
Language comprehension is a broad construct that encompasses multiple levels,
including: the lexical or word level, the syntactic or phrase and clause levels, and the
discourse or text level. Individuals build representations based on information from all
these levels in order to acquire the full range of meaning from language (Miller & Paul,
1995). Although all levels of comprehension are necessary for complete understanding of
language, most efforts at measuring young children's comprehension have focused on the
lexical and syntactic levels. This is the case even though toddlers and preschoolers are
regularly exposed to extended units of language rather than decontextualized words and
sentences (Snow, Perlmann, & Nathan, 1987). Although the young language-learning child
is frequently exposed to expanses of language, measurement of young children's
comprehension of discourse has been relatively neglected. As a result, only a narrow
aspect of a broad construct is typically evaluated. This has implications for the validity of
comprehension measurement.
Validity is not a property of a measurement instrument independent of its function.
Rather, validity lies in the meaning and interpretation of the test scores (Anastasi, 1988;
McCauley & Swisher, 1984; Messick, 1995). Thus, an instrument that is valid for the
measurement of one attribute may be less valid for the measurement of another. This
means, in the case of comprehension, that a valid test of literal level comprehension is not
necessarily a valid instrument for the measurement of discourse level comprehension or for
the measurement of language comprehension as a whole. The risk associated with the
paucity of measures that tap comprehension at the discourse level is that performance on
literal level comprehension measures may be used to draw inferences about discourse level
abilities. Yet, regardless of the validity with which a task measures a literal level
comprehension ability, when inferences about the broader construct are made on the basis
of such a task, those inferences will not necessarily be valid.
The logic of this argument is clear when the specific abilities required to
comprehend language at each level are considered. At the lexical and syntactic levels,
decontextualized comprehension involves identifying the referents of single words and
decoding meaning relations within a sentence (Miller & Paul, 1995). At the discourse
level, comprehension, in addition to requiring word and sentence understanding, involves
making judgments based on social, textual, scriptal and other forms of prior knowledge to
determine what an utterance means, in relation to what else has gone on in the discourse
(Miller & Paul, 1995; Rees & Shulman, 1978). Since discourse comprehension requires
more than simply the ability to understand decontextualized words and sentences,
performance on tasks that measure only word and sentence understanding cannot be taken
as indicative of the ability to comprehend longer segments of language (McCabe, 1996).
However, the risk that invalid interpretations about comprehension will be made on the
basis of literal comprehension measures will remain until an acceptable measure of young
children's discourse comprehension is developed. Such a measure, in conjunction with
measures of lexical and syntactic level comprehension, might permit more comprehensive
measurement of language comprehension than is currently possible.
As has been discussed, the risk of drawing invalid conclusions about
comprehension might be lessened if a measure designed to tap discourse comprehension
were available. For such a measure to contribute meaningfully to comprehension
measurement as a whole, it would have to be valid in its own right. There are several
issues associated with the development of valid measurement tasks. One of the most
critical for discourse comprehension would be ensuring that tasks include representative
and relevant content. According to Messick (1995), if a test is to be valid, it must be
carefully designed to include content that is representative of the domain of interest. The
test must also be designed to include only content that is relevant to the construct domain.
If the test is too broad, containing excess variance associated with other distinct
constructs, interpretation of test scores may be confounded (Messick, 1995).
Validity is built into a test from the outset through the choice of representative
content (Anastasi, 1988; Messick, 1995). Therefore, for a discourse comprehension
measure to be valid, it should include content that is representative of discourse, the
domain of interest. Discourse is present in a number of different genres, including
conversations and stories. It is an extraordinarily contextualized, profoundly social form of
language (McCabe, 1996). A representative measure of discourse should reflect this.
Unfortunately, the need to accurately reflect the contextual and social nature of discourse
can conflict with the need for rigorous, controlled measurement of this construct.
Conversation, while reflecting the social nature of contextualized discourse, is not easily
controlled since its content is often variable and difficult to predict. Story-reading, on the
other hand, reflects the social nature of contextualized discourse and at the same time is a
form of discourse whose properties may be controlled to allow for consistent, reliable
measurement across children and over time. It is not suggested that children's ability to
understand stories would necessarily be indicative of their ability to understand
conversation. Rather, the suggestion is that story, since it can be controlled, may be the
most viable form of discourse to measure.
Messick (1995) states that one way of ensuring that content is representative of
the construct is to select it in terms of its functional importance or ecological validity for
the population of interest. According to Messick (1995), functional importance can be
considered in terms of what individuals actually do in the performance domain. Evidence
about the frequency with which young children engage in story-reading supports the
notion that it is an activity that has functional importance in the domain of discourse
comprehension. According to Carlisle (1991), most young, middle-class children have
quite extensive experience with narratives encountered in story-reading with adults.
Interview or questionnaire responses by parents of two- to five-year-old British and
American children suggest that the typical mainstream preschooler is probably read to on a
regular basis: 43 to 75% of children have reportedly been read to daily or more often
(Scarborough & Dobrich, 1994). Heath (1986) notes that in societies where literacy is
valued, children give attention to books and information derived from books and begin to
acknowledge questions about books from as early as six months of age. According to
Heath (1986), by the time they are preschool age, most children accept book and
book-related activities as entertainment.
The types of stories young children are exposed to tend to be predictable in nature.
That is, they are often based on familiar routines for which children have developed
expectations about how events will unfold (e.g., bedtime, bathtime). These stories also
tend to incorporate rhythm, rhyme, and repetitive words, phrases and episodes into their
texts (Glazer, 1991; Lynch, 1986). Perhaps the reason for the widespread use of
predictable stories with young children is that features such as familiarity, rhythm, rhyme
and repetition are naturally engaging to young children. In fact, it has been suggested that
such features may facilitate children's attention to and understanding of the material
(Diehl-Faxon & Dockstader-Anderson, 1985). Some researchers have argued that
exposure to predictable language may even facilitate language learning (Snow et al.,
1987).
In summary, stories, and predictable stories in particular, may be meaningful
content for a measure of discourse comprehension. Story-reading is a highly social,
contextualized activity and therefore, representative of discourse. At the same time its
properties are controllable, which is important for reliable measurement. Story-reading is
also a form of discourse that has functional importance for young children. In addition,
since story-reading is typically a familiar and enjoyable activity for young children, the
compliance problems often associated with comprehension measurement may potentially
be minimized, making more reliable measurement possible. All of these characteristics of
story reading suggest that its use in a measure of discourse comprehension may contribute
toward the validity of that measure.
The preceding discussion addresses the issue of content representativeness and
functional importance. Another aspect of validity is content relevance. Messick (1991)
states that if a task measures only the construct it is designed to measure, it demonstrates
content relevance. If, on the other hand, a task is too broad, containing excess variance
associated with other distinct constructs, it demonstrates construct-irrelevance variance.
This excess variance may confound measurement of the construct, making interpretation
of task performance more difficult. One group of factors whose presence in a
measurement task can lead to content irrelevance variance consists of enabling factors. Enabling
factors are aspects of a task that are extraneous to the construct, but are required for
adequate task performance (McCauley & Swisher, 1984). An example of an enabling
factor is the receptive language skill required by many tests of expressive language. An
individual taking an expressive language test needs to understand complex oral
instructions in order to respond appropriately to test items (McCauley & Swisher, 1984).
In this example, the intrusion of receptive language skills might affect performance on the
test in a manner irrelevant to the construct and therefore make the test score susceptible to
an invalid interpretation. Because enabling factors may compromise valid interpretation of
test scores, measurement tasks should be examined carefully for their presence.
Such an examination of the few existing tests of discourse comprehension reveals
the presence of two enabling factors for test performance that typically intrude into
measurement of this construct. Both the story retelling task, which requires the participant
to listen to a story and then retell it, and the comprehension question task, which requires
the participant to respond to questions posed about a stimulus, include memory and
language production demands that may confound interpretation of test scores.
Memory is an enabling factor for both the story retelling and comprehension
question tasks. In both of these tasks, there is a temporal gap between the presentation of
the stimulus and the response of the listener. This means that a representation of the entire
stimulus, including the story structure as well as the lexical and grammatical elements,
must be maintained in working memory until a response is required (Tyler, 1991). The
memory demands of these traditional tasks may obscure the comprehension abilities of
young children, since it is generally recognized that young children have more limited
memory skills than older children and adults (Schneider & Pressley, 1997).
Given this difficulty, it seems that a discourse comprehension measure which limits
memory demands should be developed. However, this is not a straightforward issue.
Messick (1995) states that determining what constitutes construct-irrelevance variance is
often a difficult and contentious task. This is certainly true in the case of comprehension
and memory. Memory is not a construct that is clearly distinct from comprehension.
Rather, memory and comprehension are closely interrelated. As a result of the
interconnection between memory and comprehension, it may be impossible to completely
eliminate the memory construct from the measurement of discourse comprehension. (It
may also be undesirable since, to some degree, memory is a component of comprehension,
and vice versa, and a valid measurement should reflect all aspects of the construct.)
However, it may be possible to impose some limited control over the enabling factor of
memory, thereby preventing a heavy memory burden from obscuring young children's
comprehension abilities.
One way of lessening the memory demands typically imposed by tests of
comprehension may be to reduce the temporal gap between the presentation of the
stimulus and the response of the listener. If a close temporal relationship existed between
the stimulus and the response, the listener would not have to hold the entire stimulus in
working memory until the response was required. As a result, less demand might be placed
on the listener's memory abilities. According to Tyler (1991), a task may establish a close
temporal relationship between speech stimulus and response, "either by requiring the
listener to produce a fast reaction to the response immediately after the critical part of the
stimulus has been heard or by stopping the input at a specific point and requiring the
listener to make a response on the basis of partial information" (pp. 162-163).
In addition to designing the task to tap discourse comprehension while the input is
being presented, the content of the stimulus itself might also be structured to help limit the
memory required to perform the task. As was previously discussed, the kinds of stories
young children are frequently exposed to in their everyday lives contain factors such as
rhythm, rhyme, repetition, and familiar or script-based plots. These features seem to
enable children to attend to and recall the stories they hear. The mechanism by which such
features aid children's recall has been widely investigated. A number of researchers have
found that young children have ordered event representations or scripts for familiar events
and that they use these representations to guide comprehension and recall of story
narratives which are based on these familiar events (Hudson & Nelson, 1983; McCartney
& Nelson, 1981; Nelson, 1978; Pace & Feagans, 1984). Scripts have also been shown to
be an important memory organizer.
The impact of structured story content on young children's performance on a
comprehension task was revealed in Skarakis-Doyle's (1998) investigation of young
children's comprehension monitoring skills. Skarakis-Doyle (1998) found that structuring
the content of the discourse appeared to facilitate young children's ability to monitor their
comprehension of familiar stories. The stories used in the investigation were based on
familiar events in order to capitalize on the system of mental representation of world
knowledge (i.e., scripts) employed by young children. Specifically, stories based on
familiar events were used so that the input was structured and children had expectations
about the material. The results indicated that children as young as 30 months were able to
detect violations to story content when predictable, script-based material was employed.
In summary, the enabling factor of memory may be limited both by designing a
task which taps comprehension while the stimulus is being presented and by utilizing
structured stories. A task that diminishes the memory confound in this way may allow
young children's comprehension abilities to be measured more directly, and hence, permit
a more valid interpretation of their performance.
As has been discussed, memory is one enabling factor for performance on the
traditional measures of discourse comprehension. A second enabling factor for
performance on these measures is language production ability. Story retelling, since it
requires individuals to retell an entire stimulus story, demands a substantial degree of
verbal proficiency. This necessity for verbal proficiency may obscure young children's
comprehension abilities since their language production skills are not as well-developed as
those of older children and adults. Research on children's narrative production abilities has
shown that preschool children have difficulty structuring what they know into story format
(Applebee, 1978; Carlisle, 1991; Feagans & Farran, 1981; McCabe, 1996). When young
children are asked to tell a story, whether spontaneously, or as part of a retell task, they
tend to produce short, scattered accounts. In fact, retellings of fictional stories are said not
to be logically and coherently produced until fifth grade (Stein, 1988). As opposed to the
story retelling task, the comprehension question task is not as demanding of language
production ability. However, the ability to answer content questions (i.e., wh-questions)
still requires substantial formulation and speech production abilities (McCabe, 1996).
In summary, the language production demands imposed by the story retelling and
comprehension question tasks may hinder young children's ability to engage in those tasks
and confound interpretation of their performance, hence compromising the validity of the
measurement (McCabe, 1996). The possibility of controlling the enabling factor of
language production needs to be examined since such control may allow young children's
comprehension abilities to be measured more directly than has been possible.
Skarakis-Doyle and Wootton (1998) attempted to create a discourse
comprehension measure that would control both memory and language production
demands. They developed the Joint Story Retell (JSR), a discourse comprehension
measure that was adapted from a widely used procedure for measuring reading
comprehension called the cloze test. The cloze test consists of a story from which words
have been deleted in a systematic fashion (Dietrich, Freeman, & Griffin, 1979). During the
cloze test, the reader supplies the word or words that have been deleted from the passage.
When a word is deleted from a text, the set of words that can go into the blank is
constrained by the context (i.e., the remaining undeleted words in the passage) (Dietrich,
et al., 1979). The cloze test is based on the premise that readers who have understood a
passage well will have a better chance of recovering the deleted word than readers who
have not (Dietrich, et al., 1979). Thus, the rationale for the use of the cloze procedure as a
measure of discourse comprehension is that if readers understand the structure and
content of a text, they should be able to utilize the redundancy in that text to retrieve
deleted words at a better than chance level. Essentially, the intent of the cloze test is to
"lead a child to use the context surrounding a blank to retrieve the word that was deleted
from the blank, and thus, to demonstrate his [or her] comprehension of that surrounding
context" (Dietrich, et al., 1979, p. 5).
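The cloze logic described above can be sketched in a few lines of code. This is a minimal illustration under stated assumptions, not the procedure of Dietrich et al. (1979) or the JSR stimuli: the source says only that words are deleted "in a systematic fashion," so the fixed every-n-th-word deletion rule, the exact-match scoring, the function names `make_cloze` and `score_cloze`, and the sample passage are all hypothetical.

```python
import re


def make_cloze(text, n=5, blank="_____"):
    """Replace every n-th word with a blank (assumed deletion rule).

    Returns the cloze passage and the list of deleted words in order.
    """
    words = text.split()
    deleted = []
    for i in range(n - 1, len(words), n):
        deleted.append(words[i])
        words[i] = blank
    return " ".join(words), deleted


def normalize(word):
    """Lower-case a word and strip punctuation so scoring ignores both."""
    return re.sub(r"[^\w']", "", word).lower()


def score_cloze(responses, deleted):
    """Count responses that exactly match the deleted word (assumed scoring rule)."""
    return sum(normalize(r) == normalize(d) for r, d in zip(responses, deleted))


# Hypothetical passage, invented for illustration only.
passage = "The duck jumped into the tub and the water went splash all over the floor"
cloze_text, answers = make_cloze(passage, n=5)
score = score_cloze(["the", "ran", "Floor."], answers)
```

A reader who has understood the passage can use the remaining words to narrow down each blank, which is why the number of recovered words is taken as an index of comprehension.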
An examination of the features of the JSR reveals that the task may indeed control
the enabling factors of memory and language production. First, relative to traditional
measures of discourse comprehension (i.e., story retelling, comprehension questions), the
JSR may diminish the enabling factor of memory. The essential difference between the
JSR and the traditional discourse comprehension measures is that with the former, the
listener does not have to maintain a representation of the entire story structure in working
memory while his or her comprehension is being measured. Rather, parts of the stimulus
story are retold by the investigator, providing the listener with a scaffold into which
memory for specific story details can be placed. Thus, it seems that the JSR may diminish
the substantial memory demands which typically confound young children's
comprehension abilities. Second, the JSR may control the enabling factor of language
production. Since children are required to produce only specific and limited elements,
rather than sentences or entire stories, minimal demands are placed on their production
abilities. Thus, the response requirements of the JSR may enable young children to
demonstrate their story comprehension with limited interference from language
production.
There is empirical support for this possibility that the JSR controls both memory
and language production demands. In a preliminary investigation of the JSR, Skarakis-Doyle
and Wootton (1998) analyzed the relationships between the JSR and two other tests
in order to determine whether the JSR controlled memory and language production
demands. First, they examined the relationship between performance on the JSR and
performance on the Expectancy Violation Detection Task (EVDT), a comprehension
measure which limits memory demands by requiring children to detect alterations to
critical story components (e.g., actors, actions, objects) as they occur during story
retelling. Results of the investigation revealed a strong correlation between performance
on the JSR and accuracy on the EVDT, suggesting that the JSR, like the EVDT, may
diminish the confounding factor of memory ability. Second, they examined the relationship
of Mean Length of Utterance (MLU) to performance on the JSR. Results of the
investigation indicated that although MLU was moderately correlated with the JSR, it did
not contribute meaningfully to the prediction of performance on the task. The
investigators concluded that performance on the JSR is not dependent on age-appropriate
sentence production skill. The results of the investigation by Skarakis-Doyle and Wootton
(1998) provide some preliminary empirical evidence for the validity of the JSR as a
measure of early discourse comprehension. Specifically, Skarakis-Doyle and Wootton's
(1998) findings suggest that the JSR may reduce the enabling factors typically associated
with tests of discourse comprehension and, thus, may demonstrate content relevance.
Evidence of content relevance is particularly important for measures of discourse
comprehension given the role that enabling factors play in the measurement of this
construct. Nevertheless, other types of evidence are also relevant to a full evaluation of the
validity of discourse measures. Evidence of expected performance differences over time is
a particularly illuminating form of validity evidence (Messick, 1995). Anastasi (1988) has
termed this form of validity evidence age differentiation. Evidence of age differentiation
or developmental sensitivity is obtained through a comparison of test scores with
chronological age. For any ability which is expected to increase with age during childhood,
test scores are also expected to show such an increase, if the test is valid (Anastasi, 1988).
Since language comprehension is an ability that improves with age, a valid measurement of
comprehension would be expected to show a similar improvement in performance with
age. As yet, the degree to which JSR performance improves with age has not been firmly
established, since, in the preliminary investigation (Skarakis-Doyle & Wootton, 1998), the
measure was examined across a limited age range of children (i.e., 46 to 58 month olds).
Therefore, empirical evidence of the developmental sensitivity of the JSR is still needed.
In addition to evidence of age differentiation, evidence of an expected relationship
between two tests may support the validity of a test under investigation. According to
Anastasi (1988), correlations between a new test and similar earlier tests may be cited as
evidence that the new test measures the same general area of behaviour as the other tests.
That is, these correlations may be cited as evidence that a measure demonstrates
concurrent validity. Correlations between the JSR and a currently accepted measure of
discourse comprehension (i.e., story retelling, comprehension questions) could be
examined to determine whether the JSR measures the construct it was designed to
measure. As yet, however, the JSR has not been tested against either of the traditionally
accepted measures of discourse comprehension (Skarakis-Doyle & Wootton, 1998).
Therefore, the concurrent validity of the JSR has not been determined.
In summary, the results of preliminary investigations (Skarakis-Doyle & Wootton,
1998) suggest that the JSR might limit the role of enabling factors, allowing more direct
interpretation of discourse comprehension. However, more rigorous empirical evidence of
this is needed. In addition, evidence that the JSR meets other validity criteria, such as age
differentiation and concurrent validity, is required. It was the purpose of this investigation
to evaluate the validity of the JSR by addressing these outstanding needs. The following
specific questions were posed:
1) Is the JSR a developmentally sensitive measure of discourse comprehension? It is
hypothesized that performance on the JSR will improve with age such that older children
will attain significantly higher scores on the task than younger children.
2) What is the relationship of the JSR to a traditional measure of discourse comprehension
(i.e., comprehension questions)? It is hypothesized that performance on the JSR will be
moderately correlated with accuracy on the traditional comprehension questions,
indicating that the two procedures measure the same construct, discourse comprehension,
but potentially differ in memory and language production demands.
3) What is the relationship of the JSR to a measure of discourse comprehension that has
diminished memory demands (i.e., the EVDT)? If the JSR, like the EVDT, has diminished
memory demands relative to traditional measures of discourse comprehension, the JSR
should be more strongly correlated with the EVDT than with the comprehension
questions.
4) What is the relationship of language production ability to performance on the JSR? If
the JSR minimizes production demands as it is purported to do, then language production
should contribute little to performance on the task.
Method
Participants
Forty children between the ages of 30 and 50 months were evaluated for eligibility
to be included in this investigation. Thirty-eight of these children (24 females and 14
males) met the criteria for inclusion. These children demonstrated both normally
developing language abilities on the pre-experiment language tests and understanding of
the primary experimental task, the JSR, as indicated by at least one self-initiated accurate
response during the practice session administered prior to the task. Of the two children
who did not meet the inclusionary criteria, one did not perform adequately on the
pre-experiment language tests; the other did not demonstrate understanding of the JSR, as
indicated by a failure to complete any of the practice items provided. Only the 38 children
who met the inclusion criteria are included in the detailed description of the participants
which follows. Specific information on mean ages and performance on pre-experimental
testing for the children who met the inclusion criteria is shown in Table 1.
In order that performance of children of different ages might be compared, and the
developmental sensitivity of the JSR evaluated, the children were divided into two groups,
each of which covered an age span of approximately 10 months. The younger, 30 to 40
month old group, consisted of 15 children (1 1 females and 4 males), and the older, 41 to
50 month old group, consisted of 23 children (13 females and 10 males). As shown in
Table 1, the mean age of children in the younger group was 36.13 months (SD = 3.09),
while the mean age of the children in the older group was 45.96 months (SD = 2.38).
All of the children came from homes where English was reported by a parent to be
the primary language. None of the participants had any obvious cognitive or other
uncorrected sensory (including hearing) or motor impairment, as reported by parents and
as described on the Checklist for Hearing Impairment (CHI) (Warr-Leeper et al., 1997).
The CHI, a checklist pertaining to a child's developmental and hearing history, consisted
of yes/no questions and rating scales.

Table 1
Mean Ages, and Mean Raw Scores and Standard Deviations for Two Groups of Children
on Pre-Experiment Tests
Columns: Age Group, n, Age(a), PLS-3(b), Vocab. test(c)
Rows: Younger group (M, SD), Older group (M, SD), Total group (M, SD)
Note. Age ranges in parentheses.
(a) Age reported in months. (b) PLS-3 maximum raw score = 48. (c) Vocabulary pre-test
maximum raw score = 16.

The children's receptive language abilities were normally developing as indicated
by scores within 1.5 standard deviations (SDs) of the mean for their age on the Auditory
Comprehension (AC) subscale of the Preschool Language Scale-3 (PLS-3) (Zimmerman,
Steiner, & Pond, 1979). Although the AC subscale of the PLS-3 includes items that tap a
variety of aspects of language understanding, particular emphasis is placed on conceptual
understanding (i.e., understanding of temporal, quantity and spatial concepts). All children
also possessed normally developing language as indicated by a score above the 10th
percentile on the MacArthur Communication Development Inventory III (CDI-III)
(Dale, 1996). The CDI-III, a global measure of language development, is a parent checklist
consisting of items that tap expressive vocabulary, sentence structure, and comprehension.
A vocabulary pre-test constructed by the investigator (Appendix A) documented each
child's understanding of the words used in the story. The pre-test consisted of 16 words
from the story, including eight nouns, five verbs, two adjectives, and one locative. The test
was a word-picture matching task in which the child was required to select one of four
pictures based on a word orally presented by the investigator (e.g., "Show me, rubber
duckie"). All children obtained scores of 60% or greater on the vocabulary pre-test.
Children's scores on the vocabulary pre-test, as well as their scores on the AC subscale of
the PLS-3 and on the CDI-III, are shown in Table 1.
Parent responses to questions on an early literacy questionnaire that was
constructed by the investigator (Dempsey, Perfetti, & Skarakis-Doyle, 1999) (Appendix
B) characterized the participants and their early literacy experience. According to their
parents, all of the children enjoyed listening to stories. Ninety-five percent of the children
were read stories more than three times a week. Approximately half of all parents
described their reading styles as including both comments that extend the stories and
specific requests for story information directed to their children. The other parents
reported using variations of these basic interactive exchanges. Only one parent reportedly
did not engage in these types of exchanges when reading to his or her child.
CDI-III forms for four of the children (two from each age group) were not
completed. Thus, average performance on the CDI-III for the younger group was based
on 13 children, whereas average performance for the older group was based on 21
children. The mean raw score of children in the younger group was 85.85 (SD = 23.70)
out of 124 possible items, whereas the mean raw score of the older group was 99.52
(SD = 18.60). The mean raw score of all the children for whom a CDI-III form was
completed (n = 34) was 94.29 (SD = 21.43).
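As a consistency check on the CDI-III figures above, the total-group mean can be recovered from the two group means weighted by group size. The following is a minimal sketch in Python; the function name `combined_mean` is illustrative only and is not part of the original analysis.

```python
# Group sizes and mean CDI-III raw scores as reported in the text.
groups = [
    (13, 85.85),  # younger group: n, mean raw score
    (21, 99.52),  # older group: n, mean raw score
]

def combined_mean(groups):
    """Weighted mean of group means, each weighted by its group size."""
    total_n = sum(n for n, _ in groups)
    total_score = sum(n * mean for n, mean in groups)
    return total_score / total_n

print(round(combined_mean(groups), 2))  # 94.29
```

The result agrees with the reported total-group mean of 94.29 for the 34 children with completed forms.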
Materials and Experimental Test Stimuli
A predictable children's storybook entitled "Splish Splash" (Wootton & Skarakis-
Doyle, 1995) was employed in this investigation (Appendix C). The plot of the story
incorporates a familiar bathtime routine. The story is composed of seven episodes that are
organized around a central goal. Each episode, in turn, is comprised of a sequence of
goal-directed activities. The overall length of the story in total number of words is 398.
Vocabulary words, sentence patterns and episodes (i.e., child requesting something for her
bath and mother searching for it) are repeated and a distinct rhythmic or song-like refrain
is incorporated into the text of the story at several intervals. Each page of text is
accompanied by a corresponding picture. An audio recording of the "Splish Splash" story
was employed in the story familiarization phase of the investigation to allow for
consistency in story presentation across participants.
The following three test stimuli were constructed: a cloze or JSR story version,
two forms of a set of traditional comprehension questions, and an EVDT story version.
In the cloze story version (Appendix D), eight elements (i.e., actions, actors,
objects, locations, adjectives) were omitted from the text of the story and substituted with
blanks or pauses. Seven of the omitted elements were judged to be critical to the
progression of the story toward its goal (i.e., necessary to achieve the goal). One of the
elements (i.e., the name of the central character in the story) was judged to be supportive.
That is, this element was logically consistent with the story but not necessary to achieve
the goal. In order that the eight elements might be deleted and the sense of the story
maintained, some of the word order was altered from the original version of the story and
the length was condensed to 238 words. Three of the eight cloze items were accompanied
by pictures that could have revealed the appropriate response. The pictures accompanying
the other five items did not provide additional cues to the appropriate responses.
The comprehension questions were designed to tap the same content as the cloze
version of the story. Both yes/no and wh-questions were included in the set of questions
so that the demand for recognition type responses (as in yes/no questions) and recall type
responses (as in wh-questions) was balanced. Two forms of the comprehension questions
(Form A and Form B) (Appendices E and F) were prepared in order to minimize the
probability of responses to the yes/no questions being influenced by guessing. Form A of
the comprehension questions was composed of seven yes/no questions and seven
wh-questions. Six of the yes/no questions and five of the wh-questions pertained to concrete
aspects of the story, or content information. The remaining three questions (i.e., one
yes/no question and two wh-questions) required children to make inferences based on the
events of the story. Form B consisted of seven yes/no questions and six wh-questions. Six
of the yes/no questions and four of the wh-questions pertained to content information.
The remaining three questions (i.e., one yes/no question and two wh-questions) were
inferential in nature. As was stated previously, the two forms of questions were prepared
in order to minimize the probability of results being influenced by guessing on the yes/no
questions. Thus, three of the yes/no questions on Form A required an affirmative
response, while the remaining four questions required a negative response. On Form B,
the seven yes/no questions required the opposite responses. An example of a Form A
question and its Form B counterpart is as follows: Is Sarah dirty? (yes) / Is Sarah clean?
(no). The wh-questions were identical across the two forms, with the exception of one
additional wh-question which was included on Form A as a follow-up to a yes/no
question. The form of comprehension questions selected for administration was
counterbalanced across the children within each age group. Thus, 22 children completed
Form A and 16 children completed Form B.
The EVDT story version (Appendix G) consisted of eight violations made to key
story components. Five of the violations involved within story substitutions where certain
actions, actors or objects were exchanged with others present within the story (e.g., "dirty
clothes" in place of "rubber duckie"). Two violations consisted of goal disruptions where
an event which was plausible and fit the general story context but disrupted progress
toward the goal (i.e., avoiding a bath) was inserted into the story. For example, the phrase
"jumped in the tub" was inserted into the story in the place of the original phrase, "ran
outside to make more mud-pies". The one remaining violation involved the substitution of
the content of the rhythmic, repetitive phrase (i.e., "Splish splash, Sarah needs a bath...")
with a phrase that was related to the story but contrary to the goal structure and content
of the original story (i.e., "Ooey gooey, let's make a mess...").
Procedure
The investigation was separated into two phases. Phase I consisted of pre-experiment
testing and story familiarization. Phase II involved presentation of the
experimental procedures. The two phases occurred over a two-day period for each child.
The experimental sessions were recorded on videotape to allow for detailed scoring and
analysis. All sessions were conducted either at Elborn College (University of Western
Ontario), in the child's home, or at the child's preschool or daycare centre.
Pre-Experiment Testing and Story Familiarization.
During this phase children completed the inclusionary testing for participation in
the investigation. The children completed a standardized receptive language test, the
Auditory Comprehension (AC) subscale of the PLS-3 (Zimmerman et al., 1979), as well as
the vocabulary pre-test that was constructed by the investigator. In addition, the following
questionnaires were completed by the parents: the CDI-III (Dale, 1996); the CHI
(Warr-Leeper et al., 1997); and the early literacy questionnaire (Dempsey et al., 1999).
Each child was presented with the original "Splish Splash" story three times prior
to participating in the experimental tasks. The number of story presentations was
determined based on the research of Slackman and Nelson (1984) who found that with as
few as three presentations of a story, preschool children could develop a script for an
unfamiliar story. The first story presentation occurred following completion of the
inclusionary testing. The child listened to the audio-recording of the story with the
investigator and followed along in the book. The second story presentation occurred
between the end of the first session and the beginning of the test phase on the second day.
Each parent was provided with a copy of the storybook and the audiotape and was asked
to listen to the story with the child one time. The instruction to listen to the story only
one time was made to ensure consistency in number of story exposures across children.
The parent was instructed to listen to the story with the child during their usual
storyreading time and was asked to respond to comments made by the child during the
storyreading with neutral acknowledgment (e.g., "Oh", "O.K.", plus repetition of
the child's phrase) but not engage in discussion of the story with the child. The third and
final story presentation occurred on the second day, prior to the administration of the
experimental tasks. Again, the child listened to the audio-recording of the story with the
investigator and followed along in the book.
Experimental Test Procedures.
Following the third story presentation, children participated in the experimental
portion of the investigation where they jointly retold the story with the investigator and
answered the comprehension questions. A subset of the children also participated in the
EVDT. In order to control for the possibility that participation in one of the tasks might
influence performance on subsequent tasks, the order of two of the experimental
procedures (the JSR and the comprehension questions) was counterbalanced across
children, such that half of the children in each group participated in the JSR first, while
the other half answered the comprehension questions first. The subset of the children who
participated in the EVDT always completed this task last, in order to prevent the
introduction of a new story (the EVDT practice story) from interfering with performance
on the other two tasks.
Joint Story Retell Procedure.
Administration of the JSR consisted of the child jointly retelling the "Splish
Splash" story with the investigator by providing the missing elements. Prior to
participating in the experimental task, the child completed four practice items. The
practice session was provided so that the child had a model of what would be required of
him/her during the actual JSR test procedure. The four practice items were taken from
the story but were not included among those used in the actual test procedure which
followed. Before presenting the practice items, the investigator set out a playhouse and
props (e.g., bathtub, washcloth, mom figure), labeling each item for the child. The props
were then placed out of the child's view and the investigator instructed the child: "Help
me tell the Splish Splash story. You can tell me the words or show me with the toys. This
is how we'll do it". For each practice item, the investigator read a passage from the
story, and left a word or phrase out (e.g., "Sarah stuck her big toe into the ____"). The
child was expected to supply the missing word or words (e.g., bathtub). If the child failed
to respond verbally within five seconds on the first item, the investigator's confederate
supplied the missing word or words, thereby demonstrating a verbal response. If the child
failed to respond verbally within five seconds on subsequent practice items, the
investigator presented the tray of props and prompted the child to enact the item, saying
"Do you want to show me?" and repeating the item. If the child responded verbally to all
practice items, he or she was encouraged to enact the last item. This was done in order to
ensure that the child was aware that both verbal and enactment responses were
acceptable during the task. The investigator encouraged an enactment response by saying,
"Let's try that one again. This time show me" while presenting the tray of props.
Following completion of the practice items, the child participated with the
investigator in the joint retelling of the story. The investigator read the cloze story to the
child, as he or she followed along in the book. Again, the child was required to supply the
appropriate word or words during pauses made by the investigator. The investigator
waited no longer than five seconds for a response. If the child did not respond verbally
within five seconds, the props for the enactment response were made available. That is,
the investigator prompted the child to respond nonverbally by asking, "Do you want to
show me?" and offering the tray of props. If the child still did not respond, the
investigator provided a verbal and enactment response and then continued with the story.
Comprehension Questions Procedure.
Administration of the traditional comprehension questions consisted of the
investigator asking the child either the Form A or Form B questions. The child was
expected to answer the wh-questions verbally; however, either a verbal or a nonverbal
(i.e., nodding yes or shaking head no) response was acceptable for the yes/no questions.
Each question could be repeated a maximum of one time. If a child did not respond to a
question or gave an incorrect response the investigator responded neutrally and then
continued with the next question without providing the correct response. The child was
given no longer than five seconds to respond to each question.
Expectancy Violation Detection Procedure.
A third experimental procedure, the EVDT, was administered in order to evaluate
the role of memory in performance on discourse comprehension tasks. Seven of the
thirty-eight children were used to pilot refinements which were made to the original
EVDT procedures employed by Skarakis-Doyle and Wootton (1998). After the pilot
work was completed, the remaining thirty-one children participated in the EVDT task
used in the present study.
In order to clarify to each child what would be required during the EVDT, a
practice session was provided. The following instruction was given prior to the practice
session: "I'm going to read your story to you. But, I'm going to say some things that
sound silly or out-of-place and I want you and (confederate) to catch me". The
investigator then read a portion of a familiar storybook provided by the child's parents
for use during the practice session. The investigator altered salient elements (i.e., objects,
actors, actions) of the familiar story. If the child failed to respond, either verbally or
nonverbally to a violation the confederate responded to the violation by saying, for
example, "No! It's not cow, it's Mortimer!". The investigator then encouraged the child
to be the one to catch the mistake the next time. When a child detected a violation during
the practice session, that child was verbally reinforced (e.g., "Good catching!"). Each
child received a maximum of five practice trials. As soon as the child responded to one of
the violations, his or her practice session was discontinued, since understanding of the
task had been demonstrated. Thus, children received as few as one and no more than five
practice trials prior to participating in the task.
Following the practice session, the violated version of the story was read to the
child as he or she followed along in the book. The investigator introduced the EVDT by
saying to the child, "Now I'm going to read the Splish Splash story to you. I'm going to
say some more things that sound silly or out-of-place and I want you to catch me, just
like before."
The child was expected to demonstrate that he or she detected the violation
within five seconds of its occurrence through verbal response and/or via nonverbal
behaviours (e.g., change in eye gaze, facial expression or body movement). If the child
produced a verbal protest (i.e., "No!") in response to a violation, the investigator
encouraged the child to correct the error with the following probe, "No? No what?". The
investigator did not overtly acknowledge nonverbal responses made following violations
to the story. The investigator never prompted a child who failed to respond to a violation
either by pausing or by questioning that child. Rather, the investigator continued reading
the story. The experimental procedure was videotaped to allow children's nonverbal
responses to be analyzed.
Data Analysis
Scoring.
Accuracy on the JSR was determined by calculating the total number of items
(out of a maximum number of eight) that were correctly responded to. Both accurate
verbal and enactment responses were accepted as correct. Accurate verbal responses
included those that were verbatim from the story and in some cases also included
responses that were variations of the words in the story (Appendix H). Enactment
responses consisted of the child manipulating a toy using the correct action, object or
person.
Accuracy on the comprehension questions was determined by calculating the
number of correct responses to the questions. Verbal responses were expected for
wh-questions; both verbal and nonverbal responses to the yes/no questions were accepted.
Correct responses to wh-questions received one point each (see Appendix I), while
correct responses to yes/no questions received half a point each. Yes/no questions were
given less weight in order to prevent inflated scores due to the potential for guessing an
answer correctly. Since Form A had one more wh-question than Form B, the total score
possible on Form A (i.e., ten) was one point more than the total possible score on Form
B (i.e., nine).
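The scoring rules for the JSR and the comprehension questions can be sketched as follows. This is a minimal illustration in Python, not the scoring procedure actually used. The function names are hypothetical, and the half-point weight for yes/no questions is an assumption. The sketch shows only how the two question types are weighted and makes no attempt to reproduce the stated form maxima.

```python
JSR_ITEMS = 8        # number of cloze items in the JSR story version
WH_POINTS = 1.0      # one point per correct wh-question
YES_NO_POINTS = 0.5  # assumed reduced weight per correct yes/no question

def score_jsr(item_correct):
    """Count correct JSR items; verbal and enactment responses both count."""
    if len(item_correct) != JSR_ITEMS:
        raise ValueError("expected one entry per cloze item")
    return sum(1 for ok in item_correct if ok)

def score_questions(wh_correct, yes_no_correct):
    """Weighted comprehension-question score from counts of correct answers."""
    return WH_POINTS * wh_correct + YES_NO_POINTS * yes_no_correct

# Hypothetical child: 5 of 8 cloze items correct, 4 wh- and 5 yes/no answers correct.
print(score_jsr([True] * 5 + [False] * 3))  # 5
print(score_questions(4, 5))                # 6.5
```

Weighting yes/no answers less than wh-answers reflects the fact that a child guessing on a two-choice question is correct about half the time.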
Performance on the EVDT was scored by the investigator from the videotape of
the session. In accordance with the criteria established by Skarakis-Doyle and Wootton
(1998), acceptable verbal responses were operationally defined as: protests (e.g., "No!")
or corrections (e.g., "Not dirty clothes, rubber duckie!"), and/or repetitions with rising
inflections which indicated a challenge to the utterance (e.g., "Dirty clothes!") or a
question (e.g., "Dirty clothes?"). Acceptable nonverbal responses were operationally
defined as: changes in eye gaze (from book to reader, from reader to book, from book to
external environment of the room, and from the environment to the book), changes in
facial expression (including smiles, frowns, and puzzled looks), and changes in body
movement (including sudden whole body or discrete body part movements such as head
turns and sudden cessations of movements such as foot tapping) (Skarakis-Doyle &
Wootton, 1998). In order for a nonverbal response to be accepted as a detection of a
story violation, the behaviour had to occur within five seconds following the violation
and no later than the end of the phrase immediately following occurrence of the violation.
Skarakis-Doyle (1998) has demonstrated that nonverbal responses are rigorous
indicators of detection that are used if and only if a story violation has occurred. A
response was recorded as a combined response when a nonverbal behaviour was
exhibited in conjunction with a verbal behaviour (e.g., a child looks up from the book to
the experimenter and says, "No, not dirty clothes! Rubber Duckie!"). Accuracy on the
EVDT was determined by calculating the number of violations responded to, including
both nonverbal and combined responses, out of a maximum possible of eight.
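The timing criterion for accepting a response as a detection can be illustrated with a short Python sketch. The timestamps (seconds from the start of the video), the tuple layout, and the function name `is_detection` are all hypothetical. The sketch assumes that both limits apply at once, so the acceptance window closes at whichever comes first: five seconds after the violation or the end of the following phrase.

```python
WINDOW_S = 5.0  # maximum latency, in seconds, after a violation

def is_detection(violation_time, response_time, next_phrase_end):
    """True if a response falls inside the acceptance window for a violation."""
    if response_time is None:  # child did not respond
        return False
    latest = min(violation_time + WINDOW_S, next_phrase_end)
    return violation_time <= response_time <= latest

# Hypothetical trials: (violation time, response time or None, end of next phrase)
trials = [
    (10.0, 11.2, 14.0),  # inside both limits
    (30.0, 36.5, 38.0),  # later than five seconds
    (52.0, None, 55.0),  # no response
    (70.0, 73.5, 73.0),  # after the end of the following phrase
]
detections = sum(is_detection(*trial) for trial in trials)
print(detections)  # 1
```

In the actual task the count would be taken over the eight violations in the EVDT story version.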
Agreement.
A minimum of 10% of both the JSR and comprehension question forms were
rescored by a second graduate student who was involved in the study in order to
determine inter-judge agreement. Agreement for scoring between students was 100% for
both the JSR and comprehension question forms. Inter-judge agreement for EVDT
scoring was also calculated. A minimum of 10% of the EVDTs were rescored from
video recordings of the task. Inter-judge agreement was calculated for the occurrence of
a response (nonverbal or combined) to a violation. Agreement for scoring between judges
was 94%.
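The text does not state how agreement was computed. One common approach, assumed here purely for illustration, is point-by-point percent agreement: the number of scoring decisions on which the two judges agree, divided by the total number of decisions. The data and the function name below are hypothetical.

```python
def percent_agreement(judge_a, judge_b):
    """Point-by-point agreement between two judges, as a percentage."""
    if len(judge_a) != len(judge_b) or not judge_a:
        raise ValueError("both judges must score the same non-empty set of items")
    matches = sum(a == b for a, b in zip(judge_a, judge_b))
    return 100.0 * matches / len(judge_a)

# Hypothetical decisions (1 = response to violation scored, 0 = none scored).
judge_a = [1, 1, 0, 1, 0, 1, 1, 1]
judge_b = [1, 1, 0, 1, 1, 1, 1, 1]
print(percent_agreement(judge_a, judge_b))  # 87.5
```

Percent agreement does not correct for chance agreement, so it is best read as a descriptive index rather than a formal reliability coefficient.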
Results
Before the primary data analyses were conducted, several preliminary analyses
were undertaken in order to evaluate particular aspects of the test stimuli and their
presentation. Since a storybook format was used to present the JSR, there was a
possibility that some of the pictures accompanying the cloze items might have revealed
the correct response. This potential advantage was evaluated in the preliminary analyses.
Further, since two forms of the comprehension questions were utilized, preliminary
analyses were also conducted to ensure that the comprehension forms were equivalent.
First, the impact of pictures on JSR performance was examined across all children
(N = 38). The average proportion correct across the three items where pictures might
have cued a correct response (M = 0.69, SD = 0.27) was equivalent to the average
proportion correct on the other five items (M = 0.69, SD = 0.31). A paired t-test was
performed to determine whether there was a statistically significant difference in
performance on the two types of cloze items. The results of the t-test indicated that there
was, in fact, no significant difference (t(37) = 0.02, p > .05) between performance on
cloze items which did or did not have an accompanying picture. Therefore, subsequent
data analyses did not differentiate between the two types of items; the JSR raw scores
were computed from children's responses to all eight items.
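For readers unfamiliar with the paired t-test used here, the statistic is the mean of the within-child difference scores divided by its standard error, with n - 1 degrees of freedom (37 for 38 children). The sketch below uses hypothetical proportions for five children, not data from this study, and the function name is illustrative.

```python
import math
from statistics import mean, stdev

def paired_t(x, y):
    """Paired t statistic and degrees of freedom for two matched score lists."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # undefined if all diffs are equal
    return mean(diffs) / standard_error, n - 1

# Hypothetical proportion correct per child on each item type.
pictured = [0.67, 1.00, 0.33, 0.67, 1.00]
unpictured = [0.60, 0.80, 0.40, 0.80, 1.00]
t, df = paired_t(pictured, unpictured)
print(f"t({df}) = {t:.2f}")
```

A t value near zero, as reported above, indicates that the mean difference between the two item types is small relative to its variability across children.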
Second, the equivalency of the two forms of comprehension questions (Form A
and Form B) was examined. Since assignment of form was counterbalanced within each
age group, 22 children completed Form A of the comprehension questions and 16
children completed Form B. The average proportion correct on Form A was slightly
lower (M = 0.60, SD = 0.14) than the average proportion correct on Form B (M = 0.70,
SD = 0.24). In order to determine whether the difference in performance on the two
forms was statistically significant, an independent t-test was conducted. Based on a
pooled variance estimate, no significant difference (t(36) = -1.53, p > .05) was found
between accuracy on Form A and accuracy on Form B. Therefore, subsequent data
analyses did not differentiate the form of comprehension questions completed by the
children.
Once these preliminary analyses were completed, primary analyses were
undertaken to evaluate the validity of the JSR. A number of specific issues related to the
validity of the JSR were examined, including the developmental sensitivity, the
concurrent-validity and the content-relevance of the measure.
As stated previously, one of the major criteria employed in empirical evaluation of
test validity is age differentiation or developmental sensitivity (Anastasi, 1988). Since
discourse comprehension abilities are expected to increase with age during childhood,
test scores on discourse comprehension measures should likewise show such an increase,
if the measures are valid. Several analyses were conducted to examine whether the JSR
met the criterion of age differentiation. Correlational analysis was performed to explore
the relationship between age and performance on the JSR for all 38 participants. As
shown in Table 2, a statistically significant correlation of moderate magnitude (r = .61, p
< .01) was found between age and the JSR, indicating that 37% of the variability in
scores could be explained by a relationship between JSR performance and age. Also
shown in Table 2 is the correlation between age and accuracy on the traditional
comprehension questions. As would be expected, a statistically significant correlation of
moderate strength (r = .60, p < .01) was also found between age and accuracy on the
comprehension questions, indicating that 36% of the variability in scores could be
explained by the relationship between performance on the comprehension questions and
age.

Table 2
Correlation Coefficients for Age, JSR and Comprehension Questions (N = 38)
Variable     Age    JSR     Questions
Age          --     .61**   .60**
JSR                 --      .60**
Questions                   --
**p < .01.
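The percentages of explained variability quoted above follow from squaring each correlation coefficient (the coefficient of determination). A minimal Python check, with an illustrative function name:

```python
def variance_explained(r):
    """Percentage of variance shared by two variables correlated at r."""
    return round(100 * r ** 2)

print(variance_explained(0.61))  # 37
print(variance_explained(0.60))  # 36
```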
In order to further explore the relationship between age and each of the
comprehension measures, the performance of children in the younger age group (n = 15)
was compared to the performance of children in the older age group (n = 23). As shown
in Table 3, the older children, as a group, provided a greater number of accurate
responses than did the younger children on both the JSR and the comprehension
questions. To determine if the performance differences between the two age groups were
significant, independent t-tests were conducted for each of the measures. A t-test was
conducted with age group as the independent variable and the number of correct
responses on the JSR as the dependent variable. Based on a separate variance estimate, a
significant difference (t(20) = -3.68, p < .01) was found between the younger group and
the older group, indicating that the scores of the older group on the JSR were, in fact,
significantly higher than the scores of the younger group. A second t-test was conducted,
this time with the number of correct responses on the comprehension questions as the
dependent variable. Based on a pooled variance estimate, a significant difference (t(36) =
-3.39, p < .01) was again found between the younger group and the older group,
indicating that the scores of the older group were significantly higher on the
comprehension questions than the scores of the younger group. In summary, both
correlational and t-test analyses revealed performance differences with age on the JSR. In
addition, as would be expected, similar findings were revealed for the traditional
comprehension questions.
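The two group comparisons can be approximately reproduced from the summary statistics in Table 3. The Python sketch below is an illustration, not the original analysis. It takes the older group's JSR standard deviation to be 1.12, and because the inputs are rounded to two decimals the results match the reported values only approximately. The function names are illustrative.

```python
import math

def welch_t(m1, sd1, n1, m2, sd2, n2):
    """Separate-variance (Welch) t statistic and approximate degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Pooled-variance t statistic and degrees of freedom."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    t = (m1 - m2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

# Younger group entered first, so negative t means the older group scored higher.
t_jsr, df_jsr = welch_t(4.40, 1.99, 15, 6.48, 1.12, 23)  # JSR
t_cq, df_cq = pooled_t(5.10, 1.66, 15, 6.85, 1.48, 23)   # comprehension questions
print(f"JSR: t({df_jsr:.0f}) = {t_jsr:.2f}")
print(f"Questions: t({df_cq}) = {t_cq:.2f}")
```

Under these assumptions the sketch gives approximately t(20) = -3.69 for the JSR and t(36) = -3.40 for the comprehension questions, close to the reported -3.68 and -3.39. The separate-variance form suits the JSR comparison because the two groups' standard deviations differ markedly.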
Table 3
Mean Number of Accurate Responses and Standard Deviations on the JSR and
Comprehension Questions for Two Age Groups
Age Group            n     JSR            Comprehension Questions
Younger group(a)     15
  M                        4.40 (0.52)    5.10 (0.47)
  SD                       1.99 (0.28)    1.66 (0.23)
Older group(b)       23
  M                        6.48 (0.77)    6.85 (0.71)
  SD                       1.12 (0.21)    1.48 (0.16)
Note. Mean proportion correct and standard deviation enclosed in parentheses.
(a) Age range = 30-40 months (M = 36.13 months). (b) Age range = 41-50 months (M = 45.96
months).

Analyses were then conducted to evaluate the concurrent validity of the measure.
The concurrent validity of a test is determined by evaluating how closely an individual's
test score is related to his or her score on a criterion variable that is measured at about the
same time the test score is obtained (Anastasi, 1988; McCauley & Swisher, 1984). The
concurrent validity of the JSR was evaluated by comparing children's performance on the
JSR with their scores on the traditional comprehension questions. As shown in Table 2,
correlational anaIysis revealed a statistically significant correlation of moderate magnitude
k=.60,p < .01) between the JSR and the comprehension questions, indicating that 36%
of the variability in scores on the JSR was explained by the relationship between
performance on that task and performance on the comprehension questions.
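
The "percentage of variability explained" figures quoted throughout this chapter follow from squaring the correlation coefficient. The short sketch below illustrates that arithmetic for two of the full-sample correlations; the function name is hypothetical and is used for illustration only.

```python
def shared_variance_percent(r):
    """Percentage of variance shared by two measures correlated at r."""
    return 100 * r ** 2

# Correlations reported for the full sample (N = 38).
reported = [
    ("JSR and comprehension questions", 0.60),
    ("JSR and age", 0.61),
]

for label, r in reported:
    pct = shared_variance_percent(r)
    print(f"{label}: r = {r:.2f}, shared variance = {pct:.0f}%")
```

The same calculation applies to the other correlations reported in this chapter.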
Also shown in Table 2 is the relationship between each of the comprehension
measures and age. Given that, in addition to being moderately correlated with each other,
both the JSR and the comprehension questions were also moderately correlated with age,
analyses were conducted to determine whether there was any unique contribution of
comprehension to the relationship between the JSR and the comprehension questions. The
relationship between the two comprehension measures was analyzed for children within a
more developmentally stable age range (i.e., children between 42 and 50 months of age).
Correlational analysis was performed to examine the relationship between scores on the
JSR and the comprehension questions for children within this age range. As shown in
Table 4, a moderate correlation (r = .44) was found between the JSR and the
comprehension questions in this age group. Thus, 19% of the variability in scores on the
JSR was explained by the relationship between performance on that task and performance
on the comprehension questions when age varied only minimally. As would be expected
given the inextricable relationship between age and comprehension ability, the magnitude
of the correlation between the JSR and the comprehension questions was not as strong
within the developmentally stable age group (r = .44) as it was within the group as a
whole (r = .60). However, the fact that the correlation remained within the moderate
range suggests that the JSR and the comprehension questions share some common
variance other than what is accounted for by age alone.

Table 4
Correlation Coefficients for Age, JSR and Comprehension Questions for Children
Between 42 and 50 Months (n = 23)

Variable      Age     JSR     Questions
Age           --      .28     .27
JSR                   --      .44*
Questions                     --
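
To illustrate why a correlation of .44 reaches significance in a group of 23 children while correlations near .28 do not, the sketch below converts each coefficient to a t statistic with n - 2 degrees of freedom. It is offered only as an illustration: the function name is hypothetical, the critical value is taken from a standard t table, and the two age correlations (.28 and .27) are assumed readings of the Table 4 entries.

```python
import math

def t_for_correlation(r, n):
    """t statistic for testing whether a correlation differs from zero (df = n - 2)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)

N_CHILDREN = 23        # children between 42 and 50 months
T_CRITICAL_05 = 2.080  # two-tailed critical t for df = 21 at alpha = .05 (standard table)

correlations = [
    ("Age and JSR", 0.28),            # assumed reading of the table entry
    ("Age and questions", 0.27),      # assumed reading of the table entry
    ("JSR and questions", 0.44),
]

for label, r in correlations:
    t = t_for_correlation(r, N_CHILDREN)
    verdict = "significant" if abs(t) > T_CRITICAL_05 else "not significant"
    print(f"{label}: r = {r:.2f}, t(21) = {t:.2f}, {verdict} at .05")
```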
In addition to the comparisons made between the JSR and the comprehension
questions, comparisons were also made between the JSR and two of the pre-experimental
measures, the vocabulary pre-test and the AC subscale of the PLS-3. These comparisons
were conducted in order to determine the relationship between the JSR and two widely
accepted measures of literal comprehension.
First, analyses were performed to examine the role of vocabulary knowledge in
performance on the JSR. As was shown in Table 1, the average number of correct
responses on the vocabulary pre-test was 13.11 (SD = 1.74) out of a maximum possible
score of 16. Correlational analysis was conducted to examine the relationship between the
JSR and the vocabulary pre-test. Results of the analysis revealed a statistically significant
relationship of moderate strength between the JSR and the vocabulary pre-test (r = .57,
p < .01), indicating that 32% of the variability in scores on the JSR could be explained by
the relationship between performance on the JSR and performance on the vocabulary
pre-test. A regression analysis was conducted to evaluate the unique roles of age and
vocabulary pre-test score in predicting JSR performance. Based on a step-wise regression
analysis, age alone accounted for 37% of the variability in scores on the JSR (R = .61,
F(1, 36) = 20.90, p < .01). When the vocabulary pre-test score was entered into the
equation, it accounted for an additional significant 13% of the variability in performance on
the JSR task (R² change = .13, F(1, 35) = 9.27, p < .01). This finding indicates that knowledge
of story vocabulary contributes uniquely to performance on the JSR even after the
contribution of age is considered.
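
The relationship between the reported F statistics and the percentages of variability can be sketched as follows. This is an illustration under stated assumptions rather than a reproduction of the original analysis: the function names are hypothetical, and the .13 value is assumed to be the increment in R² at the second step. Because the inputs are rounded, the F for the increment computed here can be expected to be close to, but not identical with, the reported value.

```python
def r_squared_from_f(f_value, df_error):
    """R^2 for a one-predictor model, recovered from its F(1, df_error) statistic."""
    return f_value / (f_value + df_error)

def f_change(r2_change, r2_full, df_error):
    """F test for the R^2 increment when one predictor is added to the model."""
    return r2_change / ((1 - r2_full) / df_error)

# Step 1: age alone, F(1, 36) = 20.90 as reported.
r2_age = r_squared_from_f(20.90, 36)

# Step 2: the vocabulary pre-test is assumed to add .13 to R^2, with 35 error df.
r2_full = r2_age + 0.13
f_vocab = f_change(0.13, r2_full, 35)

print(f"R^2 for age alone: {r2_age:.2f}")
print(f"R^2 for age plus vocabulary pre-test: {r2_full:.2f}")
print(f"Approximate F for the increment: {f_vocab:.2f}")
```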
Second, analyses were performed to examine the relationship between the JSR and
the AC subscale of the PLS-3. As was shown in Table 1, the average PLS-3 AC raw score
was 35.55 (SD = 5.95) out of a maximum total score of 48. Correlational analysis was
conducted to determine the relationship between the JSR and the PLS-3 AC subscale.
Results of this analysis revealed a statistically significant relationship of moderate
magnitude (r = .49, p < .01) between the JSR and the PLS-3 AC subscale, indicating that
24% of the variability in JSR scores could be accounted for by the relationship between
performance on the JSR and performance on the PLS-3 AC subscale. A regression
analysis was conducted to evaluate the unique roles of age and PLS-3 AC score in
predicting JSR performance. Again, based on a step-wise regression analysis, age alone
accounted for 37% of the variability in performance on the JSR (R = .61, F(1, 36) = 20.90,
p < .01). However, PLS-3 AC scores did not make any unique contribution to the
variance in JSR performance (β = .06, t = .29, p > .05) once age was considered.
The developmental sensitivity and the concurrent validity of the JSR having been
evaluated, the content-relevance of the measure was finally examined. Empirical evidence
of content-relevance may be obtained through a demonstration that scores on two tests
are related in expected ways. As stated previously, the JSR was designed to reduce
memory demands by establishing a close temporal relationship between the stimulus and
response. Since the EVDT also limits memory demands by closely linking the stimulus and
response, at least a moderate correlation was expected between these two measures.
Statistical analyses were conducted to examine whether this was the case.
Prior to examining the role of memory demands in task performance, several
analyses were performed to examine the relationship between age and performance on the
EVDT. As presented earlier, after the pilot work was completed, 31 children were
presented the EVDT. However, of these 31 children, only 24 met the criterion of at least
one self-initiated, correct response provided during the practice session. Therefore, all
analyses involving the EVDT were based on the data obtained from the performances of
24 children (mean age = 42.04 months; age range = 30-50 months). As shown in Table 5,
a statistically significant correlation of moderate strength (r = .56, p < .01) was found
between performance on the EVDT and age, indicating that 31% of the variability in
performance on the EVDT was explained by the relationship between EVDT performance
and age.
Further analyses revealed that, as a group, the older children provided almost twice
as many correct responses on the EVDT (M = 4.11, SD = 2.16) as the younger children
(M = 2.50, SD = 2.27). An independent t-test was conducted to examine whether the
difference in performance between the two age groups was significant. Based on a pooled
variance estimate, a significant difference (t(22) = -2.42, p < .05) was found between the
younger group and the older group in accuracy on the EVDT. The finding that children in
the older group were significantly more accurate on the EVDT than children in the
younger group was consistent with the correlational result.
Given that the relationship of the EVDT to age was established, analyses were
conducted to examine the relationship between the JSR and the EVDT. As stated
previously, it was assumed that since both the JSR and the EVDT required responses to
be made while the child was creating a representation, rather than after the representation
had been completed and was held in working memory, both measures reduced memory
demands. In order to evaluate this assumption, the relationship between the JSR and the
EVDT was analyzed for the 24 children who demonstrated understanding of both tasks.
Table 5
Correlation Coefficients for Age, JSR, Comprehension Questions and EVDT (n = 24)

Variable          Age     JSR     Questions   E.V. Detection
Age               --      .65**   .64**       .56**
JSR                       --      .51*        .57**
Questions                         --          .47*
E.V. Detection                                --

As was shown in Table 5, a statistically significant correlation of moderate magnitude
(r = .57, p < .01) was found between the JSR and the EVDT, indicating that 32% of the
variability in scores on the JSR could be explained by the relationship between
performance on the JSR and performance on the EVDT. Also shown in Table 5 is the
correlation coefficient for the relationship between the comprehension questions and the
EVDT. Again, a statistically significant correlation of moderate magnitude (r = .47,
p < .05) was found between the two measures, indicating that 22% of the variance in
comprehension question scores was explained by the relationship between performance on
the comprehension questions and performance on the EVDT.
As was shown in Table 5, the JSR, the EVDT, and the comprehension questions
were each moderately correlated with age. Therefore, as with the data for the JSR and the
comprehension questions, EVDT data were analyzed for the children within the smaller,
more developmentally stable age range (i.e., children within 42 to 50 months). For the 14
children between the ages of 42 and 50 months who demonstrated understanding of the
tasks, moderate correlations were found both between the EVDT and the JSR (r = .38)
and between the EVDT and the comprehension questions (r = .41). Thus, as would be
expected given the close relationship between age and comprehension ability, the
correlations between the EVDT and both the JSR and the comprehension questions were
not as strong in the developmentally stable age range as they were in the group as a whole.
However, the correlations remained in the moderate range, with performance on the
EVDT continuing to account for 14% and 17% of the variability in scores on the JSR and
comprehension questions, respectively.
In addition to memory, the other enabling factor that was examined was language
production. The JSR was designed to limit language production demands by requiring
children to produce only limited elements of the text. As well, children who did not
initially provide a verbal response to an item were given an opportunity to enact the
response. Given that the enactment response was an option, the number of children who
utilized this form of response was calculated. Results indicated that only six of the
thirty-eight children availed themselves of the enactment response option during testing. All six
of these children utilized the enactment response option to complete the cloze item which
required the name of the central character in the story (i.e., Sarah). One child also used the
enactment response to complete a cloze item which required the name of an object used in
the story (i.e., big bar of soap). Half of the children who utilized the enactment response
option got less than 50% of the JSR items correct; the other half of the children got more
than 50% of the items correct. Of the total number of correct responses provided by all
the children, only 3% (i.e., 7 of 215 correct responses) were provided via enactment. This
finding suggests that the language production demands of the JSR task may be of an
appropriate level for children between the ages of 30 and 50 months.
In order to further examine the potential role of language production in
performance on both the JSR and the comprehension questions, additional quantitative
analyses were undertaken. Specifically, the relationship of each measure to scores on the
language production component of the CDI-III (i.e., the vocabulary production and
sentence structure scales) was analyzed. Parents of 35 of the 38 children completed the
language production component of the CDI-III. The average number of items correct on
the language production component of the CDI-III (CDI-III-production) was 85.34 out of
a total possible score of 112. Correlational analysis revealed a significant correlation of
moderate magnitude between the JSR and CDI-III-production (r = .38, p < .05) and a
moderately strong correlation between the comprehension questions and
CDI-III-production (r = .63, p < .01). To further analyze the role of language production in
performance on each measure, regression analyses were conducted. First, a regression
analysis was performed with number of responses correct on the JSR as the dependent
variable and age and CDI-III-production as the independent variables. Based on a
step-wise regression analysis, age alone accounted for 37% of the variability in JSR
performance (R = .57, F(1, 33) = 15.67, p < .01). CDI-III-production did not significantly
account for a unique amount of variability and, hence, was not entered into the equation. A
second regression analysis was conducted, this time with comprehension questions as the
dependent variable. Based on a step-wise regression analysis, CDI-III-production alone
accounted for 45% of the variability in performance on the comprehension questions
(R = .63, F(1, 33) = 21.29, p < .01). Age was also entered into the equation and accounted for an
additional 9% of the variability in performance on the comprehension questions
(R² change = .08, F(1, 33) = 5.97, p < .05).
Although there were moderate correlations between the CDI-III-production and
both the JSR and the comprehension questions, results suggested that global language
production ability only contributed uniquely to performance on the comprehension
questions. Global language production ability did not contribute to performance on the
JSR beyond what would be expected due to age alone.
Discussion
Although comprehension is a broad construct that encompasses the discourse
level, as well as the lexical and syntactic levels, measurement efforts have focused almost
exclusively on the latter two levels. As a result, while there are a variety of measures
available to tap decontextualized literal and syntactic comprehension, there are relatively
few available to tap discourse comprehension. Given that very few discourse level
measures are available, there is a possibility that scores on lexical and syntactic measures
may be used to draw inferences about the broader construct of language comprehension.
However, since different abilities are required to comprehend language at each level, tests
that are valid for the measurement of lexical and syntactic comprehension are not
necessarily, and in fact are unlikely to be, valid indicators of discourse comprehension.
Clearly, measures designed specifically to tap discourse comprehension are necessary for
comprehensive measurement of the construct of language comprehension.
The development of such measures is not a straightforward task, however. Valid
measurement of discourse comprehension faces a number of challenges. One of the most
critical challenges involves ensuring that tasks only measure content that is relevant to
discourse comprehension. Entailed in existing measures of discourse comprehension are
enabling factors such as memory and language production. These factors intrude into the
measurement of discourse comprehension, confounding interpretation of performance.
Skarakis-Doyle & Wootton (1998) attempted to create a measure of discourse
comprehension that minimized the confounding factors typically found in measures of
discourse comprehension. As discussed previously, preliminary investigation suggested
that this new measure of discourse comprehension, the Joint Story Retell, might indeed
limit the role of enabling factors, but further evaluation of its validity was necessary. The
current study sought additional evidence for the validity of the JSR.
As stated previously, age differentiation, or developmental sensitivity, is a major
criterion employed in construct validation (Anastasi, 1988). Since language
comprehension abilities are expected to increase with age during early childhood,
performance on measures of comprehension should likewise show such an increase, in
order for the measure to be valid. As expected, results of this investigation showed that
the comprehension questions did demonstrate age differentiation. Of greater import was
the finding that the JSR also demonstrates age differentiation. Results of correlational
analyses revealed a moderately strong relationship between age and performance on the
JSR. When group comparisons were conducted, the older children (i.e., 41- to 50-month-olds)
performed significantly more accurately on the JSR than the younger children (i.e.,
30- to 40-month-olds). These findings are consistent with the hypothesis that the JSR is
sensitive to age differences.
According to Anastasi (1988), age differentiation is a necessary but not sufficient
condition for validity. Thus, the finding that the JSR exhibits age-related performance
differences does not on its own ensure that the ability that is changing with age is
discourse comprehension. In order to determine whether the ability measured by the JSR
is, in fact, discourse comprehension, comparisons were made between performance on the
JSR and performance on a preexisting measure of discourse comprehension.
As stated previously, correlations between a new measure and related earlier tests
may be cited as evidence that the new test measures approximately the same area of
behaviour as the other tests of the same construct (Anastasi, 1988). Since the JSR
purports to measure discourse comprehension, the same construct which is measured by
comprehension questions, the concurrent validity of these two measures was examined.
When comparisons were made between children's performance on the JSR and their
performance on the traditional comprehension questions, a moderately strong correlation
was revealed. This finding suggests that a considerable amount of variance is shared by the
two measures.
Since both measures were also correlated with age, it was also possible that shared
variance related to age, rather than to discourse comprehension, might have accounted for
the relationship between the JSR and comprehension questions. Therefore, the relationship
between the two measures was examined while attempting to control for age. To do this,
the relationship between the two measures was examined for children within a more
developmentally stable age range (i.e., 42-50 month range). This approach to controlling
for age was taken instead of partialling age out of the correlation because age is
inextricably linked to comprehension ability and because a valid measure of a construct
should contain all aspects relevant to that construct. Although the magnitude of the
correlation between the JSR and comprehension questions decreased slightly when age
varied only minimally, the correlation remained in the moderate range, suggesting that the
JSR and comprehension questions shared variance that was not solely attributable to age.
Given that comprehension questions are the widely accepted measure of discourse
comprehension, it seems plausible that the shared variance reflects the construct of
discourse comprehension.
It may be argued that the close tie between stimulus and response in the cloze task
might permit children to respond correctly to cloze items that they do not comprehend.
That is, perhaps the close temporal proximity of the stimulus provides such a direct link to
the correct response that children can produce that response, even when they do not
understand the passage. Evidence obtained from this investigation, however, suggests that
this is not likely the case. First, as discussed previously, older children perform more
accurately on the JSR than younger children. If the JSR were merely providing cues to
accurate responses in the absence of comprehension, it would be expected that younger
children could perform just as well on the task as older children. This was not the case.
Furthermore, it has been shown that the JSR is strongly related to performance on
comprehension questions. Since comprehension questions measure story understanding,
the finding of a close relationship between the JSR and the comprehension questions
suggests that the JSR, like the comprehension questions, provides a measure of discourse
comprehension ability.
In addition to the comparisons made between the JSR and the comprehension
questions, comparisons were also made between the JSR and the two comprehension tests
employed in pre-experiment testing. Results revealed that both the vocabulary pre-test and
the PLS-3 AC subscale were at least moderately correlated with the JSR. However, only
scores on the vocabulary pre-test contributed uniquely to performance on the JSR; the
PLS-3 AC scores did not. As discussed previously, both lexical and syntactic
understanding are necessary (though not sufficient) for understanding discourse. Thus,
given that the JSR measures discourse comprehension, it is not surprising to find that
vocabulary knowledge contributes uniquely to performance on the measure. A possible
explanation for the finding that PLS-3 AC scores did not also contribute uniquely to
performance on the JSR may have to do with the areas of comprehension tapped by this
measure. Relatively few items on the PLS-3 AC subscale specifically test vocabulary or
syntactic knowledge, abilities that are known to be important components of discourse
comprehension. Rather, many of the items on the PLS-3 AC subscale tap general
conceptual knowledge, including spatial, temporal, and quantitative relations. This type of
conceptual knowledge may not have as direct an impact on discourse level comprehension
as do lexical and syntactic knowledge, at least not in the type of story employed in this
study.
In addition to the evaluations performed to determine the JSR's developmental
sensitivity and concurrent validity, evaluations were also performed to examine the role of
enabling factors in task performance. As stated previously, two enabling factors, memory
and language production, typically intrude in measures of discourse comprehension,
making interpretation of performance difficult. The JSR was designed to limit the intrusion
of these factors, allowing more direct interpretation of test scores. As Anastasi (1988) has
discussed, decisions about whether a new test may be said to be free of the influence of
enabling factors may be made based on an examination of the correlations that exist
between the new test and other existing tests. Such correlations were employed in
Skarakis-Doyle & Wootton's (1998) preliminary investigation of the JSR. The results of
that investigation suggested that the JSR did, in fact, reduce the enabling factors of both
memory and language production.
In the present investigation, correlations between the JSR and the EVDT were
again obtained in order to examine the assertion that the JSR limits the enabling factor of
memory in task performance. One way the JSR was designed to limit memory demands
was by reducing the temporal gap between stimulus and response. The EVDT also
purportedly limits memory demands by reducing the temporal gap between stimulus and
response (Skarakis-Doyle & Wootton, 1998). When correlational analysis was conducted,
a moderate correlation between these two measures was revealed. Although, as expected,
age clearly played a role in performance on both the JSR and the EVDT, the fact that a
moderately low correlation was found between the two measures even in a more
developmentally stable age range suggests that there was some shared variance between
the measures not attributable to age alone. While the JSR and the EVDT both establish
close temporal links between the stimulus and response, the tasks also differ in important
dimensions. The JSR requires completion of story elements, a recall task, while the EVDT
requires detection of story alterations, or recognition responses. Thus, it seems plausible
that the source of their shared variance is related to the fact that both measures minimize
memory demands. However, it is also plausible that their shared variance reflects the fact
that both measure discourse comprehension.
To investigate these two alternatives, correlations of the JSR and EVDT, both of
which limit memory demands, were examined relative to the comprehension questions,
which impose greater memory demands. Both the JSR and EVDT were more strongly
correlated with each other than either was with comprehension questions. All three
measures shared approximately 22% to 32% of the variance, suggesting that all measure
discourse comprehension. The differences in amount of shared variance between the pairs
of measures support the first alternative, that both the JSR and EVDT minimize memory
demands relative to the comprehension questions.
A comparison of the relationship found between the JSR and EVDT in the present
investigation (r = .57) with the relationship found between those measures in the
preliminary investigation (r = .62) (Skarakis-Doyle & Wootton, 1998) reveals that the
correlations are similar but not identical. One possible explanation for the slight difference
in results between the two investigations has to do with alterations made to the EVDT
stimuli for the current investigation. Many of the violations originally employed by
Skarakis-Doyle & Wootton (1998) involved the substitution of one element with another
element of the same class (e.g., actor, action, object) that was not from the story, making
violations more obvious. None of the violations employed in this investigation consisted of
this type of substitution. Rather, the violations employed in this investigation involved
within story substitutions or goal disruptions. It is possible that the inclusion of these more
subtle types of violations (i.e., within story substitutions) made the task more difficult for
children, accounting for the slightly lower correlation found between the JSR and EVDT
in the current investigation. The difference between the correlations obtained in the two
investigations might also have been affected by the fact that the children who participated
in the current study ranged in age from 30 to 50 months while the children who
participated in the preliminary study (Skarakis-Doyle & Wootton, 1998) were older,
ranging in age from 46 to 58 months.
This investigation pursued the role of memory in task performance by examining
the relationship between two measures (the JSR and the EVDT) that require a close
temporal relationship between stimulus and response as compared to a measure requiring
the entire story to be held in memory. In addition to the stimulus-response temporal
relationship, it has also been suggested that predictable, script-based stories may play an
important role in reducing memory demands (Hudson & Nelson, 1983; McCartney &
Nelson, 1981; Nelson, 1978; Pace & Feagans, 1984). As yet, the role of predictable
stories in reducing the memory demands of the JSR has not been examined. Further
research might compare children's performance on the JSR when less predictable or
non-scripted stories are utilized to their performance on the JSR when predictable stories such
as "Splish Splash" are employed. In addition to providing information about the role of
predictability in JSR performance, such a comparison would address the issue of the
extent to which the current results may be generalized to a wider range of story types.
As well as the type of story, the impact of number of story exposures provided
prior to the administration of the task should be examined. In the current investigation,
three story exposures were utilized prior to test administration since research had
indicated that children could establish a script for an unfamiliar story with three exposures
to that story (Slackman & Nelson, 1984). In the future, performance on the JSR following
three story presentations might be compared to performance following just one story
presentation. If memory can be shown not to intrude in test performance when only one
story presentation is provided prior to the task, the clinical utility of the task might be
enhanced.
In addition to controlling memory demands, the JSR was designed to limit
language production demands relative to traditional discourse comprehension measures. It
was argued that the JSR was less demanding of language production abilities than the
traditional comprehension questions, because on the JSR children were only required to
produce limited elements of the text and because enactment responses were permitted as
an alternative to verbal responses. The results of the current investigation support this
argument.
Investigation of the role of global language production ability as measured by
CDI-III-production revealed moderate correlations between that measure and both the JSR and
the comprehension questions. However, global language production ability only
contributed uniquely to performance on the comprehension questions, not to performance
on the JSR. Given that both the JSR and the comprehension questions appear to measure
discourse comprehension, the difference in language production requirements may be one
explanation for why the correlation between the JSR and the comprehension questions
was not even stronger than it was.
Additional support for the limited language production demands of the JSR can be
found through an examination of the enactment responses made by the children. First, only
3% of the total number of correct responses were made in this mode. Second, these
responses were accounted for by only six children who spanned the age range and
accuracy rate investigated in the study. Given these findings concerning the use of
enactment, it appears that the verbal cloze procedure alone sufficiently reduced language
production demands across the age span.
Further evidence concerning the role of language production in JSR performance
might be attained in the future by comparing the performance of children with language
production disorders to the performance of children with normally developing production
abilities. If the JSR does, in fact, limit language production demands, it would be
expected that children with normally developing comprehension, regardless of their
language production abilities, should perform equivalently on the JSR.
In addition to language production demands, another area that requires further
investigation involves the reliability of the JSR. Skarakis-Doyle and Wootton (1998)
compared children's performance on two forms of the JSR, each of which was based on a
different stimulus story, and found that scores on the two forms were highly correlated,
indicating that the JSR demonstrated alternate forms reliability. Evidence of test-retest
reliability would further strengthen support for the reliability, and hence, the validity of the
measure.
One limitation of the current investigation which should be raised concerns the
characteristics of the participants. Children who participated in this investigation were
recruited largely from centres in the university community. Informal observations suggest
that many of these children came from highly educated families who were of middle to
upper socioeconomic status. Given that literacy experiences might be expected to vary
across the socioeconomic spectrum, further research should attempt to include children
from a broader range of socioeconomic backgrounds.
Conclusion
Construct validation requires that evidence accumulated from a variety of sources
support the appropriateness of test score interpretation (Messick, 1995). One of the
sources of evidence that construct validity relies on is evidence of expected performance
differences with age. Construct validity also relies on evidence of expected relationships
between measures. Evidence of an expected relationship between a new test and a pre-existing test may be used to support the theory that the new test measures the same
construct as the pre-existing test. That is, such evidence may be used to show that a new
test demonstrates concurrent validity. Evidence of an expected relationship between tests
may also be used to demonstrate that a new measure is relatively free of any construct-irrelevant factors (e.g., enabling factors) that might confound test score interpretation.
Such evidence supports the content-relevance of a test.
Expectations about how the JSR would be related to age, as well as how it would
be related to certain other comprehension and production measures were based on
theoretical arguments about the developmental nature of comprehension, analyses of task
demands, and results of the preliminary investigation (Skarakis-Doyle & Wootton, 1998).
First, it was expected that scores on the JSR would increase with age since there is rapid
growth in both modalities of expression and comprehension during the early preschool
years (Chapman, 1978). Empirical evidence supported this expectation; the JSR exhibited
the expected performance differences with age. Second, it was expected that the JSR
would demonstrate only a moderate correlation with comprehension questions since both
measures claim to measure discourse comprehension but differ in that the JSR purports to
limit enabling factors relative to the traditional measure. Again, empirical evidence
supported this expectation; the JSR did, in fact, demonstrate a moderate correlation with
the comprehension questions, suggesting that both are measures of discourse
comprehension. Third, it was expected that the JSR would be more strongly related to the
EVDT than to the comprehension questions, given that the JSR, like the EVDT and
unlike the comprehension questions, was purported to minimize memory demands by
maintaining a close temporal relationship between stimulus and response. This expectation
also received empirical support; the JSR was more strongly correlated with the EVDT
than it was with the comprehension questions. Finally, it was expected that language
production would contribute little to performance on the JSR, since the task requires
children to produce only specific and limited elements of the text. This expectation also
received empirical support; although it did contribute to performance on the
comprehension questions, global language production ability did not contribute to
performance on the JSR.
The finding that the JSR is sensitive to age differences, in conjunction with the
finding that the JSR demonstrates concurrent validity, suggests that this measure is a
potentially valid measure of discourse comprehension for children between the ages of 30
and 50 months. Further, the finding that the JSR limits enabling factors indicates that the
measure may allow test scores to be interpreted more directly than has previously been
possible, since it appears that the potential for memory and language production to
confound performance has successfully been limited.
As stated previously, comprehensive measurement of comprehension has been
limited by the lack of discourse comprehension measures. The risk associated with the
paucity of discourse comprehension measures has been that scores on available measures,
which are typically only valid for the measurement of decontextualized literal and syntactic
comprehension, might be used to draw inferences about the broader construct. Given this
risk, it was evident that a measure designed specifically to tap discourse comprehension
needed to be developed. For a new test to be added to the battery of comprehension tests,
there should be evidence that the test is valid. Although further investigations of the
measure are necessary, evidence obtained from this investigation suggests that the JSR has
the potential to be a valid measure of discourse comprehension. It appears that the JSR
could be a meaningful addition to the current battery of comprehension measures. Its use,
in conjunction with the use of literal level comprehension measures, should allow more
comprehensive measurement of young children's comprehension in the future.
Appendix A
Vocabulary Pre-test
 1. pull          __       9. drop            __
 2. look for      __      10. dirty           __
 3. turn          __      11. outside         __
 4. dry           __      12. open            __
 5. foot          __      13. rubber duckie   __
 6. washcloth     __      14. house           __
 7. mother        __      15. bathtub         __
 8. soap          __      16. clothes         __
Appendix B
Early Literacy Questionnaire
1. Does your child enjoy listening to stories and looking at books?
2. How often do you and your children have storytime?
   Once a week
   Twice a week
   Three times a week
   More than three times a week (specify):
   Other:
3. Describe your typical storytime routine:
   a) Who participates in storytime?
   b) Who selects the story?
   c) Where does storytime take place?
   d) When does storytime take place?
   e) Anything else?
4. When you read with your child, are any of the following typical of your reading style?
   Please check.
                                                              Yes   No
   a. Talking about the pictures                              __    __
   b. Asking your child questions about the story             __    __
   c. Asking your child to tell parts of the story            __    __
   d. Relating parts of the story to personal experiences     __    __
      (e.g., "Oh, remember when we built a snowman.")
   Other:
5. Does your child experience storytime in locations other than your home (e.g.,
   preschool, library)? If so, please describe.
6. Do you read patterned stories such as "Mortimer" or "The Paper Bag Princess" by
   Robert Munsch?
   Yes __   No __
   If yes, please elaborate (frequency of exposure, your child's response to these books, ...)
7. What are some of your child's favourite stories?
8. How does your child respond when you introduce a new story?
9. What happens if you accidentally make a mistake when you read a story (e.g., misname a
   character, leave something out)? Can you recall a time when that happened?
Appendix C
Original Version of "Splish Splash"
One day a little girl named Sarah made twenty very messy mud-pies in the back yard.
Sarah's mother took one look at her, and said, "Splish, splash, Sarah needs a bath.
Mommy says you're dirty and she can't have that."
So the mother took Sarah upstairs to the bathroom. Sarah took off all her dirty clothes
and her mother wrapped her in a big towel. Then the mother filled the bathtub with water,
and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have
that."
Sarah stuck her big toe into the bathtub. But then she said, "Oh Mommy, the water's too
cold. The water must be nice and warm."
So the mother reached for the tap and turned on the hot water until it was nice and warm.
Then Sarah stuck her foot into the bathtub and said, "I like the nice warm water. But
Mommy, I just can't have a bath. I must have my red washcloth."
So the mother opened the bathroom cupboard and pulled out all of the washcloths until
she found Sarah's red washcloth. Then she dropped it into the bathtub, and said, "Splish,
splash, Sarah needs a bath. Mommy says you're dirty and she can't have that."
Sarah squeezed all the water out of the washcloth. But then she said, "Oh Mommy, I just
can't have a bath. I must have my big bar of soap."
So the mother dug through the soap drawer until she found a big bar of soap. Then she
rubbed the soap on the red washcloth, and said, "Splish, splash, Sarah needs a bath.
Mommy says you're dirty and she can't have that."
Sarah took the washcloth. But then she said, "Oh Mommy, I just can't have a bath. I must
have my little rubber duckie."
So the mother looked all over the bathroom, but she couldn't find Sarah's little rubber
duckie. Then she looked all over Sarah's bedroom, but she still couldn't find Sarah's
rubber duckie. Then she looked all over the house, but she still couldn't find the rubber
duckie.
Sarah waited and waited for her mother to come back. But Sarah got tired of sitting in the
bathroom. So she dried her foot, dressed in all her dirty clothes again, and ran outside to
make more mud-pies.
Appendix D
Cloze (Joint Story Retell) Version of "Splish Splash"
One day a little girl named Sarah made twenty very messy mud-pies in the back yard.
Sarah's mother took one look at her, and said, (1) "__________."
So the mother took Sarah upstairs to the bathroom and filled the bathtub with water.
Then, the little girl named (2) __________ stuck her big toe into the bathtub and said,
"Oh Mommy, the water's too cold. The water must be nice and warm."
So the mother reached for the tap and (3) __________. The water was just
right. Then, Sarah started to wash her foot with her red washcloth. But then she said, "Oh
Mommy, I just can't have a bath. I must have a (4) __________."
So the mother dug through the soap drawer and found a big bar of soap. But then Sarah
said, "Oh Mommy, I just can't have a bath. I must have my (5) __________."
So the mother looked all over the bathroom, and all over the bedroom but she couldn't
find Sarah's little rubber duckie. Then she looked all over the (6) __________, but
she still couldn't find the rubber duckie.
Sarah waited and waited for her mother to come back. But Sarah got tired of sitting in the
bathroom. So she dried her nice clean foot, dressed in all her (7) __________
again, and ran (8) __________ to make more mud-pies.
Total: /8
Appendix E
Comprehension Questions: Form A
1. What does Sarah's mother say?
a. Is Sarah dirty?
b. Does mom want Sarah to go outside and play?
2. Who stuck her big toe into the bathtub?
3. Mom filled the bathtub with water.
Sarah stuck her big toe in the bathtub.
Then mom turned on the hot water. Why?
Follow-up: Was the water too cold for Sarah?
4. Does Sarah have everything for her bath?
a. What does Sarah need?
5. Where did mom look for Sarah's little rubber duckie?
a. Did mom say, "No, you can't have your little rubber duckie."?
6. Sarah got tired of waiting in the bathroom. Where did Sarah go?
7. Did Sarah put on her dirty clothes?
8. Did Sarah want to take a bath?
a. Why/why not?
Total: /10
Appendix F
Comprehension Questions: Form B
1. What does Sarah's mother say?
a. Is Sarah clean?
b. Does mom want Sarah to take a bath?
2. Who stuck her big toe into the bathtub?
3. Mom filled the bathtub with water.
Sarah stuck her big toe in the bathtub.
Then mom turned on the hot water. Why?
Follow-up: Was the water just right for Sarah?
4. Does Sarah want her big bar of soap?
5. Where did mom look for Sarah's little rubber duckie?
a. Did mom look and look for Sarah's little rubber duckie?
6. Sarah got tired of waiting in the bathroom. Where did Sarah go?
7. Did Sarah put on clean clothes?
8. Did Sarah want to play outside?
a. Why/why not?
Total: /9
Appendix G
Expectancy Violation Detection Task Version of "Splish Splash"
One day a little girl named Sarah made twenty very messy mud-pies in the back yard.
Sarah's mother took one look at her, and said, "Splish, splash, Sarah needs a bath.
Mommy says you're dirty and she can't have that."
So the mother took Sarah upstairs to the bathroom. Sarah squeezed (took off) all her dirty
clothes and her mother wrapped her in a big towel. Then the mother filled the bathtub with
water, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she
can't have that." [within story substitution]
The mother (Sarah) stuck her big toe into the bathtub. But then she said, "Oh Mommy,
the water's too cold. The water must be nice and warm." [within story substitution]
So the mother reached for the tap and turned on the cold (hot) water until it was nice and
warm. Then Sarah stuck her foot into the bathtub and said, "I like the nice warm water.
But Mommy, I just can't have a bath. I must have my red washcloth." [goal disruption]
So the mother opened the bathroom cupboard and pulled out all of the washcloths until
she found Sarah's red washcloth. Then she dropped it into the bathtub, and said, "Oow
pooey. Let's make a mess: Mud-pies and french fries and stuff like that." ("Splish, splash,
Sarah needs a bath. Mommy says you're dirty and she can't have that.") [goal disruption
and prosodic alteration]
Sarah squeezed all the water out of the washcloth. But then she said, "Oh Mommy, I just
can't have my messy mud-pies (a bath). I must have my big bar of soap." [within story
substitution]
So the mother dug through the soap drawer until she found a big bar of soap. Then she
rubbed the soap on the red washcloth, and said, "Splish, splash, Sarah needs a bath.
Mommy says you're dirty and she can't have that." Sarah took the washcloth. But then
she said, "Oh Mommy, I just can't have a bath. I must have my dirty clothes (little rubber
duckie)." [within story substitution]
So the mother looked all over the bathroom, but she couldn't find Sarah's little rubber
duckie. Then she looked all over Sarah's bedroom, but she still couldn't find Sarah's
rubber duckie. Then she looked all over the house, but she still couldn't find the rubber
duckie.
Sarah waited and waited for her mother to come back. But Sarah got tired of sitting in the
bathroom. So she dried her foot, put on her red washcloth (dressed in all her dirty clothes
again), and jumped in the tub to take a bath (ran outside to make more mud-pies). [within
story substitution / goal disruption]
Appendix H
Joint Story Retell Scoring Key
Target Response                           Acceptable Variations
1) Splish splash, Sarah needs a bath.     Splish splash, Sarah needs a bath.
   Mommy says you're dirty and            Take a bath
   she can't have that.                   You need a bath.
                                          You're dirty.
2) Sarah                                  none
3) turned on the hot water                in hot water
                                          made the water hot
                                          made it hot
4) big bar of soap                        (bar of) soap
                                          big soap
5) rubber duckie                          rubber duck
                                          duck(ie)
6) house                                  home
                                          place
7) dirty clothes                          grubby clothes
8) outside                                out the door
                                          out in the backyard
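The scoring key above can be applied mechanically, as the following minimal sketch suggests. It is an illustration only: the function names are hypothetical, and exact matching after simple normalization is an assumption that is stricter than a trained scorer's judgment would be (for example, "little rubber duckie" would not be credited here unless it were added to the key).

```python
import re

# Target responses and acceptable variations transcribed from the scoring key.
# "(bar of) soap" and "duck(ie)" are expanded into their two readings.
SCORING_KEY = {
    1: ["splish splash sarah needs a bath mommy says you're dirty and she can't have that",
        "splish splash sarah needs a bath", "take a bath", "you need a bath", "you're dirty"],
    2: ["sarah"],
    3: ["turned on the hot water", "in hot water", "made the water hot", "made it hot"],
    4: ["big bar of soap", "bar of soap", "soap", "big soap"],
    5: ["rubber duckie", "rubber duck", "duck", "duckie"],
    6: ["house", "home", "place"],
    7: ["dirty clothes", "grubby clothes"],
    8: ["outside", "out the door", "out in the backyard"],
}

def normalize(text):
    """Lowercase, replace everything except letters and apostrophes with spaces,
    and collapse runs of whitespace."""
    text = re.sub(r"[^a-z' ]", " ", text.lower())
    return " ".join(text.split())

def score_jsr(responses):
    """Count the cloze items (out of 8) whose response matches the key.

    `responses` maps an item number (1-8) to the child's transcribed response;
    missing items are scored as incorrect.
    """
    total = 0
    for item, accepted in SCORING_KEY.items():
        if normalize(responses.get(item, "")) in accepted:
            total += 1
    return total

# Hypothetical transcript: items 2 and 6 answered, item 2 incorrectly
print(score_jsr({2: "Mommy", 6: "house"}))
```

Keeping the key as plain data makes it straightforward to add further acceptable variations without touching the scoring logic.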
Appendix I
Wh-Question Scoring Key
Question                                             Acceptable Responses
1) What does Sarah's mother say?                     Splish splash, Sarah needs a bath
                                                     Take a bath
                                                     You need a bath
                                                     You're dirty
2) Who stuck her big toe into the bathtub?           Sarah
                                                     The little girl
3) Mom filled the bathtub with water.
   Sarah stuck her big toe into the bathtub.
   Then mom turned on the hot water. Why?            To make it nice and warm
                                                     Sarah wants warm water
                                                     The water was too cold
4) What does Sarah need?                             Red washcloth, Big bar of soap or
                                                     Rubber duckie
5) Where did Mom look for Sarah's rubber duckie?     Bathroom, Bedroom or All over the
                                                     house
6) Where did Sarah go?                               Outside
                                                     In the backyard
                                                     Out the door
7) Did Sarah want to take a bath? Why not?           She wanted to play outside
                                                     She wanted to make mud-pies
                                                     She was tired of waiting
                                                     She didn't have her soap/
                                                     rubber duckie/washcloth
                                                     The water was too cold
8) Did Sarah want to play outside? Why?              She wanted to make mud-pies
                                                     She didn't want to take a bath
Appendix J
The University of Western Ontario
Vice-President (Research)
Ethics Review Board, Dental Sciences Building
REVIEW BOARD FOR HEALTH SCIENCES RESEARCH INVOLVING HUMAN SUBJECTS
1998-99 CERTIFICATION OF APPROVAL OF HUMAN RESEARCH
ALL HEALTH SCIENCES RESEARCH INVOLVING HUMAN SUBJECTS AT THE UNIVERSITY OF WESTERN ONTARIO IS
CARRIED OUT IN COMPLIANCE WITH THE MEDICAL RESEARCH COUNCIL OF CANADA "GUIDELINES ON
RESEARCH INVOLVING HUMAN SUBJECTS."
1998-99 REVIEW BOARD MEMBERSHIP
[List of 16 board members and their representative roles; names not legible in the scanned certificate.]
Alternates are appointed for each member.
THE REVIEW BOARD HAS EXAMINED THE RESEARCH PROJECT ENTITLED
"The joint story retell as a measure of young children's comprehension of familiar stories"
REVIEW NO: W21
AS SUBMITTED BY
Dr. E. Skarakis-Doyle, Communication Sciences and Disorders, Elborn College
AND CONSIDERS IT TO BE ACCEPTABLE ON ETHICAL GROUNDS FOR RESEARCH INVOLVING HUMAN SUBJECTS
UNDER CONDITIONS OF THE UNIVERSITY'S POLICY ON RESEARCH INVOLVING HUMAN SUBJECTS
APPROVAL DATE: 03 November 1998 (Letter of Information)
London, Ontario, Canada N6A 5C1
Telephone (519) 661-3036
Fax (519) 661-3875
References
Anastasi, A. (1988). Validity: Basic concepts. In Psychological testing (6th ed., pp. 139-164). New York: Macmillan Publishing Co.
Applebee, A. N. (1978). The child's concept of story. Chicago: University of Chicago
Press.
Bates, E. (1993). Comprehension and production in early language development.
Monographs of the Society for Research in Child Development, 58(3-4), 222-242.
Carlisle, J. F. (1991). Planning an assessment of listening and reading comprehension.
Topics in Language Disorders, 12(1), 17-31.
Chapman, R. (1978). Comprehension strategies in children. In J. F. Kavanaugh & W.
Strange (Eds.), Language and speech in the laboratory, school, and clinic (pp. 309-327). Cambridge, MA: MIT Press.
Dale, P. (1996). MacArthur Communicative Development Inventory-III (CDI-III).
Seattle, WA.
Dempsey, L., Perfetti, T., & Skarakis-Doyle, E. (1999). Early literacy questionnaire.
Unpublished manuscript. University of Western Ontario, London, Ontario.
Diehl-Faxon, J., & Dockstader-Anderson, K. (1985). Discourse intonation patterns of
mothers reading to their young children... readerese. National Reading Conference
Yearbook, 34, 300-305.
Dietrich, T., Freeman, C., & Griffin, P. (1979). Assessing comprehension in a school
setting. In R. W. Shuy (Series Ed.) & P. Griffin (Vol. Ed.), Papers in applied
linguistics: Linguistics and reading series: Vol. 3. Arlington, TX: Center for
Applied Linguistics.
Feagans, L., & Farran, D. C. (1981). How demonstrated comprehension can get muddled
in production. Developmental Psychology, 17(6), 718-727.
Glazer, J. I. (1991). Literature for young children (3rd ed.). New York: Macmillan
Publishing Co.
Heath, S. B. (1986). What no bedtime story means: Narrative skills at home and school. In
B. B. Schieffelin & E. Ochs (Eds.), Language socialization across cultures:
Studies in the social and cultural foundations of language, No. 3 (pp. 97-124).
New York: Cambridge University Press.
Hudson, J., & Nelson, K. (1983). Effects of script structure on children's story recall.
Developmental Psychology, 19(4), 625-635.
Lynch, P. (1986). Using big books and predictable books. Toronto, ON: Scholastic-TAB
Publications Ltd.
McCabe, A. (1996). Evaluating narrative discourse skills. In K. Cole, P. Dale, & D. Thal
(Eds.), Assessment of communication and language (pp. 121-142). Baltimore: Paul
H. Brookes Publishing Co.
McCartney, K. A., & Nelson, K. (1981). Children's use of scripts in story recall.
Discourse Processes, 4, 59-70.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation
tests for preschool children. Journal of Speech and Hearing Disorders, 49, 34-42.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from
persons' responses and performances as scientific inquiry into score meaning. In
A. E. Kazdin (Ed.), Methodological issues & strategies in clinical research (2nd
ed., pp. 241-260). Washington, DC: American Psychological Association.
Miller, J. F., & Paul, R. (1995). The clinical assessment of language comprehension.
Baltimore: Paul H. Brookes Publishing Co.
Nelson, K. (1978). How children represent knowledge of their world in and out of
language: A preliminary report. In R. S. Siegler (Ed.), Children's thinking: What
develops? (pp. 255-273). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pace, A. J., & Feagans, L. (1984). Knowledge and language: Children's ability to use and
communicate what they know about everyday experiences. In L. Feagans, C.
Garvey, & R. Golinkoff (Eds.), The origins and growth of communication (pp. 268-280). NJ: Norwood.
Rees, N. S., & Shulman, M. (1978). I don't understand what you mean by comprehension.
Journal of Speech and Hearing Disorders, 43, 208-219.
Scarborough, H. S., & Dobrich, W. (1994). On the efficacy of reading to preschoolers.
Developmental Review, 14(3), 245-302.
Schneider, W., & Pressley, M. (1997). Memory development between two and twenty
(2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates Inc.
Skarakis-Doyle, E. (1998). Emergence of young children's comprehension monitoring of
stories. Unpublished manuscript.
Skarakis-Doyle, E., & Wootton, S. (1998). Measuring preschool children's discourse
comprehension: A modified retell procedure. Paper presented at the 19th Annual
Symposium on Research in Child Language Disorders, Madison, WI.
Slackman, E., & Nelson, K. (1984). Acquisition of an unfamiliar script in story form by
young children. Child Development, 55, 329-340.
Snow, C. E., Perlmann, R., & Nathan, D. (1987). Why routines are different: Toward a
multiple-factors model of the relation between input and language acquisition. In
K. E. Nelson & A. van Kleeck (Eds.), Children's language: Volume 6 (pp. 65-97). NJ: Lawrence Erlbaum Associates Inc.
Stein, N. (1988). The development of children's storytelling skill. In M. B. Franklin & S.
Barten (Eds.), Child language: A book of readings (pp. 282-297). New York:
Oxford University Press.
Thal, D. (1991). Language and cognition in normal and late-talking toddlers. Topics in
Language Disorders, 11, 33-42.
Thal, D., Tobias, S., & Morrison, D. (1991). Language and gesture in late talkers: A 1-year follow-up. Journal of Speech and Hearing Research, 34, 604-612.
Tyler, L. (1991). The distinction between implicit and explicit language functions:
Evidence from aphasia. In A. D. Milner & M. D. Rugg (Eds.), The
neuropsychology of consciousness (pp. 159-178). New York: Academic Press.
Warr-Leeper, G., Miller, L., Brac, M., Culhane, R., Bernhard, K., & Yearous, J. (1997).
Use of questionnaires to identify children at risk of hearing impairment. Poster
session presented at the annual OSLA Conference, Toronto, Ontario.
Wootton, S., & Skarakis-Doyle, E. (1995). Splish! Splash. Unpublished manuscript.
Zimmerman, I., Steiner, V., & Pond, R. (1979). Preschool Language Scale. Columbus,
OH: Charles E. Merrill.