THE VALIDITY OF THE JOINT STORY RETELL AS A MEASURE OF YOUNG CHILDREN'S COMPREHENSION OF FAMILIAR STORIES

Lynn F. Dempsey
School of Communication Sciences and Disorders

Submitted in partial fulfillment of the requirements for the degree of Master of Science

Faculty of Graduate Studies
The University of Western Ontario
London, Ontario
August, 1999

© Lynn F. Dempsey 1999

National Library of Canada, Acquisitions and Bibliographic Services, 395 Wellington Street, Ottawa ON K1A 0N4, Canada

The author has granted a nonexclusive licence allowing the National Library of Canada to reproduce, loan, distribute or sell copies of this thesis in microform, paper or electronic formats. The author retains ownership of the copyright in this thesis. Neither the thesis nor substantial extracts from it may be printed or otherwise reproduced without the author's permission.

Abstract

This study investigated the construct validity of the Joint Story Retell (JSR), a newly developed measure of young children's oral discourse comprehension adapted from the cloze procedure.
Thirty-eight children between 30 and 50 months of age were presented with the JSR and several additional language comprehension and production measures in order to investigate the developmental sensitivity, concurrent validity, and content relevance of the JSR. Results revealed that older children performed significantly more accurately than younger children on the JSR, indicating that this measure is sensitive to age differences. In addition, results revealed a moderately strong relationship between the JSR and traditional comprehension questions, suggesting that the JSR demonstrates concurrent validity with an accepted measure of discourse comprehension. Finally, results indicated that the JSR minimizes enabling factors such as memory and language production, and thus has content relevance. These findings suggested that the JSR may provide a valid measure of young children's discourse comprehension.

Key words: Joint Story Retell, cloze, language, comprehension, discourse, measurement, validity, children, stories

ACKNOWLEDGEMENTS

I would like to thank my chief advisor, Dr. Elizabeth Skarakis-Doyle, for her assistance and support. Under her guidance my understanding of the research process and my interest in the measurement of early language comprehension developed. I would also like to thank my colleague, Tania Perfetti, for her assistance during the experimental testing phase of the investigation. Thank you to the members of my advisory committee and examining board for their thoughtful comments on this research project. Thanks also to Dr. Philip Doyle for his support during the preparation for my defense.

I would like to extend my sincere appreciation to the parents and children who participated in this investigation. I am also grateful to the individuals and community facilities who helped me to locate these families.

This research was supported by the Harmonize for Speech Fund, Ontario District.

TABLE OF CONTENTS

CERTIFICATE OF EXAMINATION
ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS
LIST OF TABLES
LIST OF APPENDICES
INTRODUCTION
METHOD
    Participants
    Materials and Experimental Test Stimuli
    Procedure
    Data Analysis
DISCUSSION
APPENDICES
REFERENCES
VITA

LIST OF TABLES

Table  Description
1  Mean Ages, and Mean Raw Scores and Standard Deviations for Two Groups of Children on Pre-Experiment Tests
2  Correlation Coefficients for Age, JSR, and Comprehension Questions (n = 38)
3  Mean Number of Accurate Responses and Standard Deviations on the JSR and Comprehension Questions for Two Age Groups
4  Correlation Coefficients for Age, JSR and Comprehension Questions for Children Between 42 and 50 months (n = 23)
5  Correlation Coefficients for Age, JSR, Comprehension Questions and EVDT (n = 24)

LIST OF APPENDICES

Appendix A  Vocabulary pre-test
Appendix B  Early Literacy Questionnaire
Appendix C  Original Version of "Splish Splash"
Appendix D  Cloze (Joint Story Retell) Version of "Splish Splash"
Appendix E  Comprehension Questions: Form A
Appendix F  Comprehension Questions: Form B
Appendix G  Expectancy Violation Detection Task Version of "Splish Splash"
Appendix H  Joint Story Retell Scoring Key
Appendix I  Wh-Question Scoring Key
Appendix J  Certification of Approval of Human Research

Introduction

The nature of childhood language impairment and its long-term prognosis differ when both expressive and receptive abilities rather than solely expressive abilities are impaired (Thal, 1991; Thal, Tobias, & Morrison, 1991). A study of late talkers by Thal et al.
(1991) suggested that young children who have both expressive and receptive language delays have a much poorer chance of catching up with their age-matched peers than late talkers with normally developing comprehension. As Bates (1993) states, "what a child knows is ultimately a better predictor of language ability than what a child does" (p. 233) when he or she is very young. Given that impaired comprehension is associated with a different long-term prognosis than expressive-only impairments, reliable and valid measurement of young children's comprehension is essential. Early differential identification of the language impairment most certainly depends on it. Unfortunately, the development of comprehensive measurement of language comprehension in young children has been limited by the influence of syntactic-based theories of language acquisition, the emphasis on decontextualized linguistic comprehension, and by the inherent difficulties in reliably assessing young children (Bates, 1993).

Language comprehension is a broad construct that encompasses multiple levels, including: the lexical or word level, the syntactic or phrase and clause levels, and the discourse or text level. Individuals build representations based on information from all these levels in order to acquire the full range of meaning from language (Miller & Paul, 1995). Although all levels of comprehension are necessary for complete understanding of language, most efforts at measuring young children's comprehension have focused on the lexical and syntactic levels. This is the case even though toddlers and preschoolers are regularly exposed to extended units of language rather than decontextualized words and sentences (Snow, Perlmann, & Nathan, 1987). Although the young language-learning child is frequently exposed to expanses of language, measurement of young children's comprehension of discourse has been relatively neglected. As a result, only a narrow aspect of a broad construct is typically evaluated.
This has implications for the validity of comprehension measurement.

Validity is not a property of a measurement instrument independent of its function. Rather, validity lies in the meaning and interpretation of the test scores (Anastasi, 1988; McCauley & Swisher, 1984; Messick, 1995). Thus, an instrument that is valid for the measurement of one attribute may be less valid for the measurement of another. This means, in the case of comprehension, that a valid test of literal level comprehension is not necessarily a valid instrument for the measurement of discourse level comprehension or for the measurement of language comprehension as a whole. The risk associated with the paucity of measures that tap comprehension at the discourse level is that performance on literal level comprehension measures may be used to draw inferences about discourse level abilities. Yet, regardless of the validity with which a task measures a literal level comprehension ability, when inferences about the broader construct are made on the basis of such a task, those inferences will not necessarily be valid.

The logic of this argument is clear when the specific abilities required to comprehend language at each level are considered. At the lexical and syntactic levels, decontextualized comprehension involves identifying the referents of single words and decoding meaning relations within a sentence (Miller & Paul, 1995). At the discourse level, comprehension, in addition to requiring word and sentence understanding, involves making judgments based on social, textual, scriptal and other forms of prior knowledge to determine what an utterance means, in relation to what else has gone on in the discourse (Miller & Paul, 1995; Rees & Shulman, 1978).
Since discourse comprehension requires more than simply the ability to understand decontextualized words and sentences, performance on tasks that measure only word and sentence understanding cannot be taken as indicative of the ability to comprehend longer segments of language (McCabe, 1996). However, the risk that invalid interpretations about comprehension will be made on the basis of literal comprehension measures will remain until an acceptable measure of young children's discourse comprehension is developed. Such a measure, in conjunction with measures of lexical and syntactic level comprehension, might permit more comprehensive measurement of language comprehension than is currently possible.

As has been discussed, the risk of drawing invalid conclusions about comprehension might be lessened if a measure designed to tap discourse comprehension were available. For such a measure to contribute meaningfully to comprehension measurement as a whole, it would have to be valid in its own right. There are several issues associated with the development of valid measurement tasks. One of the most critical for discourse comprehension would be ensuring that tasks include representative and relevant content. According to Messick (1995), if a test is to be valid, it must be carefully designed to include content that is representative of the domain of interest. The test must also be designed to include only content that is relevant to the construct domain. If the test is too broad, containing excess variance associated with other distinct constructs, interpretation of test scores may be confounded (Messick, 1995).

Validity is built into a test from the outset through the choice of representative content (Anastasi, 1988; Messick, 1995). Therefore, for a discourse comprehension measure to be valid, it should include content that is representative of discourse, the domain of interest. Discourse is present in a number of different genres, including conversations and stories.
It is an extraordinarily contextualized, profoundly social form of language (McCabe, 1996). A representative measure of discourse should reflect this. Unfortunately, the need to accurately reflect the contextual and social nature of discourse can conflict with the need for rigorous, controlled measurement of this construct. Conversation, while reflecting the social nature of contextualized discourse, is not easily controlled since its content is often variable and difficult to predict. Story-reading, on the other hand, reflects the social nature of contextualized discourse and at the same time is a form of discourse whose properties may be controlled to allow for consistent, reliable measurement across children and over time. It is not suggested that children's ability to understand stories would necessarily be indicative of their ability to understand conversation, but rather that story, since it can be controlled, may be the most viable form of discourse to measure.

Messick (1995) states that one way of ensuring that content is representative of the construct is to select it in terms of its functional importance or ecological validity for the population of interest. According to Messick (1995), functional importance can be considered in terms of what individuals actually do in the performance domain. Evidence about the frequency with which young children engage in story-reading supports the notion that it is an activity that has functional importance in the domain of discourse comprehension. According to Carlisle (1991), most young, middle-class children have quite extensive experience with narratives encountered in story-reading with adults. Interview or questionnaire responses by parents of two- to five-year-old British and American children suggest that the typical mainstream preschooler is probably read to on a regular basis: 43 to 75% of children have reportedly been read to daily or more often (Scarborough & Dobrich, 1994).
Heath (1986) notes that in societies where literacy is valued, children give attention to books and information derived from books and begin to acknowledge questions about books from as early as six months of age. According to Heath (1986), by the time they are preschool age, most children accept book and book-related activities as entertainment.

The types of stories young children are exposed to tend to be predictable in nature. That is, they are often based on familiar routines for which children have developed expectations about how events will unfold (e.g., bedtime, bathtime). These stories also tend to incorporate rhythm, rhyme, and repetitive words, phrases and episodes into their texts (Glazer, 1991; Lynch, 1986). Perhaps the reason for the widespread use of predictable stories with young children is that features such as familiarity, rhythm, rhyme and repetition are naturally engaging to young children. In fact, it has been suggested that such features may facilitate children's attention to and understanding of the material (Diehl-Faxon & Dockstader-Anderson, 1985). Some researchers have argued that exposure to predictable language may even facilitate language learning (Snow et al., 1987).

In summary, stories, and predictable stories in particular, may be meaningful content for a measure of discourse comprehension. Story-reading is a highly social, contextualized activity and therefore representative of discourse. At the same time its properties are controllable, which is important for reliable measurement. Story-reading is also a form of discourse that has functional importance for young children. In addition, since story-reading is typically a familiar and enjoyable activity for young children, the compliance problems often associated with comprehension measurement may potentially be minimized, making more reliable measurement possible.
All of these characteristics of story reading suggest that its use in a measure of discourse comprehension may contribute toward the validity of that measure.

The preceding discussion addresses the issue of content representativeness and functional importance. Another aspect of validity is content relevance. Messick (1991) states that if a task measures only the construct it is designed to measure, it demonstrates content relevance. If, on the other hand, a task is too broad, containing excess variance associated with other distinct constructs, it demonstrates construct-irrelevance variance. This excess variance may confound measurement of the construct, making interpretation of task performance more difficult.

One group of factors whose presence in a measurement task can lead to content irrelevance variance are enabling factors. Enabling factors are aspects of a task that are extraneous to the construct, but are required for adequate task performance (McCauley & Swisher, 1984). An example of an enabling factor is the receptive language skill required by many tests of expressive language. An individual taking an expressive language test needs to understand complex oral instructions in order to respond appropriately to test items (McCauley & Swisher, 1984). In this example, the intrusion of receptive language skills might affect performance on the test in a manner irrelevant to the construct and therefore make the test score susceptible to an invalid interpretation.

Because enabling factors may compromise valid interpretation of test scores, measurement tasks should be examined carefully for their presence. Such an examination of the few existing tests of discourse comprehension reveals the presence of two enabling factors for test performance that typically intrude into measurement of this construct.
Both the story retelling task, which requires the participant to listen to a story and then retell it, and the comprehension question task, which requires the participant to respond to questions posed about a stimulus, include memory and language production demands that may confound interpretation of test scores.

Memory is an enabling factor for both the story retelling and comprehension question tasks. In both of these tasks, there is a temporal gap between the presentation of the stimulus and the response of the listener. This means that a representation of the entire stimulus, including the story structure as well as the lexical and grammatical elements, must be maintained in working memory until a response is required (Tyler, 1991). The memory demands of these traditional tasks may obscure the comprehension abilities of young children, since it is generally recognized that young children have more limited memory skills than older children and adults (Schneider & Pressley, 1997).

Given this difficulty, it seems that a discourse comprehension measure which limits memory demands should be developed. However, this is not a straightforward issue. Messick (1995) states that determining what constitutes construct-irrelevance variance is often a difficult and contentious task. This is certainly true in the case of comprehension and memory. Memory is not a construct that is clearly distinct from comprehension. Rather, memory and comprehension are closely interrelated. As a result of the interconnection between memory and comprehension, it may be impossible to completely eliminate the memory construct from the measurement of discourse comprehension. (It may also be undesirable since, to some degree, memory is a component of comprehension, and vice versa, and a valid measurement should reflect all aspects of the construct.)
However, it may be possible to impose some limited control over the enabling factor of memory, thereby preventing a heavy memory burden from obscuring young children's comprehension abilities.

One way of lessening the memory demands typically imposed by tests of comprehension may be to reduce the temporal gap between the presentation of the stimulus and the response of the listener. If a close temporal relationship existed between the stimulus and the response, the listener would not have to hold the entire stimulus in working memory until the response was required. As a result, less demand might be placed on the listener's memory abilities. According to Tyler (1991), a task may establish a close temporal relationship between speech stimulus and response, "either by requiring the listener to produce a fast reaction to the response immediately after the critical part of the stimulus has been heard or by stopping the input at a specific point and requiring the listener to make a response on the basis of partial information" (pp. 162-163).

In addition to designing the task to tap discourse comprehension while the input is being presented, the content of the stimulus itself might also be structured to help limit the memory required to perform the task. As was previously discussed, the kinds of stories young children are frequently exposed to in their everyday lives contain factors such as rhythm, rhyme, repetition, and familiar or script-based plots. These features seem to enable children to attend to and recall the stories they hear. The mechanism by which such features aid children's recall has been widely investigated. A number of researchers have found that young children have ordered event representations or scripts for familiar events and that they use these representations to guide comprehension and recall of story narratives which are based on these familiar events (Hudson & Nelson, 1983; McCartney & Nelson, 1981; Nelson, 1978; Pace & Feagans, 1984).
Scripts have also been shown to be an important memory organizer.

The impact of structured story content on young children's performance on a comprehension task was revealed in Skarakis-Doyle's (1998) investigation of young children's comprehension monitoring skills. Skarakis-Doyle (1998) found that structuring the content of the discourse appeared to facilitate young children's ability to monitor their comprehension of familiar stories. The stories used in the investigation were based on familiar events in order to capitalize on the system of mental representation of world knowledge (i.e., scripts) employed by young children. Specifically, stories based on familiar events were used so that the input was structured and children had expectations about the material. The results indicated that children as young as 30 months were able to detect violations to story content when predictable, script-based material was employed.

In summary, the enabling factor of memory may be limited both by designing a task which taps comprehension while the stimulus is being presented and by utilizing structured stories. A task that diminishes the memory confound in this way may allow young children's comprehension abilities to be measured more directly, and hence, permit a more valid interpretation of their performance.

As has been discussed, memory is one enabling factor for performance on the traditional measures of discourse comprehension. A second enabling factor for performance on these measures is language production ability. Story retelling, since it requires individuals to retell an entire stimulus story, demands a substantial degree of verbal proficiency. This necessity for verbal proficiency may obscure young children's comprehension abilities since their language production skills are not as well-developed as those of older children and adults.
Research on children's narrative production abilities has shown that preschool children have difficulty structuring what they know into story format (Applebee, 1978; Carlisle, 1991; Feagans & Farran, 1981; McCabe, 1996). When young children are asked to tell a story, whether spontaneously, or as part of a retell task, they tend to produce short, scattered accounts. In fact, retellings of fictional stories are said not to be logically and coherently produced until fifth grade (Stein, 1988). As opposed to the story retelling task, the comprehension question task is not as demanding of language production ability. However, the ability to answer content questions (i.e., wh-questions) still requires substantial formulation and speech production abilities (McCabe, 1996).

In summary, the language production demands imposed by the story retelling and comprehension question tasks may hinder young children's ability to engage in those tasks and confound interpretation of their performance, hence compromising the validity of the measurement (McCabe, 1996). The possibility of controlling the enabling factor of language production needs to be examined since such control may allow young children's comprehension abilities to be measured more directly than has been possible.

Skarakis-Doyle and Wootton (1998) attempted to create a discourse comprehension measure that would control both memory and language production demands. They developed the Joint Story Retell (JSR), a discourse comprehension measure that was adapted from a widely used procedure for measuring reading comprehension called the cloze test. The cloze test consists of a story from which words have been deleted in a systematic fashion (Dietrich, Freeman, & Griffin, 1979). During the cloze test, the reader supplies the word or words that have been deleted from the passage.
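As a rough illustration of systematic deletion in a cloze passage, the following Python sketch blanks every nth word of a text and counts how many of the deleted words a respondent recovers. The function names, the every-nth-word rule, the exact-match scoring, and the sample sentence are assumptions made for illustration only; they are not the deletion scheme or scoring key of the JSR, nor the procedure specified by Dietrich et al. (1979).

```python
def make_cloze(text, n=5, blank="____"):
    """Replace every n-th word of text with a blank.

    Returns the gapped passage and the deleted words, in order.
    The fixed every-n-th-word rule is an illustrative assumption.
    """
    words = text.split()
    deleted = []
    for i in range(n - 1, len(words), n):
        deleted.append(words[i])
        words[i] = blank
    return " ".join(words), deleted


def score_cloze(deleted, responses):
    """Count responses that match the deleted word, ignoring case
    and surrounding punctuation (hypothetical exact-match scoring)."""
    return sum(
        1
        for target, given in zip(deleted, responses)
        if target.strip(".,!?").lower() == given.strip(".,!?").lower()
    )


# Hypothetical passage, not taken from the thesis stimuli.
passage = "The little duck jumped into the warm bath and splashed the water"
gapped, answers = make_cloze(passage, n=4)
print(gapped)
print(answers)
print(score_cloze(answers, ["jumped", "cold", "water"]))
```

In this sketch a respondent who understood the passage would be expected to recover more of the deleted words than one who did not, which is the premise on which cloze scoring rests.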
When a word is deleted from a text, the set of words that can go into the blank is constrained by the context (i.e., the remaining undeleted words in the passage) (Dietrich et al., 1979). The cloze test is based on the premise that readers who have understood a passage well will have a better chance of recovering the deleted word than readers who have not (Dietrich et al., 1979). Thus, the rationale for the use of the cloze procedure as a measure of discourse comprehension is that if readers understand the structure and content of a text, they should be able to utilize the redundancy in that text to retrieve deleted words at a better than chance level. Essentially, the intent of the cloze test is to "lead a child to use the context surrounding a blank to retrieve the word that was deleted from the blank, and thus, to demonstrate his [or her] comprehension of that surrounding context" (Dietrich et al., 1979, p. 5).

An examination of the features of the JSR reveals that the task may indeed control the enabling factors of memory and language production. First, relative to traditional measures of discourse comprehension (i.e., story retelling, comprehension questions), the JSR may diminish the enabling factor of memory. The essential difference between the JSR and the traditional discourse comprehension measures is that with the former, the listener does not have to maintain a representation of the entire story structure in working memory while his or her comprehension is being measured. Rather, parts of the stimulus story are retold by the investigator, providing the listener with a scaffold into which memory for specific story details can be placed. Thus, it seems that the JSR may diminish the substantial memory demands which typically confound young children's comprehension abilities. Second, the JSR may control the enabling factor of language production. Since children are required to produce only specific and limited elements,
rather than sentences or entire stories, minimal demands are placed on their production abilities. Thus, the response requirements of the JSR may enable young children to demonstrate their story comprehension with limited interference from language production.

There is empirical support for this possibility that the JSR controls both memory and language production demands. In a preliminary investigation of the JSR, Skarakis-Doyle and Wootton (1998) analyzed the relationships between the JSR and two other tests in order to determine whether the JSR controlled memory and language production demands. First, they examined the relationship between performance on the JSR and performance on the Expectancy Violation Detection Task (EVDT), a comprehension measure which limits memory demands by requiring children to detect alterations to critical story components (e.g., actors, actions, objects) as they occur during story-retelling. Results of the investigation revealed a strong correlation between performance on the JSR and accuracy on the EVDT, suggesting that the JSR, like the EVDT, may diminish the confounding factor of memory ability. Second, they examined the relationship of Mean Length of Utterance (MLU) to performance on the JSR. Results of the investigation indicated that although MLU was moderately correlated with the JSR, it did not contribute meaningfully to the prediction of performance on the task. The investigators concluded that performance on the JSR is not dependent on age-appropriate sentence production skill.

The results of the investigation by Skarakis-Doyle and Wootton (1998) provide some preliminary empirical evidence for the validity of the JSR as a measure of early discourse comprehension. Specifically, Skarakis-Doyle and Wootton's (1998) findings suggest that the JSR may reduce the enabling factors typically associated with tests of discourse comprehension and, thus, may demonstrate content relevance.
Evidence of content relevance is particularly important for measures of discourse comprehension given the role that enabling factors play in the measurement of this construct. Nevertheless, other types of evidence are also relevant to a full evaluation of the validity of discourse measures.

Evidence of expected performance differences over time is a particularly illuminating form of validity evidence (Messick, 1995). Anastasi (1988) has termed this form of validity evidence age differentiation. Evidence of age differentiation or developmental sensitivity is obtained through a comparison of test scores with chronological age. For any ability which is expected to increase with age during childhood, test scores are also expected to show such an increase, if the test is valid (Anastasi, 1988). Since language comprehension is an ability that improves with age, a valid measurement of comprehension would be expected to show a similar improvement in performance with age. As yet, the degree to which JSR performance improves with age has not been firmly established, since, in the preliminary investigation (Skarakis-Doyle & Wootton, 1998), the measure was examined across a limited age range of children (i.e., 46 to 58 month olds). Therefore, empirical evidence of the developmental sensitivity of the JSR is still needed.

In addition to evidence of age differentiation, evidence of an expected relationship between two tests may support the validity of a test under investigation. According to Anastasi (1988), correlations between a new test and similar earlier tests may be cited as evidence that the new test measures the same general area of behaviour as the other tests. That is, these correlations may be cited as evidence that a measure demonstrates concurrent validity.
Correlations between the JSR and a currently accepted measure of discourse comprehension (i.e., story retelling, comprehension questions) could be examined to determine whether the JSR measures the construct it was designed to measure. As yet, however, the JSR has not been tested against either of the traditionally accepted measures of discourse comprehension (Skarakis-Doyle & Wootton, 1998). Therefore, the concurrent validity of the JSR has not been determined.

In summary, the results of preliminary investigations (Skarakis-Doyle & Wootton, 1998) suggest that the JSR might limit the role of enabling factors, allowing more direct interpretation of discourse comprehension. However, more rigorous empirical evidence of this is needed. In addition, evidence that the JSR meets other validity criteria, such as age differentiation and concurrent validity, is required. It was the purpose of this investigation to evaluate the validity of the JSR by addressing these outstanding needs. The following specific questions were posed:

1) Is the JSR a developmentally sensitive measure of discourse comprehension? It is hypothesized that performance on the JSR will improve with age such that older children will attain significantly higher scores on the task than younger children.

2) What is the relationship of the JSR to a traditional measure of discourse comprehension (i.e., comprehension questions)? It is hypothesized that performance on the JSR will be moderately correlated with accuracy on the traditional comprehension questions, indicating that the two procedures measure the same construct, discourse comprehension, but potentially differ in memory and language production demands.

3) What is the relationship of the JSR to a measure of discourse comprehension that has diminished memory demands (i.e., the EVDT)?
If the JSR, like the EVDT, has diminished memory demands relative to traditional measures of discourse comprehension, the JSR should be more strongly correlated with the EVDT than with the comprehension questions.

4) What is the relationship of language production ability to performance on the JSR? If the JSR minimizes production demands as it is purported to do, then language production should contribute little to performance on the task.

Method

Participants

Forty children between the ages of 30 and 50 months were evaluated for eligibility to be included in this investigation. Thirty-eight of these children (24 females and 14 males) met the criteria for inclusion. These children demonstrated both normally developing language abilities on the pre-experiment language tests and understanding of the primary experimental task, the JSR, as indicated by at least one self-initiated accurate response during the practice session administered prior to the task. Of the two children who did not meet the inclusionary criteria, one did not perform adequately on the pre-experiment language tests; the other did not demonstrate understanding of the JSR, as indicated by a failure to complete any of the practice items provided. Only the 38 children who met the inclusion criteria are included in the detailed description of the participants which follows.

Specific information on mean ages and performance on pre-experimental testing for the children who met the inclusion criteria is shown in Table 1. In order that performance of children of different ages might be compared, and the developmental sensitivity of the JSR evaluated, the children were divided into two groups, each of which covered an age span of approximately 10 months. The younger, 30 to 40 month old group consisted of 15 children (11 females and 4 males), and the older, 41 to 50 month old group consisted of 23 children (13 females and 10 males).
As shown in Table 1, the mean age of children in the younger group was 36.13 months (SD = 3.09), while the mean age of the children in the older group was 45.96 months (SD = 2.38). All of the children came from homes where English was reported by a parent to be the primary language. None of the participants had any obvious cognitive or other uncorrected sensory (including hearing) or motor impairment, as reported by parents and as described on the Checklist for Hearing Impairment (CHI) (Warr-Leeper et al., 1997). The CHI, a checklist pertaining to a child's developmental and hearing history, consisted of yes/no questions and rating scales.

Table 1
Mean Ages, and Mean Raw Scores and Standard Deviations, for Two Groups of Children on Pre-Experiment Tests

Age Group          n     Age^a     PLS-3^b     Vocab. test^c
Younger group
  M
  SD
Older group
  M
  SD
Total group
  M
  SD

Note. Age ranges in parentheses.
^a Age reported in months. ^b PLS-3 maximum raw score = 48. ^c Vocabulary pre-test maximum raw score = 16.

The children's receptive language abilities were normally developing as indicated by scores within 1.5 standard deviations (SDs) of the mean for their age on the Auditory Comprehension (AC) subscale of the Preschool Language Scale-3 (PLS-3) (Zimmerman, Steiner, & Pond, 1979). Although the AC subscale of the PLS-3 includes items that tap a variety of aspects of language understanding, particular emphasis is placed on conceptual understanding (i.e., understanding of temporal, quantity and spatial concepts). All children also possessed normally developing language as indicated by a score above the 10th percentile on the MacArthur Communication Development Inventory III (CDI-III) (Dale, 1996). The CDI-III, a global measure of language development, is a parent checklist consisting of items that tap expressive vocabulary, sentence structure, and comprehension.
A vocabulary pre-test constructed by the investigator (Appendix A) documented each child's understanding of the words used in the story. The pre-test consisted of 16 words from the story, including eight nouns, five verbs, two adjectives, and one locative. The test was a word-picture matching task in which the child was required to select one of four pictures based on a word orally presented by the investigator (e.g., "Show me rubber duckie"). All children obtained scores of 60% or greater on the vocabulary pre-test. Children's scores on the vocabulary pre-test, as well as their scores on the AC subscale of the PLS-3 and on the CDI-III, are shown in Table 1.

Parent responses to questions on an early literacy questionnaire that was constructed by the investigator (Dempsey, Perfetti, & Skarakis-Doyle, 1999) (Appendix B) characterized the participants and their early literacy experience. According to their parents, all of the children enjoyed listening to stories. Ninety-five percent of the children were read stories more than three times a week. Approximately half of all parents described their reading styles as including both comments that extend the stories and specific requests for story information directed to their children. The other parents reported using variations of these basic interactive exchanges. Only one parent reportedly did not engage in these types of exchanges when reading to his or her child.

CDI-III forms for four of the children (two from each age group) were not completed. Thus, average performance on the CDI-III for the younger group was based on 13 children, whereas average performance for the older group was based on 21 children. The mean raw score of children in the younger group was 85.85 (SD = 23.70) out of 124 possible items, whereas the mean raw score of the older group was 99.52 (SD = 18.60). The mean raw score of all the children for whom a CDI-III form was completed (n = 34) was 94.29 (SD = 21.43).
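The total-group CDI-III mean reported above is the average of the two group means weighted by the number of completed forms. A minimal Python sketch of that arithmetic follows, using only the group sizes and means given in the text; the function and variable names are illustrative and not part of the study.

```python
# Number of completed CDI-III forms and mean raw score per group,
# as reported in the text.
cdi_groups = {
    "younger": (13, 85.85),
    "older": (21, 99.52),
}


def combined_mean(groups):
    """Return the n-weighted mean across groups of (n, mean) pairs."""
    total_n = sum(n for n, _ in groups.values())
    weighted_sum = sum(n * mean for n, mean in groups.values())
    return weighted_sum / total_n


overall = combined_mean(cdi_groups)
print(round(overall, 2))
```

The result rounds to 94.29, which agrees with the mean reported for the 34 children with completed forms.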
Materials and Experimental Test Stimuli

A predictable children's storybook entitled "Splish Splash" (Wootton & Skarakis-Doyle, 1995) was employed in this investigation (Appendix C). The plot of the story incorporates a familiar bathtime routine. The story is composed of seven episodes that are organized around a central goal. Each episode, in turn, is comprised of a sequence of goal-directed activities. The overall length of the story in total number of words is 398. Vocabulary words, sentence patterns and episodes (i.e., child requesting something for her bath and mother searching for it) are repeated, and a distinct rhythmic or song-like refrain is incorporated into the text of the story at several intervals. Each page of text is accompanied by a corresponding picture. An audio recording of the "Splish Splash" story was employed in the story familiarization phase of the investigation to allow for consistency in story presentation across participants.

The following three test stimuli were constructed: a cloze or JSR story version, two forms of a set of traditional comprehension questions, and an EVDT story version.

In the cloze story version (Appendix D), eight elements (i.e., actions, actors, objects, locations, adjectives) were omitted from the text of the story and substituted with blanks or pauses. Seven of the omitted elements were judged to be critical to the progression of the story toward its goal (i.e., necessary to achieve the goal). One of the elements (i.e., the name of the central character in the story) was judged to be supportive. That is, this element was logically consistent with the story but not necessary to achieve the goal. In order that the eight elements might be deleted and the sense of the story maintained, some of the word order was altered from the original version of the story and the length was condensed to 238 words. Three of the eight cloze items were accompanied by pictures that could have revealed the appropriate response.
The pictures accompanying the other five items did not provide additional cues to the appropriate responses.

The comprehension questions were designed to tap the same content as the cloze version of the story. Both yes/no and wh-questions were included in the set of questions so that the demand for recognition type responses (as in yes/no questions) and recall type responses (as in wh-questions) was balanced. Two forms of the comprehension questions (Form A and Form B) (Appendices E and F) were prepared in order to minimize the probability of responses to the yes/no questions being influenced by guessing. Form A of the comprehension questions was composed of seven yes/no questions and seven wh-questions. Six of the yes/no questions and five of the wh-questions pertained to concrete aspects of the story, or content information. The remaining three questions (i.e., one yes/no question and two wh-questions) required children to make inferences based on the events of the story. Form B consisted of seven yes/no questions and six wh-questions. Six of the yes/no questions and four of the wh-questions pertained to content information. The remaining three questions (i.e., one yes/no question and two wh-questions) were inferential in nature.

As was stated previously, the two forms of questions were prepared in order to minimize the probability of results being influenced by guessing on the yes/no questions. Thus, three of the yes/no questions on Form A required an affirmative response, while the remaining four questions required a negative response. On Form B, the seven yes/no questions required the opposite responses. An example of a Form A question and its Form B counterpart is as follows: Is Sarah dirty? (yes) / Is Sarah clean? (no). The wh-questions were identical across the two forms, with the exception of one additional wh-question which was included on Form A as a follow-up to a yes/no question.
The form of comprehension questions selected for administration was counterbalanced across the children within each age group. Thus, 22 children completed Form A and 16 children completed Form B.

The EVDT story version (Appendix G) consisted of eight violations made to key story components. Five of the violations involved within-story substitutions where certain actions, actors or objects were exchanged with others present within the story (e.g., "dirty clothes" in place of "rubber duckie"). Two violations consisted of goal disruptions where an event which was plausible and fit the general story context but disrupted progress toward the goal (i.e., avoiding a bath) was inserted into the story. For example, the phrase "jumped in the tub" was inserted into the story in the place of the original phrase, "ran outside to make more mud-pies". The one remaining violation involved the substitution of the content of the rhythmic, repetitive phrase (i.e., "Splish splash, Sarah needs a bath...") with a phrase that was related to the story but contrary to the goal structure and content of the original story (i.e., "Ooey gooey, let's make a mess...").

Procedure

The investigation was separated into two phases. Phase I consisted of pre-experiment testing and story familiarization. Phase II involved presentation of the experimental procedures. The two phases occurred over a two-day period for each child. The experimental sessions were recorded on videotape to allow for detailed scoring and analysis. All sessions were conducted either at Elborn College (University of Western Ontario), in the child's home, or at the child's preschool or daycare centre.

Pre-Experiment Testing and Story Familiarization. During this phase children completed the inclusionary testing for participation in the investigation.
The children completed a standardized receptive language test, the Auditory Comprehension (AC) subscale of the PLS-3 (Zimmerman et al., 1979), as well as the vocabulary pre-test that was constructed by the investigator. In addition, the following questionnaires were completed by the parents: the CDI-III (Dale, 1996); the CHI (Warr-Leeper et al., 1997); and the early literacy questionnaire (Dempsey et al., 1999).

Each child was presented with the original "Splish Splash" story three times prior to participating in the experimental tasks. The number of story presentations was determined based on the research of Slackman and Nelson (1984), who found that with as few as three presentations of a story, preschool children could develop a script for an unfamiliar story. The first story presentation occurred following completion of the inclusionary testing. The child listened to the audio-recording of the story with the investigator and followed along in the book. The second story presentation occurred between the end of the first session and the beginning of the test phase on the second day. Each parent was provided with a copy of the storybook and the audiotape and was asked to listen to the story with the child one time. The instruction to listen to the story only one time was made to ensure consistency in number of story exposures across children. The parent was instructed to listen to the story with the child during their usual storyreading time and was asked to respond to comments made by the child during the storyreading with neutral acknowledgment (e.g., "Oh", "O.K.", plus repetition of the child's phrase) but not engage in discussion of the story with the child. The third and final story presentation occurred on the second day, prior to the administration of the experimental tasks. Again, the child listened to the audio-recording of the story with the investigator and followed along in the book.

Experimental Test Procedures.
Following the third story presentation, children participated in the experimental portion of the investigation, where they jointly retold the story with the investigator and answered the comprehension questions. A subset of the children also participated in the EVDT. In order to control for the possibility that participation in one of the tasks might influence performance on subsequent tasks, the order of two of the experimental procedures (the JSR and the comprehension questions) was counterbalanced across children, such that half of the children in each group participated in the JSR first, while the other half answered the comprehension questions first. The subset of the children who participated in the EVDT always completed this task last, in order to prevent the introduction of a new story (the EVDT practice story) from interfering with performance on the other two tasks.

Joint Story Retell Procedure. Administration of the JSR consisted of the child jointly retelling the "Splish Splash" story with the investigator by providing the missing elements. Prior to participating in the experimental task, the child completed four practice items. The practice session was provided so that the child had a model of what would be required of him or her during the actual JSR test procedure. The four practice items were taken from the story but were not included among those used in the actual test procedure which followed. Before presenting the practice items, the investigator set out a playhouse and props (e.g., bathtub, washcloth, mom figure), labeling each item for the child. The props were then placed out of the child's view and the investigator instructed the child: "Help me tell the Splish Splash story. You can tell me the words or show me with the toys. This is how we'll do it". For each practice item, the investigator read a passage from the story and left a word or phrase out (e.g., "Sarah stuck her big toe into the ____").
The child was expected to supply the missing word or words (e.g., bathtub). If the child failed to respond verbally within five seconds on the first item, the investigator's confederate supplied the missing word or words, thereby demonstrating a verbal response. If the child failed to respond verbally within five seconds on subsequent practice items, the investigator presented the tray of props and prompted the child to enact the item, saying "Do you want to show me?" and repeating the item. If the child responded verbally to all practice items, he or she was encouraged to enact the last item. This was done in order to ensure that the child was aware that both verbal and enactment responses were acceptable during the task. The investigator encouraged an enactment response by saying, "Let's try that one again. This time show me" while presenting the tray of props.

Following completion of the practice items, the child participated with the investigator in the joint retelling of the story. The investigator read the cloze story to the child, as he or she followed along in the book. Again, the child was required to supply the appropriate word or words during pauses made by the investigator. The investigator waited no longer than five seconds for a response. If the child did not respond verbally within five seconds, the props for the enactment response were made available. That is, the investigator prompted the child to respond nonverbally by asking, "Do you want to show me?" and offering the tray of props. If the child still did not respond, the investigator provided a verbal and enactment response and then continued with the story.

Comprehension Questions Procedure. Administration of the traditional comprehension questions consisted of the investigator asking the child either the Form A or Form B questions.
The child was expected to answer the wh-questions verbally; however, either a verbal or a nonverbal (i.e., nodding yes or shaking head no) response was acceptable for the yes/no questions. Each question could be repeated a maximum of one time. If a child did not respond to a question or gave an incorrect response, the investigator responded neutrally and then continued with the next question without providing the correct response. The child was given no longer than five seconds to respond to each question.

Expectancy Violation Detection Procedure. A third experimental procedure, the EVDT, was administered in order to evaluate the role of memory in performance on discourse comprehension tasks. Seven of the thirty-eight children were used to pilot refinements which were made to the original EVDT procedures employed by Skarakis-Doyle and Wootton (1998). After the pilot work was completed, the remaining thirty-one children participated in the EVDT task used in the present study.

In order to clarify to each child what would be required during the EVDT, a practice session was provided. The following instruction was given prior to the practice session: "I'm going to read your story to you. But, I'm going to say some things that sound silly or out-of-place and I want you and (confederate) to catch me". The investigator then read a portion of a familiar storybook provided by the child's parents for use during the practice session. The investigator altered salient elements (i.e., objects, actors, actions) of the familiar story. If the child failed to respond, either verbally or nonverbally, to a violation, the confederate responded to the violation by saying, for example, "No! It's not cow, it's Mortimer!". The investigator then encouraged the child to be the one to catch the mistake the next time. When a child detected a violation during the practice session, that child was verbally reinforced (e.g., "Good catching!"). Each child received a maximum of five practice trials.
As soon as the child responded to one of the violations, his or her practice session was discontinued, since understanding of the task had been demonstrated. Thus, children received as few as one and no more than five practice trials prior to participating in the task.

Following the practice session, the violated version of the story was read to the child as he or she followed along in the book. The investigator introduced the EVDT by saying to the child, "Now I'm going to read the Splish Splash story to you. I'm going to say some more things that sound silly or out-of-place and I want you to catch me, just like before." The child was expected to demonstrate that he or she detected the violation within five seconds of its occurrence through verbal response and/or via nonverbal behaviours (e.g., change in eye gaze, facial expression or body movement). If the child produced a verbal protest (i.e., "No!") in response to a violation, the investigator encouraged the child to correct the error with the following probe: "No? No what?". The investigator did not overtly acknowledge nonverbal responses made following violations to the story. The investigator never prompted a child who failed to respond to a violation, either by pausing or by questioning that child. Rather, the investigator continued reading the story. The experimental procedure was videotaped to allow children's nonverbal responses to be analyzed.

Data Analysis

Scoring. Accuracy on the JSR was determined by calculating the total number of items (out of a maximum number of eight) that were correctly responded to. Both accurate verbal and enactment responses were accepted as correct. Accurate verbal responses included those that were verbatim from the story and in some cases also included responses that were variations of the words in the story (Appendix H). Enactment responses consisted of the child manipulating a toy using the correct action, object or person.
Accuracy on the comprehension questions was determined by calculating the number of correct responses to the questions. Verbal responses were expected for wh-questions; both verbal and nonverbal responses to the yes/no questions were accepted. Correct responses to wh-questions received one point each (see Appendix I), while correct responses to yes/no questions received half a point each. Yes/no questions were given less weight in order to prevent inflated scores due to the potential for guessing an answer correctly. Since Form A had one more wh-question than Form B, the total score possible on Form A (i.e., ten) was one point more than the total possible score on Form B (i.e., nine).

Performance on the EVDT was scored by the investigator from the videotape of the session. In accordance with the criteria established by Skarakis-Doyle and Wootton (1998), acceptable verbal responses were operationally defined as: protests (e.g., "No!") or corrections (e.g., "Not dirty clothes, rubber duckie!"), and/or repetitions with rising inflections which indicated a challenge to the utterance (e.g., "Dirty clothes!") or a question (e.g., "Dirty clothes?"). Acceptable nonverbal responses were operationally defined as: changes in eye gaze (from book to reader, from reader to book, from book to external environment of the room, and from the environment to the book), changes in facial expression (including smiles, frowns, and puzzled looks), and changes in body movement (including sudden whole body or discrete body part movements such as head turns, and sudden cessations of movements such as foot tapping) (Skarakis-Doyle & Wootton, 1998). In order for a nonverbal response to be accepted as a detection of a story violation, the behaviour had to occur within five seconds following the violation and no later than the end of the phrase immediately following occurrence of the violation.
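The timing criterion for nonverbal responses can be expressed as a simple check. The Python sketch below is only an illustration under stated assumptions: the function name, the use of timestamps in seconds taken from the videotape, and the choice to measure the five-second window from the moment of the violation are hypothetical and are not details of the scoring protocol.

```python
def is_accepted_nonverbal(violation_s, response_s, next_phrase_end_s, window_s=5.0):
    """Return True if a nonverbal response counts as a detection.

    All arguments are hypothetical timestamps in seconds:
    violation_s        time at which the story violation occurred
    response_s         time at which the nonverbal behaviour was observed
    next_phrase_end_s  end of the phrase immediately following the violation
    """
    after_violation = response_s >= violation_s
    within_window = (response_s - violation_s) <= window_s
    before_phrase_end = response_s <= next_phrase_end_s
    return after_violation and within_window and before_phrase_end


# A gaze shift 2 s after the violation, before the next phrase ends: accepted.
print(is_accepted_nonverbal(10.0, 12.0, 14.0))
# A smile 3 s after the violation, but after the next phrase has ended: rejected.
print(is_accepted_nonverbal(10.0, 13.0, 12.5))
```

Both conditions must hold together, so a response inside the five-second window is still rejected if the following phrase has already ended.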
Skarakis-Doyle (1998) has demonstrated that nonverbal responses are rigorous indicators of detection that are used if and only if a story violation has occurred. A response was recorded as a combined response when a nonverbal behaviour was exhibited in conjunction with a verbal behaviour (e.g., a child looks up from the book to the experimenter and says, "No, not dirty clothes! Rubber Duckie!"). Accuracy on the EVDT was determined by calculating the number of violations responded to, including both nonverbal and combined responses, out of a maximum possible of eight.

Agreement. A minimum of 10% of both the JSR and comprehension question forms were rescored by a second graduate student who was involved in the study in order to determine inter-judge agreement. Agreement for scoring between students was 100% for both the JSR and comprehension question forms. Inter-judge agreement for EVDT scoring was also calculated. A minimum of 10% of the EVDTs were rescored from video recordings of the task. Inter-judge agreement was calculated for the occurrence of a response (nonverbal or combined) to a violation. Agreement for scoring between judges was 94%.

Results

Before the primary data analyses were conducted, several preliminary analyses were undertaken in order to evaluate particular aspects of the test stimuli and their presentation. Since a storybook format was used to present the JSR, there was a possibility that some of the pictures accompanying the cloze items might have revealed the correct response. This potential advantage was evaluated in the preliminary analyses. Further, since two forms of the comprehension questions were utilized, preliminary analyses were also conducted to ensure that the comprehension forms were equivalent.

First, the impact of pictures on JSR performance was examined across all children (n = 38).
The average proportion correct across the three items where pictures might have cued a correct response (M = 0.69, SD = 0.27) was equivalent to the average proportion correct on the other five items (M = 0.69, SD = 0.31). A paired t-test was performed to determine whether there was a statistically significant difference in performance on the two types of cloze items. The results of the t-test indicated that there was, in fact, no significant difference (t(37) = 0.02, p > .05) between performance on cloze items which did or did not have an accompanying picture. Therefore, subsequent data analyses did not differentiate between the two types of items; the JSR raw scores were computed from children's responses to all eight items.

Second, the equivalency of the two forms of comprehension questions (Form A and Form B) was examined. Since assignment of form was counterbalanced within each age group, 22 children completed Form A of the comprehension questions and 16 children completed Form B. The average proportion correct on Form A was slightly lower (M = 0.60, SD = 0.14) than the average proportion correct on Form B (M = 0.70, SD = 0.24). In order to determine whether the difference in performance on the two forms was statistically significant, an independent t-test was conducted. Based on a pooled variance estimate, no significant difference (t(36) = -1.53, p > .05) was found between accuracy on Form A and accuracy on Form B. Therefore, subsequent data analyses did not differentiate the form of comprehension questions completed by the children.

Once these preliminary analyses were completed, primary analyses were undertaken to evaluate the validity of the JSR. A number of specific issues related to the validity of the JSR were examined, including the developmental sensitivity, the concurrent validity and the content relevance of the measure.
As stated previously, one of the major criteria employed in empirical evaluation of test validity is age differentiation or developmental sensitivity (Anastasi, 1988). Since discourse comprehension abilities are expected to increase with age during childhood, test scores on discourse comprehension measures should likewise show such an increase, if the measures are valid. Several analyses were conducted to examine whether the JSR met the criterion of age differentiation.

Correlational analysis was performed to explore the relationship between age and performance on the JSR for all 38 participants. As shown in Table 2, a statistically significant correlation of moderate magnitude (r = .61, p < .01) was found between age and the JSR, indicating that 37% of the variability in scores could be explained by a relationship between JSR performance and age. Also shown in Table 2 is the correlation between age and accuracy on the traditional comprehension questions. As would be expected, a statistically significant correlation of moderate strength (r = .60, p < .01) was also found between age and accuracy on the comprehension questions, indicating that 36% of the variability in scores could be explained by the relationship between performance on the comprehension questions and age.

Table 2
Correlation Coefficients for Age, JSR and Comprehension Questions (N = 38)

Variable      Age     JSR      Questions
Age           --      .61**    .60**
JSR                   --       .60**
Questions                      --

In order to further explore the relationship between age and each of the comprehension measures, the performance of children in the younger age group (n = 15) was compared to the performance of children in the older age group (n = 23). As shown in Table 3, the older children, as a group, provided a greater number of accurate responses than did the younger children on both the JSR and the comprehension questions.
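The percentages of explained variability quoted alongside the correlations are the squared correlation coefficients expressed as percentages. A minimal Python sketch of that conversion follows; the function name is illustrative, and the coefficients are those reported in this chapter.

```python
def percent_variance_explained(r):
    """Return the squared correlation coefficient as a whole-number percentage."""
    return round(100 * r ** 2)


# Correlation coefficients reported in this chapter.
reported = {
    "age and JSR": 0.61,
    "age and comprehension questions": 0.60,
    "JSR and questions, 42 to 50 months": 0.44,
    "JSR and vocabulary pre-test": 0.57,
}

for label, r in reported.items():
    print(f"{label}: r = {r:.2f}, shared variance = {percent_variance_explained(r)}%")
```

The four values come out as 37%, 36%, 19% and 32%, matching the percentages stated in the text.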
To determine if the performance differences between the two age groups were significant, independent t-tests were conducted for each of the measures. A t-test was conducted with age group as the independent variable and the number of correct responses on the JSR as the dependent variable. Based on a separate variance estimate, a significant difference (t(20) = -3.68, p < .01) was found between the younger group and the older group, indicating that the scores of the older group on the JSR were, in fact, significantly higher than the scores of the younger group. A second t-test was conducted, this time with the number of correct responses on the comprehension questions as the dependent variable. Based on a pooled variance estimate, a significant difference (t(36) = -3.39, p < .01) was again found between the younger group and the older group, indicating that the scores of the older group were significantly higher on the comprehension questions than the scores of the younger group. In summary, both correlational and t-test analyses revealed performance differences with age on the JSR. In addition, as would be expected, similar findings were revealed for the traditional comprehension questions.

Table 3
Mean Number of Accurate Responses and Standard Deviations on the JSR and Comprehension Questions for Two Age Groups

Age Group          n          JSR            Comprehension Questions
Younger group a    15    M    4.40 (0.52)    5.10 (0.47)
                         SD   1.99 (0.28)    1.66 (0.23)
Older group b      23    M    6.48 (0.77)    6.85 (0.71)
                         SD   1.12 (0.21)    1.48 (0.16)

Note. Mean proportion correct and standard deviation enclosed in parentheses.
a Age range = 30-40 months (M = 36.13 months).
b Age range = 41-50 months (M = 45.96 months).

Analyses were then conducted to evaluate the concurrent validity of the measure. The concurrent validity of a test is determined by evaluating how closely an individual's test score is related to his or her score on a criterion variable that is measured at about the same time the test score is obtained (Anastasi, 1988; McCauley & Swisher, 1984). The concurrent validity of the JSR was evaluated by comparing children's performance on the JSR with their scores on the traditional comprehension questions. As shown in Table 2, correlational analysis revealed a statistically significant correlation of moderate magnitude (r = .60, p < .01) between the JSR and the comprehension questions, indicating that 36% of the variability in scores on the JSR was explained by the relationship between performance on that task and performance on the comprehension questions. Also shown in Table 2 is the relationship between each of the comprehension measures and age.

Given that, in addition to being moderately correlated with each other, both the JSR and the comprehension questions were also moderately correlated with age, analyses were conducted to determine whether there was any unique contribution of comprehension to the relationship between the JSR and the comprehension questions. The relationship between the two comprehension measures was analyzed for children within a more developmentally stable age range (i.e., children between 42 and 50 months of age). Correlational analysis was performed to examine the relationship between scores on the JSR and the comprehension questions for children within this age range. As shown in Table 4, a moderate correlation (r = .44) was found between the JSR and the comprehension questions in this age group. Thus, 19% of the variability in scores on the JSR was explained by the relationship between performance on that task and performance on the comprehension questions when age varied only minimally.
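Because the restricted-range correlation rests on a small subsample, it can help to see how much uncertainty surrounds a coefficient of this size. The sketch below is an illustration only and is not an analysis reported in this study: it applies the standard Fisher z transformation to r = .44, assuming n = 23 as given in the caption of Table 4. The function name and the 95% level are assumptions chosen for the example.

```python
import math
from statistics import NormalDist


def fisher_ci(r: float, n: int, level: float = 0.95) -> tuple[float, float]:
    """Approximate confidence interval for a Pearson r via Fisher's z."""
    z = math.atanh(r)                      # transform r to the z scale
    se = 1 / math.sqrt(n - 3)              # standard error on the z scale
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


low, high = fisher_ci(0.44, 23)
print(f"Approximate 95% interval for r = .44, n = 23: {low:.2f} to {high:.2f}")
```

Under these assumptions the interval is wide, which is consistent with treating the within-range correlation as suggestive rather than conclusive evidence of shared variance.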
As would be expected given the inextricable relationship between age and comprehension ability, the magnitude of the correlation between the JSR and the comprehension questions was not as strong within the developmentally stable age group (r = .44) as it was within the group as a whole (r = .60). However, the fact that the correlation remained within the moderate range suggests that the JSR and the comprehension questions share some common variance other than what is accounted for by age alone.

Table 4
Correlation Coefficients for Age, JSR and Comprehension Questions for Children Between 42 and 50 Months (n = 23)

Variable      Age     JSR     Questions
Age           --      .28     .27
JSR                   --      .44*
Questions                     --

In addition to the comparisons made between the JSR and the comprehension questions, comparisons were also made between the JSR and two of the pre-experimental measures, the vocabulary pre-test and the AC subscale of the PLS-3. These comparisons were conducted in order to determine the relationship between the JSR and two widely accepted measures of literal comprehension.

First, analyses were performed to examine the role of vocabulary knowledge in performance on the JSR. As was shown in Table 1, the average number of correct responses on the vocabulary pre-test was 13.11 (SD = 1.74) out of a maximum possible score of 16. Correlational analysis was conducted to examine the relationship between the JSR and the vocabulary pre-test. Results of the analysis revealed a statistically significant relationship of moderate strength between the JSR and the vocabulary pre-test (r = .57, p < .01), indicating that 32% of the variability in scores on the JSR could be explained by the relationship between performance on the JSR and performance on the vocabulary pre-test. A regression analysis was conducted to evaluate the unique roles of age and vocabulary pre-test score in predicting JSR performance.
Based on a step-wise regression analysis, age alone accounted for 37% of the variability in scores on the JSR (R = .61, F(1, 36) = 20.90, p < .01). When the vocabulary pre-test score was entered into the equation it accounted for an additional significant 13% of the variability in performance on the JSR task (R² change = .13, F(1, 35) = 9.27, p < .01). This finding indicates that knowledge of story vocabulary contributes uniquely to performance on the JSR even after the contribution of age is considered.

Second, analyses were performed to examine the relationship between the JSR and the AC subscale of the PLS-3. As was shown in Table 1, the average PLS-3 AC raw score was 35.55 (SD = 5.95) out of a maximum total score of 48. Correlational analysis was conducted to determine the relationship between the JSR and the PLS-3 AC subscale. Results of this analysis revealed a statistically significant relationship of moderate magnitude (r = .49, p < .01) between the JSR and the PLS-3 AC subscale, indicating that 24% of the variability in JSR scores could be accounted for by the relationship between performance on the JSR and performance on this PLS-3 AC subscale. A regression analysis was conducted to evaluate the unique roles of age and PLS-3 AC score in predicting JSR performance. Again, based on a step-wise regression analysis, age alone accounted for 37% of the variability in performance on the JSR (R = .61, F(1, 36) = 20.90, p < .01). However, PLS-3 AC scores did not make any unique contribution to the variance in JSR performance (β = .06, t = .29, p > .05) even after age was considered.

The developmental sensitivity and the concurrent validity of the JSR having been evaluated, the content-relevance of the measure was finally examined. Empirical evidence of content-relevance may be obtained through a demonstration that scores on two tests are related in expected ways.
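The step-wise results for age and vocabulary can be related to the reported F values through the usual F test for a change in explained variance. The sketch below is a rough check under stated assumptions: it uses the rounded proportions from the text (.37 for age alone, .37 + .13 = .50 after vocabulary is entered) and N = 38, so it approximates rather than reproduces the reported statistics. The function name is hypothetical.

```python
def f_change(r2_reduced: float, r2_full: float, n: int,
             k_full: int, k_added: int = 1) -> tuple[float, int]:
    """F statistic and residual df for the gain in R-squared between two models."""
    df_resid = n - k_full - 1
    f_value = ((r2_full - r2_reduced) / k_added) / ((1 - r2_full) / df_resid)
    return f_value, df_resid


# Step 1: age alone (compared with a model that has no predictors).
f_age, df_age = f_change(0.0, 0.37, n=38, k_full=1)
# Step 2: vocabulary pre-test entered after age.
f_vocab, df_vocab = f_change(0.37, 0.50, n=38, k_full=2)

print(f"Age: F(1, {df_age}) = {f_age:.2f}")
print(f"Vocabulary added: F(1, {df_vocab}) = {f_vocab:.2f}")
```

With rounded inputs the sketch gives values near, but not identical to, the reported F(1, 36) = 20.90 and F(1, 35) = 9.27; the small gaps are what would be expected from rounding the proportions to two decimals.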
As stated previously, the JSR was designed to reduce memory demands by establishing a close temporal relationship between the stimulus and response. Since the EVDT also limits memory demands by closely linking the stimulus and response, at least a moderate correlation was expected between these two measures. Statistical analyses were conducted to examine whether this was the case.

Prior to examining the role of memory demands in task performance, several analyses were performed to examine the relationship between age and performance on the EVDT. As presented earlier, after the pilot work was completed, 31 children were presented the EVDT. However, of these 31 children, only 24 met the criterion of at least one self-initiated, correct response provided during the practice session. Therefore, all analyses involving the EVDT were based on the data obtained from the performances of 24 children (mean age = 42.04 months; age range = 30-50 months). As shown in Table 5, a statistically significant correlation of moderate strength (r = .56, p < .01) was found between performance on the EVDT and age, indicating that 31% of the variability in performance on the EVDT was explained by the relationship between EVDT performance and age. Further analyses revealed that, as a group, the older children provided almost twice as many correct responses on the EVDT (M = 4.11, SD = 2.16) as the younger children (M = 2.50, SD = 2.27). An independent t-test was conducted to examine whether the difference in performance between the two age groups was significant. Based on a pooled variance estimate, a significant difference (t(22) = -2.42, p < .05) was found between the younger group and the older group in accuracy on the EVDT. The finding that children in the older group were significantly more accurate on the EVDT than children in the younger group was consistent with the correlational result.
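The group comparisons in this chapter use either a pooled or a separate variance estimate, which differ in how the standard error and degrees of freedom are formed. The sketch below contrasts the two using summary statistics only. Because the EVDT group sizes are not stated here, it uses the JSR and comprehension question values from Table 3 instead; the older group's JSR standard deviation is taken as 1.12, which is an assumption because that table cell is damaged in the source. Function names are hypothetical.

```python
import math


def welch_t(m1, s1, n1, m2, s2, n2):
    """Separate-variance (Welch) t statistic and approximate df."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Pooled-variance t statistic and df."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    t = (m1 - m2) / math.sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, df


# Younger group first, older group second (values from Table 3).
# The 1.12 for the older group's JSR SD is an assumed reading of a damaged cell.
jsr_t, jsr_df = welch_t(4.40, 1.99, 15, 6.48, 1.12, 23)
cq_t, cq_df = pooled_t(5.10, 1.66, 15, 6.85, 1.48, 23)

print(f"JSR (separate variance): t({jsr_df:.0f}) = {jsr_t:.2f}")
print(f"Comprehension questions (pooled): t({cq_df}) = {cq_t:.2f}")
```

Under these assumptions both statistics fall within rounding distance of the values reported for the JSR and the comprehension questions, which suggests the separate variance estimate was chosen for the JSR because the two groups' standard deviations differ markedly.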
Given that the relationship of the EVDT to age was established, analyses were conducted to examine the relationship between the JSR and the EVDT. As stated previously, it was assumed that since both the JSR and the EVDT required responses to be made while the child was creating a representation rather than after the representation had been completed and was held in working memory, both measures reduced memory demands. In order to evaluate this assumption, the relationship between the JSR and the EVDT was analyzed for the 24 children who demonstrated understanding of both tasks.

Table 5
Correlation Coefficients for Age, JSR, Comprehension Questions and EVDT (n = 24)

Variable          Age     JSR     Questions   E.V. Detection
Age               --      .65**   .64**       .56**
JSR                       --      .51*        .57**
Questions                         --          .47*
E.V. Detection                                --

As was shown in Table 5, a statistically significant correlation of moderate magnitude (r = .57, p < .01) was found between the JSR and the EVDT, indicating that 32% of the variability in scores on the JSR could be explained by the relationship between performance on the JSR and performance on the EVDT. Also shown in Table 5 is the correlation coefficient for the relationship between the comprehension questions and the EVDT. Again, a statistically significant correlation of moderate magnitude (r = .47, p < .05) was found between the two measures, indicating that 22% of the variance in comprehension question scores was explained by the relationship between performance on the comprehension questions and performance on the EVDT.

As was shown in Table 5, the JSR, the EVDT, and the comprehension questions were each moderately correlated with age. Therefore, as with the data for the JSR and the comprehension questions, EVDT data were analyzed for the children within the smaller, more developmentally stable age range (i.e., children within 42 to 50 months). For the 14 children between the ages of 42 and 50 months who demonstrated understanding of the tasks,
moderate correlations were found between the EVDT and the JSR (r = .38) and between the EVDT and the comprehension questions (r = .41). Thus, as would be expected given the close relationship between age and comprehension ability, the correlations between the EVDT and both the JSR and the comprehension questions were not as strong in the developmentally stable age range as they were in the group as a whole. However, the correlations remained in the moderate range, with performance on the EVDT continuing to account for 14% and 17% of the variability in scores on the JSR and comprehension questions respectively.

In addition to memory, the other enabling factor that was examined was language production. The JSR was designed to limit language production demands by requiring children to produce only limited elements of the text. As well, children who did not initially provide a verbal response to an item were given an opportunity to enact the response. Given that the enactment response was an option, the number of children who utilized this form of response was calculated. Results indicated that only six of the thirty-eight children availed themselves of the enactment response option during testing. All six of these children utilized the enactment response option to complete the cloze item which required the name of the central character in the story (i.e., Sarah). One child also used the enactment response to complete a cloze item which required the name of an object used in the story (i.e., big bar of soap). Half of the children who utilized the enactment response option got less than 50% of the JSR items correct; the other half of the children got more than 50% of the items correct. Of the total number of correct responses provided by all the children, only 3% (i.e., 7 of 215 correct responses) were provided via enactment.
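The enactment figures can be cross-checked against the group means reported earlier. The short sketch below assumes the group sizes and mean JSR scores given in Table 3 (15 younger children averaging 4.40 correct, 23 older children averaging 6.48 correct) and recovers both the total number of correct responses and the share provided by enactment. Variable names are hypothetical.

```python
# Group sizes and mean number of correct JSR responses (from Table 3).
group_sizes = {"younger": 15, "older": 23}
mean_correct = {"younger": 4.40, "older": 6.48}

# Total correct responses implied by the group means.
total_correct = round(sum(group_sizes[g] * mean_correct[g] for g in group_sizes))

# Share of correct responses given by enactment (7 responses, per the text).
enactment_share = 7 / total_correct * 100

print("Total correct responses:", total_correct)
print(f"Enactment share: {enactment_share:.1f}%")
```

The implied total agrees with the 215 correct responses cited in the text, and 7 of 215 corresponds to roughly 3%.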
This finding suggests that the language production demands of the JSR task may be of an appropriate level for children between the ages of 30 and 50 months.

In order to further examine the potential role of language production in performance on both the JSR and the comprehension questions, additional quantitative analyses were undertaken. Specifically, the relationship of each measure to scores on the language production component of the CDI-III (i.e., the vocabulary production and sentence structure scales) was analyzed. Parents of 35 of the 38 children completed the language production component of the CDI-III. The average number of items correct on the language production component of the CDI-III (CDI-III-production) was 85.34 out of a total possible score of 112. Correlational analysis revealed a significant correlation of moderate magnitude between the JSR and CDI-III-production (r = .38, p < .05) and a moderately strong correlation between the comprehension questions and the CDI-III-production (r = .63, p < .01).

To further analyze the role of language production in performance on each measure, regression analyses were conducted. First, a regression analysis was performed with number of responses correct on the JSR as the dependent variable and age and CDI-III-production as the independent variables. Based on a step-wise regression analysis, age alone accounted for 37% of the variability in JSR performance (R = .57, F(1, 33) = 15.67, p < .01). CDI-III-production did not significantly account for a unique amount of variability and hence was not entered into the equation. A second regression analysis was conducted, this time with comprehension questions as the dependent variable. Based on a step-wise regression analysis, CDI-III-production alone accounted for 45% of the variability in performance on the comprehension questions (R = .63, F(1, 33) = 21.29, p < .01).
Age was also entered into the equation and accounted for an additional 9% of the variability in performance on the comprehension questions (R² change = .08, F(1, 33) = 5.97, p < .05). Although there were moderate correlations between the CDI-III-production and both the JSR and the comprehension questions, results suggested that global language production ability only contributed uniquely to performance on the comprehension questions. Global language production ability did not contribute to performance on the JSR beyond what would be expected due to age alone.

Discussion

Although comprehension is a broad construct that encompasses the discourse level, as well as the lexical and syntactic levels, measurement efforts have focused almost exclusively on the latter two levels. As a result, while there are a variety of measures available to tap decontextualized literal and syntactic comprehension, there are relatively few available to tap discourse comprehension. Given that very few discourse level measures are available, there is a possibility that scores on lexical and syntactic measures may be used to draw inferences about the broader construct of language comprehension. However, since different abilities are required to comprehend language at each level, tests that are valid for the measurement of lexical and syntactic comprehension are not necessarily, and in fact are unlikely to be, valid indicators of discourse comprehension. Clearly, measures designed specifically to tap discourse comprehension are necessary for comprehensive measurement of the construct of language comprehension. The development of such measures is not a straightforward task, however.

Valid measurement of discourse comprehension faces a number of challenges. One of the most critical challenges involves ensuring that tasks only measure content that is relevant to discourse comprehension.
Entailed in existing measures of discourse comprehension are enabling factors such as memory and language production. These factors intrude into the measurement of discourse comprehension, confounding interpretation of performance. Skarakis-Doyle & Wootton (1998) attempted to create a measure of discourse comprehension that minimized the confounding factors typically found in measures of discourse comprehension. As discussed previously, preliminary investigation suggested that this new measure of discourse comprehension, the Joint Story Retell, might indeed limit the role of enabling factors but further evaluation of its validity was necessary. The current study sought additional evidence for the validity of the JSR.

As stated previously, age differentiation, or developmental sensitivity, is a major criterion employed in construct validation (Anastasi, 1988). Since language comprehension abilities are expected to increase with age during early childhood, performance on measures of comprehension should likewise show such an increase, in order for the measure to be valid. As expected, results of this investigation showed that the comprehension questions did demonstrate age differentiation. Of greater import was the finding that the JSR also demonstrates age differentiation. Results of correlational analyses revealed a moderately strong relationship between age and performance on the JSR. When group comparisons were conducted, the older children (i.e., 41 to 50 month olds) performed significantly more accurately on the JSR than the younger children (i.e., 30 to 40 month olds). These findings are consistent with the hypothesis that the JSR is sensitive to age differences.

According to Anastasi (1988), age differentiation is a necessary but not sufficient condition for validity. Thus, the finding that the JSR exhibits age-related performance differences does not on its own ensure that the ability that is changing with age is discourse comprehension.
In order to determine whether the ability measured by the JSR is, in fact, discourse comprehension, comparisons were made between performance on the JSR and performance on a preexisting measure of discourse comprehension.

As stated previously, correlations between a new measure and related earlier tests may be cited as evidence that the new test measures approximately the same area of behaviour as the other tests of the same construct (Anastasi, 1988). Since the JSR purports to measure discourse comprehension, the same construct which is measured by comprehension questions, the concurrent validity of these two measures was examined. When comparisons were made between children's performance on the JSR and their performance on the traditional comprehension questions, a moderately strong correlation was revealed. This finding suggests that a considerable amount of variance is shared by the two measures. Since both measures were also correlated with age, it was also possible that shared variance related to age, rather than to discourse comprehension, might have accounted for the relationship between the JSR and comprehension questions. Therefore, the relationship between the two measures was examined while attempting to control for age. To do this, the relationship between the two measures was examined for children within a more developmentally stable age range (i.e., 42-50 month range). This approach to controlling for age was taken instead of partialling age out of the correlation because age is inextricably linked to comprehension ability and because a valid measure of a construct should contain all aspects relevant to that construct. Although the magnitude of the correlation between the JSR and comprehension questions decreased slightly when age varied only minimally, the correlation remained in the moderate range, suggesting that the JSR and comprehension questions shared variance that was not solely attributable to age.
Given that comprehension questions are the widely accepted measure of discourse comprehension, it seems plausible that the shared variance reflects the construct of discourse comprehension.

It may be argued that the close tie between stimulus and response in the cloze task might permit children to respond correctly to cloze items that they do not comprehend. That is, perhaps the close temporal proximity of the stimulus provides such a direct link to the correct response that children can produce that response, even when they do not understand the passage. Evidence obtained from this investigation, however, suggests that this is not likely the case. First, as discussed previously, older children perform more accurately on the JSR than younger children. If the JSR were merely providing cues to accurate responses in the absence of comprehension, it would be expected that younger children could perform just as well on the task as older children. This was not the case. Furthermore, it has been shown that the JSR is strongly related to performance on comprehension questions. Since comprehension questions measure story understanding, the finding of a close relationship between the JSR and the comprehension questions suggests that the JSR, like the comprehension questions, provides a measure of discourse comprehension ability.

In addition to the comparisons made between the JSR and the comprehension questions, comparisons were also made between the JSR and the two comprehension tests employed in pre-experiment testing. Results revealed that both the vocabulary pre-test and the PLS-3 AC subscale were at least moderately correlated with the JSR. However, only scores on the vocabulary pre-test contributed uniquely to performance on the JSR; the PLS-3 AC scores did not. As discussed previously, both lexical and syntactic understanding are necessary (though not sufficient) for understanding discourse.
Thus, given that the JSR measures discourse comprehension, it is not surprising to find that vocabulary knowledge contributes uniquely to performance on the measure. A possible explanation for the finding that PLS-3 AC scores did not also contribute uniquely to performance on the JSR may have to do with the areas of comprehension tapped by this measure. Relatively few items on the PLS-3 AC subscale specifically test vocabulary or syntactic knowledge, abilities that are known to be important components of discourse comprehension. Rather, many of the items on the PLS-3 AC subscale tap general conceptual knowledge, including spatial, temporal, and quantitative relations. This type of conceptual knowledge may not have as direct an impact on discourse level comprehension as does lexical and syntactic knowledge, at least not in the type of story employed in this study.

In addition to the evaluations performed to determine the JSR's developmental sensitivity and concurrent validity, evaluations were also performed to examine the role of enabling factors in task performance. As stated previously, two enabling factors, memory and language production, typically intrude in measures of discourse comprehension, making interpretation of performance difficult. The JSR was designed to limit the intrusion of these factors, allowing more direct interpretation of test scores. As Anastasi (1988) has discussed, decisions about whether a new test may be said to be free of the influence of enabling factors may be made based on an examination of the correlations that exist between the new test and other existing tests. Such correlations were employed in Skarakis-Doyle & Wootton's (1998) preliminary investigation of the JSR. The results of that investigation suggested that the JSR did, in fact, reduce the enabling factors of both memory and language production.
In the present investigation, correlations between the JSR and the EVDT were again obtained in order to examine the assertion that the JSR limits the enabling factor of memory in task performance. One way the JSR was designed to limit memory demands was by reducing the temporal gap between stimulus and response. The EVDT also purportedly limits memory demands by reducing the temporal gap between stimulus and response (Skarakis-Doyle & Wootton, 1998). When correlational analysis was conducted, a moderate correlation between these two measures was revealed. Although, as expected, age clearly played a role in performance on both the JSR and the EVDT, the fact that a moderately low correlation was found between the two measures even in a more developmentally stable age range suggests that there was some shared variance between the measures not attributable to age alone.

While the JSR and the EVDT both establish close temporal links between the stimulus and response, the tasks also differ in important dimensions. The JSR requires completion of story elements, a recall task, while the EVDT requires detection of story alterations, or recognition responses. Thus, it seems plausible that the source of their shared variance is related to the fact that both measures minimize memory demands. However, it is also plausible that their shared variance reflects the fact that both measure discourse comprehension. To investigate these two alternatives, correlations of the JSR and EVDT, both of which limit memory demands, were examined relative to the comprehension questions, which impose greater memory demands. Both the JSR and EVDT were more strongly correlated with each other than either was with comprehension questions. All three measures shared approximately 22% to 32% of the variance, suggesting that all measure discourse comprehension. The differences in amount of shared variance between the pairs of measures
support the first alternative, that both the JSR and EVDT minimize memory demands relative to the comprehension questions.

A comparison of the relationship found between the JSR and EVDT in the present investigation (r = .57) with the relationship found between those measures in the preliminary investigation (r = .62) (Skarakis-Doyle & Wootton, 1998) reveals that the correlations are similar but not identical. One possible explanation for the slight difference in results between the two investigations has to do with alterations made to the EVDT stimuli for the current investigation. Many of the violations originally employed by Skarakis-Doyle & Wootton (1998) involved the substitution of one element with another element of the same class (e.g., actor, action, object) that was not from the story, making violations more obvious. None of the violations employed in this investigation consisted of this type of substitution. Rather, the violations employed in this investigation involved within story substitutions or goal disruptions. It is possible that the inclusion of these more subtle types of violations (i.e., within story substitutions) made the task more difficult for children, accounting for the slightly lower correlation found between the JSR and EVDT in the current investigation. The difference between the correlations obtained in the two investigations might also have been affected by the fact that the children who participated in the current study ranged in age from 30 to 50 months while the children who participated in the preliminary study (Skarakis-Doyle & Wootton, 1998) were older, ranging in age from 46 to 58 months.

This investigation pursued the role of memory in task performance by examining the relationship between two measures (the JSR and the EVDT) that require a close temporal relationship between stimulus and response as compared to a measure requiring the entire story to be held in memory.
In addition to the stimulus-response temporal relationship, it has also been suggested that predictable, script-based stories may play an important role in reducing memory demands (Hudson & Nelson, 1983; McCartney & Nelson, 1981; Nelson, 1978; Pace & Feagans, 1984). As yet, the role of predictable stories in reducing the memory demands of the JSR has not been examined. Further research might compare children's performance on the JSR when less predictable or nonscripted stories are utilized to their performance on the JSR when predictable stories such as "Splish Splash" are employed. In addition to providing information about the role of predictability in JSR performance, such a comparison would address the issue of the extent to which the current results may be generalized to a wider range of story types.

As well as the type of story, the impact of number of story exposures provided prior to the administration of the task should be examined. In the current investigation, three story exposures were utilized prior to test administration since research had indicated that children could establish a script for an unfamiliar story with three exposures to that story (Slackman & Nelson, 1984). In the future, performance on the JSR following three story presentations might be compared to performance following just one story presentation. If memory can be shown not to intrude in test performance when only one story presentation is provided prior to the task, the clinical utility of the task might be enhanced.

In addition to controlling memory demands, the JSR was designed to limit language production demands relative to traditional discourse comprehension measures. It was argued that the JSR was less demanding of language production abilities than the traditional comprehension questions, because on the JSR children were only required to produce limited elements of the text and because enactment responses were permitted as an alternative to verbal responses.
The results of the current investigation support this argument. Investigation of the role of global language production ability as measured by CDI-III-production revealed moderate correlations between that measure and both the JSR and the comprehension questions. However, global language production ability only contributed uniquely to performance on the comprehension questions, not to performance on the JSR. Given that both the JSR and the comprehension questions appear to measure discourse comprehension, the difference in language production requirements may be one explanation for why the correlation between the JSR and the comprehension questions was not even stronger than it was.

Additional support for the limited language production demands of the JSR can be found through an examination of the enactment responses made by the children. First, only 3% of the total number of correct responses were made in this mode. Second, these responses were accounted for by only six children who spanned the age range and accuracy rate investigated in the study. Given these findings concerning the use of enactment, it appears that the verbal cloze procedure alone sufficiently reduced language production demands across the age span.

Further evidence concerning the role of language production in JSR performance might be attained in the future by comparing the performance of children with language production disorders to the performance of children with normally developing production abilities. If the JSR does, in fact, limit language production demands, it would be expected that children with normally developing comprehension, regardless of their language production abilities, should perform equivalently on the JSR.
In addition to language production demands, another area that requires further investigation involves the reliability of the JSR. Skarakis-Doyle and Wootton (1998) compared children's performance on two forms of the JSR, each of which was based on a different stimulus story, and found that scores on the two forms were highly correlated, indicating that the JSR demonstrated alternate forms reliability. Evidence of test-retest reliability would further strengthen support for the reliability, and hence the validity, of the measure.

One limitation of the current investigation which should be raised concerns the characteristics of the participants. Children who participated in this investigation were recruited largely from centres in the university community. Informal observations suggest that many of these children came from highly educated families who were of middle to upper socioeconomic status. Given that literacy experiences might be expected to vary across the socioeconomic spectrum, further research should attempt to include children from a broader range of socioeconomic backgrounds.

Conclusion

Construct validation requires that evidence accumulated from a variety of sources support the appropriateness of test score interpretation (Messick, 1995). One of the sources of evidence that construct validity relies on is evidence of expected performance differences with age. Construct validity also relies on evidence of expected relationships between measures. Evidence of an expected relationship between a new test and a preexisting test may be used to support the theory that the new test measures the same construct as the pre-existing test. That is, such evidence may be used to show that a new test demonstrates concurrent validity. Evidence of an expected relationship between tests may also be used to demonstrate that a new measure is relatively free of any construct-irrelevant factors (e.g., enabling factors) that might confound test score interpretation.
Such evidence supports the content relevance of a test.

Expectations about how the JSR would be related to age, as well as how it would be related to certain other comprehension and production measures, were based on theoretical arguments about the developmental nature of comprehension, analyses of task demands, and results of the preliminary investigation (Skarakis-Doyle & Wootton, 1998). First, it was expected that scores on the JSR would increase with age, since there is rapid growth in both modalities of expression and comprehension during the early preschool years (Chapman, 1978). Empirical evidence supported this expectation; the JSR exhibited the expected performance differences with age. Second, it was expected that the JSR would demonstrate only a moderate correlation with comprehension questions, since both measures claim to measure discourse comprehension but differ in that the JSR purports to limit enabling factors relative to the traditional measure. Again, empirical evidence supported this expectation; the JSR did, in fact, demonstrate a moderate correlation with the comprehension questions, suggesting that both are measures of discourse comprehension. Third, it was expected that the JSR would be more strongly related to the EVDT than to the comprehension questions, given that the JSR, like the EVDT and unlike the comprehension questions, was purported to minimize memory demands by maintaining a close temporal relationship between stimulus and response. This expectation also received empirical support; the JSR was more strongly correlated with the EVDT than it was with the comprehension questions. Finally, it was expected that language production would contribute little to performance on the JSR, since the task requires children to produce only specific and limited elements of the text.
This expectation also received empirical support; although it did contribute to performance on the comprehension questions, global language production ability did not contribute to performance on the JSR.

The finding that the JSR is sensitive to age differences, in conjunction with the finding that the JSR demonstrates concurrent validity, suggests that this measure is a potentially valid measure of discourse comprehension for children between the ages of 30 and 50 months. Further, the finding that the JSR limits enabling factors indicates that the measure may allow test scores to be interpreted more directly than has previously been possible, since it appears that the potential for memory and language production to confound performance has successfully been limited.

As stated previously, comprehensive measurement of comprehension has been limited by the lack of discourse comprehension measures. The risk associated with the paucity of discourse comprehension measures has been that scores on available measures, which are typically only valid for the measurement of decontextualized literal and syntactic comprehension, might be used to draw inferences about the broader construct. Given this risk, it was evident that a measure designed specifically to tap discourse comprehension needed to be developed. For a new test to be added to the battery of comprehension tests, there should be evidence that the test is valid. Although further investigations of the measure are necessary, evidence obtained from this investigation suggests that the JSR has the potential to be a valid measure of discourse comprehension. It appears that the JSR could be a meaningful addition to the current battery of comprehension measures. Its use, in conjunction with the use of literal level comprehension measures, should allow more comprehensive measurement of young children's comprehension in the future.

Appendix A
Vocabulary Pre-test

1. pull
2. look for
3. turn
4. dry
5. foot
6. washcloth
7. mother
8. soap
9. drop
10. dirty
11. outside
12. open
13. rubber duckie
14. house
15. bathtub
16. clothes

Appendix B
Early Literacy Questionnaire

1. Does your child enjoy listening to stories and looking at books?
2. How often do you and your children have storytime?
   Once a week
   Twice a week
   Three times a week
   More than three times a week: Specify
   Other
3. Describe your typical storytime routine:
   a) Who participates in storytime?
   b) Who selects the story?
   c) Where does storytime take place?
   d) When does storytime take place?
   e) Anything else?
4. When you read with your child, are any of the following typical of your reading style? Please check (Yes / No).
   a. Talking about the pictures
   b. Asking your child questions about the story
   c. Asking your child to tell parts of the story
   d. Relating parts of the story to personal experiences (e.g., "Oh, remember when we built a snowman.")
   Other
5. Does your child experience storytime in locations other than your home (e.g., preschool, library)? If so, please describe.
6. Do you read patterned stories such as "Mortimer" or "The Paper Bag Princess" by Robert Munsch? Yes / No
   If yes, please elaborate (frequency of exposure, your child's response to these books...).
7. What are some of your child's favourite stories?
8. How does your child respond when you introduce a new story?
9. What happens if you accidentally make a mistake when you read a story (e.g., misname a character, leave something out)? Can you recall a time when that happened?

Appendix C
Original Version of "Splish Splash"

One day a little girl named Sarah made twenty very messy mud-pies in the back yard. Sarah's mother took one look at her, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that." So the mother took Sarah upstairs to the bathroom. Sarah took off all her dirty clothes and her mother wrapped her in a big towel.
Then the mother filled the bathtub with water, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that."

Sarah stuck her big toe into the bathtub. But then she said, "Oh Mommy, the water's too cold. The water must be nice and warm." So the mother reached for the tap and turned on the hot water until it was nice and warm.

Then Sarah stuck her foot into the bathtub and said, "I like the nice warm water. But Mommy, I just can't have a bath. I must have my red washcloth." So the mother opened the bathroom cupboard and pulled out all of the washcloths until she found Sarah's red washcloth. Then she dropped it into the bathtub, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that."

Sarah squeezed all the water out of the washcloth. But then she said, "Oh Mommy, I just can't have a bath. I must have my big bar of soap." So the mother dug through the soap drawer until she found a big bar of soap. Then she rubbed the soap on the red washcloth, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that."

Sarah took the washcloth. But then she said, "Oh Mommy, I just can't have a bath. I must have my little rubber duckie." So the mother looked all over the bathroom, but she couldn't find Sarah's little rubber duckie. Then she looked all over Sarah's bedroom, but she still couldn't find Sarah's rubber duckie. Then she looked all over the house, but she still couldn't find the rubber duckie.

Sarah waited and waited for her mother to come back. But Sarah got tired of sitting in the bathroom. So she dried her foot, dressed in all her dirty clothes again, and ran outside to make more mud-pies.

Appendix D
Cloze (Joint Story Retell) Version of "Splish Splash"

One day a little girl named Sarah made twenty very messy mud-pies in the back yard.
Sarah's mother took one look at her, and said, "(1) ________." So the mother took Sarah upstairs to the bathroom and filled the bathtub with water.

Then, the little girl named (2) ________ stuck her big toe into the bathtub and said, "Oh Mommy, the water's too cold. The water must be nice and warm." So the mother reached for the tap and (3) ________. The water was just right.

Then, Sarah started to wash her foot with her red washcloth. But then she said, "Oh Mommy, I just can't have a bath. I must have a (4) ________." So the mother dug through the soap drawer and found a big bar of soap.

But then Sarah said, "Oh Mommy, I just can't have a bath. I must have my (5) ________." So the mother looked all over the bathroom, and all over the bedroom, but she couldn't find Sarah's little rubber duckie. Then she looked all over the (6) ________, but she still couldn't find the rubber duckie.

Sarah waited and waited for her mother to come back. But Sarah got tired of sitting in the bathroom. So she dried her nice clean foot, dressed in all her (7) ________ again, and ran (8) ________ to make more mud-pies.

Total: /8

Appendix E
Comprehension Questions: Form A

1. What does Sarah's mother say?
   a. Is Sarah dirty?
   b. Does mom want Sarah to go outside and play?
2. Who stuck her big toe into the bathtub?
3. Mom filled the bathtub with water. Sarah stuck her big toe in the bathtub. Then mom turned on the hot water. Why?
   Follow-up: Was the water too cold for Sarah?
4. Does Sarah have everything for her bath?
   a. What does Sarah need?
5. Where did mom look for Sarah's little rubber duckie?
   a. Did mom say, "No, you can't have your little rubber duckie."?
6. Sarah got tired of waiting in the bathroom. Where did Sarah go?
7. Did Sarah put on her dirty clothes?
8. Did Sarah want to take a bath?
   a. Why/why not?

Total: /10

Appendix F
Comprehension Questions: Form B

1. What does Sarah's mother say?
   a. Is Sarah clean?
   b. Does mom want Sarah to take a bath?
2. Who stuck her big toe into the bathtub?
3. Mom filled the bathtub with water.
Sarah stuck her big toe in the bathtub. Then mom turned on the hot water. Why?
   Follow-up: Was the water just right for Sarah?
4. Does Sarah want her big bar of soap?
5. Where did mom look for Sarah's little rubber duckie?
   a. Did mom look and look for Sarah's little rubber duckie?
6. Sarah got tired of waiting in the bathroom. Where did Sarah go?
7. Did Sarah put on clean clothes?
8. Did Sarah want to play outside?
   a. Why/why not?

Total: /9

Appendix G
Expectancy Violation Detection Task Version of "Splish Splash"

One day a little girl named Sarah made twenty very messy mud-pies in the back yard. Sarah's mother took one look at her, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that." So the mother took Sarah upstairs to the bathroom. Sarah squeezed (took off) all her dirty clothes and her mother wrapped her in a big towel. Then the mother filled the bathtub with water, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that." [within story substitution]

The mother (Sarah) stuck her big toe into the bathtub. But then she said, "Oh Mommy, the water's too cold. The water must be nice and warm." [within story substitution]

So the mother reached for the tap and turned on the cold (hot) water until it was nice and warm. Then Sarah stuck her foot into the bathtub and said, "I like the nice warm water. But Mommy, I just can't have a bath. I must have my red washcloth." [goal disruption]

So the mother opened the bathroom cupboard and pulled out all of the washcloths until she found Sarah's red washcloth. Then she dropped it into the bathtub, and said, "Oow pooey. Let's make a mess: Mud-pies and french fries and stuff like that." ("Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that.") [goal disruption and prosodic alteration]

Sarah squeezed all the water out of the washcloth. But then she said, "Oh Mommy, I just can't have my messy mud-pies (a bath).
I must have my big bar of soap." [within story substitution]

So the mother dug through the soap drawer until she found a big bar of soap. Then she rubbed the soap on the red washcloth, and said, "Splish, splash, Sarah needs a bath. Mommy says you're dirty and she can't have that." Sarah took the washcloth. But then she said, "Oh Mommy, I just can't have a bath. I must have my dirty clothes (little rubber duckie)." [within story substitution]

So the mother looked all over the bathroom, but she couldn't find Sarah's little rubber duckie. Then she looked all over Sarah's bedroom, but she still couldn't find Sarah's rubber duckie. Then she looked all over the house, but she still couldn't find the rubber duckie. Sarah waited and waited for her mother to come back. But Sarah got tired of sitting in the bathroom. So she dried her foot, put on her red washcloth (dressed in all her dirty clothes again), and jumped in the tub to take a bath (ran outside to make more mud-pies). [within story substitution / goal disruption]

Appendix H
Joint Story Retell Scoring Key

Target Response .......... Acceptable Variations
1) Splish splash, Sarah needs a bath. Mommy says you're dirty and she can't have that. .......... Splish splash, Sarah needs a bath; Take a bath; You need a bath; You're dirty
2) Sarah .......... none
3) turned on the hot water .......... in hot water; made the water hot; made it hot
4) big bar of soap .......... (bar of) soap; big soap
5) rubber duckie .......... rubber duck; duck(ie)
6) house .......... home; place
7) dirty clothes .......... grubby clothes
8) outside .......... out the door; out in the backyard

Appendix I
Wh-Question Scoring Key

Question .......... Acceptable Responses
1) What does Sarah's mother say? .......... Splish splash, Sarah needs a bath; Take a bath; You need a bath; You're dirty
2) Who stuck her big toe into the bathtub? .......... Sarah; The little girl
3) Mom filled the bathtub with water. Sarah stuck her big toe into the bathtub. Then mom turned on the hot water. Why? .......... To make it nice and warm; Sarah wants warm water; The water was too cold
4) What does Sarah need? .......... Red washcloth, Big bar of soap, or Rubber duckie
5) Where did Mom look for Sarah's rubber duckie? .......... Bathroom, Bedroom, or All over the house
6) Where did Sarah go? .......... Outside; In the backyard; Out the door
7) Did Sarah want to take a bath? Why not? .......... She wanted to play outside; She wanted to make mud-pies; She was tired of waiting; She didn't have her soap/rubber duckie/washcloth; The water was too cold
8) Did Sarah want to play outside? Why? .......... She wanted to make mud-pies; She didn't want to take a bath

Appendix J

The University of Western Ontario
Vice-President (Research), Ethics Review Board, Dental Sciences Building

REVIEW BOARD FOR HEALTH SCIENCES RESEARCH INVOLVING HUMAN SUBJECTS
1998-99 CERTIFICATION OF APPROVAL OF HUMAN RESEARCH

All health sciences research involving human subjects at the University of Western Ontario is carried out in compliance with the Medical Research Council of Canada "Guidelines on Research Involving Human Subjects."

1998-99 Review Board Membership
[Numbered list of board members and their affiliations; illegible in this copy.]
Alternates are appointed for each member.

The Review Board has examined the research project entitled "The joint story retell as a measure of young children's comprehension of familiar stories" (Review No: W21), as submitted by Dr. E. Skarakis-Doyle, Communication Sciences and Disorders, Elborn College, and considers it to be acceptable on ethical grounds for research involving human subjects under conditions of the University's policy on research involving human subjects.

Approval date: 03 November 1998 (Letter of Information)
Agency:
London, Ontario, Canada N6A 5C1. Telephone (519) 661 3036. Fax (519) 661 3875.

References

Anastasi, A. (1988). Validity: Basic concepts. In Psychological testing (6th ed., pp. 139-164). New York: Macmillan Publishing Co.
Applebee, A. N. (1978). The child's concept of story. Chicago: University of Chicago Press.
Bates, E. (1993). Comprehension and production in early language development.
Monographs of the Society for Research in Child Development, 58(3-4), 222-242.
Carlisle, J. F. (1991). Planning an assessment of listening and reading comprehension. Topics in Language Disorders, 12(1), 17-31.
Chapman, R. (1978). Comprehension strategies in children. In J. F. Kavanaugh & W. Strange (Eds.), Language and speech in the laboratory, school, and clinic (pp. 309-327). Cambridge, MA: MIT Press.
Dale, P. (1996). MacArthur Communicative Development Inventory-III (CDI-III). Seattle, WA.
Dempsey, L., Perfetti, T., & Skarakis-Doyle, E. (1999). Early literacy questionnaire. Unpublished manuscript, University of Western Ontario, London, Ontario.
Diehl-Faxon, J., & Dockstader-Anderson, K. (1985). Discourse intonation patterns of mothers reading to their young children... readerese. National Reading Conference Yearbook, 34, 300-305.
Dietrich, T., Freeman, C., & Griffin, P. (1979). Assessing comprehension in a school setting. In R. W. Shuy (Series Ed.) & P. Griffin (Vol. Ed.), Papers in applied linguistics: Linguistics and reading series: Vol. 3. Arlington TX: Center for Applied Linguistics.
Feagans, L., & Farran, D. C. (1981). How demonstrated comprehension can get muddled in production. Developmental Psychology, 17(6), 718-727.
Glazer, J. I. (1991). Literature for young children (3rd ed.). New York: Macmillan Publishing Co.
Heath, S. B. (1986). What no bedtime story means: Narrative skills at home and school. In B. B. Schieffelin & E. Ochs (Eds.), Language socialization across cultures: Studies in the social and cultural foundations of language, No. 3 (pp. 97-124). New York: Cambridge University Press.
Hudson, J., & Nelson, K. (1983). Effects of script structure on children's story recall. Developmental Psychology, 19(4), 625-635.
Lynch, P. (1986). Using big books and predictable books. Toronto, ON: Scholastic-TAB Publications Ltd.
McCabe, A. (1996). Evaluating narrative discourse skills. In K.
Cole, P. Dale, & D. Thal (Eds.), Assessment of communication and language (pp. 121-142). Baltimore: Paul H. Brookes Publishing Co.
McCartney, K. A., & Nelson, K. (1981). Children's use of scripts in story recall. Discourse Processes, 4, 59-70.
McCauley, R. J., & Swisher, L. (1984). Psychometric review of language and articulation tests for preschool children. Journal of Speech and Hearing Disorders, 49, 34-42.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from persons' responses and performances as scientific inquiry into score meaning. In A. E. Kazdin (Ed.), Methodological issues & strategies in clinical research (2nd ed., pp. 241-260). Washington, DC: American Psychological Association.
Miller, J. F., & Paul, R. (1995). The clinical assessment of language comprehension. Baltimore: Paul H. Brookes Publishing Co.
Nelson, K. (1978). How children represent knowledge of their world in and out of language: A preliminary report. In R. S. Siegler (Ed.), Children's thinking: What develops? (pp. 255-273). Hillsdale, NJ: Lawrence Erlbaum Associates.
Pace, A. J., & Feagans, L. (1984). Knowledge and language: Children's ability to use and communicate what they know about everyday experiences. In L. Feagans, C. Garvy, & R. Golinkoff (Eds.), The origins and growth of communication (pp. 268-280). NJ: Norwood.
Rees, N. S., & Shulman, M. (1978). I don't understand what you mean by comprehension. Journal of Speech and Hearing Disorders, 43, 208-219.
Scarborough, H. S., & Dobrich, W. (1994). On the efficacy of reading to preschoolers. Developmental Review, 14(3), 245-302.
Schneider, W., & Pressley, M. (1997). Memory development between two and twenty (2nd ed.). Mahwah, NJ: Lawrence Erlbaum Associates Inc.
Skarakis-Doyle, E. (1998). Emergence of young children's comprehension monitoring of stories. Unpublished manuscript.
Skarakis-Doyle, E., & Wootton, S. (1998).
Measuring preschool children's discourse comprehension: A modified retell procedure. Paper presented at the 19th Annual Symposium on Research in Child Language Disorders, Madison, WI.
Slackman, E., & Nelson, K. (1984). Acquisition of an unfamiliar script in story form by young children. Child Development, 55, 329-340.
Snow, C. E., Perlmann, R., & Nathan, D. (1987). Why routines are different: Toward a multiple-factors model of the relation between input and language acquisition. In K. E. Nelson & A. van Kleeck (Eds.), Children's language: Volume 6 (pp. 65-97). NJ: Lawrence Erlbaum Associates Inc.
Stein, N. (1988). The development of children's storytelling skill. In M. B. Franklin & S. Barten (Eds.), Child language: A book of readings (pp. 282-297). New York: Oxford University Press.
Thal, D. (1991). Language and cognition in normal and late-talking toddlers. Topics in Language Disorders, 11, 33-42.
Thal, D., Tobias, S., & Morrison, D. (1991). Language and gesture in late talkers: A 1-year follow-up. Journal of Speech and Hearing Research, 34, 604-612.
Tyler, L. (1991). The distinction between implicit and explicit language functions: Evidence from aphasia. In A. D. Milner & M. D. Rugg (Eds.), The neuropsychology of consciousness (pp. 159-178). New York: Academic Press.
Warr-Leeper, G., Miller, L., Brac, M., Culhane, R., Bernhard, K., & Yearous, J. (1997). Use of questionnaires to identify children at risk of hearing impairment. Poster session presented at the annual OSLA Conference, Toronto, Ontario.
Wootton, S., & Skarakis-Doyle, E. (1995). Splish Splash. Unpublished manuscript.
Zimmerman, I., Steiner, V., & Pond, R. (1979). Preschool Language Scale. Columbus, OH: Charles E. Merrill.