
UNIVERSITEIT VAN AMSTERDAM
Graduate School for Humanities
MA General Linguistics
Morphosyntactic Assessment in SLI
Standardized testing versus language sample analysis
June 17, 2014
Acknowledgements
Abstract
BACKGROUND:
Morphosyntax is an area of language development often affected in
children with specific language impairment (SLI). Morphosyntactic development can
either be assessed by standardized norm-referenced tests or by language sample analysis.
Clearly, each method has its own strengths and weaknesses. Comparable studies as well
as clinical observations have shown mismatches between the results of both methods.
Yet, a comparison of the results of the two methods has not been conducted for Dutch
instruments. AIMS: In the current thesis, the central question is whether performance on standardized language tests and the results of language sample analysis lead to comparable conclusions about the language development of children with SLI. Since morphosyntax is the only language domain assessed by all three diagnostic instruments, this domain will be the focus of the current study. METHODS & PROCEDURE: Test files of ten 4;0-8;0 year-old children clinically diagnosed with SLI, tested with either the CELF-4-NL and the STAP or the Schlichting Test for Sentence Development and the STAP, were assessed qualitatively. Group performance, individual performance and problem
structures were evaluated. RESULTS: The group-level analysis showed that complexity scores of the STAP are higher than grammaticality scores for nearly all children; that norm-referenced tests correspond most to the complexity scores of the STAP; and that the STAP scores in general correspond most to the Schlichting and to the CELF subtest Formulated
Sentences. Analysis of the individual subjects showed a low correspondence between
performances on both methods. In only three out of ten subjects, both test methods led to
the same conclusions. The methods also differed considerably in the structures they
identified as problem structures. CONCLUSION: The current study reveals only a slight
correspondence between performance on standardized tests for morphosyntactic
assessment and the results of the instrument for language sample analysis. Although in practice both methods have proved to be good clinical tools for the assessment of children with SLI, these results indicate that caution is needed in choosing one method over the other, since the tasks are not equivalent. Until more extensive studies prove otherwise, the current procedure (i.e. the use of spontaneous language analysis in addition to standardized language tests) seems justified.
Table of Contents
Acknowledgements
Abstract
1. Introduction
2. Background
   2.1 Diagnosing children with specific language impairment
   2.2 Comparison of diagnostic instruments used to assess language disorders
   2.3 The current study
3. Linguistic and clinical aspects tested by the CELF, Schlichting and STAP
   3.1 CELF-4-NL
      3.1.1 Materials
      3.1.2 Procedure
      3.1.3 Scores and interpretation
      3.1.4 Psychometric properties
   3.2 Schlichting Test for Language Production
      3.2.1 Materials
      3.2.2 Procedure
      3.2.3 Scores and interpretation
      3.2.4 Psychometric properties
   3.3 STAP
      3.3.1 Materials
      3.3.2 Procedure
      3.3.3 Scores and interpretation
      3.3.4 Psychometric properties
   3.4 Comparison of the CELF, Schlichting and STAP
4. Methods
   4.1 Subjects
   4.2 Data collection
   4.3 Analysis
5. Results
   5.1 Scores
   5.2 Case studies
6. Conclusion & Discussion
References
Appendix 1 Subtests of the CELF-4-NL used in the four-level assessment process
Appendix 2 Structures assessed by the Schlichting Test for Sentence Development
Appendix 3 Structures assessed by the STAP
Appendix 4 STAP-profile & STAP-summary
Appendix 5 Comparison of structures assessed by the CELF-4-NL, Schlichting Test for Sentence Development and STAP
Appendix 6 Example items of morphosyntactic structures shared by the CELF-4-NL and Schlichting Test for Sentence Development
List of Tables and Figures
Table 2.1 The EpiSLI system
Table 2.2 Strengths and weaknesses of norm-referenced tests and language sample analysis
Table 3.1 Correlations between standard deviations of CELF and Sentence Development of the Schlichting
Table 3.2 Shared morphosyntactic structures assessed by the CELF, Schlichting and STAP
Table 4.1 Characteristics of subjects
Table 5.1 Scores of the children tested by the CELF and STAP
Table 5.2 Scores of the children tested by the Schlichting and STAP
Table 5.3 Comparison of the children tested by the CELF and STAP
Table 5.4 Comparison of the children tested by the Schlichting and STAP
Figure 5.1 Scatterplot of the scores of the children tested by the CELF and STAP
Figure 5.2 Scatterplot of the scores of the children tested by the Schlichting and STAP
1. Introduction
Immediately after birth, children start using their speech apparatus to produce sounds.
This mainly comes down to crying at first, but a broader variation of sounds emerges
soon after. Children are able to produce adult-like structures when they are only five years old, and by the time a child reaches his or her teens, language proficiency has developed so far that only some fine-tuning is left to do.
The current study focuses on assessment of morphosyntactic development in 4;0-8;0
year old Dutch children. Four-year-old children are known to be in the penultimate stage
of language acquisition, the stage in which they rapidly develop in all areas of language.
Although they already possess most of the adult language structures at this stage, they still need practice in using these structures consistently and correctly. Overregularization occurs frequently and exceptional structures still have to be learned. Examples of developing structures and processes mentioned by Gillis and Schaerlaekens (2000) are reflexive pronouns, derivational morphology and the fluent production of passive, long and compound sentences.
The language development of children with a language deficit deviates from, or is delayed relative to, the pattern displayed by typically developing children. Language domains
most commonly affected in children with language impairment are phonology,
morphology and syntax, but other domains may also be affected. For Dutch, Bol and
Kuiken (1988) compared the spontaneous language of nineteen 4;1 to 8;2 year old children with specific language impairment (SLI) to a group of typically developing, language-age-matched children (3;6-4;0 years old). They found some significant
differences, to the disadvantage of children with SLI. These differences included: less
frequent use of pronouns, possessive nouns and diminutives; less frequent use of
conjugations of verbs in the first person singular; and less use of phrases that include
articles and prepositions. Furthermore, children with SLI produced more incomplete
sentences consisting of only two constituents and produced fewer sentences consisting of more than four constituents. Coordinations using the conjunctions maar (‘but’) and want (‘for’) were used less frequently, as were questions including syntactic inversions consisting of three
constituents and structures in which objects and adverbials were combined. Tense,
agreement and verb argument structure were assessed by De Jong (1999), who studied
Dutch school-aged children with (grammatical) SLI. He found that the inventory of past
tense forms was limited and that, instead of using regular past tense forms, the children
preferred using a past tense form of the auxiliary gaan (‘go’) complemented by a verb
infinitive. Deficits in subject-verb agreement included omission or substitution of the
agreement morpheme and use of the infinitive instead of an inflected verb. Verb
argument structure was also affected in children with SLI. Often this meant that the
structures were low in complexity, but grammatically correct.
To assess language development in children, various instruments are being used. The
current study assesses the use of standardized language tests and an instrument for
language sample analysis in the diagnosis of Dutch children with SLI. The main question
of this thesis is: To what extent do performance on standardized language tests and
results of language sample analysis lead to comparable conclusions? This question is
relevant, because it may reveal whether both methods are interchangeable in the process
of diagnosing children with SLI.
The language tests that will be assessed in this study are the CELF-4-NL (Kort,
Schittekatte & Compaan, 2008) and the Schlichting Test for Sentence Development
(Schlichting & lutje Spelberg, 2010). The instrument for language sample analysis that
will be assessed is the STAP (Van den Dungen & Verbeek, 1994). Since morphosyntax is
the only language domain assessed by all three diagnostic instruments, this domain will
be the focus of the current study.
Based on clinical observations of the diagnostic instruments and their use, it is
expected that correspondences as well as differences will be found when results of the
different instruments are compared. Norm-referenced tests as well as language sample
analyses provide a general view of a child’s morphosyntactic knowledge. Corresponding
within-subject scores on these methods would therefore be expected. However, it is likely
that the CELF, Schlichting and STAP measure different morphosyntactic structures or
measure the same structures by using different techniques. This presumably lowers the
correspondence.
The structure of this thesis is as follows: first in chapter 2, some light will be shed on
earlier research about the diagnosis of children with specific language impairment.
Moreover, the two methods for language assessment will be discussed. Chapter 3 will
provide an evaluation of the assessment tools used in the current research. In chapter 4,
the subjects and methods of the current research will be described. Chapter 5 provides an
overview
of
the
results,
which
will
be
discussed
in
chapter
6.
2. Background
Although language impairments can be caused by a variety of underlying problems, this
study focuses on the language development of children with a primary language disorder.
The criteria adopted by various researchers and in research instruments for identifying
children with SLI will be discussed in section 2.1. Section 2.2 provides a comparison of two methods for language assessment, and section 2.3 discusses the current study.
2.1 Diagnosing children with specific language impairment
Specific language impairment is a developmental disorder generally defined by exclusion
criteria. Although the language of children with SLI deviates from that of typically developing children, their nonverbal IQ is within normal limits and they do not have sensory problems. Furthermore, these children have had normal opportunities for language learning and do not show signs of any other developmental disorders (Bishop, 1992; Leonard, 2000). Inclusion criteria are less frequently and consistently used. A
number of studies did however formulate such criteria. Stark and Tallal (1981) devised a
method for selecting children with SLI, based on both inclusion and exclusion criteria.
For children with a performance IQ score of at least 85 who passed the other exclusion
criteria described above, complementing inclusion criteria were applied. These children
needed to have: a combined language age (LA) score of at least 12 months below either
mental age (MA) or chronological age (CA), whichever was the lower; a receptive LA of
at least 6 months below MA or CA; or an expressive LA of at least 12 months below MA
or CA. Out of the 132 children aged 4;0-8;6 clinically classified as being language
impaired, only 39 (29.54%) were selected as having SLI based on the inclusion criteria by
Stark and Tallal. The majority of the exclusions were based on low performance IQ level.
Plante (1998) argues that, although children with a non-verbal IQ below 70 should be
excluded from SLI studies, a cut-off score of 85 might be too high. This argument was
based on an earlier study by Swisher, Plante and Lowell (1994), which demonstrated that within-subject performance on IQ tests varied by a mean of 10 points, depending on the IQ measurement method used. Another objection to the method
by Stark and Tallal came from Lahey (1990), who argued that identification of children
with SLI should be based on a comparison of language age to chronological age rather than to mental age; the former comparison is the one most frequently used nowadays.
Stark and Tallal discuss the heterogeneity of the remaining children who were found
to have SLI. Although children with severe expressive language deficits and articulation
deficits had been excluded, the severity and nature of the deficits found in the remaining
39 children varied greatly. This led them to propose that ‘the classificatory term “Specific
Language Deficit” may be a misleading one’ (Stark & Tallal, 1981: 122), because it does
not refer to one single deficit. This proposition agrees with the more recent thoughts and
findings about the existence of subtypes of SLI (Conti-Ramsden et al., 1997; Van Daal et
al., 2004). Some of these proposed subtypes predominantly address morphosyntactic
problems. In other subtypes the focus is more on phonological, articulation, semantic or
pragmatic problems. Rapin and Allen (1987) proposed subtypes of SLI, including a
phonologic-syntactic deficit and a lexical-syntactic deficit. The first deficit was
characterized by articulation, phonology, morphology and syntax problems; the latter included children with syntax and morphology problems, word-finding difficulties and expressive problems. These subtypes corresponded respectively to
‘cluster 1’ and ‘cluster 5’ found by Conti-Ramsden and her colleagues (1997). The focus
of this study will be on subjects with morphosyntactic problems who possibly fall within
one of these subtypes. Although the existence of different subtypes of SLI seems
plausible, this debate will not be elaborated any further because it falls outside the scope of the current study.
Another diagnostic system for identifying kindergarten children with SLI was
designed by Tomblin, Records and Zhang (1996). The EpiSLI system was designed to facilitate epidemiologic research on SLI. The system consisted of previously
existing standardized tasks, addressing three domains of language: vocabulary, grammar
and narrative; and two modalities: comprehension and production. Vocabulary was tested
through a picture identification task and an oral identification task. Grammatical
knowledge was tested by a grammatical comprehension task, a sentence imitation task
and a grammatical completion task. Both the vocabulary and the grammatical tasks were
selected from the TOLD-2:P (Test of Language Development-2 Primary: Newcomer &
Hammill, 1988). A narrative production and comprehension screening task (Culatta, Page
& Ellis, 1983) was used to assess narrative competence. Individual composite scores for
each domain and each modality were calculated. This led to five scores, as shown in
Table 2.1.
Table 2.1 Specific areas of language measured by the EpiSLI system and the composite scores derived from these measures (Tomblin et al. 1996: 1287).

Modality           Vocabulary               Grammar                                    Narrative                  Modality composite
Comprehension      Picture Identification   Grammatic Understanding                    Narrative Comprehension    Comprehension Composite
Expression         Oral Vocabulary          Grammatic Completion, Sentence Imitation   Narrative Recall           Expression Composite
Domain composite   Vocabulary Composite     Grammar Composite                          Narrative Composite
The diagnostic system and its composite scores as designed by Tomblin and his
colleagues concur with the CELF-4-NL described in section 3.1. Tomblin et al. (1996)
statistically computed the diagnostic standard in which ‘a language impairment is
indicated if the child failed two or more of the five composite measures, where failing
was a z-score of -1.25 or less’ (Tomblin et al. 1996: 1288). A total of 7,019 kindergarten
children were sampled and screened for possible language impairment. Subsequently a
subgroup of 1,502 children with the same proportion of screening failures as in the
original group of 7,019, was given the diagnostic battery in Table 2.1. Of these children,
13.58% were labeled as having a language impairment. This rate was consistent with the
expectations of clinical standards as well as with the only other large-scale epidemiologic
study of language impairment in kindergarten children. However, it was not a good
predictor of the prevalence of SLI, because exclusion criteria were not employed. Further
research by Tomblin and his colleagues (1997) provided an estimated prevalence rate of
7.4%. This agrees with the overall estimated prevalence rate of 6% to 10% for school-age
children, as reported by the DSM-IV-TR (American Psychiatric Association, 2000).
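Read as a decision procedure, the EpiSLI standard amounts to counting failed composites. The sketch below is an illustration only: the function name and the example z-scores are hypothetical, while the cut-off of -1.25 and the requirement of two or more failed composites follow Tomblin et al. (1996).

    # Sketch of the EpiSLI diagnostic standard (Tomblin et al., 1996): a child is
    # flagged as language impaired when two or more of the five composite
    # z-scores are -1.25 or less.
    def episli_language_impaired(composites, cutoff=-1.25, min_failures=2):
        """composites: dict mapping composite name -> z-score."""
        failed = [name for name, z in composites.items() if z <= cutoff]
        return len(failed) >= min_failures, failed

    # Hypothetical example scores for the five composites
    example = {"vocabulary": -0.8, "grammar": -1.6, "narrative": -1.3,
               "comprehension": -0.9, "expression": -1.4}
    print(episli_language_impaired(example))
    # (True, ['grammar', 'narrative', 'expression'])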
In the Netherlands, children are labeled as being language impaired when they fail at
least two separate language tests on two different domains, where failing is defined as
scoring -1.5 standard deviations or less. The domains that can be used to identify
language impairment are (a) speech; (b) auditory processing; (c) grammar; and (d)
lexical/semantic development. Alternatively, children who score -2 standard deviations or
less on a general language test will also be diagnosed with SLI. In both cases cognitive
functioning cannot be an underlying cause of the language impairment and speech
therapy should have been proven to be unsuccessful (Voogd, 2009). Tests frequently used by Dutch speech therapists to demonstrate morphosyntactic impairment are the CELF-4-NL (Kort, Schittekatte & Compaan, 2008) and the Schlichting tests for sentence
production (Schlichting & lutje Spelberg, 2010). These tests, along with a Dutch
assessment tool for sample analysis (STAP: van den Dungen & Verbeek, 1994), will
therefore be the focus of this study.
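The Dutch criterion can be read as a similar decision rule. The sketch below is a simplification that uses one score per domain and leaves out the additional conditions (cognitive functioning and the lack of success of speech therapy); the function name and the example scores are hypothetical, while the -1.5 and -2 standard deviation cut-offs and the two-domain requirement follow the description above.

    # Simplified sketch of the Dutch criterion: at least two different domains
    # failed at -1.5 SD or less, or a general language test failed at -2 SD or less.
    def dutch_sli_criterion(domain_scores, general_score=None):
        """domain_scores: dict mapping domain -> score in standard deviation units."""
        failed_domains = [d for d, sd in domain_scores.items() if sd <= -1.5]
        if len(failed_domains) >= 2:
            return True
        return general_score is not None and general_score <= -2.0

    # Hypothetical example: grammar and lexical/semantic development both fail
    print(dutch_sli_criterion({"speech": -0.5, "auditory processing": -1.0,
                               "grammar": -1.8, "lexical/semantic": -1.6}))  # True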
2.2 Comparison of diagnostic instruments used to assess language disorders
Speech therapists and speech-language pathologists generally use standardized language
tests to assess children with language problems. These tests are often preferred because
they are norm-referenced and relatively easy and quick to administer. Ornstein (1993)
reports a number of strengths of norm-referenced tests, including: ‘(a) they assume
statistical rigor in that they are reliable and valid; (b) the quality of test items is often high
in that they are developed by test experts, pilot tested and have undergone revision prior
to publication and use; and (c) administration procedures are standardized and the test
items are designed to rank examinees for the purpose of placing them in specific
programs or instructional groups’ (Ornstein, 1993, as cited in Ford, 2009).
McCauley and Swisher (1984) agree with the notion that, if properly used, norm-referenced tests are useful for diagnostics. However, they do argue that test administration and processing are prone to error, which is detrimental to test reliability and
validity. To support this claim, they discuss four errors commonly made when using
norm-referenced tests. The first error frequently made arises when age-equivalent scores
are used. The authors warn that these scores should be used with caution, because they
are psychometrically imprecise and may lead to misinterpretation of the results. They
suggest that summarizing test results with standard scores or percentile ranks, as is the
case with the Dutch language tests evaluated in this study, serves as a valuable
alternative. A second error may occur when interpreting the test profile. When norm-referenced tests are used, the standard error of measurement and the confidence interval
are often provided to calculate the range in which the child’s true score is expected to be
found. However, the scores on a profile are only estimates of the true score that a child
would obtain if the scores were truly reliable and error-free. The deviation between a pair of scores on a profile can therefore be interpreted as a difference in performance, while this difference might just as well originate from measurement error. This may result in wrong
conclusions about an individual’s strengths and weaknesses. Moreover, children will
sometimes vary in their performance and therefore vary in their test scores, without the
existence of a language problem. The third error specified by McCauley and Swisher is
the use of repeated testing as a means of assessing progress. Because norm-referenced
tests are designed to look at differences between individuals, they often consist of items
that cover a broad range of skills. The authors claim that, because of the limited number
of items that assess the individual skills, norm-referenced tests are likely to be less
sensitive to changes in behavior over time. The last error discussed by the authors is the
use of test items in planning goals for therapy. First of all, they state that the number of
items on a norm-referenced test is too small to draw conclusions, since not all forms and
all developmental levels of the skills tested are covered, and conclusions cannot be drawn
from individual errors. Secondly, most test profiles fail to provide enough detailed
information to be used when planning goals for therapy.
A study by Merrell and Plante (1997) supported the claim that norm-referenced tests are not suitable for planning therapy goals. They examined the extent to which norm-referenced tests are qualified to answer the questions “Is there a language impairment?”
and “What are the specific areas of deficit?”. By investigating the first question, the
authors assessed the tests’ sensitivity and specificity: the former addressing the tests’
accuracy in identifying children with language impairment, and the latter addressing their
accuracy in identifying typically developing children. Merrell and Plante tested 40
preschool children (20 with SLI and 20 typically developing children) with the Test for
Examining Expressive Morphology (TEEM) and the Patterned Elicitation Syntax Test
(PEST). After cut-off scores were statistically computed to discriminate maximally
between the two groups of children, sensitivity and specificity of both tests were still high
enough for accurate discrimination. The specific areas of deficit, however, could not
easily be identified because performance on similar structures varied in both tests. This
possibility of variable performance led Merrell and Plante to conclude that individual
items could not be used to demonstrate mastery or deficit of specific structures and thus
could not be used when planning goals for therapy.
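For reference, sensitivity and specificity as used above can be written as simple proportions. The sketch below only illustrates these general definitions; the counts in the example are hypothetical and are not taken from Merrell and Plante (1997).

    # General definitions, not specific to Merrell and Plante's computations:
    # sensitivity = proportion of language-impaired children correctly identified,
    # specificity = proportion of typically developing children correctly identified.
    def sensitivity(true_positives, false_negatives):
        return true_positives / (true_positives + false_negatives)

    def specificity(true_negatives, false_positives):
        return true_negatives / (true_negatives + false_positives)

    # Hypothetical counts for 20 children with SLI and 20 typically developing children
    print(sensitivity(18, 2))  # 0.9
    print(specificity(17, 3))  # 0.85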
McCauley and Swisher (1984) claim that language sampling is an important
alternative to norm-referenced tests when describing children’s expressive language.
Although this method is time consuming and therefore was infrequently used at the time
of their writing, the authors claim it to be ‘a fertile source of suitable therapy objectives’
(McCauley & Swisher, 1984: 345). Additionally, language sample analysis can be
deployed when evaluating progress in a child’s language. This is because, contrary to
norm-referenced tests, no learning effect occurs (Van den Dungen & Verboog, 1998).
Besides being a valuable tool for the detailed analysis of specific structures and
deficits, the use of language samples has other benefits. First of all, research by Dunn,
Flax, Sliwinski and Aram (1996) indicated that measures of spontaneous language might
be more sensitive than standardized tests. These authors compared clinical judgement,
standardized test performance and measures of spontaneous language of preschool
children. They found that children who were clinically diagnosed as language impaired
had deficits in spontaneous language even though they did not fail the standardized test.
This suggests that the measures of spontaneous language reflect language difficulties of a
child that were not assessed by the standardized test. Furthermore, language samples are
more ecologically valid than standardized tests, because they naturally reflect the child’s
language competence. The samples show the grammatical forms and the vocabulary used
by the child, but they also demonstrate how the child uses language to share information
with a listener. Finally, language sample analysis is also effective when diagnosing
children who are difficult to test with standardized tests, for instance because of
behavioural problems or high levels of performance anxiety (Costanza-Smith, 2010). In
relation to this, Stockman (1996) discusses the value of language sample analysis for
linguistic minority children. She claims that this method is more cultural sensitive, valid,
accessible and flexible than standardized tests in the process of diagnosing linguistic
minority children.
Apart from being time-consuming, language sample analysis has another
disadvantage. Eisenberg (1996) points out the inability to draw conclusions about non- or
infrequently produced structures. When using language sampling as a diagnostic tool,
frequency of production is used as evidence that a child has acquired a particular
structure. However, if the child’s production of a structure does not meet the criterion
frequency, it is unclear whether this reflects a child’s lack of knowledge about the
structure, or whether it reflects other factors such as a lack of opportunities to use the structure in
the discourse situation. Eisenberg therefore suggests that language sample analysis may
not be sufficient for studying all aspects of child language, and that some form of elicited
production is desirable.
As shown above, the use of both methods for language analysis is subject to debate.
An overview of the discussed strengths and weaknesses of both instruments is provided
in Table 2.2.
Table 2.2 Strengths and weaknesses of norm-referenced tests and language sample analysis

Norm-referenced tests
  Strengths:
  - High reliability and validity (Ornstein, 1993)
  - High quality of test items (Ornstein, 1993)
  - Ranking of examinees in specific programs or instructional groups (Ornstein, 1993)
  - Standardized procedures (Ornstein, 1993)
  Weaknesses:
  - Measurement error often interpreted as performance error (McCauley and Swisher, 1984)
  - Less suitable for assessing progress and planning goals for therapy (McCauley and Swisher, 1984)
  - Age-equivalent scores are prone to error (McCauley and Swisher, 1984)

Language sample analysis
  Strengths:
  - Detailed analysis of language production (e.g. Merrell and Plante, 1997; Prutting et al., 1975; Blau et al., 1984)
  - Suitable for difficult-to-test children and linguistic minority children (Costanza-Smith, 2010; Stockman, 1996)
  - High ecological validity (Costanza-Smith, 2010)
  - High sensitivity (Dunn et al., 1996)
  - No learning effect (Van den Dungen & Verboog, 1998)
  Weaknesses:
  - Time consuming (Ornstein, 1993)
  - Difficult to distinguish lack of knowledge from lack of production opportunities in cases of infrequently produced structures (Eisenberg, 1996)
Some of the strengths and weaknesses shown in Table 2.2 are disputable or differ in relevance depending on the specific tests used for assessment. The implications of Table 2.2 for the Dutch tests are discussed in chapter 6.
In the past decades, several studies have been conducted on the use of norm-referenced
tests and language sample analysis in the diagnosis of children with language impairment.
Often these studies have shown a mismatch between the results on both test methods.
Prutting, Gallagher and Mulac (1975) investigated the relationship between the
syntactic structures produced on the Northwestern Syntax Screening Test (NSST) and the
same structures produced in a spontaneous language sample. The NSST elicits syntactic
structures using a type of sentence repetition. The examiner showed 20 pairs of pictures
with grammatical distinctions to 12 four- and five-year-old children with language delay.
The pictures were introduced by using pre-determined sentences. Subsequently, the target
structures were elicited by the examiner pointing at the picture and asking ‘What is this
one?’ or ‘What is that one?’. For a child to have acquired a structure, the produced
sentence form had to be identical to the form used by the examiner. Besides
administration of the NSST, two language samples were collected and structures identical
to those of the NSST were used for further investigation. Comparison of both methods
demonstrated that 30% of the children failed to produce a grammatical distinction on the
NSST, but correctly generated this distinction in spontaneous language. The authors
concluded that these results indicated that item analysis of the NSST did not accurately
represent the children’s spontaneous language skills, which made the NSST an instrument
merely suitable for screening. A spontaneous language sample could in their opinion be
used as a diagnostic tool to analyse specific syntactic structures. Although these results
seem straightforward, the authors only reported whether the failed items of the NSST
were produced correctly in spontaneous language. They did not address NSST
performance of the structures produced incorrectly in spontaneous language. The current
study attempts to look at error production by using both the tests and the spontaneous speech as a starting point for analysis.
A similar conclusion to that of Prutting et al. (1975) was drawn by Blau, Lahey and
Oleksiuk-Velez (1984), who studied whether the Carrow Elicited Language Inventory
(CELI) could be used when developing goals for language intervention. The authors
tested ten children with language impairment with the CELI and additionally analysed a
language sample of the children. As with the findings by Prutting, Gallagher and Mulac,
all of the children made fewer errors in their language sample than on the CELI, but
correlations between the scores of the CELI and the language sample were high enough
to conclude that the CELI could function as a diagnostic tool. However, based on goals
that were determined in an earlier stage of assessment, most errors produced on the CELI
were not considered immediate goals for intervention. The language samples, on the other hand, did lead to these specific goals and also provided content and context in which
the goals could be taught.
2.3 The current study
The mismatches between norm-referenced tests and language sample analysis found in earlier studies agree with the author's clinical observations of Dutch children who show a
mismatch between performance on language tests and performance in spontaneous
language. The current study therefore attempts to assess and compare performance on
two norm-referenced tests for language production on the one hand, and a language
sampling method for Dutch children with SLI on the other hand. The main question of
this thesis is: To what extent do performance on norm-referenced language tests and
results of language sample analysis lead to comparable conclusions? To answer this
question, multiple case-studies will be conducted and the following questions will be
answered:
1. To what extent does group performance on the norm-referenced tests correspond
to the results of the language sample analysis?
2. To what extent do individual scores on the CELF and Schlichting correspond to
the results of the STAP?
3. Are the problem structures identified by the CELF or Schlichting also identified
by the STAP and vice versa?
These questions will be discussed in chapters 4 and 5. The next chapter provides a
description of the CELF, Schlichting and STAP.
3. Linguistic and clinical aspects tested by the CELF, Schlichting and STAP
In this chapter, three diagnostic tools used to assess the language production of children
with (suspected) language impairment will be described. Sections 3.1 through 3.3 provide
a description of the individual instruments. Each section describes the goal and target
group, materials, procedure, methods for scoring and interpretation, and psychometric
properties of the diagnostic tool. These psychometric properties were derived from the
test manuals and from reports of the Dutch commission for test matters (COTAN:
Commissie Testaangelegenheden Nederland, www.cotandocumentatie.nl). Section 3.4
provides a more detailed comparison of the instruments and the structures they assess.
3.1 CELF-4-NL
The CELF-4-NL is a Dutch adaptation of the CELF-4 (Clinical Evaluation of Language
Fundamentals, fourth edition) by Semel, Wiig and Secord (2003). The CELF-4-NL
(henceforth CELF) by Kort, Schittekatte and Compaan was published in 2008. The test
was designed to provide an overview of a child’s general language ability and to assess
performance on specific language areas. Norms are available for 5;0-15;0 year old Dutch-speaking children, and estimated norm scores are available for the ages 16;0 to 18;0 (Kort
et al. 2008). To facilitate diagnosis, the CELF provides a four-level assessment process,
consisting of the following levels:
1. Identification of a possible language disorder.
2. Description of the nature of the disorder.
3. Evaluation of possible underlying deficits.
4. Evaluation of language and communication in context.
The first level is assessed by administering four pre-selected tests addressing different
language areas. These four tests together provide the Core Language Score, which
determines the presence or absence of a language disorder. For the second level,
additional tests are administered. These tests provide scores to calculate the Receptive
Language Index, Expressive Language Index, Language Content Index and Language
Structure Index. The third level explores any possible connection between the language
problem and other abilities such as memory and rapid automatic naming. The fourth level
assesses how the disorder affects the children’s classroom performance (Kort et al.,
2008). An overview of the specific subtests that are used in each level of the four-level
assessment process is provided in Appendix 1. Because of their morphosyntactic-productive nature, the main focus will be on the subtests Word Structure, Recalling
Sentences and Formulated Sentences.
3.1.1 Materials
The CELF consists of an examiner’s manual, a stimulus book, scoring forms,
Observational Rating Scale forms and Pragmatic Profile forms. The manual provides
guidelines for testing, scoring and interpreting scores, norm tables and a description of
the tests’ purpose, design and development. The flip-over stimulus book provides the
examiner with test information and model sentences, while simultaneously presenting the
test item to the child.
3.1.2 Procedure
Assessment of a child’s language has to be carried out by a speech therapist familiar with
the materials and procedures. The guidelines provided by the CELF have to be followed
accurately. Test order is generally based on the four-level assessment process, but other
orders are also possible. Although the procedure may vary slightly depending on the
subtest, the general procedure is as follows:
1. The child is seated opposite the examiner, looking at the picture on the stimulus
book.
2. The examiner explains the test and administers practice items to check
understanding.
3. Test items are administered using the model sentences on the stimulus book or in
the manual. Depending on test or test-item, the examiner can repeat instructions.
If relevant, the examiner chooses the correct starting-item based on the child’s
age.
4. The examiner writes down the score and other relevant information on the score
form.
5. The examiner calculates the child’s scores.
The three subtests most relevant to this study are Word Structure, Recalling Sentences
and Formulated Sentences. In Word Structure the child is presented with a picture on the
stimulus book. The examiner initiates a sentence and urges the child to complete it, as
shown in (1).
(1) The examiner points to the picture and says:
Dit is een jongen en dit is … (‘This is a boy and this is …’)
The child completes the sentence by saying:
… een meisje (‘a girl’)
The subtest Recalling Sentences requires the child to repeat a spoken sentence, as
shown in (2).
(2) The examiner reads the following sentence:
De jongen viel en deed zich pijn (‘The boy fell and hurt himself’)
The child is stimulated to repeat the sentence without changing it.
In Formulated Sentences the child looks at a picture on the stimulus book. The examiner
provides the child with a word that has to be integrated in a sentence matching the
picture.
(3) The examiner shows a picture of children crossing a finish line in a running
competition and provides the child with the word lachend (‘laughing’)
The child creates the sentence: Lachend komen de kinderen de finish over (‘The
children cross the finish line laughing’)
3.1.3 Scores and interpretation
The CELF uses quotient scores (mean 100, standard deviation 15) and percentiles (0 to 100)
as a standard for scoring. The raw data on the score forms can be converted into these
scores by using the designated norm tables that can be found in the manual. The manual
also provides age-equivalent scores, but the authors state that these scores should be used
with caution (Kort et al., 2008).
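Assuming the quotient scores are normalized standard scores with mean 100 and standard deviation 15, a quotient score can be related to the deviation scores used in the Dutch diagnostic criteria of section 2.1. The small sketch below is an illustration only, since in practice the conversion is done with the norm tables in the manual.

    # Relate a quotient score (mean 100, SD 15) to standard deviation units,
    # assuming normalized standard scores.
    def quotient_to_sd_units(quotient, mean=100.0, sd=15.0):
        return (quotient - mean) / sd

    print(quotient_to_sd_units(77.5))  # -1.5, the cut-off for failing a single test
    print(quotient_to_sd_units(70.0))  # -2.0, the cut-off for a general language test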
3.1.4 Psychometric properties
Norms of the CELF-4-NL are based on a sample of 1356 Dutch and Flemish children
aged 5 to 15 years old. All children had lived in either the Netherlands or Flanders for at least
7 years and none of the children had a mental or physical disability. Norm groups
consisted of 77 to 152 subjects, depending on the age group (Kort et al., 2008). In 2010
the COTAN evaluated the CELF and concluded the norms for the 5 to 15 year old
children to be satisfactory (scale: good – satisfactory – poor). The reliability and construct
validity were also reported satisfactory for the three subtests used in the current study.
The criterion validity was not assessed and was therefore reported poor (Egberink et al.
2010a).
3.2 Schlichting Test for Language Production
The Schlichting Test for Language Production was originally designed as part of a test
battery to test language development of children up to the age of four years. The authors
claim that before development of the battery, which also includes a Dutch version of the
Reynell Verbal Comprehension Scale A (Reynell, 1985), no instruments were available
for testing young children with sufficient reliability and validity (Schlichting & lutje
Spelberg, 2003). The original Test for Language Production consisted of four tests that
could be used independently. These were the test for Sentence Development; a picture-based Vocabulary Test; a test for Auditory Memory; and a vocabulary checklist. In 2010 a
renewed version of the Schlichting test was published (Schlichting Test voor
Taalproductie-II: Schlichting & lutje Spelberg, 2010). In this more recent version the
vocabulary checklist was omitted and a Narrative Task and a Pseudo-Word task were
added. At the same time, the Reynell was replaced by the Schlichting Test voor
Woordontwikkeling (Schlichting Test for Word Development).
The test for Sentence Development is the subject of this study. This test assesses
grammatical production based on functional imitation of sentences, meaning that the
child (partly) imitates utterances produced by the examiner in a functional context.
Functional imitation is viewed as a good measure of syntactical knowledge, since it is
claimed that young children cannot imitate structures that are not part of their own
linguistic system (Schlichting & lutje Spelberg, 2003). The utterances were designed to
have a communicative purpose, as shown in example (4) below. The complete list of
structures tested can be found in Appendix 2.
(4) The examiner chooses a picture, puts it in a paper frame and says:
Ik denk dat ik de auto neem (‘I think I’ll take the car’)
The child is invited to imitate the examiner’s actions and says:
Ik denk dat ik de appel neem (‘I think I’ll take the apple’)
The goal of the Test for Sentence Development, as described in the manual, is to measure
the syntactic productive knowledge of Dutch children (Schlichting & lutje Spelberg,
2010). However, in an earlier stage Schlichting and lutje Spelberg acknowledged that it is
impossible to test the children’s knowledge of all syntactic structures by using only 40
items. They therefore proposed that the test assessed the child’s knowledge of ‘certain
structures in certain linguistic (and cognitive) contexts’ (Schlichting and lutje Spelberg,
2003: 249).
The test is designed to assess 2;0-7;0 year old Dutch children with a possible
language delay and is furthermore claimed to be suitable for diagnostics and the
assessment of progress (Schlichting & lutje Spelberg, 2010).
3.2.1 Materials
The Schlichting Test for Language Production consists of an examiner’s manual, a
stimulus book, stimulus materials and scoring forms. The manual provides guidelines for
testing, scoring and interpreting scores, norm tables and a description of the tests’
purpose, design and development. The stimulus book provides scenes that, in
combination with the stimulus materials, are used in different test items.
3.2.2 Procedure
Assessment of a child’s language has to be carried out by a speech therapist familiar with
the materials and procedures and the guidelines provided by the Schlichting Test for
Sentence Development have to be followed accurately. General procedure is as follows:
1. The child is seated opposite the examiner (looking at the stimulus book).
2. The examiner lays out the scene on the stimulus book that corresponds to the
practice item, explains the test and administers practice items to check
understanding.
3. Test items are administered. Answers elicited are: exact imitation; imitation with
variation; complementation; and answering. The examiner can repeat instructions
and utterances depending on the specific item. If relevant, the examiner chooses
the correct starting-item based on the child’s age.
4. The examiner writes down the utterance on the scoring form.
5. The examiner calculates the child’s scores.
3.2.3 Scores and interpretation
After the child’s utterances are written down, the examiner analyses them and scores
them as either ‘passed’ or ‘failed’. The score form provides the target responses for all
items. The total score can subsequently be converted into quotient scores (mean 100,
standard deviation 15) or percentile scores (0 to 100), using the norm table included in the
manual. For the test for Sentence Development, the quotient score is called the
‘zinsquotiënt’ (‘sentence quotient’) or ZQ for short.
3.2.4 Psychometric properties
The exact number of children included in the norm group of the test for Sentence
Development is unknown. Schlichting and lutje Spelberg (2010) report that the sample
used in the norm study for the Test for Language Production-II, ranged from 635 to 983
children per subtest, depending on the age group. The twelve age groups included 67 to
101 children. In 2010, COTAN evaluated the Schlichting Test for Language Production-II and concluded the norms to be good (scale: good – satisfactory – poor). Reliability was
rated satisfactory and construct validity was rated good. Due to minimal study of the
criterion validity by the authors, this criterion was rated poor (Egberink et al. 2010b).
3.3 STAP
Development of a first version of the STAP (Spontane Taal Analyse Procedure:
Language Sample Analysis Procedure) was initiated by Margreet van Ierland in 1975. The
definitive version of the STAP was realised by Van den Dungen and Verbeek (1994), and
was published by the department of Linguistics of the University of Amsterdam. A
theoretical motivation by Verbeek, Van den Dungen and Baker was published in 2005.
At the start of the first STAP study, barely any spontaneous language data on Dutch
children was available. Therefore two goals were formulated. The first goal was to collect
language production data of typically developing Dutch children. The second goal was to
develop an instrument that could qualify and quantify the productive language of Dutch
children. Data were collected from 240 4;0-8;0 year-old children attending regular schools.
This age range was chosen because in 1975 little was known about the language
production of Dutch children after the age of four. The current STAP system is therefore
also intended to assess 4;0 to 8;0 year old children (Verbeek et al., 2007).
In the formal diagnosis of language impairments, the STAP cannot be the only
diagnostic tool used in the assessment process. This method is, however, frequently used
to support results of other tests and to justify the conclusions drawn by the clinician.
The STAP assesses multiple language domains. In some of these domains
complexity as well as correctness is assessed; in other domains only correctness is
assessed. Domains assessed both for complexity and correctness are morphology and
syntax. Phonology, semantics and pragmatics are only assessed for correctness. Appendix
3 shows an overview of the variables assessed by the STAP (Verbeek et al., 2007; 111).
3.3.1 Materials
The STAP manual consists of guidelines for recording and transcribing language samples
and instructions for the analysis of language samples. It also contains eight STAP forms
used for the analysis of language samples, four STAP-profile forms for comparison of a
child’s performance on the observed variables to one of the four norm groups, and a
STAP-summary form that provides a quick overview of the child’s performance.
3.3.2 Procedure
Assessment of a child’s language using STAP has to be carried out by a clinician familiar
with language sampling and language analysis, and the guidelines provided by the STAP
manual have to be followed accurately. The procedure consists of the following steps:
1. Engaging in a conversation with the child.
A conversation between clinician and child is taped on video or audio. No
materials are used and at least 50 full utterances have to be collected. Elliptical
answers are not included in this count, but are counted separately.
2. Transcribing the conversation following the STAP guidelines.
3. Segmenting the transcript following the STAP guidelines.
4. Analysing the transcript following the STAP guidelines.
5. Filling out the STAP-profile form.
6. Filling out the STAP-summary form.
7. Interpreting the data.
3.3.3 Scores and interpretation
After the transcript is analysed, total scores of each of the variables presented in
Appendix 3 can be calculated. Subsequently, the obtained total scores can be drawn on
the STAP-profile form (Appendix 4). On this form, all variables are presented,
accompanied by the total scores and matching standard deviations (-2, -1, 0, 1, 2) of the
norm group of typically developing age-matched children. By drawing the scores on the
profile-form, interpretation of the scores is facilitated:
- Scores drawn on or to the right of -1 standard deviation are interpreted as being
average or above average.
- Scores drawn between -1 and -2 standard deviations are interpreted as being
moderately deviant (the child being moderately impaired on the associated
morphosyntactic structure).
- Scores drawn on or to the left of -2 standard deviations are interpreted as being severely
deviant (the child being severely impaired on the associated morphosyntactic
structure).
After calculating total scores and drawing them on the STAP-profile, the STAP-summary
(Appendix 4) can be filled out. The summary form facilitates quick assessment of the
severity of the language disorder, by providing scores of overlapping morphosyntactic
categories such as ‘syntactical errors’ or ‘morphological complexity’ (Van den Dungen &
Verbeek, 1999). These overlapping categories are composed of specific variables
presented on the profile form. As an example, to award a score to the category
‘syntactical errors’ on the summary form, the scores of the variables ‘main verb missing’
and ‘agreement errors’ on the profile form are analyzed. Following the guidelines, whichever of the two variables within a category scores lowest has to be noted on the summary form; the higher of the two scores is not taken into account. To illustrate with the
example above: performance on the category ‘syntactical errors’ for the participant
shown in Appendix 4 is based on the variable ‘agreement errors’, because performance
on this variable is worse than performance on the variable ‘main verb missing’.
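Taken together, the interpretation thresholds on the profile form and the 'lowest score counts' rule for the summary form amount to a simple procedure. The sketch below is an illustration only: the variable names and example scores are hypothetical, while the thresholds and the minimum rule follow the STAP guidelines as described above.

    # Interpret a STAP profile score expressed in standard deviation units of the
    # norm group: on or right of -1 SD is average or above, between -1 and -2 SD is
    # moderately deviant, on or left of -2 SD is severely deviant.
    def interpret_profile_score(sd_position):
        if sd_position >= -1:
            return "average or above average"
        if sd_position > -2:
            return "moderately deviant"
        return "severely deviant"

    # For a summary category, the lowest (worst) of its profile variables counts.
    def summary_score(category_variables):
        return min(category_variables.values())

    # Hypothetical 'syntactical errors' category built from two profile variables
    syntactical_errors = {"main verb missing": -1.2, "agreement errors": -2.3}
    worst = summary_score(syntactical_errors)
    print(worst, interpret_profile_score(worst))  # -2.3 severely deviant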
3.3.4 Psychometric properties
Norms of the STAP are based on a sample of 240 children between the ages of 4;0 and
8;0, divided into four age groups. The STAP was not evaluated by the COTAN, since it is
not a diagnostic test. Because analysis is based on a sample of spontaneous language,
reliability of the STAP is expected to be low (Van den Dungen & Verboog, 1998).
Validity, however, increases when the child’s language in the test situation represents the
child’s normal language, as is the case in the STAP (Costanza-Smith, 2010). An
assessment of the inter-rater reliability by Verbeek et al. (2007) had satisfactory results
for most of the variables tested with the STAP.
3.4 Comparison of the CELF, Schlichting and STAP
Schlichting and lutje Spelberg (2010) assessed the correlations between the standard
deviations of the Schlichting tasks and the tasks that are used to determine the core score
of the CELF. The correlations of the tasks relevant for this thesis are provided in table 3.1
below.
Table 3.1 Correlations between standard deviations of CELF and Sentence Development of the Schlichting (Schlichting, 2010: 34).

                            CELF subtest
Schlichting                 WS        RS        FS
Sentence Development (SD)   0.62*     0.48*     0.27

*p<0.01. SD = Sentence Development, WS = Word Structure, RS = Recalling Sentences, FS = Formulated Sentences
Although for Word Structure and Recalling Sentences the correlations with the Schlichting are significant, the correlations in Table 3.1 are not very high. Sentence Development shows the highest correlation with the subtest Word Structure of the CELF. The correlation
of Formulated Sentences to the Schlichting is, on the other hand, very low. The authors
claim that this might be explained by the CELF’s focus on both structure and content,
rather than full focus on structure. Children have to produce a sentence that is both
syntactically and semantically correct. In some cases, the sentence provided by the child
will be syntactically correct, but the child does not obtain full score because the sentence
does not completely match the event described on the picture presented by the examiner.
Schlichting and lutje Spelberg (2010) indicate that in the Schlichting Test for Sentence
Development, the difficulty-level of the items is deliberately kept low and the focus is
mainly on structure. Regarding the correlation of the Schlichting Test for Sentence
Development and Recalling Sentences of the CELF, the correlation is
surprisingly low given the fact that both tests are based on sentence repetition. The
authors do not provide any suggestions as to how this low correlation could be explained.
When comparing the items on both tests, it seems plausible that differences in
performance arise from a difference in number of items and a difference in
morphosyntactic complexity. The children included in the correlation study fell within the
age range of 5;7 to 7;3. The CELF subtest Recalling Sentences only provides eight items
for this age range, whereas the Schlichting provides thirty. Furthermore, item difficulty
increases rapidly on the CELF, whereas the Schlichting shows a gradual increase in
difficulty.
A full overview of the morphosyntactic structures assessed by the CELF, Schlichting Test
for Sentence Development and STAP is provided in Appendix 5. An overview of the
overlapping structures is provided in table 3.2.
Table 3.2 Shared morphosyntactic structures assessed by the CELF, Schlichting and/or STAP, according to
their respective manuals.
Morphosyntactic structure    Schlichting    CELF        STAP
Noun                         x              FS          x
Verb                         x              FS          x
Adjective                    x              RS, FS      x
Coordination                 x              RS, FS      x
Subordination                x              RS, FS      x
Pronoun                      x              WS          x
Past participle              x              WS          x
Adverbial adjunct            x              –           x
Comparative                  x              WS          –
Negative                     x              RS          –
Passive                      x              RS          –
Relative clause              x              RS          –
Adverb                       x              FS          –
FS = Formulated Sentences, RS = Recalling Sentences, WS = Word Structure
In total, the CELF and STAP assess seven similar structures and the Schlichting and STAP share eight similar structures. This is, however, only the case when the structures reported by the manuals are used as a starting point. An item-level analysis of the target responses reveals differences in labeling, underlying structures, and scoring. This calls into question whether the structures in table 3.2 are indeed comparable.
The first problem with labeling is the fact that some structures are assessed by both tests but are labeled differently. An example is the demonstrative pronoun die ('that one'), assessed by the CELF subtest Word Structure but labeled 'subject' by the Schlichting in, for instance, the sentence die daar ('that one there'). Another example is lachend ('laughing'), labeled as an adverb by the CELF, whereas the Schlichting labeled the antonym huilend ('crying') as a present participle. Secondly, some complex morphosyntactic structures that seem relevant are not labeled at all, simply because they are not the target item of the sentence. An example is the diminutive vriendinnetje, not labeled as such by the Schlichting in the sentence haar vriendinnetje ('her girlfriend-DIM'), where the target structure is the pronoun haar ('her').
When examining the items with the supposedly shared structures in more detail, it emerges that they often differ in underlying variables. This is shown by the example items provided in Appendix 6. Some of these items (e.g. pronouns and relative clauses) are quite similar, but most of the items are less comparable than initially thought. An example is the structure 'verb', assessed by both the CELF and the Schlichting. The target structure of the Schlichting is the infinitive slapen ('to sleep'), while the CELF elicits a third person past tense form gaf ('gave'). Since both manuals state that the items assess the structure 'verb', one would initially expect both items to be comparable. However, they are not: neither the infinitive nor the finite past tense is assessed by the other test.
A final difference between both tests is the strictness of scoring the child's responses. While the CELF subtests have a very strict scoring system that takes even a slight deviation into account, the Schlichting manual allows for more variability and often allows deviant utterances to be produced, as long as the specific target structure is produced correctly.
Because of the differences between the CELF and Schlichting described so far, the question arises to what extent the results of these two tests are comparable. This is relevant because the results of both tests have to be combined in order to answer the main research question. Moreover, differences in structure labels and underlying variables make it unfeasible to provide a detailed analysis of performance on comparable items. However, in order to remain true to the tests as they were intended by their authors, analysis will be based on the structures as reported by the CELF and Schlichting manuals.
The next chapter provides a description of the subjects, data collection and methods
of data analysis used in the current study.
4. Methods
The current research is based on document analysis. Data were obtained by analyzing ten
files of eight children who visited one of the two participating institutions treating and/or
diagnosing children with language difficulties: Pento Centre for Audiology in Amersfoort
and the Sophia Children’s Hospital in Rotterdam. This chapter will provide an overview
of the methods for analysis. The criteria used to select subjects will be discussed in section 4.1, followed by the methods for data collection in section 4.2. Section 4.3 describes the methods used for analysis.
4.1 Subjects
The results of this study are based on the data of eight children clinically diagnosed as
being language impaired. All children were aged 4;0 to 8;0, since they had to be within the age range for which the STAP has norm data. Children with a hearing loss of more than 20 decibels were excluded from the study and all children had a non-verbal IQ above 85. Furthermore, all children were tested within a six-month period by either the CELF and STAP or the Schlichting and STAP. When no full STAP analysis was available, there had to be at least one spontaneous speech sample available for which a STAP analysis could be
carried out.
In total, ten files were analyzed. For the subjects C4a and S4a in table 4.1, data were also available from twelve (C4b) and eight (S4b) months after the initial test date, respectively. The availability of these data allows for a comparison over time, but initially the children are treated as separate subjects. C1 and C3 were tested at Pento Centre for Audiology; all other data were collected at the Sophia Children's Hospital.
An overview of the children whose data were used for this study can be found in table 4.1. As this table shows, there is an age difference between the children tested by the CELF and the children tested by the Schlichting. Children in the latter group are up to 3,5 years younger than the children in the first group. This is not surprising, because the Schlichting is often used for younger children. Finding children who all fell within the same age category proved to be impossible.
Table 4.1 Characteristics of subjects

Participant   Test          Gender   Age at test date   Age STAP   IQ
C1            CELF          Female   7;0                7;3        Estimated average
C2            CELF          Female   7;7                7;7        Estimated average
C3            CELF          Female   7;9                7;10       107 (WNV)
C4a           CELF          Male     7;0                7;0        106 (WISC)
C4b           CELF          Male     8;0                8;0        106 (WISC, age 7;0)
S1            Schlichting   Female   4;7                4;7        121 (SON-R)
S2            Schlichting   Female   5;10               5;10       110 (SON-R)
S3            Schlichting   Female   6;4                6;4        114 (SON-R)
S4a           Schlichting   Male     5;4                5;4        107 (WPPSI, age 6;0)
S4b           Schlichting   Male     6;0                6;0        107 (WPPSI)

C4b same participant as C4a, S4b same participant as S4a. WNV = Wechsler Non-Verbal, WISC = Wechsler Intelligence Scale, SON-R = Snijders-Oomen Non-Verbal Intelligence Test, WPPSI = Wechsler Preschool and Primary Scale of Intelligence.
4.2 Data collection
All data were collected through analysis of existing test reports. Either the children were
selected by the author herself or they were pre-selected by the supervising clinician based
on age, IQ and hearing status. The CELF data that were available for analysis consisted
of completely filled out score profiles, which included the children’s Core Language
Scores and index scores. Filled out score forms from the subtests Word Structure,
Recalling Sentences and Formulated Sentences were also available. The Schlichting data
consisted of ZQ scores and filled out score forms with the children’s responses to the
individual items. The STAP data consisted of the original transcripts and the examiners' analyses of these transcripts. For the subjects C1 and C3 this analysis was conducted by the author; analyses of the remaining children were carried out by the supervising clinician working at the Sophia Children's Hospital. STAP profile forms and summary
forms were not yet available in the test reports. These forms were therefore filled out by
the author following the guidelines described in 3.3.3. Examples of a STAP-profile and
STAP-summary are presented in Appendix 4.
4.3 Analysis
In order to assess the extent to which performance on norm-referenced tests and the
STAP lead to comparable conclusions, analysis took place at two levels. First, the scores
of both methods were compared in a group-level analysis. At the second level, the
subjects’ performances on both methods were analyzed individually. Additionally, scores
of C4a and C4b and scores of S4a and S4b were compared.
For the group-level analysis, the grammaticality and complexity scores of the STAP
were compared with either the CELF scores or the Schlichting scores to determine the
relations between the different measures. In order to make the STAP scores comparable
to the scores of the CELF or Schlichting, the STAP’s scoring system was slightly adapted
at two points.
The first adaptation concerned the calculation of more precise standard deviations for the STAP. While norm-referenced tests allow precise standard deviations to be calculated, the STAP only provides the standard deviations -2, -1, 0, 1 and 2, or the intervals between -2 and -1, between -1 and 0, between 0 and 1, and between 1 and 2, as described in section 3.3.3. Through consultation with one of the STAP authors, it became clear that precise standard deviations for the STAP could not be computed statistically. Therefore, scores that fell within the intermediate categories were rounded off to half standard deviations. When, for instance, a variable on the profile form scored 'between -1 and 0 standard deviations', this variable was scored as -0,5 standard deviations.
A second adaptation was the calculation of mean standard deviations for both the grammaticality and the complexity of the children's utterances on the STAP. This was done by using the scores linked to the categories that were filled out on the STAP-summary (section 3.3.3 and Appendix 4). The mean scores of the sections Gegevens over ongrammaticaliteit ('data on ungrammaticalities') and Gegevens over complexiteit ('data on complexity') were calculated. These mean scores could then be compared with the scores obtained by either the CELF or the Schlichting. Due to the small number of subjects per norm-referenced test, no statistical analysis could be conducted.
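To make the two adaptations concrete, the following is a minimal sketch, in Python, of how the interval categories could be converted to half standard deviations and then averaged per summary section. The category labels and example values are hypothetical illustrations, not taken from an actual STAP summary form.

```python
# Minimal sketch of the two STAP score adaptations described above.
# Category labels and example values are hypothetical illustrations.

# Adaptation 1: map the STAP's interval categories onto half standard deviations,
# so that e.g. 'between -1 and 0 SD' is treated as -0.5 SD.
CATEGORY_TO_SD = {
    "-2": -2.0, "between -2 and -1": -1.5,
    "-1": -1.0, "between -1 and 0": -0.5,
    "0": 0.0,   "between 0 and 1": 0.5,
    "1": 1.0,   "between 1 and 2": 1.5,
    "2": 2.0,
}

def mean_section_score(categories):
    """Adaptation 2: average the converted scores of one summary section."""
    scores = [CATEGORY_TO_SD[c] for c in categories]
    return sum(scores) / len(scores)

# Hypothetical categories filled out under 'Gegevens over ongrammaticaliteit'
# and 'Gegevens over complexiteit' for one child.
grammaticality = ["-2", "between -2 and -1", "-1"]
complexity = ["between -1 and 0", "0", "between -1 and 0"]

print(mean_section_score(grammaticality))  # -1.5
print(mean_section_score(complexity))      # approximately -0.33
```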
For the individual analysis, the questions previously proposed by Merrell and Plante
(1997) were discussed: “Does the tool indicate language impairment?” and “What are the
specific areas of deficit?”. In order to answer the first question, individual scores on both
methods were compared. The criteria used to establish the presence of a language
disorder are based on the criteria used in the formal diagnosis of Dutch children with
possible language impairment, as described in section 2.2. These criteria were as follows:
CELF: Core Language Score ≤ -2 SD, or: two or more subtests ≤ -1,5 SD
Schlichting: ZQ ≤ -1,5 SD
STAP: Grammaticality or complexity ≤ -1,5 SD
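A small sketch of how these cut-off criteria can be applied per child is given below; the function names and score values are hypothetical illustrations, not part of any of the instruments.

```python
# Sketch of the diagnostic criteria listed above; score values are hypothetical.

def celf_indicates_impairment(core_language_score, subtest_scores):
    # Core Language Score <= -2 SD, or two or more subtests <= -1.5 SD
    return core_language_score <= -2.0 or sum(s <= -1.5 for s in subtest_scores) >= 2

def schlichting_indicates_impairment(zq_sd):
    # Sentence Quotient <= -1.5 SD
    return zq_sd <= -1.5

def stap_indicates_impairment(grammaticality_sd, complexity_sd):
    # Grammaticality or complexity <= -1.5 SD
    return grammaticality_sd <= -1.5 or complexity_sd <= -1.5

# Hypothetical child: CELF Core Language Score of -1.8 SD,
# subtests at -2.1, -1.7 and -0.9 SD, STAP scores of -1.6 and -0.7 SD.
print(celf_indicates_impairment(-1.8, [-2.1, -1.7, -0.9]))  # True (two subtests <= -1.5)
print(stap_indicates_impairment(-1.6, -0.7))                # True (grammaticality)
```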
In order to discuss the question “What are the specific areas of deficit?”, criteria for the
identification of problem structures have been defined. For the STAP, morphosyntactic
structures presented on the profile form with scores on or below -1,5 standard deviations
were interpreted as being a problem structure. This is because this cut-off score is most
frequently used in formal diagnostics. For the CELF subtests, morphosyntactic structures were considered problem structures when 75% or more of the items of a given structure were produced erroneously. Items were labeled as produced erroneously when they obtained zero points. Example (5) demonstrates how problem structures were identified following CELF guidelines.
(5) The examiner reads the following sentence:
Het boek werd niet door de leraar naar de bibliotheek teruggebracht.
(‘The book was not returned to the library by the teacher’)
The child repeats the sentence as follows:
De boek is niet door leraar naar bibliotheek.
(‘The book is not to library by teacher’)
According to the CELF manual, the intended target structure in example (5) is a passive sentence with negation. Most of the time, the children did indeed produce the intended target structure incorrectly. Example (5) shows, however, that this is not always the case, as the negation niet ('not') was present in the child's utterance. Nevertheless, following the CELF guidelines, the number of errors in the sentence above leads to a score of zero points for this item. By adopting the 75% criterion, only those structures that were produced erroneously above chance were labeled as problem structures.
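The following sketch illustrates how the 75% criterion could be computed over the CELF item scores; the structure names and item scores below are hypothetical.

```python
# Sketch of the 75% criterion used for the CELF subtests: a structure counts as a
# problem structure when at least 75% of its items scored zero points.
# Structure names and item scores are hypothetical.

def problem_structures(items_per_structure, threshold=0.75):
    """items_per_structure maps a structure to its item scores (0 = erroneous)."""
    flagged = []
    for structure, scores in items_per_structure.items():
        erroneous = sum(1 for s in scores if s == 0)
        if erroneous / len(scores) >= threshold:
            flagged.append(structure)
    return flagged

example = {
    "passive": [0, 0, 0, 1],         # 75% erroneous -> problem structure
    "relative clause": [1, 0, 1, 1], # 25% erroneous -> not flagged
}
print(problem_structures(example))  # ['passive']
```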
For the Schlichting, all structures produced incorrectly were noted, and structures produced erroneously twice or more were marked individually as problem structures. This is because most structures are only tested once. Example (6) demonstrates the identification of problem structures by the Schlichting.
(6) The examiner reads the following sentence:
Nu wil ik deze. (‘Now I want this one’)
The child repeats the sentence as follows:
Ik wil deze (‘I want this one’)
According to the manual, structures tested in this sentence are the subject, auxiliary verb,
object, adverbial adjunct and word-order inversion. In this example the problem
structures would be the adverbial adjunct nu (‘now’) and the word-order inversion.
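Analogously, the Schlichting procedure used here can be sketched as a simple counting step over the incorrectly produced structures per item; the error lists in the example are hypothetical.

```python
# Sketch of the Schlichting criterion used here: every incorrectly produced structure
# is noted, and structures produced erroneously twice or more are marked (°).
# The error lists below are hypothetical.
from collections import Counter

def mark_problem_structures(errors_per_item):
    """errors_per_item: per item, the list of structures produced erroneously."""
    counts = Counter(structure for item in errors_per_item for structure in item)
    return {s: ("°" if n >= 2 else "") for s, n in counts.items()}

example = [
    ["adverbial adjunct", "word-order inversion"],  # e.g. an item like example (6)
    ["adverbial adjunct"],
    ["neuter definite article"],
]
print(mark_problem_structures(example))
# {'adverbial adjunct': '°', 'word-order inversion': '', 'neuter definite article': ''}
```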
The next chapter presents and discusses the results obtained by these analyses.
5. Results
This chapter presents the scores obtained by the subjects on either the CELF and STAP or
the Schlichting and STAP. In section 5.1 an overview of the individual scores is
presented. These scores will be compared at group level. Section 5.2 reviews individual performance and describes whether both methods allow the same conclusions to be drawn.
5.1 Scores
Test scores of the CELF and Schlichting were copied from the children’s test files and are
presented in table 5.1 and table 5.2. These tables also include the manually computed
STAP scores (section 4.3). Table 5.1 shows the scores of the children tested using the
CELF and STAP. An accompanying scatter plot is provided in figure 5.1.
Table 5.1 Scores (standard deviations) of the children tested by the CELF and STAP
Participant   CLS     SI      WS      RS      FS      STAPg   STAPc
C1            -2,3    -1,7    -2,1    -2,4    -1,3    -1,4    -1,3
C2            -1,8    -1,8    -1,7    -3,0    -0,7    -1,6    -0,7
C3            -1,5    -0,9    -1,0    -1,0    -1,3    -1,9    -0,1
C4a           -2,2    -1,7    -1,7    -2,7    -0,7    0,7     0,3
C4b           -2,1    -1,9    -1,7    -2,8    -0,7    0,1     0,5
CLS = Core Language Score, SI = Structure Index, WS = Word Structure, RS = Recalling Sentences, FS = Formulated Sentences, STAPg = STAP grammaticality, STAPc = STAP complexity
Figure 5.1 Scatterplot of scores of the children tested by the CELF and STAP.
CLS = Core Language Score, SI = Structure Index, WS = Word Structure, RS = Recalling Sentences, FS = Formulated Sentences, STAPg = STAP grammaticality, STAPc = STAP complexity
Although the sample size is too small to draw solid conclusions, some findings can be
highlighted. The STAP scores in table 5.1 show that, in all children except C4a,
grammaticality scores were lower than complexity scores. Furthermore, when comparing
the STAP and CELF scores of these four children, it appears that the Core Language Score corresponds more to the grammaticality score than to the complexity score of the STAP. In all children except C3, Formulated Sentences corresponds most to complexity. An explanation for the
latter finding is difficult to find. The first findings will be discussed together with the
results of table 5.2 and figure 5.2.
For the CELF, table and figure 5.1 show that all children except C3 score lowest on
Recalling Sentences. As described in section 3.4, these low scores might be explained by
the high complexity of the sentences that have to be recalled. Additionally it is shown that
in all children the Core Language Score (global score) is lower than the Structure Index
(specific score). This implies that the children do not only demonstrate problems in
morphosyntax, but also have difficulties in other linguistic domains.
Observing the dispersion of the CELF and STAP scores, the figure shows that the
scores of C1 and C3 are the closest together. They show a respective maximum
difference of 1,1 and 1,4 standard deviations between methods. The other subjects show
maximum differences of 2,3 standard deviations or higher.
Table 5.2 shows the scores of the children tested using the Schlichting and STAP. An accompanying scatter plot is presented in figure 5.2.
Table 5.2 Scores (standard deviations) of children tested by the Schlichting Test for Sentence Development and STAP

Participant   Schlichting Sentence Quotient   STAP Grammaticality   STAP Complexity
S1            -1,9                            -1,6                  -0,6
S2            -1,8                            -1,0                  -0,9
S3            -1,9                            -1,6                  -1,3
S4a           -1,2                            -1,8                  1,0
S4b           -1,2                            -1,9                  0,2
Figure 5.2 Scatterplot of the scores of the children tested by the Schlichting Test for Sentence Development and STAP.
STAPg = STAP grammaticality, STAPc = STAP complexity
As with most of the CELF children, all Schlichting children perform better on STAP
complexity than on grammaticality. These results imply that the children have acquired knowledge of various morphosyntactic structures, but have trouble consistently producing these structures correctly. Additionally, figure 5.2 shows that in all
children the Schlichting Sentence Quotient corresponds most to grammaticality of the
STAP. Together with the similar results for the Core Language Score of the CELF, this
might imply that the norm-referenced tests primarily assess grammaticality, rather than
complexity.
As for the dispersion of the scores, table 5.2 shows that the scores on both methods
are less dispersed than the CELF and STAP scores. STAP grammaticality shows a
maximum difference of 0,8 standard deviations to the Schlichting. For the complexity
score of the STAP this maximum difference is 2,2 standard deviations.
5.2 Case studies
This section will provide a global comparison of the conclusions drawn by the CELF,
Schlichting and STAP, and will provide an overview of the problem structures identified
by these instruments. As explained in section 3.4, it is not feasible to provide a detailed
item-level analysis of comparable structures. Therefore analysis was based on the
structures as reported by the CELF and Schlichting manuals.
Tables 5.3 and 5.4 on the following pages provide answers to the questions “Does the
test indicate language impairment?” and “What are the specific areas of deficit?”. In order
to answer these questions, the criteria described in section 4.3 were applied. The last
columns of tables 5.3 and 5.4 present the structures that were identified as being a
problem structure in both methods.
The first columns of table 5.3 show that, based on their CELF results, four out of five
subjects tested by the CELF and STAP are diagnosed as having SLI. However, based on
their STAP results, only two of the children can be diagnosed as language impaired.
Table 5.4 shows that the Schlichting diagnoses three out of five subjects as language
impaired, whereas four children are diagnosed when the STAP criteria were applied.
Hence, the norm-referenced tests diagnose seven out of ten subjects with SLI and the
STAP diagnoses six out of ten subjects with SLI. However, only for C2, S1 and S3 did both methods provide the same diagnosis.
With regard to the second question, tables 5.3 and 5.4 show that structures tested by a
high number of items occur most often as a problem structure. For the CELF, these
structures are the irregular plural (Word Structure), passive (Recalling Sentences) and
conjunctions (Formulated Sentences). For the subjects tested by the Schlichting, complex
clauses were the most problematic. STAP categories that were often identified as a problem structure are noun group deletion (8 out of 10 subjects) and agreement error (7 out of 10 subjects). Additional problematic STAP categories for the children tested by the Schlichting are main verb deletion, determiner deletion, the number of finite verbs, and either the number of, or errors with, the past perfect.
The last column of table 5.3 shows that there is only one problem structure identified
by both the CELF and STAP. This is the adjective, represented in CELF as well as on the
STAP profile form, and produced erroneously on both methods by C1. Semi-overlapping
structures are coordinations tested by the STAP and conjunctions tested by the CELF,
both produced erroneously by C2. These categories are tied closely together, but are not
exactly the same. Although conjunctions (C2, C3, C4b) are also part of the STAP
analysis, they are not represented on its profile form. It can therefore be stated that,
although produced erroneously on both methods, conjunctions are a structure that is
identified by the CELF but not by the STAP. Conversely, children also produced errors
on both methods that were identified by the STAP but not by the CELF. These errors are
agreement errors, noun group deletions or determiner deletions. For the Schlichting and STAP, problem structures that were identified by both methods are the finite verb (S1, S3) and adverbial adjuncts (S3). Semi-overlapping errors are determiner deletion of the STAP and definite article deletion of the Schlichting (S1, S3, S4a, S4b). Furthermore, S3 and S4b show tense problems on both methods. As with some of the CELF children, S2 and S3 have problems with conjunctions on both methods, but the STAP does not report these problems on its profile form.
In order to assess development over time, performances of C4a and C4b and
performances of S4a and S4b on both methods were compared. Tables 5.3 and 5.4 show
that performances of both children are relatively stable. C4a and C4b both meet the CELF
criteria for SLI, but language sample scores do not indicate language impairment. For
both methods, the number of problem structures decreases over time. Diagnoses for S4a
and S4b also correspond. However, this time spontaneous language indicates the
language impairment, whereas the Schlichting scores do not meet the criteria for SLI.
Remarkably, the number of problem structures on both methods increases over time.
Table 5.3 Comparison of the children tested by the CELF and STAP.

C1
"Does the test indicate language impairment?" CELF: Yes. STAP: No.
"What are the specific areas of deficit?"
CELF: Irregular plural; Diminutive; Pronoun; Passive; Conjunctions; Adjective
STAP: Noun group deletion; Word order error; Agreement error; Main verb deletion; Adjective
Overlapping morphosyntactic structures (CELF/STAP): Overlapping: Adjective. Agreement error also observed in CELF Recalling Sentences, however not a CELF target structure.

C2
"Does the test indicate language impairment?" CELF: Yes. STAP: Yes.
"What are the specific areas of deficit?"
CELF: Irregular plural; Separable compound verb; Irregular past perfect; Passive; Conjunction
STAP: Determiner deletion *; Noun group deletion; Agreement error; Word order error; Coordinations
Overlapping morphosyntactic structures (CELF/STAP): Semi-overlapping: Coordinations – conjunctions. Determiner deletion, noun group deletion and agreement error also frequent in CELF Formulated Sentences, however not a CELF target structure. Conjunction errors also present in STAP, however counted in category 'other' and therefore not a direct target structure.

C3
"Does the test indicate language impairment?" CELF: No. STAP: Yes.
"What are the specific areas of deficit?"
CELF: Irregular plural; Relative clause; Subordination; Conjunction
STAP: Past tense error; Noun group deletion; Determiner selection error; Word order error; Main verb deletion; Noun (total number); Total of adverbial adjuncts
Overlapping morphosyntactic structures (CELF/STAP): Conjunction errors also frequent in STAP, however counted in category 'other' and therefore not a direct target structure.

C4a
"Does the test indicate language impairment?" CELF: Yes. STAP: No.
"What are the specific areas of deficit?"
CELF: Irregular plural; Diminutive; Passive; Relative clause; Conjunction
STAP: Agreement error; Predicate (total number)
Overlapping morphosyntactic structures (CELF/STAP): –

C4b
"Does the test indicate language impairment?" CELF: Yes. STAP: No.
"What are the specific areas of deficit?"
CELF: Separable compound verb; Passive; Conjunctions
STAP: –
Overlapping morphosyntactic structures (CELF/STAP): Conjunction errors also present in STAP, however counted in category 'other' and therefore not a direct target structure.

STAP structures marked with * occur more than twice as often as the number reported under -2 standard deviations.
Table 5.4 Comparison of the children tested by the Schlichting and STAP.

S1
"Does the test indicate language impairment?" Schlichting: Yes. STAP: Yes.
"What are the specific areas of deficit?"
Schlichting: Adverbial adjunct °; Complex clauses: object clause °; Subject; Finite verb; Numeral ('geen')
STAP: Agreement error *; Noun group deletion; Main verb deletion; Determiner deletion; Word order error; Finite verb (total number)
Overlapping morphosyntactic structures (Schlichting/STAP): Overlapping: Finite verb.

S2
"Does the test indicate language impairment?" Schlichting: Yes. STAP: No.
"What are the specific areas of deficit?"
Schlichting: Complex clauses: object clause, dependent clause °, subordinate clause; Numeral; Conjunction; Definite article
STAP: Agreement error *; Noun group deletion; Main verb deletion; Finite verb (total number); Past perfect (total number)
Overlapping morphosyntactic structures (Schlichting/STAP): Conjunction errors also present in STAP, however counted in category 'other' and therefore not a direct target structure.

S3
"Does the test indicate language impairment?" Schlichting: Yes. STAP: Yes.
"What are the specific areas of deficit?"
Schlichting: Conjunction °; Neuter definite article °; Complex clauses: object clause, dependent clause °; Adverbial adjunct; Subject; Finite verb; Present participle; Reflexive pronoun 3rd person
STAP: Agreement error *; Noun group deletion *; Determiner deletion *; Main verb deletion; Word order error; Pronoun 3rd person (total number); Finite verb (total number); Past tense (total number); Past perfect (total number); Adverbial adjunct of time; Other adverbial adjuncts
Overlapping morphosyntactic structures (Schlichting/STAP): Overlapping: Determiner – definite article deletion; Adverbial adjunct; Finite verbs. Tense problems present in both methods. Conjunction errors also present in STAP, however counted in category 'other' and therefore not a direct target structure.

S4a
"Does the test indicate language impairment?" Schlichting: No. STAP: Yes.
"What are the specific areas of deficit?"
Schlichting: Complex clauses: subordinate clause °, dependent clause °; Postposition; Neuter definite article
STAP: Agreement error; Noun group deletion; Past perfect error; Determiner deletion
Overlapping morphosyntactic structures (Schlichting/STAP): Overlapping: Determiner – definite article deletion.

S4b
"Does the test indicate language impairment?" Schlichting: No. STAP: Yes.
"What are the specific areas of deficit?"
Schlichting: Complex clauses: dependent clause °, subordinate clause °; Neuter definite article; Tense (past > present); Reflexive pronoun 3rd person; Adjective
STAP: Determiner deletion; Main verb deletion; Past perfect error; Noun group deletion; Word order error; Finite verb (total number)
Overlapping morphosyntactic structures (Schlichting/STAP): Overlapping: Determiner – definite article deletion. Tense problems present in both methods.

Schlichting structures marked with ° were produced erroneously twice or more. STAP structures marked with * occur more than twice as often as the number reported under -2 standard deviations.
6. Conclusion & Discussion
The current thesis attempted to assess the extent to which performance on standardized
norm-referenced language tests and an instrument for language sample analysis lead to
comparable conclusions. The two norm-referenced tests evaluated in this study were the
CELF-4-NL and the Schlichting test for Sentence Development. The STAP was
evaluated as the instrument for language sample analysis. Since no previous research on
correspondence between the results of these Dutch tools was found, the current study was
predominantly explorative. In order to answer the research questions, multiple case
studies were conducted and performance on the three instruments was assessed
qualitatively at two levels. First, the scores on the norm-referenced tests and the STAP
were analyzed at group-level. The second level compared individual scores on the two
methods and identified problem structures.
Three main findings emerged from the group-level analysis. First, it was found that the
Schlichting ZQ and the CELF Core Language Score corresponded most to the STAP’s
grammaticality score. This may suggest that norm-referenced tests predominantly assess
grammaticality rather than complexity. The second finding was a higher performance on
the STAP’s complexity than on grammaticality for nearly all children. This implies that
the children have more knowledge of complex morphosyntactic structures than could be
demonstrated by the use of standardized tests. Finally, with regard to the dispersion of the
scores on both methods, it was found that the scores of the Schlichting and the scores of the CELF subtest Formulated Sentences approached the STAP scores most closely.
Analysis of the individual subjects showed a low correspondence between
performances on both methods. In only three out of ten subjects, both test methods led to
the same conclusion. Although previous clinical observations and assessment of the
instruments indicated the existence of some differences between methods, a
correspondence this low was surprising. As for the identification of problem structures,
analysis of the target responses reported by the CELF and Schlichting manuals revealed
differences in labeling, incomparability of underlying structures and differences in
methods of scoring. Because of these differences, identification and comparison of the exact problems would have been most accurate if the target structures of each item had been identified or specified by the author. However, in order to remain true to the tests as they
were intended by their authors, comparison was based on the structures as reported by the
manuals. Results showed that the CELF, Schlichting and STAP all differ considerably in
the structures they identify as problem structures. Only few of the structures identified by
the STAP were also identified by the norm-referenced tests and vice versa. This is
primarily caused by the differences in target structures assessed by both methods.
Finally, in order to assess whether the methods allowed for a comparison over time
to be conducted, two subjects were tested twice. It was found that the norm-referenced
tests and the STAP provided the same conclusions for both children. The tools also
showed the same patterns of either a decreasing or increasing number of problem
structures. The identified structures themselves were mostly identical. Present findings therefore seem to indicate that all three methods may be able to assess language development over time. However, this conclusion requires verification by means of further empirical studies.
In conclusion, the current study reveals only a slight correspondence between
performance on standardized tests for morphosyntactic assessment and the results on the
instrument for language sample analysis. Results of the individual analysis displayed no
specific pattern. Further conclusions could therefore not be drawn. Although in practice
both methods have proved to be valuable for the assessment of children with SLI, these
results indicate that caution is needed in choosing one method over the other. As was
shown by the group-level analysis, the tasks are not equivalent. Until more extensive
studies prove otherwise, the current procedure (i.e. the use of spontaneous language as an
addition to standardized language tests) seems justified.
The low correspondence between test methods found in the present study agrees with some of the earlier studies on English-speaking children as well as with clinical observations of the author. However, as opposed to these earlier studies, no pattern could be identified. Prutting et al. (1975) and Blau et al. (1984) showed that children produced fewer errors in their spontaneous language than on standardized tests. Although this finding is true for some of the children tested in the current study, the result could not be generalized. The contradicting findings of Dunn et al. (1996) could not be confirmed in the current study either.
When the problem structures found in the current study are compared to the structures described by Gillis and Schaerlaekens (2000) and Bol and Kuiken (1988), the overlap is relatively small. Passives, pronouns and long and composed sentences were found to be problem structures in all three studies. An overlap this small might either be caused by the difference in target structures assessed by the specific tools, or it might be caused by differences in underlying variables or methods.
As for the strengths and weaknesses of both methods, most of the positions stated in
table 2.2 are applicable to the CELF, Schlichting and STAP. Three positions will be
highlighted because they were influencing factors for some of the test results. First of all,
Ornstein (1993) proposed that the standardized procedures were one of the advantages of
norm-referenced tests. Although this is theoretically indisputable, analysis of the scoring
forms that were used in the current study showed that clinicians would not always
accurately follow the guidelines provided by the manuals. Thus, despite of the
standardized methods, scoring and analysis of norm-referenced tests will unfortunately
vary depending on the clinician. Secondly, McCauley and Swisher (1984) claimed that
measurement error on norm-referenced tests will often be interpreted as performance
error. This is true for the CELF and Schlichting, but this weakness can also be accredited
to the STAP. Some structures tested by the STAP occur infrequently to such a degree,
that one occurrence less or more can make a difference between for instance ‘-1 standard
deviations’ and ‘-2 standard deviations’ on the STAP profile form. Measurement error
therefore also has a great impact on the outcome of the language sample analysis. Finally,
McCauley and Swisher stated that norm-referenced tests are less suitable for assessing
progress and planning goals for therapy. However, both the CELF and Schlichting are
commonly used for this purpose even though most structures assessed by the tests are
only tested a couple of times.
Apart from the limitations described above, multiple factors influenced the outcome of
this study. One of the factors previously specified, is the limited comparability of the
instruments and the target structures they assess. The current study evaluated problem
structures based on the structure names provided by the manuals. Often these structures
were not fully comparable. If only completely comparable structures were assessed,
outcomes of the structure analysis might have been different. Furthermore, because none
of the subjects were tested by both norm-referenced tests, no clear comparison between
these tests could be made. Therefore, it is difficult to determine whether results of both
tests can be combined in order to answer the main research question. Another important factor is the number of subjects used for this study. Since the number of available subjects was limited, the current study only assessed five test files per standardized test. This sample is too small to perform a statistical analysis. Based on the results of this study, no statistically supported conclusions could therefore be drawn. In addition, the ages of the two groups of subjects differed considerably. As a consequence, the subjects tested by the Schlichting (i.e. the younger children) demonstrated more problems on the STAP than the children tested by the CELF. This also affected the comparability of the two test groups. Finally, the current method to calculate STAP scores is based on the use of mean scores and rounded-off scores. Additionally, these scores are calculated by means of the STAP-summary, which is composed of a limited number of structures. As described in section 3.3.3, these structures are selected based on their low scores. Therefore, if all tested structures were taken into account, the STAP scores presented in chapter 5 would deviate and would probably be higher.
Although the current study has some shortcomings, the results of this study are
interesting and hopefully inspire other researchers to explore this topic in more detail.
The question whether both methods for language assessment are sensitive and specific
enough to be used independently and interchangeably could not be answered in this
study, thus further research is warranted. For follow-up studies it would be interesting to
compare two larger groups of subjects assessed by the STAP as well as by both of the
norm-referenced tests. If the number of subjects per test is high enough, the research
could be quantitative as well as qualitative. Furthermore, by assessing subjects tested by both norm-referenced tests, it would be possible to draw conclusions about the comparability of language sample analysis and norm-referenced tests in general, as opposed to comparability with specific norm-referenced tests only. Additionally, in order to increase the comparability of test performances it would be preferable if only one clinician tested all children. If test sessions are recorded, a second clinician could review the test results or even fill out a new score form, which allows the inter-rater reliability to be assessed. Although for this study it was not possible to calculate STAP scores any other way, it might be desirable to investigate whether more accurate scores can be calculated.
References
American Psychiatric Association. (2000) Diagnostic and statistical manual of mental
disorders, Fourth Edition, Text Revision. Washington, DC: American Psychiatric
Association.
Bishop, D.V.M. (1992). The underlying nature of specific language impairment. Journal
of Child Psychology and Psychiatry, 33 (1), 3-66.
Blau, A.F., Lahey, M., & Oleksiuk-Velez, A. (1984). Planning goals for intervention:
Language testing or language sampling? Exceptional Children, 51 (1), 78-79.
Bol, G. & Kuiken, F. (1988) Grammaticale analyse van taalontwikkelingsstoornissen.
Ph.D. dissertation University of Amsterdam. Utrecht: Elinkwijk.
Conti-Ramsden, G., Crutchley, A., & Botting, N. (1997). The extent to which
psychometric tests differentiate subgroups of children with SLI. Journal of Speech,
Language and Hearing Research, 40, 765-777.
Costanza-Smith, M. (2010). The clinical utility of language samples. Perspectives in
Language and Learning, 17, 9-15.
Culatta, B., Page, J.L. & Ellis, J. (1983) Story Retelling as a communicative performance
screening tool. Language, Speech, and Hearing Services in Schools, 14, 66-74.
Daal, J. van, Verhoeven, L., & Van Balkom, H. (2004). Subtypes of severe speech and
language impairments: Psychometric evidence from 4-year-old children in the
Netherlands. Journal of Speech, Language and Hearing Research, 47, 1411-1423.
D’hondt, M., Desoete, A., Schittekatte, M., Kort, W., Compaan, E., Neyt, F., Folfliet, M.
& Surdiacourt, S. (2008) De CELF-4-NL: een opvolger voor de TvK?, Signaal, 65, 416.
Dungen, L. van den & Verboog (1998). Kinderen met taalontwikkelingsstoornissen.
Bussum: Uitgeverij Coutinho.
Dungen, L. van den & Verbeek, J. (1994) STAP-Handleiding. STAP-instrument,
gebaseerd op Spontane-Taal Analyse Procedure, developed by M. van Ierland.
Number 63, november 1999, Amsterdam: University of Amsterdam, Faculty of
Humanities.
Dunn, M. & Flax, J. (1996) The use of spontaneous language measures as criteria for
identifying children with specific language impairment: An attempt to reconcile
clinical and research incongruence. Journal of Speech & Hearing Research, 39 (3),
643-654.
Egberink, I.J.L., Vermeulen, C.S.M., & Frima, R.M. (2010a). COTAN beoordeling 2010,
CELF-4-NL [COTAN review 2010, CELF-4-NL]. Retrieved April 14, 2014 from
www.cotandocumentatie.nl
Egberink, I.J.L., Vermeulen, C.S.M., & Frima, R.M. (2010b). COTAN beoordeling 2010,
Schlichting Test voor Taalproductie [COTAN review 2010, Schlichting Test for
Language Production]. Retrieved April 14, 2014 from www.cotandocumentatie.nl
Eisenberg, S. (1997). Investigating children’s language: A comparison of conversational
sampling and elicited production. Journal of Psycholinguistic Research, 26 (5), 519-538.
Ford, H. (2009, December 23). Norm-Referenced Testing. Education.com. Retrieved December 11, 2013 from http://www.education.com/reference/article/norm-referenced-testing/
Jong, J. de. (1999) Specific language impairment in Dutch: Inflectional morphology and
argument structure. Doctoral dissertation. Rijksuniversiteit Groningen.
Kort, W., Schittekatte, M., & Compaan, E. (2008) CELF-4-NL: Clinical Evaluation of Language Fundamentals-vierde-editie. Amsterdam: Pearson Assessment and Information B.V.
Lahey, M. (1990). Who shall be called language disordered? Some reflections and one
perspective. Journal of Speech and Hearing Disorders, 55, 612-620.
Leonard, L.B. (2000) Children with specific language impairment. Massachusetts: MIT
Press.
McCauley, R.J., & Swisher, L. (1984) Use and misuse of norm-referenced tests in clinical
assessment: A hypothetical case. Journal of Speech and Hearing Disorders, 49, 338-348.
Merrell, A. W. & Plante, E. (1997) Norm-referenced test interpretation in the diagnostic
process. Language, Speech, and Hearing Services in Schools, 28, 50-58.
Newcomer, P. & Hammill, D. (1988) Test of Language Development-2 Primary. Austin,
Texas: Pro-Ed.
Ornstein, A.C. (1993). Norm-referenced and criterion-referenced tests: An overview.
NASSP Bulletin, 77(555), 28–39.
Plante, E. (1998). Criteria for SLI: The Stark and Tallal legacy and beyond. Journal of
Speech, Language and Hearing Research, 41, 951-957.
Prutting, C.A., Gallagher, T.M., & Mulac, A. (1975). The expressive portion of the NSST
compared to a spontaneous language sample. Journal of Speech and Hearing
Disorders, 40, 40-48.
Rapin, I. & Allen, D. (1987) Developmental dysphasia and autism in preschool children:
characteristics and shape’, in: J. Martin, P. Fletcher, P, Grunwell & D. Hall (eds.)
Proceedings of the First International Symposium on Specific Speech and Language
Disorders in Children, pp. 20-35. London: AFASIC.
Reynell, J., & Huntley, M. (1985) Reynell Developmental Language Scales, 2nd
Revision, NFER-Nelson.
Schlichting, J.E.P.T., Eldik, M.C.M. van, lutje Spelberg, H.C., Meulen, Sj. van der, &
Meulen, B.F. van der (1998) Schlichting Test voor Taalproductie. Tweede gewijzigde
druk. Lisse: Swets & Zeitlinger.
Schlichting, J.E.P.T., & lutje Spelberg, H.C. (2003). A test for measuring syntactic
development in young children. Language Testing, 20 (3), 241-266.
Schlichting, J.E.P.T., & lutje Spelberg, H.C. (2010). Schlichting Test voor Taalproductie-II: Handleiding. Houten: Bohn Stafleu van Loghum.
Semel, E., Wiig, E.H., & Secord, W.A. (2003). Clinical evaluation of language
fundamentals, fourth edition (CELF-4). Toronto, Canada: The Psychological
Corporation/A Harcourt Assessment Company.
Stark, R.E., & Tallal, P. (1981). Selection of children with specific language deficits.
Journal of Speech and Hearing Disorders, 46, 114–122.
Stockman, I. J. (1996). The promises and pitfalls of language sample analysis as an
assessment tool for linguistic minority children. Language, Speech, and Hearing
Services in Schools, 27, 355-366.
Swisher, L., Plante, E. & Lowell, S. (1994). Nonlinguistic deficits of language-impaired
children complicate the interpretation of their nonverbal IQ scores. Language, Speech,
and Hearing Services in Schools, 25, 235–240.
Verbeek, J., Dungen, L. van den & Baker, A. (2007) Verantwoording van het STAP-instrument, ontwikkeld door Margreet van Ierland. Amsterdam: University of Amsterdam, Faculty of Humanities.
Voogd, L. de (2009). Criteria ESM; CVI Noord Nederland. Retrieved January 5 from
http://www.rec2noordnederland.nl/
Appendix 1
Subtests of the CELF-4-NL used in the four-level assessment process
5-8 years old
Level 1
Identify whether or not there is a language disorder
Core Score
Concepts and Following Directions
Word Structure
Recalling Sentences
Formulated Sentences
Level 2
Describe the nature of the disorder
Receptive Language Index
Language Content Index
Concepts and Following Directions
Concepts and Following Directions
Word Classes 1 or 2 receptive
Word Classes 1 or 2 total
Sentence Structure
Expressive Vocabulary
Understanding Spoken Paragraphs
Expressive Language Index
Language Structure Index
Word Structure
Word Structure
Recalling Sentences
Recalling Sentences
Formulated Sentences
Formulated Sentences
Word Classes 1 or 2 expressive
Sentence Structure
Expressive Vocabulary
Level 3
Evaluate underlying clinical behaviours
Phonological Awareness
Word Associations
Rapid Automatic Naming
Number Repetition
Familiar Sequences
Level 4
Evaluate language and communication in context
Pragmatic Profile
Observational Rating Scale
Appendix 2
Structures assessed by the Schlichting test for Sentence Development
Item
Utterance
Linguistic structures
1
2
pop
die hier
die ook
die slapen
die lopen
die water drinken
die ook drinken
die moet eten
die gaat liggen
die kan niet
ook een vis
ook een eend
die moet hierop
die hoort hierbij
nou wil ik deze
die heb ik thuis ook
nou zet ik deze hier
nou moet deze weer hier
ik weet waar deze woont
ik weet waar die moet
geen ogen
ik denk dat ik de auto/appel neem
de trap op
zit voor het kleine raam
one-word utterance
two-word utterance
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
ik weet hoeveel dit er zijn
die ik in mijn hand heb, is zwart/rood
omdat ie … finite verb
ik geef hem de schaar om te knippen
deze kan wél vliegen, maar deze niet
het mes moet hier
huilend kwam hij de kamer binnen
… zich … wassen/afdrogen
doet de gestreepte broek aan
hij is vergeten zijn jas aan te doen
de man zonder baard
is de sleutel waar je de deur mee
open kan maken
die zit te wachten tot de eieren
uitkomen
die is voor haar
een klein huis
zij huilt, zij lacht
op hun paard
een heel groot/klein boek
two (or more) word utterance with verb
verb, object and/or adverb
subject, auxiliary verb, infinitive
negative
adverbial adjunct, indefinite article, noun
subject, verb, adverbial adjunct
subject, auxiliary verb, object, adverbial adjunct
subject, verb, object, 2 adverbial adjuncts
(subject, verb), 3 constituents
(main clause), object clause introduced by waar
(‘where’)
geen (numeral - ‘not any’), noun
(main clause), object clause introduced by dat (‘that’)
postposition
(neuter noun preceded by) preposition, definite article
and adjective
(afhankelijke vraag met ‘er’)
dependent clause
subordinate clause introduced by omdat (‘because’)
dependent clause with om te (‘to’)
coordination with maar (‘but)
neuter definite article het
present participle, postposition
reflexive pronoun 3rd person
past participle used as an adjective
dependent clause with te (‘to’)
negative preposition
relative clause with waarmee (‘with which’)
coordination with tot ('until')
personal pronoun 3rd person female (object form)
indefinite article, adjective without -e
personal pronoun 3rd person female (subject form)
possessive pronoun 3rd person plural
indefinite article, adverb, adjective, neuter noun
35
36
37
38
39
40
ik vind dat deze er het mooiste uitziet
die zijn voor hem
die doet net of hij even groot is als die
dit is een foto van een man met een
bloem op zijn hoed
haar vriendinnetje
wordt gebeten door de hond
object clause with er, complement
plural verb present tense
comparative
3 modifiers following the noun
possessive pronoun 3th person singular female
passive object clause with door (‘by’)
Appendix 3
Structures assessed by the STAP
STAP-PROFILE
Global language aspects
Length of utterance
MLU
ML5U (mean length of 5 longest utterances)
non-fluencies word-level
non-fluency
Elliptic answers
Elliptic answers
Unintelligible
Unintelligible
Correctness
Ungrammaticalities
Ungrammatical utterances
Total of grammatical errors
Semantic deviating utterances
Moderate semantic deviating utterances
Severe semantic deviating utterances
Pragmatic deviating utterances
Moderate pragmatic deviating utterances
Severe pragmatic deviating utterances
Morphosyntactic complexity
Connecting main-/subordinate clauses
Coordinations
Subordinations
Verb group
Finite verb
Predicate
Past tense
Past perfect
Noun group
Noun
Adjectives
Pronoun 3rd person
Adverbial adjuncts
Total of adverbial adjuncts
Adverbial adjuncts of place
Adverbial adjuncts of time
Other adverbial adjuncts
Specifications of ungrammaticalities
Verb group
Main verb deletion
Agreement error
Past tense error
Past participle error
Word order
Word order error
Noun group
Noun group deletion
Determiner deletion
Determiner selection error
Specifications of non-fluencies
False starts (n.o. words)
Self-corrections (n.o. words)
Repetitions (n.o. words)
Blending constructions (n.o. words)
Appendix 4
STAP-profile
STAP-Summary
Appendix 5
Comparison of structures assessed by the CELF-4-NL, Schlichting test for
Sentence Development and STAP
Morphosyntactic structures
(number of items per structure displayed in brackets)
CELF-4-NL
Word structure
Recalling sentences
Formulated sentences
Regular plural (4)
Active form (1)
Noun (2)
Irregular plural (4)
- omitted conjunction (1)
Verb (2)
Diminutives (5)
- syntactic contraction (1)
Adjective (1)
Demonstr. pronouns ‘die’/’dat’ (2)
- multiple adjectives (2)
Adverb (4)
Separable compound verbs (2)
- subordination (8)
Conjunction (10)
Regular past participle (2)
- relative clause (5)
Complex preposition (1)
Irregular past participle (2)
- negation (3)
In plaats van (‘instead
Personal pronouns (4)
Question (3)
of’)
Comparative/Superlative (5)
- negation (1)
Passives (1)
- negation (2)
- syntactic contraction (2)
- passive question (1)
Schlichting
STAP
Sentence development
Verb (4)
Object clause (3)
Multiple constituents (1)
Noun (4)
Passive object clause (1)
3 Modifiers following
Subject (4)
Dependent clause (3)
noun (1)
Object (3)
Subordinate clause (1)
Past participle used as
Adverb (2)
Coordination (2)
adjective (1)
Adjective (3)
Relative clause (1)
Reflexive pronoun 3rd
Auxiliary verb (2)
Comparative (1)
person (1)
Infinitive (1)
Numeral (non-existence) (1)
Personal pronoun 3rd
Present participle (1)
Negative (1)
person (2)
Plural verb, present tense (1)
Adverbial adjunct (4)
Possessive pronoun 3rd
Indefinite article (3)
Preposition (2)
person plural (1)
Definite article (1)
Postposition (2)
Possessive pronoun 3rd
Neuter definite article (1)
Multiple constituents (1)
person singular (1)
Correctness
Complexity
Ungrammatical utterances
MLU (words)
Noun
Word order errors
Coordinations
Adjectives
Main verb missing
Subordinations
Pronoun 3rd person
Agreement error
Finite verb
Adverbial adjuncts
Past tense error
Predicate
Past participle error
Past tense
Noun-group missing
Past participle
Determiner missing
Determiner selection error
Appendix 6
Example items of morphosyntactic structures shared by the CELF -4-NL and
Schlichting test for Sentence Development
Example sentences were copied from Schlichting et al. (1998) and Kort et al. (2008).
Target structures are underlined.
Nouns
- Formulated Sentences (CELF) – item 1
The child is provided with the word auto (‘car’) and is required to formulate a
sentence that fits the presented picture, such as: Ik zit in de auto (‘I’m sitting in
the car’).
- Schlichting – item 13
The examiner points at a picture of two faces and says:
Kijk eens, hier zijn twee mannen. Die heeft een mond, een neus en ogen. Die heeft
een mond, een neus, maar die heeft … (‘Look. These are two men’. Points at the
first man while saying: ‘That one has a mouth, a nose and eyes’. Points at the
second man while saying: ‘That one has a mouth, a nose but that one has…’).
The child completes the sentence by saying: geen ogen (‘no eyes’).
Verbs
- Formulated Sentences (CELF) – item 3
The child is provided with the word gaf (‘gave’) and is required to formulate a
sentence that fits the presented picture, such as: Moeder gaf mij pap (‘mother gave
me porridge’).
- Schlichting – item 3
The examiner points at figure animals and says:
Nou gaan de dieren allemaal slapen. Die slapen, die slapen en … (‘Now all
animals are going to sleep’. Puts the first animal down and says: ‘that one
sleeping’. Repeats action and says: ‘that one sleeping’). Taps at last animal to get
the child to put the animal down and say: die slapen (‘that one sleeping’).
Adjectives
- Recalling Sentences (CELF) – item 10
The examiner reads the following sentence to the child, who has to repeat it
afterwards:
De grote, bruine hond heeft al het eten van de poes opgegeten ('The big, brown dog ate all of the cat's food').
- Formulated Sentences (CELF) – item 9
The child is provided with the word beste (‘best’) and is required to formulate a
sentence that fits the presented picture, such as: ‘jij bent de beste gitaarspeler’
(‘You are the best guitar player’).
- Schlichting – item 34
The examiner points at a picture showing objects of different sizes and says:
Een hele grote bril, een hele kleine bril. Een hele grote fles, een… (‘A very big
pair of glasses, a very small pair of glasses. A very big bottle, a …’) .
The child is urged to complete the sentence and name the next pictures by saying:
… een hele kleine fles. Een heel groot huis, een heel klein huis. ('… a very small bottle. A very big house, a very small house').
Subordinations
- Recalling Sentences (CELF) – item 16
The examiner reads the following sentence to the child, who has to repeat it
afterwards:
Omdat het morgen zaterdag is mogen we vanavond lang opblijven (‘We’re
allowed to stay up late tonight because tomorrow it’s Saturday’).
- Formulated Sentences (CELF) – item 12
The child is provided with the word omdat (‘because’) and is required to
formulate a sentence that fits the presented picture, such as: De man mag eerst,
omdat hij blind is (‘The man can go first, because he’s blind’).
- Schlichting – item 18
The examiner points at a picture showing a dog and a fish and says:
Die heeft voeten en die niet. Die kan lopen en die niet. Waaróm kan die niet lopen?
(Pointing at the dog: 'that one has feet'. Pointing at the fish: 'and that one has not'. Pointing at the dog: 'that one can walk'. Pointing at the fish: 'and that one cannot'. Pointing at the fish: 'why can't that one walk?')
Waits for the child to respond: omdat hij geen voeten heeft ('because it doesn't have feet') or any other related response showing knowledge of subordinations.
Pronouns
- Word Structure (CELF) – practice item 3 (no examples available for other items)
The examiner starts the following sentence, which has to be completed by the
child: Marjan zei: Deze schoenen zijn van jou en deze schoenen zijn… (‘Marjan
said: These shoes are yours and these shoes are…’).
The child completes the sentence by saying: …van mij (‘mine’).
- Schlichting – item 29
The examiner points at two pictures and says: Zij gaan allebei een appel eten. Die
is voor hem en… (‘They are both going to eat an apple’ Puts an overlay of an
apple on the picture of the man and says: ‘This apple is his, and…’).
The examiner waits for the child to put the overlay on the picture of the girl and
say: …die is voor haar (‘…that one is hers’).
Relative clauses
- Recalling Sentences (CELF) – item 15
The examiner reads the following sentence to the child, who has to repeat it
afterwards:
De jongen is de voetballer die het winnende doelpunt maakte. (‘The boy is the
soccer player who scored the winning goal’).
- Schlichting – item 30
The examiner subsequently points at pictures of two keys, a closet and a door, and
says: Dat zijn twee sleutels. Dat is een kast, dat is een deur. Dat is de sleutel waar
je de kast mee open kan maken en dat… (Pointing at the keys: ‘these are two
keys’. Pointing at the closet and the door: ‘that’s a closet, that’s a door’. Pointing
at the right key: ‘that’s the key which can be used to open the closet’. Pointing at
the left key while saying: ‘and that…’).
Waits for the child to respond …is de sleutel waar je de deur mee open kan maken
(‘…is the key which can be used to open the door’).