Including Children Who Are Culturally or Linguistically Diverse

Do We Have To Choose Between Accountability and Program Improvement?
NECTAC’s Measuring Child and Family Outcomes Conference, 2006
Kristie Pretti-Frontczak, Kent State University ([email protected])
Jennifer Grisham-Brown, University of Kentucky ([email protected])
1
Overview of Session
• Discuss the need for measuring child outcomes for both programming and accountability purposes
• Discuss three issues, with associated recommendations and related research
• Discussion is encouraged throughout
• Time will remain at the end for questions and further discussion of what was presented
2
Introductions and Setting a Context
• Kristie Pretti-Frontczak – Kent State University
• Jennifer Grisham-Brown – University of Kentucky
Belief/Bias/Recommended Practice
• Authentic assessment is critical regardless of purpose
3
CENTRAL QUESTION FOR TODAY’S PRESENTATION
• Can instructional data be used for accountability purposes?
• The Short Answer: Yes (IF)…
4
Linked System Approach
[Diagram: a cycle linking Assessment → Goal Development → Instruction → Evaluation]
• Assessment: authentic; involves families; comprehensive; common
• Goal Development: based upon children’s emerging skills; will increase access and participation
• Instruction: ongoing; developmentally and individually appropriate; guides decision-making
• Evaluation: comprehensive and common; systematic
5
If you….
• If you assess young children using a high quality authentic assessment…
• Then you’ll be able to develop high quality individualized plans to meet children’s unique needs…
• If you identify the individual needs of children…
6
• You’ll want to use the information to guide curriculum development…
• If you have a curriculum framework that is designed around the individual needs of the children…
• Then you’ll want to document that children’s needs are being met…
7
• Then you’ll need to monitor children’s performance over time using your authentic assessment…
• And when you have done the authentic assessment for a second or third time, you’ll want to jump for joy because all of the children will have made progress!
8
Three Issues
• Selection
• Implementation
• Interpretation
9
Questions around Selecting an Assessment
• Which tools/processes?
• Which characteristics should be considered?
• What about alignment to state standards or Head Start Outcomes?
• Use a single/common assessment or a list?
• Allow for choice or be prescriptive?
• Who should administer?
• Where should the assessment(s) be administered?
10
Recommendations
• Use an assessment for its intended purpose
• Avoid comparing assessments to one another – rather, compare them to stated/accepted criteria:
  – Alignment to local/state/federal standards
  – Reliable and valid
  – Comprehensive and flexible
  – Link between assessment purposes
  – Link between assessment and intervention
11
Recommendations Continued
• Allow for state/local choice if possible
  – Increases likelihood of a match
  – Increases fidelity and use
  – Avoids a one-size-fits-all approach
  – If an assessment is flexible and comprehensive, one might work
• Authentic, authentic, authentic
  – People who are familiar
  – Settings that are familiar
  – Toys/materials that are familiar
12
Generic Validation Process
• Step 1 – Create a master alignment matrix
  – Experts create a master matrix
  – Establish inclusion and exclusion criteria
• Step 2 – Create expert alignment matrixes
  – Experts blind to the master matrix create their own alignment matrixes
• Step 3 – Validate the master alignment matrix (see the sketch following this slide)
  – Compare master and expert matrixes
  – Ensure that all items that should be considered were placed on the final matrixes
  – Examine the internal consistency of the final matrixes
Allen, Bricker, Macy, & Pretti-Frontczak, 2006; Walker & Pretti-Frontczak, 2005
For more information on crosswalks visit:
http://www.fpg.unc.edu/~ECO/crosswalks.cfm or http://aepslinkedsystem.com
13
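To make Step 3 concrete, here is a minimal sketch of how the master/expert comparison could be computed, in Python. The matrix representation (each matrix maps an assessment item to the set of standards it aligns with) and the simple percent-agreement statistic are illustrative assumptions, not the authors’ actual procedure.

```python
# Hypothetical illustration of Step 3: comparing a master alignment matrix
# against matrixes built by experts who were blind to the master.
# Each matrix maps an assessment item to the set of standards it aligns with.

master = {
    "item_1": {"std_A", "std_B"},
    "item_2": {"std_C"},
    "item_3": set(),  # item judged not to align with any standard
}

experts = {
    "expert_1": {"item_1": {"std_A", "std_B"}, "item_2": {"std_C"}, "item_3": {"std_A"}},
    "expert_2": {"item_1": {"std_A"}, "item_2": {"std_C"}, "item_3": set()},
}

for name, matrix in experts.items():
    # Count items on which this expert's alignments exactly match the master's.
    agree = sum(matrix.get(item, set()) == stds for item, stds in master.items())
    print(f"{name}: {agree}/{len(master)} items agree with the master matrix")
```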
Concurrent Validity
• Purpose:
  – To examine the concurrent validity between a traditional norm-referenced standardized test (BDI-2) and a curriculum-based assessment (AEPS®)
• Subjects:
  – 31 Head Start children
  – Ranged in age from 48 months to 67 months (M = 60.68, SD = 4.65)
• Methods:
  – Six trained graduate students administered the BDI-2 and six trained Head Start teachers administered the AEPS® during a two-week period
  – Conducted seven (7) bivariate 2-tailed correlations (Pearson’s and Spearman’s; a sketch of this analysis follows this slide)
• Results:
  – Five correlations suggested a moderate to good relationship between the BDI-2 and the AEPS
  – Two correlations suggested a fair relationship between the BDI-2 and the AEPS
Hallam, Grisham-Brown, & Pretti-Frontczak, 2005
14
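As an illustration of the correlational analysis described above, the sketch below runs two-tailed Pearson and Spearman correlations using SciPy; the score arrays are fabricated placeholders, not the study’s data.

```python
# Illustrative bivariate 2-tailed correlations of the kind reported above,
# computed on fabricated example scores (not the study's data).
from scipy.stats import pearsonr, spearmanr

bdi2_scores = [66, 70, 58, 61, 73, 64, 69]  # placeholder BDI-2 area scores
aeps_scores = [62, 75, 50, 55, 80, 60, 71]  # placeholder AEPS area scores

r, p = pearsonr(bdi2_scores, aeps_scores)
rho, p_s = spearmanr(bdi2_scores, aeps_scores)
print(f"Pearson r = {r:.2f} (p = {p:.3f}); Spearman rho = {rho:.2f} (p = {p_s:.3f})")
```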
Concurrent Validity Results
• Adaptive
  – Self Care items from the BDI (M = 66.03, SD = 6.67) were moderately correlated with Adaptive items from the AEPS (M = 62.03, SD = 13.57), r = .57, n = 31, p = .01
• Social
  – Personal Social items from the BDI (M = 175.15, SD = 22.74) had a fair correlation with Social items from the AEPS (M = 80.06, SD = 16.33), r = .50, n = 31, p = .01
• Communication
  – Communication items from the BDI (M = 121.06, SD = 16.22) were moderately correlated with Social Communication items from the AEPS (M = 88.61, SD = 14.20), r = .54, n = 31, p = .01
15
Concurrent Validity Results Continued
• Motor
  – Gross Motor items from the BDI (M = 82.76, SD = 4.70) had a fair correlation with Gross Motor items from the AEPS (M = 30.10, SD = 6.62), r = .48, n = 31, p = .01
  – Fine Motor items from the BDI (M = 52.45, SD = 5.30) were moderately correlated with Fine Motor items from the AEPS (M = 26.39, SD = 5.68), r = .58, n = 31, p = .01
  – Perceptual Motor items from the BDI (M = 27.73, SD = 3.63) were moderately correlated with Fine Motor items from the AEPS (M = 26.39, SD = 5.68), r = .58, n = 31, p = .01
• Cognitive
  – Cognitive items from the BDI (M = 135.85, SD = 23.44) were moderately correlated with Cognitive items from the AEPS (M = 81.26, SD = 24.26), r = .71, n = 31, p = .01
16
Project LINK
• Head Start/University Partnership grant (Jennifer Grisham-Brown/Rena Hallam)
• Purpose: To build the capacity of Head Start programs to link child assessment and curriculum to support positive outcomes for preschool children
• Focus on mandated Head Start Child Outcomes:
  – Concepts of Print
  – Oral Language
  – Phonological Awareness
  – Concepts of Number
Grisham-Brown, Hallam, & Brookshire, in press; Hallam, Grisham-Brown, Gao, & Brookshire, in press
17
PRELIMINARY RESULTS FROM PROJECT LINK: Classroom Quality
• No significant differences between control and intervention classrooms on global quality (ECERS-R)
• The quality of the language and literacy environment (ELLCO) was superior in intervention classrooms; the difference was significant in pilot classrooms
18
PRELIMINARY RESULTS FROM PROJECT LINK: Child Outcomes
• Change scores in intervention classrooms were significantly higher than in control classrooms on the letter-word recognition subscale of the FACES battery
• Mean change scores were higher (although not significantly so) on seven additional subscales (11 total) of the FACES battery, nearing significance on the PPVT
• Results would probably have been stronger with a larger sample
• Results will be replicated this year
19
Questions Around Training, Implementation, and Use
• Who will implement?
• What level of training and support will staff need?
• What will be the topics of training?
• Who will provide training and support?
• How will you know if staff are reliably collecting data?
• How will you know if staff are collecting data with procedural fidelity?
20
Recommendations:
• Training/Follow-up
  – Format
  – Topics
  – Classroom and administrative
• Valid and reliable
  – Will require training and support
  – Will require seeing assessment as a critical part of intervention/curriculum planning
21
What it takes!
• Who?
  – All classroom staff
  – Administrators/consultants
• What?
  – Instrument
  – Methods (e.g., observations, anecdotal notes, work samples)
  – Data entry/management
  – Relationship to everything else (i.e., the linked system)
22
What it takes (cont.)
• How?
  – Training that is “chunked”
  – Self-assessment
  – Follow-up, follow-up, follow-up
    • Mentoring
    • On-site technical assistance
    • Access to someone to call!
  – Involvement of administration
23
Can preschool teachers (with appropriate training) collect reliable data with fidelity?
• Reliability study
• Fidelity study
• Accuracy study
Brown, Kowalski, Pretti-Frontczak, Uchida, & Sacks, 2002; Grisham-Brown, Hallam, & Pretti-Frontczak, in preparation
24
Inter-Rater Reliability
• Subjects:
  – 7 Head Start teachers
  – 7 Head Start teaching assistants
• Method:
  – Practiced scoring AEPS items from video
  – Scored AEPS items; checked against a master score provided by the author (the percent-agreement arithmetic is sketched after this slide)
• Results:
  – 7 of 7 teachers reached reliability at 80% or higher (range 85%–93%)
  – 5 of 7 teaching assistants reached reliability at 80% or higher (range 75%–90%)
25
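The reliability figures above are percent agreement with an author-provided master score. A minimal sketch of that arithmetic, on invented item scores, follows.

```python
# Illustrative percent-agreement check against an author-provided master score.
# Scores are invented AEPS-style item scores (0, 1, or 2), not the study's data.
master  = [2, 1, 0, 2, 2, 1, 0, 2, 1, 2]
teacher = [2, 1, 0, 2, 1, 1, 0, 2, 1, 2]

agreements = sum(m == t for m, t in zip(master, teacher))
percent = 100 * agreements / len(master)
print(f"Agreement: {percent:.0f}% (criterion: 80% or higher)")
```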
Fidelity Study
• Subjects:
  – Six (6) Head Start teachers/teaching assistants who reached 80% or higher in the inter-rater reliability study
• Method:
  – Used a fidelity measure to check teachers’ implementation of authentic assessment within seven (7) planned activities
  – Six (6) authentic assessment variables: setup and preparation; decision making; materials; choice; embedding; and procedure
• Procedures:
  – Observed participants collecting AEPS® data during each of the seven small-group activities
  – Observed participants 7 times, for up to 10 minutes per activity
26
Average Ratings on Six Authentic Assessment Variables across Observations and Activities by Teacher
[Figure: bar chart of average ratings (0–3 scale) on setup and preparation, decision making, materials, embedding, child choice, and procedures for each teacher: Abby, Amanda, Kate, Reba, Sarah, and Vicky.]
27
Average Ratings on Six Authentic Assessment Variables across Observations for Seven Different Activities
[Figure: bar chart of average ratings (0–3 scale) on the same six variables for each authentic assessment activity: Outdoor Play, Dramatic Play, A Book About Me, Playdough, Manipulatives, Story Time, and Snack.]
28
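For readers who want the arithmetic behind the two figures above, here is a minimal sketch of averaging fidelity ratings across observations, grouped by teacher and variable; the records are placeholders, not the study’s data.

```python
# Illustrative averaging of fidelity ratings (0-3 scale) across observations.
# Records are invented (teacher, variable, rating) triples, not study data.
from collections import defaultdict

records = [
    ("Abby", "Embedding", 2.5), ("Abby", "Embedding", 3.0),
    ("Kate", "Embedding", 2.0), ("Kate", "Materials", 3.0),
    ("Reba", "Materials", 1.5), ("Reba", "Embedding", 2.0),
]

ratings_by_cell = defaultdict(list)
for teacher, variable, rating in records:
    ratings_by_cell[(teacher, variable)].append(rating)

for (teacher, variable), ratings in sorted(ratings_by_cell.items()):
    print(f"{teacher:6s} {variable:10s} mean = {sum(ratings) / len(ratings):.2f}")
```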
Accuracy Study
• Study designed to investigate the accuracy of teachers’ assessments of children’s skills and abilities using observational assessment
  – Examined the degree of agreement between assessments of children’s Language & Literacy and Early Math skills made by their teachers using an observational assessment instrument and assessments of the same skills made by researchers using a demand performance instrument
Brown, Kowalski, Pretti-Frontczak, Uchida, & Sacks, 2002
29
Measures
• Observational measure – Galileo System scales (Bergan, Bergan, Rattee, & Feld, 2001)
  – Language & Literacy-Revised, Ages 3-5 (n = 68 items, full scale)
  – Early Math-Revised, Ages 3-5 (n = 68 items, full scale)
• Demand performance measure
  – Items that could be readily assessed in individual, one-session, performance-based interviews with children were selected from the Galileo System’s scales and converted into demand performance tasks to create two performance measures:
    • Language & Literacy (n = 21 items)
    • Early Math (n = 23 items)
  – Items varied in difficulty and knowledge domain assessed
  – Standardized sets of materials for administering tasks were also developed (e.g., index cards with printed objects, books, manipulatives, etc.)
  – The performance measures were piloted with preschoolers in two regions of the state and revised accordingly
30
Procedures
• Trained research assistants visited sites across the state:
  – collected the data teachers had entered into the relevant observation scales of the Galileo System; and
  – administered the performance measures
• To ensure that the most up-to-date information was obtained from the Galileo System, data were collected during the 2 weeks prior to and following a state-mandated entry date
• Order of administration of the performance measures was counterbalanced across assessment domains
31
Participants
• 122 children
  – Ranged in age from 3 to 6 years (M = 4 years, 11 months)
  – 100% in state-funded Head Start programs
• 66 teachers
• Areas in which children are served:
  – 47% urban
  – 41% suburban/small town
  – 11% rural
• Representation by use of the Galileo System:
  – 38% first-year users
  – 32% second-year users
  – 23% third-year users
32
Conclusions
• Overall, levels of concordance were moderate
  – In the domain in which teachers were most conservative in attributing abilities to children, Language & Literacy, there was the most agreement between the data teachers entered into the Galileo System and the performance measure (71%)
  – In the domain in which teachers were most generous in attributing abilities to children, Early Math, there was the least agreement between the data teachers entered into the Galileo System and the performance measure (66%)
• Reliability
  – Teachers using the naturalistic observation instrument (the Galileo System) are not providing inflated estimates of children’s skills and abilities
  – However, they may be underestimating children’s skills and abilities in the domain of Language & Literacy
33
Questions Around Interpreting the Evidence
• What is evidence?
• Where should the evidence come from?
• What is considered “performing as same age peers”?
• How should decisions be made?
• Who should interpret the evidence?
• How can the ECO child summary form be used?
34
What is Evidence?
• Information (observations, scores, permanent products) about a child’s performance across the three OSEP outcomes:
  – Positive social-emotional skills (including social relationships)
  – Acquisition and use of knowledge and skills
  – Use of appropriate behaviors to meet their needs
• The amount and type of evidence for each outcome will vary
35
Where should the evidence come from?
• Multiple time periods
• Multiple settings
• Multiple people
  – Parents
  – Providers
  – Those familiar with the child
• Multiple measures (should be empirically aligned)
  – Observations
  – Interviews
  – Direct tests
36
Required Decisions
• Decision for Time 1
  – Is the child performing as same age peers?
    • Yes
    • No
• Decision for Time 2
  – Did the child make progress?
    • YES – and performance is as you would expect of same age peers
    • YES – and performance is not as you would expect of same age peers
    • NO progress was made
37
Things to Keep in Mind
• “Typical/performing as same age peers” is NOT average
• “Typical” includes a very broad range of skills/abilities
• A child can be “typical” in one OSEP area and not another
• Progress is any amount of change:
  – Raw score changed by 1 point
  – A single new skill was reached
  – Child needs less assistance at time two
• If using the Child Outcome Summary Form:
  – The child’s rating score does NOT have to change from time 1 to time 2 to demonstrate progress
  – Progress can be continuing to develop at a typical rate (i.e., maintain typical status)
38
How Should the Required Decisions be Made?
• Some assessments will make the decision, via measures such as:
  – Standard score
  – Residual change scores
  – Goal attainment scaling
  – Number of objectives achieved/percent of objectives achieved
  – Rate of growth
  – Item response theory (cutoff score)
  – Proportional change index (a worked sketch follows this slide)
39
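As a worked illustration of one item on the list above, the sketch below computes a proportional change index in its common form (developmental gain per month, divided by the child’s pretest rate of development); the numbers are invented, and other formulations exist.

```python
# Illustrative Proportional Change Index (PCI), computed on invented numbers.
# PCI = (developmental gain / time elapsed) / (pretest developmental age / pretest chronological age)
# A PCI of 1.0 means the child progressed at the same rate as before intervention.

pre_dev_age = 36.0    # developmental age at pretest, in months (example value)
pre_chron_age = 48.0  # chronological age at pretest, in months (example value)
post_dev_age = 45.0   # developmental age at posttest, in months (example value)
months_elapsed = 9.0  # time between pretest and posttest

rate_during = (post_dev_age - pre_dev_age) / months_elapsed  # gain per month during the interval
rate_before = pre_dev_age / pre_chron_age                    # pretest rate of development

pci = rate_during / rate_before
print(f"PCI = {pci:.2f}")  # (9/9) / (36/48) = 1.33: faster growth than before
```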
Making Decisions Continued
• Regardless, team conclusions…
  – should be based on multiple sources
  – should be based on valid and reliable information
  – should be systematic
• Teams can use the Child Outcome Summary Form
  – It will help with the required decision and provide more information for use at the local or state level
40
Child Outcome Summary Form
• A single rating scale that can be used to systematize information and make decisions
• After reviewing the evidence, rate the child’s performance on each of the 3 outcomes from 1 to 7:
  – Completely (7)
  – Somewhat (5–6)
  – Emerging (3–4)
  – Not Yet (1–2)
• Currently a score of 6 or 7 is considered to be performance that is similar to same age peers
41
Getting from 7 to 3
• The seven-point rating scale just summarizes the evidence
• The required interpretation is still needed (a minimal sketch follows this slide):
  a. % of children who reach or maintain functioning at a level comparable to same-age peers
  b. % of children who improve functioning but are not in “a”
  c. % of children who did not improve functioning
42
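A minimal sketch of this interpretation step, assuming (per slide 41) that ratings of 6 or 7 count as comparable to same-age peers and that the team supplies a separate progress judgment; the function name and the sample data are illustrative.

```python
# Illustrative mapping from summary ratings to the three required percentages.
# "made_progress" is the team's separate progress judgment; exit ratings of
# 6-7 are treated as comparable to same-age peers (see slide 41).

def osep_category(exit_rating: int, made_progress: bool) -> str:
    if exit_rating >= 6:
        return "a"  # reached or maintained functioning comparable to same-age peers
    if made_progress:
        return "b"  # improved functioning, but not in "a"
    return "c"      # did not improve functioning

children = [(6, True), (5, True), (5, True), (3, False)]  # invented (rating, progress) pairs
counts = {"a": 0, "b": 0, "c": 0}
for rating, progress in children:
    counts[osep_category(rating, progress)] += 1

for cat, n in counts.items():
    print(f"Category {cat}: {100 * n / len(children):.0f}% of children")
```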
Example
• During a play-based assessment, the IFSP/IEP team administered:
  – a norm-referenced test
  – a curriculum-based assessment
  – an interview with relevant caregivers
• The team then summarized the child’s performance using each method’s internal summary procedures:
  – Calculated a standard score
  – Derived a cutoff score
  – Narratively summarized the interview
• Lastly, the team rated the child’s overall performance using ECO’s Child Outcome Summary Form for each of the 3 OSEP outcomes
• Two years later, as the child was being transitioned out of the program, the results from a comprehensive curriculum-based assessment were reviewed:
  – The child’s performance was rated using ECO’s Child Outcome Summary Form
  – The team made a determination of progress
43
Example Continued
Time One
• Outcome One: Rating = 3; Interpretation = “Not typical”
• Outcome Two: Rating = 5; Interpretation = “Not typical”
• Outcome Three: Rating = 6; Interpretation = “Typical”
Time Two
• Outcome One: Rating = 6; Interpretation = a
• Outcome Two: Rating = 5; Interpretation = b*
• Outcome Three: Rating = 5; Interpretation = b
*Remember: the Child Outcome Summary Form 7-point rating is a summary of performance, not of progress. At time two, teams are also prompted to consider progress.
44
Fact or Fiction
1. Someone has the answers, and if I look long enough I’ll have them too.
2. Everything has to be perfect this first time around.
3. Research doesn’t matter – just get the data submitted.
4. I really do believe that garbage in is garbage out, but at the end of the day – I just want the data.
45
Overall Synthesis and Recommendations
• Rigorous implementation of curriculum-based assessments requires extensive professional development and support of instructional staff
• Findings suggest that CBAs, when implemented with rigor, have the potential to provide meaningful child progress data for program evaluation and accountability purposes
46
[Cartoon: “And that’s our outcomes measurement system. Any questions?”]
47
References
• Allen, D., Bricker, D., Macy, M., & Pretti-Frontczak, K. (2006, February). Providing accountability data using curriculum-based assessments. Poster presented at the Biannual Conference on Research Innovations in Early Intervention, San Diego, CA.
• Brown, R. D., Kowalski, K., Pretti-Frontczak, K., Uchida, C., & Sacks, D. (2002, April). The reliability of teachers’ assessment of early cognitive development using a naturalistic observation instrument. Paper presented at the 17th Annual Conference on Human Development, Charlotte, NC.
• Grisham-Brown, J., Hallam, R., & Brookshire, R. (in press). Using authentic assessment to evidence children’s progress towards early learning standards. Early Childhood Education Journal.
• Grisham-Brown, J., Hallam, R., & Pretti-Frontczak, K. (manuscript in preparation). Measuring child outcomes using authentic assessment practices. Journal of Early Intervention (Innovative Practices).
• Hallam, R., Grisham-Brown, J., Gao, X., & Brookshire, R. (in press). The effects of outcomes-driven authentic assessment on classroom quality. Early Childhood Research and Practice.
• Hallam, R., Grisham-Brown, J., & Pretti-Frontczak, K. (2005, October). Meeting the demands of accountability through authentic assessment. Paper presented at the International Division of Early Childhood Annual Conference, Portland, OR.
• Walker, D., & Pretti-Frontczak, K. (2005, December). Issues in selecting assessments for measuring outcomes for young children. Paper presented at the OSEP National Early Childhood Conference, Washington, DC. (http://www.nectac.org/~meetings/nationalDec05/mtgPage1.asp?enter=no)
48