Do We Have To Choose Between Accountability and Program Improvement?
NECTAC's Measuring Child and Family Outcomes Conference, 2006
Kristie Pretti-Frontczak, Kent State University, [email protected]
Jennifer Grisham-Brown, University of Kentucky, [email protected]

Overview of Session
- Discuss the need for measuring child outcomes as it relates to both programming and accountability purposes
- Discuss three issues, with associated recommendations and related research
- Discussion is encouraged throughout
- Time will remain at the end for questions and further discussion of what was presented

Introductions and Setting a Context
- Kristie Pretti-Frontczak, Kent State University
- Jennifer Grisham-Brown, University of Kentucky
- Belief/bias/recommended practice: authentic assessment is critical regardless of purpose

Central Question for Today's Presentation
Can instructional data be used for accountability purposes?
The short answer: yes, if...

Linked System Approach
Four linked components: Assessment, Goal Development, Instruction, and Evaluation
- Assessment: authentic; involves families; comprehensive; common
- Goal development: based upon children's emerging skills; will increase access and participation
- Instruction: ongoing; developmentally and individually appropriate
- Evaluation: guides decision-making; comprehensive and common; systematic

If You...
- If you assess young children using a high-quality authentic assessment, then you'll be able to develop high-quality individualized plans to meet children's unique needs.
- If you identify the individual needs of children, you'll want to use the information to guide curriculum development.
- If you have a curriculum framework that is designed around the individual needs of the children, then you'll want to document that children's needs are being met.
- To do that, you'll need to monitor children's performance over time using your authentic assessment.
- And when you have done the authentic assessment a second or third time, you'll want to jump for joy, because all of the children will have made progress!

Three Issues
- Selection
- Implementation
- Interpretation

Questions Around Selecting an Assessment
- Which tools/processes?
- Which characteristics should be considered?
- What about alignment to state standards or Head Start outcomes?
- Use a single/common assessment or a list?
- Allow for choice, or be prescriptive?
- Who should administer?
- Where should the assessment(s) be administered?

Recommendations
- Use an assessment for its intended purpose
- Avoid comparing assessments to one another; rather, compare them to stated/accepted criteria:
  - Alignment to local/state/federal standards
  - Reliable and valid
  - Comprehensive and flexible
  - Link between assessment purposes
  - Link between assessment and intervention

Recommendations, Continued
- Allow for state/local choice if possible
  - Increases the likelihood of a match
  - Increases fidelity and use
  - Avoids a one-size-fits-all approach (if an assessment is flexible and comprehensive, one might work)
- Authentic, authentic, authentic
  - People who are familiar
  - Settings that are familiar
  - Toys/materials that are familiar

Generic Validation Process
- Step 1: Create a master alignment matrix
  - Experts create a master matrix
  - Establish inclusion and exclusion criteria
- Step 2: Create expert alignment matrixes
  - Experts blind to the master matrix create their own alignment matrixes
- Step 3: Validate the master alignment matrix (a sketch of the comparison follows)
  - Compare master and expert matrixes
  - Ensure that all items that should be considered were placed on the final matrixes
  - Examine the internal consistency of the final matrixes
(Allen, Bricker, Macy, & Pretti-Frontczak, 2006; Walker & Pretti-Frontczak, 2005)
For more information on crosswalks, visit http://www.fpg.unc.edu/~ECO/crosswalks.cfm or http://aepslinkedsystem.com
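Step 3's comparison can be made concrete with a small computation. The sketch below is a minimal illustration, not the authors' actual procedure: it assumes each alignment matrix is represented as a mapping from assessment items to the set of standards each item was judged to address (the item and standard codes are hypothetical), and it reports simple percent agreement between the master matrix and one expert's matrix.

```python
# Minimal sketch of Step 3 (validating a master alignment matrix).
# Data structures and the agreement statistic are illustrative assumptions,
# not the procedure of Allen, Bricker, Macy, & Pretti-Frontczak (2006).

# Each matrix maps an assessment item to the set of standards it aligns with.
master = {
    "item_01": {"LL.1", "LL.2"},   # hypothetical item and standard codes
    "item_02": {"M.3"},
    "item_03": set(),              # judged to align with no standard
}

expert = {
    "item_01": {"LL.1"},
    "item_02": {"M.3"},
    "item_03": {"SE.2"},
}

def percent_agreement(master, expert):
    """Proportion of item-standard placements on which two matrixes agree."""
    standards = set()
    for cells in list(master.values()) + list(expert.values()):
        standards |= cells
    agree = total = 0
    for item in master:
        for std in standards:
            total += 1
            if (std in master[item]) == (std in expert.get(item, set())):
                agree += 1
    return agree / total

print(f"Agreement with master matrix: {percent_agreement(master, expert):.0%}")
```

A fuller version would repeat this check for every expert and could substitute a chance-corrected statistic such as Cohen's kappa for raw percent agreement.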
Concurrent Validity
- Purpose: To examine the concurrent validity between a traditional norm-referenced standardized test (the BDI-2) and a curriculum-based assessment (the AEPS®)
- Subjects: 31 Head Start children, ranging in age from 48 to 67 months (M = 60.68, SD = 4.65)
- Methods: Six trained graduate students administered the BDI-2 and six trained Head Start teachers administered the AEPS® during a two-week period; seven bivariate two-tailed correlations (Pearson's and Spearman's) were conducted
- Results: Five correlations suggested a moderate to good relationship between the BDI-2 and the AEPS; two correlations suggested a fair relationship
(Hallam, Grisham-Brown, & Pretti-Frontczak, 2005)

Concurrent Validity Results
Adaptive
- Self-Care items from the BDI-2 (M = 66.03, SD = 6.67) were moderately correlated with Adaptive items from the AEPS (M = 62.03, SD = 13.57), r = .57, n = 31, p = .01
Social
- Personal-Social items from the BDI-2 (M = 175.15, SD = 22.74) had a fair correlation with Social items from the AEPS (M = 80.06, SD = 16.33), r = .50, n = 31, p = .01
Communication
- Communication items from the BDI-2 (M = 121.06, SD = 16.22) were moderately correlated with Social-Communication items from the AEPS (M = 88.61, SD = 14.20), r = .54, n = 31, p = .01

Concurrent Validity Results, Continued
Motor
- Gross Motor items from the BDI-2 (M = 82.76, SD = 4.70) had a fair correlation with Gross Motor items from the AEPS (M = 30.10, SD = 6.62), r = .48, n = 31, p = .01
- Fine Motor items from the BDI-2 (M = 52.45, SD = 5.30) were moderately correlated with Fine Motor items from the AEPS (M = 26.39, SD = 5.68), r = .58, n = 31, p = .01
- Perceptual Motor items from the BDI-2 (M = 27.73, SD = 3.63) were moderately correlated with Fine Motor items from the AEPS (M = 26.39, SD = 5.68), r = .58, n = 31, p = .01
Cognitive
- Cognitive items from the BDI-2 (M = 135.85, SD = 23.44) were moderately correlated with Cognitive items from the AEPS (M = 81.26, SD = 24.26), r = .71, n = 31, p = .01
(A sketch of this kind of correlation analysis follows.)
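Programs that want to run this kind of check on their own paired assessment data can do so with a few lines of code. The sketch below uses invented placeholder scores, not the study data; scipy's pearsonr and spearmanr are used because the study reports both Pearson's and Spearman's coefficients.

```python
# Illustrative concurrent-validity check between two instruments.
# The score lists are hypothetical placeholders, not the BDI-2/AEPS data.
from scipy.stats import pearsonr, spearmanr

# Child-level domain scores from each instrument, in the same child order.
instrument_a = [48, 52, 55, 50, 61, 47, 58, 53, 49, 56]
instrument_b = [22, 25, 29, 24, 31, 21, 30, 27, 23, 28]

r, p = pearsonr(instrument_a, instrument_b)
rho, p_rho = spearmanr(instrument_a, instrument_b)

print(f"Pearson r = {r:.2f} (p = {p:.3f})")
print(f"Spearman rho = {rho:.2f} (p = {p_rho:.3f})")
```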
Project LINK
- Head Start/university partnership grant (Jennifer Grisham-Brown/Rena Hallam)
- Purpose: To build the capacity of Head Start programs to link child assessment and curriculum to support positive outcomes for preschool children
- Focus on mandated Head Start child outcomes:
  - Concepts of print
  - Oral language
  - Phonological awareness
  - Concepts of number
(Grisham-Brown, Hallam, & Brookshire, in press; Hallam, Grisham-Brown, Gao, & Brookshire, in press)

Preliminary Results from Project LINK: Classroom Quality
- No significant differences between control and intervention classrooms on global quality (ECERS-R)
- The quality of the language and literacy environment (ELLCO) was superior in intervention classrooms; the difference was significant in pilot classrooms

Preliminary Results from Project LINK: Child Outcomes
- Change scores in intervention classrooms were significantly higher than in control classrooms on the letter-word recognition subscale of the FACES battery (a sketch of this kind of comparison follows this list)
- Mean change scores were higher (although not significantly so) on seven additional subscales of the FACES battery (11 total), nearing significance on the PPVT
- Results would probably have been stronger with a larger sample
- The study will be replicated this year
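For readers unfamiliar with change-score comparisons: each child's gain is the posttest score minus the pretest score, and the gains of the two groups are compared. The sketch below is a simplified, hypothetical illustration with invented numbers; the actual Project LINK analyses may have used different models.

```python
# Hypothetical change-score comparison (not the Project LINK data).
# Gain = posttest - pretest for each child; groups compared with a t-test.
from scipy.stats import ttest_ind

intervention_gains = [8, 11, 6, 9, 12, 7, 10, 9]   # invented gain scores
control_gains      = [4, 6, 3, 7, 5, 4, 6, 5]

t, p = ttest_ind(intervention_gains, control_gains)
print(f"t = {t:.2f}, p = {p:.3f}")
```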
Questions Around Training, Implementation, and Use
- Who will implement?
- What level of training and support will staff need?
- What will be the topics of training?
- Who will provide training and support?
- How will you know if staff are collecting data reliably?
- How will you know if staff are following data-collection procedures with fidelity?

Recommendations: Training and Follow-up
- Format: classroom and administrative
- Topics: valid and reliable data collection
- Training and support will be required
- Staff will need to see assessment as a critical part of intervention/curriculum planning

What It Takes!
- Who? All classroom staff; administrators/consultants
- What? The instrument; methods (e.g., observations, anecdotal records, work samples); data entry/management; the relationship to everything else (i.e., the linked system)

What It Takes (cont.)
- How? Training that is "chunked"; self-assessment; follow-up, follow-up, follow-up; mentoring; on-site technical assistance; access to someone to call; involvement of administration

Can preschool teachers (with appropriate training) collect reliable data with fidelity?
Three studies address this question:
- A reliability study
- A fidelity study
- An accuracy study
(Brown, Kowalski, Pretti-Frontczak, Uchida, & Sacks, 2002; Grisham-Brown, Hallam, & Pretti-Frontczak, in preparation)

Inter-Rater Reliability
- Subjects: 7 Head Start teachers and 7 Head Start teaching assistants
- Method: Participants practiced scoring AEPS items from video, then scored AEPS items; their scores were checked against a master score provided by the author
- Results:
  - 7 of 7 teachers reached reliability at 80% or higher (range: 85% to 93%)
  - 5 of 7 teaching assistants reached reliability at 80% or higher (range: 75% to 90%)
(A sketch of the agreement calculation follows.)
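The 80% criterion above is simple percent agreement between a rater's item scores and the master scores. A minimal sketch of that calculation, with invented item scores, is below (AEPS items are scored 0, 1, or 2).

```python
# Minimal percent-agreement check against master scores (invented data).
# AEPS items are scored 0, 1, or 2; 80% exact agreement is the criterion above.

master_scores = [2, 1, 0, 2, 2, 1, 0, 2, 1, 2]   # master scoring of a video
rater_scores  = [2, 1, 0, 2, 1, 1, 0, 2, 1, 2]   # a trainee's scoring of it

matches = sum(m == r for m, r in zip(master_scores, rater_scores))
agreement = matches / len(master_scores)

print(f"Agreement: {agreement:.0%}")   # 90% here
print("Reliable" if agreement >= 0.80 else "Needs more training")
```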
Fidelity Study
- Subjects: Six Head Start teachers/teaching assistants who reached 80% or higher in the inter-rater reliability study
- Method: A fidelity measure was used to check teachers' implementation of authentic assessment within seven planned activities, across six authentic assessment variables: setup and preparation; decision making; materials; choice; embedding; and procedures
- Procedures: Participants were observed collecting AEPS® data during each of seven small-group activities; each participant was observed seven times, for up to 10 minutes per activity

[Figure: Average ratings on the six authentic assessment variables (setup and preparation, decision making, materials, embedding, child choice, procedures), on a scale of 0 to 3, across observations and activities, for each of six teachers: Abby, Amanda, Kate, Reba, Sarah, and Vicky]

[Figure: Average ratings on the six authentic assessment variables across observations for seven different activities (outdoor play, dramatic play, A Book About Me, playdough, manipulatives, story time, snack), on a scale of 0 to 3]

Accuracy Study
- Designed to investigate the accuracy of teachers' assessments of children's skills and abilities using observational assessment
- Examined the degree of agreement between assessments of children's language and literacy and early math skills made by their teachers using an observational assessment instrument, and assessments of the same skills made by researchers using a demand-performance instrument
(Brown, Kowalski, Pretti-Frontczak, Uchida, & Sacks, 2002)

Measures
- Observational measure: the Galileo System's scales (Bergan, Bergan, Rattee, & Feld, 2001)
  - Language & Literacy-Revised, Ages 3-5 (n = 68 items, full scale)
  - Early Math-Revised, Ages 3-5 (n = 68 items, full scale)
- Demand-performance measure
  - Items that could be readily assessed in individual, one-session, performance-based interviews with children were selected from the Galileo System's scales and converted into demand-performance tasks, creating two performance measures: Language & Literacy (n = 21 items) and Early Math (n = 23 items)
  - Items varied in difficulty and in the knowledge domain assessed
  - Standardized sets of materials for administering the tasks were also developed (e.g., index cards with printed objects, books, manipulatives)
  - The performance measures were piloted with preschoolers in two regions of the state and revised accordingly

Procedures
- Trained research assistants visited sites across the state, collected the data teachers had entered into the relevant observation scales of the Galileo System, and administered the performance measures
- To ensure that the most up-to-date information was obtained from the Galileo System, data were collected during the two weeks before and after a state-mandated entry date
- The order of administration of the performance measures was counterbalanced across assessment domains

Participants
- 122 children, ranging in age from 3 to 6 years (M = 4 years, 11 months); 100% in state-funded Head Start programs
- 66 teachers
- Areas in which children were served: 47% urban; 41% suburban/small town; 11% rural
- Representation by use of the Galileo System: 38% first-year users; 32% second-year users; 23% third-year users

Conclusions
- Overall, levels of concordance were moderate
- In the domain in which teachers were most conservative in attributing abilities to children, Language & Literacy, there was the most agreement between the data teachers entered into the Galileo System and the performance measure (71%)
- In the domain in which teachers were most generous in attributing abilities to children, Early Math, there was the least agreement between the data teachers entered into the Galileo System and the performance measure (66%)
- Reliability: Teachers using the naturalistic observation instrument (the Galileo System) are not providing inflated estimates of children's skills and abilities; however, they may be underestimating children's skills and abilities in the domain of Language & Literacy

Questions Around Interpreting the Evidence
- What is evidence?
- Where should the evidence come from?
- What is considered "performing as same-age peers"?
- How should decisions be made?
- Who should interpret the evidence?
- How can the ECO Child Outcome Summary Form be used?

What Is Evidence?
Information (observations, scores, permanent products) about a child's performance across the three OSEP outcomes:
- Positive social-emotional skills (including social relationships)
- Acquisition and use of knowledge and skills
- Use of appropriate behaviors to meet their needs
The amount and type of evidence for each outcome will vary.

Where Should the Evidence Come From?
- Multiple time periods
- Multiple settings
- Multiple people: parents, providers, those familiar with the child
- Multiple measures (which should be empirically aligned): observations, interviews, direct tests

Required Decisions
- Decision for Time 1: Is the child performing as same-age peers? Yes or no
- Decision for Time 2: Did the child make progress?
  - Yes, and performance is what you would expect of same-age peers
  - Yes, but performance is not what you would expect of same-age peers
  - No progress was made

Things to Keep in Mind
- "Typical/performing as same-age peers" is NOT average; "typical" includes a very broad range of skills/abilities, and a child can be "typical" in one OSEP area and not another
- Progress is any amount of change: a raw score that changed by one point, a single new skill that was reached, or a child needing less assistance at time two
- If using the Child Outcome Summary Form: the child's rating does NOT have to change from time 1 to time 2 to demonstrate progress, and progress can be continuing to develop at a typical rate (i.e., maintaining typical status)

How Should the Required Decisions Be Made?
Some assessments will make the decision, via:
- Standard scores
- Residual change scores
- Goal attainment scaling
- Number or percent of objectives achieved
- Rate of growth
- Item response theory (cutoff score)
- Proportional change index (a sketch of this index follows)
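As one concrete example from the list above, a proportional change index compares a child's rate of development during the intervention interval to the child's rate of development before intake. The sketch below follows the commonly cited formulation (developmental gain per month in intervention, divided by developmental age over chronological age at pretest); treat it as an illustration with invented numbers, since programs operationalize the index in different ways.

```python
# Illustrative proportional change index (PCI), one of the metrics listed above.
#   PCI = (developmental gain / months in intervention)
#         / (developmental age at pretest / chronological age at pretest)
# Values near 1.0 suggest the child kept developing at the pre-intervention
# rate; values above 1.0 suggest a faster rate. All numbers are invented.

def proportional_change_index(da_pre, da_post, ca_pre, months_between):
    """All ages and intervals are in months."""
    rate_during = (da_post - da_pre) / months_between
    rate_before = da_pre / ca_pre
    return rate_during / rate_before

# Hypothetical child: developmental age 24 months at a chronological age of
# 36 months; nine months later, developmental age is 33 months.
pci = proportional_change_index(da_pre=24, da_post=33, ca_pre=36, months_between=9)
print(f"PCI = {pci:.2f}")   # (9/9) / (24/36) = 1.50
```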
Making Decisions, Continued
Regardless of the method used, team conclusions:
- should be based on multiple sources
- should be based on valid and reliable information
- should be systematic
Teams can use the Child Outcome Summary Form, which will help with the required decision and provide more information for use at the local or state level.

Child Outcome Summary Form
- A single rating scale that can be used to systematize information and make decisions
- After reviewing the evidence, rate the child's performance on each of the three outcomes from 1 to 7
- Currently, a score of 6 or 7 is considered performance that is similar to same-age peers
Rating scale:
- 7: Completely
- 5-6: Somewhat
- 3-4: Emerging
- 1-2: Not yet

Getting from 7 to 3
The seven-point rating scale just summarizes the evidence; the required interpretation is still needed:
a. % of children who reach or maintain functioning at a level comparable to same-age peers
b. % of children who improve functioning but are not in "a"
c. % of children who did not improve functioning

Example
- During a play-based assessment, the IFSP/IEP team administered a norm-referenced test, a curriculum-based assessment, and an interview with relevant caregivers
- The team then summarized the child's performance using each method's internal summary procedures: it calculated a standard score, derived a cutoff score, and narratively summarized the interview
- Lastly, the team rated the child's overall performance on each of the three OSEP outcomes using ECO's Child Outcome Summary Form
- Two years later, as the child was being transitioned out of the program, the results from a comprehensive curriculum-based assessment were reviewed, the child's performance was rated using ECO's Child Outcome Summary Form, and the team made a determination of progress

Example, Continued
Time One:
- Outcome One: rating = 3; interpretation = "not typical"
- Outcome Two: rating = 5; interpretation = "not typical"
- Outcome Three: rating = 6; interpretation = "typical"
Time Two:
- Outcome One: rating = 6; interpretation = a
- Outcome Two: rating = 5; interpretation = b*
- Outcome Three: rating = 5; interpretation = b
*Remember, the Child Outcome Summary Form's seven-point rating is a summary of performance, not of progress. At time two, teams are also prompted to consider progress. (A sketch of this mapping follows.)
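The time-two interpretations in this example can be expressed as a small decision rule. The sketch below is an illustrative encoding, assuming (per the slides above) that ratings of 6 or 7 count as comparable to same-age peers and that the team supplies a separate judgment about whether progress was made; it is not an official ECO or OSEP algorithm.

```python
# Illustrative mapping from a time-two COSF rating to the three reporting
# categories (a, b, c) described in "Getting from 7 to 3". Assumes ratings
# of 6-7 indicate functioning comparable to same-age peers and that the
# team separately judges progress. Not an official ECO/OSEP algorithm.

def osep_category(time2_rating: int, made_progress: bool) -> str:
    if time2_rating >= 6:
        return "a"  # reached or maintained functioning comparable to peers
    if made_progress:
        return "b"  # improved functioning, but not comparable to peers
    return "c"      # did not improve functioning

# The example child above:
print(osep_category(6, made_progress=True))   # Outcome One   -> a
print(osep_category(5, made_progress=True))   # Outcome Two   -> b
print(osep_category(5, made_progress=True))   # Outcome Three -> b
```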
Fact or Fiction?
1. Someone has the answers, and if I look long enough I'll have them too.
2. Everything has to be perfect the first time around.
3. Research doesn't matter; what matters is just getting the data submitted.
4. I really do believe that garbage in is garbage out, but at the end of the day I just want the data.

Overall Synthesis and Recommendations
- Rigorous implementation of curriculum-based assessments requires extensive professional development and support of instructional staff
- Findings suggest that CBAs, when implemented with rigor, have the potential to provide meaningful child progress data for program evaluation and accountability purposes

"And that's our outcomes measurement system. Any questions?"

References
Allen, D., Bricker, D., Macy, M., & Pretti-Frontczak, K. (2006, February). Providing accountability data using curriculum-based assessments. Poster presented at the Biannual Conference on Research Innovations in Early Intervention, San Diego, CA.
Brown, R. D., Kowalski, K., Pretti-Frontczak, K., Uchida, C., & Sacks, D. (2002, April). The reliability of teachers' assessment of early cognitive development using a naturalistic observation instrument. Paper presented at the 17th Annual Conference on Human Development, Charlotte, NC.
Grisham-Brown, J., Hallam, R., & Brookshire, R. (in press). Using authentic assessment to evidence children's progress towards early learning standards. Early Childhood Education Journal.
Grisham-Brown, J., Hallam, R., & Pretti-Frontczak, K. (in preparation). Measuring child outcomes using authentic assessment practices. Journal of Early Intervention (Innovative Practices).
Hallam, R., Grisham-Brown, J., Gao, X., & Brookshire, R. (in press). The effects of outcomes-driven authentic assessment on classroom quality. Early Childhood Research and Practice.
Hallam, R., Grisham-Brown, J., & Pretti-Frontczak, K. (2005, October). Meeting the demands of accountability through authentic assessment. Paper presented at the International Division for Early Childhood Annual Conference, Portland, OR.
Walker, D., & Pretti-Frontczak, K. (2005, December). Issues in selecting assessments for measuring outcomes for young children. Paper presented at the OSEP National Early Childhood Conference, Washington, DC. (http://www.nectac.org/~meetings/nationalDec05/mtgPage1.asp?enter=no)