A State Perspective on Enhancing Assessment & Accountability Systems through Systematic Integration of Computer Technology
Joseph A. Martineau, Ph.D.
Vincent J. Dean, Ph.D.
Michigan Department of Education
Presentation at the tenth annual Maryland Assessment Conference, October 2010

The Michigan Stage
Michigan offers an interesting perspective
◦ Pilot in 2006
◦ Pilot in 2011 (English Language Proficiency)
◦ Pilot in 2012 (Alternate Assessments)
◦ Pilots leading up to operational adoption of SMARTER/Balanced Assessment Consortium products in 2014-15
◦ Constitutional amendment barring unfunded mandates

The National Stage
Survey of state testing directors (+ D.C.)
◦ 50 responses, plus one investigation via a state department of education website
◦ 7 of 51 states have no CBT initiatives
◦ 44 of 51 states have current CBT initiatives, including:
  * Operational online assessment
  * Pilot online assessment
  * Plans for moving online

The National Stage, continued…
Survey of state testing directors (+ D.C.)
◦ CBT initiatives include:
  * Teacher entry of student responses online
  * Student entry of responses online
  * P&P replication
  * CAT
  * AI scoring
  * MC via internet, CR via paper and pencil
  * General populations (grade level and end of course)
  * Special populations (eases infrastructure concerns): modified, alternate, English language proficiency
  * Online repository and scoring of portfolio materials
  * Item banks for flexible unit-specific interim assessment
◦ Initiatives are all over the board, and piecemeal for the most part

The National Stage, continued…
Survey of state testing directors (+ D.C.)
◦ Of the 44 states with some initiative:
  * 26 states currently administer large-scale general populations assessments online
  * 15 states have plans to begin (or expand) online administration of large-scale general populations assessments
  * 12 states currently administer special populations assessments online
  * 3 states have plans to begin (or expand) online administration of special populations assessments

The National Stage, continued…
Survey of state testing directors (+ D.C.)
◦ Of the 44 states with some initiative:
  * 7 states currently use Artificial Intelligence (AI) scoring of constructed response items
  * 4 states currently use Computer Adaptive Testing (CAT) technology for general populations assessment, with one more moving in that direction soon
  * 0 states currently use CAT technology for special populations assessment
  * 10 states offer online interim/benchmark assessments
  * 10 states offer online item banks accessible to teachers for creating "formative"/interim/benchmark assessments tailored to unique curricular units

The National Stage, continued…
Survey of state testing directors (+ D.C.)
◦ Of the 44 states with some initiative:
  * 6 states offer computer based testing (CBT) options on general populations assessments as an accommodation for special populations
  * 4 states report piloting and administration of innovative item types (e.g., flash-based modules providing mathematical tools such as protractors, rulers, compasses)
  * 16 states offer End of Course (EOC) tests online, or are implementing online EOC in the near future
  * 6 states report substantial failure of a large-scale online testing program resulting in cessation of computer based testing; some have recovered and are moving back online, while others have no plans to return to online testing
The National Stage, continued…
Development of the Common Core of State Standards (CCSS)
◦ Content standards (not a test)
  * English Language Arts (K-12)
  * Mathematics (K-12)
◦ Developed with backing from 48 states
◦ Adoption tally:
  * Adopted in full by 39 states
  * Adoption declined in 5 states
  * Adoption expected by the remaining 6 states by the end of 2011

The National Stage, continued…
Assessment Consortia
◦ Race to the Top Assessment Competition
◦ Development of an infrastructure and content for a common assessment measuring the CCSS in English Language Arts and Mathematics
◦ Two consortia:
  * SMARTER/Balanced Assessment Consortium (SBAC)
  * Partnership for the Assessment of Readiness for College and Career (PARCC)

The National Stage, continued…
The consortia:
◦ SMARTER/Balanced
  * 31 states
  * 17 governing states
  * CAT beginning in 2014-15
◦ PARCC
  * 26 states
  * 11 governing states
  * CBT beginning in 2014-15

Consortia Membership

The National Stage, Summary
State efforts have been, with few exceptions, piecemeal by…
◦ Program
◦ Content area
◦ Grade level
◦ Type of assessment (summative, interim, formative)
◦ Population (general, modified, alternate)
Most states are…
◦ Involved in some kind of pilot or operational use
◦ Intending to be operational on a large scale by 2014-15
◦ Experiencing budget crises…
  * that make transitions difficult
  * that make efficiencies of technology integration critical
There is a strong need to take a systems look at how to integrate computer technology into assessment and accountability systems. Technology integration is a significant opportunity to provide a platform that connects all initiatives.

The Organizing Framework for this Paper
From…
◦ Martineau, J. A., & Dean, V. J. (in press). Making assessment relevant to students, teachers, and schools. In V. Shute & B. J. Becker (Eds.), Innovative Assessment for the 21st Century: Supporting Educational Needs. New York: Springer-Verlag.
◦ Figure 1 from that chapter lays out the complete system.

[Figure 1: A comprehensive, balanced assessment & accountability system. The diagram arranges its components under three supporting strands, four assessment layers, and an accountability layer:
◦ Professional Development: assessment literacy standards for educator certification; assessment literacy training requirements for teachers, consultants, and leaders; pre- and in-service balanced assessment training on content standards, classroom assessment (formative, summative), large-scale assessment (benchmark, summative), assessment data use for decision making, and subjective item scoring; ongoing support for implementation in the form of school teams and coaches (for observation and follow-up); model classroom formative & summative assessment strategies & materials; online classroom assessment strategies & materials clearinghouse for educators
◦ Content & Process Standards: a limited number of high-school exit standards; a limited number of K-12 content/process standards; learning progressions; model curriculum/instruction units; classification of content & process standards for measurement purposes by response type (on-demand timed, on-demand untimed, feedback looped), task type (selected response, short constructed response, extended constructed response, performance event), and setting (classroom only, classroom and secure)
◦ Classroom Formative Assessment: formative assessment implementation
◦ Classroom Summative Assessment: summative classroom assessments; classroom achievement scores
◦ Secure Adaptive Interim Assessment: repeatable, on-demand, customizable, online unit assessments; unit achievement scores; growth scores based on learning progressions
◦ Secure Adaptive Summary Assessment: portfolio description (feedback-looped tasks); portfolio development & submission; end-of-year, on-demand summary assessment (if needed); scoring (maximize objective, distribute subjective); overall achievement & growth scores
◦ Accountability: SEA & LEA accountability (e.g., accreditation) for in-service PD; teacher prep institution accountability (e.g., accreditation) for pre-service PD; educator accountability (e.g., evaluation, performance pay) for implementation of classroom assessment & data use practices; educator accountability (e.g., evaluations, performance pay) for individual student achievement & growth scores on secure summative assessments; student accountability (e.g., grades, course credit) for classroom (and possibly secure interim) summative scores]

Reading Figure 1 from the top down:
◦ Accountability as a protective umbrella over the complete system: makes sense only when all layers below are in place
◦ Secure Adaptive Summary Assessment as a policy and accountability metric (including cross-year growth modeling)
◦ Secure Adaptive Interim Assessment as a policy and accountability metric (including within-year growth modeling): makes sense only when the foundational layers are in place
◦ Classroom Summative Assessment layered on formative assessment
◦ Classroom Formative Assessment as the ground floor
◦ Content and Process Standards as the foundation
◦ Professional Development as the footings
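The layering argument above, in which each upper layer makes sense only when the layers beneath it are in place, can be pictured as a simple prerequisite check. The sketch below is purely illustrative: the layer names come from Figure 1, but the list representation and the ready_for function are assumptions made for this example, not part of the framework itself.

```python
# Framework layers, bottom to top, as laid out in Figure 1 (illustrative encoding)
LAYERS = [
    "professional_development",          # footings
    "content_process_standards",         # foundation
    "classroom_formative_assessment",    # ground floor
    "classroom_summative_assessment",
    "secure_adaptive_interim_assessment",
    "secure_adaptive_summary_assessment",
    "accountability",                    # protective umbrella
]

def ready_for(layer, implemented):
    """A layer is defensible only when every layer beneath it is in place."""
    below = LAYERS[:LAYERS.index(layer)]
    missing = [name for name in below if name not in implemented]
    return len(missing) == 0, missing

# Example: accountability attempted with only the two lowest layers in place
ok, missing = ready_for("accountability",
                        {"professional_development", "content_process_standards"})
print(ok, missing)   # False, with the unimplemented middle layers listed
```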
Entry Points and Outcomes
[Figure 1 repeated, with callouts identifying entry points and outcomes:]
◦ Entry points: assessment literacy standards for educator certification; a limited number of high-school exit standards
◦ Outcomes: overall achievement & growth scores; unit achievement scores; growth scores based on learning progressions; classroom achievement scores; formative assessment implementation

The Organizing Framework for this Paper, continued…
With a comprehensive system in place, it becomes possible to identify, comprehensively, where integration of technology will enable and enhance the system.
◦ Those components are identified with bold outlines on the next slide.
[Figure 1 repeated, with the technology-enabled components marked in bold outline.]
Starting from the Bottom Up
(Framework component: Professional Development, covering pre- and in-service balanced assessment training on content standards, classroom assessment (formative, summative), large-scale assessment (benchmark, summative), assessment data use for decision making, and subjective item scoring, plus ongoing support for implementation in the form of school teams and coaches for observation and follow-up.)
◦ Current lack of pre-service and in-service balanced assessment training
◦ Need for rapid scale-up to millions of educators on a small budget

Technology Integration into Pre- and In-Service Professional Development
Scaling up is only feasible with integral use of technological tools:
◦ High-quality online courses
◦ Social networking among educators
◦ Live tele-coaching
◦ Electronic (graphic, audio, video) capture for distance streaming of materials, plans, and instructional practice vignettes over high-speed networks, to facilitate discussion of instructional practice between:
  * candidates and instructor/coach
  * candidates and mentor
  * mentors and instructor/coach
◦ For example, repurposing Idaho's special portfolio submission system for educator training

Moving to Content & Process Standards
◦ Start with a limited set of high school exit standards based on college and career readiness
◦ From those, develop K-12 content/process standards in a logical progression to college and career readiness
◦ Based on the learning progressions and K-12 content/process standards, develop model instructional materials (model curriculum/instruction units)

Model Instructional Materials Clearinghouse
Develop an online clearinghouse of materials for model curriculum and instructional units
◦ Lesson plans
◦ Lesson materials
◦ Video vignettes of high-quality instructional practices based on those units
◦ Flexible platform to accept user submissions in a variety of formats
◦ User-moderated ratings of submission quality

Moving to Assessment Practices
Before actually moving into assessment practices, it is important to classify content standards in three ways (a sketch of one possible representation follows this list):
◦ Timing
  * On-demand, time limited
  * On-demand, not time limited
  * Feedback-looped
◦ Task type
  * Selected response
  * Short constructed response
  * Extended constructed response
  * Performance events
◦ Setting
  * Classroom only
  * Classroom and secure
Based on these classifications, several types of assessment take place.
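Because these classifications determine which assessment types can measure a given standard, it may help to see them as structured metadata attached to each standard. The sketch below is a minimal illustration under that assumption; the field names, routing rules, and the example standard identifier are invented for illustration and are not part of the framework.

```python
from dataclasses import dataclass

@dataclass
class StandardClassification:
    standard_id: str
    timing: str      # "on_demand_timed" | "on_demand_untimed" | "feedback_looped"
    task_type: str   # "selected" | "short_cr" | "extended_cr" | "performance_event"
    setting: str     # "classroom_only" | "classroom_and_secure"

def eligible_assessments(c: StandardClassification):
    """Route a classified standard to the assessment types that can measure it."""
    targets = {"classroom_formative", "classroom_summative"}
    if c.setting == "classroom_and_secure":
        if c.timing == "feedback_looped":
            targets.add("secure_portfolio")  # developed and scored over time
        else:
            targets.update({"secure_interim_unit", "secure_summary"})
    return targets

# Hypothetical higher-order writing standard, feedback-looped and securely measurable
example = StandardClassification("ELA-HS-W-1", "feedback_looped",
                                 "extended_cr", "classroom_and_secure")
print(sorted(eligible_assessments(example)))
```

The routing mirrors the logic of the slides that follow: feedback-looped standards flow toward portfolio assessment, while on-demand standards flow toward secure unit and summary assessments.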
Assessment Practices, continued…
Start with model classroom materials and tools.
(Framework components: model classroom formative & summative assessment strategies & materials; online classroom assessment strategies & materials clearinghouse for educators.)
Initial development of model materials, vignettes, strategies, and tools sets the stage for educator submissions to:
◦ Populate an online clearinghouse of materials for model classroom assessment practice units
  * Summative assessment materials
  * Formative assessment vignettes, strategies, and tools
  * Flexible platform to accept user submissions in a variety of formats
  * User-moderated ratings of submission quality
◦ Populate a non-secure item bank generated by educators
  * Platform supports various item types
  * User-moderated ratings of submission quality
  * Large enough that security is not a concern
  * Empirically designed MC items
  * Fully customizable

Which in Turn Leads to…
(Framework components: summative classroom assessments; online classroom assessment strategies & materials clearinghouse for educators; formative assessment implementation.)
Implementation of formative assessment practices enhanced by technological aids, such as:
◦ Response devices (e.g., clickers, tablet computers, phones)
◦ Rapid response to teacher queries over online systems
◦ Remote response to formative queries (e.g., rural areas and cyberschools)

Which in Turn Leads to…
Selection or development of summative classroom assessments
◦ On-demand micro-benchmark (small unit) assessments
◦ Drawn from the non-secure item bank generated by educators
◦ Customizable to fit specific lesson plans/curricular documents
◦ Instant reporting for diagnostic/instructional intervention purposes
◦ Inform targeted professional development in real time
◦ Results NOT used for large-scale accountability purposes (they belong to the schools and teachers)

With High-Quality Classroom Assessment Practices in Place
Large-scale assessment now makes sense, with three types of large-scale assessment:
◦ End-of-year, on-demand summary assessment (if needed)
◦ Repeatable, on-demand, customizable, online unit assessments
◦ Portfolio development & submission

Large-Scale Assessment, continued…
Start with classroom-based portfolio development & submission
◦ For content standards best measured using "feedback-looped" tasks
◦ Meaning content standards (likely higher order) that are best accomplished with a feedback cycle between teacher and student

Portfolio Development & Submission, continued…
◦ Creation of the portfolio includes scannable materials, electronic documents, and/or audio/video of student performance
◦ Submitted via a secure online portfolio repository (e.g., Idaho's alternate assessment portfolio submission site)
◦ Unlikely to be scorable using AI; therefore, scored on a distributed online scoring system that prevents teachers from scoring their own students' portfolios (e.g., Idaho's alternate assessment portfolio scoring site)
◦ Can be scored both for the final product and for development over time

Moving to Secure Online Testing
For content standards that do not require "feedback-looped" tasks: repeatable, on-demand, customizable, online unit assessments.
Dynamic online CAT assessments (the basic adaptive loop is sketched below)
◦ Based on dynamically selected clusters of content standards covered in instructional units
◦ Scaled to the same scale as the end-of-year assessment, with cut scores for mastery/proficiency
◦ Can move students on to higher grade-level content once mastery/proficiency of all grade-level content is demonstrated through unit assessments
◦ What the Race to the Top Assessment Competition calls "Through-Course Assessment"
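For readers less familiar with CAT mechanics, the sketch below shows the core loop such an engine performs: estimate ability, select the most informative unadministered item, update the estimate. It assumes a two-parameter logistic (2PL) IRT model and uses invented item parameters and function names; an operational engine would add content balancing, exposure control, and proper MLE/EAP scoring rather than the crude update shown here.

```python
import math

def p_correct(theta, a, b):
    """2PL probability of a correct response at ability theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def select_next_item(theta_hat, items, administered):
    """Pick the unadministered item with maximum information at the current estimate."""
    candidates = [i for i in items if i["id"] not in administered]
    return max(candidates, key=lambda i: item_information(theta_hat, i["a"], i["b"]))

def update_theta(theta_hat, item, correct, step=0.3):
    """Crude ability update toward or away from the item's location.
    (A real engine would use maximum likelihood or EAP estimation.)"""
    residual = (1.0 if correct else 0.0) - p_correct(theta_hat, item["a"], item["b"])
    return theta_hat + step * residual

# Illustrative fixed-length unit assessment over a small invented item bank
item_bank = [{"id": k, "a": 1.0 + 0.1 * (k % 5), "b": -2.0 + 0.2 * k} for k in range(20)]
theta, administered = 0.0, set()
for _ in range(8):
    item = select_next_item(theta, item_bank, administered)
    administered.add(item["id"])
    correct = theta > item["b"]          # stand-in for a real student response
    theta = update_theta(theta, item, correct)
print(f"provisional ability estimate: {theta:.2f}")
```

A unit-level mastery or proficiency decision would then compare the final ability estimate against a cut score placed on the common scale shared with the end-of-year assessment.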
Moving to Secure Online Testing, continued…
Repeatable, on-demand, customizable, online unit assessments
◦ What the Race to the Top Assessment Competition calls "Through-Course Assessment"
◦ Provide an advance look at the trajectory toward proficiency
◦ Provide multiple opportunities to demonstrate proficiency
◦ More equitable for high-stakes accountability purposes
◦ Useful for mid-year correction in instructional practice (e.g., Response to Intervention)
◦ Useful for placement of newly arrived students
◦ Useful for differentiated instruction
◦ Anticipated increase in educator motivation (because of timely information)

Moving to Secure Online Testing, continued…
Beyond traditional CAT/CBT:
◦ AI scoring of constructed-response items
◦ Technology-enhanced items
◦ Performance tasks/events (through simulations)
◦ Gaming-type items

Moving to Secure Online Testing, continued…
End-of-year, on-demand summary assessment (if needed), for three groups of students:
1. The initial scaling and calibration group
2. Ongoing randomly selected validation groups (to validate that students proficient on all required unit tests retain proficiency at the end of the year)
3. Students who do not achieve proficiency on all required unit tests
◦ A final opportunity to demonstrate overall proficiency if proficiency was in question on any single unit assessment
◦ Allows for the elimination of a single end-of-year test for most students

Scoring
(Framework component: scoring, i.e., maximize objective, distribute subjective.)
Maximize objective scoring by:
◦ Automated scoring of objective items
◦ AI scoring of extended written-response items, technology-enhanced items, and performance tasks wherever possible
◦ Distributed hand-scoring of tasks not scorable using AI

Distributed Scoring as Professional Development
Human scorers are drawn from the ranks of educators (a sketch of the assignment logic follows):
◦ Online training on hand-scoring
◦ Online certification as a hand-scorer
◦ Online monitoring of rater performance
◦ Validation hand-scoring of samples of AI-scored tasks
Our experience with teacher-led scoring and range-finding indicates that it is some of the best professional development that we provide to educators.
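The distributed-scoring constraints described above (certified scorers only, never a student's own teacher, with human validation of sampled AI-scored tasks) can be expressed as a small assignment routine. The sketch below is illustrative only: it does not reproduce the Idaho systems referenced earlier, and the data structures, field names, and sampling rate are assumptions.

```python
import random

def assign_scorers(portfolios, scorers, reads_per_portfolio=2):
    """Assign each portfolio to certified scorers, excluding the student's own teacher."""
    assignments = {}
    for p in portfolios:
        eligible = [s for s in scorers
                    if s["certified"] and s["teacher_id"] != p["teacher_id"]]
        if len(eligible) < reads_per_portfolio:
            raise ValueError(f"not enough eligible scorers for portfolio {p['id']}")
        assignments[p["id"]] = [s["scorer_id"]
                                for s in random.sample(eligible, reads_per_portfolio)]
    return assignments

def sample_for_ai_validation(ai_scored_task_ids, rate=0.10):
    """Flag a random sample of AI-scored tasks for validation hand-scoring."""
    k = max(1, int(len(ai_scored_task_ids) * rate))
    return set(random.sample(ai_scored_task_ids, k))

# Illustrative data: two portfolios, three certified educator-scorers
portfolios = [{"id": 1, "teacher_id": "T1"}, {"id": 2, "teacher_id": "T2"}]
scorers = [{"scorer_id": "S1", "teacher_id": "T1", "certified": True},
           {"scorer_id": "S2", "teacher_id": "T2", "certified": True},
           {"scorer_id": "S3", "teacher_id": "T3", "certified": True}]
print(assign_scorers(portfolios, scorers))
print(sample_for_ai_validation(list(range(100)), rate=0.05))
```

Rater-monitoring statistics (agreement with validity papers, score distributions) would be computed over the resulting assignments; that bookkeeping is omitted here.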
Reporting
For the most part, reports are difficult to read and poorly used. There is a need for online reporting of all scores for all stakeholders, including:
◦ Policymakers (aggregate)
◦ Administrators (aggregate and individual)
◦ Teachers (aggregate and individual)
◦ Parents (aggregate and individual)
◦ Students (individual)

Reporting Portal
The reporting portal needs to integrate reports from classroom metrics all the way to large-scale secure assessment metrics.
(Framework components: overall achievement & growth scores; unit achievement scores; growth scores based on learning progressions; classroom achievement scores.)

Reporting Portal, continued…
Reporting cycles depend on the item types and the application of AI scoring:
◦ Immediate where possible
◦ Expedited hand-scoring (shifting the funding focus from printing, shipping, and scanning to on-demand hand-scoring)

Where the Rubber Hits the Road
This is a nice system design (if we do say so ourselves), but what are the impediments to implementation?
Infrastructure
◦ LEA hardware and bandwidth capacity
◦ Assessment vendor capacity
◦ Moving from piecemeal components to an integrated, coherent system
◦ Development of educator-moderated clearinghouses
◦ Development of an educator-moderated item bank

Where the Rubber Hits the Road
Security
◦ The more high-stakes the system, the more likely security breaches become
◦ Critical need for training on user roles
◦ Critical need for training on data use, since data will become much more readily available across the board
◦ Security controls versus open source and maximal access

Where the Rubber Hits the Road
Funding
◦ Very high initial startup investment
◦ Dual systems during development and initial implementation
◦ Ramping up LEA technology systems to be capable of working within the system

Where the Rubber Hits the Road
Sustainability
◦ Requires perpetual investment in administration
◦ Development is only the start (e.g., sustainability concerns regarding the RTTT-funded assessment consortia)
◦ Requires early success and public understanding of the benefits of the system weighed against ongoing costs
◦ Recurring hardware/software upgrade costs for LEAs
◦ Recurring hardware/software maintenance costs for central IT systems

Where the Rubber Hits the Road
Local Control
◦ This kind of system can only be created with significant funding and local buy-in
◦ No single state (let alone district) could afford the cost of development and implementation
◦ Consortia are imperative to creating such a system
  * Consortia can tend toward self-perpetuation rather than serving their members
  * Consortia cannot ignore local nuances
  * Consortia cannot ignore reasonable needs for flexibility
  * Consortia must monitor and maximize member investment

Where the Rubber Hits the Road
Building an appetite for online systems
◦ Implementation may occur piecemeal, but should be undertaken within a framework for a coherent and complete system
◦ Each piece, when implemented, needs to be implemented in such a way that local educators and policymakers see a positive impact on the educational system, e.g.:
  * Immediate turnaround of results
  * Connection between family and school
  * Improved instructional practice
  * Facilitation of differentiated instruction

Recommendations for Future Directions
The system has the potential to make us data-rich and analysis-poor
◦ Build local (SEA and LEA) capacity for appropriate analysis (possibly through redefining positions that might be eliminated through consortia services)
◦ New practices (e.g., through-course assessment, innovative item types, AI scoring) will require a significant research and validation agenda, including:
  * Equating
  * Comparability
  * Standard setting
(One illustrative equating computation follows this list.)
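As one concrete example of the equating and comparability work this agenda implies, the sketch below applies mean-sigma linking, a standard IRT approach for placing parameter and ability estimates from one form or delivery mode onto the scale of another via common anchor items. It is offered only as an illustration of the kind of analysis required, not as a method specified in this presentation, and the anchor-item values are invented.

```python
from statistics import mean, stdev

def mean_sigma_linking(anchor_b_new, anchor_b_base):
    """Slope and intercept placing the new form's theta scale onto the base scale,
    using difficulty estimates of the same anchor items calibrated on both forms."""
    A = stdev(anchor_b_base) / stdev(anchor_b_new)
    B = mean(anchor_b_base) - A * mean(anchor_b_new)
    return A, B

def transform_theta(theta_new, A, B):
    """Rescale an examinee's ability estimate from the new form to the base scale."""
    return A * theta_new + B

# Invented anchor-item difficulty estimates from two separate calibrations
b_new = [-1.2, -0.4, 0.1, 0.8, 1.5]
b_base = [-1.0, -0.2, 0.3, 1.0, 1.7]
A, B = mean_sigma_linking(b_new, b_base)
print(round(transform_theta(0.5, A, B), 3))
```

Comparability studies across modes (paper versus online, traditional versus technology-enhanced items) would layer analyses of this kind with studies of item functioning and standard-setting replication.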
Recommendations for Future Directions
The system has the potential to make educators and students data-rich
◦ Portfolios of assessment results and products as evidence of students' college and career readiness
◦ Portfolios of assessment results and products as evidence of teacher classroom practices and effectiveness

Recommendations for Future Directions
◦ Financial incentives from ARRA/RTTT have provided the impetus for some of these initiatives to get started
◦ Sustainability needs to be a focus both within and across states
◦ To maximize the cross-state focus, we recommend continued significant funding of initiatives through ESEA reauthorization, Enhanced Assessment Grants, and other competitive/formula funding opportunities

Recommendations for Future Directions
Scoring of competitive consortium applications should be weighted toward…
◦ The development of integrated systems across all aspects of assessment & accountability
◦ Significant and rigorous research, development, and evaluation of the validity and impact (intended and unintended consequences) of system development and implementation
Formula funding should stipulate collaboration in system development. Use of formula funding guarantees…
◦ Continued focus on students with the greatest needs
◦ Access to quality systems for states without strong resources for writing competitive grants

Contact Information
Joseph A. Martineau, Ph.D.
◦ Director of Assessment & Accountability
◦ [email protected]
Vincent J. Dean, Ph.D.
◦ State Assessment Manager
◦ [email protected]
Michigan Department of Education