A State Perspective on Enhancing
Assessment & Accountability Systems
through Systematic Integration of
Computer Technology
Joseph A. Martineau, Ph.D.
Vincent J. Dean, Ph.D.
Michigan Department of Education
Presentation at the tenth annual
Maryland Assessment Conference
October 2010
The Michigan Stage

Michigan offers an interesting perspective
◦ Pilot in 2006
◦ Pilot in 2011 (English Language Proficiency)
◦ Pilot in 2012 (Alternate Assessments)
◦ Pilots leading up to operational adoption of SMARTER/Balanced Assessment Consortium products in 2014/15
◦ Constitutional amendment barring unfunded
mandates
The National Stage

Survey of state testing directors (+D.C.)
◦ 50 responses + one investigation via state
department of education website
◦ 7 of 51 states have no CBT initiatives
◦ 44 of 51 states have current CBT initiatives,
including:
 Operational online assessment
 Pilot online assessment
 Plans for moving online
The National Stage, continued…

Survey of state testing directors (+D.C.)
◦ CBT initiatives include
 Teacher entry of student responses online
 Student entry of responses online
 P&P replication
 CAT
 AI scoring
 MC via internet, CR via paper and pencil
 General populations (grade level and end of course)
 Special populations (eases infrastructure concerns)
 Modified
 Alternate
 English language proficiency
 Online repository and scoring of portfolio materials
 Item banks for flexible unit-specific interim assessment
◦ Initiatives vary widely and are, for the most part, piecemeal
The National Stage, continued…

Survey of state testing directors (+D.C.)
◦ Of 44 states with some initiative
 26 states currently administer large-scale general
populations assessments online
 15 states have plans to begin (or expand) online
administration of large-scale general populations
assessments
 12 states currently administer special populations
assessments online
 3 states have plans to begin (or expand) online
administration of special populations assessments
The National Stage, continued…

Survey of state testing directors (+D.C.)
◦ Of 44 states with some initiative
 7 states currently use Artificial Intelligence (AI) scoring
of constructed response items
 4 states currently use Computer Adaptive Testing (CAT)
technology for general populations assessment, with one
more moving in that direction soon
 0 states currently use CAT technology for special
populations assessment
 10 states offer online interim/benchmark assessments
 10 states offer online item banks accessible to teachers
for creating “formative”/interim/benchmark assessments
tailored to unique curricular units
The National Stage, continued…

Survey of state testing directors (+D.C.)
◦ Of 44 states with some initiative
 6 states offer computer based testing (CBT) options on
general populations assessment as an accommodation for
special populations
 4 states report piloting and administration of innovative item
types (e.g. flash-based modules providing mathematical tools
such as protractors, rulers, compasses)
 16 states offer End of Course (EOC) tests online, or are
implementing online EOC in the near future
 6 states report substantial failure of large-scale online testing, resulting in cessation of computer based testing
 Some have recovered and are moving back online
 Others have no plans to return to online testing
The National Stage, continued…

Development of the Common Core State Standards (CCSS)
◦ Content standards (not a test)
 English Language Arts (K-12)
 Mathematics (K-12)
◦ Developed with backing from 48 states
◦ Adoption tally
 Adopted in full by 39 states
 Adoption declined in 5 states
 Adoption expected by remaining 6 states by end of
2011
The National Stage, continued…

Assessment Consortia
◦ Race to the Top Assessment Competition
◦ Development of an infrastructure and content
for a common assessment in measuring CCSS
in English Language Arts and Mathematics
◦ Two consortia
 SMARTER/Balanced Assessment Consortium
(SBAC)
 Partnership for Assessment of Readiness for College and Careers (PARCC)
The National Stage, continued…

The consortia:
◦ SMARTER/Balanced
 31 states
 17 governing states
 CAT beginning in 2014-2015
◦ PARCC
 26 states
 11 governing states
 CBT beginning in 2014-15
Consortia Membership
The National Stage, Summary

State efforts have been, with few exceptions, piecemeal by…
◦ Program
◦ Content area
◦ Grade level
◦ Type of assessment (summative, interim, formative)
◦ Population (general, modified, alternate)

Most states are…
◦ Involved in some kind of pilot or operational use
◦ Intending to be operational on a large scale by 2014-2015
◦ Experiencing budget crises…
  ◦ That make transitions difficult
  ◦ That make efficiencies of technology integration critical

There is a strong need to take a systems-level look at how to integrate computer technology into assessment and accountability systems

Technology integration is a significant opportunity to provide a platform
that connects all initiatives
The Organizing Framework for this
Paper

From…
◦ Martineau, J. A., & Dean, V. J. (in press). Making Assessment Relevant to Students, Teachers, and Schools. In V. Shute & B. J. Becker (Eds.), Innovative Assessment for the 21st Century: Supporting Educational Needs. New York: Springer-Verlag.
◦ Figure 1
Figure 1 layers, top to bottom: Accountability; Secure Adaptive Summary Assessment; Secure Adaptive Interim Assessment; Classroom Summative Assessment; Classroom Formative Assessment; Content & Process Standards; Professional Development
Figure 1 components:
 SEA & LEA accountability (e.g., accreditation) for inservice PD
 Teacher prep institution accountability (e.g., accreditation) for preservice PD
 Educator accountability (e.g., evaluation, performance pay) for implementation of classroom assessment & data use practices
 Educator accountability (e.g., evaluations, performance pay) for individual student achievement & growth scores on secure summative assessments
 Student accountability (e.g., grades, course credit) for classroom (and possibly secure interim) summative scores
 End of year, on-demand summary assessment (if needed)
 Overall achievement & growth scores
 Repeatable, on-demand, customizable, on-line unit assessments
 Unit achievement scores
 Growth scores based on learning progressions
 Scoring (maximize objective, distribute subjective)
 Portfolio description (feedback looped tasks)
 Portfolio development & submission
 Summative classroom assessments
 Classroom achievement scores
 Model classroom formative & summative assessment strategies & materials
 Online classroom assessment strategies & materials clearinghouse for educators
 Formative assessment implementation
 Classification of content & process standards for measurement purposes by:
* Response type: on-demand timed; on-demand untimed; feedback looped
* Task type: selected response; short constructed response; extended constructed response; performance event
* Setting: classroom only; classroom and secure
 Limited number of high-school exit standards
 Learning progressions
 Limited number of K-12 content/process standards
 Model curriculum/instruction units
 Assessment literacy standards for educator certification
 Assessment literacy training requirements for: teachers; consultants; leaders
 Pre- and in-service balanced assessment training on:
* content standards
* classroom assessment (formative, summative)
* large-scale assessment (benchmark, summative)
* assessment data use for decision making
* subjective item scoring
 Ongoing support for implementation in the form of school teams and coaches (for observation and follow-up)
 Accountability as Protective Umbrella Over the Complete System that Makes Sense Only when All Layers Below are in Place
 Secure Adaptive Summary Assessment as a Policy and Accountability Metric (including Cross-Year Growth Modeling)
 Secure Adaptive Interim Assessment as a Policy and Accountability Metric (including Within-Year Growth Modeling) that Makes Sense Only when the Foundational Layers are in Place
 Classroom Summative Assessment Layered on Formative Assessment
 Classroom Formative Assessment as the Ground Floor
 Content and Process Standards as Foundation
 Professional Development as Footings
Entry Points
◦ Assessment literacy standards for educator certification
◦ Limited number of high-school exit standards
Outcomes
◦ Overall achievement & growth scores
◦ Unit achievement scores
◦ Growth scores based on learning progressions
◦ Classroom achievement scores
◦ Formative assessment implementation
The Organizing Framework for this Paper,
continued…

With a comprehensive system in place, it
is possible to identify comprehensively
where integration of technology will
enable and enhance the system

Components identified with bold outlines
on the next slide
Figure 1 (repeated, with the components that technology integration can enable or enhance shown in bold outlines)
Starting from the Bottom Up

Professional Development
◦ Pre- and in-service balanced assessment training on:
* content standards
* classroom assessment (formative, summative)
* large-scale assessment (benchmark, summative)
* assessment data use for decision making
* subjective item scoring
◦ Ongoing support for implementation in the form of school teams and coaches (for observation and follow-up)
 Current lack of pre-service and in-service balanced assessment training
 Need for rapid scale up to millions of educators on a small budget

Technology Integration into Pre- and
In-Service Professional Development

Scaling up is only feasible with integral use of technological tools
◦ High-quality online courses
◦ Social networking among educators
◦ Live tele-coaching
◦ Electronic (graphic, audio, video) capture for distance streaming of materials, plans, and instructional practice vignettes over high-speed networks
 To facilitate discussion regarding instructional practice between
   Candidates and instructor/coach
   Candidates and mentor
   Mentors and instructor/coach
 For example, repurposing Idaho’s special portfolio submission system for educator training
Moving to Content & Process
Standards



 Start with a limited set of high school exit standards based on college and career readiness
 From that, develop K-12 content/process standards in a logical progression to college and career readiness
 Based on the learning progressions and K-12 content/process standards, develop model instructional materials (model curriculum/instruction units)
Model Instructional Materials
Clearinghouse

Develop online clearinghouse of materials
for model curriculum and instructional
units
◦ Lesson plans
◦ Lesson materials
◦ Video vignettes of high quality instructional
practices based on those units
◦ Flexible platform to accept user submission in
a variety of formats
◦ User moderated ratings of submission quality
Moving to Assessment Practices

Before actually moving into assessment practices, it is
important to classify content standards in three ways:
◦ Timing
 On-demand, time limited
 On-demand, not time limited
 Feedback-looped
◦ Task type
 Selected response
 Short constructed response
 Extended constructed response
 Performance events
◦ Setting
 Classroom only
 Classroom and secure

Based on these classifications, several types of
assessment take place
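The three-way classification above can be sketched as a small data structure. This is only an illustration: the class names, the `standard_id` field, and the routing rule in `assessment_route` are assumptions made for the example and are not specified in the slides.

```python
from dataclasses import dataclass
from enum import Enum


class Timing(Enum):
    ON_DEMAND_TIMED = "on-demand, time limited"
    ON_DEMAND_UNTIMED = "on-demand, not time limited"
    FEEDBACK_LOOPED = "feedback-looped"


class TaskType(Enum):
    SELECTED_RESPONSE = "selected response"
    SHORT_CONSTRUCTED = "short constructed response"
    EXTENDED_CONSTRUCTED = "extended constructed response"
    PERFORMANCE_EVENT = "performance event"


class Setting(Enum):
    CLASSROOM_ONLY = "classroom only"
    CLASSROOM_AND_SECURE = "classroom and secure"


@dataclass(frozen=True)
class StandardClassification:
    standard_id: str  # hypothetical identifier, for illustration only
    timing: Timing
    task_type: TaskType
    setting: Setting


def assessment_route(c: StandardClassification) -> str:
    """Assumed routing rule (not taken from the slides verbatim)."""
    if c.setting is Setting.CLASSROOM_ONLY:
        return "classroom assessment"
    if c.timing is Timing.FEEDBACK_LOOPED:
        return "portfolio development & submission"
    return "secure online unit assessment"


example = StandardClassification(
    "EXAMPLE-1",
    Timing.FEEDBACK_LOOPED,
    TaskType.EXTENDED_CONSTRUCTED,
    Setting.CLASSROOM_AND_SECURE,
)
print(assessment_route(example))
```

Keeping the classification as explicit fields would let a clearinghouse or item bank filter standards by any of the three dimensions.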
Assessment Practices, continued…

 Start with model classroom materials and tools
◦ Model classroom formative & summative assessment strategies & materials
◦ Online classroom assessment strategies & materials clearinghouse for educators
 Initial development of model materials, vignettes, strategies, and tools sets the stage for…
Educator submissions to

Populate online clearinghouse of materials for
model classroom assessment practice units
◦ Summative assessment materials
◦ Formative assessment vignettes, strategies, and tools
◦ Flexible platform to accept user submission in a
variety of formats
◦ User moderated ratings of submission quality

Non-secure item bank generated by educators
◦ Platform supports various item types
◦ User moderated ratings of submission quality
◦ Large enough that security is not a concern
 Empirically designed MC items
 Fully customizable
Which in Turn Leads to…
Summative classroom assessments
Online classroom assessment strategies & materials clearinghouse for educators
Formative assessment implementation
Implementation of formative assessment practices
enhanced by technological aids, such as
◦ Response devices (e.g., clickers, tablet computers, phones)
◦ Rapid response to teacher queries over online systems
◦ Remote response to formative queries (e.g. rural areas and
cyberschools)
Which in Turn Leads to…
Selection or development of summative classroom assessments
◦ On-demand micro-benchmark (small unit) assessments
◦ From non-secure item bank generated by educators
◦ Customizable to fit specific lesson plans/curricular documents
◦ Instant reporting for diagnostic/instructional intervention purposes
◦ Inform targeted professional development in real time
◦ Results NOT used for large-scale accountability purposes (results belong to the schools and teachers)
With High-Quality Classroom Assessment
Practices in Place

Large-scale assessment now makes sense,
with three types of large-scale assessment
◦ End of year, on-demand summary assessment (if needed)
◦ Repeatable, on-demand, customizable, on-line unit assessments
◦ Portfolio development & submission
Large-Scale Assessment, continued…
 Start with classroom-based portfolio development & submission
 For content standards best measured using “feedback-looped” tasks
◦ Meaning content standards (likely higher order) that are best accomplished with a feedback cycle between teacher and student
Portfolio Development & Submission,
continued…
 Creation of portfolio includes scannable materials, electronic documents, and/or audio/video of student performance
 Submitted via a secure online portfolio repository (e.g., Idaho’s alternate assessment portfolio submission site)
 Unlikely to be scorable using AI; therefore, scored on a distributed online scoring system that prevents teachers from scoring their own students’ portfolios (e.g., Idaho’s alternate assessment portfolio scoring site)
 Can be scored both for final product and development over time
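The rule that teachers never score their own students’ portfolios can be sketched as a simple assignment routine. This is a minimal sketch under assumed data shapes (portfolio and teacher identifiers are hypothetical); it is not a description of how Idaho’s scoring site works.

```python
def assign_scorers(portfolios, scorers):
    """Assign each portfolio to a scorer other than the student's own teacher.

    portfolios: list of (portfolio_id, teacher_id) pairs (hypothetical shape).
    scorers: list of teacher_ids certified to hand-score.
    Returns a dict mapping portfolio_id to scorer_id, always choosing the
    least-loaded eligible scorer (ties broken by scorer id).
    """
    load = {s: 0 for s in scorers}
    assignment = {}
    for portfolio_id, teacher_id in portfolios:
        eligible = [s for s in scorers if s != teacher_id]
        if not eligible:
            raise ValueError(f"no eligible scorer for {portfolio_id}")
        chosen = min(eligible, key=lambda s: (load[s], s))
        load[chosen] += 1
        assignment[portfolio_id] = chosen
    return assignment


portfolios = [("p1", "t1"), ("p2", "t1"), ("p3", "t2"), ("p4", "t3")]
scorers = ["t1", "t2", "t3"]
assignment = assign_scorers(portfolios, scorers)
print(assignment)
```

Choosing the least-loaded eligible scorer keeps the workload roughly even, which matters if hand-scoring is also meant to serve as professional development for many educators.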

Moving to Secure Online Testing

For content standards that do not require “feedback-looped” tasks
Repeatable, on-demand, customizable, on-line unit assessments

Dynamic online CAT assessments
◦ Based on dynamically selected clusters of content standards covered in
instructional units
◦ Scaled to the same scale as the end-of-year assessment, with cut scores
for mastery/proficiency
◦ Can move students on to higher grade level content once
mastery/proficiency of all grade level content is demonstrated through
unit assessments
◦ What Race to the Top Assessment Competition calls “Through-Course
Assessment”
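As a rough illustration of how an adaptive unit assessment might pick its next item, the sketch below selects the unadministered item with the greatest information at the student’s current ability estimate. The Rasch (one-parameter logistic) model, the item bank, and all names here are assumptions for the example; the slides do not specify a psychometric model.

```python
import math


def rasch_probability(theta, difficulty):
    """Probability of a correct response under the Rasch model (assumed)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))


def item_information(theta, difficulty):
    """Fisher information of a Rasch item at ability theta: p * (1 - p)."""
    p = rasch_probability(theta, difficulty)
    return p * (1.0 - p)


def select_next_item(theta, item_difficulties, administered):
    """Return the id of the most informative item not yet administered.

    item_difficulties: dict mapping item id to difficulty (hypothetical bank).
    administered: set of item ids already given to the student.
    Returns None when the bank is exhausted.
    """
    candidates = {
        item: b for item, b in item_difficulties.items()
        if item not in administered
    }
    if not candidates:
        return None
    return max(candidates, key=lambda item: item_information(theta, candidates[item]))


bank = {"a": -2.0, "b": -0.3, "c": 0.5, "d": 1.5}
first = select_next_item(0.0, bank, set())
second = select_next_item(0.0, bank, {first})
print(first, second)
```

Under this model the most informative item is the one whose difficulty is closest to the ability estimate, so a student at the proficiency cut score would mostly see items near that cut.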
Moving to Secure Online Testing
◦ What Race to the Top Assessment Competition calls “Through-Course Assessment”
◦ Provides advance look at trajectory toward proficiency
◦ Provides multiple opportunities to demonstrate proficiency
◦ More equitable for high-stakes accountability purposes
◦ Useful for mid-year correction in instructional practice (e.g. Response
to Intervention)
◦ Useful for placement purposes of newly arrived students
◦ Useful for differentiated instruction
◦ Anticipated increase in educator motivation (because of timely information)
Moving to Secure Online Testing

Beyond traditional CAT/CBT
 AI Scoring of constructed response items
 Technology enhanced items
 Performance tasks/events (through
simulations)
 Gaming type items
Moving to Secure Online Testing

End of year, on-demand summary assessment (if needed) for three groups of students…
1. Initial scaling and calibration group
2. Ongoing randomly selected validation groups (to validate that students proficient on all required unit tests retain proficiency at the end of the year)
3. Students who do not achieve proficiency on all required unit tests
 Final opportunity to demonstrate overall proficiency if proficiency was in question on any single unit assessment
 Allows for the elimination of a single end-of-year test for most students
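The three-group rule above could be expressed as a short eligibility check. This is a sketch only: the function names, the validation sampling fraction, and the data shapes are assumptions for illustration, not part of the proposed system.

```python
import random


def select_validation_sample(proficient_ids, fraction, seed=None):
    """Randomly draw a validation group from students proficient on all units."""
    rng = random.Random(seed)
    ids = sorted(proficient_ids)  # sort so a given seed is reproducible
    k = round(len(ids) * fraction)
    return set(rng.sample(ids, k))


def needs_end_of_year_test(student_id, unit_proficient, calibration_group, validation_sample):
    """Decide whether a student sits the end-of-year summary assessment.

    unit_proficient: dict mapping required unit id to True/False (hypothetical).
    """
    if student_id in calibration_group:   # group 1: scaling and calibration
        return True
    if student_id in validation_sample:   # group 2: random validation
        return True
    return not all(unit_proficient.values())  # group 3: any unit not proficient


sample = select_validation_sample({"s1", "s2", "s3", "s4"}, 0.5, seed=1)
print(len(sample))
print(needs_end_of_year_test("s9", {"u1": True, "u2": False}, set(), sample))
```

Every student outside the three groups returns `False`, which is how the design eliminates the end-of-year test for most students.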
Scoring

 Maximize objective scoring, distribute subjective scoring, by
◦ Automated scoring of objective items
◦ AI scoring of extended written response items, technology enhanced items, and performance tasks wherever possible
◦ Distributed hand-scoring of tasks not scorable using AI
Distributed Scoring as Professional
Development

 Human scorers taken from ranks of educators
◦ Online training on hand-scoring
◦ Online certification as a hand-scorer
◦ Online monitoring of rater performance
◦ Validation hand-scoring of samples of AI-scored tasks
 Our experience with teacher-led scoring and range-finding indicates that it is some of the best professional development that we provide to educators
Reporting
 For the most part, reports are difficult to read and poorly used
 Need online reporting of all scores for all stakeholders, including:
◦ Policymakers (aggregate)
◦ Administrators (aggregate and individual)
◦ Teachers (aggregate and individual)
◦ Parents (aggregate and individual)
◦ Students (individual)
Reporting Portal
 Reporting portal needs to be able to integrate reports from classroom metrics all the way to large-scale secure assessment metrics
◦ Overall achievement & growth scores
◦ Unit achievement scores
◦ Growth scores based on learning progressions
◦ Classroom achievement scores
Reporting Portal

 Reporting cycles depend on the item types and application of AI scoring.
◦ Immediate where possible
◦ Expedited hand-scoring (shifting funding focus from printing, shipping, and scanning to on-demand hand-scoring)
Where the Rubber Hits the Road
This is a nice system design (if we do say so
ourselves), but what are the impediments to
implementation?
 Infrastructure

◦ LEA hardware and bandwidth capacity
◦ Assessment vendor capacity
◦ Moving from piecemeal components to an integrated,
coherent system
◦ Development of educator-moderated clearinghouses
◦ Development of educator-moderated item bank
Where the Rubber Hits the Road

Security
◦ The more high-stakes the system, the more
likely security breaches become
◦ Critical need for training on user roles
◦ Critical need for training on data use, since
data will become much more readily available
across the board
◦ Security controls versus open-source and
maximal access
Where the Rubber Hits the Road

Funding
◦ Very high initial startup investment
◦ Dual systems during development and initial
implementation
◦ Ramping up LEA technology systems to be
capable of working within the system
Where the Rubber Hits the Road

Sustainability
◦ Requires perpetual investment in administration
◦ Development is only the start (e.g. sustainability
concerns regarding RTTT-funded assessment
consortia)
◦ Requires early success and public understanding of
the benefits of the system weighed against ongoing
costs
◦ Recurring hardware/software technology upgrade
costs for LEAs
◦ Recurring hardware/software technology
maintenance costs for central IT systems
Where the Rubber Hits the Road

Local Control
◦ This kind of system is only possible to create with
significant funding and local buy-in
◦ No single state (let alone district) could afford the
cost of development and implementation
◦ Consortia are imperative to creating such a system
 Consortia can tend toward self-perpetuation rather than
serving their members
 Consortia cannot ignore local nuances
 Consortia cannot ignore reasonable needs for flexibility
 Consortia must monitor and maximize member investment
Where the Rubber Hits the Road

Building an appetite for online systems
◦ Implementation may occur piecemeal, but should be
undertaken within a framework for a coherent and
complete system
◦ Each piece needs to be implemented in such a way that local educators and policymakers see a positive impact on the educational system, e.g.,
 Immediate turnaround of results
 Connection between family and school
 Improved instructional practice
 Facilitation of differentiated instruction
Recommendations for Future
Directions

System has the potential to make us data-rich
and analysis-poor
◦ Build local (SEA and LEA) capacity for appropriate
analysis (possibly through re-defining positions that
might be eliminated through consortia services)
◦ New practices (e.g. through-course, innovative item types, AI scoring) will require a significant research and validation agenda, including
 Equating
 Comparability
 Standard setting
Recommendations for Future
Directions
 System has the potential to make educators and students data rich
◦ Portfolios of assessment results and
products as evidence of students’
college and career readiness
◦ Portfolios of assessment results and
products as evidence of teacher
classroom practices and effectiveness
Recommendations for Future
Directions
 Financial incentives from ARRA/RTTT have provided the impetus for some of these initiatives to get started
 Sustainability needs to be a focus both within and across states
 To maximize cross-state focus, we recommend continued significant funding of initiatives through ESEA reauthorization, Enhanced Assessment Grants, and other competitive/formula funding opportunities

Recommendations for Future
Directions

Scoring of competitive consortium applications
should be weighted toward…
◦ The development of integrated systems across all aspects of
assessment & accountability
◦ Significant and rigorous research, development, and evaluation of
the validity and impact (intended and unintended consequences)
of system development and implementation
 Formula funding should stipulate collaboration in system development
 Use of formula funding guarantees…
◦ Continued focus on students with the greatest needs
◦ Access to quality systems for states without strong resources
for writing competitive grants
Contact Information

Joseph A. Martineau, Ph.D.
◦ Director of Assessment & Accountability

Vincent J. Dean, Ph.D.
◦ State Assessment Manager

Michigan Department of Education