
Kelly D. Bradley1 and Shannon O. Sampson2
Introduction
The No Child Left Behind Act of 2001 (NCLB) is an education reform designed to improve student
achievement and close achievement gaps. With the passage of No Child Left Behind, Congress
reauthorized the Elementary and Secondary Education Act of 1965 (ESEA), the principal federal law
affecting education from kindergarten through high school. The legislation is built on four pillars:
accountability for results, an emphasis on doing what works based on scientific research, expanded
parental options, and expanded local control and flexibility. Our nation relies on data-driven decisions in
everything from sports to medicine. Similarly, standards call for the collection and analysis of data to
assess the effectiveness of instruction. It is imperative that we have data about what students are learning
and that teachers be able to responsibly analyze and make decisions based on the data.
This notebook provides a framework and implementation plan formulated around a “Work Sampling
System” (Meisels et al. 1995), to support you in constructing assessments, and then in interpreting,
utilizing and reporting the corresponding data. This system involves a continuous assessment approach
providing various visions of what students should know and be able to do. After all, the central purpose
of classroom assessment is to provide information about what students know and are able to do [and not
to do] in order to make decisions about instruction. For example, results from an assessment may lead
you to spend more time on a topic because your students demonstrated a lack of understanding, to
increase the pace of instruction, or to divide the class into groups for more individualized tasks.
On a broader scale, you will learn how to utilize your results to make comparisons of your students
across similar content classes within the school, across the county, state and even national contexts.
The notebook is organized around the following concepts:
Contextual Factors – how to use information about the learning-teaching context and student
individual differences to set learning goals and plan instruction and assessment.
Learning Goals – how to set significant, varied and appropriate learning goals.
Assessment Plan – how to use multiple assessment modes and approaches aligned with learning goals
to assess student learning before, during, and after instruction.
Design for Instruction – how to design instruction for specific learning goals, student characteristics
and needs, and learning contexts.
1 For additional information, contact Kelly D. Bradley, Ph.D., at 131 Taylor Education Building, Lexington, KY 40506; [email protected]; (859) 257-4923.
2 Both authors contributed equally and are listed in alphabetical order.
*** Working Draft ***
Do not reproduce or cite without permission of the authors.
Instructional Decision-Making – how to conduct ongoing analysis of student learning to make
instructional decisions.
Analysis of Student Learning – how to use assessment data to profile student learning and
communicate information about student progress and achievement.
Evaluation and Reflection – how to reflect on your instruction and student learning in order to improve
teaching practice.
The basic figurative model of the notebook is presented below.
[Figure 1. Conceptual Model of Assessment Training Package. Elements shown: Content Knowledge, Pedagogy & Dispositions, Student Needs, Learning Goals, and the Assessment Package.]
Contextual Factors
Contextual Factors – having teachers use information about the learning-teaching context and student
individual differences to set learning goals and plan instruction and assessment.
Chapter 1 of The Learning Record (see http://www.fairtest.org/LearningRecord/LR%20Math%20%20Recording%20Form.pdf) notes contextual factors that are important to consider in planning
mathematics instruction:
• Confidence and Independence. How willing are students to risk error? Are they able to volunteer
information and possible solutions to problems? Will they initiate topics for discussion and study? To
what extent will they persevere in the face of complexity?
• Experience. How well do students use their prior knowledge to make sense of current tasks? What
background do they have in mathematics? How well do they apply their textbook knowledge to
authentic purposes?
• Skills and Strategies. Do students use the skills and strategies of the subject to solve problems? Do
they demonstrate they can use mathematics to solve a variety of problems across different mathematics
strands?
• Knowledge and Understanding. How well can students demonstrate what they know and understand?
What evidence suggests they are adding to their personal knowledge and understanding? To what extent
do they make connections among mathematical ideas and across other content areas?
• Ability to Reflect. Can students provide criteria for assessing their own work? How well can they judge
the quality of their own work?
The Center for Language in Learning recommends collecting this information through brief
interviews with each student and his/her parents/caregivers in the first quarter or term. Information from
these interviews contributes “baseline data” from which to measure accomplishment as students
progress during the year. Students and their parents can begin to set criteria for measuring success in
their own terms and can compare year-to-year achievements.
Learning Goals
Learning Goals – how to set significant, varied and appropriate learning goals.
The place to begin this process is by looking at the Kentucky Academic Expectations, Program of
Studies, and Core Content for Assessment, as well as how Kentucky describes different levels of
performance. You can locate the Academic Expectations, Program of Studies, and Core Content for
Assessment for your subject area through a search for the “Combined Curriculum Document” at the
Kentucky Department of Education (www.kentuckyschools.net). A search for “Performance Standards”
lists standards for various subject areas. The performance standards chart may be useful in the initial
conference with students, to have them talk about where they feel they stand and to begin to set goals.
You may want to list the standards which students will be expected to reach to be considered
proficient in your course. Throughout the semester the students would work in conjunction with you to
collect evidence that they are progressing toward, and eventually reaching, proficiency in each area. To
do this, you might use a checklist as recommended by Meisels (1997), in which you make a page for each
student with a list of the given performance indicators. Next to each indicator, you would have a fall,
winter and spring column and three check-boxes within each column: one box for "not yet proficient,"
another for "in process," and a third for "proficient."
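If you prefer to keep such a checklist electronically, the short sketch below (in Python, with invented indicator names; Meisels' own forms are paper checklists) shows one way to represent a page per student with fall, winter and spring check-boxes for each indicator.

    STATUSES = ("not yet proficient", "in process", "proficient")
    SEASONS = ("fall", "winter", "spring")

    def new_checklist(indicators):
        # One page per student: each indicator starts unrecorded for every season.
        return {indicator: {season: None for season in SEASONS} for indicator in indicators}

    # Hypothetical performance indicators, for illustration only.
    adam = new_checklist(["states hypotheses", "interprets confidence intervals"])
    adam["states hypotheses"]["fall"] = "in process"

    for indicator, record in adam.items():
        marks = ", ".join(f"{season}: {record[season] or 'blank'}" for season in SEASONS)
        print(f"{indicator:32s} {marks}")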
HIGH SCHOOL (GRADE 11) MATHEMATICS3

Skills, Concepts and Relationships
• Distinguished: Student demonstrates an extensive understanding of concepts, skills, and relationships in number/computation, geometry/measurement, probability/statistics, and algebraic ideas as defined by Kentucky's Core Content for high school students.
• Proficient: Student demonstrates an understanding of concepts, skills, and relationships in number/computation, geometry/measurement, probability/statistics, and algebraic ideas as defined by Kentucky's Core Content for high school students most of the time.
• Apprentice: Student demonstrates understanding of concepts, skills, and relationships in number/computation, geometry/measurement, probability/statistics, and algebraic ideas as defined by Kentucky's Core Content for high school students some of the time.
• Novice: Student rarely demonstrates understanding of concepts, skills, and relationships in number/computation, geometry/measurement, probability/statistics, and algebraic ideas as defined by Kentucky's Core Content for high school students.

Mathematical Strategies
• Distinguished: Student demonstrates consistent, effective application of the problem-solving process. Student consistently shows evidence of a well-developed plan for solving problems, using appropriate procedures, sequence of steps, and relationships between the steps.
• Proficient: Student demonstrates effective application of the problem-solving process by showing evidence of a well-developed plan for solving problems, using appropriate procedures, sequence of steps, and relationships between the steps most of the time.
• Apprentice: Student demonstrates correct application of the problem-solving process by implementing appropriate strategies for solving problems some of the time.
• Novice: Student rarely demonstrates appropriate problem solving skills and/or rarely applies the problem-solving process correctly.

Understanding
• Distinguished: Student demonstrates an extensive understanding of problems and procedures by arriving at complete and correct solutions. (Student rarely has minor computational errors that do not interfere with conceptual understanding.)
• Proficient: Student demonstrates a general understanding of problems and procedures by arriving at correct and complete solutions most of the time. (Student may have some minor computational errors that do not interfere with conceptual understanding.)
• Apprentice: Student demonstrates some understanding of problems and procedures by arriving at correct and complete solutions some of the time.
• Novice: Student rarely demonstrates understanding of problems and procedures by arriving at solutions that may be incorrect or incomplete.

Terminology and Representations
• Distinguished: Student consistently and effectively uses appropriate and accurate mathematical representations/models (symbols, graphs, tables, diagrams, models) and correct mathematical terminology to communicate in a clear and concise manner.
• Proficient: Student uses appropriate and accurate mathematical representations/models (symbols, graphs, tables, diagrams, models) and correct terminology to effectively communicate a sequential development of the solution most of the time.
• Apprentice: Student uses appropriate and accurate mathematical representations/models (symbols, graphs, tables, diagrams, models) and correct mathematical terminology appropriate for high school students some of the time.
• Novice: Student rarely uses appropriate mathematical representations/models (symbols, graphs, tables, diagrams, models) and correct mathematical terminology appropriate for high school students.

Reasoning
• Distinguished: Student consistently and effectively demonstrates appropriate mathematical reasoning to solve problems (e.g., make and investigate mathematical conjectures, make generalizations, make predictions, and/or defend solutions).
• Proficient: Student demonstrates appropriate use of mathematical reasoning to solve problems (e.g., make and investigate mathematical conjectures, make generalizations, make predictions, and/or defend solutions) most of the time.
• Apprentice: Student demonstrates appropriate use of mathematical reasoning (e.g., make and investigate mathematical conjectures, make generalizations, make predictions, and/or defend solutions) some of the time.
• Novice: Student rarely demonstrates appropriate use of mathematical reasoning.

3 http://www.education.ky.gov/NR/rdonlyres/en6buqjvte3hgf5ddhi6cggzpskobj3mxribbcgwayd4gtn3lztb2clkepwzekerbhopyxc4hwiez4epnbnn63njunh/SPLDMathematics.pdf
Assessment Plan
Assessment Plan – how to use multiple assessment modes and approaches aligned with learning goals
to assess student learning before, during, and after instruction.
Meisels (1996) writes, “For too long assessment and instruction have been adversaries. Teachers say
that they cannot teach as they wish because they spend time preparing their students and modifying their
curriculums to conform to items that will appear on mandated achievement tests. Policymakers say that
they need objective information to show what students are learning and what teachers are teaching, even
if the indicators provided to teachers are inconsistent with educational practice and are seriously flawed.
With authentic performance assessments…, these conflicts can be resolved. In this approach, educators
design instructional objectives for teaching and learning, as well as for evaluation.”
What counts as assessment?
An assessment can be anything that provides evidence of a student’s level of understanding of a
concept. In fact, a useful starting point in developing an assessment plan is to ask yourself what kinds of
instructional opportunities could serve as evidence of progression toward the learning goals.
Assessments can include many kinds of tasks, ranging from selected response and open response items to
performance events and even teacher observations. Gronlund (2006, p. 22) offers eight guidelines for effective
student assessment:
1. It specifies the learning outcomes and assesses even the complex levels of understanding
2. A variety of assessment procedures are used
3. It is instructionally relevant, supporting the outcomes of instruction and improves student learning
4. It creates an adequate sample of student performance
5. It is fair to all students, eliminating irrelevant sources of difficulty
6. It specifies the criteria for judging the performance, such as with a rubric
7. It provides meaningful feedback to the students so that they can adjust learning strategies
8. It must be supported by a comprehensive grading and reporting system
Gronlund recommends that teachers use assessments to inform their decisions before beginning instruction,
during instruction and at the end of instruction.
Before Instruction
Assessments administered prior to instruction can be designed to provide information about whether
students have mastered prerequisite skills necessary for moving to the planned instruction, to indicate
the level of understanding students have of a subject before it is taught, and to serve as a baseline for
measuring student growth. Students lacking particular prerequisite skills can be provided supplemental
instruction. Depending on students’ level of understanding, you might choose to spend more or less time
on given concepts, and differentiate instruction for students who have varied levels of understanding.
During Instruction
Assessments should be used throughout instruction to monitor student progress toward achieving the
intended learning outcomes. These assessments, often called formative assessments, should be used to
improve student learning rather than to assign grades. From the results of these assessments, teachers
can diagnose where students have difficulties in understanding and can address these for individuals
and/or the class.
After Instruction
Assessment after instruction is often called summative assessment and the results are typically used
for grading. This may be an end-of-the-chapter test or performance assessment. In keeping with the
Learning Record, this would be the opportunity for students and teachers to select work which
demonstrates they have reached proficiency on the subject at hand.
Types of Assessment
Classroom achievement tests
A traditional source of evidence is a paper and pencil test administered at the end of a unit,
oftentimes in a multiple choice, short answer or essay format. This sort of test can provide good evidence;
however, its capacity is often limited by the manner in which teachers score it. If you use tests such as this
as evidence, see Using Tests to Create Measures in the Analysis of Student Learning section.
Gronlund (2006) describes specific guidelines for writing items for these tests.

For multiple choice items, he offers the following recommendations:
• The item should be appropriate for measuring the learning outcome
• Item tasks should match the learning tasks to be measured
• The item stem should:
  o present a single problem
  o be written with clear and simple language
  o be worded so there is no repetition of material in the answer choices
  o be stated in positive form wherever possible
  o emphasize any negative wording with bold, underline or caps
  o be grammatically consistent with answer choices
• The answer choices should:
  o be in parallel form
  o be free from verbal clues to the answer
  o have distracters that are plausible and attractive to the uninformed
  o avoid use of "all of the above" and "none of the above"
• The position of the correct answer should be varied from item to item

For true-false items, he suggests:
• Using only one idea for each statement
• Keeping the items short and grammatically simple
• Avoiding qualifiers (e.g., may, possible) and vague terms (e.g., seldom)
• Using negative statements sparingly and avoiding double negatives
• Attributing statements of opinion to a source

For matching items, Gronlund recommends using a matching format only when the same alternatives are
repeated often across multiple choice items. If using this format:
• Use homogeneous material so that all responses serve as plausible alternatives
• Keep the list of items to fewer than 10 and place the answer alternatives to the right
• Use a different number of responses than items and permit responses to be used more than once (indicate this in the directions)
• Place the responses in alphabetical order

For short answer items:
• State the item so that only a brief answer is required
• Place the blanks at the end of the statement
• Incorporate only one blank per item
• Use uniform length for blanks on all items

Finally, essay items should be used for measuring complex learning outcomes and not knowledge recall. To
help clarify for students what outcomes will be measured, you might include the criteria to be used in
grading the answers. As described by Gronlund, "Your answer will be evaluated in terms of its
comprehensiveness, the relevance of its arguments, the appropriateness of its examples, and the skill with
which it is organized." Also:
• Avoid starting the question with words that request recall of information (e.g., who, what, where, when, name, list)
• Use words such as "why," "describe," "explain," and "criticize"
• Write a model answer
• Avoid permitting a choice of questions
• In assigning scores to essay answers:
  o Evaluate answers in terms of the learning outcomes being measured
  o Score using a point method, in which various points are assigned to various facets of the learning outcome, OR
  o Score using a rubric, with defined criteria as a guide
• Evaluate all students' answers to one question before moving to the next
• Evaluate answers without knowing the identity of the writer
Performance or product assessment
Mueller (see http://jonathan.mueller.faculty.noctrl.edu/toolbox/howdoyoudoit.htm) outlines the
steps for developing assessments outside of the traditional paper and pencil test. For a particular standard
or set of standards, he recommends developing a task your students could perform that would indicate
(or provide evidence that) they have met these standards. Identify the characteristics of good
performance on that task: the criteria that, if present in your students' work, will indicate that they have
met the standards. For each criterion, identify two or more levels of performance that will sufficiently
discriminate among student performances on that criterion. The combination of the criteria and the levels
of performance for each criterion will be your rubric for that
task (assessment). The rubric will indicate how well most students should perform. The minimum level
at which you would want students to perform is your cut score or benchmark; it should indicate
proficiency in the content area. Providing students with the rubric early on will give students feedback
on what they need to improve upon and allow you to adjust instruction to ensure that each student
reaches proficiency.
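As a rough sketch of the structure Mueller describes (criteria crossed with levels of performance, plus a benchmark), the Python below uses hypothetical criteria and level names; it is illustrative only, not Mueller's own tool.

    # A rubric: each criterion has ordered levels of performance; the benchmark is
    # the minimum level you would accept as evidence of proficiency.
    LEVELS = ["novice", "apprentice", "proficient", "distinguished"]
    RUBRIC_CRITERIA = ["mathematical reasoning", "use of representations"]  # hypothetical
    BENCHMARK = "proficient"

    def meets_benchmark(ratings):
        # ratings: {criterion: level} assigned from the student's work on the task
        cut = LEVELS.index(BENCHMARK)
        return all(LEVELS.index(ratings[criterion]) >= cut for criterion in RUBRIC_CRITERIA)

    print(meets_benchmark({"mathematical reasoning": "proficient",
                           "use of representations": "apprentice"}))   # False: below the cut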
With careful planning, many instructional activities can serve as assessments. For examples of authentic
assessments, see:
http://jonathan.mueller.faculty.noctrl.edu/toolbox/examples/authentictaskexamples.htm
For information about authentic assessment and rubrics, see:
http://www.education.ky.gov/KDE/Instructional+Resources/Elementary+School/Primary+Program/Instructional+Resources/Instuctional+Strategy+Links.htm
Teacher Observations
The Learning Record suggests teachers use classroom observations as a way to collect evidence of
student learning. To facilitate this and make the observations more intentional, you might print labels
with student names and prepare a blank form for each student in each class. You can record observations
about anything significant about a student’s learning. As noted in the Learning Record, having a class
list helps the teacher record observations about all students in class. It is not necessary to note
observations for every student prior to starting a new page of labels, but using this method helps alert the
teacher to which students he/she needs to watch to record more data. These observations can be used as
evidence in describing students’ level of understanding.
Design for Instruction
Design for Instruction – how to design instruction for specific learning goals, students’ characteristics
and needs, and learning contexts.
Gronlund (2006) recommends that plans for assessment be made during instructional planning. He
explains the relation between instruction and assessment: effective instruction provides remediation for
students not achieving the intended learning, and assessments reveal specific learning gaps; instructional
decisions are based on information that is "meaningful, dependable, and relevant," and effective
assessments provide information that is meaningful, dependable and relevant; and the methods and
materials of both instruction and assessment are congruent with the outcomes to be achieved (p. 4).
Search for “Standards-Based Units of Study” at the KDE website (www.kentuckyschools.net).
Instructional Decision-Making
Instructional Decision-Making – how to use ongoing analysis of student learning to make
instructional decisions.
Once you have identified the standards and know what proficiency looks like, your task is to gather
evidence that students are approaching proficiency, and to identify what support they need to
become proficient.
If you are using tests to create measures (as explained in the Analysis of Student Learning section
and taught in the appendix), Winsteps software produces the following helpful table. This table lists all
of the items from an assessment along the right; MC and SA indicate whether the items were multiple choice
or short answer, and the numbers after each SA indicate the number of points assigned to that item. The
most difficult items are listed at the top, and the least difficult are listed at the bottom. The numbers
across the top horizontal line indicate a difficulty/ability scale. The scores are similar to z-scores in that
the zero point is the mean difficulty measure and 1, 2 and 3 are the respective standard deviations away
from the mean. The vertical line down the center of the chart indicates the proficiency line (see How to
determine the proficiency cut point below for an explanation of why a line was placed at that
point). The vertical line of numbers is placed at the ability point for the student and
indicates the expected score on each item for a student at that ability. Numbers to the left and right of
that line indicate the student's actual score on each item; they are placed at the difficulty point of
the item. If the student were to consistently answer as he or she answered an item off the line, he or she would be
placed at that higher or lower overall measure. Actual scores listed to the left or right of the score line
with a period on each side are within an acceptable deviation range. Those with parentheses are
considered to be outside of that range. To diagnose where a student would need to focus to reach
proficiency, the teacher and student would look at items corresponding to numbers to the left of the
vertical line that indicates the proficiency point.
[Winsteps keyform chart (ability scale from -3 to 4 logits). The items are listed at the right from most to least difficult: 3 null & alternative hypotheses (MC), 9 margin of error (MC), 19 test stat & p-value calc (SA, 7 pts), 12 conditions for significance test (SA, 8 pts), 18 conditions for inference (SA, 8 pts), 17 state hypotheses (SA, 4 pts), 8 p-value calc (MC), 15 construct, interpret CI (SA, 10 pts), 21 construct and interpret 90% CI (SA, 20 pts), 10 margin of error (MC), 11 population and parameter ID (SA, 8 pts), 13 test stat and p-value calc (SA, 8 pts), 14 conclusion (SA, 3 pts), 20 conclusion (SA, 3 pts), 4 conditions for inference (MC), 6 standard error calc (MC), 7 conditions for inference (MC), 16 proportions calc (SA, 2 pts).]
This student has a measure slightly below proficiency, but many of the items fall in the proficient
range (3, 9, 8, 15, 21, and 11). While the student would need to work on the remaining items, it would be
especially important to review items 12, 10, and 13, because these were the items that fell below the
student's own ability measure on the test.
Analysis of Student Learning
Analysis of Student Learning – How to use assessment data to profile student learning and
communicate information about student progress and achievement.
The Learning Record recommends that you and the students collect evidence throughout the year to
evaluate what the student is learning. It is recommended that this take place at least three times
throughout the year.
Presenting Student Data
Let’s say you have finished teaching a lesson on probability and have given a 10-question quiz over
the material. Each question is worth one point—it’s either right or it’s wrong. Adding up the number of
correct answers gives us the raw score, which is typically the first calculation used in determining
student performance. We might record that number in our grade book, or go a step further and divide by
10 to come up with a percentage. A student who gets 9 out of 10 would receive a 90% on the quiz, and
both the student and the teacher would probably feel like the material was well taught and learned.
Looking at the raw score alone provides very little information. There are a few ways to use item results
to communicate more information, using concepts from both statistics and measurement.
One way to get a more informative picture of the data is to graph the individual items and compare
student performance on each. The histogram below displays the multiple choice items from a sample
statistics test that a teacher administered. The items were worth 2 points each; student scores for
each item were added, and each total is displayed as a vertical bar (see Graphing Multiple Choice Items for
an explanation of how to create this graph).
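If you happen to work in Python, a minimal sketch of this graph (with an invented 25-student by 10-item score matrix and the matplotlib plotting library) looks like this:

    import random
    import matplotlib.pyplot as plt

    random.seed(1)
    # Invented data: 25 students x 10 multiple choice items, scored 0 or 2 points.
    scores = [[random.choice([0, 2]) for _ in range(10)] for _ in range(25)]

    # Sum the class's scores on each item and show each total as a vertical bar.
    item_totals = [sum(student[item] for student in scores) for item in range(10)]
    plt.bar(range(1, 11), item_totals)
    plt.xticks(range(1, 11))
    plt.xlabel("Multiple Choice Item")
    plt.ylabel("Sum of student scores")
    plt.show()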
[Bar chart: "Sum of student scores" (y-axis, 10 to 50) by "Multiple Choice Item" (x-axis, items 1 to 10).]
This visually displays that items 8, 9, and 10 were difficult for the overall group, while item 3 was
extremely difficult. This teacher would probably want to review the concepts behind items 8, 9 and 10.
She would also want to look at item 3 to make sure it was written clearly and keyed correctly, and then
either modify the item or re-teach the concept.
This assessment also contained short answer items, worth 2 to 20 points each. In order to facilitate
comparisons of item difficulty and performance, the student scores on the individual items were
converted to a common scale by calculating a percentage for each answer, and graphed as a series of
boxplots (see Creating Boxplots below for an explanation of how to create these).
A boxplot consists of a box, vertical lines extending from the top and bottom of the box (often called
"whiskers"), and asterisks that sometimes appear beyond the whiskers. The horizontal line inside each box
represents the median score for the item, the score that falls at the midpoint of the distribution of
scores from high to low. The median scores are connected by a line in the graph below to facilitate
comparison of the median point across items. The box encompasses the middle 50% of the
scores, and the length of the box (called the interquartile range) is determined by the distribution of the
scores. A short box indicates that the scores are similar to each other, and a long box indicates that there
is more variation in the scores. The whiskers encompass scores that fall in the region extending 1.5
times the interquartile range above and below the box. Asterisks above the box represent individual
scores more than 1.5 times the interquartile range above the end of the box, and asterisks below the
box represent individual scores more than 1.5 times the interquartile range below the box. These asterisks
are considered outliers. You will want to pay special attention to the outliers below the box; these are
scores which are well below the performance of the rest of the class.
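The appendix entry Creating Boxplots walks through building this chart; the sketch below (Python with matplotlib, using invented short answer scores but the actual point values of items 11 through 21) shows the same conversion to percentages followed by one boxplot per item.

    import random
    import matplotlib.pyplot as plt

    random.seed(2)
    point_values = {11: 8, 12: 8, 13: 8, 14: 3, 15: 10, 16: 2,
                    17: 4, 18: 8, 19: 7, 20: 3, 21: 20}

    # Invented raw scores for 25 students, converted to percent of each item's points.
    percents = [[100 * random.randint(0, pts) / pts for _ in range(25)]
                for item, pts in sorted(point_values.items())]

    plt.boxplot(percents)
    plt.xticks(range(1, len(point_values) + 1), [str(i) for i in sorted(point_values)])
    plt.xlabel("item")
    plt.ylabel("percent")
    plt.show()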
[Boxplots: percent score (y-axis, 0 to 100) for each short answer item (x-axis, items 11 to 21).]
The chart above displays the distribution of scores for the short answer items on a statistics test.
Looking at item 11, you can see that most students did very well. Because the box is relatively short, we
know that most students (50%) received a similar score on that item, and because the box is at the high
end of the y-axis, we can conclude that most students did well on that item. One outlier rests well below
the box and whisker. Using the “brush” function, we see that this actually represents two students
(entries 22 and 35.) It would be important to address this concept with these students because they
performed below the majority of the other students.
Glancing at the graphs reveals that the concepts tested by items 12, 17, 18 and 19 should be re-addressed
for many students. Some students need more work on item 14, but the position of its box reveals that
many students already have a solid grasp of this concept. You would also identify the students
represented by asterisks on items 11, 13, 16, 17, 19, and 21, to determine why these students scored
especially low on these items.
Interpreting (and helping parents and students interpret) large scale assessment data
Consider the following questions:
Q: If a student receives a standard score (SS) of 85, would that be a good score?
Q: Is a score of 50% on an achievement assessment considered failing?
In order to answer these questions, it is important to understand the concept behind the normal
curve. Throughout life and nature, events tend to follow a similar pattern of distribution. Let’s say you
go to a Kentucky basketball game and collect each person’s age, height, weight, cholesterol level,
distance he/she drove to Rupp Arena, and number of UK games he/she has attended. If you were to
graph the frequency of each of these variables, each would approximate what is called a normal
distribution. In other words, the most frequently encountered observations would appear around the
middle (at the mean) and the less frequently encountered observations would appear on either side of the
mean. The distribution would be bell-shaped, as in the figure below.
Achievement test scores typically have the same distribution, and interpretation of these scores
depends on an understanding of the normal curve.
If you draw a line down the center of the normal curve (at the mean), you will have a mirror image
on either side of that line. Half of the observations will fall below the mean, and half will fall above the
mean. The normal curve is divided into standard deviations. The mean sits at 0 (zero) standard
deviations; the next markers on the bell curve are +1 and -1 standard deviations from the
mean, followed by +2 and -2 standard deviations. To interpret standardized test scores, you will
need to know the test instrument's mean and standard deviation. Standardized test scores are
typically reported as standard scores, percentiles, stanines, z-scores and T-scores. These scores are
explained below.
A Standard Score (SS) compares the student's performance with that of other children at the same age
or grade level. For standard scores, the mean is 100 and the standard deviation is 15.
Thus, a 100 is considered the average score and is, by definition, at the 50th percentile. A 115 is 1
standard deviation above the mean, at the 84th percentile; a 130 is 2 standard deviations above the
mean, at the 98th percentile. An 85 is 1 standard deviation below the mean, at the 16th percentile; a 70
is 2 standard deviations below the mean, at the 2nd percentile. (This explanation is based on
information from http://www.nldontheweb.org/wright.htm)
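If you want to check these percentile equivalences yourself, Python's standard library (statistics.NormalDist, available in Python 3.8 and later) can evaluate the normal curve for a mean of 100 and a standard deviation of 15:

    from statistics import NormalDist

    standard_scores = NormalDist(mu=100, sigma=15)
    for ss in (70, 85, 100, 115, 130):
        percentile = 100 * standard_scores.cdf(ss)
        print(f"SS {ss:3d}  ->  percentile {percentile:4.1f}")
    # Prints roughly 2.3, 15.9, 50.0, 84.1 and 97.7, matching the rounded values above.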
Returning to the questions posed at the beginning of this section:
Q: If a student receives a standard score (SS) of 85, would that be a good score?
A standard score of 85 is at the 16th percentile, which is below average. Average is
between SS 90 and 110 (sometimes 85-115 is considered average), so any SS below 90 would be
reason for concern.
Q: Is a score of 50% on an achievement assessment considered failing?
No. A score of 50% puts the child right in the middle of the average group of students.
The Percentile (%) Score indicates the student's performance on a given test relative to the other children
of the same age on whom the test was normed. A score at the 50th percentile is average; higher percentiles are above average.
The Stanine Score, like the Standard Score, reflects the student's performance compared with that of
students in the age range on which the given test was normed. For reference, a stanine of 7 is above
average, a stanine of 5 is average, and a stanine of 3 is below average.
Z-scores are standard scores with a mean of zero and a standard deviation of one (Mean = 0, SD = 1, instead of
a mean of 100 and SD of 15 as we found with standard scores). If a student earned a z-score of 2, you
would know the student's score is two standard deviations above the mean, with a percentile rank of 98.
The standard score equivalent would be 130 (mean of 100, standard deviation of 15).
Another test format uses T scores, which have a mean of 50 and a standard deviation of 10. A T score
of 40 would be one standard deviation below the mean. It would be equivalent to a z-score of -1, a
percentile rank of 16, and a standard score of 85.
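Because z-scores, T scores, and standard scores are all linear rescalings of the same deviation from the mean, converting among them is simple arithmetic; a small sketch in Python:

    def z_to_t(z):
        return 50 + 10 * z          # T scores: mean 50, SD 10

    def z_to_standard(z):
        return 100 + 15 * z         # standard scores: mean 100, SD 15

    def t_to_z(t):
        return (t - 50) / 10

    z = t_to_z(40)                  # a T score of 40 ...
    print(z)                        # ... is a z-score of -1.0 (one SD below the mean)
    print(z_to_standard(z))         # ... and a standard score of 85.0 (about the 16th percentile)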
A chart available from http://concordspedpac.org/Scores-Chart.html presents the same
information in another format, which you may find helpful.
Using Tests to Create Measures
Wright and Stone (1979), explain why this technique for scoring is problematic. The 9 out of 10, or
90%, actually provides little meaningful information. This is illustrated through the comparison of three
students according to their raw scores. Below, a 1 indicates a correct answer, a 0 indicates an incorrect
answer, and an m indicates a missing value:
Student A: 1 1 1 m m m m 1 1 0 = 5
Student B: 0 0 0 0 0 1 1 1 1 1 = 5
Student C: 1 1 1 1 0 0 0 0 0 1 = 5
Looking at the raw score alone would make it appear these three students have an equivalent
understanding of the material. A closer look at the items reveals a very different situation. If the items
are ordered from easiest to hardest, a look at item performance indicates Student A did not answer every
question; perhaps she missed a page of the test or ran out of time. Student B could have been careless on
the easiest items and diligent about the harder ones, he could have had a special skill set that addressed
the specific content of the hardest items, or he could have missed the presentation of the material of the
easiest items. Student C’s response is what would be expected: the easy items were answered correctly,
and the difficult ones were answered incorrectly, with the exception of the most difficult item. Looking
at the most difficult item, Student C’s correct answer might be the result of a lucky guess. In the case of
Student A, the incorrect answer could be viewed as a careless mistake, or perhaps the student missed it
because it was slightly more difficult than his/her ability level. Without more difficult items, it is
impossible to draw a conclusion.
A different set of students highlights another problem with the use of raw scores:
Student D: m 1 m m m 1 m m 1 1 = 4
Student E: 0 1 0 1 0 1 0 1 1 0 = 5
Student F: 1 1 1 1 1 1 0 0 0 0 = 6
Based on raw scores alone, it would appear Student D is less knowledgeable than Students E and F,
and Student F appears to be the most knowledgeable. Furthermore, these students are within a point or
two of each other, but a point earned on the easy end should probably be less "valuable" than a point on
the more difficult end. And what if a student answered 0 of 10 questions correctly; would that indicate he
has no ability at all related to the unit? Or, if a student were to get a 10 out of 10, would that indicate she
has completely mastered the subject and would be able to correctly answer all other questions about it?
Although they are frequently interpreted as having direct meaning, raw scores are really just ordinal data
with unequal units. They are very limited in the information they convey.
The Rasch model addresses these concerns by converting raw scores into measures.
Each person has a certain probability of answering each item correctly, and each item has a
certain probability of being answered correctly. That probability is defined such that
$\ln\left[\frac{P_{ni}}{1 - P_{ni}}\right] = B_n - D_i$, where $P_{ni}$ is the probability of a successful response $X_{ni}$ being produced by person $n$ to item $i$,
$B_n$ is the ability of person $n$, and $D_i$ is the difficulty of item $i$. This equation produces equal units,
called measures, which are, in turn, additive.
It is logical that a student with a good understanding of a concept should have a higher probability of
getting any item correct than a student with a poor understanding of that concept, regardless of the item
attempted. Furthermore, more difficult items should always have a lower probability of success than a
less difficult item regardless of the person attempting the item. Students B and E above illustrate that
students do not always meet the expectation.
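The model's core relationship can be sketched in a few lines of Python; the abilities and difficulties below are invented logit values used only to show that the ordering of probabilities behaves as described.

    import math

    def rasch_probability(ability, difficulty):
        # P(correct) = exp(B - D) / (1 + exp(B - D)), with B and D in logits.
        return 1 / (1 + math.exp(-(ability - difficulty)))

    easy_item, hard_item = -1.0, 1.5      # invented item difficulties
    for ability in (-0.5, 1.0):           # invented person abilities
        print(f"B = {ability:+.1f}:",
              f"easy item {rasch_probability(ability, easy_item):.2f},",
              f"hard item {rasch_probability(ability, hard_item):.2f}")
    # The more able person has the higher probability on both items, and the harder
    # item has the lower probability for both persons.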
When using raw scores to come up with a student's grade (through a percentage, for example), you
are using classical test theory (CTT). As illustrated above, this method of analysis is limited: raw scores
alone are blind to unpredictable responses, they provide no information about the ability of persons who
have maximum and minimum scores, and the scores fall along an irregular interval. To address these
limitations, Rasch analysis was developed by Georg Rasch (1960).
The Rasch model produces a difficulty measure for each item on an assessment and an ability
measure for each person taking the assessment. It generates a type of “ruler” which measures item
difficulty and person ability on the same scale. The items should span a wide range of the difficulty
continuum and be spaced fairly evenly across it. A wider spread of items allows for the measurement of
a larger range of person abilities, and a closer space between item difficulties allows for a more precise
measurement of person abilities. When Rasch analysis places items and persons along a “ruler,” one can
see where the persons fall (based on their ability) in comparison to where the items fall (based on their
difficulty.) Just as a 12-inch measuring tape would be of little use in measuring the height of a 64-inch
person, when a number of persons have a higher ability than the most difficult item on an assessment,
the instrument would be considered too narrow to measure their ability. Similarly, just as a ruler that is
marked with meters would be of little use in measuring the length of a small insect, when a number of
persons have an ability measure that does not correspond to any item’s difficulty measure, the
instrument would be considered limited in its ability to accurately measure those persons. A well
designed assessment has a distribution of items that is equivalent to the distribution of persons (Bradley
and Sampson, 2006).
One feature of the Rasch model is that a more difficult assessment does not mean students receive
lower scores. Similarly, a test on the same information but with many easy items does not mean students
will receive higher scores. As Gronlund (2006) notes, assessment is a matter of sampling. Instruction is
typically organized around concepts or domains, and a test cannot possibly include the endless number
of questions that could be written about the topic. It is most fair that the estimation of student ability be
independent of the items included on the assessment of that ability. Rasch analysis produces ability
estimations which are independent of the difficulty level of the items selected for the test.
You can conduct a Rasch analysis with various programs. Here we will work with Ministep, the
student version of Winsteps software. Ministep uses Rasch analysis to produce a multitude of tables.
You will only need to look at a few to get a feel for student level of proficiency and to access the helpful
diagnostic information.
To use Winsteps to create a Rasch analysis, you will need to download a free copy of Ministep
(the student version of Winsteps) at www.winsteps.com/ministep.htm. When you open Ministep, it will
ask you to enter the control file name. Before beginning you will need to create a control file for the
assessment you have administered to your students. Specific examples of control files are listed in
Appendix A. Simply copy a control file which is similar to the test you have administered, and replace
the sample data with your own data.
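To give a feel for what such a control file contains, here is a rough sketch written from memory of the Winsteps documentation; the keyword spellings (TITLE=, NAME1=, ITEM1=, NI=, CODES=, &END, END NAMES) and the column layout should be checked against the manual and Appendix A, and the three data lines are invented. The Python below simply writes the text to a file.

    control_file = """TITLE = "Chapter 10 probability quiz"
    NAME1 = 1      ; person label begins in column 1
    ITEM1 = 11     ; first item response is in column 11
    NI    = 10     ; number of items
    CODES = 01     ; valid response codes (0 = incorrect, 1 = correct)
    &END
    Item01
    Item02
    Item03
    Item04
    Item05
    Item06
    Item07
    Item08
    Item09
    Item10
    END NAMES
    Adam      1101110101
    Bryan     1111100110
    Clay      1010100100
    """

    with open("probquiz.txt", "w") as f:
        f.write(control_file)
    # When Ministep asks for the control file name, enter probquiz.txt.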
In creating the control file, it is very important that you test only one concept at a time, or at least
include only the items which deal with one concept as you input them into the control file. Winsteps
(and Ministep) will create a sort of ruler to measure student ability in a domain, and just as you cannot
measure height and weight with the same instrument, you should include only one concept in each analysis.
Winsteps example:4
A typical chapter test has various sections with various item types and score values. For example,
Becky has created a test for chapter 10 of her text, on probability. The first ten questions are multiple
choice questions which students either answer correctly or incorrectly. She awards two points for each
correct answer and zero points for each incorrect answer. The last eleven questions are short answer
items of varying worth, from two to twenty points. Students receive partial credit for their work, so very
few students receive a zero on any item in this section. Students lose points for a number of errors, such
as an incorrect answer, an incorrect application of a formula used to solve the problem, an
incomplete conclusion, a weak justification for their answer, or careless mistakes. On a test like this, a
teacher typically sums the number of missed points, subtracts that from the total number of points
possible, and divides the resulting amount by the total number of points possible to come up with a
percentage to record as the test score.
What does this percentage communicate? A 100% is the easiest to interpret, because it indicates that
the student answered each item correctly, with accurate calculations, complete and insightful
conclusions and justifications, and no careless errors. But would this student receive a 100% on any
other teacher's test on this same material? Does a 100% indicate complete mastery of the construct of
probability, such that this student would successfully answer any question involving probability?
Scores below 100% communicate even less. John and Henry can both receive a
90%, even though John's missed points were due to a lack of understanding on a couple of items,
whereas Henry had a very good understanding of the material but made careless mistakes throughout
the test.
Below are specific tables which present the Rasch output for this same test.
Student scores are reported as logits, or log odds units (the log of the odds that a
student will answer items correctly and that items will be answered correctly). A score of zero is the
mean, and scores typically range from about -3 to 3. Instead of thinking in terms of percentages for determining
grades, this analysis allows you to determine a proficiency point and to think in terms of distance from
that point.
Table 1.0 (Output Tables → Variable Maps)
This table displays the "ruler," placing students along an ability continuum and items along a
difficulty continuum. One of the features of Rasch analysis is that item difficulty is not dependent upon
the students who take the test, and (more usefully in this case) student ability is not dependent upon the
items included in the test. You will notice that there is really no such thing as a 100%: students are
simply placed higher or lower on the continuum of ability for the larger construct. You determine the
point that constitutes sufficient proficiency. The items appear to the right of the vertical line; the more difficult
items are at the top and the easier items are at the bottom. The students appear to the left; those who
demonstrated more understanding of the concept fall at the top, and those who demonstrated less
understanding are at the bottom.
4 For more information on running analysis in Winsteps, contact authors for the code used in examples in this handout and see www.winsteps.com for user's manual.
TABLE 1.0 Multiple Response Formats, Different Re ZOU160WS.TXT
[Variable map (PERSONS -MAP- ITEMS), on a scale of roughly -2 to 2 logits. Persons appear to the left of the vertical line, most able at the top (2Liam, 2Quentin, 2Kelly, 2Jackie, 2Patricia, 2Uma, 2Isabella, ...) and least able at the bottom (0Yoshi, 1Victor, 0Nathan, 0Whitney). Items appear to the right, from most difficult at the top (3 MC, 9 MC, 19 SA, 12 SA, 18 SA, 17 SA, 8 MC, 15 SA, 21 SA, 10 MC, 11 SA, 13 SA, 14 SA, 20 SA, 4 MC, 6 MC, 7 MC) down to the easiest items, 1 MC, 2 MC and 5 MC, at the base of the ruler.]
Here, item 3 is the most difficult, whereas items 1, 2, and 5 are the easiest. Incidentally, items 1,
2, and 5 appear at the base of the "ruler." This indicates that all students correctly answered them, so we
don't really know how easy they are. They might be extremely easy, or they might just be easy enough that
this sample answered them correctly, while another group of students with slightly less understanding
might not answer them correctly. In the case of this sample, they do not provide useful information in
determining student ability; thus, they are essentially excluded from this analysis.
Liam, Quentin and Kelly are at the top of the ability continuum, and Victor, Whitney, and Yoshi are
at the bottom. Liam and Quentin actually have a higher ability on this topic than the items were able to
gauge, but this is not of too much concern, since the goal of the test is to gain evidence of proficiency.
We are most interested in the "proficiency" cut-point.
How to determine the proficiency cut point
For a test such as this one, with multiple choice and partial credit items but without a rubric, it is a
bit difficult to set a cut-point to indicate proficiency. One procedure, developed by Julian and Wright
(1993, as cited in Stone, 1996), first identifies items and students around the criterion region, then
decides whether the items are required by a "passing examinee to be considered competent" (Stone,
1996). In the case of a classroom, you could begin by rating each student based on your perception of
his or her understanding of the concept. The ratings would be 2 (definitely proficient in this area), 1 (not
sure whether proficient or not), and 0 (definitely not proficient in this area). You would place this number next to
each student's name. Below, the ratings are placed to the left of the student names; the numbers to the right are the scores
students received on the test items.
[Listing of teacher ratings and student scores (rating, name, scores on each test item): 1 Adam, 1 Bryan, 0 Clay, 1 David, 1 Elizabeth, 2 Faith, each followed by the student's item-by-item scores.]
Once the assessment has been analyzed through Winsteps, you can draw a line to mark where the
students do not meet proficiency (marked with zeros), then draw a box around students in the “don’t
know” region (see example below.) Then look at the items in the box to determine whether the items at
this level are required by a “passing examinee to be considered competent” (Stone, 1996). You might
even set aside the names at this point and judge the items based on a rubric such as Kentucky’s General
Scoring Rubric for 11th grade.
[Kentucky's General Scoring Rubric for 11th grade is reproduced here.]
5 This rubric was retrieved from http://www.education.ky.gov/NR/rdonlyres/et233oxqbnhvl56spt6lbjdoltmkv5t2ok5j52nivmgoyf76y7we3upsgs5tcfquzm53sofx5h44jnhjc2ep66hyr4f/Phase22004ReleaseGrade11.pdf
TABLE 1.0 Multiple Response Formats, Different Re ZOU160WS.TXT
INPUT: 25 PERSONS, 21 ITEMS MEASURED: 25 PERSONS, 20 ITEMS, 63 CATS
[The Table 1.0 variable map above is repeated here, this time with a line marking where students do not meet proficiency, a box drawn around the students in the "don't know" region, and an arrow at the break between items 12 and 18.]
Let’s say I make the break between items 12 and 18, as indicated by the arrow above. Table 13.1
displays the measures. I am going to look at the measure column, and go with .53 as the cut-point.
TABLE 13.1 Multiple Response Formats, Different R ZOU160WS.TXT May 19 15:48 2006
INPUT: 25 PERSONS, 21 ITEMS MEASURED: 25 PERSONS, 20 ITEMS, 63 CATS
PERSON: REAL SEP.: 1.74 REL.: .75 ... ITEM: REAL SEP.: 1.75 REL.: .75

ITEM STATISTICS: MEASURE ORDER

+----------------------------------------------------------------------------------------+
|ENTRY    RAW                   MODEL|   INFIT  |  OUTFIT  |PTMEA|EXACT MATCH|          |
|NUMBER SCORE  COUNT  MEASURE   S.E. |MNSQ  ZSTD|MNSQ  ZSTD|CORR.| OBS%  EXP%| ITEM   G |
|-------------------------------------+----------+----------+-----+-----------+----------|
|     3    12     25     1.33    .27 |1.05    .3| .86   -.1|  .45| 56.0  57.0|  3 MC  1 |
|     9    24     25      .62    .23 |1.81   3.3|1.96   2.9| -.12| 12.0  36.3|  9 MC  1 |
|    19   114     25      .58    .16 | .76   -.9| .77   -.9|  .71| 32.0  26.8| 19 SA  0 |
|    12   155     25      .54    .19 |1.20    .9|1.19    .8|  .40| 16.0  35.8| 12 SA  0 |
|    18   141     25      .24    .15 |1.28   1.1|1.36   1.3|  .44| 16.0  28.3| 18 SA  0 |
|    17    65     24      .21    .30 | .77   -.8| .77   -.8|  .64| 54.2  52.3| 17 SA  0 |
|     8    34     25      .09    .24 |1.13    .7|1.11    .4|  .32| 32.0  35.6|  8 MC  1 |
|    15   234     25      .02    .26 |1.17    .9|1.09    .4|  .26| 40.0  43.1| 15 SA  0 |
|    21   430     25      .01    .08 | .48  -1.4| .41  -1.8|  .70| 32.0  25.3| 21 SA  0 |
|    10    36     25     -.02    .24 | .93   -.3| .73   -.5|  .47| 44.0  42.6| 10 MC  1 |
|    11   176     25     -.10    .14 | .99    .1| .87   -.1|  .43| 40.0  44.6| 11 SA  0 |
|    13   169     25     -.14    .15 |1.25    .8|1.49   1.1|  .37| 24.0  38.5| 13 SA  0 |
|    14    58     25     -.15    .31 | .72  -1.2| .69  -1.3|  .69| 64.0  53.5| 14 SA  0 |
|    20    69     25     -.65    .48 | .94   -.2| .80   -.6|  .38| 76.0  76.0| 20 SA  0 |
|     4    46     25     -.86    .38 | .90    .0| .45   -.2|  .34| 92.0  92.0|  4 MC  1 |
|     6    46     25     -.86    .38 | .89    .0| .44   -.2|  .35| 92.0  92.0|  6 MC  1 |
|     7    46     25     -.86    .38 | .89    .0| .44   -.2|  .35| 92.0  92.0|  7 MC  1 |
|     1    50     25    -2.23   1.30 | MINIMUM ESTIMATED MEASURE          |  1 MC  1 |
|     2    50     25    -2.23   1.30 | MINIMUM ESTIMATED MEASURE          |  2 MC  1 |
|     5    50     25    -2.23   1.30 | MINIMUM ESTIMATED MEASURE          |  5 MC  1 |
|-------------------------------------+----------+----------+-----+-----------+----------|
| MEAN  100.3   25.0     -.33    .41 |1.01    .2| .91    .0|     | 47.9  51.3|          |
| S.D.   96.0     .2      .95    .38 | .28   1.1| .41   1.1|     | 26.3  22.3|          |
+----------------------------------------------------------------------------------------+
Once you have determined the cut-point, you can use Table 18.3 as a diagnostic tool.
Table 18.3 (Output Tables → PERSON Keyforms: entry)
These charts list individual students, with the score we would expect each one to receive on each
item given his or her overall score on the test. The items listed at the right of the chart run from most difficult
to least difficult. The vertical line of scores is placed at the student's ability level on the given topic; the scores
in this line are what we would expect the student to receive given his or her ability level. Scores to the left
of that line are actual scores which are lower than we would have expected given the student's ability
level. Scores to the right are actual scores which are higher than we would have expected. The actual
scores are placed at the point on the horizontal ability continuum where the student would have fallen
had he or she consistently answered as he or she answered the given item. Scores to either side of the
continuum which appear between periods are considered acceptably close to the expectation. However,
scores that appear between parentheses differ significantly from the expectation. These items
should be reviewed, especially those which fall below the ability line.
Earlier we set .53 as the proficiency cut point. To aid in determining what instruction would be
helpful in bringing individual students to proficiency, you can draw a vertical line at the approximate
proficiency point on the horizontal scale and highlight items which fall below that line. These are the
items on which the student should focus.
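A simplified sketch of that diagnosis in Python: take the measure at which the keyform places each of a student's observed responses (the placements below are invented for illustration) and flag every item that lands below the .53 cut.

    CUT = 0.53   # proficiency cut point chosen earlier (in logits)

    # item number -> measure at which the keyform places the student's observed response
    observed_placement = {3: 1.3, 9: 0.9, 12: -0.2, 10: -0.4, 13: -0.5, 15: 0.8, 21: 0.7}

    needs_review = sorted(item for item, measure in observed_placement.items() if measure < CUT)
    print("Focus instruction on items:", needs_review)   # -> [10, 12, 13]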
The first table below displays the expectations and actual responses by Adam:
TABLE 18.3 Multiple Response Formats, Different R ZOU304WS.TXT
INPUT: 25 PERSONS, 21 ITEMS MEASURED: 25 PERSONS, 21 ITEMS, 67 CATS
KEY: .1.=OBSERVED, 1=EXPECTED, (1)=OBSERVED, BUT VERY UNEXPECTED.
NUMBER - NAME: 1 1Adam   MEASURE .47   INFIT (MNSQ) 2.2   OUTFIT 1.6   S.E. .24
[Keyform chart for Adam on the -3 to 4 logit scale. Items are listed at the right from most to least difficult: 3 null & alternative hypotheses (MC), 9 margin of error (MC), 19 test stat & p-value calc (SA 7), 12 conditions for significance test (SA 8), 18 conditions for inference (SA 8), 17 state hypotheses (SA 4), 8 p-value calc (MC), 15 construct, interpret CI (SA 10), 21 construct and interpret 90% CI (SA 20), 10 margin of error (MC), 11 population and parameter ID (SA 8), 13 test stat and p-value calc (SA 8), 14 conclusion (SA 3), 20 conclusion (SA 3), 4 conditions for inference (MC), 6 standard error calc (MC), 7 conditions for inference (MC), 16 proportions calc (SA 2).]
Adam measured slightly below the determined cut-point, as indicated by the vertical line of scores sitting to the
left of the line placed at the approximate proficiency point. The scores in the vertical line
indicate the expectation for Adam's score on each item given his overall score on the test. Numbers to
the left and right of that line are the actual scores Adam received on each item, placed at the difficulty
point of that level of response. Many of his responses fall well into the proficiency zone, including items
3, 9, 8, 15, 21, and 11. The other items would need more attention for him to reach proficiency, especially
items 12, 10 and 13, which fall well below the proficiency line. The concepts found in these items would
be most important in formulating the instructional plan for Adam.
The chart below displays Isabella's scores. She was in the proficient range; however, she answered
below expectation on items 18, 8, 15 and 14.
TABLE 18.11 Multiple Response Formats, Different ZOU304WS.TXT
INPUT: 25 PERSONS, 21 ITEMS MEASURED: 25 PERSONS, 21 ITEMS, 67 CATS
NUMBER - NAME: 9 2Isabella   MEASURE 1.08   INFIT (MNSQ) .9   OUTFIT .8   S.E. .31
[Keyform chart for Isabella on the -3 to 4 logit scale, with the same item list as in Adam's chart above (items 3 through 16, from most to least difficult).]
You would work with Isabella on these concepts to help her understand these items which were
especially difficult for her.
Evaluation and Reflection
Evaluation and Reflection – having teachers reflect on their instruction and student learning in order to
improve teaching practice.
The Learning Record recommends that teachers and/or students set aside at least three samples of
student work for inclusion in the Learning Record. The selected work should demonstrate
understandings the student has gained, and could be an assessment from class, an investigation the student
conducted, a presentation, or an assignment that is relevant in demonstrating student understanding.
Each student should include a written comment about why he or she selected the work.