III.2
Educational Evaluation
EVALUATION TESTS
The essay test refers to any written test that requires an examinee to write a
sentence, a paragraph or longer passages and that demands a subjective
judgement about its quality and completeness when it is scored.
THE ESSAY QUESTION
The distinctive features of the essay question are:
1) No single answer can be considered the only correct one;
2) The examinee is permitted freedom of response;
3) The answers vary;
4) The answer calls for a comprehensive explanation.
TYPES
1) Extended response
2) Restricted response
Overall, essay questions can be divided into:
1. Essay Type (Recall Type)
a. Discussion or extended response;
b. Short answer or restricted response;
c. Oral
Mrs. Megha Gokhe
TSCER
1 of 29
For example, in Geography:
Q1. What is a phase of the moon?
Q2. What are the factors causing pollution of oceanic waters?
Q3. Describe volcanic activities.
In Economics
Q1. What are the main indicators of economic development?
Q2. Explain the major objectives of GATT.
Q3. Write the functions of WTO.
In English
Q1. Write a brief 250-word composition on the following topic:
_________ if trees could talk.
In Science
Q1. Define the classification of organisms. List the kingdoms with examples.
Q2. Describe the structure of human brain.
In History
Q1. What is the prehistoric period?
Q2. How was the Harappan civilization discovered?
OBJECTIVE TYPE TEST
It refers to any written test that requires the examinee to select the correct
answer from among one or more of several alternatives, or to supply a word or
two, and that demands an objective judgement when it is scored.
OBJECTIVE TYPE TEST ITEM:
The most important criterion of an objective type test item is that it can be
objectively scored: the scoring will not vary from examiner to examiner.
2. Objective Type
A) Supply-type items
a) Short answer:
1) Single word, symbol, formula
2) Multiple words or phrases
b) Completion (fill in the blanks)
B) Selection-type items (Recognition type)
a) Alternate response: true-false, yes-no, right-wrong;
b) Multiple-choice;
c) Matching.
C) Context-dependent type items
a) Pictorial
b) Interpretative
Examples:
a. Which of the following is the formula for finding the number of subsets of a
given set?
A. 2^n  B. 2 + n  C. n^2  D. 2n
b. The number of subsets of the set {1, 2, 3} is:
A. 3  B. 6  C. 8  D. 9
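The keyed answer to item (b) can be checked directly, since a set of n elements has 2^n subsets. A minimal Python sketch (illustrative code, not part of the original notes):

```python
from itertools import combinations

def all_subsets(s):
    """Return every subset of s, including the empty set and s itself."""
    items = list(s)
    return [set(c) for r in range(len(items) + 1) for c in combinations(items, r)]

print(len(all_subsets({1, 2, 3})))  # 8, i.e. 2**3, so option C is the keyed answer
```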
THE MULTIPLE CHOICE ITEM (THE MC ITEM)
The MC item consists of two parts:
1. the stem, which contains the problem;
2. responses or options, i.e., a list of suggested answers.
QUESTION VARIETY:
Stem: Which one of the following men invented the telephone?
Response or options: a. Marconi b. Edison c. Bell d. Faraday e. Morse.
INCOMPLETE VARIETY:
Stem: The telephone was invented by ______.
Response or options: a. Marconi b. Edison c. Bell d. Faraday e. Morse.
Variations of the MC Format
1. Correct answer
2. Best Answer
3. Analogy Type
4. Numeric Series Type.
5. Substitution Type
6. Absurdities Type
7. Reverse Type
EDUCATIONAL EVALUATION PAPER 3.2
BLOOM'S TAXONOMY
PRESENTED BY
PROF. MEGHA D GOKHE
TAXONOMIES OF EDUCATIONAL OBJECTIVES
The taxonomy of educational objectives is basically a classification scheme.
PURPOSES
a) To establish the accuracy of communication.
b) To reduce vagueness.
c) To become a means of more precise communication.
d) To establish a common understanding.
e) To be a great help in clearly defining objectives.
Three-fold Division of Instructional Objectives
Cognitive - (the recall or recognition of knowledge and the development of
intellectual abilities and skills).
Affective - (changes in interests, attitudes and values, and the development of
appreciation).
Psychomotor - (development of manipulative or motor skills).
Cognitive domain
1) Knowledge
It involves the recall of specifics and universals, methods and processes, or of
a pattern, structure or setting.
a) Knowledge of terminology and facts.
b) Knowledge of conventions, trends and sequences, classification and
categories, criteria, methodology.
2) Comprehension
It represents the lowest level of understanding and includes:
a) Translation of facts, principles and theories.
b) Interpretation
c) Extrapolation
3) Application
The use of abstractions in particular and concrete situations. The abstraction
may be in the form of general ideas, rules and theories
a) Making generalisations of facts
b) Diagnosis of the weakness
c) Application
4) Analysis
The breakdown of a communication into its constituent elements
a) An analysis of elements
b) The establishment of relationship
c) The formulation of some principles
5) Synthesis
The putting together of elements and parts so as to form a whole.
a) The production of a unique communication.
b) The suggestion of a new plan
c) The derivation of a set of abstract relations
6) Evaluation
This includes
a) Quantitative and qualitative judgement about the value of material
b) Judgement in terms of internal evidence
Affective domain
It includes those objectives which deal with attitudes, values, interests and
appreciation.
1) Receiving (attending): This means that the learner should be sensitized
to the existence of certain phenomena and stimuli.
a) Awareness of the stimulus or phenomenon.
b) Willingness in the learner to receive it.
c) Controlled or selected attention of the learner.
2) Responding: This is concerned with responses that go beyond merely
attending to phenomena.
a) The learner's acquiescence in responding
b) The learner's willingness to respond
c) The learner's satisfaction in responding.
3) Valuing: This objective is concerned with the worth of a thing, phenomenon
or behaviour, which results from the individual's own valuing and assessment.
a) Acceptance of a value
b) Preference for a value
c) Commitment to or a conviction in regard to a certain point of view.
4) Organization: For situations where more than one value is relevant, the
need arises for:
a) The organization of the value into a system
b) The determination of the interrelationship among them
c) The establishment of the dominant value
5) Characterization by a value or value complex: At this level, the already
existing values are organized in to some kind of an internally consistent
system and control the behavior of an individual, who attains an integration of
his beliefs and attitudes into a total philosophy.
Psychomotor domain
“The psychomotor domain includes those objectives which deal with manual
and motor skills.”
1) Imitation of an action or performance.
2) Manipulation of an act. This includes differentiating among various
movements and selecting the proper one.
3) Precision in reproducing a given act. This includes accuracy, proportion
and exactness in performance.
4) Articulation among different acts. This includes co-ordination, sequence
and harmony among acts.
5) Naturalisation: Here a pupil's skill attains its highest level of proficiency in
performing an act with the least expenditure of psychic energy. The act
becomes so automatic that it is attended to unconsciously.
Thus the behaviour of a child is governed by his development in three
domains: cognitive, affective and psychomotor.
A GOOD MEASURING INSTRUMENT
Like a good tool or machine, a good measuring instrument must meet certain
minimum requirements. The essential characteristics and requirements of a
good measuring instrument are: planning, validity, reliability, objectivity,
discriminating power, adequacy, practicality, comparability and utility.
 Planning
- Adequate planning
- Design of an instrument
- Different weightages to be given to the different aspects of the tools
- Its proper blue print
 Reliability
Defined as:
- The degree of consistency among test scores.
- The degree of consistency with which the test measures what it does
measure.
- A test score is called reliable when we have reasons for believing it to be
stable and trustworthy.
There are many reasons why a pupil's test score may vary:
a) Trait instability: The characteristics we measure may change over a
period of time.
b) Sampling error: Any particular question we ask in order to infer a
person's knowledge may affect his score.
c) Administrative error: Any change in direction, timing or amount of
rapport with the test administrator may cause score variability.
d) Scoring error: Inaccuracies in scoring a test paper will affect the scores.
e) Other factors: Such things as health, motivation, degree of fatigue of the
pupil and good or bad luck in guessing may cause score variability.
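One common way to estimate this consistency is the test-retest method: administer the same test twice and correlate the two sets of scores. A minimal sketch, where the scores and the Pearson-correlation approach are illustrative assumptions rather than material from these notes:

```python
# Test-retest reliability: the same pupils sit the same test twice, and the
# Pearson correlation between the two score sets is the stability coefficient.

def pearson(x, y):
    """Pearson product-moment correlation between two score lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

first = [42, 55, 61, 48, 70, 66]    # hypothetical scores, first administration
second = [45, 53, 63, 50, 68, 64]   # same pupils, second administration
r = pearson(first, second)
print(round(r, 2))  # 0.98: scores are stable, so the test is reliable
```

A coefficient near 1.0 indicates stable, trustworthy scores; the sources of error listed above all push it lower.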
 VALIDITY
Defined as:
The accuracy with which a test measures what is relevant.
It should always be remembered that
a) Validity is an inclusive term
b) Validity is a matter of degree
c) Validity is specific rather than general
 Types of Validity
1) Content validity
2) Concurrent validity
3) Predictive validity
4) Construct validity
 Objectivity
A test is objective when the scorer's personal judgment does not affect
scoring. Objectivity in a test makes for the elimination of biased opinion or
judgment of the person who scores it.
In an objective test, test items can readily be scored as right or wrong. The
true-false type, the alternate-response type, the matching type and the
multiple-choice type test items are highly objective, while essay type items
are highly subjective.
The objectivity of the test can be increased by
a) Using more objective type test items
b) Preparing a marking scheme or a scoring key
c) Setting realistic standards
d) Asking two independent examiners to evaluate the test and using the
average score of the two as the final score
 Discriminating Power
The basic function of all educational measurement is to place individuals in a
defined scale in accordance with differences in their achievements. Such a
function implies a high discriminating power on the part of a test. Since tests
are made up of separate items, it is clear that each item must have this quality
to a maximum degree if the total test is to possess it. This quality of a test
directly affects its validity.
 Adequacy
A measuring instrument should be adequate i.e. balanced and fair.
 Practicality
Practicality is an important criterion for assessing the value of a test; it
depends upon a number of factors:
a) Ease of administration
b) Ease of scoring
c) Ease of interpretation
d) Economy
 Comparability
A Test possesses comparability when scores resulting from its use can be
interpreted in terms of a common base that has a natural or accepted meaning.
1) Availability of equivalent forms of a test.
2) Availability of adequate norms.
 Utility
A Test possesses utility to the extent to which it satisfactorily serves a definite
need in the situation in which it is used.
Thakur Shyamnarayan College of Education & Research (TSCER)
Prof. Megha Gokhe
 Criterion Reference Test
CRT is meant to measure the achievement of an examinee in a certain domain,
to find out his level of achievement in that domain. It has little to do with the
achievement level of other examinees.
In the words of Gronlund, N. E. (1985), a criterion-referenced test is "a test
designed to provide a measure of performance that is interpretable in terms of
a clearly defined and delimited domain of learning tasks."
 Characteristics
1. Its main objective is to measure a student's achievement of curriculum-based
skills.
2. Number of correct items.
3. Percent of correct items.
4. Derived score based on correct items and other factors.
 Uses
1. To identify the master learners and non-master learners in class.
2. To find out the level of attainment of various objectives of instruction.
3. To find out the level at which a particular concept has been learnt.
4. To improve the placement of concepts at different grade levels.
 Norm-Referenced Test
Norm-Referenced Test is used primarily for comparing achievement of an
examinee to that of a large representative group of examinees at the same
grade level. The representative group is known as the 'Norm Group'.
Bormuth (1970) writes that a norm-referenced test is designed "to measure the
growth in a student's attainment and to compare his level of attainment with
the levels reached by other students and the norm group."
 Characteristics
1. Its basic objective is to measure a student's achievement in curriculum-based
skills.
2. It is prepared for a particular grade level.
3. It is administered after instruction.
4. It is used for forming homogeneous or heterogeneous class groups.
5. It classifies achievement as above average, average or below average for
a given grade.
 Uses
1. To get a reliable rank ordering of the pupils with respect to the
achievement we are measuring;
2. To identify the pupils who have mastered the essentials of the course
more than others;
3. To select the best of the applicants for a particular programme;
4. To find out how effective a programme is in comparison to other
possible programmes.
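The contrast between the two interpretations can be sketched in code. The 80% mastery cut-off and the pupil names and scores below are hypothetical illustrations, not from these notes:

```python
# Interpreting the same raw scores two ways: criterion-referenced (against a
# fixed cut-off) and norm-referenced (against the group).

scores = {"Asha": 92, "Ravi": 78, "Meena": 85, "John": 60, "Sara": 85}

# Criterion-referenced: compare each examinee to the fixed criterion only.
CUTOFF = 80
mastery = {name: s >= CUTOFF for name, s in scores.items()}

# Norm-referenced: compare each examinee to the norm group (here, the class).
# Percentile rank = percentage of examinees scoring below that pupil.
def percentile_rank(s, group):
    return 100 * sum(1 for g in group if g < s) / len(group)

ranks = {name: percentile_rank(s, list(scores.values()))
         for name, s in scores.items()}
print(mastery["Ravi"], ranks["Ravi"])  # False 20.0: a non-master against the cut-off
```

The same raw score of 78 is a "non-master" result on the CRT reading but simply a low percentile rank on the NRT reading, which is exactly the distinction the two definitions above draw.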
 ANECDOTAL RECORDS
These are records of specific incidents, factual description of important and
meaningful events or behaviour of students on informal occasions.
 Characteristics
 Factual description of what happened, when it happened, and under what
circumstances the behaviour occurred.
 The interpretation and recommended action should be noted separately
from the description.
 Each anecdotal record should contain a record of a single incident
 The incident recorded should be one that is considered to be significant
to the pupil's growth and development.
 Advantages
 If properly used, they provide a factual record of an observation of a
single, significant incident in the pupil's behaviour.
 They record critical incidents of spontaneous behaviour.
 They provide the teacher with objective descriptions.
 They are very good for young children who are unable to use pencil-and-paper tests.
 They direct the teacher's attention to a single pupil.
 They provide for a cumulative record of growth and development.
 Limitations
 They tend to be less reliable than other observational tools as they tend
to be less formal and systematic.
 They are time-consuming to write.
Mrs. Megha Gokhe
TSCER
1 of 29
III.2
Educational Evaluation
 It is difficult for the observer to maintain objectivity when he records the
incident observed.
 The observers tend to record only undesirable incidents and neglect the
positive incidents.
 Making Effective
 Restrict observations to those aspects of behaviour which cannot be
evaluated by other means.
 Concentrate on only one or two behaviours.
 Observation should be selective
REVISED TAXONOMY
Background: In the late 1950s into the early 1970s in the US there were
attempts to dissect and classify the varied domains of human learning:
cognitive (knowing, head), affective (feeling, heart) and psychomotor
(doing, hand/body). The resulting efforts yielded a series of taxonomies in
each area. A taxonomy is really just a word for a form of classification. The
following taxonomies deal with the varied aspects of human learning and are
arranged hierarchically, proceeding from the simplest functions to those that
are more complex.
The material below is a simple overview of the newer version of the cognitive
domain.
The Cognitive Domain: In the following table are the two primary existing
taxonomies of cognition. The one on the left, entitled Bloom's, is based on the
original work of Benjamin Bloom and others as they attempted in 1956 to
define the functions of thought, coming to know, or cognition. This taxonomy
is over 50 years old.
The taxonomy on the right is the more recent adaptation and is the redefined
work of one of Bloom's former students, Lorin Anderson, working with one of
his partners in the original work on cognition, David Krathwohl. That one is
labeled Anderson and Krathwohl. The group redefining Bloom's original
concepts worked from 1995 to 2000. The group was assembled by Anderson
and Krathwohl and included people with expertise in the areas of cognitive
psychology, curriculum and instruction, and educational testing,
measurement, and assessment.
As you will see the primary differences are not just in the listings or
rewordings from nouns to verbs, or in the renaming of some of the
components, or even in the repositioning of the last two categories. The major
difference in the updated version is in the more useful and comprehensive
addition of how the taxonomy intersects and acts upon different types and
levels of knowledge: factual, conceptual, procedural and metacognitive.
Taxonomies of the Cognitive Domain:

Bloom's Taxonomy 1956 vs. Anderson and Krathwohl's Taxonomy 2000

1. Knowledge / Remembering
Knowledge (Bloom): Remembering or retrieving previously learned material.
Examples of verbs that relate to this function are: know, identify, relate, list,
define, recall, memorize, repeat, record, name, recognize, acquire.
Remembering (Anderson & Krathwohl): Retrieving, recalling, or recognizing
knowledge from memory. Remembering is when memory is used to produce
definitions, facts, or lists, or to recite or retrieve material.

2. Comprehension / Understanding
Comprehension (Bloom): The ability to grasp or construct meaning from
material. Examples of verbs that relate to this function are: restate, identify,
illustrate, locate, discuss, interpret, report, describe, draw, recognize, review,
represent, explain, infer, differentiate, express, conclude.
Understanding (Anderson & Krathwohl): Constructing meaning from different
types of functions, be they written or graphic messages, through activities like
interpreting, exemplifying, classifying, summarizing, inferring, comparing,
and explaining.

3. Application / Applying
Application (Bloom): The ability to use learned material, or to implement
material in new and concrete situations. Examples of verbs that relate to this
function are: apply, organize, practice, relate, employ, calculate, develop,
restructure, show, translate, interpret, exhibit, use, demonstrate, dramatize,
operate, illustrate.
Applying (Anderson & Krathwohl): Carrying out or using a procedure through
executing or implementing. Applying refers to situations where learned
material is used through products like models, presentations, interviews or
simulations.

4. Analysis / Analyzing
Analysis (Bloom): The ability to break down or distinguish the parts of
material into its components so that its organizational structure may be better
understood. Examples of verbs that relate to this function are: analyze,
compare, probe, inquire, examine, contrast, categorize, differentiate,
investigate, detect, survey, classify, deduce, experiment, scrutinize, discover,
inspect, dissect, discriminate, separate.
Analyzing (Anderson & Krathwohl): Breaking material or concepts into parts,
determining how the parts relate or interrelate to one another or to an overall
structure or purpose. Mental actions included in this function are
differentiating, organizing, and attributing, as well as being able to
distinguish between the components or parts. When one is analyzing, he/she
can illustrate this mental function by creating spreadsheets, surveys, charts,
diagrams, or graphic representations.

5. Synthesis / Evaluating
Synthesis (Bloom): The ability to put parts together to form a coherent or
unique new whole. Examples of verbs that relate to this function are: compose,
produce, design, assemble, create, prepare, predict, modify, plan, invent,
formulate, collect, set up, generalize, document, combine, propose, develop,
arrange, construct, organize, originate, derive, write, tell, relate.
Evaluating (Anderson & Krathwohl): Making judgments based on criteria and
standards through checking and critiquing. Critiques, recommendations, and
reports are some of the products that can be created to demonstrate the
processes of evaluation. In the newer taxonomy, evaluation comes before
creating, as it is often a necessary part of the precursory behavior before
creating something. (Remember: this category has now changed places with
the last one on the other side.)

6. Evaluation / Creating
Evaluation (Bloom): The ability to judge, check, and even critique the value of
material for a given purpose. Examples of verbs that relate to this function
are: judge, assess, compare, evaluate, conclude, measure, deduce, argue,
decide, choose, rate, select, estimate, validate, consider, appraise, value,
criticize, infer.
Creating (Anderson & Krathwohl): Putting elements together to form a
coherent or functional whole; reorganizing elements into a new pattern or
structure through generating, planning, or producing. Creating requires users
to put parts together in a new way, or to synthesize parts into something new
and different, a new form or product. This process is the most difficult mental
function in the new taxonomy. (This category used to be #5 in Bloom's
taxonomy, known as synthesis.)
Table 1.1 Bloom vs. Anderson/Krathwohl
Visual comparison of the two taxonomies:

Bloom et al. 1956      Anderson & Krathwohl et al. 2000
Evaluation             Create
Synthesis              Evaluate
Analysis               Analyze
Application            Apply
Comprehension          Understand
Knowledge              Remember
One of the things that differentiates the new model from the 1956 original is
that it lays out the components nicely so they can be considered and used. And
while the levels of knowledge were indicated in the original work (factual,
conceptual, and procedural), these were never fully understood or used by
teachers because most of what educators were given in training consisted of a
simple chart with the listing of levels and related accompanying verbs. The
full breadth of Handbook I and its recommendations on types of knowledge
were rarely discussed in any instructive way. Nor were teachers in training
generally aware of any of the criticisms of the original model. The updated
version has added "metacognitive" to the array of knowledge types.
Here are the intersections as the processes impact the levels of knowledge.
Using a simple cross-impact grid or table like the one below, one can easily
match activities and objectives to the types of knowledge and to the cognitive
processes as well.
Cognitive Processes (columns): 1. Remember, 2. Understand, 3. Apply,
4. Analyze, 5. Evaluate, 6. Create
The Knowledge Dimensions (rows): Factual, Conceptual, Procedural,
Metacognitive
Knowledge Dimensions Defined:
Factual Knowledge is knowledge that is basic to specific disciplines. This
dimension refers to essential facts, terminology, details or elements students
must know or be familiar with in order to understand a discipline or solve a
problem in it.
Conceptual Knowledge is knowledge of classifications, principles,
generalizations, theories, models, or structures pertinent to a particular
disciplinary area.
Procedural Knowledge refers to information or knowledge that helps
students to do something specific to a discipline, subject, area of study. It also
refers to methods of inquiry, very specific or finite skills, algorithms,
techniques, and particular methodologies.
Metacognitive Knowledge is the awareness of one's own cognition and
particular cognitive processes. It is strategic or reflective knowledge about
how to go about solving problems, cognitive tasks, to include contextual and
conditional knowledge and knowledge of self.
Source: Anderson, L. W. and Krathwohl, D. R., et al. (2000). A Taxonomy for
Learning, Teaching, and Assessing: A Revision of Bloom's Taxonomy of
Educational Objectives. Allyn & Bacon.
 Rating Scales
Rating scales resemble check lists but are used when finer discriminations are
required.
 Uses of Rating Scales
1. They measure specified outcomes.
2. They evaluate procedure (playing of instrument, typing, cooking),
products (typed letters, written themes), and personal-social
development.
3. They help teachers to rate their students periodically on various
characteristics. E.g. punctuality, enthusiasm, cheerfulness,
co-operativeness.
4. They can be used by a pupil to rate himself.
5. A teacher can make use of them for evaluating the effectiveness of the
instructional procedure, teaching-learning strategy, tactics and aids.
 Advantages of rating scales
- They can be used with a large number of students
- Very adaptable and flexible.
- Efficient and economical
- Comprehensive in the amount of information
 Type of rating scales
 Numerical
 Graphic
 Descriptive Graphic
 Ranking
 Numerical rating scale
This is one of the simplest types of rating scales.
- The rater simply marks a number that indicates the extent to which a
characteristic or trait is present
- The trait is presented as a statement and values
Direction: Encircle the appropriate number showing the extent to which the
pupil exhibits his skill in questioning.
Key: 5 = outstanding, 4 = above average, 3 = average, 2 = below average,
1 = unsatisfactory
Skill:
i. Questions were specific
ii. Questions were relevant to the topic discussed
iii. Questions were grammatically correct, etc.
 Graphic rating scale
As in the case of the numerical rating scale, the rater is required to assign
some value to a specific trait.
This time, however, instead of using predetermined scale values, the ratings
are made in graphic form – a position anywhere along a continuum.
Direction: Rate each characteristic listed below along the continuum from
1 to 5. You may mark a point between the scale values.
Were the illustrations used interesting?
1 (Too little)   2 (Little)   3 (Adequate)   4 (Much)   5 (Too much)
 Descriptive graphic scale
This type of scale is generally the most desirable type of scale to use.
Direction: As shown in the graphic rating scale.
While preparing a blackboard summary, how was the penmanship?
- Legible, beautiful, uniform size and slant
- Normally readable, good-looking, fluent motion
- Illegible, bad-looking, tends to draw outlines
 Ranking
The rater, instead of assigning a numerical value to each student with regard
to a characteristic, ranks a given set of individuals from high to low on the
characteristic that is rated.
 Sources of error in rating scales
- Ambiguity
- The Personality of the rater
- Attitude of the rater
- Opportunity for adequate observation.
 Check List
A Check list consists of a listing of steps, activities or behaviour which the
observer records when an incident occurs.
A check list enables the observer to note only whether or not a trait or
characteristic is present
 Advantage of check lists
 They are adaptable to most subject-matter areas.
 They are useful in evaluating those learning activities that involve a
product, process and some aspects of personal-social adjustment.
 They are most useful for evaluating those processes that can be sub-divided into a series of clear, distinct, separate actions.
 They provide a simple method to record observations.
 They objectively evaluate traits or characteristics.
 They may be used for evaluating interest, attitudes and values of the
learner
 Directions
Listed below are a series of characteristics related to health practices. Check
those characteristics which are applicable to students.
 Examples
 Expresses each item in clear, simple language;
 Avoids negative statements wherever possible;
 Makes sure that each item is clearly true or false;
 Reviews the items independently.
 Factors Influencing Reliability
Method
The method used in obtaining data on reliability affects the reliability
coefficient, whether of stability or of equivalence.
Interval
With any method involving two testing occasions, the longer the interval of
time between two test administrations, the lower the coefficient will tend to
be.
Test Length
Adding equivalent items makes a test more reliable.
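This effect of test length is classically quantified by the Spearman-Brown prophecy formula, which is standard in measurement theory though not stated in these notes; the numbers below are illustrative:

```python
def spearman_brown(r, k):
    """Predicted reliability when a test is lengthened k-fold with equivalent
    items, given its current reliability r (Spearman-Brown prophecy formula)."""
    return k * r / (1 + (k - 1) * r)

# Doubling a test whose reliability coefficient is 0.60:
print(round(spearman_brown(0.60, 2), 2))  # 0.75
```

Lengthening helps, but with diminishing returns: each further doubling raises the coefficient by less than the one before.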
Speed
A test is considered to be pure speed test if everyone who reaches an item gets
it right, but no one has time to finish all the items.
Group Homogeneity
The test is more reliable when applied to a group of students with a wide
range of ability than one with a narrow range of ability.
Difficulty of the items
Too easy or too difficult tests for a group will tend to be less reliable because
the differences among the students in such tests are narrow.
Objectivity of scoring
The more subjectively a measure is scored, the lower its reliability. Ordinarily
objective type tests are more reliable than subjective or essay type test.
Ambiguous Wording of items
When questions are interpreted in different ways at different times by the
same students, the test becomes less reliable.
Inconsistency in Test Administration
Inconsistency in test administration, such as deviations in timing, procedure,
instruction, etc., fluctuations in interest and attention of the pupils, shifts in
emotional attitude, etc., makes a test less reliable.
Optional Questions
If optional questions are given, the same students may not attempt the same
items on a second administration; thereby the reliability of the test is reduced.
 Factors Affecting Validity
Unclear direction
Unclear - State whether the following statements are true or false. Give
reasons.
Clear – State whether each of the following statements is true or false. If false,
give reasons.
Reading vocabulary
If the reading vocabulary is poor, the students fail to reply to the item even if
they know the answer.
Difficult sentence construction
If a sentence is so constructed as to be difficult to understand, students would
be unnecessarily confused, which will affect the validity of the test.
Poorly constructed test items
Poorly constructed test items reduce the validity of a test.
Use of inappropriate items
With the help of objective type items, the pupils' power of organising matter
cannot be judged.
Medium of expression
Mrs. Megha Gokhe
TSCER
1 of 29
III.2
Educational Evaluation
English, as the medium of instruction and response for non-English medium
students, creates many serious problems.
Difficulty level of items
Too easy or too difficult test items would not discriminate among pupils.
Influence of extraneous factors
Extraneous factors, like the style of expression, legibility, mechanics of
grammar, handwriting, length of the answer, method of organising the matter,
etc., influence the validity of a test.
Inappropriate time limit
In a speed test, if no time limit is given, the results will be invalidated.
Inadequate coverage
Essay type items generally fail to cover a vast portion of the content.
Inadequate sampling lowers validity.
Inadequate weightage
Inadequate weightage to subtopics or objectives or various forms of questions
would call into question the validity of a test.
Quiz items
Sometimes students, because of their inability to understand a test item, guess
and respond.
 Relation Between Validity and Reliability
i) Validity is sometimes defined as truthfulness while reliability is
sometimes defined as trustworthiness.
ii) In order that a test should be valid, it must first of all be reliable. A
measure might be very consistent but not accurate.
iii) Neither validity nor reliability is an either-or matter; each is a matter of degree.
iv) Since a single test may be used for many different purposes there is no
single validity index for a test.
v) Validity includes reliability. A classroom test should be both consistent and
relevant; this combination of characteristics is called validity.