ESL Test Creation Guidelines

1) Test writers should have experience in and/or qualifications for writing materials for English learners, or should be willing to undergo a training period in English-test's methodology and submit sample test sets for approval. (see note 1)

2) Materials used in creating sentences should ideally be original, since material garnered from the Internet can easily, by cut and paste, provide the missing word. However, authentic materials excerpted from the Internet or other sources, suitably limited and modified to avoid any suggestion of plagiarism, can also be a valuable source of real written or spoken English. (see note 2)

3) Sentences created for tests should by inference/indication/hint provide a sensible pointer to the missing word. One check on the suitability of the correct word is whether the whole sentence provides a reasonable definition of it. Such internal indication should not, however, be a definition in and of itself or an overt synonym, because in such questions we are testing for meaning within the statement; the sentential context should only limit the choice of words that can collocate meaningfully. A good test sentence does not present an overt answer but an environment in which the correct answer is the most appropriate choice. (see notes 3 and 4)

4) Words offered as choices should have some kind of symmetry whereby all are adjectives/adverbs/verbs and so on. Consistency of part of speech is one of the key factors in choosing attractive distractors for the vocabulary questions of the TOEIC. (This does not apply to the grammar questions, where verb forms and parts of speech are of the essence.) No 'made-up' words should be used (this includes in particular false verb forms such as 'digged' or 'layed'). (see notes 3 and 4)

5) Continuity of theme across the ten test sentences is recommended, rather than ten randomly created sentences. To that end the test needs a descriptive title apart from its level. (see note 5)

6) It is important to bear in mind that the purpose of the tests is to help students consolidate or develop their vocabulary by seeing the 'missing' word in context. (see note 6)

7) Conversational/informal/everyday English should not be neglected and can usefully be employed in 'dialogue' tests. Test writers should be clearly aware of whether the particular test question in hand is a dialogue or a written statement, as their registers are quite different. They should also keep in mind -- and feel free to utilize -- the various sub-registers of spoken and written English. For instance, spoken English is not necessarily slang-laden, and written language can come from casual emails as well as formal essays.

8) Once a tester has been accepted as competent, there seems little point in any of the moderators going through tests with a fine-tooth comb to suggest changes. Mistakes, errors and typos are inevitable and can be put right. This can be dealt with in the forum: Feedback, Comments and Suggestions. (see note 8)

9) In keeping with the spirit of English-test, the lighter touch should be encouraged so that the tests do not smack too much of the 'dry-as-dust' attitude.

10) In making the choices of words, the tester has to bear in mind that explanations are always needed by learners who want to know why their chosen word is not correct. This means that the writer him/herself must have created the choices with a good rationale behind them.
11) Accepting the proviso outlined in (1) concerning the required competence of the test writer, I would recommend that it fall to the originator to check the accuracy of the test and put right any typos/errors/wrong answers that appear in the transcription. (see note 8 again)

12) A useful function of the new forum would be to offer suggestions and new ideas concerning the creation and construction of tests -- anything to get away from the traditional: 'Fill in the missing word in these sentences: The cat sat **** the mat (a) off (b) on (c) under (d) over' -- if you see what I mean! I do feel quite earnestly that the forum should not be there for members simply to voice alternatives along the lines of "if it were me, I'd say so and so." As I wrote earlier to Torsten, there will never be a perfect test and we could go on chuntering about choices till the cows come home. (see note 12)

NOTES to ESL Test Creation Guidelines (MM)

1) As we develop a methodology and a concept of what the English-test test should be, we should look toward enlisting and training members who reveal talent in this direction.

2) As we have discussed before, random sentences excerpted from other sources and not presented as one's own concept are not plagiarism. ['plagiarism: taking the writings or literary concepts (a plot, characters, words) of another and selling and/or publishing them as one's own product. Quotes which are brief or are acknowledged as quotes do not constitute plagiarism' -- Dictionary.law.com] They are, on the contrary, an invaluable source of authentic communication NOT produced simply as a test question. News articles and interviews are good sources of real written and spoken English respectively. It is easy enough to include an acknowledgement in the title slot if it is felt necessary. Since we are not a certifying body and we don't give students cash awards for passing our tests (do we?), I see no reason to prevent students from going and looking for the answers elsewhere if they wish to use our tests in that way. We are not schoolmasters and they are not schoolboys. However, minor modifications of authentic texts will usually put the kibosh on their efforts in that direction.

3 and 4) It is true that the sentence must contain hints toward the answer; otherwise there would be no point in having a sentence at all. However, the hint should be part and parcel of a natural sentence; the sentence cannot be didactic for the sake of didacticism:

She was __________ when she watched the movie. -- Not acceptable.
She was __________ when she watched the horror movie. -- Acceptable.
She was __________ when she watched the movie of the aliens devouring humanity. -- Very acceptable.
She was __________ or frightened when she watched the (horror) movie. -- Not acceptable.
She was __________, meaning she was frightened and upset, when she watched the (horror) movie. -- Not acceptable.

Here are 4 vocabulary questions from an ETS TOEIC:

i. Bianca Brunelli hopes to be _________ to government office in the spring.
(A) chosen (B) elected (C) preferred (D) considered

ii. All department supervisors are required to attend the ________ on the new employee time-keeping policy.
(A) delegation (B) summary (C) commission (D) seminar

Notice in i and ii that it is the nature of the context as a whole (i - government, ii - corporate policy) that determines the appropriate choice, though the distractors in isolation carry similar or related meanings.

iii. ________ the latest census, the population of the province has increased by 18% in the last decade.
(A) In compliance with (B) Depending on (C) According to (D) Along with

In iii, the meanings of the discourse connectives are different, and the test taker must ascertain the logical relationship of the two clauses.

iv. Lyon Brother, Inc., had a very small budget for advertising, so they decided to produce brochures _________.
(A) itself (B) oneself (C) ourselves (D) themselves

And iv is a common case in which the test taker must determine pronominal reference.

The best (i.e. 'most testing') questions come in this sort of situation: where the overall tenor, theme or register of the statement determines the correct answer, not just a key give-away word or defining phrase (like 'horror'). It is important that a distractor (false answer) have a reasonable chance of fitting into the gap. Notice that within each set of answers, all are one part of speech and all are 'reasonable' words or phrases for the general circumstances of the statement. In fact, the TOEIC instructions to the test taker read 'select the answer choice that best completes the sentence' [my italics]. Our main goal is to find 3 good distractors that almost work but do not quite do so for some reason.

Sometimes the problem set for the test taker can be structural:

v. John _____ to attend the business conference in Las Vegas, but his wife wouldn't let him.
(A) wanted (B) enjoyed (C) invited (D) could

In v, the answer concepts are all quite possible, but structure, for various reasons, permits only A. Barring distractors in this way is a technique that works for lower-level students, but as they advance they acquire a better understanding of structure, and the technique loses effectiveness. Still, it is a common technique, particularly when working with function words. But most often, the problem set is simply lexical, as in i and ii above.

Now, let's go back to what I said a moment ago: 'Our main goal is to find 3 good distractors that almost work but do not quite do so'. Three distractors -- that is often a difficult task, especially when we need to write 100 or 200 questions per assignment. All too often, the correct answer is too obvious, particularly when testing at higher levels, because English (like all languages, I suppose) collocates so much. So there are some cheap tricks that can sometimes be used (like catsup) to help camouflage the taste of the truth when distractors that are convincing in themselves, in their meanings, cannot be found. These tricks seem to have taken up too much of our previous discussions, but if I list them here, maybe we can move on. Their intent is to make the 4 answers more 'symmetric', as Alan suggests -- to help prevent the real answer from standing out like a sore thumb, and to counteract the efforts of those test takers who are good at guessing because they can read the test writer's psychology:

1. Similar sounds
1a. All answers start with similar phonemes (conference, congress, convention, colloquium). This usually means we can use the same prefix, and thus it may distract a test taker who is unsure of the precise term though s/he may recognize the meaning of the prefix.
1b. All answers end with similar phonemes (ambitious, zealous, assiduous, industrious).
1c. All answers have similar internal phonemes (bought, sought, fought, wrought).

2. Similar roots and sources
2a. Similar roots (intended, pretended, contended, extended).
2b. Similar linguistic sources (pro forma, ex officio, in vitro, ad hoc).

3. Similar looks
3a. Similar lengths (resources, artifices, substances, facilities).
3b. Similar registers (skinny, dope, buzz, scuttlebutt).

Corollary to 1, 2, 3: When we cannot treat all 4 answers this way reasonably, we can (even better) do them in pairs (conference, congress, exhibition, exposition). In this way, no single answer is the odd man out.

The goal of all of these is simply to make the answers appear more similar; none of them really affects their appropriateness. Notice that none of the official TOEIC answer sets above (i-iv) starts with the same single phoneme or employs any other of these stunts. Notice too that my examples in 1c and 2 are utter failures in suggesting correct meanings and should not be used. I have already taken up too much space in listing these; they are to be used SPARINGLY. If we do fall into using them on occasion (and I do -- it almost turns into poetry composition late at night!), we must be sure that they are nevertheless attractive (= alluring to the test taker) on more substantial grounds as well.

Grammar questions should not be confused with vocabulary questions; they should be conceived quite separately. I haven't seen many of these in the tests I've reviewed here, actually, and they're not my favorite kind of question to write. Here is an ETS TOEIC grammar question:

vi. City College is now offering programs designed for students ________ to pursue a two-year certificate in information technology.
(A) intending (B) intended (C) is intending (D) which intended

Notice that the answers are all from the same stem but are different parts of speech (here, different verb structures). This is quite different from putting a different part of speech into a lexical question; here we are focussed on testing grammar structure, while the goal behind constructing a lexical question is to minimize structural distinctions and focus on vocabulary. Grammar questions are more challenging if the gapping is done within more complex structures (here it is within a nonfinite adjectival clause).

This might be a good place to talk about test levels (Basic, Intermediate, Advanced). I think this has been the most subjective topic of discussion. Of course there is no way to assign a test absolutely between Basic and Intermediate, or between Intermediate and Advanced, but we should be able to find some guidelines for separating Basic from Advanced questions at least. (And we could have just two levels of tests: Basic and Advanced!)

A. Test writers have different experiences with different L1 learners of L2. My Japanese students are steeped in grammar; many of them know our grammar better than we do. Chinese students are bright, and they are wily test takers -- they have developed all sorts of strategies for out-thinking the test makers. I know little of the levels that others' students are attaining -- they may be greater or more modest. Nevertheless, I don't see this as a problem for level setting: brighter students should still find their greatest challenge in the Advanced section, and if more Chinese than Peruvian students reach that level, then so be it. Their showings will be similar on the official language proficiency tests.

B. Word frequency lists do exist. I sometimes use the Cambridge English Lexicon (Roland Hindmarsh), which is a graded word list specifically 'for materials writers and course designers'. The trouble is that such lists do not cover enough words (Hindmarsh is 4500 items), and also that frequency is not the only measure, since many of the most frequent words are function words.
It is how these function words and other de-lexicalized words are involved in collocations -- 'in word chunks', as Michael Lewis would have it -- that determines commonness or usefulness, not just frequency of occurrence.

C. So what can we use as markers? I suggest that we might be able to work toward some sort of -- not agreement, but fellow feeling -- on 4 factors: (1) sentence length and complexity, (2) sophistication/difficulty of vocabulary, (3) distractor attractiveness or subtlety of distinctions, and (4) extent of 'idiomness'.

(1) Simple sentences, compound sentences, compound-complex sentences; the inclusion of nonfinite, verbless and embedded clauses. It is not too much to ask, I think, generally to limit Basic sentences to simple and compound sentences, and to avoid such simple sentences in favor of more complex composition in Advanced sentences.

(2) Similarly, we could try to lean more heavily on get, put, make in Basic and acquire, place, manufacture in Advanced, if you take my meaning.

(3) Basic questions can assume that the test taker has only basic skills and will have trouble with this assortment of adverbs:

vii. When did we last visit Tokyo? - It was a while ______. Must have been about two months now.
(A) behind (B) ago (C) ahead (D) under

Advanced students need to be challenged with more subtle distinctions:

viii. In many ways, wider Europe is _______ in the Nordic region, which is characterized by similarities of language, law and history but enjoys different cultures.
(A) mimicked (B) mirrored (C) refracted (D) replicated

(4) Advanced questions could incorporate less transparent idioms.

5) A theme and a coherent set of test questions is an excellent contribution to keeping our tests from being dry as dust. This raises the question: is legitimate context then extra-sentential? Can or should test takers include consideration of not just the single sentence but the whole thematic set of statements in determining an answer? I personally think it is good if they have to remember what sentences #1 and #2 said when they are answering #3 -- not that it is required on the TOEIC, but it surpasses that requirement, making the TOEIC easier for them. Either way, the test writer will need to keep in mind that the test taker is assembling a narrative as he takes the test. (I might add an anecdote here. I recently responded to a student who read such tests for their content. She said that she had learned a lot about history, politics, etc. from such quizzes. I warned her that language tests were written for another purpose and did not necessarily contain factual information. We might consider the ethics of adapting factual truth and historical accuracy to language testing; I had never thought about that before.)

6) Point #2 expressed concern about the test taker's finding test text online, but this point seems to welcome student development. I don't think we are clear on the purpose of tests, though Alan's opinion is clear enough. If we wish to consolidate and develop student vocabulary, then we should be offering not tests but vast amounts of assigned or self-chosen reading. Krashen's work on comprehensible input makes it clear that that is the route to vocabulary acquisition. I feel strongly that tests should be made only to measure that acquisition, and that the two tasks (acquisition and measurement) both suffer if that distinction is not clearly marked. The test itself is what it is: does the test taker know the answer?
Of course, what happens after that -- hopefully, the student looks up all 4 words in his dictionary, goes searching for other instances on the Internet, tries using the new-found idiom on his girlfriend -- is where vocabulary is consolidated and developed. If the test question is turned into an attempt to introduce a word -- if the text attempts to inject a definition or a set-up didactic circumstance -- it drifts from natural language. It is then neither a language proficiency test (upon which many test takers' careers depend) nor a natural piece of communicative English, and we are left with a half-breed thing somewhere in the middle. Yes, I agree that a well-written test will allow the test taker to see the word in context... if he gets it right. But I think we should let that occur naturally as a by-product and focus on the testing aspect.

8) This is precisely the unfortunate situation we are trying to rectify with our writers' forum. There should be continuous peer and staff review of all materials submitted. That is how it is done professionally: publishers have editors and proofreaders, and whole books manage to appear with nary a typo. ETS and other language proficiency test makers continually rewrite and refine writer submissions; the ETS and IELTS tests are virtually error-free. It is unthinkable simply to wait for member feedback in the Feedback, Comments and Suggestions forum. It is personally embarrassing for me to read some student's post informing us of a mistake -- even if it's just a typo -- in one of our tests. It reflects on our professionalism. Errors always occur in the initial stages -- everyone gets tired and myopic sometimes -- but errors of any kind should never reach the general membership. A fine-tooth comb is precisely what we lack here; our hair is all tousled and unruly. On the contrary, test writers, moderators and staff should embrace the process, drop their defenses, and become proactive in assuring that all of our tests are presented impeccably. It can be done. Work for everyone can be minimized, however, if the writer takes time to proofread carefully -- especially for punctuation and the notorious 'a(n)'. We mentioned above the use of different registers, of spoken and written texts, and it is important to realize that not only the lexis but also the structure and punctuation vary with these. I could also come up with a list of the most frequent oversights... but not here and now: this is already too long.

12) I kind of see what you mean about escaping the traditional. My understanding is that most of our tests are preparatory to the TOEIC and TOEFL. As such, they should follow those formats as nearly as possible. We also have tests listed under many categories here (ESL-EFL Tests, TOEIC and TOEFL Prep tests, GMATs, ASVABs, Audio lessons, etc.). I presume that we should be writing questions to match those examinations, too. In any case, I myself would like to see a complete annotated list of the tests we offer and their formats set down in one place. Perhaps the writers of each could lay out what they have been doing and the forms their tests have taken? My comments to date in the writers & moderators forum have not been simply to voice my druthers; I have been suggesting improvements and other ideas which I hope writers will take note of. I would like the same kind of feedback when I post my own efforts there.
There is certainly time and space to discuss generalities, but the forum's raison d'être was the uneven quality of tests being written in isolation. We needn't wait for the cows; a simple peer review does not take that long. (Please see my note #8.)