ESL Test Creation Guidelines

1) Test writers should have experience in and/or qualifications for writing materials for English learners, or should be willing to undergo a training period in English-test's methodology and submit sample test sets for approval. (see note 1)

2) Materials used in creating sentences should ideally be original, since material garnered from the Internet can easily, by cut and paste, provide the missing word. However, authentic materials excerpted from the Internet or other sources, suitably limited and modified to avoid any suggestion of plagiarism, can also be a valuable source of real written or spoken English. (see note 2)

3) Sentences created for tests should by inference/indication/hint provide a sensible pointer to the missing word. One check on the suitability of the correct word is whether the whole sentence provides a reasonable definition of it. Such internal indication should not, however, be a definition in and of itself or an overt synonym, because in such questions we are testing for meaning within the statement; the sentential context should only limit the choice of words that can collocate meaningfully. A good test sentence does not present an overt answer but an environment in which the correct answer is the most appropriate choice. (see notes 3 and 4)

4) Words offered as choices should have some kind of symmetry whereby all are adjectives/adverbs/verbs and so on. Consistency of part of speech is one of the key factors in choosing attractive distractors for the vocabulary questions of the TOEIC. (This does not apply to the grammar questions, where verb forms and parts of speech are of the essence.) No 'made-up' words should be used (this includes in particular false verb forms such as 'digged' or 'layed'). (see notes 3 and 4)

5) Continuity of theme across the ten test sentences is recommended, rather than ten randomly created sentences. To that end the test needs a descriptive title apart from its level. (see note 5)

6) It is important to bear in mind that the purpose of the tests is to help students consolidate or develop their vocabulary by seeing the 'missing' word in context. (see note 6)

7) Conversational/informal/everyday English should not be neglected and can usefully be employed in 'dialogue' tests. Test writers should be clearly aware of whether the particular test question in hand is a dialogue or a written statement, as their registers are quite different. They should also keep in mind -- and feel free to utilize -- the various sub-registers of spoken and written English. For instance, spoken English is not necessarily slang-laden, and written language can come from casual emails as well as formal essays.

8) Once a tester has been accepted as competent, there seems little point in any of the moderators going through tests with a fine-tooth comb to suggest changes. Mistakes, errors and typos are inevitable and can be put right. This can be dealt with in the forum: Feedback, Comments and Suggestions. (see note 8)

9) In keeping with the spirit of English-test, the lighter touch should be encouraged so that the tests do not smack too much of the 'dry-as-dust' attitude.

10) In making the choices of words, the tester has to bear in mind that explanations are always needed by learners who want to know why their chosen word is not correct. This means that the writer him/herself must have created the choices with a good rationale behind them.
11) Accepting the proviso outlined in (1) concerning the required competence of the test writer, I would recommend that it fall to the originator to check the accuracy of the test and put right any typos/errors/wrong answers that appear in the transcription. (see note 8 again)

12) A useful function of the new forum would be to offer suggestions and new ideas concerning the creation and construction of tests -- anything to get away from the traditional: 'Fill in the missing word in these sentences: The cat sat **** the mat (a) off (b) on (c) under (d) over' -- if you see what I mean! I do feel quite earnestly that the forum should not be there for members simply to voice alternatives along the lines of "if it were me, I'd say so and so." As I wrote earlier to Torsten, there will never be a perfect test and we could go on chuntering about choices till the cows come home. (see note 12)

NOTES to ESL Test Creation Guidelines (MM)

1) As we develop a methodology and a concept of what the English-test test should be, we should look toward enlisting and training members who reveal talent in this direction.

2) As we have discussed before, random sentences excerpted from other sources and not presented as one's own concept are not plagiarism. ['plagiarism: taking the writings or literary concepts (a plot, characters, words) of another and selling and/or publishing them as one's own product. Quotes which are brief or are acknowledged as quotes do not constitute plagiarism' -- Dictionary.law.com] They are, on the contrary, an invaluable source of authentic communication NOT produced simply as a test question. News articles and interviews are good sources of real written and spoken English respectively. It is easy enough to include an acknowledgement in the title slot if it is felt necessary. Since we are not a certifying body and we don't give students cash awards for passing our tests (do we?), I see no reason to prevent students from going and looking for the answers elsewhere if they wish to use our tests in that way. We are not schoolmasters and they are not schoolboys. However, minor modifications of authentic texts will usually put the kibosh on their efforts in that direction.

3 and 4) It is true that the sentence must contain hints toward the answer; otherwise there would be no point in having a sentence at all. However, the hint should be part and parcel of a natural sentence; the sentence cannot be didactic for the sake of didacticism:

She was __________ when she watched the movie. -- Not acceptable.
She was __________ when she watched the horror movie. -- Acceptable.
She was __________ when she watched the movie of the aliens devouring humanity. -- Very acceptable.
She was __________ or frightened when she watched the (horror) movie. -- Not acceptable.
She was __________, meaning she was frightened and upset, when she watched the (horror) movie. -- Not acceptable.

Here are 4 vocabulary questions from an ETS TOEIC:

i. Bianca Brunelli hopes to be _________ to government office in the spring.
(A) chosen (B) elected (C) preferred (D) considered

ii. All department supervisors are required to attend the ________ on the new employee time-keeping policy.
(A) delegation (B) summary (C) commission (D) seminar

Notice in i and ii that it is the nature of the context as a whole (i - government, ii - corporate policy) that determines the appropriate choice, though the distractors in isolation carry similar or related meanings.

iii. ________ the latest census, the population of the province has increased by 18% in the last decade.
(A) In compliance with (B) Depending on (C) According to (D) Along with

In iii, the meanings of the discourse connectives are different, and the test taker must ascertain the logical relationship of the two clauses.

iv. Lyon Brother, Inc., had a very small budget for advertising, so they decided to produce brochures _________.
(A) itself (B) oneself (C) ourselves (D) themselves

And iv is a common case in which the test taker must determine pronominal reference.

The best (i.e. 'most testing') questions come in this sort of situation: where the overall tenor, theme or register of the statement determines the correct answer, not just a key give-away word or defining phrase (like 'horror'). It is important that a distractor (false answer) have a reasonable chance of fitting into the gap. Notice that within each set of answers, all are one part of speech and all are 'reasonable' words or phrases for the general circumstances of the statement. In fact, the TOEIC instructions to the test taker read 'select the answer choice that best completes the sentence' [my italics]. Our main goal is to find 3 good distractors that almost work but do not quite do so for some reason.

Sometimes the problem set for the test taker can be structural:

v. John _____ to attend the business conference in Las Vegas, but his wife wouldn't let him.
(A) wanted (B) enjoyed (C) invited (D) could

In v, the answer concepts are all quite possible, but structure, for various reasons, permits only A. Barring distractors in this way is a technique that works for lower-level students, but as they advance they acquire a better understanding of structure, and the technique loses effectiveness. Still, it is a common technique, particularly when working with function words. But most often, the problem set is simply lexical, as in i and ii above.

Now, let's go back to what I said a moment ago: 'Our main goal is to find 3 good distractors that almost work but do not quite do so'. Three distractors -- that is often a difficult task, especially when we need to write 100 or 200 questions per assignment. All too often, the correct answer is too obvious, particularly when testing at higher levels, because English (like all languages, I suppose) collocates so much. So there are some cheap tricks that can sometimes be used (like catsup) to help camouflage the taste of the truth when distractors that are convincing in themselves, in their meanings, cannot be found. These tricks seem to have taken up too much of our previous discussions, but if I list them here, maybe we can move on. Their intent is to make the 4 answers more 'symmetric', as Alan suggests -- to help prevent the real answer from standing out like a sore thumb, and to counteract the efforts of those test takers who are good at guessing because they can read the test writer's psychology:

1. Similar sounds
1a. All answers start with similar phonemes (conference, congress, convention, colloquium). This usually means we can use the same prefix, and thus it may distract a test taker who is unsure of the precise term though s/he may recognize the meaning of the prefix.
1b. All answers end with similar phonemes (ambitious, zealous, assiduous, industrious).
1c. All answers have similar internal phonemes (bought, sought, fought, wrought).

2. Similar roots and sources
2a. Similar roots (intended, pretended, contended, extended).
2b. Similar linguistic sources (pro forma, ex officio, in vitro, ad hoc).

3. Similar looks
3a. Similar lengths (resources, artifices, substances, facilities).
3b. Similar registers (skinny, dope, buzz, scuttlebutt).

Corollary to 1, 2, 3: When we cannot treat all 4 answers this way reasonably, we can (even better) do them in pairs (conference, congress, exhibition, exposition). In this way, no single answer is the odd man out.

The goal of all of these is simply to make the answers appear more similar; none of them really affects their appropriateness. Notice that none of the official TOEIC answer sets above (i-iv) starts with the same single phoneme or employs any other of these stunts. Notice too that my examples in 1c and 2 are utter failures in suggesting correct meanings and should not be used. I have already taken up too much space in listing these; they are to be used SPARINGLY. If we do fall into using them on occasion (and I do -- it almost turns into poetry composition late at night!), we must be sure that they are nevertheless attractive (= alluring to the test taker) on more substantial grounds as well.

Grammar questions should not be confused with vocabulary questions; they should be conceived quite separately. I haven't seen many of these in the tests I've reviewed here, actually, and they're not my favorite kind of question to write. Here is an ETS TOEIC grammar question:

vi. City College is now offering programs designed for students ________ to pursue a two-year certificate in information technology.
(A) intending (B) intended (C) is intending (D) which intended

Notice that the answers are all from the same stem but are different parts of speech (here, different verb structures). This is quite different from putting a different part of speech into a lexical question; here we are focussed on testing grammar structure, while the goal behind constructing a lexical question is to minimize structural distinctions and focus on vocabulary. Grammar questions are more challenging if the gapping is done within more complex structures (here it is within a nonfinite adjectival clause).

This might be a good place to talk about test levels (Basic, Intermediate, Advanced). I think this has been the most subjective topic of discussion. Of course there is no way to assign a test absolutely between Basic and Intermediate, or between Intermediate and Advanced, but we should be able to find some guidelines for separating Basic from Advanced questions at least. (And we could have just two levels of tests: Basic and Advanced!)

A. Test writers have different experiences with different L1 learners of L2. My Japanese students are steeped in grammar; many of them know our grammar better than we do. Chinese students are bright, and they are wily test takers -- they have developed all sorts of strategies for out-thinking the test makers. I know little of the levels that others' students are attaining -- they may be greater or more modest. Nevertheless, I don't see this as a problem for level setting: brighter students should still find their greatest challenge in the Advanced section, and if more Chinese than Peruvian students reach that level, then so be it. Their showings will be similar on the official language proficiency tests.

B. Word frequency lists do exist. I sometimes use the Cambridge English Lexicon (Roland Hindmarsh), which is a graded word list specifically 'for materials writers and course designers'. The trouble is that such lists do not cover enough words (Hindmarsh is 4500 items), and also that frequency is not the only measure, since many of the most frequent words are function words.
It is how these function words and other de-lexicalized words are involved in collocations -- 'in word chunks', as Michael Lewis would have it -- that determines commonness or usefulness, not just frequency of occurrence.

C. So what can we use as markers? I suggest that we might be able to work toward some sort of -- not agreement, but fellow feeling -- on 4 factors: (1) sentence length and complexity, (2) sophistication/difficulty of vocabulary, (3) distractor attractiveness or subtlety of distinctions, and (4) extent of 'idiomness'.

(1) Simple sentences, compound sentences, compound-complex sentences; the inclusion of nonfinite, verbless and embedded clauses. It is not too much to ask, I think, generally to limit Basic sentences to simple and compound sentences, and to avoid such simple sentences in favor of more complex composition in Advanced sentences.

(2) Similarly, we could try to lean more heavily on get, put, make in Basic and acquire, place, manufacture in Advanced, if you take my meaning.

(3) Basic questions can assume that the test taker has only basic skills and will have trouble with this assortment of adverbs:

vii. When did we last visit Tokyo? - It was a while ______. Must have been about two months now.
(A) behind (B) ago (C) ahead (D) under

Advanced students need to be challenged with more subtle distinctions:

viii. In many ways, wider Europe is _______ in the Nordic region, which is characterized by similarities of language, law and history but enjoys different cultures.
(A) mimicked (B) mirrored (C) refracted (D) replicated

(4) Advanced questions could incorporate less transparent idioms.

5) A theme and a coherent set of test questions is an excellent contribution to keeping our tests from being dry as dust. This raises the question: is legitimate context then extra-sentential? Can or should test takers include consideration of not just the single sentence but the whole thematic set of statements in determining an answer? I personally think it is good if they have to remember what sentences #1 and #2 said when they are answering #3 -- not that it is required on the TOEIC, but it surpasses that requirement, making the TOEIC easier for them. Either way, the test writer will need to keep in mind that the test taker is assembling a narrative as he takes the test. (I might add an anecdote here. I recently responded to a student who read such tests for their content. She said that she had learned a lot about history, politics, etc. from such quizzes. I warned her that language tests were written for another purpose and did not necessarily contain factual information. We might consider the ethics of adapting factual truth and historical accuracy to language testing; I had never thought about that before.)

6) Point #2 expressed concern about the test taker's finding test text online, but this point seems to welcome student development. I don't think we are clear on the purpose of tests, though Alan's opinion is clear enough. If we wish to consolidate and develop student vocabulary, then we should be offering not tests but vast amounts of assigned or self-chosen reading. Krashen's work on comprehensible input makes it clear that that is the route to vocabulary acquisition. I feel strongly that tests should be made only to measure that acquisition, and that the two tasks (acquisition and measurement) both suffer if that distinction is not clearly marked. The test itself is what it is: does the test taker know the answer?
Of course, what happens after that -- hopefully, the student looks up all 4 words in his dictionary, goes searching for other instances on the Internet, tries using the new-found idiom on his girlfriend -- is where vocabulary is consolidated and developed. If the test question is turned into an attempt to introduce a word -- if the text attempts to inject a definition or a set-up didactic circumstance -- it drifts from natural language. It is then neither a language proficiency test (upon which many test takers' careers depend) nor a natural piece of communicative English, and we are left with a half-breed thing somewhere in the middle. Yes, I agree that a well-written test will allow the test taker to see the word in context... if he gets it right. But I think we should let that occur naturally as a by-product and focus on the testing aspect.

8) This is precisely the unfortunate situation we are trying to rectify with our writers' forum. There should be continuous peer and staff review of all materials submitted. That is how it is done professionally: publishers have editors and proofreaders, and whole books manage to appear with nary a typo. ETS and other language proficiency test makers continually rewrite and refine writer submissions; the ETS and IELTS tests are virtually error-free. It is unthinkable simply to wait for member feedback in the Feedback, Comments and Suggestions forum. It is personally embarrassing for me to read some student's post informing us of a mistake -- even if it's just a typo -- in one of our tests. It reflects on our professionalism. Errors always occur in the initial stages -- everyone gets tired and myopic sometimes -- but errors of any kind should never reach the general membership. A fine-tooth comb is precisely what we lack here; our hair is all tousled and unruly. On the contrary, test writers, moderators and staff should embrace the process, drop their defenses, and become proactive in assuring that all of our tests are presented impeccably. It can be done. Work for everyone can be minimized, however, if the writer takes time to proofread carefully -- especially for punctuation and the notorious 'a(n)'. We mentioned above the use of different registers, of spoken and written texts, and it is important to realize that not only the lexis but also the structure and punctuation vary with these. I could also come up with a list of the most frequent oversights... but not here and now: this is already too long.

12) I kind of see what you mean about escaping the traditional. My understanding is that most of our tests are preparatory to the TOEIC and TOEFL. As such, they should follow those formats as nearly as possible. We also have tests listed under many categories here (ESL-EFL Tests, TOEIC and TOEFL Prep tests, GMATs, ASVABs, Audio lessons, etc.). I presume that we should be writing questions to match those examinations, too. In any case, I myself would like to see a complete annotated list of the tests we offer and their formats set down in one place. Perhaps the writers of each could lay out what they have been doing and the forms their tests have taken? My comments to date in the writers & moderators forum have not been simply to voice my druthers; I have been suggesting improvements and other ideas which I hope writers will take note of. I would like the same kind of feedback when I post my own efforts there.
There is certainly time and space to discuss generalities, but the forum's raison d'être was the uneven quality of tests being written in isolation. We needn't wait for the cows; a simple peer review does not take that long. (Please see my note #8.)