Expanding the Model of Item-Writing Expertise: Cognitive Processes and Requisite Knowledge Structures

Dennis Fulkerson, Pearson
Paul Nichols, Center for Assessment
Eric Snow, SRI International

Annual Meeting of the American Educational Research Association
New Orleans, Louisiana
April 2011

This material is based on work supported by the National Science Foundation under grant number DRL-0733172 (Application of Evidence-Centered Design in Large-Scale Assessment). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.

Background

A number of studies have attempted to establish a research base for writing assessment items (e.g., Wesman, 1971; Millman & Greene, 1989; Bejar, 1993; Gorin, 2006). In these studies, an understanding of test takers' response processes and knowledge structures is used to predict the psychometric properties of the items. This same knowledge of test takers' responses and knowledge structures is used to construct items that measure specific aspects of the test framework, e.g., reasoning versus problem solving. However, the literature supporting a research-based approach to writing items has tended to overlook the item writers themselves. Few studies have examined cognitive models of item writers' writing processes and knowledge structures.

The development of a cognitive model of item-writing expertise holds promise for improving the quality of written items. Careful examination of item writers' cognitive processes and knowledge structures should provide insight into another aspect of item writing and thus inform efforts to improve item quality at an early phase of development by addressing and resolving gaps in item writers' knowledge and skills related to item construction. Such an improvement effort may be especially useful for writing innovative and technology-enhanced items. One way this improvement might be effected is by incorporating information from an item-writing cognitive model into item-writer training workshops and guides.

Several studies of item writers' cognition have been completed by Fulkerson and his colleagues. In an earlier study of experienced item writers' cognition, Fulkerson, Mittelholtz, and Nichols (2009) found that expert item writers engaged in three phases of problem solving. In the initial representation phase, expert item writers routinely created a problem definition in the form of a mental model (e.g., Gentner & Stevens, 1983; Johnson-Laird, 1983). In the second phase, the exploration phase, item writers purposefully explored the problem-solving space in search of content that represented a workable solution to the item-writing assignment. In the third phase, the solution phase, item writers successfully completed the assignment by finding a workable solution that satisfied the set of constraints. Subsequent research exploring differences between more and less experienced item writers (Fulkerson, Nichols, & Mittelholtz, 2010) found that less experienced item writers engaged in less structured problem solving.
This lack of organization in problem solving may have been due to the difficulty less experienced item writers had in accommodating the cognitive load of an item-writing assignment. Together, the results of these studies suggest that the development of expertise in item writing is similar to the development of expertise in other domains studied by cognitive scientists.

However, previous research on item writers' cognition has focused on the process of writing items. This research has largely ignored the role that knowledge plays in item writers' development of assessment items. Yet research in related domains indicates that knowledge, as well as process, plays an important role in differences between the performance of more and less experienced performers. For example, Katz (1994) found that differences in knowledge were related to performance differences between experts and novices in design under constraint. In addition, Shulman (1986, 1987) and others (e.g., Ball, 2000; Hill, Rowan, & Ball, 2005) have found that knowledge structures are important in the performance of teachers.

In this paper, we expand the cognitive model of item writing to include not only cognitive processes but also the requisite knowledge structures used by item writers. We propose a more comprehensive model of item writers' cognition that incorporates the following perspectives: the earlier research on the cognitive processes of item writers (Fulkerson, Mittelholtz, & Nichols, 2009; Fulkerson, Nichols, & Mittelholtz, 2010), the research on knowledge for teaching (Shulman, 1986, 1987; Ball, 2000; Hill, Rowan, & Ball, 2005), and the research on expert-novice design under constraint (Katz, 1994).

Study Methods

Participants in this study were required to write a rough assessment storyboard consisting of four scenes and one multiple-choice item corresponding to one of the storyboard scenes while thinking aloud. Storyboards are products of innovative test development processes that precede the development of online scenario assessment tasks. A storyboard is a written description of the narrative, images, animation, and/or video that will be developed for a test scenario. Scenarios consist of several related scenes that emphasize inquiry-based learning theory and hands-on science strategies and provide students opportunities to observe a process of science and the results of an investigation or event. Benchmark-aligned items are presented within the scenario context.

The participants in this study were nine science content specialists from an assessment company. The nine participants had been employed as science content specialists for at least six months and had a broad range of education and teaching experience. Most writers had some experience writing scenarios over a three-year period, while some writers had little or no experience in scenario writing. Writers were considered experts in scenario writing if they had more than one year of scenario-writing experience and novices if they had less than one year of experience. Demographic information for all nine participants is shown in Table 1.
Table 1. Demographic Information for Study Participants

Subject | Assessment Company Experience | Scenario Writing Experience | Teaching Experience | Subject Area | Highest Degree | Design Pattern Group
1 | 4 years | 3 years* | 24 years | Chem/Phys | Master's | No
2 | 5 years | 3 years* | 31 years | Chemistry | Master's | No
3 | 6 years | None† | 4 years | Biology | Master's | No
4 | 1 year | None† | 38 years | Physics | Master's | Yes
5 | 6 months | None† | 5 years | Biology | Bachelor's | Yes
6 | 5 years | 6 months† | None | Chemistry | Master's | Yes
7 | 4 years | 3 years* | 28 years | Earth | Ed.S. | Yes
8 | 6 years | 3 years* | 5 years | Biology | Master's | Yes
9 | 6 years | 3 years* | 23 years | General | Master's | Yes

*Participant considered an expert. †Participant considered a novice.

Six of the participants were asked to use design patterns during their think-aloud sessions. Design patterns are products of evidence-centered assessment design (Mislevy, Steinberg, & Almond, 2002) and are intended to aid the item writer in creating meaningful items that effectively elicit evidence of student understanding. Design patterns communicate the elements of evidence-centered assessment tasks and consist of a series of attributes characteristic of well-constructed test items. Design pattern attributes include knowledge, skills, and abilities; observations; work products; and characteristic and variable features (Mislevy et al., 2003).

A half-hour training session was held four days before the first think-aloud session was conducted. All participants received information on the purpose of the study, the science content framework, and the task they would be asked to do during the think-aloud session. An additional half-hour training session was held for participants in the design pattern group to explain how to use design patterns.

The participants were tested individually in one-hour sessions. During individual sessions, participants received a writing assignment and instructions for completing the task. The writing assignment presented three sets of assigned content benchmarks and asked participants to select one of the sets of benchmarks as their assignment. Content benchmarks were selected from the Minnesota Comprehensive Assessments Series II Test Specifications for Science (MDE, 2008). The writing assignment asked participants to complete two tasks. First, participants were asked to write a rough four-scene storyboard based on the assigned benchmarks. Second, participants were asked to write one rough multiple-choice item aligned to one of the assigned benchmarks in the context of one of the previously written storyboard scenes. As they responded to the writing assignment, writers were asked to think aloud, verbalizing cognitive information generated during item writing. Writers' verbal reports were audio recorded as they attempted to write the assigned storyboards and items.

Data Analysis

Protocol analysis techniques (Ericsson & Simon, 1993) were applied to analyze how the writers performed the assigned assessment task. The analyses occurred in four steps. First, the verbal behaviors recorded during the think-aloud sessions and retrospective reports were transcribed for analysis. Second, the transcripts were reviewed and edited for accuracy. Third, the transcripts were segmented into individual statements. Finally, each statement was coded in two ways: 1) categories of revised problem-solving cognitive processes and 2) categories of requisite knowledge structures. Each segment was assigned an alphanumeric code corresponding to its identified cognitive process and knowledge structure. Segmenting and coding were completed by trained analysts. Prior to segmenting and coding, inter-rater agreement was calculated to ensure consistency in the analyses. For segmenting, agreement was 74%. Agreement for the cognitive process and knowledge structure categories was 75% and 72%, respectively.
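As a concrete illustration of the agreement figures reported above, the following Python sketch shows how simple percent agreement between two analysts could be computed. The function name and the coded segments are hypothetical; the sketch illustrates only the arithmetic, not the authors' actual data or analysis software.

```python
# A minimal sketch of a simple percent-agreement calculation, assuming two
# analysts coded the same segments. All values below are invented examples.

def percent_agreement(codes_a, codes_b):
    """Percentage of segments to which two analysts assigned the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both analysts must code the same set of segments.")
    matches = sum(1 for a, b in zip(codes_a, codes_b) if a == b)
    return 100.0 * matches / len(codes_a)

# Hypothetical parallel codings of eight segments (cognitive process codes 1-9).
analyst_1 = [1, 1, 6, 6, 3, 6, 7, 8]
analyst_2 = [1, 2, 6, 6, 3, 7, 7, 8]

print(f"Process-code agreement: {percent_agreement(analyst_1, analyst_2):.0f}%")
# Output: Process-code agreement: 75%
```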
Before the statements were coded, the researchers revised the process categories reported in Fulkerson, Nichols, and Mittelholtz (2010). The process category of decomposition was dropped from the analyses because the earlier research had shown that no statements were identified as examples of decomposition. In addition, several sets of processes were combined into a single process because the earlier research indicated that these categories were functionally similar or difficult to differentiate during analysis. The categories of meta-clarification, missing information, and problem definition were combined into problem definition, and the categories of backtracking and impasse were combined into impasse. Table 2 describes the nine process categories used in the analyses.

Table 2. Revised Cognitive Process Categories

Code | Category | Description
1 | Problem definition | Describes creating an initial or subsequent problem representation that includes potentially useful knowledge elements. Includes seeking clarification about the assignment or study procedure.
2 | Impasse | Refers to a state of mind in which the item writer feels that all options have been exhausted. Includes retreating toward earlier states or even to the beginning of the problem.
3 | Constraint recognition | Sets limits on the problem-solving space.
4 | Relaxation and prioritization | Expands the problem-solving options by relaxing constraints or selecting a preferred solution to an impasse or constraint.
5 | Schema activation | Describes the application of mental structures drawing on past experience.
6 | Operator | Reveals active searching for content and solutions. Includes mental and physical operators that advance the writer in the problem space.
7 | Evaluation | Describes evaluating an operator relative to some task requirement.
8 | Solution satisfaction | Describes meeting some desired goals.
9 | Extraneous | Represents statements that appear irrelevant to the assignment.

In addition, the authors defined the following set of knowledge structures based on the research on knowledge for teaching (Shulman, 1986, 1987; Ball, 2000; Hill, Rowan, & Ball, 2005): general item-writing knowledge; general and pedagogical content knowledge; assessment content knowledge, which consists of knowledge of content and items and knowledge of content and scoring; and context-specific knowledge.

General item-writing knowledge consists of generic rules and guidelines for writing items in different formats, including multiple choice and constructed response. These rules and guidelines are widely held by professional assessment developers and have been described by such authors as Haladyna and Downing (1989) and Hoepfl (1994).

General and pedagogical content knowledge consists of knowledge of domains such as science, mathematics, or writing. This is knowledge that a scientist, mathematician, writer, or other well-educated person may have. This category also includes the domain-specific pedagogical knowledge necessary to instruct learners in the domain content.
A person with strong general and pedagogical science knowledge would be able to spot an inaccurate or incomplete definition or fact, critique an experiment, and produce and deliver instructional materials. However, we claim that the knowledge needed to write items well goes beyond that of the subject specialist or teacher.

Assessment content knowledge, which goes beyond the content knowledge of a teacher or other well-educated person, is the understanding of the domain needed to effectively and fairly arrange environments that elicit evidence and then to construct criteria for evaluating that evidence in order to recognize what students know and can do. Assessment content knowledge is divided into knowledge of content and items and knowledge of content and scoring. The knowledge of content and items category addresses the item writer's understanding of how to connect content to context by selecting and sequencing content, scenes, characters, and actions in text and graphics so as to elicit relevant student performance. The knowledge of content and scoring category addresses the item writer's understanding of how to connect content to performance in the content area by describing criteria or constructing likely student responses so as to reveal missing knowledge, misconceptions, or other achievement deficits.

Context-specific knowledge consists of knowledge of idiosyncratic preferences or restrictions of the particular context in which items are developed. It is generally founded on project-specific documentation such as test specifications, style preference guides, and item-writing guides.

In addition to the five knowledge structures, a sixth category representing unclassifiable statements was recognized. This category included extraneous, problem-defining, and other statements that could not be appropriately coded into the five knowledge structures. Table 3 summarizes the knowledge structures used in this study.

Table 3. Knowledge Structure Categories

Code | Category | Description
A | General item-writing knowledge | Knowledge of generic rules and guidelines for writing items
B | General and pedagogical content knowledge | Knowledge of domain-specific content and instructional practices
C | Knowledge of content and items | Knowledge of how to select and sequence assessment content so as to elicit relevant student performance
D | Knowledge of content and scoring | Knowledge of how to construct assessment content so as to reveal achievement deficits
E | Context-specific knowledge | Knowledge of preferences or restrictions of the particular context
F | Not applicable | Does not represent pertinent knowledge
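Because each statement ultimately carries one process code (Table 2) and one knowledge code (Table 3), the two schemes can be thought of as a pair of lookup tables whose entries combine into a single alphanumeric label. The Python sketch below is only an illustration of that pairing; the example coding is invented, and the dictionaries simply restate the categories defined above.

```python
# Sketch of the two coding schemes (Tables 2 and 3) as lookup tables and of how
# one coded statement receives its combined alphanumeric label. The example
# coding at the bottom is invented for illustration.

PROCESS = {
    1: "Problem definition", 2: "Impasse", 3: "Constraint recognition",
    4: "Relaxation and prioritization", 5: "Schema activation",
    6: "Operator", 7: "Evaluation", 8: "Solution satisfaction", 9: "Extraneous",
}

KNOWLEDGE = {
    "A": "General item-writing knowledge",
    "B": "General and pedagogical content knowledge",
    "C": "Knowledge of content and items",
    "D": "Knowledge of content and scoring",
    "E": "Context-specific knowledge",
    "F": "Not applicable",
}

def combined_code(process_code, knowledge_code):
    """Return the concurrence label used in the analyses, e.g. '6C'."""
    if process_code not in PROCESS or knowledge_code not in KNOWLEDGE:
        raise ValueError("Unknown code.")
    return f"{process_code}{knowledge_code}"

label = combined_code(6, "C")
print(label, "=", PROCESS[6], "/", KNOWLEDGE["C"])
# Output: 6C = Operator / Knowledge of content and items
```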
Study Results

Similar to the findings of Fulkerson, Nichols, and Mittelholtz (2010), the results of this study show marked differences in the cognitive processes of more and less experienced writers. Data indicate that novice writers made extraneous and impasse statements at a higher percentage than expert writers and spent more time attempting to define the problem. Expert writers, on the other hand, more readily recognized constraints, relaxed or prioritized aspects of the task, and activated schema. Expert writers also moved forward in the problem space with mental or physical operators and came to a satisfactory solution at a greater percentage than novice writers (Table 4). Additionally, expert writers expressed fewer statements during the task than novice writers. On average, expert writers had 234 statements during the task, while novice writers had 259 statements, suggesting that experts were more efficient than novices in completing the task.

Table 4. Percentages of Cognitive Process Category Use

Process Category | Experts (%) | Novices (%) | All (%)
1: Problem definition | 17.4 | 24.4 | 20.7
2: Impasse | 2.9 | 4.9 | 3.8
3: Constraint recognition | 12.9 | 7.5 | 10.4
4: Relaxation and prioritization | 4.9 | 3.7 | 4.3
5: Schema activation | 3.3 | 0.8 | 2.1
6: Operator | 46.2 | 40.5 | 43.5
7: Evaluation | 5.6 | 8.7 | 7.1
8: Solution satisfaction | 2.4 | 1.1 | 1.8
9: Extraneous | 4.4 | 8.4 | 6.3

As predicted, results of the study confirm the use of the five knowledge categories by item writers. Examples of the use of each of the five knowledge structures were identified during transcript analysis. Analyses of transcripts indicate that all study participants used each of the five knowledge structures. Additionally, each transcript contained a number of segments that could not be coded as one of the knowledge structures. These unclassifiable segments were coded as "not applicable." An example from the transcripts of each of the knowledge structures is shown in Table 5.

Table 5. Transcript Examples of Item Writer Use of Knowledge Structure Categories

Knowledge Category | Transcript | Example
A: General item-writing knowledge | 3 | The wording needs to be a little bit—worked out a little bit and kind of—I am setting myself up for a long foil or this might be a good basic idea. It probably needs—or a decent basic idea. I'm sure it needs refining. That's a long foil—if I'm going to do it this way.
B: General and pedagogical content knowledge | 8 | Cells are important to us because they are the building blocks of life.
C: Knowledge of content and items | 2 | Okay, I'm trying to think of a way to get a flow from a diagram of a woodland ecosystem to inherited characteristics. Maybe I can just use the phrase animal. Use a phrase like animals inherit characteristics from their parents. Or actually I could actually say plants and animals inherit characteristics. But I don't think eighth graders had enough experience with plant heredity and really focusing more on animals in this so I think I'll stay with animals inherit characteristics from their parents.
D: Knowledge of content and scoring | 6 | So we could have—what did I say, 100 grams, 150 grams, 250 grams, and 50 grams. So rearranging these, if they subtract it, it would be 50 grams. If they just had the pizza mix it's 100. If they just had the water, it's 150, and if they had both of them, they have 250.
E: Context-specific knowledge | 9 | I'm not sure about the word rate. I think somewhere in the benchmarks there's something about rate being an issue, but I can look at that later.
F: Not applicable | 1 | Okay. If I don't say anything for a while it means my mind is blank. Is that correct? Okay.

Percentages of knowledge structure statements for the participants are shown in Table 6. Percentage data indicate that the most-used knowledge structure category was knowledge of content and items, while the least-used knowledge structure category was context-specific knowledge. Expert writers expressed the categories of general and pedagogical content knowledge and knowledge of content and items at higher percentages than novices and expressed nonapplicable statements and general item-writing statements at lower percentages than novice writers.
Table 6. Percentages of Knowledge Structure Category Use

Knowledge Category | Experts (%) | Novices (%) | All (%)
A: General item-writing knowledge | 10.1 | 11.6 | 10.8
B: General and pedagogical content knowledge | 8.7 | 3.2 | 6.1
C: Knowledge of content and items | 46.5 | 40.9 | 43.9
D: Knowledge of content and scoring | 6.7 | 7.1 | 6.9
E: Context-specific knowledge | 4.6 | 5.1 | 4.8
F: Not applicable | 23.4 | 32.0 | 27.4

To examine the interaction between cognitive processes and knowledge structures among the expert and novice participants, each segment was assigned a corresponding code combination representing concurrence in cognitive process and knowledge structure categories. The percentage of use and rank of each concurrence were analyzed. Figure 1 shows a graphical comparison of the percentages of code combinations for expert and novice writers. Table 7 shows the ranked percentages of cognitive process and knowledge structure code concurrences for the participants.

[Figure 1: two panels, Code Use (1A-5C) and Code Use (5D-9F), plotting frequency of use (0% to 40%) for each code combination, with separate bars for expert and novice writers.]

Figure 1. Combined code percentages for novice and expert participants.

Table 7. Ranked Percentages of Cognitive Process and Knowledge Structure Concurrences

Rank | Experts | Novices | All
1 | 6C 33.5% | 6C 31.9% | 6C 30.0%
2 | 1F 15.6% | 1F 18.2% | 1F 21.0%
3 | 6D 5.7% | 9F 5.9% | 9F 8.0%
4 | 9F 4.1% | 6D 5.6% | 6D 5.5%
5 | 3C 4.0% | 7A 2.9% | 7A 4.2%
6 | 3B 3.5% | 7C 2.9% | 7C 3.4%
7 | 6B 3.4% | 3C 2.8% | 2C 2.9%
8 | 4C 2.7% | 4C 2.4% | 1A 2.7%
9 | 3E 2.6% | 3B 2.4% | 3E 2.2%
10 | 6A 2.4% | 3E 2.4% | 3A 2.1%
11 | 7C 2.4% | 6B 2.4% | 4C 2.1%
12 | 3A 2.3% | 2C 2.2% | 3C 1.4%
13 | 5E 1.8% | 3A 2.2% | 6F 1.4%
14 | 7A 1.7% | 1A 1.9% | 6A 1.3%
15 | 2C 1.6% | 6A 1.9% | 6B 1.3%
16 | 8C 1.6% | 6F 1.3% | 3B 1.2%
17 | 1A 1.2% | 5E 1.2% | 6E 1.1%
18 | 6F 1.1% | 8C 1.2% | 2F 1.0%

The data show that code 6C (Operator/Knowledge of content and items) was the combination with the highest percentage of use, with code 1F (Problem definition/Not applicable) a distant second. These two category combinations accounted for half of all segments. Of the oft-expressed combinations, novice writers expressed the following combinations at higher percentages than the expert writers: 1A (Problem definition/General item-writing knowledge), 1F (Problem definition/Not applicable), 2C (Impasse/Knowledge of content and items), 7A (Evaluation/General item-writing knowledge), 7C (Evaluation/Knowledge of content and items), and 9F (Extraneous/Not applicable). Expert writers expressed the following combinations at higher percentages than novice writers: 3B (Constraint recognition/General and pedagogical content knowledge), 3C (Constraint recognition/Knowledge of content and items), 5E (Schema activation/Context-specific knowledge), 6A (Operator/General item-writing knowledge), 6B (Operator/General and pedagogical content knowledge), 6C (Operator/Knowledge of content and items), and 8C (Solution satisfaction/Knowledge of content and items). Other commonly expressed combinations for all writers included the following: 3A (Constraint recognition/General item-writing knowledge), 3E (Constraint recognition/Context-specific knowledge), 4C (Relaxation and prioritization/Knowledge of content and items), and 6D (Operator/Knowledge of content and scoring).
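The ranked percentages in Table 7 (and the bars in Figure 1) can, in principle, be reproduced by counting each group's combined code labels and expressing the counts as shares of that group's statements. The following sketch shows such a tally with invented codings; it is not the authors' analysis code, and the numbers it prints bear no relation to the study data.

```python
# Sketch of the concurrence tally behind Table 7: combined codes are counted
# within a writer group and ranked by their share of that group's statements.
# The coded segments below are invented, not the study data.
from collections import Counter

coded_segments = [
    ("expert", "6C"), ("expert", "6C"), ("expert", "1F"), ("expert", "6D"),
    ("novice", "1F"), ("novice", "1F"), ("novice", "6C"), ("novice", "9F"),
]

def ranked_percentages(segments, group):
    """Combined codes for one group, ranked by percentage of that group's statements."""
    codes = [code for g, code in segments if g == group]
    counts = Counter(codes)
    return [(code, 100.0 * n / len(codes)) for code, n in counts.most_common()]

for rank, (code, pct) in enumerate(ranked_percentages(coded_segments, "novice"), start=1):
    print(f"{rank}. {code}: {pct:.1f}%")
# Output:
# 1. 1F: 50.0%
# 2. 6C: 25.0%
# 3. 9F: 25.0%
```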
Table 8 shows example segments from the transcripts for the most-used code concurrences.

Table 8. Transcript Examples of Oft-Used Code Combinations

Code | Transcript | Example
1A | 6 | I'm getting my things out of order. I should have numbered them. Two, three, all right.
1F | 9 | Okay, I'm looking at the assignment. It's grade 8, investigation. Write a rough four-scene storyboard based on assigned benchmarks. Use the storyboard scene template.
2C | 5 | I'm kind of looking at this and trying to figure out—I'm kind of getting myself stuck here and—as far as where I want to go with the scene from here.
3A | 8 | I don't think that would pass bias committee. Even though I would be simulating it; could I simulate it and still get it past committee?
3B | 8 | But I have a problem with that from a scientific viewpoint because blood can, somewhat, can receive and transmit chemical signals, which is why I don't agree with that necessarily.
3C | 2 | Scene III—this is the one I wasn't sure about what we want to do here.
3E | 8 | So this is atypical of how I would normally write a storyboard because I'd have a lot more content benchmarks to hit.
4C | 5 | 6.2.A.3 is another physical science choice. I'm actually deciding to stay away from physical science choice. I am going to go to earth science.
5E | 6 | I know there's another benchmark somewhere about variables and whether we hold them constant, which ones would vary.
6A | 3 | Okay, so I'm just trying to think of a good way to approach one of these standards that's going to be somewhat interesting to the students and, you know, have some sort of flow.
6B | 7 | Quartz doesn't have a streak. Quartz doesn't have cleavage. Hardness of quartz is its most distinctive property. And luster is—is a—quartz luster is shared by many other minerals.
6C | 1 | Along with in doing this, somewhere along the line maybe to collect some data and then be able to ask questions dealing with explaining the data and the findings that you obtained from doing this.
6D | 5 | I'm going to put which instrument does the student need in order to find air pressure? My multiple-choice distractors for this item are going to be thermometer, barometer…
7A | 6 | This kind of just helps me think about which ways I could go with this storyboard if I want to talk about these independent, dependent, or interfering variables. So maybe I could just—I'll just use these as ideas, maybe, of what I want in my different scenes.
7C | 5 | But I did want to use one of the examples that includes testing motion, speed, testing weather using wind patterns, variables—what I was trying to go for. And it was 7.1.B.1. But I think I'm off the mark here a little bit.
8C | 4 | So I think I have four scenes described, however good, bad, or ugly they may be, but they're there.
9F | 2 | Blah, blah, blah, blah, blah.

A number of code combinations were used rarely or not at all.
These include the following combinations: 1B (Problem definition/General and pedagogical content knowledge), 1D (Problem definition/Knowledge of content and scoring), 2B (Impasse/General and pedagogical content knowledge), 2E (Impasse/Context-specific knowledge), 5C (Schema activation/Knowledge of content and items), 5D (Schema activation/Knowledge of content and scoring), 5F (Schema activation/Not applicable), 7E (Evaluation/Context-specific knowledge), 8A (Solution satisfaction/General item-writing knowledge), 8B (Solution satisfaction/General and pedagogical content knowledge), 8E (Solution satisfaction/Context-specific knowledge), 8F (Solution satisfaction/Not applicable), 9B (Extraneous/General and pedagogical content knowledge), 9C (Extraneous/Knowledge of content and items), 9D (Extraneous/Knowledge of content and scoring), and 9E (Extraneous/Context-specific knowledge).

In addition to percentages and ranks of concurrent categories, segment data were analyzed to determine if there was a pattern of concurrent category use as writers progressed through the task. To identify a pattern, five of the nine transcripts were randomly selected for further analysis. Each transcript was divided into ten equal and sequential parts so that each part represented 10% of the transcript. The code combination with the highest percentage of use for each part was identified. Data from the individual parts of the five transcripts were compiled to determine the collective dominant code for each tenth of the transcripts. Table 9 shows the dominant code combination for each part of the transcripts. The data indicate that writers generally spent the first 20% of their task work trying to define the problem and then moved into the use of operators and assessment content knowledge for the remainder of the task.

Table 9. Dominant Code Categories for Each Part of Five Compiled Transcripts

Transcript Part | Dominant Code | Description
1 | 1F | Problem definition/Not applicable
2 | 1F | Problem definition/Not applicable
3 | 6C | Operator/Knowledge of content and items
4 (a) | 6C | Operator/Knowledge of content and items
5 (a) | 6C | Operator/Knowledge of content and items
6 (a) | 6C | Operator/Knowledge of content and items
7 (b) | 6C | Operator/Knowledge of content and items
8 | 6C | Operator/Knowledge of content and items
9 (c, e) | 6D | Operator/Knowledge of content and scoring
10 (d, e) | 6D | Operator/Knowledge of content and scoring

(a) Highest concentrations of 3B (Constraint recognition/General and pedagogical content knowledge)
(b) Highest concentration of 8C (Solution satisfaction/Knowledge of content and items)
(c) Highest concentration of 5E (Schema activation/Context-specific knowledge)
(d) Highest concentration of 8D (Solution satisfaction/Knowledge of content and scoring)
(e) Highest concentrations of 7A (Evaluation/General item-writing knowledge)
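The decile analysis summarized in Table 9 amounts to splitting each transcript's sequence of combined codes into ten equal, ordered parts and taking the most frequent code within each part. The sketch below illustrates that step on an invented 20-segment transcript; the helper function and data are hypothetical and are not drawn from the study transcripts.

```python
# Sketch of the transcript-decile analysis behind Table 9: a transcript's coded
# segments are split into ten equal, sequential parts and the most frequent
# combined code in each part is reported. The transcript below is invented.
from collections import Counter

def dominant_codes_by_tenth(segment_codes, n_parts=10):
    """Most common combined code in each sequential tenth of one transcript."""
    size = len(segment_codes) / n_parts
    dominant = []
    for i in range(n_parts):
        part = segment_codes[int(i * size):int((i + 1) * size)]
        dominant.append(Counter(part).most_common(1)[0][0] if part else None)
    return dominant

# A hypothetical transcript of 20 coded segments:
transcript = ["1F", "1F", "1F", "2C", "6C", "6C", "6C", "3B", "6C", "6C",
              "6C", "3B", "6C", "6C", "6C", "8C", "6D", "5E", "6D", "8D"]
print(dominant_codes_by_tenth(transcript))
# Output: ['1F', '1F', '6C', '6C', '6C', '6C', '6C', '6C', '6D', '6D']
```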
These data coincide with the problem-solving phases predicted by the item-writing model (Fulkerson, Mittelholtz, & Nichols, 2009; Fulkerson, Nichols, & Mittelholtz, 2010), which suggests that item writers move through the cognitive process phases of problem definition, followed by operators, followed by solution satisfaction, with intermittent use of constraint recognition, schema activation, and evaluative statements. The inclusion of knowledge structures further shows that item writers move through phases of problem definition/not applicable (1F) followed by operator/assessment content knowledge (6C, 6D). Solution satisfaction statements are interspersed throughout the 10 transcript parts, with the highest concentrations in part 7 (8C) and part 10 (8D). Relatively high concentrations of constraint recognition are found in parts 4, 5, and 6. Constraint recognition is most frequently coupled in these three parts with general and pedagogical content knowledge (3B). The highest concentration of schema activation is found in part 9 and is most frequently coupled with context-specific knowledge (5E). The highest concentrations of evaluation statements are found in parts 9 and 10, most often coupled with general item-writing knowledge (7A). These concentrations are noted in Table 9.

Conclusion

The results of this study confirm the earlier findings of Fulkerson, Mittelholtz, and Nichols (2009) and Fulkerson, Nichols, and Mittelholtz (2010) with regard to the cognitive processes of more and less experienced item writers. These studies found that item writers appear to engage in three phases of problem solving: representation/definition, exploration/operation, and solution (Figure 2). These phases are more distinct in the problem-solving activities of more experienced item writers than less experienced item writers. These studies indicated that expert writers are more likely to recognize and relax constraints and draw on past item-writing experiences.

[Figure 2: a diagram relating problem definition, physical and mental operators, constraint recognition, schema activation, and solution.]

Figure 2. Model of item-writing expertise based on cognitive processes as described by Fulkerson, Mittelholtz, and Nichols (2009) and Fulkerson, Nichols, and Mittelholtz (2010).

The results of this study also show the use of knowledge structures in the item-writing process. All knowledge structures were used by the participating item writers. Of the 2,208 statements made by the nine writers in this study, 1,602 statements (73%) were assigned a code (non-F) for a relevant knowledge structure. A large majority (60%) of the applicable statements expressed knowledge of content and items, distantly followed by general item-writing knowledge (15%). These findings suggest that domain-specific content or pedagogical knowledge is not the sole or even primary knowledge structure necessary for writing assessment tasks, but that assessment content knowledge and general item-writing knowledge are essential to the creation of quality assessments.

The study suggests a strong relationship between some cognitive processes and knowledge structures. The high percentage of 1F (Problem definition/Not applicable) and 6C/6D (Operator/Assessment content knowledge) statements expressed by both expert and novice writers indicates that, regardless of writer experience, the item-writing process largely consists of a period of defining the parameters of an item-writing task, followed by an active search for ways to select and sequence assessment content so as to elicit relevant student performance. The study also indicates that novices spend more of their writing time defining the task and evaluating ways to select and sequence assessment content, while expert writers spend more time moving forward in the problem space by developing and sequencing the assessment content so as to elicit relevant student performance.
Experts also spend more of their time recognizing, relaxing, and prioritizing constraints stemming from domain-specific content and instructional practices and from context-specific nuances that inform the items. These results are consistent with the findings of Katz (1994), who found that professionals were more efficient than students at solving real-world design problems and that professionals tend to identify and handle constraints more effectively than students.

This study informs and extends the model of item-writing expertise proposed by Fulkerson, Mittelholtz, and Nichols (2009) and Fulkerson, Nichols, and Mittelholtz (2010) by incorporating requisite knowledge structures for the writing of assessment tasks. However, the results of this study are limited by the relatively small sample size and by uneven group sizes and treatments. Further research is necessary to determine whether the patterns identified in this study vary based on the type of item being written and on the types of supports offered to writers, especially novices. Additional research should also focus on determining the types of supports necessary for novice item writers as they begin item-writing tasks.

References

Ball, D.L. (2000). Bridging practices: Intertwining content and pedagogy in teaching and learning to teach. Journal of Teacher Education, 51, 241-247.

Bejar, I.I. (1993). A generative approach to psychological and educational measurement. In N. Frederiksen, R.J. Mislevy, & I.I. Bejar (Eds.), Test theory for a new generation of tests (pp. 323-357). Hillsdale, NJ: Erlbaum.

Ericsson, K.A., & Simon, H.A. (1993). Protocol analysis: Verbal reports as data. Cambridge, MA: MIT Press.

Fulkerson, D., Mittelholtz, D.J., & Nichols, P.D. (2009, April 13). The psychology of writing items: Improving figural response item writing. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.

Fulkerson, D., Nichols, P.D., & Mittelholtz, D.J. (2010, May 3). What item writers think when writing items: Toward a theory of item writing expertise. Paper presented at the annual meeting of the American Educational Research Association, Denver, CO.

Gentner, D., & Stevens, A.L. (Eds.). (1983). Mental models. Hillsdale, NJ: Erlbaum.

Gorin, J.S. (2006). Test design with cognition in mind. Educational Measurement: Issues and Practice, 25(4), 21-35.

Haladyna, T.M., & Downing, S.M. (1989). A taxonomy of multiple-choice item-writing rules. Applied Measurement in Education, 2(1), 37-50.

Hill, H.C., Rowan, B., & Ball, D.L. (2005). Effects of teachers' mathematical knowledge for teaching on student achievement. American Educational Research Journal, 42, 371-406.

Hoepfl, M.C. (1994). Developing and evaluating multiple-choice tests. Technology Teacher, 53(7), 25-26.

Johnson-Laird, P.N. (1983). Mental models: Towards a cognitive science of language, inference, and consciousness. Cambridge, MA: Harvard University Press.

Katz, I.R. (1994). Coping with the complexity of design: Avoiding conflicts and prioritizing constraints. In A. Ram, N. Nersessian, & M. Recker (Eds.), Proceedings of the Sixteenth Annual Meeting of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.

Millman, J., & Greene, J. (1989). The specification and development of tests of achievement and ability. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 335-366).
New York: American Council on Education and Macmillan.

Minnesota Department of Education (MDE). (2008). Minnesota Comprehensive Assessments Series II (MCA-II) test specifications for science. Roseville, MN: Author.

Mislevy, R.J., Steinberg, L.S., & Almond, R.G. (2002). On the structure of educational assessments. Measurement: Interdisciplinary Research and Perspectives, 1, 3-67.

Mislevy, R.J., Chudowsky, N., Draney, K., Fried, R., Gaffney, T., Haertel, G., Hafter, A., Hamel, L., Kennedy, C., Long, K., Morrison, A., Murphy, R., Pena, P., Quellmalz, E., Rosenquist, A., Songer, N.B., Schank, P., Wenk, A., & Wilson, M. (2003). Design patterns for assessing science inquiry (PADI Technical Report 1). Menlo Park, CA: SRI International.

Shulman, L. (1986). Those who understand: Knowledge growth in teaching. Educational Researcher, 15(2), 4-14.

Shulman, L. (1987). Knowledge and teaching: Foundations of the new reform. Harvard Educational Review, 57(1), 1-22.

Wesman, A.G. (1971). Writing the test item. In R.L. Thorndike (Ed.), Educational measurement. Washington, DC: American Council on Education.