STATISTICS TEACHER’S GUIDE
Chapter V: Scoring the Free-Response Questions
Copyright © 2002 the College Entrance Examination Board. All rights reserved.

The short-answer questions and investigative task on the AP Statistics Exam are scored “holistically”; that is, each response is evaluated as a complete package. With holistic scoring, a judgment is made about the overall quality of the student’s response. This contrasts with “analytic” scoring, in which the necessary components of a student response are specified in advance and each component is given a value counting toward the overall score.

For example, suppose a student is to solve the quadratic equation $x^2 - 2x = 7$. An analytic scoring rubric might allocate a total of four points in the following manner:
• Score one point for rewriting the equation as $x^2 - 2x - 7 = 0$.
• Score one point for correctly choosing the quadratic formula.
• Score one point for correctly substituting the coefficients into the formula.
• Score one point for the correct answer.

An analytic scoring method is well suited for evaluating a response to a focused question with few response alternatives (Nitko 2001, 194-195). Holistic scoring, in contrast, is well suited for questions in which a student must synthesize information and respond at least partly in written paragraphs. For example, an open-ended question may present a “real life” experiment with resulting data and ask the student not only to analyze the data but also to comment on how the experimental protocol might be improved. Suggested improvements might focus on the sampling method, on controlling confounding variables, or on gaining power by increasing the sample size. Indeed, the student’s justification of a proposed improvement could even depend on relevant contextual knowledge brought to the exam from the real world!
In this context, holistic scoring recognizes not only that there are multiple reasonable approaches to a statistical analysis but also that statistics has a kind of synergy: a quality student response is more than just the sum of its parts.

The Strategy of Holistic Scoring for the Free-Response Questions

One method of implementing holistic scoring is to decide in advance the number of categories into which student answers will be sorted. These may be labeled in any of a number of ways: A, B, C, D, and F; distinguished, proficient, and novice; or simple numbers such as 5, 4, 3, 2, and 1. The AP Statistics scoring rubric for each free-response question on the AP Exam has five categories, scored numerically on a 0–4 scale. Each category represents a level of quality in the student response, and these levels are defined on two dimensions: statistical knowledge and communication. The specific rubric for each question is tied to a general template that describes the quality levels as envisioned by the AP Statistics Development Committee.
This general template is given in the following table, “A Guide to Scoring Free-Response Statistics Questions.”

A Guide to Scoring Free-Response Statistics Questions: The Category Descriptors

Each score level is described on two dimensions:
• Statistical Knowledge: identification of the important components of the problem, and demonstration of the statistical concepts and techniques that result in a correct solution of the problem.
• Communication: explanation of what was done and why, along with a statement of conclusions drawn.

Score 4: Complete
Statistical Knowledge:
• Shows complete understanding of the problem’s statistical components
• Synthesizes a correct relationship among these components, perhaps with novelty and creativity
• Uses appropriate and correctly executed statistical techniques
• May have minor arithmetic errors, but answers are still reasonable
Communication:
• Provides a clear, organized, and complete explanation, using correct terminology, of what was done and why
• States appropriate assumptions and caveats
• Uses diagrams or plots when appropriate to aid in describing the solution
• States an appropriate and complete conclusion

Score 3: Substantial
Statistical Knowledge:
• Shows substantial understanding of the problem’s statistical components
• Synthesizes a relationship among these components, perhaps with minor gaps
• Uses appropriate statistical techniques
• May have arithmetic errors, but answers are still reasonable
Communication:
• Provides a clear but not perfectly organized explanation, using correct terminology, of what was done and why, but explanation may be slightly incomplete
• May miss necessary assumptions or caveats
• Uses diagrams or plots when appropriate to aid in describing the solution
• States a conclusion that follows from the analysis but may be somewhat incomplete

Score 2: Developing
Statistical Knowledge:
• Shows some understanding of the problem’s statistical components
• Shows little in the way of a relationship among these components
• Uses some appropriate statistical techniques, but misses or misuses others
• May have arithmetic errors that result in unreasonable answers
Communication:
• Provides some explanation of what was done, but explanation may be vague and difficult to interpret and terminology may be somewhat inappropriate
• Uses diagrams in an incomplete or ineffective way, or diagrams may be missing
• States a conclusion that is incomplete

Score 1: Minimal
Statistical Knowledge:
• Shows limited understanding of the problem’s statistical components by failing to identify important components
• Shows little ability to organize a solution and may use irrelevant information
• Misuses or fails to use appropriate statistical techniques
• Has arithmetic errors that result in unreasonable answers
Communication:
• Provides minimal or unclear explanation of what was done or why it was done, and explanation may not match the presented solution
• Fails to use diagrams or plots, or uses them incorrectly
• States an incorrect conclusion, or fails to state a conclusion

Score 0
Statistical Knowledge:
• Shows little to no understanding of statistical components
Communication:
• Provides no explanation of a legitimate strategy

The AP Statistics Exam consists of two parts: the first contains multiple-choice questions, the second free-response questions. The free-response questions and the holistic evaluation of students’ responses to them reflect the practice of modern statistics and have implications for the instructional strategies used to prepare students for the exam. The Topic Outline on page 28 delineates the skills and concepts of the AP Statistics course. A student who understands the terms, correctly applies the formulas, and can provide good answers to the “stock” questions in elementary statistics still may not be as well prepared as possible for the open-ended questions. Why might this be?
The General Goals of the AP Statistics Course

The general goals of the course, consistent with the course philosophy, are more about the relations and connections among the topics in the Topic Outline than about the topics alone. Instruction should help students acquire an effective network of relationships that binds sets of topics into cognitive wholes. The short-term benefit of such an instructional strategy is that students will be better able to function when confronted with context-rich open-ended questions.

The long-term benefits fall into two classes. First, instruction that goes beyond “topics to be covered” and synthesizes relations among the topics provides a solid foundation for further academic study in statistics. Students preparing to study more advanced mathematics in college have a foundation in algebra and geometry constructed throughout their high school careers. For many students, the analogous foundation in statistics will be provided by the AP Statistics course. Statistics, like mathematics, is a discipline rich in connections between parts that at first blush may not seem connected. Solid instruction in statistics will help students see these connections.

A second long-term benefit is preparing students to address and solve problems of a statistical nature long after the statistics class has finished. Cognitive psychology tells us that a relatively disconnected set of topics will be forgotten as time passes, whereas topics cohesively organized into larger schemas are remembered longer and are more likely to be applied in future problem-solving situations. The building blocks of such a cohesive framework are presented here.

1. The Collection of Data

Students should understand that valid statistical analyses of data depend critically on the method of data collection.
Measures must be related to the variables of interest in the study. While this is generally not a problem in the physical sciences, it is especially important, for example, when designing survey questions in the social sciences: a survey consisting of vague or poorly worded questions will be difficult to interpret. Meaningful generalizations from samples to populations can be made only if they are based on random sampling, so students should appreciate the importance of a good sampling plan. They should also be aware of the differences between observation and experimentation, specifically with respect to the possibility of tentatively identifying a cause-and-effect relationship between variables. In general, students should be able to construct a sampling plan, design an experiment, and interpret the results in light of these considerations. They should also be able to analyze and interpret a study with 20/20 hindsight, recognizing its inevitable limitations and offering constructive criticism.

2. The Representation of Data

Numerical and graphical representation of data is the starting point for both descriptive and inferential statistical analysis. Whether data result from observations of individual subjects or from a simulation, the variability, the shape of the distribution, and any unusual or interesting features of the data are of fundamental importance. In addition to providing information about the distribution of data values, simple pictures of data can also uncover measurement errors, provide reality checks on assumptions about populations, and suggest avenues for future analysis. Students should be able to represent data in a variety of forms and base sound statistical arguments on their representations.

3. Probability Is the Basic Language of Statistics

Probability, the mathematics of chance and variability, provides both foundational ideas, such as events, independence, and probability distributions, and a mathematical language for communicating about these ideas in the AP Statistics course. An understanding of sampling distributions as probability distributions and of statistics as random variables, together with a basic knowledge of the algebra of random variables, provides a conceptually solid foundation for future study in college. Students should have a working vocabulary, grounded in probability, that allows active classroom discussion of statistical ideas.

4. Statistical Inference Occurs in a Very Real World

Successful students should have a firm grasp of the nature and logic of statistical inference as it unfolds in scientific studies, from planning to p-value. Random sampling provides the basis for generalizing one’s findings beyond the data at hand. Proper experimental design is a tool for controlling extraneous variables and reducing the ambiguities of experimental results. The formal inferential techniques of confidence intervals and hypothesis testing lead to an assessment of the statistical significance of the results of particular experiments.

5. Communication of Analysis, Methods, and Results

Even the best experimental design and statistical analysis will suffer if the reporting of methods and results is incomplete, ambiguous, or misleading; student responses to the AP Statistics open-ended questions are not exempt from this reality. It may seem paradoxical to demand precision and clarity in a science devoted to the study of uncertainty, but the communication demands of modern statistical practice are mirrored in the test situation. Statistics, more than any other mathematical discipline, features communication between expert and relative novice.
The practice of modern statistics involves explanation, interpretation, and translation. Students should be prepared to justify their analyses to a statistically literate reader as well as to write cogent explanations for general public consumption.

The Communication Dimension of the Free-Response Questions

The arrival of the National Council of Teachers of Mathematics’ (NCTM) Principles and Standards has focused the attention of mathematics teachers on problem solving and writing in the math classroom. Mathematics teachers are very familiar with exhorting students to “show their work” on homework and exams, and, of course, writing is a standard method of communication in school. However, this new focus brings math teachers and math students to the frontier of their mathematical knowledge and of their ability to communicate fluently in writing. There are few guidelines and no ironclad rules at this boundary.

Statistics is a discipline in which communication at the boundary is an essential skill. Consulting statisticians, as well as investigators in the various scientific fields, must be able to translate real research questions into statistical form and then interpret the results of their statistical analyses for others. The evaluation of student responses on the free-response section of the AP Statistics Exam reflects the importance of communication.

The AP Central Web site contains free-response questions and scoring rubrics from past AP Statistics Exams. This is an excellent place to become more familiar with how the general rubric template is implemented for specific free-response questions. However, the specificity of the rubrics for past questions can be misinterpreted: because each rubric is detailed and tied to the question at hand, it is not a completely reliable general guideline.
After reading the individual rubrics, one might be tempted to create overly specific rules for students’ statistical writing. For example, a particular hypothesis-testing question may ask students to comment on how the data were collected, and this would, of course, be reflected in the rubric. A cursory reading of that rubric might lead one to think that a discussion of the sampling is required for all hypothesis-testing questions. This temptation must be resisted! A particular element of the response to, say, one inference question could be crucial, less than crucial, or even irrelevant in a different question.

While it may never be possible to codify guidelines or rules of thumb for approaching the free-response questions completely, it is now possible to offer some general observations about common limitations and errors of omission and commission in student responses on past exams. Writing in statistics is a matter of judgment, wisdom, and experience, and these qualities tend to be in short supply while one is learning the basics. Judgment and wisdom will arrive in due time; we can provide some benefits of experience now.

1. General Remarks

A common omission by students is the failure to define symbols correctly. Under the time constraints of an exam, students may forget which Greek letter is the appropriate one and use an incorrect symbol without explanation. Defining the meaning of a symbol will, in general, inoculate students against ambiguity or the presumption of error. For example, defining $d_A$ to be the population mean effect of drug A is not the standard way of proceeding, but it does eliminate ambiguity. Specifically, students should distinguish between population parameters and their estimates, either verbally or with symbols.
They should write “the population mean” or “the sample mean,” rather than simply “the mean.”

As a general rule, students should use the accepted symbolism and format of statistical writing in their responses. Students have frequently attempted to communicate using the symbolism and format of their calculators. For example, some calculators will perform a hypothesis test automatically and present the results on the screen in a format that differs from the accepted statistical format, and students may then simply copy what they see on the screen. This is seldom an appropriate way to communicate a process and should be avoided.

Sometimes students write too much or too little. The best strategy is to (a) clearly answer the question asked and (b) stop. Students often seem to begin writing while still formulating their thoughts, perhaps under the impression that they are using their time more efficiently. Also, some students seem to feel they must fill up the space allotted for the question. Both habits have a common effect: irrelevant, or possibly incorrect or contradictory, writing at either the beginning or the end of the response. A much better strategy is to read the question carefully and then respond to that question. Contradictory writing will always be regarded as incorrect. If two parallel solutions are offered, both will be read, but the score of the weaker solution will be awarded.

A common omission is the failure to interpret the results of a statistical procedure. Most questions on the AP Statistics Exam require students to use a statistical procedure in a context, and student responses are evaluated on both the statistical methodology and the interpretation of the results.
Students may feel that the statistical methodology will carry the day and that the reader of the response will “know what was meant.” Whatever was meant, every response will be evaluated on what was written. If a question is asked in a context, students must explicitly interpret the statistical results in that context.

2. Exploring Data

The graphical presentation of data is, of course, an excellent method of communicating about distributions of data and relations between variables. However, effective communication with graphs is compromised if axes are not correctly labeled and scaled. In comparative displays, students frequently fail to label which group is associated with which display. Another common error is the use of different scales in a comparative setting. For example, the graphic clarity of comparative boxplots of the heights of males and females is destroyed if each boxplot has its own scale.

Descriptions of distributions should, as a matter of course, address the center, variability, and shape, as well as unusual features such as outliers, gaps, and clusters. When interpreting a distribution, these features should be explicitly mentioned; when comparing distributions, the characteristics of each distribution should be explicitly mentioned. To say that distribution A is skewed to the right, for example, does not implicitly convey that distribution B is not so skewed.

3. Planning Studies

Writing responses to questions about planning studies has historically been difficult for students. This is not surprising, since these questions tend to elicit the least formulaic responses. A student who struggles with hypothesis testing, for example, can still demonstrate some knowledge by adhering to the stylistic form of the hypothesis-testing procedure. The “writing style” for discussing methods of data collection and experimental design is almost blank verse by comparison.
Some students seize this opportunity and write what are, in effect, blank responses, though they may entirely fill the allotted space. Possibly the best advice for responding to these questions is to know the vocabulary of sampling and experimental design and to write with precision using that vocabulary. With questions about experimental design in particular, it is not uncommon for students to answer questions that were not asked, leaving less time and space to respond effectively to the question that was asked. For example, if asked to identify a potential confounding variable in a particular scenario, the student does not need to define what a confounding variable is, and doing so is not necessarily helpful. Some students adopt a shotgun strategy and regurgitate what they remember about confounding variables in the hope that, in the process of writing, they will stumble upon something relevant to the question. With holistic grading, it is more likely that the lack of focus in such an answer will count against them.

4. Probability

Probability questions are the most “math-like” on the AP Statistics Exam, and students who know how to solve these problems should be relatively comfortable doing so. Difficulties in this area tend to be more organizational than conceptual, and students can improve their chances of doing well by using some simple strategies.

a) Establish the formula first. Students should state the formula they are using in their calculations and substitute values from the problem appropriately. At that point, and not before, students should pick up their calculators.

b) Work through the entire problem. If a problem requires a sequence of calculations, intermediate values should be shown. Generally, if a question has multiple parts and a student misses an early part, the evaluation of later parts will presume that whatever answer was given earlier is correct.
That is, an error in part (a) will not automatically invalidate answers in parts (b), (c), and so on. Giving credit in the face of earlier errors depends on the reader being able to trace the thread of reasoning through the problem.

c) Use graphics. Students should take advantage of graphical representations of the problem in their responses. Venn diagrams and tree diagrams can effectively convey students’ understanding of probability problems as well as reduce ambiguity in the algebraic presentation of their responses.

d) Answer in context. Because probability questions are math-like, it seems easy for students to forget that the question is presented in a context. They should not stop at a numerical answer; they should embed the numerical answer in the context of the problem in a complete sentence.

5. Inference

Answers to inference questions tend to follow a standard format, although textbook presentations differ slightly. Complete (i.e., score = 4) responses to inference questions in which students use hypothesis testing will generally require five components.

a) A correct statement of the hypotheses. Null and alternative hypotheses must be stated, with any notation used correctly defined. Standard symbols such as $H_0$ need not be defined; however, the meaning of a parameter symbol should be specified (e.g., “Let $\mu_A$ represent the population mean height of corn under treatment A.”). The symbols as presented on the formula sheet should be regarded as defined and meaningful, and other notation should be defined before use. A common error is to specify the symbols $\mu_a$ and $\mu_b$ but then fail to link them to the particular populations under discussion. This is particularly unfortunate in the case of a one-tailed hypothesis test, where the direction of the alternative depends on which population is which.

b) Identification of an appropriate test procedure. The procedure and test statistic must be clearly specified.
This should be in the usual statistical form, avoiding the “calculator form” that names procedures by the buttons pushed. The correct formula for the test statistic is usually the least ambiguous identification of a procedure, although a description such as “two-sample t-test for independent samples” is also acceptable.

c) Identifying and checking the appropriate assumptions. An important part of using inferential procedures is recognizing the assumptions that underlie and justify them. Different texts counsel slightly different requirements, and the question rubrics do not enforce any specific requirement beyond its being commonly accepted in the statistical community. For example, the requirements for using the large-sample test for a single population proportion are usually stated in the form $np \ge k$ and $n(1 - p) \ge k$, but the actual value of $k$ differs from text to text. It is not important that students identify a “correct” value for $k$. What is important is that they know there are assumptions that underpin inferential procedures, recognize which are appropriate for the given problem, and check them. A common omission is the actual checking of the assumption: substituting values for $n$, $p$, and $k$ and stating that the conditions are met. Simply stating the assumptions without checking them explicitly is not sufficient for a complete response.

A top-notch approach to checking assumptions includes a statement of the reason for the check. For example, a common check of the plausibility of a normal population is graphical: constructing a boxplot. If there is no indication of skewness, the t-procedure may be judged justified. A student who sketches the boxplot, comments that there are no outliers or other indications of skew, and then links those comments to the appropriateness of the t-procedure would have a stellar check of the assumptions.

Many students show some confusion about the assumption of normality.
This assumption refers to the population from which the data are drawn. Specifically, the assumption is not that there are no outliers in a set of data; it is that the population from which the data are drawn may credibly be presumed to be normal, or at least approximately so. In addition, the evidence must be linked to the assumption. Simply stating, for example, that there are “no outliers” is not sufficient for establishing normality.

It is often forgotten that random sampling from a known population is an assumption for making generalizations to that population. In addition, random assignment to treatments is required for an argument of causation based on differences between experimental groups. Students should be sensitive to these concerns when responding to inference questions in context and should express reservations when appropriate.

d) The correct mechanics. This is usually a matter of showing intermediate steps in an organized manner. It is essential that the correct value of the test statistic, the number of degrees of freedom when appropriate, the level of significance, and the p-value be appropriately identified in the body of the work.

e) Stating correct conclusions in the context of the problem. The test statistic must be used to arrive at a correct conclusion, via either the p-value approach or the rejection-region approach. The link between the conclusion and the test result should be stated clearly (e.g., “Because the p-value is less than 0.05, the null hypothesis is rejected.”). Of critical importance is a conclusion stated in the context of the problem and consistent with the defined hypotheses. That is, the rejection of a null hypothesis should be correctly interpreted in context.

If confidence intervals are used to make inferences about population parameters, the considerations above still apply. One particularly common omission when confidence intervals are used is the checking of the assumptions for the procedure.
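To make components (c) through (e) concrete, here is a minimal Python sketch, using only the standard library, of a large-sample test and confidence interval for a single population proportion. The function names, the threshold $k = 10$, the one-sided alternative $H_a\colon p > p_0$, the 95% critical value $z^* = 1.96$, and the 60-of-100 data in the usage lines are all illustrative assumptions, not anything the rubrics prescribe. The code substitutes $n$ and the relevant proportion into $np \ge k$ and $n(1 - p) \ge k$ before computing anything, for the interval as well as for the test. It cannot check the random-sampling condition; that still has to be addressed from the study design.

```python
import math


def check_large_sample_conditions(n, p, k=10):
    """Substitute n, p, and k into np >= k and n(1 - p) >= k.

    k = 10 is an assumed threshold; texts differ on its value.
    Returns (n*p, n*(1 - p), both_conditions_met).
    """
    successes = n * p
    failures = n * (1 - p)
    return successes, failures, successes >= k and failures >= k


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def one_proportion_z_test(successes, n, p0, k=10):
    """Large-sample z test of H0: p = p0 against the (assumed) one-sided Ha: p > p0."""
    # (c) Check the conditions explicitly, using the hypothesized value p0.
    np0, nq0, ok = check_large_sample_conditions(n, p0, k)
    if not ok:
        raise ValueError(f"n*p0 = {np0:g} and n*(1 - p0) = {nq0:g}; need both >= {k}")
    # (d) Mechanics: sample proportion, standard error under H0, z, and p-value.
    p_hat = successes / n
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    return p_hat, z, normal_upper_tail(z)


def one_proportion_interval(successes, n, z_star=1.96, k=10):
    """Large-sample confidence interval for p; z_star = 1.96 assumes 95% confidence."""
    p_hat = successes / n
    # The interval needs its conditions checked too, here with the sample proportion.
    n_succ, n_fail, ok = check_large_sample_conditions(n, p_hat, k)
    if not ok:
        raise ValueError(f"n*p_hat = {n_succ:g} and n*(1 - p_hat) = {n_fail:g}; need both >= {k}")
    margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# Hypothetical data: 60 "successes" in a random sample of 100, testing p0 = 0.5.
p_hat, z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
low, high = one_proportion_interval(60, 100)
print(f"95% CI for p: ({low:.3f}, {high:.3f})")
```

With these hypothetical numbers the sketch gives $z \approx 2.00$, a one-sided p-value of about 0.023, and an interval of roughly (0.504, 0.696). Component (e) remains the student’s work: because the p-value is below 0.05, $H_0$ is rejected, and that decision must then be stated in terms of the population proportion $p$ in the problem’s context. Likewise, the interval is a statement about $p$, not about $\hat{p}$.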
Both confidence intervals and hypothesis tests are based on the characteristics of the sampling distributions of the statistics of interest, and both procedures require checks of the assumptions. If confidence intervals are used, a correct confidence interval statement is required. Students frequently err by making a probability statement about the computed interval or by indicating that they are confident that a sample value, rather than a population value, lies in a certain interval.

Rounding Answers

Remind your students that it is best not to round numbers at intermediate steps in a calculation. Answers should be rounded only at the end, and then not too much. For example, a z-score should be rounded to two decimal places in order to use the table, but no step in the calculation should be rounded; otherwise the final z-score may differ noticeably from what it would have been. A rule that is usually satisfactory is to round the final answer to one more decimal place than was given in the data.

Defining Symbols

Students should define all symbols when writing solutions to open-ended questions. For example, when writing a null hypothesis, students should not write just $\mu = 75$ but should state what $\mu$ represents. A clear and complete statement of a null hypothesis would be $\mu = 75$, where $\mu$ is the mean of the reading test scores for all students in this school. If students practice this habit throughout the course, they will have less trouble understanding that the null hypothesis is about the population, not the sample.

Interpreting Sample Computer Output

The Development Committee strongly recommends the use of computers in performing data analysis. A variety of inexpensive statistical packages exist, and the output from these programs is, for the most part, standardized, though slight variations in terminology may be seen.
There may also be some differences in the information presented and in its organization on the computer screen or printout.

Since computers are not allowed on the AP Statistics Exam, and calculators typically present information in a nonstandard manner, some questions on the exam contain computer output. The output varies from question to question and is not from any particular statistical package. The output given for a particular question may be “complete,” or only partial output with more focused information may be presented for the students’ consideration. Students must be able to interpret computer output in order to answer both multiple-choice and free-response questions.

Examples of output from several standard packages are shown in this section. As you will notice, the packages present their output slightly differently and sometimes give different information, so you should ensure that your students have seen several different types of printouts before taking the exam. Such sample output can be found in many introductory textbooks.

The data in this example come from a study of alarm barks in prairie dog towns (Motiff 1980). These creatures emit repetitious alarm barks when intruders or other dangers are present. The regression output presented is from fitting bark frequency (barks/30 sec) vs. intruder distance from the burrow (feet).

Bark Freq. of Black-Tailed Prairie Dogs

| Distance from Burrow (ft) | Bark Frequency (barks/30 sec) |
|---|---|
| 10.00 | 81 |
| 20.00 | 79 |
| 30.00 | 78 |
| 40.00 | 73 |
| 50.00 | 72 |
| 60.00 | 71 |
| 70.00 | 59 |
| 80.00 | 71 |
| 90.00 | 67 |
| 100.00 | 64 |
| 110.00 | 57 |
| 120.00 | 55 |
| 130.00 | 41 |

[Figure: Bivariate Fit of Bark Freq. by Distance]

Regression Analysis from Minitab

Regression Analysis
The regression equation is
Bark Freq = 85.3 – 0.264 Distance

| Predictor | Coef | StDev | T | P |
|---|---|---|---|---|
| Constant | 85.269 | 2.942 | 28.98 | 0.000 |
| Distance | –0.26429 | 0.03707 | –7.13 | 0.000 |

S = 5.001    R-Sq = 82.2%    R-Sq(adj) = 80.6%

Analysis of Variance

| Source | DF | SS | MS | F | P |
|---|---|---|---|---|---|
| Regression | 1 | 1271.2 | 1271.2 | 50.83 | 0.000 |
| Residual Error | 11 | 275.1 | 25.0 | | |
| Total | 12 | 1546.3 | | | |

Unusual Observations

| Obs | Distance | Bark Fre | Fit | StDev Fit | Residual | St Resid |
|---|---|---|---|---|---|---|
| 13 | 130 | 41.00 | 50.91 | 2.62 | –9.91 | –2.33R |

R denotes an observation with a large standardized residual.

Regression Analysis from Data Desk

Dependent variable is: Bark Frequency
R squared = 82.2%    R squared (adjusted) = 80.6%
s = 5.001 with 13 - 2 = 11 degrees of freedom

| Source | Sum of Squares | df | Mean Square | F-ratio |
|---|---|---|---|---|
| Regression | 1271.21 | 1 | 1271.21 | 50.8 |
| Residual | 275.093 | 11 | 25.0085 | |

| Variable | Coefficient | s.e. of Coeff | t-ratio | Prob |
|---|---|---|---|---|
| Constant | 85.2692 | 2.942 | 29.0 | 0.0001 |
| Distant | –0.264286 | 0.0371 | –7.13 | 0.0001 |

Regression Analysis from JMP-Intro

Linear Fit
Bark Freq = 85.269231 - 0.2642857 Distance

| Summary of Fit | |
|---|---|
| Rsquare | 0.822097 |
| Rsquare Adj | 0.805924 |
| Root Mean Square Error | 5.000849 |
| Mean of Response | 66.76923 |
| Observations (or Sum Wgts) | 13 |

Analysis of Variance

| Source | DF | Sum of Squares | Mean Square | F Ratio | Prob > F |
|---|---|---|---|---|---|
| Model | 1 | 1271.2143 | 1271.21 | 50.8313 | <0.0001 |
| Error | 11 | 275.0934 | 25.01 | | |
| C Total | 12 | 1546.3077 | | | |

Parameter Estimates

| Term | Estimate | Std Error | T Ratio | Prob > \|t\| |
|---|---|---|---|---|
| Intercept | 85.269231 | 2.942242 | 28.98 | <0.0001 |
| Distance | –0.264286 | 0.037069 | –7.13 | <0.0001 |
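Because all three printouts describe the same least-squares fit, a useful classroom exercise is to recompute the shared quantities from the 13 data pairs and match each one to its label in each package (for instance, Minitab’s “S,” Data Desk’s “s,” and JMP’s “Root Mean Square Error” are the same number). The following is a minimal Python sketch using only the standard library; the function name `least_squares_summary` and its dictionary keys are our own illustrative labels, not names used by any of these packages.

```python
import math

# Prairie dog data from this section (Motiff 1980):
# distance from burrow in feet, bark frequency in barks per 30 seconds.
distance = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130]
barks = [81, 79, 78, 73, 72, 71, 59, 71, 67, 64, 57, 55, 41]


def least_squares_summary(x, y):
    """Recompute the quantities shared by the Minitab, Data Desk, and JMP printouts."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    syy = sum((yi - y_bar) ** 2 for yi in y)  # total sum of squares

    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_regression = slope * sxy  # "Regression" / "Model" sum of squares
    ss_residual = syy - ss_regression  # "Residual Error" / "Error" sum of squares
    df_residual = n - 2
    mse = ss_residual / df_residual
    s = math.sqrt(mse)  # Minitab "S", Data Desk "s", JMP "Root Mean Square Error"
    se_slope = s / math.sqrt(sxx)
    se_intercept = s * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return {
        "intercept": intercept,
        "slope": slope,
        "se_intercept": se_intercept,
        "se_slope": se_slope,
        "t_intercept": intercept / se_intercept,
        "t_slope": slope / se_slope,
        "s": s,
        "r_sq": ss_regression / syy,
        "r_sq_adj": 1 - mse / (syy / (n - 1)),
        "ss_regression": ss_regression,
        "ss_residual": ss_residual,
        "F": ss_regression / mse,  # regression has 1 df, so its mean square equals its SS
    }


summary = least_squares_summary(distance, barks)
for name, value in summary.items():
    print(f"{name:>14}: {value:.6g}")

# Fit and residual for observation 13 (130 ft), the point Minitab flags with "R".
fit_130 = summary["intercept"] + summary["slope"] * 130
print(f"fit at 130 ft = {fit_130:.2f}, residual = {41 - fit_130:.2f}")
```

Rounded, these values agree with the printouts: slope −0.264286, intercept 85.269231, s ≈ 5.0008, R² ≈ 82.2%, F ≈ 50.83, a slope t statistic of about −7.13, and a fit of 50.91 with residual −9.91 at 130 feet. The p-values are omitted because they require the t distribution, which the Python standard library does not provide.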