AP Statistics Unit 4 Note Packet
Sampling and Experimental Design
Name______________________ Hr___

Sampling and Surveys

The _________________ in a statistical study is the entire group of individuals about which we want information.

A _________________ is the part of the population from which we actually collect information. We use information from a sample to draw conclusions about the entire population.

Representation:

Sample Survey: A “sample survey” is a study that uses an organized plan to choose a sample that represents some specific population.

Step 1: Define the population we want to describe.
Step 2: Say exactly what we want to ________________.
Step 3: Decide how to choose a ________________ from the population.

Examples of “Bad” Sampling:

The design of a statistical study shows ___________ if it systematically favors certain outcomes.

Convenience Sampling:

Voluntary Response Sample:

Voluntary response samples show bias because people with strong opinions (often in the same direction) are most likely to respond.

Sampling well: Random Sampling

A simple random sample (SRS) of size n consists of n individuals from the population chosen in such a way that every set of n individuals has an equal chance to be the sample actually selected.

A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these properties:
• Each entry in the table is equally likely to be any of the 10 digits 0 - 9.
• The entries are independent of each other. That is, knowledge of one part of the table gives no information about any other part.

Choosing an SRS using a Table of Random Digits:

Step 1: ______________. Give each member of the population a numerical label of the same length.
Step 2: ______________. Read consecutive groups of digits of the appropriate length from Table D. Your sample contains the individuals whose labels you find.
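The two steps above can also be imitated on a computer. The sketch below is only an illustration, not the Table D procedure itself: it assumes Python's `random` module as a stand-in for the table of random digits, and the function name `srs_by_labels`, the seed, and the 28 placeholder individuals are all hypothetical.

```python
import random

def srs_by_labels(population, n, rng):
    """Pick an SRS of size n by labeling members and reading random digit groups."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    # Step 1: give every member a numerical label of the same length (01, 02, ...).
    width = len(str(len(population)))
    labels = {str(i).zfill(width): member
              for i, member in enumerate(population, start=1)}
    # Step 2: read groups of digits; skip groups that match no label or repeat one.
    picked = []
    while len(picked) < n:
        group = "".join(str(rng.randrange(10)) for _ in range(width))
        if group in labels and group not in picked:
            picked.append(group)
    return [labels[g] for g in picked]

rng = random.Random(130)  # arbitrary seed so the run can be repeated
population = [f"Individual {i}" for i in range(1, 29)]  # hypothetical population of 28
sample = srs_by_labels(population, 4, rng)
print(sample)
```

Skipping unused and repeated labels mirrors what is done by hand with a printed table: those digit groups are simply ignored and reading continues.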
Example: Choosing an SRS
• Use Table D at line 130 to choose an SRS of 4 hotels.

Aloha Kai        Captiva        Palm Tree      Sea Shell
Anchor Down      Casa del Mar   Radisson       Silver Beach
Banana Bay       Coconuts       Ramada         Sunset Beach
Banyan Tree      Diplomat       Sandpiper      Tradewinds
Beach Castle     Holiday Inn    Sea Castle     Tropical Breeze
Best Western     Lime Tree      Sea Club       Tropical Shores
Cabana           Outrigger      Sea Grape      Veranda

SAMPLING METHODS

To select a stratified random sample, first classify the population into groups of similar individuals, called ______________. Then choose a separate SRS in each stratum and combine these SRSs to form the full sample.

To take a cluster sample, first divide the population into smaller groups. Ideally, these clusters should mirror the __________________ of the population. Then choose an SRS of the clusters. ______ individuals in the chosen clusters are included in the sample.

EXAMPLE: At Kansas State University, a professor wanting to find out about student attitudes randomly selects a certain number of classes to survey and includes all the students in those classes.

Systematic sampling: a procedure that can be employed when it is possible to view the population of interest as consisting of a list or some other _____________ arrangement. A value k is specified (a number such as 25, 100, 2500…). Then one of the first k individuals is selected at random, and every kth individual in the sequence after that is included in the sample.

Example: In a large university, a professor wanting to select a sample of students to determine the students’ ages might take the student directory (an alphabetical list), randomly choose one of the first 100 students, and then take every 100th student from that point on.

EXAMPLE: Sampling at a School Assembly: Describe how you would use the following sampling methods to select 80 students to complete a survey.
• (a) Simple Random Sample
• (b) Stratified Random Sample
• (c) Cluster Sample

Inference for Sampling

The purpose of a sample is to give us information about a larger ______________.

The process of drawing conclusions about a population on the basis of sample data is called ___________________.

Why should we rely on random sampling?
• To eliminate ________ in selecting samples from the list of available individuals.
• The laws of _______________ allow trustworthy inference about the population.
• Results from random samples come with a ________________________ that sets bounds on the size of the likely error. (It tells us how much variability to expect.)

________________ random samples give better information about the population than smaller samples.

Bias is introduced by the way in which a sample is selected or by the way in which the data are collected from the sample. Increasing the size of the sample does _____________ to reduce the bias!

Sampling Error: Mistakes made in the ________________ of taking a sample that could lead to inaccurate information about the population
_______________________________
_______________________________
_______________________________

Undercoverage occurs when some ___________________ in the population are left out of the process of choosing the sample.

EXAMPLE: A sample survey of households will miss homeless persons, prison inmates, students living in dorms, etc.

Nonsampling Error: Can plague even a __________
• Nonresponse
• Response Bias
• Poor wording of ______________

Nonresponse occurs when an individual __________ for the sample can’t be contacted or refuses to participate.

NOTE: This differs from “voluntary response” because in a voluntary response survey the individuals have all opted to take part in the survey. In nonresponse, those chosen for the sample do not participate.
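The ideas in this section about sample size, variability, and undercoverage can be checked with a small simulation. The sketch below rests on invented data: a hypothetical population of 10,000 values in which 2,000 individuals are missing from the sampling frame and tend to have lower values. The group sizes, means, and seed are assumptions chosen only for illustration.

```python
import random
import statistics

rng = random.Random(4)  # arbitrary seed so the run can be repeated

# Hypothetical population: the "missed" group is left out of the sampling frame.
covered = [rng.gauss(50, 10) for _ in range(8000)]
missed = [rng.gauss(30, 10) for _ in range(2000)]
population = covered + missed
true_mean = statistics.mean(population)

def sample_means(frame, n, reps, rng):
    """Means of `reps` simple random samples of size n drawn from `frame`."""
    return [statistics.mean(rng.sample(frame, n)) for _ in range(reps)]

results = {}
for n in (25, 400):
    fair = sample_means(population, n, 500, rng)      # SRS from the whole population
    undercov = sample_means(covered, n, 500, rng)     # SRS from the incomplete frame
    results[n] = {
        "fair_center": statistics.mean(fair),
        "fair_spread": statistics.stdev(fair),
        "undercov_center": statistics.mean(undercov),
        "undercov_spread": statistics.stdev(undercov),
    }

print("true mean:", round(true_mean, 2))
for n, summary in results.items():
    print(n, {key: round(value, 2) for key, value in summary.items()})
```

In a run of this sketch, the spread of the sample means shrinks as n grows for both sampling frames, but the means drawn from the incomplete frame stay centered away from the true mean no matter how large n is.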
Response Bias: A systematic pattern of _________________ responses

EXAMPLES:
• People know they should vote, so when asked by an interviewer if they voted in the last election, they will say that they did.
• Faulty memory: “Have you visited the dentist in the last 6 months?”

Wording of Questions: Confusing or __________ questions can introduce strong bias

EXAMPLE: The same sample was asked both of these questions:
• “Should illegal immigrants be prosecuted and deported for being in the U.S. illegally, or shouldn’t they?” (69% favored deportation)
• “Should illegal immigrants who have been in the U.S. for two years be given the chance to keep their jobs and eventually apply for legal status?” (62% responded “yes”)

Observational Studies vs. Experiments

Observational study: The researcher observes individuals and measures variables of interest ________________ influencing the responses.

GOAL: to draw _____________________ about the corresponding population or about differences between two or more populations

NOTE: Observational studies of the effect of one variable on another often _________ because of confounding between the explanatory variable and one or more lurking variables.

A lurking variable is a variable that is not among the explanatory or response variables in a study but that may influence the response variable.

Confounding occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other.

Example: Observe women who take hormones vs. those not taking hormones and note whether a heart attack has occurred. What are some possible lurking variables?

Experiment: The researcher _______________________ imposes some treatment on individuals to measure their responses.
GOAL: to determine whether the treatment _______________ a change in the response

Well-designed experiments can provide evidence for a ___________________ relationship.

The Language of Experiments

Treatment: A specific ______________ applied to the individuals in an experiment. If an experiment has several explanatory variables, a treatment is a combination of specific values of these variables.

Experimental Units: the collection of individuals to which treatments are applied. When the units are human beings, they often are called ________________.

Sometimes, the explanatory variables in an experiment are called _______________. Many experiments study the joint effects of several factors. In such an experiment, each treatment is formed by combining a specific value (often called a __________) of each of the factors.

EXAMPLE: There are now many special courses that claim to prepare you for the SAT. Suppose that you want to evaluate a particular course, using SAT scores to measure the effect of the course. You might find a reasonably large high school where students are offered the chance to take the course, and then compare the SAT scores of those who completed the course with the scores of those who chose not to take it. Suppose that you find that the average SAT score for students who took the course is 30 points higher than for students who didn’t.

Identify each of the elements in this study: population, response variable, treatments.

Is this study a true experiment?

Do you conclude that the course causes an increase in SAT scores? Why or why not?

Each pair of variables shown is strongly associated. Does A cause B or does B cause A, or is there a lurking variable?

1. A: having hip surgery
   B: dying within the next 10 years
2. A: the amount of milk a person drinks
   B: the strength of a person’s bones
3. A: the amount of money a person earns
   B: the number of years a person went to school
4. A: the number of classes taken with Mrs. Sapp
   B: level of endorphins in bloodstream

Model for experiments:

EXAMPLES:

Suppose Starbucks wishes to find out whether the population of MHS students prefer hot or cold, frozen coffee drinks. A random sample of students is selected, and each one is asked to try first hot coffee and then frozen coffee, or vice versa (with the order determined at random). They then indicate which type they prefer. Experiment or Observational Study?

A researcher is interested in the effects of excessive homework on family dinner nights. She surveys students on their homework load and the number of family nights they have been required to forgo. She concludes that homework load does not directly affect family nights. Experiment or Observational Study?

Suppose an experiment is designed to investigate the effects of repeated exposure to TV ads. All subjects viewed a 40-minute TV program that included ads for an iPad. Some subjects saw a 30-second commercial; others, a 90-second version. The same commercial was shown either 1, 3, or 5 times during the program. After viewing, all the subjects answered questions about their recall of the ad, their attitude toward the iPad and their intention to purchase it.

Identify the explanatory and response variables:

List all the treatments:

How to Experiment Well: The Randomized Comparative Experiment

A farm-product manufacturer wants to determine if the yield of a crop is different when the soil is treated with three different types of fertilizers. Fifteen similar plots of land are planted with the same type of seed but are fertilized differently. At the end of the growing season, the mean yields from the sample plots are compared.

experimental units:
explanatory variable (factor):
levels:
response variable:
experimental design:

The Importance of Randomizing: What is the main threat to making reliable inferences about cause?
We can think of confounding as the two groups that we wish to compare differing in some way other than the treatment being compared.

How can we guard against confounding?

Remember: If you don’t randomize, it’s risky to generalize!

The remedy for confounding is to perform a ____________________________ in which some units receive one treatment and similar units receive another. Most well-designed experiments compare two or more treatments.

Comparison alone isn’t enough: if the treatments are given to groups that differ greatly, ______ will result. The solution to the problem of bias is random assignment.

In a _________________________ design, the treatments are assigned to all the experimental units completely by chance.

Some experiments may include a ___________________ group that receives an inactive treatment or an existing baseline treatment.

Diagram:

Three Principles of Experimental Design:

1. _______________ for lurking variables that might affect the response: Use a comparative design and ensure that the only systematic difference between the groups is the treatment administered.
2. _______________________________: Use impersonal chance to assign experimental units to treatments. This helps create roughly equivalent groups of experimental units by balancing the effects of lurking variables that aren’t controlled on the treatment groups.
3. __________________: Use enough experimental units in each group so that any differences in the effects of the treatments can be distinguished from chance differences between the groups.

Example: The Physicians’ Health Study

This study looked at the effects of two drugs: aspirin and beta carotene. Researchers wondered whether beta carotene would help prevent some types of cancer. The subjects were 21,996 male physicians. There were two explanatory variables (factors), each having two levels: aspirin (yes or no) and beta carotene (yes or no).
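As an illustration of how treatments arise from factor levels and how a completely randomized design assigns them, here is a minimal sketch. It uses the two factors named in this example, but the 20 subject IDs, the equal group sizes, the seed, and the function name `completely_randomized` are assumptions made for the illustration and do not describe how the actual study allocated its subjects.

```python
import itertools
import random

# Two factors, each with two levels, as in the example above.
factors = {"aspirin": ["yes", "no"], "beta carotene": ["yes", "no"]}

# Each treatment is one combination of a level from every factor.
treatments = list(itertools.product(*factors.values()))

def completely_randomized(units, treatments, rng):
    """Shuffle all units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

rng = random.Random(2024)        # arbitrary seed so the run can be repeated
subjects = list(range(1, 21))    # 20 hypothetical subject IDs
groups = completely_randomized(subjects, treatments, rng)

for (aspirin, beta_carotene), members in groups.items():
    print(f"aspirin={aspirin}, beta carotene={beta_carotene}: {sorted(members)}")
```

Two factors with two levels each give 2 × 2 = 4 treatments, and shuffling before dealing means chance alone decides which subject lands in which group.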
Combinations of these factors form the four treatments shown. On odd-numbered days, the subjects took either a tablet that contained aspirin or a placebo pill. On even-numbered days, they took either a beta carotene pill or a placebo*. There were several kinds of response variables: heart attacks, certain types of cancer, and other medical outcomes. After several years, 239 of the placebo group and 139 of the aspirin group suffered heart attacks. The beta carotene, however, didn’t seem to have any significant effects.

Explain how each of the three principles of experimental design was used in the study.

*Note: A placebo is a “dummy pill” or inactive treatment that is indistinguishable from the real treatment.

Inference for Experiments and Blocking

A response to a dummy treatment is called a ______________ effect. The strength of the placebo effect is a strong argument for randomized comparative experiments.

Whenever possible, experiments with human subjects should be ___________________.

In a double-blind experiment, neither the subjects nor those who interact with them and measure the response variable know which treatment a subject received.

Statistically Significant: In an experiment, researchers usually hope to see a difference in the responses so large that it is unlikely to happen just because of chance variation. An observed effect so large that it would rarely occur by chance is called statistically significant. (A statistically significant association in data from a well-designed experiment does imply causation.)

Blocking: A block is a group of experimental units that are known before the experiment to be similar in some way that is expected to affect the response to the treatments.

In a _______________________ design, the random assignment of experimental units to treatments is carried out separately within each block.
Form blocks based on the most important unavoidable sources of _____________ (lurking variables) among the experimental units. Randomization will average out the effects of the remaining lurking variables and allow an __________________________ of the treatments.

***Control what you can, block on what you can’t control, and randomize to create comparable groups.***

In a block design, before the experimental units are randomly assigned to a treatment group:
• Experimental subjects are divided into ____________________ blocks. The blocks are based on the most important unavoidable sources of ____________________ (lurking variables).
• The variability within blocks is ________ than the variability between blocks.
• Reduces _____________________ and potential ________________
• Produces a better ________________ of treatment effects.

EXAMPLE: Suppose a researcher is carrying out a study of the effectiveness of three different skin creams for the treatment of a certain skin disease. He has ninety subjects and plans to divide them into 3 treatment groups of thirty subjects each.

If the experimenter has reason to believe that age might be a significant factor in the effect of a given medication, he might choose to first divide the experimental subjects into age groups, such as under 30 years old, 30-60 years old and over 60 years old. Then, within each age level, individuals would be assigned to treatment groups using a completely randomized design.

Another way we could do a randomized block design would be to have the subjects assessed and put in blocks of three according to how severe their skin condition is: the three most severe cases form the first block, the next three most severe cases form the second block, and so on down to the three mildest cases. The members of each block are then randomly assigned, one to each of the three treatment groups.
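The age-blocked version of the skin cream example can be sketched in code. This is only an illustration under stated assumptions: the subjects' ages are invented (thirty per age group, so every group splits evenly), the labels "Cream A", "Cream B", and "Cream C" are placeholders, and the function names `age_block` and `randomized_block` are hypothetical.

```python
import random
from collections import Counter, defaultdict

def age_block(age):
    """Age groups used as blocks in the example above."""
    if age < 30:
        return "under 30"
    if age <= 60:
        return "30-60"
    return "over 60"

def randomized_block(units, block_of, treatments, rng):
    """Randomly assign treatments separately within each block."""
    blocks = defaultdict(list)
    for unit in units:
        blocks[block_of(unit)].append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)                      # chance decides the order
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

rng = random.Random(90)  # arbitrary seed so the run can be repeated

# Hypothetical ages for 90 subjects: 30 in each age group.
age_list = ([rng.randint(18, 29) for _ in range(30)]
            + [rng.randint(30, 60) for _ in range(30)]
            + [rng.randint(61, 85) for _ in range(30)])
ages = dict(enumerate(age_list, start=1))         # subject ID -> age

creams = ["Cream A", "Cream B", "Cream C"]        # placeholder treatment labels
assignment = randomized_block(list(ages), lambda sid: age_block(ages[sid]), creams, rng)

per_block = Counter((age_block(ages[sid]), cream) for sid, cream in assignment.items())
for (block, cream), count in sorted(per_block.items()):
    print(block, cream, count)
```

Because the shuffle happens inside each block, every treatment group ends up with the same age mix, so age cannot be confounded with the treatment.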
Example 2: Suppose you have 1000 individuals (500 males, 500 females) participating in a study for a new vaccine. Since it is known that men and women are physiologically different and react differently to medication, we might consider blocking by gender. Then, within each block, subjects are randomly assigned to treatments.

                TREATMENT
Gender     Placebo     Vaccine
Male         250         250
Female       250         250

This design ensures that each treatment condition has an equal proportion of men and women. As a result, differences between treatment conditions cannot be attributed to __________________. This randomized block design removes gender as a potential source of ___________________ and as a potential confounding variable.

A ______________________________ is a randomized blocked experiment in which each block consists of a matching pair of similar experimental units. Chance is used to determine which unit in each pair gets each treatment.

Sometimes, a “pair” in a matched-pairs design consists of a single unit that receives both treatments. Since the order of the treatments can influence the response, chance is used to determine which treatment is applied first for each unit.

To do a matched pairs design using the previous example, the 1000 subjects are grouped into 500 matched pairs.
• Each pair is ____________________ on gender and age.
• For example, Pair 1 might be two women, both age 21. Pair 2 might be two women, both age 22, and so on.

For the acne example, the matched pairs design is an ______________________ over the completely randomized design and the randomized block design.
• Like the other designs, the matched pairs design uses randomization to control for ______________________.
• However, unlike the others, this design explicitly controls for two potential lurking variables: age and gender.
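The random assignment step in a matched pairs design can be sketched as follows. The sketch assumes the 500 pairs have already been formed (the matching on gender and age is not shown), and the subject IDs, the seed, and the function name `assign_matched_pairs` are hypothetical.

```python
import random
from collections import Counter

def assign_matched_pairs(pairs, treatments, rng):
    """Within each matched pair, let chance decide who gets which treatment."""
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)          # the equivalent of a coin flip for two treatments
        assignment[first] = order[0]
        assignment[second] = order[1]
    return assignment

rng = random.Random(7)  # arbitrary seed so the run can be repeated

# 500 hypothetical pairs; the two members of each pair are assumed to be
# already matched on gender and age.
pairs = [(f"P{i}a", f"P{i}b") for i in range(1, 501)]
assignment = assign_matched_pairs(pairs, ("placebo", "vaccine"), rng)

print(Counter(assignment.values()))
print(assignment["P1a"], assignment["P1b"])
```

Each pair contributes exactly one subject to each treatment, so the two treatment groups are balanced on the matching variables by construction.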