AP Statistics Unit 4 Note Packet Sampling and Experimental Design

Name______________________ Hr___
Sampling and Surveys
The _________________ in a statistical study is the entire group of individuals about which we
want information.
A _________________ is the part of the population from which we actually collect information.
We use information from a sample to draw conclusions about the entire population.
Representation:
Sample Survey:
A “sample survey” is a study that uses an organized plan to choose a sample that represents some
specific population.
Step 1: Define the population we want to describe.
Step 2: Say exactly what we want to ________________.
Step 3: Decide how to choose a ________________ from the population.
Examples of “Bad” Sampling:
The design of a statistical study shows ___________ if it systematically favors certain outcomes.
Convenience Sampling:
Voluntary Response Sample:
Voluntary response samples show bias because people with strong opinions (often in the same
direction) are most likely to respond.
Sampling well: Random Sampling
A simple random sample (SRS) of size n consists of n individuals from the population chosen in
such a way that every set of n individuals has an equal chance to be the sample actually selected.
A table of random digits is a long string of the digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 with these
properties:
• Each entry in the table is equally likely to be any of the 10 digits 0-9.
• The entries are independent of each other. That is, knowledge of one part of the table
gives no information about any other part.
Choosing an SRS using a Table of Random Digits:
Step 1: ______________. Give each member of the population a numerical label of the same
length.
Step 2: ______________. Read consecutive groups of digits of the appropriate length from Table
D. Your sample contains the individuals whose labels you find.
Example: Choosing an SRS
• Use Table D at line 130 to choose an SRS of 4 hotels.
Aloha Kai
Anchor Down
Banana Bay
Banyan Tree
Beach Castle
Best Western
Cabana
Captiva
Casa del Mar
Coconuts
Diplomat
Holiday Inn
Lime Tree
Outrigger
Palm Tree
Radisson
Ramada
Sandpiper
Sea Castle
Sea Club
Sea Grape
Sea Shell
Silver Beach
Sunset Beach
Tradewinds
Tropical Breeze
Tropical Shores
Veranda
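The label-and-read procedure can be sketched in Python. The sketch labels the 28 hotels 01-28 and reads two-digit groups. It skips any group outside 01-28 and any repeated label until 4 hotels are chosen. Table D is not reproduced here, so the digit string is a hypothetical stand-in produced by a seeded random-number generator. These are not the digits at line 130. `srs_from_digits` is an illustrative helper name, not a standard function.

```python
import random

# The 28 hotels from the example, in the order listed (label 01 = Aloha Kai, ..., 28 = Veranda).
HOTELS = [
    "Aloha Kai", "Anchor Down", "Banana Bay", "Banyan Tree", "Beach Castle",
    "Best Western", "Cabana", "Captiva", "Casa del Mar", "Coconuts",
    "Diplomat", "Holiday Inn", "Lime Tree", "Outrigger", "Palm Tree",
    "Radisson", "Ramada", "Sandpiper", "Sea Castle", "Sea Club",
    "Sea Grape", "Sea Shell", "Silver Beach", "Sunset Beach", "Tradewinds",
    "Tropical Breeze", "Tropical Shores", "Veranda",
]

def srs_from_digits(population, digits, n):
    """Step 1 (Label): give each member a label 01, 02, ... of equal length.
    Step 2 (Table): read consecutive groups of digits; keep labels that match a
    member, skipping out-of-range labels and repeats, until n are chosen."""
    width = len(str(len(population)))  # 28 members -> 2-digit labels
    labels = {str(i).zfill(width): member
              for i, member in enumerate(population, start=1)}
    chosen = []
    for start in range(0, len(digits) - width + 1, width):
        label = digits[start:start + width]
        if label in labels and labels[label] not in chosen:
            chosen.append(labels[label])
            if len(chosen) == n:
                break
    return chosen

# Hypothetical stand-in for a line of Table D (NOT the real line 130 digits).
rng = random.Random(130)
digit_line = "".join(str(rng.randrange(10)) for _ in range(200))

print(srs_from_digits(HOTELS, digit_line, 4))
```

For example, reading the digits 01 28 99 01 05 selects Aloha Kai (01) and then Veranda (28). It skips 99 because no hotel has that label, and it skips the second 01 because that hotel is already chosen. It then selects Beach Castle (05).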
SAMPLING METHODS
To select a stratified random sample, first classify the population into groups of similar
individuals, called ______________. Then choose a separate SRS in each stratum and combine
these SRSs to form the full sample.
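A stratified random sample is a separate SRS inside each stratum, combined at the end. This minimal sketch assumes a hypothetical school of 400 students split evenly across grades 9-12, and it takes 20 students from each grade. `stratified_sample` is an illustrative name.

```python
import random

def stratified_sample(population, stratum_of, sizes, seed=None):
    """Take a separate SRS within each stratum, then combine them.
    stratum_of: function giving the stratum of an individual
    sizes: dict mapping stratum -> number to sample from that stratum"""
    rng = random.Random(seed)
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for stratum, k in sizes.items():
        sample.extend(rng.sample(strata[stratum], k))  # SRS within this stratum
    return sample

# Hypothetical school: 400 students labeled (grade, number), 100 per grade.
students = [(grade, i) for grade in (9, 10, 11, 12) for i in range(100)]
sample = stratified_sample(students, lambda s: s[0],
                           {9: 20, 10: 20, 11: 20, 12: 20}, seed=1)
print(len(sample))
```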
To take a cluster sample, first divide the population into smaller groups. Ideally, these clusters
should mirror the __________________ of the population. Then choose an SRS of the clusters.
______individuals in the chosen clusters are included in the sample.
 EXAMPLE: At Kansas State University, a professor wanting to find out about student
attitudes randomly selects a certain number of classes to survey and he includes all the
students in those classes.
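A cluster sample picks whole groups at random. In this sketch the clusters are hypothetical class sections of 25 students each. An SRS of 3 sections is chosen, and every student in those sections is included, as in the Kansas State example.

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Choose an SRS of clusters, then include EVERY individual in them."""
    rng = random.Random(seed)
    chosen = rng.sample(list(clusters), num_clusters)  # SRS of cluster names
    return [person for name in chosen for person in clusters[name]]

# Hypothetical: 10 class sections of 25 students each.
sections = {f"Section {c}": [f"S{c}-{j}" for j in range(25)] for c in range(1, 11)}
sample = cluster_sample(sections, 3, seed=7)
print(len(sample))  # all students in the 3 randomly chosen sections
```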
Systematic sampling: a procedure that can be used when the population of interest can be viewed
as a list or some other _____________ arrangement. A value k is specified (a number such as 25,
100, 2500…). One of the first k individuals is selected at random, and then every kth individual
after it in the sequence is included in the sample.
• Example: At a large university, a professor who wants to select a sample of students to
determine their ages might take the student directory (an alphabetical list), randomly choose
one of the first 100 students, and then take every 100th student from that point on.
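The systematic procedure comes down to choosing a random starting point among the first k entries and then stepping by k. The 2,000-name directory below is hypothetical. With k = 100 it always yields 20 students, whatever the starting point.

```python
import random

def systematic_sample(ordered_list, k, seed=None):
    """Pick a random start among the first k entries, then take every kth entry."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # index 0..k-1, i.e. one of the first k individuals
    return ordered_list[start::k]

# Hypothetical directory of 2,000 students in alphabetical order.
directory = [f"Student {i:04d}" for i in range(1, 2001)]
sample = systematic_sample(directory, 100, seed=3)
print(len(sample), sample[:2])
```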
EXAMPLE: Sampling at a School Assembly:
Describe how you would use the following sampling
methods to select 80 students to complete a survey.
• (a) Simple Random Sample
• (b) Stratified Random Sample
• (c) Cluster Sample
Inference for Sampling






The purpose of a sample is to give us information about a larger ______________.
The process of drawing conclusions about a population on the basis of sample data is
called ___________________.
Why should we rely on random sampling?
To eliminate ________ in selecting samples from the list of available individuals.
The laws of _______________ allow trustworthy inference about the population
Results from random samples come with a ________________________ that sets bounds
on the size of the likely error. (It tells us how much variability to expect.)
________________ random samples give better information about the population than
smaller samples.
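The claim that larger random samples give better information can be checked by simulation. This sketch treats the population as very large and assumes a hypothetical 60% of it would answer "yes". It draws many random samples of each size and reports how much the sample proportion varies from one sample to the next.

```python
import random
import statistics

def spread_of_sample_proportions(p, n, reps=1000, seed=0):
    """Simulate `reps` random samples of size n from a population in which a
    proportion p would answer "yes"; return the standard deviation of the
    resulting sample proportions (their sample-to-sample variability)."""
    rng = random.Random(seed)
    proportions = []
    for _ in range(reps):
        yes_count = sum(rng.random() < p for _ in range(n))
        proportions.append(yes_count / n)
    return statistics.stdev(proportions)

# Hypothetical population: 60% would answer "yes".
for n in (25, 100, 400):
    print(n, round(spread_of_sample_proportions(0.6, n), 3))
```

Quadrupling the sample size should roughly halve the spread. This shrinking variability is what the margin of error describes.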
Bias is introduced by the way in which a sample
is selected or by the way in which the data are
collected from the sample. Increasing the size of
the sample does _____________ to reduce the
bias!
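The warning above can also be simulated. In this hypothetical voluntary response poll, 30% of the population opposes a proposal, but opponents reply with probability 0.5 and everyone else with probability 0.1. Inviting more people does not pull the estimate toward the true 0.30. `voluntary_response_estimate` and all of its numbers are illustrative assumptions.

```python
import random

def voluntary_response_estimate(num_invited, seed=0):
    """Hypothetical voluntary response poll: 30% of the population opposes a
    proposal; opponents reply with probability 0.5, everyone else with
    probability 0.1. Returns the proportion of respondents who oppose."""
    rng = random.Random(seed)
    opposed_replies = other_replies = 0
    for _ in range(num_invited):
        opposed = rng.random() < 0.30
        reply_prob = 0.5 if opposed else 0.1
        if rng.random() < reply_prob:
            if opposed:
                opposed_replies += 1
            else:
                other_replies += 1
    return opposed_replies / (opposed_replies + other_replies)

for n in (1_000, 100_000):
    print(n, round(voluntary_response_estimate(n), 3))
```

In the long run, the proportion of opponents among respondents approaches 0.15 / (0.15 + 0.07) ≈ 0.68. The 100,000-person poll is therefore just a more precise estimate of the wrong number.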
Sampling Error: Mistakes made in the ________________ of taking a sample that could lead
to inaccurate information about the population
 _______________________________
 _______________________________
 _______________________________
Undercoverage occurs when some ___________________ in the population are left out
of the process of choosing the sample.
 EXAMPLE: A sample survey of households will miss homeless
persons, prison inmates, students living in dorms, etc.
Nonsampling Error:
 Can plague even a __________
 Nonresponse
 Response Bias
 Poor wording of ______________
Nonresponse occurs when an individual __________ for the sample can’t be contacted or
refuses to participate.
NOTE: This differs from “voluntary response” because in a voluntary response survey the
individuals have all opted to take part in the survey. In nonresponse, those chosen for the
sample do not participate.
Response Bias: A systematic pattern of _________________ responses
• EXAMPLES: People know they should vote, so when an interviewer asks whether they voted in
the last election, some who did not vote will say that they did.
 Faulty memory: “Have you visited the dentist in the last 6
months?”
Wording of Questions: Confusing or __________ questions can introduce strong bias
 EXAMPLE: The same sample was asked both of these questions:
“Should illegal immigrants be prosecuted and deported for being in the U.S. illegally, or
shouldn’t they?” (69% favored deportation)
“Should illegal immigrants who have been in the U.S. for two years be given the chance to
keep their jobs and eventually apply for legal status?” (62% responded “yes”)
Observational Studies vs. Experiments
Observational study: The researcher observes individuals and measures variables of interest
________________ influencing the responses.
GOAL: to draw _____________________ about the corresponding population or about
differences between two or more populations
 NOTE: Observational studies of the effect of one variable on another often _________
because of confounding between the explanatory variable and one or more lurking
variables.
A lurking variable is a variable that is not among the explanatory or response
variables in a study but that may influence the response variable.
Confounding occurs when two variables are associated in such a way that their
effects on a response variable cannot be distinguished from each other.
Example: Observe women who take hormones vs. those not taking hormones and note whether a
heart attack has occurred. What are some possible lurking variables?
Experiment: The researcher _______________________ imposes some treatment on
individuals to measure their responses.
GOAL: to determine whether the treatment _______________ a change in the response
 Well-designed experiments can provide evidence for a ___________________
relationship
The Language of Experiments
Treatment: A specific ______________ applied to the individuals in an experiment

If an experiment has several explanatory variables, a treatment is a combination of
specific values of these variables.
Experimental Units: the collection of individuals to which treatments are applied

When the units are human beings, they often are called ________________.
Sometimes, the explanatory variables in an experiment are called _______________. Many
experiments study the joint effects of several factors. In such an experiment, each treatment is
formed by combining a specific value (often called a __________) of each of the factors.
EXAMPLE: There are now many special courses that claim to prepare you for the SAT.
Suppose that you want to evaluate a particular course, using SAT scores to measure the effect of
the course. You might find a reasonably large high school where students are offered the chance
to take the course, and then compare the SAT scores of those who completed the course with the
scores of those who chose not to take it.
 Suppose that you find that the average SAT score for students who took the course is
30 points higher than for students who didn’t. Identify each of the elements in this
study: population, response variable, treatments.
 Is this study a true experiment?
 Do you conclude that the course causes an increase in SAT scores? Why or why not?
Each pair of variables shown is strongly associated. Does A cause B, does B cause A, or is
there a lurking variable?
1. A: having hip surgery
   B: the strength of a person’s bones
2. A: the amount of milk a person drinks
   B: dying within the next 10 years
3. A: the amount of money a person earns
   B: the number of years a person went to school
4. A: the number of classes taken with Mrs. Sapp
   B: the level of endorphins in the bloodstream
Model for experiments:
EXAMPLES:
• Suppose Starbucks wishes to find out whether the population of MHS students prefers hot
or cold, frozen coffee drinks. A random sample of students is selected, and each one is
asked to try first hot coffee and then frozen coffee, or vice versa (with the order
determined at random). They then indicate which type they prefer. Experiment or
Observational Study?
• A researcher is interested in the effects of excessive homework on family dinner nights.
She surveys students on their homework load and the number of family nights they have
been required to forgo. She concludes that homework load does not directly affect
family nights. Experiment or Observational Study?
• Suppose an experiment is designed to investigate the effects of repeated exposure to TV
ads. All subjects viewed a 40-minute TV program that included ads for an iPad. Some
subjects saw a 30-second commercial; others, a 90-second version. The same
commercial was shown either 1, 3, or 5 times during the program. After viewing, all the
subjects answered questions about their recall of the ad, their attitude toward the iPad, and
their intention to purchase it.
Identify the explanatory and response variables:
List all the treatments:
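A treatment is a combination of one level of each factor, so the treatments for the TV-ad experiment can be generated mechanically. This short sketch crosses the two ad lengths with the three numbers of showings. The variable names are illustrative.

```python
from itertools import product

# Factors and their levels from the TV-ad example.
ad_lengths = ["30-second", "90-second"]
times_shown = [1, 3, 5]

# Each treatment is one combination of a level of each factor.
treatments = [f"{length} ad shown {k} time(s)"
              for length, k in product(ad_lengths, times_shown)]
for number, t in enumerate(treatments, start=1):
    print(number, t)
print(len(treatments), "treatments")  # 2 levels x 3 levels
```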
How to Experiment Well: The Randomized Comparative Experiment
A farm-product manufacturer wants to determine whether crop yield differs when the soil
is treated with three different types of fertilizer. Fifteen similar plots of land are planted with
the same type of seed but are fertilized differently. At the end of the growing season, the mean
yields of the three fertilizer groups are compared.
Experimental units:
Explanatory variable (factor):
Levels:
Response variable:
Experimental design:
The Importance of Randomizing:
 What is the main threat to making reliable inferences about cause?
We can think of confounding as the two groups we wish to compare differing in some way other
than the treatment, so that a difference in the response could be due to that other difference.
 How can we guard against confounding?
Remember: If you don’t randomize,
it’s risky to generalize!
The remedy for confounding is to perform a ____________________________ in which some
units receive one treatment and similar units receive another. Most well-designed experiments
compare two or more treatments.
• Comparison alone isn’t enough: if the treatments are given to groups that differ greatly,
______ will result. The solution to the problem of bias is random assignment.
In a _________________________ design, the treatments are assigned to all the experimental
units completely by chance.
Some experiments may include a ___________________ group that receives an inactive
treatment or an existing baseline treatment.
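Here is a sketch of a completely randomized design for the fertilizer example: shuffle the 15 plots and deal them into three groups. The example does not say how the plots are split, so the equal group size of 5 plots per fertilizer is an assumption. The labels "Fertilizer A/B/C" are also hypothetical.

```python
import random

def completely_randomized(units, treatments, seed=None):
    """Shuffle the units, then deal them into equal-sized treatment groups,
    so each unit's treatment is determined purely by chance."""
    if len(units) % len(treatments):
        raise ValueError("units must divide evenly among treatments")
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)
    return {t: shuffled[i * size:(i + 1) * size] for i, t in enumerate(treatments)}

plots = list(range(1, 16))  # plots labeled 1-15
groups = completely_randomized(plots, ["Fertilizer A", "Fertilizer B", "Fertilizer C"], seed=42)
for fertilizer, assigned in groups.items():
    print(fertilizer, sorted(assigned))
```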
Diagram:
Three Principles of Experimental Design:
1. _______________ for lurking variables that might affect the response: Use a comparative
design and ensure that the only systematic difference between the groups is the treatment
administered.
2. _______________________________: Use impersonal chance to assign experimental
units to treatments. This helps create roughly equivalent groups of experimental units by
balancing the effects of uncontrolled lurking variables across the treatment groups.
3. __________________: Use enough experimental units in each group so that any
differences in the effects of the treatments can be distinguished from chance differences
between the groups.
Example: The Physicians’ Health Study
 This study looked at the effects of two drugs: aspirin and beta carotene. Researchers
wondered whether beta carotene would help prevent some types of cancer. The subjects
were 21,996 male physicians. There were two explanatory variables (factors), each
having two levels: aspirin (yes or no) and beta carotene (yes or no). Combinations of
these factors form four treatments: aspirin and beta carotene, aspirin and placebo,
placebo and beta carotene, and placebo and placebo. On odd-numbered days, the subjects took
either a tablet that contained aspirin or a placebo pill. On even-numbered days, they took
either a beta carotene pill or a placebo*. There were several kinds of response variables:
heart attacks, certain types of cancer, and other medical outcomes. After several years,
239 of the placebo group and 139 of the aspirin group suffered heart attacks. The beta
carotene, however, didn’t seem to have any significant effects.
 Explain how each of the three principles of experimental design was used in the study.
*Note: A placebo is a “dummy pill”
or inactive treatment that is
indistinguishable from the real
treatment.
Inference for Experiments and Blocking
A response to a dummy treatment is called a ______________ effect. The strength of the
placebo effect is a strong argument for randomized comparative experiments.
Whenever possible, experiments with human subjects should be ___________________.
In a double-blind experiment, neither the subjects nor those who interact with them and
measure the response variable know which treatment a subject received.
Statistically Significant:
In an experiment, researchers usually hope to see a difference in the responses so large that it is
unlikely to happen just because of chance variation.

An observed effect so large that it would rarely occur by chance is called statistically
significant. (A statistically significant association in data from a well-designed
experiment does imply causation.)
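One way to judge whether a difference is "so large that it would rarely occur by chance" is to re-randomize the units many times. You then count how often chance alone produces a difference as big as the one observed. The sketch below does this with a randomization test, a technique this packet does not itself describe. The yields and the helper name are hypothetical.

```python
import random

def randomization_p_value(group_a, group_b, reps=10_000, seed=0):
    """Estimate how often re-randomizing the units to groups produces a
    difference in means at least as large as the observed one (two-sided)."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    count = 0
    for _ in range(reps):
        rng.shuffle(pooled)  # a fresh random assignment of the same responses
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            count += 1
    return count / reps

# Hypothetical responses (e.g., crop yields) for two treatment groups.
treatment = [31, 34, 29, 36, 33, 35, 32, 34]
control = [27, 29, 26, 30, 28, 31, 27, 29]
print(randomization_p_value(treatment, control))
```

A very small proportion means that random assignment alone rarely produces a difference this large, which is what "statistically significant" describes.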
Blocking:
 A block is a group of experimental units that are known before the experiment to be
similar in some way that is expected to affect the response to the treatments.
 In a _______________________ design, the random assignment of experimental units to
treatments is carried out separately within each block.
Form blocks based on the most important unavoidable sources of _____________ (lurking
variables) among the experimental units.
Randomization will average out the effects of the remaining lurking variables and allow an
__________________________ of the treatments.
***Control what you can, block on what you can’t control, and randomize to create
comparable groups.***
In a block design, before the experimental units are randomly assigned to a treatment group:
• experimental subjects are divided into ____________________ blocks
 The blocks are based on the most important unavoidable sources of
____________________ (lurking variables)
• The variability within blocks is ________ than the variability between blocks.
• Reduces _____________________ and potential ________________
• Produces a better ________________ of treatment effects.
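Random assignment within blocks can be sketched the same way: sort the units into blocks, then run a separate randomization inside each block. The blocks here are hypothetical age groups of 30 subjects each, and the three creams are also hypothetical. Dealing the shuffled members round-robin keeps each treatment equally represented within every block.

```python
import random

def randomized_block(units, block_of, treatments, seed=None):
    """Group units into blocks, then randomly assign treatments separately
    within each block (a completely randomized design inside every block)."""
    rng = random.Random(seed)
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        for i, unit in enumerate(members):
            # Round-robin over the shuffled members balances treatments within the block.
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

# Hypothetical subjects: (id, age group), 30 per age group.
subjects = list(enumerate(["under 30"] * 30 + ["30-60"] * 30 + ["over 60"] * 30))
plan = randomized_block(subjects, lambda s: s[1], ["Cream A", "Cream B", "Cream C"], seed=5)
print(sum(1 for s in subjects if s[1] == "under 30" and plan[s] == "Cream A"))
```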
EXAMPLE: Suppose a researcher is carrying out a study of the effectiveness of three different
skin creams for the treatment of a certain skin disease. He has ninety subjects and plans to
divide them into 3 treatment groups of thirty subjects each. If the experimenter has reason to
believe that age might be a significant factor in the effect of a given cream, he might choose
to first divide the experimental subjects into age groups, such as under 30 years old, 30-60 years
old, and over 60 years old. Then, within each age level, individuals would be assigned to
treatment groups using a completely randomized design.
Another way to carry out a randomized block design would be to assess the subjects and put them
in blocks of three according to how severe their skin condition is. The three most severe cases
form the first block, the next three most severe cases form the second block, and so on, down to
the three mildest cases in the thirtieth block. The members of each block are then randomly
assigned, one to each of the three treatment groups.
Example 2: Suppose you have 1000 individuals (500 males, 500 females) participating in a study
for a new vaccine. Since it is known that men and women are physiologically different and react
differently to medication, we might consider blocking by gender. Then, within each block,
subjects are randomly assigned to treatments.

                 TREATMENT
Gender      Placebo     Vaccine
Male          250         250
Female        250         250

• This design ensures that each treatment condition has an equal proportion of men and women.
As a result, differences between treatment conditions cannot be attributed to
__________________.
• This randomized block design removes gender as a potential source of ___________________
and as a potential confounding variable.
A ______________________________ is a randomized block experiment in which each block
consists of a matching pair of similar experimental units. Chance is used to determine which
unit in each pair gets each treatment.
Sometimes, a “pair” in a matched pairs design consists of a single unit that receives both
treatments. Since the order of the treatments can influence the response, chance is used to
determine which treatment is applied first for each unit.
To do a matched pairs design using the previous example, the 1000 subjects are grouped into 500
matched pairs.
• Each pair is ____________________ on gender and age.
• For example, Pair 1 might be two women, both age 21. Pair 2 might be two women, both
age 22, and so on.
For the vaccine example, the matched pairs design is an ______________________ over the
completely randomized design and the randomized block design.
• Like the other designs, the matched pairs design uses randomization to control for
______________________.
• However, unlike the others, this design explicitly controls for two potential lurking
variables: age and gender.
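In a matched pairs design, the only chance step is a coin flip inside each pair. This sketch assumes the pairs have already been formed; the subject IDs are hypothetical. It randomly decides which member of each pair gets the vaccine and which gets the placebo.

```python
import random

def matched_pairs_assignment(pairs, treatments=("Placebo", "Vaccine"), seed=None):
    """For each matched pair, use a virtual coin flip to decide which member
    gets which treatment."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in pairs:
        if rng.random() < 0.5:
            assignment[first], assignment[second] = treatments
        else:
            assignment[second], assignment[first] = treatments
    return assignment

# Hypothetical pairs matched on gender and age.
pairs = [("W21-a", "W21-b"), ("W22-a", "W22-b"), ("M21-a", "M21-b")]
plan = matched_pairs_assignment(pairs, seed=9)
for subject, treatment in sorted(plan.items()):
    print(subject, treatment)
```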