Chapter 5.1 Data Production

AP Statistics

Observational study: We observe individuals
and measure variables of interest but do not
attempt to influence responses.

Experiment: We deliberately impose some
treatment on individuals in order to observe
their responses.

Pros vs. cons of each? (An experiment gives control over conditions etc… so it is better for showing cause and effect.)

Population: the entire group of individuals that we
want information about

Sample: a part of the population that we
actually examine in order to gather information

Sampling vs. census: Sampling studies a part
in order to gain information about the whole; a census
attempts to contact every individual in the population



Voluntary response: People choose
themselves by responding
Convenience sampling: Choosing individuals
who are easiest to reach
Bias: The sampling method is biased if it
systematically favors certain outcomes


The simplest way to use chance to select a
sample is to place names in a hat (the
population) and draw out a handful (the
sample).
SRS (simple random sample): every individual has an equal
chance of being picked, and every possible sample of the
size you are drawing has an equal chance of being picked


Table B: long string of digits 0-9, each entry in
table is equally likely to be any of the 10 digits
Choosing SRS with table:
 1. Label: Assign a # label to every individual in the
pop (example: 01-50 for each senior girl @ CSH)
 2. Table: use table B to select random labels
 3. Stop: indicate when you should stop sampling (toss
out repeated numbers, or numbers out of your range)
 4. Identify sample: use the random #’s to identify
subjects to be selected from your pop. This is your
sample!
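
The four steps above can be sketched in Python. This is only an illustration: the digit string is made up to stand in for a line of a random digit table, and the function name srs_from_digit_table is hypothetical.

```python
def srs_from_digit_table(digits, n_labels, sample_size):
    """Read fixed-width labels from a string of digits.

    Skips labels outside 1..n_labels and repeated labels,
    and stops once sample_size labels have been chosen.
    """
    width = len(str(n_labels))  # labels 01-50 need 2 digits each
    chosen = []
    for i in range(0, len(digits) - width + 1, width):
        label = int(digits[i:i + width])
        if 1 <= label <= n_labels and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break  # Step 3: stop when the sample is full
    return chosen

# Made-up digits standing in for one line of a random digit table
digits = "19223950340575628713"
sample = srs_from_digit_table(digits, n_labels=50, sample_size=5)
print(sample)  # [19, 22, 39, 50, 34]
```

The labels in the result are then matched back to the individuals they were assigned to in Step 1.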

On the calculator: MATH, PRB, randInt(lowest #, highest #, # of
people you want in your sample). randInt can repeat a
number, so toss out repeats and draw again.

If you use CtlgHelp (Catalog Help): instead of pressing ENTER
when randInt( is highlighted in the PRB menu,
press “+” and it will tell you what goes in the
parentheses.
You can store your random numbers in a list:
randInt(1,150,25) STO→ L1
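
For comparison, here is a rough Python counterpart of the calculator command, using only the standard random module. The seed is arbitrary and is there only to make the run repeatable. The first list behaves like randInt (repeats are possible); the second draws without replacement, which is what an SRS of distinct individuals needs.

```python
import random

rng = random.Random(2024)  # arbitrary seed so the run is repeatable

# Like randInt(1,150,25): 25 draws from 1..150, repeats possible
with_repeats = [rng.randint(1, 150) for _ in range(25)]

# Drawing without replacement: 25 distinct labels from 1..150
srs_labels = rng.sample(range(1, 151), 25)

print(with_repeats)
print(sorted(srs_labels))
```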






Probability sample: samples chosen by chance
Stratified random sample: divide population into
groups (aka strata) that are similar in some way,
then choose a separate SRS in each stratum,
then combine these SRS’s to form the full
sample
Cluster sampling: divide population into groups
(aka clusters). Some of these clusters are
randomly selected. Then all individuals in
chosen clusters are selected to be in the sample
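Multistage samples: the sample is chosen in stages (for
example, randomly select clusters, then take an SRS
within each chosen cluster)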
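
A minimal sketch contrasting stratified and cluster sampling. The school data, group names, and function names below are all made up for illustration.

```python
import random

def stratified_sample(strata, n_per_stratum, rng):
    """Take a separate SRS of n_per_stratum from every stratum, then combine."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep everyone in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(clusters[name])
    return sample

rng = random.Random(7)  # arbitrary seed

# Hypothetical strata: students grouped by grade level
grades = {
    "freshmen": [f"F{i}" for i in range(1, 21)],
    "sophomores": [f"S{i}" for i in range(1, 21)],
}
# Hypothetical clusters: students grouped by homeroom
homerooms = {
    "room101": ["Ann", "Bo", "Cy"],
    "room102": ["Di", "Ed", "Flo"],
    "room103": ["Gus", "Hal", "Ivy"],
}

strat = stratified_sample(grades, 5, rng)   # 5 from each grade
clust = cluster_sample(homerooms, 2, rng)   # everyone in 2 homerooms
print(strat)
print(clust)
```

In the stratified sample every group is represented; in the cluster sample some groups are left out entirely, but the groups that are chosen are included in full.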

Undercoverage: occurs when some groups in
the population are left out in the process of
choosing the sample. It is hard to get an accurate
and complete list of the population, so most
samples suffer from some degree of this.

Nonresponse: occurs when an individual
chosen for the sample can’t be contacted or
does not cooperate.

The behavior of the respondent or
interviewer can cause response bias in
sample results

Wording of questions can influence answers

Larger random samples give more accurate
results than smaller random samples, so we can
improve our results by increasing the sample size
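
A small simulation can illustrate this. It assumes a made-up population in which 60% answer "yes", takes many random samples of size 25 and of size 400, and compares how much the sample proportions vary. The function name and all numbers are illustrative.

```python
import random
import statistics

def spread_of_sample_proportions(p, n, repetitions, rng):
    """Standard deviation of the sample proportion over many random samples of size n."""
    props = []
    for _ in range(repetitions):
        hits = sum(rng.random() < p for _ in range(n))
        props.append(hits / n)
    return statistics.pstdev(props)

rng = random.Random(0)  # arbitrary seed
small = spread_of_sample_proportions(0.6, 25, 1000, rng)
large = spread_of_sample_proportions(0.6, 400, 1000, rng)
print(small, large)  # the spread for n = 400 should be much smaller
```

Note that this only addresses chance variation. A larger sample does not fix bias from a bad sampling method.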




The individuals on which the experiment is
done are the experimental units.
If units are humans, they are called subjects.
The experimental condition applied to the
units (aka the thing we ‘do’ to the people
participating) is called a treatment.
Goal of research is to establish a causal link
between a particular treatment and a
response.




Factors: the explanatory variables we are interested in
(the variables that cause the change
in other variables)
Levels: the specific values ('categories') of each factor
Example: use 2 pain relievers at 3 different
doses.
This is an example of a 2x3 study: 2 factors (pain reliever
with 2 levels, dose with 3 levels), so 2 x 3 = 6 treatments
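
The treatments in a study like this are all combinations of the factor levels. A short sketch, with made-up level names ("A", "B", "low", and so on):

```python
from itertools import product

# Hypothetical factors and levels for the pain reliever example
factors = {
    "pain reliever": ["A", "B"],
    "dose": ["low", "medium", "high"],
}

# Each treatment is one combination of levels, one from each factor
treatments = list(product(*factors.values()))
for t in treatments:
    print(t)
print(len(treatments))  # 6
```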

When designing an experiment we want to
minimize the effect of lurking variables so
that our results are not biased.

It is essential to use a control group

The control group gets a fake treatment (a placebo) so
that the placebo effect and other lurking variables
act on all groups equally



Even with a control group, natural variability occurs
among experimental units.
We would like to see units within a treatment
group responding similarly to one another,
but differently from units in other treatment
groups (then we can be more confident that the
treatment is responsible for the differences).
If we assign many individuals to each
treatment group, the effects of chance (and
individual differences) will average out.

Comparison of the effects of several
treatments is valid only when all treatments
are applied to similar groups of experimental
units.


Step 1: Choose treatments
   Identify factors and levels
   Control group
Step 2: Assign the experimental units to the
treatments
   Matching (place similar units in each
   treatment group)
   Randomization (randomly assign
   units to each treatment group)


Remember: if we want to examine a cause-and-effect
relationship, we conduct an
experiment.
If an experiment is well designed, a strong
association in the data gives good evidence of causation,
since lurking variables are controlled by
comparison and randomization.



Principles of experimental design

1. Control the effects of lurking variables on
the response, most simply by comparing 2 or
more treatments
2. Randomize – use impersonal chance to
assign experimental units to treatments
3. Replicate each treatment on many units
to reduce chance variation in results
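
Randomizing and replicating can be sketched as follows: shuffle the units, then deal them out to the treatments so that each group gets many units. The function name, the unit labels 1-30, and the treatment names are made up.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle the units, then deal them out to the treatments in turn."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

rng = random.Random(3)  # arbitrary seed
groups = completely_randomized(range(1, 31), ["A", "B", "control"], rng)
for name, members in groups.items():
    print(name, sorted(members))
```

Each of the 30 units lands in exactly one group, and each group has 10 units.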

We hope to see big differences (differences
so large they are not likely just due to chance
or individual differences).

If we observe an effect so large that
it would rarely occur by chance alone, we call the
result statistically significant
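
One way to see what "would rarely occur by chance" means is to reshuffle the group labels many times and count how often chance alone produces a difference at least as large as the one observed. The sketch below does this; the response values and the function name are made up, and this is an illustration of the idea, not a procedure taken from these notes.

```python
import random
import statistics

def chance_of_difference(group_a, group_b, repetitions, rng):
    """Fraction of random regroupings whose difference in means
    is at least as large as the observed difference."""
    observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(repetitions):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_a]) - statistics.mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    return count / repetitions

rng = random.Random(11)  # arbitrary seed
treated = [10, 11, 12, 13, 14, 15]  # made-up responses
control = [1, 2, 3, 4, 5, 6]        # made-up responses
fraction = chance_of_difference(treated, control, 2000, rng)
print(fraction)  # a small fraction suggests the difference is not just chance
```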



In a completely randomized design, all
subjects are randomly assigned to treatment
groups.
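In a block design, subjects are first split into
groups called blocks, and treatments are then
randomly assigned within each block.
In a matched pairs design, there are only two
treatments, and they are compared within matched pairs.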

A block is a group of experimental units that
are known before the experiment to be
similar in some way that is expected to
systematically affect the response to
treatments

Separate into “blocks” of similar subjects to
reduce the effect of variation
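
A sketch of randomizing separately inside each block. The subjects, the blocking variable (here a made-up "M"/"F" label), and the function name are hypothetical.

```python
import random

def block_randomize(units, block_of, treatments, rng):
    """Group units into blocks, then randomly assign treatments within each block."""
    blocks = {}
    for unit in units:
        blocks.setdefault(block_of(unit), []).append(unit)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize within this block only
        for i, unit in enumerate(members):
            assignment[unit] = treatments[i % len(treatments)]
    return assignment

rng = random.Random(5)  # arbitrary seed
# Hypothetical subjects: (name, block label)
subjects = [("s1", "M"), ("s2", "M"), ("s3", "M"), ("s4", "M"),
            ("s5", "F"), ("s6", "F"), ("s7", "F"), ("s8", "F")]
assignment = block_randomize(subjects, lambda s: s[1], ["drug", "placebo"], rng)
for subject, treatment in assignment.items():
    print(subject, treatment)
```

Because the shuffle happens inside each block, every block ends up with both treatments, so the blocking variable cannot be mixed up with the treatment effect.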




Matching the subjects in various ways can
produce more precise results than simple
randomization
Matched pairs design compares 2
treatments. Subjects are matched in pairs: either
a single subject receives both treatments (in random order), or
each of a pair of similar subjects receives a different
treatment (assigned at random within the pair)
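
A sketch of the randomization in a matched pairs design with two subjects per pair: within each pair, chance decides who gets which treatment. The pair names, treatment names, and function name are made up.

```python
import random

def assign_matched_pairs(pairs, treatments, rng):
    """For each pair, randomly decide which member gets which of the 2 treatments."""
    assignment = {}
    for first, second in pairs:
        order = list(treatments)
        rng.shuffle(order)  # like a coin flip within the pair
        assignment[first] = order[0]
        assignment[second] = order[1]
    return assignment

rng = random.Random(9)  # arbitrary seed
pairs = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]  # hypothetical matched subjects
assignment = assign_matched_pairs(pairs, ["new", "old"], rng)
for subject, treatment in assignment.items():
    print(subject, treatment)
```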

Even well-designed experiments can contain
hidden bias

Double-blind: neither the subjects nor the
experimenters who work with them know which
treatment is assigned
There may be other hidden lurking variables that
are not considered in the experiment


Five steps of simulation

1. State the problem or describe the
experiment
2. State the assumptions
3. Assign digits to represent outcomes
4. Simulate many repetitions
5. State your conclusion
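
The five steps can be sketched on a made-up problem: a basketball player makes 70% of her free throws, and we want the chance that she makes at least 4 of 5 shots. The problem, the numbers, and the function name are all hypothetical.

```python
import random

def simulate_free_throws(repetitions, rng):
    # Step 1: estimate the chance of making at least 4 of 5 shots.
    # Step 2: assume each shot is made 70% of the time, independently.
    # Step 3: digits 0-6 stand for a made shot, digits 7-9 for a miss.
    successes = 0
    for _ in range(repetitions):  # Step 4: many repetitions
        digits = [rng.randrange(10) for _ in range(5)]
        made = sum(d <= 6 for d in digits)
        if made >= 4:
            successes += 1
    # Step 5: the fraction of repetitions with at least 4 made shots
    return successes / repetitions

rng = random.Random(42)  # arbitrary seed
estimate = simulate_free_throws(10000, rng)
print(estimate)
```

Under these assumptions the exact probability is about 0.53, so the estimate should land close to that value.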



