ACMS 20340 Statistics for Life Sciences

Chapter 7:
Samples and Observational Studies
Obtaining Data
How do we obtain the data we need for statistical analysis?
- Suppose we’re interested in answering the question, “What
  percent of Americans drive to work daily?”
- We couldn’t possibly ask every American.
- However, we can get information from a sample chosen to
  represent the whole population.
- But how do we choose such a sample?
The Language of Sampling
Some important terminology:
- The population in a statistical study is the entire group of
  individuals about which we want information.
- A sample is a part of the population from which we actually
  collect information.
- A sampling design describes exactly how we choose a sample
  from the population.
The Challenges of Sampling
Choosing a sample from a large or varied population can be
challenging.
We should ask ourselves:
- “What population do we want to describe?”
- “What variables do we want to measure?”
Statistical studies cannot always sample from the entire population
of interest due to practical or ethical reasons.
Sampling Designs Done Wrong
Convenience Sample: The experimenter selects which individuals
to measure (usually those close at hand, or those who are easy
to contact).
Voluntary Response Sample: Individuals decide for themselves
whether or not to participate.
The design of a statistical study is biased if it systematically favors
certain outcomes.
In both of these designs the sample is determined by choice, either
the experimenter’s choice or the individuals’ choice.
It is possible that parts of the population would never be chosen
with these designs!
Sampling Designs Done Right
Probability sampling removes the bias of personal choice by
selecting individuals by chance.
A simple random sample (SRS) of size n consists of n individuals
from the population chosen in such a way that every set of n
individuals has an equal chance to be the sample selected.
(This is stronger than saying every individual has the same chance
of being selected).
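To see why the parenthetical matters, here is a small sketch on a
toy population of four hypothetical individuals (A, B, C, D). It
compares an SRS of size 2 with a made-up design that flips a fair
coin between the fixed pairs {A, B} and {C, D}. The names and the
second design are assumptions chosen only for illustration.

```python
from fractions import Fraction
from itertools import combinations

population = ["A", "B", "C", "D"]  # hypothetical individuals
n = 2

# SRS: every set of n individuals is equally likely to be the sample.
srs_sets = list(combinations(population, n))
srs_prob = {s: Fraction(1, len(srs_sets)) for s in srs_sets}

# Made-up alternative design: a fair coin flip between two fixed pairs.
pair_prob = {("A", "B"): Fraction(1, 2), ("C", "D"): Fraction(1, 2)}


def inclusion_probability(design, person):
    """Chance that `person` ends up in the sample under `design`."""
    return sum(p for s, p in design.items() if person in s)


for person in population:
    print(person,
          inclusion_probability(srs_prob, person),
          inclusion_probability(pair_prob, person))
```

Under both designs every individual has chance 1/2 of being
selected, but the coin-flip design can never produce the sample
{A, C}, so it is not an SRS.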
How do we pick such a sample?
Table of Random Digits
A table of random digits is a long string of the digits 0, 1, 2, 3,
4, 5, 6, 7, 8, 9 with these two properties:
- Each entry in the table is equally likely to be any of the 10
  digits 0 through 9.
- The entries are independent of each other: knowledge of one
  part of the table gives no information about any other part.
How to Use a Table of Random Digits
1. Assign to each member of the population a numerical label of
the same length.
2. Select a place in the table and begin reading blocks of digits
of the length we chose in step 1.
3. We include in our sample those individuals whose labels we
find in the table.
4. If a block repeats a label already selected, or matches no
label, we skip it and keep reading until the sample is complete.
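The steps above can be sketched in a few lines of Python. This is
only an illustration under stated assumptions: individuals are
labeled 01 through 30, the digit string is an arbitrary example
rather than a real table, and the function name
srs_from_digit_table is hypothetical.

```python
def srs_from_digit_table(digits, population_size, n):
    """Pick n distinct labels from 1..population_size by reading a digit string."""
    # Step 1: every label has the same length (e.g. 01, 02, ..., 30).
    width = len(str(population_size))
    valid = {str(i).zfill(width): i for i in range(1, population_size + 1)}

    chosen = []
    # Step 2: read consecutive, non-overlapping blocks of `width` digits.
    for start in range(0, len(digits) - width + 1, width):
        block = digits[start:start + width]
        label = valid.get(block)
        # Steps 3-4: keep a label only if it exists and is not a repeat.
        if label is not None and label not in chosen:
            chosen.append(label)
        if len(chosen) == n:
            break
    return chosen


# Arbitrary digits used for illustration (assumption, not a real table).
digits = "1922395034057562871396409125314254482853"
print(srs_from_digit_table(digits, population_size=30, n=5))
```

Reading the digits in pairs gives 19, 22, 39, 50, 34, 05, ..., so
the sketch prints [19, 22, 5, 13, 25]. If the digits run out
before n labels are found, it returns a shorter list.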
Hooray For Technology!
Commonly, we use a random number generator.
- SRS applet available on the book’s website.
- www.random.org
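As one more option, a general-purpose language can draw an SRS
directly. The sketch below uses Python's standard random module;
the student labels, the sample size, and the seed are made-up
values, and simple_random_sample is a hypothetical helper name.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals; every set of n is equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw repeatable
    return rng.sample(population, n)


# Hypothetical population of 200 labeled students.
students = [f"student_{i:03d}" for i in range(1, 201)]
sample = simple_random_sample(students, n=10, seed=2024)
print(sample)
```

Sampling is without replacement, so no student can appear twice.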
Advanced Sampling Designs—Stratified Random Sample
While SRSs are ideal, they have some shortcomings.
- Suppose we are interested in including both majority and
  minority groups in our study.
- However, our SRS may include only a few members of one
  such minority group, or even no members at all.
- For instance, we could sample students about their sporting
  interests, but we want to include the opinions of scuba divers
  in our study.
Stratified random sample: Sample important groups within the
population separately and then combine the results.
- To include the opinions of scuba divers in our study, we can
  use a stratified random sample by choosing a sample from the
  scuba divers and another from the non-scuba divers.
- The results can then be combined.
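A minimal sketch of the scuba diver example, assuming a made-up
population of 20 divers and 980 non-divers and made-up sample
sizes. The function name stratified_sample is hypothetical.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw a separate SRS from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}


# Hypothetical strata (group sizes are assumptions).
strata = {
    "scuba divers": [f"diver_{i}" for i in range(1, 21)],
    "non-divers": [f"student_{i}" for i in range(1, 981)],
}
sizes = {"scuba divers": 5, "non-divers": 45}

result = stratified_sample(strata, sizes, seed=1)
for name, chosen in result.items():
    print(name, len(chosen))
```

Because divers are deliberately overrepresented here, combining
the results usually means weighting each stratum by its share of
the population.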
Advanced Sampling Designs—Multistage Random Sample
Another shortcoming: If the population is very large or is spread
over a large area, choosing an SRS can be logistically challenging.
Multistage random sample: Choose SRSs within SRSs.
- For example, to sample the sporting interests of students at
  ND, we could first choose a random selection of dorms, and
  then for each chosen dorm choose an SRS of students who live
  in that dorm.
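The two stages of the dorm example might look like this in code.
The dorm names, the dorm sizes, and the numbers sampled at each
stage are all assumptions, and multistage_sample is a hypothetical
helper.

```python
import random


def multistage_sample(dorms, n_dorms, n_per_dorm, seed=None):
    """Stage 1: SRS of dorms. Stage 2: SRS of students in each chosen dorm."""
    rng = random.Random(seed)
    chosen_dorms = rng.sample(sorted(dorms), n_dorms)
    return {dorm: rng.sample(dorms[dorm], n_per_dorm) for dorm in chosen_dorms}


# Hypothetical campus: 8 dorms with 100 students each.
dorms = {
    f"dorm_{c}": [f"dorm_{c}_student_{i}" for i in range(1, 101)]
    for c in "ABCDEFGH"
}

sample = multistage_sample(dorms, n_dorms=3, n_per_dorm=10, seed=7)
for dorm, students in sample.items():
    print(dorm, students[:3], "...")
```

Only the chosen dorms need a complete list of residents, which is
what makes the design easier to carry out than one large SRS.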
Observation vs. Experiment
There are two distinct settings for collecting data, both of which
involve sampling:
- An observational study observes individuals and measures
  variables of interest but does not attempt to influence the
  responses.
    - The purpose is to describe a group or situation.
- An experiment deliberately imposes some treatment on
  individuals in order to observe their responses.
    - The purpose is to study any possible causation due to the
      treatment.
We consider the specifics of observational studies in this chapter,
and the specifics of experimental design in the next chapter.
Some Types of Observational Studies
We will focus on three in particular:
- Sample Surveys
- Case-Control Studies
- Cohort Studies
Sample Surveys
A sample survey is an observational study that tries to answer
questions about a population.
In conducting a sample survey, one asks the members of a sample
one or more questions (spoken or written).
- Opinion surveys
- Election polling
Potential Pitfalls of Sample Surveys
Even when we have an SRS, the sample still may be biased.
Some sources of bias in sample surveys:
1. undercoverage
2. nonresponse
3. response bias
4. wording of questions
To avoid these problems, try to think of all the possible ways one
might not get an accurate, random sample even after using
randomization to select the sample.
Undercoverage
If one or more groups of the population are left out of the process
of generating the sample, then the survey suffers from
undercoverage of the population.
- For example, when doing a telephone survey, people who do
  not have phones will be left out.
- If the stated population for the survey includes people without
  phones, this would be an example of undercoverage.
If a certain group is not included in our sample, this may not be
due to undercoverage.
- If a group has the potential to be in the sample and isn’t, that
  is fine.
- Undercoverage occurs only if there is no way for the sampling
  process to select individuals from some subgroup.
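A short worked calculation shows how undercoverage can bias a
result. All of the numbers below are invented for illustration:
suppose 80% of the population has a phone, 70% of phone owners
drive to work, and 30% of people without phones drive to work.

```python
from fractions import Fraction

# Hypothetical population shares and driving rates (made up).
share_phone = Fraction(80, 100)
share_no_phone = 1 - share_phone
drive_rate_phone = Fraction(70, 100)
drive_rate_no_phone = Fraction(30, 100)

# True proportion of drivers in the whole population.
true_rate = (share_phone * drive_rate_phone
             + share_no_phone * drive_rate_no_phone)

# A telephone survey can only ever reach phone owners.
phone_survey_rate = drive_rate_phone

print(float(true_rate), float(phone_survey_rate))
```

With these made-up numbers the true rate is 0.62, while even a
perfectly random telephone sample would be estimating 0.70.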
Nonresponse
Nonresponse occurs when an individual cannot be contacted or
refuses to take part after being selected to be part of the sample.
If enough people of a certain type refuse to participate, the
omission of this group of people can bias the survey.
Response Rates
Response Bias
Some survey questions may be on sensitive topics, and as a result,
the responder may exaggerate or understate his or her answers.
- “How much do you weigh?”
- “Have you ever committed a felony?”
It may not be just the content of a question that results in
response bias, but also certain traits of the interviewer.
- The interviewer’s gender.
- The interviewer’s ethnicity.
Wording of Questions
The wording of the questions can have a large effect on the
answers given.
- Confusing or leading questions can change a survey’s outcome.
- Questions worded like “Do you agree that it is awful that...”
  are prompting you to give a particular response.
- Questions may also be too complicated and confusing.
Many questions are standardized to allow comparison with earlier
studies.
Wording Differences
Case-Control Study
The second type of observational study that we consider is the
case-control study.
In a case-control observational study, we consider samples of
individuals from two different groups:
(i) case-subjects are selected based on a defined outcome, and
(ii) a control group of subjects is selected separately to serve
as a baseline with which the case group is compared.
Once these two samples are selected, we look for exposure factors
in the subjects’ past (the retrospective approach).
Some Pros and Cons of Case-Control Studies
Case-control studies are useful for studying rare conditions.
However, selecting controls can be challenging.
Not all case-control studies are so careful about the choice of
control group: Historical-control designs are case-control studies
that utilize existing data from previous studies to make up the
control group.
Historical-control designs may introduce confounding variables (to
be defined shortly).
When approval by an ethics committee is hard to obtain,
historical-control designs may be the only available tool.
Cohort Study
The last type of observational study we will consider is the cohort
study.
Cohort studies enlist individuals who share a common demographic
and keep track of them over a long period of time (the prospective
approach).
Individuals who later develop a condition are compared with those
who don’t.
In general, cohort studies examine the compounded effect of
certain factors over time.
- They are good for studying common conditions.
- However, they can be very expensive.
Example of a Cohort Study
Confounding variables
Two variables (explanatory or lurking) are confounded when their
effects on a response variable cannot be distinguished from each
other.
Observational studies of the effect of a variable often fail because
of confounding.
For example, moderate use of alcohol is associated with better
health.
Observational studies suggest wine has a better effect on health
than other alcoholic beverages.
Is there a confounding variable?
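One way to picture confounding is with a small table of invented
counts. In the sketch below, income is a hypothetical lurking
variable chosen only for illustration, and none of the numbers come
from a real study. Within each income group wine and beer drinkers
are equally healthy, yet wine looks better overall because wine
drinkers are more often in the high income group.

```python
# Hypothetical counts, invented for illustration:
# (income group, drink) -> (number of people, number in good health)
data = {
    ("high income", "wine"): (80, 64),
    ("high income", "beer"): (20, 16),
    ("low income", "wine"): (20, 10),
    ("low income", "beer"): (80, 40),
}


def healthy_rate(drink, income=None):
    """Share in good health among drinkers, optionally within one income group."""
    rows = [v for (inc, d), v in data.items()
            if d == drink and income in (None, inc)]
    people = sum(n for n, _ in rows)
    healthy = sum(h for _, h in rows)
    return healthy / people


for drink in ("wine", "beer"):
    print(drink,
          healthy_rate(drink),
          healthy_rate(drink, "high income"),
          healthy_rate(drink, "low income"))
```

Overall the rates are 0.74 for wine and 0.56 for beer, but within
each income group they are identical (0.8 and 0.5), so the effects
of drink and income cannot be separated from the overall
comparison alone.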
Confound It!