Section 1.1

TOPIC 2: TYPES OF STATISTICAL STUDIES
OBSERVATIONAL STUDIES
Observational Study
When data is collected only by monitoring what occurs, we call this an
observational study.
For example, to study the relationship between cell phone use and
brain cancer, researchers might monitor the health of those who
regularly use cell phones and those who do not.
http://www.iarc.fr/en/media-centre/pr/2010/pdfs/pr200_E.pdf
Association and Causation
An association between two variables exists if a value of one variable is
more likely to occur with certain values of another variable.
Causation between two variables exists if a value of one variable
tends to cause certain values of another variable.
An observational study can help establish an association between
variables, but it cannot by itself establish causation.
http://www.peertrainer.com/LoungeCommunityThread.aspx?ThreadID=3118
The breakfast-slimness study
• This is an observational study since the researchers merely observed
the behavior of the girls (as opposed to assigning which girls ate
breakfast and which didn’t).
• What is the conclusion of the study? There is an association between
girls eating breakfast and being slimmer.
• Who sponsored the study? National Institutes of Health and General
Mills
Three possible explanations
1. Eating breakfast causes girls to be thinner.
2. Being thin causes girls to eat breakfast.
3. A third (unconsidered) variable is responsible for both. This is
called a lurking variable (or confounding variable).
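The third explanation can be illustrated with a small simulation. This is only a sketch under invented assumptions: the hidden trait "active" and all of the probabilities below are hypothetical and are not taken from the breakfast study. In the simulation, breakfast has no effect on slimness, yet the two still appear associated because the hidden trait drives both.

```python
import random

def simulate_girls(n, seed=0):
    """Simulate n girls where a hidden trait drives both behaviors.

    'active' is a hypothetical lurking variable: it raises the chance of
    eating breakfast AND the chance of being slim. Breakfast itself has
    no effect on slimness in this simulation.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        active = rng.random() < 0.5
        breakfast = rng.random() < (0.8 if active else 0.3)
        slim = rng.random() < (0.7 if active else 0.3)
        rows.append((active, breakfast, slim))
    return rows

def slim_rate(rows, breakfast):
    """Proportion of slim girls among those with the given breakfast habit."""
    group = [slim for _, b, slim in rows if b == breakfast]
    return sum(group) / len(group)

rows = simulate_girls(10_000)
rate_breakfast = slim_rate(rows, True)
rate_no_breakfast = slim_rate(rows, False)
print(f"slim among breakfast eaters:   {rate_breakfast:.2f}")
print(f"slim among breakfast skippers: {rate_no_breakfast:.2f}")
```

An observer who only sees the breakfast and slim columns would find an association, even though no causal link between them was built into the simulation.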
Distinguishing variables
“Girls who regularly ate breakfast were slimmer than those who skipped
the morning meal.”
If we suspect eating breakfast affects weight, then we call eating
breakfast the explanatory variable and weight the response variable.
explanatory variable → might affect → response variable
Warning: Labeling variables as explanatory and response does not
guarantee the relationship between them is causal, even if we identify
an association. In fact the labeling has no effect on the statistical
analysis at all.
Distinguishing variables (continued)
Sometimes there is no clear choice between two variables, and we do
not indicate one as the explanatory and the other as the response.
Consider the following:
If homeownership is lower than the national average in one county, will
the percent of multi-unit structures in that county likely be above or
below the national average?
Obtaining good samples
Recall that the population of interest is often too large to collect data
from, so we choose a sample instead.
Ideal situation: You can enumerate every individual in the population
and randomly select (using a computer algorithm) the number of
individuals you want in your sample. This is called a simple random
sample.
There are other random sampling techniques that are often used. The
most common are stratified and cluster sampling.
Simple random sample
Randomly select subjects from the population, where there is no
implied connection between the subjects chosen.
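A simple random sample can be sketched in a few lines. The population of ID numbers, the sample size, and the seed below are hypothetical placeholders.

```python
import random

# Hypothetical population: ID numbers 1..1000 stand in for the
# enumerated individuals.
population = list(range(1, 1001))

# A fixed seed is used only so the sketch is reproducible.
rng = random.Random(42)

# Draw 50 distinct individuals; every subset of size 50 is equally likely.
sample = rng.sample(population, k=50)
print(sorted(sample)[:5])
```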
Stratified sample
Strata are made up of similar observations. We take a simple random
sample from each stratum.
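A minimal sketch of stratified sampling, assuming the population has already been divided into strata. The strata names (class years) and sizes are invented for illustration.

```python
import random

def stratified_sample(strata, per_stratum, seed=0):
    """Take a simple random sample of size per_stratum from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

# Hypothetical strata: students grouped by class year.
strata = {
    "freshman": [f"F{i}" for i in range(300)],
    "sophomore": [f"S{i}" for i in range(250)],
    "junior": [f"J{i}" for i in range(200)],
}

chosen = stratified_sample(strata, per_stratum=10)
for name, members in chosen.items():
    print(name, len(members))
```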
Cluster sample
Clusters do not usually consist of homogeneous observations, and we
take a simple random sample from a random sample of clusters.
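A minimal sketch of cluster sampling: first pick clusters at random, then take a simple random sample within each chosen cluster. The cluster names (neighborhoods) and sizes are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, per_cluster, seed=0):
    """Randomly choose n_clusters clusters, then sample within each one."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return {name: rng.sample(clusters[name], per_cluster) for name in picked}

# Hypothetical clusters: households grouped by neighborhood.
clusters = {
    f"neighborhood_{k}": [f"n{k}_house{i}" for i in range(40)]
    for k in range(12)
}

chosen_clusters = cluster_sample(clusters, n_clusters=3, per_cluster=5)
for name, households in chosen_clusters.items():
    print(name, households)
```

Unlike the stratified design, most clusters contribute no observations at all, which is what often makes this approach cheaper in practice.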
EXPERIMENTS
Experiment
Studies where researchers assign treatments to subjects are called
experiments.
For example, to study the relationship between cell phone use and
brain cancer, researchers might take 200 mice and randomly split them
into two groups of 100. One group is exposed to the same radiation
transmitted from a cell phone, and the other group is not exposed. After
eighteen months, the mice are checked for cancer development.
It is possible for a well-designed experiment to help establish a
causation.
http://www.ncbi.nlm.nih.gov/pubmed/9146709
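The random split in the mouse example can be sketched as follows. The mouse ID numbers and the seed are placeholders; only the group sizes come from the example above.

```python
import random

# Hypothetical IDs for the 200 mice.
mice = list(range(1, 201))

rng = random.Random(1)  # fixed seed only for reproducibility
rng.shuffle(mice)       # put the mice in a random order

# The first 100 in the shuffled order are exposed; the rest are not.
exposed = mice[:100]
unexposed = mice[100:]
print(len(exposed), len(unexposed))
```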
Let’s design an experiment
• We want to investigate if energy gels make
people run faster.
• Treatment: energy gel
• Control: no energy gel
• It is suspected that energy gels may affect
pro and amateur athletes differently, so we
block for pro status.
• Divide the sample by pro and amateur
• Randomly assign pro athletes to treatment
and control groups
• Randomly assign amateur athletes to
treatment and control groups
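The blocking steps above can be sketched as follows. The runner names and the block sizes are hypothetical, and splitting each block in half is one reasonable choice rather than a requirement.

```python
import random

def blocked_assignment(subjects, seed=0):
    """Randomly assign subjects to treatment or control within each block.

    subjects: list of (name, block) pairs.
    Returns a dict mapping each name to "treatment" or "control".
    """
    rng = random.Random(seed)

    # Step 1: divide the sample by block (pro vs. amateur).
    blocks = {}
    for name, block in subjects:
        blocks.setdefault(block, []).append(name)

    # Step 2: within each block, shuffle and split in half.
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)
        half = len(members) // 2
        for name in members[:half]:
            assignment[name] = "treatment"  # receives the energy gel
        for name in members[half:]:
            assignment[name] = "control"    # no energy gel
    return assignment

# Hypothetical sample: 10 pro and 20 amateur runners.
runners = ([(f"pro{i}", "pro") for i in range(10)]
           + [(f"am{i}", "amateur") for i in range(20)])
groups = blocked_assignment(runners)

pro_treated = sum(1 for n, b in runners if b == "pro" and groups[n] == "treatment")
am_treated = sum(1 for n, b in runners if b == "amateur" and groups[n] == "treatment")
print(pro_treated, am_treated)
```

Because the randomization happens separately inside each block, the treatment and control groups end up with the same mix of pro and amateur athletes.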
Distinguishing variables
“Do energy gels make people run faster?”
If we suspect consuming energy gels affects speed, then we call
energy gel the explanatory variable and speed the response variable.
explanatory variable → might affect → response variable
More experimental-design terminology
• Placebo: fake treatment, often used as the control group for medical
studies.
• Placebo effect: experimental subjects showing improvement simply
because they believe they are receiving a special treatment.
• Blinding: when experimental subjects do not know whether they are in
the control or treatment group.
• Double-blind: when neither the experimental subjects nor the
researchers who interact with them know who is in the control group
and who is in the treatment group.
Replication experiments
A replication experiment is a repeat of a previous experiment using the
same methods, but with different subjects.
When random sampling is not practical for an experiment, causation
can only be established for the sample. By replicating the experiment
using different samples, causation can be established for a larger group
of subjects.
Treating chronic fatigue syndrome: revisited
Objective. Evaluate the effectiveness of cognitive-behavior therapy for
chronic fatigue syndrome.
Participant pool. 142 patients who were recruited from referrals by
primary care physicians and consultants to a hospital clinic specializing
in chronic fatigue syndrome
Actual participants. Only 60 of the 142 referred patients entered the
study. Some were excluded because they didn't meet the diagnostic
criteria, some had other health issues, and some refused to be a part of
the study.
Deale et al. 1997. Cognitive behavior therapy for chronic fatigue syndrome: A randomized controlled
trial. The American Journal of Psychiatry 154:3.
Recall: generalizing the results of the chronic fatigue syndrome case study
Are the results of this study generalizable to all patients with chronic
fatigue syndrome?
No. These patients had specific characteristics and volunteered to be a
part of this study, therefore they may not be representative of all
patients with chronic fatigue syndrome.
Later replication experiments may be able to strengthen the assertion
that cognitive-behavior therapy is an effective treatment for chronic
fatigue syndrome.
Random assignment vs. random sampling