Session Slides/Handout

Biostatistics in Practice
Session 1:
Quantitative and Inferential Issues
Youngju Pak
Biostatistician
Peter D. Christenson
http://research.LABioMed.org/Biostat
Class Note
We will typically have many more slides
than are covered in class.
Session 1 Objectives
General quantitative needs in biological
research
Statistical software
Protocol examples, with statistical sections
Overview of statistical issues using a
published paper
General Quantitative Needs
Descriptive: Appropriate summarization to
meet scientific questions: e.g.,
• changes, or % changes, or reaching
threshold?
• mean, or minimum, or range of response?
• average time to death, or chances of
dying by a fixed time?
General Quantitative Needs, Cont’d
• Inferential: Could results be spurious, a
fluke, due to “natural” variations or
chance?
• Sensitivity/Power: How many subjects are
needed?
• Validity: Issues such as bias and valid
inference are general scientific ones,
but can be addressed statistically.
Session 1 Objectives
General quantitative needs in biological
research
Statistical software
Protocol examples, with statistical sections
Overview of statistical issues using a
published paper
Professional Statistics Software Package
[Screenshot shown in class: code/syntax is entered, output is produced, and the stored data remain accessible.]
Typical Statistics Software Package
Select Methods from Menus
www.ncss.com
www.minitab.com
[Screenshot shown in class: data in a spreadsheet; output appears after menu selection.]
Microsoft Excel for Statistics
• Primarily for descriptive statistics.
• Limited output.
• No analyses for %s.
Almost Free On-Line Statistics Software
www.statcrunch.com
• Runs from a browser, not locally.
• Can store data and results on the StatCrunch server.
• $5 per 6 months of usage.
Free Statistics Software: Mystat
www.systat.com
Free Study Size Software
www.stat.uiowa.edu/~rlenth/Power
Session 1 Objectives
General quantitative needs in biological
research
Statistical software
Protocol examples, with statistical sections
Overview of statistical issues using a
published paper
Typical Statistics Section of Protocol
• Overview of study design and goals
• Randomization/treatment assignment
• Study size
• Missing data / subject withdrawal or incompletion
• Definitions / outcomes
• Analysis populations
• Data analysis methods
• Interim analyses
Public Protocol Registration
www.clinicaltrials.gov
www.controlled-trials.com
Registration is an attempt to let the public know of studies whose results may be negative (and so might otherwise go unpublished).
Many journals now require registration in order to consider future publication.
Example of Protocol
--- Displayed in Class ---
Session 1 Objectives
General quantitative needs in biological
research
Statistical software
Protocol examples, with statistical sections
Overview of statistical issues
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Paper with Common Statistical Issues
Case Study:
McCann, et al., Lancet 2007 Nov 3;370(9598):1560-7
• Food additives and hyperactive behaviour in 3-year-old and 8/9-year-old children in the community: a randomised, double-blinded, placebo-controlled trial.
• Target population: children aged 3-4 and 8-9 years
• Study design: randomized, double-blinded, controlled, crossover trial
• Sample size: 153 (3 years), 144 (8-9 years), in Southampton, UK
• Objective: test whether intake of artificial food colors and additives (AFCA) affects childhood behavior
• Sampling: stratified sampling based on SES
• Baseline measure: 24-h recall by the parent of the child's pre-trial diet
• Groups: three (mix A, mix B, placebo)
• Outcomes: ADHD rating scale IV by teachers, WWP hyperactivity score by parents, classroom observation code, and Conners continuous performance test II (CPTII), aggregated into a GHA score
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Selecting Study Subjects
Representative or Random Samples
How were the children to be studied selected
(second column on the first page)?
The authors purposely selected
"representative" social classes.
Is this better than a "randomly"
chosen sample that ignores social class?
Often hear: Non-random = Non-scientific.
Case Study: Participant Selection
No mention of random samples.
Case Study: Participant Selection
It may be that only a few schools are needed
to get sufficient individuals.
If, among all possible schools, there are few
that are lower SES, none of these schools
may be chosen.
So, a random sample of schools is chosen
from the lower SES schools, and another
random sample from the higher SES schools.
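A minimal sketch of this stratified selection in Python; the school names and sample counts below are hypothetical, purely for illustration:

import random

random.seed(1)  # reproducible illustration

# Few lower-SES schools, many higher-SES schools (hypothetical lists)
lower_ses = ["lowSES_school_%d" % i for i in range(1, 6)]
higher_ses = ["highSES_school_%d" % i for i in range(1, 21)]

# Sampling within each SES stratum guarantees lower-SES schools appear,
# which a simple random sample of all schools might miss by chance.
chosen = random.sample(lower_ses, 2) + random.sample(higher_ses, 4)
print(chosen)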
Selection by Over-Sampling
It is not necessary that the % lower SES in the
study is the same as in the population.
There may still be too few subjects in a rare
subgroup to get reliable data.
Can “over-sample” a rare subgroup, and then
weight overall results by proportions of
subgroups in the population. The CDC
NHANES studies do this.
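A minimal sketch of such weighting in Python; all proportions and subgroup means below are hypothetical:

# Suppose the rare subgroup is 10% of the population but was
# over-sampled to 40% of the study.
pop_proportion = {"rare": 0.10, "common": 0.90}
subgroup_mean = {"rare": 8.2, "common": 5.1}  # observed mean outcome per subgroup

# Weight each subgroup mean by its population (not sample) proportion.
overall = sum(pop_proportion[g] * subgroup_mean[g] for g in pop_proportion)
print(round(overall, 2))  # 5.41, not the unweighted sample average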
Random Samples vs. Randomization
We have been discussing the selection of
subjects to study, often a random sample.
An observational study would, well, just
observe them. An interventional study assigns
each subject to one or more treatments in
order to compare treatments.
Randomization refers to making these
assignments in a random way.
Why Randomize?
Plant breeding example: Compare yields of
varieties A and B, planting each to 18 plots.
Which design is better?
Systematic:
A B A B A B
B A B A B A
A B A B A B
B A B A B A
A B A B A B
B A B A B A

Randomized:
B A B A B B
A A B A B A
B A B B A A
B B A A B B
A B A B B A
A A B A B A
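A minimal sketch of generating such a randomized layout in Python (the grid size matches the 18-plots-per-variety example above; the seed is arbitrary):

import random

random.seed(7)  # reproducible illustration
plots = ["A"] * 18 + ["B"] * 18
random.shuffle(plots)  # random, rather than systematic, assignment

# Print the field as a 6 x 6 grid
for row in range(6):
    print(" ".join(plots[6 * row : 6 * row + 6]))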
Why Randomize?
• So that groups will be similar except for the intervention.
• So that, when enrolling, we will not unconsciously choose an "appropriate" treatment for a particular subject.
• To minimize the chance of introducing bias while attempting to systematically remove it, as in the plant-yield example.
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Basic Study Designs
1. Prospective (longitudinal)
2. Retrospective (case-control)
3. Cross-sectional
4. Randomized controlled
Case Study: Crossover Design
Each child is studied on 3 occasions
under different diets.
Is this better than three separate groups
of children?
Why, intuitively?
How could you scientifically prove your
intuition?
Blocked vs. Unblocked Studies
AKA matched vs. unmatched.
AKA paired vs. unpaired.
Block = Pair = Set receiving all treatments. A set could be an individual at multiple times (pre and post), or left and right arms for a sunscreen comparison; twins or a family; centers in a multi-center study, etc. Block ↔ Homogeneous.
Blocking is efficient because treatment differences are usually more consistent from subject to subject than the individual treatment responses themselves are.
Potential Efficiency Due to Pairing
[Dot plots shown in class. Unpaired: A and B measured in separate groups, with wide scatter around means 3 units apart. Paired: A and B measured within the same set; the differences Δ = B - A cluster tightly around 3.]
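A minimal simulation of the figure's point, assuming (hypothetically) that between-subject variation is large relative to within-subject measurement noise:

import random, statistics

random.seed(3)
subject = [random.gauss(0, 2.0) for _ in range(200)]  # between-subject variation
a = [s + random.gauss(10, 0.5) for s in subject]      # response under A
b = [s + random.gauss(13, 0.5) for s in subject]      # response under B (true effect = 3)

diffs = [bi - ai for ai, bi in zip(a, b)]
print(round(statistics.stdev(b), 2))      # ~2.1: raw responses are noisy
print(round(statistics.stdev(diffs), 2))  # ~0.7: paired differences are tight
print(round(statistics.mean(diffs), 2))   # ~3.0: the signal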
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Outcome Measures
Generally, how were the outcome measures
defined (third page)?
They are more complicated here than for
most studies.
What are the units (e.g., kg, mmol, $, years)?
Outcome measures are specific and predefined. Aims and goals may be more
general.
Summarization / Data Reduction
How are the outcome measures summarized? e.g., Table 2:
--- Table 2 of the paper displayed in class ---
Case Study: Statistical Comparisons
How might you intuitively decide from the
summarized results whether the additives
have an effect?
Different Enough?
Clinically?
Statistically?
Statistical Comparisons: Figure 3
--- Figure 3 of the paper displayed in class ---
Statistical Comparisons and
Tests of Hypotheses
Engineering analogy: Signal and Noise
Signal = Diet effect
Noise = Degree of precision
Statistical Tests:
Effect is probably real if signal-to-noise
ratio Signal/Noise is large enough.
Importance of reducing “noise”, which
incorporates subject variability and N.
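A minimal sketch of this ratio for a paired design, where signal = mean(Δ) and noise = its standard error; the eight differences below are hypothetical:

import math, statistics

deltas = [2.8, 3.4, 2.9, 3.1, 3.6, 2.7, 3.2, 3.0]  # hypothetical B - A differences

signal = statistics.mean(deltas)                           # estimated effect
noise = statistics.stdev(deltas) / math.sqrt(len(deltas))  # standard error
print(round(signal / noise, 1))  # a large ratio suggests the effect is real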
Back to Efficiency of Design
[The same dot plots, annotated: Signal = 3 in both designs. Unpaired: the noise is the wide scatter within each separate A and B group. Paired: the noise is the much smaller scatter of the differences Δ = B - A around 3.]
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Number of Subjects
The authors say, in the second column on the fourth page:
--- Excerpt displayed in class ---
Intuitively, what should go into selecting the study size?
We will make this intuition rigorous in Session 4.
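As a preview, a minimal sketch of one standard study-size calculation (two groups, normal approximation); the detectable difference and SD below are assumptions, not values from the paper:

from statistics import NormalDist

alpha, power = 0.05, 0.80
delta, sd = 3.0, 5.0  # assumed detectable difference and outcome SD

z_a = NormalDist().inv_cdf(1 - alpha / 2)  # 1.96
z_b = NormalDist().inv_cdf(power)          # 0.84
n_per_group = 2 * (sd / delta) ** 2 * (z_a + z_b) ** 2
print(round(n_per_group))  # ~44 subjects per group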
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Other Effects, Potential Biases
The top of the second column on the fourth page mentions other effects on diet:
The issue here is: could apparent diet differences (e.g., -0.26 for mix B vs. -0.44 for placebo) be attributable to something else?
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Non-Completing or Non-Adhering Subjects
What is the most relevant group of studied
subjects: all randomized, mostly adherent, fully
adherent?
Study Goal:
Scientific effect?
Societal impact?
Statistical Issues
Subject selection
Randomization
Efficiency from study design
Summarizing study results
Making comparisons
Study size
Attributability of results
Efficacy vs. effectiveness
Exploring vs. proving
Multiple and Mid-Study Analyses
Many more analyses could have been
performed on each of the individual behavior
ratings that are described in the first column
of the 3rd page.
Wouldn’t it be negligent not to do them, and
miss something?
Is there a downside to doing them?
Should effects be monitored as more and
more subjects complete?
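A minimal calculation of the downside: with many independent tests each at α = 0.05, the chance of at least one false positive grows quickly:

alpha = 0.05
for k in (1, 5, 10, 20):
    fwer = 1 - (1 - alpha) ** k  # P(at least one fluke among k tests)
    print(k, "tests:", round(fwer, 2))
# 1 test: 0.05; 20 tests: 0.64, even when no effect exists at all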
Multiple Analyses
[Diagram shown in class: many separate measures (Parent ADHD, Teacher ADHD, Class ADHD, Conners) feed into the GHA: Global Hyperactivity Aggregate.]
Torture data long enough and it will confess to something.
Mid-Study Analyses
[Plot shown in class: estimated effect vs. time / number of subjects enrolled. The estimate fluctuates around 0 early on, so an early look can produce a wrong early conclusion, compounding the too-many-analyses problem on the previous slide.]
Need to monitor, but also account for many analyses.
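A minimal simulation of the monitoring problem, assuming (hypothetically) 10 interim looks at a study with no true effect:

import random
from statistics import NormalDist

random.seed(11)
z_crit = NormalDist().inv_cdf(0.975)  # nominal two-sided 5% cutoff

hits, trials = 0, 2000
for _ in range(trials):
    total, n = 0.0, 0
    for _ in range(10):                  # 10 interim looks
        for _ in range(10):              # 10 new subjects per look
            total += random.gauss(0, 1)  # no true effect
            n += 1
        if abs(total / n ** 0.5) > z_crit:
            hits += 1                    # declared "significant" at some look
            break
print(round(hits / trials, 2))  # ~0.19, far above the nominal 0.05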
Bad Science That May Seem Good
1. Re-examining data, or using many outcomes, seeming to be due diligence.
2. Adding subjects to a study that is showing marginal effects; stopping early due to strong results.
3. Emphasizing effects in subgroups.
Actually bad? It could be negligent NOT to do these, but we need to account for doing them.