L.11: Calculating sample size

Methodology session III
Monday 16 November 2009
09.15-10.00: Exercise S3
10.15-11.00: Measures of association
11.15-12.00: Exercise S4
12.00-13.00: Lunch break
13.00-13.45: Experimental design incl. randomized controlled trials
14.00-14.45: Calculating sample size
15.00-15.45: Exercise S5
L.11: Calculating sample size
11.1 Sample size calculation
11.2 The inadequacy of small experiments
11.1 Sample size calculations
- How many observations (patients) do we
need?
- This question is answered by evaluating
the study’s statistical power.
Example: Humerfelt et al. (1998) Effectiveness of a postal
smoking cessation advice: a randomized controlled trial
in young men with reduced FEV1 and asbestos exposure.
Eur. Resp. J. 11: 284-290.
To calculate the sample size some key
questions have to be answered first:
(1) What is the primary aim of the study?
The example:
- To decide if mailed advice reduces smoking in
smokers with high risk of lung disease.
(2) What is the primary response variable (end point)?
The example:
- Smoking status 1 year after advice was given.
(3) How will the data be analysed to detect a
potential treatment effect?
The example:
- Compare the proportions of quitters in an
intervention group and a control group 1 year after
a letter with advice about quitting was sent. A
two-sided hypothesis test will be performed (the
chi-square test without Yates' correction). If p ≤ 0.05 we
will conclude that smoking cessation advice by
mail influences smoking habits one year later
in patients with high risk of lung disease
=> α = 0.05.
(4) How large a response is
expected in the control group?
The example:
- In the control group 2.5 % were expected to
have quit smoking 1 year after the first letter
=> p1 = 0.025 for quitting during the year
(5) What is the smallest treatment effect that would
be of (clinical) importance to find, and how certain
would you want to be to detect such an effect?
The example:
- If the letter makes (at least) 5 % quit during
the following year, we would like to have (at
least) an 80 % chance of detecting it
=> p2 = 0.05 for quitting
and β = 0.20.
(6) Use Pocock’s formula for sample size for a
dichotomous or continuous response:
Dichotomous response:
$$ n = \frac{p_1(1-p_1) + p_2(1-p_2)}{(p_2-p_1)^2}\, f(\alpha,\beta) $$
Continuous response:
$$ n = \frac{2\sigma^2}{(\mu_2-\mu_1)^2}\, f(\alpha,\beta) $$
Here the factor f(α,β) is found in Pocock’s table.
Pocock’s table for f(α, β)

            β = 0.05   β = 0.10   β = 0.20   β = 0.50
α = 0.10      10.8        8.6        6.2        2.7
α = 0.05      13.0       10.5        7.9        3.8
α = 0.02      15.8       13.0       10.0        5.4
α = 0.01      17.8       14.9       11.7        6.6
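The tabulated factors can be recomputed rather than looked up. The sketch below assumes the standard normal-approximation identity f(α,β) = (z_{1-α/2} + z_{1-β})², which is not stated on the slides, and the function name `f_factor` is only illustrative. It reproduces the table up to rounding (for example about 7.85 where the table gives 7.9).

```python
from statistics import NormalDist


def f_factor(alpha: float, beta: float) -> float:
    """Factor f(alpha, beta) for a two-sided test at level alpha with power 1 - beta.

    Assumption: f = (z_{1-alpha/2} + z_{1-beta})^2 with standard normal quantiles.
    """
    z = NormalDist().inv_cdf  # quantile function of the standard normal
    return (z(1 - alpha / 2) + z(1 - beta)) ** 2


ALPHAS = [0.10, 0.05, 0.02, 0.01]
BETAS = [0.05, 0.10, 0.20, 0.50]

# Print the grid in the same layout as the table: rows are alpha, columns are beta.
print("alpha " + " ".join(f"b={b:.2f}" for b in BETAS))
for a in ALPHAS:
    print(f"{a:5.2f} " + " ".join(f"{f_factor(a, b):6.2f}" for b in BETAS))
```

Computing the factor this way also gives values for combinations of α and β that the table does not list.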
The example:
f(α,β) = f(0.05,0.20) = 7.9 gives
$$ n = \frac{0.025(1-0.025) + 0.05(1-0.05)}{(0.05-0.025)^2} \times 7.9 = 908.5 $$
And, adjusted for a 70 % response rate:
approx. 1300 in each group.
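The arithmetic of the smoking example can be checked with a short sketch. The function name `n_per_group_binary` is illustrative, and the response-rate adjustment is assumed to mean dividing the required number of responders by 0.70.

```python
import math


def n_per_group_binary(p1: float, p2: float, f_value: float) -> float:
    """Pocock's formula for a dichotomous response (patients per group)."""
    variance_sum = p1 * (1 - p1) + p2 * (1 - p2)
    return variance_sum / (p2 - p1) ** 2 * f_value


# Values from the example: p1 = 0.025, p2 = 0.05, f(0.05, 0.20) = 7.9 from the table.
n_responders = n_per_group_binary(0.025, 0.05, 7.9)

# Assumed adjustment: only 70 % respond, so inflate the number of letters sent.
n_mailed = math.ceil(n_responders / 0.70)

print(f"responders needed per group: {n_responders:.1f}")
print(f"letters to send per group:   {n_mailed}")
```

This gives about 908.5 responders and 1298 letters per group, which the slide rounds to approximately 1300.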
PS: One-sided test
For a one-sided hypothesis test at significance level α with power 1-β, use
the formulas with 2α (and β) instead of α (and β).
Example: For a one-sided test of the hypotheses about smoking with α =
0.05 and power 0.80 you need
$$ n = \frac{p_1(1-p_1) + p_2(1-p_2)}{(p_2-p_1)^2}\, f(2\alpha,\beta) = \frac{0.025(1-0.025) + 0.05(1-0.05)}{(0.05-0.025)^2}\, f(0.10,0.20) = 115 \times 6.2 = 713 $$
patients in each group, i.e. a total of 1426 should be randomized.
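A minimal sketch comparing the one-sided and two-sided requirements for the same example, using the tabulated factors f(0.05, 0.20) = 7.9 and f(0.10, 0.20) = 6.2. The function and variable names are illustrative only.

```python
def n_per_group_binary(p1: float, p2: float, f_value: float) -> float:
    """Pocock's formula for a dichotomous response (patients per group)."""
    return (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2 * f_value


p1, p2 = 0.025, 0.05

f_two_sided = 7.9  # f(alpha, beta)   = f(0.05, 0.20) from the table
f_one_sided = 6.2  # f(2*alpha, beta) = f(0.10, 0.20) from the table

n_two = n_per_group_binary(p1, p2, f_two_sided)
n_one = n_per_group_binary(p1, p2, f_one_sided)

print(f"two-sided: {n_two:.1f} per group")
print(f"one-sided: {n_one:.1f} per group, {2 * round(n_one)} in total")
```

The one-sided test needs fewer patients because the same α is spent entirely on one direction of the difference.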
Example with continuous response variable:
Cockburn et al. (1980) Maternal vitamin D intake and mineral metabolism in
mothers and their newborn infants. Br. Med. J., 281, 11-14.
Replies to the key questions:
(1) To decide if supplementary vitamin D given to pregnant women prevents
hypocalcaemia in newborns.
(2) The child’s serum calcium level 1 week after birth.
(3) Compare the mean serum calcium level between a placebo group and a
treatment group using a two-sided unpaired t-test at significance level α =
0.05.
(4) Without vitamin D the children are expected to have a mean calcium level of
μ1 = 9.0 mg per 100 ml with a standard deviation of σ = 1.8 mg per 100 ml.
(5) If vitamin D increases the mean calcium level to μ2 = 9.5 mg per 100 ml, we
would wish to have a 95 % chance of detecting it in this RCT, i.e. β = 0.05.
(6) With f(0.05, 0.05) = 13.0 this gives n = (2⋅1.8²/0.5²)⋅13.0 = 337 in each
group, i.e. a total of roughly 700 patients.
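The continuous-response calculation can be sketched in the same way. The function name `n_per_group_continuous` is illustrative; the inputs are the values from replies (4) to (6).

```python
import math


def n_per_group_continuous(mu1: float, mu2: float, sigma: float, f_value: float) -> float:
    """Pocock's formula for a continuous response (patients per group)."""
    return 2 * sigma**2 / (mu2 - mu1) ** 2 * f_value


# mu1 = 9.0, mu2 = 9.5, sigma = 1.8 (mg per 100 ml); f(0.05, 0.05) = 13.0 from the table.
n = n_per_group_continuous(9.0, 9.5, 1.8, 13.0)
n_rounded = math.ceil(n)

print(f"per group: {n:.2f}, rounded up to {n_rounded}")
print(f"both groups: {2 * n_rounded}")
```

The required size grows with the square of σ/(μ2-μ1), so halving the detectable difference quadruples n.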
Sample size for an equivalence trial
(Pocock p.129-130)
H0: Not equivalent vs. H1: Equivalent
1. Choose a value d such that if the two treatments really
are equally effective (H1), the upper 100(1-α)% confidence
limit for the difference in the proportion of successes on the two
treatments should not exceed d with probability 1-β.
2. Use the following formula for the size of each group:
$$ n = \frac{2p(1-p)}{d^2}\, f(\alpha,\beta) $$
where p is the common proportion of successes when the treatments
are equally effective.
Sample size for an equivalence trial (Example)
- In an RCT one wants to specify that a new
antidepressant will only be considered
acceptable if it can be demonstrated with 95%
confidence that it is at worst 10% inferior to the
standard drug. Suppose one accepts a 20% risk
that, even if the new drug is really equally effective, one will
fail to show it as acceptable in this sense.
- Then p = 0.70, d = 0.10, α = 0.05, β = 0.20, so
$$ n = \frac{2 \times 0.70 \times 0.30}{(0.10)^2} \times 7.9 = 332 $$
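A minimal sketch of the equivalence calculation with the numbers from the antidepressant example. The function name `n_per_group_equivalence` is illustrative.

```python
import math


def n_per_group_equivalence(p: float, d: float, f_value: float) -> float:
    """Group size for an equivalence trial: n = 2p(1-p)/d^2 * f(alpha, beta)."""
    return 2 * p * (1 - p) / d**2 * f_value


# p = 0.70, d = 0.10, f(0.05, 0.20) = 7.9 from the table.
n = n_per_group_equivalence(0.70, 0.10, 7.9)

print(f"per group: {n:.1f}, rounded up to {math.ceil(n)}")
```

Because d enters squared in the denominator, tightening the acceptable margin from 10 % to 5 % would multiply the group size by four.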
11.2 The inadequacy of small trials:
- High risk of failing to document treatment effects of
clinical importance.
- Too many false ‘negative’ trials get published.
- Unnecessary experimentation with humans (or animals).
- Delays progress in the development of new treatments.
- Is a waste of time, money and effort.
In other words, many published trials are wasted
because they lacked the resources necessary to answer
the clinical research questions that were posed.
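To make the risk concrete, Pocock's formula can be inverted to give the approximate power of a trial of a given size. The sketch below is an illustration, not part of the lecture. It assumes the normal-approximation identity f(α,β) = (z_{1-α/2} + z_{1-β})² and reuses the smoking example (p1 = 0.025, p2 = 0.05); the function name `approx_power_binary` is hypothetical.

```python
import math
from statistics import NormalDist


def approx_power_binary(n: float, p1: float, p2: float, alpha: float = 0.05) -> float:
    """Approximate power of a two-sided test comparing two proportions.

    Assumption: solve n = f * (p1 q1 + p2 q2) / (p2 - p1)^2 for f and use
    f = (z_{1-alpha/2} + z_{1-beta})^2 to recover the power 1 - beta.
    """
    nd = NormalDist()
    f_achieved = n * (p2 - p1) ** 2 / (p1 * (1 - p1) + p2 * (1 - p2))
    return nd.cdf(math.sqrt(f_achieved) - nd.inv_cdf(1 - alpha / 2))


# Power for a range of group sizes in the smoking cessation setting.
for n in (50, 100, 300, 900):
    print(f"n = {n:4d} per group -> power = {approx_power_binary(n, 0.025, 0.05):.2f}")
```

Under these assumptions a trial with 100 patients per group has a power of only about 15 %, so a clinically important effect would most likely go undetected.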
What can be done?
- Do not be too optimistic about patient recruitment.
- Design multi-centre studies.
- Broaden the inclusion criteria.
- Don’t conduct the study.
- Meta-analyses.
Michel de Montaigne (1533-92):
“The fruit of a physician’s experience is not
the stories of his treatments and
the memory that he has cured four
plague victims and three sufferers of gout, unless
he is also able to extract something from this
experience that can develop his judgement, and
can show us that he has thereby become
wiser in his profession”.