Ferron, John – Randomization Tests Issues and Applications

Statistical Randomization Tests:
Issues and Applications

Randomization Tests versus Permutation Tests

Test Statistic Choice

Complete versus Sampled Randomization Distribution
Randomization Test
1. Random assignment of 3 blocks to each condition
2. Compute observed test statistic
3. Create randomization distribution by computing test statistic
for each of the 20 possible randomizations
4. Compare the obtained test statistic to the randomization
distribution
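The four steps above can be sketched in code. This is a minimal illustration, not the author's implementation: it assumes 6 blocks of 5 observations each, a one-sided mean-difference statistic, and made-up data; the function names are hypothetical.

```python
from itertools import combinations

import numpy as np


def mean_difference(y, labels):
    """T = mean of the B observations minus mean of the A observations."""
    return y[labels == "B"].mean() - y[labels == "A"].mean()


def block_assignments(n_blocks=6, n_treated=3, block_len=5):
    """Yield a label vector for every way to give n_treated blocks condition B."""
    for treated in combinations(range(n_blocks), n_treated):
        block_labels = np.array(["A"] * n_blocks)
        block_labels[list(treated)] = "B"
        yield np.repeat(block_labels, block_len)


def randomization_test(y, observed_labels):
    """One-sided test: a large positive T counts as evidence of an effect."""
    observed = mean_difference(y, observed_labels)
    distribution = np.array(
        [mean_difference(y, lab) for lab in block_assignments()]
    )
    # Proportion of possible assignments with a statistic at least as large
    # as the observed one (small tolerance guards against float ties).
    p_value = np.mean(distribution >= observed - 1e-12)
    return observed, distribution, p_value


# Hypothetical data: step 1 is assumed to have produced blocks A B B A A B.
rng = np.random.default_rng(0)
observed_labels = np.repeat(np.array(list("ABBAAB")), 5)
y = rng.normal(size=30) + 1.0 * (observed_labels == "B")
observed, distribution, p_value = randomization_test(y, observed_labels)
print(len(distribution), round(float(p_value), 3))
```

Because the observed assignment is itself one of the 20 possible assignments, the smallest p-value this design can produce is 1/20 = .05.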
Permutation Test
1. No random assignment
2. Compute observed test statistic
3. Create permutation distribution by computing test statistic
for data permutations
- 20 permutations of blocks of 5 observations?
- 155,117,520 permutations of individual observations?
4. Compare the obtained test statistic to the permutation
distribution
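The two permutation counts above follow from simple combinatorics, assuming 30 observations split evenly between A and B (6 blocks of 5, 3 blocks per condition). A quick check:

```python
from math import comb

# Ways to choose which 3 of the 6 blocks are labeled B.
n_block_permutations = comb(6, 3)

# Ways to choose which 15 of the 30 individual observations are labeled B.
n_individual_permutations = comb(30, 15)

print(n_block_permutations)       # 20
print(n_individual_permutations)  # 155117520
```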
Type I error control of randomization tests
Type I error is controlled as long as:
1. Randomization is used in some form
2. Randomization distribution mirrors the randomization used
3. The test statistic choice is not influenced by knowledge of
treatment assignments
Type I error control of permutation tests
Type I error may or may not be controlled
Type I error can be controlled if one can assume exchangeability, but
this is generally difficult to assume with time series data.
Consequently, it is statistically preferable to use some form of
random assignment and a randomization test rather than a
permutation test.
What if you can’t randomize?
Some permutation tests work better than others to control the Type I
error rate.
Design:
ABABABABABABABABABABABABABABAB
Type of Permutation:
individual observations
Sample Permutations:
ABABABABABABABABABABABABABABAB
BBAAABBAABABABABABABABABABBAAB
AAABABABABABABABBBABABABABABAB
This test will become conservative with positive autocorrelation
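The behavior of this test under autocorrelation can be explored by simulation. The following is a rough Monte Carlo sketch, not a result from the slides: it assumes an AR(1) error process with no treatment effect, a two-sided mean-difference statistic, and a sampled (not complete) permutation distribution. All function names and settings are hypothetical.

```python
import numpy as np


def ar1_series(n, phi, rng):
    """Simulate a stationary AR(1) series with no treatment effect."""
    e = rng.normal(size=n)
    y = np.empty(n)
    y[0] = e[0] / np.sqrt(1 - phi**2)
    for t in range(1, n):
        y[t] = phi * y[t - 1] + e[t]
    return y


def permutation_p_value(y, is_b, n_perm, rng):
    """Two-sided p-value from randomly permuting individual observation labels."""
    observed = y[is_b].mean() - y[~is_b].mean()
    count = 0
    for _ in range(n_perm):
        perm = rng.permutation(is_b)
        t = y[perm].mean() - y[~perm].mean()
        count += int(abs(t) >= abs(observed) - 1e-12)
    return count / n_perm


def rejection_rate(phi, n=30, n_sims=200, n_perm=200, alpha=0.05, seed=1):
    """Estimated Type I error rate for the fixed ABAB... alternation design."""
    rng = np.random.default_rng(seed)
    is_b = np.arange(n) % 2 == 1  # A at even positions, B at odd positions
    rejections = 0
    for _ in range(n_sims):
        y = ar1_series(n, phi, rng)
        rejections += int(permutation_p_value(y, is_b, n_perm, rng) <= alpha)
    return rejections / n_sims


if __name__ == "__main__":
    for phi in (0.0, 0.5):
        print(phi, rejection_rate(phi))
```

With positive phi, the estimated rejection rate should fall well below the nominal .05, which is what "conservative" means here.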
What if you can’t randomize?
Design:
AAAAABBBBBAAAAABBBBBAAAAABBBBB
Type of Permutation:
blocks of 5 observations
Sample Permutations:
AAAAABBBBBAAAAABBBBBAAAAABBBBB
AAAAAAAAAABBBBBBBBBBAAAAABBBBB
BBBBBAAAAAAAAAABBBBBBBBBBAAAAA
This test will become conservative with positive autocorrelation
What if you can’t randomize?
Design:
AAAAAAABBBBBBBAAAAAAABBBBBBB
Type of Permutation:
start points (say minimum phase length = 5)
Sample Permutations:
AAAAAAABBBBBBBAAAAAAABBBBBBB
AAAAAAAABBBBBAAAAAAAAABBBBBB
AAAAABBBBBBBBAAAAAAABBBBBBBB
This test will become conservative with positive autocorrelation
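The set of start-point permutations can be enumerated directly. This sketch assumes the design shown above (28 observations, 4 phases in ABAB order) and a minimum phase length of 5; the function name is hypothetical.

```python
from itertools import combinations


def phase_designs(n_obs=28, n_phases=4, min_len=5):
    """List every ABAB... label string whose phases all have >= min_len points."""
    designs = []
    # Choose the positions at which phases 2, 3, ... begin.
    for starts in combinations(range(1, n_obs), n_phases - 1):
        bounds = (0, *starts, n_obs)
        lengths = [b - a for a, b in zip(bounds, bounds[1:])]
        if min(lengths) >= min_len:
            designs.append(
                "".join("AB"[i % 2] * length for i, length in enumerate(lengths))
            )
    return designs


designs = phase_designs()
print(len(designs))  # 165 admissible sets of start points
print("AAAAAAABBBBBBBAAAAAAABBBBBBB" in designs)  # the actual design is included
```

Under these assumptions there are 165 admissible permutations, so the smallest attainable p-value is 1/165.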
Choice of Test Statistic
Anticipated increase in level:
$T = M_B - M_A$
$T = \mathrm{Median}_B - \mathrm{Median}_A$
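A small sketch of the two level statistics, using made-up data with one outlier in phase B to show why the median-based version may be preferred. Function names and data are hypothetical.

```python
import numpy as np


def mean_diff(a, b):
    """T = M_B - M_A."""
    return float(np.mean(b) - np.mean(a))


def median_diff(a, b):
    """T = Median_B - Median_A."""
    return float(np.median(b) - np.median(a))


phase_a = [2, 3, 2, 4, 3]   # hypothetical baseline observations
phase_b = [5, 6, 5, 7, 30]  # hypothetical treatment observations, one outlier

print(mean_diff(phase_a, phase_b))    # pulled upward by the outlier
print(median_diff(phase_a, phase_b))  # unaffected by the outlier's size
```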
Choice of Test Statistic
Delayed increase in level:
$T = M_{B^*} - M_A$
where $M_{B^*}$ is the mean of the last n observations in B
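A sketch of a statistic for a delayed increase in level, assuming it is the mean of the last n observations in B minus the mean of A. The function name, the choice of n, and the data are hypothetical.

```python
import numpy as np


def delayed_mean_diff(a, b, n_last):
    """Mean of the last n_last observations in B minus the mean of A."""
    b_star = np.asarray(b, dtype=float)[-n_last:]
    return float(b_star.mean() - np.mean(a))


phase_a = [2, 3, 2, 3]        # hypothetical baseline
phase_b = [2, 3, 5, 6, 7, 6]  # hypothetical: the effect appears after a delay

t_delayed = delayed_mean_diff(phase_a, phase_b, n_last=3)
print(round(t_delayed, 3))
```

The value of n must be fixed before the data are examined, so that the choice of test statistic is not influenced by the results.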
Choice of Test Statistic
Change in slope:
$T = b_B - b_A$
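A sketch of a slope-change statistic, assuming each slope is the ordinary least-squares slope of the outcome on time within its phase. Function names and data are hypothetical.

```python
import numpy as np


def ols_slope(y):
    """Least-squares slope of y regressed on time points 0, 1, 2, ..."""
    x = np.arange(len(y))
    slope, _intercept = np.polyfit(x, y, 1)
    return float(slope)


def slope_diff(a, b):
    """T = b_B - b_A."""
    return ols_slope(b) - ols_slope(a)


phase_a = [3, 3, 4, 3, 4]     # hypothetical: nearly flat baseline
phase_b = [4, 6, 8, 10, 12]   # hypothetical: steady increase

t_slope = slope_diff(phase_a, phase_b)
print(round(t_slope, 3))
```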
Choice of Test Statistic
Change in variation:
$T = s_A / s_B$
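A sketch of a variation statistic, assuming it is the ratio of the phase A standard deviation to the phase B standard deviation (sample standard deviations with n - 1 in the denominator). The function name and data are hypothetical.

```python
import numpy as np


def sd_ratio(a, b):
    """T = s_A / s_B, using sample standard deviations (ddof=1)."""
    return float(np.std(a, ddof=1) / np.std(b, ddof=1))


phase_a = [1, 5, 1, 5, 3]  # hypothetical: highly variable baseline
phase_b = [3, 3, 4, 2, 3]  # hypothetical: more stable under treatment

t_variation = sd_ratio(phase_a, phase_b)
print(round(t_variation, 3))
```

Values above 1 indicate less variation in phase B than in phase A.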
Complete versus Sampled Randomization Distribution
Complete – Randomization distribution is constructed by
systematically enumerating all possible random assignments
and computing the test statistic for each.
Sampling – Randomization distribution is constructed by
randomly sampling with replacement some large number
(e.g., 1000) of the possible assignments, and then computing
the test statistic for each sampled assignment.
Suppose you have 20 possible randomizations and the
observed test statistic is the largest of the 20.
Complete: the p-value = .05
Sampling: the p-value will be approximately .05, sometimes just above and
sometimes just below. Therefore, you will get a statistically significant
(p ≤ .05) result only about half the time you run the test.
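The "about half" claim can be checked by simulation. This sketch assumes 20 equally likely assignments, an observed statistic that is the unique largest, 1000 assignments sampled with replacement per run, and a p-value defined as the plain proportion of sampled statistics at least as large as the observed one. The function name and settings are hypothetical.

```python
import numpy as np


def sampled_p_values(n_assignments=20, n_samples=1000, n_runs=2000, seed=0):
    """Return one sampled-distribution p-value per simulated run of the test."""
    rng = np.random.default_rng(seed)
    # Index 0 stands for the assignment that yields the largest (observed)
    # statistic; only draws of index 0 are "at least as large as observed".
    draws = rng.integers(0, n_assignments, size=(n_runs, n_samples))
    return (draws == 0).sum(axis=1) / n_samples


p_values = sampled_p_values()
share_significant = float(np.mean(p_values <= 0.05 + 1e-12))
print(round(float(p_values.mean()), 3))  # close to .05 on average
print(round(share_significant, 2))       # share of runs with p <= .05
```

Each run's count of extreme draws is binomial with mean 50 out of 1000, so the p-value lands at or below .05 in only about half of the runs.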
Research Applications
The Impact of a Computer Network on Pediatric
Pain and Anxiety: A Randomized Controlled Clinical
Trial
Authors: Holden, G., Bearison, D. J., Rode, D. C., Kapiloff, M. F.,
Rosenberg, G., & Rosenzweig, J.
Publication Date: 2002
Journal: Social Work in Health Care, 36, 21-33
Treatment: Starbright World – a private computer network that allows
hospitalized children to interact with other hospitalized children
Outcomes: Pain intensity, pain aversiveness, and anxiety (self-report)
Design: Alternating treatment design replicated across participants with
random assignment of conditions (B=SBW, A=Control) to observation
periods with the restriction that there couldn’t be more than 6
consecutive periods of the same condition.
Sample: 39 hospitalized children, 7-18 years old, who had at least 8
observations and went through at least one reversal (e.g., ABA)
Randomization Test: separate test on each child for each outcome using the
mean difference as the test statistic and Single Case Randomization Test
(SCRT) software
Promoting Expressive Writing Among Students with
Emotional and Behavioral Disturbance Via Dialogue
Journals
Authors: Regan, K. S., Mastropieri, M. A., & Scruggs, T. E.
Publication Date: 2005
Journal: Behavioral Disorders, 31, 33-50
Treatment: individualized dialogue journals in which students and teachers
communicated daily about observed social and behavioral issues
Outcomes: Time on task, number of words written, and writing quality
Design: Multiple-baseline design across participants with randomly ordered
intervention onset. Baseline lengths of 4, 8, 12, 16, or 20 observations in
a series of 26 observations
Sample: 5 students in 6th grade identified as having EBD
Randomization Test: “These differences were also statistically significant
from baseline to intervention phases according to a randomization test
(p < .001).” How many randomizations were possible? Minimum p-value?
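One way to work through these two questions, assuming the randomization consisted of assigning the five baseline lengths to the five students in random order (an assumption read from the design description, not stated in the quoted report):

```python
from math import factorial

n_students = 5

# Number of ways to order 5 distinct baseline lengths across 5 students.
n_randomizations = factorial(n_students)

# The observed assignment is always in the distribution, so the smallest
# attainable p-value is 1 over the number of possible randomizations.
min_p_value = 1 / n_randomizations

print(n_randomizations)        # 120
print(round(min_p_value, 4))   # 0.0083
```

Under this assumption the smallest attainable p-value is about .008, so a reported p < .001 would not be attainable from the randomization distribution.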
Extending Classwide Social Skills Interventions to
At-Risk Minority Students: A Preliminary Application
of Randomization Tests Combined with Single-Subject Methodology
Authors: Bardon, L. A., Dona, D. P., & Symons, F. J.
Publication Date: 2008
Journal: Behavioral Disorders, 33, 141-152
Treatment: PATHS, a violence prevention program
Outcome: % of intervals with cooperative play during classroom play time
Design: Replicated AB design across participants with randomly selected
intervention start times. For each participant, onset of the intervention was
selected randomly from the 6th to 15th observation in a planned 20-observation
sequence. Baseline lengths were 6, 8, and 8. Is this random assignment ideal?
Sample: 3 elementary students with at least 3 office behavioral referrals
Randomization Test: “…the proportion of 1,000 randomly sampled data divisions
producing a mean difference in the predicted direction at least as large as the
experimentally obtained difference was 0.036”
How many randomizations were possible?
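A sketch of the count, assuming each of the 3 participants had an intervention start point drawn independently from the 10 eligible observations (6th through 15th):

```python
n_participants = 3

# Eligible start points: the 6th through the 15th observation, inclusive.
n_start_points = len(range(6, 16))

# Independent choices multiply across participants.
n_randomizations = n_start_points ** n_participants
min_p_value = 1 / n_randomizations

print(n_start_points)    # 10
print(n_randomizations)  # 1000
print(min_p_value)       # 0.001
```

Under this assumption, 1,000 randomizations were possible, the same as the number of data divisions sampled, so complete enumeration would have been feasible.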