
Bootstrap Confidence
Intervals for Three-way
Component Methods
Henk A.L. Kiers
University of Groningen
The Netherlands
1
[Figure: the three-way data array X, with SUBJECTS i = 1, …, I, VARIABLES j = 1, …, J, and OCCASIONS k = 1, …, K as its three modes]
2
Three-way Methods:

Tucker3
  Xa = A Ga (C ⊗ B)′ + Ea
  A (I×P), B (J×Q), C (K×R) component matrices
  Ga matricized version of G (P×Q×R) core array

CP = Candecomp/Parafac
  Xa = A Ga (C ⊗ B)′ + Ea
  G (R×R×R) superdiagonal

Practice:
• three-way methods applied to sample from population
• goal: results should pertain to population
3
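The matricized Tucker3 model above can be checked dimensionally with a few lines of numpy; all sizes and matrices below are illustrative assumptions, not the anxiety data.

```python
import numpy as np

# Sketch of the matricized Tucker3 model Xa = A Ga (C kron B)' + Ea,
# using small random matrices (sizes are illustrative assumptions).
I, J, K = 5, 4, 3      # subjects, variables, occasions
P, Q, R = 2, 2, 2      # numbers of components per mode

rng = np.random.default_rng(0)
A = rng.standard_normal((I, P))        # subject components (I x P)
B = rng.standard_normal((J, Q))        # variable components (J x Q)
C = rng.standard_normal((K, R))        # occasion components (K x R)
Ga = rng.standard_normal((P, Q * R))   # core G (P x Q x R), matricized to P x QR

# Structural part of the model: A Ga (C kron B)'
Xa = A @ Ga @ np.kron(C, B).T          # I x JK matricized data array
print(Xa.shape)                        # (5, 12)
```

For CP the same formula applies with P = Q = R and Ga the matricized superdiagonal core.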
Example (Kiers & Van Mechelen, 2001):
• scores of 140 individuals,
  on 14 anxiousness response scales,
  in 11 different situations
• Tucker3 with P=6, Q=4, R=3 (41.1% fit)
Rotation: B, C, and core to simple structure
4
Results for example data (Kiers & Van Mechelen, 2001):

B — Factor Loadings
                                Exhilaration  Auton.physiol.  Sickness  Excret.need
Heart beats faster                  -0.06         0.57         -0.07       -0.18
“Uneasy feeling”                    -0.28         0.25          0.07       -0.06
Emotions disrupt action             -0.18         0.20          0.23       -0.01
Feel exhilarated and thrilled        0.46         0.11          0.05        0.09
Not want to avoid situation          0.41        -0.11          0.06       -0.02
Perspire                            -0.07         0.52         -0.03       -0.03
Need to urinate frequently           0.06         0.21         -0.03        0.48
Enjoy the challenge                  0.48         0.09          0.08        0.01
Mouth gets dry                       0.08         0.36          0.00        0.32
Feel paralyzed                      -0.06         0.18          0.28        0.19
Full feeling in stomach              0.00         0.00          0.79       -0.12
Seek experiences like this           0.48         0.12          0.09       -0.03
Need to defecate                    -0.09        -0.12         -0.09        0.72
Feel nausea                         -0.14        -0.18          0.45        0.25
5
C — Situation Loadings
                               Performance        Inanimate   Alone in
Situation type                 judged by others   danger      woods
Auto trip                           0.13            0.15       -0.11
New date                            0.26            0.15       -0.30
Psychological experiment            0.04            0.09        0.13
Ledge high on mountain side         0.04            0.77        0.09
Speech before large group           0.49           -0.14       -0.11
Consult counsel. bureau             0.25           -0.07        0.19
Sail boat on rough sea              0.15            0.53       -0.07
Match in front of audience          0.38            0.11       -0.09
Alone in woods at night             0.09            0.05        0.89
Job-interview                       0.48           -0.13       -0.04
Final exam                          0.46           -0.16        0.16
6
Core

[Table: rotated core values for the six A-dimensions (rows dim. 1–6 plus a fit row), per B-component (Exhil., Auton. physiol., Sickness, Excr. need) and per C-component (Performance judged by others, Inanimate danger, Alone in woods); the largest absolute core values (up to 40.0) occur in the Performance block]

Fit (%) per core column:
                               Exhil.  Auton.physiol.  Sickness  Excr.need
Performance judged by others    6.6        7.9           5.8        6.2
Inanimate danger                4.0        1.5           1.1        1.4
Alone in woods                  3.0        1.9           0.9        0.9
7
Is the solution stable?
Is the solution ‘reliable’? Would it also hold for the population?
Kiers & Van Mechelen report split-half stability results,
but split-half results are rather global stability measures
8
How can we assess degree of
stability/reliability of individual results?
→ confidence intervals (CI) for all parameters
• not readily available
• derivable under rather strong assumptions
(e.g., normal distributions, full identification)
• alternative:
BOOTSTRAP
9
BOOTSTRAP
• distribution free
• very flexible (CI’s for ‘everything’)
• can deal with nonunique solutions
• computationally intensive
10
Bootstrap procedure:
Analyze sample data X (I×J×K) by desired method
→ sample outcomes (e.g., A, B, C and G)
Repeat for b = 1:500
• draw sample with replacement from the I slabs of X → Xb (I×J×K)
• analyze bootstrap sample Xb in same way as sample
→ outcomes for b (e.g., Ab, Bb, Cb and Gb)
For each individual parameter:
• 500 values available
• range of 95% middle values
→ “95% percentile interval”
(≈ Confidence Interval)
11
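The resampling loop above can be sketched as follows; `fit_method` is a hypothetical stand-in for the actual three-way analysis (here it just returns column means so the example runs), and all sizes are illustrative assumptions.

```python
import numpy as np

# Sketch of the percentile-bootstrap loop, resampling A-mode slabs.
def fit_method(X):
    return X.mean(axis=0)              # placeholder "outcome parameters"

rng = np.random.default_rng(1)
I, J, K = 40, 4, 3
X = rng.standard_normal((I, J, K))     # sample data: I subject slabs

theta = fit_method(X)                  # sample outcomes
boot = []
for b in range(500):
    idx = rng.integers(0, I, size=I)   # resample subjects with replacement
    boot.append(fit_method(X[idx]))    # analyze Xb in the same way as X
boot = np.asarray(boot)                # 500 values per parameter

# 95% percentile interval: the range of the middle 95% of the 500 values
lo, hi = np.percentile(boot, [2.5, 97.5], axis=0)
print(lo.shape)                        # (4, 3): one interval per parameter
```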
Basic idea of bootstrap:
• distribution in sample = nonparametric maximum
likelihood estimate of population distribution
• draw samples from estimated population distribution,
just as actual sample drawn from population
From which mode do we resample?
Answer: mimic how we sampled from population
• sample subjects from population
→ resample A-mode
12
Three questions:
• How to deal with transformational nonuniqueness?
Lots of possibilities, depends on interpretation
• Are bootstrap intervals good approximations
of confidence intervals?
Not too bad
• How to deal with computational problems (if any)?
Simple effective procedure
13
1. How to deal with transformational nonuniqueness?
• identify solution completely
• identify solution up to permutation/reflection
  → for CP and Tucker3
• identify solution up to orthogonal transformations
• identify solution up to arbitrary nonsingular
  transformations
  → only for Tucker3
14
Identify solution completely:
→ uniquely defined outcome parameters
→ bootstrap straightforward (CI’s directly available)
CP and Tucker3 (principal axes or simple structure):
- solution identified up to scaling/permutation
Both cases:
- further identification needed
15
Identify solution up to permutation/reflection
→ outcome parameters for bootstrap b may differ much,
  but maybe only due to ordering or sign
→ bootstrap CI’s unrealistically broad!
→ how to make the b’s comparable?
Solution:
→ reorder and reflect columns in (e.g.) Bb, Cb
  such that Bb, Cb optimally resemble B, C
  (does not affect fit)
16
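The reorder-and-reflect step can be sketched as a greedy match on column cross-products. The exact matching criterion is not specified on the slide, so the greedy rule and the small matrices below are illustrative assumptions.

```python
import numpy as np

# Sketch: make a bootstrap loading matrix Bb comparable to the sample B
# by reordering and reflecting its columns (greedy on absolute cross-products).
def match_columns(Bb, B):
    S = B.T @ Bb                       # cross-products between columns
    Q = B.shape[1]
    perm = np.full(Q, -1)
    signs = np.ones(Q)
    free = set(range(Q))
    for q in range(Q):                 # greedily pair each sample column
        j = max(free, key=lambda c: abs(S[q, c]))
        perm[q] = j
        signs[q] = np.sign(S[q, j]) or 1.0
        free.remove(j)
    return Bb[:, perm] * signs         # reordered and reflected copy

B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Bb = np.array([[0.1, -0.9], [0.9, 0.1], [1.0, -1.1]])  # columns swapped & reflected
print(match_columns(Bb, B))
```

After matching, the permuted/reflected Bb resembles B column by column, so the 500 bootstrap values per parameter are comparable.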
(e.g., two equally strong components → unstable order)

                     pros                     cons
Completely           direct bootstrap CI’s    takes orientation, order
identified                                    (too?!) seriously
Identified up to     more realistic solution  cannot fully mimic sample
perm./refl.                                   & analysis process
17
Intermezzo
What can go wrong when you take orientation too seriously?
Two-way example data:
• 100 × 8 data set
• PCA: 2 components
• eigenvalues: 4.04, 3.96, 0.0002, …
  (first two close to each other)
[Figure: PCA (unrotated) solutions for the variables (a,b,c,d,e,f,g,h), with bootstrap 95% confidence ellipses*]
*) thanks to program by Patrick Groenen (procedure by Meulman & Heiser, 1983)
18
What caused these enormous ellipses?
Look at loadings for data and some bootstraps:

        Data          Bootstrap 1    Bootstrap 2    Bootstrap 3
a    -0.6  0.8       -0.6  0.8      -1.0 -0.3       0.8  0.6
b    -0.8  0.7       -0.7  0.7      -0.9 -0.4       0.7  0.7
c    -0.5  0.9       -0.5  0.9      -1.0 -0.2       0.9  0.5
d    -0.8  0.6       -0.8  0.6      -0.8 -0.6       0.6  0.8
e    -0.8 -0.6       -0.8 -0.6       0.3 -1.0      -0.7  0.7
f    -0.7 -0.7       -0.7 -0.7       0.5 -0.9      -0.8  0.6
g    -0.9 -0.5       -0.9 -0.5       0.2 -1.0      -0.6  0.8
h    -0.6 -0.8       -0.7 -0.8       0.6 -0.8      -0.9  0.5
… leading to standard errors:

     Loadings       Bootstrap-based standard errors
a    -0.6  0.8        0.6  0.6
b    -0.8  0.7        0.5  0.5
c    -0.5  0.9        0.5  0.6
d    -0.8  0.6        0.6  0.5
e    -0.8 -0.6        0.6  0.5
f    -0.7 -0.7        0.5  0.5
g    -0.9 -0.5        0.5  0.6
h    -0.6 -0.8        0.6  0.5
19
Conclusion: solutions very unstable,
hence: loadings seem very uncertain
However ….
Configurations of subsamples very similar
So: We should’ve considered the whole configuration !
20
Identify solution up to orthogonal transformations
Tucker3 solution with A, B, C columnwise orthonormal:
→ any rotation gives same fit (counterrotation of core)
→ outcome parameters for bootstrap b may differ much,
  but maybe only due to coincidental ‘orientation’
→ bootstrap CI’s unrealistically broad
Make the b’s comparable:
→ rotate Bb, Cb, Gb such that they optimally resemble B, C, G
  (comparable across bootstraps)
How?
• minimize f1(T) = ||BbT − B||² and f2(U) = ||CbU − C||²
• counterrotate core: Gb(U ⊗ T)
• minimize f3(S) = ||SGb − G||²
• use Bb* = BbT, Cb* = CbU, Gb* = SGb to determine 95% CI’s
21
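The minimizer of ||BbT − B||² over orthonormal T has a closed form: T = UV′ from the SVD of Bb′B, the classical orthogonal Procrustes rotation. A minimal numpy sketch, with sizes and data made up for illustration:

```python
import numpy as np

# Sketch of the orthogonal matching step: T = U V' from the SVD of Bb' B
# minimizes ||Bb T - B||^2 over orthonormal T (orthogonal Procrustes).
def procrustes_rotation(Bb, B):
    U, _, Vt = np.linalg.svd(Bb.T @ B)
    return U @ Vt                      # orthonormal T minimizing ||Bb T - B||^2

rng = np.random.default_rng(2)
B = rng.standard_normal((14, 4))       # sample loadings (illustrative sizes)
T_true, _ = np.linalg.qr(rng.standard_normal((4, 4)))
Bb = B @ T_true.T                      # a 'bootstrap' B rotated away from B

T = procrustes_rotation(Bb, B)
print(np.allclose(Bb @ T, B))          # rotation recovered: True
```

The same computation gives U for Cb, after which the core is counterrotated as Gb(U ⊗ T) and matched to G via S.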
Notes:
• first choose orientation of sample solution
(e.g., principal axes or other)
• order of rotations (first B and C, then G):
somewhat arbitrary, but may have effect
22
Identify solution up to nonsingular transformations
....analogously.....
→ transform Bb, Cb, Gb so as to optimally resemble B, C, G
23
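In the nonsingular (oblique) case, dropping the orthonormality constraint turns the matching problem ||BbT − B||² into ordinary least squares. A sketch under the same illustrative assumptions as before:

```python
import numpy as np

# Sketch of the oblique analogue: without orthonormality constraints the
# minimizer of ||Bb T - B||^2 is the ordinary least-squares solution.
rng = np.random.default_rng(3)
B = rng.standard_normal((14, 4))       # sample loadings (illustrative sizes)
T_true = rng.standard_normal((4, 4))   # arbitrary nonsingular transformation
Bb = B @ np.linalg.inv(T_true)         # a 'bootstrap' B transformed away from B

T, *_ = np.linalg.lstsq(Bb, B, rcond=None)
print(np.allclose(Bb @ T, B))          # True: B recovered exactly here
```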
Expectation:
the more transformational freedom used in the bootstraps,
→ the smaller the CI’s
Example:
• anxiety data set
  (140 subjects, 14 scales, 11 situations)
• apply 4 bootstrap procedures
• compare bootstrap se’s of all outcomes
24
Some summary results:

Bootstrap Method    mean se(B)   mean se(C)   mean se(G)
Principal Axes         .085         .101         3.84
Simple Structure       .085         .093         2.77
Orthog. Matching       .059         .088         2.20
Oblique Matching       .055         .076         2.17
25
Now, what CI’s did we find for the Anxiety data?
Plot of confidence ellipses for first two and last two B components
26
Confidence intervals for
Situation Loadings
27
Confidence intervals for
Highest Core Values
28
29
2. Are bootstrap intervals good approximations
of Confidence Intervals?
A 95% CI should contain the population value in 95% of samples
→ “coverage” should be 95%
Answered by SIMULATION STUDY
Set-up:
• construct population with Tucker3/CP structure + noise
• apply Tucker3/CP to population → population parameters
• draw samples from population
• apply Tucker3/CP to each sample and construct bootstrap CI’s
• check coverage: how many CI’s contain the population parameter
30
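The coverage check can be sketched in miniature: a scalar mean stands in for a model parameter, and the sample/bootstrap counts are scaled down from the study's design (all numbers below are illustrative assumptions).

```python
import numpy as np

# Miniature sketch of a coverage check for 95% percentile intervals.
rng = np.random.default_rng(4)
pop = rng.standard_normal(100_000)     # constructed 'population'
target = pop.mean()                    # known population parameter

hits = 0
n_samples = 200
for s in range(n_samples):
    x = rng.choice(pop, size=50)       # draw a sample from the population
    boot = [rng.choice(x, size=50).mean() for b in range(200)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    hits += lo <= target <= hi         # does the 95% CI cover the parameter?

coverage = hits / n_samples
print(coverage)                        # should land close to 0.95
```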
Design of simulation study:
• noise: low, medium, high
• sample size (I): 20, 50, 100
• 6 size conditions:
  (J = 4, 8; K = 6, 20; core: 2×2×2, 3×3×3, 4×3×2)
Other choices:
• number of bootstraps: always 500
• number of populations: 10
• number of samples: 10
Each cell: 10×10×500 = 50,000 Tucker3 or CP analyses
(full design: 3×3×6 = 54 conditions)
31
Coverage results (should be close to 95%):

Procedure                                           B      C      G
CANDECOMP/PARAFAC                                  95%    94%     -
Tucker3 principal axes,
  bootstraps permuted/reflected                    91%    85%    87%
Tucker3 simple structure,
  bootstraps permuted/reflected                    92%    94%    92%
Tucker3 simple structure,
  bootstraps optimally rotated                     94%    93%    93%
Tucker3 simple structure,
  bootstraps optimally transformed                 95%    94%    95%
Tucker3 principal axes,
  bootstraps permuted/reflected,
  nonsimple B and C used in construction           95%    94%    94%
32
Some details:
ranges of values per cell in design (and associated se’s)

Procedure              B                    C                     G
CP                     91.5–98.2 (1.4,.3)   91.8–97.4 (.4,.5)      -
Tucker3 princ          81.2–96.8 (1.8,.7)   72.2–92.1 (2.0,1.9)   75.8–92.8 (3.2,1.6)
Tucker3 simp           84.3–95.8 (.8,.5)    91.8–96.5 (.5,.6)     88.8–96.8 (1.6,.4)
Tucker3 rotated        84.2–95.8 (.8,.5)    92.0–95.7 (.5,.7)     86.0–95.0 (1.9,.9)
Tucker3 transformed    91.1–98.1 (1.9,.5)   92.1–95.6 (.5,.7)     89.4–96.8 (1.5,.6)
Tucker3 nonsimple
  B and C              90.7–98.1 (1.2,.5)   86.4–97.2 (2.8,.5)    85.4–98.3 (2.0,.3)
33
3. How to deal with computational problems (if any)?
Is there a problem?
Computation times per 500 bootstraps:
(Note: largest data size: 100 × 8 × 20)
CP:                     min 4 s, max 452 s
Tucker3 (SimpStr):      min 3 s, max 30 s
Tucker3 (OrthogMatch):  min 1 s, max 23 s
Problem most severe with CP
34
How to deal with computational problems for CP?
Idea: start bootstraps from the sample solution
Problem: may detract from coverage
Tested by simulation:
• CP with 5 different starts per bootstrap
vs
• fast bootstrap procedure
35
Results:
Fast method about 6 times faster (as expected)

Coverage
Optimal method:   B: 95.5%   C: 95.1%
Fast method:      B: 95.3%   C: 94.7%

• Time gain enormous
• Coverage hardly different
36
Conclusion & Discussion
• Bootstrap CI’s seem reasonable
• Matching makes intervals smaller
• Computation times for Tucker3 acceptable,
for CP can be decreased by starting each
bootstrap in sample solution
37
Conclusion & Discussion
• What do bootstrap CI’s mean in case of matching?
• 95% confidence for each value?
  - chance capitalization
  - ignores dependence of parameters
    (they vary together)
• Show dependence by bootstrap movie...!?!
  (some first tests show that this works)
• Develop joint intervals (hyperspheres)...?
  (some first tests show that this does not work)
• Sampling from two modes (e.g., A and C)?
38