Modelling health care costs - The University of Sheffield

Modelling health care costs:
practical examples and applications
Andrew Briggs
Philip Clarke
University of Oxford
&
Daniel Polsky
Henry Glick
University of Pennsylvania
Modelling health care costs:
Presentation overview
• Statement of problem
• Examples of cost distributions
– Overall
– By treatment group
• Testing cost differences
– Raw scale
– Transformations
– Back transformation
• Multivariate analysis
– Raw scale
– Transformation
• Summary/future directions
Modelling health care costs:
Statement of problem
• Common to collect cost data in clinical
trials
• Cost data almost always skewed and may
exhibit substantial kurtosis
• Nevertheless, arithmetic means are the
concern of decision makers
– Only the mean can be used to estimate total cost
of care
– Only total cost of care will lead to balanced
budgets
• Cost models have a role beyond the simple
estimation of within trial analysis
– May be used to generalise to broader
populations
– May be used for sub-group analysis
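A small numerical sketch of why the arithmetic mean is the budget-relevant statistic. The costs below are made up for illustration and do not come from any of the trials in this presentation.

```python
# Hypothetical per-patient costs with one expensive outlier (illustration only).
costs = [200, 250, 300, 350, 400, 500, 9000]

n = len(costs)
total = sum(costs)
mean = total / n
median = sorted(costs)[n // 2]  # n is odd here, so this is the middle value

# The mean times the number of patients recovers the total cost of care;
# the median times the number of patients does not.
budget_from_mean = mean * n
budget_from_median = median * n
print(total, round(budget_from_mean), budget_from_median)
```

With skewed data the median sits well below the mean, so a budget based on the median would fall short of the total cost actually incurred.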
Modelling health care costs:
Examples of cost distributions
[Figure: histograms of cost for six datasets (x-axis: cost; y-axis: fraction). Panels: 1. BOAES, 2. UKPDS, 3. ACT, 4. Dan, 5. SAH, 6. HG.]
Modelling health care costs:
Cost distributions by treatment
[Figure: histograms of cost by treatment arm, with separate control group and treatment group panels for each dataset (x-axis: cost; y-axis: fraction). Datasets: 1. BOAES, 2. UKPDS, 3. ACT, 4. Dan, 5. SAH, 6. HG.]
Approaches for testing cost differences
• Parametric T-test or nonparametric
bootstrap on untransformed cost
– Both unbiased
– Inefficient?
• (Log) transformation of cost
– Straight retransformation biased
– Use $E[C_i] = \exp(\mu + 0.5\sigma^2)$
– Or non-parametric smearing: $E[C_i] = \exp(\mu) \cdot \frac{1}{N} \sum_{j=1}^{N} \exp(\varepsilon_j)$
• Generalised linear models
– Log link, normal distribution: $\ln(E[C_i]) = \alpha + \beta t_i$
– Expectation modelled directly so no
retransformation problem
– Wide variety of possible link
function/distributions
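The retransformation options listed above can be compared in a minimal sketch. The data are simulated log-normal costs with hypothetical parameters, and the model has an intercept only; the smearing step follows Duan's estimator (mean of the exponentiated log-scale residuals), which is an assumption about what the slide means by non-parametric smearing.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated log-normal costs (hypothetical parameters, for illustration only).
mu, sigma = 7.0, 1.2
cost = rng.lognormal(mean=mu, sigma=sigma, size=5000)

log_cost = np.log(cost)
mu_hat = log_cost.mean()
resid = log_cost - mu_hat          # residuals on the log scale
s2 = resid.var(ddof=1)             # estimated log-scale variance

naive = np.exp(mu_hat)                            # straight retransformation (geometric mean)
lognormal_adj = np.exp(mu_hat + 0.5 * s2)         # exp(mu + 0.5 * sigma^2)
smeared = np.exp(mu_hat) * np.exp(resid).mean()   # smearing factor applied to exp(mu)

print(f"sample mean        : {cost.mean():.0f}")
print(f"naive exp(mu)      : {naive:.0f}")
print(f"exp(mu + 0.5 s^2)  : {lognormal_adj:.0f}")
print(f"smeared            : {smeared:.0f}")
```

In this intercept-only case the smeared estimate reproduces the sample mean exactly, while the straight retransformation returns the geometric mean and understates it.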
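Zhou’s test based on log normality
$H_0: \exp(\mu_1 + 0.5\sigma_1^2) = \exp(\mu_0 + 0.5\sigma_0^2)$
$\Leftrightarrow H_0: \mu_1 + 0.5\sigma_1^2 = \mu_0 + 0.5\sigma_0^2$
$\Leftrightarrow H_0: (\mu_1 + 0.5\sigma_1^2) - (\mu_0 + 0.5\sigma_0^2) = 0$
$\Leftrightarrow H_0: \mu_1 - \mu_0 = 0$ iff $\sigma_0^2 = \sigma_1^2$.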
• Special case of homogeneity of log
variances – test of geometric means is
equivalent to test of arithmetic means
• By symmetry: for special case of
homogeneity of log means – test of
equality of log variances is equivalent to
test of arithmetic means?
• Zhou’s proposed test combines the two
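A sketch of a Z-score version of this test, assuming costs are log-normal within each arm. The variance term uses the normal-theory approximation Var(0.5 s^2) ≈ 0.5 s^4 / (n - 1). The function name and the simulated data are hypothetical, and the tables in this presentation report a bootstrap version of Zhou's test, which this sketch does not reproduce.

```python
import math
import numpy as np

def zhou_z_test(cost1, cost0):
    """Z test of H0: mu1 + 0.5*sigma1^2 = mu0 + 0.5*sigma0^2 on the log scale."""
    y1, y0 = np.log(cost1), np.log(cost0)
    n1, n0 = len(y1), len(y0)
    m1, m0 = y1.mean(), y0.mean()
    v1, v0 = y1.var(ddof=1), y0.var(ddof=1)

    diff = (m1 + 0.5 * v1) - (m0 + 0.5 * v0)
    # Variance of the log mean plus approximate variance of half the log variance.
    se = math.sqrt(v1 / n1 + v0 / n0
                   + 0.5 * (v1 ** 2 / (n1 - 1) + v0 ** 2 / (n0 - 1)))
    z = diff / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Hypothetical arms with equal log means but different log variances.
rng = np.random.default_rng(3)
control = rng.lognormal(mean=7.0, sigma=0.8, size=500)
treat = rng.lognormal(mean=7.0, sigma=1.3, size=500)
z, p = zhou_z_test(treat, control)
print(f"z = {z:.2f}, p = {p:.4f}")
```

The simulated arms share a geometric mean, so a test on log means alone would miss the difference in arithmetic means that the combined statistic picks up through the log variances.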
P-values and confidence intervals for
back-transformed cost differences
Dataset / method            P-value    Cost diff    (95% CI)
1. BOAE
  T-test: raw cost          0.013      149          (31 to 267)
  Bootstrapped means        0.012      149          (44 to 255)
  Zhou (bootstrap)          0.026      107          (21 to 191)
  Log (smeared)             <0.001     212          (146 to 278)
  GLM: log link / normal    0.019      149          (26 to 259)
  GLM: B-C link / normal    0.019      149          (26 to 260)
2. UKPDS
  T-test: raw cost          0.971      0            (-8 to 8)
  Bootstrapped means        0.938      0            (-8 to 8)
  Zhou (bootstrap)          0.988      0            (-14 to 12)
  Log (smeared)             0.165      5            (-2 to 13)
  GLM: log link / normal    0.971      0            (-8 to 8)
  GLM: B-C link / normal    0.971      0            (-8 to 8)
3. ACT
  T-test: raw cost          0.179      -15,523      (-38,248 to 7,201)
  Bootstrapped means        0.172      -15,523      (-37,854 to 6,665)
  Zhou (bootstrap)          0.005      -136,747     (-611,607 to -21,014)
  Log (smeared)             0.393      14,162       (-18,321 to 47,530)
  GLM: log link / normal    0.185      -15,523      (-37,212 to 7,790)
  GLM: B-C link / normal    0.182      -15,523      (-37,458 to 7,509)
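The "Bootstrapped means" rows can be reproduced in outline with a non-parametric bootstrap of the difference in arithmetic means. This is a minimal sketch: the percentile interval, the number of replicates, the function name and the simulated gamma costs are all assumptions, since the slides do not state which bootstrap interval was used.

```python
import numpy as np

def bootstrap_mean_diff(treat, control, n_boot=2000, seed=0):
    """Percentile bootstrap CI for the difference in arithmetic mean cost."""
    rng = np.random.default_rng(seed)
    treat = np.asarray(treat, float)
    control = np.asarray(control, float)
    diffs = np.empty(n_boot)
    for b in range(n_boot):
        # Resample patients with replacement within each arm.
        t = rng.choice(treat, size=treat.size, replace=True)
        c = rng.choice(control, size=control.size, replace=True)
        diffs[b] = t.mean() - c.mean()
    lo, hi = np.percentile(diffs, [2.5, 97.5])
    return treat.mean() - control.mean(), (lo, hi)

# Hypothetical right-skewed costs for two arms.
rng = np.random.default_rng(42)
control = rng.gamma(shape=2.0, scale=500.0, size=300)
treat = rng.gamma(shape=2.0, scale=600.0, size=300)
diff, (lo, hi) = bootstrap_mean_diff(treat, control)
print(f"difference {diff:.0f}, 95% CI ({lo:.0f} to {hi:.0f})")
```

Resampling is done within each arm so that the arm sizes of the original trial are preserved in every replicate.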
P-values and confidence intervals for
back-transformed cost differences
Dataset / method            P-value    Cost diff    (95% CI)
4. DP
  T-test: raw cost          0.057      2,925        (-91 to 5,940)
  Bootstrapped means        0.058      2,925        (-97 to 5,807)
  Zhou (bootstrap)          <0.001     114,565      (64,023 to 194,871)
  Log (smeared)             <0.001     -8,589       (-13,277 to -4,413)
  GLM: log link / normal    0.073      2,925        (-297 to 5,675)
  GLM: B-C link / normal    0.071      2,925        (270 to 5,701)
5. SAH
  T-test: raw cost          0.236      -4,060       (-10,795 to 2,675)
  Bootstrapped means        0.230      -4,060       (-10,836 to 2,575)
  Zhou (bootstrap)          0.119      -4,019       (-9,429 to 2,170)
  Log (smeared)             0.004      -6,701       (-11,128 to -2,229)
  GLM: log link / normal    0.243      -4,060       (-10,506 to 2,881)
  GLM: B-C link / normal    0.244      -4,060       (-10,484 to 2,909)
6. HG
  T-test: raw cost          0.077      2,353        (-259 to 4,965)
  Bootstrapped means        0.080      2,353        (-200 to 4,959)
  Zhou (bootstrap)          0.468      1,258        (-1,873 to 4,388)
  Log (smeared)             0.024      2,891        (394 to 5,397)
  GLM: log link / normal    0.081      2,353        (-298 to 4,899)
  GLM: B-C link / normal    0.081      2,353        (-295 to 4,903)
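In every dataset the GLM rows report the same cost difference as the raw t-test. That is what one would expect when the treatment indicator is the only regressor: the model is saturated, so the fitted mean in each arm equals that arm's arithmetic mean whatever the link. A minimal sketch with a hand-written IRLS fit for a log link and normal family follows; the function name and the simulated data are hypothetical, and a statistics package would normally be used instead.

```python
import numpy as np

def fit_log_link_normal(y, X, n_iter=50):
    """IRLS for a GLM with log link and normal (constant-variance) family."""
    y = np.asarray(y, float)
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(y.mean())        # start at the overall mean (column 0 = intercept)
    for _ in range(n_iter):
        eta = X @ beta
        mu = np.exp(eta)
        z = eta + (y - mu) / mu       # working response
        w = mu ** 2                   # working weights for log link, constant variance
        WX = X * w[:, None]
        beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)
        converged = np.max(np.abs(beta_new - beta)) < 1e-10
        beta = beta_new
        if converged:
            break
    return beta

# Hypothetical right-skewed costs for two arms.
rng = np.random.default_rng(7)
control = rng.gamma(2.0, 500.0, size=200)
treat = rng.gamma(2.0, 600.0, size=200)
y = np.concatenate([control, treat])
t = np.concatenate([np.zeros(200), np.ones(200)])
X = np.column_stack([np.ones_like(t), t])

beta = fit_log_link_normal(y, X)
glm_diff = np.exp(beta[0] + beta[1]) - np.exp(beta[0])
raw_diff = treat.mean() - control.mean()
print(f"GLM difference {glm_diff:.2f}, raw difference {raw_diff:.2f}")
```

Because the expectation is modelled directly, no retransformation step is needed to get back to the cost scale.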
Approaches to model selection
• Examine fit using standard
regression diagnostics
– R², normal probability plots, etc.
– Summarises fit to observed data
• Test the predictive ability of the
models directly
– Ability to predict observations
not used in model fitting
Predictive ability of the models
A simulation experiment
1. Sample was split into two equal parts
• Part i designated ‘training sub-sample’
• Part ii designated ‘test sub-sample’
2. Each model fitted using the training sub-sample and costs predicted for the test sub-sample
3. Mean square error calculated for each model
Process repeated in 10,000 trials
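The steps above can be sketched as follows. The data, the single covariate and the two models compared are hypothetical, the split is assumed to be redrawn at random in every trial, and the number of trials is cut from 10,000 to 200 to keep the example fast.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: one covariate x and right-skewed positive costs.
n = 400
x = rng.normal(size=n)
cost = np.exp(6.0 + 0.5 * x + rng.normal(scale=0.8, size=n))
X = np.column_stack([np.ones(n), x])

def predict_ols(Xtr, ytr, Xte):
    beta, *_ = np.linalg.lstsq(Xtr, ytr, rcond=None)
    return Xte @ beta

def predict_log_smeared(Xtr, ytr, Xte):
    beta, *_ = np.linalg.lstsq(Xtr, np.log(ytr), rcond=None)
    resid = np.log(ytr) - Xtr @ beta
    return np.exp(Xte @ beta) * np.exp(resid).mean()  # smearing retransformation

models = {"OLS (cost)": predict_ols, "OLS log(cost) smeared": predict_log_smeared}
n_trials = 200
mse = {name: np.empty(n_trials) for name in models}

for trial in range(n_trials):
    perm = rng.permutation(n)
    train, test = perm[: n // 2], perm[n // 2:]   # two equal parts
    for name, predict in models.items():
        pred = predict(X[train], cost[train], X[test])
        mse[name][trial] = np.mean((cost[test] - pred) ** 2)

for name, values in mse.items():
    mean_mse = values.mean()
    se = values.std(ddof=1) / np.sqrt(n_trials)
    print(f"{name:24s} MSE={mean_mse:12.0f} SE={se:10.0f} RMSE={np.sqrt(mean_mse):8.0f}")
```

Each model is scored only on observations it did not see during fitting, so the comparison reflects predictive ability and not fit to the observed data.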
Results of a simulation exercise
Model                                        MSE (mean)   MSE (SE)   RMSE
OLS (cost)                                   10466        53         102
OLS log(cost) no smearing                    16072        226        127
OLS log(cost) smeared                        47432        1489       218
OLS sqrt(cost) no smearing                   10821        55         104
OLS sqrt(cost) smearing                      10441        54         102
Poisson regression                           11427        70         107
2-part OLS (+ve cost)                        10467        53         102
2-part OLS log(+ve cost) no smearing         11298        54         106
2-part OLS log(+ve cost) smearing            11689        51         108
2-part OLS sqrt(+ve cost) no smearing        10616        55         103
2-part OLS sqrt(+ve cost) smearing           10429        54         102
Tobit                                        10757        51         104
MSE – mean squared error
SE – estimated standard error of the mean
RMSE – root mean squared error
P-values and confidence intervals for
back-transformed cost differences
Dataset / method            P-value    Cost diff    (95% CI)
1. BOAE
  T-test: raw cost          0.013      149          (31 to 267)
  Bootstrapped means        0.012      149          (44 to 255)
  Zhou (bootstrap)          0.026      107          (21 to 191)
  Log (smeared)             <0.001     212          (146 to 278)
  GLM: log link / normal    0.019      149          (26 to 259)
  Covar Adj raw cost        0.         180          (70 to 300)
  Covar Adj: Log(smeared)   0.         222          (126 to 338)
  Covar Adj GLM: log        0.         154          (-48 to 289)
2. UKPDS
  T-test: raw cost          0.971      0            (-8 to 8)
  Bootstrapped means        0.938      0            (-8 to 8)
  Zhou (bootstrap)          0.988      0            (-14 to 12)
  Log (smeared)             0.165      5            (-2 to 13)
  GLM: log link / normal    0.971      0            (-8 to 8)
  Covar Adj raw cost        0.         -1           (-9 to 6)
  Covar Adj: Log(smeared)   0.         0            (-7 to 8)
  Covar Adj GLM: log        0.         -2           (-13 to 7)
3. ACT
  T-test: raw cost          0.179      -15,523      (-38,248 to 7,201)
  Bootstrapped means        0.172      -15,523      (-37,854 to 6,665)
  Zhou (bootstrap)          0.005      -136,747     (-611,607 to -21,014)
  Log (smeared)             0.393      14,162       (-18,321 to 47,530)
  GLM: log link / normal    0.185      -15,523      (-37,212 to 7,790)
  Covar Adj raw cost        0.         -18,378      (-43,078 to 6,555)
  Covar Adj: Log(smeared)   0.         -12,602      (-47,687 to 24,271)
  Covar Adj GLM: log        0.         -25,230      (-57,500 to 7,039)
P-values and confidence intervals for
back-transformed cost differences
Dataset / method            P-value    Cost diff    (95% CI)
4. DP
  T-test: raw cost          0.057      2,925        (-91 to 5,940)
  Bootstrapped means        0.058      2,925        (-97 to 5,807)
  Zhou (bootstrap)          <0.001     114,565      (64,023 to 194,871)
  Log (smeared)             <0.001     -8,589       (-13,277 to -4,413)
  GLM: log link / normal    0.073      2,925        (-297 to 5,675)
  Covar Adj raw cost        0.         3,078        (125 to 6,102)
  Covar Adj: Log(smeared)   0.         3,649        (473 to 6,924)
  Covar Adj GLM: log        0.         3,364        (-984 to 8,149)
5. SAH
  T-test: raw cost          0.236      -4,060       (-10,795 to 2,675)
  Bootstrapped means        0.230      -4,060       (-10,836 to 2,575)
  Zhou (bootstrap)          0.119      -4,019       (-9,429 to 2,170)
  Log (smeared)             0.004      -6,701       (-11,128 to -2,229)
  GLM: log link / normal    0.243      -4,060       (-10,506 to 2,881)
  Covar Adj raw cost        0.         -3,289       (-9,394 to 3,073)
  Covar Adj: Log(smeared)   0.         -4,036       (-9,729 to 1,680)
  Covar Adj GLM: log        0.         -3,248       (-16,448 to 9,510)
6. HG
  T-test: raw cost          0.077      2,353        (-259 to 4,965)
  Bootstrapped means        0.080      2,353        (-200 to 4,959)
  Zhou (bootstrap)          0.468      1,258        (-1,873 to 4,388)
  Log (smeared)             0.024      2,891        (394 to 5,397)
  GLM: log link / normal    0.081      2,353        (-298 to 4,899)
  Covar Adj raw cost        0.         1,759        (-494 to 4,068)
  Covar Adj: Log(smeared)   0.         1,772        (-321 to 4,097)
  Covar Adj GLM: log        0.         1,540        (-1,067 to 4,132)
Modelling health care costs:
Summary
• Different approaches to modelling health
care cost can lead to quite different
estimates
• Difficult to tell which is most appropriate
• Transforming cost data can be more
efficient
– GLM intuitive in modelling expectations
– But modelling log cost better for heavy tails?
• Covariate adjustment can help precision
and should be used whenever possible
– Will be used to extrapolate beyond the data
– Creates sub-group effects with transformed
models
– Creates challenges for summarising incremental
cost across different covariate patterns
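The last two points can be illustrated with a small sketch. On a log scale a constant treatment coefficient is a constant cost ratio, so the absolute incremental cost differs across covariate patterns. All coefficients, the two age groups and the population shares below are hypothetical, and averaging over the covariate distribution is shown as one possible summary, not as the authors' method.

```python
import numpy as np

# Hypothetical log-scale model: log E[cost] = b0 + b_treat * t + b_age * age_group
b0, b_treat, b_age = 7.0, 0.10, 0.50

age_group = np.array([0, 1])     # two covariate patterns (illustrative)
share = np.array([0.7, 0.3])     # assumed population share of each pattern

cost_control = np.exp(b0 + b_age * age_group)
cost_treated = np.exp(b0 + b_treat + b_age * age_group)

# Same cost ratio in both patterns, but a different absolute increment.
increment = cost_treated - cost_control

# One possible summary: average the pattern-specific increments over the
# covariate distribution of the target population.
average_increment = np.sum(share * increment)

for g, inc in zip(age_group, increment):
    print(f"age group {g}: incremental cost {inc:.0f}")
print(f"population-average incremental cost {average_increment:.0f}")
```

The summary depends on the covariate distribution chosen, which is why extrapolating to a population with a different case mix changes the incremental cost even when the model coefficients are unchanged.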
Modelling health care costs:
Log cost distributions by treatment
[Figure: histograms of the natural log of cost by treatment arm, with separate control group and treatment group panels for each dataset (x-axis: natural log of cost; y-axis: fraction). Datasets: 1. BOAES, 2. UKPDS, 3. ACT, 4. Dan, 5. SAH, 6. HG.]