
Reliability for Peer Grouping
Eric Schone, Ph.D.
Mathematica Policy Research
Reliability Workgroup Charter

Minnesota statute 62U.04 requires the Commissioner to consult with providers, health plans, purchasers, and consumers regarding an appropriate minimum reliability threshold for the peer grouping results.

The purpose of this group is for members to provide advice about how to consider reliability for the peer grouping analysis, and for MDH to convey information about reliability related to specific methodological issues.
Reliability and Validity

From the charter
– Validity refers to the accuracy of a measurement and reliability refers to the consistency of a measurement

A measure is valid if its value reflects what it is supposed to be measuring
– Quality of care for quality measures
– Impact on cost of care for efficiency measures

A measure is reliable if changes in the value of the measure reflect changes in performance
– Differences between providers are real
– Differences over time indicate improvement or worsening
What Makes a Measure Valid?

Accurate record-keeping
The numerator and denominator are correctly defined
Examples
– Free throw percentage
• Clearly defined numerator, denominator, no measurement error
• Poor measure of offensive impact
– Batting average
• Alternative measures of offensive production
• Choosing a pinch hitter
– Risk adjusted outcomes
• Mortality
• Cost
What Makes a Measure Reliable?

Variation
– Variation between providers is substantial
– Random variation is small

Example
– Batting average vs “Clutch-hitting” average
Some Reliability Formulas

Intraclass Correlation R = τ²/(τ² + σ²/N)
– Increases with τ² (variance between providers)
– Decreases with σ² (variance within measure)
– Increases with N (number of patients)

Minimum sample size to meet a reliability threshold R: N = Rσ²/(τ²(1 − R)) (worked sketch below)

Risk of misclassification: Complicated
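
A minimal worked sketch of the two formulas above, in Python. The variance components tau2 and sigma2 are invented for illustration and are not values from the peer grouping analysis.

import math

# Hypothetical variance components, for illustration only
# (not values from the peer grouping analysis).
tau2 = 0.04    # between-provider variance
sigma2 = 1.0   # within-provider (per-patient) variance

def reliability(n_patients, tau2, sigma2):
    """Intraclass correlation: R = tau^2 / (tau^2 + sigma^2 / N)."""
    return tau2 / (tau2 + sigma2 / n_patients)

def min_sample_size(r_threshold, tau2, sigma2):
    """Smallest N with reliability >= r_threshold: N = R*sigma^2 / (tau^2*(1 - R))."""
    return math.ceil(r_threshold * sigma2 / (tau2 * (1 - r_threshold)))

for r in (0.5, 0.7, 0.9):
    n = min_sample_size(r, tau2, sigma2)
    print(f"R >= {r}: N >= {n}  (check: reliability at N={n} is {reliability(n, tau2, sigma2):.3f})")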
Reliability Trade-Offs

Winsorizing can reduce both between and within variance (illustrated in the sketch at the end of this slide)

Increasing the reporting threshold decreases the number of providers reported

Increasing the number of providers classified increases the risk of misclassification
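
A minimal simulation sketch of the winsorizing trade-off, in Python, with invented numbers: per-patient costs are drawn from a skewed distribution with provider-specific means, capped at the 5th and 95th percentiles of the pooled data, and the between- and within-provider variances and implied reliability are compared before and after. None of the parameters come from the peer grouping data.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 100 providers x 50 patients, skewed (lognormal) costs
# with provider-specific means. All parameters are invented for illustration.
n_providers, n_patients = 100, 50
provider_effect = rng.normal(0.0, 0.3, size=(n_providers, 1))
costs = np.exp(7.5 + provider_effect + rng.normal(0.0, 0.8, size=(n_providers, n_patients)))

def variance_components(x):
    """Method-of-moments estimates of within-provider variance (sigma^2)
    and between-provider variance (tau^2) for a balanced panel."""
    n = x.shape[1]
    sigma2 = x.var(axis=1, ddof=1).mean()
    tau2 = max(x.mean(axis=1).var(ddof=1) - sigma2 / n, 0.0)
    return tau2, sigma2

# Winsorize: cap every cost at the 5th and 95th percentiles of the pooled data.
lo, hi = np.percentile(costs, [5, 95])
winsorized = np.clip(costs, lo, hi)

for label, data in [("raw", costs), ("winsorized", winsorized)]:
    tau2, sigma2 = variance_components(data)
    r = tau2 / (tau2 + sigma2 / n_patients)
    print(f"{label:>10}: between = {tau2:,.0f}  within = {sigma2:,.0f}  reliability = {r:.2f}")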
Example: Reliability, sample size and misclassification

High = Top 25%
Reliability   Sample size   Misclassified   False positive   False positive/positives
.4            20            22%             11%              44%
.5            30            21%             10%              40%
.7            70            15%             8%               31%
.9            270           8%              4%               16%

High = Top 50%
Reliability   Sample size   Misclassified   False positive   False positive/positives
.4            20            30%             15%              30%
.5            30            25%             13%              25%
.7            70            15%             8%               15%
.9            270           5%              3%               5%
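
The presentation does not state how the table's figures were produced; the sketch below shows one way such rates can be approximated by simulation, assuming normally distributed true and observed provider scores and defining a false positive as a share of all providers. These assumptions belong to the sketch, not the presentation.

import numpy as np

rng = np.random.default_rng(0)
n_providers = 200_000   # large, so simulated rates are stable

def misclassification(reliability, top_share):
    """Simulate classifying the top `top_share` of providers as 'high' when
    observed scores have the given reliability (true-score variance set to 1)."""
    noise_var = (1 - reliability) / reliability          # sigma^2/N implied by R
    true = rng.normal(0.0, 1.0, n_providers)
    observed = true + rng.normal(0.0, np.sqrt(noise_var), n_providers)
    truly_high = true >= np.quantile(true, 1 - top_share)
    called_high = observed >= np.quantile(observed, 1 - top_share)
    misclassified = np.mean(truly_high != called_high)
    false_pos = np.mean(called_high & ~truly_high)       # share of all providers
    fp_among_pos = np.mean(~truly_high[called_high])     # share of those called high
    return misclassified, false_pos, fp_among_pos

for top_share in (0.25, 0.50):
    print(f"High = top {top_share:.0%}")
    for r in (0.4, 0.5, 0.7, 0.9):
        m, fp, fpp = misclassification(r, top_share)
        print(f"  R = {r}: misclassified {m:.0%}, false positive {fp:.0%}, FP/positives {fpp:.0%}")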
Conclusions

The reliability standard for reporting/including providers in results differs from the standard for classification

Reliability subject to trade-offs
– Reliability and validity goals can be at odds

Type of misclassification matters
– False negative may be preferred to false positive