Written Responses from RRT Members

Protecting, maintaining and improving the health of all Minnesotans
Date: April 15, 2011
To: Provider Peer Grouping Rapid Response Team members
From: Katie Burns, Health Economics Program
Subject: Quality Composite Measure Design
Thank you for participating in the Rapid Response Team. In preparation for our next
meeting, I wanted to distribute the attached memo and Excel tables from Mathematica
Policy Research, Inc. The memo summarizes our planned approach for constructing the
composite quality measure and outlines several issues for which we would like your
input:
• What method should be used to combine individual measures into subcomposite categories?
• How should missing data be treated within the subcomposite categories?
• How should missing subcomposite category scores be treated?
• How should subcomposite categories be weighted as part of the overall composite quality score?
We will review the memo during our meeting to ensure you have an opportunity to
clarify your understanding of the issues and to ask questions.
Response deadline: We will need your feedback on these issues by Tuesday, April 26
at 4:00 pm. Responses may be provided via email to [email protected].
MN Council of Health Plans: Sue Knudson
MDH Rapid Response Team
Peer Grouping Methodology
Quality Composite Measure Design
Thank you for the opportunity to review and provide input on the quality composite
measure design. Our feedback is outlined according to the approach for constructing
the composite.
1. What method should be used to combine individual measures into subcomposite categories?
The recommended approach of combining standardized scores for the measures within each sub-composite, standardizing each measure with a z-score, is thorough, but it introduces a moving target for performance instead of a stated goal. It is further complicated by the fact that the same performance level relative to the average can produce very different z-scores on two different measures. For example, measures with smaller variation will contribute more to the overall quality score than they would if each measure contributed equally. This relative influence makes the approach complicated and is likely to produce results that are difficult to explain to providers and consumers, because the results are not intuitive given measure-level performance. While the approach is good at highlighting breakout performers, it is likely to be perceived by providers and consumers as arbitrary, because it is difficult to explain and peer group rankings could run counter to results on individual performance measures.
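To illustrate the concern, here is a minimal sketch of a z-score sub-composite (the clinic rates and distributions below are hypothetical, and averaging the standardized scores is only one plausible reading of the proposed approach):

```python
import statistics

# Hypothetical statewide rate distributions for two measures in one sub-composite.
rates_a = [0.70, 0.73, 0.75, 0.77, 0.80]       # mean 0.75, wide variation
rates_b = [0.744, 0.747, 0.750, 0.753, 0.756]  # mean 0.75, very little variation

def z_score(rate, distribution):
    """Standardize a clinic's rate against the statewide mean and standard deviation."""
    return (rate - statistics.mean(distribution)) / statistics.stdev(distribution)

# The same clinic performs 2 percentage points above average on both measures.
clinic_rate = 0.77
z_a = z_score(clinic_rate, rates_a)   # about +0.5
z_b = z_score(clinic_rate, rates_b)   # about +4.2 for the identical 2-point gap

# The sub-composite averages whatever standardized scores are available.
subcomposite = statistics.mean([z_a, z_b])
print(f"z-scores: {z_a:.2f}, {z_b:.2f}; sub-composite: {subcomposite:.2f}")
```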
Our experience across many measurement areas is that we enjoy a high-performing network of providers in our state. We do not recommend this method because it would force some providers to look like relatively low performers when in actuality they are very strong performers. Based on this, we recommend reconsidering Option 3, which combines absolute rates. This approach aligns more closely with the other existing industry approaches you outline, and it also aligns more closely with the recently released federal ACO criteria. It handles topped-out measures without forcing some providers to look like relatively poor performers when in fact they are strong performers. The point scale can be simplified to accommodate reasonable yet meaningful targeted performance by measure. Well-thought-out targets are the key to communicating this approach and mitigating perceptions that it is arbitrary. This approach is more intuitive and easier to understand and communicate.
2. How should missing data be treated within the sub-composite categories?
The current recommendation to calculate a sub-composite if at least one measure is available will result in performance on one measure driving performance for an entire category for many providers. An unintended consequence of this approach is that asthma results could drive 60% of an overall score because of the significant weight placed on chronic conditions. This type of result could be very misleading because it is not representative of overall provider quality. On a similar note, in our experience it is never a good idea to “impute” a quality performance level for providers; quality is either demonstrated or absent. We also recommend including displays that are intuitive to understand and that list individual measure performance.
We recommend that each sub-composite require a majority, or at least half, of the available measures. We recommend use of the optimal/composite care scores for both hospital and physician care, not the component parts. As an alternative, using the component parts alone would be better than using both. However, the latter approach, like the MDH-recommended approach, will result in overall performance being swamped by these measures alone.
3. How should missing sub-composite category scores be treated?
We agree with the recommendation to require that all sub-composite category scores be present for inclusion in physician peer grouping. We will reserve comment on hospitals until a recommendation is proposed.
4. How should sub-composite categories be weighted as part of the overall
composite quality score?
Measures of health information technology (HIT) are missing. Given the emphasis on safety and care coordination, among other critical aspects, we suggest adding these measures for both physicians and hospitals. Minnesota providers have high adoption rates and should be given credit for early adoption and capabilities. HIT could be considered a sub-composite of its own, because it is more of a structure measure than a process or outcome measure.
Hospital measures appear to be disproportionately representative of the Medicare population, with 50% of the score being driven by measures sourced from CMS. Adding HIT measures would provide greater balance so that results are usable by other cohorts and purchasers.
Other comments:
Given the verbal comments on the RRT call and the inferences made in this memo regarding a potential change to peer-group physicians at the clinic level rather than the group level, we suggest revisiting the attribution rules. Physician-level enumeration is complicated by physicians practicing at multiple sites.
Last, in order to fully inform this methodology it would be most helpful to also
understand the final approach for stratifying results.
Thank you for the opportunity to provide input and suggest alternative methods for
consideration.
MN Medical Association: Janet Silversmith
Memorandum
To: Katie Burns, Minnesota Department of Health
From: Janet Silversmith
Date: April 26, 2011
Re: MMA Feedback on Rapid Response Team Quality Measurement Issues
On behalf of the Minnesota Medical Association (MMA), I appreciate the opportunity to
provide the following comments for your consideration with respect to the development
and incorporation of quality metrics into Minnesota’s provider peer grouping system.
Measure Selection
A total of 21 measures are proposed in three sub-composite areas – preventive services
(5 measures), short-term acute (3 measures), and chronic disease outcomes (13
measures). The MMA would note that this set of measures – although generally
reasonable for inclusion (see exception below) – still represents a fairly narrow and
limited picture of quality across all the types of care that primary care clinics deliver.
The MMA urges both clarity and caution when communicating the “total care” results
as these measures do not represent an overall statement on practice quality. This
challenge is more significant should the department attempt to create a quality and cost composite result, which would imply measurement of quality associated with those areas of care included in total cost, and vice versa. Given the limited available
measures, the MMA urges the department to modify its representation of “total
care” in its publication and communication of results and urges the department to
avoid the development of an overall combined cost/quality ranking.
Among the 21 measures recommended for inclusion are the optimal vascular care
(OVC) and optimal diabetes care (ODC) – both as individual measures and as
composites. The MMA is very concerned that the inclusion of the composites will
provide additional (inappropriate) weighting such that the results are skewed for the
lower and higher performers. At a minimum, the MMA urges the department to
study and report back to the RRT the distribution of the chronic disease sub-composite scores, with the ODC and OVC measures included and excluded, in order
to better understand the likely impact.
Combining Measures
The MMA supports the recommendation to combine standardized scores for the
measures within each subcomposite, using a z-score standardization approach.
Minimum Case Size
The MMA notes that the recommended minimum case sizes for clinic measures will be
30 (OVC/ODC and HEDIS) or 60 (HEDIS hybrid measures). Missing from the
background memo and discussion, however, is how the department intends to address
the reliability of quality measures. Research (see documents attached to email)
indicates that there is no guarantee of having reliable results with as few as 30 cases and
that reliability can vary significantly by measure. The MMA urges the department to
explicitly address the reliability of quality measures either through the RRT or
through the Reliability Work Group. The MMA requests follow up with the RRT
as to the department’s planned approach.
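As one illustration of how reliability can vary by measure even at the same minimum case size, the following sketch uses the common signal-to-noise definition of measure reliability; the variance values below are hypothetical, not the department's:

```python
def measure_reliability(p, n, between_clinic_variance):
    """
    Signal-to-noise reliability for a rate-based measure: the share of observed
    variation attributable to real clinic-to-clinic differences rather than
    sampling noise in a clinic's own n cases.
    """
    sampling_variance = p * (1 - p) / n  # binomial variance of the clinic's observed rate
    return between_clinic_variance / (between_clinic_variance + sampling_variance)

# Two hypothetical measures, both reported at the 30-case minimum.
print(measure_reliability(p=0.55, n=30, between_clinic_variance=0.010))  # ~0.55
print(measure_reliability(p=0.55, n=30, between_clinic_variance=0.002))  # ~0.20
```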
Combining Group and Clinic-Level Measures
The department notes that in an attempt to create peer grouping analyses at the clinic
level, it will be necessary to create clinic-level quality data where such gaps exist (i.e.,
all measures with the exception of the ODC and OVC). The MMA is strongly opposed
to the allocation of group-level results to individual clinics within a group. Such
assignment of results is both unfair and inappropriate to individual clinics. The MMA
believes strongly that physicians (and clinics/groups) should not be held accountable for
care they did not provide. Given the expected use of peer grouping results (e.g., to
change payment policy or limit provider networks), it is critical that the department
create accurate results, even if such results are limited to a smaller number of entities.
Publication of clinic-level results would not fairly or accurately differentiate the cost
and quality of care provided at each clinic and, as such, suggests a higher degree of
accuracy than the data will support. The MMA strongly opposes the assignment of
group results to individual clinics.
Weighting of Subcomposite Scores
The MMA generally supports the proposed weighting for the development of the
overall composite, recognizing that the weights are somewhat arbitrary. If
possible, it would be interesting to analyze the data to see if the weights reflect the
actual care delivered by primary care clinics (i.e., is their mix of services approximately
20% preventive, 20% short-term acute, and 60% chronic disease?).
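For reference, the proposed weighting amounts to a simple weighted average of the three sub-composite scores; the clinic scores below are hypothetical:

```python
# Hypothetical sub-composite scores for one clinic, each already on a common scale.
subcomposites = {"preventive": 0.62, "short_term_acute": 0.71, "chronic_disease": 0.54}
weights = {"preventive": 0.20, "short_term_acute": 0.20, "chronic_disease": 0.60}

overall = sum(weights[name] * score for name, score in subcomposites.items())
print(f"Overall quality composite: {overall:.3f}")  # 0.2*0.62 + 0.2*0.71 + 0.6*0.54 = 0.590
```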
MN Hospital Association: Mark Sonneborn
A. Construction of Subcomposites
1. General approach for combining measures:
Mathematica’s recommendation is to use a standardized variation of
combining absolute rates by converting each measure to a z-score (option 1a).
While not judging whether this method is superior, it is different from the one being employed by CMS in its Hospital Value-Based Purchasing (HVBP) program. CMS's methodology is most closely aligned with option 3.
Given that hospitals will become increasingly familiar with HVBP as it is
implemented over the next few years, we would recommend option 3. Using
a different framework is unnecessarily confusing and adds to administrative
burden and complexity.
We do not accept the rationale given against option 3 – that 1) using a point
system and benchmarks could be viewed as arbitrary, and 2) it results in the
loss of individual measure distribution.
It is not arbitrary because it is following a methodology established by CMS.
Although one could argue that the CMS methodology arbitrarily chooses the
90th percentile as its top threshold and the 50th as its bottom, and that it chooses a 10-point scale, these are at least rational choices.
We do not understand the second rationale. You can choose to display
individual measures so that the reader sees the actual distribution of results.
But for scoring, if you use a 10-point scale, you will have 11 groups
(including the zeros).
The AHRQ measures do lend themselves more to z-score conversion, but
these can relatively easily be converted to benchmarks and point scores.
For any general approach used, there is an issue with “topped out” hospital
measures (discussed in section 4). CMS's proposed approach is to score these measures as “pass/fail”: either a hospital gets full credit or no credit.
We support this approach. Using a z-score in situations with so little variation
may be problematic, as is setting 50th and 90th percentile benchmarks. We
would recommend setting a hard target of 95% compliance for these
measures.
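To make the option 3 style concrete, here is a rough sketch of benchmark-based point assignment together with pass/fail scoring for topped-out measures (the thresholds and rates are hypothetical, and the linear interpolation is only a simplified reading of the CMS scale, not the exact HVBP formula):

```python
def achievement_points(rate, threshold_50th, benchmark_90th, max_points=10):
    """
    0 points below the 50th-percentile achievement threshold, full credit at or
    above the 90th-percentile benchmark, and linear interpolation in between,
    which yields 11 possible scores (0 through 10) per measure.
    """
    if rate >= benchmark_90th:
        return max_points
    if rate < threshold_50th:
        return 0
    return round(max_points * (rate - threshold_50th) / (benchmark_90th - threshold_50th))

def topped_out_points(rate, target=0.95, max_points=10):
    """Pass/fail scoring: full credit at or above a hard 95% target, no credit below."""
    return max_points if rate >= target else 0

# Hypothetical process-measure rates for one hospital.
print(achievement_points(0.88, threshold_50th=0.82, benchmark_90th=0.96))  # 4 points
print(topped_out_points(0.97))                                             # 10 points
```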
Sections 2, 3, & 4:
We generally support the recommendations in sections A.2, 3 and 4 (see
comment on “topped out” measures above).
The minimum case threshold for HCAHPS measures, however, may need
amendment. Though HVBP may exclude hospitals with fewer than 100
responses, that program is meant only for PPS hospitals. Hospital
Compare still displays all HCAHPS-participating hospitals, including
CAHs. CMS only provides the range of returned surveys (less than 100,
100-300, over 300), so we cannot determine how many were actually
returned. We would recommend no minimum case threshold for
HCAHPS.
B. Construction of Overall Composite
Mathematica recommends excluding the HCAHPS scores in computing the overall
composite for CAHs. Though we understand this makes it cleaner from a
computational and comparison standpoint, it strikes us as unfortunate.
HCAHPS is mandated as part of SQRMS for CAHs with more than 500 discharges
per year, so it would stand to reason that those results should be used in Provider Peer
Grouping. We are aware of CAHs that are very engaged in improving patient
experience. We also suspect that many of the 20% of CAHs that fall out of the
analysis because of the recommended treatment of missing subcomposites are the
same CAHs who do not participate in HCAHPS because they have fewer than 500
annual discharges. This would leave roughly two-thirds of the remaining CAHs who
do participate in HCAHPS (even if they had fewer than 100 surveys returned).
Could you let CAHs with HCAHPS be scored with the same weighting method as PPS hospitals, and let those without HCAHPS redistribute the 15 extra percentage points to the process measures? That would violate the recommendation about a
missing subcomposite, but it could be treated as an exception to that rule.
MN Business Partnership: Beth McMullen
No response submitted.
AARP: Michelle Kimball
No response submitted.
DHS: Marie Zimmerman
No response submitted.
Protecting, maintaining and improving the health of all Minnesotans
Date: May 29, 2012
To: Provider Peer Grouping (PPG) Rapid Response Team (RRT) Members
From: Stefan Gildemeister, Director, Health Economics Program
Subject: Quality Composite Measure Design, revised first hospital report
Thank you for participating in the Rapid Response Team (RRT). In preparation for our
meeting this afternoon on May 29 (1:00-2:00 p.m.), I wanted to distribute the attached
memo from Mathematica Policy Research. The memo summarizes changes to the scoring and compositing methodologies that we developed and implemented after additional analysis and in response to hospital stakeholder comments, particularly with respect to concerns about the original relative scoring methodology. We would like to
discuss and receive your comments on the following changes:
• Using absolute thresholds rather than relative thresholds to assign points to each measure;
• Stricter requirements on the number of measures per subdomain that are needed to receive a score; and
• The combination of three former outcome subdomains (readmission, mortality, and inpatient complications) into one combined outcome subdomain.
We will review the memo during our meeting to ensure you have an opportunity to clarify
your understanding of the issues and to ask questions.
Response deadline: We will need your feedback on these issues by June 5 at 4:00
p.m. Comments may be provided via email to [email protected]. We
will reconvene the RRT for a conference call on June 7 (8:00-9:30 a.m.) to discuss your
comments.
Minnesota Council of Health Plans: Sue Knudson
MDH Rapid Response Team
Peer Grouping Methodology
Quality Composite Measure Design – Round 2
Thank you for the opportunity to review and provide input on the revised hospital
total care quality scoring methodology. We appreciate the Department’s effort to
improve the methodology and attention to producing sound results.
In general, we support the revised approach of scoring hospitals on their absolute
performance at the measure level as it is an improvement over the previous relative
scoring method. Still, there are several further improvements needed.
• Rate Cutoffs for Absolute Point Assignments:
Thresholds should be modified to be measure-specific. Using one threshold across all measures produces different weights by measure as a methodological side effect, when this should be an intentional decision. While a single threshold puts every provider and every measure on the same ‘ruler’, it inadvertently devalues some measures’ weight within the cluster. Essentially, it has the effect of saying that where average performance is lower in comparison, those measures are worth fewer potential points within the cluster. Again, the net effect is a disproportionate weight being placed on topped-out measures. Measure-specific absolute thresholds or normalizing techniques should be tested and implemented to solve this issue (see the illustrative sketch following these bulleted comments).
• Use of out-of-date quality measures:
MDH should reconsider the quality reporting methodology to use the most current quality data available. Presenting outdated quality results does not reflect quality scores today and may inadvertently drive care to lower-achieving hospitals. MDH explains the rationale for using the outdated data as a means of matching the time period of the cost analysis, which is largely driven by the availability of Medicare data. There are two main reasons to seriously reconsider this approach. First, the lack of correlation between cost and quality is well documented in the literature, rendering this rationale unsupported. Second, the cost results should be segmented by commercial, Medicaid, and Medicare for usability and accuracy.
• Maximizing the number of providers analyzed at the expense of methodological accuracy:
MDH has consistently cited the principle, as stated by the original advisory committee, of including as many providers/hospitals in the comparison as possible. We do not believe the intent of this principle is to increase the number of providers/hospitals in the model if doing so compromises the integrity of the methods used to derive the comparisons. We provide two examples where we think this occurs:
1. Inclusion of small-denominator measures may lead to inaccurate results, as MDH has lowered the minimum patient thresholds for the process-of-care measures, using patient denominators as low as 10, whereas CMS Hospital Compare and The Joint Commission recommend minimum denominators of 25 patients to achieve reliable results for transparency. CMS only lowers the denominator to 10 patients for quality improvement purposes.
2. Domain scores require a minimum of 6 measures out of 16 total. We recommend reconsidering this decision and requiring at least half, or 8, measures. Though page 9, paragraph 3 notes that there is not much difference between requiring at least 7, 8, or 9 measures, the fact that on average 2 measures are imputed [table 3] is likely what drives this finding. We would assert that these confounding methodological decisions are not reflective of actual results and are misleading.
• Lack of consideration of N-size requirements and confidence intervals issued by the original measure/results publishers:
Previously, MDH cited the CMS Hospital Value-Based Purchasing (HVBP) program's lack of use of confidence intervals as justification for not taking confidence intervals into account in PPG. However, the purposes of these programs are very different: PPG focuses on steerage and transparency, whereas HVBP focuses on quality improvement.
• Transparency of peer group performance vs. ranking:
With a 10-point range in the overall composite quality scores for PPS hospitals [page 10, Table 4a], we question whether there is any real difference in performance among the hospitals under this methodology. Again, most variation is likely driven by the disproportionate weight on topped-out measures as well as the other methodological issues noted above. To that end, ‘ranking’ the hospitals rather than ‘peer grouping’ their performance is misleading and an overbroad interpretation of the legislated task.
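As referenced in the rate-cutoff comment above, the following sketch contrasts one uniform set of cutoffs with cutoffs drawn from each measure's own distribution; all measure names, rates, and cutoffs are hypothetical:

```python
# One uniform set of cutoffs applied to every measure versus measure-specific cutoffs.
def points_uniform(rate, cutoffs=(0.70, 0.80, 0.90)):
    """Points earned against one fixed set of cutoffs shared by all measures."""
    return sum(rate >= c for c in cutoffs)

def points_measure_specific(rate, statewide_rates):
    """Points earned against quartile cutoffs taken from the measure's own distribution."""
    ranked = sorted(statewide_rates)
    cutoffs = [ranked[len(ranked) // 4], ranked[len(ranked) // 2], ranked[3 * len(ranked) // 4]]
    return sum(rate >= c for c in cutoffs)

heart_failure_rates = [0.95, 0.96, 0.97, 0.98, 0.99, 0.99, 0.99, 1.00]  # topped out
pneumonia_rates = [0.55, 0.60, 0.63, 0.66, 0.70, 0.74, 0.78, 0.82]      # lower rates, wider spread

# A hospital at the statewide median on the pneumonia measure:
print(points_uniform(0.70), points_measure_specific(0.70, pneumonia_rates))      # 1 vs 2
# A hospital near the top of the topped-out measure earns full points either way:
print(points_uniform(0.99), points_measure_specific(0.99, heart_failure_rates))  # 3 vs 3
```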
In closing, the timeline to review this methodology change, along with the upcoming risk adjustment review mentioned on the call, seems rather ambitious given the goal of producing new results for hospital review by mid-July. The ambitious timeline leaves the perception that the Rapid Response Team's feedback could not possibly be taken into account in earnest if the timeline is to be met. Given our time investment in providing suggestions to improve the methods, we are hopeful they will be considered seriously.
On the whole, our review finds that the revised methodology provides incremental improvement. Still, it is not yet a credible method for use in transparency and peer grouping (or ranking, per the current interpretation).
Minnesota Medical Association: Janet Silversmith
From: Janet Silversmith <[email protected]>
Sent: Tuesday, June 05, 2012 2:12 PM
To: McCabe, Denise (MDH)
Subject: RRT Total Care Quality Composite
Denise:
On behalf of the MMA, I appreciate the opportunity to provide feedback on the total care
quality composite development for hospitals. Given the particular focus of this issue on
hospitals, the MMA’s comments are very limited.
Generally, the MMA finds the new approach (absolute thresholds) preferable to the
relative threshold approach previously used. We do have some questions about how results will ultimately be displayed, given the small variation between hospitals that is expected. It is important that consumers and other users of the data not assume variation in performance where none actually exists.
Thanks for your consideration.
Janet
Janet Silversmith | Director of Health Policy
Minnesota Medical Association | mnmed.org
1300 Godward Street NE | Suite 2500 | Minneapolis, MN 55413
612-362-3763 office | 651-210-2275 cell | [email protected]
Minnesota Hospital Association: Mark Sonneborn
From: Mark Sonneborn <[email protected]>
Sent: Tuesday, June 05, 2012 3:45 PM
To: McCabe, Denise (MDH)
Subject: RE: PPG RRT: Total Care Quality Composite Memo
Denise:
Several things:
1) Statistical significance within individual measures
A) I shared a thought about statistical significance with Stefan late last week, but I'll copy it here:
Stefan,
I just wanted to give you a more concrete example of why using the risk-adjusted rate alone can
skew things.
In the latest run of the AHRQ measures (which is based on 4Q10-3Q11), I looked at IQI 19 (hip fracture mortality).
Fairview Ridges had a risk-adjusted rate of 0.0126, which is among the better scores in the state. St. John's Maplewood was at 0.0185, and there were 8 hospitals between those two performance rates. However, St. John's rate was statistically significantly lower than the expected rate, while Ridges' was not. Yet Ridges (probably) would get more points than St. John's.
An alternate step you could take would be to create a ratio of the risk-adjusted rate to the
expected rate. This would probably be fairer than the risk-adjusted rate alone, but it still does
not account for statistical significance (i.e. you could have 2 hospitals with O/E of 0.8 and one of
them is statistically significantly lower than expected and the other isn’t).
I’m not sure how to incorporate the confidence interval, but it seems unfair for a hospital that is
statistically significantly better to not get full points (or, in the reverse, for one that is worse to
get any points).
Since I sent that e-mail, I learned that the Leapfrog Group will be releasing a report tomorrow
that gives letter grades to hospitals based on their z-scores on many of the same measures that
we are looking at. I think there are problems with the Leapfrog methodology because they are
using a lot of very low-frequency patient safety measures (i.e. most hospitals have a rate of
zero), but here we have a group that releases reports for consumers that uses z-scores. This
might be an alternative worth exploring.
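For illustration, a minimal sketch of the O/E ratio idea with an approximate confidence interval (the hospital labels and counts below are hypothetical, and Byar's approximation is just one common way to get a Poisson interval):

```python
import math

def oe_ratio_with_ci(observed, expected, z=1.96):
    """
    Observed-to-expected ratio with an approximate 95% confidence interval,
    using Byar's approximation for the Poisson-distributed observed count.
    """
    if observed == 0:
        lower = 0.0
    else:
        lower = observed * (1 - 1 / (9 * observed) - z / (3 * math.sqrt(observed))) ** 3 / expected
    upper = (observed + 1) * (1 - 1 / (9 * (observed + 1)) + z / (3 * math.sqrt(observed + 1))) ** 3 / expected
    return observed / expected, lower, upper

# Two hypothetical hospitals with nearly identical O/E ratios but very different volumes.
for name, obs, exp in [("Small hospital", 3, 4.125), ("Large hospital", 40, 55.0)]:
    ratio, lo, hi = oe_ratio_with_ci(obs, exp)
    flag = "significantly below expected" if hi < 1 else "not significantly different"
    print(f"{name}: O/E = {ratio:.2f} (95% CI {lo:.2f}-{hi:.2f}) -> {flag}")
```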
B) I also received a comment from [Respondent A] on this topic. I'll copy it here, but I have
to admit that I’m a bit confused by it. I tried to get clarification today, but was unable to reach
anyone:
Mark:
Thanks for the opportunity to provide comments.
The updated methodology does represent a minor improvement in quality measurement but still leaves much to be desired; no discussion was provided of the cost methodology. Frankly, it is difficult to tell how much improvement there actually is. There is not enough information for us to try to model it.
They report modeling this after the federal Hospital Value-Based Purchasing program, but they do not include patient satisfaction data. We think inclusion of this data would be very helpful. We could not recall why it was initially left out of the public reporting.
It seems that the scoring of process measures has improved, but the scoring now ignores substantive differences for outcome measures. Have we gone from one extreme of forced variation to the other extreme of ignoring variation? At the end of the report, specifically, there is an intent to show relative ranking, ostensibly because it is easier for consumers to understand relative ranking. The relative ranking is meaningless if the real differences between hospitals are non-existent. MDH needs to be able to demonstrate to consumers where real (statistically significant) differences exist and where such differences do not exist. In the CMS Hospital Compare outcome measures, CMS does this simply by saying a hospital is better than, no different from, or worse than the national average, regardless of ranking (relative or absolute). Can you ask Mathematica to show whether there is a way to see whether significant differences in performance exist?
The methods so far proposed do not help consumers understand what is a significant variation
or what might cause the variation. One thing that is missing, and probably is far more important
in terms of value, is the utilization rate for certain procedures. We know that utilization varies across the state, and it would be very helpful to know why care patterns are different in these areas. The current PPG effort doesn't get us closer to this understanding. When we are looking at disparities in outcomes, this representation doesn't get us there either.
So, sooner or later, we need to step back and ask why we would try to keep refining a method
that doesn't really get us where we want to be. That is, stop trying to refine the wrong thing
(even if it is perfect, it is still wrong) and instead create a method to do the right thing.
2) Other general comments
A) All of the people I contacted felt the change in direction to using hard thresholds was welcome. There was also a very keen curiosity about how MDH would display the very compressed total scores; here is a representative comment:
“For instance, it would be perfectly legitimate for them to report the percentile rankings, but
incumbent upon them to highlight that the percentile rankings may contain no meaningful
differences in actual scores.”
B) There was also a comment about potentially using different thresholds for each measure rather than one across-the-board threshold:
“The use of an absolute scale assumes equal variation and similar denominators, and that 1%
variation in one metric is roughly equal to a 1% variation in another metric. One
recommendation to address this would be to use a weighted index in which each metric is
weighted by denominator and coefficient of variation. Compressed measures, and those with
very small denominators, would receive less index weight.
We would encourage them to use at least 1-digit precision in the calculations of the individual measures so as to mitigate rounding error. Not rounding will cause a 10-15% variation in the total composite score for the IPPS hospitals.”
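For illustration, here is a minimal sketch of the weighted-index idea in that comment, weighting each metric by its denominator and coefficient of variation; the measure names, rates, and denominators are hypothetical:

```python
import statistics

def index_weights(measures):
    """
    Weight each metric by denominator times coefficient of variation, then
    normalize so the weights sum to 1. Compressed (low-variation) measures and
    small-denominator measures receive less index weight.
    """
    raw = {
        name: denominator * (statistics.stdev(rates) / statistics.mean(rates))
        for name, (rates, denominator) in measures.items()
    }
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}

# Hypothetical statewide rate distributions and one hospital's denominators.
measures = {
    "heart_failure_discharge_instructions": ([0.97, 0.98, 0.99, 0.99, 1.00], 40),   # topped out
    "pneumococcal_vaccination":             ([0.60, 0.68, 0.75, 0.82, 0.90], 250),  # wide variation
}
print(index_weights(measures))  # the compressed, small-denominator measure gets ~1% of the weight
```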
3) Critical Access Hospital comment: “…Encourage the special emphasis on the importance of clear, concise, easy-to-digest descriptions and presentation of results from both the PPS and CAH perspectives, as I did during an interview by Mathematica Research in May 2011 regarding the reporting of hospital quality scores.” This person also felt that the shifting around of how many measures are needed for inclusion in the report was a bit random: either you have enough or you don't. How many hospitals end up getting included should not be as big a factor as whether it's a fair analysis.
4) Time lag of data: through the course of the e-mail exchange, a side discussion arose about the time lag issue. I then posed the question of whether it would be preferable to have cost and quality data from the same time period or to have the most recent data available. The group unanimously chose the latter.
That’s it. Look forward to speaking in a couple of days.
Mark A. Sonneborn, FACHE
VP, Information Services
Minnesota Hospital Association
2550 University Av. W, Ste. 350-S
St. Paul, MN 55114
651-659-1423
Minnesota Council of Health Plans: Sue Knudson
MCHP Response to MHA Comments
From: Knudson, Susan M <[email protected]>
Sent: Thursday, June 14, 2012 8:01 AM
To: McCabe, Denise (MDH); Mark Sonneborn; Janet Silversmith; Beth McMullen; Castellano, Susan E (DHS); Michele Kimball
Cc: Jennifer Sanislo ([email protected]); Zimmerman, Marie L (DHS); Wasieleski, Christine M (DHS); Lo, Sia X; Darcee Weber; Julie Brunner; Eileen Smith; Gildemeister, Stefan (MDH)
Subject: RE: PPG RRT Comments: Total Care Quality Composite Memo
Mark,
It's very helpful to see the other written feedback. I concur with [Respondent A's] feedback and view it as very consistent with the MN Council of Health Plans' feedback.
Thanks.
Sue
Sue Knudson
Vice President, Health Informatics
HealthPartners, Inc.
952-883-6185 Office
952-484-6744 Cell