Facility-Level Outcome Performance Measures for Nursing Homes

Copyright 1998 by
The Gerontological Society of America
The Gerontologist
Vol. 38, No. 6, 665-683
Risk-adjusted nursing home performance scores were developed for four health outcomes
and five quality indicators from resident-level longitudinal case-mix reimbursement data
for Medicaid residents of more than 500 nursing homes in Massachusetts. Facility
performance was measured by comparing actual resident outcomes with expected
outcomes derived from quarterly predictions of resident-level econometric models
over a 3-year period (1991-1994). Performance measures were tightly distributed among
facilities in the state. The intercorrelations among the nine outcome performance
measures were relatively low and not uniformly positive. Performance measures were not
highly associated with various structural facility attributes. For most outcomes,
longitudinal analyses revealed only modest correlations between a facility's performance
score from one time period to the next. Relatively few facilities exhibited consistently
superior or inferior performance over time. The findings have implications for
the practical use of facility outcome performance measures for quality assurance
and reimbursement purposes in the near future.
Key Words: Health outcomes, Nursing homes, Long-term care
Frank Porell, PhD,1 and Francis G. Caro, PhD2
Historically, the system of public regulation of nursing homes that has developed in the United States
has focused on structural and process variables, such
as the presence of desired staffing and the provision
of certain services. Researchers have advocated the
use of outcome-based measures for measuring nursing home quality of care and for reimbursement purposes since the late 1960s (Andersen & Stone, 1969).
Studies that incorporated facility characteristics in risk-adjusted resident-level outcome models have imparted
useful insights about structural and process attributes
associated with better outcomes (Braun, 1991; Cohen
& Spector, 1996; Kane, Bell, Riegler, Wilson, & Keeler,
1983; Linn, Gurel, & Linn, 1977; Porell, Caro, Silva,
& Monane, 1998; Spector & Takada, 1991). Furthermore, analyses of aggregate facility-level data have
identified structural factors associated with prevalence
rates for quality indicators (QIs), such as pressure ulcers and restraint use, or with counts of deficiency citations from
regulatory facility reviews (e.g., Aaronson, Zinn, & Rosko, 1994; Graber & Sloane, 1995; Phillips et al., 1996;
Zinn, Aaronson, & Rosko, 1993).

This research was supported in part by a grant from the Agency for
Health Care Policy and Research (1 R01 HS07587-01A1 "Health Outcome Measurement in Nursing Homes"). The authors thank Paul Dreyer
and Phil Mello of the Massachusetts Department of Public Health for
their assistance in data acquisition; Ajith Silva, Courish Hosangady, Helen
Miltiades, Connie Tai, and Graham Porell for their help in data file construction; George Jakubson and Ed Norton for advice on estimation methods;
Mark Monane for clinical advice on model specification; Edward Norton,
Elinor Walker, William Spector, Vicki Freedman, David Mishel, Amy Lishko,
and Mona LeBlanc for helpful comments on earlier drafts of this article; and
Josephine Sturgis for her assistance with producing the manuscript.
1Address correspondence to Dr. Frank Porell, Gerontology Institute, University of Massachusetts Boston, 100 Morrissey Boulevard, Boston, MA
02125-3393. E-mail: [email protected]
2Gerontology Institute, University of Massachusetts Boston.
Only one published study to date has developed
risk-adjusted facility-level outcome performance measures for individual nursing homes using resident-level
outcome models (Mukamel, 1997). Mukamel (1997)
employed 5 years of case-mix reimbursement data from
550 facilities in upstate New York to estimate resident-level statistical outcome models for adverse outcomes. Five facility-level outcome performance scores
were derived for each nursing home by comparing
expected (predicted) and actual outcomes of residents
over the 5-year time period (1986-1990). The empirical findings revealed generally low and mostly insignificant correlations among the five facility outcome
performance measures, suggesting that exemplary
facility performance on any one outcome measure does
not imply much about likely performance on others.
The work of Mukamel (1997) was the first published study to provide empirical insight into how nursing home performance varies across different outcome
measures. However, the empirical findings provide no
insight about the longitudinal empirical properties of
facility outcome performance scores. There is little current knowledge about whether a facility with higher
(or lower) than average risk-adjusted outcome performance in one time period exhibits higher (or lower)
than average performance in subsequent time periods
as well (without intervention). Yet such knowledge is
critical for assessing the prospects for the practical use
of facility-level measures for quality assurance or reimbursement purposes.
Mukamel (1997) argues that meaningful outcome
measures of quality should have four properties: (a) the
measure should be either a desirable or undesirable
outcome; (b) the outcome should be affected by health
and nursing care; (c) the measure should be based
on the outcomes of a sufficiently large population to
substantially reduce the influence of stochastic factors
on performance measurement; and (d) the measure
should take into account patient risk factors affecting
outcomes that are beyond the control of providers.
We agree about the importance of these four properties. However, we believe that if facility-level outcome
measures are to be useful for quality assurance or reimbursement purposes in practice, they should possess at least two additional properties. First, performance should be measured over discrete time periods that are short enough to increase the likelihood
that care problems associated with poor
outcome performance can be identified quickly by
regulators. Second, nursing facility performance measurement should be based on outcomes for all residents of the facilities over the time period for which
performance is being measured.
The outcome measures in Mukamel's (1997) person-level statistical models were specified in terms of
changes in a resident's status (e.g., ADLs, use of restraints) over a 6-month time period following admission to a nursing facility. A single facility-level performance measure was then derived for each nursing home
by comparing expected and observed outcomes for
all new admissions between 1986 and 1990. We believe that the measurement of facility outcome performance over such a long time period greatly reduces
its utility for guiding quality assurance regulatory activities. Furthermore, Mukamel's facility performance
measures do not reflect any changes in a resident's
status after his or her first 6 months of nursing home
residence. The average length of stay of residents in
the study data was reported to be about 4 years, so it
is reasonable to question how well outcomes experienced within a half-year of admission reflect those
experienced over residence histories that are at least
eight times longer on average.
This article reports an effort to develop facility-level
outcome performance measures for nursing homes in
Massachusetts. Resident-level quarterly data routinely
collected from the state's Medicaid case-mix reimbursement system for more than 500 nursing homes were
employed in the development of facility-level outcome
performance measures for multiple resident-level outcomes over regular time intervals spanning a 3-year
time period (1991-1994). Performance scores were
developed to track the outcome performance of individual nursing homes over full-year and half-year time
periods based upon the outcomes experienced by
all Medicaid residents of those facilities. The study's
empirical findings provide some important insights
about the longitudinal empirical properties of facility
outcome performance measures.
Previous Research
Extensive reviews of the literature on nursing home
quality by Davis (1991) and Sainfort, Ramsey, and Monato
(1995) reveal a mixed and often inconsistent set of
empirical findings regarding relationships between quality
measures and facility attributes. Here we only briefly
summarize the general findings from relevant literature regarding significant facility correlates of the health
outcomes and quality indicators (QIs) employed in the
current study.
Survival and ADL Functional Status
Given the obvious importance of nursing services
in the provision of long-term care, the association between survival and functional status outcomes and the
mix and/or level of nurse staffing has been a common
focus of empirical research. Linn and colleagues (1977)
found both survival rates and improved patient functioning to be positively associated with higher levels
of registered nurse (RN) staffing. Cohen and Spector
(1996) also found a significant negative association between mortality outcomes and the level of RN staffing
in a facility. A significant negative association was also
found between increases in residents' activity of daily
living (ADL) functional limitations and a facility's licensed practical nurse (LPN) staffing level. No associations between ADLs and RN or nurse aide staffing
levels were found, however. Both Spector and Takada
(1991) and Porell and colleagues (1998) failed to find
an association between nurse staffing levels and mortality rates. However, Spector and Takada found that
residents of "understaffed" facilities (defined in terms
of the overall nurse staffing level relative to the average functional status of residents) were less likely to
experience improved functional status. Porell and
colleagues found no significant association between
nurse staffing variables and ADL functional status outcomes.
Among other facility attributes, including facility bed
size and private-pay patient mix, there is less empirical evidence of significant associations with survival
and functional status outcomes. Although for-profit
status was not specified in many studies of functional
status and mortality outcomes, two studies (Spector &
Takada, 1991; Zinn et al., 1993) found significant (negative) associations between for-profit status and mortality rates. Porell and colleagues (1998) found no
association between profit status and mortality or functional status outcomes, however.
Physical Restraint Use
Evans and Strumpf's (1989) review of restraint use
showed that the literature had given little attention to potential
associations between restraint use and facility attributes
such as bed size, payment source, and nurse staffing
intensity. Tinetti, Liu, Marottoli, and Ginter's (1991)
prospective analysis of risk factors associated with restraint use in 12 facilities also showed no systematic
facility effects. However, Burton, German, Rovner,
Brant, and Clark's (1992) analysis of residents of facilities with either high or low prevalence of restraint use
is suggestive of the importance of staff attitudes or
styles of care in determining when restraints are used.
Also, Phillips and colleagues' (1996) recent resident-level analysis of some 2,000 residents in more than
250 nursing homes suggests that the likelihood of restraint use was lower in facilities with more nursing
staff (RNs, LPNs, and nurse aides) per bed.
Analyses of restraint use at the facility level have
yielded mixed findings as well. Zinn and associates
(1993) found prevalence rates for restraint use to be
higher in larger bed-size facilities, but did not find a
significant association for either nurse staffing or for-profit variables. Employing the same data as Zinn and
associates but with different model specifications, Aaronson and colleagues (1994) no longer found bed size
to be significant. Furthermore, for-profit status and nurse
staffing variables were only associated with restraint-use rates in facilities serving heavier-need case-mix
populations.
Graber and Sloane (1995) analyzed restraint use in
North Carolina facilities in the year following the implementation of federal regulations severely restricting restraint
use, and found significant associations between the prevalence of restraints and both the ratio of licensed vocational nurses (LVNs) and nurse aides per patient (negative) and the mean ADL disability level of facility
residents (positive). However, they did not find significant associations for bed size, profit status, RN
staffing ratio, and some additional case-mix variables.
Decubitus (Pressure) Ulcers
Fewer studies have examined facility attributes associated with the prevalence of decubitus ulcers. Both
Zinn and associates (1993) and Aaronson and colleagues (1994) reported significantly higher prevalence
rates for pressure ulcers in larger facilities and in facilities with higher private-pay rates (which was contrary to a priori expectations). Aaronson and colleagues
also found higher prevalence rates in for-profit facilities when profit status was interacted with resident
case-mix control variables. Zinn and associates found
no association with profit status in a simple linear model
specification. The resident-level analysis of the likelihood of decubitus (pressure) ulcers among nursing
home residents by Cohen and Spector (1996) revealed
no association with several nurse-staffing facility variables.
Data and Methodology
The main source of our data was the Management
Minutes Questionnaire (MMQ) used for case-mix reimbursement of nursing homes on behalf of Medicaid
residents in Massachusetts. Medicaid finances care for
about 75% of Massachusetts nursing home residents,
and 95% of the 560 nursing homes in the state accept Medicaid patients. The MMQ is first completed
at the time of a nursing home admission or at conversion from private-pay to Medicaid payer status. Thereafter MMQ records are updated regularly on a quarterly basis for all residents. MMQ submissions are
phased such that about one third of the facilities in
the state submit MMQ records in any month. Because
Medicaid payments to facilities are based on these data,
nursing homes have financial incentives for thorough
and accurate reporting of residents' service needs. Medicaid staff perform regular audits on facility data to
counter the inflation of MMQ scores by facilities to
increase their revenues.
Porell, Caro, and Silva (1993) conducted a formal
reliability analysis of the MMQ data. Nursing home
and auditor ratings for specific, individual MMQ items
with potential value for measuring patient outcomes
were compared for a sample of 4,438 Medicaid residents with above-average MMQ scores in facilities
with above-average facility-level MMQ scores. All
items exhibited levels of agreement exceeding the
minimum reliability standard of 80%. For example, the
agreement rates for individual audited ADL functional status items were: bathing (99.3%), grooming
(98.8%), dressing (98.6%), mobility (90.3%), and eating (86.4%), where impairment was defined as a need
for assistance or complete dependence on any item
(vs independence). The mean count of 4.45 ADLs per
resident from nursing home data was only 1.8% higher
than the mean of 4.38 ADLs derived from auditor data.
These reliability results are comparable to or better
than those reported by others for ADLs with similar
data not used for reimbursement purposes, including
MDS data (e.g., Hawes et al., 1995; Hogan, Smith, &
Jameson, 1986; National Center for Health Statistics,
1982).
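The percent-agreement statistic used in the reliability analysis can be sketched as follows. This is a minimal illustration, assuming each rating is recorded as one of three hypothetical string codes ("independent", "assistance", "dependent") and that impairment is dichotomized as described above; the function names are ours and are not part of any MMQ software.

```python
# Sketch: percent agreement between facility and auditor ratings of one ADL
# item, after collapsing each rating to impaired vs. independent.
# The three string codes are hypothetical stand-ins for MMQ item codes.

IMPAIRED = {"assistance", "dependent"}  # needs assistance or completely dependent
MIN_AGREEMENT = 80.0  # minimum reliability standard cited in the text (%)


def is_impaired(rating: str) -> bool:
    """Dichotomize a three-level rating: True = impaired, False = independent."""
    return rating in IMPAIRED


def percent_agreement(facility: list[str], auditor: list[str]) -> float:
    """Percentage of residents whose dichotomized ratings match."""
    if not facility or len(facility) != len(auditor):
        raise ValueError("need two non-empty rating lists of equal length")
    matches = sum(is_impaired(f) == is_impaired(a) for f, a in zip(facility, auditor))
    return 100.0 * matches / len(facility)


facility = ["assistance", "dependent", "independent", "assistance"]
auditor = ["dependent", "dependent", "independent", "independent"]
agreement = percent_agreement(facility, auditor)
print(agreement, agreement >= MIN_AGREEMENT)  # 75.0 False
```

Under this dichotomy a facility rating of "assistance" and an auditor rating of "dependent" count as agreement, so agreement on the collapsed coding can never be lower than agreement on the full three-level coding.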
Massachusetts death registry file data were obtained
to measure survival outcomes. Death data were only
available through 1993, so survival outcomes were censored at the end of December 1993. Death records
were matched to Medicaid resident MMQ data
through Social Security number identifiers. Facility attributes were specified from cost report and staffing
data obtained from the Massachusetts Rate Setting
Commission (MRSC). MRSC data were only available
for the years 1990-1992 at the time of the study. Because empirical findings based on two years of MMQ
data with corresponding facility attribute data showed
that the results were insensitive to the lagging of facility attribute data, facility data were lagged to permit
the analysis of longer residential histories.
An analytic longitudinal history file with more than
500,000 quarterly observations spanning the time period April 1991 through June 1994 was constructed
for 78,524 Medicaid residents in the state with at least
one MMQ record during the time period. Multiple
distinct resident histories were created for individuals
who transferred among nursing homes, and for individuals with a gap of one or more regular quarterly
MMQ records in their resident history for a single
facility. Unique facility identifiers were employed to
distinguish real transfers of residents among facilities
from facility ownership changes.
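The rule for splitting residence histories can be sketched as below. This is a minimal illustration, assuming each MMQ record has been reduced to a hypothetical (resident_id, facility_id, quarter) tuple, where quarter is an integer index of calendar quarters and facility_id is the unique identifier that stays stable across ownership changes. A new history starts at a transfer or after one or more missing quarterly records.

```python
from itertools import groupby

# Sketch: split each resident's quarterly MMQ records into distinct residence
# histories. A new history starts when the resident appears in a different
# facility (a transfer) or when one or more quarterly records are missing.
# The record layout (resident_id, facility_id, quarter) is hypothetical.


def split_histories(records):
    """Return a list of (resident_id, facility_id, [quarters]) histories."""
    histories = []
    ordered = sorted(records, key=lambda r: (r[0], r[2]))
    for resident, recs in groupby(ordered, key=lambda r: r[0]):
        facility_prev, quarter_prev, quarters = None, None, []
        for _, facility, quarter in recs:
            transfer = facility_prev is not None and facility != facility_prev
            gap = quarter_prev is not None and quarter - quarter_prev > 1
            if transfer or gap:
                histories.append((resident, facility_prev, quarters))
                quarters = []
            quarters.append(quarter)
            facility_prev, quarter_prev = facility, quarter
        histories.append((resident, facility_prev, quarters))
    return histories


records = [
    ("A", "F1", 0), ("A", "F1", 1),  # two consecutive quarters in F1
    ("A", "F2", 2),                  # transfer to F2 starts a new history
    ("A", "F2", 4),                  # quarter 3 missing: another new history
    ("B", "F1", 0),
]
for h in split_histories(records):
    print(h)
```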
Outcome Measures
Health status measures associated with functional
abilities are probably among the most meaningful long-term care outcome measures given the nature of care
needs of residents. Health or functional status measured at a point in time is not a useful measure of quality, however, because facilities can vary greatly in the
health status of the residents that they admit. Changes
in health status, then, should serve as the focus for
measuring facility performance. Long-stay Medicaid nursing home residents are likely to experience very gradual
declines in health and functional status during nursing
home stays that usually end in death. Functional improvement and discharges to home are unlikely to be
useful as favorable outcome measures for this population. Rather, favorable outcomes may be measured
better in terms of slower rates of functional decline
and/or longer survival relative to expected rates based
upon the case-mix composition of residents.
Table 1 contains definitions for a set of four health
outcomes for measuring resident-level outcomes over
time. Survival and ADL functional status outcomes are
the most fundamental and encompassing health outcome measures for assessing facility performance. The
other two health outcome indicators reflect more specific dimensions of a resident's health and functional
status. All four of these health outcomes will be influenced to varying degrees by factors associated with
the natural course of aging and disease processes that
will be largely beyond the control of a nursing home.
However, the rate of decline in health and functional
status of residents, on average, should be affected by
the quality of nursing care provided by a facility. For
example, some deaths associated with medical emergencies may be prevented by early identification of
medical problems, including life-threatening situations,
by attentive medical staff. Careful development of appropriate care plans and daily monitoring by nursing
staff, as well as the possible provision of certain restorative services, may affect the rate of deterioration
in a resident's ADL functional status over time. Similarly, the onset of incontinence can be delayed, and
continence even restored in some situations, through bladder/bowel training services.
Deaths were attributed as outcomes to a specific
facility when they either occurred in the facility or
outside the facility within 90 days of discharge. This
was done because most out-of-facility deaths were
found to occur in acute hospitals after discharge from
a nursing facility. The 90-day time period was chosen
to conform with the way all other Medicaid residence
histories were recorded. Because very few out-of-facility deaths occurred more than 30 days after discharge from
a nursing home, varying this time span to as long as
180 days had negligible effects on the empirical results. ADL limitations were specified simply in terms
of independence versus need for assistance to reduce
the chance of any bias due to code inflation associated with case-mix reimbursement incentives. Alternative coding for ADLs that distinguished among three
categories of dependence (i.e., independence, needs
assistance, and complete dependence) yielded results
that were similar to those reported here.
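The death-attribution rule can be expressed as a small predicate. This is a sketch only: the argument names are hypothetical, and it assumes a missing discharge date means the resident was still in the facility. The window is a parameter so that the 30- to 180-day sensitivity checks mentioned above can be reproduced by changing one argument.

```python
from datetime import date
from typing import Optional

# Sketch of the rule in the text: a death counts as an outcome for the
# facility if it occurred in the facility or outside the facility within
# 90 days of discharge. Argument names are hypothetical.
WINDOW_DAYS = 90


def death_attributed_to_facility(death: Optional[date], discharge: Optional[date],
                                 window_days: int = WINDOW_DAYS) -> bool:
    if death is None:
        return False                 # no recorded death
    if discharge is None or death <= discharge:
        return True                  # died while still a resident
    return (death - discharge).days <= window_days


discharged = date(1992, 3, 1)
print(death_attributed_to_facility(date(1992, 5, 15), discharged))       # True (75 days)
print(death_attributed_to_facility(date(1992, 7, 15), discharged))       # False (136 days)
print(death_attributed_to_facility(date(1992, 7, 15), discharged, 180))  # True
```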
The general premise underlying the combined behavioral/cognitive outcome measure is that although
cognitive impairments are the result of progressive,
irreversible dementing illnesses, facilities providing
higher quality care may be better able to moderate or
delay the temporal rate of cognitive decline of residents who are at earlier stages of the disease process.
Furthermore, better care should also deter the associated disruptive behaviors of residents with cognitive impairments. A cross-tabulation of the cognitive
impairment and behavior problem variables for all
Medicaid nursing home resident data over three years
showed the combined behavioral/cognitive status variable
to exhibit the properties of a Guttman scale, with a
value of one for cognitively impaired residents without reported behavioral problems and a value of two
for cognitively impaired residents with reported behavioral problems. The coefficient of reproducibility,
defined as one minus the ratio of classification errors to potential classification errors, for the posited Guttman scale outcome measure was 94%. This
value easily exceeded the conventional minimum value
of 90% used in Guttman scale development.

Table 1. Definitions of Outcome Measures

Outcome Measure: Definition

Health Outcomes
Survival rate: A dichotomous variable where 0 = a death within 90 days of discharge from the facility, and 1 = otherwise.
ADL functional status: A (0-5) count of MMQ items noting assistance needs versus independence in bathing, toileting, dressing, transferring, and eating activities.
Behavioral/cognitive status: A (0-2) count of two dichotomous MMQ items pertaining to the presence of disruptive behavior and cognitive impairment. The coding of disruptive behavior requires residents to have displayed dependent or disruptive behavior (e.g., screaming, physically abusive, wandering) at least three times per week. The coding of a cognitive impairment requires that a resident be disoriented or impaired in memory nearly every day in performance of basic ADL tasks, mobility, and adaptive tasks.
Incontinence status: A (0-2) count of experience of bladder and/or bowel incontinence on a regular basis. To be assigned as regularly incontinent versus continent for bladder or bowel control, a resident had either to be incontinent on at least a daily basis or to be on a bladder/bowel retraining program.

Quality Indicators
Decubitus ulcers: A 0-2 scale where 0 = no ulcers, 1 = at least one stage 1-2 ulcer but no higher stage ulcers, 2 = at least one stage 3-4 ulcer. The coding of a pressure ulcer requires a daily treatment/procedure performed by a licensed nurse under written orders by a physician.
Restraint use: A 0-2 scale where 0 = no written order for restraints exists, 1 = restraint is ordered but not used on a regular daily basis, 2 = restraint is ordered and used regularly.
Contractures: A dichotomous variable where 1 = a resident has any contractures, 0 = otherwise.
Accidents: A dichotomous variable where 1 = a resident has experienced an accident during the month, 0 = otherwise.
Weight change: A dichotomous variable where 1 = a resident has experienced an unplanned gain of 8 or more pounds or loss of 5 or more pounds during the month, 0 = otherwise.
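The coefficient of reproducibility for the two-item behavioral/cognitive scale can be illustrated with a short function. This sketch assumes the Goodenough-Edwards convention for counting errors: each item response that departs from the ideal pattern implied by the resident's total score counts as one error. That is one of several conventions in use and not necessarily the one the authors applied, and the response data are made up.

```python
# Items are ordered from most to least commonly endorsed: cognitive impairment
# first, disruptive behavior second. A perfect Guttman pattern with total
# score k endorses exactly the first k items, so (1, 0) -> 1 and (1, 1) -> 2
# are scale types, while (0, 1) is a non-scale response.
# Error counting follows the Goodenough-Edwards convention (an assumption).


def coefficient_of_reproducibility(responses):
    """responses: list of (cognitive, behavior) 0/1 tuples."""
    n_items = 2
    errors = 0
    for pattern in responses:
        k = sum(pattern)
        ideal = tuple(1 if i < k else 0 for i in range(n_items))
        errors += sum(obs != exp for obs, exp in zip(pattern, ideal))
    # one minus errors as a share of all potential errors (item responses)
    return 1.0 - errors / (n_items * len(responses))


# 10 hypothetical residents; one has behavior problems without cognitive impairment
responses = [(0, 0)] * 5 + [(1, 0)] * 3 + [(1, 1)] + [(0, 1)]
cr = coefficient_of_reproducibility(responses)
print(round(cr, 2), cr >= 0.90)  # 0.9 True
```

The single non-scale resident contributes two errors out of 20 potential errors, giving a coefficient of 0.90, exactly at the conventional threshold.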
The remaining five outcome measures are not direct measures of health status change; rather, they are
QIs, or variables with observed values that indicate
with a high likelihood when substandard care is being
provided (Spector & Takada, 1991). For example, although decubitus ulcers cannot always be prevented
through appropriate nursing care, they frequently can
be prevented. Hence, a high prevalence rate of decubitus ulcers in a facility may be indicative of a situation in which there is a high likelihood of substandard
care due to factors such as the absence of a positioning program, inadequate incontinence care, or ineffective skin assessments.
Resident-Level Outcome Models
How well a nursing home performs in its service to
Medicaid residents over a time period should be measured by comparing actual outcomes with expected
outcomes for all Medicaid residents of the facility over
the relevant time period. All current Medicaid residents, regardless of their admission date, should be
included in this measurement. Furthermore, residents
who are at risk for a greater fraction of the relevant
time period should contribute more to the measurement of facility outcome performance than residents
who are at risk for shorter time periods. Using this
reasoning, a 15% random sample of Medicaid nursing
home residents aged 65 years or older with at least
one quarter of MMQ data over the study period is an
appropriate sample design for modeling resident-level
outcomes. The study sample is representative of all
aged Medicaid residents in Massachusetts nursing
homes over the 3-year study period.
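The sampling design above can be sketched as a resident-level draw that keeps every quarter of each sampled resident. This is a minimal illustration under assumed names: the dictionary layout, the fixed seed, and the eligibility test are hypothetical, and the age attribute is treated as a single value per resident for simplicity.

```python
import random

# Sketch: draw a simple random sample of *residents* (not of quarterly
# records) and keep every quarter of each sampled resident's data, so that
# residents at risk for more quarters contribute proportionally more
# observations. The data layout and the seed are hypothetical.


def sample_residents(residents, fraction=0.15, min_age=65, seed=1991):
    """residents: dict resident_id -> {"age": int, "quarters": [records]}."""
    eligible = sorted(rid for rid, r in residents.items()
                      if r["age"] >= min_age and len(r["quarters"]) >= 1)
    rng = random.Random(seed)
    chosen = rng.sample(eligible, round(fraction * len(eligible)))
    return {rid: residents[rid] for rid in chosen}


residents = {f"R{i:03d}": {"age": 70 + i % 20, "quarters": list(range(1 + i % 12))}
             for i in range(200)}
residents["Y001"] = {"age": 60, "quarters": [0, 1]}  # under 65: never sampled
sample = sample_residents(residents)
print(len(sample), sum(len(r["quarters"]) for r in sample.values()))
```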
Resident-level outcome models were estimated for
each outcome measure in Table 1. Dates of death
were only available through December 1993, so the
survival outcome model was estimated with 61,280
quarterly MMQ records of residents with at least one
MMQ record through 1993 from the 15% sample. The
other eight models were estimated on 59,407 quarterly records of residents with at least two MMQ records
over the entire 3-year period. Individual residence histories were observed for an average of 7.6 quarters in
the estimation sample.
Multivariate "state-dependence" regression models
were specified for modeling quarterly changes for
most of the outcome measures of Table 1. Using this
dynamic model specification, an outcome in the next
quarter is specified to be a function of the current
quarter's measurement of the outcome variable, a set
of resident demographic and medical diagnosis attributes for risk adjustment purposes, and a set of facility attributes. Appendix 1 contains a brief technical
discussion of the model specification, definitions of
resident and facility attributes specified, estimation procedures, sensitivity analyses performed, and a summary of empirical findings. Additional detail about
the four resident-level health outcome models can be
found in Porell and colleagues (1998).
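The data layout implied by a state-dependence specification can be sketched as follows: within each residence history, the outcome in quarter t + 1 becomes the dependent variable, and the outcome in quarter t enters as a regressor alongside the quarter-t risk adjusters. The field names are hypothetical, and the sketch only builds estimation rows; it does not reproduce the authors' estimators, which are described in their Appendix 1.

```python
# Sketch: turn one residence history (quarterly records in time order) into
# estimation rows for a state-dependence model. Each row pairs the outcome
# in quarter t + 1 with the lagged outcome and other covariates from quarter t.
# Field names ("adl_count", "age", "dementia") are hypothetical.


def state_dependence_rows(history, outcome="adl_count"):
    rows = []
    for current, following in zip(history, history[1:]):
        covariates = {k: v for k, v in current.items() if k != outcome}
        rows.append({
            "y_next": following[outcome],   # dependent variable at t + 1
            "y_lagged": current[outcome],   # state-dependence term at t
            **covariates,                   # quarter-t risk adjusters
        })
    return rows


history = [
    {"adl_count": 2, "age": 84, "dementia": 0},
    {"adl_count": 3, "age": 84, "dementia": 0},
    {"adl_count": 3, "age": 85, "dementia": 1},
]
for row in state_dependence_rows(history):
    print(row)
```

A history of three quarterly records yields two rows: the first quarter supplies predictors but no outcome, and the last supplies an outcome but no prediction.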
A state-dependence model form is particularly appropriate for modeling changes in health and functional status associated with chronic conditions when
effective risk factor adjustments are needed to create
a relatively level playing field for meaningful comparison of outcome performance among nursing homes.
For example, a facility with a heavily impaired resident population may exhibit higher levels of ADL limitations than
a facility serving residents with lesser service needs.
Specifying a resident's own "experience," as reflected
in his or her current ADL functional status, as an additional risk factor for predicting future ADL functional
status in the next quarter should undoubtedly produce a fairer overall risk adjustment than a set of
generic demographic or diagnostic variables alone.
The resident-level models for restraint use, accidents,
and unplanned weight change were not specified as
state-dependence models under the premise that
these specific Qls are more reflective of the effects of
service styles, or are incidents associated with inadequate or improper care rather than enduring physical chronic conditions of residents. In these models,
lagged dependent variables were not specified as independent variables. The survival outcome model was
specified as a conditional, discrete-time survival model
using a logistic regression model specification (Allison,
1984).
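A conditional, discrete-time survival model of the kind cited above amounts to a logistic regression fitted to person-period data. The sketch below assumes simulated data with a single covariate (an ADL count held fixed within each resident) and fits the model by Newton-Raphson with NumPy; it illustrates the technique only and does not use the authors' covariates or estimates.

```python
import numpy as np

# Sketch of a discrete-time survival model estimated as a logistic regression
# on person-quarter data (one row per resident per quarter at risk, ending at
# death or censoring). Data are simulated; ADL count is the only covariate
# and is held fixed within each resident for simplicity.


def fit_logit(X, y, max_iter=50, tol=1e-10):
    """Maximum-likelihood logistic regression by Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        gradient = X.T @ (y - p)
        information = X.T @ (X * (p * (1.0 - p))[:, None])
        step = np.linalg.solve(information, gradient)
        beta += step
        if np.max(np.abs(step)) < tol:
            break
    return beta


rng = np.random.default_rng(0)
rows, deaths = [], []
for _ in range(2000):                       # simulated residents
    adl = int(rng.integers(0, 6))           # ADL limitation count, 0-5
    hazard = 1.0 / (1.0 + np.exp(-(-3.0 + 0.4 * adl)))
    for _quarter in range(8):               # censored after 8 quarters
        died = rng.random() < hazard
        rows.append([1.0, adl])
        deaths.append(1.0 if died else 0.0)
        if died:
            break

X, y = np.array(rows), np.array(deaths)
beta = fit_logit(X, y)
print(beta)  # should lie near the simulated values (-3.0, 0.4)
```

Each fitted probability is a quarterly hazard of death conditional on survival to the start of the quarter; survival coded as in Table 1 corresponds to one minus this probability.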
In general, the estimated parameters for resident
demographic and diagnostic attributes showed a great
deal of construct validity with respect to clinical expectations. Consistent with much of the nursing home
quality literature, the estimated parameters for facility
attributes were weaker and showed less consistent
patterns among outcome measures relative to resident
attributes, however. Table A-2 of Appendix 1 contains
a summary of the statistically significant coefficients for
each outcome model.
Development of Facility-Level Performance Measures
Every Medicaid resident of a nursing home during
quarter t is at risk, to varying degrees, of experiencing
an adverse outcome over the quarter, such as a decline in ADL functional status, the development of a
higher stage pressure ulcer, or death. Residents are
also at risk of experiencing favorable outcomes, such
as the restoration of bladder control, the healing
of pressure ulcers, or improved functional status. The
mere observation of adverse outcomes does not necessarily imply inferior outcome performance, because the risk of adverse outcomes will
vary among individual residents. Facility outcome performance over a single quarter can be measured by
comparing the actual outcomes of all Medicaid residents
at quarter t + 1 with their expected outcomes at
quarter t + 1.
Facility-level performance measures were developed
for the outcome measures reported in Table 1 through
a four-step procedure entailing comparisons of actual
and expected outcomes of Medicaid residents over
each quarter-year of the 3-year study period. In the
first step, the estimated model parameters from the
resident-level outcome models were used to compute
predicted outcomes for each Medicaid nursing home
resident for all relevant quarters of available data. With
the exception of survival outcomes, the first expected
resident outcomes are the predicted values for the start
of the second quarter based on resident information
drawn from the first quarter of MMQ data. Predicted
outcomes were computed for each quarter from July
1991 through June 1994, except for survival, which
was only computed through December 1993 because
of data constraints on death dates. The measurement
of survival or death did not require a minimum of
two quarters of MMQ data, so a nursing home resident with 12 quarters of observed MMQ data will have
11 predicted quarterly outcomes for all performance
measures.
Although actual attributes for each Medicaid resident were specified in predicting outcomes, the actual facility attributes were not specified because facility outcome performance could vary systematically
with respect to certain facility attributes, such as profit
status or RN staffing level. Rather, for each facility attribute variable in the prediction models, a common
sample mean value (based on all nursing homes in
the state) was substituted for a facility's actual value.
Using this approach, the predicted outcomes should
reflect expected resident outcomes if each facility had
identical structural attributes. If certain types of facilities systematically exhibit better outcomes, this will be
reflected in the performance scores derived by comparison of actual and expected outcomes.
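The substitution of statewide means for facility attributes can be sketched as follows. The coefficients, attribute names, and logistic link are hypothetical placeholders for an estimated outcome model, and the statewide mean is taken as an unweighted mean over nursing homes, following the description above.

```python
import numpy as np

# Sketch: expected outcomes computed with each resident's own attributes but
# with every facility attribute replaced by its statewide mean across nursing
# homes. Coefficients, attributes, and the logistic link are hypothetical.

intercept = -2.0
b_resident = np.array([0.5, 0.3])      # e.g., dementia flag, ADL count
b_facility = np.array([0.2, -1.0])     # e.g., for-profit flag, RNs per bed

facility_table = np.array([            # one row per nursing home in the state
    [1.0, 0.10],
    [0.0, 0.30],
    [1.0, 0.05],
    [0.0, 0.15],
])
statewide_mean = facility_table.mean(axis=0)


def expected_outcome(resident_x, facility_x=statewide_mean):
    eta = intercept + resident_x @ b_resident + facility_x @ b_facility
    return 1.0 / (1.0 + np.exp(-eta))


residents = np.array([[1.0, 3.0], [1.0, 3.0]])  # two identical residents
home_of = np.array([0, 1])                       # living in homes 0 and 1

with_actual = np.array([expected_outcome(x, facility_table[h])
                        for x, h in zip(residents, home_of)])
with_state_mean = np.array([expected_outcome(x) for x in residents])
print(with_actual)      # differ, because the homes' attributes differ
print(with_state_mean)  # equal: only resident case mix moves expectations
```

Any systematic advantage of, say, better-staffed homes is thus kept out of the expected outcomes and shows up instead in the comparison with actual outcomes.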
In the second step of the facility performance measurement, the quarterly predicted outcomes were then
averaged over all at-risk Medicaid resident-quarters for
a facility over time periods of three different lengths:
half-years, full years, and the 3-year study period. These
predicted mean outcomes are interpreted as outcomes
that would be expected if individual facility outcomes
conformed to the regularities reflected in statewide
performance norms, irrespective of the structural attributes of individual facilities. Facility attributes are held
fixed in generating these expected outcomes, so any
variation in expected outcomes among facilities is solely
the result of case-mix differences in the resident attributes specified in each outcome model.
The third step in the procedure involves the computation of mean actual outcomes for facilities over
all residents and quarters of data for each defined time
period. The fourth step involves the comparison of
actual and predicted mean outcomes for each facility.
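Steps 2 through 4 amount to averaging predicted and actual outcomes over a facility's at-risk resident-quarters in a period and then comparing the two means. The following is a minimal sketch that assumes a hypothetical flat record layout of `(facility_id, quarter, expected, actual)` for each resident-quarter. The function name and the example values are illustrative only.

```python
from collections import defaultdict


def facility_means(records, period):
    """Mean expected and mean actual outcome per facility for one time period.

    records: iterable of (facility_id, quarter, expected, actual), one per
             at-risk resident-quarter (hypothetical layout).
    period:  collection of quarter labels defining a half-year, year, etc.
    Returns {facility_id: (mean_expected, mean_actual, n_resident_quarters)}.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0])
    for facility_id, quarter, expected, actual in records:
        if quarter in period:
            t = totals[facility_id]
            t[0] += expected
            t[1] += actual
            t[2] += 1
    return {f: (e / n, a / n, n) for f, (e, a, n) in totals.items()}


# Illustrative resident-quarter records (actual: 1 = adverse event, 0 = none).
records = [
    ("A", 1, 0.10, 0), ("A", 1, 0.20, 1),
    ("A", 2, 0.30, 0), ("A", 2, 0.20, 0),
    ("B", 1, 0.05, 0), ("B", 2, 0.15, 1),
    ("B", 3, 0.40, 1),  # falls outside the half-year below
]
half_year = facility_means(records, period={1, 2})
```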
It is important to emphasize here that the basic
units of observation for measurement of facility performance over a specific time period are resident-quarters. A resident's first-quarter MMQ data are
used to predict his or her outcomes for the second
quarter. The second-quarter MMQ data are used to predict third-quarter outcomes, and so forth. Because the
measurement of performance outcomes at time t + 1
is based on resident data at time t, one resident with
three quarterly MMQ records will contribute two resident-quarters of data for performance measurement
over the half-year spanning time t and time t + 2.
Two other residents, each with only data for t and
t + 1, would also contribute two resident-quarters of
data to performance measurement over the same time
period.
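The pairing rule can be made concrete with a short sketch. It assumes each resident's MMQ records are identified by consecutive integer quarter indices (a hypothetical encoding). It covers outcomes that need a follow-up MMQ record at t + 1. Survival, which the text notes does not require two quarters of MMQ data, would be handled separately.

```python
def resident_quarters(mmq_quarters):
    """Return (t, t + 1) pairs: data at quarter t predict the outcome at t + 1.

    mmq_quarters: integer quarter indices (hypothetical encoding) with an MMQ
    record for one resident. Each pair is one resident-quarter for measurement.
    """
    observed = set(mmq_quarters)
    return [(t, t + 1) for t in sorted(observed) if t + 1 in observed]


# One resident with records at t, t + 1, t + 2 (here quarters 1-3) ...
one_resident = len(resident_quarters([1, 2, 3]))
# ... contributes as many resident-quarters as two residents observed at t and t + 1.
two_residents = len(resident_quarters([1, 2])) + len(resident_quarters([1, 2]))
# A resident with 12 quarters of MMQ data yields 11 quarterly outcomes.
twelve_quarters = len(resident_quarters(range(1, 13)))
```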
Facility performance is measured for any single
quarter by comparison of observed and predicted
outcomes. Performance is measured over time periods longer than a quarter-year by taking weighted
averages of successive quarterly measurements of outcome performance, with Medicaid resident-quarters
serving as the relative weights. Thus 6-month performance scores are derived as a weighted average of
two successive quarterly measurements of outcome
performance. Of course, by averaging quarterly outcome performance to produce half-year and annual
facility performance measures, it is implicitly assumed
that facility effects on quality are relatively constant
over the 6-month or annual time period (see Appendix 2, Note 1).
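The weighting rule can be sketched as follows. The function name is hypothetical, and the quarterly scores and resident-quarter counts are made-up illustrative values.

```python
def pooled_score(quarterly):
    """Resident-quarter-weighted average of successive quarterly performance scores.

    quarterly: list of (quarterly_score, n_resident_quarters) for one facility.
    """
    total = sum(n for _, n in quarterly)
    if total == 0:
        raise ValueError("no at-risk resident-quarters in this period")
    return sum(score * n for score, n in quarterly) / total


# Illustrative values. Half-year score: two successive quarters.
half_year = pooled_score([(1.04, 80), (0.98, 70)])
# Annual score: four successive quarters.
annual = pooled_score([(1.04, 80), (0.98, 70), (1.01, 75), (0.95, 75)])
```

A quarter with more Medicaid resident-quarters moves the pooled score more, which matches the text's use of resident-quarters as the relative weights.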
An important advantage of measuring half-year or
annual facility performance by averaging successive
quarter-year performances rather than extending the
time period over which individual resident outcomes
are measured (e.g., from a quarter-year to 6 months
or a year) is a reduction in population attrition due to
death or discharge from the facility. For example, if
ADL outcomes were measured between ADL assessments made a year apart from each other, then a resident who dies 10 months after the initial assessment
must be excluded from the study population whose
data are used for the measurement of ADL-facility performance over a year. With the quarterly measurement of ADL outcomes, this same resident would still
contribute two resident-quarters of data for facility performance measurement over the same year.
The measurement of facility outcome performance
on the basis of quarterly outcomes averaged over time
will result in situations where good performance in
one quarter may be offset by poor performance in
another quarter. We believe this is appropriate given
the discrete nature of the adverse outcomes being
used for performance measurement. This can be seen
more clearly by taking a longitudinal perspective on
the survival outcomes of an individual resident. If actual survival is coded as one and death as zero, then
a surviving resident will always contribute favorably
to a facility's quarterly performance measurement because the actual survival outcome (measured as one)
will naturally exceed the expected outcome, a risk-adjusted survival probability between 0 and 1. A resident who dies contributes negatively to facility outcome performance in the quarter of death because
his or her survival probability will always exceed the
observed death outcome (coded as zero). If a resident is observed over many successive quarters until
death and the care provided in the nursing home had
no significant adverse impact on survival chances, the
net contribution of this individual resident to the
facility's survival outcome performance measurement
over the individual's entire residence history would
likely be very marginal. That is, the accumulation of
small favorable contributions toward measurement of
survival outcome performance in the quarters survived would likely be offset by a more substantial
negative contribution to performance measurement in
the quarter of the resident's death. Similar arguments
can be made for the other health outcome measures
and QIs as well. For example, in over 79% of the
quarterly observations in the 15% sample population
used to estimate resident-level outcome models, no
change in ADLs was observed. In general, the health
outcome measures and QIs are based on events with
relatively low prevalence rates.
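The offsetting argument for survival can be checked with a little arithmetic. The sketch below assumes, purely for illustration, a constant risk-adjusted quarterly survival probability of 0.95 and a resident who survives 19 quarters and dies in the 20th. These numbers are chosen so that the favorable and unfavorable contributions cancel exactly. The residual form (actual minus expected) is used only to show the direction and size of each quarter's contribution.

```python
def survival_contributions(expected_probs, died_in_last_quarter):
    """Quarterly (actual - expected) survival contributions for one resident.

    Actual survival is coded 1 and death 0, as in the text.
    """
    contributions = []
    last = len(expected_probs) - 1
    for i, p in enumerate(expected_probs):
        actual = 0.0 if (i == last and died_in_last_quarter) else 1.0
        contributions.append(actual - p)
    return contributions


# Illustrative history: survives 19 quarters, dies in the 20th.
history = [0.95] * 20
c = survival_contributions(history, died_in_last_quarter=True)
net = sum(c)  # 19 * 0.05 - 0.95 = 0, up to floating-point rounding
```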
With the exception of the survival outcomes, all of
the outcome measures reflect adverse events (e.g.,
deterioration of functional status as reflected in higher
levels of ADL limitations). Other than survival outcomes,
performance indices were constructed as: (1 + mean
expected outcome)/(1 + mean actual outcome). One
was added to both the numerator and denominator
of the ratio not only to avoid the need for division by
zero if no adverse events occurred in a facility, but
also to allow performance scores to vary with a facility's
expected outcome level when no actual adverse
events were observed. Hence, a score exceeding one
means that the actual experience of nursing home
residents was better than predicted by the pooled
statewide model. A score less than one means that
the actual experience of residents was worse than predicted by the pooled statewide model. In the case of
survival outcomes, the performance score was constructed as the ratio of actual to predicted surviving
residents so that its scale conformed to that of the
others.
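The two index definitions can be written as small functions. Only the formulas come from the text; the function names and the example means are illustrative assumptions.

```python
def adverse_event_index(mean_expected, mean_actual):
    """(1 + mean expected outcome) / (1 + mean actual outcome).

    Used for the adverse-event outcomes; a value above 1 means fewer adverse
    events than the pooled statewide model predicts.
    """
    return (1.0 + mean_expected) / (1.0 + mean_actual)


def survival_index(actual_survivors, expected_survivors):
    """Ratio of actual to predicted surviving residents; above 1 is better."""
    return actual_survivors / expected_survivors


# Illustrative inputs.
worse = adverse_event_index(0.20, 0.25)          # more events than expected
no_events_low = adverse_event_index(0.10, 0.0)   # no events, low expected level
no_events_high = adverse_event_index(0.30, 0.0)  # no events, high expected level
better_survival = survival_index(96, 95.0)
```

The two "no events" calls show why one is added to the numerator and denominator: with zero observed adverse events, the score is still defined and still rises with the facility's expected (case-mix) level.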
In this study, facility-level outcome performance
measures were developed by averaging "outcome residuals," or the difference between the expected and
actual outcomes of facility residents, over specific time
periods. This approach has been used in many previous studies profiling the performance of various health
care providers, including the performance of nursing
homes (Mukamel, 1997; Phillips, 1990). As a method for profiling the performance of health care providers, however, the approach has been criticized on several grounds.
Normand, Glickman, and Gatsonis (1997) recently
noted that: (a) the precision of provider-specific performance estimates may vary greatly with the sample
size of patients; (b) provider practice styles may induce a strong association among the outcomes of patients served by the same provider; and (c) sampling
variability is not separated from unobserved systematic interinstitutional variability. Patient-level hierarchical, or multilevel, models have been advocated as a
means to address these potential shortcomings of residual facility-level performance measures (Normand
et al., 1997). Because raw facility performance measures, derived as residuals of actual and expected
outcomes, may be imprecise for small providers who
serve very few patients, an appealing feature of the
multilevel estimation methodology is that these raw
performance measures based on actual facility experience are shrunk toward an expected value based on
the experience of a pool of providers. Although this
shrinkage toward a norm can result in a less accurate
measure of performance for an individual facility, it
will produce biased, but more precise, facility performance estimates for all facilities, on average.
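To make the shrinkage idea concrete, here is a minimal sketch of precision-weighted (empirical Bayes style) shrinkage under a simple normal-normal model. This is not the method used in this study, which relies on conventional residual-based measures, and it is far simpler than a full hierarchical model. The variance components `sigma2` and `tau2` are assumed known rather than estimated, and all values are illustrative.

```python
def shrink(raw_scores, sizes, sigma2, tau2):
    """Shrink raw facility scores toward their size-weighted mean.

    sigma2: assumed resident-level (within-facility) variance.
    tau2:   assumed between-facility variance of true performance.
    Weight on a facility's own score: tau2 / (tau2 + sigma2 / n).
    """
    total = sum(sizes)
    grand_mean = sum(r * n for r, n in zip(raw_scores, sizes)) / total
    shrunk = []
    for r, n in zip(raw_scores, sizes):
        w = tau2 / (tau2 + sigma2 / n)  # larger n -> more weight on own data
        shrunk.append(w * r + (1.0 - w) * grand_mean)
    return shrunk, grand_mean


# Illustrative raw scores and Medicaid resident-quarter counts.
raw = [1.20, 1.02, 0.98]
sizes = [10, 200, 150]
shrunk, grand_mean = shrink(raw, sizes, sigma2=0.5, tau2=0.001)
```

The small facility's extreme raw score is pulled most of the way toward the pooled mean, which is the bias-for-precision trade-off described above.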
Although there are merits to using multilevel models to profile the outcome performance of nursing
facilities, in this study we developed conventional
residual-based performance measures similar to those
developed by Phillips (1990) and Mukamel (1997). However, we have taken specific steps to minimize the
potential impacts of the recognized shortcomings of
residual-based outcome performance measures. Most
importantly, in our empirical analyses of facility performance measures, facility cases were either weighted
in proportion to the relative size of their Medicaid
resident populations, or facility cases were restricted
to those facilities with Medicaid resident populations
exceeding the state median of 78 Medicaid residents
during a quarter-year. In addition, multiple econometric
methods have been employed to estimate resident-level model parameters to ensure that a robust set of
coefficients was used to generate expected resident
outcomes for facilities (see Appendix 2, Note 2).
Empirical Results
Our investigation of the empirical properties of the
facility-level performance measures consisted of both
cross-sectional and longitudinal analyses. The cross-sectional analyses entailed the computation of: (a) basic descriptive statistics about the distribution of performance scores among facilities in the state; (b)
intercorrelations among the nine facility performance
measures; and (c) correlations between performance
scores and a set of facility attributes. Given that the
median number of Medicaid residents per facility in a
quarter was only about 78 residents and resident-level
changes in the outcome measures employed were relatively infrequent, 3-year performance measures were
employed for the cross-sectional analyses of performance measures. The longitudinal properties of facility outcome performance measures were investigated
for facility performance scores measured over half-year
and annual time periods. The longitudinal analyses
entailed the computation of: (a) correlations among
facility outcome performance scores over time, and
(b) prevalence rates for the repeated flagging of facilities with high/low outlier performance over the 3-year
study time period.
Facility Performance Measure Descriptive Statistics
Table 2 contains some descriptive statistics for the
3-year facility-level performance indices for 566 nursing homes with at least one quarterly performance
score during the study period. The facility scores for each outcome measure were generally symmetrically distributed, with the mean score roughly equal to the median. The small coefficients of variation (i.e., 100 × (standard deviation/mean)) suggest relatively little variation among facilities in their 3-year performance scores. The greatest dispersion among facilities was found in the performance scores for restraint use, yet even this level of dispersion is fairly modest. Although the modest dispersion in performance scores may be partially attributable to averaging effects associated with the use of 3 years of data in constructing the performance measures, it may also reflect the highly regulated nature of the industry in Massachusetts. Unfortunately, Mukamel (1997) did not report comparable measures of the dispersion of facility outcome performance measures to permit a comparison with those reported here.
Table 2. Descriptive Statistics on Facility-Level Outcome Performance Measures (N = 556)
Outcome Measure
Survival Rate
ADL Functional Status
Incontinence Status
Behavioral/Cognitive Status
Decubitus Ulcers
Restraint Use
Accidents
Contractures
Weight Change
Coefficient of Variation
Minimum
Maximum
1.18
1.18
1.64
.96
.96
.95
1.12
1.11
1.10
2.14
2.89
16.66
5.69
1.32
4.47
.92
.86
.54
.72
.94
.81
1.12
1.08
1.63
1.14
1.06
1.09
Intercorrelations Among Outcome Measure Performance Scores
Table 3 contains Pearson correlation coefficients
among the nine facility-level performance measures
computed over the entire study time period. Similar
results were found when performance was measured
over shorter time periods. Facility performance scores were assigned relative case weights based on the number of resident-quarters used to measure a facility's performance, because facilities with fewer Medicaid residents will have less reliable performance measures. The intercorrelations among the performance scores are modest at best and are not uniformly positive. The largest correlations were those between the performance scores for accidents and unplanned weight change (0.31), between incontinence and behavioral/cognitive status (-0.24), and between incontinence and survival rate (0.20). Whereas the first correlation suggests that facilities with better than expected performance on accidents also tend to have better than expected outcomes with respect to unplanned weight changes, the latter two suggest that facilities with better than expected incontinence outcomes also tend to have better than expected survival outcomes, but worse than expected
behavioral/cognitive status outcomes.
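Case-weighted Pearson correlations like those in Table 3 can be computed with a short function. The text does not say exactly how the weights entered the calculation. This sketch assumes the common frequency-style weighting of the means, variances, and covariance, and it does not reproduce the p-values.

```python
import math


def weighted_pearson(x, y, w):
    """Pearson correlation of x and y with case weights w (e.g., resident-quarters).

    Assumes frequency-style weights applied to means, variances, and covariance.
    """
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)


# Illustrative data: a zero-weight facility does not affect the result.
r = weighted_pearson([1.0, 2.0, 3.0, 10.0], [1.0, 2.0, 3.0, -50.0], [5, 5, 5, 0])
```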
Overall, the correlations suggest that there are few
nursing homes with uniformly much better or much
worse than expected performance on all of the outcome measures. Furthermore, the modest intercorrelations among performance measures do not even
reveal much in the way of strong systematic patterns
among a few subsets of the outcome measures. The general magnitudes and pattern of the correlations are consistent with the correlations among outcome measures reported by Mukamel (1997) for nursing homes in Upstate New York.
Table 3. Intercorrelations Among Facility-Level Outcome Performance Measures (N = 556)a
ADL
Functional
Status
ADL Functional Status
Survival
Rate
Behavioral/
Cognitive
Status
Incontinence
Status
Decubitus
Ulcers
Restraint
Use
Accidents
Contractures
1.00
-0.08
(0.05)*
1.00
Behavioral/Cognitive
Status
0.09
(0.03)*
-0.00
(0.91)
Incontinence Status
-0.24
(0.01)*
0.15
(0.01)*
-0.20
(0.01)*
1.00
0.06
(0.13)
0.05
(0.25)
0.02
(0.58)
0.02
(0.70)
Restraint Use
-0.07
(0.08)
-0.03
(0.52)
-0.11
(0.01)
0.01
(0.75)
0.17
(0.01)*
1.00
Accidents
-0.04
(0.31)
0.06
(0.13)
-0.02
(0.56)
0.03
(0.49)
0.10
(0.02)*
-0.07
(0.09)
1.00
Contractures
-0.06
(0.17)
-0.01
(0.86)
0.01
(0.88)
0.06
(0.13)
0.14
(0.01)*
0.06
(0.15)
0.12
(0.01)*
1.00
Weight Change
-0.05
(0.16)
0.01
(0.93)
0.02
(0.55)
0.03
(0.47)
-0.01
(0.81)
0.31
(0.01)*
0.07
(0.11)
Survival Rate
Decubitus Ulcers
Weight
Change
1.00
1.00
-0.03
(0.42)
1.00
a
Cases weighted by total resident-quarters over the study period.
*p value in parenthesis < .05.
Correlations With Facility Attributes
Table 4 contains a summary of significant (p < .05)
Pearson correlations between the 3-year facility performance scores and attributes of nursing homes. Cases
were again weighted by relative Medicaid population
size over the study period. There were no significant
correlations between any performance measure and
two facility attributes: the for-profit status of a facility,
and a facility's average net revenue as a percent of
total annual costs. Otherwise, very modest correlations
were found between facility attributes and performance
measures, and only a few attributes had significant correlations with more than one performance measure.
Table 4. Significant (p < .05) Pearson Correlations Between 3-Year Performance Scores and Facility Attributes
Outcome Measure
Survival Rate
Facility Attribute (N = 520)
Pearson r
-0.09
-0.22
Management firm
Medicare days
Private pay days
Nursing pool expenses
RN nursing expenses
LPN nursing expenses
OBRA quality of care deficiencies (N = 525)
-0.12
-0.13
-0.10
0.20
0.09
ADL Functional Status
Behavioral/Cognitive Status
Incontinence Status
Decubitus Ulcers
-0.09
0.09
Medicare days
Private pay days
Medicare days
Nursing FTE per patient day (N = 512)
0.18
0.11
Operating tenure
Medicare days
RN nursing expenses
LPN nursing expenses
0.09
-0.14
-0.10
-0.09
0.07
Operating tenure
Restraint Use
0.09
-0.15
-0.14
Nursing pool labor expenses
OBRA deficiencies (N = 525)
OBRA quality of care deficiencies (N = 525)
Accidents
-0.11
Bed size
Contractures
-0.12
-0.14
-0.13
-0.09
Medicare days
RN expenses
LPN expenses
OBRA quality of care deficiencies (N = 525)
Weight Change
0.12
-0.12
Operating tenure
Nursing FTE per patient day (N = 512)
Notes: Facility attribute definitions: Not-for-profit = 1 for not-for-profit facilities, = 0 otherwise (a). Management firm = 1 for facilities operated by a management company, = 0 otherwise. Operating tenure = the tenure of facility operation under the current ownership in years. Bed size = the mean number of certified skilled, intermediate care, and rest home beds for the facility. Net revenue = annual facility net revenue from all sources as a percent of total costs (a). Nursing FTE per patient day = average annual FTE nursing staff hours (RNs, LPNs, nurse aides) per patient day. RN expenses = total RN expenses as a percent of total annual nursing expenses. LPN expenses = total LPN expenses as a percent of total annual nursing expenses. Nursing pool labor expenses = total nursing expenses for non-staff nursing services as a percent of total annual nursing expenses. Private pay days = private payer days as a percent of total annual patient days from all payers. Medicare days = Medicare payer days as a percent of total annual patient days from all payers. OBRA quality of care deficiencies = mean annual OBRA quality of care subcomponent deficiency tags 1991-1993. OBRA deficiencies = mean annual total OBRA deficiency tags 1991-1993.
(a) Facility attribute was not significantly correlated with any outcome measure.
Facility performance measures were correlated with two separate Omnibus Budget Reconciliation Act (OBRA) deficiency tag variables: (a) average annual counts of all OBRA deficiency tags, and (b) average annual counts of only "quality of care" OBRA deficiency tags. The average annual count of all OBRA deficiency tags for a facility was significantly (negatively) correlated only with restraint-use performance. Although statistical significance was achieved only for the survival, behavioral/cognitive status, and restraint-use performance measures, all facility performance measures except unplanned weight changes were negatively correlated with the average count of OBRA "quality of care" deficiencies over the study period.
Given the kinds of items comprising the specific subset of OBRA deficiency tags classified as "quality of
care" deficiency tags, the nearly uniform modest negative associations between the quality of care deficiency
tag variable and our facility performance scores are
supportive of the validity of the facility performance
indicators.
The pattern of positive and negative associations between the percentage of nursing expenses allocated to LPNs and RNs, respectively, and the survival, incontinence, and contracture performance measures may also be plausible given the relative wage levels of these nurses and the care needs of a long-stay, aged Medicaid nursing home population. For the same level of nursing staff expenditure, more intensive use of LPNs rather than RNs will increase the level of professional nursing FTEs per resident. Such nurse staffing patterns may better serve the labor-intensive service needs of such an institutionalized aged Medicaid population.
Although considerable care was taken in the specification of resident case-mix attributes in the resident-level econometric models used to generate expected outcomes, the significant associations between the percentage of patient days paid for by Medicare and the survival, ADL, incontinence, contracture, and behavioral/cognitive performance measures are plausibly the result of residual unspecified resident case-mix effects.
Correlation of Performance Measures Over Time
The practical use of facility outcome performance
measures for quality assurance or reimbursement purposes will likely require that performance be measured
over time periods much shorter than 3 years. Some
insights about the longitudinal properties of such
facility-level outcome performance measures are found
in Table 5, which contains Pearson correlations among
repeated annual and half-year performance scores
of facilities over time. Because some facilities were
newly opened and others were closed (for various reasons) during the study time period, this analysis was
restricted to a subset of facilities observed over the
entire study period. Similar results were found when
the number of study facilities was allowed to vary
among quarters. Because the performance scores for
facilities with relatively small Medicaid resident populations are likely to exhibit greater temporal instability
due to effects of sampling variation, performance score
cases were weighted by their relative Medicaid resident population size, defined for each facility as the
minimum population between the two time periods
being compared.
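Assembling the weighted facility pairs for one of these temporal correlations might look like the sketch below. The data layout (`facility_id -> (score, medicaid_population)` for each period) and the function names are hypothetical. The two rules it encodes come from the text: keep only facilities observed in every period, and weight each facility by the smaller of its two Medicaid populations. The resulting `x`, `y`, and `w` lists can be passed to any case-weighted Pearson routine.

```python
def balanced_panel(periods):
    """Facilities present in every period (drops openings and closures)."""
    return sorted(set.intersection(*(set(p) for p in periods)))


def weighted_pairs(periods, i, j):
    """Scores in periods i and j plus min-population weights for each facility.

    periods: list of dicts, facility_id -> (performance_score, medicaid_population)
             (hypothetical layout).
    """
    facilities = balanced_panel(periods)
    a, b = periods[i], periods[j]
    x = [a[f][0] for f in facilities]
    y = [b[f][0] for f in facilities]
    w = [min(a[f][1], b[f][1]) for f in facilities]
    return facilities, x, y, w


# Illustrative scores and populations for three periods.
periods = [
    {"A": (1.02, 90), "B": (0.97, 60), "C": (1.10, 40)},  # C closes later
    {"A": (1.00, 85), "B": (0.99, 70), "C": (1.05, 45)},
    {"A": (1.01, 88), "B": (0.96, 65), "D": (1.03, 50)},  # D newly opened
]
facilities, x, y, w = weighted_pairs(periods, 0, 1)
```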
The top and bottom portions of Table 5 report Pearson correlations for annual and half-year facility performance measures, respectively. Because half-year facility performance scores are based on only two quarters of resident outcomes (rather than the four quarters of outcomes used for annual performance scores), greater temporal instability is expected for half-year scores due to sampling variation. Although the data support this expectation, note that the correlations between alternate half-year performance scores (e.g., T1 and T3, which correspond to outcomes measured over the same months in successive years) were only a little smaller than those between successive annual performance scores. Shifts in facility performance scores over time appear to be more substantial than would be expected from sampling variation alone. With some exceptions, facility performance measures separated by longer intervals of time are less highly correlated. That is, facility performance scores in time periods T1 and T2 are more highly correlated than are facility performance scores between time periods T1 and T3, and between time periods T1 and T4, and so forth. In fact, the half-year survival and ADL facility performance scores of 504 facilities between time periods T1 and T6 (separated by 2 years) were essentially uncorrelated.
Table 5. Pearson Correlations of Annual and Half-Year Facility Outcome Performance Scores Over Time (N = 504)
Pearson Correlation of Performance Scores Between Years Y1-Y3
Outcome Performance Indicator   Y1 & Y2   Y2 & Y3   Y1 & Y3
Survival Rate                   0.204*    0.200*    0.129*
ADL Functional Status           0.137*    -0.001    0.090*
Behavioral/Cognitive Status     0.454*    0.304*    0.322*
Incontinence Status             0.213*    0.096*    0.197*
Decubitus Ulcers                0.623*    0.503*    0.394*
Restraint Use                   0.861*    0.930*    0.602*
Contractures                    0.453*    0.288*    0.336*
Accidents                       0.752*    0.779*    0.627*
Weight Change                   0.699*    0.726*    0.462*
Pearson Correlation of Half-Year Performance Scores Between 6-Month Time Periods T1-T6
Outcome Performance Indicator   T1 & T2   T1 & T3   T1 & T4   T1 & T5   T1 & T6
Survival Rate                   0.216*    0.053     0.138*    0.116*    0.055
ADL Functional Status           0.190*    0.070     0.045     0.053     0.034
Behavioral/Cognitive Status     0.266*    0.338*    0.174*    0.249*    0.183*
Incontinence Status             0.209*    0.199*    0.088*    0.118*    0.060*
Decubitus Ulcers                0.443*    0.483*    0.379*    0.317*    0.127*
Restraint Use                   0.931*    0.848*    0.718*    0.585*    0.457*
Contractures                    0.368*    0.269*    0.309*    0.206*    0.149*
Accidents                       0.795*    0.654*    0.595*    0.558*    0.501*
Weight Change                   0.716*    0.595     0.534*    0.381*    0.310*
*p < .05.
One of the more interesting findings from Table 5, however, is the markedly lower temporal correlation of facility performance scores for the broader health outcomes of survival, ADLs, and incontinence relative to the QI performance measures. The highest temporal correlations of facility performance scores were found for the QIs of restraint use and accidents. These results suggest that, over the study period at least, facilities were much more likely to exhibit consistently superior or inferior relative performance with respect to the prevalence of restraint use and accidents than with respect to resident survival and maintenance of ADL functioning.
Flagged Outlier Facilities Over Time
The most practical use of facility outcome performance measures for quality assurance purposes may be to target outlier facilities with inferior outcome performance for further investigation of potential quality of care problems. Although any single threshold definition of outlier status involves some ambiguity, we followed the approach suggested by Zimmerman and colleagues (1995) and set the inferior performance outlier threshold at the lowest decile of the statewide distribution of facility performance scores. That is, nursing homes with performance scores placing them at or below the 10th percentile of the facility distribution were flagged as inferior performers. For the purpose of illustrating the empirical properties of the facility performance scores, facilities were also flagged as superior performers when their performance scores were above the 90th percentile of the facility distribution. Both annual and half-year facility performance scores were used to distinguish superior and inferior outlier performance facilities over the 3-year study period. The facility distributions of performance scores used to set the upper and lower thresholds for flagging outliers included all nursing homes with Medicaid residents in the time period, regardless of the size of their Medicaid resident population.
To provide some empirical insight into the discriminatory power of performance measures for flagging outlier facilities, prevalence rates of repeated outlier status were computed for a sample of nursing homes. To reduce the influence of sampling variation associated with very small Medicaid populations on the findings, the facility sample for this analysis was restricted to nursing homes with at least the median facility-level Medicaid resident population for half-year time periods throughout the entire 3-year study period. There were 233 facilities meeting the sample selection requirements (see Appendix 2, Note 3).
Table 6 contains the empirical findings on the prevalence rates for outliers flagged with annual performance scores for each of the nine outcome measures. Among outcome measures, between 65% and 74% of facilities were never flagged for superior or inferior performance over the 3 years. Among the residual groups of facilities flagged at least once over the 3 years, roughly two thirds were flagged only once as a superior or inferior performance facility for most outcomes. A very small number of facilities were identified as both superior and inferior performance outliers at least once over the 3-year period; more of them were found for the broader survival, ADL, and incontinence health outcomes than for the QI performance measures.
Table 6. Prevalence of Flagged High/Low Outlier Performance Facilities With Annual Performance Scores Over 3 Years (N = 233)
Number of Times a Facility Is Flagged as Outlier Over Three Years (percentage of facilities)
Times in Highest Decile         3       2       1       0       0       0       0       1-2
Times in Lowest Decile          0       0       0       0       1       2       3       1-2     Total
Survival Rate                   0.0%    0.9%    12.9%   70.8%   12.1%   2.0%    0.0%    1.3%    100%
ADL Functional Status           0.4%    0.0%    10.3%   74.3%   12.0%   1.3%    0.0%    1.7%    100%
Incontinence Status             0.0%    1.7%    8.1%    73.8%   12.5%   2.2%    0.0%    1.7%    100%
Behavioral/Cognitive Status     1.3%    3.9%    10.3%   71.7%   7.7%    3.0%    0.4%    1.7%    100%
Decubitus Ulcers                0.4%    3.0%    14.6%   66.1%   10.3%   3.9%    1.3%    0.4%    100%
Accidents                       2.2%    3.0%    6.9%    67.4%   9.4%    6.4%    4.7%    0.0%    100%
Restraint Use                   5.6%    4.3%    7.3%    64.8%   12.0%   4.3%    1.3%    0.4%    100%
Contractures                    2.6%    3.9%    4.7%    67.8%   15.9%   1.3%    2.6%    1.2%    100%
Weight Change                   3.0%    1.3%    7.7%    71.7%   9.0%    4.7%    2.6%    0.0%    100%
A relatively small fraction of facilities were repeatedly flagged as either superior or inferior performance
facilities over the study period. The prevalence of such
repeated outlier status, however, was much greater
for QIs such as decubitus ulcers, accidents, and restraint use than for the broader health outcomes. Furthermore, given the modest intercorrelations among
outcome performances reported earlier in Table 3, it
is not surprising that there were very few individual
facilities that were repeatedly flagged as high or low
outlier performance facilities on more than one outcome measure. As a consequence of the variable performance both among outcome measures and over
time periods, a comparison of mean values for various structural facility attributes (e.g., profit status, nurse
staffing level) among subgroups of facilities defined on
the basis of repeated high outlier status, repeated low
outlier status, or repeated non-outlier status did not
produce a consistent pattern of differences in facility
attributes among the subgroups.
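The decile-based flagging and the repeat counts summarized in Table 6 can be sketched as follows. The percentile interpolation rule is an assumption, since the text does not say how percentiles were computed. The cutoffs (at or below the 10th percentile, above the 90th) follow the text, as does the design choice of setting thresholds from all facilities while counting flags only for the restricted sample. Function names and scores are illustrative.

```python
from collections import Counter


def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0-100) of an ascending list.

    The interpolation rule is an assumption; the text does not specify one.
    """
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def flag_outliers(scores):
    """scores: facility_id -> score for one period, covering ALL facilities."""
    ordered = sorted(scores.values())
    p10, p90 = percentile(ordered, 10), percentile(ordered, 90)
    flags = {}
    for facility, s in scores.items():
        if s <= p10:
            flags[facility] = "low"   # inferior performer
        elif s > p90:
            flags[facility] = "high"  # superior performer
    return flags


def flag_counts(periods, sample):
    """Count high and low flags for each sampled facility across periods."""
    counts = {f: Counter() for f in sample}
    for scores in periods:
        for facility, flag in flag_outliers(scores).items():
            if facility in counts:
                counts[facility][flag] += 1
    return counts


# 20 facilities with illustrative scores 0.91 ... 1.10, identical in 3 periods.
period = {f"F{i}": 0.90 + i / 100 for i in range(1, 21)}
counts = flag_counts([period, period, period], sample={"F1", "F10", "F20"})
```

A facility's Table 6 category then follows from its pair of counts; for example, a facility with zero high and zero low flags falls in the "never flagged" column.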
Table 7 contains our empirical findings for the half-year performance scores. Overall, the half-year performance scores discriminate much less among facilities with respect to outlier performance than do the annual scores. A much smaller proportion of facilities (between 18% and 56%) were never flagged as performance outliers over the six half-year time periods. Although differences among outcome measures were accentuated, the pattern of empirical results among outcomes was similar to that reported for annual performance scores. Repeated superior or inferior outlier performance status was much more prevalent for QIs such as restraint use, accidents, and unplanned weight changes than for the broader health outcomes of survival, ADLs, and incontinence.
The findings for the survival outcome measure are particularly striking with respect to their variability. Over 80% of the sample nursing homes were flagged as outliers at least once, and nearly 25% of them were flagged for both superior and inferior survival performance at least once over the six time periods. Because the study sample for this analysis was restricted to facilities with larger Medicaid resident populations, even less discriminatory power would be found if all facilities were included.
Discussion
This study has broken new ground in the development and testing of facility-level performance measures for multiple health outcomes and QIs. The implications of our study findings are not entirely clear. Our results could suggest that, at least for long-term aged Medicaid residents, outcome performance differences among nursing homes in a highly regulated state like Massachusetts are subtle enough that there are only weak associations among measurable outcomes and between those outcomes and facility attributes. On the other hand, the subtle facility performance differences may have less to do with strict regulation than with the study population itself. During the study period, the average Medicaid nursing home resident in Massachusetts was nearly 83 years old and had about 3.7 ADL limitations (out of a maximum of 5). Nevertheless, in each quarter-year more than 95% of Medicaid nursing home residents survived to the next quarter, with survivors experiencing an increase of only 0.06 ADLs between successive quarters. Given the slow but usually irreversible decline in health and functional status experienced by this institutionalized population, and the limits of administrative data for making sensitive case-mix adjustments, most nursing homes may simply perform a little better than average on some measures and a little worse than average on others.
Table 7. Prevalence of Flagged High/Low Outlier Performance Facilities With Half-Year Performance Scores Over 3 Years (N = 233)
Number of Times a Facility Is Flagged as Outlier Over Six Half-Year Time Periods (percentage of facilities)
Times in Highest Decile         3-6     2       1       0       0       0       0       1-5
Times in Lowest Decile          0       0       0       0       1       2       3-6     1-5     Total
Survival Rate                   1.3%    9.9%    19.8%   17.6%   18.5%   7.3%    2.2%    23.6%   100%
ADLs                            2.2%    6.0%    15.5%   31.3%   21.0%   5.2%    2.2%    16.8%   100%
Incontinence Status             2.2%    5.6%    12.9%   33.9%   20.2%   7.7%    3.0%    14.6%   100%
Behavioral/Cognitive Status     7.3%    8.2%    11.2%   40.8%   16.3%   3.9%    4.7%    7.7%    100%
Decubitus Ulcers                6.0%    9.9%    23.6%   26.2%   15.5%   4.7%    4.7%    9.0%    100%
Accidents                       6.4%    4.3%    6.4%    52.4%   10.7%   6.0%    13.7%   0.9%    100%
Restraint Use                   10.3%   3.9%    4.7%    55.8%   9.0%    8.2%    8.2%    2.2%    100%
Contractures                    5.6%    2.6%    10.7%   34.8%   20.2%   5.6%    5.2%    15.5%   100%
Weight Change                   6.9%    5.6%    8.2%    54.5%   11.2%   5.2%    8.6%    0.4%    100%
The empirical findings of this study have implications for the practical use of facility outcome performance measures for quality assurance purposes in the near future. Our findings suggest that very strong facility performance on some outcome measures may well coexist with very weak facility performance on others. Whether simultaneous strong and weak performance on various outcomes is a common result of the multidimensionality of the concept of quality is unknown at the present time.
Mukamel (1997) found similar results for facility performance measures derived from case-mix reimbursement data in New York. Although it may be unrealistic to expect that a nursing home of superior
quality will exhibit superior outlier performance for
all outcomes, very divergent performance on various
outcomes would seem to be at odds with common
perceptions of high quality nursing homes. Additional
research in states with different regulatory climates
and/or broader resident populations should provide
important comparative data to better understand some
of the current study findings.
Our study findings also suggest that temporal shifts in facility performance scores may be common. A facility exhibiting superior performance a year or two ago for certain outcomes (particularly broader health outcomes) may be just as likely to exhibit average or even inferior performance today. It is reasonable to ask whether many of the facilities exhibiting temporal shifts in outcome performance were subject to sanctions or corrective action as a result of regulatory activity. Although it was beyond the scope of this study to examine formally the degree to which facility outcome performance changes followed regulatory sanctions, temporal shifts in facility performance were just as prevalent in facilities with no OBRA deficiencies as in those with OBRA deficiency citations. There are also common anecdotal perceptions about rapid shifts in the perceived quality of certain nursing homes associated with events such as the turnover of key staff members, but such situations would have to be fairly widespread to account for the small intertemporal correlations in performance scores. Our comparisons of empirical findings for half-year and annual performance measures suggest that the findings cannot be attributed simply to sampling variation effects either.
Although the basis for our empirical findings is uncertain at this time, at a minimum, the mediocre ability of our facility performance scores to discriminate outlier facilities over time raises some questions about their practical utility for effective targeting of limited quality assurance resources. Validation research is needed to assess the effectiveness of a quality assurance regulatory strategy that would be based on detailed reviews triggered by outlier status on facility outcome performance measures.
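The outlier tabulation summarized in Table 7 can be sketched in a few lines of Python. This is a minimal illustration under assumptions the text does not spell out: a higher score means better performance, decile cutoffs are computed among the scored facilities separately in each half-year period, and the facility IDs and scores are hypothetical. Each facility's highest- and lowest-decile flags are counted across periods and then mapped to categories like the table's column headings.

```python
import statistics
from collections import Counter


def flag_counts(scores_by_period):
    """Count how often each facility falls in the top or bottom decile.

    scores_by_period: list of dicts {facility_id: performance score},
    one dict per half-year period; higher score = better (assumption).
    Returns {facility_id: (times_in_highest_decile, times_in_lowest_decile)}.
    """
    counts = {}
    for period in scores_by_period:
        cuts = statistics.quantiles(period.values(), n=10)  # 9 decile cut points
        low_cut, high_cut = cuts[0], cuts[-1]
        for fid, score in period.items():
            high, low = counts.get(fid, (0, 0))
            counts[fid] = (high + (score >= high_cut), low + (score <= low_cut))
    return counts


def outlier_category(times_high, times_low):
    """Map flag counts to categories like the column headings of Table 7."""
    if times_high and times_low:
        return "high and low"
    if times_high:
        return "high 3-6" if times_high >= 3 else f"high {times_high}"
    if times_low:
        return "low 3-6" if times_low >= 3 else f"low {times_low}"
    return "never flagged"


def prevalence(scores_by_period):
    """Percentage of facilities falling in each outlier category."""
    counts = flag_counts(scores_by_period)
    tally = Counter(outlier_category(h, l) for h, l in counts.values())
    return {category: 100.0 * k / len(counts) for category, k in tally.items()}


# Hypothetical example: 10 facilities whose scores never change over 6 periods
periods = [{f"f{i}": float(i) for i in range(10)} for _ in range(6)]
print(prevalence(periods))
```

With perfectly stable scores, the same facilities are flagged in every period; the low intertemporal correlations reported above push facilities toward the "1-5/1-5" and single-flag categories instead.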
Use of multilevel modeling estimation for measuring facility-level outcome performance would be expected to produce performance measures with greater temporal stability, particularly for facilities with small resident populations. However, multilevel modeling techniques may have shortcomings in practice as well. Fitz-Gibbon (1991) questioned whether it is practical to use shrinkage adjustments in a working system of school performance measurement. Teachers confided to her that although residual-based performance measurement could be followed easily, shrinkage methods were much more difficult to understand. In addition, Fitz-Gibbon noted that shrinkage effects might actually obscure real changes in school effectiveness because the amount of shrinkage applied to a school's raw performance may change over time. Finally, Fitz-Gibbon noted that as long as schools are aware of the limitations of results for small samples, the face validity of residual performance measures may be more important to educators than the temporal stability afforded by shrinkage estimates when performance results are reported to schools each year. Certainly some nursing home industry practitioners, particularly those associated with facilities whose exemplary actual performance is shrunk toward a state norm, may raise similar questions about the merits of complex shrinkage performance measures that are only partially based on actual nursing home resident experience.
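To make the residual-versus-shrinkage contrast concrete, the following Python sketch compares an observed-minus-expected score with an empirical Bayes (shrinkage) estimate from a simple random-intercept model. It illustrates the general idea under stated assumptions and is not the multilevel specification of any study cited here. The between-facility variance `tau2`, the resident-level variance `sigma2`, and the facility figures are hypothetical, and the state norm is taken to be a residual of zero.

```python
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    n_residents: int      # residents contributing to the facility's score
    observed_mean: float  # mean actual outcome
    expected_mean: float  # mean risk-adjusted (model-predicted) outcome


def residual_score(f: Facility) -> float:
    """Residual performance measure: observed minus expected outcome."""
    return f.observed_mean - f.expected_mean


def shrunk_score(f: Facility, tau2: float, sigma2: float) -> float:
    """Empirical Bayes estimate under a random-intercept model (hypothetical sketch).

    tau2:   assumed variance of true facility effects across facilities
    sigma2: assumed resident-level residual variance
    The raw residual is pulled toward 0 (the state norm) by a reliability
    weight that approaches 1 as the number of residents grows.
    """
    reliability = tau2 / (tau2 + sigma2 / f.n_residents)
    return reliability * residual_score(f)


small = Facility("small home", n_residents=10, observed_mean=0.80, expected_mean=0.50)
large = Facility("large home", n_residents=200, observed_mean=0.80, expected_mean=0.50)
for f in (small, large):
    print(f.name, round(residual_score(f), 3), round(shrunk_score(f, tau2=0.01, sigma2=1.0), 3))
```

With these made-up numbers, the same raw residual of 0.30 is shrunk to roughly 0.03 for a 10-resident facility but only to 0.20 for a 200-resident facility. That is the small-facility effect described above, and also the reason practitioners at small, high-performing homes might object to shrinkage.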
Overall, our study findings suggest that some difficult issues must be grappled with before the practical
use of data from facility-level nursing home outcome
performance measures can be established for quality
assurance or reimbursement purposes. Nursing home
industry practitioners are likely to be skeptical of outcome performance data that do not show a great deal
of construct validity with regard to expectations about
variations among facilities and stability over time. Future longitudinal studies of nursing home outcomes
incorporating data from a wider variety of geographic
markets are needed to provide additional insights
about these issues.
References
Aaronson, W., Zinn, J., & Rosko, M. (1994). Do for-profit and not-for-profit nursing homes behave differently? The Gerontologist, 34, 775-786.
Allison, P. (1984). Event history analysis. Newbury Park: Sage Publications.
Andersen, N., & Stone, L. (1969). Research and public policy. The Gerontologist, 9, 214-218.
Braun, B. (1991). The effect of nursing home quality on patient outcome. Journal of the American Geriatrics Society, 39, 329-338.
Breen, R. (1996). Regression models: Censored, sample selected, or truncated data. Newbury Park: Sage.
Burton, L., German, P., Rovner, B., Brant, L., & Clark, R. (1992). Mental illness and the use of restraints in nursing homes. The Gerontologist, 32, 164-170.
Cohen, J., & Spector, W. (1996). The effect of Medicaid reimbursement on quality of care in nursing homes. Journal of Health Economics, 15, 23-48.
Davis, M. (1991). On nursing home quality: A review and analysis. Medical Care Review, 48, 129-165.
Evans, L., & Strumpf, N. (1989). Tying down the elderly: A review of the literature on physical restraint. Journal of the American Geriatrics Society, 37, 65-74.
Fitz-Gibbon, C. (1991). Multilevel modeling in an indicator system. In S. Raudenbush & J. Willms (Eds.), Schools, classrooms, and pupils (pp. 67-83). New York: Academic Press.
Graber, D., & Sloane, P. (1995). Nursing home survey deficiencies for physical restraint use. Medical Care, 33, 1051-1063.
Hawes, C., Morris, J., Phillips, C., Mor, V., Fries, B., & Nonemaker, S. (1995). Reliability estimates for the Minimum Data Set (MDS) for nursing home resident assessment and care screening. The Gerontologist, 35, 172-178.
Heckman, J. (1979). Sample selection bias as a specification error. Econometrica, 47, 153-161.
Hogan, A., Smith, D., & Jameson, J. (1986). Functional assessment of nursing home patients: Reliability and relevance. Evaluation and the Health Professions, 9, 339-360.
Huber, P. (1967). The behavior of maximum likelihood estimates under non-standard conditions. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1, 221-233.
Kane, R., Bell, R., Riegler, S., Wilson, A., & Keeler, E. (1983). Predicting the outcomes of nursing home patients. The Gerontologist, 23, 200-206.
Linn, M., Gurel, L., & Linn, B. (1977). Patient outcome as a measure of quality of nursing home care. American Journal of Public Health, 67, 337-343.
Menard, S. (1995). Applied logistic regression analysis. Newbury Park: Sage.
Mukamel, D. (1997). Risk-adjusted outcome measures and quality of care in nursing homes. Medical Care, 35, 367-385.
National Center for Health Statistics. (1982). Evaluation of the long-term care Minimum Data Set. Hyattsville, MD: U.S. Department of Health and Human Services.
Normand, S., Glickman, M., & Gatsonis, C. (1997). Statistical methods for profiling providers of medical care: Issues and applications. Journal of the American Statistical Association, 92, 803-814.
Norton, E. (1992). Incentive regulation of nursing homes: Specification tests of the Markov model. In D. A. Wise (Ed.), Topics in the economics of aging (pp. 275-303). Chicago: The University of Chicago Press.
Nyman, J. (1988). Improving the quality of nursing home outcomes. Medical Care, 26, 1158-1171.
Ouslander, J., Kane, R., & Abrass, I. (1982). Urinary incontinence in elderly nursing home patients. Journal of the American Medical Association, 248, 1194-1198.
Phillips, C. (1990). Developing a method of assessing quality of care in nursing homes using key indicators and population norms. Journal of Aging and Health, 3, 407-422.
Phillips, C., Hawes, C., Mor, V., Fries, B., Morris, J., & Nennstiel, M. (1996). Facility and area variation affecting the use of physical restraints in nursing homes. Medical Care, 34, 1149-1162.
Porell, F., Caro, F., & Silva, A. (1993, November). Medicaid audit of nursing home case mix reimbursement submissions: A reliability analysis. Paper presented at the Annual Meeting of the Gerontological Society of America, New Orleans, LA.
Porell, F., Caro, F., Silva, A., & Monane, M. (1998). A longitudinal analysis of nursing home outcomes. Health Services Research, 33, 835-865.
Rice, N., & Leyland, A. (1996). Multilevel models: Applications to health data. Journal of Health Services Research and Policy, 1, 154-164.
Sainfort, F., Ramsey, J., & Monato, H. (1995). Conceptual and methodological sources of variation in the measurement of nursing facility quality: An evaluation of 24 models and an empirical study. Medical Care Research and Review, 52, 60-87.
Spector, W., & Takada, H. (1991). Characteristics of nursing homes that affect resident outcomes. Journal of Aging and Health, 3, 427-454.
Tinetti, M., Liu, W., Marottoli, R., & Ginter, S. (1991). Mechanical restraint use among residents of skilled nursing facilities. Journal of the American Medical Association, 265, 468-471.
White, H. (1980). A heteroskedasticity-consistent covariance matrix estimator and a direct test for heteroskedasticity. Econometrica, 48, 817-838.
Zimmerman, D., Karon, S., Arling, G., Clark, B., Collins, T., Ross, R., & Sainfort, F. (1995). Development and testing of nursing home quality indicators. Health Care Financing Review, 16, 107-128.
Zinn, J., Aaronson, W., & Rosko, M. (1993). Variations in outcomes of care provided in Pennsylvania nursing homes: Facility and environmental correlates. Medical Care, 31, 457-487.
Received February 18, 1997
Accepted September 16, 1998
Appendix 1
This appendix provides additional information on
the specification of resident-level outcome models,
the sensitivity analyses performed, and a summary of
empirical results for the resident outcome models.
Model Development
The multivariate quarterly "state-dependence" model had the following general form:

$$\text{Outcome}_{t+1} = \beta_0 + \beta_1\,\text{Outcome}_t + \beta_2\,\text{Resident attributes}_t + \beta_3\,\text{Facility attributes}_t + \varepsilon_t \tag{A1}$$

where $\text{Outcome}_{t+1}$ and $\text{Outcome}_t$ = the outcome in quarter $t + 1$ and its lagged value in quarter $t$, respectively; $\text{Resident attributes}_t$ = a set of demographic attributes, diagnostic medical conditions, frailty level, and other resident risk factor attributes in quarter $t$; $\text{Facility attributes}_t$ = a set of facility attributes in quarter $t$; and $\varepsilon_t$ = a random disturbance term.
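As a rough illustration of how Equation A1 turns into an observed-minus-expected facility score, the Python sketch below evaluates the linear form of A1 with the disturbance set to its zero mean and then averages actual-minus-expected outcomes by facility. The coefficient values and records are hypothetical, and β2 and β3 are treated as coefficient vectors. Whether facility attributes were included in the expected outcomes used for scoring is an assumption of the sketch. Binary outcomes estimated by logistic regression would need a logistic link rather than this linear prediction.

```python
from collections import defaultdict


def expected_outcome(coefs, lagged_outcome, resident_x, facility_x):
    """Expected Outcome_{t+1} under Equation A1 with the disturbance at its mean of zero.

    coefs = (beta0, beta1, beta2, beta3), where beta2 and beta3 are
    coefficient lists matching the resident and facility attribute lists.
    """
    beta0, beta1, beta2, beta3 = coefs
    return (beta0
            + beta1 * lagged_outcome
            + sum(b * x for b, x in zip(beta2, resident_x))
            + sum(b * x for b, x in zip(beta3, facility_x)))


def facility_performance(records, coefs):
    """Mean of (actual - expected) outcomes for each facility.

    records: iterable of (facility_id, actual_next, lagged, resident_x, facility_x).
    """
    total = defaultdict(float)
    count = defaultdict(int)
    for facility_id, actual_next, lagged, resident_x, facility_x in records:
        total[facility_id] += actual_next - expected_outcome(coefs, lagged, resident_x, facility_x)
        count[facility_id] += 1
    return {fid: total[fid] / count[fid] for fid in total}


# Hypothetical coefficients: intercept, lagged ADLs, age, one facility attribute
coefs = (0.1, 0.9, [0.01], [0.0])
records = [
    ("A", 3.0, 3.0, [85.0], [1.0]),
    ("A", 4.0, 3.0, [85.0], [1.0]),
    ("B", 3.0, 3.0, [80.0], [0.0]),
]
print(facility_performance(records, coefs))
```

For an outcome such as ADLs, where a higher value is worse, a positive facility mean indicates worse-than-expected performance. The sign convention would be reversed for survival.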
The restraint use, accidents, weight change, and survival outcome model specifications differed from Equation A1 by the omission of the lagged outcome measure. The survival model can be viewed as a conditional (upon survival through quarter t - 1), discrete-time survival model (Allison, 1984).
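A discrete-time survival model of this kind is typically estimated on one record per resident-quarter. The Python sketch below uses hypothetical function and field names. It expands a resident's stay into such records, treating discharge or the end of the study window as censoring, and shows how per-quarter hazards combine into a multi-quarter survival probability. The logistic regression fit itself is not shown.

```python
def person_quarters(resident_id, first_quarter, last_alive_quarter, exit_reason):
    """Expand one resident's stay into (resident_id, quarter t, survived_to_t_plus_1) records.

    Every quarter before last_alive_quarter is a survived transition. If the
    resident died after last_alive_quarter, one final record with outcome 0 is
    added; any other exit (discharge, end of study) is treated as censoring, so
    the unobserved transition gets no record.
    """
    records = [(resident_id, t, 1) for t in range(first_quarter, last_alive_quarter)]
    if exit_reason == "death":
        records.append((resident_id, last_alive_quarter, 0))
    return records


def survival_probability(quarterly_hazards):
    """Probability of surviving all quarters, given each quarter's death hazard."""
    p = 1.0
    for h in quarterly_hazards:
        p *= 1.0 - h
    return p


print(person_quarters("r1", 0, 2, "death"))
print(person_quarters("r2", 3, 4, "discharge"))
print(round(survival_probability([0.05] * 4), 4))
```

With a constant 5% quarterly hazard, roughly consistent with the more than 95% quarterly survival reported in the discussion, the one-year survival probability is about 0.81.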
Variable Specification
Table A-1 contains definitions for all independent variables specified in the various outcome models. The rationale for and specification of most of the demographic attributes of residents are fairly self-evident and are not discussed here. Facility admission dates were used to specify two resident tenure variables: a nonlinear quadratic function of a resident's nursing home tenure, and a dummy variable to distinguish the initial quarters of newly admitted Medicaid residents, who may be more likely to experience health decline or death due to medical instabilities associated with hospital discharge or transfer from another nursing home.
Four diagnostic MMQ data fields were used to characterize the medical conditions of residents. A set of dummy variables was specified under a two-step hierarchical procedure. First, if any three-digit ICD-9-CM diagnosis matched any one of the 15 "highest frequency" primary diagnoses for Medicaid residents in the state (accounting for the diagnoses of about 55% of Medicaid residents), the respective "highest frequency" diagnostic dummy variable was coded to one. Otherwise, one of 15 residual dummy variable groups defined on the basis of Major Diagnostic Categories (MDCs) was set to one.
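The two-step hierarchical coding can be sketched in Python using the ICD-9-CM codes and ranges listed in Table A-1. Several details are assumptions not stated in the source. Codes are taken to arrive as zero-padded strings (for example "038.9" rather than "38.9"). Each diagnosis is coded independently, following the Table A-1 note that MDC dummies apply only to diagnoses outside the high-frequency groups. V codes, E codes, and codes outside the listed ranges (including the omitted perinatal group) set no dummy.

```python
# Step 1 groups: three-digit ICD-9-CM codes of the 15 high-frequency diagnoses (Table A-1)
HIGH_FREQUENCY = {
    "290": "Dementia", "331": "Alzheimer's", "295": "Schizophrenia",
    "298": "Psychosis", "332": "Parkinson's", "250": "Diabetes",
    "428": "Heart failure", "436": "Stroke", "437": "Other cerebrovascular accident",
    "401": "Hypertension", "414": "Ischemic heart", "820": "Hip fracture",
    "496": "Chronic air obstruction", "715": "Osteoarthrosis", "780": "General symptoms",
}

# Step 2 groups: residual MDC ranges (Table A-1); perinatal codes are the omitted group
MDC_RANGES = [
    (1, 139, "Infections"), (140, 239, "Neoplasms"), (240, 289, "Metabolic"),
    (290, 319, "Mental"), (320, 389, "Nervous"), (390, 459, "Circulatory"),
    (460, 519, "Respiratory"), (520, 579, "Digestive"), (580, 629, "Genitourinary"),
    (680, 709, "Skin"), (710, 739, "Musculoskeletal"), (740, 759, "Congenital"),
    (780, 799, "Ill-defined"), (800, 999, "Injury/poisoning"),
]


def diagnosis_dummies(icd9_codes):
    """Return the set of diagnostic dummy variables set to one for a resident."""
    flags = set()
    for code in icd9_codes:
        three = code.replace(".", "")[:3]
        if three in HIGH_FREQUENCY:          # step 1: high-frequency diagnosis
            flags.add(HIGH_FREQUENCY[three])
            continue
        if not three.isdigit():              # V/E codes: not handled in this sketch
            continue
        number = int(three)
        for low, high, label in MDC_RANGES:  # step 2: residual MDC group
            if low <= number <= high:
                flags.add(label)
                break
    return flags


print(diagnosis_dummies(["428.0", "599.0", "V45.1"]))
```

Because the high-frequency lookup runs first, a code such as 290 (dementia) never also sets the broader Mental (290-319) dummy.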
Lagged values of all three functional health outcome measures were specified in each outcome model to capture the interactive effects among these fundamental dimensions of frailty upon subsequent outcomes. Furthermore, lagged values of a few additional MMQ items were specified as risk factors for specific health outcomes or QIs. For example, variables indicating regular use of restraints and the presence of a contracture were specified as risk factors for incontinence problems (e.g., Ouslander, Kane, & Abrass, 1982; Evans & Strumpf, 1989).
The final sections of Table A-1 contain the definitions of facility attributes. Five broad organizational facility attributes were specified: profit status, facility bed size, management form, ownership tenure, and financial performance. Four variables were specified to reflect nurse staffing patterns. Because full-time equivalent nursing staff data did not distinguish among RNs, LPNs, and nurse aides in all years, an aggregate overall nurse staffing ratio variable was specified. Personnel expense data from facility cost reports were used to specify variables differentiating the skill-level mix of nursing staff among facilities. It has been suggested that the use of non-staff nursing labor from agency pools may adversely affect the process of care, so a variable was specified as the fraction of nursing expenses allocated to such labor services. Lastly, three aggregate resident payer-mix variables were specified to capture the influence of other unmeasured service or case-mix severity effects. The private-pay variable serves a dual function because it has been employed as a structural quality indicator under the premise that increases in quality are needed to attract more higher-paying private-pay patients (Nyman, 1988).

Table A-1. Variable Specification and Descriptive Statistics of the Estimation Sample (N = 59,407)
Variable                              Mean        SD   Definition
Demographic/Length of Residence Attributes
  Male                                .195       .40   Male = 1, other = 0
  White                               .969       .17   White residents = 1, other = 0
  Black                               .011       .10   Black residents = 1, other = 0 (other non-White race omitted)
  Age                               85.005      8.10   Age in years
  LOS                                3.627      3.95   Years since admission to facility
  LOS-squared                       28.724     70.77   Squared value of LOS
  New admission                       .057       .23   MMQ completion date within 90 days of admission = 1, other = 0
  Secondary diagnoses                2.345       .87   Count of diagnoses (0-3)
High Frequency Diagnoses (a)
  Dementia                            .222       .42   ICD-9 290 = 1, other = 0
  Alzheimer's                         .114       .32   ICD-9 331 = 1, other = 0
  Schizophrenia                       .051       .22   ICD-9 295 = 1, other = 0
  Psychosis                           .058       .23   ICD-9 298 = 1, other = 0
  Parkinson's                         .054       .23   ICD-9 332 = 1, other = 0
  Diabetes                            .149       .36   ICD-9 250 = 1, other = 0
  Heart failure                       .120       .32   ICD-9 428 = 1, other = 0
  Stroke                              .068       .25   ICD-9 436 = 1, other = 0
  Other cerebrovascular accident      .040       .20   ICD-9 437 = 1, other = 0
  Hypertension                        .236       .42   ICD-9 401 = 1, other = 0
  Ischemic heart                      .114       .32   ICD-9 414 = 1, other = 0
  Hip fracture                        .074       .26   ICD-9 820 = 1, other = 0
  Chronic air obstruction             .070       .26   ICD-9 496 = 1, other = 0
  Osteoarthrosis                      .124       .33   ICD-9 715 = 1, other = 0
  General symptoms                    .072       .26   ICD-9 780 = 1, other = 0
Residual MDC Diagnoses (b)
  Infections                          .017       .13   ICD-9 (001-139) = 1, other = 0
  Neoplasms                           .062       .24   ICD-9 (140-239) = 1, other = 0
  Metabolic                           .183       .39   ICD-9 (240-289) = 1, other = 0
  Mental                              .212       .41   ICD-9 (290-319) = 1, other = 0
  Nervous                             .150       .36   ICD-9 (320-389) = 1, other = 0
  Circulatory                         .165       .37   ICD-9 (390-459) = 1, other = 0
  Respiratory                         .053       .22   ICD-9 (460-519) = 1, other = 0
  Digestive                           .124       .33   ICD-9 (520-579) = 1, other = 0
  Genitourinary                       .071       .26   ICD-9 (580-629) = 1, other = 0
  Skin                                .033       .18   ICD-9 (680-709) = 1, other = 0
  Musculoskeletal                     .120       .32   ICD-9 (710-739) = 1, other = 0
  Congenital                          .011       .10   ICD-9 (740-759) = 1, other = 0
  Ill-defined                         .087       .28   ICD-9 (780-799) = 1, other = 0
  Injury/poisoning                    .075       .26   ICD-9 (800-999) = 1, other = 0
Baseline Frailty Measures and Other Risk Attributes
  ADL status                         3.700      1.51   Count of ADLs in quarter t
  Mental status                       .870       .66   Count of behavior/cognitive impairments in quarter t
  Incontinence                       1.212       .91   Count of bladder/bowel incontinence problems in quarter t
  Contracture                         .135       .34   A contracture in quarter t = 1, 0 = otherwise
  Weight change                       .066       .25   An unplanned weight change of 5 pounds or more in quarter t = 1, 0 = otherwise
  Restraints                          .191       .39   Restraints are regularly used = 1, 0 = otherwise
General Facility Attributes
  Not-for-profit                      .241       .43   Not-for-profit facilities = 1, 0 = otherwise
  Management firm                     .488       .50   Facilities operated by a management company = 1, 0 = otherwise
  Operating tenure                   8.639      5.81   Tenure of facility operation under the current ownership in years
  Beds                             119.774     56.59   The mean number of certified skilled, intermediate care, and rest home beds in facility
  Net revenue                        3.134      8.97   Annual facility net revenue from all sources as a percent of total costs
Nurse Staffing Attributes
  Nursing staff intensity            3.250       .67   Average FTE hours of nursing staff (RNs, LPNs, nurse aides) per resident patient day
  RN staffing                       24.730      9.24   Total personnel expenses for RNs as a percent of total annual nursing personnel expenses
  LPN staffing                      22.612      7.50   Total personnel expenses for LPNs as a percent of total annual nursing personnel expenses
  Nursing pool labor                 3.142      5.50   Total nursing expense for non-staff nursing services as a percent of total annual nursing personnel expenses
Facility Case-Mix Attributes
  Private payer days                17.570     12.42   Private payer days as a percent of total annual patient days from all payers
  Medicare payer days                3.169      3.82   Medicare payer days as a percent of total annual patient days from all payers
  Mean MMQ score                  1856.087    263.56   The mean MMQ score for all Medicaid residents of the facility in the quarter
a The 15 most frequent primary diagnoses of Medicaid residents in Massachusetts.
b Major Diagnostic Category (MDC) dummy variables were set to unity only for diagnoses not contained in any of the high-frequency diagnostic groups. Conditions originating in the perinatal period (ICD-9-CM 760-779) were the omitted group.
Estimation Procedures
The resident outcome model parameters were estimated with multiple linear regression and logistic
regression methods. Logistic regression was used for
binary outcome measures. Because the estimation
sample file is essentially a panel data set with multiple
observations over time for each nursing home resident, asymptotic bootstrap standard error estimates were
obtained for the estimated parameters from a maximum likelihood procedure, derived independently by
White (1980) and Huber (1967). This procedure is
intended for use with nonrandom clustered data with
unspecified nonzero covariances among disturbances.
Although we did not assume uncorrelated disturbances
among quarterly observations for each resident, disturbances among different residents were assumed to
be uncorrelated for any quarter and among different
quarters.
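The clustering logic behind the Huber-White procedure can be illustrated with a simple one-regressor linear model. The Python sketch below computes OLS estimates and cluster-robust (sandwich) standard errors, treating each resident as a cluster so that a resident's quarterly disturbances may be correlated while different residents are independent. It omits the finite-sample corrections most statistical packages apply, and it is not the authors' estimation code.

```python
from collections import defaultdict


def ols_cluster_robust(x, y, cluster):
    """OLS for y = a + b*x with cluster-robust (sandwich) standard errors.

    cluster: one label per observation (e.g., a resident ID).
    Returns (a, b, se_a, se_b); no small-sample correction is applied.
    """
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    sxy = sum(u * v for u, v in zip(x, y))
    det = n * sxx - sx * sx
    # Inverse of X'X for the design matrix with columns (1, x)
    inv = [[sxx / det, -sx / det], [-sx / det, n / det]]
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy

    # Sum the score contributions (1, x_i) * residual_i within each cluster
    scores = defaultdict(lambda: [0.0, 0.0])
    for xi, yi, g in zip(x, y, cluster):
        resid = yi - a - b * xi
        scores[g][0] += resid
        scores[g][1] += xi * resid

    # Diagonal of inv * (sum_g s_g s_g') * inv, computed as sums of squares
    var = [0.0, 0.0]
    for s in scores.values():
        for j in range(2):
            var[j] += (inv[j][0] * s[0] + inv[j][1] * s[1]) ** 2
    return a, b, var[0] ** 0.5, var[1] ** 0.5


x, y = [0, 1, 2, 3], [1, 2, 5, 6]
print(ols_cluster_robust(x, y, cluster=[1, 2, 3, 4]))  # every observation its own cluster
print(ols_cluster_robust(x, y, cluster=[1, 1, 2, 2]))  # two clusters of two
```

With one observation per cluster the estimator reduces to White's heteroskedasticity-consistent form. Grouping the same observations into clusters changes the standard errors but not the coefficient estimates.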
Because we were unaware of any single estimation methodology that could simultaneously address all potential estimation problems associated with the categorical form of the dependent variables, the repeated-observation panel structure of the data, and the specification of a lagged dependent variable in the state-dependence models, several alternative estimation procedures were used to test the robustness of the empirical results. First, all models were estimated as linear regression models with the conventional iterative Cochrane-Orcutt procedure for time series data. This was done to assess how sensitive the empirical results were to endogeneity bias associated with the specification of a lagged dependent variable as a covariate. Second, ordinal logit models were estimated, and intraclass correlation coefficients estimated from the residuals were used to inflate the standard errors of the estimated coefficients. This was done to test how sensitive the empirical results were to the treatment of outcomes as discrete ordinal variables rather than interval variables. Given the generally modest level of autocorrelation among residuals for the estimated models, the empirical findings, including results from standard OLS regression, showed only marginal differences when different estimation methodologies were applied.
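The iterative Cochrane-Orcutt procedure mentioned above can be sketched for a single time series with one regressor. The Python below alternates between estimating the first-order autocorrelation ρ of the residuals and re-estimating the regression on quasi-differenced data (y_t − ρy_{t−1} on x_t − ρx_{t−1}) until ρ stabilizes. It drops the first observation at each step and ignores the panel structure of the resident data. It is therefore a simplified illustration rather than the procedure as the authors implemented it.

```python
def _ols(x, y):
    """Simple-regression OLS: returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cochrane_orcutt(x, y, tol=1e-8, max_iter=100):
    """Iterative Cochrane-Orcutt estimate of y = a + b*x with AR(1) errors.

    Returns (a, b, rho) with the intercept expressed on the original scale.
    """
    a, b = _ols(x, y)
    rho = 0.0
    for _ in range(max_iter):
        e = [yi - a - b * xi for xi, yi in zip(x, y)]  # residuals of the original equation
        denom = sum(v * v for v in e[:-1])
        if denom == 0.0:
            break  # exact fit: no autocorrelation to estimate
        new_rho = sum(e[t] * e[t - 1] for t in range(1, len(e))) / denom
        xs = [x[t] - new_rho * x[t - 1] for t in range(1, len(x))]
        ys = [y[t] - new_rho * y[t - 1] for t in range(1, len(y))]
        a_star, b = _ols(xs, ys)
        a = a_star / (1.0 - new_rho)  # recover original-scale intercept
        converged = abs(new_rho - rho) < tol
        rho = new_rho
        if converged:
            break
    return a, b, rho


xs = list(range(20))
ys = [2 + 3 * v + 0.05 * ((7 * v) % 5 - 2) for v in xs]  # hypothetical data with patterned noise
print(cochrane_orcutt(xs, ys))
```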
Two-step sample selection models (Heckman, 1979)
were also run to address potential problems of selectivity bias associated with discharge or death in all outcome models other than survival. In the first step, a
probit model was estimated for the likelihood of continued nursing home residence in the next quarter.
The covariates in this model were largely the same
variables entered in the outcome models. The estimated probit model, which exhibited good statistical
fits based on conventional measures of goodness of fit
(Menard, 1995), was then used to estimate a selection factor, which was included as an additional covariate in the second step outcome models. The estimated
outcome model parameters were not affected by its
inclusion. The selection factor itself was never found
to be statistically significant.
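In Heckman's two-step method, the selection factor is conventionally the inverse Mills ratio evaluated at each observation's first-step probit index. The Python sketch below assumes the probit coefficients have already been estimated; the coefficient and covariate values are hypothetical.

```python
import math


def inverse_mills_ratio(z):
    """phi(z) / Phi(z): standard normal density over cumulative distribution."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return pdf / cdf


def selection_factors(gamma, covariate_rows):
    """Selection factor for each observation from fitted probit coefficients gamma.

    The probit models the probability of continued residence next quarter; the
    returned values enter the second-step outcome model as an extra covariate.
    """
    return [inverse_mills_ratio(sum(g * x for g, x in zip(gamma, row)))
            for row in covariate_rows]


# Hypothetical probit coefficients (intercept, one covariate) and two residents
print(selection_factors([1.5, -0.2], [[1.0, 3.0], [1.0, 5.0]]))
```

For very negative probit indices the normal CDF underflows, so a production implementation would compute the ratio on a log scale.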
Because the estimated parameters of sample selection models may be sensitive when many of the same covariates are specified in the first- and second-step models (Breen, 1996), the sensitivity of the empirical results to the treatment of death as a censored outcome in the health outcome models was further tested by comparing the outcome model results with the results from multinomial logit models of four discrete outcomes (improved status next quarter, no change in status next quarter, worsened status next quarter, and death). There was no evidence of bias due to selectivity effects associated with mortality, or of bias due to the assumed linearity in the measurement of discrete outcomes. With very few exceptions, the multinomial logit outcome model results showed the same covariates to be statistically significant and of the same sign as the various quarterly outcome models estimated under Equation A1.
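The four-category outcome used in that robustness check can be coded as in the Python sketch below. The status values are hypothetical. The assumption that higher values mean worse status holds for count measures such as ADLs but would be reversed for a measure where higher is better. The multinomial logit fit itself is not shown.

```python
def transition_category(status_t, status_next, died):
    """Classify a quarterly transition into one of the four multinomial outcomes.

    Assumes a higher status value is worse (as with ADL or incontinence counts),
    so a lower value next quarter counts as improvement.
    """
    if died:
        return "death"
    if status_next < status_t:
        return "improved"
    if status_next > status_t:
        return "worsened"
    return "no change"


transitions = [(3, 2, False), (3, 3, False), (2, 4, False), (4, None, True)]
print([transition_category(*t) for t in transitions])
```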
Sensitivity Analyses
Because use of the MMQ instrument was initiated in 1991, the residence histories of many long-tenured nursing home residents were left-truncated. Although actual admission dates were known for specification of length-of-stay variables, no other information was available from the time of facility admission. The use of such data requires a formal "no memory" assumption, namely, that history prior to time t has no direct influence on outcomes at t + 1. Norton (1992) found empirical support for this assumption in a Markov model of nursing home transitions among states defined in terms of functional status, nursing home discharge destinations, and death.
Sensitivity analyses were performed to test for potential biases associated with the use of data with left-truncated residence histories. Each outcome model was reestimated on a smaller sample of nursing home residents whose admission quarter was observed (12,970 quarters of data). All model parameters (other than the tenure of residence and intercept) for the smaller sample of residents with complete histories were constrained to the values estimated from the full estimation sample. For every outcome model, the null hypothesis of equality with the unconstrained model could not be rejected at the conventional 5% level of statistical significance.
The assumed time-invariance of estimated model
parameters (other than the constant which varies with
tenure of nursing home residence through the length
of stay variable) was also tested through the specification of interaction terms between the tenure of nursing home residence and other independent variables.
Although there were some notable shifts in some parameter estimates, possibly due to effects of multicollinearity, joint tests of statistical significance showed
that the additional interaction terms did not increase
the model fits significantly.
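The source does not say which statistic was used for these joint tests. Two standard forms for comparing a restricted (constrained) model with an unrestricted one are a likelihood-ratio statistic for the logistic models and an F statistic for the linear ones, and both are sketched below in Python. The 5% critical value used in the example (5.991 for 2 degrees of freedom) is a standard chi-square table value. The log-likelihoods, sums of squared residuals, and parameter counts are made up.

```python
def likelihood_ratio_test(loglik_restricted, loglik_unrestricted, critical_value):
    """LR = 2 * (lnL_unrestricted - lnL_restricted); reject if LR exceeds the chi-square critical value."""
    stat = 2.0 * (loglik_unrestricted - loglik_restricted)
    return stat, stat > critical_value


def nested_f_test(ssr_restricted, ssr_unrestricted, q, n, k):
    """F statistic for q linear restrictions, n observations, k unrestricted parameters."""
    return ((ssr_restricted - ssr_unrestricted) / q) / (ssr_unrestricted / (n - k))


stat, reject = likelihood_ratio_test(-1000.0, -998.5, critical_value=5.991)  # 2 df, 5% level
print(stat, reject)
print(nested_f_test(110.0, 100.0, q=2, n=50, k=5))
```

In the left-truncation check, the constrained parameters play the role of the restrictions; in the time-invariance check, the added interaction terms do.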
These sensitivity analyses did not show any evidence of significant cohort effects or temporal parameter shifts, suggesting that for the institutionalized Medicaid study population no significant bias was imparted by left-truncated resident histories.
Empirical Results
Given the volume of data associated with the empirical results for the nine resident outcome models,
only summaries of the empirical results can be reported
here. Table A-2 contains a summary of the model specifications, the signs of individual coefficients that were significant at the 5% level, and model fit statistics. A full set of empirical results is available from the corresponding author.
Appendix 2
Notes
1. It could be argued that the assumption of relatively constant facility effects is untenable because a
number of nursing homes are bought and sold each
year, directors of nursing come and go, and so forth.
However, if quality changes are truly so volatile that it
is unreasonable to assume that the quality of care is
relatively stable over a year in most facilities, the practical utility of any outcome performance measure for
nursing homes would be in doubt.
Certainly there may be ways to measure facility performance over longer time periods other than by averaging quarterly outcome performance. For example, it would be possible to distinguish superior facility performance by counting the number of quarters in which a facility had exemplary quarterly outcome performance. Inferior facilities might be distinguished by counts of inferior quarterly outcome performance. All alternative approaches for ranking facility outcome performance will have potential shortcomings as well. For example, if superior facility performance were measured solely by the number of quarters in which a facility's favorable performance exceeded some threshold, a facility with six quarters of exemplary performance and two quarters of very poor performance would be ranked as having better overall performance than a facility with eight consecutive quarters of only moderately favorable performance. In this study, we have measured facility performance in a straightforward manner under some plausible assumptions. Certainly, the question of how best to measure facility performance over time deserves attention in future research.
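The ranking reversal described in this note can be reproduced with a few lines of Python. The scores and the exemplary-performance threshold below are hypothetical: facility X has six exemplary quarters and two very poor ones, and facility Y has eight moderately favorable quarters.

```python
from statistics import mean


def exemplary_quarters(scores, threshold):
    """Number of quarters whose performance score exceeds the threshold."""
    return sum(1 for s in scores if s > threshold)


facility_x = [2.0] * 6 + [-5.0] * 2   # six exemplary, two very poor quarters
facility_y = [1.0] * 8                # eight moderately favorable quarters
threshold = 1.5                       # hypothetical cutoff for "exemplary"

print("count rule:", exemplary_quarters(facility_x, threshold), exemplary_quarters(facility_y, threshold))
print("average rule:", mean(facility_x), mean(facility_y))
```

Under the count rule facility X ranks first, while under the averaging rule used in this study facility Y does.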
2. Because multilevel models have not been applied as widely in health services research as in other
fields (e.g., education; Rice & Leyland, 1996), it is difficult to know how much difference the use of multilevel models might make in measuring nursing facility
performance. In the field of education, Fitz-Gibbon
(1991) directly compared school performance measures derived under the residual approach used in this
study (i.e., observed-expected outcomes) with those
obtained with multilevel models. Among four performance measures in three different subject areas, the
correlation between these two classes of performance
measures had a median value of 0.90, and all but
one of the correlations exceeded 0.77. When school
group sizes exceeded 30, very little shrinkage occurred
in the multilevel model estimates. In general, multilevel modeling made very little difference in measurement of performance for schools.
3. A recent study of hospital performance profiling
on the basis of risk-adjusted patient mortality rates
provides some insights about the likely impact of restricting the sample of outlier facilities to those nursing homes with Medicaid populations exceeding the
state median. In the study done by Normand, Glickman, and Gatsonis (1997), hospital performance was
measured through several multilevel models and as
a residual between observed and expected patient
outcomes for a hospital. Of a total of 96 hospitals, the
nine lowest-performance outlier hospitals and four
highest-performance hospitals under the residual
measure of performance were selected, and their performance rankings were compared with the corresponding rankings derived from the multilevel
Table A-2. Summary of Empirical Results for Resident-Level Outcome Models: Signs of Significant Variables (p < .05)
Variable Name
Sur
Demographic
Male
White
Black
Age
Length of Residence
Metabolic
Mental
Nervous
Circulatory
Respiratory
Digestive
Genitourinary
Skin
Musculoskeletal
Beh
Inc
+
+
Length of stay
Length of stay squared
New admit
Diagnostic Attributes
Secondary diagnoses
Dementia
Alzheimer's
Schizophrenia
Psychosis
Parkinson's
Diabetes
Heart Failure
Stroke
Other cerebrovascular
Hypertension
Ischemic heart
Hip fracture
Chronic air obs.
Osteoarthrosis
General symptoms
Infections
Neoplasms
ADL
Dec
Res
+
+
+
+
+
+
-
Acc
Wgt
+
+
+
+
-
+
+
+
+
+
-
+
+
-
+
+
+
+
+
-
-
+
+
+
+
+
-
+
+
4+
+
+
+
+
+
+
+
-
+
+
+
-
+
+
-
+
+
+
-
+
-
-
+
+
+
+
-
Congenital
-
+
Ill-defined
Injury/poisoning
Baseline Frailty Measures
ADL status
Behavior/cognitive status
Incontinence
Contracture
Weight change
Restraints
Decubitus ulcers
Accidents
General Facility Attributes
Not-for-profit
Management firm
Operating tenure
Beds
Net revenue
Nurse Staffing
Nursing staff intensity
RN staffing
LPN staffing
Nursing pool labor
Facility Case-Mix
Private payer days
Medicare payer days
Mean M M Q score
R²/Pseudo R²
Con
+
NS
NS
NS
+
+
+
+
NS
NS
NS
NS
+
+
+
NS
NS
NS
NS
NS
+
+
+
+
NS
+
NS
NS
+
+
+
NS
+
NS
+
+
+
NS
NS
NS
NS
+
+
+
+
NS
+
NS
NS
+
4NS
NS
NS
NS
+
+
+
NS
NS
NS
NS
NS
+
-f
+
+
+
+
.06
.85
.73
.80
+
+
-
+
-
-
.39
.19
.73
.02
.02
Notes: Outcomes: Sur = Survival rate; ADL = ADL functional status; Beh = Behavioral/cognitive status; Inc = Incontinence status; Dec = Decubitus ulcers; Res = Restraint use; Acc = Accidents; Con = Contracture; Wgt = Weight change. For all outcomes except survival rate, a positive coefficient (+) implies a negative association with a favorable outcome.
a A "+" or "-" denotes the sign of a statistically significant estimated coefficient (p < .05). A blank means the coefficient was not significant, and NS is used to distinguish variables that were not specified in a model.
models. Although there was moderate disagreement
in the relative rankings among alternative performance
measures in general, nearly all disagreement was for
facilities with patient populations smaller than the
median for the sample of study hospitals. If comparisons of rankings among alternative measures are restricted to the six outlier hospitals with patient populations exceeding the median, then only one of the
six outlier hospitals (under the residual method of performance measurement) would not have been ranked
among the nine lowest or four highest performing
hospitals from the multilevel model results. This is consistent with findings from school performance research
showing that multilevel modeling makes the most
difference in measuring the performance of organizations with small populations (Fitz-Gibbon, 1991).