Short and Precise Patient Self-Assessment of Heart Failure Symptoms Using a Computerized Adaptive Test (HF-CAT)

Rose et al: Heart Failure CAT

Matthias Rose MD PhD 1,2,3, Milena Anatchkova PhD 1, Jason Fletcher PhD 4, Arthur E. Blank PhD 4, Jakob Bjørner MD PhD 5, Bernd Löwe MD PhD 3, Thomas S. Rector PhD 6, John E. Ware PhD 1,7

1 Department of Quantitative Health Sciences, University of Massachusetts, Worcester, MA, USA
2 Department of Psychosomatic Medicine, Charité – University Medicine Berlin, Berlin, Germany
3 University Medical Center Hamburg-Eppendorf and Schön Klinik Hamburg-Eilbek, Hamburg, Germany
4 Department of Family and Social Medicine, Albert Einstein College of Medicine, Bronx, NY, USA
5 3i QualityMetric, Lincoln, RI, USA
6 VA Medical Center and Department of Medicine, University of Minnesota, Minneapolis, MN, USA
7 John Ware Research Group, Incorporated, Worcester, MA, USA

Correspondence to
Matthias Rose
Department of Psychosomatic Medicine, Charité – University Medicine Berlin, Germany
Charitéplatz 1
10117 Berlin, Germany
office +49 30 450 553002
fax +49 30 450 553989
[email protected]

Journal Subject Codes: 110
DOI: 10.1161/CIRCHEARTFAILURE.111.964916

Downloaded from http://circheartfailure.ahajournals.org/ by guest on June 17, 2017

Abstract

Background: Assessment of dyspnea, fatigue and physical disability is fundamental to the monitoring of patients with heart failure (HF). A plethora of patient-reported measures exist, but most are too burdensome or imprecise to be useful in clinical practice. New techniques used for computer adaptive tests (CAT) may be able to address these problems. The purpose of this study was to build a CAT for patients with HF.
Methods and Results: Item banks of 74 queries (‘items’) were developed to assess self-reported physical disability, fatigue and dyspnea. All queries were administered to 658 adults with HF to build three item banks. The resulting HF-CAT was administered to 100 ancillary HF-patients (NYHA I 11%, II 53%, III&IV 36%). In addition, the physical function and vitality domains of the SF-36 questionnaire, an established shortness-of-breath-scale (SOB), and the Minnesota Living with Heart Failure Questionnaire (MLHFQ) were applied. The HF-CAT assessment took 3:09±1:52 minutes to complete and score. All HF-CAT scales demonstrated good construct validity through high correlations with the corresponding SF-36 physical function (r=-.87), vitality (r=-.85) scales, and the SOB scale (r=.84). Simulation studies showed a more precise measurement of all HF-CAT scales over a larger range than comparable static tools. HF-CAT scales identified significant differences between patients classified by the NYHA symptom criteria, similar to the MLHFQ.

Conclusions: A new CAT for HF patients was built using modern psychometric methods. Initial results demonstrate its potential to increase the feasibility and precision of patient self-assessments of symptoms of HF with minimized respondent burden.

Clinical Trial Registration: URL: http://www.projectreporter.nih.gov. Unique identifier: 1R43HL083622-01.

Key Words: heart failure, patient-reported outcomes, computer adaptive tests

The cardinal manifestations of heart failure (HF) are dyspnea and fatigue, limited tolerance of physical activity, fluid retention, pulmonary congestion and peripheral edema.
Therefore, HF is a clinical diagnosis that is largely based on physical examination and a careful history of typical subjective symptoms in the presence of cardiac dysfunction (1). A patient-centered measurement approach is particularly important in HF, to provide clinicians with tools that help them monitor the syndrome, compare improvements under different forms of therapy, and identify risk of deterioration. The NYHA classification has been used for this purpose, but it has been criticized for its questionable reliability (2,3) and is rarely used outside clinical studies or specialized units.

Generally, patient self-assessments have been shown to be the more reliable assessments of subjective symptoms, which is one reason for a growing interest in subjective health status measures from the scientific community, clinical practitioners, as well as from the industry (4,5). Self-assessed symptoms are used to predict declines in health status of patients with HF (6), total expenses for HF care (7), hospitalization or even mortality (8,9). Their widespread use has been recommended to increase quality of care (10), and 30% of all new drug developments use Patient-Reported Outcomes (PROs) as their primary or co-primary endpoint (11).

However, with traditional methods, a comprehensive and reliable ‘static’ measure is likely to be long and time-consuming to administer and score. If questionnaire data need to be analyzed manually, assessments become cost-prohibitive for use in routine clinical practice, and individual patient reports cannot be provided in a timely manner. Short-forms limit the respondent burden, but often show more ceiling or floor effects and lack the precision required at the individual patient level (12,13).
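To make the gap between group-level and individual-level precision concrete, the following is a minimal sketch under classical test theory. The symbols $\sigma$ (standard deviation of the scores), $\alpha$ (reliability) and SEM (standard error of measurement) are introduced here as assumptions for illustration.

```latex
\[
\mathrm{SEM} = \sigma\sqrt{1-\alpha},
\qquad
\text{95\% confidence half-width} = 1.96\,\mathrm{SEM}
\]
\[
\alpha = 0.80:\quad
\mathrm{SEM} = \sigma\sqrt{0.20} \approx 0.45\,\sigma,
\qquad
1.96 \times 0.45\,\sigma \approx 0.88\,\sigma
\]
```

Under these assumptions, a single score from a tool with a reliability of .80 is uncertain by close to one standard deviation in either direction, which is acceptable for comparing group means but not for judging change in one patient.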
Measurement precision to guide individual decision-making must be substantially higher than for group comparisons, because true change must be separated from measurement error for every single assessment (13). For example, if a confidence interval of 95% is required, a traditional tool with good psychometric properties for group comparisons (e.g. with Cronbach α=.80) would only allow for interpretation of score differences of almost one standard deviation when used for an individual (14). Moreover, classic psychometric methods cannot be used to determine the measurement precision for an individual measurement. As a result, none of the existing tools has become a standard measure in clinical practice (15,16). Enhancing the precision, accessibility and interpretability of patient reported outcome (PRO) measures could make heart failure management more efficient and effective in meeting patient care needs.

With the present study we apply computerized adaptive testing (CAT) methods, a measurement technology (17) which is used widely in educational testing (18). We aimed to build a system which will allow routine, comprehensive assessment of pathognomonic symptoms. The use of CAT techniques also promises to provide more precise measures, with fewer items, and an effective resolution to the classic conflict between practicality and precision faced by traditional measurement methodology (12).

CATs tailor each assessment to the individual’s status on what is being measured, applying only items which are most appropriate for her/his current health status. Responses to each CAT-item direct the choice of the following CAT-item towards the most informative for this particular assessment. A patient indicating higher levels of disability within the first questions would only be asked about this level of ability.
Omitting the use of uninformative items not relevant for a given functional limitation focuses the assessment, decreases the respondent burden, and increases the measurement precision achievable with a given number of items. CATs select the items out of a larger item bank representing the entire range of the construct being measured. Most of the item banks are built upon the principles of item response theory (IRT).

The National Institutes of Health (NIH) are intensively promoting use of these methods to develop a comprehensive Patient-Reported Outcomes Measurement Information System (PROMIS) as part of their roadmap initiatives (http://nihroadmap.nih.gov/). Authors of this paper are part of the PROMIS initiative, which aims to provide a standard assessment for generic health status measures in the near future (19).

The goal of this study was to develop CATs for dyspnea, fatigue and physical function for the assessment of patients with HF, and to evaluate their acceptability, precision and validity.

Methods

Development of the items

After review of the relevant literature we developed a set of 74 patient questions (items) covering the three primary physical impairments commonly reported by patients with HF: physical function/disability (24 items), dyspnea (30 items) and vitality/fatigue (20 items). The queries were designed to be short enough to fit on a portable phone screen for home assessments (Figure 1). Items were selected to represent the entire continuum of each aspect of HF from no to severe impairment. All three item banks have been scored in the direction that higher scores indicate more impairment (i.e. physical disability, fatigue, and dyspnea). The item bank development was performed separately for each of the three domains of physical function, dyspnea, and fatigue following the same procedures as described in previous studies (20,21).
After the item banks had been developed we used them as a basis for a CAT. A new software solution was developed to work on a Personal Digital Assistant. The CAT logic can be set to stop after the measurement reaches a particular precision or after a maximum number of items has been administered. For this study phase the CAT was set to assess each of the three different domains with a standard error of SE < 3.3 (corresponding to a reliability of Cronbach α > .90 for samples with a standard deviation of 10) or a maximum number of 7 items per scale.

Participants

The data for the CAT item bank development (IB sample) were collected via the Internet from English speaking adults with HF. All respondents were recruited by YouGov. YouGov uses a methodology called sample matching for the selection of study samples from pools of opt-in respondents (22). Sample matching starts with an enumeration of the target population. For patient recruitments, the target population is all adults with sociodemographic characteristics similar to those of patients with a particular condition, as enumerated in consumer databases (e.g. maintained by Acxiom, Experian, and InfoUSA). Then a random sample is drawn from the target population. Finally, for each member of the target sample, a matching member of the internet pool of opt-in respondents is selected, resulting in a “matched sample”. Matching was based on age, gender and race. The resulting matched sample has similar characteristics to the target population and will have similar properties to a true random sample. For this study 14,028 adults were approached until the target number of patients with heart failure had been enrolled. All newly developed items were administered randomly.
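The matching step described above can be sketched as a greedy nearest-neighbour match without replacement. This is only an illustration under stated assumptions: the vendor's actual matching algorithm is not described in the text, and the `Person` record, the `distance` weights and the function name `match_sample` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    pid: str
    age: int
    gender: str
    race: str


def distance(a, b):
    # Assumption: a mismatch on gender or race costs more than any plausible age gap.
    return abs(a.age - b.age) + 100 * (a.gender != b.gender) + 100 * (a.race != b.race)


def match_sample(target, pool):
    """Pair each target member with the closest still-unused opt-in respondent."""
    available = list(pool)
    matched = []
    for t in target:
        if not available:
            break  # pool exhausted before every target member was matched
        best = min(available, key=lambda p: distance(t, p))
        matched.append((t, best))
        available.remove(best)  # matching without replacement
    return matched


# Invented records for illustration only (not study data).
target = [Person("t1", 60, "f", "white"), Person("t2", 45, "m", "black")]
pool = [
    Person("p1", 44, "m", "black"),
    Person("p2", 62, "f", "white"),
    Person("p3", 60, "m", "white"),
]
for t, p in match_sample(target, pool):
    print(t.pid, "->", p.pid)
```

A greedy pass is order-dependent; an optimal assignment would need a different algorithm, but the sketch is enough to show why the matched sample mirrors the target sample on age, gender and race.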
The same data collection method and vendor have been used for many similar projects, including an NIH roadmap initiative for the development of generic PRO tools (www.nihpromis.org). To ensure a sufficient distribution of responses for the item parameter estimation, we used a quota of 1/3 of patients with minor, medium, and severe impairment based on one screening question describing the level of impairment analogous to the NYHA classification (I/II/III).

To help ensure the quality of the data we applied the following exclusion criteria: (a) average answering time per item was less than 5 seconds, (b) subjects who did not indicate they had HF and one underlying cause for HF, (c) subjects who did not indicate that the HF diagnosis was given by a physician, (d) last visit to a physician was more than 6 months ago, or (e) current medication did not indicate at least one drug used for the treatment of HF (diuretics, ACEI or ARB, β-blockers, digoxin).

To examine the characteristics of the HF-CAT, different simulation studies were conducted as described earlier (20,23). These analyses are based on the real data provided for all items in the bank by the patients in the online survey. Only small subsets of those item responses are used to estimate the patient score for the CAT simulation (in IRT terms called ‘theta score’). The quality of the items in the bank defines the precision of the score at different ranges. The ‘test information curve’ identifies floor and ceiling effects and shows whether the measurement range of the tool fits the symptom levels of the sample. To illustrate this for the HF-CAT, the precision of the score estimate was plotted as a function of the patient scores (20).
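The adaptive logic and the stopping rule described in the Methods can be sketched as follows. This is a simplified illustration, not the study software: it swaps in a dichotomous two-parameter logistic (2PL) model, because the IRT model used for the HF-CAT item banks is not specified in this section, and it assumes that SE < 3.3 on a metric with a standard deviation of 10 corresponds to SE < 0.33 on a standard-normal theta metric. All names (`run_cat`, `eap`, `bank`) and item parameters are hypothetical.

```python
import math

GRID = [g / 10.0 for g in range(-40, 41)]  # theta grid from -4 to +4


def p_endorse(theta, a, b):
    """2PL probability of endorsing an item with slope a and location b."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    p = p_endorse(theta, a, b)
    return a * a * p * (1.0 - p)


def eap(responses, bank):
    """Posterior mean and SD of theta under a standard-normal prior."""
    weights = []
    for g in GRID:
        w = math.exp(-0.5 * g * g)
        for idx, x in responses:
            a, b = bank[idx]
            p = p_endorse(g, a, b)
            w *= p if x == 1 else 1.0 - p
        weights.append(w)
    total = sum(weights)
    mean = sum(g * w for g, w in zip(GRID, weights)) / total
    var = sum((g - mean) ** 2 * w for g, w in zip(GRID, weights)) / total
    return mean, math.sqrt(var)


def run_cat(answer, bank, se_stop=0.33, max_items=7):
    """answer(item_index) -> 0 or 1. Returns (theta, se, administered items)."""
    responses = []
    theta, se = eap(responses, bank)  # prior only: theta near 0, se near 1
    while len(responses) < max_items and se >= se_stop:
        used = {idx for idx, _ in responses}
        remaining = [i for i in range(len(bank)) if i not in used]
        if not remaining:
            break
        # Pick the unused item that is most informative at the current estimate.
        nxt = max(remaining, key=lambda i: item_information(theta, *bank[i]))
        responses.append((nxt, answer(nxt)))
        theta, se = eap(responses, bank)
    return theta, se, [idx for idx, _ in responses]


# Hypothetical bank and a simulated respondent with true theta = 1.0 who
# endorses every item whose endorsement probability is at least 0.5.
bank = [(2.0, -2.0 + 0.2 * i) for i in range(21)]
theta, se, items = run_cat(lambda i: int(p_endorse(1.0, *bank[i]) >= 0.5), bank)
print(round(theta, 2), round(se, 2), items)
```

With dichotomous items the precision target is harder to reach within 7 items than with multi-category items, so this sketch will often stop on the item limit rather than on the standard error.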
To evaluate the construct validity of the HF-CAT, items from the following established tools were also included in the data collection: the SF-36® Health Survey scales for Physical Functioning (PF) and Vitality (VT) (24), four items from the Medicare Health Outcomes Survey (HOS) to assess Shortness of Breath (SOB) (25) and the Minnesota Living with Heart Failure Questionnaire (26) (MLHFQ, 21 items) as a legacy tool for measuring HF as indicated by patients’ perceptions of its overall effects on their lives.

A separate sample of 100 consecutive participants was recruited for the validity test conducted at the heart failure clinic of the Montefiore Medical Center, Bronx, NY (MMC sample). The clinic was selected as it usually does not use PRO assessments, and predominantly serves a low income, diverse population. We considered this environment as particularly challenging to test a new technology, assuming relatively low health literacy levels. In addition, we felt that an evaluation of psychometric properties would be more relevant in a less educated sample, as the validity of the IRT assumptions had already been evaluated in the development sample, which was affluent and well-educated (Table 1). Patients with previously diagnosed heart failure were invited to participate in the study. Consenting participants were asked to complete the actual HF-CAT on a hand-held computer (Personal Digital Assistant, PDA) and a series of paper-and-pencil assessments including sociodemographic questions, the MLHFQ, and a survey evaluating the experience with the HF-CAT. All participants completed both instruments.
Participants were randomly assigned to one of two groups within a cross-over design where the order of presentation of the HF-CAT assessment and the MLHFQ was counterbalanced. Patients were placed in the waiting area and asked to follow the standard instructions provided for each measure. Medical information, including the NYHA class, was extracted from the medical files. The NYHA class is determined routinely for all patients at every visit at the MMC Heart Failure Clinic based on the clinical assessment of the treating physician. The NYHA class was determined without knowledge of the results of patient self-assessments. Patients gave written informed consent and received a $25 incentive for their participation in the study.

Results

Samples

After applying the inclusion and exclusion criteria, the final item development sample (IB sample) consisted of 658 participants, 60±13 years old (49% female) who had experienced HF for 8.8±7.9 years (Table 1). Patients reported the following conditions besides their HF: 43% coronary heart disease, 42% previous heart attacks, 18% cardiomyopathy, 14% valvular heart disease, 5.2% rheumatic fever, 60% hypertension, 31% arrhythmias, 40% diabetes. Alcohol abuse was reported by 5.9%.

The Montefiore Medical Center clinical sample (MMC sample, n=100) was predominantly male (62%), with a mean age of 58 years. The sample was diverse, including a majority of African-American patients and a large proportion of Hispanics. One third of the population had a comparatively low household income. The severity of their heart failure symptoms assessed by the New York Heart Association (NYHA) classification was 11% in class I, 53% in class II, 36% in class III or IV.
HF-CAT Development

Item Banks Development: In the final calibrated item banks there were 21 items assessing Physical Disability, 20 items assessing Fatigue and 29 items in the Dyspnea bank with satisfactory item fit (Table 2). The most informative items (i.e. those with a high discrimination parameter: ‘slope’) were the item asking about the ability to run errands, an item referring to a feeling of being “worn out”, and the item asking if the patient will be short of breath walking from one room to another.

Simulation Studies: The precision of every score estimate can be displayed as a function of the level of function, or the severity of the symptoms. The results of the simulation studies showed that a highly precise score (comparable to an internal consistency of α>.90) can be estimated with 5 items for each domain over a range of nearly three SDs (Figure 2, left side). The concordance between the results of the CATs and the entire item bank was very good for all of the constructs as illustrated by the extremely high correlations (r=0.95-0.97), showing that the 5 item CAT can essentially capture the information provided by the entire bank. As expected there were high correlations between the simulated CAT scale scores and the corresponding SF-36 Health Survey’s Physical Function (r=-.87), and Vitality scales (r=-.84), as well as the static Shortness of Breath measurement (r=.83). Compared to all legacy tools, the HF-CAT provides a more precise measurement over a larger measurement range (Figure 2, right side). For Physical Disability a measurement precision similar to that of the SF-36 Physical Function scale can be achieved with half the number of items (Figure 2, upper left corner).
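The idea of displaying precision as a function of the score level, and of reading off the range over which a reliability of .90 is reached, can be sketched as below. The sketch again assumes a dichotomous 2PL model and a theta metric with unit variance, so that reliability is approximated by 1 − SE² with SE = 1/√information; the item parameters and the function names `bank_information` and `precise_range` are hypothetical and do not reproduce the study's item banks.

```python
import math


def bank_information(theta, bank):
    """Sum of 2PL item informations a^2 * p * (1 - p) at a given theta."""
    total = 0.0
    for a, b in bank:
        p = 1.0 / (1.0 + math.exp(-a * (theta - b)))
        total += a * a * p * (1.0 - p)
    return total


def precise_range(bank, min_reliability=0.90, lo=-4.0, hi=4.0, step=0.1):
    """Lowest and highest grid theta with 1 - SE^2 >= min_reliability, else None.

    Assumes the precise region is contiguous, which holds for evenly spread banks.
    """
    precise = []
    for k in range(int(round((hi - lo) / step)) + 1):
        theta = lo + k * step
        info = bank_information(theta, bank)
        # SE^2 = 1 / information on a unit-variance theta metric.
        if info > 0 and 1.0 - 1.0 / info >= min_reliability:
            precise.append(theta)
    return (precise[0], precise[-1]) if precise else None


# Hypothetical bank: 21 items with equal slopes, locations spread from -2 to +2.
bank = [(2.5, -2.0 + 0.2 * i) for i in range(21)]
print(precise_range(bank))
```

Plotting `bank_information` (or the implied SE) over the grid gives the kind of curve referred to as the test information curve; a bank whose curve drops off inside the observed score range would indicate a floor or ceiling problem.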
HF-CAT Evaluation

Respondent burden: On average 4-5 items were administered for the assessment of physical disability, fatigue and dyspnea to achieve the predefined level of precision (Table 3). The average time for administration of the entire HF-CAT with all three domains was 3 minutes (3±2 min).

Validity: We used the MLHFQ to help evaluate the constructs of the HF-CAT and the NYHA class to evaluate its discriminative validity (Table 3). The mean MLHFQ score of the sample was 38 ± 25; the mean scores of the HF-CAT were 59.6 ± 8.4 for Physical Disability, 52.6 ± 8.5 for Fatigue, and 54.8 ± 13.3 for Dyspnea. There were no order effects for any measure. The HF-CAT scales for physical disability, fatigue, and dyspnea correlated significantly with the MLHFQ total score (r = 0.71, r = 0.63, r = 0.68 respectively).

A general linear model was used to evaluate the ability of the HF-CAT scales to statistically differentiate patients with different levels of symptom severity as measured by the clinician’s NYHA classification (Table 3). The main effects for all the measures were significant, with very similar discriminative ability (Eta², F-values) for the HF-CAT Physical Disability and Dyspnea scales, and the MLHFQ scale.

User Experience: As this study took place in a low income, less educated, minority population, we had been particularly interested in the subjective user experience with a computer assessment. 98% of the patients found the HF-CAT assessment overall very easy or easy, 100% thought it was very easy or easy to follow the instructions, and 95% said it was very easy or easy to read the questions on the screen. 98% judged the time for the assessment as ‘just right’, and 90% considered the questions as relevant. 98% were willing to use the device again at the next visit.
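The discriminative-validity statistics named in the Validity paragraph (Eta² and F-values) can be illustrated with a one-way analysis of variance, which is the simplest special case of a general linear model with one grouping factor. The sketch below is an assumption-laden illustration: the function name `one_way_anova` is hypothetical, and the scores are invented toy values, not study data.

```python
def one_way_anova(groups):
    """Return (eta_squared, f_value) for a list of score lists, one per group.

    Requires at least two groups and some within-group variation.
    """
    scores = [x for g in groups for x in g]
    n, k = len(scores), len(groups)
    grand_mean = sum(scores) / n
    ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Eta squared: share of total score variance explained by group membership.
    eta_squared = ss_between / (ss_between + ss_within)
    f_value = (ss_between / (k - 1)) / (ss_within / (n - k))
    return eta_squared, f_value


# Invented toy scores for three hypothetical severity groups (not study data).
toy = [[48.0, 52.0, 50.0], [55.0, 58.0, 61.0], [63.0, 66.0, 70.0]]
eta2, f = one_way_anova(toy)
print(f"eta squared = {eta2:.2f}, F = {f:.1f}")
```

Comparing Eta² across instruments on the same patients, as done for the HF-CAT scales and the MLHFQ, indicates which score separates the clinician-defined groups more sharply.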
Discussion

For the first time we applied computerized adaptive testing methods to develop and evaluate an ultra-short assessment system for patients with HF (HF-CAT) in clinical practice. The tool allows routine, comprehensive assessment of three primary problems that are commonly experienced by patients with heart failure. If the emotional or social impact of the disease is of additional interest, further tools, e.g. from the PROMIS, need to be added for a comprehensive coverage of the health-related quality of life construct.

Feasibility

The feasibility of the HF-CAT in its PDA version was evaluated in a low income, low educated minority population in the Bronx, NY. It was demonstrated that the HF-CAT is a practical and well accepted tool. Nevertheless, it was tested under study conditions, and participants might have been biased by receiving an incentive for their participation. To our knowledge, only one report about the acceptance of CATs within clinical practice settings is available. A similar CAT, also displayed on a PDA, has been in routine clinical use since 2004. Patients answering this CAT also report a high acceptability. Almost all of the 423 consecutive patients considered the handling as easy and felt that the use of the PDA made sense (27).

Several other studies report on the reception of CATs under study conditions. The majority of patients in a feasibility test of a pain CAT found the CAT application to be useful, relevant, of appropriate length, and easy to complete (28). Similarly, the majority of respondents in a feasibility study of an asthma impact CAT found it easy to complete and of appropriate length (29). A feasibility test of a diabetes CAT gave somewhat mixed results.
While both English-speaking and Spanish-speaking participants agreed that a paper-and-pencil assessment was more burdensome than a CAT, the Spanish-speaking participants preferred the paper tool and were more willing to complete a paper tool in the future (30).

Respondent Burden

One important contribution of the Computer Adaptive Test technology will be to reduce the respondent burden without compromising the precision and validity of the assessment, by tailoring each assessment to the patient’s condition. This advantage was demonstrated earlier, for example, in a simulation study of the Activities of Daily Living CAT, which found that the CAT provided similar results to a static version while reducing the number of items administered by 50% (31). Results from other studies indicate that scores similar to those obtained with full-length item banks (ranging in length from 18 to 585 items) can be achieved through much shorter CATs when measuring functional status (32-34), mental health status (21,27,35,36) or the impact of conditions like headache (23,37), diabetes (30), chronic pain (28), and asthma (29). Most actual CAT applications used between 5-7 items to measure one construct.

The present HF-CAT applied between 4-5 items per scale and the average total time for the entire assessment and scoring was 3 min, i.e. 1 min per scale (which could be applied individually). The assessment time of the MLHFQ electronically measured in a previous study was 4±2 min (38), and the time to administer the Kansas City Cardiomyopathy Questionnaire (KCCQ), another common tool for the assessment of HF patients, is reported to be 4-6 minutes without scoring (39). In summary, the HF-CAT provides a precise measure over a large measurement range with minimal respondent burden.
As far as is known today, CATs seem to offer an effective resolution to the classic conflict between practicality and precision faced by traditional measurement technology (12).

Validity

Studies of CAT applications in diseases like depression (27,35) or headache (40) have shown that their measurement advantages can transfer to increased validity in identifying differences between groups known to differ in clinical characteristics, compared to static tools. The three scales of the HF-CAT also discriminated between groups of patients of different NYHA classification equally as well as a legacy tool measuring the impact of heart failure, using four times more items. These initial results show that the HF-CAT has the potential to provide a valid, highly relevant assessment of patients with heart failure.

Serial Measurements

For the assessment of HF patients, we believe it is important to assess the health status of the patient at the point of care as well as at the patient’s home. As many elderly patients do not have access to the internet or are not familiar with its use, one way to do so is the use of a smart phone and/or interactive voice recognition. Most established tools include items which are not suitable to be used over the phone. IRT methods allow using much simpler items over the phone and more comprehensive items at the doctor’s office, and scoring both assessments on the same measurement metric. This allows having a smart phone administer the HF-CAT at the patient’s home, and having the same patient answer the more comprehensive PROMIS-CAT on a tablet PC at the doctor’s office. IRT-based measurements of health outcomes are independent of the particular items being administered and of the test administrator.
The same value for the same domain yields the same interpretation, whereas results from different traditional tools cannot be compared directly, making serial health status monitoring less practicable.

Limitations

Despite many encouraging findings with recent CAT developments, a number of issues still need to be addressed. Within this study we have only used outpatients to evaluate the HF-CAT, which limits the generalizability to less severely disabled patients. However, one of the most relevant advantages of CATs is that they can essentially eliminate floor and ceiling effects by applying items tailored to the test-taker. Our simulation studies have shown that the current item bank covers more than three standard deviations above the population mean, which is where a hospitalized population of HF patients usually scores.

We did not evaluate the test-retest reliability for the HF-CAT. Similarly, we have not used the HF-CAT in an intervention study to test its responsiveness to treatments. However, several studies have reported on the ability of other CATs to detect change. For example, in a telephone study of 540 headache patients, a CAT for headache impact was demonstrated to be more responsive to self-evaluated changes of headache impact than a corresponding 54-item bank (23). In a longitudinal, prospective cohort study of 94 patients discharged from inpatient rehabilitation, the CAT version of the Activity Measure for Post-Acute Care was found to be comparable in responsiveness to the 66-item static version (41). Similarly, in a series of articles, Hart et al. report on the results of validation studies of condition-specific CATs, using large data sets from patients receiving rehabilitation services across multiple U.S. clinics (33,34).
Summary

In summary, we have developed a promising method to measure patient-reported dyspnea, fatigue and physical function for use in the care of patients with heart failure. This new measure is part of a rapidly growing number of new assessment tools utilizing the advantages of item response theory and computerized adaptive test techniques (16,19,42), with some of them being used in clinical practice already (27,43). However, whether these encouraging improvements in measurement will transfer to improved care and ultimately health of heart failure patients warrants further studies.

Sources of Funding

The work has been supported in part by an NIH/NHLBI grant (1 R43 HL083622-01, PI Rose).

Disclosures

None.

References

1. Hunt SA, Baker DW, Chin MH, Cinquegrani MP, Feldman AM, Francis GS, Ganiats TG, Goldstein S, Gregoratos G, Jessup ML, Noble RJ, Packer M, Silver MA, Stevenson LW, Gibbons RJ, Antman EM, Alpert JS, Faxon DP, Fuster V, Jacobs AK, Hiratzka LF, Russell RO, Smith SC, Jr.: ACC/AHA guidelines for the evaluation and management of chronic heart failure in the adult: executive summary. A report of the American College of Cardiology/American Heart Association Task Force on Practice Guidelines. J Am Coll Cardiol. 2001;38:2101-2113.
2. Bennett JA, Riegel B, Bittner V, Nichols J: Validity and reliability of the NYHA classes for measuring research outcomes in patients with cardiac disease. Heart Lung. 2002;31:262-270.
3. Goldman L, Hashimoto B, Cook EF, Loscalzo A: Comparative reproducibility and validity of systems for assessing cardiovascular functional class: advantages of a new specific activity scale. Circulation. 1981;64:1227-1234.
4. Lett HS, Blumenthal JA, Babyak MA, Sherwood A, Strauman T, Robins C, Newman MF: Depression as a risk factor for coronary artery disease: evidence, mechanisms, and treatment. Psychosom Med. 2004;66:305-315.
5.
Konstam V, Moser DK, De Jong MJ: Depression and anxiety in heart failure. J Card Fail. 2005;11:455-463.
6. Rumsfeld JS, Havranek E, Masoudi FA, Peterson ED, Jones P, Tooley JF, Krumholz HM, Spertus JA: Depressive symptoms are the strongest predictors of short-term declines in health status in patients with heart failure. J Am Coll Cardiol. 2003;42:1811-1817.
7. Sullivan M, Simon G, Spertus J, Russo J: Depression-related costs in heart failure care. Arch Intern Med. 2002;162:1860-1866.
8. Rumsfeld JS, Jones PG, Whooley MA, Sullivan MD, Pitt B, Weintraub WS, Spertus JA: Depression predicts mortality and hospitalization in patients with myocardial infarction complicated by heart failure. Am Heart J. 2005;150:961-967.
9. Junger J, Schellberg D, Muller-Tasch T, Raupp G, Zugck C, Haunstetter A, Zipfel S, Herzog W, Haass M: Depression increasingly predicts mortality in the course of congestive heart failure. Eur J Heart Fail. 2005;7:261-267.
10. Cleary PD, Edgman-Levitan S: Health care quality. Incorporating consumer perspectives. JAMA. 1997;278:1608-1612.
11. Burke, L. FDA Perspectives on IRT/CAT. DIA Workshop on Advances in Health Outcomes Measurement: Exploring the Current State and the Future Applications of Item Response Theory, Item Banks, and Computer-adaptive Testing, Bethesda, June 25. 2004.
12. McHorney CA, Cohen AS: Equating health status measures with item response theory: illustrations with functional status items. Med Care. 2000;38:II43-II59.
13. Rose M, Bezjak A: Logistics of collecting patient-reported outcomes (PROs) in clinical practice: an overview and practical examples. Qual Life Res. 2009;18:125-136.
14. McHorney CA, Tarlov AR: Individual-patient monitoring in clinical practice: are available health status surveys adequate? Qual Life Res.
1995;4:293-307. 15. Rector TS: A conceptual model of quality of life in relation to heart failure. J Card Fail. 2005;11:173-176. 16. Garin O, Ferrer M, Pont A, Rue M, Kotzeva A, Wiklund I, Van GE, Alonso J: Diseasespecific health-related quality of life questionnaires for heart failure: a systematic review with meta-analyses. Qual Life Res. 2009;18:71-85. 17. Bjorner JB, Chang CH, Thissen D, Reeve BB: Developing tailored instruments: item banking and computerized adaptive assessment. Qual Life Res. 2007;16 Suppl 1:95-108. 18. Wainer H, Dorans NJ, Eignor D, Flaugher R, Green BF, Mislevy RJ, Steinberg L, Thissen D: Computerized Adaptive Testing: A primer. Mahwah, NJ, Lawrence Erlbaum Associates, 2000. 19. Cella D, Yount S, Rothrock N, Gershon R, Cook K, Reeve B, Ader D, Fries JF, Bruce B, Rose M: The Patient-Reported Outcomes Measurement Information System (PROMIS): progress of an NIH Roadmap cooperative group during its first two years. Med Care. 2007; 45:S3-S11. Downloaded from http://circheartfailure.ahajournals.org/ by guest on June 17, 2017 20. Rose M, Bjorner JB, Becker J, Fries JF, Ware JE: Evaluation of a preliminary physical function item bank supported the expected advantages of the Patient-Reported Outcomes Measurement Information System (PROMIS). J Clin Epidemiol. 2008;61:17-33. 21. Fliege H, Becker J, Walter OB, Bjorner JB, Klapp BF, Rose M: Development of a Res s. 20 2005 05;1 05 ;14: ;1 4:22 4: 22 computer-adaptive test for depression (D-CAT). Qual Life Re Res. 2005;14:2277-2291. ork Cambridge Cam ambr brid br idge id g University ge 22. Rubin D.B.: Matched Sampling for Causal Effects. New York, Press, 2006 K M, Bjorner JB, Bayliss MS, Batenhorst A, Dahlof CG, C Tepper S, 23. Ware JE, Jr., Kosinski Dowson A: Applications p pplications of computerized adaptive testing (CAT) to the ass assessment s of headache impact. a Qual Life Res. 2003; 12:935-952. act. 24. Ware JE, Jr., Dewey J: How to Score Version Two of the SF SF-36 Survey. 
Lincoln, RI, 36 Health Sur QualityMetric Incorporated, 2000. 25. National Committee for Quality Assurance. Specifications for the Medicare Health Outcomes Survey. HEDIS® . 6. 2004. Washington, DC, National Committee for Quality Assurance. 26. Rector T, Cohn J: Patients'self-assessment of their congestive heart failure. Part 2: Content, reliability and validity of a new measure, the Minnesota Living with Heart Failure questionnaire. Heart Failure. 1987;3:198-209. 27. Fliege H, Becker J, Walter OB, Rose M, Bjorner JB, Klapp BF: Evaluation of a computeradaptive test for the assessment of depression (D-CAT) in clinical application. Int J Methods Psychiatr Res. 2009;18:23-36. 28. Anatchkova MD, Saris-Baglama RN, Kosinski M, Bjorner JB: Development and preliminary testing of a computerized adaptive assessment of chronic pain. J Pain. 2009; 10:932-943. 29. Turner-Bowker DM, Saris-Baglama RN, Anatchkova M, Mosen DM: A Computerized Asthma Outcomes Measure Is Feasible for Disease Management. Am J Pharm Benefits. 2010;2:119-124. 30. Schwartz C, Welch G, Santiago-Kelley P, Bode R, Sun X: Computerized adaptive testing of diabetes impact: a feasibility study of Hispanics and non-Hispanics in an active clinic population. Qual Life Res. 2006;15:1503-1518. 31. Chien TW, Wu HM, Wang WC, Castillo RV, Chou W: Reduction in patient burdens with graphical computerized adaptive testing on the ADL scale: tool development and simulation. Health Qual Life Outcomes. 2009;7:39. 32. Haley SM, Gandek B, Siebens H, Black-Schaffer RM, Sinclair SJ, Tao W, Coster WJ, Ni P, Jette AM: Computerized adaptive testing for follow-up after discharge from inpatient rehabilitation: II. Participation outcomes. Arch Phys Med Rehabil. 2008;89:275-283. Downloaded from http://circheartfailure.ahajournals.org/ by guest on June 17, 2017 33. Hart DL, Wang YC, Stratford PW, Mioduski JE: Computerized adaptive test for patients with knee impairments produced valid and responsive measures of function. J Clin Epidemiol. 
2008;61:1113-1124. 34. Hart DL, Werneke MW, Wang YC, Stratford PW, Mioduski JE: Computerized adaptive alid al id aand nd rresponsive espo es pons po ns measures of test for patients with lumbar spine impairments produced valid function. Spine (Phila Pa 1976). 2010;35:2157-2164. 35. Gibbons RD, Weiss W DJ, Kupfer f DJ, Frank E, Fagiolini A, Grochocinski VJ, VJJ Bhaumik DK, k RD, Immekus JC: Using computerized adaptive testing to rreduce the Stover A, Bock burden of mental n health assessment. Psychiatr Serv. 2008;59:361-368. ntal 36. Walter OB, Becker e ecker J, Bjorner JB, Fliege H, Klapp BF, Rose M: Developm Development m and evaluation of a computer adaptive test for 'Anxiety' (A-CAT). Qual Life Re Res. e 2007;16 Suppl 1:143-155. 55 37. Bayliss MS, Dewey JE, Dunlap I, Batenhorst AS, Cady R, Diamond ML, Sheftell F: A study of the feasibility of Internet administration of a computerized health survey: the headache impact test (HIT). Qual Life Res. 2003;12:953-961. 38. Bennett SJ, Oldridge NB, Eckert GJ, Embree JL, Browning S, Hou N, Chui M, Deer M, Murray MD: Comparison of quality of life measures in heart failure. Nurs Res. 2003;52:207-216. 39. Green CP, Porter CB, Bresnahan DR, Spertus JA: Development and evaluation of the Kansas City Cardiomyopathy Questionnaire: a new health status measure for heart failure. J Am Coll Cardiol; 2000;35:1245-1255. 40. Martin M, Kosinski M, Bjorner JB, Ware JE, Jr., Maclean R, Li T: Item response theory methods can improve the measurement of physical function by combining the modified health assessment questionnaire and the SF-36 physical function scale. Qual Life Res. 2007;16:647-660. 41. Haley SM, Fragala-Pinkham M, Ni P: Sensitivity of a computer adaptive assessment for measuring functional mobility changes in children enrolled in a community fitness programme. Clin Rehabil. 2006;20:616-622. 42. 
Ruo B, Choi SW, Baker DW, Grady KL, Cella D: Development and validation of a computer adaptive test for measuring dyspnea in heart failure. J Card Fail. 2010;16:659668. 43. Becker J, Fliege H, Kocalevent RD, Bjorner JB, Rose M, Walter OB, Klapp BF: Functioning and validity of a Computerized Adaptive Test to measure anxiety (A-CAT). Depress Anxiety. 2008;25:E182-E194. Downloaded from http://circheartfailure.ahajournals.org/ by guest on June 17, 2017 Table 1. Characteristics of the Samples Downloaded from http://circheartfailure.ahajournals.org/ by guest on June 17, 2017 HF-CAT Development (IB sample) HF-CAT Evaluation (MMC sample) Total Sample n = 658 N = 100 Age Years with HF Family Status Living in partnership Living alone 60 (13) 58 (12) 8.8 (7.9) 4.6 (4.5) 78% 21% 54% 33% Gender Female 49% 38% Ethnicity y Hispanic or Latino 4% 35% Race White African American Other 93% 3% 4% 19% 46% 35% Education 8th Grade or Less Some High g School High g School Graduate Some College g College g Graduate 0.1% 3% 15% 39% 22% 13% 21% 25% 24% 11% Postgraduate 20% 5% Household income Less than $5,000 1% 11% $5,001 to $20,000 18% 22% $20,001 to $45,000 32% 15% $45,001 to $75,000 23% 10% More than $75,000 17% 5% Prefer not to answer 9% 37% Employment p y status Student .3% 4% Working at a paying job 22% 23% Retired 56% 47% Laid off or unemployed 3% 2% A full-time homemaker 7% 9% Other 11% 11% Table 2. 
IRT Item Parameters, HF-CAT Item Banks

Physical Disability | response options | slope | mean threshold | threshold 1 | threshold 2 | threshold 3 | threshold 4
Exercising hard for half an hour | 1 | 2.549 | 0.556 | -0.123 | 1.236
Doing an hour of physical labor | 1 | 2.810 | 0.625 | 0.072 | 1.177
Walking up a steep hill | 1 | 3.558 | 0.748 | -0.154 | 1.650
Rearranging furniture at home | 1 | 3.952 | 1.136 | 0.610 | 1.663
Doing chores | 1 | 4.252 | 1.391 | 0.765 | 2.017
Doing daily physical activities | 2 | 3.432 | 1.492 | 0.537 | 1.191 | 1.715 | 2.523
Climbing up a flight of stairs | 1 | 3.728 | 1.535 | 0.824 | 2.246
Doing daily physical activities | 1 | 4.092 | 1.583 | 0.881 | 2.285
Carrying two bags of groceries | 1 | 3.800 | 1.595 | 1.132 | 2.058
Walking on flat ground | 1 | 5.247 | 1.621 | 1.621 | *
Preparing a meal | 1 | 5.554 | 1.643 | 1.643 | *
Walking one hundred yards | 1 | 3.977 | 1.648 | 1.158 | 2.137
Standing up from a chair | 1 | 3.837 | 1.814 | 1.814 | *
Running errands and shopping | 1 | 5.713 | 1.826 | 1.236 | 2.417
Dressing myself | 1 | 4.825 | 1.832 | 1.832 | *
Taking a tub bath | 1 | 2.683 | 1.869 | 1.601 | 2.137
Getting from one room to another | 1 | 5.494 | 1.907 | 1.907 | *
Standing up from a bed | 1 | 3.922 | 1.910 | 1.910 | *
Getting on and off the toilet | 1 | 3.890 | 1.953 | 1.953 | *
Making the bed | 1 | 4.330 | 1.955 | 1.444 | 2.465
Putting a trash bag outside | 1 | 4.768 | 1.995 | 1.536 | 2.453

Fatigue | response options | slope | mean threshold | threshold 1 | threshold 2
Full of energy | 3 | 2.419 | -0.475 | -1.407 | 0.456
Fresh and rested | 3 | 2.243 | -0.421 | -1.271 | 0.429
Lively | 3 | 1.979 | -0.175 | -1.195 | 0.845
Strong and vital | 3 | 1.925 | -0.131 | -1.100 | 0.839
Active | 3 | 1.856 | -0.123 | -1.203 | 0.957
Full of life | 3 | 1.600 | 0.063 | -0.756 | 0.881
Tired | 3 | 2.591 | 0.406 | -0.638 | 1.450
Sluggish | 3 | 3.617 | 0.546 | -0.345 | 1.436
Fatigued | 3 | 2.899 | 0.578 | -0.383 | 1.539
Worn out | 3 | 4.090 | 0.647 | -0.214 | 1.508
Run down | 3 | 3.445 | 0.679 | -0.192 | 1.551
Wide awake | 3 | 1.217 | 0.741 | -0.407 | 1.889
As if I have no energy left | 3 | 3.189 | 0.767 | -0.072 | 1.606
Spent | 3 | 3.325 | 0.807 | -0.104 | 1.719
Exhausted | 3 | 3.392 | 0.811 | -0.016 | 1.637
Weary | 3 | 2.614 | 0.852 | -0.042 | 1.747
Weak | 3 | 2.421 | 0.866 | -0.094 | 1.825
Save my energy | 3 | 1.161 | 1.065 | -0.064 | 2.195
Sleepy all day | 3 | 1.765 | 1.125 | 0.139 | 2.111
Jaded | 3 | 1.241 | 1.809 | 0.817 | 2.801

Dyspnea | response options | slope | mean threshold | threshold 1 | threshold 2 | threshold 3
Running a short distance makes me short of breath | 3 | 1.190 | -0.525 | -2.072 | -0.090 | 0.587
Exercising hard for half an hour makes me short of breath |  | 1.134 | 0.131 | -0.206 | 0.468
Talking while walking up a hill will make me short of breath | 3 | 2.040 | 0.185 | -1.455 | 0.385 | 1.625
An hour of physical labor makes me short of breath | 4 | 1.418 | 0.394 | 0.033 | 0.755
My breathing problems limit my ability to exercise as much as I would like | 3 | 1.500 | 0.449 | -0.529 | 0.685 | 1.193
Talking while walking up a flight of stairs makes me short of breath | 3 | 2.068 | 0.564 | -0.919 | 0.807 | 1.803
During a typical day I feel short of breath | 4 | 2.407 | 0.649 | -0.220 | 1.518
Doing chores, like vacuuming or yard work, makes me short of breath | 3 | 2.033 | 0.693 | -0.608 | 0.763 | 1.924
Climbing up one flight of stairs makes me short of breath | 3 | 2.440 | 0.819 | -0.646 | 0.983 | 2.120
Going outside for a walk makes me short of breath | 3 | 2.646 | 1.037 | -0.213 | 1.197 | 2.128
Walking one hundred yards makes me short of breath | 3 | 2.351 | 1.052 | -0.064 | 1.194 | 2.027
Walking up a hill makes me short of breath | 4 | 2.122 | 1.085 | 0.381 | 1.789
Carrying groceries makes me short of breath | 3 | 2.796 | 1.102 | -0.106 | 1.222 | 2.190
Talking while walking makes me short of breath | 3 | 2.422 | 1.241 | -0.023 | 1.313 | 2.434
Running errands makes me short of breath | 3 | 2.677 | 1.351 | 0.191 | 1.415 | 2.448
Taking a bath makes me short of breath | 4 | 2.849 | 1.404 | 1.009 | 1.800
Dressing myself makes me short of breath | 4 | 3.118 | 1.431 | 0.887 | 1.975
Preparing a meal makes me short of breath | 4 | 3.104 | 1.451 | 0.988 | 1.914
Singing or humming makes me short of breath | 4 | 2.086 | 1.456 | 0.943 | 1.970
Speaking in a group makes me short of breath | 4 | 1.900 | 1.481 | 0.994 | 1.969
I feel short of breath when I sit and rest | 4 | 2.775 | 1.543 | 1.543 | *
Talking at noisy places makes me short of breath | 4 | 2.187 | 1.606 | 1.170 | 2.043
Walking from one room to another makes me short of breath | 4 | 3.909 | 1.647 | 1.154 | 2.139
Talking to someone makes me short of breath | 4 | 2.875 | 1.779 | 1.247 | 2.311
Talking on the phone makes me short of breath | 4 | 2.768 | 1.840 | 1.398 | 2.281
Getting off the bed makes me short of breath | 4 | 2.958 | 1.849 | 1.305 | 2.393
Going to the toilet makes me short of breath | 4 | 2.868 | 1.900 | 1.501 | 2.298
Lying down flat makes me short of breath | 3 | 1.532 | 1.924 | 1.234 | 2.000 | 2.537
Standing up from a chair makes me short of breath | 4 | 2.511 | 1.924 | 1.302 | 2.547

The table is ordered by the mean threshold value.
Response options: 1: easy / hard / impossible; 2: no difficulty / a little bit of difficulty / some difficulty / a lot of difficulty / can't do because of my health; 3: not at all / somewhat / very much; 4: not at all / a little bit / quite a lot / can't do; 5: not at all / a little bit / quite a lot.
* The two highest response options were collapsed for the item parameter estimation; the presentation of response options to the patient remains the same.
IRT item bank parameters are developed as usual on a 0±1 metric, with 0 representing the scaling sample mean with a standard deviation of 1. For easier interpretability, estimated patient scores are later transformed linearly to a 50±10 metric.
The slope parameter is also called the discrimination parameter. Higher slope parameters indicate better discrimination, which makes the item more valuable, i.e. 'informative', for the score estimation: the capability to 'run errands', for example, is more informative for determining the physical disability of a patient than her or his ability to 'put the trash outside the house'.
The thresholds of an item show at which score level a particular response option is the most likely to be endorsed. For the item 'running errands' the threshold 1.236 separates the response 'easy' from 'hard', and the threshold 2.417 separates 'hard' from 'impossible'. If a patient scores 3 standard deviations above the population mean, s/he is most likely to answer the item 'running errands is …' with 'impossible', as her/his score is above the threshold of 2.417. If her/his level of disability is only 1.5 SD above the U.S. population mean, s/he is likely to endorse 'hard', as the score is between the thresholds 1.236 and 2.417.
The mean threshold illustrates the position of the item on the metric, which can be seen as 'item difficulty' in traditional terms.

Table 3. Score differences between different NYHA classes

Scale | N° items | NYHA I (n=11) mean | NYHA I SD | NYHA II (n=53) mean | NYHA II SD | NYHA III/IV (n=36) mean | NYHA III/IV SD | Eta² | F | p | RV (95% CI)
Physical Disability | 4.9±1.5 | 53.0 | 6.2 | 58.9 | 8.6 | 62.6 | 7.4 | .12 | 6.2 | .003 | 1.01 (.38-2.20)
Fatigue | 3.7±0.7 | 46.8 | 6.9 | 52.0 | 7.6 | 55.4 | 9.4 | .09 | 4.9 | .009 | .80 (.21-1.91)
Dyspnea | 4.6±1.5 | 43.9 | 14.4 | 53.8 | 12.7 | 59.8 | 11.7 | .13 | 6.9 | .002 | 1.13 (.34-2.67)
MLHFQ | 21 | 15.5 | 14.8 | 38.3 | 25.3 | 44.9 | 22.9 | .11 | 6.1 | .003 | 1.00

Theta values of the CAT scales are scored on a T-score metric (50±10). The MLHFQ scores are summary scores ranging from 0 to 105. All analyses have been controlled for the order of administration as a confounding variable.
RV: Relative Validity: HF-CAT scale F-values divided by the F-value for the MLHFQ sum scale. A bootstrap analysis was used to determine the confidence intervals.

Figure Legends

Figure 1. HF-CAT patient interface and examples for one item of each bank

Figure 2. Measurement precision in relation to measurement range
The x-axis shows the patient score. In IRT terminology this score is referred to as the 'theta score'. To make the HF-CAT and the legacy tools comparable, both instruments are scored on the same metric as determined by the developed item banks. The y-axis shows the 95% confidence interval of the patient score; the smaller the y-value, the higher the precision of the score. For illustrative purposes, the dotted lines show confidence intervals that would be comparable to an internal consistency (Cronbach's α) of 0.80, 0.90, and 0.95.

[Figure 1: screens of the patient interface. Introductory text: "With the following questions we would like to assess your current health status …". Example items: "For me, running errands is …" with the options easy / hard / impossible; "I feel short of breath when I sit and rest …" and "I feel tired …", shown with the option sets not at all / somewhat / very much and not at all / a little bit / quite a lot.]

[Figure 2: panels for Physical Disability, Dyspnea and Fatigue. x-axis: patient score (theta from -3 to 4, with the corresponding 50±10 metric values 30 to 80); y-axis: 95% CI. Curves: Physical Disability: HF-CAT 5 items, HF-CAT 10 items, SF-36 PF 10 items, item bank 20 items. Dyspnea: HF-CAT 5 items, HF-CAT 4 items, HOS 4 items, item bank 29 items. Fatigue: HF-CAT 5 items, HF-CAT 4 items, SF-36 VT 4 items, item bank 20 items. Dotted reference lines at α = .80, .90 and .95.]
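The threshold interpretation given in the notes to Table 2 can be checked with a few lines of code. The following is a minimal sketch, not the authors' scoring software: it assumes a generalized partial credit model, in which thresholds mark the score levels where adjacent response options are equally likely (the section itself does not name the IRT model), and the function names are hypothetical. Only the slope (5.713), the thresholds (1.236 and 2.417) for 'running errands' and the linear 50±10 transformation are taken from the text.

```python
import math


def gpcm_probabilities(theta, slope, thresholds):
    """Response-option probabilities under an assumed generalized partial credit model."""
    # Cumulative sums of slope * (theta - threshold); the lowest option has an empty sum.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + slope * (theta - b))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


def most_likely_response(theta, slope, thresholds, labels):
    """Label of the response option with the highest probability at this score."""
    probs = gpcm_probabilities(theta, slope, thresholds)
    return labels[probs.index(max(probs))]


def to_t_score(theta):
    """Linear transformation from the 0±1 metric to the 50±10 metric."""
    return 50.0 + 10.0 * theta


# 'Running errands and shopping' from Table 2.
SLOPE = 5.713
THRESHOLDS = [1.236, 2.417]
LABELS = ["easy", "hard", "impossible"]

for theta in (0.0, 1.5, 3.0):
    answer = most_likely_response(theta, SLOPE, THRESHOLDS, LABELS)
    print(f"theta={theta:+.1f}  T={to_t_score(theta):.0f}  most likely: {answer}")
```

Under these assumptions a patient at the mean (theta 0, T-score 50) most likely answers 'easy', a patient 1.5 SD above the mean (T-score 65) 'hard', and a patient 3 SD above the mean (T-score 80) 'impossible', which agrees with the reading of the thresholds in the table notes. The slope only changes how sharply the most likely option switches at each threshold, not where the switch occurs.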
Short and Precise Patient Self-Assessment of Heart Failure Symptoms Using a Computerized Adaptive Test (HF-CAT)
Matthias Rose, Milena Anatchkova, Jason Fletcher, Arthur E. Blank, Jakob Bjørner, Bernd Löwe, Thomas S. Rector and John E. Ware
Circ Heart Fail. published online April 23, 2012.
Circulation: Heart Failure is published by the American Heart Association, 7272 Greenville Avenue, Dallas, TX 75231. Copyright © 2012 American Heart Association, Inc. All rights reserved. Print ISSN: 1941-3289. Online ISSN: 1941-3297.
The online version of this article, along with updated information and services, is located on the World Wide Web at: http://circheartfailure.ahajournals.org/content/early/2012/04/23/CIRCHEARTFAILURE.111.964916