RELIABILITY
The consistency or reproducibility of a test score (or measurement).

Common approaches to estimating reliability
Classical True Score Theory
– test-retest, alternate forms, internal consistency: useful for estimating relative decisions
– intraclass correlation: useful for estimating absolute decisions
Generalizability Theory
– can estimate both relative and absolute decisions

Reliability is a concept central to all the behavioral sciences. To some extent, all measures are unreliable; this is especially true of psychological measures and of measurements based on human observation.

Sources of Error
– Random: fluctuations in the measurement based purely on chance.
– Systematic: measurement error that affects a score because of some particular characteristic of the person or the test that has nothing to do with the construct being measured.

CTST
X = T + E
– Recognizes only two sources of variance: true score (T) and undifferentiated error (E)
– Reliability designs:
  – test-retest (stability)
  – alternate forms (equivalence in item sampling)
  – test-retest with alternate forms (stability and equivalence, but these are confounded)
– Cannot adequately estimate the individual sources of error influencing a measurement

ICC
– Uses ANOVA to partition variance into between-subjects and within-subjects components
– Has some ability to accommodate multiple sources of variance
– Does not provide an integrated approach to estimating reliability under multiple conditions

Generalizability Theory
Cronbach, Gleser, Nanda, & Rajaratnam (1972). The Dependability of Behavioral Measurements.

Dependability
The accuracy of generalizing from a person's observed score on a measure to the average score that person would have received under all possible testing conditions the tester would be willing to accept.

The Decision Maker
The score on which the decision is based is only one of many scores that might serve the same purpose. The decision maker is almost never interested in the response given at the particular moment of testing.
Ideally, the decision should be based on that person's mean score over all possible measurement occasions.

Universe of Generalization
Definition and establishment of the universe of admissible observations:
– observations that the decision maker is willing to treat as interchangeable
– all sources of influence acting on the measurement of the trait under study
What are the sources of ERROR influencing your measurement?

Generalizability Issues
– Facet of generalization: raters, trials, days, clinics, therapists
– Facet of determination (the object of measurement): usually people, but can vary (e.g., raters)

Types of Studies
– Generalizability Study (G-Study)
– Decision Study (D-Study)

G-Study
The purpose is to anticipate the multiple uses of a measurement and to provide as much information as possible about the sources of variation in the measurement. The G-Study should attempt to identify and incorporate into its design as many potential sources of variation as possible.

D-Study
Makes use of the information provided by the G-Study to design the best possible application of the measurement for a particular purpose.
Planning a D-Study:
– defines the universe of generalization
– specifies the proposed interpretation of the measurement
– uses G-Study information to evaluate the effectiveness of alternative designs for minimizing error and maximizing reliability

Design Considerations
– Fixed facets
– Random facets

Fixed Facet
– When the levels of the facet exhaust all possible conditions in the universe to which the investigator wants to generalize.
– When the levels of the facet represent a convenient subsample (not a random sample) of all possible conditions in the universe.

Random Facets
– When the levels of the facet are assumed to represent a random sample of all possible levels described by the facet.
– When you are willing to EXCHANGE the conditions (levels) under study for any other set of conditions of the same size from the universe.

Types of Decisions
Relative
– establish a rank order of individuals (or groups)
– compare a subject's performance against others in the group
Absolute
– index an individual's (or group's) absolute level of measurement
– measurement results are made independent of the performance of others in the group

Statistical Modeling
ANOVA: just as ANOVA partitions a dependent variable into effects for the independent variables (main effects and interactions), G theory uses ANOVA to partition an individual's measurement score into an effect for the universe score and an effect for each source of error and their interactions in the design.

Statistical Modeling
In ANOVA we were driven to test specific hypotheses about our independent variables and thus sought out the F statistic and p-value. In G theory we use ANOVA to partition the different sources of variance and then to estimate their magnitude (variance components).

One-Facet Design
4 sources of variability:
– systematic differences among subjects (object of measurement)
– systematic differences among raters (occasions, items)
– subjects × raters interaction
– random error (confounded with the subjects × raters interaction)

Two-Facet Design
Components of Variance
Example of a fully crossed two-facet design (Kroll et al.)
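For a fully crossed subjects × raters × observations design of this kind, the variance components are commonly estimated by computing the ANOVA mean squares and solving the random-effects expected-mean-square equations for each component. The Python sketch below is a minimal illustration under stated assumptions, not necessarily the procedure Kroll et al. used: it assumes one score per subject-rater-observation cell, at least two levels of every facet, and all facets random, and it reports negative estimates as zero (one common convention, also an assumption here). The function name `variance_components_sro` and its dictionary keys are hypothetical.

```python
from itertools import product


def variance_components_sro(x):
    """Hypothetical helper: random-effects variance components for a fully
    crossed subjects x raters x observations design.

    x[s][r][o] is the score given to subject s by rater r on observation o.
    Assumes one score per cell and at least two levels of every facet.
    """
    n_s, n_r, n_o = len(x), len(x[0]), len(x[0][0])
    S, R, O = range(n_s), range(n_r), range(n_o)
    grand = sum(x[s][r][o] for s, r, o in product(S, R, O)) / (n_s * n_r * n_o)

    # Marginal means for each facet and each pair of facets.
    m_s = [sum(x[s][r][o] for r, o in product(R, O)) / (n_r * n_o) for s in S]
    m_r = [sum(x[s][r][o] for s, o in product(S, O)) / (n_s * n_o) for r in R]
    m_o = [sum(x[s][r][o] for s, r in product(S, R)) / (n_s * n_r) for o in O]
    m_sr = [[sum(x[s][r][o] for o in O) / n_o for r in R] for s in S]
    m_so = [[sum(x[s][r][o] for r in R) / n_r for o in O] for s in S]
    m_ro = [[sum(x[s][r][o] for s in S) / n_s for o in O] for r in R]

    # Sums of squares for the main effects and two-way interactions.
    ss = {
        "s": n_r * n_o * sum((m - grand) ** 2 for m in m_s),
        "r": n_s * n_o * sum((m - grand) ** 2 for m in m_r),
        "o": n_s * n_r * sum((m - grand) ** 2 for m in m_o),
        "sr": n_o * sum((m_sr[s][r] - m_s[s] - m_r[r] + grand) ** 2 for s, r in product(S, R)),
        "so": n_r * sum((m_so[s][o] - m_s[s] - m_o[o] + grand) ** 2 for s, o in product(S, O)),
        "ro": n_s * sum((m_ro[r][o] - m_r[r] - m_o[o] + grand) ** 2 for r, o in product(R, O)),
    }
    # The three-way interaction is confounded with residual error.
    ss_total = sum((x[s][r][o] - grand) ** 2 for s, r, o in product(S, R, O))
    ss["sro,e"] = ss_total - sum(ss.values())

    a, b, c = n_s - 1, n_r - 1, n_o - 1
    df = {"s": a, "r": b, "o": c, "sr": a * b, "so": a * c, "ro": b * c, "sro,e": a * b * c}
    ms = {k: ss[k] / df[k] for k in ss}

    # Solve the expected-mean-square equations (all facets random).
    e = ms["sro,e"]
    est = {
        "s": (ms["s"] - ms["sr"] - ms["so"] + e) / (n_r * n_o),
        "r": (ms["r"] - ms["sr"] - ms["ro"] + e) / (n_s * n_o),
        "o": (ms["o"] - ms["so"] - ms["ro"] + e) / (n_s * n_r),
        "sr": (ms["sr"] - e) / n_o,
        "so": (ms["so"] - e) / n_r,
        "ro": (ms["ro"] - e) / n_s,
        "sro,e": e,
    }
    # Assumed convention: report negative estimates as zero.
    return {k: max(v, 0.0) for k, v in est.items()}
```

The returned keys follow the seven sources of variance for this design. Percentages such as those in Tables 1 to 4 are each component divided by the sum of all seven.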
Seven sources of variance are estimated:
– subjects (s)
– raters (r)
– observations (o)
– s × r
– s × o
– r × o
– s × r × o, e (three-way interaction confounded with residual error)

[Figure: Venn diagram of the variance components for the crossed design: subjects (s), raters (r), observations (o), s × r, s × o, o × r, and s × r × o + error]

TABLE 1 - Variance Components and Percentage of Variation for Measures of Pelvic Tilt (raters = 2, observations = 5)

                  Resting Pelvic Tilt   Anterior Pelvic Tilt   Posterior Pelvic Tilt
Source            VC       Percent      VC       Percent       VC       Percent
Persons           19.956   75.2         47.683   84.8          20.607   72.3
Raters             1.726    6.5          0.000    0.0           2.508    8.8
Observations       0.148    0.6          0.000    0.0           0.011    0.0
P x R              1.671    6.3          1.935    3.4           1.910    6.7
P x O              0.042    0.2          0.972    1.7           1.077    3.8
R x O              0.000    0.0          0.000    0.0           0.000    0.0
P x R x O, E       3.050   11.5          5.646   10.0           2.394    8.4

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error

TABLE 2 - Variance Components and Percentage of Variation for Modified Schober, Attraction Method, and Lower Abdominal Strength Measures (raters = 2, observations = 3)

                  Modified Schober   Attraction Method   Lower Abdominal Strength
Source            VC      Percent    VC      Percent     VC        Percent
Persons           1.006   67.8       0.360   81.3        105.055   52.9
Raters            0.000    0.0       0.000    0.0          0.000    0.0
Observations      0.008    0.5       0.000    0.0          0.000    0.0
P x R             0.181   12.2       0.000    0.0         71.349   36.0
P x O             0.029    2.0       0.083   18.7          3.695    1.9
R x O             0.016    1.1       0.000    0.0          0.757    0.4
P x R x O, E      0.243   16.4       0.000    0.0         17.577    8.9

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error

Relative Error
Facet of determination: subjects
$\sigma^2_{rel} = \frac{\sigma^2_{sr}}{n_r} + \frac{\sigma^2_{so}}{n_o} + \frac{\sigma^2_{sro,e}}{n_r n_o}$

Absolute Error
Facet of determination: subjects
$\sigma^2_{abs} = \frac{\sigma^2_{r}}{n_r} + \frac{\sigma^2_{o}}{n_o} + \frac{\sigma^2_{sr}}{n_r} + \frac{\sigma^2_{so}}{n_o} + \frac{\sigma^2_{ro}}{n_r n_o} + \frac{\sigma^2_{sro,e}}{n_r n_o}$
That is, absolute error is the relative error plus the rater, observation, and rater × observation terms.

Generalizability Coefficients (also known as reliability coefficients)
Relative generalizability coefficient for subjects:
$\rho^2 = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_{rel}}$
Absolute generalizability coefficient for subjects:
$\phi = \frac{\sigma^2_s}{\sigma^2_s + \sigma^2_{abs}}$

TABLE 3 - Variance Components and Percentage of Variation for Right and Left Hamstring Flexibility Measures (raters = 2, observations = 3)

                  Right Hamstring Flexibility   Left Hamstring Flexibility
Source            VC        Percent             VC        Percent
Persons           398.526   93.1                382.639   91.9
Raters              0.000    0.0                  0.000    0.0
Observations        1.767    0.4                  2.123    0.5
P x R              20.656    4.8                 24.030    5.8
P x O               0.708    0.2                  1.235    0.3
R x O               0.001    0.0                  0.707    0.2
P x R x O, E        6.407    1.5                  5.727    1.4

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error

TABLE 4 - Variance Components and Percentage of Variation for Abdominal and Trunk Muscle Endurance Methods (raters = 2, observations = 2)

                  Abdominal Muscle Endurance   Trunk Muscle Endurance
Source            VC        Percent            VC         Percent
Persons           646.177   68.9               1160.656   83.6
Raters             43.936    4.7                  0.000    0.0
Observations        0.000    0.0                  0.000    0.0
P x R               0.000    0.0                 21.732    1.6
P x O              15.736    1.7                 24.559    1.8
R x O               0.000    0.0                  0.000    0.0
P x R x O, E      232.117   24.7                181.944   13.1

Abbreviations: VC = variance component; P x R = persons by raters; P x O = persons by observations; R x O = raters by observations; P x R x O, E = persons by raters by observations combined with residual error

TABLE 5 - Generalizability of Pelvic Tilt Measures

        Resting Pelvic Tilt   Anterior Pelvic Tilt   Posterior Pelvic Tilt
        G-study   D-study     G-study   D-study      G-study   D-study
nr      2         1           2         1            2         1
no      5         1           5         1            5         1
ρ²      0.946     0.809       0.967     0.848        0.936     0.793
φ       0.907     0.750       0.967     0.848        0.886     0.723

Abbreviations: nr = number of raters; no = number of observations; ρ² = generalizability (G) coefficient for relative decisions; φ = G coefficient for absolute decisions

TABLE 6 - Generalizability of Trunk Flexibility and Strength Measures

        Modified Schober   Attraction Method   Lower Abdominal Strength   Right Hamstring Flexibility   Left Hamstring Flexibility
Study   G       D          G       D           G       D                  G       D                     G       D
nr      2       1          2       1           2       1                  2       1                     2       1
no      3       1          3       1           3       1                  3       1                     3       1
ρ²      0.877   0.690      0.928   0.813       0.752   0.531              0.972   0.935                 0.966   0.925
φ       0.873   0.678      0.928   0.813       0.724   0.530              0.970   0.931                 0.964   0.919

Abbreviations: nr = number of raters; no = number of observations; ρ² = generalizability (G) coefficient for relative decisions; φ = G coefficient for absolute decisions

TABLE 7 - Generalizability of Trunk Endurance Measures

        Flexion               Extension
        G-study   D-study     G-study   D-study
nr      2         1           2         1
no      2         1           2         1
ρ²      0.908     0.723       0.944     0.836
φ       0.880     0.689       0.944     0.836

Abbreviations: nr = number of raters; no = number of observations; ρ² = generalizability (G) coefficient for relative decisions; φ = G coefficient for absolute decisions
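The relative and absolute error formulas and the two generalizability coefficients given earlier can be checked numerically against the tables. The Python sketch below is a minimal illustration; the function name `g_coefficients` and the dictionary keys are hypothetical, and the components are the Table 1 resting pelvic tilt values. With 2 raters and 5 observations it reproduces the Table 5 G-study coefficients (ρ² ≈ 0.946, φ ≈ 0.907), and with 1 rater and 1 observation it reproduces the D-study φ of 0.750. The (1, 3) and (2, 1) designs in the loop are hypothetical alternatives, included only to show how a D-study compares candidate designs.

```python
def g_coefficients(vc, n_r, n_o):
    """Hypothetical helper: relative (rho^2) and absolute (phi) G coefficients
    for a crossed s x r x o design averaged over n_r raters and n_o observations."""
    rel_err = vc["sr"] / n_r + vc["so"] / n_o + vc["sro,e"] / (n_r * n_o)
    # Absolute error adds the rater, observation and rater x observation terms.
    abs_err = rel_err + vc["r"] / n_r + vc["o"] / n_o + vc["ro"] / (n_r * n_o)
    rho2 = vc["s"] / (vc["s"] + rel_err)
    phi = vc["s"] / (vc["s"] + abs_err)
    return rho2, phi


# Resting pelvic tilt variance components from Table 1.
resting = {"s": 19.956, "r": 1.726, "o": 0.148, "sr": 1.671,
           "so": 0.042, "ro": 0.000, "sro,e": 3.050}

# G-study design (2 raters, 5 observations), the 1 x 1 D-study,
# and two hypothetical alternative designs.
for n_r, n_o in [(2, 5), (1, 1), (1, 3), (2, 1)]:
    rho2, phi = g_coefficients(resting, n_r, n_o)
    print(f"n_r={n_r} n_o={n_o}  rho2={rho2:.3f}  phi={phi:.3f}")
```

Because absolute error always includes the relative error terms, φ can never exceed ρ² for the same design; the two coincide when the rater, observation, and rater × observation components are zero, as for anterior pelvic tilt.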