Reliability and Validity Testing

Definitions
  Validity - the extent to which a test measures what it is designed to measure
  Reliability - the extent to which a test or measure is reproducible

Validity
  Logical (face) - how much the measure obviously involves the performance
  Construct - how well the measure relates to the theory
  Content - how well the outcome evaluates the intervention
  Criterion - how well the test measures against a set standard

Assessment of Validity
  Criterion validity
    Concurrent
    Predictive
    Prescriptive

Bland and Altman
  Bias
  Dispersion of the bias
  Relationship of bias to value

  M = Experimental measured value
  GS = Gold Standard measured value

    M      GS     Diff    Mean (M,GS)
    102    103    -1      102.5
     96     98    -2       97
     98     93     5       95.5
    105    101     4      103

    Mean Diff (Bias)            1.5
    SD                          3.5
    1.96*SD                     6.9
    Bias + 1.96 SDs (ULA)       8.4
    Bias - 1.96 SDs (LLA)      -5.4

[Figure: Prediction versus True VO2max; difference against mean (mls/min/kg). Plotted: Diff, mean diff (bias), mean + 1.96 stdev, mean - 1.96 stdev.]

Bland and Altman Limits of Agreement
  Advantages
    Easy to interpret visually
    Can indicate bias in measurements
    Can be clinically useful
    Useful for validity
  Disadvantages
    Difficult for more than two raters or datasets
    More complex to interpret
    Needs high numbers
    Should also report raw data to interpret variation

Reliability
  A measure CANNOT be valid but NOT reliable
  However, a measure CAN BE reliable but NOT valid

Reliability
  Observed score = True score + Error score
  True score is hard to evaluate, but we can estimate the error score

Sources of Error: The Participants

Sources of Error: The Testing
  Poor directions
  Additional motivation
  Inconsistent protocol

Sources of Error: The Scoring
  The scorers
  Type of scoring system

Sources of Error: The Instrumentation
  Calibration
  Inaccuracies
  Sensitivity

Statistical techniques
  Pearson's r
  ICC
  Limits of agreement
  Cronbach's alpha
  Kappa statistic
  Weighted kappa statistic

Pearson's r
  Weaknesses
    Bi-variate
    Limited to two variables
    Does not consider differences in variance
    Only measures association, not agreement
    Not really appropriate for reliability

Intra-class correlation (ICC)
  Strengths
    Univariate
    Allows for unequal cell numbers
    Value from -1 to +1
    Allows any number of raters or subjects
  Weaknesses
    Has several formulae
    Does not imply usefulness
    Ratios can be difficult to compare
    Between-subject variation should reflect population

Calculation
  Variance between (due to) repeated trials
  Variance between (due to) repeated observers/observations
  Variance from ANOVA model = Mean Squares

Shrout and Fleiss formulae
  Case 1: Each subject is rated by a different set of k raters, randomly selected from a larger population of raters
  Case 2: A random sample of k raters, selected from a larger population of raters, rates each subject
  Case 3: Each subject is rated by k raters who are the only raters of interest
  Cases (1,1), (2,1) & (3,1) are used when the unit of measurement is obtained from only one measurement
  Cases (1,k), (2,k) & (3,k) are used when the unit of measurement is obtained from more than one measurement (i.e. a mean measurement)

How to calculate
  Use equations and values obtained from ANOVAs (Rankin and Stokes, 1998)
  Use macros downloaded from SPSS.com (may not work with all versions of SPSS)

Cronbach's Alpha
  Generalised measure of reliability
  Easy to interpret
  Similar to intraclass correlation

Kappa statistics
  Kappa statistic - nominal data
  Weighted Kappa statistic - ordinal data

Generating ICCs
  Need
    Correct macro
    Data laid out appropriately
    Two lines of syntax to run macros
    All files resident in the same directory

References
  Sim J (1993) Measurement validity in Physical Therapy research. Physical Therapy, 73 (2); 48-55.
  Rankin G, Stokes M (1998) Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clinical Rehabilitation, 12; 187.
  Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, Feb 8; 307-310.
  Kreb DE (1984) Intraclass correlation coefficients: Use and calculation. Physical Therapy, 64 (10); 1581-1582.
  Thomas JR, Nelson JK (2001) Research Methods in Physical Activity, 4th Ed. Human Kinetics, Leeds.
  George K, Batterham A, Sullivan I (2000) Validity in clinical research: a review of basic concepts and definitions. Physical Therapy in Sport, 1; 19-27.

More references
  Eliasziw M, Young SL, Woodbury MG, Fryday-Field K (1994) Statistical methodology for the concurrent assessment of interrater and intrarater reliability: Using goniometric measurements as an example. Physical Therapy, 74 (8); 777-788.
  Keating J, Matyas T (1998) Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy, 44 (1); 5-10.
  Greenfield MLH, Kuhn JE, Wojtys EM (1998) Validity and Reliability. American Journal of Sports Medicine, 26 (3); 483-485.
  Batterham AM, George KP (2000) Reliability in evidence-based clinical practice: a primer for allied health professionals. Physical Therapy in Sport, 1; 54-61.
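
Worked example: Bland and Altman limits of agreement

The bias and limits of agreement in the M / GS slide can be reproduced with a few lines of Python. This is a minimal sketch, not the method used to build the slides. The function name `bland_altman` and the keys of the returned dictionary are illustrative choices, and the sketch assumes the slide's SD is the sample standard deviation (n - 1 denominator) of the differences, which matches the slide's value of 3.5.

```python
import statistics


def bland_altman(measured, gold_standard, z=1.96):
    """Bias and limits of agreement for paired measurements.

    Hypothetical helper for illustration: `measured` is the experimental
    method (M), `gold_standard` is the reference method (GS).
    """
    if len(measured) != len(gold_standard):
        raise ValueError("both methods need the same number of observations")

    # Per-pair difference (M - GS) and per-pair mean, as in the slide table.
    diffs = [m - gs for m, gs in zip(measured, gold_standard)]
    means = [(m + gs) / 2 for m, gs in zip(measured, gold_standard)]

    bias = statistics.mean(diffs)   # mean difference
    sd = statistics.stdev(diffs)    # sample SD (assumed, n - 1 denominator)

    return {
        "diffs": diffs,
        "means": means,
        "bias": bias,
        "sd": sd,
        "upper": bias + z * sd,     # upper limit of agreement (ULA)
        "lower": bias - z * sd,     # lower limit of agreement (LLA)
    }


# The four pairs from the slide.
M = [102, 96, 98, 105]
GS = [103, 98, 93, 101]

result = bland_altman(M, GS)
print(
    f"bias={result['bias']:.1f} sd={result['sd']:.1f} "
    f"upper={result['upper']:.1f} lower={result['lower']:.1f}"
)
# prints: bias=1.5 sd=3.5 upper=8.4 lower=-5.4
```

Plotting `result["diffs"]` against `result["means"]`, with horizontal lines at the bias and at the two limits, gives the difference-against-mean plot shown for the VO2max data.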