Reliability

Reliability and Validity Testing
Definitions
Validity - the extent to which a test measures what it is designed to measure
Reliability - the extent to which a test or measure is reproducible
Validity
Logical (face) - how obviously the measure involves the performance being assessed
Construct - how well the measure relates to the underlying theory
Content - how well the outcome measure evaluates the intervention
Criterion - how well the test agrees with a set (criterion) standard
Assessment of Validity
Criterion validity
 Concurrent
 Predictive
 Prescriptive
Bland and Altman
Bias
Dispersion of the Bias
Relationship of the bias to the size of the measured value
M = Experimental measured value
GS = Gold Standard measured value

M      GS     Diff (M - GS)   Mean (M, GS)
102    103    -1              102.5
96     98     -2              97
98     93     5               95.5
105    101    4               103

Bias (mean difference) = 1.5
SD of the differences (SD Bias) = 3.5
1.96 × SD = 6.9
Upper limit of agreement (ULA) = Bias + 1.96 SD = 8.4
Lower limit of agreement (LLA) = Bias - 1.96 SD = -5.4
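The arithmetic in the table can be reproduced with a short script. This is a minimal sketch using only the Python standard library; the helper name `limits_of_agreement` is hypothetical, and it assumes the SD is the sample standard deviation of the differences (n - 1 denominator), which is what reproduces the 3.5 above. The 1.96 multiplier gives approximate 95% limits on the assumption that the differences are roughly normally distributed.

```python
import statistics

def limits_of_agreement(measured, gold_standard):
    """Bland and Altman bias and 95% limits of agreement (hypothetical helper)."""
    diffs = [m - gs for m, gs in zip(measured, gold_standard)]
    means = [(m + gs) / 2 for m, gs in zip(measured, gold_standard)]
    bias = statistics.mean(diffs)          # mean difference between methods
    sd = statistics.stdev(diffs)           # sample SD of the differences (n - 1)
    return {
        "diffs": diffs,
        "means": means,
        "bias": bias,
        "sd": sd,
        "ula": bias + 1.96 * sd,           # upper limit of agreement
        "lla": bias - 1.96 * sd,           # lower limit of agreement
    }

M = [102, 96, 98, 105]      # experimental method
GS = [103, 98, 93, 101]     # gold standard
result = limits_of_agreement(M, GS)
print(result["diffs"])                                     # [-1, -2, 5, 4]
print(result["means"])                                     # [102.5, 97.0, 95.5, 103.0]
print(round(result["bias"], 1), round(result["sd"], 1))    # 1.5 3.5
print(round(result["ula"], 1), round(result["lla"], 1))    # 8.4 -5.4
```

Plotting `diffs` against `means`, with horizontal lines at the bias and at both limits, gives the usual Bland and Altman plot.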
Figure: Prediction versus true VO2max; difference (Diff) plotted against mean (ml/min/kg), with lines for the bias and for the bias ± 1.96 SD.
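The relationship of the bias to the size of the measured value (the third Bland and Altman check) can be examined by regressing the differences on the means: a slope near zero suggests the bias is roughly constant across the measurement range. This is a minimal sketch using the four pairs from the worked example; the helper name is hypothetical, and with only four pairs the slope is purely illustrative, not a meaningful test.

```python
def proportional_bias_slope(measured, gold_standard):
    """Least-squares slope of (M - GS) on the mean of (M, GS) (hypothetical helper)."""
    diffs = [m - gs for m, gs in zip(measured, gold_standard)]
    means = [(m + gs) / 2 for m, gs in zip(measured, gold_standard)]
    n = len(diffs)
    mean_x = sum(means) / n
    mean_y = sum(diffs) / n
    # Ordinary least-squares slope: Sxy / Sxx
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(means, diffs))
    sxx = sum((x - mean_x) ** 2 for x in means)
    return sxy / sxx

slope = proportional_bias_slope([102, 96, 98, 105], [103, 98, 93, 101])
print(round(slope, 3))   # -0.092
```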
Bland and Altman Limits of Agreement
Advantages
Easy to interpret visually
Can indicate bias in measurements
Can be clinically useful
Useful for validity
Disadvantages
Difficult to use with more than two raters or datasets
More complex to interpret
Needs large numbers of participants
Should also report the raw data to interpret variation
Reliability
A measure CANNOT be valid but NOT reliable
However, a measure CAN BE reliable but NOT valid
Reliability
Observed score = True score + Error score
The true score is hard to evaluate, but we can estimate the error score
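One common way to estimate the error score from a test-retest design is the typical error of measurement (a within-subject standard deviation). If the error in each trial is independent with SD σ_e, the difference between two trials has variance 2σ_e², so σ_e ≈ SD(differences) / √2. This is a minimal sketch with hypothetical trial data; the helper name is illustrative, and it assumes the errors in the two trials are independent and of similar size.

```python
import math
import statistics

def typical_error(trial_1, trial_2):
    """Within-subject SD estimated from two trials: SD(differences) / sqrt(2)."""
    diffs = [b - a for a, b in zip(trial_1, trial_2)]
    # Var(T2 - T1) = 2 * sigma_e**2 when the two error terms are independent
    return statistics.stdev(diffs) / math.sqrt(2)

# Hypothetical test-retest scores for four participants
trial_1 = [10, 12, 14, 16]
trial_2 = [11, 12, 13, 18]
print(round(typical_error(trial_1, trial_2), 3))   # 0.913
```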
Sources of Error
The Participants
The Testing
 Poor directions
 Additional motivation
 Inconsistent protocol
The Scoring
 The scorers
 Type of scoring system
The Instrumentation
 Calibration
 Inaccuracies
 Sensitivity
Statistical techniques
Pearson's r
ICC
Limits of agreement
Cronbach's alpha
Kappa statistic
Weighted kappa statistic
Pearson's r
Weaknesses
Bivariate
Limited to two variables
Does not consider differences in variance
Only measures association, not agreement
Not really appropriate for reliability
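The point that Pearson's r measures association rather than agreement can be shown directly: if one rater always scores a fixed amount higher than the other, r is exactly 1 even though no pair of scores agrees. This is a minimal sketch with hypothetical data; `pearson_r` is written out by hand here (on Python 3.10+ the standard library's `statistics.correlation` computes the same coefficient).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical scores: rater B always scores 10 units higher than rater A
rater_a = [20, 25, 30, 35, 40]
rater_b = [a + 10 for a in rater_a]

r = pearson_r(rater_a, rater_b)
mean_diff = sum(b - a for a, b in zip(rater_a, rater_b)) / len(rater_a)
print(round(r, 6), mean_diff)   # 1.0 10.0
```

Perfect correlation with a 10-unit systematic difference is exactly the situation a reliability statistic should flag, which is why r alone is not really appropriate here.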
Intraclass correlation (ICC)
Strengths
Univariate
Allows for unequal cell numbers
Value from -1 to +1
Allows any number of raters or subjects
Weaknesses
Has several formulae
Does not imply usefulness
Ratios can be difficult to compare
Between-subject variation should reflect the population
Calculation
Variance due to repeated trials
Variance due to repeated observers/observations
Variances are obtained from the ANOVA model as mean squares
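The mean squares behind the ICC come from a subjects × raters (or trials) ANOVA. This is a minimal standard-library sketch, assuming a complete table with one score per subject per rater and no missing cells; the helper name is hypothetical, and the six-subject, four-rater data are the set often used to illustrate the Shrout and Fleiss formulae.

```python
def anova_mean_squares(scores):
    """Mean squares for an n-subjects x k-raters table (one score per cell).

    Returns BMS (between subjects), WMS (within subjects, one-way model),
    JMS (between raters/trials) and EMS (residual, two-way model).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_subjects = k * sum((m - grand) ** 2 for m in row_means)
    ss_raters = n * sum((m - grand) ** 2 for m in col_means)
    ss_within = ss_total - ss_subjects       # everything not due to subjects
    ss_error = ss_within - ss_raters         # residual after removing raters

    return {
        "BMS": ss_subjects / (n - 1),
        "WMS": ss_within / (n * (k - 1)),
        "JMS": ss_raters / (k - 1),
        "EMS": ss_error / ((n - 1) * (k - 1)),
    }

# Rows are subjects, columns are raters
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
ms = anova_mean_squares(scores)
print({name: round(value, 2) for name, value in ms.items()})
# {'BMS': 11.24, 'WMS': 6.26, 'JMS': 32.49, 'EMS': 1.02}
```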
Shrout and Fleiss formulae
Case 1: Each subject is rated by a different set of k raters, randomly selected from a larger population of raters
Case 2: A random sample of k raters, selected from a larger population of raters, rates each subject
Case 3: Each subject is rated by the same k raters, who are the only raters of interest
Cases (1,1), (2,1) and (3,1) are used when the unit of measurement is a single measurement
Cases (1,k), (2,k) and (3,k) are used when the unit of measurement is the mean of k measurements
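Given the four ANOVA mean squares, the six Shrout and Fleiss forms can be written as below, where n is the number of subjects and k the number of raters (or trials). This is a sketch with the mean squares hard-coded, rounded to four decimals, from a six-subject, four-rater example; the function name is hypothetical.

```python
def shrout_fleiss_icc(bms, wms, jms, ems, n, k):
    """The six Shrout and Fleiss ICC forms from ANOVA mean squares.

    bms: between-subjects MS, wms: within-subjects MS (one-way),
    jms: between-raters MS, ems: residual MS (two-way);
    n = number of subjects, k = number of raters (or trials).
    """
    return {
        "ICC(1,1)": (bms - wms) / (bms + (k - 1) * wms),
        "ICC(2,1)": (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n),
        "ICC(3,1)": (bms - ems) / (bms + (k - 1) * ems),
        "ICC(1,k)": (bms - wms) / bms,
        "ICC(2,k)": (bms - ems) / (bms + (jms - ems) / n),
        "ICC(3,k)": (bms - ems) / bms,
    }

# Example mean squares for 6 subjects rated by 4 raters (rounded values)
iccs = shrout_fleiss_icc(bms=11.2417, wms=6.2639, jms=32.4861, ems=1.0194, n=6, k=4)
for form, value in iccs.items():
    print(form, round(value, 2))
# ICC(1,1) 0.17
# ICC(2,1) 0.29
# ICC(3,1) 0.71
# ICC(1,k) 0.44
# ICC(2,k) 0.62
# ICC(3,k) 0.91
```

The Case 1 forms use WMS because, when each subject has a different set of raters, rater variance cannot be separated from residual error. The wide spread of values from the same data shows why the case and form used should always be reported.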
How to calculate
Use equations and values obtained from ANOVAs (Rankin and Stokes, 1998)
Use macros downloaded from SPSS.com (these may not work with all versions of SPSS)
Cronbach's Alpha
Generalised measure of reliability
Easy to interpret
Similar to intraclass correlation
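Cronbach's alpha can be computed from the item (or rater/trial) variances and the variance of the total score: α = k/(k - 1) × (1 - Σ item variances / total-score variance). This is a minimal sketch; the helper name is hypothetical and the six-subject, four-rater table is illustrative data. For a subjects × raters table, alpha is algebraically identical to the average-measures consistency ICC, ICC(3,k), which is one concrete sense in which it is similar to the intraclass correlation.

```python
import statistics

def cronbach_alpha(scores):
    """Cronbach's alpha for a subjects x items (or raters/trials) table."""
    k = len(scores[0])
    # Sample variance of each column (item, rater or trial)
    item_vars = [statistics.variance([row[j] for row in scores]) for j in range(k)]
    # Sample variance of each subject's total score
    total_var = statistics.variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Illustrative data: rows are subjects, columns are raters
scores = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(cronbach_alpha(scores), 3))   # 0.909
```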
Kappa statistics
Kappa statistic
Nominal data
Weighted Kappa statistic
Ordinal data
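Both kappa statistics compare the observed agreement with the agreement expected by chance from each rater's category totals. This minimal sketch writes kappa as 1 minus the ratio of observed to chance-expected disagreement: every disagreement counts fully for Cohen's kappa (nominal data), while the weighted form gives near-misses a smaller penalty (ordinal data). The linear and quadratic weighting schemes are the two common choices; the function name and the ratings are hypothetical.

```python
from collections import Counter

def kappa(rater_a, rater_b, categories, weights=None):
    """Cohen's kappa (weights=None) or weighted kappa ('linear' or 'quadratic').

    categories must be listed in their ordinal order for the weighted forms.
    """
    if weights not in (None, "linear", "quadratic"):
        raise ValueError("weights must be None, 'linear' or 'quadratic'")
    n, c = len(rater_a), len(categories)
    index = {cat: i for i, cat in enumerate(categories)}

    def disagreement(i, j):
        # Penalty when rater A chooses category i and rater B chooses j
        if weights is None:
            return 0.0 if i == j else 1.0
        d = abs(i - j) / (c - 1)
        return d if weights == "linear" else d ** 2

    observed = sum(disagreement(index[a], index[b]) for a, b in zip(rater_a, rater_b)) / n
    totals_a = Counter(index[a] for a in rater_a)
    totals_b = Counter(index[b] for b in rater_b)
    # Chance-expected disagreement from the product of the marginal totals
    expected = sum(
        disagreement(i, j) * totals_a[i] * totals_b[j]
        for i in range(c) for j in range(c)
    ) / n ** 2
    return 1 - observed / expected

# Hypothetical ratings of ten participants on a 3-point ordinal scale
a = [0, 0, 1, 1, 2, 2, 0, 1, 2, 1]
b = [0, 1, 1, 1, 2, 1, 0, 1, 2, 2]
for w in (None, "linear", "quadratic"):
    print(w, round(kappa(a, b, [0, 1, 2], weights=w), 3))
# None 0.538
# linear 0.625
# quadratic 0.727
```

All three disagreements here are one category apart, so the weighted forms credit them as partial agreement and return higher values than unweighted kappa.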
Generating ICCs
Requirements
The correct macro
Data laid out appropriately
Two lines of syntax to run the macros
All files in the same directory
References
Sim J (1993) Measurement validity in physical therapy research. Physical Therapy, 73 (2); 48-55.
Rankin G, Stokes M (1998) Reliability of assessment tools in rehabilitation: an illustration of appropriate statistical analyses. Clinical Rehabilitation, 12; 187.
Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, Feb 8; 307-310.
Krebs DE (1984) Intraclass correlation coefficients: use and calculation. Physical Therapy, 64 (10); 1581-1582.
Thomas JR, Nelson JK (2001) Research Methods in Physical Activity, 4th Ed. Human Kinetics, Leeds.
George K, Batterham A, Sullivan I (2000) Validity in clinical research: a review of basic concepts and definitions. Physical Therapy in Sport, 1; 19-27.
Eliasziw M, Young SL, Woodbury MG, Fryday-Field K (1994) Statistical methodology for the concurrent assessment of interrater and intrarater reliability: using goniometric measurements as an example. Physical Therapy, 74 (8); 777-788.
Keating J, Matyas T (1998) Unreliable inferences from reliable measurements. Australian Journal of Physiotherapy, 44 (1); 5-10.
Greenfield MLH, Kuhn JE, Wojtys EM (1998) Validity and reliability. American Journal of Sports Medicine, 26 (3); 483-485.
Batterham AM, George KP (2000) Reliability in evidence-based clinical practice: a primer for allied health professionals. Physical Therapy in Sport, 1; 54-61.
