
Use of the TCI questionnaire in
General Practice: response rates
and reliability
Evan Kontopantelis
Stephen Campbell
David Reeves
NPCRDC
Team Climate Inventory
• A 65-item measure with six subscales that attempts to quantify the working climate within a practice (Anderson & West, 1994)
• All items on a scale of 1 to 5
• The six subscales (factors) are:
Participation
Task style
Support for innovation
Reviewing processes
Objectives
Working
Data collection
• The questionnaire was distributed to all
clinical, nursing and administration staff
working in a sample of 60 practices in 1998
and 42 of the same practices in 2003
• Response rates varied greatly by practice
        Practices   Avg respondents per practice   Avg response rate per practice
1998    60          9.5                            63.1%
2003    42          12.2                           65.1%
The question
• What level of response is required
to obtain a reliable/accurate TCI
subscore for a practice?
Data structure
• Three levels: items, respondents & practices
• But which exact form does it take?
– When each respondent in a practice evaluates a different set of items (e.g. “I generally prefer to work as part of a team”) → I:R:P
– When each respondent in a practice evaluates the same set of items (e.g. “My team has a lot of team spirit”) → (I×R):P
• Unfortunately the questionnaire is a mixture
of both
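The difference between the two structures can be sketched by simulation; all counts and variance values below are illustrative choices, not figures from the study:

```python
import numpy as np

rng = np.random.default_rng(0)
P, R, K = 60, 12, 8   # illustrative counts: practices, respondents, items

# I:R:P - items nested in respondents: each respondent's item ratings get
# independent item-level noise, so item effects are absorbed into the residual
practice   = rng.normal(0.0, 0.21, size=(P, 1, 1))   # practice effects
respondent = rng.normal(0.0, 0.50, size=(P, R, 1))   # respondent effects
residual   = rng.normal(0.0, 0.83, size=(P, R, K))   # item + random error
y_nested = 3.5 + practice + respondent + residual

# (I x R):P - items crossed with respondents: all respondents rate the same
# K items, so a common per-item effect is shared across every respondent
item = rng.normal(0.0, 0.30, size=(1, 1, K))         # shared item effects
y_crossed = 3.5 + practice + respondent + item + residual
```

In the crossed design the shared item effects do not average away across respondents, which is why the two structures imply different reliability formulas.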
Aggregate-level variables and
reliability
• Methods are based on concepts from generalizability theory
• The universe score is the score that an
object of measurement – e.g. a practice
– would receive on a characteristic – e.g.
participation – if its score was based on
the mean of all relevant predefined
conditions of measurement – e.g. all
possible respondents and questions
(O’Brien 1990)
Defining reliability
• Reliability is defined as the ratio of the
universe score’s variance to the
expected observed score variance:
$$\rho = \frac{\sigma^2_{\text{True}}}{\sigma^2_{\text{Measure}}}$$
• It is an indicator of how different the
observed score would have been, if
another random set of respondents
and/or questions had been selected
Variance components
[Figure: Venn-style diagram after Shavelson & Webb (1991): the variability of the expected observed score partitioned into a true-score component and two error components]

Variability                        Symbol
True score (practices)             $\sigma^2_p$
Error (respondents)                $\sigma^2_{r,pr}$
Error (items + random error)       $\sigma^2_{i,pi,ri,pri,e}$
Estimating Reliability
• For practice j, with nj respondents and k
items and according to the I:R:P design:
(Marsden et al. 2006)
$$\hat\rho_j = \frac{\hat\sigma^2_p}{\hat\sigma^2_p + \dfrac{\hat\sigma^2_{r,pr}}{n_j} + \dfrac{\hat\sigma^2_{i,pi,ri,pri,e}}{n_j k}}$$
• Variance components need to be
calculated
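As an illustration of this formula, a short Python sketch (not part of the original Stata analysis) that plugs in the 3l_r_anova variance components reported later for Reviewing processes 2003; n_j = 12.2 is the 2003 average respondents per practice, while k = 8 items is an assumption, since the slides do not state the subscale's item count:

```python
def reliability(var_p, var_r, var_i, n_j, k):
    """Estimated reliability of practice j's subscale mean (I:R:P design)."""
    error = var_r / n_j + var_i / (n_j * k)
    return var_p / (var_p + error)

# Variance components from the Reviewing processes 2003 results (3l_r_anova);
# n_j = 12.2 is the average 2003 practice size, k = 8 is an assumed item count
rho = reliability(var_p=0.045, var_r=0.247, var_i=0.691, n_j=12.2, k=8)
```

With these inputs the estimate comes out at about 0.62, in line with the reliability values reported for this subscale.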
Accuracy, defined
• Accuracy is the likely amount of error in the observed score relative to the true score
• Using the central limit theorem we estimate a 95% CI for the TCI score and the subscores:

$$CI_{95\%} = \hat\mu \pm 1.96\sqrt{\frac{\hat\sigma^2_{r,pr}}{n_j} + \frac{\hat\sigma^2_{i,pi,ri,pri,e}}{n_j k}}$$

• We defined a score as accurate if its 95% CI fell within [μ̂ − 0.5, μ̂ + 0.5]
• That is, within 0.5 points on the scale of 1 to 5
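The CI half-width, and the smallest practice sample that keeps it within ±0.5, can be sketched in Python using the same illustrative variance components as above (k = 8 items is again an assumption):

```python
import math

def ci_halfwidth(var_r, var_i, n_j, k):
    """95% CI half-width of a practice's subscale mean (I:R:P design)."""
    return 1.96 * math.sqrt(var_r / n_j + var_i / (n_j * k))

def n_for_accuracy(var_r, var_i, k, target=0.5):
    """Smallest n_j giving a 95% CI half-width no larger than `target`."""
    return (1.96 / target) ** 2 * (var_r + var_i / k)

# Illustrative components (Reviewing processes 2003, 3l_r_anova); k = 8 assumed
half = ci_halfwidth(var_r=0.247, var_i=0.691, n_j=12.2, k=8)
n_min = n_for_accuracy(var_r=0.247, var_i=0.691, k=8)
```

With these inputs roughly five respondents per practice already suffice for ±0.5 accuracy.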
Model & estimation parameters
• Only the 3-level random-effects model was described, but more were estimated:
– mixed-effects models in which the items were treated as a fixed factor
– 2-level models that use the aggregate score of the items (R:P)
• Variances are estimated within Stata, using ANOVA and MLGLM (StataCorp, 2005)
• The GLLAMM command is used to estimate the parameters of the multilevel linear models we employed (Rabe-Hesketh et al. 2004)
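The slides do this estimation in Stata; a rough Python analogue of the ANOVA (expected mean squares) estimators for a balanced I:R:P design, applied to simulated data with illustrative counts and variances, might look like:

```python
import numpy as np

# Simulate balanced I:R:P data, then recover the three variance components
# with the classical method-of-moments (expected mean squares) estimators.
rng = np.random.default_rng(1)
P, R, K = 60, 12, 8                      # practices, respondents, items
var_p, var_r, var_e = 0.045, 0.25, 0.69  # true components (illustrative)

y = (3.5
     + rng.normal(0, np.sqrt(var_p), (P, 1, 1))
     + rng.normal(0, np.sqrt(var_r), (P, R, 1))
     + rng.normal(0, np.sqrt(var_e), (P, R, K)))

gm = y.mean()                               # grand mean
pm = y.mean(axis=(1, 2), keepdims=True)     # practice means
rm = y.mean(axis=2, keepdims=True)          # respondent means

# Mean squares for the fully nested (balanced) design
ms_p = R * K * ((pm - gm) ** 2).sum() / (P - 1)
ms_r = K * ((rm - pm) ** 2).sum() / (P * (R - 1))
ms_i = ((y - rm) ** 2).sum() / (P * R * (K - 1))

# Solve the expected mean squares:
#   E[ms_i] = s2_e,  E[ms_r] = s2_e + K*s2_r,  E[ms_p] = E[ms_r] + R*K*s2_p
s2_e = ms_i
s2_r = (ms_r - ms_i) / K
s2_p = (ms_p - ms_r) / (R * K)
```

With 60 practices the between-practice component is the noisiest of the three, which is exactly why pooled and unpooled model variants are compared in the results.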
Results – Reviewing processes, 2003

model               σ̂²_p   σ̂²_{r,pr,e}  σ̂²_{r,pr}  σ̂²_{i,pi,ri,pri,e}  ρ̂_av   n_{0.7}  n^{0.5}_{95%}
2l_anova_reflex     0.045   0.336         –           –                    0.622   17.3     5.2
2l_anovau_reflex    0.045   0.338         –           –                    0.619   17.5     5.2
2l_gllamm_reflex    0.046   0.335         –           –                    0.626   17.0     5.1
3l_f_anova_reflex   0.045   –             0.259       0.594                0.620   17.4     5.1
3l_f_anovau_reflex  0.044   –             0.268       0.546                0.617   17.6     5.2
3l_f_gllamm_reflex  0.046   –             0.253       0.653                0.626   17.0     5.1
3l_r_anova_reflex   0.045   –             0.247       0.691                0.622   17.3     5.1
3l_r_anovau_reflex  0.045   –             0.251       0.682                0.618   17.6     5.2
3l_r_gllamm_reflex  0.046   –             0.248       0.691                0.625   17.0     5.1

(ρ̂_av: average practice reliability; n_{0.7}: respondents needed for reliability of 0.7; n^{0.5}_{95%}: respondents needed for ±0.5 accuracy)
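The derived columns of the results tables can be recovered from the variance components. A minimal sketch, under the assumptions (not stated on the slides) that the average 2003 practice supplied about 12.2 respondents, that the subscale has k = 8 items, and that the two sample-size columns correspond to a reliability target of 0.7 and a ±0.5 accuracy target:

```python
def derived_columns(var_p, var_r, var_i, n_avg=12.2, k=8,
                    rho_target=0.7, ci_target=0.5):
    """Recover the table's derived columns from its variance components
    (illustrative reading; n_avg and k are assumptions, not slide values)."""
    err1 = var_r + var_i / k                     # error variance per respondent
    rho_av = var_p / (var_p + err1 / n_avg)      # reliability at average size
    n_rho = (rho_target / (1 - rho_target)) * err1 / var_p   # n for rho >= 0.7
    n_ci = (1.96 / ci_target) ** 2 * err1        # n for 95% CI within +/- 0.5
    return rho_av, n_rho, n_ci

# Reviewing processes 2003, 3l_r_anova variance components
cols = derived_columns(var_p=0.045, var_r=0.247, var_i=0.691)
```

These inputs reproduce that row's reported values (about 0.62, 17.3 and 5.1), which supports this reading of the columns.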
Results – ANOVA unpooled

                    1998                              2003
model               ρ̂_av   n_{0.7}  n^{0.5}_{95%}    ρ̂_av   n_{0.7}  n^{0.5}_{95%}
2l_anovau_TCI       0.679   10.7     3.3              0.764   8.8      3.1
3l_r_anova_obj      0.464   26.2     7.3              0.565   21.9     4.9
3l_r_anova_part     0.644   12.5     4.4              0.814   6.5      4.5
3l_r_anova_reflex   0.576   16.7     6.3              0.618   17.6     5.2
3l_r_anova_supinv   0.567   17.4     4.9              0.791   7.5      4.0
3l_r_anova_task     0.463   26.3     8.6              0.626   17.0     6.7
3l_r_anova_work     0.616   14.2     2.7              0.650   15.3     3.4
Why are accuracy and reliability scores so different?
• Reliability coefficients are affected by the low variation in practice scores
• Practice mean scores did not vary by more than 2 points on the scale of 1 to 5
• TCI may not be particularly good at detecting practice differences in climate

Descriptives, practice mean scores:

                 1998                2003
          μ    min   max      μ    min   max
part      3.6  2.7   4.5      3.7  2.7   4.6
supinv    3.4  2.5   4.3      3.5  2.6   4.3
reflex    3.4  2.2   4.1      3.4  2.9   4.0
work      3.6  2.8   4.4      3.7  3.1   4.2
obj       3.7  2.7   4.4      3.7  3.0   4.4
task      3.4  2.6   4.4      3.5  2.7   4.1
TCI       3.5  2.8   4.3      3.6  2.9   4.2
Summary
• The TCI is a measure that attempts to quantify the working climate within a practice
• Assuming that the I:R:P structure best describes our data, we calculated:
– the variance components of the design
– reliability & accuracy measures
• Small between-practice variances depress the reliability score but do not affect the accuracy
Future work
• Use of a finite population
correction for practices
• Examine the (I×R):P structure and
compare results to the I:R:P one
References
• Anderson, N. & West, M. A. 1994, Team Climate Inventory: Manual and User's Guide, Windsor: ASE.
• O'Brien, R. M. 1990, "Estimating the reliability of aggregate-level variables based on individual-level characteristics", Sociological Methods and Research, vol. 18, pp. 473-504.
• Shavelson, R. J. & Webb, N. M. 1991, Generalizability Theory: A Primer, Newbury Park; London: Sage Publications.
• Marsden, P. V., Landon, B. E., Wilson, I. B., McInnes, K., Hirschhorn, L. R., Ding, L., & Cleary, P. D. 2006, "The reliability of survey assessments of characteristics of medical clinics", Health Services Research, vol. 41, no. 1, pp. 265-283.
• StataCorp 2005, Stata Statistical Software: Release 9.2, College Station, TX.
• Rabe-Hesketh, S., Skrondal, A., & Pickles, A. 2004, GLLAMM Manual, U.C. Berkeley.