Journal of Public Health Medicine
Vol. 21, No. 3, pp. 255–270
Printed in Great Britain
Short Form 36 (SF-36) Health Survey
questionnaire: which normative data should be
used? Comparisons between the norms
provided by the Omnibus Survey in Britain, the
Health Survey for England and the Oxford
Healthy Life Survey
Ann Bowling, Matthew Bond, Crispin Jenkinson and Donna L. Lamping
Abstract
Background Population norms for the attributes included in
measurement scales are required to provide a standard with
which scores from other study populations can be compared.
This study aimed to obtain population norms for the Short
Form 36 (SF-36) Health Survey Questionnaire, derived from a
random sample of the population in Britain who were
interviewed at home, and to make comparisons with other
commonly used norms.
Methods The method was a face-to-face interview survey of a
random sample of 2056 adults living at home in Britain
(response rate 78 per cent). Comparisons of the SF-36 scores
derived from this sample were made with the Health Survey
for England and the Oxford Healthy Life Survey.
Results Controlling for age and sex, many of the mean scores on
the SF-36 dimensions differed between the three datasets.
The British interview sample had better total means for
Physical Functioning, Social Functioning, Mental Health,
Energy/Vitality, and General Health Perceptions. The Health
(interview) Survey for England had the lowest (worst) total
mean scores for Physical Functioning, Social Functioning,
Role Limitations (physical), Bodily Pain, and Health Perceptions. The postal sample in central England had the lowest
(worst) total mean scores for Role Limitations (emotional),
Mental Health and Energy/Vitality.
Conclusion Responses obtained from interview methods
may suffer more from social desirability bias (resulting in
inflated SF-36 scores) than postal surveys. Differences in SF-36 means between surveys are also likely to reflect question
order and contextual effects of the questionnaires. This
indicates the importance of providing mode-specific population norms for the various methods of questionnaire
administration.
Keywords: SF-36, health status, methodology, questionnaires
Introduction
The SF-36 is a generic measure of health status, which was
derived from the batteries developed for the Rand Medical
Outcomes Study in the USA.1 It can be used to provide a
population-based measure of broader health status, for use in
service planning and monitoring and in making comparisons
with the health of populations elsewhere, and in measuring the
health outcomes of clinical interventions.2
All measurement scales need to define population norms for
the attribute of interest, to provide a standard with which scores
from other study populations can be compared – the
investigator needs to know what scores would be expected
from a comparable group to that under study. Thus normative
data are essential to interpret a scale’s scores for particular study
populations. For example, a judgement of ‘healthiness’ or
‘unhealthiness’ on the basis of scores on a health status scale
needs to relate to some norm (e.g. statistical, as in a group
average). It is essential that norms are valid scores derived from
members of a clearly defined, random sample of the relevant
population, which, in turn, has been taken from a representative
and unbiased sampling frame. This comparison enables
investigators to determine whether scores from the sample
under study are above or below those for the general population.
Both national and regional norms are required. National norms
CHIME/Population Studies and Primary Care, Royal Free and University
College London Medical School, Whittington Campus, London N19 5NF.
Ann Bowling, Professor of Health Services Research
Matthew Bond, Lecturer in Health Services Research
Health Services Research Unit, Department of Public Health, University of
Oxford, Institute of Health Sciences, Old Road, Headington, Oxford OX3 7LF.
Crispin Jenkinson, Deputy Director
Health Services Research Unit, Department of Public Health and Policy,
London School of Hygiene and Tropical Medicine, Keppel Street, London
WC1E 7HT.
Donna L. Lamping, Senior Lecturer in Health Services Research
Address correspondence to Professor A. Bowling.
© Faculty of Public Health Medicine 1999
are essential for making comparisons with national datasets, as
well as for making comparisons at local levels. However,
regional norms are also required for the latter given variations
in health status by geographical area, independently of the
socio-demographic composition of the population.
Most commonly used norms are based on random samples
of patients registered with general practitioners (GPs).
Centralized lists of these patients are commonly used in
research, given that 98 per cent of the British population are
registered with National Health Service (NHS) GPs, although
they do present problems of ‘blanks’ (people who have died or
moved and whose records have not been updated). The
currently used normative data for the SF-36 in the UK were
derived from postal surveys of people registered with GPs in
central England and Sheffield.3–5 The norms from the central
England study (the Oxford Healthy Life Survey), which was restricted
to people aged 18–64, are the most widely used.3,4 The
sampling frame for the Sheffield survey was limited to lists of
patients aged 16–74 registered with just two general practices,
and it under-represented people in social class II and over-represented those in social class III and employed women.5
Given the need for regional norms for making local
comparisons, as well as for norms relating to older people,
regional postal and interview surveys based on the SF-36 have
also been carried out, for example, in Aberdeen, Scotland,6 and
in West Glamorgan in Wales, and Dudley and North
Staffordshire in England.7–10 All of these regional norms
were based on lists of GPs’ patients. The postal survey in
Aberdeen sampled GPs’ patients with pre-defined conditions
who had been referred to out-patients departments, and who
were aged between 16 and 86 years registered with four training
practices, although they did subsequently compare them with a
sample from the electoral register.6 More recently, normative
data for England have been provided by the English Health
Survey for 1996, which was based on personal interviews in
respondents’ homes. The provision of norms from a mixture of
different research designs (e.g. postal and interview surveys) in
different areas creates difficulties when comparing data.11
SF-36 scores have been reported to vary significantly
between samples in different areas, perhaps reflecting the
known variation in health status and mortality by region, even
when controlling for social class (e.g. SF-36 scores have been
reported to be lower (worse) in West Glamorgan than in Oxford
or Aberdeen),8 and higher on most dimensions in East Anglia
and Oxford than in other regions.11 One further limitation of the
commonly used norms in the UK, including the regional norms, is that, as
well as being area specific, they take different younger age
ranges as the starting point for sampling (e.g. ages 16, 18 or 20)
and take different older age cut-off points (e.g. 65, 75, 89), or
they sample only very old age groups (e.g. 70 years and over).
The question these surveys collectively pose is: are the
differences in norms a result of: sample selection bias,
geographical area differences, differences in mode of administration, question order effects or contextual differences (the
nature of the survey and the positioning of the SF-36 scale in
the wider questionnaire can both influence response bias)? This
questioning has led to some investigators carrying out local
surveys with the aim of providing more appropriate norms,7–10
or to investigators even making comparisons with US norms,
given the unavailability of national British norms.12
The SF-36 has been tested for use in the USA, in the UK and in 12
other countries (Australia, the Netherlands,
France, Belgium, Canada, Denmark, Italy, Japan, Norway,
Spain and Sweden, together with a Chinese (HK) version that has been
developed). These studies formed the International Quality of
Life Assessment (IQOLA) project. However, despite evidence
on the validity and reliability of the norms for each country,
published together in a special journal issue,13 these norms
suffer from the same geographical constrictions as the UK
norms. Also, just one of the groups of researchers in this
collection of international papers on the SF-36 questioned the
mode of administration of the instrument.14 Perkins and
Sanson-Fisher14 reported on their Australian study in which
they randomly allocated community sample members to the
postal or telephone interview mode of administration of the SF-36. Overall response, item response and internal consistency
(Cronbach’s alpha) were all higher with the telephone interview mode
(although more people aged 75 and over refused to participate with
this mode). In addition, mean SF-36 scores
for Bodily Pain, Social Functioning, Role Limitations (emotional)
and Mental Health (four of the eight scale dimensions) were
significantly higher for the interview mode of administration than
for the postal mode. Unfortunately, the researchers
did not explore the implications of the latter difference.
Because of the age restrictions of the surveys providing UK
norms, and the need for regional norms for comparison, co-investigators in West Glamorgan, Dudley and North Staffordshire carried out interview surveys of random samples of
people aged 65–89 7,9 or included only people aged 70 and
over.9 They conducted further postal and interview surveys in
West Glamorgan, based on a random sample of people aged
20–89.8,10 Thus, these omitted young adults aged 16–20, and
very elderly people aged 90+.
The scarcity of SF-36 norms for older people has been
experienced in other countries and is not unique to the UK.13
The suitability of the SF-36 for use with this population is still
uncertain. Relatively poor levels of item response with
increasing age in surveys involving self-completion of the
SF-36 5,6,10 have led to questions about its value in older age
groups and criticism of the relevance of the items to older
people.6,16,17 The high total and item non-response to the SF-36
among older people, particularly when cognitive impairment
and physical disability were present, has been confirmed in
recent surveys of older hospital and ambulatory care patients,
which has again led to serious doubts about its utility as a health
status measure for self-administration among older people.18
Given that there is an inverse association between health status and
age, SF-36 scores from samples that lose older respondents in this
way are likely to inflate the ‘healthiness’ of
populations. Moreover, as Lyons et al.19 have pointed out, as
older people are the main users of health services, a health
status measure that is unsuitable for them has limited practical
use.
More recently, a Health Survey for England was designed to
provide annual data about the nation’s health. The 1996 survey
included the SF-36.11 The survey was based on a random
sample of 720 postcode sectors, and a random sample of 12 960
addresses from the postcode file. This provides the largest
dataset for the provision of norms to date, and involved
administration of the SF-36 to adults aged 16+. However,
figures for item-response and non-response to the SF-36 have
not been released and the published data have not been widely
accessed.
Norms based on different modes of questionnaire administration are also required. The SF-36, like many measures of
health status, can be self or interviewer administered. Each
mode of administration has advantages and disadvantages. Self-administered questionnaires are more economical in staff time
and money, and allow respondents to complete the instrument
in their own time. However, problems include bias or lack of
full completion because of respondents’ lack of comprehension,
illness or frailty, the researchers’ lack of control over question
order effects (respondents can read through the questionnaire
before completing it and their responses may therefore be biased),
lack of motivation and variable time periods of completion. The
best quality data are usually derived from face to face
interviews with people where the interviewer can motivate
the respondent, explain questions where appropriate, control
the order in which the questions are asked, maximize response,
and minimize item non-response. However, personal interviews
are expensive, the interviewer can introduce biasing
effects (interviewer bias), and the interview setting may lead to social desirability
bias, particularly in sensitive areas, including physical and
mental health.20–22
Studies comparing the results of questionnaires by mode of
administration have reported inconsistent results. Whereas
some report no or few differences in results,23,24 others have
shown that telephone interviews yield more positive response
patterns than self-administered questionnaires mailed to
respondents.19,21,22 McHorney et al.22 provided national
norms for the SF-36 in the USA for both a postal and telephone
survey approach. They reported that all of the SF-36 dimension
scores were more favourable by about 3–10 points in the
telephone interviews, and the reporting of chronic conditions
was more frequent for postal than telephone interview
respondents. Younger people were more likely to refuse to
participate in the postal mode, thus non-response bias cannot be
ruled out as one explanation for more negative health status
scores in the postal mode. However, this explanation is
unlikely, as the findings of McHorney et al. have been
replicated by Perkins and Sanson-Fisher,14 as described earlier,
and by Lyons et al.19 in a randomized cross-over study of out-patient attenders. Patients were randomly assigned to either an
initial out-patient postal questionnaire followed up by an
interview-based questionnaire while attending out-patients
departments, or an initial interview-based questionnaire while
in out-patients departments, with a postal follow-up questionnaire. Lyons et al. compared SF-36 profiles of the same
respondents’ clinic-based interviews and postal questionnaires
self-completed at home, and reported that seven of the eight SF-36 dimension scores were lower (indicating worse health status)
for self-completion than interview-based formats, with the
largest differences in Role Limitations due to emotional
problems and Social Functioning. The implication of these
studies is that the mode of administration of the SF-36 leads to
systematic differences in health status ratings, with self-completed modes providing more negative profiles. Lyons
et al. pointed to the implications for considerable error in the
interpretation of data from study designs based on baseline
interviews and postal follow-up questionnaires to assess
outcomes of health care interventions. As they also pointed
out, there are problems in interpretation of data where the
differences in scores related to mode of administration are as
large as the effect of the therapies or medical condition under
investigation. They illustrated how the difference in scores by
mode of administration can equate to 20–50 per cent of the
impact of having a condition.7,19
Two further problems in interpretation may be presented by
question order effects (position of SF-36 in wider questionnaire,
before or after other questions on health) and contextual effects
(sensitizing effects of the health status scale being included
within a questionnaire on health status, medical effectiveness,
or in a generic questionnaire). These issues have not been fully
examined for the SF-36, or for other health status questionnaires. One principle of questionnaire design is that general
questions should be placed before specific ones, to minimize
bias from order effects.25 It is possible that if disease-specific
questions are asked before generic health status items, then
respondents’ generic health status ratings would be more
favourable. This is because the disease items would already have been
considered by respondents and therefore excluded from replies to
the generic items. Accordingly, Keller and Ware26 recommended that the SF-36 should be presented to respondents
before more specific health and disease items or scales, and that
there should be a clear break between scales, to remove
potential bias from order effects. This can only be controlled for
in interviewer modes of administration. Consistent with such an order effect, Barry
et al.,27 in their study of benign prostatic hyperplasia, reported
that SF-36 scores were better when disease-specific modules
were administered first, although the differences were not
statistically significant. This issue is pertinent when comparing
results from the SF-36 with normative data. It is necessary to
look, not only at mode of administration, but also the
positioning of the SF-36 within interview-based questionnaires
and also the context of the study. For example, the central
England study3 was based on a Healthy Life Survey, which may
have sensitized respondents to health and lifestyle issues from
the outset. The central England study team correctly placed the
SF-36 at the beginning of the survey questionnaire, but this
would minimize order effects only if respondents did not read
through the (postal) questionnaire before responding. In
contrast, the Office for National Statistics (ONS; formerly
known as the Office of Population Censuses and Surveys)
Omnibus Survey investigators administered the SF-36 to
respondents in the middle of a lengthy interview about a
range of non-health topics (e.g. savings, household characteristics). The Health Survey for England in 199611 involved the
administration of the SF-36 towards the end of a lengthy
interview about negative health (e.g. respiratory and other
specific medical problems). This positioning carried the
potential for bias from order effects.
Just as investigators require appropriate geographical norms
with which to compare their data, researchers undertaking a
personal interview survey cannot necessarily rely on norms
derived from a postal survey and vice versa. In the absence of
appropriate norms, at the very least the comparisons will
require adjustment. This step requires the provision of
information on area, contextual, question order and mode of
administration effects. These effects are examined here. SF-36
norms based on a random sample of adults interviewed at
home in Britain are presented. The availability of total
population norms extends previously published norms based
on regional samples in the UK and national norms for England
only, and provides normative data for adults of all ages in
Britain, including elderly people. Comparisons with the
regional norms from the central England survey – the
Oxford Healthy Life Survey3,4 (the most commonly used
normative dataset) – and results from the Health Survey for
England11 are presented in this paper, with the aim of
furthering debate on the issue of which norms are most
appropriate for use.
Aims and methods
The aims of the analyses presented here were to provide
population norms, and information on internal consistency, for
the Short Form 36 Health Survey Questionnaire (SF-36), based
on a random sample of the population in Britain. A further aim
was to make comparisons between the norms presented and the
commonly used UK regional norms based on the sample from
central England, and the more recent results from the 1996
Health Survey for England, which were described earlier.3,4,11
The study design for the British survey presented here was a
face-to-face interview survey, in respondents’ homes, of a
national random sample of people aged 16 and over in Great
Britain. The survey was carried out in November 1992, and was
commissioned by the King’s Fund Institute in London. The
dataset has recently been released by the Office for National
Statistics to the Data Archive at the University of Essex, with
approved access for authorized, registered users. The survey
was previously unpublished.
The vehicle for the study was the ONS Omnibus Survey in
Great Britain. The sampling frame was the British postcode
address file of ‘small users’, which includes all private
household addresses. The sample was a multi-stage stratified
random sample. The postcode address file was stratified by
region, the proportion of households renting from local
authorities, and the proportion in which the head of the
household is in socio-economic group 1–5 or 13 (i.e. a
professional, employer or manager). One hundred postal sectors
were selected with probability proportional to size. Within each
postal sector, 30 addresses were selected randomly. The
number of sampled addresses was 3000, with the aim of
achieving a target of 2000 completed interviews.
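The two-stage selection described above (100 postal sectors chosen with probability proportional to size, then 30 addresses within each sector) can be sketched as follows. This is an illustration only: the stratification step is omitted, the sector sizes are invented, and the systematic selection method and the name pps_systematic are assumptions rather than the procedure actually used for the Omnibus Survey.

```python
import random


def pps_systematic(sizes, n_select, rng):
    """Pick n_select unit indices with probability proportional to size.

    Systematic PPS: lay the units end to end on a line of length
    sum(sizes) and take equally spaced points from a random start.
    """
    total = sum(sizes)
    step = total / n_select
    start = rng.random() * step  # random start in [0, step)
    targets = [start + i * step for i in range(n_select)]

    chosen = []
    cumulative = 0
    t = 0
    for index, size in enumerate(sizes):
        cumulative += size
        # A unit is chosen once for every target that falls inside it.
        while t < n_select and targets[t] < cumulative:
            chosen.append(index)
            t += 1
    return chosen


rng = random.Random(1992)

# Hypothetical frame: 400 postal sectors with invented address counts.
sector_sizes = [rng.randint(1500, 3500) for _ in range(400)]

# Stage 1: 100 sectors with probability proportional to size.
sectors = pps_systematic(sector_sizes, 100, rng)

# Stage 2: 30 addresses selected at random within each chosen sector.
addresses = {s: rng.sample(range(sector_sizes[s]), 30) for s in sectors}

total_addresses = sum(len(a) for a in addresses.values())
print(len(sectors), total_addresses)
```

Because the same number of addresses is taken in every selected sector, larger sectors are more likely to be chosen but each address within them is less likely to be picked, which keeps address selection probabilities roughly equal overall.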
If an address contained more than one household, the
interviewer was instructed to use a standard procedure to select
just one household randomly. In households with more than one
adult member, just one person aged 16 or over was selected for
interview with the use of a Kish grid. Because only one
household member was interviewed, people in households that
contained few adults had a better chance of selection than those
in households with many. A weighting factor was applied to
correct for this unequal probability. The individual adult, rather
than the household, was the unit of analysis in the results
presented here. Of the 3000 selected addresses, 356 were
ineligible (e.g. non-domestic). At the remaining 2644
addresses, 327 people (12 per cent) refused to take part, 40 (2
per cent) were incapable of interview, and 221 (8 per cent) were
non-contactable. Consequently, 2056 people aged 16 and over
were interviewed in person in their own homes, giving a
response rate of 78 per cent. Details of the design of the central
England Survey and the Health Survey for England 1996, with
which comparisons were made, are briefly described next.
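The response arithmetic and the selection-probability correction described above can be checked with a short sketch. The address counts are those reported in the text. The exact weighting factor is not given, so the weight used here (the number of adults in the household, that is, the inverse of a 1-in-k chance of selection) is an assumption, as are the function names and the two example scores.

```python
# Counts reported in the text.
selected_addresses = 3000
ineligible = 356
refused, incapable, non_contact = 327, 40, 221

eligible = selected_addresses - ineligible
interviewed = eligible - (refused + incapable + non_contact)
response_rate = interviewed / eligible


def design_weight(adults_in_household):
    """Assumed weight: one adult in k is interviewed, so weight by k."""
    return float(adults_in_household)


def weighted_mean(values, weights):
    """Mean of values with each respondent counted by their weight."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical respondents: one from a single-adult household and
# one from a three-adult household.
example_mean = weighted_mean(
    [90.0, 70.0], [design_weight(1), design_weight(3)]
)

print(eligible, interviewed, round(response_rate * 100))
print(example_mean)
```

In the hypothetical example the respondent from the three-adult household stands in for three people, so the weighted mean sits closer to that respondent's score than the unweighted mean of 80 would.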
The sampling frame for the central England study, known as
the Oxford Healthy Life Survey, was computerized registers of
GPs’ patients aged 18–64 in four family health services
authorities (now merged with district health authorities) in
central England: Berkshire, Buckinghamshire, Northamptonshire and Oxfordshire. The study was a postal survey in 1991–1992
of randomly sampled people, of whom 9332 (72 per
cent) responded.3 The central England sample did not aim
to include anyone aged 65 or over. The investigators compared
the characteristics of their sample with 1981 Census data and
1991 population estimates, and reported that their sample
mirrored closely the characteristics of the general population
(for age, sex and social class), although it slightly over-represented those in the higher social classes I, II and IIInm (65
per cent fell into these groups, in comparison with 56 per cent
for the British population in the 1991 Census). They did note
the limitations of their data. Most users make comparisons with
the central England normative data. The investigators of the
latter have developed a user’s handbook,4 and also developed a
scoring algorithm for the two sub-scales that can be derived
from the SF-36 – the SF-36 Physical and Mental Component
Summary Scores.28
The Health Survey for England was designed to provide
annual data about the nation’s health. The 1996 survey included
the SF-36 and focused on respiratory disease.11 The survey was
based on a random sample of 720 postcode sectors, and a
random sample of 12 960 addresses from the postcode file, after
stratification for socio-demographic factors. Within each
household, all persons aged two and over were eligible for
inclusion in the survey. Interviews were obtained with 20 328
people; 16 443 were with those aged 16+ (75 per cent response
rate for these adults), and this group were asked to complete the
SF-36. The characteristics of respondents were broadly similar
in age, sex and social class to those in the 1991 Census,
although the sample slightly under-represented men. Comparison with
mid-1995 population estimates shows that men aged 16–34 were
slightly under-represented. A major advantage of this dataset, over
earlier datasets, is that apart from covering England as a whole,
the sample included sizeable numbers of older people: in the
households that agreed to co-operate with the survey, 928 men
and 1121 women (n = 2049) were aged 65–74, and 573 men
and 903 women (n = 1476) were aged 75+; between 96 and 98
per cent of these groups of older people were interviewed, and
between 91 and 96 per cent of people in these age groups in the
co-operating households completed the SF-36. This sample,
then, represents the largest representative dataset for older
people in England.
Measures
The questions for the Omnibus Survey included the anglicized,
UK version of the SF-36,4 the ONS Omnibus standard socio-demographic items, and items on self-reported illness or injury
that restricted usual activities during the last 2 weeks, long-term
illness that limited daily activities, together with single items on
health service use (in last 2 weeks, respondent talked to a
doctor, attended casualty or A&E department (apart from
antenatal or postnatal visits), or has been an in-patient).
The SF-36 contains 36 items within eight dimensions:
Physical Functioning (ten items); Social Functioning (two
items); Role Limitations due to physical problems (four items);
Role Limitations due to emotional problems (three items);
Mental Health (five items); Energy/Vitality (four items); Pain
(two items) and General Health Perceptions (five items); and an
item on perceived changes in health status in the past 12
months.
The scoring method for the SF-36 involves recoding,
summing and transforming dichotomous (‘yes’ or ‘no’) and
ranked (e.g. ‘none’ to ‘very severe’) response categories for the
eight dimensions using a scoring algorithm, into a scale ranging
from zero (worst possible health state) to 100 (best possible
health state). The results for the eight dimensions have
conventionally been reported as means, rather than frequency
distributions.2,29 This has been done for pragmatic reasons, to
optimize the ability to make easier comparisons of results
across studies, and on the grounds that the treatment of the data
as interval level has minimal effects on most statistical
procedures, although this has been the subject of debate.30,31
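The final step of the scoring method, rescaling a summed dimension score to the 0 to 100 range, can be sketched as a linear transformation. The item-level recoding rules are not reproduced in this paper, so the sketch assumes the items have already been recoded so that higher values mean better health. The function name and the example coding of ten Physical Functioning items on a 1 to 3 scale are assumptions for illustration.

```python
def transform_to_0_100(raw_score, lowest_raw, highest_raw):
    """Rescale a summed raw score linearly so that the lowest possible
    raw score maps to 0 and the highest possible raw score maps to 100."""
    return 100.0 * (raw_score - lowest_raw) / (highest_raw - lowest_raw)


# Hypothetical respondent: ten Physical Functioning items, each assumed
# to be recoded 1 (worst) to 3 (best), so raw scores run from 10 to 30.
pf_items = [3, 3, 2, 3, 3, 1, 3, 3, 2, 3]
pf_raw = sum(pf_items)
pf_score = transform_to_0_100(pf_raw, lowest_raw=10, highest_raw=30)

print(pf_raw, pf_score)
```

The same function applies to any of the eight dimensions once the lowest and highest possible raw scores for that dimension are supplied.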
Two summary scores – Physical and Mental Health Component
Summary Scores – can also be calculated, in each case using a
formula that involves multiplying each SF-36 scale z-score by
its respective factor score coefficient.4,28,29
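A minimal sketch of the component summary calculation follows. The text states only that each scale z-score is multiplied by its factor score coefficient; summing the products into a single value is assumed here, and any further rescaling of the result is omitted. All normative means, standard deviations and coefficients below are placeholders chosen to make the arithmetic easy to follow, not published values.

```python
DIMENSIONS = ["PF", "RP", "BP", "GH", "VT", "SF", "RE", "MH"]

# Placeholder norms and coefficients, NOT published values.
norm_means = {d: 80.0 for d in DIMENSIONS}
norm_sds = {d: 20.0 for d in DIMENSIONS}
coefficients = {
    "PF": 0.4, "RP": 0.3, "BP": 0.3, "GH": 0.2,
    "VT": 0.0, "SF": 0.0, "RE": -0.2, "MH": -0.2,
}


def z_score(score, norm_mean, norm_sd):
    """Standardize a 0-100 dimension score against the chosen norm."""
    return (score - norm_mean) / norm_sd


def component_summary(scale_scores):
    """Sum of each dimension's z-score times its coefficient (assumed)."""
    return sum(
        z_score(scale_scores[d], norm_means[d], norm_sds[d]) * coefficients[d]
        for d in DIMENSIONS
    )


# Hypothetical respondent: at the norm on every dimension except
# Physical Functioning, which is one standard deviation above it.
scores = {d: 80.0 for d in DIMENSIONS}
scores["PF"] = 100.0
summary = component_summary(scores)

print(round(summary, 3))
```

The sketch also shows why the choice of norm matters for summary scores: the z-scores, and therefore the summary value, change whenever a different normative mean or standard deviation is substituted.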
The ONS version of the SF-36 contained one difference. In the
question in the Mental Health dimension asking ‘How much
time during the past month ...’ ‘Have you felt calm and
peaceful?’, the word ‘peaceful’ was changed to ‘cheerful’.
Given that question wording can affect response, it is possible
that this change led to some differences in response between this
item and the original item included in other surveys, and results for this item should be interpreted with caution.
In the tables presented, the authors have focused on the
magnitude of the score differences, rather than statistical
significance, given that very small differences are likely to be
statistically significant with such large sample sizes.
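The point about large samples can be illustrated with a rough two-sample calculation. The sample sizes are those of the Omnibus Survey (2056) and the central England survey (9332) given in the text. The 2-point difference and the common standard deviation of 20 are hypothetical values chosen for illustration, and the simple z statistic ignores weighting and design effects.

```python
import math


def two_sample_z(mean_difference, common_sd, n1, n2):
    """z statistic for a difference in means, assuming a common SD
    and simple random sampling in both groups."""
    standard_error = common_sd * math.sqrt(1.0 / n1 + 1.0 / n2)
    return mean_difference / standard_error


# Hypothetical: a 2-point difference on a 0-100 scale with SD 20.
z = two_sample_z(2.0, 20.0, n1=2056, n2=9332)

print(round(z, 2))
```

Under these assumptions a difference of only 2 points gives a z statistic of about 4, well beyond the conventional 1.96 threshold, even though a difference of that size may have little practical meaning.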
Results
Table 1 shows the socio-demographic characteristics of the
1992 Omnibus Survey sample, which are comparable with the
characteristics of the adult sample for the 1992 General
Household Survey (GHS) in Britain.32 Checks which the
ONS makes on non-response bias for the GHS indicate that this
is small.33 The characteristics of respondents are similar across
each data source, except for reporting of a long-term health
problem, which can probably be explained by differences in
question wording. The characteristics of the respondents also
compare well with 1991 Census figures (mid-term 1992
population estimates) for Britain (case estimates based on a
10 per cent random sample).
Table 2 confirms the results of the earlier research in the UK
and USA2,3,5,6,22 showing good internal consistency of the SF-36 dimension scores, with Cronbach’s alphas exceeding 0.80
for each dimension except Social Functioning and Health
Perceptions. The table presents the Cronbach alpha coefficients
for the sample compared with those reported from five
other sources. Results were similar across all six samples.
Kolmogorov–Smirnov tests were also highly significant for
each of the eight domains, and are shown in Table 2. These
results confirm the highly skewed nature of the distributions
(see Fig. 1), which is a problematic feature of all health status
scales.
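For readers unfamiliar with the internal consistency statistic reported in Table 2, Cronbach's alpha for one dimension can be computed from item-level responses as sketched below. The five respondents and three items are invented for illustration and are not survey data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of
    k item scores: k/(k-1) * (1 - sum(item variances)/variance(total))."""
    k = len(responses[0])
    item_variances = [
        variance([person[i] for person in responses]) for i in range(k)
    ]
    total_variance = variance([sum(person) for person in responses])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Invented responses: five respondents, three items in one dimension.
responses = [
    [3, 4, 3],
    [5, 5, 4],
    [1, 2, 2],
    [4, 4, 5],
    [2, 3, 2],
]
alpha = cronbach_alpha(responses)

print(round(alpha, 3))
```

Alpha rises towards 1 as the items vary together, which is why a value above 0.80 is read as good internal consistency for a dimension.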
Table 3 shows the means and standard deviations for the eight
SF-36 dimensions for the Omnibus sample by their age, sex,
social class, limitations on activities, long-term health problems
and health service use. Item response was high for all SF-36
dimensions, as would be expected in an interview-based survey.
The eight dimension scores were able to distinguish between
those in the highest and lowest social classes; those in younger
and older age groups; males and females; those with and
without a longstanding illness; and users and non-users of
health services; thus supporting the construct validity of the SF-36. These results support earlier research.2,4,5

Table 1 Socio-demographic characteristics of sample*

                                  1992 Omnibus sample       1992 GHS     1991 Census mid-term
                                                            sample       population estimates 1992
                                  Adults %   Adults (no.)   Adults %     Adults %
Sex
  Male                            45         (929)          47           49
  Female                          55         (1122)         53           51
Age
  16<25                           10         (204)          14           16
  25<45                           36         (735)          36           36
  45<55                           15         (302)          16           15
  55<65                           15         (297)          13           15
  65<75                           13         (281)          12           11
  75+                             11         (217)          8            9
Household size
  1                               25         (513)          26           26
  2                               38         (770)          33           34
  3–11                            37         (773)          40           40
Ethnic group
  White                           97         (1981)         95           95
  Black and other                 3          (68)           5            5
Region of residence
  The North and North West        25         (520)          26           26
  Midlands and East Anglia        20         (416)          20           20
  Greater London                  12         (236)          11           12
  South East                      20         (407)          19           19
  South West & Wales              14         (288)          14           14
  Scotland                        9          (189)          9            9
Social class†
  I professional                  4          (73)           4            5
  II intermediate                 24         (483)          24           28
  IIInm skilled non-manual        24         (476)          25           23
  IIIm skilled manual             23         (448)          21           22
  IV semi-skilled                 17         (341)          17           16
  V unskilled                     8          (151)          8            6
  (I–IIInm combined)              52                        53
Housing tenure
  Owner occupier                  26         (528)          25           –
  Owner–mortgage                  44         (902)          42           –
  Rents local/housing authority   22         (452)          25           –
  Rents privately                 8          (167)          7            –
Marital status
  Married/cohabiting              59         (1124)         65           –
  Single                          18         (368)          20           –
  Widowed                         13         (272)          9            –
  Divorced/separated              9          (191)          6            –
Age completed full-time education
  <14                             23         (463)          13           –
  15<19                           64         (1319)         69           –
  19+                             13         (274)          18           –
Economic status
  Working full time               37         (747)          41           –
  Working part time               13         (254)          15           –
  Unemployed                      5          (94)           6            –
  Inactive                        45         (900)          37           –
Contact with GP in last 2 weeks
  Yes                             20         (420)          16           –
  No                              80         (1629)         84           –
Out-patient in last 3 months
  Yes                             17         (355)          15           –
  No                              83         (1694)         85           –
In-patient in last year
  Yes                             13         (257)          10           –
  No                              87         (1792)         90           –
Long-term health problem‡
  Yes                             22         (452)          37           –
  No                              78         (1594)         63           –
Acute health problem
  Yes                             16         (320)          13           –
  No                              84         (172)          87           –
Number of respondents*            1972–2056                 11 385–19 274  45 110 470

*Totals do not all equal 100% as a result of weighting.
†Social class in Census was derived from a question on paid job in last 10 years; social class in Omnibus and GHS was derived on ‘current main job’ and ‘last main job’.
‡GHS: ‘Longstanding illness, disability or infirmity’.

Table 4 compares the Omnibus (interview) Survey SF-36
dimension scores by sex and age group with those from the
central England postal survey (the normative dataset most often
used for comparisons in the UK) and with the Health
(interview) Survey for England.
The total scores for the three samples show that the postal
central England survey had the lowest (worst) mean scores for
Role Limitations (emotional), Mental Health and Energy/
Vitality.
The Health (interview) Survey for England had the lowest
(worst) total mean scores for Physical Functioning, Social
Functioning, Role Limitations (physical), Bodily Pain and
Health Perceptions.
Finally, the Omnibus Survey had the highest mean scores for
Physical Functioning, Social Functioning, Mental Health,
Energy/Vitality and General Health Perceptions. The Health
Survey for England had lower (worse) total mean scores than
both of the other surveys on five of the eight dimensions.
Tri-variate analyses within age groups showed that, for most
age groups, the SF-36 scores for the three dimensions of
Physical Functioning, Bodily Pain and General Health Perceptions were lowest (worst health) for the Health (interview)
Survey for England, with the central England postal survey in
the middle, and highest (best health) in the Omnibus (interview)
Survey. Patterns were in this direction but less consistent with
increasing age for Social Functioning. The central England
postal survey scores were lower (worse health) than the scores
for the two interview surveys for younger age groups in relation to
Energy/Vitality, Role Limitations (emotional) and Mental
Health. Differences were less evident or consistent for Role
Limitations (physical).
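
The within-age-group comparison described above can be illustrated with a minimal sketch. It uses only the male Physical Functioning means printed in Table 4 for ages 16–64 (the ages covered by all three surveys) and counts how often the surveys fall in the order reported in the text. The function and variable names are assumptions made for this illustration; this is not the authors' analysis, and it ignores sampling error.

```python
# Age-specific mean Physical Functioning scores for males, copied from Table 4.
# Only ages 16-64 are used because the central England survey covered ages 18-64.
AGE_GROUPS = ["16-24", "25-34", "35-44", "45-54", "55-64"]
PF_MALES = {
    "HSE": [92, 94, 91, 87, 76],               # Health Survey for England 1996
    "Oxford": [92.8, 93.9, 91.9, 87.9, 80.0],  # central England postal survey
    "ONS": [94.8, 96.3, 93.5, 89.7, 79.0],     # Omnibus (interview) Survey 1992
}


def rank_surveys(means_by_survey, index):
    """Return survey names ordered from lowest to highest mean for one age group."""
    return sorted(means_by_survey, key=lambda name: means_by_survey[name][index])


def count_pattern(means_by_survey, expected_order):
    """Count age groups whose lowest-to-highest ordering equals expected_order."""
    n_groups = len(next(iter(means_by_survey.values())))
    return sum(
        rank_surveys(means_by_survey, i) == expected_order for i in range(n_groups)
    )


orderings = {age: rank_surveys(PF_MALES, i) for i, age in enumerate(AGE_GROUPS)}
matches = count_pattern(PF_MALES, ["HSE", "Oxford", "ONS"])

for age, order in orderings.items():
    print(age, " < ".join(order))
print("Age groups with HSE lowest and ONS highest:", matches, "of", len(AGE_GROUPS))
```

For this one dimension and sex, the ordering holds in most but not all age groups, which is in line with the qualified wording ("for most age groups") used above.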
Multiple linear regression analyses showed that the slight
social class differences between the samples did not account for
the variance in the SF-36 dimension scores between the studies;
nor did any variation in the age and sex distributions in the
samples (tables available from the authors).
Discussion
Regional norms are useful for comparing with local datasets,
given that health status varies by area of residence. National
norms are also required for comparisons with national datasets
and, as a yardstick, with local datasets. The ONS Omnibus
Survey data provided the opportunity to calculate norms for the
SF-36 based on a random sample of the population in Britain,
and to make comparisons with existing norms for the UK
derived from the regional central England survey and the
Health Survey for England 1996.
The data show that, controlling for age and sex, many of the
SF-36 dimension means differed between the three datasets,
K–S Z = 22.10 p < 0.001 2036
K–S Z = 6.15 p < 0.001 2040
0.89
0.81
0.82
0.84
K–S Z = 5.59 p < 0.001 2019
K–S Z = 5.13 p < 0.001 2040
K–S Z = 19.89 p < 0.0001 2037
0.82
0.81
0.73
0.81
0.86
0.68
K–S Z = 16.45 p < 0.001 2035
K–S Z = 11.71 p < 0.001 2041
0.92
0.84
0.89
0.90
2047
K–S Z = 11.75 p < 0.01
0.93
0.93
SF-36 Manual2
Skewness
No. of Garratt et al.
McHorney et al. Brazier et al.
Jenkinson et al. Table 8.2
ONS (OS) 1992 Kolmogorov–Smirnov
Items (Cronbach’s α) (Cronbach’s α)* (Cronbach’s α) (Cronbach’s α) (Cronbach’s α) (Cronbach’s α) ONS (OS) 1992
0.80
0.83
0.80
0.85
0.76
0.88
0.82
0.90
3
0.96
0.95
0.95
0.96
0.73
0.96
0.85
0.93
5
0.81/0.85
0.82/0.84
0.82/0.76
0.87/0.84
0.63/0.78
0.89/0.89
0.88/0.89
0.93/0.92
22
0.89
0.86
0.83
0.86
0.80
0.86
0.86
4
2
5
4
2
3
5
* Postal/interview survey.
0.92
10
Physical Functioning
Role Limitations
(phys)
Bodily Pain
General Health
Perceptions
Energy/Vitality
Social Functioning
Role Limitations
(emotional)
Mental Health
Dimension
6
Table 2 Internal consistency and skewness of SF-36 and comparison of Cronbach’s alpha coefficients with other studies
No. of
respondents
ONS (OS) 1992
with the Omnibus Survey sample having better total health
status means for Physical Functioning, Social Functioning,
Mental Health, Energy/Vitality and General Health Perceptions. The Health (interview) Survey for England had the lowest
(worst) total mean scores for Physical Functioning, Social
Functioning, Role Limitations (physical), Bodily Pain and
Health Perceptions. The postal central England survey had the
lowest (worst) total mean scores for Role Limitations (emotional), Mental Health, and Energy/Vitality.
Analyses by age showed that the differences were greatest
for younger age groups. The dimensions concerned may be more
sensitive areas for younger people, and thus interviews may
suffer more from social desirability bias (resulting in
inflated SF-36 scores) than postal surveys.
Although these observations are based on three different
samples, with potential for the effects of selection and response
bias, regression analyses suggested that this explanation is
unlikely to account substantially for the observed differences.
One possible explanation for the differences in SF-36 dimension
means between these surveys is the different methods of
questionnaire administration used. The central England survey
was based on a postal survey, whereas the ONS Omnibus Survey and
the Health Survey for England were based on personal
interviews. It is possible that the more anonymous postal approach,
free from contamination by interviewer effects, may have led to
higher (and more accurate) reporting of morbidity, particularly
in the more sensitive areas of mental and emotional health. This
is consistent with the literature showing that under-reporting of health
problems (perceived as undesirable characteristics) is more
likely in interview situations than with self-administered
questionnaires.34 It is also consistent with the findings of Perkins
and Sanson-Fisher,14 Lyons et al.19 and McHorney et al.22 that
interview surveys using the SF-36 (face-to-face and telephone)
lead to under-reporting of problems in comparison with postal
approaches, particularly for Mental Health and Role Limitations
(emotional).
Thus it appears that social desirability bias during the
Omnibus interviews may partly explain the better reported
health status of the Omnibus Survey interview sample in
comparison with the central England survey postal sample on
most SF-36 dimensions. However, this explanation does not
account for the even poorer reported health status of the
respondents interviewed for the Health Survey for England
1996, in comparison with both the postal survey and the other interview
survey. This sample had the lowest (worst) total means for
Physical Functioning, Social Functioning, Role Limitations
(physical), Bodily Pain and Health Perceptions. The explanation for this inconsistency is likely to be found in both
contextual and question order effects. The Omnibus Survey was
unlikely to sensitize respondents to health at the outset of the
questionnaire because the context of the survey was generic.
Thus it would be expected that self-reported health status would
be better than that in the central England study, which placed
the SF-36 in the context of a healthy life survey, and the Health
Survey for England, which focused on disease (respiratory
conditions), thereby sensitizing respondents to health issues
from the outset. This explanation is supported by the
observation that in the Health Survey for England, the
number of respondents who reported a longstanding illness
was 3–4 per cent higher than in the annual GHSs in Britain,
conducted by the ONS.32,33 Investigators at the ONS have
reported that more positive responses to this item are obtained
when asked in the context of health surveys, in comparison with
the GHSs, supporting a contextual bias; this is also supported by
data from their Omnibus Surveys.35,36 This explanation is
supported by the methodological literature,20 and is likely to
apply to the differences in SF-36 scores.

Figure 1 Histograms of SF-36 dimensions with normal plot.

Question order effects (position of SF-36 in wider
questionnaire; e.g. before or after other questions on health)
may also account for some of the observed differences. It was
pointed out in the Introduction that the principles of
questionnaire design recommend asking general questions
before specific ones, to minimize bias from order effects. If
disease-specific questions are asked before a generic health
status scale, then the generic health status ratings are likely to
be more favourable because the disease items had already been
considered by respondents and therefore excluded in replies to
the generic items. The central England study correctly placed
the SF-36 at the beginning of the survey questionnaire, but this
would minimize order effects only if respondents did not read
through the (postal) questionnaire before responding. The ONS
Omnibus Survey administered the SF-36 to respondents in the
middle of a lengthy interview about a range of non-health topics
(e.g. savings, household characteristics). In contrast, the Health
Survey for England in 1996 administered the SF-36 towards the
end of a lengthy interview about respiratory and other specific
problems with health. This positioning had the potential for
creating bias from order effects but does not explain the poorer
health status scores obtained in this survey.

Table 3 Mean (SD) scores for the SF-36 dimensions by age and sex and social class and health variables

                Physical              Role Limitations      Bodily Pain           General Health
                Functioning           (Physical)                                  Perceptions
                Mean (SD) (n)         Mean (SD) (n)         Mean (SD) (n)         Mean (SD) (n)
Age
  16–24         95.5 (12.1) (204)     90.7 (24.5) (202)     86.0 (22.6) (203)     78.6 (17.7) (204)
  25–34         94.5 (13.5) (415)     87.5 (29.8) (414)     86.7 (22.6) (414)     79.4 (17.9) (416)
  35–44         93.3 (13.4) (319)     88.2 (28.0) (318)     84.9 (21.8) (318)     75.5 (19.5) (319)
  45–54         87.2 (20.9) (297)     83.5 (33.8) (296)     80.7 (25.5) (298)     71.3 (23.4) (297)
  55–64         78.0 (26.3) (297)     72.7 (40.6) (295)     74.4 (28.6) (296)     65.3 (26.5) (295)
  65–74         72.7 (26.7) (281)     72.6 (39.5) (279)     75.5 (27.2) (279)     65.0 (24.1) (279)
  75–84         57.9 (28.6) (296)     62.2 (42.1) (176)     70.3 (29.5) (178)     58.7 (24.5) (175)
  85+           39.3 (31.5) (36)      65.0 (42.5) (35)      65.7 (30.5) (35)      60.3 (22.2) (34)
Sex
  Male          86.3 (22.5) (925)     82.7 (33.7) (921)     83.2 (24.4) (922)     71.9 (23.0) (921)
  Female        81.8 (25.7) (1117)    79.0 (36.5) (1109)    78.1 (26.8) (1114)    71.3 (22.8) (1113)
Social class
  I Prof        89.9 (18.0) (73)      83.5 (33.3) (71)      83.7 (21.0) (73)      76.7 (18.9) (72)
  II Semi-prof  85.5 (19.2) (481)     86.4 (30.1) (480)     85.2 (22.1) (481)     75.5 (19.4) (479)
  IIInm         85.8 (22.6) (474)     82.2 (33.8) (473)     80.4 (24.5) (474)     72.9 (22.0) (473)
  IIIm          82.0 (26.0) (447)     77.1 (38.2) (443)     79.0 (27.2) (444)     69.7 (23.9) (446)
  IV            79.9 (27.3) (339)     77.2 (36.8) (337)     75.5 (28.7) (337)     68.4 (25.2) (338)
  V             75.7 (29.5) (151)     75.7 (40.1) (149)     75.6 (28.7) (151)     65.6 (25.2) (150)
Cut down activities because of illness (in last 2 weeks)
  Yes           63.0 (33.4) (319)     27.7 (37.4) (316)     49.8 (29.8) (318)     51.5 (27.5) (316)
  No            87.7 (20.2) (1722)    90.4 (24.7) (1713)    86.1 (20.6) (1717)    75.3 (19.9) (1717)
Long-term health problem
  Yes           52.3 (28.9) (449)     44.1 (42.5) (444)     56.0 (30.2) (447)     44.8 (23.5) (444)
  No            92.7 (13.1) (1590)    90.9 (24.7) (1583)    87.4 (19.6) (1586)    79.0 (16.1) (1587)
Attended A&E or hospital out-patient (in past 3 months)
  Yes           71.5 (30.8) (353)     60.4 (43.0) (353)     65.3 (30.9) (353)     60.4 (27.0) (351)
  No            86.4 (22.1) (1689)    84.9 (31.9) (1677)    83.6 (23.5) (1683)    73.9 (21.2) (1683)
Been hospital in-patient (in past 12 months)
  Yes           70.2 (31.3) (256)     58.2 (44.0) (255)     67.6 (31.8) (256)     59.7 (27.4) (256)
  No            85.8 (22.7) (1786)    83.9 (32.7) (1775)    82.3 (24.4) (1780)    73.2 (21.7) (1778)

In conclusion, it is necessary to assess not only mode of
administration, but also the positioning of the SF-36, as with
any health status scale, within interview-based questionnaires
and also the context of the study. These results indicate the
importance of providing population norms for the various
modes of questionnaire administration, and also taking
account of contextual and question order effects. As
McHorney et al.22 concluded, the varying norms for the
different modes of questionnaire administration should not be
regarded as any more or less accurate or valid; they are
simply different, and should be regarded as relative rather
than absolute data. The provision of different norms for
different modes of administration for scales is essential so
that investigators can make comparisons with appropriate
norms and not wrongly interpret their data. Where
appropriate norms are not available, awareness of this
problem is required so that researchers may consider whether
to make adjustments when making comparisons with existing
norms.

Table 3 contd

                Energy/Vitality       Social Functioning    Role Limitations      Mental Health
                                                            (emotional)
                Mean (SD) (n)         Mean (SD) (n)         Mean (SD) (n)         Mean (SD) (n)
Age
  16–24         68.3 (19.0) (203)     91.5 (17.2) (203)     91.1 (24.6) (203)     79.6 (15.2) (203)
  25–34         66.8 (18.4) (415)     91.1 (16.9) (413)     90.3 (26.2) (414)     77.2 (16.2) (415)
  35–44         64.9 (18.2) (319)     90.5 (19.4) (317)     88.2 (28.7) (317)     76.0 (18.3) (319)
  45–54         63.0 (23.1) (296)     87.7 (22.8) (297)     88.4 (28.6) (297)     75.6 (20.2) (297)
  55–64         61.8 (24.6) (297)     84.7 (25.6) (297)     81.8 (35.8) (296)     76.0 (20.2) (297)
  65–74         63.0 (24.1) (279)     85.8 (25.6) (279)     88.1 (29.8) (280)     80.0 (18.3) (279)
  75–84         54.2 (24.9) (176)     75.9 (29.1) (176)     82.8 (34.7) (174)     75.6 (19.5) (176)
  85+           52.1 (20.8) (35)      73.7 (24.7) (35)      87.6 (32.4) (35)      73.9 (17.8) (34)
Sex
  Male          67.2 (21.7) (923)     88.8 (21.4) (920)     90.4 (26.6) (912)     79.5 (17.7) (920)
  Female        60.6 (21.6) (1112)    86.2 (23.4) (1112)    85.3 (32.0) (1110)    75.1 (18.6) (1115)
Social class
  I Prof        69.0 (18.5) (73)      90.7 (17.8) (73)      93.4 (17.5) (73)      81.2 (15.3) (73)
  II Semi-prof  65.9 (19.3) (480)     91.4 (18.3) (479)     89.7 (26.2) (481)     79.0 (16.7) (481)
  IIInm         62.8 (19.7) (474)     88.0 (21.1) (473)     88.7 (28.2) (472)     76.6 (16.6) (475)
  IIIm          64.3 (23.1) (445)     86.2 (23.0) (445)     88.2 (29.5) (443)     78.7 (18.6) (444)
  IV            60.5 (24.8) (338)     84.3 (26.0) (337)     83.1 (34.3) (338)     74.1 (20.7) (338)
  V             60.7 (25.2) (149)     81.9 (27.7) (149)     82.8 (36.5) (149)     73.8 (21.1) (149)
Cut down activities because of illness (in last 2 weeks)
  Yes           44.0 (24.8) (315)     60.8 (30.9) (316)     66.4 (45.1) (316)     66.3 (21.8) (316)
  No            67.1 (19.3) (1719)    92.3 (16.6) (1715)    91.6 (23.9) (1714)    79.1 (16.9) (1718)
Long-term health problem
  Yes           44.1 (23.9) (446)     65.3 (30.5) (445)     71.6 (41.7) (444)     67.5 (22.1) (446)
  No            69.0 (17.9) (1586)    93.5 (14.9) (1584)    92.1 (23.5) (1584)    79.8 (16.1) (1586)
Attended A&E or hospital out-patient (in past 3 months)
  Yes           54.2 (25.0) (351)     74.7 (30.2) (352)     79.5 (37.7) (352)     72.5 (20.3) (351)
  No            65.5 (20.7) (1684)    90.0 (19.7) (1680)    89.3 (27.5) (1679)    78.1 (77.7) (1684)
Been hospital in-patient (in past 12 months)
  Yes           50.9 (26.1) (254)     72.1 (31.9) (255)     79.9 (37.2) (255)     71.3 (21.5) (255)
  No            65.4 (20.6) (1781)    89.5 (20.1) (1777)    88.7 (28.4) (1777)    77.9 (17.7) (1780)

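
As an illustration of comparing a study sample with a published norm, the sketch below computes the difference between two means, its standard error and an approximate 95 per cent confidence interval from summary statistics alone. The norm is the ONS Omnibus Survey Physical Functioning score for males from Table 3 (mean 86.3, SD 22.5, n = 925); the local sample (mean 82.0, SD 24.0, n = 200) is entirely hypothetical, as is the function name `compare_means`. The calculation is a large-sample normal approximation that ignores survey weighting and the skewness of SF-36 scores, so it is a rough guide rather than the method used in this paper.

```python
import math


def compare_means(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z_crit=1.96):
    """Compare two independent sample means using only summary statistics.

    Returns the difference (a minus b), its standard error, the z statistic
    and an approximate confidence interval (95 per cent when z_crit is 1.96).
    """
    diff = mean_a - mean_b
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)  # unequal-variance SE
    z = diff / se
    ci = (diff - z_crit * se, diff + z_crit * se)
    return diff, se, z, ci


# Norm: ONS Omnibus Survey, males, Physical Functioning (Table 3).
NORM_MEAN, NORM_SD, NORM_N = 86.3, 22.5, 925
# HYPOTHETICAL local sample of men; these values are invented for illustration.
LOCAL_MEAN, LOCAL_SD, LOCAL_N = 82.0, 24.0, 200

diff, se, z, ci = compare_means(
    LOCAL_MEAN, LOCAL_SD, LOCAL_N, NORM_MEAN, NORM_SD, NORM_N
)
print(f"difference = {diff:.1f}, SE = {se:.2f}, z = {z:.2f}")
print(f"approximate 95% CI: {ci[0]:.1f} to {ci[1]:.1f}")
```

The same hypothetical sample would compare differently against a postal norm, which is why the mode of administration of the chosen norm should match that of the study where possible.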
Acknowledgements
We are grateful to members of the King’s Fund Institute for
commissioning the study, and to the Office for National
Statistics for conducting it and depositing it on the Data
Archive, as well as to the staff of the Data Archive for
granting us access as authorized users. We are particularly
grateful to Fiona Dawe of the ONS and Cathy Cooper at the
Data Archive for their help in accessing the dataset, Lee
Marriott for typing the tables and Dr Ronan Lyons at the
Welsh Combined Centres for Public Health for helpful
advice. Crown Copyright 1992. Used with permission of the
Office for National Statistics.
Table 4 Comparison of SF-36 dimension norms in Britain: means for dimensions by age and sex

                    Health Survey for     Oxford (Central England)    British ONS Survey
                    England (HSE) 1996    Healthy Life Survey         1992 (ages 16+)
                    (ages 16+)            1991–1992 (ages 18–64)
                    Mean (SEM)            Mean (SD)                   Mean (SD)

Physical Functioning
Males
  16–24             92 (0.59)             92.8 (16.8)                 94.8 (15.0)
  25–34             94 (0.40)             93.9 (14.2)                 96.3 (10.6)
  35–44             91 (0.50)             91.9 (14.5)                 93.5 (14.8)
  45–54             87 (0.61)             87.9 (17.4)                 89.7 (19.1)
  55–64             76 (0.94)             80.0 (22.1)                 79.0 (25.6)
  65–74             70 (1.00)             n/a (–)                     76.2 (26.4)
  75–84             58 (1.28)             n/a (–)                     65.4 (27.5)
  85+               – (–)                 n/a (–)                     52.3 (30.5)
  n                 7294                  3963                        916
Females
  16–24             90 (0.56)             90.1 (16.4)                 96.1 (8.9)
  25–34             90 (0.47)             92.9 (13.3)                 93.1 (15.2)
  35–44             87 (0.54)             89.4 (16.1)                 93.2 (12.2)
  45–54             82 (0.61)             84.8 (18.3)                 85.3 (21.3)
  55–64             72 (0.86)             74.8 (23.5)                 77.2 (27.1)
  65–74             62 (0.92)             n/a (–)                     70.0 (26.8)
  75–84             48 (1.10)             n/a (–)                     52.9 (28.4)
  85+               – (–)                 n/a (–)                     32.0 (30.2)
  n                 8760                  4838                        1109
Total sample mean   81 (0.21)             88.4 (17.9)                 89.6 (19.3)

Role Limitations (Physical)
Males
  16–24             92 (0.72)             91.8 (22.6)                 90.6 (24.1)
  25–34             91 (0.70)             92.0 (23.2)                 90.2 (26.0)
  35–44             89 (0.72)             89.5 (25.5)                 88.4 (28.3)
  45–54             84 (0.91)             87.6 (28.3)                 88.7 (28.5)
  55–64             75 (1.28)             78.8 (36.1)                 70.0 (41.9)
  65–74             68 (1.42)             n/a (–)                     74.4 (38.6)
  75–84             57 (1.89)             n/a (–)                     68.7 (41.1)
  85+               – (–)                 n/a (–)                     75.0 (38.2)
  n                 7308                  4051                        917
Females
  16–24             90 (0.77)             86.6 (25.5)                 90.8 (25.0)
  25–34             88 (0.67)             86.9 (29.2)                 85.4 (32.3)
  35–44             84 (0.80)             84.0 (32.0)                 88.1 (27.8)
  45–54             80 (0.94)             82.4 (32.0)                 78.9 (37.3)
  55–64             72 (1.25)             76.6 (36.9)                 75.2 (39.3)
  65–74             64 (1.30)             n/a (–)                     71.2 (40.2)
  75–84             55 (1.51)             n/a (–)                     57.9 (42.4)
  85+               – (–)                 n/a (–)                     59.1 (44.7)
  n                 8747                  5007                        1101
Total sample mean   80 (0.28)             85.8 (29.9)                 84.2 (32.7)

Bodily Pain
Males
  16–24             82 (0.67)             86.6 (17.9)                 86.8 (21.9)
  25–34             84 (0.60)             87.5 (17.7)                 88.0 (21.6)
  35–44             82 (0.63)             85.6 (19.7)                 86.0 (21.3)
  45–54             78 (0.71)             81.8 (22.2)                 86.3 (22.5)
  55–64             74 (0.90)             78.8 (23.6)                 75.6 (28.3)
  65–74             75 (0.93)             n/a (–)                     78.8 (26.0)
  75–84             73 (1.19)             n/a (–)                     75.7 (26.8)
  85+               – (–)                 n/a (–)                     82.1 (29.6)
  n                 7351                  5064                        916
Females
  16–24             80 (0.70)             81.7 (20.8)                 85.4 (23.2)
  25–34             80 (0.59)             82.1 (21.1)                 85.7 (23.2)
  35–44             78 (0.63)             79.4 (22.0)                 84.0 (22.1)
  45–54             74 (0.68)             77.4 (22.3)                 75.6 (26.6)
  55–64             69 (0.89)             75.0 (25.1)                 73.2 (28.9)
  65–74             69 (0.87)             n/a (–)                     72.9 (27.8)
  75–84             65 (1.05)             n/a (–)                     66.7 (30.7)
  85+               – (–)                 n/a (–)                     56.1 (27.3)
  n                 8809                  5041                        1106
Total sample mean   77 (0.21)             81.5 (21.6)                 82.5 (24.8)

General Health Perceptions
Males
  16–24             74 (0.63)             77.2 (17.4)                 79.8 (17.9)
  25–34             74 (0.52)             76.7 (17.7)                 79.7 (17.5)
  35–44             73 (0.53)             74.1 (18.5)                 75.8 (19.4)
  45–54             69 (0.62)             72.0 (20.1)                 72.7 (22.2)
  55–64             64 (0.80)             68.1 (22.9)                 63.1 (26.9)
  65–74             62 (0.81)             n/a (–)                     64.8 (24.2)
  75–84             61 (0.97)             n/a (–)                     61.3 (26.7)
  85+               – (–)                 n/a (–)                     65.2 (23.9)
  n                 7301                  4031                        912
Females
  16–24             70 (0.58)             72.1 (20.3)                 77.9 (17.6)
  25–34             74 (0.48)             77.3 (18.5)                 79.1 (18.2)
  35–44             73 (0.54)             74.1 (20.3)                 75.1 (19.6)
  45–54             70 (0.56)             73.1 (19.9)                 70.3 (23.9)
  55–64             65 (0.72)             68.0 (22.0)                 67.4 (26.1)
  65–74             64 (0.69)             n/a (–)                     65.1 (24.1)
  75–84             61 (0.80)             n/a (–)                     58.7 (23.0)
  85+               – (–)                 n/a (–)                     57.3 (21.0)
  n                 8715                  4959                        1105
Total sample mean   69 (0.17)             73.5 (19.9)                 74.0 (21.9)

Energy & Vitality
Males
  16–24             69 (0.59)             66.4 (17.1)                 73.0 (16.6)
  25–34             68 (0.49)             64.5 (17.3)                 71.3 (16.8)
  35–44             66 (0.51)             63.5 (18.6)                 67.8 (18.2)
  45–54             66 (0.53)             62.9 (19.9)                 68.5 (22.4)
  55–64             64 (0.72)             62.9 (20.3)                 63.4 (25.0)
  65–74             64 (0.75)             n/a (–)                     65.0 (25.0)
  75–84             58 (0.94)             n/a (–)                     57.7 (26.5)
  85+               – (–)                 n/a (–)                     56.9 (23.4)
  n                 7339                  4025                        914
Females
  16–24             63 (0.57)             59.8 (19.4)                 64.1 (20.1)
  25–34             62 (0.47)             58.3 (19.5)                 63.4 (18.9)
  35–44             61 (0.50)             58.2 (19.9)                 62.4 (17.8)
  45–54             60 (0.53)             59.4 (20.3)                 57.7 (22.4)
  55–64             60 (0.67)             59.0 (21.4)                 60.5 (24.3)
  65–74             59 (0.66)             n/a (–)                     61.5 (23.1)
  75–84             53 (0.82)             n/a (–)                     51.9 (23.7)
  85+               – (–)                 n/a (–)                     49.3 (19.2)
  n                 8800                  4973                        1104
Total sample mean   63 (0.16)             61.1 (19.6)                 64.7 (20.8)

Social Functioning
Males
  16–24             89 (0.61)             90.2 (16.4)                 91.7 (21.5)
  25–34             90 (0.52)             91.3 (16.3)                 93.2 (14.7)
  35–44             88 (0.56)             90.5 (17.0)                 91.9 (18.0)
  45–54             87 (0.64)             89.8 (18.7)                 91.2 (19.4)
  55–64             84 (0.83)             86.9 (22.6)                 84.4 (26.1)
  65–74             83 (0.91)             n/a (–)                     86.0 (25.1)
  75–84             79 (1.21)             n/a (–)                     77.0 (28.0)
  85+               – (–)                 n/a (–)                     80.3 (24.1)
  n                 7354                  4073                        916
Females
  16–24             85 (0.64)             90.2 (16.4)                 91.4 (17.4)
  25–34             86 (0.52)             91.3 (16.3)                 89.5 (18.3)
  35–44             86 (0.57)             90.5 (17.0)                 89.3 (20.5)
  45–54             85 (0.60)             89.8 (18.7)                 84.9 (25.4)
  55–64             83 (0.79)             86.9 (22.6)                 85.0 (25.2)
  65–74             82 (0.81)             n/a (–)                     85.7 (26.1)
  75–84             78 (0.99)             n/a (–)                     75.2 (29.9)
  85+               – (–)                 n/a (–)                     69.7 (24.8)
  n                 8813                  5051                        1104
Total sample mean   85 (0.19)             88.0 (19.5)                 89.0 (20.8)

Role Limitations (emotional)
Males
  16–24             90 (0.80)             82.9 (31.1)                 93.4 (19.7)
  25–34             90 (0.69)             87.1 (27.9)                 93.2 (23.2)
  35–44             89 (0.74)             86.0 (28.6)                 90.6 (26.2)
  45–54             86 (0.87)             85.7 (29.5)                 91.7 (24.9)
  55–64             84 (1.07)             85.8 (29.9)                 86.3 (31.5)
  65–74             81 (1.20)             n/a (–)                     89.5 (28.0)
  75–84             77 (1.65)             n/a (–)                     82.9 (35.3)
  85+               – (–)                 n/a (–)                     100.0 (0.0)
  n                 7294                  4056                        917
Females
  16–24             84 (0.93)             78.8 (33.0)                 89.1 (28.1)
  25–34             85 (0.76)             80.6 (34.0)                 88.0 (28.2)
  35–44             85 (0.79)             80.3 (33.6)                 86.2 (30.6)
  45–54             84 (0.85)             80.8 (33.6)                 85.1 (31.7)
  55–64             81 (1.09)             83.3 (32.5)                 77.7 (38.7)
  65–74             78 (1.16)             n/a (–)                     87.0 (31.1)
  75–84             75 (1.36)             n/a (–)                     82.7 (34.4)
  85+               – (–)                 n/a (–)                     80.3 (39.4)
  n                 8732                  4011                        1002
Total sample mean   84 (0.25)             82.9 (31.8)                 88.0 (29.1)

Mental Health
Males
  16–24             77 (0.53)             74.8 (15.4)                 81.2 (15.5)
  25–34             78 (0.43)             75.8 (15.2)                 80.1 (15.0)
  35–44             77 (0.44)             75.0 (16.1)                 77.7 (18.1)
  45–54             77 (0.47)             76.0 (16.7)                 79.9 (18.1)
  55–64             78 (0.56)             78.0 (17.3)                 78.3 (19.5)
  65–74             78 (0.59)             n/a (–)                     81.4 (18.0)
  75–84             79 (0.75)             n/a (–)                     77.1 (19.4)
  85+               – (–)                 n/a (–)                     77.0 (24.3)
  n                 7333                  3984                        912
Females
  16–24             73 (0.50)             70.2 (17.4)                 78.2 (14.9)
  25–34             73 (0.43)             71.6 (15.2)                 75.0 (16.7)
  35–44             73 (0.43)             71.6 (17.8)                 74.6 (18.3)
  45–54             74 (0.44)             73.2 (18.2)                 71.5 (21.4)
  55–64             73 (0.56)             74.4 (18.5)                 74.0 (20.5)
  65–74             74 (0.57)             n/a (–)                     78.8 (18.5)
  75–84             75 (0.67)             n/a (–)                     74.6 (19.6)
  85+               – (–)                 n/a (–)                     72.2 (13.4)
  n                 8794                  4946                        1107
Total sample mean   75 (0.14)             73.8 (17.2)                 76.6 (18.3)

Sample sizes:
CEHLS: completed mail questionnaires = 9332 adults aged 18–64 (72% response rate). Sample mirrored closely the characteristics of general population in 1981 Census and 1991 population estimates, although it slightly under-represented those in social classes I, II and IIInm.
HSE: interviewed = 16443 adults aged 16+ (75% response rate). Characteristics broadly similar to age, sex and social class of population in 1991 Census, although it slightly under-represented men. In comparison with mid-1995 population estimates men aged 16–34 were slightly under-represented. HSE provided data on ages 75+ collectively with no older age breakdown.
ONS: interviewed = 2056 adults aged 16+ (78% response rate). The sample compared well with 1991 Census figures and mid-term 1992 population estimates.
n/a, not applicable.
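
Table 4 reports standard errors of the mean (SEM) for the Health Survey for England but standard deviations (SD) for the other two surveys, so the bracketed figures are not directly comparable. Under simple random sampling the two are related by SD = SEM × √n. The sketch below applies this to the HSE total Physical Functioning score (SEM 0.21), taking n as the sum of the male and female sample sizes printed in the table. Both that choice of n and the assumption that the published SEM carries no design-effect adjustment are assumptions of this illustration, so the result is approximate.

```python
import math


def sd_from_sem(sem, n):
    """Recover an approximate SD from a SEM, assuming simple random sampling."""
    return sem * math.sqrt(n)


# HSE Physical Functioning: total-sample SEM and sex-specific n from Table 4.
HSE_SEM_TOTAL = 0.21
hse_n = 7294 + 8760  # males + females (assumed to be the n behind the total)

hse_sd = sd_from_sem(HSE_SEM_TOTAL, hse_n)
print(f"n = {hse_n}, approximate SD = {hse_sd:.1f}")
```

The approximate HSE value can then be set beside the SDs reported for the other two surveys (17.9 and 19.3), bearing in mind that those samples differ in age range and sample design.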
References
1 Ware JE, Sherbourne CD. The MOS 36 item short-form health survey (SF-36). 1: Conceptual framework and item selection. Med Care 1992; 30: 473–480.
2 Ware JE, Snow KK, Kosinski MA, Gandek MS. SF-36 health survey. Manual and interpretation guide. Boston, MA: New England Medical Center, The Health Institute, 1993.
3 Jenkinson C, Coulter A, Wright L. Short form 36 (SF 36) health survey questionnaire: normative data for adults of working age. Br Med J 1993; 306: 1437–1440.
4 Jenkinson C, Layte R, Wright L, Coulter A. The UK SF-36: an analysis and interpretation manual. A guide to health status measurement with particular reference to the Short Form 36 Health Survey. Oxford: University of Oxford, Department of Public Health and Primary Care, Health Services Research Unit, 1996.
5 Brazier JE, Harper R, Jones NMB, et al. Validating the SF-36 health survey questionnaire: new outcome measure for primary care. Br Med J 1992; 305: 160–164.
6 Garratt AM, Ruta DA, Abdalla MI, Buckingham JK, Russell IT. The SF 36 health survey questionnaire: an outcome measure suitable for routine use within the NHS? Br Med J 1993; 306: 1440–1444.
7 Lyons RA, Lo SV, Littlepage BNC. Comparative health status of patients with 11 common illnesses in Wales. J Epidemiol Commun Hlth 1994; 48: 388–390.
8 Lyons RA, Fielder H, Littlepage BNC. Measuring health status with the SF-36: the need for regional norms. J Publ Hlth Med 1995; 17: 46–50.
9 Lyons RA, Crome P, Monaghan S, Killalea D, Daley JA. Health status and disability among elderly people in three UK districts. Age Ageing 1997; 26: 203–209.
10 Lyons RA, Perry HM, Littlepage BNC. Evidence for the validity of the short-form 36 questionnaire (SF-36) in an elderly population. Age Ageing 1994; 23: 182–184.
11 Prescott-Clarke P, Primatesta P, eds. Health survey for England, 1996. Vols 1 and 2. London: The Stationery Office, 1998.
12 Lamping D. When is a norm a norm? The representativeness of population norms for UK version of SF-36 (abstract). Qual Life Res 1997; 6: 675.
13 Gandek B, Ware JE, eds. Translating functional health and well-being: International Quality of Life Assessment (IQOLA) project studies of the SF-36 Health Survey. J Clin Epidemiol, Special Issue 1998; 51: 891–1214.
14 Perkins JJ, Sanson-Fisher RW. An examination of self- and telephone-administered modes of administration for the Australian SF-36. J Clin Epidemiol, Special Issue 1998; 51: 969–973.
15 Ware JE, Kosinski M, Keller SD. SF-36 physical and mental summary scales: a user’s manual. Boston, MA: The Health Institute, 1994.
16 Hayes V, Morris J, Wolfe C, Morgan M. The SF-36 Health Survey Questionnaire: is it suitable for use with older adults? Age Ageing 1995; 24: 120–125.
17 Hill S, Harries U. Assessing the outcome of health care for the older person in community settings: should we use the SF-36? Outcomes briefing. UK Clearing House for Health Outcomes 1994; 4: 26–27.
18 Parker SG, Peet SM, Jagger C, Farhan M, Castleden CM. Measuring health status in older patients. The SF-36 in practice. Age Ageing 1998; 27: 13–18.
19 Lyons RA, Wareham K, Lucas M, et al. SF-36 scores vary by method of administration: implications for study design. J Publ Hlth Med 1999; 21: 41–45.
20 Locander W, Sudman S, Bradburn N. An investigation of interview method, threat, and response distortion. J Am Statist Assoc 1976; 71: 269–274.
21 Siemiatycki M. A comparison of mail, telephone, and home interview strategies for household health surveys. Am J Publ Hlth 1979; 69: 238–245.
22 McHorney CA, Kosinski M, Ware JE. Comparisons of the costs and quality of norms for the SF-36 Health Survey collected by mail versus telephone interview: results from a national survey. Med Care 1994; 32: 551–567.
23 Hochstim JA. A critical comparison of three strategies of collecting data from households. J Am Statist Assoc 1967; 62: 976–989.
24 Wu AW, Jacobson DL, Berzon RA, et al. The effect of mode of administration on Medical Outcomes Study health ratings and EuroQol scores in AIDS. Qual Life Res 1997; 6: 3–10.
25 Bowling A. Research methods in health. Investigating health and health services. Buckingham: Open University Press, 1997.
26 Keller SD, Ware JE. Questions and answers about the SF-36 and SF-12. Med Outcomes Trust Bull 1996; 4: 3.
27 Barry MJ, Walker-Corkery E, Chang Y, et al. Measurement of overall and disease-specific health status: does the order of questionnaires make a difference? J Hlth Serv Res Policy 1996; 1: 20–27.
28 Jenkinson C, Layte R, Lawrence K. Development and testing of the SF-36 summary scale scores in the United Kingdom: results from a large scale survey and clinical trial. Med Care 1997; 35: 410–416.
29 Ware JE, Kosinski M, Bayliss MS, et al. Comparison of methods for scoring and statistical analysis of SF-36 health profiles and summary measures: summary of results from the Medical Outcomes Study. Med Care 1995; 33(Suppl. 4): AS264–AS279.
30 Julious SA, George S, Campbell J. Sample sizes for studies using the short form 36 (SF-36). J Epidemiol Commun Hlth 1995; 49: 642–644.
31 Lamping DL, Campbell KA, Schroter S. Methodological issues in combining items to form scales and analysing interval data (abstract). Qual Life Res 1998; 7: 621.
32 Thomas M, Goddard E, Hickman M, Hunter P. General household survey 1992. Office of Population Censuses and Surveys, Social Survey Division. London: HMSO, 1994.
33 Foster K, Jackson B, Thomas M, Hunter P, Bennett N. General household survey 1993. Office of Population Censuses and Surveys. London: HMSO, 1995.
34 Cannell CC, Groves RM, Miller PV. The effects of mode of data collection on health survey data. J Am Statist Assoc, Proc Section Social Statist 1981; 1: 1–6 (suppl.).
35 Breeze E, Maidment A, Bennett N, Flatley J, Carey S. Health survey for England 1992. Office of Population Censuses and Surveys. London: HMSO, 1994.
36 Bowling A. Health care rationing: the public’s debate. Br Med J 1996; 312: 670–674.
Accepted on 16 March 1999