Invited Commentary: Comparison of the Block and the Willett Food Frequency Questionnaires
Gladys Block
From the University of California, Berkeley, School of Public Health, Berkeley, CA.
Wirfalt et al. (1) and Caan et al. (2) deserve a great
deal of credit for conducting evaluations of two instruments at the same time, something that is all too rarely
done. Only when we compare two different instruments against the same gold standard at the same time
in the same population will we be able to improve our
methods and move forward.
The analyses conducted and their interpretation
should, of course, depend on the instruments examined. Since Wirfalt et al. used a reduced 60-item
version of the Block questionnaire, the lower means,
lower energy estimates in comparison with resting
metabolic rate, and slopes less than 1.0 seen with the
reduced Block food frequency questionnaire (FFQ) are
accurate, but uninteresting. The full-length Block
questionnaire was designed to do a reasonable job at
capturing the absolute value of nutrient intake and
included foods representing at least the top 90 percent
of nutrient intake in the National Health and Nutrition
Examination Survey II data set. For the 60-item version, on the other hand, we selected the food items to
be included by identifying those foods that represented
the top 80 percent of nutrient intake in the National
Health and Nutrition Examination Survey II data set,
and we explicitly stated in the paper introducing it (3)
that the absolute value of nutrient estimates would be
lower than the truth. We hope that no investigators
would be using the brief instrument to make accurate
point estimates.
The Block FFQ has undergone various changes in
format between the original 1986 versions and the
latest 1998 version. The original write-a-number-in-the-column format of the Block FFQ used by Wirfalt et al. (1) has not been recommended for self-administration for almost a decade because of the possibility of serious respondent errors if the answer is
off by a column. At the time of the original development of scannable versions of the Block FFQ, we conducted an analysis in which data from an open-ended questionnaire were collapsed into Willett's categories and nutrient estimates were recalculated. The open-ended and collapsed estimates were each compared
against three 4-day records in the Women's Health
Trial Pilot validation study. Correlations with reference data were almost identical by the two calculation
methods (e.g., percent fat, r = 0.64 by open-ended questionnaire and r = 0.63 by categories; calcium, r = 0.61 vs. r = 0.61; vitamin C, r = 0.48 vs. r = 0.46).
Thus, I disagree with conclusions that scannable formats such as the Willett and the more recent Block
instruments entail loss of precision in ranking and that
open-ended formats are preferable.
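The logic of that check can be illustrated with a small simulation. The sketch below is not the Women's Health Trial analysis: the simulated intakes, category cut points, and midpoint scores are all assumptions, chosen only to show how correlations with a reference measure can be compared before and after open-ended answers are collapsed into fixed response categories.

```python
# Illustrative sketch only; it does not reproduce the Women's Health Trial
# analysis. The simulated intakes, cut points, and midpoint scores below are
# assumptions chosen to show the before/after-collapsing comparison.
import numpy as np

rng = np.random.default_rng(0)
n = 100

# Simulated "true" weekly servings of a food and a noisy reference measure
true_freq = rng.gamma(shape=2.0, scale=3.0, size=n)      # servings/week
reference = true_freq + rng.normal(0.0, 2.0, size=n)     # e.g., diet records

# Open-ended report: the respondent writes an approximate number
open_ended = true_freq + rng.normal(0.0, 1.5, size=n)

# Collapsed report: the same answers forced into fixed frequency categories,
# scored at assumed category midpoints (servings/week)
cut_points = [0.5, 1.5, 3.5, 6.5, 10.5]                   # assumed bounds
midpoints = np.array([0.25, 1.0, 2.5, 5.0, 8.5, 14.0])    # assumed scores
collapsed = midpoints[np.digitize(open_ended, cut_points)]

r_open = np.corrcoef(open_ended, reference)[0, 1]
r_collapsed = np.corrcoef(collapsed, reference)[0, 1]
print(f"open-ended r = {r_open:.2f}, collapsed r = {r_collapsed:.2f}")
```

Scoring at category midpoints mirrors the way semiquantitative FFQ responses are typically converted to nutrient estimates.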
With regard to ranking, the data presented in the
article by Wirfalt et al. (1) suggest that full-length
instruments may be better at capturing nutrient-rich,
but infrequently consumed, foods and their micronutrients or, alternatively, that large overestimates can
produce high correlations for mathematical reasons
alone. Comparisons of micronutrient estimates produced by the Willett and the full-length open-ended
version of the Block FFQs were conducted in the
Women's Health Trial Pilot validation study (Women's Health Trial Pilot Steering Committee, unpublished data).
It is notable that both the Willett and the Block
correlations and classification agreements seen in the
paper by Wirfalt et al. are quite low in comparison
with other validation studies of the Willett or full-length Block instruments. In part, this reflects the fact
that 3 days of diet recalls do not provide a good
estimate of the past 6 months (in the case of the Block
FFQ, as used in this study) or the past 1 month (in the
case of the Willett FFQ, as used in the study). Both
of these instruments are better than they appear here;
the results are at least partly due to the vagaries
of three recalls and the problematic nature of the
two-dimensional portion size drawings used for the
reference data.
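The cost of a short reference period can also be put in rough numbers. The sketch below uses the standard attenuation approximation from measurement-error theory rather than any calculation reported in these papers, and the within- to between-person variance ratios are assumed values, included only to show how quickly a 3-day reference measure pulls observed correlations down.

```python
# Standard attenuation approximation, not a reanalysis of these studies:
# r_observed = r_true / sqrt(1 + lambda / n), where lambda is the within- to
# between-person variance ratio of the reference measure and n is the number
# of recall/record days. The lambda values below are assumptions.
import math

def observed_r(r_true: float, lam: float, n_days: int) -> float:
    """Correlation expected against an n-day mean of a noisy reference."""
    return r_true / math.sqrt(1.0 + lam / n_days)

for lam in (1.0, 2.0, 4.0):        # plausible variance ratios for nutrients
    for n_days in (3, 7, 16):      # a few recalls vs. many records
        print(f"lambda={lam:.0f}, days={n_days:2d}: "
              f"r_obs = {observed_r(0.7, lam, n_days):.2f}")
```

Under a variance ratio of 4, which is not unusual for micronutrients, a true correlation of 0.7 would appear as roughly 0.46 against a 3-day mean but about 0.63 against a 16-day mean.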
The issues in the paper by Caan et al. (2) are
somewhat different, since this is not really a validation
using as a gold standard an unrelated type of reference
data (e.g., multiple recalls), but, rather, simply a comparison with a longer, but similar (frequency-type),
instrument. Comparisons of group means across these
three FFQs (Caan et al., table 3), and the use of the
term "underestimate" for either the Block or the
Willett FFQ, seem fairly inappropriate, since the
"truth" is unknown in any case. However, table 1 in
the paper by Caan et al. indicates that the CARDIA
questionnaire produced substantial overestimates of all
nutrients in women and of potassium and vitamin A in
men. In addition, comparison of energy intakes produced by the Diet, Activity, and Reproductive Risks
for Colon Cancer (DARCC) questionnaire used by
Caan et al. with the National Health and Nutrition
Examination Survey III (NHANES III) 24-hour recall
data suggests substantial overestimates: NHANES III
males aged 60-69 years had a mean of 2,110 kcal, and
those aged 70-79 years had a mean of 1,887 kcal, in
contrast to the DARCC questionnaire estimate of
2,455 kcal. NHANES III females aged 60-69 years
had a mean of 1,578 kcal, and those aged 70-79 years had a mean of 1,435 kcal, compared with the DARCC
questionnaire estimate of 1,894 kcal. Thus, "produced
lower estimates" would seem to be more appropriate
terminology than "underestimated."
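The size of the gap can be made explicit from the figures just cited. The sketch below simply restates the NHANES III and DARCC means given above and adds the percent-difference arithmetic; it assumes the DARCC figures are single sex-specific means.

```python
# Arithmetic behind the comparison in the text: the kcal values are the
# NHANES III 24-hour recall means and DARCC questionnaire estimates cited
# above; only the percent-difference calculation is added here.
nhanes = {
    ("men", "60-69"): 2110, ("men", "70-79"): 1887,
    ("women", "60-69"): 1578, ("women", "70-79"): 1435,
}
darcc = {"men": 2455, "women": 1894}

for (sex, ages), kcal in nhanes.items():
    excess = 100.0 * (darcc[sex] - kcal) / kcal
    print(f"{sex} {ages}: DARCC mean is {excess:.0f}% above the NHANES III mean")
```

By this arithmetic, the DARCC means exceed the NHANES III means by roughly 16-32 percent, depending on sex and age group.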
The fundamental difficulty with drawing conclusions from comparison of these three instruments is
that, all things being equal, the more food items an FFQ
contains, the higher the nutrient estimates will be. An
instrument with up to 800 items will produce higher
estimates than one with 153 items, and that, in turn,
will produce higher estimates than one with 105 items.
When the comparison instrument has 103 possible
fruit/vegetable items and produces overestimates of
vitamin A (Caan et al., table 1), a test instrument with
60 fruit/vegetable items will inevitably produce higher
point estimates and, consequently, correlate better
with it for mathematical reasons alone than an instrument with only 30 such items, for nutrients such as
fiber, vitamin A, carotene, vitamin C, and folic acid. It
is simply a matter of the number and choice of food
items contained in the various instruments. It is not
possible to know, in Caan et al., which of these three
instruments produces point estimates or rankings that
are closer to the truth.
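That arithmetic point can be demonstrated with a toy simulation. The sketch below is not an analysis of any of the questionnaires discussed: the item counts follow the example in the text, but the simulated item contributions are assumptions, included only to show that greater overlap in item coverage raises both the point estimates and the correlation with the comparison instrument, whichever instrument is closer to the truth.

```python
# Toy simulation, not an analysis of the CARDIA, DARCC, Block, or Willett
# instruments. The item counts (103, 60, 30) follow the example in the text;
# the simulated per-item contributions are assumptions.
import numpy as np

rng = np.random.default_rng(1)
n_people, n_items = 500, 103

# Per-person contribution of each fruit/vegetable item to, say, vitamin A
items = rng.gamma(shape=1.0, scale=1.0, size=(n_people, n_items))

comparison = items.sum(axis=1)        # 103-item comparison instrument
test_60 = items[:, :60].sum(axis=1)   # test instrument covering 60 of the items
test_30 = items[:, :30].sum(axis=1)   # test instrument covering 30 of the items

for name, est in [("60-item", test_60), ("30-item", test_30)]:
    r = np.corrcoef(est, comparison)[0, 1]
    print(f"{name}: mean = {est.mean():6.1f}, r with comparison = {r:.2f}")
```

Because the 60-item instrument shares more of the comparison instrument's items, it necessarily produces both the higher totals and the tighter correlation.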
A further complication is that, apparently, the analysis of the modified Block version was performed by
using "standard" portion sizes (ONQUEST option)
(Caan, personal communication, 1998) rather than the
age- and sex-specific portion sizes usually recommended. The ONQUEST portions will always produce
substantial underestimates for men and sometimes for
women as well. This choice of portion size options in
the analysis software also is relevant to resulting correlations with reference data. The use of age- and
sex-specific portion sizes has been shown to produce
higher correlations with reference data (16 days of
recalls/records) (4) than does the "ONQUEST" approach. Thus, conclusions in the paper by Caan et al.
about the ability of the modified Block instrument to
rank individuals on nutrient intake must be limited to
situations in which the age- and sex-specific portion
size calculation is not used in the analysis.
The comparison of odds ratios conducted by Caan et
al. in their table 6 is very interesting. On the one hand,
it is reassuring that all three instruments produced risk
estimates that were within each other's confidence
intervals. On the other hand, the table emphasizes the
need for caution in our interpretation of modest epidemiologic results, since in several instances the
Block FFQ produced lower risk estimates than did the
DARCC whereas the Willett FFQ produced higher
estimates than did the DARCC (e.g., kilocalories), or
vice versa (e.g., fat).
Both the Willett and the Block instruments are continually undergoing revisions (and, it is hoped, improvements!). I hope that publication of Wirfalt et al.
and Caan et al. puts to rest at last any use of the phrase
"the validity of food frequency questionnaires." There
is no such thing as "the food frequency questionnaire"
and no possibility of any generalizations about such
instruments. God is in the details, and both accuracy of
point estimates and ability to rank depend intimately
on the exact nature of the instrument, response format,
number of foods, selection of foods, method of administration, calculation options, edit rules applied, and
subsequent analytic choices.
REFERENCES
1. Wirfalt AKE, Jeffery RW, Elmer PJ. Comparison of food frequency questionnaires: the reduced Block and Willett questionnaires differ in ranking on nutrient intakes. Am J Epidemiol 1998;148:1148-56.
2. Caan BJ, Slattery ML, Potter J, et al. Comparison of the Block and the Willett self-administered semiquantitative food frequency questionnaires to an interviewer-administered dietary history. Am J Epidemiol 1998;148:1137-47.
3. Block G, Hartman AM, Naughton D. A reduced dietary questionnaire: development and validation. Epidemiology 1990;1:58-64.
4. Block G, Thompson FE, Hartman AM, et al. Comparison of two dietary questionnaires validated against multiple dietary records collected during a 1-year period. J Am Diet Assoc 1992;92:686-93.