American Journal of Epidemiology
Copyright © 1998 by The Johns Hopkins University School of Hygiene and Public Health
All rights reserved
Vol. 148, No. 12
Printed in U.S.A.

Invited Commentary: Comparison of the Block and the Willett Food Frequency Questionnaires

Gladys Block

Received for publication June 23, 1998, and accepted for publication July 6, 1998.
Abbreviations: DARCC, Diet, Activity, and Reproductive Risks for Colon Cancer; FFQ, food frequency questionnaire; NHANES III, National Health and Nutrition Examination Survey III.
From the University of California, Berkeley, School of Public Health, Berkeley, CA.

Wirfalt et al. (1) and Caan et al. (2) deserve a great deal of credit for conducting evaluations of two instruments at the same time, something that is all too rarely done. Only when we compare two different instruments against the same gold standard at the same time in the same population will we be able to improve our methods and move forward.

The analyses conducted and their interpretation should, of course, depend on the instruments examined. Since Wirfalt et al. used a reduced 60-item version of the Block questionnaire, the lower means, lower energy estimates in comparison with resting metabolic rate, and slopes less than 1.0 seen with the reduced Block food frequency questionnaire (FFQ) are accurate, but uninteresting. The full-length Block questionnaire was designed to do a reasonable job of capturing the absolute value of nutrient intake and included foods representing at least the top 90 percent of nutrient intake in the National Health and Nutrition Examination Survey II data set. For the 60-item version, on the other hand, we selected the food items to be included by identifying those foods that represented the top 80 percent of nutrient intake in the National Health and Nutrition Examination Survey II data set, and we explicitly stated in the paper introducing it (3) that the absolute value of nutrient estimates would be lower than the truth. We hope that no investigators would be using the brief instrument to make accurate point estimates.

The Block FFQ has undergone various changes in format between the original 1986 versions and the latest 1998 version. The original write-a-number-in-the-column format of the Block FFQ used by Wirfalt et al. (1) has not been recommended for self-administration for almost a decade because of the possibility of serious respondent errors if the answer is off by a column. At the time of the original development of scannable versions of the Block FFQ, we conducted an analysis in which data from an open-ended questionnaire were collapsed into Willett's categories and nutrient estimates recalculated. The open-ended and collapsed estimates were each compared against three 4-day records in the Women's Health Trial Pilot validation study. Correlations with reference data were almost identical by the two calculation methods (e.g., percent fat, r = 0.64 by open-ended questionnaire and r = 0.63 by categories; calcium, r = 0.61 vs. r = 0.61; vitamin C, r = 0.48 vs. r = 0.46). Thus, I disagree with conclusions that scannable formats such as the Willett and the more recent Block instruments entail loss of precision in ranking and that open-ended formats are preferable.

With regard to ranking, the data presented in the article by Wirfalt et al. (1) suggest that full-length instruments may be better at capturing nutrient-rich, but infrequently consumed, foods and their micronutrients or, alternatively, that large overestimates can produce high correlations for mathematical reasons alone. Comparisons of micronutrient estimates produced by the Willett and the full-length open-ended version of the Block FFQs were conducted in the Women's Health Trial Pilot validation study (Women's Health Trial Pilot Steering Committee, unpublished data).
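
The collapsing analysis described above (open-ended frequencies recoded into fixed response categories, with each version then correlated against reference data) can be sketched as follows. This is only an illustrative simulation, not the analysis that was actually performed: the category values, the nearest-value recoding rule, the lognormal error model, and the names `CATEGORY_VALUES`, `collapse`, `pearson`, and `simulate` are all assumptions introduced here, and the data are synthetic.

```python
import math
import random

# Assumed category values in servings per day, loosely modeled on a
# nine-level scannable frequency scale (never ... 6+ per day).
# These numbers are illustrative and are not taken from either FFQ.
CATEGORY_VALUES = [0.0, 2 / 30, 1 / 7, 3 / 7, 5.5 / 7, 1.0, 2.5, 4.5, 6.0]


def collapse(freq_per_day):
    """Recode an open-ended frequency to the nearest category value."""
    return min(CATEGORY_VALUES, key=lambda v: abs(v - freq_per_day))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate(n=500, seed=0):
    """Return (r_open_ended, r_collapsed) against a synthetic reference."""
    rng = random.Random(seed)
    reference, open_ended, collapsed = [], [], []
    for _ in range(n):
        # Synthetic "true" frequency of one food, servings per day.
        true_freq = rng.lognormvariate(-1.0, 1.0)
        # Reference method and questionnaire each carry independent error.
        reference.append(true_freq * rng.lognormvariate(0.0, 0.5))
        reported = true_freq * rng.lognormvariate(0.0, 0.5)
        open_ended.append(reported)
        collapsed.append(collapse(reported))
    return pearson(reference, open_ended), pearson(reference, collapsed)


if __name__ == "__main__":
    r_open, r_cat = simulate()
    print(f"open-ended r = {r_open:.2f}, collapsed r = {r_cat:.2f}")
```

Running the script prints the two correlations side by side, so the effect of recoding on ranking can be inspected under whatever error model and category scheme one substitutes for the assumed ones.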
It is notable that both the Willett and the Block correlations and classification agreements seen in the paper by Wirfalt et al. are quite low in comparison with other validation studies of the Willett or full-length Block instruments. In part, this reflects the fact that 3 days of diet recalls do not provide a good estimate of the past 6 months (in the case of the Block FFQ, as used in this study) or the past 1 month (in the case of the Willett FFQ, as used in the study). Both of these instruments are better than they appear here; the results are at least partly due to the vagaries of three recalls and the problematic nature of the two-dimensional portion size drawings used for the reference data.

The issues in the paper by Caan et al. (2) are somewhat different, since this is not really a validation using as a gold standard an unrelated type of reference data (e.g., multiple recalls), but, rather, simply a comparison with a longer, but similar (frequency-type), instrument. Comparisons of group means across these three FFQs (Caan et al., table 3), and the use of the term "underestimate" for either the Block or the Willett FFQ, seem fairly inappropriate, since the "truth" is unknown in any case. However, table 1 in the paper by Caan et al. indicates that the CARDIA questionnaire produced substantial overestimates of all nutrients in women and of potassium and vitamin A in men. In addition, comparison of energy intakes produced by the Diet, Activity, and Reproductive Risks for Colon Cancer (DARCC) questionnaire used by Caan et al. with the National Health and Nutrition Examination Survey III (NHANES III) 24-hour recall data suggests substantial overestimates: NHANES III males aged 60-69 years had a mean of 2,110 kcal, and those aged 70-79 years had a mean of 1,887 kcal, in contrast to the DARCC questionnaire estimate of 2,455 kcal.
NHANES III females aged 60-69 years had a mean of 1,578 kcal, and those aged 70-79 years had a mean of 1,435 kcal, compared with the DARCC questionnaire estimate of 1,894 kcal. Thus, "produced lower estimates" would seem to be more appropriate terminology than "underestimated."

The fundamental difficulty with drawing conclusions from comparison of these three instruments is that, all things being equal, the more food items an FFQ contains, the higher the nutrient estimates will be. An instrument with up to 800 items will produce higher estimates than one with 153 items, and that, in turn, will produce higher estimates than one with 105 items. When the comparison instrument has 103 possible fruit/vegetable items and produces overestimates of vitamin A (Caan et al., table 1), a test instrument with 60 fruit/vegetable items will inevitably produce higher point estimates and, consequently, correlate better with it for mathematical reasons alone than an instrument with only 30 such items, for nutrients such as fiber, vitamin A, carotene, vitamin C, and folic acid. It is simply a matter of the number and choice of food items contained in the various instruments. It is not possible to know, in Caan et al., which of these three instruments produces point estimates or rankings that are closer to the truth.

A further complication is that, apparently, the analysis of the modified Block version was performed by using "standard" portion sizes (ONQUEST option) (Caan, personal communication, 1998) rather than the age- and sex-specific portion sizes usually recommended. The ONQUEST portions will always produce substantial underestimates for men and sometimes for women as well. This choice of portion size options in the analysis software also is relevant to resulting correlations with reference data.
The use of age- and sex-specific portion sizes has been shown to produce higher correlations with reference data (16 days of recalls/records) (4) than does the "ONQUEST" approach. Thus, conclusions in the paper by Caan et al. about the ability of the modified Block instrument to rank individuals on nutrient intake must be limited to situations in which the age- and sex-specific portion size calculation is not used in the analysis.

The comparison of odds ratios conducted by Caan et al. in their table 6 is very interesting. On the one hand, it is reassuring that all three instruments produced risk estimates that were within each other's confidence intervals. On the other hand, the table emphasizes the need for caution in our interpretation of modest epidemiologic results, since in several instances the Block FFQ produced lower risk estimates than did the DARCC whereas the Willett FFQ produced higher estimates than did the DARCC (e.g., kilocalories), or vice versa (e.g., fat).

Both the Willett and the Block instruments are continually undergoing revisions (and, it is hoped, improvements!). I hope that publication of Wirfalt et al. and Caan et al. puts to rest at last any use of the phrase "the validity of food frequency questionnaires." There is no such thing as "the food frequency questionnaire" and no possibility of any generalizations about such instruments. God is in the details, and both accuracy of point estimates and ability to rank depend intimately on the exact nature of the instrument, response format, number of foods, selection of foods, method of administration, calculation options, edit rules applied, and subsequent analytic choices.

REFERENCES

1. Wirfalt AKE, Jeffery RW, Elmer PJ. Comparison of food frequency questionnaires: the reduced Block and Willett questionnaires differ in ranking on nutrient intakes. Am J Epidemiol 1998;148:1148-56.
2. Caan BJ, Slattery ML, Potter J, et al. Comparison of the Block and the Willett self-administered semiquantitative food frequency questionnaires to an interviewer-administered dietary history. Am J Epidemiol 1998;148:1137-47.
3. Block G, Hartman AM, Naughton D. A reduced dietary questionnaire: development and validation. Epidemiology 1990;1:58-64.
4. Block G, Thompson FE, Hartman AM, et al. Comparison of two dietary questionnaires validated against multiple dietary records collected during a 1-year period. J Am Diet Assoc 1992;92:686-93.