International Journal of Epidemiology Vol. 26, No. 1. Printed in Great Britain. © International Epidemiological Association 1997

A Comparison of Three Methods of Analysis for Age-Period-Cohort Models with Application to Incidence Data on Non-Hodgkin's Lymphoma

RICHARD J Q MCNALLY,* FREDA E ALEXANDER,** ANTHONY STAINES* AND RAYMOND A CARTWRIGHT*

McNally R J Q (Leukaemia Research Fund Centre for Clinical Epidemiology, University of Leeds, 17 Springfield Mount, Leeds LS2 9NG, UK), Alexander F E, Staines A and Cartwright R A. A comparison of three methods of analysis for age-period-cohort models with application to incidence data on non-Hodgkin's lymphoma. International Journal of Epidemiology 1997; 26: 32-46.

Background. Various methods of analysis have been used to study age-period-cohort models. The main aim of this paper is to illustrate and compare three such methods: those of Clayton and Schifflers, Robertson and Boyle, and De Carli and La Vecchia. The main differences between these methods lie in how they distinguish between linear-period and linear-cohort effects. Clayton and Schifflers do not attempt to solve this identification problem, whereas Robertson and Boyle, and De Carli and La Vecchia, do.

Method. In order to study the assumptions and problems of these methods, we analysed data from 2678 subjects aged 30-84 in Yorkshire, UK, who were diagnosed with non-Hodgkin's lymphoma (NHL) during the period 1978-1991. Log-linear Poisson models were used to examine the effects of age, period and cohort.

Results. All three methods of analysis agree that, after stratification for sex and county, the age-standardized rate has been increasing at about 5% per year. The Robertson-Boyle method differed from the Clayton-Schifflers method in showing a significant non-linear cohort effect, and a significant county-cohort interaction. The De Carli-La Vecchia method agreed more closely with Clayton-Schifflers than with Robertson-Boyle.
Conclusions. The linear increase in incidence would lead to a doubling of the number of cases within 15 years. There is controversy over whether the identification problem can, or should, be solved. Many authors would not rely on the results of the methods of Robertson and Boyle, or De Carli and La Vecchia.

Keywords: age-period-cohort models, time trends, epidemiological methods, identification problem, non-Hodgkin's lymphoma, United Kingdom

Incidence rates for non-Hodgkin's lymphoma vary widely throughout the world [1]. Higher rates are observed in those cancer registries which have predominantly fair-skinned populations. There is a consistent excess in males (generally around 50 to 100%) at all ages. For males, age-standardized incidence rates, obtained from cancer registry data and standardized to the world population, range from a low of 1.4 per 100 000 person years in Bamako, Mali to a high of 17.4 per 100 000 person years among whites in the San Francisco Bay area (USA) [1]. Corresponding rates for females range from a low of 0.4 per 100 000 person years in Bamako, Mali to a high of 10.6 per 100 000 person years in Manitoba, Canada [1]. In the UK, age-standardized rates (to the world population) for males range from 6.5 per 100 000 person years in Birmingham to 9.5 per 100 000 person years in North Scotland [1].

Overall, both mortality and incidence rates have been increasing in virtually all registries, although the rates of increase vary.
The increase is more rapid than for any other cancers except prostate cancer, melanoma for both sexes and lung cancer among women. The increases were somewhat larger amongst older people, particularly during the 1950s and 1960s, which suggests a role for improved diagnosis. Data from North West England suggest no increase in the incidence of NHL in children aged 0-14 over the 35-year period 1954-1988 [4].

* Leukaemia Research Fund Centre for Clinical Epidemiology, University of Leeds, 17 Springfield Mount, Leeds LS2 9NG, UK.
** Department of Public Health Sciences, The University of Edinburgh, Medical School, Teviot Place, Edinburgh EH8 9AG, UK.

The largest increases, observed from cancer registry data and based on an age-period-cohort analysis using the Clayton-Schifflers method [5,6] or a closely related method, generally occurred in Europe and North America. The best fitting models varied greatly between registries throughout the world, so no overall conclusions can be drawn in terms of period and cohort effects. Estimated mean percentage increases in the truncated age-standardized incidence rates (30-74 years), over the period 1973-1987, were largest for males in Bas-Rhin, France, where an increase of 12% per annum was reported, and for females in Cali, Colombia, where an increase of 9% per annum was reported. A significant increase of 10.9% per year (P < 0.001), over the period 1980-1989, has been reported from a specialist registry in France. The increase was more marked in rural than in urban areas, being 19.6% and 8.1%, respectively. This increase could not be attributed to HIV [7]. For UK cancer registries, increases for males ranged from 2.6% per annum in Birmingham to 8.4% per annum in South Thames. Corresponding increases for females were 2.2% and 7.6%, respectively.
These increases in the UK are larger than the 2% per year rise previously reported, despite there having been problems with diagnosis of NHL [8].

An age-period-cohort analysis [9] was carried out applying Holford's method [10] to NHL incidence data from Connecticut for the period 1935-1989, for both males and females. This indicated an increase in risk of approximately 2% per annum, since 1965, for males and females. In addition, age, non-linear period and non-linear cohort were statistically significant. A further study of Connecticut incidence data [11] shows that whilst incidence rates have increased up to the period 1985-1989, they have subsequently remained stable. The rate of increase itself has doubled since the late 1970s in males, but not in females. This is consistent with a strong impact of the HIV epidemic.

If increases of this magnitude persist there will be an effective doubling of NHL incidence rates by the year 2010 [12]. Neither the increased prevalence of HIV infection, nor the increase in the number of heart and kidney transplant patients who are given immunosuppressive drugs, seems a plausible explanation for increases of this magnitude [12]. A recent review [13] of risk factors does not suggest any particular reason for such an international increase, whilst a new hypothesis [14,15] proposes a role for exposure to sunlight, but this is speculative. Other hypotheses that have been proposed to explain the increase include dietary factors and occupational exposure to certain chemicals. During the late 1980s, the impact of AIDS is apparent among NHL rates for young and middle-aged men (20-54 years) in the USA, but not the UK [16-19]. A quantitative analysis has examined the effect of known and suspected artefacts and risk factors including problems with diagnosis, viruses, familial factors, medical conditions and drugs, radiation, occupation and environmental exposures.
It concluded that 80% of the rise in incidence among all males, and 42% among those aged 0-64, remained unexplained [20].

The aim of our study is to perform an age-period-cohort analysis of specially collected and carefully reviewed NHL incidence data for Yorkshire, UK. We use a log-linear Poisson model to assess the time trends between 1978 and 1991, taking into account age, sex and geographical location. We wish to clarify which components of time (i.e. age, period, and cohort) are important. Several methods of analysing for age, period and cohort effects are available [5,6,21,22] and there is, as yet, no general agreement as to which is most appropriate. The Clayton-Schifflers method [5,6] has the firmest statistical foundations, but provides less insight into the data than other methods such as those proposed by Robertson and Boyle [21] or De Carli and La Vecchia [22]. Little comparative information on their performance is available and we have taken this opportunity to compare three popular methods.

MATERIALS AND METHODS

Incidence Data

A specialist register of NHL cases in the Yorkshire Health Region has been maintained since 1977, based at first on a region-wide lymphoma panel [23] and subsequently on the Leukaemia Research Fund data collection study [24]. All the cases entered are verified histologically, and the use of overlapping data sources has made us confident that coverage of cases is nearly complete. Individual-level data were extracted from this register with sex, date of birth, date of diagnosis and county of residence at diagnosis. Analysis has been confined to subjects aged 30-84 years, since ascertainment in the elderly is unreliable and the number of cases aged under 30 years was small. There were only 156 NHL patients younger than 30 (5.5% of the total observed), and these were excluded.
Statistical Methods

The age-period-cohort model is given by

$$\mathrm{E}[\ln r_{ij}] = a_i + p_j + c_k,$$

where $A$ is the number of age-groups, $a_i$ is the effect of age-group $i$, $p_j$ is the effect of period $j$, $r_{ij}$ is the incidence rate, $k = A - i + j$ (*), and $c_k$ is the effect of birth cohort $k$.

FIGURE 1 Diagram showing Age-Period grouping split into Older (lower diagonal) and Younger (upper diagonal) cohorts (axes: AGE 60-64; PERIOD 1978-1982)

In age-period-cohort modelling, various approaches have been applied to resolve the identifiability problem, i.e. the fact that age, period and cohort are linearly dependent on one another (see * above). For example, suppose that a person aged 63 is diagnosed in 1979. Then, clearly, the year of birth is 1915 or 1916. If one solution to the full model is given by $(a, p, c)$, then another solution is given by $(a', p', c')$, where, with the effects written on the multiplicative scale (so that their logarithms are the terms of the model above),

$$\log a'_i = \log a_i + \alpha + \lambda (A - i), \qquad \log p'_j = \log p_j + \beta + \lambda j, \qquad \log c'_k = \log c_k - \alpha - \beta - \lambda k.$$

The parameters $\alpha$, $\beta$ and $\lambda$ are arbitrary, with $\alpha$ and $\beta$ corresponding to fixing the origin, whereas $\lambda$ indexes the 1-dimensional family of solutions. It is not possible to separate out linear effects due to all three components unless an extra constraint is applied.

Three methods of analysis are considered here: those proposed by Clayton and Schifflers [5,6], Robertson and Boyle [21], and De Carli and La Vecchia [22].

Clayton and Schifflers [5,6] do not make any extra assumptions or apply constraints. Under these circumstances, it is only possible to separate out non-linear effects of period and cohort. Regular increase or decrease is termed 'drift', and may be attributable to linear period, linear cohort, or both.

Robertson and Boyle [21] use extra data from individual records, and implicit assumptions concerning constancy of effects within period and/or cohort grouping, to solve the identifiability problem. Each age-period combination is associated with two cohorts, which do not overlap.
This breaks the linear dependency but there is a problem (Figure 1). The two cohorts have different average ages and periods of diagnosis. Suppose we consider age group 60-64, and period 1978-1982. Then, the two associated cohorts were born 1913-1917 (lower diagonal) and 1918-1922 (upper diagonal). Separating this age-period group into the two associated cohorts, the older cohort has average age 63 1/3 (lower diagonal), whereas the younger cohort has average age 61 2/3 (upper diagonal). Similarly, the older cohort has average date of diagnosis 1979 2/3 (lower diagonal), whereas the younger cohort has average date of diagnosis 1981 1/3 (upper diagonal). Where there is a strong age effect present, the implicit assumption of equal age effects is incorrect. An assumption has also been made that the numbers of people at risk in 1978-1982 and born 1913-1917 are uniformly distributed by year of birth. This problem may be overcome [25] by including in the model an Old/Young factor to distinguish between the two cohorts associated with each age-period combination. The interaction between Old/Young and Age is used to allow for the age differences between the cohorts. An analogous use is made of interactions between Old/Young and Period, and Old/Young and Cohort.

De Carli and La Vecchia [22] work with grouped data and use a penalty function to solve the identifiability problem. All three 'two-factor' models (age-period, age-cohort, and period-cohort) are fitted. The penalty function measures the Euclidean distance between the two-factor model estimates and the family of three-factor model estimates (age-period-cohort, see earlier), and chooses as the full model that weighted average of the three two-factor model solutions which minimizes the penalty function.

The same age groups (11 5-year groups: 30-34; 35-39; ... ; 80-84) and periods (1978-1982; 1983-1987; 1988-1991) were used throughout the present analysis. Note that the last period is only 4 years.
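As an illustration of the identifiability problem, the following Python sketch builds the cohort index for the 11 age groups and three periods and checks numerically that shifting the effects by an arbitrary amount leaves every fitted log rate unchanged. It assumes the indexing k = A - i + j (age group 1 youngest, period 1 earliest), which reproduces the cohort numbers of Figure 2. The effect values, the random seed and the constants `alpha`, `beta` and `lam` are arbitrary illustrative choices, not estimates from this study.

```python
import random

A, P = 11, 3  # 11 five-year age groups and 3 periods, as in the paper


def cohort_index(i, j, n_age=A):
    """Cohort number for age group i (1 = youngest) and period j (1 = earliest).

    Assumed indexing: k = A - i + j.
    """
    return n_age - i + j


# Cohort numbers laid out as a period-by-age grid (rows = periods)
grid = [[cohort_index(i, j) for i in range(1, A + 1)] for j in range(1, P + 1)]

# Arbitrary (hypothetical) log-scale effects for age, period and cohort
rng = random.Random(0)
a = {i: rng.gauss(0, 1) for i in range(1, A + 1)}
p = {j: rng.gauss(0, 1) for j in range(1, P + 1)}
c = {k: rng.gauss(0, 1) for k in range(1, A + P)}  # cohorts 1..13

# Alternative solution: shift origins by alpha, beta and tilt by lam
alpha, beta, lam = 0.3, -0.2, 0.7
a2 = {i: a[i] + alpha + lam * (A - i) for i in a}
p2 = {j: p[j] + beta + lam * j for j in p}
c2 = {k: c[k] - alpha - beta - lam * k for k in c}

# Largest difference in fitted log rate between the two parameter sets
max_diff = max(
    abs((a[i] + p[j] + c[cohort_index(i, j)])
        - (a2[i] + p2[j] + c2[cohort_index(i, j)]))
    for i in a for j in p
)

print(grid[0])   # cohort numbers for the first period, youngest age first
print(max_diff)  # zero up to floating-point rounding
```

Because the two parameter sets give identical fitted values, no amount of data on the age-period grid can choose between them; only an extra constraint or extra information can.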
For analysis using the methods of Clayton and Schifflers [5,6] and De Carli and La Vecchia [22], 13 birth cohorts were derived from the three time periods and 11 age groups. This yields overlapping cohorts (1893-1902; 1898-1907; ... ; 1948-1957; and 1953-1961). These are approximate cohorts, corresponding to the diagonals of the age × period table (Figure 2). Note that the last cohort is only 9 years.

FIGURE 2 Age × period table for the Clayton-Schifflers and De Carli-La Vecchia methods (entries are cohort numbers)

                                     AGE GROUP
PERIOD      30-34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70-74 75-79 80-84
1978-1982     11    10     9     8     7     6     5     4     3     2     1
1983-1987     12    11    10     9     8     7     6     5     4     3     2
1988-1991     13    12    11    10     9     8     7     6     5     4     3

COHORT   YEAR OF BIRTH
   1     1893-1902
   2     1898-1907
   3     1903-1912
   4     1908-1917
   5     1913-1922
   6     1918-1927
   7     1923-1932
   8     1928-1937
   9     1933-1942
  10     1938-1947
  11     1943-1952
  12     1948-1957
  13     1953-1961

For analysis by the method of Robertson and Boyle [21], the incidence data were stratified into 14 birth cohorts (1893-1897; 1898-1902; ... ; 1953-1957; and 1958-1961). The first 13 cohorts are all 5 years, whilst the last is 4 years. Note also that the first and last cohorts each apply to only one age-period combination.

All analyses were stratified by sex (male, female), and by the three constituent counties of the Yorkshire Regional Health Authority: West Yorkshire, Humberside and North Yorkshire.

A strong increase in the incidence of NHL with age is well known [1,9,23], as is a large difference between males and females. Hence, these terms were included in the models without testing. Main effects of other factors and interactions were tested by comparing the changes in deviances with the appropriate χ² distributions.

Applying the method of Clayton and Schifflers, the effects of county, drift, non-linear period and non-linear cohort were evaluated using a log-linear Poisson model, with a log link function and offset log(population years).
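The deviance comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the GENSTAT procedure used in the study: `chi2_sf` is a hypothetical helper that evaluates the upper tail of a χ² distribution with integer degrees of freedom from its closed-form series, and the deviances are those reported in Table 2 for the Clayton-Schifflers models.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x <= 0:
        return 1.0
    h = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= h / k
            total += term
        return math.exp(-h) * total
    # Odd df: erfc term plus a finite series
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-h)
    for k in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * k + 1)
    return total


def deviance_test(dev_reduced, df_reduced, dev_full, df_full):
    """Change in deviance, change in d.f. and P-value for two nested models."""
    change = dev_reduced - dev_full
    change_df = df_reduced - df_full
    return change, change_df, chi2_sf(change, change_df)


# Residual deviances and d.f. as reported in Table 2
county = deviance_test(325.1, 186, 314.7, 184)  # AGE+SEX vs AGE+SEX+COUNTY
drift = deviance_test(314.7, 184, 209.5, 183)   # adding DRIFT

print("COUNTY: change %.1f on %d d.f., P = %.4f" % county)
print("DRIFT:  change %.1f on %d d.f., P = %.2g" % drift)
```

With these inputs the county test gives a change of 10.4 on 2 d.f., in line with the reported P = 0.0055, and the drift test gives a P-value far below 0.0001.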
Applying the Robertson and Boyle methodology, the effects of county, period and cohort were evaluated in the same manner; both period and cohort factors could be separated into linear and non-linear components.

Interactions between county, sex, and the components of time (i.e. age, period, and cohort, or age, drift, non-linear period, non-linear cohort) were evaluated using the methods of Clayton and Schifflers and Robertson and Boyle.

The data were analysed using the package GENSTAT V [26]. Extra-Poisson variation was taken into account using the method of Breslow [27].

Population Data

For the Clayton-Schifflers and De Carli-La Vecchia methods of analysis, population data were obtained from the 1971 census (adjusted to 1974 boundaries by the Office of Population Censuses and Surveys (OPCS), owing to a major local government re-organization), and from the 1981 and 1991 Censuses [28]. An assumption was made that the census data referred to the populations at 1 April 1971, 1 April 1981 and 1 April 1991, respectively. For both sexes and each of the three constituent counties of Yorkshire (West Yorkshire, Humberside, North Yorkshire), population estimates were derived by linear interpolation between two adjacent censuses, for each of the 11 age groups (which correspond to available census data). For the Clayton-Schifflers and De Carli-La Vecchia methods of analysis, population estimates were used for the middle of each of the three periods (i.e. approximately 30 September 1980; 30 June 1985; and 1 January 1990). However, for the Robertson-Boyle method, estimates were derived for all years from 1978 to 1991. These estimates were assumed to be at mid-year points, and were used to construct population estimates for each county-sex-age group-period category. Following Robertson and Boyle [21], the population estimate for each of these categories was split into its two constituent cohorts.

RESULTS

A total of 2678 verified cases of NHL in those aged 30-84 during the period 1978-1991 were used for the analysis. Table 1 shows the number of cases, population years at risk and incidence rates. Figures 3a and 3b show the age-specific incidence curves for males and females separately. The truncated age-standardized incidence rates were 4.17 per 100 000 person years for males and 2.80 for females (30-84 years). The truncated age- and sex-standardized incidence rates for West Yorkshire, Humberside and North Yorkshire were 3.40, 3.23 and 3.81 per 100 000 person years, respectively (30-84 years).

TABLE 1 Incidence of non-Hodgkin's lymphoma, person years at risk and age-specific incidence rate (per 100 000 person years) for the period 1978-1991

Age-group            Number of cases   Person years at risk   Incidence rate
30-34                       68               3 604 634              1.89
35-39                       98               3 114 422              3.15
40-44                      121               3 174 627              3.81
45-49                      161               2 841 933              5.67
50-54                      206               2 778 572              7.41
55-59                      294               2 793 766             10.52
60-64                      336               2 565 733             13.10
65-69                      429               2 491 947             17.22
70-74                      437               2 052 960             21.29
75-79                      326               1 554 855             20.97
80-84                      202                 947 783             21.31
Total                     2678              27 921 232
Standardized rate (a)                                               3.45

(a) To the world population.

FIGURE 3a Age-specific incidence curves for males and three time periods 1978-1982, 1983-1987, 1988-1991 (axes: AGE; INCIDENCE RATE (/100 000))

FIGURE 3b Age-specific incidence curves for females and three time periods 1978-1982, 1983-1987, 1988-1991 (axes: AGE; INCIDENCE RATE (/100 000))

Clayton-Schifflers Method

Using the Clayton-Schifflers method [5,6], the main effects were tested (Tables 2 and 3). The effect of COUNTY, adjusting for age and sex, was highly significant (P = 0.0055). Thereafter, all tests were adjusted for age, sex and county. The effect of DRIFT was very highly significant (P < 0.0001). There was no evidence of Non-Linear PERIOD or Non-Linear COHORT effects.

TABLE 2 Main effects models, residual deviances, degrees of freedom and mean residual deviances, using the Clayton-Schifflers method

Model                              Residual deviance   d.f.   Mean residual deviance
AGE+SEX                                  325.1          186            1.75
AGE+SEX+COUNTY                           314.7          184            1.71
AGE+SEX+COUNTY+DRIFT                     209.5          183            1.14
AGE+SEX+COUNTY+PERIOD                    209.3          182            1.15
AGE+SEX+COUNTY+COHORT                    193.4          172            1.12
AGE+SEX+COUNTY+PERIOD+COHORT             193.0          171            1.13

When interactions were investigated, only one (COUNTY.COHORT, P = 0.085) approached statistical significance, but this could be attributed to the small numbers of cases aged 30-34 years (omitting this group, the P-value of COUNTY.COHORT = 0.23). The best fitting model included age, sex, county and drift. The standardized residuals from fitting this model exhibit slightly more variation than might be expected under the Poisson assumption. The test for overdispersion was of marginal significance (P = 0.086). Parameter estimates obtained by allowing for extra-Poisson variation, using the method of Breslow [27], were not much different from those obtained from the model which assumed only Poisson variation. The effects of drift and county remained highly significant. Figures 4a and 4b show plots of predicted values obtained from the best fitting model.

Robertson-Boyle Method

Using the Robertson-Boyle method [21], main effects were tested (Tables 4 and 5). All testing adjusted for age, sex and county. The effects of PERIOD and COHORT were both very highly significant (P < 0.0001). The effect of PERIOD, adjusting for cohort, was not as strong, although still significant (P = 0.011), whereas the effect of COHORT, adjusting for period, was still very highly significant (P = 0.0002). The interaction OLD/YOUNG.AGE was not significant (P = 0.21), nor were the interactions OLD/YOUNG.PERIOD and OLD/YOUNG.COHORT.
Linear-period and linear-cohort effects were both very highly significant (P < 0.0001). The non-linear cohort effect, adjusting for linear cohort, was also highly significant (P = 0.0063), but the non-linear period effect, adjusting for linear-period, was not significant (P = 0.65).

Interactions were investigated using various base models (Table 6). The interaction COUNTY.COHORT was very highly significant (P = 0.0007), but COUNTY.LINEAR-COHORT was not significant (P = 0.12). A further analysis was performed, omitting the reference age group (30-34 year olds). However, the COUNTY.COHORT interaction was still highly significant (P < 0.001). Hence, the best fitting model included age, sex, county, linear-period, cohort and county.cohort. Goodness-of-fit testing, carried out by looking at the standardized residuals, shows that the model fits well. There was no evidence of extra-Poisson variation. Figures 5a-c show plots of predicted values obtained from this model.

TABLE 3 Tests of main effects, using models of Table 2, Clayton-Schifflers method

Effect              Adjusted for                  Change deviance   Change d.f.   Significance
COUNTY              AGE, SEX                            10.4             2         P = 0.0055
COUNTY              AGE, SEX, DRIFT                     10.5             2         P = 0.0052
DRIFT               AGE, SEX, COUNTY                   105.2             1         P < 0.0001
NON-LINEAR PERIOD   AGE, SEX, COUNTY, DRIFT              0.2             1         P = 0.65
NON-LINEAR COHORT   AGE, SEX, COUNTY, DRIFT             16.1            11         P = 0.14
NON-LINEAR PERIOD   AGE, SEX, COUNTY, COHORT             0.4             1         P = 0.53
NON-LINEAR COHORT   AGE, SEX, COUNTY, PERIOD            16.3            11         P = 0.13

FIGURE 4a Incidence rates versus period predicted by the model: Age+County+Sex+Drift, for males in West Yorkshire, using the Clayton-Schifflers method (axes: MID-YEAR OF TIME PERIOD; INCIDENCE RATE (/100 000); one curve per age group)

FIGURE 4b Incidence rates versus cohort predicted by the model: Age+County+Sex+Drift, for males in West Yorkshire, using the Clayton-Schifflers method (axes: MID-YEAR OF BIRTH COHORT; INCIDENCE RATE (/100 000); one curve per age group)

TABLE 4 Main effects models, residual deviances, degrees of freedom and mean residual deviances, using the Robertson-Boyle method

Model                                           Residual deviance   d.f.   Mean residual deviance
AGE+SEX                                               559.5          384            1.46
AGE+SEX+COUNTY                                        549.1          382            1.44
AGE+SEX+COUNTY+PERIOD                                 443.5          380            1.17
AGE+SEX+COUNTY+COHORT                                 413.5          369            1.12
AGE+SEX+PERIOD+COHORT                                 414.7          369            1.12
AGE+SEX+COUNTY+PERIOD+COHORT                          404.5          367            1.10
AGE+SEX+COUNTY+LINEAR-PERIOD                          443.7          381            1.16
AGE+SEX+COUNTY+LINEAR-COHORT                          441.1          381            1.16
AGE+SEX+COUNTY+LINEAR-PERIOD+LINEAR-COHORT            432.8          380            1.14
AGE+SEX+COUNTY+LINEAR-PERIOD+COHORT                   405.2          368            1.10
AGE+SEX+COUNTY+LINEAR-COHORT+PERIOD                   432.4          379            1.14

TABLE 5 Tests of main effects, using models of Table 4, Robertson-Boyle method

Effect              Adjusted for                         Change deviance   Change d.f.   Significance
COUNTY              AGE, SEX                                   10.4             2         P = 0.0055
PERIOD              AGE, SEX, COUNTY                          105.6             2         P < 0.0001
COHORT              AGE, SEX, COUNTY                          135.6            13         P < 0.0001
COUNTY              AGE, SEX, PERIOD, COHORT                   10.2             2         P = 0.0061
PERIOD              AGE, SEX, COUNTY, COHORT                    9.0             2         P = 0.011
COHORT              AGE, SEX, COUNTY, PERIOD                   39.0            13         P = 0.0002
LINEAR-PERIOD       AGE, SEX, COUNTY                          105.4             1         P < 0.0001
LINEAR-COHORT       AGE, SEX, COUNTY                          108.0             1         P < 0.0001
NON-LINEAR PERIOD   AGE, SEX, COUNTY, LINEAR-PERIOD             0.2             1         P = 0.65
NON-LINEAR COHORT   AGE, SEX, COUNTY, LINEAR-COHORT            27.6            12         P = 0.0063
LINEAR-PERIOD       AGE, SEX, COUNTY, LINEAR-COHORT             8.3             1         P = 0.0040
LINEAR-COHORT       AGE, SEX, COUNTY, LINEAR-PERIOD            10.9             1         P = 0.0010

TABLE 6 Tests of interactions, Robertson-Boyle method

Effect                     Adjusted for                                                          Change deviance   Change d.f.   Significance
COUNTY.COHORT              AGE, COUNTY, SEX, LINEAR-PERIOD, COHORT                                     55.4            26         P = 0.0007
COUNTY.COHORT              AGE, COUNTY, SEX, COHORT                                                    55.4            26         P = 0.0007
COUNTY.COHORT              AGE, COUNTY, SEX, PERIOD, COHORT                                            55.5            26         P = 0.0007
COUNTY.NON-LINEAR-COHORT   AGE, COUNTY, SEX, LINEAR-PERIOD, COHORT, COUNTY.LINEAR-COHORT               51.1            24         P = 0.0010

FIGURE 5a Incidence rates versus cohort predicted by the model: Age+County+Sex+Linear-Period+Cohort+County.Cohort, for males in West Yorkshire, using the Robertson-Boyle method (axes: MID-YEAR OF BIRTH COHORT; INCIDENCE RATE (/100 000); one curve per age group)

FIGURE 5b Incidence rates versus cohort predicted by the model: Age+County+Sex+Linear-Period+Cohort+County.Cohort, for males in Humberside, using the Robertson-Boyle method (axes: MID-YEAR OF BIRTH COHORT; INCIDENCE RATE (/100 000); one curve per age group)

FIGURE 5c Incidence rates versus cohort predicted by the model: Age+County+Sex+Linear-Period+Cohort+County.Cohort, for males in North Yorkshire, using the Robertson-Boyle method (axes: MID-YEAR OF BIRTH COHORT; INCIDENCE RATE (/100 000); one curve per age group)

De Carli-La Vecchia Method

The penalty function method of De Carli and La Vecchia [22] was applied to the three 'two-factor' models of Clayton and Schifflers [5,6]. The estimates from the three 'two-factor' Clayton-Schifflers models [5,6] were used. The age-cohort model produces strong cohort effects. The period-cohort model produces negligible cohort effects, except for cohorts 1, 11 and 13. The age-period-cohort model produces small cohort effects that are evidently increasing from cohort 1 through to cohort 13, but in a 'non-linear' fashion.
Whilst the best fitting 'two-factor' model appears to be the period-cohort model, there seems to be stronger evidence for a non-linear cohort effect from the full model. The effects of cohorts 2-8 seem to be lower than those of cohorts 9-12. However, the method does not allow us to obtain standard errors.

INTERPRETATION

Clayton-Schifflers Method (Model: Age, Sex, County, Drift)

Figure 4a shows a graph of the estimates of the multiplicative effects of Drift, parameterized in terms of a linear period effect, for each of the 11 age groups. The estimates (Rate Ratios [RR]) in this graph are for males in West Yorkshire. Figure 4b shows a graph of the same multiplicative effect estimates (i.e. males in West Yorkshire), but with Drift parameterized in terms of a linear cohort effect. Estimates for females, Humberside and North Yorkshire can easily be obtained. Humberside has an RR of 0.92, indicating a slightly lower incidence, whereas North Yorkshire has an RR of 1.10, indicating a slightly higher incidence, than West Yorkshire (the baseline incidence). The incidence rates for females are 69% of those for males.

DRIFT was estimated to be 1.28 (95% confidence interval [CI]: 1.22-1.34), but it is not possible to say if it is due to period or cohort (or both) [5,6]. There is a slight problem with interpretation due to the final period being only 4 years in length, and the final cohort being only 9 years. The increase due to drift is attributed to every unit increase in time. Hence, we can calculate only approximate yearly increases. For the first two periods this increase is 5.1% per annum (95% CI: 4.1-6.1%). For the last two periods the increase is 5.7% per year (95% CI: 4.6-6.8%). For the first 12 cohorts, drift represents an increase of 5.1% per year (95% CI: 4.1-6.1%), whilst for the last two cohorts the annual increase is 5.7% (95% CI: 4.6-6.8%).
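The conversion from drift per time step to an approximate yearly increase can be sketched as follows. This is an illustration under stated assumptions rather than the authors' calculation: it treats the drift as a compound rate ratio, uses the rounded drift and confidence limits quoted in the text, and assumes that a step between full periods spans 5 years whereas the step into the shortened final period spans 4.5 years (mid-points of mid-1985 and the start of 1990).

```python
def annual_percent_increase(rate_ratio, years):
    """Convert a rate ratio per step of `years` into a compound annual % increase."""
    return (rate_ratio ** (1.0 / years) - 1.0) * 100.0


drift = 1.28       # drift per step, as quoted in the text
ci = (1.22, 1.34)  # 95% confidence limits, as quoted in the text

# Step between two full 5-year periods
five_year = annual_percent_increase(drift, 5.0)
five_year_ci = tuple(annual_percent_increase(r, 5.0) for r in ci)

# Step into the 4-year final period (assumed 4.5-year spacing of mid-points)
short_step = annual_percent_increase(drift, 4.5)

print(round(five_year, 1), [round(v, 1) for v in five_year_ci])
print(round(short_step, 1))
```

With the rounded inputs this gives roughly 5.1% per annum for a 5-year step and a slightly higher figure for the 4.5-year step, close to the 5.1% and 5.7% quoted above; the small remaining differences are presumably due to rounding of the published drift.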
Hence, the estimated increase in the incidence of NHL due to period of diagnosis or year of birth (or both) is between 5% and 6% per annum.

Robertson-Boyle Method (Model: Age, Sex, County, Linear-Period, Cohort, County.Cohort)

Figures 5a-c show graphs of the multiplicative effects of cohort for each of the three counties separately, and for each of the 11 age groups. The estimates are for males, but female estimates can be obtained by multiplying by 0.69 (as found by the Clayton-Schifflers method) [5,6].

The effect of linear-period was estimated to be 1.14 (95% CI: 1.04-1.24), indicating a yearly increase somewhere in the range 2.6% (95% CI: 0.8-4.4%) to 2.9% (95% CI: 0.9-4.9%), i.e. about 2.6% to 2.9% per annum. However, we have not yet looked at the effect of birth cohort. The linear increase due to cohort is 0.137 (95% CI: 0.042-0.238). This gives an approximate annual increase of 2.6% (95% CI: 0.8-4.4%). Multiplying by the increase due to linear period gives an estimated rise of 5.24% (95% CI: 1.88-8.71%).

De Carli-La Vecchia Method (Model: Age, Sex, County, Period, Cohort)

The full model estimates were closer to the Clayton-Schifflers model estimates than to the Robertson-Boyle model estimates (Figure 6a-c).

DISCUSSION

Methodological Issues

Three methods for performing an age-period-cohort analysis were used in this paper: those of Clayton and Schifflers [5,6], Robertson and Boyle [21], and De Carli and La Vecchia [22]. The main comparison is between the methods of Clayton-Schifflers [5,6] and Robertson-Boyle [21].

The Robertson and Boyle [21] method makes assumptions. The most important is that each age-period group can be split into two constituent cohorts. However, this is not correct and may lead to errors, since the two cohorts are assumed to have the same age and period effects, whereas their average ages and periods are different. This problem can be overcome by introducing an OLD/YOUNG cohort effect [25]. The second assumption is that the two birth cohorts uniformly make up the age-period grouping. This is probably reasonable in the young and middle age groups, but may be unreliable in the older age groups, where there may tend to be a bias in favour of the younger birth cohort, or in the birth cohorts comprising those involved in either of the two world wars. Clayton and Schifflers [5,6] argue against the method of Robertson and Boyle [21] because of the aforementioned assumptions.

The final assumption, which is necessary for all three approaches, is that the population changes in a linear fashion, within a county, between censuses. In the absence of information concerning migration, this or an alternative assumption is essential. The Robertson-Boyle method may be more sensitive to errors here, since the linear interpolation was applied to provide annual population estimates rather than mid-period ones.

A slight difference will occur between the estimates of drift obtained by the Robertson-Boyle [21] and Clayton-Schifflers [5,6] methods. The explanation lies in the difference between the Robertson-Boyle [21] and Clayton-Schifflers [5,6] population estimates for period 1 (1978-1982). The mid-period estimates for the Clayton-Schifflers method were obtained by linearly interpolating between the 1971 and 1981 census estimates. For the Robertson-Boyle method [21], individual-year estimates were required, so that there was linear interpolation between the 1971 and 1981 census estimates, and also between the 1981 and 1991 census estimates, to obtain the period estimate for 1978-1982. The differences we found are small, but noticeable (approximately < 1.5%).

There is no reason to expect differences between the effects of non-linear period obtained by either of these two methods. Three time periods are likely to be far too few to detect non-linear period effects, so it is not surprising that none were found.
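The linear-interpolation assumption discussed in this section can be made concrete with a short Python sketch. The census dates are those stated in the Population Data section; the three population counts are hypothetical values for a single county-sex-age group, and the helper names `interpolate_population` and `estimate` are introduced here for illustration only.

```python
from datetime import date


def interpolate_population(when, census_a, census_b):
    """Linearly interpolate a population count between two censuses.

    census_a and census_b are (date, count) pairs, census_a being the earlier.
    """
    (date_a, count_a), (date_b, count_b) = census_a, census_b
    fraction = (when - date_a).days / (date_b - date_a).days
    return count_a + fraction * (count_b - count_a)


# Hypothetical counts for one county-sex-age group (not real census figures)
census_1971 = (date(1971, 4, 1), 100_000)
census_1981 = (date(1981, 4, 1), 110_000)
census_1991 = (date(1991, 4, 1), 105_000)


def estimate(when):
    """Choose the bracketing pair of censuses, then interpolate."""
    if when <= census_1981[0]:
        return interpolate_population(when, census_1971, census_1981)
    return interpolate_population(when, census_1981, census_1991)


# Mid-period estimate for 1978-1982 (approximately 30 September 1980)
mid_period_1 = estimate(date(1980, 9, 30))

# Annual mid-year estimates; 1991 is omitted here because 1 July 1991 lies
# after the 1991 census and would need extrapolation
mid_years = {year: estimate(date(year, 7, 1)) for year in range(1978, 1991)}

print(round(mid_period_1))
print({year: round(value) for year, value in mid_years.items()})
```

In this sketch a single mid-period figure for 1978-1982 depends only on the 1971 and 1981 censuses, whereas the annual figures for the same period draw on the 1981-1991 interval as well, which is the source of the small difference in period 1 population estimates noted above.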
The Clayton-Schifflers method5,6 uses 13 overlapping 10-year cohorts, whereas the Robertson-Boyle method21 uses 14 5-year cohorts. Since each age-period group is associated with two cohorts using the Robertson-Boyle method, and since there are three periods, changes over four consecutive cohorts may be detected for each age group. However, using the Clayton-Schifflers method,5,6 each age-period group is associated with only one cohort, so that changes over three consecutive cohorts may be detected. Changes due to cohort therefore span a 10-year difference in mid-cohort birth year using the Clayton-Schifflers method,5,6 whilst they span a 15-year difference using the Robertson-Boyle method.21 Although the entire time spanned by both approaches is 20 years, the Clayton-Schifflers cohort effects will be smoothed because of the overlap, whereas the Robertson-Boyle cohort effects will not. Therefore, the Robertson-Boyle method may have a greater chance of detecting non-linear cohort effects than the Clayton-Schifflers method. However, we are particularly cautious concerning the Robertson-Boyle21 estimates because of the assumptions involved in estimating cohort populations and because the standard errors of the estimates are much greater than those obtained by the Clayton-Schifflers method. Conversely, Clayton-Schifflers5,6 may be too conservative, because real effects may be smoothed out.

In conclusion, the Clayton-Schifflers method5,6 only requires data on age group and period of diagnosis and hence can use either individual or grouped data, whereas the Robertson-Boyle method21 requires individual information on year of birth as well.
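The cohort counts quoted above (13 versus 14 cohorts, three versus four consecutive cohorts per age group, and mid-cohort spans of 10 versus 15 years) follow from simple index arithmetic on the age-period grid. The sketch below is only an illustration: it treats all 11 age groups and all three periods as exactly 5 years wide, which is a simplifying assumption, and the function names are hypothetical.

```python
N_AGE_GROUPS = 11  # 5-year age groups covering ages 30-84
N_PERIODS = 3      # three periods of diagnosis
WIDTH = 5          # assumed common width, in years, of age groups and periods


def cs_cohorts(age, period):
    """One overlapping 10-year cohort per age-period cell (Clayton-Schifflers)."""
    return [period - age]


def rb_cohorts(age, period):
    """Older and younger 5-year cohorts within each cell (Robertson-Boyle)."""
    return [period - age - 1, period - age]


def distinct_cohorts(cell_cohorts):
    """All cohort indices that occur anywhere on the age-period grid."""
    return sorted({c
                   for a in range(N_AGE_GROUPS)
                   for p in range(N_PERIODS)
                   for c in cell_cohorts(a, p)})


def cohorts_for_age(cell_cohorts, age):
    """Cohort indices observed for a single age group across the periods."""
    return sorted({c for p in range(N_PERIODS) for c in cell_cohorts(age, p)})


def mid_birth_year_span(cell_cohorts, age=0):
    """Years between the first and last mid-cohort birth year for one age group."""
    idx = cohorts_for_age(cell_cohorts, age)
    return (idx[-1] - idx[0]) * WIDTH


if __name__ == "__main__":
    for name, fn in (("Clayton-Schifflers", cs_cohorts), ("Robertson-Boyle", rb_cohorts)):
        print(name, len(distinct_cohorts(fn)),
              len(cohorts_for_age(fn, 0)), mid_birth_year_span(fn))
```

Adding the cohort width to each span (10 + 10 years for the overlapping cohorts, 15 + 5 years for the non-overlapping ones) gives the 20 years covered by both approaches within an age group.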
Since the Robertson-Boyle method21 relies on the accuracy of 5-year non-overlapping birth cohort population estimates, whereas Clayton-Schifflers5,6 uses 10-year overlapping birth cohort population estimates, the amount of error in these estimates is likely to be much greater using Robertson-Boyle21 results rather than Clayton-Schifflers.5,6 Therefore, we suggest caution concerning Robertson-Boyle21 results.

FIGURE 6a Multiplicative age effects, obtained using the method of De Carli and La Vecchia (DC-LV), Clayton and Schifflers (C-S), and Robertson and Boyle (R-B). Axes: multiplicative effects of age against mid-year of age group.

FIGURE 6b Multiplicative period effects, obtained using the method of De Carli and La Vecchia (DC-LV), Clayton and Schifflers (C-S), and Robertson and Boyle (R-B). Axes: multiplicative effects of period against mid-year of time period.

FIGURE 6c Multiplicative cohort effects, obtained using the method of De Carli and La Vecchia (DC-LV), Clayton and Schifflers (C-S), and Robertson and Boyle (R-B). Axes: multiplicative effects of cohort against mid-year of birth cohort.

Epidemiological Results

Our main concern, now, is to interpret the results, taking into account the differences between the approaches of Clayton-Schifflers5,6 and Robertson-Boyle.21 Both methods agree that there are large county and sex differences, as expected. They estimate that there is a linear increase in NHL incidence of about 5% per year. The estimate (for drift) obtained by the Clayton-Schifflers method5,6 is 5.09% per year (95% CI: 4.09-6.09%), whilst that from the Robertson-Boyle method21 is 5.24% per year (95% CI: 1.88-8.71%).
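The arithmetic behind these annual figures can be checked with a short sketch. It assumes that the linear-period effect of 1.14 reported in the Results refers to a step between adjacent periods of about 5 years; the 4.5-year alternative in the example is our guess at the origin of the upper figure of 2.9% and is not stated in the text. The function names are hypothetical.

```python
import math


def annual_percent(effect, years):
    """Annual percentage increase implied by a multiplicative effect over `years`."""
    return (effect ** (1.0 / years) - 1.0) * 100.0


def combine_percent(first, second):
    """Combine two annual percentage increases multiplicatively."""
    return ((1.0 + first / 100.0) * (1.0 + second / 100.0) - 1.0) * 100.0


def doubling_time(percent_per_year):
    """Years needed for incidence to double at a constant annual increase."""
    return math.log(2.0) / math.log(1.0 + percent_per_year / 100.0)


if __name__ == "__main__":
    # Linear-period effect of 1.14, assuming a 5-year or a 4.5-year step.
    print(annual_percent(1.14, 5.0), annual_percent(1.14, 4.5))
    # Annual period (2.57%) and cohort (2.60%) components quoted in the text.
    print(combine_percent(2.57, 2.60))
    # Doubling times implied by the two drift estimates.
    print(doubling_time(5.09), doubling_time(5.24))
```

Under these assumptions the combined period and cohort components come to about 5.24% per year, and either drift estimate implies a doubling of incidence in roughly 14 years, consistent with the doubling within 15 years quoted in the abstract.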
Whilst the linear cohort and period increases are unidentifiable by the Clayton-Schifflers analysis,5,6 the Robertson-Boyle approach21 estimates that 2.57% per annum is due to period, and about 2.60% per annum is due to cohort. Non-linear period effects were not significant using either method. The major difference between the Robertson-Boyle21 and Clayton-Schifflers5,6 results lies in the non-linear cohort effects and the interaction between county and cohort, which were both highly significant (P < 0.001) using the Robertson-Boyle21 method, but were not significant (P = 0.13 and P = 0.09, respectively) using the Clayton-Schifflers5,6 method.

Estimates obtained from the De Carli-La Vecchia method22 were compared with those of Clayton-Schifflers5,6 and Robertson-Boyle.21 De Carli-La Vecchia22 period and cohort effects lie between those of Clayton-Schifflers and Robertson-Boyle,21 but are much closer to the Clayton-Schifflers solution (Figures 6a-c). The period effects are similar, and the cohort effects, while slightly higher, are of a similar order of magnitude to the Clayton-Schifflers results. Hence, this method seems to support the Clayton-Schifflers5,6 solution.

Observed age-specific incidence rates were calculated applying the definitions of cohort used by the Clayton-Schifflers5,6 and Robertson-Boyle21 methods. Examining changes in age-specific incidence rates for males and females separately, by cohort, shows increases at all ages over 35 for males, and larger increases over 65. The increases start at 40 for females. Female incidence is approximately two-thirds that for males. Clayton-Schifflers5,6 and Robertson-Boyle21 show similar increases in age-specific incidence rates, but Robertson-Boyle21 rates were much larger in magnitude. Examining changes in age-specific incidence rates for each county, by cohort, using the Robertson-Boyle method,21 shows large differences between the three counties.
In all three counties, there are striking increases for the 65-84 year olds. For West Yorkshire, there is a similar change in age groups within this range, whereas for Humberside and North Yorkshire, the effects differ widely. There is little change within the 30-64 year olds in Humberside, except for 50-54 year olds. In West Yorkshire, there are modest increases in the 55-64 year old group, and in North Yorkshire in all those aged 40-64. Clayton-Schifflers5,6 shows large increases in all those aged 65-84, and modest increases in 40-64 year olds, but these increases are fairly consistent across counties.

Examining age-specific incidence rates by sex and by county over the three periods clearly shows very large increases for 65-84 year olds, and modest increases for 40-64 year olds. There are differences in magnitude between the sexes, but very little difference between Clayton-Schifflers5,6 and Robertson-Boyle,21 as expected.

In conclusion, we have given an example where the Clayton-Schifflers5,6 method appears to be conservative not only on the interpretation of linear effects, but also on non-linear cohort effects. Robertson-Boyle's method21 shows a 'DRIFT' effect, which is allocated equally to PERIOD and COHORT, and also provides evidence of strong (i.e. very highly significant) non-linear cohort and COUNTY.COHORT interaction effects. Of these, the Clayton-Schifflers5,6 method only shows the drift. De Carli-La Vecchia gives a solution which appears close to that of Clayton-Schifflers. The increase in incidence is greater in the older age groups. It is important not to over-interpret the results of the methods of Robertson and Boyle, and De Carli and La Vecchia.

Finally, there has been an annual increase of between 5% and 6% in the incidence of NHL in Yorkshire. This increase is larger than the one previously reported for two reasons. First, more recent data are included, and secondly, this analysis considers specifically the ages 30-84.
Clearly, there is cause for concern about this increase in NHL incidence, although it is possible that it may partly be attributed to improved diagnostic accuracy.

ACKNOWLEDGEMENTS

Mrs Beverley Cooper and Mrs Agnes McKeating are thanked for typing the manuscript. We gratefully acknowledge the invaluable comments from two referees. The 1981 and 1991 Census data are Crown copyright, from the ESRC purchase.

REFERENCES

1 Parkin D M, Muir C S, Whelan S L, Gao Y-T, Ferlay J, Powell J (eds). Cancer Incidence in Five Continents: Vol VI. Lyon: IARC, 1992.
2 Devesa S S, Fears T. Non-Hodgkin's lymphoma time trends: United States and international data. Cancer Res 1992; 52 (Suppl.): 5432-40.
3 Coleman M P, Esteve J, Damiecki P, Arslan A, Renard H. Trends in Cancer Incidence and Mortality. Lyon: IARC, 1993.
4 Blair V, Birch J M. Patterns and temporal trends in the incidence of malignant disease in children: I: Leukaemia and lymphoma. Eur J Cancer 1994; 30A: 1490-98.
5 Clayton D, Schifflers E. Models for temporal variation in cancer rates. I: Age-Period and Age-Cohort models. Stat Med 1987; 6: 449-67.
6 Clayton D, Schifflers E. Models for temporal variation in cancer rates. II: Age-Period-Cohort models. Stat Med 1987; 6: 469-81.
7 Carli P M, Boutron M C, Maynadie M, Bailly F, Caillot D, Petrella T. Increase in the incidence of non-Hodgkin's lymphomas: Evidence for a recent sharp increase in France independent of AIDS. Br J Cancer 1994; 70: 713-15.
8 Cartwright R A. Changes in the descriptive epidemiology of non-Hodgkin's lymphoma in Great Britain. Cancer Res 1992; 52 (Suppl.): 5441-42.
9 Holford T R, Zheng T, Mayne S T, McKay L A. Time trends of non-Hodgkin's lymphoma: are they real? Cancer Res 1992; 52 (Suppl.): 5432-40.
10 Holford T R. The estimation of age, period and cohort effects for vital rates. Biometrics 1983; 39: 311-24.
11 Polednak A P. Trends in cancer incidence in Connecticut, 1935-1991. Cancer 1994; 74: 2863-72.
12 Cancer in South East England 1991. Thames Cancer Registry, Sutton, 1994.
13 Cartwright R A, McNally R J Q. Epidemiology of Non-Hodgkin's lymphoma. Cambridge Medical Reviews. Haematol Oncol 1994; 3: 1-34.
14 Cartwright R A, McNally R J Q, Staines A. The increasing incidence of non-Hodgkin's lymphoma (NHL): The possible role of sunlight? Leukemia and Lymphoma 1994; 14: 387-94.
15 Zheng T, Mayne S T, Boyle P et al. Epidemiology of non-Hodgkin's lymphoma in Connecticut 1935-1988. Cancer 1992; 70: 840-49.
16 Levine A M, Shibata D, Sullivan Halley J et al. Epidemiological and biological study of acquired immunodeficiency syndrome-related lymphoma in the County of Los Angeles: Preliminary results. Cancer Res 1992; 19 (Suppl.): 5482-84.
17 McKinney P A, Alexander F E, Cartwright R A. Epidemiology of non-Hodgkin's lymphoma in parts of England and Wales (1984-1988). Leukemia and Lymphoma 1991; 5: 123-30.
18 CDR. Human Immunodeficiency Virus infection in the UK. Vol. /. London: Public Health Laboratory Service, 1988. 1
19 Serraino D, Salamina G, Franceschi S et al. The epidemiology of AIDS-associated non-Hodgkin's lymphoma in the World Health Organization European Region. Br J Cancer 1992; 66: 912-16.
20 Hartge P, Devesa S S. Quantification of the impact of known risk factors on time trends in non-Hodgkin's lymphoma incidence. Cancer Res 1992; 52 (Suppl.): 5566-69.
21 Robertson C, Boyle P. Age, period and cohort models: The use of individual records. Stat Med 1986; 5: 527-38.
22 De Carli A, La Vecchia C. Age, period and cohort models: A review of knowledge and implementation in GLIM. Rivista di Statistica Applicata 1987; 20: 397-410.
23 Bird C C, Lauder I, Kellett H S, Barnes N, Cartwright R A. Yorkshire Regional Lymphoma Histology Panel: Analysis of five years experience. J Path 1984; 143: 249-58.
24 Cartwright R A, Alexander F E, McKinney P A, Ricketts T J. Leukaemia and Lymphoma: An Atlas of Distribution within Areas of England and Wales 1984-1988. London: Leukaemia Research Fund, 1990.
25 Robertson C, Boyle P. Age-period-cohort models: A comparison of methods. In: Fahrmeir L, Francis B, Gilchrist R, Tutz G. Advances in GLIM and Statistical Modelling. Basel: Springer, 1992.
26 Payne R W, Lane P W, Ainsley A E et al. Genstat V Reference Manual. Oxford: Clarendon Press, 1987.
27 Breslow N E. Extra-Poisson variation in log-linear models. Appl Stat 1984; 33: 38-44.
28 Office of Population Censuses and Surveys. Census 1971: Reports for the Counties of England and Wales as Constituted on 1st April 1974. London: HMSO, 1975.

(Revised version received May 1996)