International Journal of Epidemiology
Vol. 26, No.1
Printed in Great Britain
© International Epidemiological Association 1997
A Comparison of Three Methods of
Analysis for Age-Period-Cohort
Models with Application to Incidence
Data on Non-Hodgkin's Lymphoma
RICHARD J Q MCNALLY,* FREDA E ALEXANDER,** ANTHONY STAINES* AND
RAYMOND A CARTWRIGHT*
McNally R J Q (Leukaemia Research Fund Centre for Clinical Epidemiology, University of Leeds, 17 Springfield Mount,
Leeds LS2 9NG, UK), Alexander F E, Staines A and Cartwright R A. A comparison of three methods of analysis for
age-period-cohort models with application to incidence data on non-Hodgkin's lymphoma. International Journal of
Epidemiology 1997; 26: 32-46.
Background. Various methods of analysis have been used to study age-period-cohort models. The main aim of this
paper is to illustrate and compare three such methods: those of Clayton and Schifflers, Robertson and Boyle, and De Carli
and La Vecchia. The main differences between these methods lie in how they distinguish between linear-period
and linear-cohort effects. Clayton and Schifflers do not attempt to solve this identification problem, whereas Robertson
and Boyle, and De Carli and La Vecchia, do.
Method. In order to study the assumptions and problems of these methods, we analysed data from 2678 subjects aged
30-84 in Yorkshire, UK, who were diagnosed with non-Hodgkin's lymphoma (NHL) during the period 1978-1991. Log-linear Poisson models were used to examine the effects of age, period and cohort.
Results. All three methods of analysis agree that, after stratification for sex and county, the age-standardized rate has
been increasing at about 5% per year. The Robertson-Boyle method differed from the Clayton-Schifflers method in
showing a significant non-linear cohort effect, and a significant county-cohort interaction. The method of De Carli-La Vecchia agreed more closely with Clayton-Schifflers than with Robertson-Boyle.
Conclusions. The linear increase in incidence would lead to a doubling of the number of cases within 15 years. There is
controversy over whether the identification problem can be solved and should be solved. Many authors would not rely on
the results of the methods of Robertson and Boyle, or De Carli and La Vecchia.
Keywords: age-period-cohort models, time trends, epidemiological methods, identification problem, non-Hodgkin's
lymphoma, United Kingdom
Incidence rates for non-Hodgkin's lymphoma vary
widely throughout the world. 1 Higher rates are
observed from those cancer registries which have
predominantly fair-skinned populations.
There is a consistent excess in males (generally around
50 to 100%) at all ages. For males, age-standardized incidence rates, obtained from cancer registry data and
standardized to the world population, range from a low
of 1.4 per 100 000 person years in Bamako, Mali to a
high of 17.4 per 100 000 person years among whites in
the San Francisco Bay area (USA).1
Corresponding rates for females range from a low of
0.4 per 100 000 person years in Bamako, Mali to a high
of 10.6 per 100 000 person years in Manitoba, Canada. 1
In the UK, age-standardized rates (to the world
population) for males range from 6.5 per 100 000
person years in Birmingham to 9.5 per 100 000 person
years in North Scotland. 1
Overall, both mortality and incidence rates have been
increasing in virtually all registries, although the rates
of increase vary. The increase is more rapid than for
any other cancers except prostate cancer, melanoma for
both sexes and lung cancer among women.l>' The
increases were somewhat larger amongst older people,
particularly during the 1950s and 1960s, which suggests a role for improved diagnosis.i
Data from North West England suggest no increase
in the incidence of NHL in children aged 0-14, over the
35-year time period 1954-1988.4
The largest increases, observed from cancer registry
data, and based on an age-period-cohort analysis using
the Clayton-Schifflers method 5,6 or a closely related
method, generally occurred in Europe and North
America. The best fitting models varied greatly
between registries throughout the world, so no overall
conclusions can be drawn in terms of period and cohort
effects. Estimated mean percentage increases in the
truncated age-standardized incidence rates (30-74
years), over the period 1973-1987, were largest for
males in Bas-Rhin, France, where an increase of 12%
per annum was reported, and for females in Cali,
Colombia, where an increase of 9% per annum was
reported.
* Leukaemia Research Fund Centre for Clinical Epidemiology,
University of Leeds, 17 Springfield Mount, Leeds LS2 9NG, UK.
** Department of Public Health Sciences, The University of
Edinburgh, Medical School, Teviot Place, Edinburgh EH8 9AG, UK.
A significant increase of 10.9% per year (P < 0.001),
over the period 1980-1989, has been reported from a
specialist registry in France. The increase was more
marked in rural than in urban areas, being 19.6% and
8.1%, respectively. This increase could not be attributed to HIV. 7
For UK cancer registries, increases for males ranged
from 2.6% per annum in Birmingham to 8.4% per
annum, in South Thames. Corresponding increases for
females were 2.2% and 7.6%, respectively. These increases in the UK are larger than the 2% per year rise
previously reported, despite there having been problems with diagnosis of NHL. 8
An age-period-cohort analysis 9 was carried out
applying Holford's method 10 to NHL incidence data
from Connecticut for the period 1935-1989, for both
males and females. This indicated an increase in risk of
approximately 2% per annum, since 1965, for males
and females. In addition, age, non-linear period and
non-linear cohort were statistically significant.
A further study of Connecticut incidence data 11
shows that whilst incidence rates have increased up
to the period 1985-1989, they have subsequently
remained stable. The rate of increase itself has doubled
since the late 1970s in males, but not in females. This is
consistent with a strong impact of the HIV epidemic.
If increases of this magnitude persist there will be an
effective doubling of NHL incidence rates by the year
2010. 12 Neither the increased prevalence of HIV infection, nor the increase in the number of heart and kidney
transplant patients who are given immunosuppressive
drugs seem plausible explanations for increases of this
magnitude. 12
A recent review 13 of risk factors does not suggest any
particular reason for such an international increase,
whilst a new hypothesis 14,15 proposes a role for exposure to sunlight, but this is speculative. Other hypotheses that have been proposed to explain the increase
include dietary factors and occupational exposure to
certain chemicals.P During the late 1980s, the impact
of AIDS is apparent among NHL rates for young and
middle-aged men (20-54 years) in the USA, but not the
UK. 16-19
A quantitative analysis has examined the effect of
known and suspected artefacts and risk factors including problems with diagnosis, viruses, familial
factors, medical conditions and drugs, radiation,
occupation and environmental exposures. It concluded
that 80% of the rise in incidence among all males, and
42% among those aged 0-64, remained unexplained.20
The aim of our study is to perform an age-period-cohort analysis of specially collected and carefully reviewed NHL incidence data for Yorkshire, UK. We use
a log-linear Poisson model to assess the time trends
between 1978 and 1991, taking into account age, sex
and geographical location. We wish to clarify which
components of time (i.e. age, period, and cohort) are
important.
Several methods of analysing for age, period and
cohort effects are available 5,6,21,22 and there is, as yet,
no general agreement as to which is most appropriate.
The Clayton-Schifflers method 5,6 has the firmest
statistical foundations, but provides less insight into the
data than other methods such as those proposed by
Robertson and Boyle 21 or De Carli and La Vecchia.22
Little comparative information on their performance is
available and we have taken this opportunity to compare three popular methods.
MATERIALS AND METHODS
Incidence Data
A specialist register of NHL cases in the Yorkshire Health
Region has been established, since 1977, based, at first,
on a region-wide lymphoma panel,23 and subsequently
on the Leukaemia Research Fund data collection study.24
All the cases entered are verified histologically and the
use of overlapping data sources has made us confident
that coverage of cases is nearly complete.
Personal level data were extracted from this register
with sex, date of birth, date of diagnosis and county of
residence at diagnosis. Analysis has been confined to
subjects aged 30-84 years since ascertainment in the
elderly is unreliable and the number of cases aged under
30 years was small. There were only 156 NHL patients
younger than 30, 5.5% of the total observed, and these
were excluded.
Statistical Methods
The age-period-cohort model is given by
$$E[\ln r_{ij}] = a_i + p_j + c_k,$$
where $A$ is the number of age-groups, $a_i$ is the effect of
age-group $i$, $p_j$ is the effect of period $j$, $r_{ij}$ is the incidence rate, $k = A - i + j$,* and $c_k$ is the effect of birth
cohort $k$.
FIGURE 1 Diagram showing Age-Period grouping split into Older (lower diagonal) and Younger (upper
diagonal) cohorts (axes: age, 60-64 years; period, 1978-1982)
In age-period-cohort modelling, various approaches
have been applied to resolve the identifiability problem,
i.e. the fact that age, period and cohort are linearly
dependent on one another (see * above). For example,
suppose that a person aged 63 is diagnosed in 1979.
Then, clearly, the year of birth is 1915 or 1916.
If one solution to the full model is given by $(a, p, c)$,
then another solution is given by $(a', p', c')$, where, with the effects written on the multiplicative scale,
$\log a'_i = \log a_i + \alpha + \lambda(A - i)$, $\log p'_j = \log p_j + \beta + \lambda j$, and
$\log c'_k = \log c_k - \alpha - \beta - \lambda k$.
The parameters $\alpha$, $\beta$ and $\lambda$ are arbitrary, with $\alpha$ and $\beta$
corresponding to fixing the origin, whereas $\lambda$ indicates
the 1-dimensional family of solutions.
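The non-identifiability described above can be checked numerically. The following is a minimal sketch, not part of the original analysis: it assumes the cohort index k = A - i + j implied by the age X period table, writes the effects directly on the log scale, and uses made-up effect values chosen only for illustration. It shows that the transformed parameters give the same fitted log rate in every age-period cell.

```python
# Illustration of the identification problem: two different sets of age,
# period and cohort effects produce identical fitted log rates.
A, P = 11, 3  # number of age groups and periods used in this study


def cohort(i, j):
    """Cohort index for age group i and period j (both 1-based)."""
    return A - i + j


def log_rate(a, p, c, i, j):
    """Fitted log rate for one age-period cell."""
    return a[i] + p[j] + c[cohort(i, j)]


# Hypothetical log-scale effects (illustrative values, not study estimates).
a = {i: 0.25 * i for i in range(1, A + 1)}
p = {j: 0.10 * j for j in range(1, P + 1)}
c = {k: 0.05 * k for k in range(1, A + P)}  # cohorts 1..13


def transform(a, p, c, alpha, beta, lam):
    """Apply the transformation with arbitrary alpha, beta and lambda."""
    a2 = {i: v + alpha + lam * (A - i) for i, v in a.items()}
    p2 = {j: v + beta + lam * j for j, v in p.items()}
    c2 = {k: v - alpha - beta - lam * k for k, v in c.items()}
    return a2, p2, c2


a2, p2, c2 = transform(a, p, c, alpha=0.3, beta=-0.2, lam=0.7)

# Largest difference in fitted log rate over all 33 cells.
max_diff = max(
    abs(log_rate(a, p, c, i, j) - log_rate(a2, p2, c2, i, j))
    for i in range(1, A + 1)
    for j in range(1, P + 1)
)
print("largest difference in fitted log rates:", max_diff)
```

Although the period and cohort slopes differ between the two parameter sets, the data cannot distinguish them, which is why an extra constraint or extra information is needed to separate linear period from linear cohort effects.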
It is not possible to separate out linear effects due
to all three components, unless an extra constraint is
applied. Three methods of analysis are considered
here: those proposed by Clayton and Schifflers,5,6
Robertson and Boyle,21 and De Carli and La Vecchia.22
Clayton and Schifflers 5,6 do not make any extra
assumptions or apply constraints. Under these circumstances, it is only possible to separate out non-linear
effects of period and cohort. Regular increase or decrease is termed 'drift', and may be attributable to
linear period, linear cohort, or both.
Robertson and Boyle 21 use extra data from individual
records and implicit assumptions concerning constancy
of effects within period and/or cohort grouping to solve
the identifiability problem. Each age-period combination is associated with two cohorts, which do not
overlap.
This breaks the linear dependency but there is a
problem (Figure 1). The two cohorts have different
average ages and periods of diagnosis. Suppose we
consider age group 60-64, and period 1978-1982.
Then, the two associated cohorts were born 1913-1917
(lower diagonal) and 1918-1922 (upper diagonal).
Separating this age-period group into the two associated cohorts, the older cohort has average age 63⅓
(lower diagonal), whereas the younger cohort has
average age 61⅔ (upper diagonal).
Similarly, the older cohort has average date of
diagnosis 1979⅔ (lower diagonal), whereas the younger
cohort has average date of diagnosis 1981⅓ (upper
diagonal). Where there is a strong age effect present,
the implicit assumption of equal age effects is incorrect.
An assumption has also been made that the numbers of
people at risk in 1978-1982 and born 1913-1917 are
uniformly distributed by year of birth.
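The average ages and dates of diagnosis for the two cohorts can be obtained as the centroids of the two triangles into which the birth-cohort diagonal divides the age-period square. The sketch below is an illustration only; it assumes that person-time is spread uniformly over each triangle, which is the same uniformity assumption noted above, and the function name is hypothetical.

```python
from fractions import Fraction


def centroid(vertices):
    """Centroid of a triangle given as three (date, age) points."""
    n = len(vertices)
    mean_date = sum(Fraction(v[0]) for v in vertices) / n
    mean_age = sum(Fraction(v[1]) for v in vertices) / n
    return mean_date, mean_age


# Age group 60-64 and period 1978-1982 occupy the square
# [1978, 1983) x [60, 65) on a (date, age) diagram. The diagonal
# date - age = 1918 separates the two birth cohorts.
t0, t1, a0, a1 = 1978, 1983, 60, 65
older = [(t0, a0), (t0, a1), (t1, a1)]    # born 1913-1917
younger = [(t0, a0), (t1, a0), (t1, a1)]  # born 1918-1922

for label, triangle in (("older", older), ("younger", younger)):
    mean_date, mean_age = centroid(triangle)
    print(label, "mean date:", float(mean_date), "mean age:", float(mean_age))
```

Under this assumption the older cohort is on average one and two-thirds years older, and diagnosed one and two-thirds years earlier, than the younger cohort in the same age-period cell.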
This problem may be overcome 25 by including in the
model an Old/Young factor to distinguish between the
two cohorts associated with each age-period combination.
The interaction between Old/Young and Age is used
to allow for the age differences between the cohorts.
An analogous use is made of interactions between
Old/Young and Period, and Old/Young and Cohort.
De Carli and La Vecchia 22 work with grouped data
and use a penalty function to solve the identifiability
problem. All three 'two-factor' models (age-period,
age-cohort, and period-cohort) are fitted. The penalty
function measures the Euclidean distance between the
two-factor model estimates and the family of three-factor model estimates (age-period-cohort; see earlier),
and chooses as the full model that weighted average of
the three two-factor model solutions which minimizes
the penalty function.
The same age groups (11 5-year groups: 30-34;
35-39; ... ; 80-84) and periods (1978-1982; 1983-1987; 1988-1991) were used throughout the present analysis. Note that the last period is only 4 years. For
analysis using the methods of Clayton and Schifflers 5,6
and De Carli and La Vecchia,22 13 birth cohorts were
derived from the three time periods and 11 age groups.
This yields overlapping cohorts (1893-1902; 1898-1907; ... ; 1948-1957; and 1953-1961). These are
approximate cohorts, corresponding to the diagonals of
the age X period table (Figure 2). Note that the last
cohort is only 9 years.

FIGURE 2 Age X period table for the Clayton-Schifflers and De Carli-La Vecchia methods

Cohort number, by period and age group:

PERIOD      30-34  35-39  40-44  45-49  50-54  55-59  60-64  65-69  70-74  75-79  80-84
1978-1982      11     10      9      8      7      6      5      4      3      2      1
1983-1987      12     11     10      9      8      7      6      5      4      3      2
1988-1991      13     12     11     10      9      8      7      6      5      4      3

Year of birth, by cohort:

COHORT  YEAR OF BIRTH
 1      1893-1902
 2      1898-1907
 3      1903-1912
 4      1908-1917
 5      1913-1922
 6      1918-1927
 7      1923-1932
 8      1928-1937
 9      1933-1942
10      1938-1947
11      1943-1952
12      1948-1957
13      1953-1961
For analysis by the method of Robertson and
Boyle,21 the incidence data were stratified into 14 birth cohorts (1893-1897; 1898-1902; ... ; 1953-1957; and
1958-1961). The first 13 cohorts are all 5 years, whilst
the last is 4 years. Note also that the first and last
cohorts each apply to only one age-period combination.
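The two cohort schemes can be derived mechanically from the age groups and periods. The sketch below is illustrative only: the function names are hypothetical, the cohort index uses k = A - i + j, and the split into two 5-year cohorts simply cuts each cell's range of birth years after its first five years, which is an assumption made here to reproduce the cohort boundaries quoted in the text.

```python
AGE_GROUPS = [(lo, lo + 4) for lo in range(30, 85, 5)]  # 30-34, ..., 80-84
PERIODS = [(1978, 1982), (1983, 1987), (1988, 1991)]


def cohort_index(i, j, n_age=len(AGE_GROUPS)):
    """Overlapping-cohort number for age group i and period j (1-based)."""
    return n_age - i + j


def birth_years(age, period):
    """Earliest and latest possible year of birth for an age-period cell."""
    (a_lo, a_hi), (p_start, p_end) = age, period
    return p_start - a_hi - 1, p_end - a_lo


def split_cohorts(age, period):
    """Older and younger 5-year cohorts associated with one cell."""
    first, last = birth_years(age, period)
    return (first, first + 4), (first + 5, last)


# Cohort numbers laid out as a period-by-age table.
table = [[cohort_index(i, j) for i in range(1, 12)] for j in range(1, 4)]
for period, row in zip(PERIODS, table):
    print(period, row)

# Distinct starting years of the non-overlapping cohorts.
five_year_starts = sorted(
    {c[0] for age in AGE_GROUPS for period in PERIODS
     for c in split_cohorts(age, period)}
)
print(len(five_year_starts), "non-overlapping cohorts, starting in", five_year_starts)
```

Because the last period covers only 4 years, cells in that period span 9 rather than 10 birth years, which is why the overlapping cohorts are described as approximate.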
All analyses were stratified by sex (male, female),
and by the three constituent counties of the Yorkshire
Regional Health Authority: West Yorkshire, Humberside and North Yorkshire.
A strong increase in the incidence of NHL with age is
well known,1,9,23 as is a large difference between males
and females. Hence, these terms were included in the
models without testing. Main effects of other factors and
interactions were tested by comparing the changes in
deviances with the appropriate χ² distributions.
Applying the method of Clayton and Schifflers, the
effects of county, drift, non-linear period and non-linear
cohort were evaluated using a log-linear Poisson model,
with link function as log, and the offset log (population
years).
Applying the Robertson and Boyle methodology, the
effects of county, period and cohort were evaluated in
the same manner; both period and cohort factors could
be separated into linear and non-linear components.
Interactions between county, sex, and the components of time (i.e. age, period, and cohort, or age,
drift, non-linear period, non-linear cohort), were evaluated using the methods of Clayton and Schifflers and
Robertson and Boyle.
The data were analysed using the package
GENSTAT V.26 Extra-Poisson variation was taken into
account using the method of Breslow.27
Population Data
For the Clayton-Schifflers and De Carli-La Vecchia
methods of analysis, population data were obtained
from the 1971 census (adjusted to 1974 boundaries by
Office of Population Censuses and Surveys (OPCS),
owing to a major local government re-organization),
and from the 1981 and 1991 Censuses.i" An assumption
was made that the census data referred to the populations at 1 April 1971, 1 April 1981 and 1 April 1991,
respectively. For both sexes and each of the three
constituent counties of Yorkshire (West Yorkshire,
Humberside, North Yorkshire), population estimates
were derived by linear interpolation between two
adjacent censuses, for each of the 11 age groups (which
correspond to available census data). For the Clayton-Schifflers and De Carli-La Vecchia methods of
analysis, population estimates were used for the middle
of each of the three periods (i.e. approximately
30 September 1980; 30 June 1985; and 1 January 1990).
However, for the Robertson-Boyle method, estimates
were derived for all years from 1978 to 1991. These
estimates were assumed to be at mid-year points, and
were used to construct population estimates for each
county-sex-age group-period category. Following
Robertson and Boyle,21 the population estimate for each
of these categories was split into its two constituent
cohorts.

TABLE 1 Incidence of non-Hodgkin's lymphoma, person years at
risk and age-specific incidence rate (per 100 000 person years)
for the period 1978-1991

Age-group   Number of cases   Person years at risk   Incidence rate
30-34              68               3604634                1.89
35-39              98               3114422                3.15
40-44             121               3174627                3.81
45-49             161               2841933                5.67
50-54             206               2778572                7.41
55-59             294               2793766               10.52
60-64             336               2565733               13.10
65-69             429               2491947               17.22
70-74             437               2052960               21.29
75-79             326               1554855               20.97
80-84             202                947783               21.31
Total            2678              27921232
Standardized rate (a)                                      3.45

(a) To the world population.

FIGURE 3a Age-specific incidence curves for males and three time
periods 1978-1982, 1983-1987, 1988-1991 (incidence rate per 100 000 against age)
FIGURE 3b Age-specific incidence curves for females and three
time periods 1978-1982, 1983-1987, 1988-1991 (incidence rate per 100 000 against age)
RESULTS
A total of 2678 verified cases of NHL in those aged
30-84 during the period 1978-1991 were used for the
analysis. Table 1 shows the number of cases, population
years at risk and incidence rates. Figures 3a and 3b show
the age-specific incidence curves for males and females
separately. The truncated age-standardized incidence
rates were 4.17 per 100 000 person years for males and
2.80 for females (30-84 years). The truncated age- and
sex-standardized incidence rates for West Yorkshire,
Humberside and North Yorkshire were 3.40, 3.23 and
3.81 per 100000 person years, respectively (30-84
years).
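The age-specific rates in Table 1 are the number of cases divided by the person years at risk, expressed per 100 000. The short sketch below recomputes them from the published counts as a consistency check. The standardized rate is not recomputed, because the standard population weights are not given in the text.

```python
# Cases and person years at risk by age group, taken from Table 1.
age_groups = ["30-34", "35-39", "40-44", "45-49", "50-54", "55-59",
              "60-64", "65-69", "70-74", "75-79", "80-84"]
cases = [68, 98, 121, 161, 206, 294, 336, 429, 437, 326, 202]
person_years = [3604634, 3114422, 3174627, 2841933, 2778572, 2793766,
                2565733, 2491947, 2052960, 1554855, 947783]

# Age-specific incidence rates per 100 000 person years.
rates = [1e5 * d / n for d, n in zip(cases, person_years)]
for group, rate in zip(age_groups, rates):
    print(f"{group}: {rate:.2f}")

# Crude (unstandardized) rate over all ages 30-84.
crude_rate = 1e5 * sum(cases) / sum(person_years)
print(f"total cases {sum(cases)}, person years {sum(person_years)}, "
      f"crude rate {crude_rate:.2f}")
```

The crude rate is much higher than the world-standardized rate because the standard population gives far more weight to younger ages than the Yorkshire population does.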
Clayton-Schifflers Method
Using the Clayton-Schifflers method,5,6 the main
effects were tested (Tables 2 and 3).
The effect of COUNTY, adjusting for age and sex,
was highly significant (P = 0.0055). Thereafter all
testing was done automatically adjusting for age, sex
and county. The effect of DRIFT was very highly
significant (P < 0.0001). There was no evidence of
Non-Linear PERIOD or Non-Linear COHORT effects.

TABLE 2 Main effects models, residual deviances, degrees of
freedom and mean residual deviances, using the Clayton-Schifflers
method

Model                           Residual deviance   d.f.   Mean residual deviance
AGE+SEX                               325.1          186          1.75
AGE+SEX+COUNTY                        314.7          184          1.71
AGE+SEX+COUNTY+DRIFT                  209.5          183          1.14
AGE+SEX+COUNTY+PERIOD                 209.3          182          1.15
AGE+SEX+COUNTY+COHORT                 193.4          172          1.12
AGE+SEX+COUNTY+PERIOD+COHORT          193.0          171          1.13
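Each test of a main effect compares the change in residual deviance between two nested models with a χ² distribution on the change in degrees of freedom. The sketch below reproduces this calculation from the published deviances of the Clayton-Schifflers models. It is an illustration only: `chi2_sf` and `lr_test` are hypothetical helpers written with the Python standard library (the original analysis used GENSTAT), and because the published deviances are rounded, the P-values agree with the paper only to the precision reported.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df."""
    z = x / 2.0
    if df % 2 == 0:
        # Even df: finite sum of Poisson-type terms.
        return math.exp(-z) * sum(z ** i / math.factorial(i)
                                  for i in range(df // 2))
    # Odd df: complementary error function plus a finite correction sum.
    tail = sum(z ** (i - 0.5) / math.gamma(i + 0.5)
               for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(z)) + math.exp(-z) * tail


# (residual deviance, residual d.f.) for the Clayton-Schifflers models.
models = {
    "AGE+SEX": (325.1, 186),
    "AGE+SEX+COUNTY": (314.7, 184),
    "AGE+SEX+COUNTY+DRIFT": (209.5, 183),
    "AGE+SEX+COUNTY+PERIOD": (209.3, 182),
    "AGE+SEX+COUNTY+COHORT": (193.4, 172),
    "AGE+SEX+COUNTY+PERIOD+COHORT": (193.0, 171),
}


def lr_test(smaller, larger):
    """Change in deviance, change in d.f. and P-value for nested models."""
    dev0, df0 = models[smaller]
    dev1, df1 = models[larger]
    change, ddf = dev0 - dev1, df0 - df1
    return change, ddf, chi2_sf(change, ddf)


tests = {
    "COUNTY": ("AGE+SEX", "AGE+SEX+COUNTY"),
    "DRIFT": ("AGE+SEX+COUNTY", "AGE+SEX+COUNTY+DRIFT"),
    "NON-LINEAR PERIOD": ("AGE+SEX+COUNTY+DRIFT", "AGE+SEX+COUNTY+PERIOD"),
    "NON-LINEAR COHORT": ("AGE+SEX+COUNTY+DRIFT", "AGE+SEX+COUNTY+COHORT"),
}
for effect, (smaller, larger) in tests.items():
    change, ddf, p_value = lr_test(smaller, larger)
    print(f"{effect}: change {change:.1f} on {ddf} d.f., P = {p_value:.4f}")
```

With only three periods, the non-linear period effect has a single degree of freedom, whereas the non-linear cohort effect has 11.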
When interactions were investigated, only one
(COUNTY.COHORT, P = 0.085) approached statistical
significance but this could be attributed to the small
numbers of cases aged 30-34 years (omitting this
group, the P-value of COUNTY.COHORT = 0.23).
The best fitting model included age, sex, county and
drift. The standardized residuals from fitting this model
exhibit slightly more variation than might be expected
under the Poisson assumption. The test for overdispersion was of marginal significance (P = 0.086).
Parameter estimates obtained by allowing for extra-Poisson variation, using the method of Breslow,27 were
not much different from those obtained from the model
which assumed only Poisson variation. The effects of
drift and county remained highly significant. Figures 4a
and 4b show plots of predicted values obtained from the
best fitting model.
Robertson-Boyle Method
Using the Robertson-Boyle method,21 main effects
were tested (Tables 4 and 5).
All testing adjusted for age, sex and county. The
effects of PERIOD and COHORT were both very highly
significant (P < 0.0001). The effect of PERIOD, adjusting for cohort, was not as strong, although still significant (P = 0.011), whereas the effect of COHORT,
adjusting for period, was still very highly significant
(P = 0.0002). The interaction OLD/YOUNG.AGE was
not significant (P = 0.21), nor were the interactions OLD/YOUNG.PERIOD and OLD/YOUNG.COHORT.
Linear-period and linear-cohort effects were both
very highly significant (P < 0.0001). The non-linear
cohort effect, adjusting for linear cohort, was also
highly significant (P = 0.0063), but the non-linear
period effect, adjusting for linear-period, was not
significant (P = 0.65).
Interactions were investigated using various base
models (Table 6). The interaction COUNTY.COHORT
was very highly significant (P = 0.0007), but
COUNTY.LINEAR-COHORT was not significant (P =
0.12). A further analysis was performed, omitting the
reference age group (30-34 year olds). However, the
COUNTY.COHORT interaction was still highly
significant (P < 0.001).
Hence, the best fitting model included age, sex,
county, linear-period, cohort and the county.cohort interaction.
Goodness-of-fit testing, carried out by looking at the
standardized residuals, shows that the model fits well.
There was no evidence of extra-Poisson variation.
Figures 5a-c show plots of predicted values obtained
from this model.
De Carli-La Vecchia Method
The penalty function method of De Carli and La
Vecchia 22 was applied to the estimates from the three 'two-factor' models
of Clayton and Schifflers.5,6
The age-cohort model produces strong cohort effects.
The period-cohort model produces negligible cohort
TABLE 3 Tests of main effects, using models of Table 2 (Clayton-Schifflers method)

Effect               Effect adjusted for          Change deviance   Change d.f.   Significance
COUNTY               AGE, SEX                          10.4              2         P = 0.0055
COUNTY               AGE, SEX, DRIFT                   10.5              2         P = 0.0052
DRIFT                AGE, SEX, COUNTY                 105.2              1         P < 0.0001
NON-LINEAR PERIOD    AGE, SEX, COUNTY, DRIFT            0.2              1         P = 0.65
NON-LINEAR COHORT    AGE, SEX, COUNTY, DRIFT           16.1             11         P = 0.14
NON-LINEAR PERIOD    AGE, SEX, COUNTY, COHORT           0.4              1         P = 0.53
NON-LINEAR COHORT    AGE, SEX, COUNTY, PERIOD          16.3             11         P = 0.13
FIGURE 4a Incidence rates versus period predicted by the model: Age+County+
Sex+Drift, for males in West Yorkshire, using the Clayton-Schifflers method (incidence rate per 100 000 against mid-year of time period, one curve per age group)
FIGURE 4b Incidence rates versus cohort predicted by the model: Age+County+Sex+
Drift, for males in West Yorkshire, using the Clayton-Schifflers method (incidence rate per 100 000 against mid-year of birth cohort, one curve per age group)
TABLE 4 Main effects models, residual deviances, degrees of freedom and mean residual deviances, using the Robertson-Boyle method

Model                                           Residual deviance   d.f.   Mean residual deviance
AGE+SEX                                               559.5          384          1.46
AGE+SEX+COUNTY                                        549.1          382          1.44
AGE+SEX+COUNTY+PERIOD                                 443.5          380          1.17
AGE+SEX+COUNTY+COHORT                                 413.5          369          1.12
AGE+SEX+PERIOD+COHORT                                 414.7          369          1.12
AGE+SEX+COUNTY+PERIOD+COHORT                          404.5          367          1.10
AGE+SEX+COUNTY+LINEAR-PERIOD                          443.7          381          1.16
AGE+SEX+COUNTY+LINEAR-COHORT                          441.1          381          1.16
AGE+SEX+COUNTY+LINEAR-PERIOD+LINEAR-COHORT            432.8          380          1.14
AGE+SEX+COUNTY+LINEAR-PERIOD+COHORT                   405.2          368          1.10
AGE+SEX+COUNTY+LINEAR-COHORT+PERIOD                   432.4          379          1.14

TABLE 5 Tests of main effects, using models of Table 4 (Robertson-Boyle method)

Effect               Adjusted for                        Change deviance   Change d.f.   Significance
COUNTY               AGE, SEX                                 10.4              2         P = 0.0055
PERIOD               AGE, SEX, COUNTY                        105.6              2         P < 0.0001
COHORT               AGE, SEX, COUNTY                        135.6             13         P < 0.0001
COUNTY               AGE, SEX, PERIOD, COHORT                 10.2              2         P = 0.0061
PERIOD               AGE, SEX, COUNTY, COHORT                  9.0              2         P = 0.011
COHORT               AGE, SEX, COUNTY, PERIOD                 39.0             13         P = 0.0002
LINEAR-PERIOD        AGE, SEX, COUNTY                        105.4              1         P < 0.0001
LINEAR-COHORT        AGE, SEX, COUNTY                        108.0              1         P < 0.0001
NON-LINEAR PERIOD    AGE, SEX, COUNTY, LINEAR-PERIOD           0.2              1         P = 0.65
NON-LINEAR COHORT    AGE, SEX, COUNTY, LINEAR-COHORT          27.6             12         P = 0.0063
LINEAR-PERIOD        AGE, SEX, COUNTY, LINEAR-COHORT           8.3              1         P = 0.0040
LINEAR-COHORT        AGE, SEX, COUNTY, LINEAR-PERIOD          10.9              1         P = 0.0010

TABLE 6 Tests of interactions (Robertson-Boyle method)

Effect                      Adjusted for                                                      Change deviance   Change d.f.   Significance
COUNTY.COHORT               AGE, COUNTY, SEX, LINEAR-PERIOD, COHORT                                55.4             26         P = 0.0007
COUNTY.COHORT               AGE, COUNTY, SEX, COHORT                                               55.4             26         P = 0.0007
COUNTY.COHORT               AGE, COUNTY, SEX, PERIOD, COHORT                                       55.5             26         P = 0.0007
COUNTY.NON-LINEAR-COHORT    AGE, COUNTY, SEX, LINEAR-PERIOD, COHORT, COUNTY.LINEAR-COHORT          51.1             24         P = 0.0010
FIGURE 5a Incidence rates versus cohort predicted by the model: Age+County+
Sex+Linear-Period+Cohort+County.Cohort, for males in West Yorkshire, using the
Robertson-Boyle method (incidence rate per 100 000 against mid-year of birth cohort, one curve per age group)
FIGURE 5b Incidence rates versus cohort predicted by the model: Age+County+
Sex+Linear-Period+Cohort+County.Cohort, for males in Humberside, using the
Robertson-Boyle method (incidence rate per 100 000 against mid-year of birth cohort, one curve per age group)
FIGURE 5c Incidence rates versus cohort predicted by the model: Age+County+
Sex+Linear-Period+Cohort+County.Cohort, for males in North Yorkshire, using
the Robertson-Boyle method (incidence rate per 100 000 against mid-year of birth cohort, one curve per age group)
effects, except for cohorts 1, 11 and 13. The age-period-cohort model produces small cohort effects that are
evidently increasing from cohort 1 through to cohort
13, but in a 'non-linear' fashion.
Whilst the best fitting 'two-factor' model appears to
be the period-cohort model, there seems to be stronger
evidence for a non-linear cohort effect from the full
model. The effects of cohorts 2-8 seem to be lower than
cohorts 9-12. However, the method does not allow us
to obtain standard errors.
INTERPRETATION
Clayton-Schifflers Method
(Model-Age, Sex, County, Drift)
Figure 4a shows a graph of the estimates of the multiplicative effects of Drift, parameterized in terms of a
linear period effect, for each of the 11 age groups. The
estimates (Rate Ratios [RR]) in this graph are for males
in West Yorkshire.
Figure 4b shows a graph of the same multiplicative
effect estimates, (i.e. males in West Yorkshire), but
with Drift parameterized in terms of a linear cohort
effect.
Estimates for females, Humberside and North
Yorkshire can easily be obtained. Humberside has an
RR of 0.92, indicating a slightly lower incidence,
whereas North Yorkshire has an RR of 1.10, indicating
a slightly higher incidence, than West Yorkshire (the
baseline incidence). The incidence rates for females are
69% those for males. DRIFT was estimated to be 1.28
(95% confidence interval [CI] : 1.22-1.34), but it is not
possible to say if it is due to period or cohort (or
both).5,6 There is a slight problem with interpretation
due to the final period being only 4 years in length, and
the final cohort being only 9 years. The increase due to
drift is attributed to every unit increase in time. Hence,
we can calculate only approximate yearly increases. For
the first two periods this increase is 5.1% per annum
(95% CI : 4.1-6.1%). For the last two periods the increase is 5.7% per year (95% CI : 4.6-6.8%). For the
first 12 cohorts, drift represents an increase of 5.1% per
year (95% CI : 4.1-6.1 %), whilst for the last two cohorts the annual increase is 5.7% (95% CI : 4.6-6.8%).
Hence, the estimated increase in the incidence of NHL
due to period of diagnosis or year of birth (or both) is
between 5% and 6% per annum.
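The conversion from a drift estimate per unit of time to an approximate annual percentage increase can be sketched as follows. This is an illustration under stated assumptions: the unit of time is taken to be 5 years between the first two period mid-points and 4.5 years between the last two, and the inputs are the rounded published rate ratio and confidence limits, so the results match the figures quoted in the text only approximately.

```python
def annual_percent_increase(rate_ratio, years_per_unit):
    """Annual percentage increase implied by a rate ratio per time unit."""
    return 100.0 * (rate_ratio ** (1.0 / years_per_unit) - 1.0)


# Published drift estimate and 95% confidence limits (rounded values).
drift, lower, upper = 1.28, 1.22, 1.34

# Assumed spacing between successive period mid-points, in years.
spacings = {"first two periods": 5.0, "last two periods": 4.5}

for label, years in spacings.items():
    estimate = annual_percent_increase(drift, years)
    low = annual_percent_increase(lower, years)
    high = annual_percent_increase(upper, years)
    print(f"{label}: {estimate:.1f}% per year "
          f"(95% CI {low:.1f}% to {high:.1f}%)")
```

The same function applies to the cohort parameterization, since the spacing between successive cohort mid-points equals the spacing between the corresponding period mid-points.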
Robertson-Boyle Method
(Model-Age, Sex, County, Linear-Period, Cohort,
County. Cohort)
Figures 5a-c show graphs of the multiplicative effects of
cohort for each of the three counties separately, and for
each of the 11 age groups. The estimates are for males
but female estimates can be obtained by multiplying by
0.69 (as found by the Clayton-Schifflers method).5,6
The effect of linear-period was estimated to be 1.14
(95% CI: 1.04-1.24), indicating a yearly increase
somewhere in the range 2.6% (95% CI : 0.8-4.4%) to
2.9% (95% CI : 0.9-4.9%), i.e. about 2.6% to 2.9% per
annum. However, we have not yet looked at the effect
of birth cohort. The linear increase due to cohort is
0.137 (95% CI: 0.042-0.238). This gives an approximate annual increase of 2.6% (95% CI: 0.8-4.4%).
Multiplying by the increase due to linear period gives
an estimated rise of 5.24% (95% CI : 1.88-8.71 %).
De Carli-La Vecchia Method
(Model-Age, Sex, County, Period, Cohort)
The full model estimates were closer to the Clayton-Schifflers model estimates than the Robertson-Boyle
model estimates (Figure 6a-c).
DISCUSSION
Methodological Issues
Three methods for performing an age-period-cohort
analysis were used in this paper: those of Clayton and
Schifflers,5,6 Robertson and Boyle,21 and De Carli and
La Vecchia.22 The main comparison is between the
methods of Clayton-Schifflers 5,6 and Robertson-Boyle.21 The Robertson and Boyle 21 method makes
several assumptions. The most important is that each age-period group can be split into two constituent cohorts.
However, this is not correct and may lead to errors,
since the two cohorts are treated as having the same age and period
effects, whereas their average ages and periods are
different. This problem can be overcome by introducing
an OLD/YOUNG cohort effect.25 The second assumption is that the two birth cohorts uniformly make up the
age-period grouping. This is probably reasonable in the
young and middle age groups, but may be unreliable in
the older age groups, where there may tend to be a bias
in favour of the younger birth cohort, or in the birth
cohorts comprising those involved in either of the two
world wars. Clayton and Schifflers 5,6 argue against the
method of Robertson and Boyle 21 because of the
aforementioned assumptions.
The final assumption which is necessary for all three
approaches is that the population changes in a linear
fashion, within a county, between censuses. In the
absence of information concerning migration, this or
an alternative assumption is essential. The Robertson-Boyle method may be more sensitive to errors here since
the linear interpolation was applied to provide annual
population estimates rather than mid-census ones.
A slight difference will occur between the estimates of drift obtained by the Robertson-Boyle 21 and
Clayton-Schifflers methods.5,6 The explanation lies in
the difference between the Robertson-Boyle 21 and
Clayton-Schifflers 5,6 population estimates for period 1
(1978-1982). The mid-period estimates for the Clayton-Schifflers method were obtained by linearly interpolating between the 1971 and 1981 census estimates. For
the Robertson-Boyle method,21 individual-year estimates were required, so that there was linear interpolation
between the 1971 and 1981 census estimates, and also
between the 1981 and 1991 census estimates, to obtain
the period estimate for 1978-1982. The differences we
found are small, but noticeable (less than approximately 1.5%).
There is no reason to expect differences between the
effects of non-linear period, obtained by either of these
two methods. Three time periods are likely to be far too
few to detect non-linear period effects, so it is not
surprising that none were found.
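The effect of the two interpolation schemes on the period 1 population can be illustrated with a minimal Python sketch. The census counts below are hypothetical (they are not taken from the study data), census dates are treated as whole calendar years, and the function name `interpolate` is ours. The sketch compares a single mid-period estimate read from the 1971-1981 trend line with the mean of five annual estimates that switch to the 1981-1991 trend after 1981.

```python
def interpolate(year, census):
    """Piecewise-linear population estimate for `year` from census counts."""
    years = sorted(census)
    for y0, y1 in zip(years, years[1:]):
        if y0 <= year <= y1:
            frac = (year - y0) / (y1 - y0)
            return census[y0] + frac * (census[y1] - census[y0])
    raise ValueError(f"{year} lies outside the census range")


# Hypothetical census counts for one county/age/sex stratum (assumption).
census = {1971: 100_000, 1981: 104_000, 1991: 103_000}

# Mid-period estimate: the 1971-1981 line evaluated at the mid-year, 1980.
line_71_81 = {year: census[year] for year in (1971, 1981)}
mid_period = interpolate(1980, line_71_81)

# Annual estimates for 1978-1982: years after 1981 use the 1981-1991 line.
annual = [interpolate(year, census) for year in range(1978, 1983)]
annual_mean = sum(annual) / len(annual)

# Relative difference between the two period 1 population estimates.
relative_difference = (annual_mean - mid_period) / mid_period
print(mid_period, annual_mean, f"{100 * relative_difference:.3f}%")
```

With these illustrative counts the two estimates differ only because the population trend changes after 1981, which is the kind of small discrepancy described above.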
The Clayton-Schifflers method⁵,⁶ uses 13 overlapping
10-year cohorts, whereas the Robertson-Boyle
method²¹ uses 14 5-year cohorts. Since each age-period
group is associated with two cohorts using the
Robertson-Boyle method, and since there are three
periods, changes over four consecutive cohorts may be
detected for each age group. However, using
the Clayton-Schifflers method,⁵,⁶ each age-period group
is associated with only one cohort, so that changes over
three consecutive cohorts may be detected. Changes
due to cohort therefore span a 10-year difference in
mid-cohort birth year using the Clayton-Schifflers
method,⁵,⁶ whilst they span a 15-year difference using
the Robertson-Boyle method.²¹ Although the entire
time spanned by both approaches is 20 years, the
Clayton-Schifflers cohort effects will be smoothed
because of the overlap, whereas the Robertson-Boyle
cohort effects will not. Therefore, the Robertson-Boyle
method may have a greater chance of detecting
non-linear cohort effects than the Clayton-Schifflers
method.
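The cohort counts quoted above can be checked by enumeration. The following sketch assumes 11 five-year age groups (30-34 to 80-84) and nominal period mid-years of 1980, 1985 and 1990; cohorts are labelled by their mid-year of birth, and the helper names are illustrative rather than taken from either method's software.

```python
# Assumed design: 11 five-year age groups and 3 periods (nominal mid-years).
age_mids = [32.5 + 5 * i for i in range(11)]   # 30-34, ..., 80-84
period_mids = [1980.0, 1985.0, 1990.0]


def cs_cohort(age_mid, period_mid):
    """One overlapping 10-year cohort per age-period cell (mid birth year)."""
    return period_mid - age_mid


def rb_cohorts(age_mid, period_mid):
    """Two adjacent 5-year cohorts per age-period cell (older, younger)."""
    centre = period_mid - age_mid
    return (centre - 2.5, centre + 2.5)


cs_all = sorted({cs_cohort(a, p) for a in age_mids for p in period_mids})
rb_all = sorted({c for a in age_mids for p in period_mids
                 for c in rb_cohorts(a, p)})

# Cohorts observed for a single age group (here 60-64) across the periods.
age = 62.5
cs_one_age = sorted({cs_cohort(age, p) for p in period_mids})
rb_one_age = sorted({c for p in period_mids for c in rb_cohorts(age, p)})

print(len(cs_all), len(rb_all))                          # 13 14
print(len(cs_one_age), cs_one_age[-1] - cs_one_age[0])   # 3 10.0
print(len(rb_one_age), rb_one_age[-1] - rb_one_age[0])   # 4 15.0
```

Under these assumptions, a single age group is followed through three overlapping cohorts spanning 10 years of mid birth year in the first scheme, and through four non-overlapping cohorts spanning 15 years in the second.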
However, we are particularly cautious concerning the
Robertson-Boyle²¹ estimates because of the
assumptions involved in estimating cohort populations and
because the standard errors of the estimates are much
greater than those obtained by the Clayton-Schifflers
method. Conversely, the Clayton-Schifflers method⁵,⁶ may
be too conservative, because real effects may be smoothed out.
In conclusion, the Clayton-Schifflers method⁵,⁶ only
requires data on age group and period of diagnosis and
hence can use either individual or grouped data, whereas
the Robertson-Boyle method²¹ requires individual
information on year of birth as well. Since the
Robertson-Boyle method²¹ relies on the accuracy of
5-year non-overlapping birth cohort population
estimates, whereas Clayton-Schifflers⁵,⁶ uses 10-year
overlapping birth cohort population estimates, the amount
of error in these estimates is likely to be much greater
using Robertson-Boyle²¹ rather than
Clayton-Schifflers.⁵,⁶ Therefore, we suggest caution concerning
Robertson-Boyle²¹ results.

FIGURE 6a Multiplicative age effects, obtained using the methods of De Carli and
La Vecchia (DC-LV), Clayton and Schifflers (C-S), and Robertson and Boyle (R-B).
Horizontal axis: mid-year of age group; vertical axis: multiplicative effects of age.

FIGURE 6b Multiplicative period effects, obtained using the methods of De Carli and
La Vecchia (DC-LV), Clayton and Schifflers (C-S), and Robertson and Boyle (R-B).
Horizontal axis: mid-year of time period; vertical axis: multiplicative effects of period.

FIGURE 6c Multiplicative cohort effects, obtained using the methods of De Carli and
La Vecchia (DC-LV), Clayton and Schifflers (C-S), and Robertson and Boyle (R-B).
Horizontal axis: mid-year of birth cohort; vertical axis: multiplicative effects of cohort.
Epidemiological Results
Our main concern now is to interpret the results,
taking into account the differences between the
approaches of Clayton-Schifflers⁵,⁶ and
Robertson-Boyle.²¹ Both methods agree that there are large county
and sex differences, as expected. They estimate that
there is a linear increase in NHL incidence of about 5%
per year. The estimate of drift obtained by the
Clayton-Schifflers method⁵,⁶ is 5.09% per year (95%
CI: 4.09-6.09%), whilst that from the Robertson-Boyle
method²¹ is 5.24% per year (95% CI: 1.88-8.71%).
Whilst the linear cohort and period increases are
unidentifiable by the Clayton-Schifflers analysis,⁵,⁶ the
Robertson-Boyle approach²¹ estimates that 2.57% per
annum is due to period, and about 2.60% per annum is
due to cohort. Non-linear period effects were not
significant using either method.
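As a check on the arithmetic, annual percentage changes in a log-linear model combine multiplicatively rather than additively. The sketch below (helper names are ours) converts the quoted Robertson-Boyle period and cohort components to the log scale, adds them, and converts back. It also computes the rate ratio implied over 10 years by the Clayton-Schifflers drift, on the assumption that drift alone operates. Whether the authors derived their figures in this way is not stated.

```python
import math


def pct_to_log(pct):
    """Annual percentage change -> coefficient on the log-rate scale."""
    return math.log1p(pct / 100.0)


def log_to_pct(beta):
    """Coefficient on the log-rate scale -> annual percentage change."""
    return 100.0 * math.expm1(beta)


# Robertson-Boyle components quoted in the text (% per annum).
period_pct, cohort_pct = 2.57, 2.60

# Adding on the log scale is the same as multiplying the annual rate ratios.
combined_pct = log_to_pct(pct_to_log(period_pct) + pct_to_log(cohort_pct))
print(round(combined_pct, 2))  # 5.24, matching the quoted drift

# Rate ratio over 10 years implied by a drift of 5.09% per year alone
# (assumption: no non-linear period or cohort effects).
ten_year_ratio = math.exp(10 * pct_to_log(5.09))
print(ten_year_ratio)
```

The simple sum of the two components (5.17%) slightly understates the combined effect, whereas compounding them reproduces the quoted 5.24% to two decimal places.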
The major difference between the Robertson-Boyle²¹
and Clayton-Schifflers⁵,⁶ results lies in the non-linear
cohort effects and the interaction between county and
cohort, which were both highly significant (P < 0.001)
using the Robertson-Boyle²¹ method, but were not
significant (P = 0.13 and 0.09, respectively) using the
Clayton-Schifflers⁵,⁶ method.
Estimates obtained from the De Carli-La Vecchia
method²² were compared with those of
Clayton-Schifflers⁵,⁶ and Robertson-Boyle.²¹ De Carli-La Vecchia²²
period and cohort effects lie between those of
Clayton-Schifflers and Robertson-Boyle,²¹ but are much closer
to the Clayton-Schifflers solution (Figures 6a-c). The
period effects are similar, and cohort effects, while
slightly higher, are of a similar order of magnitude to
the Clayton-Schifflers results. Hence, this method
seems to support the Clayton-Schifflers⁵,⁶ solution.
Observed age-specific incidence rates were calculated
applying the definitions of cohort used by the
Clayton-Schifflers⁵,⁶ and Robertson-Boyle²¹ methods.
Examining changes in age-specific incidence rates for
males and females, separately, by cohort, shows
increases in all ages over 35 for males, and larger
increases over 65. The increases start at 40 for females.
Female incidence is approximately two-thirds that for
males. Clayton-Schifflers⁵,⁶ and Robertson-Boyle²¹
show similar increases in age-specific incidence rates,
but Robertson-Boyle²¹ rates were much larger in
magnitude.
Examining changes in age-specific incidence rates,
for each county, by cohort, using the Robertson-Boyle
method,²¹ shows large differences between the three
counties. In all three counties, there are striking
increases for the 65-84 year olds. For West Yorkshire,
there is a similar change in age groups within this
range, whereas for Humberside and North Yorkshire,
the effects differ widely.
There is little change within the 30-64 year olds in
Humberside, except for 50-54 year olds. In West
Yorkshire, there are modest increases in the 55-64 year
old group, and in North Yorkshire in all those aged 40-64.
Clayton-Schifflers⁵,⁶ shows large increases in all those aged
65-84, and modest increases in 40-64 year olds, but
these increases are fairly consistent across counties.
Examining age-specific incidence rates by sex and
by county over the three periods clearly shows very
large increases for 65-84 year olds, and modest
increases for 40-64 year olds. There are differences in
magnitude between the sexes, but very little
difference between Clayton-Schifflers⁵,⁶ and
Robertson-Boyle,²¹ as expected.
In conclusion, we have given an example where the
Clayton-Schifflers⁵,⁶ method appears to be conservative
not only on the interpretation of linear effects, but also
on non-linear cohort effects. The Robertson-Boyle method²¹
shows a 'DRIFT' effect, which is allocated equally to
PERIOD and COHORT, and also provides evidence of
strong (i.e. very highly significant) non-linear cohort
and COUNTY.COHORT interaction effects. Of these,
the Clayton-Schifflers⁵,⁶ method only shows the drift.
De Carli-La Vecchia gives a solution which appears
close to that of Clayton-Schifflers. The increase in
incidence is greater in the older age groups. It is
important not to over-interpret the results of the
methods of Robertson and Boyle, and De Carli and
La Vecchia.
Finally, there has been an annual increase of between
5% and 6% in the incidence of NHL in Yorkshire. This
increase is larger than the one previously reported" for
two reasons. First, more recent data are included,
and secondly, this analysis considers specifically the
ages 30-84. Clearly, there is cause for concern about
this increase in NHL incidence, although it is possible
that it may partly be attributed to improved diagnostic
accuracy.
ACKNOWLEDGEMENTS
Mrs Beverley Cooper and Mrs Agnes McKeating are
thanked for typing the manuscript. We gratefully
acknowledge the invaluable comments from two
referees. The 1981 and 1991 Census data are Crown
copyright, from the ESRC purchase.
REFERENCES
1 Parkin D M, Muir C S, Whelan S L, Gao Y-T, Ferlay J,
Powell J (eds). Cancer Incidence in Five Continents: Vol VI.
Lyon: IARC, 1992.
2 Devesa S S, Fears T. Non-Hodgkin's lymphoma time trends:
United States and international data. Cancer Res 1992; 52
(Suppl.): 5432-40.
3 Coleman M P, Esteve J, Damiecki P, Arslan A, Renard H.
Trends in Cancer Incidence and Mortality. Lyon: IARC,
1993.
4 Blair V, Birch J M. Patterns and temporal trends in the incidence
of malignant disease in children: I: Leukaemia and
lymphoma. Eur J Cancer 1994; 30A: 1490-98.
5 Clayton D, Schifflers E. Models for temporal variation in cancer
rates. I: Age-Period and Age-Cohort models. Stat Med
1987; 6: 449-67.
6 Clayton D, Schifflers E. Models for temporal variation in cancer
rates. II: Age-Period-Cohort models. Stat Med 1987; 6:
469-81.
7 Carli P M, Boutron M C, Maynadie M, Bailly F, Caillot D,
Petrella T. Increase in the incidence of non-Hodgkin's
lymphomas: Evidence for a recent sharp increase in France
independent of AIDS. Br J Cancer 1994; 70: 713-15.
8 Cartwright R A. Changes in the descriptive epidemiology of
non-Hodgkin's lymphoma in Great Britain. Cancer Res
1992; 52 (Suppl.): 5441-42.
9 Holford T R, Zheng T, Mayne S T, McKay L A. Time trends of
non-Hodgkin's lymphoma: are they real? Cancer Res 1992;
52 (Suppl.): 5432-40.
10 Holford T R. The estimation of age, period and cohort effects
for vital rates. Biometrics 1983; 39: 311-24.
11 Polednak A P. Trends in cancer incidence in Connecticut,
1935-1991. Cancer 1994; 74: 2863-72.
12 Cancer in South East England 1991. Thames Cancer Registry,
Sutton, 1994.
13 Cartwright R A, McNally R J Q. Epidemiology of Non-Hodgkin's lymphoma. Cambridge Medical Reviews.
Haematol Oncol 1994; 3: 1-34.
14 Cartwright R A, McNally R J Q, Staines A. The increasing
incidence of non-Hodgkin's lymphoma (NHL): The
possible role of sunlight? Leukemia and Lymphoma 1994;
14: 387-94.
15 Zheng T, Mayne S T, Boyle P et al. Epidemiology of non-Hodgkin's lymphoma in Connecticut 1935-1988. Cancer
1992; 70: 840-49.
16 Levine A M, Shibata D, Sullivan Halley J et al. Epidemiological
and biological study of acquired immunodeficiency
syndrome-related lymphoma in the County of Los Angeles:
Preliminary results. Cancer Res 1992; 19 (Suppl.):
5482-84.
17 McKinney P A, Alexander F E, Cartwright R A. Epidemiology
of non-Hodgkin's lymphoma in parts of England and
Wales (1984-1988). Leukemia and Lymphoma 1991; 5:
123-30.
18 CDR. Human Immunodeficiency Virus infection in the
UK. Vol. 1. London: Public Health Laboratory Service,
1988.
19 Serraino D, Salamina G, Franceschi S et al. The epidemiology
of AIDS-associated non-Hodgkin's lymphoma in the
World Health Organization European Region. Br J Cancer
1992; 66: 912-16.
20 Hartge P, Devesa S S. Quantification of the impact of known
risk factors on time trends in non-Hodgkin's lymphoma
incidence. Cancer Res 1992; 52 (Suppl.): 5566-69.
21 Robertson C, Boyle P. Age, period and cohort models: The use
of individual records. Stat Med 1986; 5: 527-38.
22 De Carli A, La Vecchia C. Age, period and cohort models: A
review of knowledge and implementation in GLIM.
Rivista di Statistica Applicata 1987; 20: 397-410.
23 Bird C C, Lauder I, Kellett H S, Barnes N, Cartwright R A.
Yorkshire Regional Lymphoma Histology Panel: Analysis
of five years experience. J Path 1984; 143: 249-58.
24 Cartwright R A, Alexander F E, McKinney P A, Ricketts T J.
Leukaemia and Lymphoma: An Atlas of Distribution within
Areas of England and Wales 1984-1988. London:
Leukaemia Research Fund, 1990.
25 Robertson C, Boyle P. Age-period-cohort models: A
comparison of methods. In: Fahrmeir L, Francis B,
Gilchrist R, Tutz G. Advances in GLIM and Statistical
Modelling. Basel: Springer, 1992.
26 Payne R W, Lane P W, Ainsley A E et al. Genstat V reference
Manual. Oxford: Clarendon Press, 1987.
27 Breslow N E. Extra-Poisson variation in log-linear models.
Appl Stat 1984; 33: 38-44.
28 Office of Population Censuses and Surveys. Census 1971:
Reports for the Counties of England and Wales
as Constituted on 1st April 1974. London: HMSO,
1975.
(Revised version received May 1996)