MASTER’S THESIS
Cumulative probability of a false-positive
screening result in the Finnish
breast cancer screening program
Ilona Siljander
Master’s thesis
University of Helsinki
Faculty of Science
Department of Mathematics and Statistics
August 2016
HELSINGIN YLIOPISTO — HELSINGFORS UNIVERSITET — UNIVERSITY OF HELSINKI
Faculty: Faculty of Science
Department: Department of Mathematics and Statistics
Author: Ilona Siljander
Title: Cumulative probability of a false-positive screening result in the Finnish breast cancer screening program
Subject: Statistics
Level: Master's thesis
Month and year: November 2016
Number of pages: 56
Abstract
The purpose of this thesis is to study the cumulative probability of a false-positive (FP) test result during the Finnish 20-year breast cancer screening program. The study is based on breast cancer screening data provided by the Mass Screening Registry of the Finnish Cancer Registry, covering women aged 50–51 years at the time of their first invitation to mammography screening in 1992–1995.
Generalized estimating equations (GEE) are used to model the probability of an FP result at an individual screening examination, taking into account that repeated examinations of the same woman are not independent. In the theoretical part we present the theory of GEE and review the theory of generalized linear models (GLM). The cumulative probabilities are calculated from the modeled probabilities of individual examinations by using the formulas of conditional probability. The confidence intervals (CI) are calculated by Monte Carlo simulation relying on the asymptotic distribution of the GEE estimates.
The estimated cumulative risk of at least one FP result during the screening program was 15.84% (95% CI: 15.49–16.18%). Previous FP findings increased the risk of another FP result, with an odds ratio (OR) of 1.91 (95% CI: 1.78–2.04) for one previous FP result and 3.09 (95% CI: 2.49–3.83) for more than one. Irregular screening attendance increased the risk of an FP result with an OR of 1.46 (95% CI: 1.37–1.56).
Keywords: GEE, GLM, Monte Carlo simulation, Breast cancer screening, False-positive result
Where deposited: Kumpulan tiedekirjasto
Contents
1 Introduction
2 Background
2.1 Process of breast cancer screening and setting
2.2 Benefits and harms of screening
2.3 An overview of false-positive screening result
2.4 Methods used estimating cumulative probability
2.5 Study population
2.6 Definitions
3 Statistical methods
3.1 Generalized linear model
3.1.1 Preliminaries
3.1.2 The exponential family
3.1.3 Linear predictor
3.1.4 Link function
3.1.5 Likelihood equation and asymptotic distribution for GLM
3.2 Generalized estimating equations
3.2.1 Parameter estimation
3.2.2 Sandwich estimator
3.2.3 Correlation matrix
3.3 Cumulative probability
3.4 Confidence intervals
4 Analysis
4.1 Data description
4.2 Modeling
4.2.1 The validity of specifying previous FP and regularity of the attendance as categorical variables
4.2.2 The relation between the age and the probability of FP result
4.2.3 The capability of the model to explain the typical behavior for each combination of age, previous FP and regularity of the attendance
4.2.4 Variability of the data
5 Results
6 Discussion and conclusion
References
1 Introduction
A false-positive result represents a disadvantage in a population-based mammography screening program. In this study we estimate the cumulative risk of at least one false-positive (FP) mammography test result during 10 biennial screening examinations for women aged 50 to 69 years.
A retrospective cohort study was performed for 84,475 women aged 50–51 years at the time of their first invitation to mammography screening in 1992–1995. These women underwent altogether 477,047 screening mammograms in the Finnish 20-year breast cancer screening program. The overall attendance rate in the cohort was 88.4% during the screening program.
Generalized estimating equations (GEE) were used to model the probability of a false-positive screening result. The cumulative risk was calculated from the parameter estimates by using elementary probability theory. We studied the effect of age, previous FP results and the regularity of attendance on obtaining (another) FP test result. The 95% confidence intervals for the risk of a false-positive screening result in each age group were estimated using a suitable modification of bootstrapping.
Of all screening examinations, 2.8% led to a recall for further examinations, and 82.0% of these recalls were false-positive results. The estimated cumulative risk of at least one FP result during the screening program was 15.84% among the attendees (95% CI: 15.49–16.18%). Of the whole study population, 12.3% received at least one false-positive screening result. Age, regularity of attendance and previous FP results were used to model the risk of an FP result. The odds ratios were 0.61 for age and 1.46 for irregular attendance, and 1.91 and 3.09 for one and for more than one previous FP result, respectively.
Approximately one in every six women will be recalled at least once for further examinations that do not lead to a breast cancer diagnosis, provided they attend the biennial mammography screenings between ages 50 and 69 years. Irregular screening attendance and previous FP findings increased the risk of (another) FP test result. False-positive results in mammography screening are a disadvantage in a breast cancer screening program. Understanding the burden of false-positive results on the target population helps in analyzing the viability of the screening program.
2 Background
Organized screening means cancer screening programs with a systematic, population-based approach with quality assurance at all appropriate levels (EU Council, 2003). The main goal of organized cancer screening is to reduce mortality through the early detection of the disease (IARC, 2002). However, screening provides both benefits and harms. Therefore, the benefits of screening must be weighed against its disadvantages.
A recall for assessment after mammography screening without breast cancer being detected is referred to as a false-positive screening result, which is considered a disadvantage of screening. In this work, we concentrate on false-positive (FP) screening test results, which may cause psychological harm, additional hospital visits and diagnostic tests, increased radiation exposure and additional costs. The cumulative probability of a false-positive screening test result is defined as the probability that a woman experiences at least one FP screening result within a time period defined beforehand.
An accurate estimate of the cumulative probability of FP screening test results from long-running screening programs is important for program evaluation (Hubbard et al., 2010). Lifetime risks of different screening outcomes, including screen-detected cancers and false-positive results, have been estimated after multiple screening rounds to study the effectiveness of a screening program (Otten et al., 2013). Moreover, psychological harms can be reduced by suitably informing individuals undergoing organized population-based screening about what they should expect from testing over time (Hubbard et al., 2010).
In the next sections we present the overall process and setting of breast cancer screening in Finland, followed by a brief overview of the benefits and disadvantages of cancer screening in general. We then concentrate on the disadvantages of a false-positive screening result and review earlier studies on the risk of receiving one. In addition, we briefly discuss the methods used in estimating the cumulative risk of receiving a false-positive result. Finally, we describe the study population and define the concepts used in the study and analysis.
2.1 Process of breast cancer screening and setting
In Finland, the population-based breast cancer screening program, administered by the Finnish Cancer Registry (FCR), started in 1987 (FCR, 2010). It is based on the Government Decree on Screenings 1339/2006 (Ministry of Social Affairs and Health, 2006). According to the decree, population-based breast cancer screening is to be offered free of charge by the municipalities to women aged 50–69 years every 2 years. Screening is part of preventive health care and means examination of the population or a certain part of it, or sampling, in order to discover a certain disease or its precursor, or to identify a pathogen (Ministry of Social Affairs and Health, 2006).
According to the FCR (2010), a personal invitation letter with a prefixed time and place of screening is sent by mail, and the results of the mammogram are sent afterwards. All women in the target age group are invited. The breasts are examined by mammography, and two radiologists interpret the images. Women with positive results, or women recommended for further assessment, are recalled for further examination (FCR, 2010). The program is administered according to the European guidelines (Perry et al., 2006).
This study is based on breast cancer screening data provided by the Mass Screening Registry of the Finnish Cancer Registry. The Mass Screening Registry regularly provides the screening units with the names and addresses of the women to be invited (FCR, 2010). The screening units also receive information about the age groups included in the screening program each year (FCR, 2010). The names and addresses are obtained from the Population Register Centre.
According to the FCR (2010), the data on invitations and screening mammogram examinations are stored in the Mass Screening Registry at the Finnish Cancer Registry. The Registry maintains the database for breast cancer screening and provides annual statistics on breast cancer screening. In addition, the Registry evaluates the impact of screening on mortality and the quality of the screening program.
2.2 Benefits and harms of screening
The positive effects of screening have been widely studied and are well established. Prevention of disease-specific mortality is the main aim of cancer screening (IARC, 2002). Earlier diagnosis may improve the outcome of treatment, reducing disease-specific mortality, and widen the treatment options, such as breast-conserving therapy.
Organized mammography screening in Finland has been shown to be effective (Sarkeala et al., 2008b). The incidence-based mortality from breast cancer has been reported to decrease by approximately 22–28% among the invited women (Sarkeala et al., 2008a).
Screening can have various disadvantages. The most important harms of screening are due to adverse events in the diagnosis and management of cancer. These are related to overdiagnosis, overtreatment and treatment-related complications (Puliti et al., 2012; Heinävaara et al., 2014; Lauby-Secretan et al., 2015; Harris et al., 2014). The most reliable estimates indicate overdiagnosis of approximately 1–10% among all invited (Puliti et al., 2012).
Various psychological harms may occur during the screening process, including those related to pain from the screening test and to the confirmation of and information about the test result. Low diagnostic specificity and false-positive test results can increase adverse psychosocial consequences (Brett et al., 2005). False-positive results are inevitable in breast cancer screening and therefore need to be tolerated to some extent. It is important to keep the further assessment rate as low as possible without missing breast cancer cases.
2.3 An overview of false-positive screening result
The negative effects of false-positive screening results have been widely studied. False-positive results increase psychological harm (Brodersen and Siersma, 2013; Bond et al., 2013). Recalls lead to additional hospital visits, diagnostic tests, radiation exposure and costs (Lerman et al., 1991; Bull and Campbell, 1991; Harris et al., 2014). A false-positive screening result may also lead to decreased participation in future screening rounds (Salz et al., 2011).
The risk of a false-positive test result is influenced by the screening procedures, such as the screening interval, single versus double reading by radiologists, the way the screening tests are performed, and the training and experience of the radiologists (Roman et al., 2014). In addition, characteristics of the women, such as age, screening history, use of hormone therapy, breast density and previous invasive procedures, are likely to affect the risk of a false-positive screening test result (Roman et al., 2014). Dense breast tissue makes it more difficult to detect a possible cancer, and women with dense breast tissue tend to be younger. Therefore younger women have a higher risk of receiving an FP screening result.
A population-based study in Spain (Castells et al., 2006) found that the factors associated with a higher cumulative risk of false-positive recall were previous benign breast disease, premenopausal status, body mass index above 27.3 and age 50–54 years. Another study from the USA (Christiansen et al., 2000) found that previous breast biopsies, the number of previous breast biopsies, family history of breast cancer, premenopausal status and hormonal therapy were all associated with a higher probability of false-positive mammograms. Contrary to Castells et al. (2006), Christiansen et al. (2000) found that BMI was not a statistically significant factor contributing to a higher risk of an FP result.
In Europe, the cumulative risk of a false-positive screening result after 10 biennial screening rounds has been estimated to be 19.7% (Hofvind et al., 2012). In the U.S., the reported cumulative risk of a false-positive test result after 10 years of biennial screening is higher, ranging from 41.6% to 49.1% (Elmore et al., 1998; Christiansen et al., 2000; Hubbard et al., 2011).
In the Norwegian breast cancer screening program, the overall cumulative risk of a false-positive screening test for all centers was 20% (95% CI: 19.7–20.4%) (Roman et al., 2013), estimated with generalized linear mixed models (GLMM) with the regularity of attendance as an explanatory variable. Roman et al. (2013) found that irregular screening attendees had a higher risk of receiving an FP screening test result, with an odds ratio of 1.12 (95% CI: 1.06–1.20). In Spain the cumulative false-positive risk was estimated at 20.39% (95% CI: 20.02–20.76%) by using the same regression model as in the Norwegian study (Roman et al., 2012). In addition, in a study from Denmark, Njor et al. (2007) estimated that, after 3–5 completed screening rounds, the extrapolated cumulative risk of a false-positive test over a program of 10 screening rounds was 15.8% for a woman participating in the program.
2.4 Methods used estimating cumulative probability
Internationally, the methods used to calculate the cumulative probability vary across countries. Example studies are gathered in Table 2.1 to describe the different methods used in estimating the cumulative probability of FP results.
Estimating the cumulative risk of FP with a particular model is challenging because of differences between programs, screening populations, radiologists' interpretations and resources (Hubbard et al., 2010). The estimation is also difficult within a single country. In Finland, even though breast cancer screening is based on a government decree, the process still varies between municipalities, some inviting more extensively than others. The estimation is further complicated by the fact that not all individuals receive the intended screening examinations, due to non-adherence or drop-out.
Hubbard et al. (2010) point out a related issue: since not all subjects attend all the recommended screening rounds or receive all the intended invitations, the number of screening rounds is right-censored. Hubbard et al. (2010) studied the statistical methods used in estimating the cumulative risk of a false-positive screening test result and identified two approaches: conditional and marginal. Conditional approaches estimate the risk of an FP result within subsets of the data, namely for women who attend a certain number of screening rounds, relying on the assumption that the probability of an FP result at each screening round is independent of the total number of screening rounds attended. In contrast, marginal approaches rely on the assumption that the probability of a first FP result at each screening round is constant after censoring (Hubbard et al., 2010).
We use the conditional approach to estimate the cumulative risk of an FP screening result conditional on age, attendance, previous FP results and regularity of attendance. We estimate the probability that, after undergoing j screening examinations, a woman has received at least one FP test result. In addition, we are interested in how certain characteristics, such as having received previous FP results and the regularity of attending the screening program, influence the FP probability. These variables were also used by Baker et al. (2003) and Roman et al. (2013), respectively.
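The quantity of interest, the probability of at least one FP result after j examinations, can be illustrated with a minimal Python sketch. It assumes that each per-round probability is the probability of an FP result at that round given no FP result at earlier rounds, so that the chain rule gives the probability of no FP result as a product. The function name and the ten probabilities are hypothetical illustrations, not estimates from this thesis, and the sketch is not the formula of Section 3.3.

```python
from math import prod


def cumulative_fp_probability(round_probs):
    """Return P(at least one FP) over the given screening rounds.

    Assumption: round_probs[j] is P(FP at round j+1 | no FP at earlier rounds),
    so P(no FP in any round) is the product of the (1 - p_j).
    """
    for p in round_probs:
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    return 1.0 - prod(1.0 - p for p in round_probs)


# Hypothetical per-round probabilities for 10 biennial rounds (illustration only).
example_probs = [0.030, 0.020, 0.018, 0.017, 0.016,
                 0.015, 0.015, 0.014, 0.014, 0.013]
cumulative = cumulative_fp_probability(example_probs)
print(f"Cumulative FP probability: {cumulative:.4f}")
```

Even when each single-round probability is small, the cumulative probability over ten rounds is considerably larger, which is why the cumulative measure is of interest for program evaluation.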
The calculation of the cumulative probability has been studied previously, for instance by Baker et al. (2003) and Gelfand and Wang (2000). For a more comprehensive review of the methodological aspects we refer to Hubbard et al. (2010).
In this study we use generalized estimating equations (GEE) to model the probability of an FP test result at a single screening examination. The cumulative probability is then computed from these single-examination probabilities by using elementary probability theory, as in previous studies. Various methods have been used to calculate the confidence intervals. We use a suitable modification of standard bootstrapping.
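The abstract describes the confidence intervals as coming from a Monte Carlo simulation based on the asymptotic distribution of the GEE estimates. The following Python sketch shows one way such a simulation could look, under strong simplifying assumptions: a logistic model with only an intercept and an age effect, a hypothetical estimate `beta_hat` with a hypothetical 2×2 covariance matrix `cov`, and percentile intervals. None of the numbers or names come from the thesis, and the actual procedure is the one given in Section 3.4.

```python
import math
import random


def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))


def cumulative_risk(beta, ages):
    """1 - prod(1 - p_j), with logit(p_j) = beta[0] + beta[1] * (age_j - 50) / 10."""
    no_fp = 1.0
    for age in ages:
        p = inv_logit(beta[0] + beta[1] * (age - 50) / 10.0)
        no_fp *= 1.0 - p
    return 1.0 - no_fp


def draw_bivariate_normal(mean, cov, rng):
    """Draw from N(mean, cov) for a 2x2 covariance using its Cholesky factor."""
    l11 = math.sqrt(cov[0][0])
    l21 = cov[1][0] / l11
    l22 = math.sqrt(cov[1][1] - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return (mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2)


def monte_carlo_ci(beta_hat, cov, ages, n_draws=5000, level=0.95, seed=1):
    """Percentile interval of the cumulative risk over simulated coefficient draws."""
    rng = random.Random(seed)
    sims = sorted(
        cumulative_risk(draw_bivariate_normal(beta_hat, cov, rng), ages)
        for _ in range(n_draws)
    )
    lower_index = int((1.0 - level) / 2.0 * n_draws)
    upper_index = int((1.0 + level) / 2.0 * n_draws) - 1
    return sims[lower_index], sims[upper_index]


# Hypothetical estimate and covariance (illustration only, not thesis results).
beta_hat = (-3.8, -0.5)
cov = [[0.0004, -0.0001], [-0.0001, 0.0009]]
ages = list(range(50, 70, 2))  # ten biennial rounds at ages 50, 52, ..., 68

point = cumulative_risk(beta_hat, ages)
lower, upper = monte_carlo_ci(beta_hat, cov, ages)
print(f"{point:.4f} ({lower:.4f} to {upper:.4f})")
```

Simulating the coefficients and pushing each draw through the cumulative-risk formula avoids deriving the variance of a nonlinear function of the estimates analytically.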
We use GEE because it allows for within-subject correlation between observations. GEE is a robust method for longitudinal epidemiological studies and seems a reasonable choice for the problem at hand. Previous studies (Roman et al., 2012, 2013; Hubbard et al., 2011; Christiansen et al., 2000) have used, for instance, Bayesian or GLMM methods; for a further overview of the methodology we refer to Table 2.1.
Table 2.1. Summary of studies performed to estimate the cumulative probability of a FP screening result.

Country   Population   Age group   Screening rounds estimated   Cumulative risk of FP   Reference
Europe    364,991      50–69       10                           19.7%                   Hofvind et al., 2012
Norway    231,310      50–69       10                           20%                     Roman et al., 2013
Spain     1,565,364    45–69       10                           20.39%                  Roman et al., 2012
Denmark   21,261†      50–69       10                           15.8%                   Njor et al., 2007
Denmark   9,039‡       50–69       10                           8.1%                    Njor et al., 2007
Italy     28,500       50–69       7                            15.2%                   Puliti et al., 2011
USA††     169,456      40–59       10                           41.6%‡‡                 Hubbard et al., 2011
USA       2,400        40–69       10                           49.1%                   Elmore et al., 1998
USA       2,227        40–69       9                            43.1%                   Christiansen et al., 2000

Screen interval: 2 years, given for seven of the nine studies.
Observed screening rounds, in the order listed (eight entries): 6§, 6§, 5§, 5§, 7§, 13, 9, 9.
Method, in the order listed (eight entries): Note*†, GLMM¶, GLMM¶, Note*†, Note*†, GLMM¶, MCMC, Bayesian.

† Copenhagen. ‡ Fyn.
§ Values for the rest of the screening rounds are based on the last observed screening round.
¶ Generalised linear mixed model (Greenwood's approximation for CIs).
*† Estimation used by Hofvind (Hofvind et al., 2004).
†† 7 mammography registries.
‡‡ 41.6% with biennial screening and 61.3% with annual screening.
Europe: pooled estimate based on three studies (Hofvind et al., 2004; Njor et al., 2007; Salas et al., 2011).
MCMC: Markov chain Monte Carlo via Gibbs sampler to sample from the posterior distribution (Gelfand and Smith, 1990).
Boston (footnote marker missing).
2.5 Study population
The study population included 84,475 women aged 50–51 years at the time of their first mammography invitation in 1992–1995. The women are identified by the unique personal identification number given to all citizens of Finland. Information about invitations, attendance, further examinations and screening outcomes is registered in the Finnish Mass Screening Registry database, with the personal identification number used as the identifier for each woman.
Breast cancer screening is performed by digital mammography. The Finnish breast cancer screening program uses independent double reading by radiologists. Each reader scores both breasts on a scale from 0 to 5: a score of 0 indicates test failure, 1 a normal screening result, and 5 a finding highly suspicious for malignancy. If the result suggests malignancy, the woman is recommended for further assessment and receives an invitation for further examination or a referral for a surgical procedure.
Further examination covers additional mammography, clinical breast examination, ultrasound, pneumocystography and fine-needle or core-needle biopsy. Surgical procedures cover lumpectomy and partial or radical mastectomy. If no malignancy is found, the woman is referred back to routine screening. All malignancies, whether invasive or ductal carcinoma in situ, are histologically verified. Women who receive a diagnosis of breast cancer are referred for treatment.
2.6 Definitions
A recall is defined as a woman receiving an invitation for further examinations. Here further examination procedures comprise additional mammography, ultrasound, pneumocystography and biopsy. A referral is defined as a woman receiving an invitation for a surgical procedure.
A positive diagnostic finding is classified as a true positive if breast cancer (invasive or ductal carcinoma in situ) is diagnosed in the woman on the basis of histological findings.
The outcome of a mammography screening examination is defined as a false-positive, normal or true-positive test result. In this study, a screening test was defined as a false-positive test result if a woman attended the screening test and was recalled or referred for further assessment without being diagnosed with either invasive carcinoma or ductal carcinoma in situ. Any recall or referral for further assessment was considered a false-positive screening result if breast cancer was not diagnosed during the diagnostic workup, regardless of the procedures performed.
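The outcome definition above can be written as a small decision rule. The Python sketch below is only an illustration: the function name, the argument names and the string labels are hypothetical and do not reflect the coding used in the Mass Screening Registry.

```python
def classify_screen(attended, recalled_or_referred, cancer_diagnosed):
    """Classify one screening examination following the definitions above.

    Returns None for a round the woman did not attend, since an outcome is
    only defined for attended screening examinations.
    """
    if not attended:
        return None
    if not recalled_or_referred:
        return "normal"
    # Recalled or referred: the diagnostic workup decides between TP and FP.
    return "true positive" if cancer_diagnosed else "false positive"


print(classify_screen(True, True, False))  # a recall without a cancer diagnosis
```

The rule ignores which further examinations were performed, in line with the definition that any recall or referral without a breast cancer diagnosis counts as a false-positive result.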
We also examined whether the outcomes of subsequent screens were independent, that is, whether the probability of a woman receiving an FP result depended on whether she already had a previous FP result. Therefore we also defined a variable for previous FP results.
Previous FP was also studied as an accumulated factor: women were divided into three groups according to whether they had no, one, or more than one previous FP result during the screening program.
Regularity of attendance was also used to model the risk of an FP screening result. A woman was considered an irregular attendee at a screening round if she had missed her previous screening examination, meaning that there were 3.5 years (42 months) between her screening examinations. More specific definitions of the variables used in the model are given in Section 3.3.
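To make the two history-based covariates concrete, the following Python sketch derives them from one woman's attended screens. It is a hedged illustration: the data layout (month, outcome), the labels, and the rule that a gap of at least 42 months marks an irregular attendance are assumptions made here, and the precise definitions are those of Section 3.3.

```python
def add_history_covariates(screens, irregular_gap_months=42):
    """Attach previous-FP category and irregularity flag to each attended screen.

    screens: list of (month, outcome) tuples for one woman, sorted by month,
    where month counts from her first screen (hypothetical layout).
    """
    rows = []
    fp_count = 0
    previous_month = None
    for month, outcome in screens:
        if fp_count == 0:
            previous_fp = "none"
        elif fp_count == 1:
            previous_fp = "one"
        else:
            previous_fp = "more than one"
        # Assumed rule: irregular if the gap since the last attended screen
        # is at least the threshold (42 months by default).
        irregular = (
            previous_month is not None
            and month - previous_month >= irregular_gap_months
        )
        rows.append({"month": month, "outcome": outcome,
                     "previous_fp": previous_fp, "irregular": irregular})
        if outcome == "false positive":
            fp_count += 1
        previous_month = month
    return rows


# Hypothetical history: the round due at month 48 was missed.
history = [(0, "normal"), (24, "false positive"), (72, "normal"),
           (96, "false positive"), (120, "normal")]
for row in add_history_covariates(history):
    print(row)
```

Both covariates depend only on events before the current screen, so they are known at the time the current outcome is modeled.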
The cumulative risk of a false-positive screening result is defined as the risk of experiencing at least one false-positive test result in the biennial 20-year screening program. The estimates of the cumulative probability of a false-positive test result were calculated by using the formula provided in Section 3.3.
3 Statistical methods
We start by introducing the generalized linear model (GLM). We first present the elements of the GLM in more detail, together with the likelihood equation and the asymptotic distribution for GLMs. We then move on to generalized estimating equations (GEE) and address the specific elements of the GEE method, including asymptotic behavior, the sandwich estimator and the corresponding correlation matrix.
In the last two sections we introduce the calculation of the cumulative probability and a method, based on a bootstrapping argument, for calculating the confidence intervals.
3.1 Generalized linear model
This section is based on Agresti (2015), Fitzmaurice et al. (2012) and Hardin (2005).
3.1.1 Preliminaries
Given a dataset $(y_i, x_i)_{i=1}^n$, our aim is to build a model which explains the effect of the independent variables $x$ on the studied dependent variable $y$. In the standard linear model, one studies the model
\[ y_i = x_i'\beta + \varepsilon_i, \qquad i = 1, 2, \dots, n, \]
for
\[ x_i = \begin{pmatrix} x_{i1} \\ x_{i2} \\ \vdots \\ x_{ip} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \vdots \\ \beta_p \end{pmatrix}, \]
and for the errors $\varepsilon_i$, $i = 1, 2, \dots, n$, where the aim is to determine the coefficient vector $\beta$.
In the generalized linear model (GLM) the relation between the response and the independent variables is not necessarily linear, and thus one introduces an additional link function to describe the modeled relation. In this setting, it is also more natural to model the mean $E(y_i \mid x_i)$ instead of the true value with some error, as above.
Definition 3.1. The generalized linear model is defined via the following three elements:
1. a probability distribution of the response variable $y$ from the exponential family:
\[ f(y_i; \theta_i, \phi) = \exp\left( \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + c(y_i, \phi) \right); \tag{3.1} \]
2. a linear predictor $\eta = X\beta$;
3. a link function $g$ such that
\[ g\bigl[E(y \mid X)\bigr] = \eta, \tag{3.2} \]
where in (3.2) we have abused the notation: on the left-hand side the function $g : \mathbb{R} \to \mathbb{R}$ is to be interpreted as acting on the components of the vector $E(y \mid X)$.
The first element is the distributional assumption. GLMs are an extended family of probability models for a univariate response variable $y_i$. For each element $i$ in the dataset, we have a particular distribution depending on $\theta_i$, where $\theta_i$ is typically determined via the data $x_i$. This relation can also be seen in (3.2), as we will later explain.
The exponential family includes the normal distribution for a continuous response, the binomial distribution for a binary response, and the Poisson distribution for counts. It also includes several other distributions, such as the gamma, beta and negative binomial distributions.
Each element $\eta_j$ of the linear predictor is a function of the observed independent variables $x_j$ of the $j$'th element in the dataset $(x_i)_{i=1}^n$. Given the observed dataset
\[ X = \begin{pmatrix} x_{11} & \dots & x_{1p} \\ \vdots & & \vdots \\ x_{n1} & \dots & x_{np} \end{pmatrix} = \begin{pmatrix} x_1' \\ \vdots \\ x_n' \end{pmatrix} \]
we obtain the linear predictor $\eta = X\beta$, which is then used to obtain the fitted value for the mean of the random component $y = (y_1, \dots, y_n)'$ via the model (3.2). Unlike earlier, here $y$ is a random variable corresponding to the response variable, and in particular, it is not observed in the dataset. (Agresti, 2015, 2–4.)
3.1.2 The exponential family
All distributions that belong to the exponential family can be expressed in the same general form (3.1), where $\theta$ is the location parameter (natural parameter), $a$ is known as the prior weight and $\phi$ is a scale or dispersion parameter. Although the distributions cover quite different data types, such as continuous, dichotomous and count data, the members of the family have much in common. The function $c$ is merely a normalization term. Indeed, for $f$ to be a probability distribution, its integral needs to equal 1. As the distribution may vary with $\phi$, we let the normalization term depend on this parameter.
The location parameter $\theta$ is closely related to the mean of the distribution, and $\phi$ is related to the variance. For some distributions $\phi$ is a known constant and does not require estimation; in that case $y$ has a one-parameter distribution. For other distributions $\phi$ is unknown, and $y$ has a two-parameter distribution.
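Example 3.1. Recall that the probability density function of the normal distribution is
\[ f(y; \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{(y-\mu)^2}{2\sigma^2} \right), \tag{3.3} \]
which can be written as
\[ \begin{aligned} f(y; \mu, \sigma^2) &= \exp\left[ -\tfrac{1}{2}\log(2\pi\sigma^2) \right] \cdot \exp\left[ -\frac{(y-\mu)^2}{2\sigma^2} \right] \\ &= \exp\left[ -\frac{y^2 - 2y\mu + \mu^2}{2\sigma^2} - \frac{\log(2\pi\sigma^2)}{2} \right] \\ &= \exp\left[ \frac{y\mu - \tfrac{1}{2}\mu^2}{\sigma^2} - \frac{1}{2}\left( \frac{y^2}{\sigma^2} + \log(2\pi\sigma^2) \right) \right]. \end{aligned} \tag{3.4} \]
From this representation we observe that the normal distribution is a member of the exponential family, with the canonical location parameter $\theta = \mu$ and the dispersion parameter $\phi = \sigma^2$. These correspond to the mean and the variance. Indeed, by choosing $a(\phi) = \phi$ as well as
\[ b(\theta) = \tfrac{1}{2}\theta^2 = \tfrac{1}{2}\mu^2 \tag{3.5} \]
and
\[ c(y, \phi) = -\frac{1}{2}\left[ \frac{y^2}{\phi} + \log(2\pi\phi) \right] = -\frac{1}{2}\left[ \frac{y^2}{\sigma^2} + \log(2\pi\sigma^2) \right], \tag{3.6} \]
we immediately see that (3.4) is of the form (3.1).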
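The identity in Example 3.1 can be checked numerically. The short Python sketch below evaluates the normal density once in its usual form (3.3) and once in the exponential-family form (3.1) with $a(\phi) = \phi$, $b(\theta) = \theta^2/2$ and $c(y, \phi)$ as in (3.6). The function names and test points are illustrative choices made here.

```python
import math


def normal_pdf(y, mu, sigma2):
    """Normal density in its usual form, equation (3.3)."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)


def exp_family_pdf(y, theta, phi):
    """Exponential-family form (3.1) with the choices (3.5) and (3.6)."""
    a = phi
    b = 0.5 * theta ** 2
    c = -0.5 * (y ** 2 / phi + math.log(2.0 * math.pi * phi))
    return math.exp((y * theta - b) / a + c)


# Arbitrary test points: (y, mu, sigma^2).
points = [(0.0, 0.0, 1.0), (1.3, -0.7, 2.5), (-2.0, 4.0, 0.3)]
for y, mu, sigma2 in points:
    print(normal_pdf(y, mu, sigma2), exp_family_pdf(y, theta=mu, phi=sigma2))
```

The two columns agree up to floating-point rounding, as (3.4) predicts.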
For the sake of completeness, we next calculate the mean and variance of y in this
setting. For f defined in (3.1) we have
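\[ \begin{aligned} E_\theta\left[ \frac{\partial}{\partial\theta} \log f(y; \theta) \right] &= E_\theta\left[ \frac{\frac{\partial}{\partial\theta} f(y; \theta)}{f(y; \theta)} \right] = \int_{-\infty}^{\infty} \frac{\frac{\partial}{\partial\theta} f(y; \theta)}{f(y; \theta)}\, f(y; \theta)\, dy \\ &= \int_{-\infty}^{\infty} \frac{\partial}{\partial\theta} f(y; \theta)\, dy = \frac{\partial}{\partial\theta} \int_{-\infty}^{\infty} f(y; \theta)\, dy = \frac{\partial}{\partial\theta} 1 = 0, \end{aligned} \tag{3.7} \]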
provided f is differentiable and regular enough for the above change of the order
of integration and differentiation to be justified. Here we also used the fact, that the
total mass of the probability density function is 1.
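Since
\[ \frac{\partial}{\partial\theta} \log f(y; \theta, \phi) = \frac{\partial}{\partial\theta}\left[ \frac{y\theta - b(\theta)}{a(\phi)} \right] = \frac{y}{a(\phi)} - \frac{b'(\theta)}{a(\phi)}, \tag{3.8} \]
we obtain, by taking expectations and using (3.7), that
\[ \mu \overset{\text{def}}{=} E_\theta(y) = b'(\theta). \tag{3.9} \]
Similarly for the variance we have
\[ \frac{\partial^2}{\partial\theta^2} \log f(y; \theta, \phi) = -\frac{b''(\theta)}{a(\phi)}. \tag{3.10} \]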
By using the standard properties of the Fisher information we obtain
" 2
#
! 23
26
7
@
@
6
E✓
log f (y; ✓, ) = E✓ 6
log f (y; ✓, ) 77 .
2
64 @✓
75
@✓
This together with (3.9) implies
" 2
#
! 23
26
7
b00 (✓)
@
@
6
= E✓
log f (y, ✓, ) = E✓ 6
log f (y; ✓, ) 77
2
a( )
64 @✓
75
@✓
(3.11)
! 23
⇥
⇤
26
y b0 (✓) 77 E (y µ) 2
Var(y)
6
= E✓ 6
=
=
.
7
2
a( )
64
75
a( )
a( ) 2
Finally, we obtain
(3.12)
Var(y) = b00 (✓)a( ),
where b00 (✓) is called the variance function. It describes how the variance relates to
the mean parameter ✓. (Agresti, 2015, 120–123.)
16
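The identities (3.9) and (3.12) can be checked numerically. The following is a minimal sketch, assuming finite-difference derivatives are accurate enough for illustration. The normal case uses $b(\theta) = \theta^2/2$ from Example 3.1. The Bernoulli case, with $b(\theta) = \log(1 + e^\theta)$ and $a(\phi) = 1$, is a standard fact that is not derived in the text. The helper names are hypothetical.

```python
import math

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation of f''(x)."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def mean_and_variance(b, theta, a_phi):
    """Return (b'(theta), b''(theta) * a(phi)), cf. (3.9) and (3.12)."""
    return derivative(b, theta), second_derivative(b, theta) * a_phi

# Normal distribution of Example 3.1: b(theta) = theta^2 / 2, a(phi) = sigma^2.
mu, sigma2 = 1.5, 4.0
normal_mean, normal_var = mean_and_variance(lambda t: 0.5 * t * t, mu, sigma2)

# Bernoulli distribution (assumption, not derived in the text):
# b(theta) = log(1 + exp(theta)), a(phi) = 1, theta = logit(p).
p = 0.3
theta = math.log(p / (1 - p))
bern_mean, bern_var = mean_and_variance(
    lambda t: math.log(1 + math.exp(t)), theta, 1.0)

print(normal_mean, normal_var)  # close to mu and sigma2
print(bern_mean, bern_var)      # close to p and p * (1 - p)
```

The Bernoulli case is the one relevant for the binary FP outcome modeled later.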
3.1.3
Linear predictor
The second element is the linear predictor. The linear predictor specifies the effects
of the covariates $x_i$ on the mean of $y_i$ and it is expressed as
\[
\eta_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_p x_{ip}
= \sum_{j=1}^{p} \beta_j x_{ij} = x_i'\beta. \tag{3.13}
\]
The linear predictor is a linear combination of the unknown vector of regression
coefficients $\beta = (\beta_1, \dots, \beta_p)'$ and the vector of covariates $x_i$. The objective of
the statistical analysis is to determine an estimate for the regression coefficient vector $\beta$.
(Agresti, 2015, 123.)
3.1.4
Link function
The last element is the link function. The link function connects the random component
and the linear predictor of the GLM. It describes the relation between E(y|X) and
the linear predictor $\eta$. The link function is a known function g(·), which is predetermined
in the model specification. Now the random and systematic components are
directly related via
\[
g\left[ E(y_i \mid X) \right] = g\left[ E(y_i \mid x_i) \right] = \eta_i = x_i'\beta. \tag{3.14}
\]
In the sequel, we will denote $\mu_i \overset{\text{def}}{=} E(y_i \mid x_i)$.
Example 3.2. The standard linear regression model can be represented as a GLM
by choosing the identity link function $g(\mu_i) = \mu_i$.
In principle, any function g(·) can be chosen to link the mean of $y_i$ to the linear
predictor. However, every distribution that belongs to the exponential family has a
special link function called the canonical link. The canonical link function is defined
as
\[
g(\mu_i) = \theta_i, \tag{3.15}
\]
where $\theta_i$ is the canonical location parameter, $\theta_i = \sum_{j=1}^{p} \beta_j x_{ij} = \eta_i$. The link function
relates the mean of $y_i$ to the linear predictor and determines the scale on which the
additive effects of covariates have an impact on the mean response. The link function
has no other limitations except that it is monotone and differentiable. (Fitzmaurice
et al., 2012, 327–335; Agresti, 2015, 123.)
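As a small illustration of the link function, the following sketch evaluates the linear predictor (3.13) and maps it to the mean through the logit link, which is the link used later in the analysis. The covariate row and the coefficients are hypothetical values chosen only for the example.

```python
import math

def logit(mu):
    """Logit link for a binary response: maps (0, 1) to the real line."""
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    """Inverse link: maps the linear predictor back to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def linear_predictor(x, beta):
    """eta_i = x_i' beta, cf. (3.13)."""
    return sum(xj * bj for xj, bj in zip(x, beta))

# Hypothetical covariate row (first entry is the intercept) and coefficients.
x_i = [1.0, 2.0]
beta = [-1.0, 0.25]
eta_i = linear_predictor(x_i, beta)  # -1.0 + 0.25 * 2.0
mu_i = inv_logit(eta_i)              # fitted mean on the probability scale
print(eta_i, mu_i, logit(mu_i))
```

Applying the link to the fitted mean recovers the linear predictor, which is the content of (3.14).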
3.1.5
Likelihood equation and asymptotic distribution for GLM
The parameter $\beta$ can be estimated using maximum likelihood. By maximizing the
likelihood function, the method chooses the values of the coefficients under which the
observed data are most likely to have been generated. In GLM the response is assumed to have
a distribution belonging to the aforementioned exponential family of distributions.
Recall that for independent observations $y_1, \dots, y_n$ the likelihood function is defined
as
\[
L(\theta \mid y) \overset{\text{def}}{=} \prod_{j=1}^{n} f_\theta(y_j),
\]
where f is a continuous probability distribution with density function depending on
the parameter $\theta$.
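We will study the log-likelihood, and for that we define
\[
l(\beta) = \sum_{i=1}^{n} l_i(\beta) = \log \prod_{i=1}^{n} f(y_i)
= \sum_{i=1}^{n} \log f(y_i; \theta_i, \phi)
= \sum_{i=1}^{n} \frac{y_i\theta_i - b(\theta_i)}{a(\phi)} + \sum_{i=1}^{n} c(y_i, \phi). \tag{3.16}
\]
The notation $l(\beta)$ reflects the fact that the parameters $\theta_1, \dots, \theta_n$ depend on $\beta$. We
maximize the likelihood function via the first-order condition
\[
0 = \sum_{i=1}^{n} \frac{\partial l_i}{\partial \beta_j}
= \sum_{i=1}^{n} \frac{\partial l_i}{\partial \theta_i} \frac{\partial \theta_i}{\partial \mu_i} \frac{\partial \mu_i}{\partial \eta_i} \frac{\partial \eta_i}{\partial \beta_j}, \tag{3.17}
\]
where we used the chain rule to differentiate the log-likelihood (3.16).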
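Since $\mu_i = b'(\theta_i)$ as well as $\mathrm{Var}(y_i) = b''(\theta_i)\,a(\phi)$ (cf. (3.9) and (3.12)) and
\[
\frac{\partial l_i}{\partial \theta_i} = \frac{y_i - b'(\theta_i)}{a(\phi)},
\]
we have
\[
\frac{\partial l_i}{\partial \theta_i} = \frac{y_i - \mu_i}{a(\phi)}
\quad\text{and}\quad
\frac{\partial \mu_i}{\partial \theta_i} = b''(\theta_i) = \frac{\mathrm{Var}(y_i)}{a(\phi)}. \tag{3.18}
\]
In addition, since
\[
\eta_i = \sum_{k=1}^{p} \beta_k x_{ik},
\]
we have $\partial\eta_i / \partial\beta_j = x_{ij}$. Finally, since $\eta_i = g(\mu_i)$, the derivative $\partial\mu_i / \partial\eta_i$ depends on the link function of
the model. In summary, substituting into (3.17), we obtain
\[
\frac{\partial l_i}{\partial \beta_j}
= \frac{y_i - \mu_i}{a(\phi)} \cdot \frac{a(\phi)}{\mathrm{Var}(y_i)} \cdot \frac{\partial \mu_i}{\partial \eta_i}\, x_{ij}
= \frac{(y_i - \mu_i)\, x_{ij}}{\mathrm{Var}(y_i)} \frac{\partial \mu_i}{\partial \eta_i}. \tag{3.19}
\]
By summing over i we obtain the estimating equations for the GLM:
\[
\frac{\partial l(\beta)}{\partial \beta_j}
= \sum_{i=1}^{n} \frac{(y_i - \mu_i)\, x_{ij}}{\mathrm{Var}(y_i)} \frac{\partial \mu_i}{\partial \eta_i} = 0,
\qquad j = 1, 2, \dots, p, \tag{3.20}
\]
where $\eta_i = \sum_{j=1}^{p} \beta_j x_{ij} = g(\mu_i)$ for the link function g. These are p equations for the p
unknown variables $\beta_1, \dots, \beta_p$.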
Let V denote the diagonal matrix of variances of the observations, and let D denote
the diagonal matrix with elements $\partial\mu_i / \partial\eta_i$. For what follows, we observe that with this
notation, we may write the estimating equations in the matrix form as
\[
X' D V^{-1} (y - \mu) = 0. \tag{3.21}
\]
After solving for the parameter vector $\beta$, we also need the estimator to have the proper asymptotic
properties. Indeed, the estimating equations are unbiased, and the
asymptotic large-sample distribution of $\hat\beta$ is given by
\[
N\left[ \beta, (X^T W X)^{-1} \right], \tag{3.22}
\]
where W is the diagonal matrix with elements
\[
w_i = \frac{\left( \partial\mu_i / \partial\eta_i \right)^2}{\mathrm{Var}(y_i)}. \tag{3.23}
\]
(Agresti, 2015, 123–126.)
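The estimating equations (3.20) generally have no closed-form solution and are solved iteratively. The following is a minimal sketch of Fisher scoring for a logistic model with an intercept and a single covariate. It is not the routine used in the analysis, and the function name and the toy data are hypothetical. With the logit link, $\partial\mu_i / \partial\eta_i = \mathrm{Var}(y_i) = \mu_i(1-\mu_i)$, so the equations reduce to $X'(y - \mu) = 0$ and the weights (3.23) become $w_i = \mu_i(1-\mu_i)$.

```python
import math

def fit_logistic(x, y, iterations=25):
    """Fisher scoring for logit(mu_i) = b0 + b1 * x_i.

    Returns the estimates (b0, b1) and the score vector X'(y - mu)
    evaluated at the estimates, which should be numerically zero.
    """
    def fitted(b0, b1):
        return [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]

    def score_vector(mu):
        s0 = sum(yi - m for yi, m in zip(y, mu))
        s1 = sum((yi - m) * xi for xi, yi, m in zip(x, y, mu))
        return s0, s1

    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        mu = fitted(b0, b1)
        w = [m * (1.0 - m) for m in mu]          # weights, cf. (3.23)
        s0, s1 = score_vector(mu)
        # Information matrix X'WX (2 x 2, symmetric).
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Update beta <- beta + (X'WX)^{-1} X'(y - mu).
        b0 += (i11 * s0 - i01 * s1) / det
        b1 += (-i01 * s0 + i00 * s1) / det
    return (b0, b1), score_vector(fitted(b0, b1))

# Hypothetical toy data (not separable, so the estimate is finite).
x = [0, 0, 1, 1, 2, 2, 3, 3]
y = [0, 0, 0, 1, 0, 1, 1, 1]
(b0_hat, b1_hat), score = fit_logistic(x, y)
print(b0_hat, b1_hat, score)
```

The inverse of the information matrix at convergence is the covariance matrix appearing in (3.22).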
3.2
Generalized estimating equations
Generalized estimating equations (GEE), developed by Liang and Zeger (1986), are an
extension of generalized linear models. GEE is a semi-parametric approach
used in longitudinal, repeated measures or clustered study designs. GEE permits
the specification of a working correlation matrix that accounts for the within-subject
correlation of the response variables. Indeed, in longitudinal data there are often repeated
measures over time. This may result in correlation within a subject or within clusters of
observations over repeated measures. If the correlation is not taken into account in
the analysis, the inferences about the parameter estimates may be incorrect.
Ballinger (2004) states that, when using linear regression, we may face nonconstant
variance of the dependent variable, especially when modeling probabilities,
as is our goal. In addition, error terms are usually not normally distributed. In such
cases, a power transformation of the dependent variable prior to running an OLS
regression is an alternative option.
The GEE model is specified similarly to GLM as
\[
g(\mu_i) = x_i'\beta. \tag{3.24}
\]
However, contrary to GLM, neither a full specification of the joint distribution nor
maximum likelihood estimation (MLE) is required for the estimation of the
coefficient vector. GEE uses quasi-likelihood estimation rather than MLE or ordinary
least squares (OLS) to estimate the parameters. The quasi-likelihood method posits
a model for the mean function $\mu_{ij} = E(y_{ij})$, and specifies a variance function $V(\mu_{ij})$.
The estimate of $\beta$ is called the quasi-likelihood estimate when we maximize the
normality-based log-likelihood without the assumption that the response is normally
distributed. (Agresti, 2015, 316–317.)
3.2.1
Parameter estimation
GEE consists of the same components as GLM:
1. a probability distribution of the response variable y from the exponential
   family;
2. a linear predictor $\eta = X\beta$;
3. a link function g such that
\[
g\left[ E(y \mid X) \right] = \eta. \tag{3.25}
\]
Here we have again used the same abuse of notation as in (3.2). Now the covariance
structure of the correlated responses is specified and modeled. Contrary to GLM,
the responses $(y_1, y_2, \dots, y_n)$ are allowed to be correlated or clustered. This means
that the cases do not need to be independent. In addition, the errors are allowed
to be correlated and the homogeneity of the variances does not need to be satisfied.
In Section 3.1 we obtained the estimating equations for the GLM (3.20), which
included a linear predictor with an associated coefficient vector to be estimated.
Assuming n clusters with $n_i$ observations in each cluster, the estimating equation
can now be written as
\[
\Psi(\beta) = \left[ \left\{ \sum_{i=1}^{n} \sum_{t=1}^{n_i}
\frac{y_{it} - \mu_{it}}{V(\mu_{it})} \left( \frac{\partial\mu}{\partial\eta} \right)_{it} x_{jit}
\right\}_{j=1,\dots,p} \right]_{p \times 1} = 0. \tag{3.26}
\]
Typically the observations in each cluster are measurements at different time points,
and thus the corresponding index is usually denoted by t.
Using matrix notation, the equation (3.26) can be rewritten as
\[
\Psi(\beta) = \left[ \left\{ \sum_{i=1}^{n} x_{ji}'\, D \left[ V(\mu_i) \right]^{-1} (y_i - \mu_i)
\right\}_{j=1,\dots,p} \right]_{p \times 1} = 0, \tag{3.27}
\]
where D is the diagonal matrix as specified in (3.21), and
\[
V(\mu_i) = \left[ V(\mu_{it})^{1/2}\, I_{(n_i \times n_i)}\, V(\mu_{it})^{1/2} \right]_{n_i \times n_i}. \tag{3.28}
\]
Observe that here we assumed the responses to be independent. Correspondingly,
the identity matrix I can be interpreted as the correlation matrix, where I corresponds
to the assumption that the observations are mutually independent. Next, the aim is
to model the expected value with mutually dependent observations. Indeed, GEE is
a modification of the limited information maximum likelihood estimating equation
for GLMs that simply replaces the identity matrix with a more general correlation
matrix, since the variance matrix for correlated data does not have a diagonal form.
Formally, we replace the identity matrix I with the corresponding correlation matrix
$R(\alpha)$:
\[
V(\mu_i) = \left[ V(\mu_{it})^{1/2}\, R(\alpha)\, V(\mu_{it})^{1/2} \right]_{n_i \times n_i}, \tag{3.29}
\]
where $\alpha$ is a parameter vector that parameterizes the correlation structure. Now replacing
(3.28) in the estimating equation (3.27) by (3.29), we obtain the
generalized estimating equations. (Hardin, 2005, 57–58.) By solving the equations
we find the coefficient vector $\beta$, which is used in the model for the expected value $\mu_{it}$
via $g(\mu_{it}) = x_{it}'\beta$.
While the derivation of the estimating equations explained above is no longer valid,
because the observations are not independent, it turns out that the equations can still be
justified. Indeed, similarly to the formula (3.22) in the case of GLM, under suitable
regularity assumptions, we now have the asymptotic behavior given by
\[
\sqrt{n}\,(\hat\beta - \beta) \xrightarrow{d} N(0, V_G), \tag{3.30}
\]
where the asymptotic covariance matrix $V_G$ is given by
\[
V_G := \lim_{n\to\infty} n
\left[ \sum_{i=1}^{n} D_i' V_i^{-1} D_i \right]^{-1}
\left[ \sum_{i=1}^{n} D_i' V_i^{-1} \mathrm{cov}(y_i) V_i^{-1} D_i \right]
\left[ \sum_{i=1}^{n} D_i' V_i^{-1} D_i \right]^{-1} \tag{3.31}
\]
(Liang and Zeger, 1986).
3.2.2
Sandwich estimator
For inferences about $\beta$, valid standard errors can be obtained from the sandwich
estimator for (3.31) given by
\[
n \left[ \sum_{i=1}^{n} \hat D_i' \hat V_i^{-1} \hat D_i \right]^{-1}
\left[ \sum_{i=1}^{n} \hat D_i' \hat V_i^{-1} (y_i - \hat\mu_i)(y_i - \hat\mu_i)' \hat V_i^{-1} \hat D_i \right]
\left[ \sum_{i=1}^{n} \hat D_i' \hat V_i^{-1} \hat D_i \right]^{-1}, \tag{3.32}
\]
where, compared to (3.31), D and V have been replaced by the sample estimates $\hat D$
and $\hat V$ (Fitzmaurice et al., 2012, 357–359). In other words, GEE estimates regression
coefficients and standard errors with sampling distributions that are asymptotically
normal.
Observe that if the working covariance structure $R(\alpha)$ is correct, i.e. $V_i = \mathrm{cov}(y_i)$,
the approximation $V_G$ for the asymptotic covariance matrix simplifies to the
model-based covariance matrix
\[
V_G = \lim_{n\to\infty} n \left( \sum_{i=1}^{n} D_i' V_i^{-1} D_i \right)^{-1}.
\]
(Agresti, 2015, 317–318.)
Moreover, we note that by using the notation
\[
B = \sum_{i=1}^{n} D_i' V_i^{-1} D_i,
\qquad
M = \sum_{i=1}^{n} D_i' V_i^{-1} \mathrm{cov}(y_i) V_i^{-1} D_i, \tag{3.33}
\]
the asymptotic covariance of $\hat\beta$ can be expressed as
\[
\mathrm{avar}(\hat\beta) := B^{-1} M B^{-1}.
\]
Here B and M are sometimes referred to as the bread and the meat of the
sandwich.
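To make the bread and meat concrete, consider the simplest possible case: an intercept-only model $\mathrm{logit}(\mu) = \beta$ with an independence working correlation. This is a sketch under those assumptions, not the estimator as implemented in any particular package. Here $D_i = \mu(1-\mu)\mathbf{1}$ and $V_i = \mu(1-\mu) I$, so that $B = \sum_i n_i\,\mu(1-\mu)$ and, with $\mathrm{cov}(y_i)$ replaced by the residual product as in (3.32), $M = \sum_i \left( \sum_t (y_{it} - \mu) \right)^2$. The function name and the clustered toy data are hypothetical.

```python
import math

def intercept_only_gee(clusters):
    """Sandwich and model-based variance of beta in logit(mu) = beta.

    Assumes an independence working correlation. The estimating equation
    sum_i sum_t (y_it - mu) = 0 is solved by the overall mean of y.
    """
    n_obs = sum(len(c) for c in clusters)
    mu = sum(sum(c) for c in clusters) / n_obs
    beta = math.log(mu / (1.0 - mu))
    bread = n_obs * mu * (1.0 - mu)                            # B
    meat = sum(sum(y - mu for y in c) ** 2 for c in clusters)  # M
    robust_var = meat / bread ** 2                             # B^{-1} M B^{-1}
    model_var = 1.0 / bread                                    # B^{-1}
    return beta, robust_var, model_var

# Hypothetical clusters with positively correlated binary responses.
clusters = [[1, 1, 1], [0, 0, 0], [1, 1, 0], [0, 0, 1]]
beta_hat, robust_var, model_var = intercept_only_gee(clusters)
print(beta_hat, robust_var, model_var)
```

In this toy example the responses within a cluster tend to agree, and the sandwich variance is larger than the model-based variance, which ignores the within-cluster correlation.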
Considering the convergence results on which the asymptotic properties rely, the
most important assumption is, as usual, that the model is correctly specified. Secondly, in
case of missing data, the consistency of the estimates requires the data to be missing
completely at random. In particular, contrary to maximum likelihood estimation, the
probability of a missing data point should not depend on the value of the missing
item. (Fitzmaurice et al., 2012, 357–361.)
3.2.3
Correlation matrix
Common patterns of correlation matrices $R(\alpha)$ are exchangeable ($\mathrm{corr}(y_{ij}, y_{ik}) = \alpha$),
autoregressive ($\mathrm{corr}(y_{ij}, y_{ik}) = \alpha^{|j-k|}$), independence ($\mathrm{corr}(y_{ij}, y_{ik}) = 0$) and
unstructured ($\mathrm{corr}(y_{ij}, y_{ik}) = \alpha_{jk}$) (Agresti, 2015, 316). These correlation structures
can be expressed in matrix form as follows: the identity matrix
\[
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \tag{3.34}
\]
where observations at different time points are uncorrelated and $V_i$ is diagonal; the
exchangeable matrix
\[
\begin{bmatrix} 1 & p & p \\ p & 1 & p \\ p & p & 1 \end{bmatrix}, \tag{3.35}
\]
where all measurements on the same unit are equally correlated; the autoregressive
matrix of order 1 (AR1)
\[
\begin{bmatrix} 1 & p & p^2 \\ p & 1 & p \\ p^2 & p & 1 \end{bmatrix}, \tag{3.36}
\]
where the correlation depends on the time or distance between measurements i and m, e.g.
a first-order autoregressive model has terms $p, p^2, p^3$; and finally, the unstructured
correlation matrix
\[
\begin{bmatrix} 1 & p_{12} & p_{13} \\ p_{21} & 1 & p_{23} \\ p_{31} & p_{32} & 1 \end{bmatrix}, \tag{3.37}
\]
where there are no assumptions about the correlations.
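The structured working correlation matrices above can be generated directly from their definitions. The following sketch builds them as nested lists; the function names are hypothetical and the parameter value is chosen only for illustration.

```python
def independence(n):
    """Identity working correlation, cf. (3.34)."""
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]

def exchangeable(n, alpha):
    """All off-diagonal correlations equal to alpha, cf. (3.35)."""
    return [[1.0 if j == k else alpha for k in range(n)] for j in range(n)]

def ar1(n, alpha):
    """Correlation alpha^|j-k| between measurements j and k, cf. (3.36)."""
    return [[alpha ** abs(j - k) for k in range(n)] for j in range(n)]

for row in ar1(3, 0.5):
    print(row)
```

The unstructured matrix (3.37) has no generating rule, since each pairwise correlation is a separate parameter.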
Finally, we point out that the GEE estimates are valid even if the correlation matrix
R(↵) is not correctly specified. Indeed, we specify the supposed correlation structure
in (3.29), but the convergence result of (3.30) holds for any such R(↵). Of course, this
may affect the value of the sandwich estimator for a given number of observations n.
In other words, mis-specifying the correlation structure may cause the convergence
to be slower, but it still is asymptotically convergent.
3.3
Cumulative probability
Definition 3.2. We define a random variable to be a real-valued function $f : \Omega \to \mathbb{R}$
(Varadhan, 2001).
Definition 3.3. Let $\Omega$ represent the rows of our dataset including all the screening
invitations, regardless of attendance. Let $\Omega_i$ be the set of all rows for person i
and denote by $\omega_{ik}$ the chronologically ordered k'th row for the i'th individual. Denote
\[
\mathrm{sumFP}_{ik} := \sum_{j=1}^{k} \mathrm{FP}(\omega_{ij}). \tag{3.38}
\]
Here $\mathrm{FP} : \Omega \to \{0, 1\}$ is the random variable corresponding to a false-positive test
result. We define the categorical variable measuring the number of previous
false-positive results as
\[
\mathrm{prevFP}_{ik} = \min\{\mathrm{sumFP}_{ik}, 2\}.
\]
In the sequel, we will not pay attention to any particular individual or any particular
screening round, and thus we will drop the subindices i and k to represent a generic
data row for a generic individual.
A woman is considered to be an irregular attendee at those screening rounds where she
has missed her previous screening examination. Since the breast cancer screening program
in Finland is biennial, this means that for an irregular attendee in this
study, there were 3.5 years (42 months) between the screening rounds. We define
irregular = NA for non-attended screening invitations, and for attended screenings
\[
\mathrm{irregular}_{i,1} = 0 \quad \text{for every } i,
\]
and for $k \ge 2$ that
\[
\mathrm{irregular}_{ik} =
\begin{cases}
1, & \text{if 3.5 years (42 months) after the last screening examination,} \\
0, & \text{otherwise.}
\end{cases}
\]
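Next we calculate the conditional cumulative probability of a false-positive screening
result. Here the probability is conditioned on the woman attending
the screening and having no previous FP result, and on her being either an irregular or a
regular attendee. We have
\[
\begin{aligned}
&\{\omega \in \Omega \mid \mathrm{age}(\omega) = j,\ \mathrm{prevFP}(\omega) = 0,\ \mathrm{attend}(\omega) = 1\} \\
&\quad = \bigcup_{i=0}^{1} \{\omega \in \Omega \mid \mathrm{age}(\omega) = j,\ \mathrm{prevFP}(\omega) = 0,\ \mathrm{attend}(\omega) = 1,\ \mathrm{irregular}(\omega) = i\},
\end{aligned}
\]
where age, prevFP, attend, irregular: $\Omega \to \mathbb{R}$. Therefore, age, prevFP, attend and irregular
are random variables. In what follows, we will denote
\[
\{\omega \in \Omega \mid \mathrm{age}(\omega) = j,\ \mathrm{prevFP}(\omega) = 0\}
= \{\mathrm{age} = j,\ \mathrm{prevFP} = 0\}
\]
and
\[
\begin{aligned}
&\{\omega \in \Omega \mid \mathrm{age}(\omega) = j,\ \mathrm{prevFP}(\omega) = 0,\ \mathrm{attend}(\omega) = 1,\ \mathrm{irregular}(\omega) = i\} \\
&\quad \overset{\text{def}}{=} \{\mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1,\ \mathrm{irregular} = i\}.
\end{aligned}
\]
We begin by using the decomposition
\[
\{\mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1\}
= \bigcup_{i=0}^{1} \{\mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i\}
\]
to deduce
\[
\begin{aligned}
&P\left( \{\mathrm{FP} = 1\} \cap \{\mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1\} \right) \\
&\quad = P\left( \{\mathrm{FP} = 1\} \cap \bigcup_{i=0}^{1} \{\mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i\} \right) \\
&\quad = \sum_{i=0}^{1} P\left( \{\mathrm{FP} = 1\} \cap \{\mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i\} \right).
\end{aligned} \tag{3.39}
\]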
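By using this identity and the definition of the conditional probability we obtain
\[
\begin{aligned}
&P\left( \mathrm{FP} = 1 \mid \mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1 \right) \\
&\quad = \frac{P\left( \{\mathrm{FP} = 1\} \cap \{\mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1\} \right)}{P\left( \mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1 \right)} \\
&\quad = \frac{\sum_{i=0}^{1} P\left( \{\mathrm{FP} = 1\} \cap \{\mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i\} \right)}{P\left( \mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1 \right)} \\
&\quad = \sum_{i=0}^{1} \frac{P\left( \{\mathrm{FP} = 1\} \cap \{\mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i\} \right)}{P\left( \mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i \right)} \\
&\qquad\qquad \times \frac{P\left( \mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i \right)}{P\left( \mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1 \right)} \\
&\quad = \sum_{i=0}^{1} P\left( \mathrm{FP} = 1 \mid \mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = i \right) \\
&\qquad\qquad \times P\left( \mathrm{irregular} = i \mid \mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1 \right) \\
&\quad = P\left( \mathrm{FP} = 1 \mid \mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = 0 \right) \\
&\qquad\qquad \times P\left( \mathrm{irregular} = 0 \mid \mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1 \right) \\
&\qquad + P\left( \mathrm{FP} = 1 \mid \mathrm{age} = j,\ \mathrm{attend} = 1,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = 1 \right) \\
&\qquad\qquad \times P\left( \mathrm{irregular} = 1 \mid \mathrm{age} = j,\ \mathrm{prevFP} = 0,\ \mathrm{attend} = 1 \right).
\end{aligned} \tag{3.40}
\]
This is used later to identify the risk of an FP screening result with respect to age.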
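The last line of (3.40) is a weighted average of the FP risks of regular and irregular attendees. A minimal sketch of this computation follows; the function name and the numerical values are hypothetical and are not estimates from the data.

```python
def fp_risk_given_age(p_fp_regular, p_fp_irregular, p_irregular):
    """P(FP = 1 | age, prevFP = 0, attend = 1) as the mixture in (3.40).

    p_fp_regular   : P(FP = 1 | ..., irregular = 0)
    p_fp_irregular : P(FP = 1 | ..., irregular = 1)
    p_irregular    : P(irregular = 1 | age, prevFP = 0, attend = 1)
    """
    if not 0.0 <= p_irregular <= 1.0:
        raise ValueError("p_irregular must be a probability")
    return p_fp_regular * (1.0 - p_irregular) + p_fp_irregular * p_irregular

# Hypothetical inputs for a single age group.
risk = fp_risk_given_age(p_fp_regular=0.02, p_fp_irregular=0.03, p_irregular=0.10)
print(risk)
```

Both conditional FP risks and the probability of irregular attendance have to be estimated, which is why two models are needed to evaluate (3.40).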
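Next we calculate the cumulative probability of a false-positive test result. We define
new random variables $\mathrm{FP}^k$ and $\mathrm{prevFP}^k$ which describe the corresponding value at
the k'th examination for the i'th individual $\omega_i$. In other words, we define
\[
\mathrm{FP}^k(\omega_i) := \mathrm{FP}(\omega_{ik}), \qquad
\mathrm{prevFP}^k(\omega_i) := \mathrm{prevFP}(\omega_{ik}).
\]
We formally obtain
\[
\begin{aligned}
&P(\text{"FP = 0 for the first k screenings"}) \\
&\quad = P\left( \{\mathrm{FP}^1 = 0\} \cap \{\mathrm{FP}^2 = 0\} \cap \cdots \cap \{\mathrm{FP}^k = 0\} \right) \\
&\quad = \frac{P\left( \{\mathrm{FP}^1 = 0\} \cap \{\mathrm{FP}^2 = 0\} \cap \cdots \cap \{\mathrm{FP}^k = 0\} \right)}{P\left( \{\mathrm{FP}^1 = 0\} \cap \{\mathrm{FP}^2 = 0\} \cap \cdots \cap \{\mathrm{FP}^{k-1} = 0\} \right)}
\times P\left( \{\mathrm{FP}^1 = 0\} \cap \{\mathrm{FP}^2 = 0\} \cap \cdots \cap \{\mathrm{FP}^{k-1} = 0\} \right) \\
&\quad = P\left( \mathrm{FP}^k = 0 \mid \mathrm{prevFP}^k = 0 \right)
\times P\left( \{\mathrm{FP}^1 = 0\} \cap \{\mathrm{FP}^2 = 0\} \cap \cdots \cap \{\mathrm{FP}^{k-1} = 0\} \right),
\end{aligned}
\]
for every $k \ge 2$. Inductively, we obtain
\[
\begin{aligned}
&P(\text{"FP = 0 for the first k screenings"}) \\
&\quad = P\left( \{\mathrm{FP}^1 = 0\} \cap \{\mathrm{FP}^2 = 0\} \cap \cdots \cap \{\mathrm{FP}^k = 0\} \right) \\
&\quad = P\left( \mathrm{FP}^k = 0 \mid \mathrm{prevFP}^k = 0 \right) \times P\left( \mathrm{FP}^{k-1} = 0 \mid \mathrm{prevFP}^{k-1} = 0 \right) \\
&\qquad \times P\left( \{\mathrm{FP}^1 = 0\} \cap \{\mathrm{FP}^2 = 0\} \cap \cdots \cap \{\mathrm{FP}^{k-2} = 0\} \right) \\
&\quad\ \ \vdots \\
&\quad = P\left( \mathrm{FP}^k = 0 \mid \mathrm{prevFP}^k = 0 \right) \times P\left( \mathrm{FP}^{k-1} = 0 \mid \mathrm{prevFP}^{k-1} = 0 \right) \times \\
&\qquad \cdots \times P\left( \mathrm{FP}^2 = 0 \mid \mathrm{prevFP}^2 = 0 \right) \times P\left( \{\mathrm{FP}^1 = 0\} \right).
\end{aligned}
\]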
For the cumulative probability we will only consider the regular attendees, who
participate in the screening regularly every two years starting from age 50. Thus, for
the k'th examination the corresponding whole population is the set of regular attendees
(i.e. irregular = 0) of age 50 + 2(k − 1). We obtain
\[
\begin{aligned}
&P(\text{"FP = 0 for the first k biennial screenings"}) \\
&\quad = P(\mathrm{FP} = 0 \mid \mathrm{age} = 50,\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = 0) \times \\
&\qquad \cdots \times P(\mathrm{FP} = 0 \mid \mathrm{age} = 50 + 2(k-1),\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = 0) \\
&\quad = \prod_{n=1}^{k} \left[ 1 - P(\mathrm{FP} = 1 \mid \mathrm{age} = 50 + 2(n-1),\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = 0) \right] \\
&\quad = \prod_{n=1}^{k} (1 - q_n),
\end{aligned}
\]
where we have denoted
\[
q_n \overset{\text{def}}{=} P(\mathrm{FP} = 1 \mid \mathrm{age} = 50 + 2(n-1),\ \mathrm{prevFP} = 0,\ \mathrm{irregular} = 0).
\]
Therefore, the cumulative probability of at least one FP during the first k screenings
is
\[
p_k \overset{\text{def}}{=} 1 - \prod_{n=1}^{k} (1 - q_n). \tag{3.41}
\]
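Formula (3.41) is straightforward to evaluate once the round-specific probabilities $q_n$ are available. The sketch below uses hypothetical values of $q_n$ for three screening rounds; they are not estimates from the data.

```python
def cumulative_fp_probability(q):
    """p_k = 1 - prod_{n=1}^{k} (1 - q_n), cf. (3.41).

    q is the sequence q_1, ..., q_k of round-specific probabilities of a
    first false-positive result for a regular attendee.
    """
    no_fp = 1.0
    for q_n in q:
        no_fp *= 1.0 - q_n
    return 1.0 - no_fp

# Hypothetical round-specific probabilities for three screening rounds.
q = [0.04, 0.03, 0.02]
p3 = cumulative_fp_probability(q)
print(p3)
```

Note that $p_k$ is smaller than the plain sum of the $q_n$, since a woman can only receive her first FP result once.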
3.4
Confidence intervals
For calculating the confidence intervals for the cumulative probability, we use a
method based on a bootstrapping argument with some computational simplifications.
The bootstrapping method makes it possible to form confidence interval estimates
for an unknown parameter $\beta$ when the distribution of the estimator $\hat\beta$ is unknown. With
the given dataset and a model we form the confidence intervals by generating a large
number of bootstrap estimates.
Without going into excessive detail about the methodology of bootstrapping, we
consider the question through a simple example developed for our setting. In our
case, we consider the model
\[
\Pr(\mathrm{FP}) = \mathrm{logit}^{-1}\left[ \beta_0 + \beta_1 \log(\mathrm{age}_{\mathrm{months}} - 580) + \beta_2 \cdot \mathrm{prevFP} + \beta_3 \cdot \mathrm{irregular} \right],
\]
where the logit function is defined as
\[
\mathrm{logit}(p) = \log\left( \frac{p}{1-p} \right), \qquad p \in (0, 1). \tag{3.42}
\]
Now the model, with the estimated parameter vector $\hat\beta$, provides an estimate for
Pr(FP) given the data, and we can use this estimate of Pr(FP) to produce a new set of
binary variables $\mathrm{FP}_{\mathrm{generated}}$, generated by using these probabilities. In practice, we
obtain a dataset with newly generated values of $\mathrm{FP}_{\mathrm{generated}}$, similarly to Table 3.1.
Table 3.1. Illustration matrix of bootstrapping.

FP_generated   Pr(FP)_model   Age_months   Irregular   prevFP   FP
0              0.01           590          0           1        0
0              0.02           600          0           0        0
1              0.03           610          1           0        1
...            ...            ...          ...         ...      ...
In the bootstrapping argument, we ignore the original values of FP, and produce a
new estimate $\tilde\beta$ for $\beta$ based on the newly generated values of $\mathrm{FP}_{\mathrm{generated}}$. Next, we
calculate the cumulative probabilities corresponding to a large number of generated
parameter vectors $\tilde\beta$. Finally, we obtain the confidence intervals directly from the
distribution of this set of cumulative probabilities.
However, since our dataset is very large, consisting of over 470,000 examinations,
generating a large number of new datasets is not computationally feasible. Assuming
that our originally estimated parameter vector $\hat\beta$ represents the true value of $\beta$, and
that our model is correctly specified, the asymptotic behavior of the GEE estimates
gives
\[
\sqrt{n}\,(\tilde\beta - \hat\beta) \xrightarrow{d} N(0, V_G), \tag{3.43}
\]
where $V_G$ is the covariance matrix given in (3.31).
Therefore, in order to simplify the computational requirements, we directly use this
asymptotic distribution for the values of $\tilde\beta$, assuming that the originally estimated
parameter vector $\hat\beta$ represents the true value. Thus, we generate the distribution of
cumulative probabilities by drawing a number of new parameter vectors from
the probability distribution given by
\[
N(\hat\beta, V_G / n).
\]
We acknowledge that this approach is most likely not as robust as the standard
bootstrapping argument. Given the large number of observations, however, it
seems reasonable to assume that $\hat\beta$ is very close to the true value, which supports our
choice. In the next chapter we will consider our modeling in detail, also to argue that
our model is correctly specified, providing further support for this computational
simplification.
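The simulation step can be sketched as follows. This is an illustration under loudly stated assumptions and not the computation performed in the thesis. The intercept and log-age coefficients and their standard errors are read from Table 4.3, with negative signs inferred from the reported odds ratios below 1. The parameters are drawn independently, because the off-diagonal elements of the estimated covariance matrix are not reported in the text. Screening ages of 50, 52, ..., 68 years are converted to 600, 624, ... months. The function names are hypothetical.

```python
import math
import random

def round_probability(beta, age_months):
    """q_n for a regular attendee without previous FP (prevFP = 0, irregular = 0)."""
    eta = beta[0] + beta[1] * math.log(age_months - 580)
    return 1.0 / (1.0 + math.exp(-eta))

def cumulative_probability(beta, ages):
    """p_k = 1 - prod_n (1 - q_n), cf. (3.41)."""
    no_fp = 1.0
    for age in ages:
        no_fp *= 1.0 - round_probability(beta, age)
    return 1.0 - no_fp

def monte_carlo_interval(beta_hat, std_err, ages, draws=5000, level=0.95, seed=1):
    """Percentile interval from parameter vectors drawn around beta_hat.

    Assumption: the components are drawn independently, i.e. V_G / n is
    replaced by a diagonal matrix of squared standard errors.
    """
    rng = random.Random(seed)
    sims = sorted(
        cumulative_probability(
            [rng.gauss(b, s) for b, s in zip(beta_hat, std_err)], ages)
        for _ in range(draws)
    )
    lower = sims[int((1.0 - level) / 2.0 * draws)]
    upper = sims[int((1.0 + level) / 2.0 * draws) - 1]
    return lower, upper

beta_hat = [-1.7975, -0.5024]             # intercept, log(age - 580); signs inferred
std_err = [0.0421, 0.0109]                # reported standard errors
ages = [600 + 24 * n for n in range(10)]  # assumed ages in months, ten rounds
point = cumulative_probability(beta_hat, ages)
lower, upper = monte_carlo_interval(beta_hat, std_err, ages)
print(point, lower, upper)
```

Because the covariance between the parameters is ignored and the ages are idealized, the resulting interval should not be expected to reproduce the interval reported in the thesis. With the full covariance matrix, the draws would instead be taken from the multivariate normal distribution $N(\hat\beta, V_G/n)$.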
4
Analysis
Generalized estimating equations were used to estimate the cumulative probability of a
false-positive screening result. We studied the effect of age, previous FP results and regularity
of attendance at the screening rounds on obtaining (another) FP test result.
Age was used as a continuous variable, and the previous FP and regularity variables were
categorized. The 95% confidence intervals for the risk of a false-positive screening
result in each age group were estimated using a suitable modification of bootstrapping,
see Section 3.4.
We used the cohort data, where the first invitation was sent to women aged 50–51,
the program lasting until the age of 69. We then used the model to estimate
the cumulative probability of an FP result for women attending all 10 biennial
screenings of the program.
The analysis was performed with R (R Development Core Team, 2008), using the geepack
package (Halekoh et al., 2006; Yan and Fine, 2004; Yan, 2002).
4.1
Data description
The women were invited for a screening mammography examination every two years.
There were altogether 539,512 invitations, and a total of 477,047 screening mammograms
were carried out (see Table 4.1). The overall attendance rate was 88.4%.
Altogether 13,454 screening examinations, 2.8% of the total, resulted in a recall
for further examination, and 82% of these turned out to be false-positive results. Altogether,
9,981 women (12.3%) received at least one false-positive test result. Moreover,
3,207 referrals were given, and a total of 2,310 breast cancers (0.5% of the attended women)
were diagnosed.
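The percentages quoted above follow directly from the counts. As a quick arithmetic check, the sketch below recomputes them. The total number of false-positive results, 11,036, is taken from Table 4.1, and the variable names are hypothetical.

```python
invitations = 539_512
attended = 477_047
recalls = 13_454
false_positives = 11_036  # total from Table 4.1
cancers = 2_310

def percent(part, whole):
    """Share of `part` in `whole`, in percent, rounded to one decimal."""
    return round(100.0 * part / whole, 1)

attendance_rate = percent(attended, invitations)
recall_rate = percent(recalls, attended)
fp_share_of_recalls = percent(false_positives, recalls)
cancer_rate = percent(cancers, attended)
print(attendance_rate, recall_rate, fp_share_of_recalls, cancer_rate)
```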
The largest number of false-positive test results (3,399) was received at the first
screening round, at age 50–51. The number of false-positive results diminished with
age, older women receiving fewer FP results. Note that the number of women attending
also diminishes with age, with a total of 18,287 invited and 15,258 attending
women at the age of 68–69. The distribution of invitations, attendance rates, recalls,
referrals and test results by age is shown in Table 4.1.
In Table 4.2 the distribution of invited women is presented at the individual
level. The table describes how many women have received how
many invitations, and how many women have attended how many screening rounds.
Overall, 576 women in the cohort received one invitation and 3,722 received
ten invitations. 1,874 women participated in only one screening round and 2,430
women attended all ten screening rounds. However, the study population is mostly
concentrated in screening rounds 5, 6 and 7, where over 10,000
women received either 5, 6 or 7 invitations and attended 5, 6 or 7 screenings.
Table 4.2 also describes the number of women who received more than one
invitation at the same age and the number of women who attended mammography
screening twice in a row at the same age. This is referred to as a double invitation. There
were altogether 1,228 double invitations, and 620 women attended the screening a
second time at the same age. Altogether 120 women received 11 invitations, and
64 of them attended 11 screening examinations. In addition, 3 women
received 12 invitations, and 1 of them attended all 12 screening examinations.
All women were included in the analysis, including those who received double
invitations and attended both times, and women who received more than ten invitations
and attended them all. There were 3,040 women who did not attend any screening
examinations.
The number of women, both invited and attending, decreased over time. The cohort
data was received only until the end of 2012, and therefore there are some age groups
which had not yet received all 10 invitations during the study period. This means
that not all the invitations and screening examinations that should be included in the
cohort data were available yet. We note that this may bias the
estimates. We could have based our estimates on the data from 7 screening rounds
and extrapolated the cumulative risk of a false-positive result over a program of 10
screening rounds. However, there seems to be no reason to expect women with different
years of birth to behave differently with respect to obtaining an FP screening result.
Therefore, we expect there to be no bias and for simplicity use the whole dataset
without correction.
Table 4.1. Distribution of invitations, attendance, recalls, referrals, TP and FP from 1992 through 2012 by age.

Age     Invitations   Attendance (%)   Recall (%)     Referral (%)   TP (%)        FP (%)
50–51   84,704        75,390 (89.0)    3,708 (4.9)    631 (0.8)      291 (0.4)     3,399 (4.5)
52–53   76,098        67,501 (88.7)    2,030 (3.0)    357 (0.5)      233 (0.4)     1,784 (2.6)
54–55   76,180        68,152 (89.5)    1,725 (2.5)    323 (0.5)      216 (0.3)     1,496 (2.2)
56–57   81,141        72,481 (89.3)    1,562 (2.2)    391 (0.5)      300 (0.4)     1,250 (1.7)
58–59   80,426        72,075 (89.6)    1,537 (2.1)    453 (0.6)      364 (0.5)     1,164 (1.6)
60–61   34,261        30,704 (89.6)    669 (2.2)      206 (0.7)      173 (0.6)     486 (1.6)
62–63   25,114        21,607 (86.0)    434 (2.0)      149 (0.7)      131 (0.6)     300 (1.4)
64–65   30,630        26,150 (85.4)    670 (2.6)      265 (1.0)      237 (0.9)     427 (1.6)
66–67   32,671        27,729 (84.9)    685 (2.5)      251 (0.9)      215 (0.8)     460 (1.7)
68–69   18,287        15,258 (83.4)    434 (2.8)      181 (1.2)      150 (1.0)     270 (1.8)
Total   539,512       477,047 (88.4)   13,454 (2.8)   3,207 (0.7)    2,310 (0.5)   11,036 (2.3)
Table 4.2. Distribution of women at the individual level. Number of invitations observed at screening rounds 1 to 10 and overall number
of women. Number of women attending screening rounds 1 to 10 and in total. Number of women invited twice
during one year one, two or three times. Number of women who attended twice during the same year. Number of women by
number of recalls and referrals, and in total. Number of women by number of true positives and false positives, and in total.

Each column gives the number of women with the given number (No.) of events.

No.     Invitations     Attended           Double        Double       Recalls         Referrals     TP            FP
        in total        screenings         invitations   attended
        N (%)           N (%)              N             N            N (%)           N (%)         N (%)         N (%)
1       576 (0.7)       1,874 (2.3)        1,170                      10,657          3,035         2,244         9,012
2       748 (0.9)       2,203 (2.7)        55                         1,192           86            33            887
3       1,237 (1.5)     3,409 (4.2)        3                          127                                         78
4       5,179 (6.1)     8,329 (10.2)                                  8                                           4
5       24,421 (28.9)   22,890 (28.1)
6       15,870 (18.8)   14,868 (18.3)
7       13,202 (15.6)   11,056 (13.6)
8       9,875 (11.7)    7,641 (9.4)
9       9,522 (11.3)    6,670 (8.2)
10      3,722 (4.4)     2,430 (3.0)
Total   84,475          81,435 (96.4) †    1,228         620          11,984 (14.7)   3,121 (3.8)   2,277 (2.8)   9,981 (12.3)

120 women received 11 invitations and out of those 64 women attended 11 screening examinations.
3 women received 12 invitations and out of those 1 attended 12 screening examinations.
† 3,040 (3.6%) women did not attend any screening rounds.
4.2
Modeling
Similarly to Roman et al. (2013) and Baker et al. (2003), we use age, previous FP
screening results and regularity of attendance as explanatory variables. Previous
studies (Roman et al., 2012, 2013) have estimated the model-based cumulative probability
of an FP screening result by screening round. This can be considered to be
equivalent to modeling by age, assuming that the women start the mammography
screening program at a certain age and attend the screenings regularly.
In this study we estimate the probability of an FP result by age. We make this choice
because the underlying medical characteristics vary according to the age of the
patient, and not according to the screening round. For instance, younger breasts are
more dense, making them more difficult to interpret radiologically (Roman
et al., 2014). In the model building we use age as a continuous variable in months
in order to get the most data points for the model formation. Then we define two
categorical variables. For variable definitions, see Sections 2.6 and 3.3.
We use the logit model specified as
\[
\mathrm{logit}\,\mathrm{FP} = \beta_0 + \beta_1 \log(\mathrm{age}_{\mathrm{months}} - 580) + \beta_2 \cdot \mathrm{prevFP} + \beta_3 \cdot \mathrm{irregular}. \tag{4.1}
\]
Here we have normalized the monthly age variable to 580 months, corresponding to
the minimum age of 588 months (49 years) in the data. Other normalizations
are also possible, but we expect the reasonable ones to give similar results. Observe
that $\log(0) = -\infty$.
For the GEE modeling the binomial family was used together with the logit link function.
An independence correlation structure was used in both models,
when modeling the FP variable and when modeling the irregular variable.
For the correlation matrix, the independence, exchangeable and AR1 structures were
tested. For the exchangeable structure the value of the correlation parameter $\alpha$ proved
to be $\alpha = 0.00718$, which indicated low correlation within the observations. Therefore,
an independence correlation structure was chosen. We emphasize that this is
not an assumption on the data, as GEE is robust with respect to an incorrect choice of
the correlation structure. However, with a proper choice of the structure, we obtain
the most efficient standard errors, leading to the most reliable modeling. In particular, we
do not assume independence between the observations.
Since GEE estimates are sensitive to data that are not missing at random, we comment that in
our setting the missing data can be considered random, since the screening program
covers the whole target population. In particular, there is no reason to assume that
the population studied during the cohort years is not a proper representation of the
general phenomenon. However, since the cohort data was received only until the end
of 2012, not all the women have yet received all 10 invitations. This raises the question
of whether the data is randomly missing. Here we refer to the previous discussion
in Section 4.1 about the possible bias due to missing data.
Next we consider the feasibility of our modeling (see Table 4.3).
Table 4.3. GEE model with logit link function and independence correlation
matrix.

Model            Estimate   Std.Error   Wald   Sig          OR     Lower OR   Upper OR
Intercept        -1.7975    0.0421      1821   <2e-16***    0.17   0.15       0.18
log(Age-580)†    -0.5024    0.0109      2125   <2e-16***    0.61   0.60       0.62
Irregular         0.3778    0.0324       136   <2e-16***    1.46   1.37       1.56
FP                0.6462    0.0349       342   <2e-16***    1.91   1.78       2.04
> 1 FPs           1.1270    0.1105       104   <2e-16***    3.09   2.49       3.83

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.'
† Age in months. The monthly age variable is normalized to 580 months corresponding to the
minimum age of 588 months.
As a first check we observe that all the variables are statistically significant. Age,
regularity of the attendance and previous FP results were used to model the risk of an FP
screening result. The corresponding odds ratios are 0.61 for age and 1.46 for irregular
attendance, as well as 1.91 and 3.09 for one and for more than one previous FP, respectively.
Women with irregular attendance and women with a previous FP result have an increased
risk of receiving an FP screening result.
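The odds ratios are obtained by exponentiating the coefficient estimates. The following sketch recomputes them from the estimates and standard errors reported in Table 4.3, together with Wald-type 95% intervals. It assumes that the coefficient of log(Age-580) is negative, which is inferred from its odds ratio being below 1, and that the intervals are of the form exp(estimate ± 1.96 · SE); the interval construction actually used is not stated in the text, so the last digit may differ from the table.

```python
import math

# (estimate, standard error) as reported in Table 4.3; sign of the age term inferred.
estimates = {
    "log(Age-580)": (-0.5024, 0.0109),
    "Irregular": (0.3778, 0.0324),
    "FP": (0.6462, 0.0349),
    "> 1 FPs": (1.1270, 0.1105),
}

def odds_ratio(estimate, std_error, z=1.96):
    """Odds ratio with a Wald-type interval exp(estimate +/- z * SE)."""
    return (
        math.exp(estimate),
        math.exp(estimate - z * std_error),
        math.exp(estimate + z * std_error),
    )

odds_ratios = {name: odds_ratio(b, se) for name, (b, se) in estimates.items()}
for name, (or_, lower, upper) in odds_ratios.items():
    print(f"{name}: OR = {or_:.2f} ({lower:.2f}, {upper:.2f})")
```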
The primary question in the modeling is whether the model correctly represents the
studied phenomenon. This is important not only for the asymptotics of the GEE
modeling, but also for the bootstrapping argument, which relies on the model-based
data generating process being correct.
We concluded that all the variables in the model were statistically significant factors.
Next we study the validity of the model further. We concentrate on the following four
main questions:
1. the validity of specifying previous FP and regularity of the attendance as
   categorical variables,
2. the proper relation between the age and the probability of a FP screening test
   result, e.g. a logarithmic or power-like relation,
3. the capability of the model to explain the typical behavior for each combination
   of age, previous FP and regularity of the attendance,
4. how much of the variability of the data the model explains.
The first thing to consider is whether categorizing the variables previous FP and
regularity of the attendance is justified. In the first subsection we examine whether
the categorization fits the data.
In the second subsection we verify the proper relation between the age and the
probability of a FP screening test result. We look at the data to obtain a first
indication of a proper transformation. It turns out that a logarithmic or power-like
relation seems plausible, and we then use a Box-Cox analysis for further examination.
Next we examine the capability of the model to explain the typical behavior for
each combination of the variables age, previous FP and regularity of the attendance.
Finally, we study how much of the variability of the data the chosen model explains.
4.2.1 The validity of specifying previous FP and regularity of the attendance as categorical variables
In the following we will, for simplicity, compare the model fitted values to an
ordinary least squares (OLS) regression computed for each subdataset. In Figure 4.1
we fit an OLS regression to the subdatasets defined by each possible combination of
values of the categorical variables. For a categorical variable (irregular and
prevFP), taking the value 1 adds a constant to the fitted value. This means that the
regression line shifts while the slope stays the same.
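The constant shift can be illustrated with a short sketch. It assumes that the linear predictor has the form suggested by Table 4.3, with the intercept and age coefficients taken as negative (an assumption based on their odds ratios being below 1). The function `linear_predictor` and the chosen ages are hypothetical, and the case of more than one previous FP is left out for brevity.

```python
import math

# Coefficients taken from Table 4.3; negative signs for the intercept and the
# age term are an assumption based on the reported odds ratios (0.17 and 0.61).
BETA_INTERCEPT = -1.7975
BETA_LOG_AGE = -0.5024
BETA_IRREGULAR = 0.3778
BETA_ONE_PREV_FP = 0.6462


def linear_predictor(age_months, irregular=0, prev_fp=0):
    """logit(Pr(FP)) for a woman with at most one previous FP result."""
    return (BETA_INTERCEPT
            + BETA_LOG_AGE * math.log(age_months - 580)
            + BETA_IRREGULAR * irregular
            + BETA_ONE_PREV_FP * prev_fp)


ages = [624, 672, 720, 768, 816]  # illustrative ages in months
shift_irregular = [linear_predictor(a, irregular=1) - linear_predictor(a) for a in ages]
shift_prev_fp = [linear_predictor(a, prev_fp=1) - linear_predictor(a) for a in ages]

print([round(s, 4) for s in shift_irregular])  # the same constant at every age
print([round(s, 4) for s in shift_prev_fp])
```

On the logit scale the two curves therefore differ by the same amount at every age, which is what a common slope in the subdata OLS fits would look like.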
[Figure 4.1: six panels, one for each combination of prevFP (0, 1, 2) and irregular (0, 1),
showing logit(Pr(FP)) against Age (month) with the OLS fit and its R² for each subdataset.]
Figure 4.1. OLS regression for each subdata.
We see a similar slope in the first four graphs. This suggests that treating the
variables as categorical is, at least at first glance, reasonable: because the slope
stays the same in the first four graphs, it seems reasonable to assume that changing
the categorical value does not change the relation between the age and the response
variable.
The last two graphs show a different behavior. However, we note that these are based
on a low number of observations. When prevFP = 2, i.e. the woman has two or more
previous false-positive results, the slope differs clearly. It is hard to explain
why this happens, other than by the low number of observations (see Table 4.4),
although we do not rule out other possibilities either.
Table 4.4. Number of observations for the cases represented in Figure 4.1.

prevFP   Irregular   No. Observations
0        0           398,073
0        1            46,321
1        0            27,405
1        1             3,282
2        0             1,748
2        1               218
Total                477,047

4.2.2 The relation between the age and the probability of FP result
In addition, in the first graph (see Figure 4.1), where prevFP and irregular both have
the value 0, we notice a convex behavior, which indicates that some transformation of
the age variable is advisable. This is not only a question of fitting the data; there
are also underlying medical reasons.
The fact that the risk of a FP test result is substantially higher for younger women is
well documented in the literature (Castells et al., 2006). One explanation is that there
are no previous mammograms with which to compare the new ones. Another is that younger
breasts are denser, which makes the interpretation of the mammogram more difficult
(Roman et al., 2014).
Now, the convexity of the first graph in Figure 4.1 suggests a logarithmic or possibly
polynomial transformation. By using a Box-Cox analysis with the transformation
\[
\frac{x^{\lambda} - 1}{\lambda}, \tag{4.2}
\]
we get the optimal Box-Cox parameter λ = 0.34.
[Figure 4.2: profile log-likelihood of the Box-Cox parameter λ for λ between −2 and 2,
with the 95% limit marked.]
Figure 4.2. Box-Cox analysis.
Since the value 0.34 is close to 0, and recalling that by l'Hôpital's rule
\[
\lim_{\lambda \to 0} \frac{x^{\lambda} - 1}{\lambda} = \log x,
\]
we choose to use the logarithmic transformation.
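The limiting behavior of the Box-Cox transform can be checked numerically. The sketch below is only an illustration: the function name `box_cox` and the test value x = 20 are choices made here, and the code does not reproduce the likelihood analysis behind the optimal value λ = 0.34.

```python
import math


def box_cox(x, lam):
    """Box-Cox transform (x**lam - 1) / lam, with log(x) as the lam = 0 case."""
    if x <= 0:
        raise ValueError("the Box-Cox transform requires x > 0")
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1.0) / lam


x = 20.0  # illustrative value, e.g. Age - 580 for a woman aged 600 months
gaps = []
for lam in (0.34, 0.1, 0.01, 0.001):
    gap = abs(box_cox(x, lam) - math.log(x))
    gaps.append(gap)
    print(f"lambda={lam:<6} transform={box_cox(x, lam):.4f}  |gap to log x|={gap:.4f}")
```

The gap to log x shrinks as λ approaches 0, which is the numerical counterpart of the limit above. Choosing the logarithm instead of λ = 0.34 is thus a simplification rather than an exact equivalence.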
Next we compare the model fit, with the logarithmic transformation, to the OLS fit of
the logit model for each subdataset (see Figure 4.3). The red line is the OLS fit to
the observed subdata and the blue line is the value estimated from the model. Since
logit 0 = −∞, we first study the model fit on the Pr(FP) scale, so that the OLS fit can
also include the age groups with zero cases, which correspond to logit Pr(FP) = −∞.
[Figure 4.3: four panels, one for each combination of prevFP (0, 1) and irregular (0, 1),
showing Pr(FP) against log(Age(month)) with the adjusted R² of the OLS fit.]
Figure 4.3. Inverse logit with the OLS fit of the observed data (red) and
modeling (blue).
In the most typical case of prevFP = 0, irregular = 0, we have a very good fit for
the data (398,073 observations). There is a slight difference in the fit in the second
(46,321 observations) and fourth (3,282 observations) graphs. However, given the
complexity of the question at hand, the model fit is reasonably close to the OLS fit
of the subdata. In the third graph (27,405 observations) the OLS fit (red line) does
not seem to represent the data. This probably happens because age groups with
Pr(FP) = 1, which arise from a low number of observations within that age group,
distort the fit.
Observe also that in the last three graphs some age groups have zero cases, which
can be seen as Pr(FP) = 0 for the corresponding ages in Figure 4.3. It means that
there are no events in certain age groups for the given values of the categorical
variables. Indeed, the probability is calculated from the sample, and there are age
groups without any events. The adjusted R² is calculated from the OLS fit (red line).
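For reference, the standard textbook definition of the adjusted R² for an OLS fit is sketched below. The symbols n (number of observations), k (number of explanatory variables), RSS and TSS (residual and total sums of squares) are notation introduced here and are not taken from the thesis.

```latex
\[
R^2_{\mathrm{adj}} = 1 - \bigl(1 - R^2\bigr)\,\frac{n - 1}{n - k - 1},
\qquad
R^2 = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} .
\]
```

Because of the correction factor (n − 1)/(n − k − 1), the adjusted value can fall slightly below zero when the fitted line explains almost none of the variation.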
Finally, we can also see here that the height of both the OLS fit (red line) and the
model fit (blue line) varies with the values of the categorical variables. This
corresponds to the different probabilities of obtaining a FP test result depending
on the individual screening history. In addition, the convex behavior can be seen only
in the first graph, but this is expected, since by definition the later graphs have
observations only for older ages. For instance, prevFP = 1 immediately implies
that at least one screening test has already been done, indicating that the woman
is typically at least 52 years old. The same applies to the regularity of the
attendance.
4.2.3 The capability of the model to explain the typical behavior for each combination of age, previous FP and regularity of the attendance
To obtain more comparable fitted lines for the model fit versus the OLS fit of the
logit model, especially for the third graph in Figure 4.3, we lastly apply an
exponential transformation. For log(Age(month)), when we have
\[
\log\left(\frac{p}{1 - p}\right) \tag{4.3}
\]
and take the exponential of this, we are left with
\[
\frac{p}{1 - p}. \tag{4.4}
\]
This transformation can be seen in Figure 4.4.
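Written out for the model of Table 4.3, and assuming that the linear predictor has the form suggested by that table, exponentiating the logit turns the logarithmic age term into a power of age and each categorical term into a multiplicative factor. The coefficient symbols and indicator names below are notation introduced for this sketch only.

```latex
\[
\log\frac{p}{1-p}
  = \beta_0 + \beta_1 \log(\mathrm{Age} - 580)
  + \beta_2\,\mathrm{Irregular} + \beta_3\,\mathrm{FP}_{1} + \beta_4\,\mathrm{FP}_{>1}
\]
\[
\Longrightarrow\quad
\frac{p}{1-p}
  = e^{\beta_0}\,(\mathrm{Age} - 580)^{\beta_1}\,
    e^{\beta_2\,\mathrm{Irregular}}\; e^{\beta_3\,\mathrm{FP}_{1}}\; e^{\beta_4\,\mathrm{FP}_{>1}} .
\]
```

Since the reported odds ratio of the age term is 0.61, the exponent of (Age − 580) is negative, so the odds decrease as a power of the normalized age. The factors for Irregular, one previous FP and more than one previous FP are the odds ratios 1.46, 1.91 and 3.09 of Table 4.3.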
[Figure 4.4: four panels, one for each combination of prevFP (0, 1) and irregular (0, 1),
showing the odds p/(1−p) against log(Age(month)) with the adjusted R² of the OLS fit.]
Figure 4.4. Exponential transformation with the OLS fit of the observed data
(red) and modeling (blue).
Here we again have a reasonable fit for the first, second and fourth graphs, but now
also for the third graph.
A priori, it is clearly possible that the model fit (blue line) and the subdata OLS
fit (red line) would differ significantly, as the red line is optimized only with
respect to the subdata, whereas the blue line has to fit all the data combined from
each of the four (or six) graphs. Since the differences are relatively small, this
does not appear to be the case. Overall, we seem to have a reasonably good fit when
comparing the model and the subdata OLS fits.
4.2.4 Variability of the data
Finally, the model fit is shown in Figure 4.5. Observe that now, in accordance with
the modeling, the vertical axis is logit Pr(FP).
[Figure 4.5: four panels, one for each combination of prevFP (0, 1) and irregular (0, 1),
showing logit(Pr(FP)) against log(Age(month)) with the adjusted R² of the model fit.]
Figure 4.5. Model fit.
The model seems to explain a significant part of the variability in the first graph
(adjusted R² = 0.459). We also saw in the previous figures that in the most typical
case of prevFP = 0, irregular = 0, the model fit is very good when compared to the
subdata fit, where adjusted R² = 0.594. In the other cases, however, the adjusted R²
is rather low.
The idea in the different figures is that the fitted value rises especially in cases
where prevFP = 1, meaning that the probability rises. Even though the other fits do
not explain the variability to the same degree, the fitted values are higher, with a
statistically significant difference. The model does not explain the variation in the
dataset, but it shows the difference in the probability levels. Indeed, while we have
significant differences in the fitted values, we also need to consider the variability
of the data explained by the model.
Because the adjusted R² is very low, the reliability of the model comes into question.
In particular, the bootstrapping relies heavily on the model being the true data
generating process. This clearly undermines the reliability of our results. On the
other hand, the dataset we are using is very large compared to many typical
longitudinal settings. Given the size of the dataset, it is reasonable to assume that
the GEE estimates are very close to the true ones. The problem, if any, arises from
the bootstrapping-based calculation of the confidence intervals, and these estimates
should be treated with caution.
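To make the dependence on the model concrete, a simulation-based interval of this kind can be sketched as follows. This is a simplified illustration and not the procedure of the thesis. The two coefficients are drawn independently from normal distributions, because the covariance matrix of the GEE estimates is not reproduced here, and the cumulative probability is taken as one minus the product of the per-screen probabilities of no FP. The negative signs of the coefficients, the biennial schedule from 600 months, and the names `cumulative_fp_probability` and `monte_carlo_interval` are assumptions of this sketch.

```python
import math
import random

# Point estimates and standard errors for the intercept and log(Age - 580)
# terms (Table 4.3); the negative signs are an assumption based on OR < 1.
BETA_HAT = (-1.7975, -0.5024)
STD_ERR = (0.0421, 0.0109)
AGES_MONTHS = [600 + 24 * k for k in range(10)]  # hypothetical biennial schedule


def cumulative_fp_probability(beta0, beta1, ages=AGES_MONTHS):
    """1 - prod(1 - p_k) for a regular attendee with no previous FP result."""
    no_fp_so_far = 1.0
    for age in ages:
        eta = beta0 + beta1 * math.log(age - 580)
        p = 1.0 / (1.0 + math.exp(-eta))
        no_fp_so_far *= 1.0 - p
    return 1.0 - no_fp_so_far


def monte_carlo_interval(n_draws=5000, level=0.95, seed=1):
    """Percentile interval from draws of beta around its point estimate.

    Simplification: the two coefficients are drawn independently, because
    the covariance matrix of the GEE estimates is not available here.
    """
    rng = random.Random(seed)
    draws = sorted(
        cumulative_fp_probability(
            rng.gauss(BETA_HAT[0], STD_ERR[0]),
            rng.gauss(BETA_HAT[1], STD_ERR[1]),
        )
        for _ in range(n_draws)
    )
    tail = (1.0 - level) / 2.0
    lower = draws[int(tail * n_draws)]
    upper = draws[int((1.0 - tail) * n_draws) - 1]
    return lower, upper


point = cumulative_fp_probability(*BETA_HAT)
lower, upper = monte_carlo_interval()
print(f"point={point:.4f}  interval=({lower:.4f}, {upper:.4f})")
```

Since the correlation between the intercept and the age coefficient is ignored, the interval from this sketch is not expected to match the reported one and is likely to be wider. Every draw is also pushed through the same model formula, so the interval is only as trustworthy as the model itself.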
5 Results
In this thesis, we carried out a retrospective cohort study of the Finnish breast
cancer screening program. The cohort was chosen to include the women who were
50–51 years old at the time of their first mammography invitation in 1992–1995.
We used generalized estimating equations to model the probability of a false-positive
test result in a screening examination during the 10 biennial screening examinations
in the Finnish 20-year breast cancer screening program. The cumulative probability
was then calculated from these estimates using formula (3.41), which is based on
an iterative use of conditional probability. The calculation of the cumulative
probability assumes that women attend screenings every two years starting from age
50.
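The iterative calculation can be sketched numerically. Formula (3.41) is not reproduced in this section, so the sketch assumes that for a regular attendee it reduces to one minus the product of the per-screen probabilities of no FP, each conditional on no previous FP. It also assumes negative signs for the intercept and age coefficients of Table 4.3 and a schedule of ten screens at ages 600, 624, ..., 816 months. The function names are introduced for this example only.

```python
import math

# Intercept and log(Age - 580) coefficients from Table 4.3; the negative
# signs are an assumption consistent with the reported odds ratios < 1.
BETA_INTERCEPT = -1.7975
BETA_LOG_AGE = -0.5024

# Assumed schedule: ten biennial screens from age 50 (600 months) to age 68.
SCREENING_AGES_MONTHS = [600 + 24 * k for k in range(10)]


def fp_probability(age_months):
    """Pr(FP) at one screen for a regular attendee with no previous FP."""
    eta = BETA_INTERCEPT + BETA_LOG_AGE * math.log(age_months - 580)
    return 1.0 / (1.0 + math.exp(-eta))


def cumulative_fp_probabilities(ages):
    """Cumulative probability of at least one FP after each screen."""
    cumulative = []
    no_fp_so_far = 1.0
    for age in ages:
        no_fp_so_far *= 1.0 - fp_probability(age)
        cumulative.append(1.0 - no_fp_so_far)
    return cumulative


cumulative = cumulative_fp_probabilities(SCREENING_AGES_MONTHS)
for age, value in zip(SCREENING_AGES_MONTHS, cumulative):
    print(f"age {age // 12}: {100 * value:.2f} %")
```

Only the probabilities for women without a previous FP enter the product, because the event of interest is the first false-positive result. Under the stated assumptions the value after the tenth screen comes out at roughly 15.8%.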
Previous studies from Denmark and Italy (Njor et al., 2007; Puliti et al., 2011)
estimated the cumulative risk of a FP screening result in a 10-visit program to
be 15.2–15.8%. In Norway and Spain, in turn, the cumulative risk of a FP test result
has been estimated at 20% (Roman et al., 2012, 2013). While these European studies
are the most comparable to ours, we also point out that in the USA the cumulative
risk has been found to be considerably higher, 41.6–49.1% (Hubbard et al., 2011;
Christiansen et al., 2000; Elmore et al., 1998).
Our modeling estimates the cumulative probability of at least one FP screening result
during 10 screening examinations to be 15.84% (95% CI: 15.49–16.18%) among the
81,435 attendees. The odds ratios for age and irregular attendance were 0.61 and
1.46, respectively. For one and for more than one previous FP screening result the
corresponding odds ratios were 1.91 and 3.09, respectively. Women with irregular
attendance, and women with a previous FP result, have an increased risk of receiving
a FP screening result.
There were altogether 2,430 women who attended 10 screenings. Of these women,
15.54% received at least one false-positive test result. Since the observed risk is
relatively close to the model-fitted value, our modeling seems to explain the data,
and also the underlying phenomena, relatively well.
The cumulative probability of at least one FP screening result is shown in Figure
5.1 together with the estimated confidence intervals.
[Figure 5.1: cumulative probability of a false-positive result (line) and probability
of a false-positive result at each age (bars); vertical axis Probability (%),
horizontal axis Age.
Age:                      50     52     54     56     58     60     62     64     66     68
Cumulative (%):           3.55   5.88   7.71   9.26   10.62  11.84  12.96  13.98  14.94  15.84
Probability at age (%):   3.61   2.47   2.01   1.75   1.57   1.45   1.36   1.29   1.24   1.2]
Figure 5.1. Estimated cumulative probability of a false-positive screening test
result by using generalized estimating equations.
We also calculated the probability of receiving a FP result at a certain age (cf.
the bar graphs) by using the conditional probability (3.40). Here the probability was
calculated conditional on the woman attending and having no previous FP results, and
on her being either an irregular or a regular attendee. The point estimates for every
age group are shown in Figure 5.1. We observe that these probabilities are not
precisely the same as those underlying the cumulative probability. This is due to
allowing the attendee to be "irregular" as defined in the modeling. For the cumulative
probability, we instead consider only the women who attend all ten screenings,
implying that all the attendances are specified as "regular".
Observe that here we also need to model whether the attendee is regular or not,
according to (3.40). To be completely precise, justifying this calculation would
require us to check whether the model for the attendance being regular or not is
correctly specified. In our analysis, we used age as a linear explanatory variable
in the logit model for the irregularity of attendance. However, we did not check
in detail whether this is the correct model. We acknowledge that this may have a
small impact on our estimates and reduces their reliability. Nevertheless, we believe
these estimates to be reasonable, and since our emphasis is on the cumulative
probability, we bypass the more thorough analysis.
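Formula (3.40) is not reproduced in this section. One plausible reading of the description above, offered only as a sketch, is the law of total probability over the regularity status. The notation FP_k, A_k and Irregular_k for the events at screening round k is introduced here and is an assumption.

```latex
\[
\Pr(FP_k \mid A_k,\ \text{no previous FP})
  = \sum_{r \in \{0,1\}}
    \Pr(FP_k \mid \mathrm{Irregular}_k = r,\ A_k,\ \text{no previous FP})\,
    \Pr(\mathrm{Irregular}_k = r \mid A_k,\ \text{no previous FP}) .
\]
```

Under this reading, the first factor would come from the GEE model for a FP result and the second from the logit model for irregular attendance. Because irregular attendance raises the odds of a FP result, the mixture lies slightly above the probability for a regular attendee of the same age.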
6 Discussion and conclusion
We estimated that approximately one in every six women who attend the biennial 20-year Finnish breast cancer screening program will be recalled for further examination
with at least one false-positive screening result. This result is similar to those of other
European screening programs (Hofvind et al., 2012; Roman et al., 2013; Njor et al.,
2007; Puliti et al., 2011). The estimates in the U.S. are higher (Hubbard et al., 2011;
Elmore et al., 1998; Christiansen et al., 2000).
Estimates of the cumulative probability of a FP screening result vary between organized screening programs. This can be explained by differences in the screening
programs, management policies and screening intervals (Smith-Bindman et al., 2005).
Various factors can affect the probability of a woman receiving a false-positive screening test result. These factors can relate either to the screening
program itself or to the characteristics of the woman (Roman et al., 2014; Castells et al.,
2006; Christiansen et al., 2000).
The estimation of Pr(FP) is an immensely difficult problem. The number of factors
contributing to the physician's decision to recall a woman for further examination
is very large, including the complete patient history, biomarker data, and
family medical history. The increased risk of receiving a FP screening result after a
previous FP test result could indicate that the same change in the breast tissue that
led to the previous false-positive result is being flagged again.
The radiologist performance has been suggested to be one of the explanations in the
variation of the recall rates and false-positive screening results between the countries
and organised screening programs (Elmore et al., 2009). Roman et al. (2013) elaborate on different recall rates for further assessment. They discuss possible differences
in mammography reading procedures, whether it is independent double reading (Europe) or single reading (USA). The diagnostic performance of a radiologist is related
to the radiologist’s years of experience and number of mammograms interpreted
(Elmore et al., 2009).
There is also a difference between countries in the recommended number of screening
mammograms read per year. The European guidelines for quality
assurance in breast cancer screening and diagnosis recommend that a radiologist read
at least 5000 screening mammograms annually (Perry et al., 2006). There may also be
differences among physicians, depending on whether the mammogram is interpreted by
a specialised physician or by a general radiologist. In addition, any uncertainty
about whether a finding is an abnormality might trigger a recall just in case.
Moreover, since a FP result might influence attendance at subsequent screening
examinations (Salz et al., 2011), the radiologist might also have fewer previous
mammograms with which to compare the result in later rounds. In other
words, FP results are related to irregular later screening attendance, which further
affects the radiologist's decision making.
There are also individual characteristics of a woman that may increase the risk of
receiving a FP result from screening. As stated by Castells et al. (2006), the factors
associated with a higher cumulative risk of false-positive recall include previous
benign breast disease, premenopausal status, body mass index above 27.3 and age
50–54 years. In particular, women should be informed about the risk of receiving
a FP test result if they regularly attend breast cancer screenings. It would also be
important to inform women about the likelihood of experiencing a false-positive
screening result based on the risk factors that increase the probability of receiving
a false-positive test result.
Altogether, there is a very large number of possible reasons for a false-positive
mammography test result. Therefore, it is unreasonable to expect a perfect, near
perfect, or even a good general fit with a model as simple as the one used in this work.
In particular, we have only used variables based on previous literature (Roman
et al., 2013; Baker et al., 2003), and we have deliberately avoided any kind of ad
hoc reasoning where the variables or the modeling are retrospectively fitted to the
specific data at hand.
A natural extension would be to include a larger number of relevant variables into
the modeling, but in the present case, we are restricted by data availability. In any
case, similarly to the existing literature (Roman et al., 2013; Baker et al., 2003), and
due to the large dataset, we expect our approach to be sufficient for estimating the
cumulative risk of a false-positive test result in the 10-visit 20-year Finnish screening
program.
We would like to point out that the modeling methods for estimating the cumulative
probability of a false-positive screening result also vary depending on the data
availability and assumptions made about the independence of the screening rounds
(Hubbard et al., 2011; Baker et al., 2003; Gelfand and Wang, 2000).
We obtained a cumulative risk of a FP test result in the biennial 20-year screening
program of 15.84%. While the cumulative probability seems to fit the data,
our modeling does not explain the variability we observe. This raises the question
of whether our estimates are reliable. The asymptotic behavior of GEE relies on the
model being correctly specified. This assumption also applies to the bootstrapping
method used for the confidence intervals. Since the model does not explain the
variability, the model is not necessarily completely correct. On the
other hand, GEE produces an unbiased estimator of the parameter vector, with the
proper asymptotic large-sample distribution. Thus, it seems safe to assume that
GEE provides asymptotically correct modeled probabilities, even if the confidence
intervals are not completely justified.
On the other hand, since the observed cumulative risk of 15.54% is very close to the
model-fitted cumulative probability of a false-positive screening result, 15.84%, our
modeling seems to explain the data relatively well. Indeed, in the standard case of no
previous FP test results, with regular attendance habits, and with the logarithmically
transformed age variable, we obtain an adjusted R² value of 0.459, which is rather
high. In particular, this is the case that dominates our calculation of the cumulative
probability, which further supports our conclusions.
Considering the data, we also saw a substantially different behavior among the women
who had more than one previous FP test result. Indeed, in this case the probability of
yet another FP result seemed to increase with age. To the best of our knowledge,
this kind of behavior has not been observed in earlier studies. Additional studies
are needed in order to confirm this observation. Moreover, our modeling
does not fit this case and does not aim to explain the phenomenon. Instead, we
concentrate on the cumulative probability of a FP test result, where the dominating
effect comes from the standard attendee with no previous FP test result and with
regular attendance habits.
Finally, we acknowledge another question, namely how many women receive
a false-positive result in the whole population, regardless of how many times they
attend the screenings or whether they attend at all. This can be seen as the burden
that the population-based screening program places on society through FP screening
results. In this study 12.3% received at least one false-positive
screening result. However, this does not take into account that not all the women
have yet finished the whole screening program. This question would be
interesting to study, although we do not cover it here.
References
Agresti, A. (2015). Foundations of Linear and Generalized Linear Models. John
Wiley & Sons.
Baker, S.G., Erwin, D., and Kramer, B.S. (2003). Estimating the cumulative risk of
false positive cancer screenings. BMC Medical Research Methodology, 3(1), 1.
Ballinger, G.A. (2004). Using generalized estimating equations for longitudinal data
analysis. Organizational Research Methods, 7(2), 127–150.
Bond, M., Pavey, T., Welch, K., Cooper, C., Garside, R., Dean, S., and Hyde, C.
(2013). Systematic review of the psychological consequences of false-positive
screening mammograms. Health Technology Assessment, 17(13), 1–170.
Brett, J., Bankhead, C., Henderson, B., Watson, E., and Austoker, J. (2005). The
psychological impact of mammographic screening. A systematic review. Psycho-Oncology, 14(11), 917–938.
Brodersen, J., and Siersma, V.D. (2013). Long-term psychosocial consequences of
false-positive screening mammography. The Annals of Family Medicine, 11(2),
106–115.
Bull, A.R., and Campbell, M.J. (1991). Assessment of the psychological impact
of a breast screening programme. The British Journal of Radiology, 64(762),
510–515.
Castells, X., Molins, E., and Macià, F. (2006). Cumulative false positive recall rate
and association with participant related factors in a population based breast cancer
screening programme. Journal of Epidemiology and Community Health, 60(4),
316–321.
Christiansen, C.L., Wang, F., Barton, M.B., Kreuter, W., Elmore, J.G., Gelfand,
A.E., and Fletcher, S.W. (2000). Predicting the cumulative risk of false-positive
mammograms. Journal of the National Cancer Institute, 92(20), 1657–1666.
Elmore, J.G., Barton, M.B, Moceri, V.M., Polk, S., Arena, P.J., and Fletcher, S.W.
(1998). Ten-year risk of false positive screening mammograms and clinical breast
examinations. New England Journal of Medicine, 338(16), 1089–1096.
Elmore, J.G., Jackson, S.L., Abraham, L., Miglioretti, D.L., Carney, P.A., Geller,
B.M., Yankaskas, B.C., Kerlikowske, K., Onega, T., Rosenberg, R.D., et al. (2009).
Variability in interpretive performance at screening mammography and radiologists’ characteristics associated with accuracy. Radiology, 253(3), 641–651.
EU Council (2003). EU Council Recommendation (EC) 878/2003 of 2 December
2003 on cancer screening. Official Journal of the European Union, L327, 34–38.
Finnish Cancer Registry (FCR, 2010). Mass screening. Breast Cancer Screening. Retrieved (August 24, 2016) from
http://www.cancer.fi/syoparekisteri/en/mass-screening-registry/breast_cancer_screening/.
Fitzmaurice, G.M., Laird, N.M., and Ware, J.H. (2012). Applied longitudinal analysis. Vol. 998. John Wiley & Sons.
Gelfand, A.E., and Smith, A.F.M. (1990). Sampling-based approaches to calculating
marginal densities. Journal of the American Statistical Association, 85(410),
398–409.
Gelfand, A.E., and Wang, F. (2000). Modelling the cumulative risk for a false-positive
under repeated screening events. Statistics in Medicine, 19(14), 1865–1879.
Halekoh, U., Højsgaard, S., and Yan, J. (2006). The R package geepack for generalized
estimating equations. Journal of Statistical Software, 15(2), 1–11.
Hardin, J.W. (2005). Generalized estimating equations (GEE). Wiley Online Library.
Harris, R.P., Sheridan, S.L., Lewis, C.L., Barclay, C., Vu, M.B., Kistler, C.E., Golin,
C.E., DeFrank, J.T., and Brewer, N.T. (2014). The harms of screening: a proposed
taxonomy and application to lung cancer screening. JAMA Internal Medicine,
174(2), 281–286.
Heinävaara, S., Sarkeala, T., and Anttila, A. (2014). Overdiagnosis due to breast
cancer screening: updated estimates of the Helsinki service study in Finland.
British Journal of Cancer, 111(7), 1463–1468.
Hofvind, S., Thoresen, S., and Tretli, S. (2004). The cumulative risk of a false-positive recall in the Norwegian Breast Cancer Screening Program. Cancer,
101(7), 1501–1507.
Hofvind, S., Ponti, A., Patnick, J., Ascunce, N., Njor, S., Broeders, M., Giordano,
L., Frigerio, A., and Törnberg, S. (2012). False-positive results in mammographic
screening for breast cancer in Europe: a literature review and survey of service
screening programmes. Journal of Medical Screening, 19(suppl 1), 57–66.
Hubbard, R.A., Miglioretti, D.L., and Smith, R.A. (2010). Modelling the cumulative
risk of a false-positive screening test. Statistical Methods in Medical Research,
19(5), 429–449.
Hubbard, R.A., Kerlikowske, K., Flowers, C.I., Yankaskas, B.C., Zhu, W., and
Miglioretti, D.L. (2011). Cumulative probability of false-positive recall or biopsy
recommendation after 10 years of screening mammography: a cohort study. Annals
of Internal Medicine, 155(8), 481–492.
International Agency for Research on Cancer (IARC, 2002). IARC Handbooks of
Cancer Prevention. Breast cancer screening. Vol. 7. Lyon: IARC Press.
Lauby-Secretan, B., Scoccianti, C., Loomis, D., Benbrahim-Tallaa, L., Bouvard, V.,
Bianchini, F., and Straif, K. (2015). Breast-cancer screening - viewpoint of the
IARC Working Group. New England Journal of Medicine, 372(24), 2353–2358.
Lerman, C., Track, B., Rimer, B.K., Boyce, A., Jepson, C., and Engstrom, P.F. (1991).
Psychological and behavioral implications of abnormal mammograms. Annals of
Internal Medicine, 114(8), 657–661.
Liang, K-Y., and Zeger, S.L. (1986). Longitudinal data analysis using generalized
linear models. Biometrika, 73(1), 13–22.
Njor, S.H., Olsen, A.H., Schwartz, W., Vejborg, I., and Lynge, E. (2007). Predicting
the risk of a false-positive test for women following a mammography screening
programme. Journal of Medical Screening, 14(2), 94–97.
Otten, J.D.M., Fracheboud, J., Den Heeten, G.J., Otto, S.J., Holland, R., De Koning, H.J., Broeders, M.J.M., and Verbeek, A.L.M. (2013). Likelihood of early
detection of breast cancer in relation to false-positive risk in life-time mammographic screening: population-based cohort study. Annals of Oncology, 24(10),
2501–2506.
Perry, N., Broeders, M., de Wolf, C., Törnberg, S., Holland, R., and von Karsa, L.
(2006). European guidelines for quality assurance in breast cancer screening
and diagnosis. Luxembourg: Office for Official Publications of the European
Communities.
Puliti, D., Miccinesi, G., and Zappa, M. (2011). More on Screening mammography.
The New England Journal of Medicine, 364, 281–286.
Puliti, D., Duffy, S.W., Miccinesi, G., De Koning, H., Lynge, E., Zappa, M., and
Paci, E. (2012). Overdiagnosis in mammographic screening for breast cancer in
Europe: a literature review. Journal of Medical Screening, 19(1), 42–56.
R Development Core Team (2008). R: A Language and Environment for Statistical
Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN
3-900051-07-0.
Roman, M., Skaane, P., and Hofvind, S. (2014). The cumulative risk of false-positive screening results across screening centres in the Norwegian Breast Cancer
Screening Program. European Journal of Radiology, 83(9), 1639–1644.
Roman, M., Hubbard, R.A., Sebuodegard, S., Miglioretti, D.L., Castells, X., and
Hofvind, S. (2013). The cumulative risk of false-positive results in the Norwegian
Breast Cancer Screening Program: Updated results. Cancer, 119(22), 3952–3958.
Roman, R., Sala, M., Salas, D., Ascunce, N., Zubizarreta, R., Castells, X., et al.
(2012). Effect of protocol-related variables and women’s characteristics on the
cumulative false-positive risk in breast cancer screening. Annals of Oncology,
23(1), 104–111.
Salas, D., Ibáñez, J., Román, R., Cuevas, D., Sala, M., Ascunce, N., Zubizarreta,
R., Castells, X., et al. (2011). Effect of start age of breast cancer screening
mammography on the risk of false-positive results. Preventive Medicine, 53(1-2),
76–81.
Salz, T., DeFrank, J.T., and Brewer, N.T. (2011). False positive mammograms in
Europe: do they affect reattendance? Breast Cancer Research and Treatment,
127(1), 229–231.
Sarkeala, T., Heinävaara, S., and Anttila, A. (2008a). Breast cancer mortality with
varying invitational policies in organised mammography. British Journal of Cancer, 98(3), 641–645.
Sarkeala, T., Heinävaara, S., and Anttila, A. (2008b). Organised mammography
screening reduces breast cancer mortality: a cohort study from Finland. International Journal of Cancer, 122(3), 614–619.
Smith-Bindman, R., Ballard-Barbash, R., Miglioretti, D.L., Patnick, J., and Kerlikowske, K. (2005). Comparing the performance of mammography screening in
the USA and the UK. Journal of Medical Screening, 12(1), 50–54.
Ministry of Social Affairs and Health (2006). Ministry of Social Affairs and
Health. Government Decree on Screenings (1339/2006).
http://www.finlex.fi/en/laki/kaannokset/2006/en20061339.pdf.
Varadhan, S.R.S. (2001). Probability theory (Vol. 7). American Mathematical
Society.
Yan, J. (2002). Geepack: Yet another Package for Generalised Estimating Equations.
R-News 2/3: 12–14.
Yan, J., and Fine, J. (2004). Estimating equations for association structures. Statistics
in Medicine, 23(6), 859–874.