Estimation of Prostate Cancer Probability by Logistic Regression

Clinical Chemistry 45:7
987–994 (1999)
Test Utilization and
Outcomes
Estimation of Prostate Cancer Probability by
Logistic Regression: Free and Total Prostatespecific Antigen, Digital Rectal Examination, and
Heredity Are Significant Variables
Arja Virtanen,1* Mehran Gomari,2 Ries Kranse,3 and Ulf-Håkan Stenman4
Background: Despite low specificity, serum prostatespecific antigen (PSA) is widely used in screening for
prostate cancer. Specificity can be improved by measuring free and total PSA and by combining these results
with clinical findings. Methods such as neural networks
and logistic regression are alternatives to multistep
algorithms for clinical use of the combined findings.
Methods: We compared multilayer perceptron (MLP)
and logistic regression (LR) analysis for predicting prostate cancer in a screening population of 974 men, ages
55– 66 years. The study sample comprised men with PSA
values >3 mg/L. Explanatory variables considered were
age, free and total PSA and their ratio, digital rectal
examination (DRE), transrectal ultrasonography, and a
family history of prostate cancer.
Results: When at least 90% sensitivity in the training
sets was required, the mean sensitivity and specificity
obtained were 87% and 41% with LR and 85% and 26%
with MLP, respectively. The cancer specificity of an LR
model comprising the proportion of free to total PSA,
DRE, and heredity as explanatory variables was significantly better than that of total PSA and the proportion
of free to total PSA (P <0.01, McNemar test). The
proportion of free to total PSA, DRE, and heredity were
used to prepare cancer probability curves.
Conclusion: The probability calculated by logistic regression provides better diagnostic accuracy for prostate
cancer than the presently used multistep algorithms for
estimation of the need to perform biopsy.
© 1999 American Association for Clinical Chemistry
Prostate cancer is the most common cancer in men, and it
is the second most common cause of cancer death (1 ).
When localized to the prostate, the cancer can usually be
cured by radical prostatectomy. Therefore, attempts are
being made to improve the prognosis by early detection
and screening (2, 3 ). Prostate-specific antigen (PSA)5 is a
sensitive marker for prostate cancer (4, 5 ). When PSA is
used as a primary test for screening, ;10% of all men .50
years of age have a PSA value exceeding the commonly
used cutoff of 4 mg/L (2, 3 ). A majority of patients with
increased PSA have benign prostatic hyperplasia (2, 3 ).
Prostate cancer is found in approximately one-third of the
cases by the use of digital rectal examination (DRE),
transrectal ultrasonography (TRUS), and sextant biopsy.
Because the examinations required to confirm the
diagnosis are expensive, several approaches are being
used to increase the accuracy of the primary screening
method, e.g., the PSA value divided by the prostate
volume (PSA density) (6 ), the rate of increase of serum
PSA (PSA velocity) (7 ), and the proportion of the two
major forms of PSA in serum, i.e., free PSA and the
complex of PSA with a1-antichymotrypsin. In patients
with prostate cancer, the proportion of PSA-a1-antichymotrypsin is higher and that of free PSA lower than in
those with benign prostatic hyperplasia (8 ). If the complexed, total free, and total PSA are measured, the number of false-positive results can be reduced by 30 –50%,
but a further improvement is desirable (8 –10 ).
1
Central Laboratory, University Central Hospital of Turku, Medical Informatics Research Centre in Turku (MIRCIT), FIN-20521 Turku, Finland.
2
Turku Centre for Computer Science (TUCS), FIN-20520 Turku, Finland.
3
Department of Urology, Dijkzigt Academic Hospital Rotterdam, NL-3015
GD Rotterdam, The Netherlands.
4
Department of Clinical Chemistry, University Central Hospital of Helsinki, FIN-00100 Helsinki, Finland.
*Author for correspondence. Fax 358-2-2613920; e-mail [email protected].
Received September 8, 1998; accepted April 7, 1999.
5
Nonstandard abbreviations: PSA, prostate-specific antigen; DRE, digital
rectal examination; TRUS, transrectal ultrasonography; LR, logistic regression;
MLP, multilayer perceptron; ANN, artificial neural network; and F/T ratio,
proportion of free to total PSA.
987
988
Virtanen et al.: Estimation of Probability of Prostate Cancer
We have evaluated whether the diagnostic accuracy
can be improved by utilizing all the diagnostic information available to determine whether a biopsy should be
performed. Presently, the most common algorithm is to
perform biopsies on all men with a PSA value .4 mg/L or
a positive DRE in combination with a lower PSA value.
ROC curves and logistic regression analysis have been
used to evaluate the diagnostic methods for prostate
cancer. Neural networks may be used as alternatives to
statistical methods for medical decision support (11 ), and
recently artificial neural networks have also been utilized
(12 ). However, comparisons with standard statistical
analysis are lacking.
In this study, we evaluated the ability of logistic
regression (LR) and multilayer perceptron (MLP) to improve the diagnostic accuracy for prostate cancer by
combining determinations of free and total PSA, DRE,
TRUS, age, and a family history of prostate cancer.
A logistic regression model can be formulated mathematically by relating the probability of some event conditional on the vector x of explanatory variables (13 ). The
linear logistic model is described by the formula:
logit~P x ! 5 log(P x /1 2 P x ) 5 a 1 b x
(1)
where Px is the probability that the response belongs in
the cancer group when explanatory variables have certain
values. In the equation above, a is the intercept parameter
and b is the vector of the slope parameters, which are
estimated from the data. For interpretation of the results,
the slope parameters have an important meaning. Odds
ratios can be calculated by taking the antilogarithm of
these values. If the slope parameter is positive, the odds
ratio describes how much the odds for having a cancer
increase when the corresponding explanatory variable
increases one unit and the other variables remain unchanged. A negative slope parameter correspondingly
decreases the odds. Some of the variables may be useful
for the classification, but when combined with other
variables, they may be ineffective if they are highly
correlated with other variables. Variables that contribute
redundant information can be omitted, with only those
variables with the best diagnostic accuracy retained in the
model. In the present study, the probability of cancer is
modeled, and a stepwise variable selection method is
used to provide an adequate solution to the problem of
variable selection (14 ).
Artificial neural networks (ANNs) form a family of
computational architectures inspired by biological brains
[for a review, see Ref. (15 )]. ANNs can approximate
nonlinear functions using only the input-output patterns
in a supervised learning task. The knowledge in ANNs is
distributed among artificial neurons. Therefore, they can
be taught to approximate functions with incomplete or
noisy data. In this study we focus our attention on MLPs,
which are feed-forward ANNs.
MLPs are simple networks with successive layers containing different numbers of “units” (“neurons” or “nodes”).
The units are the basic processing elements of the network, and they are connected to each other by “connections” (“links” or “arcs”). A numeric “weight” is associated with each connection. Units with the same task form
a layer. An MLP consists of three different types of layers:
input, output, and hidden (layers between the input and
output layers) layers. The units in each layer receive their
inputs from the units in the previous layer (feed-forward),
and there are no feedback loops. Each unit has an activation function for computing the activation level and a
threshold (bias) for defining the current activation level.
Learning in MLPs takes place by adjusting the weights
between layers so that the difference between the actual
(computed by the MLP) and the desired output is minimized. The learning algorithm used for this purpose is
usually “backpropagation” (16 ), and a typical error function to be minimized is the mean square error.
Materials and Methods
subjects
The data used in this study were extracted from a prostate
cancer screening database comprising 974 men in Rotterdam, The Netherlands (17 ). Input variables were total and
free PSA, the proportion of free to total PSA (F/T ratio),
DRE, and TRUS, as well as age and family history of
prostate cancer. The outcome, i.e., prostate cancer or no
prostate cancer, was based on histological examination of
sextant prostate biopsies. Blood specimens were collected
before DRE and TRUS were performed. Before TRUS,
DRE was performed by different well-trained urology
residents. Prostate nodules, asymmetry of the gland,
induration, and bogginess were considered abnormal. A
suspicious TRUS was defined as a hypoechoic lesion. DRE
and TRUS were performed without knowledge of the
other tests, and they were performed by a different
investigator. The family history was obtained from the
baseline questionnaire that was filled out by every participant and returned together with the signed informed
consent form. A positive or negative family history of
prostate cancer included only first-degree male relatives.
From this database, all subjects 55– 66 years of age with
PSA concentrations in the diagnostic gray zone between 3
and 10 mg/L were selected for the study (n 5 212).
Among men with total PSA ,3 mg/L, the proportion of
cancer was 7.4%, and in those with total PSA .10 mg/L,
it was 65.3%. In the study sample, the proportion of
cancer was 25%. Another study group comprised all men
55– 66 years of age with a PSA $3 mg/L (n 5 241).
psa determinations
Total- and free-PSA serum samples were measured with
the ProStatus PSA Free/Total assay (EG&G Wallac,
Turku, Finland). This assay provides simultaneous measurement of free and total PSA by the use of time-resolved
DELFIA immunofluorometry. Assay performance characteristics have been reported previously (17, 18 ).
989
Clinical Chemistry 45, No. 7, 1999
methods
To estimate the performance of the different methods, a
fivefold “cross-validation” was used. The study sample
was divided randomly into five equal subsamples. One
subsample at a time formed a test set (;20% of the study
sample), and the remaining four subsamples together
formed the training set. This protocol was repeated five
times.
LR analysis. LR can be used when the response is binary
(e.g., cancer/non-cancer). To investigate the effect of
explanatory variables to predict prostate cancer, we used
a stepwise selection method. In this method, the sequences of regression equations are computed, and at
each step an explanatory variable is automatically added
to or deleted from the model, depending on the statistical
significance of the variable. To describe the association
between response and explanatory variables, an adjusted
coefficient of determination (Nagelkerke’s generalized R2)
and odds ratios with 95% profile likelihood confidence
intervals were evaluated. For identification of extreme
values, we used the regression diagnostics developed by
Pregibon (19 ). The adequacy of the fitted model was
further checked by the Hosmer-Lemeshow test (20 ). Probability curves for cancer were evaluated by LR analysis.
Both the training and validation sets were used for model
building. All statistical calculations were performed with
SAS® for Windows, Ver. 6.11 (SAS Institute), and graphic
presentation with Microcal OriginTM for Windows (Microcal Software).
MLP. A faster version of the backpropagation algorithm—
resilient backpropagation (RPROP)—was used (21 ). Several MLPs with different numbers of hidden units ranging
from 2 to 20 in one hidden layer were examined. For all
MLPs, the activation function of the input units was
“identity”, whereas a “symmetric logistic” activation
function was used for the hidden and output units. To
avoid “overfitting” of the MLPs to the training sets, we
used the “early stopping” method (22 ). This means that
part of the data was used as a validation set (for each
cross-validation trial) and the training phase was terminated when the error of the validation set was at a
minimum. This was achieved by the use of a batch
program that always updated a record of the network
status with the minimum error of the validation set
during the training process. Because the early stopping
method is sensitive to the initial values of Neural Network weights, 10 different random initial weights were
used for each cross-validation trial, and the best one was
chosen according to the validation set. For the MLP
experiments, the Stuttgart Neural Network Simulator
(SNNS), Ver. 4.1, was used.
statistical methods
The association between categorical variables and diagnostic groups was tested by x2 statistics, and the differ-
ences in numerical variables were tested by Wilcoxon
statistics. Sensitivity, specificity, efficiency, and positive
and negative predictive values were calculated to compare different methods. Because sensitivity for the detection of cancer was considered more important than specificity, the threshold was chosen so that sensitivity was at
least 90% in all training sets (validation sets for the MLPs).
Furthermore, the efficiency had to be at its maximum. If
the same efficiency was obtained with different combinations of sensitivity and specificity, the threshold with the
highest sum of sensitivity and specificity was chosen.
After the best threshold values from the training or
validation sets were determined, we used the values in
test sets to evaluate the performance of the different
methods. Statistical comparisons between the methods
were calculated with the multivariate Wilks lambda statistics (23 ). To compare the classification results between
the methods and with the total PSA and the F/T ratio on
the test sets, we calculated pairwise comparisons using
the McNemar test (24 ) with Bonferroni correction.
Results
The descriptive statistics, frequency distributions, and
probability for Wilcoxon and x2 statistics for the study
sample are shown in Table 1. Of the 212 men, 53 (25%)
had prostate cancer. The proportions of abnormal results
for the categorical variables DRE, TRUS, and heredity
were 8.8%, 13.2%, and 8.2% in the non-cancer group and
Table 1. Descriptive statistics, frequency distributions, and
probability for age, total and free PSA, the F/T ratio, DRE,
TRUS, and heredity in the prostate cancer and no prostate
cancer groups.
Age, years
Mean 6 SD
Median (range)
Total PSA, mg/L
Mean 6 SD
Median (range)
Free PSA, mg/L
Mean 6 SD
Median (range)
F/T ratio
Mean 6 SD
Median (range)
DRE
Normal (0)
Abnormal (1)
TRUS
Normal (0)
Abnormal (1)
Heredity
No (0)
Yes (1)
a
No cancer
(n 5 159)
Cancer
(n 5 53)
61.1 6 3.4
62.0 (56–66)
61.3 6 3.4
61.0 (56–66)
5.16 6 1.7
4.51 (3.1–9.9)
5.85 6 1.8
5.17 (3.2–9.9)
0.93 6 0.4
0.83 (0.3–2.6)
0.78 6 0.4
0.65 (0.3–2.1)
0.18 6 0.06
0.18 (0.05–0.38)
0.14 6 0.07
0.12 (0.05–0.42)
145
14
31
22
138
21
31
22
146
13
41
12
Pa
0.752
0.012
0.004
,0.001
0.001
0.001
0.005
Wilcoxon test for continuous and x2 test for discrete variables.
990
Virtanen et al.: Estimation of Probability of Prostate Cancer
41.5%, 41.5%, and 22.6% in the cancer group, respectively.
There was a statistically significant association between
the diagnostic group and categorical variables (all P
,0.01). There was a significant association between TRUS
and DRE (P 5 0.001) and a moderate association between
TRUS and heredity (P 5 0.037), but not between DRE and
heredity (P 5 0.118). There was no difference between the
groups with respect to age (P 5 0.752). The median F/T
ratio in the non-cancer group was 0.18 vs 0.12 in the
cancer group (P ,0.001).
lr
In the stepwise variable selection method, age and free
PSA were not statistically significant variables in any of
the training sets. The F/T ratio and DRE were nearly
always added to the model in the first or second step, and
TRUS and heredity were added in the third or fourth
steps. The F/T ratio, DRE, and heredity were chosen for
the final models. The aptness of these models was
checked, and one observation in each of the four training
sets was removed because it was poorly elucidated by the
model (19 ). The slope parameters with significance levels,
odds ratios with 95% profile likelihood confidence intervals, and adjusted coefficient of determination (R2) in five
different training sets obtained with the final models are
shown in Table 2. In all training sets, DRE was statistically
significant at the 1% level. The F/T ratio was significant at
the 1% level except in one training set, and heredity was
significant at the 5% level except in one training set. The
slope parameters for DRE and heredity were positive, and
those for the F/T ratio were negative. A change from a
normal DRE value (0) to an abnormal value (1) increased
the cancer risk approximately sixfold, and a positive
heredity increased the risk approximately threefold. The
confidence intervals for these odds ratios were wide, reflecting the small sample size. However, in all training sets, the
risk was at least 1.7-fold when DRE was abnormal.
An increased cancer risk was associated with low
F/T ratios. When the F/T ratio decreased by 0.12 units,
the cancer risk increased, on average, fivefold. However, for clarity, the odds ratios in Table 2 display the
decrease in the risk when the F/T ratio increased by
0.12 units.
Using the calculated parameter estimates obtained by
LR, we could determine the probability that a man has a
prostate cancer with certain values for the predictive
variables (e.g., F/T ratio, DRE, and heredity). In Fig. 1,
probability curves for cancer associated with different
values and combinations of the F/T ratio, DRE, and
heredity are plotted for PSA values between 3 and 10
mg/L. These results clearly demonstrate the importance of
the F/T ratio.
When LR is performed for total-PSA values between 3
and 45 mg/L with the combination of total PSA, the F/T
ratio, and DRE as explanatory variables, the area under
the ROC curve was 0.809 and R2 was 22%. The greatest
area under the curve (0.812) was achieved when total PSA
and the F/T ratio were combined with DRE, TRUS, and
heredity. Because the difference between the models was
not critical, the cancer probability curves plotted in Fig. 2
were determined from the model in which total PSA
concentrations of 3– 45 mg/L, the F/T ratio, and DRE are
included.
mlp
For the MLP experiments, all other explanatory variables
except age were selected as input variables. The final
thresholds for the MLPs were chosen on the basis of the
performance of the model in the validation sets instead of
the training sets because the termination of the training
was based on the validation sets and not on the training sets.
Among the MLPs examined, the MLP with five hidden units
provided the best performance on unseen data.
Table 2. Adjusted coefficient of determination (R2), parameter estimates, observed probability, and odds ratios with 95%
profile likelihood confidence intervals in five training sets evaluated by LR.
Set
R2
Variable
Parameter estimate
Observed P
Odds ratio (95% CI)a
I
31.3
II
27.4
III
27.6
IV
27.5
V
39.7
F/T ratio
DRE
Heredity
F/T ratio
DRE
Heredity
F/T ratio
DRE
Heredity
F/T ratio
DRE
Heredity
F/T ratio
DRE
Heredity
216.911
1.618
1.388
210.197
1.643
1.076
25.364
1.995
1.125
212.683
1.475
1.018
215.019
2.176
1.485
0.0001
0.0025
0.0278
0.0054
0.0004
0.0437
0.1010
0.0001
0.0339
0.0006
0.0020
0.0942
0.0004
0.0001
0.0089
0.1 (0.0–0.3)
5.0 (1.8–14.9)
4.0 (1.2–14.3)
0.3 (0.1–0.7)
5.2 (2.1–13.0)
2.9 (1.0–8.4)
0.5 (0.2–1.1)
7.3 (3.1–18.1)
3.1 (1.1–8.7)
0.2 (0.1–0.5)
4.4 (1.7–11.3)
2.8 (0.8–9.1)
0.1 (0.0–0.4)
8.8 (3.2–25.9)
4.4 (1.4–13.8)
a
CI, confidence interval.
Clinical Chemistry 45, No. 7, 1999
Fig. 1. Probability of cancer derived from LR with different results for
DRE (Dre), heredity (0 5 normal, 1 5 abnormal), and the F/T ratio for
PSA concentrations of 3–10 mg/L.
comparisons of the methods
The performance measures of the different methods are
shown in Table 3. The mean differences between the
methods were not statistically significant when all performance measures were considered (P 5 0.154, Wilks
lambda test). However, a statistically significant difference (P 5 0.023, Wilks lambda test) between the methods
was observed when sensitivity and specificity were considered. This difference was attributed to the fact that the
specificity of the MLP was only approximately one-half of
that of the LR method. The LR model was significantly
better in predicting prostate cancer than the MLP or the
model with total PSA and the F/T ratio as explanatory
variables (P ,0.01, McNemar test).
Discussion
We examined the ability of two methods, LR and MLP, to
improve the diagnostic accuracy for prostate cancer in
men 55– 66 years of age with PSA concentrations of 3–10
mg/L. As used, the MLP did not provide information
about the importance of the different variables. However,
when LR was used, the F/T ratio, DRE, and heredity were
found to be independent predictors of prostate cancer,
whereas age, and total and free PSA were not. With the
explanatory variables selected, LR predicted prostate cancer significantly better than the MLP and the combination
991
of total PSA and the F/T ratio alone (P ,0.01, McNemar
test). However, the impact of total PSA is artificially
restricted by the use of a limited concentration range, and
when all cases with PSA concentrations of 3– 45 mg/L
were considered, both total PSA and the F/T ratio were
significant variables.
Of the explanatory variables examined, the F/T ratio
and DRE were the most significant predictors of prostate
cancer in the restricted PSA concentration range 3–10
mg/L. This is in agreement with earlier studies (17, 25–
30 ). Heredity also improved the diagnostic accuracy. The
effect of age was not significant, probably because of the
narrow age distribution, only 10 years, whereas it was
nearly 40 years in the study of Chen et al. (26 ). Free PSA
was not an independent variable in LR, apparently because it contained repetitive information already included
in the F/T ratio.
DRE and TRUS were more sensitive (41.5% each) than
heredity (22.6%), but the stepwise method selected only
DRE to the model, apparently because it was slightly
more specific (91.2%) than TRUS (86.8%) and because of
considerable covariance between these variables. This
finding was obtained although different examiners performed the examinations without knowledge of the result
of the other test. TRUS was a significant variable in only
one training set and did not add diagnostic information to
the selected logistic model. However, in an LR model
where TRUS replaced DRE, TRUS was statistically significant in every training set. A positive result in TRUS
increased the cancer risk 3.5-fold, whereas the corresponding increase for DRE was sixfold. Therefore DRE
was included in the calculations of probability curves. In
this study, TRUS did not add statistical power and,
therefore, was not included.
The results obtained with TRUS and DRE are highly
observer dependent; therefore, our results may not be
reproducible in a different setting. TRUS and DRE generally are thought to provide independent information, but
in a study where this was evaluated by LR, only PSA,
DRE, and PSA related to prostate volume were found to
predict the presence of prostate cancer (31 ).
In a screening setting, PSA often is determined first,
and DRE is performed only on patients with abnormal
results. In a clinical setting, this information is also
usually available before the decision to perform a biopsy
is made. Presently, the F/T ratio mainly is used to
evaluate the need for a biopsy if total PSA is 4 –10 mg/L,
whereas biopsy is always considered as indicated if PSA
is .10 mg/L. The probability curves shown in Figs. 1 and
2 suggest that this approach is fairly crude. The mean
cancer probability associated with PSA values of 4 –10
mg/L is ;25%. Fig. 2A shows that the risk is ;15% when
PSA is 4 mg/L, the F/T ratio is ;0.20, and DRE is normal.
Fig. 2A also demonstrates how strongly changes in the
F/T ratio affect the risk. Thus, a F/T ratio of 0.4 in
combination with a total-PSA concentration of 30 mg/L is
also associated with a cancer probability of ;15%. This
992
Virtanen et al.: Estimation of Probability of Prostate Cancer
Fig. 2. Probability of cancer derived from LR of total PSA (mg/L), the F/T ratio, and a normal DRE (A) or abnormal DRE (B) for PSA concentrations
of 3– 45 mg/L.
shows that the F/T ratio could be utilized at much higher
PSA concentrations than are used at present. A low F/T
ratio increases the risk very strongly. Thus, a total-PSA
concentration of 3 mg/L and a F/T ratio of 0.1 are
associated with a 30% risk. This shows that the trend to
lower the cutoff for total PSA from 4 to 3 mg/L (32, 33 ) is
justified if the F/T ratio is used to reduce the number of
false-positive results. Fig. 2B shows that an abnormal DRE
causes a considerable further increase in cancer risk, and
biopsy is indicated in practically all men with PSA .3
mg/L and positive DRE regardless of the F/T ratio. The
risk is further increased by a positive heredity (Fig. 1), and
the combination of positive DRE and heredity indicates a
need for biopsy for all patients with a PSA concentration
.3 mg/L. The combined risk of a positive heredity and a
PSA concentration ,3 mg/L deserves to be investigated.
As shown in Figs. 1 and 2, the additional impact of
DRE and heredity on prostate cancer probability is considerable compared with total PSA and the F/T ratio.
However, there is room for further improvement in
diagnostic accuracy, and with the statistical tools used in
the present study, other variables can easily be included.
Table 3. Efficiency, sensitivity, specificity, positive predictive value, and negative predictive value in LR and MLP
corresponding to 90% sensitivity in the training or validation sets.
Efficiency, %
Sensitivity, %
PPV,a %
Specificity, %
NPV, %
Test set
LR
MLP
LR
MLP
LR
MLP
LR
MLP
LR
MLP
1
2
3
4
5
Meanb
Meanc
51
56
51
50
54
52
49
39
44
40
32
41
82
100
91
100
60
87
82
73
91
100
80
85
41
41
37
34
52
41
37
28
28
22
16
26
32
37
33
32
29
33
31
26
30
29
23
28
87
100
92
100
80
92
86
75
90
100
71
84
a
42
90
PPV, positive predictive value; NPV, negative predictive value.
b
Mean values when multiple explanatory variables are considered.
c
Mean values when the F/T ratio and total PSA are considered.
26
29
93
993
Clinical Chemistry 45, No. 7, 1999
It will be interesting to see whether utilization of PSA
density, PSA velocity, and prostate volume provides
further independent diagnostic information (6, 7, 29 ). Presenting the probabilities graphically becomes impractical
when more than three variable are included. However,
with LR, the cancer probability can be expressed with a
single value, which could replace the multistep algorithms used at present.
Probability curves and tables for a continuous range of
F/T ratios based on published cancer prevalence rates
have been presented recently by Marley et al. (34 ), but to
our knowledge, curves based on the combined impact of
all the diagnostic variable used in the present study have
not been presented before. When comparing our results
with those of Marley et al. (34 ), it should be noted that our
curves are based on the study sample and that cancer
prevalence rates have not been included in the calculations.
The correlations between the ProStatus assays and
some other widely used assays, the Abbott IMx and the
Hybritech Tandem E assays, are very good (35, 36 ), but it
still is advisable to establish risk algorithms separately for
each different assay. Many other assays give quite different results, and for these the algorithms established are
not applicable. It is also necessary to consider racial
differences when estimating prostate cancer risk on the
basis of PSA values (37 ). The impact of heredity also
needs to be evaluated separately in various populations.
The validity at 90% sensitivity is important, and the
corresponding cutoff is clinically relevant because it facilitates detection of prostate cancer at a potentially curable
stage.
One of the aims of the present study was to compare
the performance of LR and MLP. Our results suggest that
LR was slightly better, but this may not be a generally
valid conclusion because of the limited number of cases
studied. Smaller training sets had to be used than in LR
because part of each training set was used as the validation set in MLP. Furthermore, only a fivefold crossvalidation could be used. In a recent study by Gomari et
al. (38 ) comprising a larger number of cases, the performance of LR and MLP was equal.
In conclusion, the probability of prostate cancer can be
more accurately estimated by the use of LR with multiple
explanatory variables than by the use of only total PSA
and the F/T ratio. DRE and heredity were found to be
independent variables that substantially affected cancer
probability. The results of TRUS added no statistical
power to predict prostate cancer over the findings of DRE
alone, and hence were not included as an explanatory
variable in either LR model. For patients with PSA concentrations between 3–10 mg/L, LR analysis (using the
F/T ratio, the DRE examination, and heredity) allows
calculation of a single value for the probability of finding
prostate cancer on biopsy. This method may offer advan-
tages over the multistep algorithms presently used to
determine the need to perform prostate biopsy.
This work was supported by Tekes (Technology Development Centre Finland) and EG&G Wallac. We thank Jari
Forsström, Veli Kairisto, Esa Uusipaikka, and Timo Järvi
for helpful comments and discussions.
References
1. Stephenson RA, Stanford JL. Population-based prostate cancer
trends in the United States: patterns of change in the era of
prostate-specific antigen. World J Urol 1997;15:331–5.
2. Catalona WJ, Smith DS, Ratliff TL, Dodds KM, Coplen DE, Yuan JJ,
et al. Measurement of prostate-specific antigen in serum as a
screening test for prostate cancer. N Engl J Med 1991;324:
1156 – 61.
3. Labrie F, Dupont A, Suburu R, Cusan L, Tremblay M, Gomez JL,
Emond J. Serum prostate specific antigen as pre-screening test for
prostate cancer. J Urol 1992;147:846 –51.
4. Stamey TA, Yang N, Hay AR, McNeal JE, Freiha FS, Redwine E.
Prostate-specific antigen as a serum marker for adenocarcinoma
of the prostate. N Engl J Med 1987;317:909 –16.
5. Oesterling JE. Prostate specific antigen: a critical assessment of
the most useful tumor marker for adenocarcinoma of the prostate.
J Urol 1991;145:907–23.
6. Babaian RJ, Fritsche HA, Evans RB. Prostate-specific antigen and
prostate gland volume: correlation and clinical application. J Clin
Lab Anal 1990;4:135–7.
7. Carter HB, Pearson JD, Metter EJ, Brant LJ, Chan DW, Andres R, et
al. Longitudinal evaluation of prostate-specific antigen levels in
men with a without prostate disease. JAMA 1992;267:2215–20.
8. Stenman U-H, Leinonen J, Alfthan H, Rannikko S, Tuhkanen K,
Alfthan O. A complex between prostate-specific antigen and
a1-antichymotrypsin in the major form of prostate-specific antigen
in serum of patients with prostatic cancer: assay of the complex
improves clinical sensitivity for cancer. Cancer Res 1991;51:
222– 6.
9. Lilja H, Christensson A, Dahlen U, Matikainen MT, Nilsson O,
Pettersson K, Lövgren T. Prostate-specific antigen in serum occurs
predominantly in complex with a1-antichymotrypsin. Clin Chem
1991;37:1618 –25.
10. Leinonen J, Lövgren T, Vornanen T, Stenman UH. Double-label
time-resolved immunofluorometric assay of prostate-specific antigen and of its complex with a1-antichymotrypsin. Clin Chem
1993;39:2098 –103.
11. Forsström JJ, Dalton KJ. Artificial neural networks for decision
support in clinical medicine. Ann Med 1995;27:509 –17.
12. Snow PB, Smith DS, Catalona WJ. Artificial neural networks in the
diagnosis and prognosis of prostate cancer: a pilot study. J Urol
1994;152:1923– 6.
13. Press SJ, Wilson S. Choosing between logistic regression and
discriminant analysis. J Am Stat Assoc 1978;73:699 –705.
14. Albert A, Harris EK. Multivariate interpretation of clinical laboratory
data. New York: Marcel Dekker, 1987:111–2;137– 8.
15. Freeman JA, Skapura DM. Neural networks: algorithms, applications, and programming techniques. Reading, MA: Addison-Wesley Publishing, 1991:401 pp.
16. Rumelhart DE, McCelland JL, PDP Research Group. Parallel distributed processing. Exploration in the microstructure of cognition.
Vol. 1. Cambridge, MA: MIT Press, 1986:318 – 62.
17. Bangma CH, Kranse R, Blijenberg BG, Schröder FH. The value of
screening tests in the detection of prostate cancer. Part I. Results
994
18.
19.
20.
21.
22.
23.
24.
25.
26.
27.
28.
29.
Virtanen et al.: Estimation of Probability of Prostate Cancer
of a retrospective evaluation of 1726 men. Urology 1995;46:
773– 8.
Blijenberg BG, Bangma CH, Kanse R, Eman I, Schröder FH.
Analytical evaluation of the new ProstatusTM PSA free/total assay
for prostate-specific antigen as part of a screening study for
prostate cancer. Eur J Clin Chem Biochem 1997;35:111– 4.
Pregibon D. Logistic regression diagnostics. Ann Stat 1981;9:
705–24.
SAS/STAT®. Software: changes and enhancements. Cary, NC:
SAS Institute, 1996:425.
Riedmiller M, Braun H. A direct adaptive method for faster
backpropagation learning: the RPROP algorithm. University of
Karlsruhe, Germany. Proceeding of the IEEE International Conference on Neural Networks 1993;1:588 –91.
Bishop CM. Neural network for pattern recognition. New York:
Clarendon Press, 1995:343–5.
Morrison DF. Multivariate statistical methods. Tokyo, Japan:
McGraw-Hill Kogakusha, 1976:415 pp.
Sachs L. Applied statistics. A handbook of techniques. New York:
Springer-Verlag, 1984:363– 4.
Luderer AA, Chen YT, Soriano TF, Kramp WJ, Carlson G, Cuny C,
et al. Measurement of the proportion of free to total prostatespecific antigen improves diagnostic performance of prostate-specific antigen in the diagnostic gray zone of total prostate-specific
antigen. Urology 1995;46:187–95.
Chen YT, Luderer AA, Thiel RP, Carlson G, Cuny CL, Soriano TF.
Using proportions of free to total prostate-specific antigen, age,
and total prostate-specific antigen to predict the probability of
prostate cancer. Urology 1996;47:518 –24.
Elgamal AA, Cornillie FJ, Van Poppel HP, Van de Voorde WM,
McCabe R, Baert LV. Free-to-total prostate specific antigen ratio
as a single test for detection of significant stage T1c prostate
cancer. J Urol 1996;156:1042–9.
Björk T, Piironen T, Pettersson K, Lövgren T, Stenman UH,
Oesterling JE, et al. Comparison of analysis of the different
prostate-specific antigen forms in serum for detection of clinically
localized prostate cancer. Urology 1996;48:882– 8.
Partin AW, Catalona WJ, Southwick PC, Subong ENP, Gasior GH,
Chan DW. Analysis of percent free prostate-specific antigen (PSA)
30.
31.
32.
33.
34.
35.
36.
37.
38.
for prostate cancer detection: influence of total PSA, prostate
volume and age. Urology 1996;48:55– 61.
Reissigl A, Klocker H, Pointner J, Fink K, Horninger W, Ennemoser
O, et al. Usefulness of the ratio free/total prostate-specific
antigen in addition to total PSA levels in prostate cancer screening. Urology 1996;48:62– 6.
Standaert B, Alwan A, Nelen V, Denis L. Prostate volume and
cancer in screening programs. Prostate 1997;33:188 –94.
Lodding P, Aus G, Bergdahl S, Frosing R, Lilja H, Pihl CG,
Hugosson J. Characteristics of screening detected prostate cancer in men 50 to 66 years old with 3 to 4 ng/mL prostate specific
antigen. J Urol 1998;159:899 –903.
Catalona WJ, Smith DS, Ornstein DK. Prostate cancer detection in
men with serum PSA concentrations of 2.6 to 4.0 ng/mL and
benign prostate examination. Enhancement of specificity with free
PSA measurements. JAMA 1997;277:1452–5.
Marley GM, Miller MC, Kattan MW, Zhao G, Patton KP, Vessella
RL, et al. Free and complexed prostate-specific antigen serum
ratios to predict probability of primary prostate cancer and benign
prostatic hyperplasia. Urology 1996;48:16 –22.
Bangma CH, Kranse R, Blijenberg BG, Schröder FH. The new
DELFIA prostate specific antigen (PSA): results of the comparative
evaluation. J Urol 1995;153(Suppl):294A.
Blijenberg BG, Bangma CH, Kranse R, Schröder FH. Analytical
evaluation of the new DELFIA free and total prostate-specific
antigen assays. 11th International European Congress of Clinical
Chemistry, Tampere, 1995. Scand J Clin Lab Investig 1995;
55(Suppl 223):763.
Pettaway CA, Troncoso P, Ramirez EI, Johnston DA, Steelhammer
L, Babaian RJ. Prostate specific antigen and pathological features
of prostate cancer in black and white patients: a comparative
study based on radical prostatectomy specimens. J Urol 1998;
160:437– 42.
Gomari M, Järvi T, Hugosson J, Finne P, Stenman U-H. Learning
vector quantization, multilayer perceptron, neurofuzzy network
and logistic regression in the diagnosis of prostate cancer. In:
Arabnia H, ed. Proceedings of the 1998 International Conference
on Parallel and Distributed Processing Techniques and Applications. PDPTA ‘98. Las Vegas, NV: CSREA Press, 1998:516 –25.