REGRESSION MODELS IN MORBIDITY ESTIMATION

Gordon E. Willmot, M.Math, ASA
Actuarial Assistant
Mutual Life of Canada

September, 1982
Regression Models in Morbidity Estimation

Abstract

A statistical model is presented which analyzes group health and dental data. The purpose is to determine the cost of these benefits, with special reference to the effects of various characteristics of individuals in the group. The methodology involves a combination of a binary logistic regression and a normal multiple linear regression.
Regression Models In Morbidity Estimation
We are interested in the group insurance premium for any or all of the following benefits:

i)   drug benefits
ii)  hospital coverage (private or semi-private)
iii) vision care
iv)  dental insurance
v)   supplementary health insurance (e.g. ambulance, wheelchair, etc.)

In particular, we wish to determine the effects of an individual's characteristics on the cost of supplying these coverages.
We denote by

    x_i = (x_i1, x_i2, ..., x_ip)

a given set of covariates, which may include

i)   age
ii)  sex
iii) occupation
iv)  location of residence
v)   marital status

among others.
For example, we might have

    x_i1 = 1 if 18 ≤ age < 35, 0 otherwise
    x_i2 = 1 if 35 ≤ age < 50, 0 otherwise
    x_i3 = 1 if 50 ≤ age < 65, 0 otherwise
    x_i4 = 1 if sex is male, 0 if sex is female

etc.
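As a concrete illustration, the indicator coding above can be sketched as follows (a minimal sketch only; the function name and the "M"/"F" coding are my assumptions, not part of the paper):

```python
import numpy as np

def covariate_vector(age, sex):
    """Build (x_i1, ..., x_i4) for one individual, using the
    age-band and sex indicators described above (illustrative)."""
    x1 = 1 if 18 <= age < 35 else 0   # x_i1: age band 18-34
    x2 = 1 if 35 <= age < 50 else 0   # x_i2: age band 35-49
    x3 = 1 if 50 <= age < 65 else 0   # x_i3: age band 50-64
    x4 = 1 if sex == "M" else 0       # x_i4: male indicator
    return np.array([x1, x2, x3, x4])

print(covariate_vector(40, "M"))  # a 40-year-old male: [0 1 0 1]
```

Note that the age bands are mutually exclusive, so at most one of x_i1, x_i2, x_i3 equals 1 for a given individual.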
If we denote by X the total claim amount for an individual for the premium paying period, then we are interested in the random variable

    X | x_i

or, more specifically, the first couple of moments of its associated distribution. We assume we have the total amount of claims for each individual in the study for a period of time at least as great as the premium payment period (to eliminate the effects of seasonality on the claims). Furthermore, assume that all claim amounts are strictly positive, so that X = 0 implies no claims occurred.
Then,

    Pr{X = x | x_i} = Pr{≥ 1 claim | x_i} · Pr{X = x | x_i, ≥ 1 claim}.
We will estimate the two quantities on the right hand side of
this equation separately.
Let us divide the data into n groups such that all covariates in a given group are equal (or roughly so). Suppose in group i there are n_i individuals, d_i of which have at least 1 claim. Let us assume that x_i is the covariate vector of the ith group.
Part 1 - Logistic Regression
Let

    p_i = Pr{≥ 1 claim | x_i}.

Then it is reasonable to assume that d_i is an observation from a binomial variate with parameters n_i and p_i. Let

    p_i = e^θ_i / (1 + e^θ_i)

where

    θ_i = x_i β = β_1 x_i1 + ... + β_p x_ip.
Then we are interested in making inferences about β. For notational simplicity, let

    θ = (θ_1, ..., θ_n)^T,   β = (β_1, ..., β_p)^T,   X = {x_ij} (n × p).
To get estimates for β, we use likelihood methods. Since d_i is binomial,

    L(β) = Π_{i=1}^n e^{θ_i d_i} / (1 + e^θ_i)^{n_i},   where θ_i = x_i β.
Calculate the score function

    S(β) = { ∂ log L(β) / ∂β_j } (p × 1)
         = { Σ_{i=1}^n ( d_i - n_i e^{x_i β} / (1 + e^{x_i β}) ) x_ij } (p × 1)
and the information matrix

    I(β) = { -∂² log L(β) / ∂β_j ∂β_k } (p × p)
         = { Σ_{i=1}^n n_i p_i (1 - p_i) x_ij x_ik } (p × p).
Then the estimates β̂ may be calculated iteratively using Newton-Raphson techniques as follows (β̂_k is the value of β̂ from the kth iteration):

    β̂_k = β̂_{k-1} + I(β̂_{k-1})^{-1} S(β̂_{k-1})

until β̂_k is sufficiently close to β̂_{k-1}.
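The Newton-Raphson iteration can be sketched in code. This is a minimal illustration under the grouped binomial model, not the author's implementation; the function name and convergence tolerance are my assumptions:

```python
import numpy as np

def fit_logistic(X, n, d, beta0, tol=1e-8, max_iter=50):
    """Newton-Raphson for the grouped binomial logistic model.

    X : (n_groups x p) covariate matrix, rows x_i
    n : group sizes n_i;  d : counts with at least one claim, d_i
    Iterates beta <- beta + I(beta)^{-1} S(beta) until successive
    estimates are sufficiently close.
    """
    beta = np.asarray(beta0, dtype=float)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # p_i = e^{x_i b}/(1 + e^{x_i b})
        S = X.T @ (d - n * p)                 # score vector S(beta)
        W = n * p * (1.0 - p)                 # binomial variances
        I = X.T @ (W[:, None] * X)            # information matrix I(beta)
        step = np.linalg.solve(I, S)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta
```

Each step solves the p × p linear system I(β) step = S(β) rather than forming the inverse explicitly, which is numerically preferable and gives the same update.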
To get a starting value β̂_0, we go back to the likelihood as a function of θ. We have

    L(θ) = Π_{j=1}^n e^{θ_j d_j} / (1 + e^θ_j)^{n_j}

and so

    θ̂_j = log( d_j / (n_j - d_j) ).
It can be shown using large sample theory techniques that (asymptotically)

    θ̂_j ~ N( θ_j , 1 / (n_j p_j (1 - p_j)) )

or

    log( d_j / (n_j - d_j) ) ~ N( x_j β , 1 / (n_j p_j (1 - p_j)) ),

which is the usual weighted least squares multiple regression model. Thus, if this were the true result, we would have

    β̂_0 = (V^T V)^{-1} V^T Y

where

    V = { √w_j x_jk } (n × p),   Y = { √w_j log( d_j / (n_j - d_j) ) } (n × 1),

with empirical weights w_j = d_j (n_j - d_j) / n_j.
This can thus serve as a good starting value for β̂ (i.e. as β̂_0). Note that if d_j = 0, it is recommended to replace

    log( d_j / (n_j - d_j) )   by   log( 0.5 / (n_j + 0.5) ),

and if d_j = n_j, to replace

    log( d_j / (n_j - d_j) )   by   log( (n_j + 0.5) / 0.5 )

in the Y matrix above, so that θ̂_j ≠ ±∞. This does not affect the final answer.
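The starting-value computation can be sketched as follows. The 0.5 adjustment is applied at the extremes, and the weights n_j p̂_j (1 - p̂_j) are computed from the adjusted proportions so they stay positive at d_j = 0 and d_j = n_j; the function name and that particular weighting choice are mine, not the paper's:

```python
import numpy as np

def starting_beta(X, n, d):
    """Weighted least-squares starting value from empirical logits.

    Y_j = log(d_j / (n_j - d_j)), replaced by log(0.5 / (n_j + 0.5))
    when d_j = 0 and by log((n_j + 0.5) / 0.5) when d_j = n_j.
    """
    num = np.where(d == 0, 0.5, np.where(d == n, n + 0.5, d)).astype(float)
    den = np.where(d == 0, n + 0.5, np.where(d == n, 0.5, n - d)).astype(float)
    y = np.log(num / den)                 # adjusted empirical logits
    p_hat = num / (num + den)             # adjusted empirical proportions
    w = n * p_hat * (1.0 - p_hat)         # empirical n_j p_j (1 - p_j)
    Xw = np.sqrt(w)[:, None] * X          # V = {sqrt(w_j) x_jk}
    yw = np.sqrt(w) * y                   # Y = {sqrt(w_j) logit_j}
    beta0, *_ = np.linalg.lstsq(Xw, yw, rcond=None)
    return beta0
```

The result is then refined by the Newton-Raphson iteration; any reasonable positive weighting gives a usable starting point.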
From large sample theory, it can be shown that β̂ is approximately normal with mean β and variance-covariance matrix I^{-1}(β̂). Thus, the usual tests can be applied to the solutions β̂ for inference purposes (i.e. to test whether β_i = 0, so that the corresponding covariates may be dropped).
To test the fit of the model, a one-tailed test involving the chi-squared statistic

    χ²(n - p) = Σ_{i=1}^n (d_i - n_i p̂_i)² / ( n_i p̂_i (1 - p̂_i) )

may be used, where

    p̂_i = e^{x_i β̂} / (1 + e^{x_i β̂})

(large values give evidence against the model).
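Computing the chi-squared fit statistic is straightforward once β̂ is available. A minimal sketch (the comparison against the χ²(n - p) reference distribution is left to tables or a statistics library):

```python
import numpy as np

def pearson_chi2(X, n, d, beta_hat):
    """Chi-squared goodness-of-fit statistic for the grouped logistic
    fit: sum over groups of (d_i - n_i p_i)^2 / (n_i p_i (1 - p_i)).
    Compare against chi-squared with n - p degrees of freedom."""
    p = 1.0 / (1.0 + np.exp(-X @ beta_hat))
    return np.sum((d - n * p) ** 2 / (n * p * (1.0 - p)))
```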
Part 2 - Multiple Linear Regression
In order to consider

    X | x_i, ≥ 1 claim,

we consider only the d_i individuals who had claims. Let T_ij be the total claim amount for person j in group i. Our model is

    f(T_ij) ~ N( τ_1 x_i1 + ... + τ_p x_ip , σ² ),

where f is chosen so that the data appears to be roughly normal. Appropriate choices of f include the following:

    f(x) = x,   f(x) = log x,   f(x) = √x,

among others.
Let y_ij = f(T_ij) and

    ȳ_i = Σ_{j=1}^{d_i} y_ij / d_i .

Then ȳ_i is approximately N( x_i τ , σ² / d_i ). The least squares estimates of τ are then

    τ̂ = (V^T V)^{-1} V^T Z

where

    V = { √d_i x_ij } (n × p),   Z = { √d_i ȳ_i } (n × 1).
An estimate of σ² is

    s² = Σ_{i=1}^n Σ_{j=1}^{d_i} (y_ij - ȳ_i)² / ( (Σ_{i=1}^n d_i) - n ).
To test the fit of the model, compute the F statistic

    F_{n-p, (Σ d_i) - n} = [ Σ_{i=1}^n d_i (ȳ_i - x_i τ̂)² / (n - p) ] / s² .

This is the ratio of the "lack of fit" sum of squares to the "pure error" sum of squares. A one-tailed test is appropriate. If the model is good, a better estimate of σ² is

    Σ_{i=1}^n Σ_{j=1}^{d_i} (y_ij - x_i τ̂)² / ( (Σ_{i=1}^n d_i) - p ).

This problem could have been formulated as a weighted least squares problem, with identical results. The use of the means rather than the original data is primarily to cut down on the computing time.
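The group-mean least squares step can be sketched as follows (a minimal illustration; `ybar` holds the group means of the transformed claims y_ij = f(T_ij), and the names are mine):

```python
import numpy as np

def fit_severity(X, d, ybar):
    """Least squares for the transformed-claim group means.

    ybar_i is approximately N(x_i tau, sigma^2 / d_i), so we solve
    the weighted system with V = {sqrt(d_i) x_ij}, Z = {sqrt(d_i) ybar_i}:
        tau_hat = (V^T V)^{-1} V^T Z.
    """
    sd = np.sqrt(np.asarray(d, dtype=float))
    V = sd[:, None] * X       # rows scaled by sqrt(d_i)
    Z = sd * ybar
    tau, *_ = np.linalg.lstsq(V, Z, rcond=None)
    return tau
```

Working with the n group means rather than all Σ d_i individual observations is exactly the computational saving the paper describes.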
Part 3 - The Distribution
We may now compute the distribution of X | x_i. The density for x > 0 is

    g_i(x) = p̂_i · φ( (f(x) - x_i τ̂) / σ̂ ) · f'(x) / σ̂

where

    φ(x) = (2π)^{-1/2} e^{-x²/2} .

Also,

    g_i(0) = Pr{X = 0 | x_i} = 1 - ∫_{0+}^∞ g_i(x) dx

and

    E{X | x_i} = ∫_{0+}^∞ x g_i(x) dx .
For a deductible of amount c, we are interested in the random variable

    W_i = max{ X - c, 0 }

with

    Pr{W_i = 0 | x_i} = Pr{X ≤ c | x_i},

and the density of W_i for x > 0 is g_i(x + c). The expected value is

    E{W_i | x_i} = ∫_c^∞ (x - c) g_i(x) dx .
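As an illustration of the expected-value integral, suppose f(x) = log x, so that given a claim the claim size is lognormal. A minimal numerical sketch (the parameter names, grid, and trapezoidal scheme are my choices, not the paper's):

```python
import numpy as np

def expected_excess(c, p_claim, mu, sigma, x_max=1e4, steps=400000):
    """Approximate E{max(X - c, 0) | x_i} = integral_c^inf (x - c) g_i(x) dx,
    assuming f(x) = log x, i.e. g_i(x) is p_claim times a
    lognormal(mu, sigma) density.  Lower limit clipped away from 0
    to keep log(x) finite."""
    x = np.linspace(max(c, 1e-9), x_max, steps)
    g = p_claim * np.exp(-(np.log(x) - mu) ** 2 / (2.0 * sigma ** 2)) \
        / (x * sigma * np.sqrt(2.0 * np.pi))
    integrand = (x - c) * g
    dx = x[1] - x[0]
    return np.sum((integrand[:-1] + integrand[1:]) * dx / 2.0)  # trapezoid rule
```

With c = 0 this recovers p_claim times the lognormal mean e^{mu + sigma^2/2}; the accuracy of the tail depends on x_max being several standard deviations beyond c on the log scale.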
I wish to thank Dr. R. J. Mackay and Dr. J. G. Kalbfleisch of the University of Waterloo for their extremely helpful advice and support in this venture.