Chapter LM2: Review of Mathematical Statistics
Bias:
b_θ̂(θ) = E[θ̂|θ] − θ

Asymptotically unbiased estimator:
For an estimator θ̂n of θ based on a sample of size n, θ̂n is asymptotically unbiased if
lim_{n→∞} E[θ̂n|θ] = θ

Consistent estimator:
An estimator is (weakly) consistent if ∀ δ > 0 and ∀ θ:
lim_{n→∞} Pr[|θ̂n − θ| > δ] = 0
A sufficient condition for weak consistency is that the estimator is asymptotically unbiased and Var(θ̂n) → 0.

Mean-squared error (MSE):
MSE_θ̂(θ) = E[(θ̂ − θ)²|θ]

Relation between MSE, variance and bias:
MSE_θ̂(θ) = Var(θ̂|θ) + [b_θ̂(θ)]²

UMVUE:
An estimator θ̂ is called a uniformly minimum variance unbiased estimator if it is unbiased and for any true value of θ there is no other unbiased estimator that has a smaller variance.

Empirical mean and variance:
x̄ = (1/n) Σ_{i=1}^n xi
s² = (1/n) Σ_{i=1}^n (xi − x̄)²

Sample mean and variance:
x̄ = (1/n) Σ_{i=1}^n xi
s² = [1/(n − 1)] Σ_{i=1}^n (xi − x̄)²

Confidence interval:
A 100(1 − α)% confidence interval for a parameter θ is a pair of values L and U computed from a random sample such that:
∀ θ, Pr[L ≤ θ ≤ U] ≥ 1 − α

Normal approximation:
For a point estimator θ̂ of parameter θ such that E(θ̂) = θ, Var(θ̂) = v(θ) and θ̂ has approximately a normal distribution:
1 − α ≈ Pr[−z_{α/2} ≤ (θ̂ − θ)/√v(θ) ≤ z_{α/2}]

Type I error:
The error made when the test rejects the null hypothesis (H0) in a situation where the null hypothesis is true.

Significance level α:
The probability of making a type I error given that the null hypothesis is true. If H0 can be true in more than one way, the level of significance is the maximum of such probabilities.

Type II error:
The error made when the test does not reject the null hypothesis (H0) in a situation where the alternative hypothesis is true.

Uniformly most powerful test:
No other test exists that has the same or lower significance level and, for a particular value within the alternative hypothesis, a smaller probability of making a type II error.

p-value:
For a hypothesis test, the p-value is the probability that the test statistic takes on a value that is less in agreement with the null hypothesis than the value obtained from the sample. Tests conducted at a significance level greater than the p-value lead to a rejection of the null hypothesis, while tests conducted at a significance level smaller than the p-value lead to a failure to reject the null hypothesis.
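The divisor-n (empirical) versus divisor-(n − 1) (sample) variance distinction above can be checked numerically; a minimal Python sketch (function names are my own):

```python
def empirical_variance(xs):
    """Variance of the empirical distribution: divisor n (biased)."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / n

def sample_variance(xs):
    """Unbiased sample variance: divisor n - 1."""
    n = len(xs)
    return empirical_variance(xs) * n / (n - 1)

data = [1.0, 3.0, 5.0, 7.0]
print(empirical_variance(data))  # 5.0
print(sample_variance(data))     # 6.666...
```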
© Exam C - Actuarial Models - LGD
Chapter LM3: Non-Parametric Empirical Point Estimation
The empirical distribution
The empirical distribution is a discrete random variable constructed from a random sample of n observations x1, · · · , xn, each of which is assigned a probability 1/n.

Empirical distribution probability function
pn(yj) = (# of xi’s equal to yj)/n = sj/n
sj = # of observations equal to yj

Empirical distribution function
Fn(t) = (number of xi’s ≤ t)/n

Empirical survival function
Sn(t) = (number of xi’s > t)/n = 1 − Fn(t)

Risk set at yj
The set of observed values that are ≥ yj, denoted rj
rj = Σ_{i=j}^k si = sj + · · · + sk

Empirical survival/distribution function
Sn(t) = rj/n   if y_{j−1} ≤ t < yj
Fn(t) = 1 − rj/n   if y_{j−1} ≤ t < yj

If there is no censored or truncated data, the empirical survival function is the same as the Kaplan-Meier Product Limit estimate of S(x).

Empirical estimate of the mean of X
It is the mean of the empirical distribution, also called the sample mean
μ̂ = X̄n = (1/n) Σ_{i=1}^n Xi

Empirical estimate of the k-th raw moment
μ̂′k = (1/n) Σ_{i=1}^n Xi^k

Empirical estimate of the variance of X
It is the variance of the empirical distribution
V âr[X] = (1/n) Σ_{i=1}^n (Xi − X̄n)² = (1/n) Σ_{i=1}^n Xi² − X̄n²

Empirical estimate of the k-th limited moment with limit u
E[(X ∧ u)^k] = (1/n) [ Σ_{xi≤u} xi^k + u^k · (# of xi’s > u) ]

Nelson-Aalen empirical estimate of the cumulative hazard rate function
Ĥ(t) = 0   if t < y1
Ĥ(t) = Σ_{i=1}^{j−1} si/ri   if y_{j−1} ≤ t < yj, j = 2, · · · , k
Ĥ(t) = Σ_{i=1}^{k} si/ri   if t ≥ yk

Nelson-Aalen estimate of the survival function
Ŝ(x) = e^{−Ĥ(x)}

Nelson-Aalen estimate of the distribution function
F̂(x) = 1 − e^{−Ĥ(x)}

Smoothed empirical estimate of the 100p-th percentile π̂p
(i) Order the sample values from smallest to largest x(1), · · · , x(n)
(ii) Find the integer g such that g ≤ p · (n + 1) ≤ g + 1
(iii) π̂p is found by linear interpolation
π̂p = [g + 1 − (n + 1)p] · x(g) + [(n + 1)p − g] · x(g+1)
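The three-step smoothed percentile recipe above can be sketched directly in Python; a minimal illustration (the function name is my own, and it assumes 1 ≤ p(n + 1) ≤ n so that interpolation is possible):

```python
def smoothed_percentile(xs, p):
    """Smoothed empirical estimate of the 100p-th percentile:
    order the sample, find g with g <= p*(n+1) <= g+1, and
    linearly interpolate between the g-th and (g+1)-th order
    statistics (1-indexed)."""
    xs = sorted(xs)
    n = len(xs)
    h = p * (n + 1)
    g = int(h)  # integer part
    if g < 1 or g >= n:
        raise ValueError("p*(n+1) must lie inside [1, n]")
    return (g + 1 - h) * xs[g - 1] + (h - g) * xs[g]

# 80th percentile of a sample of size 9: p*(n+1) = 8, so g = 8
print(smoothed_percentile([10, 20, 30, 40, 50, 60, 70, 80, 90], 0.8))  # 80.0
print(smoothed_percentile([1, 2, 3, 4], 0.5))  # halfway between 2 and 3: 2.5
```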
Chapter LM4: Kernel Smoothing
Kernel density estimator of the density function
At each point yj in the empirical distribution, a density function k_yj(x) is created that satisfies the requirements of a pdf. The kernel smoothed density estimator is then a finite mixture (weighted average) of these separate density functions. The weight applied to k_yj(x) is the empirical probability p(yj), and the kernel smoothed estimate of the density function is
f̂(x) = Σ_{all yj} p(yj) · k_yj(x)

Kernel density estimator of the distribution function
K_yj(x) is the cdf for the pdf k_yj(x) and is always between 0 and 1. The kernel density estimator F̂(x) of the cumulative distribution function is
F̂(x) = Σ_{all yj} p(yj) · K_yj(x)

Uniform kernel with bandwidth b
k_y(x) = 0   if x < y − b
k_y(x) = 1/(2b)   if y − b ≤ x ≤ y + b
k_y(x) = 0   if x > y + b
The graph of k_yj(x) is a horizontal line of height 1/(2b) with a width of 2b on the interval [yj − b, yj + b]. For the uniform kernel with bandwidth b, the formal definition of K_y(x) is
K_y(x) = 0   if x < y − b
K_y(x) = [x − (y − b)]/(2b)   if y − b ≤ x ≤ y + b
K_y(x) = 1   if x > y + b

Triangle kernel with bandwidth b
k_y(x) = 0   if x < y − b
k_y(x) = [x − (y − b)]/b²   if y − b ≤ x ≤ y
k_y(x) = [(y + b) − x]/b²   if y ≤ x ≤ y + b
k_y(x) = 0   if x > y + b
The graph of k_yj(x) is a triangle centered at yj of height 1/b with a width of 2b on the interval [yj − b, yj + b]. For the triangle kernel with bandwidth b, the formal definition of K_y(x) is
K_y(x) = 0   if x < y − b
K_y(x) = [x − (y − b)]²/(2b²)   if y − b ≤ x ≤ y
K_y(x) = 1 − [(y + b) − x]²/(2b²)   if y ≤ x ≤ y + b
K_y(x) = 1   if x > y + b

Gamma kernel with shape parameter α and θ = y/α
k_y(x) = (αx/y)^α · e^{−αx/y} / [x Γ(α)],   x > 0
The Gamma kernel does not require choosing a bandwidth b (it is never 0 for x > 0) but instead requires choosing the shape parameter α. When α = 1, it reduces to an exponential kernel
k_y(x) = (1/y) e^{−x/y},   x > 0
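The mixture formula f̂(x) = Σ p(yj)·k_yj(x) with a uniform kernel can be sketched in a few lines of Python (the function name is my own; each data point gets empirical weight 1/n):

```python
def uniform_kernel_density(data, x, b):
    """Kernel smoothed density estimate f̂(x) with a uniform kernel
    of bandwidth b: each data point y has weight 1/n and kernel
    k_y(x) = 1/(2b) on [y - b, y + b], 0 elsewhere."""
    n = len(data)
    total = 0.0
    for y in data:
        if y - b <= x <= y + b:
            total += 1.0 / (2.0 * b)
    return total / n

data = [2.0, 4.0, 5.0]
print(uniform_kernel_density(data, 4.5, 1.0))  # 2 of 3 points in range: 1/3
print(uniform_kernel_density(data, 10.0, 1.0))  # no points in range: 0.0
```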
Chapter LM5: Empirical Estimation from Grouped Data
Grouped data
Grouped data for n data points is in the form of a series of intervals ]−∞, c0], ]c0, c1], · · · , ]c_{r−1}, cr]. The j-th interval ]c_{j−1}, cj] has nj observations and n1 + · · · + nr = n.

Histogram of the data
The graph of the series of successive rectangles whose bases are the intervals ]c_{j−1}, cj], with height nj/[n(cj − c_{j−1})]. The area of the rectangle above the interval ]c_{j−1}, cj] is nj/n, which is the empirical probability that X occurs in that interval and can be interpreted as the empirical pdf fn(x), c_{j−1} < x < cj.

Empirical distribution function at the interval point cj
Fn(cj) = (1/n) Σ_{i=1}^j ni
The graph of the distribution function (the ogive) is constructed by connecting the successive points on the graph, so that Fn(c0) = 0, Fn(c1) = n1/n, Fn(c2) = (n1 + n2)/n, · · · , with linear interpolation between successive cj’s. The slope of the ogive on the interval ]c_{j−1}, cj] is equal to nj/[n(cj − c_{j−1})].

Empirical estimate of the mean
μ̂ = Σ_{j=1}^r (nj/n) · (c_{j−1} + cj)/2
which is the weighted average of the interval midpoints.

Empirical estimate of the k-th moment
μ̂k = Σ_{j=1}^r (nj/n) · (cj^{k+1} − c_{j−1}^{k+1}) / [(k + 1)(cj − c_{j−1})]
This is equivalent to assuming that observations are uniformly distributed within each interval.

Empirical estimate of the 100p-th percentile
It is found by solving Fn(π̂p) = p using the ogive of the empirical cumulative distribution function.

Empirical estimate of the k-th limited moment with limit u
(i) u = cj is an interval point:
E[X ∧ u] = Σ_{i=1}^j (ni/n) · (ci + c_{i−1})/2 + u · Σ_{i=j+1}^r ni/n
The weight ni/n is applied to the interval average (ci + c_{i−1})/2 for each interval up to ]c_{j−1}, cj], and the weight (n_{j+1} + · · · + nr)/n = (# of claims > u)/(total # of claims) is applied to u. A similar relationship holds for the k-th limited moment
E[(X ∧ u)^k] ≈ Σ_{i=1}^j ni (ci^{k+1} − c_{i−1}^{k+1}) / [n(k + 1)(ci − c_{i−1})] + u^k · Σ_{i=j+1}^r ni/n
(ii) c_{j−1} ≤ u < cj is not an interval point:
E[X ∧ u] = Σ_{i=1}^{j−1} (ni/n) · (ci + c_{i−1})/2
  + (nj/n) · [(c_{j−1} + u)/2] · [(u − c_{j−1})/(cj − c_{j−1})]
  + (nj/n) · u · [(cj − u)/(cj − c_{j−1})]
  + u · Σ_{i=j+1}^r ni/n
A similar relationship holds for the k-th limited moment
E[(X ∧ u)^k] ≈ Σ_{i=1}^{j−1} ni (ci^{k+1} − c_{i−1}^{k+1}) / [n(k + 1)(ci − c_{i−1})]
  + nj (u^{k+1} − c_{j−1}^{k+1}) / [n(k + 1)(cj − c_{j−1})]
  + u^k · nj (cj − u) / [n(cj − c_{j−1})]
  + u^k · Σ_{i=j+1}^r ni/n
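The grouped-data mean and the ogive can be sketched numerically; a minimal Python illustration (function names are my own, and `ogive` assumes c0 ≤ x ≤ cr):

```python
def grouped_mean(c, nj):
    """Empirical mean for grouped data: weighted average of interval
    midpoints. c = boundaries [c0, ..., cr], nj = counts for the
    intervals ]c[j-1], c[j]] (len(nj) == len(c) - 1)."""
    n = sum(nj)
    return sum((nj[j] / n) * (c[j] + c[j + 1]) / 2 for j in range(len(nj)))

def ogive(c, nj, x):
    """Empirical cdf Fn(x) by linear interpolation between the
    interval endpoints (the ogive); assumes c[0] <= x <= c[-1]."""
    n = sum(nj)
    F = 0.0
    for j in range(len(nj)):
        if x <= c[j + 1]:
            return F + (nj[j] / n) * (x - c[j]) / (c[j + 1] - c[j])
        F += nj[j] / n
    return 1.0

c = [0, 10, 20, 40]
nj = [5, 10, 5]
print(grouped_mean(c, nj))  # 5*(5/20) + 15*(10/20) + 30*(5/20) = 16.25
print(ogive(c, nj, 15))     # 5/20 + (10/20)*(5/10) = 0.5
```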
Chapter LM6: Estimation from Censored and Truncated Data
Notations
Each individual is assigned a value of xi, the observed time of death, or ui, the age of the individual at the time he leaves the study or when the study ends. Each individual is also assigned a value of di indicating the age of the individual at the time he came under observation. We order the times of death, y1 < · · · < yk, and denote by sj the number of deaths at that point, i.e. sj is the number of xi’s that are equal to yj.

Risk set
The risk set rj at each death point yj represents the number of individuals under observation just before the death point yj.
rj = r_{j−1} − s_{j−1} + (# of individuals with y_{j−1} ≤ di < yj) − (# of individuals with y_{j−1} ≤ um < yj)
If a truncated or censored observation time is tied with the death time yj, that individual is added or removed after the deaths at point yj and only affects the risk set r_{j+1}.

Other risk set descriptions
(i) The risk set at death point yj is the set of all data points that come under observation before yj, but it excludes those who have died or been censored before yj
rj = (# d’s < yj) − (# x’s < yj) − (# u’s < yj)
(ii) The risk set at yj is the set of all individuals who will be observed to die or are censored after time yj and have already come under observation
rj = (# x’s ≥ yj) + (# u’s ≥ yj) − (# d’s ≥ yj)
(iii) The risk set at yj is the set of all individuals i who have already come under observation by time yj and must not yet have died or been right-censored
rj = # of i’s such that di < yj ≤ xi or ui

Kaplan-Meier Product-Limit Estimator
Sn(t) = 1   if 0 ≤ t < y1
Sn(t) = Π_{i=1}^{j−1} (1 − si/ri)   if y_{j−1} ≤ t < yj
Sn(t) = Π_{i=1}^{k} (1 − si/ri) or 0   if t ≥ yk
The Kaplan-Meier Product-Limit estimate applied to complete individual data (a random sample with no censoring and no truncation) for a random variable X results in the same estimate of the distribution and survival functions of X as the empirical estimate.

KMPL regularisation at large t
Let z denote the largest observed time in the data set. If z is a death time, then Sn(t) = 0 for t ≥ z. If z is a censoring time, we can estimate S(t) for t > z as follows:
(i) Sn(t) = 0 for t > z
(ii) Sn(t) = Sn(z) for t > z
(iii) Geometric extension approximation: Sn(t) = [Sn(z)]^{t/z} for t > z

Nelson-Aalen estimate of the cumulative hazard rate function
Ĥ(t) = 0   if 0 ≤ t < y1
Ĥ(t) = Σ_{i=1}^{j−1} si/ri   if y_{j−1} ≤ t < yj
Ĥ(t) = Σ_{i=1}^{k} si/ri   if t ≥ yk

Nelson-Aalen estimate of the survival function
Ŝ(t) = e^{−Ĥ(t)}

Kaplan-Meier approximations for large data sets
As with grouped data, we have a sequence of points in time: 0 = c0 < c1 < · · · < ck. For the interval ]cj, c_{j+1}], xj is the number of observed deaths, uj is the number of right-censored (leaving) observations, and dj is the number of left-truncated (new entrant) observations. We use the Kaplan-Meier Product-Limit formula to estimate SX(t) at the endpoints of the intervals (c0, c1, · · · ) and use linear interpolation in between.
Sn(ci) = Π_{j<i} (1 − xj/r̃j)
To get r̃j: assume that a given fraction α of all enterings occurs before all the deaths and (1 − α) after, and that a given fraction β of all the leavings occurs before all the deaths and (1 − β) after.
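The Kaplan-Meier and Nelson-Aalen recursions above can be sketched for the simplest case of right censoring only (all di = 0, so the risk set is everyone with xi ≥ yj or ui ≥ yj); a minimal Python illustration with hypothetical names:

```python
def km_nelson_aalen(deaths, censored):
    """Kaplan-Meier S and Nelson-Aalen H at each distinct death
    point yj, for individual data with right censoring only.
    deaths = observed death times xi, censored = censoring times ui.
    Returns a list of (yj, S(yj), H(yj))."""
    ys = sorted(set(deaths))
    out = []
    S, H = 1.0, 0.0
    for y in ys:
        s = deaths.count(y)                       # deaths sj at yj
        r = sum(1 for x in deaths if x >= y) \
          + sum(1 for u in censored if u >= y)    # risk set rj
        S *= 1 - s / r
        H += s / r
        out.append((y, S, H))
    return out

deaths = [3, 5, 5, 8]
censored = [4, 9]
for y, S, H in km_nelson_aalen(deaths, censored):
    print(y, round(S, 4), round(H, 4))
```

Note that a censoring time tied with a death time is still counted in the risk set, consistent with the tie rule above (censored observations are removed after the deaths at that point).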
Chapter LM7: Variance and Confidence Intervals for Survival Probability Estimators
Proportion estimator
p̂ = (number of observed successes)/(number of trials n), estimating the true proportion p. Variance:
Var[p̂] = p(1 − p)/n ≈ V âr[p̂] = p̂(1 − p̂)/n

Individual data, no censoring/truncation
Sn(x) = (# of deaths after time x)/n = nx/n
Sn(x) is a “binomial proportion” type of estimator: out of n trials, count nx = # of successes; the estimator is the proportion of trials that are successes.
Var[Sn(x)] = Sn(x)[1 − Sn(x)]/n

Data grouped by interval
mj = number of deaths in the interval ]c_{j−1}, cj]. The # of survivors at time cj is m_{j+1} + · · · = nj. The # of deaths by time cj is m1 + · · · + mj = n − nj. If x ∈ ]c_{j−1}, cj], then Sn(x) is found by linear interpolation between Sn(c_{j−1}) and Sn(cj):
Sn(x) = [(cj − x)/(cj − c_{j−1})] · Sn(c_{j−1}) + [(x − c_{j−1})/(cj − c_{j−1})] · Sn(cj)
With Y = n − n_{j−1} = # of deaths up to time c_{j−1} and Z = mj = # of deaths in ]c_{j−1}, cj], which can be regarded as binomial RVs,
Sn(x) = 1 − [Y · (cj − c_{j−1}) + Z · (x − c_{j−1})] / [n · (cj − c_{j−1})]
To formulate the variance of Sn(x) we use
Var[Y] = n · S(c_{j−1}) · [1 − S(c_{j−1})]
Var[Z] = n · [S(c_{j−1}) − S(cj)] · [1 − S(c_{j−1}) + S(cj)]
Cov[Y, Z] = −n · [1 − S(c_{j−1})] · [S(c_{j−1}) − S(cj)]
Then,
Var[Sn(x)] = {(cj − c_{j−1})² Var[Y] + (x − c_{j−1})² Var[Z] + 2(cj − c_{j−1})(x − c_{j−1}) Cov[Y, Z]} / [n(cj − c_{j−1})]²

Estimate of the density function
fn(x) = [Sn(c_{j−1}) − Sn(cj)]/(cj − c_{j−1}) = Z/[n(cj − c_{j−1})]
The variance of this estimator is
Var[fn(x)] = [Sn(c_{j−1}) − Sn(cj)] · [1 − Sn(c_{j−1}) + Sn(cj)] / [n(cj − c_{j−1})²]

Censored/truncated data: Greenwood’s approximation of the KMPL variance
V âr[Sn(yj)] = [Sn(yj)]² · Σ_{i=1}^j si/[ri(ri − si)]
This tends to underestimate the true variance.

Variance of the Nelson-Aalen estimate of the cumulative hazard function H(yj)
V âr[Ĥ(yj)] = Σ_{i=1}^j si/ri²

Linear confidence interval
A 95% confidence interval for θ using θ̂ and an estimate v(θ̂) of the variance of that estimator:
θ̂ ± 1.96 √v(θ̂)
For 90%: 1.96 → 1.645; for 99%: 1.96 → 2.576.

Log-transformed interval for S(t)
Start from a 95% symmetric confidence interval for ln[− ln S(t)], L ≤ ln[− ln S(t)] ≤ R, with ln[− ln S(t)] the midpoint of [L, R]. Then
L ≤ ln[− ln S(t)] ≤ R ⇒ L′ ≤ S(t) ≤ R′
The 95% log-transformed interval for S(t) has lower limit Sn(t)^{1/U} and upper limit Sn(t)^U, where
U = exp{ 1.96 √V âr[Sn(t)] / (Sn(t) · ln[Sn(t)]) }

Log-transformed interval for H(t)
Start from a 95% symmetric interval for ln Ĥ(t), L ≤ ln Ĥ(t) ≤ R, with ln Ĥ(t) the midpoint of [L, R]. Then
L ≤ ln Ĥ(t) ≤ R ⇒ e^L ≤ Ĥ(t) ≤ e^R
The 95% log-transformed interval for H(t) has lower limit Ĥ(t)/U and upper limit Ĥ(t) · U, where
U = exp{ 1.96 √V âr[Ĥ(t)] / Ĥ(t) }
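Greenwood's approximation and the log-transformed interval for S(t) combine as sketched below; a minimal Python illustration (the function name is my own, and it assumes 0 < S < 1 with at least one death point):

```python
import math

def greenwood_ci(S, terms, z=1.96):
    """Log-transformed confidence interval for S(t): S = Sn(t) from
    the KM estimate, terms = list of (si, ri) pairs for the death
    points yi <= t. Greenwood: Var = S² Σ si/(ri(ri-si));
    U = exp(z √Var / (S ln S)); interval (S^(1/U), S^U)."""
    var = S ** 2 * sum(s / (r * (r - s)) for s, r in terms)
    U = math.exp(z * math.sqrt(var) / (S * math.log(S)))
    lo, hi = S ** (1 / U), S ** U
    return min(lo, hi), max(lo, hi)

# two death points: s1 = 1, r1 = 10 and s2 = 2, r2 = 8
S = (1 - 1 / 10) * (1 - 2 / 8)  # KM estimate 0.675
lo, hi = greenwood_ci(S, [(1, 10), (2, 8)])
print(round(S, 4), round(lo, 4), round(hi, 4))
```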
Chapter LM8: Parametric Estimation
Method of Moments
For a distribution defined in terms of r parameters (θ1, · · · , θr), the method of moments consists of solving the r equations:
theoretical j-th moment = (1/n) Σ_{i=1}^n xi^j
The equation on the second moment can be replaced by an equation with the variance:
theoretical distribution variance = (1/n) Σ_{i=1}^n (xi − x̄)²

Method of Percentile Matching
Given a random sample or an interval grouped data sample and a distribution defined in terms of r parameters, the method of percentiles consists of solving the r equations obtained by setting the distribution's pi-th percentile equal to the empirical estimate of the pi-th percentile. The smoothed empirical percentile is obtained by adding one to the sample size and multiplying by the percent.

Moment estimation & right-censored data
With a policy limit in effect, we set the theoretical limited expected value equal to the empirical limited expected value
E[X ∧ u] = ∫_0^u [1 − F(x)] dx = ∫_0^u S(x) dx
Empirical LEV = (Sum of payment amounts)/(# of payments)

Moment estimation with left-truncated & right-censored data
With a deductible and a policy limit in effect, we set the theoretical expected cost per payment equal to the empirical expected cost per payment
(E[X ∧ u] − E[X ∧ d])/(1 − F(d)) = (Sum of payment amounts)/(# of payments)

Exponential θ
E[X] = θ
F(x) = 1 − e^{−x/θ}
Method of moments works well, percentile matching works well.

1-Pareto α unknown, θ given
E[X] = αθ/(α − 1)
F(x) = 1 − (θ/x)^α
Method of moments works well, percentile matching works well.

2-Pareto α, θ
E[X] = θ/(α − 1)
E[X²] = 2θ²/[(α − 1)(α − 2)]
F(x) = 1 − [θ/(x + θ)]^α
Method of moments works well ((E[X])²/E[X²] gives α), percentile matching requires numerical approximation.

Gamma α, θ
E[X] = αθ
Var[X] = αθ²
E[X²] = α(α + 1)θ²
Method of moments works well ((E[X])²/E[X²] gives α), percentile matching requires numerical approximation.

Weibull τ, θ
F(x) = 1 − e^{−(x/θ)^τ}
Method of moments requires numerical approximation, percentile matching works well.

Lognormal μ, σ
E[X] = exp(μ + σ²/2)
E[X²] = exp(2μ + 2σ²)
F(x) = Φ[(ln x − μ)/σ]
Method of moments works well, percentile matching works well.

Loglogistic γ, θ
F(x) = (x/θ)^γ/[1 + (x/θ)^γ]
Method of moments requires numerical approximation, percentile matching works well.

Poisson distribution λ
E[N] = λ
Var[N] = λ
Let nk be the number of observations equal to k and n = n0 + · · · + nj the total number of observations
λ̂ = (1/n) Σ_{k=0}^j k · nk = (total # of claims from all n observations)/n

Negative binomial distribution r, β
E[N] = rβ
Var[N] = rβ(β + 1)
The total number of observations is n = n0 + · · · + nj and nk is the number of observations for which N = k
rβ = (1/n) Σ_{k=0}^j k · nk
rβ(β + 1) = (1/n) Σ_{k=0}^j k² · nk − [(1/n) Σ_{k=0}^j k · nk]²

Binomial distribution m, q
E[N] = mq
Var[N] = mq(1 − q)
With nk the number of observations for which N = k,
mq = (1/n) Σ_{k=0}^j k · nk
mq(1 − q) = (1/n) Σ_{k=0}^j k² · nk − [(1/n) Σ_{k=0}^j k · nk]²
If m is known, then the data set is n0, · · · , nm and the moment estimate of q is
q̂ = (1/m) · [Σ_{k=0}^m k · nk / Σ_{k=0}^m nk]
Alternatively, if the data sample consists of x1, · · · , xj where each xi is a number from 0 to m indicating the number of successes in the i-th group of m trials,
q̂ = Σ_{i=1}^j xi/(m · j)
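For the gamma distribution, the moment equations E[X] = αθ and Var[X] = αθ² solve in closed form; a minimal Python sketch (function name is my own, using the divisor-n empirical variance as in the moment equations above):

```python
def gamma_method_of_moments(xs):
    """Method-of-moments fit of a gamma(α, θ): match E[X] = αθ and
    Var[X] = αθ² to the empirical mean and (divisor n) variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    theta = var / mean    # θ = Var/E
    alpha = mean / theta  # α = E/θ = E²/Var
    return alpha, theta

xs = [2.0, 4.0, 6.0, 8.0]
a, t = gamma_method_of_moments(xs)
print(a, t)  # mean 5, variance 5, so α = 5, θ = 1
```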
Chapter LM9: Maximum Likelihood Estimation
Complete individual data likelihood L
L(θ) = Π_{j=1}^n f(xj; θ), for a random sample x1, · · · , xn

Complete grouped data likelihood L
L(θ) = Π_{j=1}^r [F(cj; θ) − F(c_{j−1}; θ)]^{nj}
for r intervals, where ]c_{j−1}, cj] has nj observations

Loglikelihood
l(θ) = ln L(θ)
Find the value of θ that maximizes L(θ).

Properties of the MLE
(i) MLE is asymptotically normal (as the sample size n increases, the distribution of θ̂n approaches a normal distribution)
(ii) MLE is asymptotically unbiased (as the sample size n increases, the expected value of the estimator approaches the true parameter value)
(iii) MLE has the smallest asymptotic variance of all asymptotically normal estimators of θ

Likelihood function & right-censored data
If an insurance policy has a maximum policy limit of u, any loss above u is represented in the likelihood function as 1 − F(u; θ). For x1, · · · , xn known payment amounts < u and m payments equal to u, the likelihood function is of the form
L(θ) = Π_{j=1}^n f(xj; θ) · [1 − F(u; θ)]^m
If we have m different limit payments u1, · · · , um,
L(θ) = Π_{j=1}^n f(xj; θ) · Π_{i=1}^m [1 − F(ui; θ)]

Likelihood function & left-truncated data
The loss data available is either in the form of
(i) insurance payment amounts yi, or
(ii) actual loss amounts xi larger than d
Likelihood function for the ground up loss:
L(θ) = Π_{i=1}^k f(yi + d; θ)/[1 − F(d; θ)] = Π_{i=1}^k f(xi; θ)/[1 − F(d; θ)]
For different deductibles di:
L(θ) = Π_{i=1}^k f(yi + di; θ)/[1 − F(di; θ)]

Estimating cost/payment: shifted model
L(θ′) = Π_{i=1}^k f(yi; θ′)
Instead of estimating θ in the ground up distribution of the loss amount X, we estimate the parameter θ′ using the same distribution type, but using the payment amount data yi. The resulting estimated distribution is the estimate for the amount paid after the deductible, i.e. the cost per payment random variable.

Likelihood for data based on policy deductible d and maximum covered loss u
For n observed insurance payments y1, · · · , yn satisfying 0 < yi < u − d, there are n corresponding loss amounts x1, · · · , xn where xi = yi + d. For m observed insurance payments of amount u − d, there are m corresponding losses ≥ u, and L is
L(θ) = [Π_{j=1}^n f(xj; θ)] · [1 − F(u; θ)]^m / [1 − F(d; θ)]^{n+m}
     = [Π_{j=1}^n f(yj + d; θ)] · [1 − F(u; θ)]^m / [1 − F(d; θ)]^{n+m}
For the shifted model with parameter θ′, i.e. the estimate for the cost per payment random variable (or amount paid after the deductible),
L(θ′) = Π_{j=1}^n f(yj; θ′) · [1 − F(u − d; θ′)]^m

MLE of exponential distribution
(i) Complete data: the MLE of θ is the sample mean
θ̂ = x̄ = (1/n) Σ xi
(ii) Right-censored data: the MLE of θ is the total of all payment amounts over the number of non-censored payments
θ̂ = (Σ xi + mu)/n
(iii) Policy deductible d data: the MLE of θ is the total of all payment amounts over the number of insurance payments
θ̂ = Σ yi/n
(iv) Policy limit and deductible data: the MLE of θ is the total amount of insurance payments over the number of uncensored observations
θ̂ = [Σ xi + mu − (m + n)d]/n = [Σ (xi − d) + m(u − d)]/n
In all cases, with or without policy limit/deductible, the MLE of the mean of a ground up loss exponential distribution is
θ̂ = (total amount of insurance payments)/(# of non-right-censored data points)
If the shifted model is assumed to be exponential, then the MLE of its mean is the same.

Franchise deductible d and exponential MLE
L(θ) = Π_{i=1}^n [(1/θ) e^{−xi/θ}] / e^{−d/θ}
Since d is a franchise deductible, the payment amounts are the actual loss amounts (the xi’s), but in L we must still use conditional densities, since loss amounts are only known if they are above d.
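Case (iv) above (deductible plus policy limit) has the closed-form MLE "total payments over uncensored count"; a minimal Python sketch with hypothetical names:

```python
def exponential_mle(payments, m, u, d):
    """MLE of the exponential mean θ for loss data with policy
    deductible d and maximum covered loss u: total payment amounts
    divided by the number of uncensored observations.
    payments = the n uncensored payment amounts yi (0 < yi < u - d),
    m = number of payments capped at u - d."""
    n = len(payments)
    total = sum(payments) + m * (u - d)
    return total / n

# 3 uncensored payments, 2 payments censored at u - d = 80
# (d = 20, u = 100): θ̂ = (10 + 30 + 50 + 2*80) / 3
print(exponential_mle([10, 30, 50], m=2, u=100, d=20))
```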
Chapter LM12: Variance of Maximum Likelihood Estimators
Fisher information
I(θ) = −E[(∂²/∂θ²) l(θ)] = E[((∂/∂θ) l(θ))²]
where l(θ) = ln L(θ) and I(θ) is called the information.

Information matrix
If MLE is applied to a distribution with a vector of parameters θ = [θ1 · · · θp]†, then the information matrix is the p × p matrix with (r, s) entry
I(θ)_{r,s} = −E[(∂²/∂θr ∂θs) l(θ)]

Variance of the MLE
Var[θ̂] = 1/I(θ)
For a vector of parameters, the variance-covariance matrix of the estimators is
CoVar[θ̂] = [I(θ)]^{−1}

Variance of a function of θ̂: the Delta Method
For an estimator θ̂ with variance Var[θ̂], and for a function g(t), the Delta method provides an approximation to the variance of g(θ̂)
Var[g(θ̂)] = [g′(θ)]² · Var[θ̂]

Delta method with 2 variables
For MLEs θ̂1 and θ̂2 with variances and covariance Var[θ̂1], Var[θ̂2], Cov[θ̂1, θ̂2] and g(s, t) a function of two variables, the variance of g(θ̂1, θ̂2) is
Var[g(θ̂1, θ̂2)] = (∂g/∂θ1)² · Var[θ̂1] + 2 (∂g/∂θ1)(∂g/∂θ2) · Cov[θ̂1, θ̂2] + (∂g/∂θ2)² · Var[θ̂2]
In matrix form, this variance is
[∂g/∂θ1  ∂g/∂θ2] · [ Var(θ̂1)  Cov(θ̂1, θ̂2) ; Cov(θ̂1, θ̂2)  Var(θ̂2) ] · [∂g/∂θ1 ; ∂g/∂θ2]

Covariance between g(s, t) and h(s, t)
Cov[g(θ̂1, θ̂2), h(θ̂1, θ̂2)] = (∂g/∂θ1)(∂h/∂θ1) · Var[θ̂1] + [(∂g/∂θ1)(∂h/∂θ2) + (∂g/∂θ2)(∂h/∂θ1)] · Cov[θ̂1, θ̂2] + (∂g/∂θ2)(∂h/∂θ2) · Var[θ̂2]
In matrix form, this covariance is
[∂g/∂θ1  ∂g/∂θ2] · [ Var(θ̂1)  Cov(θ̂1, θ̂2) ; Cov(θ̂1, θ̂2)  Var(θ̂2) ] · [∂h/∂θ1 ; ∂h/∂θ2]
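The two-variable delta method is a quadratic form in the gradient; a minimal Python sketch (the function name is my own):

```python
def delta_method_var(dg, cov):
    """Delta-method variance of g(θ̂1, θ̂2):
    dg = gradient [∂g/∂θ1, ∂g/∂θ2] evaluated at the estimates,
    cov = 2x2 variance-covariance matrix [[V1, C], [C, V2]].
    Expands the quadratic form dg · cov · dgᵀ."""
    g1, g2 = dg
    (v1, c), (_, v2) = cov
    return g1 * g1 * v1 + 2 * g1 * g2 * c + g2 * g2 * v2

# g(θ1, θ2) = θ1·θ2 at θ̂ = (2, 3): gradient is (3, 2)
print(delta_method_var([3.0, 2.0], [[0.5, 0.1], [0.1, 0.2]]))
# 9·0.5 + 2·6·0.1 + 4·0.2 = 6.5
```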
Chapter LM13: Graphical Methods for Evaluating Estimated Models
Notations
Fn(x) and fn(x) denote the cumulative distribution function and the density function of the empirical distribution for a data set consisting of n data points. F*(x) and f*(x) denote the cdf and pdf of the estimated model distribution.

Individual data
For individual data, compare the plot of the empirical distribution function Fn(x) to the estimated model distribution function F*(x). Alternatively, plot D(x) = Fn(x) − F*(x), which should stay close to the horizontal axis for a good fit.

Grouped data
For interval grouped data, compare the plot of the empirical distribution histogram fn(x) to the graph of the estimated model density function f*(x). The histogram of the grouped data has height nj/[n(cj − c_{j−1})] on the interval ]c_{j−1}, cj].

Probability (p-p) plot
For individual data, we can create a p-p plot by first ordering the x-values in the data set from smallest to largest, say x1 < · · · < xn, and then, for each xi in the data set, plotting the point
(x, y) ≡ (i/(n + 1), F*(xi))
Good fit ⇔ points ∼ 45° line through the origin
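The p-p plot points (i/(n + 1), F*(x_(i))) are easy to generate; a minimal Python sketch (the function name and the fitted exponential model are my own illustrative choices):

```python
import math

def pp_plot_points(xs, model_cdf):
    """Points of a p-p plot: (i/(n+1), F*(x_(i))) for the ordered
    sample; for a good fit they lie near the 45-degree line."""
    xs = sorted(xs)
    n = len(xs)
    return [(i / (n + 1), model_cdf(x)) for i, x in enumerate(xs, start=1)]

# hypothetical fitted model: exponential with mean 10
cdf = lambda x: 1 - math.exp(-x / 10)
for u, v in pp_plot_points([2.0, 7.0, 15.0], cdf):
    print(round(u, 3), round(v, 3))
```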
Chapter LM14: Hypothesis Tests for Fitted Models
Hypothesis tests
H0: the data comes from the estimated model
H1: the data does not come from the model
The null hypothesis is that the model fits the data well.

Kolmogorov-Smirnov test
D = max_x |Fn(x) − F*(x)|
The empirical cdf has jumps at the data points, and the overall maximum is found by taking the max of |Fn(xj−) − F*(xj)| and |Fn(xj+) − F*(xj)|.

Chi-square test
Test based on grouped data, with k intervals containing nj (or Oj) data points each and a total of n data points. For each category j, determine the probability p̂j = F*(cj) − F*(c_{j−1}) based on the model distribution. The expected # of observations is Ej = np̂j. The χ² statistic is
Q = Σ_{j=1}^k (nj − Ej)²/Ej = Σ_{j=1}^k n²(p_{nj} − p̂j)²/Ej = Σ_{j=1}^k (Oj − Ej)²/Ej
Q has an approximate χ² distribution with k − r − 1 degrees of freedom, where r is the # of parameters that have been estimated in the model distribution.

Likelihood ratio test
Test the null hypothesis that model A is acceptable versus the alternative hypothesis that model B is preferable to model A, where model B has more parameters than model A:
2(lB − lA) = 2 ln(LB/LA)
where l represents the loglikelihood at the estimated parameter values. This statistic is approximately χ² with d.o.f. equal to (# of parameters in model B) − (# of parameters in model A).

Schwarz Bayesian Criterion
ln L − (r/2) · ln n
where L is the maximized likelihood function value for the model, r is the # of parameters being estimated in the model, and n is the number of data points in the sample. The model selected is the one which has the largest value of ln L − (r/2) · ln n.
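The χ² statistic Q = Σ (Oj − Ej)²/Ej can be sketched in a few lines of Python (the function name is my own; the model interval probabilities p̂j are supplied directly):

```python
def chi_square_stat(observed, model_probs):
    """Chi-square goodness-of-fit statistic Q = Σ (Oj - Ej)²/Ej,
    with Ej = n·p̂j from the fitted model's interval probabilities."""
    n = sum(observed)
    q = 0.0
    for oj, pj in zip(observed, model_probs):
        ej = n * pj
        q += (oj - ej) ** 2 / ej
    return q

# 100 observations over 3 intervals; model probabilities 0.5, 0.3, 0.2
print(chi_square_stat([45, 35, 20], [0.5, 0.3, 0.2]))
# 25/50 + 25/30 + 0/20 = 1.333...
```

The result is compared against a χ² quantile with k − r − 1 degrees of freedom, as described above.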
Anderson-Darling test:
Based on a random sample of individual data points x1, · · · , xn, we relabel the k distinct values y1, · · · , yk. The Anderson-Darling test statistic is
A² = −n + n Σ_{j=0}^{k−1} [1 − Fn(yj)]² · ln[(1 − F*(yj))/(1 − F*(y_{j+1}))] + n Σ_{j=1}^{k} [Fn(yj)]² · ln[F*(y_{j+1})/F*(yj)]
If the data is truncated on the left at t, then we set y0 = t, and if the data is censored on the right at u < ∞, then y_{k+1} = u. In that case the statistic becomes
A² = −n F*(u) + n Σ_{j=0}^{k−1} [1 − Fn(yj)]² · ln[(1 − F*(yj))/(1 − F*(y_{j+1}))] + n Σ_{j=1}^{k} [Fn(yj)]² · ln[F*(y_{j+1})/F*(yj)]
Chapter LM15: The Cox Proportional Hazards Model
Baseline notations
(i) Baseline hazard rate: h0(t)
(ii) Baseline cumulative hazard function: H0(t) = ∫_0^t h0(s) ds
(iii) Baseline survival function: S0(t) = e^{−H0(t)}
(iv) Baseline density function: f0(t) = −(d/dt) S0(t)

Risk factors
It is assumed there are p risk factors represented by variables zj1, · · · , zjp for individual j.

Hazard rate for individual j
hj(t) = h0(t) · exp[β1 zj1 + · · · + βp zjp] = h0(t) · exp[Σ_{k=1}^p βk zjk] = h0(t) · exp[β† · z]
h0(t) is a baseline hazard rate, zjk is person j’s level for risk factor k, and the β’s are parameters.

cj factors
cj = exp[Σ_{k=1}^p βk zjk]
so that
hj(t) = h(t|zj) = h0(t) · cj

Cumulative hazard function
Hj(t) = H0(t) · cj

Survival probability for person j
Sj(t) = e^{−Hj(t)} = e^{−H0(t)·cj} = [e^{−H0(t)}]^{cj} = [S0(t)]^{cj}

Density function
fj(t) = −(d/dt) Sj(t) = f0(t) · cj · [S0(t)]^{cj−1}

Partial maximum likelihood estimation
The partial likelihood function consists of a product of likelihood factors at each death point (i.e. for each uncensored point). At death point yj, we know which individuals are at risk and the ci factor for each individual at risk. Define R(yj), the set of all individuals still at risk at death point yj, and define Cj = Σ_{i∈R(yj)} ci, the sum of the ci-values for all those individuals still at risk at yj. For each of the deaths at yj we formulate ck/Cj for that individual, and the likelihood factor at yj is the product of those for all individuals who died at yj. The overall likelihood function is the product of the likelihood factors at all the death times
L ≈ Π_{yj} Π_{k dies at yj} [ ck / Σ_{i at risk at yj} ci ]

Relative risk
Relative risk of individual i to individual j:
r.r. = ci/cj
To find a confidence interval for the relative risk, we find an interval for ln(ci/cj), say [l, u]. Then the confidence interval for ci/cj is [e^l, e^u].

Ratio of hazard rates
hj(t)/hg(t) = h(t|zj)/h(t|zg) = cj/cg

Breslow’s estimate of H0(t)
Non-parametric estimate of the baseline cumulative hazard rate, similar to Nelson-Aalen. At each death point up to and including time t, we add a factor of the form
Ĥ0(t) = Σ_{yi≤t} (# of deaths at yi) / (Σ of the c-coefficients of those at risk at yi)
From the Breslow estimate Ĥ0(t) we get the Breslow estimated baseline survival function
Ŝ0(t) = exp[−Ĥ0(t)]
Ŝj(t) = [Ŝ0(t)]^{cj}
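One factor of the Cox partial likelihood, ck/Σ ci over the risk set, can be sketched as follows; a minimal Python illustration where the individual labels, covariate values, and β's are all hypothetical:

```python
import math

def cox_partial_likelihood_factor(betas, covariates, death, at_risk):
    """One factor of the Cox partial likelihood: for the individual
    who dies at a death point, c_k / Σ_{i at risk} c_i, where
    c_i = exp(β · z_i). covariates maps individual -> list of risk
    factor values; at_risk = individuals in the risk set."""
    def c(i):
        return math.exp(sum(b * z for b, z in zip(betas, covariates[i])))
    return c(death) / sum(c(i) for i in at_risk)

covariates = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.0, 0.0]}
betas = [math.log(2), math.log(3)]  # so c_a = 2, c_b = 3, c_c = 1
print(cox_partial_likelihood_factor(betas, covariates, "b",
                                    ["a", "b", "c"]))  # 3/(2+3+1) = 0.5
```

The full partial likelihood is the product of such factors over all death points, maximized over the β's.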
Chapter CR1: Limited Fluctuation Credibility
Full credibility
Under the full credibility approach, the estimate of E[W] is chosen as one of the following two possible values:
(i) W̄ = (W1 + · · · + Wn)/n, or
(ii) M: the manual premium
The decision as to which of W̄ (pure premium) or M (manual premium) is chosen as an estimate of μ = E[W] (expected pure premium) is based on how “close” W̄ is to μ. The meaning of close is based on two quantities:
(i) The range parameter k (usually k = 0.05)
(ii) The probability level P (usually P = 0.90)

Full credibility standard
P[|W̄ − μ| < kμ] ≥ P
The probability is at least P that the absolute deviation of W̄ from E[W̄] is less than the fraction k of the mean.

Quantity y
y = min{ z : Pr[|W̄ − μ|/(σ/√n) ≤ z] ≥ P }
If W̄ is assumed to be normal, then y is the (1 + P)/2 percentile of the standard normal distribution. Equivalently, given probability P, we find the value y such that P[−y ≤ Z ≤ y] = P, where Z has a standard normal distribution.

Quantity n0
n0 = (y/k)²
E.g., with P = 0.90 and k = 0.05, n0 ≈ 1083.

Full credibility given to W̄
For a random variable W, full credibility is given to W̄ (meaning that n, the number of observations of W, is large enough so that W̄ is chosen as the estimated premium) if
n ≥ n0 (σ/μ)² = n0 · Var[W]/(E[W])² = n0 · CV²_W
where n is the number of observations of W and CV_W is the coefficient of variation of W. The previous condition is equivalent to
Σ of observed W-values ≥ n0 · σ²/μ = n0 · Var[W]/E[W]

Review of compound distributions
If W and U are any two random variables
E[W] = E[E[W|U]]
Var[W] = E[Var[W|U]] + Var[E[W|U]]
For a compound distribution S, we condition the severity Y over the frequency N
E[S] = E[N] · E[Y]
Var[S] = E[N] · Var[Y] + Var[N] · (E[Y])²

Full credibility for compound distributions
n ≥ n0 · Var[S]/(E[S])² = n0 · [E[N] · Var[Y] + Var[N] · (E[Y])²] / (E[N] · E[Y])²
Σ of observed S amounts ≥ n0 · Var[S]/E[S] = n0 · [E[N] · Var[Y] + Var[N] · (E[Y])²] / (E[N] · E[Y])
# of observed claims ≥ n0 · Var[S]/(E[Y] · E[S]) = n0 · [E[N] · Var[Y] + Var[N] · (E[Y])²] / (E[N] · (E[Y])²)

Full credibility standards for N Poisson
# of periods needed ≥ n0 · Var[N]/(E[N])² = n0/λ
# of claims needed ≥ n0

Full credibility for S compound Poisson
(i) # of observations of S needed ≥ n0 · Var[S]/(E[S])² = n0 · λ[Var[Y] + (E[Y])²]/(λE[Y])² = (n0/λ) · [1 + Var[Y]/(E[Y])²] = (n0/λ) · [1 + (Coeff. of Variation of Y)²]
(ii) Total amount of observed S needed ≥ n0 · Var[S]/E[S] = n0 · λ[Var[Y] + (E[Y])²]/(λE[Y]) = n0 · [E[Y] + Var[Y]/E[Y]]
(iii) Total # of observed claims needed ≥ n0 · Var[S]/(E[Y] · E[S]) = n0 · λ[Var[Y] + (E[Y])²]/(λ(E[Y])²) = n0 · [1 + Var[Y]/(E[Y])²] = n0 · [1 + (Coeff. of Variation of Y)²]

Partial credibility
Partial credibility sets the premium to
Q = Z W̄ + (1 − Z)M
Q is the credibility premium and Z is the credibility factor in the interval [0, 1]
Z = √(Information available / Information needed for full credibility)

Partial credibility factor Z
Z = √(# of observations available / # of observations needed for FC)
Z = √(Sum of available observations / Total sum of observations needed for FC)
Z = √(# of claims observed / Total # of claims needed for FC)
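The standard-for-full-credibility constant n0 = (y/k)² and the square-root rule for Z can be sketched in Python (function names are my own; `statistics.NormalDist` supplies the normal percentile):

```python
import math
from statistics import NormalDist

def full_credibility_standard(P=0.90, k=0.05):
    """n0 = (y/k)², where y is the (1+P)/2 percentile of the
    standard normal distribution."""
    y = NormalDist().inv_cdf((1 + P) / 2)
    return (y / k) ** 2

def partial_credibility(n_available, n_full):
    """Square-root rule: Z = sqrt(available/needed), capped at 1."""
    return min(1.0, math.sqrt(n_available / n_full))

n0 = full_credibility_standard()
print(n0)  # ≈ 1082.4, commonly rounded up to 1083
print(partial_credibility(500, n0))
```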
Chapter CR2-3: Bayesian Estimation and Credibility, Discrete Prior
Conditional probability
P[A|B] = P[A ∩ B] / P[B]
P[A ∩ B] = P[A|B] · P[B]
P[A] = P[A ∩ B] + P[A ∩ B′] = P[A|B] · P[B] + P[A|B′] · P[B′]
where B′ is the complement of event B

Simple Bayes rule
P[B|A] = P[A ∩ B] / P[A] = P[A|B] · P[B] / ( P[A|B] · P[B] + P[A|B′] · P[B′] )

Partition of probability space B1, ···, Bn
P[A] = P[A ∩ B1] + ··· + P[A ∩ Bn] = Σ_{i=1}^n P[A|Bi] · P[Bi]

Reverse conditioning using Bayes rule
P[Bj|A] = P[A ∩ Bj] / P[A] = P[A|Bj] · P[Bj] / Σ_{i=1}^n P[A|Bi] · P[Bi]

Bayesian analysis
Given prior event probabilities and the model event probabilities (the conditional probability given that a particular prior event has occurred), the objective is to "reverse the conditioning" to find the posterior probability of a prior event occurring given that a particular model event has occurred.

Bayesian analysis scheme
Given the prior distribution of Λ and the model distribution of X,
P[Λ] : P[Λ = λ1], ···, P[Λ = λn]
P[X|Λ] : P[X = x|Λ = λ], ···
one calculates the joint distribution of X and Λ
P[(X = x) ∩ (Λ = λ)] = P[X = x|Λ = λ] · P[Λ = λ]
The marginal distribution of X is then
P[X = x] = Σ_λ P[(X = x) ∩ (Λ = λ)] = Σ_λ P[X = x|Λ = λ] · P[Λ = λ]
leading to the posterior distribution of Λ|X
P[Λ = λ|X = x] = P[(X = x) ∩ (Λ = λ)] / P[X = x]   (Joint Dist. of X and Λ / Marginal Dist. of X)
= P[X = x|Λ = λ] · P[Λ = λ] / Σ_λ P[X = x|Λ = λ] · P[Λ = λ]

Predictive distribution
The distribution of a second outcome of X given the value of the first outcome of X.
P[X|X0] = P[X ∩ X0] / P[X0]
= ( P[X ∩ X0|Λ] · P[Λ] + P[X ∩ X0|Λ̄] · P[Λ̄] ) / ( P[X0|Λ] · P[Λ] + P[X0|Λ̄] · P[Λ̄] )
= ( P[X|Λ] · P[X0|Λ] · P[Λ] + P[X|Λ̄] · P[X0|Λ̄] · P[Λ̄] ) / ( P[X0|Λ] · P[Λ] + P[X0|Λ̄] · P[Λ̄] )
P[X|X0] = P[X|Λ] · P[Λ|X0] + P[X|Λ̄] · P[Λ̄|X0]

Predictive expectation
E[Y] = E[Y|C] · P[C] + E[Y|C̄] · P[C̄]
E[Y|B] = E[Y|C] · P[C|B] + E[Y|C̄] · P[C̄|B]
These rules can be extended to any partition of a probability space C1, ···, Cm:
E[Y] = Σ_{j=1}^m E[Y|Cj] · P[Cj]
E[Y|B] = Σ_{j=1}^m E[Y|Cj] · P[Cj|B]

Bayesian premium
The predictive expectation of the next occurrence of the model random variable X given whatever information is available about observations of X that have already been made.
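The discrete-prior scheme above can be sketched directly in Python (function names and the two-type example in the test are my own illustrations, not from the guide):

```python
def posterior(prior, model, x):
    """Posterior P[lam | X = x] via Bayes rule.
    prior: {lam: P[Lam = lam]}; model: {lam: {x: P[X = x | Lam = lam]}}."""
    joint = {lam: model[lam][x] * p for lam, p in prior.items()}  # joint dist.
    marginal = sum(joint.values())                                # marginal of X
    return {lam: j / marginal for lam, j in joint.items()}

def bayesian_premium(prior, model, means, x):
    """Predictive expectation E[X_next | X = x] =
    sum over lam of E[X | lam] * P[lam | X = x]."""
    post = posterior(prior, model, x)
    return sum(means[lam] * p for lam, p in post.items())
```

For two equally likely risk types with claim probabilities 0.3 and 0.7, observing one claim shifts the posterior toward the riskier type, and the Bayesian premium is the posterior-weighted mean.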
Chapter CR4: Bayesian Credibility, Continuous Prior
Prior distribution Θ
P[Θ = θ] = π(θ)

Model distribution X
f(x|θ) ≡ fX|Θ(x|Θ = θ)
For a data set of random observed values from the distribution of X for a particular member of the population, e.g. x1, ···, xn, and a specific possible value of θ, the model distribution is
fX|Θ(x1, ···, xn|θ) = Π_{i=1}^n f(xi|θ)

Joint distribution of X and Θ
fX,Θ(x, θ) = f(x|θ) · π(θ)
In the multiple data point situation,
fX,Θ(x1, ···, xn, θ) = f(x1|θ) · f(x2|θ) ··· f(xn|θ) · π(θ)

Marginal distribution of X
fX(x) = ∫ f(x|θ) · π(θ) dθ
In the multiple data point situation, X1, ···, Xn have joint pdf
fX(x1, ···, xn) = ∫ f(x1|θ) · f(x2|θ) ··· f(xn|θ) · π(θ) dθ

Posterior distribution of Θ given X = x
πΘ|X(θ|x) = f(x|θ)π(θ) / fX(x) = f(x|θ)π(θ) / ∫ f(x|θ) · π(θ) dθ
In the multiple data point situation,
πΘ|X(θ|x1, ···, xn) = f(x1|θ) ··· f(xn|θ) · π(θ) / ∫ f(x1|θ) ··· f(xn|θ) · π(θ) dθ

Predictive distribution of Xn+1
fXn+1|X(xn+1|x1, ···, xn) = ∫ fXn+1|Θ(xn+1|θ) · πΘ|X(θ|x1, ···, xn) dθ

Continuous ↔ discrete
In all the previous equations, if the prior distribution of Θ is discrete rather than continuous, we substitute Σ_θ for ∫ ··· dθ

Conjugate prior
When a prior-model distribution pair results in a posterior distribution of the same type as the prior distribution, the prior distribution is said to be a conjugate prior for the model distribution
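When no conjugate pair applies, the posterior integral above can be approximated on a grid; a minimal sketch (the grid approach and function name are my own, not part of the guide):

```python
def grid_posterior(thetas, prior_pdf, likelihood, xs):
    """Discretized posterior pi(theta | x1..xn) on an equally spaced theta
    grid: proportional to prod_i f(x_i | theta) * pi(theta), normalized so
    the grid weights sum to 1 (the integral becomes a sum)."""
    weights = []
    for t in thetas:
        w = prior_pdf(t)
        for x in xs:
            w *= likelihood(x, t)   # multiply in each model-density factor
        weights.append(w)
    total = sum(weights)
    return [w / total for w in weights]
```

As a sanity check, with a uniform prior on [0, 1] and a single Bernoulli success, the grid posterior mean approximates the exact conjugate (beta) answer 2/3.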
Chapter CR5: Bayesian Credibility, Specific Distributions
Poisson/Gamma
f(x|λ) = e^(−λ) · λ^x / x!
π(λ) = λ^(α−1) · e^(−λ/θ) / (θ^α · Γ(α))
The posterior distribution is gamma with parameters
α′ = α + x  and  1/θ′ = 1/θ + 1
The marginal distribution of X is negative binomial with r = α and β = θ, thus E[X] = αθ.
With n data values, the posterior distribution is gamma with parameters
α′ = α + Σ xi  and  1/θ′ = 1/θ + n
The predictive distribution is negative binomial with
r′ = α′ = α + Σ xi  and  β′ = θ′ = θ/(nθ + 1)
The predictive mean is
predictive mean = r′β′ = α′θ′ = (α + Σ xi) · θ/(nθ + 1)
and
variance = r′β′(1 + β′)
When α = 1 in the prior distribution, the prior is an exponential distribution and the marginal distribution of X is a geometric distribution. The posterior will be a gamma distribution (or still an exponential if the observation is x = 0).

Binomial/Beta
f(x|q) = (m choose x) · q^x (1 − q)^(m−x)
π(q) = Γ(a + b)/(Γ(a)Γ(b)) · q^(a−1) (1 − q)^(b−1)
The posterior distribution is beta with parameters
a′ = a + x  and  b′ = b + m − x
With k data values, the posterior distribution is beta with parameters
a′ = a + Σ xi  and  b′ = b + km − Σ xi
The predictive mean is
predictive mean = m × (posterior mean) = m · (a + Σ xi)/(a + b + km)

Exponential/Inverse Gamma
f(x|λ) = (1/λ) e^(−x/λ)
π(λ) = θ^α · e^(−θ/λ) / (λ^(α+1) · Γ(α))
The posterior distribution is inverse gamma with parameters
α′ = α + 1  and  θ′ = θ + x
The marginal distribution of X is Pareto with the same α and θ as in the prior distribution, thus E[X] = θ/(α − 1).
With n data values, the posterior distribution is inverse gamma with parameters
α′ = α + n  and  θ′ = θ + Σ xi
The predictive mean is
predictive mean = θ′/(α′ − 1) = (θ + Σ xi)/(α + n − 1)
For a model distribution being a gamma distribution with α known and θ = λ unknown, combining with an inverse gamma prior distribution, the posterior distribution is still inverse gamma.

Inverse Exponential/Gamma
f(x|λ) = (λ/x²) e^(−λ/x)
π(λ) = λ^(α−1) · e^(−λ/θ) / (θ^α · Γ(α))
The posterior distribution is gamma with parameters
α′ = α + 1  and  1/θ′ = 1/θ + 1/x
The marginal distribution of X is inverse Pareto with τ = α and the same θ as in the prior distribution.
With n data values, the posterior distribution is gamma with parameters
α′ = α + n  and  1/θ′ = 1/θ + Σ (1/xi)
For a model distribution being an inverse gamma distribution with α known and θ = λ unknown, combining with a gamma prior distribution, the posterior distribution is still gamma.

Normal/Normal
f(x|λ) = (1/(σ√(2π))) e^(−(x−λ)²/(2σ²))
π(λ) = (1/√(2πa)) e^(−(λ−µ)²/(2a))
The posterior distribution is normal with parameters
mean = (x/σ² + µ/a) / (1/σ² + 1/a)
variance = 1 / (1/σ² + 1/a)
With n data values, the posterior distribution is normal with parameters
mean = (Σ xi/σ² + µ/a) / (n/σ² + 1/a)
variance = 1 / (n/σ² + 1/a)
The predictive mean is the same as the posterior mean, which is
mean = (Σ xi/σ² + µ/a) / (n/σ² + 1/a)

Uniform/1-Pareto
f(x|λ) = 1/λ,  0 ≤ x ≤ λ
π(λ) = αθ^α / λ^(α+1)
With observations x1, ···, xn, suppose that M = max{x1, ···, xn}. The posterior distribution of λ is a single-parameter Pareto with
α′ = α + n  and  θ′ = M
The predictive mean is
predictive mean = (α + n)M / (2(α + n − 1))

Loss function
To obtain an estimate of the parameter θ, we define a "loss function" l, and the estimate of θ is that which minimizes the expected value of the loss function using the posterior distribution.
(i) Squared-error loss: l(θ̂, θ) = (θ̂ − θ)². The estimate is the mean of the posterior distribution.
(ii) Absolute loss: l(θ̂, θ) = |θ̂ − θ|. The estimate is the median of the posterior distribution.
(iii) Zero-one loss: l(θ̂, θ) = 0 if θ̂ = θ and 1 if θ̂ ≠ θ. The estimate is the mode of the posterior distribution.

Highest posterior density credibility set
The 100(1 − α)% HPD credibility interval for a given α is an interval over which the posterior probability is 100(1 − α)% and the numerical values of the posterior density are higher on that interval than on any other.
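The Poisson/Gamma pair above is easy to verify numerically; a small sketch of the n-observation update (function names mine):

```python
def poisson_gamma_update(alpha, theta, xs):
    """Posterior gamma parameters for Poisson data with a gamma prior
    (shape alpha, scale theta): alpha' = alpha + sum(x),
    theta' = theta / (n*theta + 1), i.e. 1/theta' = 1/theta + n."""
    n = len(xs)
    return alpha + sum(xs), theta / (n * theta + 1)

def poisson_gamma_predictive_mean(alpha, theta, xs):
    """Mean of the negative binomial predictive distribution: r'*beta' = alpha'*theta'."""
    a, t = poisson_gamma_update(alpha, theta, xs)
    return a * t
```

For instance, a gamma(α = 2, θ = 0.5) prior (prior mean 1) with data [1, 0, 2] gives a gamma(5, 0.2) posterior and predictive mean 1.0.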
Chapter CR6: Buhlmann Credibility
Hypothetical mean
µ(θ) = E[Xi|Θ = θ]
µ = E[µ(θ)] = E[Xi]

Process variance
v(θ) = Var[Xi|Θ = θ]
v = E[v(θ)]

Pure (collective) premium
µ = E[µ(θ)] = E[E[X|Θ]] = E[X]

Expected process variance (EPV)
v = E[v(θ)] = E[Var(Xi|Θ)]

Variance of hypothetical mean (VHM)
a = Var[µ(θ)] = Var[E(Xi|Θ)]

Unconditional variance of Xi
Var[Xi] = E[Var(Xi|Θ)] + Var[E(Xi|Θ)] = v + a = σ²

Unconditional covariance Cov[Xi, Xj]
Cov[Xi, Xj] = a = ρσ², where ρ = a/(v + a) = 1/(1 + v/a)

Buhlmann's k
k = v/a = E[Var(Xi|Θ)] / Var[E(Xi|Θ)]

Buhlmann's credibility factor Z
Z = n/(n + v/a) = n/(n + k)

Buhlmann's credibility premium
Z X̄ + (1 − Z)µ, where X̄ = (1/n) Σ_{i=1}^n Xi

Exact credibility
When the Bayesian premium is equal to the Buhlmann credibility premium. If the Bayesian premium E[Xn+1|X1, ···, Xn] is a linear function of X1, ···, Xn, then exact credibility is satisfied. If the model distribution of X is from the linear exponential family
fX|Θ(x|θ) = p(x) e^(−θx) / q(θ)
and if the distribution of Θ has pf/pdf
π(θ) = [q(θ)]^(−k) e^(−θµk) / c(µ, k)   for θ0 < θ < θ1
with π(θ0) = π(θ1) = 0, then exact credibility is verified. In addition, v(θ) = µ′(θ)

The Buhlmann-Straub model
For changing numbers of exposures from one period to the next. Xi is the average claim for mi independent exposure units in period i, where given Θ = θ each exposure unit Xij has mean claim amount µ(θ) and variance v(θ). Therefore, Xi has mean µ(θ) and variance v(θ)/mi:
µ(θ) = E[Xi|Θ = θ]
Var[Xi|Θ = θ] = v(θ)/mi
m = m1 + m2 + ··· + mn
Z = m/(m + v/a) = m/(m + k)
where X̄ = Σ_{i=1}^n (mi/m) · Xi. It can be shown that Var[Xi] = a + v/mi. The credibility premium is
Z X̄ + (1 − Z)µ
the credibility premium per exposure unit, i.e. per occurrence of an individual Xij for the next period. If there will be mn+1 exposure units in period n + 1, then the credibility premium for period n + 1 for all exposures combined is
mn+1 [Z X̄ + (1 − Z)µ]
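The Buhlmann premium is a two-line computation once µ, v and a are known; a minimal sketch (function name mine):

```python
def buhlmann_premium(mu, v, a, xs):
    """Buhlmann credibility premium Z*X_bar + (1 - Z)*mu with
    k = v/a (EPV over VHM) and Z = n/(n + k)."""
    n = len(xs)
    z = n / (n + v / a)        # credibility factor
    x_bar = sum(xs) / n        # observed mean
    return z * x_bar + (1 - z) * mu
```

For example, with µ = 5, v = 8, a = 2 (so k = 4) and four observations averaging 7, Z = 4/8 = 0.5 and the premium is 6.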
Chapter CR7: Empirical Bayes Credibility
Non-parametric emp. Bayes credibility
(i) r policyholders, i = 1, 2, ···, r
(ii) For policyholder i, data on ni exposure periods is available, j = 1, 2, ···, ni
(iii) For policyholder i and exposure period j, there are mij exposure units, with an average observed claim (amount or number) of Xij per exposure unit. The total claim observed for policyholder i in exposure period j is mij Xij, and the total claim observed for policyholder i for all ni exposure periods is Σ_{j=1}^{ni} mij Xij
e.g., in group i, for individual j, there are mij years of data available with average claim Xij over those mij years
(iv) The total number of exposure units for policyholder i is mi = Σ_{j=1}^{ni} mij
(v) The average observed claim per exposure unit for policyholder i is
X̄i = (1/mi) · Σ_{j=1}^{ni} mij Xij
(vi) The total number of exposure units for all policyholders is m = Σ_{i=1}^{r} mi
(vii) The average claim per exposure unit for all policyholders is
X̄ = (1/m) · Σ_{i=1}^{r} mi X̄i
(viii) Policyholder i has risk parameter random variable Θi, and Θ1, ···, Θr are assumed to be independent and identically distributed
(ix) For policyholder i and exposure period j = 1, ···, ni, the conditional distribution of the average claim per exposure unit given Θi = θi has mean and variance
E[Xij|Θi = θi] = µ(θi)
Var[Xij|Θi = θi] = v(θi)/mij
(x) The structural parameters are
µ = E[µ(θi)],  v = E[v(θi)],  a = Var[µ(θi)]
(xi) Once the values of µ, v and a are found (or estimated), the credibility premium for the next exposure period for policyholder i is
Zi X̄i + (1 − Zi)µ
where
Zi = mi/(mi + k) = mi/(mi + v/a)
If there will be mi,ni+1 exposure units for the next period for policyholder i, then the credibility premium for policyholder i for all exposure units combined for the next period is
mi,ni+1 [Zi X̄i + (1 − Zi)µ]

Empirical Bayes estimation for the Buhlmann model (equal sample size)
Under the Buhlmann model, there are the same number of exposure periods for each policyholder (n1 = n2 = ··· = nr = n) and there is one exposure unit for each exposure period (mij = 1 for j = 1, 2, ···, n). Then for each i = 1, ···, r we have
X̄i = (1/n) Σ_{j=1}^{n} Xij
X̄ = (1/r) Σ_{i=1}^{r} X̄i = (1/(r·n)) Σ_{i=1}^{r} Σ_{j=1}^{n} Xij
The unbiased estimates of the structural parameters are
µ̂ = X̄
v̂ = (1/(r(n − 1))) Σ_{i=1}^{r} Σ_{j=1}^{n} (Xij − X̄i)² = (1/r) Σ_{i=1}^{r} v̂i
â = (1/(r − 1)) Σ_{i=1}^{r} (X̄i − X̄)² − v̂/n
The credibility factors will be equal for all groups since mi = n for all i. The estimated credibility factor for each i is
Ẑi = mi/(mi + k̂) = n/(n + v̂/â)

Empirical Bayes estimation for the Buhlmann-Straub model
There are ni exposure periods for policyholder i, i = 1, ···, r. For policyholder i and exposure period j, j = 1, ···, ni, there are mij exposure units, and Xij represents the observed average claim per exposure unit. The unbiased estimates for the structural parameters are
µ̂ = X̄
v̂ = Σ_{i=1}^{r} Σ_{j=1}^{ni} mij (Xij − X̄i)² / Σ_{i=1}^{r} (ni − 1)
â = [1/(m − (1/m) Σ_{i=1}^{r} mi²)] · [Σ_{i=1}^{r} mi (X̄i − X̄)² − v̂(r − 1)]
The estimated credibility factor for group i is
Ẑi = mi/(mi + k̂) = mi/(mi + v̂/â)
An alternative to the unbiased estimate µ̂ = X̄ is the method that preserves total losses (credibility-weighted average) estimate of µ̂. This is found by first estimating Ẑ1, ···, Ẑr in the way described for the Buhlmann-Straub model and then
µ̂ = Σ_{i=1}^{r} Ẑi X̄i / Σ_{i=1}^{r} Ẑi
If we use this estimate of µ to find the credibility premiums for groups 1, ···, r and then calculate total credibility premiums for past exposures, that total will be equal to the actual total past claims.

Variations on the Buhlmann/Buhlmann-Straub models
(i) If the manual rate µ is known (not necessarily an unbiased estimate), then it would be used in the credibility premium equation instead of µ̂
(ii) If the actual value of µ is known, then an alternative unbiased estimate for a is
â = Σ_{i=1}^{r} (mi/m)(X̄i − µ)² − (r/m) · v̂
(iii) If the actual value of µ is known, and if there is data available for only policyholder i, then the following estimates can be used for v and a:
v̂i = (1/(ni − 1)) Σ_{j=1}^{ni} mij (Xij − X̄i)²
âi = (X̄i − µ)² − v̂i/mi
Semiparametric empirical Bayesian credibility
The model has a parametric distribution for X
given Θ = θ, but an unspecified non-parametric
distribution for Θ. We use relationships linking
µ(θ) = E[X|Θ], v(θ) = V ar[X|Θ] and the fact
that V ar[X] = v + a to get estimates for µ, v
and a to use in the credibility premium equation
E.g., for X|Θ a Poisson random variable, we can use µ̂ = X̄ as an estimate of µ and, similarly, v̂ = X̄ as an estimate of v. Calculating the sample variance V̂ar[X], we can estimate a by â = V̂ar[X] − v̂
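The equal-sample-size estimators above can be sketched as follows (function name mine; the guard on â is my own addition, since the difference estimator can come out non-positive on real data):

```python
def empirical_bayes_buhlmann(data):
    """Nonparametric estimates for the equal-sample-size Buhlmann model.
    data: r rows of n observations each (one exposure unit per period).
    Returns (mu_hat, v_hat, a_hat, z_hat)."""
    r, n = len(data), len(data[0])
    xbars = [sum(row) / n for row in data]
    mu_hat = sum(xbars) / r                                   # overall mean
    v_hat = sum(sum((x - xb) ** 2 for x in row)               # pooled within-
                for row, xb in zip(data, xbars)) / (r * (n - 1))  # group variance
    a_hat = sum((xb - mu_hat) ** 2 for xb in xbars) / (r - 1) - v_hat / n
    z_hat = n / (n + v_hat / a_hat) if a_hat > 0 else 0.0
    return mu_hat, v_hat, a_hat, z_hat
```

With two groups of two observations each, [[1, 3], [5, 7]], this gives µ̂ = 4, v̂ = 2, â = 7 and Ẑ = 2/(2 + 2/7) = 7/8.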
Chapter SI1: Simulation
Inverse transform method to simulate a discrete random variable
Discrete random variable X with probability distribution P[X = xj] = pj for j = 0, 1, ···, where p0 + p1 + ··· = 1. Given a random number U from [0, 1], we find the integer j such that
P[X ≤ xj−1] ≤ U ≤ P[X ≤ xj]
FX(xj−1) ≤ U ≤ FX(xj)
Σ_{i=0}^{j−1} pi ≤ U ≤ Σ_{i=0}^{j} pi
Then the simulated value of X is xj. Small uniform random numbers correspond to small simulated values.

Inverse transform method to simulate a continuous random variable
Given the continuous random variable X with cdf FX(x), a value of X can be simulated from a random number U = u from the uniform distribution over [0, 1]. Solve the equation u = FX(x) for x. The simulated value of X is
x = FX^(−1)(u)
If the simulation is such that small random numbers correspond to large simulated values, solve for x the equation u = 1 − F(x).

Simulation of the exponential distribution
x = −(1/λ) ln(1 − u) = −θ ln(1 − u)

Simulation of the Poisson distribution
The Poisson with mean λ can be simulated as follows: from successive independent uniform [0, 1] numbers u1, u2, ···, find the successive products u1, u1u2, u1u2u3, ···. The simulated value of N Poisson with mean λ is
N = max{n : u1u2···un ≥ e^(−λ)}

Simulation of the gamma distribution
The gamma distribution X with parameters α = n and θ = 1/λ can be simulated in the following way: take n random values u1, ···, un from a uniform [0, 1]. The simulated value of X is
X = −(1/λ) ln(u1u2···un) = −θ ln(u1u2···un)

Number of simulated values needed
If we are simulating a random variable X which has mean µ and variance σ², and we are trying to estimate µ, we may want to determine the number of simulated values of X needed, say n, such that
P[|X̄ − µ| ≤ 0.05µ] = 0.90
Then
n ≥ (1.645/0.05)² · σ²/µ²

The bootstrap method
Estimate the mean square error MSE(θ̂) of an estimator:
MSE(θ̂) = E[(θ̂ − θ)²] = Var[θ̂] + [Bias(θ̂)]²
The bootstrap estimate is calculated by first constructing the empirical distribution of the random sample, and then "resampling" from the empirical distribution and calculating the mean square error of the estimator.
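The inverse transform and product methods above can be sketched with explicit uniform inputs (function names mine; passing the u's in, rather than drawing them, keeps the examples deterministic):

```python
import math
from bisect import bisect_left

def sim_discrete(u, xs, ps):
    """Inverse transform for a discrete r.v.: return the x_j whose
    cumulative-probability interval contains u."""
    cdf, running = [], 0.0
    for p in ps:
        running += p
        cdf.append(running)
    return xs[bisect_left(cdf, u)]

def sim_exponential(u, theta):
    """Inverse transform for an exponential with mean theta: x = -theta*ln(1-u)."""
    return -theta * math.log(1 - u)

def sim_poisson(us, lam):
    """Product method: N = max{n : u1*...*un >= exp(-lam)}.
    The caller must supply enough uniforms us."""
    n, prod, floor = 0, 1.0, math.exp(-lam)
    for u in us:
        prod *= u
        if prod < floor:
            return n
        n += 1
    return n
```

For example, u = 0.5 with θ = 2 gives x = −2 ln(0.5) = 2 ln 2 ≈ 1.386; with λ = 1 the uniforms (0.9, 0.8, 0.3) give N = 2, since 0.9·0.8 ≥ e^(−1) > 0.9·0.8·0.3.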
Chapter IN1: Interpolation
Lagrange interpolating polynomial
Assuming n + 1 points (x0, f(x0)), ···, (xn, f(xn)), we first construct a set of n + 1 Lagrange coefficients
Ln,k(x) = [(x − x0)···(x − xk−1)(x − xk+1)···(x − xn)] / [(xk − x0)···(xk − xk−1)(xk − xk+1)···(xk − xn)]
Note that Ln,k(xk) = 1 and Ln,k(xi) = 0 if i ≠ k. The Lagrange interpolating polynomial of degree n is
P(x) = Σ_{k=0}^{n} f(xk) · Ln,k(x) = f(x0) · Ln,0(x) + ··· + f(xn) · Ln,n(x)

Cubic spline interpolation
Given the n + 1 points x0, ···, xn (in numerically increasing order) and corresponding function values y0, ···, yn, a cubic spline interpolant is a piecewise defined function consisting of n 3rd-degree polynomials fj(x) (called splines) over the intervals [xj, xj+1]. At each xj, the spline pieces have equal values and equal first and second derivatives:
fj(xj) = yj
fj(xj+1) = yj+1
f′j(xj+1) = f′j+1(xj+1)
f″j(xj+1) = f″j+1(xj+1)

Notations and calculation shortcut
fj(x) = aj + bj(x − xj) + cj(x − xj)² + dj(x − xj)³
hj = xj+1 − xj
mj = f″(xj)
gj = 2(hj−1 + hj)
uj = 6[(yj+1 − yj)/hj − (yj − yj−1)/hj−1]
Then
aj = yj,  j = 0, ···, n − 1
bj = (yj+1 − yj)/hj − hj(2mj + mj+1)/6
cj = mj/2
dj = (mj+1 − mj)/(6hj)
uj = hj−1 mj−1 + gj mj + hj mj+1
which can be used to find the mj's and then the coefficients of the splines.

Types of splines
(i) Natural spline: m0 = mn = 0
(ii) Curvature adjusted spline: m0 and mn set to specified constants
(iii) Parabolic runout spline: m0 = m1 and mn−1 = mn
(iv) Cubic runout spline: f0(x) = f1(x) and fn−2(x) = fn−1(x)
(v) Clamped spline: f′0(x0) and f′n−1(xn) set to specified constants

Measure of curvature (smoothness)
S = ∫_{x0}^{xn} [f″(x)]² dx
Clamped and natural splines both have smaller curvature than any other spline.

Smoothing data with cubic splines
Assume that each of the y-values includes a random additive component with variance σj² for yj. We construct a spline so that f(xj) = aj (where aj is not necessarily equal to yj) and then define a goodness-of-fit measure
F = Σ_{j=0}^{n} [(yj − aj)/σj]²
The tradeoff between good fit and smoothness is represented by the quantity p, where 0 ≤ p ≤ 1, and the smoothing spline is chosen to minimize
L = pF + (1 − p)S
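The Lagrange construction above translates directly into code; a minimal sketch of evaluating P(x) (function name mine):

```python
def lagrange_interpolate(points, x):
    """Evaluate the Lagrange interpolating polynomial
    P(x) = sum_k f(x_k) * L_{n,k}(x) through the given (x_k, f(x_k)) points."""
    total = 0.0
    for k, (xk, yk) in enumerate(points):
        lk = 1.0
        for i, (xi, _) in enumerate(points):
            if i != k:
                lk *= (x - xi) / (xk - xi)   # one factor of L_{n,k}(x)
        total += yk * lk
    return total
```

Through the three points (0, 0), (1, 1), (2, 4) of f(x) = x², the degree-2 interpolant reproduces x² exactly, e.g. P(3) = 9.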