Generalized linear mixed models - UConn Math

Generalized linear mixed models (GLMMs) for
dependent compound risk models
Emiliano A. Valdez, PhD, FSA
joint work with H. Jeong, J. Ahn and S. Park
University of Connecticut
Seminar Talk at
Yonsei University
Seoul, Korea
15 May 2017
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
1 / 21
Outline
Introduction
Data structure
The frequency-severity model
Exponential dispersion family
Covariates
Compound risk models
Generalized linear mixed models
Model specifications
Model estimates
Claim frequency
Claim severity
Tweedie models
Model comparison
Gini index
Conclusion
Appendix A - Singapore
Insurance market
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
2 / 21
Introduction
Data structure
Data structure
“Policyholder” i is followed over time t = 1, . . . , Ti years, where Ti is
at most 9 years.
Unit of analysis “it” – a registered vehicle insured i over time t (year)
For each “it”, could have several claims, k = 0, 1, . . . , nit
Have available information on: number of claims nit , amount of claim
yitk , exposure eit and covariates (explanatory variables) xit
covariates often include age, gender, vehicle type, driving history and
so forth
We will model the pair (nit , cit ) where

nit
 1 P yitk ,
cit = nit k=1

0,
nit > 0
nit = 0
is the observed average claim size.
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
3 / 21
The frequency-severity model
The frequency-severity model
Traditional to predict/estimate insurance claims distributions:
Cost of Claims = Frequency × Severity
The joint density of the number of claims and the average claim size
can be decomposed as
f (N, C|x) = f (N |x) × f (C|N, x)
joint = frequency × conditional severity.
This natural decomposition allows us to investigate/model each
component separately and it does not preclude us from assuming N
and C are independent.
For purposes of notation, we will use the notation C = N C to be the
aggregate claims.
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
4 / 21
The frequency-severity model
Exponential dispersion family
Exponential dispersion family
We say Y comes from an exponential dispersion family if its density has
the form
yθ − ψ(θ)
+ c(y; φ) .
f (y) = exp
φ
where θ and ψ are location and scale parameters, respectively, and b(θ)
and c(y; ψ) are known functions.
The following well-known relations hold for these distributions:
mean: µ = E[Y ] = ψ 0 (θ)
variance: V ar[Y ] = φψ 00 (θ) = φV (µ)
Reproductive EDF: If Y1 , . . . , YN are mutually independent belonging to
the EDF(θ, φ), then its average Y also belongs to EDF(θ, φ/N ).
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
5 / 21
The frequency-severity model
Exponential dispersion family
Examples of members of this family
Normal N (µ, σ 2 ) with θ = µ and φ = σ 2 , V (µ) = 1
Gamma(α, β) with θ = −β/α and φ = 1/α, V (µ) = µ2
Inverse Gaussian(α, β) with θ = − 12 β 2 /α2 and φ = β/α2 , V (µ) = µ3
Poisson(µ) with θ = log µ and φ = 1, V (µ) = 1, V (µ) = µ
Binomial(m, p) with θ = log [p/(1 − p)] and φ = 1,
V (µ) = µ(1 − µ/m)
Negative Binomial(r, p) with θ = log(1 − p) and φ = 1,
V (µ) = µ(1 + µ/r)
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
6 / 21
The frequency-severity model
Exponential dispersion family
The link function and linear predictors
The linear predictor is a function of the covariates, or predictor
variables.
The link function connects the mean of the response to this linear
predictor in the form η = g(µ).
g(µ) is called the link function with µ = E(Y ) = linear predictor.
The transformed mean follows a linear model as
η = x0i β = β0 + β1 xi1 + · · · + βp xip
with p predictor variables (or covariates).
You can actually invert µ = g −1 (x0i β) = g −1 (η).
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
7 / 21
The frequency-severity model
Covariates
Covariates
Vehicle Type: automotive (A) or others (O).
Marital status: married (M), single (S), others (O)
Gender: male (M) or female (F).
Coverage type: Comprehensive (Comp) or Others
Age: in years, grouped into 3 categories
Young (Y), Middle Aged (M), Old (O)
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
8 / 21
Compound risk models
Compound risk models
See Garrido, et al. (2016)
In the case of independence, we have:
Mean: E(C) = E(N )E(C)
Variance: V ar(C) = E(N )V ar(C) + V ar(N )[E(C]2
In case we do not assume independence, we have
Mean: E(C) = E[N E(C|N ))] 6= E(N )E(C)
Variance: V ar(C) = E[N V ar(C|N ))] + V ar[N E(C|N )]
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
9 / 21
Compound risk models
Generalized linear mixed models
Generalized linear mixed models
GLMMs extend GLMs by allowing for random, or subject-specific, effects
in the linear predictor which reflects the idea that there is a natural
heterogeneity across subjects. For insurance applications, our subject i is
usually a policyholder observed for a period of Ti periods.
Given the vector bi with the random effects for each subject i, Yit belongs
to the EDF:
yit θit − ψ(θit )
+ c(yit ; φ)
f (yit |bi ) = exp
φ
with the following conditional relations:
mean: µit = E[Yit |bi ] = ψ 0 (θit )
variance: V ar[Y |bi ] = φψ 00 (θ + it) = φV (µit )
Link function: g(µit ) = x0it β + z0it bi
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
10 / 21
Model specifications
Model specifications
We calibrated models with the following specifications:
For the count of claims (frequency): negative binomial for our
baseline model
more suitable than the traditional Poisson model because it can handle
overdispersion and individual unobserved effects
For the severity of claims: Gamma (due to its reproductive property)
For the random effects for GLMMs: Normal with mean zero and
unknown variances
different variances for frequency and claim severity
The Tweedie model is a special case of the GLM family and it has
mass at zero (avoids modeling frequency and severity separately).
for some case, its density has no explicit form, but the variance
function has the form V (µ) = µp e.g. compound Poisson Gamma with
1 < p < 2.
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
11 / 21
Model estimates
Claim frequency
Negative binomial model for claim frequency
(Intercept)
VTypeCar
MaritalM
SexM
Comp
AgeYoung
AgeMid
AIC
BIC
Log Likelihood
Deviance
Num. obs.
Num. groups: ID
Variance: ID.(Intercept)
Variance: Residual
∗∗∗ p
< 0.001,
∗∗ p
MAE
MSE
Jeong/Ahn/Park/Valdez (U. of Connecticut)
NB GLM
−2.66∗∗∗
(0.08)
−0.01
(0.07)
−0.53
(0.35)
0.11∗∗∗
(0.02)
1.09∗∗∗
(0.03)
0.25∗∗
(0.08)
0.14∗∗∗
(0.02)
104835.76
104915.37
-52409.88
65407.89
155063
NB GLMM
−2.67∗∗∗
(0.12)
0.01
(0.12)
−0.54
(0.54)
0.11∗∗
(0.04)
1.09∗∗∗
(0.05)
0.23
(0.13)
0.14∗∗∗
(0.03)
104922.06
105011.62
-52452.03
155063
49978
0.28
2.27
< 0.01, ∗ p < 0.05
NB GLM
0.2038
0.2390
NB GLMM
0.2039
0.2391
Seminar Talk - Yonsei University
15 May 2017
12 / 21
Model estimates
Claim severity
Gamma distributions models for claim frequency
Gamma-indep GLM
9.24∗∗∗
(0.56)
0.43
(0.53)
−1.34
(2.29)
0.10
(0.15)
0.41
(0.23)
−0.56
(0.56)
0.06
(0.14)
(Intercept)
VTypeCar
MaritalM
SexM
Comp
AgeYoung
AgeMid
log(Freq)
AIC
BIC
Log Likelihood
Deviance
Num. obs.
Num. groups: Year
Variance: Year.(Intercept)
Variance: Residual
∗∗∗ p
< 0.001,
∗∗ p
MAPE
MAE
MSE
292952.28
293012.39
-146468.14
40735.09
13548
Gamma-dep GLM
8.86∗∗∗
(0.31)
−0.70∗
(0.29)
−0.51
(1.25)
0.03
(0.08)
0.48∗∗∗
(0.12)
−0.01
(0.31)
−0.01
(0.07)
−0.94∗∗∗
(0.04)
280007.81
280075.44
-139994.90
19294.67
13548
Gamma-dep GLMM
8.85∗∗∗
(0.31)
−0.69∗
(0.28)
−0.51
(1.24)
0.02
(0.08)
0.48∗∗∗
(0.12)
−0.01
(0.30)
−0.01
(0.07)
−0.94∗∗∗
(0.04)
280008.25
280083.39
-139994.12
13548
8
0.01
13.77
< 0.01, ∗ p < 0.05
Gamma-indep GLM
141
21738
22803
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Gamma-dep GLM
1.49
1576
4937
Gamma-dep GLMM
1.49
1575
4937
Seminar Talk - Yonsei University
15 May 2017
13 / 21
Model estimates
Tweedie models
Tweedie model estimates
(Intercept)
VTypeCar
MaritalM
SexM
Comp
AgeYoung
AgeMid
AIC
BIC
Log Likelihood
Deviance
Num. obs.
Num. groups: ID
Variance: ID.(Intercept)
Variance: Residual
MAE
MSE
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Tweedie GLM
5.69
0.45
−2.03
0.15
1.64
−0.34
0.16
385205
385294
-192593
385187
155063
Tweedie GLM
2131
3020
Tweedie GLMM
4.38
−0.18
−1.91
−0.53
1.86
−0.04
0.19
376779
376869
-188381
376761
155063
49978
2.33
594.78
Tweedie GLMM
478
2125
Seminar Talk - Yonsei University
15 May 2017
14 / 21
Model comparison
Gini index
Using the Gini index to compare models
See Frees, et al. (2016).
In computing the Gini index, we employ the following steps:
1
2
3
Sort observed claims Yi |i = 1, · · · , M according to risk score
Si |i = 1, · · · , M (in our case, estimated value from each method) in
ascending order. That is, calculate Ri |i = 1, · · · , M , where each Ri is
the rank of Si between 1 and M so that R1 = arg max (Si ).
PM
Compute Fscore (m/M ) = i=1 1(Ri ≤m) /M ,
PM
PM
Floss (m/M ) = i=1 Yi 1(Ri ≤m) / i=1 Yi for each m = 1, · · · , M .
Plot Fscore (m/M ) on x-axis, and Floss (m/M ) on y-axis.
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
15 / 21
Model comparison
Gini index
Comparing the Gini index for the two-part models
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
16 / 21
Model comparison
Gini index
Comparing the Gini index for the Tweedie models
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
17 / 21
Conclusion
Concluding remarks
We are still in the early stages of this work.
In particular, this is an early exploration of the promise of using
GLMMs to account for dependency between frequency and severity.
This work hopes to extend the literature on this type of dependency:
H. Jeong (2016) work on ”Simple Compound Risk Model with
Dependent Structure”
Garrido, et al. (2016)
Frees and Wang (2006), Frees, et al. (2011)
Czado (2012)
Shi, et al. (2015)
Further work is needed to measure the financial impact of using
GLMMs versus other models:
risk classification
a posteriori prediction and applications in bonus-malus systems
modified insurance coverage and reinsurance applications
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
18 / 21
Appendix A - Singapore
A bit about Singapore
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
19 / 21
Appendix A - Singapore
A bit about Singapore
1
Singa Pura: Lion city. Location: 136.8 km N of equator, between
latitudes 103 deg 38’ E and 104 deg 06’ E. [islands between Malaysia
and Indonesia]
Size: very tiny [647.5 sq km, of which 10 sq km is water] Climate:
very hot and humid [23-30 deg celsius]
Population: nearly 5 mn. Age structure: 0-14 yrs: 16%, 15-64 yrs:
76%, 65+ yrs 8%
Birth rate: 9.34 births/1,000. Death rate: 4.28 deaths/1,000; Life
expectancy: 81 yrs; male: 79 yrs; female: 83 yrs
Ethnic groups: Chinese 74%, Malay 13%, Indian 9%; Languages:
Chinese, Malay , Tamil, English
1
Updated: February 2010
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
20 / 21
Appendix A - Singapore
Insurance market
Insurance market in Singapore
As of 2009 2 : market consists of 45 general ins, 8 life ins, 7 both, 17
general reinsurers, 2 life reins, 7 both; also the largest captive
domicile in Asia, with 59 registered captives.
Monetary Authority of Singapore (MAS) is the supervisory/regulatory
body; also assists to promote Singapore as an international financial
center.
Insurance industry performance in 2009:
total premiums: 11.4 bn; total assets: 113.3 bn [20% annual growth]
life insurance: annual premium = 251.6 mn; single premium = 759.5
mn
general insurance: gross premium = 1.9 bn (domestic = 0.9; offshore
= 1.0)
Further information: http://www.mas.gov.sg
2
Source: wikipedia
Jeong/Ahn/Park/Valdez (U. of Connecticut)
Seminar Talk - Yonsei University
15 May 2017
21 / 21