Generalized linear mixed models (GLMMs) for dependent compound risk models Emiliano A. Valdez, PhD, FSA joint work with H. Jeong, J. Ahn and S. Park University of Connecticut Seminar Talk at Yonsei University Seoul, Korea 15 May 2017 Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 1 / 21 Outline Introduction Data structure The frequency-severity model Exponential dispersion family Covariates Compound risk models Generalized linear mixed models Model specifications Model estimates Claim frequency Claim severity Tweedie models Model comparison Gini index Conclusion Appendix A - Singapore Insurance market Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 2 / 21 Introduction Data structure Data structure “Policyholder” i is followed over time t = 1, . . . , Ti years, where Ti is at most 9 years. Unit of analysis “it” – a registered vehicle insured i over time t (year) For each “it”, could have several claims, k = 0, 1, . . . , nit Have available information on: number of claims nit , amount of claim yitk , exposure eit and covariates (explanatory variables) xit covariates often include age, gender, vehicle type, driving history and so forth We will model the pair (nit , cit ) where nit 1 P yitk , cit = nit k=1 0, nit > 0 nit = 0 is the observed average claim size. Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 3 / 21 The frequency-severity model The frequency-severity model Traditional to predict/estimate insurance claims distributions: Cost of Claims = Frequency × Severity The joint density of the number of claims and the average claim size can be decomposed as f (N, C|x) = f (N |x) × f (C|N, x) joint = frequency × conditional severity. This natural decomposition allows us to investigate/model each component separately and it does not preclude us from assuming N and C are independent. For purposes of notation, we will use the notation C = N C to be the aggregate claims. Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 4 / 21 The frequency-severity model Exponential dispersion family Exponential dispersion family We say Y comes from an exponential dispersion family if its density has the form yθ − ψ(θ) + c(y; φ) . f (y) = exp φ where θ and ψ are location and scale parameters, respectively, and b(θ) and c(y; ψ) are known functions. The following well-known relations hold for these distributions: mean: µ = E[Y ] = ψ 0 (θ) variance: V ar[Y ] = φψ 00 (θ) = φV (µ) Reproductive EDF: If Y1 , . . . , YN are mutually independent belonging to the EDF(θ, φ), then its average Y also belongs to EDF(θ, φ/N ). Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 5 / 21 The frequency-severity model Exponential dispersion family Examples of members of this family Normal N (µ, σ 2 ) with θ = µ and φ = σ 2 , V (µ) = 1 Gamma(α, β) with θ = −β/α and φ = 1/α, V (µ) = µ2 Inverse Gaussian(α, β) with θ = − 12 β 2 /α2 and φ = β/α2 , V (µ) = µ3 Poisson(µ) with θ = log µ and φ = 1, V (µ) = 1, V (µ) = µ Binomial(m, p) with θ = log [p/(1 − p)] and φ = 1, V (µ) = µ(1 − µ/m) Negative Binomial(r, p) with θ = log(1 − p) and φ = 1, V (µ) = µ(1 + µ/r) Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 6 / 21 The frequency-severity model Exponential dispersion family The link function and linear predictors The linear predictor is a function of the covariates, or predictor variables. The link function connects the mean of the response to this linear predictor in the form η = g(µ). g(µ) is called the link function with µ = E(Y ) = linear predictor. The transformed mean follows a linear model as η = x0i β = β0 + β1 xi1 + · · · + βp xip with p predictor variables (or covariates). You can actually invert µ = g −1 (x0i β) = g −1 (η). Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 7 / 21 The frequency-severity model Covariates Covariates Vehicle Type: automotive (A) or others (O). Marital status: married (M), single (S), others (O) Gender: male (M) or female (F). Coverage type: Comprehensive (Comp) or Others Age: in years, grouped into 3 categories Young (Y), Middle Aged (M), Old (O) Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 8 / 21 Compound risk models Compound risk models See Garrido, et al. (2016) In the case of independence, we have: Mean: E(C) = E(N )E(C) Variance: V ar(C) = E(N )V ar(C) + V ar(N )[E(C]2 In case we do not assume independence, we have Mean: E(C) = E[N E(C|N ))] 6= E(N )E(C) Variance: V ar(C) = E[N V ar(C|N ))] + V ar[N E(C|N )] Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 9 / 21 Compound risk models Generalized linear mixed models Generalized linear mixed models GLMMs extend GLMs by allowing for random, or subject-specific, effects in the linear predictor which reflects the idea that there is a natural heterogeneity across subjects. For insurance applications, our subject i is usually a policyholder observed for a period of Ti periods. Given the vector bi with the random effects for each subject i, Yit belongs to the EDF: yit θit − ψ(θit ) + c(yit ; φ) f (yit |bi ) = exp φ with the following conditional relations: mean: µit = E[Yit |bi ] = ψ 0 (θit ) variance: V ar[Y |bi ] = φψ 00 (θ + it) = φV (µit ) Link function: g(µit ) = x0it β + z0it bi Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 10 / 21 Model specifications Model specifications We calibrated models with the following specifications: For the count of claims (frequency): negative binomial for our baseline model more suitable than the traditional Poisson model because it can handle overdispersion and individual unobserved effects For the severity of claims: Gamma (due to its reproductive property) For the random effects for GLMMs: Normal with mean zero and unknown variances different variances for frequency and claim severity The Tweedie model is a special case of the GLM family and it has mass at zero (avoids modeling frequency and severity separately). for some case, its density has no explicit form, but the variance function has the form V (µ) = µp e.g. compound Poisson Gamma with 1 < p < 2. Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 11 / 21 Model estimates Claim frequency Negative binomial model for claim frequency (Intercept) VTypeCar MaritalM SexM Comp AgeYoung AgeMid AIC BIC Log Likelihood Deviance Num. obs. Num. groups: ID Variance: ID.(Intercept) Variance: Residual ∗∗∗ p < 0.001, ∗∗ p MAE MSE Jeong/Ahn/Park/Valdez (U. of Connecticut) NB GLM −2.66∗∗∗ (0.08) −0.01 (0.07) −0.53 (0.35) 0.11∗∗∗ (0.02) 1.09∗∗∗ (0.03) 0.25∗∗ (0.08) 0.14∗∗∗ (0.02) 104835.76 104915.37 -52409.88 65407.89 155063 NB GLMM −2.67∗∗∗ (0.12) 0.01 (0.12) −0.54 (0.54) 0.11∗∗ (0.04) 1.09∗∗∗ (0.05) 0.23 (0.13) 0.14∗∗∗ (0.03) 104922.06 105011.62 -52452.03 155063 49978 0.28 2.27 < 0.01, ∗ p < 0.05 NB GLM 0.2038 0.2390 NB GLMM 0.2039 0.2391 Seminar Talk - Yonsei University 15 May 2017 12 / 21 Model estimates Claim severity Gamma distributions models for claim frequency Gamma-indep GLM 9.24∗∗∗ (0.56) 0.43 (0.53) −1.34 (2.29) 0.10 (0.15) 0.41 (0.23) −0.56 (0.56) 0.06 (0.14) (Intercept) VTypeCar MaritalM SexM Comp AgeYoung AgeMid log(Freq) AIC BIC Log Likelihood Deviance Num. obs. Num. groups: Year Variance: Year.(Intercept) Variance: Residual ∗∗∗ p < 0.001, ∗∗ p MAPE MAE MSE 292952.28 293012.39 -146468.14 40735.09 13548 Gamma-dep GLM 8.86∗∗∗ (0.31) −0.70∗ (0.29) −0.51 (1.25) 0.03 (0.08) 0.48∗∗∗ (0.12) −0.01 (0.31) −0.01 (0.07) −0.94∗∗∗ (0.04) 280007.81 280075.44 -139994.90 19294.67 13548 Gamma-dep GLMM 8.85∗∗∗ (0.31) −0.69∗ (0.28) −0.51 (1.24) 0.02 (0.08) 0.48∗∗∗ (0.12) −0.01 (0.30) −0.01 (0.07) −0.94∗∗∗ (0.04) 280008.25 280083.39 -139994.12 13548 8 0.01 13.77 < 0.01, ∗ p < 0.05 Gamma-indep GLM 141 21738 22803 Jeong/Ahn/Park/Valdez (U. of Connecticut) Gamma-dep GLM 1.49 1576 4937 Gamma-dep GLMM 1.49 1575 4937 Seminar Talk - Yonsei University 15 May 2017 13 / 21 Model estimates Tweedie models Tweedie model estimates (Intercept) VTypeCar MaritalM SexM Comp AgeYoung AgeMid AIC BIC Log Likelihood Deviance Num. obs. Num. groups: ID Variance: ID.(Intercept) Variance: Residual MAE MSE Jeong/Ahn/Park/Valdez (U. of Connecticut) Tweedie GLM 5.69 0.45 −2.03 0.15 1.64 −0.34 0.16 385205 385294 -192593 385187 155063 Tweedie GLM 2131 3020 Tweedie GLMM 4.38 −0.18 −1.91 −0.53 1.86 −0.04 0.19 376779 376869 -188381 376761 155063 49978 2.33 594.78 Tweedie GLMM 478 2125 Seminar Talk - Yonsei University 15 May 2017 14 / 21 Model comparison Gini index Using the Gini index to compare models See Frees, et al. (2016). In computing the Gini index, we employ the following steps: 1 2 3 Sort observed claims Yi |i = 1, · · · , M according to risk score Si |i = 1, · · · , M (in our case, estimated value from each method) in ascending order. That is, calculate Ri |i = 1, · · · , M , where each Ri is the rank of Si between 1 and M so that R1 = arg max (Si ). PM Compute Fscore (m/M ) = i=1 1(Ri ≤m) /M , PM PM Floss (m/M ) = i=1 Yi 1(Ri ≤m) / i=1 Yi for each m = 1, · · · , M . Plot Fscore (m/M ) on x-axis, and Floss (m/M ) on y-axis. Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 15 / 21 Model comparison Gini index Comparing the Gini index for the two-part models Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 16 / 21 Model comparison Gini index Comparing the Gini index for the Tweedie models Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 17 / 21 Conclusion Concluding remarks We are still in the early stages of this work. In particular, this is an early exploration of the promise of using GLMMs to account for dependency between frequency and severity. This work hopes to extend the literature on this type of dependency: H. Jeong (2016) work on ”Simple Compound Risk Model with Dependent Structure” Garrido, et al. (2016) Frees and Wang (2006), Frees, et al. (2011) Czado (2012) Shi, et al. (2015) Further work is needed to measure the financial impact of using GLMMs versus other models: risk classification a posteriori prediction and applications in bonus-malus systems modified insurance coverage and reinsurance applications Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 18 / 21 Appendix A - Singapore A bit about Singapore Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 19 / 21 Appendix A - Singapore A bit about Singapore 1 Singa Pura: Lion city. Location: 136.8 km N of equator, between latitudes 103 deg 38’ E and 104 deg 06’ E. [islands between Malaysia and Indonesia] Size: very tiny [647.5 sq km, of which 10 sq km is water] Climate: very hot and humid [23-30 deg celsius] Population: nearly 5 mn. Age structure: 0-14 yrs: 16%, 15-64 yrs: 76%, 65+ yrs 8% Birth rate: 9.34 births/1,000. Death rate: 4.28 deaths/1,000; Life expectancy: 81 yrs; male: 79 yrs; female: 83 yrs Ethnic groups: Chinese 74%, Malay 13%, Indian 9%; Languages: Chinese, Malay , Tamil, English 1 Updated: February 2010 Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 20 / 21 Appendix A - Singapore Insurance market Insurance market in Singapore As of 2009 2 : market consists of 45 general ins, 8 life ins, 7 both, 17 general reinsurers, 2 life reins, 7 both; also the largest captive domicile in Asia, with 59 registered captives. Monetary Authority of Singapore (MAS) is the supervisory/regulatory body; also assists to promote Singapore as an international financial center. Insurance industry performance in 2009: total premiums: 11.4 bn; total assets: 113.3 bn [20% annual growth] life insurance: annual premium = 251.6 mn; single premium = 759.5 mn general insurance: gross premium = 1.9 bn (domestic = 0.9; offshore = 1.0) Further information: http://www.mas.gov.sg 2 Source: wikipedia Jeong/Ahn/Park/Valdez (U. of Connecticut) Seminar Talk - Yonsei University 15 May 2017 21 / 21
© Copyright 2026 Paperzz