The likelihood approach to statistics
Marco Cattaneo, Department of Statistics, LMU Munich
[email protected]
December 9, 2008

my research
- PhD with Frank Hampel at ETH Zurich (November 2002 – March 2007): Statistical Decisions Based Directly on the Likelihood Function
- Postdoc with Thomas Augustin at LMU Munich (SNSF Research Fellowship, October 2007 – March 2009): decision making on the basis of a probabilistic-possibilistic hierarchical description of uncertain knowledge

the likelihood function
- P = {P_θ : θ ∈ Θ} is a set of probabilistic models: each P_θ ∈ P is a probability measure on (Ω, A)
- when data A ∈ A are observed, the likelihood function lik(θ) ∝ P_θ(A) describes the relative ability of the models in P to forecast the observed data
- the likelihood function is a central concept in statistics (both frequentist and Bayesian)

maximum likelihood
- the maximum likelihood estimate of θ is θ̂_ML = arg max_{θ ∈ Θ} lik(θ)
- a statistical decision problem is described by a loss function L : Θ × D → [0, ∞], where L(θ, d) is the loss incurred by making decision d, according to the model P_θ
- for instance, estimation of θ ∈ R^n with squared error loss: D = Θ ⊆ R^n and L(θ, d) = ‖θ − d‖²
- MLD criterion: make the decision d_MLD = arg min_{d ∈ D} L(θ̂_ML, d)
- the MLD criterion is very often used implicitly, but it disregards the model uncertainty

the hierarchical model
- Bayesian criterion with prior probability measure π: make the decision d_π = arg min_{d ∈ D} ∫ L(θ, d) lik(θ) dπ(θ)
- the set P = {P_θ : θ ∈ Θ} of probabilistic models and the likelihood function lik on Θ are the two levels of a hierarchical model
- when data B ∈ A are observed, P_θ is updated to P_θ(· | B), and lik(θ) is updated to lik(θ) P_θ(B)
- before observing data, prior information can be described by a (subjective) prior likelihood function
- prior ignorance is described by a constant likelihood function (the MLD and Bayesian criteria are sketched numerically below)
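To make the criteria above concrete, here is a minimal numerical sketch (my own illustration, not part of the original slides): a hypothetical binomial model with 7 successes in 10 trials, a grid approximation of Θ = (0, 1), the relative likelihood, the ML estimate, and the MLD and Bayesian decisions under squared error loss. The data, grid size, and uniform prior are assumptions made only for this example.

```python
# Minimal sketch (illustration only, not from the slides): likelihood function,
# ML estimate, MLD decision and Bayesian decision for a binomial model,
# approximated on a grid over Theta = (0, 1).
import numpy as np

k, n = 7, 10                                  # assumed data: 7 successes in 10 trials
theta = np.linspace(0.001, 0.999, 999)        # grid over the parameter space

lik = theta**k * (1 - theta)**(n - k)         # lik(theta) proportional to P_theta(data)
lik = lik / lik.max()                         # relative likelihood, sup = 1

theta_ml = theta[np.argmax(lik)]              # maximum likelihood estimate

# squared error loss L(theta, d) = (theta - d)^2 on the decision grid D = Theta
loss = (theta[:, None] - theta[None, :])**2   # loss[i, j] = L(theta_i, d_j)

# MLD criterion: minimize L(theta_ML, d); here this returns d = theta_ML
d_mld = theta[np.argmin(loss[np.argmax(lik)])]

# Bayesian criterion with a uniform prior pi: minimize  int L(theta, d) lik(theta) dpi(theta)
post_risk = (lik[:, None] * loss).mean(axis=0)
d_bayes = theta[np.argmin(post_risk)]

print(theta_ml, d_mld, d_bayes)               # about 0.70, 0.70, 0.67
```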
nonadditive measures and integrals
- the nonadditive measure on 2^Θ defined by λ(H) = sup_{θ ∈ H} lik(θ) for all H ⊆ Θ is used in particular in the likelihood ratio test
- likelihood-based decision criterion: make the decision d′ = arg min_{d ∈ D} ∫ L(θ, d) dλ(θ)
- the resulting decision functions satisfy in particular asymptotic optimality (consistency), parametrization invariance, equivariance, and asymptotic efficiency

MPL decision criterion
- λ is completely maxitive: λ(⋃_{H ∈ S} H) = sup_{H ∈ S} λ(H) for all S ⊆ 2^Θ
- hence the Shilkret integral is ∫^S L(θ, d) dλ(θ) = sup_{θ ∈ Θ} lik(θ) L(θ, d)
- MPL criterion: make the decision d_MPL = arg min_{d ∈ D} sup_{θ ∈ Θ} lik(θ) L(θ, d) (see the sketch below)
- the MPL criterion is the only likelihood-based decision criterion satisfying the sure-thing principle: if d is optimal with respect to the set P of probabilistic models, and d is also optimal with respect to P′, then d is optimal with respect to P ∪ P′
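The following sketch (again my own illustration, continuing the hypothetical binomial grid example above) approximates the Shilkret integral sup_θ lik(θ) L(θ, d) by a maximum over the grid and minimizes it over d, giving the MPL decision next to the MLD decision for comparison.

```python
# Minimal sketch of the MPL criterion on the assumed binomial grid example.
import numpy as np

k, n = 7, 10
theta = np.linspace(0.001, 0.999, 999)
lik = theta**k * (1 - theta)**(n - k)
lik = lik / lik.max()                         # relative likelihood

loss = (theta[:, None] - theta[None, :])**2   # L(theta_i, d_j), squared error

# Shilkret integral of the loss w.r.t. lambda: sup_theta lik(theta) * L(theta, d)
shilkret = (lik[:, None] * loss).max(axis=0)

d_mpl = theta[np.argmin(shilkret)]            # MPL decision
d_mld = theta[np.argmax(lik)]                 # MLD decision, for comparison
print(d_mpl, d_mld)
```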
example: estimation of variance components
- estimation of the variance components in the 3 × 3 random effects one-way layout, under normality assumptions and weighted squared error loss: X_ij = µ + α_i + ε_ij for all i, j ∈ {1, 2, 3}
- normality assumptions: α_i ∼ N(0, v_a), ε_ij ∼ N(0, v_e), all independent ⇒ X_ij ∼ N(µ, v_a + v_e), but the X_ij are not independent; µ ∈ (−∞, ∞), v_a, v_e ∈ (0, ∞)
- estimates v̂_e and v̂_a of the variance components v_e and v_a are functions of
  SS_e = Σ_{i=1}^3 Σ_{j=1}^3 (x_ij − x̄_i·)²  and  SS_a = 3 Σ_{i=1}^3 (x̄_i· − x̄_··)²,
  where x̄_i· = (1/3) Σ_{j=1}^3 x_ij, x̄_·· = (1/9) Σ_{i=1}^3 Σ_{j=1}^3 x_ij,
  and SS_e / v_e ∼ χ²_6, (1/3) SS_a / (v_a + (1/3) v_e) ∼ χ²_2
- invariant loss functions: L(v_e, v̂_e) = 3 (v_e − v̂_e)² / v_e²  and  L(v_a, v̂_a) = (v_a − v̂_a)² / (v_a + (1/3) v_e)²

[Figure: estimates v̂_e/(SS_a + SS_e) and v̂_a/(SS_a + SS_e) plotted against SS_a/(SS_a + SS_e), comparing the MPL, ANOVA, ANOVA+, MINQUE, ML, and ReML estimators.]

[Figure: standardized risks 3 E[(v̂_e − v_e)²]/v_e² and E[(v̂_a − v_a)²]/(v_a + (1/3) v_e)² plotted against v_a/(v_a + v_e), for the same estimators.]
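As a small illustration of the sampling setup of this example (not taken from the slides), the sketch below simulates one 3 × 3 layout, computes the sufficient statistics SS_e and SS_a, and evaluates the classical ANOVA (method-of-moments) estimates, which are among the estimators compared in the figures above. The true values µ, v_a, v_e and the random seed are assumptions made only for the simulation; the MPL estimates would be different functions of the same two statistics.

```python
# Minimal sketch (illustration only): sufficient statistics SS_e, SS_a for the
# 3x3 one-way random effects layout, and the classical ANOVA estimates.
import numpy as np

rng = np.random.default_rng(0)
mu, v_a, v_e = 0.0, 1.0, 1.0                  # assumed true values, for simulation only
alpha = rng.normal(0.0, np.sqrt(v_a), size=3)
x = mu + alpha[:, None] + rng.normal(0.0, np.sqrt(v_e), size=(3, 3))

xbar_i = x.mean(axis=1)                       # row (group) means
xbar = x.mean()                               # grand mean

ss_e = ((x - xbar_i[:, None])**2).sum()       # SS_e / v_e ~ chi^2_6
ss_a = 3 * ((xbar_i - xbar)**2).sum()         # SS_a / (3 v_a + v_e) ~ chi^2_2

# ANOVA (method-of-moments) estimates: E[SS_e] = 6 v_e, E[SS_a] = 2 (3 v_a + v_e)
v_e_hat = ss_e / 6
v_a_hat = (ss_a / 2 - v_e_hat) / 3
print(ss_e, ss_a, v_e_hat, v_a_hat)
```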
present and future research
- comparison/combination with other approaches in various applications
- application to robustness problems (relaxation of the i.i.d. assumption, robust likelihood)
- application to imprecise probability theory (better updating, possibilistic previsions, convex sets of non-normalized measures)
- application to graphical models (probabilistic and non-probabilistic aspects of uncertainty)
- application to financial risk measures (derivation and interpretation of convex risk measures)