Generalized Inference
with Application to Small Sample Situations
Sam Weerahandi
(Joint Work with Kawai, Yu et al., Mathew et al.)
Draft – Work in Progress – Confidential – Do Not Distribute
Outline
Motivation: Why Generalize?
Problems with Classical & Bayesian Inferences
Introduction to Generalized Inference
About Mixed Models
Mixed Models: An Overview
Issues with MLE based Inference
Application: BLUP in Mixed Models
Performance Comparison
An Application
Motivation: Why Generalize Classical Inference
STAT 200 teaches how to make ANY inference! Really?
Classical Approach to Inference (tests, confidence intervals, etc.)
works fine with the mean μ and variance σ² of the Normal distribution
But it fails (MLE-based inferences are only asymptotic) with
most functions of the mean and variance, except for a few simple ones
advanced models such as Mixed Models and ANOVA with unequal variances
Classical Approach also fails to give small-sample inferences with
non-normal distributions:
Some functions of the parameters of the Uniform distribution U(α, β)
The scale parameter of the Gamma distribution, the parameters of the Weibull distribution, etc.
One can find various solutions in the literature, but the approaches vary
from one problem to another
What is desirable is a systematic approach that works for a greater
class of functions of parameters
Further Issues With Classical Inference
and Bayesian Inference
Classical Inference can provide only large sample inference for
ANOVA with unequal variances
Variance Components in Mixed Models
BLUPs in Mixed Models
Classical Inference could yield wrong signs in Small Sample Inference
• In multi-regional clinical trials, some regions could yield a negative dose response due to chance
• The estimated response to a TV Ad could become negative in some markets even if there is no reason why the Ad would alienate any demographic segment
Bayesian Inference can provide small sample inference, but
You need a prior
When a non-informative prior is used with such algorithms as MCMC, it
– takes days to estimate when the model has a large number of parameters
– yields fairly different estimates when somewhat different hyperparameters are used
– yields fairly different estimates with different families of priors
Why not take the classical approach, but think like Bayesians?
Motivation (ctd.)
Multi-regional clinical trial example (ctd.)
If you run LSE you may not even get the right sign for some Regions
The problem could be alleviated by using the same data in a Mixed Model
setting
Then you will get much more reasonable estimates (rather, BLUPs)
In fact, LSE could yield the wrong sign even with two parameters:
Simulation from the exact model Y = 10 + 0.05X + e with sample size 500 and e ~ N(0,1), as in the sketch below:
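To see how often this happens, here is a minimal simulation sketch in Python (hypothetical, assuming a standardized regressor X ~ N(0,1)):

```python
import numpy as np

rng = np.random.default_rng(0)
n, n_sims = 500, 10_000
wrong_sign = 0
for _ in range(n_sims):
    x = rng.normal(size=n)                  # assumed standardized regressor
    y = 10 + 0.05 * x + rng.normal(size=n)  # exact model, e ~ N(0, 1)
    slope = np.polyfit(x, y, 1)[0]          # least squares slope
    wrong_sign += slope < 0
print(f"LSE slope had the wrong sign in {100 * wrong_sign / n_sims:.1f}% of samples")
```

With these settings the slope's standard error is about 1/√500 ≈ 0.045, so a negative estimate occurs in roughly 13% of samples purely by chance.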
Mixed Models and BLUP (Best Linear Unbiased Predictor) are heavily used in high noise &
small sample applications
But REML/ML frequently yield zero/negative variance components
BLUPs fail or all become equal
REML/ML could be inaccurate when factor variance is relatively small
An Introduction to Generalized Inference
Classical pivotals for interval estimation are of the form Q = Q(X, q)
Generalized Inference on a parameter q is based on a generalized pivotal of the
form Q = Q(X, x, q, z), a function of the observable X, the observed value x,
the parameter q, and nuisance parameters z,
satisfying Q(x, x, q, z) is free of z
having a distribution free of z
Classical extreme regions
are of the form Q(X, q0) < Q(x, q0)
cannot produce all extreme regions
Q(X, x, q0, z) < Q(x, x, q0, z) yields a greater class of extreme regions
Generalized Tests and Intervals are based on exact probability statements
on Q
Generalized Estimators are based on transformed Generalized Pivotals
If Q or a transformation of it satisfies Q(x, x, z) = q, then q is estimated using
E(Q), the expected value of Q, the median of Q, etc.
Generalized Inference: A Simple Example
Suppose you have a sample from X ~ N(μ, σ²)
How do you make inferences about ρ = μ/σ, the reciprocal of the coefficient of
variation, based on the sample mean and the sample variance?
Despite the simple distributional results
X̄ ~ N(μ, σ²/n) and U = (n − 1)S²/σ² ~ χ²(n − 1),
if you start out with the MLE, X̄/S, it will lead to just asymptotic inferences
But note that
R = (x̄/s)·√(U/(n − 1)) − Z/√n, where Z = √n(X̄ − μ)/σ ~ N(0, 1),
is a Generalized Pivotal Quantity (GPQ), because (i) at the observed values
R reduces to ρ, and (ii) the distribution of R is free of unknown parameters
So any inference is possible. For example, Pr(R ≤ ρ) yields an exact one-sided Generalized Confidence Interval (GCI)
In fact the above is a classical CI, but the MLE failed to produce it
Note: an exact CI does not always exist, but you may still be able to obtain
an exact GCI. In such cases the GCI tends to outperform more complicated
approximations in terms of Repeated Sampling Properties. A Monte Carlo sketch of the GCI computation follows.
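As a concrete illustration, here is a minimal Monte Carlo sketch of the GCI for ρ (the summary statistics n, x̄, s are hypothetical; the GPQ R is the one defined above):

```python
import numpy as np

rng = np.random.default_rng(1)
n, xbar, s = 20, 4.2, 1.5                  # hypothetical observed statistics

# Simulate the GPQ: R = (xbar/s) * sqrt(U/(n-1)) - Z/sqrt(n)
Z = rng.normal(size=100_000)               # Z ~ N(0, 1)
U = rng.chisquare(df=n - 1, size=100_000)  # U ~ chi-square(n - 1)
R = (xbar / s) * np.sqrt(U / (n - 1)) - Z / np.sqrt(n)

# Percentiles of R give generalized confidence bounds for rho = mu/sigma
print("95% lower confidence bound:", np.percentile(R, 5))
print("generalized estimate (median of R):", np.median(R))
```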
Generalized Inference (ctd.)
The case Q(x, x, z) = q is too restrictive except for location parameters
More generally, if Q(x, x, q, z) = 0, then the solution of E{Q(X, x, q, z)} = 0 is
said to be the Generalized Estimate of q
Note: as in classical estimation, one will have a choice of estimates and
needs to find one satisfying such desirable conditions as minimum MSE
A major advantage of GE is that, as in Bayesian Inference, it can assure,
via conditional expectation, any known signs of parameters:
Variance components are positive
The variance ratio in the BLUP is between 0 and 1
GE can produce inferences based on exact probabilities for distributions
such as the Gamma, Weibull, and Uniform
To do so you DO NOT need a prior or have to deal with hyperparameters
Read more about Generalized Inference
at www.weerahandi.org and even read my second book FREE!
Application 1: Small Response Estimation When the Parameter
Sign Is Known
Problems with known signs of parameters often arise in practice:
Price Elasticity of demand
Response to promotional tactics
Difference between a Treatment and Placebo effects
Adverse effect of a treatment
Assume that a regression parameter q is supposed to be positive
Let q̂ be the LSE of q, with standard error s, so that T = q̂ ~ N(q, s²)
Suppose q > a (e.g., a = 0 if the sign is known)
Kim (2008) showed that the Bayesian Estimate under an appropriate
non-informative prior is
q̃ = q̂ + s·φ((q̂ − a)/s) / Φ((q̂ − a)/s)
The above estimate is always greater than a (positive when a = 0)
The same estimate can be obtained by considering the
Generalized Pivotal Q = q̂ − (T − q), which has observed value q, and
taking the conditional expectation E(Q | Q > a), as in the sketch below
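A minimal sketch of this estimate (the closed form above; q_hat and s are hypothetical inputs):

```python
from scipy.stats import norm

def generalized_estimate(q_hat, s, a=0.0):
    """E(Q | Q > a) where Q = q_hat - (T - q) ~ N(q_hat, s^2):
    the mean of a normal truncated below at a."""
    z = (q_hat - a) / s
    return q_hat + s * norm.pdf(z) / norm.cdf(z)

# Hypothetical LSE of a response whose sign is known to be positive:
print(generalized_estimate(q_hat=-0.3, s=1.0))  # ~0.70, positive despite a negative LSE
```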
Small Response Estimation (ctd)
Moreover, such classical estimators can be
further improved by taking a Stein (1961)-like
approach:
Consider a class of estimators indexed by a constant k
Find k by the Stein approach
The resulting estimator is denoted as the IGE
[Figure: MSE performance of the competing estimators when s = 1]
As evident from the MSE (Mean Squared Error)
comparison, the IGE is uniformly better than the LSE
when the parameter is known to be positive
The (truncated) MLE can also be improved upon
In interval estimation the approach provides
shorter intervals
Applications in Mixed Models
Mixed Models are especially useful in applications involving
large samples with noisy data
small samples with low noise
In Clinical Research & Public Health Studies, Mixed Models can yield
results of greater accuracy in estimating effects by
Treatment levels
Patient groups
In Sales & Marketing, Mixed Models are heavily used to estimate
response to promotional tactics:
– Advertisements (TV, Magazine, Web) by Market
– Doctors' response to Field Rep Detailing
In fact, if you don't use Mixed Models in these types of applications
you may get unreliable or junk estimates, tests, and intervals
So, the BLUP has replaced the LSE as the most widely used
statistical technique
An Example
Suppose you are asked to estimate the effect of a
TV/Magazine Ad for every Market/District using a
model of longitudinal sales data on ad-stocked exposure
If you run LSE you may not even get the right
sign of the estimates for 40% of Markets
If you formulate in a Mixed Model setting you will get
much more reliable estimates
So, use Mixed Models and BLUP instead of LSE
Mixed Models and the BLUP (Best Linear Unbiased Predictor) are heavily used
in high noise & small sample applications
In analysis of promotions, SAS Proc Mixed or R/S+ Lme is used more than any
other procedure
But REML/ML frequently yield zero/negative variance components
BLUPs fail or all become equal
REML/ML could be inaccurate when factor variance is relatively small
Overview of Mixed Models
Suppose certain groups/segments are distributed around their
parent
Assumption in Mixed Models: Random effects are Normally
distributed around the mean, the parent estimate, say M
Suppose Regression by Groups yields estimate Mi for
Segment i
Let Vs be the between-segment variance and Ve be the
error variance, which are known as the Variance Components
It can be shown that the Best Linear Unbiased Predictor (BLUP)
of the Segment i effect is
(Ve·M + k·Vs·Mi) / (Ve + k·Vs),
a weighted average of the two estimates, where k is a known
constant that depends on the sample size and group data
The above is a shrinkage estimate that moves extreme
estimates towards the parent estimate; a small numeric illustration follows
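A tiny sketch of the shrinkage (all values hypothetical):

```python
def blup(M, M_i, Vs, Ve, k):
    """Weighted average of parent estimate M and segment estimate M_i."""
    return (Ve * M + k * Vs * M_i) / (Ve + k * Vs)

# A noisy segment estimate of 3.0 is pulled toward the parent estimate of 1.0:
print(blup(M=1.0, M_i=3.0, Vs=0.2, Ve=2.0, k=10))  # 2.0
```

The larger the error variance Ve is relative to k·Vs, the more the segment estimate is shrunk toward the parent.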
Problem in Mixed Model Inference
The BLUP in a Mixed Model is a function of the Variance Components
Classical estimates of the factor variance can become negative
when the noise (error variance) is large and/or the sample size is
small
Then ML and REML fail: PROC MIXED will complain about
non-convergence or will yield equal BLUPs for all segments
I tried the Bayesian approach with MCMC, but when I did a
sanity check
(i) by changing the hyperparameters OR (ii) by using a Gamma-type
prior in place of the log-normal, I got very different estimates
After both the Classical & Bayesian approaches failed me, I
wrote a paper about "Generalized Point Estimation", which can
Assure that estimates fall into the parameter space
Take advantage of known signs of parameters without any
prior
Improve the MSE of estimates by adopting such classical
methods as the Stein method
Estimating Variance Components and BLUPs
For simplicity consider a balanced Mixed Model (one-way random
effects with k segments and n observations per segment)
The inference problem in canonical form reduces to two independent
chi-squared statistics:
Sb/(σe² + n·σs²) ~ χ²(a) and Se/σe² ~ χ²(e),
where Sb and Se are the between- and within-segment sums of squares with
degrees of freedom a and e, and the classical unbiased estimate of the
factor variance is σ̂s² = (Sb/a − Se/e)/n
The Generalized approach can produce the above estimate or better estimates
The Generalized Pivotal Quantity
Q = σs² − (sb/Ub − se/Ue)/n, where Ub = Sb/(σe² + n·σs²) and Ue = Se/σe²,
reduces to 0 at the observed values; hence it is a Generalized Estimator, and
solving E(Q) = 0 yields an unconditional estimate analogous to the classical one
But the drawback of the classical estimate is that
the MLE/UE frequently yields negative estimates
The conditional E(Q|C) = 0, where C is the known knowledge that σs² ≥ 0,
yields an estimate that always stays in the parameter space (see the sketch after this list)
BLUPs are then obtained as a weighted average of the Least Squares Estimates of the Parent
and the Child
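A minimal Monte Carlo sketch of the conditional generalized estimate, assuming the standard one-way GPQ R = (sb/Ub − se/Ue)/n described above (inputs hypothetical; the exact estimator in the papers may differ in detail):

```python
import numpy as np

rng = np.random.default_rng(2)

def gen_var_component(sb, se, a, e, n, sims=200_000):
    """Factor-variance estimate from the GPQ R = (sb/Ub - se/Ue)/n,
    conditioned on the known constraint that the variance is >= 0."""
    Ub = rng.chisquare(a, sims)  # Sb/(sigma_e^2 + n*sigma_s^2) ~ chi2(a)
    Ue = rng.chisquare(e, sims)  # Se/sigma_e^2 ~ chi2(e)
    R = (sb / Ub - se / Ue) / n
    return R[R >= 0].mean()      # E(R | R >= 0), never negative

# Hypothetical sums of squares from k = 5 segments with n = 4 observations each:
k, n = 5, 4
print(gen_var_component(sb=30.0, se=20.0, a=k - 1, e=k * (n - 1), n=n))
```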
Comparison of Variance Estimation Methods (based on
10,000 simulated samples): Performance of MLE Vs. GE
Assume a One-Way Random
Effects model with
k segments
n data points from each segment
Degrees of freedom a = k − 1
and e = k(n − 1)
The variance component is
estimated by the MLE and the
GE
Note that with small sample
sizes the MLE/UE yield negative
estimates of the Variance
Component
In such situations SAS does
not provide estimates or
BLUPs (it just says "did not
converge")
Comparison of Variance Estimation Methods:
Performance of ML/REML Vs. GE (ctd.)
The table below shows the MSE performance of the competing estimators of the factor variance
Note that
the Generalized Estimate is better than any other estimate
REML is not as good as ML
For estimation of the BLUP, Yu, Zou, Carlson, and Weerahandi (2013) provide
similar improvements over ML and REML
GE-based methods do not suffer from the zero-variance drawback of ML and REML
Further Issues with BLUP
ML and REML Prediction Intervals for the BLUP are highly conservative:
The actual coverage of intended 95% intervals can be as large as 100%
This implies a serious lack of power in Testing of Hypotheses
The drawback prevails unless the number of groups tends to infinity
Generalized Intervals proposed by Mathew, Gamage, and Weerahandi (2012) can
rectify the drawback
The table below shows the performance of the competing estimates
Application: Estimation of Response to TV Ads by Market
Data Preparation:
Obtain TV GRP and weekly/monthly Sales data by market
Ad-stock (e.g. http://en.wikipedia.org/wiki/Advertising_adstock) TV GRP
Obtain data for other variables that you want to control for
De-mean all variables including ad-stocked GRP
Approach to Modeling:
Model Sales or log sales as a linear function of all explanatory variables, including trend
and seasonality in sales
Model the coefficients of ad-stocked GRP as random effects around the national average
Estimate the parameters of the Mixed Model by such methods as ML if there is no
convergence problem, and by the proposed generalized method otherwise
Use the estimated responses to TV to write down the profit function
Demo
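For the demo, a minimal sketch of the modeling step using Python's statsmodels (the column names and CSV file are hypothetical; the deck's analyses use SAS PROC MIXED or R lme for the same kind of fit):

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical weekly data with columns: sales, adstock_grp, trend, season, market
df = pd.read_csv("market_sales.csv")

# De-mean the continuous variables, as in the data-preparation step above
for col in ["sales", "adstock_grp", "trend"]:
    df[col] = df[col] - df[col].mean()

# Random ad-stock slope by market around the national average response
model = smf.mixedlm("sales ~ adstock_grp + trend + C(season)",
                    data=df, groups=df["market"], re_formula="~adstock_grp")
fit = model.fit(reml=True)

print(fit.summary())       # fixed effect of adstock_grp = national average response
print(fit.random_effects)  # market-level deviations used to form the BLUPs
```

If the REML fit fails to converge (the zero-variance problem discussed earlier), the proposal on this slide is to fall back on the generalized method.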