
Sampling Models for the Population Mean
Ed Stanek
UMASS Amherst
Basic Problem (Population Mean)

Population listing and latent values:

Subject   Latent value
Rose      y_Rose
Lily      y_Lily
Daisy     y_Daisy

Data: y_Rose

What is $\mu$?
\[ \mu = \frac{y_{\mathrm{Rose}} + y_{\mathrm{Lily}} + y_{\mathrm{Daisy}}}{3} \]
Basic Problem (Population Mean)
Some Notation

Population: the set of subjects in the population.
Label: $j = 1, \ldots, N$, with label set $L = \{ j \mid j = 1, \ldots, N \}$.

Listing:
\[ \boldsymbol{\lambda}_0 = (j) = \begin{pmatrix} 1 \\ 2 \\ 3 \end{pmatrix} = \begin{pmatrix} \mathrm{Rose} \\ \mathrm{Lily} \\ \mathrm{Daisy} \end{pmatrix} \]

Latent values:
\[ \mathbf{y}_0 = (y_j) = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Lily}} \\ y_{\mathrm{Daisy}} \end{pmatrix} \]

Using set notation:
\[ \mu = \frac{1}{N} \sum_{j \in L} y_j = \frac{y_{\mathrm{Lily}} + y_{\mathrm{Rose}} + y_{\mathrm{Daisy}}}{N} \]

Using vector notation:
\[ \mu = \frac{1}{N} \sum_{j=1}^{N} y_j \]

Assumption: The response is equal to the latent value for the subject. There is no measurement error.
Sampling Model
• Select a simple random sample without
replacement of size n
– Define an estimator that is a linear function of
the sample data
– Require the estimator to be unbiased
– Determine coefficients that minimize the
variance (over all possible samples)
• Best Linear Unbiased Estimator (BLUE)
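The unbiasedness requirement in the list above can be checked by brute force on the three-subject population. The sketch below is only an illustration: it borrows the numeric latent values used later in the deck (y_Lily = 10, y_Daisy = 8, y_Rose = 6), and the function names are hypothetical. It enumerates every simple random sample without replacement and averages the sample mean over all of them.

```python
from fractions import Fraction
from itertools import combinations

# Illustrative latent values (taken from the reference-set example in the deck).
y = {"Rose": Fraction(6), "Lily": Fraction(10), "Daisy": Fraction(8)}
N = len(y)
mu = sum(y.values()) / N  # population mean


def srs_sample_means(values, n):
    """Sample mean of every size-n sample drawn without replacement."""
    return [sum(s) / n for s in combinations(values, n)]


def expected_sample_mean(values, n):
    """Average of the sample mean over all equally likely samples."""
    means = srs_sample_means(values, n)
    return sum(means) / len(means)


for n in (1, 2, 3):
    # For every sample size the average over all samples equals mu.
    print(n, expected_sample_mean(list(y.values()), n), mu)
```

Exact fractions are used so that the comparison with the population mean is not affected by floating-point rounding.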
Sampling Model
Select a simple random sample without
replacement
All possible permutations of subjects (R = Rose, L = Lily, D = Daisy):

Order i   Potential response   p*=1  p*=2  p*=3  p*=4  p*=5  p*=6
i = 1     Y_1                  R     R     L     L     D     D
i = 2     Y_2                  L     D     R     D     R     L
i = 3     Y_3                  D     L     D     R     L     R

Probability of a permutation: define the indicator
\[ I_{p^*} = \begin{cases} 1 & \text{if } \mathbf{Y} = \mathbf{u}_{p^*} \mathbf{y}_0 \\ 0 & \text{otherwise} \end{cases} \]
so that
\[ E(I_{p^*}) = p_{p^*} = \frac{1}{N!} \quad \text{for all } p^* = 1, 2, \ldots, N! \]
Sampling Model
Select a simple random sample without
replacement
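All possible permutations of latent values

Potential response:
\[ \mathbf{Y} = (Y_i) = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \sum_{p^*=1}^{N!} I_{p^*} \mathbf{u}_{p^*} \mathbf{y}_0, \qquad \mathbf{y}_0 = (y_j) = \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Lily}} \\ y_{\mathrm{Daisy}} \end{pmatrix} \]

Values of u_{p*} y_0 for each permutation:

       p*=1     p*=2     p*=3     p*=4     p*=5     p*=6
Y_1    y_Rose   y_Rose   y_Lily   y_Lily   y_Daisy  y_Daisy
Y_2    y_Lily   y_Daisy  y_Rose   y_Daisy  y_Rose   y_Lily
Y_3    y_Daisy  y_Lily   y_Daisy  y_Rose   y_Lily   y_Rose

Examples of the permutation matrices:
\[ \mathbf{u}_1 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \mathbf{u}_1 \mathbf{y}_0 = \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Lily}} \\ y_{\mathrm{Daisy}} \end{pmatrix} \]
\[ \mathbf{u}_2 = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 1 & 0 \end{pmatrix}, \qquad \mathbf{u}_2 \mathbf{y}_0 = \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Daisy}} \\ y_{\mathrm{Lily}} \end{pmatrix} \]
\[ \mathbf{u}_6 = \begin{pmatrix} 0 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 0 & 0 \end{pmatrix}, \qquad \mathbf{u}_6 \mathbf{y}_0 = \begin{pmatrix} y_{\mathrm{Daisy}} \\ y_{\mathrm{Lily}} \\ y_{\mathrm{Rose}} \end{pmatrix} \]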
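The permutation representation of the potential responses can be reproduced in a few lines. The following is a minimal sketch rather than the author's code: the helper names are hypothetical and the numeric latent values (6, 10, 8 for Rose, Lily, Daisy) are the illustrative ones used later in the deck. It builds each permutation matrix, applies it to the latent-value vector, and averages over the N! equally likely permutations.

```python
from fractions import Fraction
from itertools import permutations

# (y_Rose, y_Lily, y_Daisy): illustrative values, not data from the slides.
y0 = [Fraction(6), Fraction(10), Fraction(8)]
N = len(y0)


def permutation_matrix(order):
    """Row i has a 1 in column order[i], so (u y0)[i] == y0[order[i]]."""
    return [[1 if j == order[i] else 0 for j in range(N)] for i in range(N)]


def matvec(u, y):
    """Matrix-vector product u y."""
    return [sum(u[i][j] * y[j] for j in range(N)) for i in range(N)]


orders = list(permutations(range(N)))   # N! = 6 orderings, lexicographic
prob = Fraction(1, len(orders))         # E(I_p*) = 1/N!
realizations = [matvec(permutation_matrix(o), y0) for o in orders]

# E(Y_i) = sum over p* of (1/N!) * (u_p* y0)_i
expected_Y = [sum(prob * r[i] for r in realizations) for i in range(N)]
print(expected_Y)
```

The lexicographic order produced by `itertools.permutations` happens to match the p* = 1, ..., 6 columns of the permutation table, so `orders[5]` corresponds to the matrix shown for p* = 6.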
Sampling Model
Select a simple random sample without
replacement
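All possible permutations:
\[ \mathbf{Y} = \sum_{p^*=1}^{N!} I_{p^*} \mathbf{u}_{p^*} \mathbf{y}_0 = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} \]

Each potential response can be written as
\[ Y_i = E(Y_i) + E_i \]

Order i   Potential response
i = 1     Y_1
i = 2     Y_2
i = 3     Y_3

The permuted vector is split into two parts: the first n positions are the data, and the remaining N - n positions are the remainder.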
Sampling Model
Select a simple random sample without
replacement
• Represent the population as a vector of random variables.
• The random variables are indexed by their position, not by the label of the subject in that position.
• The subject corresponding to a random variable cannot be identified.

Sample size n = 1:
\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}: \quad \text{data } Y_1 \ (\text{position } i = 1), \quad \text{remainder } (Y_2, Y_3)' \]

Sample size n = 2:
\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix}: \quad \text{data } (Y_1, Y_2)', \quad \text{remainder } Y_3 \]
Sampling Model
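Define the Target

Linear combination of population random variables:
\[ P = \mathbf{g}'\mathbf{Y} = \sum_{i=1}^{N} g_i Y_i, \qquad \mathbf{g}' = (g_1 \ \ g_2 \ \ \cdots \ \ g_N) \]
• May be a parameter
• May be a random variable

Special case: mean (parameter)
\[ g_i = \frac{1}{N} \ \text{for all } i = 1, \ldots, N, \qquad P = \frac{1}{N} \mathbf{1}_N' \mathbf{Y} = \frac{1}{N} \sum_{i=1}^{N} Y_i = \mu \]

Special case: latent value for the randomly selected subject in position i (random variable)
\[ g_i = 1, \quad g_{i^*} = 0 \ \text{for all } i^* \neq i, \qquad P = Y_i \]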
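The distinction between a target that is a parameter and a target that is a random variable can be seen by evaluating P = g'Y on every permutation. The sketch below assumes the illustrative latent values 6, 10, 8 (Rose, Lily, Daisy); the function name `target` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

y0 = [Fraction(6), Fraction(10), Fraction(8)]  # illustrative latent values
N = len(y0)

# Every realization of Y: one permuted copy of y0 per permutation.
realizations = [[y0[j] for j in order] for order in permutations(range(N))]


def target(g, Y):
    """Linear combination P = g'Y."""
    return sum(gi * Yi for gi, Yi in zip(g, Y))


g_mean = [Fraction(1, N)] * N   # g_i = 1/N for all i
g_first = [1, 0, 0]             # g_1 = 1, all other g_i = 0

# Distinct values the target takes across the N! permutations.
mean_targets = {target(g_mean, Y) for Y in realizations}
first_targets = {target(g_first, Y) for Y in realizations}

print(mean_targets)   # a single value: the target is a parameter
print(first_targets)  # several values: the target is a random variable
```

With g_i = 1/N the target takes the same value on every permutation, while with g_1 = 1 it takes each latent value in turn.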
Sampling Model
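Expected Value

\[ Y_i = E(Y_i) + E_i \]
\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}, \qquad \mathbf{Y}_I = \text{data}, \quad \mathbf{Y}_{II} = \text{remainder} \]
\[ \mathbf{Y} = E(\mathbf{Y}) + \mathbf{E} = \mathbf{X}\beta + \mathbf{E} \qquad \text{(linear link function: } E(\mathbf{Y}) = \mathbf{X}\beta \text{)} \]

Under SRS without replacement:
\[ E(Y_i) = \mu, \qquad E(\mathbf{Y}) = \mathbf{1}_N \mu, \qquad \text{so } \mathbf{X} = \mathbf{1}_N, \ \beta = \mu \]

Expected value of the partitioned vector:
\[ E \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix} \mu = \begin{pmatrix} \mathbf{1}_n \\ \mathbf{1}_{N-n} \end{pmatrix} \mu \]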
Sampling Model
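Variance

\[ Y_i = E(Y_i) + E_i, \qquad \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} \]

Variance of the population vector:
\[ \mathrm{var}(\mathbf{Y}) = \sigma^2 \left( \mathbf{I}_N - \frac{1}{N} \mathbf{J}_N \right) = \sigma^2 \mathbf{P}_N \]
where
\[ \mathbf{P}_N = \mathbf{I}_N - \frac{1}{N} \mathbf{J}_N, \qquad \sigma^2 = \frac{1}{N-1} \sum_{s=1}^{N} (y_s - \mu)^2 \]
The term $-\frac{1}{N}\mathbf{J}_N$ is the term due to the finite population correction factor.

Variance of the partitioned vector (data and remainder):
\[ \mathrm{var} \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \sigma^2 \begin{pmatrix} \mathbf{I}_n - \frac{1}{N} \mathbf{J}_n & -\frac{1}{N} \mathbf{1}_n \mathbf{1}_{N-n}' \\ -\frac{1}{N} \mathbf{1}_{N-n} \mathbf{1}_n' & \mathbf{I}_{N-n} - \frac{1}{N} \mathbf{J}_{N-n} \end{pmatrix} = \begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{II,I} & \mathbf{V}_{II} \end{pmatrix} \]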
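The covariance structure var(Y) = sigma^2 (I_N - J_N / N) can be confirmed by enumerating all permutations. This is a sketch under stated assumptions: the latent values 6, 10, 8 are the illustrative ones from the reference-set example, and the helper `cov` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

y0 = [Fraction(6), Fraction(10), Fraction(8)]  # illustrative latent values
N = len(y0)
mu = sum(y0) / N
sigma2 = sum((v - mu) ** 2 for v in y0) / (N - 1)  # divisor N - 1, as on the slide

realizations = [[y0[j] for j in order] for order in permutations(range(N))]
prob = Fraction(1, len(realizations))  # each permutation has probability 1/N!


def cov(i, k):
    """Covariance of Y_i and Y_k over all permutations (uses E(Y_i) = mu)."""
    return sum(prob * (Y[i] - mu) * (Y[k] - mu) for Y in realizations)


enumerated = [[cov(i, k) for k in range(N)] for i in range(N)]

# sigma^2 * (I_N - J_N / N), written entry by entry.
formula = [
    [sigma2 * ((1 if i == k else 0) - Fraction(1, N)) for k in range(N)]
    for i in range(N)
]

print(enumerated == formula)
```

The off-diagonal entries are negative: knowing that one position holds a large latent value makes the other positions more likely to hold smaller ones.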
Sampling Model
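Expected Value and Variance
Reference Sets

Expectation is evaluated over a reference set.
Reference set: the set of possible values that the sample random variables can have with positive probability.

Example:
\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}, \qquad \mathbf{Y}_I = \text{data} \]
If $n = 1$, then $\mathbf{Y}_I = Y_1$, and the reference set for $\mathbf{Y}_I = Y_1$ is
\[ \{ y_{\mathrm{Lily}}, \ y_{\mathrm{Rose}}, \ y_{\mathrm{Daisy}} \} \]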
Sampling Model
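Expected Value and Variance:
Reference Sets

\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}, \qquad n = 1, \quad \mathbf{Y}_I = Y_1 \]

Reference set for $\mathbf{Y}_I = Y_1$:
\[ \{ y_{\mathrm{Lily}}, \ y_{\mathrm{Rose}}, \ y_{\mathrm{Daisy}} \}, \qquad P(\text{reference element}) = \frac{1}{3} \]

The expected value is the probability-weighted sum over the reference elements:
\[ E(\mathbf{Y}_I) = E(Y_1) = \sum_{\text{reference elements}} P(\text{reference element}) \, y_{\text{reference element}} = \frac{1}{3} y_{\mathrm{Lily}} + \frac{1}{3} y_{\mathrm{Rose}} + \frac{1}{3} y_{\mathrm{Daisy}} \]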
Sampling Model
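Expected Value and Variance
Reference Sets

Example when $n = 2$:
\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}, \qquad \mathbf{Y}_I = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \ \text{(data)} \]

Reference set for $\mathbf{Y}_I$ (sets of possible latent values):
\[ \{ (y_{\mathrm{Lily}}, y_{\mathrm{Rose}}), \ (y_{\mathrm{Lily}}, y_{\mathrm{Daisy}}), \ (y_{\mathrm{Rose}}, y_{\mathrm{Daisy}}) \} \]

If $y_{\mathrm{Lily}} = 10$, $y_{\mathrm{Daisy}} = 8$, and $y_{\mathrm{Rose}} = 6$, the reference set for $\mathbf{Y}_I$ is
\[ \{ (10, 6), \ (10, 8), \ (6, 8) \} \]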
Sampling Model
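Expected Value and Variance
Reference Sets vs Sequence

Example when $n = 2$:
\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}, \qquad \mathbf{Y}_I = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \ \text{(data)} \]

Permutations (sequences), with L = Lily, R = Rose, D = Daisy:

       p*=1  p*=2  p*=3  p*=4  p*=5  p*=6
Y_1    L     L     R     R     D     D
Y_2    R     D     L     D     L     R
Y_3    D     R     D     L     R     L

Reference set for $\mathbf{Y}_I$:
\[ \{ (y_{\mathrm{Lily}}, y_{\mathrm{Rose}}), \ (y_{\mathrm{Lily}}, y_{\mathrm{Daisy}}), \ (y_{\mathrm{Rose}}, y_{\mathrm{Daisy}}) \} \]

Reference sequence for $\mathbf{Y}_I$:
\[ \left\{ \begin{pmatrix} y_{\mathrm{Lily}} \\ y_{\mathrm{Rose}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Lily}} \\ y_{\mathrm{Daisy}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Lily}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Daisy}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Daisy}} \\ y_{\mathrm{Lily}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Daisy}} \\ y_{\mathrm{Rose}} \end{pmatrix} \right\} \]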
Sampling Model
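Expected Value and Variance
Reference Sets vs Sequence

Example when $n = 2$:
\[ \mathbf{Y} = \begin{pmatrix} Y_1 \\ Y_2 \\ Y_3 \end{pmatrix} = \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix}, \qquad \mathbf{Y}_I = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \ \text{(data)} \]

Reference sequence (used in the random permutation model):
\[ \left\{ \begin{pmatrix} y_{\mathrm{Lily}} \\ y_{\mathrm{Rose}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Lily}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Lily}} \\ y_{\mathrm{Daisy}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Daisy}} \\ y_{\mathrm{Lily}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Rose}} \\ y_{\mathrm{Daisy}} \end{pmatrix}, \begin{pmatrix} y_{\mathrm{Daisy}} \\ y_{\mathrm{Rose}} \end{pmatrix} \right\} \]

Reference set (sufficient, assuming order doesn't matter):
\[ \{ (y_{\mathrm{Lily}}, y_{\mathrm{Rose}}), \ (y_{\mathrm{Lily}}, y_{\mathrm{Daisy}}), \ (y_{\mathrm{Rose}}, y_{\mathrm{Daisy}}) \} \]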
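The difference between the reference set and the reference sequence for n = 2 maps directly onto combinations versus permutations. The sketch below uses the numeric values given in the deck (y_Lily = 10, y_Daisy = 8, y_Rose = 6); the function name is hypothetical. It shows that an order-free statistic such as the sample mean has the same expectation over either collection.

```python
from fractions import Fraction
from itertools import combinations, permutations

y = {"Lily": Fraction(10), "Rose": Fraction(6), "Daisy": Fraction(8)}
n = 2

reference_set = list(combinations(y, n))       # unordered pairs of subjects
reference_sequence = list(permutations(y, n))  # ordered pairs of subjects


def expected_mean(samples):
    """Expected sample mean when every element of `samples` is equally likely."""
    return sum(sum(y[name] for name in s) / n for s in samples) / len(samples)


print(reference_set)
print(len(reference_sequence))
print(expected_mean(reference_set), expected_mean(reference_sequence))
```

Each unordered pair appears exactly twice in the sequence, which is why the set is sufficient whenever the order of the sample values does not matter.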
Sampling Model
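Determining the BLUE for $\mu$

Target:
\[ P = \mathbf{g}'\mathbf{Y} = (\mathbf{g}_I' \ \ \mathbf{g}_{II}') \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \mathbf{g}_I'\mathbf{Y}_I + \mathbf{g}_{II}'\mathbf{Y}_{II} \]
where $\mathbf{Y}_I$ is the data and
\[ (\mathbf{g}_I' \ \ \mathbf{g}_{II}') = \frac{1}{N} (\mathbf{1}_n' \ \ \mathbf{1}_{N-n}') \]

Linear estimator:
\[ \hat{P} = (\mathbf{g}_I' + \mathbf{a}')\mathbf{Y}_I = \mathbf{g}_I'\mathbf{Y}_I + \mathbf{a}'\mathbf{Y}_I, \qquad \mathbf{a}' = (a_1 \ \ a_2 \ \ \cdots \ \ a_n) \]

Question: What should $\mathbf{a}$ be so that the estimator is unbiased and has minimum variance?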
Sampling Model
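Determining the BLUE for $\mu$
Unbiased Constraint

Unbiased requirement:
\[ E(\hat{P} - P) = 0 \]
With
\[ \hat{P} = \mathbf{g}_I'\mathbf{Y}_I + \mathbf{a}'\mathbf{Y}_I, \qquad P = \mathbf{g}_I'\mathbf{Y}_I + \mathbf{g}_{II}'\mathbf{Y}_{II} \]
the difference is
\[ \hat{P} - P = \mathbf{a}'\mathbf{Y}_I - \mathbf{g}_{II}'\mathbf{Y}_{II} \]
Since
\[ E(\mathbf{Y}) = \mathbf{1}_N \mu, \qquad E \begin{pmatrix} \mathbf{Y}_I \\ \mathbf{Y}_{II} \end{pmatrix} = \begin{pmatrix} \mathbf{1}_n \\ \mathbf{1}_{N-n} \end{pmatrix} \mu = \begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix} \mu \]
we have
\[ E(\hat{P} - P) = (\mathbf{a}' \ \ -\mathbf{g}_{II}') \begin{pmatrix} \mathbf{X}_I \\ \mathbf{X}_{II} \end{pmatrix} \mu \]
Unbiasedness for every value of $\mu$ implies that
\[ \mathbf{a}'\mathbf{X}_I - \mathbf{g}_{II}'\mathbf{X}_{II} = 0 \]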
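As a worked instance of the constraint, substituting the quantities for the mean target ($\mathbf{X}_I = \mathbf{1}_n$, $\mathbf{X}_{II} = \mathbf{1}_{N-n}$, $\mathbf{g}_{II} = \frac{1}{N}\mathbf{1}_{N-n}$) gives a single scalar condition on the coefficients. The numbers N = 3 and n = 2 in the last line are the three-subject example used in the deck; the symbol f = n/N is the sampling fraction.

```latex
\begin{align*}
\mathbf{a}'\mathbf{X}_I - \mathbf{g}_{II}'\mathbf{X}_{II}
  &= \mathbf{a}'\mathbf{1}_n - \frac{1}{N}\mathbf{1}_{N-n}'\mathbf{1}_{N-n} \\
  &= \sum_{i=1}^{n} a_i - \frac{N-n}{N} = 0
  \quad\Longrightarrow\quad
  \sum_{i=1}^{n} a_i = 1 - f, \qquad f = \frac{n}{N}. \\
\text{For } N = 3,\ n = 2: \quad & a_1 + a_2 = \tfrac{1}{3}.
\end{align*}
```

The constraint fixes only the sum of the coefficients; minimizing the variance is what determines how that sum is split across the sampled positions.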
Sampling Model
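Determining the BLUE
Minimizing the Variance

\[ \hat{P} - P = \mathbf{a}'\mathbf{Y}_I - \mathbf{g}_{II}'\mathbf{Y}_{II} \]

Variance:
\[ \mathrm{var}(\hat{P} - P) = (\mathbf{a}' \ \ -\mathbf{g}_{II}') \begin{pmatrix} \mathbf{V}_I & \mathbf{V}_{I,II} \\ \mathbf{V}_{II,I} & \mathbf{V}_{II} \end{pmatrix} \begin{pmatrix} \mathbf{a} \\ -\mathbf{g}_{II} \end{pmatrix} \]

Unbiased constraint:
\[ \mathbf{a}'\mathbf{X}_I - \mathbf{g}_{II}'\mathbf{X}_{II} = 0 \]

Lagrangian function to minimize with respect to $\mathbf{a}$, with Lagrange multiplier $\eta$:
\[ f(\mathbf{a}, \eta) = \mathbf{a}'\mathbf{V}_I\mathbf{a} - 2\mathbf{g}_{II}'\mathbf{V}_{II,I}\mathbf{a} + \mathbf{g}_{II}'\mathbf{V}_{II}\mathbf{g}_{II} + 2(\mathbf{a}'\mathbf{X}_I - \mathbf{g}_{II}'\mathbf{X}_{II})\eta \]
\[ \frac{\partial f(\mathbf{a}, \eta)}{\partial \mathbf{a}} = 2\mathbf{V}_I\mathbf{a} - 2\mathbf{V}_{I,II}\mathbf{g}_{II} + 2\mathbf{X}_I\eta \]
\[ \frac{\partial f(\mathbf{a}, \eta)}{\partial \eta} = 2(\mathbf{X}_I'\mathbf{a} - \mathbf{X}_{II}'\mathbf{g}_{II}) \]

Setting both derivatives to zero at $(\hat{\mathbf{a}}, \hat{\eta})$:
\[ \frac{1}{2} \begin{pmatrix} \partial f(\hat{\mathbf{a}}, \hat{\eta}) / \partial \mathbf{a} \\ \partial f(\hat{\mathbf{a}}, \hat{\eta}) / \partial \eta \end{pmatrix} = \begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I \\ \mathbf{X}_I' & 0 \end{pmatrix} \begin{pmatrix} \hat{\mathbf{a}} \\ \hat{\eta} \end{pmatrix} - \begin{pmatrix} \mathbf{V}_{I,II}\mathbf{g}_{II} \\ \mathbf{X}_{II}'\mathbf{g}_{II} \end{pmatrix} = \begin{pmatrix} \mathbf{0}_n \\ 0 \end{pmatrix} \]
which gives the estimating equations
\[ \begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I \\ \mathbf{X}_I' & 0 \end{pmatrix} \begin{pmatrix} \hat{\mathbf{a}} \\ \hat{\eta} \end{pmatrix} = \begin{pmatrix} \mathbf{V}_{I,II}\mathbf{g}_{II} \\ \mathbf{X}_{II}'\mathbf{g}_{II} \end{pmatrix} \]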
Sampling Model
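Determining the BLUE
Minimizing the Variance
Solving the Estimating Equations

\[ \begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I \\ \mathbf{X}_I' & 0 \end{pmatrix} \begin{pmatrix} \hat{\mathbf{a}} \\ \hat{\eta} \end{pmatrix} = \begin{pmatrix} \mathbf{V}_{I,II}\mathbf{g}_{II} \\ \mathbf{X}_{II}'\mathbf{g}_{II} \end{pmatrix} \]

For a partitioned matrix
\[ \mathbf{M} = \begin{pmatrix} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{pmatrix}, \qquad \mathbf{M}^{-1} = \begin{pmatrix} \mathbf{A}^{-1} + \mathbf{A}^{-1}\mathbf{B}\mathbf{Q}^{-1}\mathbf{C}\mathbf{A}^{-1} & -\mathbf{A}^{-1}\mathbf{B}\mathbf{Q}^{-1} \\ -\mathbf{Q}^{-1}\mathbf{C}\mathbf{A}^{-1} & \mathbf{Q}^{-1} \end{pmatrix} \]
where $\mathbf{Q} = \mathbf{D} - \mathbf{C}\mathbf{A}^{-1}\mathbf{B}$.

With $\mathbf{A} = \mathbf{V}_I$, $\mathbf{B} = \mathbf{X}_I$, $\mathbf{C} = \mathbf{X}_I'$, and $\mathbf{D} = 0$, so that $\mathbf{Q} = -\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I$:
\[ \begin{pmatrix} \mathbf{V}_I & \mathbf{X}_I \\ \mathbf{X}_I' & 0 \end{pmatrix}^{-1} = \begin{pmatrix} \mathbf{V}_I^{-1} - \mathbf{V}_I^{-1}\mathbf{X}_I (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1} & \mathbf{V}_I^{-1}\mathbf{X}_I (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \\ (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1} & -(\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \end{pmatrix} \]

Therefore
\[ \hat{\mathbf{a}} = \left[ \mathbf{V}_I^{-1} - \mathbf{V}_I^{-1}\mathbf{X}_I (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1} \right] \mathbf{V}_{I,II}\mathbf{g}_{II} + \mathbf{V}_I^{-1}\mathbf{X}_I (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_{II}'\mathbf{g}_{II} \]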
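The estimating equations can also be solved numerically instead of through the partitioned inverse. The sketch below is an illustration under stated assumptions, not the author's procedure: it plugs in the SRS quantities for the mean target (V_I = sigma^2 (I_n - J_n / N), X_I = 1_n, g_II = (1/N) 1_{N-n}) and solves the bordered linear system with exact fractions. The function names `solve` and `blue_coefficients` are hypothetical.

```python
from fractions import Fraction


def solve(M, b):
    """Solve M x = b by Gauss-Jordan elimination with exact fractions."""
    size = len(M)
    A = [[Fraction(v) for v in row] + [Fraction(rhs)] for row, rhs in zip(M, b)]
    for col in range(size):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, size) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        A[col] = [v / A[col][col] for v in A[col]]
        for r in range(size):
            if r != col and A[r][col] != 0:
                factor = A[r][col]
                A[r] = [vr - factor * vc for vr, vc in zip(A[r], A[col])]
    return [row[-1] for row in A]


def blue_coefficients(N, n, sigma2):
    """Return (a_hat, eta_hat) for the mean target under SRS without replacement."""
    V_I = [
        [sigma2 * ((1 if i == k else 0) - Fraction(1, N)) for k in range(n)]
        for i in range(n)
    ]
    # V_{I,II} g_II: every entry equals -(sigma2 / N) * (N - n) / N.
    top = [-sigma2 * Fraction(N - n, N * N)] * n
    bottom = Fraction(N - n, N)  # X_II' g_II
    # Bordered matrix [[V_I, X_I], [X_I', 0]] with X_I = 1_n.
    M = [V_I[i] + [1] for i in range(n)] + [[1] * n + [0]]
    solution = solve(M, top + [bottom])
    return solution[:n], solution[n]


a_hat, eta_hat = blue_coefficients(N=3, n=2, sigma2=4)
print(a_hat, eta_hat)
```

Adding g_I = 1/N to each coefficient gives the total weight placed on each sampled value; for N = 3 and n = 2 that weight is 1/3 + 1/6 = 1/2, that is, 1/n.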
Sampling Model
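Determining the BLUE
Minimizing the Variance
Solving the Estimating Equations

\[ \hat{\mathbf{a}} = \left[ \mathbf{V}_I^{-1} - \mathbf{V}_I^{-1}\mathbf{X}_I (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1} \right] \mathbf{V}_{I,II}\mathbf{g}_{II} + \mathbf{V}_I^{-1}\mathbf{X}_I (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_{II}'\mathbf{g}_{II} \]
Transposing (with $\mathbf{V}_{I,II}' = \mathbf{V}_{II,I}$):
\[ \hat{\mathbf{a}}' = \mathbf{g}_{II}'\mathbf{V}_{II,I} \left[ \mathbf{V}_I^{-1} - \mathbf{V}_I^{-1}\mathbf{X}_I (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1} \right] + \mathbf{g}_{II}'\mathbf{X}_{II} (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1} \]

The estimator is
\[ \hat{P} = \mathbf{g}_I'\mathbf{Y}_I + \hat{\mathbf{a}}'\mathbf{Y}_I \]
Let
\[ \hat{\mu} = (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{Y}_I \]
Then
\[ \hat{P} = \mathbf{g}_I'\mathbf{Y}_I + \mathbf{g}_{II}' \left[ \mathbf{X}_{II}\hat{\mu} + \mathbf{V}_{II,I}\mathbf{V}_I^{-1} (\mathbf{Y}_I - \mathbf{X}_I\hat{\mu}) \right] \]
\[ \mathrm{var}(\hat{P}) = \mathrm{var}\left[ (\mathbf{g}_I' + \hat{\mathbf{a}}')\mathbf{Y}_I \right] = (\mathbf{g}_I' + \hat{\mathbf{a}}') \mathbf{V}_I (\mathbf{g}_I + \hat{\mathbf{a}}) \]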
Sampling Model
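Determining the BLUE of $\mu$

Using
\[ \mathbf{g}_I = \frac{1}{N}\mathbf{1}_n, \qquad \mathbf{g}_{II} = \frac{1}{N}\mathbf{1}_{N-n}, \qquad \mathbf{X}_I = \mathbf{1}_n, \qquad \mathbf{X}_{II} = \mathbf{1}_{N-n} \]
\[ \mathbf{V}_I^{-1} = \frac{1}{\sigma^2} \left( \mathbf{I}_n - \frac{1}{N}\mathbf{J}_n \right)^{-1} = \frac{1}{\sigma^2} \left( \mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n \right) \]
we have
\[ \mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I = \frac{1}{\sigma^2} \frac{nN}{N-n} \qquad \text{and} \qquad \mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{Y}_I = \frac{1}{\sigma^2} \frac{N}{N-n} \mathbf{1}_n'\mathbf{Y}_I \]
so that $\sigma^2$ cancels and
\[ \hat{\mu} = (\mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{X}_I)^{-1} \mathbf{X}_I'\mathbf{V}_I^{-1}\mathbf{Y}_I = \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I \]

Substituting into the estimator:
\begin{align*}
\hat{P} &= \mathbf{g}_I'\mathbf{Y}_I + \mathbf{g}_{II}' \left[ \mathbf{X}_{II}\hat{\mu} + \mathbf{V}_{II,I}\mathbf{V}_I^{-1} (\mathbf{Y}_I - \mathbf{X}_I\hat{\mu}) \right] \\
&= f \left( \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I \right) + \frac{1}{N}\mathbf{1}_{N-n}' \left[ \mathbf{1}_{N-n} \frac{1}{n}\mathbf{1}_n' + \mathbf{V}_{II,I}\mathbf{V}_I^{-1} \left( \mathbf{I}_n - \mathbf{1}_n \frac{1}{n}\mathbf{1}_n' \right) \right] \mathbf{Y}_I \\
&= f \left( \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I \right) + \left[ \frac{N-n}{N} \frac{1}{n}\mathbf{1}_n' + \frac{1}{N}\mathbf{1}_{N-n}'\mathbf{V}_{II,I}\mathbf{V}_I^{-1} \left( \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n \right) \right] \mathbf{Y}_I \\
&= f\bar{Y} + (1 - f)\bar{Y} + \frac{1}{N}\mathbf{1}_{N-n}'\mathbf{V}_{II,I}\mathbf{V}_I^{-1} \left( \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n \right) \mathbf{Y}_I
\end{align*}
where
\[ f = \frac{n}{N}, \qquad \bar{Y} = \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I \]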
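The inverse of $\mathbf{I}_n - \frac{1}{N}\mathbf{J}_n$ used in this step can be verified directly. The check below is a short worked derivation that relies only on the identity $\mathbf{J}_n\mathbf{J}_n = n\mathbf{J}_n$; any scalar factor $\sigma^2$ multiplying the matrix simply inverts to $1/\sigma^2$.

```latex
\begin{align*}
\left( \mathbf{I}_n - \frac{1}{N}\mathbf{J}_n \right)
\left( \mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n \right)
  &= \mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n - \frac{1}{N}\mathbf{J}_n
     - \frac{n}{N(N-n)}\mathbf{J}_n \\
  &= \mathbf{I}_n + \frac{N - (N-n) - n}{N(N-n)}\mathbf{J}_n \\
  &= \mathbf{I}_n .
\end{align*}
```

It follows that $\mathbf{1}_n' \left( \mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n \right) = \left( 1 + \frac{n}{N-n} \right) \mathbf{1}_n' = \frac{N}{N-n}\mathbf{1}_n'$, which is the factor that appears in $\mathbf{X}_I'\mathbf{V}_I^{-1}$.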
Sampling Model
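Determining the BLUE of $\mu$

\[ \hat{P} = f\bar{Y} + (1 - f)\bar{Y} + \frac{1}{N}\mathbf{1}_{N-n}'\mathbf{V}_{II,I}\mathbf{V}_I^{-1} \left( \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n \right) \mathbf{Y}_I \]

Now, since $\sigma^2$ cancels between $\mathbf{V}_{II,I}$ and $\mathbf{V}_I^{-1}$,
\begin{align*}
\mathbf{V}_{II,I}\mathbf{V}_I^{-1} &= -\frac{1}{N}\mathbf{1}_{N-n}\mathbf{1}_n' \left( \mathbf{I}_n + \frac{1}{N-n}\mathbf{J}_n \right) \\
&= -\frac{1}{N}\mathbf{1}_{N-n} \left( 1 + \frac{n}{N-n} \right) \mathbf{1}_n' \\
&= -\frac{1}{N-n}\mathbf{1}_{N-n}\mathbf{1}_n'
\end{align*}
and
\[ \mathbf{1}_n' \left( \mathbf{I}_n - \frac{1}{n}\mathbf{J}_n \right) = \mathbf{0}_n' \]
so the last term of $\hat{P}$ vanishes.

As a result
\[ \hat{P} = f\bar{Y} + (1 - f)\bar{Y} = \bar{Y} \]
where
\[ \bar{Y} = \frac{1}{n}\mathbf{1}_n'\mathbf{Y}_I \]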
Sampling Model
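Determining the BLUE of $\mu$

\[ \hat{P} = \mathbf{g}_I'\mathbf{Y}_I + \hat{\mathbf{a}}'\mathbf{Y}_I = \bar{Y} \]
where
\[ \mathbf{g}_I'\mathbf{Y}_I = f\bar{Y} \qquad \text{and} \qquad \hat{\mathbf{a}}'\mathbf{Y}_I = (1 - f)\bar{Y} \]

Now
\[ \mathrm{var}(\hat{P}) = \mathrm{var}\left[ (\mathbf{g}_I' + \hat{\mathbf{a}}')\mathbf{Y}_I \right] = \mathrm{var}(\bar{Y}) = \frac{1}{n^2}\mathbf{1}_n' \, \mathrm{var}(\mathbf{Y}_I) \, \mathbf{1}_n \]
Since
\[ \mathbf{V}_I = \sigma^2 \left( \mathbf{I}_n - \frac{1}{N}\mathbf{J}_n \right) \]
it follows that
\[ \mathrm{var}(\hat{P}) = (1 - f) \frac{\sigma^2}{n} \]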
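The closing variance formula, var(P-hat) = (1 - f) sigma^2 / n, can be checked against a direct enumeration of all samples. The sketch below assumes the illustrative latent values 6, 10, 8 from the reference-set example; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

y0 = [Fraction(6), Fraction(10), Fraction(8)]  # illustrative latent values
N = len(y0)
mu = sum(y0) / N
sigma2 = sum((v - mu) ** 2 for v in y0) / (N - 1)


def var_sample_mean(n):
    """Variance of the sample mean over all equally likely size-n samples."""
    means = [sum(s) / n for s in combinations(y0, n)]
    return sum((m - mu) ** 2 for m in means) / len(means)


def formula(n):
    """(1 - f) * sigma^2 / n with sampling fraction f = n / N."""
    f = Fraction(n, N)
    return (1 - f) * sigma2 / n


for n in (1, 2, 3):
    print(n, var_sample_mean(n), formula(n))
```

When n = N the sampling fraction is 1 and the variance is zero, which is the finite population correction at work: a census leaves no sampling variability.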