Randomization Tests Under an Approximate Symmetry Assumption
Ivan A. Canay
Northwestern University
Joseph P. Romano
Stanford University
Azeem M. Shaikh
University of Chicago
1
Motivation
I
Economic data often exhibits dependence and heterogeneity
1. Time series
2. Spatial models
3. Panel data
4. More generally: clusters
I
Dependence and heterogeneity affects inference on parameters
2
Motivation
I
Economic data often exhibits dependence and heterogeneity
1. Time series
2. Spatial models
3. Panel data
4. More generally: clusters
I
Dependence and heterogeneity affects inference on parameters
I
Standard practice typically involve
1. Use appropriate LLN and CLT
2. Estimate asymptotic variance Σ with some form of HAC or CC.
3. Use t-test (or Wald) coupled with standard asymptotics.
4. Note: number of clusters needs to be large.
2
This paper
I
Randomization tests:
1. Assume data exhibits a symmetry under null hypothesis.
– i.e., invariance of distr. of data to group of transformations.
2. Construct tests that control size exactly.
3. But symmetry may not hold in finite samples ...
3
This paper
I
Randomization tests:
1. Assume data exhibits a symmetry under null hypothesis.
– i.e., invariance of distr. of data to group of transformations.
2. Construct tests that control size exactly.
3. But symmetry may not hold in finite samples ...
I
Randomization Tests under Approximate Symmetry:
1. Assume instead function of data satisfies symmetry approx.
– i.e., fcn. of data converges weakly to distr. satisfying symmetry.
2. Construct tests that control null rejection prob. asymptotically.
3. Setting: data grouped into “clusters”
I
Small number of cluster with many obs. within clusters.
I
Can be heterogeneous and have dependence within/across clusters.
I
Parameter of interest is identified within each cluster.
3
Outline of Talk
1. Review of Randomization Tests
a) Symmetric Location Example
2. Randomization Tests under Approximate Symmetry
a) Asymptotic Results
3. Applications
a) Time Series Regression
b) Difference in Differences
4. Empirical Application: Angrist & Lavy (2009) revisited.
5. Conclusions
4
Review of Randomization Tests
P 2 P on X
I
Observe data X
I
Hypotheses of interest:
H0 : P
I
2 P0
vs.
H1 : P
2 P n P0 .
Reject H0 for large values of a test statistic T = T (X ).
5
Review of Randomization Tests
P 2 P on X
I
Observe data X
I
Hypotheses of interest:
H0 : P
I
2 P0
vs.
H1 : P
2 P n P0 .
Reject H0 for large values of a test statistic T = T (X ).
Assumption R
d
gX = X for g
2 G and P 2 P0
G is a (finite) group of transformations from
I
X
to
X
Intuition: critical value by sampling from G.
5
Review of Randomization Tests (cont.)
I
We let M = jGj and k = d(1
)M e
I
Compute ordered values of T (gX ) for g
2 G:
T (1) ( X ) T (2) ( X ) T (k ) (X ) T (M ) (X )
6
Review of Randomization Tests (cont.)
I We let M = jGj and k = d(1
)M e
I
Compute ordered values of T (gX ) for g
2 G:
T (1) (X ) T (2) (X ) T (k ) (X )
T (M ) (X )
6
Review of Randomization Tests (cont.)
I We let M = jGj and k = d(1
)M e
I
Compute ordered values of T (gX ) for g
2 G:
T (1) (X ) T (2) (X ) T (k ) (X )
I
Define
T (M ) (X )
M + (X ) jfm : T (m ) (X ) > T (k ) (X )gj
M 0 (X ) jfm : T (m ) (X ) = T (k ) (X )gj
Definition 1 (Randomization Test)
(X ) 8
<1
a (X )
:
0
T (X ) > T (k ) (X )
T (X ) = T (k ) (X )
T ( X ) < T (k ) (X )
,
for
a (X ) =
M M+
M0
.
6
Review of Randomization Tests (cont.)
Theorem 1
If Assumption R holds, them
EP [(X )] = for all P
2 P0 .
Key Idea:
T (X )jT (1) (X )
, : : : , T (M ) (X ) Unif(fT (1) (X ), : : : , T (M ) (X )g)
Remark:
I
M may be too big.
I
Can replace with a stochastic approx. without affecting exactness.
I
Use g1 = identity and g2
, : : : , gB i.i.d. Unif(G).
|
7
Symmetric Location Example
Consider X = (X1
, : : : , Xq ) where
I
Xj are independent r.v. on Rd
I
Xj
I
Hypotheses:
I
Let G = f 1 1gq be the group of sign changes.
I
Define gX = (g1 X1
I
Assumption R holds:
Pj : Pj
is symmetric about
H0 : = 0
,
vs.
2 Rd
H1 : 6 = 0
.
, : : : , gq Xq )
(X ) controls size in finite sample.
8
Symmetric Location Example
Consider X = (X1
, : : : , Xq ) where
I
Xj are independent r.v. on Rd
I
Xj
I
Hypotheses:
I
Let G = f 1 1gq be the group of sign changes.
I
Define gX = (g1 X1
I
Assumption R holds:
I
If d = 1 (i.e. scalar) and Xj
Pj : Pj
2 Rd
is symmetric about
H0 : = 0
,
vs.
H1 : 6 = 0
.
, : : : , gq Xq )
(X ) controls size in finite sample.
N (, j2 ) or mixture of normals, then
– t-test valid for certain : 8% or 10% and q 14.
– but possibly quite conservative.
I
See Bakirov & Szekely (2005) and Ibragimov & Müller (2010).
– Used by Bester, Conley & Hansen (2011).
8
Symmetric Location Example Xj
Normal
N (0, 1) for j q /2 and Xj N (0, a 2 ) for j > q /2
t -test as in IM vs. rand. test using T = jt -statj
9
Symmetric Location Example Xj
N (0, 1) for j q /2 and Xj N (0, a 2 ) for j > q /2
t -test as in IM vs. rand. test using T = jt -statj
0.06
Normal Example − q= 16
0.06
Normal Example − q= 8
0.03
0.04
0.05
t−test
r−test
0.01
0.00
0.00
0.01
0.02
0.03
Rejection rate
0.04
0.05
t−test
r−test
0.02
Rejection rate
Normal
0
1
2
3
a^2: value of heterogeneity
4
5
0
1
2
3
4
5
a^2: value of heterogeneity
Figure: Rejection rates. t-test and randomization test. Parameter values are the
following: q = 8 (left panel) and q = 16 (right panel), = 0 05, and 100 000 MC.
.
,
9
Comments
Randomization test:
I
valid for all
() p-values)
10
Comments
Randomization test:
I
valid for all
I
valid for d
() p-values)
>1
10
Comments
Randomization test:
() p-values)
I
valid for all
I
valid for d
I
valid for any test statistic
>1
10
Comments
Randomization test:
() p-values)
I
valid for all
I
valid for d
I
valid for any test statistic
I
similar () better power)
>1
10
Comments
Randomization test:
() p-values)
I
valid for all
I
valid for d
I
valid for any test statistic
I
similar () better power)
I
valid for other symmetric distributions
>1
10
Outline
1. Review of Randomization Tests
a) Symmetric Location Example
2. Randomization Tests under Approximate Symmetry
a) Asymptotic Results
3. Applications
a) Time Series Regression
b) Difference in Differences
4. Empirical Application: Angrist & Lavy (2009) revisited.
5. Conclusions
11
RT under Approximate Symmetry
Pn 2 Pn on Xn
I
Observe data X (n )
I
Hypotheses of interest:
H0 : Pn
2 Pn 0
,
vs.
H1 : Pn
2 Pn n Pn 0 .
,
12
RT under Approximate Symmetry
Pn 2 Pn on Xn
I
Observe data X (n )
I
Hypotheses of interest:
H0 : Pn
2 Pn 0
,
vs.
H1 : Pn
2 Pn n Pn 0 .
,
Assumption RW
Let Sn : Xn
! S ( Euclidean Space) be a function of the data.
1. Sn = Sn (X (n ) )
d
2. gS = S for all g
S under Pn when Pn
2 Pn 0
,
2G
G is a (finite) group of transformations from
S to S
12
RT under Approximate Symmetry
Pn 2 Pn on Xn
I
Observe data X (n )
I
Hypotheses of interest:
H0 : Pn
2 Pn 0
,
vs.
H1 : Pn
2 Pn n Pn 0 .
,
Assumption RW
Let Sn : Xn
! S ( Euclidean Space) be a function of the data.
1. Sn = Sn (X (n ) )
d
2. gS = S for all g
S under Pn when Pn
2 Pn 0
,
2G
G is a (finite) group of transformations from
S to S
I
Reject H0 for large values of a test statistic T = T (Sn ).
I
As before, we let M = jGj and k = d(1
)M e.
12
RT under Approximate Symmetry (cont.)
T (1) (Sn ) T (2) (Sn ) T (k ) (Sn )
T (M ) (Sn )
Definition 2 (RT under Weak Convergence)
(Sn ) 8
<1
a (S )
: n
0
T (Sn ) > T (k ) (Sn )
T (Sn ) = T (k ) (Sn )
T (Sn ) < T (k ) (Sn )
,
for
a (Sn ) =
M M + (Sn )
M 0 (Sn )
.
13
RT under Approximate Symmetry (cont.)
T (1) (Sn ) T (2) (Sn ) T (k ) (Sn )
T (M ) (Sn )
Definition 2 (RT under Weak Convergence)
(Sn ) 8
<1
a (S )
: n
0
T (Sn ) > T (k ) (Sn )
T (Sn ) = T (k ) (Sn )
T (Sn ) < T (k ) (Sn )
,
for
a (Sn ) =
M M + (Sn )
M 0 (Sn )
.
Remarks:
I
Exactly as before, but with Sn in place of X
I
Test does not use the data beyond Sn
I
No invariance assumption on Pn
13
Main Result
Theorem 2
If Assumption RW holds,
a. T : S
b.
! R is continuous,
g : S ! S is continuous, and
2 G and g 0 2 G,
either T (gs ) = T (g 0 s ) 8s 2 S
or Pr(T (gS ) 6= T (g 0 S )) = 1 .
c. For any two distinct elements g
Then
as n
EPn [(Sn )] ! ! 1 when Pn 2 Pn 0 for all n 1.
,
,
14
Comments
I
Several Challenges:
1. Arguments valid for finite n may not hold even approx. for large n.
j
T (Sn ) T (1) (Sn )
, : : : , T (M ) (Sn ) 6 Unif(fT (1) (Sn ), : : : , T (M ) (Sn )g)
2. Mimicking traditional proof doesn’t seem fruitful
:::
... but will use it indirectly.
3. Earlier large-sample results for Sn = identity not useful either.
6
4. Rely on almost sure representation theorem and = arguments.
15
Comments
I
Several Challenges:
1. Arguments valid for finite n may not hold even approx. for large n.
j
T (Sn ) T (1) (Sn )
, : : : , T (M ) (Sn ) 6 Unif(fT (1) (Sn ), : : : , T (M ) (Sn )g)
2. Mimicking traditional proof doesn’t seem fruitful
:::
... but will use it indirectly.
3. Earlier large-sample results for Sn = identity not useful either.
6
4. Rely on almost sure representation theorem and = arguments.
I
Ties requirement satisfied for several statistics:
1. It holds for t -stat and absolute value of t -stat
2. It holds for wald-type test statistics
|
15
Outline
1. Review of Randomization Tests
a) Symmetric Location Example
2. Randomization Tests under Approximate Symmetry
a) Asymptotic Results
3. Applications
a) Time Series Regression
b) Difference in Differences
4. Empirical Application: Angrist & Lavy (2009) revisited.
5. Conclusions
16
Framework
Pn :
H 0 : Pn 2 P n 0
X (n )
,
17
Framework
Pn :
H 0 : Pn 2 P n 0
(n)
(n)
X (n )
X1
Pn ,1
Xq
Pn ,q
,
(n)
(n)
X2
X3
Pn ,2
Pn ,3
(n)
X4
Pn ,4
17
Framework
Pn :
H 0 : Pn 2 P n 0
(n)
Pn ,1
Xq
Pn ,q
,
(n)
(n)
X2
X3
Pn ,2
Pn ,3
Sn ,1 (Xn,1 )
(n)
X (n )
X1
Sn ,2 (Xn,2 )
(n)
Sn ,3 (Xn,3 )
X4
Pn ,4
Sn ,4 (Xn,4 )
Sn ,q (Xn,q )
17
Framework
Pn :
H 0 : Pn 2 P n 0
(n)
Pn ,1
Xq
Pn ,q
,
(n)
(n)
X2
X3
Pn ,2
Pn ,3
X4
(n)
Sn ,1 (Xn,1 )
(n)
X (n )
X1
Sn ,2 (Xn,2 )
Pn ,4
Sn ,3 (Xn,3 )
Sn ,4 (Xn,4 )
Sn ,q (Xn,q )
(Sn ) = I fT (Sn ) > cvn 1 g
,
17
Relate to Clusters
I Suppose Pn 0 = fPn 2 Pn : (Pn ) = 0 g.
,
(n )
I
X (n ) can be grouped into q clusters: Xj
I
Compute q estimators: ˆn ,j = ˆn ,j
(n )
Xj
.
for 1 j
q.
18
Relate to Clusters
I Suppose Pn 0 = fPn 2 Pn : (Pn ) = 0 g.
,
(n )
I
X (n ) can be grouped into q clusters: Xj
I
Compute q estimators: ˆn ,j = ˆn ,j
(n )
Xj
.
for 1 j
q.
Asymptotic Normality and sign-changes
Define Sn = (Sn ,1
, : : : , Sn q ) with
,
Sn ,j =
Suppose
Sn
pn (ˆ
n ,j
,
N (0 diag(Σ1
0 ) .
, : : : , Σq )) .
,
Special case of Assumption RW with G = f 1 1gq .
18
Relate to Clusters
I Suppose Pn 0 = fPn 2 Pn : (Pn ) = 0 g.
,
(n )
I
X (n ) can be grouped into q clusters: Xj
I
Compute q estimators: ˆn ,j = ˆn ,j
(n )
Xj
.
for 1 j
q.
Asymptotic Normality and sign-changes
Define Sn = (Sn ,1
, : : : , Sn q ) with
,
Sn ,j =
Suppose
pn (ˆ
n ,j
,
N (0 diag(Σ1
Sn
0 ) .
, : : : , Σq )) .
,
Special case of Assumption RW with G = f 1 1gq .
I
Our results apply more generally (i.e. rates, distributions, etc)
I
Applications: specify Xj
(n )
and ˆn ,j . Check convergence.
18
Application -
Time Series Regression
Time series linear regression as in BCH.
Yt = Zt + t
,
E [t Zt ] = 0
Two DGPs (N and H) for Zt and
Design N:
1 t
,
and
2 t
,
,:::,n .
t =1
t ,
Zt = 1 + Zt
t
,
= t
1
1 + 1,t
+ 2,t
.
,
,
are independent N (0 1) random variables.
19
Application -
Time Series Regression
Time series linear regression as in BCH.
,
Yt = Zt + t
E [t Zt ] = 0
Two DGPs (N and H) for Zt and
1 t
and
Design H:
1 t
= at ˜1,t and
,
˜k t ,
2 t
,
= t
1
1 + 1,t
+ 2,t
.
,
,
Design N:
,
,:::,n .
t =1
t ,
Zt = 1 + Zt
t
,
are independent N (0 1) random variables.
,
2 t
,
= bt ˜2,t , where
,
,
1
1
1
N ( 1 1/2) + N (0 1/2) + N (1 1/2)
3
3
3
.
The constant at and bt have a jump at t = n /2.
19
How to Apply our Method
I
Clusters: q blocks of consecutive obs.
I
Estimators: LS on each block
I
Under weak assumptions,
Sn (X (n ) ) =
pn (ˆ
n ,j
! Xj(n )
! ˆn j
,
d
0 , : : : , ˆn q 0 )0 !
N ( 0, Σ ) ,
,
where Σ is diagonal.
I
See, e.g., Jenish & Prucha (2009) and BCH: holds for Models N and H.
20
How to Apply our Method
I
Clusters: q blocks of consecutive obs.
I
Estimators: LS on each block
I
Under weak assumptions,
Sn (X (n ) ) =
pn (ˆ
n ,j
! Xj(n )
! ˆn j
,
d
0 , : : : , ˆn q 0 )0 !
N ( 0, Σ ) ,
,
where Σ is diagonal.
I
See, e.g., Jenish & Prucha (2009) and BCH: holds for Models N and H.
I
Our method: G = f 1 1gq and T = jt statj.
,
T (Sn ) =
I
q
j¯q p0 j , ¯q = 1 X
ˆn j ,
s / q
q
,
j =1
s2 =
q
X
1
q
1
(ˆn ,j
¯q )2 .
j =1
Note: rate of convergence drops-out.
20
Alternative methods
1. “Standard” approaches (large q)
(i) Compute ˆnF with OLS using the full sample.
(ii) Reject for large values of
pn jˆF j
0
n
.
sandwich
Critical value depends on asym. framework.
var. est. is consistent (Newey & West (1987), Andrews (1991))
var. est. is inconsistent (Keifer & Vogelsang (2005))
21
Alternative methods
1. “Standard” approaches (large q)
(i) Compute ˆnF with OLS using the full sample.
(ii) Reject for large values of
pn jˆF j
0
n
.
sandwich
Critical value depends on asym. framework.
var. est. is consistent (Newey & West (1987), Andrews (1991))
var. est. is inconsistent (Keifer & Vogelsang (2005))
2. Bias Reduced Linearization (BRL)
(i) Unbiased cluster covariance estimator (CCE) in “sandwich”.
(ii) Dof correction to match first two moments of chi-square
(Bell and McCaffrey, 2002, Imbens and Kolesar, 2012)
21
Alternative methods
1. “Standard” approaches (large q)
(i) Compute ˆnF with OLS using the full sample.
(ii) Reject for large values of
pn jˆF j
0
n
.
sandwich
Critical value depends on asym. framework.
var. est. is consistent (Newey & West (1987), Andrews (1991))
var. est. is inconsistent (Keifer & Vogelsang (2005))
2. Bias Reduced Linearization (BRL)
(i) Unbiased cluster covariance estimator (CCE) in “sandwich”.
(ii) Dof correction to match first two moments of chi-square
(Bell and McCaffrey, 2002, Imbens and Kolesar, 2012)
3. Ibragimov & Müller (2010):
Same test statistic we use here: but with cv from t distribution.
Same comments as before apply.
21
Alternative Methods
4. Bester, Conley, and Hansen (2011, BCH): Fixed-q asymptotics.
(i) Cluster Covariance Estimator (CCE).
(ii) Reject for large values of
pn jˆF j
0
n
.
sandwich
Critical value from t -dist. with q
1 dof.
22
Alternative Methods
4. Bester, Conley, and Hansen (2011, BCH): Fixed-q asymptotics.
(i) Cluster Covariance Estimator (CCE).
(ii) Reject for large values of
pn jˆF j
0
n
.
sandwich
Critical value from t -dist. with q
1 dof.
a) BCH: all conditions from IM are needed.
Our test: Keeps all advantages over IM here.
22
Alternative Methods
4. Bester, Conley, and Hansen (2011, BCH): Fixed-q asymptotics.
(i) Cluster Covariance Estimator (CCE).
(ii) Reject for large values of
pn jˆF j
0
n
.
sandwich
Critical value from t -dist. with q
1 dof.
a) BCH: all conditions from IM are needed.
Our test: Keeps all advantages over IM here.
b) BCH allows heterogeneity on z but needs n
Our test: Robust to Γj 6= Γj 0 .
1
Zj Zj0
!p Γj = Γ.
22
Alternative Methods
4. Bester, Conley, and Hansen (2011, BCH): Fixed-q asymptotics.
(i) Cluster Covariance Estimator (CCE).
(ii) Reject for large values of
pn jˆF j
0
n
.
sandwich
Critical value from t -dist. with q
1 dof.
a) BCH: all conditions from IM are needed.
Our test: Keeps all advantages over IM here.
b) BCH allows heterogeneity on z but needs n
Our test: Robust to Γj 6= Γj 0 .
1
Zj Zj0
!p Γj = Γ.
c) BCH: only for t -test - no easy F-test analog.
Our test: multivariate parameters/ other test statistics.
Our test: No “sandwich” estimator needed.
22
Size - Time Series Regression
Yt = Zt + t
,
Zt = 1 + Zt
1
+ vt
,
t
= t
1
Design N
q
4
8
12
Rand
BCH
BRL
Rand
BCH
BRL
Rand
BCH
BRL
0
5.0
5.4
4.8
5.0
5.4
4.7
5.1
5.7
4.9
.
05
5.2
6.1
4.9
5.4
6.9
5.3
5.6
7.4
5.7
.
08
5.2
8.2
5.2
5.8
11.3
7.0
5.9
13.1
8.8
,
H0 : = 1.
Design H
.
0 95
5.3
16.2
7.5
5.4
24.7
14.6
5.5
30.3
21.7
Table: Null rejection probabilities: n = 100,
100 000 MC replications (se. 0.0002).
,
+ ut
0
5.1
18.1
4.9
4.9
11.1
6.9
5.2
9.1
6.4
.
05
5.1
17.6
4.9
5.3
13.0
7.4
5.6
11.6
7.4
= 5%, and M
.
.
08
5.3
18.8
5.0
5.8
17.9
8.7
5.9
18.4
10.5
0 95
5.3
18.2
8.2
5.6
30.0
19.2
5.6
35.4
42.2
,
,
= minf1 000 2q g.
Note: BCH shown to perform better than stand. approach (HAC and KV).
23
Power - Time Series Regression
Yt = Zt + t
,
Zt = 1 + Zt
1
+ vt
,
t
= t
,
H0 : = 1
.
1.0
0.8
0.6
0.2
0.4
Size Adjusted Rejection Rate
0.8
0.6
0.4
0.0
BCH−test
r−test
0.0
0.5
1.0
1.5
2.0
Values of parameter Theta
BCH−test
r−test
0.0
0.2
Size Adjusted Rejection Rate
+ ut
Power in TS Model B − rho=0.8
1.0
Power in TS Model A − rho=0.8
1
0.0
0.5
1.0
1.5
2.0
Values of parameter Theta
Figure: Size Adjusted Power Curves. BCH-test and randomization test.
Parameter values are the following: Model A (left panel) and model B (right
panel), = 0 05, = 0 8, and 100 000 MC.
.
.
,
24
Outline
1. Review of Randomization Tests
a) Symmetric Location Example
2. Randomization Tests under Approximate Symmetry
a) Asymptotic Results
3. Applications
a) Time Series Regression
b) Difference in Differences
4. Empirical Application: Angrist & Lavy (2009) revisited
5. Conclusions
25
Application -
Difference in Differences
,
Observed data X(n ) = f(Yj ,t Dj ,t ) : j
T0
T1
J0
J1
=
=
=
=
2 J0 [ J1 , t 2 T0 [ T1 g, where
pre-treatment time periods
post-treatment time periods
control units
treatment units
Model:
Yj ,t = Dj ,t + j + t + j ,t with E [j ,t j(Dj ,t : t
2 T )] = 0
26
Application -
Difference in Differences
,
Observed data X(n ) = f(Yj ,t Dj ,t ) : j
T0
T1
J0
J1
=
=
=
=
2 J0 [ J1 , t 2 T0 [ T1 g, where
pre-treatment time periods
post-treatment time periods
control units
treatment units
Model:
Yj ,t = Dj ,t + j + t + j ,t with E [j ,t j(Dj ,t : t
2 T )] = 0
Assumption
,
1. minfjT0 j jT1 jg ! 1 and jJ0 j ! 1
j j
I
but J1 is fixed
I
i.e., small number of treated units (as in Conley & Taber, 2011)
2. Processes
fj t : t 2 T0 [ T1 g indep. across j .
,
26
How to Apply our Method
(n )
1. Clusters: Xj
,
= f(Yk ,t Dk ,t ) : k
2 fj g [ J0 , t 2 T0 [ T1 g for j 2 J1 .
(n )
2. Estimators: ˆn ,j by LS with fixed effects using data Xj
for j
2 J1 .
27
How to Apply our Method
(n )
1. Clusters: Xj
,
= f(Yk ,t Dk ,t ) : k
2 fj g [ J0 , t 2 T0 [ T1 g for j 2 J1 .
(n )
2. Estimators: ˆn ,j by LS with fixed effects using data Xj
for j
3. Equivalent to
ˆn j
,
where
∆n ,j =
= ∆n ,j
1
jT1 j
X
2
t T1
1
jJ0 j
Yj ,t
X
2
∆n ,k
2 J1 .
,
k J0
1
jT0 j
X
2
Yj ,t
.
t T0
27
How to Apply our Method
(n )
1. Clusters: Xj
,
= f(Yk ,t Dk ,t ) : k
2 fj g [ J0 , t 2 T0 [ T1 g for j 2 J1 .
(n )
2. Estimators: ˆn ,j by LS with fixed effects using data Xj
for j
3. Equivalent to
ˆn j
,
where
∆n ,j =
= ∆n ,j
1
jT1 j
X
2
t T1
1
jJ0 j
Yj ,t
X
2
∆n ,k
2 J1 .
,
k J0
1
jT0 j
X
2
Yj ,t
.
t T0
4. Under weak assumptions,
Sn (X (n ) ) p
T (ˆn ,j
0 : j 2 J1 )
, ,
N (0 Σ )
where Σ is diagonal.
5. Proceed as before. Note: jJ1 j is not a choice.
27
Our Method: Comments
I
Straightforward to modify for indiv.-level data and/or covariates.
I
Est. ˆn ,j are not independent.
I
Req. minfjT0 j jT1 jg ! 1 could be relaxed ...
,
... but only under stronger ass. on
j t
,
... implementation of test does not change !
28
Our Method: Comments
I
Straightforward to modify for indiv.-level data and/or covariates.
I
Est. ˆn ,j are not independent.
I
Req. minfjT0 j jT1 jg ! 1 could be relaxed ...
,
... but only under stronger ass. on
j t
,
... implementation of test does not change !
I
Req. jJ0 j ! 1 could be relaxed...
... form disjoint pairs of units from J0 and J1 .
|
F
28
Concluding Remarks
I
Developed theory of rand. tests when symmetry only holds approx.
I
The method is widely applicable.
– Time Series Regression.
– Differences-in-Differences.
– Clustered Regression.
I
Revisited the empirical application in Angrist & Lavy (2009).
I
Several advantages over existing methods (robustness and size control).
I
Easily implemented in standard packages, like STATA.
29
References
[1] Joshua Angrist and Victor Lavy. The effects of high stakes high school
achievement awards: Evidence from a randomized trial. The American
Economic Review, 99(4):1384–1414, 2009.
[2] Nail Kutluzhanovich Bakirov and G. J. Székely. Student’s t-test for Gaussian
scale mixtures. Journal of Mathematical Sciences, 139(3):6497–6505, 2006.
[3] Robert M Bell and Daniel F McCaffrey. Bias reduction in standard errors for
linear regression with multi-stage samples. Survey Methodology,
28(2):169–182, 2002.
[4] C. Alan Bester, Timothy G. Conley, and Christian B. Hansen. Inference with
dependent data using cluster covariance estimators. Journal of Econometrics,
165(2):137–151, 2011.
[5] Timothy G. Conley and Christopher R. Taber. Inference with “difference in
differences” with a small number of policy changes. The Review of Economics
and Statistics, 93(1):113–125, 2011.
[6] Rustam Ibragimov and Ulrich K. Müller. t-statistic based correlation and
heterogeneity robust inference. Journal of Business & Economic Statistics,
28(4):453–468, 2010.
[7] Guido W Imbens and Michal Kolesar. Robust standard errors in small
samples: some practical advice. Technical report, National Bureau of
Economic Research, 2012.
[8] Nazgul Jenish and Ingmar R. Prucha. Central limit theorems and uniform
laws of large numbers for arrays of random fields. Journal of econometrics,
150(1):86–98, 2009.
30
Simulations - Dif in Dif
We simulate data as
Yj ,t = Dj ,t + Zj ,t + j ,t
j t
,
Zj , t
= j ,t 1 + 1,j ,t
= Dj ,t + 2,j ,t
Dj ,t = I fj
,
,
,
2 j t N ( 0, 1 ) ,
2 J1 , t tj? g .
, ,
,
Base design: jJ0 j + jJ1 j = 100, q = jJ1 j = 8, T = 10, tj? = minf2j T g,
= 0 5, and 1,j ,t N (0 1).
.
,
31
Simulations - Dif in Dif
We simulate data as
,
Yj ,t = Dj ,t + Zj ,t + j ,t
j t
,
Zj , t
= j ,t 1 + 1,j ,t
= Dj ,t + 2,j ,t
Dj ,t = I fj
,
,
2 j t N ( 0, 1 ) ,
2 J1 , t tj? g .
, ,
,
Base design: jJ0 j + jJ1 j = 100, q = jJ1 j = 8, T = 10, tj? = minf2j T g,
= 0 5, and 1,j ,t N (0 1).
.
,
8 Additional designs:
- (b) jJ0 j + jJ1 j = 50, (c) q = 12, (d) tj? = T /2 for all j
- (e)
= 0.95, (f) T
2 J1 ,
= 3,
- (f: conditional heteroskedastic)
1,j ,t N (0 4) for j 2 J1 and 1,j ,t
,
N (0, 1) for j 2 J0 ,
- (g: heterogeneity in treatment units)
1,j ,t N (0 4) for j 4 and 1,j ,t N (0 1) for j
,
,
- (h: Asym. & Het in Z )
1,j ,t Gamma (0 25 1), and
. ,
2 j t
, ,
as
1 j t
, ,
>4,
in (g).
31
Simulations - Dif in Dif
Y j , t = D j , t + Z j , t + j , t
j t
,
Zj , t
= j ,t 1 + 1,j ,t
= Dj ,t + 2,j ,t
,
Spec.
(a) Base
(b) n = 50
(c) q = 12
(d) t = T /2
(e) = 0 95
(f) T = 3
(g) Cond. Heterosk.
(h) Het. in J1
(i) Asym. & Het in Z
.
Rand
5.58
6.39
6.26
5.56
6.06
5.41
4.78
5.52
5.60
,
,
2 j t N ( 0, 1 ) .
, ,
Rejection probabilities under = 1
NR R
IM
CT
CCE CGM
5.26 5.51
5.85 10.37
5.88
6.01 6.32
7.21
9.36
5.61
6.26 6.10
6.42
8.52
5.35
5.32 5.57
6.75
9.50
5.41
5.67 5.89
6.39
9.92
5.53
5.15 5.44
6.66
9.50
5.70
4.58 4.86 62.02 11.14
5.55
5.24 3.84 51.81 11.08
6.10
5.36 2.86 48.45
9.05
6.70
BRL
4.16
3.93
4.41
4.79
4.23
4.79
4.97
2.93
8.21
Table: Rejection rate (in %). Parameter values for the base design are
jJ0 j + jJ1 j = 100, T = 10, q = 8, = 0.5, and = 5%. Results based on 10, 000
Monte Carlo replications.
32
Simulations - Dif in Dif: power
Y j , t = D j , t + Z j , t + j , t
j t
,
Zj , t
= j ,t 1 + 1,j ,t
= Dj ,t + 2,j ,t
,
Spec.
(a) Base
(b) n = 50
(c) q = 12
(d) t = T /2
(e) = 0 95
(f) T = 3
(g) Cond. Heterosk.
(h) Het. in J1
(i) Asym. & Het in Z
.
Rand
66.49
64.69
85.37
69.44
32.29
59.06
9.54
20.11
60.21
,
,
2 j t N ( 0, 1 ) .
, ,
Rejection probabilities
NR R
IM
CT
65.21 67.70 80.37
63.35 65.60 78.74
85.37 85.58 91.28
68.20 70.40 82.58
31.14 32.91 41.92
57.58 59.93 73.45
8.99
9.69 70.12
19.51 16.61 66.49
59.44 56.83 90.77
under
CCE
81.11
79.51
89.76
81.61
29.20
73.61
18.36
27.55
67.16
=0
CGM
66.42
65.15
84.54
69.43
32.90
58.85
10.40
20.73
59.54
BRL
64.13
63.89
81.36
69.26
17.08
58.99
9.15
14.59
65.86
Table: Unadjusted power (in %). Parameter values for the base design are
jJ0 j + jJ1 j = 100, T = 10, q = 8, = 0.5, and = 5%. Results based on 10, 000
Monte Carlo replications.
|
33
Outline
1. Review of Randomization Tests
a) Symmetric Location Example
2. Randomization Tests under Approximate Symmetry
a) Asymptotic Results
3. Applications
a) Time Series Regression
b) Difference in Differences
4. Empirical Application: Angrist & Lavy (2009) revisited
5. Conclusions
34
Empirical Application -
I
Angrist & Lavy, 2009
AL09 study effect of cash award on Bagrut:
Bagrut: high school matriculation certificate awarded after a sequence
of subject tests (Israel).
Greater weight given to exams in later years.
Formal prerequisite for university admission.
I
Achievement Award demonstration
Project that provided cash award for low-achieving high school students.
35
Empirical Application -
I
Angrist & Lavy, 2009
AL09 study effect of cash award on Bagrut:
Bagrut: high school matriculation certificate awarded after a sequence
of subject tests (Israel).
Greater weight given to exams in later years.
Formal prerequisite for university admission.
I
Achievement Award demonstration
Project that provided cash award for low-achieving high school students.
I
Angrist and Lavy (2009):
Find an increase in Bagrut rates in treated schools (mostly female
students).
Driven largely by “marginal” female students.
35
Empirical Application -
Angrist & Lavy, 2009 (cont)
Data: 40 high schools with the lowest 1999 Bagrut rates.
All students in treated schools elegible for cash award.
36
Empirical Application -
Angrist & Lavy, 2009 (cont)
Form pairs of schools that are similar ex-ante
36
Empirical Application -
Angrist & Lavy, 2009 (cont)
20 Pairs: 1999 Bagrut rates used for matching
36
Empirical Application -
Angrist & Lavy, 2009 (cont)
Treatment
Treatment
Control
Treatment
Control
Control
Cluster 1
Cluster 2
Cluster 3
Treatment was assigned randomly in 2001 within pairs
Clusters: either a pair or two pairs
36
Empirical Application -
Angrist & Lavy, 2009 (cont)
Let i index students and j index schools
E [Yij j] = Λ[Dj + Zj +
X
dki k + Wij ]
,
k 3
I
Angrist and Lavy (2009):
I
OLS and Logit: Zj : school covariates. dki : lagged scores quartiles.
Wij : student covariates. Dj : treatment indicator.
I
Clustered s.e. at school level (using Bias Reduced Linearization)
- Do not use DoF correction to critical value.
37
Empirical Application -
Angrist & Lavy, 2009 (cont)
Let i index students and j index schools
E [Yij j] = Λ[Dj + Zj +
X
dki k + Wij ]
,
k 3
I
I
Angrist and Lavy (2009):
I
OLS and Logit: Zj : school covariates. dki : lagged scores quartiles.
Wij : student covariates. Dj : treatment indicator.
I
Clustered s.e. at school level (using Bias Reduced Linearization)
- Do not use DoF correction to critical value.
Our Approach:
1. Estimate
for each cluster (1 or 2 pairs of schools)
I
cluster
I
cluster
school ! does not identify pair ! does not allow for Zj in some clusters
2. Point estimator: average of ˆn ,j
3. Apply Theorem 2 with G =
f 1, 1g19 and T = jt
j
stat
4. Construct confidence intervals by inverting our test
37
Empirical Application -
Angrist & Lavy, 2009 (cont)
Let i index students and j index schools
E [Yij j] = Λ[Dj + Zj +
X
dki k + Wij ]
,
k 3
I
I
Angrist and Lavy (2009):
I
OLS and Logit: Zj : school covariates. dki : lagged scores quartiles.
Wij : student covariates. Dj : treatment indicator.
I
Clustered s.e. at school level (using Bias Reduced Linearization)
- Do not use DoF correction to critical value.
Our Approach:
1. Estimate
for each cluster (1 or 2 pairs of schools)
I
cluster
I
cluster
school ! does not identify pair ! does not allow for Zj in some clusters
2. Point estimator: average of ˆn ,j
3. Apply Theorem 2 with G =
f 1, 1g19 and T = jt
j
stat
4. Construct confidence intervals by inverting our test
37
Results: all students
Let i index students and j index schools
E [Yij j] = Λ[Dj + Zj +
X
dki k + Wij ]
,
k 3
Treatment Effect: Boys & Girls
Randomization Test
Angrist and Lavy
OLS
Logit
OLS
Logit
Sch. cov. only
90%
95%
Lagged score, micro cov.
90%
95%
0.049
[ -0.078 , 0.164
[ -0.109 , 0.182
0.075
[ -0.034 , 0.178
[ -0.059 , 0.198
]
]
]
]
-0.017
[ -0.147 , 0.093
[ -0.180 , 0.105
0.022
[ -0.058 , 0.102
[ -0.077 , 0.117
]
]
]
]
0.052
[ -0.025 , 0.130 ]
[ -0.040 , 0.144 ]
0.067
[ 0.008 , 0.126 ]
[ -0.003 , 0.138 ]
0.054
[ -0.016 , 0.125
[ -0.030 , 0.138
0.055
[ -0.004 , 0.114
[ -0.015 , 0.125
]
]
]
]
Table: Year 2001. Confidence interval for AL09 computed using their reported point
estimates, the BRL standard errors, and a conventional standard normal critical value.
Randomization test with q = 11, where the pairs of schools were clustered as follows:
{1,3}, {2,4}, {5,8}, {7}, {9,10}, {11}, {12,13}, {14,15}, {16,17}, {18,20}, {19}.
38
Results: Girls only
Let i index students and j index schools.
E [Yij j] = Λ[Dj + Zj +
X
dki k + Wij ]
,
k 3
Treatment Effect: Girls only
Randomization Test
Angrist and Lavy
OLS
Logit
OLS
Logit
Sch. cov. only
90%
95%
Lagged score, micro cov.
90%
95%
0.036
[ -0.132 , 0.195
[ -0.182 , 0.234
0.090
[ -0.049 , 0.226
[ -0.099 , 0.256
]
]
]
]
0.037
[ -0.099 , 0.165
[ -0.144 , 0.183
0.058
[ -0.020 , 0.140
[ -0.047 , 0.157
]
]
]
]
0.105
[ 0.005 , 0.205 ]
[ -0.014 , 0.224 ]
0.105
[ 0.027 , 0.182 ]
[ 0.012 , 0.197 ]
0.093
[ 0.006 , 0.179 ]
[ -0.010 , 0.197 ]
0.097
[ 0.021 , 0.172 ]
[ 0.006 , 0.187 ]
Table: Year 2001. Confidence interval for AL09 computed using their reported point
estimates, the BRL standard errors, and a conventional standard normal critical value.
Randomization test with q = 9, where the pairs of schools were clustered as follows:
{1,3}, {16,4}, {5,7}, {2,12}, {10,11}, {8,19}, {13}, {14,15}, {18,20}.
39
Results: Girls on Top of Cohort
AL09: largest effect on “marginal” female students.
– Use scores on tests prior to Jan. 2001.
– Use pred. prob. from Logit.
Treatment Effect: Girls on top half of cohort
Randomization Test
Angrist and Lavy
by lagged score
by pred. probability by lagged score by pred. probability
Sch. cov. only
90%
95%
Lagged s. or p. prob.
90%
95%
Table:
0.089
[ -0.077 , 0.259
[ -0.129 , 0.289
0.091
[ -0.064 , 0.252
[ -0.113 , 0.286
]
]
]
]
0.081
[ -0.099 , 0.262
[ -0.156 , 0.295
0.076
[ -0.095 , 0.249
[ -0.150 , 0.279
]
]
]
]
0.206
[ 0.076 , 0.335
[ 0.051 , 0.360
0.213
[ 0.083 , 0.342
[ 0.058 , 0.367
]
]
]
]
0.194
[ 0.067 , 0.320
[ 0.043 , 0.344
0.207
[ 0.079 , 0.334
[ 0.054 , 0.359
]
]
]
]
Results for Table 4 in AL: Top Girls only. Year 2001. Clusters as in previous table.
|
40
Proof of Theorem 1
Let P
2 P0 be given.
41
Proof of Theorem 1
Let P
2 P0 be given.
Note
2
EP
4
X
2
3
(gX )5 = MEP [(X )] .
g G
41
Proof of Theorem 1
Let P
2 P0 be given.
Note
2
EP
4
3
X
2
(gX )5 = MEP [(X )] .
g G
Since T (m ) (x ) = T (m ) (gx ) for all g
X
2
2 G and 1 j M , we have
(gx ) = M + (x ) + M 0 (x )a (x ) = M .
g G
2
Hence,
EP
4
X
2
3
(gX )5 = M .
g G
The result follows. |
41
Normal Example Xj
Power
N (0, 1) for j q /2 and Xj N (0, a 2 ) for j > q /2
t-test vs. rand. test using T = jt stat j
1.0
Example 1 − power for q=16 | a=5
1.0
Example 1 − power for q=8 | a=5
Rejection rate
0.6
0.8
t−test
r−test
0.0
0.2
0.4
0.6
0.4
0.0
0.2
Rejection rate
0.8
t−test
r−test
0.0
0.5
1.0
1.5
2.0
2.5
3.0
0.0
0.5
1.0
mu: value of mean
1.5
2.0
2.5
3.0
mu: value of mean
Figure: Rejection rates. t-test and randomization test. Parameter values: q = 8
(left panel) and q = 16 (right panel), = 0 05, and 100 000 MC.
.
,
|
42
Proof of Theorem 2
1. Let fPn
2 Pn 0 : n 1g be given.
,
By Almost Sure Rep. Thm., choose S̃n , S̃ , and U
S̃n
with
d
,
! S̃ w .p .1.
d
,
S̃n = Sn S̃ = S and U
U (0, 1) s.t.
? (S̃n , S̃ )
(on a common prob. space with prob. measure P ).
43
Proof of Theorem 2
1. Let fPn
2 Pn 0 : n 1g be given.
,
By Almost Sure Rep. Thm., choose S̃n , S̃ , and U
S̃n
with
d
,
! S̃ w .p .1.
d
,
S̃n = Sn S̃ = S and U
U (0, 1) s.t.
? (S̃n , S̃ )
(on a common prob. space with prob. measure P ).
2. Define
,
˜ (S̃n U ) 1
0
T (S̃n ) > T (k ) (S̃n ) or T (S̃n ) = T (k ) (S̃n ) and U
T (S̃n ) < T (k ) (S̃n )
< a (S̃n )
.
43
Proof of Theorem 2 (cont.)
By construction,
,
EPn [(Sn )] = EP [˜ (S̃n U )]
Also, from thm. 1
,
EP [˜ (S̃ U )] = .
.
It is enough to show
,
,
EP [˜ (S̃n U )] ! EP [˜ (S̃ U )]
.
44
Proof of Theorem 2 (cont.)
3. Wish to show
,
,
EP [˜ (S̃n U )] ! EP [˜ (S̃ U )]
.
Let En be event where orderings of
fT (g S̃ ) : g 2 Gg and fT (g S̃n ) : g 2 Gg
are the same.
Under our assumptions,
I fEn g ! 1 w.p.1.
45
Proof of Theorem 2 (cont.)
3. Wish to show
,
,
EP [˜ (S̃n U )] ! EP [˜ (S̃ U )]
.
Let En be event where orderings of
fT (g S̃ ) : g 2 Gg and fT (g S̃n ) : g 2 Gg
are the same.
Under our assumptions,
I fEn g ! 1 w.p.1.
On the event En ,
so
˜ (S̃n , U ) = ˜ (S̃ , U ) ,
,
,
EP [˜ (S̃n U )I fEn g] = EP [˜ (S̃ U )I fEn g]
.
45
Proof of Theorem 2 (cont.)
3. Wish to show
,
,
EP [˜ (S̃n U )] ! EP [˜ (S̃ U )]
.
Let En be event where orderings of
fT (g S̃ ) : g 2 Gg and fT (g S̃n ) : g 2 Gg
are the same.
Under our assumptions,
I fEn g ! 1 w.p.1.
On the event En ,
so
˜ (S̃n , U ) = ˜ (S̃ , U ) ,
,
,
EP [˜ (S̃n U )I fEn g] = EP [˜ (S̃ U )I fEn g]
4. The desired result thus follows by Dom. Conv. .
|
45
© Copyright 2026 Paperzz