
Rank-Based Approach to Optimal
Score via Dimension Reduction
Shao-Hsuan Wang
National Taiwan University, Taiwan
Nov 2015
Rank-based measures
Kendall’s $\tau$
Concordance Index
Rank correlation
Widely used in medical statistics, epidemiology,
economics, sociology, and other fields
Rank-based measures
Regression Model
$Y$: a univariate response
$Z = (Z_1, \ldots, Z_p)^T$: multiple covariates
Rank-based measures
[Figure: response $Y \in \mathbb{R}$ plotted against the composite score $\beta^T Z \in \mathbb{R}$]
Rank-based measures
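For a pair of observations $(Y_1, \beta^T Z_1)$ and $(Y_2, \beta^T Z_2)$,
concordant: $Y_1 > Y_2$ and $\beta^T Z_1 > \beta^T Z_2$, or $Y_1 < Y_2$ and $\beta^T Z_1 < \beta^T Z_2$
discordant: $Y_1 > Y_2$ and $\beta^T Z_1 < \beta^T Z_2$, or $Y_1 < Y_2$ and $\beta^T Z_1 > \beta^T Z_2$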
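The concordance rule can be checked pair by pair. Below is a minimal Python sketch, assuming plain lists for $Y$ and for the score values $\beta^T Z_i$ (the function name `count_pairs` is hypothetical). Pairs tied in $Y$ or in the score are counted in neither group, which is one convention among several.

```python
import itertools
from typing import Sequence, Tuple


def count_pairs(y: Sequence[float], score: Sequence[float]) -> Tuple[int, int]:
    """Count concordant and discordant unordered pairs of (Y, score).

    A pair is concordant when Y and the score move in the same direction and
    discordant when they move in opposite directions; pairs tied in either
    coordinate are skipped.
    """
    concordant = discordant = 0
    for i, j in itertools.combinations(range(len(y)), 2):
        sign = (y[i] - y[j]) * (score[i] - score[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return concordant, discordant


# Example: the score orders the last two observations the wrong way round.
y = [1.0, 2.0, 3.0, 4.0]
score = [0.1, 0.5, 0.9, 0.7]
print(count_pairs(y, score))  # (5, 1)
```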
Rank-based measures
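Kendall’s $\tau$
$\tau = P\{(Y_1 - Y_2)(\beta^T Z_1 - \beta^T Z_2) > 0\} - P\{(Y_1 - Y_2)(\beta^T Z_1 - \beta^T Z_2) < 0\}$
Rank correlation
$rc = P(Y_1 > Y_2,\ \beta^T Z_1 > \beta^T Z_2)$
Concordance Index
$CI = P(\beta^T Z_1 > \beta^T Z_2 \mid Y_1 > Y_2)$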
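Each of the three measures has a direct sample analogue obtained by replacing two independent draws with all ordered pairs $(i, j)$, $i \ne j$, of a sample. The sketch below is an illustration under that assumption; the function name `rank_measures` is hypothetical.

```python
import itertools
from typing import Dict, Sequence


def rank_measures(y: Sequence[float], score: Sequence[float]) -> Dict[str, float]:
    """Sample analogues of Kendall's tau, the rank correlation rc and CI.

    Ordered pairs (i, j), i != j, stand in for two independent draws.
    """
    n = len(y)
    n_ordered = n * (n - 1)
    conc = disc = both_up = y_up = 0
    for i, j in itertools.permutations(range(n), 2):
        dy, ds = y[i] - y[j], score[i] - score[j]
        if dy * ds > 0:
            conc += 1
        elif dy * ds < 0:
            disc += 1
        if dy > 0:
            y_up += 1
            if ds > 0:
                both_up += 1
    return {
        "tau": (conc - disc) / n_ordered,                 # P(concordant) - P(discordant)
        "rc": both_up / n_ordered,                        # P(Y1 > Y2, s1 > s2)
        "CI": both_up / y_up if y_up else float("nan"),   # P(s1 > s2 | Y1 > Y2)
    }


measures = rank_measures([1.0, 2.0, 3.0, 4.0], [0.1, 0.5, 0.9, 0.7])
print(measures)
```

With no ties in $Y$, half of the ordered pairs have $Y_i > Y_j$, so $CI = 2\,rc$; the example gives $rc = 5/12$ and $CI = 5/6$.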
Rank-based measures
A monotonic association between $Y$ and $\beta^T Z$
may not exist!
Motivation
Composite score: $\beta^T Z \;\longrightarrow\; g(Z)$, with $g$ ranging over measurable functions
C-max
Response $Y \in \mathbb{R}$; composite score $g(Z) \in \mathbb{R}$
Concordance-index function: $C(g) = P\{g(Z_1) > g(Z_2) \mid Y_1 > Y_2\}$
C-max: $C_{\max} = \sup_{g \in \mathcal{F}_c} C(g)$
Optimal score: $m(Z)$ such that $C(m) = \sup_{g \in \mathcal{F}_c} C(g)$
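To see why taking the supremum over general $g$ matters, the sketch below estimates $C(g)$ by its pairwise sample analogue for two candidate scores under a hypothetical non-monotonic model $Y = (\beta^T Z)^2 + \varepsilon$. The coefficients, sample size, noise level and seed are illustrative assumptions, not values from the talk.

```python
import numpy as np


def concordance_index(y: np.ndarray, score: np.ndarray) -> float:
    """Empirical C(g): share of pairs with Y_i > Y_j whose scores satisfy g_i > g_j."""
    dy = y[:, None] - y[None, :]
    ds = score[:, None] - score[None, :]
    y_up = dy > 0
    return float(np.sum(y_up & (ds > 0)) / np.sum(y_up))


# Hypothetical simulation with a non-monotonic link.
rng = np.random.default_rng(0)
n, p = 400, 3
beta = np.array([1.0, -1.0, 0.5])
z = rng.standard_normal((n, p))
index = z @ beta
y = index**2 + 0.1 * rng.standard_normal(n)

c_linear = concordance_index(y, index)      # g(Z) = beta^T Z
c_square = concordance_index(y, index**2)   # g(Z) = (beta^T Z)^2
print(round(c_linear, 3), round(c_square, 3))
```

Because $\beta^T Z$ is symmetric about zero here, the linear score is roughly uninformative ($C \approx 1/2$), while the quadratic score ranks most pairs correctly.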
Intrinsic model
behind Rank-based measures
M1 Distributional assumption: Generalized Regression Model (Han 1987)
M2 Structural assumption: Dimension Reduction (Li 1991, Cook 1991)
Intrinsic model
behind Rank-based measures
M1
$Y = D \circ G(m_{d_0}(Z), \varepsilon)$
$D$: a non-degenerate monotonic function on $\mathbb{R}$
$G$: an unspecified bivariate function, strictly increasing in each component when the other is held fixed
Intrinsic model
behind Rank-based measures
M2
$Y = D \circ G(m_{d_0}(Z), \varepsilon)$
$m_{d_0}$: a multivariate polynomial of unknown degree $d_0$
Intrinsic model
behind Rank-based measures
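M2
Dimension Reduction
(1) $d_0$ is the smallest degree such that $Y \perp\!\!\!\perp Z \mid m_{d_0}(Z)$
(2) $m_{d_0}(Z) = m_{d_0 k_0}(B_0^T Z)$, where $B_0 = (\beta_{01}, \ldots, \beta_{0k_0})$ and $\{\beta_{01}, \ldots, \beta_{0k_0}\}$ is a basis of the central subspace (CS)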
Model Flexibility
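Linear regression model: $Y = \beta_0^T Z + \varepsilon$
Binary choice model: $Y = I(\beta_0^T Z + \varepsilon > 0)$
Accelerated failure time model: $\log(Y) = \beta_0^T Z + \varepsilon$
Generalized linear regression model (GLM)
Non-monotonic regression model: $Y = (\beta_0^T Z)^2 + \varepsilon$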
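Each listed model is an instance of $Y = D \circ G(m_{d_0}(Z), \varepsilon)$. The sketch below draws data from four of them; standard normal covariates and errors are assumptions made only for illustration, and `simulate` is a hypothetical helper.

```python
import numpy as np


def simulate(model: str, n: int, beta0: np.ndarray, seed: int = 0):
    """Draw (Z, Y) from one of the listed models (illustrative choices only).

    Covariates and errors are standard normal; these distributions are
    assumptions of the sketch, not part of the model definitions.
    """
    rng = np.random.default_rng(seed)
    z = rng.standard_normal((n, beta0.size))
    eps = rng.standard_normal(n)
    index = z @ beta0
    if model == "linear":
        y = index + eps                      # Y = beta0^T Z + eps
    elif model == "binary":
        y = (index + eps > 0).astype(float)  # Y = I(beta0^T Z + eps > 0)
    elif model == "aft":
        y = np.exp(index + eps)              # log(Y) = beta0^T Z + eps
    elif model == "nonmonotonic":
        y = index**2 + eps                   # Y = (beta0^T Z)^2 + eps
    else:
        raise ValueError(f"unknown model: {model}")
    return z, y


z, y = simulate("binary", 100, np.array([1.0, -0.5]))
```

In all four cases $G(u, e) = u + e$; $D$ is the identity, $t \mapsto I(t > 0)$, $\exp$, and the identity, respectively. The first three have $m_{d_0}(Z) = \beta_0^T Z$ with $d_0 = 1$, while the non-monotonic model has $m_{d_0}(Z) = (\beta_0^T Z)^2$ with $d_0 = 2$.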
Types of covariates
All-discrete as well as continuous covariates
Covariates whose moments may not exist
Theories
Propositions:
(1) Existence: $m_{d_0}(Z) \in \arg\max_g C(g)$
(2) Uniqueness: for a polynomial $f_{d_0}(z)$ of degree $d_0$, $f_{d_0}(Z) \in \arg\max_g C(g) \iff f_{d_0}(Z) = c_1 m_{d_0}(Z) + c_2$ for some constants $c_1 > 0$ and $c_2$
(3) Optimality: $g(Z) \in \arg\max_g C(g) \iff g(Z) = T(m_{d_0}(Z))$ for some monotonic function $T$
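Propositions (2) and (3) say that the maximizer is unique only up to order-preserving transformations. Since $C(g)$ depends on $g$ only through pairwise orderings, any strictly increasing $T$ leaves it unchanged. The sketch checks this numerically; the stand-in for $m_{d_0}(Z)$ and the noise level are hypothetical.

```python
import numpy as np


def concordance_index(y, score):
    """Empirical concordance index: P_n(score_i > score_j | Y_i > Y_j)."""
    y, score = np.asarray(y, float), np.asarray(score, float)
    y_up = (y[:, None] - y[None, :]) > 0
    s_up = (score[:, None] - score[None, :]) > 0
    return float(np.sum(y_up & s_up) / np.sum(y_up))


rng = np.random.default_rng(1)
m = rng.standard_normal(200)               # hypothetical stand-in for m_{d0}(Z)
y = m + 0.5 * rng.standard_normal(200)

base = concordance_index(y, m)
affine = concordance_index(y, 2.0 * m + 3.0)   # c1 = 2 > 0, c2 = 3
monotone = concordance_index(y, np.exp(m))     # T = exp, strictly increasing
flipped = concordance_index(y, -m)             # c1 < 0 reverses every ordering
print(base, affine, monotone, flipped)
```

The first three values coincide, while the reversed score gives $1 - C$, which is why the constant $c_1$ must be positive.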
Summary
$\beta^T Z$ may not be the best composite score
Model flexibility
Various types of covariates
Optimal score: existence, uniqueness, and
optimality
How to estimate
$d_0$: structural degree
$k_0$: structural dimension
$\mathcal{S}(B_0)$: the central subspace
$m_{d_0 k_0}(B_0^T Z)$: the optimal score
$C_{\max}$: the C-max
Estimation Procedure
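Step 1: Derive $m_d(Z)$ by maximizing the concordance-index function via the generalized single-index form of a polynomial of degree $d$
Tips:
(1) $m_d(Z) = \sum_{r=0}^{d} \sum_{r_1 + \cdots + r_p = r} c_{r_1 \cdots r_p} \prod_{j=1}^{p} Z_j^{r_j} = \theta^T \mathbf{Z}_d$, where $\mathbf{Z}_d$ stacks the monomials of $Z$ up to degree $d$ and $\theta$ stacks the coefficients $c_{r_1 \cdots r_p}$
(2) $C_n(m_d(Z)) = C_{0n}(\theta) = \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{n} I(\theta^T \mathbf{Z}_{d,i} > \theta^T \mathbf{Z}_{d,j},\ Y_i > Y_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} I(Y_i > Y_j)}$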
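Tip (1) turns the polynomial into a linear index over monomials, so Step 1 becomes a single-index rank problem. Below is a Python sketch of the two pieces, assuming the monomial vector $\mathbf{Z}_d$ omits the constant term, which cannot affect any ranking. The helper names `monomials` and `c0n` are hypothetical.

```python
import itertools
import numpy as np


def monomials(z: np.ndarray, d: int) -> np.ndarray:
    """Columns Z_1^{r_1} ... Z_p^{r_p} for 1 <= r_1 + ... + r_p <= d.

    The constant (r = 0) term is dropped: it shifts every score equally
    and therefore leaves C_{0n} unchanged.
    """
    n, p = z.shape
    cols = []
    for r in range(1, d + 1):
        # Each multiset of r column indices gives one monomial of degree r.
        for combo in itertools.combinations_with_replacement(range(p), r):
            cols.append(np.prod(z[:, list(combo)], axis=1))
    return np.column_stack(cols) if cols else np.empty((n, 0))


def c0n(theta: np.ndarray, zd: np.ndarray, y: np.ndarray) -> float:
    """Sample concordance C_{0n}(theta) between the index theta^T Z_d and Y."""
    s = zd @ theta
    y_up = (y[:, None] - y[None, :]) > 0
    s_up = (s[:, None] - s[None, :]) > 0
    return float(np.sum(y_up & s_up) / np.sum(y_up))
```

With $p$ covariates the design has $\binom{p+d}{d} - 1$ columns, e.g. 5 columns for $p = d = 2$: $Z_1, Z_2, Z_1^2, Z_1 Z_2, Z_2^2$.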
Estimation Procedure
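Step 2: Apply the outer gradient approach to obtain an estimate of $B_0$
Tips:
(1) $\nabla m_{d_0}(u) = B_0 \nabla m_{d_0 k_0}(B_0^T u)$
(2) $\mathcal{S}(B_0) = \operatorname{col}\left(\int_{u \in \mathbb{R}^p} \nabla m_{d_0}(u) \{\nabla m_{d_0}(u)\}^T \, dW(u)\right)$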
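Tip (1) implies that every gradient of $m_{d_0}$ lies in $\operatorname{col}(B_0)$, so the leading eigenvectors of the averaged gradient outer product span the central subspace. The sketch assumes that $W$ is the empirical distribution of the sample points and approximates the gradients by central finite differences (analytic derivatives of the fitted polynomial would also work). The name `opg_directions` is hypothetical.

```python
import numpy as np


def opg_directions(m, u: np.ndarray, k: int, h: float = 1e-5) -> np.ndarray:
    """Estimate a basis B_k for S(B_0) from the gradient outer-product matrix.

    m : callable mapping an (n, p) array to n score values (e.g. a fitted m_d).
    u : (n, p) sample points; their empirical distribution plays the role of W.
    """
    n, p = u.shape
    grads = np.empty((n, p))
    for j in range(p):
        step = np.zeros(p)
        step[j] = h
        grads[:, j] = (m(u + step) - m(u - step)) / (2 * h)  # central difference
    mat = grads.T @ grads / n              # approximates the integral in Tip (2)
    _, eigvecs = np.linalg.eigh(mat)       # eigenvalues in ascending order
    return eigvecs[:, ::-1][:, :k]         # k leading eigenvectors


# Example: m(u) = (b^T u)^2 + b^T u depends on u only through b^T u.
rng = np.random.default_rng(3)
u = rng.standard_normal((200, 3))
b = np.array([1.0, 1.0, 0.0]) / np.sqrt(2.0)
B = opg_directions(lambda v: (v @ b) ** 2 + v @ b, u, k=1)
```

Here the recovered column of `B` matches $b$ up to sign.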
Estimation Procedure
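Step 3: Derive the estimator of $m_{dk}(B_k^T Z)$
Tips:
(1) Replace $Z$ by $B_k^T Z$
(2) $\hat\theta = \arg\max_{\theta} \dfrac{\sum_{i=1}^{n} \sum_{j=1}^{n} I(\theta^T \mathbf{Z}_{d,i} > \theta^T \mathbf{Z}_{d,j},\ Y_i > Y_j)}{\sum_{i=1}^{n} \sum_{j=1}^{n} I(Y_i > Y_j)}$, where $\mathbf{Z}_{d,i}$ stacks the monomials of $B_k^T Z_i$ up to degree $d$
(3) $\hat m_{dk}(B_k^T Z_i) = \hat\theta^T \mathbf{Z}_{d,i}$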
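The objective in Tip (2) is a step function of $\theta$, so gradient-based optimizers do not apply directly. Because it is invariant to positive rescaling of $\theta$, the search can be restricted to the unit sphere. The sketch below treats the case $k = 1$, $d = 2$ (monomials $x, x^2$ of $x = B_k^T Z$) with a brute-force angle grid. That optimizer is a swapped-in choice that is practical only for two coefficients, and the talk does not say which optimizer it uses. The data and helper names are hypothetical.

```python
import numpy as np


def concordance(s: np.ndarray, y: np.ndarray) -> float:
    """Share of pairs with Y_i > Y_j whose scores satisfy s_i > s_j."""
    y_up = (y[:, None] - y[None, :]) > 0
    s_up = (s[:, None] - s[None, :]) > 0
    return float(np.sum(y_up & s_up) / np.sum(y_up))


def fit_theta_grid(x: np.ndarray, y: np.ndarray, n_angles: int = 360):
    """Maximize C_{0n}(theta) over unit vectors theta for monomials (x, x^2)."""
    zd = np.column_stack([x, x**2])
    best_theta, best_c = None, -np.inf
    for a in np.linspace(0.0, 2 * np.pi, n_angles, endpoint=False):
        theta = np.array([np.cos(a), np.sin(a)])
        c = concordance(zd @ theta, y)
        if c > best_c:
            best_theta, best_c = theta, c
    return best_theta, best_c


# Hypothetical reduced predictor x = B_k^T Z with a quadratic link.
rng = np.random.default_rng(2)
x = rng.standard_normal(150)
y = x**2 + 0.2 * rng.standard_normal(150)
theta_hat, c_hat = fit_theta_grid(x, y)
m_hat = np.column_stack([x, x**2]) @ theta_hat   # Tip (3): fitted scores
```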
Estimation Procedure
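Step 4: Adopt the concordance-based generalized BIC to estimate $d_0$, $k_0$, $\mathcal{S}(B_0)$, $m_{d_0 k_0}(B_0^T Z)$, and $C_{\max}$
Tips:
(1) $IC(d,k) = n\, C_n\big(\hat m_{dk}(\hat B_k^T Z)\big) - \dfrac{\log n}{2}\left\{\binom{k+d}{k} - 1\right\}$, with $IC(0,k) = 1/2$
(2) $(\hat d, \hat k) = \arg\max_{0 \le d,\ 1 \le k \le p-1} IC(d,k)$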
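The criterion trades the maximized concordance against the number of non-constant monomials of degree at most $d$ in $k$ variables, $\binom{k+d}{k} - 1$. The sketch below evaluates $IC(d,k)$ as written in Tip (1) and picks the maximizer among candidate pairs. The concordance values in the example are hypothetical inputs, not results from the talk.

```python
import math
from typing import Dict, Tuple


def ic(c_n: float, n: int, d: int, k: int) -> float:
    """Concordance-based generalized BIC, following Tip (1) of Step 4."""
    if d == 0:
        return 0.5  # convention IC(0, k) = 1/2 stated on the slide
    return n * c_n - 0.5 * math.log(n) * (math.comb(k + d, k) - 1)


def select(c_values: Dict[Tuple[int, int], float], n: int) -> Tuple[int, int]:
    """Return the (d, k) with the largest IC among the supplied candidates."""
    return max(c_values, key=lambda dk: ic(c_values[dk], n, *dk))


# Hypothetical maximized concordances C_n for three candidate models.
candidates = {(1, 1): 0.70, (2, 1): 0.80, (2, 2): 0.801}
print(select(candidates, n=1000))
```

In this example the slightly higher concordance of $(2, 2)$ does not offset its three extra monomials, so $(2, 1)$ is selected.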
Asymptotic results
Consistent model selection: selects the most parsimonious model, $(d_0, k_0)$, among the class of correct models
$\sqrt{n}$-consistency of the estimators of $\mathcal{S}(B_0)$ and $m_{d_0 k_0}(B_0^T Z)$
Asymptotic normality of the estimator of $C_{\max}$
Wine Data
• Vinho verde wine: red wine and white wine
(from the Minho region of northern Portugal)
• Collected from May 2004 to February 2007
• Red wine: sample size n = 1599
White wine: n = 4898
• Physicochemical and sensory tests
Wine data
Response (Y):
preference score from 0 (bad) to 10 (excellent)
11 covariates (Z):
fixed acidity, volatile acidity,
citric acid, residual sugar,
chlorides, free sulfur dioxide,
total sulfur dioxide, density,
pH, sulphates, and alcohol
Thank you!