Proof of Lemma 1 Proof of Proposition 2

Proof of Lemma 1
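$$\frac{\partial}{\partial x} Y_A = \frac{2N}{RS}\,\frac{1}{(x + n_{10})^2 (2N - x - n_{10})^2}\, C(x),$$

where

$$\begin{aligned}
C(x) &= 2S(xS - n_{10}R)(x + n_{10})(2N - x - n_{10}) - (xS - n_{10}R)^2 \big[(2N - x - n_{10}) - (x + n_{10})\big] \\
&= (xS - n_{10}R)(2N - x - n_{10})\big[S(x + n_{10}) - (xS - n_{10}R)\big] \\
&\quad + (xS - n_{10}R)(x + n_{10})\big[S(2N - x - n_{10}) + (xS - n_{10}R)\big] \\
&= (xS - n_{10}R)\big[(2N - x - n_{10})\, n_{10} N + (x + n_{10})(2S - n_{10}) N\big].
\end{aligned}$$

Because $2N - x - n_{10} > 0$ and $2S - n_{10} \ge 0$, the bracketed factor is positive, so $C(x)$ has the same sign as $xS - n_{10}R$. Therefore
$\frac{\partial}{\partial x} Y_A > 0$ when $x > n_{10}R/S$, and
$\frac{\partial}{\partial x} Y_A < 0$ when $x < n_{10}R/S$.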
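The factorization and the sign pattern can be checked numerically. The sketch below is only a sanity check under stated assumptions: the closed form used for `Y_A` is assumed (the usual allelic chi-square with 2R case alleles and 2S control alleles, whose derivative reproduces C(x)), and the values of `R`, `S`, and `n10` are arbitrary illustrative choices.

```python
from fractions import Fraction

def C_expanded(x, n10, R, S):
    """First line of C(x) as written in the proof."""
    N = R + S
    a = x * S - n10 * R
    return (2 * S * a * (x + n10) * (2 * N - x - n10)
            - a ** 2 * ((2 * N - x - n10) - (x + n10)))

def C_factored(x, n10, R, S):
    """Last line of C(x) as written in the proof."""
    N = R + S
    a = x * S - n10 * R
    return a * ((2 * N - x - n10) * n10 * N + (x + n10) * (2 * S - n10) * N)

def Y_A(x, n10, R, S):
    # ASSUMPTION: allelic chi-square statistic; not stated in this section.
    N = R + S
    return Fraction(2 * N * (x * S - n10 * R) ** 2,
                    R * S * (x + n10) * (2 * N - x - n10))

# Hypothetical values: 10 cases, 15 controls, 12 control alleles.
R, S, n10 = 10, 15, 12
turning_point = Fraction(n10 * R, S)   # n10 * R / S = 8

# The two forms of C(x) agree at every integer x in 1..2R.
forms_agree = all(C_expanded(x, n10, R, S) == C_factored(x, n10, R, S)
                  for x in range(1, 2 * R + 1))

# Y_A falls while x is below the turning point and rises above it.
falls = all(Y_A(x + 1, n10, R, S) < Y_A(x, n10, R, S)
            for x in range(1, 2 * R) if x + 1 <= turning_point)
rises = all(Y_A(x + 1, n10, R, S) > Y_A(x, n10, R, S)
            for x in range(1, 2 * R) if x >= turning_point)
```

Exact rational arithmetic is used so that the comparison is not affected by floating-point rounding.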
Proof of Proposition 2
We will only prove the first part of Proposition 2 and show that when a significant
genotype table exists the first part indeed yields the shortest Hamming distance;
the second part of Proposition 2 is designed to handle the extreme cases and does
not need proof.
Denote the number of changes made to the insignificant genotype table $D$ until it
becomes a significant table $D'$ in each possible direction by $v_E, v_{SE}, v_S, v_W, v_{NW}, v_N$,
where the subscripts indicate the direction of change described in Figure 1; that is,

$$\begin{aligned}
E &: (r_0 \to r_0 + 1,\; r_1 \to r_1) \\
SE &: (r_0 \to r_0 + 1,\; r_1 \to r_1 - 1) \\
S &: (r_0 \to r_0,\; r_1 \to r_1 - 1) \\
W &: (r_0 \to r_0 - 1,\; r_1 \to r_1) \\
NW &: (r_0 \to r_0 - 1,\; r_1 \to r_1 + 1) \\
N &: (r_0 \to r_0,\; r_1 \to r_1 + 1).
\end{aligned}$$
Let $x_L$ and $x_R$ denote the values of the $\chi^2$ statistic represented by the left and the
right black lines, respectively, in Figure 2. Let $x_0$ denote the value of $x$ on which $D$
resides. When we move the table $D$ to the shaded area to the left of the black lines,
we stop moving $D$ as soon as it becomes a table $D'$ that resides on the
line $2r_0 + r_1 = \lfloor x_L \rfloor$. Similarly, for the shaded area to the right of the black lines,
we stop moving $D$ as soon as it becomes a table $D''$ that resides on the line
$2r_0 + r_1 = \lceil x_R \rceil$. Observe that the number of dotted lines $2r_0 + r_1 = x$, representing
discrete values of $x$, between $D$ and the line on which $D'$ resides is $x_0 - \lfloor x_L \rfloor - 1$,
and that between $D$ and the line on which $D''$ resides is $\lceil x_R \rceil - x_0 - 1$. Also observe
that moving in the direction of $E, SE, S, W, NW, N$ changes $2r_0 + r_1$
by $2, 1, -1, -2, -1, 1$, respectively.
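The six moves and their effect on $2r_0 + r_1$ can be tabulated directly. The following is a small illustrative sketch; the names `STEPS` and `change_in_x` are hypothetical and chosen only for this example.

```python
# One step in each direction, written as (change in r0, change in r1),
# following the list of directions given in the text.
STEPS = {
    "E":  (+1, 0),
    "SE": (+1, -1),
    "S":  (0, -1),
    "W":  (-1, 0),
    "NW": (-1, +1),
    "N":  (0, +1),
}

def change_in_x(direction):
    """Change in 2*r0 + r1 caused by one step in the given direction."""
    d0, d1 = STEPS[direction]
    return 2 * d0 + d1

changes = {d: change_in_x(d) for d in STEPS}
```

Only W, NW and S reduce $2r_0 + r_1$, and W is the only move that reduces it by 2, which is why the leftward search favors W.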
Let's first find the shortest Hamming distance from the table $D$, represented by the
point $(r_0, r_1)$, to the shaded area to the left of the black lines. Finding the shortest
Hamming distance is equivalent to solving the following optimization problem:

$$\begin{aligned}
\text{minimize:}\quad & v_E + v_{SE} + v_S + v_W + v_{NW} + v_N \\
\text{subject to:}\quad & -2v_E - v_{SE} + v_S + 2v_W + v_{NW} - v_N \ge (2r_0 + r_1) - \lfloor x_L \rfloor \\
& -2v_E - v_{SE} + v_S + 2(v_W - 1) + v_{NW} - v_N < (2r_0 + r_1) - \lfloor x_L \rfloor \\
& v_W + v_{NW} - v_E - v_{SE} \le r_0 - r_0^{\min} \\
& v_S + v_{SE} - v_N - v_{NW} \le r_1 - r_1^{\min} \\
& v_E, v_{SE}, v_S, v_W, v_{NW}, v_N \ge 0,
\end{aligned}$$

where $r_0^{\min}$ and $r_1^{\min}$ are the smallest possible values for $r_0$ and $r_1$, respectively.
Because the margins of the genotype table are required to be positive,
$r_0^{\min}$ and $r_1^{\min}$ can be greater than 0. The first constraint ensures that $D$ crosses or
ends up on the black line on the left. The second constraint ensures that $D$ does not
end up too far from the black line; i.e., moving $D$ back to the east by 1 step would prevent
$D$ from crossing the black line. The third and fourth constraints ensure that $D$ stays
inside the grid and does not move past the $r_0 = r_0^{\min}$ and $r_1 = r_1^{\min}$ lines, respectively.
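For small tables the four constraints can be checked by exhaustive search. The sketch below enumerates all step vectors up to a small bound and returns the smallest total number of steps; the function name `min_moves_left`, the bound `max_steps`, and the numeric inputs are hypothetical, and the search is meant only as an illustration of the constraints, not as the method used in the paper.

```python
from itertools import product

def min_moves_left(r0, r1, floor_xL, r0_min=0, r1_min=0, max_steps=4):
    """Smallest v_E+v_SE+v_S+v_W+v_NW+v_N satisfying the four constraints.

    Returns None if no feasible vector exists with every entry <= max_steps.
    """
    gap = (2 * r0 + r1) - floor_xL
    best = None
    for vE, vSE, vS, vW, vNW, vN in product(range(max_steps + 1), repeat=6):
        shift = -2 * vE - vSE + vS + 2 * vW + vNW - vN
        if shift < gap:                          # constraint 1: reach the line
            continue
        if shift - 2 >= gap:                     # constraint 2: do not overshoot
            continue
        if vW + vNW - vE - vSE > r0 - r0_min:    # constraint 3: r0 stays >= r0_min
            continue
        if vS + vSE - vN - vNW > r1 - r1_min:    # constraint 4: r1 stays >= r1_min
            continue
        total = vE + vSE + vS + vW + vNW + vN
        if best is None or total < best:
            best = total
    return best

# Plenty of room to move west: gap = 13 - 7 = 6, three W steps suffice.
roomy = min_moves_left(r0=5, r1=3, floor_xL=7)

# Only one W step available: gap = 8 - 3 = 5, so one W step plus three S steps.
tight = min_moves_left(r0=1, r1=6, floor_xL=3)
```

The search visits `(max_steps + 1) ** 6` vectors, so it is practical only for very small bounds.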
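Let's rewrite the optimization problem as the following:

$$\begin{aligned}
\text{minimize:}\quad & v_E + v_{SE} + v_S + v_W + v_{NW} + v_N \\
\text{subject to:}\quad & 2v_E + v_{SE} - v_S - 2v_W - v_{NW} + v_N + (2r_0 + r_1) - \lfloor x_L \rfloor \le 0 \\
& 2v_E + v_{SE} - v_S - 2(v_W - 1) - v_{NW} + v_N + (2r_0 + r_1) - \lfloor x_L \rfloor - 1 \le 0 \\
& -v_E - v_{SE} + v_W + v_{NW} - r_0 + r_0^{\min} \le 0 \\
& v_{SE} + v_S - v_{NW} - v_N - r_1 + r_1^{\min} \le 0 \\
& -v_E \le 0 \\
& -v_{SE} \le 0 \\
& -v_S \le 0 \\
& -v_W \le 0 \\
& -v_{NW} \le 0 \\
& -v_N \le 0
\end{aligned}$$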
Let's assign $u_i \ge 0$, $i \in \{1, 2, 3, 4, E, SE, S, W, NW, N\}$, to each inequality constraint. Then the KKT conditions are

$$\left\{\begin{aligned}
-1 &= 2u_1 + 2u_2 - u_3 - u_E \\
-1 &= u_1 + u_2 - u_3 + u_4 - u_{SE} \\
-1 &= -u_1 - u_2 + u_4 - u_S \\
-1 &= -2u_1 - 2u_2 + u_3 - u_W \\
-1 &= -u_1 - u_2 + u_3 - u_4 - u_{NW} \\
-1 &= u_1 + u_2 - u_4 - u_N \\
0 &\ge 2v_E + v_{SE} - v_S - 2v_W - v_{NW} + v_N + (2r_0 + r_1) - \lfloor x_L \rfloor \\
0 &\ge 2v_E + v_{SE} - v_S - 2(v_W - 1) - v_{NW} + v_N + (2r_0 + r_1) - \lfloor x_L \rfloor - 1 \\
0 &\ge -v_E - v_{SE} + v_W + v_{NW} - r_0 + r_0^{\min} \\
0 &\ge v_{SE} + v_S - v_{NW} - v_N - r_1 + r_1^{\min} \\
0 &= u_1 \left\{2v_E + v_{SE} - v_S - 2v_W - v_{NW} + v_N + (2r_0 + r_1) - \lfloor x_L \rfloor\right\} \\
0 &= u_2 \left\{2v_E + v_{SE} - v_S - 2(v_W - 1) - v_{NW} + v_N + (2r_0 + r_1) - \lfloor x_L \rfloor - 1\right\} \\
0 &= u_3 \left\{-v_E - v_{SE} + v_W + v_{NW} - r_0 + r_0^{\min}\right\} \\
0 &= u_4 \left\{v_{SE} + v_S - v_{NW} - v_N - r_1 + r_1^{\min}\right\} \\
0 &= u_E v_E = u_{SE} v_{SE} = u_S v_S = u_W v_W = u_{NW} v_{NW} = u_N v_N \\
0 &\le u_i, \quad i \in \{1, 2, 3, 4, E, SE, S, W, NW, N\}
\end{aligned}\right.$$
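The first six rows of the system determine the sign-constraint multipliers once $(u_1, u_2, u_3, u_4)$ are fixed, which gives a quick way to test candidate multipliers. The helper below is a hypothetical sketch (the name `sign_multipliers` is not from the paper); it solves each stationarity row for the corresponding multiplier using exact fractions.

```python
from fractions import Fraction as F

def sign_multipliers(u1, u2, u3, u4):
    """Solve the six stationarity rows for (u_E, u_SE, u_S, u_W, u_NW, u_N)."""
    return (
        1 + 2 * u1 + 2 * u2 - u3,     # from -1 = 2u1 + 2u2 - u3 - u_E
        1 + u1 + u2 - u3 + u4,        # from -1 = u1 + u2 - u3 + u4 - u_SE
        1 - u1 - u2 + u4,             # from -1 = -u1 - u2 + u4 - u_S
        1 - 2 * u1 - 2 * u2 + u3,     # from -1 = -2u1 - 2u2 + u3 - u_W
        1 - u1 - u2 + u3 - u4,        # from -1 = -u1 - u2 + u3 - u4 - u_NW
        1 + u1 + u2 - u4,             # from -1 = u1 + u2 - u4 - u_N
    )

def dual_feasible(u1, u2, u3, u4):
    """True when all ten multipliers are nonnegative."""
    return min((u1, u2, u3, u4) + sign_multipliers(u1, u2, u3, u4)) >= 0

half_on_first = sign_multipliers(F(1, 2), 0, 0, 0)
first_and_third = sign_multipliers(1, 0, 1, 0)
```

A zero entry in the result marks a direction whose step count may be positive under complementary slackness; for `(1/2, 0, 0, 0)` that direction is W, and for `(1, 0, 1, 0)` it is S and W.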
Because the objective function is linear and the inequality constraints are affine,
the problem is convex and the KKT conditions are sufficient for optimality. The following points satisfy the
KKT conditions, and hence they are solutions to the optimization problem:
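(i) When $2(r_0 - r_0^{\min}) \ge (2r_0 + r_1) - \lfloor x_L \rfloor$ and $(2r_0 + r_1) - \lfloor x_L \rfloor$ is even:

$$\begin{aligned}
(v_E, v_{SE}, v_S, v_W, v_{NW}, v_N) &= \left(0, 0, 0, \frac{(2r_0 + r_1) - \lfloor x_L \rfloor}{2}, 0, 0\right) \\
(u_1, u_2, u_3, u_4) &= \left(\tfrac{1}{2}, 0, 0, 0\right) \\
(u_E, u_{SE}, u_S, u_W, u_{NW}, u_N) &= \left(2, \tfrac{3}{2}, \tfrac{1}{2}, 0, \tfrac{1}{2}, \tfrac{3}{2}\right)
\end{aligned}$$

(ii) When $2(r_0 - r_0^{\min}) \ge (2r_0 + r_1) - \lfloor x_L \rfloor$ and $(2r_0 + r_1) - \lfloor x_L \rfloor$ is odd:

$$\begin{aligned}
(v_E, v_{SE}, v_S, v_W, v_{NW}, v_N) &= \left(0, 0, 0, \frac{(2r_0 + r_1) - \lfloor x_L \rfloor}{2}, 0, 0\right) \\
(u_1, u_2, u_3, u_4) &= \left(0, \tfrac{1}{2}, 0, 0\right) \\
(u_E, u_{SE}, u_S, u_W, u_{NW}, u_N) &= \left(2, \tfrac{3}{2}, \tfrac{1}{2}, 0, \tfrac{1}{2}, \tfrac{3}{2}\right)
\end{aligned}$$

(iii) When $2(r_0 - r_0^{\min}) < (2r_0 + r_1) - \lfloor x_L \rfloor$:

$$\begin{aligned}
v_W &= r_0 - r_0^{\min} \\
v_S &= (2r_0 + r_1) - \lfloor x_L \rfloor - 2v_W \\
(v_E, v_{SE}, v_{NW}, v_N) &= (0, 0, 0, 0) \\
(u_1, u_2, u_3, u_4) &= (1, 0, 1, 0) \\
(u_E, u_{SE}, u_S, u_W, u_{NW}, u_N) &= (2, 1, 0, 0, 1, 2)
\end{aligned}$$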
That is, the optimal solution can be found by either
(1) increasing $v_W$ until a solution is found, if $2(r_0 - r_0^{\min}) \ge (2r_0 + r_1) - \lfloor x_L \rfloor$, or
(2) increasing $v_W$ until $v_W = r_0 - r_0^{\min}$, then increasing $v_S$ until a solution is found, if
$2(r_0 - r_0^{\min}) < (2r_0 + r_1) - \lfloor x_L \rfloor$.
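The two-step procedure just described can be written as a short routine. This is a sketch under assumptions rather than the authors' implementation: the name `greedy_left` is hypothetical, the gap $(2r_0 + r_1) - \lfloor x_L \rfloor$ is assumed positive, enough room is assumed in the $r_1$ direction for the S steps, and odd gaps are rounded up so that the number of W steps is an integer.

```python
def greedy_left(r0, r1, floor_xL, r0_min=0):
    """Return (v_W, v_S): move west first, then south if west runs out."""
    gap = (2 * r0 + r1) - floor_xL      # distance to cover in units of 2*r0 + r1
    west_budget = r0 - r0_min           # W steps available before leaving the grid
    # ASSUMPTION: round up for odd gaps so the step count is an integer.
    v_W = min(west_budget, (gap + 1) // 2)
    # Each W step covers 2 units; whatever remains is covered by S steps.
    v_S = max(0, gap - 2 * v_W)
    return v_W, v_S

enough_room = greedy_left(r0=5, r1=3, floor_xL=7)     # gap 6, budget 5
limited_room = greedy_left(r0=1, r1=6, floor_xL=3)    # gap 5, budget 1
odd_gap = greedy_left(r0=5, r1=2, floor_xL=7)         # gap 5, budget 5
```

In the second call the W budget is exhausted after one step, so the remaining three units are covered by S steps, giving four steps in total.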
Similarly, we can find the shortest Hamming distance from the table $D$, represented by the point $(r_0, r_1)$, to the shaded area to the right of the black lines by
solving the following optimization problem:

$$\begin{aligned}
\text{minimize:}\quad & v_E + v_{SE} + v_S + v_W + v_{NW} + v_N \\
\text{subject to:}\quad & 2v_E + v_{SE} - v_S - 2v_W - v_{NW} + v_N \ge \lceil x_R \rceil - (2r_0 + r_1) \\
& 2(v_E - 1) + v_{SE} - v_S - 2v_W - v_{NW} + v_N < \lceil x_R \rceil - (2r_0 + r_1) \\
& \frac{(r_1 + v_N + v_{NW} - v_S - v_{SE}) - r_1^{\max}}{(r_0 + v_E + v_{SE} - v_W - v_{NW}) - r_0^{\min}} \le \frac{r_1^{\min} - r_1^{\max}}{r_0^{\max} - r_0^{\min}} \\
& v_E, v_{SE}, v_S, v_W, v_{NW}, v_N \ge 0,
\end{aligned}$$
where $r_i^{\max}$ and $r_i^{\min}$ are, respectively, the maximum and minimum values $r_i$ can
take. The first constraint ensures that $D$ crosses or ends up on the black line on the
right. The second constraint ensures that $D$ does not end up too far from the black
line; i.e., moving $D$ back to the west by 1 step would prevent $D$ from crossing the black line.
The third constraint ensures that $D$ stays inside the grid and does not move past the line

$$\frac{y - r_1^{\max}}{x - r_0^{\min}} = \frac{r_1^{\min} - r_1^{\max}}{r_0^{\max} - r_0^{\min}},$$

which in Figure 2 is the top-right boundary formed
by connecting the rightmost dots for each $r_1$. Once again, the KKT conditions
are sufficient for optimality. Let's assign $u_i \ge 0$, $i \in \{1, 2, 3, E, SE, S, W, NW, N\}$, to
each inequality constraint; then the following points satisfy the KKT conditions:
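(i) When $\frac{r_0^{\max} - r_0^{\min}}{r_1^{\max} - r_1^{\min}}(r_1^{\max} - r_1) - (r_0 - r_0^{\min}) \ge \frac{\lceil x_R \rceil - (2r_0 + r_1)}{2}$ and $\lceil x_R \rceil - (2r_0 + r_1)$ is even:

$$\begin{aligned}
(v_E, v_{SE}, v_S, v_W, v_{NW}, v_N) &= \left(\frac{\lceil x_R \rceil - (2r_0 + r_1)}{2}, 0, 0, 0, 0, 0\right) \\
(u_1, u_2, u_3) &= \left(\tfrac{1}{2}, 0, 0\right) \\
(u_E, u_{SE}, u_S, u_W, u_{NW}, u_N) &= \left(0, \tfrac{1}{2}, \tfrac{3}{2}, 2, \tfrac{3}{2}, \tfrac{1}{2}\right)
\end{aligned}$$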
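(ii) When $\frac{r_0^{\max} - r_0^{\min}}{r_1^{\max} - r_1^{\min}}(r_1^{\max} - r_1) - (r_0 - r_0^{\min}) \ge \frac{\lceil x_R \rceil - (2r_0 + r_1)}{2}$ and $\lceil x_R \rceil - (2r_0 + r_1)$ is odd:

$$\begin{aligned}
(v_E, v_{SE}, v_S, v_W, v_{NW}, v_N) &= \left(\frac{\lceil x_R \rceil - (2r_0 + r_1)}{2}, 0, 0, 0, 0, 0\right) \\
(u_1, u_2, u_3) &= \left(0, \tfrac{1}{2}, 0\right) \\
(u_E, u_{SE}, u_S, u_W, u_{NW}, u_N) &= \left(0, \tfrac{1}{2}, \tfrac{3}{2}, 2, \tfrac{3}{2}, \tfrac{1}{2}\right)
\end{aligned}$$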
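(iii) When $\frac{r_0^{\max} - r_0^{\min}}{r_1^{\max} - r_1^{\min}}(r_1^{\max} - r_1) - (r_0 - r_0^{\min}) < \frac{\lceil x_R \rceil - (2r_0 + r_1)}{2}$:

$$\begin{aligned}
v_E &= \frac{r_0^{\max} - r_0^{\min}}{r_1^{\max} - r_1^{\min}}(r_1^{\max} - r_1) - (r_0 - r_0^{\min}) \\
v_{SE} &= \lceil x_R \rceil - (2r_0 + r_1) - 2v_E \\
(v_S, v_W, v_{NW}, v_N) &= (0, 0, 0, 0) \\
u_1 &= \frac{1}{2} + \frac{(r_1^{\max} - r_1^{\min})/2}{2(r_0^{\max} - r_0^{\min}) - (r_1^{\max} - r_1^{\min})} \\
u_2 &= 0 \\
u_3 &= \frac{1}{2(r_0^{\max} - r_0^{\min}) - (r_1^{\max} - r_1^{\min})} \\
(u_E, u_{SE}, u_S, u_W, u_{NW}, u_N) &= (0, 0, 1, 2, 2, 1)
\end{aligned}$$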
That is, the optimal solution can be found by either
(1) increasing $v_E$ until a solution is found, if $\frac{r_0^{\max} - r_0^{\min}}{r_1^{\max} - r_1^{\min}}(r_1^{\max} - r_1) - (r_0 - r_0^{\min}) \ge \frac{\lceil x_R \rceil - (2r_0 + r_1)}{2}$, or
(2) increasing $v_E$ until $v_E = \frac{r_0^{\max} - r_0^{\min}}{r_1^{\max} - r_1^{\min}}(r_1^{\max} - r_1) - (r_0 - r_0^{\min})$, then increasing $v_{SE}$ until a solution is found, if $\frac{r_0^{\max} - r_0^{\min}}{r_1^{\max} - r_1^{\min}}(r_1^{\max} - r_1) - (r_0 - r_0^{\min}) < \frac{\lceil x_R \rceil - (2r_0 + r_1)}{2}$.
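The multipliers stated for the boundary case on the right-hand side can be checked with exact arithmetic. The sketch below rests on two assumptions that are not spelled out in the text: the first constraint is taken in the form $\lceil x_R \rceil - (2r_0 + r_1) - (2v_E + v_{SE} - v_S - 2v_W - v_{NW} + v_N) \le 0$, and the boundary constraint is cleared of its denominators, giving $(r_0^{\max} - r_0^{\min})(y - r_1^{\max}) + (r_1^{\max} - r_1^{\min})(x - r_0^{\min}) \le 0$. The function name and the grid limits are hypothetical.

```python
from fractions import Fraction as F

def right_boundary_multipliers(r0_max, r0_min, r1_max, r1_min):
    """Sign-constraint multipliers implied by the stated u1 and u3 (u2 = 0).

    Requires 2*(r0_max - r0_min) != (r1_max - r1_min).
    """
    d0 = r0_max - r0_min
    d1 = r1_max - r1_min
    u1 = F(1, 2) + F(d1, 2) / (2 * d0 - d1)
    u3 = F(1, 2 * d0 - d1)
    # Stationarity: 1 + u1 * d(g1)/dv + u3 * d(g3)/dv - u_dir = 0, where
    # g1 and g3 are the ASSUMED forms of constraints 1 and 3 described above.
    u_E = 1 - 2 * u1 + d1 * u3
    u_SE = 1 - u1 + (d1 - d0) * u3
    u_S = 1 + u1 - d0 * u3
    u_W = 1 + 2 * u1 - d1 * u3
    u_NW = 1 + u1 + (d0 - d1) * u3
    u_N = 1 - u1 + d0 * u3
    return (u_E, u_SE, u_S, u_W, u_NW, u_N)

# Hypothetical grid limits: r0 in [0, 10], r1 in [0, 8].
result = right_boundary_multipliers(r0_max=10, r0_min=0, r1_max=8, r1_min=0)
```

Under these assumptions the multipliers for E and SE vanish, consistent with those being the two directions with positive step counts in the boundary case.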