Capacity Lower Bounds for Channels with
Deletions and Insertions
Ramji Venkataramanan (University of Cambridge)
Sekhar Tatikonda (Yale University)
1 / 19
InDel Channel

Each input symbol x ∈ {0, 1} is independently:
- deleted with probability d
- duplicated (x → xx) with probability iα
- followed by a complementary insertion (x → x x̄) with probability i ᾱ
- passed through with no change with probability 1 − d − i

Input X^n = (X_1, ..., X_n); output Y^{M_n} = (Y_1, ..., Y_{M_n}),
where M_n is a random variable ≈ n(1 − d + i)

Goal: computable lower bounds for InDel capacity C(d, i, α)

2 / 19
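The per-symbol transitions above can be sketched as a short simulation (a minimal sketch, assuming the channel acts i.i.d. on each input bit; the function name `indel_channel` is illustrative, not from the talk):

```python
import random

def indel_channel(x_bits, d, i, alpha, rng=random):
    """Apply the InDel channel to a binary sequence.

    Each input bit is independently:
      - deleted with probability d,
      - duplicated (x -> xx) with probability i*alpha,
      - followed by its complement (x -> x x_bar) with probability i*(1-alpha),
      - passed through unchanged with probability 1 - d - i.
    """
    assert 0 <= d and 0 <= i and d + i <= 1 and 0 <= alpha <= 1
    y = []
    for x in x_bits:
        u = rng.random()
        if u < d:
            continue                      # deletion
        elif u < d + i * alpha:
            y.extend([x, x])              # duplication
        elif u < d + i:
            y.extend([x, 1 - x])          # complementary insertion
        else:
            y.append(x)                   # no change
    return y
```

With these transition probabilities the expected output length is n(1 − d + i), matching the slide's M_n ≈ n(1 − d + i).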
Motivation
High-density Data Storage
Binary data X =⇒ Y  (write)

Bits could be skipped or repeated −→ Loss of synchronization
[Iyengar, Siegel, Wolf ’10]
3 / 19
Related Work
i = 0: Deletion channel
[Diggavi, Grossglauser ’06]
[Drinea, Mitzenmacher ’07]
[Kirsch, Drinea ’10]
[Diggavi, Mitzenmacher, Pfister ’07]
[Fertonani, Duman ’09]
[Kanoria, Montanari ’10]
[Dalai ’10]
...
‘Sticky’ channel: d = 0, α = 1 - [Mitzenmacher ’08]
Different channel model: [Gallager ’61], [Fertonani et al ’11]
4 / 19
Deletion Channel
Each input bit x is deleted with probability d, and received unchanged with probability 1 − d.
Why is computing C(d) hard?
5 / 19
Challenges

The channel has memory

100001 −→ 1001
Which bits were deleted?

1. Optimal input distribution? (i.i.d. inputs not good)
2. Analyzing the optimal decoder?

6 / 19
Strategy

First-order Markov input distribution on {0, 1}:
P(X_{j+1} = x | X_j = x) = γ
- i.i.d. run-lengths controlled by γ
- best-known lower bounds on C(d) [Drinea-Mitzenmacher '07]

From Dobrushin '67:
C(d) = lim_{n→∞} C_n,   C_n ≜ max_{P(X^n)} (1/n) I(X^n; Y^{M_n})

Bound the mutual information

7 / 19
With Markov Inputs . . .

(1/n) I(X^n; Y^{M_n}) ≥ h(γ) − (1/n) H(X^n | Y^{M_n})

Challenge: To compute the limiting behavior of (1/n) H(X^n | Y^{M_n})
8 / 19
Think in terms of runs

X^n = 000 111 000 −→ 00 1 00 = Y^{M_n}

X^n first-order Markov: P(X^n) = P(L_X = 3) P(L_X = 3) P(L_X = 3),
with P(L_X = j) = γ^{j−1}(1 − γ)

1-to-1 correspondence between runs ⇒
P(Y^{M_n} | X^n) = P(L_Y = 2 | L_X = 3) P(L_Y = 1 | L_X = 3) P(L_Y = 2 | L_X = 3)

P(L_Y = k | L_X = j) = C(j, k) (1 − d)^k d^{j−k}

If no runs are deleted . . .
Single-letter characterization of P(X^n, Y^{M_n}) in terms of runs:
H(X^n | Y^{M_n}) = (# runs) H(L_X | L_Y)

9 / 19
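Under these run-length distributions, H(L_X | L_Y) can be evaluated numerically (a sketch assuming the geometric P(L_X = j) = γ^{j−1}(1 − γ) and the binomial P(L_Y | L_X) above, with run lengths truncated at `jmax`; the function name is illustrative):

```python
from math import comb, log2

def H_LX_given_LY(gamma, d, jmax=200):
    """Numerically evaluate H(L_X | L_Y) in bits, for geometric input
    run-lengths P(L_X = j) = gamma^(j-1) * (1 - gamma) and binomial
    thinning P(L_Y = k | L_X = j) = C(j, k) (1-d)^k d^(j-k).
    Run lengths are truncated at jmax (the tail mass is ~gamma^jmax)."""
    joint = {}    # P(L_X = j, L_Y = k)
    p_y = {}      # marginal P(L_Y = k)
    for j in range(1, jmax + 1):
        pj = gamma ** (j - 1) * (1 - gamma)
        for k in range(0, j + 1):
            pk_j = comb(j, k) * (1 - d) ** k * d ** (j - k)
            joint[(j, k)] = pj * pk_j
            p_y[k] = p_y.get(k, 0.0) + pj * pk_j
    # H(L_X | L_Y) = -sum_{j,k} P(j, k) log2 P(j | k)
    h = 0.0
    for (j, k), p in joint.items():
        if p > 0 and p_y[k] > 0:
            h -= p * log2(p / p_y[k])
    return h
```

Sanity check on the model: at d = 0 every run survives intact, so L_Y determines L_X and the conditional entropy is 0.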
Deleted Runs

X^n = 0 0 0 1 0 −→ 0 0 0 = Y^{M_n}

Deleted runs destroy correspondence between input and output runs

Main Idea
Extra info to indicate positions of completely deleted runs

0 0 0 1 0 −→ 0 0 0:
  0 0 − 0     ↔ (0, 0, 1, 0)
  0 0 0 − −   ↔ (0, 0, 0, 2)

Can be represented as an auxiliary sequence (S_1, ..., S_{M_n + 1})
S_j: # runs completely deleted between Y_{j−1} and Y_j

10 / 19
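The auxiliary sequence can be computed mechanically from the input and the deletion pattern (a minimal sketch; the function name `aux_sequence` and the boolean `kept` mask representation are illustrative choices, not from the talk):

```python
def aux_sequence(x, kept):
    """Compute the auxiliary sequence (S_1, ..., S_{M+1}) for input bits x
    and a boolean mask `kept` (True = bit survives the channel).

    S_j counts the runs of x deleted *completely* between output bits
    Y_{j-1} and Y_j (S_1: before Y_1; S_{M+1}: after Y_M).
    Returned 0-indexed: S[0] corresponds to S_1."""
    M = sum(kept)
    S = [0] * (M + 1)
    # split x into maximal runs of equal bits: (start, end) index pairs
    runs, start = [], 0
    for t in range(1, len(x) + 1):
        if t == len(x) or x[t] != x[t - 1]:
            runs.append((start, t))
            start = t
    # a fully deleted run falls into the S-slot right after the last
    # surviving output bit that precedes it
    for a, b in runs:
        if not any(kept[a:b]):
            S[sum(kept[:a])] += 1
    return S
```

Applied to the slide's example x = 00010 with output 000, the two deletion patterns give exactly the auxiliary sequences (0, 0, 0, 2) and (0, 0, 1, 0) shown above.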
A Coding Scheme

Pick 2^{nR} codewords {X^n(1), X^n(2), ...} ∼ Markov(γ)
From Y^{M_n}, using joint typicality decode S^{M_n+1} and then X^n

Probability of error:
2^{nR} · 2^{H(S^{M_n+1} | X^n)} · 2^{−I(X^n, S^{M_n+1}; Y^{M_n})}
        (aux. sequences/cwd)

Achieved Rate
R < (1/n) [ H(X^n) − H(S^{M_n+1} | Y^{M_n}) − H(X^n | S^{M_n+1}, Y^{M_n}) ]

Each term can be computed exactly [ISIT '11]
Scheme is suboptimal – what is the rate loss?

11 / 19
Mutual Information Decomposition

With first-order Markov input:

I(X^n; Y^{M_n}) = H(X^n) − H(X^n | Y^{M_n})
= nh(γ) − H(S^{M_n+1} | Y^{M_n}) − H(X^n | S^{M_n+1}, Y^{M_n})   [rate of coding scheme]
  + H(S^{M_n+1} | X^n, Y^{M_n})                                  [penalty term]

Lower bound on penalty term → improved lower bound on C(d)

12 / 19
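The decomposition follows from expanding H(X^n, S^{M_n+1} | Y^{M_n}) by the chain rule in two ways (a short derivation consistent with the terms above):

```latex
\begin{align}
H(X^n \mid Y^{M_n})
  &= H(X^n, S^{M_n+1} \mid Y^{M_n}) - H(S^{M_n+1} \mid X^n, Y^{M_n}) \\
  &= H(S^{M_n+1} \mid Y^{M_n}) + H(X^n \mid S^{M_n+1}, Y^{M_n})
     - H(S^{M_n+1} \mid X^n, Y^{M_n}),
\end{align}
so, using $H(X^n) \approx n h(\gamma)$ for Markov($\gamma$) inputs,
\begin{equation}
I(X^n; Y^{M_n}) =
\underbrace{n h(\gamma) - H(S^{M_n+1}\mid Y^{M_n})
  - H(X^n \mid S^{M_n+1}, Y^{M_n})}_{\text{rate of coding scheme}}
+ \underbrace{H(S^{M_n+1} \mid X^n, Y^{M_n})}_{\text{penalty term}}.
\end{equation}
```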
Bounding the penalty

x = 0 0 0 0 1 1 1 0 0 0 0 −→ 0 0 0 = y

Possible S sequences:
(0, 0, 0, 2) ↔ 0 0 0 − −
(0, 1, 0, 0) ↔ 0 − 0 0
(0, 0, 1, 0) ↔ 0 0 − 0
(2, 0, 0, 0) ↔ − − 0 0 0

For this (x, y):
- Can compute prob. of deletion patterns resulting in each S
- H(S | x, y) can be calculated exactly

13 / 19
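For a small example like the one above, H(S | x, y) can be computed by brute force: enumerate every deletion pattern of x that yields y, weight it by its probability, and aggregate by the resulting S (a sketch; function names are illustrative, and the enumeration is exponential in |x|, so it only illustrates the small-pattern computation):

```python
from itertools import combinations
from math import log2

def S_posterior(x, y, d):
    """Return the posterior P(S | x, y) over auxiliary sequences S, by
    enumerating all deletion patterns of x producing y.  S_j = # runs of x
    fully deleted between Y_{j-1} and Y_j (0-indexed in the returned keys)."""
    n, m = len(x), len(y)
    post = {}
    for kept_idx in combinations(range(n), m):
        if [x[t] for t in kept_idx] != y:
            continue
        kept = [t in kept_idx for t in range(n)]
        w = (1 - d) ** m * d ** (n - m)   # prob. of this deletion pattern
        # auxiliary sequence for this pattern (run-splitting as before)
        S, a = [0] * (m + 1), 0
        for t in range(1, n + 1):
            if t == n or x[t] != x[t - 1]:
                if not any(kept[a:t]):
                    S[sum(kept[:a])] += 1
                a = t
        post[tuple(S)] = post.get(tuple(S), 0.0) + w
    Z = sum(post.values())
    return {S: p / Z for S, p in post.items()}

def H_S_given_xy(x, y, d):
    """Conditional entropy H(S | x, y) in bits."""
    return -sum(p * log2(p) for p in S_posterior(x, y, d).values() if p > 0)
```

Running this on x = 00001110000, y = 000 recovers exactly the four S sequences listed on the slide.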
Bounding the penalty

Can compute H(S | x, y) exactly for

x = 0⋯0 (z bits) 1 . . . 1 0⋯0 (r bits) −→ 0⋯0 (s bits) = y

For typical (X, Y), count # times such a pattern appears
⇒ Lower bound on H(S | X, Y):

lim inf (1/n) H(S | X, Y) ≥ Φ(d, γ), where

Φ(d, γ) = [d̄ q̄ γ̄³ d / (γ²(1 − γd))] Σ_{z,r=1}^∞ (γd)^{z+r} Σ_{s=1}^∞ C(z+r, s) (d̄/d)^s
          · H( { C(z, l) C(r, s−l) / C(z+r, s) }_{l=0,...,s} )

14 / 19
Deletion bound

Theorem
The deletion capacity C(d) satisfies
C(d) ≥ max_{0<γ<1} { h(γ) − d̄ H(S_2 | Y_1 Y_2) − γ̄ H(L_X | L_{Y′}) + Φ(d, γ) }

d      LB of Thm.   LB [Drinea-Mitzenmacher]   Optimal γ
0.05   0.7291       0.7283                     0.535
0.10   0.5638       0.5620                     0.575
0.15   0.4414       0.4392                     0.62
0.20   0.3482       0.3467                     0.67
0.25   0.2770       0.2759                     0.72
0.30   0.2225       0.2224                     0.77
0.35   0.1805       0.1810                     0.81
0.40   0.1478       0.1484                     0.84
0.45   0.1217       0.1229                     0.87
0.50   0.1005       0.1019                     0.89

15 / 19
Insertions

Insertion-only channel: each input bit x is duplicated (x → xx) with probability iα, followed by a complementary insertion (x → x x̄) with probability i ᾱ, and passed through unchanged with probability 1 − i.

X^n = 0 0 0 1 1 1 0 0 0 −→ Y^{M_n} = 0 (0 1) 0 1 1 1 0 0 (0 0)
(second bit got a complementary insertion; last bit was duplicated)

Only complementary insertions introduce new runs

Define T^{M_n} = indicator of complementary insertions
Above: T^{M_n} = 0 0 1 0 0 0 0 0 0 0 0

Given T^{M_n}, flip/delete complementary insertions in Y^{M_n}
⇒ 1-to-1 correspondence between runs

16 / 19
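Given T^{M_n}, restoring the run correspondence is mechanical: the marked complementary insertions are simply deleted (a minimal sketch; the function name is illustrative):

```python
def strip_complementary_insertions(y, T):
    """Given channel output y and the indicator sequence T
    (T[j] = 1 iff y[j] is a complementarily inserted bit), delete the
    marked bits.  The runs of the result then correspond 1-to-1 with the
    runs of the channel input, since duplications never start a new run."""
    return [b for b, t in zip(y, T) if not t]
```

On the slide's example, stripping the single marked bit from Y^{M_n} recovers a sequence with the same three runs as X^n (the duplicated 0 just lengthens the final run).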
InDel Decomposition

(InDel channel: x deleted w.p. d; x → xx w.p. iα; x → x x̄ w.p. i ᾱ; x unchanged w.p. 1 − d − i)

I(X^n; Y^{M_n}) = H(X^n) − H(X^n | Y^{M_n})
= nh(γ) − H(X^n, T^{M_n}, S^{M_n+1} | Y^{M_n})   [rate of coding scheme]
  + H(T^{M_n}, S^{M_n+1} | X^n, Y^{M_n})         [penalty term]

17 / 19
InDel Bound
Theorem
C(d, i, α) ≥ max_{0<γ<1} [ R_sub(γ) + Φ(d, γ) + (1 − d) Γ(i/(1−d), α, q) ],
where R_sub(γ) is the rate of the suboptimal coding scheme.
18 / 19
InDel Bound

[Figure: capacity lower bound vs. deletion prob. d (= insertion prob. i) over [0, 0.3], with curves for α = 1 and α = 0.8]

18 / 19
Summary
Main Idea
Mutual information = Rate of run-syncing decoder + Penalty
Improved C(d) for d ≤ 0.3
First bounds for Insertion & InDel Channels with α < 1
Future Directions
Improve bound on penalty term
- Identify patterns for which H(S, T |X , Y ) computable
UPPER bounds: assume these sequences come ‘for free’
Combine with techniques of [Diggavi et al '07], [Fertonani et al '09]
Full paper: http://arxiv.org/abs/1102.5112
19 / 19