Erasure Correcting Codes in the Real World
Udi Wieder
Incorporates presentations made by Michael Luby
and Michael Mitzenmacher.
Based On…
Practical Loss-Resilient Codes
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi, Dan Spielman, Volker Stemann
STOC ’97
Analysis of Random Processes via And-Or Tree Evaluation
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi
SODA ’98
LT Codes
Michael Luby
FOCS 2002
Online Codes
Petar Maymounkov
Probabilistic Channels
(Figure: two channel diagrams. The binary erasure channel: each bit arrives intact with probability 1-p and is erased, i.e. replaced by "?", with probability p. The binary symmetric channel: each bit arrives intact with probability 1-p and is flipped with probability p.)
Erasure Codes
(Diagram: Content of n symbols → Encoding of cn symbols → Transmission → ≥ n symbols received → Decoding → Content of n symbols recovered.)
Performance Measures
Time Overhead
The time to encode and decode expressed as a multiple of the
encoding length.
Reception Efficiency
Ratio of packets in message to packets needed to decode.
Optimal is 1.
Known Codes
Random Linear Codes (Elias)
A linear code of minimum distance d can correct any pattern of d-1 or fewer erasures.
Achieves the capacity of the channel with high probability, i.e., can be used to transmit over the erasure channel at any rate R < 1-p.
Decoding time O(n³). Unacceptable.
Reed-Solomon Codes
Optimal reception efficiency with probability 1.
Decoding and encoding in quadratic time (about one minute to encode 1 MB).
Tornado Codes
Practical Loss-Resilient Codes
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi, Dan Spielman, Volker Stemann (1997)
Analysis of Random Processes via And-Or Tree Evaluation
Michael Luby, Michael Mitzenmacher, Amin Shokrollahi (1998)
Low Density Parity Check Codes
Introduced in the early 60’s by Gallager and reinvented many times.
(Figure: a bipartite graph between message bits a–l on the left and check bits on the right; each check bit stores the XOR of its message-bit neighbors, e.g. a ⊕ b ⊕ e.)
The time to encode is proportional to the number of edges.
Encoding Process
(Figure: message bits and check bits connected by a bipartite graph.)
Standard Loss-Resilient Code
Length of message: k
Check bits: βk + β²k + … = βk/(1-β) over the cascaded layers
Rate: 1-β
Decoding Rule
Given the value of a check bit and all but one of the message bits on which it depends, set the missing message bit to the XOR of the check bit and its known message bits.
Then XOR the newly recovered message bit into all of its check-bit neighbors.
Delete the recovered message bit and all of its edges from the graph.
Decoding ends (successfully) when all edges are deleted.
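A minimal sketch of this rule in Python (illustrative, not the paper's implementation): each check bit stores the XOR of its message-bit neighbors, and any check with exactly one erased neighbor recovers that bit.

```python
# Peeling decoder sketch for an XOR-based erasure code. msg holds the
# message bits with None marking an erasure; each check is a pair
# (check_value, set of message-bit indices it depends on).

def peel_decode(msg, checks):
    progress = True
    while progress:
        progress = False
        for value, nbrs in checks:
            erased = [i for i in nbrs if msg[i] is None]
            if len(erased) == 1:
                for i in nbrs:                # XOR the check bit with
                    if msg[i] is not None:    # its known message bits
                        value ^= msg[i]
                msg[erased[0]] = value        # recover the missing bit
                progress = True
    return msg

# Example: message (1,0,0,0) with bits 1 and 3 erased;
# checks c0 = m0 XOR m1 = 1 and c1 = m1 XOR m2 XOR m3 = 0.
print(peel_decode([1, None, 0, None], [(1, {0, 1}), (0, {1, 2, 3})]))
# -> [1, 0, 0, 0]
```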
Decoding Process
(Figure: a step-by-step example on the bipartite graph; check bits such as b ⊕ g and e ⊕ g ⊕ h recover the erased message bits one at a time.)
Regular Graphs
(Figure: a bipartite graph in which every message bit has degree 3 and every check bit has degree 6; the edge endpoints are matched by a random permutation.)
3-6 Regular Graph Analysis
(Figure: the tree of left (message) and right (check) nodes hanging below a single left-right edge.)
Let δ be the erasure probability and x = Pr[ not recovered ] for the message bit at the end of a random edge.
Pr[ all 5 other message-bit neighbors of a check are recovered ] = (1-x)^5
Pr[ not recovered after one more iteration ] = δ(1-(1-x)^5)^2
(A message bit of degree 3 stays unrecovered iff it was erased and each of its other two checks still has another unrecovered neighbor.)
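This recursion is easy to iterate numerically; a minimal sketch (the two sample values of δ are chosen to sit on either side of the (3,6) threshold, which is roughly 0.429):

```python
# Density-evolution sketch for the (3,6)-regular graph: iterate
# x <- delta * (1 - (1 - x)^5)^2, the probability that the message bit
# on a random edge is still unrecovered.

def unrecovered_fraction(delta, iterations=2000):
    x = delta                      # initially every erased bit is unrecovered
    for _ in range(iterations):
        x = delta * (1 - (1 - x) ** 5) ** 2
    return x

print(unrecovered_fraction(0.40))  # below threshold: tends to 0
print(unrecovered_fraction(0.45))  # above threshold: stalls at a nonzero point
```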
Decoding to Completion
(sketch)
Most message bits are roots of trees.
Concentration results (edge exposure martingale) prove that all but a small fraction of the message bits are decoded with high probability.
The remaining bits are decoded due to expansion (the original graph is a good expander on small sets).
If a set of size s and average degree a has more than as/2 neighbors, then a unique neighbor exists and decoding continues.
Efficiency
Encoding time (sec), 1k packets:

  size   Reed-Solomon   Tornado
  250k        4.6        0.06
  500k       19          0.12
  1 MB       93          0.26
  2 MB      442          0.53
  4 MB     1717          1.06
  9 MB     6994          2.13
  16 MB   30802          4.33

Decoding time (sec), 1k packets:

  size   Reed-Solomon   Tornado
  250k        2.06       0.06
  500k        8.4        0.09
  1 MB       40.5        0.14
  2 MB      199          0.19
  4 MB      800          0.40
  9 MB     3166          0.87
  16 MB   13829          1.75

Rate = 0.5
Erasure probability = 0.5
Implementation = ?
LT Codes
Michael Luby (2002)
‘Rateless’ Codes
A different model of transmission:
The sender sends an infinite sequence of encoding symbols.
Erasures are independent of content.
The receiver may decode once it has received enough symbols.
Time complexity: the average time to encode a symbol.
Reception efficiency.
The ‘Digital Fountain’ approach.
Applications
Unreliable channels.
Multi-source download:
In Tornado codes, a small rate implies big graphs and therefore a lot of memory (proportional to the size of the encoding).
Downloading from different servers requires no coordination.
Efficient exchange of data between users requires a small rate at the source.
Multicast without feedback (say, over the internet).
Rateless codes are the natural notion.
Trivial Examples - Repetition
Each time unit, send a random symbol of the code.
Advantage: encoding complexity O(1).
Disadvantage: need k' = k ln(k/δ) code symbols to cover all k content symbols with failure probability at most δ (the coupon-collector bound).
Example:
k = 100,000, δ = 10⁻⁶: ln(k/δ) = ln(10¹¹) ≈ 25.3,
so the reception overhead ≈ 2400% (terrible).
Trivial Examples – Reed-Solomon
Each time unit, send an evaluation of the polynomial at a random point.
Advantage: decoding is possible as soon as k symbols are received.
Disadvantage: large time complexity for encoding and decoding.
Parameters of LT Codes
Encoding time complexity: O(ln k) per symbol.
Decoding time complexity: O(k ln k).
Reception overhead: asymptotically zero (unlike Tornado codes).
Failure probability: very small (smaller than Tornado).
LT encoding
1. Choose a degree d from the degree distribution.
2. Choose d random content symbols.
3. XOR the chosen content symbols together (for d = 1, simply copy the single chosen symbol).
4. Insert a header identifying the neighbors, and send.
Example degree distribution:

  Degree    Prob
  1         0.055
  2         0.3
  3         0.1
  4         0.08
  …
  100000    0.0004

(Figure: three runs of the encoder on the content, choosing degrees 2, 1 and 4.)
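A minimal encoder sketch following these steps (the toy degree distribution and the seed-as-header convention are illustrative assumptions, not Luby's parameters):

```python
import random

# LT-encoder sketch: draw a degree d from the degree distribution,
# pick d random content symbols, XOR them, and ship the result with a
# header (here simply the PRNG seed) from which the receiver can
# re-derive the chosen neighbors.

def encode_symbol(content, degree_dist, seed):
    rng = random.Random(seed)
    degrees, probs = zip(*degree_dist)
    d = rng.choices(degrees, weights=probs)[0]    # choose degree
    nbrs = rng.sample(range(len(content)), d)     # choose d content symbols
    value = 0
    for i in nbrs:                                # XOR them together
        value ^= content[i]
    return seed, value                            # header + payload

content = [7, 1, 4, 9, 3]                         # toy content symbols
dist = [(1, 0.2), (2, 0.5), (3, 0.3)]             # toy degree distribution
stream = (encode_symbol(content, dist, s) for s in range(10 ** 9))
print(next(stream), next(stream))                 # 'rateless': endless supply
```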
LT encoding properties
Encoding symbols are generated independently of each other.
Any number of encoding symbols can be generated on the fly.
Reception overhead is independent of loss patterns:
The success of the decoding process depends only on the degree distribution of received encoding symbols.
The degree distribution of received encoding symbols is the same as the degree distribution of generated encoding symbols.
LT decoding
(Figure: the graph between received encoding symbols and the unknown content symbols.)
1. Collect enough encoding symbols and set up the graph between encoding symbols and the content symbols to be recovered.
2. Identify an encoding symbol of degree 1. STOP if none exists.
3. Copy the value of the encoding symbol into its unique neighbor, XOR the value of the newly recovered content symbol into its other encoding-symbol neighbors, and delete the edges emanating from the content symbol.
4. Go to Step 2.
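A sketch of these four steps in Python (assuming each received symbol already carries its value together with its set of content-symbol neighbors, e.g. re-derived from the header):

```python
# LT peeling-decoder sketch: repeatedly recover the unique neighbor of
# a degree-1 encoding symbol, then XOR the recovered value out of every
# other encoding symbol that depends on it.

def lt_decode(k, received):
    content = [None] * k
    symbols = [[value, set(nbrs)] for value, nbrs in received]
    progress = True
    while progress:
        progress = False
        for sym in symbols:
            if len(sym[1]) == 1:                  # degree-1 symbol found
                i = sym[1].pop()
                if content[i] is None:
                    content[i] = sym[0]           # copy value into neighbor
                    progress = True
        for sym in symbols:                       # peel recovered neighbors
            for i in [j for j in sym[1] if content[j] is not None]:
                sym[0] ^= content[i]
                sym[1].remove(i)
    return content                                # None = decoding stopped

# Example: content (1,0,1), three received symbols.
print(lt_decode(3, [(1, {0}), (1, {0, 1}), (1, {1, 2})]))  # -> [1, 0, 1]
```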
Releasing an encoding symbol
An encoding symbol of degree i is released by the x-th recovered content symbol when i-2 of its neighbors lie among the first x-1 recovered content symbols, one neighbor is the x-th recovered content symbol, and its remaining neighbor lies among the k-x unrecovered content symbols, which it can now recover.
(Figure: a degree-i encoding symbol with i-2 edges into the x-1 recovered content symbols, one edge to the x-th, and one edge into the k-x unrecovered content symbols.)
The Ripple
Definition: At each decoding step, the ripple is the set of encoding symbols that have been released at some previous decoding step but whose one remaining content symbol has not yet been recovered.
(Figure: x recovered and k-x unrecovered content symbols; the encoding symbols in the ripple each point at one unrecovered content symbol, and a collision occurs when two of them point at the same one.)
Successful Decoding
Decoding succeeds iff the ripple never becomes empty.
Small ripple: the chance of encoding-symbol collisions is small, so the reception overhead is small, but the risk of the ripple becoming empty due to random fluctuations is large.
Large ripple: the chance of encoding-symbol collisions is large, so the reception overhead is large, but the risk of the ripple becoming empty due to random fluctuations is small.
LT codes idea
Control the release of encoding symbols over the entire decoding process so that the ripple is never empty but never too large:
Very few encoding-symbol collisions.
Very little reception overhead.
Release probability
Definition: The release probability q(i,x) is the probability that an encoding symbol of degree i is released at decoding step x.
Proposition:
For i = 1: q(1,0) = 1 and q(1,x) = 0 for all x > 0.
For i > 1 and x = i-1, …, k-1:
q(i,x) = \frac{i(i-1)(k-x)\prod_{j=1}^{i-2}(x-j)}{\prod_{j=1}^{i}(k-j+1)}
(and q(i,x) = 0 for all other x).
Release probability
(Figure: the release condition illustrated: an encoding symbol of degree i is released at decoding step x exactly when i-2 of its neighbors lie among the first x-1 recovered content symbols, one neighbor is the x-th recovered content symbol, and one lies among the k-x unrecovered content symbols.)
Release distributions for specific degrees
(Plot: q(i,x) as a function of x for i = 2, 3, 4, 10, 20, with k = 1000.)
Overall release probability
Definition: At each decoding step x, r(x) is the overall probability that an encoding symbol is released at decoding step x, with respect to a specific degree distribution p(·).
Proposition:
r(x) = \sum_i p(i)\, q(i,x)
Uniform release question
Question: Is there a degree distribution such that the overall release distribution is uniform over x?
Why interesting?
One encoding symbol is released for each content symbol decoded.
The ripple will tend to stay small → minimal reception overhead.
The ripple will tend not to become empty → decoding will succeed.
Uniform release answer: YES!
Ideal Soliton Distribution:
p(1) = 1/k
For all i > 1: p(i) = \frac{1}{i(i-1)}
Ideal Soliton Distribution
(Plot: p(i) against i for k = 1000.)
A simple way to choose from the Ideal SD
Choose A uniformly from the interval [0,1).
If A ≥ 1/k then degree = ⌈1/A⌉.
Else degree = 1.
(Figure: the unit interval marked at 1/k, …, 1/6, 1/5, 1/4, 1/3, 1/2, 1; a value of A falling in [1/i, 1/(i-1)) yields degree i.)
This works because the interval [1/i, 1/(i-1)) has length 1/(i-1) - 1/i = 1/(i(i-1)) = p(i), and [0, 1/k) has length p(1) = 1/k.
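A sketch of this sampler in Python (the frequency check at the end is just an illustration):

```python
import math
import random
from collections import Counter

# Ideal Soliton sampler via the interval trick from the slide:
# A uniform in [0,1); A < 1/k gives degree 1, otherwise degree ceil(1/A),
# since A in [1/i, 1/(i-1)) has probability 1/(i(i-1)) = p(i).

def ideal_soliton_degree(k, rng=random):
    a = rng.random()
    return 1 if a < 1 / k else math.ceil(1 / a)

# Empirical check: degree 2 should appear with frequency ~ p(2) = 1/2.
counts = Counter(ideal_soliton_degree(1000) for _ in range(100_000))
print(counts[2] / 100_000)  # -> roughly 0.5
```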
Ideal SD theorem
Ideal SD Theorem: The overall release distribution is exactly
uniform, i.e., r(x) = 1/k for all x = 0,…,k-1.
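A quick numerical check of the Ideal SD Theorem (a sketch; it uses the equivalent binomial form q(i,x) = C(x-1, i-2)·(k-x)/C(k, i) of the release probability):

```python
from math import comb

# Verify r(x) = sum_i p(i) * q(i, x) = 1/k under the Ideal Soliton
# distribution, for a small k.
k = 30
p = {1: 1 / k, **{i: 1 / (i * (i - 1)) for i in range(2, k + 1)}}

def q(i, x):
    if i == 1:
        return 1.0 if x == 0 else 0.0
    if i - 1 <= x <= k - 1:
        # i-2 neighbors among the first x-1 recovered symbols, one equal
        # to the x-th, and one among the k-x unrecovered symbols.
        return comb(x - 1, i - 2) * (k - x) / comb(k, i)
    return 0.0

for x in range(k):
    r = sum(p[i] * q(i, x) for i in p)
    assert abs(r - 1 / k) < 1e-9, (x, r)
print("r(x) = 1/k at every decoding step")
```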
Overall release distribution for the Ideal SD
(Plot: the release distribution r(x), flat at 1/k across all decoding steps, for k = 1000.)
In expected value …
Optimal recovery with respect to the Ideal SD:
Receive exactly k encoding symbols.
Exactly one encoding symbol is released before any decoding steps; it recovers one content symbol.
At each decoding step a content symbol is recovered; it releases exactly one new encoding symbol, which in turn recovers exactly one more content symbol.
The ripple size is always exactly 1.
Performance Analysis
No reception overhead.
Average degree:
\sum_i i\, p(i) = \frac{1}{k} + \sum_{i=2}^{k} \frac{1}{i-1} = \frac{1}{k} + H(k-1) \approx \ln(k)
When taking into account random fluctuations …
The Ideal Soliton Distribution fails miserably:
Expected behavior is not equal to actual behavior, because of the variance.
The ripple is very likely to become empty.
Decoding fails with very high probability (even with high reception overhead).
Robust Soliton Distribution design
Need to ensure that the ripple never empties.
At the beginning of the decoding process:
ISD: the ripple is not large enough to withstand random fluctuations.
RSD: boost p(1) to roughly c/√k so that the expected ripple size at the beginning is about c·√k.
At the end of the decoding process:
ISD: the expected rate of additions to the ripple is not large enough to compensate for collisions towards the end of the decoding process, when the ripple is large relative to the number of unrecovered content symbols.
RSD: boost p(i) for higher degrees i so that the expected ripple growth at the end of the decoding process is higher.
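For completeness, a sketch of the full Robust Soliton Distribution as defined in Luby's LT Codes paper (ρ is the Ideal Soliton, τ adds the boosts described above, with S = c·ln(k/δ)·√k; the constants c and δ here are illustrative choices):

```python
import math

# Robust Soliton sketch following Luby's definition: tau boosts the low
# degrees (including degree 1) and puts a spike at degree ~ k/S, then
# rho + tau is renormalized. c and delta are tunable parameters.

def robust_soliton(k, c=0.1, delta=0.5):
    S = c * math.log(k / delta) * math.sqrt(k)
    rho = [0.0, 1.0 / k] + [1.0 / (i * (i - 1)) for i in range(2, k + 1)]
    tau = [0.0] * (k + 1)
    spike = round(k / S)                      # assumes 1 < k/S <= k
    for i in range(1, spike):
        tau[i] = S / (i * k)                  # tau[1] = c*ln(k/delta)/sqrt(k)
    tau[spike] = S * math.log(S / delta) / k  # spike at degree ~ k/S
    Z = sum(rho) + sum(tau)                   # normalizer (~ overhead factor)
    return [(rho[i] + tau[i]) / Z for i in range(k + 1)]

mu = robust_soliton(10_000)
avg_degree = sum(i * p for i, p in enumerate(mu))
print(mu[1], avg_degree)   # boosted p(1), average degree ~ ln(k/delta)
```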
LT Codes – Bottom line
Using the Robust Soliton Distribution:
The number of symbols needed to recover the data with probability 1-δ is k + O(√k · ln²(k/δ)).
The average degree of an encoding symbol is O(ln(k/δ)).
Online Codes
Online Codes
Petar Maymounkov