
Coding and Distributed Caching for
Content Delivery
Alex Dimakis
Based on collaborations with
K. Shanmugam, N. Golrezaei, A. Molisch, G. Caire
Mingyue Ji, Antonia Tulino (ALU-Bell Labs)
Jaime Llorca (ALU-Bell Labs)
My three minutes on storage codes

Distributed storage codes are already used for 'big data'.

Microsoft LRCs used in Azure and ship with Windows
server.

Piggyback codes (suboptimal Regenerating codes) will
become a part of Hadoop (Rashmi et al.)

Still, numerous fundamental open problems remain.
Questions we investigate

How to optimize placement of popular video files in
storage-enabled small-cell stations. (FemtoCaching)

When a cache does not store exactly what is desired,
it can still be useful, through index coding

The Coded caching problem. Why previous schemes do
not work for realistic parameters and how to fix that.

Conclusions and outlook
Single helper optimal caching
Assume that each user samples one file
from a given popularity distribution:
file n is chosen with probability p_n.
Let p_loc(k) be the probability that the choice of
user k is available locally (at helper H1).
Reward = $1 for each file found locally.
Maximize the expected reward.
Single helper optimal caching
•If there is a single helper, easy
to find best caching policy
•Cache M most popular files
•Maximizes expected reward
•Also works for the more
general case of expected delay
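The optimality of caching the M most popular files can be checked numerically. A minimal sketch, assuming an illustrative Zipf popularity distribution (the exponent 0.8 and sizes N = 8, M = 3 are made up for the example):

```python
import itertools

def expected_reward(cache, p):
    """Expected $1 rewards per request: P(requested file is in the cache)."""
    return sum(p[n] for n in cache)

# Illustrative Zipf(0.8) popularity over N = 8 files
N, M = 8, 3
weights = [1 / (n + 1) ** 0.8 for n in range(N)]
total = sum(weights)
p = [w / total for w in weights]  # p[n] = probability file n is requested

# Brute-force over all caches of size M: the best cache holds the M most popular files
best = max(itertools.combinations(range(N), M),
           key=lambda cache: expected_reward(cache, p))
assert set(best) == set(range(M))  # files 0..M-1 are the most popular under Zipf
print(expected_reward(best, p))
```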
The Femtocaching problem
• Conflict of interests in populating the caches.
– U1 and U2 want H1 to cache the M most popular files
– U4 prefers that H2 caches the M most popular files.
– U3 wants one of H1,H2 to cache the M most popular, and
the other one the M second-most-popular files.
Femtocaching formulation
• Files are either completely stored or not stored at all.
• Files available locally are retrieved with very small delay
• Fixed, known topology of users-helpers.
Problem Formulation
(cont'd)
– Expected reward for user k: Σ_n p_n · 1{file n is stored at some helper that user k reaches}
– Maximize the sum of expected rewards over all K users
Theoretical Results
Theorem: The femtocaching placement problem is NP-complete
(reduction from the 2-Disjoint Set Cover problem).
Theorem: The problem can be expressed as a
maximization of a submodular function subject to matroid
constraints.
A file-based greedy placement achieves a factor-2
approximation.
A pipage rounding placement algorithm achieves a
1 − 1/e ≈ 0.632 approximation (K^8 complexity).
Our Result
Theorem: A new approximation algorithm that exploits
graph sparsity.
If each user connects to d or fewer helpers,
maximizing these submodular functions can be done
within a provable approximation factor of 1 − (1 − 1/d)^d.
(As d grows, this recovers the 1 − 1/e result.)
Algorithm: create an appropriate LP and round it.
Question: faster approximation algorithms?
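The factor-2 greedy placement above can be sketched in a few lines. A toy implementation of greedy submodular maximization for helper caches; the topology, popularity values, and cache size below are all made up for illustration:

```python
def greedy_placement(helpers, users, conn, p, M):
    """Greedy femtocaching: repeatedly add the (helper, file) pair that most
    increases the expected reward, until every cache holds M files.
    conn[u] = helpers user u can reach; p[f] = popularity of file f."""
    cache = {h: set() for h in helpers}

    def reward():
        # Each user earns p[f] for every file f available at some reachable helper
        return sum(p[f] for u in users
                   for f in set().union(*(cache[h] for h in conn[u])))

    while any(len(cache[h]) < M for h in helpers):
        base, best = reward(), None
        for h in helpers:
            if len(cache[h]) >= M:
                continue
            for f in p:
                if f in cache[h]:
                    continue
                cache[h].add(f)
                gain = reward() - base
                cache[h].remove(f)
                if best is None or gain > best[0]:
                    best = (gain, h, f)
        cache[best[1]].add(best[2])
    return cache

# Toy instance: two helpers, U3 reaches both (the conflict from the earlier slide), M = 2
p = {"A": 0.5, "B": 0.3, "C": 0.15, "D": 0.05}
conn = {"U1": {"H1"}, "U2": {"H1"}, "U3": {"H1", "H2"}, "U4": {"H2"}}
placement = greedy_placement(["H1", "H2"], list(conn), conn, p, M=2)
print(placement)
```

This lazy evaluation of marginal gains is exactly why greedy inherits the 1/2 guarantee for submodular objectives under matroid (here, partition) constraints.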
Questions we investigate

How to optimize placement of popular video files in
storage-enabled small-cell stations. (FemtoCaching)

When a cache does not store exactly what is desired,
it can still be useful, through index coding

The Coded caching problem. Why previous schemes do
not work for realistic parameters and how to fix that.

Conclusions and outlook
Local caching helps in two ways
1. By finding the desired file at a nearby friend, causing less interference.
2. By enabling coded broadcast transmissions from the BS (index coding).
Index Coding
1 packet / broadcast transmission.
U1 — Has: X2, X3; Wants: X1
U2 — Has: X3, X1; Wants: X2
U3 — Has: X1, X2; Wants: X3
What is the minimum number of broadcast transmissions?
[Figure: directed side-information graph on vertices 1, 2, 3; here every user holds every other user's packet, so the graph is complete.]
Answer: one transmission, the XOR X1 + X2 + X3. Each user cancels the two packets it already has and recovers the one it wants.
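That a single XOR serves all three users can be verified mechanically; a small sketch with arbitrary byte values standing in for the packets:

```python
# Each user XORs the broadcast X1^X2^X3 with its two cached packets
# to recover the one packet it is missing.
X1, X2, X3 = 0x1A, 0xB4, 0x5C  # arbitrary packet contents
broadcast = X1 ^ X2 ^ X3

assert broadcast ^ X2 ^ X3 == X1  # U1 has X2, X3 and wants X1
assert broadcast ^ X3 ^ X1 == X2  # U2 has X3, X1 and wants X2
assert broadcast ^ X1 ^ X2 == X3  # U3 has X1, X2 and wants X3
print("one transmission serves all three users")
```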
Index Coding: who cares?
• Very simple to describe.
• Incredibly challenging: hard to approximate [Peeters; Langberg and Sprintson]; nonlinear codes might be needed [Alon et al.; Lubetzky et al.].
• Connects to fundamental graph theory (Shannon graph capacity) [Shannon, Lovász, Haemers] and zero-error information theory [Orlitsky and Körner].
• Any multiuser network coding problem can be reduced to index coding [El Rouayheb et al.; Effros et al.].
• Fundamental connections to interference alignment [Maleki et al.; Jafar et al.].
Generalized Locally Repairable Codes (GLRCs)
[Figure: five data blocks 1–5 and three parity blocks; p2 is a local parity over blocks 1–3, p3 a local parity over blocks 4–5, and p1 a global parity over all five blocks.]
To read block 1 we can contact {2, 3, p2}: the locality of block 1 is 3.
Availability: the number of reads we can support in parallel.
We will not worry about this in this talk.
We are given, for each block i, a set N(i)
so that i is recoverable from the set N(i):
N(1) = {2, 3, p2}
N(2) = {1, 3, p2}
N(3) = {1, 2, p2}
N(4) = {5, p3}
N(5) = {4, p3}
N(p2) = {1, 2, 3}
N(p3) = {4, 5}
N(p1) = {1, 2, 3, 4, 5}
This defines a directed graph on the blocks, the recoverability graph:
N(i) is the in-neighborhood of vertex i.
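For the example above, the recoverability sets can be verified over GF(2). A sketch representing each stored block as a bitmask over the five data symbols, assuming the parity structure in the figure (p2 over blocks 1–3, p3 over blocks 4–5, p1 global):

```python
from functools import reduce

# Each block as a GF(2) vector over data symbols 1..5, encoded as a 5-bit mask
vec = {1: 0b00001, 2: 0b00010, 3: 0b00100, 4: 0b01000, 5: 0b10000}
vec["p2"] = vec[1] ^ vec[2] ^ vec[3]  # local parity over blocks 1-3
vec["p3"] = vec[4] ^ vec[5]           # local parity over blocks 4-5
vec["p1"] = reduce(lambda a, b: a ^ b, (vec[i] for i in range(1, 6)))  # global parity

N = {1: [2, 3, "p2"], 2: [1, 3, "p2"], 3: [1, 2, "p2"],
     4: [5, "p3"], 5: [4, "p3"],
     "p2": [1, 2, 3], "p3": [4, 5], "p1": [1, 2, 3, 4, 5]}

# Block i is recoverable from N(i): here it equals the XOR of the blocks in N(i)
for i, nbrs in N.items():
    assert vec[i] == reduce(lambda a, b: a ^ b, (vec[j] for j in nbrs))
print("all recoverability sets check out")
```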
Generalized LRCs (GLRCs)
Rate-2 GLRC with recoverability conditions: three stored blocks a, b, a + b (blocks 1, 2, 3).
[Figure: recoverability graph on vertices 1, 2, 3 — each block is recoverable from the other two.]
GLRC-Index Code Duality
Rate-2 GLRC storing (a, b, a + b); its recoverability graph is the complete directed graph on {1, 2, 3}.
Generator matrix (rows = information symbols a, b; columns = blocks 1, 2, 3):
G = [ 1 0 1 ]
    [ 0 1 1 ]
The dual subspace is spanned by (1, 1, 1), since G · (1, 1, 1)^T = 0. Dual codes!
And (1, 1, 1) is exactly the index coding transmission X1 + X2 + X3 for the side-information graph equal to the recoverability graph.
GLRC-Index Code Duality
Recoverability graph ↔ side-information graph.
A rate-r vector linear GLRC is dual to an index code of rate n − r (complementary rate r): dual codes!
Theorem 1: The dual linear subspace of a GLRC is a
solution to an index coding problem where
1. the side-information graph is the recoverability graph,
and
2. the rate is n − r.
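Theorem 1 can be sanity-checked on the rate-2 example; a sketch over GF(2) using the generator matrix of the code storing (a, b, a+b):

```python
import itertools

# Generator of the rate-2 GLRC storing (a, b, a+b); columns = blocks 1, 2, 3
G = [(1, 0, 1),
     (0, 1, 1)]

# Dual subspace over GF(2): all v with G v^T = 0
dual = [v for v in itertools.product((0, 1), repeat=3)
        if all(sum(g * x for g, x in zip(row, v)) % 2 == 0 for row in G)]

# Rate n - r = 3 - 2 = 1: the dual is spanned by the single vector (1, 1, 1),
# which is exactly the index coding transmission X1 + X2 + X3 from the earlier slide.
assert set(dual) == {(0, 0, 0), (1, 1, 1)}
print(dual)
```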
Brief history of index coding
1956: Shannon defined the capacity of a graph (zero-error capacity).
[Figure: the 5-cycle C5 on vertices 1–5; adjacent vertices are confusable.]
The transmitter chooses one vertex of the graph.
The receiver sees one adjacent edge and must decode which vertex was transmitted.
No probability of error allowed.
How many messages can we send per channel use?
Use {1, 3} to send two — that works.
Can do a bit better: Shannon showed that in 2 channel uses we can send 5 messages:
{11, 23, 35, 42, 54}
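Shannon's 5-message code for two uses of the pentagon channel can be verified by brute force. A sketch where two vertices of C5 are confusable iff they are equal or adjacent, and two length-2 words are confusable iff they are confusable in every coordinate (the strong product):

```python
import itertools

def confusable(u, v):
    """On the 5-cycle, a vertex is confusable with itself and its two neighbors."""
    return u == v or (u - v) % 5 in (1, 4)

def words_confusable(w1, w2):
    # In the strong graph product, words collide iff every coordinate is confusable
    return all(confusable(a, b) for a, b in zip(w1, w2))

code = [(1, 1), (2, 3), (3, 5), (4, 2), (5, 4)]  # Shannon's 5 codewords
for w1, w2 in itertools.combinations(code, 2):
    assert not words_confusable(w1, w2)
print("5 zero-error messages in 2 channel uses: rate sqrt(5) per use")
```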
Brief history of index coding
1956: Shannon defined the capacity of a graph (zero-error capacity).
1979: Lovász defines the theta function and shows Θ(G) ≤ ϑ(G).
The Lovász theta function of a graph can be computed by solving an SDP.
For the 5-cycle, ϑ(C5) = √5, hence Θ(C5) = √5.
Lovász also conjectured that his theta function is in fact equal to the graph capacity.
Brief history of index coding
1956: Shannon defined the capacity of a graph (zero-error capacity).
1979: Lovász defines the theta function, shows Θ(G) ≤ ϑ(G), and conjectures that they are equal.
1979: Haemers disproves the conjecture ϑ(G) = Θ(G) by defining minrank(G):
Form a matrix A with 1s on the diagonal.
If (i, j) is an edge of G, place A_ij = * (free).
If (i, j) is not an edge, place A_ij = 0.
Complete the *s to minimize rank(A).
MinRank(G) is the scalar index coding rate for the side information defined by graph G.
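Minrank can be computed by brute force for tiny graphs; a sketch over GF(2) for the 5-cycle, enumerating all 2^10 completions of the starred entries:

```python
import itertools

def gf2_rank(rows):
    """Rank over GF(2); each row is an integer bitmask."""
    basis = [0] * 8
    rank = 0
    for v in rows:
        while v:
            h = v.bit_length() - 1
            if basis[h] == 0:
                basis[h], rank = v, rank + 1
                break
            v ^= basis[h]
    return rank

n = 5
edges = [(i, (i + 1) % n) for i in range(n)]  # the 5-cycle C5
free = [(i, j) for i in range(n) for j in range(n)
        if i != j and ((i, j) in edges or (j, i) in edges)]

best = n
for bits in itertools.product((0, 1), repeat=len(free)):
    # Diagonal = 1, non-edges = 0, edge positions = free (the *s)
    A = [[1 if i == j else 0 for j in range(n)] for i in range(n)]
    for (i, j), b in zip(free, bits):
        A[i][j] = b
    rows = [sum(A[i][j] << j for j in range(n)) for i in range(n)]
    best = min(best, gf2_rank(rows))

print("minrank over GF(2) of C5 =", best)  # 3: scalar index coding for C5 needs 3 transmissions
```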
Questions we investigate

How to optimize placement of popular video files in
storage-enabled small-cell stations. (FemtoCaching)

When a cache does not store exactly what is desired,
it can still be useful, through index coding

The Coded caching problem. Why previous schemes do
not work for realistic parameters and how to fix that.

Conclusions and outlook
Coded Multicasting with cache design
[Figure: a BS broadcasting F packets per transmission to users U1, ..., UK; each user caches M·F packets.]
Each file is partitioned into F packets. We can design these.
Coded caching [Maddah-Ali-Niesen] is index coding with a twist:
1. We can design the side-information sets (placement phase).
2. After they are set, an adversary chooses which packet each user wants
(most results also transfer to random demands).
3. Then, solve this index coding instance (delivery phase).
Previous Work - Asymptotic limits
[Figure: a BS broadcasting 1 file per transmission to users U1, ..., UK, each caching M·F packets.]
N total files (library)
K users
M files fit in the cache of each user
Each file is F packets.
Can trivially satisfy all users with K transmissions.
[MN] transmissions needed: roughly K(1 − M/N) / (1 + KM/N).
Gain factor g = KM/N, which is also order optimal.
Simple and efficient placement.
Simple and efficient delivery.
Previous Work - Asymptotic limits
But this requires that the number of packets F goes to infinity.
How should F scale as a function of M, N, K to get these gains?
Results: The bad news
Theorem 1: For decentralized random caching using greedy-coloring index coding delivery (i.e. the [MN] scheme),
if F ≤ e^{KM/N},
then the number of transmissions required (peak rate) is ≥ K/2.
Think of M/N as constant: even if F = exp(cK), we do not gain more than 2x using the [MN] scheme.
Best possible scaling of F(g)
F(g): the number of sub-packets needed to achieve gain g over the naïve K transmissions.
Theorem 2: For any random symmetric placement scheme and any clique-cover delivery algorithm, to get a gain of g, sub-packetization of at least
F(g) > (N/M)^g · g/K
is needed.
So: F(g) must be exponential in g, but possibly not in K.
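To get a feel for Theorem 2's bound, a quick numeric evaluation of F(g) > (N/M)^g · g/K; the parameter values below are illustrative:

```python
def f_lower_bound(N_over_M, g, K):
    """Sub-packetization lower bound F(g) > (N/M)^g * g / K from Theorem 2."""
    return (N_over_M ** g) * g / K

# Illustrative: library 100x larger than a cache, 1000 users
for g in (2, 5, 10):
    print(g, f_lower_bound(100, g, K=1000))
# gain 5 already forces more than 5 * 10^7 sub-packets; gain 10 forces more than 10^18
```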
Achievability: New coded caching algorithm
Theorem 3: Using a 'reduced-greedy' algorithm we can achieve
F(g) = (N/M)^g · polylog((N/M)^g).
Specifically:
F(g) = Θ( (N/M)^{g+1} (log(N/M))^{g+1} (2e)^g ).
So: F(g) exponential in g, but independent of K, is possible.
It is open whether better scaling is achievable with deterministic placement or better index coding delivery.
Finite-Length Analysis of Caching-Aided Coded Multicasting
Shanmugam, Ji, Tulino, Llorca, AD (IT Transactions, under
review)
New coded caching algorithm
The problem of finite F
[MN] splits each of the N files into F blocks and places each block in each cache independently with probability M/N.
A typical block will be in KM/N users' caches.
Delivery then goes over each possible subset of users S.
A packet is relevant for S if it is cached by |S| − 1 users in S and desired by the remaining one.
[MN] XORs all relevant packets for each S and transmits.
This may not be useful, e.g. if all relevant packets are desired by the same user in S.
[MN] tries to find XORs of size KM/N to get its huge asymptotic gain.
But this will not happen before F is exponential in K.
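The decentralized placement and subset-based delivery can be sketched end to end. A toy simulation (small K, N, M, F, all illustrative; demands fixed to user k wanting file k mod N) that counts how often a user subset S actually has a relevant packet:

```python
import random
from itertools import combinations
from math import comb

random.seed(0)
K, N, M, F = 8, 4, 2, 32  # users, files, cache size (in files), packets per file

# Decentralized placement: each packet is cached by each user independently w.p. M/N
cached = {(n, f): {k for k in range(K) if random.random() < M / N}
          for n in range(N) for f in range(F)}

# A typical packet is cached by about K*M/N = 4 users
avg = sum(len(s) for s in cached.values()) / len(cached)
print("average number of caches holding a packet:", avg)

def relevant(S):
    """Packets cached by |S|-1 users of S and desired by the remaining one."""
    out = []
    for k in S:
        n = k % N  # user k demands file k mod N
        for f in range(F):
            if k not in cached[(n, f)] and set(S) - {k} <= cached[(n, f)]:
                out.append((k, n, f))
    return out

# [MN] XORs the relevant packets of each subset S; larger subsets mean bigger
# coded gain, but they find relevant packets less often unless F is huge.
for t in (2, 3, 5):
    hits = sum(1 for S in combinations(range(K), t) if relevant(S))
    print(f"size-{t} subsets with a relevant packet: {hits} of {comb(K, t)}")
```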
Don’t be so greedy
The key idea is to artificially aim for a lower gain and randomly ignore some cached copies of files:
• Break the K users into groups of size g·N/M and only code within each group.
• For each packet, reduce its storage subset to only g random users.
• Apply the [MN] algorithm on this sparsified graph. This gives many more coding
opportunities.
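The grouping and sparsification steps above can be sketched directly; a toy sketch with illustrative parameters (K = 24, N = 8, M = 2, target gain g = 2) and a made-up example storage set:

```python
import random

random.seed(1)
K, N, M, g = 24, 8, 2, 2       # users, files, cache size, target gain
group_size = g * N // M        # code only within groups of size g*N/M = 8

# 1. Break the K users into groups; [MN] delivery is then run per group
groups = [list(range(i, i + group_size)) for i in range(0, K, group_size)]
assert [len(grp) for grp in groups] == [8, 8, 8]

# 2. For each packet, keep only g random users among those who cached it
def sparsify(storage_set, g):
    users = sorted(storage_set)
    return set(random.sample(users, g)) if len(users) > g else set(users)

cached_by = {0, 3, 5, 9, 17}   # illustrative: users holding some packet
reduced = sparsify(cached_by, g)
assert reduced <= cached_by and len(reduced) == g
print(groups[0], reduced)

# 3. Running [MN] on this sparsified instance targets XORs of size g + 1,
#    which appear at far smaller sub-packetization F (Theorem 3).
```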
Open problems in storage
Repair Bandwidth:
• OP1: Exact repair region?
• OP2: Practical E-MSR codes for high rates?
• OP3: Repair for existing codes (e.g. EvenOdd, Reed-Solomon, Reed-Muller, etc.)?
Locality:
• OP4: Explicit LRCs with maximum recoverability?
Availability:
• OP5: Distance-availability tradeoff?
• OP6: Practical explicit codes?
Index Coding / GLRCs:
• OP7: Index coding rate for random graphs?
• OP8: Approximating the GLRC / index coding sum rate within a polylog factor?
  On Approximating the Sum-Rate for Multiple-Unicasts [Shanmugam et al.]
Open problems in caching
Femtocaching:
• Storage in small cells is a practically useful idea.
• The placement problem (femtocaching) has some solutions, but they have high complexity and cannot handle mobility.
• OP9: Simple linear-time femtocaching algorithms with approximation guarantees?
• OP10: Statistical benefits / scaling laws with a mobility model? (à la Gupta & Kumar, Grossglauser & Tse)
Coded Caching:
• The coded caching problem can give tremendous gains but is still not fully understood for practical packet numbers.
• Possibly better placement / delivery schemes could beat the lower bound.
• OP11: Is F(g) < (N/M)^g possible?
• OP12: Coded caching with non-trivial topologies?
Pointers
General:
https://www.youtube.com/watch?v=obXTLCTBGuU — Simons tutorial on regenerating codes
https://www.youtube.com/watch?v=9Y3uWLgKPkU — Simons tutorial on LRCs
http://storagewiki.ece.utexas.edu/ — Distributed storage wiki
http://arxiv.org/pdf/1402.3895.pdf — Bounding Multiple Unicasts through Index Coding and Locally Repairable Codes, Shanmugam et al. (duality between index coding and GLRCs)
For coded caching (problems 11 and 12):
http://arxiv.org/abs/1508.05175 — Finite-Length Analysis of Caching-Aided Coded Multicasting, Shanmugam et al. (IT Transactions, under review)
For femtocaching (problems 9, 10):
http://arxiv.org/abs/1109.4179 — FemtoCaching paper
For problem 7:
http://arxiv.org/abs/1607.04842 — The MinRank of Random Graphs, Golovnev et al.
For OP2 (exact MSR codes):
http://arxiv.org/pdf/1604.00454v2.pdf — Explicit Constructions of High-Rate MDS Array Codes with Optimal Repair Bandwidth, Ye and Barg
Vijay Kumar recent work?
For OP3 (repairing known codes):
http://storagewiki.ece.utexas.edu/doku.php?id=wiki:papers:all#repairing_known_codes — Repairing array codes
Rebuilding for Array Codes in Distributed Storage, Zhiying Wang et al., Globecom 2010
A Repair Framework for Scalar MDS Codes, Shanmugam et al., JSAC 2014
Repairing Reed-Solomon Codes, Guruswami and Wootters, STOC 2016
Barg et al.?
fin