Typhoon: An Ultra-Available
Archive and Backup System
Utilizing Linear-Time Erasure Codes
Part of OceanStore...
Erasure Codes
• Erasure code: a form of data coding that allows lost portions of the data to be recovered
• The idea is similar to ECC, except that the decoding algorithm must be told which portions of the data are missing
• Reed-Solomon codes are a common type of erasure code, but they are computationally expensive and are usually implemented in hardware
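The "must be told which portions are missing" distinction is easiest to see with the simplest possible erasure code: a single XOR parity block. This sketch is illustrative (not from Typhoon itself); `make_parity` and `recover_block` are hypothetical helper names.

```c
#include <string.h>

/* Toy erasure code: one parity block is the XOR of three data
   blocks, so any ONE lost block can be rebuilt -- but only if the
   decoder is told which block is missing (unlike ECC, which must
   also locate the error itself). */
enum { BLOCKS = 3, BLKSZ = 4 };

void make_parity(unsigned char data[BLOCKS][BLKSZ],
                 unsigned char parity[BLKSZ])
{
    memset(parity, 0, BLKSZ);
    for (int b = 0; b < BLOCKS; ++b)
        for (int i = 0; i < BLKSZ; ++i)
            parity[i] ^= data[b][i];
}

/* Rebuild block `lost` by XORing the parity with the survivors. */
void recover_block(unsigned char data[BLOCKS][BLKSZ],
                   unsigned char parity[BLKSZ], int lost)
{
    memcpy(data[lost], parity, BLKSZ);
    for (int b = 0; b < BLOCKS; ++b)
        if (b != lost)
            for (int i = 0; i < BLKSZ; ++i)
                data[lost][i] ^= data[b][i];
}
```

Real erasure codes generalize this idea so that many blocks can be lost at once; XOR parity tolerates exactly one.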
Tornado Codes: A Linear-Time
Probabilistic Family of Erasure Codes
• Tornado codes are linear time, but use probabilistic assumptions to “guarantee” that the decoding process will succeed
• A rate-1/2 erasure code will double the size of a file
• Any half of the encoded file can be used to recreate the original data
• Tornado codes also require slightly more than half of the encoded file, thus trading network bandwidth for speed
– The inventors of Tornado codes report that 5% extra is typical
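The bandwidth-for-speed trade can be made concrete with a little arithmetic (function names are illustrative; the 5% figure is the inventors' reported typical value, not a guarantee):

```c
/* For an n-symbol file: a rate-1/2 code writes n / rate = 2n symbols,
   and a Tornado decoder needs roughly (1 + epsilon) * n of them to
   arrive, with epsilon around 0.05 reported as typical. */
double encoded_symbols(double n, double rate)      { return n / rate; }
double symbols_to_decode(double n, double epsilon) { return n * (1.0 + epsilon); }
```

So a 1,000-symbol file is stored as 2,000 symbols, of which about 1,050 must be received: 5% more network traffic than an ideal code, in exchange for linear-time decoding.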
Overview of Encoding Process
• File is divided into nodes of equal size (e.g. 512 bytes)
• Data nodes are associated with check nodes using a series of bipartite graphs
• The contents of a check node are the XOR of its neighbors
• Bipartite graphs are created to satisfy mathematical constraints that “guarantee” the recovery process will successfully recover the file
[Figure: data file split into data nodes, connected to check nodes by a bipartite graph]
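The "check node = XOR of its neighbors" step can be sketched directly. The neighbour list here is a stand-in (a real Tornado encoder derives it from its carefully constructed random bipartite graphs), and `fill_check_node` is a hypothetical name:

```c
#include <string.h>

enum { NODE_SZ = 512 };   /* node size from the slide */

/* A check node's contents are the XOR of its data-node neighbours
   in the bipartite graph. */
void fill_check_node(unsigned char data[][NODE_SZ],
                     const int *neighbours, int degree,
                     unsigned char check[NODE_SZ])
{
    memset(check, 0, NODE_SZ);
    for (int k = 0; k < degree; ++k)
        for (int i = 0; i < NODE_SZ; ++i)
            check[i] ^= data[neighbours[k]][i];
}
```

Because encoding is nothing but XOR over adjacency lists, its cost is linear in the number of graph edges, which is where the MMX-accelerated XOR discussed next pays off.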
Overview of Encoding Process
• Once a file is encoded, the data nodes and check nodes are randomly distributed to a set of recipients
[Figure: data file, data nodes, and check nodes fanning out to recipients]
MMX: SIMD or Marketing?
•There are eight MMX registers
•Data in registers can be divided into four different sizes
•MMX has 57 instructions for 6 types of operations:
— ADD
— SUBTRACT
— MULTIPLY
— MULTIPLY THEN ADD
— COMPARISON
— LOGICAL
• AND
• NAND
• OR
• XOR
MMX: SIMD or Marketing?
char array1[512];
char array2[512];
for (int i = 0; i < 512; ++i)
    array1[i] = array1[i] ^ array2[i];

MMX is 2.3 times faster than this (1.9× without pipeline scheduling)
MMX: SIMD or Marketing?
char array1[512];
char array2[512];
long *array1ptr = (long *)array1;
long *array2ptr = (long *)array2;
for (int i = 0; i < 512 / sizeof(long); ++i)
    array1ptr[i] = array1ptr[i] ^ array2ptr[i];

MMX is 50% faster than this (22% without scheduling)
MMX: SIMD or Marketing?
char array1[512];
char array2[512];
for (int i = 0; i < 512; i += 32)
    xor32fast((long *)(array1 + i), (long *)(array2 + i), (long *)(array1 + i));
MMX: SIMD or Marketing?
inline void xor32bytes(long *array1reg, long *array2reg, long *destreg)
{
    _asm
    {
        mov  eax, [array1reg]
        mov  ecx, [array2reg]
        movq mm0, [eax]
        movq mm1, [ecx]
        movq mm2, [eax+8]
        movq mm3, [ecx+8]
        movq mm4, [eax+16]
        movq mm5, [ecx+16]
        movq mm6, [eax+24]
        movq mm7, [ecx+24]
        pxor mm0, mm1        ; 64-bit xor
        pxor mm2, mm3        ; 64-bit xor
        pxor mm4, mm5        ; 64-bit xor
        pxor mm6, mm7        ; 64-bit xor
        mov  ecx, [destreg]
        movq [ecx],    mm0   ; store result
        movq [ecx+8],  mm2   ; store result
        movq [ecx+16], mm4   ; store result
        movq [ecx+24], mm6   ; store result
    }
}
MMX: SIMD or Marketing?
inline void xor32fast(long *array1reg, long *array2reg, long *destreg)
{
    _asm
    {
        mov  eax, [array1reg]
        mov  ebx, [array2reg]
        mov  ecx, [destreg]
        movq mm0, [eax]       ; load 1a    U
        movq mm1, [ebx]       ; load 1b    U
        movq mm2, [eax+8]     ; load 2a    U
        pxor mm0, mm1         ; xor 1      V
        movq mm3, [ebx+8]     ; load 2b    U
        movq [ecx], mm0       ; store 1    U
        pxor mm2, mm3         ; xor 2      V
        movq mm4, [eax+16]    ; load 3a    U
        movq mm5, [ebx+16]    ; load 3b    U
        movq mm6, [eax+24]    ; load 4a    U
        pxor mm4, mm5         ; xor 3      V
        movq mm7, [ebx+24]    ; load 4b    U
        movq [ecx+8], mm2     ; store 2    U
        pxor mm6, mm7         ; xor 4      V
        movq [ecx+16], mm4    ; store 3    U
        movq [ecx+24], mm6    ; store 4    U
    }
}
Overview of Encoding Process
•Server sends a storage announcement to a particular set of servers
– Set can be determined/specified using multicast groups, a server list, or some form of DNS address lookup
[Diagram: announcement sent over UDP / multicast]
Overview of Encoding Process
•Server encodes the file
•During the encoding process, the data nodes and check nodes are [randomly] distributed to other servers
Overview of Decoding Process
• A set of nodes is received, ideally with random distribution
• Check nodes can be used to recover missing data nodes
• Only check nodes that are missing exactly one neighbor can recreate a data node
• The structure of the graph ensures [w.h.p.] that the decoding process will succeed
– Graph is designed so that there is always at least one check node that is missing only one child
– Data nodes can be used to recover check nodes, but this is not important
[Figure: data file and check nodes; legend: node received / node not received]
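The recovery rule above — keep finding a check node with exactly one missing neighbor — is the classic peeling decoder. A minimal sketch under assumed data structures (names hypothetical, graph construction omitted):

```c
#include <string.h>

enum { NODE_SZ = 8, MAX_DEG = 8 };   /* tiny sizes for illustration */

struct check_node {
    unsigned char value[NODE_SZ];    /* XOR of all its data neighbours */
    int neighbours[MAX_DEG];
    int degree;
};

/* Peeling decoder: repeatedly find a check node with exactly one
   missing data neighbour, rebuild that neighbour by XORing the check
   value with the surviving neighbours, and go again.  Each recovery
   may unlock further check nodes, so the process cascades.  Returns
   the number of data nodes recovered. */
int peel(unsigned char data[][NODE_SZ], int have[],
         struct check_node checks[], int nchecks)
{
    int recovered = 0, progress = 1;
    while (progress) {
        progress = 0;
        for (int c = 0; c < nchecks; ++c) {
            int missing = -1, nmissing = 0;
            for (int k = 0; k < checks[c].degree; ++k)
                if (!have[checks[c].neighbours[k]]) {
                    missing = checks[c].neighbours[k];
                    ++nmissing;
                }
            if (nmissing != 1)        /* usable only with exactly one gap */
                continue;
            memcpy(data[missing], checks[c].value, NODE_SZ);
            for (int k = 0; k < checks[c].degree; ++k) {
                int d = checks[c].neighbours[k];
                if (d != missing)
                    for (int i = 0; i < NODE_SZ; ++i)
                        data[missing][i] ^= data[d][i];
            }
            have[missing] = 1;
            ++recovered;
            progress = 1;
        }
    }
    return recovered;
}
```

In Tornado codes the graph is constructed so that, with high probability, this loop never stalls before the whole file is recovered; the sketch simply returns however many nodes it could peel.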
Overview of Decoding Process
• Server sends file request announcement to a particular set of servers
• Retrieves data from multiple servers simultaneously
• Recovery process can be performed in parallel with receive (network-based RAID-1)
• Depending on data loss pattern, a particular subset of the servers can be selected
– Fastest servers (closest servers, or least utilized servers)
– Operational servers (i.e., some portion of the set is not functioning)
– All servers might be needed in some cases, such as network congestion / packet loss
Architecture
[Diagram: Client, Cache, Naming/Location components]
Architecture
•What did we implement?
• Client, Cache, Naming and Location Mechanism, Replication Mechanism, Filestore
•What did we test?
• Communication
•Explicit communication: unicast (TCP) request
•Implicit communication: multicast request
• Network
•Distributed servers throughout the Berkeley domain
•Simulated network delay by randomizing response time
• Caching
•None, for the worst case
• Simulation
•Strained the Typhoon system by creating requests at the rate of a 24-hour NFS trace, replayed over a 3-hour period
[Chart: Tornado - GET avg_proc_time. Time (sec) 0-25 vs. File Size (bytes) 0-3,000,000; series: client, cache, namingloc, replication, filestore]
[Chart: Reed Solomon - GET avg_proc_time. Time (sec) 0-2000 vs. File Size (bytes) 0-1,500,000; same series]
[Chart: Tornado - Replication. Time (sec) 0-25 vs. File Size (bytes) 0-3,000,000; series: avg_proc_time, avg_dec_time, avg_comm_time]
[Chart: Reed Solomon - Replication. Time (sec) 0-2000 vs. File Size (bytes) 0-1,500,000; same series]
[Chart: Tornado - PUT avg_proc_time. Time (sec) 0-6 vs. File Size (bytes) 0-4,000,000; series: client, cache, namingloc, replication, filestore]
[Chart: Reed Solomon - PUT avg_proc_time. Time (sec) 0-4500 vs. File Size (bytes) 0-4,000,000; same series]
[Chart: Tornado - Replication. Time (sec) 0-6 vs. File Size (bytes) 0-4,000,000; series: avg_proc_time, avg_enc_time, avg_comm_time]
[Chart: Reed Solomon - Replication. Time (sec) 0-70 vs. File Size (bytes) 0-600,000; same series]
Benefits of Typhoon
• Data is ultra-available: up to half of the servers can fail before availability is affected
• Fast file retrieval: data can be retrieved simultaneously from multiple servers
– System can choose to use the fastest machines in a set of servers
– Load balancing can be achieved because slow or heavily utilized servers are not used
– Information can be dispersed geographically
• Increases the accessibility of data in the event of a major disaster, such as an earthquake
• Can benefit people who travel to remote locations, since data may be closer to them
– Multicast can be used to reduce latency
• Low-overhead algorithms: the algorithms for encoding and decoding are linear-time
• Disk overhead of the system can be adjusted (typically doubles the size of a file)
Conclusion
• Tornado codes are significantly faster than Cauchy Reed-Solomon codes
• A Typhoon-based system can match the request rate of a loaded NFS server
• Typhoon is a viable solution for increasing the reliability and accessibility of data