Imagining the NEXPReS WP8 Nodule

NEXPReS WP8
Provisioning High-Bandwidth, High-Capacity Networked Storage on Demand
Ari Mujunen
Board Meeting 20-Sep-2010 in Manchester
Research leading to these results has received funding from the European Union Seventh Framework Programme (FP7/2007-2013) under grant agreement n° RI261525. This presentation reflects only the author's views. The European Union is not liable for any use that may be made of the information contained therein.
WP8 – High-bw+cap Storage on Demand
Participants
JIVE, ASTRON, INAF, UMAN, OSO, PSNC, AALTO
Total person-months
163.2
Deliverables
12 (11 reports and one demonstration test)
WP8 – GANTT Chart with % of 1FTE over Task Durations
Partner Focus Areas
AALTO Coordination, basic technologies
ASTRON Long-term archival & reprocessing
INAF Global/local allocation/deallocation schemes
JIVE Augmenting correlation capabilities /w buffering
OSO Trial-site performance & applicability testing
PSNC Role & trials of computing center buffering
UMAN Trial-site performance & applicability testing
Start at Partners, First Deliverables
AALTO Jul-2010 .. Dec-2010 (D8.1), .. Feb-2011 (D8.2), ..
ASTRON Jul-2010/Oct-2011 .. Mar-2013 (D8.9)
INAF Jul-2010 .. Apr-2011 (D8.3), .. May-2012 (D8.6)
JIVE Jul-2010 .. Aug-2012 (D8.8), .. Feb-2013 (D8.10)
OSO Feb-2011 .. Sep-2012 (in D8.4), .. Mar-2013 (in D8.5+7)
PSNC Feb-2011 .. Sep-2012 (in D8.4), .. Mar-2013 (in D8.5+7)
UMAN Feb-2011 .. Sep-2012 (in D8.4), .. Mar-2013 (in D8.5+7)
ACTION: Send your POC's email to '[email protected]'
for WP8 deliverables and execution of work!
Objective
• Determine the best practical mix of solutions
– What kind of storage
• HDDs, SSDs, memory buffers, others
– Where located & packaged
• Geographically (stations, correlators, computing centers, clouds,...)
• Locally (enclosures, racks, packaging, net topologies,...)
– Connected in which ways
• Locally (interface types, net equipment,...), globally (ship, net,
LP,...)
– How storage is allocated/deallocated and accessed
• Algorithms, APIs, sw structure; strategies for bookkeeping,...
• Which will serve the needs of evolving (>1Gbps) VLBI
data acquisition and processing
Model / Mindset Framework
• VLBI is globally geographically distributed data
acquisition, data storage, and data processing
– Where data from a given global observation in time must be
brought to one place to be compared / correlated
• => Implies data transfers geographically, globally
• Modelling the global VLBI network as a (potentially
hierarchically) connected network of “nodules”
– Which have capabilities like connectivity (BW, IF types,..)
storage (size, BW, BW dir limits,..), computing, etc.
– Which can be remodeled and replaced with new (hierarchical)
“nodule” designs without affecting (too much) the “big picture”
Nodules
• Pretty much any piece of equipment
– (Or a larger collection of such equipment, a “system”)
• Which can be described with a small set of capabilities
– Connectivity options and capabilities
• Interface types, bandwidths, bw / direction limitations
– Storage options and capabilities
• Device types, r/w bandwidths, bw /direction limitations, sw access
methods
– Internal CPU, RAM buffering, and data “pumping” power
– Packaging options
– Price, power consumption, longevity,...
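A capability description of this kind could be written down as a small data structure. The following is a minimal sketch, not part of any WP8 software: the class and field names (`Link`, `Storage`, `Nodule`, `record_gbps`) are hypothetical, and summing bandwidths over all links and all storage groups is a simplifying assumption. The example instance uses the 20-disk pack with a 10GE PC, taking 6Gbps as its storage figure and the nominal 10Gbps for its port.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Link:
    """One connectivity option of a nodule (hypothetical model)."""
    kind: str                  # e.g. "1GE", "10GE", "shipping"
    gbps_in: float             # usable bandwidth into the nodule
    gbps_out: float            # usable bandwidth out of the nodule
    full_duplex: bool = True   # can both directions run at once?


@dataclass
class Storage:
    """One storage option of a nodule (hypothetical model)."""
    device: str                # e.g. "SATA II HDD"
    count: int
    gbps_write: float
    gbps_read: float
    gbps_simultaneous: Optional[float] = None  # None = unknown or untested


@dataclass
class Nodule:
    name: str
    links: List[Link] = field(default_factory=list)
    storage: List[Storage] = field(default_factory=list)
    children: List["Nodule"] = field(default_factory=list)  # hierarchical nodules

    def record_gbps(self) -> float:
        """Sustained recording rate: the slower of inbound net and disk write."""
        net_in = sum(link.gbps_in for link in self.links)
        disk_write = sum(s.gbps_write for s in self.storage)
        return min(net_in, disk_write)


# Assumed figures: nominal 10GE port, 6Gbps in or out for the disk pack.
pack = Nodule(
    name="20-disk pack /w 10GE PC",
    links=[Link("10GE", gbps_in=10.0, gbps_out=10.0)],
    storage=[Storage("SATA II HDD", 20, gbps_write=6.0, gbps_read=6.0)],
)
print(pack.record_gbps())  # 6.0: storage, not the 10GE port, is the limit
```

Because a `Nodule` can contain child nodules, a redesigned unit could be swapped in without changing how the rest of the network model refers to it.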
Connectivity
• All sorts of methods used to transfer data from one
place to another
– Physical shipping
– Networking (both local and global)
– Device interfaces (e.g. SATA II)
– Internal buses within equipment
– VLBI interfaces (e.g. legacy Mark IV formatter if)
• Connectivity has a given bandwidth and its restrictions
– Direction, simultaneous use, less than theoretical performance
in a given interconnect,...
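Treating physical shipping as just another kind of connectivity means it can be given an effective bandwidth too. The sketch below is illustrative only: the function name `effective_gbps` and the example figures (an 8-disk pack of 1 TB disks, 72 hours door to door) are assumptions, not measured values.

```python
def effective_gbps(capacity_tb: float, transit_hours: float) -> float:
    """Average bandwidth of moving capacity_tb terabytes in transit_hours."""
    bits = capacity_tb * 1e12 * 8          # decimal terabytes to bits
    seconds = transit_hours * 3600
    return bits / seconds / 1e9


# Assumed example: 8 x 1 TB disks, 72 hours in transit.
shipping = effective_gbps(capacity_tb=8, transit_hours=72)
print(round(shipping, 3))  # 0.247
```

Under these assumed numbers a shipped pack averages roughly a quarter of a Gbps, with a latency of days; shipping several packs in parallel raises the bandwidth but not the latency.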
Existing “Nodules”
• Variants of Mark 5
– 5A, 5B: 1Gbps in or out /w shipping; 1.6Gbps in or out /w sw
– 5B+: 1/2Gbps in or out /w shipping; 3.2(?)Gbps in or out /w sw
– 5C: 4Gbps only in, /w shipping; 3.2(?)Gbps in or out /w sw
• Metsähovi 20-disk pack /w 10GE PC
– 6Gbps in or out /w shipping(?); 6Gbps in or out /w sw; in&out /w
sw not yet tested
• BackBlaze 45-disk 4U rackmount /w 1(!)GE PC
• Emerging high-end 2-4-6U rackmounts
– Claim “up to” 16—24Gbps r/w at a premium price
Nodule Jigsaw Puzzle
• For instance, try to find a balanced match of storage,
connectivity, and packaging options to accompany ~CPU
• Storage Options
– 2-4 SATA II disks
– 6 SATA II disks
• 4Gbps(?)
– 4-6 SATA II disks + 5 /w PM = ~10 SATA II disks
– 20 SATA II disks /w PM
• 6Gbps
– 20-45 SATA II disks /w many controllers
• 8-10-??Gbps
• Connectivity Options
– 2 1GE ports
• 1Gbps (or a little more)
– 3 1GE ports
• 2Gbps(?)
– 1 10GE port
• 6Gbps, maybe 8Gbps
– 2 10GE ports
• ?
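One way to approach the puzzle is to treat each combination as a bottleneck problem: the sustained rate is the smaller of the storage and connectivity bandwidths, and the difference is capacity paid for but unused. The sketch below is only an illustration; the function name `balance` is hypothetical, and the bandwidth values are one reading of the tentative (question-marked) figures above, not measurements.

```python
from itertools import product

# Assumed bandwidths in Gbps, taken from the tentative figures on the slide.
storage = {"6 SATA II disks": 4.0, "20 SATA II disks /w PM": 6.0}
connectivity = {"2 1GE ports": 1.0, "3 1GE ports": 2.0, "1 10GE port": 6.0}


def balance(storage, connectivity):
    """Rank storage/connectivity pairs: least wasted bandwidth first,
    then highest sustained rate."""
    rows = []
    for (s, s_bw), (c, c_bw) in product(storage.items(), connectivity.items()):
        rate = min(s_bw, c_bw)             # the bottleneck sets the rate
        waste = max(s_bw, c_bw) - rate     # unused capacity on the other side
        rows.append((waste, -rate, s, c))
    rows.sort()
    return [(s, c, -neg_rate, waste) for waste, neg_rate, s, c in rows]


for s, c, rate, waste in balance(storage, connectivity):
    print(f"{s:24} + {c:12} -> {rate:.0f}Gbps (unused {waste:.0f}Gbps)")
```

With these assumed numbers the 20-disk pack on a single 10GE port comes out as the only fully balanced pair; packaging and price would still have to be weighed separately.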
Packaging Puzzles
• Single unit
– Tend to become bulky; problems of (semi)custom construction
• Small-scale rack installation
• Full-size rack
– Rack connectivity: switches as 24/48 1GE x 2 10GE (cheap), 8
10GE, 24 10GE (rare, expensive, 10GE CX->T transition)
• Google-style “racks”
– Very economical for “20 or more small PCs” configuration
• But becomes trash in a couple of years and must be thrown away
and replaced with a new set...
• Rack farms
Simultaneous Read and Write
• Want to observe (and store a copy of data) and at the
same time, already start processing
• Frequently dictated by the need to use the same (maybe
special) connectivity for both directions
• Two problems:
– HDD seek time slows down access to more than one “spot” on a disk
– Even without seeks, double the data streaming bandwidth is required
throughout the internal data paths
• Seek alleviated by multiplexing HDDs
– Means more HDDs needed than the bare minimum
– Multiplexing typically in time, in time chunks >>HDD seek time
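The effect of the chunk length can be put in numbers. The sketch below is a simplified model under stated assumptions: each disk does only one thing (read or write) during a time chunk and pays one seek when it switches, the 10 ms seek time is an assumed typical figure, and the function names `streaming_efficiency` and `roles` are hypothetical.

```python
def streaming_efficiency(chunk_s: float, seek_s: float) -> float:
    """Fraction of time a disk spends streaming if it seeks once per chunk."""
    return chunk_s / (chunk_s + seek_s)


def roles(n_disks: int, n_writers: int, chunk_index: int) -> list:
    """Role of each disk ('W' or 'R') in a given time chunk.

    The group of writing disks rotates by n_writers every chunk, so no
    disk reads and writes within the same chunk.
    """
    start = (chunk_index * n_writers) % n_disks
    writers = {(start + i) % n_disks for i in range(n_writers)}
    return ["W" if d in writers else "R" for d in range(n_disks)]


SEEK_S = 0.010  # assumed 10 ms seek time
for chunk_s in (0.010, 0.1, 1.0):
    print(chunk_s, round(streaming_efficiency(chunk_s, SEEK_S), 3))

for chunk in range(3):
    print(chunk, "".join(roles(n_disks=4, n_writers=2, chunk_index=chunk)))
```

In this model a chunk as short as the seek time halves the usable bandwidth, while a one-second chunk loses about 1%. The rotation also shows the cost: only half of the four disks write at any moment, so twice the bare minimum number of disks is needed for the same write rate.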
Imagining the NEXPReS WP8 Nodule...
• We want more than a trivial single-PC system
– But not any large-scale rack systems (no money for that!)
• Something that would retain its topology in 2015
– But go from 4—8Gbps (NEXPReS) to 16—32Gbps (2015)
• The most obvious Nodule would be a configuration of
six 1GE PCs and one 24 1GE-to-(1 or 2)10GE switch
– Could do 4Gbps in or out, 1—2Gbps in and out simultaneously
– Can exercise multiplexing in time and IP, and multiple nets/PC
– The obvious upgrade in 2015 would be to 100% 10GE
• Which means everything---except software!
• Might get up to 32Gbps in or out...
Imagining the NEXPReS WP8 Nodule...
• OTOH, a station Nodule could be a configuration of two
10GE PCs and one small 10GE switch
– Could do 8Gbps in or out, 4Gbps in and out simultaneously
– Can exercise multiplexing in time and IP
– The obvious upgrade in 2015 would be to buy more similar PCs
• But then: PCs of 2015 will be completely different—a mixed
configuration might look weird and make use (=software) more
complicated; the 10GE switch might prove too small and outdated
– So might end up buying all new stuff anyway...
• Will quite likely cost more now than the “six small PCs” scenario
• Well, this should be in the Dec-2010 D8.1 deliverable...
“The Inconvenient Truths” :-)
• About e-VLBI:
– “A given station cannot really sustain recording bandwidth larger
than their e-VLBI connectivity—unless given an unlimited disk
buffer.”
– “A single slow (or high-latency like shipping) connection in a
given e-VLBI network will force others (or some buffering party)
to buffer most of the VLBI data, if not all.”
• About buffers and archives:
– “Huge disk buffers with thousands of disks (whether distributed
or centralized) will cost a fortune, age rapidly, and be fragile
(even with the highest-end equipment) and in constant need of
(hw) maintenance.”
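The first statement above, about recording bandwidth versus e-VLBI connectivity, follows from simple arithmetic: whatever is recorded faster than the link can carry must accumulate in a buffer. The sketch below uses assumed example figures (4Gbps recording, a 1Gbps link, a 12-hour observation) and hypothetical function names.

```python
def buffer_tb(record_gbps: float, link_gbps: float, hours: float) -> float:
    """Buffer (decimal TB) that builds up when recording outpaces the link."""
    surplus_gbps = max(0.0, record_gbps - link_gbps)
    return surplus_gbps * 1e9 * hours * 3600 / 8 / 1e12


def drain_hours(buffered_tb: float, link_gbps: float) -> float:
    """Hours needed to send the buffered data once recording has stopped."""
    return buffered_tb * 1e12 * 8 / (link_gbps * 1e9) / 3600


# Assumed example: record at 4Gbps over a 1Gbps link for 12 hours.
backlog = buffer_tb(record_gbps=4, link_gbps=1, hours=12)
print(round(backlog, 1))                   # 16.2 TB left on disk
print(round(drain_hours(backlog, 1), 1))   # 36.0 hours to drain afterwards
```

The buffer grows without bound for as long as the station keeps recording above its link rate, which is why the statement holds only with an “unlimited” disk buffer.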
“The Inconvenient Truths” :-)
• About Mark 5s:
– “The existing 8-packs of PATA disks will never be accessed
simultaneously read and write—unless Conduant dramatically
changes StreamStor firmware.”
– “No variant of Mark 5 will ever feed the Mark IV correlator faster
than 1Gbps. While a given Mark 5 unit is feeding the correlator,
no new data can be fed into that Mark 5 at the same time.”
– “At 1.6Gbps and maybe 3.2Gbps in pairs, the existing 8-PATA-packs
make little sense in >=4Gbps buffering. 8-packs will
continue to be useful only for storing data certainly destined to
be shipped physically.”