The Science DMZ: A Network Design Pattern for Data-Intensive Science

Jason Zurawski – [email protected]
Science Engagement Engineer, ESnet
Lawrence Berkeley National Laboratory
New Mexico Technology in Education (NMTIE), November 19th, 2014

Overview
•  ESnet Overview
•  Science DMZ Motivation and Introduction
•  Science DMZ Architecture
•  Network Monitoring
•  Data Transfer Nodes & Applications
•  Science DMZ Security
•  User Engagement
•  Wrap Up
ESnet at a Glance
•  High-speed national network, optimized for DOE science missions:
   –  connecting 40 labs, plants and facilities with >100 networks
   –  $32.6M in FY14, 42 FTE
   –  older than the commercial Internet, growing twice as fast
•  $62M ARRA grant for 100G upgrade:
   –  transition to a new era of optical networking
   –  world's first 100G network at continental scale
•  Culture of urgency:
   –  4 awards in the past 3 years
   –  R&D100 Award in FY13
   –  "5 out of 5" for customer satisfaction in the last review
   –  Dedicated staff to support the mission of science
•  The Office of Science supports research at more than 300 institutions across the U.S., including:
   –  27,000 Ph.D.s, graduate students, undergraduates, engineers, and technicians
   –  26,000 users of open-access facilities
   –  300 leading academic institutions
   –  17 DOE laboratories
Network as Infrastructure Instrument
[Map: the ESnet backbone linking DOE sites (PNNL, LBNL, SLAC, FNAL, AMES, ANL, BNL, PPPL, JLAB, ORNL) and exchange points across the U.S., with peerings to US R&E networks (DREN/Internet2/NLR/NISN/NASA/USDOI), CANADA (CANARIE), EUROPE (GÉANT/NORDUNET), CERN/LHCONE (USLHCNet), FRANCE (OpenTransit), ASIA-PACIFIC (ASGC/ASCC/BNP/HEPNET/KAREN/KREONET2/NUS-GP/ODN/REANNZ/SINET/TRANSPAC/TWAREN), AUSTRALIA (AARnet), LATIN AMERICA (AMPATH/CLARA/CUDI), and RUSSIA AND CHINA (GLORIAD).]

Vision: Scientific progress will be completely unconstrained by the physical location of instruments, people, computational resources, or data.
Motivation
•  Networks are an essential part of data-intensive science
   –  Connect data sources to data analysis
   –  Connect collaborators to each other
   –  Enable machine-consumable interfaces to data and analysis resources (e.g. portals), automation, scale
•  Performance is critical
   –  Exponential data growth
   –  Constant human factors
   –  Data movement and data analysis must keep up
•  Effective use of wide area (long-haul) networks by scientists has historically been difficult
Traditional "Big Science"

Big Science Now Comes in Small Packages
Understanding Data Trends
[Chart: Data Scale (10GB up to 100PB) vs. Collaboration Scale. Small collaboration scale, e.g. light and neutron sources; medium collaboration scale, e.g. HPC codes; large collaboration scale, e.g. LHC. A few large collaborations have internal software and networking organizations.]
Data Mobility in a Given Time Interval
•  This table is available at:
   http://fasterdata.es.net/fasterdata-home/requirements-and-expectations/
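The arithmetic behind that table is simple: required rate = data volume ÷ time interval. A minimal sketch of the calculation (example values hypothetical):

```python
def required_gbps(bytes_to_move: float, hours: float) -> float:
    """Average network rate (Gbit/s) needed to move a dataset in a given interval."""
    return bytes_to_move * 8 / (hours * 3600) / 1e9

# Example: moving 100 TB in 24 hours needs a sustained ~9.3 Gbit/s --
# essentially a full, loss-free 10G path end to end.
print(f"{required_gbps(100e12, 24):.2f} Gbit/s")
```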
The Central Role of the Network
•  The very structure of modern science assumes science networks exist: high performance, feature rich, global scope
•  What is "The Network" anyway?
   –  "The Network" is the set of devices and applications involved in the use of a remote resource
      •  This is not about supercomputer interconnects
      •  This is about data flow from experiment to analysis, between facilities, etc.
   –  User interfaces for "The Network" – portal, data transfer tool, workflow engine
   –  Therefore, servers and applications must also be considered
•  What is important? An ordered list:
   1.  Correctness
   2.  Consistency
   3.  Performance
TCP – Ubiquitous and Fragile
•  Networks provide connectivity between hosts – how do hosts see the network?
   –  From an application's perspective, the interface to "the other end" is a socket
   –  Communication is between applications – mostly over TCP
•  TCP – the fragile workhorse
   –  TCP is (for very good reasons) timid – packet loss is interpreted as congestion
   –  Packet loss in conjunction with latency is a performance killer
   –  Like it or not, TCP is used for the vast majority of data transfer applications (more than 95% of ESnet traffic is TCP)
A small amount of packet loss makes a huge difference in TCP performance
[Chart: measured and theoretical TCP throughput vs. distance (local/LAN, metro area, regional, continental, international) for TCP Reno (measured and theoretical), HTCP (measured), and a loss-free path. With loss, high performance beyond metro distances is essentially impossible.]
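The "theoretical (TCP Reno)" curve in charts like this follows the Mathis model, which bounds single-stream Reno throughput at MSS / (RTT · √loss). A sketch of that calculation (parameter values hypothetical, chosen to resemble a continental path):

```python
from math import sqrt

def mathis_throughput_mbps(mss_bytes: int, rtt_s: float, loss_rate: float) -> float:
    """Upper bound on TCP Reno throughput (Mathis et al.): MSS / (RTT * sqrt(p))."""
    return mss_bytes * 8 / (rtt_s * sqrt(loss_rate)) / 1e6

# One packet lost in 22,000 (~0.005%) on a continental path (RTT 88 ms)
# caps a single Reno flow at tens of Mbit/s -- nowhere near 10 Gbit/s.
print(f"{mathis_throughput_mbps(1460, 0.088, 1/22000):.1f} Mbit/s")
```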
Working With TCP In Practice
•  Far easier to support TCP than to fix TCP
   –  People have been trying to fix TCP for years – limited success
   –  Like it or not, we're stuck with TCP in the general case
•  Pragmatically speaking, we must accommodate TCP
   –  Sufficient bandwidth to avoid congestion
   –  Zero packet loss
   –  Verifiable infrastructure
      •  Networks are complex
      •  Must be able to locate problems quickly
      •  Small footprint is a huge win – a small number of devices keeps problem isolation tractable
Putting A Solution Together
•  Effective support for TCP-based data transfer
   –  Design for correct, consistent, high-performance operation
   –  Design for ease of troubleshooting
•  Easy adoption is critical
   –  Large laboratories and universities have extensive IT deployments
   –  Drastic change is prohibitively difficult
•  Cybersecurity – defensible without compromising performance
•  Borrow ideas from traditional network security
   –  Traditional DMZ
      •  Separate enclave at the network perimeter ("Demilitarized Zone")
      •  Specific location for external-facing services
      •  Clean separation from the internal network
   –  Do the same thing for science – the Science DMZ
The Science DMZ Superfecta
•  Engagement – engagement with network users
   –  Partnerships
   –  Education & consulting
   –  Resources & knowledgebase
•  Data Transfer Node – dedicated systems for data transfer
   –  High performance
   –  Configured for data transfer
   –  Proper tools
•  perfSONAR – performance testing & measurement
   –  Enables fault isolation
   –  Verifies correct operation
   –  Widely deployed in ESnet and other networks, as well as sites and facilities
•  Network Architecture – Science DMZ
   –  Dedicated location for DTNs
   –  Proper security
   –  Easy to deploy – no need to redesign the whole network
Abstract or Prototype Deployment
•  Add-on to existing network infrastructure
   –  All that is required is a port on the border router
   –  Small footprint, pre-production commitment
•  Easy to experiment with components and technologies
   –  DTN prototyping
   –  perfSONAR testing
•  Limited scope makes security policy exceptions easy
   –  Only allow traffic from partners
   –  Add-on to production infrastructure – lower risk
Science DMZ Design Pattern (Abstract)
[Diagram: the border router connects the WAN both to the enterprise border router/firewall (serving the site/campus LAN) and, via a clean, high-bandwidth WAN path, to the Science DMZ switch/router. The Science DMZ hosts a high-performance Data Transfer Node with high-speed storage behind per-service security policy control points; perfSONAR nodes sit at the border, in the Science DMZ, and on the WAN path, and the site/campus retains access to Science DMZ resources.]
Local And Wide Area Data Flows
[Diagram: the same abstract design pattern, highlighting that wide area data flows follow the high-latency WAN path through the Science DMZ only, while traffic between the DTN and the site/campus LAN stays on the low-latency LAN path.]
Support For Multiple Projects
•  The Science DMZ architecture allows multiple projects to put DTNs in place
   –  Modular architecture
   –  Centralized location for data servers
•  This may or may not work well depending on institutional politics
   –  Issues such as physical security can make this a non-starter
   –  On the other hand, some shops already have service models in place
•  On balance, this can provide a cost savings – it depends
   –  Central support for data servers vs. carrying data flows
   –  How far do the data flows have to go?
Multiple Projects
[Diagram: the abstract design pattern with three DTNs (Project A, Project B, Project C) behind the Science DMZ switch/router, each with per-project security policy control points.]
Supercomputer Center Deployment
•  High-performance networking is assumed in this environment
   –  Data flows between systems, between systems and storage, wide area, etc.
   –  A global filesystem often ties resources together
      •  Portions of this may not run over Ethernet (e.g. InfiniBand)
      •  Implications for Data Transfer Nodes
•  A "Science DMZ" may not look like a discrete entity here
   –  By the time you get through interconnecting all the resources, you end up with most of the network in the Science DMZ
   –  This is as it should be – the point is appropriate deployment of tools, configuration, policy control, etc.
•  Office networks can look like an afterthought, but they aren't
   –  Deployed with appropriate security controls
   –  Office infrastructure need not be sized for science traffic
Supercomputer Center
[Diagram: the border router feeds a core switch/router; routed office networks sit behind a firewall, while front-end switches connect the Data Transfer Nodes, the supercomputer, and the parallel filesystem directly. A virtual circuit option and perfSONAR nodes instrument the paths.]
Supercomputer Center Data Path
[Diagram: the same supercomputer center layout, showing that the high-latency WAN path and the high-latency virtual circuit path reach the DTNs and parallel filesystem without traversing the office firewall, while the low-latency LAN path carries internal traffic.]
Development Environment
•  One thing that often happens is that an early power user of the Science DMZ is the network engineering group that builds it
   –  Service prototyping
   –  Deployment of test applications for other user groups to demonstrate value
•  The production Science DMZ is just that – production
   –  Once users are on it, you can't take it down to try something new
   –  Stuff that works tends to attract workload
•  Take-home message: plan for multiple Science DMZs from the beginning – at the very least you're going to need one for yourself
•  The Science DMZ model easily accommodates this
Science DMZ – Flexible Design Pattern
•  The Science DMZ design pattern is highly adaptable to research
•  Deploying a research Science DMZ is straightforward
   –  The basic elements are the same
      •  Capable infrastructure designed for the task
      •  Test and measurement to verify correct operation
      •  Security policy well-matched to the environment; the application set is strictly limited to reduce risk
   –  Connect the research DMZ to other resources as appropriate
•  The same ideas apply to supporting an SDN effort
   –  Test/research areas for development
   –  Transition to production as technology matures and need dictates
   –  One possible trajectory follows…
Science DMZ – Separate SDN Connection
[Diagram: a research Science DMZ switch/router with its own SDN path and research DTN sits alongside the production Science DMZ switch/router and production DTN; the production side keeps the high-performance routed path, and both have perfSONAR nodes and per-service security policy control points.]
Science DMZ – Production SDN Connection
[Diagram: the SDN path now feeds a production SDN Science DMZ switch/router and production DTN, while the research Science DMZ switch/router retains a separate SDN connection for continued development.]
Science DMZ – SDN Campus Border
[Diagram: SDN has moved to the campus border, providing a single high-performance multi-service path; the production SDN Science DMZ and the research Science DMZ hang off it, each with its own DTN, perfSONAR nodes, and per-service security policy control points.]
Common Threads
•  Two common threads exist in all these examples
•  Accommodation of TCP
   –  The wide area portion of data transfers traverses a purpose-built path
   –  High performance devices that don't drop packets
•  Ability to test and verify
   –  When problems arise (and they always will), they can be solved if the infrastructure is built correctly
   –  Small device count makes it easier to find issues
   –  Multiple test and measurement hosts provide multiple views of the data path
      •  perfSONAR nodes at the site and in the WAN
      •  perfSONAR nodes at the remote site
Multiple Ingress Flows, Common Egress
•  Hosts will typically send packets at the speed of their interface (1G, 10G, etc.)
   –  Instantaneous rate, not average rate
   –  If TCP has window available and data to send, the host sends until there is either no data or no window
•  Hosts moving big data (e.g. DTNs) can send large bursts of back-to-back packets
   –  This is true even if the average rate as measured over seconds is slower (e.g. 4 Gbps)
   –  On microsecond time scales, there is often congestion
      •  DTN traffic arrives in wire-speed bursts on one 10GE port while background traffic or competing bursts arrive on another, all converging on a common 10GE egress
   –  The router or switch must queue packets or drop them
Router and Switch Output Queues
•  The interface output queue allows the router or switch to avoid causing packet loss in cases of momentary congestion
•  In network devices, queue depth (or "buffer") is often a function of cost
   –  Cheap, fixed-config LAN switches (especially in the 10G space) have inadequate buffering – imagine a 10G "data center" switch as the guilty party
   –  Cut-through or low-latency Ethernet switches typically have inadequate buffering (the whole point is to avoid queuing!)
•  Expensive, chassis-based devices are more likely to have deep enough queues
   –  Juniper MX and Alcatel-Lucent 7750 used in the ESnet backbone
   –  Other vendors make such devices as well – details are important
   –  Thanks to Jim: http://people.ucsc.edu/~warner/buffer.html
•  This expense is one driver for the Science DMZ architecture – only deploy the expensive features where necessary
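A rough feel for "deep enough" comes from the bandwidth-delay product of the flows being carried; a sketch of that classic rule of thumb (values hypothetical):

```python
def bdp_bytes(bottleneck_gbps: float, rtt_ms: float) -> float:
    """Bandwidth-delay product: the rule-of-thumb output buffer size needed
    to absorb the bursts of a single large TCP flow without dropping."""
    return bottleneck_gbps * 1e9 / 8 * (rtt_ms / 1000)

# A 10G egress carrying a coast-to-coast flow (RTT ~70 ms) wants on the
# order of 87.5 MB of queue -- far more than a shallow-buffered
# "data center" switch typically provides.
print(f"{bdp_bytes(10, 70) / 1e6:.1f} MB")
```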
Output Queue Drops – Common Locations
[Diagram: a cluster data transfer node reaches the WAN through a department cluster switch, a department core switch, the site core switch/router, and the site border router, with wiring-closet switches, workstations, and 32+ cluster nodes on 1GE. For traffic inbound from the WAN, output queue drops commonly occur where the 10GE path steps down to a 1GE department uplink constrained by budget or legacy equipment; for traffic outbound toward the WAN, drops commonly occur at the department and cluster switches where many 1GE senders converge.]
Performance Monitoring
•  Everything may function perfectly when it is deployed
•  Eventually something is going to break
   –  Networks and systems are complex
   –  Bugs, mistakes, …
   –  Sometimes things just break – this is why we buy support contracts
•  Must be able to find and fix problems when they occur
•  Must be able to find problems in other networks (your network may be fine, but someone else's problem can impact your users)
•  TCP was intentionally designed to hide all transmission errors from the user:
   –  "As long as the TCPs continue to function properly and the internet system does not become completely partitioned, no transmission errors will affect the users." (From RFC 793, 1981)
Soft Network Failures – Hidden Problems
•  Hard failures are well-understood
   –  Link down, system crash, software crash
   –  Traditional network/system monitoring tools are designed to quickly find hard failures
•  Soft failures result in degraded capability
   –  Connectivity exists
   –  Performance is impacted
   –  Typically something in the path is functioning, but not well
•  Soft failures are hard to detect with traditional methods
   –  No obvious single event
   –  Sometimes no indication at all of any errors
•  Independent testing is the only way to reliably find soft failures
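Regular, independent testing turns soft failures into visible trends. A toy sketch of the kind of baseline-versus-recent check a measurement dashboard automates (thresholds and data hypothetical):

```python
def flag_soft_failure(daily_gbps: list[float], window: int = 7, drop: float = 0.5) -> bool:
    """Flag when recent average throughput falls below a fraction of the
    long-term baseline -- connectivity still works, performance does not."""
    if len(daily_gbps) < 2 * window:
        return False
    baseline = sum(daily_gbps[:window]) / window
    recent = sum(daily_gbps[-window:]) / window
    return recent < drop * baseline

# A path that still passes traffic but has quietly lost most of its throughput:
history = [9.4, 9.5, 9.3, 9.6, 9.4, 9.5, 9.4, 5.0, 4.4, 4.0, 3.8, 3.7, 3.6, 3.5]
print(flag_soft_failure(history))  # True
```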
Sample Soft Failures
[Charts: a month of throughput measurements in Gb/s. One shows a rebooted router with a full route table restoring normal performance; the other shows the gradual failure of an optical line card, with performance degrading steadily until repair.]
Testing Infrastructure – perfSONAR
•  perfSONAR is:
   –  A widely-deployed test and measurement infrastructure
      •  ESnet, Internet2, US regional networks, international networks
      •  Laboratories, supercomputer centers, universities
   –  A suite of test and measurement tools
   –  A collaboration that builds and maintains the toolkit
•  By installing perfSONAR, a site can leverage over 1100 test servers deployed around the world
•  perfSONAR is ideal for finding soft failures
   –  Alerting to the existence of problems
   –  Fault isolation
   –  Verification of correct operation
perfSONAR Deployment Footprint
[Map: perfSONAR measurement hosts deployed worldwide.]
Lookup Service Directory Search: http://stats.es.net/ServicesDirectory/
perfSONAR Dashboard: http://ps-dashboard.es.net
Dedicated Systems – Data Transfer Node
•  The DTN is dedicated to data transfer
•  Set up specifically for high-performance data movement
   –  System internals (BIOS, firmware, interrupts, etc.)
   –  Network stack
   –  Storage (global filesystem, Fibre Channel, local RAID, etc.)
   –  High performance tools
   –  No extraneous software
•  Limitation of scope and function is powerful
   –  No conflicts with configuration for other tasks
   –  Small application set makes cybersecurity easier
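On a Linux DTN, "network stack" tuning largely means kernel settings like the ones below. A minimal sanity-check sketch that just reports what the host currently has (the recommended values live on fasterdata.es.net and are not asserted here):

```python
from pathlib import Path

# Kernel settings commonly tuned on a Linux DTN; this script only reads and
# prints the host's current values for review against the fasterdata guidance.
SETTINGS = [
    "net/ipv4/tcp_congestion_control",  # congestion control algorithm in use
    "net/core/rmem_max",                # maximum receive socket buffer (bytes)
    "net/core/wmem_max",                # maximum send socket buffer (bytes)
    "net/ipv4/tcp_rmem",                # min/default/max TCP receive buffer
    "net/ipv4/tcp_wmem",                # min/default/max TCP send buffer
]

for name in SETTINGS:
    path = Path("/proc/sys") / name
    value = path.read_text().strip() if path.exists() else "(not present)"
    print(f"{name} = {value}")
```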
Data Transfer Tools For DTNs
•  Parallelism is important
   –  It is often easier to achieve a given performance level with four parallel connections than with one connection
   –  Several tools offer parallel transfers, including Globus/GridFTP
•  Latency interaction is critical
   –  Wide area data transfers have much higher latency than LAN transfers
   –  Many tools and protocols assume a LAN
•  Workflow integration is important
•  Key tools: Globus Online, HPN-SSH
Data Transfer Tool Comparison
•  In addition to the network, using the right data transfer tool is critical
•  Data transfer test from Berkeley, CA to Argonne, IL (near Chicago). RTT = 53 ms, network capacity = 10 Gbps.

   Tool                   Throughput
   scp                    140 Mbps
   HPN-patched scp        1.2 Gbps
   ftp                    1.4 Gbps
   GridFTP, 4 streams     5.4 Gbps
   GridFTP, 8 streams     6.6 Gbps

•  Note that to get more than 1 Gbps (125 MB/s) disk to disk requires properly engineered storage (RAID, parallel filesystem, etc.)
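To make the table concrete, those measured rates translate directly into wall-clock time; a sketch using the table's own numbers for a 1 TB dataset:

```python
def transfer_time_hours(bytes_to_move: float, gbps: float) -> float:
    """Wall-clock time to move a dataset at a sustained rate."""
    return bytes_to_move * 8 / (gbps * 1e9) / 3600

# The tool choice above is the difference between ~16 hours and ~20 minutes
# for the same 1 TB dataset over the same 10G path:
for tool, gbps in [("scp", 0.140), ("GridFTP, 8 streams", 6.6)]:
    print(f"{tool}: {transfer_time_hours(1e12, gbps):.2f} h")
```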
Science DMZ Security
•  Goal – disentangle security policy and enforcement for science flows from security for business systems
•  Rationale
   –  Science data traffic is simple from a security perspective
   –  Narrow application set on the Science DMZ
      •  Data transfer, data streaming packages
      •  No printers, document readers, web browsers, building control systems, financial databases, staff desktops, etc.
   –  Security controls that are typically implemented to protect business resources often cause performance problems
•  Separation allows each to be optimized
Performance Is A Core Requirement
•  Core information security principles
   –  Confidentiality, Integrity, Availability (CIA)
   –  Often, CIA and risk mitigation result in poor performance
•  In data-intensive science, performance is an additional core mission requirement: CIA → PICA
   –  CIA principles are important, but if performance is compromised the science mission fails
   –  It's not about "how much" security you have, but how the security is implemented
   –  Need a way to appropriately secure systems without performance compromises
Placement Outside the Firewall
•  The Science DMZ resources are placed outside the enterprise firewall for performance reasons
   –  The meaning of this is specific – Science DMZ traffic does not traverse the firewall data plane
   –  Packet filtering is fine – just don't do it with a firewall
•  Lots of heartburn over this, especially from the perspective of a conventional firewall manager
   –  Lots of organizational policy directives mandating firewalls
   –  Firewalls are designed to protect converged enterprise networks
   –  Why would you put critical assets outside the firewall???
•  The answer is that firewalls are typically a poor fit for high-performance science applications
Firewall Capabilities and Science Traffic
•  Firewalls have a lot of sophistication in an enterprise setting
   –  Application layer protocol analysis (HTTP, POP, MSRPC, etc.)
   –  Built-in VPN servers
   –  User awareness
•  Data-intensive science flows typically don't match this profile
   –  Common case – data on filesystem A needs to be on filesystem Z
      •  The data transfer tool verifies credentials over an encrypted channel
      •  Then it opens a socket or set of sockets, and sends data until done (1 TB, 10 TB, 100 TB, …)
   –  One workflow can use 10% to 50% or more of a 10G network link
•  Do we have to use a firewall?
Firewalls As Access Lists
•  When you ask a firewall administrator to allow data transfers through the firewall, what do they ask for?
   –  IP address of your host
   –  IP address of the remote host
   –  Port range
   –  That looks like an ACL to me!
•  No special config for advanced protocol analysis – just address/port
•  Router ACLs are better than firewalls at address/port filtering
   –  ACL capabilities are typically built into the router
   –  Router ACLs typically do not drop traffic permitted by policy
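The policy being requested is pure stateless address/port matching; expressed as code, it is nothing more than the sketch below (networks and port range hypothetical):

```python
from ipaddress import ip_address, ip_network

# A stateless address/port rule set -- everything a router ACL does, and all
# the policy the firewall administrator actually asked for. Values hypothetical.
PERMIT = [
    # (source network, destination network, destination port range)
    (ip_network("192.0.2.0/24"), ip_network("198.51.100.10/32"), range(50000, 51001)),
]

def permitted(src: str, dst: str, dport: int) -> bool:
    """Return True if a packet's source, destination, and port match a permit rule."""
    return any(
        ip_address(src) in src_net and ip_address(dst) in dst_net and dport in ports
        for src_net, dst_net, ports in PERMIT
    )

print(permitted("192.0.2.7", "198.51.100.10", 50022))    # True: collaborator DTN
print(permitted("203.0.113.5", "198.51.100.10", 50022))  # False: everyone else
```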
Security Without Firewalls
•  Data-intensive science traffic interacts poorly with firewalls
•  Does this mean we ignore security? NO!
   –  We must protect our systems
   –  We just need to find a way to do security that does not prevent us from getting the science done
•  Key point – security policies and mechanisms that protect the Science DMZ should be implemented so that they do not compromise performance
•  Traffic permitted by policy should not experience a performance impact as a result of the application of policy
Firewall Performance Example
•  Observed performance, via perfSONAR, through a firewall: almost 20 times slower through the firewall
•  Observed performance, via perfSONAR, bypassing the firewall: huge improvement without the firewall
If Not Firewalls, Then What?
•  Intrusion Detection Systems (IDS)
   –  One example is Bro – http://bro-ids.org/
   –  Bro is high-performance and battle-tested
      •  Bro protects several high-performance national assets
      •  Bro can be scaled with clustering: http://www.bro-ids.org/documentation/cluster.html
   –  Other IDS solutions are available as well
•  Netflow and IPFIX can provide intelligence, but not filtering
•  OpenFlow and SDN
   –  Using OpenFlow to control access to a network-based service seems pretty obvious
   –  This could significantly reduce the attack surface for any authenticated network service
   –  This would only work if the OpenFlow device had a robust data plane
If Not Firewalls, Then What? (2)
•  Aggressive access lists
   –  More useful with project-specific DTNs
   –  If the purpose of the DTN is to exchange data with a small set of remote collaborators, the ACL is pretty easy to write
   –  Large-scale data distribution servers are hard to handle this way (but then, the firewall ruleset for such a service would be pretty open too)
•  Limitation of the application set
   –  One of the reasons to limit the application set in the Science DMZ is to make it easier to protect
   –  Keep desktop applications off the DTN (and watch for them anyway using logging, netflow, etc. – take violations seriously)
   –  This requires collaboration between people – networking, security, systems, and scientists
Collaboration Within The Organization
•  All stakeholders should collaborate on Science DMZ design, policy, and enforcement
•  The security people have to be on board
   –  Remember: security people already have political cover – it's called the firewall
   –  If a host gets compromised, the security officer can say they did their due diligence because there was a firewall in place
   –  If the deployment of a Science DMZ is going to jeopardize the job of the security officer, expect pushback
•  The Science DMZ is a strategic asset, and should be understood by the strategic thinkers in the organization
   –  Changes in security models
   –  Changes in operational models
   –  Enhanced ability to compete for funding
   –  Increased institutional capability – greater science output
Challenges to Network Adoption
•  Causes of performance issues are complicated for users
•  Lack of communication and collaboration between the CIO's office and researchers on campus
•  Lack of IT expertise within a science collaboration or experimental facility
•  Users' performance expectations are low ("The network is too slow", "I tried it and it didn't work")
•  Cultural change is hard ("we've always shipped disks!")
•  Scientists want to do science, not IT support – this is the capability gap
Requirements Reviews
http://www.es.net/about/science-requirements/network-requirements-reviews/

The purpose of these reviews is to accurately characterize the near-term, medium-term and long-term network requirements of the science conducted by each program office. The reviews attempt to bring about a network-centric understanding of the science process used by the researchers and scientists, in order to derive network requirements. We have found this to be an effective method for determining network requirements for ESnet's customer base.
[Photos: the DOE Office of Science programs – High Energy Physics, Biological and Environmental Research, Advanced Scientific Computing Research, Basic Energy Sciences, Nuclear Physics, and Fusion Energy Sciences. Photos courtesy of LBL, JGI, NIST, SLAC, and PPPL.]
How do we know what our scientists need?
•  Each Program Office has a dedicated requirements review every three years
•  Two workshops per year, attendees chosen by the science programs
•  Discussion centered on science case studies
   –  Instruments and Facilities – the "hardware"
   –  Process of Science – science workflow
   –  Collaborators
   –  Challenges
•  Network requirements derived from science case studies + discussions
•  Reports contain requirements analysis, case study text, outlook
2013 BER Sample Findings: Environmental Molecular Sciences Laboratory (EMSL)
"EMSL frequently needs to ship physical copies of media to users when data sizes exceed a few GB. More often than not, this is due to lack of bandwidth or storage resources at the user's home institution."
Wrapup
•  The Science DMZ design pattern provides a flexible model for supporting high-performance data transfers and workflows
•  Key elements:
   –  Accommodation of TCP
      •  Sufficient bandwidth to avoid congestion
      •  Loss-free IP service
   –  Location – near the site perimeter if possible
   –  Test and measurement
   –  Dedicated systems
   –  Appropriate security
•  Support for advanced capabilities (e.g. SDN) is much easier with a Science DMZ
The Science DMZ in 1 Slide
Consists of three key components, all required:
•  "Friction free" network path
   –  Highly capable network devices (wire-speed, deep queues)
   –  Virtual circuit connectivity option
   –  Security policy and enforcement specific to science workflows
   –  Located at or near the site perimeter if possible
•  Dedicated, high-performance Data Transfer Nodes (DTNs)
   –  Hardware, operating system, libraries all optimized for transfer
   –  Includes optimized data transfer tools such as Globus Online and GridFTP
•  Performance measurement/test node
   –  perfSONAR
•  Engagement with end users

Details at http://fasterdata.es.net/science-dmz/
Links
–  ESnet fasterdata knowledge base
   •  http://fasterdata.es.net/
–  Science DMZ paper
   •  http://www.es.net/assets/pubs_presos/sc13sciDMZ-final.pdf
–  Science DMZ email list
   •  https://gab.es.net/mailman/listinfo/sciencedmz
–  perfSONAR
   •  http://fasterdata.es.net/performance-testing/perfsonar/
   •  http://www.perfsonar.net
Thanks!

Jason Zurawski – [email protected]
Science Engagement Engineer, ESnet
Lawrence Berkeley National Laboratory
New Mexico Technology in Education (NMTIE), November 19th, 2014