Reasoning about Software Defined Networks

Formal Reasoning about
Networks
Mooly Sagiv
[email protected]
03-640-7606
Tel Aviv University
Sunday 14-16
http://www.cs.tau.ac.il/~msagiv/courses/rsys.html
Outline
• Why bother about network verification?
• Verifying Software Defined Networks [PLDI’14]
• Middlebox Verification [TACAS’16]
• Azure Verification [NSDI’15, POPL’16]
[PLDI’14] T. Ball, N. Bjørner, A. Gember, S. Itzhaky, A. Karbyshev, M. Sagiv, M. Schapira, A. Valadarsky: VeriCon: Towards Verifying Controller Programs in Software-Defined Networks.
[TACAS’16] Y. Velner, K. Alpernas, A. Panda, A. Rabinovich, M. Sagiv, S. Shenker, S. Shoham: Some Complexity Results for Stateful Network Verification.
[POPL’16] G. Plotkin, N. Bjørner, N. Lopes, A. Rybalchenko, G. Varghese: Scaling Network Verification Using Symmetry and Surgery.
[NSDI’15] N. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, G. Varghese: Checking Beliefs in Dynamic Networks.
The Internet: A Remarkable Story
• Tremendous success
– From research experiment
to global infrastructure
• Brilliance of under-specifying
– Network: best-effort packet delivery
– Hosts: arbitrary applications
• Enables innovation in applications
– Web, P2P, VoIP, social networks, virtual worlds
• But, change is easy only at the edge… 
Inside the ‘Net: A Different Story…
• Closed equipment
– Software bundled with hardware
– Vendor-specific interfaces
• Over specified
– Slow protocol standardization
• Few people can innovate
– Equipment vendors write the code
– Long delays to introduce new features
Impacts performance, security, reliability, cost…
Do We Need Innovation Inside?
Many boxes (routers, switches, firewalls, …), with different interfaces.
How Hard are Networks to Manage?
• Operating a network is expensive
– More than half the cost of a network
– Yet, operator error causes most outages
• Buggy software in the equipment
– Routers with 20+ million lines of code
– Cascading failures, vulnerabilities, etc.
• The network is “in the way”
– Especially a problem in data centers
– … and home networks
Creating Foundation for Networking
• A domain, not a discipline
– Alphabet soup of protocols
– Header formats, bit twiddling
– Preoccupation with artifacts
• From practice, to principles
– Intellectual foundation for networking
– Identify the key abstractions
– … and support them efficiently
• To build networks worthy of society’s trust
Rethinking the “Division of Labor”
Traditional Computer Networks
Data plane: packet streaming
Forward, filter, buffer, mark, rate-limit, and measure packets
Traditional Computer Networks
Control plane: distributed algorithms
Track topology changes, compute routes, install forwarding rules
Traditional Computer Networks
Management plane: human time scale
Collect measurements and configure the equipment
Shortest-Path Routing
• Management: set the link weights
• Control: compute shortest paths
• Data: forward packets to next hop
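The three planes can be sketched in a few lines of Python. This is only an illustration under assumptions: the four-node topology, its weights, and the function name `next_hops` are hypothetical and not taken from the slides. The weight dictionary plays the management role, Dijkstra's algorithm plays the control role, and the resulting next-hop table is what the data plane would consult.

```python
import heapq

def next_hops(weights, source):
    """Run Dijkstra from `source`; return {destination: first hop on a shortest path}."""
    dist = {source: 0}
    first_hop = {}
    heap = [(0, source, None)]  # (distance, node, first hop used to reach node)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry, a shorter path was already found
        if hop is not None and node not in first_hop:
            first_hop[node] = hop
        for nbr, w in weights[node].items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                # Leaving the source, the neighbor itself is the first hop.
                heapq.heappush(heap, (nd, nbr, nbr if node == source else hop))
    return first_hop

# Management: set the link weights (hypothetical topology).
weights = {
    "a": {"b": 1, "c": 4},
    "b": {"a": 1, "d": 1},
    "c": {"a": 4, "d": 1},
    "d": {"b": 1, "c": 1},
}
# Control: compute shortest paths; data: forward using this table.
table = next_hops(weights, "a")
```

With these weights every destination is reached through `b`, because the direct link from `a` to `c` costs more than the three-hop path through `b` and `d`.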
[Figure: example topology annotated with link weights]
Inverting the Control Plane
• Traffic engineering
– Change link weights
– … to induce the paths
– … that alleviate congestion
[Figure: example topology annotated with link weights]
Avoiding Transient Anomalies
• Distributed protocol
– Temporary disagreement among the nodes
– … leaves packets stuck in loops
– Even though the change was planned!
15
1
1
1
15
3
Death to the Control Plane!
• Simpler management
– No need to “invert” control-plane operations
• Faster pace of innovation
– Less dependence on vendors and standards
• Easier interoperability
– Compatibility only in “wire” protocols
• Simpler, cheaper equipment
– Minimal software
Software Defined Networking (SDN)
• Logically-centralized control: smart, slow
• API to the data plane (e.g., OpenFlow)
• Switches: dumb, fast
Classical Networking
Ted Stevens was right
• Networks provide end-to-end connectivity
• Just contain hosts and switches
• All interesting processing at the hosts
[Figure: hosts Alice, Bob, Mallory, and Trent connected by a network]
Security & Performance
• Security (firewalls, IDSs, …)
• Performance (caches, load balancers, …)
• New functionality (proxies, …)
[Figure: Alice, Bob, Mallory, and Trent connected through a firewall, a load balancer, and a cache]
Middleboxes
• Middleboxes are intermediaries
– Interposed in‐between the communicating hosts
– Often without knowledge of one or both parties
• Examples
– Network address translators (NAT)
– Firewall
– Traffic shapers
– Intrusion detection systems (IDSs)
– Transparent Web proxy caches
– Application accelerators
NAT
[Figure: NAT translation between local address 10.0.0.1, port 1, and global address 138.76.29.7]
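Firewalls
[Figure: hosts A and B outside a firewall, host H inside, and the firewall's set of trusted hosts; successive frames show packets exchanged between A and H and between H and B]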
Learning Switch
[Figure: a learning switch with ports 1, 2, and 3 connecting hosts A, B, C, and D; learned entries: A on 1, D on 3]
Web Clients and Servers
• Most Web applications use client-server protocol
– Client sends a request
– Server sends a response
• Proxies play both roles
– A server to the client
– A client to the server
[Figure: a cache proxy between Web clients and the servers www.google.com and www.cnn.com]
Two Views of Middleboxes
• An abomination (toevah)
– Violation of layering
– Breaks the functional model
– Responsible for many subtle bugs
• A practical necessity
– Significant part of the network
– Solving real and pressing problems
– Needs that are not likely to go away
– Local functionality enhancements
Local enhancements: Riverbed
[Figure: cache proxy under overloaded and normal load]
Middlebox code can get complex
• Source code complexity
– Bro network intrusion detection system
• 101,500 lines of C++, Python, Perl, Awk, Lex, Yacc
– Snort IDS: 220,000 lines of C, …
– pfSense: 476,438 lines of C, PHP, scripts, …
• Hard to specify correctness
– What is a correct IDS?
Programming error
• The middlebox code fails to implement the required functionality
• Incorrect intrusion detection system
– 10 CVE reports in 2014 for pfSense, a popular firewall
– CVE on firewall hardware from Palo Alto Networks (2010)
• Misinterprets HTTP cookie options, etc.
• Heartbleed bug
– Allows anyone on the Internet to read the memory of the systems protected by the vulnerable versions of the OpenSSL software
• Requires code analysis
Hypothesis
• There are only a few types of middleboxes
• Middleboxes can be abstracted as finite state machines
Safety of Computer Networks
• Show that something bad cannot happen
• Early detection of potential bugs
• Isolation:
• A packet of type t sent from host A never reaches host B
• Isolation between two universities
• SSH packets from host A cannot reach B
Safety with middleboxes
• Safety can be checked when the network only has switches with static routing rules
• Trace the forwarding graph
• Middleboxes make everything harder
• Arbitrary behavior – black box
• Rewrite packet headers
• Middleboxes behave differently over time – need to reason about history
• Composition may violate safety
Firewall Misconfiguration
• Desired property: A is isolated from B
[Figure: a cache proxy P and a firewall configured with “Deny A” between A and B; packet labels A → B and P → B]
Complex misconfiguration
• Desired property: at most one packet from B
[Figure: two compositions of an IDS and a load balancer on the path from B to A, one with the IDS first and one with the load balancer first]
VeriCon: Towards Verifying
Controller Programs in SDNs
Thomas Ball, Nikolaj Bjorner, Aaron Gember,
Shachar Itzhaky, Aleksandr Karbyshev, Mooly Sagiv,
Michael Schapira, Asaf Valadarsky
Traditional Computer Networks
Control plane:
distributed algorithms
Data plane:
packet
streaming
New Paradigm:
Software Defined Networking (SDN)
• Logically-centralized control in software: smart but slow
• API to the data plane (e.g., OpenFlow)
• Switches in hardware: dumb but fast
Controller: Programmability
[Figure: applications (APP) running on top of the controller]
• Events from switches: topology changes, traffic statistics, arriving packets
• Commands to switches: (un)install rules, query statistics
Firewall Pseudocode
ft = {}
rel trusted(SW, HO) = {}
while true {
  event(switch, src → dst, in-port) ⇒
    if exists out-port s.t. <switch, src → dst, in-port → out-port> ∈ ft
      switch.forward(src → dst, in-port → out-port)   // handled by switch
    else if in-port = 0
      switch.forward(src → dst, 0 → 1)       // forward to outside world
      trusted.insert(switch, dst)             // dst is now trusted
      ft.insert(switch, src → dst, 0 → 1)    // insert a per-flow rule to forward future packets
    else if in-port = 1                       // packets from the outside world
      if <switch, src> ∈ trusted
        switch.forward(src → dst, 1 → 0)     // forward the packet to trusted hosts
        ft.insert(switch, src → dst, 1 → 0)  // insert a per-flow rule to forward future packets
}
Desired Network Properties
• Routing
– No forwarding loops, no black holes, …
• Security
– ACL, firewall, middleboxes, …
• Traffic Engineering
– Load balancing, VM migration, …
• …
How can we guarantee
such properties?
Traditional Networks vs. SDN
• Guaranteeing these properties in traditional networks is hard
– Switch/ Router code is a “black box”
– Protocols are distributed across devices
• SDN opens up the possibility of applying
formal software verification to networks!
– Accessible code
– Centralized control (sequential core)
– Distributed switches with simple semantics
Existing Approaches for SDN Verification
• Finite-state model checking
– NICE & Verificare, FlowLog
– Might miss bugs!
• Analyzing network snapshots
– Header Space Analysis
• Run-time checks
– VeriFlow & NetPlumber
• Snapshot analysis and run-time checks discover bugs too late and add run-time overhead
Dream Scenario
• Verify network-wide properties
at compile time
– Find violations before they occur!
• Provable verification
– Prove correctness for correct programs
– Parametric network topologies
– Find a counterexample for incorrect programs
(useful for debugging)
An Ideal Tool
• Inputs
– Restrictions on topology (T), e.g., a first-order formula over link(switch1, port, switch2)
– Controller code (P)
– Desired properties (φ), e.g., ∀ switch, packet. ssh(packet) ⇒ ¬forward(switch, packet)
• Verification conditions generator produces T ∧ P ⇒ φ
• Solver returns a counterexample or a proof
An Ideal Tool
• Inputs: restrictions on topology (T), controller code (P), desired properties (φ)
• Verification conditions generator produces T ∧ P ⇒ φ
– In general, P is not expressible in FOL
• FOL satisfiability solver (Z3) returns a counterexample or a proof
Inductive Invariants
• An invariant Inv is inductive if:
1. The initial state satisfies Inv
2. Whenever an event E is executed on an arbitrary state satisfying Inv
• the resulting state satisfies Inv
• {Inv} E {Inv}
• Permits compositional verification
• … but may be hard for programmers
• Can be inferred by backward propagation (WP)
• Example: x = 2; while true do x := 2 * x - 1
– Non-inductive: x > 0
– Inductive: x > 1
[Figure: states inside and outside each invariant, with transitions labeled E]
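The difference between the two candidate invariants for the loop x := 2 * x - 1 can be probed with a small sampled check. This is a sketch under an assumption the slide does not state: x ranges over rationals (over the integers, x > 0 would also be preserved by the loop body). The helper name `find_counterexample` and the sample grid are hypothetical, and sampling can only refute inductiveness, not prove it.

```python
from fractions import Fraction

def step(x):
    """One loop iteration: x := 2 * x - 1."""
    return 2 * x - 1

def find_counterexample(inv, samples):
    """Return a sampled state that satisfies inv while its successor does not, else None."""
    for x in samples:
        if inv(x) and not inv(step(x)):
            return x
    return None

# Sample grid: -2, -7/4, ..., 10 in steps of 1/4 (an arbitrary choice).
samples = [Fraction(n, 4) for n in range(-8, 41)]

cex_positive = find_counterexample(lambda x: x > 0, samples)     # x > 0
cex_greater_one = find_counterexample(lambda x: x > 1, samples)  # x > 1
```

Here `cex_positive` is 1/4: the state satisfies x > 0, but one iteration yields -1/2. No sampled state breaks x > 1, which matches the algebra: 2x - 1 > 1 exactly when x > 1. Both candidates hold in the initial state x = 2.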
A Less Ideal Tool
• Inputs: restrictions on topology (T), controller code (P), desired properties (φ), and an invariant Inv
• Verification conditions generator produces
– Inv ⇒ φ
– Init ⇒ Inv
– Inv ∧ event ⇒ Inv′
• FOL satisfiability solver (Z3) returns a counterexample or a proof
Desired Properties Firewall
• S.frwd(Src → Dst, 1 → 0) ⇒ ∃ Src′: HO. S.frwd(Src′ → Src, 0 → 1)
[Figure: the controller with its trusted relation, a switch with ports 0 and 1, host a, an event arriving on port 1, and the switch forwarding table with columns Src, In, Dst, Out]
Desired Properties Firewall (2)
• S.frwd(Src → Dst, 1 → 0) ⇒ ∃ Src′: HO. S.frwd(Src′ → Src, 0 → 1)
• S.ft(Src → Dst, 1 → 0) ⇒ ∃ Src′: HO. S.frwd(Src′ → Src, 0 → 1)
Desired Properties Firewall
• S.frwd(Src → Dst, 1 → 0) ⇒ ∃ Src′: HO. S.frwd(Src′ → Src, 0 → 1)
• S.ft(Src → Dst, 1 → 0) ⇒ ∃ Src′: HO. S.frwd(Src′ → Src, 0 → 1)
[Figure: the controller with its trusted relation, a switch with ports 0 and 1, host a, an event arriving on port 1, and the switch forwarding table with columns Src, In, Dst, Out]
Inductive Invariant Firewall
• S.frwd(Src → Dst, 1 → 0) ⇒ ∃ Src′: HO. S.frwd(Src′ → Src, 0 → 1)
• S.ft(Src → Dst, 1 → 0) ⇒ ∃ Src′: HO. S.frwd(Src′ → Src, 0 → 1)
• <S, H> ∈ trusted ⇒ ∃ Src: HO. S.frwd(Src → H, 0 → 1)
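The three conjuncts can be exercised on a small executable model of the firewall. The sketch below is an assumption-laden illustration, not VeriCon itself: the class `FirewallSwitch`, the tuple encoding (src, dst, in_port, out_port), and the random event generator are all hypothetical, and the right-hand sides are read as "some packet was forwarded 0 → 1 to that host". Random testing only looks for violations; it proves nothing.

```python
import random

class FirewallSwitch:
    """Single-switch model: port 0 faces inside hosts, port 1 the outside world."""
    def __init__(self):
        self.trusted = set()  # hosts that were sent a packet from inside
        self.ft = set()       # installed rules: (src, dst, in_port, out_port)
        self.frwd = set()     # forwarded packets, same tuple shape

    def packet_in(self, src, dst, in_port):
        rule = next((r for r in self.ft if r[:3] == (src, dst, in_port)), None)
        if rule is not None:                        # handled by the switch
            self.frwd.add(rule)
        elif in_port == 0:                          # from inside: forward, trust dst
            self.frwd.add((src, dst, 0, 1))
            self.trusted.add(dst)
            self.ft.add((src, dst, 0, 1))
        elif in_port == 1 and src in self.trusted:  # from outside, trusted sender
            self.frwd.add((src, dst, 1, 0))
            self.ft.add((src, dst, 1, 0))
        # any other packet is dropped

def invariants_hold(sw):
    """Check the three conjuncts of the invariant on the current state."""
    sent_out_to = {d for (_, d, i, o) in sw.frwd if (i, o) == (0, 1)}
    inv1 = all(s in sent_out_to for (s, _, i, o) in sw.frwd if (i, o) == (1, 0))
    inv2 = all(s in sent_out_to for (s, _, i, o) in sw.ft if (i, o) == (1, 0))
    inv3 = all(h in sent_out_to for h in sw.trusted)
    return inv1 and inv2 and inv3

random.seed(0)
hosts = ["a", "b", "c"]
sw = FirewallSwitch()
all_ok = True
for _ in range(200):  # random packet events
    sw.packet_in(random.choice(hosts), random.choice(hosts), random.choice([0, 1]))
    all_ok = all_ok and invariants_hold(sw)
```

Dropping the `src in self.trusted` test from the last branch makes the first conjunct fail as soon as an outside packet arrives before any inside traffic.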
Programs Proved
Program | Property
Firewall | Correct forwarding for a basic firewall abstraction
MigFirewall | Correct forwarding for a firewall supporting migration of “safe” hosts
Learning | Topology learning for a simple learning switch
Resonance | Access control for host authentication in enterprises
Stratos (simplified) | Forwarding traffic through a sequence of middleboxes
Incorrect Programs
Program | Counterexample #Hosts | Counterexample #Switches
Auth-NoFlowRemoval | 3 | 2
Firewall-ForgotConsistency | 5 | 3
Firewall-ForgotPortCheck | 6 | 3
Firewall-ForgotTrustedInvariant | 6 | 3
Learning-NoSend | 11 | 1
Resonance-StatesNotMutuallyExclusive | 11 | 4
StatelessFireWall-AllowAll2to1Traffic | 4 | 2
VeriCon: Challenges and Solutions
• Inductive Invariants
– We describe a simple tool that infers inductive
invariants for some SDN programs
• Iterative WP
• Future research: Abstract Interpretation, CEGAR
• SDN programs must be coded in a specific language (CSDN)
– VeriCon can be extended to support Java, Python, etc.
• SAT solver might not terminate!
– Many properties are in a sub-family of FOL (∃*∀*)
– … for which solver termination is guaranteed!
• VeriCon assumes atomicity of events
– “Existing” solutions
– Future research: verify stronger properties
Summary
• SDN opens up an opportunity for applying
formal verification to networks
• VeriCon is the first system to directly prove
correctness of generic SDN programs at
compile time
– for unbounded topologies, #packets, etc.
On the Complexity of
Verifying Stateful Networks
A. Panda, S. Shenker, Y. Velner, K. Alpernas, A. Rabinovich, S. Shoham
Topology Assumptions
• Finite set of hosts H
• Fixed set of middleboxes M
– Switches are degenerate middleboxes
• Fixed undirected topology
E ⊆ (H × Pr × M) ∪ (M × Pr × Pr × M)
Packet Assumptions
• Finite set of packet types T
• Finite set of ports Pr per middlebox
• Finite set of packet headers
(t, src, dst, pr) ∈ P = T × H × H × Pr
• No bound on the number of packets sent
• Many packets may be sent before a safety violation occurs
Middlebox Abstract Semantics
• The abstract semantics of each middlebox is a function
– m: P* × P → 2^P = P* → (P → 2^P)
– Packet bodies are unchanged
Common middleboxes
Switch:
  m(h, p) = {p[out ↦ pr] | pr ∈ PR − p.ip}
Firewall:
  m(h, p) = if trusted(p, h)
            then {p[out ↦ pr] | pr ∈ PR − p.ip}   // forward
            else {}                                // drop
Learning Switch:
  m(h, p) = if there exists pr0 ∈ Prt such that connected(p.dst, h, pr0)
            then {p[out ↦ pr0]}                         // forward
            else {p[out ↦ pr] : pr ∈ Prt, pr ≠ p.ip}   // flood
IDS:
  m(h, p) = if trusted(p, h)
            then {p[out ↦ pr] | pr ∈ PR − p.ip}   // forward
            else {}                                // drop
Cache Proxy:
  m(h, p) = if avail(p.body, h, response)
            then {p[src ↦ me, dst ↦ p.src, body ↦ response]}
            else {p[src ↦ me]}
Modeling Middleboxes by FSMs
• A transducer M = <S, s0, P, ω, δ> where
– S are the states of the middlebox
– s0 ∈ S is the initial state
– ω: S × P → 2^P is the current forwarding behavior
– δ: S × P → 2^S is the next state
– Extend δ to histories
• δ*([]) = {s0}
• δ*(h · p) = δ(δ*(h), p)
• M models m: P* × P → 2^P when for all h ∈ P* and p ∈ P:
– ω(δ*(h), p) = m(h, p)
Partial FSM for Firewall
[Figure: partial FSM whose states are sets of trusted hosts (e.g., Trusted = {2}); edges are labeled (Type, Source, Destination, Port)/{Forwarded Packets}]
The Safety Problem
• Given a fixed topology of middleboxes
• A finite state transducer for each of the
middleboxes
• Prove that there exists no scenario of packet
transmissions leading to a bad state
• Otherwise, identify such scenarios
Undecidability
• Checking safety properties such as isolation is
undecidable even for finite state middleboxes
– Cycles in the topology allow counting
– Even in the absence of forwarding loops
Obtaining Decidability
• Show that if there is a scenario leading to a safety violation, then there is also a bounded one
• Reduction to a decision procedure
Non-Deterministic Packet Handling
• Assumes that the order of packet processing is arbitrary
• It may be that a packet p arrives before q and yet the middlebox processes q first
• If the network is safe under the non-deterministic assumption, it is also safe under the FIFO assumption
• May lead to false alarms
– Middleboxes can impose orders based on acknowledgements
Decidability
• Under the non-deterministic assumption, safety is decidable
• More packets per state means more forwarding options
– Order is immaterial
– Terminating backward reachability
• Well quasi-order on packet multisets
• Reduction to coverability in Petri nets
– But complexity is high
• EXPSPACE-complete
Middlebox classification
[Figure: nested middlebox classes, Stateless within Increasing within Progressing within Arbitrary, populated with examples: IDS, firewall, cache, switch, NAT, learning switch, load balancer]
Stateless Middleboxes
• Behavior independent of the history
– Can maintain configuration information
• For all h, h’ ∈ P*:
– m(h) = m(h’)
– For all p ∈ P: m(h, p) = m(h’, p)
• Examples
– Switches and Routers
– ACL Firewall
– Simple load-balancer
Increasing Middleboxes
• For every history, adding packets increases the forwarding behavior
• For all h1, h2 ∈ P*, p, p’ ∈ P:
– m(h1:h2, p) ⊆ m(h1:p’:h2, p)
• Good examples
– Stateless
– Firewall
• Bad Examples
– Learning Switch
– Cache
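The increasing condition can be tested by brute force on short histories. The sketch below rests on assumptions: packets are (src, dst, in_port) triples, the firewall and learning-switch functions are simplified stand-ins for the slide's middleboxes, and the names `is_increasing`, `firewall_m`, and `learning_switch_m` are hypothetical. A bounded search can find a violation but cannot establish the property for all histories.

```python
from itertools import product

def firewall_m(history, p):
    """Firewall: port 1 is inside, port 2 outside; inside traffic makes dst trusted."""
    trusted = {dst for (_, dst, port) in history if port == 1}
    src, dst, port = p
    if port == 1:
        return {(src, dst, 2)}
    if port == 2 and src in trusted:
        return {(src, dst, 1)}
    return set()

SWITCH_PORTS = (1, 2, 3)

def learning_switch_m(history, p):
    """Learning switch: forward to the learned port of dst, otherwise flood."""
    learned = {}
    for (src, _, port) in history:
        learned[src] = port           # most recent port on which src was seen
    src, dst, port = p
    if dst in learned:
        return {(src, dst, learned[dst])}
    return {(src, dst, out) for out in SWITCH_PORTS if out != port}

def is_increasing(m, packets, max_len=2):
    """Bounded check of m(h1:h2, p) being a subset of m(h1:p':h2, p)."""
    for n in range(max_len + 1):
        for h in product(packets, repeat=n):
            for cut in range(n + 1):
                for extra in packets:
                    longer = h[:cut] + (extra,) + h[cut:]
                    for p in packets:
                        if not m(h, p) <= m(longer, p):
                            return False
    return True

firewall_packets = list(product("ab", "ab", (1, 2)))
switch_packets = list(product("ab", "ab", SWITCH_PORTS))

firewall_increasing = is_increasing(firewall_m, firewall_packets)
switch_increasing = is_increasing(learning_switch_m, switch_packets)
```

The firewall passes because a longer history can only enlarge the trusted set. The learning switch fails already on the empty history: once the destination's port has been learned, flooding is replaced by a single output, so the set of forwarded packets shrinks.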
Abstract Middlebox Definition Language
• Powerful enough to express the behavior of interesting
middleboxes
• Succinct
– Sometimes exponential state saving
• Simple enough for analysis
• Lends itself to classification of middleboxes
– Same worst case complexity
– But sometimes exponential saving
Firewall (AMDL)
firewall(self) =
  receive(p, prt) ⇒
    when prt = 1
      trusted_hosts.insert p.dst
      forward p to 2
    when prt = 2 and p.src ∈ trusted_hosts
      forward p to 1
Proxy (AMDL)
proxy(self) =
  receive(p, prt) ⇒
    when (p.type, response) ∈ cache                   // stored response
      forward response[src = self.host] to prt
    when (p.type, p.src, p.dst, rport) ∈ requested    // first response
      cache.insert (p.type, p);
      forward p[src = self.host] to rport
    otherwise                                         // new message
      requested.insert (p.type, p.src, p.dst, prt);
      forward p[src = self.host] to oprt
        forall oprt ∈ AllPrt and oprt != prt
Firewall vs. FSM
firewall(self) =
  receive(p, prt) ⇒
    when prt = 1
      trusted_hosts.insert p.dst
      forward p to 2
    when prt = 2 and p.src ∈ trusted_hosts
      forward p to 1
[Figure: the AMDL firewall shown next to its FSM]
The MuteVer Toolset
[Figure: an AMDL spec enters the front-end, which targets Datalog (LogicBlox) and Petri nets (Lola); the result is a counterexample or a proof]
Amazon EC2 Security Groups model
[Figure: a fat-tree switch connecting tenants 1 to n; each tenant has public and private hosts]
Query
• Q1: Can a packet arrive from tenant 7 at a private host of a faulty tenant, provided that the private host never sent a packet to tenant 7? (YES)
• Q2: Can a packet arrive from tenant 7 at a private host of tenant 2 (not faulty), provided that the private host never sent a packet to tenant 7? (NO)
Results (muZ)
[Figure: time per query (sec, 0 to 70) against number of tenants (0 to 1200, 4 hosts per tenant), with one series for SAT (bug) and one for UNSAT (no bug)]
Summary
• Middlebox classification
• Complexity results
• Initial toolset
Checking Beliefs in Dynamic Networks
N. Lopes, N. Bjørner, P. Godefroid, K. Jayaraman, G. Varghese
A Cloud Harnessed by Logic/SE
Cloud
Explosion
Monitoring
at Scale
Network Policies:
Complexity, Challenge and Opportunity
Several devices, vendors, formats
• Net filters
• Firewalls
• Routers
Human errors > 4 x DOS attacks
Challenge in the field
• Do devices enforce policy?
• Ripple effect of policy changes
Arcane
• Low-level configuration files
• Mostly manual effort
• Kept working by
“Masters of Complexity”
Human Errors by Activity
13%
Config Changes
13%
Device hw/sw updates
74%
WA Cluster Setup
A Data-center Architecture
[Figure: data-center architecture with a policy attached to each device]
SecGuru workflow
[Figure: inside the Windows Azure Network Monitoring Infrastructure (WANetmon), a StreamInsight Complex Event Processing (CEP) application takes a configuration stream from Azure and GNS edge network devices and a contract stream from the contract database; SecGuru performs ACL validation with a theorem prover and emits a device validation stream that feeds the reports database, alerts, and reporting]
Access Control
Contract:
DNS ports on DNS servers are accessible from tenant
devices over both TCP and UDP.
Contract:
The SSH ports on management devices are inaccessible
from tenant devices.
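Both contracts can be phrased as checks on a first-match ACL. The sketch below is a toy under stated assumptions: the address ranges, port list, rule format, and the names `allowed` and `violations` are hypothetical, and it enumerates a tiny packet space by brute force, whereas the slides indicate that SecGuru hands such checks to a theorem prover.

```python
from itertools import product

# Hypothetical toy address space: tenants 0-3, DNS servers 4-5, management 6-7.
TENANTS, DNS_SERVERS, MGMT = range(0, 4), range(4, 6), range(6, 8)
EVERYONE = range(0, 8)
PROTOCOLS = ("tcp", "udp")
PORTS = (22, 53, 80)

# First-match ACL rules: (action, sources, destinations, protocols, dst ports).
ACL = [
    ("deny",   TENANTS,  MGMT,        PROTOCOLS, (22,)),
    ("permit", TENANTS,  DNS_SERVERS, PROTOCOLS, (53,)),
    ("deny",   EVERYONE, EVERYONE,    PROTOCOLS, PORTS),
]

def allowed(acl, src, dst, proto, port):
    """Return True if the first matching rule permits the packet."""
    for action, srcs, dsts, protos, ports in acl:
        if src in srcs and dst in dsts and proto in protos and port in ports:
            return action == "permit"
    return False  # implicit deny when no rule matches

def violations(acl, srcs, dsts, protos, ports, expect):
    """List every packet in the contract's scope whose verdict differs from expect."""
    return [pkt for pkt in product(srcs, dsts, protos, ports)
            if allowed(acl, *pkt) != expect]

# Contract 1: DNS ports on DNS servers are accessible over TCP and UDP.
dns_violations = violations(ACL, TENANTS, DNS_SERVERS, PROTOCOLS, (53,), True)
# Contract 2: SSH ports on management devices are inaccessible.
ssh_violations = violations(ACL, TENANTS, MGMT, PROTOCOLS, (22,), False)

# A misconfigured variant that forgets UDP in the DNS rule.
BUGGY_ACL = [ACL[0], ("permit", TENANTS, DNS_SERVERS, ("tcp",), (53,)), ACL[2]]
buggy_dns_violations = violations(BUGGY_ACL, TENANTS, DNS_SERVERS, PROTOCOLS, (53,), True)
```

The correct ACL yields no violations for either contract, while the buggy variant reports every UDP DNS packet from a tenant to a DNS server.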
SecGuru in WANetmon
40,000 ACL checks per month
Each check 50-200ms
20 bugs/month (mostly for build-out)
SecGuru for GNS edge ACLs
• Regression test suite + SecGuru check correctness of Edge ACL prior to deployment
• 2700+ to 1000 ACLs
• Several major Edge ACL pushes, no major impact on any services
[Figure: successive Edge ACL revisions checked with regression tests, contracts, and SecGuru until a stable state is reached]
Policies as Logical Formulas
• Precise semantics as formulas
• Traditional low level of configuration that network managers use
• Combining semantics
• Contracts/policies
• Semantic diffs
Beyond Z3: a new idea to go from one violation to all violations
• Semantic diffs
• SecGuru contains an optimized algorithm for turning single solutions into all solutions (product of ranges)
[Figure: solution regions plotted over srcIp, dstIp, and srcPort]