Differential Provenance: Better Network Diagnostics with Reference Events

Ang Chen, Yang Wu, Andreas Haeberlen, Wenchao Zhou+, Boon Thau Loo
University of Pennsylvania, +Georgetown University

Motivation: Finding the root cause of a symptom
[Figure: network with Internet sources 4.3.2.0/24 and 4.3.3.0/24, a DPI device, Web server 1, Web server 2, and Bob. Annotations: "Overly specific flow entry"; "Traffic arriving at the wrong server !?!"]
• Networks can (and frequently do!) have bugs
  • Example: Software-defined networks
• We need a good debugger!

Debugging networks with provenance
[Figure: Packet P passes through nodes A, B, and C. Provenance nodes: C received packet; B sent packet; B received packet; Rule match on B; A sent packet; A received packet; Rule match on A; Rule installed by controller; Incoming packet at controller.]
• Typical debuggers tell us what happened:
  • NetSight: Packet histories
  • Y!: Network provenance
• Key benefit: Rich explanation of what, when, and why.

Problem: Explanation can be too big!
[Figure: large provenance tree. Root: "Packet arrives at wrong server". Root cause: faulty rule ("Rule 7: Next-hop=port2").]
• The problem: Finding the root cause in a large provenance tree.

Key insight: Use reference events!
[Figure: switches S1 to S6, a DPI device, Web server 1, Web server 2, and Bob.]
• Remember that some packets were routed correctly.
• The same things should have happened to all packets!
• Key insight: If we have both a (bad) symptom and a (good) reference, we only need to reason about the differences between them!

A new debugger
[Figure: Bob passes a fault and a reference to the Debugger, which reports "Field 3 of config entry 4 is wrong!"]
• Bob collects both a bad symptom and a good reference
• Bob sends both events to the debugger
• Debugger generates provenance, outputs difference
  • Ideally, there is only one diff: the root cause!

Outline
- Motivation: Network diagnostics
  • Background
  • Key insight
  • A new debugger
- Differential provenance
  • Are references typically available?
  • Strawman approach
  • Our approach
  • Initial results
- Conclusion

Are references typically available?
• Survey:
  • Posts on the ‘Outages’ mailing list in Sept-Dec 2014.
  • 64 posts related to diagnostics.
  • 42/64 (66%) posts involve both a fault and some reference.
• Examples:
  • Some DNS servers have stale records, but others are good
  • Probes sometimes fail, sometimes succeed
  • More examples in the paper

Strawman solution
[Figure: Bad provenance (root, faulty rule) minus Reference provenance (root) = ? (a tree with a new root)]
• A strawman solution: Pick out different nodes in trees.
  • Bad provenance: 201 nodes
  • Reference provenance: 156 nodes
  • Naïve diff: 278 nodes!

Why does the strawman not work?
[Figure: provenance tree with the faulty rule highlighted.]
• Observation: The diff can be larger than the individual trees.
• Reason #1: Differences that “do not matter”
  • E.g., timestamps, packet payloads, etc.
• Reason #2: “Butterfly effect”
  • A small difference can change later events drastically!

Differential provenance
[Figure: Bad provenance and Reference provenance trees. Output: "Rule 7: change port"; "Rule 9: change range".]
• Approach: Change past events, and think about what could have happened.
  • (1) Find some early ‘differences’ in the trees.
  • (2) Change the faulty node to a correct equivalent.
  • (3) Use replay to determine what would have happened.
  • (4) Output the set of changes that align the trees.

Technical challenges
• Challenge #1: Where do we start?
  • Heuristics: Change early events, minimum changes…
  • E.g., prefer changing 1 event over changing 1000 events.
• Challenge #2: How should we make the change?
  • Approach: Think about what should have happened.
  • E.g., packet should go to switch 2, not 1.
• Challenge #3: Irrelevant differences?
  • Approach: Equivalence relations between events.
  • E.g., IPs 4.3.2.1 and 4.3.3.1
• See paper for more details.

Setup
[Figure: network with Internet sources 4.3.2.0/24 and 4.3.3.0/24, a DPI device, and Web server 1. Annotation: "Overly specific flow entry".]
• Platform: RapidNet
• SDN: 6 switches, 2 servers
• The symptom: misrouted packets from 4.3.2.0/24
• The reference: packets from 4.3.3.0/24

Initial results
[Figure: Fault tree (201 nodes) compared with Reference tree (156 nodes). The naïve diff yields a tree with a new root; differential provenance yields a single item: "Rule 7: next hop should be port 1, not 2!"]
• Differential provenance finds a single node (the faulty rule) to be the root cause!

Conclusion
• Debugging networks is hard
  • Need good debuggers!
• Provenance can find the causes of an event
  • Problem: Explanation can be too detailed.
• Idea: Use reference events
  • Sufficient to find the (few) differences to the observed symptom
  • New debugger based on differential provenance
• Result: Very precise diagnostics
  • Ideally, can identify a single root cause!

Thanks!
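
Backup sketch: the four-step approach from the "Differential provenance" slide, illustrated on a toy network. This is not the RapidNet-based debugger from the talk. The three-switch topology, rules 8 to 10, and the names `replay`, `shape`, and `differential_provenance` are all hypothetical, and hop-by-hop path comparison stands in for real provenance trees. Only the faulty rule 7 (next hop port 2 instead of port 1) and the two prefixes come from the slides.

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    rule_id: int
    switch: str
    prefix: str     # source prefix the rule matches
    out_port: int


# Hypothetical toy topology: (switch, output port) -> next node.
LINKS = {
    ("S1", 1): "S2", ("S1", 2): "S3",
    ("S2", 1): "server1", ("S3", 1): "server2",
}

RULES = [
    Rule(7, "S1", "4.3.2.0/24", 2),   # faulty: next hop should be port 1
    Rule(8, "S1", "4.3.3.0/24", 1),
    Rule(9, "S2", "0.0.0.0/0", 1),
    Rule(10, "S3", "0.0.0.0/0", 1),
]


def replay(rules, src_ip):
    """Re-run forwarding from S1; return (hops, final node)."""
    node, hops = "S1", []
    while True:
        matches = [r for r in rules
                   if r.switch == node
                   and ip_address(src_ip) in ip_network(r.prefix)]
        if not matches:               # no rule here: the packet stops
            return hops, node
        # Longest-prefix match picks the most specific rule.
        rule = max(matches, key=lambda r: ip_network(r.prefix).prefixlen)
        hops.append((node, rule.rule_id, rule.out_port))
        node = LINKS[(node, rule.out_port)]


def shape(hops):
    """Equivalence view of a path: ignore rule ids and addresses."""
    return [(node, port) for node, _, port in hops]


def differential_provenance(rules, bad_ip, good_ip):
    """Return the rule changes that align the symptom with the reference."""
    rules, changes = list(rules), []
    good_hops, _ = replay(rules, good_ip)
    for _round in range(len(good_hops) + 1):      # bounded repair rounds
        bad_hops, _ = replay(rules, bad_ip)       # step 3: replay
        if shape(bad_hops) == shape(good_hops):   # step 4: paths align
            return changes
        for (_, rule_id, port), (_, _, want) in zip(bad_hops, good_hops):
            if port != want:                      # step 1: earliest difference
                # Step 2: change the faulty rule to match the reference.
                rules = [replace(r, out_port=want) if r.rule_id == rule_id
                         else r for r in rules]
                changes.append(f"Rule {rule_id}: out_port {port} -> {want}")
                break
        else:
            return None                           # no repair in this toy model
    return None
```

In this toy model, `differential_provenance(RULES, "4.3.2.1", "4.3.3.1")` returns a single change to rule 7, mirroring the one-node result on the "Initial results" slide. Comparing paths through `shape` is a crude stand-in for the equivalence relations of Challenge #3: the two source addresses differ, but that difference is treated as irrelevant.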