An Efficient, Low-Cost Inconsistency Detection Framework for Data and Service Sharing in an Internet-Scale System

Yijun Lu†, Hong Jiang†, and Dan Feng*
† University of Nebraska-Lincoln, USA
* Huazhong University of Science and Technology, China

Introduction
• Consistency control is important
  – Active replication is essential to data security
  – Systems need to handle updates
  – Thus, consistency needs to be maintained
• Challenges
  – Consistency requirements are difficult to predict
  – The overhead of maintaining consistency is high
  – In Grid-like systems, the network is unreliable

Two Flavors:
• Inconsistency avoidance
  – Avoids inconsistency in the first place. Incurs a high maintenance cost and supports only a specific application.
  – Examples: strong consistency, NFS consistency, etc.
  – Optimistic consistency protocols? Still pre-defined
• Inconsistency detection
  – Our new approach
  – There is no need to define consistency protocols

Inconsistency Detection
• Features
  – No need to pre-define a consistency level
  – Detects inconsistencies among nodes in a timely manner
  – Resolves inconsistencies based on application semantics
• Advantages
  – Efficient: timely inconsistency detection
  – Low-cost: no prohibitive cost associated with a given consistency protocol
  – Versatile: several applications with different consistency requirements can run simultaneously

Overview of IDF

Efficient Detection
Focus of this paper

Outline
• Background
• Design
• Evaluation
• Inconsistency resolution
• Related work
• Current status

Background
• RanSub
  – Locates disjoint content within a system
  – Two processes: collect and distribute
  – Used by nodes to exchange information with one another
• Gossip-based data dissemination
  – A node disseminates non-duplicate packets to a random set of neighbors every T seconds.
  – Each message travels a certain number of hops
  – Used to distribute updates

Design of Timely Detection
• Basic idea
  – Two layers
  – The top layer captures most inconsistencies quickly
  – The bottom layer catches all the inconsistencies the top layer misses
• Terms
  – Temperature: how frequently a user updates a given file within a period of time.

1. Measure the Updating Patterns
• Importance
  – A node's updating pattern for a file, called its temperature, is used as an indicator of its interest in that file.
  – The higher the temperature, the more likely the node is a "trouble maker" that causes most inconsistencies.
• Strategy
  – A node tracks its updating history for a given file over a given period of time.

2. Learning the Updating Patterns
• Use RanSub
  – Collect nodes' updating patterns
  – Each node learns a random disjoint set with each distribution
• Possible improvement
  – RanSub uses a single multicast tree
  – This cannot tolerate the failure of even a single interior node
  – Deploy a multicast forest?

3. Temperature Collection/Distribution
• Why does this matter?
  – The network bandwidth cost could be prohibitive
  – Consider the total number of files on a computer
• Interest-group based approach
  – Nodes report the temperature only of files that they are interested in.
  – During distribution, an interior node relays the temperature only of files that nodes in its subtree are interested in.
• Result
  – The required bandwidth can be supported by any kind of connection, including dial-up.

4. Two-layer detection
An example:
• Two layers
  – Solid line: top layer
  – Dotted line: bottom layer
• A version vector is used to detect inconsistencies
• Mechanism
  – Traverse the top layer first
  – If no inconsistency is found in the top layer, go to the bottom layer

5. Caching & Garbage Collection
• Caching
  – Cache temperature information
  – Cache routing information among top-layer nodes, so that smart decisions can be made to save traversal time
• Garbage collection
  – Keep the temperature fresh
  – Assign a time stamp to each piece of temperature information
  – A piece of temperature information expires when it is older than a threshold.

6. Discussion
• Until now, we have treated the term "update" generically
  – Only one kind of "update"
• In fact, several forms of update exist
  – Creating
  – Modifying
  – Deleting
• The distinction does not matter for detection, but it does matter when we design the APIs for applications

Evaluation 1: Failure rate
• Why do we care about it?
  – The top layer detects inconsistencies much faster than the bottom layer
  – It is desirable that most inconsistencies are captured by the top layer
• Analysis result
  – In the worst-case scenario, two sub-cases exist
    Case 1: failure rate 0.04%
    Case 2: failure rate 18.9%
  – See the paper for clarification
• Main message
  – The top layer captures the vast majority of inconsistencies!

Evaluation 2: Maintenance Cost
• Metric
  – Number of messages each node receives because of the maintenance process
• Simulation setup
  – 1000 nodes in the network.
  – The simulation runs for 800 seconds.
• Result
  – Max bandwidth cost: < 6 KB/s

Inconsistency Resolution
• Overview
  – Utilize the detection result
  – Support multiple applications with different requirements for consistency control
• Semantic-based resolution (ongoing & future work)
  – Getting the semantics: hint-based, or detected by the middleware
  – Resolution schemes: the middleware resolves the inconsistency automatically, or asks for the user's preference before reacting

Related Work
• TACT
  – Explores the trade-off between consistency level and performance
• DENO
  – Peer-to-peer scheme, but aimed at maintaining strong consistency
• Lpbcast
  – Pure gossip-based protocol
• Quorum system
  – Can fail in the presence of node failures

Current Status
• Dealing with inconsistency resolution
  – Support applications.
• Implementing a prototype on PlanetLab
• Investigating the implications of the new framework for large-scale distributed systems in general
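
Illustrative sketch: two-layer detection with version vectors

The two-layer mechanism in the design (check the top layer first, fall back to the bottom layer) can be sketched as below. This is a minimal illustration and not the authors' implementation. The function names (compare, split_layers, detect), the fixed top_size cut-off, and the choice to report only concurrent (conflicting) version vectors as inconsistencies are all assumptions made for the example. The slides do not specify how the top layer is sized or how the layers are traversed.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: writer id -> number of updates seen from that writer.
VersionVector = Dict[str, int]


def compare(a: VersionVector, b: VersionVector) -> str:
    """Return 'equal', 'a_newer', 'b_newer', or 'conflict' (concurrent updates)."""
    keys = set(a) | set(b)
    a_ahead = any(a.get(k, 0) > b.get(k, 0) for k in keys)
    b_ahead = any(b.get(k, 0) > a.get(k, 0) for k in keys)
    if a_ahead and b_ahead:
        return "conflict"
    if a_ahead:
        return "a_newer"
    if b_ahead:
        return "b_newer"
    return "equal"


def split_layers(temperature: Dict[str, float],
                 top_size: int) -> Tuple[List[str], List[str]]:
    """Assumption: the top_size hottest nodes form the top layer, the rest the bottom."""
    ranked = sorted(temperature, key=lambda node: (-temperature[node], node))
    return ranked[:top_size], ranked[top_size:]


def detect(local: VersionVector,
           replicas: Dict[str, VersionVector],
           temperature: Dict[str, float],
           top_size: int) -> Optional[Tuple[str, str]]:
    """Check the top layer first, then the bottom layer.

    Returns (layer, node) for the first node whose version vector conflicts
    with the local one, or None if no conflict is found in either layer.
    """
    top, bottom = split_layers(temperature, top_size)
    for layer, nodes in (("top", top), ("bottom", bottom)):
        for node in nodes:
            if compare(local, replicas[node]) == "conflict":
                return layer, node
    return None


if __name__ == "__main__":
    local = {"a": 2, "b": 1}
    replicas = {
        "n1": {"a": 2, "b": 1},   # identical to local
        "n2": {"a": 1, "b": 2},   # concurrent with local
        "n3": {"a": 3, "b": 1},   # strictly newer than local
    }
    temperature = {"n1": 5.0, "n2": 0.5, "n3": 9.0}
    print(detect(local, replicas, temperature, top_size=2))
```

In this toy run the conflicting replica n2 has a low temperature, so it sits in the bottom layer and is found only after the top layer has been checked. Raising its temperature would move it into the top layer, which is the case the design expects to be common.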