RELIABLE PUBLISH/SUBSCRIBE MIDDLEWARE FOR TIME-SENSITIVE INTERNET-SCALE APPLICATIONS (2009)
WRITTEN BY: CHRISTIAN ESPOSITO, DOMENICO COTRONEO, ANIRUDDHA GOKHALE
Presentation by Ana Salvado - nr.44299

INDEX
• Introduction
• Protocol Pub/Sub
o Properties
o Faults
o Available solutions
o Design of Reliable and Timely Event Dissemination
• Simulation
• Related Work
• Conclusion and Future Work

INTRODUCTION
The problem
It is difficult to assure both reliability and timeliness in a hostile environment. The publish/subscribe model is a promising solution for scalable data dissemination over wide-area networks. The paper discusses a new approach that fills this reliability and timeliness gap, making three contributions:
• Clustering: a cluster-based peer-to-peer organization is introduced to handle a large number of publishers and subscribers. Publishers and subscribers that reside in the same routing domain are clustered together, and a hierarchical peer-to-peer organization handles large numbers of participants without affecting message latency.
• Coordinator replication: the cluster coordinator is replicated to mask process crashes and to preserve cluster connectivity toward the outside world. Each cluster implements a replication scheme for its coordinator, so that process crashes lead neither to disconnections among the clusters nor to considerable fluctuations in message delivery time.
• Multiple-tree redundancy: multiple trees are used to tolerate link crashes, minimizing unpredictability in delivery time. The multi-tree approach preserves connectivity within the overlay of coordinators without worsening the timeliness of data dissemination.

In recent years, a growing number of industrial projects have aimed to develop Large-scale Complex Critical Infrastructures (LCCIs). The effectiveness and performance of LCCIs strongly depend on the quality of the adopted interconnection middleware, which has to deal with the following challenges:
• Timeliness: critical operations exhibit time-sensitive behavior, so messages have to be exchanged with predictable latency (information is useful only if delivered "on time").
• Reliability: critical systems must be dependable and able to properly handle error conditions imposed by network and process faults, so message dissemination has to be guaranteed despite the occurrence of faults.
• Scalability: the growing activity of each single system and the increasing number of connected systems raise the number of data exchanges, so the middleware must be able to scale while maintaining suitable performance.

PROPERTIES OF RELIABLE AND TIMELY PUBLISH/SUBSCRIBE MIDDLEWARE
A pub/sub service is made of several processes that exchange messages through a notification service. These processes play the roles of publishers, which produce messages, and/or subscribers, which consume the messages they are interested in.
• The service has to satisfy safety and liveness.
• A pub/sub service is reliable if message delivery is guaranteed even though processes may fail and/or the network may be affected by anomalies. A reliable pub/sub system also has to satisfy three further properties: agreement, validity and integrity.
• LCCIs combine real-time constraints with fault-tolerance requirements: a message delivered after its deadline is considered lost. The pub/sub system therefore also has to satisfy the timeliness property.

FAULTS IN PUBLISH/SUBSCRIBE MIDDLEWARE
To achieve reliability, the service has to handle the several kinds of faults that may affect it:
• Network anomalies: misbehaviors of a link (loss, corruption, delay, partitioning…).
• Link crash: links experience loss of connectivity.
• Node crash: nodes crash due to hardware/software failures.
• Churn: nodes unexpectedly join/leave the system.
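The timeliness property stated above (a message delivered after its deadline is considered lost) can be illustrated with a small sketch. It is not taken from the paper: the `Delivery` record, the millisecond timestamps and the function names are all hypothetical, chosen only to show how late and missing messages end up being counted the same way.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Delivery:
    """Hypothetical record of one message at one subscriber."""
    msg_id: int
    sent_ms: int                 # publish time, in milliseconds
    received_ms: Optional[int]   # None if the message never arrived


def is_on_time(d: Delivery, deadline_ms: int) -> bool:
    """A message counts as delivered only if it arrived within the deadline."""
    return d.received_ms is not None and d.received_ms - d.sent_ms <= deadline_ms


def on_time_ratio(deliveries: Sequence[Delivery], deadline_ms: int) -> float:
    """Fraction of messages that are neither lost nor late."""
    if not deliveries:
        return 1.0  # nothing was published, so nothing was lost
    on_time = sum(1 for d in deliveries if is_on_time(d, deadline_ms))
    return on_time / len(deliveries)


log = [
    Delivery(1, 0, 50),       # on time
    Delivery(2, 0, 300),      # late, so treated as lost
    Delivery(3, 0, None),     # never delivered
    Delivery(4, 1000, 1080),  # on time
]
print(on_time_ratio(log, deadline_ms=100))
```

With a 100 ms deadline, the late message and the missing message are indistinguishable in the resulting ratio, which is the point of treating timeliness as part of reliability.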
APPROACHES TO RELIABLE PUBLISH/SUBSCRIBE MIDDLEWARE
There are two ways to implement the notification service:
• IP Multicast (the network level, as I understood from the lecture):
o Retransmission
o Forward Error Correction
• Application-level multicast, chosen for scalability reasons:
o Epidemic algorithms
o Reconfigurations
o Broker replication
o Path redundancy

DESIGN OF RELIABLE AND TIMELY EVENT DISSEMINATION
Their approach is a topic-based publish/subscribe middleware to federate mission-critical systems (e.g. LCCIs) over wide-area networks.
Objective: all subscribers are guaranteed to receive all messages on time despite several link and node crashes (that is, they want to assure reliable and timely event delivery).
Assumptions:
• The pub/sub service is composed of processes that do not exhibit churn (they leave the network only because of a failure). Among the possible faults, only node and link crashes can therefore affect the service.
• Processes may fail by crashing and, without loss of generality, are assumed not to recover after a crash.
• Links can crash, but they recover after a certain period of time.
• The network is assumed not to be partitionable by link crashes: between any two correct processes there are always links that allow one process to reach the other.

Grouping Nodes in Clusters
The current Internet topology is composed of interconnected routing domains, each sharing common administrative control and a common routing protocol.
Processes:
• Are scattered across different stub domains.
• Communicate with each other through several transit domains.
• Behave according to the properties of reliable and timely pub/sub middleware.
Stub domains are domains in which the path joining any two of their nodes stays within the domain. Ex: LANs, AS (Autonomous Systems).
Transit domains are in charge of efficiently interconnecting many stub domains and form the backbone of the network.
They are also affected by several failures that may compromise the effectiveness and resiliency of message forwarding.

Hierarchical approach to organize the service:
• For simplicity, a node runs only a single process.
• Nodes in the same domain are clustered together.
• Each cluster holds a coordinator that allows interactions with other clusters.
• Nodes communicate using intra-cluster routing and can send messages outside the cluster only through their coordinator.

Overlay Routing
Intra-cluster routing is implemented with IP Multicast:
• It provides efficiency.
• There is a low probability that network faults will occur within a cluster.
• Timely and reliable data delivery within clusters is guaranteed.
Each cluster has to be connected to the others through its coordinator.

There are three possible solutions to implement inter-cluster cooperation:
• Network-level multicast
• Proxy-based overlay network
• Peer-to-peer application-level multicast:
o Implements multicast services on top of a peer-to-peer communication infrastructure.
o Provides the same scalability properties as a proxy-based overlay network, and also has self-* capabilities.
o Suffers from inefficiencies due to high load on the network. Using proximity criteria reduces the network load.

Peer-to-peer application-level multicast has two approaches: tree-based and mesh-based. They adopt a tree-based solution:
• The nodes are organized into a tree, where each node can implicitly identify the parent from which it receives incoming messages.
• This approach provides direct control over the path followed by messages and exhibits lower communication overhead.
• The tree is built on top of a structured, or Distributed Hash Table (DHT), overlay, since this simplifies tree construction.

Tree construction:
• Each node has a unique peerID.
• Each multicast communication has a unique groupID.
• The coordinator whose peerID is numerically closest to the groupID is selected as the root of the tree.
• If a coordinator wants to send (publish) a message, it forwards the message to the root, which then delivers it.

Fault-Tolerance at the Cluster Level
Weakness: the entire system is vulnerable when a coordinator fails. If this happens, the affected cluster remains isolated from the rest of the system, and some coordinators may be disconnected from the rest of the overlay tree.
Solution: replicate the coordinator.
There are two options to replicate the coordinator:
• Passive replication scheme
• Active replication scheme
They propose a hybrid scheme in which p coordinators are active at the same time and each active coordinator has k backups.

Fault-Tolerance at the Overlay Level
The replication-based approach does not handle link crashes.
Solution: path redundancy, for which there are three alternatives:
• Cross-link: connecting random peers via extra cross-cutting links.
• In-tree: establishing alternative links among different layers of the tree.
• Multiple-tree: creating several overlapping trees.
All three improve resiliency. They choose the multiple-tree approach because it is able to improve the delivery ratio and copes better with strict real-time deadlines. It also enforces reliability and timeliness.

When a message has to travel from one node to another, it follows a path consisting of a succession of network devices (routers/switches).
Reciprocal diversity: the number of network devices that two paths have in common.
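As a minimal sketch of the reciprocal diversity measure, each path can be treated as a list of device identifiers and the shared devices counted. The function names and the device labels ("r1", "r2", …) are hypothetical, and the sketch assumes that a path lists only its intermediate routers/switches, not the two end nodes; the slides do not say how endpoints are treated.

```python
from typing import Hashable, Sequence


def reciprocal_diversity(path_a: Sequence[Hashable], path_b: Sequence[Hashable]) -> int:
    """Count the network devices that appear on both paths.

    Each path is assumed to list only intermediate devices
    (routers/switches), not the source and destination nodes.
    """
    return len(set(path_a) & set(path_b))


def are_diverse(path_a: Sequence[Hashable], path_b: Sequence[Hashable]) -> bool:
    """Two paths are diverse when they share no network device."""
    return reciprocal_diversity(path_a, path_b) == 0


# Two routes between the same pair of coordinators (made-up device names).
primary = ["r1", "r2", "r3"]
backup_overlapping = ["r4", "r2", "r5"]   # shares r2 with the primary route
backup_disjoint = ["r6", "r7"]            # shares nothing with the primary route

print(reciprocal_diversity(primary, backup_overlapping))
print(are_diverse(primary, backup_disjoint))
```

A shared device such as `r2` is a single point of failure for both routes, which is why only paths with no devices in common give real redundancy against link crashes.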
Two paths are considered diverse if their reciprocal diversity is zero. Path diversity can be formulated in two ways:
• Global diversity: no two paths in the tree have a positive reciprocal diversity.
• Local diversity: no two paths from a node to its parent and children have a positive reciprocal diversity.

How does a new node join two distinct trees?
Problem: the joining procedure, because clusters host multiple active coordinators.
Solution: only one coordinator at a time performs the procedure.

Some nodes may experience message losses due to link crashes. N3 will never receive messages published by N15 even though it is not isolated. This happens when all inbound connections of the node have crashed. The outbound connections are still correct, and they can be used to recover from this situation.
Ex: N7 is expected to receive a message from N3 and then from N10. So when N7 receives a message from N10 but not from N3 before a timer expires, it can assume that the message has not reached N3, and it notifies N3 that it has received a message. Since it has lost a message, N3 assumes that all its inbound connections are faulty and executes the joining procedure.

SIMULATION
• Simulation environment: OMNeT++.
• They tested with 300 nodes, 8 AS and 500 edges, generated by ReaSE, a topology generator for OMNeT++.
• In each test they randomly selected a fixed number of nodes, of which only one at a time is the publisher and all the others are subscribers.
Results:

RELATED WORK
Clustering
References:
L. Querzoni. Interest clustering techniques for efficient event routing in large-scale settings. Proceedings of the 2nd ACM International Conference on Distributed Event-Based Systems (DEBS 08), pages 13–22, 2008.
R. Baldoni, R. Beraldi, V. Quema, L. Querzoni, and S. Tucci-Piergiovanni.
Tera: Topic-based event routing for peer-to-peer architectures. Proceedings of the Inaugural ACM International Conference on Distributed Event-Based Systems (DEBS 07), pages 2–13, 2007.
T. Milo, T. Zur, and E. Verbin. Boosting topic-based publish-subscribe systems with dynamic clustering. Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 749–760, 2007.

Multiple trees
References:
S. Birrer and F.E. Bustamante. A Comparison of Resilient Overlay Multicast Approaches. IEEE Journal on Selected Areas in Communications (JSAC), 25(9):1695–1705, December 2007.

Broker replication
References:
N. Carvalho, F. Araujo, and L. Rodrigues. Scalable QoS-based event routing in publish-subscribe systems. Proceedings of the Fourth IEEE International Symposium on Network Computing and Applications (NCA 05), pages 101–108, 2005.
Y. Zhao, D. Sturman, and S. Bhola. Subscription propagation in highly-available publish/subscribe middleware. Proceedings of the 5th ACM/IFIP/USENIX International Conference on Middleware (MIDDLEWARE 04), pages 274–293, 2004.

CONCLUSIONS AND FUTURE WORK
Conclusion
• This study indicates that their approach enforces the reliability of event delivery without affecting its timeliness.
• Path redundancy has been shown to be an appealing solution.
Future Work
• They plan to carry out a more comprehensive simulation study to analyze in detail the properties of the proposed solution under different network conditions.
• They aim to apply network coding in order to optimize traffic and to achieve efficient use of network resources.