Introduction

Ricardo Jiménez-Peris, Marta Patiño-Martínez
LSD: Distributed Systems Laboratory (Laboratorio de Sistemas Distribuidos)
Universidad Politécnica de Madrid (UPM)
http://lsd.ls.fi.upm.es/lsd/lsd.htm

Contents
• Introduction to dependable distributed systems.
• Coordination and agreement.
• Transactions.
• Replication.
• Security.

Bibliography
• Distributed systems books:
– G. Coulouris, J. Dollimore, T. Kindberg. Distributed Systems: Concepts and Design. 3rd edition, Addison-Wesley, 2000.
– A. Tanenbaum, M. van Steen. Distributed Systems: Principles and Paradigms. Prentice Hall, 2002.
– S. Mullender (ed.). Distributed Systems. 2nd edition, ACM Press/Addison-Wesley, 1993.
– K. Birman. Building Secure and Reliable Network Applications. Manning, 1996.
– P. Veríssimo, L. Rodrigues. Distributed Systems for System Architects. Kluwer, 2001.
– N. Lynch. Distributed Algorithms. Morgan Kaufmann, 1996.
– H. Attiya, J. Welch. Distributed Computing. McGraw-Hill, 1998.
– P. Jalote. Fault Tolerance in Distributed Systems. Prentice Hall, 1996.
– J. Gray, A. Reuter. Transaction Processing: Concepts and Techniques. Morgan Kaufmann, 1993.
• Papers:
– T. D. Chandra, S. Toueg. Unreliable Failure Detectors for Reliable Distributed Systems. Journal of the ACM, 43(2):225-267, March 1996.
– J. C. Laprie. Dependable Computing and Fault Tolerance: Concepts and Terminology. Proc. of the 15th Int. Symp. on Fault-Tolerant Computing Systems, June 1985.
– J. C. Laprie, J. Arlat, C. Béounes, K. Kanoun. Definition and Analysis of Hardware- and Software-Fault-Tolerant Architectures. IEEE Computer, 23(7):39-51, 1990.
– M. P. Herlihy, J. M. Wing. Linearizability: A Correctness Condition for Concurrent Objects. ACM Trans. on Programming Languages and Systems, 12(3):463-492, July 1990.

Definition
• A distributed system consists of a set of independent hosts interconnected by means of a network.
• Inherent features of a distributed system:
– Its components compute concurrently.
– There is no global clock.
– Components do not share memory.
– Components fail independently (ideally).

Features
• Desirable features of distributed systems:
– Transparency.
– Scalability.
– Concurrency control.
– Fault tolerance.

Features: Transparency
• Transparency can be applied to different aspects:
– Heterogeneity: resources are accessed in the same way regardless of their architecture, operating system, programming language and software vendor.
– Access: local and remote resources are accessed in the same way.
– Location: a resource can be accessed without knowing its physical location (e.g. through naming and directory services or discovery services).
– Replication: a replicated resource can be accessed in the same way as a non-replicated resource.
– Failures: resources can be accessed in the same way despite failures.
– Mobility: resources can be moved without affecting their operation.
– Performance: the system balances the load without affecting resource access.

Features: Scalability
• Metrics:
– Throughput: number of operations per unit of time.
– Response time: elapsed time between the client request and the reception of the response.
– Reliability: global system reliability with respect to the reliability of its components.
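
The two performance metrics above can be computed from a client-side log of request and response timestamps. The sketch below is a minimal illustration, assuming a hypothetical `Request` record with times in seconds; throughput is measured over the window from the first request sent to the last response received.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Request:
    sent: float      # time the client issued the request (seconds)
    received: float  # time the client received the response (seconds)


def throughput(requests: list[Request]) -> float:
    """Completed operations per second over the observed time window."""
    start = min(r.sent for r in requests)
    end = max(r.received for r in requests)
    return len(requests) / (end - start)


def mean_response_time(requests: list[Request]) -> float:
    """Average elapsed time between each request and its response."""
    return mean(r.received - r.sent for r in requests)


log = [Request(0.0, 0.5), Request(0.2, 0.9), Request(1.0, 1.5), Request(1.5, 2.0)]
print(throughput(log), mean_response_time(log))
```

Under this definition, adding sites helps scalability only if throughput grows while the mean response time stays flat or grows slowly.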

• A distributed system is said to be scalable if one or more of its performance/dependability metrics improve as sites are added:
– Throughput grows with the number of sites (ideally linearly).
– Response time decreases (or at least stays constant or grows very slowly) as the number of sites grows.
– The system reliability increases logarithmically with the number of sites.

Features: Concurrency Control
• Resources can be accessed concurrently by different users without losing their consistency.
• Two important consistency definitions:
– Linearizability: the result of concurrent invocations on a resource should be equivalent to some sequential execution of them that respects the real-time order of non-overlapping invocations.
– Serializability: the result of executing a set of transactions (each a sequence of operations) concurrently should be equivalent to some serial execution of those transactions.

Features: Fault Tolerance
• Failures in a distributed system are inherently partial.
• Partial failures should not affect the rest of the system, which should continue offering service, possibly in a degraded mode (graceful degradation).
• Two important properties:
– Availability: a resource remains available despite failures.
– Atomicity: a resource remains consistent despite failures.
• Levels of fault tolerance:
– Detection (e.g. checksums).
– Recovery:
• Backward.
• Forward.
– Masking.
• Fault treatment:
– Redundancy:
• temporal (e.g. message retransmission),
• spatial (replication),
• design/value (diversity).
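
Serializability is usually checked in practice through the stricter notion of conflict serializability: build a precedence graph with an edge Ti → Tj whenever an operation of Ti precedes and conflicts with an operation of Tj (same item, different transactions, at least one write), and accept the schedule if the graph is acyclic. The sketch below assumes a hypothetical encoding of a schedule as (transaction, operation, item) tuples; since the test is only sufficient, some serializable schedules may be rejected.

```python
from collections import defaultdict


def precedence_graph(schedule):
    """Edges Ti -> Tj for each conflicting pair where Ti's operation comes first."""
    edges = defaultdict(set)
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and item_i == item_j and "w" in (op_i, op_j):
                edges[ti].add(tj)
    return edges


def is_conflict_serializable(schedule):
    """True if the precedence graph has no cycle (depth-first search)."""
    edges = precedence_graph(schedule)
    nodes = {t for t, _, _ in schedule}
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = dict.fromkeys(nodes, WHITE)

    def has_cycle(u):
        color[u] = GREY
        for v in edges[u]:
            if color[v] == GREY:  # back edge: cycle found
                return True
            if color[v] == WHITE and has_cycle(v):
                return True
        color[u] = BLACK
        return False

    return not any(color[t] == WHITE and has_cycle(t) for t in nodes)
```

For example, the interleaving [("T1", "r", "x"), ("T2", "w", "x"), ("T1", "w", "x")] (a lost update) produces the edges T1 → T2 and T2 → T1, a cycle, so it is rejected.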

Architectural Models
• Remote access: a central system is accessed by terminals through the network.
• Client-server: the application executes at the clients, whilst the resources are stored at the servers.
• Multi-tier architectures:
– Clients only contain software to interact with the system (thin clients).
– Client requests are handled by a front end (e.g. a web server) that forwards them to the applications that process them.
– Applications reside on application servers (stateless or stateful) and are invoked by clients.
– Resources are kept on data servers that are accessed by the application servers.
• Mobile systems:
– Mobile code: components are fixed, but application code moves (e.g. mobile agents).
– Mobile sites: systems whose sites can move (wireless networks, ad-hoc networks, etc.), disconnecting and reconnecting at a different point of the network.
• Event-based architectures: the interaction is asynchronous; some components produce messages and other components consume them, but they are not necessarily connected at the same time (e.g. asynchronous messaging services).
• Peer-to-peer systems: totally decentralized massive systems (e.g. eDonkey, eMule, Gnutella, Freenet).
• Service-oriented architectures, SOA (e.g. web services).

System Models
• Synchronous systems:
– The execution time of a process step has a known bound.
– The time for transmitting a message through a communication channel is bounded and known.
– The local clock drift has a known bound.
– Another typical definition is that sites execute computation steps in lock-step.
• Asynchronous systems:
– Computational steps are eventually executed, with no bound on their duration.
– Messages sent through a channel are eventually received, with no bound on their transmission time.
– Local clocks have no bounded drift.
• Partially synchronous systems:
– Def. 1: the message transmission time and the time taken by a computational step have a bound, but the bound is unknown.
– Def. 2: the system behavior is divided into two periods. In the first period the system is totally asynchronous. In the second period, message transmissions and computational steps take a time with a known bound. The instant at which the system changes from asynchronous to synchronous behavior is bounded, but unknown.
• Failure detectors:
– The system is extended with an unreliable failure detector (one that can wrongly suspect processes).
– The degree of synchrony of the system is modeled by the guarantees provided by the failure detector.
– Failure detector properties:
• Accuracy: correct processes are not suspected.
– Accuracy can be permanent or eventual.
– Strong: no correct process is suspected.
– Weak: at least one correct process is never suspected.
• Completeness: eventually, every crashed process is permanently suspected:
– Strong: by all correct processes.
– Weak: by at least one correct process.

Failure detector classes:

Completeness \ Accuracy | Strong    | Weak     | Eventually strong     | Eventually weak
Strong                  | Perfect P | Strong S | Eventually perfect ◊P | Eventually strong ◊S
Weak                    | Q         | Weak W   | ◊Q                    | Eventually weak ◊W

• The weakest failure detector that can solve consensus (in an asynchronous system with a majority of correct processes) is ◊W.
• A small distributed algorithm that turns weak completeness into strong completeness transforms ◊W into ◊S.
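
A common way to approximate an eventually perfect detector (◊P) in a partially synchronous system is to monitor heartbeats with timeouts and to increase a process's timeout each time it turns out to have been suspected wrongly. The sketch below is a hypothetical, single-process illustration that takes explicit timestamps instead of reading real clocks or receiving network messages; the class name, parameters and default values are assumptions for the example.

```python
class HeartbeatFailureDetector:
    """Timeout-based failure detector run by one monitoring process (sketch).

    A monitored process is suspected when no heartbeat from it arrives within
    its current timeout. A heartbeat from a suspected process reveals a false
    suspicion: the suspicion is withdrawn and that process's timeout grows.
    """

    def __init__(self, processes, initial_timeout=1.0, increment=1.0):
        self.timeout = {p: initial_timeout for p in processes}
        self.last_heard = {p: 0.0 for p in processes}
        self.suspected = set()
        self.increment = increment

    def on_heartbeat(self, p, now):
        """Record a heartbeat from process p received at time `now`."""
        self.last_heard[p] = now
        if p in self.suspected:
            # Wrong suspicion: trust p again and be more patient with it.
            self.suspected.discard(p)
            self.timeout[p] += self.increment

    def check(self, now):
        """Suspect every process whose heartbeat is overdue; return suspects."""
        for p, last in self.last_heard.items():
            if now - last > self.timeout[p]:
                self.suspected.add(p)
        return set(self.suspected)
```

Crashed processes stop sending heartbeats and therefore end up permanently suspected (completeness). Once the timeouts exceed the unknown delay bound, correct processes are no longer suspected, which gives eventual accuracy.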

Fault Models
• Terminology:
– Failure: deviation of the system behavior from its specification.
– Error: the part of the system state that leads to the failure.
– Fault: the cause of the error.
• Faults can be:
– Transient: of limited duration (e.g. a network failure).
– Intermittent: a transient defect that recurs over time.
– Permanent: once the fault happens, it remains until the system is repaired.
• Faults can be classified according to their nature:
– Design: originated during the conception of the system or during its upgrade (e.g. the division design fault in the floating-point unit of the Pentium processor).
– Operational: originated by physical causes (e.g. a failing CPU fan causes the CPU to overheat, which in turn makes the CPU deviate from its specification).
• A system tolerates failures if, despite their occurrence in one or more of its components, it continues to fulfill its specification with respect to its users; that is, if it is able to mask the failures of its components.
• Fault hierarchy:
– Crash (fail-silent sites): the component crashes and stops working. Before crashing, it fulfills its specification.
– Omission: it might omit some action(s).
– Timing: a component might execute an action before or after the interval in which it was specified.
– Byzantine: any kind of fault (e.g. arbitrary behavior resulting from memory corruption or CPU overheating), including malicious faults (e.g. those provoked by a hacker).
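
Masking is commonly achieved with spatial redundancy and voting. As a hedged sketch, assuming n = 2f + 1 replicas that compute the same deterministic function and a voter that is itself correct, taking the strict-majority value masks up to f replicas that crash (modeled here as returning None) or return wrong values. The function name is hypothetical.

```python
from collections import Counter


def majority_vote(outputs):
    """Return the value produced by a strict majority of the replicas.

    Crashed replicas are represented by None and never count as a vote.
    Raises ValueError when no value reaches a strict majority, i.e. when
    more than f of the n = 2f + 1 replicas are faulty.
    """
    n = len(outputs)
    counts = Counter(o for o in outputs if o is not None)
    if counts:
        value, votes = counts.most_common(1)[0]
        if votes > n // 2:
            return value
    raise ValueError("no majority: too many faulty replicas")
```

With three replicas (f = 1), [42, 7, 42] and [None, 42, 42] both yield 42, whereas [1, 2, None] has no majority and is reported as an error rather than masked.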