Byzantine Techniques II
Presenter: Georgios Piliouras
Partly based on slides by Justin W. Hart & Rodrigo Rodrigues

Papers
• Practical Byzantine Fault Tolerance. Miguel Castro et al. (OSDI 1999)
• BAR Fault Tolerance for Cooperative Services. Amitanand S. Aiyer et al. (SOSP 2005)

Motivation
• Computer systems provide crucial services

Problem
• Computer systems provide crucial services
• Computer systems fail:
  – natural disasters
  – hardware failures
  – software errors
  – malicious attacks
• Need highly-available services

Replication
• Replace the unreplicated service with a set of replicas
• Replication algorithm:
  – masks a fraction of faulty replicas
  – high availability if replicas fail "independently"

Assumptions are a Problem
• Replication algorithms make assumptions:
  – behavior of faulty processes
  – synchrony
  – bound on number of faults
• Service fails if assumptions are invalid
  – an attacker will work to invalidate assumptions
• Most replication algorithms assume too much

Contributions
• Practical replication algorithm:
  – weak assumptions → tolerates attacks
  – good performance
• Implementation:
  – BFT: a generic replication toolkit
  – BFS: a replicated file system
• Performance evaluation: BFS is only 3% slower than a standard file system

Talk Overview
• Problem • Assumptions • Algorithm • Implementation • Performance • Conclusions

Bad Assumption: Benign Faults
• Traditional replication assumes replicas fail by stopping or omitting steps
• Invalid with malicious attacks (attacker replaces a replica's code):
  – a compromised replica may behave arbitrarily
  – a single fault may compromise the service
  – decreased resiliency to malicious
attacks

BFT Tolerates Byzantine Faults
• Byzantine fault tolerance: no assumptions about faulty behavior
• Tolerates successful attacks: the service stays available even when an attacker
  controls replicas

Byzantine-Faulty Clients
• Bad assumption: client faults are benign
  – clients are easier to compromise than replicas
• BFT tolerates Byzantine-faulty clients:
  – access control
  – narrow interfaces
  – enforce invariants
• Support for complex service operations is important

Bad Assumption: Synchrony
• Synchrony → known bounds on:
  – delays between steps
  – message delays
• Invalid with denial-of-service attacks: bad replies due to increased delays
• Assumed by most Byzantine fault tolerance work

Asynchrony
• No bounds on delays
• Problem: replication is impossible
• Solution in BFT:
  – provide safety without synchrony: guarantees no bad replies
  – assume eventual time bounds for liveness:
    may not reply during an active denial-of-service attack,
    but will reply when the attack ends

Talk Overview
• Problem • Assumptions • Algorithm • Implementation • Performance • Conclusions

Algorithm Properties
• Arbitrary replicated service: complex operations, mutable shared state
• Properties (safety and liveness):
  – the system behaves as a correct centralized service
  – clients eventually receive replies to requests
• Assumptions:
  – 3f+1 replicas to tolerate f Byzantine faults (optimal)
  – strong cryptography
  – eventual time bounds (needed only for liveness)

Algorithm
State machine replication:
  – deterministic replicas
start in same state
  – replicas execute the same requests in the same order
  – correct replicas produce identical replies
• Hard part: ensure requests execute in the same order

Ordering Requests
• Primary-Backup: the view designates the primary replica
• The primary picks the ordering
• Backups ensure the primary behaves correctly:
  – certify correct ordering
  – trigger view changes to replace a faulty primary

Rough Overview of Algorithm
• A client sends a request for a service to the primary
• The primary multicasts the request to the backups
• Replicas execute the request and send a reply to the client
• The client waits for f+1 replies from different replicas with the same result;
  this is the result of the operation

Quorums and Certificates
• Quorums have at least 2f+1 of the 3f+1 replicas
• Any two quorums intersect in at least one correct replica
• Certificate = a set of messages from a quorum
• Algorithm steps are justified by certificates

Algorithm Components
• Normal case operation
• View changes
• Garbage collection
• Recovery
All have to be designed to work together

Normal Case Operation
• Three-phase algorithm:
  – pre-prepare picks the order of requests
  – prepare ensures order within views
  – commit
ensures order across views
• Replicas remember messages in a log
• Messages are authenticated: ⟨m⟩k denotes a message authenticated by replica k

Pre-prepare Phase
• The primary (replica 0) assigns sequence number n to request m in view v
• The primary multicasts ⟨PRE-PREPARE,v,n,m⟩0
• Backups accept the pre-prepare if:
  – they are in view v
  – they never accepted a pre-prepare for v,n with a different request

Prepare Phase
• Backups multicast ⟨PREPARE,v,n,D(m),i⟩i, where D(m) is the digest of m
• A replica that accepted ⟨PRE-PREPARE,v,n,m⟩0 and collects 2f matching prepares
  obtains a P-certificate(m,v,n)

Order Within View
• No two P-certificates have the same view and sequence number but different requests
• If it were false: the quorum for P-certificate(m,v,n) and the quorum for
  P-certificate(m',v,n) have one correct replica in common → m = m'

Commit Phase
• A replica with P-certificate(m,v,n) multicasts ⟨COMMIT,v,n,D(m),i⟩i
• Collecting 2f+1 matching commits yields a C-certificate(m,v,n)
• Request m is executed after:
  – having C-certificate(m,v,n)
  – executing all requests with sequence number less than n

View Changes
• Provide liveness when the primary fails:
  – timeouts trigger view changes
  – select a new primary (view number mod 3f+1)
• But also need to:
  – preserve safety
  – ensure replicas stay in the same view long enough
  – prevent denial-of-service attacks

View Change Protocol
• Replicas send their P-certificates: ⟨VIEW-CHANGE,v+1,P,i⟩i
• The new primary collects 2f VC messages in a set X and multicasts ⟨NEW-VIEW,v+1,X,O⟩,
  where O contains pre-prepare messages for view v+1 with the same sequence numbers
• Backups multicast prepare messages for the pre-prepares in O

View Change Safety
• Goal: no two C-certificates have the same sequence number but different requests
• Intuition: if a replica has
C-certificate(m,v,n), then for any quorum Q, the quorum behind C-certificate(m,v,n)
intersects Q, so some correct replica in Q has P-certificate(m,v,n)

Garbage Collection
• Truncate the log with a certificate:
  – periodically checkpoint the state (every K requests)
  – multicast ⟨CHECKPOINT,n,D(checkpoint),i⟩i
  – collecting 2f+1 checkpoint messages yields an S-certificate(h,checkpoint)
• Discard messages and checkpoints with sequence numbers up to h;
  the log covers sequence numbers between h and H = h + 2K;
  reject messages with sequence numbers beyond H
• Send checkpoints in view-changes

Formal Correctness Proofs
• Complete safety proof with I/O automata: invariants, simulation relations
• Partial liveness proof with timed I/O automata: invariants

Communication Optimizations
• Digest replies: send only one full reply to the client with the result
• Optimistic execution: execute prepared requests; the client waits for 2f+1 replies
  – read-write operations execute in two round-trips
• Read-only operations: executed in the current state
  – read-only operations execute in one round-trip

Talk Overview
• Problem • Assumptions • Algorithm • Implementation • Performance • Conclusions

BFS: A Byzantine-Fault-Tolerant NFS
[architecture figure: on the client, the Andrew benchmark runs over a relay and the
kernel NFS client plus the replication library; each replica runs snfsd over the
replication library and the kernel VM]
• No synchronous writes: stability through replication

Talk Overview
• Problem • Assumptions • Algorithm • Implementation • Performance • Conclusions

Andrew Benchmark
[chart: elapsed time in seconds, BFS vs BFS-nr]
Configuration: 1 client, 4 replicas • Alpha 21064, 133 MHz • Ethernet 10 Mbit/s
• BFS-nr is exactly like BFS but without replication
• 30 times worse with digital signatures

BFS is Practical
[chart: elapsed time in seconds, BFS vs NFS]
Configuration: 1 client, 4 replicas • Alpha 21064, 133
MHz • Ethernet 10 Mbit/s • Andrew benchmark
• NFS is the Digital Unix NFS V2 implementation

BFS is Practical 7 Years Later
[chart: elapsed time in seconds, BFS vs BFS-nr vs NFS]
Configuration: 1 client, 4 replicas • Pentium III, 600 MHz • Ethernet 100 Mbit/s
• 100x Andrew benchmark
• NFS is the Linux 2.2.12 NFS V2 implementation

Conclusions
Byzantine fault tolerance is practical:
• Good performance
• Weak assumptions → improved resiliency

What happens if we go MAD?
• Several useful cooperative services span Multiple Administrative Domains:
  – Internet routing
  – file distribution
  – cooperative backup, etc.
• Dealing only with Byzantine behaviors is not enough

Why?
• Nodes are under the control of multiple administrators
• Broken – Byzantine behaviors
  – Misconfigured, or configured with malicious intent.
• Selfish – Rational behaviors
  – Alter the protocol to increase local utility

Talk Overview
• Problem • BAR Model • 3-Level Architecture • Performance • Conclusions

It is time to raise the BAR
• Byzantine – behave arbitrarily or maliciously
• Altruistic – execute the proposed program, whether it benefits them or not
• Rational – deviate from the proposed program for purposes of local benefit

Protocols
• Incentive-Compatible Byzantine Fault Tolerant (IC-BFT)
  – It is in the best interest of rational nodes to follow the protocol exactly
• Byzantine Altruistic Rational Tolerant (BART)
  – Guarantees a set of safety and liveness properties despite the presence of
    rational nodes
• IC-BFT ⊆ BART

General idea
• Extend/modify the Practical Byzantine Fault Tolerance model in a way that combats
  the negative effects of rational (greedy) behavior.
• We will achieve that by using game-theoretic tools.
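As a taste of those game-theoretic tools, here is a minimal sketch in Python that finds the pure-strategy Nash equilibria of a 2×2 game, using the chicken payoffs from the matrix on the next slide (the code and names are illustrative, not part of the BAR protocol):

```python
# Pure-strategy Nash equilibria of the game of chicken. Payoffs from the slide:
# both swerve (0, 0); one swerves (-1, +1) / (+1, -1); both go straight (-100, -100).
ACTIONS = ["Swerve", "Straight"]
PAYOFF = {  # (row action, col action) -> (row payoff, col payoff)
    ("Swerve", "Swerve"): (0, 0),
    ("Swerve", "Straight"): (-1, 1),
    ("Straight", "Swerve"): (1, -1),
    ("Straight", "Straight"): (-100, -100),
}

def is_nash(row, col):
    """An outcome is a Nash equilibrium if neither player gains by deviating alone."""
    r, c = PAYOFF[(row, col)]
    best_r = max(PAYOFF[(a, col)][0] for a in ACTIONS)  # row's best response to col
    best_c = max(PAYOFF[(row, a)][1] for a in ACTIONS)  # col's best response to row
    return r == best_r and c == best_c

equilibria = [(r, c) for r in ACTIONS for c in ACTIONS if is_nash(r, c)]
print(equilibria)  # the two asymmetric outcomes: one player swerves, one goes straight
```

The mutual-crash outcome is not an equilibrium, which is exactly what makes threats of retaliation "credible" only when backed by incentives — the idea the later optimization slides exploit.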
+ A taste of Nash Equilibrium

The game of chicken:
                Swerve      Go Straight
  Swerve        0, 0        -1, +1
  Go Straight   +1, -1      -100, -100

Naughty nodes are punished
• Nodes require access to a state machine in order to complete their objectives
• The protocol contains methods for punishing rational nodes, including denying them
  access to the state machine

Talk Overview
• Problem • BAR Model • 3-Level Architecture • Performance • Conclusions

Three-Level Architecture
• Layered design:
  – simplifies analysis/construction of systems
  – isolates classes of misbehavior at appropriate levels of abstraction

Level 1 – Basic Primitives
• Goals:
  – Provide IC-BFT versions of key abstractions
  – Ensure long-term benefit to participants
  – Limit non-determinism
  – Mitigate the effects of residual non-determinism
  – Enforce predictable communication patterns

Level 1 – Basic Primitives
• BART-RSM based on PBFT. Differences:
  – use TRB (Terminating Reliable Broadcast) instead of consensus
  – 3f+2 nodes required to tolerate f faulty nodes

Level 1 – Basic Primitives
• The RSM rotates the leadership role to participants.
• Participants want to stay in the system in order to control the RSM and complete
  their protocols
• Ultimately, incentives stem from the higher-level service

Level 1 – Basic Primitives
• Self-interested nodes could hide behind non-determinism to shirk work.
  – Tit-for-tat policy
  – Communicate proofs of misconduct, leading to global punishment
• Use Terminating Reliable Broadcast, rather than consensus.
  – In TRB, only the sender can propose a value
  – Other nodes can only adopt this value, or choose a default value

Level 1 – Basic Primitives
• Balance costs
  – No incentive to make the wrong choice
• Encourage timeliness
  – By allowing nodes to judge unilaterally whether other nodes' messages are late,
    and to inflict sanctions on them (Penance)

Level 1 – Basic Primitives
• Nodes have to have participated at every step in order to have the opportunity
  to issue a command
• Message queues (figure): a node that has not yet received the expected message
  announces: I am
waiting for a message from y.

Theorems
• Theorem 1: The TRB protocol satisfies Termination, Agreement, Integrity, and
  Non-Triviality
• Theorem 2: No node has a unilateral incentive to deviate from the protocol

Level 2
• State machine replication is sufficient to support a backup service, but the
  overhead is unacceptable
  – 100 participants… 100 MB backed up… 10 GB of drive space
• Assign work to individual nodes, using arithmetic codes to provide low-overhead
  fault-tolerant storage

Guaranteed Response
• Direct communication is insufficient when nodes can behave rationally
• We introduce a "witness" that overhears the conversation
• This eliminates ambiguity
• Messages are routed through this intermediary

Guaranteed Response
• Node A sends a request to Node B through the witness
• The witness stores the request, and enters the RequestReceived state
• Node B sends a response to Node A through the witness
• The witness stores the response, and enters the ResponseReceived state
• Deviation from this protocol will cause the witness to notice either the timeout
  from Node B or lying on the part of Node A

Optimization through Credible Threats
• Returns to game theory
• The protocol is optimized so nodes can communicate directly.
Add a fast path
• If the recipient does not respond, nodes proceed to the unoptimized case
• Analogous to a game of chicken

Periodic Work Protocol
• The witness checks that periodic tasks, such as system maintenance, are performed
• It is expected that, with a certain frequency, each node in the system will
  perform such a task
• Failure to perform one will generate a POM (proof of misbehavior) from the witness

Authoritative Time Service
• Maintains authoritative time
• Binds messages sent to that time
• The guaranteed response protocol relies on this for generating NoResponses
• Each submission to the state machine contains the timestamp of the proposer
• The timestamp is taken to be the maximum of the median of timestamps of the
  previous f+1 decisions
• If "no decision" is decided, then the timestamp is the previous authoritative time

Level 3: BAR-B
• BAR-B is a cooperative backup system
• Three operations: Store, Retrieve, Audit

Storage
• Nodes break files up into chunks
• Chunks are encrypted
• Chunks are stored on remote nodes
• Remote nodes send signed receipts and store StoreInfos

Retrieval
• A node storing a chunk can respond to a request for a chunk with:
  – the chunk
  – a demonstration that the chunk's lease has expired
  – a more recent StoreInfo

Auditing
• Receipts constitute audit records
• Nodes exchange receipts in order to verify compliance with storage quotas

Talk Overview
• Problem • BAR Model • 3-Level Architecture • Performance • Conclusions

Evaluation
• Performance is inferior to protocols that do not make these guarantees,
  but acceptable (?)
• Measured: impact of additional nodes, of rotating leadership, and of the
  fast-path optimization

Talk Overview
• Problem • BAR Model • 3-Level Architecture • Performance • Conclusions

Conclusions
• More useful as a proof of concept, but it certainly explores a very interesting
  common ground between systems and game theory as a way of studying the
  performance of real-life systems.
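The Authoritative Time Service's timestamp rule from a few slides back can be sketched in code. This is an illustrative reading, not the paper's implementation: we interpret "maximum of the median" as never letting authoritative time move backward, and take the middle element of the f+1 sorted timestamps as the median.

```python
# Sketch of the authoritative-time update rule (illustrative; the function name
# and the exact median convention are assumptions, not from the BAR paper).
def authoritative_time(prev_time, recent_timestamps, f, decided):
    """prev_time: the previous authoritative time.
    recent_timestamps: proposer timestamps of the previous f+1 decisions.
    decided: False if this instance decided "no decision".
    """
    if not decided:
        return prev_time  # "no decision" keeps the previous authoritative time
    assert len(recent_timestamps) == f + 1
    median = sorted(recent_timestamps)[(f + 1) // 2]  # middle of f+1 values
    return max(prev_time, median)  # authoritative time never goes backward

# With f = 2, three decisions carry timestamps 5, 90, 7: the single Byzantine
# outlier (90) cannot drag the median far from the honest values.
print(authoritative_time(6, [5, 90, 7], f=2, decided=True))   # advances to 7
print(authoritative_time(6, [5, 90, 7], f=2, decided=False))  # stays at 6
```

Taking a median over f+1 values means at least one of them comes from a correct node, which bounds how far a faulty proposer can skew the clock.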
• CS 614 is over
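As a closing sanity check, the quorum arithmetic that the PBFT portion of the talk rests on (n = 3f+1 replicas, quorums of 2f+1) can be verified directly: any two quorums overlap in at least f+1 replicas, so at least one replica in the overlap is correct.

```python
# Sanity check of BFT quorum arithmetic: with n = 3f+1 replicas and quorums of
# size 2f+1, two quorums of a common replica set must share at least
# 2*(2f+1) - n = f+1 members, and with at most f faulty replicas at least one
# shared member is correct. This is what makes certificates from different
# quorums agree (e.g. the "Order Within View" argument).
for f in range(1, 100):
    n = 3 * f + 1              # replicas needed to tolerate f Byzantine faults
    quorum = 2 * f + 1         # quorum size used by P-, C-, and S-certificates
    overlap = 2 * quorum - n   # minimum intersection of any two quorums
    assert overlap == f + 1    # intersection exceeds the number of faulty nodes...
    assert overlap - f >= 1    # ...so it contains at least one correct replica
print("quorum intersection property holds for f = 1..99")
```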