The SMART Way to Migrate Replicated Stateful Services
Jacob R. Lorch, Atul Adya, Bill Bolosky, Ronnie Chaiken, John Douceur, Jon Howell
Microsoft Research
First EuroSys Conference
19 April 2006

Replicated stateful services
[Figure: service replicas on machines A, B, C coordinated by Paxos]
• Problem: Machine failure leads to unavailability
  – Solution: Replicate the service for fault tolerance
• Problem: Replica state can become inconsistent
  – Solution: Use replicated state machine approach

Migrating replicated services
[Figure: machines A, B, C, D, E]
• Migration: Changing the configuration – the set of machines running replicas
• Uses of migration
  – Replace failed machines for long-term fault tolerance
  – Load balancing
  – Increasing or decreasing number of replicas

Limitations of current approaches
• Cannot remove non-failed machines without creating window of vulnerability
  – Can only remove known-failed machines
  – Cannot use migration for load balancing
• Cannot process concurrent requests in parallel

Limitations of current approaches addressed by SMART
• Can remove non-failed machines
  – Enables autonomic migration, i.e., migration without human involvement
  – Enables load balancing
• Can do concurrent request processing
• Can perform arbitrary migrations, even ones replacing entire configuration
• Completely described in our paper

Outline
• Introduction
• Background on Paxos
• Limitations of existing approaches
• SMART: Service Migration And Replication Technique
  – Configuration-specific replicas
  – Shared execution modules
• Implementation and evaluation
• Conclusions

Background on Paxos

Background: Paxos overview
[Figure: the Paxos protocol assigns requests to slots 1 to 6 on replicas A, B, C]
• Goal: Every service replica runs the same sequence of requests
  – Deterministic service ensures
state changes and replies are consistent
• Approach: Paxos assigns requests to virtual “slots”
  – No two replicas assign different requests to same slot
  – Each replica executes requests in slot order

Background: Paxos protocol
[Figure: client Z sends Req to server replicas A, B, C; messages PROPOSE, LOGGED, DECIDED]
• One replica is the leader
• Clients send requests to the leader
• Leader proposes a request by sending a PROPOSE message to all replicas
• Each replica logs the proposal and sends a LOGGED message to the leader
• When the leader receives LOGGED messages from a majority, it decides the request and sends a DECIDED message

Background: Paxos leader change
[Figure: replicas A, B, C exchanging Poll and Reply messages]
• If leader fails, another replica “elects” itself
• New leader must poll replicas and hear replies from a majority
  – Ensures it learns enough about previous leaders’ actions to avoid conflicting proposals

Background: Paxos migration
[Figure: slots 79 to 85 on replicas A, B, C; service state changes the configuration to A, B, D, with offset α]
• Service state includes current configuration
  – A request that changes that part of the state migrates the service
• Configuration after request n is responsible for requests n+α and beyond

Rationale for α
[Figure: client Z sends Req; replicas A, B, C exchange PROPOSE and LOGGED messages]
• With α=1, slot n can change the configuration responsible for slot n+1
• Leader can’t propose slot n+1 until n is decided
  – It doesn’t know whom to make the proposal to, let alone whether it can make a proposal at all
• Prevents pipelining of requests
  – Request may wait a network round trip and a disk write

Limitations of existing approaches

No request pipelining
• Leader change is complicated
  – How to ensure that new leader knows the right configuration to poll?
  – How to handle some outstanding proposals being from one configuration and some from another?
  – Other problems
• To avoid this complexity, current approaches use α=1
• But this prevents request pipelining

Window of vulnerability
[Figure: machines A, B, C, D; messages PROPOSE, LOGGED, DECIDED, Poll]
• Removing a machine creates window of vulnerability
  – Effectively, it induces a failure of the removed replica
  – Consequently, service can become permanently unavailable even if fewer than half the machines fail
• Considered acceptable since machines are only removed when known to a human to have permanently failed
• Not suitable for autonomic migration using imperfect failure detectors, or for load balancing

SMART

Configuration-specific replicas
[Figure: machines A, B, C, D hosting replicas 1A, 1B, 1C of configuration 1 and replicas 2A, 2B, 2D of configuration 2]
• Each configuration has its own set of replicas and its own separate instance of Paxos
• Simplifies leader change so we can pipeline requests
  – Election always happens in a static configuration
• No window of vulnerability because a replica can remain alive until next configuration is established

SMART migration protocol
[Figure: replicas 1A, 1B, 1C and 2A, 2B, 2D on machines A, B, C, D; messages JOIN-REQ, JOIN, PREPARE, READY, FINISHED]
• After creating new configuration, send JOIN messages
• After executing request n+α-1, send FINISHED messages
  – Tells new replicas where they can get starting state
  – Makes up for possibly lost JOIN messages
• When a majority of the successor configuration have their starting state, replica kills itself
• If a machine misses this phase, it can still join later

Shared execution modules
[Figure: each replica divided into an agreement module and an execution module]
• Configuration-specific replicas have a downside
  – One copy of service state for each replica
  – Need to copy state to new replicas
• Solution: Shared execution modules
  – Divide replica into agreement and execution modules
  – One execution module for all replicas on machine

Implementation and evaluation
• SMART implemented in a replicated state machine library, LibSMART
  – Lets you build a service as if it were single-machine, then turns it into a replicated, migratable service
• Farsite distributed file system service ported to LibSMART
  – Straightforward because LibSMART uses BFT interface
• Experimental results using simple key/value service
  – Pipelining reduces average client latency by 14%
  – Migration happens quickly, so clients only see a bit of extra latency, less than 30 ms

Conclusions
• Migration is useful for replicated services
  – Long-term fault tolerance, load balancing
• Current approaches to migration have limitations
• SMART removes these limitations by using configuration-specific replicas
  – Can remove live machines, enabling autonomic migration and load balancing
  – Can overlap processing of concurrent requests
• SMART is practical
  – Implementation supports large, complex file system service
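
Worked example: the α rule

The rule from the Paxos migration slides (the configuration after request n is responsible for requests n+α and beyond) can be illustrated with a minimal Python sketch. It is not LibSMART code: the names ConfigSchedule, record_change, responsible_for, and highest_proposable_slot are hypothetical, and the choice of slot 80 and α=3 is only for illustration. The sketch also assumes that "last decided" means every slot up to and including that number has been decided.

```python
from dataclasses import dataclass, field


@dataclass
class ConfigSchedule:
    """Hypothetical helper: maps each slot to the configuration responsible for it."""
    alpha: int
    initial: frozenset
    # changes[n] is the configuration chosen by the request in slot n
    changes: dict = field(default_factory=dict)

    def record_change(self, slot, machines):
        """Record that the request in `slot` changed the configuration."""
        self.changes[slot] = frozenset(machines)

    def responsible_for(self, slot):
        """A change made in slot n only governs slots n + alpha and beyond."""
        eligible = [n for n in self.changes if n + self.alpha <= slot]
        return self.changes[max(eligible)] if eligible else self.initial


def highest_proposable_slot(last_decided, alpha):
    """The configuration for slot m depends only on slots up to m - alpha,
    so a leader can have up to alpha undecided slots outstanding."""
    return last_decided + alpha


# Illustrative values: slot 80 replaces machine C with machine D, alpha = 3.
schedule = ConfigSchedule(alpha=3, initial=frozenset("ABC"))
schedule.record_change(80, "ABD")

print(sorted(schedule.responsible_for(82)))  # old configuration still in charge
print(sorted(schedule.responsible_for(83)))  # new configuration takes over
print(highest_proposable_slot(80, alpha=1))  # no pipelining beyond the next slot
print(highest_proposable_slot(80, alpha=3))  # three slots may be in flight
```

With α=1 the leader must wait for slot n to be decided before proposing slot n+1, which is the pipelining limit described above. A larger α lets several proposals overlap, at the cost of a longer delay before a configuration change takes effect.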