Project Presentation
State Machine Replication
Ido Zachevsky, Marat Radan
Supervisor: Ittay Eyal
Winter Semester 2010

Goals
• Learn and understand Paxos and Python.
• Design a program for a fault-tolerant distributed system using the Paxos algorithm.
• Test on a real Internet-scale system, PlanetLab.

The Problem – Distributed Storage
• Using distributed algorithms on a network has many advantages
• It also has many problems
• This project focuses on the synchronization problem

Synchronization
• The task: successfully run a state machine that involves all the computers of a network
• All the computers need to be in sync regarding the current state and the next states.
• All the computers need to know the transitions.

Problems?
• Can any computer choose the next state?
• What if a computer disconnects ungracefully?
• What if a message is delayed due to congestion?
• Other problems…
• Solution: use a dedicated algorithm

A Solution – Paxos
• Keeping the Safety requirements ensures that a single value, agreed upon by all computers, is chosen
• Keeping the Liveness requirements ensures that a value will be chosen

Paxos - Background
• Paxos Made Simple, Leslie Lamport, 01 Nov 2001
• Paxos Made Live

Principles
• The system consists of three agent classes:
  – Proposers
  – Acceptors
  – Learners
• Some of them are distinguished
• They communicate via messages

Principles – continued
• A single computer – a Leader – is in charge
• Decision cycle in two phases:
  1. A majority must promise to commit to a recent proposal.
  2. Once a majority has committed, all computers are informed of the Decision.

Safety requirements
• Only a value that has been proposed may be chosen,
• Only a single value is chosen, and
• A process never learns that a value has been chosen unless it actually has been.

Liveness requirements
• Some proposed value is eventually chosen.
• A process can eventually learn the value which has been chosen.

Implementing a State Machine
• Collection of servers, each implementing a state machine.
• The i-th state machine command in the sequence is the value chosen by the i-th instance of the Paxos consensus algorithm.
• A pre-decided set of commands is necessary.

Planet-Lab
• Planet-Lab is a global research network that supports the development of new network services.
• Understanding the system is required
• Monitoring is necessary
  – Generally, implemented via NSSL-lab.

Project Design
• Chosen language for implementation: Python
• Network framework: Twisted Matrix
• Implementation stages:
  – Single Decision on NSSL
  – Multiple Decisions on NSSL
  – Single Decision on Planet-Lab
  – Multiple Decisions on Planet-Lab

The Network
[Diagram labels: Reactor Loop, Listening Socket, Protocol Factory, Transport Protocol (repeated), Server 1 … Server N, Clients 1 … Clients N]

Paxos Algorithm Implementation
• Use Cases
  – Acceptor disconnects?
  – Leader disconnects?
    • At which stage?
  – Acceptor message fails to deliver?

Implementation
• Leader Election
  – In fact an inherent part of the algorithm
• Output and monitoring
  – Actual output not visible in general
  – Only via monitoring

Flow
1. Register Nodes
2. Verify and install necessary files
3. Upload
4. Initiate Monitor
5. Run and wait for activity
6. Review results

Implementation – File Structure
Project File Structure
• Paxos Instance: paxos_inst (py)
• Paxos Algorithm: paxos_alg (py)
• Network Data: paxos_net_data (txt)
• Paxos Monitor: paxos_mon_serv (py)
• Deployment: my_deploy (csh)
• Installation: my_install (csh)
• Multi-Run: my_multirun (csh)
• Multi-Stop: my_multistop (csh)
• Initial Communication: send_install (py)
• Alive Machines Server: install_serv (py)
• Alive Nodes list: nodes (txt)
• combine_nodes (csh)
• conv_nodes (csh)
• remove_done (csh)
[Diagram group labels: Core Paxos Program, Service Scripts and Files, Uploading and Running, Initial Installation, Additional files]

Results
• Everything works at the NSSL
• In real life, not necessarily
• Communication phenomena
  – Messages arriving unordered, in large chunks, etc.
• Works well for up to 20-30 nodes
• Use cases tested in the lab

Conclusions
• Preliminary work needed to understand Twisted Matrix and Planet-Lab
• Dealing with network problems
  – SSH tunnel instead of “real” monitoring
• Requirements fulfilled

Further work
• Optimize networking protocol
  – Improve client-server interface
  – Inefficient startup – N(N-1) for N machines
• Partition Decision processes
  – Only few nodes decide each resolution

Thank you
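
Backup – Sketch of a single decision

The two-phase decision cycle from the Principles slides can be illustrated with a minimal in-memory sketch. This is not the project's paxos_alg code: the names Acceptor, on_prepare, on_accept and propose are hypothetical, messages are plain function calls, and no failures or network delays are modelled. It only shows a majority promising first and a value being chosen once a majority has accepted it.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Acceptor:
    """Hypothetical acceptor: remembers its promise and last accepted proposal."""
    promised: int = -1                           # highest ballot promised so far
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value) last accepted

    def on_prepare(self, ballot: int):
        # Phase 1: promise not to accept anything numbered below `ballot`.
        if ballot > self.promised:
            self.promised = ballot
            return True, self.accepted
        return False, None

    def on_accept(self, ballot: int, value: str) -> bool:
        # Phase 2: accept unless a higher ballot has already been promised.
        if ballot >= self.promised:
            self.promised = ballot
            self.accepted = (ballot, value)
            return True
        return False


def propose(acceptors: List[Acceptor], ballot: int, value: str) -> Optional[str]:
    """Run one decision cycle; return the chosen value, or None without a majority."""
    majority = len(acceptors) // 2 + 1

    # Phase 1: collect promises from the acceptors.
    replies = [a.on_prepare(ballot) for a in acceptors]
    promises = [prev for ok, prev in replies if ok]
    if len(promises) < majority:
        return None

    # If a value was already accepted, adopt the one with the highest ballot.
    earlier = [p for p in promises if p is not None]
    if earlier:
        value = max(earlier, key=lambda p: p[0])[1]

    # Phase 2: the value is chosen once a majority accepts it.
    acks = sum(a.on_accept(ballot, value) for a in acceptors)
    return value if acks >= majority else None


acceptors = [Acceptor() for _ in range(5)]
first = propose(acceptors, ballot=1, value="state_A")
second = propose(acceptors, ballot=2, value="state_B")   # must keep state_A
stale = propose(acceptors, ballot=1, value="state_C")    # ballot too old
print(first, second, stale)  # state_A state_A None
```

The second proposer ends up re-proposing state_A, which is the "only a single value is chosen" safety requirement in action; the stale proposer gets no promises at all.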
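
Backup – Sketch of applying decisions in order

The rule that the i-th state machine command is the value chosen by the i-th Paxos instance can be sketched as a replica that buffers decisions and applies them strictly by instance index. The command set, the Replica class and its learn method are hypothetical and chosen only for illustration; the decisions are assumed to have already been chosen by separate Paxos instances.

```python
from typing import Dict, List

# Hypothetical pre-decided command set: name -> transition on an integer state.
COMMANDS = {
    "inc": lambda s: s + 1,
    "double": lambda s: s * 2,
    "reset": lambda s: 0,
}


class Replica:
    """Applies chosen commands strictly in instance order."""

    def __init__(self) -> None:
        self.state = 0
        self.next_instance = 0              # index of the next command to apply
        self.decided: Dict[int, str] = {}   # instance index -> chosen command
        self.applied: List[str] = []

    def learn(self, instance: int, command: str) -> None:
        if command not in COMMANDS:
            raise ValueError(f"unknown command: {command}")
        # A decision for an instance never changes once it is known.
        self.decided.setdefault(instance, command)
        # Apply every consecutive decided instance; stop at the first gap.
        while self.next_instance in self.decided:
            cmd = self.decided[self.next_instance]
            self.state = COMMANDS[cmd](self.state)
            self.applied.append(cmd)
            self.next_instance += 1


decisions = {0: "inc", 1: "double", 2: "inc"}
r1, r2 = Replica(), Replica()
for i in (0, 1, 2):          # decisions arrive in order
    r1.learn(i, decisions[i])
for i in (2, 0, 1):          # decisions arrive out of order
    r2.learn(i, decisions[i])
print(r1.state, r2.state)  # 3 3
```

Both replicas reach the same state even though the second one learns the decisions out of order, which is the kind of unordered arrival mentioned on the Results slide.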
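
Backup – Startup cost arithmetic

The N(N-1) startup figure from the Further work slide grows quickly with the number of machines. The sketch below assumes the figure counts one directed connection from every machine to every other machine, which the slides do not spell out; the function name is hypothetical.

```python
def startup_connections(n: int) -> int:
    """N(N-1): assumed to be one directed connection per ordered pair of machines."""
    return n * (n - 1)


counts = {n: startup_connections(n) for n in (5, 20, 30)}
print(counts)  # {5: 20, 20: 380, 30: 870}
```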