Fast Read Scaling PostgreSQL with repmgr
Greg Smith
2ndQuadrant US
© 2ndQuadrant Limited 2010-2011

Overview
repmgr is an open source clusterware tool for PostgreSQL replication
Uses Streaming Replication and Hot Standby
Provides
– Ease of Use
– Performance
– Monitoring
– Best Practice

Target Cluster Architecture
(Diagram: one Master Node with two Standby Nodes)

Master
Multiple Standby nodes
Streaming replication

Actions
master
– register
standby
– clone
– register
– promote
– follow

Adding Standby node
(Diagram: Node1: Current Master; Node2: Standby Node)
repmgr standby clone
Performs all actions required to add one standby node

Adding Standby node
(Diagram: Node1: Current Master; Node2 and Node3: Standby Nodes)
repmgr standby clone
Performs all actions required to add second standby node

Sample setup
[postgres@node1]:~$ repmgr master register
[postgres@node2]:~$ repmgr -D $PGDATA -U repmgr standby clone node1
[postgres@node2]:~$ pg_ctl start
[postgres@node2]:~$ repmgrd -f $HOME/repmgr/repmgr.conf

Usage
All actions are simple one-line commands
Default is “the current node”
Can execute actions on other nodes by explicitly naming them
Configuration file provides additional parameters

Failover (1)
(Diagram: old Master Node; Node2: Standby Node; Node3: Master Node)
repmgr standby promote
Changes standby into new master node
Fencing the old master: still your problem (for now!)
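The register/clone/promote sequence on the slides above can be pictured as changes to a small table of node roles. The sketch below is a toy in-memory model under stated assumptions: the class and method names are hypothetical (they only echo the repmgr command names), nothing talks to PostgreSQL, and cloning is reduced to recording which node a standby streams from. It mainly shows why fencing matters: after promotion the model still has two nodes in the master role.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Node:
    name: str
    role: str                        # "master" or "standby"
    upstream: Optional[str] = None   # node a standby streams WAL from


@dataclass
class Cluster:
    """Toy model of the repmgr workflow (hypothetical names, not repmgr code)."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    needs_fencing: Set[str] = field(default_factory=set)

    def master_register(self, name: str) -> None:
        self.nodes[name] = Node(name, "master")

    def standby_clone(self, name: str, upstream: str) -> None:
        # The real command copies the data directory; the model only records the link.
        if self.nodes[upstream].role != "master":
            raise ValueError(f"{upstream} is not a master")
        self.nodes[name] = Node(name, "standby", upstream)

    def standby_promote(self, name: str) -> None:
        node = self.nodes[name]
        if node.role != "standby":
            raise ValueError(f"{name} is not a standby")
        old_master = node.upstream
        node.role, node.upstream = "master", None
        # Promotion does not stop the old master. The model only records that it
        # still needs fencing; until then two nodes hold the master role.
        if old_master is not None:
            self.needs_fencing.add(old_master)


def demo() -> Cluster:
    """node1 fails over to node3, mirroring the Failover (1) slide."""
    cluster = Cluster()
    cluster.master_register("node1")
    cluster.standby_clone("node2", "node1")
    cluster.standby_clone("node3", "node1")
    cluster.standby_promote("node3")
    return cluster
```

In `demo()`, node2 still points at node1 after node3 is promoted, which is the situation `repmgr standby follow` is meant to fix.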

Failover (2)
(Diagram: old Master Node; Node2: Standby Node; Node3: Master Node)
repmgr standby follow
Changes standby to follow newly promoted master

Failover (3)
(Diagram: Node1: Standby Node; Node2: Standby Node; Node3: Master Node)
repmgr standby clone --force
Forces old master into being a standby of newly promoted master
Takes advantage of rsync optimization

repmgrd
Monitoring daemon on each node
– repmgr master register
– repmgr standby register
Allows monitoring and management

repmgrd configuration
cluster=test
node=1
conninfo='host=node1 user=repmgr dbname=pgbench'

Monitoring
$ psql -x -c "SELECT * FROM repmgr_test.repl_status"
primary_node              | 1
standby_node              | 2
last_monitor_time         | 2011-02-23 08:19:39.791974-05
last_wal_primary_location | 0/1902D5E0
last_wal_standby_location | 0/1902D5E0
replication_lag           | 0 bytes
apply_lag                 | 0 bytes
time_lag                  | 00:00:13.30293

Read Scaling Use Cases
• High availability with active monitoring
• Offload long running reports
• Materialize views
• Load balance small read-only queries

Routing reads and writes
• Writes must go to master
• Reads execute against master or any standby
• Application may know
• Application servers may support this concept
– JDBC
– Django
• Database proxy servers can do this routing
– pgpool-II 3.0 “tastes” queries
• Hard to solve in all cases
– Where do functions (stored procedures) go?
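To see why routing is “hard to solve in all cases”, here is a minimal sketch of a statement-tasting router. It is hypothetical and much cruder than pgpool-II: anything starting with SELECT or SHOW goes to a standby in round-robin order, while everything else, plus SELECT ... FOR UPDATE/SHARE and calls to functions the caller lists as writing, goes to the master. The node names and the `writing_functions` parameter are assumptions for illustration.

```python
import itertools
import re
from typing import Iterable, List


class NaiveRouter:
    """Hypothetical read/write splitter that "tastes" each statement.

    A deliberately simple illustration, not pgpool-II's actual parser.
    """

    READ_ONLY = re.compile(r"^\s*(select|show)\b", re.IGNORECASE)
    LOCKING = re.compile(r"\bfor\s+(update|share)\b", re.IGNORECASE)

    def __init__(self, master: str, standbys: List[str],
                 writing_functions: Iterable[str] = ()) -> None:
        self.master = master
        self._standbys = itertools.cycle(standbys) if standbys else None
        # The router cannot tell which functions write; the caller must say so.
        self._writing = [re.compile(rf"\b{re.escape(f)}\s*\(", re.IGNORECASE)
                         for f in writing_functions]

    def route(self, sql: str) -> str:
        """Return the name of the node that should run `sql`."""
        if self._standbys is None or not self.READ_ONLY.match(sql):
            return self.master  # INSERT/UPDATE/DELETE, DDL, BEGIN...: master only
        if self.LOCKING.search(sql):
            return self.master  # SELECT ... FOR UPDATE takes row locks
        if any(p.search(sql) for p in self._writing):
            return self.master  # SELECT calling a known writing function
        return next(self._standbys)  # plain read: round-robin over standbys
```

`NaiveRouter("node1", ["node2", "node3"], writing_functions=["nextval"])` sends `SELECT nextval('s')` to node1, but `SELECT log_visit(42)` (log_visit being a made-up function that inserts a row) still goes to a standby, where it would fail. The router also ignores transaction boundaries, so a SELECT issued after BEGIN may land on a different node than that transaction's writes.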

Architecture: Read Scaling
Many read copies with slight lag
Each is also a potential failover node
Not suitable for long reports
(Diagram: pgpool-II router sends reads to “Hot” Read-Only Nodes and writes to the Primary Node)

Architecture: Reporting Server
Rolling Reporting Server(s)
Live servers run queries
Other servers provide failover capability for Primary
(Diagram: Primary Node; “Hot” Read-Only Reporting Node)

Architecture: Relay Server
Archive data streamed to a standby
Ship the result to a second layer standby
pg_streamrecv
– https://github.com/mhagander
(Diagram: Primary Node; Hot Standby nodes; Archive Failover Node)

Architecture challenges
• Doing all maintenance on the master is hard
– VACUUM
– CREATE INDEX CONCURRENTLY
• All writes are still going to all the slaves
• Write bottlenecks can occur in multiple places
– 5 hour checkpoints are no fun
• Query cancellation

Prioritization
• Keep the standby current for failover
• Long running reports on the standby
• Avoid adding overhead to the master

Query Conflicts
Primary: Drop Database X
Standby: Query on database X
Cannot do both
Action on primary has already happened, so whatever occurs, WAL recovery must always win

Query Visibility: 9.0
• Queries executing on standby are independent
• Master does not know what is running

9.0 tuning in theory
• Increase vacuum_defer_cleanup_age to reduce vacuum cleanup cancellation
• Increase max_standby_*_delay for long running reports
• Use the dblink “sleep on open snapshot” technique to export MVCC snapshot data back to the master

9.0 tuning in practice
• Setting vacuum_defer_cleanup_age in txid units is impossible for most
• The maximum values available for max_standby_*_delay are only ~35 minutes
• dblink snapshot export techniques work, but are difficult to implement for most
• Spurious cancellations are hard to eliminate completely

What's new in 9.1
• pg_stat_replication makes non-wizard monitoring possible
• max_standby_*_delay can be big
• hot_standby_feedback makes MVCC style snapshot export easy
• Base backups possible using the database connection
• Synchronous replication
• Improvements in b-tree delete handling

Why still care about repmgr?
• Remote node command execution makes management easier
• rsync based approach can make fail-back dramatically faster
• Newer features like autofailover
• Best practices and workarounds for real-world issues are incorporated

Shared knowledge helps
• Albourne deployment and feedback from Martin Eriksson critical to V1.0 design
• Heroku deployment contributed support for a new use case
• Early adopters of V2.0 autofailover are paving that part of the roadmap right now
• Unusual issues are being identified by community bug reports

Community
Publicly released in December 2010
Project hosted at GitHub
Core team: Simon Riggs, Jaime Casanova, Greg Smith, Cédric Villemain
GPL license to encourage sharing modifications
V1.1 included code from 3 other companies
V2.0 already pushed out, in unannounced beta
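The two numeric complaints in the “9.0 tuning in practice” slide can be checked with a little arithmetic. This sketch makes two assumptions: that a steady transaction rate is known (the rates used below are invented examples), and that PostgreSQL 9.0 capped max_standby_archive_delay and max_standby_streaming_delay at INT_MAX / 1000 milliseconds, which is where a ceiling of roughly 35 minutes would come from.

```python
INT_MAX = 2**31 - 1  # largest 32-bit signed C int


def defer_age_for_window(tps: float, seconds: float) -> int:
    """Transactions of deferral needed to protect a query lasting `seconds`,
    assuming a steady rate of `tps` transactions per second."""
    return int(tps * seconds)


def max_standby_delay_ceiling_minutes() -> float:
    """Largest max_standby_*_delay in minutes, assuming 9.0 capped the
    millisecond setting at INT_MAX / 1000 (an assumption, see lead-in)."""
    limit_ms = INT_MAX // 1000  # 2,147,483 ms
    return limit_ms / 60_000
```

Protecting a one-hour report takes `defer_age_for_window(200, 3600)` = 720,000 transactions at a steady 200 per second, but only 72,000 at 20 per second, so any fixed vacuum_defer_cleanup_age is either too large for quiet periods or too small for busy ones. `max_standby_delay_ceiling_minutes()` comes out just under 35.8.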