Easy schema changes

PostgreSQL meetup Amsterdam
How Zalando uses PostgreSQL
FEIKE STEENBERGEN
07-07-2016
ZALANDO: SOME NUMBERS
• 15 EU countries
• multiple fulfillment centers (290,000 m²)
• 18 million active customers
• €3 billion revenue
• 10,000 employees
OUR GROWTH
2016:
● 1000 tech employees
OUR DATABASES
• >150 production PostgreSQL databases
• >20 TB data
• >7 TB biggest DB
• 400-1000+ write tps
• >3 DB failures/month
• ~8 PostgreSQL database engineers
AGENDA
• Data access
• Change data models without downtime
• Install
• Configure
• Monitor
• Cloud
DATA ACCESS
ORM
▶ is well known to developers
▶ CRUD operations are easy
▶ all business logic inside your application
▶ developers are in their comfort zone
▷ error-prone transaction management
▷ you have to reflect your tables in your code
▷ all business logic inside your application
▷ schema changes are not easy
DATA ACCESS
Are there alternatives to ORM?
Stored Procedures
▶ return/receive entity aggregates
▶ clear transaction scope
▶ more data consistency checks
▶ independent of the underlying data schema
Java Sproc Wrapper
[Diagram: Java Application → Sproc Wrapper → JDBC → Stored Procedure API → Database Tables]
Java Sproc Wrapper
@SProcService
public interface CustomerSProcService {
    @SProcCall
    int registerCustomer(@SProcParam String email,
                         @SProcParam Gender gender);
}

CREATE FUNCTION register_customer(p_email text,
                                  p_gender z_data.gender)
RETURNS int
AS $$
    INSERT INTO z_data.customer (c_email, c_gender)
    VALUES (p_email, p_gender)
    RETURNING c_id
$$
LANGUAGE sql SECURITY DEFINER;
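Conceptually, the wrapper turns a call on the Java interface into a stored procedure call: the method name becomes the function name and every @SProcParam becomes a bind parameter. The sketch below shows that idea with a plain JDK dynamic proxy. It is not the java-sproc-wrapper implementation: CustomerSprocDryRun and SprocProxySketch are hypothetical names, the camelCase to snake_case rule is assumed from the registerCustomer / register_customer pair above, and instead of executing anything over JDBC the proxy only returns the SQL it would send.

```java
import java.lang.reflect.Proxy;
import java.util.Collections;

// Hypothetical dry-run interface modelled on CustomerSProcService: it returns the
// SQL text instead of an int, so the sketch runs without a database.
interface CustomerSprocDryRun {
    String registerCustomer(String email, String gender);
}

final class SprocProxySketch {
    private SprocProxySketch() {}

    // registerCustomer -> register_customer
    static String toSnakeCase(String camelCase) {
        StringBuilder sb = new StringBuilder();
        for (char c : camelCase.toCharArray()) {
            if (Character.isUpperCase(c)) {
                sb.append('_').append(Character.toLowerCase(c));
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // One JDBC "?" placeholder per Java parameter.
    static String callSql(String methodName, int paramCount) {
        String placeholders = String.join(", ", Collections.nCopies(paramCount, "?"));
        return "SELECT * FROM " + toSnakeCase(methodName) + "(" + placeholders + ")";
    }

    // Intercepts every interface method and turns it into a stored procedure call.
    // A real wrapper would bind the arguments and execute the statement via JDBC;
    // this dry run only returns the SQL it would send.
    static <T> T create(Class<T> iface) {
        Object proxy = Proxy.newProxyInstance(
                iface.getClassLoader(),
                new Class<?>[] {iface},
                (self, method, args) -> {
                    if (method.getDeclaringClass() == Object.class) {
                        throw new UnsupportedOperationException(method.getName());
                    }
                    return callSql(method.getName(), args == null ? 0 : args.length);
                });
        return iface.cast(proxy);
    }
}
```

For example, SprocProxySketch.create(CustomerSprocDryRun.class).registerCustomer("jane@example.com", "FEMALE") returns SELECT * FROM register_customer(?, ?). Only the call shape lives in Java; the function body stays in the database and can change without a Java release.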
Java Sproc Wrapper
@SProcCall
List<Order> findOrders(@SProcParam String email);

CREATE FUNCTION find_orders(p_email text,
                            OUT order_id int,
                            OUT order_created timestamptz,
                            OUT shipping_address order_address)
RETURNS SETOF record
AS $$
    SELECT o_id, o_created,
           ROW(oa_street, oa_city, oa_country)::order_address
      FROM z_data."order"
      JOIN z_data.order_address ON oa_order_id = o_id
      JOIN z_data.customer ON c_id = o_customer_id
     WHERE c_email = p_email
$$
LANGUAGE sql SECURITY DEFINER;
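The shipping_address OUT column is a composite value. If a client receives it in PostgreSQL's text output format, for example ("Main St 1",Berlin,DE), it has to split the value into fields before it can fill an address object. Below is a minimal, hypothetical decoder for one level of that format (quoted fields, doubled quotes, backslash escapes, an empty unquoted field meaning NULL). CompositeText is an illustrative name, the sample address is made up, and the code is not taken from the SProcWrapper's own type mapping.

```java
import java.util.ArrayList;
import java.util.List;

final class CompositeText {
    private CompositeText() {}

    // Splits one composite value in PostgreSQL's text output format, e.g.
    // ("Main St 1",Berlin,DE), into its fields. An empty unquoted field is NULL;
    // inside double quotes, "" is a literal quote and a backslash escapes the next char.
    // Nested composites and arrays are not handled.
    static List<String> parse(String text) {
        if (text.length() < 2 || text.charAt(0) != '(' || text.charAt(text.length() - 1) != ')') {
            throw new IllegalArgumentException("not a composite literal: " + text);
        }
        String body = text.substring(1, text.length() - 1);
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        boolean wasQuoted = false;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < body.length() && body.charAt(i + 1) == '"') {
                    current.append('"'); // doubled quote inside a quoted field
                    i++;
                } else if (c == '"') {
                    inQuotes = false;    // closing quote
                } else if (c == '\\' && i + 1 < body.length()) {
                    current.append(body.charAt(++i));
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;
                wasQuoted = true;
            } else if (c == ',') {
                fields.add(finish(current, wasQuoted));
                current.setLength(0);
                wasQuoted = false;
            } else {
                current.append(c);
            }
        }
        fields.add(finish(current, wasQuoted));
        return fields;
    }

    // An empty field that was never quoted is SQL NULL; "" is the empty string.
    private static String finish(StringBuilder field, boolean wasQuoted) {
        return field.length() == 0 && !wasQuoted ? null : field.toString();
    }
}
```

CompositeText.parse("(\"Main St 1\",Berlin,DE)") yields the fields Main St 1, Berlin and DE, in the order of the ROW(oa_street, oa_city, oa_country) constructor. Nested composites or arrays would need a recursive parser.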
Zalando never sleeps
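Stored Procedure API versioning
[Diagram: one application version connects with search_path = api_v13_01, public; the next with search_path = api_v13_02, public; both API schemas (api_v13_01, api_v13_02) sit on top of the same Database Tables]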
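The mechanism behind this is PostgreSQL's name resolution: an unqualified function name is looked up in the schemas of search_path, in order, and the first match wins. The hypothetical sketch below models that lookup in Java with a made-up catalog in which find_returns exists only in api_v13_02. It ignores pg_catalog (which PostgreSQL implicitly searches first) and overload resolution by argument types.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

final class SearchPathSketch {
    private SearchPathSketch() {}

    // Hypothetical catalog: function names defined in each API schema.
    // find_returns only exists in the newer API version.
    static final Map<String, Set<String>> CATALOG = Map.of(
            "api_v13_01", Set.of("register_customer", "find_orders"),
            "api_v13_02", Set.of("register_customer", "find_orders", "find_returns"),
            "public", Set.of());

    // Statement an application release would run on each new connection.
    static String setSearchPath(String apiSchema) {
        return "SET search_path = " + apiSchema + ", public;";
    }

    // Walk search_path in order; the first schema that defines the function wins.
    static Optional<String> resolve(List<String> searchPath, String function) {
        for (String schema : searchPath) {
            if (CATALOG.getOrDefault(schema, Set.of()).contains(function)) {
                return Optional.of(schema + "." + function);
            }
        }
        return Optional.empty();
    }
}
```

Two application releases can therefore run side by side during a deployment: each resolves find_orders to its own API schema, while the functions in both schemas work on the same tables.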
Easy schema changes
● PostgreSQL
▶ Schema changes with minimal locking:
○ ADD/RENAME/DROP COLUMN
○ ADD/DROP DEFAULT VALUE
▶ CREATE/DROP INDEX CONCURRENTLY
▷ Constraints are still difficult to ALTER
Easy schema changes
● Stored Procedure API layer
▶ Can fill missing data on the fly
▶ Helps to change the data structure without the application noticing
Easy schema changes
● Read from and write to the old structure
● Write to both structures, old and new; read from the new one, falling back to the old one
● Migrate the data
● Read from the new structure, write to both old and new
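The steps above can be sketched as a store whose behavior depends on the migration phase. In this setup the logic would sit in the stored procedure API layer, so the application does not notice; here it is written in Java with two in-memory maps only to show the order of reads and writes. MigrationPhase, DualWriteAddressStore and the address example are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical phases for the steps above; migrate() is the data migration step.
enum MigrationPhase { OLD_ONLY, DUAL_WRITE, READ_NEW }

final class DualWriteAddressStore {
    // In-memory stand-ins for the old and the new table structure (orderId -> address).
    final Map<Integer, String> oldStructure = new HashMap<>();
    final Map<Integer, String> newStructure = new HashMap<>();
    MigrationPhase phase = MigrationPhase.OLD_ONLY;

    void write(int orderId, String address) {
        oldStructure.put(orderId, address); // the old structure is written in every phase
        if (phase != MigrationPhase.OLD_ONLY) {
            newStructure.put(orderId, address); // dual write from DUAL_WRITE on
        }
    }

    Optional<String> read(int orderId) {
        switch (phase) {
            case OLD_ONLY:
                return Optional.ofNullable(oldStructure.get(orderId));
            case DUAL_WRITE: {
                // Try the new structure first, fall back to the old one.
                String fromNew = newStructure.get(orderId);
                return Optional.ofNullable(fromNew != null ? fromNew : oldStructure.get(orderId));
            }
            default: // READ_NEW: the data migration has completed
                return Optional.ofNullable(newStructure.get(orderId));
        }
    }

    // Data migration: copy rows that so far exist only in the old structure.
    int migrate() {
        int copied = 0;
        for (Map.Entry<Integer, String> row : oldStructure.entrySet()) {
            if (newStructure.putIfAbsent(row.getKey(), row.getValue()) == null) {
                copied++;
            }
        }
        return copied;
    }
}
```

After migrate() has copied the remaining rows, switching to READ_NEW is safe because every row exists in the new structure; since writes still reach the old structure, the switch can also be reverted.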
Easy schema changes
● Schema changes using SQL script files
○ SQL scripts (DBDIFFs) are written by developers
○ DBDIFFs are registered with Versioning
○ DBDIFFs should be reviewed by the DB guys
○ the DB guys roll the DB changes out on the live system
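Registering each DBDIFF gives every database a record of the patches it has received. Below is a minimal Java model of that bookkeeping, assuming (as with the Versioning tool's _v.register_patch) that registering an already-applied patch raises an error. In the DBDIFF scripts the registration runs right after BEGIN, so a DBDIFF that is run twice aborts before it changes anything. PatchRegistry is hypothetical, and the requires list (prerequisite patches) is an assumption about what such a registry might check.

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class PatchRegistry {
    private final Set<String> applied = new LinkedHashSet<>();

    // Hypothetical stand-in for _v.register_patch: record a patch name exactly once
    // and refuse it if a prerequisite patch has not been applied yet.
    void register(String patchName, List<String> requires) {
        if (applied.contains(patchName)) {
            throw new IllegalStateException("Patch " + patchName + " is already applied");
        }
        for (String required : requires) {
            if (!applied.contains(required)) {
                throw new IllegalStateException("Patch " + patchName + " requires " + required);
            }
        }
        applied.add(patchName);
    }

    Set<String> applied() {
        return Collections.unmodifiableSet(applied);
    }
}
```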
Easy schema changes
BEGIN;
SELECT _v.register_patch('ZEOS-5430.order');
CREATE TABLE z_data.order_address (
    oa_id serial PRIMARY KEY,
    oa_country z_data.country,
    oa_city varchar(64),
    oa_street varchar(128), ...
);
ALTER TABLE z_data."order" ADD o_shipping_address_id int
    REFERENCES z_data.order_address (oa_id);
COMMIT;
Easy schema changes
BEGIN;
SELECT _v.register_patch('ZEOS-5430.order');
\i order/database/order/10_tables/10_order_address.sql
ALTER TABLE z_data."order" ADD o_shipping_address_id int
    REFERENCES z_data.order_address (oa_id);
COMMIT;
Easy schema changes
BEGIN;
SELECT _v.register_patch('ZEOS-5430.order');
\i order/database/order/10_tables/10_order_address.sql
SET statement_timeout TO '3s';
ALTER TABLE z_data."order" ADD o_shipping_address_id int
    REFERENCES z_data.order_address (oa_id);
COMMIT;
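With statement_timeout at 3s, an ALTER TABLE that cannot get its lock quickly is cancelled instead of waiting indefinitely while other queries on the table queue up behind its lock request. Because _v.register_patch runs in the same transaction, a cancelled attempt leaves nothing behind and the whole DBDIFF can simply be run again. Below is a hypothetical Java helper for such retries, where each attempt is assumed to execute the complete DBDIFF (the Supplier stands in for however the script is actually run).

```java
import java.util.function.Supplier;

final class DdlRetry {
    private DdlRetry() {}

    // Runs the attempt up to maxAttempts times, pausing between failed attempts.
    // Each attempt is assumed to be the complete DBDIFF transaction, so a statement
    // timeout rolls everything back and the next attempt starts from scratch.
    static <T> T withRetries(Supplier<T> attempt, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                last = e;
                if (i < maxAttempts) {
                    sleep(pauseMillis);
                }
            }
        }
        throw last; // all attempts failed: surface the last error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting to retry", e);
        }
    }
}
```

Retrying the whole transaction rather than only the failed statement matters: after the timeout the transaction is aborted, and every statement in it, including the patch registration, has been rolled back.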
Easy schema changes
No (well, one) downtime due to migrations or deployments since we started using PostgreSQL
Uniform look and feel
● We build our own packages
● installed in the same location on Ubuntu, Red Hat and Solaris
○ /server/postgres/9.0.17 -> 9.0
○ /server/postgres/9.3.4 -> 9.3
● distributed by Puppet
Uniform look and feel
● data directory structure inspired by Oracle experience and depesz's blog
○ /data/postgres/pgsql_<cluster>/9.0/data
○ pg_xlogs, pg_logs and WAL archive locations are also standardized
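Both conventions, the per-release package directory and the per-cluster data directory keyed by major version, reduce to simple path rules. A sketch of them as Java helpers, assuming the data directory always uses the major version as in the 9.0 example; PgLayout and the cluster name stock in the usage are hypothetical.

```java
final class PgLayout {
    private PgLayout() {}

    // Major version of a release: two components before PostgreSQL 10 (9.0.17 -> 9.0),
    // one component from 10 on (10.4 -> 10).
    static String majorVersion(String fullVersion) {
        String[] parts = fullVersion.split("\\.");
        if (Integer.parseInt(parts[0]) >= 10) {
            return parts[0];
        }
        if (parts.length < 2) {
            throw new IllegalArgumentException("unexpected version: " + fullVersion);
        }
        return parts[0] + "." + parts[1];
    }

    // Package location of a release, e.g. /server/postgres/9.3.4
    static String installDir(String fullVersion) {
        return "/server/postgres/" + fullVersion;
    }

    // Data directory of a cluster, e.g. /data/postgres/pgsql_<cluster>/9.0/data
    static String dataDir(String cluster, String fullVersion) {
        return "/data/postgres/pgsql_" + cluster + "/" + majorVersion(fullVersion) + "/data";
    }
}
```

PgLayout.dataDir("stock", "9.0.17") gives /data/postgres/pgsql_stock/9.0/data, so a minor upgrade within 9.0 keeps the same data directory, while a major upgrade gets a new one.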
Configuration management
● Puppet seems to be too generic
● one hierarchical YAML with inheritance
○ patterns for database shards
○ contains IP range definitions to generate pg_hba.conf
○ used to generate configs for all other tools
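Generating pg_hba.conf from IP range definitions can be sketched as one host record per named range. The range name, the all/all database and user columns and the md5 method in the usage are illustrative assumptions, not the actual Zalando configuration; PgHbaSketch is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class PgHbaSketch {
    private PgHbaSketch() {}

    // One comment line plus one "host" record per named IP range, in map iteration order.
    // pg_hba.conf record layout: host DATABASE USER ADDRESS METHOD
    static List<String> hostRecords(Map<String, String> namedRanges,
                                    String database, String user, String method) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> range : namedRanges.entrySet()) {
            lines.add("# " + range.getKey());
            lines.add(String.join(" ", "host", database, user, range.getValue(), method));
        }
        return lines;
    }
}
```

PostgreSQL checks pg_hba.conf records top to bottom and uses the first match, so the ranges should come from an ordered map (for example a LinkedHashMap filled in the YAML's order).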
Configuration management
● one hierarchical YAML with inheritance
...
stdb01:
  groups: [pg93]
  service_name: stock1.db.zalando
  custom_params:
    max_connections: 500
    shared_buffers: 4GB
    effective_cache_size: 40GB
    work_mem: 8MB
  instances:
    stdb03-repl:
    stdb03:
    itr-stdb01:
      shared_buffers: 2GB
      maintenance_work_mem: 128MB
      effective_cache_size: 10GB
...
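One way to read the inheritance is as a chain of parameter layers where more specific layers override more generic ones: the group (pg93), the cluster's custom_params, then an instance such as itr-stdb01. The Java sketch below assumes exactly that order; ConfigInheritance is hypothetical, and the pg93 group defaults in the usage are made up, while the cluster and instance values are the ones shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class ConfigInheritance {
    private ConfigInheritance() {}

    // Merge parameter layers from most generic to most specific; later layers win.
    @SafeVarargs
    static Map<String, String> effective(Map<String, String>... layers) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            result.putAll(layer);
        }
        return result;
    }

    // Render parameters as postgresql.conf lines; values are single-quoted and
    // embedded single quotes are doubled.
    static String toPostgresqlConf(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        params.forEach((name, value) ->
                sb.append(name).append(" = '").append(value.replace("'", "''")).append("'\n"));
        return sb.toString();
    }
}
```

With the values above, itr-stdb01 ends up with shared_buffers 2GB and effective_cache_size 10GB from its own entry, and max_connections 500 and work_mem 8MB inherited from stdb01.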
Configuration management
Current system:
● config files are generated from config.yaml and stored in git
● distributed to all machines from git
● linked as postgresql.conf on the destination host:
/data/postgres/etc/9.{0,1,2,3}/<instance>/postgresql-<instance><hostname>.conf
pg_view
Monitoring
● Tools
○ psql wrapper on DBA client machines
■ psql_<instance>_<ENV>
○ aliases on the host machines
■ pg_ctl_<instance>
■ psql_<instance>
■ pg_taillog_<instance>
○ helper scripts
■ assign or remove service IPs
■ back up all instances on the host
Monitoring
● ZMON
● Dedicated 24x7 monitoring team
PGObserver
Cloud
● Microservices
● Explosion of databases
● Shift of responsibility
● Our solution: Spilo (as a service)
Cloud
● Teams monitor performance/locks
● DBEs monitor infrastructure
● Training
● Advocacy
Wishlist
● Better sharding (Citus?)
● Automatic failover (vintage datacenter)
● SQL execution progress
● Our solution: Spilo (as a service)
Links
ZMON – monitoring
github.com/zalando/zmon
SProcWrapper – Java library for stored procedure access
github.com/zalando/java-sproc-wrapper
PGObserver – monitoring web tool for PostgreSQL
github.com/zalando/PGObserver
pg_view – top-like command line activity monitor
github.com/zalando/pg_view
Thank you!