very high replay_lag on 3-node cluster - Mailing list pgsql-general

From Tiemen Ruiten
Subject very high replay_lag on 3-node cluster
Date
Msg-id CAEkBuzfxH26a_G6ue=3zZhkLD2YTkmvz45LioeGMMesT1Cu_8A@mail.gmail.com
Whole thread Raw
Responses Re: very high replay_lag on 3-node cluster  (Tiemen Ruiten <t.ruiten@tech-lab.io>)
List pgsql-general
Hello,

In my previous post[1] on this list I brought up an issue with long running checkpoints. I reduced checkpoint_timeout to a more reasonable value (15m down from 60m) and forced/immediate checkpoints now complete mostly in under a minute. This thread and another one[2] on the Clusterlabs mailinglist also helped me understand more about how PostgreSQL internals work, thanks everyone!

Now to my current issue: I took the advice to add more monitoring on replay lag (using pg_last_xact_replay_timestamp) and things are not looking good. Last night replication lagged by almost 6 hours on one of the nodes[3], but eventually caught up. As you can see in that screenshot, ph-sql-03 is consistently slower to replay than ph-sql-05 (ph-sql-04 is the current master) and there happen to be different SSD's in ph-sql-03 (Crucial MX300 vs Crucial MX500 in the other two), which makes me think this is IO related. 

When I check the replay_lag column of pg_stat_replication, the numbers are consistent with the data from pg_last_xact_replay_timestamp:

postgres=# SELECT application_name, replay_lag FROM pg_stat_replication;
 application_name |   replay_lag
------------------+-----------------
 ph-sql-03        | 00:15:16.179952
 ph-sql-05        | 00:10:01.078163

Currently this doesn't present an operational issue, as the slaves aren't used by applications (still waiting for development to make the necessary changes). So there are no queries running at all on the slaves apart from the occasional monitoring. 

Cluster specifications:
all nodes: 
- filesystem: ZFS stripe of mirrors
- 2* CPU: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (20 cores total)
- 128 GB RAM
ph-sql-03: 8* Crucial MX300 1050MB, underprovisioned to 1TB
ph-sql-0{4,5}: 8* Crucial MX500 1TB

postgresql.conf with GUCs changed from default:

shared_buffers = 8GB
work_mem = 64MB
maintenance_work_mem = 2GB
autovacuum_work_mem = 1GB
effective_io_concurrency = 200
max_worker_processes = 50
max_parallel_maintenance_workers = 8
max_parallel_workers_per_gather = 8
max_parallel_workers = 40
wal_level = replica
synchronous_commit = off
full_page_writes = off
wal_log_hints = on
wal_buffers = 128MB
checkpoint_timeout = 15min
max_wal_size = 8GB
min_wal_size = 1GB
checkpoint_completion_target = 0.9
archive_mode = on
archive_command = 'pgbackrest --stanza=pgdb2 archive-push %p'
max_wal_senders = 10
wal_keep_segments = 20
hot_standby = on
hot_standby_feedback = on
random_page_cost = 1.5
effective_cache_size = 48GB
default_statistics_target = 500
shared_preload_libraries = 'timescaledb, pg_cron, pg_prewarm'   # (change requires restart)
max_locks_per_transaction = 512
 
What are possible reasons for the high replay_lag? Is my storage just too slow? Are there any tunables available?


Thank you,

Tiemen Ruiten

pgsql-general by date:

Previous
From: Luca Ferrari
Date:
Subject: Re: Rearchitecting for storage
Next
From: Matthew Pounsett
Date:
Subject: Re: Rearchitecting for storage