Re: Streaming Replication Networking Best Practices? - Mailing list pgsql-admin

From Flavio Henrique Araque Gurgel
Subject Re: Streaming Replication Networking Best Practices?
Msg-id CAGHTAePBqAyFidc7Q9BXZUe-Frov1DqZRfX9LiOiNc6D2bZXbQ@mail.gmail.com
In response to Re: Streaming Replication Networking Best Practices?  (Don Seiler <don@seiler.us>)
Responses Re: Streaming Replication Networking Best Practices?  (Don Seiler <don@seiler.us>)
List pgsql-admin


On Mon, May 14, 2018 at 18:22, Don Seiler <don@seiler.us> wrote:
On Mon, May 14, 2018 at 11:17 AM, Flavio Henrique Araque Gurgel <fhagur@gmail.com> wrote:

If you're running 9.6, you can use replication slots to avoid messing with wal_keep_segments [1]

I don't think replication slots would help here. As I mentioned in my original post, I changed wal_keep_segments so that the WAL files weren't deleted too soon, but the streaming was still slow and lagged even further behind. My understanding is that replication slots just serve to keep the WAL files in place. If I did that, they'd probably fill up the primary's disk, since the DR replica would take too long to process them all.

Are you sure that WAL streaming from the primary is the main cause of the replication lag?
Take a look at the pg_stat_replication view and compare the sent, write and flush locations. If the flush location lags further behind than the sent or write locations, queries running on your standby may need rows that have already been cleaned up by vacuum on your master, and replication is held until those queries finish. If that is the case, you may consider increasing parameters like vacuum_defer_cleanup_age (be aware that deleted or updated rows will then remain dead longer on your master), or consider vacuuming less aggressively (you may need to adjust the autovacuum parameters for that).
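
To make that comparison concrete, here is a minimal sketch of how the gaps could be computed once you have one row from pg_stat_replication. It assumes the 9.6 column and function names (sent_location, write_location, flush_location, replay_location, pg_current_xlog_location()); from version 10 on they are called *_lsn and pg_current_wal_lsn(). The helper names (lsn_to_int, stage_gaps, widest_gap) and the sample values are made up for illustration. I also included replay_location, which is in the same view; as far as I know, standby queries that conflict with replay tend to show up as a gap at that stage.

```python
# Sketch only. The query assumes PostgreSQL 9.6 names; from version 10 on the
# columns are *_lsn and the function is pg_current_wal_lsn().
QUERY_96 = """
SELECT application_name,
       pg_current_xlog_location() AS current_location,
       sent_location, write_location, flush_location, replay_location
  FROM pg_stat_replication;
"""

# Each stage is the gap between two consecutive positions in the pipeline.
STAGES = ("primary->sent", "sent->write", "write->flush", "flush->replay")


def lsn_to_int(lsn):
    """Convert an LSN such as '16/B374D848' into an absolute byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) + int(low, 16)


def stage_gaps(current, sent, write, flush, replay):
    """Return the byte gap between each consecutive pair of positions."""
    pos = [lsn_to_int(x) for x in (current, sent, write, flush, replay)]
    return {name: pos[i] - pos[i + 1] for i, name in enumerate(STAGES)}


def widest_gap(gaps):
    """Name of the stage with the largest gap (first one wins on ties)."""
    return max(gaps, key=gaps.get)


if __name__ == "__main__":
    # Hypothetical values, as if copied from one row of the query above.
    gaps = stage_gaps("16/B4000000", "16/B3000000", "16/B2F00000",
                      "16/B2F00000", "16/B2E00000")
    print(gaps, widest_gap(gaps))
```

Roughly speaking, a large primary->sent gap points at the network or the WAL sender, a large sent->write or write->flush gap points at the WAL receiver or the standby's disks, and a large flush->replay gap points at replay on the standby. Sample it a few times under load rather than trusting a single reading.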
 
 
Be aware that network bandwidth and latency are not the only possible causes of that behaviour; the WAL receiver and the disk write capacity on your standby can be bottlenecks too.
It has happened to me on local networks with 10 Gbps-capable hardware.

Worth looking into. I would have assumed that since our local replica handles the storage I/O just fine, the DR replica would too, since they use the same model of hardware for server and storage array. I'll make sure that my assumptions are correct though and see if anything else is up there.

If that's the case, sent_location will lag more than all the others.

Flavio Gurgel
