Re: Streaming Replication Networking Best Practices? - Mailing list pgsql-admin

From Flavio Henrique Araque Gurgel
Subject Re: Streaming Replication Networking Best Practices?
Date
Msg-id CAGHTAeN+KU8vAS5f8M6HiHE8xb-3FFe0pG4rNvuxpJ5G3hw77Q@mail.gmail.com
Whole thread Raw
In response to Streaming Replication Networking Best Practices?  (Don Seiler <don@seiler.us>)
Responses Re: Streaming Replication Networking Best Practices?  (Don Seiler <don@seiler.us>)
List pgsql-admin


Em seg, 14 de mai de 2018 às 18:12, Don Seiler <don@seiler.us> escreveu:
Postgres 9.6.6. Primary has a local (HA) replica and a remote (DR) replica.

I've done a couple of big data purges the past few weeks. This past weekend I ran a DELETE & VACUUM that took all of 8 minutes. The local replica kept up just fine, but the remote replica lagged and broke streaming replication after just a few minutes. We have WAL archives sent via NetApp mirroring to back that up.

However I'd like to know if there are any optimal networking settings on the host or network that we maybe missing. My manager says that the circuit between data centers was only 60% utilized at its peak.

In the past I've tried increasing wal_keep_files, which keeps the WAL files available for streaming but the fact remains that they stream very slowly so the lag just gets worse than if we fell back to archives every 30 minutes or so.

I have no basis for this other than my previous experience with Oracle physical standbys, but I would think that streaming replication should be able to push more than it seems to be doing in my prod environment. The fact that the local replica keeps up just fine without breaking streaming replication tells me that the problem is in the cross-datacenter circuit, not in postgres recovery performance.

If anyone has any advice on host networking setup, tuning or testing, I'd love to hear it.

If you're running 9.6, you can use replication slots to avoid to mess with wal_keep_segments [1]
Be aware that not only network bandwidth and latency are responsable for that behaviour, wal_receiver, disk write capability on your standby can be bottlenecks too.
It happens to me in local networks with 10Gbps capable hardware.

[1] https://www.postgresql.org/docs/9.6/static/warm-standby.html#STREAMING-REPLICATION-SLOTS

Flavio Gurgel

pgsql-admin by date:

Previous
From: Don Seiler
Date:
Subject: Streaming Replication Networking Best Practices?
Next
From: Don Seiler
Date:
Subject: Re: Streaming Replication Networking Best Practices?