Re: Streaming Replication Networking Best Practices? - Mailing list pgsql-admin

From Johannes Truschnigg
Subject Re: Streaming Replication Networking Best Practices?
Msg-id 20180514183153.d4b4cc6q4c3ywvxg@vault.lan
In response to Streaming Replication Networking Best Practices?  (Don Seiler <don@seiler.us>)
Responses Re: Streaming Replication Networking Best Practices?  (Don Seiler <don@seiler.us>)
List pgsql-admin
On Mon, May 14, 2018 at 11:11:40AM -0500, Don Seiler wrote:
> Postgres 9.6.6. Primary has a local (HA) replica and a remote (DR) replica.
> [...]
> However I'd like to know if there are any optimal networking settings on
> the host or network that we maybe missing. My manager says that the circuit
> between data centers was only 60% utilized at its peak.

That actually hints at your network link/TCP performance _not_ being the
problem, I think.


Do you happen to have historical host-monitoring data available for when the
replication interruption happened? You should definitely check for CPU (on
both sides) and I/O (on the receiver/secondary) saturation.
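
If no historical data exists, a rough live check is possible. The following is only a minimal sketch, assuming Linux hosts and two snapshots of /proc/stat taken a few seconds apart; the function names are made up for illustration. It looks at each core separately, because a WAL sender is a single process and can saturate one core while the host-wide average still looks harmless.

```python
def parse_proc_stat(text):
    """Return {cpu_name: (busy_ticks, iowait_ticks, total_ticks)} per core.

    Expects the contents of /proc/stat (Linux). The aggregate "cpu" line is
    skipped so that a single saturated core is not averaged away.
    """
    cores = {}
    for line in text.splitlines():
        parts = line.split()
        if not parts or not parts[0].startswith("cpu") or parts[0] == "cpu":
            continue
        # user nice system idle iowait irq softirq steal
        # (guest time is already included in user, so it is not added again)
        fields = [int(x) for x in parts[1:9]]
        fields += [0] * (8 - len(fields))
        idle, iowait = fields[3], fields[4]
        total = sum(fields)
        cores[parts[0]] = (total - idle - iowait, iowait, total)
    return cores


def per_core_usage(before, after):
    """Return {cpu_name: (busy_fraction, iowait_fraction)} between two snapshots."""
    first = parse_proc_stat(before)
    second = parse_proc_stat(after)
    usage = {}
    for cpu, (busy1, iowait1, total1) in second.items():
        if cpu not in first:
            continue
        busy0, iowait0, total0 = first[cpu]
        elapsed = total1 - total0
        if elapsed <= 0:
            continue
        usage[cpu] = ((busy1 - busy0) / elapsed, (iowait1 - iowait0) / elapsed)
    return usage
```

A core with a busy fraction close to 1.0 on the primary, or a high iowait fraction on the secondary, would be worth a closer look with perf or similar tools.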

I remember that when we first set up streaming replication, back then under
postgres 9.0, the replication connection defaulted to using TLS/SSL, at the
time with TLS compression enabled. The huge extra work that this incurred on
the CPUs involved regularly made the WAL sender on the primary break streaming
replication, because it couldn't possibly keep up with the data that was being
pushed into its encrypted & compressed TCP connection over a 10G link.
(Linux's excellent perf tool proved invaluable in determining the exact
cause for the high CPU load inside the postgres processes; once we had
re-compiled OpenSSL without compression, the problem went away.)

Now of course modern TLS library versions don't implement compression any
more, and the streaming ciphers are most probably hardware accelerated for
your combination of hardware and software, but the lesson we learned back then
may still be worth keeping in mind...


Other than that... have you verified that the network link between your hosts
can actually live up to your and your manager's expectations in terms of
bandwidth delivered? iperf3 could help verify that; if the measured bandwidth
for a single TCP stream lives up to what you'd expect, you can probably rule
out network-related concerns and concentrate on looking at other potential
bottlenecks.
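
As a rough illustration of that check: iperf3 can emit its report as JSON (the -J flag), and the receiver-side rate of a single-stream run can then be compared against what the circuit is supposed to deliver. This is only a sketch; the function names, the 80% tolerance and the expected figure are assumptions of mine, not anything from your setup.

```python
import json


def single_stream_mbps(iperf3_json):
    """Extract the receiver-side throughput (Mbit/s) from an iperf3 -J report."""
    report = json.loads(iperf3_json)
    return report["end"]["sum_received"]["bits_per_second"] / 1e6


def link_meets_expectation(iperf3_json, expected_mbps, tolerance=0.8):
    """True if the measured rate reaches at least `tolerance` of the expected rate.

    `tolerance` is an arbitrary illustrative threshold, not a recommendation.
    """
    return single_stream_mbps(iperf3_json) >= expected_mbps * tolerance
```

The input could come from something like `iperf3 -c <standby-host> -J > result.json`, run from the primary against an iperf3 server on the remote side (the host name is a placeholder).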

--
with best regards:
- Johannes Truschnigg ( johannes@truschnigg.info )

www:   https://johannes.truschnigg.info/
phone: +43 650 2 133337
xmpp:  johannes@truschnigg.info

Please do not bother me with HTML-email or attachments. Thank you.
