Re: Streaming Replication Networking Best Practices? - Mailing list pgsql-admin

From Don Seiler
Subject Re: Streaming Replication Networking Best Practices?
Date
Msg-id CAHJZqBD4SPfw8xGvhXd092W2gJL3sG1aK+jyrwe-CPtd0JCj6Q@mail.gmail.com
In response to Re: Streaming Replication Networking Best Practices?  (Johannes Truschnigg <johannes@truschnigg.info>)
Responses Re: Streaming Replication Networking Best Practices?  (Rui DeSousa <rui.desousa@icloud.com>)
List pgsql-admin
On Mon, May 14, 2018 at 1:31 PM, Johannes Truschnigg <johannes@truschnigg.info> wrote:

Do you happen to have historical host-monitoring data available for when the
replication interruption happened? You should definitely check for CPU (on
both sides) and I/O (on the receiver/secondary) saturation.

We do have Grafana and Zenoss data going way back; I'll see if I can get a login there.
 
I remember that when we first set up streaming replication, back then
under postgres 9.0, the replication connection defaulted to using TLS/SSL, at
the time with SSL/TLS compression enabled. The huge extra work that this
incurred on the CPUs involved regularly made the WAL sender on the primary
break streaming replication because it couldn't possibly keep up with the data
that was being pushed into its encrypted & compressed TCP connection over a 10G
link. (Linux's excellent perf tool proved invaluable in determining the exact
cause of the high CPU load inside the postgres processes; once we had
re-compiled OpenSSL without compression, the problem went away.)

Now of course modern TLS library versions don't implement compression any
more, and the streaming ciphers are most probably hardware accelerated for
your combination of hard- and software, but the lesson we learned back then
may still be worth keeping in mind...

Very interesting read. I just re-examined all of our settings in postgresql.conf, pg_hba.conf, and recovery.conf, and we don't have SSL enabled anywhere there. I'm going to assume that this isn't a bottleneck in our case, then.
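
For anyone wanting to repeat that check without reading three files by eye, here is a minimal Python sketch. It assumes the usual places SSL shows up: an uncommented `ssl` setting in postgresql.conf, `hostssl` records in pg_hba.conf, and an `sslmode` keyword inside `primary_conninfo` in recovery.conf. The sample file contents, addresses, and the `replicator` user are made up for illustration, and the parsing is deliberately naive (it treats every `#` as the start of a comment, for instance).

```python
import re

# Common spellings of "true" for a boolean setting such as ssl.
# (PostgreSQL also accepts unambiguous prefixes; this sketch ignores them.)
TRUE_VALUES = {"on", "true", "yes", "1"}


def strip_comment(line):
    """Drop everything from the first '#' on, plus surrounding whitespace."""
    return line.split("#", 1)[0].strip()


def server_ssl_enabled(postgresql_conf):
    """Return True if the last uncommented 'ssl' setting is switched on."""
    enabled = False  # ssl defaults to off when it is not set
    for raw in postgresql_conf.splitlines():
        line = strip_comment(raw)
        # Matches "ssl = on", "ssl=on" and "ssl on", but not "ssl_ciphers = ...".
        match = re.match(r"^ssl(?:\s*=\s*|\s+)'?(\w+)'?$", line, re.IGNORECASE)
        if match:
            enabled = match.group(1).lower() in TRUE_VALUES  # last entry wins
    return enabled


def hostssl_rules(pg_hba_conf):
    """Return the uncommented pg_hba.conf records whose type is 'hostssl'."""
    rules = []
    for raw in pg_hba_conf.splitlines():
        line = strip_comment(raw)
        if line.split()[:1] == ["hostssl"]:
            rules.append(line)
    return rules


def conninfo_sslmode(recovery_conf):
    """Return the sslmode given in primary_conninfo, or None if it is absent."""
    for raw in recovery_conf.splitlines():
        line = strip_comment(raw)
        if line.startswith("primary_conninfo"):
            match = re.search(r"sslmode\s*=\s*([\w-]+)", line)
            if match:
                return match.group(1)
    return None


# Hypothetical file contents, for illustration only.
SAMPLE_POSTGRESQL_CONF = """
#ssl = on            # commented out, so the default (off) applies
max_wal_senders = 5
"""

SAMPLE_PG_HBA = """
# TYPE  DATABASE     USER        ADDRESS        METHOD
host    replication  replicator  10.0.0.0/24    md5
"""

SAMPLE_RECOVERY_CONF = """
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 port=5432 user=replicator'
"""

if __name__ == "__main__":
    print("server ssl enabled:", server_ssl_enabled(SAMPLE_POSTGRESQL_CONF))
    print("hostssl rules:", hostssl_rules(SAMPLE_PG_HBA))
    print("standby sslmode:", conninfo_sslmode(SAMPLE_RECOVERY_CONF))
```

If the server-side `ssl` setting is off, the replication connection cannot negotiate SSL regardless of what the standby asks for, so that first check is the one that matters most.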
 
Other than that... have you verified that the network link between your hosts
can actually live up to your and your manager's expectations in terms of
bandwidth delivered? iperf3 could help verify that; if the measured bandwidth
for a single TCP stream lives up to what you'd expect, you can probably rule
out network-related concerns and concentrate on looking at other potential
bottlenecks.
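
To make the single-stream idea concrete, here is a rough Python sketch of that kind of measurement: push a known number of bytes through one TCP connection and divide by the elapsed time. It is not a replacement for iperf3, and the function names and chunk size are my own choices. As written it runs both ends on loopback, which only exercises the local host; to say anything about the actual link, the receiving half would have to run on the standby and the sending half on the primary. Python's own overhead may also cap the result below what a 10G link can carry.

```python
import socket
import threading
import time

CHUNK = 64 * 1024  # bytes handed to each send/recv call


def receive_all(server_sock, result):
    """Accept one connection and count bytes until the sender closes it."""
    conn, _ = server_sock.accept()
    total = 0
    with conn:
        while True:
            data = conn.recv(CHUNK)
            if not data:  # empty read means the peer closed the stream
                break
            total += len(data)
    result["bytes"] = total


def measure_single_stream(total_bytes, host="127.0.0.1"):
    """Send total_bytes over one TCP connection; return (bytes_received, Gbit/s)."""
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    result = {}
    receiver = threading.Thread(target=receive_all, args=(server, result))
    receiver.start()

    payload = b"\0" * CHUNK
    sent = 0
    start = time.perf_counter()
    with socket.create_connection((host, port)) as client:
        while sent < total_bytes:
            n = min(CHUNK, total_bytes - sent)
            client.sendall(payload[:n])
            sent += n
    receiver.join()  # wait until the receiver has drained the stream
    elapsed = time.perf_counter() - start
    server.close()

    gbit_per_s = result["bytes"] * 8 / elapsed / 1e9
    return result["bytes"], gbit_per_s


if __name__ == "__main__":
    received, rate = measure_single_stream(256 * 1024 * 1024)
    print(f"received {received} bytes at {rate:.2f} Gbit/s")
```

A single stream is the relevant case here because a WAL sender and receiver talk over one TCP connection, so aggregate throughput across many parallel streams would overstate what replication can use.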

Thanks, I'll play around with some of these tools.

Don.

--
Don Seiler
www.seiler.us
