I noticed this recent buildfarm failure:
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=sidewinder&dt=2020-09-29%2018%3A45%3A17
which boils down to
error running SQL: 'psql:<stdin>:1: ERROR: could not connect to the publisher: FATAL: number of requested standby
connections exceeds max_wal_senders (currently 5)'
while running 'psql -XAtq -d port=62411 host=/tmp/cmXKiWUDs9 dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'ALTER
SUBSCRIPTION sub2 REFRESH PUBLICATION' at
/home/pgbf/buildroot/HEAD/pgsql.build/src/test/subscription/../../../src/test/perl/PostgresNode.pm line 1546.
Digging in the postmaster log shows that indeed we were at the limit
of 5 wal senders. One was about to exit (else this test could never
succeed at all), but it had not done so fast enough to avoid this
failure.
Further digging in the buildfarm archives shows that "number of requested
standby connections exceeds max_wal_senders" seems rather common on our
slower buildfarm members, e.g. there are two such complaints in prairiedog's
latest successful HEAD build. Apparently, most of the time this gets
masked by automatic restart of logical replication workers; but when a test
script involves explicit execution of a replication command, it's going to
notice if that attempt fails to connect.
So I wonder why PostgresNode.pm is doing
    print $conf "max_wal_senders = 5\n";
Considering that our default these days is 10 senders, and that a
walsender slot doesn't really cost much, this seems unduly cheapskate.
I propose raising this to 10.
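A minimal sketch of what the proposed change amounts to, assuming nothing
else about that line changes. The in-memory file handle and the $buffer
variable are hypothetical stand-ins for the config file that PostgresNode.pm
actually writes, used only so the snippet runs on its own:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the config file handle used in PostgresNode.pm:
# an in-memory handle backed by a scalar, so nothing is written to disk.
my $buffer = '';
open(my $conf, '>', \$buffer) or die "cannot open in-memory handle: $!";

# Proposed value: 10 (the server default) instead of the current 5.
print $conf "max_wal_senders = 10\n";
close($conf);

# Show the line that would end up in the test node's configuration.
print $buffer;
```

In the real module only the number in the existing print statement would
change; the rest of this snippet is scaffolding.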
There might be some value in the fact that this situation is exercising
the automatic-reconnection behavior, but if so I'd like to find a more
consistent way of testing that.
regards, tom lane