
From: Tom Lane
Subject: [COMMITTERS] pgsql: Don't lose walreceiver start requests due to race condition inp
Msg-id: E1dPbcL-0000O9-TJ@gemulon.postgresql.org
List: pgsql-committers
Don't lose walreceiver start requests due to race condition in postmaster.

When a walreceiver dies, the startup process will notice that and send
a PMSIGNAL_START_WALRECEIVER signal to the postmaster, asking for a new
walreceiver to be launched.  There's a race condition, which at least
in HEAD is very easy to hit, whereby the postmaster might see that
signal before it processes the SIGCHLD from the walreceiver process.
In that situation, sigusr1_handler() just dropped the start request
on the floor, reasoning that it must be redundant.  Eventually, after
10 seconds (WALRCV_STARTUP_TIMEOUT), the startup process would make a
fresh request --- but that's a long time if the connection could have
been re-established almost immediately.
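The ordering problem can be replayed deterministically. The sketch below is a hypothetical, simplified model, not postmaster.c code. Signal delivery is replaced by direct function calls made in the racy order. `PmState`, `old_handle_start_request()` (standing in for the PMSIGNAL_START_WALRECEIVER branch of sigusr1_handler()), and `reap_walreceiver()` (standing in for SIGCHLD processing) are illustrative names chosen for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the postmaster's view of the walreceiver. */
typedef struct
{
    int walreceiver_pid; /* nonzero while the postmaster believes one exists */
    int next_pid;        /* fake PID generator for launched children */
    int launches;        /* number of walreceivers started so far */
} PmState;

static void
launch_walreceiver(PmState *pm)
{
    assert(pm->walreceiver_pid == 0); /* never track two walreceivers */
    pm->walreceiver_pid = pm->next_pid++;
    pm->launches++;
}

/* Old behavior: if a walreceiver PID is still recorded, the request is
 * assumed redundant and silently dropped. */
static void
old_handle_start_request(PmState *pm)
{
    if (pm->walreceiver_pid == 0)
        launch_walreceiver(pm);
    /* else: request dropped on the floor */
}

/* SIGCHLD processing: forget the dead walreceiver. */
static void
reap_walreceiver(PmState *pm)
{
    pm->walreceiver_pid = 0;
}

/* Walreceiver dies, the start request is seen first, and SIGCHLD is
 * handled afterwards. Returns true if a walreceiver is running at the end. */
static bool
old_race_scenario(void)
{
    PmState pm = {0, 100, 0};

    launch_walreceiver(&pm);       /* initial walreceiver, pid 100 */
    /* pid 100 exits here; its SIGCHLD has not been processed yet */
    old_handle_start_request(&pm); /* start request arrives first: dropped */
    reap_walreceiver(&pm);         /* SIGCHLD processed: PID cleared */
    return pm.walreceiver_pid != 0;
}
```

In this model `old_race_scenario()` ends with no walreceiver. Nothing would start one until the startup process repeats its request after WALRCV_STARTUP_TIMEOUT.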

Fix it by setting a state flag inside the postmaster that we won't
clear until we do launch a walreceiver.  In cases where that results
in an extra walreceiver launch, it's up to the walreceiver to realize
it's unwanted and go away --- but we have, and need, that logic anyway
for the opposite race case.
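The fix can be modeled by making the request sticky. The flag is set when the signal arrives and cleared only after a walreceiver has actually been launched. A pending request is then honored once the dead child is reaped. This is a hypothetical sketch. `walreceiver_requested` and `maybe_start_walreceiver()` are illustrative names, and the real postmaster.c change (see the commitdiff linked below) may differ in naming and detail. The walreceiver's own "go away if unwanted" logic for extra launches is not modeled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of postmaster state, extended with a sticky flag. */
typedef struct
{
    int  walreceiver_pid;       /* nonzero while we believe one exists */
    int  next_pid;              /* fake PID generator */
    int  launches;              /* walreceivers started so far */
    bool walreceiver_requested; /* pending start request (illustrative name) */
} PmState;

/* Launch only if a request is pending and no walreceiver is recorded;
 * clear the flag only after an actual launch. */
static void
maybe_start_walreceiver(PmState *pm)
{
    if (pm->walreceiver_requested && pm->walreceiver_pid == 0)
    {
        pm->walreceiver_pid = pm->next_pid++;
        pm->launches++;
        pm->walreceiver_requested = false;
    }
    /* otherwise the request stays pending */
}

/* Start request signal: remember it instead of dropping it. */
static void
handle_start_request(PmState *pm)
{
    pm->walreceiver_requested = true;
    maybe_start_walreceiver(pm);
}

/* SIGCHLD processing: clear the PID, then honor any pending request. */
static void
reap_walreceiver(PmState *pm)
{
    pm->walreceiver_pid = 0;
    maybe_start_walreceiver(pm);
}

/* Same racy ordering as before; returns the PID of the running walreceiver. */
static int
fixed_race_scenario(void)
{
    PmState pm = {0, 100, 0, false};

    handle_start_request(&pm); /* initial request: launches pid 100 */
    /* pid 100 exits; its SIGCHLD has not been processed yet */
    handle_start_request(&pm); /* request arrives first: flag stays set */
    reap_walreceiver(&pm);     /* SIGCHLD processed: pending request launches pid 101 */
    assert(pm.launches == 2);
    return pm.walreceiver_pid;
}
```

In this model, `fixed_race_scenario()` ends with a replacement walreceiver (pid 101) immediately, with no timeout-driven retry.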

I came across this through investigating unexpected delays in the
src/test/recovery TAP tests: it manifests there in test cases where
a master server is stopped and restarted while leaving streaming
slaves active.

This logic has been broken all along, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/21344.1498494720@sss.pgh.pa.us

Branch
------
REL9_2_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/e96adaacdc8fba490263265b162a2670c6d62c3a

Modified Files
--------------
src/backend/postmaster/postmaster.c | 39 ++++++++++++++++++++++++++++++-------
1 file changed, 32 insertions(+), 7 deletions(-)


pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: [COMMITTERS] pgsql: Ignore old stats file timestamps when starting the statscollect
Next
From: Tom Lane
Date:
Subject: [COMMITTERS] pgsql: Reduce wal_retrieve_retry_interval in applicable TAP tests.