Re: Allow interrupts on waiting standby - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Allow interrupts on waiting standby
Date
Msg-id CAB7nPqSMu1BAPgHK2aqmYm=7f__VzwvXDCVHqYLZzbH1HFbE-Q@mail.gmail.com
Whole thread Raw
In response to [HACKERS] Allow interrupts on waiting standby  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
On Fri, Mar 31, 2017 at 4:46 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally?  I understood you added it for startup
processto respond quickly to events other than the postmaster death.  Why don't we restore WL_LATCH_SET?  I won't
objectto not adding the flag if there's a reason. 

It doesn't matter. MyLatch is never set in the context of the startup
process. The other code paths of the startup process using WaitLatch()
may be awaken by the latch set by the WAL receiver, but that does not
apply here.

> I'll mark this as ready for committer when I see WL_LATCH_SET added (optional)

Objection on this point.

> and you have reported that you did the following test cases:
> * Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be
resolved.
> * Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be
resolved.
> * Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1
toa short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be
resolved.

Fine for me, I have checked those multiple scenarios and the startup
process is more responsive. I have emulated the conflict with a
transaction doing repeatable read on the standby while the master was
deleting and vacuuming a table.

But, actually, after looking again at this patch with fresher eyes, I
am being a bit doubtful that this is really correct... Calling
HandleStartupProcInterrupts() is actually dangerous I think, because
it would reload the GUC context while replaying a record. But I think
that it is cleaner to reload the context after being done with a
record.

It seems to me that the correct way to do things would be to switch
all the conflict code paths to use a latch instead of pg_usleep, by
either use a dedicated latch like the recovery wakeup one, or the
MyLatch of the startup process. Then, when a signal arrives in, we
simply set up the conflict latch which wakes up what is waiting for
activity. The latch needs to be reset because calling WaitLatch().
Leaving immediately on WL_POSTMASTER_DEATH looks fine though as far as
I have tested.

As time is growing short, I am marking this patch as returned with
feedback. Thanks for the input, Tsunakawa-san!
--
Michael



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: Logical decoding on standby
Next
From: Michael Paquier
Date:
Subject: Re: REINDEX CONCURRENTLY 2.0