Thread: [HACKERS] Allow interrupts on waiting standby

[HACKERS] Allow interrupts on waiting standby

From
Simon Riggs
Date:
Currently a waiting standby doesn't allow interrupts.

Patch implements that.

Barring objection, patching today with backpatches.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Allow interrupts on waiting standby

From
Robert Haas
Date:
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> Currently a waiting standby doesn't allow interrupts.
>
> Patch implements that.
>
> Barring objection, patching today with backpatches.

"today" is a little quick, but the patch looks fine.  I doubt anyone's
going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Allow interrupts on waiting standby

From
Andres Freund
Date:
On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> > Currently a waiting standby doesn't allow interrupts.
> >
> > Patch implements that.
> >
> > Barring objection, patching today with backpatches.
> 
> "today" is a little quick, but the patch looks fine.  I doubt anyone's
> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.

I don't quite get asking for agreement, and then not waiting as
suggested.  I'm personally fine with going with a CHECK_FOR_INTERRUPTS
for now, but I think it'd better to replace it with a latch.

Andres



Re: [HACKERS] Allow interrupts on waiting standby

From
Simon Riggs
Date:
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
> On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
>> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> > Currently a waiting standby doesn't allow interrupts.
>> >
>> > Patch implements that.
>> >
>> > Barring objection, patching today with backpatches.
>>
>> "today" is a little quick, but the patch looks fine.  I doubt anyone's
>> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.
>
> I don't quite get asking for agreement, and then not waiting as
> suggested.  I'm personally fine with going with a CHECK_FOR_INTERRUPTS
> for now, but I think it'd better to replace it with a latch.

I have waited, so not sure what you mean. Tomorrow is too late.

Replacing with a latch wouldn't be backpatchable, IMHO.

I've no problem if you want to work on a deeper fix for future versions.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Allow interrupts on waiting standby

From
Andres Freund
Date:
On 2017-01-26 19:36:11 +0000, Simon Riggs wrote:
> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
> > On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
> >> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> > Currently a waiting standby doesn't allow interrupts.
> >> >
> >> > Patch implements that.
> >> >
> >> > Barring objection, patching today with backpatches.
> >>
> >> "today" is a little quick, but the patch looks fine.  I doubt anyone's
> >> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.
> >
> > I don't quite get asking for agreement, and then not waiting as
> > suggested.  I'm personally fine with going with a CHECK_FOR_INTERRUPTS
> > for now, but I think it'd better to replace it with a latch.
> 
> I have waited, so not sure what you mean.

Well, Robert today said >> "today" is a little quick <<.


> Tomorrow is too late.

Huh? We're not wrapping today/tomorrow, are we?  If I missed something
and we are, then sure, it makes sense to push ahead.


> Replacing with a latch wouldn't be backpatchable, IMHO.

Hm, don't quite see why - isn't it just like three lines?


Andres



Re: [HACKERS] Allow interrupts on waiting standby

From
Stephen Frost
Date:
* Andres Freund (andres@anarazel.de) wrote:
> On 2017-01-26 19:36:11 +0000, Simon Riggs wrote:
> > Tomorrow is too late.
>
> Huh? We're not wrapping today/tomorrow, are we?  If I missed something
> and we are, then sure, it makes sense to push ahead.

I haven't seen anyone suggest that we're changing from the regular
release cycle, which would mean that we wrap on Monday, Feb 6th, for
release on Feb 9th.

Thanks!

Stephen

Re: [HACKERS] Allow interrupts on waiting standby

From
Alvaro Herrera
Date:
Simon Riggs wrote:
> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:

> > I'm personally fine with going with a CHECK_FOR_INTERRUPTS
> > for now, but I think it'd better to replace it with a latch.
> 
> I have waited, so not sure what you mean. Tomorrow is too late.
> 
> Replacing with a latch wouldn't be backpatchable, IMHO.

Hmm, you pushed this to the master branch as commit e8ee3d6b859a, but I
see no backpatch.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
>> On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
>>> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> > Currently a waiting standby doesn't allow interrupts.
>>> >
>>> > Patch implements that.
>>> >
>>> > Barring objection, patching today with backpatches.
>>>
>>> "today" is a little quick, but the patch looks fine.  I doubt anyone's
>>> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.
>>
>> I don't quite get asking for agreement, and then not waiting as
>> suggested.  I'm personally fine with going with a CHECK_FOR_INTERRUPTS
>> for now, but I think it'd better to replace it with a latch.
>
> I have waited, so not sure what you mean. Tomorrow is too late.

This gives really little time for any feedback :(

> Replacing with a latch wouldn't be backpatchable, IMHO.
> I've no problem if you want to work on a deeper fix for future versions.

A deeper fix for HEAD proves to not be that complicated. Please see
the attached. The other two calls of pg_usleep() in standby.c are
waiting with 5ms and 10ms, it is not worth switching them to a latch.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Allow interrupts on waiting standby

From
Simon Riggs
Date:
On 26 January 2017 at 20:36, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Simon Riggs wrote:
>> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
>
>> > I'm personally fine with going with a CHECK_FOR_INTERRUPTS
>> > for now, but I think it'd better to replace it with a latch.
>>
>> I have waited, so not sure what you mean. Tomorrow is too late.
>>
>> Replacing with a latch wouldn't be backpatchable, IMHO.
>
> Hmm, you pushed this to the master branch as commit e8ee3d6b859a, but I
> see no backpatch.

There appeared to be a complaint at my speed, so I slowed down.

On reading that there are no objections to this being backpatched, I
will do that now.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Allow interrupts on waiting standby

From
Simon Riggs
Date:
On 27 January 2017 at 01:35, Michael Paquier <michael.paquier@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:

>>> I'm personally fine with going with a CHECK_FOR_INTERRUPTS
>>> for now, but I think it'd better to replace it with a latch.

>> Replacing with a latch wouldn't be backpatchable, IMHO.
>> I've no problem if you want to work on a deeper fix for future versions.
>
> A deeper fix for HEAD proves to not be that complicated. Please see
> the attached. The other two calls of pg_usleep() in standby.c are
> waiting with 5ms and 10ms, it is not worth switching them to a latch.

So you think 2 calls to pg_usleep() can stay; my opinion is 3 can stay.

I'm not clear why this particular call is worthy, while dozens of
calls in other modules remain unchanged. This seems like a code issue
rather than anything to do with Hot Standby in particular, so it
should be another thread. Doesn't seem important compared to other
things for this release I should work on.

Please add to the next CF so it gets proper review.

-- 
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Fri, Jan 27, 2017 at 9:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 27 January 2017 at 01:35, Michael Paquier <michael.paquier@gmail.com> wrote:
>> On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
>> A deeper fix for HEAD proves to not be that complicated. Please see
>> the attached. The other two calls of pg_usleep() in standby.c are
>> waiting with 5ms and 10ms, it is not worth switching them to a latch.
>
> So you think 2 calls to pg_usleep() can stay; my opinion is 3 can stay.

This patch replaces one call of pg_usleep() where the wait can go up
to 1s, this is largely noticeable by the user. The two others sleep
for a maximum of 10ms. Even if a latch is used the code path is going
to exit immediately anyway. That's why you added a call to
CHECK_FOR_INTERRUPTS only here, no?

> I'm not clear why this particular call is worthy, while dozens of
> calls in other modules remain unchanged. This seems like a code issue
> rather than anything to do with Hot Standby in particular, so it
> should be another thread.

Some should switch to Latch.

> Doesn't seem important compared to other
> things for this release I should work on.

That's your call.

> Please add to the next CF so it gets proper review.

Sure.
-- 
Michael



Re: [HACKERS] Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Fri, Jan 27, 2017 at 10:35 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote:
>>> On 2017-01-26 12:24:44 -0500, Robert Haas wrote:
>>>> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>>> > Currently a waiting standby doesn't allow interrupts.
>>>> >
>>>> > Patch implements that.
>>>> >
>>>> > Barring objection, patching today with backpatches.
>>>>
>>>> "today" is a little quick, but the patch looks fine.  I doubt anyone's
>>>> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call.
>>>
>>> I don't quite get asking for agreement, and then not waiting as
>>> suggested.  I'm personally fine with going with a CHECK_FOR_INTERRUPTS
>>> for now, but I think it'd better to replace it with a latch.
>>
>> I have waited, so not sure what you mean. Tomorrow is too late.
>
> This gives really little time for any feedback :(
>
>> Replacing with a latch wouldn't be backpatchable, IMHO.
>> I've no problem if you want to work on a deeper fix for future versions.
>
> A deeper fix for HEAD proves to not be that complicated. Please see
> the attached. The other two calls of pg_usleep() in standby.c are
> waiting with 5ms and 10ms, it is not worth switching them to a latch.

Two things I forgot in this patch:
- documentation for the new wait event
- the string for the wait event or this would show up as "???" in
pg_stat_activity.
There are no default clauses in the pgstat_get_wait_* routines so my
compiler is actually complaining...
-- 
Michael



Re: [HACKERS] Allow interrupts on waiting standby

From
Robert Haas
Date:
On Fri, Jan 27, 2017 at 8:17 AM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> There are no default clauses in the pgstat_get_wait_* routines so my
> compiler is actually complaining...

That's exactly WHY there are no default clauses there.   :-)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Sat, Jan 28, 2017 at 2:18 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jan 27, 2017 at 8:17 AM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> There are no default clauses in the pgstat_get_wait_* routines so my
>> compiler is actually complaining...
>
> That's exactly WHY there are no default clauses there.   :-)

And that's exactly why I missed them! ;p
-- 
Michael



Re: [HACKERS] Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Fri, Jan 27, 2017 at 10:17 PM, Michael Paquier
<michael.paquier@gmail.com> wrote:
> Two things I forgot in this patch:
> - documentation for the new wait event
> - the string for the wait event or this would show up as "???" in
> pg_stat_activity.
> There are no default clauses in the pgstat_get_wait_* routines so my
> compiler is actually complaining...

Both things are fixed in the new version attached. I have added as
well this patch to the next commit fest:
https://commitfest.postgresql.org/13/977/
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: [HACKERS] Allow interrupts on waiting standby

From
Peter Eisentraut
Date:
On 1/30/17 20:34, Michael Paquier wrote:
> On Fri, Jan 27, 2017 at 10:17 PM, Michael Paquier
> <michael.paquier@gmail.com> wrote:
>> Two things I forgot in this patch:
>> - documentation for the new wait event
>> - the string for the wait event or this would show up as "???" in
>> pg_stat_activity.
>> There are no default clauses in the pgstat_get_wait_* routines so my
>> compiler is actually complaining...
> 
> Both things are fixed in the new version attached. I have added as
> well this patch to the next commit fest:
> https://commitfest.postgresql.org/13/977/

I'm not clear now what this patch is intending to accomplish, seeing
that the original issue has already been fixed.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: [HACKERS] Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Wed, Mar 8, 2017 at 5:45 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 1/30/17 20:34, Michael Paquier wrote:
>> Both things are fixed in the new version attached. I have added as
>> well this patch to the next commit fest:
>> https://commitfest.postgresql.org/13/977/
>
> I'm not clear now what this patch is intending to accomplish, seeing
> that the original issue has already been fixed.

This makes ResolveRecoveryConflictWithVirtualXIDs() more responsive if
the process is signaled, as the wait in pg_usleep can go up to 1s if
things remain stuck for a long time.
-- 
Michael



Re: [HACKERS] Allow interrupts on waiting standby

From
Tom Lane
Date:
Michael Paquier <michael.paquier@gmail.com> writes:
> Both things are fixed in the new version attached. I have added as
> well this patch to the next commit fest:
> https://commitfest.postgresql.org/13/977/

Couple of thoughts on this patch ---

1. Shouldn't WaitExceedsMaxStandbyDelay's CHECK_FOR_INTERRUPTS be moved to
after the WaitLatch call?  Not much point in being woken immediately by
an interrupt if you're not going to respond.

2. Is it OK to ResetLatch here?  If the only possible latch event in this
process is interrupt requests, then I think WaitLatch, then ResetLatch,
then CHECK_FOR_INTERRUPTS is OK; but otherwise it seems like you risk
discarding events that need to be serviced later.

3. In the same vein, if we're going to check WL_POSTMASTER_DEATH, should
there be a test for that and immediate exit(1) here?

4. I'd be inclined to increase the sleep interval only if we did time out,
not if we were awakened by some other event.

5. The comment about maximum sleep length needs some work.  At first
glance you might think that without the motivation of preventing long
uninterruptible sleeps, we might as well allow the sleep length to grow
well past 1s.  I think that'd be bad, because we want to wake up
reasonably soon after the xact(s) we're waiting for commit.  But neither
the original text nor the proposed replacement mention this.
        regards, tom lane



Re: [HACKERS] Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Sun, Mar 19, 2017 at 5:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Couple of thoughts on this patch ---

Thanks!

> 1. Shouldn't WaitExceedsMaxStandbyDelay's CHECK_FOR_INTERRUPTS be moved to
> after the WaitLatch call?  Not much point in being woken immediately by
> an interrupt if you're not going to respond.
>
> 2. Is it OK to ResetLatch here?  If the only possible latch event in this
> process is interrupt requests, then I think WaitLatch, then ResetLatch,
> then CHECK_FOR_INTERRUPTS is OK; but otherwise it seems like you risk
> discarding events that need to be serviced later.

Right, I have switched to WaitLatch(), ResetLatch() and then
CHECK_FOR_INTERRUPTS().

> 3. In the same vein, if we're going to check WL_POSTMASTER_DEATH, should
> there be a test for that and immediate exit(1) here?

OK, if the postmaster has died, there is not much recovery conflict
needed anyway.

> 4. I'd be inclined to increase the sleep interval only if we did time out,
> not if we were awakened by some other event.

OK, that makes sense.

> 5. The comment about maximum sleep length needs some work.  At first
> glance you might think that without the motivation of preventing long
> uninterruptible sleeps, we might as well allow the sleep length to grow
> well past 1s.  I think that'd be bad, because we want to wake up
> reasonably soon after the xact(s) we're waiting for commit.  But neither
> the original text nor the proposed replacement mention this.

OK, I did some work on this comment.

What do you think about the updated version attached?
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Attachment

Re: Allow interrupts on waiting standby

From
"Tsunakawa, Takayuki"
Date:
Hi, Michael,


From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> What do you think about the updated version attached?

I reviewed this patch.  Here are some comments and questions:


(1) monitoring.sgml
The new row needs to be placed in the "Timeout" group.  The current patch puts the row at the end of IO group.  Please
insertthe new row according to the alphabetical order.
 

In addition, "morerows" attribute of the following line appears to be increased.
        <entry morerows="2"><literal>Timeout</></entry>

By the way, doesn't this wait event belong to IPC wait event type, because the process is waiting for other conflicting
processesto terminate the conflict conditions?  Did you choose Timeout type because max_standby_{archive |
streaming}_delayrelates to this wait?  I'm not confident which is appropriate, but I'm afraid users can associate this
waitwith a timeout.
 


(2) standby.c
Do we need to specify WL_LATCH_SET?  Who can set the latch?  Do the backends who ended the conflict set the latch?


(3) standby.c
+    if (rc & WL_LATCH_SET)
+        ResetLatch(MyLatch);
+
+    /* emergency bailout if postmaster has died */
+    if (rc & WL_POSTMASTER_DEATH)
+        proc_exit(1);

I thought the child processes have to terminate as soon as postmaster vanishes.  So, it would be better for the order
ofthe two if statements above to be reversed.  proc_exit() could be exit(), as some children do in postmaster/*.c.
 



(4) standby.c
The latch is not reset when the wait timed out.  The next WaitLatch() would return immediately.


Regards
Takayuki Tsunakawa



Re: Allow interrupts on waiting standby

From
"Tsunakawa, Takayuki"
Date:
(4) standby.c
> The latch is not reset when the wait timed out.  The next WaitLatch() would
> return immediately.

Sorry, let me withdraw this.  This is my misunderstanding.

OTOH, when is the latch reset before the wait?  Is there an assumption that MyLatch has been in reset state since it
wasinitialized?
 
Is it safe to use MyLatch here, not the special latch like something used in xlog.c?


Regards
Takayuki Tsunakawa


Re: Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Wed, Mar 29, 2017 at 5:04 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: pgsql-hackers-owner@postgresql.org
>> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
>> What do you think about the updated version attached?
> I reviewed this patch.  Here are some comments and questions:

Thanks!

> (1) monitoring.sgml
> The new row needs to be placed in the "Timeout" group.  The current patch puts the row at the end of IO group.
Pleaseinsert the new row according to the alphabetical order. 
>
> In addition, "morerows" attribute of the following line appears to be increased.
>
>          <entry morerows="2"><literal>Timeout</></entry>
>
> By the way, doesn't this wait event belong to IPC wait event type, because the process is waiting for other
conflictingprocesses to terminate the conflict conditions?  Did you choose Timeout type because max_standby_{archive |
streaming}_delayrelates to this wait?  I'm not confident which is appropriate, but I'm afraid users can associate this
waitwith a timeout. 

Actually I think that this event belongs to the timeout category,
because we wait until the timeout has finished, the latch being here
to be sure that the process is more responsive in case of a postmaster
death.

> (2) standby.c
> Do we need to specify WL_LATCH_SET?  Who can set the latch?  Do the backends who ended the conflict set the latch?

This makes the process able to react on SIGHUP. That's useful for
responsiveness.

> (3) standby.c
> +       if (rc & WL_LATCH_SET)
> +               ResetLatch(MyLatch);
> +
> +       /* emergency bailout if postmaster has died */
> +       if (rc & WL_POSTMASTER_DEATH)
> +               proc_exit(1);
>
> I thought the child processes have to terminate as soon as postmaster vanishes.  So, it would be better for the order
ofthe two if statements above to be reversed.  proc_exit() could be exit(), as some children do in postmaster/*.c. 

OK, reversed this order.

Attached is v4.
--
Michael

Attachment

Re: Allow interrupts on waiting standby

From
"Tsunakawa, Takayuki"
Date:
From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> > By the way, doesn't this wait event belong to IPC wait event type, because
> the process is waiting for other conflicting processes to terminate the
> conflict conditions?  Did you choose Timeout type because
> max_standby_{archive | streaming}_delay relates to this wait?  I'm not
> confident which is appropriate, but I'm afraid users can associate this
> wait with a timeout.
> 
> Actually I think that this event belongs to the timeout category, because
> we wait until the timeout has finished, the latch being here to be sure
> that the process is more responsive in case of a postmaster death.

OK.  I confirmed the doc is fixed.

> > (2) standby.c
> > Do we need to specify WL_LATCH_SET?  Who can set the latch?  Do the
> backends who ended the conflict set the latch?
> 
> This makes the process able to react on SIGHUP. That's useful for
> responsiveness.

Oh, I see.  But how does the startup process respond quickly?  It seems that you need to call
HandleStartupProcInterrupts()instead of CHECK_FOR_INTERRUPTS().  But I'm not sure whether HandleStartupProcInterrupts()
canbe called here.
 



> > (3) standby.c
> > +       if (rc & WL_LATCH_SET)
> > +               ResetLatch(MyLatch);
> > +
> > +       /* emergency bailout if postmaster has died */
> > +       if (rc & WL_POSTMASTER_DEATH)
> > +               proc_exit(1);
> >
> > I thought the child processes have to terminate as soon as postmaster
> vanishes.  So, it would be better for the order of the two if statements
> above to be reversed.  proc_exit() could be exit(), as some children do
> in postmaster/*.c.
> 
> OK, reversed this order.

I think exit() instead of proc_exit() better represents what the code wants to do -- terminate the process ASAP without
cleaningup.  Many other background children do so.
 


Regards
Takayuki Tsunakawa


Re: Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Thu, Mar 30, 2017 at 1:52 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> From: pgsql-hackers-owner@postgresql.org
>> > (2) standby.c
>> > Do we need to specify WL_LATCH_SET?  Who can set the latch?  Do the
>> backends who ended the conflict set the latch?
>>
>> This makes the process able to react on SIGHUP. That's useful for
>> responsiveness.
>
> Oh, I see.  But how does the startup process respond quickly?  It seems that you need to call
HandleStartupProcInterrupts()instead of CHECK_FOR_INTERRUPTS().  But I'm not sure whether HandleStartupProcInterrupts()
canbe called here.
 

Bah. Of course you are right. We don't care about SetLatch() here as
signals are processed with a different code path than normal backends.

>> > (3) standby.c
>> > +       if (rc & WL_LATCH_SET)
>> > +               ResetLatch(MyLatch);
>> > +
>> > +       /* emergency bailout if postmaster has died */
>> > +       if (rc & WL_POSTMASTER_DEATH)
>> > +               proc_exit(1);
>> >
>> > I thought the child processes have to terminate as soon as postmaster
>> vanishes.  So, it would be better for the order of the two if statements
>> above to be reversed.  proc_exit() could be exit(), as some children do
>> in postmaster/*.c.
>>
>> OK, reversed this order.
>
> I think exit() instead of proc_exit() better represents what the code wants to do -- terminate the process ASAP
withoutcleaning up.  Many other background children do so.
 

Hm... OK.
-- 
Michael

Attachment

Re: Allow interrupts on waiting standby

From
"Tsunakawa, Takayuki"
Date:
Hi Michael, Simon,

From: pgsql-hackers-owner@postgresql.org
> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier
> > Oh, I see.  But how does the startup process respond quickly?  It seems
> that you need to call HandleStartupProcInterrupts() instead of
> CHECK_FOR_INTERRUPTS().  But I'm not sure whether
> HandleStartupProcInterrupts() can be called here.
> 
> Bah. Of course you are right. We don't care about SetLatch() here as signals
> are processed with a different code path than normal backends.

So, I understood you agreed that CHECK_FOR_INTERRUPTS() here does nothing.  But your patch still calls it:

+    /* An interrupt may have occurred while waiting */
+    CHECK_FOR_INTERRUPTS();

I got confused because the problem is not defined in this thread.  What problem does this patch address?  These ones?

* The startup process terminates as soon as postmaster dies.
* pg_stat_activity does not show the wait event of startup process waiting for a recovery conflict resolution.


My guess about why you decided to not call HandleStartupProcInterrupts() here is:

* Respond to postmaster death here.
* No need to reload config file here because there's no parameter to affect this conflict wait.  But
max_standby_{archive| streaming}_delay seems to affect the wait period.
 
* No need to handle SIGTERM and exit here, because the startup process doesn't wait for a conflict resolution here when
hecan terminate.
 

I think you can call HandleStartupProcInterrupts() here, instead of checking postmaster death.  Did you perform tests?

Did Simon's committed patch solve the problem as expected?

Regards
Takayuki Tsunakawa


Re: Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Thu, Mar 30, 2017 at 4:02 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> I think you can call HandleStartupProcInterrupts() here, instead of checking postmaster death.

Oops, sorry for that, I quite mess up with this patch. The WaitLatch()
call should still have WL_POSTMASTER_DEATH so as it can leave earlier,
but yes I agree with your analysis that HandleStartupProcInterrupts()
as this is part of the redo work.

> Did Simon's committed patch solve the problem as expected?

Does not seem so, I'll let Simon comment on this matter...
-- 
Michael

Attachment

Re: Allow interrupts on waiting standby

From
"Tsunakawa, Takayuki"
Date:
From: Michael Paquier [mailto:michael.paquier@gmail.com]
> Oops, sorry for that, I quite mess up with this patch. The WaitLatch() call
> should still have WL_POSTMASTER_DEATH so as it can leave earlier, but yes
> I agree with your analysis that HandleStartupProcInterrupts() as this is
> part of the redo work.

Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally?  I understood you added it for startup
processto respond quickly to events other than the postmaster death.  Why don't we restore WL_LATCH_SET?  I won't
objectto not adding the flag if there's a reason.
 

I'll mark this as ready for committer when I see WL_LATCH_SET added (optional) and you have reported that you did the
followingtest cases:
 

* Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be
resolved.

* Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be
resolved.

* Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1
toa short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be
resolved.


> > Did Simon's committed patch solve the problem as expected?
> 
> Does not seem so, I'll let Simon comment on this matter...

Agreed.  I guess his patch for earlier releases should work if CHECK_FOR_INTERRUPTS() is replaced with
HandleStartupProcInterrupts().

Regards
Takayuki Tsunakawa


Re: Allow interrupts on waiting standby

From
Michael Paquier
Date:
On Fri, Mar 31, 2017 at 4:46 PM, Tsunakawa, Takayuki
<tsunakawa.takay@jp.fujitsu.com> wrote:
> Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally?  I understood you added it for startup
processto respond quickly to events other than the postmaster death.  Why don't we restore WL_LATCH_SET?  I won't
objectto not adding the flag if there's a reason. 

It doesn't matter. MyLatch is never set in the context of the startup
process. The other code paths of the startup process using WaitLatch()
may be awaken by the latch set by the WAL receiver, but that does not
apply here.

> I'll mark this as ready for committer when I see WL_LATCH_SET added (optional)

Objection on this point.

> and you have reported that you did the following test cases:
> * Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be
resolved.
> * Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be
resolved.
> * Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1
toa short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be
resolved.

Fine for me, I have checked those multiple scenarios and the startup
process is more responsive. I have emulated the conflict with a
transaction doing repeatable read on the standby while the master was
deleting and vacuuming a table.

But, actually, after looking again at this patch with fresher eyes, I
am being a bit doubtful that this is really correct... Calling
HandleStartupProcInterrupts() is actually dangerous I think, because
it would reload the GUC context while replaying a record. But I think
that it is cleaner to reload the context after being done with a
record.

It seems to me that the correct way to do things would be to switch
all the conflict code paths to use a latch instead of pg_usleep, by
either use a dedicated latch like the recovery wakeup one, or the
MyLatch of the startup process. Then, when a signal arrives in, we
simply set up the conflict latch which wakes up what is waiting for
activity. The latch needs to be reset because calling WaitLatch().
Leaving immediately on WL_POSTMASTER_DEATH looks fine though as far as
I have tested.

As time is growing short, I am marking this patch as returned with
feedback. Thanks for the input, Tsunakawa-san!
--
Michael