Thread: [HACKERS] Allow interrupts on waiting standby
Currently a waiting standby doesn't allow interrupts. Patch implements that. Barring objection, patching today with backpatches. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > Currently a waiting standby doesn't allow interrupts. > > Patch implements that. > > Barring objection, patching today with backpatches. "today" is a little quick, but the patch looks fine. I doubt anyone's going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2017-01-26 12:24:44 -0500, Robert Haas wrote: > On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > > Currently a waiting standby doesn't allow interrupts. > > > > Patch implements that. > > > > Barring objection, patching today with backpatches. > > "today" is a little quick, but the patch looks fine. I doubt anyone's > going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call. I don't quite get asking for agreement, and then not waiting as suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS for now, but I think it'd better to replace it with a latch. Andres
On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: > On 2017-01-26 12:24:44 -0500, Robert Haas wrote: >> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> > Currently a waiting standby doesn't allow interrupts. >> > >> > Patch implements that. >> > >> > Barring objection, patching today with backpatches. >> >> "today" is a little quick, but the patch looks fine. I doubt anyone's >> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call. > > I don't quite get asking for agreement, and then not waiting as > suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS > for now, but I think it'd better to replace it with a latch. I have waited, so not sure what you mean. Tomorrow is too late. Replacing with a latch wouldn't be backpatchable, IMHO. I've no problem if you want to work on a deeper fix for future versions. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2017-01-26 19:36:11 +0000, Simon Riggs wrote: > On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: > > On 2017-01-26 12:24:44 -0500, Robert Haas wrote: > >> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> > Currently a waiting standby doesn't allow interrupts. > >> > > >> > Patch implements that. > >> > > >> > Barring objection, patching today with backpatches. > >> > >> "today" is a little quick, but the patch looks fine. I doubt anyone's > >> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call. > > > > I don't quite get asking for agreement, and then not waiting as > > suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS > > for now, but I think it'd better to replace it with a latch. > > I have waited, so not sure what you mean. Well, Robert today said >> "today" is a little quick <<. > Tomorrow is too late. Huh? We're not wrapping today/tomorrow, are we? If I missed something and we are, then sure, it makes sense to push ahead. > Replacing with a latch wouldn't be backpatchable, IMHO. Hm, don't quite see why - isn't it just like three lines? Andres
* Andres Freund (andres@anarazel.de) wrote: > On 2017-01-26 19:36:11 +0000, Simon Riggs wrote: > > Tomorrow is too late. > > Huh? We're not wrapping today/tomorrow, are we? If I missed something > and we are, then sure, it makes sense to push ahead. I haven't seen anyone suggest that we're changing from the regular release cycle, which would mean that we wrap on Monday, Feb 6th, for release on Feb 9th. Thanks! Stephen
Simon Riggs wrote: > On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: > > I'm personally fine with going with a CHECK_FOR_INTERRUPTS > > for now, but I think it'd better to replace it with a latch. > > I have waited, so not sure what you mean. Tomorrow is too late. > > Replacing with a latch wouldn't be backpatchable, IMHO. Hmm, you pushed this to the master branch as commit e8ee3d6b859a, but I see no backpatch. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: >> On 2017-01-26 12:24:44 -0500, Robert Haas wrote: >>> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> > Currently a waiting standby doesn't allow interrupts. >>> > >>> > Patch implements that. >>> > >>> > Barring objection, patching today with backpatches. >>> >>> "today" is a little quick, but the patch looks fine. I doubt anyone's >>> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call. >> >> I don't quite get asking for agreement, and then not waiting as >> suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS >> for now, but I think it'd better to replace it with a latch. > > I have waited, so not sure what you mean. Tomorrow is too late. This gives really little time for any feedback :( > Replacing with a latch wouldn't be backpatchable, IMHO. > I've no problem if you want to work on a deeper fix for future versions. A deeper fix for HEAD proves to not be that complicated. Please see the attached. The other two calls of pg_usleep() in standby.c are waiting with 5ms and 10ms, it is not worth switching them to a latch. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 26 January 2017 at 20:36, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Simon Riggs wrote: >> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: > >> > I'm personally fine with going with a CHECK_FOR_INTERRUPTS >> > for now, but I think it'd better to replace it with a latch. >> >> I have waited, so not sure what you mean. Tomorrow is too late. >> >> Replacing with a latch wouldn't be backpatchable, IMHO. > > Hmm, you pushed this to the master branch as commit e8ee3d6b859a, but I > see no backpatch. There appeared to be a complaint at my speed, so I slowed down. On reading that there are no objections to this being backpatched, I will do that now. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 27 January 2017 at 01:35, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: >>> I'm personally fine with going with a CHECK_FOR_INTERRUPTS >>> for now, but I think it'd better to replace it with a latch. >> Replacing with a latch wouldn't be backpatchable, IMHO. >> I've no problem if you want to work on a deeper fix for future versions. > > A deeper fix for HEAD proves to not be that complicated. Please see > the attached. The other two calls of pg_usleep() in standby.c are > waiting with 5ms and 10ms, it is not worth switching them to a latch. So you think 2 calls to pg_usleep() can stay; my opinion is 3 can stay. I'm not clear why this particular call is worthy, while dozens of calls in other modules remain unchanged. This seems like a code issue rather than anything to do with Hot Standby in particular, so it should be another thread. Doesn't seem important compared to other things for this release I should work on. Please add to the next CF so it gets proper review. -- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Fri, Jan 27, 2017 at 9:23 PM, Simon Riggs <simon@2ndquadrant.com> wrote: > On 27 January 2017 at 01:35, Michael Paquier <michael.paquier@gmail.com> wrote: >> On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: >> A deeper fix for HEAD proves to not be that complicated. Please see >> the attached. The other two calls of pg_usleep() in standby.c are >> waiting with 5ms and 10ms, it is not worth switching them to a latch. > > So you think 2 calls to pg_usleep() can stay; my opinion is 3 can stay. This patch replaces one call of pg_usleep() where the wait can go up to 1s, this is largely noticeable by the user. The two others sleep for a maximum of 10ms. Even if a latch is used the code path is going to exit immediately anyway. That's why you added a call to CHECK_FOR_INTERRUPTS only here, no? > I'm not clear why this particular call is worthy, while dozens of > calls in other modules remain unchanged. This seems like a code issue > rather than anything to do with Hot Standby in particular, so it > should be another thread. Some should switch to Latch. > Doesn't seem important compared to other > things for this release I should work on. That's your call. > Please add to the next CF so it gets proper review. Sure. -- Michael
On Fri, Jan 27, 2017 at 10:35 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > On Fri, Jan 27, 2017 at 4:36 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> On 26 January 2017 at 19:20, Andres Freund <andres@anarazel.de> wrote: >>> On 2017-01-26 12:24:44 -0500, Robert Haas wrote: >>>> On Thu, Jan 26, 2017 at 7:18 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> > Currently a waiting standby doesn't allow interrupts. >>>> > >>>> > Patch implements that. >>>> > >>>> > Barring objection, patching today with backpatches. >>>> >>>> "today" is a little quick, but the patch looks fine. I doubt anyone's >>>> going to screech too loud about adding a CHECK_FOR_INTERRUPTS() call. >>> >>> I don't quite get asking for agreement, and then not waiting as >>> suggested. I'm personally fine with going with a CHECK_FOR_INTERRUPTS >>> for now, but I think it'd better to replace it with a latch. >> >> I have waited, so not sure what you mean. Tomorrow is too late. > > This gives really little time for any feedback :( > >> Replacing with a latch wouldn't be backpatchable, IMHO. >> I've no problem if you want to work on a deeper fix for future versions. > > A deeper fix for HEAD proves to not be that complicated. Please see > the attached. The other two calls of pg_usleep() in standby.c are > waiting with 5ms and 10ms, it is not worth switching them to a latch. Two things I forgot in this patch: - documentation for the new wait event - the string for the wait event or this would show up as "???" in pg_stat_activity. There are no default clauses in the pgstat_get_wait_* routines so my compiler is actually complaining... -- Michael
On Fri, Jan 27, 2017 at 8:17 AM, Michael Paquier <michael.paquier@gmail.com> wrote: > There are no default clauses in the pgstat_get_wait_* routines so my > compiler is actually complaining... That's exactly WHY there are no default clauses there. :-) -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Sat, Jan 28, 2017 at 2:18 AM, Robert Haas <robertmhaas@gmail.com> wrote: > On Fri, Jan 27, 2017 at 8:17 AM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> There are no default clauses in the pgstat_get_wait_* routines so my >> compiler is actually complaining... > > That's exactly WHY there are no default clauses there. :-) And that's exactly why I missed them! ;p -- Michael
On Fri, Jan 27, 2017 at 10:17 PM, Michael Paquier <michael.paquier@gmail.com> wrote: > Two things I forgot in this patch: > - documentation for the new wait event > - the string for the wait event or this would show up as "???" in > pg_stat_activity. > There are no default clauses in the pgstat_get_wait_* routines so my > compiler is actually complaining... Both things are fixed in the new version attached. I have added as well this patch to the next commit fest: https://commitfest.postgresql.org/13/977/ -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
On 1/30/17 20:34, Michael Paquier wrote: > On Fri, Jan 27, 2017 at 10:17 PM, Michael Paquier > <michael.paquier@gmail.com> wrote: >> Two things I forgot in this patch: >> - documentation for the new wait event >> - the string for the wait event or this would show up as "???" in >> pg_stat_activity. >> There are no default clauses in the pgstat_get_wait_* routines so my >> compiler is actually complaining... > > Both things are fixed in the new version attached. I have added as > well this patch to the next commit fest: > https://commitfest.postgresql.org/13/977/ I'm not clear now what this patch is intending to accomplish, seeing that the original issue has already been fixed. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Mar 8, 2017 at 5:45 AM, Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote: > On 1/30/17 20:34, Michael Paquier wrote: >> Both things are fixed in the new version attached. I have added as >> well this patch to the next commit fest: >> https://commitfest.postgresql.org/13/977/ > > I'm not clear now what this patch is intending to accomplish, seeing > that the original issue has already been fixed. This makes ResolveRecoveryConflictWithVirtualXIDs() more responsive if the process is signaled, as the wait in pg_usleep can go up to 1s if things remain stuck for a long time. -- Michael
Michael Paquier <michael.paquier@gmail.com> writes: > Both things are fixed in the new version attached. I have added as > well this patch to the next commit fest: > https://commitfest.postgresql.org/13/977/ Couple of thoughts on this patch --- 1. Shouldn't WaitExceedsMaxStandbyDelay's CHECK_FOR_INTERRUPTS be moved to after the WaitLatch call? Not much point in being woken immediately by an interrupt if you're not going to respond. 2. Is it OK to ResetLatch here? If the only possible latch event in this process is interrupt requests, then I think WaitLatch, then ResetLatch, then CHECK_FOR_INTERRUPTS is OK; but otherwise it seems like you risk discarding events that need to be serviced later. 3. In the same vein, if we're going to check WL_POSTMASTER_DEATH, should there be a test for that and immediate exit(1) here? 4. I'd be inclined to increase the sleep interval only if we did time out, not if we were awakened by some other event. 5. The comment about maximum sleep length needs some work. At first glance you might think that without the motivation of preventing long uninterruptible sleeps, we might as well allow the sleep length to grow well past 1s. I think that'd be bad, because we want to wake up reasonably soon after the xact(s) we're waiting for commit. But neither the original text nor the proposed replacement mention this. regards, tom lane
On Sun, Mar 19, 2017 at 5:14 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Couple of thoughts on this patch --- Thanks! > 1. Shouldn't WaitExceedsMaxStandbyDelay's CHECK_FOR_INTERRUPTS be moved to > after the WaitLatch call? Not much point in being woken immediately by > an interrupt if you're not going to respond. > > 2. Is it OK to ResetLatch here? If the only possible latch event in this > process is interrupt requests, then I think WaitLatch, then ResetLatch, > then CHECK_FOR_INTERRUPTS is OK; but otherwise it seems like you risk > discarding events that need to be serviced later. Right, I have switched to WaitLatch(), ResetLatch() and then CHECK_FOR_INTERRUPTS(). > 3. In the same vein, if we're going to check WL_POSTMASTER_DEATH, should > there be a test for that and immediate exit(1) here? OK, if the postmaster has died, there is not much recovery conflict needed anyway. > 4. I'd be inclined to increase the sleep interval only if we did time out, > not if we were awakened by some other event. OK, that makes sense. > 5. The comment about maximum sleep length needs some work. At first > glance you might think that without the motivation of preventing long > uninterruptible sleeps, we might as well allow the sleep length to grow > well past 1s. I think that'd be bad, because we want to wake up > reasonably soon after the xact(s) we're waiting for commit. But neither > the original text nor the proposed replacement mention this. OK, I did some work on this comment. What do you think about the updated version attached? -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
Hi, Michael, From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier > What do you think about the updated version attached? I reviewed this patch. Here are some comments and questions: (1) monitoring.sgml The new row needs to be placed in the "Timeout" group. The current patch puts the row at the end of IO group. Please insertthe new row according to the alphabetical order. In addition, "morerows" attribute of the following line appears to be increased. <entry morerows="2"><literal>Timeout</></entry> By the way, doesn't this wait event belong to IPC wait event type, because the process is waiting for other conflicting processesto terminate the conflict conditions? Did you choose Timeout type because max_standby_{archive | streaming}_delayrelates to this wait? I'm not confident which is appropriate, but I'm afraid users can associate this waitwith a timeout. (2) standby.c Do we need to specify WL_LATCH_SET? Who can set the latch? Do the backends who ended the conflict set the latch? (3) standby.c + if (rc & WL_LATCH_SET) + ResetLatch(MyLatch); + + /* emergency bailout if postmaster has died */ + if (rc & WL_POSTMASTER_DEATH) + proc_exit(1); I thought the child processes have to terminate as soon as postmaster vanishes. So, it would be better for the order ofthe two if statements above to be reversed. proc_exit() could be exit(), as some children do in postmaster/*.c. (4) standby.c The latch is not reset when the wait timed out. The next WaitLatch() would return immediately. Regards Takayuki Tsunakawa
(4) standby.c > The latch is not reset when the wait timed out. The next WaitLatch() would > return immediately. Sorry, let me withdraw this. This is my misunderstanding. OTOH, when is the latch reset before the wait? Is there an assumption that MyLatch has been in reset state since it wasinitialized? Is it safe to use MyLatch here, not the special latch like something used in xlog.c? Regards Takayuki Tsunakawa
On Wed, Mar 29, 2017 at 5:04 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: pgsql-hackers-owner@postgresql.org >> [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier >> What do you think about the updated version attached? > I reviewed this patch. Here are some comments and questions: Thanks! > (1) monitoring.sgml > The new row needs to be placed in the "Timeout" group. The current patch puts the row at the end of IO group. Pleaseinsert the new row according to the alphabetical order. > > In addition, "morerows" attribute of the following line appears to be increased. > > <entry morerows="2"><literal>Timeout</></entry> > > By the way, doesn't this wait event belong to IPC wait event type, because the process is waiting for other conflictingprocesses to terminate the conflict conditions? Did you choose Timeout type because max_standby_{archive | streaming}_delayrelates to this wait? I'm not confident which is appropriate, but I'm afraid users can associate this waitwith a timeout. Actually I think that this event belongs to the timeout category, because we wait until the timeout has finished, the latch being here to be sure that the process is more responsive in case of a postmaster death. > (2) standby.c > Do we need to specify WL_LATCH_SET? Who can set the latch? Do the backends who ended the conflict set the latch? This makes the process able to react on SIGHUP. That's useful for responsiveness. > (3) standby.c > + if (rc & WL_LATCH_SET) > + ResetLatch(MyLatch); > + > + /* emergency bailout if postmaster has died */ > + if (rc & WL_POSTMASTER_DEATH) > + proc_exit(1); > > I thought the child processes have to terminate as soon as postmaster vanishes. So, it would be better for the order ofthe two if statements above to be reversed. proc_exit() could be exit(), as some children do in postmaster/*.c. OK, reversed this order. Attached is v4. -- Michael
Attachment
From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier > > By the way, doesn't this wait event belong to IPC wait event type, because > the process is waiting for other conflicting processes to terminate the > conflict conditions? Did you choose Timeout type because > max_standby_{archive | streaming}_delay relates to this wait? I'm not > confident which is appropriate, but I'm afraid users can associate this > wait with a timeout. > > Actually I think that this event belongs to the timeout category, because > we wait until the timeout has finished, the latch being here to be sure > that the process is more responsive in case of a postmaster death. OK. I confirmed the doc is fixed. > > (2) standby.c > > Do we need to specify WL_LATCH_SET? Who can set the latch? Do the > backends who ended the conflict set the latch? > > This makes the process able to react on SIGHUP. That's useful for > responsiveness. Oh, I see. But how does the startup process respond quickly? It seems that you need to call HandleStartupProcInterrupts()instead of CHECK_FOR_INTERRUPTS(). But I'm not sure whether HandleStartupProcInterrupts() canbe called here. > > (3) standby.c > > + if (rc & WL_LATCH_SET) > > + ResetLatch(MyLatch); > > + > > + /* emergency bailout if postmaster has died */ > > + if (rc & WL_POSTMASTER_DEATH) > > + proc_exit(1); > > > > I thought the child processes have to terminate as soon as postmaster > vanishes. So, it would be better for the order of the two if statements > above to be reversed. proc_exit() could be exit(), as some children do > in postmaster/*.c. > > OK, reversed this order. I think exit() instead of proc_exit() better represents what the code wants to do -- terminate the process ASAP without cleaningup. Many other background children do so. Regards Takayuki Tsunakawa
On Thu, Mar 30, 2017 at 1:52 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > From: pgsql-hackers-owner@postgresql.org >> > (2) standby.c >> > Do we need to specify WL_LATCH_SET? Who can set the latch? Do the >> backends who ended the conflict set the latch? >> >> This makes the process able to react on SIGHUP. That's useful for >> responsiveness. > > Oh, I see. But how does the startup process respond quickly? It seems that you need to call HandleStartupProcInterrupts()instead of CHECK_FOR_INTERRUPTS(). But I'm not sure whether HandleStartupProcInterrupts() canbe called here. Bah. Of course you are right. We don't care about SetLatch() here as signals are processed with a different code path than normal backends. >> > (3) standby.c >> > + if (rc & WL_LATCH_SET) >> > + ResetLatch(MyLatch); >> > + >> > + /* emergency bailout if postmaster has died */ >> > + if (rc & WL_POSTMASTER_DEATH) >> > + proc_exit(1); >> > >> > I thought the child processes have to terminate as soon as postmaster >> vanishes. So, it would be better for the order of the two if statements >> above to be reversed. proc_exit() could be exit(), as some children do >> in postmaster/*.c. >> >> OK, reversed this order. > > I think exit() instead of proc_exit() better represents what the code wants to do -- terminate the process ASAP withoutcleaning up. Many other background children do so. Hm... OK. -- Michael
Attachment
Hi Michael, Simon, From: pgsql-hackers-owner@postgresql.org > [mailto:pgsql-hackers-owner@postgresql.org] On Behalf Of Michael Paquier > > Oh, I see. But how does the startup process respond quickly? It seems > that you need to call HandleStartupProcInterrupts() instead of > CHECK_FOR_INTERRUPTS(). But I'm not sure whether > HandleStartupProcInterrupts() can be called here. > > Bah. Of course you are right. We don't care about SetLatch() here as signals > are processed with a different code path than normal backends. So, I understood you agreed that CHECK_FOR_INTERRUPTS() here does nothing. But your patch still calls it: + /* An interrupt may have occurred while waiting */ + CHECK_FOR_INTERRUPTS(); I got confused because the problem is not defined in this thread. What problem does this patch address? These ones? * The startup process terminates as soon as postmaster dies. * pg_stat_activity does not show the wait event of startup process waiting for a recovery conflict resolution. My guess about why you decided to not call HandleStartupProcInterrupts() here is: * Respond to postmaster death here. * No need to reload config file here because there's no parameter to affect this conflict wait. But max_standby_{archive| streaming}_delay seems to affect the wait period. * No need to handle SIGTERM and exit here, because the startup process doesn't wait for a conflict resolution here when hecan terminate. I think you can call HandleStartupProcInterrupts() here, instead of checking postmaster death. Did you perform tests? Did Simon's committed patch solve the problem as expected? Regards Takayuki Tsunakawa
On Thu, Mar 30, 2017 at 4:02 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > I think you can call HandleStartupProcInterrupts() here, instead of checking postmaster death. Oops, sorry for that, I quite mess up with this patch. The WaitLatch() call should still have WL_POSTMASTER_DEATH so as it can leave earlier, but yes I agree with your analysis that HandleStartupProcInterrupts() as this is part of the redo work. > Did Simon's committed patch solve the problem as expected? Does not seem so, I'll let Simon comment on this matter... -- Michael
Attachment
From: Michael Paquier [mailto:michael.paquier@gmail.com] > Oops, sorry for that, I quite mess up with this patch. The WaitLatch() call > should still have WL_POSTMASTER_DEATH so as it can leave earlier, but yes > I agree with your analysis that HandleStartupProcInterrupts() as this is > part of the redo work. Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally? I understood you added it for startup processto respond quickly to events other than the postmaster death. Why don't we restore WL_LATCH_SET? I won't objectto not adding the flag if there's a reason. I'll mark this as ready for committer when I see WL_LATCH_SET added (optional) and you have reported that you did the followingtest cases: * Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be resolved. * Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be resolved. * Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1 toa short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be resolved. > > Did Simon's committed patch solve the problem as expected? > > Does not seem so, I'll let Simon comment on this matter... Agreed. I guess his patch for earlier releases should work if CHECK_FOR_INTERRUPTS() is replaced with HandleStartupProcInterrupts(). Regards Takayuki Tsunakawa
On Fri, Mar 31, 2017 at 4:46 PM, Tsunakawa, Takayuki <tsunakawa.takay@jp.fujitsu.com> wrote: > Thank you, but did you remove WL_LATCH_SET from WaitLatch() intentionally? I understood you added it for startup processto respond quickly to events other than the postmaster death. Why don't we restore WL_LATCH_SET? I won't objectto not adding the flag if there's a reason. It doesn't matter. MyLatch is never set in the context of the startup process. The other code paths of the startup process using WaitLatch() may be awaken by the latch set by the WAL receiver, but that does not apply here. > I'll mark this as ready for committer when I see WL_LATCH_SET added (optional) Objection on this point. > and you have reported that you did the following test cases: > * Startup process vanishes immediately after postmaster dies, while it is waiting for a recovery conflict to be resolved. > * Startup process vanishes immediately after "pg_ctl stop -m fast", while it is waiting for a recovery conflict to be resolved. > * Startup process resumes WAL application when max_standby_{archive | streaming}_delay is changed from the default -1 toa short period, e.g. 10s, and "pg_ctl reload" is performed, while it is waiting for a recovery conflict to be resolved. Fine for me, I have checked those multiple scenarios and the startup process is more responsive. I have emulated the conflict with a transaction doing repeatable read on the standby while the master was deleting and vacuuming a table. But, actually, after looking again at this patch with fresher eyes, I am being a bit doubtful that this is really correct... Calling HandleStartupProcInterrupts() is actually dangerous I think, because it would reload the GUC context while replaying a record. But I think that it is cleaner to reload the context after being done with a record. It seems to me that the correct way to do things would be to switch all the conflict code paths to use a latch instead of pg_usleep, by either use a dedicated latch like the recovery wakeup one, or the MyLatch of the startup process. Then, when a signal arrives in, we simply set up the conflict latch which wakes up what is waiting for activity. The latch needs to be reset because calling WaitLatch(). Leaving immediately on WL_POSTMASTER_DEATH looks fine though as far as I have tested. As time is growing short, I am marking this patch as returned with feedback. Thanks for the input, Tsunakawa-san! -- Michael