Thread: [HACKERS] pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)
[HACKERS] pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)
From
Peter Eisentraut
Date:
On 4/27/17 08:41, Michael Paquier wrote:
> +$node_slave->promote;
> +$node_slave->poll_query_until('postgres',
> +    "SELECT NOT pg_is_in_recovery()")
> +  or die "Timed out while waiting for promotion of standby";
>
> This reminds me that we should really switch PostgresNode::promote to
> use the wait mode of pg_ctl promote, and remove all those polling
> queries...

I was going to say: This should all be obsolete already, because pg_ctl
promote waits by default.

However: Failure to complete promotion within the waiting time does not
lead to an error exit, so you will not get a failure if the promotion
does not finish. This is probably a mistake. Looking around pg_ctl, I
found that this was handled seemingly inconsistently in do_start(), but
do_stop() errors when it does not complete.

Possible patches for this attached.

Perhaps we need a separate exit code in pg_ctl to distinguish general
errors from did not finish within timeout?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Attachment
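To make the problem described in the message above concrete, here is a minimal Python sketch of a bounded wait loop whose return value stands in for the process exit status. It is a model under stated assumptions, not pg_ctl's actual C code: the names wait_for_promotion, in_recovery, and report_timeout_as_error are hypothetical. With report_timeout_as_error set to False, the loop behaves as described (exit status 0 even though promotion never finished); with True, a timeout becomes an error exit.

```python
import time


def wait_for_promotion(in_recovery, wait_seconds, report_timeout_as_error,
                       poll_interval=1.0, sleep=time.sleep):
    """Model of a bounded wait loop; returns a process-style exit status.

    Hypothetical names throughout: this is not pg_ctl's real code.
    in_recovery is any zero-argument callable that returns True while the
    server is still in recovery.
    """
    attempts = max(1, int(wait_seconds / poll_interval))
    for _ in range(attempts):
        if not in_recovery():
            return 0  # promotion finished within the waiting time
        sleep(poll_interval)
    # Waiting time exhausted and the server is still in recovery.
    return 1 if report_timeout_as_error else 0


if __name__ == "__main__":
    never_promoted = lambda: True    # stand-in for a standby stuck in recovery
    no_sleep = lambda seconds: None  # skip real sleeping in the demo
    # Timeout is silently reported as success.
    print(wait_for_promotion(never_promoted, 3, False, sleep=no_sleep))
    # Timeout is reported as an error exit.
    print(wait_for_promotion(never_promoted, 3, True, sleep=no_sleep))
```

In the second mode, a caller such as a test script can rely on the exit status alone and no longer needs a follow-up polling query to detect a promotion that did not complete.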
[HACKERS] Re: pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)
From
Peter Eisentraut
Date:
On 5/1/17 12:19, Peter Eisentraut wrote:
> On 4/27/17 08:41, Michael Paquier wrote:
>> +$node_slave->promote;
>> +$node_slave->poll_query_until('postgres',
>> +    "SELECT NOT pg_is_in_recovery()")
>> +  or die "Timed out while waiting for promotion of standby";
>>
>> This reminds me that we should really switch PostgresNode::promote to
>> use the wait mode of pg_ctl promote, and remove all those polling
>> queries...
>
> I was going to say: This should all be obsolete already, because pg_ctl
> promote waits by default.
>
> However: Failure to complete promotion within the waiting time does not
> lead to an error exit, so you will not get a failure if the promotion
> does not finish. This is probably a mistake. Looking around pg_ctl, I
> found that this was handled seemingly inconsistently in do_start(), but
> do_stop() errors when it does not complete.
>
> Possible patches for this attached.
>
> Perhaps we need a separate exit code in pg_ctl to distinguish general
> errors from did not finish within timeout?

I was going to hold this back for PG11, but since we're now doing some
other tweaks in pg_ctl, it might be useful to add this too. Thoughts?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Attachment
[HACKERS] Re: pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)
From
Michael Paquier
Date:
On Sat, Jul 1, 2017 at 4:47 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 5/1/17 12:19, Peter Eisentraut wrote:
>> However: Failure to complete promotion within the waiting time does not
>> lead to an error exit, so you will not get a failure if the promotion
>> does not finish. This is probably a mistake. Looking around pg_ctl, I
>> found that this was handled seemingly inconsistently in do_start(), but
>> do_stop() errors when it does not complete.

This inconsistency could be treated as a bug, though changing such an
old behavior in back-branches would be risky. So +1 for making such a
change on HEAD only, and pg_ctl promote -w is new in 10.

>> Possible patches for this attached.
>>
>> Perhaps we need a separate exit code in pg_ctl to distinguish general
>> errors from did not finish within timeout?

I would treat that as a separate item for 11, but that's as far as my
opinion goes. Per this link in pg_ctl.c the error code ought to be 4:
https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

> I was going to hold this back for PG11, but since we're now doing some
> other tweaks in pg_ctl, it might be useful to add this too. Thoughts?

The use of 0 as exit code for the new promote -w if timeout is reached
looks like an open item to me. Cleaning up the polling queries after
promotion would be nice to see as well.
--
Michael
Re: [HACKERS] Re: pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)
From
Peter Eisentraut
Date:
On 7/2/17 20:28, Michael Paquier wrote:
>> I was going to hold this back for PG11, but since we're now doing some
>> other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
>
> The use of 0 as exit code for the new promote -w if timeout is reached
> looks like an open item to me. Cleaning up the polling queries after
> promotion would be nice to see as well.

committed

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Re: pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)
From
Michael Paquier
Date:
On Thu, Jul 6, 2017 at 2:41 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 7/2/17 20:28, Michael Paquier wrote:
>>> I was going to hold this back for PG11, but since we're now doing some
>>> other tweaks in pg_ctl, it might be useful to add this too. Thoughts?
>>
>> The use of 0 as exit code for the new promote -w if timeout is reached
>> looks like an open item to me. Cleaning up the polling queries after
>> promotion would be nice to see as well.
>
> committed

Thanks for finishing the cleanup.
--
Michael