Thread: [HACKERS] pg_ctl wait exit code (was Re: [COMMITTERS] pgsql: Additional tests for subtransactions in recovery)

On 4/27/17 08:41, Michael Paquier wrote:
> +$node_slave->promote;
> +$node_slave->poll_query_until('postgres',
> +   "SELECT NOT pg_is_in_recovery()")
> +  or die "Timed out while waiting for promotion of standby";
> 
> This reminds me that we should really switch PostgresNode::promote to
> use the wait mode of pg_ctl promote, and remove all those polling
> queries...

I was going to say: This should all be obsolete already, because pg_ctl
promote waits by default.

However: Failure to complete promotion within the waiting time does not
lead to an error exit, so you will not get a failure if the promotion
does not finish.  This is probably a mistake.  Looking around pg_ctl, I
found that this was handled seemingly inconsistently in do_start(), but
do_stop() errors when it does not complete.

Possible patches for this attached.

Perhaps we need a separate exit code in pg_ctl to distinguish general
errors from did not finish within timeout?
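
For illustration only (this is not the attached patch): a minimal C
sketch of the behavior discussed above, namely a wait loop that reports
"did not finish within timeout" through a nonzero exit status distinct
from a general error.  The names wait_for_state and countdown_done and
the WAIT_EXIT_* values are hypothetical; none of them come from pg_ctl.c,
and the numeric codes are assumptions, not what pg_ctl returns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical exit codes; NOT the values pg_ctl actually uses. */
enum wait_exit_code
{
	WAIT_EXIT_OK = 0,		/* awaited state was reached */
	WAIT_EXIT_ERROR = 1,	/* general error (bad arguments here) */
	WAIT_EXIT_TIMEOUT = 2	/* state not reached within the attempts */
};

/* Callback that returns true once the awaited state is reached. */
typedef bool (*state_check_fn) (void *arg);

/*
 * Poll check() at most max_attempts times, calling pause_fn (if given)
 * between attempts.  Running out of attempts is reported as a nonzero
 * status instead of being silently treated as success.
 */
static int
wait_for_state(state_check_fn check, void *arg, int max_attempts,
			   void (*pause_fn) (void))
{
	if (check == NULL || max_attempts <= 0)
		return WAIT_EXIT_ERROR;

	for (int i = 0; i < max_attempts; i++)
	{
		if (check(arg))
			return WAIT_EXIT_OK;
		if (pause_fn != NULL)
			pause_fn();
	}

	return WAIT_EXIT_TIMEOUT;
}

/* Example check: succeeds once the counter it points to reaches zero. */
static bool
countdown_done(void *arg)
{
	int		   *remaining = arg;

	if (*remaining <= 0)
		return true;
	(*remaining)--;
	return false;
}
```

A caller would pass the result straight to exit(), so that a script or
test invoking the program can tell a timeout from success by checking
the exit status rather than polling afterwards.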

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

On 5/1/17 12:19, Peter Eisentraut wrote:
> On 4/27/17 08:41, Michael Paquier wrote:
>> +$node_slave->promote;
>> +$node_slave->poll_query_until('postgres',
>> +   "SELECT NOT pg_is_in_recovery()")
>> +  or die "Timed out while waiting for promotion of standby";
>>
>> This reminds me that we should really switch PostgresNode::promote to
>> use the wait mode of pg_ctl promote, and remove all those polling
>> queries...
> 
> I was going to say: This should all be obsolete already, because pg_ctl
> promote waits by default.
> 
> However: Failure to complete promotion within the waiting time does not
> lead to an error exit, so you will not get a failure if the promotion
> does not finish.  This is probably a mistake.  Looking around pg_ctl, I
> found that this was handled seemingly inconsistently in do_start(), but
> do_stop() errors when it does not complete.
> 
> Possible patches for this attached.
> 
> Perhaps we need a separate exit code in pg_ctl to distinguish general
> errors from did not finish within timeout?

I was going to hold this back for PG11, but since we're now doing some
other tweaks in pg_ctl, it might be useful to add this too.  Thoughts?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


On Sat, Jul 1, 2017 at 4:47 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 5/1/17 12:19, Peter Eisentraut wrote:
>> However: Failure to complete promotion within the waiting time does not
>> lead to an error exit, so you will not get a failure if the promotion
>> does not finish.  This is probably a mistake.  Looking around pg_ctl, I
>> found that this was handled seemingly inconsistently in do_start(), but
>> do_stop() errors when it does not complete.

This inconsistency could be treated as a bug, though changing such
long-standing behavior in back-branches would be risky. So +1 for making
this change on HEAD only; pg_ctl promote -w is new in 10 anyway.

>> Possible patches for this attached.
>>
>> Perhaps we need a separate exit code in pg_ctl to distinguish general
>> errors from did not finish within timeout?

I would treat that as a separate item for 11, but that's as far as my
opinion goes. Per this link in pg_ctl.c the error code ought to be 4:
https://refspecs.linuxbase.org/LSB_3.1.0/LSB-Core-generic/LSB-Core-generic/iniscrptact.html

> I was going to hold this back for PG11, but since we're now doing some
> other tweaks in pg_ctl, it might be useful to add this too.  Thoughts?

The use of 0 as the exit code for the new promote -w when the timeout is
reached looks like an open item to me. Cleaning up the polling queries
after promotion would be nice to see as well.
-- 
Michael



On 7/2/17 20:28, Michael Paquier wrote:
>> I was going to hold this back for PG11, but since we're now doing some
>> other tweaks in pg_ctl, it might be useful to add this too.  Thoughts?
> 
> The use of 0 as the exit code for the new promote -w when the timeout is
> reached looks like an open item to me. Cleaning up the polling queries
> after promotion would be nice to see as well.

committed

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



On Thu, Jul 6, 2017 at 2:41 AM, Peter Eisentraut
<peter.eisentraut@2ndquadrant.com> wrote:
> On 7/2/17 20:28, Michael Paquier wrote:
>>> I was going to hold this back for PG11, but since we're now doing some
>>> other tweaks in pg_ctl, it might be useful to add this too.  Thoughts?
>>
>> The use of 0 as the exit code for the new promote -w when the timeout is
>> reached looks like an open item to me. Cleaning up the polling queries
>> after promotion would be nice to see as well.
>
> committed

Thanks for finishing the cleanup.
-- 
Michael