Thread: Return value of pg_promote()

Return value of pg_promote()

From
Ashutosh Sharma
Date:
Hi All,

At present, pg_promote() returns true to the caller on successful
promotion of standby, however it returns false in multiple scenarios
which includes:

1) The SIGUSR1 signal could not be sent to the postmaster process.
2) The postmaster died during standby promotion.
3) Standby couldn't be promoted within the specified wait time.

For an application calling this function, if pg_promote returns false,
it is hard to interpret the reason behind it. So I think we should
*only* allow pg_promote to return false when the server could not be
promoted in the given wait time and in other scenarios it should just
throw an error (FATAL, ERROR ... depending on the type of failure that
occurred). Please let me know your thoughts on this change. thanks.!

--
With Regards,
Ashutosh Sharma.



Re: Return value of pg_promote()

From
Laurenz Albe
Date:
On Tue, 2023-06-06 at 16:35 +0530, Ashutosh Sharma wrote:
> At present, pg_promote() returns true to the caller on successful
> promotion of standby, however it returns false in multiple scenarios
> which includes:
>
> 1) The SIGUSR1 signal could not be sent to the postmaster process.
> 2) The postmaster died during standby promotion.
> 3) Standby couldn't be promoted within the specified wait time.
>
> For an application calling this function, if pg_promote returns false,
> it is hard to interpret the reason behind it. So I think we should
> *only* allow pg_promote to return false when the server could not be
> promoted in the given wait time and in other scenarios it should just
> throw an error (FATAL, ERROR ... depending on the type of failure that
> occurred). Please let me know your thoughts on this change. thanks.!

As the original author, I'd say that that sounds reasonable, particularly
in case #1.  If the postmaster dies, we are going to die too, so it
probably doesn't matter much.  But I think an error is certainly also
correct in that case.

Yours,
Laurenz Albe



Re: Return value of pg_promote()

From
Fujii Masao
Date:

On 2023/06/07 2:00, Laurenz Albe wrote:
> On Tue, 2023-06-06 at 16:35 +0530, Ashutosh Sharma wrote:
>> At present, pg_promote() returns true to the caller on successful
>> promotion of standby, however it returns false in multiple scenarios
>> which includes:
>>
>> 1) The SIGUSR1 signal could not be sent to the postmaster process.
>> 2) The postmaster died during standby promotion.
>> 3) Standby couldn't be promoted within the specified wait time.
>>
>> For an application calling this function, if pg_promote returns false,
>> it is hard to interpret the reason behind it. So I think we should
>> *only* allow pg_promote to return false when the server could not be
>> promoted in the given wait time and in other scenarios it should just
>> throw an error (FATAL, ERROR ... depending on the type of failure that
>> occurred). Please let me know your thoughts on this change. thanks.!
> 
> As the original author, I'd say that that sounds reasonable, particularly
> in case #1.  If the postmaster dies, we are going to die too, so it
> probably doesn't matter much.  But I think an error is certainly also
> correct in that case.

+1

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: Return value of pg_promote()

From
Ashutosh Sharma
Date:
On Wed, Jun 7, 2023 at 9:55 PM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
>
>
>
> On 2023/06/07 2:00, Laurenz Albe wrote:
> > On Tue, 2023-06-06 at 16:35 +0530, Ashutosh Sharma wrote:
> >> At present, pg_promote() returns true to the caller on successful
> >> promotion of standby, however it returns false in multiple scenarios
> >> which includes:
> >>
> >> 1) The SIGUSR1 signal could not be sent to the postmaster process.
> >> 2) The postmaster died during standby promotion.
> >> 3) Standby couldn't be promoted within the specified wait time.
> >>
> >> For an application calling this function, if pg_promote returns false,
> >> it is hard to interpret the reason behind it. So I think we should
> >> *only* allow pg_promote to return false when the server could not be
> >> promoted in the given wait time and in other scenarios it should just
> >> throw an error (FATAL, ERROR ... depending on the type of failure that
> >> occurred). Please let me know your thoughts on this change. thanks.!
> >
> > As the original author, I'd say that that sounds reasonable, particularly
> > in case #1.  If the postmaster dies, we are going to die too, so it
> > probably doesn't matter much.  But I think an error is certainly also
> > correct in that case.
>
> +1
>

Thanks for sharing your thoughts, Laurenz and Fujii-san. I've prepared
a patch that makes pg_promote error out if it couldn't send SIGUSR1 to
the postmaster or if the postmaster died in the middle of standby
promotion. PFA. Please note that now (with this patch) pg_promote only
returns false if the standby could not be promoted within the given
wait time. In case of any kind of failure, it just reports an error
based on the type of failure that occurred.

--
With Regards,
Ashutosh Sharma.

Attachment

Re: Return value of pg_promote()

From
Michael Paquier
Date:
On Thu, Jun 08, 2023 at 04:53:50PM +0530, Ashutosh Sharma wrote:
> Thanks for sharing your thoughts, Laurenz and Fujii-san. I've prepared
> a patch that makes pg_promote error out if it couldn't send SIGUSR1 to
> the postmaster or if the postmaster died in the middle of standby
> promotion. PFA. Please note that now (with this patch) pg_promote only
> returns false if the standby could not be promoted within the given
> wait time. In case of any kind of failure, it just reports an error
> based on the type of failure that occurred.

     if (kill(PostmasterPid, SIGUSR1) != 0)
     {
-        ereport(WARNING,
-                (errmsg("failed to send signal to postmaster: %m")));
         (void) unlink(PROMOTE_SIGNAL_FILE);
-        PG_RETURN_BOOL(false);
+        ereport(ERROR,
+                (errmsg("failed to send signal to postmaster: %m")));
     }

Shouldn't you assign an error code to this one rather than the
default one for internal errors, like ERRCODE_SYSTEM_ERROR?

     /* return immediately if waiting was not requested */
@@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
          * necessity for manual cleanup of all postmaster children.
          */
         if (rc & WL_POSTMASTER_DEATH)
-            PG_RETURN_BOOL(false);
+            ereport(FATAL,
+                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
+                     errmsg("terminating connection due to unexpected postmaster exit")));

I would add an errcontext here, to let somebody know that the
connection died while waiting for the promotion to be processed, say
"while waiting on promotion".
--
Michael

Attachment

Re: Return value of pg_promote()

From
Michael Paquier
Date:
On Wed, Aug 16, 2023 at 05:02:09PM +0900, Michael Paquier wrote:
>      if (kill(PostmasterPid, SIGUSR1) != 0)
>      {
> -        ereport(WARNING,
> -                (errmsg("failed to send signal to postmaster: %m")));
>          (void) unlink(PROMOTE_SIGNAL_FILE);
> -        PG_RETURN_BOOL(false);
> +        ereport(ERROR,
> +                (errmsg("failed to send signal to postmaster: %m")));
>      }
>
> Shouldn't you assign an error code to this one rather than the
> default one for internal errors, like ERRCODE_SYSTEM_ERROR?
>
>      /* return immediately if waiting was not requested */
> @@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
>           * necessity for manual cleanup of all postmaster children.
>           */
>          if (rc & WL_POSTMASTER_DEATH)
> -            PG_RETURN_BOOL(false);
> +            ereport(FATAL,
> +                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
> +                     errmsg("terminating connection due to unexpected postmaster exit")));
>
> I would add an errcontext here, to let somebody know that the
> connection died while waiting for the promotion to be processed, say
> "while waiting on promotion".

I have just noticed that we do not have a CF entry for this proposal,
so I have added one with Laurenz as author:
https://commitfest.postgresql.org/44/4504/

For now the patch is waiting on author.  Could you address my
last review?
--
Michael

Attachment

Re: Return value of pg_promote()

From
Ashutosh Sharma
Date:
Hi Michael,

On Thu, Aug 17, 2023 at 6:07 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Aug 16, 2023 at 05:02:09PM +0900, Michael Paquier wrote:
> >      if (kill(PostmasterPid, SIGUSR1) != 0)
> >      {
> > -        ereport(WARNING,
> > -                (errmsg("failed to send signal to postmaster: %m")));
> >          (void) unlink(PROMOTE_SIGNAL_FILE);
> > -        PG_RETURN_BOOL(false);
> > +        ereport(ERROR,
> > +                (errmsg("failed to send signal to postmaster: %m")));
> >      }
> >
> > Shouldn't you assign an error code to this one rather than the
> > default one for internal errors, like ERRCODE_SYSTEM_ERROR?
> >
> >      /* return immediately if waiting was not requested */
> > @@ -744,7 +743,9 @@ pg_promote(PG_FUNCTION_ARGS)
> >           * necessity for manual cleanup of all postmaster children.
> >           */
> >          if (rc & WL_POSTMASTER_DEATH)
> > -            PG_RETURN_BOOL(false);
> > +            ereport(FATAL,
> > +                    (errcode(ERRCODE_ADMIN_SHUTDOWN),
> > +                     errmsg("terminating connection due to unexpected postmaster exit")));
> >
> > I would add an errcontext here, to let somebody know that the
> > connection died while waiting for the promotion to be processed, say
> > "while waiting on promotion".
>
> I have just noticed that we do not have a CF entry for this proposal,
> so I have added one with Laurenz as author:
> https://commitfest.postgresql.org/44/4504/
>
> For now the patch is waiting on author.  Could you address my
> last review?

Thanks for reviewing the patch and adding a CF entry for it. PFA patch
that addresses your review comments.

And... Sorry for the delayed response. I totally missed it.

--
With Regards,
Ashutosh Sharma.

Attachment

Re: Return value of pg_promote()

From
Michael Paquier
Date:
On Mon, Aug 28, 2023 at 11:50:45AM +0530, Ashutosh Sharma wrote:
> Thanks for reviewing the patch and adding a CF entry for it. PFA patch
> that addresses your review comments.

That looks OK seen from here.  Perhaps others have more comments?

> And... Sorry for the delayed response. I totally missed it.

No problem.
--
Michael

Attachment

Re: Return value of pg_promote()

From
Laurenz Albe
Date:
On Thu, 2023-08-17 at 09:37 +0900, Michael Paquier wrote:
> I have just noticed that we do not have a CF entry for this proposal,
> so I have added one with Laurenz as author:
> https://commitfest.postgresql.org/44/4504/

I have changed the author to Fujii Masao.

Yours,
Laurenz Albe



Re: Return value of pg_promote()

From
Michael Paquier
Date:
On Mon, Aug 28, 2023 at 02:09:37PM +0200, Laurenz Albe wrote:
> On Thu, 2023-08-17 at 09:37 +0900, Michael Paquier wrote:
> > I have just noticed that we do not have a CF entry for this proposal,
> > so I have added one with Laurenz as author:
> > https://commitfest.postgresql.org/44/4504/
>
> I have changed the author to Fujii Masao.

Still incorrect, as the author is Ashutosh Sharma.  Fujii-san has
provided some feedback, though.
--
Michael

Attachment