Thread: Crash after a call to pg_backup_start()

Crash after a call to pg_backup_start()

From
Kyotaro Horiguchi
Date:
While playing with a proposed patch, I noticed that a session crashes
after a failed call to pg_backup_start().

postgres=# select pg_backup_start(repeat('x', 1026));
ERROR:  backup label too long (max 1024 bytes)
postgres=# \q
> TRAP: failed Assert("during_backup_start ^ (sessionBackupState == SESSION_BACKUP_RUNNING)"), File: "xlog.c", Line:
8846,PID: 165835
 

Surprisingly this happens by a series of succussful calls to
pg_backup_start and stop.

postgres=# select pg_backup_start('x');
postgres=# select pg_backup_top();
postgres=# \q
> TRAP: failed Assert("durin..


>> do_pg_abort_backup(int code, Datum arg)
>     /* Only one of these conditions can be true */
>    Assert(during_backup_start ^
>           (sessionBackupState == SESSION_BACKUP_RUNNING));

It seems to me that the comment is true and the condition is a thinko.
This is introduced by df3737a651 so it is master only.

Please find the attached fix.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment

Re: Crash after a call to pg_backup_start()

From
Richard Guo
Date:

On Fri, Oct 21, 2022 at 3:10 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
It seems to me that the comment is true and the condition is a thinko.
 
Yeah, the two conditions could be both false. How about we update the
comment a bit to emphasize this? Such as

    /* At most one of these conditions can be true */
or
    /* These conditions can not be both true */
 
Please find the attached fix.
 
+1

Thanks
Richard

Re: Crash after a call to pg_backup_start()

From
Michael Paquier
Date:
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> Yeah, the two conditions could be both false. How about we update the
> comment a bit to emphasize this? Such as
>
>     /* At most one of these conditions can be true */
> or
>     /* These conditions can not be both true */

If you do that, it would be a bit easier to read as of the following
assertion instead?  Like:
Assert(!during_backup_start ||
       sessionBackupState == SESSION_BACKUP_NONE);

Please note that I have not checked in details all the interactions
behind register_persistent_abort_backup_handler() before entering in
do_pg_backup_start() and the ERROR_CLEANUP block used in this
routine (just a matter of some elog(ERROR)s put here and there, for
example).  Anyway, yes, both conditions can be false, and that's easy
to reproduce:
1) Do pg_backup_start().
2) Do pg_backup_stop().
3) Stop the session to kick do_pg_abort_backup()
4) Assert()-boom.
--
Michael

Attachment

Re: Crash after a call to pg_backup_start()

From
Bharath Rupireddy
Date:
On Fri, Oct 21, 2022 at 7:06 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> > Yeah, the two conditions could be both false. How about we update the
> > comment a bit to emphasize this? Such as
> >
> >     /* At most one of these conditions can be true */
> > or
> >     /* These conditions can not be both true */
>
> If you do that, it would be a bit easier to read as of the following
> assertion instead?  Like:
> Assert(!during_backup_start ||
>        sessionBackupState == SESSION_BACKUP_NONE);
>
> Please note that I have not checked in details all the interactions
> behind register_persistent_abort_backup_handler() before entering in
> do_pg_backup_start() and the ERROR_CLEANUP block used in this
> routine (just a matter of some elog(ERROR)s put here and there, for
> example).  Anyway, yes, both conditions can be false, and that's easy
> to reproduce:
> 1) Do pg_backup_start().
> 2) Do pg_backup_stop().
> 3) Stop the session to kick do_pg_abort_backup()
> 4) Assert()-boom.

I'm wondering if we need the assertion at all. We know that when the
arg is true or the sessionBackupState is SESSION_BACKUP_RUNNING, the
runningBackups would've been incremented and we can just go ahead and
decrement it, like the attached patch. This is a cleaner approach IMO
unless I'm missing something here.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: Crash after a call to pg_backup_start()

From
Alvaro Herrera
Date:
On 2022-Oct-21, Michael Paquier wrote:

> On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:

> >     /* These conditions can not be both true */
> 
> If you do that, it would be a bit easier to read as of the following
> assertion instead?  Like:
> Assert(!during_backup_start ||
>        sessionBackupState == SESSION_BACKUP_NONE);

My intention here was that the Assert should be inside the block, that
is, we already know that at least one is true, and we want to make sure
that they are not *both* true.

AFAICT the attached patch also fixes the bug without making the assert
weaker.

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/

Attachment

Re: Crash after a call to pg_backup_start()

From
Bharath Rupireddy
Date:
On Sat, Oct 22, 2022 at 1:26 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2022-Oct-21, Michael Paquier wrote:
>
> > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
>
> > >     /* These conditions can not be both true */
> >
> > If you do that, it would be a bit easier to read as of the following
> > assertion instead?  Like:
> > Assert(!during_backup_start ||
> >        sessionBackupState == SESSION_BACKUP_NONE);
>
> My intention here was that the Assert should be inside the block, that
> is, we already know that at least one is true, and we want to make sure
> that they are not *both* true.
>
> AFAICT the attached patch also fixes the bug without making the assert
> weaker.

+        /* We should be here only by one of these reasons, never both */
+        Assert(during_backup_start ^
+               (sessionBackupState == SESSION_BACKUP_RUNNING));
+

What's the problem even if we're here when both of them are true? The
runningBackups is incremented anyways, right? Why can't we just get
rid of the Assert and treat during_backup_start as
backup_marked_active_in_shmem or something like that to keep things
simple?

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Crash after a call to pg_backup_start()

From
Alvaro Herrera
Date:
On 2022-Oct-22, Bharath Rupireddy wrote:

> +        /* We should be here only by one of these reasons, never both */
> +        Assert(during_backup_start ^
> +               (sessionBackupState == SESSION_BACKUP_RUNNING));
> +
> 
> What's the problem even if we're here when both of them are true?

In what case should we be there with both conditions true?

> The runningBackups is incremented anyways, right?

In the current code, yes, but it seems to be easier to reason about if
we know precisely why we're there and whether we should be running the
cleanup or not.  Otherwise we might end up with a bug where we run the
function but it doesn't do anything because we failed to understand the
preconditions.  At the very least, this forces a developer changing this
code to think through it.

> Why can't we just get rid of the Assert and treat during_backup_start
> as backup_marked_active_in_shmem or something like that to keep things
> simple?

Why is that simpler?

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"



Re: Crash after a call to pg_backup_start()

From
Bharath Rupireddy
Date:
On Sat, Oct 22, 2022 at 1:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> > Why can't we just get rid of the Assert and treat during_backup_start
> > as backup_marked_active_in_shmem or something like that to keep things
> > simple?
>
> Why is that simpler?

IMO, the assertion looks complex there and thinking if we can remove it.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: Crash after a call to pg_backup_start()

From
Kyotaro Horiguchi
Date:
At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in 
> On 2022-Oct-21, Michael Paquier wrote:
> 
> > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote:
> 
> > >     /* These conditions can not be both true */
> > 
> > If you do that, it would be a bit easier to read as of the following
> > assertion instead?  Like:
> > Assert(!during_backup_start ||
> >        sessionBackupState == SESSION_BACKUP_NONE);
> 
> My intention here was that the Assert should be inside the block, that
> is, we already know that at least one is true, and we want to make sure
> that they are not *both* true.
> 
> AFAICT the attached patch also fixes the bug without making the assert
> weaker.

I'm fine with either of them, but..

The reason for that works the same way is that the if() block excludes
the case of (!during_backup_start && S_B_RUNNING)<*1>. In other words
the strictness is a kind of illusion [*a]. Actually the assertion does
not detect the case <*1>.  In this regard, moving the current
assertion into the if() block might be confusing.

regards,

<*1>: It's evidently superfluous but "strictness" and "illusion" share
 the exactly the same pronounciation in Japanese "Ghen-Ka-Ku".

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: Crash after a call to pg_backup_start()

From
Michael Paquier
Date:
On Mon, Oct 24, 2022 at 11:42:58AM +0900, Kyotaro Horiguchi wrote:
> At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in
>> My intention here was that the Assert should be inside the block, that
>> is, we already know that at least one is true, and we want to make sure
>> that they are not *both* true.
>>
>> AFAICT the attached patch also fixes the bug without making the assert
>> weaker.

On the contrary, it seems to me that putting the assertion within the
if() block makes the assertion weaker, because we would never check
for an incorrect state after do_pg_abort_backup() is registered (aka
any pg_backup_start() call) when not entering in this if() block.

Saying that, if you feel otherwise I am fine with your conclusion as
well, so feel free to solve this issue as you see fit.  :p
--
Michael

Attachment

Re: Crash after a call to pg_backup_start()

From
Alvaro Herrera
Date:
On 2022-Oct-24, Michael Paquier wrote:

> On the contrary, it seems to me that putting the assertion within the
> if() block makes the assertion weaker, because we would never check
> for an incorrect state after do_pg_abort_backup() is registered (aka
> any pg_backup_start() call) when not entering in this if() block.

Reading it again, I agree with your conclusion, so I'll push as you
proposed with some extra tests, after they complete running.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La verdad no siempre es bonita, pero el hambre de ella sí"



Re: Crash after a call to pg_backup_start()

From
Michael Paquier
Date:
On Mon, Oct 24, 2022 at 11:39:19AM +0200, Alvaro Herrera wrote:
> Reading it again, I agree with your conclusion, so I'll push as you
> proposed with some extra tests, after they complete running.

Thanks for the fix, Álvaro!
--
Michael

Attachment