Thread: Crash after a call to pg_backup_start()
While playing with a proposed patch, I noticed that a session crashes after a failed call to pg_backup_start(). postgres=# select pg_backup_start(repeat('x', 1026)); ERROR: backup label too long (max 1024 bytes) postgres=# \q > TRAP: failed Assert("during_backup_start ^ (sessionBackupState == SESSION_BACKUP_RUNNING)"), File: "xlog.c", Line: 8846,PID: 165835 Surprisingly this happens by a series of succussful calls to pg_backup_start and stop. postgres=# select pg_backup_start('x'); postgres=# select pg_backup_top(); postgres=# \q > TRAP: failed Assert("durin.. >> do_pg_abort_backup(int code, Datum arg) > /* Only one of these conditions can be true */ > Assert(during_backup_start ^ > (sessionBackupState == SESSION_BACKUP_RUNNING)); It seems to me that the comment is true and the condition is a thinko. This is introduced by df3737a651 so it is master only. Please find the attached fix. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
Attachment
On Fri, Oct 21, 2022 at 3:10 PM Kyotaro Horiguchi <horikyota.ntt@gmail.com> wrote:
It seems to me that the comment is true and the condition is a thinko.
Yeah, the two conditions could be both false. How about we update the
comment a bit to emphasize this? Such as
/* At most one of these conditions can be true */
or
/* These conditions can not be both true */
comment a bit to emphasize this? Such as
/* At most one of these conditions can be true */
or
/* These conditions can not be both true */
Please find the attached fix.
+1
Thanks
Richard
Richard
On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote: > Yeah, the two conditions could be both false. How about we update the > comment a bit to emphasize this? Such as > > /* At most one of these conditions can be true */ > or > /* These conditions can not be both true */ If you do that, it would be a bit easier to read as of the following assertion instead? Like: Assert(!during_backup_start || sessionBackupState == SESSION_BACKUP_NONE); Please note that I have not checked in details all the interactions behind register_persistent_abort_backup_handler() before entering in do_pg_backup_start() and the ERROR_CLEANUP block used in this routine (just a matter of some elog(ERROR)s put here and there, for example). Anyway, yes, both conditions can be false, and that's easy to reproduce: 1) Do pg_backup_start(). 2) Do pg_backup_stop(). 3) Stop the session to kick do_pg_abort_backup() 4) Assert()-boom. -- Michael
Attachment
On Fri, Oct 21, 2022 at 7:06 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote: > > Yeah, the two conditions could be both false. How about we update the > > comment a bit to emphasize this? Such as > > > > /* At most one of these conditions can be true */ > > or > > /* These conditions can not be both true */ > > If you do that, it would be a bit easier to read as of the following > assertion instead? Like: > Assert(!during_backup_start || > sessionBackupState == SESSION_BACKUP_NONE); > > Please note that I have not checked in details all the interactions > behind register_persistent_abort_backup_handler() before entering in > do_pg_backup_start() and the ERROR_CLEANUP block used in this > routine (just a matter of some elog(ERROR)s put here and there, for > example). Anyway, yes, both conditions can be false, and that's easy > to reproduce: > 1) Do pg_backup_start(). > 2) Do pg_backup_stop(). > 3) Stop the session to kick do_pg_abort_backup() > 4) Assert()-boom. I'm wondering if we need the assertion at all. We know that when the arg is true or the sessionBackupState is SESSION_BACKUP_RUNNING, the runningBackups would've been incremented and we can just go ahead and decrement it, like the attached patch. This is a cleaner approach IMO unless I'm missing something here. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On 2022-Oct-21, Michael Paquier wrote: > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote: > > /* These conditions can not be both true */ > > If you do that, it would be a bit easier to read as of the following > assertion instead? Like: > Assert(!during_backup_start || > sessionBackupState == SESSION_BACKUP_NONE); My intention here was that the Assert should be inside the block, that is, we already know that at least one is true, and we want to make sure that they are not *both* true. AFAICT the attached patch also fixes the bug without making the assert weaker. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Attachment
On Sat, Oct 22, 2022 at 1:26 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > On 2022-Oct-21, Michael Paquier wrote: > > > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote: > > > > /* These conditions can not be both true */ > > > > If you do that, it would be a bit easier to read as of the following > > assertion instead? Like: > > Assert(!during_backup_start || > > sessionBackupState == SESSION_BACKUP_NONE); > > My intention here was that the Assert should be inside the block, that > is, we already know that at least one is true, and we want to make sure > that they are not *both* true. > > AFAICT the attached patch also fixes the bug without making the assert > weaker. + /* We should be here only by one of these reasons, never both */ + Assert(during_backup_start ^ + (sessionBackupState == SESSION_BACKUP_RUNNING)); + What's the problem even if we're here when both of them are true? The runningBackups is incremented anyways, right? Why can't we just get rid of the Assert and treat during_backup_start as backup_marked_active_in_shmem or something like that to keep things simple? -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On 2022-Oct-22, Bharath Rupireddy wrote: > + /* We should be here only by one of these reasons, never both */ > + Assert(during_backup_start ^ > + (sessionBackupState == SESSION_BACKUP_RUNNING)); > + > > What's the problem even if we're here when both of them are true? In what case should we be there with both conditions true? > The runningBackups is incremented anyways, right? In the current code, yes, but it seems to be easier to reason about if we know precisely why we're there and whether we should be running the cleanup or not. Otherwise we might end up with a bug where we run the function but it doesn't do anything because we failed to understand the preconditions. At the very least, this forces a developer changing this code to think through it. > Why can't we just get rid of the Assert and treat during_backup_start > as backup_marked_active_in_shmem or something like that to keep things > simple? Why is that simpler? -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "La verdad no siempre es bonita, pero el hambre de ella sí"
On Sat, Oct 22, 2022 at 1:56 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote: > > > Why can't we just get rid of the Assert and treat during_backup_start > > as backup_marked_active_in_shmem or something like that to keep things > > simple? > > Why is that simpler? IMO, the assertion looks complex there and thinking if we can remove it. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in > On 2022-Oct-21, Michael Paquier wrote: > > > On Fri, Oct 21, 2022 at 05:53:25PM +0800, Richard Guo wrote: > > > > /* These conditions can not be both true */ > > > > If you do that, it would be a bit easier to read as of the following > > assertion instead? Like: > > Assert(!during_backup_start || > > sessionBackupState == SESSION_BACKUP_NONE); > > My intention here was that the Assert should be inside the block, that > is, we already know that at least one is true, and we want to make sure > that they are not *both* true. > > AFAICT the attached patch also fixes the bug without making the assert > weaker. I'm fine with either of them, but.. The reason for that works the same way is that the if() block excludes the case of (!during_backup_start && S_B_RUNNING)<*1>. In other words the strictness is a kind of illusion [*a]. Actually the assertion does not detect the case <*1>. In this regard, moving the current assertion into the if() block might be confusing. regards, <*1>: It's evidently superfluous but "strictness" and "illusion" share the exactly the same pronounciation in Japanese "Ghen-Ka-Ku". -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Oct 24, 2022 at 11:42:58AM +0900, Kyotaro Horiguchi wrote: > At Sat, 22 Oct 2022 09:56:06 +0200, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote in >> My intention here was that the Assert should be inside the block, that >> is, we already know that at least one is true, and we want to make sure >> that they are not *both* true. >> >> AFAICT the attached patch also fixes the bug without making the assert >> weaker. On the contrary, it seems to me that putting the assertion within the if() block makes the assertion weaker, because we would never check for an incorrect state after do_pg_abort_backup() is registered (aka any pg_backup_start() call) when not entering in this if() block. Saying that, if you feel otherwise I am fine with your conclusion as well, so feel free to solve this issue as you see fit. :p -- Michael
Attachment
On 2022-Oct-24, Michael Paquier wrote: > On the contrary, it seems to me that putting the assertion within the > if() block makes the assertion weaker, because we would never check > for an incorrect state after do_pg_abort_backup() is registered (aka > any pg_backup_start() call) when not entering in this if() block. Reading it again, I agree with your conclusion, so I'll push as you proposed with some extra tests, after they complete running. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "La verdad no siempre es bonita, pero el hambre de ella sí"
On Mon, Oct 24, 2022 at 11:39:19AM +0200, Alvaro Herrera wrote: > Reading it again, I agree with your conclusion, so I'll push as you > proposed with some extra tests, after they complete running. Thanks for the fix, Álvaro! -- Michael