Re: BUG #17744: Fail Assert while recoverying from pg_basebackup - Mailing list pgsql-bugs

From Thomas Munro
Subject Re: BUG #17744: Fail Assert while recoverying from pg_basebackup
Msg-id CA+hUKGKVjfmdBd3je31o1_yW9j=4DRaZDdZUAnTZjqAbirCurA@mail.gmail.com
In response to Re: BUG #17744: Fail Assert while recoverying from pg_basebackup  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #17744: Fail Assert while recoverying from pg_basebackup  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs
On Sat, Feb 25, 2023 at 12:02 AM Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Feb 24, 2023 at 09:36:50AM +0900, Michael Paquier wrote:
> > I was thinking about that, and you may be fine as long as you skip
> > some parts of the restartpoint logic.  The case reported of this
> > thread does not cause crash recovery, actually, because startup
> > switches to +archive+ recovery any time it sees a backup_label file.
> > One thing I did not remember here is that we also set minRecoveryPoint
> > at a much earlier LSN than it should be (see 6c4f666).  However, we
> > rely heavily on backupEndRequired in the control file to make sure
> > that we've replayed up to the end-of-backup record to decide if the
> > system is consistent or not.
>
> I have been spending more time on that to see if I was missing
> something, and reproducing the issue is rather easy by using pgbench
> that gets stopped with a SIGINT so that restart points would be able to
> see transactions still running in the code path triggering the assert.
> A cheap regression test should be possible, actually, though for now
> the only thing I have been able to rely on is a hack to force
> checkpoint_timeout at 1s to make the failure rate more aggressive.
>
> Anyway, with this simple method (and an increase of short pgbench runs
> that are interrupted to increase the chance of hits), a bisect points
> at 7ff23c6 :/

Thanks.  I've been thinking about how to make a deterministic test
script to study this and possible fixes, too.  Unfortunately I came
down with a nasty cold and stopped computing for a couple of days, so
sorry for the slow response on this thread, but I seem to have
rebooted now.  Looking.


