Le mardi 5 mars 2024, 00:05:03 CET Noah Misch a écrit :
> Does that procedure alone reproduce the ERROR, without any replica promote,
> postmaster restart, etc.? That, I do not have reason to expect.
Oh no sorrry for not being clear: it needs to go through recovery either
through a backup restoration, a replica or anything else. This is why I've
been struggling to reproduce the issue, it didn't occur to me that the fsm was
only sane on the primary, as the extensions were not wal logged.
>
> > > I think the fix is one of:
> > >
> > > - Revoke the undocumented rule. Make FSM consumers resilient to the FSM
> > >
> > > returning a now-too-large block number.
> >
> > Performance wise, that seems to be the better answer
>
> I would guess this one is more risky from a performance perspective, since
> we'd be adding to a hotter path under RelationGetBufferForTuple(). Still,
> it's likely fine.
Makes sense.
>
> > but won't this kind of
> > built-in resilience encourage subtle bugs creeping in ?
>
> Since the !wal_log_hints case avoids WAL for FSM, we've already accepted
> subtle bugs. I bet an unlucky torn FSM page could reach the ERROR you
> reached, even if we introduced a "main-fork WAL before FSM" rule. We'll
> have plenty of need for resilience. Hence, I'd lean toward this approach.
> > > - Enforce a new "main-fork WAL before FSM" rule for logged rels. For
> > > example, in each PageIsNew() case, either don't update FSM or WAL-log an
> > > init (like lazy_scan_new_or_empty() does when PageIsEmpty()).
> >
> > The FSM is not updated by the caller: in the bulk insert case, we
> > intentionally don't add them to the FSM. So having VACUUM WAL-log the page
> > in lazy_scan_new_or_empty in the New case as you propose seems a very
> > good idea.
> >
> > Performance wise that would keep the WAL logging outside of the backend
> > performing the preventive work for itself or others, and it should not be
> > that many pages that it gives VACUUM too much work.
>
> A drawback is that this approach touches several access methods, so
> out-of-tree access methods also likely need changes. Still, this is a good
> backup if the first option measurably slows INSERT throughput.
Would it actually ? At least for the currently discussed VACUUM case, we will
fall back to a generic FPI for the new page, which should work on any AM. But
the requirement to WAL log the page before recording it in the FSM should be
publicized, and the common infrastructure should adhere to it. I'd have to
look into the other cases you mentioned where that can happen.
>
> > I can try my hand at such a patch it if looks like a good idea.
>
> Please do.
>
> > Thank you very much for your insights.
>
> Thanks for the detailed report that made it possible.