Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows - Mailing list pgsql-bugs

From Robert Haas
Subject Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Date
Msg-id CA+TgmoZj9dJhHQokHioXB8dyDZz5gjzyg7SbUrG=At-uShq7yA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-bugs
On Wed, Dec 11, 2024 at 8:26 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> > I do wonder whether the new smgrtruncatefrom() should be used
> > everywhere instead of just from one of the call sites, but even if
> > that's desirable long-term, doing this much is still a lot better than
> > doing nothing. This data corrupting bug has been known and understood
> > for more than 4 years at this point.
>
> There are only three callers of smgrtruncate() in our tree:
>
>  1.  This one.
>  2.  Its redo-time counterpart.
>  3.  pg_truncate_visibility_map(), in contrib/pg_visibility.
>
> The recovery environment promotes errors of both types to FATAL (the
> "we have no handler" case in errstart()), and the postmaster handles
> startup failure a bit like a PANIC.  Or do you think PANIC and more
> similar code would be better?

I think mostly my thought was that if we could remove smgrtruncate()
entirely in favor of smgrtruncatefrom(), or just keep it called
smgrtruncate() but add a mandatory additional argument, that would be
less error-prone than having two versions between which future hackers
must pick.

> I am curious about the non-flushing though.  RelationTruncate() also
> doesn't flush if there is no fsm or vm.  Perhaps it always flushes in
> practice since VACUUM itself creates those, I'm not entirely sure yet.

I agree that we should make that unconditional. I think your analysis
shows that abandoning the WAL-before-data isn't safe, but even if
there were some doubt about whether your analysis is correct, we
should have a very strong bias against thinking that a rule as
fundamental as WAL-before-data is negotiable.

> PFA the first sketch of a patch to make pg_truncate_visibility_map()
> into... a cut-down version of RelationTruncate().  Still looking into
> the flushing, without which the comments in the patch are entirely
> bogus.

The patch looks to good to me. It seems to me that you could just add
an XLogFlush() call.

--
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-bugs by date:

Previous
From: Thomas Munro
Date:
Subject: Re: Build failure with GCC 15 (defaults to -std=gnu23)
Next
From: PG Bug reporting form
Date:
Subject: BUG #18748: Old name version