Home > mailing lists

Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows - Mailing list pgsql-bugs

From	Thomas Munro
Subject	Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
Date	October 4, 2023 21:12:27
Msg-id	CA+hUKG+V39zO6E9kzg=SP0S8jGYy6AYY9xWCkGLqKxfUJuuFeg@mail.gmail.com Whole thread Raw
In response to	Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows
List	pgsql-bugs

Tree view

On Thu, Oct 5, 2023 at 3:26 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael@paquier.xyz> writes:
> > On Wed, Oct 04, 2023 at 09:17:11AM +0200, Laurenz Albe wrote:
> >> Data corruption like this is not necessarily caused by a PostgreSQL bug.
>
> > Err, well...  A failure on the end-of-vacuum truncation should not
> > lead to corruption afterwards as well, and this ought to be safe even
> > if this step failed.  This is a very tricky problem that nobody has
> > really looked into yet.
>
> ISTM we did identify the problem: while all the tuples in the
> pages-to-be-truncated should be dead and thus invisible, it may
> be that some of those pages are dirty and haven't been written
> out of shared buffers yet, and the page versions on disk contain
> tuples that look live.  If VACUUM discards those dirty buffers
> and then fails to truncate, voila you have tuples rising from
> the dead.
>
> I'm too lazy to check the commit log right now, but I think
> we did implement a fix for that (ie, flush dirty pages even
> if we anticipate them going away due to truncation).  But as
> Laurenz says, v10 is out of support and possibly didn't get
> that fix.  Even if it did, you'd need to be running one of
> the last minor releases, because this wasn't very long ago.

This thread seems to be saying otherwise:

https://www.postgresql.org/message-id/flat/2348.1544474335%40sss.pgh.pa.us

> In the end though, the *real* problem here is running on a
> platform that randomly disallows writes to disk.  There's only
> so much that Postgres can possibly do about unreliability of the
> underlying platform.  I would never run a production database on
> Windows, because it's just too prone to that sort of BS.

It's surprising that ftruncate() AKA chsize() is able to fail like
this (I am not a Windows user but AFAIR that sharing stuff obstructs
stuff like open, unlink, rename, so it surprises me to see it come up
with ftruncate, since we must already have made it past the open
stage).  Hmm, the documentation is scant, but I know from my attempts
to use large files that chsize() is probably some kind of wrapper
around SetEndOfFile() or similar, and that is documented as failing if
someone has the file mapped.  I don't know why someone would have the
file mapped, though.

But as for what we should do about it, PANIC (as suggested by several
people) seems better than corruption, if we're not going to write some
kind of resilience?  How else are we supposed to deal with "this
shouldn't happen, and if it does we're hosed?"

pgsql-bugs by date:

From: "Given, Robert A"
Date: 04 October 2023, 17:46:10
Subject: REFRESH MATERIALIZED VIEW error

From: Thomas Munro
Date: 04 October 2023, 22:38:24
Subject: Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows

Re: BUG #18146: Rows reappearing in Tables after Auto-Vacuum Failure in PostgreSQL on Windows - Mailing list pgsql-bugs

Previous

Next