Re: Better HINT message for "unexpected data beyond EOF" - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: Better HINT message for "unexpected data beyond EOF"
Date
Msg-id CAKZiRmwFoaymHZZedNbdTQhDZNmuoA2JRKOrtjQbG+Y=UBN61g@mail.gmail.com
In response to Re: Better HINT message for "unexpected data beyond EOF"  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: Better HINT message for "unexpected data beyond EOF"
List pgsql-hackers
On Wed, Mar 26, 2025 at 4:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
[..]
> > so how about:
> > -HINT:  This has been seen to occur with buggy kernels; consider
> > updating your system.
> > +HINT:  This has been observed with files being overwritten, buggy
> > kernels and potentially other external file system influence.
>
> I agree that we should emphasize the possibility of files being
> overwritten.

> I'm not sure we should even mention buggy kernels -- is
> there any evidence that's still a thing on still-running hardware?

No, I do not have any, other than comments in source code from Tom.

> I don't really like "other external file system influence" because that
> sounds like vague weasel-wording.

That was somewhat intentional: I did not want to rule out any
external factor(s), so I stated it as vaguely as possible to stay
generic. From the perspective of the core server itself this is
literally "paranormal" / "rogue" activity (another entity opening and
overwriting data files), but I suppose bugs or, in some cases, fs
corruption could cause it too?

E.g. I've tracked down that Pavan fixed something in the 2ndQ
fast_redo/pg_xlog_prefetch extension in 2016, where a concurrency
bug in that extension was causing a similar problem on at
least one occasion: ```...issue was caused because the prefetch worker
process reading back blocks that are being concurrently dropped by the
startup process (as a result of truncate operation). When the startup
process later tries to extend the relation, it finds an existing valid
block in the shared buffers and panics. ``` (sounds like it is related
to data beyond EOF).
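
To make the failure mode in that quote concrete, here is a toy model in C.
It is only a sketch and not PostgreSQL's buffer manager code: every name in
it (ToyBuffer, toy_prefetch, toy_extend_is_clean, toy_replay_race) is
hypothetical. It assumes the check boils down to "while extending a
relation to block N, a cached buffer for block N already exists and is
marked valid".

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_NBUFFERS 4

/* Hypothetical stand-in for a shared-buffer slot; not a PostgreSQL struct. */
typedef struct {
    bool     valid;   /* slot holds a page image believed to be good */
    unsigned block;   /* block number that page image belongs to */
} ToyBuffer;

static ToyBuffer toy_pool[TOY_NBUFFERS];

/* Empty the toy buffer pool. */
void toy_reset(void)
{
    memset(toy_pool, 0, sizeof(toy_pool));
}

/* Simulates a prefetch worker caching a block in the first free slot. */
void toy_prefetch(unsigned block)
{
    for (int i = 0; i < TOY_NBUFFERS; i++) {
        if (!toy_pool[i].valid) {
            toy_pool[i].valid = true;
            toy_pool[i].block = block;
            return;
        }
    }
    assert(!"toy pool is full");
}

/*
 * The relation currently has blocks 0..nblocks-1, so an extension creates
 * block number nblocks. Returns false if a valid cached page already claims
 * that block number, i.e. there is "data beyond EOF".
 */
bool toy_extend_is_clean(unsigned nblocks)
{
    for (int i = 0; i < TOY_NBUFFERS; i++) {
        if (toy_pool[i].valid && toy_pool[i].block == nblocks)
            return false;
    }
    return true;
}

/*
 * Replays the race from the quote. Returns the block number at which the
 * extension trips over a stale buffer, or -1 if it never does.
 */
int toy_replay_race(void)
{
    unsigned nblocks = 10;          /* relation has blocks 0..9 */

    toy_reset();
    toy_prefetch(nblocks - 3);      /* prefetch worker reads block 7 ... */
    nblocks = 5;                    /* ... while the truncate drops 5..9 */

    while (nblocks < 10) {          /* relation is later extended again */
        if (!toy_extend_is_clean(nblocks))
            return (int) nblocks;
        nblocks++;
    }
    return -1;
}
```

In this model toy_replay_race() trips at block 7: the truncate shrank the
relation, but the prefetched copy of block 7 survived in the pool, so the
later extension sees a valid page past the new end of file.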

Proposals:
1. HINT:  This has been observed with files being overwritten.
2. HINT:  This has been observed with files being overwritten, old
(2.6.x) buggy Linux kernels.
3. HINT:  This has been observed with files being overwritten, old
(2.6.x) buggy Linux kernels, corruption or other non-core PostgreSQL
bugs.
4. HINT:  This has been observed with files being overwritten, buggy
kernels and potentially other external file system influence.

TBH, anything that simply avoids blaming kernel folks directly is
better, but as a non-native speaker I'm finding it a little hard to
articulate.

-J.


