Re: Better HINT message for "unexpected data beyond EOF" - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Better HINT message for "unexpected data beyond EOF"
Msg-id gluttro6ro2lsn7mvs6i6ihdhi4futxpgljyhslcvguci2a5rd@xteikqd6ftos
In response to Re: Better HINT message for "unexpected data beyond EOF"  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses Re: Better HINT message for "unexpected data beyond EOF"
List pgsql-hackers
Hi,

On 2025-03-27 10:25:50 +0100, Jakub Wartak wrote:
> On Wed, Mar 26, 2025 at 4:01 PM Robert Haas <robertmhaas@gmail.com> wrote:
> [..]
> > > so how about:
> > > -HINT:  This has been seen to occur with buggy kernels; consider
> > > updating your system.
> > > +HINT:  This has been observed with files being overwritten, buggy
> > > kernels and potentially other external file system influence.
> >
> > I agree that we should emphasize the possibility of files being
> > overwritten.
> 
> > I'm not sure we should even mention buggy kernels -- is
> > there any evidence that's still a thing on still-running hardware?
> 
> No, I do not have any, other than comments in source code from Tom.

FWIW, I'm not sure how much that was ever true. We certainly had our own bugs
that could lead to the error occurring.


> E.g. I've tracked down that Pavan fixed something in the 2ndQ
> fast_redo/pg_xlog_prefetch extension in 2016, where a concurrency
> bug in that extension was causing a similar problem back then on at
> least one occasion: ```...issue was caused because the prefetch worker
> process reading back blocks that are being concurrently dropped by the
> startup process (as a result of truncate operation). When the startup
> process later tries to extend the relation, it finds an existing valid
> block in the shared buffers and panics. ``` (sounds like it is related
> with data beyond EOF).

FWIW that's more generally broken than just this error. You can't just read in
data without holding a lock on a relation, that will cause breakage in all
kinds of ways.


> Proposals:
> 1. HINT:  This has been observed with files being overwritten.
> 2. HINT:  This has been observed with files being overwritten, old
> (2.6.x) buggy Linux kernels.
> 3. HINT:  This has been observed with files being overwritten, old
> (2.6.x) buggy Linux kernels, corruption or other non-core PostgreSQL
> bugs.
> 4. HINT:  This has been observed with files being overwritten, buggy
> kernels and potentially other external file system influence.

FWIW, I think we should just drop the HINT. We really have no clue what caused
it and a HINT should imo have at least some value other than "*Shrug*", which
is imo pretty much what these HINTs amount to, if they were a bit more blunt.

Greetings,

Andres Freund


