Re: Fix possible 'unexpected data beyond EOF' on replica restart - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: Fix possible 'unexpected data beyond EOF' on replica restart
Date
Msg-id 02e720a8-761b-4309-b26f-f41c288f7990@iki.fi
Whole thread Raw
In response to Re: Fix possible 'unexpected data beyond EOF' on replica restart  (Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>)
Responses Re: Fix possible 'unexpected data beyond EOF' on replica restart
List pgsql-hackers
On 17/12/2025 10:40, Anthonin Bonnefoy wrote:
> On Wed, Dec 17, 2025 at 8:26 AM Amul Sul <sulamul@gmail.com 
> <mailto:sulamul@gmail.com>> wrote:
> 
>> The deleted code you moved to mdtruncate() should be kept where it
>> was. We cannot ensure that every extension using this interface will
>> adhere to that requirement what the comment describes. 
> 
> Yeah, I've overlooked the case of extensions. I've moved back the 
> InvalidBlockNumber assignment.

The smgr interface isn't really an extension point, so we don't need to 
worry about that. (I wish it was, but that's a different story [1])

I don't think mdtruncate() should modify smgr_cached_nblocks, we should 
keep that in smgrtruncate(). smgr.c is responsible for all other updates 
of smgr_cached_nblocks, so doing it in mdtruncate() would be a layering 
violation.

I'm thinking that we should do the attached. Untested, and we should 
also add a comment to smgrtruncate() and mdtruncate() to explain how 
they behave if nblocks > curnblk.

I wonder if we should move the whole "if (nblocks > curnblk)" check and 
ereport() from mdtruncate() to smgrtruncate(). That logic doesn't really 
depend on anything specific to md.c. If you'd imagine a different smgr 
implementation, it'd need to just copy-paste that check. It's the 
caller's mistake if it passes nblocks > curnblk, when not in recovery. 
Then again, we do have other places in md.c too that behave differently 
when InRecovery.

What do you think?

> I wonder how critical it is to have an up to date value of 
> smgr_cached_nblocks after smgr_truncate. Leaving InvalidBlockNumber was 
> also an option as the next user will ask the kernel for the real size. 
> There are some functions like DropRelationBuffers which rely only on the 
> cached value, so it's probably safer to keep the same behaviour.

Leaving it invalid should work. But as the comment says, we might as 
well update the cached value since we have the value at hand. It's just 
that we were doing it wrong.

[1] 
https://www.postgresql.org/message-id/CAEze2WgMySu2suO_TLvFyGY3URa4mAx22WeoEicnK%3DPCNWEMrA%40mail.gmail.com

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Konstantin Knizhnik
Date:
Subject: Re: index prefetching
Next
From: Robert Haas
Date:
Subject: Re: [BUG] [PATCH] pg_basebackup produces wrong incremental files after relation truncation in segmented tables