Fwd: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition - Mailing list pgsql-bugs

From tender wang
Subject Fwd: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
Date
Msg-id CAHewXNnUZHVZUggkO_0yp95GQaOUr57WCf_zar6-PxK5+6tghA@mail.gmail.com
Whole thread Raw
In response to BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition  (PG Bug reporting form <noreply@postgresql.org>)
List pgsql-bugs
Sorry, I forgot to cc other peoples emails.  I only sent to Andres.

---------- Forwarded message ---------
发件人: tender wang <tndrwang@gmail.com>
Date: 2024年1月5日周五 11:49
Subject: Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
To: Andres Freund <andres@anarazel.de>

Andres Freund <andres@anarazel.de> 于2024年1月5日周五 02:57写道:
Hi,

On 2023-12-28 14:36:46 +0800, Richard Guo wrote:
> On Wed, Dec 27, 2023 at 5:08 PM tender wang <tndrwang@gmail.com> wrote:
>
> > Alexander Lakhin <exclusion@gmail.com> 于2023年12月27日周三 15:00写道:
> >
> >> I wonder, if "buf_state &= BM_VALID" is a typo here, maybe it supposed to
> >> be
> >> "buf_state &= ~BM_VALID" as in ExtendBufferedRelShared()...
> >>
> >
> > Yeah, that's true.  I analyze this issue again, and I think the root cause
> > is the " buf_state &= BM_VALID" .
> >
>
> Nice catch.

Indeed.

I wonder how this survived - I remember having written a script to locally
test this scenario...

Hm, I dimly remember having a hard time making my approach for this work for
temp tables - IIRC used chattr +i to makr the file immutable, which only works
for file descriptors opened after.


> I believe the intention is to clear the BM_VALID bit.

Indeed.


These paths had many bugs over the years, they are hard to hit in testing but
regularly hit in production.  In a more modern world we'd have a unit test for
these scenarios, in isolation. But that's hard at the moment.

I have been wondering if it's worth and possible to write a C level test for
them, perhaps in regress.c. But right now that seems hard because of the
DropRelationBuffers() in smgrtruncate() (seems like a layering violation to
have it there, but it's probably too risky to move at this point).

But perhaps this could actually serve as a good first case for Michael's
failure injection patch?  A failure injection to make
FileFallocate()/FileZero() fail should just be a few lines.
 
Yeah,  agree!  This is a great opportunity to introduce advanced testing techniques such as Michael's fault injection.

We generally have pretty much no coverage for out-of-space, partial
read/write, EINTR of file operations, because that's hard to do with the
current test infrastructure. So this might be a nice first case.

AFAIK for out-of-space cases:
 1、Checkpointer process and WalWriter process would exit, and the user's session will be disconnected
 2、restart instance will fail if it has redo logs to replay. This will influence availability.

1 and 2  are not user-friendly.  
 
Greetings,

Andres Freund

pgsql-bugs by date:

Previous
From: "Hayato Kuroda (Fujitsu)"
Date:
Subject: RE: BUG #18267: Logical replication bug: data is not synchronized after Alter Publication.
Next
From: Michael Paquier
Date:
Subject: Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition