On 2023-12-28 14:36:46 +0800, Richard Guo wrote: > On Wed, Dec 27, 2023 at 5:08 PM tender wang <tndrwang@gmail.com> wrote: > > > Alexander Lakhin <exclusion@gmail.com> 于2023年12月27日周三 15:00写道: > > > >> I wonder, if "buf_state &= BM_VALID" is a typo here, maybe it supposed to > >> be > >> "buf_state &= ~BM_VALID" as in ExtendBufferedRelShared()... > >> > > > > Yeah, that's true. I analyze this issue again, and I think the root cause > > is the " buf_state &= BM_VALID" . > > > > Nice catch.
Indeed.
I wonder how this survived - I remember having written a script to locally test this scenario...
Hm, I dimly remember having a hard time making my approach for this work for temp tables - IIRC used chattr +i to makr the file immutable, which only works for file descriptors opened after.
> I believe the intention is to clear the BM_VALID bit.
Indeed.
These paths had many bugs over the years, they are hard to hit in testing but regularly hit in production. In a more modern world we'd have a unit test for these scenarios, in isolation. But that's hard at the moment.
I have been wondering if it's worth and possible to write a C level test for them, perhaps in regress.c. But right now that seems hard because of the DropRelationBuffers() in smgrtruncate() (seems like a layering violation to have it there, but it's probably too risky to move at this point).
But perhaps this could actually serve as a good first case for Michael's failure injection patch? A failure injection to make FileFallocate()/FileZero() fail should just be a few lines.
Yeah, agree! This is a great opportunity to introduce advanced testing techniques such as Michael's fault injection.
We generally have pretty much no coverage for out-of-space, partial read/write, EINTR of file operations, because that's hard to do with the current test infrastructure. So this might be a nice first case.
AFAIK for out-of-space cases:
1、Checkpointer process and WalWriter process would exit, and the user's session will be disconnected
2、restart instance will fail if it has redo logs to replay. This will influence availability.