Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition - Mailing list pgsql-bugs

From tender wang
Subject Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition
Date
Msg-id CAHewXN=KYgBa0ER5JMNV=rGFU=Jes0sdiQFkdfq6mvBmi5WENA@mail.gmail.com
Whole thread Raw
In response to Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition  (Michael Paquier <michael@paquier.xyz>)
Responses Re: BUG #18259: Assertion in ExtendBufferedRelLocal() fails after no-space-left condition  (Michael Paquier <michael@paquier.xyz>)
List pgsql-bugs


Michael Paquier <michael@paquier.xyz> 于2023年12月31日周日 14:38写道:
On Thu, Dec 28, 2023 at 02:36:46PM +0800, Richard Guo wrote:
> On Wed, Dec 27, 2023 at 5:08 PM tender wang <tndrwang@gmail.com> wrote:

(Tender Wang..  I know somebody with this name while doing Postgres
stuff in the past, are you the same guy?)

 Sorry, I should not be the guy you know.   I started using this name to send emails in the community only since last year.

>> Yeah, that's true.  I analyze this issue again, and I think the root cause
>> is the " buf_state &= BM_VALID" .
>
> Nice catch.  I believe the intention is to clear the BM_VALID bit.

Hmm.  Yeah.  I'd need to think about that a bit more carefully, but it
looks like you are right here..  Nice catch.  That looks like an
unfortunate typo from 31966b151e6a when local buffers are handled.


I would like to share a further analysis of this issue.
This issue will not happen if we have enough disk space.  Because LocalBufHash will not have an entry which is goning to be the extending block.
So in ExtendBufferedRelLocal(), the found  variable will always be false when we search LocalBufHash. Then 'if (found) {...} 'branch will have no change to enter,
if disk has enough space.  So, this issue had never been exposed before.

What happen when disk have limited space?
At a certain moment,  smgrzeroextend() will report error because of no disk space, but the tag has already been inserted into the LocalBufHash. 
And we don't have remove the tag. So we will enter the 'if (found) {...}' branch when we have enough disk space, which will set the buf_state wrong status.
 
> By the way, I wonder if it would be worthwhile to define new macros for
> bit operations such as set_bit, clear_bit, test_bit, and so on, so that
> we can avoid such typos in the future.

Not sure.  This is hiding the code behind more layers without bringing
extra value if you know how to read bitwise operations, which is
something I'd expect somebody to get pretty good at when hacking on
Postgres because that's all around the code tree.  And this would
create conflict noises when backpatching.  Hence, -1.

We had better add more comments to eplain when enter if(found) {...} branch, and when enter the ‘else {...}'  branch in  ExtendBufferedRelLocal().
 
--
Michael

pgsql-bugs by date:

Previous
From: Andrei Lepikhov
Date:
Subject: Re: BUG #18261: Inconsistent results of SELECT affected by joined subqueries
Next
From: Richard Guo
Date:
Subject: Re: Postgres 16.1 - Bug: cache entry already complete