Re: Returning nbtree posting list TIDs in DESC order during backwards scans - Mailing list pgsql-hackers

From Andy Fan
Subject Re: Returning nbtree posting list TIDs in DESC order during backwards scans
Date
Msg-id 87h5yx92ci.fsf@163.com
Whole thread Raw
In response to Returning nbtree posting list TIDs in DESC order during backwards scans  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers

Hi,

> On Thu, Jul 17, 2025 at 2:26 PM Peter Geoghegan <pg@bowt.ie> wrote:
>> The loop has an early check for this (for non-itemDead entries) here:
>>
>>         /* Quickly skip over items never ItemDead-set by btgettuple */
>>         if (!kitem->itemDead)
>>             continue;
>>
>> I really doubt that this matters

> I'll need to think about issues around adding the new kitem->itemDead
> bitfield. I'm not really worried about _bt_killitems; more so about
> the routines called by _bt_readpage, which must make sure that the bit
> is unset every time a TID is saved in so->currPos.items[].

Yes, otherwise the live tuple maybe marked as dead, which is terrible. I
think you have unset the bit on all the needed places, including
_bt_saveitem, _bt_savepostingitem and _bt_setuppostingitems for the
first item in postlist item. I can't find out anything is missed.

(I thought another place for this work might be _bt_returnitem, this
might be a more centralized place to set. Later I think it is totally
wrong because for the queries like SELECT * FROM t WHERE idx_col = 3
LIMIT 3;  Some items in so->currPos.items[] were filled in
_bt_readpage but maybe never returned to heap at all, and later
_bt_killitems also run against it on the whole, so unsetting the bit on
_bt_returnitem is too late...)

I think this patch does two things. one is refactoring the data struct
for _bt_killitems, the other one is scaning the postlist in the backward
way for the backward scan. then does the below changes belongs to any of
them? Is it an intentional change?

_bt_killitems:

         if (offnum < minoff)
-            continue;            /* pure paranoia */
+        {
+            /*
+             * Page must have originally been the rightmost page, but has
+             * since split
+             */
+            Assert(!so->dropPin);
+            offnum = minoff;
+        }


At last, I can get the same test result as you. The buffer hits go back to
23 in the test case, thank for working on this!

> I think that this patch isn't too far off being committable.

I think so.

--
Best Regards
Andy Fan




pgsql-hackers by date:

Previous
From: Andrey Borodin
Date:
Subject: Re: IPC/MultixactCreation on the Standby server
Next
From: Greg Burd
Date:
Subject: Re: [PATCH] Let's get rid of the freelist and the buffer_strategy_lock