Thread: Improve the efficiency of _bt_killitems.

Improve the efficiency of _bt_killitems.

From
feichanghong
Date:
Hi hackers,

In the _bt_killitems function, the following logic is present: we search to the
right for an index item that matches the heap TID and attempt to mark it as
dead. If that index item has already been marked as dead by other concurrent
processes, we will continue searching. However, there should not be any more
matching index items on the current page.

```
while (offnum <= maxoff)
{
...
/*
* Mark index item as dead, if it isn't already.  Since this
* happens while holding a buffer lock possibly in shared mode,
* it's possible that multiple processes attempt to do this
* simultaneously, leading to multiple full-page images being sent
* to WAL (if wal_log_hints or data checksums are enabled), which
* is undesirable.
*/
if (killtuple && !ItemIdIsDead(iid))
{
/* found the item/all posting list items */
ItemIdMarkDead(iid);
killedsomething = true;
break; /* out of inner search loop */
}
offnum = OffsetNumberNext(offnum);
}
```

Perhaps we should exit the current loop immediately when the killtuple is set,
stopping the search to the right. This could improve efficiency, especially
when the index item to be killed is the first one on the current page:

```
diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c
index b4ba51357a5..529a3083165 100644
--- a/src/backend/access/nbtree/nbtutils.c
+++ b/src/backend/access/nbtree/nbtutils.c
@@ -4298,11 +4298,14 @@ _bt_killitems(IndexScanDesc scan)
              * to WAL (if wal_log_hints or data checksums are enabled), which
              * is undesirable.
              */
-            if (killtuple && !ItemIdIsDead(iid))
+            if (killtuple)
             {
-                /* found the item/all posting list items */
-                ItemIdMarkDead(iid);
-                killedsomething = true;
+                if (!ItemIdIsDead(iid))
+                {
+                    /* found the item/all posting list items */
+                    ItemIdMarkDead(iid);
+                    killedsomething = true;
+                }
                 break;          /* out of inner search loop */
             }
             offnum = OffsetNumberNext(offnum);
```


Best Regards,
Fei Changhong

Re: Improve the efficiency of _bt_killitems.

From
Heikki Linnakangas
Date:
On 01/11/2024 09:19, feichanghong wrote:
> Hi hackers,
> 
> In the _bt_killitems function, the following logic is present: we 
> search to the right for an index item that matches the heap TID and 
> attempt to mark it as dead. If that index item has already been 
> marked as dead by other concurrent processes, we will continue
> searching. However, there should not be any more matching index items
> on the current page.
Why could there not be more matching items on the page?

Are you assuming a unique index? Even then it's not right; you can have 
multiple index entries point to different heap tuples with the same key, 
as long as they're not visible at the same time. For example, if you 
UPDATE or DELETE+INSERT a row.

-- 
Heikki Linnakangas
Neon (https://neon.tech)




Re: Improve the efficiency of _bt_killitems.

From
feichanghong
Date:


On Nov 1, 2024, at 16:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 01/11/2024 09:19, feichanghong wrote:
Hi hackers,
In the _bt_killitems function, the following logic is present: we search to the right for an index item that matches the heap TID and attempt to mark it as dead. If that index item has already been marked as dead by other concurrent processes, we will continue
searching. However, there should not be any more matching index items
on the current page.
Why could there not be more matching items on the page?

Are you assuming a unique index? Even then it's not right; you can have multiple index entries point to different heap tuples with the same key, as long as they're not visible at the same time. For example, if you UPDATE or DELETE+INSERT a row.

Maybe I didn't describe it clearly. What I meant to say is that there shouldn't
be multiple index items on the current page pointing to the same heap TID(ctid).
rather than the same index key.

Best Regards,
Fei Changhong

Re: Improve the efficiency of _bt_killitems.

From
Heikki Linnakangas
Date:
On 01/11/2024 10:41, feichanghong wrote:
> 
> 
>> On Nov 1, 2024, at 16:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> On 01/11/2024 09:19, feichanghong wrote:
>>> Hi hackers,
>>> In the _bt_killitems function, the following logic is present: we 
>>> search to the right for an index item that matches the heap TID and 
>>> attempt to mark it as dead. If that index item has already been 
>>> marked as dead by other concurrent processes, we will continue
>>> searching. However, there should not be any more matching index items
>>> on the current page.
>> Why could there not be more matching items on the page?
>>
>> Are you assuming a unique index? Even then it's not right; you can 
>> have multiple index entries point to different heap tuples with the 
>> same key, as long as they're not visible at the same time. For 
>> example, if you UPDATE or DELETE+INSERT a row.
> 
> Maybe I didn't describe it clearly. What I meant to say is that there 
> shouldn't
> be multiple index items on the current page pointing to the same heap 
> TID(ctid).
> rather than the same index key.

Ah, gotcha. That makes sense.

There's more we could do here in the similar vein:

If the TID was found in a posting list, but not all of the items were 
dead, 'killtuple' is set to false so we will still continue the search.

Then there's this comment:

>                 /*
>                  * Don't bother advancing the outermost loop's int iterator to
>                  * avoid processing killed items that relate to the same
>                  * offnum/posting list tuple.  This micro-optimization hardly
>                  * seems worth it.  (Further iterations of the outermost loop
>                  * will fail to match on this same posting list's first heap
>                  * TID instead, so we'll advance to the next offnum/index
>                  * tuple pretty quickly.)
>                  */

I think we should do that micro-optimization. It's a single line of code 
and a single instruction to advance 'i' variable AFAICS. While I agree 
with the comment that it doesn't matter much performance-wise, it seems 
simpler to just do it than explain why we don't do it.

-- 
Heikki Linnakangas
Neon (https://neon.tech)