Home > mailing lists

Re: Improve the efficiency of _bt_killitems. - Mailing list pgsql-hackers

From	feichanghong
Subject	Re: Improve the efficiency of _bt_killitems.
Date	November 3, 2024 14:56:43
Msg-id	tencent_890303415489ECBD2223A1F3E7C3D6A48405@qq.com Whole thread Raw
In response to	Re: Improve the efficiency of _bt_killitems. (Heikki Linnakangas <hlinnaka@iki.fi>)
List	pgsql-hackers

Tree view

On Nov 1, 2024, at 18:50, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 01/11/2024 10:41, feichanghong wrote:
On Nov 1, 2024, at 16:24, Heikki Linnakangas <hlinnaka@iki.fi> wrote:

On 01/11/2024 09:19, feichanghong wrote:
Hi hackers,
In the _bt_killitems function, the following logic is present: we search to the right for an index item that matches the heap TID and attempt to mark it as dead. If that index item has already been marked as dead by other concurrent processes, we will continue
searching. However, there should not be any more matching index items
on the current page.
Why could there not be more matching items on the page?

Are you assuming a unique index? Even then it's not right; you can have multiple index entries point to different heap tuples with the same key, as long as they're not visible at the same time. For example, if you UPDATE or DELETE+INSERT a row.
Maybe I didn't describe it clearly. What I meant to say is that there shouldn't
be multiple index items on the current page pointing to the same heap TID(ctid).
rather than the same index key.

Ah, gotcha. That makes sense.

There's more we could do here in the similar vein:

If the TID was found in a posting list, but not all of the items were dead, 'killtuple' is set to false so we will still continue the search.

Then there's this comment:

/*
* Don't bother advancing the outermost loop's int iterator to
* avoid processing killed items that relate to the same
* offnum/posting list tuple. This micro-optimization hardly
* seems worth it. (Further iterations of the outermost loop
* will fail to match on this same posting list's first heap
* TID instead, so we'll advance to the next offnum/index
* tuple pretty quickly.)
*/

I think we should do that micro-optimization. It's a single line of code and a single instruction to advance 'i' variable AFAICS. While I agree with the comment that it doesn't matter much performance-wise, it seems simpler to just do it than explain why we don't do it.

Yes, I agree that this micro-optimization is not significant for performance,

but it can still save some CPU overhead and make the code easier to understand.

In the test case below (I measured the execution time of the _bt_killitems

function in the code), the execution time of the _bt_killitems function before

optimization was 0.773 ms, while the time after optimization was 0.073 ms:

```

create table t(a int);

insert into t select 1 from generate_series(1, 2000);

create index on t(a);

delete from t ;

explain analyze select * from t order by a;

```

As you mentioned, there are two points of optimization in _bt_killitems:

1. Skipping duplicate items in the posting tuple to avoid unnecessary scans and

checks. For example, if the tids of a posting index tuple are

{"(0,1)","(0,2)","(0,3)","(0,4)","(0,5)"}, it will add 5 entries to killedItems,

and their indexOffset would be the same. The checks for the latter 4 items are

meaningless.

2. If we have already matched the index of the specified head TID, there is no

need to continue searching further.

Maybe we can make the following optimization:

```

diff --git a/src/backend/access/nbtree/nbtutils.c b/src/backend/access/nbtree/nbtutils.c

index 1b75066fb51..d44134ac70a 100644

--- a/src/backend/access/nbtree/nbtutils.c

+++ b/src/backend/access/nbtree/nbtutils.c

@@ -4174,6 +4174,7 @@ _bt_killitems(IndexScanDesc scan)

BTPageOpaque opaque;

OffsetNumber minoff;

OffsetNumber maxoff;

+ OffsetNumber preoff = InvalidOffsetNumber;

int i;

int numKilled = so->numKilled;

bool killedsomething = false;

@@ -4229,6 +4230,10 @@ _bt_killitems(IndexScanDesc scan)

BTScanPosItem *kitem = &so->currPos.items[itemIndex];

OffsetNumber offnum = kitem->indexOffset;

+ if (offnum == preoff)

+ continue; /* skip duplicates */

+ preoff = offnum;

Assert(itemIndex >= so->currPos.firstItem &&

itemIndex <= so->currPos.lastItem);

if (offnum < minoff)

@@ -4238,6 +4243,7 @@ _bt_killitems(IndexScanDesc scan)

ItemId iid = PageGetItemId(page, offnum);

IndexTuple ituple = (IndexTuple) PageGetItem(page, iid);

bool killtuple = false;

+ bool headTidMatched = false;

if (BTreeTupleIsPosting(ituple))

{

@@ -4266,6 +4272,8 @@ _bt_killitems(IndexScanDesc scan)

if (!ItemPointerEquals(item, &kitem->heapTid))

break; /* out of posting list loop */

+ headTidMatched = true;

* kitem must have matching offnum when heap TIDs match,

* though only in the common case where the page can't

@@ -4292,19 +4300,19 @@ _bt_killitems(IndexScanDesc scan)

}

- * Don't bother advancing the outermost loop's int iterator to

- * avoid processing killed items that relate to the same

- * offnum/posting list tuple. This micro-optimization hardly

- * seems worth it. (Further iterations of the outermost loop

- * will fail to match on this same posting list's first heap

- * TID instead, so we'll advance to the next offnum/index

- * tuple pretty quickly.)

+ * We don't directly advance the outermost loop's int iterator

+ * here to avoid processing killed items related to the same

+ * offnum or posting list tuple. This micro-optimization will

+ * be completed at the beginning of the outermost loop.

if (j == nposting)

killtuple = true;

}

else if (ItemPointerEquals(&ituple->t_tid, &kitem->heapTid))

+ {

killtuple = true;

+ headTidMatched = true;

+ }

* Mark index item as dead, if it isn't already. Since this

@@ -4321,6 +4329,15 @@ _bt_killitems(IndexScanDesc scan)

killedsomething = true;

break; /* out of inner search loop */

}

+ /*

+ * If an index entry has already been matched with a heap TID, we

+ * do not search further because there should not be index entries

+ * pointing to the same heap TID in the index.

+ */

+ if (headTidMatched)

+ break;

offnum = OffsetNumberNext(offnum);

}

```

Best Regards,
Fei Changhong

pgsql-hackers by date:

From: "bigbro_wq@hotmail.com"
Date: 03 November 2024, 09:18:47
Subject: Repeat the condition check twice in function distribute_qual_to_rels

From: vignesh C
Date: 03 November 2024, 18:22:26
Subject: Re: Pgoutput not capturing the generated columns

Re: Improve the efficiency of _bt_killitems. - Mailing list pgsql-hackers

Previous

Next