Re: Improving btree performance through specializing by key shape, take 2 - Mailing list pgsql-hackers

From Dilip Kumar
Subject Re: Improving btree performance through specializing by key shape, take 2
Date
Msg-id CAFiTN-vggB-zrFc06G6VvRNbcpYqhENrEwGGmhJ3GCctj876XQ@mail.gmail.com
Whole thread Raw
In response to Re: Improving btree performance through specializing by key shape, take 2  (Matthias van de Meent <boekewurm+postgres@gmail.com>)
Responses Re: Improving btree performance through specializing by key shape, take 2
List pgsql-hackers
On Fri, Jun 23, 2023 at 8:16 PM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
>
> On Fri, 23 Jun 2023 at 11:26, Dilip Kumar <dilipbalaut@gmail.com> wrote:

>
> > and I have one confusion in the
> > below hunk in the _bt_moveright function, basically, if the parent
> > page's right key is exactly matching the HIGH key of the child key
> > then I suppose while doing the "_bt_compare" with the HIGH_KEY we can
> > use the optimization right, i.e. column number from where we need to
> > start the comparison should be used what is passed by the caller.  But
> > in the below hunk, you are always passing that as 'cmpcol' which is 1.
> > I think this should be '*comparecol' because '*comparecol' will either
> > hold the value passed by the parent if high key data exactly match
> > with the parent's right tuple or it will hold 1 in case it doesn't
> > match.  Am I missing something?
>
> We can't carry _bt_compare prefix results across pages, because the
> key range of a page may shift while we're not holding a lock on that
> page. That's also why the code resets the prefix to 1 every time it
> accesses a new page ^1: it cannot guarantee correct results otherwise.
> See also [0] and [1] for why that is important.

Yeah that makes sense

> ^1: When following downlinks, the code above your quoted code tries to
> reuse the _bt_compare result of the parent page in the common case of
> a child page's high key that is bytewise equal to the right separator
> tuple of the parent page's downlink to this page.  However, if it
> detects that this child high key has changed (i.e. not 100% bytewise
> equal), we can't reuse that result, and we'll have to re-establish all
> prefix info on that page from scratch.
> In any case, this only establishes the prefix for the right half of
> the page's keyspace, the prefix of the left half of the data still
> needs to be established separetely.
>
> I hope this explains the reasons for why we can't reuse comparecol as
> _bt_compare argument.

Yeah got it, thanks for explaining this.  Now I see you have explained
this in comments as well above the memcmp() statement.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Andres Freund
Date:
Subject: Re: ReadRecentBuffer() doesn't scale well
Next
From: Japin Li
Date:
Subject: Incorrect comment for memset() on pgssHashKey?