Thread: memory layouts for binary search in nbtree
Hi,

currently we IIRC use linearly sorted datums for the search in individual btree nodes. Not surprisingly, that's often one of the dominant entries in profiles. We could probably improve upon that by using an order more optimized for efficient binary search. See e.g. http://cglab.ca/~morin/misc/arraylayout/ for benchmarks showing the benefits.

Andres
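[Editor's note: the page linked above benchmarks cache-friendly array layouts, the best-known of which is the Eytzinger (BFS) order. A minimal sketch of the idea in plain C, with ints standing in for datums and nothing nbtree-specific (`__builtin_ffs` is a GCC/Clang builtin):]

```c
/* Rebuild sorted[0..n-1] into Eytzinger (BFS) order in a[1..n]: the
 * implicit tree root lives at index 1 and the children of node k at
 * 2k and 2k+1, so the first few probes of every search share a small
 * number of hot cache lines. */
static int
eytzinger_build(const int *sorted, int *a, int n, int i, int k)
{
    if (k <= n)
    {
        i = eytzinger_build(sorted, a, n, i, 2 * k);
        a[k] = sorted[i++];
        i = eytzinger_build(sorted, a, n, i, 2 * k + 1);
    }
    return i;
}

/* Return the index in a[1..n] of the smallest key >= x, or 0 if every
 * key is smaller. */
static int
eytzinger_search(const int *a, int n, int x)
{
    int         k = 1;

    while (k <= n)
        k = 2 * k + (a[k] < x);
    /* Trailing 1-bits record the consecutive "right" turns taken at
     * the bottom of the descent; shifting them off lands on the
     * lower-bound element. */
    k >>= __builtin_ffs(~k);
    return k;
}
```

The descent is branch-free apart from the loop itself, which is part of why the linked benchmarks show it beating a textbook binary search on the same data.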
On 18 May 2016 at 14:25, Andres Freund <andres@anarazel.de> wrote:
> currently we IIRC use linearly sorted datums for the search in
> individual btree nodes. Not surprisingly that's often one of the
> dominant entries in profiles. We could probably improve upon that by
> using an order more optimized for efficient binary search.
> See e.g. http://cglab.ca/~morin/misc/arraylayout/ for benchmarks
> showing benefits.
There is also some work from more than 10 years ago on cache-conscious btree layouts; on some benchmarks that led to the adoption of 64kB pages.

I think it's a good area of work.
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, May 18, 2016 at 6:55 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
> I think it's a good area of work.

I strongly agree.

Abbreviated keys in indexes are supposed to help with this. Basically, the ItemId array is made to be interlaced with small abbreviated keys (say one or two bytes), only in the typically less than 1% of pages that are internal (leaf page binary searches don't change). Those internal pages naturally have a wide range of values represented, so 1 byte turns out to be a lot more than you'd think. And you only have to generate a new one when there is a page split, which is relatively infrequent. You could squeeze out the lp_len bits to fit the abbreviated keys, and store the length in the IndexTuple proper.

I've discussed this idea with Masahiko extensively in private. I have lots of related ideas, and think it's a very promising area. I think that this project will be very difficult without better testing.

This idea also enables complementary techniques, like interpolation search that can degrade to binary search.

--
Peter Geoghegan
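[Editor's note: a hedged sketch of the comparison order described above. The `AbbrevItemId` struct and all names here are hypothetical illustrations, not the actual nbtree data structures: the search compares a one-byte abbreviated key stored alongside each item pointer and falls back to the full key, stored elsewhere on the page, only when the abbreviation cannot decide.]

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical line-pointer entry modelling the proposed interlacing
 * of a 1-byte abbreviated key with each internal-page item pointer. */
typedef struct
{
    uint8_t     abbrev;     /* first byte of the full key */
    uint16_t    offset;     /* where the full key lives on the page */
} AbbrevItemId;

/* Full-key comparison; only reached when the abbreviation ties. */
static int
full_compare(const char *page, uint16_t offset, const char *key)
{
    return strcmp(page + offset, key);
}

/* Return the number of items whose key sorts before "key".  The hot
 * loop usually touches only the small, densely packed AbbrevItemId
 * array, not the tuples scattered across the page. */
static int
abbrev_binsrch(const AbbrevItemId *items, int nitems,
               const char *page, const char *key)
{
    uint8_t     kabbrev = (uint8_t) key[0];
    int         lo = 0, hi = nitems;

    while (lo < hi)
    {
        int         mid = (lo + hi) / 2;
        int         cmp;

        if (items[mid].abbrev != kabbrev)
            cmp = (items[mid].abbrev < kabbrev) ? -1 : 1;   /* cheap */
        else
            cmp = full_compare(page, items[mid].offset, key);

        if (cmp < 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

Because internal pages cover a wide key range, most probes are resolved by the single-byte comparison and never dereference a tuple at all, which is where the cache-miss savings would come from.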
On Wed, May 18, 2016 at 6:25 AM, Andres Freund <andres@anarazel.de> wrote:
> currently we IIRC use linearly sorted datums for the search in
> individual btree nodes. Not surprisingly that's often one of the
> dominant entries in profiles. We could probably improve upon that by
> using an order more optimized for efficient binary search.

Did you ever try running a pgbench SELECT benchmark, having modified things such that all PKs are on columns that are not of type int4/int8, but rather are of type numeric? It's an interesting experiment that I've been meaning to re-run on a big box.

Obviously this will be slower than an equivalent plain pgbench SELECT, but the difference may be smaller than you expect.

--
Peter Geoghegan
On Thu, May 19, 2016 at 7:28 PM, Peter Geoghegan <pg@heroku.com> wrote:
> Abbreviated keys in indexes are supposed to help with this. Basically,
> the ItemId array is made to be interlaced with small abbreviated keys
> (say one or two bytes), only in the typically less than 1% of pages
> that are internal (leaf page binary searches don't change).

I looked at this again recently. I wrote a patch to prove to myself that we can fairly easily reclaim 15 bits from every nbtree internal page ItemId, and put an abbreviated key there instead. The patch has nbtree tell PageAddItem() to store an abbreviated key within lp_len (actually, we call PageAddItemAbbrev() now). I didn't realize that stealing lp_len might be feasible until recently, but it appears to be -- this is a lot simpler than other approaches might be.

I already implemented a rudimentary, POC encoding scheme for int4, but text is the datatype that I'd expect to see a real benefit for. While this POC patch of mine could easily have bugs, it is at least true that "make check-world" passes for me.

For nbtree, reclaiming the lp_len space was just a matter of teaching a small number of existing places to go to the IndexTuple to get a tuple's length, rather than trusting an ItemId's lp_len field (that is, rather than calling ItemIdGetLength()). Most nbtree-related code already gets the length from the index tuple header today, since it's pretty rare to care about the length of a tuple but not its contents. This did require updating some generic routines within bufpage.c too, but that wasn't so bad.

Can you suggest a workload/hardware to assess the benefits of an optimization like this, Andres? Is there a profile available somewhere in the archives that shows many cache misses within _bt_binsrch()?

--
Peter Geoghegan
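[Editor's note: the ItemIdData bitfield layout below matches the one in PostgreSQL's storage/itemid.h, and INDEX_SIZE_MASK matches access/itup.h; the abbreviated-key accessors and the simplified IndexTupleData (the real struct also carries a heap TID before t_info) are a hypothetical sketch of the POC scheme described above, not code from the patch.]

```c
#include <stdint.h>

/* On-page line pointer: three bitfields packed into 32 bits. */
typedef struct ItemIdData
{
    unsigned    lp_off:15,      /* offset to tuple (from start of page) */
                lp_flags:2,     /* state of line pointer */
                lp_len:15;      /* byte length of tuple */
} ItemIdData;

/* Simplified index tuple header; the low 13 bits of t_info hold the
 * tuple size, which is why lp_len is redundant for index tuples. */
#define INDEX_SIZE_MASK 0x1FFF

typedef struct IndexTupleData
{
    uint16_t    t_info;         /* size + flag bits */
    /* key data follows */
} IndexTupleData;

/* Hypothetical accessors: on internal pages, lp_len no longer stores
 * a length but a 15-bit abbreviated key... */
static inline void
ItemIdSetAbbrev(ItemIdData *itemid, uint16_t abbrev)
{
    itemid->lp_len = abbrev & 0x7FFF;   /* 15 bits available */
}

static inline uint16_t
ItemIdGetAbbrev(const ItemIdData *itemid)
{
    return itemid->lp_len;
}

/* ...and the true length is read from the tuple's own header, i.e.
 * the replacement for trusting ItemIdGetLength(). */
static inline uint16_t
IndexTupleSize(const IndexTupleData *itup)
{
    return itup->t_info & INDEX_SIZE_MASK;
}
```

The sketch shows why the trick is "free" space-wise: the length was stored twice, once in the line pointer and once in the tuple header, so one copy can be repurposed.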
Hi,

On 2017-06-27 10:57:15 -0700, Peter Geoghegan wrote:
> I looked at this again recently. I wrote a patch to prove to myself
> that we can fairly easily reclaim 15 bits from every nbtree internal
> page ItemId, and put an abbreviated key there instead.

Interesting. Not sure, however, that really addresses my layout / cache efficiency concern? Or is that just a largely independent optimization, except it's affecting the same area of code?

> Can you suggest a workload/hardware to assess the benefits of an
> optimization like this, Andres? Is there a profile available somewhere
> in the archives that shows many cache misses within _bt_binsrch()?

I don't quite remember what triggered my report, but it's quite commonly visible in any workload that has btrees above toy sizes but still fitting in shared_buffers. Obviously you need something where btree lookups are a significant share of the time, but that's easy enough with index nested loop joins and such. I do recall seeing it recently-ish in a number of TPC-H queries.

Greetings,

Andres Freund
On 2016-05-19 19:38:02 -0700, Peter Geoghegan wrote:
> On Wed, May 18, 2016 at 6:25 AM, Andres Freund <andres@anarazel.de> wrote:
> > currently we IIRC use linearly sorted datums for the search in
> > individual btree nodes. Not surprisingly that's often one of the
> > dominant entries in profiles. We could probably improve upon that by
> > using an order more optimized for efficient binary search.
>
> Did you ever try running a pgbench SELECT benchmark, having modified
> things such that all PKs are on columns that are not of type
> int4/int8, but rather are of type numeric? It's an interesting
> experiment, that I've been meaning to re-run on a big box.
>
> Obviously this will be slower than an equivalent plain pgbench SELECT,
> but the difference may be smaller than you expect.

I'm not sure what that has to do with the topic?

- Andres
On Tue, Jun 27, 2017 at 11:04 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2017-06-27 10:57:15 -0700, Peter Geoghegan wrote:
>> I looked at this again recently. I wrote a patch to prove to myself
>> that we can fairly easily reclaim 15 bits from every nbtree internal
>> page ItemId, and put an abbreviated key there instead.
>
> Interesting. Not sure however that really addresses my layout / cache
> efficiency concern? Or is that just a largely independent optimization,
> except it's affecting the same area of code?

Well, you'd only do this on internal pages, which are a tiny minority of the total, and yet are where the majority of binary searches for an index scan occur in practice. The optimization has the effect of making the binary search only need to access the much smaller ItemId array, in the best case. In the best case, where you resolve all comparisons on internal pages, you still have to get the index tuple whose TID you need to follow to go to a page on the next level down, once the binary search for an internal page actually finds it. But that's all.

In the best case, and maybe the average case, this could be highly effective, I think. There would definitely be cases where the optimization wouldn't help at all, but hopefully it would also not hurt.

--
Peter Geoghegan
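[Editor's note: a sketch of that best-case descent step, with entirely hypothetical names and layout (in particular, the assumption that the downlink is the first field of an internal tuple is for illustration only): after the search has picked an item using only the line-pointer/abbreviated-key array, the tuple is touched exactly once, to read the downlink.]

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t BlockNumber;   /* stand-in for Postgres's BlockNumber */

/* Happy-path descent: "chosen" was decided without dereferencing any
 * tuple, so this is at most one extra cache miss per internal page. */
static BlockNumber
step_down(const char *page, const uint16_t *offsets, int chosen)
{
    BlockNumber child;

    /* memcpy avoids alignment assumptions about where tuples start */
    memcpy(&child, page + offsets[chosen], sizeof(child));
    return child;
}
```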
On Tue, Jun 27, 2017 at 11:05 AM, Andres Freund <andres@anarazel.de> wrote:
>> Did you ever try running a pgbench SELECT benchmark, having modified
>> things such that all PKs are on columns that are not of type
>> int4/int8, but rather are of type numeric? It's an interesting
>> experiment, that I've been meaning to re-run on a big box.
>>
>> Obviously this will be slower than an equivalent plain pgbench SELECT,
>> but the difference may be smaller than you expect.
>
> I'm not sure what that has to do with the topic?

It suggests that cache misses are much more important than anything else. Even with collated text, the difference is not so large, IIRC. Though it was noticeably larger.

--
Peter Geoghegan
On 2017-06-27 11:13:38 -0700, Peter Geoghegan wrote:
> On Tue, Jun 27, 2017 at 11:04 AM, Andres Freund <andres@anarazel.de> wrote:
> > Interesting. Not sure however that really addresses my layout / cache
> > efficiency concern? Or is that just a largely independent optimization,
> > except it's affecting the same area of code?
>
> Well, you'd only do this on internal pages, which are a tiny minority
> of the total, and yet are where the majority of binary searches for an
> index scan occur in practice. The optimization has the effect of
> making the binary search only need to access the much smaller ItemId
> array in that best case. In the best case, where you resolve all
> comparisons on internal pages, you still have to get the index tuple
> that you need to follow the TID of to go to a page on the next level
> down, once the binary search for an internal page actually finds it.
> But that's all.
>
> In the best case, and maybe the average case, this could be highly
> effective, I think. There would definitely be cases where the
> optimization wouldn't help at all, but hopefully it would also not
> hurt.

In other words, it's an independent optimization. That's cool, but I'd rather talk about it in an independent thread, to avoid conflating the issues.