Re: B-tree cache prefetches - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: B-tree cache prefetches
Date
Msg-id CAEepm=3+V-kKt5JjuMQARb6N66B3Dgu6Djj7yf+ndrMrzbRTBw@mail.gmail.com
Whole thread Raw
In response to Re: B-tree cache prefetches  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses Re: B-tree cache prefetches  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
On Mon, Oct 15, 2018 at 12:03 AM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> 31 авг. 2018 г., в 2:40, Thomas Munro <thomas.munro@enterprisedb.com> написал(а):
> A related topic is the cache-unfriendliness of traditional binary
> searches of sorted data.  Check out "Array Layouts for
> Comparison-Based Searching"[1] if you haven't already.  It says that
> if your array fits in L2 cache, as our btree pages do, then binary
> search still wins against Eytzinger et al, which I was a bit
> disappointed
>
> I've re-read that paper. Their results are not about our case, here's a quote:
> > For arrays small enough _to be kept_ in L2 cache, the branch-free binary search code listed in Listing 2 is the
fastestalgorithm 
>
> Surely, we cannot keep all pages in L2. Eytzinger layout could be realistic possibility, except one simple problem: I
donot see how can we insert items into this layout. 

I don't know.  Aren't we talking about binary search within one 8KB
page here, anyway?

Thinking about some other ideas for cache prefetching in btree code,
here's an idea to improve pipelining of index scans (a bit like the
two-step pipeline idea from the hash join prefetch thread).  We know
the TIDs of heap tuples we'll be looking up in the near future, so can
we prefetch all the way to the tuple?   There are three random
accesses that follow soon after reading an index tuple:
BufTableLookup(), then page header, then the tuple itself.  At the
logical extreme, we could anticipate the probe of SharedBufHash for
the TID from index tuple i + 3 (perhaps dynahash could expose a
hash_prefetch_for_hash() facility), and anticipate the read of the
page header for the tuple pointed to by index tuple i + 2 (perhaps a
dirty read of SharedBufHash that is only prepared to look at the first
entry in the bucket would be OK for this), and anticipate the read of
the tuple for the TID from index tuple i + 1 (perhaps a dirty read of
the page item table).  Or some less ambitious subset, if that pipeline
turns out to be too deep; it seems like the simplest experiment would
be to try prefetching just SharedBufHash lookups.  This may all be
completely crazy, I haven't tried any of it.

> Also, that research is quite distant from B-tree binsearch: thier data items are small and do not represent reference
toactual data. Eytzinger exploits the fact that two posiible future keys share same cache line. But it is hard to
provide,given we place items at quite random place of the page. 

Based on the the above quotes, I'm not suggesting we should use
Eytzinger for search-within-page.  But out of curiosity, I checked,
and saw that you can actually get a pair of index tuples holding a
six-character text key or an integer key into a 64 byte cache line.

--
Thomas Munro
http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Matheus de Oliveira
Date:
Subject: Re: [PATCH] Add support for ON UPDATE/DELETE actions on ALTER CONSTRAINT
Next
From: Tom Lane
Date:
Subject: Re: "SELECT ... FROM DUAL" is not quite as silly as it appears