Re: RFC: Improve CPU cache locality of syscache searches - Mailing list pgsql-hackers

From John Naylor
Subject Re: RFC: Improve CPU cache locality of syscache searches
Msg-id CAFBsxsGkBtEVjjMLZcRQqKxUCZBauoiLBPmH3X-EDSSWd__Yug@mail.gmail.com
In response to Re: RFC: Improve CPU cache locality of syscache searches  (Andres Freund <andres@anarazel.de>)
Responses Re: RFC: Improve CPU cache locality of syscache searches  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Aug 4, 2021 at 3:44 PM Andres Freund <andres@anarazel.de> wrote:
> On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> > typedef struct cc_bucket
> > {
> >   uint32 hashes[4];
> >   catctup *ct[4];
> >   dlist_head conflicts;
> > } cc_bucket;
>
> I'm not convinced that the above is the right idea though. Even if the hash
> matches, you're still going to need to fetch at least catctup->keys[0] from
> a separate cacheline to be able to return the cache entry.

I see your point. It doesn't make sense to inline only part of the information needed.

> struct cc_bucket_1
> {
>     uint32 hashes[3]; // 12
>     // 4 bytes alignment padding
>     Datum key0s[3]; // 24
>     catctup *ct[3]; // 24
>     // cacheline boundary
>     dlist_head conflicts; // 16
> };
>
> would be better for 1 key values?
>
> It's obviously annoying to need different bucket types for different key
> counts, but given how much 3 unused key Datums waste, it seems worth paying
> for?

Yeah, it's annoying, but leaving out the unused Datums makes a big difference. Bucket size in 64-byte cachelines, inlining 3 or 4 values per bucket:

keys  3 values  4 values
1     1 1/4     1 1/2
2     1 5/8     2
3     2         2 1/2
4     2 3/8     3
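To double-check those numbers, here's a sketch that lets the compiler do the layout. Datum, dlist_head and catctup are minimal stand-ins for the PostgreSQL types (assuming a 64-bit build with 64-byte cachelines), and DEFINE_BUCKET is a hypothetical macro that stores the inlined keys key-major, like key0s[] in the struct above:

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for PostgreSQL types, assuming a 64-bit build. */
typedef uintptr_t Datum;
typedef struct dlist_node { struct dlist_node *prev, *next; } dlist_node;
typedef struct dlist_head { dlist_node head; } dlist_head;
typedef struct catctup catctup;        /* opaque; the tuple lives elsewhere */

#define CACHELINE_SIZE 64

/* Hypothetical bucket inlining NVALS entries with NKEYS keys each. */
#define DEFINE_BUCKET(name, NKEYS, NVALS)  \
    typedef struct name                    \
    {                                      \
        uint32_t   hashes[NVALS];          \
        Datum      keys[NKEYS][NVALS];     \
        catctup   *ct[NVALS];              \
        dlist_head conflicts;              \
    } name

DEFINE_BUCKET(bucket_k1_v3, 1, 3);
DEFINE_BUCKET(bucket_k2_v3, 2, 3);
DEFINE_BUCKET(bucket_k3_v3, 3, 3);
DEFINE_BUCKET(bucket_k4_v3, 4, 3);
DEFINE_BUCKET(bucket_k1_v4, 1, 4);
DEFINE_BUCKET(bucket_k2_v4, 2, 4);
DEFINE_BUCKET(bucket_k3_v4, 3, 4);
DEFINE_BUCKET(bucket_k4_v4, 4, 4);

/* Bucket size in eighths of a cacheline: 10 means 1 1/4 lines. */
#define CACHELINE_EIGHTHS(type) (sizeof(type) * 8 / CACHELINE_SIZE)
```

For example, sizeof(bucket_k2_v3) comes out to 104 bytes, which is 13 eighths, i.e. 1 5/8 cachelines.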

Or, looking at it another way, if we limit the bucket to 2 cachelines (128 bytes), we can inline:

keys  values per bucket
1     5
2     4
3     3
4     2
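The same limits fall out of a small search over the bucket size. This is a hypothetical helper using the same layout assumptions (64-bit build, hash array padded to 8-byte alignment, 16-byte dlist_head), not anything in catcache.c today:

```c
#include <stddef.h>

/*
 * Hypothetical: largest number of entries a bucket can inline within
 * 'budget' bytes when each entry carries 'nkeys' keys. Assumes 4-byte
 * hashes (array padded to 8 bytes), 8-byte Datums and pointers, and a
 * 16-byte dlist_head for the overflow list.
 */
static int
max_inline_values(int nkeys, size_t budget)
{
    int n = 0;

    for (;;)
    {
        size_t next = (size_t) n + 1;
        size_t hashes = (next * 4 + 7) & ~(size_t) 7;  /* padded hash array */
        size_t size = hashes
            + next * (size_t) nkeys * 8                   /* inlined keys */
            + next * 8                                    /* catctup pointers */
            + 16;                                         /* dlist_head */

        if (size > budget)
            return n;
        n++;
    }
}
```

With a 128-byte budget it returns 5, 4, 3 and 2 for 1 to 4 keys.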

Although I'm guessing inlining just two values in the 4-key case wouldn't buy much.
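For reference, a rough sketch of a probe over the 1-key layout, assuming a 64-bit build so that hashes, key0s and ct all sit in the first cacheline. cc_bucket_1_probe and datum_eq_byval are hypothetical names; the latter stands in for a by-value eqfast function, and real code would go on to walk the conflicts list on a miss:

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal stand-ins for PostgreSQL types, assuming a 64-bit build. */
typedef uintptr_t Datum;
typedef struct dlist_node { struct dlist_node *prev, *next; } dlist_node;
typedef struct dlist_head { dlist_node head; } dlist_head;
typedef struct catctup catctup;        /* opaque; the tuple lives elsewhere */

#define CC_BUCKET_1_NVALUES 3

typedef struct cc_bucket_1
{
    uint32_t   hashes[CC_BUCKET_1_NVALUES];  /* 12 bytes + 4 padding */
    Datum      key0s[CC_BUCKET_1_NVALUES];   /* 24 */
    catctup   *ct[CC_BUCKET_1_NVALUES];      /* 24, ends at byte 64 */
    dlist_head conflicts;                    /* overflow, second cacheline */
} cc_bucket_1;

/* Hypothetical stand-in for an "eqfast" function on a by-value key. */
typedef int (*key_eq_fn) (Datum a, Datum b);

static int
datum_eq_byval(Datum a, Datum b)
{
    return a == b;
}

/*
 * Hypothetical probe: compare the hash first, then the inlined key, and
 * only return the catctup pointer on a full match. NULL means the caller
 * still has to search bucket->conflicts.
 */
static catctup *
cc_bucket_1_probe(const cc_bucket_1 *bucket, uint32_t hash, Datum key0,
                  key_eq_fn eq)
{
    for (int i = 0; i < CC_BUCKET_1_NVALUES; i++)
    {
        if (bucket->hashes[i] == hash &&
            bucket->ct[i] != NULL &&
            eq(bucket->key0s[i], key0))
            return bucket->ct[i];
    }
    return NULL;
}
```

On a hit, nothing outside the first 64 bytes of the bucket is touched until the caller dereferences the returned catctup.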

> If we stuffed four values into one bucket we could potentially SIMD the hash
> and Datum comparisons ;)

;-) That's an interesting future direction to consider when we support building with x86-64-v2. It'd be pretty easy to compare a vector of hashes and quickly get the array index for the key comparisons (ignoring for the moment how to handle the rare case of multiple identical hashes). However, we currently don't memcmp() the Datums but instead call an "eqfast" function, so I don't see how that part would work in a vector setting.
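A sketch of the hash half of that, assuming a bucket of four uint32 hashes. It uses SSE2 intrinsics when __SSE2__ is defined, with a scalar fallback otherwise, and returns a bitmask rather than a single index, which is one way to cope with duplicate hashes. The key check stays a scalar loop over the candidates; match_hashes4, find_entry4 and datum_eq_byval are hypothetical names, the last standing in for eqfast:

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

typedef uintptr_t Datum;               /* stand-in for PostgreSQL's Datum */
typedef int (*key_eq_fn) (Datum a, Datum b);

/* Hypothetical stand-in for an "eqfast" function on a by-value key. */
static int
datum_eq_byval(Datum a, Datum b)
{
    return a == b;
}

/* Bit i of the result is set when hashes[i] == hash. */
static unsigned
match_hashes4(const uint32_t hashes[4], uint32_t hash)
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *) hashes);
    __m128i cmp = _mm_cmpeq_epi32(v, _mm_set1_epi32((int) hash));

    /* one bit per 32-bit lane, taken from each lane's sign bit */
    return (unsigned) _mm_movemask_ps(_mm_castsi128_ps(cmp));
#else
    unsigned mask = 0;

    for (int i = 0; i < 4; i++)
        if (hashes[i] == hash)
            mask |= 1u << i;
    return mask;
#endif
}

/* Index of the first entry whose hash and key both match, or -1. */
static int
find_entry4(const uint32_t hashes[4], const Datum keys[4],
            uint32_t hash, Datum key, key_eq_fn eq)
{
    unsigned mask = match_hashes4(hashes, hash);

    /* each set bit is a candidate; the key comparison stays scalar */
    for (int i = 0; i < 4; i++)
        if ((mask & (1u << i)) && eq(keys[i], key))
            return i;
    return -1;
}
```

Only the hash comparison is vectorized; candidates still go through the equality callback one at a time.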

--
John Naylor
EDB: http://www.enterprisedb.com
