Re: RFC: Improve CPU cache locality of syscache searches - Mailing list pgsql-hackers

From Andres Freund
Subject Re: RFC: Improve CPU cache locality of syscache searches
Date
Msg-id 20210804194444.gneyztouva3fmngg@alap3.anarazel.de
Whole thread Raw
In response to RFC: Improve CPU cache locality of syscache searches  (John Naylor <john.naylor@enterprisedb.com>)
Responses Re: RFC: Improve CPU cache locality of syscache searches  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers
Hi,

On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> CPU caches have multiple levels, so I had an idea to use that concept in
> the syscaches.

I do think we loose a good bit to syscache efficiency in real workloads, so I
think it's worth investing time into it.

However:

> Imagine if catcache buckets were cacheline-sized. In that case, we would
> store the most recently accessed hash values and pointers to catctup, in
> addition to the dlist_head:
>
> typedef struct cc_bucket
> {
>   uint32 hashes[4];
>   catctup *ct[4];
>   dlist_head;
> };

I'm not convinced that the above the right idea though. Even if the hash
matches, you're still going to need to fetch at least catctup->keys[0] from
a separate cacheline to be able to return the cache entry.

If the hashes we use are decent and we're resizing at reasonable thresholds,
we shouldn't constantly lookup up values that are later in a collision chain -
particularly because we move cache hits to the head of the bucket. So I don't
think a scheme as you described would actually result in a really better cache
hit ratio?

ISTM that something like

struct cc_bucket_1
{
    uint32 hashes[3]; // 12
    // 4 bytes alignment padding
    Datum key0s[3]; // 24
    catctup *ct[3]; // 24
    // cacheline boundary
    dlist_head conflicts; // 16
};

would be better for 1 key values?

It's obviously annoying to need different bucket types for different key
counts, but given how much 3 unused key Datums waste, it seems worth paying
for?

One issue with stuff like this is that aset.c won't return cacheline aligned
values...


It's possible that it'd be better to put the catctup pointers onto a
neigboring cacheline (so you still get TLB benefits, as well as making it easy
for prefetchers) and increase the number of hashes/keys that fit on one
cacheline.

If we stuffed four values into one bucket we could potentially SIMD the hash
and Datum comparisons ;)

Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Robert Haas
Date:
Subject: Re: A varint implementation for PG?
Next
From: Andres Freund
Date:
Subject: Re: A varint implementation for PG?