Home > mailing lists

Re: RFC: Improve CPU cache locality of syscache searches - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: RFC: Improve CPU cache locality of syscache searches
Date	August 4, 2021 22:44:44
Msg-id	20210804194444.gneyztouva3fmngg@alap3.anarazel.de Whole thread Raw
In response to	RFC: Improve CPU cache locality of syscache searches (John Naylor <john.naylor@enterprisedb.com>)
Responses	Re: RFC: Improve CPU cache locality of syscache searches (John Naylor <john.naylor@enterprisedb.com>)
List	pgsql-hackers

Tree view

Hi,

On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> CPU caches have multiple levels, so I had an idea to use that concept in
> the syscaches.

I do think we loose a good bit to syscache efficiency in real workloads, so I
think it's worth investing time into it.

However:

> Imagine if catcache buckets were cacheline-sized. In that case, we would
> store the most recently accessed hash values and pointers to catctup, in
> addition to the dlist_head:
>
> typedef struct cc_bucket
> {
>   uint32 hashes[4];
>   catctup *ct[4];
>   dlist_head;
> };

I'm not convinced that the above the right idea though. Even if the hash
matches, you're still going to need to fetch at least catctup->keys[0] from
a separate cacheline to be able to return the cache entry.

If the hashes we use are decent and we're resizing at reasonable thresholds,
we shouldn't constantly lookup up values that are later in a collision chain -
particularly because we move cache hits to the head of the bucket. So I don't
think a scheme as you described would actually result in a really better cache
hit ratio?

ISTM that something like

struct cc_bucket_1
{
    uint32 hashes[3]; // 12
    // 4 bytes alignment padding
    Datum key0s[3]; // 24
    catctup *ct[3]; // 24
    // cacheline boundary
    dlist_head conflicts; // 16
};

would be better for 1 key values?

It's obviously annoying to need different bucket types for different key
counts, but given how much 3 unused key Datums waste, it seems worth paying
for?

One issue with stuff like this is that aset.c won't return cacheline aligned
values...

It's possible that it'd be better to put the catctup pointers onto a
neigboring cacheline (so you still get TLB benefits, as well as making it easy
for prefetchers) and increase the number of hashes/keys that fit on one
cacheline.

If we stuffed four values into one bucket we could potentially SIMD the hash
and Datum comparisons ;)

Greetings,

Andres Freund

pgsql-hackers by date:

From: Robert Haas
Date: 04 August 2021, 22:37:36
Subject: Re: A varint implementation for PG?

From: Andres Freund
Date: 04 August 2021, 22:45:58
Subject: Re: A varint implementation for PG?

Re: RFC: Improve CPU cache locality of syscache searches - Mailing list pgsql-hackers

Previous

Next