Re: RFC: Improve CPU cache locality of syscache searches - Mailing list pgsql-hackers

From John Naylor
Subject Re: RFC: Improve CPU cache locality of syscache searches
Date
Msg-id CAFBsxsEtCJkO8vOjfBzfHdsG0TDGjt5CGRE3RG_1+0WOpyw3sg@mail.gmail.com
Whole thread Raw
In response to Re: RFC: Improve CPU cache locality of syscache searches  (Andres Freund <andres@anarazel.de>)
Responses Re: RFC: Improve CPU cache locality of syscache searches  (John Naylor <john.naylor@enterprisedb.com>)
List pgsql-hackers

On Thu, Aug 5, 2021 at 4:12 PM Andres Freund <andres@anarazel.de> wrote:
> I have wondered before whether we should have syscache definitions generate
> code specific to each syscache definition. I do think that'd give a good bit
> of performance boost. But I don't see a trivial way to get there without
> notational overhead.
>
> We could define syscaches in a header using a macro that's defined differently
> in syscache.c than everywhere else. The header would declare a set of
> functions for each syscache, syscache.c would define them to call an
> always_inline function with the relevant constants.
>
> Or perhaps we should move syscache definitions into the pg_*.h headers, and
> generate the relevant code as part of their processing. That seems like it
> could be nice from a modularity POV alone. And it could do better than the
> current approach, because we could hardcode the types for columns in the
> syscache definition without increasing notational overhead.

I had code laying around to generate the info needed for initialization, but I apparently deleted it and never emailed it. :-(

If we went that far, I wonder if we could specialize the cc_bucket for the actual types, rather than just number of Datums, which would further save on space. More complex, though.

Also, if comparisons got cheaper, maybe we could get away with not storing the full hash in the bucket in favor of a tag of the 16 highest bits. (The catctup would still need the full hash for invalidation, at least).

Another idea I came across while researching in-memory key-value stores is "bulk chaining" -- see [1] for an example. In that case, our example 2-cacheline bucket would have a dlist_head pointing not to the catctups, but to another bucket. So that throws away my L1/L2 analogy and uses a chain of buckets, each of which contain multiple sets of hashes/keys/pointers. That's closer to a normal collision chain, but with better cache locality. If we did that, I imagine the catctup could dispense with storing its Datum array and its dlist_node entirely.

[1] https://www.usenix.org/system/files/conference/nsdi14/nsdi14-paper-lim.pdf

--
John Naylor
EDB: http://www.enterprisedb.com

pgsql-hackers by date:

Previous
From: "tanghy.fnst@fujitsu.com"
Date:
Subject: [PATCH]Remove obsolete macro CHECKFLOATVAL in btree_gist
Next
From: Tom Lane
Date:
Subject: Re: Alias collision in `refresh materialized view concurrently`