Re: RFC: Improve CPU cache locality of syscache searches - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Re: RFC: Improve CPU cache locality of syscache searches |
Date | |
Msg-id | 20210805201201.cnc4hagkglxk4pos@alap3.anarazel.de Whole thread Raw |
In response to | Re: RFC: Improve CPU cache locality of syscache searches (John Naylor <john.naylor@enterprisedb.com>) |
Responses |
Re: RFC: Improve CPU cache locality of syscache searches
Re: RFC: Improve CPU cache locality of syscache searches Re: RFC: Improve CPU cache locality of syscache searches |
List | pgsql-hackers |
Hi, On 2021-08-05 12:27:49 -0400, John Naylor wrote: > On Wed, Aug 4, 2021 at 3:44 PM Andres Freund <andres@anarazel.de> wrote: > > On 2021-08-04 12:39:29 -0400, John Naylor wrote: > > > typedef struct cc_bucket > > > { > > > uint32 hashes[4]; > > > catctup *ct[4]; > > > dlist_head; > > > }; > > > > I'm not convinced that the above the right idea though. Even if the hash > > matches, you're still going to need to fetch at least catctup->keys[0] > from > > a separate cacheline to be able to return the cache entry. > > I see your point. It doesn't make sense to inline only part of the > information needed. At least not for the unconditionally needed information. > Although I'm guessing inlining just two values in the 4-key case wouldn't > buy much. Not so sure about that. I'd guess that two key comparisons take more cycles than a cacheline fetch the further keys (perhaps not if we had inlined key comparisons). I.e. I'd expect out-of-order + speculative execution to hide the latency for fetching the second cacheline for later key values. > > If we stuffed four values into one bucket we could potentially SIMD the > hash > > and Datum comparisons ;) > > ;-) That's an interesting future direction to consider when we support > building with x86-64-v2. It'd be pretty easy to compare a vector of hashes > and quickly get the array index for the key comparisons (ignoring for the > moment how to handle the rare case of multiple identical hashes). > However, we currently don't memcmp() the Datums and instead call an > "eqfast" function, so I don't see how that part would work in a vector > setting. It definitely couldn't work unconditionally - we have to deal with text, oidvector, comparisons after all. But we could use it for the other types. However, looking at the syscaches, I think it'd not very often be applicable for caches with enough columns. I have wondered before whether we should have syscache definitions generate code specific to each syscache definition. I do think that'd give a good bit of performance boost. But I don't see a trivial way to get there without notational overhead. We could define syscaches in a header using a macro that's defined differently in syscache.c than everywhere else. The header would declare a set of functions for each syscache, syscache.c would define them to call an always_inline function with the relevant constants. Or perhaps we should move syscache definitions into the pg_*.h headers, and generate the relevant code as part of their processing. That seems like it could be nice from a modularity POV alone. And it could do better than the current approach, because we could hardcode the types for columns in the syscache definition without increasing notational overhead. Greetings, Andres Freund
pgsql-hackers by date: