> On 2021-08-04 12:39:29 -0400, John Naylor wrote:
> > typedef struct cc_bucket
> > {
> >     uint32      hashes[4];
> >     catctup    *ct[4];
> >     dlist_head  conflicts;
> > } cc_bucket;
>
> I'm not convinced that the above is the right idea, though. Even if the hash
> matches, you're still going to need to fetch at least catctup->keys[0] from
> a separate cacheline to be able to return the cache entry.
I see your point. It doesn't make sense to inline only part of the information needed.
> struct cc_bucket_1
> {
>     uint32      hashes[3];      // 12
>     // 4 bytes alignment padding
>     Datum       key0s[3];       // 24
>     catctup    *ct[3];          // 24
>     // cacheline boundary
>     dlist_head  conflicts;      // 16
> };
>
> would be better for single-key caches?
>
> It's obviously annoying to need different bucket types for different key
> counts, but given how much 3 unused key Datums waste, it seems worth paying
> for?
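To make Andres's point concrete, here is a rough sketch that checks the offsets in the proposed 1-key bucket and probes its inline slots. It assumes a 64-bit build and uses stand-in definitions for Datum, dlist_head and catctup (not PostgreSQL's real headers); cc_bucket_1_lookup is a hypothetical name, and it compares Datums bitwise, which is only valid for by-value keys such as an Oid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Stand-ins for PostgreSQL types, assuming a 64-bit build: Datum is
 * pointer-sized, dlist_head is a pair of pointers (16 bytes), and the
 * cache entry type is left opaque.
 */
typedef uintptr_t Datum;
typedef struct dlist_node
{
    struct dlist_node *prev;
    struct dlist_node *next;
} dlist_node;
typedef struct dlist_head
{
    dlist_node  head;
} dlist_head;
typedef struct catctup catctup;

struct cc_bucket_1
{
    uint32_t    hashes[3];      /* 12 bytes, then 4 bytes of padding */
    Datum       key0s[3];       /* 24 bytes */
    catctup    *ct[3];          /* 24 bytes */
    /* cacheline boundary (offset 64) */
    dlist_head  conflicts;      /* 16 bytes */
};

_Static_assert(sizeof(void *) == 8, "sketch assumes 64-bit pointers");
_Static_assert(offsetof(struct cc_bucket_1, key0s) == 16, "padding after hashes");
_Static_assert(offsetof(struct cc_bucket_1, conflicts) == 64, "conflicts starts the 2nd cacheline");
_Static_assert(sizeof(struct cc_bucket_1) == 80, "80 bytes = 1 1/4 cachelines");

/*
 * Probe the inline slots. Hash, key0 and entry pointer for every slot sit in
 * the first 64 bytes, so (if the bucket is cacheline-aligned) a hit touches
 * no other cacheline. Assumes a by-value key so Datum equality is key
 * equality, and that empty slots have ct == NULL. A miss would fall back to
 * walking b->conflicts, which is not modeled here.
 */
static catctup *
cc_bucket_1_lookup(const struct cc_bucket_1 *b, uint32_t hash, Datum key0)
{
    assert(b != NULL);
    for (int i = 0; i < 3; i++)
    {
        if (b->ct[i] != NULL && b->hashes[i] == hash && b->key0s[i] == key0)
            return b->ct[i];
    }
    return NULL;
}
```

Two slots can share a hash and still be told apart by key0 without leaving the first cacheline, which is exactly what the 4-hash bucket above cannot do.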
Yeah, it's annoying, but leaving out the unused Datums makes a big difference:
       cachelines per bucket
keys   3 values   4 values
1      1 1/4      1 1/2
2      1 5/8      2
3      2          2 1/2
4      2 3/8      3
Or, looking at it another way: if we limit the bucket size to 2 cachelines, we can fit this many values:
keys   values
1      5
2      4
3      3
4      2
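Both tables follow from simple byte arithmetic. Here is a rough sketch that reproduces them, assuming 64-byte cachelines, 8-byte Datums and pointers, a 16-byte dlist_head, and the cc_bucket_1 layout generalized to k keys per value (uint32 hashes padded to 8 bytes, then key Datums, then entry pointers, then the conflicts list). The helper names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define CACHELINE 64

/* Hypothetical helper: bytes used by a bucket inlining nvalues entries of nkeys keys. */
static size_t
bucket_bytes(int nkeys, int nvalues)
{
    size_t      hashes = 4 * (size_t) nvalues;  /* uint32 hashes */

    assert(nkeys >= 1 && nkeys <= 4);
    hashes = (hashes + 7) & ~(size_t) 7;        /* pad to 8-byte Datum alignment */
    return hashes
        + 8 * (size_t) nkeys * (size_t) nvalues /* key Datums */
        + 8 * (size_t) nvalues                  /* entry pointers */
        + 16;                                   /* dlist_head conflicts */
}

/* Largest number of inline values whose bucket fits in budget bytes. */
static int
max_values_within(int nkeys, size_t budget)
{
    int         n = 0;

    while (bucket_bytes(nkeys, n + 1) <= budget)
        n++;
    return n;
}

/* Print both tables: size in cachelines, and capacity at 2 cachelines. */
void
print_bucket_tables(void)
{
    printf("keys  3 values  4 values  values in 2 lines\n");
    for (int k = 1; k <= 4; k++)
        printf("%4d  %8.3f  %8.3f  %17d\n", k,
               bucket_bytes(k, 3) / (double) CACHELINE,
               bucket_bytes(k, 4) / (double) CACHELINE,
               max_values_within(k, 2 * CACHELINE));
}
```

For example, two keys and three values come to 104 bytes, i.e. the 1 5/8 cachelines in the first table, and with four keys a third value would push the bucket to 152 bytes, past the 128-byte limit.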
That said, I'm guessing that inlining just two values in the 4-key case wouldn't buy much.
> If we stuffed four values into one bucket we could potentially SIMD the hash
> and Datum comparisons ;)
;-) That's an interesting future direction to consider when we support building with x86-64-v2. It'd be pretty easy to compare a vector of hashes and quickly get the array index for the key comparisons (ignoring for the moment how to handle the rare case of multiple identical hashes). However, we currently don't memcmp() the Datums and instead call an "eqfast" function, so I don't see how that part would work in a vector setting.
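A minimal sketch of how the split might look: the hash compare is vectorized, the key compare stays scalar. It assumes a hypothetical 4-slot bucket with all slots populated and an eqfast-style callback; int_eqfast is a made-up by-value comparator, not PostgreSQL's. The four 32-bit compares collapse into one bitmask (`_mm_cmpeq_epi32` needs only SSE2; other platforms get a scalar loop producing the same mask).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

typedef uintptr_t Datum;
/* Hypothetical stand-in for a per-key "eqfast" equality function. */
typedef bool (*eqfast_fn) (Datum a, Datum b);

/* Example comparator for by-value keys (illustration only). */
static bool
int_eqfast(Datum a, Datum b)
{
    return a == b;
}

/* Bitmask with bit i set when hashes[i] == hash. */
static unsigned
match_hashes4(const uint32_t hashes[4], uint32_t hash)
{
#if defined(__SSE2__)
    __m128i     v = _mm_loadu_si128((const __m128i *) hashes);
    __m128i     eq = _mm_cmpeq_epi32(v, _mm_set1_epi32((int) hash));

    /* each matching lane is all ones, so its sign bit lands in the mask */
    return (unsigned) _mm_movemask_ps(_mm_castsi128_ps(eq));
#else
    unsigned    mask = 0;

    for (int i = 0; i < 4; i++)
        if (hashes[i] == hash)
            mask |= 1u << i;
    return mask;
#endif
}

/* Index of the matching slot, or -1. Keys are compared one by one. */
static int
bucket_find(const uint32_t hashes[4], const Datum keys[4],
            uint32_t hash, Datum key, eqfast_fn eq)
{
    unsigned    mask = match_hashes4(hashes, hash);

    assert(eq != NULL);
    for (int i = 0; i < 4; i++)
        if ((mask & (1u << i)) && eq(keys[i], key))
            return i;
    return -1;
}
```

Trying every set bit in turn means the rare duplicate-hash case just costs extra eqfast calls, while the callback itself is the part that stays outside the vector registers.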
--
John Naylor
EDB: http://www.enterprisedb.com