On Mon, 27 Oct 2025 at 16:55, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm, I wasn't really expecting any direct time saving; the point
> was about cutting memory consumption. So Chao Li's nearby results
> are in line with mine.
It's for the same reason that Hash Join starts to run more slowly once
the hash table is larger than L3 cache. Because the memory access
pattern when probing the hash table can't be predicted by the CPU,
larger tables have to pull cachelines in from RAM more often. The same
happens at smaller sizes when spilling from L2 out to L3 (and even
from L1d out to L2). If you graphed a range of table sizes, you'd see
the per-lookup performance drop off each time the memory usage crosses
a cache-size boundary.
By using the bump allocator, you've made more tuples fit in the same
amount of memory, which increases the chances that the cachelines a
probe needs are already cached.
If you happened to always probe the hash table in hash key order, this
probably wouldn't happen (or would at least happen to a lesser
extent), as the hardware prefetcher would detect the forward access
pattern and prefetch the memory before it's needed.
David