On Wed, Apr 16, 2025 at 9:14 PM Jakub Wartak
<jakub.wartak@enterprisedb.com> wrote:
> 2. Should we also interleave DSA/DSM for Parallel Query? (I'm not an
> expert on DSA/DSM at all)
I have no answers but I have speculated for years about a very
specific case (without any idea where to begin due to lack of ... I
guess all this sort of stuff): in ExecParallelHashJoinNewBatch(),
workers split up and try to work on different batches on their own to
minimise contention, and when that's not possible (more workers than
batches, or finishing their existing work at different times and going
to help others), they just proceed in round-robin order. A beginner
thought is: if you're going to help someone working on a hash table,
it would surely be best to have the CPUs and all the data on the same
NUMA node. During loading, cache line ping-pong would be cheaper, and
during probing, it *might* be easier to tune explicit memory prefetch
timing that way, since it would look more like a single-node system with
a fixed latency, IDK (I've shared patches for prefetching before that
showed pretty decent speedups, and the lack of that feature is
probably a bigger problem than any of this stuff, who knows...).
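To make the batch-helping idea concrete, here's a toy sketch (not
PostgreSQL code; choose_batch, batch_node and batch_done are all
invented names) of what a NUMA-aware fallback could look like: a worker
first scans for an unfinished batch whose hash table memory lives on
its own node, and only if none exists does it fall back to today's
plain round-robin order:

```c
#include <stddef.h>

/*
 * Hypothetical sketch: pick a batch for a worker on NUMA node
 * my_node, preferring batches whose memory is on the same node.
 * batch_node[b] says which node batch b's memory lives on;
 * batch_done[b] is nonzero once batch b needs no more help.
 */
int
choose_batch(int my_node, int start, int nbatches,
             const int *batch_node, const int *batch_done)
{
    /* First pass: prefer a batch on my own NUMA node. */
    for (int i = 0; i < nbatches; i++)
    {
        int b = (start + i) % nbatches;

        if (!batch_done[b] && batch_node[b] == my_node)
            return b;
    }

    /* Second pass: plain round-robin fallback, as today. */
    for (int i = 0; i < nbatches; i++)
    {
        int b = (start + i) % nbatches;

        if (!batch_done[b])
            return b;
    }

    return -1;                  /* nothing left to help with */
}
```

Obviously the real thing would have to learn which node each batch's
DSA memory actually lives on, which is part of the problem.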
Another beginner thought is that the DSA allocator is a source of
contention during loading: the dumbest problem is that the chunks are
just too small, but it might also be interesting to look into per-node
pools. Or something. IDK, just some thoughts...
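For the per-node pool idea, a toy model (again all invented names,
nothing like dsa.c's real structure) would be to index separate pools
by the caller's NUMA node, so that concurrent loaders on different
nodes never touch each other's pool state or lock:

```c
#include <stdlib.h>

/*
 * Hypothetical sketch of per-NUMA-node allocation pools.  In a real
 * allocator each pool would hold its own lock and freelists of
 * node-local chunks; here malloc() stands in for getting node-local
 * memory, and we only track bytes handed out per node.
 */
#define MAX_NODES 8

typedef struct NodePool
{
    size_t      allocated;      /* bytes handed out from this node */
} NodePool;

static NodePool pools[MAX_NODES];

/* Allocate from the caller's own node's pool. */
void *
pool_alloc(int my_node, size_t size)
{
    pools[my_node].allocated += size;
    return malloc(size);        /* stand-in for a node-local chunk */
}

size_t
pool_used(int node)
{
    return pools[node].allocated;
}
```

Bigger chunk sizes would be an orthogonal knob on top of this: fewer,
larger allocations per pool means fewer trips through whatever lock
each pool has.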