Re: NUMA shared memory interleaving - Mailing list pgsql-hackers

From Jakub Wartak
Subject Re: NUMA shared memory interleaving
Date
Msg-id CAKZiRmxYMPbQ4WiyJWh=Vuw_Ny+hLGH9_9FaacKRJvzZ-smm+w@mail.gmail.com
In response to Re: NUMA shared memory interleaving  (Bertrand Drouvot <bertranddrouvot.pg@gmail.com>)
List pgsql-hackers
On Fri, Apr 18, 2025 at 7:48 PM Bertrand Drouvot
<bertranddrouvot.pg@gmail.com> wrote:
>
> Hi,
>
> On Thu, Apr 17, 2025 at 01:58:44AM +1200, Thomas Munro wrote:
> > On Wed, Apr 16, 2025 at 9:14 PM Jakub Wartak
> > <jakub.wartak@enterprisedb.com> wrote:
> > > 2. Should we also interleave DSA/DSM for Parallel Query? (I'm not an
> > > expert on DSA/DSM at all)
> >
> > I have no answers but I have speculated for years about a very
> > specific case (without any idea where to begin due to lack of ... I
> > guess all this sort of stuff): in ExecParallelHashJoinNewBatch(),
> > workers split up and try to work on different batches on their own to
> > minimise contention, and when that's not possible (more workers than
> > batches, or finishing their existing work at different times and going
> > to help others), they just proceed in round-robin order.  A beginner
> > thought is: if you're going to help someone working on a hash table,
> > it would surely be best to have the CPUs and all the data on the same
> > NUMA node.  During loading, cache line ping pong would be cheaper, and
> > during probing, it *might* be easier to tune explicit memory prefetch
> > timing that way as it would look more like a single node system with a
> > fixed latency, IDK (I've shared patches for prefetching before that
> > showed pretty decent speedups, and the lack of that feature is
> > probably a bigger problem than any of this stuff, who knows...).
> > Another beginner thought is that the DSA allocator is a source of
> > contention during loading: the dumbest problem is that the chunks are
> > just too small, but it might also be interesting to look into per-node
> > pools.  Or something.   IDK, just some thoughts...
>
> I'm also thinking that could be beneficial for parallel workers. I think the
> ideal scenario would be to have the parallel workers spread across numa nodes and
> accessing their "local" memory first (and help with "remote" memory access if
> there is still more work to do "remotely").

Hi Bertrand, I've played with CPU pinning of PQ workers (via adjusting
the postmaster's pinning), but I got quite the opposite results - please
see attached, especially the latency ("lat") against how the CPUs were
assigned vs NUMA/s_b when it was not interleaved. Not that I intend to
spend a lot of time researching PQ vs NUMA, but I've included
interleaving of the PQ shm segments too in the v4 patch in the subthread
nearby. The results attached here were made some time ago with v1 of the
patch, where the PQ shm segment was not yet interleaved.
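
In case it clarifies what "CPU pinning of PQ workers" means above: the
experiments simply adjusted the postmaster's affinity so that every
child inherited the CPU set, but a per-worker equivalent could look
roughly like the libnuma sketch below - purely illustrative, not code
from any of the patches:

#include <numa.h>
#include <stdio.h>

/*
 * Restrict the calling process (e.g. a parallel worker right after it
 * starts) to the CPUs of a single NUMA node, so it preferably works on
 * data that is local to that node.
 */
static void
pin_self_to_node(int node)
{
    if (numa_available() == -1)
        return;                         /* no NUMA support, nothing to do */

    if (numa_run_on_node(node) != 0)    /* bind to the CPUs of "node" */
        perror("numa_run_on_node");
}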

If anything, it would be good to hear whether there are any sensible
production-like scenarios/workloads where dynamic_shared_memory_type
should be set to sysv or mmap (instead of the default posix)? Asking
about Linux only; I couldn't come up with anything (?)
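
To make the question concrete: on Linux all three settings end up as a
shared mapping in each backend's address space (shm_open() for posix,
shmget()/shmat() for sysv, an mmap()ed file for mmap), so in principle
each of them can be interleaved the same way. A minimal, hypothetical
sketch for the posix case - not taken from the patch, needs -lnuma:

#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <numa.h>

/* Create and map a POSIX shm segment, then interleave its pages. */
static void *
create_interleaved_segment(const char *name, size_t size)
{
    void *addr;
    int   fd = shm_open(name, O_CREAT | O_RDWR, 0600);

    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t) size) != 0)
    {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (addr == MAP_FAILED)
        return NULL;

    /* spread the not-yet-faulted pages across all NUMA nodes */
    if (numa_available() != -1)
        numa_interleave_memory(addr, size, numa_all_nodes_ptr);

    return addr;
}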

-J.

