On Thu, Jun 9, 2022 at 2:36 PM Ma, Marcus <marcjma@amazon.com> wrote:
> I’m currently working on a parallelization optimization of the Sequential Scan in the codebase, and I need to share
> information between the workers as they scan a relation. I’ve done a decent amount of testing, and I know that the
> parallel workers all share the same dsa_area in the plan state. However, by the time I’m actually able to allocate a
> dsa_pointer via dsa_allocate0(), the separate parallel workers have already been created so I can’t actually share the
> pointer with them. Since the workers all share the same dsa_area, all I need to do is be able to share the single
> dsa_pointer with them but so far I’ve been out of luck. Any advice?

Generally, the way you share information with a parallel worker is by
making an entry in a DSM TOC using a well-known value as the key, and
then the parallel worker reads that entry. That entry might contain
things like a dsa_pointer, in which case you can hang any amount of
additional stuff off of that storage. In the case of the executor, the
well-known value used as the key is the plan_node_id. See
ExecSeqScanInitializeDSM and ExecSeqScanInitializeWorker for an
example of how to share data that is known in advance, before the
workers are started. In your case you'd need to adapt that technique,
since your dsa_pointer doesn't exist yet at that point. But notice
that all we're doing here is making a TOC entry for a
ParallelTableScanDesc. The contents of that struct can be anything.
For instance, it could contain a dsa_pointer and an LWLock protecting
the pointer and a ConditionVariable to wait for the pointer to change.
Another approach would be to set up a shm_mq and transmit the
dsa_pointer through it as a message.
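
To make the lock-plus-condition-variable idea a bit more concrete, here
is a rough standalone sketch of the pattern. It is not PostgreSQL code:
pthread_mutex_t and pthread_cond_t stand in for the LWLock and the
ConditionVariable, threads stand in for parallel workers, and every name
in it (SharedSlot, slot_publish, slot_wait, run_demo, demo_dsa_pointer)
is made up for illustration. In the real thing the struct would live in
the DSM segment and be located through the TOC entry.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for dsa_pointer; 0 means "not published yet". */
typedef uint64_t demo_dsa_pointer;
#define DEMO_INVALID_POINTER ((demo_dsa_pointer) 0)
#define DEMO_MAX_WORKERS 16

/* Hypothetical shared slot, set up before any worker starts. */
typedef struct SharedSlot
{
    pthread_mutex_t  lock;     /* stands in for an LWLock */
    pthread_cond_t   changed;  /* stands in for a ConditionVariable */
    demo_dsa_pointer ptr;      /* filled in later, after workers exist */
} SharedSlot;

void slot_init(SharedSlot *slot)
{
    pthread_mutex_init(&slot->lock, NULL);
    pthread_cond_init(&slot->changed, NULL);
    slot->ptr = DEMO_INVALID_POINTER;
}

/* Called by whichever participant performs the late allocation. */
void slot_publish(SharedSlot *slot, demo_dsa_pointer p)
{
    pthread_mutex_lock(&slot->lock);
    slot->ptr = p;
    pthread_cond_broadcast(&slot->changed);  /* wake every waiter */
    pthread_mutex_unlock(&slot->lock);
}

/* Blocks until a pointer has been published, then returns it. */
demo_dsa_pointer slot_wait(SharedSlot *slot)
{
    demo_dsa_pointer p;

    pthread_mutex_lock(&slot->lock);
    while (slot->ptr == DEMO_INVALID_POINTER)
        pthread_cond_wait(&slot->changed, &slot->lock);
    p = slot->ptr;
    pthread_mutex_unlock(&slot->lock);
    return p;
}

typedef struct WorkerArg
{
    SharedSlot      *slot;
    demo_dsa_pointer seen;
} WorkerArg;

void *worker_main(void *arg)
{
    WorkerArg *w = (WorkerArg *) arg;

    w->seen = slot_wait(w->slot);
    return NULL;
}

/*
 * Starts the workers first and publishes the value afterwards.
 * Returns how many workers ended up seeing the published value.
 */
int run_demo(int nworkers, demo_dsa_pointer value)
{
    SharedSlot slot;
    pthread_t  threads[DEMO_MAX_WORKERS];
    WorkerArg  args[DEMO_MAX_WORKERS];
    int        i, ok = 0;

    assert(nworkers > 0 && nworkers <= DEMO_MAX_WORKERS);
    assert(value != DEMO_INVALID_POINTER);
    slot_init(&slot);

    for (i = 0; i < nworkers; i++)
    {
        args[i].slot = &slot;
        args[i].seen = DEMO_INVALID_POINTER;
        assert(pthread_create(&threads[i], NULL, worker_main, &args[i]) == 0);
    }

    slot_publish(&slot, value);  /* the "late" allocation result */

    for (i = 0; i < nworkers; i++)
    {
        pthread_join(threads[i], NULL);
        if (args[i].seen == value)
            ok++;
    }

    pthread_cond_destroy(&slot.changed);
    pthread_mutex_destroy(&slot.lock);
    return ok;
}
```

Calling run_demo(4, 42) starts four waiters and then publishes 42 to
all of them. A worker that reaches slot_wait only after the publish
never sleeps, because the predicate is checked under the lock, so it
does not matter whether the workers or the publisher gets there first.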
--
Robert Haas
EDB: http://www.enterprisedb.com