Re: how to pass data (tuples) to worker processes? - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: how to pass data (tuples) to worker processes?
Date
Msg-id 004d01ce9330$bde5f280$39b1d780$@kapila@huawei.com
In response to Re: how to pass data (tuples) to worker processes?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Tuesday, August 06, 2013 6:29 PM Robert Haas wrote:
> On Sat, Aug 3, 2013 at 6:31 AM, Andrew Tipton <andrew@kiwidrew.com> wrote:
> > Robert:  any chance you could share a few more details on the
> > enhancements you're planning for bgworkers?  I seem to recall reading
> > that communicating with the dynamic bgworkers after they had been
> > launched was next on your agenda...
> 
> Yeah, it is.  I'm working on a patch to allow additional shared memory
> segments to be created on the fly.  The idea I'm working with is that
> a backend that plans to launch a worker will first create a dynamic
> shared memory segment, then pass the ID of that segment to the worker
> via bgw_main_arg.  The worker will map the segment, and then the two
> processes can use that to communicate.  My thought is to create a
> queue abstraction that sits on top of the dynamic shared memory
> infrastructure, so that you can set aside a portion of your dynamic
> shared memory segment to use as a ring buffer and send messages back
> and forth with using some kind of API along these lines:
> 
> extern void dsm_queue_send(dsm_queue *, char *data, uint64 len);
> extern uint64 dsm_queue_receive(dsm_queue *, char **dataptr);
> 
> It would also be possible to implement message sending and receiving
> using pipes, but I'm leaning away from that because it would require
> even more OS-dependent code than I'm already having to write, and
> writing OS-dependent shim layers is one of the world's less-rewarding
> coding tasks; and also because I think it will be easier to achieve
> zero-copy semantics using shared memory.
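
To make the ring buffer idea in the quoted text concrete, here is a minimal
single-process sketch of a length-prefixed byte ring placed in a caller-supplied
memory region. All names (demo_queue, demo_queue_init, demo_queue_send,
demo_queue_receive) are hypothetical and are not the proposed dsm_queue API.
The sketch copies data rather than being zero-copy, and it has no locking,
memory barriers or wakeup mechanism, all of which a real cross-process queue
would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout for illustration; not a PostgreSQL structure. */
typedef struct demo_queue
{
    uint64_t size;       /* capacity of data[] in bytes */
    uint64_t read_pos;   /* total bytes consumed so far */
    uint64_t write_pos;  /* total bytes produced so far */
    char     data[];     /* ring buffer storage */
} demo_queue;

/* Lay out a queue at the start of a suitably aligned memory region. */
demo_queue *
demo_queue_init(void *mem, size_t mem_size)
{
    demo_queue *q = (demo_queue *) mem;

    assert(mem_size > sizeof(demo_queue));
    q->size = mem_size - sizeof(demo_queue);
    q->read_pos = 0;
    q->write_pos = 0;
    return q;
}

/* Copy len bytes into the ring at logical position pos, wrapping if needed. */
static void
ring_copy_in(demo_queue *q, uint64_t pos, const void *src, uint64_t len)
{
    uint64_t off = pos % q->size;
    uint64_t first = q->size - off;

    if (first > len)
        first = len;
    memcpy(q->data + off, src, first);
    if (len > first)
        memcpy(q->data, (const char *) src + first, len - first);
}

/* Copy len bytes out of the ring from logical position pos. */
static void
ring_copy_out(const demo_queue *q, uint64_t pos, void *dst, uint64_t len)
{
    uint64_t off = pos % q->size;
    uint64_t first = q->size - off;

    if (first > len)
        first = len;
    memcpy(dst, q->data + off, first);
    if (len > first)
        memcpy((char *) dst + first, q->data, len - first);
}

/* Append one message; returns 1 on success, 0 if it does not fit right now. */
int
demo_queue_send(demo_queue *q, const char *data, uint64_t len)
{
    uint64_t used = q->write_pos - q->read_pos;

    if (len > q->size || sizeof(len) + len > q->size - used)
        return 0;
    ring_copy_in(q, q->write_pos, &len, sizeof(len));
    ring_copy_in(q, q->write_pos + sizeof(len), data, len);
    q->write_pos += sizeof(len) + len;
    return 1;
}

/* Copy the next message into buf; returns 0 if empty or buf is too small. */
int
demo_queue_receive(demo_queue *q, char *buf, uint64_t buf_size, uint64_t *len)
{
    uint64_t msg_len;

    if (q->write_pos - q->read_pos < sizeof(msg_len))
        return 0;
    ring_copy_out(q, q->read_pos, &msg_len, sizeof(msg_len));
    if (msg_len > buf_size)
        return 0;
    ring_copy_out(q, q->read_pos + sizeof(msg_len), buf, msg_len);
    q->read_pos += sizeof(msg_len) + msg_len;
    *len = msg_len;
    return 1;
}
```

In this sketch a sender that gets 0 back simply has to retry later; a real
implementation would presumably block or be signalled once the reader has
freed space.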

Another way to get parallel tasks done by bgworkers: rather than
dynamically launching a new bgworker, we can have a set of pre-allocated
bgworkers for a server and then, based on need, allocate a bgworker from
the pre-allocated array.

We can then allocate shared memory at the beginning, sized according to
the number of bgworkers and the information that needs to be shared
between the backend and the bgworkers (plan, tuples, snapshot, ...).

The basic idea can work as follows:
a. A backend that wants parallel tasks done by bgworkers divides the
tasks, checks which bgworkers are free, and places the plan in the
corresponding bgworker's shared memory slot.
b. A bgworker that is polling its shared memory slot retrieves the plan
and executes it.
c. The bgworker writes the resulting tuples back into its shared memory
slot.
d. The backend retrieves the tuples from the shared memory slots of the
bgworkers to which it has communicated the plan.
e. The backend sends the tuples back to the client.

This idea has a drawback: the queue of tuples to be shared has to be of
fixed size, because the memory must be allocated at the beginning.
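
A minimal single-process sketch of the slot scheme and its fixed-size
limitation might look like the following. Every name and constant here
(demo_worker_slot, demo_assign_plan, DEMO_QUEUE_TUPLES and so on) is
hypothetical, the "shared memory" is just a static array, and plans and
tuples are stand-in strings; real code would need locking and a proper
serialization of plans, tuples and snapshots.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MAX_WORKERS  4   /* size of the pre-allocated worker array */
#define DEMO_PLAN_BYTES   64  /* space reserved per slot for a plan */
#define DEMO_QUEUE_TUPLES 8   /* fixed number of tuples a slot can hold */
#define DEMO_TUPLE_BYTES  32  /* space reserved per tuple */

typedef enum
{
    SLOT_FREE,        /* worker is idle */
    SLOT_PLAN_READY,  /* backend has stored a plan */
    SLOT_RUNNING,     /* worker has picked up the plan */
    SLOT_DONE         /* worker has finished producing tuples */
} demo_slot_state;

typedef struct demo_worker_slot
{
    demo_slot_state state;
    char plan[DEMO_PLAN_BYTES];
    int  ntuples;
    char tuples[DEMO_QUEUE_TUPLES][DEMO_TUPLE_BYTES];
} demo_worker_slot;

/* Stands in for shared memory allocated once at startup (all SLOT_FREE). */
static demo_worker_slot demo_slots[DEMO_MAX_WORKERS];

/* Step a: backend finds a free worker and stores the plan in its slot. */
int
demo_assign_plan(const char *plan)
{
    for (int i = 0; i < DEMO_MAX_WORKERS; i++)
    {
        demo_worker_slot *s = &demo_slots[i];

        if (s->state == SLOT_FREE)
        {
            snprintf(s->plan, sizeof(s->plan), "%s", plan);
            s->ntuples = 0;
            s->state = SLOT_PLAN_READY;
            return i;
        }
    }
    return -1;                  /* every pre-allocated worker is busy */
}

/* Step b: worker polls its slot; returns 1 if there is a plan to run. */
int
demo_worker_poll(int slot)
{
    if (demo_slots[slot].state != SLOT_PLAN_READY)
        return 0;
    demo_slots[slot].state = SLOT_RUNNING;
    return 1;
}

/* Step c: worker stores a tuple; returns 0 once the fixed queue is full. */
int
demo_worker_emit(int slot, const char *tuple)
{
    demo_worker_slot *s = &demo_slots[slot];

    if (s->state != SLOT_RUNNING || s->ntuples == DEMO_QUEUE_TUPLES)
        return 0;
    snprintf(s->tuples[s->ntuples], DEMO_TUPLE_BYTES, "%s", tuple);
    s->ntuples++;
    return 1;
}

/* Worker marks the plan as fully executed. */
void
demo_worker_finish(int slot)
{
    demo_slots[slot].state = SLOT_DONE;
}

/* Step d: backend copies out the tuples and frees the slot if finished. */
int
demo_backend_collect(int slot, char out[DEMO_QUEUE_TUPLES][DEMO_TUPLE_BYTES])
{
    demo_worker_slot *s = &demo_slots[slot];
    int n = s->ntuples;

    memcpy(out, s->tuples, sizeof(s->tuples));
    s->ntuples = 0;
    if (s->state == SLOT_DONE)
        s->state = SLOT_FREE;
    return n;
}
```

When demo_worker_emit returns 0 because the slot is full, the worker in this
sketch has nowhere to put further tuples until the backend collects, which is
the fixed-size limitation in practice.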


With Regards,
Amit Kapila.



