Re: [PoC] Asynchronous execution again (which is not parallel) - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: [PoC] Asynchronous execution again (which is not parallel)
Date
Msg-id 20151202.111522.61917802.horiguchi.kyotaro@lab.ntt.co.jp
In response to Re: [PoC] Asynchronous execution again (which is not parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
Responses Re: [PoC] Asynchronous execution again (which is not parallel)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers

Thank you for picking this up.

At Tue, 1 Dec 2015 20:33:02 +0530, Amit Kapila <amit.kapila16@gmail.com> wrote in
<CAA4eK1LBwj7heY8pxRmMCOLhuMFr81TLHck-+ByBFuUADgeu+A@mail.gmail.com>
> On Mon, Nov 30, 2015 at 6:17 PM, Kyotaro HORIGUCHI <
> horiguchi.kyotaro@lab.ntt.co.jp> wrote:
> > ====== TODO or random thoughts, not restricted on this patch.
> >
> > - This patch doesn't contain the planner part; the planner must be
> >   aware of async execution in order for this to be effective.
> >
> 
> How will you decide whether sync-execution is cheaper than parallel
> execution.  Do you have some specific cases in mind where async
> execution will be more useful than parallel execution?

Mmm.. Perhaps some confusion in wording? Sync vs. async is a
distinction about when to start executing a node (and its
descendants). Parallel vs. serial (sequential) is a distinction
about whether multiple nodes can execute simultaneously. Async
execution presupposes parallel execution of some kind, whether by
bgworker or FDW.

As I wrote in the previous mail, async execution reduces the
startup time of parallel execution. So async execution is not
something more useful than parallel execution; rather, it
accelerates parallel execution. It is effective when the startup
time of every parallel execution node is rather long. We have
enough numbers to cost it.
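
To illustrate with a toy model (the function names and the numbers
below are made up for illustration; they are not taken from the
patch or from any measurement): if every parallel child node needs
some startup time, starting the children one by one pays roughly
the sum of the startup times, while kicking them all off up front
pays roughly the longest one.

```python
# Toy model only: the startup times are hypothetical, not measurements.

def sync_startup_wait(startup_times):
    """Children are started one after another, each on first demand,
    so the parent ends up waiting for every startup in turn."""
    return sum(startup_times)


def async_startup_wait(startup_times):
    """All children are kicked off up front, so their startups overlap
    and the parent waits only for the slowest one."""
    return max(startup_times, default=0)


startups_ms = [120, 80, 100]  # hypothetical per-child startup times
print(sync_startup_wait(startups_ms))   # 300
print(async_startup_wait(startups_ms))  # 120
```

The gap between the two grows with the number of children and with
the startup time per child, which is the case I have in mind.
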

> > - Some measure to control execution on bgworker would be
> >   needed. At least merge join requires position mark/reset
> >   functions.
> >
> > - Currently, more tuples reduce the effectiveness of parallel
> >   execution; some method to transfer tuples in larger units would
> >   be needed, or would it be good to have shared workmem?
> >
> 
> Yeah, I think here one thing we need to figure out is whether the
> performance bottleneck is due to the amount of data that is transferred
> between worker and master or something else. One idea could be to pass
> TID and may be keep the buffer pin (which will be released by master
> backend), but on the other hand if we have to perform costly target list
> evaluation by bgworker, then it might be beneficial to pass the projected
> list back.

One possible bottleneck is signalling between backends. Current
parallel execution uses signals to make the producer-consumer world
go round. Conveying TIDs won't make it faster if the bottleneck is
the inter-process communication. I brought up bulk transfer and
shared workmem as example means of reducing IPC frequency.
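
As a rough sketch of why batching helps (this assumes one wakeup
per message sent from producer to consumer, which is a
simplification; ipc_wakeups is a hypothetical helper, not anything
in the tree): the number of wakeups falls in proportion to the
number of tuples carried per message, regardless of whether each
tuple is sent whole or as a TID.

```python
# Simplified model: assumes exactly one consumer wakeup per message.

def ipc_wakeups(n_tuples, tuples_per_message):
    """Number of messages (and hence wakeups) needed to transfer
    n_tuples when each message carries up to tuples_per_message."""
    if tuples_per_message <= 0:
        raise ValueError("tuples_per_message must be positive")
    # Integer ceiling division, so a partial last batch still counts.
    return (n_tuples + tuples_per_message - 1) // tuples_per_message


print(ipc_wakeups(1_000_000, 1))    # 1000000
print(ipc_wakeups(1_000_000, 100))  # 10000
```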

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center