Re: [DESIGN] ParallelAppend - Mailing list pgsql-hackers

From Amit Langote
Subject Re: [DESIGN] ParallelAppend
Date
Msg-id 55B717B6.60202@lab.ntt.co.jp
In response to Re: [DESIGN] ParallelAppend  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
Responses Re: [DESIGN] ParallelAppend  (Kouhei Kaigai <kaigai@ak.jp.nec.com>)
List pgsql-hackers
KaiGai-san,

On 2015-07-27 PM 11:07, Kouhei Kaigai wrote:
> 
>   Append
>    --> Funnel
>         --> PartialSeqScan on rel1 (num_workers = 4)
>    --> Funnel
>         --> PartialSeqScan on rel2 (num_workers = 8)
>    --> SeqScan on rel3
> 
>  shall be rewritten to
>   Funnel
>     --> PartialSeqScan on rel1 (num_workers = 4)
>     --> PartialSeqScan on rel2 (num_workers = 8)
>     --> SeqScan on rel3        (num_workers = 1)
> 

In the rewritten plan, are the respective scans (PartialSeqScan or SeqScan)
on rel1, rel2 and rel3 asynchronous w.r.t. each other? Or does each one wait
for the earlier one to finish? I would think they do not wait, because
otherwise the rewritten plan would be no different from the former one,
right? The original premise seems to be that the partitions rel1, rel2 and
rel3 may be on different volumes, so parallelism across volumes seems like a
goal of parallelizing Append.

From my understanding of the parallel seqscan patch, each worker's
PartialSeqScan asks for a block to scan using a shared parallel heap scan
descriptor that effectively keeps track of the division of work among
PartialSeqScans in terms of blocks. What if we invent a PartialAppend,
which each worker would run in the case of a parallelized Append? It would
use some kind of shared descriptor to pick a relation (Append member) to
scan. The shared structure could be the list of subplans plus a mutex for
concurrency control. That doesn't sound as effective as what the proposed
ParallelHeapScanDescData does for PartialSeqScan, but anything more granular
might be complicated. For example, consider tracking a (current_relation,
current_block) pair. If there are more workers than subplans/partitions,
then multiple workers might start working on the same relation after a
round-robin assignment of relations (but of course, a later worker would
start scanning from a later block in the same relation). I imagine that
might help with parallelism across volumes if that's the case. MergeAppend
parallelization might involve a bit more complication, but may be feasible
with a PartialMergeAppend using a slightly different kind of coordination
among workers. What do you think of such an approach?
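
To make the (current_relation, current_block) idea a bit more concrete,
here is a rough sketch of the coordination only. It is written in Python
rather than backend C, and threads stand in for background workers. The
names PartialAppendState, pick_relation, next_block_of and
run_partial_append are made up for illustration; none of this is taken
from the parallel seqscan patch, and a real version would keep the state
in shared memory.

```python
import threading


class PartialAppendState:
    """Hypothetical shared descriptor: one block cursor per Append member."""

    def __init__(self, blocks_per_rel):
        self.lock = threading.Lock()          # the "mutex for concurrency"
        self.nblocks = list(blocks_per_rel)   # total blocks in each relation
        self.next_block = [0] * len(self.nblocks)
        self.next_rel = 0                     # round-robin cursor over members

    def pick_relation(self):
        """Hand out the next unfinished relation round-robin, or None."""
        with self.lock:
            n = len(self.nblocks)
            for i in range(n):
                rel = (self.next_rel + i) % n
                if self.next_block[rel] < self.nblocks[rel]:
                    self.next_rel = (rel + 1) % n
                    return rel
            return None

    def next_block_of(self, rel):
        """Claim the next unscanned block of rel, or None when it is done."""
        with self.lock:
            blk = self.next_block[rel]
            if blk >= self.nblocks[rel]:
                return None
            self.next_block[rel] = blk + 1
            return blk


def worker(state, scanned):
    # Each worker runs the same loop: pick a member, drain blocks from it,
    # then move on to whichever member still has work left.
    while True:
        rel = state.pick_relation()
        if rel is None:
            return
        while True:
            blk = state.next_block_of(rel)
            if blk is None:
                break
            scanned.append((rel, blk))  # stand-in for scanning the block


def run_partial_append(blocks_per_rel, nworkers):
    state = PartialAppendState(blocks_per_rel)
    per_worker = [[] for _ in range(nworkers)]
    threads = [threading.Thread(target=worker, args=(state, per_worker[i]))
               for i in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [pair for scanned in per_worker for pair in scanned]
```

With, say, run_partial_append([3, 5, 2], 4), the fourth worker wraps around
to a relation that another worker is already scanning and continues from a
later block, which is the behaviour described above. Every (relation, block)
pair is claimed exactly once, although which worker gets which block depends
on scheduling.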

Thanks,
Amit



