Re: Parallel Sort - Mailing list pgsql-hackers

From Noah Misch
Subject Re: Parallel Sort
Msg-id 20130514145051.GA224350@tornado.leadboat.com
In response to Re: Parallel Sort  (Kohei KaiGai <kaigai@kaigai.gr.jp>)
Responses Re: Parallel Sort  (Claudio Freire <klaussfreire@gmail.com>)
List pgsql-hackers
On Mon, May 13, 2013 at 09:52:43PM +0200, Kohei KaiGai wrote:
> 2013/5/13 Noah Misch <noah@leadboat.com>
> > The choice of whether to parallelize can probably be made in a manner
> > similar to the choice to do an external sort: the planner guesses the
> > outcome for costing purposes, but the actual decision is made at execution
> > time.  The planner would determine a tuple count cutoff at which
> > parallelism becomes favorable, and tuplesort would check that to establish
> > its actual decision.
> 
> This probably overlaps with a problem I have been thinking about:
> off-loading CPU-bound workloads, which I partially tried to implement in
> the writable foreign table feature.
> Beyond sorting, I think it would be worthwhile to be able to push heavy
> workloads (such as sorts, aggregates, or complex target lists) out to
> external computing resources.
> However, I doubt whether the decision to parallelize should be made at
> execution time rather than at the planning stage. For example, given a
> large enough number of records and a 10-core multiprocessor, a wise plan
> might load the data in parallel on 10 processors, have each of the 10
> processors sort its portion individually, and then merge-sort the results.
> That requires a fundamentally different tree structure from the
> traditional single-processor plan tree.
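
[Illustration] The plan shape described in the quoted paragraph (split the input, sort each portion on its own worker, then merge) can be sketched roughly as follows. This is a toy model and not PostgreSQL code: `parallel_sort` and its `workers` parameter are hypothetical names, and a thread pool stands in for separate processors, so it shows the structure only and is not expected to give a real speedup in CPython.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def parallel_sort(records, workers=10):
    """Sort `records` by splitting, sorting each part, and merging.

    Hypothetical sketch: `workers` models the number of processors.
    """
    if not records:
        return []
    # Never use more workers than there are records.
    workers = max(1, min(workers, len(records)))
    # Step 1: split the input into one chunk per worker (round-robin).
    chunks = [records[i::workers] for i in range(workers)]
    # Step 2: each worker sorts its own chunk independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        runs = list(pool.map(sorted, chunks))
    # Step 3: a single consumer merges the sorted runs.
    return list(heapq.merge(*runs))


data = [42, 7, 19, 3, 88, 61, 5, 27, 14, 70, 1, 36]
result = parallel_sort(data, workers=3)
```

Note that the final merge is still a single stream, which is why the consumer of the sort sees no tuples until every partial sort has finished.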

That's taking a few steps more in the direction of parallel general query; at
some point, the planner would definitely become parallel-aware.  For the
narrower topic of parallel sort, I don't think it's necessary.  The node tree
supplying the sort can't run in parallel (yet), and the node pulling from the
sort won't receive the first tuple until the sort is complete.  For the time
being, the planner just needs to know enough about the sort node's projected
use of parallelism to estimate cost.
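
[Illustration] A minimal sketch of the split described above, in which the planner derives a tuple-count cutoff from a cost model and the sort consults that cutoff once the real tuple count is known. Everything here is an assumption made for illustration: the cost formulas, the parameter names (`cmp_cost`, `workers`, `startup_cost`), and the functions themselves are hypothetical and do not correspond to PostgreSQL's planner or tuplesort code. The model also ignores the cost of the final merge.

```python
import math


def serial_cost(n, cmp_cost):
    # Hypothetical model: a comparison sort costs about n * log2(n) comparisons.
    return cmp_cost * n * math.log2(max(n, 2))


def parallel_cost(n, cmp_cost, workers, startup_cost):
    # Hypothetical model: the work divides evenly, plus a fixed worker startup cost.
    return startup_cost + serial_cost(n, cmp_cost) / workers


def parallel_cutoff(cmp_cost, workers, startup_cost):
    """Plan time: smallest power-of-two tuple count where parallelism is cheaper.

    Returns None when parallelism can never win under this model.
    """
    if workers < 2:
        return None
    n = 2
    # Doubling gives a coarse cutoff, which is enough for a sketch.
    while parallel_cost(n, cmp_cost, workers, startup_cost) >= serial_cost(n, cmp_cost):
        n *= 2
    return n


def use_parallel_sort(actual_tuples, cutoff):
    """Execution time: decide from the real tuple count, not the estimate."""
    return cutoff is not None and actual_tuples >= cutoff


cutoff = parallel_cutoff(cmp_cost=1.0, workers=4, startup_cost=1000.0)
```

Under this model the planner only needs the cutoff and the two cost functions to price the sort node; the plan tree itself keeps its usual shape.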

> So it seems to me we should add an enhancement that allows a
> special-purpose parallel-processing plan node to be injected into the
> plan tree. What is your opinion?

I'm not picturing how, specifically, this new node or class of nodes would be
used.  Could you elaborate?

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


