Re: Parallel Sort - Mailing list pgsql-hackers

From Kohei KaiGai
Subject Re: Parallel Sort
Date
Msg-id CADyhKSUieOzpqS0erkTxTHtZr999f5jJR68DfT=nv=TVipbZ3g@mail.gmail.com
Whole thread
In response to Parallel Sort  (Noah Misch <noah@leadboat.com>)
Responses Re: Parallel Sort
List pgsql-hackers
2013/5/13 Noah Misch <noah@leadboat.com>
* Planner & Similar Issues

We should decide whether to actually sort in parallel based on the comparator
cost and the data size.  The system currently has no information on comparator
cost: bt*cmp (and indeed almost all built-in functions) all have procost=1,
but bttextcmp is at least 1000x slower than btint4cmp.  Let's improve the
procost estimates of all core B-tree and hash operators.  This will have other
benefits, but we will need to be cognizant of the risk of upsetting setups
that have tuned cpu_operator_cost based on the present situation.

The choice of whether to parallelize can probably be made a manner similar to
the choice to do an external sort: the planner guesses the outcome for costing
purposes, but the actual decision is made at execution time.  The planner
would determine a tuple count cutoff at which parallelism becomes favorable,
and tuplesort would check that to establish its actual decision.

It probably crossovers my problem consciousness to off-load CPU bounds
workloads; that I partially tried to implement on writable foreign table feature.
Not only sorting stuff, I think it may be worthful to have capability to push
heavy workload (like sort, aggregate or complex target-list) out external
computing resources.
However, I doubt whether the decision to parallelize should be done in
execution time, rather than plan stage. For example, in case when we
have enough number of records and 10-core multiprocessor, the wise
plan may take parallel data load by 10-processors, partial-sort by 10-
processors individually, then merge-sort. It needs fundamental different
tree structure from the traditional single-processors based plan-tree.
So, it seems to me we should take an enhancement to allow to inject
plan-tree special purpose parallel processing plan node.
How about your opinion?

Thanks,

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: erroneous restore into pg_catalog schema
Next
From: Simon Riggs
Date:
Subject: Re: Parallel Sort