Re: Parallel Sort - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Parallel Sort
Date
Msg-id CAB7nPqQMEOSXkVK75C=Z-kWbrWbtamA-BSQ7c=9cSV4AgTU7Sg@mail.gmail.com
In response to Re: Parallel Sort  (Noah Misch <noah@leadboat.com>)
Responses Re: Parallel Sort  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Tue, May 14, 2013 at 11:59 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, May 14, 2013 at 01:51:42PM +0900, Michael Paquier wrote:
> > On Mon, May 13, 2013 at 11:28 PM, Noah Misch <noah@leadboat.com> wrote:
> >
> > > * Identifying Parallel-Compatible Functions
> > >
> > > Not all functions can reasonably run on a worker backend.  We should not
> > > presume that a VOLATILE function can tolerate the unstable execution
> > > order imposed by parallelism, though a function like clock_timestamp()
> > > is perfectly reasonable to run that way.  STABLE does not have that
> > > problem, but neither does it constitute a promise that the function
> > > implementation is compatible with parallel execution.  Consider
> > > xid_age(), which would need code changes to operate correctly in
> > > parallel.  IMMUTABLE almost guarantees enough; there may come a day when
> > > all IMMUTABLE functions can be presumed parallel-safe.  For now, an
> > > IMMUTABLE function could cause trouble by starting a (read-only)
> > > subtransaction.  The bottom line is that parallel-compatibility needs to
> > > be separate from volatility classes for the time being.
> > >
> > I am not sure that this problem is limited to functions; it applies to
> > all the expressions and clauses of queries that could be shipped and
> > evaluated on the worker backends when fetching tuples, so as to
> > accelerate a parallel sort. Let's imagine for example the case of a LIMIT
> > clause that worker backends could use to limit the number of tuples to
> > sort in the final result.

> It's true that the same considerations apply to other plan tree constructs;
> however, every such construct is known at build time, so we can study each
> one and decide how it fits with parallelism.
The concept of clause parallelism for worker backends is close to the concept
of clause shippability introduced in Postgres-XC. In the case of XC, the
equivalent of the master backend is a backend located on a node called
Coordinator, which merges and organizes the results fetched in parallel from
the remote nodes where the data scans occur (nodes called Datanodes). The
backends used for tuple scans across Datanodes share the same data
visibility, as they use the same snapshot and transaction ID as the backend
on the Coordinator. This is different from the parallelism proposed here, as
there is no notion of importing a snapshot into worker backends.

However, the code in the XC planner used for clause shippability evaluation
is definitely worth looking at, considering the many similarities it shares
with parallelism when evaluating whether a given clause can be executed on a
worker backend or not. It would be a waste to implement the same thing twice
if there is code already available.
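
To give an idea of what such an evaluation looks like, here is a minimal
sketch of the usual planner pattern: an expression_tree_walker() pass that
rejects a clause as soon as it finds a function not marked as runnable on a
worker backend. func_parallel_safe() and is_clause_worker_safe() are names
made up for illustration; only the walker pattern comes from the existing
nodeFuncs.c machinery.

#include "postgres.h"

#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"

/* hypothetical lookup of a per-function parallel-safety marker */
extern bool func_parallel_safe(Oid funcid);

/* returns true as soon as something not runnable on a worker is found */
static bool
contain_parallel_unsafe_walker(Node *node, void *context)
{
    if (node == NULL)
        return false;

    if (IsA(node, FuncExpr))
    {
        FuncExpr   *expr = (FuncExpr *) node;

        if (!func_parallel_safe(expr->funcid))
            return true;        /* stop: clause cannot be shipped */
    }

    /*
     * A complete version would also inspect OpExpr, Aggref, SubLink and
     * friends; for everything else, just recurse into the arguments.
     */
    return expression_tree_walker(node, contain_parallel_unsafe_walker,
                                  context);
}

bool
is_clause_worker_safe(Node *clause)
{
    return !contain_parallel_unsafe_walker(clause, NULL);
}

As far as I recall, the XC shippability evaluation is organized around the
same kind of tree walk, only with more node types being rejected, which is
why I think it maps well to what is needed here.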
 
> Since functions are user-definable, it's preferable to reason about classes of functions.
Yes, you are right.
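
To put that in code, here is a rough sketch; "proparallel" would be an
assumed new pg_proc column and func_parallel_ok() an assumed lsyscache-style
accessor for it, neither of which exists today:

/*
 * Sketch: the per-function decision reads a dedicated marker instead of
 * provolatile.  func_parallel_ok() is an assumed accessor for an assumed
 * new pg_proc column.
 */
static bool
function_ok_on_worker(Oid funcid)
{
    /*
     * Deliberately not func_volatile(funcid): a VOLATILE function like
     * clock_timestamp() may be perfectly fine on a worker, while an
     * IMMUTABLE one could still start a (read-only) subtransaction.
     */
    return func_parallel_ok(funcid);
}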
--
Michael
