Re: Parallel Sort - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Parallel Sort
Date
Msg-id CAB7nPqQMEOSXkVK75C=Z-kWbrWbtamA-BSQ7c=9cSV4AgTU7Sg@mail.gmail.com
In response to Re: Parallel Sort  (Noah Misch <noah@leadboat.com>)
Responses Re: Parallel Sort  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On Tue, May 14, 2013 at 11:59 PM, Noah Misch <noah@leadboat.com> wrote:
> On Tue, May 14, 2013 at 01:51:42PM +0900, Michael Paquier wrote:
> > On Mon, May 13, 2013 at 11:28 PM, Noah Misch <noah@leadboat.com> wrote:
> >
> > > * Identifying Parallel-Compatible Functions
> > >
> > > Not all functions can reasonably run on a worker backend.  We should not
> > > presume that a VOLATILE function can tolerate the unstable execution
> > > order imposed by parallelism, though a function like clock_timestamp()
> > > is perfectly reasonable to run that way.  STABLE does not have that
> > > problem, but neither does it constitute a promise that the function
> > > implementation is compatible with parallel execution.  Consider
> > > xid_age(), which would need code changes to operate correctly in
> > > parallel.  IMMUTABLE almost guarantees enough; there may come a day when
> > > all IMMUTABLE functions can be presumed parallel-safe.  For now, an
> > > IMMUTABLE function could cause trouble by starting a (read-only)
> > > subtransaction.  The bottom line is that parallel-compatibility needs to
> > > be separate from volatility classes for the time being.
> > >
> > I am not sure that this problem is limited to functions; it applies to
> > all the expressions and clauses of queries that could be shipped and
> > evaluated on the worker backends when fetching tuples, so as to
> > accelerate a parallel sort. Let's imagine for example the case of a LIMIT
> > clause that worker backends could use to limit the number of tuples to
> > sort in the final result.

> It's true that the same considerations apply to other plan tree constructs;
> however, every such construct is known at build time, so we can study each
> one and decide how it fits with parallelism.
The concept of clause parallelism for worker backends is close to the concept
of clause shippability introduced in Postgres-XC. In the case of XC, the
equivalent of the master backend is a backend located on a node called
Coordinator, which merges and organizes the results fetched in parallel from
the remote nodes where the data scans occur (nodes called Datanodes). The
backends used for tuple scans across Datanodes share the same data
visibility, as they use the same snapshot and transaction ID as the backend
on the Coordinator. This is different from the parallelism proposed here, as
there is no notion of importing a snapshot into worker backends.

However, the code in the XC planner used for clause shippability evaluation
is definitely worth looking at, considering the many similarities it shares
with parallelism when evaluating whether a given clause can be executed on a
worker backend or not. It would be a waste to implement the same thing twice
if there is code already available.
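
To give an idea of what such an evaluation looks like, here is a minimal
sketch of the usual planner pattern: an expression_tree_walker() pass that
rejects a clause as soon as it finds a function not marked as runnable on a
worker backend. func_parallel_safe() and is_clause_worker_safe() are names
made up for illustration; only the walker pattern comes from the existing
nodeFuncs.c machinery.

#include "postgres.h"

#include "nodes/nodeFuncs.h"
#include "nodes/primnodes.h"

/* hypothetical lookup of a per-function parallel-safety marker */
extern bool func_parallel_safe(Oid funcid);

/* returns true as soon as something not runnable on a worker is found */
static bool
contain_parallel_unsafe_walker(Node *node, void *context)
{
    if (node == NULL)
        return false;

    if (IsA(node, FuncExpr))
    {
        FuncExpr   *expr = (FuncExpr *) node;

        if (!func_parallel_safe(expr->funcid))
            return true;        /* stop: clause cannot be shipped */
    }

    /*
     * A complete version would also inspect OpExpr, Aggref, SubLink and
     * friends; for everything else, just recurse into the arguments.
     */
    return expression_tree_walker(node, contain_parallel_unsafe_walker,
                                  context);
}

bool
is_clause_worker_safe(Node *clause)
{
    return !contain_parallel_unsafe_walker(clause, NULL);
}

As far as I recall, the XC shippability evaluation is organized around the
same kind of tree walk, only with more node types being rejected, which is
why I think it maps well to what is needed here.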
 
> Since functions are user-definable, it's preferable to reason about classes of functions.
Yes, you are right.
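
To put that in code, here is a rough sketch; "proparallel" would be an
assumed new pg_proc column and func_parallel_ok() an assumed lsyscache-style
accessor for it, neither of which exists today:

/*
 * Sketch: the per-function decision reads a dedicated marker instead of
 * provolatile.  func_parallel_ok() is an assumed accessor for an assumed
 * new pg_proc column.
 */
static bool
function_ok_on_worker(Oid funcid)
{
    /*
     * Deliberately not func_volatile(funcid): a VOLATILE function like
     * clock_timestamp() may be perfectly fine on a worker, while an
     * IMMUTABLE one could still start a (read-only) subtransaction.
     */
    return func_parallel_ok(funcid);
}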
--
Michael
