Re: Parallel Sort - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Parallel Sort
Date
Msg-id CAB7nPqRfK2e_iM2L-ccMGSUGajDZTwm2Xzro3fLn9CE0LhgfCA@mail.gmail.com
In response to Parallel Sort  (Noah Misch <noah@leadboat.com>)
Responses Re: Parallel Sort  (Robert Haas <robertmhaas@gmail.com>)
Re: Parallel Sort  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers



On Mon, May 13, 2013 at 11:28 PM, Noah Misch <noah@leadboat.com> wrote:
* Identifying Parallel-Compatible Functions

Not all functions can reasonably run on a worker backend.  We should not
presume that a VOLATILE function can tolerate the unstable execution order
imposed by parallelism, though a function like clock_timestamp() is perfectly
reasonable to run that way.  STABLE does not have that problem, but neither
does it constitute a promise that the function implementation is compatible
with parallel execution.  Consider xid_age(), which would need code changes to
operate correctly in parallel.  IMMUTABLE almost guarantees enough; there may
come a day when all IMMUTABLE functions can be presumed parallel-safe.  For
now, an IMMUTABLE function could cause trouble by starting a (read-only)
subtransaction.  The bottom line is that parallel-compatibility needs to be
separate from volatility classes for the time being.
I am not sure this problem is limited to functions. It extends to all the expressions
and clauses of a query that could be shipped to and evaluated on the worker backends while
fetching tuples, and that could be used to accelerate a parallel sort. Take, for example,
a LIMIT clause: worker backends could use it to cap the number of tuples they have to sort
for the final result.
In some ways, Postgres-XC has faced (and is still facing) similar challenges, and they have
been partially solved there.
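
To make the LIMIT case concrete, here is a minimal Python sketch. It is not PostgreSQL or
Postgres-XC code: the function names and the modelling of each worker's tuples as a plain
list are assumptions made only for illustration. Each worker keeps just its `limit`
smallest tuples, and the leader merges those already-sorted runs and stops after `limit`
tuples.

```python
import heapq
from itertools import islice


def worker_top_n(partition, limit):
    """Hypothetical worker step: return the `limit` smallest tuples, sorted."""
    # heapq.nsmallest returns its result in ascending order.
    return heapq.nsmallest(limit, partition)


def leader_merge(sorted_runs, limit):
    """Hypothetical leader step: merge sorted runs, stopping after `limit` tuples."""
    # heapq.merge lazily merges inputs that are each already sorted.
    return list(islice(heapq.merge(*sorted_runs), limit))


def parallel_sort_with_limit(partitions, limit):
    """Simulate the workers sequentially; a real system would run them concurrently."""
    runs = [worker_top_n(partition, limit) for partition in partitions]
    return leader_merge(runs, limit)
```

The point of the sketch is that no worker ever hands more than `limit` tuples to the
leader, whatever the size of its partition, and the result is still the same as sorting
everything and then applying the LIMIT.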

I'm not sure what the specific answer here should look like.  Simply having a
CREATE FUNCTION ... PARALLEL_IS_FINE flag is not entirely satisfying, because
the rules are liable to loosen over time.
Having a flag would be enough to control parallelism, but can we not also determine
whether the execution of a function can be shipped safely to a worker based on its
volatility alone? Immutable functions are presumably safe, as they do not modify the
database state and always give the same result for the same arguments; volatile and
stable functions are definitely not safe.
For these reasons, it would be better to keep things simple and rely on simple rules to
determine whether a given expression can be executed safely on a worker backend.
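
As a rough illustration of such a rule, here is a small Python sketch. It only encodes
the volatility-based test proposed in this message, not any existing PostgreSQL
behaviour; the `Volatility` enum and both function names are hypothetical, and an
expression is modelled simply as the list of volatility classes of the functions it
calls.

```python
from enum import Enum


class Volatility(Enum):
    """Hypothetical stand-in for the three volatility classes."""
    IMMUTABLE = "immutable"
    STABLE = "stable"
    VOLATILE = "volatile"


def function_is_shippable(volatility):
    """Assumed rule from this message: only immutable functions go to a worker."""
    return volatility is Volatility.IMMUTABLE


def expression_is_shippable(function_volatilities):
    """An expression is shippable only if every function it calls is shippable."""
    # An expression calling no functions at all passes trivially.
    return all(function_is_shippable(v) for v in function_volatilities)
```

With this rule, an expression calling one immutable and one stable function would stay
on the parent backend, while an expression made only of immutable functions could be
evaluated on a worker.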
--
Michael
